Understanding Informal Mathematical Discourse Verstehen Informeller Mathematischer Beweise

Der Technischen Fakultät der Universität Erlangen-Nürnberg zur Erlangung des Grades DOKTOR-INGENIEUR

vorgelegt von

Claus Werner Zinn

Erlangen — 2004

Als Dissertation genehmigt von der Technischen Fakultät der Universität Erlangen-Nürnberg

Tag der Einreichung: 15. Oktober 2003
Tag der Promotion: 30. Januar 2004
Dekan: Prof. Dr. rer. nat. Albrecht Winnacker
Berichterstatter: Prof. Dr. phil. Dr.-Ing. habil. Herbert Stoyan
Prof. Dr. h.c. Hans Kamp, PhD

Arbeitsberichte des Instituts für Informatik Friedrich-Alexander-Universität Erlangen Nürnberg

Band 37 • Nummer 4 • September 2004

Claus Zinn

Understanding Informal Mathematical Discourse Dissertation

Herausgeber:

M. Dal Cin, R. German, G. Görz, G. Greiner, U. Herzog, F. Hofmann, J. Hornegger, S. Jablonski, K. Leeb, P. Mertens, K. Meyer-Wegener, H. Müller, H. Niemann, Ch. Pflaum, M. Philippsen, U. Rüde, F. Saglietti, H. J. Schneider, W. Schröder-Preikschat, M. Stamminger, H. Stoyan, J. Teich, H. Wedekind

Die Reihe der Arbeitsberichte des Instituts für Informatik (ehem. Institut für Mathematische Maschinen und Datenverarbeitung) der Universität Erlangen-Nürnberg erscheint seit 1967. Begründet von Prof. Dr. Dr. h. c. mult. Wolfgang Händler

Universität Erlangen-Nürnberg
Institut für Informatik
Martensstr. 3
91058 Erlangen

Tel.: +49.9131.85.27613
Fax: +49.9131.39388
E-Mail: [email protected]
WWW: http://www.informatik.uni-erlangen.de/

© Universität Erlangen-Nürnberg, Institut für Informatik 2004 Alle Rechte bleiben vorbehalten. Nachdruck, auch auszugsweise, nur mit Genehmigung der Herausgeber. ISSN 1611-4205

Herstellung:

Gruner Druck GmbH Sonnenstr. 23b, 91058 Erlangen, Tel. (09131) 6170-0

FÜR MEINE MUTTER


Acknowledgements

I am very grateful to my first supervisor, Herbert Stoyan, for raising my interest in the thesis's topic and for the opportunity to work on it. I owe a great deal to him for all his help, advice and encouragement ("Nur Mut, Claus!") while I was writing this dissertation. I am also indebted to Hans Kamp for taking an interest in my work, and for inspiring much of this research's underlying computational framework. I would also like to thank him for assuming the role of second supervisor.

Most of the work on this thesis was carried out while I was employed at Herbert Stoyan's department of artificial intelligence (AI) at the University of Erlangen-Nuremberg. I had both the privilege and the burden of tutoring most of Herbert's classes, and when you have to explain things to others, you had better know them. I also have good memories of lecturing on advanced AI programming methods in Lisp and Prolog with my colleague Wolfgang Jaksch. All this teaching was very time-consuming, but it was a lot of fun and a source of inspiration as well.

The present thesis also profited from a brief but fruitful four-week research visit to the Institute of Mathematics, University of Warsaw (Białystok Branch). Thanks to Andrzej Trybulec and his colleagues for welcoming Wolfgang and me, and for all their helpful advice on the use of their Mizar proof checker.

A considerable part of my thesis work was carried out at the University of Edinburgh. In the first few months, I had the opportunity to work in Alan Bundy's DREAM group. There, I learned a lot from discussions with Alan and his colleagues as well as from the use of their in-house λ-Clam proof planning system. The mathematical analysis of textbook proofs, as discussed in Ch. 3, profited from this valuable experience. Unfortunately, a joint grant proposal written with Alan Bundy and Johanna Moore, building upon this thesis's work, never got funded. Fortunately, Johanna Moore had a vacancy to fill and hired me to work on her tutorial dialogue project. The following four years resulted in BEETLE, a tutorial learning environment for basic electricity and electronics. As a consequence, work on this thesis was pushed into after-office and weekend hours. Nevertheless, after quite some time, I am delighted to see this thesis coming to an end.

I am very grateful to Alan Bundy for his valuable comments and advice on particular aspects of my work. I would also like to thank Carolyn Penstein Rosé for proofreading many parts of my thesis; her red ink crossed the Atlantic in thick airmail and helped improve my English considerably. A big thank you also to Bettina Braun, who helped tremendously in the final stages of this project. Over the past years many colleagues, referees and friends have made comments and suggestions and provided support. To all of them: thank you!


Abstract

Automated reasoning is one of the most established disciplines in informatics and artificial intelligence, and formal methods are increasingly employed in practical applications. For the most part, however, such applications seem to be limited to informatics-specific areas (e.g., the verification of correctness properties of software and hardware specifications) and areas close to informatics such as computational linguistics (e.g., the computation of consistency and informativeness properties of semantic representations). Naturally, there is also a potential for practical applications in the area of mathematics: the generation of proofs for mathematically interesting and motivated theorems and, closely related, the computer-supported formalisation of (parts of elementary) mathematics. It is a fact, however, that mathematicians rarely use computer support for the construction and verification of proofs. This is mainly caused by the "unnaturalness" of the language and the reasoning that such proof engines support.

In the past, researchers in the area of automated reasoning have focused their work on formalisms and algorithms that allow the construction and verification of proofs that are written in a formal-logical language and that use only a limited number of inference rules. For the computer scientist, such formal proofs have the advantage of a simple and ambiguity-free syntax, which can thus be easily processed. Moreover, the limited number of inference rules has a direct impact on the complexity of the search space that needs to be explored during the process of constructing proofs. The verification of given formal proofs is greatly facilitated by the complete explicitness of their logical argumentation, where no reasoning step is left out.

For mathematicians, however, such formal proofs are usually hard to understand. To them, formal proofs are written in an unfamiliar and artificial language and are much too detailed. Moreover, the sheer number of inference steps, each logically relevant but often describing only trivial mathematical details, makes it difficult to follow a proof's main underlying line of argumentation. In practice, thus, mathematicians use proof generation engines rather seldom, if at all. The same is true with regard to proof verification tools. The amount of work that is required to verify a mathematical proof with such tools is considerable, if not prohibitive. Since proof verification systems only accept formal proofs as input, the mathematician's first task is to manually translate the mathematical proof into the formal language that is accepted by the verifier. This in turn includes the translation of the proof's underlying mathematical argumentation into inference rules that are supported by the proof engine. Such translations and refinements are usually very time-consuming, tedious, and prone to error themselves. Hence, how the proof verifier then judges the result of proof translation and proof refinement is only of limited relevance to the original mathematical proof.

From the mathematician's point of view, there is thus a need for a proof verification system that is capable of processing mathematical proofs automatically, at least with regard to translating the mathematicians' expert language into the system's artificial formal language. Such a system would have an enormous potential in the mathematics community, and this potential was recognised early.
In the early 1960s, John McCarthy, one of the pioneers of artificial intelligence, remarked that "[c]hecking mathematical proofs is potentially one of the most interesting and useful applications of automatic computers" [111]. More than forty years later, a tool that supports mathematicians with the verification of mathematical proofs is still more science fiction than reality. Its realisation is associated with research questions within the disciplines of automated reasoning and computational linguistics that are still only partially answered.

This thesis aims at contributing towards the realisation of a verifier for mathematical proofs. It attempts to provide a general framework as well as an implementation for such a proof engine. The dissertation's objects of study are short and simple proofs taken from textbooks on elementary number theory. Fig. 1 depicts a proof of the mathematical truth that every positive integer greater than 1 can be represented as a

product of one or more primes. The proof consists of only a few lines, and little mathematical knowledge is necessary to follow the proof author's argumentation. It is this kind of short proof that we attempt to check automatically.

Figure 1: A proof taken from Hardy & Wright’s introduction to elementary number theory [68].

The nature of this problem forces us to conduct our investigation using terminology and techniques from two AI disciplines: natural language processing (NLP) and automated reasoning (AR). In this thesis, we endeavour to demonstrate that NLP techniques are useful for AR, and vice versa. Furthermore, we demonstrate how extensions of existing techniques can be combined in a novel and promising way to solve the problem at hand. From the perspectives of these two disciplines, we can phrase the thesis's underlying questions as follows: Can we build a program that

• understands discourse in a particular text genre, namely, mathematical discourse?
• mechanically verifies the correctness of informal mathematical arguments?

We hypothesise that these two questions must have the same answer because understanding mathematical discourse and automatically checking the correctness of what is said in a mathematical discourse are closely related tasks. Beyond this, we argue that the semantics of an informal mathematical argument is its "corresponding" formal proof, and that discourse understanding in this text genre is defined by generating a formal proof from the informal one. That is, a computer system understands a mathematical proof if it is able to refine the given informal argument into a formal proof.

After a brief discussion of the notion of proof and a comprehensive motivation (ch. 1), we describe related work in ch. 2. From the literature review, we learn that the state of the art with regard to the automatic verification of informal mathematical proofs is rather disappointing. Donald Simon was the first to tackle this problem in its full complexity [146, 147]. However, Simon's approach suffers from a lack of theoretical groundwork. It falls short of incorporating sophisticated techniques from both AR and NLP to model, represent and operationalise the reasoning that one encounters in mathematical proofs, and to appropriately analyse and handle the linguistic problems they contain.

The third and fourth chapters of this thesis are therefore devoted to the analysis of textbook proofs. We show that the mechanical "understanding" of textbook proofs and the mechanical verification of their correctness are not trivial. On the contrary, we find that they presuppose a considerable amount of mathematical and meta-mathematical knowledge as well as a complete language processing capability at the syntactic, morphological, semantic, and pragmatic levels. A deep mathematical analysis of a number of informal proofs

(real, unpolished discourse taken from Hardy & Wright's [68] and LeVeque's [107] textbooks) is undertaken and reported in ch. 3. Our analysis succeeds in capturing the proofs' underlying logical structure. Moreover, with the support of the λ-Clam proof planner [131], we construct formal proof objects that preserve the logical structure of the given textbook proofs. In this task, we serve as a mediator between the textbook proof (or the respective proof author) and a proof system, the former guiding the latter through the large space of possible inferences. The remaining goal of the dissertation is, thus, to mechanise the mediator.

Having disclosed the logical structure of proofs, we need to understand how such logical dependencies are communicated with language. The fourth chapter is thus devoted to the linguistic analysis of textbook proofs. Despite the good style of mathematical writing that is promoted by many guidelines aimed at teaching a clear exposition of mathematical ideas (e.g., [75], [101], [162]), the expert language of mathematicians is shown to have many pitfalls. The linguistic phenomena that we know from other text genres recur in mathematical discourse. Moreover, they are complemented by phenomena that stem from the use of symbols and their interaction with English text. The mathematician's endeavour for a precise but concise expression of mathematical ideas leads to formulations that rely heavily on their context. References to the context, in particular extra-sentential anaphoric expressions (e.g., the use of variable names), presuppositions (e.g., the use of definite noun phrases like "the induction hypothesis", "the second case") and elliptic constructions ("Therefore some mathematical statement A"), occur particularly frequently in mathematical texts. The correct identification of such linguistic relations is as necessary for a proper understanding of mathematical texts as is a proper syntactic analysis of sentence structure and the composition of its semantic parts. In ch. 4, we attempt to give complete coverage of such linguistic phenomena. Our analysis, given the complexity of the task, had to be more shallow than deep. Nevertheless, some material has been covered in some depth, for example, the important use of variable names and conditional expressions.

In the second part of this dissertation, we describe Vip ("Verifying Informal Proofs"), our system for the automatic processing of mathematical discourse. The main task is to develop a proof interpretation system that translates textbook proofs into a formal representation, which can then be verified for correctness by a formal proof checker. We claim that one can view the results of such translations as specifications of their semantics, that is, as their canonical logical forms. Vip's design extends and integrates state-of-the-art technologies from discourse representation theory and proof planning. For the representation of isolated proof sentences, we use an extension of DRT (cf. ch. 5). We discuss a variety of λ-DRT representations for constant, variable, function, and predicate symbols. Furthermore, we propose a number of DRT construction rules that aim at covering typical sentence-level constructions. For the representation of multi-sentence mathematical discourse, we introduce proof representation structures (PRSs) as the central data structure (cf. ch. 6).
We describe a PRS as consisting of a numbered list of proof lines that introduce either a term ("discourse referent") or a formula into the proof context. Proof lines that introduce a formula carry an annotation that marks their status (assumption or derived statement) as well as the inference rule that led to their introduction. Such annotations, together with the numbering scheme, mirror the "rhetorical" structure of the original input proof. Moreover, the numbering scheme defines an accessibility relation on terms and formulae, which can then serve as antecedents for anaphoric expressions that occur later in the input text. PRSs are a considerable extension of discourse representation structures, accommodating our need to represent highly structured mathematical text and to handle substructures as well as mathematical sentences as first-class citizens. Proof plan schemata, representing general argumentation patterns in mathematical reasoning, are represented as underspecified proof representation structures. Such reasoning patterns are then appropriately instantiated during the interpretation of the input proof.

For the construction of PRSs, we propose an algorithm that is powered by pragmatics (cf. ch. 7). The discourse update engine incorporates an underspecified semantic representation of a single proof sentence into the proof context by making use of a proof planner. The proof planner is informed by mathematical and meta-mathematical knowledge. We argue that proof plan schemata, capturing common patterns of mathematical reasoning, enable us to gain a high-level discourse understanding, allowing us to follow the proof author's argumentation line. A high-level discourse understanding is seen as a prerequisite for computing the parts "between the lines". As we demonstrate, much inference is required to compute this implicit information. Similar to the construction of DRSs, a PRS is constructed incrementally, by processing the input text sentence by sentence.
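The following sketch shows one possible way to render the PRS idea just described in code: a numbered list of proof lines, each introducing a term or a formula, with annotations for status and licensing rule, and with accessibility read off the numbering. All class and field names are invented for this illustration; they do not reproduce Vip's actual implementation, and the flat accessibility relation is a simplification.

```python
# Illustrative sketch of a proof representation structure (PRS) as described
# above: a numbered list of proof lines introducing terms or formulas, with
# annotations for their status and the inference rule behind them.  Names and
# structure are invented for this illustration; this is not Vip's actual code.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class ProofLine:
    number: int                   # position in the PRS; the numbering scheme
    content: str                  # a term ("discourse referent") or a formula
    kind: str                     # "term" or "formula"
    status: Optional[str] = None  # "assumption" or "derived" (formulas only)
    rule: Optional[str] = None    # inference rule that licensed this line

@dataclass
class PRS:
    lines: List[ProofLine] = field(default_factory=list)

    def add(self, content, kind, status=None, rule=None):
        self.lines.append(ProofLine(len(self.lines) + 1, content, kind, status, rule))

    def accessible(self, from_number):
        """Earlier lines can serve as antecedents for later anaphoric expressions."""
        return [line for line in self.lines if line.number < from_number]

# A toy fragment loosely modelled on the opening of the proof in Figure 1:
prs = PRS()
prs.add("n", "term")
prs.add("n > 1", "formula", status="assumption", rule="hypothesis")
prs.add("n is prime or n has a divisor m with 1 < m < n",
        "formula", status="derived", rule="case split")
print([line.content for line in prs.accessible(3)])   # ['n', 'n > 1']
```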

The first sentence s1 of a discourse is processed in an initial context c0, somehow provided to the system, resulting in a new and richer context c1. Every other sentence si of a discourse is processed in the context ci−1 that has been created by processing the earlier sentences of the discourse. The result of processing si is to enrich ci−1 with the semantic contribution of si, resulting in a new and richer context ci. The context can be enriched by adding new discourse referents, or new conditions on discourse referents, or both. This incremental process is performed by the discourse update engine. It includes computing the semantic parts that stem from extra-sentential linguistic phenomena, which were left underspecified by the sentence parser. Moreover, the discourse update engine treats each component of a sentence-level representation (each discourse referent as well as the "sum" of its discourse conditions) anaphorically, referring to its place in the proof representation (the partial PRS constructed so far). The update engine attempts to identify their respective antecedents by observing the PRS's accessibility relation. If this succeeds for each component of the sentence-level representation, then the interpretation of the sentence-level representation succeeds. Otherwise, the discourse update engine attempts to accommodate the proof context to allow establishing such links. Proof accommodation is informed by pragmatic information. The discourse update engine, with the help of a proof planner (which in turn has access to mathematical and meta-mathematical knowledge sources), identifies and instantiates underspecified semantic representations (uPRSs), which are then incorporated into the current proof context. With an accommodated proof context, the discourse update engine subsequently re-attempts an interpretation of the sentence-level representation. This interpretation and accommodation loop is followed until all anaphoric links between sentence-level components and the proof context are established, or, in the worst case, until a maximal threshold of proof accommodations is reached, in which case the interpretation of the sentence-level representation fails. In this case, a backtracking algorithm attempts to construct other interpretations for previous input sentences, thus generating alternative proof contexts for the interpretation of the current input sentence.

The development of a verification algorithm for textbook proofs is a challenging task for both automated reasoning (modelling mathematical reasoning) and computational linguistics (processing the expert language of mathematicians). We believe, however, that mathematical proofs constitute one of the best text genres for overcoming the major obstacle, namely to effectively support interpretation with the exploitation of extra-linguistic knowledge (e.g., domain knowledge, conventions in text composition, rhetorical structure). In this dissertation, we have taken a step toward the realisation of such a system.
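As a schematic recap of the sentence-by-sentence interpretation and accommodation procedure described above: the helper functions, the accommodation threshold, and the context interface in the following sketch are placeholders and do not correspond to Vip's actual modules.

```python
# Schematic rendering of the interpretation/accommodation loop described above.
# The callables (parse, resolve, accommodate) and the context's extended_with
# method are placeholders for real components (parser, proof planner, PRS
# operations); none of these names are taken from Vip.

MAX_ACCOMMODATIONS = 3  # illustrative threshold only

def interpret_sentence(sentence_rep, context, resolve, accommodate):
    """Anchor every component of a sentence-level representation in the proof
    context, accommodating the context (via the proof planner) when some
    component cannot be linked to an accessible antecedent."""
    links = resolve(sentence_rep, context)
    accommodations = 0
    while links is None and accommodations < MAX_ACCOMMODATIONS:
        context = accommodate(context)          # instantiate an underspecified PRS
        links = resolve(sentence_rep, context)
        accommodations += 1
    if links is None:
        return None                             # interpretation of this sentence fails
    return context.extended_with(sentence_rep, links)

def understand(sentences, initial_context, parse, resolve, accommodate):
    """Process a discourse sentence by sentence, enriching contexts c0, c1, ..."""
    context = initial_context
    for sentence in sentences:
        context = interpret_sentence(parse(sentence), context, resolve, accommodate)
        if context is None:
            # In the full system, backtracking would now try alternative
            # interpretations of earlier sentences.
            raise ValueError("no interpretation found for: " + sentence)
    return context
```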

Contents

1 Introduction  1
1.1 The Automated Reasoning View: Verifying Textbook Proofs  2
1.1.1 On the Notion of Formal Proof  2
1.1.2 Proofs in Mathematical Practice  4
1.2 The Linguistic View: Understanding Mathematical Discourse  8
1.2.1 The Gricean View  8
1.2.2 A Computational Linguist's View  10
1.2.3 The Semanticist's View  10
1.3 Research Program and Research Questions  10
1.3.1 Doing Formalised Mathematics with Computer Support  10
1.3.2 The Mathematician's Assistant  12
1.3.3 Research Questions  13
1.4 The Organisation of the Thesis  15
1.4.1 The System's Architecture  15
1.4.2 Thesis Overview  16
1.4.3 Three Notes  16

2 Review of Related Work  19
2.1 A Systems' Review  19
2.1.1 Abrahams' Proofchecker  19
2.1.2 Bobrow's Student  20
2.1.3 Mecho — A Program to Solve Mechanics Problems  20
2.1.4 Simon's Nthchecker  22
2.2 Automated Reasoning  25
2.2.1 Constructing and Checking Formal Proofs  25
2.2.2 Towards a Mathematical Vernacular  28
2.2.3 Capturing and Operationalising Mathematical Reasoning  29
2.2.4 Proof Representation  30
2.3 Understanding Multi-Sentence Discourse  32
2.3.1 The Scripts of Schank et al.  32
2.3.2 Hobbs et al.'s Interpretation as Abduction — Tacitus  33
2.3.3 Discourse Representation Theory (DRT)  35

3 A Mathematical Analysis of Proofs  37
3.1 Basic Proof Techniques  37
3.2 The Fundamental Theorem of Arithmetic (Existence)  41
3.2.1 Hardy and Wright's Existence Proof  41
3.2.2 LeVeque's Existence Proof  45
3.3 The Fundamental Theorem of Arithmetic (Uniqueness)  47
3.3.1 Using Ellipsis  48
3.3.2 Hardy and Wright's Uniqueness Proof  49
3.4 Hardy and Wright's Proof of Theorem 3  52
3.5 Conclusion  53

4 A Linguistic Analysis of Proofs  55
4.1 General Remarks on Mathematical Writing  55
4.2 Denoting in Mathematical Discourse  59
4.2.1 Constant Symbols and other Proper Names  60
4.2.2 Variables  61
4.2.3 Functions  66
4.2.4 Predicates  67
4.2.5 Definite Descriptions  68
4.3 Linguistic Phenomena  69
4.3.1 A Discourse Analysis Focusing on Terms and Formulae  69
4.3.2 A Systematic Account on Anaphoric Linkage and Elliptic Constructs  71
4.3.3 Propositional Discourse Entities  76
4.4 Connectives, Conditional and Pseudo-Conditional Statements  79
4.4.1 Conjunction, Disjunction, and Negation  79
4.4.2 Conditionals  81

5 Mathematical Discourse and DRT — The Parser Module  89
5.1 Discourse Representation Theory  89
5.1.1 Formal Definition of DRSs  90
5.1.2 The Construction of DRSs from the Syntax Tree  91
5.1.3 The Construction of DRSs for Multi-Sentence Discourse  93
5.2 Semantic Construction for Terms and Formulae  94
5.2.1 Constants  94
5.2.2 Variables  97
5.2.3 Functions  104
5.2.4 Complex Terms  106
5.2.5 Type, Quantification and Scope of Discourse Entities  107
5.3 A DRT Treatment of Selected Linguistic Phenomena  110
5.3.1 Relational Nouns and Adjectives  110
5.3.2 Light Verbs (have Phrases)  112
5.3.3 Complex Referring Expressions  113
5.4 Connectives, Conditional and Pseudo-Conditional Statements  115
5.4.1 Conjunction, Disjunction and Negation  115
5.4.2 Conditionals  116

6 Proof Representation Structures  125
6.1 Motivation  125
6.1.1 Abstract Discourse Referents  125
6.1.2 Representing Discourse Structure  127
6.2 Proof Representation Structures  129
6.3 Proof Plan Schemata — Underspecified PRSs  133
6.3.1 Formal Description of uPRS  134
6.3.2 A Few Example uPRSs  134
6.3.3 Vip's Library of Proof Methods  135
6.4 Theory Representation Structures  136

7 Construction of Proof Representation Structures  139
7.1 The Discourse Update Algorithm  139
7.1.1 Descending a Complex DRS  140
7.1.2 Integrating a DRS into the Proof Context  142
7.1.3 Accommodating the Current Proof Context — The Proof Planner  143
7.2 Two Construction Examples  145
7.2.1 First Example  145
7.2.2 Second Example  153
7.3 Proof Plan Refinement  161

8 Conclusion and Future Work  163
8.1 Recapitulation  163
8.2 Significance  164
8.2.1 The Automated Reasoning Perspective  164
8.2.2 The Computational Linguistics Perspective  165
8.3 System Limitations and Future System Extensions  166
8.3.1 Current System Status  166
8.3.2 Extensions to Vip  166
8.4 Future Work  168
8.4.1 Investigating the Use of Rhetorical Relations  168
8.4.2 On Proof Representation  169
8.4.3 A Shallow Parsing Approach  169
8.4.4 A Tutoring System for Mathematical Proofs  170

Bibliography  172
Chapter 1

Introduction

This doctoral dissertation is concerned with answering two questions: (i) can we build a program that understands discourse in a particular text genre, namely, mathematical discourse; and (ii) can we build a program that mechanically verifies the correctness of informal mathematical arguments. We hypothesise that these two questions must have the same answer because understanding mathematical discourse and automatically checking the correctness of what is said in a mathematical discourse are closely related tasks. Beyond this, we will argue that the semantics of an informal mathematical argument is its corresponding formal proof, and that discourse understanding in this text genre is defined by generating a formal proof from the informal one. Fig. 1.1 depicts a proof of the mathematical truth that every positive integer greater than 1 can be represented as a product of one or more primes. The proof consists of only a few lines, and little mathematical knowledge is necessary to follow the proof author's argumentation. It is this kind of short proof, taken from textbooks on elementary number theory, that we would like to check automatically.

Figure 1.1: The First Proof in Hardy & Wright’s Elementary Number Theory [68]
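As a purely illustrative aside, not taken from the thesis or from the figure above, the statement being proved can also be rendered computationally: the following sketch decomposes an integer greater than 1 by repeatedly splitting off its least divisor greater than 1, which is necessarily prime, since any proper factor of that divisor would be a smaller divisor of the original number.

```python
# Illustrative aside (not from the thesis): the statement of Figure 1.1 in
# executable form.  An integer n > 1 is decomposed by repeatedly splitting off
# its least divisor greater than 1, which is always prime.

def least_divisor(n):
    """Smallest d > 1 dividing n; prime by minimality."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime

def prime_factors(n):
    assert n > 1
    factors = []
    while n > 1:
        p = least_divisor(n)
        factors.append(p)
        n //= p
    return factors

# Every positive integer greater than 1 is a product of one or more primes:
for n in range(2, 13):
    print(n, "=", " * ".join(str(p) for p in prime_factors(n)))
```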

The nature of this problem forces us to conduct our investigation using terminology and techniques from two AI disciplines: natural language processing (NLP) and automated reasoning (AR). In this thesis, we endeavour to demonstrate that NLP techniques are useful for AR, and vice versa. Furthermore, we will demonstrate how extensions of existing techniques can be combined in a novel and promising way.


In this introductory chapter, we will discuss the thesis's topic from both the automated reasoning and natural language processing perspectives. Next, we formulate the underlying research questions of this thesis. We close with an architectural overview of an implemented system for checking textbook proofs and give an outline of the remainder of this thesis.

1.1 The Automated Reasoning View: Verifying Textbook Proofs

The notion of proof, albeit central to mathematics, has no standard, commonly agreed upon definition among mathematicians. Consequently, in mathematical practice, the standard for checking proofs remains equally vague. That is, it is not clear what proof verification consists of. In this section, we will compare mathematical proofs, that is, proofs as communicated among mathematicians in mathematical practice, with formal proofs. We start with the well-defined notion of formal proof.

1.1.1 On the Notion of Formal Proof

There is a notion of proof that is rarely used in mathematical practice: formal proof. Fig. 1.2 depicts a number of definitions capturing this notion (taken from Mendelson's contemporary introduction to mathematical logic [115]).

A formal theory L is defined when the following conditions are satisfied:

1. A countable set of symbols is given as the symbols of L. A finite sequence of symbols of L is called an expression of L.

2. There is a subset of the set of expressions of L called the set of well-formed formulas (wfs) of L. There is usually an effective procedure to determine whether a given expression is a wf.

3. There is a set of wfs called the set of axioms of L. Most often, one can effectively decide whether a given wf is an axiom; in such a case, L is called an axiomatic theory.

4. There is a finite set R1, ..., Rn of relations among wfs, called rules of inference. For each Ri, there is a unique positive integer j such that, for every set of j wfs and each wf B, one can effectively decide whether the given j wfs are in the relation Ri to B, and, if so, B is said to follow from or to be a direct consequence of the given wfs by virtue of Ri.

A proof in L is a sequence B1, ..., Bk of wfs such that, for each i, either Bi is an axiom of L or Bi is a direct consequence of some of the preceding wfs in the sequence by virtue of one of the rules of inference of L.

A theorem of L is a wf B of L such that B is the last wf of some proof in L. Such a proof is called a proof of B in L.

A wf A is said to be a consequence in L of a set Γ of wfs if and only if there is a sequence B1, ..., Bk of wfs such that A is Bk and, for each i, either Bi is an axiom or Bi is in Γ, or Bi is a direct consequence by some rule of inference of some of the preceding wfs in the sequence. Such a sequence is called a proof (or deduction) of A from Γ. The members of Γ are called the hypotheses or premises of the proof.

Figure 1.2: The Notion of Formal Proof (taken from Mendelson [115])

This notion of proof, so familiar nowadays to logicians, evolved over the last 2000 years of Western philosophy, from Aristotle's (384–322 B.C.) Organon to Frege's (1848–1925) Begriffsschrift. Different aspects of formal proof come from different times, with increasing degrees of conciseness. In this thesis we do not attempt to give a detailed account of the history of formal proof (the interested reader is referred to the excellent book of Bocheński [17]). We restrict our discussion to its basic components for its use in mathematics. Formal proof in mathematics is characterised by:


• the application of the axiomatic method: for any mathematical discipline (geometry, number theory, topology, etc.), its foundation is established by defining its first principles, or axioms. New statements are then only derived by using these principles and other derived statements.

• the hypothesis that mathematical reasoning can be captured by rules of inference. These rules govern the combination of axioms and already derived statements to yield new derived statements.

• the use of a well-defined formal (artificial) language that allows the expression of all possible statements in a given mathematical theory.

• the idea that one can schematically operate with linguistic expressions on purely syntactic grounds, using their form but ignoring their semantic content. This implies the idea that, given a calculus, that is, a set of axioms and inference rules, other true statements can be mechanically derived by iteratively applying the inference rules to true statements.

The axiomatic method goes back to Aristotle, who also formulated numerous inference rules, namely the syllogisms. The Elements of Euclid still serve as a role model for applying the axiomatic method [47]. Each statement in a Euclidean proof is either a definition, an axiom, a previously proved theorem, or a consequence that follows from one or more preceding statements. However, Euclid neither defines the inference rules used, nor explicitly states them, say at the end of each statement. Furthermore, Euclid neither defines nor uses a formal language to present proofs. In addition, some reasoning is diagrammatical, that is, embedded in diagrams.

In contrast, Leibniz (1646–1716) proposed ideas to develop an artificial language, a characteristica universalis, and a calculus for logical reasoning, a calculus ratiocinator. According to Leibniz,2

"[L'épreuve] ne se fait que sur le papier, et par conséquent sur les caractères qui représentent la chose, et non pas sur la chose même. Cette considération est fondamentale en cette matière ..."

In line with Leibniz, Frege makes a strong argument against the use of natural languages in mathematical proof. In [53], he writes:

"Das Bedürfnis nach einer Begriffsschrift machte sich bei mir fühlbar, als ich nach den unbeweisbaren Grundsätzen oder Axiomen fragte, auf denen die ganze Mathematik beruht. Erst nach Beantwortung dieser Frage kann man mit Erfolg den Erkenntnisquellen nachzuspüren hoffen, aus denen diese Wissenschaft schöpft. Wenn diese letzte Frage nun auch mehr der Philosophie angehört, so muß man jene doch als mathematische anerkennen. Die Frage ist schon alt; denn schon Euklid scheint sie sich gestellt zu haben. Wenn sie trotzdem noch nicht genügend beantwortet worden ist, so ist der Grund in der logischen Unvollkommenheit unserer Sprachen zu sehen. Will man erproben, ob ein Verzeichnis von Axiomen vollständig sei, so muß man versuchen, aus ihnen alle Beweise des Zweiges der Wissenschaft zu führen, um den es sich handelt. Und hierbei muß man genau darauf achten, die Schlüsse nur nach rein logischen Gesetzen zu ziehen; denn sonst würde sich unmerklich etwas einmischen, was als Axiom hätte aufgestellt werden müssen.

Der Grund, weshalb die Wortsprachen zu diesem Zwecke wenig geeignet sind, liegt nicht nur in der vorkommenden Vieldeutigkeit der Ausdrücke, sondern vor allem in dem Mangel fester Formen für das Schließen. Wörter wie ,also', ,folglich', ,weil' deuten zwar darauf hin, daß geschlossen wird, und können ohne Sprachfehler auch gebraucht werden, wo gar kein logisch gerechtfertigter Schluß vorliegt. Bei einer Untersuchung, welche ich hier im Auge habe, kommt es aber nicht darauf an, daß man sich von der Wahrheit des Schlußsatzes überzeuge, womit man sich in der Mathematik meistens begnügt; sondern man muß sich auch zum Bewußtsein bringen, wodurch diese Überzeugung gerechtfertigt ist, auf welchen Urgesetzen sie beruht. Dazu sind feste Geleise erforderlich, in denen sich das Schließen bewegen muß, und solche sind in den Wortsprachen nicht ausgebildet."

2 Cited from Couturat [34, p. 155].


Research in modern logic has established that all of mathematics is reducible to axiomatic set theory and that, in principle, mathematical proofs can be reproduced in this system completely formally in the sense of mechanical verifiability. Moreover, we can search for the proof of a mathematical theorem in a mechanical way. Unfortunately, the belief that every question that is expressible in a given formal system is also provable in this system has been disproven by Gödel in his famous article "Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme" [62]. In Frege's Begriffsschrift [6] and his Grundlagen der Arithmetik [54] as well as in Whitehead & Russell's Principia Mathematica [173], we can find all four aspects of formal proof. Committed to the notion of formal rigour, and with enormous effort, those works managed to lay the groundwork for formalising interesting mathematical theories. In contrast to the much more informal reasoning in Euclid's Elements, however, no such mathematically interesting theorems can be found in those formal accounts. Moreover, the overwhelming majority of mathematicians do not perform such rigorous proofs, neither in high-quality journal papers nor in introductory textbooks. Given the benefits of formal rigour, the obvious question is, why, for truth's sake, have subsequent generations of mathematicians not adopted this precise method?
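To make the mechanical flavour of this notion concrete, the following sketch, which is illustrative only and not part of the thesis or of the Vip system described later, checks a proof in the sense of Fig. 1.2: a sequence of formulas, each of which must be an axiom, a hypothesis, or a direct consequence of earlier lines by a rule of inference. Formulas are nested tuples and modus ponens is the only inference rule, purely to keep the example small.

```python
# Illustrative sketch, not part of the thesis: checking a formal proof in the
# sense of Fig. 1.2.  A proof is a sequence of lines, and every line must be an
# axiom, a hypothesis, or a direct consequence of earlier lines by a rule of
# inference.  Formulas are nested tuples, e.g. ("->", "p", "q") for "p implies
# q", and modus ponens is the only rule, to keep the example small.

def modus_ponens(a, implication):
    """From A and (A -> B), conclude B; return None if the rule does not apply."""
    if (isinstance(implication, tuple) and len(implication) == 3
            and implication[0] == "->" and implication[1] == a):
        return implication[2]
    return None

def check_proof(lines, axioms, hypotheses):
    """lines: list of (formula, justification); a justification is 'axiom',
    'hypothesis', or ('MP', i, j) citing two earlier 0-based line numbers."""
    derived = []
    for k, (formula, why) in enumerate(lines):
        if why == "axiom":
            ok = formula in axioms
        elif why == "hypothesis":
            ok = formula in hypotheses
        elif isinstance(why, tuple) and why and why[0] == "MP":
            i, j = why[1], why[2]
            ok = i < k and j < k and modus_ponens(derived[i], derived[j]) == formula
        else:
            ok = False
        if not ok:
            return False, "line %d is not justified" % k
        derived.append(formula)
    return True, "every line is justified"

# From hypotheses p and p -> q, the proof p, p -> q, q checks mechanically:
proof = [("p", "hypothesis"),
         (("->", "p", "q"), "hypothesis"),
         ("q", ("MP", 0, 1))]
print(check_proof(proof, axioms=set(), hypotheses={"p", ("->", "p", "q")}))
```

Real proof checkers differ in their formula languages and calculi, but they share this line-by-line, purely syntactic style of verification, which the next section contrasts with proofs in mathematical practice.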

1.1.2 Proofs in Mathematical Practice

To answer the aforementioned central question we need to investigate the role of proof in modern mathematical practice. For instance, the mathematician Wilder believes that theorems come from intuitive insight, and that the role of proof is to test these intuitions [174]: "[Now], granted that the mathematical theorem comes from the intuition, what is the role of proof? It seems to me to be only a testing process that we apply to these suggestions of our intuition." Mathematical proofs, that is, proofs in mathematical practice, therefore seem to serve at least two purposes: establishing truths and testing intuitions. This motivates the following equation:3 Proof = Guarantee + Explanation.

3 The equation has been contributed by John Alan Robinson; for a recent publication see [134].

One purpose of proof is to provide evidence that supports the truth of a statement. In mathematics, a correct proof does not just provide evidence but, more strongly, guarantees or establishes the truth of a mathematical statement. Although mathematical proofs are different from formal proofs, they do share many characteristics. They both proceed stepwise, with each argument built on others that were either postulated to be true (axioms), assumed to be true (hypotheses), or previously proven to be true (theorems). In principle, to establish truth, any form of proof should make explicit the connections between axioms, theorems, hypotheses, intermediate conclusions, and reasoning rules.

In mathematical practice, however, restricting the role of proof to guarantee is not sufficient. It would not explain why mathematicians search for new proofs of already established mathematical truths. They appreciate the discovery of new proofs, especially if they consider them shorter or more elegant, or if they offer a new angle of explanation for already established truths. One important role of proof, therefore, is to provide an explanation for why a mathematical statement is true. This is manifested in one key difference between formal proofs and mathematical proofs: the latter contain explanatory remarks. A mathematical argument can be complex, and a proof author, to facilitate proof understanding, may first give a proof outline which describes the overall structure or organisation of the argument. Also, the proof author may explicitly identify critical or interesting arguments. Since proofs often reveal deep mathematical truths, or combine them in a new way, or contain a novel combination of thought, techniques etc., the proof author might put an explicit emphasis on this. Also, the proof author might anticipate that the proof reader, to understand a proof, might only look for such critical arguments or novelties. Those proof annotations are not part of the guarantee component of proof but are designed to facilitate the reader's understanding of the proof. Formal proofs are characterised by a complete absence of such information.

While the guarantee component suggests that there might exist an absolute and objective standard of proof, the explanation component suggests that proof is subjective. As the mathematician and Fields Medal recipient Thurston points out, explaining the truth of some theorem depends to a large extent on the audience to


which the explanation is directed. In his interesting account on proof and progress in mathematics, he writes [153]:

"Within a subfield, people develop a body of common knowledge and known techniques. By informal contact, people learn to understand and copy each other's way of thinking, so that ideas can be explained clearly and easily. Mathematical knowledge can be transmitted amazingly fast within a subfield. When a significant theorem is proved, it often (but not always) happens that the solution can be communicated in a matter of minutes from one person to another within a subfield. The same proof would be communicated and generally understood in an hour talk to members of the subfield. It would be subject of a 15- or 20-page paper, which could be read and understood in a few hours or perhaps days by members of the subfield."

The form of a mathematical proof does not only depend on the way it is communicated (orally, in writing, informally, formally, symbolically, diagrammatically, etc.) and on the audience the explanation is directed to. In a pessimistic statement, Wilder suggests other factors that influence the notion of proof [174]:

"Obviously we don't possess, and probably will never possess, any standard of proof that is independent of time, the thing to be proved, or the person or school of thought using it."

Apparently, Wilder does not see formal proof as a possible standard of proof; he concludes:

"[And] under these conditions, the sensible thing to do seems to be to admit that there is no such thing, generally, as absolute truth in mathematics whatever the public may think."

Without any standard of proof and with the appeal to intuition, how are mathematical proofs checked for their correctness?

Proof Verification. Verifying a proof can be an intellectually complex and time-consuming activity. A proof, intuitively clear to the author, may puzzle a potentially large number of proof readers. Many readers, highly regarded experts in the field or anonymous students, will struggle for hours or even days to untangle the details of a mathematical argument they consider complex or perhaps unintuitive. As Davis claims, few mathematicians would volunteer to check a fifty-page proof [37]. Also, often only a small number of mathematicians of a given sub-discipline have the necessary qualification to check a proof in their domain completely for correctness. In addition, the verification process itself is error-prone. It is therefore inevitable that many published proofs are incomplete, or error-ridden, or both.

In 1976, Ulam estimated the number of published theorems and their proofs at 200,000 [157]. Only a small part of these theorems gets recognised by the mathematical community and accepted as true statements [67]. This acceptance, however, is to a large extent based on criteria other than formal correctness. As Hanna points out, "[the] acceptance of a theorem by practising mathematicians is a social process which is more a function of understanding and significance than of rigorous proof"4 [67]. Hanna gives some standards used by mathematicians for accepting a proof [67, p. 70]:

"1. They understand the theorem, the concepts embodied in it, its logical antecedents, and its implications. There is nothing to suggest it is not true;

2. The theorem is significant enough to have implications in one or more significant [sic.] branches of mathematics (and is thus important and useful enough to warrant detailed study and analysis);

3. The theorem is consistent with the body of accepted mathematical results;

4. The author has an unimpeachable reputation as an expert in the subject matter of the theorem.

5. There is a convincing mathematical argument for it (rigorous or otherwise), of a type they have encountered before.

If there is a rank order of criteria for admissibility, then these five criteria all rank higher than rigorous proof."

4 For Hanna, rigorous proof is formal proof.

Hanna argues further that the "mathematician is much more interested in the message embodied in the proof than its formal codification and syntax. The mechanics of proof are seen as a necessary but ultimately less significant aspect of mathematics. Certainly being able to follow the steps of a proof is not the same as understanding it". In the words of Bourbaki:5

"Tout mathématicien sait d'ailleurs qu'une démonstration n'est pas véritablement comprise tant qu'on s'est borné à vérifier pas à pas la correction des déductions qui y figurent, sans essayer de concevoir clairement les idées qui ont conduit à bâtir cette chaîne de déductions de préférence à toute autre."

Goodman follows the same line of reasoning [63]:

"Actually, when we evaluate a mathematical argument, we do not check to see whether it accords with some set of rules taken, let us say, from a logic text. Rather, we try to determine whether the argument works — that is, whether it convinces us, and ought to convince us, of the truth of its conclusions."

And Thurston points out that "people are usually not good in checking the formal correctness of proofs, but they are quite good at detecting potential weaknesses or flaws in proofs" [153]. He elaborates:

"Mathematicians can and do fill in gaps, correct errors, and supply more detail and more careful scholarship when they are called on or motivated to do so. Our system is quite good at producing reliable theorems that can be solidly backed up. It's just that the reliability does not primarily come from mathematicians formally checking arguments; it comes from mathematicians thinking carefully and critically about mathematical ideas."

In a collection of papers edited by Detlefsen [43], Tragesser discusses the following three aspects of mathematical proofs [154, pp. 162–198]:

"1 that formal-logical structure is not essential to mathematical proof and, at times, can even serve to conceal the important role that understanding plays in it,

2 that a well-described possible proof can have the same epistemic benefits as an actual proof [...] and

3 that a proof or chain of proofs cannot leave anything unproved, contrary to the common idea that proofs must begin with assumptions that are not themselves proven.

These facts point to difficulties in regarding proofs either as themselves being formal-logical derivations or as being satisfactorily represented by them. The formal-logical model idealises away aspects of proof that are vital to mathematical thought, particularly obscuring the complex role that understanding plays in it."

Tragesser illustrates those arguments by pointing at Hardy and Wright's Elementary Number Theory,6 which, he argues, "makes no mention of axioms, of unproven assumptions, and yet one does not have the sense that the proofs are somehow incomplete by virtue of unproven assumptions not having been set out as such. It is understanding that stands in for unproven assumptions."

5 French: Bourbaki, N., L'architecture des mathématiques, In F. Le Lionnais (ed.), Les grands courants de la pensée mathématique, Cahiers du Sud, 1948; English: Bourbaki, N., The architecture of mathematics, In F. Le Lionnais (ed.), Great currents of mathematical thought (Vol. 1), New York: Dover, 1971. "A proof is not really 'understood' as long as he (the mathematician) has only verified the correctness of the deductions involved step by step, without trying to understand clearly the ideas which led to the construction of this chain of deductions in preference to all others."

6 This is a highly praised and widely used standard textbook on elementary number theory [68]. Most of the proofs and text fragments that we analyse in this doctoral thesis were taken from this textbook.


Furthermore, Tragesser points out that "reasoning, in arithmetic, centering on assertions 'take the least number such that ...' is justified on the basis of the understanding of the system of numerals; the very fact that the logical analysis of such assertions leads to second-order assertions shows that the logical account of the least number principle does not give an adequate representation of our understanding, our understanding on which arithmetic, number-theoretic, reasoning is based, and so must be understood as based on our intuitive understanding of our numerals!".

In conclusion, mathematicians are quite happy with their established but vague notion of proof and with the way proofs are communicated and checked.7 We summarise with a brief comparison of formal and informal proof.

(1) In a formal proof, the theory in which the proof is carried out is explicitly stated. The theory, and nothing else, defines the context. In an informal proof, there is no full and explicit theory that one can refer to. The context has to be constructed or completed by the proof reader.

(2) In a formal proof, for each proof step, it is explicitly given which conclusion is derivable from which set of premises and by which inference rule. In an informal proof, many proof steps are omitted or incomplete. In particular, informal proof is characterised by an almost complete lack of references to the inference rules used.

(3) A formal proof is written in a formal language, which can be easily described in less than one page. An informal proof is written in an informal language, say English. Despite its formulaic nature, this language has so far resisted a formal description.

(4) Finding informal or formal proofs is not trivial and is, therefore, hard to mechanise.

(5) In formal proofs there is no ambiguity. In informal proofs there should be no ambiguity, but there often is.

(6) In a formal proof only form matters. In an informal proof meaning matters more.

(7) Formal proofs are structured (e.g., resolution graphs, natural deduction trees, semantic tableaux). Informal proofs are also structured, but at a different level of abstraction. Moreover, informal proofs are characterised by the use of proof outlines and other explanatory remarks that help identify the structure of the argument.

(8) Machines are good at verifying formal proofs, but currently cannot check informal proofs. Humans are good at verifying informal proofs, but bad at verifying formal proofs.

Finally, we conclude this section with a citation from Thurston, a typical representative of the mathematics community [153]:

"However, we should recognize that the humanly understandable and humanly checkable proofs that we actually do are what is most important to us, and that they are quite different from formal proofs. For the present, formal proofs are out of reach and mostly irrelevant: we have good human processes for checking mathematical validity."

This quotation nicely captures the mathematicians' interpretation of Robinson's equation Proof = Guarantee + Explanation. Given the absence of better proof standards and tools, mathematical proofs, as long as they are both humanly understandable and checkable, serve sufficiently well.

7 An interesting and controversial discussion on mathematical techniques can be found in: BULLETIN (New Series), American Mathematical Society, Volume 30, Number 2, April 1994, cf. [153] and [88].

1.2 The Linguistic View: Understanding Mathematical Discourse

Processing mathematical proofs is a discourse understanding task, and from this linguistic perspective, an interesting and complex research topic in its own right. It may well be the case that giving one informal proof to N proof readers results in N or more different interpretations of it. Some readers might have problems in understanding a mathematical discourse, because (i) the presented argument is not as informative as it should be (the argument contains a gap that the reader cannot fill in); (ii) the presented argument does not contain relevant information (the argument omits material that the author should have provided); (iii) the presented argument contains mathematically and logically incorrect conclusions; (iv) the reader is not familiar with the notations and concepts being used in the argument; or (v) the reader is not familiar with the proof methods being employed or fails to recognise them. In these cases, readers find it difficult to keep track of the proof obligations, to identify the critical parts of the argument, or to reconstruct details that were not given explicitly. Readers might therefore feel that the argumentation is incomplete, or even poorly structured.

Taking these complexities as givens, restricting our focus to mathematical discourse enables us to study the subject matter of discourse understanding in its purest form: the domain of mathematical discourse is rich and well-defined, the use of language precise and concise, the discourse highly structured, the set of discourse relations small and well-defined, and the reasoning widely studied and understood. Beyond that, I claim that if we fail to construct an understander for mathematical discourse, then we will also fail to write one for other (non-trivial) domains.

1.2.1 The Gricean View

In any well-written mathematical discourse, at any point within the argument, a certain amount of mathematical and meta-mathematical knowledge is presupposed and implied. The proof author, for ease of argument, takes much for granted. The proof reader is not only expected to verify what is asserted explicitly (and such verification implies some higher degree of understanding); in order to do so, he must also identify the presupposed and implied content. The communication of mathematical insight can be highly effective if author and reader share a common understanding of the domain, perhaps by working within the same subfield. Proofs that occur in elementary textbooks, on the other hand, have to appeal to a wide audience of potential readers, most of whom are not experts in the subject being introduced. Here, the common ground between author and reader will be much smaller. Discourse interpretation is facilitated if discourse participants are committed to the Gricean principles [64, 38] (vide Fig. 1.3). But the question is, how do the Gricean principles apply to mathematical discourse?

Quality: Of course, a defining cornerstone of mathematics is to respect the Gricean maxim of quality. Mathematicians do not assert the truth of some statement if they lack adequate evidence for it. In striking contrast to non-scientific domains of discourse, the truth of each assertion must be either established by proof, postulated by declaring it as an axiom, or conditionally assumed for the sake of argument. A proof is established by presenting a highly ordered and structured argument. Moreover, conventional argumentation methods, as they are realized in discourse, codify and even impose such structure.

Manner: To comply with being perspicuous, mathematicians have defined and continue to define their own special purpose language. Over the centuries special syntactic constructions and notations have evolved that facilitate concise and precise expression of mathematical statements. For example, each new concept or notation should be explicitly defined before its first use, respecting certain standards for how definitions are written. Similarly, any departure from existing definitions and notations should be made explicit. The resulting expert language for writing mathematical proofs is one of the best defined languages within the scientific disciplines. In mathematical discourse, due to its highly formulaic nature, obscurity and ambiguity of expression can easily be avoided.


The art of writing good mathematical texts focuses on clearness and conciseness and not on an embellished style of expression; frequently symbols and formulae with well-defined semantics are employed in many places to shorten the text. Another aspect of the maxim of manner is the use of conventionalised methods of argumentation in mathematical discourse, which relates to the Gricean imperative to “be orderly!”. To present a proof of a statement of the form A ∧ B, usually but not necessarily, one first presents a proof for A, and then presents a proof for B. Furthermore, to present a proof by induction, the base cases typically precede the presentation of the step case, which is usually the more interesting and complex argument.

Quantity and Relation: While the Gricean categories quality and manner seem to be easily definable for mathematical discourse, the maxims of quantity (“be informative!”) and relation (“be relevant!”) are not. For mathematical discourse, one might be inclined to distinguish relevancy and informativeness in the following way. Being relevant means: use only axioms and theorems and make only assumptions that are needed for the argument; being informative means: tell the proof reader only about the “interesting” axioms, theorems and assumptions used in the argument. Relevancy can be computed in an objective manner (any axiom, theorem or assumption used to logically derive the truth of some statement is relevant), whereas informativeness depends on the proof reader and must be computed with respect to his knowledge and interests. Clearly, to respect all of the Gricean principles at once necessitates a search for a delicate balance between principles that are often in opposition with one another. For example, how to be informative without being irrelevant? The proof author thus faces the difficult task of balancing the perspicuity of the argument (make it clearly or easily understood, evident, transparent) with its lengthiness. Providing copious detail can easily cause tedious and tiresome reading, resulting in the “do not see the forest because of all these trees” phenomenon. Obviously, mathematical discourse must not violate the maxim of quantity. A well presented mathematical discourse permits the reader to study the argument at various levels of detail.8

• The co-operative principle. Make your contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged.
• The maxim of quality. Try to make your contribution one that is true, specifically:
  – do not say what you believe to be false
  – do not say that for which you lack adequate evidence
• The maxim of quantity.
  – Make your contribution as informative as is required for the current purposes of the exchange.
  – Do not make your contribution more informative than is required.
• The maxim of relevance. Make your contribution relevant.
• The maxim of manner. Be perspicuous, and specifically,
  – avoid obscurity
  – avoid ambiguity
  – be brief
  – be orderly

Figure 1.3: The Gricean Maxims

8 Vide our brief discussion of Lamport-style proof presentation in ch. 2.


1.2.2 A Computational Linguist's View

Building a text understander for mathematical discourse is more feasible than building one for other less scientific domains. Beyond that, we argue that mathematical discourse provides an ideal testbed for understanding discourse, in general. The reasons are four-fold and complement those given in the previous subsection. First, the universe of discourse is mathematics, a science more formally complete and precise than others. Therefore, building an ontology for a mathematical domain, say elementary number theory, is much easier than for a more natural domain of discourse, say, for BBC reports on political conflicts. Moreover, formalisations of elementary number theory are readily available and do not need to be constructed from scratch. Note also that the chosen domain can be arbitrarily sized to accommodate current needs and state-of-the-art technology. Second, textbook proofs are a highly structured form of discourse. Mathematicians agree on a variety of different methods for how to prove theorems. Often, the form and content of a theorem determines applicable proof methods. From a linguistic point of view, these proof methods define discourse plans which serve as good predictors of what to expect next, impose discourse structure, and therefore, to a large extent, facilitate the task of discourse understanding. Third, the number of discourse relations in mathematical discourse is small and well-defined. There is an abundance of discourse markers which helps to identify the relations that hold between parts of a proof. Fourth, the understanding of any multi-sentence text involves, to a considerable degree, inference. This is particularly true for mathematical discourse. To be able to follow the proof author’s line of reasoning, it is necessary to model mathematical reasoning itself — a task, however, which we consider much easier than modelling common sense reasoning of everyday discourses. Moreover, mathematical reasoning has been studied for centuries and is available in a formalised and operational form, ready for deployment.

1.2.3 The Semanticist's View

Unlike that of other discourse genres, the semantics of mathematical discourse can be given a clear-cut and precise definition (neglecting the many informative and structural aspects of the informal proof, as discussed above). Understanding mathematical discourse means being able to extract and determine its asserted, presupposed and implied content. A mathematical discourse understanding system must therefore be able to translate an informal mathematical discourse into a formal mathematical discourse, the latter representing the meaning of the former, that is: The semantics of an informal mathematical discourse is its corresponding formal mathematical proof. We can therefore view the translation from informal to formal proof as the construction of semantics. This construction consists of two parts: first, translating the informal mathematical language to a formal mathematical language; and second, translating the informal mathematical reasoning into formal mathematical reasoning. The informal proof, or the translation process, can be ambiguous, so that an informal proof may have more than one meaning, that is, more than one corresponding formal proof.

1.3 Research Program and Research Questions

This section is divided into three parts. First, we give a literature review on ideas that aim to construct a machine that supports mathematicians in the generation of mathematical arguments that adhere to formal rigour. In the second part, we use these ideas to specify such a mathematician’s assistant. Third, we list research questions that must be answered to successfully pursue this endeavour.

1.3.1 Doing Formalised Mathematics with Computer Support

Given the discussion of § 1.1, it is hard to imagine how to effectively support mathematicians in their tasks. It seems that the bottom line task is to bridge the gap between mathematical proof and formal proof. AUTOMATH, suitably named, was the first major project with an aim to build computer-supported tools for mathematicians [121]. It was founded and subsequently led by a distinguished mathematician, namely


Nicolaas G. de Bruijn, who envisioned an intelligent artificial agent for formalisation and reasoning support [40]: “For a mathematician, the best defined part of artificial intelligence is theorem proving, where we present a proposition to a computer and hope it will be able to prove it. Automatic theorem proving has done remarkably well in some special cases, but it is hard to believe that it will readily become a powerful tool in general. But what does seem to be realizable and profitable as well as interesting is a kind of interplay between a mathematician and an artificially intelligent agent, where the human being presents the main outline of a proof and the machine provides all details. On top of that, the computer can keep a complete record of what has been achieved exactly. Perfect justification systems are available for this kind of mathematical book-keeping. I use the term ‘justification’ rather than ‘verification’, since verification suggests verifying in detail, or a single proof. Justification means much more than that: it refers to processing entire areas, like complete theories and the interplay between various theories. Without artificial intelligence, justification is known to be possible but definitely time consuming. The mathematician who has to present a mathematical text to a machine has to produce a complete description of everything that he normally skips on the basis of experience. His experience usually guarantees that he is able to provide even the most minute details if he is requested to. That is an unpleasant job, but there is reasonable hope that in justification systems artificial intelligence will become able to learn to take care of most of what the mathematician likes to take for granted. A further desideratum is to enable the artificial intelligent agent to guess the meaning of not entirely complete statements, and to ask the human for confirmation of the guess.”

In this citation, de Bruijn envisions “justification systems” that support the construction of large theories, reasoning support that allows mathematicians to keep their ways of mathematical thinking, and language support that enables them to express mathematics in a natural way.

In a similar vein, in several accounts [166, 167, 168, 169], Wang finds it “appealing to think of an interaction between man and machine, so that computers may become research assistants. The division of labour between human and machine is as follows: the mathematician provides the mathematical proof (the one that could equally be communicated succinctly between experts) chunkwise to the machine; the machine then tries to find long formal proofs for these relatively more simple results.” To ensure effective man-machine communication and collaboration, Wang elaborates, “it seems that human interventions would be able to improve more substantially the end results if we move from Herbrand proofs to programs with more varied data and strategies”. Building such interfaces requires, according to Wang, a “more reflective examination of the data, viz. the existing mathematical proofs and methods of proof. It is true that what is natural for man need not be natural or convenient for machine. Hence, it will not be fruitful to attempt to imitate man slavishly. Nevertheless, the existing body of mathematics contains a great wealth of material and constitutes the major source of our understanding of mathematical reasoning. The reasonable course would be to distill from this great reservoir whatever is mechanizable”.
In [153], Thurston acknowledges ongoing work towards a mathematician’s assistant, and the benefits of pursuing such work: “There are people working hard on the project of actually formalizing parts of mathematics by computer, with actual formally correct formal deductions. I think this is a very big but very worthwhile project, and I am confident that we will learn a lot from it. The process will help simplify and clarify mathematics. In not too many years, I expect that we will have interactive computer programs that can help people compile significant chunks of formally complete and correct mathematics (based on a few perhaps shaky but at least explicit assumptions), and that they will become part of the standard mathematician’s working environment.”


1.3.2 The Mathematician's Assistant

Though many theorem provers exist, interactive ones and automated ones, operationalising many different calculi, running on several different computer devices, they have not been widely adopted in the standard mathematician’s working environment.9 The reason is, I hypothesise, that each of them forces the user either to read generated formal mathematical discourse, or to construct such discourse interactively with the machine. And formal mathematical discourse violates, for example, the Gricean maxim of quantity. It is tediously long, tiresome to read, and it is hard to extract its interesting parts. It also violates the Gricean maxim of manner: its notation is obscure and does not match the expressiveness of its informal counterpart;10 its reasoning patterns do not match the kind of reasoning mathematicians perform when presenting informal proofs. The question is, then, what would be classified as a supportive, “easy-to-talk-to” mathematician’s assistant? Hypothetically and optimally, one that is capable of engaging in a natural mixed-initiative dialogue, aiming at co-operatively constructing mathematical arguments with the human counterpart, allowing each party to effectively articulate and contribute its competences towards achieving a common goal: establishing mathematical truth. Fig. 1.4 illustrates the main problem: how can we ensure effective communication between mathematician and machine?

9 We give a review of proof engines in ch. 2.
10 Generally, this is the case, but much could be done to make formal languages more expressive and less obscure without compromising their desirable qualities.

[Figure 1.4: Effective Mathematician–Machine–Communication. The diagram shows mathematician and machine linked by a connection labelled "? Effective Communication ?".]

A potentially promising answer to this question is to define the division of labour between mathematician and machine as follows: The mathematician, using her creative mind, develops a proof idea and presents it to the machine as she would present it on paper, that is, in the form of an informal mathematical proof; the machine takes the informal proof as input, analyses it, and in the course of discourse understanding, translates it into a formal proof. Graphically,

[Diagram: the mathematician's creative idea yields an informal proof; the machine's refinement and verification turn it into a formal proof.]

In this scenario, in principle, the mathematician is enabled to express her high-level reasoning using her expert language of mathematics. The machine must cope with her language and her reasoning pattern, and not vice versa. Here, the mathematician remains in the role of one who performs the creative work of “mathematisation”, and who is guiding the proof search; the machine performs the routine work of formalisation, and needs to ensure that the input proof is correct, that is, has a corresponding formal proof. Furthermore, proof construction should be an incremental process, progressing during the dialogue between man and machine. Moreover, communication problems should be dealt with in a natural way. For example, the machine will prompt the mathematician to supply more details if it is not able to interpret and justify a proof step automatically, or if it is unsure about the mapping of vague concepts into their formal counterparts. In the other direction, the mathematician can ask the machine to omit steps in the evolving proof that, although necessary to fulfill the requirement of formal correctness, she believes are “mathematically uninteresting”. Also, the mathematician can ask the machine to display concept definitions, theorems, and lemmata that are explicitly or implicitly used during formal proof construction, or to use alternative definitions, theorems, and lemmata to prove the same result. Proof editing and transformation should be possible at any level of abstraction. For instance, we should allow the mathematician to annotate the proof with comments at any level of abstraction. This will facilitate the communication of proofs among human mathematicians, or between a mathematics teacher and her students.

1.3.3 Research Questions

The construction of a mathematician’s assistant MA that understands mathematical discourse requires the effective combination of knowledge about the domain with linguistic and meta-mathematical knowledge sources. Symbolically, we have to solve the equation MA = Domain + Language + Reasoning, where each of the variables on the right-hand side is a dependent variable. Any sub-discipline of mathematics has its own notational specifics and many symbols have a domain-specific use and meaning, along with reasoning techniques that are domain-dependent. An expressive language facilitates, or even enables, high-level mathematical reasoning. For instance, if the language allows elliptical constructions, say p1^a1 p2^a2 . . . pk^ak, then specialised inference rules can be applied that use such forms. The use of meta-mathematical techniques is often implicit. Nevertheless, to facilitate proof understanding it is also made verbally explicit, using language. To maximise the acceptability of an MA, it should provide rich, pre-defined, but adaptable and extensible domains, along with an expressive language and sound, high-level, mathematical reasoning capabilities. In the optimal case, the MA allows mathematicians to benefit from formal methods without exposing them to the inconveniences they feel are connected to formal rigour. The task is therefore threefold and requires exploring three problem areas:

Library. The construction of “interesting” proofs requires large libraries of formalised mathematics. Building such libraries and making them easily accessible is painful, hard work. Being forced to do foundational work would distract the mathematician from his original problem. How can we provide an infrastructure of formalised mathematical theories ready for use?

Language. Formal proofs require a formal language. Learning such a new and artificial language, accepting it and using it correctly, is the first hurdle that mathematicians must overcome. For instance, take Frege’s two-dimensional formal language that he used for his Begriffsschrift. It was never widely accepted and used among mathematicians or even logicians. Also note that modern representation languages, typed first or higher order logic languages, fall short of satisfying mathematicians. In contrast, the expert language of mathematicians is far more natural than a formal language; it is imprecise or ambiguous, but very expressive, and an effective means to communicate mathematical arguments among mathematicians. Can we write a parser that effectively recognises the expert language of mathematicians?

Reasoning. Formal proofs exhibit an unacceptable level of detail. These proofs are tediously long and therefore it is hard, if not impossible, to read and understand them. This is intolerable for a mathematician, who associates the role of proof with explanation and understanding. In comparison, mathematical proofs are written at a higher level of abstraction. Therefore, they are much shorter (although they often contain explanatory remarks) and, in principle, much more readable (because they contain explanatory remarks). However, textbook proofs do not meet the criteria of formal correctness. How can we simultaneously achieve understandability and formal correctness?

In this thesis no attempt will be made to explore the problem area of mathematical libraries. Clearly, laying the foundations requires both theoretical and practical expertise. Thurston is aware of the potential risks and difficulties [153]:

“[Much] effort would have to go into mathematics to make it formally correct and complete. It is not that formal correctness is prohibitively difficult on a small scale — it’s that there are many possible choices of formalization on small scales that translate to huge numbers of interdependent choices in the large. It is quite hard to make these choices compatible; to do so would certainly entail going back and rewriting from scratch all old mathematical papers whose results we depend on. It is also quite hard to come up with good technical choices for formal definitions that will be valid in the variety of ways that mathematicians want to use them and that will anticipate future extensions of mathematics. If we were to continue to cooperate, much of our time would be spent with international standards commissions to establish uniform definitions and resolve large controversies.”

The formalisation of mathematics requires a huge personal contribution from the mathematics community. In contemporary terminology, computer scientists can only contribute knowledge management tools; mathematicians must play the role of content providers. In this thesis we contribute towards answering the latter two questions. The underlying hypothesis of this thesis is that the semantics of an informal mathematical proof is its corresponding formal proof. In general terms, our goal is to obtain a formal representation from informal input, and then to show that this formal representation has some properties. In particular, an informal mathematical discourse that is considered mathematically correct needs to be transformed into a formal mathematical proof that satisfies the criteria of formal correctness. Our goal, therefore, is to describe this transformation process and to build a machine that is capable of performing such a transformation automatically. Such a device will preserve the understandability of the input proof and, through the translation process, guarantee the formal correctness of the generated semantic representation. To provide a computational framework that allows a machine to effectively perform this transformation mechanically, we will need to study the language and the underlying reasoning of textbook proofs. We need to investigate in which ways the expert language of mathematics is different from a natural language, say English. We need to identify which kinds of linguistic phenomena occur in textbook proofs, and whether they are different in any way from the ones that are known to occur in other discourse genres. On the discourse level, we need to determine the specifics of mathematical discourse, study its structure, and determine how to recognise it. Since mathematical discourse is a mirror of (systematic) mathematical reasoning, it consists of conventionalised methods of argumentation that impose its structure. This requires, of course, being able to name the (common) patterns of high-level mathematical reasoning, that is, to represent and operationalise meta-mathematical knowledge. Eventually, these high-level reasoning rules will need to be mapped onto sound low-level inference rules guaranteeing correctness to the highest standard of rigour.

The Thesis’ Research Questions.
• Can we give a formal description of the expert language used by mathematicians to write textbook proofs? Moreover, can we effectively operationalise the formal language description to write a parser that syntactically recognises this language?
• What are the linguistic phenomena that occur in textbook proofs? Furthermore, can we develop an algorithm that can effectively cope with these phenomena and which is informed by semantic and pragmatic knowledge sources?
• How can we use conventionalised argumentation methods to determine the structure of informal mathematical discourse? Moreover, can we use or adapt existing theories of structured discourse representation, in particular, discourse representation theory (DRT), to represent mathematical discourse?
• Can DRT or an extension of DRT serve as a computational framework for incrementally building representations for mathematical discourse (vide the sketch following this list)?
• How can we fill in the gaps that characterise informal mathematical discourse? In particular, what kind of reasoning engine is best suited to compute such implicit or hidden proof parts?
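To fix intuitions about the kind of representation the DRT-related questions refer to, here is a minimal Prolog sketch of our own (the term structure and predicate names are illustrative only and are not Vip's actual data structures): a DRS pairs discourse referents with conditions, and incremental interpretation merges the representation of a new sentence into the context built so far.

% A DRS is represented as drs(Referents, Conditions).
% "Let n be an integer such that n > 1."
sentence_drs(drs([N], [integer(N), gt(N, 1)])).

% merge_drs(+Context, +New, -Merged): discourse update as DRS merge.
merge_drs(drs(R1, C1), drs(R2, C2), drs(R3, C3)) :-
    append(R1, R2, R3),
    append(C1, C2, C3).

% ?- sentence_drs(D), merge_drs(drs([], []), D, Context).
% Context = drs([_N], [integer(_N), gt(_N, 1)]).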

1.4 The Organisation of the Thesis

1.4.1 The System's Architecture

Fig. 1.5 depicts the architecture of Vip (Verifying informal proofs), our prototype system for processing informal mathematical discourse. It serves to demonstrate the adequateness and feasibility of our approach.

[Figure 1.5: Architecture of Vip. An informal proof (IN) is passed to the Parser (linguistic analysis), which produces sentence-level semantic representations; these feed the Discourse Update Engine (mathematical and linguistic analysis), which builds a Proof Representation Structure (PRS) and consults a library of proof methods and a domain theory; the Proof Plan Refinement module (method expander) finally produces a formal proof (OUT).]
The transformation of an informal textbook proof into a formal mathematical proof undergoes three distinct stages. The first two stages aim at the incremental construction of a proof representation structure. For each sentence of the textbook proof, the parser is called to perform a complete syntactic analysis as well as to construct its intermediate and underspecified semantic representation. This representation is then given to the discourse update engine, which attempts to incorporate it into the given context as it is captured by the current proof representation structure. Once all the proof sentences are processed in this manner, in the third stage, a proof refiner inspects the resulting proof representation structure. Each of the high-level reasoning steps it may contain is unfolded into calculus-level inference steps. As a result of this post-processing stage, we obtain a formal proof.

The informal proofs, mostly taken from Hardy & Wright’s textbook [68], have been set in LaTeX. Before feeding them into the parser module, we manually tokenised the LaTeX input into a stream of Prolog atoms. Also, we marked sentence boundaries. For each tokenised sentence of a proof, the parser performs a linguistic analysis in which syntactic and semantic processing are performed hand in hand. The construction of semantic representations on the sentence level is based upon an adapted version of discourse representation theory. The result of the linguistic analysis is an intermediate and underspecified semantic representation. That is, it contains referential expressions that have to be resolved in subsequent processing. For each intermediate representation that is returned by the parser, the discourse update engine interprets and incorporates it into the current proof context, which in turn is captured by a proof representation structure. In particular, this context contains assumptions and derived statements made earlier in the proof as well as resulting proof obligations. The discourse update engine is supported by a proof planner that is informed by two knowledge sources: a library of proof plan schemata, which captures meta-mathematical expert knowledge, and a library of domain knowledge. In general, the discourse update engine may be able to incorporate more than one intermediate representation into the current proof context.


Also, for a given intermediate semantic representation, more than one possible proof continuation may arise. In both cases, the discourse update engine maintains a set of proof continuations (each of which is represented by a partial proof plan, or proof representation structure), some of which may be ruled out if they cannot be continued in subsequent processing. After the textbook proof has been processed in this manner, a complete proof plan has been obtained. A proof refinement module takes this proof plan as input and expands all proof methods that occur in this proof plan to inference-level steps such that a formal proof results. We view a formal proof as one possible semantic representation of the given informal textbook proof. Due to the relatively low complexity of the informal input proofs, the prototype system Vip does not incorporate a proof refinement module.
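The control regime just described can be summarised in a short Prolog sketch (ours, not Vip's actual code; parse/2, discourse_update/3 and refine/2 stand for the parser, the discourse update engine and the proof refinement module, respectively):

% understand(+Sentences, -FormalProof): three-stage processing sketch.
understand(Sentences, FormalProof) :-
    initial_context(C0),               % empty proof representation structure
    process(Sentences, [C0], Contexts),
    member(CompletePlan, Contexts),    % pick one completed proof plan
    refine(CompletePlan, FormalProof).

% process(+Sentences, +Contexts0, -Contexts): thread a set of alternative
% proof contexts through the discourse; dead ends are discarded.
process([], Contexts, Contexts).
process([Sentence|Rest], Contexts0, Contexts) :-
    parse(Sentence, SemRep),           % underspecified semantic representation
    findall(C1,
            ( member(C0, Contexts0),
              discourse_update(SemRep, C0, C1) ),   % may yield several continuations
            Contexts1),
    Contexts1 \= [],
    process(Rest, Contexts1, Contexts).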

1.4.2 Thesis Overview

The dissertation is divided into eight chapters. It is organised as follows:
Ch. 2 gives an overview of related work, in particular, about previous attempts at checking natural language proofs, as well as selected topics in automated reasoning and the state of the art in discourse understanding.
Ch. 3 analyses textbook proofs from the mathematical and meta-mathematical perspective and discusses the problems involved in transforming informal proofs into formal proofs.
Ch. 4 analyses textbook proofs from the linguistic perspective. We give a systematic account of the linguistic phenomena that occur in textbook proofs.
Ch. 5 discusses the use of discourse representation theory and its applicability to mathematical discourse. We uncover its strengths and weaknesses for this domain. Several modifications and extensions are proposed. Vip's parser module implements our findings.
Ch. 6 presents an enhanced formalism for representing mathematical discourse, proof representation structures. In particular, the formalism will be adapted to cope with the problems identified in ch. 3 and ch. 4. Proof plan schemata are defined as underspecified proof representation structures.
Ch. 7 describes the planner module and how it operates using mathematical and meta-mathematical knowledge to construct proof representation structures. A discourse update algorithm for mathematical discourse is proposed and described.
Ch. 8 concludes, discussing the significance of our research, the current strengths and weaknesses of Vip, its coverage, and several potential extensions to Vip. At the end, ch. 8 outlines four follow-up projects for future research.

1.4.3 Three Notes

Disclaimer 1: the Thesis is not about... There is a difference between understanding textbook proofs, and understanding the way that mathematicians reason when searching for proofs and how they guide their proof search.11 In this study we are interested in the question of how mathematicians communicate their proofs and how they build one argument on top of another in order to present a convincing, complex argument structure supporting the truth of some assertion. Proof understanding is therefore reduced to the understanding of the logical structure of the presented argument. In this study, we are not interested in other forms of understanding, for example, in a geometrical or symbolical understanding of mathematical concepts and statements. This work has nothing to say about intuition, insight or association, or about the eternal permutation of mathematical activity around intuition, trial, error, speculation, conjecture, and proof, or as Hadamard describes it, around preparation, incubation, illumination and verification [66]. For the reader interested in these questions, we provide only one citation of Goodman [63]:

11 If such research is tractable at all. Current proof systems perform proof search in a quite restricted way, logically. They do not (yet) possess the qualities of experience, insight or introspection.


“Introspection shows that when I am actually doing mathematics, when I am wrestling with a problem that I do not know how to solve, then I am hardly dealing with symbols at all, but rather with ideas and constructions. Some of the hardest work a mathematician does occurs when he has an idea but is, for the moment, unable to express that idea in a formal way. Often such ideas first manifest themselves as visual or kinesthetic images. As the mathematician becomes clearer about them, as they become more formal, he may discover that they manifest considerable internal structure which is, so to speak, not yet symbolically encoded. This point is hard to discuss in a way which avoids purely psychological categories not directly relevant to the epistemological point I am trying to make. Still, mathematicians customarily talk about ideas, constructions, and proofs in a way which makes it clear that they have in mind something other than the symbols they use.

Mathematical truth, unlike a mathematical construction, is not something I can hope to find by introspection. It does not exist in my mind. A mathematical theory, like any other scientific theory, is a social product. It is created and developed by the dialectical interplay of many minds, not just one mind. When we study the history of mathematics, we do not find a mere accumulation of new definitions, new techniques, and new theorems. Instead, we find a repeated refinement and sharpening of old concepts and old formulations, a gradually rising standard of rigor, and an impressive secular increase in generality and depth. Each generation of mathematicians rethinks the mathematics of the previous generation, discarding what was faddish or superficial or false and recasting what is still fertile into new and sharper forms. What guides this entire process is a common conception of truth and a common faith that, just as we clarified and corrected the work of our teachers, so our students will clarify and correct our work.”

But again, in this dissertation, we are not concerned with how mathematicians reason, where they get their intuitions from, and why and when they are creative.

Disclaimer 2: the Thesis is not about... This dissertation is about understanding mathematical discourse that is published in elementary textbooks on mathematics. All of our example discourse is real and unpolished discourse taken from undergraduate textbooks. The main subject of study is a set of proofs that naturally come with theorems, definitions and notations. In particular, we study proofs on elementary number theory taken from [68]. The proofs normally fit into 10 lines or so, and, for the human reader, are relatively easy to understand. For obvious reasons, we did not study proofs that were published in mathematical journals, thereby avoiding the complex mathematical reasoning that might be involved in such proofs.

Disclaimer 3: the Thesis is not about... The goal of this dissertation has been to devise and implement a system that understands simple mathematical proofs. This has proven to be a very complex and interdisciplinary undertaking. As a result, our mathematical analysis of textbook proofs, which we attempted to a good level of detail in ch. 3, can easily be extended into a dissertation topic of its own, say “the identification of reasoning strategies in textbook proofs and their operationalisation in state-of-the-art proof engines”. Also, our linguistic analysis of mathematical discourse is shallow. For example, the use of symbols in mathematical discourse and their interaction with English text, which we describe in ch. 4, easily merits a doctoral dissertation of its own. We are aware of our investigation’s shortcomings and look forward to future dissertations addressing these issues.


Chapter 2

Review of Related Work

This chapter is divided into three parts. The first part reviews earlier attempts to build a textbook proof understander. We describe Proofchecker, the first program that was initially directed towards the automatic verification of textbook proofs, and Nthchecker, the most recent attempt. We complement our review with two influential text understanding systems, each of which operates in a domain that is as well-defined as the domain of logical arguments in mathematics, namely Bobrow’s Student and Bundy et al.’s Mecho. Both programs accepted English as input, needed to extract relevant information from a given problem statement, and to complement it with domain knowledge to solve algebra word problems (Student) or Newtonian mechanics problems (Mecho). Our system review shows that building a machine to understand informal mathematical discourse requires the application of both automated reasoning (AR) and natural language processing (NLP) techniques. We believe that within each of these AI sub-disciplines techniques have been developed that can be used, adapted, and successfully combined for the task at hand, namely, understanding mathematical discourse. In the second part of this chapter we therefore review relevant AR techniques. Since the task of building a textbook proof understander has been shown to be inherently difficult, the AR community has focused on the development of computer-assisted development systems for formal proofs. Nevertheless, progress has also been made towards the modelling and operationalisation of high-level mathematical reasoning. We briefly review the state of the art in theorem proving, sketch two theoretical frameworks for capturing mathematical reasoning, the sequent calculus and proof planning, and very briefly discuss related technology in proof presentation, namely, proof “informalisation” or proof verbalisation. In the third part, we take the NLP perspective. We discuss three approaches in multi-sentence text understanding: Schank’s scripts, Hobbs’ “interpretation as abduction” framework, and Kamp’s theoretical framework for discourse understanding, discourse representation theory. All of these approaches provide techniques that aim to infer information that is only implicit in a given text but necessary for its full interpretation.

2.1 A Systems' Review

2.1.1 Abrahams' Proofchecker

In 1962, John McCarthy said that “Checking mathematical proofs is potentially one of the most interesting and useful applications of automatic computers” [111]. In the first half of the 1960s, one of McCarthy’s students, Paul Abrahams, was asked to implement a Lisp program for the machine verification of mathematical proofs. The program, named Proofchecker, “was primarily directed towards the verification of textbook proofs, i.e., proofs resembling those that normally appear in mathematical textbooks and journals” [3]. But Abrahams soon needed to revise his goal. If, wrote Abrahams, “a computer were to check a textbook proof verbatim, it would require far more intelligence than is possible with the current state of the programming art”. Therefore, said Abrahams, “the user must create a rigorous, i.e., completely formalised, proof that he believes represents the intent of the author of the textbook proof, and use the computer to check this rigorous proof”. Consequently, Abrahams programmed a computer to check rigorous proofs. Instead of processing textbook proofs, he defined a formal language and a restricted set of proof construction commands;


Proofchecker then checked if a given input satisfied these formal requirements. In one of his concluding remarks, Abrahams claims that “it is a trivial task to program a computer to check a rigorous proof; however, it is not a trivial task to create such a proof from a textbook proof” [3]. This statement is still true, and for two decades the idea of automatically verifying textbook proofs, despite its apparent interest, was not pursued. We are not aware of any publication until the late 1980s, when Simon started his PhD research and reported his first results [146].1

1 The author was recently made aware of some initial work analysing textbook proofs linguistically. In her MSc project, Baur investigates whether off-the-shelf NLP tools can be successfully applied to parse textbook proofs [13].

2.1.2 Bobrow's Student

In parallel to Abrahams, also at MIT, Bobrow worked on his PhD thesis “Natural language input for a computer problem solving system” [16]. His underlying motivation was to enable a problem-solving system to accept natural language input instead of input that was specified in a formal and hence artificial language. Bobrow regarded natural language as a convenient vehicle for declaratively expressing a problem description, leaving the choice of how to solve the problem, the procedural knowledge, to the problem solver. Ideally, the user need not be aware of the system’s internal data structures, algorithms, and problem solving strategies. Bobrow argued that algebra story problems are a good domain for which to build such a problem solver. First, there is a clear idea of the target language, or data structure: a story problem in natural language English has to be translated into a set of mathematical objects, namely equations. Second, algebra has a rich set of well-known techniques to deduce information that is only implicit in the algebra story problem or its symbolic representation, namely the determination of values for unknowns. Third, thousands of problems are available in textbooks, and therefore, a large corpus is readily available. In contrast to Abrahams, who forced the user to communicate with the system in a formal language and using a restricted set of proof construction commands, Bobrow defined a subset of English. Algebra story problems taken from textbooks had to be transcribed by hand into Bobrow’s sub-language of English. The resulting texts were processed with a pattern matching framework based upon the COMIT language, which Bobrow re-implemented and re-named Meteor. Bobrow’s natural language understander is a set of Meteor rules. A Meteor rule is of the form (label pattern transformation goto-label), and the rule interpreter applied rules to whatever resided in the workspace. A rule was applicable if its pattern matched the workspace. As a result, pattern variables were instantiated and a transformation was applied, changing the content of the workspace. The interpreter then proceeded by looking for applicable rules with the label goto-label (until the end label had been encountered).2 It is quite obvious that Bobrow’s (anti-grammatical) approach, performing a sequence of workspace transformations, could not cope with many linguistic phenomena. Patterns were written for so-called kernel sentences, and any slight departure of input sentences from these kernel sentences caused Meteor to fail.3 Later, Charniak improved Bobrow’s work considerably [27], and pattern matching, as one form of shallow parsing, is still employed in the information extraction community.

2 This is a simplified account: there were also shelves (registers) where intermediate results could be stored.
3 I spent several weeks with a re-implementation of Meteor and followed Bobrow’s footsteps in the textbook proof domain. The results were disappointing. An unpublished paper is available from the author on request.
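To illustrate the flavour of such rules, here is a much simplified reconstruction in Prolog (ours, not Bobrow's Lisp code): a rule is a term rule(Label, Pattern, Transformation, Goto), and one interpreter step rewrites the workspace if the pattern matches it.

% rule(Label, Pattern, Transformation, Goto): Pattern and Transformation
% are token lists sharing Prolog variables.
rule(start, [what, is, X, plus, Y], [X + Y], stop).

% step(+Label, +Workspace, -NewWorkspace, -Goto): apply one applicable rule;
% matching is simplified to unification of the pattern with the whole workspace.
step(Label, Workspace, Transformation, Goto) :-
    rule(Label, Workspace, Transformation, Goto).

% ?- step(start, [what, is, 3, plus, 4], W, G).
% W = [3+4], G = stop.

Real Meteor patterns also contained segment variables matching arbitrary token sequences and could store intermediate results on shelves; the sketch ignores both.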

2.1.3 Mecho — A Program to Solve Mechanics Problems

In the late seventies, Bundy et al. developed Mecho, a program which solved problems in Newtonian mechanics that were described in English [20] (but see also Mellish’s [114]). Mecho’s development was driven by the question: “[...] how it is possible to get a formal representation of a problem from an English statement, and how it is then possible to use this representation in order to solve the problem.”


As Bundy et al. point out, to build Mecho, it was necessary to “investigate every detail of the processed knowledge involved, including aspects like controlling search and model building, which are not usually taught explicitly but are left to the student to pick up from examples”. Mecho operated in the idealised world of mechanics and was able to deal with pulley problems and many other problem types. In [20], Bundy et al. give the following example: (1)

a. Problem statement: Two particles of mass b and c are connected by a light string passing over a smooth pulley. Find the acceleration of the particle of mass b. b. Derived Assertions: isa(period, period1) isa(particle, p1) isa(particle, p2) isa(string,s1) isa(pulley,pull) end(S1,end1,right) end(S1,end1,left) midpt(s1,midpt1) fixed contact(end1,p1,period1) fixed contact(end2,p2,period1) fixed contact(midpt,pull,period1)

mass(p1,mass1,period1) mass(p1,mass2,period1) mass(s1,zero,period1) coeff(pull,zero) accel(p1,a1,270,period1) accel(p2,a2,90,period1) measure(mass1,b) measure(mass2,c) sought(a1) given(mass1) given(mass2)
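To convey how assertions such as coeff(pull, zero) and mass(s1, zero, period1) can be supplied mechanically, the following Prolog fragment is a schematic illustration of our own (it is not Mecho's code, and described_as/2 is an invented helper): idealising defaults are attached to the object types mentioned in the problem statement.

% Illustrative defaults only; Mecho's actual cueing schemata are discussed below.
default_assertion(coeff(Pulley, zero)) :-        % "smooth" pulleys are frictionless
    isa(pulley, Pulley),
    described_as(Pulley, smooth).
default_assertion(mass(String, zero, Period)) :- % "light" strings are weightless
    isa(string, String),
    described_as(String, light),
    isa(period, Period).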

The original English problem statement (1a) unfolds into the set of derived assertions (1b), the latter carrying more information than the former. The additional information, which will prove necessary to solve the algebra problem, can only partially be derived from the givens alone. It must be complemented by contextual domain knowledge about mechanics problems, for instance, using the assumptions that pulleys are frictionless and strings weightless. The gaps between the English problem statement and the derived assertions are filled by a cueing schemata mechanism. These schemata bridge the gap between the information explicitly given in the English problem statement and the information needed to solve the problem. The following are three Mecho data structures: a schema for pulley systems, a cue for a standard pulley system problem, and a system definition that defines a pulley system.

[The listings of these three data structures (the pullsys schema, the pullsys_stan cue, and the sysinfo system definition) are garbled in this copy; only fragments are recoverable.]

Hardy and Wright then proceed to prove its uniqueness (THEOREM 2). This requires the definition of a standard form, and the use of a third theorem (THEOREM 3) and its corollary. Sect. 3.2 includes a discussion of Hardy and Wright’s existence proof in § 3.2.1 as well as a similar proof by LeVeque in § 3.2.2. Sect. 3.3 consists of a detailed account of the uniqueness proof. It also discusses the role of representational means in mathematical writing and reasoning, namely, elliptic constructions. For the sake of completeness, sect. 3.4 analyses the proof of THEOREM 3, which is relatively simple in comparison to the previous arguments. The goal of our analysis, which will be carried out with scrupulous attention to detail, is to identify and represent the argument’s underlying reasoning patterns, and to make explicit its full logical structure. The analysis is supported by the λ-Clam proof planner [131].

1 In [58], Gentzen also discusses three variants of standard induction:

    Γ ⇒ F(t)    F(a + 1), ∆ ⇒ F(a)
    -------------------------------    Standard Induction (descending)
             Γ, ∆ ⇒ F(1)

    Γ ⇒ F(1)    ∀x[x ≤ a → F(x)], ∆ ⇒ F(a + 1)
    -------------------------------------------    Complete Induction (ascending)
             Γ, ∆ ⇒ F(t)

    Γ ⇒ F(t)    F(a + 1), ∆ ⇒ ∃x[x ≤ a ∧ F(x)]
    -------------------------------------------    Complete Induction (descending)
             Γ, ∆ ⇒ F(1)

Again, a must not occur in Γ, ∆, F(1) and F(t).
2 Other techniques include diagonalization arguments and techniques used for proving limit theorems. In the Ωmega proof planner [15], both techniques have been modelled (diagonalization [28], limit methods [113]).
3 For example, it is reported that Wiles was forced to extend existing methods, and to develop a number of new methods, for proving Fermat’s Last Theorem [148].

Forward-Backward.
  When to use it: As a first attempt or when B does not have a recognizable form.
  What to assume: A.
  What to conclude: B.
  How to do it: Work forward from A and apply the backward process to B.

Contrapositive.
  When to use it: When B has the word “no” or “not” in it.
  What to assume: NOT B.
  What to conclude: NOT A.
  How to do it: Work forward from NOT B and backward from NOT A.

Contradiction.
  When to use it: When B has the word “not” in it, or when the first two methods fail.
  What to assume: A and NOT B.
  What to conclude: Some contradiction.
  How to do it: Work forward from A and NOT B to reach a contradiction.

Construction.
  When to use it: When B has the term “there is”, “there exists”, etc.
  What to assume: A.
  What to conclude: There is the desired object.
  How to do it: Guess, construct, etc. the object. Then show that it has the certain property and that the something happens.

Choose.
  When to use it: When B has the term “for all”, “for each”, etc.
  What to assume: A, and choose an object with the certain property.
  What to conclude: That the something happens.
  How to do it: Work forward from A and the fact that the object has the certain property. Also work backward from the something that happens.

Induction.
  When to use it: When B is true for each integer beginning with an initial one, say n0.
  What to assume: The statement is true for n.
  What to conclude: The statement is true for n + 1. Also show it is true for n0.
  How to do it: First substitute n0 for n everywhere and show it is true. Then invoke the induction hypothesis for n to prove it true for n + 1.

Specialisation.
  When to use it: When A has the term “exists”, “for all”, “for each”, etc.
  What to assume: A.
  What to conclude: B.
  How to do it: Work forward by specialising A to one particular object having the certain property.

Direct Uniqueness.
  When to use it: When B has the word “unique” in it.
  What to assume: There are two such objects, and A.
  What to conclude: The two objects are equal.
  How to do it: Work forward using A and the properties of the objects. Also work backward to show the objects are equal.

Indirect Uniqueness.
  When to use it: When B has the word “unique” in it.
  What to assume: There are two different objects, and A.
  What to conclude: Some contradiction.
  How to do it: Work forward from A using the properties of the two objects and the fact that they are different.

Proof by Elimination.
  When to use it: When B has the form “C OR D”.
  What to assume: A and NOT C.
  What to conclude: D.
  How to do it: Work forward from A and NOT C, and backward from D.

Proof by Cases.
  When to use it: When A has the form “C OR D”.
  What to assume: Case 1: C; Case 2: D.
  What to conclude: B (in each case).
  How to do it: First prove that C implies B; then prove that D implies B.

Max/Min 1.
  When to use it: When B has the form “max S ≤ x” or “min S ≥ x”.
  What to assume: Choose an s in S, and A.
  What to conclude: s ≤ x or s ≥ x.
  How to do it: Work forward from A and the fact that s is in S. Also work backward.

Max/Min 2.
  When to use it: When B has the form “max S ≥ x” or “min S ≤ x”.
  What to assume: A.
  What to conclude: Construct s in S so that s ≥ x or s ≤ x.
  How to do it: Use A and the construction method to produce the desired s in S.

Figure 3.2: Proof Techniques, from Solow [149]
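As the “When to use it” column suggests, a candidate technique can often be read off the top-level form of the goal B. A small Prolog sketch of such a selection rule (ours, with goals encoded as Prolog terms):

% suggest(+Goal, -Technique): candidate techniques for a goal, following
% the surface cues of the table above (a heuristic sketch only).
suggest(forall(_, _), choose).
suggest(exists(_, _), construction).
suggest(or(_, _),     proof_by_elimination).
suggest(not(_),       contrapositive).
suggest(not(_),       contradiction).
suggest(_,            forward_backward).   % always available as a first attempt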


Each of the proof techniques that we describe in this chapter has been operationalised in λ-Clam. λ-Clam’s proper use will guarantee the construction of a corresponding sound proof plan. Every attempt will be made to ensure that the resulting formal construction preserves the structure and line of reasoning of Hardy and Wright’s original argument. As a by-product, we will enable λ-Clam’s proof planning engine to re-construct Hardy and Wright’s FTA proof automatically, using the reasoning patterns that we previously identified and represented as proof plans. Moreover, we may also discover heuristic knowledge, which may guide λ-Clam as to when and how to apply proof methods, both in the argument at hand and in similar proofs of the same family.

Notational Remarks. The analysis consists of a sequence of proof states. Any transition between proof states is a proof step. For conciseness, we translate the expert language used in the informal mathematical argument into sorted first- and second-order logic statements. A proof state is represented as Γ ⊢ ∆, where Γ symbolises the set of assumptions upheld in the given proof state, and ∆ symbolises its (remaining) proof obligations. For example, the initial state in the proof for the existential part of FTA is rendered as Γ ⊢ ∀n∈N : n > 1 → pprimes(n), with the empty set of formulae Γ. A proof step applies an (applicable) proof method or inference rule to the current proof state and yields a new proof state. We shall sometimes omit type information by abbreviating ∀n∈N : P(n) by ∀n : P(n). Also, we shall sometimes abbreviate ∃n∈N ∃m∈N : P(n, m) by ∃n, m : P(n, m).

3.2 The Fundamental Theorem of Arithmetic (Existence)

We discuss two different arguments that prove the Fundamental Theorem of Arithmetic (FTA).

3.2.1 Hardy and Wright's Existence Proof

We start with definitions. The FTA requires the definition of divisible by, divisor of, and prime. For the sake of authenticity, we give Hardy and Wright's original definitions:

(5) a. An integer a is said to be divisible by another integer b, not 0, if there is a third integer c such that a = bc.
    b. We express the fact that a is divisible by b, or b is a divisor of a, by b|a.
    c. A number p is said to be prime if (i.) p > 1, (ii.) p has no positive divisors except 1 and p.

We compress (5a) and (5b) to Definition (3.2.1) and (5c) to Definition (3.2.2). Hardy and Wright do not define the concept product of primes. Given Hardy & Wright's line of reasoning in (6), our corresponding formal definition (3.2.3) proves to be more suitable than alternative ones.

Definition 3.2.1  ∀a∈N ∀b∈N : a|b ⇔ ∃c∈N : b = ac ∧ a ≤ b ∧ c ≤ b
Definition 3.2.2  ∀n∈N : prime(n) ⇔ n > 1 ∧ ∀d∈N : d|n → d = 1 ∨ d = n
Definition 3.2.3  ∀n∈N : pprimes(n) ⇔ prime(n) ∨ ∃p1∈N ∃n1∈N : n = p1 n1 ∧ p1 < n ∧ n1 < n ∧ prime(p1) ∧ pprimes(n1)

Discourse (6) depicts Hardy & Wright's first theorem and their proof [68, p. 2].
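For concreteness, the three definitions transcribe almost literally into executable Prolog (a sketch of ours, using SWI-Prolog's between/3; this is not Vip's internal representation of definitions):

% divides(A, B): A | B as in Definition 3.2.1 (A, B positive integers).
divides(A, B) :-
    between(1, B, C),
    B =:= A * C.

% prime(N): Definition 3.2.2.
prime(N) :-
    N > 1,
    \+ ( between(2, N, D), D < N, divides(D, N) ).

% pprimes(N): N is a product of primes, Definition 3.2.3.
pprimes(N) :-
    prime(N).
pprimes(N) :-
    between(2, N, P1), P1 < N, prime(P1),
    0 =:= N mod P1,
    N1 is N // P1,
    N1 < N,
    pprimes(N1).

% ?- pprimes(12).   % succeeds: 12 = 2 * 2 * 3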


(6) [Facsimile of Hardy & Wright's THEOREM 1 and its proof, reproduced from [68, p. 2]; not recoverable in this copy.]

The main proof idea is to split up prime factors iteratively. The argument makes use of the fact that the least divisor of any number is prime. This assertion as well as its proof is embedded in (6). The well-ordering principle of the natural numbers ensures that the iterative process must eventually terminate. Now, for a detailed proof analysis. If we render Hardy & Wright’s theorem into a first-order logic formula, we obtain ∀n∈N : n > 1 → pprimes(n). Our initial proof obligation, therefore, is

Γ ⊢ ∀n∈N : n > 1 → pprimes(n),

with the set of formulae Γ considered empty, for the time being.4 As we will now demonstrate, constructing a maximally rigorous proof from Hardy & Wright’s informal proof requires filling in many steps that they leave unverbalised. This is already true for the formal interpretation of the first proof sentence. The first proof sentence of discourse (6) encodes a rather large sequence of elementary reasoning steps. In particular, it introduces a free variable n and associates it with the predicates prime and divisor of. First, we need to fill in a reasoning step that introduces such a variable. Given the initial proof state, reasoning backwards, we can apply Gentzen’s ∀-intro yielding

Γ ∪ {n ∈ N} ⊢ n > 1 → pprimes(n).

We have chosen to introduce a free variable n to match Hardy & Wright’s introduction of a variable with the same name. Continuing backward reasoning, we can apply Gentzen’s →-intro and obtain

Γ ∪ {n ∈ N, n > 1} ⊢ pprimes(n).

This step is often omitted in informal mathematical arguments, if the premise A of a statement A → B is simple, which is the case here. Obviously, Hardy & Wright could have made their argument more rigorous by verbalising these two proof steps more explicitly, say by inserting the statement Let n be an integer. Assume n > 1. before their first proof sentence. Note that Solow’s choose method is a combination of these two elementary Gentzen steps. In the particular case, it verbalises as Let n be an integer such that n > 1. The association of the variable n with the predicate prime suggests definitional expansion on pprimes(n) using definition (3.2.3), which yields the new proof state

Γ ∪ {n ∈ N, n > 1} ⊢ prime(n) ∨ ∃p1 ∃n1 : n = p1 n1 ∧ p1 < n ∧ n1 < n ∧ prime(p1) ∧ pprimes(n1).

4 Our analysis will reveal that the theorem depends on a set of number-theoretic axioms, definitions and derived statements that are not explicitly mentioned in Hardy & Wright’s informal argument. We will disclose these “hidden” dependencies and consider them to be added to Γ.


The interpretation of the first proof sentence, which is a disjunction, also suggests a proof by cases. Given that the remaining proof obligation is of the form P ∨ Q, its underlying line of reasoning is as follows: (i) if we assume P, then we can rewrite the proof obligation P ∨ Q to ⊤, since P entails P ∨ Q; and (ii) if we assume ¬P, then we can rewrite the proof obligation P ∨ Q to Q, since P ∨ Q, given ¬P, can only be true if Q holds. With P matching the first disjunct of the remaining proof obligation, and Q as its second disjunct, this formal reasoning adequately models the proof author's reasoning. Its first case is linguistically expressed as [If] n is prime, when there is nothing to prove. We are therefore left with the second case, symbolically, Γ ∪ {n∈N, n > 1, ¬prime(n)} ⊢ ∃p1, n1 : n = p1 n1 ∧ p1 < n ∧ n1 < n ∧ prime(p1) ∧ pprimes(n1).

Now, given the current proof state, we need to derive an interpretation for the textbook statement n has divisors between 1 and n. Switching to forward reasoning, definitional expansion of prime(n), followed by a sequence of logical rewriting steps, yields the intended result.5

¬prime(n) ⇔ ¬(n > 1 ∧ ∀d∈N : d|n → d = 1 ∨ d = n)
          ⇔ ¬(n > 1) ∨ ¬∀d∈N : d|n → d = 1 ∨ d = n            (deMorgan-1: ¬(A ∧ B) → ¬A ∨ ¬B)
          ⇔ ¬(n > 1) ∨ ¬¬∃d∈N : ¬(d|n → d = 1 ∨ d = n)         (rewrite-∀)
          ⇔ ¬(n > 1) ∨ ∃d∈N : ¬(d|n → d = 1 ∨ d = n)           (law of double negation)
          ⇔ ¬(n > 1) ∨ ∃d∈N : ¬(¬d|n ∨ (d = 1 ∨ d = n))        (rewrite-→)
          ⇔ ¬(n > 1) ∨ ∃d∈N : (¬¬d|n ∧ ¬(d = 1 ∨ d = n))       (deMorgan-2: ¬(A ∨ B) → ¬A ∧ ¬B)
          ⇔ ¬(n > 1) ∨ ∃d∈N : (d|n ∧ ¬(d = 1 ∨ d = n))         (law of double negation)
          ⇔ ¬(n > 1) ∨ ∃d∈N : d|n ∧ ¬(d = 1) ∧ ¬(d = n)        (deMorgan-2)
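The same normalisation can also be carried out mechanically. The following Python fragment is merely our own illustration of this "rewriting of logical signs"; it is not λ-Clam code, and the tuple-based formula representation as well as the function name nnf are hypothetical.

    # Our own illustration of pushing negations inward (negation normal form);
    # the formula representation is hypothetical, not λ-Clam's.

    def nnf(f):
        """Apply rewrite-imp, rewrite-forall/exists, de Morgan and double
        negation until negations only occur in front of atoms."""
        if isinstance(f, str):                      # atom, e.g. 'd|n'
            return f
        op = f[0]
        if op == 'imp':                             # rewrite-imp: A -> B  ~>  ~A v B
            return nnf(('or', ('not', f[1]), f[2]))
        if op in ('and', 'or'):
            return (op, nnf(f[1]), nnf(f[2]))
        if op in ('forall', 'exists'):
            return (op, f[1], nnf(f[2]))
        g = f[1]                                    # op == 'not'
        if isinstance(g, str):
            return f                                # negated atom stays as it is
        if g[0] == 'not':                           # law of double negation
            return nnf(g[1])
        if g[0] == 'and':                           # deMorgan-1
            return ('or', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'or':                            # deMorgan-2
            return ('and', nnf(('not', g[1])), nnf(('not', g[2])))
        if g[0] == 'imp':
            return nnf(('not', ('or', ('not', g[1]), g[2])))
        if g[0] == 'forall':                        # rewrite-forall under negation
            return ('exists', g[1], nnf(('not', g[2])))
        if g[0] == 'exists':
            return ('forall', g[1], nnf(('not', g[2])))

    # not(prime(n)) with Definition 3.2.2 expanded:
    not_prime_n = ('not', ('and', 'n>1',
                           ('forall', 'd', ('imp', 'd|n', ('or', 'd=1', 'd=n')))))
    print(nnf(not_prime_n))
    # ('or', ('not', 'n>1'),
    #        ('exists', 'd', ('and', 'd|n', ('and', ('not', 'd=1'), ('not', 'd=n')))))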

The rewriting of ¬prime(n) continues until the expression is sufficiently simplified and in an "intuitive" normal form.6 We obtain the new proof state Γ ∪ {n∈N, n > 1} ∪ {¬(n > 1) ∨ ∃d∈N : d|n ∧ ¬(d = 1) ∧ ¬(d = n)} ⊢ ∃p1∈N ∃n1∈N : n = p1 n1 ∧ p1 < n ∧ n1 < n ∧ prime(p1) ∧ pprimes(n1). Now, the set of hypotheses contains formulae of the form A and ¬A ∨ B, which obviously can be reduced to A and B (with A standing for n > 1). This reduction is an application of the Modus Ponens inference rule.

The second textbook proof sentence, from If m is the least of these divisors, . . . to . . . which contradicts the definition of m, contains a lemma and its proof. Apparently, it serves to reduce one of the proof obligations, namely that prime(p1). As with the first proof sentence, a set of intermediate steps has to be identified that are not made explicit in the informal argument. The linguistic expression the least of these divisors suggests a definitional rewriting of | using definition (3.2.1). Such forward reasoning from the assumptions yields Γ ∪ {n∈N, n > 1} ∪ {∃d∈N ∃c∈N : n = dc ∧ d ≤ n ∧ c ≤ n ∧ ¬(d = 1) ∧ ¬(d = n)} ⊢ ∃p1∈N ∃n1∈N : n = p1 n1 ∧ p1 < n ∧ n1 < n ∧ prime(p1) ∧ pprimes(n1).

5 λ-Clam, without guidance from the textbook, has a choice point with several branches. For instance, it could reason backwards by attacking the existentially quantified proof obligation, or perform definitional expansion of prime, prod of primes, or the other predicates it contains.
6 In this thesis, we omit defining the notion of normal form.


Reasoning backwards, Solow's construction method, which is the same as Gentzen's ∃-intro, can be applied to each of the two existential quantifiers. We hereby reduce the proof obligation to n = P1 N1 ∧ P1 < n ∧ N1 < n ∧ prime(P1) ∧ pprimes(N1). Note that P1 and N1 are not actual constructions but meta-variables that will be unified with these constructions at a later stage. The hypotheses will inform the instantiation process. Moving forward, we can apply the specialisation method twice, i.e., to each of the existential quantifiers (λ-Clam: all i on hypotheses). Moreover, we can then repeatedly apply the ∧-elimination rule. The next proof state reflects the result of applying the construction, specialisation, and ∧-rules:

Γ ∪ {n > 1, n = dc, d ≤ n, c ≤ n, ¬(d = 1), ¬(d = n)} ⊢ n = P1 N1 ∧ P1 < n ∧ N1 < n ∧ prime(P1) ∧ pprimes(N1)

We are therefore left with five proof obligations. The proof obligation n = P1 N1 can be resolved by the hypothesis n = dc, if we instantiate P1 with d and N1 with c. The obligation d < n can be solved using d ≤ n and ¬(d = n) and lemma 3.7 (see below). Similarly, c < n can be reduced to true by using n = dc, ¬(d = 1), and lemma 3.8 (see below). We are left with the proof obligations prime(d) and pprimes(c). We have thus made explicit all the intermediate, implicit reasoning steps that allow the interpretation of the second proof sentence and the application of the lemma it contains.

In order to perform a detailed logical analysis of the lemma that claims that the least divisor of any number is prime, we have isolated it from its textbook proof embedding and proven it separately. Again, λ-Clam enabled us to support and verify our analysis. Let us now analyse this lemma and its proof. First, we give a formal description of least divisor, namely:

Definition 3.2.4  ∀n∈N ∀d∈N : least_divisor(d, n) ⇔ d|n ∧ d > 1 ∧ (¬∃c∈N : c|n ∧ c > 1 ∧ c < d).

Transforming the lemma into its logical form, we obtain the initial proof state7 Γ ⊢ ∀d∈N ∀n∈N : least_divisor(d, n) → prime(d). We start reducing the proof obligation by first applying Gentzen's ∀-intro twice. After having eliminated the quantifiers, as the textbook proof suggests, we perform a proof by contradiction: to prove A → B, we assume A and ¬B and need to derive a contradiction from these assumptions. We obtain Γ ∪ {least_divisor(d, n), ¬prime(d)} ⊢ ⊥. With no possibility of backward reasoning, we perform forward reasoning from the hypotheses, expanding the definition of prime in ¬prime(d). This, followed by the same logical rewriting steps as above, yields Γ ∪ {least_divisor(d, n), ¬(d > 1) ∨ ∃x∈N : x|d ∧ ¬(x = 1) ∧ ¬(x = d)} ⊢ ⊥. Now, if we expand least_divisor(d, n), we obtain Γ ∪ {d|n ∧ d > 1 ∧ ¬∃c∈N : c|n ∧ c > 1 ∧ c < d, ¬(d > 1) ∨ ∃x∈N : x|d ∧ ¬(x = 1) ∧ ¬(x = d)} ⊢ ⊥.

7 Actually, having bound its variables d and n, we are taking the lemma and its proof outside of its embedding.


If we eliminate the existential quantifier in the formula that resulted from expanding ¬prime(d), break up all the conjunctions, and perform a Modus Ponens-like inference rule to eliminate ¬(d > 1), we get the proof state Γ ∪ {d|n, d > 1, ¬(∃c∈N : c|n ∧ c > 1 ∧ c < d), x|d, ¬(x = 1), ¬(x = d)} ⊢ ⊥. Since the goal is ⊥, a contradiction must be in the hypotheses. Again, the informal proof provides guidance. If we can prove ∃c∈N : c|n ∧ c > 1 ∧ c < d from the hypotheses, we contradict ¬(∃c∈N : c|n ∧ c > 1 ∧ c < d). We get the new problem {d|n, d > 1, x|d, ¬(x = 1), ¬(x = d)} ⊢ ∃c : c|n ∧ c > 1 ∧ c < d. Backwards from the new goal, we apply the construction method and follow up with breaking apart the conjunctive statement. This leads to three proof obligations, where C is a meta-variable that needs to be appropriately instantiated.

Γ ∪ {d|n, x|d, ¬(x = 1), ¬(x = d)} ⊢ C|n      (3.1)
Γ ∪ {d|n, x|d, ¬(x = 1), ¬(x = d)} ⊢ C > 1    (3.2)
Γ ∪ {d|n, x|d, ¬(x = 1), ¬(x = d)} ⊢ C < d    (3.3)

The first obligation can be resolved if we instantiate C with d. Given that C = d, we could also successfully resolve (3.2), but we would fail on the third since ¬d < d. Instantiating C with x in (3.1), we need to prove x|n, which we can obtain with the transitivity of divisor of, ∀a∈N ∀b∈N ∀c∈N : (a|b ∧ b|c) → a|c. With C = x, the second goal (3.2) becomes x > 1. Since x|d, we know that x cannot be zero. With ¬(x = 1), we know that x can only be greater than 1. The third goal is x < d. Rewriting x|d again and eliminating the resulting quantifier, we obtain Γ ∪ {d|n, d > 1, d = xy, x ≤ d, y ≤ d, ¬(x = 1), ¬(x = d)} ⊢ x < d. Combining x ≤ d and ¬(x = d), we obtain x < d, and we are through.

Remainder of the proof. We are left with the obligation to prove pprimes(n1). As pointed out before, an iterative argument, a kind of implicit induction, is performed. Hardy & Wright indicate their reasoning with repeating the argument and sooner or later. As indicated by these cues, Hardy and Wright's argument is an inductive proof, although it is not in any easily recognisable normal form, say as compared to Gentzen's aforementioned various inductive forms (see footnote 1). Unfortunately, at the time of writing, we have not yet modelled Hardy & Wright's exact line of reasoning in λ-Clam, and therefore, we need to omit a detailed analysis of the proof remainder. However, Hardy & Wright's proof can be freely reformulated to yield a form that is more easily open to mechanisation. In [149], for instance, Solow gives a similar proof:

(7)

The statement is clearly true for n = 2. Now assume the statement is true for all integers j between 2 and n. That is, that any integer j between 2 and n can be expressed as a finite product of primes. If (n + 1) is prime, the statement is true for (n + 1); otherwise, (n + 1) has a prime divisor, say p. So there is an integer q with 2 ≤ q ≤ n such that (n + 1) = pq. But by the induction hypothesis, q can be expressed as a finite product of primes and therefore, so can (n + 1).

Solow is very explicit in stating the induction’s base case and the inductive hypothesis. Also, he explicitly mentions its application in the last sentence of his argument. We have been able to model this kind of argument in λ-Clam. In fact, we have chosen LeVeque’s existence proof which is very similar to Solow’s argument. Its formalisation is performed next.
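The recursive shape of Solow's argument is easy to render computationally: a factorisation procedure that calls itself on a strictly smaller cofactor plays the role of the (complete) induction hypothesis. The following Python sketch is our own reading of that argument; it is neither a transcript of Solow's or LeVeque's proof nor of λ-Clam's proof plan, and all function names are hypothetical.

    # Our own computational rendering of the Solow/LeVeque line of reasoning:
    # if n is not prime, it has a prime divisor p with cofactor q < n, and the
    # recursive call on q plays the role of the induction hypothesis.

    from math import prod

    def smallest_divisor_greater_one(n: int) -> int:
        """The least divisor of n that exceeds 1; it is necessarily prime."""
        d = 2
        while n % d != 0:
            d += 1
        return d

    def prime_factors(n: int) -> list:
        """Return a list of primes whose product is n, for n > 1."""
        p = smallest_divisor_greater_one(n)
        if p == n:                      # n itself is prime: nothing left to do
            return [n]
        q = n // p                      # 2 <= q < n, so the recursion is justified
        return [p] + prime_factors(q)

    assert prime_factors(360) == [2, 2, 2, 3, 3, 5]
    assert all(prod(prime_factors(n)) == n for n in range(2, 1000))

Note that smallest_divisor_greater_one also mirrors the lemma embedded in Hardy & Wright's proof: the least divisor greater than 1 of any number is prime.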

3.2.2 LeVeque's Existence Proof

LeVeque’s existence proof, taken from [107], is depicted in (8).

(8) THEOREM 2-2. Every integer a > 1 can be represented as a product of one or more primes.

Proof: The theorem is true for a = 2. Assume it to be true for 2, 3, 4, . . . , a − 1. If a is prime, we are through. Otherwise a has a divisor different from 1 and a, and we have a = bc, with 1 < b < a, 1 < c < a. The induction hypothesis then implies that b = ∏_{i=1}^{s} p_i and c = ∏_{i=1}^{t} p'_i, with the p_i, p'_i primes, and hence a = p_1 p_2 . . . p_s p'_1 . . . p'_t.

At first glance, its underlying reasoning seems to be captured by a standard inductive proof over the natural numbers with the base case a = 2 and the step case from n − 1 to n. However, close inspection reveals a course-of-values induction, or complete induction. In a standard inductive argument, to prove P(n), we can only use the assumption P(n − 1); in complete induction, we can assume P(i) for every n0 < i < n:

∀n∈N : [(∀k∈N : n0 < k < n → P(k)) → P(n)] → ∀n∈N : P(n).    (3.4)

To prove a formula of the form ∀n∈N : P(n), we prove

∀n∈N : (∀k∈N : k < n → P(k)) → P(n)    (3.5)

and apply Modus Ponens using (3.4) and (3.5). To prove (3.5), we apply Solow's choose method, namely, {n∈N, ∀k∈N : k < n → P(k)} ⊢ P(n). Note again that in complete induction, we use a generalized induction hypothesis in which we assume that P holds for all previous values of n (not just for n − 1). Thus, there is no need for a separate induction base, because for n = 0 the hypothesis P(k) is obviously true for all k < n (since there is no such k). A short-cut proof plan for the foregoing reasoning chain is:

    show: Γ ⊢ ∀n∈N : P(n)
      let n∈N be arbitrary
      assume Γ, ∀k∈N : k < n → P(k)
      ↓ forward
      ......
      ↑ backward
      show P(n)

If we apply this schema to the theorem in (8), we obtain {n∈N, ∀k∈N : k < n → (k > 1 → pprimes(k))} ⊢ n > 1 → pprimes(n).

Now, the first sentence of the proof, The theorem is true for a = 2, seems to handle the base case of a standard induction: since 2 is prime, it is a product of one or more primes. However, since complete induction is used, as pointed out above, there is no need for such a base. Therefore, in fact, LeVeque's first proof sentence is superfluous. It is a flaw in his presentation.

The second textbook sentence states the induction hypothesis. Since the remaining proof obligation is of the form A → B, the implication method can be applied, and we obtain {n∈N, ∀k∈N : k < n → (k > 1 → pprimes(k)), n > 1} ⊢ pprimes(n). First, we need to note that LeVeque apparently uses a different definition of pprimes:

Definition 3.2.5  ∀n∈N : pprimes(n) ⇔ prime(n) ∨ ∃p1∈N, p2∈N : n = p1 p2 ∧ p1 < n ∧ p2 < n ∧ pprimes(p1) ∧ pprimes(p2)


Now, applying this definition, we obtain Γ ∪ {n∈N, ∀k∈N : k < n → (k > 1 → pprimes(k)), n > 1} ⊢ prime(n) ∨ ∃p1, p2 : n = p1 p2 ∧ p1 < n ∧ p2 < n ∧ pprimes(p1) ∧ pprimes(p2).

The next lines of reasoning are very similar to Hardy & Wright's argumentation: a case split is performed, the first case of which is handled effortlessly. In the second case, forward reasoning requires the definitional expansion of ¬prime(a), and a subsequent rewriting of logical signs to transform its definiens into a normal form. We obtain a proof state that differs from the corresponding one in Hardy & Wright's proof wrt. the definition of pprimes and the presence of the induction hypothesis:

Γ ∪ {n∈N, n > 1} ∪ {∃d : ∃c : n = dc ∧ d ≤ n ∧ c ≤ n ∧ ¬(d = 1) ∧ ¬(d = n)} ∪ {∀k∈N : k < n → (k > 1 → pprimes(k))} ⊢ ∃p1, p2 : n = p1 p2 ∧ p1 < n ∧ p2 < n ∧ pprimes(p1) ∧ pprimes(p2)

As in Hardy & Wright's proof, applying the specialisation method, the construction method, and breaking up the conjunctions yields:

{∀k∈N : k < n → (k > 1 → pprimes(k)), n > 1, n = dc, d ≤ n, c ≤ n, ¬(d = 1), ¬(d = n)} ⊢ n = P1 P2 ∧ P1 < n ∧ P2 < n ∧ pprimes(P1) ∧ pprimes(P2)

n = P1 P2 can be achieved by instantiating P1 with d and P2 with c, and using the hypothesis n = dc. The formula d < n is discharged by using the hypotheses d ≤ n and ¬(d = n) and lemma (3.7); similarly, c < n can be discharged using n = dc, ¬(d = 1), and lemma (3.8). We are left with pprimes(c) and pprimes(d), which must be dealt with using the complete induction method. Since we know that d < n, we can apply a form of Modus Ponens with the induction assumption to obtain d > 1 → pprimes(d) in the hypotheses. Because d > 1 can be easily shown, an application of Modus Ponens yields pprimes(d) in the list of hypotheses, and pprimes(d) can be discharged. Similarly, pprimes(c) can be obtained. The proof made use of the following trivial lemmata:

∀a∈N : a > 1 → ¬a = 1    (3.6)
∀a∈N ∀b∈N : (a ≤ b ∧ ¬a = b) → a < b    (3.7)
∀a∈N ∀b∈N ∀c∈N : (a = bc ∧ ¬b = 1) → c < a    (3.8)
∀a∈N ∀b∈N ∀c∈N : (a = bc ∧ ¬b = a) → c > 1    (3.9)

3.3 The Fundamental Theorem of Arithmetic (Uniqueness)

The uniqueness proof of the FTA is interesting because of its use of an advanced representational device, namely, the use of ellipsis.8 We therefore precede our analysis of Hardy & Wright's uniqueness proof with a discussion of the representation and operationalisation of elliptic constructions.

8 The word ellipsis is used in the typographical sense.

3.3.1 Using Ellipsis

Ellipsis is a frequently used device in mathematical language, allowing mathematicians to express complex mathematical statements in a very condensed way. Hardy & Wright’s uniqueness argument nicely illustrates this. The following definition of product of primes in standard form, for instance, contains several uses of ellipsis [68, p. 2]:

(9) [Hardy & Wright's definition of a product of primes in standard form, reproduced as a facsimile from [68, p. 2].]

A formal representation of this definition must capture all the elliptic constructions that are implicitly or explicitly used in (9), namely, ∃p1 . . . ∃pk, ∃a1 . . . ∃ak, n = p_1^{a_1} p_2^{a_2} . . . p_k^{a_k}, a1 > 0 ∧ a2 > 0 ∧ . . . ∧ ak > 0, p1 < p2 < . . . < pk, and prime(p1) ∧ prime(p2) ∧ . . . ∧ prime(pk). Moreover, we argue that the modelling of informal mathematical reasoning requires the capability to represent and operationalise elliptic constructions. This has been successfully achieved for a subclass of problems in λ-Clam.

A Representation for Elliptic Constructions in λ-Clam. In λ-Clam, elliptic constructions have been defined as ε-terms [22]. The ε-notation is used in a similar way to the mathematical use of Σ or ∏. The ε symbol is a polymorphic, second-order function of type nat → (nat → τ) → list(τ). If ε(N, F) = [F(1), . . . , F(N)], then N is the length of the list, and F is a function of type nat → τ. The second-order ε function applies the function F to each of the natural numbers i, 1 ≤ i ≤ N, and returns the list of its results. Formally, we have

ε(0, F) = nil
ε(s(N), F) = F(1) :: ε(N, λi. F(s(i))),    (3.10)

where nil designates the empty list, :: is the cons operator, and s is the unary successor function. The concatenation ε(M, F) ++ ε(N, G) of two ellipses ε(M, F) and ε(N, G) is defined as follows: ε(M, F) ++ ε(N, G) = ε(M + N, comb(M, F, G)), where comb is defined by:

comb(M, F, G)(i) = F(i)       if i ≤ M
                   G(i − M)   if i > M.

This definition can be portrayed, in elliptic notation, as [F(1), . . . , F(M)] ++ [G(1), . . . , G(N)] = [F(1), . . . , F(M), G(1), . . . , G(N)]. Using these devices for the representation of elliptic constructions, we can now write the notion of product of primes in standard form, or pop_sf, as

pop_sf(ε(M, F), n) ↔ n > 1
    ∧ ∀i∈N : 1 ≤ i ≤ M → prime(base(F(i)))
    ∧ ∀i∈N : 1 ≤ i ≤ M → exp(F(i)) > 0
    ∧ ∀i∈N : 1 ≤ i < M → base(F(i)) < base(F(s(i)))
    ∧ n = product(ε(M, F))    (3.11)


where F is a unary function of the form λi.p_i^{a_i}, exp(F(i)) = a_i, and base(F(i)) = p_i. Since n = 1 is not prime, we add the condition n > 1 to the definition of product of primes in standard form. The term product can be defined as

product(nil) ⇒ 1
product(H :: T) ⇒ mult(H, product(T))

Alternatively, the ε-term can be specialised to ε_π by replacing the cons operator :: in Def. 3.10 by the product operator ·. The definition of ellipsis concatenation does not need to be changed. In this alternative, the portrayal for ε_π(M, F) can be adapted to F(1) · . . . · F(M). Now, pop_sf can be defined as:

pop_sf(ε_π(M, F), n) ↔ n > 1
    ∧ ∀i∈N : 1 ≤ i ≤ M → prime(base(F(i)))
    ∧ ∀i∈N : 1 ≤ i ≤ M → exp(F(i)) > 0
    ∧ ∀i∈N : 1 ≤ i < M → base(F(i)) < base(F(s(i)))
    ∧ n = ε_π(M, F)    (3.12)

The ε_π operator will be used in our subsequent analysis. The proof in discourse (10) below uses three different elliptic constructions to represent products of primes as well as various linguistic expressions that refer to elliptic constructions or parts of them:

p_1^{a_1} p_2^{a_2} . . . p_k^{a_k}    (well-formed)    (3.13)
p_1^{b_1} . . . p_{i−1}^{b_{i−1}} p_{i+1}^{b_{i+1}} . . . p_k^{b_k}    (i-th element missing)    (3.14)
p_1^{a_1} . . . p_i^{a_i − b_i} . . . p_k^{a_k}    (i-th element malformed)    (3.15)

Obviously, (3.13) can be expressed easily, namely as ε_π(k, λi.p_i^{a_i}). The term (3.14) can be represented as the combination p_1^{b_1} . . . p_{i−1}^{b_{i−1}} ++ p_{i+1}^{b_{i+1}} . . . p_k^{b_k}, or

ε(i − 1, λj.p_j^{b_j}) ++ ε(k − i, λj.p_{i+j}^{b_{i+j}}) ≡ ε(i − 1 + (k − i), comb(i − 1, λj.p_j^{b_j}, λj.p_{i+j}^{b_{i+j}}))
                                                        ≡ ε(k − 1, comb(i − 1, λj.p_j^{b_j}, λj.p_{i+j}^{b_{i+j}})).

The term (3.15) can be represented as the combination p_1^{a_1} . . . p_{i−1}^{a_{i−1}} p_i^{a_i − b_i} p_{i+1}^{a_{i+1}} . . . p_k^{a_k}, or

ε(i − 1, λj.p_j^{a_j}) ++ ε(1, λj.p_{i−1+j}^{a_{i−1+j} − b_{i−1+j}}) ++ ε(k − i, λj.p_{i+j}^{a_{i+j}}).

For the uniqueness proof, it will prove useful to introduce the definitions of product of natural numbers pon(ε(M, F), n) and product of prime numbers pop(ε(M, F), n). These can be easily defined by dropping one or more of the aforementioned conditions of pop_sf(ε(M, F), n).
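The ε-term machinery above is essentially a small functional programme over lists, and it can be sketched in a few lines. The following Python fragment is our own reconstruction for illustration; the identifiers eps, comb, concat, product and is_standard_form are hypothetical (they are not λ-Clam's), and each list element F(i) is represented as a (base, exponent) pair so that base(F(i)) and exp(F(i)) remain recoverable.

    # Our own list-level reconstruction of (3.10)-(3.11); all names hypothetical.

    def eps(n, f):
        """eps(N, F) = [F(1), ..., F(N)], with 1-based indexing as in (3.10)."""
        return [f(i) for i in range(1, n + 1)]

    def comb(m, f, g):
        """comb(M, F, G)(i) = F(i) if i <= M, else G(i - M)."""
        return lambda i: f(i) if i <= m else g(i - m)

    def concat(m, f, n, g):
        """Concatenation of eps(M, F) and eps(N, G) as eps(M+N, comb(M, F, G))."""
        return eps(m + n, comb(m, f, g))

    def product(lst):
        """product(nil) => 1; product(H :: T) => mult(H, product(T))."""
        return 1 if not lst else lst[0] * product(lst[1:])

    def _prime(n):
        return n > 1 and all(n % d for d in range(2, n))

    def is_standard_form(factors, n):
        """The conditions of pop_sf (3.11): prime bases, positive exponents,
        strictly increasing bases, and the product equal to n."""
        bases = [b for (b, a) in factors]
        return (n > 1
                and all(_prime(b) for b in bases)
                and all(a > 0 for (b, a) in factors)
                and all(bases[i] < bases[i + 1] for i in range(len(bases) - 1))
                and product([b ** a for (b, a) in factors]) == n)

    # eps(3, F) with F(i) = (p_i, a_i) for 360 = 2^3 * 3^2 * 5:
    ps, exps = [2, 3, 5], [3, 2, 1]
    factors = eps(3, lambda i: (ps[i - 1], exps[i - 1]))
    assert factors == [(2, 3), (3, 2), (5, 1)]
    assert is_standard_form(factors, 360)
    assert concat(2, lambda i: i, 3, lambda i: 10 * i) == [1, 2, 10, 20, 30]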

3.3.2 Hardy and Wright's Uniqueness Proof

Discourse (10) depicts Hardy and Wright’s uniqueness proof for the Fundamental Theorem of Arithmetic along with the statement of a third theorem and its corollary [68, p. 3].

(10) [Hardy & Wright's uniqueness proof, together with their third theorem and its corollary, reproduced as a facsimile from [68, p. 3].]

We first give a more precise version of the corollary, which in fact makes two assertions, the latter being more specific than the first.

Corollary 1 If p is prime, and p | abc . . . l, then p | a or p | b or p | c . . . or p | l; or, in ε-notation, ∀p∈N : [prime(p) ∧ pon(ε(M, F), n) ∧ p | ε(M, F)] → ∃j∈N : 1 ≤ j ≤ M → p|F(j).

Corollary 2 If p is prime, and a, b, c . . . , l are prime, and p | abc . . . l, then p = a or p = b or p = c . . . or p = l; or, in ε-notation, ∀p∈N : [prime(p) ∧ pop(ε(M, F), n) ∧ p | ε(M, F)] → ∃j∈N : 1 ≤ j ≤ M → p = F(j).

We now go through Hardy & Wright's proof in formal detail. We start with the initial proof state: Γ ⊢ ∀n∈N : n > 1 → ∃!M∈N ∃!F∈(N→τ) : pop_sf(ε(M, F), n). Applying the choose method, or a combination of ∀-intro and →-intro, we obtain Γ ∪ {n∈N, n > 1} ⊢ ∃!M∈N ∃!F∈(N→τ) : pop_sf(ε(M, F), n).

The first proof sentence. Now, Hardy & Wright apply the uniqueness method. We can assume the existence of two product-of-primes representations for the arbitrarily chosen number n and then show that the given representations are actually identical. Formally, {n∈N, m2∈N, f2∈(N→τ)} ∪ {n > 1, pop_sf(ε(M1, F1), n), pop_sf(ε(m2, f2), n)} ⊢ M1 = m2 ∧ F1 = f2.


The direct uniqueness method can be explained in terms of Gentzen rules.

{n∈N, n > 1} ⊢_1 ∃!m∈N ∃!f∈(N→τ) : pop_sf(ε(m, f), n)
{n∈N, n > 1} ⊢_2 ∃m1∈N ∃f1∈(N→τ) : pop_sf(ε(m1, f1), n) ∧ ∀m2∈N ∀f2∈(N→τ) : pop_sf(ε(m2, f2), n) → m1 = m2 ∧ f1 = f2
{n∈N, n > 1} ⊢_3 pop_sf(ε(M1, F1), n) ∧ ∀m2∈N ∀f2∈(N→τ) : pop_sf(ε(m2, f2), n) → M1 = m2 ∧ F1 = f2

In ⊢_1 we have the initial proof obligation. It contains ∃! quantifiers that are expanded in ⊢_2. Then, the resulting existential quantifiers are eliminated by applying Gentzen's ∃-intro (or Solow's construction method). The conjunction can then be broken up, and then the proof obligation pop_sf(ε(M1, F1), n) can be discharged by the existence proof given earlier. We are thus left with:

{n∈N, n > 1} ⊢_4 ∀m2∈N ∀f2∈(N→τ) : pop_sf(ε(m2, f2), n) → M1 = m2 ∧ F1 = f2
{n, m2∈N, n > 1, f2∈(N→τ)} ⊢_5 pop_sf(ε(m2, f2), n) → M1 = m2 ∧ F1 = f2
{n, m2∈N, n > 1, f2∈(N→τ), pop_sf(ε(m2, f2), n)} ⊢_6 M1 = m2 ∧ F1 = f2

In the proof state ⊢_4, we can apply Gentzen's ∀-intro twice, yielding line ⊢_5. Then, we apply →-intro to obtain the last derivation ⊢_6. To prove M1 = m2 and F1 = f2, we obviously can (and need to) also use the fact that pop_sf(ε(M1, F1), n).

The second proof sentence. Hardy & Wright state that p_i | q_1^{b_1} . . . q_j^{b_j} for all i. This holds using Lemma 3 and the use of the equalities n = ε_π(M1, F1) and n = ε_π(m2, f2).

Lemma 3 ∀n∈N : pop(ε(M, F), n) → ∀i∈N : 1 ≤ i ≤ M → F(i)|n.

Then, Hardy & Wright use Corollary 2, together with the fact that pop_sf(ε(X, Y), n) → pop(ε(X, Y), n), to conclude that every p is a q. Analogously9, it can be proven that every q is a p.

The third proof sentence. Now, Hardy & Wright claim that k = j. Since we can map any p onto a q and vice versa, that is, there is a bijective mapping, there must be equal numbers of p's and q's, that is, the sets of the p_i and the q_j are equipollent, and therefore, k = j.

Lemma 4 ∀f1∈(A→τ) ∀f2∈(B→τ) : [∀i∈N : 1 ≤ i ≤ m1 → (∃j∈N : 1 ≤ j ≤ m2 → f1(i) = f2(j))] ∧ [∀j∈N : 1 ≤ j ≤ m2 → (∃i∈N : 1 ≤ i ≤ m1 → f2(j) = f1(i))] → |A| = |B|

This lemma can be used to discharge the proof obligation M1 = m2. The second proof obligation requires two function terms to be tested for equivalence. If F1 = λi.p_i^{a_i} and F2 = λi.q_i^{b_i}, then we have to prove that F1(i) is equal to F2(i), that is, p_i = q_i and a_i = b_i for every i with 1 ≤ i ≤ k. Using the strict ordering conditions p1 < p2 < . . ., q1 < q2 < . . ., and the results from discharging M1 = m2, we can conclude that p_i = q_i for all i.

9 For the use of symmetry in proofs, see Jim Molony's PhD thesis [119].


The remainder of the proof. λ-Clam is left with the remaining proof obligation that a_i = b_i for all i, which is discharged as follows. We assume a_i ≠ b_i and have to derive a contradiction. Two cases have to be considered, a_i < b_i and b_i < a_i. Since a contradiction can be derived for each of the cases, we have (¬(a_i > b_i)) ∧ (¬(b_i > a_i)), and therefore, a_i = b_i for all i. A set of statements is used that captures the divisibility properties for pop_sf, namely,

pop_sf(ε(k, λj.p_j^{a_j}), n) → divide(ε(k, λj.p_j^{a_j}), p_i^{a_i}, ε(k − 1, comb(i − 1, λj.p_j^{a_j}, λj.p_{i+j}^{a_{i+j}})))    (3.16)

pop_sf(ε(k, λj.p_j^{a_j}), n) ∧ a_i > b_i →
    divide(ε(k, λj.p_j^{a_j}), p_i^{b_i},
        ε(i − 1, λj.p_j^{a_j})    (3.17)
        ++ ε(1, λj.p_{i−1+j}^{a_{i−1+j} − b_{i−1+j}})
        ++ ε(k − i, λj.p_{i+j}^{a_{i+j}}))    (3.18)

and

pop_sf(ε(k − 1, comb(i − 1, λj.p_j^{a_j}, λj.p_{i+j}^{a_{i+j}})), n / p_i^{a_i}) → ¬divisible(ε(k − 1, comb(i − 1, λj.p_j^{a_j}, λj.p_{i+j}^{a_{i+j}})), p_i)    (3.19)

The statements (3.16) and (3.17) hold for all k∈N and for all i such that 1 ≤ i < k; (3.19) holds for all k∈N and for all i such that 1 ≤ i < k − 1.

3.4 Hardy and Wright's Proof of Theorem 3

For the sake of completeness, we now briefly analyse Hardy & Wright's proof of THEOREM 3, which THEOREM 2 depends on. The logical form of Theorem 3 is ∀p∈N ∀a∈N ∀b∈N : prime(p) ∧ p | ab → p | a ∨ p | b. Its general form has already been given as Corollary 1, which we repeat for convenience: ∀p∈N : [prime(p) ∧ pon(ε(M, F), n) ∧ p | ε(M, F)] → ∃j∈N : 1 ≤ j ≤ M → p|F(j). Hardy & Wright prove the specific case only [68, p. 21]. Their proof is depicted in (11).

(11) [Hardy & Wright's proof of Theorem 3, reproduced as a facsimile from [68, p. 21].]

The initial proof state is Γ ⊢ ∀p∈N ∀a∈N ∀b∈N : prime(p) ∧ p | ab → p | a ∨ p | b. If we apply ∀-intro for each of the three universal quantifiers, then apply the proof by elimination method, and then break up the conjunction prime(p) ∧ p | ab into its constituents, we obtain Γ ∪ {p∈N, a∈N, b∈N, prime(p), p | ab, ¬p | a} ⊢ p | b. This uncovers the underlying logical structure of the first proof sentence and explains the assumption made in the second, ¬p | a. Now a directed line of forward reasoning departing from ¬p | a follows. We will need these lemmata:


Lemma 5 ∀a∈N ∀b∈N : prime(a) ∧ ¬a|b → gcd(a, b) = 1.
Lemma 6 ∀a∈N ∀b∈N ∀d∈N : d|a → d|ab.
Lemma 7 ∀a∈N ∀b∈N ∀d∈N ∀x∈N ∀y∈N : d|a ∧ d|b → d|xa + yb.
Lemma 8 ∀a∈N ∀b∈N ∀x∈N : x|a ∧ a = b → x|b.
Lemma 9 ∀a∈N ∀b∈N ∀c∈N : gcd(a, b) = c → ∃x∈N ∃y∈N : xa + yb = c.

Applying Lemma 5 to the last proof state yields {p∈N, a∈N, b∈N, prime(p), p|ab, ¬p|a, gcd(a, p) = 1} ⊢ p|b. The resulting formula gcd(a, p) = 1 can then be rewritten by applying Lemma 9: {p∈N, a∈N, b∈N, prime(p), p|ab, ¬p|a, ∃x∈N ∃y∈N : xa + yp = 1} ⊢ p|b. Continuing forward reasoning, we apply the specialisation method twice: {p∈N, a∈N, b∈N, prime(p), p|ab, ¬p|a, X∈N, Y∈N : Xa + Yp = 1} ⊢ p|b. Following the informal argument, we multiply both sides of the resulting equality by b: {p∈N, a∈N, b∈N, prime(p), p|ab, ¬p|a, X∈N, Y∈N : Xab + Ypb = b} ⊢ p|b. Now, Lemma 6 is used twice: from p|ab we conclude p|Xab, and from p|pb we conclude p|Ypb. Lemma 7 is applied once, yielding p|Xab + Ypb; Lemma 8 is then applied to finish the proof.
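The computational content of Lemma 9 is the extended Euclidean algorithm, and the chain of lemma applications above can be replayed numerically. The following Python sketch is our own illustration under that reading; it is not part of Hardy & Wright's text, and the function names are hypothetical.

    # Our own numerical replay of the Theorem 3 argument; extended_gcd is a
    # standard rendering of the algorithm behind Lemma 9.

    def extended_gcd(a, b):
        """Return (g, x, y) with x*a + y*b == g == gcd(a, b)."""
        if b == 0:
            return a, 1, 0
        g, x, y = extended_gcd(b, a % b)
        return g, y, x - (a // b) * y

    def euclid_lemma_witness(p, a, b):
        """Given prime p with p | a*b and p not dividing a, exhibit why p | b:
        from X*a + Y*p = 1 (Lemma 9) we get b = X*a*b + Y*p*b, and p divides
        both summands (Lemmata 6 and 7), hence p | b (Lemma 8)."""
        assert (a * b) % p == 0 and a % p != 0
        g, x, y = extended_gcd(a, p)
        assert g == 1                              # Lemma 5: gcd(a, p) = 1
        assert x * a + y * p == 1                  # Lemma 9
        assert x * a * b + y * p * b == b          # multiply both sides by b
        assert (x * a * b) % p == 0 and (y * p * b) % p == 0
        return b % p == 0

    assert euclid_lemma_witness(7, 10, 21)   # 7 | 210 and 7 does not divide 10, so 7 | 21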

3.5 Conclusion

In this chapter we analysed a set of informal proofs in great detail in order to disclose their underlying logical structure. This is a complex task. As our discussion shows, many reasoning steps were omitted in the original informal proofs, and many choices had to be made to make them fully explicit. Therefore, "staying as close as possible" to a textbook proof is not a well-defined goal, and in this thesis we make no claim about the particular appropriateness of our formalisation. However, one could define a formalisation as appropriate if the textbook proof can be embedded in the resulting formal proof. A textbook proof is embedded in a formal proof if each informal mathematical assertion has a corresponding formalised occurrence in the formal proof. The notion of embed-ability on the assertion level could be complemented by the notion of a structure-preserving mapping on the proof plan level. For instance, we failed to give a formal representation for Hardy & Wright's existence proof because we lacked an adequate proof plan in λ-Clam. The proofs by Solow (discourse 7) and LeVeque (discourse 8) can be seen as free reformulations of Hardy & Wright's proof, and subsequently, a formal construction for LeVeque's existence proof was given. Our formalisation of LeVeque's informal representation preserves its structure on the proof plan level. Hardy & Wright's informal existence proof, however, cannot be mapped to our formalisation of LeVeque's proof, because they do not share the same high-level argument structure.

Our formalisation shows that Gentzen's rules of reasoning were adequate to model most of the mathematical reasoning of the informal textbook proofs at hand. At some points we were able to use higher-level rules of reasoning, for instance the induction method and the direct uniqueness method. As we demonstrated, those complex argument patterns can be expressed in terms of Gentzen's inference rules.


Without human guidance, λ-Clam, as well as other proof systems, will have considerable difficulties proving the FTA or constructing mathematical arguments of similar complexity. For instance, a source that may cause infinite looping is the repeated expansion of recursive definitions, say pprimes. A natural idea therefore is that a textbook proof may guide a proof system through the large space of possible inferences; this is one of the underlying ideas of this dissertation work. In this chapter, the author served as a mediator between textbook proof and proof system. The manual mapping of the given proofs in λ-Clam uncovered a number of weaknesses of λ-Clam, which we briefly discuss now.

Forward/Backward Reasoning. An important issue in mathematical reasoning is that it incorporates both forward and backward reasoning. The embedded lemma in Hardy & Wright's existence proof is a good example of switching between forward and backward reasoning. Having in the hypotheses a formula that asserts the existence of divisors, and the goal to prove that there exists some number p that is prime, a lemma is sought that states that the least divisor of any number is prime. An analysis of λ-Clam's pre-defined methods suggests that its developers predominantly favoured goal-directed reasoning over forward reasoning. A large time investment was necessary to allow forward reasoning inferences. λ-Clam would profit from the design of methods that could be applied in both forward and backward reasoning directions.

Technicalities of Formal Proof. In rigorous proofs, problems arise that do not arise on a more abstract level of reasoning. For instance, in formal proof the order of rule application is crucial to adhere to the eigenvariable condition. If rules are applied in an order that leads to the violation of eigenvariable conditions, then proof search will require backtracking, and this unnecessarily widens the exploration of the proof's search space. Interestingly, human mathematicians handle the eigenvariable condition almost automatically. λ-Clam's design would profit from the implementation of proof plan critics that allow both the detection of such eigenvariable violations as well as their automatic repair.

Rewrite Rules. In mathematical proofs, one often expands a definition, applies a lemma, or performs purely logical transformations. In λ-Clam these activities conflate to the operation of term rewriting. λ-Clam would profit from classifying rewrite rules into three categories, namely, definitional expansion, lemma application, and rewriting of logical signs. Ideally, heuristics can be defined and encoded in proof plans that guide λ-Clam in performing these different rewriting activities. For instance, after having rewritten a definition, that is, the definiendum into the definiens, a rewriting of logical signs in the definiens into a normal form should be tried. This occurred, for instance, in both LeVeque's and Hardy & Wright's existence proofs of the FTA.

Consequently, the success of mechanically mapping informal proofs to formal proofs largely depends on the proof system that is used to construct such formal proofs. Another basic ingredient, of course, is the capability of interpreting the informal English input statements and translating them into appropriate logical forms. In the next chapter, we will analyse informal mathematics, including some of the mathematical arguments that we discussed in this chapter, from a linguistic perspective.

Chapter 4

A Linguistic Analysis of Proofs

In the preceding chapter, we analysed informal proofs from a mathematical reasoning perspective. In the present chapter, we put on a linguistic hat and study their language. The goal is to identify and classify the linguistic problems that we have to resolve in order to transform an informal mathematical argument into a formal one. In some ways, as we will see, mathematical discourse is simpler to process than more general types of discourse. Its well-defined lexical entries, its frequent use of formulaic expressions and its logical structure, which we examined in the previous chapter, facilitate the understanding process. On the other hand, various aspects particular to mathematical discourse add complexity to its processing, for instance, the mixture of symbols and text and the very subtleness of expression.

In sect. 4.1, we start with a number of general remarks on mathematical writing. The subsequent sections then analyse the mathematical language on a term, sentence and discourse level. Sect. 4.2 discusses denoting and naming in mathematical discourse. It is followed by a classification of linguistic phenomena in sect. 4.3. In this section, we focus on anaphoric and elliptic constructions and also present an extended discussion of abstract discourse entities. In sect. 4.4, we review the use of connectives and conditionals in mathematical discourse. Conjunctions, disjunctions and negations are only briefly examined. Due to their importance, conditional and pseudo-conditional constructions are discussed in greater detail.

4.1 General Remarks on Mathematical Writing

There is no universal language of informal mathematics. In German speaking countries, the language of informal mathematics shares the lexical entries and grammar rules of everyday German; in French speaking countries, mathematical proofs contain French words and follow French grammar rules. Everyday natural language and the language used by mathematicians differ in several ways, however.

Lexicon: The basic units of the informal mathematical language are not only words but symbols of a wide variety. Moreover, the lexical units, because they are used in a scientific setting, come with precise definitions. Also, the proper use of words and symbols is governed by convention. Before using a new concept, mathematicians explicitly define it; and a mathematician has a clear, prescribed method for how to define new lexical entries. As a matter of fact, it is commonplace for a mathematician to review, extend, or redefine his/her language within a complex mathematical argument or tractatus. In the case of redefinition, the previous meaning of a word or symbol is overridden or shadowed, or the word or symbol is overloaded with a set of meanings. In addition, the symbol and word definitions mirror the hierarchical structure of the concepts they denote in a mathematical theory. This argument, of course, is valid for any form of scientific discourse. However, the art of defining lexical entries has achieved its highest standard in the mathematical sciences.

Grammar: Symbolic mathematical expressions are composed of mathematical (constant, variable, function, predicate, and special) symbols in a well-defined way. Grammar rules govern the combination of words, terms and formulae to form well-formed mathematical statements. The syntactic constructions of informal mathematical discourse are relatively easy, stylised or formulaic, and more or less in line with English grammar rules. There is even a booklet by J. Trzeciak that contains hundreds of standard phrases that allow non-native speakers of English to write mathematical arguments in English [156]. Fig. 4.1 depicts a selection of those standard constructions. Since the art of writing good mathematical texts focuses on clearness and conciseness and not on an embellished style of expression, most mathematical arguments could be expressed by instantiating and combining these textual components. Moreover, due to the scientific and timeless nature of mathematics, a mathematical argument is usually stated in the present tense using the active voice.1 Assertions are its most common and basic ingredient; imperatives and questions are rarely used. Frequently, verbs in the subjunctive mood are used to state hypothetical assertions.

Discourse: As we have observed in ch. 3, textbook proofs are written in a highly structured form of discourse. Common patterns of reasoning, as captured with proof plans in ch. 3, govern the segmentation of a discourse and help to identify the relations between its parts. They also enable us to predict, in a given discourse context, future discourse continuations. Linguistically, a variety of cue phrases that signal structure contribute to the understanding of mathematical discourse. For example, the cue "similarly" indicates a parallel construction, "but" a contrast, and "which concludes the first case" a segment termination.

As we shall see in this chapter, the language of mathematics can not only be used to communicate ideas effectively and in a precise fashion, but also in a concise, tremendously efficient manner. Surprisingly, the literature on the art of mathematical writing is sparse. We have already mentioned Trzeciak's catalogue of syntactic constructions, which a mathematician can instantiate and assemble into complex mathematical arguments [156]. In § 2.2.4, we briefly reviewed Lamport's hierarchical style for presenting mathematical arguments [103]. Proofs in Lamport's presentation style are multi-layered. The top level gives a high-level and informal presentation of the argument; the embedded levels provide increasingly more detail and formal rigour. The proof is therefore accessible to the reader at different levels of abstraction. Lamport-style proof presentation is the most revolutionary approach for communicating mathematical proofs. It contains, however, the basic ingredients of well-written mathematical arguments, which are also advocated by mainstream mathematicians.

A good style in mathematical writing, which is directed at a clear exposition of mathematical ideas, is advocated and discussed by examples in van Gasteren [162]. The monograph also includes a section on "what to name and how to name". Monographs of Krantz [101] and Higham [75] as well as a technical report of Knuth et al. [98] contain rules that a writer should follow, for example: Avoid introducing unnecessary notation and use notation consistently! Minimise the use of cumbersome notation! Krantz argues that because mathematics is already logically complex and subtle, simple and declarative sentences should be used to express it. A mathematical argument is displayed both in a bad style and in a good style below [101].

(12) a. Bad: If g is positive, f is continuous, the domain of f is open, and we further invoke Lemma 2.3.6, then the set of points at which f · g is differentiable is a set of the second category, provided that the space of definition of f is metrizable and separable.
     b. Good: Let X be a separable metric space. Let f be a continuous function that is defined on an open subset of X. Suppose that g is any positive function. Define S = {x : f · g is differentiable at x}. Then, by Lemma 2.3.6, S is of second category.

1 Although the past tense and passive voice are frequently used in historical statements, for instance, "Gauss was the first to develop arithmetic as a systematic science" or "Theorem 7 was proved by Tchebychef about 1850".


Figure 4.1: Standard Phrases in English Mathematical Discourse (a multi-column collection of phrase templates such as "If ____, then ____." and "This completes the proof.", grouped under the headings "Stating the theorem", "Stating proof method or proof omission", "Reasoning", and "Stating proof termination").


In contrast to the bad-style version (12a), the good-style version is not only easier for humans to read, it is also easier to process mechanically from a grammar-engineering point of view.2 Another point can be made for the following example, which is taken from Krantz [101].

(13) a. Every nonnegative real number has a square root.
     b. ∀x∃y, x ≥ 0 ⇒ y² = x.

Krantz argues that (13a) should be preferred to (13b). In the same school of thought is Higham, who suggests using symbols only if the idea would be too cumbersome to express in words, or if it is important to make a precise mathematical statement. Words should be used as long as they do not take up much more space than the corresponding symbols [75, p. 24f]. Note however that (13a) and (13b), although mathematically equivalent, are linguistically very different from each other.3 The following pair of sentences shows a similar case:

(14) a. The number of primes is infinite.
     b. ∀x : prime(x) → ∃y : y > x ∧ prime(y).

Sentence (14a) is the one a mathematician might prefer in theorem statements. The logical form in (14b) is likely to be the one used in proving this statement.4 The question that we need to address is, how can we ensure that the semantic construction of the natural language phrase (13a) exactly yields the predicate logic formula (13b), or at least, that semantic construction of both sentences yields the same natural result? To elaborate, consider an example by Rosser, which demonstrates some of the basic difficulties of translating a statement in mathematics to a statement in symbolic logic [135, p. 96f].

(15) a. f(x) is continuous at the point x.
     b. for each positive ε there is a positive δ such that whenever |y − x| < δ we have |f(y) − f(x)| < ε.
     c. for each positive ε there is a positive δ such that (∀y : |y − x| < δ → |f(y) − f(x)| < ε).
     d. for each positive ε (∃δ : δ > 0 ∧ (∀y : |y − x| < δ → |f(y) − f(x)| < ε)).
     e. ∀ε : ε > 0 → (∃δ : δ > 0 ∧ (∀y : |y − x| < δ → |f(y) − f(x)| < ε)).

The translation from (15a) to (15b) is simply a definitional expansion on continuous. The translation from (15b) to (15c) has two interesting points. First, constructions of the form whenever A we have B translate as implications A → B. Second, our resulting expression (15c) considers the y in whenever |y − x| < δ we have |f(y) − f(x)| < ε as a quantified variable, and not as an unknown. This is due to whenever, which gives y a universal reading. If we said instead if |y − x| < δ, then |f(y) − f(x)| < ε, we would translate it as |y − x| < δ → |f(y) − f(x)| < ε, an expression that leaves y unbound. The step from (15c) to (15d) reads the expression there is a positive δ such that F(δ) as δ having simultaneously the two attributes "δ is positive" and F(δ). The translation from (15d) to (15e), however, needs to be different. Contrary to the existential case, the universally quantified expression for each positive ε must not be translated as ∀ε : ε > 0 ∧ Q(ε), since this would assert that ε is both positive and possesses the property Q(ε). Here, the intended interpretation needs to be ∀ε : ε > 0 → Q(ε). These steps transform the logical form continuous_at_point(f(x), x) of (15a) into the symbolic logic statement (15e). The semantics of the resulting expression is as precise as the semantics of its elementary parts, namely, ε > 0, δ > 0, |y − x| < δ, and |f(y) − f(x)| < ε.

2 Krantz's point is attackable if one uses typographic means to structure the argument. The bad-style argument (12a) can be depicted in a nicely set form:
    If   g is positive,
         f is continuous,
         the domain of f is open, and
         we further invoke Lemma 2.3.6,
    then the set of points at which f · g is differentiable is a set of the second category,
    provided that the space of definition of f is metrizable and separable.
An anonymous referee of one of my papers made this comment. In this dissertation, however, typographic information is neither studied nor used.
3 (13b) is a rather manufactured way of formalising (13a). Some rewriting is necessary to show their mathematical equivalence: x has a square root, say y → y is a square root of x → y = √x → y² = x.
4 The translation of "infinite" depends on the theory in which it is used. For example, in the negative integers, we need to substitute ">" by "<".

As pointed out earlier, mathematical discourse should be constructed of simple, declarative and stylised sentences, each of which should be composed of well-defined words and symbols. This has been the case in the sentences (12b), (13a), (14a) and (15a). Making this assumption the underlying premise for the construction of a text understander, however, would be an over-simplification. In practice, the language used in mathematics is sufficiently rich to raise severe problems for an automated text understander. As the following two examples show, there is a wide variety of possibilities for expressing the same mathematical fact, in more or less complex syntactic constructions:

(16)

     a. Th. 2 (The fundamental Theorem of Arithmetic) [68, p. 3]. The standard form of n is unique; apart from rearrangement of factors, n can be expressed as a product of primes in one way only.
     b. Theorem on the unique prime decomposition [71, p. 3]. Every natural number a possesses one and only one representation a = p1 . . . pn as a product of (not necessarily distinct) primes p1, . . . , pn.

Building a natural language understander that is able to parse (16a) and (16b) and to show their semantic equivalence is a very difficult matter. As we will see, the full range of linguistic phenomena that we know from everyday English discourse is also present in informal mathematical discourse. Moreover, complexity is added that stems from the use of symbolic expressions in textual environments. However, we believe that the highly structured nature of both mathematical domain and mathematical discourse, complemented with the use of advanced reasoning methods, more than compensates for the added linguistic complexity. Before we discuss a range of linguistic phenomena that is specific to mathematical writing, we give a thorough account of denoting in mathematical discourse.

4.2 Denoting in Mathematical Discourse

The manner of denotation found in mathematical discourse is influenced by the fact that mathematicians reason about objects that do not exist in the physical world. Because their objects of study are abstract entities, residing in the Platonic universe, referring by gesture ("this one, I mean"; "the one over there") is not possible. All that mathematicians have are representations of these objects and not the objects themselves. This requires mathematicians to use representations very carefully. To counter-balance the lack of "physical" deixis, they need a language that allows them to form representations that describe abstract objects and their properties in a precise and non-ambiguous manner. As a matter of fact, the expert language of mathematics has plenty of "non-physical", deictic constructions that can refer to object representations and their parts.

The use of names for variables is a representational means that is important in mathematical language and reasoning. Having names available for variables enables one to show explicitly and concisely which part of an argument depends on other parts of the argument, and which part does not. This allows for disentanglement and decomposition of the argumentation. In [103], Lamport gives the following example:

(17) a. There do not exist four positive integers, the last being greater than two, such that the sum of the first two, each raised to the power of the fourth, equals the third raised to the same power.
     b. There do not exist positive integers x, y, z, and n, with n > 2, such that x^n + y^n = z^n.

As Lamport points out, (17a) reflects the style of seventeenth century mathematicians, and (17b) is the modern version. Variables are given names, and formulas are written in a more structured fashion.


The use of names for variables, and the use of symbols in general, has indeed revolutionised mathematics. The symbolisation and its impact on the progress in mathematics is discussed in Krämer's well-written monograph [100]. In [101, p. 25], Krantz gives a brief tabular overview of the first use in print of some well-known mathematical symbols. There is also an excellent on-line resource on the earliest uses of symbols in the mathematical disciplines as well as the earliest uses of constant, variable, and predicate symbols.5

Assigning names to entities is one of the basic activities in modern mathematical writing. Since many symbols now have their conventionalised default denotation, naming must respect existing notational conventions. Choosing the right notational means is important. Take, for example, the Physics formula E = mc². Even in a non-Physics context, the use of E, m, and c in E = mc² suggests that E denotes energy, m denotes mass, and c denotes the velocity of light. Einstein's famous formula would be hardly recognisable if different names for the constant and the two variables were used, say, by writing it as A = BC². The proper use of names is complemented by the proper structural representation of a symbolic expression. Although the formulae E = mcc, mcc = E, c²m = E, and E = cmc all express the same relation between energy, mass and velocity, E = mc² is easier to recognise because it is the conventionalised way to express this relation.6

Any mathematical expression should have a denotation and sometimes can have a connotation attached to it. The formula E = mc² has not only its literal, primary meaning but also refers to other entities suggested or implied by that formula, like Albert Einstein or the theory of relativity. In the following section, we will analyse in greater detail the basic building blocks for writing symbolic expressions, namely, constant, variable, function and predicate symbols.

4.2.1 Constant Symbols and other Proper Names

A constant symbol denotes exactly one entity. In linguistic terminology, therefore, constant symbols are called proper names. Constant symbols and proper names can denote several types of objects.

Numbers: The constant name 2 refers to the number 2, and since in the Platonic universe there is exactly one entity 2, we have that 2 unambiguously denotes it. The constant name 2 is not the only reference to 2. Amongst the infinitely many alternative representations for 2 are ||, 00000010 and 5 − 3.

Sets: The symbol N denotes the set of natural numbers; Z denotes the set of integers; the primes denotes the set of all prime numbers; the Fermat numbers denotes the set of all Fermat numbers.

Functions: The symbol sin names one of the three fundamental trigonometrical functions. It denotes the unary function of an angle that returns the constant ratio of the length of the side of a right triangle opposite that angle to the length of the hypotenuse. The symbol gcd denotes the binary function that returns the greatest common divisor of two given numbers.

Theorems: The expressions Euler's first theorem, Fermat's last theorem, and The fundamental theorem of arithmetic are all definite descriptions. However, they function as proper names by uniquely identifying their respective propositional antecedent independently from any specific discourse context.

Algorithms: The expressions the sieve of Eratosthenes (number theory), Quicksort (computer science), and the Ford-Fulkerson Max Flow Labeling Algorithm (graph theory) properly denote the respective algorithms.

The use of names is often specific to a particular sub-discipline of mathematics. In group theory, e usually denotes the neutral element of a group. In number theory, e denotes the Euler number. However, within a particular mathematical field, it is possible to "shadow" or redefine the denotation of a proper name. To give a bad example, in some contexts e could be redefined to name a variable or an unknown.

5 The URL is http://members.aol.com/jeff570/mathsym.html.
6 For mathematicians, the formulae e = cmc or ec = mc are rather weird constructions. They prefer to collect terms into one expression (e = cmc → e = mcc → e = mc²) and isolate the "interesting" term (e is related to mc²). This also relates to compact or minimal representations: 7 is a short cut for s(s(s(s(s(s(s(0))))))), or |||||||.

4.2.2 Variables

This section profits from Rosser's discussion of unknowns and variables [135, p. 82ff], Epstein's monograph [46], Frege's footnote on Russell's notion of a variable [163] and Schoenfeld & Arcavi's article on the meaning of variables [145]. The concept of a variable is central to mathematics. Any text understander, be it a mathematician, a student, a teacher or a machine, must cope with its multiple meanings, connotations and uses. Surprisingly, textbooks on mathematics rarely explain the notion of a variable in much detail, in contrast to textbooks on computer science (particularly about programming languages) and, of course, logic. To illustrate the complexity of the notion of a variable from the computer science perspective, we briefly describe the use of variables in programming languages. As we shall review below, other disciplines, especially logic and (informal) mathematics, use the concept of a variable differently.

4.2.2.1 Variables in Programming Languages

Programming languages are formal languages, and this fact not only greatly facilitates but enables the processing of algorithms written in these languages. Any programming language that is being used in practice and that the author is aware of incorporates some notion of a variable. However, these conceptions of variables may vary considerably between programming languages. A variable may often be referred to as an identifier, a name or a reference (Pascal, C). In Prolog, a variable is simply called a variable. The Scheme report has the following paragraph about variables [95, p. 6]:

    "An identifier may name a location where a value can be stored. [...] An identifier that names a location is called a variable and is said to be bound to that location. The set of all visible bindings in effect at some point in a program is known as the environment in effect at that point. The value stored in the location to which a variable is bound is called the variable's value. By abuse of terminology, the variable is sometimes said to name the value or to be bound to the value. This is not quite accurate, but confusion rarely results from this practice."

In the programming language C, a variable denotes a memory location of a well-defined size. The value of a C variable is the content of the memory cell(s) it denotes. A C assignment operation overwrites the content of the memory cell(s) a variable refers to with a specific value. In C, given an identifier x of some type, the expression x accesses its value and &x denotes (the beginning of) its memory address; if x is a pointer, ∗x accesses the value stored at the location x points to. In Prolog, as the SICStus Prolog Manual points out, "a variable should be thought of as standing for some definite but unidentified object", analogous to the use of pronouns in English [1, p. 42]. In contrast to other programming languages, where a variable simply denotes a writable storage location, a Prolog variable is a local name for some data object, and no access is given to its location. Unlike Prolog, most languages provide global and local variables that come with scope and extent and corresponding notions of accessibility (variable shadowing, variable capturing, variable binding etc.). Variables in these languages need to be defined before their first use (Lisp, Pascal, C) and given a type (Pascal, C). Language interpreters or compilers generate errors if they encounter a variable that has not been declared or defined beforehand, that is declared, defined or accessed with improper syntax, or that is assigned a value of the wrong type. Variables can serve as parameters or arguments of a function, method or procedure. Each programming language has its protocol for parameter passing that precisely specifies how to pass the value of a variable (call-by-value), its memory location (call-by-reference) or its name (call-by-name) from one part of a program to another. A variable in programming is not automatically an entity that continually decreases, increases or changes its value. Rather, it denotes a memory location or serves as a local name for a data object. Most languages, however, provide language constructs for a loop variable that captures the notion of a variable as an entity that incrementally changes its value in discrete steps.
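Several of these notions, in particular binding, shadowing, and the loop variable that changes its value in discrete steps, can be illustrated compactly. The following sketch uses Python rather than the languages discussed above, purely for illustration; the names x, scope_demo, total and i are invented for the example.

x = 7                          # the name x is bound to a value

def scope_demo():
    x = 42                     # a local x shadows the global binding
    return x

total = 0
for i in range(1, 5):          # i is a loop variable: 1, 2, 3, 4 in discrete steps
    total += i

print(x, scope_demo(), total)  # prints: 7 42 10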


4.2.2.2 Variables in Informal Mathematics and Logic

The notion of a variable in informal mathematics is complex. Any text understander needs to be aware of the different uses of variables in mathematical discourse and their respective linguistic realisations. As Schoenfeld and Arcavi say, “in mathematics, one talks about numbers and quantities, and among them those which are changing or varying, and those which are constant, known, unknown, given etc.” [145]. In writing the Principia Mathematica, Whitehead and Russell aimed to devise a formalisation of mathematics, which therefore included a clear definition of the notion of a variable. Its complex nature is illustrated in the following citation [173, p. 4f]. “To sum up, the three salient facts connected with the use of the variable are: (1) that a variable is ambiguous in its denotation and accordingly undefined; (2) that a variable preserves a recognisable identity in various occurrences throughout the same context, so that many variables can occur together in the same context each with its separate identity; and (3) that either the range of possible determinations of two variables may be the same, so that a possible determination of one variable is also a possible determination of the other, or the ranges of two variables may be different, so that, if a possible determination of one variable is given to the other, the resulting complete phrase is meaningless instead of becoming a complete unambiguous proposition (true or false) as would be the case if all variables in it had been given any suitable determinations.”

Understanding informal mathematical discourse requires the identification of the variables it contains as well as their type, scope and quantification. This is a complex task because the different uses of variables are rarely made explicit in the language that refers to them. Variables can be expressed both symbolically and verbally. Moreover, type, scope and quantification information is usually given in an implicit manner. Consequently, to determine the nature of a variable, it is usually necessary to take into account the meaning of the statement in which it occurs or its wider context. This is in stark contrast to the uses of variables in logic, where the syntactic form of a (well-defined) statement alone defines the variables it contains as well as their type, scope and quantification. Consequently, "variable processing" is considerably less complex in logic than in informal mathematics. There are two types of variables in first order predicate logic, quantified variables and unquantified, or free, variables. In informal mathematics, those two types of variables are complemented by unknowns or indeterminates and other kinds of variables, which we now discuss in varying detail.

Quantified Variables and Free Variables. In mathematics, the statement x² − 1 = (x + 1)(x − 1) is true for all values of x. That is, the denotation or value of x can vary without affecting the truth of the assertion that contains it. Although the universal character of x is not made linguistically explicit in x² − 1 = (x + 1)(x − 1), the meaning of its parts (e.g., the meaning of the equality sign, the multiplication operator) suggests a universal reading of x. In predicate logic, the symbol x is considered a free variable in the equation x² − 1 = (x + 1)(x − 1). In order to make the universal character of x explicit, we need to write ∀x : x² − 1 = (x + 1)(x − 1). As we have seen in our mathematical analysis of textbook proofs, a variable can change its status. Within a proof, the same variable can appear quantified and free. Reconsider LeVeque's existence proof of the Fundamental Theorem of Arithmetic.

(18) THEOREM 2-2. Every integer a > 1 can be represented as a product of one or more primes.
Proof: The theorem is true for a = 2. Assume it to be true for 2, 3, 4, . . . , a − 1. If a is prime, we are through. Otherwise a has a divisor different from 1 and a, and we have a = bc, with 1 < b < a, 1 < c < a. The induction hypothesis then implies that

b = ∏_{i=1}^{s} p_i,   c = ∏_{i=1}^{t} p′_i,   with p_i, p′_i primes, and hence a = p_1 p_2 . . . p_s p′_1 . . . p′_t.


The theorem statement makes the introduction of a universally quantified and named variable a linguistically explicit. However, the scope of a is limited to the theorem statement only. As our mathematical analysis in § 3.2.2 reveals, all occurrences of a in the proof are free variables. Readers that are unaware of the logical structure of the proof might find the use of a in the theorem misleading. First, because the theorem has only one occurrence of a and no references to a, there is no presentational need to name the variable. Second, inexperienced readers may incorrectly assume that all occurrences of a in discourse (18) refer to the same mathematical entity. Experienced readers interpret LeVeque's use of a in the theorem sentence as a "fore-shadowing" of a's use in the proof. Since the "every integer" in the theorem is called a, it is automatically understood that this a is used as the arbitrary a, silently introduced in the beginning of the proof, in accordance with the usual treatment of proofs for universally quantified formulas. This enables the writer not to begin the proof with "Let a be an arbitrary integer greater than 1", thus saving repetition, time and space. Experienced readers of mathematical texts are acquainted with this convention. Inexperienced readers may not be aware of this and may find the writer's style sloppy, blurring the difference between quantified variables and free variables. In [94], Karttunen points out that the existential quantifier has the dual function of asserting existence, thus binding a variable, and of introducing a constant that can figure in subsequent discourse. Taking the previous discussion into account, the same can be said of the universal quantifier, the let statement and the fore-shadowing transformation of universal variables into free variables.

Indeterminates or Unknowns. In mathematics, the x in x² − 4x + 3 = 0 is called an unknown or indeterminate. An unknown or indeterminate does not vary in its denotation, and its value(s) is/are still to be determined. Identifying the value of the unknown x in x² − 4x + 3 = 0 is easy. The statement is true only if x denotes 1 or 3. Now, the unknown x becomes a name that refers unambiguously to either 1 or 3; x can be considered their placeholder. If we look at the form of the equation x² − 4x + 3 = 0 only, i.e., ignoring its content, then x has to be considered a free variable, given the absence of any binding quantifiers. The existential reading ∃x : x² − 4x + 3 = 0 captures best the unknown character of x. Of course, we could give x a universal reading, that is, ∀x : x² − 4x + 3 = 0. But this is a false statement given a standard model. Pragmatically, unknowns are different from free variables. For instance, the use of x in x² − 4x + 3 = 0 is different from the use of a in "let a be an arbitrary positive integer greater than 2". For the equation, a specific mathematical object is sought that x denotes. With the "let" statement, one arbitrarily chooses a mathematical entity that has some properties and names it a. Determining the scope and quantification of variables and unknowns is anything but trivial. The keyword any is especially ambiguous, offering at least three different readings. Consider the following four sentences from Hardy & Wright's textbook.

(19) a. Any modulus S, except the null modulus, contains some positive numbers.
     b. If n is any positive number of S, then n − zd ∈ S for all z.
     c. There cannot be any abnormal numbers.
     d. q is not divisible by any prime.

In (19a), any is used as a universal quantifier (binding S). In (19b), any has two possible readings. First, n can be given a free variable reading. The sentence could then be reworded as let n be an arbitrarily chosen positive number from S. Then n − zd ∈ S for all z. Second, n can be given a universal reading, which reads as ∀z∀n : n ∈ S ∧ n > 0 → n − zd ∈ S. In (19c), any is used as an existential quantifier, and this sentence reads as there is no abnormal number, or ¬∃x : abnormal(x) ∧ number(x). In (19d), any is used with the reading: ∀p ∈ PRIMES : ¬div(q, p) i.e., ¬∃p ∈ PRIMES : div(q, p). Any procedure that identifies the scope and quantification of variables has to take into account that a variable name may have multiple occurrences, where one occurrence does not share the quantification and scope of another occurrence: (20)

a. If f(n) is prime for all large n, then there is an n for which f(n) = p > a_m and p is prime.
b. Since p_n < 2^{2^n} is true for n = 1, it is true for all n.


In (20a), the n in the first occurrence of f(n) is universally bound, and the n in the second occurrence of f(n) is existentially bound. In (20b), all occurrences of n in the inequality first refer to the constant 1; then n becomes universally quantified, and so does the inequality. Apart from logical variables, mathematical discourse uses non-logical variables, which we will now discuss briefly. Formalising mathematical discourse requires their transformation into logical variables, which is often of a complex nature.7

7 Kalish & Montague's textbook on techniques of formal reasoning [90] gives a very good characterisation and formalisation of non-logical variables.

Abnormal Uses. On rare occasions, there is an unusual use of symbols, as exemplified in the following text fragment from [68, p. 7]: (21)

Suppose that n is an integral variable which tends to infinity, and x a continuous variable which tends to infinity or to zero or to some other limiting value; that φ(n) or φ(x) is a positive function of n or x; and that f(n) or f(x) is any other function of n or x. [...] (vi) f ≍ φ means Aφ < f < Aφ, where the two A's (which are naturally not the same) are both positive and independent of n or x. Thus f ≍ φ asserts that 'f is of the same order of magnitude as φ'. We shall very often use A as in (vi), viz. as an unspecified positive constant. Different A's have usually different values, even when they occur in the same formula; and, even when definite values can be assigned to them, these values are irrelevant to the argument.

In this example, as the authors themselves point out, the symbol A does not preserve a recognisable identity in various occurrences throughout the same context.

Differentiation and Integration Variables. Consider the two occurrences of X in the expression d cos(X)/dX.

What is the status or use of the symbol X; is X a free or a quantified variable? If we substitute all occurrences of X by a number, say 90, we get d cos(90)/d90, an expression that does not make sense. If we simultaneously replace all occurrences of X by Y, we obtain an expression with identical meaning: d cos(Y)/dY. In order to discover the nature of X, we need to look at d. In fact, d is a function that maps functions to functions. In the above example, d cos(Y)/dY can be rewritten as d(cos), where cos is the cosine function. Complex d/dX terms, like d(sin(X) · e^X)/dX, can be written as d(foo) if the function foo is defined as follows: foo(X) = sin(X) · e^X. If we use an anonymous function, an alternative way to express d(sin(X) · e^X)/dX is d(λX. sin(X) · e^X). In this light, X is "bound" by d, for d cos(Y)/dX, or d(λX. cos(Y)), results in cos(Y).

Integration is the converse of differentiation, and our discussion also applies to terms of the form ∫ f(x) dx. Note that the variables a and b in the equation

∫_a^b x^n dx = [1/(n+1) · x^{n+1}]_a^b

are integration ranges, which define the end points of an interval [a, b]. Since the equation is true for all values of a and b, logically, they are universally quantified, although mathematically, they would be considered as placeholders.

Summation Variables.
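The view of d as a function that maps functions to functions can be made concrete in code. The following sketch is merely illustrative: it uses Python and a numerical difference quotient as a stand-in for the symbolic operator d; the names d, d_cos and foo are chosen for the example only.

import math

def d(f, h=1e-6):
    # d maps a function to a function; the parameter of the returned lambda
    # is a bound (dummy) name, just as the X in d cos(X)/dX is bound by d.
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

d_cos = d(math.cos)                         # d(cos) is again a function
print(round(d_cos(0.0), 6))                 # derivative of cos at 0: -sin(0) = 0.0

foo = lambda X: math.sin(X) * math.exp(X)   # foo(X) = sin(X) * e^X
print(round(d(foo)(1.0), 3))                # approx. e * (sin 1 + cos 1), about 3.756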

Consider the following two assertions.

(22) a. ∑_{k=1}^{n} k² = n(n+1)(2n+1)/6
     b. 1 + 1/1! + 1/2! + 1/3! + . . . + 1/n! + . . . = e

In (22a), the variable k serves as summation variable (German: "Laufindex"). The expression k = 1 defines its initial value, and n defines its top boundary.8 Formally, (22a) is easily transformed into the following more general formula (here, n is the Laufindex; to obtain (22a), f can be instantiated to the square function f(n) = n²):

∀k : ∀n : ∀f :   sum(f, k, n) = 0                          if n < k
                 sum(f, k, n) = f(n)                       if n = k
                 sum(f, k, n) = f(n) + sum(f, k, n − 1)    if n > k

Note that the left-hand side of (22b) denotes an infinite summation term. It needs to be distinguished from the finite term ∑_{i=0}^{n} 1/i!. The identification of the underlying pattern of a sum, say (22b), is usually non-trivial and requires the use of algebraic knowledge. For instance, the first summand of the elliptic term in (22b) is 1, which is a short-hand for 1/0!.
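The three-case definition of sum transcribes almost literally into a recursive program. The following sketch (Python, for illustration only; the name summation is not taken from the text) instantiates f to the square function and thereby recovers (22a).

def summation(f, k, n):
    # sum(f, k, n) following the three cases of the general formula
    if n < k:
        return 0
    if n == k:
        return f(n)
    return f(n) + summation(f, k, n - 1)

n = 10
assert summation(lambda i: i * i, 1, n) == n * (n + 1) * (2 * n + 1) // 6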

Set, Limit, and Range Variables. Consider the following three examples.

(23) a. M = {x ∈ N | 1 < x < 10}.
     b. lim_{n→∞} [1 + (a/n)]^n = e^a.
     c. 3t + 6.

In (23a), the variable x is an enumeration variable; it enumerates all elements of the set M. In (23b), n is a variable that is thought of as continuously increasing its value towards infinity. In the term (23c), t is likely to serve as a representative of a range of values. Function Variables. (24)

Consider the following two definitions of a function f .

a. f (x) = ln x b. f (x) = ln x + x

The variable x is universally quantified in each case. In (24a), f is introduced as a new name for the function ln. The definitional equation can also be expressed as f = ln. Here, it is therefore not necessary to use the variable x, although it carries the additional information that f and ln are unary functions. The use of the function variable x is obviously required in (24b). We will now inspect functional expressions more closely.

8 Note that (22a) contains two occurrences of "=". The expression k = 1 assigns the initial value 1 to k. The second occurrence of "=" is an asserted identity between ∑_{k=1}^{n} k² and n(n+1)(2n+1)/6.

4.2.3 Functions

There are three notational schemes for functional expressions, each of which is demonstrated by one of the following three terms.

(25) a. s(p)
     b. x + y
     c. p′

These examples show the function s in prefix, the function + in infix, and the function ′ in postfix notation. The interpretation of functional expressions has to cope with symbol overloading. For instance, p′ is frequently used to represent the successor of p, p + 1. However, in a second reading, spelled "prime p", it denotes an object p′ that is in some relation to an object p. A functional expression that is prone to an erroneous interpretation is f²x. In its first (and correct) reading, it denotes the composition of a function f with itself, that is, f(f(x)). In an alternative (and incorrect) reading, sometimes given by students, f²x denotes the square of the value that results from applying f to x, that is, (f(x))². A major problem in interpreting mathematical expressions is to distinguish a function definition from a function application, and a function object from a function value. In many cases, it is impossible to perform such distinctions on purely linguistic grounds, as the following definition of the Möbius function9 demonstrates [68, p. 234]:

(26) The Möbius function µ(n) is defined as follows:
     (i) µ(1) = 1;
     (ii) µ(n) = 0 if n has a squared factor;
     (iii) µ(p_1 p_2 . . . p_k) = (−1)^k if all the primes p_1, p_2, . . . , p_k are different.
     Thus µ(2) = −1, µ(4) = 0, µ(6) = 1.
     THEOREM 262. µ(n) is multiplicative.

This mini-discourse contains three occurrences of µ(n). In its first and third appearance, µ(n) denotes a function object, the unary Möbius function µ. The first occurrence is announced as "The Möbius function"; in its third occurrence, µ(n) is attributed the property multiplicative, which can only be applied to functions. This could have been made more precise and concise if µ had been used instead of µ(n). In its second occurrence, µ(n) = 0, µ(n) denotes the result of applying µ to n. Also, it is obvious to the human reader that "Thus µ(2) = −1, µ(4) = 0, µ(6) = 1.", containing other function applications, is not part of the function definition. Rather, it serves as an explanation that facilitates its understanding. Another example that illustrates the complexity of distinguishing function objects from function applications is the following definition of multiplicative.10

(27) a. A function f(m) is said to be multiplicative if (m, m′) = 1 implies f(mm′) = f(m)f(m′).
     b. ∀f ∀m ∀m′ : multiplicative(f) ↔ ((m, m′) = 1 → f(mm′) = f(m)f(m′)).

In (27a), f(m) does not denote the value obtained from applying some particular function f to some given argument m. Here, f(m) refers to an arbitrary unary function object. The higher order logic expression in (27b) gives a more precise and concise account. The inequality f > 0 illustrates a related problem. If f is a unary function, then the formula f > 0 is short-hand for ∀x : f(x) > 0. Consequently, the f in f > 0 cannot denote a function object. Similarly, the occurrence of x² in the expression "x² is non-negative" denotes an anonymous function, the unary square function. In order to express the intended reading in a formally correct manner, it has to be coerced into ∀x : square(x) > 0, or, in the formalism of the lambda calculus, ∀y : (λx.x²)(y) > 0. Function definition and function application relate to definitional equality and asserted equality. These two forms of equality will be discussed below.

9 The Möbius function is a total function on the positive integers. In contrast to the given declarative definition, a procedural definition can be given: start to construct the prime factorisation for some given number n; as soon as a prime occurs twice, stop and return 0; if the prime factorisation is complete, count the number of primes; if the number of primes is even, return 1, else return −1.
10 The term (m, m′) is a short-hand notation for "the greatest common divisor of m and m′".
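The procedural reading of µ given in footnote 9 can be spelled out in code. The sketch below (Python, purely illustrative; the name mobius is invented) factorises n by trial division and checks the values listed in (26).

def mobius(n):
    # Construct the prime factorisation of n; return 0 as soon as a prime
    # occurs twice, otherwise (-1) raised to the number of distinct primes.
    count = 0
    d = 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            count += 1
            if n % d == 0:        # the prime d occurs twice: squared factor
                return 0
        else:
            d += 1
    if n > 1:                     # one remaining prime factor
        count += 1
    return 1 if count % 2 == 0 else -1

assert (mobius(1), mobius(2), mobius(4), mobius(6)) == (1, -1, 0, 1)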

4.2.4 Predicates

Similar to constant and function symbols, predicate symbols can be overloaded with a multitude of meanings. In number theory, for example, the predicate symbol prime can be used both as a unary predicate (as in n is prime) and as a binary predicate (as in n is relatively prime to m). The distinction of prime/1 from prime/2 can usually be performed on purely syntactic grounds. However, there is a predicate that has multiple uses with only subtle differences among them, namely, the equality sign. Consider the following two simple equations of elementary arithmetic. (28)

a. 2 · 3 = 6 b. 6 = 2 · 3

These two equations are semantically equivalent to each other. Pragmatically, however, (28a) makes a statement about multiplication, and (28b) makes a statement about factorisation. The reader is invited to verbalise both equations. The equation 2 · 3 = 6 is likely to be verbalised as "2 multiplied with 3 yields 6" or "If we multiply 2 with 3, we obtain 6". The equation 6 = 2 · 3 is likely to be verbalised as "6 can be factorised into 2 and 3" or "2 and 3 are the factors of 6". The same argument can be made for the following two examples, which we have taken from [145].

(29) a. ∀x ∈ R : 1/(x−1) − 1/(x+1) = 2/(x²−1)
     b. ∀x ∈ R : 2/(x²−1) = 1/(x−1) − 1/(x+1)

As above, since "=" is a symmetric relation, the equations are logically equivalent. On pragmatic grounds, the equation (29a) is about the subtraction of algebraic functions, while (29b) is about the partial fraction decomposition of 2/(x²−1). In each of the four cases, the equality sign and its context evoke an algebraic operation, namely, multiplying, factorising, subtracting, and decomposing, respectively. In some instances, the writer of a mathematical argument makes these operations verbally explicit, for instance in FTA's uniqueness proof, which we discussed in § 3.3.2.

(30) [...] If a_i > b_i and we divide by p_i^{b_i}, we obtain p_1^{a_1} . . . p_i^{a_i−b_i} . . . p_k^{a_k} = p_1^{b_1} . . . p_{i−1}^{b_{i−1}} p_{i+1}^{b_{i+1}} . . . p_k^{b_k}. The left-hand side is divisible by p_i while the right-hand side is not, a contradiction. [...]

Here, the equation (4.2) is obtained by dividing both sides of the equation (4.1) by p_i^{b_i}:

p_1^{a_1} p_2^{a_2} . . . p_k^{a_k} = q_1^{b_1} q_2^{b_2} . . . q_j^{b_j}   (4.1)
p_1^{a_1} . . . p_i^{a_i−b_i} . . . p_k^{a_k} = p_1^{b_1} . . . p_{i−1}^{b_{i−1}} p_{i+1}^{b_{i+1}} . . . p_k^{b_k}   (4.2)

The fragment (30) demonstrates another related aspect of equations. Their given textual representation cannot be changed without affecting the denotation of definite descriptions that deictically refer to them. For instance, the definite descriptions the left-hand side and the first product require ignoring the symmetry of "=" and of the multiplication operator with respect to the representation of the equation or product, respectively. The "=" sign has other subtle properties. Both (4.1) and (4.2) are asserted identities. In ∑_{k=1}^{n} k², discussed briefly above, the equality assigns to k its initial value 1. In the sequence of equations fib(0) = 1, fib(1) = 1, and fib(n + 2) = fib(n + 1) + fib(n), the "=" signs are understood as definitional equalities.11

11 Programming languages clearly distinguish assignment operations from operations that test for equality.
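The contrast noted in footnote 11 can be made concrete. In the following sketch (Python, for illustration only), the defining equations of fib become a function definition, = is an assignment that gives k its initial value, and == expresses an asserted identity that is merely checked.

def fib(n):
    # definitional equalities: fib(0) = 1, fib(1) = 1, fib(n+2) = fib(n+1) + fib(n)
    if n == 0 or n == 1:
        return 1
    return fib(n - 1) + fib(n - 2)

k = 1                   # assignment: k receives its initial value
assert fib(5) == 8      # asserted identity: a claim that is tested, not a definition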

4.2.5 Definite Descriptions

A definite description is proper if it denotes exactly one object. These definite descriptions are all proper: (31)

a. the first four Fermat numbers b. the smallest prime number c. the 664999th. prime

The first definite description refers to four objects at once, namely, F_1 = 5, F_2 = 17, F_3 = 257 and F_4 = 65537. Their existence and uniqueness are obvious from the definition of Fermat numbers.12 The definite noun phrase in (31b) refers to a small natural number, namely 2. This definite description is proper because the natural numbers are well-ordered, and < is a Noetherian relation.13 Note that in "2 is the smallest prime number", the definite description carries more mathematically interesting information, or descriptive content, than simply 2.14 The noun phrase (31c) refers to a very large number. Since there are infinitely many primes, there is also the 664999th prime number, and therefore, (31c) is a proper description.15 The next four sentences are examples of improper definite descriptions.

(32) a. the prime number between 7 and 11
     b. the square root of 2
     c. f(−3)
     d. the largest known prime

Sentence (32a) is improper since there is no prime number between 7 and 11; it therefore violates the existence condition of a proper definite description. Example (32b) is improper in R since there is more than one object that is a square root of 2; it thus violates the uniqueness condition. Sentence (32c) is improper if the domain of the function f is the set of positive integers. In (32d), we have a definite description that has a time-dependent denotation. Perhaps surprisingly, there is a use of improper definite descriptions in mathematics, as another proof of the Fundamental Theorem of Arithmetic illustrates [68, p. 21].

(33)

In (33), the clause the least abnormal number is a fictitious definite description. In the argument, the existence of a non-empty set of abnormal numbers is assumed. Since any non-empty set of natural numbers has a least element, the proof author can now use this fact to reason forward from it. Eventually, a contradiction is derived, which leads to the conclusion that abnormal numbers cannot exist.

12 Fermat's numbers are defined by F_n = 2^{2^n} + 1.
13 The well-ordering of the natural numbers is also relevant to the interpretation of the first four Fermat numbers.
14 This is an equivalent statement to the well-known "Scott is the author of Waverley", where the identity of the referents of two noun phrases is asserted by a flanking is.
15 For any given mathematical object, there exists an infinite number of definite descriptions that refer to it. We can neither grasp the 664999th prime number as a physical object, nor can we write it down (in its Arabic number representation) on a sheet of paper. However, we know that the (664998+1)th prime number denotes the same mathematical entity.


Functions as Definite Descriptions. The application of a function to an argument in its domain is a proper definite description. It refers to an object in the same unambiguous way that a proper name does. The object that is referred to exists and is uniquely referred to by the function application or proper name. For example, the definite description the sum of 2 and 5, the symbolic expression 2 + 5 and the proper name 7 all unambiguously refer to the number object 7. Functions can therefore be regarded as parameterised definite descriptions. For instance, the parameterised definite description the greatest common divisor of a and b, gcd(a, b), yields a definite description if the variables a and b are instantiated to values that are members of the domain of gcd. Finding suitable instantiations of a function's arguments to yield a specific result may not always be straightforward, as Goldbach's conjecture shows.

(34) a. If n > 4 is even, then n is the sum of two odd primes.
     b. ∀n ∈ N : (n > 4 ∧ even(n)) → ∃p1 ∃p2 : prime(p1) ∧ prime(p2) ∧ odd(p1) ∧ odd(p2) ∧ n = p1 + p2.
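Finding such an instantiation amounts to a search. The following sketch (Python, purely illustrative; goldbach_pair and is_prime are invented names) looks for a pair of odd primes that sums to a given even n > 4.

def goldbach_pair(n):
    # search for odd primes p1, p2 with n = p1 + p2 (assumes n even, n > 4)
    def is_prime(m):
        if m < 2:
            return False
        d = 2
        while d * d <= m:
            if m % d == 0:
                return False
            d += 1
        return True

    for p1 in range(3, n // 2 + 1, 2):   # odd candidates only
        p2 = n - p1
        if is_prime(p1) and is_prime(p2):
            return p1, p2                # a suitable instantiation
    return None                          # no witness found

print(goldbach_pair(28))   # for example (5, 23)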

In the given context the sum of two odd primes can be regarded as a potential definite description. Definite descriptions select a mostly unique referent using descriptive content and context information. Often they are employed for anaphoric use and may not always be coreferential with their antecedent, but rather refer to something somehow related to it. In the next section, we will continue our classification of definite descriptions, but in conjunction with other linguistic phenomena.

4.3 Linguistic Phenomena

This section is divided into three parts. In the first part, we elaborate on the use of symbols in mathematical writing within a large discourse context. The second part gives a systematic account of linguistic phenomena that we consider particularly predominant in this text genre. A particular emphasis is on abstract discourse entities, which we therefore discuss separately in § 4.3.3. The use of connectives, in particular conditionals, is discussed at a later stage in § 4.4.

4.3.1 A Discourse Analysis Focusing on Terms and Formulae

A prerequisite for parsing textbook proofs is being able to parse formulae that occur in these proofs. Parsing formulae in the empty context, that is, in isolation, is trivial. Problems arise if the textual context has to be taken into account, and when references from the text to formulae and their parts need to be resolved. This is because symbols have a domain and scope that extend across text and formulae. In Hardy & Wright’s textbook, the domain and scope of a symbol may be active throughout the book. On the first page of Hardy & Wright’s textbook on elementary number theory, the following sequence of sentences is given [68, p. 1]. (35)

a. The numbers . . . , −3, −2, −1, 0, 1, 2, . . . are called the rational integers, or simply the integers; the numbers 0, 1, 2, 3, . . . the non-negative integers; and the numbers 1, 2, 3, . . . the positive integers.
b. In what follows the letters a, b, . . . , n, p, . . . , x, y, . . . will usually denote integers, which will sometimes, but not always, be subject to further restrictions, such as to be positive or non-negative.
c. An integer a is said to be divisible by another integer b, not 0, if there is a third integer c such that a = bc.
d. If a and b are positive, c is necessarily positive.
e. We express the fact that a is divisible by b, or b is a divisor of a, by b|a.
f. Thus 1|a, a|a; and b|0 for every b but 0.
g. We shall also sometimes use b ∤ a to express the contrary of b|a.
h. It is plain that b|a . c|b → c|a, b|a → bc|ac if c ≠ 0, and c|a . c|b → c|ma + nb for all integral m and n.


(35a) introduces an object space by enumerating and naming three sets. (35b) introduces a name space, providing names that can be used to denote objects of the object space. With these two sentences, Hardy & Wright set the stage for the use of the variables a, b, . . . , n, p, . . . , x, y, . . .. Only occasionally will they introduce and use other letters at a later stage. However, nothing is said yet concerning the quantification and scope of these variables. As a matter of fact, these vary throughout Hardy & Wright's textbook. (35c) defines a binary predicate by using three named variables, namely, two universally quantified variables a and b as well as one existentially quantified variable c.16 It also contains a hidden functor, namely, the multiplication operator in the term bc. (35d) adds a statement to the definition that has been given in the preceding sentence. It uses the same variables as in (35c), each of which is of the type that has been assigned to them in (35b). Since (35d) adds to (35c), presumably, a, b, and c are in the scope of the quantifiers that have been established in (35c). In (35e) the inverse relation of divisible as well as one of its notational alternatives is defined, namely, divisor and |. The question is, from the computational perspective, whether a and b stick to their universal bindings that have been established in the prior discourse, or whether a new universal binding needs to be generated on the fly. Obviously, by using pragmatic knowledge, the latter is true, and we would represent one part of (35e) in first order logic as ∀a ∈ N : ∀b ∈ N : divisible(a, b) ⇔ is_divisor(b, a). The same question about the quantification of a and b can be asked for (35f), where a is not explicitly quantified, but b is. In (35g), the notation b ∤ a is introduced. Here, neither a nor b is explicitly quantified. Note that in (35h), a, b and c are not explicitly universally quantified, but m and n are. Despite the fact that a prior discourse interpretation, in particular (35c), has introduced an existentially quantified variable c, a semantic analysis of (35h) reveals that c should get a universal quantification.

As our analysis of this short sequence of sentences again shows, a major difficulty in the processing of mathematical discourse is the identification of the quantification of the variables it contains as well as the scope of their quantifiers. There are other tasks that a text understander has to cope with, of course. Before we give a more systematic overview of linguistic phenomena that are predominant in mathematical discourse, we now reconsider Hardy & Wright's argumentation line that proves the Fundamental Theorem of Arithmetic [68, p. 3] from a linguistic point of view. We have enriched the proof depicted in Fig. 4.2 with line numbers to facilitate references to parts of the proof.

We start the discussion of the proof with a focus on the use of variables. Obviously, each of the variables p, a, and b in (Fig. 4.2, line 4) is of the type positive integer, and each of the variables is universally quantified. However, it is obviously not required that the occurrences of p, a, and b in line 8 are in the scope of these quantifiers. Without any change in line 4, we could replace the formula in line 8 by p | n_1 n_2 n_3 . . . n_l → p | n_1 or p | n_2 or p | n_3 . . . or p | n_l. Note, however, that the condition prime(p) that has been established in line 4 carries over to the different variable p in line 8.

Although a variable preserves a recognisable identity in various occurrences throughout the same context, the status of a variable may change during the proof. As we discussed in our mathematical analysis, from line 10 onwards, the variables p, a, and b change their status from being universally quantified variables to being free variables. Apart from the variable problem, other linguistic phenomena must be dealt with. For instance, the term abc . . . l (line 8) describes the factorisation of objects a, b, c, . . . , l. The operator symbols that combine the sub-terms into larger terms are missing. Despite the missing c, the phrase a, b, . . . l (line 9) also enumerates the factors a, b, c, . . . , l. Equally, q_1^{b_1} . . . q_j^{b_j} (line 13) co-refers with q_1^{b_1} q_2^{b_2} . . . q_j^{b_j} (line 11) to the same entity. However, the terms p_1^{a_1} . . . p_i^{a_i−b_i} . . . p_k^{a_k} and p_1^{a_1} p_2^{a_2} . . . p_k^{a_k} do not describe the same object.

The occurrences of k and j in k = j (line 13) co-refer with the k and j used in p_1^{a_1} p_2^{a_2} . . . p_k^{a_k} and q_1^{b_1} . . . q_j^{b_j}. While the occurrences of i before line 15 are universally quantified, the i from line 15 onwards occur free. Fig. 4.2 contains definite noun phrases and other linguistic units with anaphoric character. The phrase the standard form of n (line 1) refers to a product, say p_1^{a_1} p_2^{a_2} . . . p_k^{a_k}, with certain properties. The units each product (line 12), both sets (line 14), the left-hand side, and the right-hand side (both line 17)

16 As we pointed out earlier, the sentence (35c) becomes less readable without the use of names: An integer_a is said to be divisible by another integer_b, not 0, if there is a third integer_c such that the first number_a equals the product of the second_b and the third number_c.

4.3

Linguistic Phenomena

1 2 3 4 5 6 7

71

T HEOREM 2 (T HE FUNDAMENTAL THEOREM OF ARITHMETIC ). The standard form of n is unique; apart from rearrangement of factors,n can be expressed as a product of primes in one way only. [...] T HEOREM 3 (E UCLID ’ S FIRST THEOREM ). If p is prime, and p | ab, then p | a or p | b. We take this theorem for granted for the moment and deduce Theorem 2. [...] It is an obvious corollary of Theorem 3 that p | abc . . . l → p | a or p | b or p | c . . . or p | l,

8 9 10 11

and in particular that, if a, b, . . . , l are primes, then p is one of a, b, . . . l. Suppose now that b n = pa11 pa22 . . . pak k = qb11 qb22 . . . q j j ,

12 13 14 15

each product being a product of primes in standard form. b Then pi | qb11 . . . q j j for every i, so that every p is a q; and similarly every q is a p. Hence k = j and, since both sets are arranged in increasing order, pi = qi for every i. If ai > bi , and we divide by pbi i , we obtain i −1 bi +1 pa11 . . . pai i −bi . . . pak k = pb11 . . . pbi−1 pi+1 . . . pbk k .

16 17 18

The left-hand side is divisible by pi , while the right-hand side is not; a contradiction. Similarly bi > ai yields a contradiction. It follows that ai = bi , and this completes the proof of Theorem 2.

Figure 4.2: Excerpt From Hardy & Wright’s Textbook (The Fundamental Theorem of Arithmetic).

all refer to entities that were introduced by terms and formulae. The definite noun phrases the left-hand side and the right hand side refer to terms that are introduced by the equation in line 16. b

The symbolic expression n = pa11 pa22 . . . pak k = qb11 qb22 . . . q j j (line 11) is elliptic in three ways. Apart from the two elliptic product representations, we have an abbreviated notation for the equality, which expands to17 : b

(n = pa11 pa22 . . . pak k ) ∧ (pa11 pa22 . . . pak k = qb11 qb22 . . . q j j ). The phrase we divide by pbi i  (line 15) is elliptic because it lacks a reference to the second argument of b the division. An ellipsis reconstruction must yield we divide both sides of pa11 pa22 . . . pak k = qb11 qb22 . . . q j j by pbi i . Moreover, the sentence that embeds we divide by pbi i  contains a state-change anaphora. The resulting terms of the division, as all other terms, refer to something and these references must be resolved. The phrases similarly every q is a p (line 13) and Similarly bi > ai yields a contradiction (line 17/18) are elliptic. In the phrase every p is a q (line 13), p ranges over p1 , p2 , . . . , pk , and q ranges over q1 , q2 , . . . , q j . Expanding the phrase we obtain ∀p∈{p1 ,p2 ,...,pk } ∃q∈{q1 ,q2 ,...,p j } : p = q. The analysis of Fig. 4.2 shows the complexity that is involved in properly interpreting terms and formulae in a large textual context. We give now a more systematic account of anaphoric linkage and elliptic constructs.

4.3.2

A Systematic Account on Anaphoric Linkage and Elliptic Constructs

For each linguistic phenomenon, we give one or more examples only. No attempt is made to discuss any of the phenomena in detail. 4.3.2.1 (36) 17 The

Repeated Form and Partially Repeated Form a. T HEOREM 3 (E UCLID ’ S FIRST THEOREM ). If p is prime, and p | ab, then p | a or p | b. b

statement n = qb11 qb22 . . . q j j is entailed via the transitivity of “=”

72

A Linguistic Analysis of Proofs Suppose that p is prime and p | ab. If p | a then (a, p) = 1, and therefore, by Theorem 24, there are an x and a y for which xa + yp = 1 or xab + ypb = b. But p | ab and p | pb, and therefore p | b. b. T HEOREM 74. The product of any n successive positive integers is divisible by n!. [...] We choose Theorem 74, which asserts that (m)n = m(m + 1) . . . (m + n − 1) is divisible by n!. This is plainly true for n = 1 and all m, and also for m = 1 and all n. We assume that it is true (a) for n = N − 1 and all m and (b) for n = N and m = M. Then (M + 1)N − MN = N(M + 1)N−1 , and (M + 1)N − 1 is divisible by (N − 1)!. Hence (M + 1)N is divisible by N!, and the theorem is true for n = N and m = M + 1. It follows that the theorem is true for n = N and all m. Since it is also true for n = N + 1 and m = 1, we can repeat the argument; and the theorem is true generally. b

c. [...] Suppose now that n = pa11 pa22 . . . pak k = qb11 qb22 . . . q j j , each product being a product of primes b

in standard form. Then pi | qb11 . . . q j j for every i A proper treatment of anaphoric linkage by repeated form must take into account that a variable may change its status, that is, change from a universally quantified variable to a free variable and vice versa. For instance, in the theorem statement of (36a), all occurrences of p, a, and b are universally quantified. In the proof, the variables p, a, and b are free variables, and therefore denote arbitrary entities. The text fragment (36b), taken from [68, p. 64], is an extreme example of variables that change their status and denotation more than once (e.g., all m, m = 1, m = M, m = M + 1). The example (36c) shows that an b b expression is repeated only partially. It is obvious that qb11 qb22 . . . q j j and qb11 . . . q j j refer to the same product bi −1 bi +1 qi+1 . . . q j j does not. term, while qb11 . . . qi−1 b

4.3.2.2

Lexical Replacement

As we have already pointed out, one characteristic of scientific discourse is that it frequently defines new lexicon entries. (37)

a. We express the fact that a is divisible by b, or b is a divisor of a, by b|a. b. Either n is prime, when there is nothing to prove, or n has divisors between 1 and n.

In the example (37a), the inverse relation to divisible as well as an alternative notation is introduced. In (37b), we have another, more implicit, form of lexical replacement. Here, in the negation of n is prime the definiens prime is replaced by its definiendum. As we demonstrate in ch. 3, such definitional expansion must be followed by a rather large sequence of rewriting steps to obtain a formula that comes close to the English lexical replacement n has divisors between 1 and n. 4.3.2.3

Substituted Form

Two examples of anaphoric linkage by substituted form are given next. (38)

a. logn m is irrational if m and n are integers, one of which has a prime factor which the other lacks. b. We call a system S of integral quaternions, one of which is not 0, a right-ideal if it has the properties: [...]

As is the case with pronoun resolution, the proper treatment of one of which is only made harder by the large number of possible antecedents.

4.3

Linguistic Phenomena

4.3.2.4 (39)

73

Pronouns a. Either n is prime, or it? is divisible by a prime less than n. b. q is not divisible by any of the numbers 2, 3, 5, . . . , p. It? is therefore either prime, or divisible by a prime between p and q. c. If two integers have no common factor larger than 1, we say that they? are relatively prime. d. If a given integer is relatively prime to each of several others, it? is relatively prime to their product? . e. x = 175, y = −68 is one pair of integers such that 4147x + 10672y = 29. It? is not the only such pair? . n

f. Since pn < 22 is true for n = 1, it? is true for all n. In the first sentence, there is talk about two entities. The first entity, named n, is introduced by the first elementary sentence n is prime. The second entity, which remains anonymous, is introduced by the noun phrase a prime. Intuitively, the pronoun refers to the entity named n. Taking the anonymous entity as its antecedent would be unnatural. Note that we have a second occurrence of n in the sentence, which obviously co-refers with the first occurrences of n to the same mathematical object. In (39b), many different objects get introduced, namely, q, the first p prime numbers 2, 3, 5, . . . , p and an anonymous prime number between p and q. The intuitive reading is that the pronoun refers to q and not to any of the prime numbers 2, 3, 5, . . . p. In the third example, the noun phrase two integers introduces two objects, and the pronoun refers to these two objects and places them into the relation coprime. In example sentence (39d), we have two referential expressions. Intuitively, the singular pronoun refers to the entity introduced by a given integer. The noun phrase several others introduces an unknown number of entities. But their product does not refer to this set of entities, but to an entity that results from performing some operation on the members of this set. In the next example, we have several objects: First, the numbers 175, −68, 4147, 10672, and 29. Second, the name x for the number 175, and the name y for the number −68. Third, we have a pair object introduced by one pair of integers, consisting of the integers named by x and y. Fourth, we have an equation object that itself has a complex internal structure, consisting of various subterms. This gives us a large set of possible antecedents for resolving the pronoun. Moreover, It is not the only such pair presupposes the existence of other pair objects that fulfill the equation. On the other hand, this sentence constraints the set of possible antecents. For (39f), we see that pronouns can not only refer to terms or groups of terms, but also to propositions. With n it being attributed the property true, the pronoun must refer to the formula pn < 22 and cannot refer to any of its terms. In § 4.3.3, we will see that pronouns and other definite descriptions can also refer to groups of formulae as well as complete sub-proofs. 4.3.2.5

Bridging Anaphora

A bridging anaphor is an expression that is referring to an entity which has not been explicitly mentioned in the discourse before. (40)

a. f (x) and all its derivatives? take integral values at x = 0. b. In this case the congruence x2 ≡ a(mod p) has the solution? x = x1 ; c. Every positive integer, except 1, is a product of primes. The proof? is trivial.

None of the bracketed expressions can be linked to an explicit antecedent. However, the presence of the function f (x) provides sufficient information to licence the presence of its derivates. With bridging inferences — such as functions can have derivates, equations can have solutions, and assertions can have proofs — definite descriptions can be bridged to existing information of the discourse context.

74 4.3.2.6

A Linguistic Analysis of Proofs Possessives

Possessives relate to bridging anaphora. (41)

a. A number is divisible by 9 if and only if the sum of its digits? is divisible by 9. b. A number is perfect if it is the sum of its divisors? other than itself.

In (41a), we have that digits are the constituents of numbers. In (41b), we have another qualia role associated with number, namely that numbers have divisors, or stronger, that each number can be uniquely represented as a product of its divisors (the fundamental theorem of arithmetic). 4.3.2.7

Appositional anaphora

Appositional phrases express a relationship between two or more words or phrases in which the two units are grammatically parallel and have the same referent. A good English example is The President of the United Calexonian Planet System, General Ouloapulego. In Hardy & Wright’s textbook, we find the following two instances: (42)

a. But n/p1 is less than n and so has the unique prime factorisation p2 p3 . . .. b. The congruence x2 ≡ a(mod p) has the solution x = x1 

In (42a), the unique prime factorisation (of n/p1 ) and p2 p3 . . . co-refer to the same entity. In (42b), the character of the apposition is of a different nature. We also have that the phrases the solution (of the congruence x2 ≡ a(mod p)) and x = x1 co-refer to the same entity. However, the interpretation of the second phrase x = x1 requires the presence of the first for substituting x by x1 . 4.3.2.8

Deictic Form

There are deictic expressions in mathematical discourse that refer to representations of mathematical entities. In contrast to bridging references, deictic references can only refer to representations that have been made explicit before. That is, for referential expressions to be called indexical or deictic, their antecedent must have been explicitly verbalised in the prior discourse context. (43)

b

a. We have m = 2b1 3b2 . . . p j j , with every b? either 0 or 1. There are just 2 j possible choices of the exponents? and so not more than 2 j different values of m. b. We have then n = p1 p2 p3 . . . = q1 q2 . . . , where the p and q? are primes, no p? is a q? and no q? is a p? . c. The series (un ) or 1, 1, 2, 3, 5, 8, 13, 21, . . . in which the first two terms? are u1 and u2 , and each term after? is the sum of the two preceding? , is usually called Fibonacci’s series. d. If p is a prime greater than 3, then the numerator? of the fraction? ? 1 1 1 1+ + +...+ 2 3 p−1 is divisible by p2 . The result is false when p = 3. It is irrelevant whether the fraction? is or is not reduced to its lowest terms? , since in any case the denominator? cannot be divisible by p.





e. We have 2 p = (1 + 1) p = 1 + 1p + . . . + pp = 2 + ∑1p−1 pl . Every term on the right, except the first? , is divisible by p.

In (43a), the noun phrase every b reads distributively. It refers to each of the exponents b1 , b2 , . . . b j that were introduced by the previous equation. The noun phrase the exponents refers collectively to a sequence containing all bi ; the noun phrase 2 j possible choices of the exponents then refers to all possible permutations of this sequence. In (43b), we see that the noun phrase the p and q can also be used in the same way as the construction every b in the previous example. In (43b), we also see two instances

4.3

Linguistic Phenomena

75

involving p with an existential reading. The noun phrases a p reads as ∃p : p ∈ {p1 , p2 , . . .}; and no p reads as its negational form ¬∃p : p ∈ {p1 , p2 , . . .}. In both of these example sentences, we thus have a meta-linguistic use of p. The phrase “a p” means “a number just referred to with the help of the symbol p” whereas “the p” and “every p” read as “all the numbers just referred to with the help of the symbol p”. In (43c), (43d) and (43e), we have exemplary referring expressions that show the full complexity that is involved in identifying their antecedent. 4.3.2.9

State-Change Anaphora

Mathematical texts have plenty of linguistic constructions that first describe algebraic manipulations and then refer to the result of such operations. The following two examples have occurrences of such statechange anaphora: a. If we take p = 2, 3, . . . , P, and multiply the series together, the general term resulting? is of the type 2−a2 s 3−a3 s . . . P−a p s = n−s , where n = 2a2 3a3 . . . Pa p (a2 ≥ 0, a3 ≥ 0, . . . , a p ≥ 0).

(44)

b. The series 1 + x + x2 + . . . , 1 + x2 + x4 + . . . , . . . , 1 + xm + x2 m + . . ., are absolutely convergent, and we can multiply them together and arrange the result? as we please. Within a mathematical argument, the diminution of a proof obligation to a (simpler) proof obligation could also be regarded as a state-change anaphora. 4.3.2.10

that-anaphora

In Hardy & Wright’s textbook, we also find that-anaphora, for instance, the following two examples: a. Hence the denominator of Uk,r is divisible by ω only if that of?

(45)

ωk−1−r k+1−r

=

ωs−1 s+1

is divisible by ω.

b. The degree of h(x) is one less than that of? f (x). In (45a), that of refers to the denominator of, and in (45b), that of refers to the degree of. 4.3.2.11

Elliptic Forms

Elliptic forms are characterised by the omission of words or terms that are considered superfluous. Ellipses are a very effective and commonly used notational device in mathematical discourse. The uniqueness proof of the Fundamental Theorem of Arithmetic discussed earlier serves as a good example. The representation of the same argument without the use of ellipses is a very painful exercise, indeed. In elliptic constructions, term omissions are indicated by a sequence of three dots. The proper interpretation of each of the ellipses requires the use of contextual clues as well as domain knowledge. In general, this is very hard to mechanise. The following mini-discourse gives an extreme example of the extensive use of ellipses in mathematical arguments [68, p. 3f]. (46)

The first primes are 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, . . .. It is easy to construct a table of primes, up to a moderate limit N, by a procedure known as the ‘sieve of Eratosthenes’.√We have seen that if n ≤ N, and n is not prime, then n must be divisible by a prime not greater than N. We now write down the numbers 2, 3, 4, 5, 6, . . . , N and strike out successively (i) 4, 6, 8, 10, . . ., i.e. 22 and then every even number, (ii) 9, 15, 21, 27, . . ., i.e. 32 and then every multiple of 3 not yet struck out,

76

A Linguistic Analysis of Proofs (iii) 25, 35, 55, 65, . . ., i.e. 52 , the square of the next remaining number after 3, and then every multiple of 5 not yet struck out, . . .. We continue the process until the next remaining number, after that whose multiples were canceled √ last, is greater than N. The numbers that remain are primes.

In this text, the elliptic terms (i)-(iii) can only be properly interpreted within their textual context. Interestingly, each of them has a non-symbolic defining and explanatory component.

4.3.3

Propositional Discourse Entities

Mathematicians distinguish different kinds of statements: a theorem labels a major mathematical result, a lemma stands for an auxiliary result, a proposition indicates a minor mathematical result and a corollary marks a direct consequence of a theorem, lemma or proposition.18 A statement is marked as a conjecture if the author thinks that the statement is true, but so far has been unable to prove or disprove. A hypothesis is a statement that is assumed true as a basis for further reasoning. A definition is a statement that introduces a new concept in terms of given concepts. A notation introduces a new linguistic construction, usually to abbreviate an otherwise wordy one. In textbooks, not all of these different statements are explicitly labelled as theorem, lemma, proposition, corollary etc. In Hardy & Wright’s book on elementary number theory, for instance, only theorems are consistently labelled with T HEOREM and also get a unique number to facilitate references to them. Also, the authors chose to label statements that are announced as lemmata as theorems: (47)

The proof of Theorem 118 depends upon the following lemma. T HEOREM 119: ∑1p−1 mk ≡ −εk (p)(mod p).

In Hardy & Wright’s book, conjectures are not labelled but are verbally announced as conjectures. Only a small number of hypotheses is marked as such and their proper identification is left to the reader, as is the governing and highly complex task of mathematical text understanding, namely, to distinguish assumed statements from derived ones. As the previous discussion suggests, propositional discourse entities play an essential role in mathematical writing and reasoning. Any mathematical text understander must give a proper account of anaphora that refers to statements or larger portions of text. We will focus on anaphoric expressions that involve theorems, lemmata, hypotheses and definitional phrases. We ignore examples and remarks as well as references to such statements. Given the sheer number of propositions that are accessible in the context of a mathematical argument, the resolution of propositional anaphora is complex. The process of anaphora resolution can be guided, however, for example, by a notion of saliency. Often the use of mathematical and meta-mathematical knowledge source will prove to be necessary. 4.3.3.1

Reference to a Proposition or its Parts

Recall Leveque’s proof for the existence of prime factorisations. (48)

T HEOREM 2-2. Every integer a > 1 can be represented as a product of one or more primes. Proof: The theorem? is true for a = 2. Assume it? to be true for 2, 3, 4, . . . , a − 1. If a is prime, we are through. Otherwise a has a divisor different from 1 and a, and we have a = bc, with 1 < b < a, 1 < c < a. The induction hypothesis? then implies that s

t

i=1

i=1

b = ∏ pi , c = ∏ pi , with pi , pi primes and hence a = p1 p2 . . . ps p1 . . . pt . 18 As

Higham points out, this classification is rather fuzzy, depending on the context in which results appear and on the personal style of the author [75].

4.3

Linguistic Phenomena

77

The discourse contains three anaphors that refer to abstract discourse entities. Their propositional type is induced by the verb phrase being true for and by the verb implies; only propositions can be true, and only propositions can imply other propositions. The first propositional anaphor in discourse (48), the theorem, obviously refers to the most recent proposition that has been introduced in the discourse as a theorem. Also, its inter-sentential context suggests that the theorem is referring to a proposition that has an occurrence of a variable a that can be instantiated to 2. The second anaphor, the pronoun it, refers to the same proposition as the theorem. A linkage to the most salient assertion, a = 2, is obviously wrong. The resolution of the definite noun phrase the induction hypothesis to its antecedent requires to choose among all accessible hypotheses within the current proof.19 In this proof, we have assumed so far a > 1, prime(a), ¬prime(a) and ∀k ∈ N : k < n → (k > 1 → prod primes(k)). The assumption prime(a), however, should not be considered accessible since it occurs in a segment of the discourse that has already been closed. The application of additional meta-mathematical knowledge also rules out the assumptions a > 1 and ¬prime(a). This leaves us with one hypothesis, ∀k ∈ N : k < n → (k > 1 → prod primes(k)), which is the correct reading of the induction hypothesis. Reference to Theorems and their Parts. Each of the following mini-discourses shows a different aspect of propositional anaphora and the difficulties to deal with them. We focus on theorem statements only. (49)

a. T HEOREM 3 (E UCLID ’ S FIRST THEOREM ). If p is prime, and p | ab, then p | a or p | b. We take this theorem? for granted for the moment and deduce Theorem 2? . b. T HEOREM 102. Let m ≥ 2, h < 2m and n = h2m + 1 be a quadratic non-residue (mod p) for some odd prime p. Then the necessary and sufficient condition for n to be a prime is that 1

p 2 (n − 1) ≡ −1(modn).

(6.14.3)

[...] Then (6.14.3)? follows at once by Theorem 83? . Hence the condition? is necessary. c. T HEOREM 20. π(x) ≥

logx (x ≥ 1); pn ≤ 4n . 2log2

We take j = π(x), so that p j+1 > x and N(x) = x. We have √ √ x = N(x) ≤ 2π(x) x, 2π(x) ≥ x and the first part? of Theorem 20? ? follows on taking logarithms. If we put x = pn , so that π(x) = n, the second part? is immediate. d. T HEOREM 444. If ϑ1 , ϑ2 , . . . , ϑk are linearly independent, α1 , α2 , . . . , αk are arbitrary, and T and ε are positive, then there is a real number t, and integers p1 , p2 , . . . , pk , such that t > T and |tϑm − pm − αm | < ε (m = 1, 2, . . . , k). The fundamental hypothesis in Theorem 444? is weaker than in Theorem 442, since it only concerns linear relations homogeneous in the ϑ. The use of Theorem 3 in (49a) shows how Hardy & Wright label their theorems in [68]. In this (rare) case, they also attach another proper name to this theorem, namely Euclid’s first theorem. Both of these constructions can get a forward (cataphoric-appositional) as well as a backward (anaphoric) reading. For the anaphoric expression this theorem, however, an anaphora resolution algorithm only needs to scan the prior discourse context, searching for a proposition that most likely has been explicitly labelled as a theorem. To refer to any other than the immediately preceding or succeeding theorem, Hardy & Wright use their theorem number scheme, as in Theorem 2. The text fragment (49b) shows two other aspects of anaphoric reference to propositions. First, equations that will later be referred to are labelled. In (49b), the first occurrence of (6.14.3) labels an equality, while 19 Moreover,

the presupposition is important that the argument in progress is an inductive argument.

78

A Linguistic Analysis of Proofs

its second occurrence refers to it. Second, (49b) contains two occurrences of condition within definite noun phrases. The first occurrence is used to introduce the theorem’s necessary and unique condition (6.14.3), while the second co-refers with (6.14.3) to this condition. The examples (49c–49d) are expressions that only refer to a part of a theorem. In (49c), the expression the first part of Theorem 20 refers to the first conjunct of the theorem, and the second part refers to its second conjunct. Apparently, the definite noun phrase in (49d) requires a more sophisticated mechanism. Reference to Definitions. In Hardy & Wright’s textbook, definitions are not explicitly labelled or numbered as such. The defined concept, however, is emphasised in italic. We give two typical examples for referring to a definition; both verbalisations are of the form the definition of X . (50)

a. The proof depends upon the notion of a ‘modulus’ of numbers. A modulus is a system S of numbers such that the sum and difference of any two members of S are themselves members of S: i.e. m ∈ S . n ∈ S → (m ± n) ∈ S. The single number 0 forms a modulus (the null modulus). [...] It follows from the definition of S? that a ∈ S → 0 = a − a ∈ S . 2a = a + a ∈ S. b. There are two natural definitions of a convex region, which may be shown to be equivalent. First, we may say that R (or R∗) is convex if every point of any chord of R, i.e. of any line joining two points of R, belongs to R. Secondly, we may say that R (or R∗) is convex if it is possible, through every point P of C, to draw at least one line l such that the whole of R lies on one side of l. [...] [...] It is easy to prove the equivalence of the two definitions? . Suppose first that R is convex according to the second definition? , [...] Secondly, suppose that R is convex according to the first definition? and that P is a point of C; [...]

In (50a), the concept of modulus is defined both verbally and symbolically. The definition of the null modulus, however, is only verbally expressed. Note that the verbal and symbolic expressions are not independent from each other. In fact, the English part of the definition complements and explains the symbolic part. For instance, while m and n are explicitly universally quantified in English, the quantifier is not present symbolically. In (50b), we have the rare case that Hardy & Wright give two definitions of a concept. The text understander therefore has to cope with referential expressions of the form the first definition of X  and the second definition of X . 4.3.3.2

Definite Noun Phrases that Refer to Argument Structures

Frequently, definite noun phrases refer to complex argumentation structures. (51)

T HEOREM 1. Every positive integer, except 1, is a product of primes. Either n is prime, when there is nothing to prove, or n has divisors between 1 and n. If m is the least of these divisors, m is prime; for otherwise ∃l. 1 < l < m . l | m; and l | m → l | n, which contradicts the definition of m. Hence n is prime or divisible by a prime less than n, say p1 in which case n = p1 n1 ,

1 < n1 < n.

Here either n1 is prime, in which case the proof is completed? , or it is divisible by a prime p2 less than n1 , in which case n = p1 n1 = p1 p2 n2 , 1 < n2 < n1 < n. Repeating the argument? , we obtain a sequence of decreasing numbers n, n1 , . . . , nk−1 , . . . , all greater than 1, for each of which the same alternative? presents itself. Sooner or later we must accept the first alternative? , that nk−1 is a prime, say pk , and then n = p1 p2 . . . pk .

4.4

Connectives, Conditional and Pseudo-Conditional Statements

79

Discourse (51) contains four definite noun phrases that refer to argument structures. As is analysed in ch. 3, the proof author iterates over an abstract argument, say P(X). For each instantiation X = n, X = n1 and X = n2 , a proof by cases is carried out. The noun phrase the argument refers to P(X). The identification of such argument structures is challenging since it requires the recognition of such “abstract” proofs by cases. The noun phrase the first alternative refers to the first case of the proof by cases in such an iteration. The definite noun phrase the same alternative refers to the proof by cases object as such. The last noun phrase, the proof is completed is not as easy to resolve as it might appear at a first glance. Proofs have a recursive structure consisting of subproofs. In (51), each proof by cases for each iteration consists of two proofs, the first and terminating case and the second, recursive case. In the first iteration, the first case is terminated by when there is nothing to prove while in the second iteration it is terminated by the proof is completed. 4.3.3.3

It, This, Hence and Therefore

The cue words hence, thus, therefore, this, from this etc. can all refer to one or more propositions, as the following four exemplary sentences show. (52)

n

a. [...] Since pn < 22 is true for n = 1, it? is true for all n. b. [...] Hence p1 and q1 both appear in the unique factorisation of N and p1 q1 | N. From this? it follows that p1 q1 | n and hence that q1 | n/p1 . √ c. If ab = n, then a and b cannot both exceed n. Hence? any composite n is divisible by a prime √ p which does not exceed n. d. Let 2, 3, 5, . . . , p be the aggregate of primes up to p, and let q = 2 · 3 · 5 · · · p + 1. Then q is not divisible by any of the numbers 2, 3, 5, . . . , p. It is therefore? either prime, or divisible by a prime between p and q. In either case there is a prime greater than q, which proves the theorem.

In (52a), the anaphoric expression may refer to its immediately preceding proposition, but it may also refer n 1 to other salient propositions, say n = 1,  pn < 22 is true for n = 1, or  p1 < 22 . In (52b–52d), the anaphoric expressions from this, hence, and therefore all refer to a set of propositions. These very few examples suggest a difference in the use of therefore and hence. While the former is likely to refer to the last proposition in a reasoning chain, the latter is more readily understood as referring to all prior premises and asserted statements.

4.4

Connectives, Conditional and Pseudo-Conditional Statements

The logical structure of mini-discourse (53) has already been discussed in detail in ch. 3. To facilitate its linguistic analysis, we highlighted each occurrence of the connectives and, or and not as well as keywords signalling conditional statements. (53)

T HEOREM 3 (E UCLID ’ S FIRST THEOREM ). If p is prime, and1 p | ab, then p | a or p | b. Suppose that p is prime and2 p | ab. If p |a then (a, p) = 1, and3 therefore, by Theorem 24, there are an x and4 a y for which xa + yp = 1 or xab + ypb = b. But p | ab and5 p | pb, and6 therefore p | b.

We will first analyse the use of and, or and not.

4.4.1

Conjunction, Disjunction, and Negation

The interpretation of the connectives and, or and not is subject to the same level of ambiguity in mathematical texts as it is their use in other text genres. We therefore restrict ourself to a few, brief remarks.

80

A Linguistic Analysis of Proofs

Conjunction. Discourse (53) contains several different uses of and: and1 , and2 and and5 are each employed to express a logical conjunction of two statements; and3 and and6 , in combined use with therefore, signal forward reasoning (i.e., and therefore sequences reasoning steps); and and4 is a coordinating construction that results in the existential quantification of both x and y. There are, however, yet other uses of and as the following three items show: • Filling the arguments of a predicate (e.g.,  f and φ are asymptotically equivalent, or that f is asymptotic to φ.) • Sequencing actions (e.g., If we arrange them in increasing order, associate sets of equal primes into single factors, and change the notation appropriately, we obtain...) • Combining assumption and action (e.g., If ai > bi , and we divide by pbi i , we obtain...) Noteworthy, Hardy & Wright express conjunction also with the symbols “,” (comma) and “.” (full stop). Whenever a complex statement is composed of those punctuation marks, and its parts are formulae, it is a logical conjunction.

Disjunction. Discourse (53) contains two occurrences of the word or. Only its first occurrence, in the theorem statement, forms a logical disjunction.20 The second occurrence in xa + yp = 1 or xab + ypb = b indicates a rewriting step; the first equation is multiplied with b, yielding the second equation. A nonlogical use of the lexical item or is also present in the following two sentences: (54)

a. We express the fact that a is divisible by b, or b is a divisor of a, by b|a. b. The series (un ) or 1, 1, 2, 3, 5, 8, 13, 21, . . . in which the first two terms are u1 and u2 , and each term after is the sum of the two preceding, is usually called Fibonacci’s series.

The uses of the lexical entry or in (54a) and (54b) are rhetorical: or is used similar to in other words to supply alternative descriptions for a given mathematical entity. In (54a), the word or is used give alternative descriptions for divisible/2, namely divisor of/2 and “| /2”. In (54b), the word or is used to introduce an alternative description of un by enumerating the series as 1, 1, 2, 3, 5, 8, 13, 21, . . .. In Hardy & Wright’s textbook, we did not find a symbol (for instance, ∨) to express a logical disjunction. Negation. There are various ways to express a negated statement. In discourse (53), negation only occurs once, namely in the symbolic expression p | a. In Hardy & Wright’s textbook, negation is often expressed symbolically, as in x = 0, x ≡ a(modm) and s0 < s1 < s. The following three examples, along with their readings, show how Hardy & Wright express negation non-symbolically. (55)

a. a and b have no common factor. Reading: ¬∃x : f actor(x, a) ∧ f actor(x, b) √ b. a and b cannot both exceed n. Reading: ¬(gt(a, sqrt(n)) ∧ gt(b, sqrt(n))) c. Any prime number, except 2 or 3, is of the form 6n + 1 or 6n + 5. Reading: ∀p : (prime(p) ∧ p = 2 ∧ p = 3) → ∃n : p = 6n + 1 ∨ p = 6n + 5 ∧ ¬∃n : (2 = 6n + 1 ∨ 2 = 6n + 5) ∧ ¬∃n : (3 = 6n + 1 ∨ 3 = 6n + 5)

20 Note

that the only correct use of the logical or in “A or B” is “either A or B but not both”. The inclusive use must be stated explicitly as “A or B or both”.

4.4

Connectives, Conditional and Pseudo-Conditional Statements

4.4.2

81

Conditionals

The conditional statement is the basic ingredient of a mathematical argument. In stark contrast to the expression of a conditional in first order logic, which only provides the → sign, mathematical writing has many linguistic constructions to express the logical consequence relation between statements (cf. Fig. 4.1). Moreover, conditional constructions can extend over sentence boundaries and can also exhibit quite a degree of ambiguity. Linguistically, it is consequently a complex if not impossible matter to distinguish the assumed parts from the derived statements of a proof. The difficulty of processing conditional statements has already been recognised elsewhere, of course. As Kamp & Reyle note: “[Consequently,] no simple algorithm exists for deciding where the hypothetically asserted part ends and the text continues with assertions that are made categorically (i.e., not conditionally upon the assumptions expressed by the suppositional part).” As they point out, this question is subordinate to the general problem: “how the language user perceives the logical and rhetorical structure of a discourse or text is still poorly understood, and very far from being solved” [92, p. 145]. Our discussion of conditionals is divided into four parts. In the first part, we study classical conditionals that introduce assumptions and draw conclusions from them; in the second part, categorical conditionals expressing that a statement implies another statement; in the third part counterfactuals, in the fourth part pseudo-conditional statements; and in the fifth part definitional and other bi-conditional statements. We will see that the linguistic clues that occur in textbook proofs do not suffice to disambiguate between various readings of conditional statements.In fact, mathematical and meta-mathematical reasoning, as we demonstrated in ch. 3, must be employed to support the linguistic analysis. 4.4.2.1

Classical Conditionals

Classical conditionals can be divided into two classes; those which make generic assertions and usually occur in theorem contexts; and those which have the same surface form, but have a subtly but importantly different semantic and pragmatic use. Classical Conditionals (theorem contexts). Intra-sentential conditionals of the form If A, then B, where the propositional content of B only depends on the propositional content of A, are exceptional. The following six examples are such classical conditionals or surface form variations thereof. (56)

a. If p is prime, and p | ab, then p | a or p | b. b. If ab = n, then a and b cannot both exceed c. If

2n+1 − 1

is prime, then

2n (2n+1 − 1)

d. If p and q are odd primes, then e. If p and q are odd primes, then



case qp = − qp .

p q



n.

is perfect.

p q , where p = 1 (p − 1), q = 1 (q − 1). q p = (−1) 2 2

p q q = p , unless both p and q are of the form 4n + 3, in

which

f. The square of an odd number 2m + 1 is 4m(m + 1) + 1. Statements (56a–56c) are of the classical surface form If A, then B. Statement (56d), Hardy & Wright’s Theorem 98, is of the surface form If A, then B, where C while it reads as If A and C, then B. Thus, its correct logical form is:   p q 1 1  = (−1) p q . ∀p∀q∀p ∀q .(odd(p) ∧ odd(q) ∧ prime(p) ∧ prime(q) ∧ p = (p − 1) ∧ q = (q − 1)) → q p 2 2 





Sentence (56e), Hardy & Wright’s Theorem 99, has the surface form If A, then B, unless C, in which case D; logically it reads as the conjunction of two conditionals If A and not C, then B and If C, then D. As sentence (56f) shows, a conditional can be expressed without cue words. However, the embedding of an indefinite noun phrase in a functional expression indicates a generic reading. Given a mathematical theory, all of the above sentences can be interpreted on the sentence level. However, this does not need to be the case, as the following discussion shows.

82

A Linguistic Analysis of Proofs

A Subtle But Important Difference. Reconsider discourse (53). The proof contains two syntactic constructions of the form If A, then B; one occurs in the theorem statement, and the other is embedded in the proof context. These two conditionals differ in two aspects. From our mathematical analysis in § 3.4, we know that the variables p, a and b in the conditional theorem statement are all universally quantified. Thus, the logical form of the theorem statement reads as ∀p ∈ N : ∀a ∈ N : ∀b ∈ N : prime(p) ∧ p | ab → p | a ∨ p | b. However, the occurrences of the variables p and a in the second if-then construction are free. It reads as p | a → (a, p) = 1. Consequently, the if-then statement itself is not a reliable means to identify the quantification of the variables it contains. The second distinguishing mark between the two conditionals concerns the scope of the variables they use and the assumptions they introduce. The theorem statement introduces two assumptions, namely,  p is prime and p|ab. From this, the conclusion part,  p | a ∨ p | b, is claimed to hold. That is, the scope of these two assumptions is restricted to the theorem statement itself (and coincides with the scope of the quantifiers for p, a and b). However, the scope of the assumption p | a, which is introduced by the second if-then construction, extends beyond its conclusion part. Moreover, this premise on its own is not sufficient to logically derive the statement in its conclusion part. That is, the derived statement (a, p) = 1 (which reads as the greatest common divisor of a and p equals 1) is not a logical consequence of the assumption p | a. In order to make this a sound reasoning step, the assumption prime(p), introduced earlier in the dialogue, must be used as well. Consequently, the if-then statement can introduce assumptions that “remain active” in subsequent statements. Note that the second if-then statement can be written as Suppose p | a. Then (a, p) = 1 . . ... As a matter of fact, a construction of the form Suppose A is used in the first proof sentence of Discourse (53). Its assumptions, namely,  p is prime and  p | ab must extend over the remainder of the proof. However, there are obviously cases where assumptions do not remain valid until the proof terminates. In a proof by cases, for example, there is always an assumption that is initiating a case in the proof. This assumption can only be used by reasoning steps of the respective case. For instance, it is unsound to use the initiating assumption of the first case in the reasoning chain that constitutes the subproof for the second case. In the existence proofs of Hardy & Wright and LeVeque, the assumption prime(p) can be used to prove the first case, and the assumption ¬prime(p) is accessible for the derivation of statements in the second case. Constructions of the form Therefore B or Hence B can be regarded as incomplete conditionals: they do not explicitly verbalise or symbolise the propositions that B depends on, either because the propositions were mentioned earlier in the proof and are considered salient, or because the proof author considers them self-evident or trivial. Multi-Sentential Conditionals. It is quite usual for conditionals to cross sentence boundaries, as the following fragment from Hardy & Wright’s uniqueness proof shows: b

Suppose now that n = pa11 pa22 . . . pak k = qb11 qb22 . . . q j j , each product being a product of primes in

(57)

b

standard form. Then pi | qb11 q j j for every i, so that every p is a q; and similarly every q is a p. Hence k = j and, since both sets are arranged in increasing order, pi = p j for every i. The first sentence, prefixed with the cue suppose introduces four assumptions.21 Only in the following sentence do Hardy & Wright draw a number of conclusions from these four assumptions. The individual conclusions are linguistically marked by then, so that and and similarly. The reasoning then continues with a hence construction indicating that its argument k = j is a consequence of earlier statements. In the subsequent since construction, it is conveyed that its first argument both sets are arranged in increasing order, together with the wider mathematical context, implies its second argument  pi = p j for every i. 21 The

b

four assumptions are: (1) pa11 pa22 . . . pak k is a product of primes in standard form; (2) qb11 qb22 . . . q j j is a product of primes in b

standard form; (3) n = pa11 pa22 . . . pak k ; (4) n = qb11 qb22 . . . q j j .

4.4

Connectives, Conditional and Pseudo-Conditional Statements

4.4.2.2

83

Categorical Conditionals

Categorical conditionals also constitute a logical relation between two or more statements. The relation, however, is of a different nature. Consider linguistic constructions of the form A implies B. Statements of this form do not introduce the assumption A in order to conclude B. Instead, A implies B constructions always presuppose that A has already been shown to hold in prior discouse, either by some assumption that introduced it or by some forward or backward reasoning chain that derived it. This A is then used to conclude B. Thus, statements of the form A implies B always indicate that the truth of A non-conditionally implies the truth of B and that A has been introduced in the discourse at an earlier stage. A similar argument can be made for constructions of the form Since A, B holds. We give a brief discussion of a few examples. (58)

a. Theorem 28 implies Theorem 29. b. [For] kai − ka j ≡ 0(modm) implies ai − a j ≡ 0(modm), by Theorem 55, [...] c. Since log log x ≤ n, we deduce that π(x) ≥ log log x. d. Since (h, k) = 1, the equation kx − hy = 1 is soluble in integers (Theorem 25). e. Each of these numbers is a root of (7.5.1), since f d ≡ 1 implies f hd ≡ 1.

In (58a), it is claimed that one statement, which is referred to by Theorem 28, logically entails another statement, which is referred to by Theorem 29. In (58b), we have an example of the form A implies B because of C. Sentences (58c-58d) are similar constructions. In the first since example, log log x ≤ n allows to conclude π(x) ≥ log log x. In the second since example, kx − hy = 1 is soluble in integers because of (h, k) = 1 and the application of another statement, which is labelled as Theorem 25. The last sentence shows that since and implies can both appear within the same sentence. In none of these constructions does the proof author introduce an assumption to the proof context. 4.4.2.3

Counterfactuals

A counterfactual is an expression of the form: “If A were the case, then B would be the case”, where A is supposed to be false.22 Read these examples: (59)

a. It will now be obvious why 1 should not be counted as a prime. If it were, Theorem 2 would be false, since we could insert any number of unit factors. b. If p were a prime of k(i), it would divide x + i or x − i, and this is false, since the numbers x i ± p p are not integers. Hence p is not a prime.

In the first example, the premise “1 is prime” would invalidate a theorem, and thus, must be false. In the second example, the premise “p is a prime of k(i)” must be false as well, since assuming its truth, for the sake of argument, yields a contradiction. It is common practise to use counterfactuals within proof by contradiction, or reductio ad absurdum arguments. 4.4.2.4

Pseudo-Conditionals

Pseudo-conditional statements share the syntactic form of conditional statements. However, they do not express a logical relation between a set of assumptions and a hypothetically asserted conclusion, or a logical relation between two statements. 22 As

de Swart and Nederpelt point out, a counterfactual is a subjunctive conditional, but not all subjunctive conditionals are counterfactuals [41].

84

A Linguistic Analysis of Proofs

Naming vs. Assuming. The conditional form of a statement can indicate a naming action, as the following example suggests. (60)

a. If m is the least of these divisors, then m is prime. b. The least of these divisors is prime.

While (60a), taken from Hardy & Wright’s existence proof of the FTA, has a conditional form, our reformulation (60b) has not. There are two interpretations of the if-then construction in (60a). In the first interpretation, an assumption is made, namely that m corefers with the least of these divisors to the same object. This might or might not be the case — it is therefore assumed, and under this assumption the statement m is prime is concluded. In this interpretation, (60a) is semantically different from (60b). In the second interpretation of (60a), no assumption is made, but a naming action is performed. For ease of argument, the object being referred to by the least of these divisors is given a second and shorter reference, namely m. This new reference is used in the conclusion m is prime. In this interpretation, (60a) is semantically identical to (60b). Clearly, the choice between these two interpretations depends on the variable m. If m has been introduced in the discourse before (60a), then a proof reader will prefer our first interpretation. And if m is new, that is, m is introduced by the premise of (60a), then a reader will prefer our second interpretation.23 The following sentences are other examples that may suggest a naming action. (61)

a. Hence, if N = n − p1 q1 , we have 0 < N < n and N is not abnormal. Now p1 | n and so p1 | N; similarly q1 | N. Hence p1 and q1 both appear in the unique factorisation of N and p1 q1 | N. b. If pn is the nth prime then π(pn ) = n. c. If q1 is the least q, we have q21 ≤ n.

Only after consideration of their respective discourse context can it be established that the first two sentences contain naming actions, and that the third sentence’s premise is a hypothetical assertion. In (61a), for brevity, the term n − p1 q1 is given a new reference, namely N. Since N is then used six times, the naming action considerably shortens the text. The conditional in (61b) is a naming construction as well since pn does not occur in its discourse context. In sentence (61c), no naming takes place since the name q1 has already been introduced in prior context. Therefore, the premise q1 is the least q introduces an assumption into the discourse instead of a new name for a specified term. Naming vs. Asserting An Identity. As we have seen in the prior discussion, naming can easily been confused with assuming or asserting identities. Furthermore, naming and asserting identity can co-occur in the same text segment as the following mini-discourse taken from [68, p. 20] illustrates. (62)

a. It is plain that any modulus S, except the null modulus, contains some positive numbers. b. Suppose that d is the smallest positive number of S. c. If n is any positive number of S, then n − zd ∈ S for all z. d. If c is the remainder when n is divided by d and n = zd + c, then c ∈ S and 0 ≤ c < d. e. Since d is the smallest positive number of S, c = 0 and n = zd. f. Hence T HEOREM 23. Any modulus, other than the null modulus, is the aggregate of integral multiples of a positive number d.

The discourse consists of six sentences leading to a theorem statement in the last sentence (62f). In the first five sentences, several naming actions take place. In the first sentence, which has the logical form ∀S ∈ Modulus : S = 0 → ∃n ∈ N : n > 0 ∧ n ∈ S, 23 A third reading can be constructed that combines the introduction of the assumption “there is an entity that qualifies as the least

divisor of the set of divisors of p” with the naming action “let the name of this entity be m”.

4.4

Connectives, Conditional and Pseudo-Conditional Statements

85

a universally quantified variable named S is introduced. In subsequent discourse, S then denotes any modulus, except the null modulus. In (62b-62d), other names are introduced: (i) d, denoting the smallest positive number of any S, (ii) n, denoting any member of S and (iii) c, denoting the remainder of n when divided by d. Note that the definite description given for c, the remainder when..., coincides with c’s role in the equation n = zd + c. Now compare the assumption made in (62b) with the first part of the since construction in (62e). The two sentences share the sentence d is the smallest positive number of S word for word. While its first use is identified as a naming action, its second use is clearly an asserted identity. A second confusion may be caused when attempting the interpretation of the symbol d in the theorem sentence (62f). Only pragmatic considerations will help distinguishing d from denoting “a positive number of S” or denoting “the smallest positive number of S”. Acting vs. Assuming. There are sentences that must be clearly distinguished from conditionals and naming actions, albeit their schematic form is similar. Consider these three examples. (63)

i −1 bi +1 pi+1 . . . pbk k . a. If ai > bi , and we divide by pbi i , we obtain pa11 . . . pai i −bi . . . pak k = pb11 . . . pbi−1

b. If we multiply out the right-hand side and use the identity 2 cos mα cos nα = cos (m + n)α + cos (m − n)α, we obtain x1 x2 = 4(x1 + x2 ) = −4. c. If we put a = 1 in Theorem 79, it becomes T HEOREM 80 (W ILSON ’ S T HEOREM ): (p − 1)! ≡ −1(mod p). The conditional’s premise in (63a) contains the statement of one assumption, namely, ai > bi , and the statement of one operation, namely dividing a previously given term by another. In (63b), the conditional’s “premise” does not contain any assumption, but only one operation (multiplying). The result of performing this operation is then made explicit in the conditional’s “consequent”. In (63c), the “premise” we put a = 1 does not introduce the assumption a = 1, but describes an instantiation action. In fact, given this instantiation, Theorem 80 is a trivial corollary of Theorem 79, that is, Theorem 80 does not depend on additional assumptions. 4.4.2.5

Definitional Constructs and other Bi-implications

Due to their highly formulaic nature, definitions and other bi-implications can often be recognised on purely syntactic grounds. A large part of Hardy & Wright’s definitions are of the two forms NP is said to VP if S and if S then NP is said to VP. Moreover, the defined concept is usually italicised. Non-definitional constructions are usually cued by if and only if phrases. NP is said to VP if S. We give four examples. (64)

a. An integer a is said to be divisible by another integer b, not 0, if there is a third integer c such that a = bc. b. A number p is said to be prime if (i.) p > 1, (ii.) p has no positive divisors except 1 and p. c. A function f (m) is said to be multiplicative if (m, m ) = 1 implies f (mm ) = f (m) f (m ). d. A number n is said to be quadratfrei if it has no squared factor.

In each of the examples, type information is asserted in the NP is said to VP-part of the definitional construction: In (64a), a and b are introduced as integers; in (64b) and (64d), p and n are introduced as numbers, respectively; and in (64c), f (m) is introduced as a unary function. The first definition, however, is different from the others. Here, the NP is said to VP-part imposes additional restrictions on the entities it introduces, as the intended logical form of (64a) shows: ∀a∈N ∀b∈N .divisible(a, b) ⇔ (a = b ∧ b = 0 ∧ ∃c∈N .c = a ∧ c = b ∧ a = bc).

86

A Linguistic Analysis of Proofs

That is, the two conditions introduced by another integer b, not 0 attach to the right-hand side of the bi-implication. Consequently, this prevents a compositional construction of its semantic representation. Moreover, the two indefinite NPs an integer a and another integer b yield universally-quantified variables in the logical form, while a third integer c, since prefixed with there is, yields an existentially quantified variable. Also, note the use of the pronoun in (64d). It shows that an English anaphoric expression can occur in the premise of a conditional while its antecedent is introduced in the conditional’s consequent. If S then NP is said to VP. We give four examples. (65)

a. If (a, b) = 1, a and b are said to be prime to one another or coprime. b. If ζ and η are two numbers such that ξ = ±1, then ξ is said to be equivalent to η.

aη +b cη +d ,

where a, b, c, d are integers such that ad − bc =

c. ξ is an algebraic number if it is a root of an equation c0 ξn + c1 ξn−1 + . . . + cn = 0

(c0 = 0)

whose coefficients are rational integers. If c0 = 1, then ξ is said to be an algebraic integer. d. If γ = αβ, then γ is said to have α as a left-hand divisor and β as a right-hand divisor. It seems that constructions of the form If S then NP is said to VP are much more context-dependent than those of the form NP is said to VP if S.. In (65a), a and b are not explicitly introduced as numbers. As a matter of fact, in Hardy & Wright, their type was introduced as many as four sentences earlier (see (66d) below). The context dependence also shows in the last sentence of (65c). It clearly depends on the context that is set-up by its first sentence. Similarly, in (65d), the meaning of γ, α and β has to be taken from the previous context. Other Definitional Constructions. In Hardy & Wright’s textbook, there is a variety of other linguistic constructions that can be used to define new concepts: (66)

a. The sum of the series whose general coefficient is f (n) is called the generating function of f (n), and is said to enumerate f (n). b. We define the highest common divisor d of two integers a and b, not both zero, as the largest positive integer which divides both a and b; and write d = (a, b). c. We express the fact that a is divisible by b, or b is a divisor of a, by b|a. d. The least common multiple of two numbers a and b is the least positive number which is divisible by both a and b. e. A class of residues (mod m) is the class of all the numbers congruent to a given residue (mod m), and every member of the class is called a representative of the class.

In (66a), two noun phrases are joined by is called, a linguistic cue phrase that clearly indicates a definition. Similarly, in the sentences (66b-66c), the use of the cue phrases we define and we express the fact that clearly signals a definitional construction. Definitional statements can also be expressed without the use of cue words if, then, is said to be and is called. The sentences (66d–66e) show subtles ways to express a definition. In (66d), two noun phrases are joined together with the verb is. The first sentence of (66e) is a definition of the form INDEF NP is DEF NP. Only the use of the italic font favours a reading that consists of an “identity by definition” relation instead of an “asserted identity” relation.

4.4

Connectives, Conditional and Pseudo-Conditional Statements

87

Bi-Implications. In Hardy & Wright, bi-implications are usually of the form S1 if and only if S2, S1 is a necessary and sufficient condition for S2 and A necessary and sufficient condition that S2 is that S1. We give three examples for constructions of the first form, and one example for each other construction: (67)

a. A number is divisible by 11 if and only if the difference between the sums of its digits of odd and even ranks is divisible by 11. b. The equation ax + by = n is soluble in integers x, y if and only if d|n. c. A number n is the sum of two squares if and only if all prime factors of n of the form 4m + 3 have even exponents in the standard form of n. d. n = 4a (8m + 7) is a necessary and sufficient condition for n to be representable by three squares. e. If m > 1, then a necessary and sufficient condition that m should be prime is that m|(m − 2)! + 1.

In Hardy & Wright’s textbook, these clue phrases indicate a bi-conditional statement. Hardy & Wright do not use these constructions for bi-conditional statements that act as definitions.

Outlook The particularities of mathematical discourse have now been analysed from a mathematical and metamathematical as well as a linguistic perspective. In the second part of this dissertation we develop, step by step, a computational framework for the automatic processing of textbook proofs. The computational framework uses, extends and combines two techniques that have been developed in automated reasoning and natural language processing, namely, proof planning and discourse representation theory. A prototypical system called Vip implements our computational framework. In the sequel, we describe each of its different processing phases in detail. Note that Vip’s architecture has been outlined at the end of ch. 1, especially in Fig. 1.5. In the first of the following 3 chapters, we describe Vip’s parsing module that constructs intermediate semantic representations within an extended DRT framework.

88

A Linguistic Analysis of Proofs

Chapter 5

Mathematical Discourse and DRT — The Parser Module Discourse representation theory (DRT) [91, 92, 161] has been motivated by the study of anaphoric behaviour and donkey sentences. It attempts to give a general account of definite descriptions and pronominal anaphora. DRT’s underlying insight or premise is that truth does not only apply to single sentences, but also to multisentence discourse; that the sentences of a discourse are connected to each other; and that they can rarely be interpreted in isolation. In the last years, DRT has developed into an established and widely used theory of discourse representation. Now, the question is whether DRT can be successfully applied to the representation of mathematical discourse, and whether it can also serve as a computational framework for the construction of such representations. As we will see, DRT does indeed provide a good basis. However, it will need to be adapted for our purposes; both the language of discourse representation structures (DRSs) and the discourse update mechanism will need to be modified and extended. Fig. 5.1 describes the input-output behaviour of Vip’s parser module. Its input is a sequence of LATEX1 tokens, with markers that signal sentence boundaries.2 For each sentence, the parser module returns both

LATEX Sentence e.g., [n,is,prime]

Syntactic & Semantic Analysis

/ Parser Module

/

v1∈N name(v1 , n) . v1 =? prime(v1 )

Figure 5.1: The Parser Module

a syntactic and an intermediate and underspecified semantic representation in DRS form. The discourse update engine, empowered by a proof planner, then takes this representation and incorporates it into a proof representation structure (PRS). The update of the PRS results in a complete specification of those parts of the semantic representation that the parser left underspecified. In this chapter, we describe the construction of underspecified DRSs on the sentence level. PRSs are described in ch. 6, and the discourse update engine is described in ch. 7. We start with an elementary introduction to DRT following along the lines of Kamp & Reyle [92].

5.1

Discourse Representation Theory

The basic data structure of DRT is the discourse representation structure (DRS). Its nature is twofold, the representation of content and the provision of context. A DRS consists of a set of discourse referents and a 1 The LAT X E

text setting language is well accepted among mathematicians. Therefore, we decided to use it as a representation language for the input of mathematical expressions. 2 The LAT X input has been tokenised by hand. Also, sentence boundaries have been marked manually. E

89

90

Mathematical Discourse and DRT — The Parser Module

set of discourse conditions. DRT defines which linguistic structures introduce new discourse referents and conditions, and which linguistic structures refer to previously introduced referents.

5.1.1

Formal definition of DRSs.

For the following definitions, let x1 , . . . , xn be discourse referents, let γ1 , . . . γm be conditions, let R be a relation symbol of arity n, and let B, B1 , B2 be DRSs. Definition 5.1.1 Discourse Representation Structures.

1.

x1 , . . . , xn γ1 is a DRS. ... γm

2. R(x1 , . . . xn ) is a condition. . 3. x1 = x2 is a condition. 4. Each of ¬B, B1 ∨ B2 , B1 → B2 is a condition. 5. Nothing else qualifies as a DRS or a condition. The resolution of definite descriptions and pronominal anaphora is restricted by a notion of accessibility, a geometrical concept that defines accessible discourse referents and conditions. Definition 5.1.2 Accessibility Relation Among DRSs. 1. B1 is accessible from B2 if and only if either B1 = B2 , or B2 is a subordinate of B1 . 2. B2 is a subordinate of B1 if and only if either B2 is a direct subordinate of B1 , or there is a DRS B3 such that B3 is a subordinate of B1 , and B2 is a subordinate of B3 . 3. B2 is a direct subordinate of B1 if and only if • B1 contains a condition of the form ¬B2 . • B1 contains a condition of the form B2 ∨ B3 or B3 ∨ B2 for some B3 . • B1 contains a condition of the form B2 → B3 for some B3 . • B1 → B2 is a condition of some DRS B3 . Note that DRT is not a theory that specifies how to link an anaphoric expression to its antecedent. The notion of accessibility merely restricts the number of available antecedents. Usually, additional information such as agreement or topic needs to be taken into account. A DRS B is true in a model M = if there is a function f that maps each of the discourse referents of B to an element of UM and that verifies each of the conditions γ in M. For details, see Kamp & Reyle’s [92]. In this introductory presentation of DRT, however, we content ourselves with defining the semantics of a DRS by specifying its translation into the language of first-order logic: Definition 5.1.3 Translation Semantics of DRSs into First-Order Logic Formulae. ⎞FOL x1 , . . . , xn ⎟ ⎜ γ1 ⎟ = ∃x1 , . . . ∃xn (γFOL ∧ . . . ∧ γFOL 1. ⎜ m ) 1 ⎠ ⎝ ... γm ⎛

2. (R(x1 , . . . xn ))FOL = R(x1 , . . . xn )

5.1

Discourse Representation Theory

91

. 3. (x1 = x2 )FOL = (x1 = x2 ) 4. (¬B)FOL = ¬(BFOL ) 5. (B1 ∨ B2 )FOL = BFOL ∨ BFOL 1 2 ⎛

⎞FOL x1 , . . . , xn ⎜ γ1 ⎟

FOL FOL ) → BFOL ⎟ (γ 6. ⎜ → B = ∀x , . . . ∀x ∧ . . . ∧ γ 1 n m 1 ⎝ ... ⎠ γm

5.1.2

The Construction of DRSs from the Syntax Tree.

A discourse representation structure is constructed from a syntax tree by the iterative application of construction rules. 5.1.2.1

Kamp & Reyle’s Construction Rules

In [92], Kamp & Reyle give a number of rules that define the construction of DRSs from syntax trees. Fig. 5.2 depicts Kamp & Reyle’s construction rule CR.EVERY [92, p.169]. In this figure, UK denotes the discourse referents of the DRS K, that is, its universe, and ConK denotes K’s conditions. The rule is divided CR.EVERY Triggering configuration γ ⊆ γ¯ ∈ ConK :

S vv @@@@ vv @@ v {vv  NPGen=β V P HHH r HHH rrr H# yrr DET N  every

Introduce in ConK : Introduce in UK1 : Introduce in CONK1 : Introduce in CONK2 :

Delete γ¯ from ConK

or:

V P LL LLL {{ { LL% { { }{ NPGen=β V HHH rr HHH r r H# yrr N DET  every

new condition K1 ⇒ K2 with K1 = K2 = new discourse referent u new condition N(u) new condition γ , where γ results from γ¯ by substituting: u for NPGen=β HHH rr HHH r r r H# yr N DET  every

Figure 5.2: Kamp & Reyle’s Construction Rule CR.EVERY

into two parts, a precondition, or triggering configuration, and a postcondition, or tree transformation. If the precondition pattern matches the syntax tree, then the tree is transformed by performing each of the actions of the transformation. Example 5.1.1 Construction Rule Application. The syntax for the sentence Every farmer owns a donkey is given in (68).

92

(68)

Mathematical Discourse and DRT — The Parser Module SR kkkk RRRRRR k k k RRR kkkk RRR ) ukkkk NP I V P I x I x II x II {xx  $ DET N V PF FF vv FF vv v #   zvv

every

f armer

V

NP I

II II II $

x xx x| x



owns

DET 

N 

a

donkey

The syntax tree contains a triggering configuration for the construction rule CR.EVERY. Its application yields (69).

x f armer(x)

→ x

|S HHHH || HH | H$ | ~||

V P 

V PF

(69) V

v vv vv v z v



owns

FF FF #

NP I

x xx x| x

DET 

a

II II II $

N 

donkey

The resulting tree can be further reduced. It matches the triggering condition of the construction rule CR.ID, and CR.ID’s application then transforms the subtree for the indefinite noun phrase. As a consequence, a new discourse referent, say y, as well as the condition donkey(y) are introduced. We finally obtain (for details see [92]):

(70)

x y f armer(x) → owns(x, y) donkey(y)

In this thesis, we mostly omit the discussion of DRT construction rules. We mention such rules only if they modify or extend those given in Kamp & Reyle’s [92]. 5.1.2.2

The Composition of DRSs using λ-Terms.

The λ-calculus provides an effective computational device for the construction of discourse representation structures. The use of the λ-calculus for this purpose is described in Asher’s [92] and also in newer accounts, e.g., van Eijck and Kamp’s [161]. In this approach, the leaf nodes of a parse tree, that is, the lexical entries, have as their semantic representation λ-terms. The combination of complex semantic representations from simpler ones is then accomplished by applying λ-terms to each other. The construction of semantic representations is usually compositional, and thus usually follows the syntactic structure of the sentence.

5.1

Discourse Representation Theory

93

Example 5.1.2 λ-DRT Construction. We use the same example as above. The leaves of the syntax tree now point to their semantic representations as λ-expressions.

(71)

S YYYYYYYY nnn YYYYYY n n YYYYYY nnn n YYYYYY n vn , V P NP C p p C CC ppp CC  xppp ! DET N V P LL i i i i LLL iiii LLL iiii i   i % i i t i NPA every f armer V s AA s O O s AA s O O A ysss  N owns DET λP.λQ.u • P(u) → Q(u) λx. f armer(x) O O   a donkey λT.λx.T (λy(owns(x, y))) O O O O

λR.λS.v • R(v) • S(v) λx.donkey(x)

The semantic representation of S is constructed by combining the semantic representations of NP and VP’. The semantic representation of the NP every farmer is obtained by applying the lexical entry of the determiner to the one of the noun; the semantic representation of the VP’ is constructed by combining the semantic representation of the transitive verb owns with the semantic representation of the indefinite noun phrase a donkey. Technically, composition is defined in terms of λ-term applications. For the given example, we have (=β denotes one β reduction step in the λ-calculus): every farmer

[λP.λQ.u • P(u) → Q(u)] • [λx. f armer(x)] =β λQ.(u • (λx( f armer(x)))(u) → Q(u)) =β λQ.u • f armer(u) → Q(u)

a donkey

[λR.λS.v • R(v) • S(v)] • [λx.donkey(x)] =β λS.v • (λx(donkey(x)))(v) • S(v) =β λS.v • donkey(v) • S(v)

owns (a donkey)

[λT.λx.T (λy(owns(x, y)))] • [λS.v • donkey(v) • S(v)] =β λx.[λS.v • donkey(v) • S(v)](λy.owns(x, y)) =β λx.v • donkey(v) • (λy(owns(x, y)))(v) =β λx.v • donkey(v) • owns(x, v) [λQ.(u • f armer(u)) → Q(u)] • [λx.v • donkey(v) • owns(x, v)] =β (u • f armer(u)) → [λx.v • donkey(v) • owns(x, v)](u) =β (u • f armer(u)) → v • donkey(v) • owns(u, v)

((every farmer) (owns (a donkey)))

The last λ-expression is identical to the DRS representation (70). We now describe, in very few sentences, the construction of DRSs for multi-sentence discourse. In ch. 7, we then propose our, quite modified, discourse update algorithm.

5.1.3

The Construction of DRSs for Multi-Sentence discourse.

In DRT, a discourse is processed incrementally, sentence by sentence. The first sentence s1 of a discourse is processed in an initial context c0 , resulting in a new and richer context c1 . Every other sentence si of a discourse is processed in the context ci−1 that has been created by processing the earlier sentences of the discourse. The result of processing si is to enrich ci−1 by the semantic contribution of si resulting in a new and richer ci . Basically, in Kamp & Reyle’s account [92], the addition of a DRS representation of si to the discourse context ci−1 is a union operation: the discourse referents of si are added to the ones of ci−1 , and the conditions of si are added to the ones of ci−1 (multiple occurrences of referents and conditions are deleted). . Then, conditions of the form xi = x j are resolved, w.r.t. accessibility constraints.

94

Mathematical Discourse and DRT — The Parser Module

This concludes our introductory remarks on DRT. Future sections will now discuss the use, revision and extension of DRT to mathematical discourse.

5.2

Semantic Construction for Terms and Formulae

This section follows the structure of sect. 4.2. We describe the DRS representation of constants, variables, terms and predicates. Adaptions or extensions to the standard DRT framework will be proposed in order to make DRT powerful enough for our purposes.

5.2.1

Constants

In number-theoretical texts, number constants are the major source of proper names. Several treatments of proper names have been proposed within the DRT framework. Fig. 5.3 displays seven possible λ-term representations for the proper name 2. c λP. c = 2 •P(c)

c λP. 2(c) •P(c)

(a)

c λP. name(c, 2) •P(c)

(b)

c λP.

•P(c) {} (e)

2 λP.

(c)

c λP. name(c, 2) •P(c) {}

•P(2) (d)

λP.

(f)

•P(c) {} (g)

Figure 5.3: Handling Proper Names The representation in Fig. 5.3(a), given in Kamp’s 1981 paper [91], interprets proper names as logical constants. Therefore, it can not cover cases where the same proper name refers to different individuals. For mathematical constants, however, this is a desired property. For example, in the Platonic universe of elementary number theory, there is only one number 2 , and 2 unambiguously denotes it. The solution proposed in Fig. 5.3(b) is given in Kamp & Reyle’s textbook on DRT [92]. The intended meaning of the unary predicate 2(c) is that it only holds for entities that fulfill the property of “twohood”. In principle, Fig. 5.3(b) allows a proper name to refer to different individuals. For instance, c may also occur as an argument of the “threehood” predicate. Fig. 5.3(c) reads as follows: there is an entity c that can be referred to (or represented by) the name 2. Since a large part of mathematics is concerned with the manipulation of representations (e.g., the demonstration that two representations indeed refer to the same mathematical entity), it makes sense to preserve the mathematical representation of entities in their respective linguistic representations. For example, if we regard 1+1 as one token, then the discourse condition name(c, 1 + 1) contains the information that 1+1 is a name or representation for some entity c. The representation proposed in Fig. 5.3(d) is, in fact, an invalid partial DRS, given our formal definition of DRSs in Definition 5.1.1. It introduces a discourse constant instead of a (variable) discourse referent and no discourse condition. In contrast to the logical constant representation in Fig. 5.3(a), it allows non-standard interpretations, as do the representations in Fig. 5.3(b)–5.3(c). The number of interpretations of a DRSs can be restricted by the introduction of external anchors, a means to implement the notion of direct reference for proper names. The representations in Fig. 5.3(e)–5.3(g) make use of such anchors. For each of the representations, its interpretation must satisfy the constraint that c is mapped to 2. Note that an attachment of the anchor {} to Fig. 5.3(a)–5.3(b) would render their respective conditions c = 2 and 2(c) superfluous. However, its attachment to Fig. 5.3(c), as shown

5.2

Semantic Construction for Terms and Formulae

95

in Fig. 5.3(f), does not render redundant the condition name(c, 2). As argued above, reconsider the token 1+1, which would be represented as referring to an entity c such that name(c, 1+1) holds, with c anchored to 2, {}. In this case, the name predicate conveys information that supplements the one contained in the anchor, namely that 1 + 1 is a representation or name for 2. The representation of Fig. 5.3(g) is taken from van Eijck and Kamp’s [161]. van Eijck and Kamp consider a proper name as an anaphoric expression that refers to an “externally given” antecedent. Therefore, their semantic representation of 2 does not introduce a new discourse referent c. Thus, proper names get a context-insensitive treatment. This is problematic, as van Eijck and Kamp point out [161, p. 220]: “Does the use of a proper name presuppose that its referent is already represented in the given context? Perhaps, but if so, then ‘context’ needs to be construed in a quite liberal way. So, before such a treatment of proper names can be considered satisfactory, much more needs to be said about how the notion of context is to be construed — what kinds of information may contexts include, from what kinds of contexts can their information come, etc.”

As it stands, for our approach to discourse understanding, the representation in Fig. 5.3(c) is sufficient and will be subsequently used. In addition, each discourse referent will be typed. That is, the representation in Fig. 5.3(d) is extended to λP.

2∈N

•P(2∈N ).

An Example DRS construction. Consider the following two sentences: (72)

a. 17 is prime. b. It is divisible by 1 and 17 (only).

Naturally, the pronoun in (72b) refers to the entity that has been introduced into the discourse by 17. The semantic engine therefore will need to connect it to its antecedent 17. Within the λ-DRT framework, the semantic construction proceeds as follows. The parse tree for the first sentence, which is labelled with lexicon entries and their composition, is given in (73).

NP

/o / 17∈N xS NNNN prime(17) x NNN x xx NNN xx NNN x x NNN x N' {xx



T ERM (73) 

CONSTANT 

17 O

λR.

17∈N

O 

V P H/o / λy∈N . prime(y) HH

V 

{ {{ { { {{ }{ {

HH HH HH H#

ADJ 

is

prime

O

O

O

O

 λy∈N .



prime(y)

O + R(17∈N )

The entity that is indicated by the NP needs to satisfy the predicate introduced by the VP. The NP introduces a discourse referent for the representation of 17. The VP introduces a condition on this discourse referent, namely λy∈N .prime(y), representing the fact that the individual that is expected from the NP satisfies this predicate; the contribution of the verb is empty (or ignored). We obtain the semantic representation of the

96

Mathematical Discourse and DRT — The Parser Module

sentence by composing the semantic representation of the noun phrase with that of the verb phrase. Here, the λ-expression λR.(17∈N • P(17∈N )) is applied to λy∈N .(prime(y)). Two consecutive β-reductions then yield the desired result, namely [17∈N |prime(17)], which is the sequential representation of the DRS that is being depicted at the root of the parse tree. For the second sentence, a standard DRT construction results in (74):

(74)

1∈N , 17∈N , w . w =? divisible(w, 1) divisible(w, 17)

.

The processing of the numbers 1 and 17 each introduce a discourse referent. The pronoun also introduces a discourse referent and a condition signalling that an appropriate antecedent has to be found. If we combine the semantic representation for the second sentence with the semantic representation of the prior discourse context, we obtain (75):

(75)

1∈N , 17∈N , w prime(17) . w = 17 divisible(w, 1) divisible(w, 17)

.

. Note that (75) links the anaphoric expression w to its antecedent, namely, w = 17. If we use a substitutional approach to anaphora resolution, we replace all occurrences of w with 17, and we obtain:

(76)

1, 17 prime(17) divisible(17, 1) divisible(17, 17)

.

As this example demonstrates, discourse update in standard DRT is basically the set union of both the discourse referents and the discourse conditions, followed by an anaphora resolution mechanism. The set union operation deletes multiple occurences of both discourse referents and conditions. If we use the semantic representation for constants in Fig.5.3(f), we obtain (77a), which translates (using Definition 5.1.3) to the first order logic expression (77b), which is a rather cumbersume logical form.

(77)

a.

c1 , c2 , c3 , w name(c1 , 17) prime(c1 ) . w = c1 name(c2 , 1) name(c3 , 17) divisible(w, c2 ) divisible(w, c3 ) {, , }

b. ∃c1 ∃c2 ∃c3 ∃w : name(c1 , 17) ∧ prime(c1 ) ∧ w = c1 ∧ name(c2 , 1) ∧ name(c3 , 17) ∧ divisible(w, c2 ) ∧ divisible(w, c3 ) For the construction of the representation (76), we used a representation for proper names that cannot be expressed in the language of standard DRT. If we add constants to the language of DRT, we also need to extend the translation algorithm as it is captured by Definition 5.1.3. Translating (77a), with a modified translation algorithm that considers constant discourse referents, we obtain the truth-equivalent but more concise formula: prime(17) ∧ divisible(17, 1) ∧ divisible(17, 17).


Proper Names at the Topmost Level. DRS (76) is flat: each of its referents is on the topmost level, and none of its conditions contains a DRS substructure. In general, however, the DRS construction mechanism will need to decide whether constant symbols always propagate to the topmost DRS, thus making them accessible as antecedents for anaphoric expressions from anywhere in the discourse. The following example indeed suggests that constant symbols and proper names need to be accommodated at the topmost DRS: (78)

Either 17 is prime or it is divisible by a smaller number.

Keeping discourse referents local, we obtain the DRS depicted in Fig. 5.4(a); if we place them globally, we obtain the DRS depicted in Fig. 5.4(b). Only in the latter case are 1 and 17 accessible as possible antecedents for w.

(a)  [ | [17 | prime(17)] ∨ [v, w | w =?, smaller_number(v), divisible(w, v)] ]

(b)  [17 | [ | prime(17)] ∨ [v, w | w =?, smaller_number(v), divisible(w, v)] ]

Figure 5.4: Two DRS Representations for (78).
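A minimal sketch of this accommodation step is given below. The drs/2 encoding, the condition spelling anaphor(w) for w =?, and all predicate names are assumptions made for illustration, not Vip's code: constants found in embedded DRSs are collected and promoted to the topmost universe, which turns Fig. 5.4(a) into Fig. 5.4(b).

:- use_module(library(apply)).   % maplist/4, partition/4

% lift_constants(+DRS0, -DRS): collect the number constants that occur as
% referents of embedded DRSs and promote them to the topmost universe.
lift_constants(drs(R0, C0), drs(R, C)) :-
    maplist(lift_condition, C0, C, Collected),
    append([R0|Collected], R1),
    sort(R1, R).                       % also removes duplicate constants

% Complex conditions embed DRSs; everything else is left untouched.
lift_condition(or(D1, D2), or(E1, E2), Cs) :- !,
    strip(D1, E1, Cs1), strip(D2, E2, Cs2), append(Cs1, Cs2, Cs).
lift_condition(imp(D1, D2), imp(E1, E2), Cs) :- !,
    strip(D1, E1, Cs1), strip(D2, E2, Cs2), append(Cs1, Cs2, Cs).
lift_condition(not(D), not(E), Cs) :- !,
    strip(D, E, Cs).
lift_condition(Cond, Cond, []).

% strip(+SubDRS, -SubDRS1, -Constants): remove constant referents from an
% embedded DRS (recursively) and return them.
strip(drs(R0, C0), drs(R, C), Constants) :-
    partition(number, R0, Here, R),
    maplist(lift_condition, C0, C, Nested),
    append([Here|Nested], Constants).

% (78), with the anaphoric condition w =? written as anaphor(w):
% ?- lift_constants(drs([], [or(drs([17], [prime(17)]),
%                               drs([v, w], [anaphor(w), smaller_number(v), divisible(w, v)]))]),
%                   D).
% D = drs([17], [or(drs([], [prime(17)]),
%                   drs([v, w], [anaphor(w), smaller_number(v), divisible(w, v)]))]).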

5.2.2 Variables

In sect. 4.2.2, we described the use of variables in mathematical discourse. As we indicated, the use of symbols greatly facilitates the representation and communication of mathematical ideas. In this section, we re-discuss variables from a linguistic, DRT-centered, point of view. We first focus on the introducing, naming and referring aspects of variables. We then briefly discuss indexed and primed variables. The quantification of variables will be discussed at a later stage in sect. 5.2.5.2.

5.2.2.1 Introducing, Naming and Referring

Reconsider Lamport’s example: (79)

a. There do not exist four positive integers, the last being greater than two, such that the sum of the first two, each raised to the power of the fourth, equals the third raised to the same power.
b. There do not exist positive integers x, y, z, and n, with n > 2, such that x^n + y^n = z^n.

In (79a), no symbols are used to name any of the four anonymous variables that are introduced by the noun phrase four positive integers. Therefore, the reference to any of these anonymous entities requires resorting to the definite descriptions the first two, the third and the last. Similarly, operations on these entities are also expressed verbally, namely, the sum of the first two, the power of the fourth, and the third raised to the same power. As the modern version (79b) demonstrates, the use of symbols yields a concise and more readable form. The noun phrase positive integers x, y, z and n introduces to the discourse and names four variables, namely, x, y, z and n, each of which denotes a positive integer. Each of the other occurrences of x, y, z and n then anaphorically refers to the corresponding previously introduced entity. The expert language of mathematics has a good many linguistic constructions for introducing, naming and referring to discourse referents. For each of the following typical examples, we assume that the named entities that are being introduced are indeed new to the discourse context.

(80)

a. Every integer a > 1 can be represented as a product of one or more primes.
b. If m is the least of these divisors, m is prime.
c. If p and q are odd primes, then (p/q)(q/p) = (−1)^{p′q′}, where p′ = ½(p − 1), q′ = ½(q − 1).

In (80a), the noun phrase every integer is followed by the symbol a. The noun phrase introduces an anonymous discourse entity to the discourse, and the symbol then serves to name this entity. This sentence does not contain a referring expression, except 1. In (80b), the noun phrase the least of these divisors introduces an entity to the discourse by singling out one element with particular properties from the others. This element is then named m. This example demonstrates that conditional constructions may not only introduce assumptions but can also be used for the introduction of names. Once the name's reference is established, the name can then be used to refer to its antecedent. For instance, in (80b), the second occurrence of m is an anaphoric referring expression. The sentence (80c) demonstrates another variation of introducing, naming and referring. It introduces and uses four named entities, namely, p, q, p′ and q′. The first two names are introduced in the premise of the conditional construction. Here, the bare plural noun construction odd primes introduces an unknown number of entities. Its embedding in p and q are odd primes not only restricts their number to two (odd primes then reads as two odd primes), but also names these two entities. The other two named entities are introduced symbolically by equations. In these equations, the equality sign is a definitional equality: p′ is equal by definition to ½(p − 1), and q′ is equal by definition to ½(q − 1). All but the first occurrences of p and q are purely anaphoric. Also, note that the first occurrences of p′ and q′ have no antecedents; they are only defined at a later stage. Here, referring precedes introducing and naming. From these examples, we see that a symbol in mathematical discourse can have three functions: introducing, naming and referring. Given this threefold function, the question is whether we can expect a symbol to have a single lexical representation. Fig. 5.5 depicts three possible lexical entries for the symbol n.

(a)  λP. [v31∈N | name(v31, n), v31 =?] • P(v31)
(b)  λP. [n∈N | n =?] • P(n)
(c)  λP.λv31. [ | name(v31, n)] • P(v31)

Figure 5.5: Three Lexical Entries for the Symbol n.

In Fig. 5.5(a), the lexical entry of n introduces a discourse entity v31 as well as the conditions name(v31, n) and v31 =?. The first condition provides a name for the newly introduced discourse referent, and the second marks it as a referring expression. In Fig. 5.5(b), n itself plays the role of a (named) discourse referent, so that no condition of the form name/2 is required. Again, to make the anaphoric character of n explicit, we introduce a condition n =?. Note that both lexical entries introduce a discourse entity of type integer. The type information is used to constrain the composition of semantic representations. The representation in Fig. 5.5(c) only performs the naming function, where a name is treated as an adjective. No discourse entity is introduced, and therefore there is also no condition to mark it as anaphoric. Now, we test the applicability of these lexical entries by reconsidering examples (80a–80c).

Indefinite NP followed by Symbol. We start with linguistic constructions where an indefinite noun phrase is followed by a symbol. Typical examples include a positive number p greater than 2, a positive number greater than 2, say p, and an odd positive number, say p. As these examples show, the noun can be preceded by one or more adjectives, and the symbol either follows the noun or is part of a potentially compound prepositional phrase. A possible syntax tree for every integer a is given in (81).


(81)  NP = DET • (N •R SYMBOL), with
        DET every:              λP.λQ.v • P(v) → Q(v)
        N integer:              λx.integer(x)
        SYMBOL (VARIABLE) a:    λR.λv8.name(v8, a) • R(v8)

The • operator between two branches of a tree signals that the semantic representation associated with its left branch is applied to the one of its right branch. Similarly, •R signals that the application of λ-terms has to be performed in the opposite direction. Given this syntactic analysis, only the lexical entry from Fig. 5.5(c) can be effectively combined with other parts of the syntax tree. We apply the semantic representation of the symbol a to the semantic representation of the noun integer; the result then serves as an argument to the λ-expression for every. We therefore obtain, after a few β-reductions, at the root of the parse tree the λ-expression λQ.v • name(v, a) • integer(v) → Q(v). The lexical entries Fig. 5.5(a)–5.5(b) require a different parse, for instance, the alternative syntax tree as depicted in (82).

(82)  NP = DET • N_SYMBOL, where N_SYMBOL combines N, an optional SAY element, and the VARIABLE, with
        DET every:         λP.λQ.u • P(u) → Q(u)
        N integer:         λx.integer(x)
        say (optional):    λC.λD.λe.C(λf.say(e, f)) • D(e)
        VARIABLE n:        λP.(v1 • name(v1, n) • v1 =?) • P(v1)

This syntactic analysis makes use of the symbol representation given in Fig. 5.5(a). It serves as an argument to the adverbial say, which may or may not be present in the input string. At the node N SYMBOL of the syntax tree, we obtain (after a number of β-reductions) the λ-term λe.v • name(v, n) • v =? • say(e, v) • integer(v), which combines with the semantic representation of the determiner to produce

λQ. [u, v1 | name(v1, n), v1 =?, say(u, v1), integer(v1)] → Q(u).

This representation contains two discourse entities: u stems from the processing of the determiner every, and v1 stems from the lexical representation of n. This partial DRS is also underspecified since it contains the referring expression v1 =?. We can now simplify the partial DRS at the noun phrase (or sentence) level. Using the condition say(u, v1), the anaphoric expression v1 =? can be resolved: v1 refers to the entity introduced by the indefinite noun phrase, u. In a substitutional approach to anaphora resolution, we replace all occurrences of v1 by u and then delete the discourse referent v1 as well as the conditions v1 = u and say(u, v1). As a result, we obtain the intended result λQ.u • name(u, n) • integer(u) → Q(u). The semantic representation for the symbol n, as depicted in Fig. 5.5(b), is a λ-expression of the same "surface form" as the representation shown in Fig. 5.5(a), and therefore applicable as well. Replacing the


representation given in Fig. 5.5(a) by the one given in Fig. 5.5(b), we obtain at the top of the syntax tree the semantic representation

λQ. [u, n | n =?, say(u, n), integer(n)] → Q(u).

Similarly, n anaphorically refers to u, but substitutional anaphora resolution (replacing all occurrences of n by u and deleting the referent n as well as the conditions n = u and say(u, n)) would yield λQ.u • integer(u) → Q(u). Here, say(u, n), which reads as 'u can be referred to by the name n', has to be preserved. The lexical representation of Fig. 5.5(c) cannot be used within this syntactic analysis. An alternative analysis for the introduction of named discourse entities depends on a new lexical entry for the indefinite determiner and a different syntactic analysis. Assume the semantic value of every is λP.λN.λQ.u • P(u) • (N(λv.v = u)) → Q(u). Also, we take the representation depicted in Fig. 5.5(a) as the lexicon entry for the symbol n. Now, if we change the syntactic analysis from (81) to the rather awkward (83), we obtain

λQ. u • integer(u) • v31 • name(v31, a) • v31 =? • v31 = u → Q(u)

as the semantic representation for every integer a.

NP SY MBOL TT

n nnn vnnn NP PPP w PPP • PPP {www (

(83)

DET 



TTTT T*



VARIABLE

N 

every

T ERM 

integer

a

It contains the underspecified condition v31 =?, which postprocessing can easily resolve by exploiting the condition v31 = u. Yet another linguistic construction suggests the introduction of another lexical entry for determiners. For example, take the phrase every integer a > 1, which has the postnominal modifier > 1. The phrase can be syntactically analysed as (84):

(84)  NP = NP_SYMBOL • RESTR, where NP_SYMBOL covers every integer a (analysed as before) and
      RESTR is an ADJ_PHRASE consisting of ADJ2 > and the NP 1.

The determiner every is now represented as the λ-term λP.λN.λR.λQ.u • P(u) • (N(λv.v = u)) • R(u) → Q(u). Its R-term absorbs any postnominal restrictions for the discourse entity that has been introduced and named by NP SYMBOL.
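The compositions in (81)–(84) can be emulated with a short Prolog sketch in the unification-based style familiar from Prolog treatments of computational semantics. The encoding (lam/2, app/2, merge/2, imp/2) and the particular lexical entries below are illustrative assumptions, not Vip's implementation.

% Lambda terms are lam(X, Body) with Prolog variables as lambda variables and
% app(F, A) for application; DRSs are drs(Referents, Conditions); merge/2 is
% the DRS merge (written • in the text) and imp/2 the implication condition.

% Illustrative lexical entries in the spirit of (81) and Fig. 5.5(c):
lex(every,   lam(P, lam(Q, imp(merge(drs([U], []), app(P, U)), app(Q, U))))).
lex(integer, lam(X, drs([], [integer(X)]))).
lex(sym(a),  lam(R, lam(V, merge(drs([], [name(V, a)]), app(R, V))))).

% beta(+Term, -Normal): contract app(lam(X,B), A) redexes.  Because the lambda
% variables are plain Prolog variables, application is just unification; this
% is safe as long as every lexical entry is fetched freshly from lex/2
% (otherwise copy_term/2 would be needed).
beta(T, T) :- var(T), !.
beta(app(F0, A0), R) :- !,
    beta(F0, F), beta(A0, A),
    (   nonvar(F), F = lam(X, B) -> X = A, beta(B, R)
    ;   R = app(F, A)
    ).
beta(lam(X, B0), lam(X, B)) :- !, beta(B0, B).
beta(T0, T) :-
    compound(T0), !,
    T0 =.. [Functor|Args0],
    maplist(beta, Args0, Args),
    T =.. [Functor|Args].
beta(T, T).

% flatten_drs(+Term, -Term): collapse merges of plain DRSs into a single DRS.
flatten_drs(T, T) :- var(T), !.
flatten_drs(merge(A0, B0), D) :- !,
    flatten_drs(A0, A), flatten_drs(B0, B),
    (   nonvar(A), A = drs(R1, C1), nonvar(B), B = drs(R2, C2)
    ->  append(R1, R2, R), append(C1, C2, C), D = drs(R, C)
    ;   D = merge(A, B)
    ).
flatten_drs(T0, T) :-
    compound(T0), !,
    T0 =.. [Functor|Args0],
    maplist(flatten_drs, Args0, Args),
    T =.. [Functor|Args].
flatten_drs(T, T).

% "every integer a", composed as in tree (81): the symbol applies to the noun
% (the direction written •R in the text), and the determiner applies to the result.
% ?- lex(every, D), lex(integer, N), lex(sym(a), S),
%    beta(app(D, app(S, N)), T), flatten_drs(T, NP).
% NP = lam(Q, imp(drs([U], [name(U, a), integer(U)]), app(Q, U))).

The same machinery would handle the analyses (82)–(84), given correspondingly shaped lexical entries.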


Symbol Connected to a Definite Noun Phrase by Means of is. If the verb is is surrounded by two noun phrases, each filling one argument slot of the binary verb is, then we have a case of asserted or assumed identity. The premise of (80b) serves as a typical example of such a construction. Similar examples of sentences that contain such constructions, taken from Hardy & Wright's textbook, include Let n be the least abnormal number, Suppose that d is the smallest positive number of S, and If p^c is the highest power of p which divides (m, n), then p^c | m or p^c | n and so p^c | (a − b). In each case, n, d and p^c are newly introduced discourse entities. For ease of argumentation, we study the simpler sentence If m is the least divisor of n, then m is prime instead of sentence (80b). The syntax tree for its premise is given in (85a), and its underspecified semantic representation is depicted as (85b).

(85)  a.  S = NP • VP, with
            NP: VARIABLE m
            VP = V is • NP, where the object NP = DET the least • (N_PP divisor of • NP (VARIABLE n))

      b.  [v3∈N, v7∈N, x | name(v3, m), v3 =?, name(v7, n), v7 =?, divisor(x, v7),
           ¬[y | less(y, x), divisor(y, v7)], is(v3, x)]

The DRS (85b) is underspecified since it contains three anaphoric conditions of the form X =?. The multiple occurrences of v7 and its associated conditions name(v7, n) and v7 =? originate from using the following lexical expression for the least, which is taken as one token:

λP.λQ. [x | ] • P(x) • ¬([y | less(y, x)] • P(y)) • Q(x).

From the sentence’s hierarchical structure, we know that the variable n is not new to the context. The symbol n therefore acts anaphorically, and we assume that it can be linked to its antecedent, say vn . The variable m, however, is new to the proof context, and therefore, m introduces a named discourse entity to the context. If we use the lexical entry of Fig. 5.5(a), then the effect of introduction can be achieved by a simple deletion . of the condition v3 =?. The partial DRS then can be further simplified exploiting the (naming) condition is(v3 , x), and we obtain as a semantic representation for (85a):

[v3∈N | name(v3, m), divisor(v3, vn), ¬[y | less(y, v3), divisor(y, vn)]]

Note that we have a different situation in the case that the first argument of is/2 is not new to the discourse context. In this case, is/2 does not read as a naming condition, but as an asserted identity. Also note that for this analysis, we could also have used the lexical representation of Fig. 5.5(b) instead of Fig. 5.5(a), but not Fig. 5.5(c).

Symbol Connected with is to an Indefinite Noun Phrase. Similar constructs to If p and q are odd primes, taken from Hardy & Wright, include:

(86)  a. We denote by C_s a number which is the sum of s non-negative cubes.
      b. THEOREM 45. If x is a root of an equation x^m + c_1 x^{m−1} + . . . + c_m = 0, with integral coefficients of which the first is unity, then x is either integral or irrational.
      c. THEOREM 114. If p is a prime 4n + 3, then (½(p − 1))! ≡ (−1)^v (mod p), where v is the number of quadratic non-residues less than ½p.

In each of the three examples, the second argument of is is not uniquely identified, hence the use of the indefinite article.3 In (86a), the number C_s is not unique since it depends on the number s as well as on the non-negative cubes that one chooses to sum up.4 Similarly, the x and p in (86b) and (86c), respectively, are not uniquely identified. The x denotes any of the many possible roots of the given equation, and the p denotes any prime that can be represented as a sum of 4n and 3. Linguistic constructions of this kind are syntactically similar to ones where a symbol is connected via is to a definite noun phrase. Semantically, the representation must convey that the newly introduced symbol does not refer to a well-defined individual but to an unidentified element of a well-defined set of individuals.

Definitional Equalities Expressed with Symbols only. New symbols can be defined symbolically, that is, in terms of symbols that were already introduced in the discourse at an earlier stage. In (80c), we have p′ and q′ defined as p′ = ½(p − 1) and q′ = ½(q − 1), respectively. These equalities read as definitional equalities if p′ and q′ are each new in the context, and they read as asserted equalities otherwise. For example, for p′ = ½(p − 1), we get the following DRS representation:5

[1∈N, 2∈N, v0∈N, v1∈N, v2∈N, v3∈N, v4∈N, divides∈N²→N, minus∈N²→N, times∈N²→N |
 name(v0, pv(p)), v0 =?, name(v4, p), v4 =?,
 v1 = divides(1, 2), fun_result(v1),
 v2 = minus(v4, 1), fun_result(v2),
 v3 = times(v1, v2), fun_result(v3),
 equal(v0, v3)]

3 Moreover, in (86c), we have all three introduction and naming cases discussed so far: (i) a symbol followed by an indefinite noun phrase (p is a prime); (ii) an indefinite noun phrase followed by a symbol or term (a prime 4n + 3); and (iii) a symbol followed by a definite noun phrase (v is the number of...).
4 Also, the phrase we denote by takes the place of is.
5 The handling of primed variables and functional expressions is discussed at a later stage.


Both variables p and p′ have a DRS condition that marks each of them as a referring expression, namely, v0 =? and v4 =?. The reading of the predicate equal/2 is then defined in terms of the outcome of resolving its first argument, v0. If v0 is new to the discourse context, then we have a definitional equality; otherwise we have an asserted equality.
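This decision can be sketched in a few lines of Prolog; the predicate, the condition spellings is/2 and equal/2, and the reading labels are assumptions made for illustration, not Vip's code.

%% equality_reading(+Condition, +ContextReferents, -Reading)
%  Condition is of the form is(X, Y) or equal(X, Y); ContextReferents are the
%  (ground) discourse referents contributed by the preceding discourse.
equality_reading(Cond, Context, Reading) :-
    (   Cond = is(X, _)
    ;   Cond = equal(X, _)
    ),
    (   memberchk(X, Context)
    ->  Reading = asserted_identity      % first argument is old
    ;   Reading = definition             % first argument is new: naming/definition
    ).

% The premise of (85a): m is new, n is old.
% ?- equality_reading(is(m, X), [n], R1).                    % R1 = definition
% The equation p' = (p-1)/2 read in a context where p' is already known:
% ?- equality_reading(equal(pv(p), Rhs), [pv(p), p], R2).    % R2 = asserted_identity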

5.2.2.2 Indexed and Primed Variables

Indexed variables (e.g., p_1, p_2, . . .), primed variables (e.g., p′, p″, . . .) and primed indexed variables (e.g., p′_1, p′_2, . . .) relate to their non-indexed and non-primed counterparts (e.g., p). The identification of this relationship introduces additional complexity to the construction of their semantic representations. We give four examples (the first one is ours, the last three are taken from Hardy & Wright). (87)

a. p′ denotes the successor of p.
b. p_n denotes the nth prime.
c. We have n = p_1 p_2 p_3 . . . = q_1 q_2 . . ., where the p and q are primes, no p is a q and no q is a p.
d. The induction hypothesis then implies that b = ∏_{i=1}^{s} p_i and c = ∏_{i=1}^{t} p′_i, with p_i, p′_i primes, and hence a = p_1 p_2 . . . p_s p′_1 . . . p′_t.

In (87a) the relationship between p′ and p is explicitly stated: p′ is defined in terms of p and the successor function ′. In (87b), p_n is defined as the nth prime number without explicit mention of p. Establishing a relation between p_n and p therefore requires additional reasoning. In this case, Hardy & Wright do not use the symbol p to denote a prime number; rather, it is used as a function symbol. In (87c), the expressions p_1, p_2 and p_3 all denote primes. However, the notational convention of (87b) does not apply in this context, i.e., p_1 does not necessarily denote the first prime 2, p_2 the second prime 3, etc. This is conveyed by the fragment the p and q are primes, which reads as the p and q are arbitrary primes. Moreover, the linguistic construction the p, for example, is used to make a distributive statement about each of the primes p_1, p_2, p_3, . . .: if p is p_1, then p is prime; if p is p_2, then p is prime; etc. Here, p relates to any p_i via the identity relation. In (87d), which is the concluding sentence of LeVeque's existence proof of the FTA, the p_i and p′_i denote two, not necessarily distinct, sets of prime numbers. In this case, it is not necessary to identify the exact nature of the relation between a p_i and a p′_j. In fact, there is no relation, except that the p_i represent the prime factorisation of b, that the p′_j represent the prime factorisation of c, and that a can be represented as a product of these two factorisations. Fig. 5.6 depicts Vip's representation of indexed and primed variables.

v1∈N , 1 λP. name(v1 , iv(p, 1)) •P(v1 ) . v1 =?

v1∈N , 1 λP. name(v1 , iv d pv(p, 1)) •P(v1 ) . v1 =?

(b)

(c)

Figure 5.6: Lexical Entries for p′, p_1 and p″_1.

Each of them introduces a discourse referent, v1, a condition to signal its anaphoric character, and a condition for naming. In Fig. 5.6(a), the condition name(v1, spv(p)) marks the variable v1 as a single-primed variable. In Fig. 5.6(b), the condition name(v1, iv(p, 1)) marks the variable v1 as an indexed variable, and in Fig. 5.6(c), the condition name(v1, iv_dpv(p, 1)) marks the variable v1 as both an indexed and a double-primed variable.
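The naming terms of Fig. 5.6 suggest a simple classification of symbol tokens. The sketch below assumes the symbol arrives as a list of characters and that spv/1 and iv/2 are the constructors of Fig. 5.6; the remaining constructors and the predicate itself are illustrative guesses, not Vip's tokenizer.

:- use_module(library(apply)).   % partition/4

%% symbol_name(+Chars, -NameTerm)
%  Chars is the symbol written as a list of characters, e.g. [p, '\'', '1'].
%  NameTerm follows the naming scheme of Fig. 5.6: v/1 for a plain variable,
%  spv/1 for a single-primed one, iv/2 for an indexed one; dpv/2 and iv_pv/3
%  are hypothetical constructors for the remaining cases.
symbol_name([Base|Rest], Term) :-
    char_type(Base, alpha),
    partition(==('\''), Rest, PrimeMarks, IndexChars),
    length(PrimeMarks, Primes),
    (   IndexChars == []
    ->  plain_term(Primes, Base, Term)
    ;   number_chars(Index, IndexChars),
        indexed_term(Primes, Base, Index, Term)
    ).

plain_term(0, B, v(B)).
plain_term(1, B, spv(B)).                        % single-primed, as in Fig. 5.6(a)
plain_term(N, B, dpv(B, N)) :- N > 1.            % hypothetical

indexed_term(0, B, I, iv(B, I)).                 % indexed, as in Fig. 5.6(b)
indexed_term(N, B, I, iv_pv(B, N, I)) :- N > 0.  % hypothetical

% ?- symbol_name([p, '\''], T).        % T = spv(p)
% ?- symbol_name([p, '1'], T).         % T = iv(p, 1)
% ?- symbol_name([p, '\'', '1'], T).   % T = iv_pv(p, 1, 1)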


Alternative Representation. The lexical entries given in Fig. 5.6 do not encode any relation between a primed or indexed variable and its non-indexed or non-primed counterpart. In order to make such a relationship explicit, a richer representation is needed, for instance:

(88)  λP. [v1∈N, v2∈N, R∈N×N | name(v1, spv(p)), v1 =?, name(v2, p), v2 =?, rel(R, v1, v2), R =?] • P(v1)

This representation for the primed variable p′ extends the representation of Fig. 5.6(a) by the introduction of two discourse referents and four conditions.6 The referent v2 and its associated conditions name(v2, p) and v2 =? introduce a representation for the non-primed counterpart of p′. The referent R and its associated conditions R =? and rel(R, v1, v2) introduce a predicate variable of anaphoric character and a constraint that any possible antecedent of R must then satisfy.

6 This representation has not been fully implemented in Vip.

5.2.3 Functions

A functional expression can be composed of symbols (e.g., x²), expressed in English (e.g., The square of a number), or stated with a mixture of symbols and English words (e.g., The square of x).

5.2.3.1 Symbolic Functional Expressions

The definition of the Möbius function µ(n), which we repeat in (89), serves as a good example to demonstrate the complexities involved in the interpretation of symbolic functional expressions.

(89)  The Möbius function µ(n) is defined as follows:
      (i)   µ(1) = 1;
      (ii)  µ(n) = 0 if n has a squared factor;
      (iii) µ(p_1 p_2 . . . p_k) = (−1)^k if all the primes p_1, p_2, . . . , p_k are different.
      Thus µ(2) = −1, µ(4) = 0, µ(6) = 1.
      THEOREM 262. µ(n) is multiplicative.

As we pointed out earlier, µ(n) can denote a function object, the unary Möbius function µ, as well as the result of applying µ to its argument n. Therefore, to account for these subtle but necessary differences, Vip's lexicon will need to contain multiple lexical entries for functions. Fig. 5.7 depicts three lexical entries for the unary function µ.

(a)  λR. [f6∈N→N | name(f6, µ), f6 =?] + R(f6)
(b)  λP.λR. P(λu. [f6∈N→N | name(f6, µ), f6 =?, f_arg(f6, u)] + R(f6))
(c)  λP.λR. P(λu. [f6∈N→N, r9∈N | name(f6, µ), f6 =?, r9 = f6(u), fun_result(r9)] + R(r9))

f6∈N→N r9∈N name( f6 , µ) λP.λR.P(λu. +R(r9 )) . f6 =? r9 = f6 (u) f un result(r9 ) (c)

Figure 5.7: Three Lexical Entries for the µ Function (Expressed Symbolically). The entry depicted as Fig. 5.7(a) introduces a discourse entity f6 of functional type as well as two discourse conditions. The first condition assigns a name to the function object, and the second expresses its anaphoric character. The discourse referent is passed on, which allows its use in expressions of the form µ is multiplicative. Note that this entry for functions is similar to the ones for variables. The difference lies in the functional type. The analysis of µ(n) in µ(n) is multiplicative requires a different lexicon entry for 6 This

representation has not been fully implemented in Vip.


µ, which is given in Fig. 5.7(b). Its λ-term absorbs the argument n, stores it in f_arg/2, but still passes on the functional referent. The second reading of µ(n) is obtained by the entry displayed in Fig. 5.7(c). In addition to the previous entries, it adds a further discourse referent, r9, and equates it with the result of applying the function to its argument. The condition fun_result(r9) marks the special status of r9. Since the result of applying µ to n is unique, r9 can only take one value. Hence, the referent r9 acts like a proper name.7
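For reference, and independently of the linguistic analysis, definition (89) itself can be read operationally; the following plain Prolog rendering is a reference implementation only and is not part of Vip's lexicon or parser.

% mu(+N, -M): the Moebius function, read directly off definition (89).
mu(1, 1) :- !.
mu(N, M) :-
    N > 1,
    prime_factors(N, Ps),
    (   has_squared_factor(Ps) -> M = 0                 % clause (ii)
    ;   length(Ps, K),
        ( K mod 2 =:= 0 -> M = 1 ; M = -1 )             % clause (iii): (-1)^K
    ).

% prime_factors(+N, -Ps): prime factorisation with repetitions, by trial division.
prime_factors(N, Ps) :- factorise(N, 2, Ps).

factorise(1, _, []) :- !.
factorise(N, D, [D|Ps]) :- N mod D =:= 0, !, N1 is N // D, factorise(N1, D, Ps).
factorise(N, D, Ps) :- D1 is D + 1, factorise(N, D1, Ps).

has_squared_factor(Ps) :- msort(Ps, Sorted), append(_, [P, P|_], Sorted).

% ?- mu(2, M).   % M = -1
% ?- mu(4, M).   % M = 0
% ?- mu(6, M).   % M = 1     (as stated after (89))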

Functional Expressions and Coercing. If f is a unary function of type N → N, then the formula f > 0 is short-hand for ∀x∈N : f(x) > 0. We can handle this case with the addition of another entry for > to the lexicon. Its semantic representation is the λ-term (in Prolog-like notation)

λP.λFτ1→τ2 .(P(λyτ2 .(drs([], [drs([xτ1], []) → drs([rτ2], [fun_result(r), apply(F, x, r), greater(r, y)])])))),

which will do the trick by exploiting its type restrictions. With the usual entries for function objects and constants, we can parse the expression f > 0 into the DRS8

(90)  [f6∈N→N, 0∈N | name(f6, f), [x∈N | ] → [r3∈N | fun_result(r3), apply(f6, x, r3), greater(r3, 0)]]

Similarly, the occurrence of x² in the expression x² > 0 denotes an anonymous function, the unary square function. In order to express the intended reading in a formally correct manner, it has to be coerced into ∀x : square(x) > 0, or, in the formalism of the lambda calculus, ∀y : (λx.x²)(y) > 0.
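The coercion step can be sketched as a type-driven choice between two readings of >. The greater_sem/4 predicate and the type terms below are assumptions made for illustration (the DRS conditions mirror (90)); the predicate is not part of Vip.

%% greater_sem(+TypeOfLeft, +Left, +Right, -DrsConditions)
%  If the left argument denotes a function of type Tau1 -> Tau2, the comparison
%  is coerced into a universally quantified implication as in (90); if it
%  denotes an individual, we obtain a plain condition.
greater_sem(fun(_Tau1, _Tau2), F, Y,
    [ imp(drs([X], []),
          drs([R], [fun_result(R), apply(F, X, R), greater(R, Y)])) ]).
greater_sem(base(_Tau), X, Y, [greater(X, Y)]).

% 'f > 0' with f : N -> N, cf. (90):
% ?- greater_sem(fun(nat, nat), F6, 0, Conds).
% Conds = [imp(drs([X], []), drs([R], [fun_result(R), apply(F6, X, R), greater(R, 0)]))].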

5.2.3.2 English and Mixed English-Symbolic Functional Expressions

Functional relations can also be expressed in plain English or using a mixture of English and symbols as the following three sentences illustrate: (91)

a. The square of a natural number is composite. b. The square of n is composite. c. n² is composite.

These sentences share a similar syntactic surface structure. If the symbol n denotes an arbitrary natural number, then they also share their semantic content. Ideally, a semantic engine returns representations that reflect their semantic equivalence. In our adapted DRT framework, a discourse referent will need to be introduced for each of the square of any natural number, the square of n and n². One possible and non-standard syntactic analysis for the NP of (91b) is depicted as (92).9

7 If ∃!xP(x), then ιxP(x) is the unique x that makes P true. That is, a ι-expression behaves like a proper name, and P(ιxQ(x)) is equivalent to ∃!x(Q(x) ∧ P(x)).
8 The condition apply(f6, x, r3) reads as r3 = f6(x).
9 The standard syntactic analysis is to decompose square of n into the noun square and the prepositional phrase of n.

(92)  FUN_NP = NP_FUN • TERM, with
        NP_FUN = DET the0 • FUNCTION square of
        DET the0:             λT.λR.λS. R(λx. [r2∈N | fun_result(r2), r2 = T(x)] • S(r2))
        FUNCTION square of:   λy.[ | square(y)]
        TERM (VARIABLE) n:    λP. [v31∈N | name(v31, n), v31 =?] • P(v31)

The leaves of the parse tree point to the lexicon entries that are used to construct its semantic representation. Note that the entry for the definite determiner the0 is different from the standard entry λP.λQ.u • P(u) • Q(u). The new entry introduces a discourse referent, r2, that refers to the value of applying a function T to some variable x. If we apply this λ-expression to the semantic representation of square of, and then apply the resulting expression to the semantic representation of the variable symbol n, we obtain:

[v31∈N, r2∈N | name(v31, n), v31 =?, fun_result(r2), r2 = square_of(v31), composite(r2)]

Note that, in comparison to (91c), no discourse referent is introduced for the function object square_of.
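To illustrate the grammar-rule side of this analysis, the following DCG sketch builds the DRS of (92) directly, composing by unification rather than by β-reduction. Every rule and predicate name (sentence/1, fun_np/2, the anaphor/1 and eq/2 conditions, square_of) is an assumption made for this sketch and not part of Vip's grammar.

sentence(drs(Refs, Conds)) -->
    fun_np(R, drs(Refs, Conds0)),
    [is], adjective(R, Cond),
    { append(Conds0, [Cond], Conds) }.

% "the F of T": introduce a referent R for the function value F(V).
fun_np(R, drs([V, R], [name(V, Name), anaphor(V), fun_result(R), eq(R, FunTerm)])) -->
    [the], function_noun(V, FunTerm), [of], symbol_np(V, Name).

function_noun(V, square_of(V)) --> [square].
symbol_np(_V, n)               --> [n].
adjective(R, composite(R))     --> [composite].

% ?- phrase(sentence(Drs), [the, square, of, n, is, composite]).
% Drs = drs([V, R], [name(V, n), anaphor(V), fun_result(R),
%                    eq(R, square_of(V)), composite(R)]).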

5.2.4 Complex Terms

While the syntactic analysis of complex terms is laborious but conceptually simple, the construction of their semantic representations is not. Grammar rules need to be written to parse complex terms such as ∫_a^b x² dx, {x ∈ N | 1 < x < 10} and lim_{n→∞} [1 + (a/n)]^n. Many such rules are necessary to cope with terms where arguments are left out or kept implicit. The term ∫ x² dx, for instance, does not name the lower and upper bound of the integral. Such values must then be constructed from the context in which they appear or from the underlying theory. From the mathematical reasoning point of view, many expressions require a representation in higher-order logic. From the linguistic point of view, the semantic analysis of complex terms must result in representations that facilitate the resolution of references to parts or collections of such terms. This is a non-trivial matter whenever English mixes with symbolic expressions. Example (93a) shows Hardy & Wright's definition of the Fibonacci series. This definition serves as a good example to show the obstacles an automatic text understander has to overcome in order to construct an appropriate semantic representation, say (93b).10

10 This is the notation for ellipses that we used in sect. 3.3.1.

(93)

a. The series (u_n) or 1, 1, 2, 3, 5, 8, 13, 21, . . . in which the first two terms are u_1 and u_2, and each term after is the sum of the two preceding, is usually called Fibonacci's series.
b. u_1 = 1 :: u_2 = 1 :: (∞, λi.u_{i−1} + u_{i−2}) for i > 2;

A major difficulty in the analysis of (93a) is that the ellipsis 1, 1, 2, 3, 5, 8, 13, 21, . . . is embedded in text that serves as both its definition and explanation. The question now is whether we can instruct Vip's parser to mechanically analyse (93a) into representation (93b). The open-ended ellipsis clearly indicates that M in (M, F) is set to ∞. The body of the λ-term needs to be reconstructed from the sum of the two preceding to


u_{i−1} + u_{i−2}. Here, the index variable i plays the role of the iteration variable, and the u_i need to be derived from the usage of u_n, u_1 and u_2. The special role of u_1 and u_2 is made explicit in (93a); they define the first two elements of the ellipsis (outside the λ body). At the time of writing, Vip's parser module is not able to perform such complex analyses properly. Its current capabilities are better captured by the following, much simpler, handling of a purely symbolic expression. The processing of the expression a = p_1 p_2 . . . p_s results in the following intermediate semantic representation:

[v1∈N, v2∈N, v3∈N, v4∈N, v5∈N, 1∈N, 2∈N | name(v1, a), v1 =?, name(v2, s), v2 =?, name(v3, i), v3 =?, name(v4, iv_spv(p, v3)), v4 =?, v5 = ell_p(v2, v4), prod_term(v5), equal(v1, v5)]

The processing of the term a = ∏_{i=1}^{s} p_i yields a similar representation.
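Returning to the ellipsis notation of (93b), its intended reading can be made concrete with a tiny evaluator. The series/3 representation and all predicate names below are assumptions made purely for illustration (this is not part of Vip, and efficiency is ignored).

% The representation of (93b): the initial terms plus the rule for later ones.
series(fib, [1, 1], lam(I, u(I - 1) + u(I - 2))).

% nth_term(+SeriesName, +N, -Value): unfold the rule, reading u(K) as the
% Kth term of the same series.
nth_term(Name, N, V) :-
    series(Name, Init, _),
    nth1(N, Init, V), !.
nth_term(Name, N, V) :-
    series(Name, Init, lam(N, Expr)),    % the clause yields a fresh copy, so N binds the index variable
    length(Init, L), N > L,
    eval(Name, Expr, V).

eval(_, X, X) :- number(X), !.
eval(Name, u(E), V) :- !, eval(Name, E, K), nth_term(Name, K, V).
eval(Name, A + B, V) :- !, eval(Name, A, VA), eval(Name, B, VB), V is VA + VB.
eval(_, E, V) :- V is E.                 % remaining arithmetic, e.g. N - 1

% ?- nth_term(fib, 7, V).   % V = 13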

5.2.5 Type, Quantification and Scope of Discourse Entities

So far, we have discussed the three-fold nature of the use of symbols in mathematical writing: introducing, naming and referring. Once a symbol has been introduced, it is crucial to be able to determine its type, quantification and scope.

5.2.5.1 Typing

The identification of a symbol’s type is stronly related to the many notational conventions that govern the use of symbols in mathematical discourse. In Hardy & Wright, for instance, but also in many other textbooks on elementary number theory, the symbol n usually denotes a natural number, and the symbol p usually denotes a prime number. If the symbol is used in a conventional manner, the type is usually omitted, as in (94a). Only in cases where the author makes a non-conventional use of a symbol, the symbol is complemented with explicit type information, as for instance in (94b). (94)

a. n² − n + 41 is prime for 0 ≤ n ≤ 40. b. e^y is irrational for every rational y ≠ 0.

Vip’s parsing component processes type information as follows. The lexicon entries for constant, variable and function symbols introduce typed discourse referents. Type information is also added for predicate symbols. λc∈N .prime(c).11 The lexicon entry for the predicate prime, for instance, has the semantic representation Note that the type information limits the number of possible readings. For example, Vip’s parser successfully parses n is prime into the semantic representation [v2∈N |name(v2 , n), prime(v2 )]. However, it does not return a reading for r is prime, since Vip’s lexicon defines the symbol r as an entity of type rational. 5.2.5.2

5.2.5.2 Quantification

When a variable is introduced, its quantification can be made linguistically explicit or has to be recovered from its wider context. In the following, we omit a discussion of cases where a variable occurs free or where the quantification of a variable has to be constructed from a wider context. A distinction between these two cases is particularly difficult. However, Vip's discourse update engine, described in ch. 7, can account for such cases by exploiting extra-linguistic information.12

11 Note that the result type of predicates (Bool) is left unspecified.
12 If the sentence-level parser fails to identify the quantification of a variable, then it passes an appropriate underspecified representation to the discourse update engine, which then fills in the missing information.


Existential Quantification. This paragraph discusses English constructions that explicitly mark variables as existentially quantified. Typical examples are the sentences (95a–95c). (95)

a. There is (always) a prime between n² and (n + 1)². b. An integer a is said to be divisible by another integer b, not 0, if there is a third integer c such that a = bc. c. There is a prime p for which 2^{p−1} − 1 ≡ 0 (mod p²).

In (95a), the noun phrase a prime introduces a discourse entity. It is preceded by the explicit existential quantification there is. The sentence (95b) contains three indefinite noun phrases, namely, an integer a, another integer b and a third integer c, each of which introduces and names a discourse referent, a, b and c, respectively. However, only the variable c is explicitly quantified. Although the variables a and b are introduced by indefinite noun phrases, they must get a generic reading, as the wider syntactical context (i.e., the use of an is said to be phrase) indicates.13 In (95c), the symbol p occurring in the formula 2^{p−1} − 1 ≡ 0 (mod p²) is attributed the property prime and explicitly existentially quantified. There are a few possible parses for the analysis of these existential sentences. Below, we discuss the syntactic and semantic analysis of sentence (95a). The sentences (95b–95c) can be processed in a similar manner.

(96)  S = EX_PHRASE • NP, with
        EX_PHRASE: there is
        NP = DET an • (ADJ_PHRASE • PP_PHRASE)
          ADJ_PHRASE = ADJ odd • N prime
          PP_PHRASE = TERNARY PP: PREP/3 between TERM and TERM  (the two terms being n² and (n + 1)²)

Given this syntactic analysis, the semantic analysis proceeds as follows: the semantic representation of the NP is computed by applying the λ-term of the PP phrase to that of the ADJ phrase. Then, the semantic representation of DET is applied to the result of the last computation. Finally, we combine the semantic representation of the EX PHRASE there is, namely λP.P(λy.exist(y)), with the last result, which is of the form λR.[v1 . . . | . . .] + R(v1), to obtain

[1∈N, 2∈N, v1∈N, v2∈N, v3∈N, plus∈N²→N, exp∈N²→N, r1∈N, r2∈N, r3∈N |
 name(v2, n), v2 =?, name(v3, n), v3 =?,
 fun_result(r1), r1 = exp(v2, 2),
 fun_result(r2), r2 = plus(v3, 1),
 fun_result(r3), r3 = exp(r2, 2),
 lt(r1, v1), lt(v1, r3),
 odd(v1), prime(v1), exist(v1)]

Note the presence of the discourse condition exist(v1). It determines the quantification of v1; at a later stage, this condition will be exploited by the discourse update engine.

13 A DRS construction rule for definitional constructions is discussed in § 5.4.2.5.


Universal Quantification. The following extracts from Hardy & Wright show a selection of different linguistic means to explicitly quantify variables. (97)

a. Since p_n < 2^{2^n} is true for n = 1, it is true for all n.
b. The congruence (p − 1)! + 1 ≡ 0 (mod p²) is true for p = 5, p = 13, p = 563, but for no other value of p less than 200000.
c. M_p is prime for p = 2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127, 521, 607, 1279, 2203, 2281, 3217, 4253, 4423, 9689, 9941, 11213 and composite for all other p < 12000.
d. For 0 < x < 1, we have 0 < f(x)
