Model-Guided Proof Debugging


Ulrich Furbach, Michael Kühn, Frieder Stolzenburg

6/98

Fachberichte INFORMATIK

Universität Koblenz-Landau, Institut für Informatik, Rheinau 1, D-56075 Koblenz. E-mail: [email protected], WWW: http://www.uni-koblenz.de/universitaet/fb/fb4/

Ulrich Furbach, Michael Kühn, Frieder Stolzenburg
Universität Koblenz, Institut für Informatik
Rheinau 1, D-56075 Koblenz, Germany
fuli,kuehn,[email protected]

Abstract

In automated deduction, the final goal is a fully automatic proof system: given a logical specification of a problem, take a high-performance theorem prover and let it do the work. Unfortunately, this does not work in practice, not only because theorem provers often fail to find a proof within reasonable time, but also because the specification itself is error-prone. For the latter problem, several methods for detecting and verifying errors in logic programs have been proposed in the literature. These analyses usually presuppose that the computation terminates. In this paper, we introduce techniques which are also applicable in the case of non-termination. One important aspect is the use of a natural language interface for inspecting even intermediate results of the proof search. With it, we are able to investigate the given specification wrt. critical properties: correctness wrt. an intended model, completeness, and sufficiency for answering given queries. For this, we employ tableau-based calculi, especially hyper-tableaux, because their model-building capability is very helpful for debugging axiomatizations.

1 Motivation: The Deduction Life Cycle

Automated deduction makes life easy: given a logical specification of your problem, take a high-performance theorem prover and let it do the work. Unfortunately, this is only a dream. In some cases it works for benchmark suites like the TPTP library [24]. There, a huge number of problems is given in the form of logical specifications, and the interesting question is whether a prover can solve the problems (in most cases they have been solved by many other provers before) and, if it can, how fast. In real life, however, the problem is to find the appropriate logical formalization of the given problem. Once a formalization is found, the capabilities of theorem provers can be used to process the logical formulae, and usually one finds out that the formalization was not as intended: either it was inconsistent or it did not meet the requirements. Our paper takes such a situation as the starting point.

1.1 Success and Failure of Automated Deduction

In the context of software development this is a very common situation, and indeed it has led to an entire discipline: in the late 60s the term software engineering was born as a reaction to the problems programmers had in developing and maintaining programs in a practical environment. We claim that the field of automated deduction is in exactly the same situation as programming was in the late 60s. The techniques developed up to that point had to be applied in various application domains and used by non-experts. In automated deduction there is an obvious trend in the same direction.

Let us mention two examples to show that automated deduction is already successfully applied elsewhere. Planning is one of the most traditional disciplines of artificial intelligence. It was considered folklore that planning required specialized formalisms and algorithms which take into account the special problems of this domain, e.g. the so-called frame problem. It turned out only recently that propositional theorem provers are able to outperform special-purpose planning systems (see e.g. [13]). In model-based diagnosis of technical systems, there is a long tradition of logical formulation of systems together with the diagnosis task (see [21]). For computing real-life diagnosis problems, however, only specialized algorithms appeared to be appropriate. In [2] it was shown that a general-purpose automated theorem prover is a powerful tool for diagnosis tasks.

All the successful applications of deduction show that it is mandatory to have a close look at the axiomatization of the problem. It is the problem specification together with knowledge from the application domain which leads to solutions.

1.2 Overview of the Paper

The structure of the paper is as follows. In Sect. 2, we introduce some critical program properties and present the general loop for debugging axiomatizations. Since models and top-down reasoning play an important role in our approach, we briefly present in Sect. 3 the calculi which are relevant to our procedure, especially hyper-tableaux. After that, in Sect. 4, we instantiate our debugging loop and show procedures for investigating erroneous specifications in order to detect incorrectness, incompleteness, and insufficiency of programs. For this, model building and natural language interaction are the key features. Finally, we give some conclusions in Sect. 5.

2 The General Framework

In general, our scenario is as follows: given an axiomatization of a certain domain, we want to prove some theorems in it. Thus, at first, we have to give a formal specification of the (mathematical) problem at hand. For example, we want to prove a theorem in ring theory. By means of our tools, we attempt to find a proof of the given query. These tools work interactively, since we believe that each phase in the development of the axiomatization and in finding the proof is error-prone. In addition, we use first-order logic as the specification language. First-order logic is general enough for specifying many interesting problems, while retaining tractability.

One could make use of modules or libraries during the specification phase, which are tested or even proved to be correct. For example, we know that a ring contains an additive group and a multiplicative semi-group (see Ex. 1). Therefore, we can use libraries containing the definitions of the respective algebraic or data structures. We arrive at a first-order specification that consists of well-established modules but also of additional, newly created parts which may be erroneous.

In the sequel, we discuss several approaches to declarative program debugging. By program we mean a (disjunctive) logic program, which is the specification or axiomatization of the given problem in first-order clause normal form. We are well aware that many problems appear more naturally in a higher-order specification; in this paper we restrict ourselves to the first-order case.

2.1 Declarative Program Debugging

According to [22], we may distinguish three types of errors in program behavior: (a) termination with an incorrect output; (b) termination with a missing output, i.e. the program finitely fails when it should succeed; (c) non-termination. But what does it mean that a program behaves (in)correctly? To answer this, we take an interpretation of all symbols in the specification, representing the intended model of the program. [15] starts from similar assumptions. But in addition to the program specification and its intended meaning, special attention is paid in this paper to the program behavior wrt. a given query, because often one is interested only in certain queries and their efficient treatment. However, a drawback of both approaches is that the case of non-termination is not really considered. They only work in the case of termination (within reasonable time).

2.2 Classifying Axiomatizations and Their Behavior

In the sequel, we present a classification of axiomatizations not only wrt. an (abstract) intended model in terms of the notions correct or complete, but also wrt. their behavior when executed on a certain query. Of course, the latter depends on the chosen computation model, i.e. the proof procedure in our context. So, we consider axiomatizations in two dimensions: (a) the procedural behavior when asked a certain query; (b) their declarative meaning, which may be compared with the intended model. But beforehand, let us fix some notions.

Definition 1 By a program (or axiomatization) $P$ we mean a (disjunctive) logic program, which is given as (or may be translated into) a first-order clause set. Because of their declarative nature, algorithmic debugging is feasible for this kind of program. The intended model for a program is just an interpretation $I$ of all (predicate and function) symbols in $P$. We write $I \models P$ iff $I$ is a model of $P$. A goal (or query) $G$ is a conjunction of literals. We write $P \vdash G$ iff an answer for $G$ may be derived from $P$ (by means of a given proof procedure).

If we attempt to prove $G$ from $P$, our proof procedure may or may not terminate with the expected answer. In case of termination we obtain some answer, i.e. an instance of $G$ or a disjunction thereof. This answer and even its proof can then be analyzed easily. However, often the procedure does not terminate, at least not within reasonable time. In case of non-termination we cannot be sure whether there is an answer to the given query that might be computed after some more time, or whether the proof procedure will loop forever.

In practice, we want to be able to find errors in axiomatizations also in the case of non-termination. Thus, we propose a procedure that may help users even in this case. For this, we need user interaction during the whole computation of the proof. A user should have access to selected intermediate results, i.e. parts of attempted proofs. Thus, in contrast to other approaches, we draw attention not only to abstract notions such as correctness and completeness, but also introduce the more practical notion of sufficiency. Let us now give formal definitions.

An interpretation $I$ is called correct wrt. (or a model of) $P$ iff $I \models P$. Incorrectness in case of termination can be detected by just inspecting the proof of some faulty theorem, or by checking whether $P \cup I$ is inconsistent. For simple (finite) models, this check can be performed efficiently. It terminates even if the proof procedure is not able to answer all queries $G$ wrt. the program $P$, because it is just model checking.

An axiomatization $P$ is called complete iff for every possible query $G$ either $P \models G$ or $P \models \neg G$ holds. When a program $P$ is incomplete, there are queries $G$ that hold in $I$ but cannot be derived from $P$ by any proof procedure. This is a disadvantage, of course. However, incompleteness is very difficult to detect, because we have to find out whether there exists a goal $G$ such that $P \not\models G$. This leads to a strictly undecidable problem in general. Because of this difficulty, we will make use of the more practical notion of sufficiency.

Definition 2 An axiomatization $P$ is called sufficient wrt. a goal $G$ iff it is possible to derive $G$ within a reasonable time bound, i.e. $P \vdash G$.

Clearly, any clause set $P$ that is correct and complete wrt. the intended model $I$ is sufficient to derive $G$. But there may be a (minimal) subset or a different formulation that is sufficient and makes the proof feasible, because termination can then be achieved. In practice, we are interested in axiomatizations that allow the prover to do its work within reasonable time. Completeness is not required, provided that the shortened or otherwise transformed program remains correct wrt. $I$.
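The two checks discussed above can be illustrated on ground clauses. The following Python sketch is only an illustration under strong simplifying assumptions: clauses are ground, a clause is a pair (body, head) of atom sets, the "reasonable time bound" of Definition 2 is modelled as a number of forward-chaining rounds, and the names violated_clauses and derive_within_bound are hypothetical, not taken from the paper.

```python
def violated_clauses(program, interpretation):
    """Return the ground clauses of `program` that are false in `interpretation`.

    A clause is a pair (body, head) of frozensets of atoms, read as
    "if all body atoms hold, then at least one head atom holds".
    The check always terminates, because it is plain model checking.
    """
    return [
        (body, head)
        for body, head in program
        if body <= interpretation and not (head & interpretation)
    ]


def derive_within_bound(program, goal, max_rounds):
    """Bounded forward chaining over definite clauses (single-atom heads).

    Assumption: the time bound is a number of rounds. Returns "derived",
    "not derivable" (saturation without the goal) or "bound exceeded".
    """
    known = set()
    for _ in range(max_rounds):
        new = {
            next(iter(head))
            for body, head in program
            if len(head) == 1 and body <= known
        } - known
        if goal <= known | new:
            return "derived"
        if not new:
            return "not derivable"
        known |= new
    return "bound exceeded"


# Hypothetical toy program about natural numbers.
program = [
    (frozenset(), frozenset({"nat(0)"})),
    (frozenset({"nat(0)"}), frozenset({"nat(s(0))"})),
]
intended_model = {"nat(0)", "nat(s(0))"}

print(violated_clauses(program, intended_model))
print(derive_within_bound(program, {"nat(s(0))"}, max_rounds=5))
```

In this simplified reading, a program is sufficient wrt. a goal exactly when the bounded derivation reports "derived", while "bound exceeded" corresponds to the situation in which the user cannot tell whether more time would help.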

2.3 The Debugging Loop

In Sect. 4, we propose several procedures for identifying errors in incorrect, incomplete or insufficient programs, which work even in case of non-termination. In all proposed methods, models play an important role, because models are a powerful means for checking the consistency of specifications. But before we come to some examples, let us explain our general debugging loop; see Fig. 1.

At first, the user has to specify the problem at hand by a program, i.e. a first-order axiomatization. In addition to this, (s)he has an intended model of the specification in mind. It may be the case that this model allows a restricted and simplified view of the problem. It should be simple enough that efficient processing is possible. The first-order specification and an appropriate representation of the intended model can be given to a theorem prover that attempts to answer given queries. During this process, even in case of non-termination, the user should have the possibility of interacting with the prover.

Since we currently do not expect that a fully automatic detection (or correction) of errors is possible, we need interaction. We cannot assume that a user of our system understands theorem proving calculi in detail. Therefore, we need an interaction language which is general enough that a mathematician, i.e. a user who has only domain knowledge and no knowledge about the system, can handle it. Hence, we decided to generate natural language output of proofs which can be inspected by the user. We find this very useful. For example, one may detect that some false intermediate step has been performed. After the interaction with the prover via the natural language interface, the user may correct the program and enter the debugging loop again.

Figure 1: The general debugging loop. (Components: problem, intended model, first-order axiomatization, theorem prover, natural language interaction.)
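The loop described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the system described in the paper: clauses are ground and definite, the prover is replaced by round-bounded forward chaining, the user is simulated by a comparison with the intended model, and the repair step simply drops the offending clause. All function names (bounded_trace, verbalize, first_false_step, debugging_loop) are hypothetical.

```python
def bounded_trace(program, max_rounds):
    """Forward-chain for at most `max_rounds` rounds and record every step.

    A clause is a pair (body, head): a frozenset of atoms and a single atom.
    The bound guarantees that intermediate results are available even if
    the search would not finish on its own.
    """
    known, trace = set(), []
    for _ in range(max_rounds):
        steps = [(body, head) for body, head in program
                 if body <= known and head not in known]
        if not steps:
            break
        for body, head in steps:
            if head not in known:
                known.add(head)
                trace.append((body, head))
    return trace


def verbalize(step):
    """Render one derivation step as an English sentence."""
    body, head = step
    if not body:
        return f"By assumption, {head} holds."
    return f"Since {' and '.join(sorted(body))}, we conclude {head}."


def first_false_step(trace, intended_model):
    """Stand-in for the user: the first step whose conclusion is unintended."""
    for step in trace:
        if step[1] not in intended_model:
            return step
    return None


def debugging_loop(program, intended_model, max_rounds, max_iterations=10):
    """Inspect, repair and retry until no false intermediate step remains."""
    program = list(program)
    for _ in range(max_iterations):
        trace = bounded_trace(program, max_rounds)
        culprit = first_false_step(trace, intended_model)
        if culprit is None:
            return program
        program.remove(culprit)  # hypothetical repair: drop the faulty clause
    return program


# Hypothetical faulty specification: even(1) must not be derivable.
spec = [
    (frozenset(), "even(0)"),
    (frozenset({"even(0)"}), "even(1)"),
    (frozenset({"even(0)"}), "even(2)"),
]
for step in bounded_trace(spec, max_rounds=3):
    print(verbalize(step))
print(debugging_loop(spec, {"even(0)", "even(2)"}, max_rounds=3))
```

In a real setting the repair would be made by the user after reading the verbalized steps; the automatic removal here only keeps the sketch self-contained.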

2.4 Related Work

Much research has been done in the field of interactive theorem proving. There, mostly generic theorem provers supporting different (higher-order) logics are used. The emphasis in approaches such as Isabelle [19], KIV [20] and Omega [6] is on the interaction, because the underlying languages are very expressive and complex. Hence, these systems are well-suited for verifying a (known) proof.

In this paper, we take a different point of view. We believe that in general the proof of a theorem is not fully known in advance. In addition, proof ideas may contain faults that have to be detected. Thus, we want to provide support for finding proofs and debugging axiomatizations. Therefore, the specification language should be simple and general enough such that (a) at least parts of a proof can be found fully automatically, and (b) communicating the progress of the proof search is possible. Both are needed in a system for algorithmic debugging with (natural language) interaction.

The field of algorithmic debugging was opened by the book [22]. It proposes the divide-and-query algorithm for this purpose: finished Prolog computations are analyzed and bugs detected, while showing the user as few (sub)queries as possible. A similar approach is taken in [15], where a declarative error diagnoser is presented. In [8], a more model-based procedure is proposed. There, a (minimal) model for a program is computed. If, e.g., this model contains a negative literal $\neg p(x, y)$, then one may conclude that clauses are missing for computing $p$. But none of these approaches offers user interaction other than via Prolog programs or similar formalisms.

There are several natural language interfaces such as ILF [9] and Omega [6]. Both systems provide natural language output of (complete) proofs, after first transforming proofs into an intermediate format (e.g. block or natural deduction calculus). We use them to inspect even parts of proofs and integrate these techniques into our debugging loop.

3 Calculi for Automated Reasoning

So far, we have emphasized the necessity of being able to interact with the proof search. What tools can be used for this purpose? In this section, we present the procedures we use. For model generation, we use hyper-tableaux. With them, we can enumerate models (at least partly), and by inspecting these models we can find errors and correct them. For the presentation of proofs, we make use of the top-down prover PROTEIN, since its proofs translate naturally into the output format.

3.1 Refutational and Model-Based Automated Deduction

High-performance automated theorem provers are usually based on refutations. In the case of OTTER [18], resolution calculi are used to generate proofs, whereas SETHEO [11] and PROTEIN [3] apply tableau-based proof procedures for clause normal form. It has been demonstrated that these systems are able to deal with interesting problems, and indeed there is an increasing number of applications of these high-performance automated theorem provers. Usually these applications need not only the stamp "proved"; it is also mandatory to compute answers or an adequate output of the system's proof. In the case of mathematical problems this means, e.g., that a human-readable proof has to be given. Indeed, there exists work on transforming proofs from the commonly used systems into natural, mathematical language (see e.g. [9]). Fig. 4 depicts parts of a machine-found proof and Fig. 5 shows part of a proof which has been transformed into a human-readable form.

In contrast to the above-mentioned saturation-based or top-down provers, there is also work on model-based automated reasoning. Here, the idea is to find a proof by means of model generation. One of the first systems which showed that a very simple proof procedure based on model generation is able to outperform (in some cases) refutation-based provers was SATCHMO [16]. We argue that models offer powerful and cognitively adequate possibilities for debugging specifications. This does not mean that model-based theorem provers should be used in all cases; even if the main tool is a top-down system, it pays off to use a model generation system for debugging purposes.
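The idea of proving by model generation can be illustrated with a small Python sketch. It is a deliberately simplified, ground-only procedure in the spirit of case-splitting model generators; it is neither SATCHMO nor the hyper-tableau calculus itself, and the function name generate_models as well as the two-element order example are assumptions made for illustration. A clause (body, head) is read as an implication from the conjunction of the body atoms to the disjunction of the head atoms, and an empty head denotes falsity.

```python
def generate_models(clauses, model=frozenset()):
    """Enumerate models of ground clauses by case splitting on violated clauses.

    Each clause is (body, head): a frozenset of atoms and a tuple of atoms.
    The result may contain duplicates and non-minimal models; an empty
    result means that every branch closed, i.e. the clause set is refuted.
    """
    for body, head in clauses:
        if body <= model and not any(atom in model for atom in head):
            # Violated clause: open one branch per head atom.
            # An empty head opens no branch, so this branch closes.
            models = []
            for atom in head:
                models.extend(generate_models(clauses, model | {atom}))
            return models
    return [model]  # no clause is violated: the branch describes a model


# Hypothetical ground instance of a strict order on two constants a and b.
totality = (frozenset(), ("a<b", "b<a"))
asymmetry = (frozenset({"a<b", "b<a"}), ())

for m in generate_models([totality, asymmetry]):
    print(sorted(m))
```

For debugging, the interesting case is a non-empty result: each returned set is a candidate model that the user can compare with the intended one.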

3.2 Theorem Proving with Hyper-Tableaux

Hyper-tableaux constitute a complete, model-generating clausal tableau calculus for first-order logic. They were introduced in [4], although an earlier, more restricted version is used in the theorem provers SATCHMO [16] and MGTP [12]. Hyper-tableaux combine analytical reasoning inherited from tableau calculi with the hyper-inference rule of resolution calculi. Hyper-tableau

Table 1: Axioms for strict total orders.

(1)  $a \in U$                            premise
(2)  $b \in U$                            premise
(3)  ...                                  asymmetry
(4)  $x \in U \land y \in U$, $x$ ...     totality