Noname manuscript No. (will be inserted by the editor)

SMT Proof Checking Using a Logical Framework Aaron Stump · Duckki Oe · Andrew Reynolds · Liana Hadarean · Cesare Tinelli

Received: date / Accepted: date

Abstract Producing and checking proofs from SMT solvers is currently the most feasible method for achieving high confidence in the correctness of solver results. The diversity of solvers and the relative complexity of SMT over, say, SAT mean that flexibility, as well as performance, is a critical characteristic of a proof-checking solution for SMT. This paper describes such a solution, based on a Logical Framework with Side Conditions (LFSC). We describe the framework and show how it can be applied for flexible proof production and checking for two different SMT solvers, clsat and cvc3. We also report empirical results showing good performance relative to solver execution time.

Keywords Satisfiability Modulo Theories · Proof Checking · Edinburgh Logical Framework · LFSC

1 Introduction

Solvers for Satisfiability Modulo Theories (SMT) are currently at the heart of several formal method tools such as extended static checkers [16, 3], bounded and unbounded model-checkers [1, 21, 14], symbolic execution tools [25], and program verification environments [10, 49]. The main functionality of an SMT solver is to determine if a given input formula is satisfiable in the logic it supports—typically some fragment of first-order logic with certain built-in theories such as integer or real arithmetic, the theory of arrays, and so on. Most SMT solvers combine advanced general-purpose propositional reasoning (SAT) engines with sophisticated implementations of special-purpose reasoning algorithms for built-in theories. They are rather complex and large tools, with codebases often between 50k

This work was partially supported by funding from the US National Science Foundation, under awards 0914877 and 0914956.
A. Stump, D. Oe, A. Reynolds, C. Tinelli. E-mail: {aaron-stump,duckki-oe,andrew-reynolds,cesare-tinelli}@uiowa.edu
L. Hadarean. E-mail: [email protected]


and 100k lines of C++. As a consequence, the correctness of their results is a longstanding concern. In an era of tour-de-force verifications of complex systems [24, 27], noteworthy efforts have been made to apply formal verification techniques to algorithms for SAT and SMT [29, 18]. Verifying actual solver code is, however, still extremely challenging [28] due to the complexity and size of modern SMT solvers.

One approach to addressing this problem is for SMT solvers to emit independently checkable evidence of the results they report. For formulas reported as unsatisfiable, this evidence takes the form of a refutation proof. Since the relevant proof checking algorithms and implementations are much simpler than those for SMT solving, checking a proof provides a more trustworthy confirmation of the solver's result.

Interactive theorem proving can benefit as well from proof-producing SMT solvers, as shown recently in a number of works [12, 11]. Complex subgoals for the interactive prover can be discharged by the SMT solver, and the proof it returns can be checked or reconstructed to confirm the result without trusting the solver. Certain advanced applications also strongly depend on proof production. For example, in the Fine system, Chen et al. translate proofs produced by the Z3 solver [32] into their internal proof language, to support certified compilation of richly typed source programs [13]. For Fine, even a completely verified SMT solver would not be enough, since the proofs themselves are actually needed by the compiler. Besides Z3, other examples of proof-producing SMT solvers are clsat, cvc3, Fx7, and veriT [38, 6, 31, 15].

A significant enabler for the success of SMT has been the SMT-LIB standard input language [5], which is supported by most SMT solvers. So far, no standard proof format has emerged. This is, however, no accident.
Because of the ever increasing number of logical theories supported by SMT solvers, the variety of deductive systems used to describe the various solving algorithms, and the relatively young age of the SMT field, designing a single set of axioms and inference rules that would be a good target for all solvers does not appear to be practically feasible.

We believe that a more viable route to a common standard is the adoption of a common meta-logic in which SMT solver implementors can describe the particular proof systems used by their solvers. With this approach, solvers need to agree just on the meta-language used to describe their proof systems. The adoption of a sufficiently rich meta-logic prevents a proliferation of individual proof languages, while allowing for a variety of proof systems. Also, a single meta-level tool suffices to check proofs efficiently in any proof system described in the meta-language. A fast proof checker can be implemented once and for all for the meta-language, using previously developed optimizations [e.g., 43, 50, 44]. Also, different proof systems can be more easily compared, since they are expressed in the same meta-level syntax. This may help identify in the future a common core proof system for some significant class of SMT logics and solvers.

In this paper we propose and describe a meta-logic, called LFSC, for "Logical Framework with Side Conditions", which we have developed explicitly with the goal of supporting the description of several proof systems for SMT and enabling the implementation of very fast proof checkers. In LFSC, solver implementors can describe their proof rules using a compact declarative notation which also allows the use of computational side conditions. These conditions, expressed in a small functional programming language, enable some parts of a proof to be established by computation.
The flexibility of LFSC facilitates the design of proof systems that reflect closely the sort of high-performance inferences made by SMT solvers.


The side conditions feature offers a continuum of possible LFSC encodings of proof systems, from completely declarative at one end, using rules with no side conditions, to completely computational at the other, using a single rule with a huge side condition. We argue that supporting this continuum is a major strength of LFSC. Solver implementors have the freedom to choose the amount of computational inference when devising proof systems for their solver. This freedom cannot be abused, since any decision is explicitly recorded in the LFSC formalization and becomes part of the proof system's trusted computing base. Moreover, the ability to create, with relatively small effort, different LFSC proof systems for the same solver provides an additional level of trust even for proof systems with a substantial computational component, since at least during the development phase one could also produce proofs in a more declarative, if less efficient, proof system.

We have put considerable effort into developing a full-blown, highly efficient proof checker for LFSC proofs. Instead of developing a dedicated LFSC checker, one could imagine embedding LFSC in declarative languages such as Maude or Haskell. While the advantages of prototyping symbolic tools in these languages are well known, in our experience their performance lags too far behind carefully engineered imperative code for high-performance proof checking. This is especially the case for the sort of proofs generated by SMT solvers, which can easily reach sizes measured in megabytes or even gigabytes. Based on previous experimental results by others, a similar argument can be made against the use of interactive theorem provers (such as Isabelle [37] or Coq [8]), which have a very small trusted core, for SMT proof checking. By allowing the use of computational side conditions and relying on a dedicated proof checker, our solution seeks to strike a pragmatic compromise between trustworthiness and efficiency.
We introduce the LFSC language in Section 2 and then describe it formally in Section 3. In Section 4 we provide an overview of how one can encode in LFSC a variety of proof systems that closely reflect the sort of reasoning performed by modern SMT solvers. Building on the general approaches presented in that section, we then focus on a couple of SMT theories in Section 5, with the goal of giving a sense of how to use LFSC's features to produce compact proofs for theory-specific lemmas. We then present empirical results for our LFSC proof checker which show good performance relative to solver execution times (Section 6). Finally, Section 7 touches upon a new, more advanced application of LFSC: the generation of certified interpolants from proofs.

The present paper builds on previously published workshop papers [45, 38–40] but expands considerably on the material presented there. A preliminary release of our LFSC proof checker, together with the proof systems and the benchmark data discussed here, is available online.1

1.1 Related Work

Proofs produced by the solver Fx7 for the AUFLIA logic of SMT-LIB have been shown to be efficiently checkable by an external checker in [31]. Other approaches [19, 17] have advocated the use of interactive theorem provers to certify proofs

1 At http://clc.cs.uiowa.edu/lfsc/. The release's README file clarifies a number of minor differences with the concrete syntax used in this paper.


produced by SMT solvers. The advantages of using those theorem provers are well known; in particular, their trusted core contains only a base logic and a small fixed number of proof rules. While recent works [11, 2] have improved proof checking times, the performance of these tools still lags behind C/C++ checkers carefully engineered for fast checking of very large proofs. Besson et al. have recently proposed a similar meta-linguistic approach, though without the intention of providing a formal semantics for user-defined proof systems [9]. We are in discussion with the authors of that work on how to combine LFSC with the approach they advocate.

1.2 Notational Conventions

This paper assumes some familiarity with the basics of type theory and of automated theorem proving, and adheres in general to the notational conventions of the field. LFSC is a direct extension of LF [22], a type-theoretic logical framework based on the λΠ calculus, in turn an extension of the simply typed λ-calculus. The λΠ calculus has three levels of entities: values; types, understood as collections of values; and kinds, families of types. Its main feature is the support for dependent types, which are types parametrized by values.2 Informally speaking, if τ2[x] is a dependent type with value parameter x, and τ1 is a non-dependent type, the expression Πx:τ1. τ2[x] denotes in the calculus the type of functions that return a value of type τ2[v] for each value v of type τ1 for x. When τ2 is itself a non-dependent type, the type Πx:τ1. τ2 is just the arrow type τ1 → τ2 of the simply typed λ-calculus.

The current concrete syntax of LFSC is based on Lisp-style S-expressions, with all operators in prefix format. For improved readability, we will often write LFSC expressions in abstract syntax instead. We will write concrete syntax expressions in typewriter font. In abstract syntax expressions, we will write variables and metavariables in italics font, and constants in sans serif font. Predefined keywords in the LFSC language will be in bold sans serif font.
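As an illustration of dependent types (cf. the bit vector example of footnote 2), the following is a sketch in Lean 4; the names BV and head are ours, not part of LFSC:

```lean
-- A type family indexed by a natural number: BV n is the type of
-- bit vectors of length n, so BV : Nat → Type is a dependent type.
inductive BV : Nat → Type where
  | nil  : BV 0
  | cons : Bool → BV n → BV (n + 1)

-- A dependent function type, Π n : Nat. BV (n + 1) → Bool: `head`
-- accepts only bit vectors whose type proves them non-empty, so the
-- `nil` case is statically impossible and needs no runtime check.
def head {n : Nat} : BV (n + 1) → Bool
  | .cons b _ => b
```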

2 Introducing LF with Side Conditions

LFSC is based on the Edinburgh Logical Framework (LF) [22]. LF has been used extensively as a meta-language for encoding deductive systems including logics, semantics of programming languages, as well as many other applications [26, 7, 33]. In LF, proof systems can be encoded as signatures, which are collections of typing declarations. Each proof rule is a constant symbol whose type represents the inferences allowed by the rule. For example, the following transitivity rule for equality

  t1 = t2    t2 = t3
  ------------------ eq_trans
       t1 = t3

can be encoded in LF as a constant eq_trans of type

  Πt1:term. Πt2:term. Πt3:term. Πu1:holds (t1 = t2). Πu2:holds (t2 = t3). holds (t1 = t3).

A simple example of dependent types is the type of bit vectors of (positive integer) size n.


The encoding can be understood intuitively as saying: for any terms t1, t2 and t3, and any proofs u1 of the equality t1 = t2 and u2 of t2 = t3, eq_trans constructs a proof of t1 = t3. In the concrete, Lisp-style syntax of LFSC, the declaration of the rule would look like

(declare eq_trans
  (! t1 term (! t2 term (! t3 term
  (! u1 (holds (= t1 t2))
  (! u2 (holds (= t2 t3))
        (holds (= t1 t3))))))))

where ! represents LF's Π binder for the dependent function space, term and holds are previously declared type constructors, and = is a previously declared constant of type Πt1:term. Πt2:term. term (i.e., term → term → term).

Now, pure LF is not well suited for encoding large proofs from SMT solvers, due to the computational nature of many SMT inferences. For example, consider trying to prove the following simple statement in a logic of arithmetic:

  (t1 + (t2 + (· · · + tn) · · ·)) − (ti1 + (ti2 + (· · · + tin) · · ·)) = 0    (1)

where ti1, . . . , tin is a permutation of the terms t1, . . . , tn. A purely declarative proof would need Ω(n log n) applications of an associativity and a commutativity rule for +, to bring opposite terms together before they can be pairwise reduced to 0. Producing, and checking, purely declarative proofs in SMT, where input formulas alone are often measured in megabytes, is unfeasible in many cases.

To address this problem, LFSC extends LF by supporting the definition of rules with side conditions, computational checks written in a small but expressive first-order functional language. The language has built-in types for arbitrary precision integers and rationals, inductive datatypes, ML-style pattern matching, recursion, and a very restricted set of imperative features. When checking the application of an inference rule with a side condition, an LFSC checker computes actual parameters for the side condition and executes its code. If the side condition fails, because it is not satisfied or because of an exception caused by a pattern-matching failure, the LFSC checker rejects the rule application. In LFSC, a proof of statement (1) could be given by a single application of a rule of the form:

(declare eq_zero
  (! t term
     (^ (normalize t) 0)
        (holds (= t 0))))

where normalize is the name of a separately defined function in the side condition language that takes an arithmetic term and returns a normal form for it. The expression (^ (normalize t) 0) defines the side condition of the eq_zero rule, with the condition succeeding if and only if the expression (normalize t) evaluates to 0. We will see more about this sort of normalization rule in Section 5.2.
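To make concrete what a function like normalize computes, here is a minimal Python sketch, our own illustration rather than actual LFSC side-condition code (which is written in LFSC's first-order functional language): it checks a statement of shape (1) by collecting each variable's net coefficient, so that cancellation is detected in one linear-time pass.

```python
from collections import Counter

# Hypothetical term representation: a variable is a string, and
# compound terms are tuples ("+", a, b) or ("-", a, b).

def coeffs(t):
    """Map each variable to its net integer coefficient in term t."""
    if isinstance(t, str):                 # a variable
        return Counter({t: 1})
    op, a, b = t
    ca, cb = coeffs(a), coeffs(b)
    if op == "+":
        ca.update(cb)                      # add coefficients
    else:                                  # op == "-": subtract them
        ca.subtract(cb)
    return ca

def normalize(t):
    """Return 0 iff every variable's coefficient cancels out."""
    c = coeffs(t)
    return 0 if all(v == 0 for v in c.values()) else c

# (t1 + (t2 + t3)) - (t3 + (t1 + t2)) normalizes to 0 directly,
# where a declarative proof would need many AC rewrite steps.
lhs = ("+", "t1", ("+", "t2", "t3"))
rhs = ("+", "t3", ("+", "t1", "t2"))
print(normalize(("-", lhs, rhs)))          # expected: 0
```

A checker applying eq_zero would run such a function and accept the rule application only when the result is 0.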

3 The LFSC Language and its Formal Semantics

In this section, we introduce the LFSC language in abstract syntax, by defining its formal semantics in the form of a typing relation for terms and types, and providing an operational semantics for the side-condition language. Well-typed value, type, kind and side-condition expressions are drawn from the syntactical categories defined in Figure 1 in BNF format. The kinds typec and type are used to distinguish types with side conditions in them from types without.


(Kinds)     κ ::= type | typec | kind | Πx:τ. κ

(Types)     τ ::= k | τ t | Πx:τ1 [{s t}]. τ2

(Terms)     t ::= x | c | t:τ | λx[:τ ]. t | t1 t2

(Patterns)  p ::= c | c x1 · · · xn+1

(Programs)  s ::= x | c | −s | s1 + s2 | c s1 · · · sn+1 | let x s1 s2 | fail τ | markvar s | ifmarked s1 s2 s3 | ifneg s1 s2 s3 | match s (p1 s1) · · · (pn+1 sn+1)

Fig. 1 Main syntactical categories of LFSC. Letter c denotes term constants (including rational ones), x denotes term variables, k denotes type constants. The square brackets are grammar meta-symbols enclosing optional subexpressions.

Program expressions s are used in side condition code. There, we also make use of the syntactic sugar (do s1 · · · sn s) for the expression (let x1 s1 · · · let xn sn s), where x1 through xn are fresh variables. Side condition programs in LFSC are monomorphic, simply typed, first-order, recursive functions with pattern matching, inductive data types, and two built-in basic types: infinite precision integers and rationals. In practice, our implementation is a little more permissive, allowing side-condition code to pattern-match also over dependently typed data. For simplicity, we restrict our attention here to the formalization for simple types only.

The operational semantics of the main constructs in the side condition language can be described informally as follows. Expressions of the form (c s1 · · · sn+1) are applications of either term constants or program constants (i.e., declared functions) to arguments. In the former case, the application constructs a new value; in the latter, it invokes a program. The expressions (match s (p1 s1) · · · (pn+1 sn+1)) and (let x s1 s2) behave exactly as their corresponding matching and let-binding constructs in ML-like languages. The expression (markvar s) evaluates to the value of s if this value is a variable. In that case, the expression also has the side effect of toggling a Boolean mark on that variable.3 The expression (ifmarked s1 s2 s3) evaluates to the value of s2 or of s3 depending on whether s1 evaluates to a marked or an unmarked variable. Both markvar and ifmarked raise a failure exception if their arguments do not evaluate to a variable. The expression (fail τ) always raises that exception, for any type τ.

The typing rules for terms and types, given in Figure 2, are based on the rules of canonical forms LF [47].
They include judgments of the form Γ ⊢ e ⇐ T, for checking that expression e has type/kind T in context Γ (Γ, e, and T are inputs to the judgment), and judgments of the form Γ ⊢ e ⇒ T, for computing a type/kind T for expression e in context Γ (Γ and e are inputs and T is output). The contexts Γ map variables and constants to types or kinds, and map constants f for side condition functions to (possibly recursive) definitions of the form (x1 : τ1 · · · xn : τn) : τ = s, where s is a term with free variables x1, . . . , xn, the function's formal parameters. The three top rules of Figure 2 define well-formed contexts. The other rules, read from conclusion to premises, induce deterministic type/kind checking and type computation algorithms. They work up to a standard definitional equality, namely βη-equivalence, and use standard notation for capture-avoiding substitution ([t/x]T is the result of simultaneously replacing every free occurrence of x in T by t, and renaming any bound variable in T that occurs free in t). Side con-

3 For simplicity, we limit the description here to a single mark per variable. In reality, there are 32 such marks, each with its own markvar command.

Fig. 2 Bidirectional typing rules and context rules for LFSC (the inference rules are omitted here). Letter y denotes variables and constants declared in context Γ, letter T denotes types or kinds. Letter ε denotes the state in which every variable is unmarked.
ditions occur in type expressions of the form Πy:τ1{s t}. τ2, constructing types of kind typec. The premise of the last rule, defining the well-typedness of applications involving such types, contains a judgment of the form ∆ ⊢ σ; s ↓ s′; σ′, where ∆ is a context consisting only of definitions for side condition functions, and σ and σ′ are states, i.e., mappings from variables to their mark. Such a judgment states that, under the context ∆, evaluating the expression s in state σ results in the expression s′ and state σ′. In the application rule, ∆ is |Γ|, defined as the collection of all the function definitions in Γ. Note that the rules of Figure 2 enforce that bound variables do not have types with side conditions in them, by requiring those types to be of kind type, as opposed to kind typec.

An additional requirement is not formalized in the figure. Suppose Γ declares a constant d with type Πx1:τ1. · · · Πxn:τn. τ of kind typec, where τ is either k or (k t1 · · · tm). Then neither k nor an application of k may be used as the domain of a Π-type. This is to ensure that applications requiring side condition checks never appear in types. Similar typing rules, included in the appendix, define well-typedness for side condition terms in a fairly standard way.

A big-step operational semantics of side condition programs is provided in Figure 3 using ∆ ⊢ σ; s ↓ s′; σ′ judgments. For brevity, we elide "∆ ⊢" from the rules when ∆ is unused. Note that side condition programs can contain unbound variables, which evaluate to themselves. States σ (updated using the notation σ[x ↦ v]) map such variables to the value of their Boolean mark. If no rule applies when running a side condition, program evaluation, and hence checking of types with side conditions, fails. This also happens when evaluating the fail construct (fail τ) or when pattern matching fails.
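The stateful fragment of these semantics can be sketched as a small interpreter. The AST encoding and function names below are our own illustration, not those of the actual lfsc implementation:

```python
# States sigma map free variables to a Boolean mark; markvar toggles
# the mark, ifmarked branches on it, and both fail on non-variables.

class CheckFail(Exception):
    """Raised when no evaluation rule applies (type checking fails)."""

def eval_sc(expr, sigma):
    """Big-step evaluation: returns (value, new_state)."""
    op = expr[0]
    if op == "var":                         # free variables evaluate to themselves
        return expr, sigma
    if op == "markvar":
        v, sigma = eval_sc(expr[1], sigma)
        if v[0] != "var":
            raise CheckFail("markvar: not a variable")
        sigma = {**sigma, v[1]: not sigma.get(v[1], False)}  # toggle mark
        return v, sigma
    if op == "ifmarked":
        v, sigma = eval_sc(expr[1], sigma)
        if v[0] != "var":
            raise CheckFail("ifmarked: not a variable")
        branch = expr[2] if sigma.get(v[1], False) else expr[3]
        return eval_sc(branch, sigma)
    if op == "fail":
        raise CheckFail("fail")
    raise CheckFail("no rule applies")

x = ("var", "x")
_, s1 = eval_sc(("markvar", x), {})        # mark x
v, _ = eval_sc(("ifmarked", x, ("var", "yes"), ("var", "no")), s1)
print(v[1])                                 # expected: yes
```

Threading the state through every call mirrors the ∆ ⊢ σ; s ↓ s′; σ′ judgment shape, and raising an exception models the absence of an applicable rule.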
Currently, we do not enforce termination of side condition programs, nor do we attempt to provide facilities to reason formally about the behavior of such programs. Our implementation of LFSC supports the use of the wildcard symbol in place of an actual argument of an application when the value of this argument is


Fig. 3 Big-step operational semantics of side condition programs (rules not reproduced; representative axioms include σ1; c ↓ c; σ1 and σ1; x ↓ x; σ1, i.e., constants and unbound variables evaluate to themselves without changing the state).

Fig. 14 Some of the proof rules for linear polynomial atoms, shown here in summary form: lra_mult_c_>, which from p > 0 and the side condition {a > 0} concludes (a · p)↓ > 0; lra_axiom_∼, which concludes a ∼ 0 under the side condition {a ∼ 0}; and lra_contra_∼, which derives a contradiction from p ∼ 0 when the side condition establishes that p ∼ 0 does not hold. Letter p denotes normalized linear polynomials; a denotes rational values; the arithmetic operators denote operations on linear polynomials.

5.2 Quantifier-Free Linear Real Arithmetic

The logic QF LRA, for quantifier-free linear real arithmetic, consists of formulas interpreted over the real numbers and restricted to Boolean combinations of linear equations and inequations over real variables, with rational coefficients. We sketch an approach for encoding, in LFSC, refutations for sets of QF LRA literals. We devised an LFSC proof system for QF LRA that centers around proof inferences for normalized linear polynomial atoms, that is, atoms of the form a1 · x1 + · · · + an · xn + an+1 ∼ 0, where each ai is a rational value, each xi is a real variable, and ∼ is one of the operators =, >, ≥. We represent linear polynomials in LFSC as (inductive data type) values of the form (pol a l), where a is a rational value and l is a list of monomials of the form (mon ai xi) for rational value ai and real variable xi. With this representation, certain computational inferences in QF LRA become immediate. For example, to verify that a normalized linear polynomial (pol a l) is the zero polynomial, it suffices to check that a is zero and l is the empty list. With normalized linear polynomials, proving a statement like (1) in Section 2 amounts to a single rule application whose side condition is an arithmetic check for equality between rational values.

Figure 14 shows a representative selection of the rules in our LFSC proof system for QF LRA, based on a normalized linear polynomial representation of QF LRA literals.7 Again, side conditions are written together with the premises, in braces, and are to be read as mathematical notation. For example, the side condition {p + p′ = 0} denotes the result of checking whether the expression p + p′ evaluates to the zero element of the polynomial ring Q[X], where Q is the field of rational numbers and X the set of all variables. Expressions of the form e↓ in the rules denote the result of normalizing the linear polynomial expression e.
The normalization is actually done in the rule's side condition, which is, however, left implicit here to keep the notation uncluttered. The main idea of this proof system, based on well known results, is that a set of normalized linear polynomial atoms can be shown to be unsatisfiable if and only if it is possible to multiply each of them by some rational coefficient such that their sum normalizes to an atom of the form p ∼ 0 that does not hold in Q[X]. Since the language of QF LRA is not based on normalized linear polynomial atoms, any refutation for a set of QF LRA literals must also include rules that account for the translation of this set into an equisatisfiable set of normalized

7 Additional cases are omitted for the sake of brevity. A comprehensive list of rules can be found in the appendix of [39].


  t1 = p1    t2 = p2
  -------------------- pol_norm_+
  t1 + t2 = (p1 + p2)↓

  t1 = p1    t2 = p2
  -------------------- pol_norm_−
  t1 − t2 = (p1 − p2)↓

  t = p
  ------------------ pol_norm_c·
  at · t = (ap · p)↓

  t = p
  ------------------ pol_norm_·c
  t · at = (p · ap)↓

  ------- pol_norm_const
  at = ap

  ------- pol_norm_var
  vt = vp

  t1 ∼ t2    t1 − t2 = p
  ---------------------- pol_norm_∼
  p ∼ 0

Fig. 15 Proof rules for conversion to normalized linear polynomial atoms. Letter t denotes QF LRA terms; at and ap denote the same rational constant, in one case considered as a term and in the other as a polynomial (similarly for the variables vt and vp).

(declare term type)
(declare poly type)

(declare pol_norm (! t term (! p poly type)))

(declare pol_norm_+
  (! t1 term (! t2 term
  (! p1 poly (! p2 poly (! p3 poly
  (! pn1 (pol_norm t1 p1)
  (! pn2 (pol_norm t2 p2)
  (^ (poly_add p1 p2) p3)
     (pol_norm (+ t1 t2) p3)))))))))

Fig. 16 Normalization function for + terms in concrete syntax.

linear polynomial atoms. Figure 15 provides a set of proof rules for this translation. Figure 16 shows, as an example, the encoding of the rule pol_norm_+ in LFSC's concrete syntax. The type (pol_norm t p) encodes the judgment that p is the polynomial normalization of term t, expressed in the rules of Figure 15 as t = p. In pol_norm_+, the polynomials p1 and p2 for the terms t1 and t2 are added together using the side condition function poly_add, which produces the normalized polynomial (p1 + p2)↓. The side condition of the rule requires that polynomial to be the same as the provided one, p3. The other proof rules are encoded in a similar way, using side condition functions that implement the other polynomial operations (subtraction, scalar multiplication, and comparisons between constant polynomials). We observe that the overall size of the side condition code in this proof system is rather small: about 60 lines in total (and less than 2 kilobytes). Complexity-wise, the various normalization functions are comparable to a merge sort of lists of key/value pairs. As a result, manually verifying the correctness of the side conditions is fairly easy.
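As an illustration of what a side condition like poly_add computes, here is a Python sketch; the representation is our own hypothetical analogue of (pol a l), with a rational constant plus a list of (variable, coefficient) monomials kept sorted by variable name, while the real side condition is written in LFSC's functional language:

```python
from fractions import Fraction as Q

def poly_add(p1, p2):
    """Add two normalized polynomials by merging their sorted monomial
    lists, dropping coefficients that cancel to zero: the merge-sort-like
    step the text refers to."""
    c1, m1 = p1
    c2, m2 = p2
    out, i, j = [], 0, 0
    while i < len(m1) and j < len(m2):
        (v1, a1), (v2, a2) = m1[i], m2[j]
        if v1 == v2:
            if a1 + a2 != 0:            # cancelled monomials are dropped
                out.append((v1, a1 + a2))
            i, j = i + 1, j + 1
        elif v1 < v2:
            out.append(m1[i]); i += 1
        else:
            out.append(m2[j]); j += 1
    out += m1[i:] + m2[j:]
    return (c1 + c2, out)

def is_zero(p):
    """A normalized polynomial is zero iff its constant is 0 and its
    monomial list is empty (zero coefficients are never stored)."""
    return p[0] == 0 and p[1] == []

# (2x - y + 3) + (-2x + y - 3) cancels to the zero polynomial, the
# kind of check behind a single application of a rule like eq_zero:
p1 = (Q(3), [("x", Q(2)), ("y", Q(-1))])
p2 = (Q(-3), [("x", Q(-2)), ("y", Q(1))])
print(is_zero(poly_add(p1, p2)))   # expected: True
```

Because both inputs are sorted and zero coefficients are never stored, the merge is linear in the number of monomials, which is consistent with the merge-sort comparison made above.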

6 Empirical Results

In this section we provide some empirical results on LFSC proof checking for the logics presented in the previous section. These results were obtained with an LFSC proof checker that we have developed to be both general (i.e., supporting the whole LFSC language) and fast. The tool, which we will refer to as lfsc here, is more accurately described as a proof checker generator: given an LFSC signature, a text file declaring a proof system in LFSC format, it produces a checker for that proof system. Some of its notable features in support of high-performance proof checking are the compilation of side conditions, as opposed to the incorporation of a side condition language interpreter in the proof checker, and the generation of proof checkers that check proof terms on the fly, as they parse them. See previous


work by Stump et al., as well as work by Necula et al., for more on fast LF proof checking [43, 50, 44, 35, 34].

6.1 QF IDL

In separate work, we developed an SMT solver for the QF IDL logic, mostly to experiment with proof generation in SMT. This solver, called clsat, can solve moderately challenging QF IDL benchmarks from the SMT-LIB library, namely those classified with difficulty 0 through 3.8 We ran clsat on the unsatisfiable QF IDL benchmarks in SMT-LIB and had it produce proofs in the LFSC proof system sketched in Section 5.1, optimized with the deferred resolution rule described in Section 4.2. Then we checked the proofs using the lfsc checker. The experiments were performed on the SMT-EXEC solver execution service.9 A timeout of 1,800 seconds was used for each of the 622 benchmarks. Table 1 summarizes the results for two configurations: clsat (r453 on SMT-EXEC), in which clsat is run with proof production off; and clsat+lfsc (r591 on SMT-EXEC), in which clsat is run with proof production on, followed by a run of lfsc on the produced proof.

Table 1 Summary of results for QF IDL (timeout: 1,800 seconds).

Configuration                   Solved   Unsolved   Timeouts   Time
clsat (without proofs) (r453)   542      50         30         29,507.7s
clsat+lfsc (r591)               539      51         32         38,833.6s

The Solved column gives the number of benchmarks each configuration completed successfully. The Unsolved column gives the number of benchmarks each configuration failed to solve before timeout, due to clsat's incomplete CNF conversion implementation and lack of arbitrary precision arithmetic. The first configuration solved all benchmarks solved by the second. The one additional unsolved answer for clsat+lfsc is diamonds.18.10.i.a.u.smt, whose proof is bigger than 2GB in size and on which the proof checker failed due to a memory overflow. The Time column gives the total times taken by each configuration to solve the 539 benchmarks solved by both. Those totals show that the overall overhead of proof generation and proof checking over just solving for those benchmarks was 31.6%, which we consider rather reasonable.

For a more detailed picture of the overhead incurred with proof checking, Figure 17 compares proof checking times for clsat+lfsc with clsat's solve-only times. Each dot represents one of the 542 benchmarks that clsat could solve. The horizontal axis is for solve-only times (without proof production), while the vertical axis is for proof checking times. Both are in seconds, on a log scale. It turned out that the proofs from certain families of benchmarks, namely fischer, diamonds, planning and post office, are much more difficult to verify than those

8 The hardest QF IDL benchmarks in SMT-LIB have difficulty 5.
9 The results are publicly available at http://www.smt-exec.org under the SMT-EXEC job clsat-lfsc-2009.8.


Fig. 17 Solve-only times versus proof checking times for QF IDL. (Scatter plot, not reproduced; both axes in seconds on a log scale, with clsat solving only on the horizontal axis and LFSC proof checking on the vertical axis.)

for other families. In particular, for those benchmarks proof checking took longer than solving. The worst offender is benchmark diamonds.11.3.i.a.u.smt, whose proof checking time was 2.30s versus 0.2s of solving time. However, the figure also shows that as these benchmarks get more difficult, the relative proof overheads appear to converge towards 100%, as indicated by the dotted line.10

6.2 Results for QF LRA

To evaluate experimentally the LFSC proof system for QF LRA sketched in Section 5.2, we instrumented the SMT solver cvc3 to output proofs in that system. Because cvc3 already has its own proof generation module, which is tightly integrated with the rest of the solver, we generated the LFSC proofs using cvc3’s native proofs as a guide. We looked at all the unsatisfiable QF LRA and QF RDL benchmarks from SMT-LIB. Our experimental evaluation contains no comparisons with other proof checkers besides lfsc, for lack of alternative proof-generating solvers and checkers for QF LRA. To our knowledge, the only potential candidate was an earlier system developed by Ge and Barrett that used the HOL Light prover as a proof checker for cvc3 [19]. Unfortunately, that system, which was never tested on QF LRA benchmarks and was not kept in sync with the latest developments of cvc3, breaks on most of these benchmarks. Instead, as a reference point, we compared proofs produced in our LFSC proof system against proofs translated into an alternative LFSC proof system for QF LRA that mimics the rules contained in cvc3’s native proof format. These rules are mostly declarative, with a few exceptions.

10 The outlier above the dotted line is diamonds.18.10.i.a.u.smt.
11 Each of these benchmarks consists of an unsatisfiable quantifier-free LRA formula. QF RDL is a sublogic of QF LRA.
12 Details of these proof systems as well as proof excerpts can be found in [39].

Benchmark family   #     Solve + (Pf Gen) (s)        Pf Size (MB)     Pf Check (s)     T%
                         cvc      lrac     lra       lrac     lra     lrac     lra
check-lra          1     0.1      0.3      0.2       0.5      0.1     0.1      0.0     79%
check-rdl          1     0.1      0.1      0.1       0.0      0.0     0.0      0.0     48%
clock synch        18    11.7     21.8     21.7      14.8     13.0    2.6      2.3     17%
gasburner          19    4.0      8.6      7.8       13.9     7.7     2.5      1.6     46%
pursuit            8     16.6     26.3     26.3      5.1      3.6     0.8      0.7     36%
sal                31    1584.8   3254.3   3239.8    537.1    472.2   275.6    269.0   6%
scheduling         8     281.8    322.9    322.1     25.1     17.8    3.7      2.9     37%
spider             35    10.2     17.4     17.4      12.7     11.1    2.3      2.4     15%
tgc                21    31.5     55.9     55.4      22.7     16.9    4.2      3.4     16%
TM                 1     17.6     29.3     29.0      2.7      2.7     0.4      0.4     0%
tta startup        25    29.7     68.4     68.5      43.6     43.2    5.4      5.6     3%
uart               9     1074.2   1391.2   1434.7    102.4    76.6    42.7     37.0    13%
windowreal         24    20.6     41.6     41.7      22.1     21.9    2.8      2.9     3%
Total              201   3082.9   5238.0   5264.6    802.8    686.9   343.2    328.1   8%

Fig. 18 Cumulative results for QF LRA, grouped by benchmark family. Column 2 gives the number of benchmarks in each family. Columns 3–5 give cvc3’s aggregate runtime for each of the 3 configurations. Columns 6–7 show the proof sizes for the two proof-producing configurations. Columns 8–9 show LFSC proof checking times. The last column shows the average theory content of each benchmark family.

Fig. 19 Comparing proof sizes and proof checking times for QF LRA.

We ran our experiments on a Linux machine with two 2.67GHz 4-core Xeon processors and 8GB of RAM. We discuss benchmarks for which cvc3 could generate a proof within a timeout of 900 seconds: 161 of the 317 unsatisfiable QF LRA benchmarks, and 40 of the 113 unsatisfiable QF RDL benchmarks. We collected runtimes for the following three main configurations of cvc3:

cvc: default, solving benchmarks with no proof generation;
lrac: solving with proof generation in the LFSC encoding of cvc3’s native format;
lra: solving with proof generation in our LFSC proof system.

Recall that the main idea of our proof system for QF LRA is to use a polynomial representation of LRA terms, so that theory reasoning is justified with rules like those in Figure 14. Propositional reasoning steps (after CNF conversion) are encoded with the deferred resolution rule presented in Section 4.2. As a consequence, one should expect the effectiveness of the polynomial representation in reducing proof sizes and checking times to be correlated with the amount of theory content of a proof. Concretely, we measure that as the percentage of nodes in a native cvc3 proof that belong to the (sub)proof of a theory lemma. For our benchmark set, the average theory content was very low, about 8.3%, considerably diluting the global impact of our polynomial representation. However, this impact is clearly significant, and positive, on proofs or subproofs with high theory content, as discussed below.

Figure 18 shows a summary of our results for various families of benchmarks. Since translating cvc3’s native proofs into LFSC format increased proof generation time and proof sizes only by a small constant factor, we do not include these values in the table. As the table shows, cvc3’s solving times (i.e., the runtimes of the cvc configuration) are on average 1.65 times smaller than the times for solving with native proof generation (the lrac configuration). The translation to proofs in our system (the lra configuration) adds further overhead, which is, however, less than 3% on average. The scatter plots in Figure 19 help compare proof sizes and proof checking times for the two proof systems. The first plot shows that our system, lra, achieves constant size-compression factors over the LFSC encoding of native cvc3 proofs, lrac. A number of benchmarks in our test set do not benefit from using our proof system. Such benchmarks are minimally dependent on theory reasoning, having a theory content of less than 2%. In contrast, for benchmarks with higher theory content, lra is effective at proof compression compared to lrac. For instance, over the set of all benchmarks with a theory content of 10% or more, proofs in our system occupy on average 24% less space than cvc3 native proofs in LFSC format. When focusing just on subproofs of theory lemmas, the average compression goes up significantly, to 81.3%; that is to say, theory lemma subproofs in our proof system are 5.3 times smaller than native cvc3 proofs of the same lemmas.
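The theory-content measure can be made concrete with a small sketch. The proof-tree representation and rule names below are hypothetical, not cvc3’s actual proof format:

```python
# A hypothetical proof-tree representation: (rule_name, [children]).
# "Theory content" = percentage of nodes lying inside a theory-lemma subproof.

def count_nodes(node):
    rule, children = node
    return 1 + sum(count_nodes(c) for c in children)

def theory_nodes(node, inside=False):
    rule, children = node
    inside = inside or rule == "theory_lemma"
    here = 1 if inside else 0
    return here + sum(theory_nodes(c, inside) for c in children)

def theory_content(proof):
    return 100.0 * theory_nodes(proof) / count_nodes(proof)

# Toy proof: a resolution step over one theory-lemma subproof (3 nodes)
# and one input clause, so 3 of 5 nodes are theory content.
proof = ("resolution",
         [("theory_lemma", [("farkas", []), ("farkas", [])]),
          ("input", [])])
print(theory_content(proof))
```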
Interestingly, the compression factor is not the same for all benchmarks, although an analysis of the individual results shows that benchmarks in the same SMT-LIB family tend to have the same compression factor. It is generally expected that proof checking should be substantially faster than proof generation, or even than just solving. This was generally the case in our experiments for both proof systems when proof checking used compiled side conditions. lfsc’s proof checking times for both proof systems were around 9 times smaller than cvc3’s solving times. Furthermore, checking lra proofs was always more efficient than checking the LFSC encoding of cvc3’s native proofs; in particular, it was on average 2.3 times faster for proofs of theory lemmas. Overall, our experiments with multiple LFSC proof systems for the same logic (QF LRA) show that mixing declarative proof rules, which have no side conditions, with more computational arithmetic proof rules, which have fairly simple side conditions, is effective in reducing both proof sizes and proof checking times.

7 Further applications: leveraging type inference

To illustrate the power and flexibility of the LFSC framework further, we sketch an additional research direction we have started to explore recently. More details can be found in [40]. Its starting point is the use of the wildcard symbol in proof terms, which requires the LFSC proof checker to perform type inference as opposed

13 The lfsc checker can be used either with compiled or with interpreted side condition code.

(declare color type)
(declare A color)
(declare B color)
(declare clause type)
(declare formula type)
(declare colored (! phi formula (! col color type)))
(declare p_interpolant (! c clause (! phi formula type)))
(declare interpolant (! phi formula type))

Fig. 20 Basic definitions for encoding interpolating calculi in LFSC.

to mere type checking. One can exploit this feature by designing a logical system in which certain terms of interest in a proof need not be specified beforehand, and are instead computed as a side effect of proof checking. An immediate application can be seen in interpolant-generating proofs. Given a logical theory T and two sets of formulas A and B that are jointly unsatisfiable in T, a T-interpolant for A and B is a formula φ over the symbols of T and the free symbols common to A and B such that (i) A |=T φ and (ii) B, φ |=T ⊥, where |=T is logical entailment in T and ⊥ is the universally false formula. For certain theories, interpolants can be generated efficiently from a refutation of A ∧ B. Interpolants have been used successfully in a variety of contexts, including symbolic model checking [30] and predicate abstraction [23]. In many applications, it is critical that the formulas computed by interpolant-generation procedures are indeed interpolants, that is, that they exhibit the defining properties above. LFSC offers a way of addressing this need by generating certified interpolants. Extracting interpolants from refutation proofs typically involves the use of interpolant-generating calculi, in which inference rules are augmented with additional information necessary to construct an interpolant. For example, an interpolant for an unsatisfiable pair of propositional clause sets (A, B) can be constructed from a refutation of A ∪ B written in a calculus with rules based on sequents of the form (A, B) ⊢ c[φ], where c is a clause and φ a formula. Sequents like these are commonly referred to as partial interpolants. When c is the empty clause, φ is an interpolant for (A, B). The LF language allows one to augment proof rules to carry additional information through the use of suitably modified types. This makes encoding partial interpolants into LFSC straightforward.
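In the purely propositional case, the two defining properties of an interpolant can be checked directly by enumeration. The following sketch (the encoding and all names are ours) verifies them for a toy pair (A, B):

```python
from itertools import product

# Brute-force check of entailment: premises |= conclusion holds iff no
# assignment makes all premises true and the conclusion false. Formulas
# are Python predicates over a dict of Boolean variable assignments.

def holds(premises, conclusion, variables):
    for bits in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, bits))
        if all(p(env) for p in premises) and not conclusion(env):
            return False
    return True

# A = {p, p -> q}, B = {~q}; phi = q is an interpolant: it uses only the
# shared symbol q, A |= phi, and B together with phi is unsatisfiable.
A = [lambda e: e["p"], lambda e: (not e["p"]) or e["q"]]
B = [lambda e: not e["q"]]
phi = lambda e: e["q"]
bot = lambda e: False

vs = ["p", "q"]
print(holds(A, phi, vs))          # property (i):  A |= phi
print(holds(B + [phi], bot, vs))  # property (ii): B, phi |= bot
```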
For the example above, the dependent type (holds c) used in Section 4.1 can be replaced with (p_interpolant c φ), where again c is a clause and φ a formula annotating c. This general scheme may be used when devising encodings of interpolant-generating calculi for theories as well. Figure 20 provides some basic definitions common to the encodings of interpolant-generating calculi in LFSC for two sets A and B of input formulas. We use a base type color with two nullary constructors, A and B. Colors are used as a way to tag an input formula φ as occurring in the set A or B, through the type (colored φ col), where col is either A or B. Since the formulas in the sets A and B can be inferred from the types of the free variables in proof terms, the sets do not need to be explicitly recorded as part of proof judgment types. In particular, our judgment for interpolants is encoded as the type (interpolant φ). Our approach allows for two options to obtain certified interpolants. With the first, an LFSC proof term P can be checked against the type (interpolant φ) for some given formula φ. In other words, the alleged interpolant φ is explicitly provided as part of the proof, and if proof checking succeeds, then both the proof P and the interpolant φ are certified to be correct. Note that in this case the user and the proof checker must agree on the exact form of the interpolant φ. Alternatively,


P can be checked against the type schema (interpolant _), where _ is the wildcard symbol. If the proof checker verifies that P has type (interpolant φ) for some formula φ generated by type inference, it will output φ. In this case, interpolant generation comes as a side effect of proof checking, and the returned interpolant φ is correct by construction. In recent experimental work [40] we found that interpolants can be generated using LFSC with a small overhead with respect to solving. In that work, we focused on interpolant generation for the theory of equality and uninterpreted functions (EUF), using the lfsc proof checker as an interpolant generator for proofs generated by cvc3. A simple calculus for interpolant generation in EUF can be encoded in LFSC in a natural way, with minimal dependence on side conditions. Overall, our experiments showed that interpolant generation had a 22% overhead with respect to solving with proof generation, indicating that the generation of certified interpolants is practically feasible with high-performance SMT solvers.

8 Conclusion and future work

We have argued how efficient and highly customizable proof checking can be supported with the Logical Framework with Side Conditions, LFSC. We have shown how diverse proof rules, from purely propositional to core SMT to theory inferences, can be supported naturally and efficiently using LFSC. Thanks to an optimized implementation, LFSC proof checking times compare very favorably to solving times for two independently developed SMT solvers. We have also shown how more advanced applications, such as interpolant generation, can be supported by LFSC, illustrating the further benefits of using a framework beyond basic proof checking. Future work includes a new, more user-friendly syntax for LFSC signatures; a simpler and leaner from-scratch re-implementation of the lfsc checker, intended for public release and under way; and additional theories and applications of LFSC for SMT. We also intend to pursue the ultimate goal of a standard SMT-LIB proof format, based on ideas from LFSC as well as input from the broader SMT community.

Acknowledgements We would like to thank Yeting Ge and Clark Barrett for their help translating cvc3’s proof format into LFSC. We also thank the anonymous referees for their thoughtful and detailed reviews, whose suggestions have considerably helped us improve the presentation.

References

1. Armando, A., Mantovani, J., Platania, L.: Bounded model checking of software using SMT solvers instead of SAT solvers. In: Proceedings of the 13th International SPIN Workshop on Model Checking of Software (SPIN’06), Lecture Notes in Computer Science, vol. 3925, pp. 146–162. Springer (2006)
2. Armand, M., Faure, G., Grégoire, B., Keller, C., Théry, L., Werner, B.: A modular integration of SAT/SMT solvers to Coq through proof witnesses. In: J.P. Jouannaud, Z. Shao (eds.) Certified Programs and Proofs, Lecture Notes in Computer Science, vol. 7086, pp. 135–150. Springer (2011)
3. Barnett, M., Chang, B.-Y.E., DeLine, R., Jacobs, B., Leino, K.R.M.: Boogie: A modular reusable verifier for object-oriented programs. In: 4th International Symposium on Formal Methods for Components and Objects, Lecture Notes in Computer Science, vol. 4111, pp. 364–387. Springer (2006)

4. Barrett, C., Sebastiani, R., Seshia, S., Tinelli, C.: Satisfiability modulo theories. In: A. Biere, M.J.H. Heule, H. van Maaren, T. Walsh (eds.) Handbook of Satisfiability, vol. 185, chap. 26, pp. 825–885. IOS Press (2009)
5. Barrett, C., Stump, A., Tinelli, C.: The SMT-LIB Standard: Version 2.0. In: A. Gupta, D. Kroening (eds.) Proceedings of the 8th International Workshop on Satisfiability Modulo Theories (Edinburgh, England) (2010). Available from www.smtlib.org
6. Barrett, C., Tinelli, C.: CVC3. In: W. Damm, H. Hermanns (eds.) Proceedings of the 19th International Conference on Computer Aided Verification (CAV’07), Berlin, Germany, Lecture Notes in Computer Science, vol. 4590, pp. 298–302. Springer (2007)
7. Bauer, L., Garriss, S., McCune, J.M., Reiter, M.K., Rouse, J., Rutenbar, P.: Device-enabled authorization in the Grey system. In: Proceedings of the 8th Information Security Conference (ISC’05), pp. 431–445 (2005)
8. Bertot, Y., Castéran, P.: Interactive Theorem Proving and Program Development. Coq’Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science. Springer Verlag (2004)
9. Besson, F., Fontaine, P., Théry, L.: A Flexible Proof Format for SMT: a Proposal. In: P. Fontaine, A. Stump (eds.) Workshop on Proof eXchange for Theorem Proving (PxTP) (2011)
10. Bobot, F., Filliâtre, J.C., Marché, C., Paskevich, A.: Why3: Shepherd your herd of provers. In: Boogie 2011: First International Workshop on Intermediate Verification Languages. Wroclaw, Poland (2011)
11. Böhme, S., Weber, T.: Fast LCF-style proof reconstruction for Z3. In: M. Kaufmann, L. Paulson (eds.) Interactive Theorem Proving, Lecture Notes in Computer Science, vol. 6172, pp. 179–194. Springer (2010)
12. Bouton, T., Caminha B. De Oliveira, D., Déharbe, D., Fontaine, P.: veriT: An open, trustable and efficient SMT-solver. In: R.A. Schmidt (ed.) Proceedings of the 22nd International Conference on Automated Deduction (CADE), CADE-22, pp. 151–156. Springer-Verlag (2009)
13. Chen, J., Chugh, R., Swamy, N.: Type-preserving compilation of end-to-end verification of security enforcement. In: Proceedings of the 2010 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 412–423. ACM (2010)
14. Clarke, E.M., Biere, A., Raimi, R., Zhu, Y.: Bounded model checking using satisfiability solving. Formal Methods in System Design 19(1), 7–34 (2001)
15. Déharbe, D., Fontaine, P., Paleo, B.W.: Quantifier inference rules for SMT proofs. In: Workshop on Proof eXchange for Theorem Proving (2011)
16. Flanagan, C., Leino, K.R.M., Lillibridge, M., Nelson, G., Saxe, J.B.: Extended static checking for Java. In: Proc. ACM Conference on Programming Language Design and Implementation, pp. 234–245 (2002)
17. Fontaine, P., Marion, J.Y., Merz, S., Nieto, L.P., Tiu, A.: Expressiveness + automation + soundness: Towards combining SMT solvers and interactive proof assistants. In: Tools and Algorithms for Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, vol. 3920, pp. 167–181. Springer-Verlag (2006)
18. Ford, J., Shankar, N.: Formal verification of a combination decision procedure. In: A. Voronkov (ed.) 18th International Conference on Automated Deduction (CADE), Lecture Notes in Computer Science, vol. 2392, pp. 347–362. Springer (2002)
19. Ge, Y., Barrett, C.: Proof translation and SMT-LIB benchmark certification: A preliminary report. In: Proceedings of International Workshop on Satisfiability Modulo Theories (2008)
20. Goel, A., Krstić, S., Tinelli, C.: Ground interpolation for combined theories. In: R. Schmidt (ed.) Proceedings of the 22nd International Conference on Automated Deduction (CADE), Lecture Notes in Artificial Intelligence, vol. 5663, pp. 183–198. Springer (2009)
21. Hagen, G., Tinelli, C.: Scaling up the formal verification of Lustre programs with SMT-based techniques. In: A. Cimatti, R. Jones (eds.) Proceedings of the 8th International Conference on Formal Methods in Computer-Aided Design (Portland, Oregon), pp. 109–117. IEEE (2008)
22. Harper, R., Honsell, F., Plotkin, G.: A Framework for Defining Logics. Journal of the Association for Computing Machinery 40(1), 143–184 (1993)
23. Jhala, R., McMillan, K.L.: A practical and complete approach to predicate refinement. In: Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, vol. 3920, pp. 459–473. Springer (2006)


24. Klein, G., Elphinstone, K., Heiser, G., Andronick, J., Cock, D., Derrin, P., Elkaduwe, D., Engelhardt, K., Kolanski, R., Norrish, M., Sewell, T., Tuch, H., Winwood, S.: seL4: Formal verification of an OS kernel. In: J. Matthews, T. Anderson (eds.) 22nd ACM Symposium on Operating Systems Principles (SOSP), pp. 207–220. ACM (2009)
25. Kothari, N., Mahajan, R., Millstein, T.D., Govindan, R., Musuvathi, M.: Finding protocol manipulation attacks. In: S. Keshav, J. Liebeherr, J.W. Byers, J.C. Mogul (eds.) Proceedings of the ACM SIGCOMM 2011 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, pp. 26–37 (2011)
26. Lee, D., Crary, K., Harper, R.: Towards a Mechanized Metatheory of Standard ML. In: Proceedings of 34th ACM Symposium on Principles of Programming Languages, pp. 173–184. ACM Press (2007)
27. Leroy, X.: Formal certification of a compiler back-end, or: programming a compiler with a proof assistant. In: G. Morrisett, S.P. Jones (eds.) 33rd ACM Symposium on Principles of Programming Languages, pp. 42–54. ACM Press (2006)
28. Lescuyer, S., Conchon, S.: A Reflexive Formalization of a SAT Solver in Coq. In: Emerging Trends of the 21st International Conference on Theorem Proving in Higher Order Logics (TPHOLs) (2008)
29. Marić, F.: Formal verification of a modern SAT solver by shallow embedding into Isabelle/HOL. Theoretical Computer Science 411, 4333–4356 (2010)
30. McMillan, K.: Interpolation and SAT-based model checking. In: W.A. Hunt Jr., F. Somenzi (eds.) Proceedings of Computer Aided Verification, Lecture Notes in Computer Science, vol. 2725, pp. 1–13. Springer (2003)
31. Moskal, M.: Rocket-Fast Proof Checking for SMT Solvers. In: C. Ramakrishnan, J. Rehof (eds.) Tools and Algorithms for the Construction and Analysis of Systems (TACAS), Lecture Notes in Computer Science, vol. 4963, pp. 486–500. Springer (2008)
32. de Moura, L., Bjørner, N.: Proofs and Refutations, and Z3. In: B. Konev, R. Schmidt, S. Schulz (eds.) 7th International Workshop on the Implementation of Logics (IWIL) (2008)
33. Necula, G.: Proof-Carrying Code. In: 24th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 106–119 (1997)
34. Necula, G., Lee, P.: Efficient representation and validation of proofs. In: 13th Annual IEEE Symposium on Logic in Computer Science, pp. 93–104 (1998)
35. Necula, G., Rahul, S.: Oracle-Based Checking of Untrusted Software. In: Proceedings of the 28th ACM Symposium on Principles of Programming Languages, pp. 142–154 (2001)
36. Nieuwenhuis, R., Oliveras, A., Tinelli, C.: Solving SAT and SAT Modulo Theories: from an Abstract Davis-Putnam-Logemann-Loveland Procedure to DPLL(T). Journal of the ACM 53(6), 937–977 (2006)
37. Nipkow, T., Paulson, L.C., Wenzel, M.: Isabelle/HOL — A Proof Assistant for Higher-Order Logic, Lecture Notes in Computer Science, vol. 2283. Springer (2002)
38. Oe, D., Reynolds, A., Stump, A.: Fast and Flexible Proof Checking for SMT. In: B. Dutertre, O. Strichman (eds.) Proceedings of International Workshop on Satisfiability Modulo Theories (2009)
39. Reynolds, A., Hadarean, L., Tinelli, C., Ge, Y., Stump, A., Barrett, C.: Comparing proof systems for linear real arithmetic with LFSC. In: A. Gupta, D. Kroening (eds.) Proceedings of International Workshop on Satisfiability Modulo Theories (2010)
40. Reynolds, A., Tinelli, C., Hadarean, L.: Certified interpolant generation for EUF. In: S. Lahiri, S. Seshia (eds.) Proceedings of the 9th International Workshop on Satisfiability Modulo Theories (2011)
41. Robinson, J., Voronkov, A. (eds.): Handbook of Automated Reasoning. Elsevier Science Publishers and MIT Press (2001)
42. Sebastiani, R.: Lazy satisfiability modulo theories. Journal on Satisfiability, Boolean Modeling and Computation 3(3-4), 141–224 (2007)
43. Stump, A.: Proof checking technology for satisfiability modulo theories. In: A. Abel, C. Urban (eds.) Proceedings of the International Workshop on Logical Frameworks and Metalanguages: Theory and Practice (LFMTP) (2008)
44. Stump, A., Dill, D.: Faster Proof Checking in the Edinburgh Logical Framework. In: 18th International Conference on Automated Deduction (CADE), pp. 392–407 (2002)
45. Stump, A., Oe, D.: Towards an SMT Proof Format. In: C. Barrett, L. de Moura (eds.) Proceedings of International Workshop on Satisfiability Modulo Theories (2008)
46. Van Gelder, A.: URL http://users.soe.ucsc.edu/~avg/ProofChecker/ProofChecker-fileformat.txt


47. Watkins, K., Cervesato, I., Pfenning, F., Walker, D.: A Concurrent Logical Framework I: Judgments and Properties. Tech. Rep. CMU-CS-02-101, Carnegie Mellon University (2002)
48. Weber, T., Amjad, H.: Efficiently checking propositional refutations in HOL theorem provers. Journal of Applied Logic 7(1), 26–40 (2009)
49. Zee, K., Kuncak, V., Rinard, M.C.: An integrated proof language for imperative programs. In: Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), pp. 338–351 (2009)
50. Zeller, M., Stump, A., Deters, M.: Signature Compilation for the Edinburgh Logical Framework. In: C. Schürmann (ed.) Workshop on Logical Frameworks and Meta-Languages: Theory and Practice (LFMTP) (2007)
51. Zhang, L., Malik, S.: The quest for efficient boolean satisfiability solvers. In: Proceedings of 8th International Conference on Computer Aided Deduction (CADE) (2002)

A Typing Rules for Code

The completely standard type-computation rules for code terms are given in Figure 21. Code terms are monomorphically typed. We write N for any arbitrary-precision integer, and use several arithmetic operations on these; others can easily be added modularly. Function applications are required to be simply typed. In the typing rule for pattern-matching expressions, patterns P must be of the form c or (c x1 · · · xn), where c is a constructor, not a bound variable (we do not formalize the machinery to track this difference). In the latter case, ctxt(P) = {x1 : T1, . . . , xn : Tn}, where c has type Πx1:T1. · · · Πxn:Tn. T. We sometimes write (do C1 C2) as an abbreviation for (let x C1 C2), where x ∉ FV(C2).

Γ(x) = T  ⟹  Γ ⊢ x ⇒ T

Γ ⊢ N ⇒ mpz

Γ ⊢ t1 ⇒ mpz,  Γ ⊢ t2 ⇒ mpz  ⟹  Γ ⊢ t1 + t2 ⇒ mpz

Γ ⊢ t ⇒ mpz  ⟹  Γ ⊢ −t ⇒ mpz

Γ ⊢ C1 ⇒ T′,  Γ, x : T′ ⊢ C2 ⇒ T  ⟹  Γ ⊢ (let x C1 C2) ⇒ T

Γ ⊢ C ⇒ T  ⟹  Γ ⊢ (markvar C) ⇒ T

Γ ⊢ T ⇒ type  ⟹  Γ ⊢ (fail T) ⇒ T

Γ ⊢ C1 ⇒ T′,  Γ ⊢ C2 ⇒ T,  Γ ⊢ C3 ⇒ T  ⟹  Γ ⊢ (ifmarked C1 C2 C3) ⇒ T

Γ ⊢ C1 ⇒ mpz,  Γ ⊢ C2 ⇒ T,  Γ ⊢ C3 ⇒ T  ⟹  Γ ⊢ (ifneg C1 C2 C3) ⇒ T

Γ ⊢ t1 ⇒ Πx:T1. T2,  Γ ⊢ t2 ⇒ T1,  x ∉ FV(T2)  ⟹  Γ ⊢ (t1 t2) ⇒ T2

Γ ⊢ C ⇒ T,  ∀i ∈ {1, . . . , n}. (Γ ⊢ Pi ⇒ T and Γ, ctxt(Pi) ⊢ Ci ⇒ T′)  ⟹  Γ ⊢ (match C (P1 C1) · · · (Pn Cn)) ⇒ T′

Fig. 21 Typing Rules for Code Terms, written here as premises ⟹ conclusion. Rules for the built-in rational type are similar to those for the integer type, and so are omitted.
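These rules translate directly into a type-computation function. The following Python sketch is our own rendering, covering only the integer fragment (literals, +, unary minus, let, ifneg); term constructors and the tuple encoding are ours:

```python
# A sketch of the type-computation rules of Fig. 21 for the integer fragment.
# Terms are tuples tagged by an operator name; "mpz" is the type of
# arbitrary-precision integers.

def type_of(gamma, t):
    if isinstance(t, int):                  # Gamma |- N => mpz
        return "mpz"
    if isinstance(t, str):                  # variable: Gamma(x) = T
        return gamma[t]
    op = t[0]
    if op == "+":                           # both operands must be mpz
        assert type_of(gamma, t[1]) == type_of(gamma, t[2]) == "mpz"
        return "mpz"
    if op == "neg":
        assert type_of(gamma, t[1]) == "mpz"
        return "mpz"
    if op == "let":                         # (let x C1 C2)
        _, x, c1, c2 = t
        return type_of({**gamma, x: type_of(gamma, c1)}, c2)
    if op == "ifneg":                       # guard is mpz; branches share a type
        t2, t3 = type_of(gamma, t[2]), type_of(gamma, t[3])
        assert type_of(gamma, t[1]) == "mpz" and t2 == t3
        return t2
    raise ValueError(f"unknown term {t}")

print(type_of({}, ("let", "x", 5, ("ifneg", "x", ("neg", "x"), ("+", "x", 1)))))
# -> mpz
```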

B Helper Code for Resolution

The helper code called by the side condition program resolve of the encoded resolution rule R is given in Figures 22 and 23. Note the frequent uses of match for decomposing or testing the form of data. The program eqvar of Figure 22 uses variable marking to test for equality of LF variables. The code assumes a datatype of Booleans with constructors tt and ff. It marks the first variable,

28

Aaron Stump et al.

(program eqvar ((v1 var) (v2 var)) bool
  (do (markvar v1)
      (let s (ifmarked v2 tt ff)
        (do (markvar v1) s))))

(program litvar ((l lit)) var
  (match l ((pos x) x) ((neg x) x)))

(program eqlit ((l1 lit) (l2 lit)) bool
  (match l1
    ((pos v1) (match l2 ((pos v2) (eqvar v1 v2)) ((neg v2) ff)))
    ((neg v1) (match l2 ((pos v2) ff) ((neg v2) (eqvar v1 v2))))))

Fig. 22 Variable and literal comparison.

(declare Ok type)
(declare ok Ok)

(program in ((l lit) (c clause)) Ok
  (match c
    ((clc l' c') (match (eqlit l l') (tt ok) (ff (in l c'))))
    (cln (fail Ok))))

(program remove ((l lit) (c clause)) clause
  (match c
    (cln cln)
    ((clc l' c')
      (let u (remove l c')
        (match (eqlit l l') (tt u) (ff (clc l' u)))))))

(program append ((c1 clause) (c2 clause)) clause
  (match c1 (cln c2) ((clc l c1') (clc l (append c1' c2)))))

(program dropdups ((c1 clause)) clause
  (match c1
    (cln cln)
    ((clc l c1')
      (let v (litvar l)
        (ifmarked v
          (dropdups c1')
          (do (markvar v)
              (let r (clc l (dropdups c1'))
                (do (markvar v) ; clear the mark
                    r))))))))

Fig. 23 Operations on clauses.

and then tests if the second variable is marked. Assuming all variables are unmarked except during operations such as this, the second variable will be marked iff it happens to be the first variable. The mark is then cleared (recall that markvar toggles marks), and the appropriate Boolean result returned. Marks are also used by dropdups to drop duplicate literals from the resolvent.
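The same marking discipline can be rendered in Python, with a set standing in for the per-variable mark bits; the clause encoding (strings with a '-' prefix for negative literals) is ours, not LFSC's:

```python
# A Python sketch of dropdups from Fig. 23: keep the first literal on each
# variable, dropping later literals on that variable (marks are per variable,
# as with litvar). Marks are cleared on the way back out, restoring the
# invariant that all variables are unmarked between operations.

def dropdups(clause, marks=None):
    if marks is None:
        marks = set()              # empty mark state, as eqvar assumes
    if not clause:
        return []
    lit, rest = clause[0], clause[1:]
    var = lit.lstrip("-")          # litvar: strip the polarity
    if var in marks:               # ifmarked: variable already seen, drop
        return dropdups(rest, marks)
    marks.add(var)                 # markvar
    result = [lit] + dropdups(rest, marks)
    marks.discard(var)             # markvar again: clear the mark
    return result

print(dropdups(["x", "y", "x", "-y"]))
```

Note that, as in the LFSC original, the mark attaches to the variable rather than the literal, so a later occurrence of the opposite polarity is dropped too; in the resolution side condition this situation does not arise, because the pivot is removed in both polarities first.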

C Small Example Proof

Figure 24 shows a small QF IDL proof. This proof derives a contradiction from the assumed formula (and (
