PDPAR 2005
Reduced Functional Consistency of Uninterpreted Functions
Amir Pnueli (1), Ofer Strichman (2)
(1) New York University, NY, USA and the Weizmann Institute, Rehovot, Israel. Email: [email protected]
(2) Technion, Haifa, Israel. Email: [email protected]
Abstract
A reduction of Equality Logic with Uninterpreted Functions (EUF) to Equality Logic with Ackermann’s method suffers from a quadratic growth in the number of functional consistency constraints (constraints of the form x = y → F(x) = F(y)). We propose a framework in which syntactic characteristics of function instances (their signatures) are used for guessing which constraints are likely to be needed for the proof. This framework can either be embedded in an abstraction-refinement loop or, in some cases, be used without refinement iterations. The framework is suitable for equivalence verification problems, which are one of the typical uses of Uninterpreted Functions. It enabled us to verify dozens of verification conditions resulting from Translation Validation that we could not prove otherwise.
Key words: Equalities with Uninterpreted Functions, Ackermann-Reduction, Functional Consistency.
1 Introduction
Equality Logic with Uninterpreted Functions (EUF) is a popular logic fragment in verification, especially when it comes to proving equivalence between transition systems. Proving equivalence between two versions of a circuit [6,4] or between source and target of a compiler in a process called Translation Validation [10,8] are two prominent examples of usage of this logic. Well-formed EUF formulas are defined as follows, where constructs inside left and right angle brackets denote non-terminals.
Definition 1.1 [Equality Logic with Uninterpreted Functions]
    formula ::= ⟨formula⟩ ∨ ⟨formula⟩ | ¬⟨formula⟩ | atom
    atom ::= ⟨term⟩ = ⟨term⟩ | Boolean-variable
    term ::= term-variable | function-symbol(⟨list of terms⟩)
This paper is electronically published in Electronic Notes in Theoretical Computer Science. URL: www.elsevier.nl/locate/entcs
Pnueli, Strichman
We will concentrate in this article on a single-sort version of this logic: all term-variables are defined over a single infinite domain. In addition we will assume that term-sharing is enabled through the use of auxiliary macro variables. A macro definition has the form aux_i := term_i and is conjoined with the rest of the formula. aux_i is an auxiliary variable that can be substituted with term_i wherever it appears in the EUF formula.
Uninterpreted Functions can be eliminated by using Ackermann’s reduction [1] (equivalently, the reduction of Bryant et al. [5,3]). The reduction requires replacing Uninterpreted Functions with new term-variables, and adding a set of constraints to enforce Functional Consistency, i.e., that two function instances that are instantiated with equal arguments return the same value.
From here on, we will denote function symbols with capital letters. Let F be an n-ary function symbol. We denote by $F(\vec{t})$ an instantiation of F with a vector of arguments $\vec{t}$, and by |F| the number of syntactically distinct instances of F in the given formula. We assume some predefined indexing of these instances {F_1, ..., F_k}, where k = |F|. Following Ackermann’s reduction, we replace each function instance in this set with a new term variable, which we denote with small letters. Then, for every pair of instances of F, say $F(\vec{a})$ and $F(\vec{b})$ where $\vec{a} = (a_1, \ldots, a_n)$ and $\vec{b} = (b_1, \ldots, b_n)$, we add a constraint
\[ \bigwedge_{l=1}^{n} a_l = b_l \;\rightarrow\; f_i = f_j \qquad (1) \]
where f_i and f_j are the new term variables replacing $F(\vec{a})$ and $F(\vec{b})$, respectively. Let $\mathcal{F}$ be the set of Uninterpreted Function symbols in an EUF formula ϕ. Then this reduction results in $\sum_{F \in \mathcal{F}} |F| \cdot (|F| - 1)/2$ constraints.¹ Our experience with Translation Validation [8] problems was that this quadratic growth is the most frequent bottleneck in the verification process, which led us to develop the technique we propose here.
Let A and B be the two compared transition systems (equivalently, two circuits), in which some or all of the functions are abstracted with Uninterpreted Functions. Without loss of generality assume that there is only one function symbol F, and that $F(\vec{t_1}), \ldots, F(\vec{t_n})$ are the instances of F. Let ϕ(A, B) be an EUF formula representing the equivalence verification condition. After applying Ackermann’s reduction we are left with an Equality Logic formula $\varphi^E(A, B)$ with new variables f_1, ..., f_n and a set FC of functional consistency constraints (in the form of (1)) over $\vec{t_1}, \ldots, \vec{t_n}$ and f_1, ..., f_n. The new verification condition is then
\[ \Big( \bigwedge_{f \in FC} f \Big) \rightarrow \varphi^E(A, B) \qquad (2) \]
¹ The alternative reduction by Bryant et al. is syntactically different, but follows similar principles and results in the same complexity. We will not detail here the differences between the two reductions, but it should be noted that our suggested technique applies equally to both.
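The pairwise constraint generation of Ackermann's reduction can be sketched as follows. This is only an illustration under simplifying assumptions: a single function symbol F, instances given as tuples of argument names, and constraints rendered as strings; the helper name `ackermann_constraints` and the four sample instances are hypothetical and do not come from the paper.

```python
from itertools import combinations

def ackermann_constraints(instances):
    """Return one functional consistency constraint per unordered pair.

    `instances` lists the argument tuples of a single function symbol F;
    the i-th instance (1-based) is replaced by the fresh variable f<i>.
    """
    constraints = []
    for (i, a), (j, b) in combinations(enumerate(instances, start=1), 2):
        # Premise: pairwise equality of the arguments, as in constraint (1).
        premise = " & ".join(f"{x} = {y}" for x, y in zip(a, b))
        constraints.append(f"({premise}) -> f{i} = f{j}")
    return constraints

# Hypothetical instances of a binary F: F(x, y), F(y, z), F(x, z), F(z, z)
instances = [("x", "y"), ("y", "z"), ("x", "z"), ("z", "z")]
fc = ackermann_constraints(instances)
for c in fc:
    print(c)
```

With k = 4 instances the sketch emits k(k-1)/2 = 6 constraints, which is the quadratic growth discussed above.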
We are interested in simplifying the computational effort required for checking this formula by finding FC′ ⊂ FC such that: (3)
One way to tackle this problem is by using a Counterexample-Guided Abstraction-Refinement (CEGAR) loop:
(i) Let FC′ = ∅.
(ii) If $\models (\bigwedge_{f \in FC'} f) \rightarrow \varphi$ holds, return ‘valid’.
(iii) If the counterexample is consistent with FC, return ‘invalid’.
(iv) Add constraints from (FC \ FC′) to FC′.
(v) Go to step (ii).
Termination of this loop is guaranteed because in the worst case FC′ = FC, a case in which the loop has to terminate in either step (ii) or step (iii). The main challenge in this loop is in choosing the refinement strategy in step (iv) that leads to fast termination. In the context of Model-Checking, extensive research led to several refinement techniques based on an analysis of the spurious counterexample. While similar techniques can be used here, we claim that we can do better based on the typical nature of equivalence checking problems. In particular, we can predefine the refinement mechanism not based on the counterexamples, but rather on our knowledge of the problem. This leads to termination with fewer iterations, and in our particular case to termination without iterations at all: we are able to predict exactly what constraints can possibly be required for the proof.
The key observation our technique is based on is that A and B are not arbitrary transition systems: one of them, say A, is typically an early version of the other, B, and an equivalence proof merely assures that the transformation from A to B maintains the semantics of A.² If indeed, as we assumed, A and B are equivalent under functional consistency, it probably means that the usage of functions was not altered during the transformation. Therefore it is most likely that the proof only relies on functional consistency between function instances on opposite sides (by ‘opposite sides’ we mean the respective transition relations of A and B), and further, that each function instance needs to be consistent with only a few instances on the other side and not with all.
In the extreme case, the function instances on the two sides can be paired, such that functional consistency between them is all that is needed for the proof. Based on this observation, we propose a framework called Reduced Functional Consistency (RFC) in which functional consistency is imposed only between ‘similar’ function instances (to be defined in the next section). A refinement in this framework corresponds to weakening the definition of similarity. In the worst case all instances are similar, which brings us back to Ackermann’s reduction.
Similarity is defined by various syntactic characteristics of function instances. We use signatures of these instances to capture these characteristics. A signature is a function that maps a function instance to a string. With a given signature, RFC partitions the set of function instances into small sets of instances with equal signatures; ideally, each of these sets contains exactly one pair of function instances from opposite sides. Functional consistency is then imposed only between pairs of function instances from opposite sides that belong to the same partition. We describe this idea in more detail in Section 2.
Instantiations of RFC can be used as refinement strategies as part of an abstraction-refinement loop. If none of the predefined signatures is sufficient (leads to convergence), then other refinement techniques can be used. Hence, completeness is maintained. RFC can also be used independently of such an iterative procedure, if there is a reason to believe that the given signature is sufficient (this is the equivalent of having an abstraction-refinement loop with one iteration). This is the case in our problem domain: we are able to prove validity with only a fraction of the original constraints, without a need for further refinement steps. A detailed example of using RFC in the context of proving program equivalence is given in Section 3. Experiments and conclusions are given in Section 4.
² This is true whether the transformation from A to B is mechanical, e.g. with a compiler, or manual as in some of the custom-design and retiming-oriented transformations of circuits.
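The abstraction-refinement loop of steps (i) to (v) in this section can be sketched as below. This is a toy illustration, not the authors' procedure: validity is checked by brute force over a domain of n values for n variables (which we assume is enough for pure equality logic), constraints are Python predicates over an assignment, and the refinement strategy shown simply adds the constraints violated by the counterexample. All names (`find_counterexample`, `refinement_loop`, `implies_eq`) and the three-instance example are hypothetical.

```python
from itertools import product

def find_counterexample(variables, premises, goal):
    """Search for an assignment satisfying all premises but falsifying goal."""
    n = len(variables)
    for values in product(range(n), repeat=n):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not goal(env):
            return env
    return None

def refinement_loop(variables, goal, fc, choose):
    fc_used = []                                    # step (i): FC' is empty
    while True:
        cex = find_counterexample(variables, fc_used, goal)
        if cex is None:
            return "valid", fc_used                 # step (ii)
        if all(c(cex) for c in fc):
            return "invalid", fc_used               # step (iii)
        remaining = [c for c in fc if c not in fc_used]
        fc_used = fc_used + choose(remaining, cex)  # step (iv), then (v)

def implies_eq(a, b, fa, fb):
    """Constraint a = b -> fa = fb as a predicate over an assignment."""
    return lambda env: env[a] != env[b] or env[fa] == env[fb]

# Hypothetical example: instances F(x), F(y), F(z) replaced by f1, f2, f3.
variables = ["x", "y", "z", "f1", "f2", "f3"]
fc = [implies_eq("x", "y", "f1", "f2"),
      implies_eq("x", "z", "f1", "f3"),
      implies_eq("y", "z", "f2", "f3")]
goal = lambda env: env["x"] != env["y"] or env["f1"] == env["f2"]
violated = lambda remaining, cex: [c for c in remaining if not c(cex)]

status, used = refinement_loop(variables, goal, fc, violated)
print(status, len(used), "of", len(fc), "constraints used")
```

The `choose` parameter is where a signature-based strategy such as RFC would plug in instead of the counterexample-based one used here.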
2 Reduced functional consistency constraints
In the Reduced Functional Consistency (RFC) framework, the set of function instances is partitioned into small sets such that two instances are in the same set if and only if they have the same signature (see below). Assuming we have a well-defined signature Σ, we impose consistency as follows:
(i) For each function instance $F(\vec{t})$ compute $\Sigma(F(\vec{t}))$, the signature of $F(\vec{t})$.
(ii) Add a functional consistency constraint for the pair $F(\vec{x})$ and $F(\vec{y})$ if and only if
    (a) $\Sigma(F(\vec{x})) = \Sigma(F(\vec{y}))$, and
    (b) $F(\vec{x})$ and $F(\vec{y})$ appear on opposite sides of the equation.
Conditions (a) and (b) are orthogonal. From here on we concentrate only on condition (a), i.e., finding signatures, while assuming that condition (b) is fixed. Further, we will assume that every signature we define partitions the instances into sets with an equal number of instances from each side. Consider first the signature Σ0:
Definition 2.1 [Signature Σ0] Let $F(\vec{t})$ be an Uninterpreted Function instance. $\Sigma_0(F(\vec{t}))$ is a string defined recursively as follows, where expressions inside left and right angle brackets (⟨·⟩) are further expanded recursively:
Σ0(term) := case term is
    a function F(term_1, ..., term_n)  :  F(⟨Σ0(term_1)⟩, ..., ⟨Σ0(term_n)⟩)
    a macro definition aux := term     :  ⟨Σ0(term)⟩
    an input ‘in’                      :  in
    a numerical constant c             :  c
    (term)                             :  (⟨Σ0(term)⟩)
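Definition 2.1 can be sketched in code as follows. The term representation is an assumption made for illustration only: a function instance is a tuple `(name, arg1, ..., argn)`, a constant is an `int`, and inputs and macro variables are strings; `macros` maps each auxiliary variable to its defining term. The explicitly parenthesized case of the definition is omitted, since nested tuples carry no redundant parentheses.

```python
def sigma0(term, macros, inputs):
    """Compute the Sigma_0 signature string of a term (sketch)."""
    if isinstance(term, tuple):            # function instance F(t1, ..., tn)
        name, *args = term
        return f"{name}({','.join(sigma0(a, macros, inputs) for a in args)})"
    if isinstance(term, int):              # numerical constant
        return str(term)
    if term in macros:                     # macro variable: expand its definition
        return sigma0(macros[term], macros, inputs)
    if term in inputs:                     # an input contributes its own name
        return term
    raise ValueError(f"unknown term: {term!r}")

# The sub-formula aux1 := F(i3, 5) and v1 = G(F(i1, aux1), i2)
macros = {"aux1": ("F", "i3", 5)}
inputs = {"i1", "i2", "i3"}
sig = sigma0(("G", ("F", "i1", "aux1"), "i2"), macros, inputs)
print(sig)
```

Two instances that receive the same string from `sigma0` have syntactically equal arguments once macros are expanded.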
Note that Σ0 merely propagates macro definitions through the signature. The following example demonstrates the use of this definition.
Example 2.2 Consider the sub-formula
    aux1 := F(i3, 5) ∧ v1 = G(F(i1, aux1), i2)
We compute recursively
    Σ0(G(F(i1, aux1), i2)) = Σ0(G(F(i1, Σ0(F(i3, 5))), i2)) = ‘G(F(i1, F(i3, 5)), i2)’
Such a signature is computed for each Uninterpreted Function instance.³
A unique characteristic of Σ0 is that if two function instances $F(\vec{x})$ and $F(\vec{y})$ have the same Σ0 signature, then they can be represented with the same term variable: no functional consistency constraints are necessary. This is because $\Sigma_0(F(\vec{x})) = \Sigma_0(F(\vec{y}))$ implies that the arguments of $F(\vec{x})$ and $F(\vec{y})$ are equal, and hence, by definition of functional consistency, $F(\vec{x})$ and $F(\vec{y})$ are equal. The problem, as we will now see, is that Σ0 might be too strong, in the sense that relying on it can make the proof of a valid formula fail, which then requires a refinement.
2.1 When can Σ0 fail the proof?
Mapping according to Σ0 can fail the proof if the validity depends on equivalence of different inputs (see below) or on the equivalence of instances of different functions, e.g. on the equivalence of a multiplication expression to an addition expression. Consider, for example, the two formulas in Figure 1. Our goal is to prove the validity of o = o′. Under full functional consistency as defined by Ackermann, the formula o = o′ is valid. On the other hand, Σ0(F(G(in))) ≠ Σ0(F(H(in))), and therefore according to RFC no functional consistency constraints are added that enforce the equivalence of their respective term variables. Hence, the proof fails. The reason for this is that in this example the correctness depends on the equivalence between two different functions, namely G(in) and H(in).

    o  = case G(in) = H(in) : F(G(in));
              else          : 0;
    o′ = case G(in) = H(in) : F(H(in));
              else          : 0;
Fig. 1. The equivalence of o and o′ depends on functional consistency across two different function symbols, namely G and H, making the instantiation of RFC with Σ0 fail.

³ The definition of signatures can be extended to the more general case of equalities instead of macro assignments. In this case one possible strategy is to compute a set of strings rather than a single string as the image of the signature. For example, let ϕ be an EUF formula of the form aux = term_1 ∧ aux = term_2 ∧ ϕ′ where ϕ′ uses the term F(aux). Then Σ0(F(aux)) = {Σ0(F(term_1)), Σ0(F(term_2))}. Given such sets of signatures for each function instance, we can try to enforce consistency between $F(\vec{t_1})$ and $F(\vec{t_2})$ only if $\exists s_1 \in \Sigma_0(F(\vec{t_1})),\ s_2 \in \Sigma_0(F(\vec{t_2})).\ s_1 = s_2$.

We never encountered in practice an example in which the proof indeed depends on the equivalence between two different functions. On the other hand, we did encounter cases in which the proof depends on the equivalence of two function instances whose Σ0 signatures differ only because of the inputs. In other words, the validity depends on the relation between different inputs, but the definition of Σ0 implies that we do not add the functional consistency constraints that reflect this fact. Consider, for example, the two expressions in Figure 2. Here again, under functional consistency the formula
o = o′ is valid. On the other hand, Σ0(F(in1)) ≠ Σ0(F(in2)), hence the two instances are given different term variables and no functional consistency constraints are added that can force them to be equal. Hence, the proof fails. In this case the correctness depends on the equivalence between two different inputs, in1 and in2, which is not reflected by the functional consistency constraints.

    o  = case in1 = in2 : F(in1);
              else      : 0;
    o′ = case in1 = in2 : F(in2);
              else      : 0;
Fig. 2. The equivalence of o and o′ depends on functional consistency across inputs, making RFC with Σ0 fail.

We can solve this problem by defining a new signature, Σ1.
Definition 2.3 [Signature Σ1] Let $F(\vec{t})$ be an Uninterpreted Function instance. $\Sigma_1(F(\vec{t}))$ is a string defined recursively as follows, where again left and right angle brackets represent expressions that should be further expanded:
Σ1(term) := case term is
    a function F(term_1, ..., term_n)  :  F(⟨Σ1(term_1)⟩, ..., ⟨Σ1(term_n)⟩)
    a macro definition aux := term     :  ⟨Σ1(term)⟩
    an input                           :  i
    a numerical constant c             :  c
    (term)                             :  (⟨Σ1(term)⟩)
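The difference between the two signatures can be illustrated on the instances F(in1) and F(in2) of Figure 2. The sketch below assumes the same hypothetical term encoding as before (tuples for function instances, ints for constants, strings for inputs and macro variables); the flag `merge_inputs` is our own device for switching between Σ0 (inputs keep their names) and Σ1 (every input contributes the string 'i').

```python
def signature(term, macros, inputs, merge_inputs):
    """Sigma_0 when merge_inputs is False, Sigma_1 when it is True (sketch)."""
    if isinstance(term, tuple):            # function instance
        name, *args = term
        inner = ",".join(signature(a, macros, inputs, merge_inputs) for a in args)
        return f"{name}({inner})"
    if isinstance(term, int):              # numerical constant
        return str(term)
    if term in macros:                     # macro variable: expand its definition
        return signature(macros[term], macros, inputs, merge_inputs)
    if term in inputs:                     # Sigma_1 makes all inputs look alike
        return "i" if merge_inputs else term
    raise ValueError(f"unknown term: {term!r}")

inputs = {"in1", "in2"}
left, right = ("F", "in1"), ("F", "in2")   # the two instances of Figure 2

same_sigma0 = signature(left, {}, inputs, False) == signature(right, {}, inputs, False)
same_sigma1 = signature(left, {}, inputs, True) == signature(right, {}, inputs, True)
print(same_sigma0, same_sigma1)
```

Under Σ1 the two instances fall into the same partition, so RFC adds the constraint in1 = in2 → f1 = f2 between them; under Σ0 they do not.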
Note that Σ1 is the same as Σ0, except that all inputs contribute the same constant string ‘i’ to the signature, making them indistinguishable from one another. For example, given G(F(i1, aux1), i2) from Example 2.2, Σ1(G(F(i1, aux1), i2)) = ‘G(F(i, F(i, 5)), i)’. With this signature RFC groups together more function instances and correspondingly adds more functional consistency constraints, which solves the problem that was demonstrated in Figure 2.
2.2 Boolean connectives and ITE expressions
For computing the signature, Boolean connectives (including ITE expressions) inside function calls can be treated as uninterpreted predicates. In this case we add to the signature a new string ‘p_b’ for all b ∈ {¬, ∨, ∧, ITE, ...}. For example, if an argument of a function is given by an ITE expression, we embed a string ‘p_ITE’ in the signature.
Example 2.4 For inputs i1, i2, i3, i4,
    Σ0(f(ITE(i1 ∧ i2, i3, i4))) = ‘f(p_ITE(p_∧(i1, i2), i3, i4))’
2.3 The size of the resulting formula
If Σ0 is sufficient in terms of validity, we can replace all function instances that have the same signature with the same term variable, since clearly they are always equal to one another. In this case no functional consistency constraints are needed at all (FC = ∅). If, on the other hand, we need Σ1, functional consistency constraints are still needed. Given a function $F \in \mathcal{F}$, Σ1 partitions its set of instances into sets $F^1, \ldots, F^m$, $m \le |F|/2$, such that each set $F^j$, $1 \le j \le m$, contains an equal number of instances from both sides of the equation. Functional consistency is then enforced only within the smaller sets. Since we only enforce consistency between instances of opposite sides of the equation (Condition (b)), we only have $\sum_{j=1}^{m} |F^j|^2/4$ constraints. On the other hand, according to the original Ackermann scheme we need $|F| \cdot (|F| - 1)/2$, or equivalently $(\sum_{j=1}^{m} |F^j|) \cdot (\sum_{j=1}^{m} |F^j| - 1)/2$ constraints. If we assume that there are no more than a fixed number l of instances with the same signature Σ1, then the total number of constraints becomes linear. Indeed, in the verification conditions that we needed to prove, the number of instances with the same signature was very low (almost never more than 4), resulting in a linear number of constraints.

(a)
    int power3(int in) {
        int i, out_a;
        out_a = in;
        for (i = 0; i < 2; i++)
            out_a = out_a * in;
        return out_a;
    }
(b)
    int power3_new(int in) {
        int out_b;
        out_b = (in * in) * in;
        return out_b;
    }
Fig. 3. In order to prove the equivalence of these two programs, we replace the multiplication function instances in both programs with Uninterpreted Functions.
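The two constraint counts of Section 2.3 can be compared with a few lines of code. The sketch assumes, as the text does, that every partition holds an equal number of instances from each side (so its size is even); the function names and the sample partitions are hypothetical.

```python
def ackermann_count(k):
    """Number of constraints for k instances under plain Ackermann reduction."""
    return k * (k - 1) // 2

def rfc_count(partition_sizes):
    """Sum over partitions of |F^j|^2 / 4, i.e. (|F^j|/2)^2 cross-side pairs."""
    return sum((s // 2) ** 2 for s in partition_sizes)

# Hypothetical function with 100 instances, perfectly paired by the signature.
paired = [2] * 50
# Same number of instances, but four instances per signature.
quads = [4] * 25

print(ackermann_count(sum(paired)), rfc_count(paired), rfc_count(quads))
```

With partition sizes bounded by a constant, the RFC count grows linearly in the number of instances, while the Ackermann count grows quadratically.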
3 Example: proving program equivalence
Consider the problem of verifying that the two C code segments in Figure 3 return the same output regardless of their input in.⁴ First, we transform the programs into Static Single Assignment (SSA) form: we unroll the for loop twice while introducing a new auxiliary variable for each assignment to out_a, remove variable declarations and output statements, and finally conjoin all program statements. These operations result in the two transition relations T and T′:

    (T)   out0_a, out1_a, out2_a, in ∈ Z
          out0_a = in ∧
          out1_a = out0_a ∗ in ∧
          out2_a = out1_a ∗ in

    (T′)  out0_b, in ∈ Z
          out0_b = (in ∗ in) ∗ in

It is now left to prove the equivalence of these two formulas, which is the same as saying that we want to prove the validity of
\[ T \wedge T' \rightarrow out2\_a = out0\_b \qquad (4) \]
⁴ This specific equivalence can be proved without Uninterpreted Functions at all with syntactic substitution. We use this example nevertheless because it is simple enough to demonstrate RFC. It does not reflect the type of Translation Validation problems we work on (which require an elaborate introduction of the translated language).
Replacing multiplication with an uninterpreted symbol gives us the following two expressions.

    (T_UF)   out0_a = in ∧
             out1_a = F(out0_a, in) ∧
             out2_a = F(out1_a, in)

    (T′_UF)  out0_b = F(F(in, in), in)
We have four instances of the Uninterpreted Function F: F(out0_a, in), F(out1_a, in), F(in, in), and F(F(in, in), in), which we number in this order. We now replace each Uninterpreted Function instance with the corresponding term variable:

    (5)  ϕ^E :  (out0_a = in ∧ out1_a = f1 ∧ out2_a = f2) ∧ out0_b = f4  →  out2_a = out0_b

and the set of Functional Consistency constraints FC is given by:

    (6)  { ((out0_a = out1_a ∧ in = in) → f1 = f2),
           ((out0_a = in ∧ in = in)     → f1 = f3),
           ((out0_a = f3 ∧ in = in)     → f1 = f4),
           ((out1_a = in ∧ in = in)     → f2 = f3),
           ((out1_a = f3 ∧ in = in)     → f2 = f4),
           ((in = f3 ∧ in = in)         → f3 = f4) }
With these definitions it is left to validate Equation (2). Using Σ0, we see that the signature of F1 and F3 is ‘F(in, in)’ while the signature of F2 and F4 is ‘F(F(in, in), in)’.⁵ Therefore we can replace the first two with one term variable, say f1, and the other two with a different term variable, say f2, and not add functional consistency constraints at all (FC = ∅). Indeed, in this case Σ0 works: the formula is still valid without explicit consistency constraints. Applying Σ1 gives us the same partitioning in this case, because there is only one input variable. We nevertheless need to add these constraints explicitly, since we do not know whether the partition depends on equal or different inputs. Therefore the set FC is now:

    (7)  { ((out0_a = in ∧ in = in) → f1 = f3),
           ((out1_a = f3 ∧ in = in) → f2 = f4) }
⁵ Note that we can refer to out0_a, out1_a, out2_a, out0_b as macro definitions since each one of these variables appears in one equality conjoined with the expressions in which it is being used.
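The partitioning used in this example can be reproduced with a short sketch. It assumes a hypothetical encoding of the four instances as nested tuples tagged with their side ("A" for power3, "B" for power3_new) and treats out0_a and out1_a as macros; any other name is taken to be an input. The pairs it prints are the index pairs of the instances between which RFC imposes functional consistency.

```python
from collections import defaultdict

def sigma0(term, macros):
    """Sigma_0 signature; any name that is not a macro is treated as an input."""
    if isinstance(term, tuple):                     # function instance
        name, *args = term
        return f"{name}({','.join(sigma0(a, macros) for a in args)})"
    if term in macros:                              # expand macro definitions
        return sigma0(macros[term], macros)
    return str(term)                                # input or constant

macros = {"out0_a": "in", "out1_a": ("F", "out0_a", "in")}
instances = [                                       # (index, side, term)
    (1, "A", ("F", "out0_a", "in")),
    (2, "A", ("F", "out1_a", "in")),
    (3, "B", ("F", "in", "in")),
    (4, "B", ("F", ("F", "in", "in"), "in")),
]

# Partition the instances by signature.
groups = defaultdict(list)
for idx, side, term in instances:
    groups[sigma0(term, macros)].append((idx, side))

# Condition (b): only pair instances from opposite sides within a partition.
pairs = sorted((i, j)
               for members in groups.values()
               for i, side_i in members
               for j, side_j in members
               if side_i == "A" and side_j == "B")
print(dict(groups))
print(pairs)
```

The two pairs correspond to the two constraints of set (7), compared with the six constraints of set (6).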
4 Results and Summary
RFC was developed because we were unable to prove a large number of verification conditions arising in Translation Validation, due to the number of function instances in these formulas. We typically use Σ1 by default. We have examples with functions that are instantiated more than a hundred times, i.e., for some F, |F| > 100, resulting, correspondingly, in |F|²/2 > 5000 constraints. After applying RFC with Σ1, just a little more than |F|/2 = 50 constraints are generated, indicating an almost perfect match, while keeping the formula valid. The bottleneck resulting from the number of function instances disappeared in our experiments. We do not show a detailed list of solving-time results because all of our benchmarks were either solved in less than 3 seconds by both methods, or could only be solved (again, in a few seconds) with RFC while timing out with the original set of constraints.
Some representative benchmarks appear in the table below, indicating the number of generated constraints per predicate or arithmetic function symbol (only s49 in this table timed out with the original set of constraints). These benchmarks were extracted from a bigger benchmark formula by decomposition. In most of them Σ1 generated a perfect match, i.e. k/2 constraints for k instances of a function/predicate symbol. When the arguments of the function instances are syntactically equivalent (like constants), we only introduce one common variable instead of two. This explains cases in which there is an odd number of constraints, and fewer than k/2 constraints for a function symbol with k instances. The last line contains in parentheses the number of constraints that would have been added if the original Ackermann’s reduction had been used (denoting the number of instances of the i-th symbol by k_i, this number is equal to $\sum_i k_i \cdot (k_i - 1)/2$).
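As a check on the parenthesized totals, the formula above can be evaluated on the per-symbol instance counts k_i listed in the table (in the order <, >, +, −, ∗, /). The arithmetic below is ours and uses only the table's own numbers:

```latex
\begin{align*}
\text{s27:}\quad & \tfrac{10\cdot 9}{2}+\tfrac{12\cdot 11}{2}+\tfrac{2\cdot 1}{2}+\tfrac{9\cdot 8}{2}+\tfrac{2\cdot 1}{2}+\tfrac{3\cdot 2}{2}
  = 45+66+1+36+1+3 = 152,\\
\text{s44:}\quad & \tfrac{4\cdot 3}{2}+\tfrac{2\cdot 1}{2}+\tfrac{10\cdot 9}{2}+\tfrac{4\cdot 3}{2}+\tfrac{3\cdot 2}{2}+\tfrac{7\cdot 6}{2}
  = 6+1+45+6+3+21 = 82,\\
\text{s49:}\quad & \tfrac{20\cdot 19}{2}+\tfrac{16\cdot 15}{2}+\tfrac{30\cdot 29}{2}+\tfrac{42\cdot 41}{2}+\tfrac{32\cdot 31}{2}+\tfrac{12\cdot 11}{2}
  = 190+120+435+861+496+66 = 2168.
\end{align*}
```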
              s27             s44             s49
symbol      k    RFC        k    RFC        k    RFC
<          10      5        4      2       20     22
>          12      6        2      1       16     10
+           2      1       10      5       30     23
−           9      3        4      2       42     32
∗           2      1        3      1       32     21
/           3      1        7      3       12      5
Total      38     17 (152) 30     14 (82) 152    113 (2168)

As expected, RFC both theoretically and practically dominates plain Ackermann’s reduction, as long as completeness is maintained (which was always the case in our experiments with Σ1). This method should, in theory, be helpful for all decision procedures for EUF formulas. It reduces drastically the size of the formula and the number of unique terms (in fact, it simply extracts a subset of the original formula), and therefore should be helpful to modern SAT-based SMT solvers like ICS [7] and MathSat [2] as well. Our own decision procedure is based on BDDs and the small-domain encoding reported in [9].
4.1 Summary
We presented the Reduced Functional Consistency (RFC) framework in which signatures of function instances are used for partitioning the set of function instances, and then consistency is maintained only within the partitions. We showed two such signatures which proved to be sufficient for maintaining completeness in our problem domain. We believe that in practice it should be possible to derive similar signatures in other domains. This is based on the assumption that in equivalence checking the two sides are not arbitrary; rather, one of them is an early version of the other, and hence it is expected that there is a certain mapping between the function instances that we can exploit. In our domain of interest there was no need for further refinement steps beyond Σ1. In general, RFC can be used within an abstraction-refinement loop. If all predefined signatures fail, other refinement techniques can be invoked, like those that are based on analyzing the counterexample. In this setting, completeness is guaranteed since in the worst case all the original constraints are added.
References
[1] W. Ackermann. Solvable cases of the Decision Problem. Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1954.
[2] G. Audemard, P. Bertoli, A. Cimatti, A. Kornilowicz, and R. Sebastiani. A SAT based approach for solving formulas over boolean and linear mathematical propositions. In Proc. 18th International Conference on Automated Deduction (CADE’02), 2002.
[3] R.E. Bryant, S. German, and M. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV’99), 1999.
[4] R.E. Bryant, S. German, and M. Velev. Processor verification using efficient reductions of the logic of uninterpreted functions to propositional logic. ACM Transactions on Computational Logic, 2(1):1–41, 2001.
[5] R.E. Bryant and M. Velev. Deciding a theory of positive equality with uninterpreted functions. Technical Report CMU-CS-98-141, CMU, 1998.
[6] J. R. Burch and D. L. Dill. Automatic verification of pipelined microprocessor control. In Proc. 6th Intl. Conference on Computer Aided Verification (CAV’94), volume 818 of Lect. Notes in Comp. Sci., pages 68–80. Springer-Verlag, 1994.
[7] J.C. Filliatre, S. Owre, H. Rueß, and N. Shankar. ICS: Integrated canonizer and solver. In G. Berry, H. Comon, and A. Finkel, editors, Proc. 13th Intl. Conference on Computer Aided Verification (CAV’01), LNCS. Springer-Verlag, 2001.
[8] A. Pnueli, Y. Rodeh, O. Shtrichman, and M. Siegel. Deciding equality formulas by small-domains instantiations. In Proc. 11th Intl. Conference on Computer Aided Verification (CAV’99), Lect. Notes in Comp. Sci. Springer-Verlag, 1999.
[9] A. Pnueli, Y. Rodeh, O. Strichman, and M. Siegel. The small model property: How small can it be? Information and Computation, 178(1):279–293, October 2002.
[10] A. Pnueli, M. Siegel, and E. Singerman. Translation validation. In B. Steffen, editor, 4th Intl. Conf. TACAS’98, volume 1384 of Lect. Notes in Comp. Sci., pages 151–166. Springer-Verlag, 1998.