INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
Tracing the Origins of Verification Conditions Ranan Fraer
N° 2840 March 1996 THÈME 2
ISSN 0249-6399
Rapport de recherche n° 2840, March 1996, 17 pages
Thème 2: Génie logiciel et calcul symbolique (software engineering and symbolic computation), Projet Croap
Abstract: The typical program verification system is a batch tool that accepts as input a program annotated with Floyd-Hoare assertions, performs syntactic and semantic analysis on it, and generates a list of verification conditions that is subsequently submitted to a theorem prover. When a verification condition cannot be proved, this may be due to an error in the program or an inconsistency in the annotations. Unfortunately, it is very difficult to relate a failing proof attempt to a particular piece of code or assertion. We propose a solution to this problem using the technique of origin tracking.
Key-words: program verification, origin tracking, programming environment
to appear in the proceedings of the International Conference on Algebraic Methodology and Software Technology AMAST, Springer-Verlag Lecture Notes in Computer Science, Munich, July 1996.
[email protected]
Unité de recherche INRIA Sophia-Antipolis, 2004 route des Lucioles, BP 93, 06902 SOPHIA-ANTIPOLIS Cedex (France). Téléphone : (33) 93 65 77 77 – Télécopie : (33) 93 65 77 65
Tracer les Origines des Conditions de Vérification (Tracing the Origins of Verification Conditions)
Résumé: The typical program verification system is a tool with little interactivity that accepts as input a program annotated with Floyd-Hoare assertions, analyzes this program syntactically and semantically, and generates a list of verification conditions that is then submitted to a theorem prover. If a verification condition cannot be proved, this may be due to an error in the program or to an inconsistency in the annotations. Unfortunately, it is very difficult to relate a failed proof to a place in the program or in the assertions. We propose a solution to this problem using the technique of origin tracking.
Mots-clés: program verification, origin tracking, programming environment
1 Introduction

Since the late sixties, when Floyd and Hoare [Flo66, Hoa69] laid the foundations of a method for proving programs correct, several program verification systems have appeared. Usually the main component of such a system is a verification condition generator (VCG) that takes as input an imperative program and a specification (in the form of pre/post-conditions and loop invariants) and outputs a list of logical conditions. These conditions are to be proved manually or mechanically (using a theorem prover), and their validity ensures the correctness of the program. Examples of tools that work this way are the Stanford Pascal Verifier [ILL75], Gypsy [Goo85], EVES [KPS+92] and Penelope [GMP90]. Verification condition generators are also used in formal methods based on stepwise refinement like VDM [Jon86] or B [Abr95], as each refinement step has to be validated by proving the corresponding set of verification conditions.

However, if a verification condition cannot be proved, this means that either the program does not satisfy its specification, or the specification does not correctly state the intended meaning of the program. In both cases the user is supposed to modify the program or the specification, and the system gives no hint on where this modification should occur. Even when trying to prove a true condition, stated as an ordinary first-order logic formula, it is quite difficult to understand what one is trying to prove, and how this condition is related to a possible execution of the program. In other words, the absence of links between the source program and the verification conditions is a serious obstacle to the task of formal verification. To compensate, users do some kind of mental processing to retrieve the origin of the terms that appear in a condition. For instance, by recognizing in a condition the negation of the test of an if statement, they deduce that this condition is generated by the else branch of this statement.
This paper proposes a way of mechanizing this process by instrumenting a verification condition generator with origin tracking facilities. This technique, proposed by Bertot in [Ber91, Ber92, Ber93], was initially applied to implement source-level debugging of programming languages by establishing a relation between the current instruction and a position in the source program. Given a natural semantics description of the language, it is possible to integrate this relation into the semantics, and the integration is shown to be systematic and semi-automatable. Related work on origin tracking in the framework of algebraic specifications is reported by Van Deursen, Klint and Tip in [DKT93].

As origin tracking has proved useful for interpreters, it is sensible to apply it to verification condition generators. Indeed, both kinds of tools are similar in that they perform syntactic and semantic analysis on the input program and generate some output, be it the current state of an execution machine or verification conditions. While debugging is considered essential for understanding program execution, program verification systems have no counterpart that could help in understanding verification conditions. Our work is an attempt to make progress in this area.

We have applied these ideas in a verification condition generator built with the programming environment generator Centaur [Jac94]. We have benefited from its facilities for manipulating syntactic structures and from its built-in selection and highlighting mechanisms. Alternative approaches could use other programming environment generators like the Synthesizer Generator [RT88] or ASF+SDF [Kli93].

The paper is organized as follows. Section 2 presents the algorithm for generating verification conditions. In Section 3 we illustrate the problem on a specific program and its conditions. Sections 4 and 5 introduce origin functions as a data structure well suited to representing descendance information. In Section 6 we describe the integration of origin functions into the verification condition generator, and the complexity of the new algorithm is analyzed in Section 7. Section 8 explains the particularities of the Centaur implementation. Section 9 studies the extension of this work to real program verification systems. We conclude in Section 10 with some remarks on the connections with program slicing.
2 Verification condition generators

We will concentrate on a small imperative programming language with assignments, conditionals and loops. Programs are annotated with loop invariants, and optional assertions can be placed before instructions. A BNF syntax of the language is given below. Variable declarations, expressions and first-order assertions are defined in the usual way.

<program> ::= decl <declarations> in {<assertion>} <statement> {<assertion>}
<instruction> ::= skip | <variable> := <expression> | <statement> ; <statement> | if <expression> then <statement> else <statement> | while <expression> inv {<assertion>} do <statement>
<statement> ::= {<assertion>} <instruction> | <instruction>

The algorithm to generate verification conditions from an annotated program is based on Dijkstra's weakest-precondition calculus [Dij76]. Let $wp(I, Q)$ denote the weakest precondition that should be satisfied before the execution of $I$ for the postcondition $Q$ to be true after the execution of $I$. The algorithm performs a backwards traversal of the instruction $I$ starting from the postcondition $Q$, and computes the missing assertions at each intermediate point. When reaching a user-provided assertion $P$, it generates a verification condition of the form $P \Rightarrow wp(I, Q)$, and it restarts with $P$ as the new postcondition.

We introduce the relation¹
\[ Post \vdash I \rightarrow Pre,\ Conds \]
with the meaning: the postcondition $Post$ is true after a terminating execution of the instruction $I$ if the precondition $Pre$ is true before executing $I$, and if the list of verification conditions $Conds$ contains only valid formulas. We restrict our presentation to partial correctness, but the same considerations apply to total correctness as well.

In what follows, we use $[]$ to denote the empty list, $[a, b]$ for a list with two elements $a$ and $b$, and $L_1 . L_2$ for the concatenation of the lists $L_1$ and $L_2$.
¹ The convention followed here is to separate the inputs and outputs of the relation by an arrow $\rightarrow$. The input term preceded by a turnstile $\vdash$ is also distinguished as the subject of the relation, while the other inputs form the evaluation context.
The algorithm is described using inference rules:
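\[ Q \vdash \mathtt{skip} \rightarrow Q,\ [] \]
\[ \frac{x, E \vdash Q \rightarrow Q'}{Q \vdash x := E \rightarrow Q',\ []} \]
\[ \frac{Q \vdash I_2 \rightarrow P_2,\ Conds_2 \qquad P_2 \vdash I_1 \rightarrow P_1,\ Conds_1}{Q \vdash I_1 ; I_2 \rightarrow P_1,\ Conds_1 . Conds_2} \]
\[ \frac{Q \vdash I_1 \rightarrow P_1,\ Conds_1 \qquad Q \vdash I_2 \rightarrow P_2,\ Conds_2}{Q \vdash \mathtt{if}\ B\ \mathtt{then}\ I_1\ \mathtt{else}\ I_2 \rightarrow (B \Rightarrow P_1) \wedge (\neg B \Rightarrow P_2),\ Conds_1 . Conds_2} \]
\[ \frac{Inv \vdash S \rightarrow P,\ Conds}{Q \vdash \mathtt{while}\ B\ \mathtt{inv}\ \{Inv\}\ \mathtt{do}\ S \rightarrow Inv,\ [Inv \wedge B \Rightarrow P,\ Inv \wedge \neg B \Rightarrow Q] . Conds} \]
\[ \frac{Q \vdash S \rightarrow P,\ Conds}{Q \vdash \{A\}\ S \rightarrow A,\ [A \Rightarrow P] . Conds} \]

The relation $x, E \vdash Q \rightarrow Q'$ stands for $Q' = Q[E/x]$, where the substitution $Q[E/x]$ is formalized in the usual way, avoiding the capture of free variables in quantifications. We present below two typical substitution rules:

\[ \frac{x, E \vdash P \rightarrow P' \qquad x, E \vdash R \rightarrow R'}{x, E \vdash P \wedge R \rightarrow P' \wedge R'} \tag{1} \]
\[ x, E \vdash x \rightarrow E \tag{2} \]

Finally, we need a rule for computing the conditions for the whole program. The notation $\vdash Prog \rightarrow Conds$ stands for: $Conds$ is the set of verification conditions generated from the annotated program $Prog$.

\[ \frac{Q \vdash I \rightarrow P',\ Conds}{\vdash \mathtt{decl}\ Decls\ \mathtt{in}\ \{P\}\ I\ \{Q\} \rightarrow [P \Rightarrow P'] . Conds} \tag{3} \]

3 An example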
As an example, consider the following program computing the quotient $q$ and the remainder $r$ of the integer division of $x$ by $y$. We have purposely introduced a bug in the program: the test of the loop is $r > y$ instead of $r \geq y$. This way, if $x$ is an exact multiple of $y$ the program stops with $r$ equal to $y$ and not to $0$. Figure 1 shows the program followed by its verification conditions.

    decl var x, y, q, r : integer in
      {$x \geq 0$}
      q := 0 ;
      r := x ;
      while r > y inv {$x = q * y + r \wedge r \geq 0$} do
        q := q + 1 ;
        r := r - y ;
      {$x = q * y + r \wedge r \geq 0 \wedge r < y$}

\[ \begin{array}{l}
x \geq 0 \Rightarrow x = 0 * y + x \wedge x \geq 0 \\
x = q * y + r \wedge r \geq 0 \wedge r > y \Rightarrow x = (q + 1) * y + r - y \wedge r - y \geq 0 \\
x = q * y + r \wedge r \geq 0 \wedge \neg (r > y) \Rightarrow x = q * y + r \wedge r \geq 0 \wedge r < y
\end{array} \]

Figure 1: The program and its verification conditions

The three conditions state, in order, that the invariant is satisfied at the entry of the loop, that it is preserved by the loop body, and that together with the negation of the loop test it implies the postcondition. The third condition cannot be proved, as $\neg (r > y)$ does not imply $r < y$. A system based on the technique proposed in this paper automatically highlights the origin of $r > y$ in the source program when the user selects this expression in the verification condition.

Note also that the origin of $r > y$ is exactly the same expression in the program. This is not always the case. We distinguish two other cases:

- there are expressions like $\neg (r > y)$ that have no origin, although some of their subexpressions might have an origin. They represent new terms that have been constructed from already existing expressions during the generation process.
- more subtly, an expression like $x = (q + 1) * y + r - y$ has its origin in the subexpression $x = q * y + r$ of the loop invariant, but some of its subexpressions, like $q + 1$ and $r - y$, have their origins in the right-hand sides of the assignments in the loop body.

We will propose a representation of origin information that takes these differences into account by avoiding storing the origins of all the subterms of a term that was not changed during the condition generation.
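The rules of Section 2 can be replayed on this example with a short Python sketch. It is not the Typol implementation described later in the paper: the tuple encoding of terms and instructions and the names `subst`, `vcg`, `vcg_program` and `show` are assumptions made for illustration, and substitution is restricted to quantifier-free assertions.

```python
from typing import List, Tuple, Union

Term = Union[tuple, str, int]    # ("op", arg_1, ..., arg_n), a variable name, or a constant


def subst(x: str, e: Term, q: Term) -> Term:
    """Q[E/x] on quantifier-free terms (no binders, hence no variable capture)."""
    if q == x:
        return e
    if isinstance(q, tuple):
        return (q[0],) + tuple(subst(x, e, t) for t in q[1:])
    return q


def vcg(q: Term, instr: tuple) -> Tuple[Term, List[Term]]:
    """Q |- I -> Pre, Conds: backwards traversal of the instruction I."""
    kind = instr[0]
    if kind == "skip":
        return q, []
    if kind == "assign":                       # ("assign", x, E)
        return subst(instr[1], instr[2], q), []
    if kind == "seq":                          # ("seq", I1, I2)
        p2, conds2 = vcg(q, instr[2])
        p1, conds1 = vcg(p2, instr[1])
        return p1, conds1 + conds2
    if kind == "if":                           # ("if", B, I1, I2)
        b = instr[1]
        p1, conds1 = vcg(q, instr[2])
        p2, conds2 = vcg(q, instr[3])
        return ("and", ("=>", b, p1), ("=>", ("not", b), p2)), conds1 + conds2
    if kind == "while":                        # ("while", B, Inv, S)
        b, inv, body = instr[1], instr[2], instr[3]
        p, conds = vcg(inv, body)
        return inv, [("=>", ("and", inv, b), p),
                     ("=>", ("and", inv, ("not", b)), q)] + conds
    if kind == "assert":                       # ("assert", A, S)
        p, conds = vcg(q, instr[2])
        return instr[1], [("=>", instr[1], p)] + conds
    raise ValueError(f"unknown instruction: {kind}")


def vcg_program(pre: Term, instr: tuple, post: Term) -> List[Term]:
    """|- decl Decls in {P} I {Q} -> [P => P'] . Conds (rule (3))."""
    p, conds = vcg(post, instr)
    return [("=>", pre, p)] + conds


def show(t: Term) -> str:
    """Fully parenthesised infix rendering, for reading the conditions."""
    if not isinstance(t, tuple):
        return str(t)
    if t[0] == "not":
        return "not(" + show(t[1]) + ")"
    return "(" + (" " + t[0] + " ").join(show(a) for a in t[1:]) + ")"


# The buggy division program of this section.
loop_test = (">", "r", "y")
inv = ("and", ("=", "x", ("+", ("*", "q", "y"), "r")), (">=", "r", 0))
post = ("and", inv, ("<", "r", "y"))
body = ("seq", ("assign", "q", ("+", "q", 1)), ("assign", "r", ("-", "r", "y")))
prog = ("seq", ("assign", "q", 0),
        ("seq", ("assign", "r", "x"), ("while", loop_test, inv, body)))

conds = vcg_program((">=", "x", 0), prog, post)
for cond in conds:
    print(show(cond))
```

The sketch produces three conditions in the order discussed above. Nothing in its output says where a subterm such as the loop test comes from, which is the gap the following sections address.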
4 Origin functions

In order to understand how parts of the source program appear at different positions in the verification conditions, let us consider a very simple program

\[ \mathtt{decl}\ D\ \mathtt{in}\ \{P\}\ x := E\ \{x > 0\} \]

which generates only one verification condition: $P \Rightarrow E > 0$. Considering the program and the condition as trees, the descendance relation can be depicted as in Figure 2.

[Figure 2: The descendance relation. The VCG maps the program tree, whose root has the children $D$, $P$, $x := E$ and $x > 0$, to the condition tree $P \Rightarrow E > 0$.]

It is obvious that a node at any position in area $P$ in the condition descends from the node at the same position in area $P$ in the program. The same goes for the area $E$, while individual nodes like $>$ and $0$ descend from their counterparts in the program.

Occurrences are the most natural way to denote positions in trees. For every natural number $k$, we consider the function $s_k$ which maps any tree $op(t_1, \ldots, t_k, \ldots, t_n)$ to its child of rank $k$, i.e. $t_k$. We call an occurrence any function obtained by composing the functions $s_k$ an arbitrary number of times. The composition operation is denoted $\circ$, its neutral element is denoted $id$, and $O$ is the set of all occurrences. For any tree $T$ we call its domain, denoted $D(T)$, the set of occurrences that are valid on this tree. For example, if we consider the tree $T = f(g(1, 2), 3)$, we have $s_2(T) = 3$, $s_1(T) = g(1, 2)$, and $(s_2 \circ s_1)(T) = 2$. The domain of the tree $T$ is the set $D(T) = \{id, s_1, s_2, s_1 \circ s_1, s_2 \circ s_1\}$.

There are two reasons for preferring this unusual representation of occurrences. First, ascending paths are well suited to top-down recursive computations (as in the VCG algorithm), since from the occurrence of a node one can compute the occurrence of a child in constant time. Second, projections are more flexible to manipulate than lists of integers. For instance, the subterm at a given occurrence is obtained by simple function application.

Following [Ber93], we remark that the relation of descendance is simply a relation between occurrences in the final term and occurrences in the initial term. We also notice that the converse of this descendance relation is a function (called the origin function) from $D(Conds)$ to $D(Prog)$. The origin function is partial, as some nodes in $Conds$ might be newly created during the generation, and thus have no origin in $Prog$.
In order to make this function total we associate a special value $nil$ with all the nodes that have no origin. Denoting by $O_{nil}$ the set $O \cup \{nil\}$, we let $Orig = O_{nil} \rightarrow O_{nil}$ be the type of origin functions. The occurrence composition can be extended to $O_{nil}$ as follows:

\[ u \circ nil = nil \circ u = nil, \quad u \in O_{nil} \]

For instance, the origin function $O$ corresponding to Figure 2 is given by:

\[ \begin{array}{rcll}
O(id) &=& nil & \\
O(u \circ s_1) &=& u \circ s_2, & u \in D(P) \\
O(s_2) &=& s_4 & \\
O(v \circ s_1 \circ s_2) &=& v \circ s_2 \circ s_3, & v \in D(E) \\
O(s_2 \circ s_2) &=& s_2 \circ s_4 & \\
O(w) &=& nil, & w \in O_{nil} \setminus D(Conds)
\end{array} \]
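Occurrences, their composition and the domain of a tree can be illustrated with a minimal Python sketch. The encoding is an assumption of this example, not the paper's: a tree is a tuple `("op", child_1, ..., child_n)`, an occurrence is a tuple of ranks written in the same order as the composition (so `(2, 1)` stands for $s_2 \circ s_1$), and `None` plays the role of $nil$.

```python
from typing import Optional, Set, Tuple, Union

Tree = Union[tuple, str, int]     # ("op", child_1, ..., child_n) or a leaf
Occ = Tuple[int, ...]             # (2, 1) stands for s2 o s1: apply s1 first, then s2

ID: Occ = ()                      # the neutral element id
NIL = None                        # the special value nil


def compose(v: Optional[Occ], u: Optional[Occ]) -> Optional[Occ]:
    """v o u (follow u first, then v), extended to nil."""
    if v is NIL or u is NIL:
        return NIL
    return v + u


def subterm(occ: Occ, tree: Tree) -> Tree:
    """Apply an occurrence to a tree, rightmost projection first."""
    for k in reversed(occ):
        tree = tree[k]            # index 0 holds the operator, so child k is tree[k]
    return tree


def domain(tree: Tree, here: Occ = ID) -> Set[Occ]:
    """D(T): every occurrence that is valid on the tree."""
    occs = {here}
    if isinstance(tree, tuple):
        for k in range(1, len(tree)):
            # the child of rank k of the node at `here` sits at s_k o here
            occs |= domain(tree[k], compose((k,), here))
    return occs


T = ("f", ("g", 1, 2), 3)
print(subterm((2, 1), T))         # the subterm at s2 o s1
print(sorted(domain(T)))
```

Python tuples make the step from a node to its child linear in the length of the path; a linked list of ranks would give the constant-time step mentioned in the text.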
5 Representing origin functions
Our problem being to compute the origin function relating occurrences in $Conds$ to occurrences in $Prog$, we first have to choose a suitable representation for origin functions. As explained in this section, we end up with a non-trivial data structure motivated by the need to mix implicit and explicit information.

A natural idea is to label each node of $Conds$ with the occurrence of the corresponding node in $Prog$ if there is such a node, and with $nil$ otherwise. The representation of the origin function can follow the recursive structure of terms: suppose we want to describe the labels carried by a term $M = op(t_1, \ldots, t_n)$. These labels will be completely described once we have given the label $u$ carried by the head of the term, and the origin functions $f_1, \ldots, f_n$ for the terms $t_1, \ldots, t_n$. The functions $f_1, \ldots, f_n$ may in turn be defined recursively in the same way. Denoting by $Orig^*$ the set of tuples $[f_1, \ldots, f_n]$ of origin functions, we introduce the mapping $c : (Orig^* \times O_{nil}) \rightarrow Orig$ that maps a tuple of origin functions and an occurrence to an origin function defined by the following properties:

\[ \begin{array}{rcll}
c([f_1, \ldots, f_n], u)(id) &=& u & (4) \\
c([f_1, \ldots, f_n], u)(v \circ s_i) &=& f_i(v), \quad i \leq n & (5) \\
c([f_1, \ldots, f_n], u)(w) &=& nil, \quad w \in \{nil\} \cup \{v \circ s_i \mid i > n\} & (6)
\end{array} \]

Note that in the case of a leaf the tuple $[f_1, \ldots, f_n]$ is empty, the corresponding origin function having the form $c([], u)$. The operator $c$ makes it possible to describe the labeling of a term extensionally, as it allows one to construct a tree isomorphic to the labeled term, carrying all the labels. Obviously, this notation is powerful enough to represent all origin functions.

However, as discussed in [Ber93], this representation would be highly redundant. Let $T = u(Prog)$ be a term that gets translated without modification from $Prog$ to $Conds$ (as is the case for $P$ or $E$ in Figure 2). The origin function $f$ associated with $T$ is a very regular one: $f(v) = v \circ u$ for any occurrence $v \in D(T)$. An informal explanation of this statement (see Figure 3) is: if $t$ is the subterm of $T$ at the occurrence $v$ then
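\[ t = v(T) = v(u(Prog)) = (v \circ u)(Prog) \]

[Figure 3: A regular origin function. The subterm $T$ sits at the occurrence $u$ in $Prog$ and is copied unchanged by the VCG into $Conds$; the occurrence $v$ designates the same node inside both copies of $T$.]

This suggests interpreting occurrences as origin functions in the following way. We introduce a mapping $\iota : O \rightarrow Orig$ defined by the property: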
\[ \iota(u)(v) = v \circ u, \quad u \in O,\ v \in O_{nil} \tag{7} \]

Now, for each term $T = u(Prog)$ as above, its origin function is simply denoted by $u$, being interpreted by the function $\iota(u)$. This way we pass from an extensional representation, demanding an explicit labeling of each node, to an intensional one. The operator $\iota$ will be systematically omitted in practical manipulations, as it is always possible to infer from the context whether $u$ should be interpreted as an occurrence or as the corresponding origin function.

Considering again the example of Figure 2, the origin function $O$ can be represented by $O = c([s_2, c([s_2 \circ s_3, s_2 \circ s_4], s_4)], nil)$. The tree representation in Figure 4 is easier to understand, as it shows the correspondence between nodes in the syntactic tree of the conditions and their labels in the origin function. It is interesting to remark that, due to the intensionality, this representation holds for arbitrary values of $P$ and $E$. This property will be essential when instrumenting the generation algorithm with origin computations.
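The mixed representation can be sketched in Python as follows. The encoding is an assumption made for illustration: an occurrence is a tuple of ranks written in composition order (`(2, 3)` for $s_2 \circ s_3$, `()` for $id$), a bare tuple used as an origin function stands for its intensional reading, the class `C` stands for the extensional operator $c$, and `None` stands for $nil$. The function `apply` follows equations (4) to (7).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union

Occ = Tuple[int, ...]                 # (2, 3) stands for s2 o s3; () is id
NIL = None                            # the special value nil


@dataclass
class C:
    """Extensional node c([f1, ..., fn], u)."""
    children: List["Origin"]
    label: Optional[Occ]


Origin = Union[Occ, C]                # a bare occurrence u is read as an origin function


def apply(f: Origin, v: Optional[Occ]) -> Optional[Occ]:
    """The origin f(v) of the node at occurrence v."""
    if v is NIL:
        return NIL                    # nil is always mapped to nil
    if isinstance(f, tuple):
        return v + f                  # equation (7): the result is v o u
    if v == ():
        return f.label                # equation (4)
    i, rest = v[-1], v[:-1]           # v = rest o s_i
    if i <= len(f.children):
        return apply(f.children[i - 1], rest)   # equation (5)
    return NIL                        # equation (6)


# O = c([s2, c([s2 o s3, s2 o s4], s4)], nil), the origin function of Figure 2.
O = C([(2,), C([(2, 3), (2, 4)], (4,))], NIL)

print(apply(O, ()))                   # the implication itself has no origin
print(apply(O, (2,)))                 # the operator > comes from the postcondition
print(apply(O, (1, 1, 2)))            # first child of E inside the condition
```

Only five labels are stored, whatever the sizes of $P$ and $E$: the origins of the nodes inside them are computed on demand from the two intensional leaves.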
[Figure 4: A tree representation of the origin function. The root $\Rightarrow$ is labeled $nil$, $P$ is labeled $s_2$, the operator $>$ is labeled $s_4$, $E$ is labeled $s_2 \circ s_3$ and $0$ is labeled $s_2 \circ s_4$.]

6 Computing origin functions

In this section we show how to integrate origin functions into the VCG algorithm, by adding them as new parameters to each rule. This way, every term $t$ will be annotated with its origin function $O_t$ relative to the initial term of the computation, $Prog$. First, the relation $\vdash Prog \rightarrow Conds$ will be extended to a new relation $\vdash Prog \rightarrow O_{Conds} : Conds$ that computes not only $Conds$ but also its origin function $O_{Conds}$. In turn, this requires corresponding extensions of the other relations, as shown below:
\[ O_{Post} : Post \vdash O_{Inst} : Inst \rightarrow O_{Pre} : Pre,\ O_{Conds} : Conds \]
\[ x, O_E : E \vdash O_Q : Q \rightarrow O_{Q'} : Q' \]

The reason for not taking into account the origin of the variable $x$ in the last relation is that $x$ only acts as a binder here, and will not appear in the result of the substitution $Q'$. This shows that the extension of each relation with origin functions is not as systematic as one might think. It requires in particular a data-flow analysis to see which parts of the input could actually appear in the output.

In order to accommodate these extensions we have to modify the inference rules defining the relations accordingly. The computation is initialised by modifying rule (3) to:

\[ \frac{s_4 : Q \vdash s_3 : I \rightarrow O_{P'} : P',\ O_{Conds} : Conds}{\vdash \mathtt{decl}\ Decls\ \mathtt{in}\ \{P\}\ I\ \{Q\} \rightarrow c([s_2, O_{P'}], nil) . O_{Conds} : [P \Rightarrow P'] . Conds} \]

The fact that $s_4$ is the origin function of $Q$ follows from $Q = s_4(Prog)$. In the same way $s_3$ and $s_2$ are the origin functions of $I$ and $P$. The origin function of $P \Rightarrow P'$ is $c([s_2, O_{P'}], nil)$, as the implication $\Rightarrow$ has no origin itself. Note also that the origin function of a list like $Conds$ is represented by the list of origin functions associated with each element of the list.
The modification of the rule for skip needs no further explanation:

\[ O_Q : Q \vdash O_I : \mathtt{skip} \rightarrow O_Q : Q,\ [] : [] \]

The only difficulty in the assignment rule (shown below) is proving that the origin function of $E$ is $O_E = O_I \circ s_2$, where $\circ$ stands for the composition of origin functions (an operation that will be defined in the next section). This follows from $E = s_2(I)$. Let $v$ be an arbitrary occurrence in $D(E)$. Then the subterm at the occurrence $v$ in $E$ is

\[ v(E) = v(s_2(I)) = (v \circ s_2)(I) \]

so the origin of this subterm in $Prog$ is given by:

\[ O_E(v) = O_I(v \circ s_2) = O_I(s_2(v)) = (O_I \circ s_2)(v) \]

where, in the last two steps, $s_2$ is read as an origin function. By extensionality we infer $O_E = O_I \circ s_2$.

\[ \frac{x, O_I \circ s_2 : E \vdash O_Q : Q \rightarrow O_{Q'} : Q'}{O_Q : Q \vdash O_I : x := E \rightarrow O_{Q'} : Q',\ [] : []} \]
One can already see that the integration of origin functions into each rule is systematic, as it respects the following general principle:

Compute the origins of inputs in premises and the origins of outputs in the conclusion as a function of already computed origins (of inputs in the conclusion or of outputs in premises).

In this respect, we note the analogy with the evaluation of synthesized and inherited attributes in Attribute Grammars. The computed origins have to be coherent with the structure of terms, and also satisfy the property of common variables: each occurrence of the same variable in a rule has to be annotated with the same origin function. For the sake of completeness, we present below the modifications of the other rules of the VCG algorithm:
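\[ \frac{O_Q : Q \vdash O_I \circ s_2 : I_2 \rightarrow O_{P_2} : P_2,\ O_{Conds_2} : Conds_2 \qquad O_{P_2} : P_2 \vdash O_I \circ s_1 : I_1 \rightarrow O_{P_1} : P_1,\ O_{Conds_1} : Conds_1}{O_Q : Q \vdash O_I : I_1 ; I_2 \rightarrow O_{P_1} : P_1,\ O_{Conds_1} . O_{Conds_2} : Conds_1 . Conds_2} \]

\[ \frac{O_Q : Q \vdash O_I \circ s_2 : I_1 \rightarrow O_{P_1} : P_1,\ O_{Conds_1} : Conds_1 \qquad O_Q : Q \vdash O_I \circ s_3 : I_2 \rightarrow O_{P_2} : P_2,\ O_{Conds_2} : Conds_2}{\begin{array}{l} O_Q : Q \vdash O_I : \mathtt{if}\ B\ \mathtt{then}\ I_1\ \mathtt{else}\ I_2 \rightarrow \\ \quad c([c([O_I \circ s_1, O_{P_1}], nil),\ c([c([O_I \circ s_1], nil), O_{P_2}], nil)], nil) : (B \Rightarrow P_1) \wedge (\neg B \Rightarrow P_2), \\ \quad O_{Conds_1} . O_{Conds_2} : Conds_1 . Conds_2 \end{array}} \]

\[ \frac{O_I \circ s_2 : Inv \vdash O_I \circ s_3 : S \rightarrow O_P : P,\ O_{Conds} : Conds}{\begin{array}{l} O_Q : Q \vdash O_I : \mathtt{while}\ B\ \mathtt{inv}\ \{Inv\}\ \mathtt{do}\ S \rightarrow O_I \circ s_2 : Inv, \\ \quad [c([c([O_I \circ s_2, O_I \circ s_1], nil), O_P], nil),\ c([c([O_I \circ s_2, c([O_I \circ s_1], nil)], nil), O_Q], nil)] . O_{Conds} : \\ \quad [Inv \wedge B \Rightarrow P,\ Inv \wedge \neg B \Rightarrow Q] . Conds \end{array}} \]

\[ \frac{O_Q : Q \vdash O_I \circ s_2 : S \rightarrow O_P : P,\ O_{Conds} : Conds}{O_Q : Q \vdash O_I : \{A\}\ S \rightarrow O_I \circ s_1 : A,\ [c([O_I \circ s_1, O_P], nil)] . O_{Conds} : [A \Rightarrow P] . Conds} \]

This shows that the manipulation of origin functions becomes quite tedious in the case of more complex rules, but it is still systematic. For instance, in the if rule $c([c([O_I \circ s_1, O_{P_1}], nil), c([c([O_I \circ s_1], nil), O_{P_2}], nil)], nil)$ is the origin function of $(B \Rightarrow P_1) \wedge (\neg B \Rightarrow P_2)$, where $c([O_I \circ s_1, O_{P_1}], nil)$ is the origin function of $B \Rightarrow P_1$, etc. Given the difficulty of writing such complex rules by hand, it is sensible to investigate the possibility of automating the derivation of the new algorithm from the initial one. This automation is possible for most of the rules, but there are still some delicate cases, as exemplified by the substitution rules: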
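\[ \frac{x, O_E : E \vdash O_Q \circ s_1 : P \rightarrow O_{P'} : P' \qquad x, O_E : E \vdash O_Q \circ s_2 : R \rightarrow O_{R'} : R'}{x, O_E : E \vdash O_Q : P \wedge R \rightarrow c([O_{P'}, O_{R'}], O_Q(id)) : P' \wedge R'} \tag{8} \]

\[ x, O_E : E \vdash O_Q : x \rightarrow O_E : E \]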
We use the occurrence $O_Q(id)$ in rule (8) to indicate that the operator $\wedge$ in $P' \wedge R'$ descends from the same operator in $P \wedge R$. Unfortunately, this information cannot be derived automatically from the syntactic form of rule (1). It is this kind of semantic reasoning that prevents us from having an automatic instrumentation of the initial algorithm with origin computations.

Van Deursen, Klint and Tip [DKT93] propose some useful heuristics for dealing with such cases. For instance, a heuristic that solves the problem above considers that the top symbol of the output tree descends from the top symbol of the input tree. However, this approach is still limited, as there are examples where the heuristics fail to establish some descendance relations.

As an alternative, we can envision a semi-automatic approach where an approximate version of each modified rule is generated automatically. This version only takes into account the `syntactic descendance' information given by the property of common variables, and can be further adjusted by the user to accommodate `semantic descendance' information.
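The instrumented rules of this section can be sketched in Python for a fragment of the language (skip, assignment and sequencing) plus the rule for whole programs. This is an illustration under assumptions, not the paper's Typol code: terms are tuples `("op", child_1, ..., child_n)`, occurrences are tuples of ranks in composition order, `C` stands for the operator $c$, `None` for $nil$, and the helpers `child` and `head` (hypothetical names) compute $f \circ s_i$ and $f(id)$. Substitution generalises rule (8) to any operator of a quantifier-free term.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class C:
    """Extensional origin function c([f1, ..., fn], u)."""
    children: List[Any]
    label: Any


def child(f, i):
    """f o s_i: the origin function of the child of rank i."""
    if f is None:
        return None
    if isinstance(f, tuple):              # intensional case: occurrence s_i o u
        return (i,) + f
    return f.children[i - 1] if i <= len(f.children) else None


def head(f):
    """f(id): the origin of the top node."""
    return f.label if isinstance(f, C) else f


def subst(x, e, o_e, q, o_q):
    """x, O_E : E |- O_Q : Q -> O_Q' : Q' (rule (8) for any operator)."""
    if q == x:
        return e, o_e
    if not isinstance(q, tuple):
        return q, o_q
    parts = [subst(x, e, o_e, t, child(o_q, i)) for i, t in enumerate(q[1:], start=1)]
    return (q[0],) + tuple(t for t, _ in parts), C([o for _, o in parts], head(o_q))


def vcg(q, o_q, instr, o_i):
    """O_Q : Q |- O_I : I -> O_Pre : Pre, O_Conds : Conds."""
    kind = instr[0]
    if kind == "skip":
        return q, o_q, [], []
    if kind == "assign":                  # ("assign", x, E): E is the child of rank 2
        pre, o_pre = subst(instr[1], instr[2], child(o_i, 2), q, o_q)
        return pre, o_pre, [], []
    if kind == "seq":                     # ("seq", I1, I2)
        p2, o_p2, c2, oc2 = vcg(q, o_q, instr[2], child(o_i, 2))
        p1, o_p1, c1, oc1 = vcg(p2, o_p2, instr[1], child(o_i, 1))
        return p1, o_p1, c1 + c2, oc1 + oc2
    raise ValueError(f"instruction not covered by this sketch: {kind}")


def vcg_program(prog):
    """prog = ("decl", Decls, P, I, Q), so P, I and Q sit at s2, s3 and s4."""
    _, _decls, pre, instr, post = prog
    p, o_p, conds, o_conds = vcg(post, (4,), instr, (3,))
    return [("=>", pre, p)] + conds, [C([(2,), o_p], None)] + o_conds


# The program of Figure 2, with P instantiated to y >= 0 and E to y + 1.
E = ("+", "y", 1)
prog = ("decl", "D", (">=", "y", 0), ("assign", "x", E), (">", "x", 0))
conds, origins = vcg_program(prog)
print(conds)
print(origins)
```

On this program the sketch returns the single condition $P \Rightarrow E > 0$ together with the origin function $c([s_2, c([s_2 \circ s_3, s_2 \circ s_4], s_4)], nil)$ given in Section 5.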
7 The complexity of the modified algorithm

In this section we prove that the additional overhead of computing origin functions does not modify the global time complexity of the VCG algorithm. The proof is based on a complexity analysis of the two new operations used in the algorithm: the application of an origin function to an occurrence, and the composition of two origin functions.

Lemma 1 If $f$ is an origin function and $v$ an occurrence, then the time required to compute $f(v)$ is $O(size_O(v))$, where:

\[ \begin{array}{rcl}
size_O(v \circ s_i) &=& 1 + size_O(v) \\
size_O(id) &=& 0
\end{array} \]

Proof By induction on the structure of $v$ using the equations (4)-(7). □

Lemma 2 If $f$ and $g$ are two origin functions, then the time complexity of computing $f \circ g$ is $O(size_{Orig}(g))$ where:

\[ \begin{array}{rcl}
size_{Orig}(c([f_1, \ldots, f_n], u)) &=& 1 + size_O(u) + size_{Orig}(f_1) + \ldots + size_{Orig}(f_n) \\
size_{Orig}(\iota(v)) &=& size_O(v)
\end{array} \]

Proof The composition of two origin functions can be defined recursively on the structure of the second function, as follows:

\[ \begin{array}{rcl}
f \circ c([f_1, \ldots, f_n], u) &=& c([f \circ f_1, \ldots, f \circ f_n], f(u)) \\
c([f_1, \ldots, f_n], u) \circ \iota(v \circ s_i) &=& f_i \circ \iota(v) \\
f \circ \iota(id) &=& f \\
\iota(v) \circ \iota(u) &=& \iota(u \circ v)
\end{array} \]

All the equalities above can be verified easily, by applying each side of an equality to an arbitrary occurrence $w$. The proof follows by structural induction on $g$. □
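The recursive definition of composition given in the proof of Lemma 2, as we read it, can be sketched in Python and checked by extensionality on a few occurrences. The encoding is an assumption of this example: occurrences are tuples of ranks in composition order, a bare tuple used as an origin function stands for its intensional reading, `C` stands for $c$, and `None` stands both for $nil$ and for the origin function that is $nil$ everywhere (a case the paper does not name).

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class C:
    """Extensional origin function c([f1, ..., fn], u)."""
    children: List[Any]
    label: Any


def apply(f, v):
    """f(v), following equations (4) to (7)."""
    if f is None or v is None:
        return None
    if isinstance(f, tuple):
        return v + f                                  # v o u
    if v == ():
        return f.label
    i, rest = v[-1], v[:-1]                           # v = rest o s_i
    return apply(f.children[i - 1], rest) if i <= len(f.children) else None


def compose_fn(f, g):
    """f o g, by recursion on g (and on f when g is an occurrence)."""
    if f is None or g is None:
        return None                                   # assumption: nil everywhere
    if isinstance(g, C):                              # f o c([g1..gn], u) = c([f o g1, ..], f(u))
        return C([compose_fn(f, gi) for gi in g.children], apply(f, g.label))
    if g == ():                                       # composing with id changes nothing
        return f
    if isinstance(f, tuple):                          # two occurrences v, u give u o v
        return g + f
    i, rest = g[-1], g[:-1]                           # g = rest o s_i and f = c([f1..fn], u)
    if i <= len(f.children):
        return compose_fn(f.children[i - 1], rest)    # f_i o rest
    return None


def agree(f, g, occurrences):
    """Extensionality check: (f o g)(w) == f(g(w)) on the given occurrences."""
    h = compose_fn(f, g)
    return all(apply(h, w) == apply(f, apply(g, w)) for w in occurrences)


O = C([(2,), C([(2, 3), (2, 4)], (4,))], None)        # origin function of Figure 2
samples = [(), (1,), (2,), (1, 1), (2, 1), (3,)]
print(compose_fn((3,), (2,)))                         # O_I o s2 with O_I = s3
print(agree(O, (2,), samples), agree(O, (1, 2), samples))
```

When the second argument is a single projection such as $s_2$, as in the instrumented rules, the recursion takes a bounded number of steps, which is what the complexity argument below relies on.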
Proposition 1 The modified VCG algorithm has the same time complexity as the initial one (up to a constant factor).

Proof The complexity of each algorithm is given by the number of inference steps. The new algorithm takes the same steps as the initial one, but each step might take some additional time to compute occurrences like $O_Q(id)$ or origin functions like $O_I \circ s_2$. According to the two lemmas above, these additional computations can be done in constant time. □

The price of intensionality When a variable $x$ has no free occurrence in an expression $Q$, the result of the substitution is still $Q$, but the corresponding origin function has a completely extensional representation instead of a more concise, intensional one. This is due to rules like (8) that systematically use the operator $c$ for constructing origin functions. If we really want to exploit the advantages of an intensional representation, we have to distinguish explicitly the case when no substitution actually occurs. This can be done by means of the following normalization rule (whose validity can be proved by applying both sides of the equality to an arbitrary occurrence $w$):
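\[ c([f \circ s_1, \ldots, f \circ s_n], f(id)) = f, \quad f \in Orig \tag{9} \]

Applying this rule in (8), where $O_P = O_Q \circ s_1$ and $O_R = O_Q \circ s_2$ are the origin functions of $P$ and $R$, we obtain that if $O_{P'} = O_P$ and $O_{R'} = O_R$ then the following holds: $O_{Q'} = c([O_{P'}, O_{R'}], O_Q(id)) = O_Q$.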
However, the complexity of the equality tests on origin functions is proportional to the size of the functions, so we lose the property of a constant time overhead for computing origin functions. Instead of equality tests we might use a boolean that records, for the current subtree, whether a substitution has actually occurred or not. Although this solution is less elegant, it preserves the essential property of a constant time overhead for computing origin functions.
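The boolean-flag variant can be sketched as follows. As before, the encoding is an assumption made for illustration: terms are tuples `("op", child_1, ..., child_n)`, occurrences are tuples of ranks in composition order, `C` stands for $c$, `None` for $nil$, and `child`, `head` and `subst_tracked` are hypothetical names. The third component of the result records whether a substitution occurred in the subtree, so that the intensional origin function is kept without any equality test.

```python
from dataclasses import dataclass
from typing import Any, List


@dataclass
class C:
    """Extensional origin function c([f1, ..., fn], u)."""
    children: List[Any]
    label: Any


def child(f, i):
    """f o s_i: the origin function of the child of rank i."""
    if f is None:
        return None
    if isinstance(f, tuple):
        return (i,) + f
    return f.children[i - 1] if i <= len(f.children) else None


def head(f):
    """f(id): the origin of the top node."""
    return f.label if isinstance(f, C) else f


def subst_tracked(x, e, o_e, q, o_q):
    """Return (Q[E/x], origin function of the result, whether x was replaced)."""
    if q == x:
        return e, o_e, True
    if not isinstance(q, tuple):
        return q, o_q, False
    parts = [subst_tracked(x, e, o_e, t, child(o_q, i))
             for i, t in enumerate(q[1:], start=1)]
    if not any(changed for _, _, changed in parts):
        return q, o_q, False              # effect of rule (9), without comparing functions
    new_term = (q[0],) + tuple(t for t, _, _ in parts)
    return new_term, C([o for _, o, _ in parts], head(o_q)), True


E = ("+", "y", 1)
post = (">", "x", 0)                      # the postcondition of Figure 2, at s4
print(subst_tracked("x", E, (2, 3), post, (4,)))
print(subst_tracked("z", E, (2, 3), post, (4,)))
```

In the first call the result is extensional only along the path to the replaced variable; in the second call the term and its origin function $s_4$ are returned unchanged.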
8 Implementation in Centaur

The VCG algorithm is written in the Typol formalism [Des84], which allows one to specify natural semantics in an inference-rule style. This style of specification is well suited to execution, as it can easily be related to Prolog, Attribute Grammars or functional evaluation.

After computing $Conds$ and $O_{Conds}$, the next step is to descend recursively in the two trees, labeling each node of $Conds$ with its origin given by the corresponding node in $O_{Conds}$. As the two trees are not really isomorphic (due to the intensionality), a leaf in $O_{Conds}$ might correspond to an entire subtree in $Conds$. In this case we do not descend further in that subtree, as the origin information is implicit for the nodes below.

Using Centaur, we benefit from the facilities of a syntax-directed editor, such as selection and highlighting mechanisms. Thus when selecting a term $t$ in a verification condition, we obtain a pointer to the corresponding node in the syntactic tree of $Conds$. If $t$ is not labeled, we have to compute its implicit origin by going upwards in the tree until we reach a labeled ancestor $T$ of $t$. If $T$ is labeled with the occurrence $u$ and $t = v(T)$, then $t$ descends from the occurrence $v \circ u$ in $Prog$ (see Figure 3).

Once the origin of $t$ has been computed, if it is different from $nil$ we want to send this value from the window containing the verification conditions to the one containing the source program. This is done using Sophtalk [JMBC93], with the two windows declared as nodes of a network, the communication being asynchronous and event-driven. When the occurrence denoting the origin of $t$ is received, the corresponding subtree is highlighted in the window of $Prog$.
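The upward search for a labeled ancestor can be sketched in a few lines of Python. This does not use Centaur or Sophtalk: the dictionary `labels`, mapping occurrences of labeled nodes of $Conds$ to their origins, and the function name `origin_of` are assumptions of the example, with occurrences encoded as tuples of ranks in composition order and `None` standing for $nil$.

```python
from typing import Dict, Optional, Tuple

Occ = Tuple[int, ...]     # (1, 2) stands for s1 o s2; () is id


def origin_of(selected: Occ, labels: Dict[Occ, Optional[Occ]]) -> Optional[Occ]:
    """Origin in Prog of the node of Conds at `selected`.

    The selected occurrence is split as v o a, where a is the nearest
    labeled ancestor (possibly the node itself); if a is labeled with u,
    the origin is v o u.
    """
    for cut in range(len(selected) + 1):
        ancestor = selected[cut:]          # drop the `cut` outermost projections
        if ancestor in labels:
            u = labels[ancestor]
            if u is None:
                return None                # the labeled node has no origin
            return selected[:cut] + u      # v o u
    return None


# Labels of the condition P => E > 0, as in Figure 4.
labels = {(): None, (1,): (2,), (2,): (4,), (1, 2): (2, 3), (2, 2): (2, 4)}

print(origin_of((2,), labels))             # the operator >, labeled directly
print(origin_of((1, 1, 2), labels))        # first child of E: label found one level up
```

The cost of a lookup is bounded by the depth of the selected node, and nodes inside an unchanged subterm such as $P$ or $E$ need no label of their own.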
9 Scaling up to real verification condition generators

We have described in this paper the instrumentation of a program verification system with origin information relating terms in the verification conditions to terms in the source program. Of particular importance is that the modified algorithm has the same time complexity as the initial one. The use of an intensional representation makes it possible to reduce memory consumption.

However, real verification condition generators are more complicated than the one presented in this paper. In this section we investigate the application of the method proposed in this paper to these generators.
In the case of complex programming languages, there is a normalization phase reducing all the constructs of the language to some set of normal forms (for example, transforming repeat and for loops into while loops). There is also a simplifier, whose task is to discharge automatically some trivial verification conditions and to simplify the remaining ones. As origin tracking follows a general principle that applies to arbitrary tree manipulations, origin information can be propagated throughout the various pre/post-processing steps. Note also that our approach has the pleasant property of being language-independent, as the representation of origin functions does not interfere with the particular syntax of annotated programs.

In the case of formal methods based on stepwise refinement like B [Abr95], when proving that a concrete implementation is compatible with an abstract specification, the terms appearing in a verification condition might have their origins in the specification or in the implementation. Although our method assumes that the computation has a single input, it can be trivially extended to multiple inputs. It is enough to consider the forest formed by the trees of all inputs as a single tree with a fictitious root.

The semi-automatic approach of generating the modified program from the initial one, suggested in Section 6, may be of great help when dealing with large generators. It becomes essential if the generator makes use of adaptive proof rules, as is the case for the Stanford Pascal Verifier [ILL75] or EVES [KPS+92]. One needs a formal representation of the language in which the generation program is written, in order to be able to describe transformations on the programs of this language. It is also important to have a clear semantics of this language (or at least of the subset of the language used in the generation algorithm), as the transformations require precise data-flow analysis on the initial program. These requirements are satisfied by most verification condition generators, as they are written in a symbolic processing language like Lisp, ML or Prolog.
10 Future work Until now we have explained how to retrieve the origins of various subexpressions of a verication condition. However, this only gives a local understanding, but does not solve the problem of relating a verication condition to a corresponding loop-free execution path in the program. A way of doing this is to associate to the condition the union of the origins of all subexpressions of this condition. The union will contain the occurrences of right-hand sides of assignments, and tests of conditionals and loops that appear in the condition. Although this approach gives only a rough approximation, it could be worked out such as to identify exactly the corresponding execution path. In fact, it is even more selective, as it gives only the instructions that contribute eectively to the generation of the condition. This relates our work to program slicing techniques [Wei84]. Informally, a program slice contains the parts of a program that aect the values computed at some designated point of interest. Further work will exploit this connection. Instead of proving that a conjunct of asser tions is true at a point in the program, one can prove that each assertion is true on the
corresponding slice of the program, and regroup the proofs of the separate slices together to obtain a proof of the whole program.
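The union-of-origins approximation described above can be sketched as a simple traversal. This is an illustration under stated assumptions, not the implementation of the system: each term of a verification condition is assumed to carry the set of program occurrences it originates from, occurrences are assumed to be tuples of 0-based child indices, and the names `Term` and `origin_union` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# Assumption: a program occurrence is a tuple of 0-based child indices.
Occurrence = Tuple[int, ...]


@dataclass
class Term:
    label: str
    # Origins of this node in the annotated program (empty if it has none).
    origins: FrozenSet[Occurrence] = frozenset()
    children: List["Term"] = field(default_factory=list)


def origin_union(term: Term) -> FrozenSet[Occurrence]:
    """Union of the origins of all subexpressions of a condition."""
    result = set(term.origins)
    for child in term.children:
        result |= origin_union(child)
    return frozenset(result)


# Hypothetical condition "x > 0 => x + 1 > 1", with made-up origins:
# a test at occurrence (0,), a right-hand side at (1, 1), a constant at (2,).
vc = Term("=>", children=[
    Term("x > 0", frozenset({(0,)})),
    Term(">", children=[
        Term("x + 1", frozenset({(1, 1)})),
        Term("1", frozenset({(2,)})),
    ]),
])

path_approximation = origin_union(vc)
```

The resulting set lists the program fragments that contributed to the condition. Turning it into an exact execution path or a slice would require the further work outlined in this section.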
Acknowledgements. I am very grateful to Yves Bertot for many useful comments and helpful suggestions.
References
[Abr95] J-R. Abrial. The B-Book. Assigning Programs to Meanings. Cambridge University Press, 1995. (to appear).
[Ber91] Y. Bertot. Occurrences in Debugger Specifications. In ACM SIGPLAN Conference on Programming Language Design and Implementation, 1991.
[Ber92] Y. Bertot. Origin Functions in λ-calculus and Term Rewriting Systems. In CAAP'92, 1992. Springer-Verlag LNCS 581.
[Ber93] Y. Bertot. A Canonical Calculus of Residuals. In G. Huet and G. Plotkin, editors, Logical Frameworks, pages 147-163. Cambridge University Press, 1993. (also appears as INRIA Report no. 1542, Oct. 1991).
[Des84] T. Despeyroux. Executable Specifications of Static Semantics. In International Symposium on Semantics of Data Types, 1984. Springer-Verlag LNCS 173.
[Dij76] E.W. Dijkstra. A Discipline of Programming. Prentice-Hall, 1976.
[DKT93] A. van Deursen, P. Klint, and F. Tip. Origin Tracking. In Journal of Symbolic Computation, volume 15, pages 523-545, 1993.
[Flo66] R.W. Floyd. Assigning Meanings to Programs. In J.T. Schwartz, editor, Mathematical Aspects of Computer Science: Proceedings of the 19th Symposium in Applied Mathematics, pages 19-32, Providence, United States, 1966.
[GMP90] D. Guaspari, C. Marceau, and W. Polak. Formal Verification of Ada Programs. In IEEE Transactions on Software Engineering, volume 16, pages 1058-1075, September 1990.
[Goo85] D.I. Good. Mechanical Proofs about Computer Programs. In C.A.R. Hoare and J.C. Shepherdson, editors, Mathematical Logic and Programming Languages. Prentice-Hall, 1985.
[Hoa69] C.A.R. Hoare. An Axiomatic Basis for Computer Programming. In Communications of the ACM, October 1969.
[ILL75] S. Igarashi, R.L. London, and D.C. Luckham. Automatic Program Verification: A Logical Basis and its Implementation. In Acta Informatica, volume 4, pages 145-182, 1975.
[Jac94] I. Jacobs. The Centaur Reference Manual. Technical report, INRIA, Sophia-Antipolis, 1994.
[JMBC93] I. Jacobs, F. Montagnac, J. Bertot, and D. Clement. The Sophtalk Reference Manual. Technical Report 150, INRIA, Sophia-Antipolis, 1993.
[Jon86] C. Jones. Systematic Software Development using VDM. Prentice-Hall, 1986.
[Kli93] P. Klint. A Meta-environment for Generating Programming Environments. In ACM Transactions on Software Engineering and Methodology, volume 2, pages 176-201, 1993.
[KPS+ 92] S. Kromodimoeljo, B. Pase, M. Saaltink, D. Craigen, and I. Meisels. The EVES System. In Proceedings of the International Lecture Series on Functional Programming, Concurrency, Simulation and Automated Reasoning, 1992.
[RT88] T. Reps and T. Teitelbaum. The Synthesizer Generator: a System for Constructing Language Based Editors. Springer-Verlag, 1988. (third edition).
[Wei84] M. Weiser. Program Slicing. In IEEE Transactions on Software Engineering, volume 10, page 352, 1984.