Journal of Computer and Software Engineering, Vol. 2, No. 2, pp. 137-164, 1994.

Applying Formal Methods in Automated Software Development

Betty H.C. Cheng

Department of Computer Science, Michigan State University
A714 Wells Hall, East Lansing, Michigan 48824
ph: (517) 355-8344; fax: (517) 336-1061
email: [email protected]

This work is supported in part by NSF grant CCR-9209873 and a grant from AT&T while the author was at the University of Illinois at Urbana-Champaign. 


Abstract

Research into the development of software tools that support formal methods is aimed at simplifying and providing assistance during the development of correct software. This paper describes the development of the Seed system, which demonstrates that the building blocks of a software system can be correctly synthesized from user-supplied formal specifications using techniques amenable to automation. Seed accepts a formal specification of a problem written in predicate logic and generates annotated program source code satisfying the specification. The rules for choosing which programming language structures to synthesize are contained in a rule base; background knowledge and domain-specific information are entered into a fact base. During synthesis, Seed uses the fact base to disambiguate rule applications. In addition to primitive programming language constructs, such as assignment, alternative and iterative statements, Seed is capable of synthesizing recursive and non-recursive procedures and functions, as well as abstract data types.

Keywords: Formal methods, proofs of correctness, formal specifications, procedural abstraction


1 Introduction

As software is used increasingly to control critical systems, correctness becomes paramount [1, 2, 3, 4]. Research into the development of software tools that support the use of formal methods is aimed at simplifying and providing assistance during the development of correct software. The objective of this project is to build a software development environment (SDE) comprising tools that support the use of formal methods in all phases of software development, including design, specification, implementation, and maintenance. In the first stage of this project, we have developed an automated program synthesis system, Seed, which synthesizes programs from formal specifications while verifying their correctness [5]. Through the Seed project, we have shown that the building blocks of a software system can be correctly synthesized from user-supplied formal specifications in an automated manner. Seed accepts a predicate logic specification of a problem and generates program source code satisfying the specification. For example, the expression

{Q} S {R}

indicates that if precondition Q is true and program S is executed, then postcondition R is satisfied. Seed's synthesis rules are based in part on guidelines from Dijkstra [6] and Gries [7] for developing a program and its proof of correctness simultaneously. A program specification is expressed in terms of pre- and postconditions that describe the initial and final states, respectively, of a program. First-order predicate logic has been chosen as the formal specification language since it is concise, unambiguous, and well-defined [8, 9]. The semantics of a predicate logic specification can be directly extracted from its symbols, the small number of which implies a small rule base. One component of Seed is this small rule base, which supplies synthesis rules for each type of logical expression. For example, a disjunctive expression connotes choice or alternation, and accordingly, Seed has a synthesis rule to develop an alternative statement (e.g., if-then-else, CASE statement). The statement S resulting from the synthesis process is verified by finding the weakest precondition (wp) of S with respect to the postcondition R, where wp(S, R) describes the set of states in which execution of S can begin such that, upon termination of S, R will be satisfied. The progression of our investigation was guided by the goal of synthesizing the components of software systems. The initial version of Seed synthesized primitive programming structures found in an imperative language, such as assignment, alternative, and iterative statements. Later versions of Seed supported procedural and data abstraction.

In order to handle procedural abstractions in the system, rules have been developed to synthesize recursive and non-recursive procedures and functions from specifications; these routines can be invoked through procedure calls and referenced with respect to their specifications. A set of rules was also developed to synthesize abstract data types from specifications in order to support data abstraction. The efficiency of the synthesized programming structures has not yet been addressed in the Seed project; instead, the investigations have focused on the range of programming structures that can be developed from formal specifications. A syntax-based editor with a graphical user interface [10] has been developed in order to facilitate the specification construction process. We are also studying methods to transform informal problem descriptions into formal specifications [11]. For example, tools such as Miriyala's SPECIFIER [12, 13] can transform an informal specification, written in a subset of natural language, into a formal specification. The focus of this paper is the development of primitive programming statements and recursive procedures from formal specifications. The remainder of the paper is organized as follows. An overview of Seed is given in Section 2. Section 3 describes the synthesis of primitive programming structures. In Section 4, the rules for synthesizing the procedural abstractions, procedures and functions, are described. Section 5 briefly describes the implementation of Seed. Section 6 contrasts the Seed project with related work. Finally, Section 7 summarizes the work presented in the paper and discusses future investigations.

2 Overview of SEED

Figure 1 gives a pictorial overview of the Seed system. Once a specification is entered, it undergoes pre-processing. The specification may be decomposed into logically related tasks that will be synthesized as functions or procedures. The need-to-know principle [14] is the basis of our decomposition algorithm. This principle states that if variable x is not necessary for accomplishing the task of statements S1, ..., Sn, then variable x should be unavailable to those statements. Similarly, when decomposing a specification, the conjuncts that have variables in common (variable-dependent) should be grouped together to form the specification for a group of programming statements that can be encapsulated into a routine. If an abstract data type (ADT) is to be generated, the ADT specification is pre-processed by testing it for sufficient completeness [5, 15, 16]. After both preprocessing steps have been completed, programming statements are synthesized in order to satisfy the specification. The rules for choosing which programming language structures to synthesize are stored in a rule base. User-specified and system-defined background knowledge and domain-specific

information are stored in a fact base. During synthesis, Seed uses the fact base to disambiguate synthesis rule applications. Synthesized programming statements are verified for correctness; that is, the system uses the wp semantics to determine if the synthesized statements satisfy the postconditions. During this verification process, new logical expressions may be obtained that need to be satisfied through further synthesis. This method leads to the simultaneous construction of a program and its proof of correctness in a bottom-up manner [6, 7]. All rules are implemented in Prolog, as it can be readily applied to predicate logic specifications. In order to help the user understand the synthesized code, it is annotated with the user-supplied pre- and postconditions and with assertions that are derived during synthesis. That is, if the code is not what is expected by the user, then the generated assertions may provide information as to what decisions were made by the system during the synthesis process, thus guiding the user in determining what changes should be made to the original specification in order to generate the appropriate code.
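For illustration, the variable-dependence test that drives this grouping can be sketched in Prolog; the snippet below is a minimal sketch only, and its fact names and term representation are assumptions rather than Seed's actual encoding.

% Illustrative fact-base entries declaring which identifiers are variables.
variable(z).    variable(w).
constant(c).    constant(b).

% Two conjuncts belong in the same task when they mention a common variable.
shares_variable(C1, C2) :-
    vars_of(C1, V1), vars_of(C2, V2),
    member(V, V1), member(V, V2), !.

% Collect the declared variables occurring in a ground specification term.
vars_of(T, [T]) :- variable(T), !.
vars_of(T, []) :- atomic(T), !.
vars_of(T, Vs) :- T =.. [_|Args], maplist(vars_of, Args, Vss), append(Vss, Vs).

Conjuncts related (transitively) by shares_variable/2 would then be grouped into one routine specification.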

Figure 1: Overview of Seed

Once Seed has successfully synthesized a procedure or a function, the specification and resultant code are stored in a software component library [17] that is indexed by specifications and accessible via a graphical browser [18]. If no application of rules yields a satisfactory program, Seed informs the user, who may then modify the specification and restart the synthesis. The Seed rule base is made up of three components: rules for pre-processing a specification for synthesis, rules for synthesizing programming statements and data structures from a specification, and rules (defined by the wp semantics of the programming statements) used to verify the correctness of the synthesized programming statements [6, 7]. This paper describes the theory supporting the second and third sets of rules, that is, those rules used for the synthesis of programming structures and the verification of the synthesis. See [5, 10, 18, 19] for details concerning the other components of Seed. Examples to illustrate each type of synthesis rule are presented in the respective discussions. Due to space constraints, a large example is not included in this paper, but the interested reader is referred to [5] for the description of the specification and synthesis of a parser.

3 Primitive Programming Structures

This section describes the rules that synthesize primitive programming structures from predicate logic specifications containing conjunctive, equality, inequality, disjunctive, and quantified expressions. For each type of logical expression, Seed contains rules to synthesize one or more statements that achieve the behavior specified by the expression. For example, if a specification contains an equality operator (=), then an assignment statement may be synthesized. A disjunctive expression may be satisfied by an alternative statement. An expression containing an inequality expression may also be synthesized by an alternative statement to capture the case analysis semantics. If the specification contains a quantified expression (∀ or ∃), then because the expression concerns a range of values, an iterative structure may be synthesized. Conjunctive expressions are also considered as specifications for a sequence of statements. These statement types represent the primitive programming structures used in most imperative programs.

3.1 Equality Relations

A predicate logic expression of the form x = expr, occurring in a postcondition R, can be satisfied with an assignment statement, x:= expr, where expr is evaluated and its value is stored at location x. We assume that equality expressions are written with the orientation that x is greater than expr according to the following partial ordering on symbols: array and function names, variables ≻ primitive operators ≻ bound identifiers, constants. The symbol ≻ represents a greater-than relationship with respect to a partial ordering between symbols. Users may declare identifiers to be variables, indicating that their values can be changed in a program. Primitive operators include +, −, ·, and =. Quantified expressions may bind a variable to the quantifier. For example, in the expression (∀i : 0 ≤ i < n : P(i)), the variable i is bound to the ∀ quantifier and cannot be changed in a statement. Constants include integers and identifiers that are declared by the users to remain unchanged in programs.
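For illustration, this ordering can be encoded as a small Prolog table; the declaration facts and numeric ranks below are illustrative assumptions, not part of Seed.

% Hypothetical declarations of the symbols appearing in a specification.
variable(z).          variable(w).
array_name(b).        function_name(fact).
primitive_op('+').    primitive_op('-').    primitive_op('*').    primitive_op('=').
bound_identifier(i).
constant(c).          constant(n).

rank(S, 3) :- ( array_name(S) ; function_name(S) ; variable(S) ), !.
rank(S, 2) :- primitive_op(S), !.
rank(S, 1) :- ( bound_identifier(S) ; constant(S) ; number(S) ), !.

% greater_sym(X, Y): X may appear on the left of an oriented equality X = Y.
greater_sym(X, Y) :- rank(X, RX), rank(Y, RY), RX > RY.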

After synthesizing an assignment statement, Seed verifies the correctness of the synthesized statement by determining the weakest precondition (wp) of the assignment with respect to the postcondition R. The wp of an assignment statement is expressed as wp(x:= expr, R) = R^x_expr, which represents textual substitution, where postcondition R has every occurrence of x replaced by the expression expr, and it is assumed for discussion purposes that expr is defined. If x corresponds to a vector y of variables and expr represents a vector E of expressions, then the wp of the assignment is of the form R^y_E, where each distinct yi is replaced by Ei, respectively, in expression R.
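The textual substitution underlying this rule is straightforward to prototype. The following minimal Prolog sketch is illustrative only (Seed's actual wp rules and term representation differ); specifications are ground terms in which program variables are atoms.

% wp of x := E with respect to R is R with every occurrence of x replaced by E.
subst(X, X, E, E) :- !.                 % the variable being assigned is replaced
subst(T, _, _, T) :- atomic(T), !.      % other atoms and numbers are unchanged
subst(T, X, E, R) :-                    % recurse through compound expressions
    T =.. [F|Args],
    subst_list(Args, X, E, NewArgs),
    R =.. [F|NewArgs].

subst_list([], _, _, []).
subst_list([A|As], X, E, [B|Bs]) :- subst(A, X, E, B), subst_list(As, X, E, Bs).

wp(assign(X, E), Post, Wp) :- subst(Post, X, E, Wp).

% ?- wp(assign(z, 5), and(eq(z, 5), eq(w, times(c, b))), W).
% W = and(eq(5, 5), eq(w, times(c, b))).

The first conjunct simplifies to true, leaving (w = c·b), which matches the example that follows.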

Figure 2 summarizes the rules for synthesizing an assignment statement.

Figure 2: Rules for synthesizing an assignment statement

As an example, consider the specification (z = 5) ∧ (w = c·b), where z and w are variables. Seed synthesizes the statement z:= 5 and finds its wp to be the expression

((z = 5) ∧ (w = c·b))^z_5,

which can be simplified to

(w = c·b).

Because w is also a variable, Seed synthesizes the statement w:= c·b and finds that its wp is

(w = c·b)^w_(c·b) ≡ true.

For specifications involving arrays, we consider arrays as functions. In this approach [7], a[i] represents the application of function a to a parameter i. Following this line of reasoning, the assignment a[i]:= e redefines the function a: for the parameter i, the value of the function is e; for all other parameter values, the function is unchanged. The new function is represented as

(a; i:e)[j] = e     if i = j
(a; i:e)[j] = a[j]  if i ≠ j,

where the new function is applied to parameter j. Because an assignment to an array element may be treated the same as an assignment to a simple variable, the corresponding wp is expressed as

wp(a[i]:= e, R) = R^a_(a; i:e).

As an example, consider the specification (b[j] = j) ∧ (b[i] = b[j]). Statement b[j]:= j is synthesized first, and the wp of the assignment is determined to be

wp(b[j]:= j, (b[j] = j) ∧ (b[i] = b[j])) ≡ ((b[j] = j) ∧ (b[i] = b[j]))^b_(b; j:j).

Applying textual substitution yields the expression

((b; j:j)[j] = j) ∧ ((b; j:j)[i] = (b; j:j)[j]).

Next, using the previously discussed rule for evaluating arrays, function b is applied to the arguments j and i, respectively, and the following expression is obtained:

(((j = j) ∧ (j = j)) ∨ ((j ≠ j) ∧ (b[j] = j))) ∧ (((j = i) ∧ (j = j)) ∨ ((j ≠ i) ∧ (b[i] = j))).

Using the assumption that the operator ∧ has greater precedence than the operator ∨, the expression is simplified to

true ∧ ((j = i) ∨ ((j ≠ i) ∧ (b[i] = b[j]))).

After removing tautologies, the expression becomes

(j = i) ∨ ((j ≠ i) ∧ (b[i] = b[j])).

The annotated code resulting from the synthesis is:

{ (j = i) | (j ≠ i & b[i] = b[j]) }
b[j]:= j;
{ (b[j] = j) & (b[i] = b[j]) }

where logical assertions are enclosed in braces, and & and | represent logical and and or, respectively. Note that there may be more than one statement that will satisfy the original postcondition.
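Continuing the sketch from above, array-element assignment can be prototyped by substituting the assumed term upd(A, I, E), standing for (a; i:e), for the array name; the subsequent case analysis on the index is not shown.

% a[i] := e is treated as assigning the updated function (a; i:e) to a itself;
% arr(A, I) stands for a[i] and upd(A, I, E) for (a; i:e).  Reuses subst/4 above.
wp(assign_elem(A, I, E), Post, Wp) :- subst(Post, A, upd(A, I, E), Wp).

% ?- wp(assign_elem(b, j, j), and(eq(arr(b, j), j), eq(arr(b, i), arr(b, j))), W).
% W = and(eq(arr(upd(b, j, j), j), j), eq(arr(upd(b, j, j), i), arr(upd(b, j, j), j))).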

3.2 Disjunctive Expressions

A predicate logic expression containing one or more occurrences of the disjunctive operator ∨ may be satisfied with an alternative statement (if-fi). A specification R, where R is of the form R1 ∨ ... ∨ Rn, may be implemented by the statement

if B1 → S1;
[] B2 → S2;
   ...
[] Bn → Sn;
fi.

Each disjunct Ri may be implemented by a guarded command Bi → Si, where the statement Si is executed only if the guard Bi is true. The symbol [] separates the guarded commands. The wp of the alternative statement requires that at least one guard be true and that every guard Bi logically imply the wp of its corresponding statement list Si with respect to the postcondition R. Symbolically, the wp is expressed as

(∃i :: Bi) ∧ (∀i :: Bi → wp(Si, R)),

where `::' indicates that the range of the quantified variable i is not pertinent to the current discussion. If none of the guards is true, then the wp is not satisfied; therefore, the execution of the alternative statement is aborted. That is, the alternative statement is successfully executed if at least one guard Bi is true. Analogously, a disjunctive expression has the semantics that, if one disjunct has the value true, then the entire expression has the value true. Therefore, our technique attempts to satisfy each disjunct with one guarded command of an alternative statement. The method we developed for synthesizing the alternative statement for a disjunctive expression is summarized in Figure 3.

Figure 3: Process for synthesizing statements for a disjunctive expression

In order to synthesize a guarded command from a disjunct Ri, Seed must find an expression that facilitates the synthesis of a statement Si with guard Bi and satisfies the expression (∀i :: Bi → wp(Si, R)). Because Ri → R, we use the Law of Monotonicity of wp [6, 7], which states that,


for postconditions Q, R and statement S, if Q → R then wp(S, Q) → wp(S, R), to derive the relation

wp(Si, Ri) → wp(Si, R).

Thus, if Bi = wp(Si, Ri) for all i, then the expression (∀i :: Bi → wp(Si, R)) is true, which is one part of the criteria for the wp of an alternative statement. If an alternative statement can be successfully synthesized from the postcondition, then each Bi is obtained from the respective disjunct Ri, and the criterion of having at least one guard be true when the alternative statement is executed, that is, (∃i :: Bi), is satisfied. As an example, consider the specification for assigning the maximum of three constants c, d, e to the variable z:

(z = c) ∧ (z ≥ d) ∧ (z ≥ e) ∧ (z ≥ c)  ∨     (1)
(z = d) ∧ (z ≥ d) ∧ (z ≥ e) ∧ (z ≥ c)  ∨     (2)
(z = e) ∧ (z ≥ d) ∧ (z ≥ e) ∧ (z ≥ c)        (3)

The resulting alternative statement, annotated with pre- and postconditions, is of the following form.


{ ((c >= d & c >= e) | (d >= c & d >= e) | (e >= c & e >= d)) &
  (c >= d & c >= e) -> wp(z:=c,R) &
  (d >= c & d >= e) -> wp(z:=d,R) &
  (e >= c & e >= d) -> wp(z:=e,R) }
if c >= d & c >= e -> z:= c;
[] d >= c & d >= e -> z:= d;
[] e >= c & e >= d -> z:= e;
fi
{ R: (z = c) & (z >= d) & (z >= e) & (z >= c) |
     (z = d) & (z >= d) & (z >= e) & (z >= c) |
     (z = e) & (z >= d) & (z >= e) & (z >= c) }
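For illustration, the wp of an alternative statement can be assembled mechanically from its guarded commands, as in the following Prolog sketch (illustrative only; it reuses the wp/3 clauses sketched in Section 3.1 for the individual statements).

% The wp of  if B1 -> S1 [] ... [] Bn -> Sn fi  with respect to R is
% (B1 | ... | Bn)  &  ((B1 -> wp(S1,R)) & ... & (Bn -> wp(Sn,R))).
wp(if(Cmds), Post, and(SomeGuard, AllImply)) :-
    findall(B, member(guarded(B, _), Cmds), Guards),
    disjoin(Guards, SomeGuard),
    findall(implies(B, W),
            ( member(guarded(B, S), Cmds), wp(S, Post, W) ),
            Imps),
    conjoin(Imps, AllImply).

disjoin([X], X) :- !.
disjoin([X|Xs], or(X, Rest)) :- disjoin(Xs, Rest).
conjoin([X], X) :- !.
conjoin([X|Xs], and(X, Rest)) :- conjoin(Xs, Rest).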

A case analysis rule can be applied to a relational expression that includes the equality relation (for example, ≤, ≥) and at least one nonconstant operand (for example, z ≥ 8, where z is a variable) to build a disjunctive expression representing the possible cases in the relational expression (for example, ((z0 ≥ 8) ∧ (z = z0)) ∨ ((z0 < 8) ∧ (z = 8)), where z0 represents the initial value of z). The disjunctive expression can then be satisfied by an alternative statement. Because the expression (z = z0) can be satisfied by the skip statement, the corresponding wp of the statement is the expression (z0 ≥ 8). The assignment z:= 8 is used to satisfy the second disjunct, (z0 < 8) ∧ (z = 8); the wp of the assignment is (z0 < 8). The requirement that there must be at least one nonconstant identifier in the inequality expression is necessary because the value of an expression containing only constant terms cannot be altered (for example, 6 ≤ 8). The following alternative statement is synthesized:

{ (z0 >= 8) | (z0 < 8) & z = 8 }
if z >= 8 -> skip;
[] z < 8  -> z:= 8;
fi; { z >= 8 }
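A direct Prolog rendering of the case analysis rule is sketched below; the geq/lt/eq terms and the treatment of the initial value V0 are assumptions made for illustration.

% Case analysis for a conjunct V >= E, where V is a variable and E does not
% mention V: either V already satisfies the relation and is left unchanged
% (skip, guarded by V0 >= E), or V is assigned E (guarded by V0 < E).
case_split(geq(V, E), V0, or(and(geq(V0, E), eq(V, V0)),
                             and(lt(V0, E),  eq(V, E)))).

% ?- case_split(geq(z, 8), z0, D).
% D = or(and(geq(z0, 8), eq(z, z0)), and(lt(z0, 8), eq(z, 8))).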

Similar approaches for developing alternative statements have been investigated by other program synthesis systems, including those of Luckham and Buchanan [20], Warren [21], Manna and Waldinger [22], and Dershowitz [23]; however, these approaches develop the alternative statement strictly from a case analysis approach. Our rule is not limited to the synthesis of alternative statements with only two possible cases. As discussed earlier, the number of cases (alternatives) is dependent on the number of disjuncts in a disjunctive expression.

3.3 Quantified Expressions

A quantified expression specifies one or more conditions existing over a range of values. Analogously, an iterative statement performs a set of operations over a range of values. The iterative statement in our target language is of the form

do B1 → S1;
[] B2 → S2;
   ...
[] Bn → Sn;
od

where Bi → Si again represents a guarded command. The iterative statement, like the alternative statement, uses guarded commands; at least one guard Bi must be true in order to enter the loop. The synthesis process begins with the application of rules that we developed to find an invariant expression from the original specification. An invariant P describes the conditions of an iterative statement that exist before entry and upon exit of the iterative statement. Currently, two approaches are used to find an invariant: delete a conjunct from the postcondition or replace a constant by a variable [7]. Both strategies for finding invariants also suggest corresponding expressions to be used as guards. For instance, if the invariant is formed by deleting a conjunct Ci from a postcondition R of the form

C1 ∧ ... ∧ Ci ∧ ... ∧ Cn,

then the corresponding guard is the negation of the deleted conjunct (¬Ci) and the invariant P becomes

C1 ∧ ... ∧ Ci−1 ∧ Ci+1 ∧ ... ∧ Cn.

The loop stops when Ci is true, thus giving the expression

C1 ∧ ... ∧ Ci−1 ∧ Ci+1 ∧ ... ∧ Cn ∧ ¬(¬Ci),

which is the original postcondition R.
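The delete-a-conjunct strategy amounts to selecting one conjunct to negate; a minimal Prolog sketch, under an assumed list-of-conjuncts representation with illustrative conjunct names, is:

% Delete-a-conjunct strategy: removing Ci from the postcondition leaves the
% invariant, and not(Ci) becomes the loop guard.
:- use_module(library(lists)).    % select/3

invariant_by_deletion(Conjuncts, Invariant, not(Ci)) :-
    select(Ci, Conjuncts, Invariant).

% ?- invariant_by_deletion([sorted(c), perm(c, c0)], Inv, Guard).
% Inv = [perm(c, c0)], Guard = not(sorted(c)) ;
% Inv = [sorted(c)],   Guard = not(perm(c, c0)).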


If the invariant is obtained by replacing a constant with a variable in the postcondition, then the corresponding guard is the inequality between the variable and the constant that it replaced. Thus, for a postcondition R of the form

(∀i : 0 ≤ i < n : C(i)) ∧ Rj,

where Rj represents a conjunct, the invariant is formed by replacing the constant n with some variable k and defining the range for the new variable. Then the invariant P is of the form

(∀i : 0 ≤ i < k : C(i)) ∧ Rj ∧ (0 ≤ k ≤ n),

and the guard B is the expression k ≠ n.

The loop stops when the guard is not true, but notice that when this occurs, the original postcondition R is obtained. After developing the invariant and the guards for the loop, it must be shown that the invariant P is initially true; this verification may require the synthesis of one or more statements. Next, an integer bound function t is constructed, where the value of t must decrease with each iteration of the loop structure, thus imposing a bound on the number of loop iterations. In order to decrease the bound function, a statement S that changes a variable involved in the definition of the bound function must be developed. The wp of the statement S may not be the invariant; for those cases, more statements need to be synthesized in order to ensure that the invariant is satisfied before and after execution of the loop. This method of synthesis lends itself to a bottom-up approach to program development. Termination occurs when none of the guards is true and the relationship

P ∧ ¬G → R

is satisfied, where G is true when at least one guard Bi is true, that is, G = (∃i :: Bi). As an example of the synthesis of an iterative statement, consider the following specification of a sort program:

R : (∀i : 0 ≤ i < n : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ perm(c, c0),

where perm(c, c0) indicates that the array c is a permutation of the original array c0. Because the specification involves a generally quantified expression, the rules for synthesizing an iterative

statement are applied. First, Seed develops the invariant from R by replacing the constant expression n with variable k as the upper bound for the quantified variable i in order to obtain the expression

(∀i : 0 ≤ i < k : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ perm(c, c0).

Because the new variable k was introduced, a range must be given for the new variable; the range is (0 ≤ k ≤ n). Next, the guard for the iterative statement is developed; the guard k ≠ n represents the inequality of the newly introduced variable (k) and its upper bound (n). The final expression for the invariant P is

P : (∀i : 0 ≤ i < k : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ (0 ≤ k ≤ n) ∧ perm(c, c0).

Seed must determine that the invariant is satisfied before entering the loop. Since there is a generally quantified statement in P, variable k is assigned the lower bound of the range, thus giving an expression quantified over an empty range, which can be simplified to true. The statement k:= 0 is synthesized, which has the wp

(∀i : 0 ≤ i < 0 : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ perm(c, c0) ≡ true.

Because the upper bound n in the quantified expression was replaced with the variable k, the bound function becomes the difference between k and n, expressed as n − k > 0. When building the body of the loop, Seed must synthesize statements that ensure progress towards termination. Because n is a constant value and k is not a constant, and using domain information of number theory, Seed synthesizes the assignment statement k:= k+1 to decrease the bound function. The wp of the assignment statement is

wp(k:= k+1, P) = (∀i : 0 ≤ i < k+1 : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ perm(c, c0).

After simplifying the wp, Seed uses (natural number) axioms from predicate logic [7] to rewrite the wp such that the invariant expression P is in conjunction with some expression R′:

wp(k:= k+1, P) = (∀i : 0 ≤ i < k : (∀j : 0 ≤ j < (n − i) − 1 : c[j] ≤ c[j+1])) ∧ perm(c, c0) ∧
                 (∀j : 0 ≤ j < (n − k) − 1 : c[j] ≤ c[j+1]) ∧ (0 ≤ k ≤ n)
               = P ∧ (∀j : 0 ≤ j < (n − k) − 1 : c[j] ≤ c[j+1])
               = P ∧ R′.

Because the invariant P is true, Seed must synthesize statements to satisfy the expression R′. Since R′ contains a generally quantified expression, the process for developing a loop is used again; the new loop is nested within the first loop, where the constant expression (n − k) − 1 is replaced by a variable kk to form the new invariant. Figure 4 contains the final results for the sort program. (The quantifiers ∀ and ∃ are denoted as A and E, respectively, in Seed's synthesis results.)
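The replace-a-constant strategy can be sketched similarly in Prolog; the encoding below is an illustrative simplification, and Seed's actual all/4 argument order and invariant representation differ.

% Replace the upper bound Ub of a universally quantified conjunct by a fresh
% variable K; the range Lb <= K <= Ub joins the invariant and the guard is neq(K, Ub).
develop_invariant(and(all(V, Lb, Ub, Body), Rest), K, Invariant, neq(K, Ub)) :-
    gensym(k, K),
    Invariant = and(all(V, Lb, K, Body), and(range(Lb, K, Ub), Rest)).

% For the sort specification this yields an invariant quantified up to the new
% variable, together with its range and perm(c, c0), and the guard k ≠ n.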

Figure 4: Sort program generated by Seed

4 Procedures and Functions

Thus far, programs built from simple statements have been presented. For a program synthesis tool to handle more sophisticated problems, it must also provide support for procedural abstraction, which allows the programmer to extend the virtual machine defined by a programming language through the addition of new operations [24]. Procedural abstractions are implemented using procedures or functions. A procedure is an encapsulation of statements or a subprogram that performs a set of operations on one or more variables and returns zero or more values. A function is a subprogram that performs one or more operations on a set of values and returns a single value. In our target programming language, the mathematical definition of a function is applied in

that the arguments of a function cannot be changed within the subprogram. The term routine is used when the discussion is applicable to both procedures and functions. Once a routine has been synthesized, it may be invoked to satisfy future specifications. Sections 4.1 and 4.2 describe the synthesis of non-recursive and recursive routines, respectively.

4.1 Synthesis of Non-Recursive Routines

A specification for a routine consists of three parts: a precondition, a postcondition, and an interface declaration. The precondition describes the conditions of the variables prior to the execution of the routine, whose behavior is described by the postcondition. The interface declaration of a routine contains the routine name, the names and types of the formal parameters of the routine and, in the case of a function, the type of the value returned. Formal parameters may be of type value, value-result, or result.¹ A procedure is invoked by its name instantiated with a list of actual parameters that correspond to the formal parameters. For the scope of this paper, we require that all actual parameters be expressions that can be evaluated to a single value or location; [16] discusses more complex parameters, including routine names. A procedure's interface is of the form

Procedure proc_name(val: value-list; val-res: value-result-list; res: result-list),

where proc_name represents the name of the procedure, and value-list, value-result-list, and result-list represent the lists of value, value-result, and result parameters, respectively. Similarly, a function interface declaration is of the form

Function func_name(val: value-list);

where func_name is the name of the function and value-list represents the list of value parameters. Function results are returned by assigning the value of the function to the name of the function. The synthesis of non-recursive routines involves simply generating the body of the routine using rules discussed in Section 3, enclosing the results in begin and end, and attaching the interface declaration. As an example of the synthesis of a function, consider the specification for a max

¹ The usual meanings for these terms are assumed; see [25] for more information.


function, which returns the maximum value of two arguments:

Q:  true
R:  (max ≥ x ∧ max ≥ y ∧ max = x) ∨
    (max ≥ x ∧ max ≥ y ∧ max = y)

Function max (val: x, y)

where Q and R are the pre- and the postconditions, respectively, and x and y are value parameters. For many procedures and functions, the precondition is true because the actions of the routine are independent of the calling routine. Because the specification for the max function contains a disjunctive expression, an alternative statement is synthesized. The greater value of the two variables is assigned to the function name max. The body of the function is enclosed in begin and end. The interface declaration is attached to the beginning of the body of the routine. The pre- and postconditions, enclosed in braces, annotate the code. The corresponding code is given in Figure 5.

Figure 5: max function

If a specification R is of the form R1 ∧ ... ∧ Rn, and after decomposition each Ri can be synthesized separately, then the results are composed according to the wp definition for sequential statements S1; S2. That is, wp(S1; S2, R) = wp(S1, wp(S2, R)).
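In the Prolog sketch begun in Section 3.1, this composition rule is a single clause (the seq/2 representation is again an assumption made for illustration).

% wp of sequential composition: wp(S1; S2, R) = wp(S1, wp(S2, R)).
wp(seq(S1, S2), Post, Wp) :-
    wp(S2, Post, Mid),
    wp(S1, Mid, Wp).

% ?- wp(seq(assign(z, 5), assign(w, times(c, b))),
%       and(eq(z, 5), eq(w, times(c, b))), W).
% W = and(eq(5, 5), eq(times(c, b), times(c, b))).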

4.2 Recursive Functions

Some function definitions are most naturally expressed recursively. For instance, a possible specification for the factorial function is

(fact(x) = 1 ∧ x = 0) ∨ (fact(x) = x · fact(x − 1) ∧ x > 0).

We know that there is only one argument for fact; the value of the factorial function depends on that argument, which is a parameter for the function. In order to specify the definition in terms

of the parameter, we use the following definition for recursive function specifications, which gives a canonical form for the specifications. The constraints do not limit the scope of recursive functions that Seed can synthesize.

Definition 1  The specification for the postcondition of a recursive function has the following format:

1. It must contain a disjunctive expression of two or more disjuncts.

2. Each disjunct must have two or more conjuncts.

3. One of the conjuncts in each disjunct must be an equality expression of the form

   f(x1, ..., xn) = expr,     (4)

   where f is the name of the function being specified, and expr defines the value of f for the list of formal parameters x1, ..., xn, where there exists a well-founded ordering (≻wfo) between f(x1, ..., xn) and expr [26]. A partial ordering ≻ over a set S is a well-founded ordering if S does not contain an infinite decreasing sequence of the form a1 ≻ a2 ≻ a3 ... [26].

4. The remaining conjuncts of each disjunct must be predicates that pertain to the parameters x1, ..., xn.

5. Given an equation of the form in Expression (4), there must exist at least one disjunct in which expr is a primitive expression, that is, an expression containing no undefined function values. An undefined function value is a reference to a function value that has not been defined. The group of conjuncts within this disjunct is called the termination condition.
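Item 5 can be checked mechanically: a disjunct is a termination case exactly when its defining expression makes no reference to the function being specified. The following minimal Prolog sketch assumes a disjunct(Guards, eq(call(F, Args), Expr)) representation of the canonical form.

% True when term T contains a call to function F.
mentions(F, call(F, _)) :- !.
mentions(F, T) :-
    compound(T),
    T =.. [_|Args],
    member(A, Args),
    mentions(F, A), !.

% A disjunct is a termination case when its defining expression is primitive,
% i.e., it contains no reference to the function F being specified.
termination_case(disjunct(_Guards, eq(call(F, _), Expr)), F) :-
    \+ mentions(F, Expr).

% ?- termination_case(disjunct([eq(x, 0)], eq(call(fact, [x]), 1)), fact).
% true.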

The most important consideration when developing the rules for the synthesis of recursive functions is to guarantee termination of the program, which involves two problems. First, the termination conditions must be determined, that is, those conditions that must hold when the action of the routine does not involve a recursive call. Termination conditions establish a bound on the number of recursive invocations. Second, it must be established that the termination conditions are eventually reached. We use equational logic reasoning [27, 28] in proving termination properties for recursive definitions and their corresponding programs. Specifically, for recursive functions, proving progress towards termination requires examination of equations that contain references to the function being defined on both sides of an equation. In order to ensure that recursive references do not lead to infinite invocations, there must be a well-founded ordering (wfo) between the LHS and the RHS of these equations. Well-founded orderings have also been used by other synthesis approaches [22, 29] for guaranteeing termination. In order to facilitate the determination of a wfo, we use the partial ordering on symbols given in Section 3.1 and user-supplied precedences for newly introduced symbols, such as new functions,

that obey the wfo. In an attempt to define assignments to function names, we notice that when a function is recursively defined, using only the partial ordering on symbols is not sufficient to determine a well-founded ordering; thus, we use the recursive path ordering (rpo) [27] to determine the existence of a wfo. A recursive path ordering is defined as follows.

Definition 2  Let ≻ be a partial order on the set of function symbols F. The recursive path ordering ≻rpo on the set of terms over F is defined recursively as

s = f(s1, ..., sm) ≻rpo g(t1, ..., tn) = t

if

1. si ⪰rpo t for some si, where ⪰rpo is the recursive path ordering allowing for equality, or

2. f = g and {s1, ..., sm} ≫ {t1, ..., tn}, or

3. f ≻ g and s ≻rpo tj for all tj,

where ≫ is the multiset extension of ≻rpo [27].

Theorem 1  If a specification of a function is well-defined with respect to Definition 1, then the specification for a function f(x1, ..., xn) of the form

f(x1, ..., xn) = expr1 ∧ p11(x1) ∧ ... ∧ p1n(xn)  ∨
f(x1, ..., xn) = expr2 ∧ p21(x1) ∧ ... ∧ p2n(xn)  ∨
...
f(x1, ..., xn) = exprm ∧ pm1(x1) ∧ ... ∧ pmn(xn),

where the pij denote predicates involving the parameters to the function f, can be correctly translated into the following alternative statement:

f(x):
if p11(x1) ∧ ... ∧ p1n(xn) → f := expr1;
[] p21(x1) ∧ ... ∧ p2n(xn) → f := expr2;
   ...
[] pm1(x1) ∧ ... ∧ pmn(xn) → f := exprm
fi

Informal Sketch of Proof: The specification for a recursive function has the same syntactic structure as the specification that is satisfied by an alternative statement, that is, a set of guarded commands where each guarded command implements one disjunct of the specification. Recall the earlier discussion concerning array assignments in Section 3. We consider the name of the function to be a variable that can be assigned the value of an expression determined within the body of the function with respect to a specific set of parameters. For those disjuncts that do not contain recursive equations, the termination conditions are the guards for the non-recursive function assignments. For each disjunct containing a recursive equation, the corresponding guard is the group of predicates that describe the variables in the

recursive equations and the recursive function assignment is the command. The verification of the alternative statement for the disjunctive expression follows the wp derivation described in Section 3.2. For instance, the specification for the factorial function is

Q:  true
R:  ((x = 0) ∧ fact(x) = 1)  ∨     (5)
    ((x > 0) ∧ fact(x) = x · fact(x − 1))     (6)

Function fact (val: x)

where Q and R are the pre- and the postconditions, respectively, and the conjuncts fact(x) = 1 and fact(x) = x · fact(x − 1) are the equations involving the function being defined. In the factorial example, clearly items (1) and (2) of Definition 1 are satisfied. Disjunct (5) contains an equation involving the fact function only on the LHS. The expression x = 0 is therefore the termination condition; an invocation of the function fact with argument x equal to 0 will not involve recursion. Disjunct (6) contains an equation that has references to the factorial function on both the LHS and the RHS. In order to guarantee progress towards termination, it must be shown that fact(x) ≻wfo x · fact(x − 1), where ≻wfo represents a well-founded ordering. Thus, we must show that fact(x) is greater than the expression x · fact(x − 1). The relationships fact ≻ · and fact(x) ≻ x follow from the partial ordering on symbols given above, and using the second item in Definition 2 and the axioms of natural numbers, which indicate that x is greater than x − 1, it can be established that fact(x) ≻rpo fact(x − 1). Therefore, we can conclude that fact(x) ≻wfo x · fact(x − 1). Since the specification for the factorial function is well-defined with respect to Definition 1, we can synthesize the corresponding alternative statement given in Figure 6.

Figure 6: Synthesis of factorial function
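The translation asserted by Theorem 1 is a simple structural mapping; the Prolog sketch below uses the same assumed disjunct/2 representation as above, and Seed's actual rules additionally perform the well-founded-ordering check before emitting the statement.

% Map each disjunct(Guards, eq(call(F, _), Expr)) of a canonical recursive
% specification for F to the guarded command  Guards -> F := Expr.
recursive_to_if([], _, []).
recursive_to_if([disjunct(Guards, eq(call(F, _Args), Expr)) | Ds], F,
                [guarded(Guards, assign(F, Expr)) | Gs]) :-
    recursive_to_if(Ds, F, Gs).

% ?- recursive_to_if(
%        [disjunct([eq(x, 0)], eq(call(fact, [x]), 1)),
%         disjunct([gt(x, 0)], eq(call(fact, [x]), times(x, call(fact, [minus(x, 1)]))))],
%        fact, If).
% If = [guarded([eq(x, 0)], assign(fact, 1)),
%       guarded([gt(x, 0)], assign(fact, times(x, call(fact, [minus(x, 1)]))))].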


5 Implementation of SEED

This section describes the implementation of Seed. First, how Seed is used to develop programs is described. Next, an overview of the implementation of the components of Seed and the motivation for implementing Seed in Prolog is given.

5.1 How to Use Seed

In an effort to facilitate the specification process, the Hoare-Triples editor [10] was developed, which has a graphical user interface and provides the user with templates for constructing predicate logic expressions. Figure 7 shows an example session with Hoare-Triples. A set of editing surfaces (display areas) with title bars, corresponding to each logical component of the specification, respond to the workstation's mouse; the Precondition subwindow in Figure 7 is an example of an editing surface. Nonterminal symbols, displayed in an italics font, can be replaced with a template by applying a production rule from the grammar for predicate logic; terminal symbols (such as variable names and operators), displayed in a Roman font, cannot be expanded. In Figure 7, the nonterminal Range is highlighted, where the popup menu contains the possible templates that can replace the nonterminal. The user constructs a specification according to the procedure given in Figure 8 (the response of the tool is described in italics). If in Step 1 a nonterminal N is selected, and the grammar contains a set of productions {N → B1, N → B2, ..., N → Bn}, then the user will be allowed to expand the symbol according to the Bi's. After the expansion decision is made in Step 2, a menu is displayed containing items B1, ..., Bn as its entries. In Step 4, when there are no nonterminals remaining, the user may save the specification and interact with another component of Seed.

Figure 7: The Hoare-Triples interface

Figure 9 contains a screen dump of the graphical user interface for the Seed system. Synthesized programs are annotated with user-supplied pre- and postconditions and system-synthesized intermediate assertions. The intermediate assertions not only serve to document the synthesized program, but also aid the user in understanding the development process.

Figure 8: Procedure for constructing specifications

Figure 9: Sample session with Seed

If the program is not what the user expected, the annotations aid the user in determining what part of the specification should be modified for the next reinvocation of Seed for a different program. Program code cannot be changed directly by the user; this restriction is imposed in order to ensure the soundness of any program produced using Seed. The user has the option of creating specifications or retrieving an existing specification from a file. Using the Operations menu, the user can also add domain-specific information in order to facilitate the synthesis process. Once the user has entered all necessary information, the Synthesize Code button will begin the synthesis process. At any time, the user may interrupt the process and work directly with the Prolog interface to Seed using the Interactive Session window; this feature may be used for direct fact entry or read-only access to the synthesis or verification rules. Note that any facts entered after the synthesis process starts will not be applicable until the synthesis is restarted. The user may also invoke the Browser for retrieving already developed software [18]. It is noted that Seed uses a depth-first approach in applying synthesis rules. Since there may be more than one programming structure that may satisfy the specification, Seed continues to attempt the synthesis process until one suitable programming structure is found or all of the applicable synthesis rules have been exhausted. For example, if a loop invariant cannot be developed using one specific strategy, Seed backs up to attempt another loop invariant construction strategy. If all loop invariant formation rules fail, then Seed determines that a loop cannot be synthesized from the given information and attempts to synthesize a different

programming structure. (Currently, we assume that all loops must have a loop invariant.) An error message is returned if a given specification is syntactically invalid with respect to predicate logic or the specific canonical format for procedural abstraction specifications. From this point, the user may enter a new specification, modify the old specification and restart the synthesis, or exit from Seed. Currently, the system is not interactive during program synthesis. That is, once a specification has been processed, the system synthesizes a program or returns an error message. In future investigations, we will extend Seed to allow facts to be entered during synthesis in order to handle incomplete specifications.

5.2 Overview of SEED's Rule Base

Sections 3 and 4 describe the theoretical basis for the synthesis of primitive and procedural abstractions, respectively. This section briefly describes Seed's architecture as it relates to the process by which program code is derived from formal specifications. The Seed rule base is divided into three components: rules for preparing a specification for synthesis, rules for synthesizing programming and data structures from a specification, and the wp rules used to verify the synthesized programming structures. All rules are encoded in Prolog; we describe the motivation for using Prolog in Section 5.3. After a program specification has been decomposed into logical tasks Ri, the Seed rule base is applied to each Ri. The program structure(s) selected to satisfy the Ri is (are) synthesized in a bottom-up fashion. A program specification is stated in terms of predicate logic that may include quantifiers and logical connectives. Facts specific to the problem are also expressed in predicate logic. Although more than one program may correctly satisfy a specification, the system returns only one such program. If the resultant program does not satisfy the user, then either the specification needs to be changed or more facts need to be added to the system to guide the synthesis down another path of rules in the next invocation of the synthesis process. The synthesis rules are divided into three groups. The first group synthesizes primitive programming statements. Procedures and functions are synthesized in the second group. The third group synthesizes abstract data types. Although the second and third groups synthesize more complex structures than the first group, they may still invoke rules from the first group. This follows from the principle that the primitive programming statements are the basic building blocks of programs. As problems and specifications become more complex, there is a need for more sophisticated operations, which still make use of the basic building blocks.

5.3 Using Prolog as the Implementation Language

As mentioned earlier, Prolog was chosen as the implementation language of Seed since it provides many features that work well with predicate logic, including its declarative and procedural nature, its unification properties, and its backtracking capabilities. All components of Seed, with the exception of the specification editor Hoare-Triples, are implemented in Prolog. The declarative and procedural nature of Prolog provides the necessary qualities for implementing a system of rules applicable to predicate logic expressions. User-defined, problem-specific facts are naturally expressed as Prolog facts. For example, an array may be declared to be only permutable (e.g., perm(c,c0) indicates that the final array c is a permutation of the original array c0), thus limiting the types of operations that should be synthesized for the corresponding specification. Each synthesis rule is expressed as a Prolog rule. The arguments in the head of the rule indicate the syntax of the logical expression that is being processed and what programming statement will be developed to satisfy the expression. The logical structure is syntactically matched to all or a portion of a postcondition that needs to be satisfied, thus exploiting the pattern-matching and unification capabilities of Prolog. The body of the rule is made up of the conditions that must be satisfied before the corresponding statement can be successfully synthesized. Many of the synthesis rules, such as those used for synthesizing an iterative structure, require an ordering on the evaluation of the conditions within the body of the rule. If the first few conditions are not satisfied, then there is no reason to evaluate the remainder of the body. Thus, the procedural aspect of Prolog facilitates this type of processing. Unification capabilities within Prolog facilitate the application of the synthesis rules. Although the development of the synthesis rules was based on the semantics of the logical constructs that they satisfy, the application of the rules is syntax-based. Often a sequence of conditions must be satisfied in order to synthesize a programming structure. With unification, if the conditions all involve the same variables, then when the rule is instantiated with a query, all of the variables with the names occurring in the head clause are automatically instantiated and the conditions can be readily checked in order. For instance, the top-level rule to synthesize an iterative statement for a generally quantified expression is of the form


syn_prog( all(Lb,Ub,Var,Spec) and R, prog(Loop,Spec) ) :-
    develop_invariant(all(Lb,Ub,Var,Spec) and R, LoopVar, Invar, Guard),
    initialize_invariant(W and R, Lb, Ub, Var, LoopVar, Invar, Init),
    make_progress(W and R, Lb, Ub, Var, LoopVar, Invar, Wp, Dec),
    restore_invariant(Wp, Invar, Body),
    Loop = [Invar, Init, Guard, Dec, Body].

where variables are capitalized. The rule syn_prog can be invoked with the specification

all(i,0,n, geq(z,array(b,i))) and exists(j,0,n, eq(z,array(b,j)))

where Lb, Ub, Var, Spec, and R are unified with i, 0, n, geq(z,array(b,i)), and exists(j,0,n, eq(z,array(b,j))), respectively. The body of the rule consists of procedure calls to other rules that will synthesize the rest of the loop. Because the application of synthesis rules is syntax-based, it may be the case that the first attempt to synthesize a statement for a specification appears to be successful until a condition is reached in the body that cannot be satisfied. The backtracking capability within Prolog allows the system to retreat to the next applicable condition. Depending on the situation, the system may retreat one level of rules to synthesize another variation of the type of programming statement currently being synthesized, or it may retrace its steps to the top level and attempt to synthesize an entirely different programming statement. Another advantage of backtracking is that variables that were instantiated in the unsuccessful path of rule applications are restored to their original conditions to facilitate another path of rule applications.

6 Other Approaches to Program Development

This section discusses and contrasts other approaches to program development to the one addressed in this paper. A transformational approach [30, 31, 32, 33] to program synthesis typically comprises a system of rules to transform programs, expressed in recursion equations of the form E ⇐ F, into more efficient programs, where E is a function expression of the form f(e1, ..., en), f is a function name, and the ei are expressions. The transformation process begins with programs that have simple structures and, through transformations, incorporates more complex and efficient structures when useful interactions are introduced between parts of the

program that were initially separate. In the transformational approach, intuition, sometimes known as the "Eureka step", is a necessary factor to guide the introduction of new functions. The steps unfolding and folding are repeated throughout the derivation of a program. Unfolding refers to the act of replacing the occurrence of the LHS of a transformation rule E ⇐ E′ within the RHS F′ of the program specification F ⇐ F′ by the corresponding instance of E′, in order to obtain F″, which finally yields the rule F ⇐ F″. In contrast, folding indicates that an instance of the RHS E′ of the transformation rule E ⇐ E′ within the RHS F′ of the program (F ⇐ F′) is replaced by the corresponding LHS instance E to obtain (F ⇐ F″); this process abstracts a function from a specification. The correctness of a final program is guaranteed since it is assumed that all transformations are equivalence preserving. In contrast to transformational approaches to program synthesis [30, 31, 32, 33], Seed uses a bottom-up derivation of programs guided by the derivation of a valid wp that yields a program satisfying an abstract description of a problem. The transformational approach to program derivation depends on the user's intuition in determining appropriate places to introduce new function definitions that encompass one or more of the current recursive equations. Thus, the effectiveness of the transformational approach is dependent upon the insight of the user. In Seed, however, the synthesis rules apply to logical structures and the corresponding program structures are individually specialized according to the problem-specific facts. Dershowitz [23] transforms a formal specification in a step-by-step manner into executable code. Each step is an application of a synthesis rule that rewrites a segment of the program in increasing detail. The user begins with input and output specifications for a program and uses synthesis rules to generate code to solve the current goal or uses rules to transform the current goal into one or more subgoals. Each partial program is equivalent to its predecessor and can be improved by applying programming transformations that preserve its correctness while improving efficiency. The final program is guaranteed to satisfy the initial specifications. Program synthesis begins with an initial goal P of the form

P: begin comment program specification
     assert input specification
     achieve output specification varying output variables
   end

After processing, the achieve element is rewritten into purpose and output expressions.


P: begin comment program specification
     assert input specification
     purpose output specification varying output variables
     assert output specifications
   end

where the assert argument is the input specification and specifies the set of values of the input variables for which the synthesized program is expected to satisfy the output specification, the achieve arguments are "compiled" from high-level statements into lower-level "code", the purpose argument represents a comment that describes the intention of the following indented code, and the arguments to the directive varying output variables indicate the set of variables that may be modified by the program. For any input values satisfying the input specification, when control reaches the end of the program, the output specification should be satisfied. The code must also be "primitive", that is, there are no achieve statements or nonprimitive operators. Dershowitz uses a combination of subgoals and invariant assertions for program derivation. Our approach differs from Dershowitz's in that Seed's program synthesis rules apply to predicates in a bottom-up fashion rather than in a top-down manner. More specifically, for a given specification R, we use its semantics to find a suitable program statement. The selection of the statement S is verified by finding its wp predicate transformer to be valid with respect to the postcondition. In finding the wp, new assertions may be generated that need to be satisfied by more statements. If the wp is true or a predicate that is implied by a user-supplied precondition, the synthesized code satisfies the specification. The Kestrel Interactive Development System (KIDS) [34] provides a program development system based on the transformational approach [30, 31]. A unique feature of KIDS is that it provides algorithm design techniques, such as divide-and-conquer and different search algorithms, and deductive inferencing capabilities through the application of transformations. Specifications are expressed as functional constraints on the input and output behavior, respectively. KIDS has domain theories that can be used to derive distributive and monotonicity laws. It provides a hierarchical library of theories that must be imported into the current development session. The user can develop new functions or create new definitions by abstracting from the expressions existing in the library. Bhansali and Harandi [35, 36] developed an automatic programming system, APU, that synthesizes a script program for the Unix operating system from a Lisp-like specification. The system decomposes a problem in a top-down manner using a hierarchical planner and a layered knowledge base of rules. The derivation of programs can be facilitated by using derivational

analogy. That is, the derivational analogy paradigm of APU enables it to improve upon the solution of analogous problems as well as the time taken to generate the solution. In the APU system, the synthesis progresses in a top-down fashion. A given specification is transformed into a statement, and correctness is guaranteed because it is assumed that the transformational rules are equivalence preserving. Manna and Waldinger [37] apply a deductive approach to the synthesis of functional programs. In contrast, we apply heuristics, based on the semantics of the logical expressions, in identifying potential programming statements, and we verify the suggested programming statement with respect to the formal specification according to the programming semantics defined by Dijkstra's wp definitions. The Munich CIP system [38, 39] uses a transformational development of program schemes. A collection of specification languages [38] is used to handle different stages of the program design. An algebraic language defines the program schemes, an applicative language is the meta-language in which the algorithmic transformations are encoded, and conditions for determining the applicability of the transformations are encoded as predicates. In contrast to these approaches, Seed's rules are syntax-based up to the type of logical structure occurring within the specification. For the synthesis of a statement, the verification that the statement satisfies the specification may impose new assertions that need to be satisfied through the synthesis of more statements. The verification process is significantly simplified in that the program and its proof of correctness are developed together using the wp semantics. Furthermore, we address the development of imperative programs that can include recursive procedures and functions, in contrast to purely functional (applicative) programs.

7 Concluding Remarks

We have presented a system that automatically synthesizes procedural abstractions from specifications. The Seed system takes as input specifications written in predicate logic and generates Pascal-like programs as output. We developed rules to synthesize primitive programming structures, such as assignments, alternative statements, and iterative structures and their invariants. The syntheses of the programming structures are verified with Dijkstra's weakest precondition predicate transformer. This style of program development was extended to handle the implementation of programs for larger and more complex specifications, including rules for synthesis and later invocation of recursive and non-recursive procedures and functions. Our

synthesis rules have been implemented as the core of the Seed program synthesis system. All synthesized programming structures are verified with respect to the wp predicate transformer. The Seed system has provided a platform for us to demonstrate that programs can be constructed from formal specifications automatically. We are continuing to increase the capabilities of the Seed system in order to address a wider range of problems, while developing other techniques and tools that apply formal methods to support phases of software development other than the implementation phase. Specifically, future extensions to this work address specification construction using formal methods and object-oriented techniques [40, 41, 10, 11], software reuse determined according to formal specifications [17, 18, 19], and abstraction of formal specifications from existing program code [42, 43].

8 Acknowledgements

The author gratefully acknowledges the detailed comments made by the anonymous referees, which have helped to greatly improve the paper. In addition, a significant portion of the work described in the paper was performed by the author while she worked with Dr. Simon M. Kaplan at the University of Illinois at Urbana-Champaign and was supported in part by a grant from AT&T.

References

[1] Jeannette M. Wing. A Specifier's Introduction to Formal Methods. IEEE Computer, 23(9):8–24, September 1990.
[2] Mark Moriconi, editor. International Workshop on Formal Methods in Software Development, Napa, California, May 1990. ACM SIGSOFT.
[3] Susan L. Gerhart. Applications of formal methods: Developing virtuoso software. IEEE Software, 7(5):7–10, September 1990.
[4] Nancy G. Leveson. Formal Methods in Software Engineering. IEEE Transactions on Software Engineering, 16(9):929–930, September 1990.
[5] Betty Hsiao-Chih Cheng. Synthesis of Procedural and Data Abstractions. PhD thesis, University of Illinois at Urbana-Champaign, 1990. Technical Report UIUCDCS-R-90-1631.
[6] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, New Jersey, 1976.
[7] David Gries. The Science of Programming. Springer-Verlag, 1981.
[8] Chin-Liang Chang and Richard Char-Tung Lee. Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.
[9] Robert Kowalski. Predicate logic as a programming language. Information Processing '74, Proceedings of the IFIP Congress, pages 569–574, 1974.
[10] Robert H. Bourdeau and Betty H. C. Cheng. An object-oriented toolkit for constructing specification editors. In Proceedings of COMPSAC'92: Computer Software and Applications Conference, pages 239–244, September 1992.
[11] Betty H. C. Cheng. Constructing formal specifications from informal descriptions. In Proceedings of the Fourteenth Minnowbrook Software Engineering Workshop, pages 22–23, July 1991.
[12] Kanth Miriyala and Mehdi T. Harandi. Analogical Approach to Specification Derivation. In Proceedings of the Fifth International Workshop on Software Specification and Design, pages 203–210, May 1989.
[13] Kanth Miriyala and Mehdi T. Harandi. Automatic derivation of formal software specifications from informal descriptions. IEEE Transactions on Software Engineering, 17(10), October 1991.
[14] J. A. Zimmer. Abstraction for Programmers. McGraw-Hill Book Company, 1985.
[15] Deepak Kapur, Paliath Narendran, and Hantao Zhang. Proof by induction using test sets. In J. H. Siekmann, editor, Proceedings of the Eighth International Conference on Automated Deduction, pages 99–117, Oxford, England, July 1986. Volume 230 of Lecture Notes in Computer Science, Springer, Berlin.
[16] Betty H. C. Cheng. Automated Synthesis of Data Abstractions. In Proceedings of the Irvine Software Symposium, pages 161–176, June 1991.
[17] Jun-jang Jeng and Betty H. C. Cheng. Using Formal Methods to Construct a Software Component Library. Lecture Notes in Computer Science, 717:397–417, September 1993. (Proceedings of the Fourth European Software Engineering Conference).
[18] Jun-jang Jeng and Betty H. C. Cheng. Using Automated Reasoning to Determine Software Reuse. International Journal of Software Engineering and Knowledge Engineering, 2(4):523–546, December 1992.
[19] Betty H. C. Cheng and Jun-jang Jeng. Formal methods applied to reuse. In Proceedings of the Fifth Workshop in Software Reuse, 1992.
[20] J. R. Buchanan and D. C. Luckham. On automating the construction of programs. Technical Report Memo AIM-236, Stanford University, Artificial Intelligence Laboratory, Stanford, CA, 1974.
[21] D. H. D. Warren. Generating conditional plans and programs. In Proceedings of the Conference on Artificial Intelligence and Simulation of Behaviour, pages 344–354, Scotland, July 1976.
[22] Z. Manna and R. Waldinger. Synthesis: Dreams -> Programs. IEEE Transactions on Software Engineering, SE-5(4):294–328, 1979.
[23] Nachum Dershowitz. Synthetic programming. Artificial Intelligence, 25:323–373, 1985.
[24] Barbara Liskov and John Guttag. Abstraction and Specification in Program Development. MIT Press and McGraw-Hill, Cambridge, 1986.
[25] Alfred V. Aho and Jeffrey D. Ullman. Principles of Compiler Design. Addison-Wesley, Reading, MA, 1977.
[26] U. S. Reddy. Program specification and derivation. Lecture notes, August 1988.
[27] Nachum Dershowitz. Orderings for term-rewriting systems. Theoretical Computer Science, 17(3):279–301, 1982.
[28] Nachum Dershowitz. Termination of rewriting. Journal of Symbolic Computation, 3:69–116, 1987.
[29] Deepak Kapur and Hantao Zhang. Rewrite Rule Laboratory. General Electric Company and University of Iowa, respectively, Iowa City, 1989.
[30] R. M. Burstall and John Darlington. A transformation system for developing recursive programs. Journal of the Association for Computing Machinery, 24:44–67, 1977.
[31] John Darlington. An experimental program transformation and synthesis system. Artificial Intelligence, 16:99–121, 1981.
[32] Martin S. Feather. A system for assisting program transformation. ACM Transactions on Programming Languages and Systems, 4(1):1–20, January 1982.
[33] Uday S. Reddy. Transformational derivation of programs using the Focus system. Extended abstract, University of Illinois at Urbana-Champaign, Urbana-Champaign, IL, August 1987.
[34] D. R. Smith. KIDS: A Semi-automatic Program Development System. IEEE Transactions on Software Engineering, 16(9):1024–1043, September 1990.
[35] Sanjay Bhansali and Mehdi T. Harandi. Program synthesis using derivational analogy. Technical Report UIUCDCS-90-1591, University of Illinois at Urbana-Champaign, 1990.
[36] Mehdi T. Harandi and Sanjay Bhansali. APU: Automating Unix programming. In Tools for Artificial Intelligence 90, Washington, D.C., November 1990.
[37] Zohar Manna and Richard Waldinger. Fundamentals of deductive program synthesis. IEEE Transactions on Software Engineering, 18(8):674–704, August 1992.
[38] F. L. Bauer, R. Berghammer, W. Dosch, R. Gnatz, E. Hangel, B. Moller, H. Partsch, P. Pepper, K. Samelson, H. Wossner, M. Broy, F. Nickl, M. Wirsing, F. Geiselbrechtinger, W. Hesse, B. Krieg-Bruckner, A. Laut, and T. Matzner. The Munich Project CIP, Volume I: The Wide Spectrum Language CIP-L. In G. Goos and J. Hartmanis, editors, Lecture Notes in Computer Science 183. CIP System Group, Springer-Verlag, 1985.
[39] F. L. Bauer, H. Ehler, B. Moller, A. Horsch, O. Paukner, H. Partsch, and P. Pepper. The Munich Project CIP, Volume II: The Program Transformation System CIP-S. In G. Goos and J. Hartmanis, editors, Lecture Notes in Computer Science 292. CIP System Group, Springer-Verlag, 1987.
[40] Robert H. Bourdeau and Betty H. C. Cheng. Formal software specifications through pictures. Technical Report MSU-CPS-93-16, Department of Computer Science, Michigan State University, East Lansing, Michigan, June 1993.
[41] Michael R. Laux, Robert H. Bourdeau, and Betty H. C. Cheng. An integrated development environment for formal specifications. In Proceedings of the IEEE International Conference on Software Engineering and Knowledge Engineering, pages 681–688, San Francisco, California, July 1993.
[42] Betty H. C. Cheng and Gerald C. Gannod. Constructing formal specifications from program code. In Proceedings of the Third International Conference on Tools in Artificial Intelligence, pages 125–128, November 1991.
[43] Gerald C. Gannod and Betty H. C. Cheng. A two-phase approach to reverse engineering using formal methods. Lecture Notes in Computer Science, Proceedings of the Formal Methods in Programming and Their Applications Conference, 735:335–348, June 1993.


List of Figures

1  Overview of Seed
2  Rules for synthesizing an assignment statement
3  Process for synthesizing statements for a disjunctive expression
4  Sort program generated by Seed
5  max function
6  Synthesis of factorial function
7  The Hoare-Triples interface
8  Procedure for constructing specifications
9  Sample session with Seed

[Figure 1: Overview of Seed. Block diagram whose components include the Specification Editor, the Specification, Decomposition, Domain-Specific Facts, ADT Processing, the Software Component Library, Synthesis of Programming Structures, and Composition of Results, yielding the Annotated Program; the legend distinguishes "next step" from "optional" paths.]

Given an expression of the form

    x = expr

occurring in a postcondition R,

1. Check that the LHS of the equation, x, is a nonconstant term and is greater than or equal to the RHS expr with respect to the partial ordering on the symbols. (A nonconstant term refers to a simple identifier, including array and function names, that has not been declared to be a constant; a constant is an identifier whose value cannot be changed.)
2. Synthesize the statement x := expr.
3. Find the wp of the assignment statement with respect to the postcondition R.
4. Simplify the wp and remove tautologies.

Figure 2: Rules for synthesizing an assignment statement
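A rough sketch of the eligibility test in step 1 follows; it is illustrative only. The precedence table is an assumed stand-in for the partial ordering on symbols used by Seed (which is not detailed here), and the helper names are not from the paper.

def eligible(lhs, rhs_symbols, precedence, constants):
    # lhs must be a nonconstant identifier that ranks at least as high as
    # every RHS symbol under the assumed precedence.
    if not lhs.isidentifier() or lhs in constants:
        return False
    return all(precedence.get(lhs, 0) >= precedence.get(s, 0) for s in rhs_symbols)

# Assumed example: the postcondition contains the equation  s = a + b,
# where s is an output variable ranked above the input variables a and b.
precedence = {"s": 2, "a": 1, "b": 1}
if eligible("s", ["a", "b"], precedence, constants=set()):
    print("s := a + b")   # step 2; steps 3-4 would compute and simplify its wp

Because the output variable outranks the symbols on the right-hand side, the equation is oriented into an assignment to s rather than an attempted change to a or b.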

For a postcondition R, where R ≡ R1 ∨ ... ∨ Rn,

1. Find an expression Ei within disjunct Ri that can be satisfied by an executable statement Si.
2. Let the guard Bi be the wp of Si with respect to Ri.
3. Simplify and remove tautologies within the wp.
4. Repeat Steps 1-3 until all disjuncts have been processed.

Figure 3: Process for synthesizing statements for a disjunctive expression
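As a rough illustration of this process (a sketch under assumed names, not Seed's code), consider synthesizing an absolute-value computation from a disjunctive postcondition: each disjunct yields a candidate assignment to z, and the wp of that assignment becomes the guard of the corresponding alternative.

import sympy

x, z = sympy.symbols('x z')

def wp_assign(var, expr, post):
    # wp(var := expr, post): substitute expr for var, then simplify.
    return sympy.simplify(post.subs(var, expr))

# Assumed example: R == (z = x & x >= 0) | (z = -x & x <= 0).
# Each pair below is (disjunct Ri, candidate right-hand side for z).
disjuncts = [
    (sympy.And(sympy.Eq(z, x), x >= 0), x),
    (sympy.And(sympy.Eq(z, -x), x <= 0), -x),
]

for post, rhs in disjuncts:
    guard = wp_assign(z, rhs, post)        # guard Bi := wp(z := rhs, Ri)
    print(f"[] {guard} -> z := {rhs}")
# Composing the guarded alternatives gives:
#   if x >= 0 -> z := x  []  x <= 0 -> z := -x  fi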

{ true }
k := 0 ;
{ (A i: 0 =< i < k: (A j: 0 =< j < (n-i)-1: c[j] =< c[j+1])) & (0 =< k =< n) & perm(c,c0) }
do k ≠ n ->
    kk := 0 ;
    { (A j: 0 =< j < kk: c[j] =< c[j+1]) & (0 =< kk =< (n-k)-1) & perm(c,c0) }
    do kk ≠ (n-k)-1 ->
        if c[kk] =< c[kk+1] -> skip
        [] c[kk] > c[kk+1] -> c[kk],c[kk+1] := c[kk+1],c[kk]
        fi;
        { c[kk] =< c[kk+1] }
        kk := kk + 1
    od;
    k := k + 1
od
{ R: (A i: 0 =< i < n: (A j: 0 =< j < (n-i)-1: c[j] =< c[j+1])) & perm(c,c0) }

Figure 4: Sort program generated by Seed

{ (x >= y) | (y >= x)
  & (x >= y) -> wp(max:= x, R)
  & (y >= x) -> wp(max:= y, R) }
Function max(val: x, y)
begin
  if x >= y -> max := x;
  [] y >= x -> max := y;
  fi
end.
{ R: (max >= x & max >= y & max = x) |
     (max >= x & max >= y & max = y) }

Figure 5: max function

{ ((x = 0) | (x > 0))
  & (x = 0) -> wp(fact:= 1, R)
  & (x > 0) -> wp(fact:= x * fact(x-1), R) }
Function fact(val: x)
  if x = 0 -> fact := 1;
  [] x > 0 -> fact := x * fact(x-1);
  fi
{ R: (fact(x) = 1) & (x = 0) | (fact(x) = x * fact(x-1)) & (x > 0) }

Figure 6: Synthesis of factorial function

Figure 7: The Hoare-Triples interface


1. Select a nonterminal symbol (italicized text) from one of the three windows (specification components) with the left mouse button. The tool highlights the selected item.
2. Select the Expand operation (the right mouse button). The tool presents the user with a menu of possible replacements.
3. Select one item from the list of possible replacements (the left mouse button). The tool substitutes the selected nonterminal with the chosen replacement.
4. Repeat steps 1 through 3 until there are no more nonterminals; the specification is then complete.

Figure 8: Procedure for constructing specifications

Figure 9: Sample session with Seed
