Automated Synthesis of Data Abstractions

Proc. of Irvine Software Symposium, pp. 161-176, June 1991.

Betty H.C. Cheng
Department of Computer Science
Michigan State University
East Lansing, Michigan 48824

Abstract

Research into the development of software tools that support formal methods is aimed at simplifying and providing assistance during the development of correct software. We have developed the Seed system that uses techniques amenable to automation in order to assist a user in the correct development of the building blocks of a large software system from user-supplied formal specifications. Seed accepts a formal specification of a problem written in predicate logic and generates annotated program source code satisfying the specification. The rules for choosing which programming language structures to synthesize are contained in a rule base; background knowledge and domain-specific information are entered into a fact base. During synthesis, Seed uses the fact base to disambiguate rule applications. In addition to primitive programming language constructs, such as assignment, alternative and iterative statements, Seed is capable of synthesizing recursive and non-recursive procedures and functions, as well as abstract data types. This paper describes the details of the synthesis of abstract data types using the Seed system.

1 Introduction

Research into the development of software tools that support the use of formal methods is aimed at simplifying and providing assistance during the development of correct software. Our objective is to develop tools that will support the use of formal methods in all phases of software development, including design, specification, implementation, and maintenance. An overall goal is to design and develop a programming environment comprising several such development tools that will assist a programmer in the construction of large software products. In the first stage of this project, we have developed an automated program synthesis system, Seed, which synthesizes programs from formal specifications while verifying their correctness [Che90]. In the Seed project, we have shown that the building blocks of a large software system can be correctly synthesized in an automated manner from user-supplied formal specifications. Seed accepts a predicate logic specification of a problem and generates program source code satisfying the specification.

[Figure 1: Overview of Seed. Components: Specification, Decomposition, ADT Processing, Module Library, Facts, Theorem Prover, Synthesis of Programming Structures, Composition of Results, Annotated Program. Legend: Next Step, Optional.]

Figure 1 gives a pictorial overview of the Seed system. Once a specification is entered, it undergoes pre-processing. The specification may be decomposed into logical tasks that are synthesized as functions or procedures. If an abstract data type (ADT) is to be generated, the ADT specification is tested for sufficient completeness [KNZ91, DJ90, Mus80b]. After pre-processing is completed, programming statements are synthesized in order to satisfy the specification. The rules for choosing which programming language structures to synthesize are stored in a rule base. Background knowledge and domain-specific information are entered into a fact base. During synthesis, Seed uses the fact base to disambiguate rule applications. Synthesized programming statements are verified for correctness; this often requires the use of a theorem prover that interfaces with Seed. After verification, new logical expressions may be obtained that need to be satisfied through further synthesis. This method leads to the simultaneous construction of a program and its proof of correctness. Once Seed has successfully synthesized a procedure, function, or ADT, the specification and resultant code are stored in a module library. If no application of rules will yield a satisfactory program, Seed informs the user, who may then modify the specification and restart the synthesis.

Seed's synthesis rules are based in part on guidelines from Dijkstra [Dij76] and Gries [Gri81] to develop a program and its proof of correctness simultaneously. A program specification is expressed in terms of pre- and postconditions that describe the initial and final states, respectively, of a program. In our approach, first-order predicate logic has been chosen as the formal specification language. The semantics of a predicate logic specification can be directly extracted from its symbols, the small number of which implies a small rule base. The Seed rule base comprises a set of synthesis rules for each type of logical expression such as a disjunctive or a quantified expression. The choice of a programming statement S for a postcondition R is verified by finding the weakest precondition (wp) of S with respect to R, where wp(S, R) describes the set of states in which execution of S can begin and upon termination of S, R will be satisfied. All rules are synthesized in Prolog as it can be readily applied to predicate logic specifications. In order to help the user understand the resultant code, it is annotated with pre- and postconditions used during synthesis. To assist the user in the specification process, we are developing a syntax-based editor that runs in a window-based environment with a graphical user interface [BC91].

The progression of our investigation was guided by the goal to synthesize the components of large software projects. Our initial version of Seed synthesized primitive programming structures found in an imperative language. These structures include assignment, alternative and iterative statements.
Then we looked at two methods of abstraction. The first method is abstraction by parameterization, where through the introduction of parameters, we can represent a potentially infinite set of different computations with a single program text that is an abstraction of them. The second method is abstraction by specification, which associates the specification of program text with its implementation body. This type of abstraction allows us to reference a procedure in terms of its specification rather than its implementation.

Given the two methods of abstraction, there are two types of abstractions: procedural abstraction and data abstraction. Procedural abstraction allows us to extend the programming language with new operations. In order to handle procedural abstractions in the system, we developed rules to synthesize recursive and non-recursive procedures and functions from specifications, which can be invoked through procedure calls and referenced with respect to their specifications. See [Che90, Che91] for details concerning the synthesis of procedural abstractions.

A data abstraction is a set of objects and a set of operations that characterize the behavior of the objects. Data abstraction allows the programmer to defer decisions about data structures until the uses of the data are fully understood [LG86]. The programmer should be able to tailor the programming language to different problem domains by creating data types that perform certain operations more efficiently as compared to using the base set of data types defined in the programming language. Thus, we developed rules that implement abstract data type specifications using the above described synthesis rules. This paper focuses on the synthesis of data abstractions.

The remainder of this paper is organized as follows: In Section 2, we describe the rules for determining the sufficient completeness of specifications of data abstractions. The implementation of ADTs is described in Section 3. An example specification and implementation are given in Section 4. Related work is discussed in Section 5. Finally, Section 6 draws conclusions from this work and discusses future investigations.

2 Completeness of Specifications

An axiomatic specification of an ADT uses equations to describe the relationships between the operators. The axioms are defined in terms of constructors, which are symbols used to construct or add to an ADT. For instance, the constructors for the stack ADT are the symbols push and newstack. Stack operators such as top and pop cannot create a stack or add elements to a stack.

Before implementing an abstract data type specification, Seed must determine that the axiomatic specification describes how every value of the ADT can be generated. This is done in a two-step process. First, the Knuth-Bendix completion procedure [KB70] is applied to the axioms describing the operations of the ADT; it determines if the patterns on the LHSs of the rules cover all possible (variable-free) input values [Der85]. The programmer supplies to the Knuth-Bendix completion procedure a precedence ordering between the symbols that is a well-founded ordering. The precedences provide the partial ordering to rewrite equations of the form $x = y$, such that the LHS of an equation is greater in precedence than the RHS, which forms a rewrite rule of the form $l \rightarrow r$. A rewrite rule $l \rightarrow r$ can be used to replace an occurrence of l with r at some position p in a term t. Second, the test set method [KNZ86] is applied to the results from the Knuth-Bendix completion procedure in order to determine whether a set of axioms is sufficiently complete with respect to a set of constructors [GH78, Mus80a]. A set of rules is sufficiently complete if every term can be rewritten as a term that is built only from the constructors. The following sections briefly review the Knuth-Bendix completion procedure and the test set method.

2.1 Knuth-Bendix Completion

The Knuth-Bendix completion procedure generates a set of rules R that follow from a set of equations E. The completion procedure takes as input a finite set of equations E and a well-founded ordering ($\succ_{wfo}$) on the symbols that occur in the equations. The rule set R is initially empty. If the procedure terminates successfully, it produces a canonical system of rules R for the equations E. Canonical systems are reduced and convergent. If all computations lead to a unique normal form, the system is called convergent. A term t is in normal form if and only if there is no term $t'$ such that there exists a rule $t \rightarrow t'$. A system R is said to be reduced if, for each rule $l \rightarrow r$ in R, the RHS r is irreducible under R, and the LHS l is not rewritable by any rule in R other than $l \rightarrow r$.

The completion procedure may fail in one of two ways: it may be unable to add any rule because for all equations $l = r$ in E, l and r are incomparable under the ordering $\succ_{wfo}$, or it may continue to generate an infinite number of new rules without ever finding a canonical system. The completion procedure is given in Figure 2 [Der82]. An important effect of the Knuth-Bendix completion process is that new rewrite rules are generated to resolve ambiguities between existing rules. For those cases when the equations $l = r$ cannot be found, where $l \succ_{wfo} r$, more information, such as precedence relations between symbols occurring in the equations, is needed from the user to obtain a canonical system of equations. For each rule $l \rightarrow r$ that is added to the system of rules, all critical pairs formed from using the new rule are added to the set of equations, where a critical pair is an equation that represents the most general nonvariable overlap between two LHSs of rules.

Assume E contains the set of equations. The set of rewrite rules R is initially empty. Repeat the following steps as long as equations are left in E. If none remain, terminate successfully.

1. Remove an equation $l = r$ (or $r = l$) from E such that $l \succ_{wfo} r$. If none exists, terminate with failure (abort).
2. Add the rule $l \rightarrow r$ to R.
3. Use R to reduce the RHS of existing rules.
4. Add to E all critical pairs formed using the new rule, where a critical pair is an equation that represents the most general nonvariable overlap between two LHSs of rules.
5. (Optional) Remove all the old rules from R whose LHS contains an instance of l.
6. Use R to reduce both sides of equations in E. Remove any equation whose reduced sides are identical.

Figure 2: Knuth-Bendix Completion Procedure
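Steps 3 and 6 of the procedure rely on reducing a term with the current rule set. The following is a minimal Python sketch of that reduction only (not of completion itself), assuming a term encoding of our own choosing: nested tuples for applications, plain strings for constants, and strings beginning with `?` for variables. The rule list mirrors the stack rules used later in the paper; the function names are hypothetical and are not part of Seed.

```python
def is_var(t):
    # Assumption: variables are strings that start with "?".
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, subst):
    """Return an extended substitution if pattern matches term, else None."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def substitute(t, subst):
    """Instantiate the variables of t according to subst."""
    if is_var(t):
        return subst[t]
    if isinstance(t, str):
        return t
    return (t[0],) + tuple(substitute(a, subst) for a in t[1:])

def rewrite_once(term, rules):
    """Rewrite at the root if possible, else in the leftmost rewritable argument."""
    for lhs, rhs in rules:
        s = match(lhs, term, {})
        if s is not None:
            return substitute(rhs, s)
    if not isinstance(term, str):
        for i, arg in enumerate(term[1:], start=1):
            new = rewrite_once(arg, rules)
            if new is not None:
                return term[:i] + (new,) + term[i + 1:]
    return None

def normalize(term, rules, limit=1000):
    """Rewrite until no rule applies; the limit guards against non-termination."""
    for _ in range(limit):
        nxt = rewrite_once(term, rules)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no normal form found within the step limit")

# Stack rules, with push taking (element, stack) as in the rule figures.
STACK_RULES = [
    ("create", "newstack"),
    (("top", "newstack"), "notdef"),
    (("top", ("push", "?x", "?z")), "?x"),
    (("pop", ("push", "?x", "?z")), "?z"),
    (("pop", "newstack"), "newstack"),
    (("isempty", "newstack"), "true"),
    (("isempty", ("push", "?x", "?y")), "false"),
]
```

For example, `normalize(("top", ("pop", ("push", "a", ("push", "b", "create")))), STACK_RULES)` first removes the outer push via the pop rule and then applies the top rule to what remains.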

2.2 Test Set Method

Determining that a set of rewrite rules is sufficiently complete with respect to a set of constructors indicates that all terms can be built from constructors. In general, determining the completeness of a specification is undecidable [Mus80a, Mus80b, DJ90]; however, determining the completeness of a specification with respect to constructors is decidable [GH78]. We use the test set method developed by Kapur et al. [KNZ86, KNZ91] to determine the sufficient completeness of an ADT specification, where a test set is a finite description of ground terms in normal form. We review the theory and the construction of test sets in this section. A ground term is either a constant term or the application of a function all of whose arguments are ground terms. There are different strategies for determining sufficient completeness [Mus80b, KNZ86, KNZ91, Jal89]; we use the test set method because, if a set of axioms is not complete, the test set returns templates of the LHSs of axioms that are necessary to make the set of axioms sufficiently complete. The templates can be used by the programmer as guidelines for specifying the missing equations. A template is of the form

rule(op1(opx, opy), something),

where rule(l, r) is used to represent a rewrite rule $l \rightarrow r$; in this example, l is the expression of the form op1(opx, opy) and the RHS of the rule r, something, is to be supplied by the user. After the programmer supplies the necessary axioms according to the templates, the Knuth-Bendix completion procedure and the test set generation are repeated. When there are no templates returned from the test set method, the set of axioms is sufficiently complete with respect to the specified constructors.

A term is made up of symbols from the set of function symbols F, and a superterm is a mapping from N to subsets of F, where N is the set of natural numbers. For instance, given the axioms for addition in natural numbers

$$x + 0 = x$$
$$x + s(y) = s(x + y),$$

where x can be any expression and s is the successor function, the term $(0 + s(x))$ and its superterm are shown below in tree format.

        +              {+}
       / \             / \
      0   s          {0} {s}
          |               |
          x              {}

     term tree      superterm tree

In the term tree, each node represents the name of the function; the left child of each node is the leftmost operand of the parent function and the right child of each node is the second operand of the parent function. In the superterm tree, each node is the set consisting of the corresponding function symbol from the term tree; or if the node in the term tree contains a variable, then the corresponding node in the superterm tree is the empty set ({}).

A Def-domain(f, R) is the result of the merge of all superterms corresponding to subterms of the LHSs of rewrite rules in R such that the root symbol, that is, the top level function of each subterm, is f. An extending domain of f with respect to R is denoted as $S_f$ and is a superterm covering Def-domain(f, R) for the function f, where $s_1$ covers $s_2$ if for each position p in $s_2$, $s_2(p) \subseteq s_1(p)$, where $s(p)$ represents the position p in the term s and $\subseteq$ represents the subset relation from set theory. Extending domains are intended to cover def-domains of several rewrite systems. Thus, if $D_1$ is a def-domain for $R_1$ and $D_2$ is a def-domain for $R_2$, then $D_1 \cup D_2$ is an extending domain of $R_1$ and $R_2$. In our context, we only consider one rewrite system R at a time; thus, def-domain and extending domain are synonymous for a given function f and rewrite rule system R. For instance, the term trees for $x + 0$ and $x + s(y)$ are:

        +              +
       / \            / \
      x   0          x   s
                         |
                         y

      x + 0        x + s(y)

The corresponding superterm trees are:

       {+}            {+}
       / \            / \
     {}  {0}        {}  {s}
                         |
                        {}

      x + 0        x + s(y)

A superterm covering for the function + in the above example is obtained by merging all its superterms in the following manner:

       {+}            {+}             {+}
       / \            / \             / \
     {}  {0}        {}  {s}         {}  {0, s}
                         |                |
                        {}               {}

      x + 0        x + s(y)           S+
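The merge of superterms described above can be sketched in a few lines of Python. This is only an illustration under our own encoding assumptions: a superterm is a pair (symbol set, list of child superterms), variables are strings beginning with `?`, and the function names `superterm`, `merge`, and `covers` are hypothetical.

```python
def is_var(t):
    # Assumption: variables are strings that start with "?".
    return isinstance(t, str) and t.startswith("?")

def superterm(term):
    """Map a term to a tree of (symbol set, children); a variable becomes the empty set."""
    if is_var(term):
        return (frozenset(), [])
    if isinstance(term, str):
        return (frozenset([term]), [])
    return (frozenset([term[0]]), [superterm(a) for a in term[1:]])

def merge(s1, s2):
    """Position-wise union of two superterm trees."""
    kids = []
    for i in range(max(len(s1[1]), len(s2[1]))):
        if i < len(s1[1]) and i < len(s2[1]):
            kids.append(merge(s1[1][i], s2[1][i]))
        else:
            kids.append(s1[1][i] if i < len(s1[1]) else s2[1][i])
    return (s1[0] | s2[0], kids)

def covers(s1, s2):
    """True if s2(p) is a subset of s1(p) at every position p of s2."""
    if not s2[0] <= s1[0] or len(s2[1]) > len(s1[1]):
        return False
    return all(covers(a, b) for a, b in zip(s1[1], s2[1]))

# The two LHSs from the addition axioms: x + 0 and x + s(y).
plus_zero = superterm(("+", "?x", "0"))
plus_succ = superterm(("+", "?x", ("s", "?y")))
s_plus = merge(plus_zero, plus_succ)
```

Here `s_plus` has root {+}, an empty left child, and a right child {0, s} with one empty child, and it covers both of the superterms it was merged from.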

From these definitions, we can build skeletons of ground terms to define test sets.

Definition 1. Let t be a ground term with the root symbol f and $S_f$ be an extending domain of f. The skeleton of t with respect to $S_f$ is a term $t'$ maximal in size, such that $t'(\epsilon) = f$, where $\epsilon$ is the empty position, and for all positions $p_i$ in $t'$,

$$t'(p_i) = \begin{cases} t(p_i) & \text{if } t(p) \in S_f(p) \\ x & \text{otherwise,} \end{cases}$$

where x is a distinct variable, the function symbol f at position p is an ancestor of the symbol at position $p_i$ in a superterm tree, and the arity of f is greater than or equal to i. We denote $t'$ by $Skel(t, S_f)$.

For instance, given the rule set

$$R = \{ s(s(0)) \rightarrow 0,\; x + s(0) \rightarrow s(0),\; x + 0 \rightarrow x \},$$

the skeleton of a term $s(x) + s(y)$ with respect to extending domain $S_+$ is $s(x_1) + s(y)$, where $x_1$ is a distinct variable. This result follows from looking at the extending domain of +, which is

       {+}
       / \
     {}  {0, s}
           |
          {}

         S+

The expression $s(x)$ is changed to $s(x_1)$ in $Skel(s(x) + s(y), S_+)$ because s is not a member of $S_+$ in the left child position of $S_+$. The second operand, $s(y)$, remains the same in $Skel(s(x) + s(y), S_+)$ because s is a member of the right child of $S_+$. Another example is the term $0 + s(s(0))$ whose skeleton is the term $0 + s(s(x_1))$. The left operand of the function + remains 0 in the skeleton because + is in the position $\epsilon$ in $S_+$. But the symbol 0 in the term $s(s(0))$ is changed to the variable $x_1$ because the second level s is not a member of $S_+$ at the corresponding level. In summary,

$$S = \{ S_f \mid S_f \text{ is an extending domain of } f \text{ in } R \}.$$

Let S be an extending domain of R. The standard test set, $TS_2(R, S)$, is defined as:

$$TS_2(R, S) = \{ Skel(t, S_f) \mid t \in IRG(R),\; S_f \in S \text{ and the root symbol of } t \text{ is } f \},$$

where $IRG(R)$ represents the set of all ground terms, made from the set of function symbols F in the axiomatization, in normal form with respect to R.

Referring to the previous example, the test set $TS_2(R, S)$ is $\{0, s(0)\}$ where the constructors are $\{0, s\}$ and the set S is { Def-domain(0, R), Def-domain(s, R) }, which have the following extending domains:

          {0}                       {s}
                                     |
                                  {s, 0}
                                     |
                                    {}

    Def-domain(0, R) = S0     Def-domain(s, R) = Ss

From the extending domains given above and the elements of $IRG(R)$ ($\{0, s(0)\}$), we see that the skeletons $Skel(0, R)$ and $Skel(s, R)$ are 0 and s(0), respectively. In [Che90], we describe the implementation of the construction of the test set $TS_2$.

Consider the incomplete axiomatic specification of a stack given in Figure 3 as an example, where an equation x = y can also be represented by a 2-place equality expression eq(x,y).

    eq(create,newstack).
    eq(top(newstack),notdef).
    eq(pop(push(x, z)),z).
    eq(pop(newstack),newstack).
    eq(isempty(newstack),true).
    eq(isempty(push(x, y)),false).

Figure 3: Incomplete specification for stack

We supply the following precedence ordering

$$create \succ newstack, \quad top \succ notdef,$$

where notdef is a constant representing undefined values. After Seed applies the Knuth-Bendix completion procedure successfully, the original equations are returned in the form of rewrite rules given in Figure 4.

    rule(create, newstack).
    rule(top(newstack), notdef).
    rule(pop(newstack), newstack).
    rule(pop(push(x, z)), z).
    rule(isempty(newstack), true).
    rule(isempty(push(x, y)), false).

Figure 4: Rewrite rules for stack

In the second step of the completeness check, the test set method [KNZ86] is applied to R in order to determine whether a set of axioms is sufficiently complete with respect to a set of constructors [GH78, Mus80a]. A set of axioms is sufficiently complete when every term can be rewritten as a term that is built only from the constructors. Terms that cannot be reduced by the rules in R indicate that the set of rules is not sufficiently complete. After applying the test set method to the rules in Figure 4 and specifying newstack and push as constructors, the rules in Figure 4 and the skeleton of a rule top(push(x, z)) are returned, thus indicating that a rule with this LHS is missing. The stronger implication is that the operation top is not completely specified with respect to the constructors push and newstack. Entering the axiom

    rule(top(push(x, z)),x)

and having Seed apply the Knuth-Bendix completion procedure followed by the application of the test set method results in the sufficiently complete specification for a stack as given in Figure 5.

    rule(create,newstack).
    rule(top(newstack),notdef).
    rule(top(push(x,z)),x).
    rule(pop(push(x, z)),z).
    rule(pop(newstack),newstack).
    rule(isempty(newstack),true).
    rule(isempty(push(x, y)),false).

Figure 5: Sufficiently complete specification for stack
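The effect of the completeness check on the stack example can be imitated with a much simpler test than the test set method. The sketch below is a deliberate simplification, not the algorithm of [KNZ86]: it only asks, for each unary stack operator and each constructor, whether some rule LHS matches the operator applied to that constructor pattern. The term encoding (tuples, `?`-prefixed variables) and all function names are assumptions made for this illustration.

```python
CONSTRUCTORS = {"newstack": 0, "push": 2}   # constructor name -> arity
DEFINED_OPS = ["top", "pop", "isempty"]     # unary operators over stacks

def is_var(t):
    # Assumption: variables are strings that start with "?".
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, subst):
    """Return an extended substitution if pattern matches term, else None."""
    if is_var(pattern):
        if pattern in subst:
            return subst if subst[pattern] == term else None
        return {**subst, pattern: term}
    if isinstance(pattern, str) or isinstance(term, str):
        return subst if pattern == term else None
    if pattern[0] != term[0] or len(pattern) != len(term):
        return None
    for p, t in zip(pattern[1:], term[1:]):
        subst = match(p, t, subst)
        if subst is None:
            return None
    return subst

def templates(op):
    """Yield op applied to each constructor pattern, e.g. top(push(v1, v2))."""
    for c, arity in CONSTRUCTORS.items():
        # v1, v2, ... act as fresh constants standing for arbitrary arguments.
        arg = c if arity == 0 else (c,) + tuple(f"v{i}" for i in range(1, arity + 1))
        yield (op, arg)

def missing_templates(rules):
    """Return the templates that no rule LHS matches."""
    return [tmpl
            for op in DEFINED_OPS
            for tmpl in templates(op)
            if not any(match(lhs, tmpl, {}) is not None for lhs, _ in rules)]

FIGURE_4 = [
    ("create", "newstack"),
    (("top", "newstack"), "notdef"),
    (("pop", "newstack"), "newstack"),
    (("pop", ("push", "?x", "?z")), "?z"),
    (("isempty", "newstack"), "true"),
    (("isempty", ("push", "?x", "?y")), "false"),
]
FIGURE_5 = FIGURE_4 + [(("top", ("push", "?x", "?z")), "?x")]
```

Applied to the Figure 4 rules, the only template reported is top applied to a push pattern; with the extra rule of Figure 5 nothing is reported. Deeper patterns, which the real test set method handles, are outside the scope of this sketch.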

3 Implementation of ADT Operations

Once a sufficiently complete set of axioms is obtained, the programmer gives one or more equations that define the constructors in terms of an existing data structure in order to implement the ADT. The Knuth-Bendix completion procedure is applied to the sufficiently complete set of axioms and to the new equations defining the constructors in terms of an existing data structure. One of the effects of the Knuth-Bendix completion procedure is that the resultant set of rules is in normal form. Recall that the result of the sufficient completeness process is a set of rules in terms of the constructors, and because the newly added rules define the constructors in terms of an existing data structure, after applying the Knuth-Bendix procedure, a set of rules in normal form in terms of the existing data structure is returned.

From the sufficiently complete set of rewrite rules, pre- and postconditions are extracted in order to implement the operations in an imperative language. The preconditions give the types of the operators; they are expressed in signature notation,

    name: left-types -> right-type

where name indicates the name of the operator, left-types gives the argument types of the LHS of an axiom equation, and right-type gives the type for the RHS of an axiom equation; if left-types is the symbol [], then the operator has no arguments and is therefore a constant of type right-type. A postcondition is developed for each operation of a given ADT by forming a disjunctive expression comprising the equality expressions that represent the rewrite rules whose top level function is the operator currently being defined. Thus, the two rules

    rule(op1(opnd1a, opnd2a), rhs1)
    rule(op1(opnd1b, opnd2b), rhs2)

are transformed into the disjunctive expression

$$op1(x1, x2) = rhs1 \wedge x1 == opnd1a \wedge x2 == opnd2a \;\vee\; op1(x1, x2) = rhs2 \wedge x1 == opnd1b \wedge x2 == opnd2b.$$

The variables x1 and x2 are placeholders; they have different values for each disjunct. The symbol `==' indicates the boolean equality relation where neither operand can be changed. This symbol differs from the `=' symbol that is a directed equality expression. The synthesis rules for procedural abstractions, described in [Che91, Che90], are applied to the postcondition to implement the operations.

In the stack example, the constructors push(x,y) and newstack are implemented with the list data structures, stk(cons(y,x)) and stk(newlist), respectively, where we assume that list is an existing data structure. We encapsulate the list data type with the stk function and declare it to be of type stack. The stk function provides the relationship between the stack and the list data types. The signature of stk is [list -> stack]. We apply the Knuth-Bendix completion procedure in order to obtain the set of rules that define the stack ADT in terms of a list.
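The transformation from rewrite rules to a disjunctive postcondition is mechanical, and a small Python sketch may make it concrete. The encoding here is an assumption made for illustration: each rule is a triple (operator name, list of operand strings, RHS string), and the hypothetical function `postcondition` emits the disjunction as text, writing `and`/`or` for the logical connectives.

```python
def postcondition(op, rules):
    """Build the disjunctive postcondition for op from its rewrite rules."""
    disjuncts = []
    for name, operands, rhs in rules:
        if name != op:
            continue  # only rules whose top level function is op contribute
        # One placeholder per argument position: x1, x2, ...
        placeholders = [f"x{i}" for i in range(1, len(operands) + 1)]
        parts = [f"{op}({', '.join(placeholders)}) = {rhs}"]
        parts += [f"{x} == {o}" for x, o in zip(placeholders, operands)]
        disjuncts.append(" and ".join(parts))
    return " or ".join(f"({d})" for d in disjuncts)

# Rules for pop and top, with push taking (stack, element) as in this section.
RULES = [
    ("pop", ["push(x, y)"], "x"),
    ("pop", ["newstack"], "newstack"),
    ("top", ["newstack"], "notdef"),
]
```

Calling `postcondition("pop", RULES)` yields one disjunct per pop rule, each pairing the directed equation for the result with `==` constraints on the placeholder arguments.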

In order to illustrate the extraction of a postcondition from rewrite rules, we consider in detail the pop operation for a stack ADT, which is defined by the two rewrite rules

    rule(pop(push(x,y)), x)
    rule(pop(newstack), newstack).

After applying the completion procedure and extraction process, the corresponding postcondition for pop is

$$pop(z) = x \wedge z == stk(cons(y, x)) \;\vee\; pop(z) = stk(newlist) \wedge z == stk(newlist),$$

where the rule predicates have been replaced by the corresponding directed equations. Notice that the element position of push and top is of type univ, which represents the universal type; this indicates that stack is a parameterized data type. The synthesized ADTs are parameterized data types, that is, the same ADT can be used for different objects in future specifications. Figure 6 gives the implementation of the stack operations in terms of a list.
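To make the encapsulation of a list by stk concrete, here is a minimal Python sketch of the stack operations over a cons-style list. It is an illustration in a different language, not the code that Seed synthesizes; the class `Stk`, the helper `cons`, and the sentinel `NOTDEF` are names assumed for this example, with `NOTDEF` standing in for the paper's notdef constant and `None` for newlist.

```python
from dataclasses import dataclass
from typing import Any, Optional, Tuple

NOTDEF = object()   # stands in for the notdef constant (undefined value)
NEWLIST = None      # stands in for the empty list, newlist

def cons(head, tail):
    """Build a list cell holding one element and the rest of the list."""
    return (head, tail)

@dataclass(frozen=True)
class Stk:
    """Wraps a cons list so that it is treated as a stack (the stk function)."""
    items: Optional[Tuple[Any, Any]]

def create():
    return Stk(NEWLIST)

def push(s, elem):
    # push(xstk, xelem) = stk(cons(xelem, xstk))
    return Stk(cons(elem, s.items))

def top(s):
    # Undefined on the empty stack, otherwise the most recently pushed element.
    return NOTDEF if s.items is NEWLIST else s.items[0]

def pop(s):
    # Popping the empty stack returns the empty stack, as in the axioms.
    return Stk(NEWLIST) if s.items is NEWLIST else Stk(s.items[1])

def isempty(s):
    return s.items is NEWLIST
```

Because the element type is unconstrained (`Any`), the same definitions serve stacks of any element type, which loosely corresponds to the univ type of the element position.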

4 Example

In order to demonstrate the program development methodologies used in our approach, we specified the ADTs and routines needed to implement a parser given a grammar. Specifically, we assume that we are given a grammar $G = (V, T, S, P)$, where V is the set of nonterminal symbols, T is the set of terminal symbols, S is the start symbol for the grammar, and P is the set of productions. From this information, Seed constructs the components of a parser that will determine if a given string w is valid in grammar G. Figure 7 gives a pictorial description of an LR parser. For illustrative purposes, we assume that the grammar G has been disambiguated and we have access to the previously specified set and stack ADTs. In this discussion, we only give the specification and the implementation of the closure function that produces the canonical collection of sets of LR(0) items from a set of items and grammar production rules, where we assume that we have access to the set ADT and its operations (e.g. member, union, and insert). For the complete set of specifications and implementation of the components of a parser and ADTs as described in [AU77], see [Che90].

Implementation of stack Operators

    { create: newstack -> stack }
    func create: stack
        create := stk(newlist)
    { create = stk(newlist) }

    { push: stack X univ -> stack }
    func push(val-res: stk(xstk); val: xelem): stack
        xstk := cons(xelem, xstk)
    { push(xstk, xelem) = stk(cons(xelem, xstk)) }

    { top: stack -> univ }
    func top(val: x): univ
        if x == stk(cons(xelem, xstk)) -> top := xelem
        || x == stk(newlist) -> top := notdef
        fi
    { (top(x) = xelem and x == push(xstk, xelem)) or (top(x) = notdef and x == newstack) }

    { pop: stack -> stack }
    func pop(val: x): stack
        if x == stk(cons(xelem, xstk)) -> pop := xstk
        || x == stk(newlist) -> pop := stk(newlist)
        fi
    { (pop(x) = xstk and x == push(xstk, xelem)) or (pop(x) = newstack and x == newstack) }

    { isempty: stack -> boolean }
    func isempty(val: x): boolean
        if x == stk(cons(xelem, xstk)) -> isempty := false
        || x == stk(newlist) -> isempty := true
        fi
    { (isempty(x) = false and x == push(xstk, xelem)) or (isempty(x) = true and x == newstack) }

Figure 6: Stack implementation in terms of a list

Specification of closure function: We use the following notation convention in the specifications: productions of the form $a \rightarrow \alpha b \beta$ are represented as $prod(a, [\alpha, dot, b, \beta])$, Greek letters represent strings, upper case identifiers represent parameters, lower case identifiers represent bound identifiers or variables local to a specific routine, and predicate equivalents are used to represent set operations (e.g. subset(x,y) represents $x \subseteq y$ and member(x,y) represents $y \in x$).

    function closure(val P:set; var I:set) : set

$$\{ pre: I' = I \}$$

$$\{ Post: (I = I') \wedge closure = I \;\vee\; (I \neq I') \wedge closure = closure(P, I') \rightarrow (\forall p : member(I, p) \wedge p == prod(a, [\alpha, dot, b, \beta]) : member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma])) \wedge subset(I, I')) \}$$

where I represents the set of items, $I'$ represents the closure of the set of items, and P represents the set of productions of the grammar.

[Figure 7: Parser Environment. Labels: Input ($a_1 \ldots a_i \ldots a_n$), Stack ($s_m$), Driver Routine, Parsing Table.]

Implementation of closure function: First, Seed works on the LHS of the implication of the postcondition and uses the disjunctive clause in order to synthesize the following alternative statement.

    if I = I' -> closure := I
    || I != I' -> closure := closure(P,I')
    fi

$$\{ (I = I') \wedge closure = I \wedge I' \neq I' \;\vee\; (I \neq I') \wedge closure = closure(P, I') \}$$

Seed verifies the development of the alternative statement (if-fi), by finding the weakest precondition of the statement with respect to the postcondition Post. The structure of the if-fi statement is

    if B1 -> S1
    || B2 -> S2
       ...
    || Bn -> Sn
    fi

and the wp of an if-fi statement is

$$(\exists i :: B_i) \wedge (\forall i :: B_i \rightarrow wp(S_i, Post)),$$

where $B_i$ is a Boolean condition and $S_i$ is the associated statement list that is executed if $B_i$ is true.
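The wp formula for the alternative statement can be exercised on a small example. The Python sketch below is an illustration only and is not how Seed computes weakest preconditions (Seed works symbolically): predicates are modeled as functions from a state dictionary to a Boolean, each guarded statement is a single deterministic assignment modeled as a function from state to state, and the names `wp_assign` and `wp_if` are hypothetical.

```python
def wp_assign(stmt, post):
    """wp of a deterministic assignment: post must hold in the state after stmt."""
    return lambda s: post(stmt(s))

def wp_if(guarded, post):
    """wp of if B1 -> S1 || ... || Bn -> Sn fi with respect to post.

    Holds in state s when some guard is true and every true guard's
    statement establishes post.
    """
    def pre(s):
        some_guard = any(b(s) for b, _ in guarded)
        all_ok = all((not b(s)) or wp_assign(stmt, post)(s) for b, stmt in guarded)
        return some_guard and all_ok
    return pre

# Example: if x >= 0 -> y := x || x <= 0 -> y := -x fi, postcondition y = |x|.
abs_if = [
    (lambda s: s["x"] >= 0, lambda s: {**s, "y": s["x"]}),
    (lambda s: s["x"] <= 0, lambda s: {**s, "y": -s["x"]}),
]
post_abs = lambda s: s["y"] == abs(s["x"])

# A variant whose guards leave x = 0 uncovered.
gap_if = [
    (lambda s: s["x"] > 0, lambda s: {**s, "y": s["x"]}),
    (lambda s: s["x"] < 0, lambda s: {**s, "y": -s["x"]}),
]
```

For the first statement the wp holds in every state, since the guards cover all integers and both branches establish the postcondition; for the variant, the wp fails exactly where no guard is true.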

For our example, the simplified wp of the alternative statement is

$$wp(\text{if-fi}, Post) \equiv true \wedge (\forall p : member(I, p) \wedge p == prod(a, [\alpha, dot, b, \beta]) : member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma])) \wedge subset(I, I'))$$

Let us call the second conjunct $Post_2$. Because $Post_2$ involves a quantified expression, Seed develops an iterative structure, which requires the development of an invariant that describes the conditions before and after the loop, a bound function that provides a bound on the number of iterations, and statements to make progress towards termination and maintain the truth of the invariant. An iterator is a data object created to iterate through the elements of an ADT. There are four iterator operations: create a copy of an ADT, nondeterministically choose an element from the ADT, check if all elements of an ADT have been processed, and destroy a copy of the ADT. See [Che90, Che91] for details concerning the development of an iterative statement and iterators. Seed develops the invariant, Invar, by replacing the constant I with the iterator variable pi; and Seed includes the appropriate bounds for pi (subset(pi, I)).

Invariant: Describes loop conditions before entry and upon exit.

$$Invar: (\forall p : member(pi, p) : p = set\_next(pi) \wedge p == prod(a, [\alpha, dot, b, \beta]) : member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma]))) \wedge subset(I, I') \wedge subset(pi, I)$$

Bound function: Provides an upper bound on the number of loop iterations.

    t (> 0): card(pi)

The loop is bounded by the cardinality of the set pi.

Guard of Loop: The Boolean iterator function is true when all elements have been processed; otherwise it returns the value false.

    B: set_done(pi)

Initialize Invariant: Creation of an iterator object that is a copy of the set ADT containing productions.

    pi := set_create(I)

$$wp(pi := set\_create(I), Invar) = (\forall p : member(I, p) \wedge p == prod(a, [\alpha, dot, b, \beta]) : member(P, prod(b, [\gamma])) \wedge member(I, prod(b, [dot, \gamma]))) \quad Invar$$

Make progress towards termination: Nondeterministically pick an element from the set.

    p := set_next(pi)

$$\begin{aligned} & wp(p := set\_next(pi), Invar) \quad Post'_2 \\ & = (\forall p : member(pi, set\_next(pi)) : set\_next(pi) == prod(a, [\alpha, dot, b, \beta]) \wedge member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma]))) \\ & \qquad \wedge subset(I, I') \wedge subset(set\_next(pi), I) \\ & \quad Invar \wedge set\_next(pi) == prod(a, [\alpha, dot, b, \beta]) \wedge member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma])) \end{aligned}$$

Seed must establish that the following expression $Post'_2$ holds:

$$set\_next(pi) == prod(a, [\alpha, dot, b, \beta]) \wedge member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma]))$$

A case analysis of $Post'_2$ gives the disjunctive expression:

$$\begin{aligned} & set\_next(pi) == prod(a, [\alpha, dot, b, \beta]) \wedge member(P, prod(b, [\gamma])) \wedge member(I', prod(b, [dot, \gamma])) \\ \vee\; & set\_next(pi) == prod(a, [\alpha, dot, b, \beta]) \wedge member(P, prod(b, [\gamma])) \wedge \neg(member(I', prod(b, [dot, \gamma]))) \end{aligned}$$

The first disjunct already satisfies the expression $Post'_2$; thus skip is synthesized as the guarded statement. The second disjunct has only one value that is not declared as a constant, namely, $I'$. Thus, since the axiom for member is defined in terms of the constructor operator insert, Seed uses the insert operation as the first attempt in order to satisfy the postcondition. The following statement is obtained:

    I' := insert(I', prod(b,[dot,γ]))

The corresponding wp is $wp(I' := insert(I', prod(b, [dot, \gamma])), Post'_2) = true$. Finally, Seed synthesizes the alternative statement

    if set_next(pi) == prod(a,[α,dot,b,β]) and member(P,prod(b,[γ]))
          and member(I',prod(b,[dot,γ])) -> skip
    || set_next(pi) == prod(a,[α,dot,b,β]) and member(P,prod(b,[γ]))
          and not(member(I',prod(b,[dot,γ]))) -> I' := insert(I',prod(b,[dot,γ]))
    fi;

$$\{ (I = I') \wedge closure = I \wedge I' \neq I' \;\vee\; (I \neq I') \wedge closure = closure(P, I') \}$$

The synthesized code for the closure function is given in Figure 8.
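The overall shape of the synthesized closure function, one pass over the item set followed by a recursive call until nothing new is added, can be sketched in Python. This is a hand-written illustration, not Seed's output, and it follows the usual LR(0) closure of [AU77] under encoding assumptions of our own: an item is a triple (lhs, rhs, dot) with the dot position given as an index into rhs, and a production is a pair (lhs, rhs).

```python
def closure(productions, items):
    """LR(0) closure: recurse until a pass over the items adds nothing new."""
    new_items = set(items)                    # I' starts out as a copy of I
    for lhs, rhs, dot in items:               # iterate over a snapshot of I
        if dot < len(rhs):
            b = rhs[dot]                      # symbol immediately after the dot
            for p_lhs, p_rhs in productions:
                if p_lhs == b:
                    new_items.add((b, p_rhs, 0))   # the item b -> . gamma
    if new_items == items:                    # I = I': nothing was added
        return items
    return closure(productions, new_items)    # I != I': close the larger set

# A small expression grammar, assumed only for this example.
GRAMMAR = [
    ("E'", ("E",)),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("id",)),
]
START_ITEMS = {("E'", ("E",), 0)}
```

Starting from the single item for the augmented start production, the first pass adds the two E items, the second adds the T item, and the third pass adds nothing, so the recursion stops there.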

5 Related Work

The AFFIRM system [Mus80a] provides a framework for the algebraic specification and verification of abstract data types and Pascal-like programs using these types. The programs referencing the abstract data types are written in Pascal while the verification conditions and their proofs are carried out in a framework based on rewrite rule theory [HO80].

The Larch Project [HW85] is developing tools and techniques to aid in the productive application of formal specifications. It is based on a two-tiered approach to specification. Each Larch specification has components written in two languages: one is designed for a specific programming language (interface) and the other is common to all programming languages (shared). Each Larch interface language deals with what can be observed about the behavior of components written in a particular programming language. It provides a way to write assertions about program states. This approach differs from ours in that all specification information given to Seed is implementation independent. Seed synthesizes program modules from predicate calculus specifications of pre- and postconditions.

    function closure(val P:set; var I:set): set;
    { I = I' }
    pi := set_create(I);
    { Invar: (∀p : member(pi, p) : p = set_next(pi) ∧ p == prod(a,[ ,dot,b, ]) :
                 member(P, prod(b,[ ])) ∧ member(I', prod(b,[dot, ])))
             ∧ subset(I, I') ∧ subset(pi, I') }
    do not(set_done(pi)) ->
        if set_next(pi) == prod(a,[ ,dot,b, ]) and (member(P,prod(b,[ ]))
              and member(pi,prod(b,[dot, ]))) -> skip
        || set_next(pi) == prod(a,[ ,dot,b, ]) and (member(P,prod(b,[ ])) and
              not(member(pi,prod(b,[dot, ])))) -> I' := insert(I',prod(b,[dot, ]))
        fi;
        { ... }
        p := set_next(pi);
    od;
    set_destroy(pi);
    if I' = I -> closure := I
    || I' ≠ I -> closure := closure(P,I')
    fi;
    { (I = I') ∧ closure = I ∧ I' ≠ I' ∨ (I ≠ I') ∧ closure = closure(P, I') }
    { Post: ((I = I') ∧ closure = I ∨ (I ≠ I') ∧ closure = closure(P, I')) ⇒
            (∀p : member(I, p) ∧ p == prod(a,[ ,dot,b, ]) :
                member(P, prod(b,[ ])) ∧ member(I', prod(b,[dot, ])))
            ∧ subset(I, I') }

Figure 8: closure function

6 Concluding Remarks

We have presented a system of rules that facilitate the automated synthesis of programs and data structures from specifications. The rules take specifications written in predicate logic as input and generate imperative programs as output. We developed rules to synthesize primitive programming structures, such as assignments, alternative statements, and iterative structures and their invariants. The syntheses of the programming structures are verified with Dijkstra's weakest precondition predicate transformer. This style of program development was extended to handle the implementation of programs for larger and more complex specifications, including rules for synthesis and later invocation of recursive and non-recursive procedures and functions. With the ability to encapsulate recursive and non-recursive procedures, we addressed the problem of implementing user-specified ADTs. The implementation of ADTs required the additional task of determining the completeness of the ADT specification with respect to the constructors. The Knuth-Bendix completion procedure [KB70] and the test set method [KNZ86] are used to perform the completion process. Pre- and postconditions are extracted from the complete specification and then used by the synthesis rules to generate the ADT operations in the form of recursive and non-recursive routines.

From the work that was performed with the Seed project, we have the means to determine the completeness of abstract data type specifications and produce correct implementations from formal specifications of procedural and data abstractions. Our current and future investigations into the use of formal methods in software development complement our work with the Seed system. We are currently developing tools that will support formal methods and object-oriented analysis development for the design, specification, implementation, and maintenance phases of software engineering.
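To give a rough feel for what completeness with respect to the constructors asks of a specification, the Python sketch below performs a purely syntactic case-coverage check. It is neither the Knuth-Bendix completion procedure nor the test set method, and it is not how Seed works: it only reports, for each operation, the constructors for which no axiom has that constructor as the outermost symbol of its first argument. The function name missing_cases and the (operation, constructor) encoding of axioms are assumptions made for illustration, and an axiom whose first argument is a plain variable is not representable in this encoding.

```python
def missing_cases(constructors, axioms):
    """Map each operation to the constructors it has no axiom for.

    constructors: iterable of constructor names, e.g. ["empty", "insert"].
    axioms: iterable of (operation, constructor) pairs, one per axiom,
            naming the constructor at the head of the first argument of
            the axiom's left-hand side.
    Operations whose axioms cover every constructor are omitted.
    """
    all_ctors = set(constructors)
    covered = {}
    for op, ctor in axioms:
        covered.setdefault(op, set()).add(ctor)
    gaps = {}
    for op, seen in covered.items():
        missing = all_ctors - seen
        if missing:
            gaps[op] = sorted(missing)
    return gaps
```

For a set specification with constructors empty and insert, an operation that has an axiom for the empty case but none for the insert case would be reported as incomplete, which is the kind of gap the completion process must close before pre- and postconditions can be extracted.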

References

[AU77] Alfred V. Aho and Jeffrey D. Ullman. Principles of Compiler Design. Addison-Wesley, Reading, MA, 1977.
[BC91] Robert H. Bourdeau and Betty H.C. Cheng. Spectacle: A Formal Specification Editor, 1991. In preparation.
[Che90] Betty Hsiao-Chih Cheng. Synthesis of Procedural and Data Abstractions. PhD thesis, University of Illinois at Urbana-Champaign, 1304 West Springfield, Urbana, Illinois 61801, August 1990. Tech Report UIUCDCS-R-90-1631.
[Che91] Betty H.C. Cheng. Synthesis of Procedural Abstractions from Formal Specifications. In COMPSAC'91: Computer Software and Applications Conference, 1991.
[Der82] Nachum Dershowitz. Applications of the Knuth-Bendix completion procedure. In Proceedings of the Seminaire d'Informatique Theorique, pages 95-111, Paris, France, December 1982.
[Der85] Nachum Dershowitz. Synthesis by completion. Fifth International Joint Conference on Artificial Intelligence, 1:208-214, 1985.
[Dij76] Edsger W. Dijkstra. A Discipline of Programming. Prentice Hall, Englewood Cliffs, New Jersey, 1976.
[DJ90] Nachum Dershowitz and Jean-Pierre Jouannaud. Rewrite systems. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, chapter 6. North-Holland, Amsterdam, 1990. In press; available as Rapport 478, LRI, Univ. Paris-Sud, France.
[GH78] J. V. Guttag and J. J. Horning. The Algebraic Specification of Abstract Data Types. Acta Informatica, 10:27-52, 1978.
[Gri81] David Gries. The Science of Programming. Springer-Verlag, 1981.
[HO80] G. Huet and D.C. Oppen. Formal Language Theory, Perspectives and Open Problems, chapter Equations and rewrite rules, pages 349-397. Academic, New York, 1980.
[HW85] J. V. Guttag, J. J. Horning, and J. M. Wing. Larch in five easy pieces. Technical report, Digital Systems Research Center, Palo Alto, California, July 1985.
[Jal89] Pankaj Jalote. Testing the completeness of specifications. IEEE Transactions on Software Engineering, 15(5):526-531, May 1989.
[KB70] D. Knuth and P. Bendix. Simple word problems in universal algebras. In J. Leech, editor, Computational Problems in Abstract Algebra, pages 263-297. Pergamon Press, Oxford, 1970.
[KNZ86] Deepak Kapur, Paliath Narendran, and Hantao Zhang. Proof by induction using test sets. In J.H. Siekmann, editor, Proceedings of the Eighth International Conference on Automata, pages 99-117, Oxford, England, July 1986. Vol. 230 of Lecture Notes in Computer Science, Springer, Berlin.
[KNZ91] Deepak Kapur, Paliath Narendran, and Hantao Zhang. Automating Inductionless Induction using Test Sets. J. Symbolic Computation, 11:83-111, 1991.
[LG86] Barbara Liskov and John Guttag. Abstraction and Specification in Program Development. MIT Press and McGraw-Hill, Cambridge, 1986.
[Mus80a] David R. Musser. Abstract Data Type Specification in the AFFIRM System. IEEE, 1(6), January 1980.
[Mus80b] David R. Musser. On Proving Inductive Properties of Abstract Data Types. In Conference Record of Third ACM Symposium on Principles of Programming Languages, pages 154-162, 1980.
