Electronic Notes in Theoretical Computer Science - CiteSeerX

Electronic Notes in Theoretical Computer Science

Rule Based Programming RULE 2006

Seattle, USA 11 August 2006

Guest Editors: Maribel Fernández and Ralf Lämmel

Contents

Preface

Claude Kirchner (Invited Speaker)
    Rewriting (your) calculus

Francisco Lopez-Fraguas and Jose Miguel Cleva
    Semantic determinism and functional logic program properties

Victor Winter
    Model-driven Transformation-based Generation of Java Stress Tests

Alcino Cunha and Joost Visser
    Strongly Typed Rewriting For Coupled Software Transformation

Emanuel Kitzelmann and Ute Schmid
    Inducing Constructor Systems from Example-Terms by Detecting Syntactical Regularities

Dick Kieburtz (Invited Speaker)
    Programmed Strategies for Program Verification

Florent Kirchner and Francois-Regis Sinot
    Rule-Based Operational Semantics for an Imperative Language

Peter Olveczky and Jose Meseguer
    Recent Advances in Real-Time Maude

Fernando Rosa-Velardo
    Coding Mobile Synchronizing Petri Nets into Rewriting Logic

Preface

The rule-based programming paradigm is characterised by the repeated, localised transformation of a data object such as a string, term, graph, proof or constraint store. The transformations are described by rules which separate the description of the object to be replaced (the pattern) from the calculation of the replacement. Optionally, rules can have further conditions that restrict their applicability. The transformations are controlled by explicit or implicit strategies.

The basic concepts of rule-based programming appear throughout computer science, from theoretical foundations to practical implementations: term rewriting is used to specify the semantics of programming languages, and graph rewriting is a popular programming language implementation technique. Rules are used implicitly or explicitly to perform computations, e.g., in Mathematica, OBJ, ELAN, Maude, or to perform deductions, e.g., by using inference rules to describe or implement a logic, theorem prover or constraint solver. Mail clients and mail servers use complex rules to help users organise their email and filter out spam. Language implementations use bottom-up rewrite systems for code generation (as in the BURG family of tools). Constraint-handling rules (CHRs) are used to specify and implement constraint-based algorithms and applications. Rule-based programming idioms also give rise to programming languages and systems such as Claire, Elan, Maude and Stratego.

This volume contains the papers presented at the 7th International Workshop on Rule Based Programming, RULE 2006, which was held in Seattle on 11 August 2006, as part of the Federated Logic Conference (FLoC). Previous editions of this workshop were held at Nara (2005), Aachen (2004), Valencia (2003), Pittsburgh (2002), Florence (2001) and Montreal (2000). The Programme Committee selected seven papers to be presented at RULE 2006, which can be found in these proceedings.

In addition, the programme of RULE 2006 included two invited talks (joint with the Workshop on Rewriting Strategies), by Claude Kirchner and Dick Kieburtz.

The Programme Committee consisted of:
• Mark van den Brand (TU Eindhoven, Netherlands)
• Horatiu Cirstea (LORIA, France)
• Pierre Deransart (INRIA Rocquencourt, France)
• Michael L. Collard (Kent State University, USA)
• Martin Erwig (Oregon State University, USA)
• François Fages (INRIA Rocquencourt, France)
• Maribel Fernández (Co-Chair, King's College London, UK)
• Jean-Pierre Jouannaud (LIX, Ecole Polytechnique, France)
• Oleg Kiselyov (FNMOC, USA)
• Ralf Lämmel (Co-Chair, Microsoft, USA)
• Ugo Montanari (Università di Pisa, Italy)
• Pierre-Etienne Moreau (LORIA, France)
• Tobias Nipkow (Technical University Munich, Germany)
• Tom Schrijvers (K.U.Leuven, Belgium)
• Martin Sulzmann (National University of Singapore, Singapore)
• Victor Winter (University of Nebraska at Omaha, USA)

We would like to thank all those who contributed to RULE 2006. We are grateful to the Programme Committee members for their careful and efficient work in reviewing the papers.

Maribel Fernández and Ralf Lämmel
30 June 2006


Rewriting (your) Calculus (Abstract)

Claude Kirchner
INRIA and LORIA
615, rue du Jardin Botanique, BP 101, 54602 Villers les Nancy Cedex, France

Rewriting is clearly established as a general paradigm whose agility makes it easy to express and reason about computation and deduction. The rewriting calculus, generalizing the lambda calculus and the rewriting relation, provides us with a theoretical and uniform foundation for this expressivity. Introduced in the late nineties, the framework and its meta-properties are now better understood. We will show why the calculus is well-suited to represent computation as well as deduction, and therefore deduction modulo. The rewriting calculus is therefore a good candidate to back proof assistants in which users can adapt the computation mechanism to their needs. Furthermore, we want the next generation of proof assistants to let users adapt the deduction system to their needs as well. We will see how this could be designed and how it can be used to obtain better higher-level logical representations of user-defined theories.

This paper is electronically published in Electronic Notes in Theoretical Computer Science URL: www.elsevier.nl/locate/entcs


Semantic determinism and functional logic program properties ⋆

José Miguel Cleva, Francisco J. López-Fraguas
Dpto Sistemas Informáticos y Programación
Universidad Complutense de Madrid
e-mail {jcleva,fraguas}@sip.ucm.es

Abstract

In modern functional logic languages like Curry or Toy, programs are possibly non-confluent and non-terminating rewrite systems, defining possibly non-deterministic non-strict functions. Therefore, equational reasoning is not valid for deriving properties of such programs. In a previous work we showed how a mapping from CRWL (a well-known logical framework for functional logic programming) into logic programming could in principle be used as a logical conceptual tool for proving properties of functional logic programs. A severe problem faced in practice is that simple properties, even if they do not involve non-determinism, require difficult proofs when compared to those obtained using equational specifications and methods. In this work we improve our approach by taking into account determinism of (part of) the considered programs. This results in significantly shorter proofs when we put our methods into practice using standard systems supporting equational reasoning like, e.g., Isabelle.

1 Introduction

A frequent claim about declarative languages is that the task of reasoning about programs is easier than in other programming paradigms because of the existence of an underlying logic providing more or less natural logical methods for that purpose. Although this assertion is essentially true, such logical methods do not come for free with the language, not even when it is provided with sound semantic foundations (e.g., logic or model theoretic) although these are of considerable help. Moreover, achieving effective methods in practice can be difficult.

⋆ The authors have been partially supported by the Spanish projects TIN2005-09207-C0303 ‘MERIT-FORMS’ and S-0505/TIC/0407 ‘PROMESAS-CAM’.


In the case of modern functional logic programming (FLP, in short), realized in systems like Curry [9] or Toy [11], the main problem to face is that equational reasoning is not valid for reasoning about programs, which are constructor-based rewrite systems, possibly non-terminating and non-confluent. Semantically this leads to the presence of non-strict and non-deterministic functions, which have been shown to be quite useful for practical declarative programming. When reasoning about functional logic programs, non-determinism precludes the direct use of well-known existing methods and tools developed for equational specifications, like those coming with Isabelle [13] or Coq [2], which usually also require termination. Rewriting logic in the sense of [12] and related verification tools [4] cannot be applied directly, since the semantics for non-determinism of rewriting logic is run-time choice, instead of the call-time choice criterion adopted in FLP [8].

In a previous work [5] we started what is, to our knowledge, the first general framework for the verification of program properties for FLP with non-deterministic functions. Our work was based on CRWL¹ [7,8], a well-established semantic framework for FLP. The idea was to map CRWL into first order logic (FOL) in the following sense: the CRWL-semantics of a program $P$, given by a reduction relation $\to$, is expressed by means of a FOL theory, actually a logic program $P_L$, whose least model corresponds closely to the CRWL-initial model of $P$. Then FOL methods can be used to prove the properties of interest, which are those valid in the least model of the program. In practice, large parts of programs are made of 'classical', deterministic, even terminating, functions.
In the approach of [5], no benefit is taken from this knowledge, since the CRWL framework itself does not make any distinction between these well-behaved functions and the rest: in the reduction relation of CRWL, all functions are implicitly considered as potentially non-strict and non-deterministic. An unpleasant consequence is that the proofs of simple properties concerning deterministic functions are much more complex than the corresponding proofs using equational methods. For instance, the commutativity of the addition of natural numbers requires within CRWL a long proof in Isabelle (more than two pages) or ITP, while it is almost automatic using an equational specification of addition.

To overcome this problem we refine here the CRWL logic to take into account that certain fragments of a program can be deterministic. We prove the technical soundness of the refinement by means of an equivalence theorem with respect to the original logic. The refinement therefore allows us to specify equationally the deterministic parts of a program, resulting in much shorter proofs when using tools supporting equational reasoning, like Isabelle or ITP.

The remainder of the paper is organized as follows. The next section presents some preliminaries about CRWL. Section 3 is the core of the paper, where we give semantic notions related to determinism, propose a suitable refinement of CRWL related to them, and prove an equivalence result for the two versions of CRWL. In Section 4 we discuss the application to the verification of program properties. Finally, Section 5 summarizes some conclusions. Proofs can be found in http://gpd.sip.ucm.es/fraguas/rule06long.pdf.

¹ CRWL stands for ‘Constructor based ReWriting Logic’.

2 CRWL programs and their logical semantics

We recall the essential notions about CRWL needed for this work. See [8] for details and [5] for a discussion about slight changes in our presentation of CRWL with respect to the original one.

We assume a signature $\Sigma = CS_\Sigma \cup FS_\Sigma$ where $CS_\Sigma = \bigcup_{n \in \mathbb{N}} CS_\Sigma^n$ is a set of constructor symbols and $FS_\Sigma = \bigcup_{n \in \mathbb{N}} FS_\Sigma^n$ is a set of function symbols, all of them with associated arity and such that $CS_\Sigma \cap FS_\Sigma = \emptyset$. We also assume a countable set $\mathcal{V}$ of variable symbols. We write $Exp_\Sigma$ for the set of (total) expressions built up with $\Sigma$ and $\mathcal{V}$ in the usual way, and we distinguish the subset $CTerm_\Sigma$ of (total) constructor terms or (total) c-terms, which only make use of $CS_\Sigma$ and $\mathcal{V}$. The subindex $\Sigma$ will usually be omitted. Expressions are intended to represent possibly reducible expressions, while c-terms represent data values that cannot be reduced further.

The signature $\Sigma_\perp$ results from extending $\Sigma$ with the new constant (0-arity constructor) $\perp$, which plays the role of the undefined value. The sets $Exp_\perp$ and $CTerm_\perp$ of (partial) expressions and (partial) c-terms respectively are built up using $\Sigma_\perp$. Partial c-terms represent the result of partially evaluated expressions; thus, they can be seen as approximations to the value of expressions.

As usual notation we will write $X, Y, Z, \ldots$ for variables, $c, d$ for constructor symbols, $f, g$ for functions, $e$ for expressions, $s, t$ for c-terms, and $\bar{x}$ for tuples of $x$'s. In all cases, primes (') and subindices can be used.

Expressions can be compared by the approximation ordering $\sqsubseteq$, defined as the least partial ordering verifying: $\perp \sqsubseteq e$ and $e_1 \sqsubseteq e'_1 \wedge \ldots \wedge e_n \sqsubseteq e'_n \Rightarrow h(e_1, \ldots, e_n) \sqsubseteq h(e'_1, \ldots, e'_n)$, for $h \in CS^n \cup FS^n$. We will use the sets of substitutions $CSubst = \{\theta : \mathcal{V} \to CTerm\}$ and $CSubst_\perp = \{\theta : \mathcal{V} \to CTerm_\perp\}$.

2.1 The Proof Calculus for CRWL

Throughout this paper a CRWL-program $P$ is a finite set of rewrite rules of the form $f(t_1, \ldots, t_n) \to e$ where $f \in FS^n$, $(t_1, \ldots, t_n)$ is a linear tuple (each variable in it occurs only once) of c-terms, and $e$ is an expression. Notice that $\perp$ does not occur in programs. We write $P_f$ for the set of defining rules of $f$ in $P$.

The CRWL-program in Fig. 1 will be used as an example to illustrate several points throughout the paper. It uses the data constructors $0$ and $s$ to represent natural numbers. Notice that the program is non-confluent (see $coin$ and $g$) and non-terminating (see $loop$ and $g$).

$$\begin{array}{lll}
0 + Y \to Y & coin \to 0 & loop \to loop \\
s(X) + Y \to s(X + Y) & coin \to s(0) & g(0) \to 0 \\
double(X) \to X + X & f(X) \to 0 & g(X) \to s(g(X))
\end{array}$$

Fig. 1. CRWL sample program Coin

From a given program $P$, the proof calculus for CRWL can derive reduction or approximation statements of the form $e \to t$, with $e \in Exp_\perp$ and $t \in CTerm_\perp$. The intended meaning of such a statement is that $e$ can be reduced to $t$, where reduction may be done by applying rewriting rules of $P$ or by replacing subterms of $e$ by $\perp$. We write $P \vdash_{CRWL} e \to t$ to express derivability, and define the denotation of $e \in Exp_\perp$ as $[\![e]\!]^P = \{t \in CTerm_\perp : P \vdash_{CRWL} e \to t\}$. The superscript $P$ is usually omitted.

When using a function rule $R$ to derive statements, the calculus uses the so-called c-instances of $R$, defined as $[R]_\perp = \{R\theta \mid \theta \in CSubst_\perp\}$. We write $[P]_\perp$ for the set of c-instances of all the rules of a program $P$. Parameter passing in function calls is expressed by means of these c-instances in the proof calculus.

Figure 2 shows the proof calculus for CRWL.

$$\text{(BT)}\quad e \to \perp \qquad \text{for any } e \in Exp_\perp$$

$$\text{(CS)}\quad \dfrac{e_1 \to t_1 \;\;\ldots\;\; e_n \to t_n}{c(e_1, \ldots, e_n) \to c(t_1, \ldots, t_n)} \qquad c \in CS^n,\; t_i \in CTerm_\perp,\; e_i \in Exp_\perp$$

$$\text{(FR)}\quad \dfrac{e_1 \to t_1 \;\;\ldots\;\; e_n \to t_n \quad e \to t}{f(e_1, \ldots, e_n) \to t} \qquad \text{if } f(t_1, \ldots, t_n) \to e \in [P]_\perp$$

Fig. 2. The CRWL proof calculus

The rule (FR) allows the use of c-instances of program rules to prove approximations. These c-instances may contain $\perp$, and by rule (BT) any expression can be reduced to $\perp$. This reflects a non-strict semantics, allowing non-terminating programs to have a meaning different from $\perp$. The use of c-instances in rule (FR) instead of general instances corresponds to call-time choice semantics for non-determinism [10,7,8]. In the example, it is possible to build a CRWL-proof for the reduction $double(coin) \to 0$ and also for $double(coin) \to s(s(0))$, but not for $double(coin) \to s(0)$. In contrast, $coin + coin$ can be reduced to $0$, $s(0)$ and $s(s(0))$.

Call-time choice is related to sharing, a well-known operational technique considered essential for the effective implementation of lazy functional languages like Haskell, and also adopted in existing FLP languages like Curry or Toy. Run-time choice, an alternative semantics for non-determinism with which $double(coin)$ can also be reduced to $s(0)$, is investigated for the FLP setting in [1].

From the point of view of verifying properties of FLP programs, non-determinism and call-time choice semantics imply that equational reasoning is not valid for CRWL-programs. In the previous example, if the rules for $coin$ were understood as the equalities $coin = 0$ and $coin = s(0)$, then we could deduce $0 = s(0)$, which is not intended. Call-time choice implies that not only equational reasoning, but also ordinary rewriting is invalid, since rewriting allows one to obtain $double(coin) \to s(0)$, which is not valid with call-time choice.

CRWL is also provided with a model-theoretic semantics, in which every program has a least Herbrand model, which is an initial model. See [8] for details.
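The difference between call-time and run-time choice for the `coin` example can be illustrated with a small sketch. It is not taken from the paper: it assumes Peano numerals are modelled as Python integers, only total values are tracked, and the helper names (`plus`, `double_call_time`, `double_run_time`) are hypothetical.

```python
from itertools import product

# Total values of the 0-ary function coin (rules coin -> 0 and coin -> s(0)),
# with Peano numerals written as Python ints (an assumption of this sketch).
COIN = {0, 1}

def plus(xs, ys):
    """All sums of one value taken from xs and one value taken from ys."""
    return {x + y for x, y in product(xs, ys)}

def double_call_time(arg_values):
    """Call-time choice: the argument is fixed to one value, then shared."""
    return {v + v for v in arg_values}

def double_run_time(arg_values):
    """Run-time choice: each copy of the argument is chosen independently."""
    return plus(arg_values, arg_values)

print(sorted(double_call_time(COIN)))  # [0, 2]
print(sorted(plus(COIN, COIN)))        # [0, 1, 2]
print(sorted(double_run_time(COIN)))   # [0, 1, 2]
```

Under call-time choice `double(coin)` never yields `s(0)`, while `coin + coin` does, which matches the reductions listed in the text.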

3 An improved CRWL-calculus for deterministic program fragments

We remark that the CRWL calculus is a way of fixing the logical semantics of a program, determining formally the set of possible (partial) values of a given expression, but it is not meant as an operational procedure. As a matter of fact, the calculus has a certain degree of non-determinism other than that coming from program rules. For instance, using the program of Fig. 1, there are two CRWL-derivations for $coin + coin \to s(0)$, but this is natural since each $coin$ can be reduced independently to $0$ and $s(0)$. But we also have that $f(double(s(s(0)))) \to 0$ has 152 (!) different CRWL-derivations, despite the fact that in this case all the involved functions are deterministic. This fact causes no harm to the original purposes of CRWL, but it is a source of practical problems when trying to reason about properties of programs.

This section presents an improvement on the CRWL calculus to deal equationally with deterministic parts of a program. The first thing to do is to determine which notion of determinism is adequate.

3.1 Preliminary semantic concepts about determinism

We recall the definition $[\![e]\!] = \{t \in CTerm_\perp : e \to t\}$. It is an easy fact that for any $e$, $[\![e]\!]$ is not empty ($\perp \in [\![e]\!]$) and downward closed (i.e., $t \in [\![e]\!] \wedge t \sqsupseteq t' \Rightarrow t' \in [\![e]\!]$). We need first some additional definitions about denotations of expressions.

Definition 3.1 Let $e \in Exp_\perp$.

(a) The total denotation of $e$ is defined as $[\![e]\!]^T = \{t \in CTerm : t \in [\![e]\!]\}$. Trivially $[\![e]\!]^T \subseteq [\![e]\!]$.

(b) The expression $e$ is finite iff $[\![e]\!]$ is a finite set.

(c) The expression $e$ is (semantically) totally defined iff $[\![e]\!] = [\![e]\!]^T\!\downarrow$, where the downwards closure $S\!\downarrow$ of $S \subseteq CTerm_\perp$ is defined as $S\!\downarrow = \{t \in CTerm_\perp : t \sqsubseteq t' \text{ for some } t' \in S\}$.

Some examples follow: $[\![double(coin)]\!] = \{\perp, 0, s(\perp), s(s(\perp)), s(s(0))\}$, while $[\![double(coin)]\!]^T = \{0, s(s(0))\}$. Hence $double(coin)$ is finite and totally defined. Denotations, even total denotations, can be infinite. For instance, $[\![g(0)]\!]^T = \{0, s(0), s(s(0)), \ldots\}$. Therefore $g(0)$ is infinite, and it is easy to see that it is also totally defined. Rather different is the case of $g(s(0))$, which is infinite since $[\![g(s(0))]\!] = \{\perp, s(\perp), s(s(\perp)), \ldots\}$, but it is not totally defined, since $[\![g(s(0))]\!]^T = \emptyset$.

We now give a first notion of determinism of expressions and functions.

Definition 3.2

(a) An expression $e \in Exp_\perp$ is deterministic iff $[\![e]\!]$ is a directed set, that is, given $t, t' \in [\![e]\!]$ there exists $t'' \in [\![e]\!]$ such that $t \sqsubseteq t''$ and $t' \sqsubseteq t''$. A function $f \in FS$ is deterministic if for each $\bar{t} \in CTerm_\perp$, $f(\bar{t})$ is a deterministic expression.

(b) An expression $e \in Exp_\perp$ is strongly deterministic iff $e$ is deterministic, finite and totally defined. A function $f \in FS$ is strongly deterministic if for every $\bar{t} \in CTerm$, $f(\bar{t})$ is also strongly deterministic.

With these definitions, $coin$, $double(coin)$ and $g(0)$ are examples of non-deterministic expressions. $double(s(0))$ is strongly deterministic, as is $f(coin)$, despite the presence of $coin$. $loop$ and $g(s(0))$ are deterministic, but not strongly deterministic, because they are not totally defined. With respect to functions, $+$, $double$ and $f$ are strongly deterministic, $loop$ is deterministic (but not strongly) and $coin$ and $g$ are not deterministic.

Notice that the property of being deterministic, as stated in (a), has nothing to do with non-termination and partiality. Those conditions also cause problems for equational reasoning and are typically forbidden in the equational part of proof assistants, like Isabelle, Coq or ITP.
Although we do not need to use the formal notion of non-termination in our work, the notion of strong determinism intuitively tries to avoid it, as well as partiality. We remark also that strong determinism is somehow related to the notion of confluence, but does not coincide with it for several reasons: first, confluence in the sense of ordinary rewriting is not adequate for CRWL due to call-time choice semantics, and there is no obvious alternative way of defining it. But even if we ignore this, strong determinism might hold for an expression in the absence of confluence in the classical sense: consider for example a 0-ary function $h$ defined by the rules $h \to 0$ and $h \to fail$, where there is no rule for $fail$; then the expression $h$ is strongly deterministic but its set of rules is not confluent.

An interesting consequence of strong determinism is:

Proposition 3.3 If an expression $e$ is strongly deterministic then $[\![e]\!]^T$ is a unitary set, that is, $[\![e]\!]^T = \{t\}$. Such $t$ is called the value of $e$.

Notice that the opposite does not hold. Consider for example a 0-ary function $h$ defined by the rules $h \to s(s(loop))$ and $h \to s(0)$. Then $[\![h]\!]^T = \{s(0)\}$ but $h$ is not strongly deterministic as it is not deterministic, because $s(0), s(s(\perp)) \in [\![h]\!]$, but they have no common upper bound in $\sqsubseteq$.

The refinement of CRWL we are looking for will try to prove $e \to t$ by equational means, where $e$ is strongly deterministic and $t$ is its value. Strong determinism will indeed play an important role in the refinement, but it is still not sufficient, since strongly deterministic functions might use non-deterministic functions in their definitions. As a simple example, consider the function $l$ defined by the rules $l(0) \to 0$, $l(s(0)) \to 0$, $l(s(s(X))) \to s(0)$, and the 0-ary $k \to l(coin)$. It is easy to see that $k$ is strongly deterministic and its value is $0$, but $k \to 0$ cannot be proved only by equational means, due to the presence of $coin$ in its definition. For this reason, we strengthen the notion of strong determinism to the following one:

Definition 3.4 Let $P$ be a CRWL-program, and $D \subseteq FS$ a set of function symbols verifying that all functions in $D$ are strongly deterministic and all their defining rules use only function symbols from $D$. We say that $e \in Exp$ is $D$-globally deterministic if every function symbol of $e$ is from $D$.

For a fixed $P$ we assume also a fixed $D$, and the mention of $D$ will usually be omitted. In our example, we can set $D = \{+, double, f\}$, and therefore the globally deterministic expressions are those not containing function symbols other than those. Notice that globally deterministic expressions do not contain $\perp$. Notice also that if $e = f(e_1, \ldots, e_n)$ is globally deterministic then each $e_i$ is also globally deterministic. We will need the following natural result, which is surprisingly not technically trivial:

Proposition 3.5 If an expression $e \in Exp$ is globally deterministic then $e$ is strongly deterministic.
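The semantic notions above can be checked mechanically on finite denotations. The following is a minimal sketch, not part of the paper: it assumes partial c-terms are encoded as nested tuples, that denotations are given explicitly as finite sets, and all function names are hypothetical. The denotation used for $double(s(0))$ was worked out by hand for this sketch; the other two sets are those stated in the text.

```python
# Partial c-terms as nested tuples: BOT is the undefined value,
# ZERO is the constructor 0 and s(t) is the successor of t.
BOT = ("bot",)
ZERO = ("0",)

def s(t):
    return ("s", t)

def approx(t, u):
    """Approximation ordering: True iff t is below (or equal to) u."""
    if t == BOT:
        return True
    return t[0] == u[0] and len(t) == len(u) and all(
        approx(a, b) for a, b in zip(t[1:], u[1:]))

def is_total(t):
    """True iff t contains no occurrence of BOT."""
    return t != BOT and all(is_total(a) for a in t[1:])

def total_denotation(den):
    return {t for t in den if is_total(t)}

def is_directed(den):
    """Every pair of elements has an upper bound inside den."""
    return all(any(approx(t, w) and approx(u, w) for w in den)
               for t in den for u in den)

def is_totally_defined(den):
    """Every element approximates some total element of den."""
    totals = total_denotation(den)
    return all(any(approx(t, u) for u in totals) for t in den)

def is_strongly_deterministic(den):
    # Finiteness holds trivially for an explicitly listed set.
    return is_directed(den) and is_totally_defined(den)

den_double_coin = {BOT, ZERO, s(BOT), s(s(BOT)), s(s(ZERO))}
den_double_s0 = {BOT, s(BOT), s(s(BOT)), s(s(ZERO))}  # hand-computed
den_h = {BOT, s(BOT), s(s(BOT)), s(ZERO)}  # h -> s(s(loop)), h -> s(0)

print(is_directed(den_double_coin))              # False
print(is_strongly_deterministic(den_double_s0))  # True
print(is_directed(den_h))                        # False
```

The last set illustrates why the converse of Proposition 3.3 fails: its total denotation is a singleton, yet the set is not directed.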
We are now prepared to present the announced refinement of CRWL.

3.2 A refined CRWL proof calculus for equational reasoning

The rules for the new calculus CRWLE can be found in Fig. 3. As is apparent, the calculus consists of two sets of rules, one defining the relation $e \to t$, which is still the 'top-level' relation and has the same meaning as before, and the other defining the auxiliary relation $e = t$, reserved for globally deterministic expressions $e$, and with the meaning '$t$ is the value of $e$'. The rule (EQ) connects both relations, by stating that the only way of deriving in CRWLE a reduction $e \to t$ for a globally deterministic expression $e$ is to derive the value $t'$ of $e$ and then decrease $t'$ in the ordering $\sqsupseteq$ to obtain $t$.

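$$\text{(BT)}\quad e \to \perp \qquad \text{if } e \text{ is not globally deterministic}$$

$$\text{(CS)}\quad \dfrac{e_1 \to t_1 \;\;\ldots\;\; e_n \to t_n}{c(e_1, \ldots, e_n) \to c(t_1, \ldots, t_n)} \qquad c \in CS^n,\; t_i \in CTerm_\perp, \text{ and } c(\bar{e}) \text{ is not globally deterministic}$$

$$\text{(FR)}\quad \dfrac{e_1 \to t_1 \;\;\ldots\;\; e_n \to t_n \quad e \to t}{f(e_1, \ldots, e_n) \to t} \qquad \text{if } t \not\equiv \perp,\; f(t_1, \ldots, t_n) \to e \in [P]_\perp, \text{ and } f(\bar{e}) \text{ is not globally deterministic}$$

$$\text{(EQ)}\quad \dfrac{e = t'}{e \to t} \qquad \text{if } e \text{ globally deterministic and } t \sqsubseteq t'$$

$$\text{(CSE)}\quad \dfrac{e_1 = t_1 \;\;\ldots\;\; e_n = t_n}{c(e_1, \ldots, e_n) = c(t_1, \ldots, t_n)} \qquad c \in CS^n,\; t_i \in CTerm, \text{ and } c(\bar{e}) \text{ is globally deterministic}$$

$$\text{(FRE)}\quad \dfrac{e_1 = t_1 \;\;\ldots\;\; e_n = t_n \quad e = t}{f(e_1, \ldots, e_n) = t} \qquad \text{if } f(t_1, \ldots, t_n) \to e \in [P], \text{ and } f(\bar{e}) \text{ is globally deterministic}$$

Fig. 3. Proof calculus CRWLE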

This latter step is arguable: it is needed to guarantee the strong equivalence result given below; but if we admit a weaker correspondence between the two calculi, we could leave $e \to t'$ as the only possible reduction for the globally deterministic expression $e$. Something similar, but limited to the special case of c-terms, was done in [5], and contributes to further reducing the space of derivations.

We remark, as was already done for CRWL, that the refinement CRWLE should not be understood as an operational procedure, nor is it intended to achieve efficiency in the execution of deterministic parts. It does not even aim at obtaining shorter CRWL-derivations: for instance, at least one of the 152 CRWL-derivations for $f(double(s(s(0)))) \to 0$ (namely, the one reducing $double(s(s(0)))$ to $\perp$, since $f$ is not strict) is shorter than the only CRWLE derivation existing for such reduction, which requires reducing $double(s(s(0)))$ to its value $s(s(s(s(0))))$. The purpose of CRWLE is to simplify the space of derivations and therefore the reasoning about programs, by using equations as much as possible.

We notice also that the symbol $=$ and its rules in the proof calculus are related to the joinability relation $\bowtie$ of [8] (strict equality in the systems Curry or Toy), but there are important differences, apart from their different purposes ($=$ is not a program construct). For instance, $coin \bowtie 0$ can be proved with the rules of [8], but $coin = 0$ is not provable in CRWLE.

We finally remark that the condition of being globally deterministic is of semantic nature and, therefore, typically undecidable. Investigating sufficient decidable criteria would be of clear interest in practice, but it is out of the scope of this paper.

3.3 Relation between CRWL and CRWLE

In this section we give the main results relating the reductions obtained from the original calculus CRWL and those obtained in CRWLE. The following lemma relates the reductions obtained in the equational part of the CRWLE calculus with the original CRWL calculus.

Lemma 3.6 Let $P$ be a CRWL program, $e \in Exp$ a globally deterministic expression and $t \in CTerm$. Then

$$P \vdash_{CRWL} e \to t \;\Leftrightarrow\; P \vdash_{CRWLE} e = t$$

In other terms, the lemma ensures that the equational part of CRWLE exactly proves $e = t$ for the value $t$ of $e$, as intended. The next result shows the strong equivalence between CRWLE and CRWL, since the refinement preserves the reduction relation $\to$ of CRWL, for arbitrary partial expressions and c-terms.

Proposition 3.7 Let $P$ be a CRWL program, $e \in Exp_\perp$ any expression and $t \in CTerm_\perp$ any partial c-term. Then

$$P \vdash_{CRWLE} e \to t \;\Leftrightarrow\; P \vdash_{CRWL} e \to t$$
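To make the role of the equational part concrete, here is a minimal sketch (not the authors' implementation) of an evaluator for globally deterministic expressions of the sample program, with $D = \{+, double, f\}$. It assumes expressions are nested tuples; `value` plays the part of the relation $e = t$ (rules (CSE) and (FRE)), and `reduces_to` mirrors rule (EQ). Both names are hypothetical.

```python
BOT = ("bot",)
ZERO = ("0",)

def s(e):
    return ("s", e)

def approx(t, u):
    """Approximation ordering on partial c-terms."""
    if t == BOT:
        return True
    return t[0] == u[0] and len(t) == len(u) and all(
        approx(a, b) for a, b in zip(t[1:], u[1:]))

def value(e):
    """Value of a globally deterministic expression over {+, double, f}."""
    head = e[0]
    args = [value(a) for a in e[1:]]      # premises e_i = t_i
    if head in ("0", "s"):                # (CSE): rebuild the constructor
        return (head, *args)
    if head == "+":                       # (FRE) with the two rules for +
        x, y = args
        if x == ZERO:                     # 0 + Y -> Y
            return y
        return s(value(("+", x[1], y)))   # s(X) + Y -> s(X + Y)
    if head == "double":                  # double(X) -> X + X
        return value(("+", args[0], args[0]))
    if head == "f":                       # f(X) -> 0
        return ZERO
    raise ValueError("not globally deterministic: " + repr(e))

def reduces_to(e, t):
    """(EQ): e -> t holds iff t approximates the value of e."""
    return approx(t, value(e))

two = s(s(ZERO))
print(value(("double", two)) == s(s(s(s(ZERO)))))  # True
print(value(("f", ("double", two))) == ZERO)       # True
print(reduces_to(("double", two), s(s(BOT))))      # True
print(reduces_to(("double", two), s(ZERO)))        # False
```

Note that `value` evaluates the argument of `f` before applying the rule for `f`, which corresponds to the single CRWLE derivation described in the text rather than to the shorter CRWL derivation that reduces the argument to $\perp$.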

4 Application to the verification of CRWL program properties

The previous calculus can be used for verification of properties of functional logic programs. Two essential questions arise in this sense: what is the language of the properties of interest, and what does validity mean for a given property?

For verification purposes, the properties we are interested in are those relating the possible reductions of expressions in the CRWLE calculus. The properties are then specified as FOL formulas over the relations $\to$ and $=$. In many cases, we want the quantifiers to range over a restricted universe, as we are interested in properties valid only for $CTerm$ or $CTerm_\perp$; therefore we also include two predicates in the language of properties to deal with such restriction, namely $tot$ and $term$ respectively. We can also be interested in properties that have to do with globally deterministic expressions. For that reason, we introduce another predicate $gd$ for those expressions. Summarizing, properties are expressed as FOL formulae over the relations $\{\to, =, tot, term, gd\}$.

With respect to validity, in [5] we translated the CRWL calculus into FOL by associating a logic program to a CRWL-program, and verified properties of the CRWL program in the least model of the associated logic program. We follow here a similar approach for CRWLE: defining a logic program and proving properties in the least model of the logic program. For this purpose we consider the logic program $P_L$ associated to a program $P$ of CRWL. Notice that, although programs do not change when moving from CRWL to CRWLE, the associated logic programs do, because the logic has changed. In this case we also need to define the three auxiliary predicates mentioned above to distinguish between different kinds of expressions, another predicate $ngd$ for expressions that are not globally deterministic, and the relation for the CRWL approximation ordering $\sqsubseteq$, called $approx$ in the logic program.

The logic program $P_L$ for every CRWL program $P$ is obtained using the rules of Figure 4, where $ngd$ and $term$ are defined in a similar way as $gd$ and $tot$ respectively. The implication symbol in clauses is written as $\Leftarrow$.

Since in this approach validity of a given property of a CRWL-program $P$, expressed as a FOL formula $\varphi$, means validity of $\varphi$ in the least model of the corresponding logic program $P_L$, it is important to ensure that the logic of $P_L$ (given by FOL) and the logic of $P$ (given by CRWL and CRWLE) have a good correspondence. The following results relate both.

Proposition 4.1 Let $P$ be a CRWL-program and $P_L$ its corresponding logic program. Then, for any expression $e$ and term $t$,
(i) $P_L \models e = t \Leftrightarrow P \vdash_{CRWLE} e = t$
(ii) $P_L \models e \to t \Leftrightarrow P \vdash_{CRWLE} e \to t$
(iii) $P_L \models term(e) \Leftrightarrow e \in CTerm_\perp$
(iv) $P_L \models tot(e) \Leftrightarrow e \in CTerm$
(v) $P_L \models gd(e) \Leftrightarrow e$ is globally deterministic
(vi) $P_L \models approx(e, e') \Leftrightarrow e \sqsubseteq e'$

Therefore, by Lemma 3.6 and Proposition 3.7 we have the following corollary:

Corollary 4.2 Let $P$ be a CRWL-program and $P_L$ its corresponding logic program. Then, for any $e \in Exp_\perp$ and $t \in CTerm_\perp$,
(i) $P_L \models e \to t \Leftrightarrow P \vdash_{CRWL} e \to t$.
(ii) If $e$ is globally deterministic, and $t \in CTerm$, then $P_L \models e = t \Leftrightarrow P \vdash_{CRWL} e \to t \Leftrightarrow t$ is the value of $e$.

We conclude that the reductions obtained from the logic program are the same as those for the original CRWL program. Moreover, whenever we refer to an equation $e = t$, it is because $t$ is the value of $e$ in CRWL.

$X \to \perp \Leftarrow ngd(X)$

$X \to T \Leftarrow gd(X) \wedge X = T' \wedge approx(T, T')$

For every $c \in CS$:
$c(E_1, \ldots, E_n) \to c(T_1, \ldots, T_n) \Leftarrow E_1 \to T_1 \wedge \ldots \wedge E_n \to T_n \wedge ngd(c(E_1, \ldots, E_n))$

For every $f \in FS$ and every rule $f(t_1, \ldots, t_n) = e \in P$:
$f(E_1, \ldots, E_n) \to T \Leftarrow E_1 \to t_1 \wedge \ldots \wedge E_n \to t_n \wedge e \to T \wedge ngd(f(E_1, \ldots, E_n))$

For every $c \in CS$:
$c(E_1, \ldots, E_n) = c(T_1, \ldots, T_n) \Leftarrow E_1 = T_1 \wedge \ldots \wedge E_n = T_n \wedge gd(c(E_1, \ldots, E_n))$

For every $f \in FS$ and every rule $f(t_1, \ldots, t_n) = e \in P$:
$f(E_1, \ldots, E_n) = T \Leftarrow E_1 = t_1 \wedge \ldots \wedge E_n = t_n \wedge e = T \wedge gd(f(E_1, \ldots, E_n))$

For every $c \in CS$:
$gd(c(E_1, \ldots, E_n)) \Leftarrow gd(E_1) \wedge \ldots \wedge gd(E_n)$

For every $f \in GDFS$:
$gd(f(E_1, \ldots, E_n)) \Leftarrow gd(E_1) \wedge \ldots \wedge gd(E_n)$

For every $c \in CS$:
$tot(c(E_1, \ldots, E_n)) \Leftarrow tot(E_1) \wedge \ldots \wedge tot(E_n)$

$approx(\perp, X)$

For every $c \in CS$:
$approx(c(E_1, \ldots, E_n), c(E'_1, \ldots, E'_n)) \Leftarrow approx(E_1, E'_1) \wedge \ldots \wedge approx(E_n, E'_n)$

Fig. 4. Logic program obtained from CRWLE
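The clauses for approx and tot in Figure 4 can be read as structural recursions over partial constructor terms. The following Java fragment is only an illustrative sketch of that reading, not part of the authors' development: the class name PartialTerms, the Term representation, and the use of a 0-ary symbol named "bottom" for ⊥ are all assumptions made for the example, and variables and function symbols are not modelled.

```java
import java.util.Arrays;
import java.util.List;

class PartialTerms {
    /** A constructor term c(t1, ..., tn); the 0-ary symbol "bottom" plays the role of ⊥ (assumption). */
    static final class Term {
        final String ctor;
        final List<Term> args;

        Term(String ctor, Term... args) {
            this.ctor = ctor;
            this.args = Arrays.asList(args);
        }

        boolean isBottom() {
            return ctor.equals("bottom") && args.isEmpty();
        }
    }

    static final Term BOTTOM = new Term("bottom");

    // Hypothetical helpers for Peano-style terms.
    static Term zero() { return new Term("zero"); }
    static Term s(Term t) { return new Term("s", t); }

    /** approx(⊥, X) holds; approx(c(E1..En), c(E1'..En')) holds if approx(Ei, Ei') for every i. */
    static boolean approx(Term a, Term b) {
        if (a.isBottom()) return true;
        if (!a.ctor.equals(b.ctor) || a.args.size() != b.args.size()) return false;
        for (int i = 0; i < a.args.size(); i++) {
            if (!approx(a.args.get(i), b.args.get(i))) return false;
        }
        return true;
    }

    /** tot(c(E1..En)) holds if tot(Ei) for every i; ⊥ itself is not total. */
    static boolean tot(Term t) {
        if (t.isBottom()) return false;
        for (Term a : t.args) {
            if (!tot(a)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(approx(s(BOTTOM), s(zero()))); // true
        System.out.println(approx(s(zero()), s(BOTTOM))); // false
        System.out.println(tot(s(BOTTOM)));               // false
    }
}
```

In this sketch s(⊥) approximates s(zero) but not the other way round, and only terms without ⊥ are total.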

4.1 Practical aspects of the approach

The approach of [5], which translates CRWL into a logic program for the verification of properties, was tested in various existing theorem provers. We have tested the new approach in the theorem prover Isabelle [13]. The translation of the reduction process into Isabelle is done in two steps. First we define a function red for the equational part of the program: all globally deterministic functions define a unique reduction via the program rules, and the program rules for the collection of globally deterministic functions form the definition of red. We also define as functions the predicates term, tot, gd, ngd and the relation approx that appear in the logic program translation. Based on this translation procedure we can transform CRWL programs into Isabelle specifications. Consider the CRWL example in Figure 1 restricted to the functions +, double and coin. The corresponding Isabelle specification of the logic program is shown in Figure 5.

inductive arrow
intros
  bt [intro]:     "ngd(x) ==> (x, bottom) : arrow"
  dcs [intro]:    "[|ngd(s x) ; (x, t):arrow|] ==> ((s x), (s t)):arrow"
  fcoin1 [intro]: "(zero, t):arrow ==> (coin, t):arrow"
  fcoin2 [intro]: "(s(zero), t):arrow ==> (coin, t):arrow"
  sum1 [intro]:   "[|ngd(sum x y) ; (x, zero):arrow ; (y, t):arrow|] ==> (sum x y, t):arrow"
  sum2 [intro]:   "[|ngd(sum x y) ; (x, s(t1)):arrow ; (y,t2):arrow ; (s(sum t1 t2), t):arrow|] ==> (sum x y , t):arrow"
  double [intro]: "[|ngd(double x) ; (x, t1):arrow ; (sum t1 t1,t):arrow|] ==> (double(x), t):arrow"
  eq [intro]:     "[|gd(x) ; red(x)=r ; approx(r, t)|] ==> (x, t):arrow"

recdef red "measure number"
  "red(zero) = zero"
  "red(s x) = s(red x)"
  "red(sum zero y) = red y"
  "red(sum (s x) y) = s(red(sum x y))"
  "red(double x) = red(sum x x)"

Fig. 5. Part of Isabelle specification for Coin

The formulas used to specify properties are also transformed when considering this Isabelle specification. As the relation = is now specified as a function red that is only defined for globally deterministic expressions, a statement of the form e = t in the logic is formulated as gd(e) & red(e)=t. For instance, the formula:

∀X, Y, T.(tot(X) ∧ tot(Y) ∧ X + Y = T ⇒ Y + X = T)    (1)

is transformed into:

∀X, Y, T.(tot(X) ∧ tot(Y) ∧ gd(X + Y) ∧ red(X + Y) = T ⇒ gd(Y + X) ∧ red(Y + X) = T)

With the Isabelle specification of Figure 5 the proof of this property is very short, similar to the proof that one would obtain if the program were functional. This contrasts with the results of using CRWL instead of CRWLE, as done in [5]. The first thing to note is that formula (1) is not expressible there, since within CRWL the relation = simply does not exist. The closest formula would be:

∀X, Y, T.(tot(X) ∧ tot(Y) ∧ X + Y → T ⇒ Y + X → T)    (2)

which requires a rather long proof in an Isabelle specification of CRWL. To be fairer, we can also compare the complexity of the proof of property (2) in CRWL with its proof in our refinement. Again, the use of CRWLE is successful, since the resulting Isabelle proof is three times shorter than in the case of using CRWL.
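To make the role of red more tangible, the five recdef equations of Figure 5 can be mirrored as an ordinary recursive function. The Java fragment below is only a sketch under assumptions: the class RedSketch, the Exp representation and the helpers numeral and value are hypothetical names introduced for illustration, and testing commutativity on sample values is of course no substitute for the Isabelle proof.

```java
class RedSketch {
    /** Expressions over zero, s, sum and double (hypothetical representation). */
    static final class Exp {
        final String op;
        final Exp[] args;

        Exp(String op, Exp... args) {
            this.op = op;
            this.args = args;
        }
    }

    static Exp zero() { return new Exp("zero"); }
    static Exp s(Exp x) { return new Exp("s", x); }
    static Exp sum(Exp x, Exp y) { return new Exp("sum", x, y); }
    static Exp dbl(Exp x) { return new Exp("double", x); }

    /** Mirrors the five equations of red literally; anything else is rejected. */
    static Exp red(Exp e) {
        switch (e.op) {
            case "zero":
                return e;                                   // red(zero) = zero
            case "s":
                return s(red(e.args[0]));                   // red(s x) = s(red x)
            case "sum": {
                Exp x = e.args[0];
                Exp y = e.args[1];
                if (x.op.equals("zero")) return red(y);     // red(sum zero y) = red y
                if (x.op.equals("s")) return s(red(sum(x.args[0], y)));
                throw new IllegalArgumentException("no equation for sum with first argument " + x.op);
            }
            case "double":
                return red(sum(e.args[0], e.args[0]));      // red(double x) = red(sum x x)
            default:
                throw new IllegalArgumentException("red is not defined for " + e.op);
        }
    }

    /** Builds the Peano numeral s(...s(zero)...) with n occurrences of s. */
    static Exp numeral(int n) {
        Exp e = zero();
        for (int i = 0; i < n; i++) e = s(e);
        return e;
    }

    /** Counts the leading s constructors of a reduced numeral. */
    static int value(Exp e) {
        int n = 0;
        while (e.op.equals("s")) {
            n++;
            e = e.args[0];
        }
        return n;
    }

    public static void main(String[] args) {
        Exp two = numeral(2);
        Exp three = numeral(3);
        System.out.println(value(red(sum(two, three)))); // 5
        System.out.println(value(red(sum(three, two)))); // 5
        System.out.println(value(red(dbl(two))));        // 4
    }
}
```

An expression such as coin falls into the default branch, which reflects the fact that red is only meant for globally deterministic expressions.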

5 Conclusions

In this paper we have made some progress towards achieving effective methods for verifying properties of functional logic programs where non-deterministic

functions are permitted. The work was motivated by the following observation: specifying the underlying logic (CRWL [8]) of functional logic programming in other formalisms, such as first order logic (as in [5]) or rewriting logic (as in [6]), is a good conceptual starting point for verifying properties of those programs, but it is not enough in practice, since simple properties might require complex proofs. This happens because the possibility of non-determinism spreads over the whole logic, even if large parts of a program are purely deterministic. A typical example of such a situation is commutativity of the addition of natural numbers, whose proof, if equationally specified, is almost automatic in most proof assistants, but requires a long proof in the approach of [5]. Let us give a succinct summary of our contributions to overcome this problem:

• We have identified, within the CRWL framework, a notion of semantic determinism appropriate to our purposes.

• We have refined the CRWL logic in such a way that reduction statements involving only deterministic expressions can be derived with equational-like reasoning, thus greatly reducing the non-determinism inherent in derivations in the original CRWL calculus.

• We have proved the correctness of the refinement through an equivalence result.

• We have applied the refined logic to our main aim, the proof of properties of CRWL-programs, following a similar scheme to that of [5]: the (refined) CRWL-logical semantics of a program is specified as a logic program; the properties to verify are first order formulae over the involved relations, and validity of a property means validity in the least model of that logic program.

• We have used Isabelle [13] to check our ideas in practice. In particular, we obtain a much shorter proof in the example of commutativity of addition.

Our improvement seems practical enough to be continued in several directions. First, we want to determine effective sufficient conditions ensuring determinism; the techniques in [14,3] may be useful here. We are also interested in investigating weaker (but still applicable to our purposes) notions of determinism that would enlarge the deterministic part of the program, in which proving properties is more effective. Finally, we plan to develop a set of non-trivial case studies for a better evaluation of our methods. This was almost impossible prior to this work due to the complexity of the proofs in previous approaches.

Acknowledgements

We thank an anonymous reviewer for pointing out some technical problems in the preliminary version of this paper.


References

[1] S. Antoy. Optimal Non-deterministic Functional Logic Computations. Proc. Algebraic and Logic Prog. (ALP'97), Springer LNCS 1298, pp. 16–30, 1997.
[2] Y. Bertot, P. Castéran. Interactive Theorem Proving and Program Development. Coq'Art: The Calculus of Inductive Constructions. Texts in Theoretical Computer Science, Springer, 2004.
[3] B. Braßel, M. Hanus. Nondeterminism Analysis of Functional Logic Programs. Proc. Int. Conf. on Logic Programming (ICLP'05), Springer LNCS 3668, pp. 265–279, 2005.
[4] M. Clavel, M. Palomino. A quick ITP tutorial. Proc. V Jornadas sobre Programación y Lenguajes (PROLE'05), Thomson, pp. 159–172, 2005. An extended version will appear in Journal of Universal Computer Science.
[5] J.M. Cleva, J. Leach, F.J. López-Fraguas. A logic programming approach to the verification of functional-logic programs. Proc. ACM Conf. on Principles and Practice of Declarative Programming (PPDP'04), ACM, pp. 9–19, 2004.
[6] J.M. Cleva, I. Pita. An approach to the verification of CRWL programs with rewriting logic. Proc. V Jornadas sobre Programación y Lenguajes (PROLE'05), Thomson, pp. 138–148, 2005. An extended version will appear in Journal of Universal Computer Science.
[7] J.C. González-Moreno, M.T. Hortalá-González, F.J. López-Fraguas and M. Rodríguez-Artalejo. A Rewriting Logic for Declarative Programming. Proc. European Symp. on Programming (ESOP'96), Springer LNCS 1058, pp. 156–172, 1996.
[8] J.C. González-Moreno, M.T. Hortalá-González, F.J. López-Fraguas and M. Rodríguez-Artalejo. An Approach to Declarative Programming Based on a Rewriting Logic. Journal of Logic Programming 40(1), pp. 47–87, 1999.
[9] M. Hanus (ed.). Curry: an Integrated Functional Logic Language, Version 0.8.2, March 28, 2006. http://www-i2.informatik.uni-kiel.de/~curry/.
[10] H. Hussmann. Nondeterministic Algebraic Specifications and Nonconfluent Term Rewriting. Journal of Logic Programming 12, pp. 237–255, 1992.
[11] F.J. López Fraguas, J. Sánchez Hernández. TOY: A Multiparadigm Declarative System. Proc. Rewriting Techniques and Applications (RTA'99), Springer LNCS 1631, pp. 244–247, 1999.
[12] J. Meseguer. Conditional Rewriting Logic as a Unified Model of Concurrency. Theoretical Computer Science 96, pp. 73–155, 1992.
[13] T. Nipkow, L.C. Paulson, M. Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer LNCS 2283, 2002.
[14] R. Peña-Marí, C. Segura. Non-determinism analyses in a parallel-functional language. Journal of Functional Programming 15(1), pp. 67–100, 2005.


RULE 2006

Model-driven Transformation-based Generation of Java Stress Tests

Victor L. Winter ¹ ²

Department of Computer Science
University of Nebraska at Omaha
USA

Abstract

This paper describes a practical application of transformation-based analysis and code generation. An overview is given of an approach for automatically constructing Java stress tests whose execution exercises all “interesting” class initialization sequence possibilities for a given class hierarchy.

Key words: program transformation, strategic programming, Java class initialization, <clinit> method, JVM, TL, HATS

1 Overview

This paper describes a model-driven approach in which transformation can be used to automatically generate Java stress tests whose scale and complexity resist manual construction. The approach consists of a framework where a variety of Java entities can be modelled at various levels of abstraction. The models presented have structural properties that naturally lend themselves to transformation-based manipulation. In this setting, transformation-based analysis is performed on the most abstract form of a model and the goal of transformation-based generation is to derive a corresponding concrete model (i.e., a set of Java classes). All analysis and generation transformations discussed in this paper have been implemented in the higher-order transformation language TL [8] using the HATS system [7]. The resulting stress tests are being used to help validate that the SSP [6], a hardware implementation of a significant JVM subset, conforms to the specification of the Java Virtual Machine (JVM).

¹ This work was in part supported by the United States Department of Energy under Contract DE-AC04-94AL85000. Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy.
² Email: [email protected]
This paper is electronically published in Electronic Notes in Theoretical Computer Science. URL: www.elsevier.nl/locate/entcs


In the context of this paper, a (stress) test is the Java source code for a set of classes which, after compilation, can be given as “input” to an implementation of the JVM. The correct execution of this test program provides evidence that a certain portion of an implementation's behavior conforms to the JVM specification [3]. In particular, this paper focuses on the generation of stress tests that can be used to help validate the behavior of a JVM implementation with respect to class initialization. More specifically, we are interested in providing assurance that <clinit> methods are properly sequenced (i.e., are invoked at the proper time during program execution).

The remainder of this paper is organized as follows: Section 2 provides background on class initialization as it is specified for the JVM. Section 3 describes various models and model representations that are of interest to our testing goals. Section 4 discusses the selection, observation, and generation of tests. The section introduces the concept of a discrimination net to capture the notion of interesting clinit tests. Next, a design is given that enables <clinit> method sequencing to be observed in the context of a test program. This is followed by a discussion of transformation-based test generation. Section 5 discusses how clinit stress tests can be generated using the higher-order transformation language TL. Section 6 presents some results and Section 7 concludes.

2 Background: Java Class Initialization

Class initialization is part of the linking phase of the JVM [3]. In Java, the initialization of a class takes place at most once during execution. We pick up the discussion of class initialization at the point where verification and preparation have already taken place. Furthermore, here we only describe the general case of initialization for user-defined classes. For example, we do not consider class initializations that are triggered as a result of the invocation of various reflective methods such as those that can be found in class Class or package java.lang.reflect. We also do not discuss initialization with respect to interfaces. And finally, due to space limitations, this background discussion does not cover the effects of constant fields and the passive use of classes on class initialization.

Generally speaking, class initialization involves executing the <clinit> method associated with a class. This method is generated by the compiler and contains code realizing all class variable initializers as well as static initializers. From an operational standpoint, class variable initializers and static initializers are executed in the order in which they syntactically appear in the class. The <clinit> method cannot be invoked directly at the source code level, but rather may only be invoked internally by the JVM in response to the first active use of a class. There are three kinds of situations that constitute an active use of a class: (1) when a static field of a class is accessed, (2) when a static method of a class is invoked, and (3) when an instance of the class is created. At the bytecode level, there are four bytecodes whose execution constitutes an active use of a class: new, getstatic, putstatic, and invokestatic.

The <clinit> method for a given class may be invoked at most once during the execution of a Java program. The internal structure of a <clinit> method (i.e., its body) is important to this discussion only to the extent that the body may contain an active use of another class. The execution of the body of a <clinit> method for a class B should be suspended if the <clinit> method for the superclass of B has not been invoked.³ The execution of a <clinit> method body should also be suspended if an attempt is made, in the method body, to evaluate an active use of a class whose <clinit> method has not yet been invoked. The execution of a suspended <clinit> method must resume immediately after completion of the <clinit> method belonging to the class that triggered the suspension.

And finally, an active use of a class B may be evaluated at any time after the <clinit> method for B has been invoked (even though the <clinit> method for B may not be completed). This relaxation on the evaluation of active uses provides a straightforward means for resolving circular dependencies among <clinit> methods, thereby ensuring that initialization sequences are well-founded.

³ Initialization of an interface does not require initialization of its superinterface and consists only of executing the interface's initialization method.

Rule 1: A <clinit> method may be invoked at most once.
Rule 2: The <clinit> method of a class B must be invoked before any active use of B may be evaluated.
Rule 3: Before executing the body of the <clinit> method for a class B, the <clinit> method of the superclass of B must be invoked.
Rule 4: The execution of the <clinit> method of Bi must be suspended upon encountering (in its body) an active use of a class Bk whose <clinit> method has not been invoked.
Rule 5: The execution of a suspended <clinit> method (for class Bi) must resume immediately after completion of the <clinit> method (for class Bk) that caused the suspension.

Fig. 1. Class initialization rules
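The three kinds of active use described above, together with Rules 1 and 3, can be observed on any conforming JVM with a few lines of Java. The following sketch is only an illustration and is not one of the generated stress tests: the class ActiveUseDemo and its nested classes are hypothetical names, and the shared log list is an assumption made to record the order in which static initializers run.

```java
import java.util.ArrayList;
import java.util.List;

class ActiveUseDemo {
    // Records the name of each class when its static initializer runs.
    static final List<String> log = new ArrayList<>();

    static class Parent {
        static { log.add("Parent"); }
    }

    static class Child extends Parent {
        static int f = 1;                 // not final, so reading it is an active use
        static { log.add("Child"); }
    }

    static class Util {
        static { log.add("Util"); }
        static int twice(int n) { return 2 * n; }
    }

    static class Thing {
        static { log.add("Thing"); }
    }

    static List<String> run() {
        int fromField = Child.f;          // (1) static field access: Parent first, then Child
        int again = Child.f;              // a second access triggers nothing
        int fromCall = Util.twice(3);     // (2) static method invocation
        Object fresh = new Thing();       // (3) instance creation
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        System.out.println(run());        // [Parent, Child, Util, Thing]
    }
}
```

Each class name appears exactly once in the log, and Parent precedes Child even though only Child is mentioned in the program text.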

class A1 { static int x = A2.w; }
class A2 extends A1 { static int w = 1; static int x = A1.x; }

class B1 { static int x = B2.w; }
class B2 { static int w = 1; static int x = B1.x; }

Fig. 2. The difference between Rule 3 and Rule 4

Figure 1 gives a set of rules that are sufficient to assure the correctness of class initialization. Figure 2 highlights the distinction between initialization Rule 3 and initialization Rule 4. In particular, assuming A1 has not been initialized, a first active use of A2.x will result in A2.x being initialized with the value 0. In contrast, assuming B1 has not been initialized, a first active use of B2.x will result in B2.x being initialized with the value 1.
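The values claimed for Figure 2 can be checked directly. The sketch below wraps the four classes of Figure 2 as static nested classes of a hypothetical driver class InitOrderDemo (the wrapper and the accessor methods a2x and b2x are assumptions added so that the example fits in one file); nested classes are initialized by the same rules as top-level classes.

```java
class InitOrderDemo {
    // Rule 3: A1 is the superclass of A2, so A1 is initialized before the body of
    // A2's initializer runs; at that point A2.w still holds its default value 0.
    static class A1 { static int x = A2.w; }
    static class A2 extends A1 { static int w = 1; static int x = A1.x; }

    // Rule 4: B2's initializer is suspended at the active use B1.x, after w = 1
    // has already been executed, so B1 observes B2.w == 1.
    static class B1 { static int x = B2.w; }
    static class B2 { static int w = 1; static int x = B1.x; }

    static int a2x() { return A2.x; }   // first active use of A2
    static int b2x() { return B2.x; }   // first active use of B2

    public static void main(String[] args) {
        System.out.println("A2.x = " + a2x()); // A2.x = 0
        System.out.println("B2.x = " + b2x()); // B2.x = 1
    }
}
```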

3 Concepts and Terminology

We use the term model, usually preceded by a descriptor (e.g., class model), to denote various Java entities. Models can have representations at various levels of abstraction, a characteristic that is exploited during transformation. There are two representational forms that are of particular interest: We use the term abstract form to refer to the most abstract representation of a model that we wish to consider. We use the term concrete form to refer to models represented in the syntax of Java that can be legally embedded within a particular Java source program, compiled, and executed. The scope of our discourse ranges over the following models:

• class hierarchy model – This model represents a set of Java classes. In its abstract form, this model is represented as a list of abstract class models. In its concrete form, this model is represented as a list of concrete class models.

• class model – This model represents the clinit dependencies of a Java class including the dependency that exists between a class and its superclass. In its abstract form, this model is represented as a rewrite rule of the form: [B1 → B2 B3 ··· Bn] where B2 is the superclass of B1 and B3 ··· Bn denotes the active use sequence that occurs within the <clinit> method of B1. The concrete form of this model is shown in Figure 4. The concrete form assumes that the expression assigned to x is the concrete form of an active use model corresponding to B3 ··· Bn.

• active use model – This model represents a sequence of active uses of a set of classes. In its abstract form, this model is represented as a list of class identifiers: B1 B2 ··· Bm. In its concrete form, this model is represented as an expression of the form: (B_1.x + B_2.x + ··· + B_m.x) where it is assumed that the classes B1, ..., Bm contain static declarations of the integer identifier x.

• observed sequence model – This model represents the clinit sequence that has been observed as a result of executing an active use model with respect to a given class hierarchy model. In its abstract form, this model is represented as a list of class identifiers: B1 B2 ··· Bk. In its concrete form, this model is represented by the class observe as shown in Figure 4.

• initialization sequence model – This model represents the order in which <clinit> methods should complete, according to the specification of the JVM, for a given (class hierarchy model, active use model) pair. In its abstract form, this model is represented as a list of class identifiers B1 B2 ··· Br. In its concrete form, this model is represented by the following boolean valued expression: (observe.B_1 == 1 && observe.B_2 == 2 && ··· && observe.B_r == r) where it is assumed that an observed sequence model containing the classes B1 B2 ··· Br exists.
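The step from the abstract to the concrete form of the active use model and the initialization sequence model amounts to simple text generation. The following Java fragment sketches that step under assumptions: it is plain string building rather than the TL transformations used in this work, the class ConcreteForms and its method names are hypothetical, and the input lists are assumed to be non-empty.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ConcreteForms {
    /** Abstract form B1 B2 ... Bm  ->  concrete form (B1.x + B2.x + ... + Bm.x). */
    static String activeUse(List<String> classes) {
        List<String> parts = new ArrayList<>();
        for (String c : classes) {
            parts.add(c + ".x");
        }
        return "(" + String.join(" + ", parts) + ")";
    }

    /** Abstract form B1 B2 ... Br  ->  (observe.B1 == 1 && ... && observe.Br == r). */
    static String initSequence(List<String> classes) {
        List<String> parts = new ArrayList<>();
        for (int i = 0; i < classes.size(); i++) {
            parts.add("observe." + classes.get(i) + " == " + (i + 1));
        }
        return "(" + String.join(" && ", parts) + ")";
    }

    public static void main(String[] args) {
        List<String> classes = Arrays.asList("B_1", "B_2", "B_3");
        System.out.println(activeUse(classes));    // (B_1.x + B_2.x + B_3.x)
        System.out.println(initSequence(classes)); // (observe.B_1 == 1 && observe.B_2 == 2 && observe.B_3 == 3)
    }
}
```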

4 Testing: Selection, Observation, and Generation

Our testing objective is to generate a Java source program that can be used to validate that an implementation of the JVM has behavioral properties that conform to the rules in Figure 1. Using the concepts defined in the previous section, a test, in abstract form, is modelled as a tuple (M, a_seq) consisting of a class hierarchy model M and an active use sequence a_seq. A stress test is modelled as a list of test models. For a given class hierarchy model M, it will generally be possible to construct an infinite set of active use sequences (e.g., B, BB, BBB, ...) and thus, an infinite number of tests (M, B), (M, BB), and so on. However, since M is finite, an argument can be made that there are only a finite number of “interesting” active use sequences. For example, one might argue that the active use sequence AA is redundant and therefore not interesting. Such arguments reflect assumptions, that are sometimes subtle, about the nature of the error that a test hopes to expose.

4.1 Selection: Discrimination Nets – Interesting Active Use Sequences

We say that an active use model is complete with respect to a given class hierarchy model M if and only if it guarantees the initialization, either directly or indirectly, of every class in M. The abstract form of an active use model a_seq is minimal if (1) class identifiers occur at most once, and (2) a_seq does not contain a proper prefix that is complete. We refer to the set of all minimal active use models as a discrimination net. Figure 3 shows a class hierarchy and its discrimination net in graphical form. The abstract active use models belonging to the discrimination net are constructed by concatenating the class identifiers of all paths in the tree from root to leaf: {ABC, AC, BC, C}. We are interested in the construction of stress tests that, for a given class hierarchy model M, will validate all active use models belonging to the discrimination net of M.

class A { static int y = 0; }
class B extends A { static int y = 0; }
class C { static int y = B.y; }

[Discrimination net, drawn as a tree whose root-to-leaf paths are A-B-C, A-C, B-C, and C]

Fig. 3. A class hierarchy and its discrimination net
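One way to enumerate a discrimination net is a depth-first search over prefixes that stops as soon as a prefix becomes complete. The Java sketch below is an illustration only and not the TL implementation: the class DiscriminationNet, the dependency-map encoding (each class mapped to its superclass and the classes actively used in its initializer), and the method names are hypothetical. It also makes one assumption that the definition above does not state explicitly, namely that an active use is only appended if it names a class that has not yet been initialized; under this assumption the sketch reproduces the net of Figure 3. Sequences are written by concatenating identifiers, as in the notation above, which is unambiguous only for single-letter names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class DiscriminationNet {
    /** Classes initialized after an active use of c, given the already initialized set. */
    static Set<String> initialize(Map<String, List<String>> deps, String c, Set<String> done) {
        Set<String> result = new TreeSet<>(done);
        collect(deps, c, result);
        return result;
    }

    private static void collect(Map<String, List<String>> deps, String c, Set<String> acc) {
        if (!acc.add(c)) return; // already initialized
        for (String d : deps.getOrDefault(c, Collections.<String>emptyList())) {
            collect(deps, d, acc);
        }
    }

    /** All minimal complete active use sequences (under the assumption stated above). */
    static List<String> net(Map<String, List<String>> deps) {
        List<String> out = new ArrayList<>();
        extend(deps, "", new TreeSet<String>(), out);
        return out;
    }

    private static void extend(Map<String, List<String>> deps, String prefix,
                               Set<String> done, List<String> out) {
        if (done.containsAll(deps.keySet())) { // prefix is complete: stop here
            out.add(prefix);
            return;
        }
        for (String c : deps.keySet()) {
            if (!done.contains(c)) {
                extend(deps, prefix + c, initialize(deps, c, done), out);
            }
        }
    }

    /** The hierarchy of Figure 3: B extends A, and C actively uses B. */
    static List<String> figure3() {
        Map<String, List<String>> deps = new TreeMap<>();
        deps.put("A", Collections.<String>emptyList());
        deps.put("B", Arrays.asList("A"));
        deps.put("C", Arrays.asList("B"));
        return net(deps);
    }

    public static void main(String[] args) {
        System.out.println(figure3()); // [ABC, AC, BC, C]
    }
}
```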

4.2 Observing Method Sequencing

The clinit dependencies between classes can be seen as having a directed graph-like structure. These dependencies can be modelled by a class hierarchy model M. Given a clinit dependency description M, we are interested in creating, in concrete form, a hierarchy of classes {B1, B2, ..., Bn} whose clinit dependencies correspond to M. A test class Bi belonging to a class hierarchy M will have a concrete form conforming to the class model template shown in Figure 4. The extends B_r portion of the class definition is optional and is only included when required by the dependency graph description D. In the class B_i, the variable x is assigned to an expression whose evaluation explicitly triggers the sequence of active uses called for by D. In this discussion, we will use an expression of the form Bj1.x + Bj2.x + ... + Bjk.x to trigger the active use sequence ⟨Bj1, Bj2, ..., Bjk⟩. The second statement in class B_i is a static declaration of the variable pos, which is initialized with the value of observe.setB_i(). The value of this variable represents the position of the class' <clinit> method in the overall class initialization sequence for the class hierarchy. If pos has a value of zero, we conclude that the <clinit> method for the class has not been invoked (or did not complete).

Class Model Template

  class B_i [extends B_r] {
    static int x = /* active use expression */;
    static int pos = observe.setB_i();
  }

Observed Sequence Model Template

  class observe {
    static int B_1;
    static int setB_1() {
      B_1 = next_position;
      next_position += 1;
      return B_1;
    }
    // declarations for remaining classes B_2, ... , B_n
    // the position counter used by all set methods
    static int next_position = 1;
  }

Initialization Sequence Model Template

  class set_and_check {
    static int a_seq = B_1.x + B_2.x; // Test sequence
    static void check() {
      System.out.println(
        observe.B_j1 == 1 &&
        observe.B_j2 == 2 &&
        ...
        observe.B_jn == n
      );
    }
  }

Test Driver Template

  class test {
    public static void main(String[] args) {
      set_and_check_1.check();
      set_and_check_2.check();
      ...
      set_and_check_m.check();
    }
  };

Fig. 4. Concrete Form Templates

For a given hierarchy of classes {B1, B2, ..., Bn}, we construct an associated class observe. The class observe consists of integer and method declarations whose purpose is to positionally record when the <clinit> methods of {B1, B2, ..., Bn} are invoked. Note that the recording of the position of B_i's <clinit> method is done external to B_i (i.e., within the class observe). This permits us to later query B_i's position in a clinit sequence in a manner that does not itself constitute an active use of B_i. The observe class contains an integer and method declaration corresponding to each class Bi in the class hierarchy and has a structure conforming to the observed sequence model template shown in Figure 4.

For each class hierarchy {B1, B2, ..., Bn}, there is also an associated class set_and_check. The first statement in set_and_check is a static declaration of a variable a_seq whose initializing expression consists of a specific initial (top-level) active use sequence to be tested. The second statement is a declaration of a method check which accesses the positional elements of the class observe to see if the initialization of classes B1, B2, ..., Bn conforms to the correct class initialization sequence that results from the evaluation of a_seq. The class set_and_check has a structure conforming to the initialization sequence model template shown in Figure 4.

Since <clinit> methods are invoked only once in any given execution run, a hierarchy of classes {B1, B2, ..., Bn} can only be used to test the behavior of a single active use sequence. However, a different active use sequence for a given (fixed) class hierarchy can be tested in the same execution run by making a copy of the class hierarchy as well as the associated observe and set_and_check classes. Such copies can be made using standard renaming techniques.
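As a concrete illustration, the templates of Figure 4 can be instantiated by hand for the class hierarchy of Figure 3 and the active use sequence AC. The sketch below is an assumption-laden example rather than generator output: the classes are nested inside a hypothetical wrapper ClinitHarness so that the example fits in one file, the field x replaces the field y of Figure 3 to match the templates, and check returns a boolean instead of printing it.

```java
class ClinitHarness {
    // Observed sequence model: records the position at which each initializer completes.
    static class observe {
        static int A, B, C;
        static int setA() { A = next_position; next_position += 1; return A; }
        static int setB() { B = next_position; next_position += 1; return B; }
        static int setC() { C = next_position; next_position += 1; return C; }
        // the position counter used by all set methods
        static int next_position = 1;
    }

    // Class models for the hierarchy of Figure 3.
    static class A { static int x = 0; static int pos = observe.setA(); }
    static class B extends A { static int x = 0; static int pos = observe.setB(); }
    static class C { static int x = B.x; static int pos = observe.setC(); }

    // Initialization sequence model for the active use sequence A C.
    static class set_and_check {
        static int a_seq = A.x + C.x; // test sequence
        static boolean check() {
            // Expected completion order: A, then B (triggered inside C), then C.
            return observe.A == 1 && observe.B == 2 && observe.C == 3;
        }
    }

    static boolean runTest() {
        return set_and_check.check(); // first active use of set_and_check evaluates a_seq
    }

    public static void main(String[] args) {
        System.out.println(runTest()); // true
    }
}
```

Reading observe.A, observe.B and observe.C inside check is an active use of observe only, so the query does not disturb the initialization state of A, B or C.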
In this manner a set of tests can be created and executed from a test suite's main method by simply calling the check method of every instance of class set_and_check that has been created. The template for this driver method is shown in Figure 4. Note that the invocation set_and_check_i.check() will trigger the execution of the <clinit> method for set_and_check_i, which will result in the initialization of the variable set_and_check_i.a_seq.

4.3 A Test Generator

Let MA and MC respectively denote the abstract and concrete forms of a class hierarchy model. Let {aseq1, ..., aseqn} denote the discrimination net for MA. Let iseqj denote the (correct) class initialization sequence implied by (MA, aseqj), and let (MAj, aseqj′) denote a consistent renaming of (MA, aseqj). Under these assumptions, the transformational steps that need to be performed can be summarized as shown in Figure 5.

(1) MA ⇒ (MA, {aseq1, ..., aseqn})
(2)    ⇒ {(MA1, aseq1′), ..., (MAn, aseqn′)}
(3)    ⇒ {(MA1, aseq1′, iseq1′), ..., (MAn, aseqn′, iseqn′)}
(4)    ⇒ {(MA1, observe1, set_check1), ..., (MAn, observen, set_checkn)}
(5)    ⇒ {test, (MC1, observe1, set_check1), ..., (MCn, observen, set_checkn)}

Fig. 5. A summary of transformational steps

In step (1), the abstract model MA is used to construct the discrimination net {aseq1, ..., aseqn}. In step (2), tuples are constructed by pairing the abstract model with each element in the discrimination net, and tuple elements are consistently renamed. In step (3), an analysis is performed on each tuple (MAj, aseqj′) yielding the expected initialization sequence iseqj′. In step (4), the pair (aseqj′, iseqj′) is used to generate an instance of the class set_and_check and the model MAj is used to generate an instance of the class observe. And finally, in step (5), the models MAj are transformed into concrete form and the driver class test is added.
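The analysis of step (3) can be pictured as a small simulation of the rules of Figure 1. The Java sketch below is a hypothetical stand-in written for illustration; it is not the TL implementation, and the names InitSequenceSketch, ClassModel, expected and figure3 are assumptions. Each abstract class model [B1 → B2 B3 ··· Bn] is encoded as a superclass (possibly absent) plus a list of active uses, and the result lists classes in the order in which their initializers complete.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class InitSequenceSketch {
    /** Abstract class model: superclass (null if none) plus the active uses in the initializer. */
    static final class ClassModel {
        final String superclass;
        final List<String> uses;

        ClassModel(String superclass, String... uses) {
            this.superclass = superclass;
            this.uses = Arrays.asList(uses);
        }
    }

    /** Expected completion order of initializers for the active use sequence aseq. */
    static List<String> expected(Map<String, ClassModel> model, List<String> aseq) {
        Set<String> started = new HashSet<>();
        List<String> completed = new ArrayList<>();
        for (String c : aseq) {
            init(model, c, started, completed);
        }
        return completed;
    }

    private static void init(Map<String, ClassModel> model, String c,
                             Set<String> started, List<String> completed) {
        if (!started.add(c)) return;                        // Rule 1: at most once
        ClassModel cm = model.get(c);
        if (cm != null) {
            if (cm.superclass != null) {
                init(model, cm.superclass, started, completed); // Rule 3: superclass first
            }
            for (String u : cm.uses) {
                init(model, u, started, completed);         // Rules 4 and 5: suspend, then resume
            }
        }
        completed.add(c);
    }

    /** The hierarchy of Figure 3: B extends A, and C actively uses B. */
    static List<String> figure3(String... aseq) {
        Map<String, ClassModel> model = new HashMap<>();
        model.put("A", new ClassModel(null));
        model.put("B", new ClassModel("A"));
        model.put("C", new ClassModel(null, "B"));
        return expected(model, Arrays.asList(aseq));
    }

    public static void main(String[] args) {
        System.out.println(figure3("A", "C")); // [A, B, C]
        System.out.println(figure3("C"));      // [A, B, C]
    }
}
```

A class that is reached again while its own initializer is still running is simply skipped, which corresponds to the relaxation that an active use may be evaluated once the initializer has been invoked, even if it has not completed.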

5 Transformation in Practice

Many of the transformational steps in the test generator are straightforward and, due to space considerations, the concrete details of their implementation are not presented. However, highlights of basic transformations are shown in Figure 6. Aside from various standard transformational issues, there are three primary transformational problems that must be overcome when generating Java stress tests in the manner described in this paper. First, one needs to define transformations that are able to construct a discrimination net for a given model MA. Second, one needs to define transformations that are able to construct the expected initialization sequence iseq implied by a model/activation sequence pair (MA, aseq). And third, one must develop transformations that are able to consistently use identifier names across the stress test (e.g., calls in the main method to instances of set_and_check.check()). In the next section we give a brief overview of TL. This is then followed by a discussion of implementation details of two of these three problems: initialization sequence calculation and the consistent use of names.

5.1 Overview of TL

Class Model Transformation

  [B1 → B2 B3 ··· Bm]
  =⇒ class B_1 extends B_2 { static int x = B3 ··· Bm; }
  =⇒ class B_1 extends B_2 { static int x = (B_3.x + ... + B_m.x); }

Initialization Sequence Transformation

  B1 ··· Br
  =⇒ static void check() { B1 ··· Br }
  =⇒ static void check() { System.out.println(observe.B_1==1 &&...&& observe.B_r==r); }

Test Class Transformation

  class set_and_check_1 { ... static void check() { ... } ... }
  ...
  class set_and_check_s { ... static void check() { ... } ... }
  =⇒ public static void main(String[] args) { set_and_check_1.check(); ... set_and_check_s.check(); }

Fig. 6. Transformation Highlights

This section gives a brief overview of TL, a labelled conditional (higher-order) rewriting language supporting a variety of strategic operators and generic traversals. For a more detailed discussion of TL see [8]. In TL, parse trees are the “objects” that TL programs transform. Rewrite rules have the following form:

r : lhs → s^n [if condition]    (1)

In this example, r denotes the label of the rule, lhs denotes a pattern describing a tree structure, s^n denotes a strategic expression whose evaluation yields a strategy of order n, and if condition denotes an optional Boolean-valued condition consisting of one or more match expressions constructed using Boolean connectives.

A pattern is a notation for describing the parse tree structures that are being manipulated. This notation includes typed variables that are quantified over specific tree structure domains; e.g., stmt⟦ id1 = 5 ⟧ is a tree with root stmt and leaves id1, =, and 5. In this context, the subscripted variable id1 denotes a typed variable quantified over the domain of all trees having id as their root node. In general, a pattern of the form B⟦α′⟧ is structurally valid if and only if the derivation B ⇒+ α is possible according to the grammar and α′ is obtained from α by subscripting all nonterminals occurring in α.

A strategic expression is an expression whose evaluation yields a strategy having a particular order. In the framework of TL, a pattern is considered to be a strategy of order 0. A rewrite rule that transforms its input tree into another tree is considered to be a strategy of order 1 (i.e., a first-order rule). Let s^1 denote a first-order strategy. Then the rule lhs → s^1 denotes a second-order strategy (e.g., s^2), and so on.

A match expression is a first-order match between two patterns. Let t1 denote a pattern, possibly non-ground, and let t2 denote a ground pattern. The expression t1  t2 denotes a match expression and evaluates to true if and only if a substitution σ can be constructed so that σ(t1) = t2. One or more match expressions can be combined using the Boolean connectives { and, or, not } to form the condition of a rewrite rule.

A combinator is an operator defined on strategies. Two widely used combinators are: (1) left-to-right sequential composition (