Reasoning about Object Systems in VTLoE

Ian Mason
Department of Applied Computing and Mathematics
University of Tasmania at Launceston
Launceston, Tasmania 7250, Australia

and

Carolyn Talcott
Computer Science Department, Stanford University
Stanford, CA 94305, USA

ABSTRACT
VTLoE (Variable Type Logic of Effects) is a logic for reasoning about imperative functional programs inspired by the variable type systems of Feferman. The underlying programming language, $\lambda_{mk}$, extends the call-by-value lambda calculus with primitives for arithmetic, pairing, branching, and reference cells (mutable data). In VTLoE one can reason about program equivalence and termination, input/output relations, and program contexts, and one can inductively (and co-inductively) define data structures. In this paper we present a refinement of VTLoE. We then introduce a notion of object specification and establish formal principles for reasoning about object systems within VTLoE. Objects are self-contained entities with local state. The local state of an object can only be changed by action of that object in response to a message. In $\lambda_{mk}$ objects are represented as closures with mutable data bound to local variables. A semantic principle called simulation induction was introduced in our earlier work as a means of establishing equivalence relations between streams, object behaviors, and other potentially infinite structures. These principles are formulated in VTLoE using the class apparatus. Their use is illustrated by validating a variety of basic transformation rules.

Keywords: functional, imperative, object, simulation induction, contextual assertion, VTLoE
1. Introduction
Imperative functional languages are programming languages that combine the higher-order aspects of functional languages with the ability to manipulate mutable data and with other facilities that have effects. Traditional examples include Lisp, Scheme, ML, and object-oriented languages. More recently there has been much interest in enriching functional programming languages with capabilities to manipulate state. There is a practical need to interface functional programming languages with non-functional languages, and to permit functional programs to affect their environment. A comprehensive semantic theory is needed to support more advanced implementation technology both for purely functional languages and for imperative functional languages such as ML, Scheme, and Lisp dialects. It is also important to develop specification logics for such languages.

VTLoE is a logic for reasoning about imperative functional programs, inspired by the variable type systems of Feferman. These systems are two-sorted theories of operations and classes initially developed for the formalization of constructive mathematics [3,4] and later applied to the study of purely functional languages [5,6]. An
extension incorporating non-local control effects was introduced in [23]. In [16] we presented the first-order part of VTLoE, and showed how the Meyer-Sieber examples can be treated in this logic. The full system [10,11] extends the first-order theory by incorporating a theory of classes. This extension provides a more expressive formalism, including the ability to construct inductively defined sets and derive the corresponding induction principles.

VTLoE goes well beyond traditional programming logics, such as Hoare's logic [2] and Dynamic Logic [9], by treating a richer language and expressing more properties. It is close in spirit to Specification Logic [21] and to Evaluation Logic [20]. These logics all incorporate a full first-order theory of data, the ability to express program equivalence, and the ability to assert and nest Hoare-like triples (called contextual assertions in VTLoE). In the case of Specification Logic, the underlying programming languages are quite different: Specification Logic is about Algol-like programs that are strongly typed, can store only first-order data, and obey call-by-name semantics. In contrast, VTLoE is about untyped ML- or Scheme-like languages that can store arbitrary values and obey call-by-value semantics. Evaluation Logic extends Moggi's metalanguage for computational monads [19] to a full constructive predicate logic which permits formulation of statements about evaluation of computations to values, and constitutes a framework for expressing reasoning principles. It is inspired by categorical descriptions of computation processes. Evaluation Logic is a general system which reduces reasoning about programs to reasoning about mathematical structures for programming semantics, which are not necessarily fully abstract with respect to a canonical semantics. These systems are justified by completeness results over classes of categorical interpretations.
The underlying programming language of VTLoE, $\lambda_{mk}$, is based on the call-by-value lambda calculus extended by the reference primitives mk, set, get. Atoms, references and lambda abstractions are all first-class values: they can be bound to lambda variables, stored, and returned from procedures. The logic combines the features and benefits of equational calculi as well as program and specification logics. There are three layers. The foundation is the syntax and semantics of $\lambda_{mk}$, the underlying term/program language. The second layer is a first-order theory built on assertions of program equivalence and program modalities called contextual assertions. The third layer extends the logic to include class terms, class membership, and quantification over class variables.

A simple contextual assertion has the form $\mathtt{let}\{x := e\}[\![\Phi]\!]$, meaning that after execution of $e$ and binding the result to $x$ the formula $\Phi$ holds. One important principle for contextual reasoning is to be able to replace $e$ by any operationally equivalent expression without changing the semantics of the contextual assertion. This principle fails in the simple semantics of VTLoE [11], which does not fully protect private local state from observation. In this paper we present a modified semantics for formulas of VTLoE that solves the privacy problem.

Objects are self-contained entities with local state. The local state of an object can only be changed by action of that object in response to a message. In $\lambda_{mk}$-like languages objects are represented as closures with mutable data bound to local
variables [22,1]. A semantic principle called simulation induction was introduced in [13] for establishing equivalence relations between streams, object behaviors, and other potentially infinite structures. In [14] informal methods based on simulation induction are used to derive an optimized specialized window editor from generic specifications of its components. In this paper we introduce a formal notion of object specification and establish formal principles based on simulation induction for reasoning about object systems within VTLoE. The formulation of these principles relies heavily on the class apparatus of VTLoE.

The remainder of this paper is organized as follows. In §2 we summarize the syntax and semantics of $\lambda_{mk}$; the same subject matter is treated in far greater detail in [11]. In §3 we give the syntax and modified semantics of first-order formulas. Central to the new localized semantics is a notion of a set of visible values. We discuss two possible definitions of visibility in §4. In §5 we state and prove two theorems concerning the local semantics. In §6 we introduce and discuss persistence properties. In §7 we establish a set of principles for reasoning about behaviors and objects. In §8 we introduce a notion of object specification and establish a general theorem relating specified behaviors and objects. In §9 we give several examples of object transformations. In §10 we summarize and suggest directions for future research. A useful set of laws established in earlier work is summarized in the appendix.

1.1. Notation
We conclude the introduction with a summary of notation. Let $X, Y, Y_0, Y_1$ be sets. We specify meta-variable conventions in the form: let $x$ range over $X$, which should be read as: the meta-variable $x$ and decorated variants such as $x_0$, $x'$, $\ldots$, range over the set $X$. We use the usual notation for set membership and function application. $Y^n$ is the set of sequences of elements of $Y$ of length $n$. $Y^*$ is the set of finite sequences of elements of $Y$. $\bar{y} = [y_1, \ldots, y_n]$ is the sequence of length $n$ with $i$th element $y_i$. $P_\omega(Y)$ is the set of finite subsets of $Y$. $\mathrm{Fmap}[Y_0, Y_1]$ is the set of finite maps from $Y_0$ to $Y_1$. $[Y_0 \to Y_1]$ is the set of total functions $f$ with domain $Y_0$ and range contained in $Y_1$. We write $Dom(f)$ for the domain of a function and $Rng(f)$ for its range. For any function $f$, $f\{y := y'\}$ is the function $f'$ such that $Dom(f') = Dom(f) \cup \{y\}$, $f'(y) = y'$, and $f'(z) = f(z)$ for $z \neq y$, $z \in Dom(f)$. $\mathbb{N} = \{0, 1, 2, \ldots\}$ is the natural numbers and $i, j, n, n_0, \ldots$ range over $\mathbb{N}$.
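The function update $f\{y := y'\}$ is used throughout for substitutions and memory updates. As a minimal sketch, assuming finite maps are modeled as Python dictionaries (the helper name `update` is ours, not the paper's), it can be written as follows.

```python
def update(f, y, y_new):
    """Return f{y := y_new}: agrees with f everywhere except at y."""
    g = dict(f)      # copy, so the original map f is left unchanged
    g[y] = y_new     # covers both cases: y in Dom(f) and y not in Dom(f)
    return g


f = {"a": 1, "b": 2}
g = update(f, "b", 5)   # modifies an existing point of the domain
h = update(f, "c", 7)   # extends the domain with a new point
```

The same operation serves both to modify a map at a point already in its domain and to extend the domain, which matches the single definition given above.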
2. The Syntax and Semantics of Terms

2.1. Syntax
The syntax of the terms of $\lambda_{mk}$ is a simple extension of the lambda calculus to include basic constants or atoms $\mathbb{A}$ (such as the Lisp booleans $\mathtt{t}$ and $\mathtt{nil}$ as well as the integers $\mathbb{Z}$), together with a collection of primitive operations, $\mathbb{F} = \bigcup_{n \in \mathbb{N}} \mathbb{F}_n$, where $\mathbb{F}_n$ is the (possibly empty) set of $n$-ary operations. The primitive operations include the memory operations (get, set, mk) and immutable pairs (pr, fst, snd, ispr?), in
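addition to the usual operations for branching and arithmetic. We assume an infinite set of variables, $\mathbb{X}$, and use these to define, by mutual induction, the set of $\lambda$-abstractions, $\mathbb{L}$, the set of value expressions, $\mathbb{V}$, the set of expressions, $\mathbb{E}$, and the set of contexts, $\mathbb{C}$, as the least sets satisfying the following equations:

Lambda Expressions     $\mathbb{L} = \lambda\mathbb{X}.\mathbb{E}$
Immutable Pairs        $\mathbb{P} = \mathtt{pr}(\mathbb{V}, \mathbb{V})$
Value Expressions      $\mathbb{V} = \mathbb{X} + \mathbb{A} + \mathbb{L} + \mathbb{P}$
Value Substitutions    $\mathbb{S} = \mathrm{Fmap}[\mathbb{X}, \mathbb{V}]$
Expressions            $\mathbb{E} = \mathbb{V} + \mathtt{app}(\mathbb{E}, \mathbb{E}) + \mathbb{F}_n(\mathbb{E}^n)$
Contexts               $\mathbb{C} = \{\bullet\} + \mathbb{X} + \mathbb{A} + \lambda\mathbb{X}.\mathbb{C} + \mathtt{app}(\mathbb{C}, \mathbb{C}) + \mathbb{F}_n(\mathbb{C}^n)$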
We let $a$ range over $\mathbb{A}$, $x, y, z$ range over $\mathbb{X}$, $v$ range over $\mathbb{V}$, $\lambda x.e$ range over $\mathbb{L}$, $\sigma$ range over $\mathbb{S}$, $e$ range over $\mathbb{E}$, and $C$ range over $\mathbb{C}$. Note that the structured data (pairs), $\mathbb{P}$, are taken to be values. $\lambda$ is a binding operator, and free and bound variables of expressions are defined as usual. $FV(e)$ is the set of free variables of $e$. A value substitution is a finite map $\sigma$ from variables to value expressions. $e^{\sigma}$ is the result of simultaneous substitution of free occurrences of $x \in Dom(\sigma)$ in $e$ by $\sigma(x)$. We represent the function which maps $x$ to $v$ by $\{x := v\}$. Thus $e^{\{x := v\}}$ is the result of replacing free occurrences of $x$ in $e$ by $v$ (avoiding the capture of free variables in $v$). Contexts are expressions with holes. We use $\bullet$ to denote a hole. $C[e]$ denotes the result of replacing any holes in $C$ by $e$. Free variables of $e$ may become bound in this process. $Traps(C)$ is the set of variables that can actually be trapped in the process of filling the holes in $C$.

2.2. Notation and Abbreviations
For any syntactic domain $Y$ and set of variables $X$ we let $Y_X$ be the elements of $Y$ with free variables in $X$. For example, $\mathbb{E}_{\emptyset}$ is the set of closed expressions, and $\mathbb{V}_{\{z\}}$ is the set of value expressions $v$ such that $FV(v) \subseteq \{z\}$. Furthermore, for any syntactic domain $Y$ and set $V$ of values we let $Y_V$ be those elements of $Y$ of the form $y^{\sigma}$ for some $y \in Y_Z$ and $\sigma \in Z \to V$, where $Z$ is a finite set of variables. For example, $\mathbb{V}_{\{\lambda x.\mathtt{set}(z,x)\}}$ is the set of value expressions of the form $v^{\{z := \lambda x.\mathtt{set}(z,x)\}}$ for $v \in \mathbb{V}_{\{z\}}$.
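The notions of free variables and capture-avoiding substitution used above can be made concrete with a small sketch. It assumes a tuple encoding of terms that is ours rather than the paper's: `("var", x)`, `("atom", a)`, `("lam", x, body)`, `("app", e0, e1)`, and `("op", name, args)` for primitive operations. The helper names `free_vars`, `fresh` and `subst` are likewise hypothetical.

```python
import itertools


def free_vars(e):
    """FV(e): the set of free variables of the term e."""
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag == "atom":
        return set()
    if tag == "lam":
        return free_vars(e[2]) - {e[1]}
    if tag == "app":
        return free_vars(e[1]) | free_vars(e[2])
    if tag == "op":
        return set().union(*(free_vars(a) for a in e[2]))
    raise ValueError(f"unknown term: {e!r}")


def fresh(x, avoid):
    """Pick a variant of the name x that does not occur in avoid."""
    for n in itertools.count():
        candidate = f"{x}{n}"
        if candidate not in avoid:
            return candidate


def subst(e, sigma):
    """e^sigma: simultaneous capture-avoiding substitution."""
    tag = e[0]
    if tag == "var":
        return sigma.get(e[1], e)
    if tag == "atom":
        return e
    if tag == "lam":
        x, body = e[1], e[2]
        inner = {k: v for k, v in sigma.items() if k != x}
        fv_body = free_vars(body)
        incoming = set().union(
            *(free_vars(v) for k, v in inner.items() if k in fv_body))
        if x in incoming:
            # Rename the bound variable so it cannot capture a free one.
            x_new = fresh(x, incoming | fv_body | set(inner))
            body = subst(body, {x: ("var", x_new)})
            x = x_new
        return ("lam", x, subst(body, inner))
    if tag == "app":
        return ("app", subst(e[1], sigma), subst(e[2], sigma))
    if tag == "op":
        return ("op", e[1], tuple(subst(a, sigma) for a in e[2]))
    raise ValueError(f"unknown term: {e!r}")


# v is a value expression with FV(v) = {z}, so v belongs to V_{z}.
v = ("lam", "y", ("app", ("var", "z"), ("var", "y")))
# Substituting y for z forces the bound y to be renamed.
w = subst(v, {"z": ("var", "y")})
```

In this encoding a term is closed exactly when `free_vars` returns the empty set, which corresponds to membership in $\mathbb{E}_{\emptyset}$.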
In order to make programs easier to read, we introduce some abbreviations.
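$\mathtt{let}\{x := e_0\}e_1$   abbreviates   $\mathtt{app}(\lambda x.e_1, e_0)$
$\mathtt{seq}(e)$   abbreviates   $e$
$\mathtt{seq}(e_0, \ldots, e_n)$   abbreviates   $\mathtt{let}\{d := e_0\}\mathtt{seq}(e_1, \ldots, e_n)$,   where $d \notin FV(e_i)$ for $i \leq n$
$\mathtt{Y}$   abbreviates   $\lambda f.\mathtt{let}\{z := \mathtt{mk}(\mathtt{nil})\}\mathtt{seq}(\mathtt{set}(z, \lambda x.\mathtt{app}(\mathtt{app}(f, \mathtt{get}(z)), x)), \mathtt{get}(z))$
$\mathtt{islam?}$   abbreviates   $\lambda x.\mathtt{and}(\mathtt{not}(\mathtt{isatom?}(x)), \mathtt{not}(\mathtt{ispr?}(x)), \mathtt{not}(\mathtt{iscell?}(x)))$
$\mathtt{cond}()$   abbreviates   $\mathtt{nil}$
$\mathtt{cond}([e_0 \Rightarrow e_0'], [e_1 \Rightarrow e_1'], \ldots, [e_n \Rightarrow e_n'])$   abbreviates   $\mathtt{if}(e_0, e_0', \mathtt{cond}([e_1 \Rightarrow e_1'], \ldots, [e_n \Rightarrow e_n']))$

Note that $\mathtt{Y}$ is a fixed-point combinator essentially identical to the one suggested by Landin [12]. When applied to a functional $F$ of the form $\lambda f.\lambda x.e$, $\mathtt{Y}$ creates a private local cell, $z$, with contents $G = \lambda x.\mathtt{app}(\mathtt{app}(F, \mathtt{get}(z)), x)$, and returns $G$. By privacy of $z$, $G$ is equivalent to $F(G)$ (cf. [13]).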
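The behavior of $\mathtt{Y}$ described above can be mimicked in Python. This is only an illustrative sketch: the `Cell` class is a hypothetical stand-in for the primitives mk, get and set, and Python closures play the role of $\lambda$-abstractions.

```python
class Cell:
    """A mutable reference cell, standing in for mk / get / set."""

    def __init__(self, contents=None):
        self.contents = contents


def Y(f):
    """Fixed point of a curried functional f, tied through a private cell."""
    z = Cell()                                  # let{z := mk(nil)}
    # set(z, lambda x. app(app(f, get(z)), x)); the cell is read at call time
    z.contents = lambda x: f(z.contents)(x)
    return z.contents                           # get(z)


# A functional F of the form lambda f. lambda x. e, here factorial.
factorial = Y(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))
result = factorial(5)
```

Because `z` is local to the call of `Y`, no other code can reassign the cell, which is the privacy property that the equivalence of $G$ and $F(G)$ relies on.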
2.3. Operational Semantics
The operational semantics of expressions is given by a reduction relation $\mapsto$ on a syntactic representation of the state of an abstract machine, called descriptions. A state has three components: the current state of memory, the current continuation, and the current instruction. Their syntactic counterparts are memory contexts, reduction contexts and redexes respectively. Redexes describe the primitive computation steps ($\beta$-reduction or the application of a primitive operation to a sequence of value expressions). Reduction contexts, $\mathbb{R}$, (called evaluation contexts in [8]) identify the subexpression of an expression that is to be evaluated next.

$\mathbb{R} = \{\bullet\} + \mathtt{app}(\mathbb{R}, \mathbb{E}) + \mathtt{app}(\mathbb{V}, \mathbb{R}) + \mathbb{F}_{m+n+1}(\mathbb{V}^m, \mathbb{R}, \mathbb{E}^n)$

$R$ ranges over $\mathbb{R}$. The crucial fact to note is that an arbitrary expression is either a value expression, or decomposes uniquely into a redex placed in a reduction context.
We represent the state of memory using memory contexts. A memory context $\Gamma$ is a context of the form

$\mathtt{let}\{z_1 := \mathtt{mk}(\mathtt{nil})\} \ldots \mathtt{let}\{z_n := \mathtt{mk}(\mathtt{nil})\}\mathtt{seq}(\mathtt{set}(z_1, v_1), \ldots, \mathtt{set}(z_n, v_n), \bullet)$

where $z_i \neq z_j$ when $i \neq j$. We have divided the context into allocation followed by assignment to allow for the construction of cycles. Thus, any state of memory is constructible by such an expression. We let $\Gamma$ range over memory contexts. We
can view memory contexts as finite maps from variables to value expressions. Thus we refer to their domain, $Dom(\Gamma)$; modify them, $\Gamma\{z := \mathtt{mk}(v)\}$, when $z \in Dom(\Gamma)$; and extend them, $\Gamma\{z := \mathtt{mk}(v)\}$, when $z \notin Dom(\Gamma)$. A description is a pair, $\Gamma; e$, with first component a memory context and second component an arbitrary expression. Value descriptions are descriptions whose expression is a value expression, $\Gamma; v$. If $Dom(\sigma) \cap Dom(\Gamma) = \emptyset$, then $(\Gamma; e)^{\sigma}$ is $(\Gamma^{\sigma}; e^{\sigma})$, where $\Gamma^{\sigma}$ is the result of applying $\sigma$ to each element in the range of $\Gamma$ (replacing $v_i$ in the above form by $v_i^{\sigma}$). The reduction relation $\mapsto^{*}$ is the reflexive transitive closure of $\mapsto$ (defined in [11]). The interesting clauses are:

(beta)   $\Gamma; R[\mathtt{app}(\lambda x.e, v)] \mapsto \Gamma; R[e^{\{x := v\}}]$
(mk)     $\Gamma; R[\mathtt{mk}(v)] \mapsto \Gamma\{z := \mathtt{mk}(v)\}; R[z]$   where $z \notin Dom(\Gamma) \cup FV(R[v])$
(get)    $\Gamma; R[\mathtt{get}(z)] \mapsto \Gamma; R[v]$   assuming $z \in Dom(\Gamma)$ and $\Gamma(z) = v$
(set)    $\Gamma; R[\mathtt{set}(z, v)] \mapsto \Gamma\{z := \mathtt{mk}(v)\}; R[\mathtt{nil}]$   assuming $z \in Dom(\Gamma)$
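The four clauses above can be turned into a small single-step interpreter. The following Python sketch makes several simplifying assumptions that are ours and not the paper's: terms are nested tuples, memory is a dictionary, cells are separate `("cell", n)` values rather than variables bound by a memory context, programs are closed (so naive substitution cannot capture variables), and pairs, branching and arithmetic are omitted. The names `step`, `run` and `let` are hypothetical.

```python
def is_value(e):
    return e[0] in ("var", "atom", "lam", "cell")


def subst(e, x, v):
    """Replace free occurrences of x in e by the closed value v."""
    tag = e[0]
    if tag == "var":
        return v if e[1] == x else e
    if tag in ("atom", "cell"):
        return e
    if tag == "lam":
        return e if e[1] == x else ("lam", e[1], subst(e[2], x, v))
    # app, mk, get, set: substitute in every subexpression
    return (tag,) + tuple(subst(a, x, v) for a in e[1:])


def step(mem, e):
    """One reduction step on the description (mem, e); None if e is a value."""
    if is_value(e):
        return None
    tag, args = e[0], list(e[1:])
    # Reduction context: the leftmost non-value argument is evaluated next.
    for i, a in enumerate(args):
        if not is_value(a):
            mem2, a2 = step(mem, a)
            args[i] = a2
            return mem2, (tag, *args)
    # All arguments are values, so e itself is the redex.
    if tag == "app" and args[0][0] == "lam":            # (beta)
        _, x, body = args[0]
        return mem, subst(body, x, args[1])
    if tag == "mk":                                     # (mk)
        z = ("cell", len(mem))    # fresh, since cells are numbered 0..n-1
        return {**mem, z: args[0]}, z
    if tag == "get" and args[0] in mem:                 # (get)
        return mem, mem[args[0]]
    if tag == "set" and args[0] in mem:                 # (set)
        return {**mem, args[0]: args[1]}, ("atom", "nil")
    raise ValueError(f"stuck redex: {e!r}")


def run(e, fuel=1000):
    """Iterate step from the empty memory until a value description is reached."""
    mem = {}
    for _ in range(fuel):
        nxt = step(mem, e)
        if nxt is None:
            return mem, e
        mem, e = nxt
    raise RuntimeError("no value reached within the step bound")


def let(x, e0, e1):
    """let{x := e0}e1 abbreviates app(lambda x. e1, e0)."""
    return ("app", ("lam", x, e1), e0)


# let{z := mk(0)} seq(set(z, 1), get(z))
prog = let("z", ("mk", ("atom", 0)),
           let("d", ("set", ("var", "z"), ("atom", 1)),
               ("get", ("var", "z"))))
final_mem, final_val = run(prog)
```

The step bound in `run` is only a guard for this sketch, since an expression need not evaluate to a value description.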
A description, $\Gamma; e$, is defined (written $\downarrow \Gamma; e$) if it evaluates to a value description. We say two closed expressions are equidefined, $e_0 \updownarrow e_1$, to mean that $(\downarrow e_0)$ iff $(\downarrow e_1)$ ($\downarrow e$ abbreviates $\downarrow \emptyset; e$ for closed $e$).

2.4. Uniformity of Reductions
Some simple consequences of the computation rules are that reduction is functional modulo alpha conversion, memory contexts may be pulled out of reduction contexts, and computation is uniform in free variables, unreferenced memory and reduction contexts.
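Lemma (cr [11]):

(i)   $\Gamma_0[e_0] = \Gamma_1[e_1]$ if $\Gamma; e \mapsto \Gamma_i; e_i$ for $i < 2$.
(ii)  $R[\Gamma[e]] \mapsto^{*} \Gamma; R[e]$ if $FV(R) \cap Dom(\Gamma) = \emptyset$.
(iii) $\Gamma; e \mapsto \Gamma'; e' \Rightarrow (\Gamma; e)^{\sigma} \mapsto (\Gamma'; e')^{\sigma}$ if $Dom(\Gamma') \cap Dom(\sigma) = \emptyset = FV(Rng(\sigma)) \cap (Dom(\Gamma') - Dom(\Gamma))$.
(iv)  $\Gamma; e \mapsto \Gamma'; e' \Rightarrow (\Gamma_0 \cup \Gamma); e \mapsto (\Gamma_0 \cup \Gamma'); e'$ if $Dom(\Gamma_0) \cap Dom(\Gamma') = \emptyset$.
(v)   $\Gamma; R[e] \mapsto \Gamma'; R[e'] \Rightarrow \Gamma; R'[e] \mapsto \Gamma'; R'[e']$ if $(Dom(\Gamma') \cap FV(R')) \subseteq Dom(\Gamma)$.

In (cr.i), $=$ is the usual notion of alpha equivalence. It makes explicit the fact that arbitrary choice in cell allocation is the same phenomenon as arbitrary choice of names of bound variables.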
2.5. Operational Equivalence

Two expressions are operationally equivalent, written $e_0 \cong e_1$, if for any closing context $C$, $C[e_0]$ is defined iff $C[e_1]$ is defined. In general it is very difficult to establish the operational equivalence of expressions. Thus it is desirable to have a simpler characterization of $\cong$, one that limits the class of contexts (or observations) that must be considered. A generalization of Milner's context lemma [18] provides the desired characterization. This theorem is the key to giving a semantics to VTLoE formulas.
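Theorem (Generalized Context Lemma [13,11]):

$$e_0 \cong e_1 \;\Leftrightarrow\; (\forall \Gamma, \sigma, R)\Bigl(\bigwedge_{i<2} FV(\Gamma[R[e_i^{\sigma}]]) = \emptyset \;\Rightarrow\; \Gamma[R[e_0^{\sigma}]] \updownarrow \Gamma[R[e_1^{\sigma}]]\Bigr)$$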