Formal Aspects of Computing (1997) 3: 1–000

© 1997 BCS

Reasoning about Aliasing

Mark Utting

Department of Computer Science The University of Waikato Hamilton, New Zealand Email: [email protected]

Keywords: Aliasing, Pointers, Object-Orientation, Refinement Calculus.

Abstract. Object-oriented systems are typically structured as complex networks of interacting mutable objects. To reason about such systems, simple and efficient techniques for coping with aliasing are needed. This paper identifies several key criteria for evaluating techniques for reasoning about aliasing, then proposes a technique which satisfies these criteria. The proposed technique is a simple extension of the traditional local store technique for modelling pointers.

The increasing popularity of the object-oriented style of programming has resulted in a renewed interest in reasoning about aliasing. Aliasing is common in object-oriented systems, because they are typically structured as complex networks of interacting mutable objects. Simple and efficient techniques for coping with aliasing are a prerequisite to reasoning effectively about the behaviour of such systems.

Techniques for reasoning about aliasing within data structures and aliasing between program variables were thoroughly researched during the 1970s [Rey78, Hor79, Cou90]. However, the results of that research have largely been ignored in recent research on aliasing within object-oriented systems. Furthermore, some of the 1970s solutions are unnecessarily restrictive in the forms of aliasing that they allow. Object-oriented systems require more flexible solutions.

In the first two sections of this paper, I reconsider the underlying problems caused by aliasing and identify the key criteria which should be used to evaluate the effectiveness of any technique for reasoning about aliasing. Then I argue that the best technique for satisfying these criteria is one of the 1970s techniques, which I shall call the local store technique. In the remaining sections of this paper, the local store technique is generalised to remove the unnecessary restrictions that are usually placed upon it. A simple Queue abstract data type (ADT) is used to illustrate how the additional flexibility is useful. An efficient implementation of local stores is shown to be possible by proving that several local stores can be data refined to a single global store.

Correspondence and offprint requests to: Mark Utting

1. The Essence of Aliasing

Aliasing occurs whenever there are two or more paths to a given entity. When this happens, we say that the entity is aliased, or that the paths are aliases for the entity. The syntax of paths is language dependent, but typically includes variable names and operators to index into arrays, dereference pointers and select components of records or objects.

Aliasing can affect many different kinds of entities. In most languages, it is possible for variables to be aliased, with two variable names referring to the same variable. For example, this can happen when a procedure $proc$ that takes two call-by-reference parameters is called as $proc(x, x)$, or as $proc(A[i], A[j])$ when $i = j$. During the execution of the procedure, both parameter names will refer to the same variable. A more subtle form of variable aliasing occurs when the procedure is called as $proc(x, y)$ and $y$ is a global variable that is accessed directly by the procedure body. Techniques for avoiding aliasing of variables are well known [Apt81, Rey78, Mor88] and can be adopted without restricting expressiveness too severely. In this paper, I assume that aliasing of variables is banned. Distinctness of variables is an important first step towards controlling other forms of aliasing.

A more common source of aliasing is complex data structures, including networks of objects. For example, given an array $A$, the expressions $A[i]$ and $A[j]$ will be aliases for the same array location whenever $i = j$. Two pointer variables $p1$ and $p2$ will be aliases for the same record whenever they contain the same pointer value. The expressions $p1\uparrow$ and $p1\uparrow.next\uparrow$ will be aliases for the same record if the record structures contain a loop with $p1 = p1\uparrow.next$. This kind of aliasing is essential for implementing some algorithms and banning it would not be acceptable. Instead, we want practical techniques for reasoning about the correctness of programs that rely on aliasing.
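The three kinds of data-structure aliasing just described can be reproduced in a few lines of Python. This is only an illustrative sketch: Python object references stand in for pointers, and the names `Node`, `A`, `p1` and `p2` are hypothetical, chosen to mirror the text.

```python
class Node:
    """A mutable record with a value and a link to another record."""
    def __init__(self, val, next=None):
        self.val = val
        self.next = next

# Array aliasing: A[i] and A[j] denote the same location whenever i == j.
A = [0, 0, 0]
i, j = 1, 1
A[i] = 7
array_paths_alias = (A[j] == 7)      # the update via A[i] is visible via A[j]

# Pointer aliasing: p1 and p2 contain the same pointer value.
p1 = Node(1)
p2 = p1
p1.val = 42
pointer_paths_alias = (p2.val == 42)

# Cyclic structure: p1 = p1.next, so p1 and p1.next are paths to one record.
p1.next = p1
cycle_paths_alias = (p1.next is p1) and (p1.next.next.val == 42)
```

In each case an update through one path changes what is observed through a syntactically different path, which is exactly what a reasoning technique has to account for.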
Reasoning about aliasing becomes a problem when the following three events occur:

1. Multiple paths $p_1, p_2, \ldots$ are created to a given object, $v$.
2. The object $v$ is updated via one of the paths, $p_i$.
3. Another path $p$ is dereferenced (written as $p\uparrow$).

The important question is whether or not the object referred to by $p$ was affected by the update of $v$ in step 2. If $p \in \{p_1, p_2, \ldots\}$, then the result of step 3 will be the new value of $v$; otherwise it will be some other value that was unchanged by step 2.

Of course, if we know the precise state of the program after step 1, then we know exactly which paths are aliased and which are not, so it is easy to determine whether or not $p$ refers to $v$. However, to be able to reason about programs in a practical fashion, it is essential to be able to reason about sets of possible executions, not just a single execution. Typical programs have millions of possible states, or even an infinite number, so we cannot reason individually about each state, but must use abstraction and reason about large sets of similar states. So we need a way of reasoning about the effect of update commands even when we have imperfect or approximate knowledge about the initial state.

The obvious solution to this dilemma is to perform a case analysis in step 3, assuming in one case that $p \in \{p_1, p_2, \ldots\}$ and in the other case that it is not. However, this naive solution is not satisfactory by itself, since it typically results in a large (and exponential) number of cases to consider. For example, if we want to determine the value of an expression $E$ after an update, and $E$ dereferences $N$ pointers, each of those $N$ dereferences will generate a case split, which will give us $2^N$ possible results for the value of $E$! A sequence of updates results in even more case splits.

This shows that a crucial issue for practical reasoning about aliasing is the efficiency of reasoning. That is, we want techniques that do not lead to an exponential explosion in the number of aliasing case splits. For example, one simple technique for avoiding an explosion of case splits is to have simple decidable criteria for identifying common cases where an update and a dereference cannot possibly interact. This allows the majority of case splits to be eliminated, leaving us to consider only the more interesting or complex cases, in which the update and dereference may interact.

There are other criteria that we want to satisfy when reasoning about aliasing. In all, there are four that we will consider in this paper:

Soundness: The reasoning technique must not allow incorrect deductions to be made about program behaviour.

Completeness: The reasoning technique must be able to handle all forms of (pointer) aliasing that can be used in programs.

Efficiency: As discussed above, the reasoning technique should reduce as far as possible the number of case splits caused by possible aliasing between pointers.

Modularity: It should be possible to hide aliasing within a module or ADT, if desired. That is, an ADT whose specification does not use aliasing or pointers should be able to be refined to use a pointer implementation without changing the interface of the module.
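To make the cost of the naive case analysis concrete, the following Python sketch enumerates the alias assumptions generated by an expression that dereferences several pointers after a single update. The function names `alias_case_splits` and `pruned_case_splits` are hypothetical, and the sketch only counts cases; it does not perform any reasoning.

```python
from itertools import product

def alias_case_splits(pointer_names):
    """Enumerate every combination of 'aliases the updated object' (True)
    or 'does not' (False) for the given dereferenced pointers."""
    return [dict(zip(pointer_names, choice))
            for choice in product([True, False], repeat=len(pointer_names))]

def pruned_case_splits(pointer_names, cannot_interact):
    """Drop pointers that a decidable criterion shows cannot interact with
    the update, then enumerate the cases for the remaining pointers."""
    remaining = [name for name in pointer_names if name not in cannot_interact]
    return alias_case_splits(remaining)

cases = alias_case_splits(["p", "q", "r"])
pruned = pruned_case_splits(["p", "q", "r"], cannot_interact={"q", "r"})
print(len(cases), len(pruned))  # 8 2
```

With three dereferences the naive analysis yields $2^3 = 8$ cases, while ruling out two of the pointers in advance leaves only two.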

2. Local Stores: A Historical View

In the 1960s and early 1970s, the denotational semantics work developed by Strachey [Str72] included techniques for formally modelling pointers and aliasing. Pointers were modelled as locations, and were used as indexes into a global store function, which mapped each location to a corresponding value. Hoare and Wirth used the same concepts to define axioms for pointers in their axiomatic definition of Pascal in the early 1970s [HW73].¹ They defined the declaration of a pointer type

$$\mathbf{type}\ T = \uparrow T_0$$

so that it creates a set of pointers (arbitrary values with equality as the only operation), a function $\sigma$ that maps those pointers to values of type $T_0$, and an integer that records the size of $\sigma$ (this is incremented by each new operation). This means that an assignment such as $p\uparrow := E$ is simply an assignment to $\sigma(p)$, which is equivalent to

$$\sigma := \sigma \oplus \{p \mapsto E\}$$

($\oplus$ is the function override operator of the Z specification language [Spi92].) With this approach, pointers are no harder to reason about than arrays. Furthermore, this approach goes a long way towards meeting our efficiency criterion from Section 1, because each type of pointer has a distinct $\sigma$ function, so updates of different types of pointers can never interact. That is, we can avoid case analysis whenever an updated pointer and a dereferenced pointer have different types. For programs that use a variety of pointer types, this reduces the number of case splits dramatically.

¹ It is interesting to note that an early version of Pascal [Wir71] included local stores (called classes) as an explicit language construct. Each class also specified the maximum number of elements that it could contain, presumably to make heap management easier. However, these explicit class variables were dropped from later versions of Pascal.

The Euclid language [L+77, PHL+77], designed several years after Hoare and Wirth's axiomatization of Pascal, incorporated further improvements in the area of pointers and aliasing. Firstly, its designers gave the implicit $\sigma$ arrays a name, calling them collections. Secondly, they recognised the importance of encouraging programmers to create several collections for the same type of value where possible. This could already be done in Pascal by declaring several different pointer types to a common type, but was encouraged in Euclid by requiring collections to be declared before pointer types to those collections. Their rationale for this is worth repeating:

"Thus, the programmer can partition his dynamic variables and pointers into separate collections to indicate some of his knowledge about how they will be used; the verifier is assured that pointers in different collections can never point to overlapping variables." [PHL+77, Page 14]
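The store-per-type model and the Euclid collections can be sketched in Python by representing each store as a dictionary from opaque pointer values to stored values. This is a minimal sketch under stated assumptions: the helper `override` and the store names `int_cells` and `list_cells` are hypothetical, and string keys stand in for pointers.

```python
def override(store, p, value):
    """Return a copy of store in which p maps to value (function override)."""
    updated = dict(store)
    updated[p] = value
    return updated

# One store per pointer type (or per collection); pointers are opaque keys.
int_cells = {"a1": 10, "a2": 20}
list_cells = {"b1": "x"}

# An assignment through a pointer into int_cells updates that store only.
int_cells_after = override(int_cells, "a1", 99)

# A dereference through the other store needs no alias case split:
# list_cells is not mentioned by the assignment, so it cannot have changed.
unchanged = list_cells["b1"]
```

Because the assignment names only `int_cells`, the value read from `list_cells` is known without asking whether `"a1"` and `"b1"` might be aliases.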

A third extension was to treat collections as values, rather than as types (or some implicit set associated with a type, as in the Pascal semantics). The importance of this is that if collections are first-class values, we can pass them as parameters to procedures, and this allows us to use smaller, more specific collections. For example, in a program that manages several linked lists, if we intend to allow aliasing only within lists, not between them, then we could declare a separate collection for each list. A procedure that manipulates two linked lists could then be passed the two collections, as well as various pointers into those collections. When reasoning about aliasing within the procedure, it would be clear that updates to one of the linked lists did not affect the other linked list, because distinct collections are involved. In contrast, the Pascal solution would require a single collection for all the lists, because the type of the linked list record declarations determines which collection will be used. Declaring multiple types of linked lists (all with the same structure and functionality) would allow separate collections to be used, but loses flexibility and can require a lot of code duplication.² Thus, the Euclid collections allow the pointers to be split into smaller sets, which results in fewer case splits and more efficient reasoning.

² E.g. if we make $N$ copies of the list type, then we need $N$ copies of each procedure that handles a single list, $N^2$ copies of procedures that handle two lists, etc.

It should be clear by now that the key to efficient reasoning about pointers is to partition the global store into smaller local stores in such a way that aliasing can never occur between those local stores. Collections are an excellent tool for doing this in a language such as Euclid. However, most recent work on reasoning about aliasing in object-oriented languages has tended to use a single global store for pointers, rather than partitioning them into local stores [dB91, HLW+92, Hog91]. Reasons for this may include:

• Language designers have not realized the importance of partitioning pointers into non-aliased subsets, so have not provided any language facilities which allow programmers to specify such partitions.

• Even in strongly-typed object-oriented languages, pointers cannot automatically be partitioned according to their types as they can in Euclid or Pascal. The problem is that subtyping/inheritance means that a pointer whose static type is $T$ is allowed to point to values of type $T'$, provided that $T'$ is a subtype of $T$. Thus pointers of different types may still be aliases.

• The patterns of aliasing are typically more dynamic in object-oriented programs than in Pascal-like programs, so a static relationship between pointer variables and local stores may be too restrictive.

In the next section, these problems and restrictions are addressed.

3. Local Stores: An Extended View

In this section, a new form of local stores is described. It extends the collections of Euclid in two ways:

• Each local store is allowed to contain many different types of object. This gives the programmer maximum flexibility in dividing up the global store in application-specific ways.

• We provide a transfer operation that moves pointers from one local store to another. This allows the partitioning of the global store to be dynamic, rather than static.

The underlying philosophy of this approach is to give programmers a highly expressive notation to describe the ways in which their programs use pointers.

There is a small penalty for making the second extension. Since pointers are not always associated with the same local store, each dereference and update through a pointer must explicitly indicate which local store is being used. For instance, we must write $s(p)$ rather than just $p\uparrow$. However, this is not too awkward, and in contexts where $p$ is always associated with a single store we can recover the $p\uparrow$ notation by adding an appropriate declaration about $p$. In this paper, the declaration $p\uparrow \mathrel{\widehat{=}} s(p)$, appearing wherever declarations are allowed, defines $p\uparrow$ as an abbreviation for $s(p)$ throughout the scope of the declaration.

The new definition of local stores is given as an ADT in Fig. 1. The language used for describing programs and ADTs in this paper is the refinement calculus notation [Mor94], extended with a simple ADT construct,

$$\mathbf{ADT}\ T \mathrel{\widehat{=}} Type \quad Init \quad Operations$$

Here $T$ is the name of the ADT, $Type$ is the state space of the ADT instances, $Init$ is a predicate that specifies the initial state of each instance and $Operations$ is a set of named procedures whose formal parameters may be instances of type $T$.


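$$
\begin{array}{ll}
[Val] & \text{(the set of storable values)} \\
Ptr : \mathbb{P}\,Val & \text{(the set of pointers is a subset of } Val\text{)} \\
nil : Ptr &
\end{array}
$$

$$
\begin{array}{l}
\mathbf{ADT}\ Store \mathrel{\widehat{=}} Ptr \setminus \{nil\} \mathrel{⇻} Val \\
\mathbf{initially}(s : Store) \mathrel{\widehat{=}} s = \varnothing \\[4pt]
\mathbf{procedure}\ new(\mathbf{ref}\ s : Store;\ \mathbf{value}\ v : Val;\ \mathbf{ref}\ p : Ptr) \\
\quad \mathrel{\widehat{=}}\ s, p : [\ true\ ,\ p \notin \mathrm{dom}\,s_0 \land s = s_0 \cup \{p \mapsto v\}\ ] \\[4pt]
\mathbf{procedure}\ update(\mathbf{ref}\ s : Store;\ \mathbf{value}\ p : Ptr;\ \mathbf{value}\ v : Val) \\
\quad \mathrel{\widehat{=}}\ s : [\ p \in \mathrm{dom}\,s\ ,\ s = s_0 \oplus \{p \mapsto v\}\ ] \\[4pt]
\mathbf{procedure}\ deref(\mathbf{value}\ s : Store;\ \mathbf{value}\ p : Ptr;\ \mathbf{ref}\ v : Val) \\
\quad \mathrel{\widehat{=}}\ v : [\ p \in \mathrm{dom}\,s\ ,\ v = s(p)\ ] \\[4pt]
\mathbf{procedure}\ transfer(\mathbf{ref}\ s, t : Store;\ \mathbf{value}\ ptrs : \mathbb{F}\,Ptr) \\
\quad \mathrel{\widehat{=}}\ s, t : [\ ptrs \subseteq \mathrm{dom}\,s\ ,\ s = ptrs \mathbin{⩤} s_0 \land t = t_0 \cup (ptrs \lhd s_0) \land ptrs \cap \mathrm{dom}\,t_0 = \varnothing\ ]
\end{array}
$$

Fig. 1. The Local Store ADT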
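The operations of Fig. 1 can be approximated by a small executable model. The following Python sketch is not part of the formal development: the class name `Store` follows the figure, but `PreconditionError`, the module-level pointer counter and the `dom` helper are assumptions added here. Preconditions are checked at run time, so the sketch behaves like a debugging implementation rather than an efficient one.

```python
import itertools

class PreconditionError(Exception):
    """Raised when an operation is called outside its precondition."""

# Assumption of this sketch: pointers are globally unique integers.
_fresh_pointers = itertools.count(1)

class Store:
    """A local store: a finite partial function from pointers to values."""

    def __init__(self):
        self._cells = {}                      # initially the empty function

    def new(self, v):
        p = next(_fresh_pointers)             # fresh, so p is not in dom s0
        self._cells[p] = v                    # s = s0 extended with p -> v
        return p

    def update(self, p, v):
        if p not in self._cells:
            raise PreconditionError("update: p is not in dom s")
        self._cells[p] = v                    # s = s0 overridden at p

    def deref(self, p):
        if p not in self._cells:
            raise PreconditionError("deref: p is not in dom s")
        return self._cells[p]

    def transfer(self, t, ptrs):
        """Move the cells for ptrs out of this store and into store t."""
        ptrs = set(ptrs)
        if t is self:
            raise PreconditionError("transfer: stores must be distinct")
        if not ptrs <= set(self._cells):
            raise PreconditionError("transfer: ptrs is not within dom s")
        if not ptrs.isdisjoint(t._cells):
            raise PreconditionError("transfer: ptrs overlaps dom t")
        for p in ptrs:
            t._cells[p] = self._cells.pop(p)

    def dom(self):
        """Debugging helper only; the ADT itself offers no membership test."""
        return frozenset(self._cells)

    def __copy__(self):
        raise TypeError("local stores cannot be copied")

    def __deepcopy__(self, memo):
        raise TypeError("local stores cannot be copied")
```

A short usage example:

```python
s, t = Store(), Store()
p = s.new("a")
s.update(p, "b")
s.transfer(t, {p})
print(t.deref(p))  # b
```

After the transfer, `p` can be dereferenced only through `t`; an attempt to dereference it through `s` raises `PreconditionError`.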

For generality, the programming language used in this paper is untyped, except that formal parameters of procedures may be typed. There are two reasons for typing formal parameters:

• to identify which parameters are instances of the ADT being defined (this is necessary to support data refinement of the ADT), and

• as an abbreviation for adding a typing predicate to the precondition and postcondition of the procedure.

For example, within $\mathbf{ADT}\ T \mathrel{\widehat{=}} E \ldots$,

$$proc(\mathbf{ref}\ x : T;\ y : T2) \mathrel{\widehat{=}} x : [\ P\ ,\ Q\ ]$$

is an abbreviation for

$$proc(\mathbf{ref}\ x;\ y) \mathrel{\widehat{=}} x : [\ x \in E \land y \in T2 \land P\ ,\ x \in E \land y \in T2 \land Q\ ]$$

It is important that Store instances cannot be copied, because at all times during the execution of a program, we want all the pointers in the program to be partitioned by the local stores that have been created up to that point. Copying a (non-empty) local store would violate this partitioning property. To ensure that local stores (and possibly other user-defined types) cannot be copied, it is sufficient to impose the following restrictions on the refinement calculus language:

• Instances of an ADT cannot be copied unless the ADT explicitly defines an assignment operation (:=). This is similar to the limited private types of Ada [AA83].

• As illustrated in Fig. 1, reference parameters are used rather than the usual value-result parameters of the refinement calculus [Mor88]. Since variables are not aliased, the two are equivalent semantically, but the use of reference parameters emphasises that no copying of parameters is necessary.

• Value parameters are immutable (i.e. they cannot be assigned to within a procedure and cannot be passed by reference to any other procedure). With this restriction, plus the fact that procedure parameters cannot be aliased, it makes no difference whether value parameters are copied or passed by reference, so an implementation is free to always avoid copying and instead pass local stores (or all ADT instances) by reference.

Since Store instances cannot be copied, and variables cannot be aliased, at all times between the creation of a local store $s$ and the destruction of $s$,³ there is a unique variable $v$ that contains $s$. Note that $v$ may either be bound directly to $s$, or to some compound value (even another local store!) that contains $s$. In either case, any expression or statement that dereferences or updates $s$ must involve $v$. This property gives us a simple syntactic test for determining when an assignment does not change $s$: if the left-hand side of the assignment does not contain $v$, then it cannot change $s$. Given a reasonable partitioning of pointers into local stores, this simple test eliminates the vast majority of case splits that would occur if we attempted to reason about pointers using a single global store.

Even though the Store ADT allows pointers to be dereferenced only with respect to a given (local) store, it still requires pointers to be globally unique across all stores, so that the transfer operation can move pointers between two arbitrary local stores without having to reallocate pointers. This is important in practice, because we may want to distribute a particular pointer value throughout various data structures and it would be highly inconvenient to have to update all those data structures whenever we wanted to transfer the pointer's value from one local store to another.
The following simple example shows how local stores simplify reasoning. To improve readability, $update(s, p, v)$ is written using the usual assignment notation, $s(p) := v$. Pointer dereferences are written simply as function applications, because the Store ADT is specified using a function.⁴

$$
\begin{array}{l}
simple(\mathbf{ref}\ s, t : Store;\ \mathbf{value}\ p1, p2 : Ptr) \\
\quad \mathrel{\widehat{=}}\ |[\ \{p1 \in \mathrm{dom}\,s \land p2 \in \mathrm{dom}\,t\};\ s(p1) := 4;\ \mathbf{if}\ t(p2) = 2 \rightarrow \cdots\ \mathbf{fi}\ ]|
\end{array}
$$

In this example, we can immediately deduce that the update of $s(p1)$ does not affect the following dereference of $p2$, because distinct local stores are involved. Note that $s$ and $t$ cannot be the same local store, because the Store ADT does not allow copying, and because simple variables like $s$ and $t$ cannot be aliased (Section 1). In contrast, if $simple$ used a global store, the update of $p1$ could potentially affect the dereference of $p2$.

³ Or the point in the program where $s$ becomes permanently inaccessible; we can view it as being destroyed by a garbage collection process at that point.

⁴ Note that programs that use an ADT are allowed to refer to its abstract representation directly in predicates and specifications. However, all such references must eventually be refined into calls to the ADT operations.
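The contrast between the local-store and global-store versions of $simple$ can be observed directly in a small Python sketch. Stores are modelled as plain dictionaries, integer keys stand in for pointers, and the function names `simple_local` and `simple_global` are hypothetical.

```python
def simple_local(s, t, p1, p2):
    """s and t are distinct local stores; p1 is in dom s and p2 is in dom t.
    Returns True when the read of t(p2) is unaffected by the update."""
    assert s is not t and p1 in s and p2 in t
    before = t[p2]
    s[p1] = 4          # the assignment mentions s only
    return t[p2] == before

def simple_global(g, p1, p2):
    """The same code over one global store g; p1 and p2 may be aliases."""
    assert p1 in g and p2 in g
    before = g[p2]
    g[p1] = 4
    return g[p2] == before

print(simple_local({1: 0}, {2: 2}, 1, 2))   # True
print(simple_global({1: 0, 2: 2}, 1, 2))    # True
print(simple_global({1: 0}, 1, 1))          # False
```

With two local stores the answer is `True` for every argument that meets the precondition, so no case split is needed. With a single global store the answer depends on whether `p1 == p2`, which is exactly the case split the local store technique avoids.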


4. A Case Study: Queues

To illustrate the flexibility of local stores, we will consider an integer queue ADT as a case study. The ADT has the usual join and leave operations for queues, plus an $append(q1, q2)$ operation that concatenates the contents of $q2$ onto the end of $q1$, leaving $q2$ empty.

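$$
\begin{array}{l}
\mathbf{ADT}\ IntQ \mathrel{\widehat{=}} \mathrm{seq}\,\mathbb{Z}; \\
\mathbf{initially}(q : IntQ) \mathrel{\widehat{=}} q = \langle\,\rangle \\[4pt]
\mathbf{procedure}\ join(\mathbf{ref}\ q : IntQ;\ \mathbf{value}\ x : \mathbb{Z}) \\
\quad \mathrel{\widehat{=}}\ q : [\ true\ ,\ q = q_0 \mathbin{^\frown} \langle x \rangle\ ] \\[4pt]
\mathbf{procedure}\ leave(\mathbf{ref}\ q : IntQ;\ \mathbf{result}\ x : \mathbb{Z}) \\
\quad \mathrel{\widehat{=}}\ q, x : [\ q \neq \langle\,\rangle\ ,\ q_0 = \langle x \rangle \mathbin{^\frown} q\ ] \\[4pt]
\mathbf{procedure}\ append(\mathbf{ref}\ q1, q2 : IntQ) \\
\quad \mathrel{\widehat{=}}\ q1, q2 : [\ true\ ,\ q1 = q1_0 \mathbin{^\frown} q2_0 \land q2 = \langle\,\rangle\ ]
\end{array}
$$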

This can be data refined to use a linked list representation, with a local store hidden inside each instance of the IntQ ADT. The following Z schema [Spi92], $Node$, defines the type of each node of the linked list. The $LList$ schema defines the whole linked list, with a head and tail pointer and a local store (the sequence of $Node$ records is included to simplify the refinement process; it would typically be removed by a further refinement step).

$$
\begin{array}{|l}
\hline
Node \\
\hline
val : \mathbb{Z} \\
next : Ptr \\
\hline
true \\
\hline
\end{array}
$$

$$
\begin{array}{|l}
mkNode : \mathbb{Z} \times Ptr \rightarrow Node \\
valof : Node \rightarrow \mathbb{Z} \\
\hline
\forall v : \mathbb{Z};\ p : Ptr;\ n : Node \bullet \\
\quad mkNode(v, p).val = v \land mkNode(v, p).next = p \land valof\ n = n.val
\end{array}
$$

$$
\begin{array}{|l}
\hline
LList \\
\hline
s : Store \\
nodes : \mathrm{seq}\,Node \\
hd, tl : Ptr;\quad hd\uparrow \mathrel{\widehat{=}} s(hd);\quad tl\uparrow \mathrel{\widehat{=}} s(tl) \\
\hline
\forall i : 1 \,..\, \#nodes - 1 \bullet s(nodes(i).next) = nodes(i + 1) \\
last(nodes).next = nil \\
nodes = \langle\,\rangle \Rightarrow (hd = nil \land tl = nil) \\
nodes \neq \langle\,\rangle \Rightarrow (hd\uparrow = head\ nodes \land tl\uparrow = last\ nodes) \\
\hline
\end{array}
$$

To derive a linked list implementation of IntQ, we use the following data abstraction relation between the abstract variable $q : IntQ$ and the concrete variable $c : LList$.

$$QL(q, c) \mathrel{\widehat{=}} q = c.nodes \mathbin{⨾} valof \land c \in LList$$

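A general rule for data refining an ADT $T$ using an abstraction relation $R_{ac}$ (abstract variable $a$ and concrete variable $c$) is [BH93a, BH93b]:

$$
\begin{array}{l}
\mathbf{ADT}\ T \mathrel{\widehat{=}} T_a \\
\mathbf{initially}(a : T) \mathrel{\widehat{=}} I(a) \\
\mathbf{procedure}\ op_1(\mathbf{ref}\ a : T;\ \ldots) \mathrel{\widehat{=}} a, w_1 : [\ P_1\ ,\ Q_1\ ] \\
\mathbf{procedure}\ op_2(\mathbf{ref}\ a_1, a_2 : T;\ \ldots) \mathrel{\widehat{=}} a_1, a_2, w_2 : [\ P_2\ ,\ Q_2\ ] \\[4pt]
\sqsubseteq \\[4pt]
\mathbf{ADT}\ T \mathrel{\widehat{=}} T_c \\
\mathbf{initially}(c : T) \mathrel{\widehat{=}} (\exists a : T_a \bullet R_{ac} \land I(a)) \\
\mathbf{procedure}\ op_1(\mathbf{ref}\ c : T;\ \ldots) \\
\quad \mathrel{\widehat{=}}\ c, w_1 : [\ (\exists a : T_a \bullet R_{ac} \land P_1)\ ,\ (\exists a : T_a \bullet R_{ac} \land Q_1)\ ] \\
\mathbf{procedure}\ op_2(\mathbf{ref}\ c_1, c_2 : T;\ \ldots) \\
\quad \mathrel{\widehat{=}}\ c_1, c_2, w_2 : [\ (\exists a_1, a_2 : T_a \bullet R_{a_1 c_1} \land R_{a_2 c_2} \land P_2)\ ,\ (\exists a_1, a_2 : T_a \bullet R_{a_1 c_1} \land R_{a_2 c_2} \land Q_2)\ ]
\end{array}
$$

where $w_1$ and $w_2$ are all the other (non-$T$) parameters to $op_1$ and $op_2$, respectively. Thus, the data refined initialization condition is: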

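$$
\begin{array}{l}
\mathbf{initially}(c : IntQ) \\
\quad \mathrel{\widehat{=}}\ (\exists q : \mathrm{seq}\,\mathbb{Z} \bullet q = c.nodes \mathbin{⨾} valof \land c \in LList \land q = \langle\,\rangle) \\
\quad =\ (c.nodes = \langle\,\rangle)
\end{array}
$$

The data refined join operation is:

$$
\begin{array}{l}
\mathbf{procedure}\ join(\mathbf{ref}\ c : IntQ;\ \mathbf{value}\ x : \mathbb{Z}) \\
\quad \mathrel{\widehat{=}}\ c : [\ (\exists q : \mathrm{seq}\,\mathbb{Z} \bullet q = c.nodes \mathbin{⨾} valof \land c \in LList \land true)\ , \\
\qquad\qquad (\exists q : \mathrm{seq}\,\mathbb{Z} \bullet q = c.nodes \mathbin{⨾} valof \land c \in LList \land q = (c_0.nodes \mathbin{⨾} valof) \mathbin{^\frown} \langle x \rangle)\ ] \\[4pt]
\quad \sqsubseteq\ c : [\ true\ ,\ c.nodes \mathbin{⨾} valof = (c_0.nodes \mathbin{⨾} valof) \mathbin{^\frown} \langle x \rangle\ ] \\[4pt]
\quad \sqsubseteq\ |[\ \mathbf{var}\ n : Node = mkNode(x, nil);\ p : Ptr \bullet \\
\qquad\quad new(c.s, n, p); \\
\qquad\quad \mathbf{if}\ c.hd = nil \rightarrow c.hd, c.tl := p, p \\
\qquad\quad [\,]\ c.hd \neq nil \rightarrow c.tl\uparrow.next, c.tl := p, p \\
\qquad\quad \mathbf{fi} \\
\qquad ]|
\end{array}
$$

Note that $c.tl\uparrow.next := p$ is an abbreviation for $c.s(c.tl).next := p$, because the $tl\uparrow \mathrel{\widehat{=}} s(tl)$ declaration permanently associates $c.tl$ with $c.s$. Assignment to a record component like $next$ implicitly leaves all other components unchanged, so this is an abbreviation for $c.s(c.tl) := mkNode(c.s(c.tl).val, p)$, which is in turn an abbreviation for $update(c.s, c.tl, mkNode(c.s(c.tl).val, p))$.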


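Similarly, the data refined append operation is:

$$
\begin{array}{l}
\mathbf{procedure}\ append(\mathbf{ref}\ c, d : IntQ) \\
\quad \mathrel{\widehat{=}}\ c, d : [\ true\ ,\ c.nodes \mathbin{⨾} valof = (c_0.nodes \mathbin{⨾} valof) \mathbin{^\frown} (d_0.nodes \mathbin{⨾} valof) \land d.nodes \mathbin{⨾} valof = \langle\,\rangle\ ] \\[4pt]
\quad \sqsubseteq\ \mathbf{if}\ c.hd = nil \rightarrow c.hd, c.tl := d.hd, d.tl \\
\qquad\ [\,]\ c.hd \neq nil \rightarrow c.tl\uparrow.next, c.tl := d.hd, d.tl \\
\qquad\ \mathbf{fi}; \\
\qquad\ transfer(d.s, c.s, \mathrm{dom}(d.s)); \\
\qquad\ d.hd, d.tl := nil, nil
\end{array}
$$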

This example shows how local stores satisfy the modularity requirement discussed in Section 1. The IntQ ADT has been refined to use a linked list implementation with pointers, but its interface is unchanged. None of its clients need to know whether or not it uses pointers. Furthermore, if some of its clients use pointers themselves, reasoning about those pointers is completely independent of the pointers used in the implementation of IntQ. These modularity properties are essential for any practical reasoning about pointers.
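The linked list refinement of IntQ can be exercised with a small executable model. The Python sketch below is an illustration under several assumptions: the class name `LinkedIntQ`, the dictionary representation of the hidden local store and the `contents` abstraction function are hypothetical; `leave` is written by analogy, since the text derives only join and append; and `append` returns early when `d` is empty, a guard added by this sketch.

```python
import itertools

# Assumption of this sketch: pointers are globally unique integers, None is nil.
_fresh_pointers = itertools.count(1)

class LinkedIntQ:
    """Integer queue held as a linked list inside a private local store."""

    def __init__(self):
        self.s = {}        # hidden local store: pointer -> [val, next]
        self.hd = None
        self.tl = None

    def join(self, x):
        p = next(_fresh_pointers)
        self.s[p] = [x, None]            # new(c.s, mkNode(x, nil), p)
        if self.hd is None:
            self.hd = self.tl = p
        else:
            self.s[self.tl][1] = p       # update the next field of the tail
            self.tl = p

    def leave(self):
        if self.hd is None:
            raise IndexError("leave: queue is empty")
        val, nxt = self.s[self.hd]       # the old node is left for a collector
        self.hd = nxt
        if nxt is None:
            self.tl = None
        return val

    def append(self, d):
        """Concatenate the contents of d onto this queue, leaving d empty."""
        if d is self:
            raise ValueError("append: the two queues must be distinct")
        if d.hd is None:                 # guard added in this sketch
            return
        if self.hd is None:
            self.hd, self.tl = d.hd, d.tl
        else:
            self.s[self.tl][1] = d.hd
            self.tl = d.tl
        self.s.update(d.s)               # transfer(d.s, c.s, dom(d.s))
        d.s.clear()
        d.hd = d.tl = None

    def contents(self):
        """Abstraction function: the sequence of values along the list."""
        out, p = [], self.hd
        while p is not None:
            val, p = self.s[p]
            out.append(val)
        return out
```

A short usage example:

```python
q1, q2 = LinkedIntQ(), LinkedIntQ()
q1.join(1); q1.join(2); q2.join(3)
q1.append(q2)
print(q1.contents(), q2.contents())  # [1, 2, 3] []
```

Clients see only the sequence behaviour; the stores `q1.s` and `q2.s` never appear in the interface, and the transfer inside `append` is what allows nodes allocated in one queue to be reached from the other afterwards.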

5. Implementation of Local Stores

In this section, we consider how local stores can be implemented efficiently. In most situations, it would be desirable to implement all the local stores by a single global store. This global store could then be implemented directly by the global store provided by most programming languages. The advantage of this approach is that the transfer operation can be implemented by skip! With this approach, local stores are simply being used as a way of structuring and documenting the disciplined usage of global pointers. In other words, we get the best of both worlds: the ease and efficiency of reasoning that only local stores can provide, plus the usual efficient implementation of global stores (with no runtime overhead for recording which store each pointer belongs to).

The Store ADT in Fig. 1 is carefully designed so that this refinement to a single global store is possible. In particular, it does not provide any operations for testing whether or not a given pointer is in the domain of a local store; such operations would require some runtime representation of the contents of each local store.

To prove that the Store ADT given in Fig. 1 is refined by a single global store, we need a more general form of data refinement than the ADT refinement rule in Section 4, so that we can replace all the instances of the Store ADT by a single instance. An easy way of doing this is to translate the Store ADT into an equivalent Morgan-style module [Mor94], which maintains a sequence of Store instances and provides explicit Create and Destroy operations for creating and destroying those instances. The details of such a translation have been worked out by Bancroft and Hayes [BH93b, BH93a]. After translating the Store ADT, we get the MStore module in Fig. 2.
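$$
\begin{array}{l}
\mathbf{module}\ MStore \\
\quad \mathbf{local\ type}\ Store \mathrel{\widehat{=}} Ptr \setminus \{nil\} \mathrel{⇻} Val; \\
\quad \mathbf{local\ type}\ StoreId \mathrel{\widehat{=}} \mathbb{N}; \\
\quad \mathbf{var}\ A : StoreId \mathrel{⇻} Store; \\
\quad \mathbf{initially}\ A = \{\}; \\[4pt]
\quad \mathbf{procedure}\ create(\mathbf{result}\ k : StoreId) \\
\qquad \mathrel{\widehat{=}}\ A, k : [\ true\ ,\ k \notin \mathrm{dom}\,A_0 \land A = A_0 \cup \{k \mapsto \{\}\}\ ]; \\[4pt]
\quad \mathbf{procedure}\ new(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ v : Val;\ \mathbf{ref}\ p : Ptr) \\
\qquad \mathrel{\widehat{=}}\ A, p : [\ k \in \mathrm{dom}\,A\ ,\ p \notin \mathrm{dom}(A_0\,k) \land A\,k = A_0\,k \cup \{p \mapsto v\} \land \{k\} \mathbin{⩤} A = \{k\} \mathbin{⩤} A_0\ ]; \\[4pt]
\quad \mathbf{procedure}\ update(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ p : Ptr;\ \mathbf{value}\ v : Val) \\
\qquad \mathrel{\widehat{=}}\ A : [\ k \in \mathrm{dom}\,A \land p \in \mathrm{dom}(A\,k)\ ,\ A\,k = A_0\,k \oplus \{p \mapsto v\} \land \{k\} \mathbin{⩤} A = \{k\} \mathbin{⩤} A_0\ ]; \\[4pt]
\quad \mathbf{procedure}\ deref(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ p : Ptr;\ \mathbf{ref}\ v : Val) \\
\qquad \mathrel{\widehat{=}}\ v : [\ k \in \mathrm{dom}\,A \land p \in \mathrm{dom}(A\,k)\ ,\ v = A\,k\,p\ ]; \\[4pt]
\quad \mathbf{procedure}\ transfer(\mathbf{value}\ s, t : StoreId;\ \mathbf{value}\ ptrs : \mathbb{F}\,Ptr) \\
\qquad \mathrel{\widehat{=}}\ A : [\ s \neq t \land s \in \mathrm{dom}\,A \land t \in \mathrm{dom}\,A \land ptrs \subseteq \mathrm{dom}(A\,s)\ , \\
\qquad\qquad\quad A\,s = ptrs \mathbin{⩤} A_0\,s \land A\,t = A_0\,t \cup (ptrs \lhd A_0\,s) \land ptrs \cap \mathrm{dom}(A_0\,t) = \{\}\ ]; \\[4pt]
\quad \mathbf{procedure}\ destroy(\mathbf{value}\ k : StoreId) \\
\qquad \mathrel{\widehat{=}}\ A : [\ k \in \mathrm{dom}\,A\ ,\ A = \{k\} \mathbin{⩤} A_0\ ]; \\
\mathbf{end}
\end{array}
$$

Fig. 2. The Store ADT translated to an equivalent module.

Next we want to calculate a refined version of this module, whose internal state is a single (global) store $g$, using the data refinement relation

$$g = \bigcup \mathrm{ran}\,A$$

However, this results in a module whose Create operation is infeasible, because Create is required to allocate a fresh StoreId, but the module does not contain any record of what StoreId values have already been allocated. A simple solution to this is to add $ids : \mathbb{P}\,\mathbb{N}$ to the state of the module, then use the data refinement relation

$$g = \bigcup \mathrm{ran}\,A \land ids = \mathrm{dom}\,A$$

With this change, the data refinement calculation is possible, and the resulting module, called MStore2, is shown in Fig. 3. It could usefully be refined further to replace the $ids$ set by a simple integer counter that is used to allocate fresh StoreId values and to refine the transfer operation to skip by weakening its precondition to true. However, such refinements are straightforward and are not shown here. The main point is that we have seen how local stores can be refined to a single global store whose new, update and deref operations can be implemented using the usual pointer operations of common programming languages.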


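$$
\begin{array}{l}
\mathbf{module}\ MStore2 \\
\quad \mathbf{local\ type}\ Store \mathrel{\widehat{=}} Ptr \setminus \{nil\} \mathrel{⇻} Val; \\
\quad \mathbf{local\ type}\ StoreId \mathrel{\widehat{=}} \mathbb{N}; \\
\quad \mathbf{var}\ g : Store;\ ids : \mathbb{P}\,StoreId; \\
\quad \mathbf{initially}\ g = \{\} \land ids = \{\}; \\[4pt]
\quad \mathbf{procedure}\ create(\mathbf{result}\ k : StoreId) \\
\qquad \mathrel{\widehat{=}}\ ids, k : [\ true\ ,\ k \notin ids_0 \land ids = ids_0 \cup \{k\}\ ]; \\[4pt]
\quad \mathbf{procedure}\ new(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ v : Val;\ \mathbf{ref}\ p : Ptr) \\
\qquad \mathrel{\widehat{=}}\ g, p : [\ k \in ids\ ,\ p \notin \mathrm{dom}\,g_0 \land g = g_0 \cup \{p \mapsto v\}\ ]; \\[4pt]
\quad \mathbf{procedure}\ update(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ p : Ptr;\ \mathbf{value}\ v : Val) \\
\qquad \mathrel{\widehat{=}}\ g : [\ k \in ids \land p \in \mathrm{dom}\,g\ ,\ g = g_0 \oplus \{p \mapsto v\}\ ]; \\[4pt]
\quad \mathbf{procedure}\ deref(\mathbf{value}\ k : StoreId;\ \mathbf{value}\ p : Ptr;\ \mathbf{ref}\ v : Val) \\
\qquad \mathrel{\widehat{=}}\ v : [\ k \in ids \land p \in \mathrm{dom}\,g\ ,\ v = g\,p\ ]; \\[4pt]
\quad \mathbf{procedure}\ transfer(\mathbf{value}\ s, t : StoreId;\ \mathbf{value}\ ptrs : \mathbb{F}\,Ptr) \\
\qquad \mathrel{\widehat{=}}\ g : [\ s \neq t \land s \in ids \land t \in ids\ ,\ g = g_0\ ]; \\[4pt]
\quad \mathbf{procedure}\ destroy(\mathbf{value}\ k : StoreId) \\
\qquad \mathrel{\widehat{=}}\ ids : [\ k \in ids\ ,\ ids = ids_0 \setminus \{k\}\ ] \\
\mathbf{end}
\end{array}
$$

Fig. 3. A refinement of MStore which uses a single store.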
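The reason that transfer can be refined to an operation that leaves the global store unchanged can be checked on a small example. The Python sketch below models the indexed family of local stores as a dictionary of dictionaries and computes the global store as their union. The function names `global_store` and `transfer` and the sample data are hypothetical, and the sketch tests one instance only; it is not a proof of the refinement.

```python
def global_store(A):
    """Merge all local stores in A into a single map (the union of ran A)."""
    g = {}
    for store in A.values():
        if g.keys() & store.keys():
            raise ValueError("local stores must have disjoint domains")
        g.update(store)
    return g

def transfer(A, s, t, ptrs):
    """Abstract transfer: return a new family in which the cells for ptrs
    have moved from local store s to local store t."""
    ptrs = set(ptrs)
    if s == t or not ptrs <= set(A[s]) or not ptrs.isdisjoint(A[t]):
        raise ValueError("transfer precondition violated")
    B = {k: dict(store) for k, store in A.items()}
    for p in ptrs:
        B[t][p] = B[s].pop(p)
    return B

A = {0: {1: "a", 2: "b"}, 1: {3: "c"}}
B = transfer(A, 0, 1, {2})
print(global_store(A) == global_store(B))  # True
```

The partition of pointers among the local stores changes, but their union does not, so an implementation that keeps only the union has nothing to do when a transfer occurs.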

6. Related Work

The approach taken to controlling aliasing in this paper can be regarded as an application of the syntactic control of interference approach of Reynolds [Rey78]. The key to his work is that all interference must go through a common identifier, and this is a major feature of local stores too. His notion of interference is more general than just aliasing between pointers. However, this paper extends his work by considering an extended programming language with heap allocated objects, and by adding the transfer operation.

The closest recent work to the extended local stores described in this paper is the islands of Hogg [Hog91]. An island is the transitive closure of all the objects accessible from a given bridge object. Hogg models pointers using a global store, but uses various read and write modes to ensure that all aliases to objects within an island go through the bridge object, which means that the island is effectively a local store. This gives his technique better reasoning efficiency than a simple global store model. Hogg also defines a destructive pointer read which is a simple kind of transfer operation. His techniques are informal, and rely upon the distinction between stack-based and heap-based references to objects, so no proof of soundness is available yet. It would be interesting to try to formalize his technique using local stores, to gain a deeper understanding of similarities and differences between the two approaches.

7. Conclusions It is useful to consider how well the local store technique satis es the four evaluation criteria for reasoning about aliasing that were discussed in Section 1. Soundness: The local store technique is certainly sound, because it is based on the usual denotational semantics description of pointers. Furthermore, it has been the standard technique for reasoning about pointers in Pascal-like languages for over 20 years. Completeness: The local store technique is complete, because it is always possible to use a single local store to model the global store of a given system. However, this is not a good approach in practice, because it does not take advantage of the extra expressibility that local stores have over a single global store. A more useful approach is: given a particular system, try to partition its global store into several local stores in a way that minimises transfer operations. Eciency: Eciency of reasoning is the strongest argument for using local stores. The more local stores that are used, the fewer case splits are generated when reasoning about pointers, because pointers in di erent local stores cannot be aliased. Modularity: The queue example in Section 4 shows how the local store technique supports modularity. Pointer implementations can be completely hidden within an ADT, with no change in interface. Thus, the use of local stores satis es the reasoning criteria admirably. However, the discussion of completeness shows that an important area for further work is to experiment with the local store technique in larger case studies. Such experimental work is necessary to determine whether the use of local stores adds a signi cant burden for the programmer, whether there are some kinds of systems for which it is hard to partition the global store into useful local stores and to develop guidelines about how nely the global store should be partitioned. 
To carry out such experimental work it would be desirable for the local store technique to be incorporated into a programming language, then used for the development of real systems. Local stores could be incorporated by building them into a new language designed for formal verification, or simply by defining the Store ADT as a library module of an existing object-oriented programming language (and perhaps relying on style guidelines for avoiding aliasing of variables).

When programs are being developed informally, it would be useful to implement a variant of the MStore module (rather than the more efficient MStore 2 module), and use local stores as a debugging aid, with error messages being generated whenever the preconditions of the various operations are not satisfied. This would require some runtime overhead, because the domain of each store must be maintained at runtime in order to check the preconditions, but would provide a strong check on the correctness of pointer-based programs. Then, once debugging is completed, or the program has been proved correct, the debugging implementation of the MStore module could be replaced by the more efficient MStore 2 module.

In conclusion, the traditional local store technique is the best method for reasoning about pointers efficiently. The enhancements described in this paper (relaxing the type system, treating stores as first class values and adding a transfer operation) make it applicable to more languages and to a wider variety of systems.
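The debugging use of local stores suggested above might look roughly like the following Python sketch. It is an illustration under stated assumptions, not the MStore module itself: the class name `CheckedStore`, the operation names `new`, `get`, `put`, `dispose` and `transfer`, and the choice of integers as pointers are all hypothetical. Each store records its own domain at runtime and reports an error when an operation's precondition (the pointer must belong to that store) is violated.

```python
import itertools


class StoreError(Exception):
    """Raised when the precondition of a store operation is not satisfied."""


# Hypothetical design choice: pointer values are unique across all stores,
# so a transferred cell can never collide with an existing one.
_fresh = itertools.count(1)


class CheckedStore:
    """A local store that maintains its domain at runtime (debugging variant)."""

    def __init__(self, name):
        self.name = name
        self._cells = {}  # the domain of the store: pointer -> value

    def _require(self, ptr):
        # Precondition shared by get, put, dispose and transfer.
        if ptr not in self._cells:
            raise StoreError(
                f"pointer {ptr} is not in the domain of store {self.name!r}"
            )

    def new(self, value):
        ptr = next(_fresh)
        self._cells[ptr] = value
        return ptr

    def get(self, ptr):
        self._require(ptr)
        return self._cells[ptr]

    def put(self, ptr, value):
        self._require(ptr)
        self._cells[ptr] = value

    def dispose(self, ptr):
        self._require(ptr)
        del self._cells[ptr]

    def transfer(self, ptr, target):
        # Move one cell from this store into another store.
        self._require(ptr)
        target._cells[ptr] = self._cells.pop(ptr)


def demo():
    queue_store = CheckedStore("queue")
    tree_store = CheckedStore("tree")
    p = queue_store.new("a")
    q = tree_store.new("b")
    queue_store.put(p, "c")        # cannot affect any cell of tree_store
    unchanged = tree_store.get(q)
    try:
        tree_store.get(p)          # wrong store: the precondition check fails
        caught = False
    except StoreError:
        caught = True
    queue_store.transfer(p, tree_store)
    moved = tree_store.get(p)      # legal only after the transfer
    return unchanged, caught, moved
```

Once a program has been debugged or proved correct, a class with the same interface but without the `_require` checks could be substituted, which mirrors the replacement of the debugging module by the more efficient one.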

8. Acknowledgements

Thanks to members of the refinement group at The University of Queensland for their comments on this work and to Ray Nickson for reviewing a version of this paper.

References

[AA83] ANSI and AJPO. Military Standard: Ada Programming Language. US Department of Defence, 1983. ANSI/MIL-STD-1815A-1983.
[AdB94] Pierre America and Frank de Boer. Reasoning about dynamically evolving process structures. Formal Aspects of Computing, 6(3):269–316, 1994.
[Apt81] K. R. Apt. Ten years of Hoare's logic: A survey - part 1. ACM Trans. on Prog. Lang. and Systems, 3:431–483, 1981.
[BH93a] P. G. Bancroft and I. J. Hayes. Refining a module with opaque types. In Gopal Gupta, George Mohay, and Rodney Topor, editors, Proceedings of the 16th Australian Computer Science Conference, pages 615–624. Australian Computer Society, February 1993.
[BH93b] Peter Bancroft and Ian Hayes. Refinement of opaque types. Technical Report 267, Department of Computer Science, The University of Queensland, St. Lucia, QLD 4072, Australia, April 1993. An earlier draft was presented at the Second Australian Refinement Workshop, 1992.
[Cou90] Patrick Cousot. Methods and logics for proving programs. In Handbook of Theoretical Computer Science. Volume B: Formal Models and Semantics, chapter 15, pages 841–993. Elsevier Science Publishers, 1990.
[dB91] Frank S. de Boer. A proof system for the language POOL. In J. W. de Bakker, W. P. de Roever, and G. Rozenberg, editors, Foundations of Object-Oriented Languages, pages 124–150. Springer-Verlag, 1991. Proceedings of REX School/Workshop, May/June 1990, LNCS 489.
[HLW+92] John Hogg, Doug Lea, Alan Wills, Dennis deChampeaux, and Richard Holt. The Geneva convention on the treatment of object aliasing. OOPS Messenger, 3(2):11–16, 1992.
[Hog91] John Hogg. Islands: Aliasing protection in object-oriented languages. SIGPLAN Notices, 26(11):271–285, 1991. Proceedings of OOPSLA '91.
[Hor79] Jim Horning. A case study in language design. In F. L. Bauer and M. Broy, editors, Program Construction. Springer-Verlag, 1979. LNCS 69.
[HW73] C. A. R. Hoare and N. Wirth. An axiomatic definition of the programming language Pascal. Acta Informatica, 2:335–355, 1973.
[L+77] B. W. Lampson et al. Report on the programming language Euclid. SIGPLAN Notices, 12(2), 1977.
[Mor88] Carroll Morgan. Procedures, parameters and abstraction. Science of Computer Programming, 11, 1988.
[Mor94] Carroll Morgan. Programming from Specifications. Prentice Hall, 1994. Second Edition.
[PHL+77] G. J. Popek, J. J. Horning, B. W. Lampson, J. G. Mitchell, and R. L. London. Notes on the design of Euclid. SIGPLAN Notices, 12(3):11–18, March 1977. Proceedings of an ACM Conference on Language Design for Reliable Software.
[Rey78] John Reynolds. Syntactic control of interference. In Conference Record of the Fifth Annual ACM Symposium on Principles of Programming Languages, pages 39–46, 1978.
[Spi92] J. Michael Spivey. The Z Notation: A Reference Manual. International Series in Computer Science. Prentice-Hall, second edition, 1992.
[Str72] C. Strachey. Varieties of programming language. In High level languages, Infotech state of the art report 7. Maidenhead, Eng., 1972.
[Wir71] N. Wirth. The programming language Pascal. Acta Informatica, 1:35–63, 1971.
[Wir74] Niklaus Wirth. On the design of programming languages. In Information Processing 74, pages 386–393. North-Holland Publishing Company, 1974. Invited Paper.