An Imperative Language with Read/Write Type Modes

Paul Roe
Queensland University of Technology, Brisbane, Australia
[email protected]
Abstract. Reading and writing of data is fundamental in computing, as is its control. However, control of reading and writing has traditionally been available only at the level of file systems, and not programming language data structures. In this paper a simple imperative language is described which uses type modes to control reading and writing of data. A type may be labelled read-write or read-only; a read-only type is guaranteed by the type system not to be written. Furthermore, a read-write type may be treated as read-only in a sub-context. To achieve this, implicit aliasing is prevented and the program heap is partitioned into collections. Collections form a unit of read-write control of heap-allocated data, by isolating different heap regions. Collections were originally introduced in the Euclid and Turing programming languages for aliasing control; however, this was rather restrictive and not strictly enforced. Controlling aliasing is beneficial in its own right, since aliasing is a common source of programming errors.
1 Introduction

Reading and writing of data is fundamental in computing, as is its control. However, control of reading and writing has traditionally been available only at the level of file systems, and not programming language data structures. Clearly it is desirable for a programming language to be capable of specifying and enforcing controls on the reading and writing of data. In particular this is useful for documentation, software engineering (more rigorous programming), formal reasoning about programs, and security, e.g. for persistent systems or distributed computing (applets).

In this paper a simple imperative language is described which uses type modes to specify the read/write attributes of data. Only data of read-write type may be written; data of read-only type is guaranteed not to be written. These are strong guarantees, enforced by the programming language's type system. Furthermore, the type system safely permits data of read-write type to be treated as read-only in a sub-context. Languages such as C and Ada support constant parameter passing modes; however, constancy is not guaranteed in the presence of aliasing and only operates at a shallow level. The language presented here overcomes these limitations.

To make strong guarantees concerning read-write attributes of data, implicit aliasing must be prevented. For example, a variable of read-only type must not be aliased to one of a writable type. However, explicit aliasing via pointers is useful; to support this the program heap is partitioned into collections. Collections form a unit of read-write control of heap-allocated data, by isolating different heap regions. Collections were originally introduced in the Euclid [9] and, its successor, Turing [6, 7] programming languages for aliasing control. However, collections in these systems were rather restrictive and aliasing control was not strictly enforced. Controlling aliasing is beneficial in its own right, since unintentional aliasing is a common source of programming errors. The idea of statically controlling aliasing has been around for a long time, e.g. [4, 10].

The remainder of this paper is organised as follows: the next section informally describes the language, Section 3 formally describes the language's type system, and the last sections describe related work, conclusions and future work.
2 The Language: Informal Introduction

The philosophy taken was to create a small, first-order, imperative language, similar to existing languages other than in its control of reading and writing. The language is only a toy one, but is sufficiently expressive to demonstrate key ideas. It could be easily extended with additional control constructs, data types, nested procedures, global variables etc. The most important omissions from the language are modules and type abstraction; the latter is discussed in the final section.
2.1 Simple types

A program consists of a set of procedure definitions, one of which is called main, e.g.:

    proc main
      var i: *int
    begin
      i := 42
      print(i)
    end
The language supports basic data types (integers, booleans and pointers), and records. Each basic type has an associated mode: read-only, denoted "-", or read-write, denoted "*". In the above example, the variable i may be assigned to because it has a read-write type (*). If it were declared as having type -int, assigning to i would be a type error. (As such i would be useless since it could not be written; however, the language does not prevent the writing of such programs.) Note that the language has no global variables and procedures cannot be nested. To understand the subtleties of the programming language, assignment must be understood as a copying operation: it copies a value into a location. Type modes subsume parameter passing mechanisms. For example the print procedure is declared thus:
proc print (i: -int) begin ... end
Thus print takes a single integer parameter which is read-only within the context of the procedure. A read-write type is compatible with a read-only formal parameter, but the converse does not hold. Record types are not moded, but their fields, if of basic type, are, e.g.:

    proc recexample (r: rec i: *int, b: -bool end)
    begin
      if r.b then
        r.i := 42
      end
      print(r.i)
    end
The boolean field, b, cannot be written, only read. In general, using modes on basic types to control reading and writing necessitates the use of structural type equivalence. Arrays are treated like single-element variables. All elements of an array are identified with the array, since an indexing operation may access any element of the array. Thus writing to any array element is equivalent to writing to all elements, e.g.:

    proc arrayexample (a: array [100] *int)
      var i: *int
    begin
      i := 0
      while i#100 do
        a[i] := i
        i := i+1
      end
    end
Arrays are indexed from zero, are of statically known size, and are type equivalent by structure, including size. Arrays may be multidimensional. Aliasing is only permitted of constant (read-only) types, e.g.:

    proc foo(i: *int, j: *int) begin end
    proc bar(i: -int, j: -int) begin end

    proc main
      var x: *int, y: *int
    begin
      foo(x,y)      -- ok
      -- foo(x,x)   -- error, aliasing actual to r/w formals
      bar(x,x)      -- ok, since formals are constant
    end
Notice how a read-write typed actual parameter is compatible with multiple read-only formal parameters. Thus the language has some similarities with linear type systems. Read-write types are like linear types; a read-write value can only be bound to a single variable (identifier).
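The aliasing rule for simple variables can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's checker: the names `check_call`, `FOO` and `BAR` are hypothetical, actual parameters are restricted to plain variable names (no record fields, array elements or pointers), and modes are represented by the strings "-" and "*".

```python
from collections import Counter

READ_ONLY = "-"
READ_WRITE = "*"


def check_call(formal_modes, actual_vars):
    """Return None if the call creates no illegal alias, else an error message.

    formal_modes: one mode ("-" or "*") per formal parameter.
    actual_vars:  the variable name passed for each formal (same length).
    """
    if len(formal_modes) != len(actual_vars):
        return "arity mismatch"
    uses = Counter(actual_vars)
    for mode, var in zip(formal_modes, actual_vars):
        # A variable bound to a read-write formal may not be bound to any
        # other formal in the same call, whatever that formal's mode.
        if mode == READ_WRITE and uses[var] > 1:
            return "aliasing actual '%s' to a r/w formal" % var
    return None


FOO = [READ_WRITE, READ_WRITE]  # proc foo(i: *int, j: *int)
BAR = [READ_ONLY, READ_ONLY]    # proc bar(i: -int, j: -int)

for formals, actuals in [(FOO, ["x", "y"]), (FOO, ["x", "x"]), (BAR, ["x", "x"])]:
    print(actuals, check_call(formals, actuals) or "ok")
```

Only the second call is rejected, matching the foo/bar example above.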
2.2 Complex Types: Pointers and Collections
Complex types involving pointers necessitate collections. Collections are used to partition the program heap. A collection consists of a bag of values of some type. All pointers point into some collection, and type recursion is only possible through collections. Pointers are only assignment compatible if they refer to the same collection. For example a collection representing list cells may be defined thus:

    proc main
      collection list = rec i: *int, n: *ptr list end
      var l1: *ptr list, l2: *ptr list
    begin
      new(l2)
      l2^.i := 2
      l2^.n := nil
      new(l1)
      l1^.i := 1
      l1^.n := l2
    end
The keyword collection declares new collections, in much the same way as var declares local variables. Each collection contains one type of element. All elements of a collection are identified with the collection, since a pointer may point to any element of the collection. Collections are rather like dynamically sized arrays. Procedures may be parameterised on collections, e.g. a list length procedure:

    proc len (l: -ptr list, i: *int)
      var t: *ptr list
    begin
      i := 0
      t := l
      while t # nil do
        i := i + 1
        t := t^.n
      end
    end
Collection parameters are enclosed in angle brackets. Actual collections are not supplied as explicit arguments. For example len may be invoked thus: len(l1,i). The key point is that actual collections are matched against formal collections by structure. Thus len can be invoked on lists in different collections; this is in stark contrast to the Euclid and Turing languages. Formal collections are ignored if they are not used in formal parameters.

In the example above notice how a cursor (t) is used to traverse the read-only list structure. The type of the list collection guarantees that the list will not be changed by this procedure. Using cursors to traverse data structures like this is a common pattern of computation, and the types provide a useful description of it.

Collection parameters can be aliased, like ordinary parameters. In addition, collections may be combined: multiple actual collections can be bound to one formal collection. In both cases, collection aliasing or combination, the collections must have a read-only type. In the case of aliasing this prevents inconsistencies if collection aliases are treated differently, and in the case of collection combination it prevents implicit cross-collection pointers from being formed. Furthermore, a value cannot be aliased by being bound to part of a collection formal parameter and also being bound to a conventional formal parameter, unless it is constant in both situations. For example an invocation such as len(l1,l1^.n) is illegal (type incorrect) since the list collection formal parameter and formal parameter i are aliased, and i has a read-write type.

Another example is a procedure to concatenate two lists belonging to the same collection:

    proc listconcat (x: *ptr list, y: -ptr list)
      var z: *ptr list
    begin
      if x=nil then
        x := y
      else
        z := x
        while z^.n # nil do
          z := z^.n
        end
        z^.n := y
      end
    end
The type of the list collection states that the list structure (pointers) may be changed, but the elements (integers) will not change. List head and tail may be defined in a similar way:

    proc head (l: -ptr list, i: *int)
    begin
      if l#nil then
        i := l^.i
      end
    end

    proc tail (l: *ptr list)
    begin
      if l#nil then
        l := l^.n
      end
    end
Like list length, the type of list head guarantees that the list is not changed, and that the parameter i cannot be part of the list. List tail is more interesting, and reveals the nature of the restricted programming model which arises from using collections and only explicit aliasing. As previously mentioned, a read-write value cannot be bound to a formal parameter and also be part of a collection formal parameter. Thus the parameter to tail cannot actually be part of the list structure itself. This is illustrated in the example below:

    proc main
      collection list = rec i: *int, n: *ptr list end
      var l1: *ptr list, l2: *ptr list
    begin
      new(l2)
      l2^.i := 2
      l2^.n := nil
      new(l1)
      l1^.i := 1
      l1^.n := l2
      -- tail(l1^.n)  -- illegal, l1^.n is part of the list collection
      tail(l1)        -- ok since l1 is not part of the list collection
    end
Effectively a style of programming arises where root (external) pointers to collections are distinguished from internal ones. This is rather restrictive, particularly for recursive algorithms. However, it is usually possible to copy an internal pointer to a local variable in order to overcome this restriction. Therefore the type of tail can be rewritten thus, without loss of applicability:

    proc tail (l: *ptr list)
    begin
      if l#nil then
        l := l^.n
      end
    end
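To make the heap-partitioning idea concrete, the following Python sketch models collections at run time. It is an illustration only: the language described here enforces these constraints statically in the type system, whereas this sketch checks them dynamically, and the names `Collection`, `Ptr`, `set_next` and `length` are hypothetical. Cells are modelled as dictionaries with an integer field "i" and a next pointer "n", mirroring the list collection above.

```python
class Collection:
    """A heap partition holding cells of a single record type."""

    def __init__(self, name):
        self.name = name
        self.cells = []

    def new(self):
        """Allocate a fresh cell and return a pointer into this collection."""
        self.cells.append({"i": 0, "n": None})
        return Ptr(self, len(self.cells) - 1)


class Ptr:
    """A pointer that remembers which collection it points into."""

    def __init__(self, collection, index):
        self.collection = collection
        self.index = index

    def cell(self):
        return self.collection.cells[self.index]


def set_next(node, target):
    """node^.n := target; pointers must stay inside one collection (None is nil)."""
    if target is not None and target.collection is not node.collection:
        raise TypeError("pointer into %r stored in collection %r"
                        % (target.collection.name, node.collection.name))
    node.cell()["n"] = target


def length(l):
    """Count cells by walking a cursor, as in the len procedure; writes nothing."""
    count = 0
    t = l
    while t is not None:
        count += 1
        t = t.cell()["n"]
    return count


lst = Collection("list")
l2 = lst.new()
l2.cell()["i"] = 2
set_next(l2, None)
l1 = lst.new()
l1.cell()["i"] = 1
set_next(l1, l2)
print(length(l1))

other = Collection("other")
try:
    set_next(l2, other.new())  # a cross-collection pointer is rejected
except TypeError as err:
    print("rejected:", err)
```

Because every pointer names its collection, a read-only view of one collection cannot be invalidated by writes made through pointers into another.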
3 The Formal Type System

The abstract syntax for the language is shown in Figure 1 and the types in Figure 2. All basic types have an associated mode: -, * or !. The role of ! will be explained later. A special type () is used for commands. The type rules are shown in Figures 3 and 4. The rules have the form $\Gamma \vdash S : T$, where S is one of the categories of abstract syntax. The environments used for type checking have type $\Gamma, \Delta : \mathit{Id} \mapsto T$. (The predicate CheckColPointers is not shown; it checks that pointer types refer to collections which are in scope.)

Most of the type rules are straightforward. A few noteworthy ones are discussed. Assignment type compatibility is checked by the := relation (Figure 5), which has the form T := U where T and U are types. It requires the basic type components of an assignment target to have read-write modes. Thus if a record is assigned to, all its components must be writable. The type of new is rather strange in that it only requires the pointer to be writable, while the collection could be read-only; however, in such a context the new element could not be set.
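\[
\begin{array}{llcl}
\text{Programs} & \mathit{Prog} & ::= & \mathit{Pd}^{*} \\
\text{Procedure decls} & \mathit{Pd} & ::= & \mathrm{proc}\ \mathit{Id}\ \mathit{Cp}\ \mathit{Vp}\ \mathit{Cd}\ \mathit{Vd}\ C \\
\text{Collection params/decls} & \mathit{Cp}, \mathit{Cd} & ::= & (\mathit{Col} = T)^{*} \\
\text{Variable params/decls} & \mathit{Vp}, \mathit{Vd} & ::= & (\mathit{Id} : T)^{*} \\
\text{Collections} & \mathit{Col} & ::= & \mathit{Id} \\
\text{Commands} & C & ::= & L := E \mid \mathrm{new}\ L \mid \mathrm{skip} \mid C_1 ; C_2 \mid \mathrm{if}\ E\ \mathrm{do}\ C\ \mathrm{end} \mid \mathrm{while}\ E\ \mathrm{do}\ C\ \mathrm{end} \mid \mathrm{call}\ \mathit{Id}\ E^{*} \\
\text{Expressions} & E & ::= & L \mid \mathit{Integer} \mid \mathrm{nil} \mid E_1 + E_2 \mid E_1 < E_2 \\
\text{L-Expressions} & L & ::= & \mathit{Id} \mid L.\mathit{Lb} \mid L\hat{\ } \mid L[E] \\
\text{Labels, Identifiers} & \mathit{Lb}, \mathit{Id} & &
\end{array}
\]

Fig. 1. Abstract Syntax

All of the complexity in the type system is involved in type checking procedure calls:

\[
\frac{
\begin{array}{ll}
(1) & \Gamma(P) = \mathrm{proc}\ \mathit{CL}\ (U_1 \ldots U_n) \\
(2) & \Gamma \to \bigoplus_{i=0 \ldots n} \Gamma_i \qquad \forall i = 1 \ldots n : \Gamma_i \vdash E_i : T_i \\
(3) & \Delta = \{ \mathit{Col} \mapsto T \mid (\mathit{Col} = T) \in \mathit{CL} \} \\
(4) & \mathcal{C} \subseteq \mathrm{dom}(\Gamma_0) \times \mathrm{dom}(\Delta) \qquad \mathit{Check}(\mathcal{C}, \Delta) \\
(5) & \forall i = 1 \ldots n : \mathcal{C} \vdash T_i \Rightarrow U_i \\
(6) & \forall (\mathit{Col}, \mathit{Col}') \in \mathcal{C} : \mathcal{C} \vdash \Gamma_0(\mathit{Col}) \Rightarrow \Delta(\mathit{Col}')
\end{array}
}{
\Gamma \vdash \mathrm{call}\ P\ (E_1 \ldots E_n) : ()
}
\]

Line 1 simply looks up the procedure's type in the environment, but note the language is only first order. The general approach is similar to that taken for linear type systems; essentially the environment and its types are partitioned, using a three-place relation ($X \to Y\ Z$) (line 2, defined in Figure 6), such that if a value (a variable or collection in the environment) has a read-write type it only occurs in one partition. The special type mode ! is used in other partitions to denote that the type (value) is unavailable. Each actual parameter is checked in one of the environment partitions. Thus each actual parameter which must be used read-write can only be bound to one formal parameter, since it is only moded read-write in one environment partition (in others it is "!"). A mini-environment of formal collections is created, line 3, for actual to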
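\[
\begin{array}{llcl}
\text{Types} & T, U, V & ::= & M\ B \mid \mathrm{rec}\ (\mathit{Lb} : T)^{*} \mid \mathrm{array}\ [\mathit{Nat}]\ T \mid \mathrm{proc}\ (\mathit{Col} = T)^{*}\ T^{*} \mid () \\
\text{Basic Types} & B, B' & ::= & \mathrm{int} \mid \mathrm{bool} \mid \mathrm{nil} \mid \mathrm{ptr}\ \mathit{Col} \\
\text{Type Modes} & M & ::= & {-} \mid {*} \mid {!}
\end{array}
\]

Fig. 2. Types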
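Programs

\[
\frac{
\begin{array}{l}
\Gamma' = \Gamma \uplus \biguplus_{i=1 \ldots n} \{ P_i \mapsto \mathrm{proc}\ \mathit{Cp}_i\ (T \mid T \in \mathit{Vp}_i) \}
\quad \text{where } \mathit{Pd}_i = \mathrm{proc}\ P_i\ \mathit{Cp}_i\ \mathit{Vp}_i \\
\forall i = 1 \ldots n : \Gamma' \vdash \mathit{Pd}_i : ()
\end{array}
}{
\Gamma \vdash \mathit{Pd}_1 \ldots \mathit{Pd}_n : ()
}
\]

Procedures

\[
\frac{
\begin{array}{l}
\Gamma' = \{ \mathit{Col} \mapsto T \mid (\mathit{Col} = T) \in \mathit{Cp} \} \uplus \{ \mathit{Id} \mapsto T \mid (\mathit{Id} : T) \in \mathit{Vp} \} \uplus \{ \mathit{Col} \mapsto T \mid (\mathit{Col} = T) \in \mathit{Cd} \} \uplus \{ \mathit{Id} \mapsto T \mid (\mathit{Id} : T) \in \mathit{Vd} \} \\
\mathit{CheckColPointers}(\mathit{Cp}, \mathit{Vp}, \mathit{Cd}, \mathit{Vd}) \qquad \Gamma' \vdash C : ()
\end{array}
}{
\Gamma \vdash \mathrm{proc}\ \mathit{Id}\ \mathit{Cp}\ \mathit{Vp}\ \mathit{Cd}\ \mathit{Vd}\ C : ()
}
\]

\[
X \uplus Y = \text{if } \mathrm{dom}(X) \cap \mathrm{dom}(Y) = \emptyset \text{ then } X \cup Y \text{ else undefined}
\]

Fig. 3. Type Checking: Programs and Procedures ($\Gamma \vdash \mathit{Prog} : T$, $\Gamma \vdash \mathit{Pd} : T$)

formal collection compatibility checking. In line 4 a map between actual and formal collections is created: $\mathcal{C} : \mathit{Col} \times \mathit{Col}$. In addition the map is subject to the constraints imposed by Check (Figure 7). The Check predicate ensures that if actual and formal collections are not in a simple one-to-one correspondence then the formal collections involved must be constant. Actual parameter types are checked for compatibility with formal parameter types (line 5) using the $\Rightarrow$ relation, which has the form $\mathcal{C} \vdash T \Rightarrow U$. This relation, defined in Figure 8, checks that types are the same, that type modes are compatible, and that any collection mappings occur in $\mathcal{C}$, and hence are valid. Note that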
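Commands

\[
\frac{\Gamma \vdash \mathit{LE} : T \quad \Gamma \vdash E : U \quad T := U}{\Gamma \vdash \mathit{LE} := E : ()}
\qquad
\frac{\Gamma \vdash \mathit{LE} : {*}\mathrm{ptr}\ \mathit{Col}}{\Gamma \vdash \mathrm{new}\ \mathit{LE} : ()}
\]

\[
\Gamma \vdash \mathrm{skip} : ()
\qquad
\frac{\Gamma \vdash C_1 : () \quad \Gamma \vdash C_2 : ()}{\Gamma \vdash C_1 ; C_2 : ()}
\qquad
\frac{\Gamma \vdash E : {-}\mathrm{bool} \quad \Gamma \vdash C : ()}{\Gamma \vdash \mathrm{if/while}\ E\ \mathrm{do}\ C\ \mathrm{end} : ()}
\]

\[
\frac{
\begin{array}{l}
\Gamma(P) = \mathrm{proc}\ \mathit{CL}\ (U_1 \ldots U_n) \\
\Gamma \to \bigoplus_{i=0 \ldots n} \Gamma_i \qquad \forall i = 1 \ldots n : \Gamma_i \vdash E_i : T_i \\
\Delta = \{ \mathit{Col} \mapsto T \mid (\mathit{Col} = T) \in \mathit{CL} \} \\
\mathcal{C} \subseteq \mathrm{dom}(\Gamma_0) \times \mathrm{dom}(\Delta) \qquad \mathit{Check}(\mathcal{C}, \Delta) \\
\forall i = 1 \ldots n : \mathcal{C} \vdash T_i \Rightarrow U_i \\
\forall (\mathit{Col}, \mathit{Col}') \in \mathcal{C} : \mathcal{C} \vdash \Gamma_0(\mathit{Col}) \Rightarrow \Delta(\mathit{Col}')
\end{array}
}{
\Gamma \vdash \mathrm{call}\ P\ (E_1 \ldots E_n) : ()
}
\]

Expressions

\[
\Gamma \vdash \mathit{Integer} : {-}\mathrm{int}
\qquad
\frac{\Gamma \vdash E_1 : {-}\mathrm{int} \quad \Gamma \vdash E_2 : {-}\mathrm{int}}{\Gamma \vdash E_1 + E_2 : {-}\mathrm{int}}
\qquad
\Gamma \vdash \mathrm{nil} : {-}\mathrm{nil}
\qquad
\frac{\Gamma \vdash E_1 : {-}\mathrm{int} \quad \Gamma \vdash E_2 : {-}\mathrm{int}}{\Gamma \vdash E_1 < E_2 : {-}\mathrm{bool}}
\]

L-Expressions

\[
\Gamma \vdash \mathit{Id} : \Gamma(\mathit{Id})
\qquad
\frac{\Gamma \vdash \mathit{LE} : \mathrm{rec} \ldots (\mathit{Lb} : T) \ldots}{\Gamma \vdash \mathit{LE}.\mathit{Lb} : T}
\qquad
\frac{\Gamma \vdash \mathit{LE} : {-}\mathrm{ptr}\ \mathit{Col} \quad \Gamma(\mathit{Col}) = T}{\Gamma \vdash \mathit{LE}\hat{\ } : T}
\]

\[
\frac{\Gamma \vdash \mathit{LE} : \mathrm{array}\ [N_1 \ldots N_n]\ T \quad \forall i = 1 \ldots n : \Gamma \vdash E_i : {-}\mathrm{int}}{\Gamma \vdash \mathit{LE}[E_1 \ldots E_n] : T}
\]

Fig. 4. Type Checking: Commands and Expressions ($\Gamma \vdash C : T$, $\Gamma \vdash E : T$)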
the relation has no rules for ! moded types, hence such types are prohibited in actual parameter types. However, importantly, "!" moded types can be used during actual parameter evaluation. In the final line (6) the types of actual collections are checked for compatibility with the types of formal collections using the $\Rightarrow$ relation. Thus for a procedure call, the transitive closure of all pairs of actual and formal collections which will become bound are checked for parameter type compatibility. Note that the call type environment $\Gamma$ is partitioned into n + 1 parts, where
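\[
{*}B := {-}B
\qquad
{*}\mathrm{ptr}\ \mathit{Col} := {-}\mathrm{nil}
\]

\[
\frac{\forall i = 1 \ldots n : T_i := U_i}{\mathrm{rec}\ (\mathit{Lb}_1 : T_1) \ldots (\mathit{Lb}_n : T_n) := \mathrm{rec}\ (\mathit{Lb}_1 : U_1) \ldots (\mathit{Lb}_n : U_n)}
\qquad
\frac{T := U}{\mathrm{array}\ [N_1 \ldots N_n]\ T := \mathrm{array}\ [N_1 \ldots N_n]\ U}
\]

Fig. 5. Assignment Type Compatibility (T := U)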
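The assignment compatibility relation can be sketched as a recursive Python predicate. This is a hedged illustration rather than the prototype's implementation: the tuple encoding of types and the names `basic`, `rec`, `array`, `is_ptr` and `assignable` are hypothetical. One further assumption is labelled in the code: a source of mode * is accepted wherever a readable source is required, on the grounds that a read-write value may be treated as read-only; only ! sources are refused.

```python
def basic(mode, base):
    """A moded basic type; base is "int", "bool", "nil" or ("ptr", collection)."""
    return ("basic", mode, base)


def rec(*fields):
    """A record type from (label, type) pairs."""
    return ("rec", tuple(fields))


def array(dims, elem):
    """An array type with statically known dimensions."""
    return ("array", tuple(dims), elem)


def is_ptr(base):
    return isinstance(base, tuple) and base[0] == "ptr"


def assignable(target, source):
    """True if a value of type `source` may be assigned to a location of type `target`."""
    if target[0] != source[0]:
        return False
    if target[0] == "basic":
        _, t_mode, t_base = target
        _, s_mode, s_base = source
        # The target must be writable. Assumption: any readable source
        # ("-" or "*") is accepted; an unavailable ("!") source is not.
        if t_mode != "*" or s_mode == "!":
            return False
        # Same basic type (pointers: same collection), or nil into a pointer.
        return t_base == s_base or (is_ptr(t_base) and s_base == "nil")
    if target[0] == "rec":
        t_fields, s_fields = target[1], source[1]
        return len(t_fields) == len(s_fields) and all(
            t_lab == s_lab and assignable(t_ty, s_ty)
            for (t_lab, t_ty), (s_lab, s_ty) in zip(t_fields, s_fields))
    if target[0] == "array":
        return target[1] == source[1] and assignable(target[2], source[2])
    return False


# rec i: *int, b: -bool end cannot be assigned as a whole: field b is read-only.
r = rec(("i", basic("*", "int")), ("b", basic("-", "bool")))
w = rec(("i", basic("*", "int")), ("b", basic("*", "bool")))
print(assignable(r, r), assignable(w, r))
```

The record case shows why assigning to a record requires every basic component of the target to be writable.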
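Environments

[n-ary partitioning $\Gamma \to \bigoplus_{i=0 \ldots n} \Gamma_i$: the defining rule is illegible in the source]

\[
\{\} \to \{\}\ \{\}
\qquad
\frac{T \to U\ V}{\{ \mathit{Id} \mapsto T \} \to \{ \mathit{Id} \mapsto U \}\ \{ \mathit{Id} \mapsto V \}}
\qquad
\frac{\Gamma \to \Gamma_1\ \Gamma_2 \quad \Delta \to \Delta_1\ \Delta_2}{\Gamma \uplus \Delta \to (\Gamma_1 \uplus \Delta_1)\ (\Gamma_2 \uplus \Delta_2)}
\]

Types

\[
{*}B \to {*}B\ \ {!}B
\qquad
{*}B \to {!}B\ \ {*}B
\qquad
{*}B \to {-}B\ \ {-}B
\qquad
{-}B \to {-}B\ \ {-}B
\qquad
{!}B \to {!}B\ \ {!}B
\]

\[
\frac{\forall i = 1 \ldots n : T_i \to U_i\ V_i}{\mathrm{rec}\ (\mathit{Lb}_1 : T_1) \ldots (\mathit{Lb}_n : T_n) \to \mathrm{rec}\ (\mathit{Lb}_1 : U_1) \ldots (\mathit{Lb}_n : U_n)\ \ \mathrm{rec}\ (\mathit{Lb}_1 : V_1) \ldots (\mathit{Lb}_n : V_n)}
\]

\[
\frac{T \to U\ V}{\mathrm{array}\ [N_1 \ldots N_n]\ T \to \mathrm{array}\ [N_1 \ldots N_n]\ U\ \ \mathrm{array}\ [N_1 \ldots N_n]\ V}
\]

Fig. 6. Environment and Type Partitioning Relation ($X \to Y\ Z$)

there are n parameters; a separate environment partition is used for checking collection type compatibility. This ensures that a value in an actual collection cannot be simultaneously bound to a formal parameter and part of a formal collection, unless it is constant in both situations. For example it prevents procedure calls such as len(l1,l1^.n) (see previous section). An interesting feature of the formal description is that it supports mutual type recursion, via collections, which is often omitted in such descriptions.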
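The binary partitioning relation can be illustrated with a small Python predicate that decides whether a type T splits into U and V. This is a sketch under assumptions: the tuple encoding of types and the names `BASIC_SPLITS`, `splits` and `env_splits` are hypothetical, and only the binary relation is modelled, not its n-ary extension.

```python
# For each mode of T, the permitted (mode of U, mode of V) pairs.
BASIC_SPLITS = {
    "*": {("*", "!"), ("!", "*"), ("-", "-")},  # r/w goes to one side, or is shared read-only
    "-": {("-", "-")},                          # read-only may be duplicated freely
    "!": {("!", "!")},                          # unavailable stays unavailable
}


def splits(t, u, v):
    """True if type t partitions into u and v.

    Types are ("basic", mode, base), ("rec", ((label, type), ...)) or
    ("array", dims, elem)."""
    if not (t[0] == u[0] == v[0]):
        return False
    if t[0] == "basic":
        return t[2] == u[2] == v[2] and (u[1], v[1]) in BASIC_SPLITS[t[1]]
    if t[0] == "rec":
        if not (len(t[1]) == len(u[1]) == len(v[1])):
            return False
        return all(
            tl == ul == vl and splits(tt, ut, vt)
            for (tl, tt), (ul, ut), (vl, vt) in zip(t[1], u[1], v[1]))
    if t[0] == "array":
        return t[1] == u[1] == v[1] and splits(t[2], u[2], v[2])
    return False


def env_splits(env, env1, env2):
    """Partition an environment pointwise: same names, each type split."""
    return (env.keys() == env1.keys() == env2.keys()
            and all(splits(env[k], env1[k], env2[k]) for k in env))


INT_RW = ("basic", "*", "int")
INT_RO = ("basic", "-", "int")
INT_NA = ("basic", "!", "int")

env = {"x": INT_RW}
print(env_splits(env, {"x": INT_RW}, {"x": INT_RW}))  # x read-write in both partitions
print(env_splits(env, {"x": INT_RO}, {"x": INT_RO}))  # x read-only in both partitions
print(env_splits(env, {"x": INT_RW}, {"x": INT_NA}))  # x read-write in one partition only
```

The first split is refused, which is why a call such as foo(x,x) with two read-write formals cannot be typed, while the second corresponds to bar(x,x).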
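\[
\begin{array}{rl}
\mathit{Check}(\mathcal{C}, \Delta) = & \forall (a, b) \in \mathcal{C} : \forall (c, d) \in \mathcal{C} : \\
 & [\, a = c \;\&\; b \neq d \Rightarrow \mathit{con}(\Delta(d)) \;\&\; \mathit{con}(\Delta(b)) \,] \;\& \\
 & [\, a \neq c \;\&\; b = d \Rightarrow \mathit{con}(\Delta(d)) \,]
\end{array}
\]

\[
\mathit{con}({-}B)
\qquad
\frac{\mathit{con}(T_1) \ldots \mathit{con}(T_n)}{\mathit{con}(\mathrm{rec}\ (\mathit{Lb}_1 : T_1) \ldots (\mathit{Lb}_n : T_n))}
\qquad
\frac{\mathit{con}(T)}{\mathit{con}(\mathrm{array}\ [N_1 \ldots N_n]\ T)}
\]

Fig. 7. Collection Aliasing and Combination Check

Modes

\[
{*} \Rightarrow {*}
\qquad
{*} \Rightarrow {-}
\qquad
{-} \Rightarrow {-}
\]

Types

\[
\frac{M \Rightarrow M'}{\mathcal{C} \vdash M\ \mathrm{int} \Rightarrow M'\ \mathrm{int}}
\qquad
\frac{M \Rightarrow M'}{\mathcal{C} \vdash M\ \mathrm{bool} \Rightarrow M'\ \mathrm{bool}}
\qquad
\frac{M \Rightarrow M'}{\mathcal{C} \vdash M\ \mathrm{nil} \Rightarrow M'\ \mathrm{ptr}\ \mathit{Col}}
\]

\[
\frac{M \Rightarrow M' \quad (\mathit{Col}, \mathit{Col}') \in \mathcal{C}}{\mathcal{C} \vdash M\ \mathrm{ptr}\ \mathit{Col} \Rightarrow M'\ \mathrm{ptr}\ \mathit{Col}'}
\]

\[
\frac{\forall i = 1 \ldots n : \mathcal{C} \vdash T_i \Rightarrow U_i}{\mathcal{C} \vdash \mathrm{rec}\ (\mathit{Lb}_1 : T_1) \ldots (\mathit{Lb}_n : T_n) \Rightarrow \mathrm{rec}\ (\mathit{Lb}_1 : U_1) \ldots (\mathit{Lb}_n : U_n)}
\qquad
\frac{\mathcal{C} \vdash T \Rightarrow U}{\mathcal{C} \vdash \mathrm{array}\ [N_1 \ldots N_n]\ T \Rightarrow \mathrm{array}\ [N_1 \ldots N_n]\ U}
\]

Fig. 8. Parameter Type Compatibility ($\mathcal{C} \vdash T \Rightarrow U$)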
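The Check predicate and the parameter compatibility relation can be sketched together in Python. As before this is an illustration under assumptions, not the paper's implementation: types use a hypothetical tuple encoding, the collection map is a set of (actual, formal) name pairs, and the names `con`, `check`, `compatible`, `RO_LIST` and `RW_LIST` are invented for the example. The two formal list types are hypothetical instances, one fully read-only and one with writable next pointers.

```python
MODE_OK = {("*", "*"), ("*", "-"), ("-", "-")}  # actual mode => formal mode


def is_ptr(base):
    return isinstance(base, tuple) and base[0] == "ptr"


def con(t):
    """True if every basic component of t is read-only."""
    if t[0] == "basic":
        return t[1] == "-"
    if t[0] == "rec":
        return all(con(ft) for _, ft in t[1])
    if t[0] == "array":
        return con(t[2])
    return False


def check(pairs, delta):
    """Collection aliasing/combination check over (actual, formal) pairs."""
    for a, b in pairs:
        for c, d in pairs:
            # One actual bound to two formals (aliasing): both formals constant.
            if a == c and b != d and not (con(delta[d]) and con(delta[b])):
                return False
            # Two actuals bound to one formal (combination): formal constant.
            if a != c and b == d and not con(delta[d]):
                return False
    return True


def compatible(pairs, t, u):
    """True if actual type t may be passed where formal type u is expected."""
    if t[0] != u[0]:
        return False
    if t[0] == "basic":
        _, m, b = t
        _, m2, b2 = u
        if (m, m2) not in MODE_OK:
            return False
        if b in ("int", "bool"):
            return b == b2
        if b == "nil":
            return is_ptr(b2)
        # Pointers: the collections must be related by the collection map.
        return is_ptr(b) and is_ptr(b2) and (b[1], b2[1]) in pairs
    if t[0] == "rec":
        return len(t[1]) == len(u[1]) and all(
            tl == ul and compatible(pairs, tt, ut)
            for (tl, tt), (ul, ut) in zip(t[1], u[1]))
    if t[0] == "array":
        return t[1] == u[1] and compatible(pairs, t[2], u[2])
    return False


RO_LIST = ("rec", (("i", ("basic", "-", "int")), ("n", ("basic", "-", ("ptr", "list")))))
RW_LIST = ("rec", (("i", ("basic", "-", "int")), ("n", ("basic", "*", ("ptr", "list")))))

two_to_one = {("c1", "list"), ("c2", "list")}
print(check(two_to_one, {"list": RO_LIST}))  # combination into a read-only formal
print(check(two_to_one, {"list": RW_LIST}))  # combination into a writable formal
```

Combining two actual collections is accepted only for the read-only formal; with writable next pointers a procedure could otherwise link a cell of c1 to a cell of c2, forming a cross-collection pointer.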
4 Related work

Euclid [9] and its successor Turing [6, 7] are Pascal-like languages, both of which support collections and control aliasing. Aliasing is controlled to aid formal reasoning, not for fine-level control over reading and writing as it is in this approach. Aliasing is checked by a mixture of static and dynamic tests, the latter of which may be turned off by compilers, rather like array index checking. Collections are limited in these languages because they cannot be abstracted; they are type equivalent by name only. Thus no code can be shared to operate on different collections, even if of the same form, e.g. a separate list length function has to be written for each different collection.

The closest work to this is FX [2, 3]. FX is a programming language, based on Scheme, with a sophisticated type system for controlling effects. Expressions have types and effects (such as reading, writing and allocating). FX has a larger class of effects than the language presented here. Effects take place over regions, which play a similar role to the variables and collections used here. The FX language is polymorphic, thus generic types, effects and regions are supported. In general FX is a more powerful system than that described here, but at the expense of considerable complexity. One of the goals of the language presented here was to keep it simple, which FX is not.

The Standard ML and Haskell functional languages both use types to distinguish between constant and writable (reference) values. Both allow the encapsulation of state-manipulating computations, and Haskell achieves this without loss of referential transparency [8]. However, both Haskell and Standard ML divide values into two worlds: mutable types (references) and immutable types. Values in these worlds are not compatible and must be copied between them. For example a mutable list must be transformed (copied) into an immutable list structure in order to pass it to an immutable formal parameter. The philosophy taken here is not to have two separate worlds of mutable and immutable values, but to have one such world and simply mode values (types) as being writable or not according to context.

Some languages support read-only export of identifiers from modules, e.g. Oberon-2. This is useful but not as general as the approach described here; in particular it is rather coarse grained, and only allows clients to read, not write, values. It is possible to use an ADT to control the reading and writing of values; however, this again is rather coarse grained, and can necessitate copying of values, cf. ML and Haskell. Using a technique similar to this, Bancroft [1] has investigated deriving and reasoning about programs containing pointers. His technique relies on encapsulating complex data types into ADTs which export no pointers (references) to data structure elements. The ADTs encapsulate linear data structures, i.e. ones with no sharing. It would be interesting to combine this with the work on collections presented here, for example to support a tree with leaves in different collections.

Similarly, Islands [5] aims to control object aliasing by grouping objects together in islands. These islands are only accessible from bridge objects. The thrust of this work is on support for encapsulation of object state.

Typestate [11] has been used in the NIL and Hermes languages. It enforces static invariants on the state of a program at different program points, for example assignment before use of variables. Typestate is unable to check programs involving traditional pointers; instead it relies on higher-level complex data structures such as Lisp s-expressions and ADTs supporting insert, delete and find.

The formal precursors to this work were that of Girard on linear logic [4], and Reynolds on "Syntactic Control of Interference" [10]. Since then there have been many developments in the formal area, but few approaches have addressed traditional imperative languages or been as simple or practical as the one presented here.
5 Conclusions and Future Work

The language presented here has some limitations. It may be extended to add pure functions (which only have constant parameters), and even write-only types (cf. out parameters). Global variables and nested procedures may also be added; aliasing between formals and non-locals can be controlled by treating non-locals as implicit formals. Support for procedure types and subtyping requires further work.

The most serious deficiency of the language is the lack of type abstraction, i.e. abstract data types. At present all read/write control is at the level of basic types. Abstract data types require underlying representations to be hidden; thus ADTs themselves require type modes which transitively apply to the whole type. The interaction between transitive type modes and collections requires further investigation.

Read-write type modes are potentially very useful for concurrency control, for example to support different interaction paradigms such as single writer, multiple reader. In general it is possible to make the mode and type system arbitrarily sophisticated and complex. For example it may be possible to use existential data types to model local collections, and hence to support data structures such as trees with leaves in different local collections. However, more practical experience is required to see what is really useful. The original goal was to devise a simple language in the Pascal tradition with control of reading and writing.

A prototype translator for this toy language has been written in Haskell; this translates the language into C. The lack of aliasing means that an implementation may safely pass large read-only (and read-write) parameters by reference and small, e.g. word-sized, read-write parameters by value-result (copy in, copy out). Naturally small read-only parameters may be passed by value.
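Why the absence of aliasing makes the two passing mechanisms interchangeable can be seen in a small Python simulation. This is an illustrative sketch, not the prototype translator's generated code: variable storage is modelled as a dictionary, and the procedure body (i := i + 1; j := j * 2) and the names `by_reference` and `by_value_result` are hypothetical.

```python
def by_reference(store, i, j):
    """Run i := i + 1; j := j * 2 with formals bound directly to the actuals."""
    store[i] = store[i] + 1
    store[j] = store[j] * 2


def by_value_result(store, i, j):
    """Run the same body on local copies, then copy the results back out."""
    local_i, local_j = store[i], store[j]  # copy in
    local_i = local_i + 1
    local_j = local_j * 2
    store[i] = local_i                     # copy out
    store[j] = local_j


def run(mechanism, i, j):
    store = {"x": 1, "y": 10}
    mechanism(store, i, j)
    return store


# Distinct actuals: the two mechanisms cannot be told apart.
print(run(by_reference, "x", "y"), run(by_value_result, "x", "y"))

# Aliased read-write actuals, which the type system rejects: results differ.
print(run(by_reference, "x", "x"), run(by_value_result, "x", "x"))
```

Since the type system rules out the aliased call, an implementation is free to pick whichever mechanism is cheaper for a given parameter size.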
The result of the research is a small programming language which controls reading and writing of data via type modes, at the expense of a more restricted programming model than usual. Further experience is needed to see how restrictive the programming model is, and therefore which aspects may require revision. Programs written in the language should be amenable to formal manipulation since all aliasing is explicit.
Acknowledgements This work has been supported by the Programming Languages and Systems group at QUT and partially by Australian Research Council grants.
References

1. P G Bancroft and I J Hayes. Refinement in a type extension context. In Proceedings of the Fifth Australian Refinement Workshop (ARW-96). Department of Computer Science, The University of Queensland, April 1996.
2. D Gifford, P Jouvelot, M Sheldon, and J O'Toole. Report on the FX-91 programming language. Technical Report TR-531 (revised version), LCS, MIT.
3. D K Gifford and J M Lucassen. Integrating functional and imperative programming. In ACM Conference on Lisp and Functional Programming, pages 28–38, 1986.
4. J-Y Girard. Linear logic. Theoretical Computer Science, 50:1–102, 1987.
5. J Hogg. Islands: Aliasing Protection in Object Oriented Languages. In OOPSLA'91, pages 271–285, 1991.
6. R C Holt and J R Cordy. The Turing Programming Language. CACM, 31(12):1410–1423, 1988.
7. R C Holt, P A Matthews, J A Rosselet, and J R Cordy. The Turing Language: Design and Definition. Prentice Hall, 1987.
8. S L Peyton Jones and J Launchbury. State in Haskell. LASC, 8(5):293–341, December 1995.
9. G J Popek, J J Horning, B W Lampson, J G Mitchell, and R L London. Notes on the Design of Euclid. ACM SIGPLAN Notices, 12(3), 1977.
10. J C Reynolds. Syntactic control of interference. In 5th ACM Symposium on Principles of Programming Languages, pages 39–46, 1978.
11. R E Strom and S Yemini. Typestate: a programming language concept for enhancing software reliability. IEEE Transactions on Software Engineering, SE-12(1):157–171, January 1986.