Persistence Extensions to Ada95 Stephen Crawley
Defence Science and Technology Organisation PO Box 1500, Salisbury SA 5108, Australia Email:
[email protected]
Michael Oudshoorn
Department of Computer Science University of Adelaide, SA 5005, Australia Email:
[email protected]
Abstract This paper describes proposed extensions to the Ada95 programming language to provide support for persistent programming. The extensions support transparent migration of data between a program's address space and a persistent store in a way that preserves both type safety and encapsulation of ADT's.
1 Introduction Ada has recently undergone an intensive and major review and update. The result of this is the new programming language Ada95 (ISO 1995) which is, in the main, backward compatible with Ada83. Two of the principle changes to Ada was the addition of appropriate semantic constructs to change Ada from an objectbased language to a fully edged object-oriented language and the provision of annexes de ning additional language capabilities for speci c user communities. In this paper we consider further extensions to Ada95 by way of the introduction of a persistence annex to provide persistent programming support to Ada. We call this revision persistent Ada95 to distinguish it from Ada95 and Ada83. The paper focuses on several of the new language features added to support the object oriented paradigm in Ada. Our aim in undertaking this persistent extension was to follow, as far as practical and possi-
ble, the same revision guidelines adhered to by the Ada95 revision team and to restrict any suggested modi cations to the Ada95 semantics to those that are absolutely essential for a persistent version of Ada. In this paper we examine the various forms of persistence which are available and the merits of each. We then focus on the addition of the most appropriate form of persistence to Ada95 such that the base language is not disturbed; in particular we intend to preserve type safety and encapsulation of abstract data types (ADTs). The intent of this research is to introduce persistent features to the Language Ada95 without disrupting the core language unnecessarily. Interaction with the specialist annexes, such as the real-time annex, or the distributed annex, is not considered. In some cases, such as the use of real-time and persistence annexes, the mixture of the provided features is incompatible and undesirable.
1.1 What is persistence? Persistence is currently used to characterise any process which allows information to live longer than the program execution which creates it. In this sense, the term covers a wide range of mechanisms including:
conventional le I/O, using record manager packages; e.g. IBM's \access methods", program execution checkpointing, data structure attening or \pickling", use of relational and object databases, and use of database programming languages and persistent programming languages.
The original notion (and meaning) of the term persistence, was conceived by Atkinson in the late 1970's. In (Atkinson 1978), Atkinson argued that the task of mapping data between short and long term representations in a conventional system is expensive in both computer resources and programmer eort. He postulated an alternative approach to programming in which data values can transparently \persist" between executions of a program. To quote Dearle (Dearle 1994): \The idea behind persistence is a simple one: data should be able to persist (survive) for as long as that data is required." In 1980, Atkinson together with Morrison took an existing simple programming language (SAlgol (Morrison 1982)) and extended it to support persistence. The result was PS-Algol (Atkinson, Chisholm, and Cockshott 1981), the rst persistent programming language. Atkinson et al (Atkinson, Bailey, Chisholm, Cockshott, and Morrison 1983) have proposed the following two principles for languages that support persistent programming: \The Principle of Persistence Independence: The persistence of a
data object is independent of how the program manipulates that data object, and conversely a fragment of program is expressed independently of the persistence of data it manipulates. For example, it should be possible to call a procedure with its parameters sometimes objects with long term persistence, and at other times only transient." and \The Principle of Persistent Data Completeness: In line with the principle of persistence independence, all objects should be allowed the full range of persistence." A persistence mechanism that follows the two principles above is said to support orthogonal persistence. To quote Dearle again (Dearle 1994): \All data may be persistent and that data may be manipulated in a uniform manner regardless of the length of time
it persists. In other words, the right for data to survive for a long (or short) time is independent of the data type." This implies that persistence is on a object by object basis rather than indicating that all objects of a given type are persistent. In other words, within a programming language that fully supports orthogonal persistence, there is no conceptual discontinuity between the type and value spaces of an executing program and the persistent store.
1.2 Persistence Mechanisms If we restrict ourselves to mechanisms that make program data structures persist (as distinct from saving the information in a dierent form), we can identify four common approaches to persistence1 . These are checkpointing, persistence by property, persistence by copying, and persistence by reachability. In a checkpointing mechanism, the state of an executing program is saved to non-volatile storage. At some later time, the state can be restored, and the program can resume execution. The main diculty with checkpointing is that it is incompatible with the common requirement for concurrent access to stored data by multiple program instances. Checkpointing mechanisms typically do not satisfy the Principle of Persistence Independence, because the persistent data is permanently tied to a single long-running program execution. With persistence by property, the program decides which data should persist in one of a number of ways. In some cases, the decision is made at compile time; e.g. \all objects of a given type will persist". In others, the decision is dynamic; e.g. \all objects created using a given storage allocator will persist". Persistence by property typically does not satisfy the Principle of Persistence Independence. Using persistence by copying, a data structure is made to persist by traversing it and writing a copy to non-volatile storage with the structural information converted to a \ at" form. The saved data structure can be restored by reading the \ at" form of the data into memory and reconstructing the pointers. A typical persistence by copying mechanism is totally automatic (requires no application code), and will preserve sharing and cycles within the data 1 This simple taxonomy conceals the fact that some real persistence mechanisms are a hybrid of the approaches listed.
structure. This approach is also known as data
attening or pickling. The two main problems with persistence by copying mechanisms are that:
they do not preserve object identity in the restored data stucture, and they can be inecient when applied to large data structures.
To illustrate the object identity problem, suppose that a program creates two records (A and B) and that both contain a pointer to another object (C). When A is saved, a le will be created containing a copy of both A and C. Similarly, saving B gives a le containing copies of B and C. If a later execution tries to restore the two les containing A and B, a persistence by copying mechanism will restore two independent copies of C; i.e. after restoration, the A and B data structures no longer share C. On the basis of this, we argue that persistence by copying fails to satisfy the Principle of Persistence Independence; preserving the object identity of persistent objects depends on the programmer \doing the right thing". With persistence by reachability, any data values can potentially persist, and its actual persistence depends on whether or not a future program execution could nd it. A system that uses persistence by reachability typically designates one or more persistent root objects that any program can access. All objects that are reachable from a persistent root could potentially be accessed by a program execution, and must therefore be saved between executions. Objects are typically saved by a traversal algorithm and restored when a program tries to access them (Atkinson, Chisholm, and Cockshott 1983). Persistence by reachability is the most common mechanism for the implementation of orthogonal persistence.
1.3 Persistence in Object-Oriented Programming Languages The Eiel programming language and standard libraries provide two mechanisms for persistence in the form of the storable and environment classes (Meyer 1989; Meyer 1990). If the programmer derives a class from storable, it will have \store" and \retrieve" operations that can be used to make objects persist. The operations work by copying the object and all reachable objects using a persistence by copying mechanism. The environment class de nes objects called
environments that hold a collection of arbitrarily typed persistent objects. All objects that are created while an environment is \open" are implicitly saved to the environment, and objects in an environment can be subsequently retrieved using a \key". The persistence mechanisms provided by the environment class are based on persistence by copying. While both mechanisms are type safe at the representation level, neither can be said to provide orthogonal persistence. While the C++ language itself does not support persistence, there are a number of examples of support for persistence for C++ programs. One common approach is to provide a library of persistent base classes from which the programmer can derive his own classes. Another approach is to provide linguistic extensions that do the same thing, or to de ne a separate data de nition language based on C++. Examples of persistence for C++ include Avalon/C++, O2, O++, the work of Charlton et al (Charlton, Leng, and Rivers 1994) and Interviews Libraries (Linton, Vlissides, and Calder 1988). These examples variously use persistence by copying and persistence by property. VML (Klas and Turau 1992) is an example of database programming language with an objectoriented type system. In VML, classes can be declared as persistent, and all values belonging to a persistent class will persistent until they are explicitly destroyed. VML's persistence mechanism is persistence by property. SmallTalk-80 implementations provide persistence with a checkpointing mechanism and dribble les. This makes it dicult for users to share persistent data and to separate persistent state from the program that created it. However, since the object methods are saved along with the objects, SmallTalk-80 is one of the few object-oriented languages in which the persistence scheme does not break the encapsulation of ADT's. E (Richardson, Carey, and Schuh 1993) is a persistent object-oriented language based on C++ developed by the Exodus extensible database project. E provides a set of \db" types and constructors that parallel the C++ primitive types and constructors. Objects with a \db" type can be declared using the \persistent" storage class. This causes the object declaration to be notionally shared between successive executions of the program. Alternatively, a program may instantiate E's builtin generator2 dbclass with a user de ned class, and use this to allocate persistent 2 Generators in E are a predecessor of templates in C++.
object. In both cases, the persistence mechanism is persistence by property. Richardson et al (Richardson, Carey, and Schuh 1993) acknowledge that forcing the programmer to decide the persistency of objects when they are allocated and explicitly delete them when they are no longer required leads to problems with storage leaks and dangling pointers in persistent data structures. The Comandos environment provides three programming languages which support persistent and distributed programming. C** is an extension to C++, Eiel** is an extension to Eiffel, and there is a new object oriented language for Comandos (Walsh, Taylor, McHugh, Riveill, Cahill, and Balter 1993). These three languages all provide transparent persistence by reachability and a simple R-value binding3 mechanism. In C**, a class may be declared as persistent using the keyword \permclass" instead of \class", so that any objects of the class or descendent classes are potentially persistent. When an object persists, direct member objects are saved with it whether or not they are instances of a persistent class, as do other objects of persistent class that are accessible via a pointer or reference. The programmer may also use the \volclass" keyword to declare a class as non-persistent irrespective of where it inherits from. Standard operations are provided for designating objects as persistent roots with simple names and for R-value (constant) binding to preexisting roots. In Eiel**, all classes are persistent classes and persistence is by reachability. In addition, the semantics of the Eiel storable and environment classes are modi ed to implement the naming and binding for persistent objects, while using the Comandos persistence mechanisms instead of persistence by copying. With the exception of Smalltalk-80, the languages listed above do not guarantee to preserve the encapsulation normally provided by an ADT when a value of the type is made persistent. In some cases, the persistence mechanism does not preserve object identity. In others, object identity is preserved, but variables shared across all instances of a particular class are not necessarily saved. The result may be that saved objects cannot be used correctly; e.g. if an object's logical 3 Persistent R-value and L-value bindings are the analogue of constant and variable declarations. An R-value binding simply fetches a value from the persistent store and assigns it to a program constant. On the other hand, an L-value binding associates a program identi er with a variable cell in the persistent store, allowing the program to update a persistent variable by normal assignment.
state is held in part in a class-wide data structure. Finally, with the exception of SmallTalk80, none of the languages listed save the code of object methods with the objects. This means that it is possible to break ADT encapsulation for a persistent object by editing and recompiling its methods between program runs. On the basis of this, we argue that none of the above languages satisfy the Principle of Persistence Independence for ADT's. By contrast, the Napier88 programming language (Morrison, Brown, Connor, and Dearle 1989) allows function values and closures to persist along with data values. While Napier88 is not objectoriented since it does not allow inheritance, it does provide an \abstype" construct which produced ADT's based on Cardelli's existentially quanti ed types (Cardelli and Wegner 1985). Persistence of Napier88 abstype values does not violate ADT encapsulation.
1.4 Persistence in Ada95 The Ada95 programming language is objectoriented according to Wegner's de nition (Wegner 1987): \object-oriented = objects + object classes + class inheritance" While Ada95 separates the notion of data encapsulation and modularisation from the inheritance mechanism, there are indications that this may actually make object-oriented programming easier in some areas. A comparison of object-orientation in Ada95 with other languages may be found in (Barbey 1994). The aim of our research is to design and implement persistence in Ada95 that is as close as possible to orthogonal persistence, with minimal syntactic and semantic disruption to the language. In addition, persistence must be typesafe such that an object of type T written to the persistent store, can only be retrieved from the store as an object of type T . The Ada95 standard de nes a core language with standard syntax and semantics, and a number of optional specialist Annexes. An Ada95 Annex may de ne new pragmas and attributes, but it may not alter the core language syntax and semantics. Since we would like our persistence extensions to be the starting point for a future Persistent Programming Annex for Ada95, we are restricting ourselves to this kind of extension.
Ada95 has been designed to be a \no surprises" language. The Ada95 revision team has gone to great lengths to remove anomalies, and eliminate cases that lead to unde ned behaviour. It is our aim that our extensions should continue in this vein. The persistence extensions for Ada95 should not depend on \advanced" features such as garbage collection, rst class functions and persistent code objects that Ada does not currently require. While we believe that Ada95 would bene t from garbage collection and rst class functions, and fully orthogonal persistence requires that package code objects are persistent (Crawley and Oudshoorn 1994), there are pragmatic reasons for not making them mandatory for persistent Ada. First, there is considerable resistance to the idea of garbage collection (and hence to language features that would require it) in parts of the Ada software engineering community. Secondly, we would like a Persistent Programming Annex for Ada95 to be implementable within the framework of current generation Ada compiler technology. In (Crawley and Oudshoorn 1994), we identi ed a number of problems in supporting pure reachability-based orthogonal persistence in Ada.
The Ada type system uses name equivalence and static type checking, and types have a xed lifetime that depends on the declaration scope. On the other hand, orthogonal persistence requires that types have an inde nite lifetime. Furthermore, if a persistent application (program + data) is to evolve, structural equivalence and dynamic type checking are desirable, and indeed necessary, when binding to an object from the persistent store. Orthogonal persistence requires that task values and (in Ada95) function pointers can persist. This means that both tasks and functions must be rst-class values which present implementation and, in the case of tasks, semantic problems. Orthogonal persistence of private values de ned in Ada packages requires the associated package state to persist. Since a generic package may be instantiated a number of times in a program, a persistent package instance should be treated as a kind of persistent object in its own right. Orthogonal persistence of private values and package instances requires that code for
the package that created them also persists in some form. Package instance objects do not mesh very well with the Ada95's support for object-oriented programming, and (Crawley and Oudshoorn 1994) concluded that it would be better to replace private types with something closer to Cardelli's abstract types (Cardelli and Wegner 1985). All in all, the conclusion was that adding pure orthogonal persistence to Ada would result in a language that was substantially incompatible with both Ada83 and Ada95. The persistence extensions to Ada95 proposed in this paper will use the persistence by reachability model. However, this version of persistent Ada95 falls short of the ideal orthogonal persistence in some important respects:
Task and function pointer types, and the types constructed using them will not be persistent. Dierent type compatibility rules will apply when binding program identi ers to objects in a persistent store. The program will be expected to specify whether or not package instances should persist, and to explicitly bind to persistent package instances. The programmer needs to say whether or not a change to a package makes it incompatible with pre-existing persistent values.
In each case, the short-fall is a consequence of our design constraints; i.e. compatibility with Ada95 and avoidance of \advanced" features as outlined previously. We should note that the addition of orthogonal persistence to mainstream languages is of considerable interest to the persistent programming community. Furthermore, the exercise of trying to add orthogonal persistence to Ada95 while maintaining compatibility with the base language is revealing some interesting things both about Ada and similar languages, and about persistence.
2 Ada95 and Persistence Section 2 of this paper presents a detailed description of the proposed persistence extensions to Ada95. We start by explaining how persistence and binding are provided for most standard Ada types, then we discuss how tagged
records and packages are handled, and gives some details of the environment in which persistence would work.
0
2.1 Persistence of Values The basis of persistent Ada95 is the notion that types can have the property of persistency; i.e. the property that values of the type may potentially persist beyond a program execution. We follow the example of (Richardson, Carey, and Schuh 1993) and (Walsh, Taylor, McHugh, Riveill, Cahill, and Balter 1993) and classify types as either persistent types or non-persistent types according to whether or not they have this property4. Note that the term persistent type does not mean that all objects of that type must persist, but rather that objects of that type may persist as required, i.e., they may be saved to and restored from a persistent store. In persistent Ada95, most types are persistent types by default. The exceptions to this are as follows:
Ada95 task types and access to subprogram (function pointer) types are non-persistent. Ada allows a program to declare access (pointer) types for user de ned storage pools (heaps). These are non-persistent. Types de ned with representation attributes or representation clauses are nonpersistent. Private types declared in non-persistent packages are non-persistent. (Package persistency is discussed in a later section.) Any types composed or derived from nonpersistent types are also non-persistent.
Some of these restrictions could be removed if we were to assume that the implementation of persistent Ada95 supported garbage collection, persistent code objects and rst-class function values. Further dynamic restrictions apply to the creation and use of access values in a persistent Ada program: 4 This terminology is somewhat questionable for a type system that uses structural equivalence. However, with name equivalence each type has a de nite identity, and hence the idea that a type might need to persist is meaningful.
Ada95 allows a program to generate access values by taking its address using the Access attribute. Values generated in this way may not be root reachable, since this would entail making objects allocated on the stack persistent. Uninitialised access values may not be root reachable. Access values that are dangling references (e.g. created by incorrect use of unchecked deallocation) may not be root reachable, since this would require the root traversal algorithm to cope with bad pointers.
In practice, it may be too expensive to enforce the restrictions in a non-garbage collected implementation. With the above exceptions, any Ada95 value belonging to a persistent type may persist. The actual persistency of a value depends on whether or not it is root reachable at the time objects are committed to a persistent store. This will happen when the program terminates successfully, or when the program directly or indirectly requests a commit. An Ada95 persistent store is a non-volatile storage mechanism that holds typed Ada95 values. In the simple case, the store and associated runtime code provides mechanisms for incrementally migrating values from the store to a program's address on demand, and for atomically committing changes to the program's root reachable data structures. Future persistent stores may also support selective commits. An Ada95 persistent store has one or more persistent root objects which are the basis for determining reachability. It also provides a naming system which allows the programmer to organise and browse the stored values in a typesafe fashion. The root(s) of the name space will typically map onto the persistent root objects in some implementation speci c way, though this is not essential. The persistent extensions to Ada95 provide a persistent binding mechanism for associating a value (or updatable cell) in the persistent name space with an Ada identi er. If an object declaration is followed by a Bind pragma, its meaning is altered from a declaration of a local object to a binding to a persistent object. In Figure 1, the program establishes a binding between the identi er \i" and a cell in the persistent store with the name \my counter" and type Integer. If there is no cell named \my counter"
declare i : Integer := 0; pragma Bind(i, "my counter"); begin i := i + 1; end; Figure 1: A persistent counter using L-value binding. in the persistent name space, one will be created and initialised to the value given (zero). Operations on \i" in the begin { end block will read and update the persistent variable. Assignments to \i" will change the contents of the cell in a manner identical to a conventional assignment statement. However, any updates to the cell will only be written to non-volatile storage when the persistent store is committed. Persistent store commit is an atomic process in which all objects reachable from the store's persistent roots are saved. In the simple model, persistent store commit only occurs when the program execution terminates normally, or when the programmer explicitly requests it. The syntax for the pragmas for persistent binding declarations are given in Figure 2. The \" is an Ada object identi er declared in the preceding object declaration, the \" is a String values expression whose value is a name in the persistent namespace, and the optional \" is an expression which gives an alternative start point for the namespace lookup. The dierence between the Bind and Rebind pragma's is that the former will insert a name into the persistent namespace if necessary, while the later expects the name to be present already. The initial value expression in an Ada object declaration is required when there is a Bind pragma and forbidden when there is a Rebind pragma.
pragma Bind(, [, ]); pragma Rebind(, [, ]); Figure 2: The Bind and Rebind pragmas. Establishing the persistent binding speci ed by a Bind pragma involves the following steps: 1) The name is looked up in the persistent store's namespace. 2) If the name cannot be found, the initial value expression is evaluated, and a new constant or variable will be added to the
persistent namespace with a name, type and initial value as given. 3) If the name is found, the initial value expression is not evaluated. Instead, the type of the existing persistent variable or constant is compared with the object declaration, and an exception is raised if the types are not binding compatible (see below). 4) The constancy of the name is checked, and if the program is trying to perform a variable binding for a persistent constant, an exception is raised5 . 5) A binding is then established between the declared identi er and the persistent value or cell. This binding remains valid for the normal lifetime of the declared identi er. In the case of the Rebind pragma, step 2 above is replaced by the following: 2 ) If the name cannot be found, an exception is raised. 0
We chose to follow the example of Napier88 (Dearle 1989) in supporting L-value bindings (bindings to cells) in persistent Ada95 as well as R-value bindings (bindings to values). This allows us to de ne persistent variables in a natural way. Without L-value binding we would need to simulate persistent variables using access values. Figure 3 shows how this can be done for the persistent counter example in Figure 1.
declare type iptr is access integer; i : constant iptr := new(0); pragma Bind(i, "my counter"); begin i.all := i.all + 1; end; Figure 3: A persistent counter using R-value binding. When a program binds to an object in a persistent store, the runtime system must perform a dynamic type check to ensure that the binding is type-safe, and does not violate abstract type encapsulation. If we were to use pure name equivalence for this type check, we would have 5 An L-value binding to a persistent R-value could be treated as an ordinary variable declaration where the variable is initialised to the value of the persistent constant. However, this would violate the \no surprises" principle since assignments to the Ada identi er would not persist, when it would appear to the programmer that they should.
to provide a mechanism that allows program instances to share type instances6 . It is dicult to see how this could be done without fundamentally changing the Ada95 language. Rather than struggling with the problems of name equivalence, we have decided to provide some form of structural type compatibility for persistent binding. At this time, we have identi ed two approaches to the type equivalence rule:
\pure" structural equivalence rule, and a hybrid of structural and name equivalence rules in which two types are equivalent i they have the same structure and they are de ned in equivalent lexical contexts.
The rst approach would allow persistent binding to be used as a \back door" for breaking the name equivalence rule that normally applies to Ada types. Even so, we have chosen this approach because it is more consistent and easier to understand than a hybrid type equivalence rule. Irrespective of which of the above two options is chosen, the type rules for persistent binding also need to prevent violation of data abstraction for private types that are used to model abstract types. We will describe how this is done in the section on packages.
2.2 Tagged records The Ada95 provides a new type constructor called the tagged record that is designed to support object-oriented programming. When a record type is declared as \tagged", other record types can be de ned as extensions of the tagged type. Tagged record types form the basis for single inheritance in Ada95. Figure 4 is an example of tagged type declarations. Each tagged type declaration de nes an actual type and a class-wide type. The latter is actually a family of types consisting of the type and all extensions of the type. The class-wide type is denoted by the 'Class attribute. Figure 5 illustrates the relationship between the dierent types. Class-wide types are used when an object operation needs to be able to dispatch to speci c functions depending on the tag value of an argument or result. Figure 6 shows how dispatching works with function overloading in Ada95 to support object-oriented operation invocation. 6 That is, the notional type entities that result from each elaboration of a type declaration.
type Object is tagged record
X Coord: Float; Y Coord: Float; end record; type Circle is new Object with
record Radius: Float; end record;
Figure 4: Type specialisation using tagged records.
type Object Ptr is access all Object; type Any Object Ptr is access all Object'Class; type Circle Ptr is access all Circle; op: Object Ptr; ap: Any Object Ptr; cp: Circle Ptr; begin op := new Object(1.0, 1.0); cp := new Circle(1.0, 1.0, 2.0); ap := Any Object Ptr(cp); ap := Any Object Ptr(op); op := Object Ptr(ap); { { May raise a constraint error exception { { if ap does not point to an Object. end;
Figure 5: Use of class-wide types. In supporting persistence for tagged record types, we would like to allow evolution of tagged type hierarchies by adding new extensions to the hierarchy. This presents us with two problems. The rst problem is that the tag values for persistent tagged record types need to be consistent across the dierent program images and through time in the face of extensions to the tagged type hierarchy. The Ada95 standard has been designed so that tag values can be table indexes, and operation dispatching can be a simple table indirection. It is not obvious that we can be this ecient when dispatching operations based on persistent tag values. The second problem is that a program might bind to a persistent tagged record with given tag, but not have the speci c functions needed for that tag. We have identi ed three possible solutions to this problem:
If we do not allow class-wide types to persist, we can guarantee that a program can only bind to tagged records if it has the required functions. We could change the semantics of dispatching so that a constraint error exception is raised if the function required to perform an operation is not available.
function Area(obj: in Any Object Ptr) return Float is abstract; function Area(obj: in Object Ptr) return Float is begin return 0.0; end Area; function Area(obj: in Circle Ptr) return Float is begin return pi obj.radius 2; end Area;
... ap: Any Object Ptr := ... begin ... := Area(ap); { { This invocation dispatches to the { { appropriate Area function, depending { { on the tag of ap.all end;
Figure 6: Dispatching of overloaded operations. We could make the binding type compatibility rule for class-wide types such that it guarantees that a program can handle all possible tag values that the persistent data may contain. One way to do this is to require that the set of types in the class-wide type, as de ned by the program, is a superset of the corresponding set in the program that created the persistent data structure. The rst alternative is too restrictive, since it makes it impossible to save data structures containing polymorphic objects. The second alternative introduces an unnecessary source of runtime errors. At this stage, the third alternative seems to be the most workable. (Note that the solution suggested in (Charlton, Leng, and Rivers 1994) of treating records with unrecognised tags as members of a known superclass is not applicable to persistent Ada95, since it would break encapsulation of ADT's de ned using tagged records.)
2.3 Packages Packages present linguistic problems for persistence in Ada, especially since we are trying to preserve data encapsulation for persistent values. The problems arise in two areas: preserving the state of a package instance, and dealing with evolution In standard Ada83 and Ada95, a generic package is a template for creating a collection of entities; e.g. types, constants, variables and subprograms. Some of these entities are partially or completely visible as the package's interface. Others are completely hidden from outside of
the package boundary. The template is typically used (elaborated) only once in a given program execution, and the result (a package instance) is potentially visible throughout the program. However, a package may be elaborated multiple times in a program, or elaborated within a nested scope. Suppose that a package instance is responsible for managing a collection of objects, which it makes available to other components of the program as either public (open) or private (opaque) types. Management of these objects by the package may involve the use of package state in the form of public or hidden data structures that are \global" to the package instance. Figure 7 gives an example of such a package. Further package state may be contained in subsidiary or embedded packages and tasks.
package Table is pragma Package Persistency(Stateful); { { See below. type Key Type is private; Null Key: constant Key Type; function Save(value: in Integer) return Key Type; function Fetch(key: in Key Type) return Integer;
private type Key Type is 1 .. 100; constant Null Key := 1; end Table; package body Table is tab: array (Key Type) of Integer := (others => 0);
next: Key Type := 1; procedure Save(value: in Integer) return Key Type is begin next := next + 1; tab(next) := value; return next; end Save; function Fetch(key: in Key Type) return Integer is begin return tab(key); end Fetch; end Table;
Figure 7: A package that uses global state for object management. If some other part of a program makes one of these managed objects reachable from a persistent root, it will be saved to the persistent store
in the normal way. When the object is restored in a later run, it can only be used successfully if the state of the original package instance is available as well. In other words, in some cases package instances must be made persistent in order to preserve package encapsulation. However, in other cases there is no need to preserve the package instances. This is a problem where we require assistance from the Ada95 programmer. Since a program may create many instances of a package and save the objects created by each instance, it follows that the program needs to be able to dynamically select the appropriate persistent package instance before rebinding to the stored object. Furthermore, the persistent binding mechanism must ensure that the correct instances are used for each object to preserve package encapsulation. The second problem with packages is that the programmer may make changes to the package code between saving a collection of objects and restoring them. Any change to a package's interface or implementation has the potential to make programs that use it incompatible with previously stored objects. For example:
The type declarations for the objects' representation could change. The declarations for the package's public or hidden state variables could change. The package's algorithms could change in a way that alters the data structure invariants for the package. Similar changes could be made to other compilation units which the package depends on.
In the rst two cases, it would be possible to detect any incompatibilities by comparing package signatures when restoring package instances. In the nal two cases, a mechanism that can automatically distinguish theoretically compatible changes from incompatible ones is beyond the state of the art (if not impossible). Once again, the solution involves some assistance from the programmer. These two problems are not speci c to Ada packages. As we discussed in section 1.3, the problems arise in many other object-oriented languages that support persistence. The only completely general solution seems to be to make packages (classes) and their respective code objects persistent. In persistent Ada95, we address the rst problem by giving packages a persistency property,
and classify each package as one of stateful, stateless or non-persistent. Stateless and
stateful packages are both persistent, although in dierent ways. Stateless packages are those for which the state of a package instance does not need to be saved between runs. Persistent objects with a private type are treated as independent of the package instance that created them. A persistent binding involving a private type de ned by a stateless package simply requires structural equivalence of the full type. Stateful packages are those whose state must persist. In particular, a persistent object created by a given package instance can only be successfully used if the state of the package instance persists. When a persistent binding involves a private type declared by a stateful package, name equivalence rather than structural equivalence is used for the type. This ensures that persistent objects belonging to the private type can only be used in conjunction with the package instance that created them. Non-persistent packages are those which have private state that cannot be made to persist; e.g. those \global" data structures which include non-persistent types, or whose bodies have embedded tasks. Private types declared by a non-persistent package are also non-persistent. The default persistency of a package is determined by the following rules: Any package whose interface contains any \global" state declarations with nonpersistent types defaults to non-persistent. Any package whose interface contains \global" state declarations which all have persistent types are stateful. Any package whose interface declares no \global" state is stateless. A package's default persistency can be overridden using the Package Persistency pragma shown in Figure 8.
pragma Package Persistency ([stateful j stateless j nonpersistent]); pragma Package Bind(, [, ]); pragma Package Rebind(, [, ]); Figure 8: The Package Persistency and Binding pragmas. A package body must be consistent with the persistency of its interface. If the declaration of a
private type is deferred to the package body, the corresponding full type must be consistent with the persistency implied by the interface. Furthermore, if a package is stateful, its body must not declare hidden \global" state using nonpersistent types or have any embedded tasks, since this would make it impossible to correctly save the state of a package instance. The scheme for making instances of stateful packages persist and for using them are analogous to those used to make objects persist. The persistence extensions to Ada95 de ne two pragmas Package Bind and Package Rebind (see Figure 8) that alter the meaning of a \with" statement or a package declaration to a persistent binding. The two forms are illustrated in the Figure 9.
with Table; pragma Package Rebind(Table, "my table"); procedure Example is package Table2 is new Table; pragma Package Bind(Table, "my table2"); ... end Example; Figure 9: Use of the package binding pragmas. The process of making a persistent package binding is similar to a binding to a persistent object. An attempt is made to locate the package instance named by the pragma in the persistent namespace. If the lookup fails, a package instance is elaborated in the normal way, and is inserted into the persistent namespace. If the lookup succeeds, the existing persistent instance is used. When a binding is made to a persistent package instance, the runtime system checks that the package code in the executing program is compatible with the persistent instance. The check will ensure that the program uses the correct types for the package state, and that the code is \semantically compatible" with the package instance. Semantic compatibility will be determined by comparing semantic keys from the package code and the package instance. Each time a package is changed and recompiled, it will (by default) be given a new semantic key. Normally (i.e. if the programmer does nothing), packages with dierent keys are semantically incompatible. However, the programmer will be able to specify that particular pairs of keys are fully or upwards compatible. Figure 10 shows a persistent program that employs a persistent Table instance and a private
type declared in the Table package (see Figure 7). This example shows how a value of a private type declared by the package that is saved independently of the package used.
with Table; procedure Example2 is package Table1 is new Table; pragma Bind Package(Table1, "my table 1"); with Text IO, Integer IO; key1: Table1.Key Type := Table1.Null Key; { { If "key 1 " was created by a dierent { { persistent instance of Table, the binding { { will raise an exception. begin if key1 = Table1.Null Key then Text IO.Put Line("No previous key value"); else Text IO.Put("Last value saved was "); Integer IO.Put(Table1.Fetch(key1)); Text IO.New Line; end if ; Text IO.Put("Enter a number:"); key1 := Table1.Fetch(Integer IO.Get); end Example2;
pragma Bind(key1, "key 1");
Figure 10: Using a persistent Table. A Type Persistency pragma to give the programmer ner grain control over type persistency and the type compatibility rules used for persistent binding may be considered. For example, a programmer may wish to bind objects with public types to a particular persistent package instance, or allow objects of a private type to be used independently of the package instance. In some cases, the programmer may even need to declare that a type is non-persistent.
2.4 The Environment for Persistent Ada95 This section of the paper will brie y discuss some issues related to the persistency environment for Ada95. Persistent Ada95 as described above requires that the persistency mechanism provides a strongly typed naming system for persistent variables and constants. The program interacts with this naming scheme using the Bind, Rebind, Package Bind and Package Rebind pragmas whose syntax and semantics ae described above. The Name Space type is an abstract tagged type which provides a group of operations for performing name lookup and for adding and removing names from the persistent namespace.
Some of these operations are used by the runtime system when performing a persistent bind or rebind. Some of them may also be invoked by a program directly. The required interface for a Name Space object is illustrated in Figure 11. Note that operations are provided for obtaining and setting the Name Space object used in a binding pragma that omits the optional parameter.
package Persistent Naming is type Name Space is abstract tagged private; type Name Space Entry is private; Null Entry: constant Name Space Entry; procedure Lookup(ns: in Name Space; name: in String; entry: out Name Space Entry) is abstract; procedure Insert(ns: in Name Space; name: in String; entry: in Name Space Entry) is abstract; procedure Delete(ns: in Name Space; name: in string) is abstract; function Get Default return Name Space; procedure Set Default(ns: in Name Space); private type Name Space is abstract tagged null record; type Entry Obj; { { implementation speci c. type Name Space Entry is access Entry Obj; Null Entry: constant Name Space Entry := null; end Persistent Naming; Figure 11: The interface for Name Space objects. The Ada95 runtime library will provide one or more standard classes derived from Name Space. However, declaring Name Space as abstract is intended to allow a program to provide its own implementation. The Entry Obj type is an opaque type that is used to pass a persistent variable or constant's type and value or L-value to the runtime system. The representation of this information is be implementation speci c. The earlier discussions of the binding pragmas assumed that the value of the would be a simple name in a \ at" name space. In practice, we expect that some implementations will provide tree or graph structured namespaces. In such cases, the value of the
could be a compound name or path using a syntax that is understood by the namespace's Lookup and Insert operations. An implementation of the Name Space abstraction could also implement access controls. There is no fundamental reason why a Name Space object used in a binding pragma needs to be attached to a persistent root. Free standing Name Space objects can be used for dynamic binding independent of persistence. In many respects, this is similar to the way that environment objects can be used in Napier88 and Eiffel. However, we are unable to use Name Space objects to support evolutionary programming as described in (Dearle 1989) because persistent Ada95 does not support persistent function pointers. The dierences between binding in persistent Ada95 and Napier88(Morrison, Brown, Connor, and Dearle 1989) can be summarised as follows:
Persistent Ada95 supports binding using dynamically calculated names, while Napier88 requires the name to be compiled in. Persistent Ada95 binding has \create if not there" semantics, while the Napier88 programmer has to do this manually. In persistent Ada95 Name Space objects can support compound name lookups, while in Napier88 the programmer must use a sequence of simple name lookups. Persistent Ada95 supports user-defined Name Space objects, while Napier88 restricts the user to the builtin \env" type. Napier88 binding can be used for all data types, while persistent Ada95 only allows binding of persistent types.
The model for persistence in persistent Ada95 does not prescribe a logical or physical architecture for persistent storage. An implementation can provide one logical store or many logical stores. A logical store could be implemented in a variety of ways; e.g. as les in the host le system, an object manager, an object-oriented database system, and so on. Distributed and replicated persistent stores may also be possible. While concurrent shared access to the persistent store is highly desirable, it is not required by the model. Indeed, the persistent store in the rst prototype implementation of persistent Ada95 is unlikely to support more than one program
execution at a time. Another possibility is to allow sharing of read-only persistent data. Our reasons for not requiring the persistent store to support concurrent write access are two-fold. First, the implementation technology for sharing in reachability-based persistent stores is not mature. Second, the problems of concurrency control in persistent programming languages are not well understood. The only hard requirements that persistent Ada95 places on the persistent store are:
that it supports persistence by reachability, that it supports atomic commit to nonvolatile store for the graph of reachable objects, that type-safety and object identity are preserved, that the persistent store is garbage collected.
The justi cation for the last requirement is as follows. Programs that operate on persistent data will inevitably \lose" objects that are no longer accessible from the persistent root. (It is in general impossible for a program to safely dispose of an object, since it cannot tell whether a dierent program has hidden a pointer to it in another part of the reachable graph.) Garbage objects will accumulate in the persistent store, until a point at which it is necessary to reclaim the space they occupy. At this point, there are only three alternatives: 1. Reclaim the space using an automatic garbage collector that discards objects that are no longer reachable from a persistent root. 2. Erase the persistent store and rebuild it from transaction logs or other external information. In general, this information will not be available. 3. Save all important information from the store, then erase and rebuild it. Since the code to do this has to written by the programmer, a major bene t of persistent programming is nulli ed. A future Persistent Programming Annex for Ada95 would specify other features for persistent programming. These might include:
a classi cation of possible persistent store architectures,
library functions for data structure pickling, and a standard syntax for compound names in the persistent namespace.
There will inevitably be some interactions between a Persistent Programming Annex and the existing Ada95 standard. One interaction is that some types de ned by the standard library would need to be declared as non-persistent. For example, the various File Type types would probably be non-persistent in a UNIX-based implementation, since we cannot guarantee to be able to reconnect a persistent le handle to the original le, pipe or socket. Other points of interaction include the Distributed and Real-Time Systems Annexes, and areas of the language related to machine speci c data representation.
3 Future Work In this section, we will brie y outline our plans for implementing the persistence extensions to Ada95, and some possible directions for research following that.
3.1 The Initial Implementation Our current plans are to implement a prototype persistent Ada95 system using the \copyleft" GNAT Ada95 compiler and pre-existing persistent store technology as starting points. The prototype will support persistence as described for the core Ada language and standard libraries, with restrictions as noted in the area of machine speci c representations. The prototype will support the one store/one user model, with facilities provided for oine garbage collection. It is our intention to make the prototype persistent Ada95 system publicly available as \copyleft" software as soon as it is stable. This will allow researchers to evaluate the design, and use it for developing persistent applications. The prototype will also be a good platform for a scienti c evaluation of the bene ts of the persistent programming paradigm.
3.2 True Orthogonal Persistence Once the initial prototype has been released, the next step is to provide an automatic garbage collector for the persistent Ada95 heap. By the time we need this, we anticipate that someone
will have added a garbage collection to the standard GNAT compiler, for example by adapting Boehm's conservative garbage collector. Once garbage collection is available, we can start to remove the restrictions that prevent us from calling it orthogonal persistence. This entails making functions and tasks persistent, preferably as \ rst-class" values. To achieve this we need to modify the compiler to allow code objects to be saved in the store, and (for rst class functions) to allocate local frames from the heap. The later would involve major modi cations to the GNAT (i.e. GCC) code generators.
3.3 Support for Evolution Persistent rst class functions and persistent code objects allow us to address the dicult problem of evolution in persistent Ada95. Previous research in the context of Napier88 has demonstrated that L-value binding of function values allows incremental or evolutionary construction of programs (Dearle 1989). However, L-value binding does not address the problem of evolution of the schemas of persistent data. A possible basis of a scheme for the evolution of abstract data types in a persistent store can be found in (Crawley and Barman 1987).
3.4 Re ection and Rei cation In a system which supports persistent code objects and rst-class functions, it is a small step to provide a compiler that can be invoked at runtime; i.e. a callable compiler. A callable compiler allows the programmer to use re ective programming techniques. Combining re ection with persistent source code representations leads to interesting ideas like hyperprogramming (Kirby, Connor, Cutts, Dearle, Farkas, and Morrison 1992). Another idea that is currently being explored in conjunction with persistent programming is rei cation or meta-programming. The basic idea of rei cation is that values and types can be \lifted" to a meta-programming level where normal types can be treated as values and normal values can be operated on in a type independent fashion. At a later point, the types and values can be \dropped" back to the normal level. Tools like source level debuggers normally operate at the meta-programming level. In the context of persistence, rei cation is also potentially useful for maintenance of the data structures in the persistent store, and may turn out to provide a useful tool for schema evolution.
4 Conclusions In this paper we have presented a design for persistent programming extensions to the Ada95 language. We have shown that by adopting a pragmatic approach it is possible to come close to the ideal of orthogonal persistence, while still maintaining compatibility with the base language in all respects. The compromises that need to be made in Ada95 are that tasks and function values cannot persist and that the programmer must make decisions about the behaviour of packages and private types when they persist. The persistent Ada95 extensions will not be suitable for use in a hard real-time applications. The unpredictable nature of persistence is incompatible with the requirements for real-time programming, and it is likely to continue to be so for the forseeable future. However, this is not a good argument for not supporting persistence as an Ada95 Annex. A real-time programmer should use a compiler that can \turn o" all support for persistence. The persistence extensions are designed to be implementable in the framework of current Ada compilers such as the GNAT Ada95 compiler. While the design has intentionally avoided making it a requirement to implement heap garbage collection, rst-class function values, or persistent code objects, we believe that persistent Ada (and Ada in general) would bene t from these features. Providing them should allow full orthogonal persistence to be supported, and will enable support for re ection and rei cation. We have identi ed a problem with persistence as implemented in a wide variety of object-oriented programming languages; namely that data abstraction is not preserved by persistence. In persistent Ada95, we address the problem by preserving the state of package instance, and by checking the package code for representational and semantic compatibility (in the particular sense described above). Evolution of systems with persistent data is a dicult problem in general, and for persistent programming systems in particular. The persistence extensions for Ada95 provide ad-hoc support for the evolution of package code, with the potential risk of data abstraction violation. Other possible alternatives include the use of rei cation for schema evolution, and the use of \projection" for evolution of ADT interfaces. One of the main failings of current generation implementations of persistent programming languages is that they do not support concurrent
shared access of persistent data. Current research in this area is focussed on persistent store and memory architectures to support sharing (Vaughan, Dearle, Cao, di Bona, Farrow, Henskens, Lindstrom, and Rosenberg 1993), but little serious work has been done on the linguistic implications of concurrency. In persistent Ada95, it is possible that we can \leverage o" the protected record type that is used to synchronise concurrent access to shared data by Ada tasks.
References
Atkinson, M. P. (1978). Programming languages and database. In Proceedings 4th Internatinal Conference on Very Large Databases, pp. 408{ 419. IEEE. Atkinson, M. P., P. J. Bailey, K. J. Chisholm, W. P. Cockshott, and R. Morrison (1983). An approach to persistent programming. The Computer Journal 26 (4), 360{365. Atkinson, M. P., K. J. Chisholm, and W. P. Cockshott (1981, July). PS-Algol: An Algol with a persistent heap. ACM SIGPLAN Notices 17 (7), 24{31. Atkinson, M. P., K. J. Chisholm, and W. P. Cockshott (1983, March). Algorithms for a persistent heap. Software - Practice and Experience 13 (3), 259{272. Barbey, S. (1994, November). Working with Ada9X classes. In Proceedings TRI-Ada'94, Baltimore MD, pp. 129{140. ACM. Cardelli, L. and P. Wegner (1985). On understanding types, data abstraction and polymorphism. ACM Computing Surveys 17 (4), 471{ 523. Charlton, C., P. Leng, and M. Rivers (1994, March). Using inheritence to provide schema views in a shared persistent object database. In Proceedings TOOLS Europe '94: Technology of Object-Oriented Languages and Systems, Versailles France. Crawley, S. C. and H. J. Barman (1987, August). Flexibility in an object-based type system. In Proceedings 2nd International Conference on Persistent Object Systems, Appin Scotland. University of Glasgow. PPRR 44. Crawley, S. C. and M. J. Oudshoorn (1994, November). Orthogonal persistence and Ada. In Proceedings TRI-Ada'94, Baltimore MD, pp. 298{308. ACM. Dearle, A. (1989). Environments: a exible binding mechanism to support system evolution. In Proc. 22nd Hawaii International Conference on Systems Sciences, January 1989, Volume 2. Dearle, A. (1994, August). Use of orthogonal persistence technology in industrial and application oriented research and development. Technical Report PS-26, Dept of Computer Science, University of Adelaide.
ISO (1995). Ada 95 Reference Manual. International Standard ANSI/ISO/IEC-8652:1995. Kirby, G. N. C., R. C. H. Connor, Q. I. Cutts, A. Dearle, A. M. Farkas, and R. Morrison (1992, September). Persistent hyper programs. In Proceedings 5th International Workshop on Persistent Object Systems, San Miniato, Italy. Klas, W. and V. Turau (1992, July). Persistence in the object-oriented database programming language VML. Technical Report TR-92045, International Computer Science Institute, Berkeley, CA. Linton, M., J. Vlissides, and P. Calder (1988, November). Composing user interfaces with InterViews. Technical Report CSL-TR-88-369, Stanford University. Meyer, B. (1989). Eiel: The language. Technical Report TR-EI-17/RM, Interactive Software Engineering Inc. Meyer, B. (1990). Eiel: The libraries. Technical Report TR-EI-7/LI, Interactive Software Engineering Inc. Morrison, R. (1982). S-Algol: a simple algol. BCS Computer Bulletin 2 (31), 17{20. Morrison, R., A. Brown, R. C. H. Connor, and A. Dearle (1989). The Napier88 reference manual. Technical Report PPRR-77-89, University of St Andrews. Richardson, J., M. Carey, and D. Schuh (1993, July). The design of the E programming language. ACM Trans. on Programming Languages and Systems 15 (3), 494{534. Vaughan, F., A. Dearle, J. Cao, R. di Bona, J. Farrow, F. Henskens, A. Lindstrom, and J. Rosenberg (1993). Causality considerations in distributed persistent operating systems. Australian Computer Science Communications 16 (1), 409{419. Walsh, B., P. Taylor, C. McHugh, M. Riveill, V. Cahill, and R. Balter (1993, January). The Comandos supported programming languages. Technical Report TCD-CS-93-34, Trinity College, Dublin; Unite mixte BULL-IMAG, Trinity College Dublin. Wegner, P. (1987). The object-oriented classi cation paradigm. In B. D. Shriver and P. Wegner (Eds.), Research Directions in Object-Oriented Programming, pp. 479{560. MIT Press.