Evolutionary Data Conversion in the PJama Persistent Language M. Dmitriev and M. Atkinson
Department of Computing Science, University of Glasgow, Glasgow G12 8RZ, Scotland, UK Email: fmisha,
[email protected]
Abstract
The persistent object conversion facilities available in the evolution technology for the PJama persistent programming language are described. They include default and custom conversion and, within the latter, bulk and fully controlled conversion. Where the programmer needs to specify transformations different from the default, they are coded in standard Java. During conversion the "old" object graph remains unchanged, that is, substitute objects are not directly reachable from it, which is crucial for comprehensible semantics of evolution code. The authors believe that the present set of facilities is complete, i.e. it is enough to convert any persistent store instead of rebuilding it from scratch. However, the question of whether this can be generally proven is raised.
1 Introduction
The problem of schema evolution is important for persistent languages and object-oriented databases. Though the exact details of persistent data and class storage vary significantly between different systems, this issue is common to all of them. The quality of the evolution facilities of a particular system can greatly affect its usability, both for the programmers who need to make lots of changes to classes during application development, and for the final users who eventually receive updated classes and need to evolve their databases accordingly. For a commercial system, it may be crucial that its evolution subsystem can guarantee that any reasonable modification of persistent classes (schema) and, consequently, persistent objects, can be performed with minimum effort and without the necessity of rebuilding the whole database from scratch.
PJama [1, 2, 10, 13] is an experimental persistent programming system for the Java programming language. It has much in common with object-oriented database systems used together with Java. PJama is being developed as a collaborative project between Glasgow University and Sun Microsystems. For PJama, mechanisms are embedded into the Java Virtual Machine to support the persistence of objects beyond the execution time of the program that created them. In fact, not all the data of a program, but only objects chosen explicitly or implicitly by a PJama programmer persist. Objects created by one program can later be used by another. To support persistence, the standard JDK Java Virtual Machine was modified to become the PJVM, and an API consisting of several classes was developed. PJama implements a transparent model of persistence, that is, several API calls that register or retrieve persistent roots are the only things that need to be added to a Java program to make it manipulate the whole graph of persistent objects. We also did not make any changes to the format of Java class files or individual bytecodes.
At present PJama writes both persistent objects and their classes to a persistent store (an equivalent of a single database). Furthermore, it preserves all classes reachable from classes of persistent objects. The reasons for the adoption of such a policy are explained in [10]. For many applications this results in the number of persistent classes becoming large, often much larger than one intuitively expects. Once a class becomes persistent, it is saved inside the store and therefore cannot be replaced by simply modifying its source code and running the Java compiler. Compilation is just the first step, and then special evolution technology is required to verify the substitutability of the new version of a class and promote it into the store. To allow the programmer to develop successive versions of applications smoothly, this technology has to be powerful and reliable.
The evolution tool called opjsubst plus the mechanisms embedded into the PJama VM, collectively called the evolution technology, have been developed for PJama. The tool currently runs off-line on a quiescent store. This stop-the-world model is temporarily unavoidable, because PJama at present does not allow several "writer" applications to run concurrently on the same store. The tool can be driven by command line arguments or, alternatively, by text files containing similar instructions. These instructions specify the changes at the level of the class hierarchy, i.e. operations like substitution or insertion of classes. Changes to individual classes are not specified explicitly: the programmer just modifies a class and the evolution tool picks it up.
A change to a class can cause the format of its instances to change. If there are any persistent instances of this class, we then say that this change affects persistent data. If this happens, measures must be taken: the affected persistent objects should be converted to make them match the new format. Conversion can be performed either automatically (default conversion) or can be tailored by the programmer (custom conversion). In the latter case, the programmer has to write some code in Java that actually performs some part or the whole set of conversion operations. At present the PJama evolution technology supports only eager (immediate) conversion, which means that all instances of a modified class should be converted in one go, just after the class is substituted. Lazy conversion is delayed for PJama due to the limitations of the present store implementation.
Our previous paper [3] described mostly our consistency checking strategy, which is designed to ensure that any types of changes to classes and the class hierarchy, including unsafe ones such as deletion of fields, methods or whole classes from the middle of the class tree, will not lead to broken contracts between classes. This paper describes the data conversion facilities available in the second major release of the PJama evolution technology. We believe that the set of these facilities is complete and sufficient, i.e. the programmer can perform any modification to persistent classes and objects without the need to rebuild the store from scratch. However, both the exact definition and the proof of completeness of evolution technology require clarification through theoretical and practical research. That is, how can we be sure that we have enough facilities that we will never need to rebuild a database?
2 Types of Evolutionary Operations on Persistent Classes
In this section we briefly present the primitives available in the PJama evolution technology. See our previous paper [3] for the detailed discussion. Of the operations on persistent classes that are supported by the present PJama evolution technology, the most frequently used is the substitution of a single persistent class. In addition, the following operations that affect the hierarchy of persistent classes are available: insertion of a class into the middle of the existing hierarchy, deletion of a class from the middle of the hierarchy, and replacement of a class.
If class C is inserted (not just added) as a direct subclass of a class Csuper, it means that at least some of the former direct subclasses of Csuper now become direct subclasses of C. The system gets all this information from the definition of C and from the new definitions of C's subclasses.
If class C is deleted, its direct subclasses become direct subclasses of C's superclass (or, if the latter is also deleted, of the lowest undeleted superclass of C). The programmer should modify, recompile and pass to the evolution system all the affected subclasses as well as all other classes that were previously referencing C.
The operation of replacement of a class is essentially substitution combined with renaming. If the programmer replaces class C with class D, all instances of C automatically become instances of D. All references to class C in the source code of other classes should be replaced by the programmer with references to class D, and these classes recompiled. The evolution system will pick them up, verify and substitute them.
3 When Conversion is Required
All the above operations may or may not affect the persistent objects. Whether an object is affected or not depends on whether the modification of its class is such that the format of its instances changes. If we define instance-format equivalence in an implementation-independent manner, it would mean strict equivalence of the number, names and types of all instance (non-static) data fields of both classes. PJama, in addition, requires the declared order of the fields to be the same in both classes, because the fields are laid out in objects in the same order as they are declared, and accessed by their physical offsets.
However, in PJama there are a number of cases when the type of a field can be changed, but the format of instances remains the same, and the compatibility between the current value of the field and the new field's type is guaranteed. Examples of such changes are replacement of the primitive type short with the type int, or replacement of a class type with its superclass type. It is possible for some primitive types in PJama to be compatible in the above sense, because in the implementation of Java on which PJama is currently based (Sun's JDK; see footnote 1) each primitive type field (except for fields of types long and double) occupies a fixed-size 32-bit slot in an object (see footnote 2). Class type fields are always physically compatible, because they are all just pointers to objects. However, there are only a limited number of cases when two class types are guaranteed to be logically compatible, i.e. the assignment of a field of one type to a field of another will never cause problems. It turns out that the rules of widening reference conversions given in the Java Language Specification [8] are exactly what we need to define compatibility of class types, since our situation is exactly equivalent to the one in which these conversions are used. As for scalar types, only the subset of widening primitive conversions where both arguments are integers is applicable. The complete set of our rules was presented in [3].
If a modification to a class is such that the format of its instances changes and there are some instances of this class in the store, it is necessary to convert all instances of this class (see footnote 3). If a class is nominated for deletion, but there are some persistent instances of it, they should be migrated to other classes.
One additional operation which we have implemented in PJama and describe here is not really evolutionary. It is intended to be used when the programmer wants to modify all instances of some class while avoiding the difficulties of looking them up in perhaps complex and deep application data structures. It turns out that a mechanism very similar to evolutionary data conversion can be used for it. Thus this operation, called modification, is also discussed in this paper. Since the mechanism that we exploit looks very similar for all three tasks, in the further discussion we will often use the term "conversion" in a wider sense to denote all three kinds of operation: conversion, migration and modification.
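The type-compatibility rules of this section can be illustrated with a minimal sketch in standard Java. It is not PJama's implementation: the class name FormatCompat and its methods are hypothetical, and the authoritative rule set is the one given in [3]. The sketch assumes only what is stated above, namely identical types, integer-to-integer widening primitive conversions, widening reference conversions, and a 32-bit slot for every primitive field other than long and double.

```java
class FormatCompat {
    /** Integer-to-integer widening primitive conversions only (no float or double targets). */
    static boolean widensIntegrally(Class<?> from, Class<?> to) {
        if (from == byte.class) return to == short.class || to == int.class || to == long.class;
        if (from == short.class || from == char.class) return to == int.class || to == long.class;
        if (from == int.class) return to == long.class;
        return false;
    }

    /** True if a value held in a field of oldType can always be stored in a field of newType. */
    static boolean isCompatible(Class<?> oldType, Class<?> newType) {
        if (oldType == newType) return true;
        if (oldType.isPrimitive() || newType.isPrimitive()) return widensIntegrally(oldType, newType);
        return newType.isAssignableFrom(oldType);   // widening reference conversion
    }

    /** Assumption: a compatible change keeps the slot size unless it widens a 32-bit slot to long. */
    static boolean keepsSlotSize(Class<?> oldType, Class<?> newType) {
        return isCompatible(oldType, newType) && (newType != long.class || oldType == long.class);
    }

    public static void main(String[] args) {
        System.out.println("short -> int compatible:     " + isCompatible(short.class, int.class));
        System.out.println("int -> short compatible:     " + isCompatible(int.class, short.class));
        System.out.println("String -> Object compatible: " + isCompatible(String.class, Object.class));
        System.out.println("int -> long keeps slot size: " + keepsSlotSize(int.class, long.class));
    }
}
```

Under these assumptions, a field change from short to int is both compatible and format-preserving, whereas a change from int to long is compatible for copying values but alters the instance format.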
4 Default and Custom Conversion
If the programmer wants to convert instances of some class, they have a choice between default conversion and custom conversion. If conversion in the strict sense is implied, i.e. some class has been changed and all instances of this class should be made compatible with the class's new definition, it is often enough to use default conversion. This means that for each instance in the old format, the evolution system will automatically create a new one in the new format. The value of each field that has the same name and the same or a compatible type in both versions of the class will be copied from the old instance to the new one. Also, all the fields that do not exist in the old instance will be initialized with default values (as specified in [8]), i.e. 0 or null. To perform default conversion, the programmer simply confirms that they want it when the evolution tool detects a format-transforming modification to some class and asks what to do with its instances. Default migration is also applicable when the programmer wants all instances of a class nominated for deletion to migrate to one class. In that case, they should simply specify, together with a class to delete, a class to which they want to migrate the "orphan" instances.
However, sometimes conversion between two different formats of instances may be non-trivial. For example, it might be necessary to recalculate sizes from feet and inches to centimetres, or to replace full postal addresses with pairs "post code - premises number". The most convenient option for the programmer in this case is to be able to encode such transformations in the same programming language that is used to create the data, in our case Java. This type of data conversion is called custom or programmer-defined. In PJama, there are two ways of performing custom conversion. The first and simpler one is called bulk conversion.
Footnote 1: We believe this should be the case for any reasonable implementation of Java on 32-bit hardware.
Footnote 2: In arrays, however, elements of these types always take the smallest possible space, e.g. 8 bits for type byte or 16 bits for type short.
Footnote 3: If no instances of a class are expected to exist, the user should explicitly confirm this fact to the system, which will then verify it, currently by a linear scan of the store.
Bulk custom conversion is supposed to be used when all instances of some class should be converted in the same way. To perform it, the programmer should write an appropriate conversion method in Java. One such method can be defined for each evolved class. Conversion methods should have predefined names and signatures so that the evolution system can recognise them and call them correctly. All conversion methods should be placed in one class, called the conversion class. Given this class, the evolution system will scan the store linearly. For each detected instance of an evolved class it will call an appropriate conversion method, and that will result in the creation of a new, substitute instance. The evolution system remembers all pairs "old instance - new instance". After conversion is finished, it will iterate over all persistent objects and replace all references to old instances with references to the corresponding new instances.
Custom conversion can be combined with default conversion, i.e. the programmer can avoid writing the code that copies the contents of fields with the same names and types from "old" instances to "new" ones. The descriptions of the available conversion methods and the details of combining default and custom conversion are given in section 5. The task of bulk modification of instances when their class is unchanged, which was mentioned in the previous section, is not actually an evolutionary problem. However, since Java methods very similar to those used for bulk custom conversion are used, we describe them in the same section.
In practice the application of bulk custom conversion proceeds as follows. The programmer puts all conversion methods into one class with an arbitrary name, compiles it and passes it to the PJama evolution tool along with the names of the modified persistent classes and their newer versions (in .class files). The tool verifies that the new versions of classes are valid substitutes for the old ones (see [3] for the description of the verification procedure). Then, if all checks are successful, it runs the conversion, patches the references to converted persistent objects, and finally stabilises the store, thus making the changes permanent. If something goes wrong before stabilisation, nothing is saved and the store remains unchanged.
In addition to bulk conversion, fully controlled custom conversion is available in the PJama evolution technology. It is intended to be used if the programmer wants to have full control over how and in what order the instances are converted. To run fully controlled custom conversion, the programmer should put the method called conversionMain() into the conversion class. If this method is present, the evolution system will simply call it after verification of the substitutability of the classes, ignoring any other conversion methods. No automatic linear scan of the store is performed. Thus the programmer gets total freedom and full responsibility for the results of such a conversion. They should ensure that all instances of the modified class are either converted or made unreachable. The detailed description of fully controlled conversion is given in section 6.
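The two phases of bulk conversion described in this section (a linear scan that records "old instance - new instance" pairs, followed by reference patching) can be mimicked in ordinary Java over an in-memory list. The following is only a sketch under stated assumptions: the "store" is a plain List, and every name in it (BulkConversionSketch, OldLength, NewLength, Holder, toCentimetres) is hypothetical rather than part of PJama. The client field is typed Object so that it can hold either version, whereas PJama achieves the same effect by renaming the old class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BulkConversionSketch {
    // Hypothetical stand-ins: an evolved class in its old and new formats,
    // and a client class whose field is typed Object so it can hold either.
    static class OldLength { int feet; int inches; }
    static class NewLength { double centimetres; }
    static class Holder { Object length; }

    /** Plays the role of a programmer-written conversion method. */
    static Object toCentimetres(Object o) {
        OldLength old = (OldLength) o;
        NewLength fresh = new NewLength();
        fresh.centimetres = (old.feet * 12 + old.inches) * 2.54;
        return fresh;
    }

    /** Phase 1: linear scan; fills the old-to-new table and leaves the store untouched. */
    static Map<Object, Object> convertAll(List<Object> store, Class<?> evolved,
                                          Function<Object, Object> converter) {
        Map<Object, Object> pairs = new IdentityHashMap<>();
        for (Object o : store) {
            if (evolved.isInstance(o)) pairs.put(o, converter.apply(o));
        }
        return pairs;
    }

    /** Phase 2: redirect every reference to an old instance to its substitute. */
    static void patchReferences(List<Object> store, Map<Object, Object> pairs) {
        try {
            for (Object o : store) {
                for (Field f : o.getClass().getDeclaredFields()) {
                    if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers())) continue;
                    f.setAccessible(true);
                    Object substitute = pairs.get(f.get(o));
                    if (substitute != null) f.set(o, substitute);
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        // Old instances leave the store; their substitutes take their place.
        store.replaceAll(o -> pairs.getOrDefault(o, o));
    }

    public static void main(String[] args) {
        OldLength old = new OldLength();
        old.feet = 5;
        old.inches = 10;
        Holder holder = new Holder();
        holder.length = old;
        List<Object> store = new ArrayList<>(List.of(old, holder));

        Map<Object, Object> pairs = convertAll(store, OldLength.class, BulkConversionSketch::toCentimetres);
        System.out.println("before patching, holder sees old instance: " + (holder.length == old));
        patchReferences(store, pairs);
        System.out.println("after patching, holder sees new instance:  " + (holder.length instanceof NewLength));
    }
}
```

An identity-based map is used for the pair table because two distinct old instances may be equal field by field and must still receive distinct substitutes.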
5 Bulk Conversion
In the following discussion we will denote a class for which conversion is required as C. Csuper means any superclass of C. Let us first describe the categories of conversion methods that correspond to categories of changes to class C.
5.1 Class C Modified
The signatures of the conversion methods recognised by the evolution system if class C is modified are given below. The programmer can choose any one of the available headers and write the appropriate method body:
    public static void convertInstance(C$$_old_ver_ c0, C c1)
    public static C convertInstance(C$$_old_ver_ c)
    public static Csuper convertInstance(C$$_old_ver_ c)
The unusual suffix $$_old_ver_ is used to distinguish the old and new versions of C. This is an entirely valid sequence of symbols to use in a Java identifier; however, it has been chosen such that anybody else is unlikely to use it for a different purpose. Before conversion starts, the old version of class C is automatically renamed by the evolution system. After conversion is finished, the old version is invalidated, so the only place where it is possible to operate on two versions of one class and distinguish them this way is in conversion code.
The alternative way to access old versions of classes and old instances could be to use mechanisms similar to Java reflection. This is discussed in [3]. However, it is clear that the adopted mechanism is much more convenient for the programmer, although it requires a small change to the Java compiler's class loader. The latter is needed to enable it to load a class definition from the persistent store instead of the file system whenever the name of the class contains the $$_old_ver_ suffix.
The first form of the convertInstance method is the simplest one, and also the one that allows automatic default conversion. Before the system calls it, it creates an instance c1 of the new version of class C and copies into it from c0 the values of all data fields that have the same name and compatible types in both versions of class C. The new instance is created without invocation of a constructor.
The second and third forms allow the programmer to change the actual class of an instance during conversion. The second form can be used if the programmer wants to change the class of an instance to the new version of C or to a subclass of the latter. The third form permits a replacement class that is a superclass of C or that just has some common superclass with C. Both of these methods should explicitly call the new operator to create a new instance and then explicitly copy all the necessary data from c to the new instance.
The second form of convertInstance is guaranteed to be reference-safe. This means that if there are some data structures in the store that refer to instances of C, for example there is a class CRef that looks like
    class CRef {
        C cref;
        ...
    }
then after conversion all references from instances of CRef remain valid, although now some or all of them can point to instances of C's subclasses. That is because Java, like any other object-oriented language, allows a class type variable to refer to an instance of a class that is a subclass of the declared class of this variable. However, a reference to an instance of a class that is C's superclass (or extends C's superclass) would be illegal. Therefore the third form of the convertInstance method is unsafe in this sense, and it is the programmer's responsibility to arrange that there are no illegal references after conversion is complete. However, it gives the programmer more freedom in restructuring the persistent data and is justified if, for example, it is necessary to migrate all instances of class C to another class, which has a common superclass with C but is situated on another branch of the class hierarchy.
A subcase of class modification is when C is replaced (i.e. modified and renamed). Let us denote C's new name as NewC. Being informed by the programmer, the evolution tool knows that C and NewC are really the old and new names of the same evolved class. Therefore semantically exactly the same set of conversion methods can be used in this case:
    public static void convertInstance(C c, NewC nc)
    public static NewC convertInstance(C c)
    public static C_and_NewC_super convertInstance(C c)
5.2 Class C Deleted
If class C, which has some persistent instances, should be deleted from the class hierarchy, its "orphan" instances should migrate to other classes. The following methods can be used to perform migration:
    public static void migrateInstance(C c0, Csuper_sub c1)
    public static Csuper migrateInstance(C c)
Csuper should not be nominated for deletion itself. Csuper_sub is a class which has a common (undeleted) superclass with C. As before, the first form of the migrateInstance method receives an already prepared instance of the replacement class from the evolution system. The values of all fields with the same name and compatible types are already copied from c0 to c1. The second method should call the new operator and copy the necessary data between the instances explicitly. The second form of the migrateInstance method is also reference-unsafe.
5.3 Class C Unchanged - Bulk Modification of Instances
The evolution system was extended to handle this non-evolutionary problem, because it became clear that the existing mechanism of bulk conversion looks quite attractive for it, and only minor modifications to the system are required. So currently, if programmers want to write a short and simple program that modifies all persistent instances of some class, which is itself unchanged, they can use the following methods:
    public static void modifyInstance(C c)
    public static C modifyInstance(C c)
    public static Csuper modifyInstance(C c)
5.4 Semi-automatic Copying of Data Between "Old" and "New" Instances
As mentioned above, conversion methods that get an "old" instance as a single argument should create a replacement instance explicitly and they are fully responsible for copying data from one instance to another. However, even though the classes of these instances are most likely to be different and the class of the replacement instance may change from one invocation of the method to another, there can still be many fields with the same name and compatible types in both instances. To facilitate copying of such fields between instances, the following method is available in the PJama core class org.opj.utilities.PJEvolution:
    public static void copyDefaults(Object oldObj, Object newObj)
This method copies the values of all fields that have the same name and compatible types (as defined in section 3) from oldObj to newObj, irrespective of their actual classes. The method uses Java reflection to find all such pairs. To speed up copying, it caches the results (mappings between fields) for every new pair of classes it finds.
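To make the behaviour of copyDefaults more concrete, here is a minimal sketch of a reflection-based copy with a per-class-pair cache. It is an illustration written for this text, not the PJEvolution source: the class CopyDefaultsSketch, the stand-in classes OldCustomer and NewCustomer, and the cache keyed by class names are all assumptions, and the compatibility check is a simplified version of the rules from section 3.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopyDefaultsSketch {
    // Hypothetical stand-ins for two versions of a class.
    static class OldCustomer { String name; String address; short miles; }
    static class NewCustomer { String name; String street; int miles; }

    // Cache: "oldClass->newClass" maps to the matched {oldField, newField} pairs.
    // Not thread-safe; sufficient for a single-threaded sketch.
    private static final Map<String, List<Field[]>> CACHE = new HashMap<>();

    static int cacheSize() { return CACHE.size(); }

    static void copyDefaults(Object oldObj, Object newObj) {
        Class<?> oc = oldObj.getClass(), nc = newObj.getClass();
        List<Field[]> pairs = CACHE.computeIfAbsent(oc.getName() + "->" + nc.getName(),
                                                    k -> matchFields(oc, nc));
        try {
            // Field.set unwraps the boxed value and applies widening, e.g. Short into an int field.
            for (Field[] p : pairs) p[1].set(newObj, p[0].get(oldObj));
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
    }

    private static List<Field[]> matchFields(Class<?> oldCls, Class<?> newCls) {
        Map<String, Field> newFields = new HashMap<>();
        for (Field f : instanceFields(newCls)) newFields.putIfAbsent(f.getName(), f);
        List<Field[]> pairs = new ArrayList<>();
        for (Field of : instanceFields(oldCls)) {
            Field nf = newFields.get(of.getName());
            if (nf != null && compatible(of.getType(), nf.getType())) {
                of.setAccessible(true);
                nf.setAccessible(true);
                pairs.add(new Field[] {of, nf});
            }
        }
        return pairs;
    }

    /** Instance (non-static) fields of a class and its superclasses, most specific first. */
    private static List<Field> instanceFields(Class<?> c) {
        List<Field> out = new ArrayList<>();
        for (; c != null && c != Object.class; c = c.getSuperclass())
            for (Field f : c.getDeclaredFields())
                if (!Modifier.isStatic(f.getModifiers())) out.add(f);
        return out;
    }

    /** Simplified rule: same type, integer widening, or widening reference conversion. */
    private static boolean compatible(Class<?> from, Class<?> to) {
        if (from == to) return true;
        if (from.isPrimitive() || to.isPrimitive()) {
            if (from == byte.class) return to == short.class || to == int.class || to == long.class;
            if (from == short.class || from == char.class) return to == int.class || to == long.class;
            return from == int.class && to == long.class;
        }
        return to.isAssignableFrom(from);
    }

    public static void main(String[] args) {
        OldCustomer o = new OldCustomer();
        o.name = "Ann";
        o.address = "1 High Street";
        o.miles = 1200;
        NewCustomer n = new NewCustomer();
        copyDefaults(o, n);
        System.out.println(n.name + ", miles=" + n.miles + ", street=" + n.street);
    }
}
```

In this sketch the fields name and miles are copied (the latter widened from short to int), while street, which has no counterpart in the old class, keeps its default value and must be filled in by the conversion method itself.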
5.5 Copying and Conversion of Static Variables
PJama supports the persistence of static variables unless they are marked transient. This sets PJama apart from all other known persistence solutions for the Java platform, which treat static variables as implicitly transient [10]. Therefore the evolution system has to provide support for conversion of static variables. When class C is substituted, the values of all its static fields that have the same names and compatible types in both versions are by default copied from the old version of C to the new one. The value of the field f is not copied, however, if f is static final in either the old or the new version of C. This is because ordinary static fields often hold some information that is obtained during program execution. Such information is preserved between executions of a persistent program and, similarly, across subsequent evolved versions of the class. In contrast, final fields usually serve as constants. They usually do not accumulate any information during runtime and remain the same in all evolving versions of a persistent class. However, if in some new version of a class such a constant has a different value, it is most likely that this change is intentional and should be propagated into the store. For example, the programmer might want to modify some message that the program prints, or change some normally "stable" constants.
The programmer can override the above default rule for non-final statics of some class using a special command line option of the evolution tool or the similar flag of the tool's change specification language. In that case the static variables of this class will be assigned their values in the usual Java way, i.e. by executing their static initialisers. Similarly, copying of final static variables between the versions of a class can be enforced. If simple copying of statics is not enough, a conversion method for statics can be used. This method's signature is:
    public static void convertStatics()
If a method with this signature is present in the conversion class, it is called after statics are copied with the default procedure, but before bulk instance conversion. The code in this method should deal with all affected classes and can refer to their old versions as usual, i.e. using the $$_old_ver_ suffix.
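The default rule for statics described in this section can be approximated with plain reflection, as in the sketch below. This is an illustration only: StaticsCopySketch, CounterOld and CounterNew are hypothetical names, the type check is reduced to exact equality for brevity, and skipping transient statics is our reading of the statement that such variables do not persist.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class StaticsCopySketch {
    // Hypothetical old and new versions of one class.
    static class CounterOld {
        static int issued = 41;
        static transient int scratch = 7;
        static final String BANNER = "v1";
    }
    static class CounterNew {
        static int issued = 0;
        static transient int scratch = 0;
        static final String BANNER = "v2";
    }

    /** A static qualifies for default copying if it is neither final nor transient. */
    private static boolean copyable(Field f) {
        int m = f.getModifiers();
        return Modifier.isStatic(m) && !Modifier.isFinal(m) && !Modifier.isTransient(m);
    }

    /** Copies qualifying statics with equal names and types; returns how many were copied. */
    static int copyStatics(Class<?> oldCls, Class<?> newCls) {
        int copied = 0;
        try {
            for (Field of : oldCls.getDeclaredFields()) {
                if (!copyable(of)) continue;
                Field nf;
                try {
                    nf = newCls.getDeclaredField(of.getName());
                } catch (NoSuchFieldException e) {
                    continue;   // field was removed in the new version
                }
                // Final in either version means "not copied" under the default rule.
                if (!copyable(nf) || nf.getType() != of.getType()) continue;
                of.setAccessible(true);
                nf.setAccessible(true);
                nf.set(null, of.get(null));
                copied++;
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return copied;
    }

    public static void main(String[] args) {
        int copied = copyStatics(CounterOld.class, CounterNew.class);
        System.out.println("copied=" + copied + ", issued=" + CounterNew.issued
                + ", banner=" + CounterNew.BANNER);
    }
}
```

In this sketch the accumulated counter survives the substitution, while the constant takes whatever value the new class version declares.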
5.6 An Example - an Airline Maintaining a "Frequent Flyer" Programme
Having presented all available conversion methods, we will illustrate their use with a simple example. Consider an airline that maintains a database of frequently flying customers. Each person is represented as an instance of class Customer. Every time a customer flies with this airline, miles are credited to their account. When a sufficient number of miles is collected, they can be used to fly somewhere for free. Consider the case where the programmer wants to modify the definition of class Customer to make it work with address data more conveniently:
    class Customer {   // Old
        String name;
        String address;
        int milesCollected;
        ...
    }
    class Customer {   // Revised
        String name;
        String houseNo, street, city, postcode, country;
        int milesCollected;
        ...
    }
The single field address is replaced with several fields: houseNo, street, etc., while other fields remain the same and should retain the same information (see footnote 4). In order to convert data, the programmer can write the following conversion class:
    class CustomerConverter {   // The name can be arbitrary
        public static void convertInstance(Customer$$_old_ver_ oldC, Customer newC) {
            newC.houseNo = extractHouseNo(oldC.address);
            newC.street = extractStreet(oldC.address);
            ...
        }
        ...   // Methods extractXXX not shown
    }
In the above method, it is enough to deal explicitly only with the fields that have been replaced and added. The values of all others, such as name and milesCollected, are copied from oldC to newC automatically.
Now imagine that the airline decides to divide customers into three categories: Gold Tier, Silver Tier and Bronze Tier, depending on the number of collected miles. Class Customer becomes an abstract superclass of three new classes, and each Customer instance should be transformed into an instance of the appropriate specialised class. In order to do such a transformation, we have to use a conversion method that can create and return an instance of more than one class. The solution may look like:
    import org.opj.utilities.PJEvolution;

    class CustomerConverter {
        public static Customer convertInstance(Customer$$_old_ver_ oldC) {
            Customer newC;
            if (oldC.totalMiles > 50000) newC = new GoldTierCust();
            else if (oldC.totalMiles > 20000) newC = new SilverTierCust();
            else newC = new BronzeTierCust();
            PJEvolution.copyDefaults(oldC, newC);   // Explicit copying
            return newC;
        }
    }
5.7 Stability of the "Old" Object Graph during Bulk Conversion
An important feature of the bulk conversion mechanism implemented in PJama is the stability of the source ("old") data. During conversion, newly-created instances are not automatically made reachable from any persistent data structure. "Old instance - new instance" reference pairs are kept in a hidden system table instead, and the source object graph remains unaffected. During conversion, reference fields of the freshly created and initialised "new" instance point to "old" objects, as illustrated in Figure 1. This stability is essential for comprehensible conversion semantics. When conversion is finished, the persistent store is scanned and all references in persistent objects pointing to "old" instances are switched to their "new" counterparts, making "old" instances unreachable and preserving the identity of converted objects. This has the effect of an instant "flip" that transforms the old object graph into the new. The "old" instances can eventually be reclaimed by the disk garbage collector [15].
[Figure 1: Management of references during conversion. Stages shown: initial object graph; conversion; reference patching; new object graph. Legend: "old" instance of the evolved class; "new" instance of the evolved class.]
The fact that the old object graph remains stable during conversion and is visible to conversion methods in its entirety gives the programmer free access to all data in the unconverted format at any moment during conversion. This can be used, say, to collect statistics and for similar purposes. A new version of an "old" converted object, if it already exists, can be obtained using yet another method declared in the PJEvolution class, called getNewObjectVersion(Object). Continuing the above example, imagine that there is a field reference of type Customer in both old and new versions of class Customer. This field points to a person who once referred this customer to the airline. The airline decides that if the customer goes to the Gold Tier, then the one who referred them gets bonus miles:
Footnote 4: Note that all client classes of class Customer are checked to ensure that they are simultaneously transformed to use the new class definition.
    public static Customer convertInstance(Customer$$_old_ver_ oldC) {
        Customer newC;
        if (oldC.totalMiles > 50000) {
            newC = new GoldTierCust();
            // Give bonus to the person who referred this customer (we get old instance here)
            Customer$$_old_ver_ ref = (Customer$$_old_ver_) ((Object) oldC.reference);
            ref.totalMiles += BONUS_MILES;
            // See if ref has already been converted, and if so, update its new version
            Customer refNew = (Customer) PJEvolution.getNewObjectVersion(ref);
            if (refNew != null)   // ref has already been converted
                refNew.totalMiles = ref.totalMiles;
        }
        ...
        return newC;
    }
Note that during conversion oldC.reference continues to point to an instance of Customer$$_old_ver_ irrespective of whether that particular instance has already been converted or not.
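Why the null check on the result of getNewObjectVersion matters can be seen in a small self-contained model of the hidden pair table. Everything here is a hypothetical stand-in (IdentityTableSketch, OldCust, NewCust, and the value of BONUS_MILES), and the table is an ordinary IdentityHashMap rather than PJama's internal structure; the point is only that the bonus reaches the referrer regardless of the order in which the store scan visits the two customers.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityTableSketch {
    // Hypothetical stand-ins for the old and new versions of Customer.
    static class OldCust { int totalMiles; OldCust reference; }
    static class NewCust { int totalMiles; }

    static final int BONUS_MILES = 1000;   // illustrative value only

    // Models the hidden "old instance - new instance" table.
    private static final Map<Object, Object> TABLE = new IdentityHashMap<>();

    /** Returns the substitute of an old object, or null if it has not been converted yet. */
    static Object getNewObjectVersion(Object old) {
        return TABLE.get(old);
    }

    static NewCust convert(OldCust oldC) {
        NewCust newC = new NewCust();
        if (oldC.totalMiles > 50000 && oldC.reference != null) {
            OldCust ref = oldC.reference;     // always the OLD instance
            ref.totalMiles += BONUS_MILES;    // seen later if ref is not converted yet
            NewCust refNew = (NewCust) getNewObjectVersion(ref);
            if (refNew != null) refNew.totalMiles = ref.totalMiles;   // ref already converted
        }
        newC.totalMiles = oldC.totalMiles;    // plays the role of default copying
        TABLE.put(oldC, newC);
        return newC;
    }

    public static void main(String[] args) {
        OldCust referrer = new OldCust();
        referrer.totalMiles = 10000;
        OldCust gold = new OldCust();
        gold.totalMiles = 60000;
        gold.reference = referrer;

        NewCust referrerNew = convert(referrer);   // referrer happens to be visited first
        convert(gold);
        System.out.println("referrer miles after conversion: " + referrerNew.totalMiles);
    }
}
```

If the referrer is visited first, its substitute is updated through the table; if it is visited second, the bonus has already been added to the old instance and is carried over when that instance is converted.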
6 Fully Controlled Conversion
The mechanism of fully controlled conversion can be used if the programmer wants to convert instances of the evolved class in a non-random order, or considerably restructure the data in addition to conversion, or do something else for which ordinary bulk conversion is not appropriate. To run fully controlled custom conversion, the programmer simply puts the method called conversionMain() into the conversion class. If this method is present, the evolution system will call only this method, ignoring any other conversion methods. No automatic linear scan of the store will be performed, therefore it is solely the programmer's responsibility to ensure that all instances of the modified class are converted.
If fully controlled conversion is used, the preservation of identity of instances described in the previous section cannot be done automatically. The programmer should explicitly inform the system about every "old instance - replacement instance" pair. For that, there is a special method in the PJama core class org.opj.utilities.PJEvolution:
    public static native void preserveIdentity(Object oldObj, Object newObj);
We will now illustrate the usage of fully controlled conversion on the same example of an airline. Continuing the story, let us imagine that in addition to sorting customers into three categories, the programmer also decides to put them into three separate collections instead of one array. Furthermore, they might want to get rid of those instances for which the collected miles have expired. The following method can be added to the conversion class in addition to the already existing Customer convertInstance(Customer$$_old_ver_ oldC) method:
public static void conversionMain() {
  Customer$$_old_ver_ allCustomers[] = getPersistentRoot("allCustomers");
  for (int i = 0; i < allCustomers.length; i++)
    if (! milesHaveExpired(allCustomers[i])) {        // This instance is valid,
      Customer c = convertInstance(allCustomers[i]);  // so we convert it
      // Preserve the identity explicitly
      PJEvolution.preserveIdentity(allCustomers[i], c);
      // Put new instance into the appropriate collection
      if (c instanceof GoldTierCust) goldC.add(c);
      else if (c instanceof SilverTierCust) silverC.add(c);
      else bronzeC.add(c);
    }
  makeNewPersistentRoot("bronzeC", bronzeC);
  ...
}
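The control flow of conversionMain() can be tried out without a persistent store using the following minimal sketch. All class names here (OldCust, Cust, GoldCust, SilverCust, ControlledConversion) are hypothetical, the identityPairs map merely stands in for calls to PJEvolution.preserveIdentity, the expired flag replaces milesHaveExpired, and the Silver threshold of 20000 miles is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical old-format customer.
class OldCust {
    final int totalMiles;
    final boolean expired;   // stands in for milesHaveExpired(...)
    OldCust(int totalMiles, boolean expired) {
        this.totalMiles = totalMiles;
        this.expired = expired;
    }
}

// Hypothetical new-format customer hierarchy.
class Cust { int totalMiles; }
class GoldCust extends Cust { }
class SilverCust extends Cust { }

class ControlledConversion {
    final List<Cust> gold = new ArrayList<>();
    final List<Cust> silver = new ArrayList<>();
    final List<Cust> bronze = new ArrayList<>();
    // Stands in for PJEvolution.preserveIdentity(oldObj, newObj).
    final Map<OldCust, Cust> identityPairs = new IdentityHashMap<>();

    static Cust convertInstance(OldCust o) {
        Cust c;
        if (o.totalMiles > 50000) c = new GoldCust();
        else if (o.totalMiles > 20000) c = new SilverCust();   // assumed threshold
        else c = new Cust();
        c.totalMiles = o.totalMiles;
        return c;
    }

    // The programmer, not the system, decides which instances are visited.
    void conversionMain(OldCust[] allCustomers) {
        for (OldCust o : allCustomers) {
            if (o.expired) continue;            // dropped: no replacement is created
            Cust c = convertInstance(o);
            identityPairs.put(o, c);            // record the old/new pair explicitly
            if (c instanceof GoldCust) gold.add(c);
            else if (c instanceof SilverCust) silver.add(c);
            else bronze.add(c);
        }
    }
}

class ControlledConversionDemo {
    public static void main(String[] args) {
        OldCust[] all = {
            new OldCust(60000, false), new OldCust(30000, false),
            new OldCust(10, false), new OldCust(70000, true)
        };
        ControlledConversion cc = new ControlledConversion();
        cc.conversionMain(all);
        System.out.println(cc.gold.size() + " " + cc.silver.size() + " " + cc.bronze.size());
    }
}
```

Note that an expired instance gets no entry in the pair table, which mirrors the point made above: under fully controlled conversion, any instance the programmer does not visit is simply not converted.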
7 Related Work

The OODBMSs that support both schema and data evolution differ significantly in the approach followed for data conversion. However, we believe the following features are essential and can be used to characterise any system: whether it supports user-defined conversion; whether it supports versioning of classes and objects (only during conversion, as in PJama, or indefinitely); whether it supports both eager and lazy (immediate and deferred) conversion.

The commercial system Objectivity [11] does not provide any tool to automatically update the database, besides providing object versions. The designer has to write a program which reads the necessary objects in the old format and assigns the values to the corresponding objects in the new format. The program can be written to transform the database both eagerly and lazily.

ObjectStore [9] makes use of immediate database conversion. Transformation functions, which override the default conversion, can be associated with each modified class. A new instance, which conforms to the new definition of a corresponding class, is created for each old instance. The transformation function reads the value of the old instance and assigns it (after having made some modifications on it) to the new object. All references to the object have to be updated explicitly by an application program in order for them to point to the newly created object.

The commercial systems Itasca [7], Versant [16] and POET [14] support lazy conversion of objects, but do not provide the user with custom conversion functions. All they have is the possibility to override a default conversion by assigning new constant values to modified or added attributes of a class; data contained in the deleted attributes is inevitably lost.

O2 [4, 5] is the system with the most sophisticated evolution support currently known to the authors. Schema modification can be performed either incrementally, using primitives like adding attributes to or deleting attributes from a class, or by redefining the structure of a class as a whole. Conversion of instances can be done eagerly or lazily. Both default conversion and user-defined conversion functions are available. Migration of objects of one class to another class is supported. It is reported that versioning has also been implemented for it [5], making it the only system to support both adaptational and schema versioning approaches to schema evolution. This means that the schema and the underlying database can either be transformed completely, as in all the above systems, or new schema versions can be created and coexist simultaneously with the older ones. In the latter case, the identity of the object whose class has been evolved remains the same, but it is now represented in several forms which correspond to several versions of its class. To support this, the programmer must provide a forward conversion function and a backward conversion function for each version of the evolved class. They are just pieces of code which are responsible for updating a field in one representation whenever the corresponding field in the other representation is updated, and vice versa.
For example, the field temperature in the old version of a class may encode temperature in Fahrenheit and in the new version in Celsius. The corresponding conversion functions containing the formulae to convert the temperature will be invoked automatically every time this field is updated in either representation. A similar technique is used in the experimental system described in Odberg's work [12], where this technique is called sharing. This system, however, is only a partially implemented prototype.

The commercial system GemStone/J [6] is based on the Java language, and many of its features are similar to the existing or planned features of PJama. Its evolution facilities, however, are implemented mostly as API calls rather than commands of a standalone tool. Although GemStone/J supports concurrent access of multiple VMs to the same store, class evolution cannot be reliably performed in a concurrent fashion in their system. The manual recommends termination of all applications before starting transformation, and the shutting down and restarting of the server VM after it is finished. Classes themselves can be evolved arbitrarily, as in PJama. But as far as conversion (transformation, in GemStone's terminology) of instances is concerned, the flexibility is quite limited. No user-defined conversion functions are available. Instead, the programmer has to create a Java structure which maps fields of the old class to fields of the new class. This map, together with old and new class objects, is passed to the special method that creates a specification to transform the old class to the new class. A method is also available that creates a specification in the default way, where values of the fields with the same name and compatible types are preserved between versions. The transformation itself is performed eagerly, by calling yet another method, but many operations before and after it, such as loading new classes, are performed explicitly.
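The "default way" of mapping fields mentioned above, where values of fields with the same name and compatible types are preserved, can be sketched generically with Java reflection. This is an illustration only and does not use or reproduce the GemStone/J API; the classes PersonV1, PersonV2 and DefaultFieldCopy are hypothetical, and "compatible" is simplified here to mean that the new field's type is assignable from the old field's type (so primitive types must match exactly).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical old version of a class.
class PersonV1 {
    String name = "Ann";
    int age = 30;
    int id = 7;              // changes type in the new version
    String nickname = "A";   // deleted in the new version
}

// Hypothetical new version of the same class.
class PersonV2 {
    String name;
    int age;
    String id;               // incompatible with the old int field
    boolean active;          // added in the new version
}

class DefaultFieldCopy {
    // Copies every instance field declared in newObj's class that has a
    // same-named, assignable counterpart declared in oldObj's class.
    // Inherited fields are ignored to keep the sketch short.
    static <T> T copyMatchingFields(Object oldObj, T newObj) {
        for (Field newF : newObj.getClass().getDeclaredFields()) {
            if (Modifier.isStatic(newF.getModifiers()) || newF.isSynthetic()) continue;
            try {
                Field oldF = oldObj.getClass().getDeclaredField(newF.getName());
                if (Modifier.isStatic(oldF.getModifiers())) continue;
                if (!newF.getType().isAssignableFrom(oldF.getType())) continue;
                oldF.setAccessible(true);
                newF.setAccessible(true);
                newF.set(newObj, oldF.get(oldObj));
            } catch (NoSuchFieldException e) {
                // Field was added in the new version: keep its default value.
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return newObj;
    }
}

class DefaultFieldCopyDemo {
    public static void main(String[] args) {
        PersonV2 p = DefaultFieldCopy.copyMatchingFields(new PersonV1(), new PersonV2());
        System.out.println(p.name + " " + p.age + " " + p.id + " " + p.active);
    }
}
```

In this sketch the value of the deleted field nickname is lost and the retyped field id is left at its default, which is exactly the kind of case where a user-defined conversion function is needed.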
8 Conclusions and Future Work

We have described the persistent data conversion facilities that are available in the second major release of the evolution technology for the PJama persistent programming language. They include default and custom (user-defined) conversion, and within the latter, bulk and fully controlled conversion. Where the application programmer (user) is specifying data transformations different from the default, they are coded in standard Java. Language mechanisms to access old and new versions of classes and objects allow the specification of major re-organisation of stored data. To provide the user code with comprehensible semantics, the "old" object graph remains unchanged during conversion (unless the programmer wants to modify it explicitly). The replacement objects are not directly reachable from anywhere in the old object graph. However, reference fields that a "new" object inherits from the "old" one by default continue to refer to the objects of the old graph. When conversion is finished, the persistent store is scanned and all references in persistent objects pointing to "old" instances are switched to their "new" counterparts, making "old" instances unreachable and creating the "new" object graph.

In future we might consider implementing lazy conversion of persistent objects. However, this will require overcoming a number of both conceptual and engineering problems, some of which are discussed in [3]. The most serious problem, similar to the problem of complex lazy transformation described in [4], is the following. Lazy conversion of objects means that after it is initiated, whenever in the code of the running PJama application we refer to an instance of an evolved class and it is not yet converted, an appropriate conversion method is invisibly called. This method converts the instance and returns the new one. However, the conversion method itself might need to access some other instances of evolved classes. This is called complex transformation. At present, if in PJama conversion code we refer to an instance of an evolved class, we always get the old copy of this instance. It would be desirable to have the same behaviour of conversion code in case of lazy conversion. Otherwise we might face, for example, endless recursive cyclic calls of conversion methods, or just the fact that functionally equivalent conversion code must be different depending on whether it is executed eagerly or lazily. However, to ensure the same behaviour of conversion code when it is invoked lazily at an arbitrary time, we would have to somehow preserve all old copies of instances of evolved classes for a possibly indefinite time. This obviously can have serious engineering drawbacks, though this is implemented in some systems, e.g. [4].

At present we are starting to evaluate our evolution technology on several persistent applications that are developed at the University of Glasgow. We believe that this practical research will help to clarify the issues of the completeness and convenience of our set of evolution facilities. There are also some Java-specific evolution problems that we are currently working on, for example evolution of persistent Java core classes.
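The lazy conversion scheme discussed in this section, in which conversion code always sees old copies, can be made concrete with a small sketch. It is purely illustrative and does not describe any existing or planned PJama mechanism; the classes OldNode, NewNode and LazyConverter are hypothetical. The sketch assumes that every old copy is retained for as long as the table exists, which is precisely the engineering cost noted above.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical old-format instance; two instances may refer to each other.
class OldNode {
    final int miles;
    OldNode friend;
    OldNode(int miles) { this.miles = miles; }
}

// Hypothetical new-format instance with a value derived from the friend's old copy.
class NewNode {
    final int miles;
    final int friendMiles;
    NewNode(int miles, int friendMiles) {
        this.miles = miles;
        this.friendMiles = friendMiles;
    }
}

class LazyConverter {
    // Old copies stay reachable as keys, so they are preserved indefinitely.
    private final Map<OldNode, NewNode> converted = new IdentityHashMap<>();
    int conversions = 0;

    // Conversion code reads only old copies, so it never triggers
    // the conversion of another instance, even in a cyclic graph.
    private NewNode convert(OldNode old) {
        conversions++;
        int friendMiles = (old.friend == null) ? 0 : old.friend.miles;
        return new NewNode(old.miles, friendMiles);
    }

    // Called whenever the application touches an instance of the evolved class.
    NewNode access(OldNode old) {
        NewNode n = converted.get(old);
        if (n == null) {                 // first access: convert on demand
            n = convert(old);
            converted.put(old, n);
        }
        return n;
    }
}

class LazyConverterDemo {
    public static void main(String[] args) {
        OldNode a = new OldNode(10);
        OldNode b = new OldNode(20);
        a.friend = b;
        b.friend = a;                    // cyclic old graph
        LazyConverter lc = new LazyConverter();
        NewNode an = lc.access(a);       // converts a only; b stays unconverted
        System.out.println(an.friendMiles + " " + lc.conversions);
    }
}
```

If convert() instead called access() on the friend, the cyclic pair in the demo would recurse without end, which is the "endless recursive cyclic calls" problem described above.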
Acknowledgements

This work is supported by a collaborative research grant from Sun Microsystems Inc. and by the British Engineering and Science Research Council, grant number GR/K87791. The authors are grateful to Huw Evans for extensive reviewing of the draft of this paper.
References

[1] M.P. Atkinson, L. Daynes, M.J. Jordan, and S. Spence. Design Issues for Persistent Java: A Type-Safe, Object-Oriented, Orthogonally Persistent System. In The Proceedings of the 7th International Workshop on Persistent Object Systems (POS 7), May 1996.
[2] M.P. Atkinson and M.J. Jordan. Issues Raised by Three Years of Developing PJama. In C. Beeri and P. Buneman, editors, Database Theory - ICDT'99, Springer Verlag Lecture Notes in CS 1540, pp 1-30.
[3] Misha Dmitriev. The First Experience of Class Evolution Support in PJama. In Advances in Persistent Object Systems. Proceedings of The 8th International Workshop on Persistent Object Systems (POS-8) and The 3rd International Workshop on Persistence and Java (PJAVA-3), Morgan Kaufmann, 1999, pp 279-296.
[4] Fabrizio Ferrandina, Guy Ferran, Joelle Madec, Thorsten Meyer, and Roberto Zicari. Schema and Database Evolution in the O2 Object Database System. In Proc. of the 21th International Conference on Very Large Databases, pages 170-181, Zurich, Switzerland, September 11-15, 1995.
[5] Fabrizio Ferrandina and Sven-Eric Lautemann. An Integrated Approach to Schema Evolution for Object Databases. In Dilip Patel, Yuan Sun, and Shishuma Patel, editors, Proc. of the 3rd Int-l Conf. on Object-Oriented Information Systems (OOIS), pages 280-294, London, UK, December 1996.
[6] GemStone Systems, Inc. GemStone/J Programming Guide. Version 1.1, March 1998.
[7] Itasca Systems, Inc. Itasca Systems Technical Report Number TM-92-001. OODBMS Feature Checklist. Rev. 1.1, December 1993.
[8] James Gosling, Bill Joy, and Guy Steele. The Java Language Specification. Addison-Wesley Co., Inc., 1996.
[9] Object Design Inc. ObjectStore User Guide, Release 3.0. Chapter 10, December 1993.
[10] Mick Jordan and Malcolm Atkinson. Orthogonal Persistence for Java - A Mid-term Report. In Advances in Persistent Object Systems. Proceedings of The 8th International Workshop on Persistent Object Systems (POS-8) and The 3rd International Workshop on Persistence and Java (PJAVA-3), Morgan Kaufmann, 1999, pp 335-352.
[11] Objectivity Inc. Objectivity, User Manual, Version 2.0. March 1993.
[12] Erik Odberg. Multiperspectives: Object Evolution and Schema Modification Management for Object-Oriented Databases. Doctor's Thesis. Department of Computer Systems and Telematics, Norwegian Institute of Technology, February 1995.
[13] PJama Team. PJama Tutorial. http://www.sunlabs.com/research/forest/opj.main.html
[14] Poet Software Corp. POET C++/Java SDK Programmer's Guide. 1997.
[15] T. Printezis. Analysing a Simple Disk Garbage Collector. In Proceedings of the Second International Workshop on Persistence and Java, Sun Microsystems Technical Report TR-97-63, 1997.
[16] Versant Object Technology, 4500 Bohannon Drive, Menlo Park, CA 94025. Versant User Manual, 1992.