Experiences in the Implementation of a Process-centered Software Engineering Environment using Object-Oriented Technology*
Sergio Bandinelli, Luciano Baresi, Alfonso Fuggetta, and Luigi Lavazza CEFRIEL - Politecnico di Milano, Via Emanueli 15, I-20126 Milano (Italy)
Software engineering environments (SEEs) place complex and critical requirements on their supporting repositories. Object-oriented Database Management Systems (OODBMSs) are expected to provide suitable features to address these requirements. SPADE is a process-centered SEE being developed at CEFRIEL and Politecnico di Milano. SPADE is built on top of an OODBMS and features process modeling and enactment, management of process data, and facilities to integrate development tools. This paper reports the experience gained by the authors in defining the requirements for SPADE’s process data repository, and in assessing six different OODBMSs against these requirements. The assessment has been carried out through several prototypes and within the context of the ESPRIT project GoodStep.
Introduction
In recent years, many software engineering researchers have identified the software process as the key factor in obtaining higher quality products, improved productivity, and more controllable projects (see for example [24,32]). By software process we mean the set of activities, rules, methodologies, tools, and roles that participate in the development of software within a given organization. There is no single process that can be used by any organization, for any kind of product, any development environment, or any software lifecycle. It is necessary to be able to envisage different processes, depending on the characteristics of the product, the market, and the development organization. For this purpose the software engineering community is devoting increasing effort to designing and developing languages and the related
Received June 1, 1993; revised November 2, 1993; accepted November 2, 1993. *This work has been partially supported by ESPRIT project 6115 GoodStep - General Object-Oriented Database for Software Processes [22,23]. © 1994 John Wiley & Sons, Inc.
THEORY AND PRACTICE OF OBJECT SYSTEMS.
support technology to formally describe, assess, and, wherever possible, automate software processes. In process-centered SEEs (PSEEs), the development of software is guided by a process interpreter according to a given process model. In general, PSEEs include a repository for process data (i.e., the software artifacts and other process-specific information) and a set of integrated tools. Many authors (see for example [5]) have addressed the issue of identifying the requirements for a database system supporting a software engineering environment. More recently, other works have discussed the suitability of OODBMSs [1] as a vehicle to implement software engineering repositories [19,3]. Experiences in the use of object-oriented technology in building software engineering environments (although not process-centered) have also been reported [33].
SPADE is a process-centered environment that is built around the O2 OODBMS. The decision to use an object-oriented database for building the repository of SPADE stems from the consideration (supported by [19,33] among others) that OODBMSs can satisfy several of the requirements posed by software engineering applications. In particular, support for complex objects and the management of concurrency, persistence, and distribution (at least at the client-server level) are considered essential features for building advanced PSEEs. By contrast, other database technologies, like relational databases or structurally object-oriented databases, proved clearly inadequate for this kind of system [15,16]. In a previous paper, the authors described their early experiences in building SPADE using O2 [3]. The evolution of OODBMS technology made the results described in [3] obsolete in a very short time, thus convincing the SPADE developers to change many project decisions in order to exploit the new features of O2.
At the same time, experiments were carried out with other OODBMSs (namely, GemStone, ODE, DEC Object/DB, ObjectStore, and Itasca).
24(4):1 18, 1994
CCC1042-98329/94/020253-18
This paper reports how the advanced features of an OODBMS can be effectively exploited in building a PSEE, while pointing out the requirements for PSEEs that are still unsatisfactorily dealt with by currently available OODBMSs. SPADE is used as the reference PSEE throughout the paper; however, many of the reported results are general enough to be applicable to other PSEEs.
The paper is organized as follows. The next section briefly describes the main features of the SLANG (Spade LANGuage) process modeling language, and gives a conceptual description and classification of the data involved in the definition and execution of SLANG models. The third section describes the requirements for the database posed by the process language and the whole supporting environment. In the fourth section the O2-based implementation of SPADE is described and discussed, while the following section briefly reports on the experiences with other OODBMSs. The sixth section briefly accounts for related work and discusses the generality of the requirements reported in this paper. Finally, some conclusions are drawn and future research directions are presented.

The SLANG Software Process Language
SPADE (Software Process Analysis, Design and Enactment) is a software process environment being developed at CEFRIEL and Politecnico di Milano. It provides mechanisms for the definition, analysis, enactment, and evolution of a software development process. SPADE provides a domain-specific language for modeling software processes called SLANG (Spade LANGuage) [7,8,6]. SLANG is based on high-level Petri nets and is formally defined in terms of a translation scheme from SLANG into ER nets. ER nets [21] are a mathematically defined class of high-level Petri nets that provide the designer with powerful means to describe concurrent and real-time systems.
In ER nets, it is possible to assign values to tokens and relations to transitions, describing the constraints on tokens consumed and produced by transition firings.

[Figure 1: type hierarchy diagram. Node labels: ProcessData; SLANG Types: Place, Activity, Token, Arc, Transition, MetaType, Active Copy, Type, ModelType; User-defined Types: Unit, TestCase, TestResult, TestSummary, UnitInterface, Change SpecDoc, ExecUnit, ExecTest.]
FIG. 1. A SLANG type hierarchy, including predefined SLANG types and user defined ones.
THEORY AND PRACTICE OF OBJECT SYSTEMS—May 1994
SLANG features
This section provides only a brief overview of the language. The complete language definition is available in [8]. A SLANG process model consists of a set of type and activity definitions. They are used to describe process data and process activities, respectively. Type definitions are organized in a generalization hierarchy following an object-oriented style. Attributes and methods can be defined for each type. Figure 1 presents a user-defined SLANG type hierarchy, including some of the types that are necessary to model the activity of changing a software unit. As an example, consider the definition of type Unit:

class Unit inherit ModelType
  type tuple (
    public name: UnitName;
    public AuthorName: PersonName;
    public language: string;
    public source: Text)
end;

Unit represents a software unit. This type has several subtypes which define specialized units, namely executable units and testable units. The attributes of class Unit have the following meaning:
• name is the symbolic name of the unit;
• AuthorName indicates the name of the author of the software unit;
• language indicates the language in which the code contained in the unit is written;
• source is the code contained in the unit.

Activity definitions are structured in a hierarchical manner as well, with one root activity at the top of the hierarchy. Each activity encapsulates a set of logically related process steps and may include invocations of other activities. Activities at the leaves of the hierarchy are defined only in terms of high-level Petri nets and do not include invocations of other activities. An activity state is represented by a Petri net marking, i.e., an association of tokens with places. Tokens are typed objects and may model documents, tools, resources, test data, programs, etc. Places are distributed persistent object repositories; they are typed and may only contain tokens of the declared type. Transitions represent events that may, or may not, occur in a given state. A transition firing represents the occurrence of an event, taking a negligible amount of time. Arcs connect transitions to places and places to transitions. The arcs’ weights indicate the number of tokens that may flow through them at each transition firing. Weights can be statically defined (the default value is 1), or dynamically computed. In the latter case, the arc weight is indicated by a “*”, and it models events consuming a token set whose cardinality is not statically known. In
addition to “normal” arcs, SLANG provides two special kinds of arcs: read-only and overwrite arcs. A read-only arc, represented by a dashed arrow, may connect a place to a transition. The transition can read and use token values from the input place in order to evaluate the guard and the action, but tokens are not removed. An overwrite arc, represented by a double arrow, may connect a transition to a place. When the transition fires the output place is emptied of all its tokens. Then, the token(s) produced by the firing are inserted in the output place. The overall effect is that the produced tokens overwrite the previous contents of the output place. The net topology describes precedence relations among events; it also describes parallelisms and conflict situations. Each transition is associated with a guard and an action. The transition’s guard is a predicate on input tokens and is used to decide if an input token tuple enables the transition. The dynamic behavior of a transition is described by the firing rule. The firing rule states that when a transition fires, tokens satisfying the guard are removed from input places and the transition’s action is executed. As a result of executing the action, a token tuple is inserted in the output places of the fired transition. Following the principle of information hiding, each activity definition has an interface and an implementation part. The activity interface includes a set of interface transitions, called starting events or ending events; and a set of interface places, classified in input places (input to starting events), output places (output from ending events), and shared places, that are shared by different activities and play the role of communication and synchronization variables during activity execution. Figure 2 provides an example for the interface of activity Change unit. 
This activity interface has one starting event (Start change), one ending event (End change), one input place (Change specification), one output place (New unit), and two shared places (Unit repository and Unit dependency). When an activity starting event is fired, an activity instance (active copy) is generated and its execution is started. The execution ends when one of the ending events is fired. In particular, it is possible to instantiate the same activity several times, generating different execution threads for the activity. The enactment of each active copy proceeds concurrently with the enactment of other active copies.
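The firing rule and the two special arc kinds described above can be illustrated with a small Python sketch. This is not SLANG and not part of SPADE: the `Transition` class, its field names, and the Review example are hypothetical, arc weights are fixed at 1, and time, activities, and black transitions are ignored. Enabling tuples are computed as the guarded Cartesian product of the tokens in the input and read-only places.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Any, Callable, Dict, List, Tuple

Token = Any
Marking = Dict[str, List[Token]]  # place name -> tokens currently in that place


@dataclass
class Transition:
    """Hypothetical stand-in for a SLANG transition (all arc weights are 1)."""
    name: str
    inputs: List[str]                          # places reached by normal input arcs
    outputs: List[str]                         # places reached by output arcs
    guard: Callable[..., bool]                 # predicate on one token per input/read-only place
    action: Callable[..., Tuple[Token, ...]]   # returns one token per output place
    read_only: List[str] = field(default_factory=list)  # read-only arcs: tokens are not removed
    overwrite: List[str] = field(default_factory=list)  # overwrite arcs: place emptied first

    def enabling_tuples(self, marking: Marking) -> List[Tuple[Token, ...]]:
        # Join of the token sets of all places feeding the transition,
        # filtered by the guard.
        places = self.inputs + self.read_only
        candidates = product(*(marking[p] for p in places))
        return [t for t in candidates if self.guard(*t)]

    def fire(self, marking: Marking) -> bool:
        tuples = self.enabling_tuples(marking)
        if not tuples:
            return False                       # not enabled in this marking
        chosen = tuples[0]
        # zip stops after the normal inputs, so read-only tokens stay in place.
        for place, token in zip(self.inputs, chosen):
            marking[place].remove(token)
        produced = self.action(*chosen)
        for place, token in zip(self.outputs, produced):
            if place in self.overwrite:
                marking[place].clear()         # previous contents are overwritten
            marking[place].append(token)
        return True


# Hypothetical net: review an approved draft against a policy that is only read.
marking: Marking = {
    "Draft": [{"id": 1, "ok": False}, {"id": 2, "ok": True}],
    "Policy": ["strict"],
    "Report": ["old report"],
}
review = Transition(
    name="Review",
    inputs=["Draft"],
    outputs=["Report"],
    guard=lambda draft, policy: draft["ok"] and policy == "strict",
    action=lambda draft, policy: (f"reviewed {draft['id']}",),
    read_only=["Policy"],
    overwrite=["Report"],
)
review.fire(marking)
```

In this sketch the first enabling tuple is always chosen; a real interpreter would need a selection policy among enabled transitions and tuples.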
Software development is a multi-person process that involves the use of a large variety of software tools. A SLANG process model may also describe the interaction with tools and users. This is done with a special kind of transition, called a black transition, and a special kind of place, called a user place. A black transition represents the invocation of an external (non-SLANG) executable routine (e.g., a Unix executable file). The external routine is executed asynchronously, that is, once the external tool has been started, the activity execution is resumed. User places are used to capture events that occur in the external environment and are relevant to process enactment. These events can be generated by users through tools provided by the environment.

Figure 3 shows the complete SLANG definition of activity Change unit. The activity interface is outside the dashed box and its implementation is contained inside the box. Change unit models the process of changing a software unit when new specifications are provided. The design is changed according to the new specification, and a new design unit together with a new unit interface is produced. In general, changes in the implementation of a unit U do not affect other units. Changes in the interface of U, instead, need to be agreed upon by the designer responsible for U and those responsible for all units which “use” U. Thus, once the new design of U has been completed, the new interface is proposed to all U’s users, who are asked to either accept the new interface or reject it. If any of U’s users rejects the proposed interface, the unit’s designer makes a new proposal, which is in turn reconsidered by all users. This process is iterated until all U’s users agree on the interface; then, U’s coding may start. Coding implies editing the unit and compiling it. Preparation of data for unit test proceeds in parallel with coding. The unit is then run on the test cases one after the other. The errors that may be discovered in each execution are accumulated in an error report. The test continues until all test cases have been run. The unit is accepted only if no error is found. Otherwise the unit is rejected and a report is produced with a list of all the errors found.

[Figure 2: activity Change unit with starting event Start change, ending event End change, input place Change specification, output place New unit, and shared places Unit repository and Unit dependency.]
FIG. 2. Interface of activity Change unit.

[Figure 3: Petri net diagram. Labels: Change specification, Start change, Change spec. doc., Design change, New design, Interface not new, Propose interface change, Unit repository, Approved interface, Generate test cases, Interface for test, Unit coding, Coded unit, Compile, Compiled unit, Compile Errors, Compile OK, Unit with errors, Unit dependency, Unit interface, Exec unit, Test cases, Run tests, Tested unit, End change, New unit.]
FIG. 3. The SLANG definition of activity Change unit.

The meaning of the invoked subactivities is the following:
• Design change represents the production of a new design for the unit, according to the given specifications. It produces a new unit design and a unit interface definition.
• Propose interface change represents the activity of proposing the unit interface to unit users for approval. It is invoked only if the unit interface has been modified. Otherwise, the unit interface is considered to be automatically approved by transition Interface not new. Note that a copy of the interface definition is also produced as input to the test case generation.
• Unit coding, Generate test cases, and Run tests represent the usual activities of converting design into code, generating test cases, and testing.

As a last example, we present the guard and action associated with black transition Compile. Assume that place Coded unit has type Unit and place Compiled unit has type CompiledUnit, defined as follows:

class CompiledUnit inherit Unit
  type tuple (
    public Errors: Text;
    public Result: ObjectCode)
end;

We assume that the makefile used produces either the executable file or a file with the errors found. We also assume that class CompiledUnit has two methods, one for initializing the result (the token that will be placed in Compiled unit) with compilation errors, and another for transferring the content of the object file into the result. We can now specify the guard and action associated with transition Compile:

TRANSITION {
  name Compile
  guard true
  action
    extAction = "make " + Coded_unit->name;
    if (exists(Coded_unit->name + ".err"))
      Compiled_unit->initializeErr(Coded_unit->name + ".err");
    else
      Compiled_unit->initializeOk(Coded_unit->name + ".out");
}

Note that the guard is always satisfied: in fact, any time source code is available, it can be compiled.

Mechanisms supporting process evolution in SLANG
In process-centered environments, process definitions (or models) play the role of the code in traditional computing systems. The enactment of the process definition causes the automatic execution of computer-based actions and guides the behavior of people involved in the process. In SLANG, it is possible to model and enact not only the software production process, but also the meta-process. The meta-process describes those actions that concern the management of the process itself, e.g., creation or modification of activities, types, etc. In SLANG these items are stored as tokens, and can be manipulated by the process as any other piece of data. SLANG is therefore a reflective language.

The SPADE repository
[Figure 4: the repository at two levels. Schema level: SLANG basic constructs (fixed part) and types used within the process model (modifiable part). Instance level: process model and process data. Legend: non modifiable; modified to change the process model; changed during process enactment (state of the enacted process model).]
FIG. 4. Structure of the SPADE repository.
All the information describing a SLANG model and the related process state are stored in a repository that is shared by all the SLANG interpreters. The SPADE repository is organized according to the following structure (see Figure 4). The schema of the database is partitioned in two parts: a fixed part that contains the types of SLANG basic constructs and a modifiable part containing the definition of the types used within a specific process model. This modifiable part can change to cope with modifications in the modeled process. For example, we may add a type to describe a new class of documents or software items. At the instance level, we have two different sets of objects as well. The instances of the types in the fixed part of the schema correspond to a specific process model definition (i.e., a collection of arcs, transitions, and places constituting a SLANG specification). The
instances of the modifiable part of the schema correspond to process data (e.g., modules, test results, test cases, etc.) produced or modified during process enactment. In conclusion, it is not possible to change the definition of the SLANG language (fixed part of the schema). Changes to the modifiable part of the schema and to the instances of fixed types correspond to changes in the process model. Changes to the instances of modifiable types correspond to changes in the state of the enacted process model.

Database Requirements
The following subsections describe the main requirements that SLANG places on the supporting repository.

Support for reflectivity
SLANG is a reflective language. This means that the definition of a model may be changed by the execution of the model itself. Several kinds of changes to the overall model can be performed during process execution; in particular, they may concern:
• the types in the modifiable part;
• the structure of the model (i.e., the topology of the net);
• the text of a guard or action associated with a transition.

In conclusion, the process model changes while it is executed. Since the process is a long-lived entity, performed by several cooperating agents, it is not feasible to stop the process and regenerate the SLANG interpreter whenever the process manager modifies parts of the model being enacted. This means that either the model is interpreted, or the underlying OODBMS must provide a facility for recompiling the changed parts without stopping the SLANG interpreter.

Object handling
As already mentioned, the user-definable part of the schema may be changed, even during process enactment, i.e., it must be possible to apply changes to the schema concurrently with the execution of models that are instances of the same schema. Schema updates must be safe: each change must leave the database in a consistent state. In particular, when a type definition is changed, the existing instances of the changed type have to be handled properly: some mechanisms are needed to support migration of the existing objects from the old definition to the new one. Although it is possible to “manually” convert existing objects, this operation is rather cumbersome, and in general this solution is infeasible for databases of non-trivial size. An automatic mechanism for type migration should support different operating strategies:
• lazy migration: the object is “converted” to the new type only when the object is accessed;
• eager migration: the object is “converted” to the new type as soon as the new type is defined;
• intermediate strategies: any user-defined migration procedure halfway between fully eager and fully lazy migration.

Consider, for instance, a model that handles documents conforming to a given format, i.e., according to the definition of class “document”. When the format changes (i.e., when the definition of class “document” is updated), all the existing documents must be converted. Eager migration can result in a long and heavy process if the updated class has many instances and the required changes imply a complex computation. Since the object migration is carried out in parallel with the “regular” enactment of the process, it is possible that the former activity slows down the latter. Alternatively, the documents could be modified only when needed (lazy migration): every time a document is opened, it is automatically converted to the new format. The latter solution is less demanding in terms of performance, but it leaves the process in a partially inconsistent state for a possibly long time, during which the process is forced to “remember” how objects have to be converted.

A different approach to type migration consists of providing support for type versioning: both types survive; new objects are created according to the new type definition, while old objects are accessed according to the corresponding type version. Following the previous example, each document would keep its original definition, without needing to be modified. All the formats would remain available within the model. Type versioning is actually necessary to deal with updates to activity definitions. If the process owner decides to redefine an activity (e.g., the change control), it is generally unreasonable to convert the currently running activities to the new definition. This conversion is difficult to achieve because each activity could be in a different state, which does not necessarily correspond to a state consistent with the new definition.
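The difference between eager migration, lazy migration, and type versioning can be made concrete with a toy object base. The sketch below is only an illustration in Python under simplifying assumptions: `DocumentStore`, its methods, and the dictionary-based object representation are hypothetical and do not correspond to any OODBMS interface discussed in this paper.

```python
from typing import Callable, Dict, List, Tuple

Converter = Callable[[dict], dict]  # maps an object's state to the next type version


class DocumentStore:
    """Toy object base: each object records the type version it conforms to."""

    def __init__(self) -> None:
        self.current_version = 1
        # The store must "remember" how to convert version n into version n + 1.
        self.converters: Dict[int, Converter] = {}
        self.objects: List[dict] = []

    def create(self, **state) -> dict:
        obj = {"version": self.current_version, "state": dict(state)}
        self.objects.append(obj)
        return obj

    def _upgrade(self, obj: dict) -> None:
        # Apply the remembered converters until the object is up to date.
        while obj["version"] < self.current_version:
            obj["state"] = self.converters[obj["version"]](obj["state"])
            obj["version"] += 1

    def update_type(self, converter: Converter, eager: bool) -> None:
        self.converters[self.current_version] = converter
        self.current_version += 1
        if eager:
            # Eager migration: convert every instance as soon as the type changes.
            for obj in self.objects:
                self._upgrade(obj)

    def access(self, obj: dict) -> dict:
        # Lazy migration: the object is converted only when it is accessed.
        self._upgrade(obj)
        return obj["state"]

    def access_versioned(self, obj: dict) -> Tuple[int, dict]:
        # Type versioning: the object keeps its original definition and is
        # read together with the version it was created under.
        return obj["version"], obj["state"]


store = DocumentStore()
spec = store.create(title="spec")
store.update_type(lambda s: {**s, "format": "v2"}, eager=False)
untouched = store.access_versioned(spec)   # still at version 1
converted = store.access(spec)             # converted on first access
```

With `eager=True` the cost of conversion is paid at the moment of the type change; with `eager=False` it is spread over later accesses, at the price of keeping the converters around.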
Converting the running activities would therefore require stopping them and defining a conversion path from their state into a legal state of the new definition. Moreover, in many cases there are practical considerations (smooth introduction of new procedures, delays due to training, etc.) that suggest maintaining both the old and new activity definitions for a while.

Support for SLANG characteristics
The interpretation of SLANG-specific constructs must be effectively supported. For example, the evaluation of guards involves the computation of a tuple of tokens enabling the transition firing. This operation, which consists of a join of the sets of tokens in the preset of each transition, is very frequent, and therefore it must be performed efficiently.

Distribution
SPADE supports multi-user project development by allowing the distribution of running activities over a number of workstations. The model of the process that supervises and governs the activities is shared among the workstations. There are, basically, two ways to implement this scenario. The first solution is based on distributed accesses from the users’ workstations to a DB server. The whole model resides on the server, which is therefore a fundamental component in determining the performance of the whole system. This approach appears to be interesting for small to medium projects, and for large projects that can be split into quite independent subprojects. The second solution, suitable for larger projects, is to distribute the process model execution transparently over several workstations. This implies distributing the process data, consisting mainly of the artifacts that are produced locally by each user. The tools used should not need to know anything about the physical distribution of the data they use: it should be the responsibility of the database system to manage physical distribution. This approach reduces the client-server traffic on the local area network, but it introduces the need for consistency control mechanisms to preserve the consistency of the distributed databases [34].

Support for tool integration
Very often tools share common data structures. For example, a syntax-directed editor and a compiler can share the same abstract syntax representation of the source code. A discussion of data integration in software engineering environments can be found in [38].
SPADE does not impose any constraint on the granularity at which data are actually shared, i.e., a granule could be either an entire abstract syntax tree or one of its nodes. In fact, SPADE always tries to provide mechanisms that are as general as possible, i.e., that do not impose any particular policy, strategy, or methodology. Often this results in the need for the process modeler to explicitly specify the characteristics of the process. In particular, the granularity of a given piece of data is determined by the process modeler when he/she defines the type of the places that will contain that data. The decomposition into finer granules can be explicitly programmed: e.g., if a tool T requires fine-grained data, aggregated data will be decomposed by a specific
transition and the intermediate results will be stored in suitable places belonging to the preset of the black transition that invokes tool T. Moreover, in the SPADE environment tools and process interpreters are database applications¹ sharing a common schema (i.e., a common set of data type definitions) and a common base (i.e., a common set of objects representing software artifacts). In this scenario the underlying repository must provide applications with facilities for data communication. For example, when the syntax-directed editor saves a modified program, a reference to this program must be sent to the process engine, which will forward it to the compiler.

Other issues
There are some features that are frequently mentioned as requirements for a software engineering database, but play a minor part in our environment. The reason, already mentioned in the preceding section, is that it was decided not to hard-code any specific policy within SPADE, but to provide only the basic mechanisms to let users program these services according to their needs. In particular, concurrency management, version handling, and access control are examples of programmable operations. This means that SPADE can do without primitive mechanisms provided by the DBMS for these purposes. Nevertheless, the existence of readily available mechanisms can be exploited by the process modeler, thus making the development of models easier. Actually, we believe that PSEEs (at least commercial ones) should supply a sort of library, containing different customizable models for concurrency management, version handling, access control, etc. In this way, the characteristics of the underlying DBMS would be hidden, sparing the user the burden of mapping the desired strategy, policy, or procedure onto the mechanisms supplied by the DBMS. In this section we briefly describe the aforementioned activities and explain why they have not been considered in the first implementation of SPADE.
Concurrency management
In SPADE, several process engines can be active in parallel. They communicate via shared places, i.e., different process engines can access tokens in the same places. It is thus necessary to properly synchronize accesses to shared places: this can be achieved by means of basic mechanisms (such as two-phase locking) that guarantee transaction serializability. Locks should be placed on objects, since locking at a coarser granularity (e.g., locking whole pages) could prevent potentially concurrent activities from actually proceeding in parallel. The OODBMS does not necessarily need to provide sophisticated concurrency control mechanisms, such as long or nested transactions. In fact, the process modeler can build SLANG nets that “implement” the desired behavior, by using just the constructs provided by the process language.
Access control
Each user has an identity and a role: these attributes can be modeled explicitly in SLANG, and guards may take them into account, so that only users having a specific profile are enabled to perform given actions. In other words, access control can be explicitly programmed by the process modeler. The database management system has simply to provide basic mechanisms that prevent unauthorized access to the object base, e.g., from outside the process-centered environment.
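The idea that access control can be programmed through guards can be sketched in a few lines of Python. The token classes, role names, and helper functions below are hypothetical and only illustrate the principle: a guard is a predicate over the tokens feeding a transition, so a user token carrying an identity and a role can decide whether the transition is enabled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class UserToken:
    """Hypothetical token modeling a user's identity and role."""
    name: str
    role: str


@dataclass(frozen=True)
class UnitToken:
    """Hypothetical token modeling a software unit and its author."""
    name: str
    author: str


Guard = Callable[[UserToken, UnitToken], bool]


def require_role(*roles: str) -> Guard:
    # Guard satisfied only by users holding one of the given roles.
    allowed = frozenset(roles)

    def guard(user: UserToken, unit: UnitToken) -> bool:
        return user.role in allowed

    return guard


def require_author(user: UserToken, unit: UnitToken) -> bool:
    # Guard satisfied only by the author of the unit.
    return user.name == unit.author


def all_of(*guards: Guard) -> Guard:
    # Conjunction of guards: every policy must hold for the tuple to be enabling.
    def guard(user: UserToken, unit: UnitToken) -> bool:
        return all(g(user, unit) for g in guards)

    return guard


# Policy chosen by the process modeler, not by the database:
# only the designer who authored a unit may change it.
may_change_unit = all_of(require_role("designer"), require_author)

parser = UnitToken(name="parser", author="alice")
allowed = may_change_unit(UserToken("alice", "designer"), parser)
denied = may_change_unit(UserToken("bob", "designer"), parser)
```

Because the policy lives in the model, changing it amounts to changing a guard, which fits the reflective style described earlier.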
Versioning
Versioning of artifacts can also be “programmed” in SLANG, so specific support by the OODBMS is not strictly required. Nonetheless, support from the OODBMS could help the process programmer in writing the definition of the versionable artifacts.

Using O2 as a Process Repository
In this section we briefly describe O2 [17], a “state of the art” OODBMS, and we assess its suitability as the repository supporting SPADE². The choice of O2 resulted from an evaluation of available OODBMSs with respect to a set of initial requirements (that were then refined into those described above). Experiments in the development of the SPADE repository using other databases have also been carried out at CEFRIEL, in order to assess the variety of features provided by commercially available OODBMSs. The results of these experiments are reported in the following section.

The O2 OODBMS
O2 is an OODBMS in the sense of [1]. It is compliant with many of the Object Data Management Group (ODMG) standards [9] and can be considered a state of the art OODBMS. The first prototype of O2 was the result of a research project started in 1986 by the Altaïr consortium; O2 is now a commercial product of O2 Technology. O2 is provided with a complete development environment and a set of user interface tools. Information is organized in objects (instances of classes) and values (instances of types). A value has only a type, while an object has an identity, a value, and a behavior, determined by the methods defined in its class. Methods are coded in O2C, a fourth generation language, born as a superset of ANSI C and extended to support the object-oriented data model of O2. O2 allows users to write programs, to manipulate their databases, and to generate appropriate user interfaces. Applications written in other languages, like C and C++, have access to the features offered by O2 by means of an import/export mechanism. Users can also easily design graphic interfaces using O2Look, built on top of the X Window System and Motif, which provides a set of high-level functions to display and edit complex objects. The O2 system offers a declarative query language, called O2SQL, whose syntax is styled on SQL, the standard query language for relational databases. O2SQL allows the user to query an O2 database either in an interactive way or under program control.

Run-Time schema modification
O2 offers the possibility to invoke the database schema manipulation capabilities at run-time. Namely, O2 makes the predefined object Metaschema, which describes the schema of the base in use, visible and updatable. This feature facilitates the run-time creation and modification of classes, including updating and creating methods. Furthermore, it makes it possible to intercept schema inconsistencies, letting the programmer avoid many kinds of run-time errors. Another interesting use of the Metaschema is the ability to build and to use objects of a class that is not known at compile time, as is required by the definition of SLANG actions.

The dynamic interpretation of SLANG
The basic part of the SLANG interpreter (i.e., the loop comprising guard evaluation, transition selection, and firing) has a rather traditional structure, and is therefore written in O2 C. In order to support reflectivity, in the first prototype of SPADE class Transition was provided with two methods, evaluateGuard and executeAction, to evaluate guards and execute actions, respectively. These operations were accomplished by representing guards and actions as strings containing O2 SQL code. This code was interpreted by calling the function o2query, i.e., the embedded query language interpreter. The O2 SQL based SLANG interpreter is described in [3]. Although O2 SQL is quite powerful, experiments with this language revealed the following problems:
• O2 SQL has some problems in understanding the dynamic type of operands, and consequently sometimes it does not handle them correctly (i.e., according to the usual rules of object-oriented programming).
THEORY AND PRACTICE OF OBJECT SYSTEMS—May 1994
This situation seems to be related to a trade-off between efficiency and complete conformity to the rules of polymorphism and dynamic binding. In particular, O2 SQL does not handle the dynamic types of the components of an operand defined as a set of objects. Any collection is considered a value, so that its elements are statically bound to their declaration type. For instance, a set declared as a set of polygons (super-class) that actually contains a triangle, a rectangle and a square (sub-classes) can be queried only about the properties belonging to polygons, but not about triangles’ or squares’ specific properties. In order to overcome this problem we had to resort to workarounds, like passing the parameters of the queries one by one, instead of as a polymorphic set of objects. This is in contrast with the required generality of the code, since the number of places belonging to enabling tuples is artificially constrained.
• When evaluating guards, we are generally interested in finding just one enabling tuple for the considered transition. In the first prototype, there is no way of querying the database for just one element: it is necessary to find all the elements that satisfy the query, and then extract one of them. Since O2 SQL does not optimize this kind of query, computing the enabling tuple is very inefficient. This situation was a real problem in SPADE, because the computation of the enabling tuples is the most frequent operation performed by the SLANG interpreter.
• Interpretation of guards in the presence of dynamically weighted arcs poses serious performance problems. Because of the semantics of this kind of arc, the only general solution implies the creation of power sets. When implemented in O2 SQL, this operation seems to cause difficulties to O2, even for relatively small sets. Moreover, O2 SQL syntax requires a maximum cardinality to be defined for the sets that can be built.
When the O2 Metaschema facility was made available, it was decided to abandon the interpretation scheme based on O2 SQL, and exploit the features of O2 Metaschema to overcome the problems described so far. The fact that guards and actions are process-dependent, i.e., they deal with user-defined objects, suggested implementing each of them by means of O2 functions generated by the modeler through the activity editor and then dynamically compiled using O2 Metaschema. Every time the activity editor commits the change of an activity definition, this is “compiled”, and the new functions associated with guards and actions are produced. The adopted solution scheme allows fast execution of actions and efficient evaluation of guards. Moreover
it is general, i.e., it is able to translate any legal guard or action. The current implementation of SPADE is actually based on the Metaschema. The description of the implementation based on O2 SQL was retained to describe the problems involved, and to justify the exploitation of the Metaschema.

Type Migration
O2 does not provide direct support for type migration. Any change made to the type structure of a class (like adding or removing an attribute) will yield inconsistent objects of that class in each base governed by the modified schema. This inconsistency will lead to wrong accesses during the execution of a program body and to the abortion of transactions. In order to avoid this problem, the user must dump all objects of this class before changing its structure. Dumped information can be later retrieved and associated with instances of the updated type. O2 enables the run-time modification of a type, but this operation is safe only if there are no instances of that type. However, in most cases it is possible to obtain a sort of type migration (where the object’s identity is not preserved) by exploiting the mechanisms provided by the O2 Metaschema. For example, the manual describes a generic function that adds an attribute to all objects of a given class C. Such an effect is obtained by a) creating a subclass of C that includes the specified attribute, b) creating an object of the new class, and c) copying the value of each attribute of the old object into the corresponding attribute of the newly created object (the extra attribute is given a default value). Point c) is implemented as follows:
1. Using the predefined methods of the Metaschema, the definition of the original class is examined, and the name of each of its attributes is retrieved.
2. The text of a function that assigns each attribute of the old object to the corresponding attribute of the new object is composed.
3. Such a function is declared and compiled by means of the predefined method command of the predefined object Schema.
4. The function is executed by means of an o2query call.
Note however that this way of implementing type migration changes the object identifier (OID) of the migrating objects: this is not a problem in SPADE, while in other environments this effect could be a severe problem. In general, if the migrating object is referenced by other objects, changing the OID causes the references to get lost. In SPADE this is not the case, because the objects are identified through their location, i.e., the place they are currently stored in, and not through their OID. Moreover, an object in a place P1 is not allowed to reference an object in place P2, because the firing of
a transition having P2 in the preset would change the content of P1 as a side-effect. This would violate the semantics of Petri nets, and is therefore forbidden. In SPADE the user deals with logical classes. A single logical class name can be related to several system classes, implementing different versions of the logical class. For example, suppose that a user class C corresponds to the system class C1. Active copies A1 and A2 (instances of activity A) handle tokens of class C, i.e., objects of class C1. The user changes the definition of class C: this results in a new system class C2 that is now associated to C. Active copies A1 and A2 continue to use the old definition C1. If a new instance of A, say A3, is created, it will use the definition C2. It is also possible to suspend the execution of A2 and force the objects of type C1 to migrate to type C2 (or to another special definition C3). The proposed solution is a general implementation of type versioning, applicable whenever the underlying database does not supply it, but allows direct management of meta-data.
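The copy-based migration steps a) to c) described above can be illustrated with a minimal Python analogy. This is a sketch under stated assumptions, not O2 C and not the O2 Metaschema interface: the function name `migrate_with_new_attribute`, the use of `vars()` to inspect attributes, and the use of `exec` to compile the generated text are all illustrative stand-ins. As in the text, the result is a new object, so its identity differs from that of the original.

```python
def migrate_with_new_attribute(old_obj, attr_name, default):
    """Copy old_obj into an instance of a generated subclass that adds attr_name."""
    old_cls = type(old_obj)
    # a) create a subclass of the original class that includes the new attribute
    new_cls = type(old_cls.__name__ + "_v2", (old_cls,), {attr_name: default})
    # b) create an object of the new class (without running __init__)
    new_obj = new_cls.__new__(new_cls)
    # c.1) examine the old object and retrieve the name of each attribute
    names = list(vars(old_obj))
    # c.2) compose the text of a function that copies each attribute
    lines = ["def copy_attrs(old, new):"]
    lines += [f"    new.{name} = old.{name}" for name in names]
    lines.append("    return new")
    source = "\n".join(lines)
    # c.3) declare and compile the generated function
    namespace = {}
    exec(source, namespace)
    # c.4) execute it; the extra attribute keeps its default value
    return namespace["copy_attrs"](old_obj, new_obj)


class Polygon:  # hypothetical user class
    def __init__(self, sides):
        self.sides = sides


old = Polygon(3)
new = migrate_with_new_attribute(old, "color", "none")
```

Here `new` is a distinct object from `old`, which mirrors the OID change discussed in the text: any reference held to `old` would not follow the migration.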
O2 enhancements in GoodStep The features presented in this section are being added to the O2 OODBMS in the framework of the GoodStep Project. Rather than developing a new system, GoodStep enhances and improves O2 in order to obtain an OODBMS dedicated to support software development environments (SDEs). VIEWS
The tool O2 Views [36] is implemented in O2 C on top of the O2 Engine, the kernel of the O2 system. A view is defined as a virtual schema, i.e., a set of virtual definitions of O2 classes derived from a real schema, that becomes the root schema of the view. A virtual schema defines a virtual base, the image of the real root base through the view, i.e., it presents the data actually stored in the base according to the new virtual definitions. In fact a view can be seen as a filtering schema through which the information of a real base is seen. Basically, a view consists of virtual and imaginary classes which define the appearance of the objects in the view through virtual and hidden attributes that respectively augment and restrict the interface of real objects. ACTIVE RULES
Rules [12] comprise three parts: an Event (E), a Condition (C) and an Action (A): when the event E occurs, if the condition C holds, then the action A is executed. O2 Rules are defined as schema elements and currently they can be associated with primitive events only. A primitive event occurs when objects are created, values are modified, entities become persistent,
messages are sent to objects, transactions start, commit, validate or abort, and also when programs start or finish. Furthermore, they respect encapsulation (only public entities can generate valid events) and can be activated or deactivated by suitable messages. Due to the flat transactional model of the O2 system, triggered rules are not seen as sub-transactions but become parts of the triggering transactions. OBJECT MIGRATION
The proposed solution [20] is based on migration functions that, while preserving objects’ OIDs, convert instances created using a class definition into objects belonging to a new class. Preserving the identity of an object avoids changing all the references to the object itself. Object updates can be performed either by user-defined migration functions, when supplied, or by system default transformations. These functions must be constrained to be able to ensure correctness even in the most critical situations, e.g., when an update uses information stored in different objects. The mechanism can be used to implement either eager or lazy migration policies.
OBJECT VERSIONING
The Version Manager (VM) does not handle single objects, but version units, i.e., sets of objects. Version units are user-defined and may evolve dynamically during their lifecycle, i.e., objects can be added to or removed from a version unit. Handling versioned objects is very simple: an application just has to indicate explicitly which version (e.g., X) of a version unit U it intends to use: then, every reference to an object O belonging to version unit U will implicitly make reference to version X of O. The end-user is given access to the VM through the class Version and its public methods, that define the operations allowed on each version unit. This class belongs to an O2 system schema and it has to be imported by every schema that manages versions. Of course, the class Version can be locally refined in order to customize object versioning according to specific needs. In fact the VM provides only the basic features to define personalized mechanisms for managing versions, without enforcing any policy.

Distribution
O2 is based on a client-server architecture: this means that clients (that are responsible for the execution of methods and applications) may run on different machines, while the server (that is responsible for data management) runs on a centralized machine. Although not an ideal situation, this architecture is sufficient to support a distributed SEE with a limited number of people and activities involved in the process. In fact
the amount of traffic required to move objects to and from the server limits the distribution of activities on different clients. Logical distribution of data (i.e., separate bases having a common schema) is supported by O2. A future release of O2 will support actually distributed bases.

Support for tool integration
Process engines and tools are concurrent processes that use a (common) O2 database, and are thus O2 clients. Data are actually stored in the common database, and they are therefore accessible by every client: the real problem in achieving data integration is to give a client the visibility of some data which it is not aware of. Passing a reference from one client to another is not possible, because this reference (i.e., the OID) is not known by the application (it is implicit information, internal to the OODBMS) and, moreover, the address spaces of two clients are different. Tool communication in SPADE is based on O2 Sockets, a refinement of the mechanism illustrated in [14]. The basic idea is very simple: two clients can communicate via a shared object³. Data flow from one client to another through two object channels. In order to establish a communication, a client has to access a named object (port) asking for a connection. A pair of channels is created, returned to the asking client, and added to the port’s list of pending requests. In order to accept a connection, a client simply has to extract the first entry from the port’s list. Note that once the connection is established the port is no longer involved in the communication. This means that the lock on the port is released, and the transactions required by the communication need not be serialized with other communication-related transactions. The plain version of O2 Sockets does not support any method of addressing a particular client, but this problem has been easily solved by an ad-hoc protocol and a private port (e.g., a well known port).
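The port and channel protocol just described can be sketched in a few lines of Python. This is an in-memory illustration only: the class names `Port`, `Channel` and `Connection` are hypothetical, plain Python objects stand in for shared persistent O2 objects, and no locking or transactions are modeled.

```python
from collections import deque


class Channel:
    """One-directional object channel (a shared FIFO in this sketch)."""

    def __init__(self):
        self._items = deque()

    def send(self, obj):
        self._items.append(obj)

    def receive(self):
        # Returns None when nothing has been sent yet.
        return self._items.popleft() if self._items else None


class Connection:
    """The pair of channels as seen by one endpoint."""

    def __init__(self, outgoing, incoming):
        self.outgoing = outgoing
        self.incoming = incoming

    def send(self, obj):
        self.outgoing.send(obj)

    def receive(self):
        return self.incoming.receive()


class Port:
    """Named shared object through which connections are requested."""

    def __init__(self, name):
        self.name = name
        self.pending = deque()

    def connect(self):
        # Create a pair of channels, queue the request, return the caller's view.
        a_to_b, b_to_a = Channel(), Channel()
        self.pending.append(Connection(outgoing=b_to_a, incoming=a_to_b))
        return Connection(outgoing=a_to_b, incoming=b_to_a)

    def accept(self):
        # Extract the first pending request, if any.
        return self.pending.popleft() if self.pending else None


port = Port("well_known_port")
requester = port.connect()
acceptor = port.accept()
requester.send("token-42")
received = acceptor.receive()
acceptor.send("ack")
reply = requester.receive()
```

After `accept()` returns, neither endpoint touches `port` again, which corresponds to the observation that the port is no longer involved once the connection is established.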
For example, two clients may agree to communicate through a specific port. Otherwise, using just one port, the identity of the client creating a connection may be part of the connection itself. Once again, though the solution described so far seems to be deeply tailored to O2, it can be generalized and applied to other OODBMSs.

Concurrency management
O2, like many other current implementations of object-oriented databases, provides page-level two-phase locking (i.e., pages, contiguous blocks of memory possibly containing several objects, are locked according to the 2PL mechanism). This satisfies the requirement for a mechanism allowing atomicity and independence of transition firing; however it can cause logically independent activities to conflict when the involved objects are stored on the same page. Even worse, spurious deadlocks may arise because two transactions are each waiting for an object that is stored on a page locked by the other transaction. This situation is very annoying, since it forces the SLANG interpreter to handle deadlock conditions, while the execution of a SLANG model would be deadlock-free by construction. A prototype version of O2 featuring object-level locking was developed within the GoodStep project and released in May 1995.

Experiments with other OODBMSs
Experiments were conducted with three other OODBMSs, namely GemStone, DEC Object/DB, and ODE, while two others, ObjectStore and Itasca⁴, have been studied without conducting experiments. These OODBMSs were chosen because they are significant examples of modern OODBMSs. Experiments consisted in writing and executing small programs, each demonstrating the ability of the examined OODBMS to satisfy one (or more) of the reported requirements. For a detailed description of the experiments, see [27,39].

GemStone
GemStone, developed by Servio Corporation, was one of the first commercially available OODBMSs. It combines the concepts of an object-oriented language like Smalltalk with the functionalities of a database management system. The basic components of the GemStone architecture are the Gem server process and the Stone monitor. The Gem server is where object behavior is executed, while the Stone monitor coordinates commit activities by multiple Gems. Note that this is a rather different organization with respect to O2, where the object behavior is enacted by each application client. GemStone provides a data definition and manipulation language, called OPAL, and C and C++ interfaces. Moreover it supplies a graphical environment, called Geode, to make application development easier.
The results of our experiments with GemStone can be summarized as follows:
• In GemStone, as in Smalltalk, everything is an object, including classes. It is thus possible to create and manipulate class definitions, and to instantiate classes that did not exist when the program was launched.
• Type versioning is fully supported. Classes are objects, therefore a new version of a class is defined by creating a new class object either with the same name or explicitly declared as a “new version” of the old class. Any new object will be an instance of the new version, while the old version continues to be referenced by the old instances. Two versions do not need to share the same structure or to be in hierarchical relation, but they must belong to the same object called class history.
• Existing objects can be migrated from one class to another, provided that both classes are related to the same class history. No specific policy, e.g., lazy migration or eager migration, is provided.
• Being based on a client-server architecture, GemStone satisfies the minimum requirements as far as distribution is concerned.
• GemStone provides satisfactory mechanisms for concurrency management, access control, security, etc.
• GemStone does not support multiple inheritance. This limitation does not allow classes of versioned objects to be created by defining a new class that inherits from both the original class and the VersionedObject class. It also creates problems for the integration of tools: if a tool uses data of class C, while the PSEE deals with data of class token, a new class is needed whose instances can be used by both the tool and the process environment. Multiple inheritance allows the creation of a new class Ctoken that inherits from both C and token and satisfies the requirement. The lack of multiple inheritance makes this kind of integration more cumbersome.
• GemStone features persistency roots that are quite similar to those of O2, so the data-level tool integration scheme implemented in O2 can be easily ported onto GemStone.
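The Ctoken construction mentioned in the list above can be shown in any language that supports multiple inheritance. The following Python sketch is only an illustration of the idea: the attributes `payload` and `place` and the class bodies are hypothetical, since the text does not specify what C or token contain.

```python
class Token:
    """Hypothetical stand-in for the PSEE's token class."""

    def __init__(self, place=None, **kwargs):
        super().__init__(**kwargs)
        self.place = place  # the place currently holding the token


class C:
    """Hypothetical stand-in for the data class used by a tool."""

    def __init__(self, payload=None, **kwargs):
        super().__init__(**kwargs)
        self.payload = payload


class CToken(C, Token):
    """Usable by the tool (as a C) and by the process engine (as a Token)."""


item = CToken(payload="design.doc", place="P1")
```

Without multiple inheritance the same effect typically requires a wrapper or a restructured hierarchy, which is the extra effort the text refers to.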
In conclusion, GemStone seems to provide the basic mechanisms that are needed in the SPADE project, the only limitation being the lack of multiple inheritance, which forces the user to build inheritance hierarchies that are “unnatural” and complex.

ODE
ODE (Object Database & Environment) is an object-oriented database, based on the C++ object model, developed by AT&T Research Laboratories. It is currently distributed as a prototype version (v. 3.0) to universities and research centers. ODE provides four different interfaces to the database:
• O++: a C++ programming interface,
• CQL++: an SQL-like interface,
• OdeFS: a UNIX file system-like interface,
• OdeView: a graphical X-window based interface.
Objects created with one interface can be accessed and manipulated with the other ones.
Currently only O++ and OdeFS (as a beta version) are available. O++ extends C++ with facilities for managing persistent objects, querying the database, running transactions, and specifying advanced features like constraints and triggers. The version of ODE that was available for experiments (v. 2.0) and the current one (v. 3.0.3) are still incomplete with respect to the goals of the ODE project, thus only a subset of the planned experiments could be performed. The results of our experimentations can be summarized as follows:
• ODE has a catalog that records the types of objects that are (or which were previously) in the database. The catalog itself consists of elements of type metatype. The objects’ types stored can be examined by iterating over the metatype type extent. Modifying a type definition having instances will lead to unspecified behavior. To avoid this, in order to modify the definition of class T, all objects of type T should first be deleted. Even if version 3.0.3 provides a Metaschema (the catalog), currently ODE does not seem to completely support reflection. Moreover it does not provide an embedded query interpreter. CQL++ sequences have to be translated into equivalent C++ code and then compiled, thus the program needs to be stopped in order to be changed.
• ODE satisfies the minimum requirements as far as distribution is concerned. In fact, ODE features a client-server architecture. Moreover ODE allows single user applications to run without a server, i.e., with local data.
• ODE provides satisfactory mechanisms for concurrency management. Using two-version two-phase locking (2V2P)⁵, it supplies three kinds of transactions: update, read only and hypothetical. Hypothetical transactions allow users to create “what if” scenarios: they can change data and evaluate the consequences without affecting the database. ODE lets users control physical clustering of data. Persistent objects can be stored in explicitly specified clusters, rather than in the objects’ type extents (the default).
• Versioning in ODE is orthogonal to type: it is an object property. Users can refer to either a logical object, i.e., the object with all its versions, or to a precise version of that object.
• In ODE, data-level tool integration can be achieved by means of shared clusters, i.e., processes store objects in, and retrieve objects from well-known clusters. Since clusters can be dynamically declared at runtime, they are very similar to the ones provided by DEC Object/DB (see below).
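The idea behind the hypothetical transactions mentioned in the list above (change data, inspect the consequences, leave the database untouched) can be mimicked in Python with a private copy. This is only an analogy under stated assumptions: it is not O++ syntax, the context manager `hypothetical` is a made-up name, and a plain dictionary stands in for the database.

```python
import copy
from contextlib import contextmanager


@contextmanager
def hypothetical(database):
    """Yield a private deep copy; changes made to it are never written back."""
    scratch = copy.deepcopy(database)
    yield scratch
    # Nothing is copied back: the scratch state is simply discarded.


db = {"budget": 100, "tasks": ["design"]}

with hypothetical(db) as what_if:
    what_if["budget"] -= 40
    what_if["tasks"].append("test")
    projected_budget = what_if["budget"]
```

After the block, `db` still holds its original contents, while `projected_budget` records the outcome of the "what if" scenario.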
In conclusion, ODE fails to provide reflectivity, while the other requirements are satisfied. Moreover, ODE provides sophisticated features (like integrity control and object clustering management) that are not strictly necessary in building the SPADE repository, although clearly useful in building other types of applications.

DEC Object/DB
DEC Object/DB, originally designed and developed by Objectivity Inc., was bought and enhanced by Digital. It is a C++ based OODBMS, in the sense that the data definition language is translated into C++, and applications themselves are written in C++. DEC Object/DB provides an interesting solution to data distribution. It organizes data storage in a tree hierarchy. The root is represented by a federated database that contains the database schema and information about object distribution. Within a federated database there are one or more databases and each of them can be located on a different workstation over a LAN. Finally, a database embodies some units of clustering, called containers, where objects are actually stored. Through this design, a user can access distributed objects in a quite transparent way. The results of our experiments with DEC Object/DB can be summarized as follows:
• It does not support reflectivity. It does not provide a Metaschema, and both programs written in C++ and those written in DEC Object/SQL++ need to be compiled, thus a program needs to be stopped in order to change the database schema.
• DEC Object/DB supports transparent access to object bases that are physically distributed over a network.
• DEC Object/DB provides a sophisticated versioning mechanism.
• DEC Object/DB provides satisfactory mechanisms for concurrency management, access control, etc. In particular, DEC Object/DB features interesting locking mechanisms: object level locking (not yet available in the release we used) and write-one-read-many (WORM) transactions that speed up concurrent accesses.
• As far as tool integration is concerned, applications can exchange information by accessing the same containers. A process can dynamically create a container within a database in a given federated database, while another application can use it simply by traversing the same persistence hierarchy. Thus, the data-level integration mechanism devised for O2 is in principle applicable to DEC Object/DB too.
In conclusion, DEC Object/DB fails to provide the basic mechanisms that are needed in the SPADE project, as far as reflectivity is concerned. Instead, it provides advanced features (like version management and distribution) that can simplify the development of process models in several respects.

ObjectStore
ObjectStore [28], developed by Object Design, is one of the most popular OODBMSs. It merges database concepts with the C++ programming language. C or C++ developers already have the basic knowledge to use ObjectStore: writing an application that uses ObjectStore is just like writing an ordinary C or C++ program, except for substituting read and write operations with ObjectStore primitives. Currently ObjectStore provides three programming interfaces: C, C++ and a high-level data manipulation language, an extension to C++, based on AT&T CFRONT C++ Language System, that provides users with higher-level commands to manage queries and to iterate on object groups and with relationships⁶ among objects. ObjectStore also supplies a Schema Designer to create and modify schemas and a Browser to display information about schemas as well as about single objects stored in databases. The results of our analysis can be summarized as follows:
• The system is based on a multi-client/multi-server architecture: clients can simultaneously access multiple databases on different servers, and a server and a client can reside on the same workstation.
• Building an ObjectStore application has a facet not addressed in normal C++ applications: the generation of schema information as a set of C++ objects. These objects are instances of classes belonging to the MetaObject Protocol, and provide an application with the functionalities necessary to add, delete or modify classes, attributes and methods.
• Object Migration consists of making a copy of the old object and then modifying this copy. OIDs change, but the system automatically modifies all the pointers to the changed instances so that referential integrity is preserved. Once the transformation is completed, all the old instances are deleted.
• Version management is based on the concepts of Configuration and Workspace. A Configuration groups together objects that have to be treated as a whole with respect to versioning. Workspaces are working areas within which Configurations are handled and stored. In order to work on an object, a user has to check out the Configuration the object belongs to, thus obtaining a working copy visible only within the user’s Workspace. This means that different users can work concurrently on the same object (Configuration) since they actually handle a private copy and not the object itself. Changes become visible to the outside only when the user’s copy is checked in to the parent Workspace, generating a new version of the Configuration just checked in. ObjectStore does not indicate how to merge parallel versions: the merging procedure must be programmed by the user, who is free to adopt the most suitable policy.
• Check-in/check-out mechanisms are provided by ObjectStore together with conventional and nested transactions. Users can also set a time-out on their lock requests or even use a lock probe before actually requiring an object. Nested transactions must share the same type (write or read-only) with their parents. User-defined abortions of nested transactions roll data back to their previous consistent state, without causing the abortion of the parent transaction. On the contrary, if the system aborts a transaction T on its own initiative, all the transactions in which T is nested are aborted too. Undoing all the changes done by these transactions prevents the database from going into an intermediate inconsistent state.
databases and to manage the system. Moreover, ITASCA supplies programming interfaces with standard LISP, C and C++. The analisys of the documentation [37,2] showed that:
• Classes can contain both class and instance at-
•
•
• Persistence is orthogonal to classes, i.e. objects of any class can be either persistent or transient. ObjectStore provides advanced clustering capabilities to control object storage. By default, it requires that the user indicates the database where the object will be stored, but in addition the user can also indicate the specific database segment, a cluster, or even the object near which the given object should be stored. Once an object becomes persistent, it can be retrieved by means of Database roots, objects that associate persistent data with labels.
•
•
In conclusion, ObjectStore seems to provide the basic mechanisms required by SPADE and also some other interesting features. These conclusions arise from the study of the available documentation, but have yet to be validated through practical experimentations. ORION/ITASCA ORION [26] is a prototype database system that was developed at the Microelectronics and Computer Technology Corporation (MCC), starting in 1985. An extention of the ORION-2 prototype became a product named ITASCA, marketed by the Itasca Systems. ITASCA extends Common LISP towards an objectoriented and database language, retaining, however, the LISP functional programming style. The architecture of the system is based on a client/server model: the server process is the database monitor and delivers data to the clients, that are possibly distributed on a network. The application programing environment is a Common LISP environment augmented with object-oriented extensions to manipulate classes and instances. Graphical interfaces are provided to edit schemas, to browe
•
•
tributes. The values given to the first ones are common to all the instances of a class, while the values assigned to the second type of attributes are specific of each instance. Database schemas can be modified at run-time. ITASCA offers features to create, delete and modify classes, attributes and methods. The system imposes that schema changes satisfy a set of invariants, that guarantee that the schema remains in a legal state. When a class definition is changed, (e.g., an attribute is deleted), its instances are not actually modified. A screening mechanism presents the instances according to the new definition (e.g., hiding the value of the deleted attributes). This mechanism is very flexible, but makes the access to database objects slower. Only the classes that have been declared as “versionable” can own versioned instances. New versions can be derived from existing ones, creating a hierachy whose structure is maintained in an object called generic object. The versions of an object have different OIDs, but the generic object offers a unifying logical representation: for instance, it provides the current version. This is by default the most recently created, but can be set by the user to any existing version. Short and long transactions are supported. Short transactions are atomic, independent and serialized in execution. Long transactions operate on private data and can have a long durantion. They are based on a check-in/check-out mechanism: an object is checked-out to create a private copy of it, while it is checked-in to end the transaction. ITASCA allows shared transactions, where multiple clients partecipate in the same transaction for improving cooperative work. ITASCA usually places the instances of a class in a segment (a set of sequential pages, that can be dynamically allocated and freed). 
In addition, users are provided with clustering facilities to put instances of a user-defined set of classes in the same segment, in order to store complex objects efficiently. The query language allows queries to contain any of the equality, inequality, or identity operators of the language. Method invocations can appear in a query, but no optimization is performed.
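The screening mechanism described above can be illustrated with a minimal sketch. It is not ITASCA code (ITASCA is LISP-based): the class `ScreenedClass`, its methods, and the use of a list index as a stand-in for an OID are all hypothetical names chosen for illustration. The point is only that a schema change touches the class definition, while stored instances are filtered at read time, which is where the extra access cost comes from.

```python
class ScreenedClass:
    """Hypothetical class definition whose attribute set can change at run time."""

    def __init__(self, name, attributes):
        self.name = name
        self.attributes = list(attributes)  # current schema
        self._stored = []                   # raw stored records, never rewritten

    def create(self, **values):
        unknown = set(values) - set(self.attributes)
        if unknown:
            raise AttributeError(f"unknown attributes: {sorted(unknown)}")
        self._stored.append(dict(values))
        return len(self._stored) - 1        # list index used as a stand-in for an OID

    def delete_attribute(self, attribute):
        # Schema change only: stored records are left untouched.
        self.attributes.remove(attribute)

    def add_attribute(self, attribute):
        # Existing records simply lack a value for the new attribute.
        self.attributes.append(attribute)

    def read(self, oid):
        # Screening: present the stored record according to the current definition.
        record = self._stored[oid]
        return {a: record.get(a) for a in self.attributes}


doc = ScreenedClass("Document", ["title", "author", "status"])
oid = doc.create(title="Design", author="Rossi", status="draft")
doc.delete_attribute("status")
doc.add_attribute("reviewer")
view = doc.read(oid)
```

In this sketch the deleted `status` value is still physically stored but hidden from `view`, and every read pays for the filtering step.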
In conclusion, ITASCA seems to supply very interesting features, both from SPADE's point of view and in general. Being LISP-based, ITASCA could provide significant advantages as far as dynamic schema modification is concerned.
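The versioning model summarized for ITASCA (versions with distinct OIDs, a derivation hierarchy, and a generic object that resolves the current version) can be sketched as follows. This is an illustrative in-memory model under our own assumptions, not the ITASCA interface: `Version`, `GenericObject`, `derive`, and `set_current` are hypothetical names.

```python
import itertools

_oid_counter = itertools.count(1)  # hypothetical OID generator


class Version:
    """One version of an object; each version has its own OID."""

    def __init__(self, data, parent=None):
        self.oid = next(_oid_counter)
        self.data = data
        self.parent = parent
        self.children = []


class GenericObject:
    """Keeps the version hierarchy and resolves the current version."""

    def __init__(self, initial_data):
        self.root = Version(initial_data)
        self.versions = {self.root.oid: self.root}
        self._latest = self.root
        self._user_default = None

    def derive(self, parent_oid, data):
        # A new version is derived from any existing one, forming a hierarchy.
        parent = self.versions[parent_oid]
        child = Version(data, parent)
        parent.children.append(child)
        self.versions[child.oid] = child
        self._latest = child
        return child

    def set_current(self, oid):
        if oid not in self.versions:
            raise KeyError(oid)
        self._user_default = oid

    @property
    def current(self):
        # Default: most recently created version, unless the user chose one.
        if self._user_default is not None:
            return self.versions[self._user_default]
        return self._latest


spec = GenericObject("draft")
v1 = spec.derive(spec.root.oid, "reviewed")
v2 = spec.derive(spec.root.oid, "alternative")
default_current = spec.current
spec.set_current(v1.oid)
chosen_current = spec.current
```

Here `default_current` is the most recently derived version and `chosen_current` is the one explicitly selected, mirroring the default and user-set behaviour described in the text.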
THEORY AND PRACTICE OF OBJECT SYSTEMS—May 1994
The characteristics outlined so far have not been checked through experiments.

Related work

Databases for Software Engineering

Several researchers have discussed the issue of specific database functionalities that are required by software engineering tools and environments. The early work by Bernstein [5] provides an extended abstract about these features and their possible solutions. Other works approach the problem from different viewpoints: [19] envisages process-centered environment repositories and [33] presents an approach to software process modeling based on an object-based model called PMDB+. A set of common requirements emerges from this work:

• Software engineering databases have to efficiently store and manage large and variable-length objects. This is due to the nature of the artifacts (documents, programs, etc.) they have to handle. [5] emphasizes physical storage solutions, while [19] focuses on the need to store them in a very fine-grained format in order to allow concurrent incremental update and access to different parts of the same object.
• Databases should provide flexible data models that fully support inheritance, information hiding and method definition, allowing users to implement specialized operators [19] or “procedural attributes” [5].
• Systems have to support run-time schema modifications, i.e., it must be possible to create or modify type definitions dynamically. Accordingly, it must be possible to create or modify instances of these types [5,33,19].
• OODBMSs should provide a computationally complete data manipulation language.
• Distribution is a relevant issue for SEE repositories. In [19] it is observed that a client-server architecture should be sufficient to support limited-sized projects (especially if the architecture is client-oriented, i.e., most computations are done at the client site). On the contrary, large projects generally require completely distributed databases.
• Triggers [34,5] and integrity constraints [5] are used as control or monitoring mechanisms (resembling exception handling in programming languages). Moreover, they can be used to implement active objects [33].
• Views [19] and access control mechanisms [33] provide tool-oriented or user-oriented restrictions, i.e., constrained views, of the project-wide schema.
• CASE environments require complex transaction models. They need constructs like nested transactions, long transactions, or explicitly programmable transaction schemes [19].
In [4] several approaches for implementing PCTE on top of an OODBMS are examined. The identified requirements for the underlying OODBMS are more numerous and more demanding than those reported here. This is rather natural, since PCTE is a more generic environment than SPADE and similar PSEEs. Moreover the features that an OODBMS must offer strictly depend on the extent to which the PCTE implementation is provided directly by the OODBMS. There is a general consensus that OODBMSs feature the most promising technology to satisfy the requirements described above. In particular, (almost) all of the currently available OODBMSs provide a set of features that satisfy several of the aforementioned requirements:
• flexible type systems, allowing the definition of new types;
• large, complex object management;
• encapsulation of data and operations (specified by a computationally complete data manipulation language);
• support of basic object-oriented features (inheritance, polymorphism and late binding);
• client-server architecture.
Moreover, many implementations are currently beginning to provide advanced features, such as versioning and elementary schema update facilities. Triggers and views are also available as prototype implementations for some systems [12,36].

In building SPADE we were able to consider the problems posed by a software engineering application from a new, more specific point of view. In particular, we dealt with the problems connected with the construction of an interpreter for a process modeling language. SLANG aims at providing process modelers with a powerful and flexible set of primitives, while the policies to be followed (such as version and configuration management, advanced concurrency control mechanisms, etc.) are specified by the process modeler. This means that many of the requirements that are reported in the literature are not essential for implementing SPADE. On the contrary, our experience showed that the requirements posed by SPADE can be classified in two main groups: those satisfied by features commonly available in OODBMSs, and those regarding the reflective features of the language. In particular, in the latter group, run-time schema update and type migration are the most critical problems (since evolution is an extremely important feature of software processes [7], it is absolutely necessary that PSEEs support it). We noticed that many OODBMSs fall short of satisfying these requirements.

In a software engineering environment, an OODBMS can be very useful in providing a common interface (guaranteed by the common schema) to distributed objects (data and services) residing on different clients. These clients can be located on different machines and different operating systems. This is a desirable feature that is rarely mentioned among the requirements pertaining to OODBMSs for software engineering. Nevertheless, we think that it is a characteristic that could, and should, be exploited by SEE builders. In order to facilitate this, OODBMSs should provide mechanisms to easily communicate object identifiers among clients.
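To make the type migration requirement mentioned above more concrete, the following sketch shows what migrating an existing instance to a new type means, using plain in-memory Python objects as an analogy. It says nothing about how any of the assessed OODBMSs implement (or fail to implement) the feature; `Activity`, `ReviewedActivity`, and `migrate` are hypothetical names, and the technique shown (reassigning `__class__`) only works for ordinary Python classes with compatible layouts.

```python
class Activity:
    """Original type of a process object."""

    def __init__(self, name):
        self.name = name

    def describe(self):
        return f"activity {self.name}"


class ReviewedActivity(Activity):
    """Evolved type, introduced while instances of Activity already exist."""

    def describe(self):
        return f"activity {self.name} (reviewer: {self.reviewer})"


def migrate(instance, new_type, **new_fields):
    """Move an existing instance to new_type, keeping its identity and state."""
    instance.__class__ = new_type
    for field, value in new_fields.items():
        setattr(instance, field, value)  # supply values the new type expects
    return instance


coding = Activity("coding")
identity_before = id(coding)
migrate(coding, ReviewedActivity, reviewer="Bianchi")
description = coding.describe()
```

The object keeps its identity, so references held elsewhere remain valid, while its behaviour now follows the new type definition. In a persistent setting the same property is needed for stored objects, which is what makes the requirement demanding.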
A look at other PSEEs

Practically all PSEEs rely on a repository. Although each PSEE has its own characteristics, which can result in different requirements for the repository, it is nevertheless interesting to consider the storage systems used by some of the outstanding PSEEs (see Note 7). The overview is organized according to the following classification of the storage systems:

• file system;
• ad-hoc repositories;
• commercial repositories, further distinguished according to the data model.

File system based PSEEs

In general, a PSEE has to allow the usage of traditional development tools. These tools are usually file-based programs, i.e., they read data from files and write results into files. Therefore, many PSEEs store data in the file system. For example, in the SPADE environment it is possible to transfer data from the process repository into the file system, in order to let Unix tools like vi, cc, etc. operate. Merlin [25] also adopts a heterogeneous approach to data storage: some documents (those used by Unix tools) are stored as Unix files, while others are stored as abstract syntax graphs.

PSEEs based on ad-hoc repositories

EPOS [13] is based on EPOSDB [31], a client-server DBMS offering uniform, change-oriented versioning and a structurally object-oriented data model. EPOSDB also features long, nested and cooperating transactions.

In Merlin, fine-grained data are stored in the non-standard DB system GRAS [29]. ALF [10] is based on PCTE, although the implemented concepts are largely independent of the repository. Adele-Tempo is based on an Entity-Relationship database, complemented by object-oriented concepts and an activity manager based on triggers. Oikos [30] uses logic languages (extensions of Extended Shared Prolog) for process modeling. Data storage is also based on a logical DBMS, named Salad [11].

PSEEs based on commercial DBMSs

The workflow management environment Leu [18] represents process data according to an extended Entity-Relationship model, which is supported by the commercial RDBMS Oracle. In Merlin, part of the process data are managed by a full object-oriented DBMS, in order to overcome some limitations in the locking mechanism offered by GRAS (it only allows whole graphs to be locked, while it would often be convenient to lock finer granules). The OODBMS used was formerly GemStone, recently replaced by O2. SPADE also uses O2.

Concluding Remarks

In this paper we have reported our experience in building the repository for the process-centered software development environment SPADE. In particular, we described the actual repository, built using O2, and experiments using other OODBMSs. We illustrated the requirements that a PSEE poses to the supporting repository, and we described how object-oriented database technology fulfills such requirements. Although we have used SPADE as a reference PSEE, many of the illustrated results can be generalized, under the conditions discussed. Table 1 summarizes our experiences (see Note 8). Our experiments gave encouraging results, in the sense that OODBMS technology proved mature enough to let us undertake the nontrivial task of developing an interpreter for SLANG (using O2). However, several goals are still to be met, as discussed in the sections
TABLE 1. A concise summary of our experiments

Requirements                    O2    ODE   GemStone  Object/DB  ObjectStore  Itasca
Reflectivity                    Yes   No    Yes       No         Yes?         Yes
Run-time Schema Evolution       Yes   No    Yes       No         Yes          Yes
Concurrency Management          Yes   Yes   Yes       Yes        Yes          Yes
Architecture Distribution       Yes   Yes   Yes       Yes        Yes          Yes
Type Versioning and Migration   No    No    Yes       No         Yes          Yes?
Data Integration                Yes   Yes   Yes       Yes        Yes          Yes?
concerning O2 and the other OODBMSs. Among these, we recall object-level locking, support for type migration, and physical distribution of data. We hope that this paper will contribute to enhancing the confidence of the software engineering community in the capabilities of object-oriented databases, as well as providing some suggestions for the evolution of OODBMS technology. Ongoing and future activities include:
• Experiments aiming at determining when physical distribution of data is actually necessary (i.e., when it is necessary to store data where it is most frequently needed). This will imply evaluating the limits of the client-server architecture of O2 and investigating the efficiency issues related to page caching at the clients.
• Integration of tools in the SPADE environment, investigating both the usage of data-integrated tools, and tool environments such as FIELD [35], in order to couple the benefits of a PSEE and those of a service-oriented, message-driven environment.
• Continued observation of the evolution of OODBMSs, in order to identify possible features that would allow the construction of more powerful or more efficient repositories for SPADE and for PSEEs in general. We intend to experiment with such new features, and possibly to use them in the development of new releases of the repository for SPADE.
Acknowledgments

People from O2 Technology provided continuous support and help. Antonio Carzaniga, Silvio Leggio and Stefano Torcello helped in carrying out experiments.

Notes

1. Actually SPADE also supports tools, like Unix tools, that are not database applications. Nevertheless, such tools are not integrated at the data level.
2. SPADE was built using O2 release 4.4.0.
3. In O2 shared objects are named objects. They are persistent and visible to all clients that use the schema where the object names are declared.
4. We used GemStone version 4.0, DEC Object/DB version 2.0.0b2, ODE version 2.0 (results were also checked against the documentation of release 3.0.3) and ObjectStore version 3.0. The results described here refer to these versions. It is clearly possible that new versions, made available after this paper was written, solve some or all of the problems reported here.
5. A number of readers and at most one writer can be operating simultaneously on the same granule. The writer has to wait for all the readers to finish before it can commit.
6. Relationships can be thought of as a pair of inverse pointers. If an object points to another, the second object has an inverse pointer back to the first. The system, by maintaining the integrity of these pointers, supplies a safe way of handling 1-1, 1-n and n-m relations.
7. For space reasons we had to select a relatively small number of PSEEs. The selection has been made in order to represent the different design choices with respect to persistent data management.
8. Note that some evaluations are accompanied by a question mark. We do not have enough information on these topics: practical experiments are needed. Requirements are listed in arbitrary order, as they are considered equally important.
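The inverse-pointer reading of relationships given in Note 6 can be sketched as follows. This is an illustration under our own assumptions, not the interface of any of the assessed systems: `Node`, `relate`, and `unrelate` are hypothetical names. The idea is that both directions of a relation are updated by a single operation, so they cannot drift apart.

```python
class Node:
    """An object that can take part in an n-m relation."""

    def __init__(self, name):
        self.name = name
        self.links = set()      # forward pointers
        self.backlinks = set()  # inverse pointers, maintained together with links


def relate(source, target):
    # Both directions are updated in one place, keeping the pair consistent.
    source.links.add(target)
    target.backlinks.add(source)


def unrelate(source, target):
    source.links.discard(target)
    target.backlinks.discard(source)


editor = Node("editor")
compiler = Node("compiler")
module = Node("module")

relate(module, editor)
relate(module, compiler)
relate(editor, compiler)
unrelate(module, editor)
```

After these calls, `compiler.backlinks` holds both `module` and `editor`, while `editor.backlinks` is empty again; a 1-1 or 1-n relation is the same structure with a cardinality check added to `relate`.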
References

[1] Atkinson M., F. Bancilhon, D. DeWitt, K. Dittrich, D. Maier, and S. Zdonik. (1989) The object-oriented database system manifesto. In Proceedings of the First DOOD Conference, Japan, 1989.
[2] Ahmed S., A. Wong, D. Sriram, and R. Logcher. (1992) Object-oriented database management systems for engineering: A comparison. Journal of Object-Oriented Programming, June 1992.
[3] Bandinelli S., L. Baresi, A. Fuggetta, and L. Lavazza. (1993) Requirements and Early Experiences in the Implementation of the SPADE Repository using Object-Oriented Technology. In International Symposium on Object Technologies for Advanced Software, Kanazawa (Japan), November 1993. JSSST, Springer Verlag. Lecture Notes in Computer Science n. 742.
[4] Boudier C., C. Cuisinier, F. Bruno, U. Kelter, W. Seelbach, D. Nolte, F. Stewing, D. Mulcahy, and D. O’Riordan. (1994) Implementing PCTE on an Object-Oriented Database. In Proceedings of PCTE ’94, 1994.
[5] Bernstein P.A. (1987) Database system support for software engineering - an extended abstract. In Proceedings of the Ninth International Conference on Software Engineering, pages 166–168. IEEE, 1987.
[6] Bandinelli S. and A. Fuggetta. (1993) Computational Reflection in Software Process Modeling: The SLANG Approach. In Proceedings of the 15th International Conference on Software Engineering, Baltimore (USA), May 1993. IEEE.
[7] Bandinelli S., A. Fuggetta, and C. Ghezzi. (1993) Software Process Model Evolution in the SPADE Environment. IEEE Transactions on Software Engineering. Special Issue on Process Evolution, December 1993.
[8] Bandinelli S., A. Fuggetta, C. Ghezzi, and L. Lavazza. (1993) The SLANG 1.0 Process Modeling Language Reference Manual. Technical Report RT93032, CEFRIEL, Via Emanueli, 15 - 20126 Milano (Italy), September 1993.
[9] Cattell R.G.G., editor. (1993) The Object Database Standard. Morgan Kaufmann, 1993.
[10] Canals G., N. Boudjlida, J. Derniame, C. Godard, and J. Lonchamp. (1994) ALF: A Framework for Building Process-Centered Software Engineering Environments. In A. Finkelstein, J. Kramer, and B. Nuseibeh, editors, Software Process Modelling and Technology, pages 103–130. Research Studies Press Ltd., 1994.
[11] Chimenti D., R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur, and C. Zaniolo. (1990) The LDL System Prototype. IEEE Transactions on Data and Knowledge Engineering, 2(1), 1990.
[12] Collet C., P. Habraken, T. Coupaye, and M. Adiba. (1994) Active rules for the GOODSTEP Software engineering platform. In Proc. of the 2nd International Workshop on Database and Software Engineering, 16th International Conference on Software Engineering, Sorrento, Italy, May 16-17 1994.
[13] Conradi R., M. Hagaseth, J. Larsen, M. N. Nguyen, B. P. Munch, P. H. Westby, W. Zhu, M. L. Jaccheri, and C. Liu. (1994) EPOS: Object-Oriented Cooperative Process Modelling. In A. Finkelstein, J. Kramer, and B. Nuseibeh, editors, Software Process Modelling and Technology, pages 33–70. Research Studies Press Ltd., 1994.
[14] Carzaniga A., G.P. Picco, and G. Vigna. (1994) Designing and Implementing Inter-Client Communication in the O2 Object-Oriented Database Management System. In Proceedings of the International Symposium Object-Oriented Systems, Methodologies and Applications, Palermo (Italy), September 1994. AICA.
[15] Dissmann S., W. Emmerich, B. Holtkamp, K. Lichtinghagen, and L. Shope. (1991) OMSs comparative study. Internal Report D2.4.3-rep-1.0-UDO-EL, ATMOSPHERE, 1991.
[16] Dewal S., W. Emmerich, and K. Lichtinghagen. (1992) A Decision Support Method for the Selection of OMSs. In Proceedings of the Second Int. Conference on System Integration, pages 32–40, Morristown, N.J., 1992. IEEE Computer Society Press.
[17] Deux O. (1991) The O2 System. Communications of the ACM, 34(10), October 1991.
[18] Dinkhoff G., V. Gruhn, A. Saalmann, and M. Zielonka. (1994) Business Process Modeling in the Workflow Management Environment Leu. In P. Loucopoulos, editor, Proceedings of the 13th International Conference on the Entity-Relationship Approach, pages 46–63. Springer Verlag LNCS n. 881, 1994. Published with title: Entity-Relationship Approach - ER ’94.
[19] Emmerich W., W. Schäfer, and J. Welsh. (1992) Suitable Databases For Process-centred Environments Do Not Yet Exist. In Jean-Claude Derniame, editor, Proceedings of the Second European Workshop on Software Process Technology, volume 635 of LNCS, pages 94–98, Trondheim (Norway), September 1992. Springer-Verlag.
[20] Ferrandina F., T. Meyer, and R. Zicari. (1994) Implementing Lazy Database Updates for an Object Database System. Technical Report 9, GoodStep, March 1994.
[21] Ghezzi C., D. Mandrioli, S. Morasca, and M. Pezzè. (1991) A Unified High-level Petri Net Formalism for Time-critical Systems. IEEE Transactions on Software Engineering, February 1991.
[22] The GoodStep Team. (1993) Description of software engineering applications and requirements for an object-oriented repository. Deliverable 1, ESPRIT project 6115 GoodStep - General Object-Oriented Databases for Software Processes, March 1993.
[23] The GoodStep Team. (1994) The GOODSTEP Project: General Object-Oriented Database for Software Engineering Processes. In Proceedings of APSEC’94, the First Asia-Pacific Software Engineering Conference, Tokyo, December 1994.
[24] Humphrey W.S. (1989) Managing the Software Process. SEI Series in Software Engineering. Addison-Wesley, 1989.
[25] Junkermann G., B. Peuschel, W. Schäfer, and S. Wolf. (1994) Merlin: Supporting Cooperation in Software Development Through a Knowledge-Based Environment. In A. Finkelstein, J. Kramer, and B. Nuseibeh, editors, Software Process Modelling and Technology, pages 103–130. Research Studies Press Ltd., 1994.
[26] Kim W. (1991) Introduction to Object-Oriented Databases. MIT Press, Cambridge, MA, 1991.
[27] Leggio S. (1994) Progetto SPADE: studio di fattibilità per l’implementazione mediante strumenti software diversi. Tesi di laurea, Politecnico di Milano, June 1994. In Italian.
[28] Lamb C.W., G. Landis, J.A. Orenstein, and D.L. Weinreb. (1991) The ObjectStore Database System. Communications of the ACM, 34(10), October 1991.
[29] Lewerentz C. and A. Schürr. (1988) GRAS - a Management System for Graph-like Documents. In Beeri, Schmidt, and Dayal, editors, Proceedings of the 3rd Conference on Data and Knowledge Bases, pages 19–31. Morgan Kaufmann, 1988.
[30] Montangero C. and V. Ambriola. (1994) OIKOS: Constructing Process-Centered SDEs. In A. Finkelstein, J. Kramer, and B. Nuseibeh, editors, Software Process Modelling and Technology, pages 187–222. Research Studies Press Ltd., 1994.
[31] Munch B.P. (1993) Versioning in a Software Engineering Database - the Change Oriented Way. PhD thesis, DCST, NTH, Trondheim, Norway, 1993.
[32] Osterweil L. (1987) Software processes are software too. In Proceedings of the Ninth International Conference on Software Engineering. IEEE, 1987.
[33] Penedo M.H. and C. Shu. (1991) Acquiring experiences with the modelling and implementation of the project life-cycle process: the PMDB work. Software Engineering Journal, pages 259–273, September 1991.
[34] Peuschel B. and S. Wolf. (1993) Architectural support for distributed process-centered software development environments. In Proceedings of the 8th International Software Process Workshop, pages 126–128, Berlin (Germany), February 1993.
[35] Reiss S. (1990) Connecting Tools using Message Passing in the FIELD Program Development Environment. IEEE Software, pages 57–67, July 1990.
[36] Santos C., S. Abiteboul, and C. Delobel. (1994) Virtual Schemas and Bases. In Proc. of the EDBT (Extending Database Technology) Conference, 1994.
[37] Skiadelli M. (1994) Object-Oriented Database System Evaluation for the DAQ System. Technical report, CERN, 1994. RD13 collection.
[38] Thomas I. and B. A. Nejmeh. (1992) Definition of tool integration for environments. IEEE Software, 9(2):29–35, March 1992.
[39] Torcello S. (1993) I requisiti per un DBMS di supporto ad un PSEE, ed esperienze con alcuni OODBMS commerciali. Tesi di laurea, Università degli Studi di Milano, October 1993. In Italian.