A Taxonomy for Schema Versioning Based on the Relational and Entity Relationship Models†

John F. Roddick

Noel G. Craske

Thomas J. Richards

University of South Australia The Levels, SA 5095, Australia

Monash University Caulfield East, VIC 3145, Australia

La Trobe University Bundoora, VIC 3083, Australia

Recently there has been increasing interest in both the problems and the potential of accommodating evolving schema in databases, especially in systems which necessitate a high volume of structural changes or where structural change is difficult. This paper presents a taxonomy of changes applicable to the Entity-Relationship Model together with their effects on the underlying relational model expressed in terms of a second taxonomy relevant to the relational model.

1 Introduction

Temporal and historical database systems possess the ability to maintain and manipulate historical data. Various papers have suggested architectures and operations appropriate for such support (see for example the articles listed in Bolour et al. [1], McKenzie [2], Stam and Snodgrass [3], Soo [4] and more recently Tansel et al. [5]). Since many database systems must deal not only with time-varying data but also with time-varying data structure, support for schema evolution is also required. This paper aims to contribute to solving this problem by investigating the accommodation of time-varying database structure through Entity-Relationship Modelling.

The paper first discusses the nature of schema versioning and outlines the functionality that a database system supporting schema versioning will require. Secondly, two taxonomies are proposed, the first applicable to the relational data model and the second to the Entity-Relationship Model. A translation between these taxonomies is given.

2 Functionality required of architectures supporting schema versioning

A recent bibliography lists approximately 40 papers dealing with various aspects of schema evolution in database systems [6]. From these papers a number of essential and desirable characteristics of such systems can be extracted. There has been an increasing emphasis on building object-oriented functionality on top of relational engines - the unified approach [7]. The approach taken here is to extend the relational model and in doing so provide a possible platform on which the object-oriented model might base its more complex schema evolution.

2.1 Schema modification, evolution and versioning

Before discussing the functionality in detail, three definitions must be given which are taken from Roddick [8] and Jensen et al. [9]. These definitions are consistent with the majority of the research dealing with schema evolution and schema versioning [10-15].

Schema Modification
Schema Modification is accommodated when a database system allows changes to the schema definition of a populated database. ■



† This paper appeared as Roddick, J.F., Craske, N.G. and Richards, T.J. 1993. ‘A taxonomy for schema versioning based on the relational and entity relationship models’. In Proc. Twelfth International Conference on Entity-Relationship Approach, Dallas, Texas. Springer-Verlag. 143-154.

Schema Evolution
Schema Evolution is accommodated when a database system permits the modification of the database schema without loss of the semantic content of existing data. ■

Schema Versioning
Schema Versioning is accommodated when a database system allows the viewing of all data, both retrospectively and prospectively, through user definable version interfaces. ■

The significant difference between evolution and versioning is the ability of users to identify quiet points in the definition and label the definition in force at that time for later reference. Schema evolution does not require the ability to version data except in so far as each changed schema can be considered a new version. Schema changes will not necessarily result in a new version; typically schema changes will be of a finer grain than the definable versions. Moreover, versions will tend to be labelled by some user-defined method whereas schema evolution changes are referred to more often by the (transaction) time of change.

It is important to note that schema evolution does not necessarily involve full historical support for the schema; only the ability to change the schema definition without loss of data. In practice, retention of past definitions will often be appropriate. In contrast, schema versioning, even in its simplest form, requires that a history of changes be maintained to enable the retention of past schema definitions.

2.2 Domain evolution

The most common change to a database schema is the simple modification of the domain of an attribute. The trivial example suggested in [6], below, exemplifies many of the problems.

Staff Id | Position Code | Salary
21677 | G55 | $33,000
21678 | G56 | $37,000
21680 | A05 | $45,500
21683 | A09 | $65,400
21687 | G51 | $32,000

Consider the replacement of the existing position codes with new codes based on new, incompatible domains. The database administrator is faced with a number of questions arising from the retention of the current data. These problems are discussed by Ventrone and Heiler [16], who present a number of examples of changes to the semantics of a domain which result in lost or misleading information. They suggest that any proposed solution be based around capturing the semantics of a domain and the identification of that semantic content within the metadata, thus replacing the problems of semantic heterogeneity by more tractable problems of syntactic heterogeneity [16].

2.3 Relation evolution

Suggestions in the literature indicate that modification of the database schema to accommodate changes at relation or class level (and above) can be achieved in a number of ways. For instance, within the object-oriented paradigm a common method is to establish a set of invariants to ensure the semantic integrity of the schema and a set of rules or primitives for effecting the schema changes [17-19], while within the relational model a set of atomic operations is proposed which result in a consistent and, as far as possible, reversible database structure [20].

2.4 Schema conversion mechanisms

A number of suggestions have been proposed for the conversion of the schema at the physical level. Firstly, the complete schema can be converted to a new version as in Orion [19, 21, 22]. This method, while being conceptually simple, prohibits the parallel schema versions required in some application environments. The approach of Skarra and Zdonik in Encore [23-25] is to version at class level and thus permit parallel changes as long as they are in different classes. Secondly, Kim and Chou [26], and later Andany, Leonard and Palisser [13], present a system whereby views (or contexts) are constructed and schema evolution is achieved through view creation. This allows multiple concurrent versions of the schema. The relational equivalent can be considered as the creation of a meta-schema. Thirdly, as in Charly [27], the objects can be made self-descriptive, thus making object and schema modification homogeneous. This method, while being potentially powerful, leads to other problems (for example, schema versioning by this method is difficult) and was later rejected by Andany et al. in favour of the Kim and Chou approach in a subsequent system.

2.5 Data conversion mechanisms

The mechanisms whereby the data is physically converted to the new version have also been investigated. Three options have been proposed: the strict conversion method adopted in GemStone [28], in which a change to the schema results in an immediate propagation of that change to the data; the lazy conversion mechanism of Tan and Katayama [29], in which data are changed to the current format only when required; and the logical conversion method, in which a system of screens translates the attribute into the required format at access time [19, 26, 30], so that no conversion of the stored data is required. Again, these are physical considerations and are investigated more fully in Roddick [8].
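The three options can be contrasted in a minimal sketch. It is an illustration only and not the mechanism of GemStone, of Tan and Katayama, or of Orion: the Store class, its method names and the salary_to_int converter are hypothetical, and the domain change used (a formatted salary string becoming an integer) is assumed purely for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]
Converter = Callable[[Record], Record]


def salary_to_int(row: Record) -> Record:
    """Hypothetical domain change: '$33,000' (string) becomes 33000 (integer)."""
    text = str(row["salary"]).replace("$", "").replace(",", "")
    return {**row, "salary": int(text)}  # returns a new dict, input untouched


@dataclass
class Store:
    """Toy store in which every row remembers the schema version it is held in."""
    version: int = 1
    rows: List[Record] = field(default_factory=list)
    row_versions: List[int] = field(default_factory=list)
    converters: Dict[int, Converter] = field(default_factory=dict)  # v -> (v to v+1)

    def insert(self, row: Record) -> None:
        self.rows.append(dict(row))
        self.row_versions.append(self.version)

    def _upgrade(self, row: Record, from_version: int) -> Record:
        # Apply each converter between the row's version and the current one.
        for v in range(from_version, self.version):
            row = self.converters[v](row)
        return row

    def _convert_in_place(self, i: int) -> None:
        self.rows[i] = self._upgrade(self.rows[i], self.row_versions[i])
        self.row_versions[i] = self.version

    def change_schema(self, converter: Converter, strict: bool = False) -> None:
        self.converters[self.version] = converter
        self.version += 1
        if strict:  # strict conversion: propagate to all data immediately
            for i in range(len(self.rows)):
                self._convert_in_place(i)

    def read_lazy(self, i: int) -> Record:
        # Lazy conversion: the stored row is rewritten only when it is needed.
        if self.row_versions[i] < self.version:
            self._convert_in_place(i)
        return self.rows[i]

    def read_logical(self, i: int) -> Record:
        # Logical conversion: a "screen" translates on access; storage is unchanged.
        return self._upgrade(self.rows[i], self.row_versions[i])


store = Store()
store.insert({"staff_id": 21677, "salary": "$33,000"})
store.change_schema(salary_to_int)   # non-strict: no stored data touched yet
viewed = store.read_logical(0)       # converted view of an unconverted row
```

In this sketch the trade-off is visible in where the conversion cost falls: at schema change time (strict), at first access (lazy), or at every access (logical).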

2.6 Access rights considerations

Access rights are a particular problem in object-oriented database systems, where methods and attributes can be inherited from classes higher in the hierarchy. In these cases schema evolution can result in a violation of access rights. Consider, as an example, a change to a class (e.g. Employees) from which attributes are inherited by a sub-class (e.g. Engineers) to which the modifying user has no legitimate access. Any change to the definition of attributes inherited from the superclass can be considered to violate the access rights of the subclass. Moreover, in some systems ownership of a class does not imply ownership of all instances of that class. In GemStone [18] this is avoided by the rule that ownership of the class is considered sufficient authorisation to allow modification to all instances of that class and any subclasses that may inherit attributes.

2.7 Concurrency control

In a multi-user environment it may be necessary to modify the database schema while another user is currently accessing the database, resulting in significant concurrency control problems. These can be reduced if schema versioning is accommodated but are significant if only static schema evolution is supported. This problem becomes more acute when two users are modifying related schema definitions at one time.

2.8 Query language support

With historical support for the database schema, query language support may be enhanced to provide user access to old schema definitions. This is particularly useful if applications using embedded database language facilities are not to be made obsolete by minor, often irrelevant schema changes.

2.9 Related Areas

The work by Sjøberg [31] looks at the quantification of schema changes within database systems in order to understand the ways in which schema changes are applied to actual systems both under development and in use. Such knowledge may influence the architectural considerations for databases with schema evolution support; for instance in the choice of lazy or eager data conversion.

Van Bommel presents a methodology to enable the development of structurally optimised data models using an evolutionary approach [32]. His approach searches the solution space of possible internal representations of a conceptual model by random mutation. Although the approach is evolutionary, the accommodation of schema evolution in populated databases is not mentioned.

3 Two taxonomies for systems supporting schema versioning

Schema versioning is accommodated into relational database systems based on the Entity-Relationship model by defining two taxonomies. Firstly, a taxonomy for relational database schema changes is proposed; this taxonomy was first introduced in [8] and is an extension of that proposed in [20]. Secondly, a taxonomy of Entity-Relationship modelling changes is developed and the effects of these changes are specified in terms of changes to the relational model using the former taxonomy.

The proposed taxonomies given below are part of a larger project to allow the accommodation of both schema versioning and structure found by inductive inference. The aim of the developing system, Boswell, is to be able to handle evolving structure whether supplied by the database administrator or found by some machine learning algorithm. The approach taken is to define a set of simple, atomic schema modification operations based on relational algebraic operations on the schema, which is held as a set of historical relations. A transaction based approach then provides the necessary user-defined referents required by schema versioning. In this way the mechanism provides for most common schema versioning operations and allows review and conversion of data across schema versions.

3.1 A taxonomy of relational schema versioning operations

The following taxonomy of schema change operations (first proposed in [8]) follows a number of design criteria:

i. Schema modification should require the minimum level of intervention appropriate to the change being performed.
ii. Schema modification should be as symmetric as possible so that not only can existing data be viewed through new schema definitions but also that data recorded later can be viewed under previous schemas¹.
iii. The change should be as reversible as possible so that erroneous changes can be removed.
iv. While the taxonomy requires historical support for schema definitions, it should not be a requirement to support time in the base architecture. This implies that schema evolution/versioning should be available even for static (non-temporal) relations.
v. Details of the implementation are dependent on the operational environment and the user's requirements and thus are not prescribed.
vi. As far as possible, the modifications should be expressible in terms of relational algebra operations on the meta-database.
vii. Changes on a larger scale than simple relational operations should be composable from the elementary operations supplied.

¹ The proposals outlined here are not fully symmetric on all operations. A fully symmetric taxonomy would be highly redundant and thus, pragmatically, to be avoided.

The operations are listed briefly below. More details can be found in [8].

Domain/Attribute Evolution
  Expanding an attribute domain
  Restricting an attribute domain
  Changing the domain of an attribute
  Adding an attribute to the database
  Renaming an attribute

Relation Evolution
  Adding a relation
  Deactivating a relation
  Activating a relation
  Partitioning a relation
  Joining two relations
  Coalescing two relations

Attribute-Relation Assignment Evolution
  Adding an attribute to a relation
  Deactivating an attribute
  Promoting an attribute
  Demoting an attribute
  Splitting a relation

Schema Transaction Support
  Schema commit
  Schema rollback
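A few of the operations listed above can be illustrated with a small sketch of a schema catalogue kept as an append-only history. This is not the Boswell implementation, which the paper describes only as holding the schema in historical relations; the SchemaCatalogue class, its method names, the integer clock standing in for transaction time, and the version labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SchemaEvent:
    tx_time: int                 # stand-in for transaction time (assumed)
    operation: str               # e.g. "add_relation", "deactivate_attribute"
    relation: str
    attribute: Optional[str] = None


class SchemaCatalogue:
    """Hypothetical catalogue: committed events are never overwritten or deleted."""

    def __init__(self) -> None:
        self.history: List[SchemaEvent] = []   # committed, append only
        self.pending: List[SchemaEvent] = []   # current schema transaction
        self.labels: Dict[str, int] = {}       # user label -> history length
        self.clock = 0

    def _record(self, operation: str, relation: str,
                attribute: Optional[str] = None) -> None:
        self.clock += 1
        self.pending.append(SchemaEvent(self.clock, operation, relation, attribute))

    def add_relation(self, name: str) -> None:
        self._record("add_relation", name)

    def deactivate_relation(self, name: str) -> None:
        self._record("deactivate_relation", name)

    def activate_relation(self, name: str) -> None:
        self._record("activate_relation", name)

    def add_attribute(self, relation: str, attribute: str) -> None:
        self._record("add_attribute", relation, attribute)

    def deactivate_attribute(self, relation: str, attribute: str) -> None:
        self._record("deactivate_attribute", relation, attribute)

    def commit(self, label: str) -> None:
        """Schema commit: append pending events and label the resulting version."""
        self.history.extend(self.pending)
        self.pending.clear()
        self.labels[label] = len(self.history)

    def rollback(self) -> None:
        """Schema rollback: discard the uncommitted changes."""
        self.pending.clear()

    def schema_at(self, label: str) -> Dict[str, List[str]]:
        """Replay the history up to a labelled version; return active structure."""
        active: Dict[str, bool] = {}
        attrs: Dict[str, Dict[Optional[str], bool]] = {}
        for event in self.history[: self.labels[label]]:
            columns = attrs.setdefault(event.relation, {})
            if event.operation in ("add_relation", "activate_relation"):
                active[event.relation] = True
            elif event.operation == "deactivate_relation":
                active[event.relation] = False
            elif event.operation == "add_attribute":
                columns[event.attribute] = True
            elif event.operation == "deactivate_attribute":
                columns[event.attribute] = False
        return {r: sorted(a for a, on in attrs[r].items() if on)
                for r, on in active.items() if on}


cat = SchemaCatalogue()
cat.add_relation("Student")
cat.add_attribute("Student", "StuId")
cat.add_attribute("Student", "Quals")
cat.commit("v1")
cat.deactivate_attribute("Student", "Quals")
cat.commit("v2")
cat.deactivate_relation("Student")
cat.rollback()   # the erroneous change never reaches the history
```

Because nothing is removed from the history, the earlier version remains viewable after Quals has been deactivated, which is the behaviour the design criteria on symmetry and reversibility ask for.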

Note that the temporal append-only property is evident in this taxonomy in that deletion is omitted in favour of deactivation. This is consistent with the notion of temporal/historical databases being non-overwriting and append-only.

3.2 A taxonomy based on the Entity Relationship Model

The relational taxonomy given in Section 3.1 specifies changes to a relational database at the operational level. This can be extended by investigating the evolution of Entity-Relationship models. Consider the simple system depicted by Figure 1.

[Figure 1: ER diagram with the entities Subject and Student linked by the M:N relationship Takes; attribute labels shown in the diagram: SubId, Name, Dept, StuId, Name, Quals, Year, Semester, Grade.]

Fig. 1 - Example ER diagram

Now consider the following list of possible modifications:

i. A subject may be taught by more than one department;
ii. Student qualifications are no longer held;
iii. An extra attribute of sex is required for the student;
iv. A new relationship of "Tutors in" is required between Subject and Student;
v. The cardinality between Student and Subject in this new relationship is altered.

Below is a taxonomy of modifications which can occur in the evolution of an Entity-Relationship diagram. For each change the corresponding relational changes are given as listed in the relational taxonomy in Section 3.1. For the purposes of this paper the simplest and most logical translation is assumed. This taxonomy assumes a normalised first normal form relational database system.

Entity Evolution

ER Modification | RDB Modification | Notes
Add entity | Create a relation. | In practice, a place holder attribute is defined until further attributes are defined for the entity or the physical creation is delayed until attributes are specified.
Delete entity | Deactivate the corresponding relation. | This is only valid if no relationships are currently attached to the entity. As discussed in [8] relations are not physically deleted to enable rollback queries and reactivation.
Rename entity | The relation corresponding to the entity is renamed. | This may not result in a database change if a translation table is in use.
Change entity to attribute of relationship | Coalesce the corresponding entity and relationship relations for each relationship attached to the entity (if necessary). Demote any key or attributes not identifying the remaining entities. If only one entity remains associated with a relationship a recursive relationship is assumed and the key attribute is repeated. | Semantically this operation makes little sense unless the cardinality of the entity in the relationship is 1 in which case the coalescing will not be necessary.
Change regular entity to weak entity | The identifying attributes of the parent entity are added to (or promoted into) the set identifying the newly weakened entity. |
Change weak entity to regular entity | The identifying attributes of the parent entity are demoted from the set identifying the entity. |

Attribute Evolution

ER Modification | RDB Modification | Notes
Add attribute to entity | Add an attribute to the corresponding relation. | See note under Add Entity above.
Add attribute to relationship | Add an attribute to the relationship's matching relation if extant. If not then the relationship data is held in an entity's relation and the attribute is added to that instead. | The latter case occurs for 1 to 1 and some 1 to N relationships.
Add sub-attribute to attribute | Add attribute to corresponding relation. |
Delete attribute from entity | Deactivate the attribute. | See also the note under Delete Entity above.
Delete attribute from relationship | Deactivate the attribute. | See also the note under Delete Entity above.
Delete sub-attribute from attribute | Deactivate attribute from corresponding relation. |
Rename attribute | Rename the corresponding attribute. |
Change attribute to entity | Promote (or add) attributes to the set identifying new entity. | Relationship is assumed unary on the side of the new entity until later changed.
Include attribute in set of identifying attributes | Promote attribute into key. |
Demote attribute from identifying attribute | Demote attribute from key. |

Relationship Evolution

ER Modification | RDB Modification | Notes
Add relationship between two or more entities | The identifying attributes are promoted. | A one to one relationship is assumed until changed using the Change cardinality of relationship operation below.
Add a link between relationship and entity | Add (or promote) the identifying attributes of the entity to the key of the relationship relation. | A cardinality of 1 is assumed for the new link.
Delete a link between relationship and entity | Demote the identifying attributes of the entity from the key of the relationship relation. |
Delete relationship | Deactivate the relation representing this relationship. | See also the note under Delete Entity above.
Rename relationship | Rename corresponding relation. |
Change cardinality of relationship | The cardinality determines whether a relationship is represented physically as a separate relation. Altering the cardinality may therefore result in relation creation or deletion as follows. 1 to 1 -> 1 to N: split relation into two if necessary with foreign key dependency. 1 to N -> M to N: create relation to model relationship. 1 to 1 -> M to N: both of the above (i.e. two relations may be created). Inverses perform coalesce on relations above as necessary. |
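The translation between the two taxonomies can be pictured as a lookup from an ER modification to a list of relational operations. The sketch below covers only a handful of rows and is a simplification under stated assumptions: the dictionary keys, the translate function and the "1:1"/"1:N"/"M:N" cardinality strings are hypothetical names, the qualifier "if necessary" is ignored, and each inverse cardinality change is assumed to need one coalesce per forward step.

```python
from typing import Dict, List, Optional, Tuple

# Subset of the ER-to-relational table; operation names follow Section 3.1.
ER_TO_RDB: Dict[str, List[str]] = {
    "add entity": ["add relation"],
    "delete entity": ["deactivate relation"],
    "add attribute to entity": ["add attribute to relation"],
    "delete attribute from entity": ["deactivate attribute"],
    "include attribute in identifying set": ["promote attribute"],
    "demote attribute from identifying set": ["demote attribute"],
    "delete relationship": ["deactivate relation"],
}

# Forward cardinality changes and the relational steps they may trigger.
CARDINALITY_STEPS: Dict[Tuple[str, str], List[str]] = {
    ("1:1", "1:N"): ["split relation"],
    ("1:N", "M:N"): ["add relation"],
    ("1:1", "M:N"): ["split relation", "add relation"],
}


def translate(er_change: str, old: Optional[str] = None,
              new: Optional[str] = None) -> List[str]:
    """Return the relational operations assumed for one ER modification."""
    if er_change == "change cardinality":
        if (old, new) in CARDINALITY_STEPS:
            return list(CARDINALITY_STEPS[(old, new)])
        if (new, old) in CARDINALITY_STEPS:
            # Inverse direction: coalesce once per forward step (assumption).
            return ["coalesce relations"] * len(CARDINALITY_STEPS[(new, old)])
        raise ValueError(f"unsupported cardinality change: {old} -> {new}")
    return list(ER_TO_RDB[er_change])


# Modification ii from the list above: student qualifications are no longer held.
steps_for_quals = translate("delete attribute from entity")
```

Keeping the mapping as data rather than code reflects the design criterion that larger changes should be composable from the elementary operations supplied.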

4 Summary

A taxonomy and query language for the accommodation of schema versioning into entity-relationship models based on relational database systems have been presented. The paper presents two taxonomies that can be used together which allow modifications to Entity-Relationship models to be reflected in physical database design changes according to a number of specified criteria such as minimum intervention, rollback support and reversibility. Research is planned to further investigate tools which will allow intelligent DBA support for evolving database models.

Acknowledgements

This research was supported, in part, by a research grant from the University of South Australia and, in part, by a research grant from the Institute of Computer Systems Engineering and Assurance.

References

[1] A. Bolour, T.L. Anderson, L.J. Dekeyser and H.K.T. Wong, "The role of time in information processing: a survey". SIGMOD Rec. vol. 12, no. 3, pp. 28-48, 1982.
[2] L.E. McKenzie, "Bibliography: temporal databases". SIGMOD Rec. vol. 15, no. 4, pp. 40-52, 1986.
[3] R.B. Stam and R. Snodgrass, "A bibliography on temporal databases". Data Eng. vol. 7, no. 4, pp. 53-61, 1988.
[4] M.D. Soo, "Bibliography on temporal databases". SIGMOD Rec. vol. 20, no. 1, pp. 14-23, 1991.
[5] A.U. Tansel, J. Clifford, S.K. Gadia, S. Jajodia, A. Segev and R.T. Snodgrass, Temporal databases: theory, design and implementation. Redwood City, CA: Benjamin Cummings, 1993.
[6] J.F. Roddick, "Schema evolution in database systems - an annotated bibliography". SIGMOD Rec. vol. 21, no. 4, pp. 35-40, 1992.
[7] W. Kim, "Object-oriented systems: promises, reality and future", in Proc. Nineteenth International Conference on Very Large Databases. Dublin, Ireland, Morgan Kaufmann, Palo Alto, CA, pp. 676-687, 1993.
[8] J.F. Roddick, "Implementing schema evolution in relational database systems: an approach based on historical schemata". Department of Computer Science and Computer Engineering, La Trobe University, 1993.
[9] C. Jensen, et al., "A consensus glossary of temporal database concepts". SIGMOD Rec. vol. 23, no. 1, pp. 52-64, 1994. Also Technical Report R93-2035, Department of Mathematics and Computer Science, Aalborg University, Denmark, November 1993.
[10] D. Beech and B. Mahbod, "Generalised version control in an Object-oriented database", in Proc. Fourth IEEE International Conference on Data Engineering. Los Angeles, CA, IEEE Computer Society Press, pp. 14-22, 1988.
[11] K. Narayanaswamy and K.V. Bapa Rao, "An incremental mechanism for schema evolution in engineering domains", in Proc. Fourth IEEE International Conference on Data Engineering. Los Angeles, CA, IEEE Computer Society Press, pp. 294-301, 1988.
[12] A. Bjornerstedt and C. Hulten, "Version control in an object-oriented architecture", in Object-Oriented Concepts, Databases and Applications, Kim, W. and Lochovsky, F., (eds.), New York: Addison-Wesley/ACM Press, pp. 451-485, 1989.
[13] J. Andany, M. Leonard and C. Palisser, "Management of schema evolution in databases", in Proc. Seventeenth International Conference on Very Large Databases. Barcelona, Spain, Morgan Kaufmann, San Mateo, CA, pp. 161-170, 1991.
[14] G. Ariav, "Temporally oriented data definitions: managing schema evolution in temporally oriented databases". Data Knowl. Eng. vol. 6, no. 6, pp. 451-467, 1991.
[15] S.R. Monk and I. Sommerville, "A model for versioning of classes in object-oriented databases", in Proc. Tenth British National Conference on Databases. Aberdeen, Springer-Verlag, pp. 42-58, 1992.
[16] V. Ventrone and S. Heiler, "Semantic heterogeneity as a result of domain evolution". SIGMOD Rec. vol. 20, no. 4, pp. 16-20, 1991.
[17] B.S. Lerner and A.N. Habermann, "Beyond schema evolution to database reorganisation". SIGPLAN Not. vol. 25, no. 10, pp. 67-76, 1990.
[18] R. Bretl, et al., "The GemStone data management system", in Object-oriented Concepts, Databases and Applications, Kim, W. and Lochovsky, F., (eds.), New York: ACM Press, pp. 283-308, 1989.
[19] J. Banerjee, H.-T. Chou, H.J. Kim and H.F. Korth, "Schema evolution in object-oriented persistent databases", in Proc. Sixth Advanced Database Symposium. Tokyo, pp. 23-31, 1986.
[20] B. Shneiderman and G. Thomas, "An architecture for automatic relational database system conversion". ACM Trans. Database Syst. vol. 7, no. 2, pp. 235-257, 1982.
[21] W. Kim, J.F. Garza, N. Ballou and D. Woelk, "Architecture of the ORION next-generation database system". IEEE Trans. Knowl. and Data Eng. vol. 2, no. 1, pp. 109-124, 1990.
[22] W. Kim, N. Ballou, H.-T. Chou, J.F. Garza and D. Woelk, "Features of the Orion object-oriented database system", in Object-oriented Concepts, Databases and Applications, Kim, W. and Lochovsky, F., (eds.), New York: ACM Press, pp. 251-282, 1989.
[23] S.B. Zdonik, "Version management in an object-oriented database", , Conradi, R., Didriksen, T.M. and Wanvik, D.H., (eds.), Berlin: Springer-Verlag, pp. 405-422, 1986.
[24] A.H. Skarra and S.B. Zdonik, "The management of changing types in an object-oriented database". OOPSLA '86 (SIGPLAN Notices). vol. 21, no. 11, pp. 483-495, 1986.
[25] A.H. Skarra and S.B. Zdonik, "Type evolution in an object-oriented database", in Research directions in object-oriented programming, Shriver, B., (ed.) Cambridge, MA: MIT Press, pp. 393-416, 1987.
[26] W. Kim and H.-T. Chou, "Versions of schema for object-oriented databases", in Proc. Fourteenth International Conference on Very Large Databases. Los Angeles, CA, Morgan Kaufmann, Palo Alto, CA, pp. 148-159, 1988.
[27] C. Palisser, "Charly, un gestionnaire de versions pour la CAO en architecture". Doctoral thesis, Aix-Marseilles, 1989.
[28] D.J. Penney and J. Stein, "Class modification in the GemStone object-oriented DBMS". OOPSLA '87 (SIGPLAN Notices). vol. 22, no. 12, pp. 111-117, 1987.
[29] L. Tan and T. Katayama, "Meta operations for type management in object-oriented databases - a lazy mechanism for schema evolution", in Proc. First International Conference on Deductive and Object-Oriented Databases, DOOD '89. Kyoto, Japan, North-Holland, pp. 241-258, 1989.
[30] J. Banerjee, H.-T. Chou, H.J. Kim and H.F. Korth, "Semantics and implementation of schema evolution in object-oriented databases". ACM SIGMOD International Conference on Management of Data, SIGMOD Record. vol. 16, no. 3, pp. 311-322, 1987.
[31] D. Sjøberg, "Quantifying schema evolution". Inf. Softw. Technol. vol. 35, no. 1, pp. 35-44, 1993.
[32] P. van Bommel, "A randomised schema mutator for evolutionary database optimisation". Aust. Comput. J. vol. 25, no. 2, pp. 61-69, 1993.
