IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, VOL. SE-11, NO. 7, JULY 1985
Gambit: An Interactive Database Design Tool for Data Structures, Integrity Constraints, and Transactions RICHARD P. BRAEGGER, ANDREAS M. DUDLER, JUERG REBSAMEN, AND CARL AUGUST ZEHNDER
Abstract-The design of a database is a rather complex and dynamic process that requires comprehensive knowledge and experience. There exist many manual design tools and techniques, but the step from a schema to an implementation is still a delicate subject. The interactive database design tool Gambit supports the whole process in an optimal way. It is based on an extended relational-entity relationship model. The designer is assisted in outlining and describing data structures and consistency preserving update transactions. The constraints are formulated using the database programming language Modula/R, which is based upon first-order predicate calculus. The update transactions are generated automatically as Modula/R programs and include all defined integrity constraints. They are collected in so-called data modules that represent the only interface to the database apart from read operations. The prototype facility of Gambit allows the designer to test the design of the database. The results can be used as feedback leading to an improvement of the conceptual schema and the transactions.
Index Terms-Data modules, database design, database programming language, entity relationship model, integrity constraints, propagation path concept.
I. INTRODUCTION
IN THE last few years, computer-controlled database systems (DBS) have been investigated increasingly. The first problem of an installation is the database design resulting in a conceptual schema. This process involves analyzing the enterprise information, collecting requirements, and organizing the data into a structure. To capture the dynamic properties, the operations that are relevant to the considered data must also be added to the conceptual schema. Available database systems usually provide two user interfaces: a database manipulation language, and a tool for the evaluation of ad hoc queries. The interactive design of auxiliary data (masks, forms, etc.) is also fully supported, but the database design itself is still subject to a manual procedure.
In this paper an interactive database design tool is proposed. It is part of the LIDAS database system [12]. LIDAS is based on our extended relational-entity relationship model, including notions of entity and binary hierarchical relationship [20], [24]. The database design system Gambit is an easily understandable tool for all users familiar with database techniques, even for nonspecialists. With Gambit the user may quickly develop different versions of a data structure. Working with those versions and with some test data leads to feedback, which may result in an improvement of the data structure. Therefore, Gambit allows prototyping of databases and may be used as a powerful development tool for larger applications.
The features of the interactive data definition system Gambit, which is fully operational, are described in this paper. Section II starts with a short overview of the architecture of the database system LIDAS and its use. Gambit supports the definition of static data structures in terms of the extended relational model (Section III) and the description of additional semantic integrity constraints by means of a full database programming language (Section IV). With these instruments the user is able to define the dynamic side of the database, i.e., the definition of integrity preserving transactions on the defined data structure (Section V).
Manuscript received June 1, 1984. R. P. Braegger, A. M. Dudler, and C. A. Zehnder are with the Swiss Federal Institute of Technology (ETH), Zurich, Switzerland. J. Rebsamen was with the Swiss Federal Institute of Technology (ETH), Zurich, Switzerland. He is now with the Fides Treuhandgesellschaft, Zurich, Switzerland.
0098-5589/85/0700-0574$01.00 © 1985 IEEE
II. DATABASE ARCHITECTURE AND DATA MODULES
A. Database System LIDAS
We believe that DBS for modeling and organizing user data should also be available in a workstation environment. User interfaces should be very simple and easy to understand since typical users are not database specialists. This was the main goal during the design of LIDAS. This system was designed for the personal computer Lilith [22]. Lilith is a single-user machine with 128 kword (16 bit) main memory and a 10 Mbyte cartridge disk drive. It has a graphics display and uses a mouse to point to screen positions. LIDAS is written in Modula-2, a high-level general purpose programming language with a systematic module concept [23]. Three different user interfaces are offered: an interactive data definition system called Gambit (for the user in the role of the database administrator) described in this paper; a database programming language Modula/R [14]; and an interactive data manipulation system called DISCUSS [21] (Fig. 1). All concepts of LIDAS are rather hardware independent. At the moment, the whole system is also being implemented on a VAX/VMS.
Modula/R extends the programming language Modula-2 to be a database programming language with the concepts of the relational database model [4]. Relations are integrated as a new data type that can be perceived as a set of elements of type record. In each relation a key has to be specified as a group of attributes of the relation elements which uniquely identifies the elements of a relation (identification key). To select elements of a relation and to operate with logical relational operations, existential and universal quantifiers have been added, and predicates of the power of first-order predicate calculus can be formulated. Standard application programs written in Modula/R can be used by casual users to communicate with the database. Casual users can also formulate queries or update the database using the interactive data manipulation system DISCUSS.
Fig. 1. User interfaces of LIDAS. (Labels in the figure: Database Administrator, Database Definition System Gambit, Casual User, Database Manipulation System DISCUSS, Programmer, Programming Language Modula/R, Database Management System RDS, Operating System, Database (phys).)
B. Data Modules
In a database, data must not only be stored and retrieved; the system must ensure the semantic integrity of data. For that, various approaches are presented in the literature; an overview is given in [13]. In most systems the programmer is allowed to use the database manipulation language (DML) operators to store and update the data in the database unrestrictedly. As a result of such an uncontrolled usage of DML operators, the database may become temporarily inconsistent with respect to the semantic integrity constraints defined in the schema. In conventional implementations, the correctness of these updates is usually checked after the execution. Due to the relatively high cost, many database management systems do not support integrity checking at all, leaving this task to the experienced application programmer. This situation is clearly undesirable, however, since the correctness of the data in the database cannot be guaranteed. Therefore, to get a powerful DBS the semantic integrity constraint management and checking should be done by the system itself.
Since the known techniques are rather complex, costly, and not comprehensive enough, a new solution is proposed (Fig. 2). The database definition system, Gambit, allows the database administrator to define the static data structure consisting of entity sets, integrity constraints, and update operations (i.e., insert, delete, and replace operations). All constraints that correspond to an entity are integrated into its operations (see Section V). Such operations are called transactions [8] since they preserve the data integrity. Only these transactions can be used to alter the database. This is achieved by introducing so-called abstract data types. They consist of an entity and its corresponding transactions. The concept of modules in Modula-2 supports the realization of abstract data types very well. A user-dependent set of such abstract data types is collected in a data module. These data modules form the interface to all application programs altering the database. They are automatically generated as Modula/R programs by Gambit. Thus no uncontrolled usage of DML operations is possible; the database can only be altered using these entity-dependent integrity preserving transactions.
Fig. 2. Architecture of the database system LIDAS. (Labels in the figure include Definition Time, Compilation Time, and Execution Time.)
III. DEFINITION OF A CONCEPTUAL DATA STRUCTURE
In this section the method of how to define a conceptual data structure (schema) with Gambit is explained. A schema is defined by stepwise refinement. The database model used in LIDAS is an extension of the relational model towards a binary hierarchical entity relationship model [3]. It includes extended association types for relationships and generalizations of entity sets [13]. The major goal of this highly interactive data definition tool is to help the designer as far as possible to define for his application an appropriate and correct data structure, and user-defined constraints and consistency preserving transactions as needed. The definition of user-defined constraints (exceeding the concepts of the extended relational model) is explained in Section IV and the definition of integrity preserving transactions in Section V.
A. The Extended Relational-Entity Relationship Model
Entity, Entity Set, Attribute, Identification Key: An entity is any distinguishable object of the real world or of the imagination that is to be represented in the database. Entities are described by values of attributes. Each attribute is associated with a value domain. Entities with equal attributes, but different attribute values, are grouped in entity sets. By defining entity sets, a real-world situation is modeled discretely. A central problem concerns the identification of entities. An identification key is an attribute (or a minimal attribute combination), which identifies each entity in an entity set. In our model identification key values are available for each entity and must not change during the existence of that entity. Each entity set corresponds exactly to one relation.
Relationships Between Entity Sets: While Codd's relational model analyzes the internal properties of relations, our extension covers first interrelational aspects. An important aspect in constructing a conceptual data structure is the definition of relationships between two entity sets. The type of a relationship between entity sets ES1 and ES2 is given by the two association cardinalities k12 and k21. The cardinality k12 is the number of entities in the entity set ES2 that may be associated to any one entity of ES1. This cardinality may vary between a lower bound l12 and an upper bound u12. In special
cases lower and upper bound may be equal or no upper bound may be specified. In other cases both are equal (l12 = u12 = k12). For most applications it is sufficient to distinguish between l12 = 0 and l12 = 1 for the lower bound, and between u12 = 1 and u12 > 1 for the upper bound (Fig. 3). A lower bound equal to zero and an unlimited upper bound do not restrict the data structure at all. Other values for l12 or u12, however, express that a minimal or maximal number of entities of ES2 must always exist in the database for each entity in ES1. These conditions have to be enforced by the DBS automatically and permanently (e.g., "Each person has one or two addresses" is equivalent to l12 = 1, u12 = 2). If these consistency conditions may be weakened in special cases (e.g., an arriving person may be accepted without address), the designer also has to describe such a situation (cf. Section V).
In the extended relational model, relationships are represented implicitly (by similar attributes of the involved entity sets) or explicitly by additional entity sets. In the latter case a relationship between two entities is seen as an entity itself. An implicit representation is only possible for hierarchical relationships; a relationship is called hierarchical if k12 or k21 is equal to one. All other relationships are transformed ("normalized") and represented explicitly. This avoids problems with null values and improves the internal representation of the database. Fig. 4 shows the modeling of a nonhierarchical relationship. For further details, see [13] and [25].
Fig. 3. Representation of a relationship.
Fig. 4. Explicit representation of a nonhierarchical relationship.
Local and Global Attributes, Static and Dynamic Domains: The systematic elimination of redundancy in the conceptual schema of a database is an important aspect in data modeling. A tool for this elimination is the concept of normal forms of entity sets [4], [5], [25]. With the construction of entity sets (= relations) in third normal form, the most frequent cases of redundancy are eliminated. However, global aspects outside one simple entity set are not considered.
For example, the redundancy based on overlapping entity sets is not removed by normalization. Therefore, the normalization process is extended. To describe it we concentrate on those attributes which identify entities and connect corresponding entities in different entity sets (= implicit relationships). An attribute is called global if it is used in at least one identification key. An attribute is called local if it is used in one entity set only, and is not part of its identification key. The global redundancy is eliminated if every attribute in a conceptual data structure is either global or local. That means it is not allowed to place the same attribute in different entity sets, but nowhere in the identification key. To avoid this situation, overlapping entity sets must be embedded in a new entity set, which takes over all common attributes as local ones (see the following Generalization section).
The next concept which is important in the extended relational model allows a very strict definition of (value) domains for all attributes used in the data structure. A static domain is similar to a scalar type of a high-level programming language; the set of values in a static domain does not change during the lifetime of the database. Local attributes are always based on static domains. A dynamic domain, however, is based on the actual content of the database, more precisely on the actual entities of a specific entity set. (For example, address is only meaningful in connection with a specific person; the attribute person-no in address has to be based, therefore, on the dynamic domain built by the key attribute person-no in the entity set person.) Global attributes are based on such dynamic domains in most cases; the dynamic domains are identical to the set of identification key values of other entity sets. Only once in a database does each global attribute itself have to be introduced and based on a static domain too. (For example, in the entity set person the key attribute person-no must be assigned to each person according to the rules of its data type only; all other references to person-no in the database are restricted to occurrences of entities in person, which may change frequently during the lifetime of the database.)
Generalization: Modeling the information aspects of a part of the real world within a database means first finding the right granularity for the entity sets in the conceptual data structure. A very general entity set collects a lot of entities with a few common properties (e.g., persons), whereas in another situation one would like to have a lot of small, but specialized entity sets (e.g., professors, secretaries, assistants). A good method to solve this ambiguity is to collect small entity sets within a more general entity set. This method is called generalization and was first introduced in [18]. One aspect of generalization was mentioned above, concerning overlapping entity sets to reduce global redundancy. Two criteria are used to classify generalizations: first, the specialized entity sets are either disjoint or overlapping; and second, the union of all specialized entity sets is either a subset of or equal to the generic entity set. A method to describe such types of generalization in an entity block diagram is shown in the following example.
Example: This example illustrates the use of the extended relational model. It shows the conceptual data structure of a database used for the administration of a university department. The database covers persons, which are specialized first as visitors and employees. The employees themselves are a generalization of secretaries, assistants, and professors. Each person has 1 or 2 addresses, each either a private address or a company address. The database also covers lectures. The many-to-many relationship between professors and lectures is represented by the explicit entity set participation (Fig. 5). The description of each entity is given by its name and a list of its attributes. The attributes forming the identification key are printed in italics.
person(persno, personkind, name, firstname, title)
employee(persno, yearofbirth, roomno, telefonno, function, maritalstatus)
visitor(persno, interests)
address(persno, addressno, addressType, city, country)
privateaddress(persno, addressno, street, remark)
companyaddress(persno, addressno, line1, line2, line3, line4)
secretary(persno, secretaryno, skills)
assistant(persno, degrees)
professor(persno, researchobjects)
lecture(object, semester, name, time)
participation(object, semester, persno)
Fig. 5. Entity block diagram of a sample database for the administration of a university department.
Fig. 6. Definition of a new entity set.
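As an illustration of the association cardinalities described in this section, the following minimal Python sketch checks the rule "each person has one or two addresses" (l12 = 1, u12 = 2) against some test data. It is not part of Gambit or LIDAS; the function name `check_cardinality`, the list-based data layout, and the sample values are assumptions made for this illustration only.

```python
from collections import Counter

def check_cardinality(es1_keys, es2_refs, lower, upper=None):
    """Return the ES1 keys whose number of associated ES2 entities
    lies outside [lower, upper]; upper=None means no upper bound.

    es1_keys: identification key values of ES1 (e.g., persno in person)
    es2_refs: for every ES2 entity, the ES1 key it refers to
              (e.g., persno in address)
    """
    counts = Counter(es2_refs)
    violations = []
    for key in es1_keys:
        n = counts.get(key, 0)
        if n < lower or (upper is not None and n > upper):
            violations.append(key)
    return violations

# Hypothetical test data: person 1 has two addresses,
# person 2 has three, person 3 has none.
persons = [1, 2, 3]
address_persno = [1, 1, 2, 2, 2]

violations = check_cardinality(persons, address_persno, lower=1, upper=2)
print(violations)  # persons violating l12 = 1, u12 = 2
```

With a lower bound of zero and no upper bound the same function reports no violations, which matches the remark that such bounds do not restrict the data structure at all.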
B. Interactive Data Definition with Gambit
Gambit helps the user in the design of a schema following the method of stepwise refinement (top-down). During each design step Gambit is charged with all routine tasks, for example, with the schema translation from the entity block diagram into a verbal description of entity sets, as far as possible. Gambit even allows iterations and corrections in the design process, and shows immediately the consequences of a change of earlier decisions (e.g., the deletion of an identification key). And Gambit helps the user to find an appropriate representation of the data structure in certain cases like nonhierarchical relationships or recursive data structures (Fig. 7). Another advantage concerns the permanent checking of the data structure's integrity. At any moment the designer may ask for the status of his actual definition. Gambit then shows him which tasks are to be performed to complete the definition of his schema. After terminating the data structure design, Gambit generates a documentation consisting of an entity block diagram and the database module (in Modula/R) with all details concerning the entity sets. To provide these capabilities, the following concept was used. Gambit guides the user during the design process. At every moment he may choose from a variety of possible steps. So the user may specify local attributes for one entity set, while certain other entity sets or relationships are not yet defined. It is also possible to postpone important decisions, e.g., the definition of an identification key. Gambit takes care of such intermediate inconsistencies and makes them visible to the user upon request. At every moment the designer has two different views of his intermediate design product. He may see a global entity block diagram with all entity sets and relationships defined, or the verbal specification of one entity set with
all details, e.g., the identification key and all global and local attributes and domains. The global view is used to define entity sets and relationships and the local view to define identification keys and global and local attributes.
Fig. 7. Definition of a nonhierarchical relationship.
Operations of Entity Sets and Relationships: An entity set is marked by a rectangle on the screen. To define an entity set the user is asked by Gambit for the place of the rectangle and for the name of the set (Fig. 6). He marks the position with the cursor and types the name. This name is automatically checked for uniqueness. The deletion of an entity set causes the deletion of all involved relationships. This implies that global attributes and foreign keys based on this entity set are removed from the whole schema. All these dependent operations are executed by Gambit only after asking the user for his final decision.
The specification of relationships is performed using the graphical representation of the entity block diagram. The user first has to indicate (with the cursor on the screen) the two entity sets involved. Then Gambit shows the relationship by drawing a connection line. Afterwards, the user is asked for the relationship type, i.e., for both cardinalities k12 and k21. These cardinalities may be chosen from a menu and only some special subranges must be specified further. If the defined relationship is not hierarchical (e.g., in Fig. 7), the designer is urged to define a special entity set for this relationship, but he only has to indicate the place and to type in the name of the new entity set. Gambit itself then generates two appropriate hierarchical relationships with modified association types. In a similar manner Gambit helps the user to eliminate re-
... used for the identification attributes that are not imported. Domains of imported attributes must not be changed.
... 1965) or a single entity as a whole (e.g., year-of-birth <= year-of-death). Other constraints concern an entity as part of an entity set (e.g., uniqueness) or as part of the data structure (e.g., relationships). Constraints may also concern a whole entity set (e.g., average < 50) or even more than one entity set. The amount of work to test these constraints is small for a single attribute and increases for constraints concerning a whole entity set.
Fig. 10. Definition of a constraint. (Menus in the figure: operation: insertion, deletion, modification, their pairwise combinations, or all operations; condition: expression, result procedure, UNIQUE, or none; reaction: warning, error, or trigger; followed by input according to the selection.)
Operation Dependency: Integrity constraints describe a consistent state of the database, and at first view they should therefore not be dependent on database operations. However,
there are constraints which are operation dependent. For example, it is not possible to change the marital status of a divorced person to widowed. There is even another reason to bind an integrity constraint to a set of database operations. It is possible that a constraint may only be violated by some, but not all, operations. For example, it is not possible to violate a uniqueness constraint by the deletion of an entity. Therefore, not all defined constraints have to be tested against every database operation. Gambit binds every model-independent constraint to certain operations (insert, delete, modify).
Reaction in Case of Violation: When introducing an integrity constraint, one must specify the reaction in case of a violation. Three possible cases of error reactions are distinguished:
A strong integrity constraint is defined by specifying the reaction as an error. The DBS must not tolerate any violation. In case of violation, the currently running transaction is either aborted, or the user is forced to continue with new consistent data.
A weak integrity constraint is defined by specifying the reaction as a warning. The DBS is assumed to enforce this constraint, but is ready to tolerate some exceptions. In case of a violation, a warning is given by the DBS, and the user has to decide whether to tolerate this violation or to enter new consistent data.
A more detailed reaction may be planned if, instead of an error message or a warning, a reaction procedure (trigger) is called. This procedure may correct inconsistent data automatically, or may perform other activities on or from the database.
B. Definition of User-Defined Constraints with Gambit
For the definition of user-defined constraints, Gambit offers the constructs of the database programming language Modula/R [14]. Therefore, it allows the same semantic possibilities as for the data manipulation. For the interactive use of LIDAS it is helpful if inconsistent input is detected as early as possible.
Therefore, every constraint is checked before any database operations are done. The user must be aware of this when defining a constraint.
Definition of a Constraint: The definition of a constraint starts by selecting the concerned entity set. Gambit then shows a list of all attributes of this entity set with their data types, and opens an empty window, where the definition of the constraint will be written. This definition consists of three parts, indicated by corresponding menus (Fig. 10). First, the designer has to specify before which database operations this constraint has to be checked. In the second step the user defines the actual constraint. He may choose one of the following cases.
1) Expression: A Boolean expression is expected. The user enters a Modula/R expression using the keyboard. This expression must evaluate to TRUE before the assigned operations are performed. Within the expression the entity to be inserted is referenced by a record with the name new and an entity to be deleted by a record with the name old.
2) Result Procedure: The user specifies the call of a Boolean result procedure, possibly with actual parameters. The body of this procedure is not defined with Gambit but with the normal text editor on a special trigger file. Although this file is not physically combined with the other data descriptions, it also belongs to the conceptual schema. Like an expression, this result procedure must return TRUE before the assigned operations are performed.
3) Unique Constraint: The designer indicates (using the cursor) some attributes of this entity set for a unique constraint. (This case could also be implemented with a Modula/R expression.)
4) None: No condition is specified. The following reaction is performed in every case.
The third and last step of a constraint definition is the specification of the reaction in case of a violation. The user has to choose one of three possibilities in a menu.
1) Warning: A warning text is entered by the keyboard. The defined condition becomes a weak integrity constraint.
2) Error: An error text is entered by the keyboard. The defined condition becomes a strong integrity constraint.
3) Trigger: The user specifies the call of a trigger procedure, possibly with actual parameters. The body of this procedure is also defined on the trigger file.
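The three parts of a constraint definition (operations, condition, reaction) can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Modula/R code that Gambit generates: the names `Constraint`, `IntegrityError`, and `check`, the `accept_warning` callback that stands in for the user's decision, the rule that the operation proceeds after a trigger has run, and the "positive integer" person-number test are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

class IntegrityError(Exception):
    """Raised when a strong integrity constraint is violated."""

@dataclass
class Constraint:
    operations: frozenset              # subset of {"insert", "delete", "modify"}
    condition: Optional[Callable]      # (old, new) -> bool; None is the "None" case
    reaction: str                      # "warning", "error", or "trigger"
    text: str = ""                     # warning or error text
    trigger: Optional[Callable] = None # reaction procedure for "trigger"

def check(constraint, operation, old=None, new=None,
          accept_warning=lambda text: False):
    """Check a constraint BEFORE the operation is performed.
    Returns True if the operation may proceed."""
    if operation not in constraint.operations:
        return True                    # constraint is not bound to this operation
    if constraint.condition is not None and constraint.condition(old, new):
        return True                    # condition holds
    # Condition violated, or no condition: perform the reaction.
    if constraint.reaction == "error":
        raise IntegrityError(constraint.text)
    if constraint.reaction == "warning":
        return accept_warning(constraint.text)  # user decides whether to tolerate
    constraint.trigger(old, new)       # assumption: operation continues afterwards
    return True

# Hypothetical rule: a person number must be a positive integer.
persno_rule = Constraint(
    operations=frozenset({"insert", "modify"}),
    condition=lambda old, new: isinstance(new["persno"], int) and new["persno"] > 0,
    reaction="error",
    text="wrong format of a person number",
)

ok = check(persno_rule, "insert", new={"persno": 42})
skipped = check(persno_rule, "delete", old={"persno": -1})
try:
    check(persno_rule, "insert", new={"persno": -1})
    rejected = False
except IntegrityError:
    rejected = True
```

Binding the constraint to a set of operations means that the deletion in the example is never tested at all, which mirrors the observation that not every constraint has to be checked against every operation.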
C. Examples
The first example shows the result of the definition process according to Section IV-B, asking for a special test for every person number (the definition of the procedure must be prepared on the trigger file):

    BEFORE INSERTION, MODIFICATION
    ASSERT PersNoTest(new.persno)
    ELSE ERROR "wrong format of a person number";

    PROCEDURE PersNoTest (no: PersNumber): BOOLEAN;
      VAR ok: BOOLEAN;
    BEGIN
      ;
      RETURN ok
    END PersNoTest;
    ...
To describe the marital status of an employee the values single, married, divorced, and widowed are used (single means never married). Not all modifications of this field are permitted, e.g., a widowed person can only remain widowed or get married again. Fig. 11 shows the constraint that checks the altering of this field.
V. TRANSACTION MODELING
In recent years, research in database design has concentrated mainly on the static aspects of data structures. Operational aspects are treated more casually. An integrated design of structural and behavioral properties is still an exception.
Fig. 13. Propagation rules. (The figure covers the INSERT and DELETE operations and distinguishes no propagation, strict propagation, and conditional propagation.)

    CONSTRAINTS of ENTITY SET employee
    No 1
    BEFORE MODIFICATION
    ASSERT (old.maritalstatus = new.maritalstatus) OR
           (new.maritalstatus = married) OR
           ((old.maritalstatus = married) AND (new.maritalstatus # single))
    ELSE WARNING "Illegal change of the maritalstatus"

Fig. 11. Definition of an additional integrity constraint.
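The marital-status constraint of Fig. 11 can be explored by enumerating every possible change of the field. The following Python sketch does this; it assumes that the comparison with single in the third clause is an inequality (the operator is not legible in the figure, but an inequality is the reading consistent with the rules stated in the text). The names `STATES` and `change_allowed` are hypothetical.

```python
from itertools import product

STATES = ("single", "married", "divorced", "widowed")

def change_allowed(old, new):
    """Condition of the constraint in Fig. 11, with the third clause
    read as 'new is not single' (an assumption, see lead-in)."""
    return (
        old == new
        or new == "married"
        or (old == "married" and new != "single")
    )

# All real changes (old differs from new) that the constraint accepts.
allowed = sorted(
    (old, new)
    for old, new in product(STATES, STATES)
    if old != new and change_allowed(old, new)
)
for old, new in allowed:
    print(old, "->", new)
```

Under this reading a divorced person cannot become widowed and a widowed person can only remarry, in agreement with the two examples given in the text.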