Foundations of Entity-Relationship Modeling

Bernhard Thalheim

Computer Science Department University of Rostock Albert-Einstein-Str. 21 D-O-2500 Rostock, FRG thalheim @ informatik.uni-rostock.dbp.de

Submitted to Annals of Mathematics and Artificial Intelligence, August 1991

Abstract

Database design methodologies should facilitate database modeling, effectively support database processing, and transform a conceptual schema of the database into a high-performance database schema in the model of the corresponding DBMS. The Entity-Relationship Model is extended to the Higher-order Entity-Relationship Model (HERM), which can be used as a high-level, simple and comprehensive database design model for the complete database information on the structure, operations, static and dynamic semantics. The model has the expressive power of semantic models and possesses the simplicity of the entity-relationship model. The paper shows that the model has a well-founded semantics. Several semantical constraints are considered for this model.

1 Introduction

The problem of database design can be stated as follows: Design the logical and physical structure of a database in a given database management system to contain all the information required by the user and required for an efficient behavior of the information system. The implicit goals of database design are:

• to meet all the information (content) requirements of the entire spectrum of users in the given application area;
• to provide a "natural" and easy-to-understand structuring of the information content;
• to conserve the whole semantic information of the designers for a later redesign;
• to achieve all the processing requirements and a high degree of efficiency of processing;
• to achieve logical independence for query and transaction formulation on this level.



HERM Foundations


While the inputs to the process are informal, the final output of the database design is a database definition with formal syntax and with qualitative and quantitative decisions regarding problems of physical design such as physical placement, indexing and organization of data. This adds complexity to the database design process, in that a formal design must be produced from, at times, extremely informal available information. The main complexity of the design process is already given by the complexity and number of items included in the database scheme, and further by the semantics defined for the database and the operations. A design system captures a lot of information about schemes under design. It needs a structure in which schema information can be expressed and transformed: a database model that can support all phases of the design process. Nowadays, the design process is understood to capture the structural design and the modelling of the semantics, as well as the description of the behavior of the database, especially the operations defined for the database.

Database design is based on one or more data models. A large number of conceptual data models have been proposed. However, actual experience with the use of these models as a basis for implementing a generalized DBMS is very scant. While most models have been proposed primarily for stand-alone database management systems and are adapted to implementational restrictions in database systems, it is increasingly apparent that data models will, on the one hand, be incorporated directly into programming languages and a variety of tools (e.g. CAD/CAM, expert systems, knowledge bases) and, on the other hand, have to be extended to interoperating environments and to multisystem and multimodel paradigms.
Nearly all early commercial DBMS implementations were based on the hierarchical model (such as IMS and SYSTEM-2000), the network model (such as IDS and IDMS) or the relational model (such as INGRES and DB2). The relational data model was proposed as a simple and theoretically well-founded representation of data, and it soon became the most important model for database systems (see for example [PDG89,Ull89]). The primary virtues of the model are its rigorous mathematical foundation and the correspondence of a relation with the notion of a table. However, research efforts have highlighted a large number of drawbacks to the relational model. Rather than abandon the relational paradigm because of these disadvantages, we are interested in extending relational languages in a way that incorporates useful ideas from alternative language paradigms but allows the retention of most, if not all, of the advantages of the relational approach.

The entity-relationship (ER) model is one of the most popular database design models. Despite numerous positive features of the ER approach, there still exists a strong need for a theoretical basis. This theory must be able to define sets of semantically well-formed ER schemes for particular user-specific ER techniques as well as for subproblems such as scheme design, view integration, query generation, and scheme transformation. Additionally, the formalism has to be suited for computer-aided software engineering tools. In [YaT89,Tha89] the suitability of the HERM approach for the solution of database design problems is shown. One reason for the huge variety of extensions of the ER model is that a well-founded theory is still under development. Codd [Cod90, p. 477] even states that the entity-relationship model "is clearly the winner in terms of its lack of precise definitions, lack of clear level of abstraction, and lack of mental discipline". We show that a precise, formal definition exists.
For this purpose, the classical entity-relationship model is evolved to the higher-order entity-relationship model (HERM), which can support design in any of the main classical data models and higher-order data models, and also translation among them. It supports integrity constraints. Constraint declarations include: attribute data types, non-null attributes, attribute combinations forming primary and candidate entity keys, functional dependencies, multivalued dependencies, and inclusion dependencies. They also include relationship cardinalities and other dependencies. The chosen constraint set is powerful enough to capture the constraints in each schema, and to support the generation of equivalent schemes. Without constraints, there are only trivial equivalences between schemes. Without equivalence, it is impossible to justify transformations as provably preserving the information content of a schema.

The HERM methodology uses an abstract-data-type approach and is based on object-oriented modeling. Objects are to be handled and modelled in databases. They can own an object identifier and are characterized by values and references to other objects, i.e. $o = (i, \{(s, v)\}, \{ref\})$. The value characterization is bound to an already defined structure $s$. Characterizing properties of objects are described by attributes, which form the structure of the object. Objects furthermore have a special semantics and a general semantics. Operators are associated with objects. These operators have a behavior. Objects which have the same structure, the same general semantics and the same operators are collected in classes. The structure, the semantics and the operations of a class are represented in types. Modelling of objects includes in this case the association of objects with classes $C$ and their corresponding value type $T$ and reference type $R$. Therefore, objects are represented by $o = (i, \{(C, T, v)\}, \{(C, R, ref)\})$.

The known design methodologies vary in the scale of information to be modelled in the types. If objects in the classes can be distinguished by their values, then the identifiers can be omitted and we use value-oriented modelling. Otherwise, we use an object-oriented approach. Within the object-oriented approach, different approaches can be distinguished.
If all objects are identifiable by their value types or by references to identifiable objects, the database is called value-representable. In this case, the database can also be modelled by the value-oriented approach, and a mapping from the value-representable scheme to a value-oriented scheme exists. If the database is not value-representable, then we have to use object identifiers. It is well known that in this case either the identifier handling must be made public or the databases cannot be updated and maintained. Therefore, value-representable databases are of special interest.

Normally, objects do not exist in a database independently. An object is called a kernel object (or independent) if its existence in the database is independent of the existence of any other object in the database. An object is called characteristic if it describes some other object. Objects can perform a superordinate role in interrelating other objects, in which case they are called associative. There exist associations among objects. Associations can also be objects. Kernel objects are described by entities in the value-oriented approach. All other objects can be entities or relationships. Kernel objects can be distinguished by their values of some attributes. These attributes are called the key. In value-representable databases, objects are kernel objects if they are identifiable by their values. These objects are represented by entities. All other objects are represented by relationships.

The classical entity-relationship model uses entity types for the representation of kernel objects and of other objects which are not associations. Only associations are represented by relationships. The recently developed standard partially drops this strictness [STH91]. The HERM approach uses the weakest form of the distinction between entities and relationships which is theoretically sound. Kernel objects are described by entity types. All other objects, especially existence-dependent objects like characteristic objects, are describable by relationship types.

The paper is organized as follows. In Section 2, HERM is introduced informally and formally. In Section 3, some logical fundamentals of the model are considered: the theory of integrity constraints. Then we discuss the application and extensions of the proposed approach.

2 The Higher-Order Entity-Relationship Model

We now introduce the structural part of the higher-order entity-relationship model. In this section, it is shown that a well-founded theory exists for this model. Besides the existence of a strong theoretical basis, there are several other advantages of the HERM approach: HERM-schemes are much simpler and easier to understand than ERM-schemes. HERM-schemes support abstraction in a simple but comprehensible manner. HERM-schemes can be translated, together with the corresponding constraints, the corresponding user-defined operations and the generic operations, to normalized relational, hierarchical or network schemes.

The HERM type consists of information on the structure, (static) semantics, operations and behavior (dynamic semantics), i.e.

HERM-Type = Structure + Semantics + Operations + Behavior.

This notion can be generalized to the following more general form, which is out of scope for the purposes of this chapter:

HERM-Type = Structure + Semantics + Operations + Behavior + Environment.

2.1 Overview

The higher-order entity-relationship model has the following modelling constructs:

Simple attributes: For a given set of domains, attributes and their corresponding domains are defined.

Nested attributes: Using basic types, complex attributes can be defined by means of the following constructors:

• Tuple constructor. Using nested attributes, a new nested attribute is defined by the cartesian aggregation.
• Set constructor. Using a nested attribute, a new nested attribute is defined by the set aggregation.

Additionally, the bag and the list constructors can be used. For the sake of simplicity we use here only the tuple and set constructors.

Entities: Entity types are characterized by their attributes. Entity types have a set of attributes which can be used as the identifier of the type.

Clusters: The union of types is called a cluster.

First-order relationships: First-order relationship types are defined to be associations between entity types or clusters of those. They can be additionally characterized by attributes.

Higher-order relationships: The relationship type of order i is defined as an association of relationship types of order less than i or entity types, and can be additionally characterized by attributes.

Integrity constraints: A corresponding logical operator can be defined for each type. A set of logical formulas using this operator can define the integrity constraints which are valid for each instance of the type.

Operations: Operations can be defined for each type.

• The generic operations insert, delete and update are defined for each type.
• The algebra consists of classical relational and set operations like projection, restricted complement, union, intersection, difference and restrictions defined on expressions for formulas, cartesian product and extension, technical operations for handling of nested attributes and of sequences like reordering of sequences and of attributes, copying, renaming, and additionally operations like tagged clustering, general clustering and declustering.
• Each type can have a set of (conditional) operations.
• Based on the algebra, query forms and transactions can be specified.

Examples of the constructs are the following. The name of a person is defined by the cartesian aggregation Name(First, Fam). The membership in societies can be defined by the set aggregation attribute Membership{Member(Society, Since)}. The address of persons is usually defined as a complex attribute, for example Address(State, City(Code, Town), Street(Name, House(Numb, Appartm))). A person can be characterized by its social security number, its name, its address and its sex, and has the set {SSN} as the identifier, i.e.

Person = ({SSN, Name(First, Fam), Adr(Zip, Town, Street(Name, Nr)), Sex}, {SSN}).

The relationship marriage of persons can be modelled by

Marriage = (Person, Person, From, To).
An example of an integrity constraint over marriage is the following predicate

CorrectMarr := Marriage(Person(Sex),1) $\neq$ Marriage(Person(Sex),2)

which indicates that the sex of the first person in Marriage must be different from the sex of the second person in Marriage. The function

FamilyName(x) := (Person(x)[Name])[Fam]

is an example of an operation defined for the type Person. The function

Spouse(x) := Marriage((Person(Name)=x,1))[(Person(Name),2)] + Marriage((Person(Name)=x,2))[(Person(Name),1)]

is an example of an operation which involves more than one type. Such operations can be used to define roles of relationships. Operations can be defined using preconditions and postconditions. If, for instance, a type

Student = ({SSN, Name(First, Fam), Adr(Zip, Town, Street(Name, Nr)), Sex, StudentNumber}, {StudentNumber})

is defined and any student is a person which is stored in the database, then the insertion operation for the type Student can be defined with a precondition or with a postcondition based on the operation Insert:

Add1(Student,(x,y,z,u,v)) := if Person(x,y,z,u) then Insert(Student,(x,y,z,u,v));

Add2(Student,(x,y,z,u,v)) := Insert(Student,(x,y,z,u,v)) [if not Person(x,y,z,u) then Insert(Person,(x,y,z,u))].

The HERM differs from the classical ERM and from the extended ERM in several constructs.

1. Constructs which can be found in different extensions of the classical ERM are: nested attributes, first-order relationships of higher arities, clusters and some integrity constraints like the complexity.
2. The following constructs are new: higher-order relationships, integrity constraints including the generalized complexity, operations and conditional operations.
3. Since weak entity types and Is-A-relationships can be represented directly in the HERM, there is no direct need for these constructs in the HERM.
4. Kernel objects are distinguished from other objects and are represented by entity types. All other objects are represented by relationship types. For this reason the schemes in HERM are several times simpler than ERM schemes.

Now we introduce the formal definition of the structural part of the model.
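The two conditional insertion operations Add1 and Add2 can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the entity sets are modelled as in-memory Python sets, tuples follow the attribute order (x, y, z, u, v) used in the text, and the names `persons`, `students`, `insert`, `add1` and `add2` are hypothetical and not part of HERM.

```python
# Hypothetical in-memory entity sets (an assumption made for illustration).
persons = set()    # tuples (x, y, z, u): SSN, name, address, sex
students = set()   # tuples (x, y, z, u, v): person part plus student number


def insert(entity_set, tup):
    """Generic Insert operation on an entity set."""
    entity_set.add(tup)


def add1(student):
    """Add1: precondition style. Insert only if the person is already stored."""
    x, y, z, u, v = student
    if (x, y, z, u) in persons:
        insert(students, student)
        return True
    return False


def add2(student):
    """Add2: postcondition style. Insert, then add the person if it is missing."""
    x, y, z, u, v = student
    insert(students, student)
    if (x, y, z, u) not in persons:
        insert(persons, (x, y, z, u))
```

With Add1 the insertion is rejected when the person is unknown, whereas Add2 always inserts the student and repairs the database state afterwards.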

2.2 Entity and Relationship Types

A data scheme $DD = (U, D, dom)$ is given by a finite set $U$ of simple attributes, by a set $D = \{D_1, D_2, \ldots\}$ of domains, and by an arity or domain function $dom : U \to D$ which associates with every simple or atomic attribute its domain. Based on the notion of the data scheme, we can also introduce more complex attributes, the so-called nested attributes. Let a data scheme $DD = (U, D, dom)$ and a set of names $NA$ different from $U$ be given. We now formally define the set $UN$ of nested attributes.




1. $\emptyset \in UN$.
2. $U \subseteq UN$.
3. If $X_1, \ldots, X_n \in UN$ and $X \in NA$ then $X(X_1, \ldots, X_n)$ is a (tuple-valued) nested attribute in $UN$. This attribute can also be referred to by the short notation $X$.
4. If $X' \in UN$ and $X \in NA$ then $X\{X'\}$ is a (set-valued) nested attribute in $UN$.
5. No other elements are in $UN$.

Note that in [GPV88] an analogous notion of nested attributes is introduced, but with the following additional condition: for different nested attributes $X, Y$, the sets $AT(X)$ and $AT(Y)$ of atomic attributes used in $X$ and $Y$ are disjoint.

Now we can extend the function $dom$ to $Dom$ on $UN$.

1. $Dom(\emptyset) = \emptyset$.
2. For $A \in U$, $Dom(A) = dom(A)$.
3. For $X(X_1, \ldots, X_n) \in UN$, $Dom(X) = Dom(X_1) \times \ldots \times Dom(X_n)$, where $M_1 \times \ldots \times M_n$ denotes the cartesian product of the sets $M_1, \ldots, M_n$.
4. For $X\{X'\} \in UN$, $Dom(X\{X'\}) = Pow(Dom(X'))$, where $Pow(M)$ denotes the power set of the set $M$.

For the data scheme $DD$, let $\mathbf{D}$ denote the union of all sets $Dom(X)$ for $X \in UN$. A tuple on $X \subseteq UN$ and on $DD = (U, D, dom)$ is a function $t : X \to \mathbf{D}$ with $t(A) \in Dom(A)$ for $A \in X$.

Let us now define the key concept based on shallow equality and the structure [AlT90]. For that we need the definition of the subattribute. We extend the approach of [Yok88].

1. For $A \in U$, $\emptyset$ and $A$ itself are subattributes of $A$.
2. For $X(X_1, \ldots, X_n) \in UN$, $\{Z_1, \ldots, Z_m\} \subseteq \{X_1, \ldots, X_n\}$ and subattributes $Y_i$ of $Z_i$ ($1 \le i \le m$), $X(Y_1, \ldots, Y_m)$ is a subattribute of $X(X_1, \ldots, X_n)$.
3. For $X\{Y\} \in UN$ and a subattribute $Z$ of $Y$, $X\{Z\}$ is a subattribute of $X\{Y\}$.

For a given set of attributes $X = \{X_1, \ldots, X_n\}$, $\{Z_1, \ldots, Z_m\} \subseteq \{X_1, \ldots, X_n\}$ and subattributes $Y_i$ of $Z_i$ ($1 \le i \le m$), $\{Y_1, \ldots, Y_m\}$ is a generalized subset of $\{X_1, \ldots, X_n\}$. Let a set of tuples $r$ on $X$ and $DD$ and a generalized subset $Y$ of $X$ be given. $Y$ is called a key of $r$ if all elements of $r$ can be distinguished using $Y$. Normally, the key is considered to be minimal. Then for a set of sets of tuples on $X$, $Y$ should be the smallest set.
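The inductive definitions of $UN$ and $Dom$, and the key condition, can be mirrored in a small Python sketch. It rests on assumptions that are not part of the paper: domains are finite Python sets, tuple values are Python tuples, set values are frozensets, and the key test is restricted to top-level attributes rather than arbitrary generalized subsets. The class and function names (`Simple`, `TupleAttr`, `SetAttr`, `in_dom`, `is_key`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Simple:
    """A simple attribute A in U with its (finite, illustrative) domain dom(A)."""
    name: str
    domain: frozenset


@dataclass(frozen=True)
class TupleAttr:
    """A tuple-valued nested attribute X(X1, ..., Xn)."""
    name: str
    components: tuple


@dataclass(frozen=True)
class SetAttr:
    """A set-valued nested attribute X{X'}."""
    name: str
    element: object


def in_dom(attr, value):
    """Check value in Dom(attr), following the inductive definition of Dom."""
    if isinstance(attr, Simple):
        return value in attr.domain
    if isinstance(attr, TupleAttr):
        return (isinstance(value, tuple)
                and len(value) == len(attr.components)
                and all(in_dom(c, v) for c, v in zip(attr.components, value)))
    if isinstance(attr, SetAttr):
        return (isinstance(value, frozenset)
                and all(in_dom(attr.element, v) for v in value))
    raise TypeError("not a nested attribute")


def is_key(relation, key_attrs):
    """True if the tuples of relation are pairwise distinguishable on key_attrs.

    Simplification: key_attrs lists top-level attribute names only.
    """
    seen = set()
    for t in relation:
        projection = tuple(t[a] for a in key_attrs)
        if projection in seen:
            return False
        seen.add(projection)
    return True
```

For example, a relation with two persons who share the value of Name but differ in SSN has ["SSN"] as a key but not ["Name"].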
The key $Y$ is called a minimal key of $r$ if all elements of $r$ can be distinguished using $Y$ but no proper generalized subset of $Y$ can be used to distinguish the tuples of $r$.

An entity type $E$ is a pair $(attr(E), id(E))$, where $E$ is an entity set name, $attr(E)$ is a set of attributes and $id(E)$ is a non-empty generalized subset of $attr(E)$ called key or identifier. Concrete entities $e$ of $E$ can now be defined as tuples on $attr(E)$. For a fixed moment of time $t$, the present entity set $E^t$ for the entity type $E$ is a set of tuples $r$ on $attr(E)$ for which $id(E)$ is a key.

This approach is more general than the approaches used in (NF)$^2$ relations. In order to use a simple semantics, partitioned normal forms are introduced in [BaRS82] and [Hul90]. An entity set is in partitioned normal form if the atomic attributes are a key of the set and any non-atomic value that is a component of a tuple of the relation is also in partitioned normal form. However, the subclass of entity sets in partitioned normal form is not closed in general under the algebra of nested relations [PDG89]. Furthermore, it is not natural to restrict entity types to those with key sets consisting only of atomic attributes. For instance, the entity type

Person = ({Name(First, Fam), Adr(Zip, Town, Street(Name, Nr)), Sex}, {Name(First, Fam)})

would not be allowed.

It is possible to use behavior equality [AlT90] for the key definition. Then the key definition corresponds exactly to the key definition of (NF)$^2$ relations after flattening with the unnest operation. These two approaches do not differ if the tuple constructor is used, but they differ if the set constructor is used. The following three tuples defined on

Phones = (Person, Telephones{(Rooms{Room}, Numbers{TelNumber})}),

$t = (a, \{(\{b\},\{c,d\}), (\{e,f\},\{d,h\})\})$,
$t' = (a, \{(\{b,e,f\},\{d\}), (\{b\},\{c\}), (\{f\},\{h\})\})$, and
$t'' = (a, \{(\{b\},\{d\}), (\{b\},\{c\}), (\{e\},\{d\}), (\{f\},\{d\}), (\{f\},\{h\})\})$

are not shallow equal but they are behavior equal. These tuples can carry different semantics.
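The difference between the two equality notions can be made concrete with a short Python sketch. It assumes that behavior equality is read as equality after complete unnesting, as suggested by the remark on flattening above, and that shallow equality is plain structural equality of the nested values. The function `unnest` and the variable names are hypothetical; the sketch compares the tuples $t'$ and $t''$ on the type Phones.

```python
def unnest(t):
    """Flatten a Phones tuple into a set of flat (person, room, number) triples."""
    person, telephones = t
    return {(person, room, number)
            for rooms, numbers in telephones
            for room in rooms
            for number in numbers}


# t' = (a, {({b,e,f},{d}), ({b},{c}), ({f},{h})})
t_prime = ("a", frozenset({
    (frozenset("bef"), frozenset("d")),
    (frozenset("b"), frozenset("c")),
    (frozenset("f"), frozenset("h")),
}))

# t'' = (a, {({b},{d}), ({b},{c}), ({e},{d}), ({f},{d}), ({f},{h})})
t_double_prime = ("a", frozenset({
    (frozenset("b"), frozenset("d")),
    (frozenset("b"), frozenset("c")),
    (frozenset("e"), frozenset("d")),
    (frozenset("f"), frozenset("d")),
    (frozenset("f"), frozenset("h")),
}))

shallow_equal = t_prime == t_double_prime                     # structural comparison
behavior_equal = unnest(t_prime) == unnest(t_double_prime)   # comparison after unnest
```

Here the nested values differ structurally, while both tuples flatten to the same set of (person, room, number) triples.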
Therefore, sometimes behavior equality or other equality concepts may be useful for the application area. The key definition on the basis of behavior equality is stronger than the key definition on the basis of shallow equality. There are other equality concepts [AlT90] which are weaker than shallow equality but still stronger than behavior equality.

In some cases, a super/subclass relationship with more than one subclass occurs. This choice can be modeled by the union of the subclasses. The construct to be used is called a cluster. Let a set of types $R_1, \ldots, R_k$ be given. These types can be clustered by a "category" or cluster $C = R_1 + R_2 + \ldots + R_k$. For a cluster $C = R_1 + \ldots + R_k$ we can similarly define a set $C^t$ as the union of the sets $R_1^t, \ldots, R_k^t$. If $R_1, \ldots, R_k$ are entity types (0-order relationship types) then $C$ is a cluster of entity types.

Let entity types $E_1, \ldots, E_k$ be given. A first-order relationship type has the form $R = (ent(R), attr(R))$ where $R$ is the name of the type, $ent(R)$ is a sequence of entity types and of clusters of those, and $attr(R)$ is a set of attributes from $UN$. Let a first-order relationship type $R = ((E_1, \ldots, E_n), \{B_1, \ldots, B_k\})$ and, for a given moment $t$, sets $E_1^t, \ldots, E_n^t$ be given. A first-order relationship $r$ is then definable as an element of the cartesian product

$E_1^t \times \ldots \times E_n^t \times dom(B_1) \times \ldots \times dom(B_k)$.

A first-order relationship set $R^t$ is then a set of relationships, i.e.

$R^t \subseteq E_1^t \times \ldots \times E_n^t \times dom(B_1) \times \ldots \times dom(B_k)$.

A set $\{E_1, \ldots, E_n, R_1, \ldots, R_m\}$ of entity schemes and first-order relationship types on a data declaration $DD$ is called consistent if the relationship schemes use only the entity schemes $E_1, \ldots, E_n$.

Let entity schemes $E_1, \ldots, E_k$ and relationship schemes $R_1, \ldots, R_l$ of orders not higher than $i$ be given. For $i > 0$, an $(i+1)$-order relationship type has the form $R = (ent(R), rel(R), attr(R))$ where $R$ is the name of the type, $ent(R)$ is a sequence of entity types or clusters from $\{E_1, \ldots, E_k\}$, $rel(R)$ is a sequence of relationship types or of clusters from $\{R_1, \ldots, R_l\}$ of order not higher than $i$, and $attr(R)$ is a set of attributes from $UN$. Each entity type is considered to be a 0-order relationship type.

This leads to a general criticism of the classical entity-relationship model. Codd [Cod90] states that there is no reason for distinguishing entity types and relationship types. Indeed, one designer's entity type could be another designer's relationship type. This variety of design decisions is one reason for the view integration problems [NDT88]. The explicit definition of the key is the only difference between entity and relationship types in our approach. The key concept can be generalized to relationship types [Tha91]. In this case, there is no difference between entity types and relationship types. In this paper, we assume that relationship types have identifiers consisting of the component relationship types. The HERM methodology is based on this approach.
However, it is easy to extend this methodology to the approach in [Tha91]. Therefore we can also use the notion $R = (ent(R), rel(R), attr(R), id(R))$ where $id(R)$ is a subset of the components and attributes of $R$. We notice furthermore that relational schemes can be defined as entity types. Therefore, the relational model can be interpreted as a special case of HERM.

Let a relationship type $R = (E_1, \ldots, E_n, R_1, \ldots, R_m, \{B_1, \ldots, B_k\})$ and, for a given moment $t$, sets $E_1^t, \ldots, E_n^t, R_1^t, \ldots, R_m^t$ be given. A relationship $r$ is then definable as an element of the cartesian product

$E_1^t \times \ldots \times E_n^t \times R_1^t \times \ldots \times R_m^t \times dom(B_1) \times \ldots \times dom(B_k)$.

A relationship set $R^t$ is then a set of relationships, i.e.

$R^t \subseteq E_1^t \times \ldots \times E_n^t \times R_1^t \times \ldots \times R_m^t \times dom(B_1) \times \ldots \times dom(B_k)$.

Example 1. Let us design a HERM scheme for a simple university application covering the following information:

1. A catalogue of persons working or studying in the university. A person has a social security number or person's number which uniquely identifies this person. Persons have a name (first and last names and titles) and an address (with a town, a zip, and a street).
2. A catalogue of students, which are characterized by their student numbers. A student is also a person. Students have a major and a minor specialization. They are supervised by a professor.
3. A catalogue of professors (with their specialisation). Professors are persons. Each professor is associated with a department.
4. A catalogue of courses given in the university and characterized by a unique course number and a name. A course can have different prerequisites.
5. A catalogue of course offerings with the course, the semester (year and season), the room (room number and building) and the professor.
6. A catalogue of projects characterized by a unique project number, the begin and the end, and the title.
7. A catalogue of student activities. Students can enroll in a certain course in a term. They obtain a final grade.
This can be modeled by the following entity and relationship types:

Person = ({Person's number, Name(LastName, FirstName, {Title}), Address(Zip, Town, Street(Name, Nr))}, {Person's number}),
Course = ({CNu, CName}, {CNu}),
Project = ({Num, Begin, End, PName}, {Num}),
Room = ({Nr, Building}, {Nr, Building}),
Student = (Person, {StudNr}),
Professor = (Person, {Speciality}),
Department = ({DName, Director, Phones{Phone}}, {DName}),
Major = (Student, Department, $\emptyset$),
Minor = (Student, Department, $\emptyset$),
In = (Professor, Department, $\emptyset$),
Has = (Project, Professor + Person, $\emptyset$),
Prerequisite = (Course, Course, $\emptyset$),
Supervisor = (Student, Professor, {Since}),
Lecture = (Professor, Course, Room, Semester, {Time(Day, Hour)}),
Semester = ({Year, Season}, {Year, Season}),
CourseEnrollment = (Student, Lecture, {Result}).

We used synonyms and homonyms for the attribute names. An example of homonyms is Project.PName and Course.CName.

In the above example we have used attribute identifiers. Those are useful for the translation of entity-relationship schemes and for the logical language which is used to express integrity constraints in the higher-order entity-relationship model.

If $A \in U$ then $A$ is an attribute identifier on $A$. If $X(X_1, \ldots, X_n)$ is a nested attribute and $A_i$ is an attribute identifier on $X_i$ then $X.A_i$ is an attribute identifier on $X$. If $X\{X'\}$ is a nested attribute and $A$ is an attribute identifier on $X'$ then $\{X.A\}$ is an attribute identifier on $X$. If $attr(E) = \{X_1, \ldots, X_m\}$ and $A$ is an attribute identifier on $X_i$ then $E.A$ is an attribute identifier on $E$. For $R = R_1 + R_2 + \ldots + R_k$ and an attribute identifier $A$ on $R_i$, $A$ is an attribute identifier on $R$ too. For $R = (R_1, \ldots, R_m, attr(R))$ and an attribute identifier $R'.A$ on $R_i$, $R.(R', i).A$ is an attribute identifier on $R$. If $R_i$ is unique in the sequence $R_1, R_2, \ldots, R_m$ then $R.R'.A$ is an attribute identifier on $R$. If $attr(R) = \{X_1, \ldots, X_m\}$ and $A$ is an attribute identifier on $X_i$ then $R.A$ is an attribute identifier on $R$.

Generally, each component type in a relationship type is specified by its name and the component number. It is also possible to use labels for names, and then the order in the sequence is not of importance. In recursive types, i.e. types which use the same component type in different positions, the correct naming is represented by the name and the component number, as already used in the definition of the attribute identifier. For instance, in Prerequisite, (Course,1) is different from (Course,2). Labeling is an alias concept. The label can also be used in operations and in semantics. For instance, the recursive type Prerequisite can be specified differently:

Prerequisite' = (Requires:Course, Required:Course, $\emptyset$).
In this case, Requires can be used instead of Requires:Course.

A set $\{E_1, \ldots, E_n, R_1, \ldots, R_m\}$ of entity schemes and relationship types on a data declaration $DD$ is called consistent if the relationship schemes use only the entity schemes $E_1, \ldots, E_n$ and the relationship schemes $R_1, \ldots, R_m$.

Let a set $ERDec = \{E_1, \ldots, E_n, R_1, \ldots, R_m\}$ of consistent entity and relationship schemes be given. Let $R(ERDec)$ be the set of all entity and relationship sets

$\{(E_1^t, \ldots, E_n^t, R_1^t, \ldots, R_m^t) \mid t \ge 0\}$.

Then it is possible to define a function $C$ of integrity constraints for the set $R(ERDec)$,

$C : R(ERDec) \to \{0, 1\}$.

For a given set $ERDec$ of consistent entity and relationship schemes and a function $C$ of integrity constraints, the pair $ERS = (ERDec, C)$ is called an entity-relationship scheme. For an entity-relationship scheme $ERS = (ERDec, C)$, an element $er$ from $R(ERDec)$ is called an entity-relationship database (ERS-database) if $C(er) = 1$.
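The consistency condition and the role of the constraint function $C$ can be illustrated with a minimal Python sketch over a fragment of Example 1. The encoding is an assumption made for illustration only: types are identified by their names, each relationship type is reduced to the list of its component names, a database state is a dictionary of sets, and `students_are_persons` is a hypothetical example constraint, not one stated in the paper.

```python
# Fragment of the university scheme; each relationship lists its components.
entity_types = {"Person", "Course", "Room", "Semester"}
relationship_types = {
    "Professor": ["Person"],
    "Student": ["Person"],
    "Lecture": ["Professor", "Course", "Room", "Semester"],
    "CourseEnrollment": ["Student", "Lecture"],
}


def is_consistent(entities, relationships):
    """Relationship types may use only declared entity or relationship types."""
    declared = set(entities) | set(relationships)
    return all(component in declared
               for components in relationships.values()
               for component in components)


def students_are_persons(state):
    """Hypothetical constraint C: every student refers to a stored person."""
    persons = state["Person"]
    return 1 if all(p in persons for p, _stud_nr in state["Student"]) else 0


def is_ers_database(state, constraint):
    """A state er is an ERS-database if C(er) = 1."""
    return constraint(state) == 1
```

A state in which a student tuple refers to an unknown person number is still an element of the set of all states, but it is not an ERS-database under this constraint.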

2.3 HERM-Diagrams

Now we introduce a graphical representation language for entity-relationship schemes, called entity-relationship diagrams (ERD), using the following building blocks. Let a data scheme $DD = (U, D, dom)$ and a set of consistent entity and relationship schemes $S = \{E_1, \ldots, E_n, R_1, \ldots, R_m\}$ be given. The entity-relationship diagram is a finite labeled digraph $G_S = (U \cup S, H)$ where $H$ is the set of directed edges, and an edge can be of one of the following forms:

1. $E_i \to A_j$;
2. $R_i \to A_j$;
3. $R_i \to E_j$;
4. $R_i \to R_j$.

E(ntity)-vertices are represented graphically by rectangles; A-vertices and R(elationship)-vertices are represented graphically by circles and diamonds, respectively. Clusters are represented by diamonds labeled with $\oplus$ or simply as a common input point to a diamond. The edges $E_i \to A_j$ can be labeled by $dom(A_j)$. Other edges can be labelled by integrity constraints, for instance cardinality constraints. This labelling concept is introduced in Section 3.2. Primary keys are underlined. Furthermore, if in a relationship type a set of attributes determines the components, then these attributes are underlined.

Example 1 (continued). Figure 1 shows the diagram of Example 1. The example shows that the proposed HERM extension solves some of the drawbacks of relational and entity-relationship database modeling, such as uniformity of relations, view definition, problems in deterministic normalization algorithms, problems in query languages, set inadequacy, and sequence modeling. The model also maintains the advantages of the relational and the entity-relationship model, such as availability of database design theory, query language, access path propagation, optimization, user-friendliness, visibility, understandability, and data dependency propagation.
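The edge set $H$ of the digraph $G_S$ defined in this section can be derived mechanically from a scheme. The following Python sketch shows one way to do this under simplifying assumptions: attributes are represented only by their top-level names, entity types map to attribute lists, relationship types map to a pair (components, attributes), and the function name `diagram` is hypothetical. Edge labels and clusters are not modelled, and because edges are stored as a set, a recursive type such as Prerequisite would contribute only one edge to its repeated component.

```python
def diagram(entities, relationships):
    """Return (vertices, edges) of the diagram digraph for a small scheme.

    entities:      name -> list of attribute names
    relationships: name -> (list of component type names, list of attribute names)
    """
    vertices = set(entities) | set(relationships)
    edges = set()
    for e, attrs in entities.items():
        vertices |= set(attrs)
        edges |= {(e, a) for a in attrs}        # edges E_i -> A_j
    for r, (components, attrs) in relationships.items():
        vertices |= set(attrs)
        edges |= {(r, c) for c in components}   # edges R_i -> E_j and R_i -> R_j
        edges |= {(r, a) for a in attrs}        # edges R_i -> A_j
    return vertices, edges


# Fragment of Example 1.
entities = {"Person": ["Person's number"]}
relationships = {
    "Student": (["Person"], ["StudNr"]),
    "Professor": (["Person"], ["Speciality"]),
    "Supervisor": (["Student", "Professor"], ["Since"]),
}
vertices, edges = diagram(entities, relationships)
```

In this fragment, Supervisor is a second-order relationship type, so its outgoing edges lead to the relationship vertices Student and Professor rather than to entity vertices.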

2.4 Benefits of the HERM approach

We demonstrate now that the proposed model has several benefits: The used constructs have a well-defined semantics. Using higher-order relationship types leads to simplification of schemes.




[Figure 1: HERM-Diagram of the University Database. The diagram shows the types Person, Student, Professor, Department, Course, Room, Semester and Project, the relationship types Supervisor, Major, Minor, In, Has, Prerequisite, Lecture and Enroll, and their attributes.]




The model is extensible by defining constructs as expressions of basic constructs. The definition of the type concept is general enough to accommodate also active types with associations and special semantics.

A relationship type $R = (R_1 \ldots R_m, attr(R))$ with $comp(R, R_i) = (1,1)$ is called an existence relationship. The item $R_i$ is called existence dependent on $R_1, \ldots, R_{i-1}, R_{i+1}, \ldots, R_m$. The items $R_1, \ldots, R_{i-1}, R_{i+1}, \ldots, R_m$ are called dominant types and $R_i$ is said to be a subordinate type.

The classical entity-relationship model also uses weak entity types. Entity types $E = (attr(E), id(E))$ are called strong if $id(E)$ is not empty; otherwise they are called weak. A relationship scheme is called consistent if all weak entity types are subordinate types. Weak entity types are characterized via relationships by other entity types. There is no unique treatment for weak entity types which are characterized via non-binary relationship types. If, for instance, two weak entity types are characterized via the same relationship type but differently, then the representation requires additional constructs. For this reason, there is no theoretical treatment for weak entity types if only structural constructs are used.

For instance, consider the strong entity types Person and Flat and the weak entity types Employment and InsuranceContract, which are characterized by both strong entity types. Employment is characterized only by Person, but with the additional restriction that employees need to have a flat before being employed. Insurance contracts are characterized by persons and their flats. Until now there is no construct to express this situation. One solution would be to separate the characterization of Employment and InsuranceContract. But in this case the additional restriction for employees needs to be expressed by a complicated path inclusion dependency. For the sake of simplicity, another representation would use a single relationship type Employ-Insur. This is unnatural, but it keeps the semantics and does not repeat information in two different relationship types.
Generally, it can be shown that the concept of weak entity types is consistent if a weak entity type is characterized only by its characterizing relationship type, if this relationship type is defined on that weak entity type and otherwise only on strong entity types, and if this relationship type is not used elsewhere. However, this is a strong restriction. For instance, we may want to keep geographic information for the post, such as addresses consisting of information on the state, the town, the street and the apartments. This information can then be represented by a path of weak entity types. It is easy to show that weak entity types can be modeled directly by relationships with additional attributes which can participate in the key. This modeling corresponds to the modeling approaches used in the network model and in the relational model. Therefore, there is no need for a special consideration of weak entity types.

Using the approach of [AlT90] we can also enhance the identifier concept for weak entity types. The identifier is defined on all attribute identifiers of the scheme. This concept can be shown to be well-defined. Let us illustrate this approach using the newspaper example in [YaT89]. The address is treated as an entity type and decomposed into the weak entity types State, Town, Road and Address. In this example the address is an object and not only a property. It is used as an address of a customer and as an address to be served by a deliverer. The




representation of an address as an attribute would require the introduction of a special complex integrity constraint. Therefore we get the following entity types.
State = ({Name, Code}, {Name}),
Town = ({City, PostOffice, Zip}, {State.Name, City}),
Road = ({Street, Location}, {Town.City, Town.State.Name, Street}),
Address = ({AppartmentNumber, HouseNumber}, {Road.Town.City, Road.Town.State.Name, Road.Street, AppartmentNumber, HouseNumber}).
The complex attribute identifiers indicate the existence of inclusion constraints between these entity types. Therefore, this approach is still too complicated. However, the HERM approach also allows a simpler representation by one entity type and three relationship types, which represents the same information in a simpler form, using the above-mentioned identifier definition in relationship types too.
State = ({Name, Code}, {Name}),
Town = (State, {City, PostOffice, Zip}, {State, City}),
Road = (Town, {Street, Location}, {Town, Street}),
Address = (Road, {AppartmentNumber, HouseNumber}, {Road, AppartmentNumber, HouseNumber}).

It is often stated that relationship types can only be first-order types. This statement is based on the observation that each normalized relational scheme can be represented by an entity-relationship scheme which uses only first-order relationship types. However, this leads to unnatural and complicated schemes.
In terms of the classical entity-relationship model [Ris88], the university example would use the following entity and relationship types for the information on course enrollment:
Person = ({Person's number, Name(LastName, FirstName, {Title}), Address(Zip, Town, Street(Name,Nr))}, {Person's number}),
Student' = ({Person's number, StudNr}, {StudNr}),
Professor' = ({Person's number, Speciality}, {Speciality}),
Course = ({CNu, CName}, {CNu}),
Student'-IsA-Person = (Student', Person, $\emptyset$),
Professor'-IsA-Person = (Professor', Person, $\emptyset$),
Room = ({Nr, Building}, {Nr, Building}),
Lecture = (Professor', Course, Room, Semester, {Time(Day,Hour)}),
Semester = ({Year, Season}, {Year, Season}),
CourseEnrollment = (Student', Course, Room, Professor', {Result}).
The IsA-relationships are treated in the HERM approach by first-order relationships. The relationship between course enrollment and lectures cannot be represented as simply as in the HERM approach. A student can enroll in a course given by a professor in a certain room only if this course information is kept in the lecture relationship set. It is surprising that translating this scheme leads to a relational scheme which is not minimal. Using the translation theory of [Teo89] we obtain the following two relation schemes:
LECTURE = ({Professor'.Person's number, Course.CNu, Room.Nr, Room.Building,




Time(Day,Hour), Semester.Year, Semester.Season}, {Professor'.Person's number, Course.CNu, Semester.Year, Semester.Season}),
ENROLL = ({Student'.Person's number, Course.CNu, Room.Nr, Room.Building, Professor'.Person's number, Result}, {Student'.Person's number, Course.CNu}).
These schemes need to be minimized. Otherwise a complex referential integrity constraint has to be maintained. Therefore, the translated scheme requires further normalization. For this reason, the ERM approach is not as usable as a good design requires. Furthermore, the normalized relational scheme has a direct HERM representation, and the HERM scheme can be translated directly to a normalized relational scheme.

The example discussed above shows that the direct representation of normalized relational schemes by entity-relationship schemes with first-order relationship types cannot be retranslated to the same scheme without additional minimization. There is also another reason why this approach is not so useful. Normally, the referential integrity in such schemes has to be represented by additional inclusion constraints. Although these inclusion constraints are structural constraints, they now have to be represented by semantical constraints. Let us consider a special case which shows that such constraints can be represented directly by the structure, without additional integrity constraints.

Proposition 2.1 Given the two relationship types $R = (ent(R), rel(R), attr(R), id(R))$ and $R' = (ent(R'), rel(R'), attr(R'), id(R'))$ with the identifier relationship $id(R) \subseteq id(R')$ and the inclusion dependency $R'[id(R)] \subseteq R[id(R)]$. Then $R'$ can be replaced by $R'' = (ent(R''), rel(R''), attr(R'), (id(R') \setminus id(R)) \cup \{R\})$, omitting the entity and relationship types which are in $id(R)$ from $ent(R'), rel(R'), attr(R')$. The identifier relationship and the inclusion dependency are represented by the definition of $R''$.

The HERM maintains the advantages of both models, the relational and the entity-relationship model. HERM schemes are normally simpler and easier to understand, whereas ERM schemes are more complicated (up to several times larger) and not so easy to understand. The Mathematical Reviews library example in [TWB89] uses 54 entity types and 58 relationship types. The HERM scheme represents these types using only 8 entity types and 17 relationship types [Tha90']. Only those classes which exist independently of the others are represented by entity types.

There are several reasons why we need n-ary relationship types. The binary relationship model uses only binary relationship types. Codd [Cod90] discusses some problems in this model. The proposal in [Hai90] to use only binary relationship types by introducing additional abstract (object) identifiers leads to more complex schemes and a loss of transparency. The methodology used in this paper for the representation of objects is based on the separation of kernel objects




from other objects. In this case we need higher-order relationship types for the representation of hierarchies between objects.

There are also other reasons for the utilization of higher-order relationship types. It is our intention to use the database theory developed especially for the relational model. In this case we should also develop a normalization approach for the HERM [Tha90']. It is well known that the normalization of relations cannot be represented in the entity-relationship model without semantical integrity constraints [JaN83,Lien80,MeZ90]. Let us restate the example of [JaN83] in our terminology. Given the relation schemes $E = (\{A\}, \{A\})$, $F = (\{B\}, \{B\})$, $G = (\{C\}, \{C\})$, $R_1 = (\{A, B\}, \{A, B\})$ and $R_2 = (\{A, B, C\}, \{A\})$ with the inclusion dependencies $R_1[A] \subseteq E[A]$, $R_1[B] \subseteq F[B]$, $R_2[C] \subseteq G[C]$, and $R_2[A, B] \subseteq R_1[A, B]$. This can be represented in the ERM approach by three entity types $E, F, G$ and the relationship types $R = (E, F, \emptyset)$, $S = (E, F, G, \emptyset)$ with the additional inclusion dependency $R[E, F] \supseteq S[E, F]$. But in reality the last inclusion dependency is a structural dependency. The HERM representation uses the three entity types $E, F, G$ and the relationship types $R = (E, F, \emptyset)$, $S' = (R, G, \emptyset)$ without additional integrity constraints.

Clusters can be generalized to algebraic expressions based on set-theoretic operations. Since expressions are linearly ordered, we can use this linear ordering for the definition of attribute identifiers. We can also use labels in algebraic expressions. In this case the usage of the cluster definition ordering can be restricted. This identification can also be used for the computation of generic operations. The identification can also be simplified by utilizing integrity constraints. Several approaches for the relational translation of clusters can be applied. If the cluster components are defined with the same identifiers, then the translation is simple [EN89].
In the other case, the relationship type is either replaced by a relationship type for a subcluster and another relationship type for the remaining part of the cluster, or an additional abstract identifier is introduced for the cluster in the relationship type, together with the corresponding number of relationship types on the cluster components and the abstract identifier.

We can extend the constructor set for types. This can be done by defining the extension by expressions on the existing constructs [STW91]. We show this now for null values, power sets and sequences (lists). In some cases it may be useful to use null values in relationships too. This concept can be introduced in HERM as a shortened form. Since null values can also be used in keys [Tha89'], the identification property is not lost. We can use brackets for null-valued components. According to [Tha90], horizontal decomposition can be used for representing such types by types without null values. For example, the types
LectureOffered = (Professor, Course, Semester, $\emptyset$),
LectureScheduled = (LectureOffered, Room, {Time(Day,Hour)})
represent the type
Lecture' = (Professor, Course, [Room], Semester, {Time(Day,Hour)}).
Since this decomposition cannot be applied to unary relationship types without loss of the identification property, Is-A relationships cannot use null values in their component type.
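The horizontal decomposition of Lecture' can be illustrated on sample data. The following is a minimal sketch, assuming that a lecture is identified by (Professor, Course, Semester) and that a lecture without a room also has no time; both assumptions, the tuple encoding and the function names are ours and not part of the paper.

```python
def decompose(lectures):
    """Split Lecture' tuples (professor, course, semester, room, time)
    into LectureOffered and LectureScheduled, neither containing nulls."""
    offered = {(prof, course, sem) for prof, course, sem, _room, _time in lectures}
    scheduled = {
        ((prof, course, sem), room, time)
        for prof, course, sem, room, time in lectures
        if room is not None
    }
    return offered, scheduled


def recompose(offered, scheduled):
    """Rebuild the Lecture' tuples; offered lectures without a schedule
    get a null (None) room and time."""
    scheduled_keys = {key for key, _room, _time in scheduled}
    with_room = {key + (room, time) for key, room, time in scheduled}
    without_room = {key + (None, None) for key in offered - scheduled_keys}
    return with_room | without_room


# Invented sample data for illustration.
lectures = {
    ("Smith", "Databases", "1991/Winter", "R12", ("Mon", 10)),
    ("Jones", "Logic", "1991/Winter", None, None),
}
offered, scheduled = decompose(lectures)
print(len(offered), len(scheduled))  # 2 1
```

Recomposing the two null-free relations gives back the original tuples, which is the sense in which the bracket notation [Room] is only a shortened form.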




The same approach can be used for the representation of power sets and sequences. For instance, a set of teams can be specified by Team = (Person, {TeamNumber}) with the identifier {Person, TeamNumber}. The research team of a project is then specified in Has' = (Project, Team, $\emptyset$). Sequences and lists can be represented in the same manner, but we need to represent the order in the list. The list of teams is represented by the type TeamOrdered = (Person, {TeamNumber, NumberInTeam}) with the identifier {TeamNumber, NumberInTeam} and the additional identifier {Person, TeamNumber}, which expresses the set property of a team.

As already mentioned, nested relations are used to represent not only flat information; by nesting, the internal structure of the objects can be represented better. Nested entity types are already introduced in HERM. The presented model can also be extended to nested relationship types. Let us consider in our main example a new entity type Hobby which is used to represent some hobby information. Then a new relationship type Has-Hobby = (Person, Hobby, {Period}) can be used for the representation of the hobbies of persons. Another, more adequate representation would be the relationship type Has-Hobby = (Person, HisHobbies({(Hobby, Period)})). Another example where flattening leads to difficulties in maintenance is the relationship type Comp-Tennis-Pair = (First:(Person,Person), Second:(Person,Person), {Result}), which represents the information on a tennis competition between pairs. Generally, this extension could be applied to the model too. However, there is no need for the explicit introduction of this construct, since it can be seen as a shortened form of hierarchies of types. For instance, the last example is represented by the types Pair = (Person, Person, $\emptyset$), Comp-Tennis-Pair = (First:Pair, Second:Pair, {Result}).
This new type is useful for the simplification of the scheme by inclusion of types which do not represent concepts. This possibility can be used for view cooperation.

3 Integrity Constraints

As shown in figure 2, different classical integrity constraints [Tha90] can also be introduced in HERM. Generally, we distinguish between static integrity constraints (for the representation of the semantics of all possible instances of the database) and dynamic integrity constraints (for the representation of the behavior of the database during its lifetime, e.g. correctness of sequences of states of the database). Static integrity constraint classes can be classified according to their function in the scheme:




1. structural dependencies, i.e. dependencies which are used in reality for the database design and express conditions on the structure, e.g. inclusion, exclusion, functional dependencies;
2. semantic dependencies, i.e. dependencies which are used in reality for the database design and are semantic restrictions, e.g. functional, multivalued dependencies;
3. representation dependencies, i.e. dependencies which are used for the representation of the database, e.g. inclusion, join, tuple-generating dependencies;
4. design dependencies, i.e. dependencies which can be used for a user-friendly schema design [Tha88], e.g. general functional and generalized functional dependencies.

[Figure 2: The classification of integrity constraints. Integrity constraints are divided into static and dynamic constraints. Static constraints are divided into structural constraints (conditions on the structure, relationships; ID, ED, (FD)), semantic constraints (semantic restrictions; FD, MVD), representational constraints (internal structure of the database; JD, TGD) and design constraints (user-friendly expressions; GFD).]

It can be shown that these constraints can be used in dynamic integrity constraints. Dynamic integrity constraints are useful for the maintenance of the database system. At present, there is no general framework for the utilization of dynamic integrity constraints. This classification includes both inherent and explicit constraints. The distinction between inherent and explicit constraints depends on the model to be used. In the relational model, all integrity constraints are represented together. This mixing led to difficulties in classification. In [BeK86] a classification of dependencies by their role is proposed. This approach was further developed in [Tha88'] to the differentiation presented above. The consideration of the set of all constraints has the advantage that we need only one unique derivation procedure. However, there is no axiomatization for different sets of integrity constraints, e.g. for functional and




inclusion dependencies. Therefore, in this case only the axiomatization known from first-order predicate logic could be applied. But this is fairly complex. The mixing also leads to a mismatch of constraint types. In particular, in relational database design dependencies are intended to express both basic relationships and semantic relationships. In the entity-relationship approach, structural constraints are modeled by inherent constraints, like inclusion dependencies, which are defined on the structure of the scheme. In most extensions of the ER model different types of functional constraints, like one-to-one or one-to-many relationships or cardinality constraints, are considered. The advantage of those constraints is that they are easy to specify and that they are design dependencies. We can restrict the utilization of integrity constraints to the following mappings:
1. $f_1 : \{DesignDep\} \hookrightarrow \{StructInt\} \cup \{SemanticInt\}$,
2. $f_2 : \{StructInt\} \hookrightarrow \{RepresentInt\}$,
3. $f_3 : \{SemanticInt\} \hookrightarrow \{RepresentInt\}$,
4. $f_4 : \{StructInt\} \cup \{SemanticInt\} \cup \{RepresentInt\} \hookrightarrow \{DynamicInt\}$.
It should be noted that the representation in the higher-order entity-relationship scheme hides and covers different integrity constraints, which should be extracted whenever the scheme is represented by other database models. We note that axioms and derivation rules can be developed for different integrity constraints, as in the relational data model.

We also obtain some new constraints, like homonyms and synonyms. These are used to express the similarity and the difference of values used in different types. Synonyms are identifiers that have the same meaning. Homonyms are identifiers that sound alike but are used in different types and have different meanings. They are useful for the simplification of the scheme and for the translation of schemes. In our example, we directly get the following synonyms (expressed by $\approx$) and homonyms (expressed by $\not\approx$). Synonym associations are reflexive, symmetric and transitive.
Homonym associations are irreflexive, symmetric and not transitive. In the relational model, synonyms are usually given implicitly by assumptions like the unique name assumption. For the university example, we can use the following synonym and homonym associations.
Person'sNumber $\approx$ Person.Person'sNumber,
Person'sNumber $\approx$ Professor.Person.Person'sNumber,
Person'sNumber $\approx$ Student.Person.Person'sNumber,
Student.StudNr $\approx$ StudNr,
Minor.Department.DName $\approx$ DName,
Supervisor.Professor.Person.Person'sNumber $\not\approx$ Person'sNumber.
There exists a simple calculus for synonym and homonym associations. This calculus is sound and complete.

Calculus $\Gamma_{syn,homon}$

Axioms: $x \approx x$

Rules:
$\dfrac{x \approx y}{y \approx x}$, $\quad \dfrac{x \approx y, \; y \approx z}{x \approx z}$, $\quad \dfrac{x \not\approx y}{y \not\approx x}$, $\quad \dfrac{x \not\approx y, \; y \approx z}{x \not\approx z}$

[The remaining rules, which have the premise $x \not\approx x$, are garbled in the source.]

Proposition 3.1 (folklore) The calculus $\Gamma_{syn,homon}$ is a non-redundant, sound and complete calculus for the implication of synonym and homonym associations.
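Derivability of synonym and homonym associations can be decided by grouping identifiers into synonym classes. The sketch below is one possible implementation using a union-find structure; it covers only reflexivity, symmetry and transitivity of synonyms, symmetry of homonyms, and the propagation of homonyms along synonyms, and it does not treat inconsistent inputs in which an identifier is a homonym of itself. The function names are illustrative.

```python
def make_find(synonyms):
    """Union-find over identifiers; returns a function that maps an
    identifier to the representative of its synonym class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in synonyms:
        parent[find(x)] = find(y)
    return find


def derivable(synonyms, homonyms, kind, x, y):
    """Decide whether 'x syn y' or 'x hom y' follows from the given associations."""
    if kind not in ("syn", "hom"):
        raise ValueError(f"unknown association kind: {kind}")
    find = make_find(synonyms)
    if kind == "syn":
        return find(x) == find(y)
    # Homonym associations are lifted to pairs of synonym classes.
    class_pairs = {frozenset((find(a), find(b))) for a, b in homonyms}
    return frozenset((find(x), find(y))) in class_pairs


synonyms = [
    ("Person'sNumber", "Person.Person'sNumber"),
    ("Person'sNumber", "Professor.Person.Person'sNumber"),
    ("Person'sNumber", "Student.Person.Person'sNumber"),
]
homonyms = [("Supervisor.Professor.Person.Person'sNumber", "Person'sNumber")]

print(derivable(synonyms, homonyms, "syn",
                "Person.Person'sNumber", "Student.Person.Person'sNumber"))  # True
```

With the associations of the university example, the homonym association between Supervisor.Professor.Person.Person'sNumber and Student.Person.Person'sNumber is also derivable, because Student.Person.Person'sNumber is a synonym of Person'sNumber.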

3.1 Generalizing Classical Constraints

3.1.1 Generalizing Inclusion and Exclusion Constraints

The integrity constraints known in the relational database model [Tha90] can be generalized to integrity constraints in entity-relationship models. The known complexity results [DK 83,DLM89] can then also be extended to complexity results in HERM. By generalizing relational database theory in this way, almost all of its results can be used in the HERM theory. There are also other constraints which are specific to the HERM. Let us consider some of these classes and their use in normalization and representation of HERM structures.

As already mentioned, the entity-relationship approach allows a better distinction of integrity constraints. Structural constraints can be expressed directly by the structure. Inclusion dependencies are expressed directly by the relationship type definition. There are also other inclusion dependencies which cannot be expressed by the structural part of the type definition alone. The class of inclusion dependencies is axiomatizable [CFP84,Mit83]. In the main example, the following inclusion dependencies are valid.
Student[Major.Department.DName] $\subseteq$ Department[DName],
Student[Supervisor.Professor.Person.Person'sNumber] $\subseteq$ Professor[Person.Person'sNumber],
Professor[Person.Person'sNumber] $\subseteq$ Person[Person'sNumber].
Using these constraints we derive the following constraint:
Student[Supervisor.Professor.Person.Person'sNumber] $\subseteq$ Person[Person'sNumber].

The class of exclusion dependencies [Tha90] is another important class of integrity constraints. Given two types $R, S$ and identifiers $R.A_1, \ldots, R.A_n, S.B_1, \ldots, S.B_n$ defined on these types. An exclusion dependency is an expression of the form $R[R.A_1, \ldots, R.A_n] \parallel S[S.B_1, \ldots, S.B_n]$ and is valid in a database $(\ldots, R^t, \ldots, S^t, \ldots)$ if for all $r \in R^t$ and all $s \in S^t$ we have $r|_{A_1, \ldots, A_n} \neq s|_{B_1, \ldots, B_n}$. The class of exclusion dependencies is a powerful and simple class of dependencies and can be used to express different semantical restrictions.
For instance, in the main example the exclusion dependency Department[Department.DName] $\parallel$ Course[Course.CName] states that no department name is also a course name. Exclusion dependencies using the keys of two types are exclusion constraints on classes. Generally speaking, exclusion constraints on classes express the disjointness of two classes or two domain sets. Objects cannot belong at the same time to the classes connected by an exclusion dependency. In this case, exclusion constraints are a specialization of the subtype concept, expressing that an object belonging to a class and some of its subclasses does not belong to other subclasses at the same time. In our main example, for instance, the exclusion constraint Student $\parallel$ Professor expresses that students cannot be professors at the same time and vice versa. These constraints are structural constraints. It is possible to express exclusion constraints directly in the diagram, using an additional labeled edge between the two classes with a label expressing the exclusion [NiH89]. These constraints are also discussed in [AtPa86,FoMV91] for other data models. In [AtPa86], exclusion and inclusion constraints on classes are axiomatized.

Exclusion dependencies can be expressed by negated inclusion dependencies. Objects belonging to the first class do not belong to the second class at the same time. For instance, the dependency Student | Professor expresses that students cannot be professors at the same time and, further, that professors cannot be students at the same time. Thus an exclusion dependency is given by one negated inclusion dependency. The class of exclusion and inclusion dependencies is axiomatizable, as in [Tha90].

Let us first restrict our consideration of exclusion and inclusion constraints to classes. In this case, a simple theory exists which can also be used in object-oriented database modeling. The inclusion dependencies can be generalized to nondeterministical inclusion dependencies [Tha90]. This generalization is important if the model permits clustering.
Given, for instance, the relationship type $R = (L : (R_1 + \ldots + R_n), S_1, \ldots, S_m, attr(R))$. The inclusion dependencies $R[L] \subseteq R_i$ and $R[L.R_i] \subseteq R_i$ are not valid in general for this type. Instead, the nondeterministical inclusion dependency $R[L] \subseteq R_1 + \ldots + R_n$ is valid for each $R^t$.

We can generalize these constraints to algebraic dependencies [Tha90] or algebraic constraints on types. For this purpose, we introduce algebraic expressions on types. Given a set of types $\{T_1, \ldots, T_n\}$. The set of all algebraic expressions $\mathcal{A} = \mathcal{A}(\{T_1, \ldots, T_n\})$ on $\{T_1, \ldots, T_n\}$ is the smallest set with the following properties:
1. $\emptyset \in \mathcal{A}$,
2. $\{T_1, \ldots, T_n\} \subseteq \mathcal{A}$,
3. if $\varepsilon, \delta \in \mathcal{A}$ then $(\varepsilon \cup \delta), (\varepsilon \cap \delta) \in \mathcal{A}$.
Parentheses can be omitted according to the convention that $\cap$ binds more strongly than $\cup$ and according to the property that union and intersection are associative, idempotent and commutative. An algebraic constraint on $\mathcal{A}$ is an expression of the form $\varepsilon \subseteq \delta$ for $\varepsilon, \delta \in \mathcal{A}$.




For any algebraic expression $\delta$ and $\{T_1^t, \ldots, T_n^t\}$ the set $\delta^t$ is defined similarly. We can define the validity of algebraic constraints for $\{T_1, \ldots, T_n\}$ in the classical way: $\varepsilon \subseteq \delta$ is valid in $\{T_1^t, \ldots, T_n^t\}$ iff $\varepsilon^t \subseteq \delta^t$. Implication is introduced for algebraic constraints in the usual way. In the style of [Tha90], propositional formulas can be used for the representation of algebraic constraints. The set $\{p_1, \ldots, p_n\}$ of variables is associated with the set of types $\{T_1, \ldots, T_n\}$. Using this association, a propositional formula $\varphi_\delta$ is constructed for each algebraic expression $\delta$ from $\mathcal{A}$. For a propositional formula $\varphi$ on $\{p_1, \ldots, p_n\}$ the semantics can be defined as usual. We directly get the following correspondence between propositional formulas and algebraic constraints.

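Proposition 3.2 For algebraic constraints $\{\varepsilon_i \subseteq \delta_i \mid 0 \le i \le m\}$ the following properties are equivalent:
1. $\{\varepsilon_i \subseteq \delta_i \mid 1 \le i \le m\} \models \varepsilon_0 \subseteq \delta_0$;
2. $\{\varphi_{\varepsilon_i} \to \varphi_{\delta_i} \mid 1 \le i \le m\} \models \varphi_{\varepsilon_0} \to \varphi_{\delta_0}$;
3. $\models ((\varphi_{\varepsilon_1} \to \varphi_{\delta_1}) \wedge \ldots \wedge (\varphi_{\varepsilon_m} \to \varphi_{\delta_m})) \to (\varphi_{\varepsilon_0} \to \varphi_{\delta_0})$;
4. $((\varphi_{\varepsilon_1} \to \varphi_{\delta_1}) \wedge \ldots \wedge (\varphi_{\varepsilon_m} \to \varphi_{\delta_m})) \to (\varphi_{\varepsilon_0} \to \varphi_{\delta_0}) = 1$.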

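Proof. The equivalence of 2., 3. and 4. is known from logic. Therefore, we prove only the equivalence between 1. and 2. We prove the equivalence for $m = 1$. The proof for $m > 1$ is analogous. Assume now that $\varphi_{\varepsilon_1} \to \varphi_{\delta_1} \not\models \varphi_{\varepsilon_0} \to \varphi_{\delta_0}$. In this case, there exists an interpretation $\sigma = (\sigma_1, \ldots, \sigma_n)$ for $(p_1, \ldots, p_n)$ such that $((\varphi_{\varepsilon_1} \to \varphi_{\delta_1}) \to (\varphi_{\varepsilon_0} \to \varphi_{\delta_0}))(\sigma) = 0$. Now let us define a database $\{T_1^t, \ldots, T_n^t\}$ such that $T_i^t = \emptyset$ iff $\sigma_i = 0$ for $1 \le i \le n$. Using this database we obtain $\varepsilon_1 \subseteq \delta_1 \not\models \varepsilon_0 \subseteq \delta_0$. That 2. implies 1. can be proven in the same manner.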
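Proposition 3.2 suggests a simple, exponential-time decision procedure: translate each algebraic expression into a propositional formula (union becomes disjunction, intersection becomes conjunction, the empty set becomes falsity) and test the implication on all truth assignments. The following is a minimal sketch under that reading; the tuple encoding of expressions and all function names are our own assumptions.

```python
from itertools import product

# Assumed encoding: a type name (str), EMPTY, or (op, left, right)
# with op in {"union", "inter"}.
EMPTY = ("empty",)


def type_names(expr):
    """Collect the type names occurring in an algebraic expression."""
    if expr == EMPTY:
        return set()
    if isinstance(expr, str):
        return {expr}
    _op, left, right = expr
    return type_names(left) | type_names(right)


def truth(expr, assignment):
    """Evaluate the propositional formula associated with the expression."""
    if expr == EMPTY:
        return False
    if isinstance(expr, str):
        return assignment[expr]
    op, left, right = expr
    if op == "union":
        return truth(left, assignment) or truth(right, assignment)
    if op == "inter":
        return truth(left, assignment) and truth(right, assignment)
    raise ValueError(f"unknown operator: {op}")


def implies(premises, goal):
    """Check that every assignment satisfying all premises satisfies the goal.
    Constraints are pairs (left, right) standing for 'left is contained in right'."""
    constraints = list(premises) + [goal]
    names = sorted(
        set().union(*(type_names(l) | type_names(r) for l, r in constraints))
    )
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        premises_hold = all(
            (not truth(l, assignment)) or truth(r, assignment)
            for l, r in premises
        )
        left, right = goal
        if premises_hold and truth(left, assignment) and not truth(right, assignment):
            return False
    return True


premises = [("Student", "Person"), ("Professor", "Person")]
print(implies(premises, (("union", "Student", "Professor"), "Person")))  # True
print(implies(premises, (("inter", "Student", "Professor"), EMPTY)))     # False
```

The second query shows that the exclusion of students and professors does not follow from the two inclusion constraints alone.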

We get directly the following property.

Corollary 3.3 For a set of algebraic constraints $\{\varepsilon_i \subseteq \delta_i \mid 0 \le i \le n\}$ there exists an equivalent algebraic constraint $\varepsilon \subseteq \delta$.

An algebraic expression is in conjunctive canonical form if it is the intersection of elementary unions. An elementary union is the union of types from $\{T_1, \ldots, T_n\}$. An algebraic expression is in disjunctive canonical form if it is the union of elementary intersections. An elementary intersection is the intersection of types from $\{T_1, \ldots, T_n\}$. A canonical algebraic constraint is a constraint of the form $\varepsilon \subseteq \delta$ where $\varepsilon$ is in disjunctive canonical form and $\delta$ is in conjunctive canonical form. By proposition 3.2 any set of algebraic constraints can be converted to an equivalent set of canonical algebraic constraints. The following calculus allows us to convert sets of algebraic




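constraints to sets of canonical algebraic constraints. For algebraic expressions $\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4$:

$\dfrac{(\varepsilon_1 \cup \varepsilon_2) \cap \varepsilon_4 \subseteq \varepsilon_3}{\varepsilon_1 \cap \varepsilon_4 \subseteq \varepsilon_3, \quad \varepsilon_2 \cap \varepsilon_4 \subseteq \varepsilon_3}$, $\qquad \dfrac{\varepsilon_1 \cup \varepsilon_2 \subseteq \varepsilon_3}{\varepsilon_1 \subseteq \varepsilon_3, \quad \varepsilon_2 \subseteq \varepsilon_3}$,

$\dfrac{\varepsilon_1 \subseteq \varepsilon_2 \cap \varepsilon_3}{\varepsilon_1 \subseteq \varepsilon_2, \quad \varepsilon_1 \subseteq \varepsilon_3}$, $\qquad \dfrac{\varepsilon_1 \subseteq (\varepsilon_2 \cap \varepsilon_3) \cup \varepsilon_4}{\varepsilon_1 \subseteq \varepsilon_2 \cup \varepsilon_4, \quad \varepsilon_1 \subseteq \varepsilon_3 \cup \varepsilon_4}$.

A canonical algebraic constraint $\varepsilon \subseteq \delta$ is simple if $\varepsilon$ is an elementary intersection and $\delta$ is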

an elementary union. Using the transformation presented above, an equivalent set of simple canonical algebraic constraints can be constructed for each set of algebraic constraints. By proposition 3.2 and the completeness of the resolution calculus, the following calculus is complete and sound. The soundness and completeness also follow from the soundness and completeness of the system $\Gamma_{MFD}$ [Tha90] defined for the class of monotone functional dependencies.

System $\Gamma_{cac}$

Axioms: $\varepsilon \subseteq \delta \cup \varepsilon$, $\quad \varepsilon \cap \delta \subseteq \varepsilon$

Rules: [The rules of $\Gamma_{cac}$ are garbled in the source.]

Theorem 3.4 The system $\Gamma_{cac}$ is sound and complete for implication of canonical algebraic constraints.

Let us now consider the meaning of simple canonical algebraic constraints. A simple canonical algebraic constraint is of the form $X_1 \ldots X_n \subseteq Y_1 \cup \ldots \cup Y_m$ for $X_i \in \{T_1, \ldots, T_n\}$, $Y_j \in \{T_1, \ldots, T_n\}$, $1 \le i \le n$, $1 \le j \le m$, where the juxtaposition $X_1 \ldots X_n$ denotes the intersection. Let us denote by $0$ the set $Y_1 \cup \ldots \cup Y_m$ for $m = 0$ and by $1$ the set $X_1 \ldots X_n$ for $n = 0$. Let us consider four different cases for $n \neq 0$ or $m \neq 0$:
1. $n \neq 0, m \neq 0$: $X_1 \ldots X_n \subseteq Y_1 \cup \ldots \cup Y_m$ is the nondeterministical inclusion dependency.
2. $n > 1, m = 0$: $X_1 \ldots X_n \subseteq 0$ is another notation of the exclusion constraint $X_1 \ldots X_{n-1} \parallel X_n$.
3. $n = 0, m \neq 0$: $1 \subseteq Y_1 \cup \ldots \cup Y_m$ is the union dependency, which is valid in a database if any object is in $Y_1^t \cup \ldots \cup Y_m^t$.
4. $n = 1, m = 0$: $X \subseteq 0$ means that the class $X^t$ is empty for any $t$. This can also be denoted by $X \parallel X$.
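The four cases can be sketched as a small classifier together with a validity check on a database of classes. In this sketch, which is not part of the paper, the set $1$ is taken to be the set of all objects occurring in the database; this choice, the label "emptiness constraint" for case 4 and the function names are our assumptions.

```python
def classify(xs, ys):
    """Name the kind of the simple canonical constraint X1...Xn <= Y1 u ... u Ym."""
    n, m = len(xs), len(ys)
    if n == 0 and m == 0:
        raise ValueError("at least one side must be non-trivial")
    if n != 0 and m != 0:
        return "nondeterministical inclusion dependency"
    if n == 0:
        return "union dependency"
    if n == 1:
        return "emptiness constraint"
    return "exclusion constraint"


def is_valid(xs, ys, database):
    """Check the constraint in a database mapping type names to object sets.
    Assumption: the universe '1' is the set of all objects in the database."""
    left = set().union(*database.values())
    for x in xs:
        left &= database[x]
    right = set()
    for y in ys:
        right |= database[y]
    return left <= right


# Invented sample classes for illustration.
database = {"Person": {1, 2, 3}, "Student": {1}, "Professor": {2}}

print(classify(["Student", "Professor"], []))            # exclusion constraint
print(is_valid(["Student", "Professor"], [], database))  # True
print(is_valid([], ["Student", "Professor"], database))  # False
```

In the sample database the exclusion constraint on Student and Professor holds, while the union dependency fails because object 3 is neither a student nor a professor.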




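Using this correspondence we obtain the following calculus for nondeterministical inclusion dependencies and exclusion constraints on classes.

System $\Gamma_{niec}$

Axioms: $XY \subseteq X$, $\quad X \subseteq X \cup Y$

Rules:
1. $\dfrac{X_1 \ldots X_n \subseteq Y \cup W_1 \cup \ldots \cup W_m, \quad Y Z_1 \ldots Z_n \subseteq V_1 \cup \ldots \cup V_l}{X_1 \ldots X_n Z_1 \ldots Z_n \subseteq V_1 \cup \ldots \cup V_l \cup W_1 \cup \ldots \cup W_m}$ for $l \neq 0$
2. $\dfrac{X_1 \ldots X_n \subseteq Y \cup W_1 \cup \ldots \cup W_m, \quad Y Z_1 \ldots Z_n \parallel V}{X_1 \ldots X_n Z_1 \ldots Z_n V \subseteq W_1 \cup \ldots \cup W_m}$ for $m \neq 0$
3. $\dfrac{X_1 \ldots X_n \subseteq Y, \quad Y Z_1 \ldots Z_n \parallel V}{X_1 \ldots X_n Z_1 \ldots Z_n \parallel V}$
4. $\dfrac{X \parallel X}{X \parallel Y}$
5. $\dfrac{X \parallel X}{X \subseteq Y}$
6. $\dfrac{X_1 \ldots X_n \subseteq \delta}{X_{\pi(1)} \ldots X_{\pi(n)} \subseteq \delta}$ for any permutation $\pi$
7. $\dfrac{X_1 \ldots X_n \parallel Z}{X_{\pi(1)} \ldots X_{\pi(n)} \parallel Z}$ for any permutation $\pi$
8. $\dfrac{X_1 \ldots X_n \parallel X_0}{X_0 X_1 \ldots X_{n-1} \parallel X_n}$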

The system $\Gamma_{niec}$ generalizes the system in [AtPa86], which is sound and complete for inclusion and exclusion constraints. By theorem 3.4 we conclude the following property of $\Gamma_{niec}$.

Theorem 3.5 The system $\Gamma_{niec}$ is sound and complete for implication of nondeterministical inclusion dependencies and exclusion constraints.

Since the implication problem for exclusion and inclusion constraints is NP-complete [AtPa86] and because of proposition 3.2 and corollary 3.3 the implication problem for nondeterministical inclusion dependencies and exclusion constraints is also NP-complete. Therefore, the implication problem is not more complex for this generalization.




3.1.2 Generalizing Functional Dependencies

Functional dependencies are considered to be one of the most important constraint classes. The results known from relational database theory [Tha90] can also be used for the HERM. We generalize this class to path functional dependencies. This class is more general than the class of functional dependencies.

Given a schema $ERDec = \{E_1, \ldots, E_n, R_1, \ldots, R_m\}$. A sequence $p = P_1 - \ldots - P_k$ of types from $ERDec$ is called an $ERDec$-path (or briefly a path) if for each $i$, $1 \le i < k$, either $P_i$ is a component of $P_{i+1}$ or $P_i$ has the component $P_{i+1}$. In the main example, the following sequences are paths:
Person - Student - Supervisor - Professor - Person,
Student - Supervisor - Professor - In - Department.
The previously defined concept of attribute identifiers can be extended to paths. For paths, only those identifiers which are used in the path are of importance. For this we distinguish leaves and roots of the path. Let us now introduce this concept formally.

A path element $P_i$ is called a leaf of $p = P_1 - \ldots - P_k$ if
1. for $i = 1$, $P_1$ is a component of $P_2$ and, if $P_1 = P_k$, then $P_k$ is a component of $P_{k-1}$, or
2. for $1 < i < k$, $P_i$ is a component of $P_{i-1}$ and of $P_{i+1}$, or
3. for $i = k$, $P_k$ is a component of $P_{k-1}$ and, if $P_1 = P_k$, then $P_1$ is a component of $P_2$.
For each type in $p$ the set of all attribute identifiers on $p$ is defined. Let us denote by $leaf\text{-}attr(p)$ the set of all attribute identifiers defined on leaves of $p$.

A path element $P_i$ is called a root of $p = P_1 - \ldots - P_k$ if
1. for $i = 1$, $P_2$ is a component of $P_1$ and, if $P_1 = P_k$, then $P_{k-1}$ is a component of $P_k$, or
2. for $1 < i < k$, $P_i$ has the components $P_{i-1}$ and $P_{i+1}$, or
3. for $i = k$, $P_{k-1}$ is a component of $P_k$ and, if $P_1 = P_k$, then $P_2$ is a component of $P_1$.

The set $type'(p)$ of all path component identifiers is then defined recursively:
1. Each root is an element of $type'(p)$.
2. If $X.P_i \in type'(p)$ and, for $i < k$, $P_{i+1}$ is a component of $P_i$, then $X.P_i.P_{i+1} \in type'(p)$.
3. If $X.P_i \in type'(p)$ and, for $i > 1$, $P_{i-1}$ is a component of $P_i$, then $X.P_i.P_{i-1} \in type'(p)$.
4. If $P_1 = P_k$, $X.P_k \in type'(p)$ and $P_2$ is a component of $P_1$, then $X.P_k.P_2 \in type'(p)$.
5. If $P_1 = P_k$, $X.P_1 \in type'(p)$ and $P_{k-1}$ is a component of $P_k$, then $X.P_1.P_{k-1} \in type'(p)$.
Since components of the path which are leaves can be specified twice, the set $type'(p)$ is reduced to the set $type(p)$ by removing from $type'(p)$ all those identifiers $X.P_i.P_{i-1}$ for which there is $Y.P_{i-2}.P_{i-1} \in type'(p)$, with a similar removal in the case $P_1 = P_k$. Now we can combine the set $type(p)$ with the set $leaf\text{-}attr(p)$ into the set $attr(p) = type(p) \cup \{X.R.Y \mid R.Y \in leaf\text{-}attr(p), X.R \in type(p)\}$. The attribute identifier set



attr(p) = { Supervisor.Since, Supervisor.Student.StudNr, Supervisor.Student.Name(First, Fam, {Title}), ..., Supervisor.Professor.Speciality, Supervisor.Professor.Name(First, Fam, {Title}), ..., In.Department.DName, In.Department.Director, In.Department.Phone{Phone}, Supervisor.Student, Supervisor, Supervisor.Professor, In, In.Department }

is defined for the path p = Student - Supervisor - Professor - In - Department.

Given a path $p$ and sets $X, Y \subseteq attr(p)$. A path functional dependency is $p : X \to Y$. The validity of path functional dependencies is defined as for functional dependencies, using the relation $p^t$ which is generated over the path $p$ from the relations in $ERDec^t$.

Given a path $p = P_1 - \dots - P_k$ and a database $ERDec^t$. The relation $p^t$ is defined inductively.
1. For the path $p = R$: $p^t = R^t$.
2. For the path $p = R - S$: If $R$ is a component of $S$ then $p^t = (S^t)[R]$. If $S$ is a component of $R$ then $p^t = (R^t)[S]$.
3. For the path $p = X - R - S$: If $R$ is a component of $S$ then $p^t = (X - R)^t \bowtie (S^t)[R]$. If $S$ is a component of $R$ then $p^t = (X - R)^t[S]$.

The path functional dependency

pfd = Student - Supervisor - Professor - In - Department : {Supervisor.Student.StudNr} $\to$ {In.Department}

specifies that a student can have supervisors only from one department. The corresponding relation for this path dependency is $Supervisor^t[Student, Professor] \bowtie In^t[Professor, Department]$.

We can also invert paths. Since the attribute notation of a path depends on the order of the path, we introduce the inversion of a path. The path $p^{-1} = P_k - \dots - P_1$ is called the inversion of the path $p = P_1 - \dots - P_k$. For $p^{-1}$ the corresponding sets $\text{leaf-attr}(p^{-1})$, $\text{root-type}(p^{-1})$, $attr(p^{-1})$ are introduced, and for each path functional dependency $p : X \to Y$ the corresponding inverted path functional dependency $(p : X \to Y)^{-1}$ is obtained by replacing $p$ by $p^{-1}$ and the attribute identifiers from $attr(p)$ by the corresponding attribute identifiers from $attr(p^{-1})$.

The path functional dependency

pfd' = Department - In - Professor - Supervisor - Student : {Supervisor.Student.StudNr} $\to$ {In.Department}

is the inversion of

pfd = Student - Supervisor - Professor - In - Department : {Supervisor.Student.StudNr} $\to$ {In.Department}.

For this inversion the attribute identifier In.Professor.Speciality corresponds to the attribute identifier Supervisor.Professor.Speciality. Let us now introduce the calculus for path functional dependencies.



System $\Gamma_{pfd}$

Axioms

(IdentifierAugmentation) for $X \supseteq Y$: $\quad p : X \to Y$

(PathIdentification) for $X = p' - Y$: $\quad p : p' \to X$

Rules

$$\text{(Union)}\quad \frac{p : X \to Y}{p : X \cup Z \to Y \cup Z} \qquad \text{(Transitivity)}\quad \frac{p : X \to Y,\ \ p : Y \to Z}{p : X \to Z}$$

$$\text{(PathAugmentation)}\quad \frac{p : X \to Y}{p' - p : p' - X \to p' - Y} \qquad \text{(PathInversion)}\quad \frac{p : X \to Y}{(p : X \to Y)^{-1}}$$

It can be proven that this system is sound and complete for implications of path functional dependencies. The proof is based on the following propositions which are well-known in relational database theory.

Proposition 3.6 Given a functional dependency $X \to Y$ and a relation $r$ which satisfies the functional dependency. Then each subset $r'$ of $r$ satisfies the functional dependency.

Since each path can be represented by a relation scheme, the following lemma is a consequence of proposition 3.6 and the fact that the Identifier Augmentation axiom and the Union and Transitivity rules are the Armstrong system for functional dependencies.

Lemma 3.1 The Identifier Augmentation axiom and the Union and Transitivity rules are sound.

Proposition 3.7 Given a functional dependency $X \to Y$ and a relation $r$ on a scheme which contains $X \cup Y$. Then $r$ satisfies the functional dependency if and only if the projection $r' = r[X \cup Y]$ of $r$ to $X \cup Y$ satisfies the functional dependency.

Since $p' - p$ can be interpreted as the extension of $p$, proposition 3.7 proves the following lemma.

Lemma 3.2 The Path Augmentation and Path Inversion rules and the Path Identification axiom are sound.

Therefore, we get the property




Corollary 3.8 The system $\Gamma_{pfd}$ is sound.

Let us now prove the completeness of the system $\Gamma_{pfd}$. For this we introduce additional rules.

$$\text{(Additivity)}\quad \frac{p : X \to Y,\ \ p : X \to Z}{p : X \to Y \cup Z} \qquad \text{(Projectivity)}\quad \frac{p : X \to Y \cup Z}{p : X \to Y}$$

$$\text{(SimplePathAugmentation)}\quad \frac{p : X \to Y}{p' - p : X \to Y}$$

Since these rules are derivable from the rules of the system $\Gamma_{pfd}$ they are also sound. We now use these rules for the completeness proof. For a given set $F$ of path functional dependencies, the closure $F^+$ of $F$ is the set of all path functional dependencies which can be derived in the system $\Gamma_{pfd}$ from $F$. For a path $p$, the identifier set $attr(p)$ and $X \subseteq attr(p)$ let

$$X_p^+ = \{ A \in attr(p) \mid (p : X \to \{A\}) \in F^+ \}.$$

Obviously, by Additivity and Projectivity, we get

Lemma 3.3 $p : X \to Y \in F^+$ iff $Y \subseteq X_p^+$.

Using the classical completeness proof for functional dependencies the following fact can be proven.

Lemma 3.4 If $p : X \to Y \notin F^+$ then there exists $p^t$ which satisfies $F$ and violates $p : X \to Y$.

Summarizing corollary 3.8 and lemma 3.4 we obtain the following result.

Theorem 3.9 The system $\Gamma_{pfd}$ is sound and complete for implication of path functional dependencies.
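The closure criterion of lemma 3.3 can be illustrated for a single fixed path by the classical attribute-closure computation. The following is only a minimal sketch: it covers the Armstrong part of the calculus (no path augmentation or inversion), and the function names `closure` and `implies` as well as the two sample dependencies are assumptions made for illustration, not part of the paper.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs of frozensets, all read as
    dependencies p : lhs -> rhs over one fixed path p.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency whose left side is already derived.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """Test whether rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical dependencies over the path
# Student - Supervisor - Professor - In - Department.
fds = [
    (frozenset({"Supervisor.Student.StudNr"}), frozenset({"Supervisor.Professor"})),
    (frozenset({"Supervisor.Professor"}), frozenset({"In.Department"})),
]

print(implies(fds, {"Supervisor.Student.StudNr"}, {"In.Department"}))
```

Under these assumed dependencies the sketch derives the dependency pfd of the running example by transitivity.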

In [Wedd89], simple path functional dependencies are introduced. A dependency is a simple path dependency if it is specified for a type $R$ over the set of identifiers $R.X$. Therefore, a simple path functional dependency is a path functional dependency defined over a single-element path. As a corollary of theorem 3.9 and the property that in trees paths are uniquely specified by their leaves, we get that the system of [Wedd89] is sound and complete. Path functional dependencies can be generalized to graph dependencies, which can be specified over subgraphs of the whole HERM graph.




The calculus $\Gamma_{pfd}$ can be restricted to simple path functional dependencies as well as extended to graph functional dependencies. Other rules can also be considered, such as

$$\text{(Reduction)}\quad \frac{p' - p : p' - X \to p' - Y}{p : X \to Y}$$

This rule is not valid in general. The rule can only be applied if the path weight is not 0, i.e. none of the relationship components involved is partial. Also, the Reduction rule cannot be applied to synonyms. If the complexity constraint $comp(R, R') \ge (1, 1)$ is valid for two synonyms $R.R'.X$ and $R'.X$, then the identifier $R'.X$ can be used instead of $R.R'.X$ and the Reduction rule can be applied.

3.2 Cardinality Constraints

Cardinality or complexity constraints are the most commonly used class of constraints for binary relationship types, but their properties are still not well understood in the case of n-ary relationship types. There is some related work in the area of relational database theory which can be adapted to the entity-relationship model. We know that inclusion dependencies partially express complexities. Furthermore, it is known that key dependencies [Tha90] can be expressed by complexities. Complexities can express domain dependencies [CoK83] or numerical dependencies [GrM85]. Therefore, the work on inclusion, domain and numerical dependencies [CFP84,CoK83,Mit83] can be used for complexities. The scheme definition also directly defines inclusion dependencies. These implicitly defined inclusion dependencies are typed, and it is even possible to use only full inclusion dependencies [Tha90]. Since these inclusion dependencies are defined on the keys, they can be considered to be onto constraints [Kob85].

Since the class of cardinality constraints forms an important class of integrity constraints in entity-relationship models, we need a complete theory of these constraints. For $R = ((R_1, \dots, R_k), attr(R))$, where each $R_i$ is an entity or a relationship type, and for each $i$, $1 \le i \le k$, let us define the complexity $comp(R, R_i) = (m, n)$ specifying that in each database state an item $e$ from $R_i^t$ appears in $R^t$ at least $m$ and at most $n$ times, i.e.

$$comp(R, R_i) = (m, n) \quad\text{iff}\quad \text{for all } t \text{ and all } e \in R_i^t:\ \ m \le |\{ r \in R^t \mid r(R_i) = e \}| \le n,$$

where $|M|$ denotes the cardinality of $M$ and $r(R_i)$ denotes the restriction of $r$ to $R_i$. Notice that recursive components in relationship types are denoted either by their labels, e.g. in Prerequis as Required:Course, or by using the component number, e.g. (Course,1). This notation is used for the complexity too. If $n$ is unbounded then the complexity is denoted by $(m, .)$. The pairs $(m, n)$ are partially ordered by the natural partial order $\le$.
This can be generalized to sequences of relationship types in $R = ((R_1, \dots, R_k), attr(R))$. For each subsequence $R'_1 \dots R'_m$, the generalized complexity

$$comp(R, R'_1 \dots R'_m) = (l, p)$$

specifies that in each database state the items from $R_1'^{\,t}, \dots, R_m'^{\,t}$ appear in $R^t$ at least $l$ and at most $p$ times, i.e.

$$comp(R, R'_1 \dots R'_m) = (l, p) \quad\text{iff}\quad \text{for all } t \text{ and all } e_1 \in R_1'^{\,t}, \dots, e_m \in R_m'^{\,t}:\ \ l \le |\{ r \in R^t \mid r(R'_i) = e_i,\ 1 \le i \le m \}| \le p.$$

If $R_j$ is a cluster ($R_j = R_{j,1} + \dots + R_{j,m}$) then the complexity notion can be generalized as follows for $R_{j,l}$ ($1 \le l \le m$): $comp(R.R_{j,l}, R_i) = comp(R, R_{j,l} R_i)$.

For example, the following complexities can easily be developed in the university example:
comp(Prerequisite, (Course,1)) = (0,3) - a course can have at most three prerequisites, where (Course,1) means the first occurrence of Course in Prerequisite;
comp(Has.Professor, Project) = (1,1) - a project has one and only one investigator, which is a professor;
comp(Has.Person, Project) = (0,.) - a project can have contributors, which are generally persons;
comp(Has, Project) = (1,m) - a project has at least one contributor, which is a person (or a professor);
comp(Has, Professor) = (0,.) - professors can be the investigators of several projects;
comp(Lecture, Professor) = (0,m) - professors offer different courses;
comp(Enroll, Lecture) = (0,40) - any offered course is taken by not more than 40 students.

Furthermore, it is possible to use also the following complexities:
comp(Lecture[Semester], Professor, Course) = (0,1) - in a term, a course is offered by only one person;
comp(Supervisor, Professor) = (1,.) - each professor supervises at least one student;
comp(Major, Student) = (1,1) - each student has one and only one major.

In [ZNG90] an analogous complexity definition based on projections of relationship types is introduced (called flattened constraints and nested constraints). There it is claimed that using nested constraints always results in consistent specifications, whereas flattened constraints may lead to inconsistent specifications. We can use the examples below to show that this statement is wrong: there are nested specifications which are inconsistent (proposition 3.25).
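For a single database state, the generalized complexity can be computed directly from its definition. The sketch below is illustrative only: the function name `comp`, the dictionary-based encoding of relationship tuples and the sample data are assumptions. It returns the tightest pair that holds in the given state, whereas a complexity constraint must hold in every state.

```python
from itertools import product


def comp(relationship, component_sets):
    """Tightest (lower, upper) occurrence bounds in one database state.

    relationship:   list of dicts mapping component name -> item
    component_sets: dict mapping component name -> set of items,
                    i.e. the sets R'_1^t, ..., R'_m^t
    """
    names = list(component_sets)
    counts = []
    # Every combination (e_1, ..., e_m) of items is considered,
    # including combinations that never occur in the relationship.
    for combo in product(*(component_sets[n] for n in names)):
        hits = sum(
            1 for r in relationship
            if all(r[n] == e for n, e in zip(names, combo))
        )
        counts.append(hits)
    return (min(counts), max(counts)) if counts else None


# Hypothetical state: professor p1 gives two courses, p2 gives none.
professors = {"p1", "p2"}
courses = {"c1", "c2"}
lecture = [
    {"Professor": "p1", "Course": "c1"},
    {"Professor": "p1", "Course": "c2"},
]

print(comp(lecture, {"Professor": professors}))
print(comp(lecture, {"Professor": professors, "Course": courses}))
```

In this state the pair for the longer component sequence is not larger than the pair for Professor alone, in line with the monotonicity discussed for component sequences.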
For binary relationship types $R = (R_1, R_2, attr(R))$ between two entity or relationship types $R_1, R_2$, special cardinality types are traditionally introduced: One-to-One, One-to-Many, Many-to-One and Many-to-Many.

One-to-one. Each item in $R_1^t$ is associated with at most one item in $R_2^t$, and each item in $R_2^t$ is associated with at most one item in $R_1^t$, i.e. $comp(R, R_1) = (0, 1)$ (partial) or $comp(R, R_1) = (1, 1)$ (total) and $comp(R, R_2) = (0, 1)$ or $comp(R, R_2) = (1, 1)$.

One-to-many. Each item in $R_1^t$ is associated with any number of items in $R_2^t$, and each item in $R_2^t$ is associated with at most one item in $R_1^t$, i.e. $comp(R, R_1) = (0, m)$ or $comp(R, R_1) = (1, m)$ and $comp(R, R_2) = (0, 1)$ or $comp(R, R_2) = (1, 1)$.

Many-to-one. Each item in $R_2^t$ is associated with any number of items in $R_1^t$, and each item in $R_1^t$ is associated with at most one item in $R_2^t$, i.e. $comp(R, R_1) = (0, 1)$ or $comp(R, R_1) = (1, 1)$ and $comp(R, R_2) = (0, m)$ or $comp(R, R_2) = (1, m)$.

Many-to-many. Each item in $R_1^t$ is associated with any number of items in $R_2^t$, and each item in $R_2^t$ is associated with any number of items in $R_1^t$, i.e. $comp(R, R_1) = (0, m)$ or $comp(R, R_1) = (1, m)$ and $comp(R, R_2) = (0, m)$ or $comp(R, R_2) = (1, m)$.

This notation can also be extended to arbitrary relationships. For a given relationship type $R = (E_1 \dots E_k, attr(R))$ we can similarly introduce the notion of the $(n_1, m_1), (n_2, m_2), \dots, (n_k, m_k)$-relationship. Obviously, these cardinality constraints are special functional dependencies. However, the semantics of functional dependencies and of cardinality constraints are different: functional dependencies are two-tuple constraints, whereas cardinality constraints are restrictions.

Proposition 3.10 (folklore)
1. For $R = (R_1, \dots, R_k, attr(R))$ the cardinality constraint $comp(R, R'_1 \dots R'_m) \le (1, 1)$ is valid if and only if the functional dependency $R'_1 \dots R'_m \to R_1, \dots, R_k$ is valid in $R$.
2. For $R = (R_1, \dots, R_k, attr(R))$ the cardinality constraint $comp(R, R') \ge (1, 1)$ is valid if and only if the inclusion dependency $R' \subseteq R[R']$ is valid in the scheme.

In the second case, the cardinality constraint defines an into constraint [Kob85]. The complexity $comp(Lecture, Course\ Semester\ Professor) = (0, 1)$ expresses that rooms are determined by courses, terms and teachers, and therefore the functional dependency

Lecture : {Course, Semester, Professor} $\to$ {Room, Time(Day, Hour)}

is valid in $Lecture^t$. Using the definition and properties of restrictions we directly conclude the following fact.

Proposition 3.11 The cardinality function is monotone decreasing for component sequences: Given a relationship $R = (R_1, \dots, R_k, \{A_1, \dots, A_l\})$. Then

$$comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) \le comp(R, R'_1 \dots R'_m)$$

for $1 \le j \le k$, $1 \le i_1 < i_2 < \dots < i_j \le k$, $\{R'_1, \dots, R'_m\} \subseteq \{R_{i_1}, R_{i_2}, \dots, R_{i_j}\}$.




This proposition shows that the cardinality function is based on some kind of "minimum semantics". The lower bound 0 is inherited by all supersequences of component sequences.

Corollary 3.12 Given a relationship $R = (R_1, \dots, R_k, \{A_1, \dots, A_l\})$ and, for $1 \le j \le k$, $1 \le i_1 < i_2 < \dots < i_j \le k$, let $\{R'_1, \dots, R'_m\}$ be a proper subset of $\{R_{i_1}, R_{i_2}, \dots, R_{i_j}\}$.
1. If $comp(R, R'_1 \dots R'_m) = (0, s)$ and $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n, p)$ then $n = 0$.
2. $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) \le \min_{\{R'_1, \dots, R'_m\} \subset \{R_{i_1}, R_{i_2}, \dots, R_{i_j}\}} comp(R, R'_1 \dots R'_m)$.
3. If $comp(R, R'_1 \dots R'_m) = (1, 1)$ then $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) \le (1, 1)$.
4. If $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) \ge (1, 1)$ then $comp(R, R'_1 \dots R'_m) \ge (1, 1)$.

The properties in proposition 3.11 and corollary 3.12 could be used for the axiomatization of the set of generalized complexities. However, as we will see later in this chapter, the axiomatization is more complex for generalized complexities. Using only inequalities obtained in this way, inconsistent schemes can be derived. For this reason we also introduce graphs on schemes and the complexity of paths. It is easy to see that the set of generalized complexities is not k-ary axiomatizable [Tha90] by rules with k premises.

The diagrams can also be labeled by cardinality constraints. It should be noted that there is little agreement on which edge labels to use, and what they mean in ER-diagrams. The classical notation is the following for binary relationships $R = (E_1, E_2, attr(R))$ (see for instance [EN89,Vos87]):
The edge $R \to E_1$ is labeled by $comp(R, E_2) = (n, m)$, or by 1 if $comp(R, E_2) \in \{(0, 1), (1, 1)\}$, or by n if $comp(R, E_2) \in \{(l, k) \mid l \in \{0, 1\},\ l < k,\ k > 1\}$.
The edge $R \to E_2$ is labeled by $comp(R, E_1) = (n, m)$, or by 1 if $comp(R, E_1) \in \{(0, 1), (1, 1)\}$, or by n if $comp(R, E_1) \in \{(l, k) \mid l \in \{0, 1\},\ l < k,\ k > 1\}$.

This notation cannot be extended to ternary relationships. For that reason, in [Teo89] complexities for ternary relationships are marked by shaded areas in the relationship triangle if the relationship is "many". More concretely, the $E_1$-corner in the triangle which represents the relationship $R = (E_1, E_2, E_3, attr(R))$ is not shaded if $comp(R, E_2 E_3) \le (1, 1)$. This notation is complicated, and $comp(R, E_j)$-complexities are not represented. This proposal could be extended to quaternary relationships, but then we lose the information on the other complexities. Other books avoid the question or present examples for binary relationship types only. [TL82] states that "the semantics of ternary and higher-order relationship sets can become quite complex to comprehend". But using the above definitions, it is possible to represent the entire complexity information. We can use this information for labeling.

For simplification, we propose the following simple labeling concept. This can be extended by the entire complexity information. Given a relationship $R = (R_1, \dots, R_k, \{A_1, \dots, A_l\})$. For $1 \le j \le k$, the edge $R \to R_j$ can be labeled by $comp(R, R_j) = (n, m)$, or by 1 if $comp(R, R_j) \in \{(0, 1), (1, 1)\}$, or by n if $comp(R, R_j) \in \{(l, k) \mid l \in \{0, 1\},\ l < k,\ k > 1\}$. For $1 \le j \le l$, the edge $R \to A_j$ can be labeled by $dom(A_j)$.

A more complex labeling would be the following label for the diamonds: Given a relationship $R = (R_1, \dots, R_k, \{A_1, \dots, A_l\})$. The diamond of the relationship $R$ is labeled by a subset of the set

$$\{ (\langle i_1, i_2, \dots, i_j \rangle, (n, m)) \mid 1 \le j \le k,\ 1 \le i_1 < i_2 < \dots < i_j \le k,\ comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n, m) \}.$$

This label extends the notion of [Teo89].

For instance, for the relationship type Lecture = (Professor, Course, Room, Semester, { Time(Day,Hour) }), besides the trivial generalized complexities like $(\langle 1 \rangle, (1, .))$, $(\langle 2 \rangle, (1, .))$, $(\langle 3 \rangle, (1, .))$, $(\langle 4 \rangle, (1, .))$, $(\langle 1, 3 \rangle, (0, .))$, $(\langle 1, 2, 3, 4 \rangle, (0, 1))$ and $(\langle 2, 3 \rangle, (0, .))$, we obtain nontrivial generalized complexities like
$(\langle 1, 2 \rangle, (0, 3))$ - each professor can take a course only three times,
$(\langle 1, 4 \rangle, (0, .))$ - there are professors which are absent for a term,
$(\langle 1, 2, 4 \rangle, (0, 1))$ - the functional dependency mentioned below, and
$(\langle 2, 4 \rangle, (1, 3))$ - each course is given in each term at least once but not more than three times.

The last generalized complexity implies, together with the last trivial generalized complexity, the complexity $(\langle 2, 3, 4 \rangle, (0, 3))$. The complexity $(\langle 1 \rangle, (1, .))$ together with $(\langle 1, 4 \rangle, (0, .))$ expresses the constraint that a new professor cannot be absent for his/her first term.

Two other notions can be used for the complexity of the relationship type $R = ((R_1, \dots, R_k), attr(R))$:

1. For each subsequence $R'_1 \dots R'_m$, the *-complexity $comp^*(R, R'_1 \dots R'_m) = (l, p)$ specifies that in each database state the items from $(R_1'^{\,t} \times \dots \times R_m'^{\,t}) \cap R^t|_{R'_1, \dots, R'_m}$ appear in $R^t$ at least $l$ and at most $p$ times, i.e.

$$comp^*(R, R'_1 \dots R'_m) = (l, p) \quad\text{iff}\quad \text{for all } t \text{ and all } r' \in R^t|_{R'_1, \dots, R'_m}:\ \ l \le |\{ r \in R^t \mid r(R'_1, \dots, R'_m) = r' \}| \le p.$$

2. For each subsequence $R'_1 \dots R'_m$, the +-complexity $comp^+(R, R'_1 \dots R'_m) = (l, p)$ specifies that in each database state the items from $R^t|_{R'_1}, \dots, R^t|_{R'_m}$ appear in $R^t$ at least $l$ and at most $p$ times, i.e.

$$comp^+(R, R'_1 \dots R'_m) = (l, p) \quad\text{iff}\quad \text{for all } t \text{ and all } e_1 \in R^t|_{R'_1}, \dots, e_m \in R^t|_{R'_m}:\ \ l \le |\{ r \in R^t \mid r(R'_i) = e_i,\ 1 \le i \le m \}| \le p.$$

Using the definition and properties of restrictions we directly conclude the following fact.

Proposition 3.13 Given a relationship type $R = (R_1, \dots, R_k, \{A_1, \dots, A_l\})$ and $\{R'_1, \dots, R'_m\} \subseteq \{R_{i_1}, R_{i_2}, \dots, R_{i_j}\}$ for $1 \le j \le k$, $1 \le i_1 < i_2 < \dots < i_j \le k$.
1. The *-complexity and the +-complexity are monotone decreasing for component sequences, i.e. $comp^+(R, R_{i_1} R_{i_2} \dots R_{i_j}) \le comp^+(R, R'_1 \dots R'_m)$ and $comp^*(R, R_{i_1} R_{i_2} \dots R_{i_j}) \le comp^*(R, R'_1 \dots R'_m)$.
2. $comp^*(R, R_{i_1} R_{i_2} \dots R_{i_j}) \ge (1, 1)$.
3. $comp^+(R, R_i) \ge (1, 1)$.
4. If $comp^*(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n_1, m_1)$, $comp^+(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n_2, m_2)$ and $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n_3, m_3)$ then $m_1 = m_2 = m_3$ and $n_3 \le n_2 \le n_1$.
5. If $comp^+(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (0, s)$ and $comp(R, R_{i_1} R_{i_2} \dots R_{i_j}) = (n, p)$ then $n = 0$.

Corollary 3.14 For $R = (R_1, \dots, R_k, attr(R))$ the cardinality constraint $comp^*(R, R'_1 \dots R'_m) = (1, 1)$ is valid if and only if the functional dependency $R'_1 \dots R'_m \to R_1, \dots, R_k$ is valid in $R$.

The complexity $comp^*(Lecture, Course\ Semester\ Professor) = (1, 1)$ expresses the validity of the functional dependency

Lecture : {Course, Semester, Professor} $\to$ {Room, Time(Day, Hour)}

in $Lecture^t$. This proposition shows that the *-complexity is the weakest form. We can extend Corollary 3.12 to the *-complexity and the +-complexity.

Sometimes the utilization of the +-complexity together with the complexity could be useful. For instance, if for a relationship type $R = (R_1, R_2, R_3, attr(R))$ we have $comp(R, R_1) = (0, m)$ and $comp(R, R_2) = (0, n)$, then $comp(R, R_1 R_2) = (0, k)$. But it is possible that $comp^+(R, R_1 R_2) = (1, k)$. The last assertion carries more semantical information and cannot be deduced directly from the premises. In the university example the complexity $comp^+(Lecture, Course\ Semester) = (0, 3)$ expresses that not every course is given in each term and that a course is not given more than three times in a term, whereas the complexity $comp^+(Lecture, Course\ Semester) = (1, 3)$ expresses that each course is given in any term and there are not more than three parallel sessions.

Theorem 3.15 Given a relationship type $R = (R_1, \dots, R_k, attr(R))$ and a subsequence $R'_1, \dots, R'_m$ of $R_1, \dots, R_k$. The following are equivalent:
1. $comp^+(R, R'_1 R'_2 \dots R'_m) \ge (1, 1)$.
2. The embedded cross dependency $(\{R'_1\}, \{R'_2\}, \dots, \{R'_m\})$ is valid in $R$, i.e. for each database state $R^t$ it is valid that $R^t|_{\{R'_1, \dots, R'_m\}} = R^t|_{R'_1} \times \dots \times R^t|_{R'_m}$.

Proof. The proof is obvious by the definitions.

Corollary 3.16 Given a relationship type $R = (R_1, \dots, R_k, attr(R))$ and a subsequence $R'_1, \dots, R'_m$ of $R_1, \dots, R_k$. If $comp(R, R'_1 R'_2 \dots R'_m) \ge (1, 1)$ then the embedded cross dependency $(\{R'_1\}, \{R'_2\}, \dots, \{R'_m\})$ is valid in $R$. The opposite is not valid.

Let us now consider the satisfiability of cardinality constraints. Generally, each HERM scheme with a set of cardinality constraints is satisfiable by the empty database. A HERM-scheme $S$ with a set of complexities $C$ is called consistent (strongly satisfiable) if there exists at least one database $DB = (r_1, \dots, r_k)$ in $SAT(S, C)$ in which all $r_i$ are not empty. This property is not trivial. If for instance

comp(Prerequis, Required:Course) = (1,2) and comp(Prerequis, Requires:Course) = (3,4),

meaning that each course requires at least three and at most four prerequisites and each course is required by at least one and at most two courses, then $Course^t$ is either empty or infinite. Suppose $Course^t \ne \emptyset$. Then there exists at least one course $c_1$ which is required by at least three other courses. Let us assume that the requiring courses are $c_1, c_2, c_3$. Furthermore, $c_2$ is required by at least three others, e.g. $c_1, c_2, c_3$. The course $c_3$ is required by at least three other courses. On the other hand, the courses $c_1, c_2, c_3$ can require at most two other courses. Therefore, $c_3$ is required by three other courses, e.g. $c_4, c_5, c_6$. We can repeat the same procedure for $c_4, c_5$. However, $c_6$ is to be required by at least three different courses, say $c_7, c_8, c_9$. Repeating this procedure we get an infinite set $Course^t$. The reason for this is the ratio in the recursion.

Let us first consider recursive relationships. Given a relationship $R = (R_1, \dots, R_n, attr(R))$ and the complexities $comp(R, R_i) = (x_i, y_i)$. Let us reorder the sequence of components in the relationship according to the components, i.e. $R = ((R_1, 1), \dots, (R_1, q_1), \dots, (R_k, 1), \dots, (R_k, q_k), attr(R))$ and $comp(R, (R_i, j)) = (x_{i,j}, y_{i,j})$, $1 \le i \le k$, $1 \le j \le q_i$. Then for each database $(R_1^t, \dots, R_k^t, R^t)$ satisfying the cardinality constraints we obtain the condition

$$x_{i,j} \cdot |R_i^t| \le |R^t| \le y_{i,j} \cdot |R_i^t|, \qquad 1 \le i \le k,\ 1 \le j \le q_i.$$

Summarizing these conditions we derive the following statement.

Proposition 3.17 Given a relationship $R = ((R_1, 1), \dots, (R_1, q_1), \dots, (R_k, 1), \dots, (R_k, q_k), attr(R))$ with $comp(R, (R_i, j)) = (x_{i,j}, y_{i,j})$, $1 \le i \le k$, $1 \le j \le q_i$. Then the HERM scheme $\{R_1, \dots, R_k, R\}$ with the complexities is consistent if and only if for all $i$, $1 \le i \le k$, with $q_i > 1$ it is valid that

$$\max\{ x_{i,j} \mid 1 \le j \le q_i \} \le \min\{ y_{i,j} \mid 1 \le j \le q_i \}.$$

Directly, we can conclude that for the example presented above the scheme with the complexities comp(Prerequis, Required:Course) = (1,2) and comp(Prerequis, Requires:Course) = (3,4) is inconsistent, whereas the scheme with the complexities comp(Prerequis, Required:Course) = (1,2) and comp(Prerequis, Requires:Course) = (2,4) is consistent.

Let us now generalize the approach of [LN90], where a nonconstructive solution is proposed. Generalizing the approach of [LN90], it can be shown that there is a constructive criterion. Furthermore, we extend the approach to recursive relationships. For that we introduce a labeled graph for HERM-schemes $S = \{E_1, \dots, E_k, R_1, \dots, R_m\}$ and a set of associated complexities $C$: $G(S, C) = (V, E)$ where

$$V = \{(E_1, 1), \dots, (E_k, 1)\} \cup \{ (R_i, j) \mid 1 \le i \le m,\ R_i = (R'_1, \dots, R'_n, attr(R_i)),\ 1 \le j \le n \}$$

$$\begin{aligned}
E ={} & \{ ((R'_j, 1), (R_i, j), y) \mid R_i = (R'_1, \dots, R'_n, attr(R_i)),\ comp(R_i, R'_j) = (x, y) \} \\
& \cup \{ ((R_i, j), (R_i, j+1), 1),\ ((R_i, j+1), (R_i, j), 1) \mid R_i = (R'_1, \dots, R'_n, attr(R_i)),\ 1 \le j < n \} \\
& \cup \{ ((R_i, j), (R'_j, 1), c(i, j)) \mid R_i = (R'_1, \dots, R'_n, attr(R_i)) \} \\
& \cup \{ ((R'_j, 1), (R_i, j), \infty) \mid R_i = (R'_1, \dots, R'_n, attr(R_i)),\ comp(R_i, R'_j) \text{ not defined} \} \\
& \cup \{ ((R_i, j), (R'_j, 1), \infty) \mid R_i = (R'_1, \dots, R'_n, attr(R_i)),\ comp(R_i, R'_j) \text{ not defined} \}
\end{aligned}$$

where

$$c(i, j) = \begin{cases} \dfrac{1}{x} & \text{if } comp(R_i, R'_j) = (x, y) \ge (1, 1) \\ \infty & \text{otherwise.} \end{cases}$$

For the example considered above we obtain the graph in figure 3.

Figure 3: Graph for Prerequisites

Let $G = (V, E)$, where $V = \{v_i \mid 1 \le i \le n\}$ and $E = \{(v_i, v_j, c(i, j))\}$, be a labeled graph. A sequence $p = v_1, \dots, v_n$ of nodes from $V$ with $(v_i, v_{i+1}, j) \in E$ for some $j$ and all $i$, $1 \le i < n$, is called a path, and is called a cycle if additionally $v_1 = v_n$. The cycle is called simple if the elements in the sequence are pairwise different (apart from $v_1 = v_n$). For a sequence $p = v_1, \dots, v_n$ of nodes from $V$ with $(v_i, v_{i+1}, w_i) \in E$ the value

$$weight(p) = \prod_{i=1}^{n-1} w_i$$

is called the weight of $p$. A critical cycle $p$ is a simple cycle with a weight $weight(p)$ less than 1. A critical cycle in figure 3 is, for instance, the cycle (Course,1), (Prerequis,1), (Prerequis,2), (Course,1). The weight of the cycle is $\frac{2}{3}$. Let us consider the properties of paths.
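The weight of the critical cycle in the Prerequis example can be recomputed with exact fractions. The sketch below is illustrative only: the edge labels are an assumed reading of the graph definition for comp(Prerequis, Required:Course) = (1,2) and comp(Prerequis, Requires:Course) = (3,4), chosen so that the product reproduces the stated weight, and the names `edges`, `weight` and `is_critical` are hypothetical.

```python
from fractions import Fraction
from math import prod

# Assumed edge labels along the cycle: upper bound y = 2 on the way
# into the relationship, 1 between its two component positions, and
# 1/x = 1/3 on the way back to Course.
edges = {
    (("Course", 1), ("Prerequis", 1)): Fraction(2),
    (("Prerequis", 1), ("Prerequis", 2)): Fraction(1),
    (("Prerequis", 2), ("Course", 1)): Fraction(1, 3),
}


def weight(path, edges):
    """Product of the edge labels along consecutive nodes of `path`."""
    return prod(
        (edges[(u, v)] for u, v in zip(path, path[1:])),
        start=Fraction(1),
    )


def is_critical(cycle, edges):
    """A cycle is critical if its weight is less than 1."""
    return cycle[0] == cycle[-1] and weight(cycle, edges) < 1


cycle = [("Course", 1), ("Prerequis", 1), ("Prerequis", 2), ("Course", 1)]
print(weight(cycle, edges), is_critical(cycle, edges))
```

Exact fractions are used instead of floating-point numbers so that a weight of exactly 1 is never misclassified by rounding.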

Lemma 3.5 Given a path p = (R01; i1); :::; (R0k; ik). Let weight(p) = mn < 1. Then m j R01t j  n j R0kt j. The proof is obvious. Therefore if weight(p) > 1 then in any database R0kt has more elements than R01t . Now we conclude directly




Proposition 3.18 Given a critical cycle $p = (R'_1, i_1), \dots, (R'_k, i_k), (R'_1, i_1)$ with $weight(p) < 1$. Then in any database satisfying the complexities the sets $R_j'^{\,t}$ are either empty or infinite.

Corollary 3.19 Given a HERM-scheme $S$ and a set of associated complexities $C$. If a critical cycle exists in $(S, C)$ then the scheme $(S, C)$ is inconsistent.

We can ask now whether it is possible to obtain from inconsistent schemes consistent subschemes.

Algorithm 1.

Given a HERM-scheme $S = E_1, \dots, E_n, R_1, \dots, R_k$, a set of associated complexities $C$, and a set $P = \{p_i \mid 1 \le i \le m\}$ of critical cycles.
Step 1. Mark all entity and relationship types which are in a critical cycle of $P$.
Step 2. Mark all relationship types which have components which are marked.
Step 3. Repeat step 2 until all possible relationship types are marked.
Step 4. Delete all marked entity and relationship types and the associated complexities from the scheme.
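The four steps of algorithm 1 amount to a fixpoint marking followed by a deletion. The following sketch illustrates this under simplifying assumptions: the scheme is encoded as a dictionary from type names to component tuples, complexities are not modeled, and the function name `prune` and the sample scheme are hypothetical.

```python
def prune(components, critical_cycles):
    """Return the types that survive algorithm 1.

    components:      dict mapping a type name to the tuple of its
                     component type names (empty for entity types)
    critical_cycles: iterable of cycles, each an iterable of type names
    """
    # Step 1: mark every type that lies on a critical cycle.
    marked = {t for cycle in critical_cycles for t in cycle}
    # Steps 2 and 3: mark types with a marked component until
    # nothing changes any more.
    changed = True
    while changed:
        changed = False
        for t, comps in components.items():
            if t not in marked and any(c in marked for c in comps):
                marked.add(t)
                changed = True
    # Step 4: delete all marked types.
    return {t: comps for t, comps in components.items() if t not in marked}


# Hypothetical fragment of the university scheme.
components = {
    "Course": (),
    "Professor": (),
    "Department": (),
    "Prerequis": ("Course", "Course"),
    "Lecture": ("Professor", "Course"),
    "In": ("Professor", "Department"),
}
cycles = [["Course", "Prerequis"]]

print(sorted(prune(components, cycles)))
```

In this fragment Lecture is deleted as well, because its component Course lies on the critical cycle.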

Lemma 3.6 The scheme $(S', C')$ obtained from the scheme $(S, C)$ by algorithm 1 is consistent.

The proof uses the proof of [LN90]. In the proof of [LN90], correct assignments are used similarly to lemma 3.5. It can easily be shown that if a graph does not contain critical cycles then one correct assignment is a multiple of the path weights.

Proof. Let us first consider a simple fact stating that a relationship set cannot have more elements than the cartesian product of the component sets. Given a relationship $R = ((R_1, 1), \dots, (R_1, q_1), \dots, (R_k, 1), \dots, (R_k, q_k), attr(R))$ and $comp(R, (R_i, j)) = (x_{i,j}, y_{i,j})$, $1 \le i \le k$, $1 \le j \le q_i$. Then for each database $(R_1^t, \dots, R_k^t, R^t)$ satisfying the cardinality constraints we obtain the condition

$$|R^t| \le \prod_{i=1}^{k} |R_i^t|^{q_i}.$$

Furthermore, the scheme obtained by algorithm 1 does not contain critical cycles. Now we can apply the following system of inequalities for the scheme $S = E_1, \dots, E_n, R_1, \dots, R_k$ obtained by algorithm 1 and for the database $S^t = E_1^t, \dots, E_n^t, R_1^t, \dots, R_k^t$ according to lemma 3.5: For $R = ((R_1, 1), \dots, (R_1, q_1), \dots, (R_k, 1), \dots, (R_k, q_k), attr(R))$ and $comp(R, (R_i, j)) = (x_{i,j}, y_{i,j})$, $1 \le i \le k$, $1 \le j \le q_i$,

(1) $|R^t| \ge x_{i,j} \cdot |R_i^t|$ and
(2) $|R^t| \le y_{i,j} \cdot |R_i^t|$.

Therefore, if the schema is consistent then each database satisfies (1) and (2). Since there are no critical cycles, the ratio defined by (1) and (2) is consistent. It is known from algebra that in this case there is an integer solution for this system. Now we need to construct the corresponding entity and relationship sets for this solution. For that we use an abstract domain NAT of all natural numbers. If for an entity set $|E^t| = n$ then let $E^t = \{1, 2, \dots, n\}$. The relationships are to be constructed using the merging of [LN90]. For an n-ary relationship type we use the sets of the component types as follows. Let us number all elements of the relationship set according to the solution of the inequality system. For the first component of the relationship set we associate the elements of the component set, in the order of the component set, to elements of the relationship set. Then we reorder the relationship set according to the order given by the component set. Now we associate elements of the next component set, in the order of the component set, to the relationship set. We continue this for all components. It is easy to see that the new relationship set satisfies (1) and (2). Since the database obtained during this procedure is fully populated and satisfies the inequality relations, we can conclude that $(S, C)$ is consistent.

Another proof of lemma 3.6 uses the following idea. This idea can be used further for the simplification of schemes and for the simplification of reasoning on scheme properties. For this reason we introduce the contraction of schemes. Each cycle can be represented by a relationship $R$ and a recursive relationship $R' = ((R, 1), (R, 2), \emptyset)$ on $R$. We use the following construction. Given the graph $G(S, C) = (V, E)$ and the cycle $p = R_1, \dots, R_n, R_{n+1}$ with $R_{n+1} = R_1$. Since the scheme $S$ is a hierarchical scheme, the cycle has one element $R_i$ such that the elements $R_{i-1}$ and $R_{i+1}$ are of a higher order than $R_i$. Now we can construct a new relationship type using the cycle $p$. Without loss of generality we can assume that $i = 1$.
The cycle p de nes a forest of relationship and entity types. For the cycle p we de ne an algebraic compression expression. If the order of the type Rj ?1 is less than the order of the type Rj we de ne a new type by Rj [Rj?1 ] which is of the order of Rj?1 . If the order of the type Rj ?1 is greater than the order of the type Rj we de ne a new type by Rj ?1 [Rj ] which is of the order of Rj . If the orders of the types Rj ?1 and Rj are equal then we de ne a new type by the join of the two types. This compression computes the cycle p for a given database. Let us denote by Rp the binary relationship type on R1. For instance given the types R1 = (:::; R6; :::) ; R2 = (:::R3; :::; R6; :::) ; R4 = (:::; R3; :::) ; R5 = (:::R1; :::; R4; ::::) and the cycle p = R5; R1; R6; R2; R3; R4; R5. Then we obtain the relationship type Rp = R5[ (R1[R6]) ; ( (R4[R3] 1 R2[R3; R6] )[R6]) ] which is a recursive binary type on R6 . We denote the join by 1 and the projection to components by [ ]. Using lemma 3.5 we can compute the corresponding complexities of the new relationship type. Directly we get that this new type has a critical cycle via the rst to the second component of the new type if and only if the cycle p is critical. For this reason, we need to consider only




binary recursive relationship types for proofs of properties of cycles. This special compression explains why, for cycles, we need to consider only binary relationship types. For this reason, if we consider only the satisfiability of complexities, binary relationship types can be used instead of relationship types of higher arity. Lemma 3.6 can be reformulated.

Corollary 3.20 If a scheme $(S, C)$ does not have critical cycles then the scheme is consistent.

Summarizing Corollaries 3.19 and 3.20 we get the following general fact.

Theorem 3.21 Given a HERM-scheme $S$ and a set of associated complexities $C$. $(S, C)$ is inconsistent iff the graph $G(S, C)$ contains a critical cycle.

Whether the graph contains a critical cycle can be computed by a special matrix product. Let $G = (V, E)$ be a labeled graph with $V = \{v_i \mid 1 \le i \le n\}$ and $E = \{(v_i, v_j, c(i,j))\}$. We define an adjacency matrix $M^0(G)$ by

$$m^0_{i,j}(G) = \begin{cases} c(i,j) & \text{if } (v_i, v_j, c(i,j)) \in E \\ \infty & \text{otherwise.} \end{cases}$$

Now we define $M^{s+1}(G)$ inductively as follows:

$$m^{s+1}_{i,j}(G) = \min\left( \{ m^s_{i,j}(G) \} \cup \{ m^s_{i,k}(G) \cdot m^0_{k,j}(G) \mid 1 \le k \le n \} \right).$$

Now we can conclude the following proposition.

Proposition 3.22 The graph $G(S, C)$ contains a critical cycle iff there are $s, i, j$ such that $m^{s+1}_{i,j}(G(S, C)) < 1$.
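The matrix product above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: nodes are numbered from 0, edge labels are given directly as positive rationals, "no edge" is encoded as infinity, and the check looks at diagonal entries (paths leading from a node back to itself), which is one way to read the condition of the proposition. The function names `adjacency`, `step` and `has_cycle_below_one` are hypothetical and do not come from the paper.

```python
from fractions import Fraction
import math

INF = math.inf  # assumed encoding of "no edge"


def adjacency(n, edges):
    """Build M^0: entry (i, j) is the label c(i, j) of an edge, infinity otherwise."""
    m = [[INF] * n for _ in range(n)]
    for i, j, c in edges:
        m[i][j] = min(m[i][j], c)
    return m


def step(ms, m0):
    """One step of the recurrence: min of the old entry and all products m^s[i][k] * m^0[k][j]."""
    n = len(m0)
    nxt = [row[:] for row in ms]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                a, b = ms[i][k], m0[k][j]
                if a != INF and b != INF:  # skip missing edges
                    nxt[i][j] = min(nxt[i][j], a * b)
    return nxt


def has_cycle_below_one(n, edges):
    """Return True if some cycle of length at most n + 1 has a weight product below 1."""
    m0 = adjacency(n, edges)
    ms = m0
    for _ in range(n):
        if any(ms[i][i] < 1 for i in range(n)):
            return True
        ms = step(ms, m0)
    return any(ms[i][i] < 1 for i in range(n))


# Hypothetical two-node graph: the cycle 0 -> 1 -> 0 has weight 2 * 1/3 = 2/3.
print(has_cycle_below_one(2, [(0, 1, Fraction(2)), (1, 0, Fraction(1, 3))]))
```

Exact rationals (`Fraction`) are used so that a cycle of weight exactly 1 is not misclassified through rounding.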

According to Theorem 3.21 the implication problem of complexities differs for acyclic schemes and schemes containing a cycle. Using the equivalence of functional and inclusion dependencies and a result of [KCV83] we get that the implication problem of complexities for acyclic HERM schemes is PSPACE-hard. Using the approach of [CFP84] we can show that the implication problem for cyclic HERM schemes is not axiomatizable.

We can also exploit the weight function of paths for scheme corrections. Let us consider paths $p$ with $weight(p) = 1$. According to Lemma 3.5, paths with this property must have a ratio greater than or equal to 1. Let us consider an abstract scheme:

$R_2 = (R_1, R_3, \emptyset)$, $R_4 = (R_3, R_1, \emptyset)$,
$comp(R_2, R_1) = (1, 1)$, $comp(R_2, R_3) = (0, 1)$, $comp(R_4, R_3) = (1, 1)$, $comp(R_4, R_1) = (0, 1)$.

According to Theorem 3.21 this scheme is consistent. Let us assume that one of the $(0, 1)$-complexities is proper in the database $(R_1^t, R_2^t, R_3^t, R_4^t)$, for instance the first. Then there is an element $c$ in $R_3^t$ which is not related to an element in $R_1^t$ by $R_2^t$. This element is related to one and only one element $a$ in $R_1^t$ by $R_4^t$. This element is related to another element $c'$ of $R_3^t$ by $R_2^t$. Via $R_4^t$ we obtain another element $a'$ in $R_1^t$. Now we get $(a', c'') \in R_1^t$. Further, $c'' \ne c$. Continuing this consideration we conclude that the database must be infinite in this case. If this scheme has a finite database then no such dangling $c$ exists. Therefore we get $comp(R_2, R_1) = (1, 1)$, $comp(R_2, R_3) = (1, 1)$, $comp(R_4, R_3) = (1, 1)$, $comp(R_4, R_1) = (1, 1)$.

Proposition 3.23 Given a HERM-scheme $S$, a set of associated complexities $C$ and a cycle $p$ with $weight(p) = 1$. Then all complexities of the cycle are finite, i.e. for the graph $G(S, C) = (V, E)$ and the path $p = v_1, \ldots, v_n$ of nodes from $V$ with $(v_i, v_{i+1}, j) \in E$ for some $j$, the condition $j \ne \infty$ holds for all $i$, $1 \le i < n$.

This property is similar to a rule used in the axiomatization of finite implication of unary inclusion dependencies and functional dependencies [Tha90,KCV83].

Proof. Given the graph $G(S, C) = (V, E)$, a finite database $DB$ satisfying $C$ and the cycle $p = R_1, \ldots, R_n, R_{n+1}$ with $weight(p) = 1$ and $R_{n+1} = R_1$. Without loss of generality we assume that $R_2$ and $R_n$ are relationship types on $R_1$. Now we contract the path $p$ to one binary relationship type $R_2'$ by separating the subpaths $p_1 = v_1, v_2$ and $p_2 = v_2, \ldots, v_n$. Using Lemma 3.5 we get for $R_2'^{t}$

$$|R_1^t| \le weight(p_1) \cdot |R_2'^{t}[R_1, 1]|, \qquad |R_1^t| \le weight(p_2) \cdot |R_2'^{t}[R_1, 2]|,$$

and therefore, since $weight(p) = 1$,

$$|R_1^t| = weight(p_2) \cdot |R_2'^{t}[R_1, 2]| = weight(p_1) \cdot |R_2'^{t}[R_1, 1]|.$$

But in this case, no dangling element can exist in the database sets in the cycle.

The proposition cannot be extended to paths with a weight higher than 1. In this case, dangling values are possible. Using this proposition we can now use the following algorithm for correcting schemes.

Algorithm 2.

Given a HERM-scheme $S = E_1, \ldots, E_n, R_1, \ldots, R_k$, a set of associated complexities $C$, and a set $P = \{p_i \mid 1 \le i \le m\}$ of cycles with weight 1.
Step. For each cycle and for each complexity $comp(R_i, R_j) = (x, y)$ in the cycle: if $x = 0$ then correct $x$ to 1.
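The correction step can be illustrated by a minimal Python sketch. It assumes that the complexities are stored in a dictionary keyed by (relationship type, component type) pairs and that each cycle is passed as a list of such pairs already known to have weight 1; this representation and the name `correct_complexities` are assumptions made for the illustration, not part of the paper.

```python
def correct_complexities(comp, cycles):
    """Return a copy of comp in which each lower bound 0 on a listed cycle is raised to 1."""
    corrected = dict(comp)
    for cycle in cycles:
        for pair in cycle:
            x, y = corrected[pair]
            if x == 0:
                corrected[pair] = (1, y)
    return corrected


# The abstract scheme discussed before Proposition 3.23.
comp = {
    ("R2", "R1"): (1, 1),
    ("R2", "R3"): (0, 1),
    ("R4", "R3"): (1, 1),
    ("R4", "R1"): (0, 1),
}
cycle = [("R2", "R1"), ("R2", "R3"), ("R4", "R3"), ("R4", "R1")]
print(correct_complexities(comp, [cycle]))
```

For this input every complexity becomes (1, 1), which agrees with the result derived for the abstract scheme in the text.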




Algorithm 2 and Proposition 3.23 lead directly to the following fact.

Corollary 3.24 Given a HERM-scheme $S$ and a set of associated complexities $C$. The scheme $(S, C)$ is equivalent to the scheme $G(S, C')$ obtained by Algorithm 2.

Let us now consider the question whether the properties of complexities are inherited by generalized complexities. Obviously, if only generalized complexities are given, then we cannot derive conditions on complexities. But if both complexities and generalized complexities are given, then we can derive inequalities using Proposition 3.11 and Corollary 3.12. For instance, given the relationship type $R = (E_1, E_2, E_1)$ and the constraints $comp(R, (E_1, 1)) = (1, 2)$, $comp(R, (E_1, 2), E_2) = (3, 4)$, we obtain $comp(R, (E_1, 2)) \ge (3, 4)$ using Corollary 3.12, and by Theorem 3.21 that $R^t = \emptyset$ for each $t$. Proposition 3.17 can be generalized using the same proof.
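The emptiness of $R^t$ in the example with $R = (E_1, E_2, E_1)$ can be traced by a short counting argument. The following is a sketch only: it assumes the usual reading of a complexity $comp(R, (E, i)) = (x, y)$, namely that every element of $E^t$ occurs in at least $x$ and at most $y$ tuples of $R^t$ at position $i$, and it reads the derived constraint as a lower bound of 3.

```latex
\begin{aligned}
comp(R,(E_1,1)) = (1,2) \;&\Rightarrow\; |R^t| \le 2\,|E_1^t|, \\
comp(R,(E_1,2)) \ge (3,4) \;&\Rightarrow\; |R^t| \ge 3\,|E_1^t|, \\
3\,|E_1^t| \le |R^t| \le 2\,|E_1^t| \;&\Rightarrow\; |E_1^t| = 0 \;\Rightarrow\; R^t = \emptyset .
\end{aligned}
```

Under the same assumptions, the two bounds correspond to a cycle through $R$ and $E_1$ with weight $2 \cdot \tfrac{1}{3} < 1$.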

Proposition 3.25 Given a relationship $R = ((R_1, 1), \ldots, (R_1, q_1), \ldots, (R_k, 1), \ldots, (R_k, q_k), attr(R))$ with $comp(R, (R_i, j)) = (x_{i,j}, y_{i,j})$, $1 \le i \le k$, $1 \le j \le q_i$, and $comp(R, (R_i, j)(R_i, j')) = (x_{(i,j)(i,j')}, y_{(i,j)(i,j')})$, $1 \le i \le k$, $1 \le j < j' \le q_i$. Then the HERM scheme $\{R_1, \ldots, R_k, R\}$ with the complexities is inconsistent if for some $i$, $1 \le i \le k$, with $q_i > 1$ it is valid that

$$\max\left\{ \frac{x_{i,j}}{x_{(i,j)(i,j')}} \;\middle|\; 1 \le j < j' \le q_i \right\} > \min\left\{ \frac{y_{i,j}}{y_{(i,j)(i,j')}} \;\middle|\; 1 \le j < j' \le q_i \right\}.$$

4 Conclusion

The goal of database modeling is to design an efficient and appropriate database. Some important criteria are performance, integrity, understandability, and extensibility. We have developed an extension of the entity-relationship model. Based on this extension a new approach to database design has been developed which is effective in meeting these goals. This approach shows that a strong theory can be developed and applied for important practical problems. The history of database management systems demonstrates that a lacking theoretical basis leads to poor and difficult to apply technologies. The presented model has the following advantages:

1. The model has a strong theoretical basis.
• The model is based on a multitype logic which is equivalent to first-order predicate logic. For this reason, results known from discrete mathematics and relational theory [Tha90] can be used.



• The model covers the complete modeling information. The structure, static semantics, generic and user-specified operations and behavior of an application can be described by the model.
• The theory is simplified and cleaned up. Sequences, subsets and powersets of objects can be modeled directly. Is-A-Relationships are treated in a better way. Weak entity types can be avoided completely. A normal form theory is developed for the HERM. Using this normal form theory, we can obtain normalized schemes as in the classical theory.
• Since the model uses the distinction between kernel objects and dependent objects, the database schemes are directly translatable to classical models without additional restructuring.

2. The modeling is more natural and can be applied in a simple manner. Only necessary facts are to be expressed.
• The model supports a direct translation to the three classical database models. This translation preserves normal forms. Since a direct translation to relational, network and hierarchical schemes can be used, the design decisions can be used directly to obtain schemes in normal forms. The translation theory can be used for multimodel and multisystem support [YaT89] and presents a practical solution to interoperability of systems.
• The HERM algebra is used for query definition. The corresponding relational, network or hierarchical queries can be automatically generated.
• The model supports a rich set of constraints. These constraints are used for the development of the scheme equivalence. Although the excessive number of fact-encoding mechanisms means that the same semantic unit can be declared in many syntactically different and compatible ways, the information described is equivalent. This equivalence theory can be used for automatic modification of schemes [BOT90].
• The database maintenance procedures can be derived using the design information.
• Using a knowledge base, previous and system-provided design decisions can be reused or partially reused, which simplifies the design task. Furthermore, similarities in the design can be detected and used for simplification of the implementation.
• Using the whole design information, the retranslation of application programs can be used for the adaptation of existing database systems to changing environments.

3. The theory is applicable to practical needs.
• Based on the theory a multi-paradigm, robust design methodology is developed [Tha91,YaT89] which incorporates approaches known in object-oriented modeling [STW91], modular programming [Tha89] and programming in the large.
• Using this modeling approach, a view cooperation concept was developed. Since full view integration is not decidable and not axiomatizable, view cooperation is the only applicable approach.



• The approach can be used for reverse engineering. Systems and programs developed for one management system can be recompiled and adapted to other management systems.

4. The results of the design are much simpler than in other approaches.
• We have also used the model for modeling some more complex applications. One observation is that the obtained schemes are three to five times simpler than those obtained by other models. The example of [TWB89] is simplified by a factor of four and can be placed on one page or one screen. In other examples, the simplification makes it possible to find a model. Using this modeling approach, an airport counter application was modelled by fewer than 40 entity types and fewer than 120 relationship types, whereas the original solution with more than 150 entity types and more than 400 relationship types was unacceptable to users because of its complexity and nontransparency.
• The simplification also leads to a better understanding of the application and makes normalization easier to perceive.
• The schemes avoid additional redundancy. Using HERM, the normalization and the minimalization of schemes can be considered together.

5. The model is easy to understand, simple and perceivable.
• The model can be used as a basis of database design tools [Tha89,Tha91]. The system $(DB)^2$ is used at present by more than 100 user groups.
• The modeling approach is simple to use even for large problems.
• Since the model uses graphical representations, the modeling result is easier to understand and visible.
• In an experiment, 20 novice or end-user database designers learned the HERM methodology and later designed different database schemes in different areas. Our experience was that the methodology was easily accepted and led to more accurate and less redundant schemes and to a correct modeling of complex relationships.
• Using query forms [YaT89] the user can specify application transactions and programs on top of the HERM, which substantially reduces the complexity of application programming.

We have presented an extended entity-relationship model together with some theoretical investigations on this model. The extensions are simple but very useful for practical database modeling. We developed some theoretical fundamentals of this model and illustrated that this model has a sound, well-founded theory like the relational model. The introduced approach has several advantages over the existing ones.
• This approach can be used for value-oriented modelling of databases as well as for object-oriented modelling of value-representable databases.



• The semantics is well-founded. We can use most of relational database theory. Therefore, a rich set of well-founded semantical constructs is available.
• The structure of the modelled databases can be simply represented by diagrams.

Relationships on relationships are independently introduced in [RMN90]. The simplification of ER-schemes is one of the most important advantages of this extension. ER-schemes are normally much simpler with this extension. For instance, the corresponding diagram of example 3 cannot be placed on one page if we use the approach of [TWB89]. Since IsA-relationships and other dependence relationships like weak entity types are often to be used, the normal simplification ratio can be estimated by three to five, i.e. HERM schemes are up to five times simpler. Furthermore, IsA-relationships can be extracted more easily. The direct representability of relational decomposition and relational normalization is another important advantage. For this reason, the schemes can be normalized on the HERM level and then translated to third normal form or BCNF schemes. Normally, the ER model requires normalization after translation.

As is also illustrated in [RMN90,Tha89,YaT89], simple structural associations have to be represented in the ERM approach by very complex semantical constraints like generalized path dependencies, requiring from the designer a deep knowledge of logic and a high abstraction level. However, using the HERM approach, structural associations can be represented by structural constructs which are much simpler. In [YaT89] an example of a complex database scheme of the Kuwait Ministry of Planning is discussed. This database was represented in a simple manner by a HERM scheme which used relationship types of third order and was then translated to an efficient relational scheme together with the generation of restructuring procedures of the existing database and the generation of adaptation procedures of existing transaction modules.
Then, after restructuring the original database, the new database could operate in the same manner as before but was more efficient. Since the HERM approach also represents hierarchies of types, this approach can also be used for restructuring and adapting existing network and even hierarchical databases to relational databases without any loss of information.

A theory of satisfiability for cardinality constraints was presented. This theory has to be extended to other constraints. Determining whether a given set of integrity constraints is satisfiable or not is an undecidable problem because of the undecidability of consistency of a set of logical formulas. As far as satisfiability is concerned, dependencies are uncritical [Tha90]. Dependencies are valid in the empty database and in trivial databases in which each relation contains not more than one element. But extending our set of integrity constraints it is easy to find examples for unsatisfiable schemes (see for instance [Man90]). Let us consider our example with two additional relationship types Chair = (Department, Professor, $\emptyset$), Leads = (Subord:Professor, Leader:Professor, $\emptyset$), with the complexities comp(Chair, Department) = (1,.), comp(In, Professor) = (1,.), the inclusion dependency Chair[Department,Professor] $\subseteq$ In[Department,Professor], the path inclusion dependency




Professor-In-Department-Chair-Professor[In.Professor,Chair.Professor] $\subseteq$ Subord:Professor-Leads-Leader:Professor[Leads.Subord:Professor,Leads.Leader:Professor], the existence constraint $\exists$ Professor and the hierarchical constraint specifying the hierarchy in departments Leads(x,y) $\rightarrow$ x $\ne$ y. These constraints have a perfectly natural semantics in the scheme. Unfortunately, the scheme can be shown to be unsatisfiable. Every database of this example has to contain at least one professor, say $p_1$, who is a person. By the second complexity, there is at least one department, say $d_1$. In order to satisfy the first complexity constraint, there exists another professor $p_2$ who is the chair of the department $d_1$ and, according to the inclusion dependency, a member of this department. Up to now $p_1$ and $p_2$ can be the same person. The path inclusion dependency relates $p_1$ to $p_2$ and also $p_2$ to $p_2$. The last relationship contradicts the hierarchical constraint. The reason for this inconsistency is the exceptional case for the path inclusion dependency that the chair of the department is not his own leader. Thus the path inclusion dependency should be extended to the disjunctive path inclusion dependency [Tha90]

Professor-In-Department-Chair-Professor[In.Professor,Chair.Professor] $\subseteq$ Subord:Professor-Leads-Leader:Professor[Leads.Subord:Professor,Leads.Leader:Professor] $\cup$ Professor-In-Department-Chair-Professor[Chair.Professor,In.Professor]

which states that the path inclusion dependency is valid for those professors who are not the chairs of the department.

References

[AlT90] S. Al-Fedaghi and B. Thalheim. Fundamentals of the database theory: The key concept. Submitted for publication. Kuwait 1990.
[AtPa86] P. Atzeni and D.S. Parker. Set containment inference. Proc. ICDT'86, LNCS 243, 1986, 73-90.
[BaRS82] F. Bancilhon, P. Richard and M. Scholl. On-line processing of compacted relations. Proc. 8th VLDB Conf., 1982, 263-269.
[BeK86] C. Beeri and M. Kifer. An integrated approach to logical design of relational database schemes. ACM TODS, 11, 1986, 159-185.
[BeK90] C. Beeri and Y. Kornatzky. Algebraic optimization of object-oriented query languages. Proc. ICDT 90 (eds. S. Abiteboul and P.C. Kanellakis), Lecture Notes in Computer Science 470, 72-88.
[BOT90] P. Bachmann, W. Oberschelp, B. Thalheim, and G. Vossen. The design of RAD: Towards an interactive toolbox for database design. RWTH Aachen, Fachgruppe Informatik, Aachener Informatik-Berichte, 90-28, 1990.



[CFP84] M.A. Casanova, R. Fagin, and C.H. Papadimitriou. Inclusion dependencies and their interaction with functional dependencies. Journal of Computer and System Sciences, 28, 1, 1984, 29-59.
[Cod90] E.F. Codd. The relational model for database management, Version 2. Addison-Wesley, Reading, 1990.
[CoK83] S.S. Cosmadakis and P.C. Kanellakis. Functional and inclusion dependencies - A graph theoretic approach. Technical Report CS-83-21, Brown University, Dept. of Computer Science, 1983.
[DK83] J. Demetrovics and G.O.H. Katona. Combinatorial problems of database models. Colloquia Mathematica Societatis Janos Bolyai 42, Algebra, Combinatorics and Logic in Computer Science, Gyor (Hungary), 1983, 331-352.
[DLM89] J. Demetrovics, L.O. Libkin, and I.B. Muchnik. Functional dependencies and the semilattice of closed classes. Proc. MFDBS-89, LNCS 364, 1989, 136-147.
[EN89] R. Elmasri and S.H. Navathe. Fundamentals of database systems. Benjamin/Cummings Publ., Redwood City, 1989.
[FoMV91] A. Formica, M. Missikoff, and S. Vazzana. An object-oriented data model for artificial intelligence applications. LNCS 504, Springer 1991, 26-41.
[GPV88] M. Gyssens, J. Paredaens, and D. Van Gucht. A uniform approach towards handling atomic and structural information in the nested relational database model. Report of the University of Antwerp UIA 88-17, 1988.
[GrM85] J. Grant and J. Minker. Inferences for numerical dependencies. Theoretical Computer Science 41, 1985, 271-287.
[Hai90] J.-L. Hainaut. Entity-relationship models: formal specification and comparison. Proc. 9th ER Conference, ed. H. Kangassalo, 1990, 53-64.
[Hul90] G. Hulin. On restructuring nested relations in partitioned normal form. Proc. VLDB, 1990, 626-637.
[JaN83] S. Jajodia and P.A. Ng. On representation of relational structures by entity-relationship diagrams. Entity-Relationship Approach to Software Engineering, eds. C.G. Davis, S. Jajodia, P.A. Ng and R.T. Yeh, North-Holland, 1983, 249-263.
[KCV83] P.C. Kanellakis, S.S. Cosmadakis, and M.Y. Vardi. Unary inclusion dependencies have polynomial time inference problems. Technical Report CS-83-09, Brown University, Dept. of Computer Science, 1983.
[Kob85] I. Kobayashi. An overview of database management technology. In Advances in Information System Science, ed. J.T. Tou, Vol. 9, Plenum Press, New York, 1985.
[LN90] M. Lenzerini and P. Nobili. On the satisfiability of dependency constraints in entity-relationship schemata. Information Systems, Vol. 15, 4, 1990, 453-461.
[Lien80] Y.E. Lien. On the semantics of the entity-relationship model. Entity-Relationship Approach to system analysis and design, ed. P.P. Chen, 1980, 155-167.



[Man90] R. Manthey. Satisfiability of Integrity Constraints: Reflections on a neglected problem. Proc. Second workshop on Foundations of Models and Languages (eds. J. Goers, A. Heuer), Aigen, 1990, Informatik-Bericht 90/3, University Clausthal-Zellerfeld, Computer Science Dept., 169-180.
[MeZ90] M.A. Melkanoff and C. Zaniolo. Decomposition of relations and synthesis of entity-relationship diagrams. Entity-Relationship Approach to system analysis and design, ed. P.P. Chen, 1980, 277-294.
[Mit83] J.C. Mitchell. The implication problem for functional and inclusion dependencies. Information and Control, 56, 3, 1983, 154-173.
[NDT88] G.M. Nijssen, D.J. Duke and S.M. Twine. The entity-relationship data model considered to be harmful. Preprint University of Queensland, Dept. of Computer Science, 1988.
[NiH89] G.M. Nijssen and T.A. Halpin. Conceptual schema and relational database design - a fact oriented approach. Prentice Hall, Sydney 1989.
[PDG89] J. Paredaens, P. De Bra, M. Gyssens, and D. Van Gucht. The structure of the relational database model. Springer, Berlin, 1989.
[Ris88] N. Rishe. Database Design Fundamentals. Prentice-Hall, Englewood Cliffs, 1988.
[RMN90] A. Rochfeld, J. Morejon, and P. Negros. Inter-relationship links in E-R models. Proc. 9th Entity-Relationship Conference (ed. H. Kangassalo), 143-156.
[STH91] R. Spencer, T. Teorey, and E. Hevia. ER standards proposal. Proc. 9th ER Conference, ed. H. Kangassalo, 1990, 405-412.
[STW91] K.-D. Schewe, B. Thalheim, I. Wetzel and J.W. Schmidt. Extensible safe object-oriented design of database applications. Submitted for publication, 1991.
[Teo89] T.J. Teorey. Database Modeling and Design: The Entity-Relationship Approach. Morgan Kaufmann Publ., San Mateo, 1989.
[TWB89] T.J. Teorey, G. Wei, D.L. Bolton, and J.A. Koenig. ER model clustering as an aid for user communication and documentation in database design. Comm. ACM 32, 1989, 8, 975-987.
[Tha88] B. Thalheim. Logical relational database design tools using different classes of dependencies. J. New Gener. Comput. Syst. 1 (1988), 3, 211-228.
[Tha88'] B. Thalheim. A systematic approach to Database Theory. In: INFO-88, GDR, 1988, 158-160 (in German).
[Tha89] B. Thalheim. The higher-order entity-relationship model and $(DB)^2$. LNCS 364, Springer 1989, 382-397.
[Tha89'] B. Thalheim. On Semantic Issues Connected with Keys in Relational Databases Permitting Null Values. Journal Information Processing and Cybernetics, EIK, 1989, 25, 1/2, 11-20.
[Tha90] B. Thalheim. Dependencies in Relational Databases. Leipzig, Teubner Verlag 1991.
[Tha90'] B. Thalheim. Theoretical fundamentals of the higher-order entity-relationship model. Prepared for publication. Kuwait 1990.



[Tha91] B. Thalheim. Concepts of the database design. In: Trends in database management systems (eds. G. Vossen, K.-U. Witt), Oldenbourg, München, 1-48 (in German).
[Thay89] A. Thayse (ed.). From modal logic to deductive databases. John Wiley, vol. 1: 1989, vol. 2: 1990.
[TL82] D. Tsichritzis and F. Lochovsky. Data Models. Prentice-Hall, 1982.
[Ull89] J.D. Ullman. Principles of database and knowledge-base systems. Computer Science Press, 1989.
[Vos87] G. Vossen. Datenmodelle, Datenbanksprachen und Datenbank-Management-Systeme. Addison-Wesley, Bonn, 1987.
[Wedd89] G.E. Weddell. A theory of functional dependencies for object-oriented models. OODB Conference, Kyoto, 1989, 150-169.
[YaT89] M. Yaseen and B. Thalheim. Practical Database Design Methodologies. Kuwait University, Faculty of Science, 1989, 256p.
[Yok88] K. Yokota. Deductive approach for nested relations. In: Programming of future generation computers II, K. Fuchi, L. Kott (eds.), 1988, 461-481.
[ZNG90] J. Zhu, R. Nassif, P. Goyal, P. Drew and B. Askelid. Incorporating a model hierarchy into the ER paradigm. Proc. 9th ER Conference, ed. H. Kangassalo, 1990, 68-80.
