On the Notion of Concept

Peter C. Lockemann, Guido Moerkotte
Fakultät für Informatik, Universität Karlsruhe
Postfach 6980, D-W 7500 Karlsruhe, Germany
Netmail: [email protected]

Abstract

The notion of concept is central to the notion of data model in databases. Its purpose is to provide construction mechanisms for organizing a universe of data into manageable segments with a well-defined structure. However, so far the notion lacks rigor with a concomitant danger of inconsistent, overlapping or redundant use of concepts. The paper takes the approach that rigor is achieved by expressing the intuitive semantics within a formal logic. In doing so, the notion of concept may also provide a mechanism for structuring sets of facts, rules, and constraints in general deductive databases. In the paper a formal semantics for concept definitions is given by showing that each concept definition can be interpreted as a mapping from one database state to another. A number of examples show the usefulness of the concept definition. In particular, the basics of object-orientation are modeled as concepts. The paper also examines some difficult issues arising in connection with the concepts in conventional data models. First practical experiences are related, and topics for further research are pointed out.

1 Introduction and Motivation

The notion of concept is a central part of the notion of data model as it is used in the realm of database technology. Intuitively speaking, a concept is a construction mechanism for organizing a universe of data into manageable segments with a well-defined structure. Usually it is assumed that a data model contains only a very small number of such concepts. For example, the relational data model incorporates the concepts of domain, tuple, and relation, the network model the concepts of domain, record, and set. Viewed from a programming language perspective, concepts are generic constructors, since they are not data types by themselves but yield data types (or, in databases, a database schema) when applied to already existing data types. Associated with each concept is a set of (generic) operators for inserting, removing, replacing or accessing data that correspond to the structure, or part of it, imposed by the concept. Take again the relational model, where the relational algebra constitutes such a set of operators. Such a view of concept, then, corresponds to the notion of parameterized abstract data type.

Not surprisingly, the notion of concept has been carried over into the realm of semantic data models, even though it usually is confined to structural aspects whereas operators play a lesser role or no role at all. Take the entity-relationship model with its concepts of attribute, domain, entity, relationship, and its later extensions by cardinalities, aggregation, generalization. Again, a limited set of concepts is provided to impose a clear, legible structure on a database or, in the case of semantic modeling, on the universe of discourse that is to be mapped to a database.

Particularly in semantic models concepts are chosen in a way that makes it easy to express the semantics of a universe of discourse. This makes the introduction of new concepts a highly subjective and intuitive affair. Karl and Lockemann [7] give a list of twenty or so concepts that over time have appeared in various semantic models. They also point out that it is often not clear to which extent these concepts are identical or at least overlapping because no precise formalism is used to define them. Consequently, when mixing them within a semantic schema it remains unclear to which extent the schema contains built-in redundancies or contradictions. Further, the lack of a formal definition is acutely felt when one tries to establish algorithms for automatically mapping the components of a semantic schema to those of a logical database schema.
Karl and Lockemann also argue that new applications should be put in a position to define their own concepts which are specifically tailored to the needs of the new universe of discourse. In such a case a more formal approach to introducing concepts seems absolutely necessary because an application administrator (in the sense of the ANSI/SPARC three-schema approach) could not be expected to determine on his own the consistency of a schema.

Many years ago the strong relationship between logic and databases was already pointed out [4]. Since questions of inconsistency and redundancy fall naturally into the realm of logic, and proofs on the adequacy of mapping a semantic schema, say an entity-relationship schema, to a logical schema, say a relational schema, could at least be based on formal logic, the present paper proposes to use a logic-based approach to the formalization of the notion of concept.

Such an approach seems to offer a further advantage. One of the recent directions in database development has been deductive databases. Unfortunately, deductive databases seem to forfeit one of the important benefits of databases grounded in data models: a discriminatory structure that allows one to orient oneself within the often huge volumes of data. Instead the user is faced with a mass of facts and rules that on cursory inspection all seem to look alike. Numerous attempts have been reported in the more recent literature to try to impose some sort of structure on such sets of rules [1, 2, 3, 6, 13, 15]. Our hope, then, is that if we succeed in giving a formal, logic-based definition of concept, the same notion may also provide clues as to how to organize a set of facts and rules into more manageable units.

A formal basis for the notion of concept will not by itself provide any gains in expressiveness. It will however, by the sheer need of a database designer to express his intentions and needs for a new concept within a strict framework, contribute to a more careful evaluation of existing and more careful construction of new data models. It will also allow one to examine a given database schema for certain formal properties such as consistency, redundancy or even decidability. In developing a formal notion of concept we have, of course, to rely on intuition: the final product should reflect what database people all along have intuitively called a concept.

The rest of the paper is organized as follows. Section 2 gives a first taste of the notion of concept by introducing several simple examples. The full syntax as well as a precise semantics of a concept definition are given in section 3. In particular the semantics is expressed in terms of an abstract deductive database having a precise semantics. Section 4 contains a comprehensive example. Here, we show how the basic ingredients of object-orientation can be modeled by means of concepts. Section 5 demonstrates that by and large the concepts in traditional data models satisfy our notion of concept, but that a few additions will ultimately be needed to overcome observed deficiencies. Section 6 briefly discusses first practical experiences while section 7 concludes the paper.

2 A First Glimpse

2.1 The Intuitive Semantics of a Concept Definition

We start by developing an intuitive notion of concept. To express it we need a syntax together with a semantics that associates a meaning with the syntactic constructs. The semantics should reflect our objectives stated in the introduction. Consequently, we take a constructive approach to the semantics: every introduction (definition) of a new concept leaves an observable effect in the database. Since we plan to allow concepts to occupy any of the traditional levels (generic constructs, types, or instances), we look for a notion of database that obscures the levels and, hence, permits uniform treatment of all concepts. Such a notion is nicely provided by deductive databases because of their foundations in formal logic. In section 3.2 we will give the precise semantics of a concept definition by mapping it to a change in the contents of a deductive database. To facilitate the understanding of the subsequent sections, the basic line of argument is previewed in informal terms in this subsection.

As usual, a deductive database consists of three parts: a set of facts, a set of (deductive) rules, and a set of consistency constraints. The deductive rules are used to derive new facts from the set of facts present in the fact part of the database. The set of all explicitly stated facts together with the deducible facts is called the extension of the database. Contrary to rules, constraints are not used to derive facts. Instead, they are conditions that must be satisfied within the extension of the database. By checking the constraints the system guarantees that no contradiction occurs within the union of the extension of the database and the set of consistency constraints.

Syntactically, a concept definition consists of a concept head, which contains the name of the concept and the applicable arguments, and a set of clauses. Every clause, if specified, results in the addition of its constituents to some part of the database. E.g., all the formulas given within the requires clause are added as constraints to the constraints part of the database. More of this mapping is given in the next subsection, and the precise definition follows in section 3.2.
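The three-part structure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' system: the class name `Database`, the representation of rules and constraints as Python callables, and the sample constants `ann`, `bob`, and `rex` are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple

Fact = Tuple[str, ...]


@dataclass
class Database:
    # Explicitly stated facts, e.g. ("parent", "ann", "bob").
    facts: Set[Fact] = field(default_factory=set)
    # A rule maps the current extension to the facts it derives from it.
    rules: List[Callable[[Set[Fact]], Set[Fact]]] = field(default_factory=list)
    # A constraint is a condition that must hold within the extension.
    constraints: List[Callable[[Set[Fact]], bool]] = field(default_factory=list)

    def extension(self) -> Set[Fact]:
        """Explicit facts plus all deducible facts (naive fixpoint iteration)."""
        ext = set(self.facts)
        while True:
            derived = set().union(*(rule(ext) for rule in self.rules)) - ext
            if not derived:
                return ext
            ext |= derived

    def is_consistent(self) -> bool:
        """Constraints derive nothing; they are only checked against the extension."""
        ext = self.extension()
        return all(check(ext) for check in self.constraints)


def parent_implies_ancestor(ext):
    # Hypothetical rule: every parent fact yields an ancestor fact.
    return {("ancestor", f[1], f[2]) for f in ext if f[0] == "parent"}


def parents_are_persons(ext):
    # Hypothetical constraint: both arguments of parent must be persons.
    return all(("is", f[1], "person") in ext and ("is", f[2], "person") in ext
               for f in ext if f[0] == "parent")


db = Database(
    facts={("is", "ann", "person"), ("is", "bob", "person"), ("parent", "ann", "bob")},
    rules=[parent_implies_ancestor],
    constraints=[parents_are_persons],
)
```

Here `db.extension()` contains the derived ancestor fact, while adding a parent fact about an unknown `rex` would make `db.is_consistent()` fail, because the constraint is evaluated on the extension rather than used for derivation.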

2.2 First Examples

Giving precision to the notion of concept is first of all giving precision to our intuitive understanding of what we mean by "concept". To do so we proceed in the standard fashion: we use a number of examples and organize them into a syntactic framework where the syntactic constructs reflect an intuitive, verbally explainable semantics. As we progress, these semantics become more and more formalized.

The syntactic framework follows generally used rules. Keywords are denoted in bold characters, variables start with a capital letter, and constants (including already defined concepts) start with a lower-case letter. Further, predicates are expressed in the form of tuples, e.g., 2-place predicates as (p x y) with p a predicate are interpreted as p(x,y). We assume the existence of the standard classification predicate is, although this will be introduced more formally as a concept in section 4.1.

The first concept we introduce is parent. Classes will be modelled by constants. (is x c) indicates that x belongs to class c. A triple (parent x y) will denote that x is the parent of y. We further restrict the arguments x and y to be persons. All this information is now gathered in a so-called concept definition frame.

    define concept (parent X Y)
      requires dependent (is X person) and (is Y person)
    end concept

(and represents the obvious boolean connector.)

The requires clause states that both arguments of the parent relation must be persons. In general the requires clause is meant to model requirements which must hold for the introduced concept. In this particular case the clause constrains the applicability of the concept, i.e., no fact (parent x y) can be introduced unless both x and y are known to be persons. The keyword dependent indicates that the variables X and Y are bound to the arguments of the head of the concept definition, i.e., of (parent X Y). It can thus be read as

    all X,Y (parent X Y) impl ((is X person) and (is Y person))

(all denotes ∀-quantification, impl denotes implication.) The semantics will be to add this formula as a consistency constraint to the database.

Our next concept is that of ancestor. Here, (ancestor x y) means that x is an ancestor of y. Of course, the information necessary to determine the ancestor relation is fully contained in the parent relation: the ancestor relation is the transitive closure of the parent relation. Thus the former can be derived from the latter. We call such a concept derivable. The rules needed to derive the information necessary for it are gathered in the if clause of a concept definition. We further require that the ancestor relation is acyclic, i.e., there is no person x such that (ancestor x x).

    define derivable concept (ancestor X Y)
      if (parent X Y);
         (parent X Z), (ancestor Z Y)
      requires all X not (ancestor X X)
    end concept

(not represents negation.)
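Read operationally, the ancestor definition above amounts to a least-fixpoint computation over the parent relation followed by an acyclicity check. The following is a minimal sketch under that reading; the function names and the sample persons `ann`, `bob`, and `cid` are assumptions for illustration only.

```python
def ancestors(parent):
    """Least fixpoint of the two atom lists in the if clause of (ancestor X Y)."""
    ancestor = set(parent)  # (parent X Y) => (ancestor X Y)
    while True:
        # (parent X Z), (ancestor Z Y) => (ancestor X Y)
        new = {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if new <= ancestor:
            return ancestor
        ancestor |= new


def is_acyclic(relation):
    """The requires clause: all X not (ancestor X X)."""
    return all(x != y for (x, y) in relation)


parent = {("ann", "bob"), ("bob", "cid")}
ancestor = ancestors(parent)
```

With these two parent pairs the closure adds the pair `("ann", "cid")`. Adding a parent pair from `cid` back to `ann` would make `is_acyclic` fail on the derived relation, even though no single parent pair is reflexive.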

The two atom lists of the if clause, separated by ';', represent the definition of the transitive closure of the parent relation. The ',' is an abbreviation of the keyword and, the only boolean connector allowed within rule definitions. The given atom lists are expanded to complete rules by using the concept head, i.e., (ancestor X Y), as the rule head. The completed rules are then added to the rule base. The constraint within the requires clause demands the acyclicity of the ancestor relation. Note that, contrary to the concept definition of parent, the keyword dependent is not used within the requires clause here. This is possible since all the variables of the formula all X not (ancestor X X) are bound and no reference to the arguments of the concept head is necessary.

Note that parent should also be acyclic. Since ancestor is required to be acyclic, the acyclicity of the parent relationship is dealt with implicitly. From the standpoint of the parent definition, though, we observe a scattering of related information. This will be one of the arguments for redefining the concept of parent at the end of this section.

Note that the two concepts of parent and ancestor resemble a data type definition more than a generic constructor because we bound the variables x and y to persons, which already are types. To extend our example to generic constructors we now make the notion of transitive closure explicit. Such a concept would indeed be generic, since it applies to any elements no matter what their types are. As a consequence, variables must be used for predicates. This is a higher-order construct which we will give a first-order semantics by reification (explained after the example).

    define concept (transitive_closure P Q)
      implies dependent (P X Y) impl (Q X Y);
                        (P X Y), (Q Y Z) impl (Q X Z)
    end concept

The implies clause is used to specify the implications (consequences) of a concept. Note that we use the keyword dependent in the implies clause since the variables P and Q have to be instantiated before the rules make any sense. This is not true for the variables X, Y and Z. The semantics of the implies dependent construct can, for this example, be described by adding (transitive_closure P Q) as a premise to the rules and then adding these modified rules, i.e.,

    (transitive_closure P Q), (P X Y) impl (Q X Y)
    (transitive_closure P Q), (P X Y), (Q Y Z) impl (Q X Z)

to the rule base once. This way the deduction possibilities of the database are used.

As mentioned earlier, our syntax of concept definitions is higher-order, which means that we allow variables for predicates (the first term within a tuple, which itself is a list of terms). In order to give a first-order semantics to a concept definition we use the technique of reification. Within the deductive database we will use the single predicate holds with arbitrary arity. Then each tuple is mapped to an argument list of holds. E.g., the tuple (p x y) is mapped to holds(p x y).

One may also argue that it is useful to introduce a separate concept acyclic. This can be done as follows:

    define concept (acyclic P)
      requires dependent all X not (P X X)
    end concept
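The effect of reification can be illustrated with a small forward-chaining sketch in which every tuple is treated as an argument list of a single implicit predicate, so that a variable may also stand in the predicate position. The matcher below is an assumption made for this example (it is not the authors' implementation), as are the function names and the sample persons.

```python
def is_var(term):
    # The paper's convention: variables start with a capital letter.
    return isinstance(term, str) and term[:1].isupper()


def match(pattern, fact, binding):
    """Extend `binding` so that `pattern` equals `fact`; return None on failure."""
    if len(pattern) != len(fact):
        return None
    binding = dict(binding)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if binding.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return binding


def solutions(body, facts, binding):
    """Yield every binding that satisfies all atoms of `body`."""
    if not body:
        yield binding
        return
    for fact in facts:
        extended = match(body[0], fact, binding)
        if extended is not None:
            yield from solutions(body[1:], facts, extended)


def extension(facts, rules):
    """Naive forward chaining until no new fact can be derived."""
    ext = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Collect all bindings first, then mutate the fact set.
            for b in list(solutions(body, ext, {})):
                derived = tuple(b.get(t, t) for t in head)
                if derived not in ext:
                    ext.add(derived)
                    changed = True
    return ext


# The two modified rules; P and Q occur in predicate position.
TC = ("transitive_closure", "P", "Q")
rules = [
    ([TC, ("P", "X", "Y")], ("Q", "X", "Y")),
    ([TC, ("P", "X", "Y"), ("Q", "Y", "Z")], ("Q", "X", "Z")),
]
facts = {
    ("transitive_closure", "parent", "ancestor"),
    ("parent", "ann", "bob"),
    ("parent", "bob", "cid"),
}
ext = extension(facts, rules)
```

Because the predicate name is just the first element of each tuple, the generic rules fire only once a fact `(transitive_closure parent ancestor)` binds P and Q, which is the behavior the dependent keyword is meant to express.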

These two (generic) concepts may now be used to ease the definition of the concept parent, to condense it, and to further group semantically related information together. Remember that we had two concept definitions, one for the parent relation and one for the ancestor relation. Hence, despite their close interconnection the information is scattered. We will avoid this within the alternative definition of parent given below.

    define concept (parent X Y)
      followups (transitive_closure parent ancestor)
      features (acyclic ancestor)
      requires dependent (is X person) and (is Y person)
    end concept

This single concept definition of parent replaces the two previous definitions of parent and ancestor. The followups clause is used to define facts which are directly associated with the defined concept. In the example the relation ancestor is introduced by defining it to be the transitive closure of parent. The features clause is used to define features of the concept defined in the head of the concept definition, or of those introduced in the followups clause. Since all the facts following both the followups and the features clauses will simply be added to the fact part of the database, there is no distinction in terms of semantics. Nevertheless, we think there is a distinction in the pragmatics of the two clauses. Whereas the followups clause should be used to introduce further constants within the given context, the features clause should be used to describe features of the defined concept or the relationships introduced within the followups clause.

3 The General Notion of Concept

3.1 Syntax

In this subsection we present a more complete syntax for a concept definition. By introducing a special syntax for a concept definition we try to explicitly state the semantic relationships between facts, rules, and consistency constraints found in a semantic unit. This allows for some order in the mass of facts, rules, and consistency constraints by grouping those items under a single header (concept definition). Using "[ ]" for optionality, and g || s for the repetition of g items separated by s, the general syntax for a concept definition frame is as follows:

    define [derivable] concept (name x1 ... xn)
      [dependent a || ',']
      [if {(a || ',')} || ';']
      [followups a || ',']
      [features a || ',']
      [implies { [a || ','] [dependent] r } || ';']
      [requires { [a || ','] [dependent] c } || ';']
    end concept

Here, $x_i$ ($1 \le i \le n$, $n \ge 0$) are terms (i.e., variables or constants), a is a pre-atom¹ of the form $(t_1 \ldots t_n)$ for terms $t_1 \ldots t_n$, r is a pre-rule in the form of a Horn clause, and c is a consistency constraint, built from pre-literals and the logical connectors (all, ex, and, or, not, impl, eqv) in the usual way. As pre-literals we allow a sequence $(t_1 \ldots t_n)$, $t_1$ == $t_2$, and $t_1$ =/= $t_2$ with the $t_i$ constants or variables. While the $t_i$ could also be general terms involving (evaluable) functions, we restrict our considerations to the aforementioned simpler cases. The expression following define concept is referred to as the head of the concept definition (or concept head for short).

3.2 Semantics

In this subsection we show how concept definitions can be mapped to an abstract deductive database.

Preliminaries: A term is a variable symbol or a constant symbol (we restrict ourselves to function-free terms such as in DATALOG). The set of all terms is denoted by $T$. The set of atoms is defined as $A := \{\text{holds}(t_1 \ldots t_n) \mid t_i \in T\}$. Note that holds is the only predicate and that it may have an arbitrary arity. An atom not containing any variable is called a fact. A literal is either an atom $a$ or the negation $\neg a$ of an atom $a$. The set of all literals is denoted by $L$. If $l_1, \ldots, l_n, l_{n+1}$ are literals then $l_1, \ldots, l_n \Rightarrow l_{n+1}$ is a rule. A matrix is either a literal or, for matrices $m_1, m_2$, an expression of the form $\neg m_1$, $(m_1 \lor m_2)$, $(m_1 \land m_2)$, $(m_1 \Rightarrow m_2)$, $(m_1 \Leftrightarrow m_2)$ or $(l_1, \ldots, l_n \Rightarrow l_{n+1})$. If possible we omit parentheses by using the usual precedence ordering of the boolean connectors. Every matrix is a formula, and if $f$ is a formula then $\forall x\, f$ and $\exists x\, f$ are formulas. The set of all formulas is denoted by $F$. A database is defined to consist of facts, rules, and consistency constraints.

¹ The use of the prefix "pre-" becomes clear in the next subsection.

Definition 3.1 (Database) A database is a triple $DB := (DB_a, DB_d, DB_c)$ where $DB_a$ is a set of facts, $DB_d$ is a set of rules, and $DB_c$ is a set of closed range-restricted² formulas called consistency constraints.

The semantics of such a database definition can be found, e.g., in [11], together with the definition of consistency of a database. Another notion we have been referring to before is extension.

Definition 3.2 (Extension) For a database $DB := (DB_a, DB_d, DB_c)$ we define its extension $M(DB) := \{a \mid a \text{ is a fact},\ DB \models a\}$, and its complete extension, or completion for short, $C(DB) := M(DB) \cup \{\neg a \mid a \text{ is a fact},\ DB \not\models a\}$.
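As a small worked illustration of Definition 3.2, consider a database with one fact, one rule, and no constraints. The constants ann and bob are hypothetical and chosen only for this example.

```latex
DB_a = \{\text{holds}(\text{parent ann bob})\}, \qquad
DB_d = \{\text{holds}(\text{parent}\ X\ Y) \Rightarrow \text{holds}(\text{ancestor}\ X\ Y)\}, \qquad
DB_c = \emptyset

M(DB) = \{\text{holds}(\text{parent ann bob}),\ \text{holds}(\text{ancestor ann bob})\}

\neg\,\text{holds}(\text{ancestor bob ann}) \in C(DB)
\quad\text{since}\quad DB \not\models \text{holds}(\text{ancestor bob ann})
```

The extension thus collects the explicit fact and the one derivable fact, while the completion additionally records the negation of every fact that is not entailed.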

Conversion: If $p = (p_1 \ldots p_n)$ we denote by $p'$ the atom $\text{holds}(p_1 \ldots p_n)$. This notation is extended to any formula. Additionally, it denotes the mapping of the symbols all, ex, and, or, impl, not to $\forall, \exists, \land, \lor, \Rightarrow, \neg$, respectively. The introduction of a concept definition with head $H$ into a deductive database is now defined as follows:

1. If the head $H$ is a fact: $DB_a := DB_a \cup \{H'\}$.
2. If the concept is derivable we perform for each atom list $a_1, \ldots, a_n$ in the if clause: $DB_d := DB_d \cup \{a_1', \ldots, a_n' \Rightarrow H'\}$.
3. The facts following followups and features are added to $DB_a$ or $DB_d$, i.e., for each fact $f$ occurring in one of these clauses: $DB_a := DB_a \cup \{f'\}$ if there is no dependent clause, and $DB_d := DB_d \cup \{d' \Rightarrow f'\}$ if the dependent clause is specified as an atom list $d$.
4. For each rule $a_1, \ldots, a_n$ impl $a_{n+1}$ occurring in the implies clause we add:
   • if dependent is not specified: $DB_d := DB_d \cup \{a_1', \ldots, a_n' \Rightarrow a_{n+1}'\}$
   • if no atom list precedes dependent and it is specified: $DB_d := DB_d \cup \{H', a_1', \ldots, a_n' \Rightarrow a_{n+1}'\}$
   • if an atom list $b_1, \ldots, b_m$ precedes dependent: $DB_d := DB_d \cup \{b_1', \ldots, b_m', H', a_1', \ldots, a_n' \Rightarrow a_{n+1}'\}$
5. For each formula $f$ specified within the requires clause we add, if no dependent clause is specified:
   • if dependent is not specified: $DB_c := DB_c \cup \{f'\}$
   • if no atom list precedes dependent and it is specified: $DB_c := DB_c \cup \{\forall X_1 \ldots \forall X_k\ H' \Rightarrow f'\}$ if $X_1 \ldots X_k$ are the variables occurring in $H$.
   • if an atom list $b_1, \ldots, b_m$ precedes dependent: $DB_c := DB_c \cup \{\forall X_1 \ldots \forall X_k\ b_1', \ldots, b_m', H' \Rightarrow f'\}$ if $X_1 \ldots X_k$ are the variables occurring in $b_1, \ldots, b_m, H$.
   If the dependent clause is specified its atoms are added to the preconditions in the same way as for the dependent option.

² See [14].

If the head of a concept definition does not contain any variable, the head can be thought of as being treated as a fact added to the fact base of a usual deductive database. If the dependent clause is specified within a concept definition, then the pre-literals specified in the subsequent list will be used as preconditions to all the subsequent clauses of the concept definition frame. An example will be given in section 4.1.
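The five translation steps above can be sketched as a single Python function that updates the three parts of a database. This is a simplified sketch under several assumptions that are not part of the paper: the names `Concept`, `holds`, and `introduce` are hypothetical, constraints are stored as (premises, formula) pairs with the universal quantification left implicit and the formula left unconverted, and atoms of a dependent clause are placed before any other premise.

```python
from dataclasses import dataclass, field


def is_var(term):
    return isinstance(term, str) and term[:1].isupper()


def holds(atom):
    """Conversion p -> p': wrap a pre-atom as an argument list of holds."""
    return ("holds",) + tuple(atom)


@dataclass
class Concept:
    head: tuple
    derivable: bool = False
    dependent: list = field(default_factory=list)  # atoms of the dependent clause
    if_lists: list = field(default_factory=list)   # atom lists of the if clause
    followups: list = field(default_factory=list)
    features: list = field(default_factory=list)
    implies: list = field(default_factory=list)    # (prefix atoms, dependent?, (body, head))
    requires: list = field(default_factory=list)   # (prefix atoms, dependent?, formula)


def introduce(c, db_a, db_d, db_c):
    h = holds(c.head)
    dep = [holds(a) for a in c.dependent]
    # Step 1: a variable-free head is added as a fact.
    if not any(is_var(t) for t in c.head):
        db_a.add(h)
    # Step 2: each atom list of the if clause becomes a rule concluding the head.
    if c.derivable:
        for atoms in c.if_lists:
            db_d.append(([holds(a) for a in atoms], h))
    # Step 3: followups/features become facts, or rules under a dependent clause.
    for f in c.followups + c.features:
        if dep:
            db_d.append((dep, holds(f)))
        else:
            db_a.add(holds(f))
    # Step 4: implies rules, optionally guarded by prefix atoms and the head.
    for prefix, dependent, (body, concl) in c.implies:
        pre = dep + [holds(a) for a in prefix] + ([h] if dependent else [])
        db_d.append((pre + [holds(a) for a in body], holds(concl)))
    # Step 5: requires formulas, guarded by the same kind of premises.
    for prefix, dependent, formula in c.requires:
        pre = dep + [holds(a) for a in prefix] + ([h] if dependent else [])
        db_c.append((pre, formula))


db_a, db_d, db_c = set(), [], []
introduce(Concept(head=("basic_o-o",)), db_a, db_d, db_c)
introduce(Concept(
    head=("parent", "X", "Y"),
    followups=[("transitive_closure", "parent", "ancestor")],
    features=[("acyclic", "ancestor")],
    requires=[([], True, ("and", ("is", "X", "person"), ("is", "Y", "person")))],
), db_a, db_d, db_c)
introduce(Concept(
    head=("is", "X", "Y"),
    dependent=[("basic_o-o",)],
    followups=[("is", "object", "object")],
), db_a, db_d, db_c)
```

The second call shows followups and features ending up in the fact part, and the third shows how a dependent clause turns a followup into a rule with the dependent atom as its only premise.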

3.3 A Simple Example

We now show how the first concept definitions given in section 2 are to be added to the database. For

    define concept (parent X Y)
      requires dependent (is X person) and (is Y person)
    end concept

the following is performed:

$$DB_c := DB_c \cup \{\forall X, Y\ \text{holds}(\text{parent}\ X\ Y) \Rightarrow \text{holds}(\text{is}\ X\ \text{person}) \land \text{holds}(\text{is}\ Y\ \text{person})\}$$

The definition of the ancestor concept

    define derivable concept (ancestor X Y)
      if (parent X Y);
         (parent X Z), (ancestor Z Y)
      requires all X not (ancestor X X)
    end concept

yields:

• $DB_d := DB_d \cup \{\text{holds}(\text{parent}\ X\ Y) \Rightarrow \text{holds}(\text{ancestor}\ X\ Y)\} \cup \{\text{holds}(\text{parent}\ X\ Z), \text{holds}(\text{ancestor}\ Z\ Y) \Rightarrow \text{holds}(\text{ancestor}\ X\ Y)\}$
• $DB_c := DB_c \cup \{\forall X\ \neg\text{holds}(\text{ancestor}\ X\ X)\}$

Reification becomes a necessity when adding the concept definition for the transitive closure to the database.

    define concept (transitive_closure P Q)
      implies dependent (P X Y) impl (Q X Y);
                        (P X Y), (Q Y Z) impl (Q X Z)
    end concept

results in

• $DB_d := DB_d \cup \{\text{holds}(\text{transitive\_closure}\ P\ Q), \text{holds}(P\ X\ Y) \Rightarrow \text{holds}(Q\ X\ Y)\} \cup \{\text{holds}(\text{transitive\_closure}\ P\ Q), \text{holds}(P\ X\ Y), \text{holds}(Q\ Y\ Z) \Rightarrow \text{holds}(Q\ X\ Z)\}$.

Due to space limitations we do not repeat the definition of acyclic. To conclude, we examine the compact definition of parent that replaces the parent and ancestor definitions above.

    define concept (parent X Y)
      followups (transitive_closure parent ancestor)
      features (acyclic ancestor)
      requires dependent (is X person) and (is Y person)
    end concept

results in

• $DB_a := DB_a \cup \{\text{holds}(\text{transitive\_closure parent ancestor}),\ \text{holds}(\text{acyclic ancestor})\}$
• $DB_c := DB_c \cup \{\forall X, Y\ \text{holds}(\text{parent}\ X\ Y) \Rightarrow \text{holds}(\text{is}\ X\ \text{person}) \land \text{holds}(\text{is}\ Y\ \text{person})\}$

In general, whenever the database is updated, its consistency is checked within the extension [11]. This is necessary in the previous three cases, on entering the definition of

• parent, because with the additional constraint the consistency of the existing database must be checked again (after all, we cannot entirely exclude the possibility that some parent facts had previously been entered);
• ancestor, because new ancestor relationships may now be deduced, and their acyclicity would have to be proven;
• transitive_closure, because the extension of the database is again enlarged.

Technically speaking, consistency checking becomes an extremely cumbersome and lengthy, even impractical affair unless one takes special precautions. Some have to do with limiting the database space to be checked (for several techniques see, e.g., [14, 9, 8, 10]), others with spreading the time intervals between checks (e.g., by a suitable transaction mechanism). Adding the facts to $DB_a$ may lead to an even larger addition to the extension of the database because the rules introduced as part of the definition of transitive closure may now, provided there already exist parent facts, derive new ancestor facts. It is this new extension which must subsequently be checked for consistency.

We summarize: The notion of concept has now made precise the more intuitive feeling that a concept brings order into the description of a universe of discourse by

1. obviating the need to explicitly state all consequences of observed facts,
2. restricting what is acceptable in terms of entered facts.
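The check-on-update behavior discussed in this section can be sketched as follows. The sketch recomputes the whole extension on every update, which is exactly the naive strategy the text calls impractical for large databases; it is meant only to show the logic. The function names and the sample persons are assumptions for this example.

```python
def extension(facts):
    """Facts plus the ancestor facts derivable from the parent facts."""
    ext = set(facts)
    ext |= {("ancestor", f[1], f[2]) for f in facts if f[0] == "parent"}
    while True:
        new = {("ancestor", p[1], a[2])
               for p in ext if p[0] == "parent"
               for a in ext if a[0] == "ancestor" and a[1] == p[2]} - ext
        if not new:
            return ext
        ext |= new


def acyclic_ancestor(ext):
    """Constraint: all X not (ancestor X X)."""
    return all(not (f[0] == "ancestor" and f[1] == f[2]) for f in ext)


def try_update(facts, new_facts, constraints):
    """Accept the update only if the new extension satisfies every constraint."""
    candidate = facts | new_facts
    ext = extension(candidate)
    if all(check(ext) for check in constraints):
        return candidate, True
    return facts, False  # reject and keep the old state


facts = {("parent", "ann", "bob"), ("parent", "bob", "cid")}
rejected, ok_cycle = try_update(facts, {("parent", "cid", "ann")}, [acyclic_ancestor])
accepted, ok_chain = try_update(facts, {("parent", "cid", "dan")}, [acyclic_ancestor])
```

The first update is rejected because the derived ancestor relation becomes cyclic, although the entered fact itself looks harmless. The second is accepted and enlarges the extension by three ancestor facts ending in `dan`.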

4 A Comprehensive Example: Object-Orientation

4.1 Objects and Attributes

In the previous sections we already used the is concept. Whereas we then relied on an intuitive understanding, we now introduce it more rigorously. As a prerequisite, we introduce a number of constants. In an object-oriented environment, everything is an object. All objects are classified via is into subsets called object types which collectively are referred to by ot. With objects one may associate certain properties named by attributes. Attributes will have a domain (dom) and a range (ran). Consistency constraints will be used to restrict the arguments of an attribute to the domain and range specified. All attributes will be gathered in a special object type called at. By default all attributes will be multi-valued and will allow null-values. Restrictions must then be expressed by specifying single-valued and/or not-null for attributes. Concepts for these restrictions will also be introduced.

Since not every user will wish to use our notion of object-orientation, we make the package of object-orientation dependent on the definition of a concept basic_o-o. Only if

    define concept (basic_o-o)
    end concept

is included will the basic object-orientation package defined below become visible to the user. This definition results in $DB_a := DB_a \cup \{\text{holds}(\text{basic\_o-o})\}$.

Let us start with the definition of is. (is x y) indicates that x is of type y.

    define concept (is X Y)
      dependent (basic_o-o)
      followups (is object object), (is ot object), (is at object)
      features (is is object), (is object ot), (is is at)
      implies dependent (is X object), (is Y object);
      requires dependent (is Y ot);
               all X (is X object) impl ex T (is T ot), (is X T), T =/= object
    end concept

(Note that this concept definition is not consistent without the following one.)

Let us briefly explain the is concept. As already mentioned, it totally depends on the presence of a fact (basic_o-o) within $DB_a$. In the followups clause the bootstrap mechanism for the is predicate is presented. We state the previously mentioned facts that the constants object, ot, and is are objects. Additionally, object is made known as an object type (ot). (At first glance this may sound surprising; however, if we arrange the object types within a class lattice then object is needed as the root object type.) The implies clause guarantees for any introduced fact (is a b) the derivation that both a and b are objects. The requires clause contains two constraints. The first states that the second argument b of an (is a b) tuple must be an object type. The second constraint demands that each first argument a of (is a b) is of a type different from object. For example, all the constants object, is, ot are of object type ot which is different from object. The changes in the database are as follows:

$$\begin{aligned}
DB_d := DB_d \cup \{\ & \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is object object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is is object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is ot object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is object ot}); \\
& \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ X\ Y) \Rightarrow \text{holds}(\text{is}\ X\ \text{object}); \\
& \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ X\ Y) \Rightarrow \text{holds}(\text{is}\ Y\ \text{object})\ \}
\end{aligned}$$

$$\begin{aligned}
DB_c := DB_c \cup \{\ & \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ X\ Y) \Rightarrow \text{holds}(\text{is}\ Y\ \text{ot}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \forall X\ \text{holds}(\text{is}\ X\ \text{object}) \Rightarrow \exists T\ \text{holds}(\text{is}\ T\ \text{ot}) \land \text{holds}(\text{is}\ X\ T) \land T \neq \text{object}\ \}
\end{aligned}$$

Here, the use of the first dependent clause is exemplified. All the facts given as followups and features are introduced as rules with holds(basic_o-o) as the only premise.
Additionally, all rules and constraints include this atom as an additional premise.

The next definition introduces attributes. If (attr a d r) defines an attribute a then a term of the form (a o v) is interpreted as object o having the value v for the attribute a. The constants d and r give the domain and the range, respectively, of the attribute. The consistency constraints will require that for each (a o v), o will be in the domain and v in the range of a. The definition of the concept attribute is now as follows:

    define concept (attr A D R)
      dependent (basic_o-o)
      followups (is at object), (is dom object), (is ran object)
      features (is ot ot), (is at ot), (is dom at), (is ran at),
               (dom dom at), (ran dom ot), (dom ran at), (ran ran ot),
               (is is at), (dom is object), (ran is ot)
      implies dependent (is A at), (dom A D), (ran A R)
      requires (is A at) dependent all D1,D2 (dom A D1), (dom A D2) impl D1 == D2;
               (is A at) dependent all O,D,V (A O V), (dom A D) impl (is O D);
               (is A at) dependent all R1,R2 (ran A R1), (ran A R2) impl R1 == R2;
               (is A at) dependent all O,R,V (A O V), (ran A R) impl (is V R)
    end concept

The followups clause introduces the three constants at, dom and ran and declares them to be objects. The features clause specifies at to be an object type, and dom and ran to be members of at, thus making them attributes. Then the domains and ranges of dom and ran are given. Lastly, is is made known as an attribute, too. The domain (range) for is is object (ot). The set of constraints can be divided into two groups (containing two constraints each) capturing the semantics of dom and ran, respectively. The first group simply states that for every attribute the domain must be unique, and that the first argument of each attribute must be in the domain. The second group states the analogue for ran. The modifications of the database are as follows:

$$\begin{aligned}
DB_d := DB_d \cup \{\ & \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is at object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is dom object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is ran object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is at ot}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is dom at}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is ran at}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{dom dom at}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{ran dom ot}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{dom ran at}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{ran ran ot}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{is is at}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{dom is object}); \\
& \text{holds}(\text{basic\_o-o}) \Rightarrow \text{holds}(\text{ran is ot})\ \}
\end{aligned}$$

$$\begin{aligned}
DB_d := DB_d \cup \{\ & \text{holds}(\text{basic\_o-o}), \text{holds}(\text{attr}\ A\ D\ R) \Rightarrow \text{holds}(\text{is}\ A\ \text{at}); \\
& \text{holds}(\text{basic\_o-o}), \text{holds}(\text{attr}\ A\ D\ R) \Rightarrow \text{holds}(\text{dom}\ A\ D); \\
& \text{holds}(\text{basic\_o-o}), \text{holds}(\text{attr}\ A\ D\ R) \Rightarrow \text{holds}(\text{ran}\ A\ R)\ \}
\end{aligned}$$

$$\begin{aligned}
DB_c := DB_c \cup \{\ & \forall A\ \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ A\ \text{at}) \Rightarrow \forall D1, D2\ \text{holds}(\text{dom}\ A\ D1), \text{holds}(\text{dom}\ A\ D2) \Rightarrow D1 == D2; \\
& \forall A\ \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ A\ \text{at}) \Rightarrow \forall O, D, V\ \text{holds}(A\ O\ V), \text{holds}(\text{dom}\ A\ D) \Rightarrow \text{holds}(\text{is}\ O\ D); \\
& \forall A\ \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ A\ \text{at}) \Rightarrow \forall R1, R2\ \text{holds}(\text{ran}\ A\ R1), \text{holds}(\text{ran}\ A\ R2) \Rightarrow R1 == R2; \\
& \forall A\ \text{holds}(\text{basic\_o-o}), \text{holds}(\text{is}\ A\ \text{at}) \Rightarrow \forall O, R, V\ \text{holds}(A\ O\ V), \text{holds}(\text{ran}\ A\ R) \Rightarrow \text{holds}(\text{is}\ V\ R)\ \}
\end{aligned}$$

This example illustrates for the first time the use of a non-empty atom list preceding the dependent option. As a consequence, holds(is A at) becomes an additional premise to the constraints.

Lastly we define the concepts single-valued and not-null which specialize the previously defined attribute concept.

    define concept (single-valued A)
      requires dependent (is A at) and
               all O,V1,V2 (A O V1), (A O V2) impl V1 == V2
    end concept

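The resulting database updates are:

$$
DB^c := DB^c \cup \{\ \mathit{holds}(\text{single-valued A}) \Longrightarrow \mathit{holds}(\text{is A at}) \wedge \big(\forall O, V1, V2\ \mathit{holds}(\text{A O V1}),\ \mathit{holds}(\text{A O V2}) \Longrightarrow V1 == V2\big) \ \}
$$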
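To make the single-valued constraint concrete, the following Python sketch checks it over a set of ground facts. The encoding of a fact (A O V) as a tuple, the function name `single_valued_violations`, and the second person `anna` are assumptions made for illustration only; they are not part of the concept definition language.

```python
from collections import defaultdict

# Assumed encoding for this sketch: a fact (A O V) is the tuple (A, O, V).
def single_valued_violations(facts, attribute):
    """Return the objects that carry more than one value for `attribute`."""
    values = defaultdict(set)
    for fact in facts:
        if len(fact) == 3 and fact[0] == attribute:
            values[fact[1]].add(fact[2])
    # The constraint (A O V1), (A O V2) impl V1 == V2 fails exactly
    # for objects with two or more distinct values.
    return sorted(obj for obj, vals in values.items() if len(vals) > 1)

facts = {
    ("age", "egon", 23),
    ("age", "egon", 24),   # a second, conflicting value for egon
    ("age", "anna", 30),   # hypothetical second person
}
print(single_valued_violations(facts, "age"))  # prints ['egon']
```

An empty result means that the constraint holds for the given facts.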

define concept (not-null A)
  requires dependent (is A at) and
    all D,O (dom A D), (is O D) impl ex V (A O V)
end concept

The resulting database update is:

$$
DB^c := DB^c \cup \{\ \mathit{holds}(\text{not-null A}) \Longrightarrow \big(\mathit{holds}(\text{is A at}) \wedge (\forall D, O\ \mathit{holds}(\text{dom A D}),\ \mathit{holds}(\text{is O D}) \Longrightarrow \exists V\ \mathit{holds}(\text{A O V}))\big) \ \}
$$

We will not end this section without giving an example application of the concepts introduced here. The object type to be defined is person.

define concept (is person ot)
  followups (attr name person string),
            (attr age person integer),
            (attr profession person string)
  features (not-null name), (not-null age), (single-valued age)
end concept

Let us illustrate the introduction of a specific person named egon:

define concept (is egon person)
  features (name egon "egon"),
           (age egon 23),
           (profession egon "dancer")
end concept

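Taking both definitions together, the following modifications to the database ensue:

$$
\begin{aligned}
DB^a := DB^a \cup \{\ & \mathit{holds}(\text{is person ot}), \\
& \mathit{holds}(\text{attr name person string}),\ \mathit{holds}(\text{attr age person integer}),\ \mathit{holds}(\text{attr profession person string}), \\
& \mathit{holds}(\text{not-null name}),\ \mathit{holds}(\text{not-null age}),\ \mathit{holds}(\text{single-valued age}), \\
& \mathit{holds}(\text{is egon person}), \\
& \mathit{holds}(\text{name egon "egon"}),\ \mathit{holds}(\text{age egon 23}),\ \mathit{holds}(\text{profession egon "dancer"}) \ \}
\end{aligned}
$$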

4.2 Is-a is a Concept

In this subsection we model the isa relation together with inheritance. Hence, the definition should include its transitive closure called isa_trans. Its features will include acyclicity. Further, we model inheritance of the class membership is by giving the appropriate rule in the implies clause.

define concept (isa X Y) dependent (basic o-o)
  followups (transitive_closure isa isa_trans)
  features (acyclic isa_trans)
  implies dependent (is Z X), (isa_trans X Y) impl (is Z Y)
  requires dependent (is X ot), (is Y ot)

end concept

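The database implications are:

$$
\begin{aligned}
DB^d := DB^d \cup \{\ & \mathit{holds}(\text{basic o-o}) \Longrightarrow \mathit{holds}(\text{transitive\_closure isa isa\_trans}), \\
& \mathit{holds}(\text{basic o-o}) \Longrightarrow \mathit{holds}(\text{acyclic isa\_trans}), \\
& \mathit{holds}(\text{basic o-o}),\ \mathit{holds}(\text{isa X Y}),\ \mathit{holds}(\text{is Z X}),\ \mathit{holds}(\text{isa\_trans X Y}) \Longrightarrow \mathit{holds}(\text{is Z Y}) \ \}
\end{aligned}
$$

$$
\begin{aligned}
DB^c := DB^c \cup \{\ & \mathit{holds}(\text{basic o-o}),\ \mathit{holds}(\text{isa X Y}) \Longrightarrow \mathit{holds}(\text{is X ot}), \\
& \mathit{holds}(\text{basic o-o}),\ \mathit{holds}(\text{isa X Y}) \Longrightarrow \mathit{holds}(\text{is Y ot}) \ \}
\end{aligned}
$$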
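The interplay of transitive closure, acyclicity and membership inheritance can be sketched in Python as follows. This is a minimal illustration under assumptions: isa facts are encoded as (subtype, supertype) pairs, memberships as (object, type) pairs, and the type names `student` and `agent` are hypothetical. The membership rule is applied directly over the closure, which yields the same derived facts as repeated application along direct isa edges.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new

def is_acyclic(closure):
    """A transitively closed relation is acyclic iff no element reaches itself."""
    return all(a != b for (a, b) in closure)

def inherit(members, closure):
    """Apply (is Z X), (isa_trans X Y) impl (is Z Y) to all memberships."""
    derived = {(obj, sup)
               for (obj, typ) in members
               for (sub, sup) in closure if sub == typ}
    return set(members) | derived

isa = {("student", "person"), ("person", "agent")}   # hypothetical type names
isa_trans = transitive_closure(isa)
members = inherit({("egon", "student")}, isa_trans)

print(sorted(isa_trans))
print(is_acyclic(isa_trans))
print(sorted(members))
```

A cyclic isa relation such as {(a, b), (b, a)} produces the pair (a, a) in the closure, so `is_acyclic` returns False for it.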


5 Traditional Concepts Revisited

At this point the reader may agree with the authors that the notion of concept introduced before does indeed give rigour to the more intuitive usage of the term in the standard literature. The few examples stated so far, however, leave open the question whether the notion does indeed cover all the concepts that have variously been termed as such in the literature. Only if this could be shown to be true would our notion prove its worth. If not, then the deficiencies would at least point a way to further refinements. We investigate the question for two of the conventional data models, the entity-relationship model and the relational model.

5.1 Entity-Relationship Model

The entity-relationship model provides two different concepts: entity type and relationship type. Below we attempt a definition for both concepts. We model both entity types and relationship types as object types (ot) in the sense of Section 4.1. In this way the modeling of attributes can be carried over. Entity types are represented by their names.

define concept (entity_type E)
  features (is entity_type ot)
  implies dependent (is E ot)
          dependent (is E entity_type)

end concept

To allow the instantiation of entity types, i.e., introducing an entity of the respective type, we define every entity type to be an object type. Besides having a name, relationship types have an arbitrary number of entity types they relate to. Additionally, for each related entity type one of the following cardinality specifications must hold: 1_1, 1_n, 0_1, 0_n (minimum and maximum cardinalities).

define concept (card)
  followups (is card ot), (is 1_1 card), (is 1_n card), (is 0_1 card), (is 0_n card)
  requires all C (is C card) impl (C == 1_1 or C == 1_n or C == 0_1 or C == 0_n)

end concept

To simplify our discussion we shall disregard recursive relationships (see footnote 3). Thus every entity type can be involved in a specific relationship type only once.

Footnote 3: The possibility of one entity type being multiply involved in a relationship requires a differentiation between the different occurrences. This leads to one more argument for the relate and the card predicates, thus lengthening the concept definition. Since this is an entirely technical issue nothing is gained by including it.

define concept (reltype R)
  features (is reltype ot)
  implies dependent (is R reltype), (is R ot), (attr R R object), (attr relate R entity_type)
  requires
    all R, E, C (card R E C) impl (reltype R) and (entity_type E) and (is C card);
    all R, E, C1, C2 (card R E C1), (card R E C2) impl C1 == C2;
    all R, E (relate R E) impl ex C (card R E C);
    dependent ex E1, E2 (relate R E1), (relate R E2), E1 =/= E2;
    dependent all Ri, Ei (relate R Ei), (is Ri R) impl ex Eio (R Ri Eio);
    dependent all Ei, Ri, Eio1, Eio2 (relate R Ei), (is Eio1 Ei), (is Eio2 Ei),
      (R Ri Eio1), (R Ri Eio2) impl Eio1 == Eio2;
    dependent all Ei, Ci (relate R Ei), (card R Ei Ci) impl
      ( (Ci == 1_1 or Ci == 1_n) impl
          all Eio, Ej (is Eio Ei), (relate R Ej), Ei =/= Ej
            impl ex Ro, Ejo (is Ro R), (is Ejo Ej), (R Ro Ejo) )
      and
      ( (Ci == 1_1 or Ci == 0_1) impl
          all Ri, Rj, Eio (is Ri R), (is Rj R), (is Eio Ei), (R Ri Eio), (R Rj Eio)
            impl Ri == Rj )
end concept

First, we made reltype an object type where we collect all the defined relationships R. Every relationship is then related to a set of entity types. As soon as a specific relationship (an instance of R) is introduced, this relationship also becomes an attribute which relates every instance of the relationship to the corresponding entities (objects). The first six constraints state the following:

1. Due to its arity 3, card does not fit into the attribute framework. Thus we state explicitly that the first argument of card must be a relationship type, the second one an entity type, the third one an appropriate cardinality.
2. For each relationship type, only one cardinality may be specified for each entity type.
3. For every involved entity type a cardinality must be specified.
4. Each relationship type relates to at least two entity types.
5. Every instance of a relationship type must relate to at least one instance for each of its entity types.
6. Every instance of a relationship type may relate to at most one instance for each of its entity types.

The last constraint keeps an eye on the cardinality restrictions. We note that, due to the logic used, our language is not expressive enough to deal with general cardinalities.

[Figure 1: An Example E/R-Diagram. The entity types singer and fanClub are connected by the relationship type singerHasFanClub; the edges carry the cardinality labels 1_1 and 0_n.]

Figure 1 shows a very simple E/R-Diagram. Two entity types singer and fanClub are related by a single relationship type singerHasFanClub. The cardinalities are such that every fanClub has exactly one related singer, and there is no requirement for a singer having a fanClub. The concept definitions to model the E/R-Diagram are as follows:

define concept (entity_type singer)
  followups (attr firstName singer string),
            (attr lastName singer string),
            (attr status singer statusT)
end concept

define concept (entity_type fanClub)
  followups (attr numberOfFans fanClub integer)
end concept

define concept (reltype singerHasFanClub)
  features (relate singerHasFanClub singer),
           (relate singerHasFanClub fanClub),
           (card singerHasFanClub singer 0_n),
           (card singerHasFanClub fanClub 1_1)
end concept

To add a concrete singer called Elvis Presley and his fanClub buriers we specify:

define concept (is elvis singer)
  features (firstName elvis "Elvis"),
           (lastName elvis "Presley"),
           (status elvis dead)
end concept

define concept (is buriers fanClub)
  features (numberOfFans buriers 133)
end concept

define concept (singerHasFanClub elvis buriers)
end concept
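The cardinality restrictions of this example can be illustrated with a small Python sketch. It rests on several assumptions that go beyond the text: a cardinality such as 1_1 is read as a (minimum, maximum) pair of participations per entity, relationship instances are encoded as dictionaries from entity type to entity, and the function name and the extra fan club `orphans` are hypothetical.

```python
from collections import Counter

# Assumed reading: (minimum, maximum) participations; None means unbounded.
CARDS = {"1_1": (1, 1), "1_n": (1, None), "0_1": (0, 1), "0_n": (0, None)}

def cardinality_violations(entities, card, links):
    """
    entities: {entity_type: set of entity names}
    card:     {entity_type: cardinality name} for one relationship type
    links:    list of relationship instances, each {entity_type: entity name}
    Returns (entity_type, entity, participation count) for every violation.
    """
    violations = []
    for etype, spec in card.items():
        low, high = CARDS[spec]
        counts = Counter(link[etype] for link in links)
        for entity in sorted(entities[etype]):
            n = counts[entity]
            if n < low or (high is not None and n > high):
                violations.append((etype, entity, n))
    return violations

entities = {"singer": {"elvis"}, "fanClub": {"buriers"}}
card = {"singer": "0_n", "fanClub": "1_1"}
links = [{"singer": "elvis", "fanClub": "buriers"}]
print(cardinality_violations(entities, card, links))   # []

# A hypothetical fan club without a singer violates the minimum of 1_1.
entities["fanClub"].add("orphans")
print(cardinality_violations(entities, card, links))   # [('fanClub', 'orphans', 0)]
```

A singer without any fan club is accepted, because the minimum cardinality for singer is 0.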

5.2 Relational Model

The relational model provides just one concept, the relation. We run immediately into a problem when trying to define a concept relation within our framework. This is caused by the arbitrary arity of a relation. Whereas it is easy to define individual relations within our framework, it is difficult to define the general concept of a relation. This is due to the lack of a general mechanism (within our notion of concept) to express an arbitrary number of arguments. The problem may be circumvented, but only by very unnatural means that defeat our initial purpose of the notion of concept: define a relation as having a mapping has_attr from its name to its attributes. Equally unpleasant, in order to express that a relation consists of tuples, again due to the arbitrary arity problem, we must explicitly introduce a tuple identifier (TID) for each tuple and must then treat the attribute values using the mappings introduced within the definition of the relation. We denote a tuple tid being an element of a relation rname by (in tid rname). To ease the understanding of the following concept definition we treat neither typing of the attributes nor null values.

define concept (relation Rname)
  requires
    dependent ex Attr1 (has_attr Rname Attr1);
    all Tid, Rname1, Rname2 (in Tid Rname1), (in Tid Rname2) impl Rname1 == Rname2;
    all Attr, Tid, Val (Attr Tid Val) impl ex Rname (in Tid Rname);
    all Attr, Tid, Val (Attr Tid Val) impl ex Rname (has_attr Rname Attr);
    all Attr, Tid, Val1, Val2 (Attr Tid Val1), (Attr Tid Val2) impl Val1 == Val2;
    dependent all Tid, Attr (has_attr Rname Attr), (in Tid Rname) impl ex Val (Attr Tid Val)
end concept

The constraints require the following (in the above order):

1. Every relation must have at least one attribute.
2. Any Tid can be used only once within all relations.
3. An attribute can only be defined for tuples being in some relation.
4. Only defined attributes can be used.
5. Attribute values must be unique. This also avoids multiple usage of the same Tid within one relation.
6. No null values are allowed, that is, for every tuple in a relation and every attribute defined for this relation there exists an attribute value.

To illustrate the definition of a specific relation, a relation person with attributes name and age is introduced.

define concept (relation person)
  followups (has_attr person name), (has_attr person age)

end concept

We now introduce a person named john of age 23. As a tuple identifier we will use p1.

define concept (in p1 person)
  features (name p1 john), (age p1 23)

end concept
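The six constraints of the relation concept can be checked mechanically once the facts are available. The Python sketch below is a hedged illustration only: it assumes that has_attr facts are stored as (rname, attr) pairs, membership facts as (tid, rname) pairs and attribute values as (attr, tid, val) triples, and the function name `violated_constraints` is hypothetical.

```python
def violated_constraints(relations, has_attr, member, values):
    """Return the numbers (1-6) of the violated relation constraints."""
    violated = set()
    tids = [tid for (tid, _r) in member]
    keys = [(attr, tid) for (attr, tid, _v) in values]
    known_attrs = {attr for (_r, attr) in has_attr}

    # 1. Every relation has at least one attribute.
    if any(all(r != rn for (rn, _a) in has_attr) for r in relations):
        violated.add(1)
    # 2. A Tid belongs to at most one relation.
    if len(tids) != len(set(tids)):
        violated.add(2)
    for (attr, tid, _v) in values:
        # 3. Values exist only for tuples that are in some relation.
        if tid not in tids:
            violated.add(3)
        # 4. Only attributes declared via has_attr may be used.
        if attr not in known_attrs:
            violated.add(4)
    # 5. At most one value per attribute and tuple.
    if len(keys) != len(set(keys)):
        violated.add(5)
    # 6. No null values: every declared attribute has a value for every tuple.
    for (rname, attr) in has_attr:
        for (tid, rn) in member:
            if rn == rname and (attr, tid) not in keys:
                violated.add(6)
    return sorted(violated)

relations = {"person"}
has_attr = {("person", "name"), ("person", "age")}
member = {("p1", "person")}
values = {("name", "p1", "john"), ("age", "p1", 23)}

print(violated_constraints(relations, has_attr, member, values))            # []
print(violated_constraints(relations, has_attr, member,
                           {("name", "p1", "john")}))                       # [6]
```

Dropping the age value of p1 triggers constraint 6, since null values are not allowed.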

6 Experiences

The previously discussed work is an outgrowth of a project on a design environment for engineering databases. Its underlying premise is that semantic models for engineering applications are so diverse that it is close to impossible to come up with a limited set of universally acceptable modeling concepts. Instead, one would have to expect an ever-growing library of concepts that would make it impossible even for a skilled database designer to clearly differentiate between the concepts, to avoid overlapping use of them, redundancies and contradictions, and to maintain semantic schemas that are easy to communicate to and be understood by the application experts. To resolve the dilemma, the design environment instead offers facilities to an application administrator to define a small set of concepts tailored to the application at hand. The concept definition language introduced in this paper is part of the facilities. Basically, the environment consists of a deductive database complete with inference and constraint check mechanisms, a concept manager that collects the concept definitions and also adds the ensuing facts, rules and constraints to the database where they are checked, a knowledge acquisition tool through which the application administrator enters his concept definitions and later on utilizes them when designing the semantic schema, and a validation tool that generates consistent test databases as a way of rapid prototyping and then allows transactions to be run against the test database. In these experiments, the concept mechanism has proven to be a valuable asset that gives coherence and precision to the entire design environment.
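The division of labour between concept manager and deductive database can be pictured with the following Python sketch. It is not the environment described above: the class `DeductiveDB`, its methods and the constraint helper `range_is_integer` are hypothetical names, rules are omitted, and constraints are simplified to Python predicates over the set of facts.

```python
class DeductiveDB:
    """Toy store: facts are added only if all constraints still hold."""

    def __init__(self):
        self.facts = set()       # ground atoms such as ("age", "egon", 23)
        self.constraints = []    # pairs (name, predicate over a fact set)

    def add_constraint(self, name, check):
        self.constraints.append((name, check))

    def define(self, atoms):
        """Tentatively add `atoms`; return the names of violated constraints."""
        candidate = self.facts | set(atoms)
        failed = [name for name, check in self.constraints if not check(candidate)]
        if not failed:
            self.facts = candidate   # accept the update only if consistent
        return failed

def range_is_integer(attribute):
    """Constraint factory: all values of `attribute` must be integers."""
    def check(facts):
        return all(isinstance(f[2], int) for f in facts if f[0] == attribute)
    return check

db = DeductiveDB()
db.add_constraint("ran age integer", range_is_integer("age"))
print(db.define([("is", "egon", "person"), ("age", "egon", 23)]))  # []
print(db.define([("age", "egon", "old")]))                         # ['ran age integer']
print(("age", "egon", "old") in db.facts)                          # False
```

The rejected update leaves the database unchanged, which mirrors the idea that the facts ensuing from a concept definition are checked before they take effect.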

7 Conclusion and Future Research

We set out to give precise meaning to the intuitive notion of (modeling) concept as it is used in the literature on data models. We hope to have shown that the logic-based approach does indeed give rigour to the notion, particularly if one takes into account the semantics employed, which associates with each definition an effect in a deductive database. The approach seems to offer a number of advantages. First of all, the definition language introduced could be seen as a specification tool which forces the designer of a data model into a disciplined and rigorous explication and evaluation of his thoughts and ideas. Second, the consistency of his concepts (be they generic and/or typed) can be examined by formal proof methods and techniques. In that sense the semantics could also be seen as a kind of rapid prototyping (see e.g. [5, 6]). Third, such a deductive database may then be used for querying general knowledge ([12]).

On the other hand, the approach is only an initial step towards concept design. Major problems remain and should be the subject of further research. For one, as Section 5 clearly indicates, the formalism is as yet not powerful enough to specify even those concepts that for a decade or more have been common currency among database designers. Further enhancements are clearly required. Second, in standard database applications where performance is a critical issue, an approach that remains rule-based is impractical. Ways must be found to convert the operational semantics of augmenting a deductive database and executing constraint checks against its extension into algorithmic solutions that automatically guarantee a large portion of the consistency constraints, and construct portions of the extension on demand.

Acknowledgment: We thank Klaus Radermacher for careful reading of a first draft. Also, we are grateful to the anonymous referees for fruitful comments.

References

[1] J. Baroff, R. Simon, F. Gilman, and B. Shneiderman. Direct manipulation user interfaces for expert systems. In J.A. Hendler, editor, Expert Systems: The User Interface, pages 99-125, 1988.
[2] D.G. Bobrow and M. Stefik. The LOOPS manual. Technical Report KB-VLSI-8113, Palo Alto Research Center, 1981.
[3] R.O. Duda, P.E. Hart, N.J. Nilsson, and G.L. Sutherland. Semantic network representations in rule-based inference systems. In D.A. Waterman and F. Hayes-Roth, editors, Pattern-Directed Inference Systems, pages 203-221, 1978.
[4] H. Gallaire and J. Minker, editors. Logic and Data Bases. Plenum Press, New York, 1978.
[5] G. Grosz and C. Rolland. Using artificial intelligence techniques to formalize the information system design process. In A.M. Tjoa and R. Wagner, editors, Proc. Int. Conf. on Database and Expert Systems Applications, pages 374-380, 1990.
[6] R.J.K. Jacob and J.N. Froscher. A software engineering methodology for rule-based systems. IEEE Trans. on Knowledge and Data Engineering, 2(2):173-189, 1990.
[7] S. Karl and P.C. Lockemann. Design of engineering databases: A case for more varied semantic modelling concepts. Information Systems, 13:335-357, 1988.
[8] R. Kowalski, F. Sadri, and P. Soper. Integrity checking in deductive databases. In Proc. 13th Int. Conf. VLDB, pages 61-69, 1987.
[9] J.W. Lloyd and R.W. Topor. A basis for deductive database systems. J. Logic Programming, 2:93-109, 1985.
[10] G. Moerkotte and S. Karl. Efficient consistency checking in deductive databases. In 2nd Int. Conf. on Database Theory, pages 118-128, 1988.
[11] G. Moerkotte and P.C. Lockemann. Reactive consistency control in deductive databases. Internal report 3/90, Universität Karlsruhe, 1990. Also submitted.
[12] A. Motro and Q. Yuan. Querying database knowledge. In Proc. ACM SIGMOD Int. Conf. on Management of Data, pages 173-183, 1990.
[13] D.S. Nau and M. Gray. Hierarchical knowledge clustering: A way to represent and use problem-solving knowledge. In J.A. Hendler, editor, Expert Systems: The User Interface, pages 81-98, 1988.
[14] J.-M. Nicolas. Logic for improving integrity checking in relational data bases. Acta Informatica, 18:227-253, 1982.
[15] J.B. Wright, F.D. Miller, G.V.E. Otto, E.M. Siegfried, G.T. Vesonder, and J.E. Zielinski. ACE: Going from prototype to product with an expert system. In Proc. 1984 ACM Annual Conf. 5th Generation Challenge, pages 24-28, 1984.
