In: Abiteboul/Kanellakis (Eds.): ICDT’90 (Third International Conference on Database Theory), pp. 349–363, Springer-Verlag, Lecture Notes in Computer Science 470, 1990

Beginnings of a Theory of General Database Completions

Stefan Brass
Universität Hannover, Institut für Informatik, FG Datenbanken und Informationssysteme
Postfach 6009, D-3000 Hannover 1, Fed. Rep. Germany
Electronic mail: [email protected], [email protected]

Abstract. Ordinary logical implication is not enough for answering queries in a logic database, since negative information in particular is only implicitly represented in the database state. Many database completions have been proposed to remedy this problem, but a clear understanding of their properties and differences is still needed. In this paper, a general framework is proposed for studying database completions, which reveals an interesting and fruitful relation to social choice theory. We apply our framework to parameterized forms of the minimal model approach and the closed world assumption, as well as to two versions of default logic: simple defaults with constraints and normal defaults. Thereby the relationship between these important classes of database completions is clarified. In particular, it is shown that the GCWA cannot be simulated by any of the other three approaches. We also give a characterization of those completions which can be represented as a form of minimal implication. The whole discussion is based on various properties of database completions which have proven useful as a means for classifying the proposed approaches. Finally, we illustrate the applicability of general database completions to conventional deductive databases by means of transformations.

1 Introduction

A logic database stores formulae which describe facts corresponding to conventional database information, rules for deducing further information, and indefinite information. Thus, a database state is a set of first order formulae. This is more general than a deductive database, where each formula must be an implication with an atom in its head [Llo87]. In particular, incomplete and purely negative information cannot be represented in such a restricted language. The need for extensions is generally accepted; for instance, there are proposals which allow disjunctions in the heads of the implications [BH86, RT88, Prz88].

Now database completions are needed to define the semantics of such a database state, i.e. which answers to queries are correct. It is not sufficient to consider only proper logical consequences, since negative information in particular is only implicitly represented in the database state. Some completions also make it possible to specify default rules directly which would otherwise have to be encoded in the database state. Many forms of database completions have been defined in the literature, e.g. various forms of minimal implication (circumscription) and different versions of closed world assumptions. This gives rise to the question “Which completion should we use?” (probably depending on the application).

In this paper, a general framework is proposed for studying database completions. We characterize those completions which can be represented by means of preference relations [Dav80, McC80, Sho87] and investigate their relation to the class of closed world assumptions [Rei78, Min82, BL89], as well as to two versions of default logic: simple defaults with constraints [Poo88] and normal defaults [Rei80, Luk85]. This is of course only the beginning, and the goal at the moment is mainly to demonstrate an interesting and promising new way of thinking about database completions. We believe that it will lead to a clear understanding of database completions, which is the prerequisite for using them as an effective tool in “knowledge base design”.

Our paper is structured as follows: In section 2 we propose a general framework for studying database completions. We start by defining some basic notions about database states and queries, then we give three equivalent formalizations of the concept of “database completion”, and finally we define some properties that database completions might have. These properties are important for evaluating or classifying completions, and are also needed for characterizing those completions which can be represented in certain ways. This representation of completions is the topic of section 3: Here we investigate the completions which can be represented in the four formalisms mentioned above. In section 4 we briefly suggest specification transformations as a way to make general completions applicable to more conventional deductive databases. Finally, in section 5 we give a short summary of our results and outline directions for further research. The proofs had to be left out due to space restrictions.

2 A Formal Framework

2.1 Database States and Queries

As usual in deductive databases, we use a many-sorted (or “typed”) logic. So we expect a signature Σ = (S, Ω, Π) to be given (as part of the database scheme), which determines
• a finite set S of sorts (such as person, city, ...),
• a finite set Ω of constants, each of a specific sort (e.g. Jones: person, Hannover: city), and
• a finite set Π of predicates (or relation symbols), together with their argument sorts (e.g. direct_connection_by_train(city, city), lives_in(person, city)).
We exclude function symbols in order to enforce finiteness of the models which must be considered (see below). Although this restriction is very strong, it is quite usual in deductive database theory (e.g., [YH85]).

A database state is a set of formulae over Σ, i.e. it is an element of 2^FΣ, where FΣ denotes the set of all formulae over Σ and 2^FΣ is its powerset. We will assume that each database state implicitly contains the domain closure axioms (DCA) and the unique name axioms (UNA). These axioms require that each element in the carrier set of a sort (e.g., each person) can be named injectively by a ground term of this sort. This is another strong but usual restriction (e.g., [GPP86]). It results in “first order logic” being equivalent to propositional logic (with the ground atomic formulae as propositional variables), so the quantifiers can be thought of as mere shorthands for finite conjunctions or disjunctions. Of course, these shorthands are very useful.

Naturally, we will have to ask whether the results can be generalized to a less restrictive setting. Some of them do not hold in full clausal logic (or even first order logic), but we still regard them as useful. In a deductive database system, termination of query evaluation has to be guaranteed, so full clausal logic cannot be allowed anyway. It is our intuition that most of the results will probably hold in any restricted logic suitable for query evaluation. And finally note that considering extreme cases usually helps to get a better understanding.

In spite of all restrictions, the purpose of database states still is to model real world situations. These determine interpretations of the symbols which make the formulae in the database state true (called models of the database state). Many database states are intended to describe only one real world situation, but it is also possible to have multiple models so that incomplete information can be represented: it is not exactly known which of the models corresponds to the situation in the universe of discourse. Because of our restrictions we only have to consider Herbrand models, i.e. models where each carrier set consists of all constants of this sort and therefore the constants can be interpreted by themselves. Thus, any model γ is completely specified by a truth assignment to the (finitely many) ground atomic formulae. This is an ideal situation in comparison to other approaches. The advantage of our restrictions is that we do not have to distinguish between formulae and sets of models (whereas in general not all sets of models can be described by a formula). We will denote the set of all Herbrand structures over Σ by SΣ, the set of all Herbrand models of a given state Φ by Mod(Φ), and the set of formulae true in each model from a set Γ by Th(Γ).

A query is a first order formula, possibly with free variables, called result variables. An answer is a set Θ of substitutions θi (i = 1, ..., m) of constants for the result variables of the query. Of course, if ψ is a boolean query, i.e. ψ does not have result variables, then the resulting empty substitution should be written as “yes”. We would certainly call Θ a correct answer in state Φ if Φ ⊢ ψθi (i = 1, ..., m), where ⊢ denotes logical implication: ψθi is true in each model of Φ. But it is well known that logical implication is not sufficient for answering queries: too much information is only implicitly represented in the database state, so that query answering rests on many silent assumptions. This is especially true for negative information: Given the table

    orders:  customer   article
             Smith      torch
             Jones      batteries

represented in the logical data model as a set of facts

    Φ := {order(Smith, torch), order(Jones, batteries)},

one would certainly expect that {⟨x / Smith⟩} is a correct answer to the query “Who ordered a torch but no batteries?”:

    order(x, torch) ∧ ¬order(x, batteries).

But with the above definition, this answer would not be correct since Φ ⊬ ¬order(Smith, batteries) (of course, order(Smith, batteries) does also not follow from Φ). In this example, the implicit assumption is “every ground atomic formula not following from Φ is false” [Rei78]. Considering implicit assumptions can be formalized by means of a database completion, as defined in the next subsection.

While in the last example a single Herbrand model was the unique intended model, it is also possible that more than one model is intended: This is due to incomplete information. A typical example might be

    parents(Jimmy, James, Jane),
    bloodtype(James, A),
    bloodtype(Jane, 0),
    parents(x, y, z) ∧ bloodtype(y, A) ∧ bloodtype(z, 0) → bloodtype(x, A) ∨ bloodtype(x, 0).

Given this database state, it is equally possible that Jimmy has bloodtype A or 0.

Before we proceed, we have to explain a final “restriction”: We require that query-answering makes no difference between logically equivalent states.
This is not true for PROLOG, since the addition of a clause p(x) ← p(x) will generate an infinite loop, but does not change the class of models (the formula is equivalent to true). But thinking about infinite loops is clearly operational, and any semantics that behaves in this way cannot be classified as fully logical. We should rather allow the optimization of deleting this clause (which would not be possible if it changed the semantics of the database state). To show that this constraint does not exclude all cases of practical interest, let us remark that the “standard model semantics” of stratified DATALOG does have this property: The standard model depends only on the logical contents of the database state and on the stratification (which assigns a priority of minimization to each predicate) [Lif88]. So the extralogical information given by the stratification can be clearly separated from the database state and used as a parameter to the database completion.
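To make the reduction to propositional logic concrete, the following Python sketch (an illustration, not part of the original paper) identifies Herbrand interpretations of the orders example with sets of true ground atoms, encodes formulae as predicates on such interpretations (an assumption of this sketch), and evaluates the query first under plain logical implication and then under Reiter's assumption that every ground atom not following from Φ is false.

```python
from itertools import combinations

# Ground atoms of the orders example (DCA and UNA give a finite Herbrand base).
ATOMS = [f"order({c},{a})" for c in ("Smith", "Jones") for a in ("torch", "batteries")]

# A Herbrand interpretation is identified with the set of ground atoms it makes true.
INTERPRETATIONS = [frozenset(s) for r in range(len(ATOMS) + 1)
                   for s in combinations(ATOMS, r)]

def holds(atom):            # formula: a single ground atom
    return lambda m: atom in m

def neg(formula):           # formula: negation
    return lambda m: not formula(m)

def models(state):
    return [m for m in INTERPRETATIONS if all(f(m) for f in state)]

def entails(state, query):
    return all(query(m) for m in models(state))

# Database state: the two stored facts.
PHI = [holds("order(Smith,torch)"), holds("order(Jones,batteries)")]

# The query instance: order(Smith, torch) and not order(Smith, batteries).
query = lambda m: "order(Smith,torch)" in m and "order(Smith,batteries)" not in m

print(entails(PHI, query))      # False: the negative part is not a logical consequence

# Reiter's CWA: add the negation of every ground atom not implied by the state.
cwa_phi = PHI + [neg(holds(a)) for a in ATOMS if not entails(PHI, holds(a))]
print(entails(cwa_phi, query))  # True: Smith ordered a torch but no batteries
```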

2.2 Database Completions — Three Equivalent Formulations

We now present three equivalent formalizations of the concept of “database completion”. This will help to get a better understanding of what a completion really is. But it is also necessary for practical reasons: the known completions have been defined in different frameworks, and we need a way of comparing them. If we directly applied the conversions proposed in the theorems below, the result would be complicated and unnatural definitions.

Definition 2.2.1 (Database Completion):
• A syntactic completion is a mapping comp: 2^FΣ → 2^FΣ such that
  - Φ ⊆ comp(Φ) for each state Φ (“no loss of information”) and
  - comp(Φ1) and comp(Φ2) are equivalent for every pair Φ1, Φ2 of equivalent states (“preservation of equivalence”).
• An abstract completion is a relation ⊢c ⊆ 2^FΣ × FΣ such that
  - Φ ⊢ ψ =⇒ Φ ⊢c ψ (“no loss of information”),
  - Φ ⊢c ψ1, ..., Φ ⊢c ψn, {ψ1, ..., ψn} ⊢ ψ =⇒ Φ ⊢c ψ (“closure under logical consequences”),
  - Φ1 ⊢c ψ, Φ1 ⊢ Φ2, Φ2 ⊢ Φ1 =⇒ Φ2 ⊢c ψ (“preservation of equivalence”).
• A modeltheoretic completion is a mapping sel: 2^SΣ → 2^SΣ such that
  - sel(Γ) ⊆ Γ for each set of Herbrand structures Γ (= Mod(Φ)) (“no loss of information”).

Definition 2.2.2 (Correct Answers): Let ψ be a query and Θ be a set of ground substitutions θi (i = 1, ..., m) for the result variables of ψ. Let a database state Φ be given. Θ is correct with respect to a completion comp / ⊢c / sel iff for each i = 1, ..., m:
• comp(Φ) ⊢ ψθi.
• Φ ⊢c ψθi.
• sel(Mod(Φ)) ⊆ Mod(ψθi).

So, a database completion can be formalized as a mapping comp from database states to supersets. The intuition is that the completion makes the implicit assumptions explicit. The various forms of the CWA are defined in this way. A completion can also be formalized as a superset of the logical inference relation ⊢. This seems “more abstract”, but it directly describes which boolean queries should be answered with “yes”, and the meaning of general queries can be reduced to boolean queries by means of substitutions.

Finally, one can also consider the selection of models. A database state has a set of models, but not all of them are really meant, so the completion selects the intended ones. The minimal model approach is defined in this way.

This last formalization reveals a strong connection to social choice theory, a joint branch of the social sciences and mathematics: Here choice functions are examined which pick some outcome(s) from every issue (a subset of a fixed set of possible outcomes). We might also talk about candidates and winners, but the concept is more general and applies to any form of decision-making with varying sets of alternatives. In our context, the candidates are all the models of a database state and the winners are the intended models used for query answering. For a survey on choice functions from the social choice theory point of view, see [Mou85].

Now we state that all these definitions are equivalent with respect to the correct answers:

Definition 2.2.3 (Equivalence of Completions): Two completions comp1 / ⊢c1 / sel1 and comp2 / ⊢c2 / sel2 are equivalent iff for each database state Φ, each query ψ, and each set of ground substitutions Θ the following holds: Θ is correct with respect to comp1 / ⊢c1 / sel1 ⇐⇒ Θ is correct with respect to comp2 / ⊢c2 / sel2.

Theorem 2.2.4:
• Let comp be a syntactic completion. Then the relation ⊢c ⊆ 2^FΣ × FΣ defined by Φ ⊢c ψ :⇐⇒ comp(Φ) ⊢ ψ is an equivalent abstract completion.
• Let ⊢c be an abstract completion. Then the mapping comp: 2^FΣ → 2^FΣ defined by comp(Φ) := {ψ ∈ FΣ | Φ ⊢c ψ} is an equivalent syntactic completion.

Theorem 2.2.5:
• Let comp be a syntactic completion. Then the mapping sel: 2^SΣ → 2^SΣ defined by sel(Γ) := Mod(comp(Th(Γ))) is an equivalent modeltheoretic completion.
• Let sel be a modeltheoretic completion. Then the mapping comp: 2^FΣ → 2^FΣ defined by comp(Φ) := Th(sel(Mod(Φ))) is an equivalent syntactic completion.
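The interchangeability of the three views rests on the fact that, with DCA and UNA, every set of Herbrand structures is axiomatizable, so Mod(Th(Γ)) = Γ. The following Python sketch (illustrative; the two-atom signature is an assumption) verifies this exhaustively by identifying each formula, up to equivalence, with its set of models.

```python
from itertools import combinations

ATOMS = ("p", "q")
INTERPS = [frozenset(s) for r in range(len(ATOMS) + 1)
           for s in combinations(ATOMS, r)]
# Up to logical equivalence, a formula over this finite propositional
# language is just a set of interpretations, namely its set of models.
FORMULAS = [frozenset(s) for r in range(len(INTERPS) + 1)
            for s in combinations(INTERPS, r)]

def mod_th(gamma):
    """Mod(Th(Gamma)): the interpretations satisfying every formula
    that is true in all interpretations of Gamma."""
    gamma = frozenset(gamma)
    result = set(INTERPS)
    for formula in FORMULAS:       # formula is identified with its model set
        if gamma <= formula:       # formula belongs to Th(Gamma)
            result &= formula
    return frozenset(result)

# In this finite Herbrand setting every model set is axiomatizable, so
# Mod(Th(Gamma)) = Gamma and the conversions of Theorem 2.2.5 lose nothing.
assert all(mod_th(g) == g for g in FORMULAS)
print("Mod(Th(Gamma)) = Gamma for all", len(FORMULAS), "model sets")
```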

2.3 Properties of Database Completions

In this subsection, we list some properties which can be used to evaluate or classify database completions. We formulate these properties for each of the three definitions of “database completion”, since one or the other might be intuitively clearer. Of course, the definitions are equivalent (with respect to the conversions given in theorem 2.2.4 and theorem 2.2.5). Quite a lot of these properties stem from social choice theory; of course, only the formulation for modeltheoretic completions can be directly quoted in this case. If not otherwise specified, the conditions in the following definitions must be satisfied for all database states Φ, Φ1, Φ2 ∈ 2^FΣ, all formulae ϕ, ϕ1, ϕ2 ∈ FΣ, all queries ψ, ψ1, ψ2 ∈ FΣ, and all model sets Γ, Γ1, Γ2 ∈ 2^SΣ.

Any completion should preserve consistency, because otherwise the information contained in the database state would be lost — every query would be answered with “yes”.

Definition 2.3.1 (Consistency Preserving): A completion comp / ⊢c / sel is consistency preserving (has property CON) iff
• Φ consistent =⇒ comp(Φ) consistent.
• Φ ⊬ false =⇒ Φ ⊬c false.
• Γ ≠ ∅ =⇒ sel(Γ) ≠ ∅.

Some of the earlier completions, like the original CWA, are not consistency preserving. Circumscription has some problems, too, but only for infinite models (which are excluded here).

The following property formalizes the intuition that if a completion assumes some formula, then it should not change if we tell it that this formula really holds, i.e. insert the formula into the database state. This is similar to the “materialization of a view”: If some consequences of the (completed) database state are computed very often, then we might want to insert the result of this computation into the database state for efficiency reasons. Of course, the semantics of the database state should not be changed by this transformation. In social choice terms this property means that if some losers withdraw their candidature, then the set of winners should not change. So every completion should be stable in the following sense:

Definition 2.3.2 (Cumulation): A completion comp / ⊢c / sel is cumulative (has property CUM) iff
• Φ1 ⊆ Φ2, comp(Φ1) ⊢ Φ2 =⇒ comp(Φ1) and comp(Φ2) are equivalent.
• Φ1 ⊢c Φ2, Φ2 ⊢ Φ1 =⇒ (Φ1 ⊢c ψ ⇐⇒ Φ2 ⊢c ψ).
• sel(Γ1) ⊆ Γ2 ⊆ Γ1 =⇒ sel(Γ2) = sel(Γ1).

Most database completions are indeed cumulative (at least as long as they are consistency preserving), but there are exceptions, e.g. default logic [Mak89] and first order circumscription [Bra90]. Cumulation can be split into two more basic properties, restricted monotonicity and the cut rule [Gab85].

Whereas the preceding properties are absolutely essential for any “good” completion, the following property is useful to classify completions. It constrains the information which is lost when a formula is deleted from the database state:

Definition 2.3.3 (Deduction Rule): A completion comp / ⊢c / sel satisfies the deduction rule (has property DED) iff
• comp(Φ ∪ {ϕ}) ⊢ ψ =⇒ comp(Φ) ⊢ (ϕ → ψ).
• Φ ∪ {ϕ} ⊢c ψ =⇒ Φ ⊢c (ϕ → ψ).
• sel(Γ1) ∩ Γ2 ⊆ sel(Γ1 ∩ Γ2).

The validity of the deduction rule is equivalent to the validity of the case analysis rule: If it does not matter for some query ψ whether a formula ϕ or its negation is contained in the database state, then it should be possible to delete it.

Theorem 2.3.4: A completion comp / ⊢c / sel satisfies the deduction rule iff
• comp(Φ ∪ {ϕ}) ⊢ ψ, comp(Φ ∪ {¬ϕ}) ⊢ ψ =⇒ comp(Φ) ⊢ ψ.
• Φ ∪ {ϕ} ⊢c ψ, Φ ∪ {¬ϕ} ⊢c ψ =⇒ Φ ⊢c ψ.
• sel(Γ1 ∪ Γ2) ⊆ sel(Γ1) ∪ sel(Γ2).

Interestingly, the validity of the deduction rule also formalizes that the completion assumes defaults only in the absence of information, so if Φ1 contains more information than Φ2, then fewer defaults will be assumed in Φ1.

Theorem 2.3.5: A completion comp / ⊢c / sel satisfies the deduction rule iff
• Φ1 ⊢ Φ2 =⇒ comp(Φ2) ∪ Φ1 ⊢ comp(Φ1).
• Φ1 ⊢ Φ2, Φ1 ⊢c ψ =⇒ there is ψ2 ∈ FΣ such that Φ2 ⊢c ψ2 and Φ1 ∪ {ψ2} ⊢ ψ.
• Γ1 ⊆ Γ2 =⇒ sel(Γ2) ∩ Γ1 ⊆ sel(Γ1).

So this property at least seems to be very useful, as so many of the usual deduction rules rely on it. We will show that it is interesting for classifying completions, too, since not all of the proposed database completions have this property. And in some applications the knowledge base engineer might intentionally violate the deduction rule:

Example 2.3.6: Suppose we know that the power supply or the fuses of some computer are faulty (e.g., because there is no voltage on the power supply wires):

    Φ := {faulty(fuses) ∨ faulty(power supply)}.

Then we cannot exclude that both of them are faulty, i.e. we require

    comp(Φ) ⊬ ¬faulty(fuses) ∨ ¬faulty(power supply),

i.e. the disjunction should be interpreted inclusively, not as “exclusive or”. The last condition can also be written as

    comp(Φ) ⊬ faulty(fuses) → ¬faulty(power supply).

Suppose on the other hand that we know exactly that the fuses are burnt out. Then this might be a sufficient explanation for the malfunction of the computer, so we will assume that all other parts are not faulty:

    comp({faulty(fuses)}) ⊢ ¬faulty(power supply),

which implies that comp(Φ ∪ {faulty(fuses)}) ⊢ ¬faulty(power supply) (since Φ ∪ {faulty(fuses)} is equivalent to {faulty(fuses)}). So the deduction rule is violated. □

Let us finally note that this property has been known in social choice theory for a long time under numerous different names (e.g., “Chernoff’s property” in [Mou85]). There it means that a winner in a larger set of candidates should also win when the number of competitors decreases.

There are various weakenings of the deduction property; one is the “property of compatible extension”: If a formula ϕ does not contradict a database state Φ, then the completions of Φ and Φ ∪ {ϕ} should not contradict each other either:

Definition 2.3.7 (Compatible Extension): A completion comp / ⊢c / sel has the property of compatible extension (property CEX) iff
• {ϕ} ∪ comp(Φ) is consistent =⇒ comp(Φ ∪ {ϕ}) ∪ comp(Φ) is consistent.
• Φ ⊬c ¬ϕ, Φ ⊢c ψ =⇒ Φ ∪ {ϕ} ⊬c ¬ψ.
• Γ1 ⊆ Γ2, Γ1 ∩ sel(Γ2) ≠ ∅ =⇒ sel(Γ1) ∩ sel(Γ2) ≠ ∅.

Theorem 2.3.8: If a completion has the deduction property, then it has the property of compatible extension.

Whereas the deduction rule limits the amount of information lost when the database state gets weaker, the “expansion property” limits the amount of information gained in this case. It might seem counterintuitive that information can be gained by weakening the state, but this is due to the nonmonotonic nature of database completions: They assume facts by default and have to revoke these assumptions if they get the additional information that they do not hold.

Definition 2.3.9 (Expansion Property): A completion comp / ⊢c / sel has the expansion property (property EXP) iff
• comp(Φ ∪ {ϕ1}) ∪ comp(Φ ∪ {ϕ2}) ⊢ comp(Φ ∪ {ϕ1 ∨ ϕ2}).
• Φ ∪ {ϕ1 ∨ ϕ2} ⊢c ψ =⇒ there are ψ1, ψ2 ∈ FΣ such that Φ ∪ {ϕ1} ⊢c ψ1, Φ ∪ {ϕ2} ⊢c ψ2, {ψ1, ψ2} ⊢ ψ.
• sel(Γ1) ∩ sel(Γ2) ⊆ sel(Γ1 ∪ Γ2).

When talking about candidates and winners (i.e., all models and the intended ones), this property requires that a winner in Γ1 and in Γ2 should not lose in Γ1 ∪ Γ2.

Now let us consider the new information generated by a database completion. Of course, a completion should not allow arbitrary new formulae to be concluded, e.g. we would not expect to get new positive information when the extensions of the predicates are minimized. On the other hand, such a completion would be insufficient if the application requires new positive information.

Definition 2.3.10 (New Information): A completion comp / ⊢c / sel does not generate new information from Ψ ⊆ FΣ iff for each ψ ∈ Ψ:
• comp(Φ) ⊢ ψ =⇒ Φ ⊢ ψ.
• Φ ⊢c ψ =⇒ Φ ⊢ ψ.
• sel(Γ) ⊆ Mod(ψ) =⇒ Γ ⊆ Mod(ψ).

Two database completions can be compared by the amount of new information generated:

Definition 2.3.11 (Weaker Than): A completion comp1 / ⊢c1 / sel1 is weaker than another completion comp2 / ⊢c2 / sel2 iff
• comp2(Φ) ⊢ comp1(Φ).
• Φ ⊢c2 ψ =⇒ Φ ⊢c1 ψ.
• sel2(Γ) ⊆ sel1(Γ).
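The modeltheoretic formulations above can be tested mechanically over a small signature. The Python sketch below is illustrative only: the two selection functions (cardinality-minimal models, and a GCWA-style selection in the spirit of Example 2.3.6) are assumed examples, not definitions from this paper.

```python
from itertools import combinations

ATOMS = ("p", "q")
INTERPS = [frozenset(s) for r in range(len(ATOMS) + 1)
           for s in combinations(ATOMS, r)]
MODEL_SETS = [frozenset(s) for r in range(len(INTERPS) + 1)
              for s in combinations(INTERPS, r)]

def check(sel):
    """Test the modeltheoretic versions of CON, CUM, DED, CEX and EXP."""
    props = {"CON": True, "CUM": True, "DED": True, "CEX": True, "EXP": True}
    for g1 in MODEL_SETS:
        if g1 and not sel(g1):
            props["CON"] = False
        for g2 in MODEL_SETS:
            if sel(g1) <= g2 <= g1 and sel(g2) != sel(g1):
                props["CUM"] = False
            if not ((sel(g1) & g2) <= sel(g1 & g2)):
                props["DED"] = False
            if g1 <= g2 and (g1 & sel(g2)) and not (sel(g1) & sel(g2)):
                props["CEX"] = False
            if not ((sel(g1) & sel(g2)) <= sel(g1 | g2)):
                props["EXP"] = False
    return props

def min_card(gamma):
    """Select the models with as few true atoms as possible (a preference order)."""
    return frozenset(m for m in gamma if not any(len(m2) < len(m) for m2 in gamma))

def gcwa_sel(gamma):
    """GCWA-style selection: keep the models satisfying every negated ground atom
    that belongs to every maximal extension (see Section 3.2)."""
    exts = [set(e) for r in range(len(ATOMS), -1, -1)
            for e in combinations(ATOMS, r)
            if any(not (set(e) & m) for m in gamma)]
    maximal = [e for e in exts if not any(e < f for f in exts)]
    assumed = set.intersection(*maximal) if maximal else set()
    return frozenset(m for m in gamma if not (assumed & m))

print("cardinality-minimal models:", check(min_card))  # all five properties hold
print("GCWA-style selection      :", check(gcwa_sel))  # DED fails already for two
                                                       # atoms (cf. Example 2.3.6)
```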

3 Representations of DB-Completions

We will now consider the problem of representing database completions, i.e. investigate formalisms which the knowledge base engineer might use to denote the intended completion. In principle, it would be possible to specify a completion as a table, listing for each database state the completed database state. Of course, this is impossible for any practical application (there are 2^(2^n) non-equivalent database states with n propositional variables, since each database state specifies a subset of the 2^n possible situations). This representation of a database completion is also totally unstructured: It would be very difficult to see any reason in this endless list of argument-value pairs. On the other hand, some completions are “rationalizable” by means of preference relations, various forms of defaults, or in some other way. In this case, one can also say that the preference relation or set of defaults is “revealed” by the completion. Just to get an impression, let us note that there are 71,638 cumulative and consistency preserving completions for two propositional variables, but only 219 of them can be represented by preference relations (these figures have been computed by a simple enumeration program; a small sketch of the preference-relation count is given at the end of this introductory discussion).

So, in this section we will study formalisms for denoting completions, i.e. parameterized completions which map a specification of a completion (the parameter) to a (syntactical, abstract or modeltheoretic) completion. Given such a parameterized completion, one might ask the following questions (among others):

• Which properties can be guaranteed independently of the parameter? I.e., we would like to have results of the form “Whichever value the knowledge base engineer chooses for the parameter, the specified completion will have property P.”

• How powerful is this formalism? Which completions can be specified using this formalism? Answers to these questions might use the properties in the form “Each completion with property P can be specified using formalism F”, or might compare the expressive power of two such formalisms, e.g. “Each completion which can be represented in formalism F1 can also be represented in formalism F2”. Of course, there is a tradeoff between expressiveness and the guaranteed properties.

• Naturally, the specification formalism should also be easy to use: Typical specifications should be concise, well structured and clear. Unfortunately, such conditions are hard to formalize, although there are some special aspects which can be easily checked, e.g. whether it is possible to specify priorities between conflicting defaults. We will not discuss this topic further in this paper.

The possibility of being represented in a given formalism can also be regarded as a property of the completion, and therefore the results in this section are again implications between properties of completions.
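The following Python sketch illustrates the figures quoted above for the preference-relation case (the representation itself is introduced formally in section 3.1): it enumerates all strict partial orders on the four interpretations of two propositional variables and counts the distinct selection functions they induce, reproducing the figure of 219. The full enumeration of all cumulative and consistency preserving completions is a larger brute-force search and is omitted here.

```python
from itertools import combinations

# The four interpretations of two propositional variables, and all candidate
# strict preference relations (sets of ordered pairs "gamma1 is preferred").
INTERPS = range(4)
PAIRS = [(a, b) for a in INTERPS for b in INTERPS if a != b]

def is_strict_partial_order(rel):
    # Irreflexive by construction; the transitivity check also rules out cycles.
    return all((a, c) in rel for (a, b) in rel for (b2, c) in rel if b == b2)

def induced_completion(rel):
    """The selection function min_< induced by the preference relation rel."""
    selections = []
    for r in range(1, len(INTERPS) + 1):
        for gamma in combinations(INTERPS, r):
            minimal = frozenset(g for g in gamma
                                if not any((g2, g) in rel for g2 in gamma))
            selections.append((frozenset(gamma), minimal))
    return frozenset(selections)

orders = []
for n in range(len(PAIRS) + 1):
    for chosen in combinations(PAIRS, n):
        rel = frozenset(chosen)
        if is_strict_partial_order(rel):
            orders.append(rel)

completions = {induced_completion(rel) for rel in orders}
print(len(orders), "strict partial orders")      # 219
print(len(completions), "distinct completions")  # 219: each order is "revealed"
                                                 # by its selection function
```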

3.1 Minimal Models

Some completions result from a preference relation on the interpretations: Certain situations of the real world are considered more likely than others. For instance, we would prefer a situation in which the bird Tweety can fly to a situation in which it cannot. So the completion is specified by means of a partial order ≺ (any transitive and irreflexive relation) on the situations (i.e. Herbrand structures). Since most preference relations try to minimize the extensions of the predicates, we will use γ1 ≺ γ2 to denote that we prefer γ1 to γ2 and not vice versa. Predicate circumscription, for instance, uses the preference relation defined by γ1 ≺ γ2 :⇐⇒ γ1[p] ⊆ γ2[p] for all predicates p (⊂ for at least one). The use of preference relations to define completions originates from [Dav80]; completions defined by general preference relations have been investigated in [Sho87].

Definition 3.1.1 (Minimal Models): Let ≺ be a partial order relation on the Herbrand structures SΣ. Then the following modeltheoretic completion min≺ is given by ≺:

    min≺(Γ) := {γ ∈ Γ | there is no γ0 ∈ Γ with γ0 ≺ γ}.

Minimal implication has the following properties (cumulation has been proven in [Mak89], the validity of the deduction theorem in [Sho87]):

Theorem 3.1.2: Let ≺ be any partial order on SΣ. Then min≺ is consistency preserving (CON) and cumulative (CUM), satisfies the deduction rule (DED), and has the property of compatible extension (CEX) as well as the expansion property (EXP).

The following theorem characterizes the completions which can be represented as a form of minimal implication. It is a reformulation of a theorem from social choice theory [Sch76], and this should demonstrate that the relation to social choice theory is no “red herring”:

Theorem 3.1.3: Let sel be a modeltheoretic completion which is consistency preserving (CON), cumulative (CUM), satisfies the deduction rule (DED), and has the expansion property (EXP). Then there is a partial order ≺ on SΣ such that sel = min≺. (The converse of this theorem is also true, see above.)
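A small Python sketch of Definition 3.1.1 (illustrative; the encoding of interpretations as sets of true atoms is an assumption), using the predicate-circumscription preference on the disjunctive state of Example 2.3.6:

```python
from itertools import combinations

# The two ground atoms of Example 2.3.6; an interpretation is its set of true atoms.
ATOMS = ("faulty(fuses)", "faulty(power_supply)")
INTERPS = [frozenset(s) for r in range(len(ATOMS) + 1)
           for s in combinations(ATOMS, r)]

def prefer(g1, g2):
    """Predicate circumscription: g1 is preferred iff its extensions are contained
    in those of g2 and strictly smaller for at least one predicate."""
    return g1 <= g2 and g1 != g2

def min_models(gamma):
    """Definition 3.1.1: the models with no strictly preferred model in gamma."""
    return [g for g in gamma if not any(prefer(g2, g) for g2 in gamma)]

# Gamma = Mod({faulty(fuses) v faulty(power_supply)})
gamma = [m for m in INTERPS if m & set(ATOMS)]
for m in min_models(gamma):
    print(sorted(m))
# Only the two one-atom models are selected; the model making both atoms true is
# discarded, i.e. minimization enforces the exclusive reading that Example 2.3.6
# wanted to avoid (see also Section 3.2).
```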

As an aside, let us remark that the closure properties of a class of database completions might also be of interest. The purpose of database completions is to formalize people’s intuitions about the correct answers. But what if we have two persons with different intuitions? If we want to be careful, then we might take the union of the selected models, thereby answering only those queries with “yes” which both persons would have agreed with. Interestingly, the completion defined by sel(Γ) := min≺1(Γ) ∪ min≺2(Γ) is not always a minimal implication, so more completions can be described in this way.

3.2 Closed World Assumptions

“Closed World Assumptions” are another form of database completions, descending from Reiter’s original CWA [Rei78]. They can be most naturally described as syntactic completions, extending the database state by additional formulae. In the most general case, the formulae which can be used to extend the database state are given as a parameter ∆. A typical value for ∆ would be the set of negated ground atomic formulae, resulting in Minker’s GCWA [Min82] (which is identical to the original CWA if this is consistent). We treat the elements δ ∈ ∆ as defaults: they should be assumed in every given database state if there is no evidence to the contrary. A slight complication arises because there might only be evidence that some defaults cannot hold at the same time. In this case, we will assume none of them in order to prevent the inconsistency:

Definition 3.2.1 (Maximal Extension): Let ∆ be any set of closed formulae. A subset E ⊆ ∆ is called a maximal extension of a database state Φ iff
• Φ ∪ E is consistent, and
• Φ ∪ E ∪ {δ} is inconsistent for every δ ∈ ∆ − E.

Definition 3.2.2 (Closed World Assumption): Let ∆ be any set of closed formulae. Then the following syntactic completion cwa∆ is given by ∆:

    cwa∆(Φ) := Φ ∪ {δ ∈ ∆ | δ ∈ E for each maximal extension E of Φ}.

This generalization of the CWA has been investigated in [BL89]. Alternatively, the set of assumed defaults can also be defined in the following way, which is very similar to Minker’s definition of the GCWA:

Theorem 3.2.3: Let ∆ be any set of closed formulae. Then cwa∆ can equivalently be defined by:

    cwa∆(Φ) = Φ ∪ {δ ∈ ∆ | there are no δ1, ..., δn ∈ ∆ with Φ ⊢ ¬(δ ∧ δ1 ∧ ··· ∧ δn) and Φ ⊬ ¬(δ1 ∧ ··· ∧ δn)}.

Closed world assumptions have the following properties:

Theorem 3.2.4: Let ∆ be any set of closed formulae. Then cwa∆ is consistency preserving (CON), cumulative (CUM), and has the property of compatible extension (CEX).

CWAs do not in general satisfy the deduction rule. Minker’s GCWA [Min82] is one example of a practical CWA which violates this condition (and also the expansion property). As shown in example 2.3.6, it is necessary for some applications to violate this rule, and the GCWA will behave exactly as required in this example. But since minimal implication satisfies the deduction rule for every ≺, this example cannot be handled by any form of minimal implication. Closed world assumptions are more general than minimal implications, but if the parameter ∆ is restricted to sets closed under disjunction, then we get exactly the minimal implications:

Theorem 3.2.5: A completion can be represented by minimal models iff it can be represented by closed world assumptions where the parameter ∆ is closed under disjunction.

In fact, the only difference between the expressiveness of the two approaches is that the closed world assumptions may violate the deduction property:

Theorem 3.2.6: A completion can be represented by minimal models iff it can be represented as a closed world assumption and has the deduction property.

As an aside, let us note that there are also quite different ways to characterize a completion: If we only know the new information which should not be generated by some completion, then we might be interested in the maximal completion with this property. It is quite astonishing that such a maximal completion exists:

Theorem 3.2.7: Let ∆ be closed under disjunction and have the following property: For every two interpretations γ1, γ2 there is at least one δ ∈ ∆ such that γ1 and γ2 differ in the interpretation of δ. Then cwa∆ is the maximal completion without new information from Ψ := {¬δ1 ∨ ··· ∨ ¬δn | δi ∈ ∆}, i.e. any other completion comp without new information from Ψ is weaker.

As an example, this theorem can be applied to the CWA version of [YH85], which uses the set of negative ground clauses as ∆ (and is identical to predicate circumscription in our restricted setting). Another theorem of this sort has been proven in [BL89].
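The following Python sketch (illustrative; formulae are encoded as predicates on interpretations, and the δ1, ..., δn of Theorem 3.2.3 are read as a possibly empty subset of ∆) computes the assumed defaults of cwa∆ both via maximal extensions and via the condition of Theorem 3.2.3, and applies the GCWA to the two states of Example 2.3.6.

```python
from itertools import combinations

ATOMS = ("faulty(fuses)", "faulty(power_supply)")
INTERPS = [frozenset(s) for r in range(len(ATOMS) + 1)
           for s in combinations(ATOMS, r)]

def entails(state, formula):
    return all(formula(m) for m in INTERPS if all(f(m) for f in state))

def neg(f):
    return lambda m: not f(m)

def conj(fs):
    return lambda m: all(f(m) for f in fs)

# The defaults of the GCWA: the negated ground atoms.
DELTA = {"not " + a: (lambda m, a=a: a not in m) for a in ATOMS}

def assumed_by_extensions(phi, delta):
    """Definitions 3.2.1/3.2.2: the defaults contained in every maximal extension."""
    names = list(delta)
    consistent = lambda e: any(all(f(m) for f in phi) and all(delta[d](m) for d in e)
                               for m in INTERPS)
    exts = [set(e) for r in range(len(names), -1, -1)
            for e in combinations(names, r) if consistent(e)]
    maximal = [e for e in exts if not any(e < f for f in exts)]
    return {d for d in names if all(d in e for e in maximal)}

def assumed_by_gcwa_condition(phi, delta):
    """Theorem 3.2.3: delta is assumed iff no (possibly empty) d1,...,dn block it."""
    names = list(delta)
    blocked = lambda d: any(entails(phi, neg(conj([delta[d]] + [delta[x] for x in ds])))
                            and not entails(phi, neg(conj([delta[x] for x in ds])))
                            for r in range(len(names) + 1)
                            for ds in combinations(names, r))
    return {d for d in names if not blocked(d)}

PHI_DISJ = [lambda m: bool(m & set(ATOMS))]   # faulty(fuses) v faulty(power_supply)
PHI_FACT = [lambda m: "faulty(fuses)" in m]   # faulty(fuses)

for phi, label in ((PHI_DISJ, "disjunction"), (PHI_FACT, "faulty(fuses) only")):
    a1 = assumed_by_extensions(phi, DELTA)
    a2 = assumed_by_gcwa_condition(phi, DELTA)
    assert a1 == a2                           # the two characterizations agree here
    print(label, "-> assumed:", sorted(a1))
# Nothing is assumed for the disjunction, but not faulty(power_supply) is assumed
# once faulty(fuses) is known: exactly the behaviour of Example 2.3.6.
```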

3.3 Simple Defaults with Constraints

The formalism of [Poo88] has two parameters, both of which are sets of (closed) formulae: a set ∆ of defaults or “possible hypotheses”, and a set C of constraints. The intention is that as many defaults as possible should be assumed, but a contradiction with the constraints C has to be avoided. The constraints are intended to block unwanted applications of the defaults, e.g., to exclude the contraposition of the default rules. The definition of “maximal extension” is nearly identical to the version for the closed world assumption:

Definition 3.3.1 (Maximal Extension): Let ∆ and C be any sets of formulae. Then a maximal extension of a database state Φ is a subset E ⊆ ∆ such that
• Φ ∪ E ∪ C is consistent (or E = ∅), but
• Φ ∪ E ∪ C ∪ {δ} is inconsistent for each δ ∈ ∆ − E.

It might seem that the closed world assumptions are a special case with C = ∅, but the treatment of conflicting defaults is different in the two approaches. While multiple extensions are intersected directly in the CWA, they are first closed under logical implication here:

Definition 3.3.2 (Application of Simple Defaults): The completion ⊢S(∆,C) given by ∆ and C is defined by

    Φ ⊢S(∆,C) ψ :⇐⇒ Φ ∪ E ⊢ ψ for each maximal extension E of Φ.

As long as there are no constraints, the CWA can simulate the behaviour of this approach by closing ∆ under disjunction.

Theorem 3.3.3: Let ∆ and C be any two sets of formulae. Then ⊢S(∆,C) is consistency preserving (CON), has the cumulation property (CUM), and the expansion property (EXP).

Theorem 3.3.4: A completion can be represented with minimal models iff it can be represented with simple defaults without constraints (i.e. C = ∅).
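The different treatment of conflicting defaults can be seen on a two-atom example. The Python sketch below is illustrative: the state Φ = {¬a ∨ ¬b}, the defaults ∆ = {a, b} and the encoding of formulae as predicates are assumptions. With C = ∅, the simple-default completion derives a ∨ b because every maximal extension does, while cwa∆ assumes only the defaults common to all maximal extensions; closing ∆ under disjunction restores the agreement, as remarked above.

```python
from itertools import combinations

ATOMS = ("a", "b")
INTERPS = [frozenset(s) for r in range(len(ATOMS) + 1)
           for s in combinations(ATOMS, r)]

def entails(formulas, query):
    return all(query(m) for m in INTERPS if all(f(m) for f in formulas))

def maximal_extensions(phi, delta, constraints=()):
    """Definition 3.3.1 (the escape clause for an inconsistent Phi + C is omitted);
    for constraints = () this coincides with Definition 3.2.1."""
    consistent = lambda e: any(all(f(m) for f in list(phi) + list(e) + list(constraints))
                               for m in INTERPS)
    exts = [e for r in range(len(delta), -1, -1)
            for e in combinations(delta, r) if consistent(e)]
    return [e for e in exts if not any(set(e) < set(f) for f in exts)]

def simple_defaults_entail(phi, delta, constraints, query):
    """Definition 3.3.2: the query follows from Phi plus every maximal extension."""
    return all(entails(list(phi) + list(e), query)
               for e in maximal_extensions(phi, delta, constraints))

def cwa(phi, delta):
    """Definition 3.2.2: add the defaults contained in every maximal extension."""
    exts = maximal_extensions(phi, delta)
    return list(phi) + [d for d in delta if all(d in e for e in exts)]

# Phi: at most one of a and b holds; the defaults try to assume each of them.
phi = [lambda m: not ("a" in m and "b" in m)]
d_a, d_b = (lambda m: "a" in m), (lambda m: "b" in m)
d_ab = lambda m: "a" in m or "b" in m                      # the disjunction a v b

print(simple_defaults_entail(phi, (d_a, d_b), (), d_ab))   # True: every extension
                                                           # entails a v b
print(entails(cwa(phi, (d_a, d_b)), d_ab))                 # False: the extensions are
                                                           # intersected directly
print(entails(cwa(phi, (d_a, d_b, d_ab)), d_ab))           # True after closing Delta
                                                           # under disjunction
```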

3.4 Normal Defaults

Normal defaults are a special case of Reiter’s default logic [Rei80]. Here, the parameter ∆ contains pairs of (closed) formulae, which are again called defaults. Each default δ consists of a prerequisite pre(δ) and a consequent conc(δ). The default rule “birds typically can fly” might be formalized using defaults with pre(δ) := bird(c) and conc(δ) := flies(c) (for each constant c of the right sort). The defaults can be most easily viewed as rules for reducing the set of intended models (see [Luk85]): A default δ is applicable to a set Γ of models iff Γ ⊆ Mod(pre(δ)) and Γ ∩ Mod(conc(δ)) ≠ ∅. Then the result of the application of δ to Γ is Γ ∩ Mod(conc(δ)). Finally, the modeltheoretic completion sel^N_∆ given by ∆ maps Mod(Φ) to the union of the irreducible model sets which can be reached by repeatedly applying the defaults.

Theorem 3.4.1: Let ∆ be any set of normal defaults. Then sel^N_∆ is consistency preserving (CON) and has the property of compatible extension (CEX).

Theorem 3.4.2: A completion can be represented by minimal models iff it can be represented by normal defaults ∆ with the restriction that pre(δ) = true for each δ ∈ ∆.

Let us finally note that normal defaults are not more general than the closed world assumptions: The GCWA (with at least 3 propositions) cannot be simulated with normal defaults.
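A Python sketch of this model-set reduction (illustrative; the Tweety signature and the treatment of applications that do not change Γ are assumptions): it computes sel^N_∆ as the union of the irreducible model sets reachable from Mod(Φ).

```python
from itertools import combinations

ATOMS = ("bird(Tweety)", "flies(Tweety)")
INTERPS = frozenset(frozenset(s) for r in range(len(ATOMS) + 1)
                    for s in combinations(ATOMS, r))

def mod(formula):
    return frozenset(m for m in INTERPS if formula(m))

def sel_normal(gamma, defaults):
    """sel^N_Delta: the union of the irreducible model sets reachable from gamma by
    applying the defaults (pre, conc) as reduction rules."""
    irreducible, seen, stack = set(), set(), [frozenset(gamma)]
    while stack:
        g = stack.pop()
        if g in seen:
            continue
        seen.add(g)
        successors = []
        for pre, conc in defaults:
            # applicable iff g is a subset of Mod(pre) and g meets Mod(conc)
            if g <= mod(pre) and g & mod(conc):
                reduced = g & mod(conc)
                if reduced != g:        # applications that change nothing are ignored
                    successors.append(reduced)
        if successors:
            stack.extend(successors)
        else:
            irreducible.add(g)          # no default reduces g any further
    return frozenset().union(*irreducible) if irreducible else frozenset()

# The default "birds typically can fly", instantiated for the constant Tweety.
birds_fly = (lambda m: "bird(Tweety)" in m, lambda m: "flies(Tweety)" in m)

phi1 = mod(lambda m: "bird(Tweety)" in m)            # Phi = {bird(Tweety)}
print(sorted(map(sorted, sel_normal(phi1, [birds_fly]))))
# -> only models with flies(Tweety) remain: the default has been applied

phi2 = mod(lambda m: "bird(Tweety)" in m and "flies(Tweety)" not in m)
print(sorted(map(sorted, sel_normal(phi2, [birds_fly]))))
# -> unchanged: the default is not applicable, so consistency is preserved (CON)
```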

3.5 Summary

In the following table we list which of the above properties can be guaranteed for each of the parameterized completions studied here. An empty field means that there is at least one value for the parameter such that the specified completion violates the respective property.

Properties of Parameterized Completions

                                      CON   CUM   DED   CEX   EXP
  Minimal Models                       •     •     •     •     •
  CWAs                                 •     •           •
  Simple Defaults with Constraints     •     •                 •
  Normal Defaults                      •                 •

With respect to the representability of completions, our results can be summarized as follows: The completions which can be represented by minimal models can also be represented in each of the other three approaches, whereas no such subset relations hold among the other three. The gain in expressive power of the closed world assumptions (resp. simple defaults with constraints) over the minimal model approach is due only to the violation of the deduction rule.

4 Specification Transformations

Although answer algorithms for some forms of minimal implication have recently been proposed, they will probably not meet the efficiency requirements of large deductive databases. Of course, these algorithms are important for prototyping, and we do not deny the possibility that efficient implementations of these algorithms might be developed. Another way to efficient query answering might be the transformation of a database scheme with a general completion into one which uses only a very restricted (but computationally efficient) completion. Of course, this transformation should not alter the answers to queries.

An example of one step in such a transformation is the “naming of defaults” of [Poo88]. This is equally applicable to our parameterized closed world assumption: Suppose ∆ contains defaults of the form bird(c) → flies(c) (“normal birds can fly”). If we only allow negative ground literals (or their disjunctions) as possible defaults, then we can introduce the new predicate abnormal and expand the database state by the single rule

    ¬abnormal(x) → (bird(x) → flies(x)),

which is equivalent to the usual formulation ¬abnormal(x) ∧ bird(x) → flies(x). Now we only have to assume the negative ground literals of abnormal (while assuming nothing about bird and flies), and any query (which does not contain the new predicate) will be answered in exactly the same way. So variable circumscription will be sufficient for disjunctively closed default sets.

The transformation of circumscription to stratified DATALOG has been studied in [GL89]. At the moment, only special cases can be handled, so further research is needed. But this demonstrates promising ways to make use of powerful completions without sacrificing efficiency.
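The following Python sketch (illustrative; the two-constant signature and the use of cwa∆ as the underlying completion are assumptions) performs the naming-of-defaults step on a small example and checks that the original defaults bird(c) → flies(c) and the transformed state with the defaults ¬abnormal(c) select models with the same abnormal-free reducts, i.e. answer every query without the new predicate identically.

```python
from itertools import combinations

CONSTANTS = ("Tweety", "Sam")
BASE_ATOMS = tuple(f"{p}({c})" for p in ("bird", "flies") for c in CONSTANTS)
AB_ATOMS = tuple(f"abnormal({c})" for c in CONSTANTS)

def interps(atoms):
    return [frozenset(s) for r in range(len(atoms) + 1)
            for s in combinations(atoms, r)]

def cwa_models(phi, delta, atoms):
    """Models of cwa_Delta(Phi) (Definition 3.2.2), with defaults given as formulas."""
    universe = interps(atoms)
    consistent = lambda e: any(all(f(m) for f in list(phi) + list(e)) for m in universe)
    exts = [e for r in range(len(delta), -1, -1)
            for e in combinations(delta, r) if consistent(e)]
    maximal = [e for e in exts if not any(set(e) < set(f) for f in exts)]
    assumed = [d for d in delta if all(d in e for e in maximal)]
    return frozenset(m for m in universe if all(f(m) for f in list(phi) + assumed))

# Original scheme: state {bird(Tweety)} with the defaults bird(c) -> flies(c).
phi = [lambda m: "bird(Tweety)" in m]
delta = tuple((lambda m, c=c: f"bird({c})" not in m or f"flies({c})" in m)
              for c in CONSTANTS)
models_orig = cwa_models(phi, delta, BASE_ATOMS)

# Transformed scheme: the rule not abnormal(c) & bird(c) -> flies(c) becomes part
# of the state, and only the negative abnormal-literals remain as defaults.
rules = tuple((lambda m, c=c: f"abnormal({c})" in m
               or f"bird({c})" not in m or f"flies({c})" in m) for c in CONSTANTS)
delta_ab = tuple((lambda m, a=a: a not in m) for a in AB_ATOMS)
models_trans = cwa_models(list(phi) + list(rules), delta_ab, BASE_ATOMS + AB_ATOMS)

# Queries without the new predicate only see the bird/flies part of a model, so it
# suffices to compare the abnormal-free reducts of the selected models.
reducts = frozenset(m & frozenset(BASE_ATOMS) for m in models_trans)
print(models_orig == reducts)            # True: identical answers to all such queries
print(sorted(map(sorted, models_orig)))  # flies(Tweety) is assumed, flies(Sam) is open
```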

5 Conclusions

Any logical database has to use some sort of database completion. We argued that at least in the specification phase no fixed completion should be used; the “knowledge base administrator” should have the freedom to specify the completion as part of the database scheme. This will result in a higher level description of the underlying default rules. The necessary efficiency of query-answering can (hopefully) later be reached by means of specification transformations.

Many forms of database completions have been proposed in the literature, but a clear understanding of their properties and differences is still needed. This paper has tried to attack the problem by defining a general framework for studying database completions. In our view, this abstract notion of database completion is especially well suited to the very first phase of “knowledge base design”, since one almost invariably starts by considering examples of the form: “Given this database state, these queries should be answered with yes, those with unknown, and those with no.” This corresponds exactly to a partially specified abstract completion. Only later does one start to think about preference relations on the models or something similar.

Our definitions of general database completions revealed an interesting connection to social choice theory. The direct outcome of this has been a characterization of those database completions which can be represented by means of preference relations. By reformulating properties of choice functions, we arrived at some known and some new properties of database completions. Such properties will help to classify and to evaluate completion formalisms, and also to clarify the knowledge base designer’s needs for specific completion features.

We investigated parameterized forms of the closed world assumption and the minimal model approach, as well as two versions of default logic: simple defaults with constraints and normal defaults. Thereby, we clarified the relationship between these important classes of database completions: The completions which can be represented by minimal models can also be represented in each of the other three approaches, whereas no such subset relations hold among the other three. In particular, the GCWA, which allows the specification of real disjunctions, can only be represented in the CWA formalism. We have to admit that the representability of minimal models by the various forms of defaults is in part due to our strong finiteness conditions. But in our view, these simplifications are an adequate means to focus on the essential differences.

Our results can also be seen in the context of nonmonotonic reasoning, since our notion of an “abstract completion” is a nonmonotonic inference relation with certain properties. In comparison to other approaches in this field (e.g., [KLM90]), we emphasized the syntactic and the modeltheoretic points of view, and tried to compare the existing approaches to database completions instead of defining new ones. One specific difference is the notion of “models”, which are typically much more complicated in the literature on nonmonotonic reasoning, and might not even respect the meaning of the logical connectives. Finally, the relation to social choice theory is a unique feature of this paper (to the best of the author’s knowledge).

Much work remains to be done. Of course, we should investigate some more classes of database completions and more properties. We should also try to formalize the “ease of use” of the various completion mechanisms as a specification tool. A better understanding of database completions will hopefully lead to practical advice on which formalism should be used for “knowledge base specification”. The applicability of this theory to large deductive databases mainly rests on the possibility of transforming specifications so that only very restricted (but computationally efficient) completions will be used in the end. At the moment, only special cases can be handled. Finally, the problem of “modular database specification” will also have to be considered. Due to the fact that any implication can also be read the other way round (by contraposition), the defining and applied occurrences of a predicate might not be clearly distinguished. This cannot happen for stratified deductive databases [Bra88], but the problem is much more difficult in the presence of incomplete information. We will also have to define the semantics of two modules with different completions put together.

Acknowledgement

I would like to thank Udo Lipeck for many fruitful discussions and Bernhard Convent for helpful comments on an earlier version of this paper.

References

[BH86] N. Bidoit, R. Hull: Positivism vs. minimalism in deductive databases. In Proc. of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS’86), 123–132, 1986.

[BL89] S. Brass, U. W. Lipeck: Specifying closed world assumptions for logic databases. In Second Symposium on Mathematical Fundamentals of Database Systems (MFDBS’89), 68–84, Lecture Notes in Computer Science 364, Springer-Verlag, Berlin, 1989.

[Bra88] S. Brass: Vervollständigungen für Logikdatenbanken (completions of logic databases). Diploma thesis, Informatics, Techn. Univ. Braunschweig, 1988. In German. Revised version available as technical report 315/1989, Informatics, Univ. Dortmund.

[Bra90] S. Brass: Remarks on first order circumscription. Submitted for publication, 1990.

[Dav80] M. Davis: The mathematics of non-monotonic reasoning. Artificial Intelligence 13 (1980), 73–80.

[Gab85] D. M. Gabbay: Theoretical foundations for non-monotonic reasoning in expert systems. In K. R. Apt (ed.), Logics and Models of Concurrent Systems, 439–457, Springer, Berlin, 1985.

[GL89] M. Gelfond, V. Lifschitz: Compiling circumscriptive theories into logic programs. In Non-Monotonic Reasoning (2nd International Workshop), 74–99, Lecture Notes in Artificial Intelligence 346, Springer-Verlag, Berlin, 1989.

[GPP86] M. Gelfond, H. Przymusinska, T. Przymusinski: The extended closed world assumption and its relationship to parallel circumscription. In Proc. of the Fifth ACM SIGACT-SIGMOD Symposium on Principles of Database Systems (PODS’86), 153–185, 1986.

[KLM90] S. Kraus, D. Lehmann, M. Magidor: Nonmonotonic reasoning, preferential models and cumulative logics. Artificial Intelligence 44 (1990), 167–207.

[Lif88] V. Lifschitz: On the declarative semantics of logic programs with negation. In J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, 177–192, Morgan Kaufmann Publishers, Los Altos (Calif.), 1988.

[Llo87] J. W. Lloyd: Foundations of Logic Programming, second edition. Springer-Verlag, Berlin, 1987.

[Luk85] W. Lukaszewicz: Two results on default logic. In Proc. 9th International Joint Conference on Artificial Intelligence (IJCAI), 459–461, Los Angeles, 1985.

[Mak89] D. Makinson: General theory of cumulative inference. In Non-Monotonic Reasoning (2nd International Workshop), 1–18, Lecture Notes in Artificial Intelligence 346, Springer-Verlag, Berlin, 1989.

[McC80] J. McCarthy: Circumscription — a form of non-monotonic reasoning. Artificial Intelligence 13 (1980), 27–39.

[Min82] J. Minker: On indefinite databases and the closed world assumption. In D. W. Loveland (ed.), 6th Conference on Automated Deduction, 292–308, Lecture Notes in Computer Science 138, Springer-Verlag, Berlin, 1982.

[Mou85] H. Moulin: Choice functions over a finite set: A summary. Social Choice and Welfare 2 (1985), 147–160.

[Poo88] D. Poole: A logical framework for default reasoning. Artificial Intelligence 36 (1988), 27–47.

[Prz88] T. C. Przymusinski: On the declarative semantics of deductive databases and logic programs. In J. Minker (ed.), Foundations of Deductive Databases and Logic Programming, 193–216, Morgan Kaufmann Publishers, Los Altos (Calif.), 1988.

[Rei78] R. Reiter: On closed world data bases. In H. Gallaire, J. Minker (eds.), Logic and Data Bases, 55–76, Plenum, New York, 1978.

[Rei80] R. Reiter: A logic for default reasoning. Artificial Intelligence 13 (1980), 81–132.

[RT88] K. A. Ross, R. W. Topor: Inferring negative information from disjunctive databases. Journal of Automated Reasoning 4 (1988), 397–424.

[Sch76] T. Schwartz: Choice functions, rationality conditions and variations on the weak axiom of revealed preferences. J. Econom. Theory 13 (1976), 414–427.

[Sho87] Y. Shoham: Nonmonotonic logics: Meaning and utility. In Proc. 10th International Joint Conference on Artificial Intelligence (IJCAI), 388–393, Milan, 1987.

[YH85] A. Yahya, L. J. Henschen: Deduction in non-Horn databases. Journal of Automated Reasoning 1 (1985), 141–160.