University of Hannover, Institut für Informatik, Research Report 93Br02 (submitted for publication)

Efficient Query Evaluation in Disjunctive Deductive Databases

Stefan Brass
Institut für Informatik, Universität Hannover
Lange Laube 22, D-30159 Hannover, Fed. Rep. Germany
E-mail: [email protected]
Fax: (+49 511) 762 4961

Abstract It is known that bottom-up query evaluation can be extended to work with disjunctive facts, but there seems to be the common assumption that it is much too inefficient for practical applications. In this paper, we improve the extended bottom-up evaluation by making the resolvable literal in a disjunctive fact unique. In many cases, this reduces an exponential behaviour to a polynomial one. We introduce the notion of “disjunction types” formalizing which predicates can appear together in a disjunction. This information is needed to generalize implementation techniques based on the predicate dependency graph, e.g. to determine a sequence for the evaluation of the rules. These two ideas are utilized in a translation of disjunctive rules into Horn clauses with some list-valued arguments. This shows that at least the addition of a few disjunctive rules to an otherwise Horn database does not destroy the possibility of efficient query evaluation.

1 Introduction

The theory of deductive databases [Ull88, Ull89, HPRV89, CGT90] is by now well-matured and has led to quite a few prototype systems (e.g. [NT89, PDR91, RSS92]). Commercial systems will probably become available in the next few years. But all of these systems are restricted to Horn clauses with (more or less) stratified negation. The importance of representing disjunctive information in deductive databases is generally accepted [BH86, RT88, BL89, LMR92]. Applications include, e.g., biological inheritance, legal rules, non-unique explanations for observed symptoms in fault diagnosis, and ambiguity in natural-language understanding. Furthermore, if one adds overridable rules to deductive databases [KLW90, LV90, BL91, BL92, BL93], conflicting multiple inheritance is another source of disjunctive information.


There are a few prototypes for extending logic programming in this direction, but the bottom-up approach is generally considered much too inefficient to be practically usable. The goal of this paper is to demonstrate that this is not true in general: If disjunctions are not used too heavily, query evaluation can be nearly as efficient as in the Horn case. Of course, there is some work to do in order to achieve this efficiency.

One reason for the exponential behaviour of the generalized bottom-up evaluation is that every literal in a disjunctive fact can participate in the resolution. For instance, if we have the disjunctive fact p(a1) ∨ p(a2) ∨ p(a3) and the rule q(X) ← p(X), any subset of the three p-literals can be translated to the corresponding q-literals. By improving an optimization known for positive hyperresolution, we make the resolvable literal unique. In many cases, this dramatically reduces the number of derivable disjunctive facts, i.e. the size of the intermediate results in the bottom-up evaluation.

Another problem with the extended bottom-up evaluation is that many implementation techniques developed for Horn clauses do not directly generalize. In the above example, the rule q(X) ← p(X) surely is non-recursive (if there are no other rules). Nevertheless, it does not suffice to apply it once; it has to be applied three times. In order to solve this problem, we analyse how often which predicates can appear together in a disjunctive fact (we call this a “disjunction type”). Of course, in general disjunctions of unbounded length can be constructed. So we also introduce disjunction types with less information.

Based on these two ideas, we present a translation of disjunctive rules into efficiently evaluable Horn clauses with list-valued arguments. The lists are only needed for unbounded disjunctions.
So by adding a few disjunctive rules to a set of Horn clauses and then applying the translation, the efficiency of query evaluation is usually not very much reduced. Of course, this translation can be directly applied to use existing deductive database systems for reasoning with incomplete information. But it is also useful for transforming implementation techniques known for Horn clauses in the reverse direction. Finally, let us remark that although in this paper we do not consider a variant of negation-as-failure or, more generally, defaults, our work can be easily integrated with the approach of [BL92, BL93]. The two issues of computing logical consequences and of applying defaults can in fact be separated, and the improvement reported here directly yields a better algorithm for query answering with disjunctions and defaults. Our paper is structured as follows: First (section 2) we review the syntax and declarative semantics of disjunctive deductive databases. In section 3 we define the improved version of the corresponding bottom-up evaluation and prove its completeness. In section 4 we analyse which types of disjunctions can actually appear in the derived disjunctive facts. Section 5 contains the translation into Horn clauses. Finally, in section 6 we give a short summary and outline directions for future work.

2 Disjunctive Deductive Databases

In this section, we define the syntax and declarative semantics of disjunctive deductive databases. This is the yardstick to measure the correctness and completeness of the proposed query evaluation algorithm.

First, the application domain determines a (possibly multi-sorted) signature $\Sigma$, i.e. the names of constants and predicates to be used in the formulas. We allow infinitely many constants (e.g., all strings), but no function symbols (as usual in DATALOG). Now a deductive database contains two kinds of $\Sigma$-formulas:

Definition 2.1 (Disjunctive Fact): A disjunctive $\Sigma$-fact is a set $\{A_1, \dots, A_n\}$ of ground $\Sigma$-atoms (atomic formulas). It is usually written as $A_1 \lor \dots \lor A_n$ (in any order, without duplicates).

Definition 2.2 (Disjunctive Rule): A disjunctive $\Sigma$-rule is a formula of the form $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$, where the $A_i$ and $B_i$ are atomic $\Sigma$-formulas and $n \ge 0$, $m \ge 1$. $A_1 \lor \dots \lor A_n$ is the head of this rule, and $B_1 \land \dots \land B_m$ is its body. A rule is safe iff every variable appearing in the head also appears in the body.

Definition 2.3 (DDDB): A disjunctive deductive database (DDDB) consists of
• $\Sigma$, a signature,
• $\mathcal{F}$, a finite set of disjunctive $\Sigma$-facts, and
• $R$, a finite set of safe disjunctive $\Sigma$-rules.

Example 2.4: Disjunctive information arises for instance when we try to find the faulty part in a computer. Here the specific part is only known at the very end of this diagnosis. Suppose we store the observed symptoms Y about a computer X in a relation symptom(X, Y), and the things that we checked in the relation ok. Then we might have a rule like

    faulty(X, fan) ∨ faulty(X, power_supply) ← symptom(X, fan_not_running) ∧ ok(X, switched_on).

Of course, there are also rules stating that something is not faulty:

    ← ok(X, power_on_light_burning) ∧ faulty(X, power_supply).

Since the empty disjunction is treated as false, this is logically equivalent to the formula

    ¬faulty(X, power_supply) ← ok(X, power_on_light_burning).

If the two rules are applicable for a computer pc1, we can conclude faulty(pc1, fan). □

The purpose of a deductive database is of course to answer queries. One can allow more or less general formulas as queries, but atomic formulas are sufficient, since the user can add rules like $answer(X_1, \dots, X_n) \leftarrow A_1 \land \dots \land A_m$ to the database.

Definition 2.5 (Query): A query is an atomic formula $A$.

Definition 2.6 (Correct Answer): Given a DDDB $\langle \Sigma, \mathcal{F}, R \rangle$, a correct answer to a query $A$ is a set of substitutions $\{\theta_1, \dots, \theta_k\}$ such that $\mathcal{F} \cup R \vdash A\theta_1 \lor \dots \lor A\theta_k$ (we use $\vdash$ to denote logical consequence). An answer is minimal if no proper subset is also a correct answer. An answer is definite if it consists of only one element.

In fact, there are different proposals for the notion of an answer in the context of disjunctive information (this one is taken from [Rei78]), but all need the computation of implied disjunctive facts, which is the subject of the next section.

3 Extended Bottom-Up Query Evaluation

Horn clauses are usually evaluated by inserting matching facts for the body literals and deriving the fact corresponding to the head literal, e.g.:

    p(a)
     ↑
    p(X) ← q1(X) ∧ q2(X, Y).
            ↑         ↑
          q1(a)    q2(a, b)



Now, if we work with disjunctive facts instead of simple facts, the idea is to split the disjunction into the “active literal” and the “context”. The active literal participates in the resolution as before, i.e. it is matched with a body literal. The context is directly passed into the resulting disjunction:

    p(a) ∨ p(b) ∨ s(c)
     ↑      ↑
    p(X) ∨ p(Y) ← q1(X) ∧ q2(X, Y).
                   ↑         ↑
                 q1(a)    q2(a, b) ∨ s(c)

Here, p(a) ∨ p(b) ∨ s(c) is obviously a logical consequence of the rule and the two disjunctive facts: We do not know whether the active literal q2(a, b) or the context s(c) is true. But if the context is true, the resulting disjunction is trivially satisfied. So assume that s(c) is false. Then the active literal q2(a, b) must be true, the rule is applicable, and we can derive p(a) ∨ p(b).

Up to now, the extended bottom-up evaluation usually requires that a disjunctive fact with n literals is split in the n possible ways into the active literal and the context. In fact, the literature leaves this point a bit unclear. But if one took the definition of [MR90] literally, p would not be derivable from p ← q and p ∨ q (contradicting their theorem 2) because only the first literal of p ∨ q can participate in the resolution. So they surely intended to allow the resolution with any literal in a disjunctive fact. But this obviously is a source of major inefficiencies if one wants to use it as a method for query evaluation ([MR90] use it only as a declarative semantics):

Example 3.1: Let $\mathcal{F} := \{ s(a_i, a_{i+1}) \mid 0 \le i \le n \}$ and let $R$ consist of the following rules:

    p(X) ← s(a_0, X).                 (1)
    q(X) ∨ p(Y) ← p(X) ∧ s(X, Y).     (2)
    ← p(a_{n+1}).                     (3)
    r(X) ← q(X).                      (4)

Then rule (1) allows us to conclude $p(a_1)$, rule (2) produces disjunctions of the form $q(a_1) \lor q(a_2) \lor \dots \lor q(a_i) \lor p(a_{i+1})$, and with rule (3) we derive $q(a_1) \lor q(a_2) \lor \dots \lor q(a_n)$. But now rule (4) can be used to replace any subset of the q-literals by the corresponding r-literals, which is clearly exponential behaviour. □

In fact, the extended bottom-up evaluation is nothing but a special case of positive hyperresolution [CL73]. And it is known that positive hyperresolution remains complete if one restricts the active literal to have the minimal predicate with respect to some order (e.g., the lexicographical one) of the predicates [CL73, BL92]. This is a big step in the right direction, but unfortunately disjunctions often consist of many literals with the same predicate (e.g. $q(a_1) \lor \dots \lor q(a_n)$), so the active literal is still far from being uniquely determined. But in contrast to the theorem-proving community, we consider only safe rules. This allows us to extend the order from the predicates to the ground literals, so that the minimality requirement for the active literal uniquely determines it.

Example 3.2: In example 3.1, rule (4) generates only $n$ instead of $2^n - 1$ disjunctions out of $q(a_1) \lor \dots \lor q(a_n)$, namely the disjunctions of the form $r(a_1) \lor \dots \lor r(a_i) \lor q(a_{i+1}) \lor \dots \lor q(a_n)$ (since we can only apply (4) to the lexicographically minimal literal). □
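The gap between the two strategies in examples 3.1 and 3.2 can be checked on a small instance. The following Python sketch is not part of the original report. It hard-codes rule (4), represents a ground atom as a (predicate, index) pair, and assumes that ≺ is the lexicographic order on these pairs, so that every q-literal precedes every r-literal. The names close_under_rule4 and restricted are hypothetical.

```python
def close_under_rule4(start, restricted):
    """Apply r(X) <- q(X) to the disjunction `start` until nothing new appears.

    restricted=True : only the minimal literal of a disjunction may be active.
    restricted=False: every literal of a disjunction may be active.
    """
    derived = {start}
    todo = [start]
    while todo:
        fact = todo.pop()
        actives = [min(fact)] if restricted else list(fact)
        for pred, arg in actives:
            if pred != "q":          # rule (4) only matches q-literals
                continue
            # Replace the active literal by the head, keep the context.
            new = frozenset((fact - {(pred, arg)}) | {("r", arg)})
            if new not in derived:
                derived.add(new)
                todo.append(new)
    return derived


n = 4
start = frozenset(("q", i) for i in range(1, n + 1))   # q(a1) v ... v q(an)

new_unrestricted = len(close_under_rule4(start, restricted=False)) - 1
new_restricted = len(close_under_rule4(start, restricted=True)) - 1
print(new_unrestricted, new_restricted)
```

For n = 4 the unrestricted closure adds 15 = 2^4 − 1 disjunctions, while the restricted one adds only 4, one for each prefix of q-literals that has been turned into r-literals.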

To formally define the extended bottom-up evaluation, we only have to modify the usual “direct consequence” operator $T_R$:

Definition 3.3 (Direct Consequences): Let $R^*$ be the set of ground instances of rules in $R$, $\hat{\mathcal{F}}$ be any set of disjunctive facts, and $\prec$ be a linear order on the ground atoms. Then the direct consequences of $\hat{\mathcal{F}}$ via $R$ are:

\[
T_R(\hat{\mathcal{F}}) := \hat{\mathcal{F}} \cup \Bigl\{ F \Bigm|
\begin{array}[t]{l}
\text{there is a rule instance } A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m \text{ in } R^* \\
\text{and there are disjunctive facts } F_1, \dots, F_m \in \hat{\mathcal{F}} \\
\text{such that } B_i \text{ is the } \prec\text{-minimal element of } F_i \ (i = 1, \dots, m) \\
\text{and } F = \{A_1, \dots, A_n\} \cup \bigcup_{i=1}^{m} (F_i - \{B_i\}) \Bigr\}.
\end{array}
\]

Then, as usual, this operator is iteratively applied to the given facts $\mathcal{F}$ until nothing changes. Such a fixpoint is reached after a finite number of iterations because the derived disjunctive facts can contain only constants explicitly appearing in $\mathcal{F} \cup R$. Of course, we are only interested in minimal derivable disjunctions:

Definition 3.4 (Derivable Disjunctive Facts): Let $\mathcal{F}_0 := \mathcal{F}$, $\mathcal{F}_{i+1} := T_R(\mathcal{F}_i)$, and $n \in \mathbb{N}$ with $\mathcal{F}_n = \mathcal{F}_{n-1}$. Then the set of derivable disjunctive facts is

\[
D_R(\mathcal{F}) = \{ F \in \mathcal{F}_n \mid \text{there is no } F' \subset F \text{ with } F' \in \mathcal{F}_n \}.
\]

Now we of course want to prove correctness and completeness. But there is a problem: In example 3.1, the $2^n - 1$ disjunctions derived from $q(a_1) \lor q(a_2) \lor \dots \lor q(a_n)$ are logical consequences of the given formulas. Since we do not want to derive all of these disjunctions, our procedure cannot be complete in this sense. Fortunately, we are usually only interested in disjunctions consisting entirely of literals with the special predicate answer, which is defined by the user and does not appear in rule bodies. If we choose the order $\prec$ on the ground atoms in such a way that the answer atoms are maximal, then the conditions of the following theorem are satisfied for the set $\mathcal{A}$ of atoms of the form $answer(c_1, \dots, c_n)$:

Theorem 3.5: Let $\mathcal{A}$ be a set of ground atoms satisfying the following conditions:
• $\mathcal{A}$ is upwards closed wrt $\prec$, i.e. if $A \in \mathcal{A}$ and $A \prec A'$, then $A' \in \mathcal{A}$.
• For every rule instance $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$ in $R^*$: $\{B_1, \dots, B_m\} \cap \mathcal{A} = \emptyset$.
Let $\mathcal{A}^\vee$ be the disjunctive facts consisting only of atoms from $\mathcal{A}$. Then:

\[
D_R(\mathcal{F}) \cap \mathcal{A}^\vee = \{ F \in \mathcal{A}^\vee \mid \mathcal{F} \cup R \vdash F \text{ and } \mathcal{F} \cup R \nvdash F' \text{ for every } F' \subset F \}.
\]

Proof: The correctness is easy: Consider a derivation step with $T_R$: A model of the rule $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$ and the disjunctive facts $F_1, \dots, F_m$ must satisfy one of the contexts $F_i - \{B_i\}$ or all the $B_i$, and therefore one of the $A_j$. But this means that the resulting disjunction is satisfied.

Now we have to show that if $F$ is a minimal disjunction with $R \cup \mathcal{F} \vdash F$, then it is derivable by iterated application of $T_R$. We first prove this for sets of ground rules only, and later lift the argument to the general case. The proof is by induction on the number $n$ of ground atoms appearing in $R \cup \mathcal{F} \cup \{F\}$.


The case $n = 0$ is trivial since $F$ can only be the empty disjunction $\Box$, and then $\mathcal{F}$ must include $\Box$.

In the inductive step, let us first assume that $F$ is not $\Box$. Then let $A$ be any atom in $F$, and let $F' := F - \{A\}$. Let $\mathcal{F}'$ and $R'$ be the result of evaluating $A$ as false, i.e. remove $A$ from every disjunctive fact or rule head, and delete rules containing $A$ in the body. Then it is easy to see that
• $\mathcal{F}' \cup R' \cup \{F'\}$ contains (at least) one fewer ground atom than $R \cup \mathcal{F} \cup \{F\}$,
• $\mathcal{F}' \cup R' \vdash F'$ (a model of $\mathcal{F}' \cup R'$ not satisfying $F'$ can be extended to a model of $\mathcal{F} \cup R$ not satisfying $F$),
• $F'$ is still $\subseteq$-minimal with this property (if $F'' \subset F'$ followed from $\mathcal{F}' \cup R'$, then $F'' \lor A$ would follow from $\mathcal{F} \cup R$).
So by the inductive hypothesis, $F' \in D_{R'}(\mathcal{F}')$. Now the same derivation steps can be performed from $\mathcal{F}$ by $R$, with the only difference that the derived facts can contain $A$ in addition. But the conditions on $\mathcal{A}$ ensure that $A$ is always contained in the context and never prevents a resolution step. And since $F$ was $\subseteq$-minimal and the correctness of $T_R$ is already proven, we can conclude that $F$ is derived from $\mathcal{F} \cup R$, and not $F'$.

Now the second case in the inductive step is $F = \Box$. Let $A$ be the $\prec$-maximal ground atom appearing in $\mathcal{F} \cup R$. As before, we construct a reduced set $\mathcal{F}' \cup R'$ by interpreting $A$ as false. Since $\mathcal{F} \cup R$ has no model at all, it certainly has no model in which $A$ is false, so $\mathcal{F}' \cup R' \vdash \Box$, and the inductive hypothesis yields a derivation of $\Box$ from $\mathcal{F}' \cup R'$. Now, because $A$ was the $\prec$-maximal atom, it does not prevent any of the previous resolution steps if we put it back into the rule heads and disjunctive facts. So we are able to derive $\Box$ or $A$ from $\mathcal{F} \cup R$. In the first case, we are done. In the second case, we construct $\mathcal{F}''$ and $R''$ by interpreting $A$ as true (i.e. we remove $A$ from the rule bodies and delete rules and facts containing it in the head). The inductive hypothesis now gives us a derivation of $\Box$ from $\mathcal{F}'' \cup R''$. In order to turn this into a derivation from $\mathcal{F} \cup R$, we may need $A$ in some rule bodies. But we know already that $A$ is derivable. This completes the induction for the ground case.

In the general case, we can conclude by Herbrand's theorem that there is a finite set $R'$ of ground instances of rules in $R$ such that $\mathcal{F} \cup R' \vdash F$ (in fact, the ground instances with the finitely many constants appearing in $\mathcal{F} \cup R$ would have this property). But because of the safety condition every variable is bound in each derivation step, so the “lifting” of a proof from the ground instances is trivial here. □
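To make definitions 3.3 and 3.4 concrete, here is a minimal Python sketch of the operator on ground rule instances, applied to example 2.4 for the computer pc1. It is an illustration under stated assumptions, not the report's implementation. Atoms are tuples of strings, constants are spelled with underscores, the two answer rules and the function names order_key, t_r and derivable are additions made here, and ≺ is assumed to be lexicographic with answer atoms placed last.

```python
from itertools import product


def order_key(atom):
    # Assumed order: answer atoms are maximal; all other atoms are
    # compared lexicographically as tuples of strings.
    return (atom[0] == "answer", atom)


def t_r(ground_rules, facts):
    """One application of the direct consequence operator (ground rules only)."""
    result = set(facts)
    for head, body in ground_rules:
        # For each body atom B_i: the facts whose minimal element is B_i.
        candidates = [
            [f for f in facts if f and min(f, key=order_key) == b]
            for b in body
        ]
        for chosen in product(*candidates):
            derived = set(head)
            for fact, active in zip(chosen, body):
                derived |= fact - {active}      # pass the context through
            result.add(frozenset(derived))
    return result


def derivable(ground_rules, facts):
    """Iterate t_r to a fixpoint and keep the subset-minimal disjunctions."""
    current = {frozenset(f) for f in facts}
    while True:
        following = t_r(ground_rules, current)
        if following == current:
            break
        current = following
    return {f for f in current if not any(g < f for g in current)}


FAN = ("faulty", "pc1", "fan")
PSU = ("faulty", "pc1", "power_supply")
rules = [  # (head atoms, body atoms), instantiated for pc1
    ([FAN, PSU], [("symptom", "pc1", "fan_not_running"), ("ok", "pc1", "switched_on")]),
    ([], [("ok", "pc1", "power_on_light_burning"), PSU]),
    ([("answer", "pc1", "fan")], [FAN]),            # assumed answer rule
    ([("answer", "pc1", "power_supply")], [PSU]),   # assumed answer rule
]
facts = [
    {("symptom", "pc1", "fan_not_running")},
    {("ok", "pc1", "switched_on")},
    {("ok", "pc1", "power_on_light_burning")},
]
result = derivable(rules, facts)
print(frozenset({("answer", "pc1", "fan")}) in result)
```

With this particular order the definite answer answer(pc1, fan) is derived, whereas the logically implied fact faulty(pc1, fan) is not: faulty(pc1, power_supply) is never the minimal literal of faulty(pc1, fan) ∨ faulty(pc1, power_supply), so the integrity rule cannot be applied to it directly. This is the situation that the restriction to the set $\mathcal{A}$ in theorem 3.5 accounts for.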

4 Types of Derived Disjunctions

In this section, we introduce the notion of disjunction types determining which predicates can appear together in the disjunctive facts generated during query evaluation.

Disjunction Types

Basically, a disjunction type is a multiset of predicates. So in order to determine the type of a disjunctive fact, we simply count how often each predicate occurs. But sometimes unbounded disjunctions are derivable. For instance, in example 3.1, the length $n$ of the derivable disjunctions depends on the given facts about s, but our analysis should be independent of them. So we introduce the special count $+$ representing “any number ≥ 1”. If too many disjunction types would be generated, it may be useful to intentionally discard information about the disjunctions, so we use $*$ to represent “any number ≥ 0”.

Definition 4.1 (Disjunction Type): A disjunction type is a function $\tau : P \to \mathbb{N}_0 \cup \{+, *\}$, where $P$ is the set of predicates.

We denote a disjunction type $\tau$ by a string consisting of the predicates $p$ with $\tau(p) \neq 0$, showing the number $\tau(p)$ in the exponent (if $\neq 1$). For instance, the disjunctive fact p(a) ∨ p(b) ∨ r(c) has the disjunction type $p^2 r$:

Definition 4.2 (Disjunction Type of a Disjunctive Fact): Let $A_1 \lor \dots \lor A_n$ be a disjunctive fact, and let $p_i$ be the predicate of $A_i$. Then the disjunction type $\tau$ of $A_1 \lor \dots \lor A_n$ is defined by $\tau(p) := |\{A_i \mid p_i = p\}|$.

Besides this disjunction type giving the exact counts there are also more general disjunction types:

Definition 4.3 (Subsumption between Disjunction Types): A disjunction type $\tau$ subsumes a disjunction type $\tau'$ iff for every $p \in P$ one of the following conditions holds:
• $\tau(p) = \tau'(p)$, or
• $\tau(p) = +$ and $\tau'(p) \in \mathbb{N} - \{0\}$, or
• $\tau(p) = *$.

For instance, $p^2 r$ is subsumed by $p^+ r$, $p^2 r^+$, $p^* q^* r^*$ and other disjunction types.

Before we can start our analysis, we need two further assumptions: First, since the analysis should look only at the predicates, but not at their arguments, we must assume that the order $\prec$ uses the predicate as the most important sorting condition. So there is a linear order $\prec'$ on the predicates such that if $p \prec' q$, then $A \prec B$ for every p-atom $A$ and q-atom $B$.

Second, if we knew nothing about the disjunctive facts contained in $\mathcal{F}$, our analysis could only report that all disjunction types are possible. We must assume that the possible disjunction types $T_{\mathcal{F}}$ in the fact base $\mathcal{F}$ were defined during database design. This should be done anyway, since if the users were allowed to enter arbitrary disjunctions into the database, the rules might interact in unexpected ways.

Definition 4.4 (Allowed Fact Base): $\mathcal{F}$ is allowed with respect to $T_{\mathcal{F}}$ iff for every $F \in \mathcal{F}$ there is $\tau \in T_{\mathcal{F}}$ such that $\tau$ subsumes the disjunction type of $F$.
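Definitions 4.2 to 4.4 translate almost literally into code. The Python sketch below is only an illustration: it assumes that a disjunction type is stored as a dictionary from predicate names to counts (an integer, "+" or "*", with missing predicates counting as 0), that atoms are tuples whose first component is the predicate, and the function names are hypothetical.

```python
from collections import Counter


def disjunction_type(fact):
    """Definition 4.2: how often each predicate occurs in a disjunctive fact."""
    return dict(Counter(atom[0] for atom in fact))


def count_subsumes(general, specific):
    """The three cases of definition 4.3 for a single predicate."""
    if general == specific or general == "*":
        return True
    return general == "+" and isinstance(specific, int) and specific >= 1


def subsumes(tau, tau2, predicates):
    """Definition 4.3: tau subsumes tau2 on every predicate."""
    return all(count_subsumes(tau.get(p, 0), tau2.get(p, 0)) for p in predicates)


def allowed(fact_base, declared_types, predicates):
    """Definition 4.4: every stored fact is covered by a declared type."""
    return all(
        any(subsumes(t, disjunction_type(f), predicates) for t in declared_types)
        for f in fact_base
    )


P = ["p", "q", "r"]
fact = {("p", "a"), ("p", "b"), ("r", "c")}        # p(a) v p(b) v r(c)
tau = disjunction_type(fact)                       # the type p^2 r
print(tau == {"p": 2, "r": 1})
print(subsumes({"p": "+", "r": 1}, tau, P))        # p^+ r
print(subsumes({"p": "*", "q": "*", "r": "*"}, tau, P))
print(subsumes({"p": 1, "r": 1}, tau, P))          # p r does not subsume p^2 r
```

Passing the predicate set explicitly keeps the dictionaries sparse: a predicate that is absent from both types trivially satisfies the first condition of definition 4.3.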


Disjunction Types Resulting from a Rule Application

Now the goal is to compute the disjunction types which can be generated from the given disjunction types $T_{\mathcal{F}}$ by applying the rules in $R$. First note that several disjunction types may result from applying the same rule to facts of the same disjunction types. For instance, if we apply the rule r(X) ∨ r(Y) ← p(X) ∧ q(X, Y) to disjunctions of types $pr$ and $qr$, the resulting disjunction may contain from 1 to 4 r-literals (depending on whether the arguments of the generated literals are identical). But such extreme cases should rarely appear in practical applications.

Definition 4.5 (Head Type): $\tau$ is a head type of a rule $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$ iff there is a ground substitution $\theta$ such that $\tau$ is the disjunction type of $(A_1 \lor \dots \lor A_n)\theta$.

Of course, it is not necessary to check all possible substitutions $\theta$ (these would be infinitely many). It suffices to substitute the variables by constants occurring in the head or one distinct new constant per variable.

Definition 4.6 (Matching Disjunction Type): A disjunction type $\tau$ matches an atom $q(t_1, \dots, t_n)$ iff
• $\tau(q) \neq 0$, and
• $\tau(p) \in \{0, *\}$ for all $p \in P$ with $p \prec' q$.

Definition 4.7 (Context Disjunction Type): Let $\tau$ match $q(t_1, \dots, t_n)$. Then $\tau'$ is a context type of this match iff
• $\tau'(p) = 0$ for every $p \prec' q$,
• \[
\tau'(q) = \begin{cases}
\tau(q) - 1 & \text{if } \tau(q) \in \mathbb{N}, \\
0 \text{ or } + & \text{if } \tau(q) = +, \\
* & \text{if } \tau(q) = *,
\end{cases}
\]
• $\tau'(r) = \tau(r)$ for every $r$ with $q \prec' r$.

So there are at most two context types, and only if the matched literal has count $+$: Then the result of subtracting 1 can be $+$ or 0.

Definition 4.8 (Union Disjunction Type): A disjunction type $\tau$ is a union type of $\tau_1, \dots, \tau_n$ iff for every predicate $p$:
• If all $\tau_i(p) \in \mathbb{N}_0$, then $\max_i \tau_i(p) \le \tau(p) \le \sum_i \tau_i(p)$.
• If there is an $i$ with $\tau_i(p) = *$, then $\tau(p) = *$.
• If there is an $i$ with $\tau_i(p) = +$, and $\tau_j(p) \neq *$ for all $j$, then $\tau(p) = +$.
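The finite check behind definition 4.5 can be sketched as follows. This Python fragment is an illustration only. It assumes that head atoms are tuples of strings, that variables are written with an initial upper-case letter (as in the rules of this report), and it reads “one distinct new constant per variable” as a pool of k fresh constants for k variables, from which every variable may take any value. The name head_types is hypothetical.

```python
from collections import Counter
from itertools import product


def head_types(head):
    """Enumerate the head types of a rule head given as a list of atoms."""
    terms = {t for _, *args in head for t in args}
    variables = sorted(t for t in terms if t[:1].isupper())
    constants = sorted(t for t in terms if not t[:1].isupper())
    fresh = [f"_new{i}" for i in range(len(variables))]   # assumed fresh names
    types = []
    for values in product(constants + fresh, repeat=len(variables)):
        theta = dict(zip(variables, values))
        # A ground head is a set, so identical instances collapse.
        ground = {(pred, *[theta.get(t, t) for t in args]) for pred, *args in head}
        tau = dict(Counter(atom[0] for atom in ground))
        if tau not in types:
            types.append(tau)
    return types


print(head_types([("r", "X"), ("r", "Y")]))            # r and r^2 (X = Y or not)
print(head_types([("p", "X", "a"), ("p", "X", "b")]))  # always p^2
```

For r(X) ∨ r(Y) this yields both $r$ and $r^2$, which is the source of the “1 to 4 r-literals” in the example above, whereas p(X, a) ∨ p(X, b) always has the head type $p^2$ because the two atoms differ in a constant.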


Definition 4.9 (Resulting Type of a Rule Application): A disjunction type $\tau$ results from the application of a rule $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$ to disjunction types $\tau_1, \dots, \tau_m$ iff:
• $\tau_i$ matches $B_i$ for $i = 1, \dots, m$.
• Let $\tau_i'$ be corresponding context types, and $\tau_0$ be a head type. Then $\tau$ is a union type of $\tau_0, \tau_1', \dots, \tau_m'$.

The following lemma is now obvious from the definitions:

Lemma 4.10: Let $F$ be the result of applying $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$ to disjunctive facts $F_1, \dots, F_m$. If $\tau_1, \dots, \tau_m$ subsume the disjunction types of $F_1, \dots, F_m$, then there is a disjunction type $\tau$ resulting from the application of this rule to $\tau_1, \dots, \tau_m$ which subsumes the disjunction type of $F$.
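Definitions 4.6 to 4.9 combine into a small procedure. The following Python sketch is one possible reading, not the report's code. It assumes types are dictionaries from predicate names to counts (integer, "+" or "*"; missing means 0), that the predicate order ≺′ is given as a list from smallest to largest, and that the head types are supplied by the caller. All function names are hypothetical.

```python
from itertools import product


def matches(tau, q, order):
    """Definition 4.6: q occurs and every smaller predicate has count 0 or *."""
    smaller = order[:order.index(q)]
    return tau.get(q, 0) != 0 and all(tau.get(p, 0) in (0, "*") for p in smaller)


def context_types(tau, q, order):
    """Definition 4.7: what remains after removing the active q-literal."""
    count = tau[q]
    if count == "+":
        options = [0, "+"]
    elif count == "*":
        options = ["*"]
    else:
        options = [count - 1]
    larger = order[order.index(q) + 1:]
    return [{q: o, **{r: tau.get(r, 0) for r in larger}} for o in options]


def union_counts(counts):
    """Definition 4.8 for a single predicate."""
    if "*" in counts:
        return ["*"]
    if "+" in counts:
        return ["+"]
    return list(range(max(counts), sum(counts) + 1))


def union_types(types, order):
    per_predicate = [union_counts([t.get(p, 0) for t in types]) for p in order]
    return [dict(zip(order, combo)) for combo in product(*per_predicate)]


def resulting_types(head_types, body_preds, body_types, order):
    """Definition 4.9: union types of a head type and one context per body atom."""
    if not all(matches(t, q, order) for t, q in zip(body_types, body_preds)):
        return []
    contexts = [context_types(t, q, order) for t, q in zip(body_types, body_preds)]
    results = []
    for head in head_types:
        for chosen in product(*contexts):
            for tau in union_types([head, *chosen], order):
                if tau not in results:
                    results.append(tau)
    return results


# r(X) v r(Y) <- p(X) ^ q(X, Y) applied to the types pr and qr.
order = ["p", "q", "r"]
types = resulting_types([{"r": 2}, {"r": 1}], ["p", "q"],
                        [{"p": 1, "r": 1}, {"q": 1, "r": 1}], order)
print(sorted(t["r"] for t in types))
```

On the rule r(X) ∨ r(Y) ← p(X) ∧ q(X, Y) with body types $pr$ and $qr$, the sketch returns the types $r$, $r^2$, $r^3$ and $r^4$, in line with the remark that the result may contain from 1 to 4 r-literals.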

Disjunction Type Dependency Graph

A disjunction type dependency graph is based on a finite, unambiguous, generated, and complete set of disjunction types:

Definition 4.11 (Unambiguous Set of Disjunction Types): A set $T$ of disjunction types is unambiguous iff for every disjunction type $\tau$ there are no two different types $\tau_1, \tau_2 \in T$ subsuming $\tau$.

This property ensures that a disjunctive fact can be classified uniquely.

Definition 4.12 (Generated Set of Disjunction Types): A set $T$ of disjunction types is generated iff its elements can be put into a sequence $\tau_1, \tau_2, \dots$ such that $\tau_i$ subsumes an element of $T_{\mathcal{F}}$ or a resulting type of a rule application to disjunction types earlier in the sequence.

Definition 4.13 (Complete Set of Disjunction Types): A set $T$ of disjunction types is complete iff
• For each $\tau \in T_{\mathcal{F}}$ there is $\tau' \in T$ subsuming $\tau$.
• For each result type $\tau$ of a rule application to disjunction types in $T$ there is $\tau' \in T$ subsuming $\tau$.

For example, if $p_1, \dots, p_n$ are all the predicates, then the single type $p_1^* \dots p_n^*$ forms a set with these properties. Another complete and unambiguous set is

\[
\{ p_{i_1}^+ \dots p_{i_k}^+ \mid \{i_1, \dots, i_k\} \subset \{1, \dots, n\} \}.
\]

Of course, usually only a small subset of this is generated.

The naive algorithm to compute disjunction types is to start with the disjunction types in $T_{\mathcal{F}}$ and to iteratively apply the rules. Of course, this does not necessarily terminate (e.g., in example 3.1 larger and larger disjunction types are generated). But for most practical purposes, disjunction types with very large numbers are not useful anyway. So one ensures termination by allowing only a finite subset of all disjunction types. A canonical choice would be to require that $\tau(p) \in \{0, 1, +\}$. But in example 2.4, it would be useful to allow the count 2 for faulty, because two faulty-literals appear in a rule head.

Definition 4.14 (Disjunction Type Dependency Graph): Let $T$ be a finite, unambiguous, generated, and complete set of disjunction types. Then the dependency graph based on $T$ is defined as follows:
• The nodes are the disjunction types in $T$.
• If $\tau \in T$ subsumes a result type of a rule application to $\tau_1, \dots, \tau_m \in T$, then there is an edge from each of the $\tau_i$ to $\tau$.
• There are no other edges.

5 Translation into Horn Clauses

In this section we present the translation of disjunctive rules into efficiently evaluable Horn clauses. The translation is based on partial evaluation of the following meta-interpreter:

    disfact(Stored) :-
        db_disfact(Stored).
    disfact(Derived) :-
        rule_1_1(A1, B1),
        disfact([B1|C1]),
        merge([A1], C1, Derived).
    ...
    disfact(Derived) :-
        rule_2_2(A1, A2, B1, B2),
        disfact([B1|C1]),
        disfact([B2|C2]),
        merge(C1, C2, C1C2),
        merge([A1], C1C2, C1C2A1),
        merge([A2], C1C2A1, Derived).
    ...

We assume that the disjunctive rules with n head literals and m body literals are stored in rule_n_m-facts. For instance, the rule p(X, a) ∨ p(X, b) ← q(X, a) ∧ r(X) would be represented as:

    rule_2_2(p(X, a), p(X, b), q(X, a), r(X)).

We need a rule in the meta-interpreter for every occurring n and m (this problem seems to be inherent in bottom-up meta-interpreters; in [Bry90] a built-in predicate evaluate was invented for this purpose). The given disjunctive facts are stored as ≺-sorted lists in db_disfact. The built-in predicate merge merges two ≺-sorted lists and removes duplicate elements. Note that we must merge the head literals into the context one by one, since we know nothing about their order.

Of course, the next step is to partially evaluate the meta-interpreter with respect to the given rules $R$. In the case of our example rule, this would lead to:

    disfact(Derived) :-
        disfact([q(X, a)|C1]),
        disfact([r(X)|C2]),
        merge(C1, C2, C1C2),
        merge([p(X, a)], C1C2, C1C2A1),
        merge([p(X, b)], C1C2A1, Derived).

The final step is to replace the general disfact-predicate by predicates for the actually appearing disjunction types $T$. The predicate corresponding to a disjunction type $\tau$ has the following arguments:
• For a predicate with $\tau(p) \in \mathbb{N}$, we simply have $\tau(p)$ copies of the arguments of p.
• For a predicate with $\tau(p) = +$, there is an argument which is a non-empty list of p-literals (in ≺-order).
• For predicates with $\tau(p) = *$ there is a single list containing all their literals.
In this way, $p_1^* \dots p_n^*$ corresponds to the above predicate disfact.

Given a disjunctive rule $A_1 \lor \dots \lor A_n \leftarrow B_1 \land \dots \land B_m$, we construct a Horn clause for every combination of disjunction types $\tau_1, \dots, \tau_m$ matching the body literals and every $\tau$ subsuming a resulting type (sometimes we need an additional case distinction). The rule generation is straightforward; we illustrate it with a few examples for p(X, a) ∨ p(X, b) ← q(X, a) ∧ r(X):

• The easiest case is for the disjunction types $q$ and $r$ matching the body literals, and the resulting disjunction type $p^2$:

    p²(X, a, X, b) :- q(X, a), r(X).

  Of course, p²(X, a, X, b) corresponds to disfact([p(X, a), p(X, b)]). We assume that p(X, a) ≺ p(X, b) holds for every value of X (e.g., ≺ is the lexicographical order).

• Now suppose the body disjunction types are $qr$ and $r^2$, and the head disjunction type is $p^+ r^2$ or $p^+ r$:

    p⁺r²([p(X, a), p(X, b)], Y, Z) :- qr(X, a, Y), r²(X, Z), Y < Z.
    p⁺r²([p(X, a), p(X, b)], Z, Y) :- qr(X, a, Y), r²(X, Z), Z < Y.
    p⁺r([p(X, a), p(X, b)], Y) :- qr(X, a, Y), r²(X, Y).

  Here we cannot determine the ≺-sequence of r(Y) and r(Z) at “compile time”, so we do it at “runtime” (we assume that r(Y) ≺ r(Z) ⇐⇒ Y < Z). This test and the case distinction are the result of partially evaluating the call to merge.

• Finally, if the given disjunction types are $qr^+$ and $r^+$, we must call merge:

    p²r⁺(X, a, X, b, C1C2) :- qr⁺(X, a, C1), r⁺([r(X)|C2]), merge(C1, C2, C1C2).

  Fortunately we know that C1C2 is non-empty (because C1 is non-empty); otherwise we would again need to distinguish two cases.

Of course, if too many rules would be generated, one should start with a smaller set $T$ containing more general disjunction types. But note that all these rules result from partial evaluation of the above meta-interpreter, so they do not cause any new work.
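The behaviour expected of merge, and one step of the rule_2_2 clause of the meta-interpreter, can be sketched outside Prolog. The Python fragment below is a sketch under assumptions: atoms are tuples of strings, ≺ is taken to be Python's lexicographic tuple order, and apply_rule_2_2 is a hypothetical helper that works on ground rule instances only (it does not perform unification).

```python
def merge(xs, ys):
    """Merge two sorted, duplicate-free lists into one sorted, duplicate-free list."""
    out, i, j = [], 0, 0
    while i < len(xs) and j < len(ys):
        if xs[i] == ys[j]:              # same literal in both contexts: keep once
            out.append(xs[i])
            i += 1
            j += 1
        elif xs[i] < ys[j]:
            out.append(xs[i])
            i += 1
        else:
            out.append(ys[j])
            j += 1
    return out + xs[i:] + ys[j:]


def apply_rule_2_2(a1, a2, b1, b2, fact1, fact2):
    """Ground counterpart of the rule_2_2 clause; None if the rule does not apply."""
    # The active literal is the first (minimal) element of each sorted list.
    if not fact1 or not fact2 or fact1[0] != b1 or fact2[0] != b2:
        return None
    context = merge(fact1[1:], fact2[1:])
    # The head literals are merged in one by one, as in the meta-interpreter.
    return merge([a2], merge([a1], context))


# Ground instance of p(X, a) v p(X, b) <- q(X, a) ^ r(X) with X = c.
derived = apply_rule_2_2(
    ("p", "c", "a"), ("p", "c", "b"), ("q", "c", "a"), ("r", "c"),
    [("q", "c", "a"), ("s", "d")],      # q(c, a) v s(d)
    [("r", "c"), ("s", "d")],           # r(c) v s(d)
)
print(derived)
```

The shared context literal s(d) appears only once in the result, which is why merge has to remove duplicates and not merely interleave the two lists.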

6 Conclusions

Our long-term goal is to develop an efficient bottom-up query evaluation algorithm for knowledge bases containing disjunctions and prioritized defaults. Defaults are handled in [BL92, BL93], but the algorithms presented there make heavy use of a logical theorem-prover. Therefore, in this paper, we concentrated on the question of how to efficiently derive logically implied disjunctive facts. Any progress made here will directly improve the overall algorithm.

The situation at the moment seems to be that bottom-up evaluation of Horn clauses can be done very efficiently, whereas in the disjunctive case most people can hardly believe that it might work at all. Therefore we tried to utilize the work invested in the Horn case. We proposed a translation of disjunctive rules into Horn clauses (with a few list-valued arguments). This is based on an optimization making the resolvable literal of a disjunctive fact unique and on an analysis of which types of disjunctions can appear.

Of course, for a given set of rules, some orders ≺ restricting the resolvable literal are certainly better than others. So it would be very useful to automatically determine an “optimal” ≺. This is the subject of our future research.

Finally, the extended bottom-up evaluation should be made goal-directed using ideas of [Dem91]. In fact, our research about the disjunction types was motivated by the insight that something like the magic set rewriting technique is impossible for disjunctive rules, because the extended bottom-up evaluation cannot distinguish between disjunctive facts with different contexts. In [Dem91] this problem was solved by defining rather special deduction rules, but with our notion of disjunction types, it should be possible to make better use of known implementation techniques.

Acknowledgement I would like to thank Udo Lipeck for supervising my doctoral thesis (which contains the result of section 3) and for lots of fruitful discussions.


References

[BH86] N. Bidoit, R. Hull: Positivism vs. minimalism in deductive databases. In Proc. of the 5th ACM Symp. on Principles of Database Systems (PODS’86), 123–132, 1986.

[BL89] S. Brass, U. W. Lipeck: Specifying closed world assumptions for logic databases. In J. Demetrovics, B. Thalheim (eds.), 2nd Symp. on Mathematical Fundamentals of Database Syst. (MFDBS’89), 68–84, LNCS 364, Springer-Verlag, 1989.

[BL91] S. Brass, U. W. Lipeck: Semantics of inheritance in logical object specifications. In C. Delobel, M. Kifer, Y. Masunaga (eds.), Deductive and Object-Oriented Databases, 2nd Int. Conf. (DOOD’91), 411–430, LNCS 566, Springer-Verlag, 1991.

[BL92] S. Brass, U. W. Lipeck: Generalized bottom-up query evaluation. In A. Pirotte, C. Delobel, G. Gottlob (eds.), Advances in Database Technology – EDBT’92, 3rd Int. Conf., 88–103, LNCS 580, Springer-Verlag, 1992.

[BL93] S. Brass, U. W. Lipeck: Bottom-up query evaluation with partially ordered defaults. Research Report 93Br01, Institut für Informatik, Universität Hannover, 1993.

[Bry90] F. Bry: Query evaluation in recursive databases: bottom-up and top-down reconciled. Data & Knowledge Engineering 5 (1990), 289–312.

[CGT90] S. Ceri, G. Gottlob, L. Tanca: Logic Programming and Databases. Surveys in Computer Science. Springer-Verlag, Berlin, 1990.

[CL73] C.-L. Chang, R. C.-T. Lee: Symbolic Logic and Mechanical Theorem Proving. Academic Press, New York, 1973.

[Dem91] R. Demolombe: An efficient strategy for non-Horn deductive databases. Theoretical Computer Science 78 (1991), 245–259.

[HPRV89] G. Hulin, A. Pirotte, D. Roelants, M. Vauclair: Logic and databases. In A. Thayse (ed.), From Modal Logic to Deductive Databases – Introducing a Logic Based Approach to Artificial Intelligence, 279–350. Wiley, 1989.

[KLW90] M. Kifer, G. Lausen, J. Wu: Logical foundations of object-oriented and frame-based languages. Technical report, SUNY at Stony Brook, 1990.

[LMR92] J. Lobo, J. Minker, A. Rajasekar: Foundations of Disjunctive Logic Programming. MIT Press, Cambridge, Massachusetts, 1992.

[LV90] E. Laenens, D. Vermeir: A fixpoint semantics for ordered logic. Journal of Logic and Computation 1:2 (1990), 159–185.

[MR90] J. Minker, A. Rajasekar: A fixpoint semantics for disjunctive logic programs. The Journal of Logic Programming 9 (1990), 45–74.

[NT89] S. Naqvi, S. Tsur: A Logical Language for Data and Knowledge Bases. Computer Science Press, New York, 1989.

[PDR91] G. Phipps, M. A. Derr, K. A. Ross: Glue-Nail: A deductive database system. In J. Clifford, R. King (eds.), Proc. of the 1991 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD’91), 308–317, 1991.

[Rei78] R. Reiter: On closed world data bases. In H. Gallaire, J. Minker (eds.), Logic and Data Bases, 55–76, Plenum, New York, 1978.

[RSS92] R. Ramakrishnan, D. Srivastava, S. Sudarshan: CORAL – control, relations and logic. In L.-Y. Yuan (ed.), Very Large Data Bases, Proc. of the 18th Int. Conf. (VLDB’92), 238–250, Morgan Kaufmann Publishers, 1992.

[RT88] K. A. Ross, R. W. Topor: Inferring negative information from disjunctive databases. Journal of Automated Reasoning 4 (1988), 397–424.

[Ull88] J. D. Ullman: Principles of Database and Knowledge-Base Systems, Vol. 1. Computer Science Press, Rockville, 1988.

[Ull89] J. D. Ullman: Principles of Database and Knowledge-Base Systems, Vol. 2. Computer Science Press, Rockville, 1989.

Our papers are available on the ftp-server “wega.informatik.uni-hannover.de” (130.75.26.1).
