Constructing Partial and Complete Intelligent Answers for Recursive Queries

Xubo Zhang and Z. Meral Ozsoyoglu
Department of Computer Engineering and Science, Case Western Reserve University, Cleveland, OH 44106
Email: xzhang, [email protected]
Phone: (216) 368-{8843, 2818}

Abstract

In this paper we study the issue of providing intelligent answers to recursive queries, initially proposed in [Imie88]. In contrast to conventional query answers, which are sets of tuples, intelligent answers are rules describing the characteristics of the results. In this way, an intelligent answer helps users better capture the meaning of the query. To this end, we first extend the notion of intelligent answers by allowing them to be partial (or incomplete). Complete and partial intelligent answers are both useful, in that they provide (possibly partial) characterizations of queries. We will show that complete intelligent answers also give a more efficient way to evaluate queries. For constructing intelligent answers, we propose a general approach that combines several cases studied in [Imie88]. Moreover, we will show that semantic constraints can be systematically utilized to provide complete intelligent answers to a wider range of queries. We consider two commonly used classes of semantic constraints, called implication and referential constraints, and prove that the completeness of intelligent answers with respect to semantic constraints can be reduced to the semantics-based query containment problem. Finally, a necessary and sufficient condition is given to solve this containment problem.
1 Introduction

Deductive databases extend the relational model by providing a simple yet powerful logic framework for manipulating knowledge instead of mere facts. Intensional relations can be declaratively defined on factual data. One essential extension offered by deductive databases is recursion: through recursion, queries not expressible with relational operations can be formulated. A deductive database generally consists of three components: the extensional database (EDB), whose relations are physically stored sets of tuples (ground facts); the intensional database (IDB), whose relations are defined by (Datalog) rules; and semantic constraints, which are rules regulating meaningful and legal database states. Although recursive rules increase expressive power, they can be costly to evaluate and hard to understand. Efficient evaluation of recursive rules has therefore been a research focus for many years, and significant progress has been made. One main strategy for dealing with recursion is to push the query restrictions into the recursion, so that we need not evaluate the whole least fixed point, but only the portion of it pertinent to the query. The magic set transformation is a comprehensive approach that successfully embodies this strategy. Another method, proposed by Imielinski [Imie88], attacks the problem from the viewpoint of providing intelligent answers to recursive queries, in which both atomic facts and (intelligent) rules are present. Intelligent rules usually provide easy-to-understand properties of the result. They also restrict the original least fixed points by propagating query restrictions. A third advantage of the intelligent rules defined by Imielinski is that queries can be evaluated incrementally using these rules, realizing the idea of lazy computation. We call such rules complete intelligent rules. [Imie88] characterized, in terms of relational operators, the query syntax for which complete intelligent rules can be formulated.
In this paper we first extend the notion of intelligent answers by allowing them to be partial. While complete intelligent answers are equivalent to the original queries, partial intelligent answers are also useful in that they provide partial characterizations of queries for users' better understanding. To construct intelligent rules, we propose a procedure similar to the rule composition (goal reduction) process often used in logic programming. Thus we give a unified way to generate complete or partial intelligent answers. Moreover, our approach allows us to use semantic constraints systematically in constructing complete intelligent answers for a wider range of queries. Among related work, intelligent (intensional) answers to non-recursive queries based on semantic constraints are discussed in [ChDe87], [Motr89], [PiRo89], [FoGr92], etc. Later we will outline the relationship between our work and these works. First let us illustrate the problem and our approach through a simple example.
Example 1.1
Suppose we have a database system consisting of the following components:
1. Three EDB predicates: course(course, subject, level), prereq(course, course_p), and group(professor, coprofessor).
2. An IDB predicate teach(professor, course) defined by an initial set of tuples (base facts) and the following rule, saying that if professor P can teach course C, then he/she can also teach any prerequisite course $C_p$ of C:

$$p_1:\ teach(P, C_p) \leftarrow teach(P, C),\ prereq(C, C_p).$$

Now, suppose we have a query asking for the courses John can teach, i.e.,

$$q_1:\ query(C) \leftarrow teach(john, C).$$
The obvious way to answer the query is to evaluate the recursive relation teach first and then perform the selection and projection on it. This evaluation becomes unnecessarily costly when only a small portion of the teach relation concerns John. By the results in [Imie88] (rules for projection and selection), we can transform query $q_1$ into the following rule:

$$iq_1:\ query(C_p) \leftarrow query(C),\ prereq(C, C_p)$$

where $iq_1$ stands for the "intelligent answer for $q_1$". This rule says that if a course is in the answer, then all of its prerequisite courses are also in the answer. As we can see, $iq_1$ makes the original query more meaningful. This rule also allows us to evaluate the query directly, without evaluating the least fixed point of teach; moreover, lazy evaluation is made possible here: we can first compute the initial set for query by selection and projection on the initial teach-tuples, and then, perhaps upon users' request, perform one or more rounds of evaluation. We call rule $iq_1$ a complete intelligent answer because every query-tuple can be generated this way.
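The lazy-evaluation idea behind $iq_1$ can be sketched in Python on a toy instance. This is a minimal illustration under our own assumptions (facts as Python tuples, hypothetical helpers `naive_teach`, `answer_naive` and `answer_intelligent`, made-up sample data), not the authors' implementation. The naive route computes the full least fixed point of teach and then selects John's courses. The intelligent route starts from the selected base facts and applies $iq_1$ round by round.

```python
from typing import Optional, Set, Tuple

Pair = Tuple[str, str]

def naive_teach(base: Set[Pair], prereq: Set[Pair]) -> Set[Pair]:
    """Least fixed point of p1: teach(P, Cp) <- teach(P, C), prereq(C, Cp)."""
    teach = set(base)
    while True:
        new = {(p, cp) for (p, c) in teach for (c2, cp) in prereq if c == c2}
        if new <= teach:
            return teach
        teach |= new

def answer_naive(base: Set[Pair], prereq: Set[Pair], who: str = "john") -> Set[str]:
    # Evaluate teach completely, then apply q1: query(C) <- teach(john, C).
    return {c for (p, c) in naive_teach(base, prereq) if p == who}

def answer_intelligent(base: Set[Pair], prereq: Set[Pair], who: str = "john",
                       rounds: Optional[int] = None) -> Set[str]:
    # Initial query tuples: selection/projection on the base teach facts only.
    query = {c for (p, c) in base if p == who}
    k = 0
    while rounds is None or k < rounds:
        # One round of iq1: query(Cp) <- query(C), prereq(C, Cp).
        new = {cp for c in query for (c2, cp) in prereq if c == c2}
        if new <= query:
            break
        query |= new
        k += 1
    return query

base = {("john", "database"), ("mary", "compilers")}
prereq = {("database", "datastructure"), ("datastructure", "programming"),
          ("compilers", "automata")}
```

With `rounds=1`, only one round of $iq_1$ is applied. This corresponds to returning a partial answer first and deferring further rounds until the user asks for them. Mary's facts are never touched by the intelligent route.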
Let us consider another query, asking for the courses of level 400 or higher that John can teach:

$$q_2:\ query(C) \leftarrow teach(john, C),\ course(C, S, L),\ L \ge 400.$$

We would naturally expect a similar intelligent answer to serve the same purposes:

$$iq_2:\ query(C_p) \leftarrow query(C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400.$$

The above rule is sound, in the sense that every tuple it produces satisfies $q_2$. In fact, $iq_2$ still provides a useful property of query. It is, however, not complete with respect to the lazy-evaluation strategy. That is, there exists some database instance on which $q_2$ produces a tuple that $iq_2$ cannot. The following is such an instance:

{teach(john, database), prereq(database, datastructure), course(datastructure, cs, 400)}.

On this database instance, rule $q_2$ cannot generate any initial tuples for query, so rule $iq_2$ will not produce any query fact. (Readers may notice that the instance does not satisfy referential integrity, since database has no course-tuple.) However, the whole teach-relation is {teach(john, database), teach(john, datastructure)}; therefore $q_2$ does produce the fact query(datastructure). In fact, by [Imie88], query $q_2$ does not satisfy the conditions for total transformation, so no complete intelligent answer can be formed. However, things change if we have the following two semantic constraints:
1. any course that has a prerequisite in the prereq relation must also appear in the course relation;
2. the level of a course must be greater than that of its prerequisite courses.
Put into formulas, they are:
$$rc_1:\ \forall (C, C_p)\,[\,prereq(C, C_p) \rightarrow \exists (S, L)\ course(C, S, L)\,]$$

$$ic_1:\ \forall (C, S, L, C_p, S_p, L_p)\,[\,prereq(C, C_p),\ course(C, S, L),\ course(C_p, S_p, L_p) \rightarrow L > L_p\,]$$

$rc_1$ is called a referential constraint, and $ic_1$ is called an implication constraint. An equivalent form for the implication constraint $ic_1$ is given below; as will be seen in the next section, it is the canonical form:

$$\forall (C, S, L, C_p, S_p, L_p)\,[\,prereq(C, C_p),\ course(C, S, L),\ course(C_p, S_p, L_p),\ L \le L_p \rightarrow false\,].$$

When $\{rc_1, ic_1\}$ is enforced, database instances such as the one above cannot exist, and we can prove that $iq_2$ becomes a complete rule. □
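The counterexample above can be replayed in Python. This is a sketch with our own hypothetical helpers; facts are tuples and levels are integers.

- On the given instance, full evaluation of $q_2$ returns datastructure, while closing the initial answer under $iq_2$ returns nothing.
- After adding a course-tuple for database (level 500, chosen arbitrarily), the instance satisfies $rc_1$ and $ic_1$, and the two answers agree.

This checks single instances only and is not a proof of completeness.

```python
from typing import Set, Tuple

def lfp_teach(teach: Set[Tuple[str, str]], prereq: Set[Tuple[str, str]]):
    """Least fixed point of p1: teach(P, Cp) <- teach(P, C), prereq(C, Cp)."""
    teach = set(teach)
    while True:
        new = {(p, cp) for (p, c) in teach for (c2, cp) in prereq if c == c2}
        if new <= teach:
            return teach
        teach |= new

def q2(teach, course) -> Set[str]:
    # query(C) <- teach(john, C), course(C, S, L), L >= 400
    return {c for (p, c) in teach if p == "john"
            for (c2, _s, lvl) in course if c2 == c and lvl >= 400}

def close_under_iq2(initial, prereq, course) -> Set[str]:
    # Apply iq2 until nothing new appears:
    # query(Cp) <- query(C), prereq(C, Cp), course(Cp, Sp, Lp), Lp >= 400
    query = set(initial)
    while True:
        new = {cp for c in query for (c2, cp) in prereq if c2 == c
               for (c3, _s, lp) in course if c3 == cp and lp >= 400}
        if new <= query:
            return query
        query |= new

def violates_rc1(prereq, course) -> bool:
    # rc1: every C in prereq(C, Cp) must occur in some course(C, S, L)
    known = {c for (c, _s, _l) in course}
    return any(c not in known for (c, _cp) in prereq)

teach = {("john", "database")}
prereq = {("database", "datastructure")}
course = {("datastructure", "cs", 400)}
full = q2(lfp_teach(teach, prereq), course)                # evaluate teach first
lazy = close_under_iq2(q2(teach, course), prereq, course)  # lazy route via iq2

course_ok = course | {("database", "cs", 500)}             # satisfies rc1 and ic1
full_ok = q2(lfp_teach(teach, prereq), course_ok)
lazy_ok = close_under_iq2(q2(teach, course_ok), prereq, course_ok)
```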
In the subsequent sections, we first briefly go over some preliminary notions and then introduce a formal method for deriving intelligent answers. After that, we discuss how to utilize semantic constraints to test completeness.
2 Preliminaries

We use a partially interpreted first-order language. There is a finite set of types $T = \{\tau_1, \ldots, \tau_d\}$ ($d \ge 1$). Each $\tau_i$ has a fixed interpretation (domain, universe), which is either the set of real numbers, IR, or the set of strings over the English alphabet. IR is dense and totally ordered with the transitive and irreflexive ordering predicate [...] 1) such that RC contains

$$p_1(w_1) \rightarrow p_2(w_2'),\ \ p_2(w_2) \rightarrow p_3(w_3'),\ \ \ldots,\ \text{and}\ \ p_n(w_n) \rightarrow p_1(w_1').$$
3.2 Computing the Least Fixed Points

The least fixed point of a recursive Datalog program is the set of facts (tuples) that are derivable using the rules and base facts. For a Datalog program P (without negation), we define an operator $T_P$ as follows: for any database instance D,

$$T_P(D) = \{A \mid A \leftarrow B_1, \ldots, B_j \text{ is a ground instance of a rule in } P, \text{ and each } B_i \text{ is either in } D \text{ or a true (in)equality}\}.$$

For any given database instance D, the procedural semantics of $R_p$ is the least fixed point computed by $R_p$ on D. This fixed point is always finite ([Ull89a]) and can be expressed as $lfp(R_p)(D) = T_{R_p} \uparrow m\,(D)$ for some integer $m \ge 0$, where
1. $T_{R_p} \uparrow 0\,(D) = D$, and
2. $T_{R_p} \uparrow (k+1)\,(D) = T_{R_p}(T_{R_p} \uparrow k\,(D)) \cup T_{R_p} \uparrow k\,(D)$.
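The operator $T_P$ and the stages $T_{R_p} \uparrow k\,(D)$ can be sketched in Python for positive Datalog rules without (in)equalities. The representation is an assumption of this sketch:

- an atom is a pair (predicate, argument tuple);
- strings starting with an uppercase letter are variables;
- rules are safe, i.e. every head variable occurs in the body.

`powers` yields $T_{R_p} \uparrow 0, 1, \ldots$ until the least fixed point is reached. Under these assumptions, the new $p^{(k+1)}$-tuples of a stage are given by `t_p(rules, stage_k)`.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]   # (predicate, arguments)
Rule = Tuple[Atom, List[Atom]]       # (head, body)

def is_var(t: str) -> bool:
    return t[:1].isupper()           # convention: variables start with an uppercase letter

def match(atom: Atom, fact: Atom, subst: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Extend subst so that atom becomes the ground fact, or return None."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    s = dict(subst)
    for a, v in zip(args, fargs):
        if is_var(a):
            if s.setdefault(a, v) != v:
                return None
        elif a != v:
            return None
    return s

def t_p(rules: List[Rule], db: Set[Atom]) -> Set[Atom]:
    """T_P(db): heads of all ground rule instances whose body atoms are all in db."""
    out: Set[Atom] = set()
    for (hpred, hargs), body in rules:
        substs: List[Dict[str, str]] = [{}]
        for b in body:
            substs = [s2 for s in substs for f in db
                      if (s2 := match(b, f, s)) is not None]
        for s in substs:
            out.add((hpred, tuple(s.get(a, a) for a in hargs)))
    return out

def powers(rules: List[Rule], db: Set[Atom]) -> Iterator[Set[Atom]]:
    """Yield T^0(D) = D, T^1(D), ..., where T^(k+1)(D) = T_P(T^k(D)) | T^k(D)."""
    current = set(db)
    yield current
    while True:
        nxt = t_p(rules, current) | current
        if nxt == current:           # least fixed point reached
            return
        current = nxt
        yield current

p1: Rule = (("teach", ("P", "Cp")), [("teach", ("P", "C")), ("prereq", ("C", "Cp"))])
D = {("teach", ("john", "database")),
     ("prereq", ("database", "datastructure")),
     ("prereq", ("datastructure", "programming"))}
stages = list(powers([p1], D))
```

On this instance the fixed point is reached at $m = 2$, so `stages` holds $T \uparrow 0$, $T \uparrow 1$ and $T \uparrow 2$. The sketch needs Python 3.8 or later because of the assignment expression.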
For any $k \ge 0$, we call the p-tuples in $T_{R_p} \uparrow k\,(D)$ $p^{(k)}$-tuples (wrt D), and we call the p-tuples in $T_{R_p}(T_{R_p} \uparrow k\,(D))$ new $p^{(k+1)}$-tuples. Notice that if a fixed point has been reached before the (k+1)-st round of computation, the new $p^{(k+1)}$-tuples are contained in the $p^{(k)}$-tuples.

Query-tuples can also be produced iteratively: for any $k \ge 0$, we can apply rule $q: query(u) \leftarrow p(u_1), F, I$ to $T_{R_p} \uparrow k\,(D)$ to obtain a set of query-tuples. We call such tuples, i.e., the tuples in $T_q(T_{R_p} \uparrow k\,(D))$, $query^{(k)}$-tuples. (We use $T_q$ instead of $T_{\{q\}}$ to denote the operator for the single-rule program $\{q\}$.) Since predicate p appears only once in rule q, $query^{(k+1)}$-tuples not producible as $query^{(k)}$-tuples can only be produced from new $p^{(k+1)}$-tuples. Therefore, we call the tuples in $T_q(T_{R_p}(T_{R_p} \uparrow k\,(D)))$ new $query^{(k+1)}$-tuples.

Our goal is to establish a relationship between new $query^{(k+1)}$-tuples and $query^{(k)}$-tuples, so that the recursive relation p need not be evaluated. Although this relationship may be partial, it must be correct. By definition, $query^{(k)}$-tuples are computed by the following rule, rewritten from rule $q: query(u) \leftarrow p(u_1), F, I$:

$$q^{(k)}:\ query^{(k)}(u) \leftarrow p^{(k)}(u_1),\ F,\ I.$$
Also by definition, new $p^{(k+1)}$-tuples are computed by the following rules, rewritten from rules $p_i: p(v_i) \leftarrow P_i, F_i, I_i$, for $i = 1, \ldots, n$:

$$p_i^{(k+1)}:\ p^{(k+1)}(v_i) \leftarrow P_i^{(k)},\ F_i,\ I_i,$$
where $P_i^{(k)}$ is $P_i$ with the occurrences of p replaced by $p^{(k)}$. Note that the above reasoning does not require the original rules to be linear. Since new $query^{(k+1)}$-tuples can only be produced from new $p^{(k+1)}$-tuples, the composition of the above rewritten rules allows us to compute new $query^{(k+1)}$-tuples from $p^{(k)}$-tuples. The missing link in the chain is now the dependency of $p^{(k)}$-tuples on $query^{(k)}$-tuples. Indeed, this can be established by the following "reverse reasoning" rule:

$$\bar{q}^{(k)}:\ p^{(k)}(u_1') \leftarrow query^{(k)}(u),$$

where $u_1'$ is $u_1$ with the variables not in u replaced by (Skolem) terms of the form h(u), where h is a unique Skolem function symbol.

The correctness of the above rule is as follows. Assume $query^{(k)}$-tuples are produced correctly. Then, by the query definition rule q, the following formula is true on $T_q(T_{R_p} \uparrow k\,(D))$ (for any D):

$$\forall V\,[\,query^{(k)}(u) \rightarrow \exists V'\,[\,p^{(k)}(u_1), F, I\,]\,],$$

where V are the variables in u, and V' are the variables in $u_1$ or F that are not in u. So the formula $\forall V\,[\,query^{(k)}(u) \rightarrow \exists V'\,[\,p^{(k)}(u_1)\,]\,]$ is also true. Rule $\bar{q}^{(k)}$ is the Skolemized form of this formula. Hereafter we use $P_k$ to denote the set of rules $\{p_1^{(k+1)}, \ldots, p_n^{(k+1)}, q^{(k+1)}, \bar{q}^{(k)}\}$.
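The construction of the reverse-reasoning rule can be sketched as follows. The sketch assumes that a Skolem term is represented as a tuple whose first element names the function symbol. It introduces one fresh, hypothetical symbol `h_V` per replaced variable V, so that distinct existential variables get distinct Skolem functions, and it uses the whole tuple u as the Skolem arguments, following the form h(u) in the text.

```python
from typing import List, Tuple, Union

Term = Union[str, Tuple]   # variable/constant name, or Skolem term ("h_V", arg1, ...)

def is_var(t: Term) -> bool:
    return isinstance(t, str) and t[:1].isupper()

def reverse_rule(u: List[str], u1: List[str], p: str):
    """Return (head, body) of the rule  p(u1') <- query(u).

    u1' is u1 with every variable that does not occur in u replaced by a
    Skolem term h_V(u); the symbol names h_V are hypothetical."""
    u1_prime: List[Term] = []
    for t in u1:
        if is_var(t) and t not in u:
            u1_prime.append(("h_" + t, *u))
        else:
            u1_prime.append(t)
    return (p, tuple(u1_prime)), ("query", tuple(u))

# Example 3.1: q2 has head query(C) and p-subgoal teach(john, C).
print(reverse_rule(["C"], ["john", "C"], "teach"))
# prints (('teach', ('john', 'C')), ('query', ('C',)))

# A query that projects out C, e.g. query(P) <- teach(P, C), ...
print(reverse_rule(["P"], ["P", "C"], "teach"))
# prints (('teach', ('P', ('h_C', 'P'))), ('query', ('P',)))
```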
Example 3.1
Continued from Example 1.1. From rule $p_1: teach(P, C_p) \leftarrow teach(P, C), prereq(C, C_p)$, we can obtain the following rule specifying new $teach^{(k+1)}$-tuples on $teach^{(k)}$-tuples:

$$p_1^{(k+1)}:\ teach^{(k+1)}(P, C_p) \leftarrow teach^{(k)}(P, C),\ prereq(C, C_p).$$

From rule $q_2: query(C) \leftarrow teach(john, C), course(C, S, L), L \ge 400$, we have the following two rewritten rules:

$$q_2^{(k+1)}:\ query^{(k+1)}(C) \leftarrow teach^{(k+1)}(john, C),\ course(C, S, L),\ L \ge 400;$$

and

$$\bar{q}_2^{(k)}:\ teach^{(k)}(john, C) \leftarrow query^{(k)}(C).$$

The first rule specifies how $query^{(k+1)}$-tuples are defined on $teach^{(k+1)}$-tuples. Notice that since teach appears once in the body of rule $q_2$, new $query^{(k+1)}$-tuples can only be produced from new $teach^{(k+1)}$-tuples. The second rule is obtained as follows. Since $query^{(k)}$-tuples are assumed to be produced correctly, by rule $q_2$ the following formula must be true:

$$\forall C\,[\,query^{(k)}(C) \rightarrow \exists (S, L)\,[\,teach^{(k)}(john, C),\ course(C, S, L),\ L \ge 400\,]\,].$$

Rule $\bar{q}_2^{(k)}$ (among other clauses) is obtained by Skolemizing the above formula. □
3.3 Rule Composition

We make the following observations on the rules in $P_k$:
- $q^{(k+1)}$ defines $query^{(k+1)}$-tuples on $p^{(k+1)}$-tuples;
- the $p_i^{(k+1)}$'s define new $p^{(k+1)}$-tuples on $p^{(k)}$-tuples;
- $\bar{q}^{(k)}$ defines $p^{(k)}$-tuples on $query^{(k)}$-tuples.

Therefore, we may use SLD-resolution (rule composition, goal reduction) from logic programming to obtain rules that define new $query^{(k+1)}$-tuples on $query^{(k)}$-tuples. Specifically, we use the following rule composition process. Initially, let $H = B = query^{(k+1)}(u)$. Then repeat the following steps until a stop is encountered:
1. Choose (nondeterministically) an ordinary subgoal G from B and a rule $r: H' \leftarrow B'$ from $P_k$ (renamed so that r shares no variables with G) such that G and H' unify with a most general unifier (mgu) θ; stop and return "$H \leftarrow B$" if there is no such G and r.
2. Replace G by the subgoals in B'.
3. Apply θ to H and B; stop and return "fail" if the (in)equalities in B are unsatisfiable.

It is easy to see that the above procedure always terminates, because there is no recursion (either direct or mutual) in program $P_k$.
With different choices of subgoals and rules, we get different resulting rules. We are only interested in those valid Datalog rules that define $query^{(k+1)}$-tuples on $query^{(k)}$-tuples. Let $H \leftarrow B$ be a resulting rule; we call it a $q^{(k+1)}q^{(k)}$-rule if 1) there are no $p^{(k)}$- or $p^{(k+1)}$-subgoals in B, and 2) there are no Skolem function symbols in B. Dropping the superscripts from a $q^{(k+1)}q^{(k)}$-rule, we get an intelligent rule, unless it is trivial, i.e., unless some subgoal in B is the same as H. The intelligent answer $R_{iq}$ for query is the Datalog program consisting of all the intelligent rules obtained this way.
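The composition process and the filtering of $q^{(k+1)}q^{(k)}$-rules can be sketched in Python. This is a sketch under our own assumptions, not the authors' implementation:

- atoms are (predicate, argument-tuple) pairs, and variables are strings starting with an uppercase letter;
- Skolem terms are tuples;
- superscripts are encoded in hypothetical predicate names: `query1`/`teach1` for (k+1) and `query0`/`teach0` for (k);
- built-in comparisons are carried along without the satisfiability check of step 3.

Instead of choosing subgoals nondeterministically, the sketch always resolves the leftmost resolvable subgoal and branches only over the rules. For a non-recursive program such as $P_k$, this is expected to yield the same resulting rules up to variable renaming.

```python
import itertools

BUILTINS = {">=", ">", "<=", "<", "=", "!="}   # non-ordinary subgoals, never resolved
_fresh = itertools.count()

def is_var(t):
    return isinstance(t, str) and t[:1].isupper()

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(a, b, s):
    """Extend substitution s to a most general unifier of a and b, or return None.
    Tuples (f, x1, ..., xn) are Skolem terms."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if isinstance(a, tuple) and isinstance(b, tuple) and a[0] == b[0] and len(a) == len(b):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    return (t[0], *(resolve(a, s) for a in t[1:])) if isinstance(t, tuple) else t

def subst_atom(atom, s):
    return (atom[0], tuple(resolve(a, s) for a in atom[1]))

def rename(rule):
    """Rename all variables of a rule apart by appending a fresh numeric suffix."""
    n = next(_fresh)
    def r(t):
        if is_var(t):
            return f"{t}_{n}"
        return (t[0], *(r(a) for a in t[1:])) if isinstance(t, tuple) else t
    (hp, hargs), body = rule
    return (hp, tuple(map(r, hargs))), [(p, tuple(map(r, args))) for p, args in body]

def compose(head, body, program):
    """Resolve the leftmost resolvable ordinary subgoal, branching over every rule
    whose head unifies with it; yield (H, B) once no subgoal can be resolved."""
    for i, (pred, args) in enumerate(body):
        if pred in BUILTINS:
            continue
        branches = []
        for rule in program:
            (hp, hargs), rbody = rename(rule)
            if hp != pred or len(hargs) != len(args):
                continue
            s = {}
            for x, y in zip(args, hargs):
                s = unify(x, y, s)
                if s is None:
                    break
            if s is not None:
                branches.append((s, rbody))
        if branches:
            for s, rbody in branches:
                new_body = body[:i] + rbody + body[i + 1:]
                yield from compose(subst_atom(head, s), [subst_atom(g, s) for g in new_body], program)
            return
    yield head, body

def intelligent_rules(program, start, p_preds, strip):
    """Keep q(k+1)q(k)-rules, drop superscripts via `strip`, discard trivial rules."""
    result = []
    for head, body in compose(start, [start], program):
        if any(p in p_preds for p, _ in body):
            continue                                   # still has p(k)/p(k+1) subgoals
        if any(isinstance(a, tuple) for _, args in body for a in args):
            continue                                   # contains a Skolem term
        h = (strip.get(head[0], head[0]), head[1])
        b = [(strip.get(p, p), args) for p, args in body]
        if h not in b:
            result.append((h, b))
    return result

# Example 3.2: P_k = {p1^(k+1), q2^(k+1), reverse rule for q2 at k}.
P_k = [
    (("query1", ("C",)), [("teach1", ("john", "C")), ("course", ("C", "S", "L")), (">=", ("L", 400))]),
    (("teach1", ("P", "Cp")), [("teach0", ("P", "C")), ("prereq", ("C", "Cp"))]),
    (("teach0", ("john", "C")), [("query0", ("C",))]),
]
iq = intelligent_rules(P_k, ("query1", ("X",)), {"teach0", "teach1"},
                       {"query1": "query", "query0": "query"})
```

On Example 3.2, `iq` contains exactly one rule: a query head whose body consists of query, prereq, course and the comparison with 400. This is $iq_2$ up to variable names.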
Example 3.2
Continued from the previous example, where we obtained $P_k = \{p_1^{(k+1)}, q_2^{(k+1)}, \bar{q}_2^{(k)}\}$. The following is a computation of the rule composition process. The rule used at each step is written above $\Longrightarrow$, and the choices of subgoals and mgu's are not written explicitly.

$$
\begin{aligned}
& query^{(k+1)}(C) \leftarrow query^{(k+1)}(C)\\
\overset{q_2^{(k+1)}}{\Longrightarrow}\ & query^{(k+1)}(C_p) \leftarrow teach^{(k+1)}(john, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400\\
\overset{p_1^{(k+1)}}{\Longrightarrow}\ & query^{(k+1)}(C_p) \leftarrow teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400\\
\overset{\bar{q}_2^{(k)}}{\Longrightarrow}\ & query^{(k+1)}(C_p) \leftarrow query^{(k)}(C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400.
\end{aligned}
$$

Here the resulting rule is a $q^{(k+1)}q^{(k)}$-rule, and it is the only $q^{(k+1)}q^{(k)}$-rule that can be obtained by the rule composition process. Dropping the superscripts, we get the intelligent rule

$$iq_2:\ query(C_p) \leftarrow query(C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400.$$

□
Let us look at another example, in which we assume a second rule about teach, saying that professors in a group can teach the same courses:

$$p_2:\ teach(P_2, C) \leftarrow teach(P_1, C),\ group(P_1, P_2).$$
Example 3.3
Now let us ask the following query: list the professors and the mathematics courses they can teach:

$$q_3:\ query(P, C) \leftarrow teach(P, C),\ course(C, math, L).$$

Using the above procedure, we can easily obtain the following two intelligent rules for this query:

$$iq_3:\ query(P, C_p) \leftarrow query(P, C),\ prereq(C, C_p),\ course(C_p, math, L_p)$$

and

$$iq_3':\ query(P_2, C) \leftarrow query(P_1, C),\ group(P_1, P_2),\ course(C, math, L).$$

Note that to transform this query, the approach in [Imie88] would require an auxiliary predicate and some analysis of the relationship between $p_1$ and $p_2$.
3.4 Soundness of the Intelligent Answer

We conclude this section with a theorem on the soundness of the intelligent answer $R_{iq}$ specified in the previous subsections. By soundness we mean that the rules in $\{q\} \cup R_{iq}$ cannot produce any tuples that are not producible by the original set of rules $\{p_1, \ldots, p_n, q\}$.
Theorem 3.1 (Soundness) The intelligent answer $R_{iq}$ specified in this section is sound, i.e.,

$$T_q(lfp(R_p)(D)) \supseteq lfp(R_{iq})(T_q(D) \cup D)|_{query}$$

for any database instance D.
Proof. Let D be any database instance. We prove by induction on k that

$$T_q(T_{R_p} \uparrow k\,(D)) \supseteq T_{R_{iq}} \uparrow k\,(T_q(D) \cup D)|_{query}$$

for any $k \ge 0$. The base case is trivially true by definition. Assume $T_q(T_{R_p} \uparrow k\,(D)) \supseteq T_{R_{iq}} \uparrow k\,(T_q(D) \cup D)|_{query}$. To prove

$$T_q(T_{R_p} \uparrow (k+1)\,(D)) \supseteq T_{R_{iq}} \uparrow (k+1)\,(T_q(D) \cup D)|_{query},$$

we only need to show that rules $p_1^{(k+1)}, \ldots, p_n^{(k+1)}$, $q^{(k+1)}$, and $\bar{q}^{(k)}$ are correct, since the $q^{(k+1)}q^{(k)}$-rules are derived from them. The correctness of the $p_i^{(k+1)}$'s and $q^{(k+1)}$ is obvious. The correctness of $\bar{q}^{(k)}$ is as follows. By the induction hypothesis and rule $q: query(u) \leftarrow p(u_1), F, I$, the following formula is true on any $T_q(T_{R_p} \uparrow k\,(D))$:

$$\forall V\,[\,query^{(k)}(u) \rightarrow \exists V'\,[\,p^{(k)}(u_1), F, I\,]\,],$$

where V are the variables in u, and V' are the variables in $u_1$ or F that are not in u. So $\forall V\,[\,query^{(k)}(u) \rightarrow \exists V'\,[\,p^{(k)}(u_1)\,]\,]$ is also true. Skolemizing this formula, we get $\bar{q}^{(k)}$. □
4 Complete Intelligent Answers

We have given a unified procedure to generate sound intelligent answers. We would also like to know whether they are complete, i.e., whether they always produce the same results as the original programs. In [Imie88], sufficient conditions on query syntax are given to ensure completeness (called total transformability). For query evaluation purposes, a complete intelligent answer allows us to evaluate the query directly, without pre-computing the least fixed point of p. Since we are considering databases in which semantic integrity constraints are part of the system, we naturally want to use a more general notion of completeness: a relative completeness based on semantics. It is not difficult to see that when semantic constraints are enforced, more queries can have complete intelligent answers. In this and the next section, we give a sufficient condition for testing this (relative) completeness, in which semantic constraints are systematically utilized.
Definition 4.1 (Completeness) Let $R_{iq}$ be the intelligent answer specified in the previous section. $R_{iq}$ is said to be complete if

$$T_q(lfp(R_p)(D)) \subseteq lfp(R_{iq})(T_q(D) \cup D)|_{query}$$

for any database instance D satisfying $IC \cup RC$.
Let us review the rule composition process. For any $k \ge 0$, a successful composition of a $q^{(k+1)}q^{(k)}$-rule can be partitioned into three stages:
1. the initial $query^{(k+1)}$-goal is reduced to a $p^{(k+1)}$-subgoal by rule $q^{(k+1)}$;
2. this $p^{(k+1)}$-subgoal is reduced to $p^{(k)}$-subgoals by the rules $p_i^{(k+1)}$;
3. the $p^{(k)}$-subgoals are reduced to $query^{(k)}$-subgoals by rule $\bar{q}^{(k)}$.

Completeness cannot be lost in the first stage, because new $query^{(k+1)}$-tuples can only be generated from new $p^{(k+1)}$-tuples, as p occurs once in rule q. There is also no loss of completeness in the second stage, because every new $p^{(k+1)}$-tuple can be generated from $p^{(k)}$-tuples by the rules $p_i^{(k+1)}$. So completeness can only be lost in the last stage, by using rule $\bar{q}^{(k)}$.

Let us call the intermediate rules obtained right after the second stage $q^{(k+1)}p^{(k)}$-rules, since they define $query^{(k+1)}$-tuples on $p^{(k)}$-tuples. Assuming that the $p^{(k)}$-tuples are generated completely, we can use rule $q^{(k)}: query^{(k)}(u) \leftarrow p^{(k)}(u_1), F, I$ to reduce (resolve) all the $query^{(k)}$-goals in the $q^{(k+1)}q^{(k)}$-rules into $p^{(k)}$-subgoals. Let us call the resulting rules $iq^{(k+1)}p^{(k)}$-rules. The following theorem tells us how to test the completeness of intelligent answers.
Theorem 4.1 The intelligent answer $R_{iq}$ specified in the previous section is complete if, on any database instance satisfying $IC \cup RC$ and for any $k \ge 0$, the $q^{(k+1)}p^{(k)}$-rules produce a subset of the set of tuples produced by the $iq^{(k+1)}p^{(k)}$-rules.
Remark. Although k is a parameter in the theorem, the containment needs to be tested only once, because the $q^{(k+1)}p^{(k)}$-rules (and the $iq^{(k+1)}p^{(k)}$-rules) have the same form, up to the superscripts, for all k. We will discuss this point in detail later.

Proof. Let D be any database instance satisfying $IC \cup RC$. We prove the following claim by induction on k: if the condition in the theorem is true, then for any $k \ge 0$,

$$T_q(T_{R_p} \uparrow k\,(D)) \subseteq T_{R_{iq}} \uparrow k\,(T_q(D) \cup D)|_{query}.$$

The base case is trivially true. Assume $T_q(T_{R_p} \uparrow k\,(D)) \subseteq T_{R_{iq}} \uparrow k\,(T_q(D) \cup D)|_{query}$. By the definitions of query and p, the $q^{(k+1)}p^{(k)}$-rules define all new $query^{(k+1)}$-tuples. Therefore, to prove that

$$T_q(T_{R_p} \uparrow (k+1)\,(D)) \subseteq T_{R_{iq}} \uparrow (k+1)\,(T_q(D) \cup D)|_{query},$$

it is sufficient to prove that the $q^{(k+1)}p^{(k)}$-rules always produce a subset of the new $query^{(k+1)}$-tuples that are produced by the $q^{(k+1)}q^{(k)}$-rules.
By the induction hypothesis, rule

$$q^{(k)}:\ query^{(k)}(u) \leftarrow p^{(k)}(u_1),\ F,\ I$$

is true on any $T_q(T_{R_p} \uparrow k\,(D))$. Therefore, we can use it to reduce the $q^{(k+1)}q^{(k)}$-rules into $iq^{(k+1)}p^{(k)}$-rules. By the condition given in the theorem, we have $T_q(T_{R_p} \uparrow (k+1)\,(D)) \subseteq T_{R_{iq}} \uparrow (k+1)\,(T_q(D) \cup D)|_{query}$. □
Example 4.1
Continued from Example 3.2. In the goal reduction process, the only $q^{(k+1)}p^{(k)}$-rule is

$$tmp:\ query^{(k+1)}(C_p) \leftarrow teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400,$$

and the $q^{(k+1)}q^{(k)}$-rule is

$$itmp:\ query^{(k+1)}(C_p) \leftarrow query^{(k)}(C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400.$$

To test the completeness of the intelligent answer $\{iq_2\}$, we first use the rule $query^{(k)}(C) \leftarrow teach^{(k)}(john, C), course(C, S, L), L \ge 400$ to reduce itmp into the following $iq^{(k+1)}p^{(k)}$-rule:

$$itmp':\ query^{(k+1)}(C_p) \leftarrow teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C, S, L),\ course(C_p, S_p, L_p),\ L \ge 400,\ L_p \ge 400.$$

We then test whether $itmp'(D) \supseteq tmp(D)$ for any D satisfying $\{ic_1, rc_1\}$. □
Since the $q^{(k+1)}p^{(k)}$-rules and the $iq^{(k+1)}p^{(k)}$-rules are not recursive, each of them corresponds to a conjunctive query ([Ull89b]). Conjunctive queries are among the most commonly used and best understood classes of relational queries. Consequently, the sets of $q^{(k+1)}p^{(k)}$-rules and $iq^{(k+1)}p^{(k)}$-rules are respectively equivalent to two union queries. A union query is denoted as $union(Q_1, \ldots, Q_n)$, where each $Q_i$ is a conjunctive query ($union(Q_1) = Q_1$). The result of a union query on a database instance D is the set-union of the results of the component queries on D.

Given any $k \ge 0$, let $Q = union(Q_1, \ldots, Q_n)$ correspond to the $q^{(k+1)}p^{(k)}$-rules, and let $iQ = union(iQ_1, \ldots, iQ_m)$ correspond to the $iq^{(k+1)}p^{(k)}$-rules. By Theorem 4.1, $R_{iq}$ is complete if $iQ(D) \supseteq Q(D)$ for any (finite) D satisfying $IC \cup RC$. This is the semantics-based containment problem ([ZhOz93], [ZhOz94], [Zhang94]). In the rest of the paper, we use the notation $(IC \cup RC) \models_f (iQ \supseteq Q)$ to denote the following condition: for every (finite) database instance D satisfying $IC \cup RC$, the result iQ(D) of query iQ contains the result Q(D) of query Q. Note that $\models_f$ denotes finite implication, which only considers finite models of the semantic constraints.
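For intuition, the union-query semantics can be sketched by evaluating the conjunctive queries for tmp and itmp' from Example 4.1 directly on finite instances. This is an illustration only: a containment observed on one instance proves nothing about all instances satisfying $IC \cup RC$. The representation, the relation name `teach_k` standing for $teach^{(k)}$, the name `itmp2` standing for itmp', and the sample data are our own assumptions. On an instance satisfying $rc_1$ and $ic_1$, the tuple produced by tmp is also produced by itmp'. Dropping the course-tuple required by $rc_1$ breaks this.

```python
from typing import Callable, Dict, List, Set, Tuple

Atom = Tuple[str, Tuple[str, ...]]
Cond = Callable[[Dict[str, object]], bool]
CQ = Tuple[Tuple[str, ...], List[Atom], List[Cond]]   # (output tuple, conjuncts, (in)equalities)

def is_var(t) -> bool:
    return isinstance(t, str) and t[:1].isupper()

def eval_cq(q: CQ, db: Dict[str, Set[tuple]]) -> Set[tuple]:
    """Evaluate a conjunctive query by joining its conjuncts left to right."""
    out, atoms, conds = q
    bindings: List[Dict[str, object]] = [{}]
    for pred, args in atoms:
        nxt = []
        for b in bindings:
            for fact in db.get(pred, set()):
                if len(fact) != len(args):
                    continue
                b2 = dict(b)
                if all(b2.setdefault(a, v) == v if is_var(a) else a == v
                       for a, v in zip(args, fact)):
                    nxt.append(b2)
        bindings = nxt
    return {tuple(b[v] for v in out) for b in bindings if all(c(b) for c in conds)}

def eval_union(qs: List[CQ], db: Dict[str, Set[tuple]]) -> Set[tuple]:
    """union(Q1, ..., Qn): the set-union of the component results."""
    return set().union(*(eval_cq(q, db) for q in qs))

tmp: CQ = (("Cp",), [("teach_k", ("john", "C")), ("prereq", ("C", "Cp")),
                     ("course", ("Cp", "Sp", "Lp"))],
           [lambda b: b["Lp"] >= 400])
itmp2: CQ = (("Cp",), [("teach_k", ("john", "C")), ("prereq", ("C", "Cp")),
                       ("course", ("C", "S", "L")), ("course", ("Cp", "Sp", "Lp"))],
             [lambda b: b["L"] >= 400, lambda b: b["Lp"] >= 400])

good = {"teach_k": {("john", "database")},
        "prereq": {("database", "datastructure")},
        "course": {("database", "cs", 500), ("datastructure", "cs", 400)}}
bad = {**good, "course": {("datastructure", "cs", 400)}}   # violates rc1
```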
Example 4.2
Continuing from the previous example, we can write tmp and itmp' as the following conjunctive queries, respectively:

$$Q_1 = \{\langle C_p\rangle : teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C_p, S_p, L_p),\ L_p \ge 400\},$$

$$iQ_1 = \{\langle C_p\rangle : teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C, S, L),\ course(C_p, S_p, L_p),\ L \ge 400,\ L_p \ge 400\}.$$

To decide whether the intelligent answer $\{iq_2\}$ is complete, it is sufficient to test whether $\{ic_1, rc_1\} \models_f (iQ_1 \supseteq Q_1)$. □
5 Semantics-Based Containment Problem

We have now reduced the problem of deciding completeness to one of the most basic problems in query optimization: the query containment problem ([Ull89a, b]). The query containment problem is important because it is the foundation for establishing equivalence between queries. Here, however, we need to solve a more complicated containment problem than the one studied in most of the literature: containment with respect to a set of implication and referential constraints. Conceptually speaking, a query Q contains another query $Q_1$ with respect to a set IRC of implication and referential constraints if, on any finite database instance satisfying IRC, the result of Q (as a set) contains that of $Q_1$. In this case, we write $IRC \models_f (Q \supseteq Q_1)$. In this section, we establish a necessary and sufficient condition for this problem and illustrate through examples how to use it to test the completeness of intelligent answers.
5.1 Referential Expansions and Symbol Mappings

For given sets IC of implication constraints and RC of referential constraints, it is clear that $(IC \cup RC) \models_f (iQ \supseteq union(Q_1, \ldots, Q_n))$ if and only if $(IC \cup RC) \models_f (iQ \supseteq Q_i)$ for $i = 1, \ldots, n$. To solve these individual containment problems, we first expand the original query using referential constraints. We then look for certain symbol mappings ([Klug88] and [Zhang94]) from the implication constraints to the query.

Notations. A conjunctive query is often written in the form $Q = \{\langle O\rangle : F, I\}$, in which $\langle O\rangle$ is called the output tuple. Note that the types of the variables and constants in the output tuple collectively constitute the type of Q. Q is said to be in normal form ([KKR90]) if the subformula $\{\langle O\rangle : F\}$ contains only distinct occurrences of variables. Two queries are of the same type if [...]
Definition 5.1 (Referential Expansion) Let RC be a set of (Skolemized) RCs and $Q = \{\langle O\rangle : F, I\}$. The referential expansion of Q (by RC) is a formula of the form $\{\langle O\rangle : F^{re}, I\}$, where $F^{re}$ is the smallest set (conjunction) of atoms satisfying the following conditions:
1. $F \subseteq F^{re}$;
2. for any $t \in F^{re}$ and $(t_1 \rightarrow t_2) \in RC$, if there exists a substitution $\theta$ such that $\theta(t_1) = t$, then $\theta(t_2) \in F^{re}$.
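Definition 5.1 can be turned into a worklist computation. The sketch below is our own. It makes three assumptions:

- atoms are (predicate, argument-tuple) pairs;
- Skolem terms are tuples such as ("hs", "C", "Cp");
- RCs are given already Skolemized, as (lhs, rhs) atom pairs.

Because the expansion need not terminate for cyclic RCs, the hypothetical `max_atoms` cap stops the loop with an error instead of looping forever. The second usage case replays the two-constraint cyclic example from the text.

```python
def is_var(t) -> bool:
    return isinstance(t, str) and t[:1].isupper()

def match(pattern, term, s):
    """One-way matching: extend substitution s so that s(pattern) == term, else None."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == term else None
        return {**s, pattern: term}
    if isinstance(pattern, tuple):                      # Skolem term (f, x1, ..., xn)
        if not (isinstance(term, tuple) and len(term) == len(pattern) and term[0] == pattern[0]):
            return None
        for p, t in zip(pattern[1:], term[1:]):
            s = match(p, t, s)
            if s is None:
                return None
        return s
    return s if pattern == term else None               # constants match themselves

def instantiate(t, s):
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0], *(instantiate(a, s) for a in t[1:]))
    return t

def referential_expansion(atoms, rcs, max_atoms=1000):
    """Smallest F_re containing `atoms` and closed under the Skolemized RCs t1 -> t2."""
    result = set(atoms)
    frontier = list(atoms)
    while frontier:
        pred, args = frontier.pop()
        for (lpred, largs), (rpred, rargs) in rcs:
            if lpred != pred or len(largs) != len(args):
                continue
            s = {}
            for p, t in zip(largs, args):
                s = match(p, t, s)
                if s is None:
                    break
            if s is None:
                continue
            new = (rpred, tuple(instantiate(a, s) for a in rargs))
            if new not in result:
                if len(result) >= max_atoms:
                    raise RuntimeError("expansion exceeded max_atoms; the RCs may be cyclic")
                result.add(new)
                frontier.append(new)
    return result

# Skolemized rc1 and the conjuncts of Q1 from Example 4.2 (teach_k stands for teach^(k)).
rc1 = (("prereq", ("C", "Cp")), ("course", ("C", ("hs", "C", "Cp"), ("hl", "C", "Cp"))))
Q1_atoms = [("teach_k", ("john", "C")), ("prereq", ("C", "Cp")), ("course", ("Cp", "Sp", "Lp"))]
expanded = referential_expansion(Q1_atoms, [rc1])

# The cyclic pair p(X) -> q(X, h(X)) and q(X, Y) -> p(Y) never stops producing atoms.
cyclic = [(("p", ("X",)), ("q", ("X", ("h", "X")))), (("q", ("X", "Y")), ("p", ("Y",)))]
try:
    referential_expansion([("p", ("X",))], cyclic, max_atoms=50)
    cyclic_terminated = True
except RuntimeError:
    cyclic_terminated = False
```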
For example, query $Q_1$ in the previous example is expanded as follows by the Skolemized form of $rc_1$, $prereq(C, C_p) \rightarrow course(C, h_s(C, C_p), h_l(C, C_p))$:

$$Q_1^{re} = \{\langle C_p\rangle : teach^{(k)}(john, C),\ prereq(C, C_p),\ course(C, h_s(C, C_p), h_l(C, C_p)),\ course(C_p, S_p, L_p),\ L_p \ge 400\}.$$

Note that the additional conjunct $course(C, h_s(C, C_p), h_l(C, C_p))$ is logically derived from the conjunct $prereq(C, C_p)$ in $Q_1$ by $rc_1$.

In general, the process of referential expansion may not terminate. For example, suppose we have two referential constraints $p(X) \rightarrow q(X, h(X))$ and $q(X, Y) \rightarrow p(Y)$, and a query $\{\langle X\rangle : p(X)\}$. Then infinitely many conjuncts of the form $p(h^n(X))$ ($n \ge 1$) are in the body of the referential expansion of this query. However, the process always terminates under the acyclicity assumption, since one can easily see that the number of conjuncts in a referential expansion will not exceed the number of original conjuncts times the number of RCs.

Now we introduce the notion of a symbol mapping ([Klug88], [ZhOz93], [ZhOz94]). Let Q be a query, and let Q' be a query or the referential expansion of a query. A symbol mapping $\sigma: Q \rightarrow Q'$ is a function from the set of symbols (variables and constants) of Q to the terms in Q' that satisfies the following conditions:
1. it is the identity on constants and function symbols;
2. it induces a mapping that maps the output tuple of Q to that of Q';
3. it induces a mapping from the set of conjuncts of Q to that of Q'.

A symbol mapping also induces a mapping on (in)equalities. For a symbol mapping $\sigma$ and a conjunction of (in)equalities I, we write $\sigma(I)$ to denote the formula obtained from I under the mapping induced by $\sigma$ (an empty I or $\sigma(I)$ means true). Symbol mappings from ICs to queries or referential expansions are defined similarly, except that the second condition above does not apply.
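Finding symbol mappings is a backtracking search, much like the search for containment mappings between conjunctive queries. The sketch below uses our own representation: atoms as (predicate, argument-tuple) pairs, uppercase strings as variables, and tuples as Skolem terms in the target. It enumerates all mappings satisfying conditions 1-3 above; for an IC, pass empty output tuples.

The sketch handles only the conjuncts. The (in)equalities are mapped separately by substituting through the returned dictionary. The variables of $iQ_1$ and $ic_1$ are renamed apart from those of $Q_1^{re}$ for readability.

```python
from typing import Dict, Iterator, List, Optional, Tuple

def is_var(t) -> bool:
    return isinstance(t, str) and t[:1].isupper()

def extend(sigma: Dict, src, dst) -> Optional[Dict]:
    """Extend sigma with src -> dst; constants must map to themselves (condition 1)."""
    if is_var(src):
        if src in sigma:
            return sigma if sigma[src] == dst else None
        return {**sigma, src: dst}
    return sigma if src == dst else None

def symbol_mappings(src_out: Tuple, src_atoms: List, dst_out: Tuple,
                    dst_atoms: List) -> Iterator[Dict]:
    """Yield every symbol mapping from a query or IC into a query or referential expansion."""
    if len(src_out) != len(dst_out):
        return
    sigma: Optional[Dict] = {}
    for s, d in zip(src_out, dst_out):          # condition 2: output tuple to output tuple
        sigma = extend(sigma, s, d)
        if sigma is None:
            return

    def search(i: int, sigma: Dict) -> Iterator[Dict]:
        if i == len(src_atoms):                 # condition 3: every conjunct is mapped
            yield sigma
            return
        pred, args = src_atoms[i]
        for dpred, dargs in dst_atoms:
            if dpred != pred or len(dargs) != len(args):
                continue
            s: Optional[Dict] = sigma
            for a, b in zip(args, dargs):
                s = extend(s, a, b)
                if s is None:
                    break
            if s is not None:
                yield from search(i + 1, s)

    yield from search(0, sigma)

hs, hl = ("hs", "C", "Cp"), ("hl", "C", "Cp")
Q1_re = [("teach_k", ("john", "C")), ("prereq", ("C", "Cp")),
         ("course", ("C", hs, hl)), ("course", ("Cp", "Sp", "Lp"))]
iQ1 = [("teach_k", ("john", "C1")), ("prereq", ("C1", "Cp1")),
       ("course", ("C1", "S1", "L1")), ("course", ("Cp1", "Sp1", "Lp1"))]
ic1 = [("prereq", ("C2", "Cp2")), ("course", ("C2", "S2", "L2")),
       ("course", ("Cp2", "Sp2", "Lp2"))]
sigmas = list(symbol_mappings(("Cp1",), iQ1, ("Cp",), Q1_re))
thetas = list(symbol_mappings((), ic1, (), Q1_re))
```

Here each search finds exactly one mapping. Both mappings send the level variable of the first course-conjunct to the Skolem term $h_l(C, C_p)$, and the level variable of the second to $L_p$.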
5.2 Finite Implication

In defining the semantics-based query containment problem, we used the notion of finite implication (denoted $\models_f$) instead of implication (denoted $\models$), as the former is the appropriate notion in the database context. These two notions do not coincide in general. However, our next lemma shows that if the RCs are acyclic, then finite implication (finite unsatisfiability) and implication (unsatisfiability) do coincide and are decidable.
Lemma 5.1
Let Q, $Q_1, \ldots, Q_s$ ($s \ge 1$) be queries of the same type, and let IRC be a set of ICs and RCs such that the RCs are acyclic. Then $IRC \models_f (union(Q_1, \ldots, Q_s) \supseteq Q)$ if and only if $IRC \models (union(Q_1, \ldots, Q_s) \supseteq Q)$.

Proof. Let $Q = \{\langle O\rangle : F, I\}$ and $Q_i = \{\langle O_i\rangle : F_i', I_i'\}$ for $i = 1, \ldots, s$. Assume the variables in the $Q_i$'s are properly renamed so that they share no variables among themselves or with Q. Let $V_d$ be the set of output variables of Q. Let $V_n$ and $V_{ni}$ be the non-output variables of Q and $Q_i$ respectively, for $i = 1, \ldots, s$. Let $IC = \{ic_1, \ldots, ic_r\}$ and $RC = \{rc_1, \ldots, rc_t\}$, where $t \ge 0$. Logically,
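$$(IC \cup RC) \models_f (union(Q_1, \ldots, Q_s) \supseteq Q)$$

is equivalent to saying that the formula $\Phi \rightarrow_f \Psi$ is true. Here "$\rightarrow_f$" is finite implication, and

$$\Phi = ic_1 \wedge \cdots \wedge ic_r \wedge rc_1 \wedge \cdots \wedge rc_t; \quad \Psi = \forall V_d\,\Big[\exists V_n\,[F, I] \rightarrow \bigvee_{1 \le i \le s} \exists (O_i, V_{ni})\,[\langle O_i\rangle = \langle O\rangle, F_i', I_i']\Big],$$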
where $\langle O_i\rangle = \langle O\rangle$ ($i = 1, \ldots, s$) is a conjunction of equalities between the corresponding components of $\langle O_i\rangle$ and $\langle O\rangle$. It is not difficult to see that $\Phi \rightarrow_f \Psi$ is true if and only if $\Phi \wedge \Psi'$ is finitely unsatisfiable, where $\Psi'$ is

$$F(b, b') \wedge I(b, b') \wedge \bigwedge_{1 \le i \le s} \forall (O_i, V_{ni})\,\big[\langle O_i\rangle = \langle O[V_d \leftarrow b]\rangle, F_i', I_i' \rightarrow false\big].$$

Here $F(b, b')$ and $I(b, b')$ are respectively F and I with $V_d$ and $V_n$ replaced by vectors of Skolem constant symbols b and b', and $O[V_d \leftarrow b]$ is O with $V_d$ replaced by b. Clearly, if $\Phi \wedge \Psi'$ is unsatisfiable, then it is finitely unsatisfiable.

Conversely, suppose M is a model (finite or not) for $\Phi \wedge \Psi'$. Then it is a model for $\Phi_1 \wedge \Psi'$, where $\Phi_1 = rc_1 \wedge \cdots \wedge rc_t$. Since the $rc_i$'s are acyclic, there must exist a finite submodel M' of M for $\Phi_1 \wedge \Psi'$. M' consists of a number of ground facts in M that is no more than the number of conjuncts in F times the number of RCs. Since the ICs are universal formulas with consequent false, any subset of M is a model for $ic_1 \wedge \cdots \wedge ic_r$. Hence M' is a finite model for $\Phi \wedge \Psi'$. □
5.3 The Containment Theorem

By the above lemma, we will simply use $\models$ instead of $\models_f$. The semantics-based query containment problem can be solved by the following theorem ([Zhang94]):
Theorem 5.1 Suppose we are given the following:
1. a set of implication constraints $IC = \{ic_1, \ldots, ic_r\}$ ($r \ge 0$) such that the $ic_i$'s are in normal form;
2. an acyclic set of referential constraints RC;
3. a query $Q = \{\langle O\rangle : F, I\}$ and its referential expansion $Q^{re} = \{\langle O\rangle : F^{re}, I\}$;
4. a union query $union(Q_1, \ldots, Q_s)$ ($s \ge 1$), where the $Q_i$'s are in normal form and of the same type as Q.

Then $(IC \cup RC) \models_f (union(Q_1, \ldots, Q_s) \supseteq Q)$ if and only if there exist $(n_1 + \cdots + n_r + m_1 + \cdots + m_s) \ge 1$ symbol mappings

$$\theta_{i,1}, \ldots, \theta_{i,n_i} : ic_i \rightarrow Q^{re}\ (i = 1, \ldots, r), \qquad \sigma_{i,1}, \ldots, \sigma_{i,m_i} : Q_i \rightarrow Q^{re}\ (i = 1, \ldots, s),$$

such that

$$I \text{ implies } \bigvee_{1 \le i \le r,\, 1 \le j \le n_i} \theta_{i,j}(I_i) \ \vee \bigvee_{1 \le i \le s,\, 1 \le j \le m_i} \sigma_{i,j}(I_i'),$$

where $I_1, \ldots, I_r$ are the (in)equality subformulas of $ic_1, \ldots, ic_r$ respectively, and $I_1', \ldots, I_s'$ are the (in)equality subformulas of $Q_1, \ldots, Q_s$ respectively.
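Proof. Let $Q = \{\langle O \rangle : F, I\}$ and $Q^{re} = \{\langle O \rangle : F^{re}, I\}$. Let $Q_i = \{\langle O_i \rangle : F'_i, I'_i\}$ for $i = 1, \ldots, s$. Assume the variables in the $Q_i$'s are renamed so that the $Q_i$'s share no variables with one another or with $Q$. Let $V_d$ be the output variables in $Q$, and let $V_n$ and $V_{n_i}$ be the non-output variables in $Q$ and $Q_i$ respectively, for $i = 1, \ldots, s$. Let $RC = \{rc_1, \ldots, rc_t\}$ $(t \ge 0)$. Logically, $(IC \cup RC) \models (union(Q_1, \ldots, Q_s) \supseteq Q)$ is equivalent to saying that the formula $\phi \rightarrow \psi$ is true, where $\phi = ic_1 \wedge \cdots \wedge ic_r \wedge rc_1 \wedge \cdots \wedge rc_t$ and

$\psi = \forall V_d \, [\exists V_n [F, I] \rightarrow \bigvee_{1 \le i \le s} \exists (O_i, V_{n_i}) [\langle O_i \rangle = \langle O \rangle, F'_i, I'_i]],$

where $\langle O_i \rangle = \langle O \rangle$ $(i = 1, \ldots, s)$ is a conjunction of equalities between the corresponding components of $\langle O_i \rangle$ and $\langle O \rangle$. The formula $\phi \rightarrow \psi$ is true if and only if $\phi \wedge \psi'$ is unsatisfiable, where $\psi'$ is

$[F(b, b'), I(b, b')] \wedge \bigwedge_{1 \le i \le s} \forall (O_i, V_{n_i}) \, [[\langle O_i \rangle = \langle O[V_d \leftarrow b] \rangle], F'_i, I'_i \rightarrow],$

where $F(b, b')$ and $I(b, b')$ are respectively $F$ and $I$ with $V_d$ and $V_n$ replaced by Skolem constant symbols $b$ and $b'$ respectively, and $O[V_d \leftarrow b]$ is $O$ with $V_d$ replaced by $b$. It can be shown that $\phi \wedge \psi'$ is unsatisfiable if and only if

$ic_1 \wedge \cdots \wedge ic_r \wedge \bigwedge_{1 \le i \le s} \forall (O_i, V_{n_i}) \, [[\langle O_i \rangle = \langle O[V_d \leftarrow b] \rangle], F'_i, I'_i \rightarrow] \wedge [F^{re}(b, b'), I(b, b')]$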
is unsatisfiable. Let $ic_i = (F_i, I_i \rightarrow)$ for $i = 1, \ldots, r$. By the above reasoning, $(IC \cup RC) \models (union(Q_1, \ldots, Q_s) \supseteq Q)$ if and only if the following sets of clauses are unsatisfiable:

$cl_1$: $F^{re}(b, b')$;
$cl_2$: $I(b, b')$;
$cl_3$: $\bigvee [\langle O_i \rangle \ne \langle O[V_d \leftarrow b] \rangle] \vee \neg F'_i \vee \neg I'_i$, for $i = 1, \ldots, s$;
and
$cl_4$: $\neg F_i \vee \neg I_i$, for $i = 1, \ldots, r$;

where $\bigvee [\langle O_i \rangle \ne \langle O[V_d \leftarrow b] \rangle]$ is a disjunction of $\ne$-inequalities between the corresponding components of $\langle O_i \rangle$ and $\langle O[V_d \leftarrow b] \rangle$. Therefore we can apply resolution (with paramodulation) to the conjunct literals to obtain an unsatisfiable set of resolvents consisting only of (in)equalities. Since all the $ic_i$'s and $Q_i$'s are in normal form, every possible unification between conjunct literals can be performed without paramodulation. Also notice that every (in)equality resolvent obtained from $cl_1$ and $cl_3$ contains a subformula of the form $\bigvee [\langle O_i \rangle \ne \langle O[V_d \leftarrow b] \rangle]$, which is always satisfiable unless $\langle O_i \rangle$ is mapped to $\langle O[V_d \leftarrow b] \rangle$, because $O_i$ does not appear elsewhere. This leads to the conclusion of the theorem. □
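Checking the condition of Theorem 5.1 begins with enumerating symbol mappings into the relational atoms of $Q^{re}$. The following Python sketch is a plain backtracking search for such mappings, assuming that a symbol mapping sends variables to target terms, keeps constants fixed, and maps every relational atom onto some target atom; equalities and comparisons are not matched here but are left to the implication test of the theorem. The helper names (`symbol_mappings`, `is_variable`) and the uppercase-variable convention are illustrative assumptions, not the paper's definitions. The optional `fixed` argument pins output variables, reflecting the proof's remark that a useful mapping must send $\langle O_i \rangle$ to $\langle O[V_d \leftarrow b] \rangle$.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# An atom is (relation, args); args are symbol names (str) or Skolem terms
# encoded as (function_name, args) tuples.
Atom = Tuple[str, tuple]


def is_variable(sym) -> bool:
    # Assumed Prolog-style convention: variables start with an uppercase
    # letter; anything else (e.g. `john`) is a constant mapped to itself.
    return isinstance(sym, str) and sym[:1].isupper()


def symbol_mappings(src: List[Atom], tgt: List[Atom],
                    fixed: Optional[Dict[str, object]] = None
                    ) -> Iterator[Dict[str, object]]:
    """Yield every mapping of src variables to tgt terms that sends each
    src atom onto some tgt atom; `fixed` pre-binds symbols such as outputs."""

    def extend(i: int, mapping: Dict[str, object]) -> Iterator[Dict[str, object]]:
        if i == len(src):
            yield mapping
            return
        rel, args = src[i]
        for trel, targs in tgt:
            if trel != rel or len(targs) != len(args):
                continue
            candidate = dict(mapping)  # copy so a failed branch leaves no trace
            for s, t in zip(args, targs):
                if is_variable(s):
                    if candidate.setdefault(s, t) != t:
                        break  # s is already bound to a different term
                elif s != t:
                    break  # constants map only to themselves
            else:
                yield from extend(i + 1, candidate)

    yield from extend(0, dict(fixed or {}))


hs, hl = ("h_s", ("C", "Cp")), ("h_l", ("C", "Cp"))
# Relational atoms of iQ' and Q^re from Example 5.1 (teach^k written teach_k).
iq_body = [("teach_k", ("john", "C")), ("prereq", ("C'", "Cp'")),
           ("course", ("C''", "S", "L")), ("course", ("Cp''", "Sp", "Lp"))]
qre_body = [("teach_k", ("john", "C")), ("prereq", ("C", "Cp")),
            ("course", ("C", hs, hl)), ("course", ("Cp", "Sp", "Lp"))]
for m in symbol_mappings(iq_body, qre_body, fixed={"Cp": "Cp"}):
    print(m)
```

On these atoms the search yields four mappings, because each of the two `course` atoms of $iQ'$ can land on either `course` atom of $Q^{re}$; the one sending $C''$ to $C$ and $Cp''$ to $Cp$ is the mapping used in Example 5.1. The other three are not discarded by the search; they simply turn equalities such as $C = C''$ into $C = Cp$, giving disjuncts that are harder to satisfy in the implication the theorem tests.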
Example 5.1
We now conclude our running example. The normal form for $iQ$ is

$iQ' = \{\langle Cp \rangle : teach^k(john, C), prereq(C', Cp'), course(C'', S, L), course(Cp'', Sp, Lp), Cp = Cp', Cp = Cp'', C = C', C = C'', L \ge 400, Lp \ge 400\}$,

and the normal form for $ic_1$ is

$ic'_1 : prereq(C, Cp), course(C', S, L), course(Cp', Sp, Lp), C = C', Cp = Cp', L \le Lp \rightarrow$.

To find symbol mappings from $iQ'$ and $ic'_1$ to $Q^{re}$ having the property stated in the theorem, we first recall that $Q^{re}$ is:

$\{\langle Cp \rangle : teach^k(john, C), prereq(C, Cp), course(C, h_s(C, Cp), h_l(C, Cp)), course(Cp, Sp, Lp), Lp \ge 400\}$.

The following is a symbol mapping from $iQ'$ to $Q^{re}$:

$\sigma = \{Cp, Cp', Cp'' \mapsto Cp;\ john \mapsto john;\ C, C', C'' \mapsto C;\ S \mapsto h_s(C, Cp);\ L \mapsto h_l(C, Cp);\ Sp \mapsto Sp;\ Lp \mapsto Lp\}$,

under which the (in)equality subformula of $iQ'$ is mapped to $I_\sigma = (h_l(C, Cp) \ge 400) \wedge (Lp \ge 400)$. There is also a symbol mapping from $ic'_1$ to $Q^{re}$:

$\theta = \{C, C' \mapsto C;\ Cp, Cp' \mapsto Cp;\ S \mapsto h_s(C, Cp);\ L \mapsto h_l(C, Cp);\ Sp \mapsto Sp;\ Lp \mapsto Lp\}$,

under which the (in)equality subformula of $ic'_1$ is mapped to $I_\theta = h_l(C, Cp) \le Lp$. Let $I$ be the (in)equality subformula of $Q^{re}$, i.e., $I = Lp \ge 400$. It is readily checked that the symbol mappings $\sigma$ and $\theta$ make the implication $I \rightarrow (I_\sigma \vee I_\theta)$ true: if $h_l(C, Cp) \ge 400$, then $I_\sigma$ holds; otherwise $h_l(C, Cp) < 400 \le Lp$, so $I_\theta$ holds. Thus by Theorem 5.1, $\{ic_1, rc_1\} \models_f (iQ \supseteq Q)$ is true. Here we can see that the existence of $\sigma$ and $\theta$, and the implication relation, do not depend on any particular $k$; therefore the semantics-based containment holds for any $k \ge 0$. By Theorem ??, $iQ$ is indeed a complete intelligent answer. □
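The implication at the end of Example 5.1 is small enough to check mechanically. The Python sketch below assumes that the level comparisons in the example are $L \ge 400$ and $Lp \ge 400$ in $iQ'$, $L \le Lp$ in $ic'_1$, and $Lp \ge 400$ in $Q^{re}$. It calls the mapping from $iQ'$ $\sigma$ and the mapping from $ic'_1$ $\theta$, and writes `h` for $h_l(C, Cp)$ and `lp` for $Lp$; the function names are illustrative only. Because every formula compares `h` and `lp` only with each other and with the constant 400, a window of integers around 400 realizes every relative ordering of `h`, `lp` and 400, so for these particular formulas the check is exhaustive over the integers.

```python
from itertools import product


def I_qre(h, lp):  # (in)equality subformula of Q^re
    return lp >= 400


def I_sigma(h, lp):  # iQ's (in)equality subformula after applying sigma
    return h >= 400 and lp >= 400


def I_theta(h, lp):  # ic_1's (in)equality subformula after applying theta
    return h <= lp


def implication_holds(disjuncts, window=range(395, 406)):
    """Check I_qre -> OR(disjuncts) at every (h, lp) pair in the window."""
    return all(
        not I_qre(h, lp) or any(d(h, lp) for d in disjuncts)
        for h, lp in product(window, repeat=2)
    )


print(implication_holds([I_sigma, I_theta]))
print(implication_holds([I_sigma]))
```

The first call succeeds, while dropping $I_\theta$ makes the check fail (for example at `h = 395`, `lp = 400`). So with $\sigma$ alone the implication does not hold, and the mapping from $ic'_1$ does real work in establishing the containment.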
6 Conclusions and Future Work
In this paper we have addressed the issue of providing intelligent answers to recursive queries. We first extended the notion of intelligent answers by allowing them to be partial. Then, within the framework of logic programming, we gave a unified procedure for producing intelligent answers. We also gave a necessary and sufficient characterization for testing their completeness based on data semantics. It should be pointed out that complete intelligent answers, while ideal, are often hard to obtain. Most of the time we have to be content with partial ones, for example when there are multiple rules or nonlinear recursive rules defining p. To enlarge the scope of constructing complete intelligent answers, Imielinski studied the interactions between these rules and gave conditions for building complete intelligent answers after pre-evaluating some of the rules. What role semantic constraints can play in this situation is still an open problem.
Intelligent answers are becoming increasingly important for database and knowledge-base queries. The work presented here is preliminary; in our view, further research is needed to discover new types of intelligent answers and effective ways of generating them.
References
[CGM90] U. S. Chakravarthy, J. Grant, and J. Minker, Logic-Based Approach to Semantic Query Optimization. ACM TODS, Vol. 15, 1990, pp. 162-207.
[ChDe87] L. Cholvy and R. Demolombe, Querying a Rule Base. Proc. of the 1st Int'l Conf. on Expert Database Systems, 1987, pp. 365-371.
[CoKa83] S. Cosmadakis and P. C. Kanellakis, Functional and Inclusion Dependencies: A Graph Theoretic Approach. Proc. of ACM Symp. on PODS, 1984.
[FoGr92] M. M. Fonkam and W. A. Gray, Employing Integrity Constraints for Query Modification and Intensional Answer Generation in Multi-database Systems. LNCS 618, 1992, pp. 244-260.
[HaZd80] M. M. Hammer and S. B. Zdonik, Knowledge Based Query Processing. Proc. 6th VLDB, 1980, pp. 137-147.
[Han91] J. Han, Constraint-Based Reasoning in Deductive Databases. Proc. of 7th Data Engineering, 1991, pp. 257-265.
[Imie88] T. Imielinski, Intelligent Query Answering in Rule Based Systems. Foundations of Deductive Databases and Logic Programming, Ed. J. Minker, 1988, pp. 275-311.
[King81] J. J. King, QUIST: A System for Semantic Query Optimization in Relational Databases. Proc. 7th VLDB, 1981, pp. 510-517.
[Klug88] A. Klug, On Conjunctive Queries Containing Inequalities. JACM, Vol. 35:1, 1988, pp. 147-160.
[LeHa88] S. Lee and J. Han, Semantic Query Optimization in Recursive Databases. Proc. of 4th Data Engineering, 1988, pp. 444-451.
[Motr89] A. Motro, Using Integrity Constraints to Provide Intensional Answers to Relational Queries. Proc. 15th VLDB, 1989, pp. 237-246.
[PLO91] H. H. Pang, H. J. Lu, and B. C. Ooi, An Efficient Query Optimization Algorithm. Proc. 7th Data Engineering, 1991, pp. 326-335.
[PiRo89] A. Pirotte and D. Roelants, Constraints for Improving the Generation of Intensional Answers in a Deductive Database. Proc. 5th Data Engineering, 1989, pp. 652-659.
[Scio83] E. Sciore, Inclusion Dependencies and the Universal Instance. Proc. of ACM Symp. on PODS, 1983.
[ShOz87] S. T. Shenoy and Z. M. Ozsoyoglu, A System for Semantic Query Optimization. Proc. ACM SIGMOD, 1987.
[StSh86] L. Sterling and E. Shapiro, The Art of Prolog: Advanced Programming Techniques. MIT Press, 1986.
[Ull89a] J. Ullman, Principles of Database and Knowledge-Base Systems, Volume I. Computer Science Press, 1989.
[Ull89b] J. Ullman, Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989.
[Zhang94] X. Zhang, Implication and Referential Constraints: A New Formal Treatment and the Applications in Query Processing. Ph.D. Thesis, Case Western Reserve University, 1994.
[ZhOz93] X. Zhang and Z. M. Ozsoyoglu, On Efficient Reasoning with Implication Constraints. Proc. of the 3rd Int'l Conf. on DOOD, 1993.
[ZhOz94] X. Zhang and Z. M. Ozsoyoglu, On Reasoning with Implication and Referential Constraints. ILPS Workshop on Constraints and Databases, 1994.