
On Efficient Reasoning with Implication Constraints

Xubo Zhang and Z. Meral Ozsoyoglu

Department of Computer Engineering and Science
Case Western Reserve University
Cleveland, OH 44106
Email: {xzhang, [email protected]
Phone: (216) 368-{8843, 2818}

Abstract

In this paper, we address the complexity of reasoning with implication constraints. We take as central the IC-RFT problem: deciding whether a conjunctive yes/no query always produces the empty relation (a "no" answer) on database instances satisfying a given set of implication constraints. We show that several other important problems, such as the query containment problem, are polynomially equivalent to the IC-RFT problem. More importantly, we give criteria for designing a set of implication constraints so that an efficient "units-refutation" process can be used to solve the IC-RFT problem.

1 Introduction

Semantic constraints are logic rules specifying semantically meaningful database instances. In this paper we consider an important kind of semantic constraint, called implication constraints, that are expressible by empty-headed Horn clauses having positive database literals. Since the early 1980s, there have been investigations into using such constraints to optimize relational queries ([HaZd80], [King81], [ShOz87], [CGM90], etc.), an area known as semantic query optimization. The main tasks are to find inconsistent queries, eliminate joins, delete redundant $\theta$-join predicates, and introduce selection conditions on indexed attributes. More recently, research has investigated a much wider range of "constrained" problems in relational, deductive, and object-oriented databases: for example, the update problem ([Elka90]); the redundancy problem in Datalog ([LeSa92]); the recursive query optimization problem ([LeHa88], [Han91], [PLO91]); and the problem of providing intensional answers ([PiRo89], [Motr89]). The maintenance of non-redundant and consistent constraint bases has also been studied ([OSI90]).

Although all these works involve reasoning with implication constraints, few studies have addressed the complexity issue. In this paper, we discuss one basic problem: deciding whether a conjunctive yes/no query always produces the empty relation (a "no" answer) on database instances satisfying a given set of implication constraints. We call this problem the IC-RFT (read "IC refuting") problem. The IC-RFT problem is important because, as we will show, it is closely related to the well-known containment problem for conjunctive queries ([ChMe76], [Klug88], [Ull89], [ZhOz92], etc.) and to many other problems. Before going into further discussion, let us consider an example.

Example 1.1

Suppose we have a database system for a small company. There is a relation scheme $\mathbf{dept}(\mathbf{dname}, \mathbf{manager})$ and an implication constraint

$ic_1: \mathbf{dept}(D, M), D \neq \mathit{sales}, D \neq \mathit{service} \rightarrow$,

which tells us that there are only two departments: the "sales" department and the "service" department. Now, let us ask the query "are there three different departments?", i.e.,

$Q = \{\langle\rangle : \mathbf{dept}(D_1, M_1), \mathbf{dept}(D_2, M_2), \mathbf{dept}(D_3, M_3), D_1 \neq D_2, D_1 \neq D_3, D_2 \neq D_3\}$.

This is what we call a yes/no query: there is no free variable in the summary $\langle\rangle$, and the variables are existentially quantified. Assuming the constraint has been enforced, i.e., only relation instances satisfying $ic_1$ can be stored in the database, the answer to $Q$ is always "no" (i.e., $Q$ produces the empty set). This answer can be obtained without searching the actual database, by refuting the query (formula) using the constraint. We can prove that $Q$ always answers "no" if and only if the following clauses are unsatisfiable: $(D_1 = \mathit{sales} \vee D_1 = \mathit{service})$, $(D_2 = \mathit{sales} \vee D_2 = \mathit{service})$, $(D_3 = \mathit{sales} \vee D_3 = \mathit{service})$, $D_1 \neq D_2$, $D_1 \neq D_3$, $D_2 \neq D_3$. Indeed, exhaustively testing all the combinations of the (in)equality literals shows that the above clauses are unsatisfiable.

Now, let us look at the problem from another perspective. If we view $Q$ as defining a constraint saying that there do not exist three different departments,

$ic: \mathbf{dept}(D_1, M_1), \mathbf{dept}(D_2, M_2), \mathbf{dept}(D_3, M_3), D_1 \neq D_2, D_1 \neq D_3, D_2 \neq D_3 \rightarrow$,

we know $ic$ is always true if $ic_1$ is enforced, because $ic_1$ logically implies $ic$. Thus $ic$ is redundant. More interestingly, we can view $ic_1$ as defining a yes/no query $Q_1$, asking whether there is a department which is neither the "sales" department nor the "service" department:

$Q_1 = \{\langle\rangle : \mathbf{dept}(D, M), D \neq \mathit{sales}, D \neq \mathit{service}\}$.

It is easy to see that $Q_1 \supseteq Q$ (the result of $Q_1$ always contains the result of $Q$ on any database instance). Intuitively, if there exist three different departments, then one of them must be neither the "sales" department nor the "service" department. $\Box$

Later in the paper we prove that the IC-RFT problem is polynomially equivalent to the query containment problem ([Klug88], [KKR90]) and the constraint redundancy problem ([OSI90]). The query containment problem has recently been proved to be $\Pi_2^p$-complete ([Mey92]), and so is the IC-RFT problem. $\Pi_2^p$ is a class above NP in the polynomial hierarchy ([GaJo79]). More importantly, we give a characterization of a set of implication constraints such that a much more efficient process, called units-refutation, can be used to solve the IC-RFT problem. For this type of constraint set, the IC-RFT problem becomes NP-complete, and even polynomial in a more practical case. We give an example showing that, for constraint sets not satisfying our criterion, the units-refutation process is not sufficient; even so, we show that our results can be used to reduce the complexity. These issues are addressed starting from Section 3, while Section 2 explains some preliminary concepts.
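The exhaustive test mentioned in Example 1.1 can be sketched in a few lines of Python. This is only an illustration, not a procedure from the paper: the function name and the fresh values `other1`, `other2`, `other3` are hypothetical, and the sketch assumes that, for a domain equipped only with $=$ and $\neq$, it suffices to try the two constants plus one fresh value per variable.

```python
from itertools import product

def example_clauses_satisfiable():
    # Candidate values: the two constants plus one fresh value per variable.
    # With only = and != available, such a finite universe is assumed to be
    # enough, because only the pattern of equalities among symbols matters.
    universe = ["sales", "service", "other1", "other2", "other3"]
    for d1, d2, d3 in product(universe, repeat=3):
        each_is_sales_or_service = all(
            d in ("sales", "service") for d in (d1, d2, d3)
        )
        pairwise_distinct = d1 != d2 and d1 != d3 and d2 != d3
        if each_is_sales_or_service and pairwise_distinct:
            return True  # found a satisfying assignment
    return False  # no assignment satisfies all six clauses

print(example_clauses_satisfiable())  # prints False
```

The check fails for every assignment because three pairwise different values cannot all be drawn from a two-element set, which is the intuition stated in the example.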

2 Preliminaries

We assume an underlying many-sorted first-order language. There is a finite set of domains (sorts) $D = \{D_1, \ldots, D_d\}$. Any $D_i$ is either dense and totally ordered with a transitive and irreflexive ordering predicate $>_{D_i}$, or is just equipped with the predicates $=_{D_i}$ and $\neq_{D_i}$. By overloading notation, we omit the subscripts of these predicates. The set of ordering predicates induced from $>$ in the conventional way is denoted $\Theta = \{\geq, \leq, =, \neq\}$. An (in)equality is a formula of one of the two forms $(X\ op\ Y)$ or $(X\ op\ b)$, where the constant $b$ and the variables $X$ and $Y$ are from the same domain, and $op \in \Theta$ if the domain is dense and totally ordered, or $op \in \{=, \neq\}$ otherwise.

A database schema consists of a number of relation schemes, each denoted $f(att_1, \ldots, att_n)$, where $f$ is the relation name, $n$ is the arity, and the $att_i$ are attributes. The domain of $att_i$ is denoted $Dom(f[i])$. A conjunct is an atomic formula of the form $f(s_1, \ldots, s_n)$, where $f$ is a relation name of arity $n$ and each $s_i$ is either a variable or a constant of $Dom(f[i])$. An instance of $f$ is a subset of $Dom(f[1]) \times \cdots \times Dom(f[n])$. A database instance is a set of instances for its relation schemes.

A query is a first-order formula of the form $\{\langle o_1, \ldots, o_k \rangle : \exists X_1, \ldots, \exists X_m [t_1, \ldots, t_p, c_1, \ldots, c_q]\}$, where $k \geq 0$, $p \geq 1$, $q \geq 0$, and:

1. $\langle o_1, \ldots, o_k \rangle$ is called the summary, where each $o_i$ is either a variable, called a distinguished variable (dv), or a constant. We define the type of the query to be $\langle Dom(o_1), \ldots, Dom(o_k) \rangle$, where $Dom(o_i)$ is the domain of $o_i$.
2. $X_i$ is called a nondistinguished variable (ndv); it is not a dv.
3. $t_i$ is a conjunct; $c_i$ is an (in)equality; commas between the $t_i$'s and $c_j$'s denote logical "and".
4. Every variable (dv or ndv) must occur in some conjunct, or can be (transitively) equated either to a variable appearing in some conjunct or to a constant.

The result of applying a query $Q$ to a database instance $Db$, denoted $Q(Db)$, is the set of all instantiations of the summary such that the query formula is satisfied. We will only consider queries whose (in)equality subformula (if any) is satisfiable. A yes/no query is a query whose summary is $\langle\rangle$. The result of a yes/no query is either the set $\{\langle\rangle\}$ or the empty set $\{\}$; the former can be interpreted as answering "yes" and the latter as answering "no".

We will also use union queries. A union query is of the form $union(Q_1, \ldots, Q_r)$, where $Q_1, \ldots, Q_r$ ($r \geq 1$) are queries of the same type. The result of $union(Q_1, \ldots, Q_r)$ on a database instance $Db$ is the set union of the $Q_i(Db)$, for $i = 1, \ldots, r$. Containment between queries (of the same type), denoted, e.g., $Q_1 \supseteq Q$, means $Q_1(Db) \supseteq Q(Db)$ for every database instance $Db$. $Q$ is equivalent to $Q_1$, denoted $Q \equiv Q_1$, if $Q \supseteq Q_1$ and $Q_1 \supseteq Q$.

An implication constraint (IC) is a formula of the form $t_1, \ldots, t_p, c_1, \ldots, c_q \rightarrow$, where $t_1, \ldots, t_p, c_1, \ldots, c_q$ is in the same format as the body of a yes/no query. The variables in it are considered universally quantified. We also assume $c_1, \ldots, c_q$ is satisfiable. A database instance satisfies an IC if all instantiations of the IC with tuples in the corresponding relation instances make the IC true. A database instance satisfies an IC-base if it satisfies all the ICs in the base.

Notations. We denote variables by upper case letters, relation and attribute names by lower case strings in bold face, and constants by lower case strings. Sometimes we write queries and ICs in a shortened form, such as $\{\langle O \rangle : F, I\}$ and $F, I \rightarrow$ respectively, in which $\langle O \rangle$ is the summary, $F$ is the conjunct subformula, and $I$ is the (in)equality subformula. For ICs, (disjunctive) clausal forms are also used, for example $\neg t_1 \vee \cdots \vee \neg t_p \vee \neg c_1 \vee \cdots \vee \neg c_q$, or more concisely $\neg F \vee \neg I$. $\neg I$ is called the (disjunctive) (in)equality subclause.

3 The IC-RFT Problem

Definition 3.1 (IC-RFT problem) Given an IC-base $I_C$ and a yes/no query $Q$, the IC-RFT problem is to decide whether $I_C \models Q \equiv \{\}$, i.e., whether $Q$ always produces the empty relation on database instances satisfying $I_C$.

Before giving a necessary and sufficient condition for the IC-RFT problem, we need a few more notions. First, we say a query $Q = \{\langle O \rangle : F, I\}$ is in normal form if there are only distinct occurrences of variables in the subformula $\{\langle O \rangle : F\}$. $Q$ is said to be in compressed form if there are no explicit or implied equalities in $I$. For example, $\{\langle M \rangle : \mathbf{dept}(\mathit{sales}, M)\}$ is a compressed form query, and it can be rewritten into normal form: $\{\langle M \rangle : \mathbf{dept}(D, M_1), D = \mathit{sales}, M = M_1\}$. The normal and compressed forms for ICs are defined similarly if we view ICs as yes/no queries. Any query can be (polynomially) rewritten into normal or compressed form.

Let $Q$ and $Q_1$ be two queries of the same type. A symbol mapping ([Klug88], [ZhOz92]) $\sigma: Q_1 \rightarrow Q$ is a function from the symbols (variables and constants) of $Q_1$ to those of $Q$ that is the identity on constants and that induces a mapping from the summary of $Q_1$ to that of $Q$, and from the conjuncts of $Q_1$ to those of $Q$. We can similarly define symbol mappings from ICs to yes/no queries if we view ICs as yes/no queries.

Theorem 3.1 Let $I_C$ be an IC-base in which all constraints are in normal form, and let $Q = \{\langle\rangle : F, I\}$ be a yes/no query in compressed form. Then $I_C \models Q \equiv \{\}$ if and only if there are constraints $ic_1: F_1, I_1 \rightarrow, \ldots, ic_r: F_r, I_r \rightarrow$ in $I_C$, and symbol mappings

$\sigma_{1,1}, \ldots, \sigma_{1,n_1}$ ($n_1 > 0$) $: ic_1 \rightarrow Q$,
$\vdots$
$\sigma_{r,1}, \ldots, \sigma_{r,n_r}$ ($n_r > 0$) $: ic_r \rightarrow Q$

such that $I$ implies $\sigma_{1,1}(I_1) \vee \cdots \vee \sigma_{1,n_1}(I_1) \vee \cdots \vee \sigma_{r,1}(I_r) \vee \cdots \vee \sigma_{r,n_r}(I_r)$.

Sketch of the proof. Let $I_C = \{ic_1, \ldots, ic_r, ic_{r+1}, \ldots, ic_m\}$, where $ic_1, \ldots, ic_r$ are the ICs on which symbol mappings to $Q$ can be defined. Let $ic_j = \neg F_j \vee \neg I_j$, for $j = 1, \ldots, m$. Logically, $I_C \models Q \equiv \{\}$ is equivalent to saying that the following formulas are not satisfiable:

$\exists V_Q [F(V_Q), I(V_Q)]$,
$\forall V_1 [\neg F_1(V_1) \vee \neg I_1(V_1)]$,
$\vdots$
$\forall V_m [\neg F_m(V_m) \vee \neg I_m(V_m)]$,

where $V_Q$ are the ndv's of $Q$, and $V_1, \ldots, V_m$ are the ndv's of $ic_1, \ldots, ic_m$ respectively. Expressing the axioms of density and total order in clausal form, together with all the tautologies on constants, we can use resolution with paramodulation to obtain an unsatisfiable set of (in)equality clauses. Since all the ICs are in normal form, every possible resolution between conjunct literals can be performed and paramodulation is not needed. This leads to the conclusion of the theorem. $\Box$

Example 3.1

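Continued from Example 1.1. There are only three symbol mappings from $ic_1$ to $Q$ (recall that $ic_1$ is $\mathbf{dept}(D, M), D \neq \mathit{sales}, D \neq \mathit{service} \rightarrow$, and $Q$ is $\{\langle\rangle : \mathbf{dept}(D_1, M_1), \mathbf{dept}(D_2, M_2), \mathbf{dept}(D_3, M_3), D_1 \neq D_2, D_1 \neq D_3, D_2 \neq D_3\}$). They are $\sigma_1 = \{D \mapsto D_1, M \mapsto M_1\}$, $\sigma_2 = \{D \mapsto D_2, M \mapsto M_2\}$, and $\sigma_3 = \{D \mapsto D_3, M \mapsto M_3\}$. By the above theorem, $\{ic_1\} \models Q \equiv \{\}$ if and only if $(D_1 \neq D_2, D_1 \neq D_3, D_2 \neq D_3)$ implies $(D_1 \neq \mathit{sales}, D_1 \neq \mathit{service}) \vee (D_2 \neq \mathit{sales}, D_2 \neq \mathit{service}) \vee (D_3 \neq \mathit{sales}, D_3 \neq \mathit{service})$. $\Box$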
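The three symbol mappings of Example 3.1 can be enumerated mechanically. The following Python sketch is an illustration under stated assumptions rather than an algorithm from the paper: conjuncts are encoded as `(relation_name, argument_tuple)` pairs, variables are recognized by an upper case first letter (as in the Notations paragraph), and the function names are hypothetical. Since both inputs are yes/no, the summary imposes no extra condition.

```python
from itertools import product

def is_variable(symbol):
    # Convention from the Notations paragraph: variables start upper case.
    return symbol[0].isupper()

def symbol_mappings(ic_conjuncts, query_conjuncts):
    """List every symbol mapping from the IC's conjuncts to the query's."""
    # Each IC conjunct may be sent to any query conjunct with the same
    # relation name and arity.
    options = [
        [q for q in query_conjuncts if q[0] == name and len(q[1]) == len(args)]
        for name, args in ic_conjuncts
    ]
    found = []
    for targets in product(*options):
        mapping, consistent = {}, True
        for (_, args), (_, target_args) in zip(ic_conjuncts, targets):
            for src, dst in zip(args, target_args):
                if is_variable(src):
                    # A variable must be mapped the same way everywhere.
                    consistent &= mapping.setdefault(src, dst) == dst
                else:
                    # Constants are mapped to themselves.
                    consistent &= src == dst
        if consistent and mapping not in found:
            found.append(mapping)
    return found

ic1 = [("dept", ("D", "M"))]
q = [("dept", ("D1", "M1")), ("dept", ("D2", "M2")), ("dept", ("D3", "M3"))]
for sigma in symbol_mappings(ic1, q):
    print(sigma)
```

The enumeration tries every assignment of IC conjuncts to query conjuncts, so its cost grows exponentially with the number of IC conjuncts when a relation name is repeated in the query.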

4 Polynomially Equivalent Problems

In this section, we consider the relationship between the IC-RFT problem and the following problems, and show that they are polynomially equivalent to it: 1) the generalized IC-RFT problem, 2) the query containment problem, 3) the generalized query containment problem, and 4) the IC redundancy problem. We also show that another related problem, the (in)equality redundancy problem, is polynomially reducible to the IC-RFT problem. The basic observation is that queries and implication constraints play complementary roles and can be transformed into each other in a certain way.

First we consider the generalized IC-RFT problem, in which queries are not necessarily yes/no queries but may have arbitrary summaries. Solving this problem allows us to find intrinsic inconsistency of a query with respect to the semantics of the database. This is one of the main issues in semantic query optimization and intensional query answering. Formally, the generalized IC-RFT problem is to decide whether $I_C \models Q \equiv \{\}$, i.e., whether a query $Q$ always produces an empty relation on database instances satisfying an IC-base $I_C$ (here $\{\}$ should be understood as having the same type as $Q$). The following proposition tells us that this problem is polynomially equivalent to the IC-RFT problem.

Proposition 4.1 The generalized IC-RFT problem is polynomially equivalent to the IC-RFT problem.

Proof. The (polynomial) reduction of the IC-RFT problem to the generalized IC-RFT problem is obvious. Here we only show the reduction in the other direction. Let $I_C = \{ic_1, \ldots, ic_n\}$ and $Q = \{\langle O \rangle : F, I\}$. We prove that $I_C \models Q \equiv \{\}$ if and only if $I_C \models \{\langle\rangle : F, I\} \equiv \{\}$. $I_C \models Q \equiv \{\}$ logically means that the formulas $\exists V_D \exists V_N [F, I]$, $ic_1$, ..., and $ic_n$ are unsatisfiable, where $V_D$ and $V_N$ are the dv's and ndv's of $Q$ respectively. This is exactly $I_C \models \{\langle\rangle : F, I\} \equiv \{\}$. $\Box$

Example 4.1

Consider the query $Q$ in Example 1.1, asking whether there are three different departments. Let us modify it into a query that lists the triplets of different department names, i.e., $Q' = \{\langle D_1, D_2, D_3 \rangle : \mathbf{dept}(D_1, M_1), \mathbf{dept}(D_2, M_2), \mathbf{dept}(D_3, M_3), D_1 \neq D_2, D_1 \neq D_3, D_2 \neq D_3\}$. Given any IC-base $I_C$, we can see that $I_C \models Q' \equiv \{\}$ if and only if $I_C \models Q \equiv \{\}$. Indeed, if there are no three different departments then we cannot list any triplet of different departments, and the converse is also true. $\Box$

In the introduction we saw the relationship between the IC-RFT problem and the query containment problem. We now define a more general notion of the query containment problem. Let $Q, Q_1, \ldots, Q_r$ be queries of the same type. The query containment problem is to decide whether $union(Q_1, \ldots, Q_r) \supseteq Q$. To show the equivalence between this problem and the IC-RFT problem, we first give a theorem on a necessary and sufficient condition for containment.

Theorem 4.1 Let $Q = \{\langle O \rangle : F, I\}$ be in compressed form, and let $Q_1 = \{\langle O_1 \rangle : F_1, I_1\}$, ..., $Q_r = \{\langle O_r \rangle : F_r, I_r\}$ be normal form queries of the same type. Then $union(Q_1, \ldots, Q_r) \supseteq Q$ if and only if there are $n_1 + \cdots + n_r \geq 1$ symbol mappings $\sigma_{1,1}, \ldots, \sigma_{1,n_1} : Q_1 \rightarrow Q$, ..., $\sigma_{r,1}, \ldots, \sigma_{r,n_r} : Q_r \rightarrow Q$, such that $I$ implies $\sigma_{1,1}(I_1) \vee \cdots \vee \sigma_{1,n_1}(I_1) \vee \cdots \vee \sigma_{r,1}(I_r) \vee \cdots \vee \sigma_{r,n_r}(I_r)$.

The proof is similar to that of Theorem 3.1 and is omitted here. In [ZhOz92], we proved a similar theorem where $Q$ is in compressed form and $Q_1$, ..., $Q_r$ are reflexive queries that can be in any form.

Proposition 4.2 The query containment problem is polynomially equivalent to the IC-RFT problem.

Proof. First we show the reduction from the query containment problem to the IC-RFT problem. Since a query can be transformed into normal or compressed form in polynomial time, we assume that $Q = \{\langle O \rangle : F, I\}$ is in compressed form and that $Q_1 = \{\langle O_1 \rangle : F_1, I_1\}$, ..., $Q_r = \{\langle O_r \rangle : F_r, I_r\}$ are normal form queries of the same type. From Theorem 3.1 and Theorem 4.1 it follows that $union(Q_1, \ldots, Q_r) \supseteq Q$ if and only if $\{ic_1, \ldots, ic_r\} \models Q' \equiv \{\}$, where $ic_i = \mathbf{out}(O_i), F_i, I_i \rightarrow$ (for $i = 1, \ldots, r$) and $Q' = \{\langle\rangle : \mathbf{out}(O), F, I\}$, and where "out" is an unused predicate name whose domain specifications correspond to the type of $Q$.

By the same reasoning we can show the reduction of the IC-RFT problem to the query containment problem. In fact, given an IC-base $I_C$ consisting of normal form ICs $ic_1: F_1, I_1 \rightarrow$, ..., $ic_r: F_r, I_r \rightarrow$, and a compressed yes/no query $Q$, $I_C \models Q \equiv \{\}$ if and only if $union(\{\langle\rangle : F_1, I_1\}, \ldots, \{\langle\rangle : F_r, I_r\}) \supseteq Q$. $\Box$

We can further generalize the containment problem by considering constraints. More specifically, given an IC-base $I_C$, a union query $union(Q_1, \ldots, Q_r)$, and a query $Q$, the generalized query containment problem is to decide whether $I_C \models union(Q_1, \ldots, Q_r) \supseteq Q$, i.e., whether $union(Q_1, \ldots, Q_r)(Db) \supseteq Q(Db)$ on any database instance $Db$ satisfying $I_C$. This generalized problem, however, does not have higher complexity.
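The first reduction in the proof of Proposition 4.2 is purely syntactic, and a small Python sketch may make it concrete. The encoding is an assumption made for illustration only: a query is a triple `(summary, conjuncts, inequalities)`, an IC is represented by its body alone (its head is empty), and the inputs are taken to be already in the compressed and normal forms the proof requires. The function name is hypothetical.

```python
def containment_to_ic_rft(query, union_queries):
    """Build (IC bodies, yes/no query) from a containment instance.

    A query is a triple (summary, conjuncts, inequalities); an IC is
    represented by its body only, since its head is empty.
    """
    def add_out(summary, conjuncts):
        # Prefix the conjunct list with the fresh conjunct out(summary).
        return [("out", tuple(summary))] + list(conjuncts)

    ic_bodies = [
        (add_out(summary, conjuncts), list(ineqs))
        for summary, conjuncts, ineqs in union_queries
    ]
    summary, conjuncts, ineqs = query
    yes_no_query = ((), add_out(summary, conjuncts), list(ineqs))
    return ic_bodies, yes_no_query

# Q is compressed; Q1 is in normal form (every variable occurs once).
Q = (("M",), [("dept", ("sales", "M"))], [])
Q1 = (("M1",), [("dept", ("D", "M2"))], [("=", "M1", "M2")])
ics, q_prime = containment_to_ic_rft(Q, [Q1])
print(ics)
print(q_prime)
```

The transformation only copies the inputs and adds one conjunct per query, so it is linear in the size of the instance.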

Proposition 4.3 The generalized query containment problem is polynomially equivalent to the IC-RFT problem.

Proof. The reduction of the IC-RFT problem to the generalized query containment problem has been shown in the previous proposition. Using reasoning similar to that in the proof of the previous proposition, we can prove the reduction in the other direction: given an IC-base $I_C$ in which all the ICs are in normal form, a compressed query $Q = \{\langle O \rangle : F, I\}$, and normal form queries $Q_1 = \{\langle O_1 \rangle : F_1, I_1\}, \ldots, Q_r = \{\langle O_r \rangle : F_r, I_r\}$ of the same type, $I_C \models union(Q_1, \ldots, Q_r) \supseteq Q$ if and only if $I_C \cup \{ic_1, \ldots, ic_r\} \models Q' \equiv \{\}$, where $ic_i = \mathbf{out}(O_i), F_i, I_i \rightarrow$ for $i = 1, \ldots, r$, and $Q' = \{\langle\rangle : \mathbf{out}(O), F, I\}$. Here "out" is an unused predicate name whose domain specifications correspond to the type of $Q$. $\Box$

In maintaining an IC-base, when we are about to add a new implication constraint, we need to decide whether the candidate constraint is already implied by the IC-base. This issue was recently investigated in [OSI90]. The example in the introduction illustrated the relationship between this problem and the IC-RFT problem. We now give a more formal statement. Given an IC-base $I_C$ and an implication constraint $ic$, the IC redundancy problem is to decide whether $I_C \models ic$, i.e., whether $ic$ is logically implied by the constraints in $I_C$.

Proposition 4.4 The IC redundancy problem is polynomially equivalent to the IC-RFT problem.

Proof. Here we only give the two reductions and omit the proofs, as they are similar to those of the previous propositions. The reduction of the IC-RFT problem to the IC redundancy problem is: given an IC-base $I_C$ and a yes/no query $Q = \{\langle\rangle : F, I\}$, $I_C \models Q \equiv \{\}$ if and only if $I_C \models (F, I \rightarrow)$. The reduction of the IC redundancy problem to the IC-RFT problem is: given an IC-base $I_C$ and an IC $ic: F, I \rightarrow$, $I_C \models ic$ if and only if $I_C \models \{\langle\rangle : F, I\} \equiv \{\}$. $\Box$

Another important issue in semantic query optimization is determining redundant selection (in)equalities. We may want to introduce some redundant (in)equalities, e.g., ones on indexed attributes; we may also want to eliminate other redundant predicates, e.g., ones across relations. Moreover, removing redundant (in)equalities is needed for deciding whether a join can be removed in the presence of implication and referential constraints ([King81], [CGM90]). Precisely, let $Q = \{\langle O \rangle : F, I, c\}$ be a query in which $c$ is an (in)equality, and let $I_C$ be an IC-base; the (in)equality redundancy problem is to decide whether $I_C \models Q \equiv \{\langle O \rangle : F, I\}$, i.e., whether $Q$ and $\{\langle O \rangle : F, I\}$ always produce the same answer on any database instance satisfying $I_C$. Although this problem is not known to be polynomially equivalent to the IC-RFT problem, the proposition below shows that it can be reduced to the IC-RFT problem.

Proposition 4.5 The (in)equality redundancy problem is (polynomially) reducible to the IC-RFT problem.

Proof. Given an IC-base $I_C$ and a query $Q = \{\langle O \rangle : F, I, c\}$, where $c$ is an (in)equality, $I_C \models Q \equiv \{\langle O \rangle : F, I\}$ if and only if $I_C \models Q \supseteq \{\langle O \rangle : F, I\}$, since it is always true that $I_C \models \{\langle O \rangle : F, I\} \supseteq Q$. We have thus reduced the (in)equality redundancy problem to the generalized query containment problem, which in turn can be reduced to the IC-RFT problem by Proposition 4.3. $\Box$

5 Units-Refuting IC-Bases

5.1 Units-refutation

From Theorem 3.1, it is not difficult to see that if we can identify situations in which the complexity of deciding whether $I$ implies $\sigma_{1,1}(I_1) \vee \cdots \vee \sigma_{1,n_1}(I_1) \vee \cdots \vee \sigma_{r,1}(I_r) \vee \cdots \vee \sigma_{r,n_r}(I_r)$ is reduced, then we can reduce the complexity of the IC-RFT problem. Deciding this implication is logically equivalent to deciding whether the set of (in)equality clauses $\{I, \neg\sigma_{1,1}(I_1), \ldots, \neg\sigma_{r,n_r}(I_r)\}$ is unsatisfiable.

Definition 5.1 (Units-Refutation) Given a set of (in)equality clauses $C_L$, units-refutation is the process defined as follows: use several single-literal clauses to eliminate a contradictory literal in another clause $cl$, and substitute the resulting clause for $cl$ in $C_L$. Repeat these steps until an empty clause is obtained or no more literals can be eliminated.

Notice that units-refutation is similar to "unit resolution" ([ChLe73]), where only two parent clauses are involved in each resolution step. Here we may use more than one (in)equality to refute another (in)equality.

Example 5.1 For $\{(E_1 \neq E_2 \vee T = \mathit{manager}), (T \neq \mathit{manager} \vee S > 50), (E_1 = E_2)\}$, the units-refutation process proceeds as follows:

$\{(E_1 \neq E_2 \vee T = \mathit{manager}), (T \neq \mathit{manager} \vee S > 50), (E_1 = E_2)\}$
$\Longrightarrow \{(T = \mathit{manager}), (T \neq \mathit{manager} \vee S > 50), (E_1 = E_2)\}$
$\Longrightarrow \{T = \mathit{manager}, S > 50, E_1 = E_2\}$. $\Box$

When the units-refutation process terminates with an empty clause, the set of clauses is unsatisfiable. Generally speaking, an exponential algorithm is needed to determine whether a set of (in)equality clauses is unsatisfiable. Although the units-refutation process is not sufficient for general IC-bases, it takes only polynomial time (with respect to the number of literals in the clause set), since deciding whether a conjunction of (in)equalities is satisfiable is polynomial in the number of literals in the conjunction ([OSI90], [Ull89]).
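A minimal Python sketch of the units-refutation loop of Definition 5.1 follows. It is restricted, as an explicit simplification, to domains equipped only with $=$ and $\neq$ (order predicates such as $S > 50$ are not handled), and it assumes an infinite domain so that unconstrained symbols can always take fresh values. Literals are encoded as `(op, x, y)` triples with `op` in `{"=", "!="}`; the helper names and the union-find satisfiability test are illustrative choices, not taken from the paper.

```python
def is_variable(symbol):
    # Variables start with an upper case letter; constants do not.
    return symbol[0].isupper()

def satisfiable(literals):
    """Decide satisfiability of a conjunction of = / != literals."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for op, x, y in literals:
        find(x), find(y)
        if op == "=":
            parent[find(x)] = find(y)
    # Two different constants must never fall into the same class.
    constant_in_class = {}
    for symbol in list(parent):
        if not is_variable(symbol):
            if constant_in_class.setdefault(find(symbol), symbol) != symbol:
                return False
    # No disequality may relate two members of the same class.
    return all(find(x) != find(y) for op, x, y in literals if op == "!=")

def units_refutation(clauses):
    """Return True if the empty clause is derived, False if the process stalls."""
    clauses = [list(cl) for cl in clauses]
    changed = True
    while changed:
        changed = False
        for i, cl in enumerate(clauses):
            # Single-literal clauses other than cl itself.
            units = [c[0] for j, c in enumerate(clauses) if j != i and len(c) == 1]
            for lit in list(cl):
                if not satisfiable(units + [lit]):
                    cl.remove(lit)  # lit contradicts the unit clauses
                    changed = True
            if not cl:
                return True
    return False

refutable = [
    [("!=", "E1", "E2"), ("=", "T", "manager")],
    [("!=", "T", "manager")],
    [("=", "E1", "E2")],
]
print(units_refutation(refutable))  # prints True
```

Each pass calls the conjunction test once per literal, so the loop stays polynomial in the number of literals, in line with the remark above.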

Definition 5.2 An IC-base $I_C$ is called a units-refuting base if, for any yes/no query $Q = \{\langle\rangle : F, I\}$ with $I_C \models Q \equiv \{\}$, there are constraints $ic_1: F_1, I_1 \rightarrow, \ldots, ic_r: F_r, I_r \rightarrow$, and symbol mappings

$\sigma_{1,1}, \ldots, \sigma_{1,n_1}$ ($n_1 > 0$) $: ic_1 \rightarrow Q$,
$\vdots$
$\sigma_{r,1}, \ldots, \sigma_{r,n_r}$ ($n_r > 0$) $: ic_r \rightarrow Q$

such that the units-refutation process can be used to obtain an empty clause from $I$, $\neg\sigma_{1,1}(I_1), \ldots, \neg\sigma_{1,n_1}(I_1)$, ..., $\neg\sigma_{r,1}(I_r), \ldots, \neg\sigma_{r,n_r}(I_r)$.

Recall Theorem 3.1: when $I_C \models Q \equiv \{\}$, there always exist constraints $ic_1$, ..., $ic_r$ and corresponding symbol mappings $\sigma_{1,1}, \ldots, \sigma_{1,n_1}$, ..., $\sigma_{r,1}, \ldots, \sigma_{r,n_r}$, such that $I$, $\neg\sigma_{1,1}(I_1), \ldots, \neg\sigma_{1,n_1}(I_1)$, ..., $\neg\sigma_{r,1}(I_r), \ldots, \neg\sigma_{r,n_r}(I_r)$ are unsatisfiable. The desired property of units-refuting bases is that the units-refutation process is sufficient to decide whether $I$, $\neg\sigma_{1,1}(I_1), \ldots, \neg\sigma_{1,n_1}(I_1)$, ..., $\neg\sigma_{r,1}(I_r), \ldots, \neg\sigma_{r,n_r}(I_r)$ are unsatisfiable, for any $I$ and any symbol mappings $\sigma_{1,1}, \ldots, \sigma_{1,n_1}$, ..., $\sigma_{r,1}, \ldots, \sigma_{r,n_r}$.

Example 5.2

In Example 3.1, we used Theorem 3.1 to show that $\{ic_1\} \models Q \equiv \{\}$ if and only if the following clauses are unsatisfiable: $(D_1 = \mathit{sales} \vee D_1 = \mathit{service})$, $(D_2 = \mathit{sales} \vee D_2 = \mathit{service})$, $(D_3 = \mathit{sales} \vee D_3 = \mathit{service})$, $D_1 \neq D_2$, $D_1 \neq D_3$, $D_2 \neq D_3$. These clauses are indeed unsatisfiable, but the units-refutation process cannot derive the empty clause: it is easily seen that no literal can be eliminated using the single-literal clauses $D_1 \neq D_2$, $D_1 \neq D_3$, and $D_2 \neq D_3$. So $\{ic_1\}$ is not a units-refuting base. $\Box$

We will give examples of units-refuting bases after we give a criterion for them. First of all, we prove that the IC-RFT problem for units-refuting bases is NP-complete.

Theorem 5.1 For any units-refuting IC-base, the IC-RFT problem is NP-complete.

Proof. First we prove that the problem is in NP by showing that the number of symbol mappings needed in the units-refutation process is polynomially bounded. In fact this number is always $\leq 6 \times (\text{number of variables and constants in the query})^2$, which is the number of all possible (in)equalities that can be built using the symbols in the query. This is so because, in the worst case, the units-refutation process produces a single-literal clause from every original (in)equality clause. Now, this number of symbol mappings can be guessed in polynomial time, and by the units-refutation process we can check whether the resulting clauses are unsatisfiable, also in polynomial time. NP-hardness is obvious, since the NP-complete containment problem for equality queries ([ChMe76]) can be reduced to it (see the discussion following Theorem 5.2). $\Box$

5.2 Conflicting (in)equalities

In order to find conditions for an IC-base to be units-refuting, we first consider the relationships between a pair of (in)equalities.

Definition 5.3 (Potentially-Conflicting) Let $c_1$ and $c_2$ be two (in)equalities, and let $c_1'$ and $c_2'$ be obtained from $c_1$ and $c_2$ respectively by renaming the variables so that $c_1'$ and $c_2'$ have no common variables. Then $c_1$ and $c_2$ are called potentially-conflicting (or simply conflicting) if there exists a set of (in)equalities $I$ such that both $I \cup \{c_1'\}$ and $I \cup \{c_2'\}$ are satisfiable, but $I \cup \{c_1', c_2'\}$ is unsatisfiable.

For example, $X \geq 3$ and $Y = 1$ are conflicting, because $\{X \geq 3, Y = 1, X = Y\}$ is unsatisfiable; but $X \geq 3$ and $Y \neq 5$ are not conflicting. An (in)equality may also be (potentially) conflicting with itself, e.g., $D = \mathit{sales}$ and $D = \mathit{sales}$: after we rename $D$ to $D_1$ in one of the literals, we get a pair of conflicting literals, $D = \mathit{sales}$ and $D_1 = \mathit{sales}$.

One important property of a pair of non-conflicting (in)equalities $c_1$ and $c_2$ is that $\sigma_1(c_1)$ and $\sigma_2(c_2)$ are non-conflicting for any symbol mappings $\sigma_1$ and $\sigma_2$, since a symbol mapping can be expressed by a set of equalities. In refuting a query, different instances of ICs may be used, hence we need this property. Also notice that if $c_1$ and $c_2$ are not conflicting, then $\{c_1, c_2\}$ is always satisfiable.

It is easy to see that two (in)equalities over different domains are not conflicting. Using the graph-theoretic method and results in [OSI90], we can prove the conflicting relations between any pair of (in)equalities, as listed in Table 1 and Table 2, where Table 1 is for dense and totally ordered domains and Table 2 is for other domains. Entries marked "NC" mean the pair is not conflicting; other non-blank entries specify the conditions for the pair not to be conflicting; blank entries indicate that the pair is conflicting.

Table 1: Conflict table for dense and totally ordered domains

Columns: $X > a$, $X \geq a$, $X = a$, $X \neq a$, $X < a$, $X \leq a$, $X > Y$, $X \geq Y$, $X = Y$, $X \neq Y$

(The body of Table 1 is not recoverable from the extracted text; its rows begin $U > b$, $U \geq b$, $U = b$.)

Table 2: Conflict table for other domains

|            | $X = a$    | $X \neq a$ | $X = Y$ | $X \neq Y$ |
| $U = b$    |            | $a \neq b$ |         |            |
| $U \neq b$ | $a \neq b$ | NC         |         | NC         |
| $U = V$    |            |            |         |            |
| $U \neq V$ |            | NC         |         | NC         |

Given an IC-base, we have the set of its (in)equality subclauses $\{\neg I_1, \ldots, \neg I_n\}$. In this set, a literal is called a conflicting literal if it is conflicting with itself or with some other literal in the set. A clause that has only one conflicting literal is called a single-conflicting clause. Notice that conflicting literals and single-conflicting clauses are always defined with respect to a specified set of clauses.

Example 5.3

In the set with one clause $\{(D = \mathit{sales} \vee D = \mathit{service})\}$, both literals are conflicting literals, because each literal is conflicting with, say, itself and the other literal. Thus the clause is not a single-conflicting clause. $\Box$

5.3 Units-refuting bases and implementation

In this subsection we give a sufficient condition for an IC-base to be units-refuting. After that, we discuss how to use our result to lower the complexity of solving the IC-RFT problem for units-refuting bases, and for non-units-refuting bases as well. First of all, we give some lemmas about properties of single-conflicting clauses. Our first lemma tells us that single-conflictingness is preserved under symbol mappings.

Lemma 5.1 Let $\{ic_1: \neg F_1 \vee \neg I_1, \ldots, ic_r: \neg F_r \vee \neg I_r\}$ be a set of ICs. Let $\sigma_{i,1}, \ldots, \sigma_{i,n_i}$ be symbol mappings on $ic_i$ (to some query $Q$), for $i = 1, \ldots, r$. Then $\neg\sigma_{i,j}(I_i)$ ($i = 1, \ldots, r$; $j = 1, \ldots, n_i$) is a single-conflicting clause of $\{\neg\sigma_{1,1}(I_1), \ldots, \neg\sigma_{r,n_r}(I_r)\}$ if $\neg I_i$ is a single-conflicting clause in $\{\neg I_1, \ldots, \neg I_r\}$.

The following lemma tells us that single-conflictingness is preserved when subsets of subclauses are considered.

Lemma 5.2 Let $C_L = \{cl_1, \ldots, cl_n\}$ and $C_L' = \{cl_{i_1}', \ldots, cl_{i_m}'\}$ be two sets of (in)equality clauses, such that $1 \leq i_1 < \cdots < i_m \leq n$ and every $cl_{i_j}'$ is a subclause of $cl_{i_j}$. Then $cl_{i_j}'$ ($j = 1, \ldots, m$) is a single-conflicting clause of $C_L'$ if $cl_{i_j}$ is a single-conflicting clause in $C_L$.

The proofs of the above lemmas follow directly from the definitions of conflicting literals and single-conflicting clauses, so they are omitted. Our last lemma tells us that after the units-refutation process, if a clause having two or more literals is a single-conflicting clause, then it is always satisfiable together with the other clauses.

Lemma 5.3 Let $C_L^1$ be a set of (in)equalities and $C_L^2$ be a set of multi-literal (in)equality clauses (ones having two or more literals), such that $C_L^1 \cup \{c_0\}$ is satisfiable for every literal $c_0$ in $C_L^2$. Suppose $C_L^1 \cup C_L^2$ is unsatisfiable. Then no minimal unsatisfiable subset of $C_L^1 \cup C_L^2$ contains a single-conflicting clause of $C_L^2$.

Proof. Let $C_L$ be a minimal unsatisfiable subset of $C_L^1 \cup C_L^2$ and $cl$ be a single-conflicting clause in $C_L^2$. Suppose $cl \in C_L$; we derive a contradiction. Since $C_L - \{cl\}$ is satisfiable, we can form a set $C$ of literals, each from a different clause in $C_L - \{cl\}$, such that $C$ is satisfiable. Let $c$ be a non-conflicting literal of $cl$ in $C_L^2$. We know $C \cup \{c\}$ is unsatisfiable. Let $C_1$ be a minimal unsatisfiable subset of $C \cup \{c\}$. $C_1$ must contain $c$ and at least one other literal $c'$ from $C_L^2$, as $C_L^1 \cup \{c_0\}$ is satisfiable for every literal $c_0$ in $C_L^2$. Because $C_1$ is a minimal unsatisfiable set, $C_1 - \{c\}$ and $C_1 - \{c'\}$ are both satisfiable. But since $c$ and $c'$ are not conflicting, we know $C_1$ is also satisfiable. A contradiction. $\Box$

From the above three lemmas, we can easily prove the following theorem, which is one of our main results for units-refuting bases.

Theorem 5.2 An IC-base $I_C$ is units-refuting if, starting with its set of (in)equality subclauses and repeatedly determining a single-conflicting clause and deleting it from the set, we can obtain the empty set.

As a very special case, if there are only equalities in the (in)equality subformulas of the ICs in $I_C$, then $I_C$ is units-refuting. The reason is that equalities become $\neq$-inequalities in the disjunctive clauses, and no pair of $\neq$-inequalities is conflicting. A more interesting example follows.

Example 5.4

We have already seen the relation scheme dept and the IC $ic_1$ in Example 1.1. The whole database consists of the following relation schemes: dept(dname, manager) and emp(ename, salary, title). The domain specifications for the attributes are:

1. Dom(ename) = Dom(manager);

2. Dom(ename), Dom(dname), and Dom(title) are distinct domains of strings, equipped with $=$ and $\neq$;

3. Dom(salary) is the set of rational numbers.

The IC-base $\{ic_2, ic_3, ic_4\}$ is a units-refuting base, where $ic_2$, $ic_3$, and $ic_4$ are the following implication constraints:

$ic_2$: $emp(E_1, S, T),\ dept(D, E_2),\ E_1 = E_2,\ T \neq manager \rightarrow$,

$ic_3$: $emp(E, S, T),\ T = technician,\ S \geq 30 \rightarrow$,

$ic_4$: $emp(E, S, T),\ T = manager,\ S < 50 \rightarrow$,

where $ic_2$ says that if an employee is a manager of some department, then his title is "manager"; $ic_3$ says that a "technician" earns less than \$30k; and $ic_4$ says that a "manager" earns \$50k or more. The set of the (in)equality subclauses of $\{ic_2, ic_3, ic_4\}$ is as follows: $\{cl_2: (E_1 \neq E_2 \lor T = manager),\ cl_3: (T \neq technician \lor S < 30),\ cl_4: (T \neq manager \lor S \geq 50)\}$. Both $cl_2$ and $cl_3$ are single-conflicting clauses, because in $cl_2$ only $T = manager$ is a conflicting literal, and in $cl_3$ only $S < 30$ is a conflicting literal. Deleting $cl_2$ and $cl_3$, we get $\{cl_4\}$, and $cl_4$ becomes single-conflicting, because the two (in)equalities of $cl_4$ are no longer conflicting literals in $\{cl_4\}$.

However, the IC-base $\{ic_1, ic_2, ic_3, ic_4\}$ is not units-refuting, where $ic_1$ is $dept(D, M),\ D \neq sales,\ D \neq service \rightarrow$ from Example 1.1, because $ic_1$ is not single-conflicting even without any other clauses. In fact, Example 5.2 shows that units-refutation is not sufficient to prove $\{ic_1\} \models Q \equiv \{\}$. In the next section we will show that our IC-base is units-refuting for queries having distinct predicate names. □

Now that we have given a sufficient condition for units-refuting bases, we discuss how to implement an efficient algorithm to solve the IC-RFT problem. In fact, our three lemmas can help us to lower the complexity of the IC-RFT problem even for non-units-refuting bases.

Given an IC-base $I^C$ and a query $Q$, we first find all the symbol mappings from the ICs to $Q$. In the worst case, there are exponentially many symbol mappings. But in practical cases, this number is usually very restricted due to the following factors: 1) only those ICs whose predicate names also appear in the query can be mapped to the query; 2) the number of mappings also depends on the number of times a predicate name appears in the query, and if the query has only distinct predicate names (discussed further in the next section), there are only polynomially many symbol mappings.

Suppose $C^L = \{I, cl_1, \ldots, cl_r\}$ is the set of (in)equalities of $Q$ and the (in)equality subclauses of the ICs after the symbol mappings. We know that $I^C \models Q \equiv \{\}$ if and only if $C^L$ is unsatisfiable. First we do units-refutation on $C^L$. Upon termination, we get the updated $C^L$. If $I^C$ is units-refuting, we can tell at once whether $I^C \models Q \equiv \{\}$ by looking for the empty clause. If $I^C$ is not units-refuting, we can repeatedly delete the multi-literal single-conflicting clauses from $C^L$, and do the exponential checking only on the remaining clauses.
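The repeated deletion of single-conflicting clauses, used in Theorem 5.2 and again in the pruning step just described, can be sketched in Python as follows. This is only an illustration under simplifying assumptions that are not from the paper: every literal is a triple (term, operator, constant), two literals are taken to be conflicting when they constrain the same term and cannot hold together, and a clause counts as single-conflicting when it has at most one conflicting literal (the reading suggested by $cl_4$ in Example 5.4). The names `conflicting` and `delete_single_conflicting` are hypothetical.

```python
import operator

OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def conflicting(a, b):
    """True if literals a and b (term, op, constant) can never hold together."""
    (ta, opa, va), (tb, opb, vb) = a, b
    if ta != tb:
        return False
    if isinstance(va, str):
        # String domains carry only = and !=; a fresh string always exists.
        witnesses = [va, vb, va + vb + "#"]
    else:
        # Dense numeric domain: one witness per region cut out by va and vb.
        lo, hi = min(va, vb), max(va, vb)
        witnesses = [lo - 1, lo, (lo + hi) / 2, hi, hi + 1]
    return not any(OPS[opa](w, va) and OPS[opb](w, vb) for w in witnesses)

def delete_single_conflicting(clauses):
    """Repeatedly delete clauses having at most one conflicting literal."""
    remaining = list(clauses)
    changed = True
    while changed:
        changed = False
        # Conflicts inside the same clause count here (two instances of one IC).
        pool = [lit for cl in remaining for lit in cl]
        for cl in remaining:
            n = sum(any(conflicting(lit, m) for m in pool) for lit in cl)
            if n <= 1:
                remaining.remove(cl)
                changed = True
                break
    return remaining

# (In)equality subclauses of ic1..ic4; attribute names stand in for terms,
# and the variable-to-variable literal E1 != E2 is encoded as a plain triple.
cl1 = [("dname", "=", "sales"), ("dname", "=", "service")]
cl2 = [("E1", "!=", "E2"), ("title", "=", "manager")]
cl3 = [("title", "!=", "technician"), ("salary", "<", 30)]
cl4 = [("title", "!=", "manager"), ("salary", ">=", 50)]

print(delete_single_conflicting([cl2, cl3, cl4]))               # [] : criterion met
print(delete_single_conflicting([cl1, cl2, cl3, cl4]) == [cl1])  # True: cl1 is left
```

An empty result corresponds to the sufficient condition of Theorem 5.2; a non-empty result is the set of clauses on which the exponential check would still have to run.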

6 Units-Refuting Bases for DPN-Queries

In practice, we often encounter queries which do not have multiple conjuncts with the same predicate name. We call such queries distinct-predicate-name (DPN) queries. In this section, we discuss the criterion for units-refuting bases for DPN-queries. In this case, the IC-RFT problem is polynomial, because there are only polynomially many symbol mappings from the ICs to the query. In fact, given a DPN-query, there is at most one instance of each IC involved in the refutation. So, in searching for single-conflicting clauses, we don't need to consider conflicting literals within the same clause.

Given a set of (in)equality clauses, a literal is called an extra-conflicting literal if it is conflicting with a literal in some other clause. A clause which has only one extra-conflicting literal is called a single-extra-conflicting clause. Single-extra-conflicting clauses play a role similar to that of single-conflicting clauses. In fact, if we replace the phrase "single-conflicting" by "single-extra-conflicting" in Lemma 5.2 and Lemma 5.3, the results still hold.
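To illustrate why a DPN-query admits at most one instance of each IC, the following Python sketch enumerates candidate symbol mappings at the level of predicate names only. It rests on an assumption not spelled out in this section: a symbol mapping sends each database literal of an IC to a query conjunct with the same predicate name, and consistency of the induced variable mapping is ignored. The function name `symbol_mappings` is hypothetical.

```python
from itertools import product

def symbol_mappings(ic_atoms, query_atoms):
    """Enumerate candidate mappings from the atoms of one IC to query conjuncts.

    Both arguments are lists of predicate names. Each mapping is a tuple giving,
    for every IC atom, the index of the query conjunct it is sent to.
    """
    choices = [[i for i, q in enumerate(query_atoms) if q == p] for p in ic_atoms]
    return list(product(*choices))

# ic5 has two dept atoms; a query with two dept conjuncts yields 2 * 2 mappings.
print(len(symbol_mappings(["dept", "dept"], ["dept", "dept", "emp"])))  # 4

# For a DPN-query every IC atom has at most one target, so at most one mapping.
print(symbol_mappings(["emp", "dept"], ["dept", "emp"]))  # [(1, 0)]

# An IC mentioning a predicate absent from the query cannot be mapped at all.
print(symbol_mappings(["proj"], ["dept", "emp"]))  # []
```

The number of candidates is the product of the per-atom choice counts, which is why repeated predicate names in the query drive the exponential worst case.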

The following lemma and theorem tell us that we can allow some kinds of clauses other than the single-extra-conflicting clauses and still have a units-refuting base for DPN-queries.

Lemma 6.1 Let $C^L_1$ be a set of (in)equalities, and $C^L_2$ be a set of multi-literal (in)equality clauses, such that $C^L_1 \cup \{c'\}$ is satisfiable for every literal $c'$ in $C^L_2$. Then $C^L_1 \cup C^L_2$ is satisfiable if $C^L_2$ has the following property (*): every literal $c$ in $C^L_2$ is conflicting with at most one literal of the other clauses in $C^L_2$.

Proof. We can build a satisfiable set, $C^L$, of (in)equalities, each from a different clause in $C^L_2$. Initially, $C^L$ contains a literal $c$ from a clause $cl$ of $C^L_2$, and we delete $cl$ from $C^L_2$. While $C^L_2$ is not empty we repeat the following steps:

1. choose a clause from $C^L_2$ which has a literal $c'$ conflicting with the last inserted literal of $C^L$ (if there is no such clause then pick any clause);

2. pick a literal other than $c'$ from the clause, put it into $C^L$, and delete the clause from $C^L_2$.

It is easy to see that $C^L$ thus formed consists of pairwise non-conflicting literals, so $C^L_1 \cup C^L$ is satisfiable. Hence $C^L_1 \cup C^L_2$ is satisfiable. □
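The two steps of the construction in this proof can be sketched in Python as follows. This is a minimal illustration, assuming the conflict relation is supplied explicitly as a set of unordered pairs and that the clauses already satisfy property (*); the function name `pick_literals` and the string encoding of literals are hypothetical.

```python
def pick_literals(clauses, conflicts):
    """Pick one literal per clause, following the proof of Lemma 6.1.

    clauses:   list of multi-literal clauses (lists of hashable literals)
    conflicts: set of frozenset({a, b}) pairs of conflicting literals,
               assumed to satisfy property (*)
    """
    remaining = list(clauses)
    if not remaining:
        return []
    chosen = [remaining.pop(0)[0]]  # initial literal c from some clause cl
    while remaining:
        last = chosen[-1]
        # Step 1: prefer a clause holding a literal that conflicts with `last`;
        # if there is none, fall back to the first remaining clause.
        idx, bad = 0, None
        for i, cl in enumerate(remaining):
            hits = [lit for lit in cl if frozenset((lit, last)) in conflicts]
            if hits:
                idx, bad = i, hits[0]
                break
        # Step 2: take a literal of that clause other than the conflicting one.
        cl = remaining.pop(idx)
        chosen.append(next(lit for lit in cl if lit != bad))
    return chosen

# The two clauses left over in Example 6.1, with their extra-conflicts.
cl1 = ["D=sales", "D=service"]
cl5 = ["D1!=sales", "D2!=service", "M1!=M2"]
conflicts = {frozenset(p) for p in [("D=sales", "D1!=sales"),
                                    ("D=service", "D2!=service")]}

print(pick_literals([cl1, cl5], conflicts))  # ['D=sales', 'D2!=service']
```

Because each chosen literal conflicts with at most one literal elsewhere, and that literal is exactly the one skipped in step 2, the selected literals end up pairwise non-conflicting.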

Theorem 6.1 An IC-base $I^C$ is DPN-units-refuting if, starting with the set of its (in)equality subclauses and iteratively determining a single-extra-conflicting clause and deleting it from the set, we can get a set of clauses satisfying the property (*) in Lemma 6.1.

Proof. Straightforward from Lemma 6.1. □

Example 6.1

Continued from the last example. We show that $\{ic_1, ic_2, ic_3, ic_4\}$ is a DPN-units-refuting base. Since a single-conflicting clause is also a single-extra-conflicting clause, we get $\{(D = sales \lor D = service)\}$ after deleting $cl_2$, $cl_3$, and $cl_4$. Obviously this set satisfies the property (*) of Lemma 6.1, as it has only one clause, so the base is DPN-units-refuting by Theorem 6.1. In fact, $\{ic_1, ic_2, ic_3, ic_4, ic_5\}$ is still a DPN-units-refuting IC-base, in which we have another constraint $ic_5$ telling that no employee is the manager of both the "sales" department and the "service" department:

$ic_5$: $dept(D_1, M_1),\ dept(D_2, M_2),\ D_1 = sales,\ D_2 = service,\ M_1 = M_2 \rightarrow$,

because in $\{(D = sales \lor D = service),\ (D_1 \neq sales \lor D_2 \neq service \lor M_1 \neq M_2)\}$ every literal is conflicting with at most one extra-literal. □
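The check described by Theorem 6.1 can be sketched in Python on the clauses of this example. The sketch assumes that the conflict relation is given explicitly as unordered pairs of literals and that a clause with at most one extra-conflicting literal may be deleted; the names `extra_conflicts` and `dpn_units_refuting` are hypothetical, and literals are encoded as plain strings for illustration only.

```python
def extra_conflicts(lit, clause, clauses, conflicts):
    """Literals in clauses other than `clause` that conflict with `lit`."""
    return [m for other in clauses if other is not clause
            for m in other if frozenset((lit, m)) in conflicts]

def dpn_units_refuting(clauses, conflicts):
    """Sufficient test of Theorem 6.1: delete, then check property (*)."""
    remaining = list(clauses)
    changed = True
    while changed:
        changed = False
        for cl in remaining:
            n = sum(bool(extra_conflicts(lit, cl, remaining, conflicts))
                    for lit in cl)
            if n <= 1:  # single-extra-conflicting (or conflict-free) clause
                remaining.remove(cl)
                changed = True
                break
    # Property (*): each literal conflicts with at most one literal elsewhere.
    return all(len(extra_conflicts(lit, cl, remaining, conflicts)) <= 1
               for cl in remaining for lit in cl)

cl1 = ["D=sales", "D=service"]
cl2 = ["E1!=E2", "T=manager"]
cl3 = ["T!=technician", "S<30"]
cl4 = ["T!=manager", "S>=50"]
cl5 = ["D1!=sales", "D2!=service", "M1!=M2"]
conflicts = {frozenset(p) for p in [
    ("D=sales", "D=service"),       # inside cl1: ignored for DPN-queries
    ("T=manager", "T!=manager"),
    ("S<30", "S>=50"),
    ("D=sales", "D1!=sales"),
    ("D=service", "D2!=service"),
]}

print(dpn_units_refuting([cl1, cl2, cl3, cl4, cl5], conflicts))  # True
```

In this run $cl_2$, $cl_3$, and $cl_4$ are deleted in turn, and the remaining pair $cl_1$, $cl_5$ satisfies property (*), matching the argument of the example.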

If an IC-base is not a DPN-units-refuting base, the implementation principle discussed at the end of the previous section is still applicable: after the units-refutation process, a multi-literal single-extra-conflicting clause cannot be in any minimal unsatisfiable set of clauses. Hence we can delete the multi-literal single-extra-conflicting clauses, and do the exponential checking only on the remaining clauses.

7 Conclusion and Future Work

In this paper we have identified the IC-RFT problem as a central problem in reasoning with implication constraints. Many other important problems are shown to be polynomially equivalent to it. We also gave criteria for designing IC-bases such that the units-refutation process can be used to solve the IC-RFT problem. For this type of IC-base, the complexity of the IC-RFT problem can be reduced from $\Pi^p_2$-complete to NP-complete, and even to polynomial in a more practical case. For IC-bases not totally satisfying our characterization, we can use our lemmas to reduce the complexity.

We suggest two directions for continuing this research. One is to decide single-conflicting clauses dynamically, i.e., with respect to given queries; in this way we expect to further reduce the number of multi-literal clauses that have to be checked exponentially. The other is to incorporate other types of semantic constraints, e.g., referential constraints, along with the implication constraints.

References

[ChMe76] A.K. Chandra and P.M. Merlin, Optimal Implementation of Conjunctive Queries in Relational Databases. Proc. ACM STOC, 1976, pp 77-90.

[CGM90] U.S. Chakravarthy, J. Grant, and J. Minker, Logic-Based Approach to Semantic Query Optimization. ACM TODS, Vol. 15, 1990, pp 162-207.

[ChLe73] C.-L. Chang and R.C. Lee, Symbolic Logic and Mechanical Theorem Proving. Academic Press, 1973.

[Elka90] C. Elkan, Independence of Logic Database Queries and Updates. Proc. of 9th ACM Symp. on PODS, 1990, pp 154-160.

[GaJo79] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Co., San Francisco, 1979.

[Han91] J. Han, Constraint-Based Reasoning in Deductive Databases. Proc. of 7th Data Engineering, 1991, pp 257-265.

[HaZd80] M.M. Hammer and S.B. Zdonik, Knowledge-Based Query Processing. Proc. 6th VLDB, 1980, pp 137-147.

[KKR90] P.C. Kanellakis, G.M. Kuper, and P.Z. Revesz, Constraint Query Languages. Proc. of 9th ACM Symp. on PODS, 1990, pp 288-298.

[King81] J.J. King, QUIST: A System for Semantic Query Optimization in Relational Databases. Proc. 7th VLDB, 1981, pp 510-517.

[Klug88] A. Klug, On Conjunctive Queries Containing Inequalities. JACM, Vol. 35:1, 1988, pp 147-160.

[LeHa88] S. Lee and J. Han, Semantic Query Optimization in Recursive Databases. Proc. of 4th Data Engineering, 1988, pp 444-451.

[LeSa92] A. Levy and Y. Sagiv, Constraints and Redundancy in Datalog. Proc. of 11th ACM Symp. on PODS, 1992, pp 67-80.

[Mey92] R. van der Meyden, The Complexity of Querying Indefinite Data about Linearly Ordered Domains. Proc. of 11th ACM Symp. on PODS, 1992, pp 331-345.

[Motr89] A. Motro, Using Integrity Constraints to Provide Intentional Answers to Relational Queries. Proc. 15th VLDB, 1989, pp 237-246.

[OSI90] Z.M. Ozsoyoglu, S.T. Shenoy, and N.S. Ishakbeyoglu, On the Maintenance of Semantic Integrity Constraints. 1990.

[PLO91] H.H. Pang, H.J. Lu, and B.C. Ooi, An Efficient Query Optimization Algorithm. Proc. 7th Data Engineering, 1991, pp 326-335.

[PiRo89] A. Pirotte and D. Roelants, Constraints for Improving the Generation of Intentional Answers in a Deductive Database. Proc. 5th Data Engineering, 1989, pp 652-659.

[ShOz87] S.T. Shenoy and Z.M. Ozsoyoglu, A System for Semantic Query Optimization. Proc. SIGMOD, 1987.

[Ull89] J. Ullman, Principles of Database and Knowledge-Base Systems, Volume II. Computer Science Press, 1989.

[ZhOz92] X. Zhang and Z.M. Ozsoyoglu, The Containment and Minimization of Inequality Queries. Tech. Report CES 92-18, CWRU, 1992.
