On Containment of Conjunctive Queries with Negation
Victor Felea
"A.I. Cuza" University of Iasi, Computer Science Department, 16 General Berthelot Street, Iasi, Romania
[email protected] http://www.infoiasi.ro
Abstract. We consider the problem of query containment for conjunctive queries with the safe negation property. Several necessary conditions for containment are given. Some of these conditions use maximal cliques of the graphs associated with the first query. These necessary conditions improve the algorithms for the containment problem. For a class of queries, a necessary and sufficient condition for the containment problem is specified. Some aspects of the time complexity of the conditions are discussed.
Keywords: query containment, negation, maximal sets, cliques in graphs.
1 Introduction
Query containment is an important problem in many data management applications, including query optimization, checking of integrity constraints, analysis of data sources in data integration, verification of knowledge bases, finding queries independent of updates, and rewriting queries using views. The problem has attracted many researchers [5, 10, 12, 13, 14, 15, 16, 20, 23]. In [23], J.D. Ullman presents an algorithm based on canonical databases, which uses an exponential number of such databases. In [24], F. Wei and G. Lausen propose an algorithm that uses containment mappings defined for the two queries. This algorithm increases the number of positive atoms of the first query in the containment problem. Many authors study the problem of query containment under constraints. Thus, in [10] C. Farre et al. present the constructive query containment method for checking query containment with constraints, and in [14] N. Huyn et al. consider the problem of incrementally checking global integrity constraints. Some authors approach the containment problem for applications in Web services [8, 9, 18]. F. Afrati uses query containment for rewriting queries using views in [1, 2].

Checking containment of conjunctive queries without negation (called positive queries) is an NP-complete problem [5]. It can be solved by testing the existence of a containment mapping between the two queries. For queries with negation, the query containment problem becomes $\Pi_2^P$-complete. In [16], M. Leclere and M.L. Mugnier investigate the containment problem for conjunctive queries using graph homomorphisms and give sufficient conditions for query containment. S. Cohen et al. reduce
the containment problem to equivalence for queries with expandable aggregate functions in [7]. In a recent paper [11], the author introduces and studies a notion of strong containment, which implies classical containment for two conjunctive queries with negation.

In this paper we specify several necessary conditions for the query containment problem. For a special class of queries, a necessary and sufficient condition is given. The time complexity of the proposed algorithm depends on the number of containment mappings and on the number of sets of equality relations that must be considered. F. Wei and G. Lausen show that in the worst case their algorithm from [24] has the same performance as the one proposed by J. Ullman in [23]. Comparing the number of databases used by Ullman's algorithm with the time complexity (specified in Section 5) of the necessary condition from Proposition 9, we remark that the latter is lower than the former, as is to be expected. The following example points out the utility of our research.

Example 1. Suppose we have some information about companies, products, supply operations and import restrictions for certain products. Consider the schema that consists of the relations COM, PROD, SUPPLY, RESTR, where:
COM(ComId, Country) contains a set of companies whose activity is the import or export of food products. The attribute ComId is an identifier for a company, and Country is the country in which the company identified by ComId is registered.
PROD(ProdId, ProdName) contains data about products. The attribute ProdId is an identifier for a product, and ProdName is the name of the product identified by ProdId.
SUPPLY(ComId1, ComId2, ProdId) contains supply operations: the attribute ProdId represents the product supplied by the company ComId1 to the company ComId2.
RESTR(ProdId, Country1, Country2) contains import-export restrictions: the country Country2 is not allowed to import the product ProdId from the country Country1.
Let us consider two queries, denoted $Q_1$ and $Q_2$, over these relations:
$Q_1$: Find all companies ComId such that there exist two products $p_1$ and $p_2$ and two companies ComId1 and ComId2 that satisfy the following property: ComId supplies the product $p_1$ to the company ComId1, and the country of ComId1 has no import restrictions for the product $p_1$ on imports from the country of ComId; moreover, ComId2 supplies the product $p_2$ to the company ComId, and the country of ComId has no import restrictions for the product $p_2$ on imports from the country of ComId2.
$Q_2$: Find all companies ComId such that there exist a product $p$ and a company ComId1 that satisfy the following property: ComId supplies the product $p$ to the company ComId1, and the country of ComId1 has no import restrictions for the product $p$ on imports from the country of ComId.
If we denote by H the head of the two queries
and use the variables as arguments of literals, we obtain:
$Q_1 : H(x) \mathrel{:-} COM(x, y_1), PROD(p_1, y_2), PROD(p_2, y_3), COM(y_4, y_5), COM(y_6, y_7), SUPPLY(x, y_4, p_1), SUPPLY(y_6, x, p_2), \neg RESTR(p_1, y_1, y_5), \neg RESTR(p_2, y_7, y_1).$
$Q_2 : H(x) \mathrel{:-} COM(x, z_1), PROD(p, z_2), COM(z_3, z_4), SUPPLY(x, z_3, p), \neg RESTR(p, z_1, z_4).$
We want to determine whether $Q_1 \subseteq Q_2$ and whether $Q_2 \subseteq Q_1$. In Example 4 we establish that the first statement is true and the second is false, so the two queries are not equivalent.

The paper is organized as follows. In Section 2 we define the answer of a query for a database, the problem of query containment and the notion of satisfiable query. In Section 3 we give several necessary conditions for two queries to be in the containment relation, and we point out a necessary and sufficient condition for containment in the case where the second query satisfies a certain restriction. In Section 4 we specify a method to compute the sets of equality relations required by the condition formulated in Section 3. In Section 5 we give the time complexity of some of the necessary conditions specified in Section 3. Finally, the conclusion is presented.
2 Preliminaries
Consider two queries $Q_1$ and $Q_2$ having the following forms: $Q_1 : H(x) \mathrel{:-} f_1(x, y)$ and $Q_2 : H(x) \mathrel{:-} f_2(x, z)$, where
$$f_1(x, y) = R_1(w_1), \ldots, R_h(w_h), \neg R_{h+1}(w_{h+1}), \ldots, \neg R_{h+p}(w_{h+p}),$$
$$f_2(x, z) = S_1(w'_1), \ldots, S_k(w'_k), \neg S_{k+1}(w'_{k+1}), \ldots, \neg S_{k+n}(w'_{k+n}). \qquad (1)$$
The vector $x$ consists of all free variables of $Q_1$ and $Q_2$; $y$ and $z$ are vectors that consist of all existentially quantified variables of $Q_1$ and $Q_2$, respectively. The symbols $R_i$ and $S_j$ are relational symbols; the $w_i$ are vectors of variables from $x$ or $y$, and the $w'_j$ are vectors of variables from $x$ or $z$. The character "," between literals represents logical conjunction. For the sake of simplicity we consider queries without constants, but the results also hold if there are constants; the difference lies in the definition of the unifier. The following assumptions are made on the variables of the queries from (1): the variables occurring in the head also occur in the body, and all variables occurring in the negated subgoals also occur in the positive ones. The latter constraint is called the safe negation property. A database is a set of atoms defined on a value domain Dom of constants or variables.
Definition 1. For a query $Q_1$ of the form (1) and a database $D$ on Dom, we define the answer of $Q_1$ for $D$, denoted $Q_1(D)$, as the set of all $H(\tau x)$, where $\tau$ is a substitution for the variables of $x$ such that there is a substitution $\tau_1$ that extends $\tau$ to all variables of $y$ and $D$ satisfies the right-hand side of $Q_1$ under $\tau_1$. Formally,
$$Q_1(D) = \{ H(\tau x) \mid \exists \tau_1 \text{ an extension of } \tau \text{ such that } D \models \tau_1 f_1(x, y) \}. \qquad (2)$$
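Definition 1 can be read operationally: enumerate the substitutions of all body variables and keep those under which every positive atom belongs to D and no negated atom does. The sketch below, in Python, is a brute-force illustration under stated assumptions and not an algorithm from the paper: the encoding of literals as (relation, args, positive), the function name answer, and the small database D are hypothetical. The answer is returned as a set of tuples of head values rather than as atoms H(τx), and the running time is exponential in the number of variables.

```python
from itertools import product


def answer(head_vars, body, database, domain):
    """Brute-force evaluation of Q(D) following Definition 1.

    body:     list of (relation, args, positive) literals
    database: set of (relation, tuple_of_constants) atoms
    domain:   the constants over which substitutions range
    """
    variables = sorted({v for _, args, _ in body for v in args})
    result = set()
    for values in product(sorted(domain), repeat=len(variables)):
        tau1 = dict(zip(variables, values))
        # Positive atoms must be in the database, negated atoms must not.
        satisfied = all(
            ((rel, tuple(tau1[v] for v in args)) in database) == positive
            for rel, args, positive in body
        )
        if satisfied:
            result.add(tuple(tau1[v] for v in head_vars))
    return result


# Body of Q2 from Example 1.
Q2_BODY = [
    ("COM", ("x", "z1"), True),
    ("PROD", ("p", "z2"), True),
    ("COM", ("z3", "z4"), True),
    ("SUPPLY", ("x", "z3", "p"), True),
    ("RESTR", ("p", "z1", "z4"), False),
]

# Hypothetical database: company a (country ro) supplies p1 to b (country fr).
D = {
    ("COM", ("a", "ro")),
    ("COM", ("b", "fr")),
    ("PROD", ("p1", "milk")),
    ("SUPPLY", ("a", "b", "p1")),
}
DOM = {"a", "b", "ro", "fr", "p1", "milk"}

print(answer(("x",), Q2_BODY, D, DOM))  # {('a',)}
# Adding the restriction RESTR(p1, ro, fr) removes a from the answer.
print(answer(("x",), Q2_BODY, D | {("RESTR", ("p1", "ro", "fr"))}, DOM))  # set()
```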
The notation $D \models \tau_1 f_1(x, y)$ means: $\tau_1 R_j(w_j) \in D$ for each $j$, $1 \le j \le h$, and $\tau_1 R_{h+i}(w_{h+i}) \notin D$ for each $i$, $1 \le i \le p$.
Definition 2. We say that the query $Q_1$ is contained in $Q_2$, denoted $Q_1 \subseteq Q_2$, if for each domain Dom and each database $D$ on Dom, the answer of $Q_1$ for $D$ is contained in the answer of $Q_2$ for $D$, that is, $Q_1(D) \subseteq Q_2(D)$.
Definition 3. A query $Q_1$ of the form (1) is satisfiable if there is a database $D$ such that $Q_1(D) \neq \emptyset$; otherwise it is unsatisfiable.
Proposition 1. [5] A query $Q_1$ as in (1) is unsatisfiable iff there are atoms $R_j(w_j)$, $1 \le j \le h$, and $R_{h+i}(w_{h+i})$, $1 \le i \le p$, that are identical, that is, $R_j = R_{h+i}$ and their arguments are equal: $w_j = w_{h+i}$.
When $f_1(x, y)$ satisfies the unsatisfiability condition of Proposition 1, we write $f_1(x, y) = \bot$. Since $f_1(x, y) = \bot$ implies $Q_1(D) = \emptyset$, it is sufficient to consider the case $f_1(x, y) \neq \bot$.
We need to consider the equality relations defined on the set $Y = \{x_1, \ldots, x_q, y_1, \ldots, y_m\}$, where $x_j$, $1 \le j \le q$, are all the variables of $x$ and $y_i$, $1 \le i \le m$, are all the variables of $y$. Let $M$ denote a set of equality relations on $Y$. We express $M$ as $M = \{(t_{\alpha_1}, t_{\beta_1}), \ldots, (t_{\alpha_s}, t_{\beta_s})\}$, with $t_{\alpha_i}, t_{\beta_i} \in Y$. Let $M^*$ denote the reflexive, symmetric and transitive closure of $M$. Thus, $M^*$ produces a set of equivalence classes. We denote by $\hat{y}$ the class that contains $y$. We fix a total order on $Y$, namely $x_1 < \ldots < x_q < y_1 < \cdots < y_m$. Consider a conjunction of literals like $f_1(x, y)$. We define the conjunction denoted $\psi_M f_1(x, y)$ by replacing in $f_1(x, y)$ every variable $t_j$ with $a$, where $a$ is the minimum element of the class $\hat{t}_j$ with respect to ”