
On the Complexity of the Containment Problem for Conjunctive Queries with Built-in Predicates

Phokion G. Kolaitis*, UC Santa Cruz, Santa Cruz, CA 95064, kolaitis@cse.ucsc.edu

David L. Martin, SRI International, Menlo Park, CA 94025, [email protected]

Madhukar N. Thakur, Interbase Software Corporation, Scotts Valley, CA 95066, [email protected]

Abstract

When the inputs are conjunctive queries with $\neq$, $\leq$, or $<$ as built-in predicates, the query containment problem "is $Q_1 \subseteq Q_2$?" is $\Pi_2^p$-complete and, thus, highly intractable. In this paper, we investigate the impact of syntactic and structural conditions on the computational complexity of the query containment problem for safe conjunctive queries with disequation $\neq$ as a built-in predicate. In the case of "pure" conjunctive queries (no built-in predicates), it is known that the boundary between polynomial-time solvability and NP-completeness is crossed when the number of occurrences of any database predicate in $Q_1$ increases from two to three. We show here that, as regards safe conjunctive queries with disequations, the same syntactic condition delineates the boundary between membership in coNP and $\Pi_2^p$-completeness. Moreover, it is also known that the "pure" conjunctive query containment problem is solvable in polynomial time, if the hypergraph associated with the database predicates of $Q_2$ is acyclic. In contrast, we show that the very same structural condition does not lower the computational complexity of the containment problem for safe conjunctive queries with disequations, that is, the problem remains $\Pi_2^p$-complete. We also analyze the computational complexity of the query equivalence problem for conjunctive queries with disequations, when one of the two queries is fixed. We show that this problem can be DP-complete, where DP is the class of all decision problems that are the conjunction of a problem in NP and a problem in coNP. It follows that, as regards conjunctive queries with disequations, the complexity of the query equivalence problem may be higher than the complexity of the query containment problem, when one of the two queries is fixed.

1 Introduction and Summary of Results

Since the early days of relational databases, researchers realized the significance of conjunctive queries and dedicated a considerable amount of attention to the study of their structural and algorithmic properties. In terms of expressive power, the class of conjunctive queries coincides with the class of select-project-join queries of relational algebra. The latter contains most of the queries that are frequently asked by users of relational database systems; for this reason, the convenience and efficiency by which database query languages, such as SQL, handle conjunctive queries is one of the criteria used to compare these languages (see [Ull89]). As regards algorithmic problems, conjunctive query equivalence and conjunctive query containment were identified as fundamental problems, because of their importance in query optimization and evaluation. For example, since computing joins can be prohibitively expensive, equivalence algorithms may be used to find a query that is equivalent to a given conjunctive query and has a minimum number of joins. Moreover, it is clear that conjunctive query equivalence can be reduced to conjunctive query containment. Chandra and Merlin [CM77] showed, however, that both conjunctive query containment and conjunctive query equivalence are NP-complete problems.

The original formulation of conjunctive queries does not allow for comparisons between data values. In practice, however, users often want to ask select-project-join queries that do involve such comparisons in the selection condition. Klug [Klu88] extended the class of conjunctive queries by allowing inequality $\leq$ comparisons and strict inequality $<$ comparisons between variables or between variables and constants. Moreover, Klug [Klu88] investigated the containment problem for these extended queries and showed that it belongs in $\Pi_2^p$, the second level of the polynomial hierarchy introduced by Stockmeyer [Sto77]. He also raised the question of finding the exact complexity of this problem, and conjectured that membership in $\Pi_2^p$ is a tight upper bound.
Later on, van der Meyden [vdM97] confirmed Klug's conjecture by establishing that the containment problem for conjunctive queries with inequalities (or with strict inequalities) is $\Pi_2^p$-complete. Moreover, van der Meyden [vdM97] pointed out that his proof can be adapted to show that the containment problem for conjunctive queries with disequations $\neq$ is $\Pi_2^p$-complete as well.

In recent years, the problem of information integration has occupied a prominent place in database research. In a nutshell, this problem is concerned with the extraction of information from heterogeneous sources, such as legacy databases or the web. A fruitful approach to this problem is to use an architecture based on mediators between the sources and the users. Usually, mediators maintain materialized views over the sources, and attempt to answer users' queries by synthesizing these views (see [Ull97]). In turn, this has resulted in a renewed interest in the conjunctive query containment problem, since algorithms based on query containment can be used to answer queries by synthesizing materialized views [LMSS95, RSU95].

In view of the NP-hardness of the conjunctive query containment problem, researchers attempted to discover restricted classes of conjunctive queries for which the containment problem "is $Q_1 \subseteq Q_2$?" can be solved in polynomial time. Specifically, Saraiya [Sar91] showed that conjunctive query containment can be solved in linear time, if every database predicate occurs at most twice in the body of $Q_1$. It should be noted that this result is optimal, since conjunctive query containment is NP-complete, if some database predicate is allowed to occur at least three times in the body of $Q_1$. After this, Qian [Qia96] showed that conjunctive query containment can be solved in polynomial time, if $Q_2$ is an acyclic query (this also follows from earlier results by Yannakakis [Yan81] on conjunctive query evaluation). Acyclicity is a structural condition on the hypergraph obtained from a conjunctive query by viewing the variables as nodes and the subgoals as hyperedges. More recently, Chekuri and Rajaraman [CR97] established that for every $k \geq 1$, conjunctive query containment can be solved in polynomial time, if $Q_2$ has query-width at most $k$ and a query decomposition of $Q_2$ of width $k$ is given. This is a considerable extension of Qian's result [Qia96], because acyclic queries are precisely the queries that have query-width 1.

At the end of their paper, Chekuri and Rajaraman [CR97] raised the question of whether the tractability results for conjunctive queries extend to conjunctive queries with built-in predicates. Our goal in this paper is to provide answers to this question by focusing on disequation $\neq$ as a built-in predicate, since (apart from equality) it is the most basic and most frequently used such predicate.
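The two restrictions recalled above, a bound on how often a database predicate occurs in a query body and acyclicity of the query hypergraph, can both be checked mechanically. The following is a minimal sketch, assuming a query body is represented as a list of `(predicate, args)` pairs whose arguments are variable names, and using GYO reduction as the acyclicity test. The representation and the function names `max_occurrences` and `is_acyclic` are illustrative assumptions and do not come from the paper.

```python
from collections import Counter


def max_occurrences(body):
    """Largest number of subgoals in the body that share one database predicate."""
    counts = Counter(pred for pred, _ in body)
    return max(counts.values(), default=0)


def is_acyclic(body):
    """GYO reduction: variables are nodes, each subgoal's variable set is a hyperedge."""
    edges = [set(args) for _, args in body]
    changed = True
    while changed:
        changed = False
        # Rule 1: delete every variable that occurs in exactly one hyperedge.
        for e in edges:
            lonely = {v for v in e if sum(v in f for f in edges) == 1}
            if lonely:
                e -= lonely
                changed = True
        # Rule 2: delete one hyperedge that is empty or contained in another one.
        for i, e in enumerate(edges):
            if not e or any(i != j and e <= f for j, f in enumerate(edges)):
                del edges[i]
                changed = True
                break
    # The hypergraph is acyclic exactly when the reduction removes every hyperedge.
    return not edges


# Hypothetical example bodies: a path of two edges and a triangle.
path = [("E", ("x", "y")), ("E", ("y", "z"))]
triangle = [("E", ("x", "y")), ("E", ("y", "z")), ("E", ("z", "x"))]
```

Under this representation the path is acyclic with two occurrences of `E`, while the triangle is cyclic with three occurrences of `E`. Disequations are deliberately left out of the hypergraph here, in line with taking only the database predicates into account.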
Specifically, we analyze the containment problem for safe conjunctive queries with disequations; that is, disequations $X \neq Y$ are allowed in the body of the query, and every variable appearing in a disequation also appears as an argument in at least one of the database predicates in the body. First, we revisit van der Meyden's [vdM97] $\Pi_2^p$-hardness results for conjunctive queries with inequalities or strict inequalities. These results used a reduction from $\Pi_2$-SATISFIABILITY, that is, satisfiability of quantified Boolean formulas with a $\forall^*\exists^*$ quantifier prefix. By perusing and slightly adapting his proof, we observe that the containment problem "is $Q_1 \subseteq Q_2$?" is $\Pi_2^p$-hard, if $Q_1$ and $Q_2$ are safe conjunctive queries with disequations and each database predicate occurs at most four times in the body of $Q_1$. Thus, the question becomes: what is the complexity of the query containment problem for safe conjunctive queries with disequations, if each database predicate occurs fewer than four times in the body of $Q_1$?

Our first main result asserts that the query containment problem for safe conjunctive queries with disequations is $\Pi_2^p$-complete, if each database predicate occurs at most three times in the body of $Q_1$. It is interesting to note that the lower bound is established via a reduction from the 3-COLORING EXTENSION problem, which Ajtai, Fagin and Stockmeyer [AFS98] showed very recently to be $\Pi_2^p$-hard. Moreover, the same reduction from 3-COLORING EXTENSION shows that the query containment problem "is $Q_1 \subseteq Q_2$?" for safe conjunctive queries with disequations remains $\Pi_2^p$-hard, even if the hypergraph associated with the variables of $Q_2$ and the database predicates of $Q_2$ is acyclic. In particular, this means that the tractability results of Qian [Qia96] and Chekuri-Rajaraman [CR97] do not extend to conjunctive queries with $\neq$ as a built-in predicate, if the hypergraph of $Q_2$ is obtained by taking into account only the subgoals of $Q_2$ that involve database predicates (i.e., disequations do not contribute hyperedges).

Finally, we turn attention to the case in which every database predicate occurs at most twice in the body of $Q_1$. We show that in this case the containment problem for safe conjunctive queries with disequations is in coNP and, thus, is easier than the general case. It should be pointed out that the restriction to safe conjunctive queries is important for the coNP upper bound, since, otherwise, we can show that the problem is $\Pi_2^p$-hard via a reduction from 3-COLORING EXTENSION. We conjecture that query containment for safe conjunctive queries with disequations is coNP-complete, if every database predicate occurs at most twice in the body of $Q_1$. Although we have not been able to establish the lower bound so far, we offer some partial evidence for this conjecture by showing that the related containment problem "is $Q_1 \subseteq Q_2$?" is coNP-hard, if $Q_1$ is a safe conjunctive query with disequations in which every database predicate occurs at most twice, and $Q_2$ is a union of safe conjunctive queries with disequations.

In the problems considered up to this point, both queries $Q_1$ and $Q_2$ serve as input to the question "is $Q_1 \subseteq Q_2$?". In the last section of this paper, we examine the complexity of query containment and query equivalence, when one of the two queries is fixed. For conjunctive queries with no built-in predicates, it is known that query containment can be NP-complete, if $Q_1$ is fixed, but is solvable in polynomial time, if $Q_2$ is fixed. Moreover, if one of the conjunctive queries is fixed, then query equivalence is in NP, and can be NP-complete. For safe conjunctive queries with disequations, results of van der Meyden [vdM97] imply that if $Q_1$ is fixed, then query containment is in NP, while if $Q_2$ is fixed, then query containment is in coNP.
Moreover, there are queries for which these problems are NP-complete and coNP-complete, respectively. In contrast, we show here that, when one of the two queries is fixed, query equivalence of safe conjunctive queries with disequations may have higher complexity than the corresponding query containment problem. Specifically, we show that this restricted query equivalence problem can be DP-complete, where DP consists of all problems that are the conjunction of a problem in NP and a problem in coNP. The class DP was introduced by Papadimitriou and Yannakakis [PY82]; several natural decision problems from combinatorial optimization and logic, such as EXACT TSP and CRITICAL SATISFIABILITY, are known to be DP-complete (see [Pap94]). There are, however, very few problems from database theory that are known to be DP-complete; in fact, to the best of our knowledge, the only other such problem is a certain query evaluation problem studied by Cosmadakis [Cos83].

2 Preliminaries

An n-ary conjunctive query $Q$ is an existential first-order formula $(\exists z_1 \cdots \exists z_m)\,\psi(x_1, \ldots, x_n, z_1, \ldots, z_m)$, where the quantifier-free part $\psi(x_1, \ldots, x_n, z_1, \ldots, z_m)$ is a conjunction of atomic formulas in which the predicate symbols are interpreted by extensional database predicates (EDBs). The free variables $x_1, \ldots, x_n$ of the formula are called the distinguished variables of $Q$. A conjunctive query is usually written as a Prolog-style rule, whose head is $Q(x_1, \ldots, x_n)$ and whose body is $\psi(x_1, \ldots, x_n, z_1, \ldots, z_m)$. The atomic formulas in the body of the rule are called its subgoals. For example, the formula $(\exists z_1 \exists z_2)(E(x_1, z_1) \wedge E(z_1, z_2) \wedge B(z_1) \wedge R(z_2))$ defines a unary conjunctive query $Q$, which in rule form is written as $Q(x_1)$ :- $E(x_1, z_1)$, $E(z_1, z_2)$, $B(z_1)$, $R(z_2)$.
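To make the rule notation concrete, the sketch below evaluates a rule of the shape Q(x1) :- E(x1, z1), E(z1, z2), B(z1), R(z2) over a small database, and then reuses the evaluator for the classical containment test of Chandra and Merlin [CM77] for pure conjunctive queries: $Q_1 \subseteq Q_2$ holds exactly when the head of $Q_1$ is an answer of $Q_2$ on the canonical database obtained by treating the variables of $Q_1$ as constants. The data representation, the function names `evaluate` and `contained_in`, and the sample database are assumptions made for illustration. The sketch does not handle disequations and assumes that every head variable also occurs in the body.

```python
def evaluate(head, body, db):
    """Return the set of head tuples produced by all satisfying assignments.

    head: tuple of variable names; body: list of (predicate, args) subgoals;
    db: dict mapping each predicate name to a set of tuples (its facts).
    """
    assignments = [{}]
    for pred, args in body:
        extended = []
        for env in assignments:
            for fact in db.get(pred, set()):
                new_env = dict(env)
                # A fact matches if it agrees with every variable bound so far.
                if len(fact) == len(args) and all(
                    new_env.setdefault(a, c) == c for a, c in zip(args, fact)
                ):
                    extended.append(new_env)
        assignments = extended
    return {tuple(env[v] for v in head) for env in assignments}


def contained_in(q1, q2):
    """Test q1 ⊆ q2 for pure conjunctive queries via the canonical database of q1."""
    head1, body1 = q1
    head2, body2 = q2
    # Freeze q1: each variable acts as a constant, each subgoal becomes a fact.
    canonical_db = {}
    for pred, args in body1:
        canonical_db.setdefault(pred, set()).add(tuple(args))
    return tuple(head1) in evaluate(head2, body2, canonical_db)


# Hypothetical example: the rule above and a query that only asks for an outgoing edge.
q = (("x1",), [("E", ("x1", "z1")), ("E", ("z1", "z2")), ("B", ("z1",)), ("R", ("z2",))])
p = (("x",), [("E", ("x", "y"))])
db = {"E": {(1, 2), (2, 3), (3, 4)}, "B": {(2,)}, "R": {(3,)}}

print(evaluate(q[0], q[1], db))  # {(1,)}
print(contained_in(q, p))        # True
print(contained_in(p, q))        # False
```

The evaluator is a plain nested-loop join, which keeps the sketch short at the cost of efficiency; the containment test inherits the exponential worst case that the NP-completeness result predicts.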
