Properties of Languages That Make Recursive Views Unmaintainable†

Guozhu Dong

Department of Computer Science University of Melbourne Parkville, Vic. 3052, Australia Email: [email protected]

Leonid Libkin

Bell Laboratories 600 Mountain Avenue Murray Hill, NJ 07974, USA Email: [email protected]

Limsoon Wong

BioInformatics Centre & Institute of Systems Science Singapore 119597 Email: [email protected]

Abstract

We study the problem of maintaining recursively-defined views, such as the transitive closure of a relation, in traditional relational languages that do not have recursion mechanisms. The main results of this paper show that in most cases such incremental maintenance is impossible if either no auxiliary relations are used or they are limited to be deterministic. Instead of concentrating on proving these results for some particular languages, we try to identify properties of query languages that make such incremental maintenance impossible. That is, we want to use known results on expressive power of languages to derive new results on expressiveness of incremental recomputation. We identify two properties, studied previously in the literature on expressive power of query languages and finite-model theory, that imply unmaintainability of several recursive queries under insertions or deletions. Furthermore, using known results on expressive power of query languages, we simplify existing proofs on unmaintainability of the transitive closure and same-generation queries in relational calculus, and derive new results showing that these queries remain inexpressible in the presence of aggregate functions. Finally, we relate the complexity of updating transitive closure to that of updating the same-generation query and show that the latter is strictly harder than the former, and we extend this to updating queries based on context-free sets.

1 Introduction

It is well known that relational calculus (equivalently, first-order logic) cannot express recursive queries such as transitive closure or same-generation, cf. [1]. This is one of the main reasons why languages extending first-order logic (such as various fixpoint logics) have been so extensively studied in database theory. However, most practical database systems still use query languages with limited expressive power. Indeed, plain SQL, which is used for writing the majority of queries, is essentially first-order logic extended with grouping and aggregation, and as such it cannot code recursion mechanisms. What can one do if one needs to know the result of a recursive query? One possibility is to use a general-purpose programming language to compute such a query. However, this may not be desirable,

* Part of this work was done when the authors were visiting each other in 5 out of 6 possible combinations (Libkin never went to Melbourne).
† Submitted to Information & Computation. A preliminary version of this paper appeared in the Proceedings of the 5th International Workshop on Database Programming Languages, Gubbio, Italy, September 1995.


for at least two reasons. First, one no longer has access to a declarative query language. Second, one no longer has access to a query language optimizer.

An alternative solution that has attracted a lot of attention recently is the following: use a general-purpose programming language to compute the initial result of a query, and then update the result every time the database changes. For example, for the transitive closure query this amounts to updating the transitive closure of a graph every time an edge is inserted or deleted. The problem of updating the results of queries (called views) when the underlying database changes is known as view maintenance. There is also extensive literature on dynamic algorithms (see, for example, [20, 24]), which does not consider the issue of the query language in which updates are expressed. Since databases are normally queried and updated by languages of limited expressive power, this issue becomes important for view maintenance.

There is a large body of literature on view maintenance that assumes that the view is defined and maintained using the same language. Numerous algorithms exist dealing with fragments of relational algebra [2], full relational algebra [26, 17], bag (multiset) languages [16, 5], languages with grouping and aggregation [19, 27] and others; see [18] for a survey. However, much less is known about the case when a view is defined in one, more powerful language and maintained in another, less powerful one. Those papers that do consider this situation deal with the case when a recursive query is computable in polynomial time and definable in a language such as recursive datalog, and the maintenance is done in relational calculus. The query that has received the most attention is the transitive closure. It can be easily shown that the transitive closure can be maintained under the insertion of edges [8, 4].
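To make the insertion case concrete, here is a minimal Python sketch of the standard non-recursive update: after inserting an edge (a, b), a pair (x, y) joins the closure exactly when x is a or reaches a, and y is b or is reached from b, in the old closure. The function names are ours, and the formulation is an illustration under that assumption rather than the exact construction of [8, 4].

```python
from itertools import product

def transitive_closure(edges):
    """Reference computation by naive fixpoint iteration (used only for checking)."""
    tc = set(edges)
    while True:
        new = {(x, w) for (x, y) in tc for (z, w) in tc if y == z} - tc
        if not new:
            return tc
        tc |= new

def insert_edge(tc, a, b):
    """Update the closure after inserting edge (a, b), without any iteration.

    Sources are a and everything that reached a in the old closure;
    targets are b and everything b reached in the old closure.
    """
    sources = {a} | {x for (x, y) in tc if y == a}
    targets = {b} | {y for (x, y) in tc if x == b}
    return tc | set(product(sources, targets))

edges = {(1, 2), (2, 3), (4, 5)}
tc = transitive_closure(edges)
tc_after = insert_edge(tc, 3, 4)  # joins the two paths into 1 -> 2 -> 3 -> 4 -> 5
```

The update uses only selections, a product and a union over the old closure, which is why it fits within relational calculus; no comparably simple formula is available for deletions.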
A more interesting result of [25] shows that transitive closure of undirected graphs can always be maintained (that is, under both deletions and insertions), provided some auxiliary ternary relations can be used. It was further strengthened in [9], which showed that transitive closure of undirected graphs can be maintained using only binary auxiliary relations. They also showed that it cannot be done using only unary auxiliary relations. For directed graphs, the situation is more complex. The best positive solution so far is that of [7]: the transitive closure of acyclic graphs can be maintained in relational calculus, under both insertions and deletions. But this is not completely satisfactory because acyclicity itself cannot be tested in relational calculus [14], although it can be tested in the presence of the transitive closure. More examples of queries that can be maintained in relational calculus can be found in [25, 9].

It is conjectured that most recursive queries, such as the transitive closure, cannot be maintained in relational calculus, no matter what auxiliary relations (of polynomial size) are used. Partial proofs of this result exist: for instance, it was shown in [9] that auxiliary relations of arity up to 2 do not help. However, no general results of this kind are known yet.

Another deficiency of existing results is that they are very closely tied to a particular language, the relational calculus. The negative results in papers such as [9] heavily rely on the fact that the relational calculus has precisely the power of first-order logic, by using techniques developed for the study of first-order logic (e.g., Ehrenfeucht-Fraïssé games). Thus, existing results are not robust: one cannot use existing techniques to extend these results to other languages. One extension that we have in mind is to a language with aggregation.
A number of results obtained recently [21, 6, 23] show that in terms of expressive power languages with aggregation are rather close to relational calculus. Thus, one may expect that they have similar power in terms of maintenance of views. However, none of the existing proofs on the limitations of incremental expressive power of relational calculus applies directly

to languages with aggregation. Thus, the main goal of this paper is to find properties of query languages (describing their expressiveness) that would imply unmaintainability of certain recursive views. The properties we describe here are the ones typically used as tools in finite-model theory for proving inexpressibility results. In particular, the fact that they are possessed by relational calculus (even with aggregate functions) can be established by citing known results. In terms of recursive queries, we concentrate on the two most famous examples of queries expressible in datalog but not in relational calculus: transitive closure and same-generation. Given the fact that some recursive queries can be maintained in relational calculus, it is probably impossible to find general characterizations of this kind, and thus one has to concentrate on some particular queries. However, we believe that the techniques developed in this paper are easily extendible to deal with other queries. While our results do generalize in the direction of allowing more powerful incremental languages, it should be pointed out that our results are for situations where no auxiliary relations are used or where they are limited to be deterministic. In contrast, the negative results of [9] allow nondeterministic auxiliary relations.

The rest of the paper is organized as follows. In Section 2 we define two properties of languages. One is called cycle-simplicity. It says that the language cannot distinguish two cycles if they are long enough. That is, for any Boolean query $Q$ on graphs that is definable in the language, there exists a constant $k$, depending on $Q$ only, such that for any two cycle graphs $R$ and $R'$ of length at least $k$, $Q(R) = Q(R')$. This property is known to hold for first-order logic [14], and was used previously in finite-model theory research [13]. The second property states that the language cannot express the transitive closure of a chain graph.
Equivalently, it cannot test if the graph is a chain. This amounts to inexpressibility of DLOGSPACE-complete problems (see [12]). Again, this property was used before in the work on expressive power of languages [21, 12, 6]. We also note that both relational calculus and plain SQL (an extension of relational calculus with grouping and aggregation) possess these properties. We define the framework for incremental maintenance in Section 3. In Section 4, we consider incremental maintenance of transitive closure. We note that it can be maintained under the insertion of edges, and concentrate on the deletion case. Our main result is that any language that cannot test if the input graph is a chain cannot maintain the transitive closure query. In particular, this applies to relational calculus and plain SQL. We also consider the case when some limited auxiliary data is allowed. In Section 5, we study the same-generation query. We first show that any language that cannot test if the input graph is a chain cannot maintain the same-generation query under deletions. Further, we show that any cycle-simple language cannot maintain the same-generation query under insertions. Since relational calculus and SQL have both of these properties, we obtain that neither can maintain same-generation under any kind of update. In Section 6, we consider a generalization of the same-generation query: context-free chain queries. We show that any such query given by an infinite context-free language is harder than the transitive closure query, as far as incremental maintenance is concerned. In particular, the same-generation query is harder than transitive closure. We give concluding remarks in Section 7.


2 Language properties

In this section we introduce two main properties, cycle-simplicity and recursion-freeness, study the relationship between them, and note that some familiar query languages possess both properties.

Definition 2.1 A single cycle is a graph $\langle V, E \rangle$ where $V = \{v_1, \ldots, v_n\}$ is a set of $n$ distinct nodes and $E = \{(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n), (v_n, v_1)\}$ for some $n \geq 1$; a chain is a graph $\langle V, E' \rangle$ where $V$ is as above and $E' = \{(v_1, v_2), (v_2, v_3), \ldots, (v_{n-1}, v_n)\}$. □

In what follows, we shall always deal with languages that have at least the power of relational calculus. Given that, we can assume that they are closed under the usual Boolean connectives and first-order quantification. We now introduce our two main properties, and show how they are related.

Definition 2.2 A language $L$ is cycle-simple if it cannot test properties of single cycles that are both infinite and coinfinite. In other words, for every Boolean query $Q$ on graphs¹ definable in $L$, there exists a constant $k$, depending on $Q$ only, such that $Q(R) = Q(R')$ for any two single cycles $R$ and $R'$ of length at least $k$. A language $L$ is recursion-free if it cannot test if an arbitrary graph is a chain.
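As an illustration of the constant $k$ in Definition 2.2, the following Python sketch evaluates one first-order query on single cycles of growing length and contrasts it with the parity of the cycle length, a property that is both infinite and coinfinite on cycles. The helper names and the choice of query are ours, picked only for illustration.

```python
def single_cycle(n):
    """Edge set of a single cycle on nodes 1..n (n >= 1), as in Definition 2.1."""
    return {(i, i % n + 1) for i in range(1, n + 1)}

def has_short_cycle(edges):
    """First-order query: there exist x, y with E(x, y) and E(y, x)."""
    return any((y, x) in edges for (x, y) in edges)

def has_even_length(edges):
    """Parity of the number of edges; on a single cycle this is the cycle length."""
    return len(edges) % 2 == 0

# Answers on single cycles of length 1..10.
answers_fo = [has_short_cycle(single_cycle(n)) for n in range(1, 11)]
answers_parity = [has_even_length(single_cycle(n)) for n in range(1, 11)]
```

Here `answers_fo` is constant from length 3 onwards, so $k = 3$ works for this query, whereas `answers_parity` keeps alternating; a cycle-simple language can define queries of the first kind on cycles but not of the second.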

These two properties have been looked at before. First-order logic is cycle-simple (this follows easily from Gaifman's locality theorem [14], for example); furthermore, inability to distinguish large cycles was used as a tool in proving various expressiveness results.

The second property deserves some explanation. Inability to test if a graph is a chain is essentially the same as inability to express DLOGSPACE-complete queries (e.g., deterministic transitive closure); this follows from [12]. DLOGSPACE appears to be the lowest complexity class that allows definitions of recursive queries (such as deterministic transitive closure), those typically expressed in languages such as Datalog or fixpoint logics. Various complexity classes conjectured to be below DLOGSPACE (e.g., $TC^0$) appear to be incapable of expressing such recursive queries; for classes such as $AC^0$, this is known. Thus, DLOGSPACE-complete queries are a good candidate for the "simplest recursive queries." We use inexpressibility of a particular query (testing for a chain) in the definition in order to make it as simple as possible, and to avoid dealing with reductions with respect to which the problem is complete, and with the ambient logic, since these issues are not particularly relevant to the problems we are studying.

Proposition 2.3 Let $L$ be a language closed under the first-order operations. Assume $L$ is cycle-simple. Then $L$ is recursion-free.

Proof. We show that with a test for a chain, it is possible to test if the cardinality of a single cycle is even, which no cycle-simple language can do. The proof follows that of [15]. Assume that the cardinality of the input cycle $R$ is at least 3. We test if there is a node $a$ such that removing the edge from $a$ to its successor $a'$ results in a chain $R'$ with an even number of nodes. This chain has $a'$ as its start node and $a$ as its end node. The test is performed as follows: construct a graph $R''$ that has the edge $(a, a')$ and all edges of the form $(b, b'')$, where $b''$ is the successor of the successor of $b$, for all $b$ different from $a$ and its predecessor (that is, there is a path of length two from $b$ to $b''$ in $R'$). Then it is easy to see that the construction of $R''$ can be carried out using the power of first-order logic and that $R''$ is a chain if and only if $R$ has an even number of nodes, cf. [15]. □

¹ In the rest of the paper, we will equate a graph with its set of edges, while its set of nodes is understood to be given by the edges. This simplification will not affect our results on the recursive queries.

We shall also use another form of the definition of being recursion-free. Let $\mathcal{Q}^{tc}_{chain}$ be the class of graph queries $Q$ with the following property: whenever the input to $Q$ is a chain $R$, $Q(R)$ is its transitive closure.
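The construction in the proof of Proposition 2.3 can be checked on small cycles. The Python sketch below builds the graph $R''$ for a chosen node $a$ and applies a chain test to it; `is_chain` stands in for the hypothetical chain test assumed in the proof, and all function names are ours.

```python
def single_cycle(n):
    """Edge set of a single cycle on nodes 1..n."""
    return {(i, i % n + 1) for i in range(1, n + 1)}

def is_chain(edges):
    """True iff the edges form v1 -> v2 -> ... -> vn on distinct nodes (n >= 2).

    Graphs are identified with their edge sets, so the empty graph is rejected.
    """
    if not edges:
        return False
    succ = dict(edges)
    preds = {y for (_, y) in edges}
    if len(succ) != len(edges) or len(preds) != len(edges):
        return False  # some node has out-degree or in-degree above one
    starts = [x for x in succ if x not in preds]
    if len(starts) != 1:
        return False
    # Walk from the unique start node; a chain uses every edge exactly once.
    node, steps = starts[0], 0
    while node in succ:
        node, steps = succ[node], steps + 1
    return steps == len(edges)

def build_r2(cycle, a):
    """The graph R'' of the proof for the cycle R and the chosen node a."""
    succ = dict(cycle)
    a1 = succ[a]  # a' = successor of a
    pred_a = next(x for (x, y) in cycle if y == a)
    r2 = {(a, a1)}
    for b in succ:
        if b not in (a, pred_a):
            r2.add((b, succ[succ[b]]))  # b'' = successor of the successor of b
    return r2

def cycle_has_even_length(cycle):
    """Parity test for single cycles of length >= 3, using only the chain test."""
    return any(is_chain(build_r2(cycle, a)) for a in dict(cycle))

results = {n: cycle_has_even_length(single_cycle(n)) for n in range(3, 12)}
```

For an even cycle the skip-two edges form two chains that the edge $(a, a')$ links into one; for an odd cycle the same edge closes a cycle instead, so the chain test fails.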

Lemma 2.4 If $L$ is a language closed under the first-order operations, the following are equivalent:
1. $L$ can test if a graph is a chain;
2. $L$ can express a query from $\mathcal{Q}^{tc}_{chain}$.

Proof. That Part 1 implies Part 2 was shown in [21]. To show the other implication, recall the definition of C&C (chain-&-cycle) graphs [3]. These graphs have exactly one node of in-degree one and out-degree zero, exactly one node of out-degree one and in-degree zero, every other node has in-degree one and out-degree one, and there are no loops (i.e., cycles of length one). These graphs are definable in first-order logic. Each such graph has one or more connected components, one of them being a chain and the others being single cycles. Assume that there is a query $Q \in \mathcal{Q}^{tc}_{chain}$ definable in $L$. Consider the Boolean query $Q'$ on graphs defined as follows: $Q'(R)$ is true if $R$ is a C&C graph, $Q(R)$ is a linear order, and every edge of $R$ is present in $Q(R)$. Since $L$ is closed under the first-order operations, $Q'$ is definable in $L$ with $Q$. We now show that $Q'(R)$ is true iff $R$ is a chain. The 'if' part is clearly true. For the 'only if' part, assume $Q'(R)$ is true and $R$ is not a chain. Since $R$ is a C&C graph, it then contains at least one cycle $R'$ of length at least 2. Thus we obtain that $R'$ is contained in a linear ordering, which is impossible. This completes the proof. □
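The query $Q'$ from the proof of Lemma 2.4 can be sketched in Python as follows. The parameter `q` stands for an arbitrary query assumed to lie in $\mathcal{Q}^{tc}_{chain}$; the default used here, the full transitive closure, is just one such query, and all helper names are ours.

```python
def transitive_closure(edges):
    """One member of the class: correct on chains (and on everything else)."""
    tc = set(edges)
    while True:
        new = {(x, w) for (x, y) in tc for (z, w) in tc if y == z} - tc
        if not new:
            return tc
        tc |= new

def nodes_of(edges):
    return {v for e in edges for v in e}

def is_cc_graph(edges):
    """C&C graph: one start node, one end node, all other degrees (1, 1), no loops."""
    nodes = nodes_of(edges)
    indeg = {v: 0 for v in nodes}
    outdeg = {v: 0 for v in nodes}
    for x, y in edges:
        if x == y:
            return False  # loops are excluded
        outdeg[x] += 1
        indeg[y] += 1
    degs = [(indeg[v], outdeg[v]) for v in nodes]
    return (degs.count((1, 0)) == 1 and degs.count((0, 1)) == 1
            and all(d in {(1, 0), (0, 1), (1, 1)} for d in degs))

def is_strict_linear_order(rel, nodes):
    irreflexive = all(x != y for (x, y) in rel)
    transitive = all((x, w) in rel for (x, y) in rel for (z, w) in rel if y == z)
    total = all(x == y or (x, y) in rel or (y, x) in rel
                for x in nodes for y in nodes)
    return irreflexive and transitive and total

def chain_test(edges, q=transitive_closure):
    """Q' of the proof: C&C graph, q(R) a linear order, and R contained in q(R)."""
    out = q(edges)
    return (is_cc_graph(edges)
            and is_strict_linear_order(out, nodes_of(edges))
            and edges <= out)

chain = {(1, 2), (2, 3), (3, 4)}
chain_plus_cycle = {(1, 2), (2, 3), (10, 11), (11, 12), (12, 10)}
```

On `chain` the test succeeds; on `chain_plus_cycle`, which is a C&C graph but not a chain, it fails because no linear order can contain the edges of a cycle, whatever `q` returns there.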

Corollary 2.5 Let $L$ be a language closed under the first-order operations. Then $L$ is recursion-free iff $L$ cannot express any query from $\mathcal{Q}^{tc}_{chain}$. □

Finally, we discuss these properties and some query languages. It is well known in the literature that

Fact 1 The relational calculus is both recursion-free and cycle-simple. □

What is rather pleasant is that these properties continue to hold for the language that is essentially plain SQL. SQL, the dominant language of commercial databases, adds two main features to the relational calculus: grouping and aggregation. In a number of papers [21, 6, 23] we studied a theoretical reconstruction of plain SQL and its expressive power. Our approach was as follows. To model the grouping feature, we considered a nested relational language, as in [4]. If one deals with the usual queries from flat relational databases to flat relational databases, then nested sets can appear as intermediate results. It is known that the nested relational algebra is an extension of relational algebra that has enough power to express the GROUPBY and HAVING clauses of SQL. To model aggregation, we made the language two-sorted. In other words, it has two base types, one of them being the type of rational

numbers. By graph queries we meant queries of the type $\{b \times b\} \to \{b \times b\}$, where $b$ is the other base type. We assumed that the usual rational arithmetic is present. Furthermore, we added an operator for summation of function values over a column, and showed that such a language computes the standard aggregate functions such as AVG, TOTAL, COUNT. It follows from the results of [21, 6] that

Fact 2 Plain SQL is both recursion-free and cycle-simple. □
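To illustrate how a single summation operator over rational numbers yields the aggregates mentioned before Fact 2, here is a small Python sketch. It models a column as a list of values and uses exact rational arithmetic; the function names and the list representation are our assumptions, not the formal language of [21, 6].

```python
from fractions import Fraction

def summation(f, column):
    """Sum of f(x) over the values x of a column, in exact rational arithmetic."""
    return sum((f(x) for x in column), Fraction(0))

def count(column):
    # COUNT sums the constant function 1.
    return summation(lambda _: Fraction(1), column)

def total(column):
    # TOTAL sums the identity function.
    return summation(lambda x: Fraction(x), column)

def avg(column):
    # AVG needs division on rationals in addition to summation.
    return total(column) / count(column)

salaries = [30, 45, 45, 60]
avg_salary = avg(salaries)
```

COUNT and TOTAL need only summation, while AVG also relies on division, which shows why the available arithmetic operations matter for what the language can express.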

The second of these properties depends on what kind of arithmetic operations on rational numbers are allowed in plain SQL. In [21, 6] we assumed that those are +; ; ?;  and
