
Definability and descriptive complexity on databases of bounded tree-width

Martin Grohe and Julian Mariño
Institut für mathematische Logik, Eckerstr. 1, 79104 Freiburg, Germany
{grohe,marino}@sun2.mathematik.uni-freiburg.de

Abstract. We study the expressive power of various query languages on relational databases of bounded tree-width. Our first theorem says that fixed-point logic with counting captures polynomial time on classes of databases of bounded tree-width. This result should be seen against the background of an important open question of Chandra and Harel [7], who asked whether there is a query language capturing polynomial time on unordered databases. Our theorem is a further step in a larger project of extending the scope of databases on which polynomial time can be captured by reasonable query languages. We then prove a general definability theorem stating that each query on a class of databases of bounded tree-width that is definable in monadic second-order logic is also definable in fixed-point logic (or datalog). Furthermore, for each $k \ge 1$ the class of databases of tree-width at most $k$ is definable in fixed-point logic. These results have some remarkable consequences concerning the definability of certain classes of graphs. Finally, we show that each database of tree-width at most $k$ can be characterized up to isomorphism in the language $C^{k+3}$, the $(k+3)$-variable fragment of first-order logic with counting.

1 Introduction

The tree-width of a graph, which measures the similarity of the graph to a tree, has turned out to be an indispensable tool when trying to find feasible instances of hard algorithmic problems. As a matter of fact, many NP-complete problems can be solved in linear time on classes of graphs of bounded tree-width (see [5] for a survey). Numerous examples can easily be obtained from a result of Courcelle [8] saying that any query definable in monadic second-order logic (MSO) can be evaluated in linear time on graphs of bounded tree-width. Courcelle’s MSO allows quantification over sets of edges and sets of vertices; this makes it quite powerful. For instance, hamiltonicity of a graph is defined by an MSO-sentence saying that “there exists a set $X$ of edges such that each vertex is incident to exactly two edges in $X$, and the subgraph spanned by the edge-set $X$ is connected”. The importance of these results rests on the fact that many classes of graphs occurring in practice are of small tree-width.

The notion of tree-width of a graph was introduced (under a different name) by Halin [15] and later re-invented by Robertson and Seymour [20]. It has a straightforward generalization to relational databases (due to Feder and Vardi [10], also see [11]). It is easy to see that most of the results on graphs of bounded tree-width generalize to relational databases. Among them is Courcelle’s theorem. It immediately implies that on a class of databases of bounded tree-width each first-order query can be evaluated in linear time.¹

¹ More precisely, for each first-order formula $\varphi(\bar x)$ there is a linear-time algorithm that, given a database $D$ and a tuple $\bar a$, checks whether $D \models \varphi(\bar a)$.

Our main task is to study the expressive power of fixed-point logic with and without counting on databases of bounded tree-width. Of course, our results transfer to the query languages datalog and datalog with counting, both under the inflationary semantics, which are known to have the same expressive power as the corresponding fixed-point logics ([1, 12]).

Theorem 1. Let $k \ge 1$. Fixed-point logic with counting captures polynomial time on the class of all databases of tree-width at most $k$.

In other words, a “generic” query on a class of databases of bounded tree-width is computable in polynomial time if, and only if, it is definable in fixed-point logic with counting. It is important to note here that we are considering classes of unordered databases; on ordered databases the result is just a special case of the well-known result of Immerman and Vardi [16, 24] saying that fixed-point logic captures polynomial time on ordered databases. The problem of capturing polynomial time on unordered databases goes back to Chandra and Harel [7]. Our result should be seen as part of a larger project of gradually extending the scope of classes of databases on which we can capture polynomial time by “nice” logics. The main motivation for this is to get a better understanding of “generic polynomial time”, which may contribute to the design of better query languages.

Though it is known that fixed-point logic with counting does not capture polynomial time on all databases [6], it has turned out to be the right logic in several special cases. The best previously known result is that it captures polynomial time on the class of planar graphs [14]. Our Theorem 1 is somewhat complementary to this, by a result of Robertson and Seymour [22] saying that a class $\mathcal{C}$ of graphs is of bounded tree-width if, and only if, there is a planar graph that is not a minor (see below) of any graph in $\mathcal{C}$.

We next turn to fixed-point logic without counting. Although this weaker logic certainly does not capture polynomial time on databases of bounded tree-width, it remains surprisingly expressive.

Theorem 2. Let $k \ge 1$ and $\tau$ a database schema.
(1) The class of all databases over $\tau$ of tree-width at most $k$ is definable in fixed-point logic.
(2) Each MSO-definable query on the class of databases of tree-width at most $k$ is definable in fixed-point logic. More precisely, for each MSO-definable class $\mathcal{D}$ of databases the class $\{D \in \mathcal{D} \mid D \text{ is of tree-width at most } k\}$ is definable in fixed-point logic.²

² For convenience we have only formulated the result for Boolean queries, but the generalization to arbitrary queries is straightforward.

This theorem has some nice applications to graphs. To obtain the full strength of Courcelle’s MSO with quantification over edge-sets, we consider graphs as databases over the schema $\{V, E, I\}$ with unary $V, E$ and a binary $I$. A graph is then a triple $G = (V^G, E^G, I^G)$, where $V^G$ is the vertex-set, $E^G$ is the edge-set, and $I^G \subseteq V^G \times E^G$ the binary incidence relation between them.³ Thus Theorem 2 implies, for example, that hamiltonicity is fixed-point definable on graphs of bounded tree-width.

³ It is more common in database theory to encode graphs as databases over $\{V, R\}$, where $V$ is unary and $R$ binary. But it is easy to see that there is a first-order definable transformation between graphs $(V, R)$ and the corresponding $(V, E, I)$ and vice versa, and it changes the tree-width by at most 1.

Recall that a graph $H$ is a minor of a graph $G$ if it is obtained from a subgraph of $G$ by contracting edges. It can easily be seen that for each fixed graph $H$ there is an MSO-sentence saying that a graph $G$ contains $H$ as a minor. In their Graph Minor Theorem, Robertson and Seymour [23] proved that for each class $\mathcal{C}$ of graphs closed under taking minors there are finitely many graphs $H_1, \ldots, H_n$ such that a graph $G$ is in $\mathcal{C}$ if, and only if, it contains none of $H_1, \ldots, H_n$ as a minor. (Actually, a restricted version of this theorem, proved in [21], suffices for our purposes.) Together with the fact that the class of all planar graphs is fixed-point definable [14] and the above-mentioned result of Robertson and Seymour that a class $\mathcal{C}$ of graphs is of bounded tree-width if, and only if, there is a planar graph that is not a minor of any graph in $\mathcal{C}$, we obtain a quite surprising corollary:

Corollary 3. Let $\mathcal{C}$ be a class of planar graphs that is closed under taking minors. Then $\mathcal{C}$ is definable in fixed-point logic.

As another by-product of our main results, we obtain a theorem that continues a study initiated by Immerman and Lander [17, 18]. $C^k$ denotes the $k$-variable (first-order) logic with counting quantifiers.

Theorem 4. Let $k \ge 1$. For each database $D$ of tree-width at most $k$ there is a $C^{k+3}$-sentence that characterizes $D$ up to isomorphism.

On the other hand, Cai, Fürer, and Immerman [6] proved that for each $k \ge 1$ there are non-isomorphic graphs $G, H$ of size $O(k)$ that cannot be distinguished by a $C^k$-sentence.

2 Preliminaries

A database schema is a finite set $\tau$ of relation symbols with associated arities. We fix a countable domain dom. A database instance, or just database, over the schema $\tau = \{R_1, \ldots, R_n\}$ is a tuple $D = (R_1^D, \ldots, R_n^D)$, where $R_i^D$ is a finite relation on dom of the arity associated with $R_i$. The active domain of $D$, denoted by $A^D$, is the set of all elements of dom occurring in a tuple contained in any of the $R_i^D$. For convenience, we usually write $a \in D$ instead of $a \in A^D$; we even write $\bar a \in D$ instead of $\bar a \in (A^D)^k$ for $k$-tuples $\bar a$. By $\bar a$ we always denote a tuple $a_1 \ldots a_k$, for some $k \ge 1$. The size of a database $D$, denoted by $|D|$, is the size of its active domain. In general, $|S|$ denotes the size of a set $S$.

If $D$ is a database over $\tau$, $X$ a $k$-ary relation symbol not contained in $\tau$, and $X$ a $k$-ary relation over dom, then $(D, X)$ denotes the database over $\tau \cup \{X\}$ defined in the obvious way. Occasionally we also need to expand a database $D$ by distinguished elements; for a tuple $\bar a \in$ dom we write $(D, \bar a)$ for the expansion of $D$ by $\bar a$. An isomorphism between $(D, a_1 \ldots a_l)$ and $(E, b_1 \ldots b_l)$ is a bijective $f : A^D \cup \{a_1, \ldots, a_l\} \to A^E \cup \{b_1, \ldots, b_l\}$ such that $f(a_i) = b_i$ for $i \le l$ and $f$ is an isomorphism between $D$ and $E$.

A subinstance of a database $D$ over $\tau$ is a database $E$ over $\tau$ such that for all $R \in \tau$ we have $R^E \subseteq R^D$. For $B \subseteq$ dom we let $\langle B \rangle^D$ denote the subinstance $E$ of $D$ where for all $R \in \tau$, say, of arity $r$, we have $R^E = R^D \cap B^r$. We often omit the superscript $D$ and just write $\langle B \rangle$ if $D$ is clear from the context. The union, intersection, and difference of a family of databases over the same schema are defined relation-wise. For example, the union of two databases $D$ and $E$ over $\tau$ is the database $D \cup E$ over $\tau$ with $R^{D \cup E} = R^D \cup R^E$ for all $R \in \tau$. If $D$ is a database and $B \subseteq$ dom we let $D \setminus B = \langle A^D \setminus B \rangle^D$. Furthermore, for an $l$-tuple $\bar a \in$ dom we let $D \setminus \bar a = D \setminus \{a_1, \ldots, a_l\}$.

An ordered database is a database $D$ over a schema that contains the binary relation symbol $\le$ such that $\le^D$ is a linear order of the active domain of $D$.

We assume that the reader is familiar with the notions of a query and query language. Most of the time, we restrict our attention to Boolean queries,⁴ which we just consider as isomorphism-closed classes of databases over the same schema. A Boolean query on a class $\mathcal{D}$ of databases is a Boolean query $Q$ which is a subclass of $\mathcal{D}$. We say that a query language L captures a complexity class K if for each query $Q$ we have: $Q$ is definable in L if, and only if, it is computable in K. We say that L captures K on a class $\mathcal{D}$ of databases if for each query $Q$ on $\mathcal{D}$ we have: $Q$ is definable in L if, and only if, it is computable in K. For example, a well-known result of Immerman and Vardi [16, 24] says that least fixed-point logic captures polynomial time on the class of all ordered databases.

⁴ This restriction is inessential and can easily be removed. But it simplifies things a little bit.

2.1 Fixed-point logics

We assume a certain familiarity with first-order logic FO. The query languages we are mainly interested in are inflationary fixed-point logic IFP and inflationary fixed-point logic with counting IFP+C. The set of IFP-formulas is obtained by adding the following formula-formation rule to the usual rules to form first-order formulas:

Given a formula $\varphi$ over the schema $\tau \cup \{X\}$, where $X$ is a $k$-ary relation symbol for some $k \ge 1$, and two $k$-tuples $\bar x, \bar z$ of variables, we may form the new formula $[\mathrm{IFP}_{\bar x, X}\, \varphi]\bar z$ over $\tau$.

The semantics of IFP is, as usual, given by a satisfaction relation $\models$. In particular, consider an IFP-formula $\varphi(\bar x, \bar y)$ over a schema $\tau \cup \{X\}$, such that the free variables of $\varphi$ occur in the tuples $\bar x, \bar y$, and a database $D$ over $\tau$. For each tuple $\bar b \in D$ we let $X_0^{\bar b} = \emptyset$ and $X_{i+1}^{\bar b} = X_i^{\bar b} \cup \{\bar a \in D \mid (D, X_i^{\bar b}) \models \varphi(\bar a, \bar b)\}$ (for $i \ge 0$). Furthermore, we let $X_\infty^{\bar b} = \bigcup_{i \ge 1} X_i^{\bar b}$. Then $D \models [\mathrm{IFP}_{\bar x, X}\, \varphi(\bar x, \bar b)]\bar c$ if, and only if, $\bar c \in X_\infty^{\bar b}$ (for all $\bar c \in D$).

To define IFP+C we need to introduce a second countable domain num disjoint from dom. We think of num as a copy of the non-negative integers, and we usually do not distinguish between an element of num and the corresponding integer. We also need variables of sort num, which we denote by $\mu, \nu$. The symbols $x, y, z$ always refer to variables of sort dom. If we do not want to specify the sort of a variable, we use symbols $u, v, w$. Furthermore, we let $\leqslant$ denote a binary relation symbol which is supposed to range over num.

For each database $D$ over $\tau$ we let $N^D$ be the initial segment $\{0, \ldots, |D|\}$ of num of length $|D| + 1$, and we let $\leqslant^D$ be the natural order on $N^D$. We let $D^\# = (D, \leqslant^D)$, considered as a database of schema $\tau^\# = \tau \cup \{\leqslant\}$ on the domain dom $\cup$ num. Note that the active domain of $D^\#$ is $A^D \cup N^D$. We can now consider IFP on databases with this extended domain. The set of IFP+C-formulas is defined by the same rules as the set of IFP-formulas and the following additional rule:

If $\varphi$ is a formula and $x, \mu$ are variables then $\exists^{=\mu} x\, \varphi$ is a new formula. (Recall that by our convention $x$ ranges over dom and $\mu$ over num.)
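The stage construction $X_0 \subseteq X_1 \subseteq \cdots$ of the IFP semantics can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the helper names `inflationary_fixed_point` and `reach_step` are hypothetical, the parameter tuple is omitted, and the example formula $E(x, y) \lor \exists z\,(X(x, z) \land E(z, y))$, whose inflationary fixed point is the transitive closure of $E$, is chosen for illustration and does not come from the text.

```python
def inflationary_fixed_point(step):
    """Iterate X_{i+1} = X_i | step(X_i), starting from the empty relation.

    `step` plays the role of the formula: given the current stage X_i it
    returns the set of tuples that satisfy the formula in (D, X_i).
    """
    stage = frozenset()
    while True:
        nxt = stage | frozenset(step(stage))
        if nxt == stage:      # no new tuples: the limit of the stages is reached
            return stage
        stage = nxt


def reach_step(edges):
    """Step function for E(x, y) or exists z (X(x, z) and E(z, y))."""
    def step(x_rel):
        direct = set(edges)
        extended = {(x, y) for (x, z) in x_rel for (z2, y) in edges if z == z2}
        return direct | extended
    return step


# Example: a directed path 1 -> 2 -> 3 -> 4.
edges = {(1, 2), (2, 3), (3, 4)}
closure = inflationary_fixed_point(reach_step(edges))
```

On a finite database the stages form an increasing chain of subsets of a finite set, so the loop always terminates.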

To define the semantics, let $\varphi(x, \bar w)$ be a formula over $\tau^\#$ whose free variables all occur in $x, \bar w$, let $D$ be a database over $\tau$, and $\bar d \in A^D \cup N^D$, $i \in N^D$. Then $D^\# \models \exists^{=i} x\, \varphi(x, \bar d)$ if, and only if, the number of $a \in A^D$ such that $D^\# \models \varphi(a, \bar d)$ is $i$.

So far, our IFP+C-formulas only speak about databases $D^\#$. However, for formulas $\varphi$ without free num-variables, we can let $D \models \varphi \iff D^\# \models \varphi$. This way, IFP+C-formulas also define queries in our usual framework of databases on the domain dom.

IFP+C has turned out to capture polynomial time on several classes of databases. It follows easily from the Immerman-Vardi Theorem mentioned earlier that IFP+C captures polynomial time on the class of all ordered databases. The following lemma, which is based on the notion of definable canonization introduced in [13, 14], can be used to extend this result to further classes of databases. The straightforward proof can be found in [13].

For an IFP+C-formula $\varphi(\bar w)$ and a database $D$ we let $\varphi(\bar w)^{D^\#} = \{\bar d \in D^\# \mid D^\# \models \varphi(\bar d)\}$. Note, in particular, that if $\bar w$ is an $l$-tuple of num-variables, then $\varphi(\bar w)^{D^\#}$ is an $l$-ary relation on num.

Lemma 5. Let $\tau = \{R_1, \ldots, R_n\}$ be a database schema, where $R_i$ is $r_i$-ary, and let $\mathcal{D}$ be a class of databases over $\tau$. Suppose that there are IFP+C-formulas $\varphi_1(\bar\mu_1), \ldots, \varphi_n(\bar\mu_n)$, where $\bar\mu_i$ is an $r_i$-tuple of num-variables, such that for all $D \in \mathcal{D}$ the database
$$\bigl(\varphi_1(\bar\mu_1)^{D^\#}, \ldots, \varphi_n(\bar\mu_n)^{D^\#}\bigr)$$
(which is a database over $\tau$ on the domain num) is isomorphic to $D$. Then IFP+C captures polynomial time on $\mathcal{D}$.⁵

⁵ The reason that this holds is that we can speak of the order $\leqslant^D$ on $N^D$ in IFP+C. Thus essentially the hypothesis of the lemma says that we can define ordered copies of all databases $D \in \mathcal{D}$ in IFP+C.
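As a small illustration of the counting quantifier, the following Python sketch evaluates a formula of the form "there are exactly $i$ elements $x$ with $E(y, x)$" on the two-sorted database of a graph, that is, it computes all pairs consisting of a vertex and its number of neighbours. The function names and the example graph are our own assumptions and are not part of the formal development.

```python
def number_domain(active_domain):
    """The initial segment {0, ..., |D|} of num, of length |D| + 1."""
    return range(len(active_domain) + 1)


def count_exactly(active_domain, phi, *params):
    """All numbers i such that exactly i elements a satisfy phi(a, params)."""
    witnesses = sum(1 for a in active_domain if phi(a, *params))
    return {i for i in number_domain(active_domain) if i == witnesses}


def degree_relation(active_domain, edges):
    """The relation of all (y, i) such that exactly i elements x satisfy E(y, x)."""
    def adjacent(x, y):
        return (y, x) in edges
    return {(y, i) for y in active_domain
            for i in count_exactly(active_domain, adjacent, y)}


# A star with centre 0 and leaves 1, 2, 3 (edges stored in both directions).
star_vertices = [0, 1, 2, 3]
star_edges = {(0, 1), (1, 0), (0, 2), (2, 0), (0, 3), (3, 0)}
degrees = degree_relation(star_vertices, star_edges)
```

Since a count can never exceed the size of the active domain, the witness number always lies in the number domain of the database, which is why an initial segment of length $|D| + 1$ suffices.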

3 Tree decompositions

Deviating from the introduction, from now on we find it convenient to consider graphs as databases over $\{V, E\}$, where $V$ is unary and $E$ is binary. Without further explanation, we use the usual graph-theoretic notions such as paths, cycles, etc. A tree is a connected, cycle-free graph.

Definition 6. A tree-decomposition of a database $D$ is a pair $(T, (B_t)_{t \in T})$, where $T$ is a tree and $(B_t)_{t \in T}$ a family of subsets of $A^D$ such that $\bigcup_{t \in T} \langle B_t \rangle^D = D$ and for each $a \in D$ the subgraph $\langle \{t \mid a \in B_t\} \rangle^T$ of $T$ is connected. The $B_t$ are called the blocks of the decomposition. The width of $(T, (B_t)_{t \in T})$ is $\max\{|B_t| \mid t \in T\} - 1$. The tree-width of $D$, denoted by $\mathrm{tw}(D)$, is the minimal width of a tree-decomposition of $D$.

On graphs, this notion of tree-decomposition coincides with the usual notion introduced by Robertson and Seymour [20].

The first thing we will do now is present two basic and well-known lemmas that give rise to a fixed-point definition of the class of databases of tree-width at most $k$. This requires some additional notation. We recommend that the reader become familiar with this notation and the lemmas, since they will be used again and again throughout the paper. For other basic facts about tree-decompositions and tree-width we refer the reader to [9]. Techniques similar to those we use here have been employed by Bodländer in [3, 4].
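The conditions in the definition of a tree-decomposition can be checked mechanically. The following Python sketch does this for graphs; the representation (a tree as a dictionary of neighbour sets, blocks as a dictionary of vertex sets) and all function names are our own assumptions, chosen only for illustration.

```python
def connected(tree, nodes):
    """Is the subgraph of `tree` induced by the non-empty set `nodes` connected?"""
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        t = stack.pop()
        for u in tree[t]:
            if u in nodes and u not in seen:
                seen.add(u)
                stack.append(u)
    return seen == nodes


def is_tree(tree):
    """A non-empty graph is a tree iff it is connected and has |T| - 1 edges."""
    nodes = set(tree)
    if not nodes:
        return False
    edge_count = sum(len(nbrs) for nbrs in tree.values()) // 2
    return edge_count == len(nodes) - 1 and connected(tree, nodes)


def decomposition_width(vertices, edges, tree, blocks):
    """Return the width if (tree, blocks) is a tree-decomposition, else None."""
    if not is_tree(tree):
        return None
    for v in vertices:
        holders = {t for t in tree if v in blocks[t]}
        # every vertex occurs in some block, and its blocks are connected in T
        if not holders or not connected(tree, holders):
            return None
    for (a, b) in edges:
        # every edge lies inside a single block
        if not any(a in blocks[t] and b in blocks[t] for t in tree):
            return None
    return max(len(blocks[t]) for t in tree) - 1


# The path 0 - 1 - 2 with blocks {0, 1} and {1, 2}.
good = decomposition_width([0, 1, 2], {(0, 1), (1, 2)},
                           {"t": {"u"}, "u": {"t"}},
                           {"t": {0, 1}, "u": {1, 2}})
# Vertex 1 occurs in blocks s and u but not in t, which lies between them.
bad = decomposition_width([0, 1, 2], {(0, 1), (1, 2)},
                          {"s": {"t"}, "t": {"s", "u"}, "u": {"t"}},
                          {"s": {0, 1}, "t": {2}, "u": {1, 2}})
```

The sketch only verifies a given decomposition; it does not search for one of minimal width.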

Definition 7. Let $k \ge 1$ and $D$ a database. A $k$-preclique of $D$ is a $(k+1)$-tuple $\bar a \in D$ of distinct elements such that $D$ has a tree-decomposition $(T, (B_t)_{t \in T})$ of width at most $k$ with $B_t = \{a_1, \ldots, a_{k+1}\}$ for some $t \in T$.

Lemma 8. Let $k \ge 1$ and $D$ a database of tree-width at most $k$ and size at least $(k+1)$.
(1) $D$ has a tree-decomposition $(T, (B_t)_{t \in T})$ with $|B_t| = k+1$ for all $t \in T$ and $B_t \ne B_u$ for all distinct $t, u \in T$. In other words, $D$ has a tree-decomposition whose blocks are pairwise distinct $k$-precliques. We call such a tree-decomposition a regular tree-decomposition.
(2) For each $k$-preclique $\bar a$ of $D$ there is a regular tree-decomposition $(T, (B_t)_{t \in T})$ of $D$ such that there is a $t \in T$ with $B_t = \{a_1, \ldots, a_{k+1}\}$.

Proof. To prove (1), let $(T, (B_t)_{t \in T})$ be a tree-decomposition of $D$ of width $k$ such that among all tree-decompositions of $D$ of width $k$:
(i) $|T|$ is minimal.
(ii) Subject to (i), $\sum_{t \in T} |B_t|$ is maximal.

(i) clearly implies that there are no adjacent $t, u \in T$ such that $B_t \subseteq B_u$, because otherwise we could simply contract the edge $tu$ in $T$ and obtain a smaller tree. Now suppose $|B_t| < k+1$ for some $t \in T$. Let $u$ be a neighbor of $t$ and $a \in B_u \setminus B_t$. We can simply add $a$ to $B_t$ and obtain a new tree-decomposition with a larger sum in (ii). This contradicts our choice of $(T, (B_t)_{t \in T})$.

(2) can be proved similarly. □

With each database $D$ over $\tau$ we associate its Gaifman graph $G(D)$ with vertex set $V^{G(D)} = A^D$ and an edge between two vertices $a, b$ if there is a tuple $\bar c$ in one of the relations of $D$ such that both $a$ and $b$ occur in $\bar c$. Now we can transfer graph-theoretic notions such as connected components or distances to arbitrary databases; we just refer to the respective notions in the Gaifman graph. By $\mathrm{comp}(D)$ we denote the set of active domains of the connected components of a database $D$. Hence the connected components of $D$ are the subinstances $\langle C \rangle$ for $C \in \mathrm{comp}(D)$.

Let $l \ge 1$, $D$ a database, and $\bar a \in D$ an $l$-tuple of vertices. For $b \in D$ we let $C_b^{-\bar a}$ denote the (unique) $C \in \mathrm{comp}(D \setminus \bar a)$ with $b \in C$; if $b \in \{a_1, \ldots, a_l\}$ then $C_b^{-\bar a}$ is the empty set. Note that $\mathrm{comp}(D \setminus \bar a) = \{C_b^{-\bar a} \mid b \in D\}$. Furthermore, we let $C_b^{+\bar a} = C_b^{-\bar a} \cup \{a_1, \ldots, a_l\}$.

For an $l$-tuple $\bar a$, an $i \le l$, and any $c$ we let $\bar a/i$ denote the $(l-1)$-tuple obtained from $\bar a$ by deleting the $i$th component and $\bar a(c/i)$ the $l$-tuple obtained from $\bar a$ by replacing the $i$th component by $c$.

Observe that a $k$-preclique of a database $D$ either separates $D$ or is the block of a leaf in every regular tree-decomposition of $D$ where it occurs as a block. Also note that a database has tree-width at most $k$ if, and only if, it has size at most $k$ or contains a $k$-preclique.

The following lemma (for graphs) is essentially due to Arnborg, Corneil, and Proskurowski [2].

Lemma 9. Let $k \ge 1$, $D$ a database, and $\bar a \in D$ a $(k+1)$-tuple of distinct elements.
(1) The tuple $\bar a$ is a $k$-preclique of $D$ if, and only if, $\bar a$ is a $k$-preclique of $\langle C_b^{+\bar a} \rangle$ for all $b \in D$.
(2) For all $b \in D \setminus \bar a$, the tuple $\bar a$ is a $k$-preclique of $\langle C_b^{+\bar a} \rangle$ if, and only if, there are $i \le k+1$ and $c \in C_b^{-\bar a}$ such that $\bar a/i$ isolates $a_i$ in $\langle C_b^{+\bar a} \rangle$ (that is, $a_i$ is not adjacent to any $d \in C_b^{-\bar a}$ in the Gaifman graph of $D$) and $\bar a(c/i)$ is a $k$-preclique of $\langle C_b^{+\bar a/i} \rangle$ $(= \langle C_b^{+\bar a} \rangle \setminus a_i)$.
(3) For all $b \in \{a_1, \ldots, a_{k+1}\}$, the tuple $\bar a$ is a $k$-preclique of $\langle C_b^{+\bar a} \rangle$.

Figure 1. (Labels in the figure: $a_i$, $\bar a/i$, $c$, $b$.)

Proof. Note first that if $(T, (B_t)_{t \in T})$ is a tree-decomposition of a database $D$ and $E$ is a subinstance of $D$ then $(T, (A^E \cap B_t)_{t \in T})$ is a tree-decomposition of $E$ of at most the same width. Then the forward direction of (1) is obvious. For the backward direction we can simply paste together decompositions of the $\langle C_b^{+\bar a} \rangle$ in which $\{a_1, \ldots, a_{k+1}\}$ forms a block along these blocks.

For the forward direction of (2), let $(T, (B_t)_{t \in T})$ be a regular tree-decomposition of $\langle C_b^{+\bar a} \rangle$ that contains a $t \in T$ with $B_t = \{a_1, \ldots, a_{k+1}\}$. Since $\bar a$ does not separate $\langle C_b^{+\bar a} \rangle$, $t$ must be a leaf of $T$. Let $u$ be the neighbor of $t$ in $T$. Let $i \in \{1, \ldots, k+1\}$ such that $a_i \in B_t \setminus B_u$ and $c \in B_u \setminus B_t$. The claim follows.

For the backward direction, let $(T, (B_t)_{t \in T})$ be a tree-decomposition of $\langle C_b^{+\bar a/i} \rangle$ that contains a $t \in T$ with $B_t = \{a_1, \ldots, a_{i-1}, c, a_{i+1}, \ldots, a_{k+1}\}$. We add a new vertex $u$ to $T$ and make it adjacent to $t$. Furthermore, we let $B_u = \{a_1, \ldots, a_{k+1}\}$. We obtain a tree-decomposition of $\langle C_b^{+\bar a} \rangle$ of the desired form.

(3) is trivial, since $C_b^{+\bar a} = \{a_1, \ldots, a_{k+1}\}$ in this case. □
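The notation around Lemma 9 (Gaifman graph, the components of a database after removing a tuple, and the two kinds of component sets attached to an element $b$) can be made concrete with a short Python sketch. A database is represented here as a dictionary from relation names to sets of tuples; this representation and the function names are our own assumptions. In particular, returning the empty set for an element that belongs to no component of the remaining database is a choice made for this sketch.

```python
from itertools import combinations


def gaifman_graph(relations):
    """Neighbour sets of the Gaifman graph; its vertices are the active domain."""
    adj = {}
    for tuples in relations.values():
        for tup in tuples:
            for a in tup:
                adj.setdefault(a, set())
            for a, b in combinations(set(tup), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj


def induced(relations, keep):
    """The subinstance induced on `keep`: tuples with all entries in `keep`."""
    return {name: {t for t in tuples if set(t) <= keep}
            for name, tuples in relations.items()}


def comp(relations):
    """Active domains of the connected components of the database."""
    adj, seen, result = gaifman_graph(relations), set(), []
    for start in adj:
        if start in seen:
            continue
        block, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in adj[v] - block:
                block.add(w)
                stack.append(w)
        seen |= block
        result.append(frozenset(block))
    return result


def c_minus(relations, a_bar, b):
    """The component of the database without a_bar that contains b, else empty."""
    active = set(gaifman_graph(relations))
    rest = induced(relations, active - set(a_bar))
    return next((c for c in comp(rest) if b in c), frozenset())


def c_plus(relations, a_bar, b):
    """The component of b without a_bar, together with the elements of a_bar."""
    return c_minus(relations, a_bar, b) | frozenset(a_bar)


# The path 1 - 2 - 3 - 4 - 5 as a database over {V, E}.
path = {"V": {(v,) for v in range(1, 6)},
        "E": {(1, 2), (2, 3), (3, 4), (4, 5)}}
left = c_minus(path, (3,), 1)
left_plus = c_plus(path, (3,), 1)
```

Removing the middle vertex 3 splits the path into the two components $\{1, 2\}$ and $\{4, 5\}$.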

Combining (1) and (2) we obtain the following:

Corollary 10. Let $k \ge 1$, $D$ a database, $\bar a \in D$ a $(k+1)$-tuple of distinct elements, and $b \in D \setminus \bar a$. Then $\bar a$ is a $k$-preclique in $\langle C_b^{+\bar a} \rangle$ if, and only if, there is an $i \le (k+1)$ and a $c \in C_b^{-\bar a}$ such that $\bar a/i$ isolates $a_i$ in $\langle C_b^{+\bar a} \rangle$ and $\bar a(c/i)$ is a $k$-preclique in all subinstances $\langle C_d^{+\bar a(c/i)} \rangle$, where $d \in C_b^{-\bar a/i}$.

Note that the subinstances $\langle C_d^{+\bar a(c/i)} \rangle$ are smaller than $\langle C_b^{+\bar a} \rangle$. This gives rise to an inductive definition of the relation $P$ consisting of all $(\bar a, b)$, where $\bar a$ is a $(k+1)$-tuple and $b$ a single element, with the property that $\bar a$ is a preclique in $\langle C_b^{+\bar a} \rangle$: We start by letting $P$ consist of all $(\bar a, a_i)$, where $\bar a$ is a $(k+1)$-tuple and $1 \le i \le k+1$. Then we repeatedly add all pairs $(\bar a, b)$ for which there exists an $i \le k+1$ and a $c \in C_b^{-\bar a}$ such that $\bar a/i$ isolates $a_i$ in $\langle C_b^{+\bar a} \rangle$ and we already have $(\bar a(c/i), d) \in P$ for all $d \in C_b^{-\bar a/i}$. This inductive definition can easily be formalized in IFP. Thus we have:

Lemma 11. There are IFP-formulas $\varphi(\bar x, y)$ and $\psi(\bar x)$ such that for all databases $D$, $(k+1)$-tuples $\bar a \in D$ and $b \in D$ we have
$$D \models \varphi(\bar a, b) \iff \bar a \text{ is a } k\text{-preclique in } \langle C_b^{+\bar a} \rangle,$$
$$D \models \psi(\bar a) \iff \bar a \text{ is a } k\text{-preclique.}$$

As a corollary we obtain the first part of Theorem 2.

Corollary 12. Let $\tau$ be a database schema and $k \ge 1$. Then the class of all databases over $\tau$ of tree-width at most $k$ is definable in IFP.
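The inductive definition of the relation $P$ can be run directly as a fixed-point computation. The following Python sketch does so for graphs given by neighbour sets. It is a brute-force illustration under our own assumptions, not the IFP-formula itself: blocks are represented as $(k+1)$-element sets of distinct vertices rather than tuples, pairs are added as soon as they are found instead of stage by stage (which yields the same fixed point, since the rule is monotone), and the names `component` and `treewidth_at_most` are hypothetical.

```python
from itertools import combinations


def component(adj, removed, start):
    """Vertices reachable from `start` once the vertices in `removed` are deleted."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in removed and w not in seen:
                seen.add(w)
                stack.append(w)
    return frozenset(seen)


def treewidth_at_most(adj, k):
    """Decide tree-width <= k via the inductive definition of P."""
    verts = list(adj)
    if len(verts) <= k:
        return True
    blocks = [frozenset(s) for s in combinations(verts, k + 1)]
    # Base case: all pairs (block, b) where b is an element of the block.
    P = {(s, b) for s in blocks for b in s}
    changed = True
    while changed:
        changed = False
        for s in blocks:
            for b in verts:
                if (s, b) in P:
                    continue
                comp_b = component(adj, s, b)
                # Look for a in the block with no neighbour in the component of b
                # and a replacement c such that the new block is already known
                # to work for every d in that component.
                if any(not (adj[a] & comp_b)
                       and all(((s - {a}) | {c}, d) in P for d in comp_b)
                       for a in s for c in comp_b):
                    P.add((s, b))
                    changed = True
    # A block is a preclique of the whole graph if it works for every b.
    return any(all((s, b) in P for b in verts) for s in blocks)


path4 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
cycle4 = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
complete4 = {v: {0, 1, 2, 3} - {v} for v in range(4)}
```

The search ranges over all $(k+1)$-element vertex sets, so the running time is polynomial only for fixed $k$; the sketch is meant for small examples such as the path, the cycle and the complete graph on four vertices.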

4 Canonizing databases of bounded tree-width

In this section we sketch a proof of Theorem 1. The basic idea of the proof is the same as in the proof that IFP+C captures polynomial time on the class of trees given in [13].

Let us fix a $k \ge 1$. Without loss of generality we restrict our attention to databases over the schema $\{R\}$, where $R$ is a binary relation symbol. We want to apply Lemma 5 to the class $\mathcal{D}$ of databases over $\{R\}$ of tree-width at most $k$. We shall define a formula $\varphi(\mu, \nu)$ such that for all databases $D \in \mathcal{D}$ we have $D \cong \varphi(\mu, \nu)^{D^\#}$. For the course of our presentation, let us fix a database $D \in \mathcal{D}$; of course the formula $\varphi$ we are going to obtain does not depend on $D$ but works uniformly over $\mathcal{D}$.

Inductively we are going to define a $(k+4)$-ary relation $X \subseteq (A^D)^{k+2} \times (N^D)^2$ with the following property: For all $\bar a, b \in D$ such that $\bar a$ is a $k$-preclique in $\langle C_b^{+\bar a} \rangle$ we have:
(i) There is an isomorphism $f$ between $(\langle C_b^{+\bar a} \rangle, \bar a)$ and $(X_{\bar a b}, 1 \ldots k+1)$, where $X_{\bar a b} = \{ij \in (N^D)^2 \mid \bar a b i j \in X\}$. (Recall that $(\langle C_b^{+\bar a} \rangle, \bar a)$ denotes the expansion of the database $\langle C_b^{+\bar a} \rangle$ by the distinguished elements $\bar a$.)
(ii) For all $b' \in C_b^{+\bar a}$ we have $X_{\bar a b} = X_{\bar a b'}$.

We start our induction by letting $X$ be the set of all tuples $\bar a b i j$ where $\bar a \in D$ is a $(k+1)$-tuple of distinct elements, $b \in \{a_1, \ldots, a_{k+1}\}$, and $i, j \in \{1, \ldots, k+1\}$ are chosen such that $a_i a_j \in R^D$.

For the induction step, recall Corollary 10. We consider $\bar a, c \in D$, $i \in \{1, \ldots, k+1\}$ such that
– $\bar a/i$ isolates $a_i$ in $\langle C_c^{+\bar a} \rangle$,
– for all $d \in C_c^{-\bar a/i}$, $\bar a(c/i)$ is a $k$-preclique of $\langle C_d^{+\bar a(c/i)} \rangle$ and the relation $X_{\bar a(c/i) d}$ has already been defined,
– $X_{\bar a c}$ has not been defined yet.

Note that these conditions imply, by Corollary 10, that $\bar a$ is a $k$-preclique in $\langle C_c^{+\bar a} \rangle$.

Let $\bar a' = \bar a(c/i)$. For the $d \in C_c^{-\bar a/i}$ we remember that $X_{\bar a' d}$ is a database over $\{R\}$ whose active domain is contained in $N^D$. On $N^D$ we have the order $\leqslant^D$ available. Hence we can view these two relations together as an ordered database over the schema $\{R, \leqslant\}$. Ordered databases, in turn, can be ordered lexicographically.

Let $C_1, \ldots, C_l$ be a list of the components $\langle C_d^{+\bar a'} \rangle$, for $d \in C_c^{-\bar a/i}$. For $1 \le i \le l$ and $d \in C_i$, let $X_i = X_{\bar a' d}$. By (ii), $X_i$ does not depend on the choice of $d$. Some of the $X_i$ may appear more than once, because different components may be isomorphic. We produce another list $Y_1, \ldots, Y_m$ (where $m \le l$) that is ordered lexicographically (with respect to the order explained in the previous paragraph) and that only contains each entry once, and we let $n_i$ be the number of times $Y_i$ occurs in the list $X_1, \ldots, X_l$. (The list $Y_1, \ldots, Y_m$ can actually be defined in IFP+C; counting is crucially needed to obtain the multiplicities $n_i$.)
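The passage from the list of component copies to the duplicate-free, ordered list with multiplicities can be sketched in Python as follows. Each copy is modelled as a binary relation on numbers, that is, a set of pairs of integers. The concrete order used here (compare the sorted lists of pairs) is merely one possible lexicographic order and is our own assumption, as are the function names.

```python
from collections import Counter


def code(relation):
    """Encode a binary relation on numbers as a sorted tuple of pairs."""
    return tuple(sorted(relation))


def distinct_with_multiplicities(copies):
    """Return the ordered list of distinct codes, each with its multiplicity."""
    counts = Counter(code(x) for x in copies)
    return sorted(counts.items())


# Three copies, two of which are the same relation.
copies = [{(1, 2), (2, 1)}, {(1, 3)}, {(2, 1), (1, 2)}]
ordered = distinct_with_multiplicities(copies)
```

Because every copy lives on the ordered number domain, equal relations receive equal codes, so counting equal codes yields exactly the multiplicities.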

Let $C$ denote the subinstance of $D$ obtained by pasting the components $C_1, \ldots, C_l$ together along the tuple $\bar a'$. Using the $Y_i$ and $n_i$ we define a binary relation $Y$ on $N^D$ such that $(C, \bar a')$ is isomorphic to $(Y, 1 \ldots k+1)$. It is clearly possible to do the arithmetic necessary here within the formal framework of our logic IFP+C.

In a second step we add an element representing $a_i$ in an appropriate position and rearrange $Y$ to obtain an $X'_{\bar a c}$ that satisfies (i). However, (ii) is not guaranteed so far because our definition may depend on the choices of $c$ and $i$. So finally we let $X_{\bar a c}$ be the lexicographically first among all the $X'_{\bar a c'}$, for all suitable choices of $c'$ and $i$. For all $b \in C_c^{-\bar a}$ we let $X_{\bar a b} = X_{\bar a c}$. This completes the induction step.

We eventually reach a situation where we have defined $X_{\bar a b}$ for all $k$-precliques $\bar a$ and for all $b$. Similarly as in the definition of $Y$ above, for each $k$-preclique $\bar a$ we can now define a binary $Z_{\bar a}$ on $N^D$ such that $(D, \bar a)$ is isomorphic to $(Z_{\bar a}, 1 \ldots k+1)$. In other words, for all $\bar a$ we obtain an ordered copy of $D$ whose first $(k+1)$ elements represent $\bar a$. We pick the lexicographically smallest among all the $Z_{\bar a}$ to be our canonical copy. This definition can be formalized in IFP+C and gives us the desired formula $\varphi$. □

Theorem 4 can be proved by a similar induction, though its proof is simpler because we do not have to define a canonical copy of our database, but just determine its isomorphism type. Due to space limitations, we omit the proof.

5 Monadic second-order logic

The set of MSO-formulas is obtained by adding the following formula-formation rule to the usual rules to form first-order formulas:

Given a formula $\varphi$ over $\tau \cup \{X\}$, where $X$ is a unary relation symbol, we may form a new formula $\exists X \varphi$ over $\tau$.

Furthermore, we use the abbreviation $\forall X \varphi$ for $\neg \exists X \neg \varphi$. The semantics of MSO is obtained by inductively defining a relation $\models$, the only interesting step being
$$D \models \exists X \varphi \iff \text{there is a subset } X \subseteq A^D \text{ such that } (D, X) \models \varphi.$$

In this section we want to prove the second part of Theorem 2: On a class of databases of bounded tree-width, each MSO-definable query is IFP-definable.

To prove Courcelle’s [8] result that each MSO-definable query on graphs of bounded tree-width is computable in linear time one may proceed as follows: The first step is to compute a tree-decomposition of the input graph in which each vertex has valence at most 3. Then the crucial observation is that if we have such a tree-decomposition, we can describe each MSO-formula by a finite tree-automaton. It is not hard to simulate such an automaton in linear time.

We proceed in a similar spirit, having to deal with two problems. The first is that we do not know how to define a tree-decomposition of a database in IFP. But inductively climbing along the precliques is almost as good. The second problem is that we cannot guarantee a bounded valence, that is, we cannot give a bound on the number of components $\langle C_b^{+\bar a} \rangle$ that may occur in Lemma 9(1). However, in the proof sketched above this is needed to make the automaton finite. To overcome this problem we first need to prove the following “padding lemma”.

The quantifier-rank of an MSO-formula is the maximal number of nested quantifiers (first-order and second-order) occurring in the formula. For a database $D$ and an $l$-tuple $\bar a \in D$ we let $\mathrm{tp}_r(D, \bar a)$ be the set of all MSO-formulas $\psi(\bar x)$ of quantifier-rank $r$ such that $D \models \psi(\bar a)$. The tuple $\bar a$ may be empty; in this case we just write $\mathrm{tp}_r(D)$. We let
$$\mathrm{types}(\tau, l, r) = \{\mathrm{tp}_r(D, \bar a) \mid D \text{ database over } \tau,\ \bar a \in D \ l\text{-tuple}\}.$$
Note that this set is finite for all $\tau, l, r$.

Lemma 13. Let $\tau$ be a database schema and $l, r \ge 1$. Then there is an integer $K = K(\tau, l, r)$ such that for all databases $D, E$ over $\tau$ and $l$-tuples $\bar a \in D$, $\bar b \in E$ the following holds: If for all $\theta \in \mathrm{types}(\tau, l, r)$ we have
$$\min\bigl(K, |\{C \in \mathrm{comp}(D \setminus \bar a) \mid \mathrm{tp}_r(\langle C^{+\bar a} \rangle^D, \bar a) = \theta\}|\bigr) = \min\bigl(K, |\{C \in \mathrm{comp}(E \setminus \bar b) \mid \mathrm{tp}_r(\langle C^{+\bar b} \rangle^E, \bar b) = \theta\}|\bigr),$$
then $\mathrm{tp}_r(D, \bar a) = \mathrm{tp}_r(E, \bar b)$.
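The hypothesis of Lemma 13 compares, for every type, the number of components of that type counted only up to the threshold $K$. The following Python sketch shows this thresholded comparison. Types are treated as opaque labels and the threshold is passed in as a parameter `cap`; both are assumptions made for this illustration, since the actual value of $K$ comes from the proof of the lemma and is not computed here.

```python
from collections import Counter


def capped_profile(component_types, cap):
    """Map each occurring type to min(cap, number of components of that type)."""
    counts = Counter(component_types)
    return {theta: min(cap, n) for theta, n in counts.items()}


def same_capped_profile(types_d, types_e, cap):
    """Do two lists of component types agree after counting only up to `cap`?"""
    return capped_profile(types_d, cap) == capped_profile(types_e, cap)


# Five versus seven components of type "t1", one component of type "t2" each.
types_d = ["t1"] * 5 + ["t2"]
types_e = ["t1"] * 7 + ["t2"]
```

With threshold 3 the two lists are indistinguishable, whereas with threshold 6 the counts 5 and 6 for the first type differ; types that do not occur contribute 0 on both sides and are simply omitted from the dictionaries.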

The proof of Lemma 13 is an induction on $r$. It uses the Ehrenfeucht-Fraïssé game characterizing MSO-equivalence.

The essence of the lemma is that to compute $\mathrm{tp}_r(D, \bar a)$ we only need a finite amount of information on the types of the components $\langle C^{+\bar a} \rangle^D$. We can arrange this information in a large disjunction of formulas saying:

If there are $k_1$ components $C$ such that $\langle C^{+\bar x} \rangle \models \theta_1$ and $k_2$ components $C$ such that $\langle C^{+\bar x} \rangle \models \theta_2$ and $\ldots$ and $k_m$ components $C$ such that $\langle C^{+\bar x} \rangle \models \theta_m$, then the type of the whole database is $\theta$.

Here $\theta_1, \ldots, \theta_m$ range over the set $\mathrm{types}(\tau, l, r)$. The $k_i$ are integers between 0 and $K(\tau, l, r)$, and in case $k_i = K(\tau, l, r)$ the $i$th conjunct is to be read “if there are at least $k_i$ components” (that is, we only count up to $K(\tau, l, r)$).

Recall Lemma 9. Using it we define, by a simultaneous induction, $(k+2)$-ary relations $X_\theta$, for $\theta \in \mathrm{types}(\tau, k+1, r)$. The intended meaning of $\bar x y \in X_\theta$ is “$\bar x$ is a $k$-preclique in $\langle C_y^{+\bar x} \rangle$ and $\mathrm{tp}_r(\langle C_y^{+\bar x} \rangle, \bar x) = \theta$”. It is not hard to formalize such an inductive definition in IFP. Since for all $r, D, \bar a$ the type $\mathrm{tp}_r(D)$ is determined by $\mathrm{tp}_r(D, \bar a)$, we have proved the following lemma.

Lemma 14. Let $\tau$ be a database schema, $r, k \ge 1$, and $\theta \in \mathrm{types}(\tau, 0, r)$. Then there is an IFP-sentence $\varphi_\theta$ such that for all databases $D$ over $\tau$ of tree-width at most $k$ we have
$$\mathrm{tp}_r(D) = \theta \iff D \models \varphi_\theta.$$

Theorem 2(2) follows, since each MSO-sentence $\psi$ over $\tau$ of quantifier-rank $r$ is equivalent to the (finite) disjunction
$$\bigvee_{\theta \in \mathrm{types}(\tau, 0, r),\ \psi \in \theta} \varphi_\theta.$$

6 Conclusions

We have studied the concept of tree-width of a relational database and seen that, from a descriptive complexity theoretic perspective, classes of databases of bounded tree-width have very nice properties.

Of course our results are of a rather theoretical nature. The more practically minded will ask whether databases occurring in practice can be expected to have small tree-width. Probably, this will not be the case for the average database. However, if in a specific situation it is known that the databases in question will have small tree-width, then it may be worthwhile to explore this. It seems plausible to us that in particular data carrying not too much structure can be arranged in a database of small tree-width.

This point of view may be supported by a different perspective on tree-width, which roughly says that tree-width is a measure for the “global connectivity” of a graph (or database). Essentially, the tree-width of a graph is the same as its linkedness, which is the minimal number of vertices required to split the graph and subsets of its vertex-set into more or less even parts (see [19]).

References

1. S. Abiteboul and V. Vianu. Fixpoint extensions of first order logic and datalog-like languages. In Proceedings of the 4th IEEE Symposium on Logic in Computer Science, pages 71–79, 1989.
2. S. Arnborg, D. Corneil, and A. Proskurowski. Complexity of finding embeddings in a k-tree. SIAM Journal on Algebraic Discrete Methods, 8:277–284, 1987.
3. H.L. Bodländer. NC-algorithms for graphs with small treewidth. In J. van Leeuwen, editor, Proceedings of the 14th International Workshop on Graph theoretic Concepts in Computer Science WG’88, volume 344 of Lecture Notes in Computer Science, pages 1–10. Springer-Verlag, 1988.
4. H.L. Bodländer. Polynomial algorithms for graph isomorphism and chromatic index on partial k-trees. Journal of Algorithms, 11:631–643, 1990.
5. H.L. Bodländer. Treewidth: Algorithmic techniques and results. In Proceedings 22nd International Symposium on Mathematical Foundations of Computer Science, MFCS’97, volume 1295 of Lecture Notes in Computer Science, pages 29–36. Springer-Verlag, 1997.
6. J. Cai, M. Fürer, and N. Immerman. An optimal lower bound on the number of variables for graph identification. Combinatorica, 12:389–410, 1992.
7. A. Chandra and D. Harel. Structure and complexity of relational queries. Journal of Computer and System Sciences, 25:99–128, 1982.
8. B. Courcelle. Graph rewriting: An algebraic and logic approach. In J. van Leeuwen, editor, Handbook of Theoretical Computer Science, volume 2, pages 194–242. Elsevier Science Publishers, 1990.
9. R. Diestel. Graph Theory. Springer-Verlag, 1997.
10. T. Feder and M.Y. Vardi. Monotone monadic SNP and constraint satisfaction. In Proceedings of the 25th ACM Symposium on Theory of Computing, pages 612–622, 1993.
11. E. Grädel. On the restraining power of guards, 1998.
12. E. Grädel and M. Otto. Inductive definability with counting on finite structures. In E. Börger, G. Jäger, H. Kleine Büning, S. Martini, and M.M. Richter, editors, Computer Science Logic, 6th Workshop, CSL ’92, San Miniato 1992, Selected Papers, volume 702 of Lecture Notes in Computer Science, pages 231–247. Springer-Verlag, 1993.
13. M. Grohe. Finite-variable logics in descriptive complexity theory, 1998.
14. M. Grohe. Fixed-point logics on planar graphs. In Proceedings of the 13th IEEE Symposium on Logic in Computer Science, pages 6–15, 1998.
15. R. Halin. S-Functions for graphs. Journal of Geometry, 8:171–186, 1976.
16. N. Immerman. Relational queries computable in polynomial time. Information and Control, 68:86–104, 1986.
17. N. Immerman. Expressibility as a complexity measure: results and directions. In Proceedings of the 2nd IEEE Symposium on Structure in Complexity Theory, pages 194–202, 1987.
18. N. Immerman and E. Lander. Describing graphs: A first-order approach to graph canonization. In A. Selman, editor, Complexity theory retrospective, pages 59–81. Springer-Verlag, 1990.
19. B. Reed. Tree width and tangles: A new connectivity measure and some applications. In R.A. Bailey, editor, Surveys in Combinatorics, volume 241 of LMS Lecture Note Series, pages 87–162. Cambridge University Press, 1997.
20. N. Robertson and P.D. Seymour. Graph minors II. Algorithmic aspects of tree-width. Journal of Algorithms, 7:309–322, 1986.
21. N. Robertson and P.D. Seymour. Graph minors IV. Tree-width and well-quasi-ordering. Journal of Combinatorial Theory, Series B, 48:227–254, 1990.
22. N. Robertson and P.D. Seymour. Graph minors V. Excluding a planar graph. Journal of Combinatorial Theory, Series B, 41:92–114, 1986.
23. N. Robertson and P.D. Seymour. Graph minors XX. Wagner’s conjecture, 1988. Unpublished manuscript.
24. M.Y. Vardi. The complexity of relational query languages. In Proceedings of the 14th ACM Symposium on Theory of Computing, pages 137–146, 1982.
