View-Based Query Containment - CiteSeerX

4 downloads 0 Views 224KB Size Report
Page 1. Page 2. Page 3. Page 4. Page 5. Page 6. Page 7. Page 8. Page 9. Page 10. Page 11. Page 12.
View-Based Query Containment Diego Calvanese Giuseppe De Giacomo Maurizio Lenzerini Dipartimento di Informatica e Sistemistica Univ. di Roma “La Sapienza” Via Salaria 113, I-00198 Roma, Italy

Moshe Y. Vardi Department of Computer Science Rice University, P.O. Box 1892 Houston, TX 77251-1892, U.S.A.

[email protected]

[email protected] ABSTRACT

Query ontainment is the problem of he king whether for all databases the answer to a query is a subset of the answer to a se ond query. In several data management tasks, su h as data integration, mobile omputing, et ., the data of interest are only a

essible through a given set of views. In this ase, ontainment of queries should be determined relative to the set of views, as already noted in the literature. Su h a form of ontainment, whi h we all view-based query ontainment, is the subje t of this paper. The problem omes in various forms, depending on whether ea h of the two queries is expressed over the base alphabet or the alphabet of the view names. We present a thorough analysis of view-based query ontainment, by dis ussing all possible

ombinations from a semanti point of view, and by showing their mutual relationships. In parti ular, for the two settings of onjun tive queries and two-way regular path queries, we provide both te hniques and omplexity bounds for the di erent variants of the problem. Finally, we study the relationship between view-based query ontainment and view-based query rewriting. 1.

Introdu tion

Querying is the fundamental me hanism for extra ting information from a database. Besides the basi task of query answering, i.e., evaluating a query over a database, data and knowledge representation systems should support other reasoning servi es related to querying. One of the most important is query ontainment, i.e., verifying whether for all databases the answer to a query is a subset of the answer to a se ond query. Che king ontainment of queries is ru ial in several ontexts, su h as query optimization, query reformulation, knowledge-base veri ation, information integration, integrity he king, and ooperative answering [22, 28, 16, 5, 9, 33, 27, 11, 18, 20, 32℄. Obviously, query ontainment is also useful for he king equivalen e of queries, i.e., verifying

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. PODS 2003, June 9-12, 2003, San Diego, CA. Copyright 2003 ACM 1-58113-670-6/03/06 ... 5.00

$

whether for all databases the answer to a query is the same as the answer to another query. The main results on query

ontainment are summarized in [15℄. In several data management tasks, the data of interest are only a

essible through a given set of views. In other words, we would like to a

ess a database, but we have only information about the data satisfying the views. For example, in data integration [25℄, data are stored in lo al sour es modeled as views over a virtual database. It follows that answering a query expressed over the virtual database s hema, amounts to answering the query based on the data satisfying the views, rather than the data in the database. Similarly, a data warehouse an be seen as a set of materialized views omputed over a olle tion of data sour es. Only these views, and not the raw data at the sour es, are a

essible when answering queries posed to the data warehouse. Also, in mobile omputing, the data stored in the lo al devi e are those satisfying a set of views over a global database, and it is often desirable to avoid a

essing the global database, in order to save network bandwidth. In all the above mentioned settings, query pro essing omes in a di erent form with respe t to the traditional framework. Instead of onsidering a query over of a given database, one should take into a

ount that the query is now pro essed by relying only on the information about the data satisfying the views. We

all this kind of pro essing view-based query pro essing [12℄. Again, the basi reasoning servi es related to view-based query pro essing is the one aiming at omputing the answer to a query, based on the available views. The problem,

alled view-based query answering has been the subje t of an extensive investigation in the last years [23℄. Analogously to the traditional setting, also in the ontext of view-based query pro essing, the need arises of he king ontainment between queries. In this ontext, ontainment of queries should be determined relative to the set of views, as already noted in the literature [31, 30℄. Su h form of ontainment,

alled view-based query ontainment, is the subje t of this paper. To the best of our knowledge, the rst paper dealing with view-based query ontainment is [31℄, where the problem,

alled \relative ontainment", is studied for variants of onjun tive queries and views. In parti ular, it is shown that relative ontainment is P2 - omplete in the ase of onjun tive queries and views. In [30℄, the results are extended to

the ase where views have limited a

ess patterns. Although previous work on view-based query ontainment

onsidered only the ase of queries expressed in the alphabet of the database (base alphabet), we show here that the problem omes in various forms, depending on whether ea h of the two queries is expressed over the base alphabet, or the alphabet of the view names (view alphabet). The rst ontribution of this paper is a thorough analysis of view-based query ontainment. In parti ular, we dis uss several semanti s of ontainment for all the possible forms, and we study their mutual relationships. The se ond ontribution is a study of view-based query ontainment in two spe i ontexts: onjun tive queries in relational databases, and regular path queries in semistru tured databases. The standard model for semistru tured databases are edge-labeled graphs, where nodes represent obje ts, and labeled edges between two nodes represent links between obje ts [8, 19℄. This model aptures also data expressed using XML-like languages [7, 10℄. Regular path queries (RPQ), whi h ask for all pairs of obje ts that are

onne ted by a path onforming to a regular expression, are the basi querying me hanism in this framework [8, 2, 4℄. Regular path queries an express omplex navigations in a graph. In parti ular, union and transitive losure are ru ial when we do not have a omplete knowledge of the stru ture of the database. In our regular path queries, we in lude also the inverse operator, whi h enables us to navigate edges ba kward [8, 9℄, for example, from a hild to its parent. We all these queries two-way regular path queries (2RPQs). For both onjun tive and two-way regular path queries and views, we provide te hniques and omplexity bounds for the di erent variants of view-based query ontainment. In establishing the results for 2RPQs, we make use of and extend the known onne tion between view-based query answering and onstraint satisfa tion [13℄. In parti ular, we show that su h a onne tion implies a strong relationship between onstraint satisfa tion and view-based query ontainment, in the ases where at least one of the queries is expressed in the base alphabet. This relationship omplements the one between onstraint satisfa tion and onjun tive query ontainment [24℄. A third ontribution is a study of the relationship between view-based query ontainment and view-based query rewriting. In parti ular, we show that view-based query ontainment allows one to verify desirable properties of rewritings. The rest of the paper is organized as follows. Se tion 2 introdu es the te hni al preliminaries on databases and queries. Se tion 3 extends the onne tion between view-based query answering and onstraint satisfa tion of [13℄ to 2RPQs. Se tions 4, 5, 6, and 7 provide the de nitions and the results for view-based query ontainment, in the various ases. Se tion 8 dis usses the onne tion with view-based query rewriting. Se tion 9 on ludes the paper. 2.

Preliminaries

In this paper, we onsider databases and view extensions as nite relational stru tures. Let  be an alphabet of relation symbols, ea h with an asso iated arity. A nite relational

stru ture (or simply stru ture ) R over  is a pair (R ; R ), where R is a nite domain and R is a fun tion that assigns to ea h relation symbol in r 2  a relationRrR , also denoted by r(R), of the appropriate arity over  . Given a query Q over , we denote by Q (R) the result of evaluating Q over R. Note that we expli itly annotate queries with

the alphabet over whi h they are de ned. Given two stru tures R1 and R2 over , we use R1  R2 to denote that r(R1 )  r(R2 ), for ea h r 2 . A query Q is monotone if Q (R1 )  Q (R2 ) whenever R1  R2 . Let  be a nite alphabet of relation symbols, xed on e and for all, whi h we all base alphabet. A database is a stru ture over . Consider a database that is a

essible only through a olle tion V of views, and suppose we want to answer a query over the database only on the basis of our knowledge on the views. Spe i ally, the olle tion of views is represented by a nite set V of view symbols, ea h denoting a relation. Ea h view symbol V 2 V has an asso iated view de nition V  , whi h is a query over . A V -extension E is a stru ture over V . We onsider views to be sound [3, 21℄, i.e., we model a situation where the extension of the views provides a subset of1 the results of applying the view de nitions to the database . Formally, given a set of views V and a database B, we use V  (B) to denote the V -extension E su h that V (E ) = V  (B), for ea h V 2 V . We saythat a V -extension E is sound wrt a database B if E  V (B). In other words, for a view V 2 V , all the tuples in V (E ) must appear in V  (B), but V  (B) may ontain tuples not in V (E ). A set V of views is non- onstraining if ea h V -extension is sound wrt some database, i.e., for ea h V -extension E there exists a database B su h that E  V  (B). Intuitively, all extensions of non onstraining views are admissible, sin e they are sound wrt some database. Given a set V of views, a V -extension E , and a query Q ,  the set of ertain answers under sound views to Q with respe t to V and E is the set of tuples t of obje ts su h that t 2 Q (B) for every database B wrt whi h E is sound, i.e., E  V  (B). View-based query answering under sound views onsists in de iding whether a given tuple of obje ts is a ertain answer under sound views to Q with respe t to V and E . Given a set V of views and a query Q , we denote by ert Q ;V the query that, for every V -extension E , returns the set of ertain answers under sound views to Q with respe t to V and E . Sin e in this paper we onsider sound views only, we simply refer to ertain answers and view-based query answering, thus dropping the quali ation \under sound views". We onsider the ase of onjun tive queries over relational databases. In parti ular, we deal with standard onjun tive queries without equalities and without onstants. Observe that su h queries are monotone and that every set of views whose de nitions are onjun tive queries is non onstraining. We also onsider (in fa t mainly fo us on) the

ase of regular path queries over semistru tured databases. A semistru tured database is a nite graph whose nodes represent obje ts and whose edges are labeled by elements from 1 This orresponds to adopting the so- alled \open world assumption" for the views.

 [8, 1℄. An edge (x; r; y) from node x to node y labeled by r represents the fa t that relation r holds between the obje t x and the obje t y . Note that a semistru tured database an be viewed as a stru ture B over the set  of binary relational symbols. A regular-path query (RPQ) over an alphabet  of binary relation symbols is expressed as a regular expression or a nite-state automaton over . When evaluated on a stru ture R over , an RPQ omputes the set of pairs of obje ts onne ted in R by a path in the regular language de ned by the RPQ. We onsider two-way regular-path queries (2RPQs) [12℄, whi hextend RPQs with the inverse operator. Formally, let  =  [ fr j r 2 g be the alphabet in luding a new symbol r for ea h r in . Intuitively, r denotes the inverse of the binary relation r. If q 2  , then we use q to mean the inverse of q, i.e., if q is r, then q is r , and if q is r , then q is r. 2RPQs are expressed by means of regular expressions or nite-state automata over  , whose language is di erent from the language onsisting only of the empty word ". When evaluated on a stru ture R over , a 2RPQ Q omputes the set Q (R) of pairs of obje ts onne ted in R bya semipath that onforms to the regular language L(Q ). A semipath in R from x to y (labeled with q1    qn ) is a sequen e of the form (y0 ; q1 ; y1 ; : : : ; yn 1 ; qn ; yn ), where n  0, y0 = x, yn = y, and for ea h yi 1 ; qi ; yi , we have that qi 2  , and, if qi = r then (yi 1 ; yi ) 2 r(R), and if qi = r then (yi ; yi 1 ) 2 r(R ). We say that a semipath (y0 ; q1 ; : : : ; qn ; yn ) onforms to Q if q1    qn 2 L(Q ). A semipath is said to be simple if no obje t appears more than on e in the orresponding sequen e. Observe that 2RPQs (resp., RPQs) are monotone and that every set of 2RPQ (resp., RPQ) views is non- onstraining. Given two queries Q1 and Q2 over the same alphabet , we say that Q1 is ontained in Q2, denoted Q 1  Q2 ,if for every stru ture R over , we have that Q1 (R)  Q2 (R). In this paper we onsider query ontainment relative to a set of views. In parti ular, we study the notion of view-based query ontainment of a query Q1 1 in a query Q2 2 wrt a  set V of views, denoted by Q 1 1 V Q2 2 , where ea h i is either  or V . The semanti s of this notion depends on 1 and 2 , i.e., the alphabets over whi h the two queries are expressed, and will be dis ussed for the various ases in the next se tions. The omplexity results we provide refer to two spe i settings: onjun tive queries, where both queries and views are onjun tive, and 2RPQs, where both queries and views are 2RPQs. 3.

Constraint-satisfa tion Answering

and

Query

A onstraint-satisfa tion problem (CSP) is traditionally de ned in terms of a set of variables, a set of values, and a set of onstraints, and asks whether there is an assignment of the variables with the values that satis es the onstraints. A hara terization of CSP an be given in terms of homomorphisms between stru tures [17℄. A homomorphism h : A ! B between two stru tures A and B over the same alphabet is a mapping h : A ! B su h that, if ( 1 ; : : : ; n ) 2 RA , then (h( 1 ); : : : ; h( n )) 2 RB , for every relation symbol R in the alphabet. Let A and B be two

lasses of stru tures. The (uniform) onstraint-satisfa tion problem CSP (A; B) is the following de ision problem: given a stru ture A 2 A and a stru ture B 2 B over the same alphabet, is there a homomorphism h : A ! B ? When B onsists of a single stru ture B and A is the set of all stru tures over the alphabet of B , we get the so- alled non-uniform

onstraint-satisfa tion problem, denoted by CSP (B ), where B is xed and the input is just a stru ture A 2 A. As usual, we onsider CSP (B ) as the set of stru tures A su h that there is a homomorphism from A to B . Interestingly, ontainment between two CSPs an be hara terized in terms of homomorphism. Indeed, the following general result holds. Proposition 1. Let B1 and B2 be two stru tures over the same alphabet. Then CSP (B1 ) is ontained in CSP (B2 ) i there exists a homomorphism from B1 to B2 . Proof. \(" Suppose there exists a homomorphism h from B1 to B2 . Given an instan e A of CSP (B1 ), by de nition, there is a homomorphism g from A to B1 . Then g Æ h is an homomorphism from A to B2 . \)" B1 is obviously in CSP (B1 ), sin e the identity fun tion is an homomorphism. Sin e by hypothesis CSP (B1 ) is

ontained in CSP (B2 ), we have that B1 is in CSP (B2 ), i.e, there is an homomorphism from B1 to B2 .

We will exploit su h a hara terization for the results in the following se tions. A tight relationship between non-uniform CSP and viewbased query answering for RPQs is illustrated in [13℄, whi h provides a polynomial redu tion from view-based query answering for RPQs to non-uniform CSP. In this paper we extend this relationship to 2RPQs and exploit it in di erent ways, by making use of the notions of onstraint template and onstraints instan e (both for query answering and for view-based query answering), de ned as follows. Given a 2RPQ Q over an alphabet , the onstraint template CT Q ; of Q wrt  is the stru ture C de ned as follows.  The alphabet of C is  [ fUi ; Uf g, where Ui and Uf

denote unary relation symbols.  Let AQ = ( ; S; S0 ; ; F ) be a (nondeterministi ) automaton for Q . The stru ture C = (C ; C ) is given by: { C = 2S ; {  2 UiC i S0   ; {  2 UfC i  \ F = ;; { (1 ; 2 ) 2 rC i (1 ; r)  2 and (2 ; r )  1 | we onsider here  as extended to sets of states in the usual way.

Given a stru ture R over  and a pair of obje ts and d, the is the stru ture

onstraint instan e R ;d of CSP (CT Q ; ) I = (I ; I ) de ned as follows:

 I = R [ f ; dg;  rI = rR, for ea h r 2 ;  UiI = f g, and UfI = fdg.

2. Let Q be a 2RPQ over an alphabet , R a stru ture over , and , d a pair of obje ts. Then, ( ; d) 62 Q (R) if and only if there is a homomorphism from R ;d to Theorem

To he k the existen e of a word q1    qk 2 L(V ) and of a sequen e T0 ; : : : ; Tk of subsets of S su h that onditions 1{3 above are satis ed, we he k nonemptiness of the interse tion of V with an automaton A de ned as follows [35℄. A = ( ; S A ; I A ; Æ A ; F A ) with S A = 2S  2S , I A = f(1 ; 1 )g, F A = f(T; 2 ) j T 2 2S g, and with (U; V ) 2 ÆA ((T; U ); q) if the following holds:  if s 2 U and t 2 (s; q ), then t 2 T ;  if s 2 U and t 2 (s; q), then t 2 V .

CT Q ; .

Proof. \(" By ontradi tion. Let h be a homomorphism from R ;d to CT Q ; , and assume there is a semipath p = (x0 ; q1 ; x1 ; : : : ; xm 1 ; qm ; xm ) from = x0 to d = xm in R with q1    qm 2 L(Q). Hen e there is a sequen e Æ = (s0 ; : : : ; sm ) of states of AQ su h that s0 2 S0 , sm 2 F , and si+1 2 (si; qi+1 ) for ea h i 2 f0; : : : ; m 1g. Sin e h is a homomorphism, we have that s0 2 h( ), and hen e, by

onstru tion of the relations ri in the onstraint template, we have that si 2 h(xi ) for ea h i 2 f0; : : : ; m 1g. Moreover, sin e h(d) 2 Uf , we have that sm 62 F . Contradi tion. \)" Given a stru ture R and two obje ts , d su h that ( ; d) 62 Q(R), we build a mapping h : R ! 2S by putting ea h state in S0 in h( ) and repeating the following until h does not hange any more: if (x; y) 2 r(R) and s 2 h(x) then add (s; r) to h(y), and if (x; y) 2 r(R) and s 2 h(y) then add (s; r ) to h(x). Note that, sin e ( ; d) 62 Q (R), we have that h ;d(d) [ F = ;. Hen e, h is indeed an homomorphism from R to CT Q ;.

We note that it follows from Theorem 2 and [17℄ that the

omplement of a 2RPQ an be expressed in monadi NP (i.e., in monadi existential se ond-order logi ). In ontrast, it follows from [6℄ that even a simple query su h as a annot be expressed in monadi NP. Next we de ne onstraint templates for the ertain answers of 2RPQs wrt a set of 2RPQ views, extending the analogous notion for RPQs in [13℄. Given a 2RPQ Q and a set V of 2RPQ views, the onstraint template CT Q ;V of Q wrt V is the stru ture C de ned as follows.  The alphabet of C is V [ fUi ; Uf g.  Let AQ = ( ; S; S0 ; ; F ) be a (nondeterministi ) automaton for Q . The stru ture C = (C ; C ) is

given by: { C = 2S ; {  2 UiC i S0   ; {  2 UfC i  \ F = ;; { (1 ; 2 ) 2 V C i there exists a word q1    qk 2 L(V ) and a sequen e T0 ; : : : ; Tk of subsets of S su h that the following hold: 1. T0 = 1 and Tk = 2 , 2. if s 2 Ti and t 2 (s; qi ) then t 2 Ti+1 , for 0  i < k, and 3. if s 2 Ti and t 2 (s; qi ) then t 2 Ti 1 , for 0 < i  k.

It is easy to see that the automaton A a

epts a word q1    qk if there exists a sequen e T0 ; : : : ; Tk of subsets of S su h that onditions 1{3 above are satis ed. Hen e, by

onstru ting A on the y while he king for nonemptiness, one an verify whether (1 ; 2 ) 2 V C in NLOGSPACE in the size of V and in PSPACE in the size of Q . Given a V -extension ;d E and a pair of obje ts and d, the

onstraint instan e E of CSP (CT Q ;V ) is the stru ture I = (I ; I ) over the alphabet V[fUi ; Uf g de ned as follows:  I = E [ f ; dg;  V I = V E , for ea h V 2 V ;  UiI = f g, and UfI = fdg. Theorem 3. Let Q be a 2RPQ, V a set of 2RPQ views, E a V -extension, and , d a pair of obje ts. Then, ( ; d) 62

ert Q ;V (E ) if and only if there is a homomorphism from E ;d to CT Q ;V . Proof. \(" By ontradi tion. Given a homomorphism from E ;d to CT Q ;V , we onstru t a database B su h that E  V (B) as follows: for every view V and every pair (a; b) 2 V (E ) we hoose a word w = q1    qk 2 L(V  ) su h that there is a sequen e T0 ; : : : ; Tk of subsets of S satisfying the ondition used to build the onstraint template, and introdu e in B a simple semipath (y0 ; q1 ; y1 ; : : : ; yk 1 ; qk ; yk ) where y0 = a, yk = b and y1 ; : : : ; yk 1 are new obje ts. Suppose that ( ; d) 2 Q (B), i.e., there is a semipath p = (x0 ; q1 ; x1 ; : : : ; xm 1 ; qm ; xm ) from = x0 to d = xm in B with q1    qm 2 L(Q ). Hen e there is a sequen e Æ = (s0 ; : : : ; sm ) of states of AQ su h that s0 2 S0 , sm 2 F , and si+1 2 (si ; qi+1 ) for ea h i 2 f0; : : : ; m 1g. Let xi and xi + h be two obje ts in the extension E su h that, for ea h i < j < i+h, xj is one of the new obje ts introdu ed by the onstru tion of B. By onstru tion of the relations V in the onstraint template, we have that, if si 2 h(xi ), then si+h 2 h(xi+h ). Now we have that s0 2 h( ), and hen e, by indu tion on the number of obje ts among x1 ; : : : ; xm that appear in E , we have that sm 2 h(d). But sin e h(d) 2 Uf , we have that sm 62 F . Contradi tion. \)" Given a database B su hthat E  V (B) and two obje ts

, d su h that ( ; d) 62 Q (B), we build a mapping h0 : B ! 2S by putting ea h state in S0 in h0 ( ) and repeating the following until h does not hange any more: if (x; y) 2 r(B) and s 2 h0 (x) then add (s; r) to h0 (y), and

h

if (x; y) 2 r(B) and s 2 h0 (y) then add (s; r0 ) to h0 (x). Note that, sin e ( ; d) 62 Q (B), we have that h (d) [ F = ;. Proje ting h0 on E we obtain a homomorphism from E ;d to CT Q ;V . Again, by Theorem 2 and [17℄, it follows that the omplement of omputing ertain tuples of a 2RPQ with respe t to a xed set of 2RPQ views is in monadi NP. The proof via CSP is mu h simpler than the proof in [12℄ that omputing the ertain answers is in oNP in data omplexity. 4.

View-based ontainment of

Q1 in Q2

The rst problem we address is view-based ontainment between two queries over the base alphabet. This is the problem originally studied in [31℄. 1. Q1 V Q2 if for every database B, and for every V -extension E with E  V (B), we have

ert Q1 ;V (E )  ert Q2 ;V (E ). Definition

A simple example showing the di eren e between traditional query ontainment and view-based query ontainment is the following. Let the base alphabet be  = fperson ; worksfor g and onsider the two queries Q1 = fx j person (x)g Q2 = fx j person (x); worksfor (x; y )g Obviously, Q1 is not ontained in Q2 . Now, suppose our database is only a

essible through the views V = fV g, where the de nition asso iated to the only view V is V = fx j person (x); worksfor (x; y)g. It is easy to see that, for every extension E of V , the ertain answers to both Q1 and Q2 wrt V and E oin ide with E . It follows that view-based

ontainment between Q1 and Q2 holds, i.e., Q1 V Q2 . Observe that in De nition 1 we onsider only V -extensions that are sound wrt some database. It turns out that, w.l.o.g. we an drop this requirement of V -extensions. Intuitively, this happens be ause for ea h extension E that is not ontained in V (B) for some database B, both ert Q1 ;V (E ) and

ert Q ;V (E ) be ome trivial and return all possible tuples of obje ts2 in E . Formally, we have the following result. Proposition 4. Q 1 V Q2 if and only if for every V extension E , we have ert Q1 ;V (E )  ert Q2 ;V (E ). Proof. \(" Obvious. \)" By ontradi tion. Suppose that for all databases B, and for all V -extensions E with E  V (B), we have that ert Q1 ;V (E )  ert Q2 ;V (E ), but there exists an E su h that ert Q1 ;V (E) 6 ert Q2 ;V (E). The latter implies that there is a tuple t of obje ts in E su h that t 2 ert Q1 ;V (E) and for a database B with E  V (B) we have t 62 Q2 (B). But this ontradi ts the hypothesis.

Note that the ondition in the proposition above is a tually the de nition of relative ontainment in [31℄.

One ould also ompare the two queries only wrt extensions that orrespond exa tly to the evaluation of the views over some database. Formally, this notion orresponds to he king whether, for all databases B, we have ert Q1 ;V (V (B)) 

ert Q2 ;V (V (B)). It turns out that su h a notion is equivalent to view-based query ontainment as de ned in De nition 1. Proposition 5. ert Q ;V (V (B ))  1 every database B, if and only if Q1 V Proof.

onditions:

ert Q2 ;V (V (B)) for Q2 .

By Proposition 4, it suÆ es to show that the

1. for every V -extension E we have that ert Q1 ;V (E ) 

ert Q2 ;V (E ) 2. for every database B, we have that ert Q1 ;V (V (B)) 

ert Q2 ;V (V (B)) are equivalent. (1) implies (2): straightforward, sin e for ea h database B, V (B) is a V -extension. (2) implies (1): we show that not (1) implies not (2). Assume that there exists a V -extension0 E and a tuple of obje ts t su h that for all databases B su h that E  V (B0 ), we have that t 2 Q1 (B0 ), and there exists a database B su h that E  V (B) and t 62 Q2 (B). Observe that, sin e E  V (B),00 we have that t 2 Q1 (B00). Moreover, for 00all databases B su h that00 V (B)  V (B ), sin e E  V (B ), we have that t 2 Q1 (B ). Hen e we get that for the database B, we have t 2 ert Q1 ;V (V (B)) but t 62 ert Q2 ;V (V (B)). Te hniques for view-based query ontainment of Q1 in Q2 for onjun tive queries are des ribed in [31℄. It is well known that for onjun tive views V and a onjun tive query Q , the ertain answers ert Q ;V an be expressed as a union of onjun tive queries, alled maximally ontained rewriting of Q wrt V [26℄. Hen e, he king Q1 V Q2 redu es to

he king (traditional) ontainment between the maximally

ontained rewriting of Q1 wrt V and the maximally ontained rewriting of Q2 wrt V . Considering that the size of the maximally ontained rewriting of a onjun tive query Q wrt V is in general exponential in the size of Q, and that a union of onjun tive queries [i qi is ontained in a union of

onjun tive queries [j rj , if and only if for every i there is some j su h that qi is ontained in rj , one gets the upper bound given in [31℄. In the same paper, also a mat hing lower bound is proved, thus getting the following omputational omplexity hara terization. Che king Q1

omplete for onjun tive queries. Theorem 6

([31℄).

V Q2

is

P2 -

Next we study the problem for 2RPQs. By Proposition 4, we an on entrate on he king whether for all V -extensions E we have that ert Q1 ;V (E )  ert Q2 ;V (E ). By Theorem 3,

we know for a 2RPQ Q that ( ; d) is not in ert Q ;V (E ) if and only if E ;d is in CSP (CT Q ;V ). Therefore, Q1 V Q2 if and only if for all ;dV -extensions E , and for all obje ts

; d, we have that E in CSP (CT Q2 ;V ) implies E ;d in CSP (CT Q1 ;V ). In order to perform su h a he k we resort to a suitable variant of ontainment between CSPs (see, Theorem 1). Given a onstraint template CT Q ;V , a proper onstraint template CT ; Q ;V is obtained by eliminating from Ui all but one element and from Uf all but one element . Observe that, in general, there are many proper onstraint templates for ea h onstraint template CT Q ;V (more pre isely there are j2S  2S j, where S is the set of states of the automaton for Q ). Observe that a proper onstraint template is itself a

onstraint instan e of the original onstraint template. Theorem

7. For 2RPQs, Q1 V ; Q2 if and only if, for

every proper onstraint template CT Q ;V of CT Q2 ;V , there 2 is a homomorphism from CT ; Q ;V to CT Q 1 ;V . 2

Proof.

By Theorem 3, it suÆ es to verify that, if for all

V -extensions E and for all ;dobje ts ; d, we have that E ;d in CSP (CT Q2 ;V ) implies E in CSP (CT Q1 ;V ), then for every

proper onstraint template CT ; of CT Q2 ;V there exists Q 2 ;V ; a homomorphism from CT Q2 ;V to CT Q1 ;V , and vi e-versa.

\(" Suppose for every proper onstraint template CT ; Q 2 ;V ; of CT Q2 ;V there exists a homomorphism from CT Q2 ;V to CT Q1 ;V . Consider an extension E ;d in CSP (CT Q2 ;V ), i.e, there is a homomorphism g from E ;d to CT Q2 ;V . Consider the proper onstraint template CT Qg(2 );;gV (d) . By hypothesis there is an homomorphism h from CT Qg(2 );;gV (d) to CT Q1 ;V . Then g Æ h is an homomorphism from E ;d to CT Q1 ;V . \)" Every proper onstraint template CT ; of CT Q2 ;V Q 2 ;V is obviously in CSP (CT Q2 ;V ). Sin e by hypothesis every onstraint instan e in CSP (CT Q2 ;V ) is also in CSP (CT Q1 ;V ), we have that CT ; is in CSP (CT Q1 ;V ), Q 2 ;V ; i.e, there is an homomorphism from CT Q2 ;V to CT Q1 ;V . For the following lemma, we exploit the hara terization of Q1 V Q2 , in Theorem 7. Lemma

2RPQs.

8. Che king Q1 V

Q2 is in NEXPTIME for

Proof. We have to he k whether every proper onstraint template CT ; of CT Q2 ;V is in CSP (CT Q1 ;V ). Q 2 ;V Both onstru ting the onstraint template CT Q1 ;V and onstru ting ea h proper onstraint template CT ; is expoQ 2 ;V   nential in the size of Q1 and Q2 respe tively, and polyno-

mial in the size of the de nitions of the views in V . Che king the existen e of ea h homomorphism is NP in the size of the CT Q1 ;V . Moreover, the number of proper onstraint templates is j2S  2S j, where S is the set of states of the automaton for Q2 , and hen e it is exponential in the size of Q2 . These bounds give us the laim. The next lemma gives us a mat hing lower-bound. Lemma

RPQs.

9. Che king Q1 V Q2 is NEXPTIME-hard for

Proof. In Lemma 20, we will prove NEXPTIMEhardness of Q1 V QV2 for RPQs. Consider the RPQ QV2 in that proof, and let Q2 be the RPQ obtained by repla ing in QV2 theVview symbols by their de nitions. It is easy to see that Q2 (E ) = ert Q2 ;V (E ). Thus, the redu tion in Lemma 20 also proves NEXPTIME-hardness for the ontainment of ert Q1 ;V in ert Q2 ;V .

From Lemmas 8 and 9, we get the following omputational

omplexity hara terization. 10. Che king

Q1 V Q2 is NEXPTIME omplete both for 2RPQs and for RPQs. Theorem

5.

View-based ontainment of

QV1 in Q2

In this se tion we study view-based ontainment of a query expressed on the view alphabet in a query Q2 expressed on the base alphabet. Intuitively, we want to he k whether, for all view extensions E that are sound with respe t to some database, the evaluation of QV1 over E yields a subset of the

ertain answers of Q1 with respe t to V and E .

QV1

Definition

2. QV1 V

for every V -extension

ert Q2 ;V (E ).

E

Q2 if for every database B, and with E  V (B), we have QV1 (E ) 

Again, the above de nition onsiders only V -extensions that are sound with respe t to some database. However, as for the ase of previous se tion, it turns out that we an drop this requirement w.l.o.g. 11. QVV1 V Q2 if and only if for every V extension E we have Q1 (E )  ert Q2 ;V (E ). Proposition

Proof.

equivalent:

We show that the following two onditions are

1. forV all databases B, for all V -extensions E  V (B), Q1 (E )  ert Q2 ;V (E ) 2. for all V -extensions E , QV1 (E )  ert Q2 ;V (E ).

(2) implies (1): obvious. (1) implies (2): by ontradi tion. Suppose that for every database B, and for every V -extension E with E  V (B), we have that QV1 (E )  ert Q2 ;V (E ), but there exists an E su h that QV1 (E) 6 ert Q2 ;V (E). The latter implies that there is a tuple t of obje ts in E su h that t 2 QV1 (E) and for a database B with E  V (B) we have t 62 Q2 (B). But this

ontradi ts the hypothesis.

Interestingly, it turns out that, if both queries QV1 and Q2 are monotone, the ondition in the proposition above is equivalent to a simplerV ondition, namely: for every database B we have that Q1 (V (B))  Q2 (B). This is stated in the following proposition.

As for the ase of Se tion 4, one ould also on eive a di erent notion of ontainment of QV1 in Q2 than the one given in De nition 2, namely: for every database B, we have that QV1 (V (B)) is a subset of ert Q2 ;V (V (B)). In general this notion di ers from the one given in De nition 2, as shown by the following example. Let the base alphabet be  = fRg and the views be V = fV1 ; V2 g with de nitions V1 = V2 = R. Consider the two unary queries QV1 = fx j V1 (x); :V2 (x)g Q2 = ; ItV is easy to see that, for every database B, we have Q1 (V (B)) = ;, and hen e QV1 (V (B))  ert Q2 ;V (V (B)) (in fa t, also ert Q2 ;V (V (B)) = ;). However, onsider the database B with RB = f(a; b)g and the V -extension  E with V1E = f(a; b)g and V2E = ;, whi h is su h that E  V V (B). For su h a database Vand view extension we have Q1 (E) = f(a; b)g, and hen e Q1 (E) 6 ert Q2 ;V (E).

Sin e QV1 is monotone, by Propositions 11 and 12, it suÆ es to prove that the following two onditions are equivalent:

However it turns out that, if QV1 is monotone, the two notions are equivalent.

Proposition 13. Let QV1 V Q2 if and only QV1 (V (B))  Q2 (B).

QV1 and Q2 be monotone. Then, if for every database B, we have

Proof.

1. for all databases B, we have that QV1 (V (B)) 

ert Q2 ;V (V (B)) 2. for all databases B, we have that QV1 (V (B))  Q2 (B) (1) implies (2): straightforward, sin e by de nition of ertain answers, for all databases B we have that ert Q2 ;V (V (B))  Q2 (B). (2) implies (1): we show that not (1) implies not (2). Assume that there exists a database B and a tuple of obje ts t su h that t 2 QV1 (V (B)), and there exists a database B0 su h that V (B)  V(B0 ) and t 62 Q2 (B0 ). Observe that, by monotoni ity of Q2 , we have that t 62 Q2 (V (B)). Hen e for the database B we have that t 2 QV1 (V (B)) and t 62 Q2 (V (B)), thus getting the thesis.

Proposition 12. Let QV QV1 V 1 be monotone. Then,  V Q2 if and only if for every database B, we have Q1 (V (B)) 

ert Q2 ;V (V (B)).

Sin e onjun tive queries are monotone, the above Vtheorem implies that, for onjun tive queries, he king Q1 V Q2 simply amounts to he king whether for every B, QV1 (V (B ))  Q2 (B). This is equivalent to substituting the view symbols appearing in QV1 with the orresponding view de nitions, thus getting a new onjun tive query Q1 over the base alphabet , and then he king Q1  Q2 .

Proof. By Proposition 11, it suÆ es to prove that the following two onditions are equivalent:

Theorem 14. Che king QV 1

onjun tive queries.

QV1 (E )



QV1 (V (B))



1. for all V -extensions E we have that

ert Q2 ;V (E ) 2. for all databases B, we have that

ert Q2 ;V (V (B))

(1) implies (2): straightforward, sin e for ea h database B,

V (B) is a V -extension.

(2) implies (1): we show that not (1) implies not (2). Assume that there exists a V -extension E and a tuple of obje ts t su h that t 2 QV1 (E ), and there exists a database B0 su h that E  V (B0 ) and t 62 Q2 (B0 ). Observe that, by monotoni ity of QV1 , we have that t 2 QV1 (V (B0 )). Hen e the database B0 is su h that QV1 (V (B0 )) 6 ert Q2 ;V (V (B0 )).

V Q2

is NP- omplete for

Now we turn to the problem of he king whether QV1 V Q2 in the setting of 2RPQs. As for the ase of onjun tive queries, sin e 2RPQs are monotone, he king view-based

ontainmentV of QV1 in Q2 redu es to he king whether for every B, Q1 (V (B ))  Q2 (B). Again, this is equivalent to substituting views with their de nitions, thus getting a new 2RPQ Q1 , and then he king Q1  Q2 . Theorem 15. Che king QV 1 V Q2 is PSPACE omplete both for 2RPQs and for RPQs. Proof. Sin e 2RPQs are monotone, by Proposition 13,

he king QV1 V Q2 is equivalent to he king whether Q1  Q2 , where Q1 is the 2RPQ over  obtained from QV1 by substituting ea h view symbol V with its de nition

V .

The upper bound follows from the fa t that ontainment between two 2RPQs an be he ked in PSPACE [15℄. The lower bound follows from PSPACE-hardness of ontainment of regular expressions, onsidering a set of views that

oin ide with the base symbols.

6.

View-based ontainment of

Q1 in QV2

We address the problem of view-based ontainment between a query over the base alphabet and a query over the view alphabet. Definition 3. Q 1 V QV2 if for every database B, and for every V -extension E  V (B), we have ert Q1 ;V (E )  QV2 (E ).

Again in this de nition we onsider only V -extensions that are sound wrt some database. However, we an drop this requirement w.l.o.g. for non- onstraining views. Proposition 16. Let V be a set of non- onstraining views. Then, Q1 V QV2 if and only if for every V -extension E , we have ert Q1 ;V (E )  QV2 (E )

\(" Obvious. \)" By ontradi tion. Suppose that for ea h database B and for ea h V -extensions E with E  V (B), we have that ert Q1 ;V (E )  QV2 (E ), but suppose there exists an E su h that ert Q1 ;V (E) 6 QV2 (E). Now, sin e views are non- onstraining, there exists a database B su h that E  V (B). Hen e E ontradi ts the hypothesis. Proof.

For onjun tive queries, we know that ert Q1 ;V oin ides with the maximally ontained rewriting of Q1 wrt V , whi h is a possibly exponential union of onjun tive queries, ea h of linearV size in Q1 [23℄. To he k non- ontainment

ert Q1 ;V 6 Q2 , we have to guess a onjun tive query in the maximal rewriting of Q1 , and verify that it is not ontained V in Q2 . Hen e we get the following upper bound. Theorem

tive queries.

17.

Che king Q1 V QV2 is in P2 for onjun -

Whether this bound is tight is an open problem. Now we turn to the problem of he king whether Q1 V QV2 , in the setting of 2RPQs. Sin e 2RPQ views are non onstraining, by Proposition 16, itV is suÆ ient to he k whether ert Q2 ;V is ontained in Q1 . By Theorem 3, we know for a 2RPQ Q that ( ; d) is not in

ert Q ;V (E ) i E ;d is in CSP (CT Q ;V ). On the other hand, byV Theorem ;d2, we know for a query QV that ( ; d)is not inV Q (E ) i E is in CSP (CT QV ;V ). Therefore, Q1 V Q2

if and only if for every V -extension E , and for every pair obje ts and d, we have that E ;d in CSP (CT QV2 ;V ) implies E ;d in CSP (CT Q1 ;V ).

In order to perform su h a he k we again resort to a suitable variant of ontainment between CSPs (see Theorem 1). Again, we introdu e proper onstraint templates. Given a

onstraint template CT QV ;V , a proper onstraint template CT ; QV ;V is obtained by eliminating from Ui all but one element and from Uf all but one element . Theorem 18. For 2RPQs, Q 1 V QV2 if and only if for every proper onstraint template CT ; QV2 ;V of CT QV2 ;V there ; exists a homomorphism from CT QV ;V to CT Q1 ;V .

2

Proof.

Analogous to the proof of Theorem 7.

For the following lemma, we exploit the above hara terization of Q1 V QV2 . Lemma

19. Che king Q1 V QV2

Proof.

Analogous to the proof of Theorem 8.

2RPQs.

is in NEXPTIME for

The next lemma gives us a mat hing lower-bound. Lemma

for RPQs.

20. Che king

Q1

V QV2

is NEXPTIME-hard

Proof. We show a redu tion from a bounded version of the tiling problem [36℄. The problem is de ned as follows. A tile system onsists of a nite tile set T , two adja en y relations V  T  T and H  T  T , an initial tile 0 , and a nal tile f . In a bounded tiling problem we are given su h a tile system and a bound n > 0 (in unary). We have to de ide whether there is a proper tiling of the 2n  2n -grid su h that: (1) t0 is in the bottom left orner and tf is in the top left orner, (2) every pair of horizontal neighbors is in H , and (3) every pair of verti al neighbors is in V . Formally, nwe have to de ide whether there is a tiling fun tion : 2  2n ! T su h that (1) (0; 0) = 0 and t(0; 2n n 1) = f , (2) for every 0  i < 2n 1, and 0  j < 2 , we have nthat ( (i; j ); (in+ 1; j )) 2 H , and (3) for every 0  i < 2 , and 0  jn< 2 1, we have that ( (i; j ); (i; j +1)) 2 V and also (2 1; j ); (0; j +1)) 2 H . This problem is NEXPTIME- omplete [29, 34℄. We redu e it to our ontainment problem. We onstru t a set V of RPQ views and RPQs Q1 and QV2 su h that ert Q1 ;V is ontained in QV2 i the tiling problem is satis able. We en ode grids by labeled dire ted graphs, using the set of views V = fs; f; h; v; 0; 1; tg. The intuition is that s labels the starting edge of the grid, f labels the nal edge of the grid, h label horizontal adja en y edges, v labels verti al adja en y edges, 0 and 1 label numeri al edges (representing bits of the oordinates), and t labels tile edges. We intend the wholen grid to be represented by a path of the form s((0+ 1)22nn th)4 1 (0 + 1)2n tf , where ea h blo k of the form (0 + 1)n t represents one point in2nthe grid and the sequen e of 4 blo ks of the form (0 + 1) represents a 2n-bit ounter.

Note that h-edges onne t horizontally adja ent blo ks, i.e., either i0 = i + 1 and j 0 = j or i = 2n 1, i0 = 0, and j 0 = j + 1. We also intend that if there are v -edges between nodes on this path they should onne t verti ally adja ent blo ks, i.e., i0 = i and j 0 = j +1. We all a path of this form a good path. Good paths annot be des ribed by su

in t regular expressions. We an, however, write a regular expression QV2 of size polynomial in n that des ribes bad paths, i.e.:  paths that do 2not

ontain v2n-edges but are not of the n 

form s((0 + 1) th) (0 + 1) tf , 2n th) (0+1)2n tf , where the  paths of the form s2((0+1) n rst blo k is not 0 or the last blo k is not 12n ,  paths of the form s((0 + 1)2n th) (0 + 1)2n tf , where an h-edge onne ts blo ks that are not horizontally adja ent, or  paths of the form s((0 + 1)2n th) (0 + 1)2n tv((0 + 1)2n th) (0 + 1)2n tf , where the v-edge onne ts blo ks that are not verti ally adja ent (we all these bad vedges). Let the base alphabet be  = fs; f; h; v; 0; 1g [ T . The view de nitions are as follows: (1) s, f , h, v, 0, and 1 are the identity views, e.g., s = s, (2) t = t, where t = [ 2T  . Intuitively, the view t \forgets" the pre ise tiling. The query Q1 is an RPQ of polynomial size that a

epts paths with bad tiling:  paths of the form s((0 + 1) th) (0 + 1) tf , where the

rst tile is not 0 or the last tile is not f ,  paths of the form s((0 + 1) th) (0 + 1) tf , where an h-edge onne ts pairs of tiled blo ks violating H , or  paths of the form s((0 + 1) th) (0 + 1) tv((0 +   1) th) (0 + 1) tf , where the v-edge onne ts pairs of tiled blo ks that violate V . Suppose rst that the given tiling problem is not satis able. Consider a V -extension E that ontains a good path from a to b, augmented by all allowable v-edges, i.e., v-edges onne ting every pair of verti ally adja ent blo ks. Sin e the path is good, (a; b) 62 QV2 (E ). Let B be a database su h that E = V (B) (observe that, be ause of the parti ular form of the views de nitions, su h a B always exists). Thus, every t-edge in E appears in B as a  -edge for some  2 T . Thus, the path from a to b represents a andidate tiling. Sin e the tiling problem is not satis able, we know that this tiling is not a proper tiling, that is, it either does not start with 0 , does not end with f , violates H , or violates V . In all these ases we have that (a; b) 2 Q1 (B). It follows that (a; b) 2 ert Q1 ;V (E ). Thus, ert Q1 ;V (E ) is not ontained in QV2 . Suppose now that the given tiling problem is satis able. Consider a V -extension E and a pair (a; b) 62 QV2 (E ). If a and b are not onne ted, then (a; b) 62 ert Q1 ;V (E ). If a and

b are onne ted, then there is a good path from a to b.

Let

B be a database su h that E = V (B) (again su h a database exists, be ause of the form of the view de nitions). Thus, every t-edge in E appears in B as a  -edge for some  2 T .

Thus, the path from a to b represents a andidate tiling. Sin e the tiling problem is satis able, we know that there is some B su h that this tiling is a proper tiling, that is, it does start with 0 , end with f , respe ts H , and respe t V . It follows that (a; b) 62 Q1 (B). Thus, (a; b) 62 ert Q1 ;V (E ). Consequently, ert Q1 ;V (E ) is ontained in QV2 .

Note that Q1 depends only on H and V but not on n. By starting with a NEXPTIME- omplete Turing ma hine, we an obtain a xed tiling system whose satis ability problem is NEXPTIME- omplete, where in addition to n we need to spe ify the rst n tiles on the grid. From this we get that our ontainment problem is NEXPTIME- omplete even for xed V and Q1 . From Lemmas 19 and 20, we get the following omputational

omplexity hara terization. Theorem 21. Che king Q 1 V QV2 is NEXPTIME omplete both for 2RPQs and for RPQs.

One ould also ompare the two queries only wrt extensions that orrespond exa tly to the evaluation of the views over some database. Formally, this orresponds to he king whether, for every database B, we have that ert Q1 ;V (V (B)) is a subset of QV2 (V (B)). Su h a notion di ers from Q1 V QV2 , even for non- onstraining views and monotone queries. Indeed, onsider  = fRg, V V= fV1 ; V2 g with V1 = V2 = R, and queries Q1 = R and Q2 = V2 . It is easy to see that, for every database B we have ert Q1 ;V (V (B))  QV2 (V (B)). However, the V -extension E with V1 (E ) = f( ; d)g and V2 (E ) = ; is su h that ( ; d) 2 ert Q1 ;V (E ) but ( ; d) 62 QV2 (E ), thus showing that Q1 6V QV2 . Sin e the two versions of ontainment di er, the question arisesV on whether we are able to he k ontainment of Q1 in Q2 only for those V -extensions that orrespond to the evaluation of the view de nitions over some database. To this purpose we an exploit the following proposition. 22. Let QV be monotone, and Vlet Q be the query over  obtained by substituting in Q the view symbols with the orresponding view de nitions. Then, for every database B, we have QV (V (B)) = ert Q ;V (V (B)). Proposition

Proof. . Suppose there exists a database B and a tuple su h that t0 2 QV but for some database B0 su h that V (B)  V (B V), t 62 Q (B0 ). Now Q (B0 ) = QV (V (B 0 )), hen e sin e Q is monotone, we get a ontradi tion. . Suppose there exists0 a database B and a tuple t su h )  V (B0 ) we have that for alldatabases B su h that V ( B that t 2 Q (B0 ), but t 62 QV (V (B)). Now sin e Q (B0 ) = V Q (V (B0 )), we get a ontradi tion.

t

Observe that, in general,V it is not true that, for every V extension E , we have Q (E )) = ert Q ;V (E ). For example,

onsider  = fRg, the set of views V = fV1 ; V2 g with V1 =  V2 = R, and the V -extension E with V1 (E ) = f(a; b)g and V2 (E ) = ;. Then, for the query QV = V2 , we have that QV (E ) = ;, while ert Q ;V (E ) = f(a; b)g. By Proposition 22, if QV2 is monotone, we an he k whether for every database B we have that ert Q1 ;V (V (B))  QV2 (V (B)), by he king whether for every database B we have that ert Q1 ;V (V (B))  ert Q2 ;V (V (B)), where Q2 is the query obtained from QV2 by expanding view symbols with their de nitions. By Proposition 5, this an be done by he king whetherPQ1 V Q2 ( f. Se tion 4), and therefore the problem is 2 - omplete for onjun tive queries, and NEXPTIME- omplete for 2RPQs and RPQs. 7.

View-based ontainment of

QV1

in

QV2

The last form of view-based query ontainment is when the two queries are over the view alphabet. QV2 if for every database B, and for every V -extension E  V (B), we have QV1 (E )  QV2 (E ). Definition

4. QV1 V

Analogously to the previous ase, for non- onstraining views, we an drop w.l.o.g. the requirement to onsider only V -extensions that are sound wrt some database. Therefore, if views are non- onstraining, he king view-based ontainment redu es to he king ontainment of the two queries over the view alphabet. Theorem 23. Che king QV 1 V QV2 is NP- omplete for

onjun tive queries, and PSPACE- omplete both for 2RPQs and for RPQs.

It is interesting to observe that, even for non- onstraining views and monotone queries, he king QV1 V QV2 is not equivalent to he king whether, for every database B, we have that QV1 (V (B))  QV2(V (B)). Indeed,

onsider  = fRVg, V = fV1 ; V2 gV, with V1 = R and V2  = R, and queries Q1 = V1 and Q2 = V2 . It is easy to see that for every database B, we have QV1 (V (B))  QV2 (V (B)). However, the V -extension E with V1 (E ) = f(a; b)g and V2 (E ) = ; shows that QV1 6V QV2 . Sin e the two versions of ontainment di er, the question arises on whether we areV able to he k the ondition that for ea h database B, Q1 (V (B))  QV2 (V (B)). It turns out that this is equivalent to substituting view symbols with their de nitions in both queries, thus getting new queries Q1 and Q2 , and then he king Q1  Q2 . It follows that

he king whether, for ea h database B, QV1 (V (B))  QV2 (V (B)), is NP- omplete for onjun tive queries, and PSPACE- omplete for 2RPQs and RPQs. 8.

Relationship with Rewriting

Although in the previous se tions we have referred to viewbased query answering, another approa h to view-based

query pro essing has been advo ated, namely the one based on rewriting. In view-based query rewriting, a query Q over the base alphabet is pro essed by rst reformulating Q into an expression of a xed language (e.g., union of

onjun tive queries) over the view alphabet, and then evaluating su h an expression over the view extensions. The relationship between view-based query answering and viewbased query rewriting is investigated in [23, 13, 25℄. Note that, in our setting, views are sound, and this property must be taken into a

ount in the reformulation step of the rewriting pro ess. However, most papers on rewriting queries using views are based, either impli itly or expli itly, on the exa t view assumption, whi h states that the extension of the views provides exa tly the results of applying the view de nitions to the database. It follows that we need to provide an adequate de nition of rewriting in a setting where views are sound. 5. QV1 is a rewriting of Q2 under sound views V , if for every database B and for every V -extension E with E  V (B), we have that QV1 (E )  Q2 (B). Definition

Two interesting observations easily follow from this de nition. First, it is immediate to verify that ert Q ;V , when seen as a query over the view alphabet,  onforms to the definition, and therefore is a rewriting of Q under sound views V . Se ond, if QV1 is a rewriting of Q2 under sound views V , then for ea h V -extension E with E  V (B), we have that QV1 (E )  ert Q2 ;V (E ). It is lear from the above observations that, if we do not have any onstraint on the language used to express the rewriting, ert Q ;V is the best rewriting of Q one an obtain. This motivates why, in [13℄, ert Q ;V has been alled the perfe t rewriting of Q under sound views V . On the other hand, when the rewriting is restri ted to belong to a ertain query lass C , one is typi ally interested in a so- alled maximally ontained rewriting, i.e., a rewriting that belongs to C , and is maximal in C . The maximally ontained rewriting is obviously the best possible rewriting in the lass C . Now, given a query QV1 over the view alphabet and a query Q2 over the base alphabet, three interesting questions related to view-based query rewriting arise: 1. Is QV1 a rewriting of Q2 under sound views V ? 2. Is QV1 the perfe t rewriting of Q2 under sound views V? 3. For a lass C of rewritings, is the maximally ontained rewriting of Q2 perfe t? We show in the following that view-based query ontainment is the right on eptual tool to provide the answers to the above questions. Starting from the 1st question, the following proposition shows that view-based query ontainment provides a method for he king whether QV1 is a rewriting of Q2 under sound

views V . The proposition is an easy onsequen e of the definition of rewriting under sound views, and Proposition 11. QV1 is a rewriting of Q2 under sound if and only if QV1  ert Q2 ;V , if and only if QV1 V

Proposition

views V Q2 .

24.

By exploiting this property and the results presented in Se tion 5 on view-based ontainment of QV1 in Q2 , we derive the following omplexity result. Che king whether QV1 is a rewriting of Q2 under sound views V is NP- omplete for onjun tive queries, and PSPACE- omplete both for 2RPQs and for RPQs. Theorem

25.

ToV address the 2nd question, we need to he k both that Q1 is a rewriting of Q2 under sound views V (question 1), and that the perfe t rewriting of Q2 under sound views V is

ontained in QV1 . The latter amounts to verifying whether for every database B, and for every V -extension E with E  V (B), we have that ert Q2 ;V (E )  QV1 (E ). The next proposition shows that we an again use view-based query

ontainment for this purpose. Proposition 26. The perfe t rewriting of Q 2 under sound views V is ontained in QV1 if and only if Q2 V QV1 .

From the results presented in Se tion 6 on view-based ontainment of Q2 in QV1 , the theorem below dire tly follows. 27.

Che king whether the perfe t rewriting of Q2 under sound views V is ontained in QV1 an be done in P2 for onjun tive queries, and is NEXPTIME- omplete both for 2RPQs and for RPQs. Theorem

Propositions 24 and 26 show that, both for onjun tive queries, and for 2RPQs, we have a method for he king whether a given query QV1 is the perfe t rewriting of Q2 under sound views V . Turning our attention to the 3rd question, observe rst that it may happenthat, for a lass C , no maximally ontained rewriting of Q oin ides with the perfe t rewriting of Q under sound views V . A method for omputing maximally

ontained onjun tive rewritings for onjun tive queries and views is des ribed in [26℄. Analogously, a method for omputing maximally ontained two-way regular rewritings for 2RPQs queries and views is des ribed in [12℄. By resorting to these results, and by referring to Theorem 27, we an

on lude that, for both lasses of queries, we have an upper bound for he king whether the maximally ontained rewriting of a query Q2 under sound views V is perfe t. Finally, perfe tness should not be onfused with exa tness. A rewriting of a query is alled exa t if it is equivalent to the query itself. Although every exa t rewriting is also perfe t, it may happen that the perfe t rewriting of a query Q

under sound views V is not exa t. In [14℄, a set of views V is  alled lossless wrt a query Q if the perfe t rewriting of Q under sound views V is exa t. In the same paper, a method for he king losslessness is presented for RPQs. It is easy to see that he king losslessness amounts to he king

ontainment between Q and ert Q ;V , and therefore is not

aptured by the notion of view-based query ontainment studied in this paper. 9.

Con lusions

We have presented a thorough analysis of view-based query

ontainment, showing that the problem omes in various forms, depending on whether ea h of the two queries is expressed over the base alphabet or the view alphabet. We have investigated all possible ombinations from a semanti point of view, and studied their mutual relationships. For both settings of onjun tive queries and of two-way regular path queries, we have provided te hniques and omplexity bounds for the di erent variants of view-based query ontainment. The only problem left open by our analysis is the exa t lower bound of he king QV1 V Q2 for onjun tive queries. Finally, we have studied the relationship between view-based query ontainment and view-based query rewriting. In our investigation, we have onsidered the ase where views are sound, i.e., their extensions provide a subset of the results of applying the view de nitions to the database. Although this is the usual assumption in most domains, e.g., data integration, there are settings where views should be

onsidered exa t, e.g., query optimization. An interesting dire tion for ontinuing our work is to study view-based query ontainment under exa t views. 10.

A knowledgments

11.

REFERENCES

This work was supported in part by proje ts Infomix (IST2001-33570) and Sewasie (IST-2001-34825) funded by the European Union, by proje t D2I (\From Data to Information") funded by MIUR (Italian Ministry of University and Resear h), by NSF grants CCR-9988322, CCR-0124077, IIS-9908435, IIS-9978135, and EIA-0086264, by BSF grant 9800096, and by a grant from the Intel Corporation. [1℄ S. Abiteboul. Querying semi-stru tured data. In Pro . of ICDT'97, pages 1{18, 1997. [2℄ S. Abiteboul, P. Buneman, and D. Su iu. Data on the Web: from Relations to Semistru tured Data and XML. Morgan Kaufmann, Los Altos, 2000. [3℄ S. Abiteboul and O. Dus hka. Complexity of answering queries using materialized views. In Pro . of PODS'98, pages 254{265, 1998. [4℄ S. Abiteboul and V. Vianu. Regular path queries with onstraints. J. of Computer and System S ien es, 58(3):428{452, 1999. [5℄ S. Adali, K. S. Candan, Y. Papakonstantinou, and V. S. Subrahmanian. Query a hing and optimization in distributed mediator systems. In Pro . of ACM SIGMOD, pages 137{148, 1996.

[6℄ M. Ajtai and R. Fagin. Rea hability is harder for dire ted than for undire ted nite graphs. J. of Symboli Logi , 55(1):113{150, 1990. [7℄ T. Bray, J. Paoli, and C. M. Sperberg-M Queen. Extensible Markup Language (XML) 1.0 | W3C re ommendation. Te hni al report, World Wide Web Consortium, 1998. Available at http://www.w3.org/TR/1998/ REC-xml-19980210. [8℄ P. Buneman. Semistru tured data. In Pro . of PODS'97, pages 117{121, 1997. [9℄ P. Buneman, S. Davidson, G. Hillebrand, and D. Su iu. A query language and optimization te hnique for unstru tured data. In Pro . of ACM SIGMOD, pages 505{516, 1996. [10℄ D. Calvanese, G. De Gia omo, and M. Lenzerini. Representing and reasoning on XML do uments: A des ription logi approa h. J. of Log. and Comp., 9(3):295{ 318, 1999. [11℄ D. Calvanese, G. De Gia omo, M. Lenzerini, D. Nardi, and R. Rosati. Des ription logi framework for information integration. In Pro . of KR'98, pages 2{13, 1998. [12℄ D. Calvanese, G. De Gia omo, M. Lenzerini, and M. Y. Vardi. Query pro essing using views for regular path queries with inverse. In Pro . of PODS 2000, pages 58{ 66, 2000. [13℄ D. Calvanese, G. De Gia omo, M. Lenzerini, and M. Y. Vardi. View-based query pro essing and onstraint satisfa tion. In Pro . of LICS 2000, pages 361{371, 2000. [14℄ D. Calvanese, G. De Gia omo, M. Lenzerini, and M. Y. Vardi. Lossless regular views. In Pro . of PODS 2002, pages 58{66, 2002. [15℄ D. Calvanese, G. De Gia omo, M. Lenzerini, and M. Y. Vardi. View-based query answering and query ontainment over semistru tured data. In G. Ghelli and G. Grahne, editors, Revised Papers of the 8th International Workshop on Database Programming Languages (DBPL 2001), volume 2397 of LNCS, pages 40{61. Springer, 2002. [16℄ S. Chaudhuri, S. Krishnamurthy, S. Potarnianos, and K. Shim. Optimizing queries with materialized views. In Pro . of ICDE'95, Taipei (Taiwan), 1995. [17℄ T. Feder and M. Y. Vardi. The omputational stru ture of monotone monadi SNP and onstraint satisfa tion. SIAM J. on Computing, 28:57{104, 1999. [18℄ M. F. Fernandez, D. Flores u, A. Levy, and D. Su iu. Verifying integrity onstraints on web-sites. In Pro . of IJCAI'99, pages 614{619, 1999. [19℄ D. Flores u, A. Levy, and A. Mendelzon. Database te hniques for the World-Wide Web: A survey. SIGMOD Re ord, 27(3):59{74, 1998. [20℄ M. Friedman, A. Levy, and T. Millstein. Navigational plans for data integration. In Pro . of AAAI'99, pages 67{73. AAAI Press/The MIT Press, 1999.

[21℄ G. Grahne and A. O. Mendelzon. Tableau te hniques for querying information sour es through global s hemas. In Pro . of ICDT'99, volume 1540 of LNCS, pages 332{347. Springer, 1999. [22℄ A. Gupta and J. D. Ullman. Generalizing onjun tive query ontainment for view maintenan e and integrity

onstraint veri ation (abstra t). In Workshop on Dedu tive Databases (In onjun tion with JICSLP), page 195, Washington D.C. (USA), 1992. [23℄ A. Y. Halevy. Answering queries using views: A survey. VLDB Journal, 10(4):270{294, 2001. [24℄ P. G. Kolaitis and M. Y. Vardi. Conjun tive-query

ontainment and onstraint satisfa tion. In Pro . of PODS'98, pages 205{213, 1998. [25℄ M. Lenzerini. Data integration: A theoreti al perspe tive. In Pro . of PODS 2002, pages 233{246, 2002. [26℄ A. Y. Levy, A. O. Mendelzon, Y. Sagiv, and D. Srivastava. Answering queries using views. In Pro . of PODS'95, pages 95{104, 1995. [27℄ A. Y. Levy and M.-C. Rousset. Veri ation of knowledge bases: a unifying logi al view. In Pro . of the 4th European Symposium on the Validation and Veri ation of Knowledge Based Systems, Leuven, Belgium, 1997. [28℄ A. Y. Levy and Y. Sagiv. Semanti query optimization in Datalog programs. In Pro . of PODS'95, pages 163{ 173, 1995. [29℄ H. R. Lewis. Complexity of solvable ases of the de ision problem for the predi ate al ulus. In FOCS-78, pages 35{47, 1978. [30℄ C. Li and E. Chang. On answering queries in the presen e of limited a

ess patterns. In Pro . of ICDT 2001, pages 219{233, 2001. [31℄ T. D. Millstein, A. Y. Levy, and M. Friedman. Query

ontainment for data integration systems. In Pro . of PODS 2000, pages 67{75, 2000. [32℄ T. Milo and D. Su iu. Index stru tures for path expressions. In Pro . of ICDT'99, volume 1540 of LNCS, pages 277{295. Springer, 1999. [33℄ A. Motro. Panorama: A database system that annotates its answers to queries with their properties. J. of Intelligent Information Systems, 7(1), 1996. [34℄ P. van Emde Boas. The onvenien e of tilings. In A. Sorbi, editor, Complexity, Logi , and Re ursion Theory, volume 187 of Le ture Notes in Pure and Applied Mathemati s, pages 331{363. Mar el Dekker In ., 1997. [35℄ M. Y. Vardi. A note on the redu tion of two-way automata to one-way automata. Information Pro essing Letters, 30(5):261{264, 1989. [36℄ H. Wang. Dominoes and the 898 ase of the de ision problem. In Symposium on the Mathemati al Theory of Automata, pages 23{55, 1962.

Suggest Documents