Converting Nested Algebra Expressions into Flat Algebra Expressions

JAN PAREDAENS, University of Antwerp
and
DIRK VAN GUCHT, Indiana University

Nested relations generalize ordinary flat relations by allowing tuple values to be either atomic or set valued. The nested algebra is a generalization of the flat relational algebra to manipulate nested relations. In this paper we study the expressive power of the nested algebra relative to its operation on flat relational databases. We show that the flat relational algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does. Theoretically, this result implies that recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra. Practically, this result is relevant to (flat) relational query optimization.

Categories and Subject Descriptors: H.2.3 [Database Management]: Languages - query languages; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval - query formulation

General Terms: Algorithms, Languages, Theory

Additional Key Words and Phrases: Algebraic transformation, nested algebra, nested calculus, nested relations, relational databases

1. INTRODUCTION

In 1977 Makinouchi [29] proposed generalizing the relational database model by removing the first normal form assumption. Jaeschke and Schek [24] introduced a generalization of the ordinary relational model by allowing relations with set-valued attributes and by adding two restructuring operators, the nest and the unnest operators, to manipulate such homogeneously nested relations of powerset type. A similar generalization was proposed by

Authors' addresses: J. Paredaens, Department of Mathematics and Computer Science, University of Antwerp, B-2610 Antwerpen, Belgium; D. Van Gucht, Computer Science Department, Indiana University, Bloomington, IN 47405. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. An abstract of this paper appeared as “Possibilities and Limitations of Using Flat Operators in Nested Algebra Expressions,” in Proceedings of the 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Austin, Tex.), 1988, pp. 29-38. © 1992 ACM 0362-5915/92/0300-0065 $01.50. ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992, Pages 65-93.


Ozsoyoglu, Ozsoyoglu, and Matos [31]. (In addition, they defined a calculus to manipulate one-level nested relations and showed equivalence of their algebraic and calculus-like data manipulation languages.) Thomas and Fischer [43] generalized Jaeschke and Schek's model and algebra and allowed nested relations of arbitrary (but fixed) depth (see also [2, 4, 12, 18, 21, 32, 38]). Roth, Korth, and Silberschatz [37] defined a calculus-like query language for the nested relational model of Thomas and Fischer. Since then, numerous SQL-like query languages [28, 34, 35, 37], graphical-oriented query languages [22], and datalog-like languages [1, 5, 6, 9, 25, 30] have been introduced for this model or for slight generalizations of it. Also, various groups [7, 13-17, 27, 33, 39, 41] have started with the implementation of the nested relational database model, some on top of an existing database management system and others from scratch.

In this paper we focus on the nested algebra as proposed by Thomas and Fischer [43]. The key problem we address is the following: “What is the expressive power of the nested algebra when we are only interested in its operation on flat relations and when the result is also flat?” In other words, “Is it possible to write queries in the nested algebra with flat operands as input and with a flat output that cannot be expressed in the ordinary (flat) relational algebra?” We show that the answer to this question is negative. Hence, the flat algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does. This result has interesting theoretical as well as practical consequences:

–Recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra, because if they could, the transitive closure could be expressed in the flat algebra, contradicting a result of Aho and Ullman [3]. In contrast, when the powerset operator is added to the nested algebra, then it is possible, as shown by Abiteboul and Beeri [1] and by Gyssens and Van Gucht [19, 20], to express recursive queries. Hence, the nested algebra with the powerset operator is strictly more powerful than the classical nested algebra studied in this paper. In this light, it is also interesting to mention the results of Hull and Su [23] and of Kuper and Vardi [26] regarding the expressiveness of the powerset algebra. They show that there exists a noncollapsing hierarchy of classes of powerset algebra expressions. In particular, they show that the class of powerset algebra expressions in which one can use up to n (n ≥ 0) (nested) applications of the powerset operator is strictly less expressive than the class of nested algebra expressions that can use up to n + 1 (nested) applications of the powerset operator. Our result shows that an analogous property does not hold for the nested algebra; that is, the number of allowed (nested) applications of the nest operator plays no role in the expressive power of the nested algebra expressions (for a related result, see [8]).

–Several researchers [33, 41] have proposed building database management systems that support the nested relational database model and provide a flat relational interface. Our result indicates that such systems have the potential to optimize the user's query, expressed in a flat relational query language, by using properties of the nested algebra [10, 40]. Alternatively, it is an interesting research problem to study the degree to which current relational query optimizers can be extended to take advantage of this result.

2. FORMALISM

In this section we define relation schemes, relation instances, nested algebra, and nested calculus.

2.1 Scheme of a Relation

We define schemes to be lists of attributes. Two kinds of attributes are considered: atomic attributes, represented by their name, and structured attributes, represented by their name followed by their scheme.

⟨Attribute⟩ → ⟨Identifier⟩

Such an attribute is called atomic, and ⟨Identifier⟩ is called the name of the attribute.

⟨Attribute⟩ → ⟨Identifier⟩⟨Scheme⟩

Such an attribute is called structured, and ⟨Identifier⟩ is called the name of the attribute.

⟨List_of_attributes⟩ → ⟨Empty_list⟩ | ⟨Non_empty_list_of_attributes⟩
⟨Non_empty_list_of_attributes⟩ → ⟨Attribute⟩ | ⟨Attribute⟩, ⟨Non_empty_list_of_attributes⟩

Two lists of attributes $\lambda_1$ and $\lambda_2$ are called compatible (i.e., the attributes have the same nesting structure) if and only if (iff) they are both empty, or $\lambda_1 = \alpha_1, \lambda'_1$ and $\lambda_2 = \alpha_2, \lambda'_2$ with $\lambda'_1$, $\lambda'_2$ compatible (ordered) lists of attributes and with $\alpha_1$ and $\alpha_2$ both atomic attributes or both structured attributes with compatible schemes.

⟨Scheme⟩ → ( ⟨Non_empty_list_of_attributes⟩ )

All of the identifiers in a scheme have to be different. Two schemes are called compatible iff their respective lists of attributes are compatible. A scheme is called flat iff all of its attributes are atomic. These are some examples of schemes:

(A, B, C)
(A, E(D(B), C))

A database scheme is a set of schemes.

2.2 Instances of a Relation Scheme

Let $S$ be the scheme $(\alpha_1, \ldots, \alpha_n)$, where $\alpha_i$ stands for an attribute, either atomic or structured. The instances of $S$, denoted $Inst(S)$, are the set

$\{ s \mid s \text{ is a finite subset of } values(\alpha_1) \times \cdots \times values(\alpha_n) \}$,

where $values(A)$ is the set of natural numbers for $A$, an atomic attribute, and $values(A(\lambda)) = Inst((\lambda))$, where $\lambda$ is a nonempty list of attributes. We need to make the following remarks:

–All atomic attributes have the same values set, namely, the set of the natural numbers.
–Two schemes are compatible iff their sets of instances are equal.

In Figure 1 we show an instance $s_1$ of $Inst(A, B, C)$ and an instance $s_2$ of $Inst(A, E(D(B), C))$, respectively.

Fig. 1. (a) Instance $s_1$ of $Inst(A, B, C)$; (b) instance $s_2$ of $Inst(A, E(D(B), C))$.
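The scheme definitions of Sections 2.1 and 2.2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the encoding (an atomic attribute as a string, a structured attribute as a `(name, scheme)` pair, an instance as a `frozenset` of tuples) and the function names `compatible`, `is_flat`, and `is_instance` are hypothetical choices, not notation from the paper.

```python
def is_atomic(attr):
    # Assumed encoding: atomic attribute = str, structured = (name, scheme).
    return isinstance(attr, str)

def compatible(list1, list2):
    """Attribute lists are compatible iff they have the same nesting structure."""
    if len(list1) != len(list2):
        return False
    for a1, a2 in zip(list1, list2):
        if is_atomic(a1) != is_atomic(a2):
            return False
        # Structured attributes must have compatible schemes; names are ignored.
        if not is_atomic(a1) and not compatible(a1[1], a2[1]):
            return False
    return True

def is_flat(scheme):
    """A scheme is flat iff all of its attributes are atomic."""
    return all(is_atomic(a) for a in scheme)

def is_instance(value, scheme):
    """Check that value is a finite set of tuples whose components are natural
    numbers (atomic attributes) or instances of the nested scheme."""
    if not isinstance(value, frozenset):
        return False
    for t in value:
        if not isinstance(t, tuple) or len(t) != len(scheme):
            return False
        for component, attr in zip(t, scheme):
            if is_atomic(attr):
                if not (isinstance(component, int) and component >= 0):
                    return False
            elif not is_instance(component, attr[1]):
                return False
    return True

# The two example schemes of Section 2.1: (A, B, C) and (A, E(D(B), C)).
flat = ("A", "B", "C")
nested = ("A", ("E", (("D", ("B",)), "C")))
```

With this encoding, `compatible(flat, ("X", "Y", "Z"))` holds because compatibility depends only on the nesting structure, while `compatible(flat, nested)` does not.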

2.3 Nested Algebra

We define a nested algebra for manipulating schemes and their instances, similar to the one introduced by Thomas and Fischer [43]. It should be noted that the empty operator and the renaming operator were not considered in [43]. The empty operator is introduced here to create empty structured component values; the nest operator cannot create such values, because nesting an empty instance again yields an empty instance and not, intuitively, an instance with a single tuple equal to the empty set. The algebra consists of nine operators, which are defined as follows:

(1) Union operator $\cup$: Let $(\lambda_1)$ and $(\lambda_2)$ be compatible schemes, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 \cup s_2$ is the (standard) union of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1)$.

(2) Difference operator $-$: Let $(\lambda_1)$ and $(\lambda_2)$ be compatible schemes, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 - s_2$ is the (standard) difference of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1)$.

(3) Cross product operator $\times$: Let $(\lambda_1)$ and $(\lambda_2)$ be schemes with no common identifiers, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 \times s_2$ is the (standard) cross product of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1, \lambda_2)$.

(4) Projection operator $\Pi$: Let $(\lambda)$ be the scheme $(\alpha_1, \ldots, \alpha_n)$, let $i_1, \ldots, i_k$ ($k \ge 1$) be the indexes of different attributes $\alpha_{i_1}, \ldots, \alpha_{i_k}$ of $\lambda$, and let $s \in Inst((\lambda))$. Then $\Pi_{i_1,\ldots,i_k}(s)$ is the (standard) projection on the $\alpha_{i_1}$ through $\alpha_{i_k}$ attributes and is an instance of the scheme $(\alpha_{i_1}, \ldots, \alpha_{i_k})$.

(5) Selection operator $\sigma$: Let $(\lambda)$ be a scheme; let $i$ and $j$ be the indexes of different attributes of $(\lambda)$, and let $s \in Inst((\lambda))$. Then $\sigma_{i=j}(s)$ is the largest subset of $s$ consisting of elements with equal $i$ and $j$ components.

(6) Nest operator $\nu$: Let the scheme $(\lambda)$ have the form $(\lambda_1, \lambda_2)$, where $\lambda_2$ is not the empty list, and let $A$ be no identifier of $\lambda$. Let $s \in Inst((\lambda))$. Then $\nu_{\lambda_2;A}(s)$ is the (standard) nest of $s$ on the attributes of $\lambda_2$ and is an instance of the scheme $(\lambda_1, A(\lambda_2))$. $\nu_{\lambda_2;A}(s)$ is the set of all elements $t_\nu$ for which there is an element $t$ in $s$ such that $t_\nu$ restricted to $\lambda_1$ is equal to $t$ restricted to $\lambda_1$, and $t_\nu(A(\lambda_2)) = \{ u \text{ restricted to } \lambda_2 \mid u \in s \text{ and } u \text{ restricted to } \lambda_1 \text{ is equal to } t \text{ restricted to } \lambda_1 \}$. Notice that in this definition we have made the notational simplifying assumption that $\lambda_2$ is an end sequence of $\lambda$ and not a subsequence of $\lambda$. Reconsider instance $s_1$ of Figure 1. The instances $\nu_{B;D}(s_1)$ and $\nu_{C;E}(\nu_{B;D}(s_1))$ are shown in Figure 2.

Fig. 2. (a) Instance $\nu_{B;D}(s_1)$; (b) instance $\nu_{C;E}(\nu_{B;D}(s_1))$.

(7) Unnest operator $\mu$: Let the scheme $(\lambda)$ have the form $(\lambda_1, A(\lambda_2))$, and let $s \in Inst((\lambda))$. Then $\mu_{A(\lambda_2)}(s)$ is the (standard) unnest of $s$ on the $A(\lambda_2)$ attribute of $\lambda$ and is an instance of the scheme $(\lambda_1, \lambda_2)$. $\mu_{A(\lambda_2)}(s)$ is the set of all elements $t_\mu$ for which there is an element $t$ in $s$ such that $t_\mu$ restricted to $\lambda_1$ is equal to $t$ restricted to $\lambda_1$ and $t_\mu$ restricted to $\lambda_2$ is an element of $t(A(\lambda_2))$. Observe that the application of the unnest operator can lose information when applied to instances containing empty-valued tuple components. Reconsider instance $s_2$ of Figure 1. The instances $\mu_{E(D(B),C)}(s_2)$ and $\mu_{D(B)}(\mu_{E(D(B),C)}(s_2))$ are shown in Figure 3.

Fig. 3. (a) Instance $\mu_{E(D(B),C)}(s_2)$; (b) instance $\mu_{D(B)}(\mu_{E(D(B),C)}(s_2))$.

(8) Empty operator $\emptyset$: Let $(\lambda)$ be a scheme, and let $A$ be no identifier of $\lambda$. Let $s \in Inst((\lambda))$. Then $\emptyset_A(s)$ is an instance of the scheme $(A(\lambda))$ that has only one element, being the empty instance of the scheme $(\lambda)$.

(9) Renaming operator $\rho$: Let $(\lambda)$ be a scheme, and let $s \in Inst((\lambda))$. Then $\rho(s, A, B)$ is the (standard) renaming of $A$ by $B$ in $s$. It is an instance of the scheme $(\lambda)$, with the identifier $A$ replaced by the identifier $B$.

Algebraic expressions of the nested algebra are defined in the usual way. An expression of the nested algebra is called a flat-flat expression (ff-expression) iff its operands and its result are flat. Here are some examples of expressions in the nested algebra. Reconsider instance $s_1$ of Figure 1,

[four example expressions; not recoverable from the source text]

The first, second, and third examples are ff-expressions; the fourth is not. The results of the second, third, and fourth expressions are shown in Figure 4.

Fig. 4. Results of some nested algebraic expressions.

2.4 Nested Calculus

We define a calculus for manipulating schemes and their instances, similar to the ones introduced by Roth, Korth, and Silberschatz [37] and by Abiteboul and Beeri [1]. A query in the nested calculus has the form

$\{ t[\lambda] \mid f(t) \}$

where $t$ is an identifier, called the target variable; $\lambda$ is a nonempty list of attributes, called the scheme of $t$; and $f(t)$ is a formula. The scheme of this query is $(\lambda)$. The free variables of this query are those of $f$, except for $t$.

A formula can have one of the forms

[display of formula forms built from TRUE, FALSE, formulas $f_i$, and quantified tuple variables $t[\lambda]$; not recoverable from the source text]

where all $f_i$s are formulas; $t$ is an identifier, called a tuple variable (all these $t$s are different and differ from the target variable); and $\lambda$ is a nonempty list of attributes, called the scheme of $t$. Furthermore, a formula can have one of the forms

⟨term1⟩ = ⟨term2⟩
⟨term1⟩ ∈ ⟨term2⟩.

A term is a query or has one of the forms

$t$
$t(\alpha)$ ($t(\alpha)$ is called a component of $t$)
$R$

where $t$ is a tuple variable, $R$ is a relation identifier, and $\alpha$ is an attribute (atomic or structured). The scheme of $t(A)$, where $A$ is an atomic attribute, is $A$. The scheme of $t(A(\lambda))$, where $\lambda$ is a nonempty list of attributes and $A(\lambda)$ is a structured attribute, is $(\lambda)$. Note that ⟨term1⟩ = ⟨term2⟩ is only a formula if the schemes of ⟨term1⟩ and ⟨term2⟩ are compatible, and that ⟨term1⟩ ∈ ⟨term2⟩ is only a formula if the scheme of ⟨term1⟩ is $\lambda$ and the scheme of ⟨term2⟩ is compatible with $(\lambda)$.

Here are some examples of queries; $R$ has the scheme $(A, B(C, D))$,

$\{ t[A] \mid \exists t_1[A, B(C, D)] (t_1 \in R \wedge t(A) = t_1(A)) \}$
$\{ t[A, B(C, D)] \mid \neg\, t \in R \}$
$\{ t_1[C] \mid \exists t_2[B(C)] (t_2(B(C)) = \{ t_3[C] \mid t_3 \in t_2(B(C)) \} \wedge t_1 \in t_2(B(C))) \}$

The first query is a projection, and the second, an (unsafe) complementation. In the third query, note how $t_2(B(C))$ is defined in terms of itself.
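The nest, unnest, and empty operators of Section 2.3 can be illustrated with a minimal Python sketch. It assumes, as the paper's simplifying assumption does, that the nested attributes form an end sequence of the scheme; relations are encoded as `frozenset`s of tuples, and the function names `nest`, `unnest`, and `empty` are hypothetical.

```python
def nest(s, k):
    """Nest the last k columns (k >= 1) of s into one set-valued column,
    grouping on the remaining leading columns."""
    groups = {}
    for t in s:
        groups.setdefault(t[:-k], set()).add(t[-k:])
    return frozenset(head + (frozenset(tail),) for head, tail in groups.items())

def unnest(s):
    """Unnest the last, set-valued column of s. Tuples whose last component
    is the empty set contribute nothing to the result."""
    return frozenset(t[:-1] + u for t in s for u in t[-1])

def empty(s):
    """Empty operator: a single tuple whose only component is the empty
    instance; the contents of s are irrelevant."""
    return frozenset({(frozenset(),)})

# A small flat relation over (A, B, C); the data is made up for illustration.
r = frozenset({(1, 0, 0), (1, 0, 1), (2, 0, 0)})
nested_r = nest(r, 1)        # scheme (A, B, E(C))
round_trip = unnest(nested_r)

# A nested relation with an empty-valued component: unnest loses the tuple
# with A = 1, so nesting again does not restore the original instance.
s = frozenset({(1, frozenset()), (2, frozenset({(5,)}))})
lossy = nest(unnest(s), 1)
```

Here `round_trip` equals `r`, while `lossy` differs from `s`, which is the information loss noted for the unnest operator. Nesting an empty instance yields an empty instance, which is why the separate empty operator is needed.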

2.5 Flat Algebra

An expression of the nested algebra is also an expression of the flat algebra iff

(1) every attribute of every relation that occurs in the expression is atomic (notice that this eliminates the unnest operator from being present in such an expression), and
(2) neither the nest operator nor the empty operator occurs in the expression.

The first example in Section 2.3 is an expression of the flat algebra.

2.6 Flat Calculus

A query of the nested calculus is also a query of the flat calculus iff

(1) every attribute of every relation that occurs in the query is atomic,
(2) every attribute of the scheme of every variable that occurs in the query is atomic, and
(3) no term is a query.

3. TRANSLATION OF THE NESTED ALGEBRA TO THE NESTED CALCULUS

In this section we show how expressions of the nested algebra can be translated into the nested calculus. Let $nae$, $nae_1$, $nae_2$ be expressions of the nested algebra. We suppose that $nae$, $nae_1$, and $nae_2$ are, respectively, translated to $\{ t[\lambda] \mid f(t) \}$, $\{ t[\lambda_1] \mid f_1(t) \}$, and $\{ t[\lambda_2] \mid f_2(t) \}$.

–$R$, a relation with scheme $(\lambda)$ ($R$ denotes an element of $Inst((\lambda))$), is translated to $\{ t[\lambda] \mid t \in R \}$.

–Union: $nae_1 \cup nae_2$ (where $\lambda_1$ and $\lambda_2$ are compatible) is translated to $\{ t[\lambda_1] \mid (f_1(t) \vee f_2(t)) \}$.

–Difference: $nae_1 - nae_2$ (where $\lambda_1$ and $\lambda_2$ are compatible) is translated to $\{ t[\lambda_1] \mid (f_1(t) \wedge \neg f_2(t)) \}$.

–Cross product: $nae_1 \times nae_2$ (where $\lambda_1$ and $\lambda_2$ have no common identifiers) is translated to [formula not recoverable from the source text].

–Projection: $\Pi_{i_1,\ldots,i_k}(nae)$ (where $i_1, \ldots, i_k$ ($k \ge 1$) are the indexes of different attributes $\alpha_{i_1}, \ldots, \alpha_{i_k}$ of $\lambda$) is translated to $\{ t[\alpha_{i_1}, \ldots, \alpha_{i_k}] \mid \exists t_1[\lambda] (f(t_1) \wedge \bigwedge_{j=1}^{k} t(\alpha_{i_j}) = t_1(\alpha_{i_j})) \}$.

–Selection: $\sigma_{i=j}(nae)$ (where $i$ and $j$ are the indexes of different attributes $\alpha_i$ and $\alpha_j$ of $\lambda$) is translated to $\{ t[\lambda] \mid (f(t) \wedge t(\alpha_i) = t(\alpha_j)) \}$.

–Nest: Suppose that $\lambda$ has the form $\lambda_1, \lambda_2$ (where $\lambda_2$ is a nonempty list of attributes), and let $A$ be no identifier of $\lambda$. $\nu_{\lambda_2;A}(nae)$ is translated to [formula not recoverable from the source text].

–Unnest: Suppose that $\lambda$ has the form $\lambda_1, A(\lambda_2)$. $\mu_{A(\lambda_2)}(nae)$ is translated to [formula not recoverable from the source text].

–Empty: Let $(\lambda)$ be the scheme of $nae$, and let $A$ be no identifier of $\lambda$. $\emptyset_A(nae)$ is translated to [formula not recoverable from the source text].

–Renaming: $\rho(nae, A, B)$ is translated to the translation of $nae$ where every occurrence of $A$ is replaced by $B$.

4. TRANSLATION OF ff-EXPRESSIONS OF THE NESTED ALGEBRA INTO EXPRESSIONS OF THE FLAT ALGEBRA

In this section we show that ff-expressions of the nested algebra can be translated into equivalent expressions of the flat algebra. To establish this result, we first translate the ff-expression into a calculus query that satisfies


certain syntactic restrictions. We then translate this intermediate calculus query, by the use of several constructive transformation rules, into a safe query of the flat calculus. We finally use Codd's theorem [11, 44] about the equivalence of the flat algebra and the safe queries of the flat calculus (safe in the sense of Ullman [44]) to achieve this result. We would like to state that it is an open problem to find a completely algebraic strategy, that is, one not requiring the help of calculus queries.

4.1 Existential Normal Form

The syntactic restrictions on the relevant calculus queries are captured by requiring that the calculus formulas are in a certain normal form. We call this normal form the existential normal form. Intuitively, this normal form is broad enough to allow the formulation of nested algebra expressions, but restricted enough to disallow the formulation of queries that would otherwise require a powerset operation in an algebraic setting (see also [1]).

A (nested calculus) formula $g$ is in existential normal form (enf) iff

$g = (h_1 \vee \cdots \vee h_k)$, $k \ge 1$,

such that for all $i$, $1 \le i \le k$,

$h_i = \exists u_{i1}[\lambda_{i1}] \cdots \exists u_{in_i}[\lambda_{in_i}] (c_{i1} \wedge \cdots \wedge c_{il_i} \wedge \neg d_i)$, $n_i \ge 0$, $l_i > 0$,

such that $d_i$ is in enf or $d_i$ is FALSE (in which case “$\wedge \neg d_i$” is not written) and each $c_{ij}$ has one of the following forms:

$c_{ij} = t_1(\alpha_1) = t_2(\alpha_2)$, or
$c_{ij} = t_1 \in t_2(\alpha)$, or
an equality or membership involving a relation $R$ or a query $\{ w[\gamma] \mid f \}$, where $f$ is in enf (the forms $t \in R$, $t(A(\lambda)) = Q$, $Q = t(A(\lambda))$, $t \in Q$, and $Q_1 = Q_2$, with $Q$ a query, are the ones referred to in Section 4.2; the exact display is not recoverable from the source text).

The formulas of the three examples in Section 2.4 are in enf. We now define the disjunction, the conjunction, the negation, and the existential quantification for formulas in enf so that the resulting formulas again are in enf. Let

$g = \bigvee_{i=1}^{k} h_i$

with

h, = 3ui1[AtJ

““”

LZL

~~ %,[

and let g’=

,:lh~ ()

with

with

where

h;

= ~ t[h]h,.

In the following lemma, we establish some simple logical equivalences:

LEMMA 1. For each formula $f_1$ and $f_2$, we have

(A) If $t_2$ is not a free variable of $f_1$, then $(f_1 \wedge \exists t_2[\lambda] f_2(t_2))$ is logically equivalent to $\exists t_2[\lambda] (f_1 \wedge f_2(t_2))$.

(B) If $t_2$ is not a free variable of $f_1(t_1)$ and $t_1$ is not a free variable of $f_2(t_2)$, then $(\exists t_1[\lambda_1] f_1(t_1) \wedge \exists t_2[\lambda_2] f_2(t_2))$ is logically equivalent to $\exists t_1[\lambda_1] \exists t_2[\lambda_2] (f_1(t_1) \wedge f_2(t_2))$.

(C) $\exists t[\lambda] (f_1(t) \vee f_2(t))$ is logically equivalent to $(\exists t[\lambda] f_1(t) \vee \exists t[\lambda] f_2(t))$.

LEMMA 2. $(g \vee g')$ (resp. $(g \wedge g')$, $\neg g$, $\exists t[\lambda] g$) is logically equivalent to the enf formula defined above as the disjunction (resp. conjunction, negation, existential quantification) of formulas in enf.

PROOF. This results from Lemma 1 and elementary logic.
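Equivalences of the kind stated in Lemma 1 can be sanity-checked by brute force over a small finite domain. The sketch below is only a finite check, not a proof; it covers (A), that $f_1 \wedge \exists t_2\, f_2(t_2)$ agrees with $\exists t_2 (f_1 \wedge f_2(t_2))$ when $f_1$ does not mention $t_2$, and (C), that $\exists t (f_1(t) \vee f_2(t))$ agrees with $\exists t\, f_1(t) \vee \exists t\, f_2(t)$. The four-element domain and all function names are assumptions made for illustration.

```python
from itertools import product

DOMAIN = range(4)  # hypothetical finite domain for the quantified variable

def exists(pred):
    """Existential quantification over the finite domain."""
    return any(pred(t) for t in DOMAIN)

def all_predicates():
    """Enumerate every unary predicate on DOMAIN via its truth table."""
    for table in product([False, True], repeat=len(DOMAIN)):
        yield lambda t, table=table: table[t]

def lemma_1a_holds():
    # f1 has no free occurrence of t2, so it acts as a constant truth value c.
    return all(
        (c and exists(f2)) == exists(lambda t2: c and f2(t2))
        for c in (False, True)
        for f2 in all_predicates()
    )

def lemma_1c_holds():
    return all(
        exists(lambda t: f1(t) or f2(t)) == (exists(f1) or exists(f2))
        for f1 in all_predicates()
        for f2 in all_predicates()
    )
```

Both functions return `True` on this domain; the check enumerates all 16 predicates, so (C) is tested on 256 pairs.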



4.2 Constructive Queries

We start this section by defining constructive queries. Intuitively, this class of queries admits the formulation of all nested algebra expressions (Lemma 6); however, queries that would require powerset operations in an algebraic setting cannot be expressed as constructive queries. In constructive queries, all of the components of existentially quantified tuple variables have to be reachable from relation nodes in a nonrecursive (acyclic) way in a constructibility graph. Also, the components of the tuple variables corresponding to the result of a query or subquery have to satisfy this property; these components are collected in a set denoted by $F$. Later, in Section 4.3, we transform every constructive query into a safe query (in the sense of [44]) of the flat calculus (established in Lemma 8). Codd's theorem [11] regarding the translation of safe queries into relational algebra will then facilitate the final step in the translation.

Let $g$ be in enf,

$g = (h_1 \vee \cdots \vee h_k)$,
$h_i = \exists u_{i1}[\lambda_{i1}] \cdots \exists u_{in_i}[\lambda_{in_i}] (c_{i1} \wedge \cdots \wedge c_{il_i} \wedge \neg d_i)$,

and let $F = \{ v_1(\alpha_{p_1}), \ldots, v_l(\alpha_{p_l}) \}$ be a set¹ of components of the tuple variables $v_1, \ldots, v_l$ (these are called the variables of $F$).

We now define the constructibility graph of $(h_i, F)$. An edge $(x, y)$ denotes informally that the set of possible values of $y$ is “bounded” by the set of possible values of $x$. The constructibility graph of $(h_i, F)$, notated as $C(h_i, F)$, is the directed graph with the following nodes:

(1) all the components of the tuple variables $u_{i1}, \ldots, u_{in_i}$ and all the elements of $F$;
(2) all of the tuple variables $u_{i1}, \ldots, u_{in_i}$ and the variables $v_1, \ldots, v_l$;
(3) all queries $Q$ that appear in $h_i$ on the highest level in some $c_{ij}$; and
(4) all relations $R$ that appear in $h_i$ on the highest level in some $c_{ij}$;

and with the following edges:

(1) if $t_1(\alpha_1) = t_2(\alpha_2)$ is one of the $c_{ij}$s, then $(t_1(\alpha_1), t_2(\alpha_2))$ is an edge if $t_1(\alpha_1) \notin F$ and $(t_2(\alpha_2), t_1(\alpha_1))$ is an edge if $t_2(\alpha_2) \notin F$;
(2) if $t_1 \in t_2(\alpha)$ is one of the $c_{ij}$s, then $(t_2(\alpha), t_1)$ is an edge if $t_2(\alpha) \notin F$;
(3) if $t(A(\lambda)) = Q$ or $Q = t(A(\lambda))$ is one of the $c_{ij}$s, $Q$ being a query, then $(Q, t(A(\lambda)))$ is an edge;
(4) if $t \in R$ is one of the $c_{ij}$s, then $(R, t)$ is an edge;
(5) if $t \in Q$ is one of the $c_{ij}$s, $Q$ being a query, then $(Q, t)$ is an edge; and
(6) if $t$ is a variable and $t(\alpha)$ is a component of $t$, then $(t, t(\alpha))$ is an edge.

A component $t(\alpha)$ is called reachable in a subgraph $C'(h_i, F)$ of $C(h_i, F)$ iff there is a path in $C'(h_i, F)$ from a query $Q$ or a relation $R$ to $t(\alpha)$. A component $t(\alpha)$ is called reachable in $(h_i, F)$ iff $t(\alpha)$ is reachable in $C(h_i, F)$. The component $t(\alpha)$ is called reachable in $(g, F)$ iff it is reachable in $(h_i, F)$, for all $i$, $1 \le i \le k$. In Figure 5a $t(A)$, $t_1(A)$, and $t_1(B(C, D))$ are reachable. In Figure 5b no component is reachable. In Figure 5c $t_2(B(C))$ and $t_1(C)$ are reachable.

¹ The notions of constructibility graph, reachability, acyclicity, and constructiveness are defined relative to a set of variable components $F$. These variable components correspond to free variable components of the involved enfs (see, in particular, the definition of a constructive query $\{ v[\lambda] \mid g \}$ in Section 4.2).

Fig. 5. Constructibility graphs of the three examples in Section 2.4. (a) $C(\exists t_1[A, B(C, D)] (t_1 \in R \wedge t(A) = t_1(A)), \{ t(A) \})$; (b) $C(\neg\, t \in R, \{ t(A), t(B(C, D)) \})$; (c) $C(\exists t_2[B(C)] (t_2(B(C)) = \{ t_3[C] \mid t_3 \in t_2(B(C)) \} \wedge t_1 \in t_2(B(C))), \{ t_1(C) \})$.
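Reachability in a constructibility graph, and the cycle that appears once the edges $(t(\alpha), Q)$ are added for components appearing in a query $Q$, can be checked mechanically. The sketch below encodes the graphs for the first and third example queries of Section 2.4 by hand, as edge lists following the edge rules for equalities, memberships, queries, relations, and components. The string node names and the helper functions are assumptions made for illustration; they are not part of the paper.

```python
from collections import defaultdict

def reachable_from(edges, sources):
    """Return every node reachable from the source nodes along directed edges."""
    succ = defaultdict(set)
    for x, y in edges:
        succ[x].add(y)
    seen, stack = set(sources), list(sources)
    while stack:
        node = stack.pop()
        for nxt in succ[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def is_acyclic(edges):
    """Kahn-style check: repeatedly remove nodes with no incoming edge."""
    succ, indeg, nodes = defaultdict(set), defaultdict(int), set()
    for x, y in edges:
        nodes.update((x, y))
        if y not in succ[x]:
            succ[x].add(y)
            indeg[y] += 1
    queue = [n for n in nodes if indeg[n] == 0]
    removed = 0
    while queue:
        n = queue.pop()
        removed += 1
        for m in succ[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return removed == len(nodes)

# First example (projection), with F = {t(A)}.
edges_a = [
    ("R", "t1"),                # t1 in R
    ("t1", "t1(A)"),            # components of t1
    ("t1", "t1(B(C,D))"),
    ("t", "t(A)"),              # component of the target variable
    ("t1(A)", "t(A)"),          # t(A) = t1(A), and t1(A) is not in F
]
reach_a = reachable_from(edges_a, {"R"})

# Third example, with F = {t1(C)} and Q = {t3[C] | t3 in t2(B(C))}.
Q = "Q"
edges_c = [
    (Q, "t2(B(C))"),            # t2(B(C)) = Q
    ("t2(B(C))", "t1"),         # t1 in t2(B(C))
    ("t1", "t1(C)"),
    ("t2", "t2(B(C))"),
]
reach_c = reachable_from(edges_c, {Q})
# Augmenting with the edge (t2(B(C)), Q), since t2(B(C)) appears in Q.
augmented_c = edges_c + [("t2(B(C))", Q)]
```

In this encoding `reach_a` contains `t(A)`, `t1(A)`, and `t1(B(C,D))`, and `reach_c` contains `t2(B(C))` and `t1(C)`, matching the remarks on Figures 5a and 5c. The graph `edges_c` itself is acyclic, but `augmented_c` is not, which reflects the self-reference of $t_2(B(C))$ in the third query.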

Informally, $(h_i, F)$ is called acyclic if all of its $u$ variables and all of the elements of $F$ are reachable in an acyclic subgraph of $C(h_i, F)$. Formally, $(h_i, F)$ is called acyclic iff $C(h_i, F)$ has a subgraph $A(h_i, F)$ such that

(1) all of the components of the tuple variables $u_{i1}, \ldots, u_{in_i}$ (called the $u$ variables in the sequel) and all of the elements of $F$ are reachable in $A(h_i, F)$; and
(2) $A(h_i, F)$ augmented with all the edges $(t(\alpha), Q)$, with $t(\alpha)$ appearing in the query $Q$, is acyclic. We call the latter graph $AU(h_i, F)$.

Informally, $(h_i, F)$ is called constructive if it is acyclic and if the negated part, $d_i$, and all of the subqueries that occur in $h_i$ are constructive. Formally, $(h_i, F)$ is called constructive iff

(1) $(h_i, F)$ is acyclic;
(2) $(d_i, \emptyset)$ is constructive;
(3) if some $c_{ij}$ has the form $u(\alpha) = \{ w[\gamma] \mid f \}$, or $\{ w[\gamma] \mid f \} = u(\alpha)$, or $u \in \{ w[\gamma] \mid f \}$, then $\{ w[\gamma] \mid f \}$ is constructive;
(4) if some $c_{ij}$ has the form $\{ w_1[\gamma_1] \mid f_1 \} = \{ w_2[\gamma_2] \mid f_2 \}$, then $\{ w_1[\gamma_1] \mid f_1 \}$ and $\{ w_2[\gamma_2] \mid f_2 \}$ are constructive.

$(g, F)$ is called constructive iff for all $i$, $1 \le i \le k$, $(h_i, F)$ is constructive.

The query $\{ v[\lambda] \mid g \}$ is called constructive iff $(g, F)$ is constructive, where $F = \{ v(\alpha) \mid \alpha \in \lambda \}$. Note that, whenever $G \subseteq F$, $(g, G)$ is constructive if $(g, F)$ is also constructive.

The definition of the constructiveness of $(h_i, F)$ is clearly recursive (see points (2), (3), and (4)). However, it is clear that in spite of the recursion this (purely syntax-oriented) definition is well defined. The first example of Section 2.4 is constructive; the second is not (since neither component of $t$ is reachable in $(\neg\, t \in R, \{ t(A), t(B(C, D)) \})$) (Figure 5b), and neither is the third (since the graph $AU$ would be cyclic and therefore does not exist, as can be seen from Figure 5c).

In Lemmas 3-5 we show how constructiveness is preserved in various situations.

LEMMA

(A)

If

3.

(g,

Let g and g’ be formulas, F)

and

(g’,

F)

are

and let F and F both

be sets of components.

constructive,

then

(g ~ g’, F)

is

constructive. (B)

If (g, F) and (g’, F) no common components

are both constructive or common variables,

and then

if g and g’ have is (g ~ g’, F U F)

(C)

If (g, F) and constructive.

(D)

If (g, F) is constructive and F contains the set of all of the components is constructive, F, being the the variable t, then (~ t[ AIg, F,) of components of F that are not components oft.

constructive. (g’,

@)

are

both

constructive,

then

(g ~

~ g’, F)

is of set

PROOF

(A)

Trivial,

(B)

Clearly, F U F’)

since

(gvg’)

C(h ,,, = A(hl,

= (g ~ g’).

F U F’) = C(h,, F) U C(h:., F’). F) U A(h~,, F’); then AU(hii,,

Now take A(htc, F U F’) = AU(hL,

or common variF) U A U( h~, F’). Since g and g’ have no components ables (and, hence, clearly no common queries), and since A U( h,, F) and ACM TransactIons on Database Systems, Vol. 17, No 1, March 1992


(C)

AU(h;,,

F’) are both

F’) that

contains

Clearly,

(~ g’, a)

ness),

that

equal

to

which

is an acyclic

@)

A(hi,

LEMMA

4.

constructive tz(az)), F) PROOF.

is

This

FJ

It

(g

since

Let

5.

F)

clear

if

that

(h:,

a query

that

(h;,

occurs

FJ

, Fl)

is acyclic.

A(EJ,

Fl)

= Ai7(h,,

F),

We know

that

as a term,

is constructive,

it

and

remains therefore,



= t2(a2)) is in enf and ((hi A tl(al) = tz(a~)), = tz(az)), F) = A(h,, F). choose A((hi A tl(al)

L tl(al)

we

can



F) is constructive.

g be a formula,

and

let F be a set of components.

Let

be constructive. in

(B)

If t( A(AI))

of t and no component

is a component

in ~t[hlg, constructive.

then

L tl(a)

~t[h]g,

(~t[hl(g

Q is a constructive

(St[N(g

R.

of constructive-

AU(h;

Fl)

If tl(al) does not appear is constructive. { tl(a,)})

If

F U

We can choose

(A)

(C)

AU(h,z,,

since no arc ends in

Let g be a formula, and let F be a set of components. If (g, F) is and tl(al) and tz(az) are two elements of F, then ((g ~ tl(al) = is constructive. Clearly,

LEMMA

be

Also,

((hi A tl( CYJ = tz(az)),

(1 t[ Ng,

in

79

: g’, F) is constructive.

and hence,

proves

be a cycle

is constructive.

should

graph,

is constructive.

acyclic,

Hence,

F).

only

is impossible

(see (2) of the definition

(g L

(h”, FJ

is constructive.

(2 t[ Ng,

could

is constructive

to prove

constructive.

F)

there R. This

so by (B) we see that

(D) We have

(di,

acyclic,

a relation

.

(~t[ h](g

tlG t(A(Al))),

L

query

then

and

tl(a)

~ t(a)

= tl(al)),

of tl or tl itself F U {tl(a)l

does not

appear

F U

appears

a e Al}) in I t[ Ng,

is then

= Q), F U { tl(a)})is constructive.

PROOF

(A)

Let

be the ith component of ~t[hl(g We have to prove that { t,(al)}. that

it is acyclic.

We can choose

and ~ t[a)= tl(al)), (i,, F) is constructive. A(~,,

~) as A(3 t[ A]hi,

let ~ be F U We first show F) plus

the edge

tl(al)). Hence, AU(~i, ~) is AU(3 t[~lhi, F) plus the edge (t(a, It is easy to see that all of the components of the u variables, the tl(al)). t and the gle~ents of F and tl(al)are reachable in A(~,, ~). variable tl(al)does not appear in Furthermore, A U(hi, F) is acyclic because (t(a),

(B)

~ t[ A]g.

Clearly

remain

constructive.

(d,,

@)

is

constructive,

and

queries

appearing

in

~,

Let

be

the

ith

component

I~a G Al}. We { tl(ci)

have

of

~t[h](g to

show


~ t, e t( A(AJ)), that

(~,, ~)

and is

let

~ = F U

constructive.

We


first

show that

it is acyclic.

the edge (t(A(hI)), A U(3 d h]hz, F) plus the of

components { tl(a)

We can choose

tl) and the edges (tl, the edges just

of the

I a e A ~} and

of

F

are

A U(%,, ~) is acyclic

because

Clearly,

constructive,

(d,,

@)

is

for all

mentioned.

variables,

u

A(%z, ~) as A(3 t[ h]h,,

tl(a))

the

reachable and

~) is

It is easy to see that

all of

in

t and A(Z,,

~).

queries

appearing

the

elements

Furthermore,

of t or t itself

no components

F) plus

AU(%,,

variable

a e Al.

app~ars in

in g.

h remain

constructive.

(C) Let $\bar{h}_i = \exists t[\lambda](h_i \wedge t_1(\alpha) = Q)$ be the $i$th component of $\exists t[\lambda](g \wedge t_1(\alpha) = Q)$, and let $\bar{F} = F \cup \{t_1(\alpha)\}$. We have to show that $(\bar{h}_i, \bar{F})$ is constructive. We first show that it is acyclic. We can choose $A(\bar{h}_i, \bar{F})$ as $A(\exists t[\lambda]h_i, F)$ plus the edge $(Q, t_1(\alpha))$. $AU(\bar{h}_i, \bar{F})$ is $AU(\exists t[\lambda]h_i, F)$ plus the edge $(Q, t_1(\alpha))$ and the edges $(x, Q)$, where $x$ appears in $Q$. It is easy to see that the components of the $u$ variables, the variable $t_1$ and the elements of $F$ and $t_1(\alpha)$ are reachable in $A(\bar{h}_i, \bar{F})$. Furthermore, $AU(\bar{h}_i, \bar{F})$ is acyclic because $t_1(\alpha)$ does not appear in $g$. Clearly, $(d_i, \emptyset)$ is constructive, and queries appearing in $\bar{h}_i$ remain constructive.

Lemmas 3-5 together with Section 3, where we discussed translation of a nested algebra expression into a nested calculus query, allow us to establish the following result:

LEMMA 6. Every expression of the nested algebra can be translated into a constructive query.²

PROOF. Let $nae$ be an expression of the nested algebra. We prove the lemma by induction on the number of operators in $nae$.

Basis ($n = 0$). Then $nae = R$ (with scheme $(\lambda)$), which can be translated into $\{t[\lambda] \mid t \in R\}$. This is clearly a constructive query.

Induction hypothesis. Every expression of the nested algebra with fewer than $n$ ($n > 0$) operators can be translated into a constructive query.

Induction step. Suppose $nae$ has $n$ operators ($n > 0$).

Binary operators. Then $nae = nae_1 \cup nae_2$, or $nae = nae_1 - nae_2$, or $nae = nae_1 \times nae_2$.

² Although it is not necessary for our results, we conjecture that the converse of this lemma is also true. A possible way to prove this would be to compare constructive queries with the calculus queries introduced in [37] and with the strictly safe calculus queries introduced in [1].


Converting Nested Algebra Expressions Since nael and the constructive (i) (ii)

nae can be translated

nae = nael

U nae2.

is constructive

nae = nael L ~,(tl))}.

(iii)

naez have fewer than n operators, they can be translated queries { tl[ Al] I ~l(tl)} and { tz[ Az 1I ~z( tz)}, respectively.

This

query

– naez. This

nae = nael identifiers.

because

nae

query

can

be

is constructive

Unary

because

operators.

into

+JCY)

of Lemmas

y fz(tl))}. L

3(C).

Az have

no common

= t,(a))

~a,h,t(cl)

because

into

{ tl[ All I ( fl( tl)

of Lemma

case h ~ and

A2]l(:t,[hJ(f,(tJ

is constructive

{ tl[ A ~1I ( ~l(tl)


3(A).

translated

Notice that in this nae can be translated into

~:t2[A2](f2(t2) query

into

of Lemma

X naez.

{t[h,,

This

.

= t,(a)))}. 3(D),

5(A),

and 3(B).
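The binary-operator cases of the induction can be illustrated with a minimal sketch. It assumes flat relations encoded as Python sets of tuples and a finite universe of candidate tuples; the names `R1`, `R2`, `f1`, `f2`, and `universe` are hypothetical and stand in for the relations and formulas of the proof. Union becomes a disjunction, difference a conjunction with a negation, and product a pair of existential witnesses whose components are copied into the result tuple.

```python
from itertools import product

# Hypothetical flat relations, each a set of equal-length tuples.
R1 = {(1, "a"), (2, "b")}
R2 = {(2, "b"), (3, "c")}

def f1(t):
    """Stands in for the formula f1: here simply membership in R1."""
    return t in R1

def f2(t):
    """Stands in for the formula f2: here simply membership in R2."""
    return t in R2

# Assumption: candidate tuples range over a finite universe.
universe = R1 | R2

# (i) union: { t1 | f1(t1) or f2(t1) }
union_result = {t for t in universe if f1(t) or f2(t)}

# (ii) difference: { t1 | f1(t1) and not f2(t1) }
difference_result = {t for t in universe if f1(t) and not f2(t)}

# (iii) product: the result tuple is the concatenation of witnesses t1 and t2.
product_result = {
    t1 + t2 for t1, t2 in product(universe, universe) if f1(t1) and f2(t2)
}
```

The sketch only mirrors the shape of the calculus formulas; it says nothing about constructiveness, which is a property of the associated graphs.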

Then nae = II ,I,,,,,,(nael),

or nae -

u, =J ( nael),

or ~ae = W;A(n%)) or nae = fl~,,)(nael), or nae = @~( nael), or nae = p(nael). Since nael constructive (i)

nae - II,,,,,,

This (ii)

has fewer than n operators, A ~1I fl( tl)}. query { tl[

query

(nael).

because

(iii)

query

is constructive

nae = v~,, ~(nael), {t[h’,

with

t(A(hg)) = t,(~)


3(D)

the

and 5(A),

3(B).

nae can be translated

‘~~~lt(a)

= {tz[~2]/sts[~](f(t3)

~aEA2t2(@)

into

= h(~,))}

of Lemma

Al = Al, h2. Then

A(h2)]lstl[~](~(tl)

translated

into

A h(%) because

be

into

of Lemmas

nae can be translated {~l[~l]l(fl(h)

This

can

nae can be translated

is constructive

nae = O,=J(nael).

it

= tl(a)

into

L

~ue~’t[a)

= ~3(~))})}.


This query fact that

is constructive

is a constructive (iv)

query

nae = ~~(~l)(nael), {[)+,

(which

with )i2]

(v)

query

ncze = p(rzael,

4.3

because

The

A, B).

Between

of Lemmas

=

3(D),

and the

and 5(A)). into

b(cY))). 5(A),

and 5(B).

into

of Lemmas

renaming

Logically

query

te {d N I f(u)}

form

3(D)

5(C),

~aexl(a)

of

a

3(C) and 5(C). constructive

Equivalent

query

is

a

transformation

rules

Formulas that

ness and that will allow us to “flatten” constructive rules are considered: (1) The query elimination (Tl) the

5(A),



query.

several

because

Lemmas

‘a,x2t(cY)

rme can be translated

Transformations

We now define

from

3(D),

h’. nae can be translated

~ t2et,(Aoi1))

is constructive

constructive

follows

13t,[h1]3t2[h2] (f,(t,) —

is constructive

nae = 0~( nael).

This (vi)

query

of Lemmas

Al = A(hl),

= t,(a) This

because

by the

equivalent

f(t)

preserve

constructive-

calculus queries. Four replaces subformulas of

subformulas.

(2) The query propagation (T2) substitutes, if $t(\alpha) = Q$, every occurrence of $t(\alpha)$ by $Q$. Applying the query propagation repeatedly results in a query in which each structured component occurs at most once.

(3) The structured component elimination (T3) then allows the removal of these components. After applying the structured component elimination, the query no longer contains structured components. The only reason for a query to be nonflat is occurrences of subformulas of the form $Q_1 = Q_2$ (where $Q_1$ and $Q_2$ are subqueries).

(4) The query equality elimination can then be used to transform these subformulas.

Let $g$ be a formula, let $F$ be a set of components, and let $(g, F)$ be constructive,

(T1) Query membership elimination. Let $c_j = t_1 \in \{t_2[\lambda_2] \mid f(t_2)\}$. The query membership elimination rule substitutes $g$ by $g'$, with

$$g' = (h_1 \vee \cdots \vee h_{i-1} \vee h'_i \vee h_{i+1} \vee \cdots \vee h_k),$$

$$h'_i = \exists u_1[\lambda_1] \cdots \exists u_n[\lambda_n]\,((c_1 \wedge \cdots \wedge c_{j-1} \wedge c_{j+1} \wedge \cdots \wedge c_l \wedge \neg d) \wedge f(t_1)).$$
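The logical equivalence behind rule (T1) can be checked on a small finite example. The sketch below is an illustration only: `universe`, `f`, `member_of_query`, and `eliminated` are hypothetical names, and the query $\{t_2 \mid f(t_2)\}$ is materialized over a finite universe, which the calculus itself does not require.

```python
from typing import Callable, FrozenSet, Tuple

Row = Tuple[int, ...]

def member_of_query(t1: Row, universe: FrozenSet[Row],
                    f: Callable[[Row], bool]) -> bool:
    """Before (T1): test t1 ∈ { t2 | f(t2) } by materializing the query."""
    return t1 in {t2 for t2 in universe if f(t2)}

def eliminated(t1: Row, f: Callable[[Row], bool]) -> bool:
    """After (T1): the membership test is replaced by f(t1)."""
    return f(t1)

# Hypothetical finite universe of binary tuples and a sample formula f.
universe = frozenset((a, b) for a in range(4) for b in range(4))
f = lambda t: t[0] < t[1]

# Both forms agree on every tuple of the universe.
agree = all(member_of_query(t, universe, f) == eliminated(t, f) for t in universe)
```

The rewritten form never builds the intermediate set, which is the point of the rule: one level of set construction disappears from the formula.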

The

Converting Nested (T2)

Query

Q - {tz[~zl propagation

where

propagation.

I ~(~2)},

rule

CL ( p #j)

(1) Delete

the term

the attribute

(3) If the scheme

LEMMA

7.


Cj = Q = tl(ci)where

of

A(h,,

~).

The

query

with

every

occurrence

the

formula

h,,

from

the scheme

empty,

of u~.

remove

elimination.

Query

structured

3 Un[ 1.

h’,, and let

equality

membership

and

hi.

A(h)

equality

Consider

term in which u~( A( h)) occurs. The substitutes g by g’ in three steps:

c, from

formula

The query

query

only rule

elimination.

of Vn becomes

the resulting

f2(t2)}.

or

edge

is CP(d, respectively)

component

(2) Delete

Query

= Q

Expressions

by Q.

Structured

(T4)

Cj = tl(u)

(d’, respectively)

suppose that Cj is the component elimination

Call

Let

(Q, h(~)) be an g by g’, with

let

substitutes

of tl(a)replaced (T3)

and

Algebra

Let

elimination

substitutes

propagation,

query

I fl(tl)}

= {tz[~zl

I

g by g’, with

structured and

elimination,

c, = {tl[yl]

component equality

elimination,

elimination

preserve

constructiveness.

PROOF. In this proof we use the notation introduced in the transformation rules (T1), (T2), (T3), and (T4).

(T1) Query membership elimination. To show that query membership elimination preserves constructiveness, we first need to establish a property of constructibility graphs. Therefore, let $F$ be a set of components, and assume that $(h_i, F)$ is constructive; then $C(h_i, F)$ contains a subgraph $A'(h_i, F)$

- that satisfies the reachability and acyclicity conditions mentioned in the definition of constructibility graphs; and
- for which, for each component of the $u$ variables and each element of $F$, there exists a unique query or relation from which the component or element can be reached in $A'(h_i, F)$.
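The uniqueness property can be illustrated with a small graph sketch. It is a coarser variant of the edge-removal argument used in the proof, not the authors' procedure: it keeps one incoming edge per node, under the stated assumptions that the graph is acyclic, that query and relation nodes are exactly the nodes without incoming edges, and that every other node is reachable from one of them. The function and node names are hypothetical.

```python
from typing import Dict, Hashable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]

def prune_to_unique_sources(edges: List[Edge]) -> List[Edge]:
    """Keep only the first incoming edge seen for each target node."""
    kept: List[Edge] = []
    seen_targets: Set[Hashable] = set()
    for src, dst in edges:
        if dst not in seen_targets:
            seen_targets.add(dst)
            kept.append((src, dst))
    return kept

def sources_reaching(edges: List[Edge], sources: Set[Hashable],
                     node: Hashable) -> Set[Hashable]:
    """Return the source nodes from which `node` is reachable."""
    succ: Dict[Hashable, List[Hashable]] = {}
    for s, d in edges:
        succ.setdefault(s, []).append(d)
    found: Set[Hashable] = set()
    for root in sources:
        stack, visited = [root], set()
        while stack:
            cur = stack.pop()
            if cur == node:
                found.add(root)
                break
            if cur in visited:
                continue
            visited.add(cur)
            stack.extend(succ.get(cur, []))
    return found

# Two sources Q1 and Q2 whose paths to x(a) join at node y.
edges = [("Q1", "y"), ("Q2", "z"), ("z", "y"), ("y", "x(a)")]
sources = {"Q1", "Q2"}
before = sources_reaching(edges, sources, "x(a)")
pruned = prune_to_unique_sources(edges)
after = sources_reaching(pruned, sources, "x(a)")
```

In the example the edge `("z", "y")` is dropped, so `x(a)` stays reachable but only from `Q1`, while `z` is still reachable from `Q2`.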

To see that this is so, consider $C(h_i, F)$, and let $x(\alpha)$ be a component of either a $u$ variable or an element of $F$. Since $(h_i, F)$ is constructive, there exists a subgraph $A(h_i, F)$ of $C(h_i, F)$ that satisfies the reachability and acyclicity conditions mentioned in the definition of the constructibility graph. Suppose that there exist two different nodes $Q_1$, $Q_2$ (each of which is either a query or a relation) in $A(h_i, F)$ from which $x(\alpha)$ is reachable (see Figure 6). Let $y$ be the node where the paths from $Q_1$ and $Q_2$ to $x(\alpha)$ join for the first time, and let $(z, y)$ be the edge of the path from $Q_2$ to $x(\alpha)$ pointing to $y$. Consider that subgraph of $A(h_i, F)$ where the edge $(z, y)$ is removed. It should be clear that this subgraph still satisfies the reachability and acyclicity conditions mentioned in the definition of constructibility graphs and, furthermore, that in this graph $x(\alpha)$ can be reached via one less path than it could in $A(h_i, F)$. Repeated removal of such edges eventually results in a subgraph $A'(h_i, F)$ with the desired properties. (This establishes a result about constructibility graphs that will be necessary in the rest of the proof about the preservation of constructiveness by query membership elimination.)

Let $Q = \{t_2[\lambda_2] \mid f(t_2)\}$ (remember that $c_j = t_1 \in \{t_2[\lambda_2] \mid f(t_2)\}$ in (T1) and, hence, that $c_j = t_1 \in Q$), let

$$f(t_2) = (p_1(t_2) \vee \cdots \vee p_m(t_2) \vee \cdots \vee p_r(t_2)),$$

and let

Then

h;=

Let

him

j aul[~l] m=l

be the

constructive.

rrzth

Consider

““ . 3un[An]3wm1[AmJ

disjunct h, and

“ - “ %Jm.m]

to prove that (h~m, l’) of h;. We have p~( tz). We know that (h,, F) and ( Pn(ta),

(where G = { tz( 13)I tz(i3) e Az } ) A( hi, F) and A( pn( tz), G). There

are constructive. are two cases:

Consider

the

is

G)

graphs

(1) $(Q, t_1)$ is an edge of $A'(h_i, F)$ (Figure 7). Consider what happens in this situation when we "join" the graphs $A'(h_i, F)$ and $A(p_m(t_2), G)$ by applying the query membership elimination rule. Figure 8 shows the resulting graph $AU(h'_{im}, F)$. Clearly, $AU(h'_{im}, F)$ is $A(h'_{im}, F)$ plus the edges of the form $(z(\gamma), P)$ if $z(\gamma)$ occurs in the query $P$. (It is important to observe that $t_2$ is replaced by $t_1$ in these graphs and that $Q$ has disappeared.)

Fig. 6. $x(\alpha)$ is reachable from both $Q_1$ and $Q_2$.

Fig. 7. (a) Graph $A'(h_i, F)$; (b) graph $A(p_m(t_2), G)$.

We have to show the reachability of the components of the $u$ variables, the $w$ variables, and the elements of $F$ in $A(h'_{im}, F)$, and the acyclicity of $AU(h'_{im}, F)$.

Let $z(\gamma)$ be a component of a $u$ variable or of an element of $F$. We consider two cases:

(a) $z(\gamma)$ is not reachable from $t_1(\alpha_u)$. Then the reachability follows from the reachability of $z(\gamma)$ in $A(h_i, F)$.

(b) $z(\gamma)$ is reachable from $t_1(\alpha_u)$. Then the reachability follows from the reachability of $t_2(\beta_u)$ in $A(p_m(t_2), G)$.

Let $z(\gamma)$ be a component of a $w$ variable. The reachability of $z(\gamma)$ follows from the reachability of $z(\gamma)$ in $A(p_m(t_2), G)$.

Next we have to show that $AU(h'_{im}, F)$ is acyclic. To show this, we first identify two parts of the graph $AU(h'_{im}, F)$, called $L$ and $R$, respectively. $L$ is the subgraph of $AU(h'_{im}, F)$ corresponding to $A'U(h_i, F)$, and $R$ is the subgraph of $AU(h'_{im}, F)$ corresponding to $AU(p_m(t_2), G)$ (see Figure 8). It should be clear that only edges of the form $(z(\gamma), P)$, where $z(\gamma)$ occurs in the query $P$, might create cycles. We consider two cases:

Fig. 8. Graph.

(a) $z(\gamma)$ is not a component of $t_1$. We consider two cases that may give rise to cyclicity:

(i) $z(\gamma)$ occurs in $L$ and appears in $P$ in $R$. Suppose that there is a path in $AU(h'_{im}, F)$, giving rise to a cycle. Notice that such a path necessarily has to go through a component of $t_1$. Now, since $z(\gamma)$ occurs in $P$, it also occurred in $Q$ before the query membership elimination rule was applied. It is easily seen that this would have given rise to a cycle in $A'U(h_i, F)$, a contradiction.

(ii) $z(\gamma)$ occurs in $R$ and appears in $P$ in $L$. Notice that the only paths from $L$ to $R$ originate in $t_1$ or in its components. Because of the special characteristics of $A'(h_i, F)$, there is no path from $P$ to $t_1$ or to any components of $t_1$; hence, the edge $(z(\gamma), P)$ cannot create a cycle in $AU(h'_{im}, F)$.

(b) $z(\gamma)$ is a component of $t_1$, say, $t_1(\alpha_u)$. There are two cases:

(i) Suppose that $P$ is in $L$. Then the edge $(t_1(\alpha_u), P)$ was added to $A(h'_{im}, F)$. This edge cannot give rise to a cycle, since in $A'(h_i, F)$ there is not a path from $P$ to $t_1(\alpha_u)$ because of the acyclicity of $A'U(h_i, F)$.

(ii) Suppose that $P$ is in $R$. Then the edge $(t_1(\alpha_u), P)$ was added to $A(h'_{im}, F)$. There are two cases to consider:

(ii.1) $t_1(\alpha_u)$ occurred in $P$ before the query membership elimination rule was applied. In this case, $t_1(\alpha_u)$ occurred in $Q$, giving rise to a cycle in $A'U(h_i, F)$, a contradiction.

(ii.2) $t_2(\beta_u)$ occurred in $P$ before the query membership elimination rule was applied. In this case, the acyclicity of $AU(p_m(t_2), G)$ guarantees that there can be no cycle in $AU(h'_{im}, F)$.

We have thus shown that, when $(Q, t_1)$ is an edge of $A'(h_i, F)$, $(h'_{im}, F)$ is acyclic. We now turn to the other case.

(2) $(Q, t_1)$ does not appear in $A'(h_i, F)$. Hence, in $A'(h_i, F)$ no edges leave $Q$, so no reachability of variable components results as a consequence of the existence of $Q$; in particular, the reachability of the components of $t_1$ results from other queries or relations (see Figure 9).

Fig. 9. Graphs $A'(h_i, F)$ and $A(p_m(t_2), G)$.

For $A(h'_{im}, F)$ we propose using the graph shown in Figure 10. Clearly, $AU(h'_{im}, F)$ is $A(h'_{im}, F)$ plus the edges $(z(\gamma), P)$ if $z(\gamma)$ appears in the query $P$. Again, we introduce the notation $L$ and $R$. It should be clear that the reachability of the components of $t_1$ and the components of the $u$ variables follows from their reachability in $A'(h_i, F)$. The reachability of the components of the $w$ variables follows from their reachability in $A(p_m(t_2), G)$. The acyclicity of $AU(h'_{im}, F)$ follows because there are no paths from $L$ to $R$ and vice versa (see Figure 10). Hence, $(h'_{im}, F)$ is acyclic in this case also. Furthermore, since $(h'_{im}, F)$ is constructive, it follows from Lemma 3(C) that $(d \vee e_j(t_1), \emptyset)$ is constructive. Hence, $(h'_i, F)$ and $(g', F)$ are constructive.
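The two graph conditions used throughout this part of the proof, reachability in $A$ and acyclicity of $AU$, can be checked mechanically once a graph is given as an edge list. The following is a minimal sketch under that assumption; it does not build constructibility graphs from formulas, and the node names in the example are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Edge = Tuple[Hashable, Hashable]

def is_acyclic(edges: Iterable[Edge]) -> bool:
    """Kahn's algorithm: the graph is acyclic iff every node can be removed."""
    edge_list = list(edges)
    nodes = {n for e in edge_list for n in e}
    indegree: Dict[Hashable, int] = {n: 0 for n in nodes}
    succ: Dict[Hashable, List[Hashable]] = {n: [] for n in nodes}
    for s, d in edge_list:
        succ[s].append(d)
        indegree[d] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        n = queue.popleft()
        removed += 1
        for m in succ[n]:
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    return removed == len(nodes)

def reachable(edges: Iterable[Edge], sources: Iterable[Hashable]) -> Set[Hashable]:
    """Breadth-first search: all nodes reachable from the given sources."""
    succ: Dict[Hashable, List[Hashable]] = {}
    for s, d in edges:
        succ.setdefault(s, []).append(d)
    seen: Set[Hashable] = set(sources)
    queue = deque(seen)
    while queue:
        n = queue.popleft()
        for m in succ.get(n, []):
            if m not in seen:
                seen.add(m)
                queue.append(m)
    return seen

# Hypothetical A graph: relation R feeds variable t1, which feeds t1(a).
a_edges = [("R", "t1"), ("t1", "t1(a)")]
# Hypothetical AU graph: additionally, t1(a) occurs in a query P.
au_edges = a_edges + [("t1(a)", "P")]
```

Adding an edge from `P` back to `t1` would close the cycle `t1`, `t1(a)`, `P`, which is the kind of situation the case analysis above rules out.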

(T2) Query propagation. We have to prove that $(h'_i, F)$ is constructive. First note that $t_1(\alpha)$ (remember that $t_1(\alpha) = Q$ or $Q = t_1(\alpha)$ in (T2)) can occur within a subquery $P$ of $h_i$. In $P$, $t_1(\alpha)$ is certainly a free variable component. This implies that at the query $P$, when we consider its associated constructibility graphs, we have that no edge leaves $t_1(\alpha)$ (see conditions (1) and (2) in the edge part of the definition of constructibility graphs) and, therefore, that the substitution of $t_1(\alpha)$ by $Q$ cannot lead to problems of cyclicity in the subquery $P$. The reachability of the components of the $u$ variables and the elements of $F$ in $A(h'_i, F)$ follows from the reachability of these components and elements in $A(h_i, F)$. The only interesting observation to make is that, when $x(\beta)$ is reachable from a query $P$ and on the path from $P$ to $x(\beta)$ the node $t_1(\alpha)$ occurs, then in $A(h'_i, F)$ the reachability of $x(\beta)$ will result from the fact that there is a path from $Q$ to $x(\beta)$ in $A(h_i, F)$.

Fig. 10. Graph $A(h'_{im}, F)$.

We now turn to the acyclicity of $(h'_i, F)$. We first remark that the query $Q$ contains no occurrence of $t_1(\alpha)$; otherwise, there would be a cycle in $AU(h_i, F)$. We consider two cases in which the substitution of $t_1(\alpha)$ may potentially give rise to cyclicity:

(a) Let $x(\beta)$ be reachable from a query $P$ in $AU(h_i, F)$, such that $t_1(\alpha)$ occurs on the path from $P$ to $x(\beta)$. After substituting $t_1(\alpha)$ by $Q$, there is a path from $Q$ to $x(\beta)$. Suppose that $x(\beta)$ occurs in $Q$. But then there would be a cycle in $AU(h_i, F)$, namely, the path consisting of the edge going from $Q$ to $t_1(\alpha)$, the path going from $t_1(\alpha)$ to $x(\beta)$, and the edge going from $x(\beta)$ to $Q$, a contradiction.

(b) Let $x(\beta)$ be reachable from a query $P$ in $AU(h_i, F)$, and suppose that $P$ contains an occurrence of $t_1(\alpha)$. Hence, after the substitution, $P$ is transformed into a query $P'$ in which $t_1(\alpha)$ is replaced by $Q$. Suppose that $Q$ contains an occurrence of $x(\beta)$. This would introduce a cycle in $AU(h'_i, F)$. This cannot happen, however, because there would be a cycle in $AU(h_i, F)$, namely, the path consisting of the edge going from $Q$ to $t_1(\alpha)$, the edge going from $t_1(\alpha)$ to $P$, the path going from $P$ to $x(\beta)$, and the edge going from $x(\beta)$ to $Q$.

Hence, $(h'_i, F)$ is acyclic. Also, it can easily be seen that the substitution of $Q$ for $t_1(\alpha)$ in $d$ does not affect the constructiveness of $(d', \emptyset)$. Finally, constructive subqueries are not affected by the substitution of $Q$ for $t_1(\alpha)$. Hence, $(h'_i, F)$ is constructive, and therefore, so is $(g', F)$.

(T3) Structured component elimination. This is obvious since we only eliminate terms, attributes, and, eventually, quantifiers.
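The combined effect of query propagation followed by structured component elimination can be sketched on a toy term representation. The encoding below (nested tuples for terms, a list for the conjunction, the string `"Q"` for a subquery) is an assumption made for illustration and is not the paper's formalism; the side conditions on the constructibility graph are not checked.

```python
from typing import Any, List, Tuple

Term = Any  # nested tuples such as ("comp", "t2", "D") or plain strings

def substitute(term: Term, target: Term, replacement: Term) -> Term:
    """Replace every occurrence of `target` inside `term` by `replacement`."""
    if term == target:
        return replacement
    if isinstance(term, tuple):
        return tuple(substitute(part, target, replacement) for part in term)
    return term

def propagate_and_eliminate(conjuncts: List[Term], target: Term,
                            query: Term) -> List[Term]:
    """Apply (T2) for the equality `target = query`, then drop that equality,
    which is the first step of (T3)."""
    defining = ("eq", target, query)
    return [substitute(c, target, query) for c in conjuncts if c != defining]

# Hypothetical conjunction: t in t1(B), t1 in t2(D), t2(D) = Q.
t2_d: Tuple[str, str, str] = ("comp", "t2", "D")
conjuncts = [
    ("in", "t", ("comp", "t1", "B")),
    ("in", "t1", t2_d),
    ("eq", t2_d, "Q"),
]
rewritten = propagate_and_eliminate(conjuncts, t2_d, "Q")
```

After the rewrite the component `t2(D)` no longer occurs, so the quantifier for `t2` could be removed as well, in line with step (3) of rule (T3).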

(T4) Query equality elimination. We have to show that $(h'_i, F)$ is constructive. We first show that $(h'_i, F)$ is acyclic. This follows because we can choose for $A(h'_i, F)$ the graph $A(h_i, F)$ without the nodes corresponding to the queries $\{t_1[\gamma_1] \mid f_1(t_1)\}$ and $\{t_2[\gamma_2] \mid f_2(t_2)\}$. Clearly, constructive subqueries are preserved by the query equality elimination rule. We only need to show that [...] is constructive. This follows from Lemmas 3(C), 3(D), and 3(A).

LEMMA 8. A constructive query in the flat calculus is safe.

PROOF. According to Ullman [44], a query $\{t[\lambda] \mid f\}$ of the flat calculus is safe iff

(1) whenever $t$ satisfies $f$, each component of $t$ occurs somewhere in a relation of the database; and

(2) whenever $\exists u[\gamma] f_1$ occurs in the query, if $f_1$ is satisfied by $u$ for any values of the other free variables in $f_1$, then each component of $u$ occurs somewhere in a relation of the database.

Let $\{t[\lambda] \mid f\}$ be a constructive query in the flat calculus. In particular, $(f, F)$ is constructive, where $F = \{t(\alpha) \mid \alpha \in \lambda\}$. This implies that

(1) all of the elements of $F$, hence, all of the components of $t$, are reachable in $(f, F)$, which implies item (1) above, since $t$ only has atomic attribute components; and

(2) in a subformula $\exists u[\gamma] f_1$ all the components of $u$ are reachable in $(f_1, F_1)$, $F_1 = \{u(\alpha) \mid \alpha \in \gamma\}$, which implies item (2) above, since all of the components of the $u$ variables are atomic attribute components.

4.4 Translate Algorithm

We now present an algorithm to construct a query in the flat algebra that is logically equivalent to a formula of the nested algebra with flat operands and a flat result.

Algorithm translate

Input. An ff-expression of the nested algebra.

Output. An expression of the flat algebra.

Method

(1) Transform the given ff-expression into a constructive calculus query, using Lemma 6.

(2) Repeat steps (2.1) and (2.2) until no longer possible.

(2.1) Repeat query membership elimination until no longer possible.

(2.2) Repeat query propagation immediately followed by structured component elimination until no longer possible.

(3) Repeat query equality elimination until no longer possible.

(4) Apply Codd's theorem for translating a safe calculus query into an equivalent flat algebra expression.

We illustrate step (2) of the algorithm with an example. Consider the following constructive query:

$$\{t[A] \mid \exists t_1[B(C)]\, \exists t_2[D(E(F))]\, (t \in t_1(B(C)) \wedge t_1 \in t_2(D(E(F))) \wedge t_2(D(E(F))) = \{t_3[G(H)] \mid t_3(G(H)) = \{t_4[I] \mid t_4 \in R\}\})\}.$$

Step (2.1) is not applicable, so we apply step (2.2). This yields the query

$$\{t[A] \mid \exists t_1[B(C)]\, (t \in t_1(B(C)) \wedge t_1 \in \{t_3[G(H)] \mid t_3(G(H)) = \{t_4[I] \mid t_4 \in R\}\})\}.$$

Now we can apply step (2.1), and we get the query

$$\{t[A] \mid \exists t_1[B(C)]\, (t \in t_1(B(C)) \wedge t_1(B(C)) = \{t_4[I] \mid t_4 \in R\})\}.$$

Now we apply step (2.2), and we get

$$\{t[A] \mid t \in \{t_4[I] \mid t_4 \in R\}\}.$$

We finally apply step (2.1) to get the query

$$\{t[A] \mid t \in R\}.$$
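The control flow of steps (2) and (3) can be sketched as a small fixpoint driver. This is only an illustration of the loop structure: the rule functions are passed in as parameters, and the toy rules below are hypothetical stand-ins that consume tags from a tuple rather than rewriting real calculus queries. A rule returns the rewritten query, or `None` when it is not applicable.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

Q = TypeVar("Q")
Rule = Callable[[Q], Optional[Q]]

def exhaust(rule: Rule, query: Q) -> Tuple[Q, bool]:
    """Apply `rule` until it is no longer applicable; report whether it fired."""
    changed = False
    while True:
        rewritten = rule(query)
        if rewritten is None:
            return query, changed
        query, changed = rewritten, True

def translate_core(query: Q, t1: Rule, t2_then_t3: Rule, t4: Rule) -> Q:
    """Steps (2) and (3): alternate (2.1) and (2.2) to a fixpoint, then (3)."""
    while True:
        query, fired_21 = exhaust(t1, query)
        query, fired_22 = exhaust(t2_then_t3, query)
        if not (fired_21 or fired_22):
            break
    query, _ = exhaust(t4, query)
    return query

# Toy stand-ins: "P" marks a pending propagation, "M" a pending membership test.
trace: List[str] = []

def toy_t1(q: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    if q and q[0] == "M":
        trace.append("2.1")
        return q[1:]
    return None

def toy_t2_then_t3(q: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    if q and q[0] == "P":
        trace.append("2.2")
        return q[1:]
    return None

def toy_t4(q: Tuple[str, ...]) -> Optional[Tuple[str, ...]]:
    return None

result = translate_core(("P", "M", "P", "M"), toy_t1, toy_t2_then_t3, toy_t4)
```

With the toy input the rules fire in the order 2.2, 2.1, 2.2, 2.1, which is the same order of steps as in the worked example.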

LEMMA 9. Algorithm Translate, taking as input an ff-expression of the nested algebra, stops and produces an expression of the flat algebra that is equivalent to the given ff-expression.

PROOF. First we prove that the algorithm is correct if it ends. Then we prove that it ends. Lemma 7 indicates how steps (2) and (3) have to be performed. The result is a constructive query. Since every component of every variable that occurs in the query is reachable, no structured component can be left after step (2). Nor are any formulas of the form $t(\alpha) \in Q$, $t(\alpha) = Q$ or $Q = t(\alpha)$ left. Since all of the formulas of the form $Q_1 = Q_2$ are eliminated by step (3), and since no new query is created by this step, all of the queries are eliminated after step (3). Hence, the result after step (3) is a constructive query in the flat calculus. This query is safe by Lemma 8 and can be transformed into an equivalent expression in the flat algebra.

In order to prove that the algorithm ends, we call $n_q$ the number of queries and $n_s$ the number of occurrences of structured components in the query. Suppose that we apply step (2) each time on subqueries without occurrences of structured components. Each time step (2.1) or step (2.2) is executed, the value $n_q + n_s$ becomes smaller, since every $(h_i, F)$ is acyclic. Each time step (3) is applied, $n_q$ becomes smaller. □

We are now able to formulate our main result:

THEOREM. For every ff-expression in the nested algebra, there is an equivalent expression in the flat algebra.

COROLLARY. The transitive closure is not expressible in the nested algebra.
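The intuition behind the corollary can be illustrated with a short sketch. It is not a proof: it only shows that a fixed number of compositions of a binary relation misses long paths, whereas the transitive closure requires iterating to a fixpoint whose length depends on the data. The function names and the chain relation are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]

def compose(r: Set[Pair], s: Set[Pair]) -> Set[Pair]:
    """Relational composition: a join on the middle column, then a projection."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

def bounded_paths(r: Set[Pair], k: int) -> Set[Pair]:
    """Pairs connected by a path of length at most k (a fixed number of joins)."""
    result, power = set(r), set(r)
    for _ in range(k - 1):
        power = compose(power, r)
        result |= power
    return result

def transitive_closure(r: Set[Pair]) -> Set[Pair]:
    """Iterate to a fixpoint; the number of rounds depends on the input."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended

# A chain 0 -> 1 -> 2 -> 3 -> 4 -> 5.
chain = {(i, i + 1) for i in range(5)}
short = bounded_paths(chain, 3)
full = transitive_closure(chain)
```

For any fixed `k`, a chain longer than `k` contains a pair that `bounded_paths` misses, here `(0, 5)` for `k = 3`.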

5. CONCLUSION

Our main result states that the flat algebra is as expressive as the nested algebra operating on a flat database and with a flat relation as a result. A simple consequence is, therefore, that recursive queries, such as the transitive closure, are not expressible in the nested algebra. This result clearly separates the nested algebra and the powerset algebra since it is well known that recursive queries are indeed expressible in the powerset algebra. This again suggests that the nested algebra is a conservative extension of the flat algebra, whereas the powerset algebra is not.

Although we did not consider selection operations involving constants (either atomic or structured) or selection operations involving subset conditions, it is straightforward to generalize our results to include these options. This generalization notably expands the range of practical queries that are affected by our results.

We believe that our results can contribute to (flat) relational query optimization. This will clearly depend on how efficiently flat-to-flat nested relational queries can be processed in relational database implementations.

Related to this issue, it would be interesting to obtain a complete algebraic proof of our main result, that is, a proof without having to introduce an intermediate calculus.

Finally, algebraic query languages have also been introduced for object-oriented databases [42]. It would be interesting to investigate if and how our translation strategies and results generalize to such databases.

ACKNOWLEDGMENTS

The authors are grateful to the referees for their many excellent comments and suggestions.

REFERENCES

1. ABITEBOUL, S., AND BEERI, C. On the power of languages for the manipulation of complex objects. INRIA Tech. Rep. 846, 1988.
2. ABITEBOUL, S., AND BIDOIT, N. Non first normal form relations: An algebra allowing data restructuring. J. Comput. Syst. Sci. 33, 3 (Dec. 1986), 361-393.
3. AHO, A. V., AND ULLMAN, J. D. Universality of data retrieval languages. In Proceedings of the 6th ACM Symposium on Principles of Programming Languages (Jan. 1979, San Antonio, Tex.). ACM, New York, 1979, pp. 110-117.
4. ARISAWA, H., MORIYA, K., AND MIURA, T. Operations and the properties on non-first-normal-form relational databases. In Proceedings of the 9th VLDB (Florence). 1983, pp. 197-205.
5. BANCILHON, F. A logic-programming/object-oriented cocktail. ACM SIGMOD Rec. 15, 3 (Sept. 1986), 11-20.
6. BANCILHON, F., AND KHOSHAFIAN, S. A calculus for complex objects. In Proceedings of the 5th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Mar. 1986, Cambridge, Mass.). ACM, New York, 1986, pp. 53-59.
7. BANCILHON, F., RICHARD, P., AND SCHOLL, M. On line processing of compacted relations. In Proceedings of the 8th VLDB (Sept. 1982, Mexico City). VLDB Endowment, 1982, pp. 263-269.
8. BEERI, C., AND KORNATZKY, Y. The many faces of query monotonicity. In Proceedings of Advances in Database Technology, EDBT '90 (Mar. 1990, Venice). Lecture Notes in Computer Science, vol. 416. Springer-Verlag, New York, 1990, pp. 120-135.
9. BEERI, C., NAQVI, S., RAMAKRISHNAN, R., SHMUELI, O., AND TSUR, S. Sets and negation in a logic database language (LDL1). In Proceedings of the 6th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (San Diego, Calif.). ACM, New York, 1987, pp. 21-37.
10. BIDOIT, N. The Verso algebra and how to answer queries without joins. J. Comput. Syst. Sci. 35, 3 (Dec. 1987), 321-364.
11. CODD, E. F. Relational completeness of database sublanguages. In Database Systems, R. Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 65-98.
12. COLBY, L. A recursive algebra and query optimization for nested relations. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 31-June 2, 1989, Portland, Ore.). ACM, New York, 1989, pp. 273-283.
13. DADAM, P. Advanced information management (AIM): Research in extended nested relations. IEEE Data Eng. 11, 3 (Dec. 1988), 4-14.
14. DADAM, P., KUESPERT, K., ANDERSEN, F., BLANKEN, H., ERBE, R., GUENAUER, J., LUM, V., PISTOR, P., AND WALCH, G. A DBMS prototype to support extended NF2 relations: An integrated view on flat tables and hierarchies. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 1986, Austin, Tex.). ACM, New York, 1986, pp. 356-366.
15. DEPPISCH, U., PAUL, H. B., AND SCHEK, H. J. A storage system for complex objects. In Proceedings of the International Workshop on Object-Oriented Database Systems (Pacific Grove, Calif.). 1986, pp. 183-195.
16. DESHPANDE, A., AND VAN GUCHT, D. A storage structure for unnormalized relations. In Proceedings of the GI Conference on Database Systems for Office Automation, Engineering and Scientific Applications (Apr. 1987, Darmstadt, Germany). Inf. Fachberichte 136 (1987), 481-486.
17. DESHPANDE, A., AND VAN GUCHT, D. An implementation for nested relational databases. In Proceedings of the 14th VLDB (Los Angeles, Calif.). 1988, pp. 76-88.
18. DESHPANDE, V., AND LARSON, P. A. An algebra for nested relations. Res. Rep. CS-87-65, Dept. of Computer Science, Univ. of Waterloo, Ontario, 1987.
19. GYSSENS, M., AND VAN GUCHT, D. The powerset algebra as a result of adding programming constructs to the nested relational algebra. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (June 1988, Chicago, Ill.). ACM, New York, 1988, pp. 225-232.
20. GYSSENS, M., AND VAN GUCHT, D. The powerset algebra as a natural tool to handle nested database relations. J. Comput. Syst. Sci. To be published.
21. GYSSENS, M., PAREDAENS, J., AND VAN GUCHT, D. A uniform approach towards handling atomic and structured information in the nested relational database. J. ACM 36, 4 (Oct. 1989), 790-825.
22. HOUBEN, G., AND PAREDAENS, J. The R2-algebra: An extension of an algebra for nested relations. Tech. Rep., Computer Science Dept., Technical Univ., Eindhoven, The Netherlands, 1987.
23. HULL, R., AND SU, J. On the expressive power of database queries with intermediate types. In Proceedings of the 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Austin, Tex.). ACM, New York, 1988, pp. 39-51.
24. JAESCHKE, G., AND SCHEK, H. J. Remarks on the algebra on non first normal form relations. In Proceedings of the 1st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Los Angeles, Calif.). ACM, New York, 1982, pp. 124-138.
25. KUPER, G. M. Logic programming with sets. J. Comput. Syst. Sci. 41, 1 (Aug. 1990), 44-64.
26. KUPER, G. M., AND VARDI, M. On the complexity of queries in the logical data model. In Proceedings of the 2nd International Conference on Database Theory (Bruges, Belgium). Lecture Notes in Computer Science, vol. 326. Springer-Verlag, New York, 1988, pp. 267-280.
27. LARSON, P. A. The data model and query language of Laurel. IEEE Data Eng. 11, 3 (Sept. 1988), 23-30.
28. LINNEMANN, V. Non first normal form relations and recursive queries: An SQL-based approach. In Proceedings of the 3rd IEEE International Conference on Data Engineering (Los Angeles, Calif.). IEEE, New York, 1987, pp. 591-598.
29. MAKINOUCHI, A. A consideration of normal form of not-necessarily-normalized relations in the relational data model. In Proceedings of the 3rd VLDB (Oct. 1977, Tokyo). VLDB Endowment, 1977, pp. 447-453.
30. NAQVI, S., AND TSUR, S. A Logical Language for Data and Knowledge Databases. Freeman, San Francisco, Calif., 1989.
31. OZSOYOGLU, G., OZSOYOGLU, Z. M., AND MATOS, V. Extending relational algebra and relational calculus with set-valued attributes and aggregate functions. ACM Trans. Database Syst. 12, 4 (Dec. 1987), 566-593.
32. PAREDAENS, J., DE BRA, P., GYSSENS, M., AND VAN GUCHT, D. The Structure of the Relational Model. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York, 1989.
33. PAUL, H. B., SCHEK, H. J., SCHOLL, M. H., WEIKUM, G., AND DEPPISCH, U. Architecture and implementation of the Darmstadt database kernel system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 1987, San Francisco, Calif.). ACM, New York, 1987, pp. 196-207.
34. PISTOR, P., AND ANDERSEN, F. Designing a generalized NF2 model with an SQL-type language interface. In Proceedings of the 12th VLDB (Aug. 1986, Kyoto). VLDB Endowment, 1986, pp. 278-288.
35. PISTOR, P., AND TRAUNMUELLER, R. A database language for sets, lists and tables. Inf. Syst. 11, 4 (1986), 323-336.
36. ROTH, M. A., KORTH, H. F., AND BATORY, D. S. SQL/NF: A query language for ¬1NF relational databases. Inf. Syst. 12, 1 (1987), 99-114.
37. ROTH, M. A., KORTH, H. F., AND SILBERSCHATZ, A. Extended algebra and calculus for nested relational databases. ACM Trans. Database Syst. 13, 4 (Dec. 1988), 389-417.
38. SCHEK, H. J., AND SCHOLL, M. H. The relational model with relation-valued attributes. Inf. Syst. 11, 2 (1986), 137-147.
39. SCHEK, H. J., PAUL, H. B., SCHOLL, M. H., AND WEIKUM, G. The DASDBS project: Objectives, experiences, and future prospects. IEEE Trans. Knowledge Data Eng. 2, 1 (June 1990), 25-43.
40. SCHOLL, M. H. Theoretical foundation of algebraic optimization utilizing unnormalized relations. In Proceedings of the 1st International Conference on Database Theory (Aug. 1986, Rome). Lecture Notes in Computer Science, vol. 243. Springer-Verlag, New York, pp. 380-396.
41. SCHOLL, M. H., PAUL, H. B., AND SCHEK, H. J. Supporting flat relations by a nested relational kernel. In Proceedings of the 13th VLDB (Sept. 1987, Brighton, Great Britain). VLDB Endowment, 1987, pp. 137-146.
42. SHAW, G., AND ZDONIK, S. A query algebra for object-oriented databases. Tech. Rep. CS-89-19, Dept. of Computer Science, Brown Univ., Providence, R.I., 1989.
43. THOMAS, S. J., AND FISCHER, P. C. Nested relational structure. In Advances in Computing Research III, The Theory of Databases, P. C. Kanellakis, Ed. JAI Press, Greenwich, Conn., 1986, pp. 269-307.
44. ULLMAN, J. D. Principles of Database Systems. 2nd ed. Computer Science Press, Rockville, Md., 1982.

Received July 1989; revised February 1991; accepted March 1991
