Converting Nested Algebra Expressions into Flat Algebra Expressions
JAN PAREDAENS
University of Antwerp
and
DIRK VAN GUCHT
Indiana University
Nested relations generalize ordinary flat relations by allowing tuple values to be either atomic or set valued. The nested algebra is a generalization of the flat relational algebra to manipulate nested relations. In this paper we study the expressive power of the nested algebra relative to its operation on flat relational databases. We show that the flat relational algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does. Theoretically, this result implies that recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra. Practically, this result is relevant to (flat) relational query optimization.

Categories and Subject Descriptors: H.2.3 [Database Management]: Languages—query languages; H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval—query formulation

General Terms: Algorithms, Languages, Theory

Additional Key Words and Phrases: Algebraic transformation, nested algebra, nested calculus, nested relations, relational databases

1. INTRODUCTION
In 1977 Makinouchi [29] proposed generalizing the relational database model by removing the first normal form assumption. Jaeschke and Schek [24] introduced a generalization of the ordinary relational model by allowing relations with set-valued attributes and by adding two restructuring operators, the nest and the unnest operators, to manipulate such homogeneously nested relations of powerset type. A similar generalization was proposed by
Authors’ addresses: J. Paredaens, Department of Mathematics and Computer Science, University of Antwerp, B-2610 Antwerpen, Belgium; D. Van Gucht, Computer Science Department, Indiana University, Bloomington, IN 47405.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
An abstract of this paper appeared as “Possibilities and Limitations of Using Flat Operators in Nested Algebra Expressions,” in Proceedings of the 7th ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems (Austin, Tex.), 1988, pp. 29-38.
© 1992 ACM 0362-5915/92/0300-0065 $01.50
ACM Transactions on Database Systems, Vol. 17, No. 1, March 1992, Pages 65-93.
Ozsoyoglu, Ozsoyoglu, and Matos [31]. (In addition, they defined a calculus to manipulate one-level nested relations and showed equivalence of their algebraic and calculus-like data manipulation languages.) Thomas and Fischer [43] generalized Jaeschke and Schek’s model and algebra and allowed nested relations of arbitrary (but fixed) depth (see also [2, 4, 12, 18, 21, 32, 38]). Roth, Korth, and Silberschatz [37] defined a calculus-like query language for the nested relational model of Thomas and Fischer. Since then, numerous SQL-like query languages [28, 34, 35, 37], graphical-oriented query languages [22], and datalog-like languages [1, 5, 6, 9, 25, 30] have been introduced for this model or for slight generalizations of it. Also, various groups [7, 13-17, 27, 33, 39, 41] have started with the implementation of the nested relational database model, some on top of an existing database management system and others from scratch.

In this paper we focus on the nested algebra as proposed by Thomas and Fischer [43]. The key problem we address is the following: “What is the expressive power of the nested algebra when we are only interested in its operation on flat relations and when the result is also flat?” In other words, “Is it possible to write queries in the nested algebra with flat operands as input and with a flat output that cannot be expressed in the ordinary (flat) relational algebra?” We show that the answer to this question is negative. Hence, the flat algebra is rich enough to extract the same “flat information” from a flat database as the nested algebra does. This result has interesting theoretical as well as practical consequences:
—Recursive queries such as the transitive closure of a binary relation cannot be expressed in the nested algebra, because if they could, the transitive closure could be expressed in the flat algebra, contradicting a result of Aho and Ullman [3]. In contrast, when the powerset operator is added to the nested algebra, then it is possible, as shown by Abiteboul and Beeri [1] and by Gyssens and Van Gucht [19, 20], to express recursive queries. Hence, the nested algebra with the powerset operator is strictly more powerful than the classical nested algebra studied in this paper. In this light, it is also interesting to mention the results of Hull and Su [23] and of Kuper and Vardi [26] regarding the expressiveness of the powerset algebra. They show that there exists a noncollapsing hierarchy of classes of powerset algebra expressions. In particular, they show that the class of powerset algebra expressions in which one can use up to $n$ ($n \ge 0$) (nested) applications of the powerset operator is strictly less expressive than the class of nested algebra expressions that can use up to $n + 1$ (nested) applications of the powerset operator. Our result shows that an analogous property does not hold for the nested algebra; that is, the number of allowed (nested) applications of the nest operator plays no role in the expressive power of the nested algebra expressions (for a related result, see [8]).

—Several researchers [33, 41] have proposed building database management systems that support the nested relational database model and provide a flat relational interface. Our result indicates that such systems have the potential to optimize the user’s query, expressed in a flat relational query language, by using properties of the nested algebra [10, 40]. Alternatively, it is an interesting research problem to study the degree to which current relational query optimizers can be extended to take advantage of this result.
2. FORMALISM

In this section we define relation schemes, relation instances, nested algebra, and nested calculus.

2.1 Scheme of a Relation
We define schemes to be lists of attributes. Two kinds of attributes are considered: atomic attributes, represented by their name, and structured attributes, represented by their name followed by their scheme.

⟨Attribute⟩ → ⟨Identifier⟩

Such an attribute is called atomic, and ⟨Identifier⟩ is called the name of the attribute.

⟨Attribute⟩ → ⟨Identifier⟩⟨Scheme⟩

Such an attribute is called structured, and ⟨Identifier⟩ is called the name of the attribute.

⟨List_of_attributes⟩ → ⟨Empty_list⟩ | ⟨Non_empty_list_of_attributes⟩
⟨Non_empty_list_of_attributes⟩ → ⟨Attribute⟩ | ⟨Attribute⟩, ⟨Non_empty_list_of_attributes⟩

Two lists of attributes $\lambda_1$ and $\lambda_2$ are called compatible (i.e., the attributes have the same nesting structure) if and only if (iff) they are both empty, or $\lambda_1 = \alpha_1, \lambda'_1$ and $\lambda_2 = \alpha_2, \lambda'_2$ with $\lambda'_1$, $\lambda'_2$ compatible (ordered) lists of attributes and $\alpha_1$ and $\alpha_2$ both atomic attributes or both structured attributes with compatible schemes.

⟨Scheme⟩ → (⟨Non_empty_list_of_attributes⟩)

All of the identifiers in a scheme have to be different. Two schemes are called compatible iff their respective lists of attributes are compatible. A scheme is called flat iff all of its attributes are atomic. These are some examples of schemes:

(A, B, C)
(A, E(D(B), C))

A database scheme is a set of schemes.

2.2 Instances of a Relation Scheme

Let $S$ be the scheme $(\alpha_1, \ldots, \alpha_n)$, where $\alpha_i$ stands for an attribute, either atomic or structured. The instances of $S$, denoted $Inst(S)$, are the set

$\{ s \mid s \text{ is a finite subset of } values(\alpha_1) \times \cdots \times values(\alpha_n) \},$
Fig. 1. (a) Instance $s_1$ of $Inst(A, B, C)$; (b) instance $s_2$ of $Inst(A, E(D(B), C))$.
where $values(A)$ is the set of natural numbers for $A$, an atomic attribute, and $values(A(\lambda)) = Inst((\lambda))$, where $\lambda$ is a nonempty list of attributes. We need to make the following remarks:

—All atomic attributes have the same $values$ set, namely, the set of the natural numbers.
—Two schemes are compatible iff their sets of instances are equal.

In Figure 1 we show an instance $s_1$ of $Inst(A, B, C)$ and an instance $s_2$ of $Inst(A, E(D(B), C))$, respectively.
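The scheme definitions of Sections 2.1 and 2.2 can be made concrete with a small sketch. The encoding below is an assumption chosen for illustration, not something the paper prescribes: an atomic attribute is a Python string, a structured attribute is a pair `(name, scheme)`, and a scheme is a tuple of attributes. The function names are hypothetical.

```python
# Illustrative encoding (an assumption, not from the paper):
#   atomic attribute      -> a string, e.g. "A"
#   structured attribute  -> a pair (name, scheme), e.g. ("D", ("B",))
#   scheme                -> a tuple of attributes

def is_atomic(attr):
    return isinstance(attr, str)

def compatible(s1, s2):
    """Two schemes are compatible iff they have the same nesting structure."""
    if len(s1) != len(s2):
        return False
    for a1, a2 in zip(s1, s2):
        if is_atomic(a1) != is_atomic(a2):
            return False
        # Structured attributes must carry compatible schemes.
        if not is_atomic(a1) and not compatible(a1[1], a2[1]):
            return False
    return True

def is_flat(scheme):
    """A scheme is flat iff all of its attributes are atomic."""
    return all(is_atomic(a) for a in scheme)

def identifiers(scheme):
    """Yield every identifier occurring in the scheme, at any depth."""
    for a in scheme:
        if is_atomic(a):
            yield a
        else:
            yield a[0]
            yield from identifiers(a[1])

def well_formed(scheme):
    """All identifiers in a scheme have to be different."""
    ids = list(identifiers(scheme))
    return len(scheme) > 0 and len(ids) == len(set(ids))

flat_scheme = ("A", "B", "C")                         # (A, B, C)
nested_scheme = ("A", ("E", (("D", ("B",)), "C")))    # (A, E(D(B), C))
```

Under this encoding, compatibility ignores attribute names and compares only the nesting structure, which matches the remark that two schemes are compatible iff their sets of instances are equal.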
2.3 Nested Algebra

We define a nested algebra for manipulating schemes and their instances, similar to the one introduced by Thomas and Fischer [43]. It should be noted that the empty operator and the renaming operator were not considered in [43]. The empty operator is introduced here to create empty structured component values; intuitively the nest operator cannot create such values, because nesting an empty instance again yields an empty instance and not an instance with a single tuple equal to the empty set. The algebra consists of nine operators, which are defined as follows:

(1) Union operator $\cup$: Let $(\lambda_1)$ and $(\lambda_2)$ be compatible schemes, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 \cup s_2$ is the (standard) union of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1)$.

(2) Difference operator $-$: Let $(\lambda_1)$ and $(\lambda_2)$ be compatible schemes, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 - s_2$ is the (standard) difference of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1)$.

(3) Cross product operator $\times$: Let $(\lambda_1)$ and $(\lambda_2)$ be schemes with no common identifiers, and let $s_1 \in Inst((\lambda_1))$ and $s_2 \in Inst((\lambda_2))$. Then $s_1 \times s_2$ is the (standard) cross product of $s_1$ and $s_2$ and is an instance of the scheme $(\lambda_1, \lambda_2)$.
(4) Projection operator $\Pi$: Let $(\lambda)$ be the scheme $(\alpha_1, \ldots, \alpha_n)$, let $i_1, \ldots, i_k$ ($k \ge 1$) be the indexes of different attributes $\alpha_{i_1}, \ldots, \alpha_{i_k}$ of $\lambda$, and let $s \in Inst((\lambda))$. Then $\Pi_{i_1, \ldots, i_k}(s)$ is the (standard) projection on the $\alpha_{i_1}$ through $\alpha_{i_k}$ attributes and is an instance of the scheme $(\alpha_{i_1}, \ldots, \alpha_{i_k})$.

(5) Selection operator $\sigma$: Let $(\lambda)$ be a scheme; let $i$ and $j$ be the indexes of different attributes of $(\lambda)$, and let $s \in Inst((\lambda))$. Then $\sigma_{i=j}(s)$ is the largest subset of $s$ consisting of elements with equal $i$ and $j$ components.

(6) Nest operator $\nu$: Let the scheme $(\lambda)$ have the form $(\lambda_1, \lambda_2)$, where $\lambda_2$ is not the empty list, and let $A$ be no identifier of $\lambda$. Let $s \in Inst((\lambda))$. Then $\nu_{\lambda_2; A}(s)$ is the (standard) nest of $s$ on the attributes of $\lambda_2$ and is an instance of the scheme $(\lambda_1, A(\lambda_2))$. $\nu_{\lambda_2; A}(s)$ is the set of all elements $t_\nu$ for which there is an element $t$ in $s$ such that $t_\nu$ restricted to $\lambda_1$ is equal to $t$ restricted to $\lambda_1$, and $t_\nu(A(\lambda_2)) = \{ u \text{ restricted to } \lambda_2 \mid u \in s \text{ and } u \text{ restricted to } \lambda_1 \text{ is equal to } t_\nu \text{ restricted to } \lambda_1 \}$. Notice that in this definition we have made the notational simplifying assumption that $\lambda_2$ is an end sequence of $\lambda$ and not a subsequence of $\lambda$. Reconsider instance $s_1$ of Figure 1. The instances $\nu_{B;D}(s_1)$ and $\nu_{C;E}(\nu_{B;D}(s_1))$ are shown in Figure 2.

Fig. 2. (a) Instance $\nu_{B;D}(s_1)$; (b) instance $\nu_{C;E}(\nu_{B;D}(s_1))$.

(7) Unnest operator $\mu$: Let the scheme $(\lambda)$ have the form $(\lambda_1, A(\lambda_2))$, and let $s \in Inst((\lambda))$. Then $\mu_{A(\lambda_2)}(s)$ is the (standard) unnest of $s$ on the $A(\lambda_2)$ attribute of $\lambda$ and is an instance of the scheme $(\lambda_1, \lambda_2)$. $\mu_{A(\lambda_2)}(s)$ is the set of all elements $t_\mu$ for which there is an element $t$ in $s$ such that $t_\mu$ restricted to $\lambda_1$ is equal to $t$ restricted to $\lambda_1$ and $t_\mu$ restricted to $\lambda_2$ is an element of $t(A(\lambda_2))$. Observe that the application of the unnest operator can lose information when applied to instances containing empty-valued tuple components. Reconsider instance $s_2$ of Figure 1. The instances $\mu_{E(D(B),C)}(s_2)$ and $\mu_{D(B)}(\mu_{E(D(B),C)}(s_2))$ are shown in Figure 3.

Fig. 3. (a) Instance $\mu_{E(D(B),C)}(s_2)$; (b) instance $\mu_{D(B)}(\mu_{E(D(B),C)}(s_2))$.

(8) Empty operator $\emptyset$: Let $(\lambda)$ be a scheme, and let $A$ be no identifier of $\lambda$. Let $s \in Inst((\lambda))$. Then $\emptyset_A(s)$ is an instance of the scheme $(A(\lambda))$ that has only one element, being the empty instance of the scheme $(\lambda)$.

(9) Renaming operator $\rho$: Let $(\lambda)$ be a scheme, and let $s \in Inst((\lambda))$. Then $\rho(s, A, B)$ is the (standard) renaming of $A$ by $B$ in $s$. It is an instance of the scheme $(\lambda)$, with the identifier $A$ replaced by the identifier $B$.

Algebraic expressions of the nested algebra are defined in the usual way. An expression of the nested algebra is called a flat-flat expression (ff-expression) iff its operands and its result are flat. Here are some examples of expressions in the nested algebra. Reconsider instance $s_1$ of Figure 1,

[...]

The first, second, and third examples are ff-expressions; the fourth is not. The results of the first, second, third, and fourth expressions are shown in Figure 4.

2.4 Nested Calculus

We define a calculus for manipulating schemes and their instances, similar to the ones introduced by Roth, Korth, and Silberschatz [37] and by Abiteboul and Beeri [1]. A query in the nested calculus has the form

$\{ t[\lambda] \mid f(t) \}$

where $t$ is an identifier, called the target variable; $\lambda$ is a nonempty list of attributes, called the scheme of $t$; and $f(t)$ is a formula. The scheme of this query is $(\lambda)$. The free variables of this query are those of $f$, except for $t$.
Fig. 4. Results of some nested algebraic expressions.

A formula can have one of the forms

$(f_1 \vee f_2)$, $(f_1 \wedge f_2)$, $\neg f_1$, $\exists t[\lambda] f_1$,

where all $f_i$s are formulas; $t$ is an identifier, called a tuple variable (all these $t$s are different and differ from the target variable); and $\lambda$ is a nonempty list of attributes, called the scheme of $t$. Furthermore, a formula can have one of the forms

TRUE, FALSE, ⟨term$_1$⟩ = ⟨term$_2$⟩, ⟨term$_1$⟩ $\in$ ⟨term$_2$⟩.

A term is a query or has one of the forms

$t$, $t(\alpha)$ ($t(\alpha)$ is called a component of $t$), $R$,

where $t$ is a tuple variable, $R$ is a relation identifier, and $\alpha$ is an attribute (atomic or structured). The scheme of $t(A)$, where $A$ is an atomic attribute, is $A$. The scheme of $t(A(\lambda))$, where $\lambda$ is a nonempty list of attributes and $A(\lambda)$ is a structured attribute, is $(\lambda)$. Note that ⟨term$_1$⟩ = ⟨term$_2$⟩ is only a formula if the schemes of ⟨term$_1$⟩ and ⟨term$_2$⟩ are compatible, and that ⟨term$_1$⟩ $\in$ ⟨term$_2$⟩ is only a formula if the scheme of ⟨term$_1$⟩ is $\lambda$ and the scheme of ⟨term$_2$⟩ is compatible with $(\lambda)$.
Here are some examples of queries; $R$ has the scheme $(A, B(C, D))$:

$\{ t[A] \mid \exists t_1[A, B(C, D)](t_1 \in R \wedge t(A) = t_1(A)) \}$

$\{ t[A, B(C, D)] \mid \neg\, t \in R \}$

$\{ t_1[C] \mid \exists t_2[B(C)](t_2(B(C)) = \{ t_3[C] \mid t_3 \in t_2(B(C)) \} \wedge t_1 \in t_2(B(C))) \}$

The first query is a projection, and the second, an (unsafe) complementation. In the third query, note how $t_2(B(C))$ is defined in terms of itself.

2.5 Flat Algebra

An expression of the nested algebra is also an expression of the flat algebra iff

(1) every attribute of every relation that occurs in the expression is atomic (notice that this eliminates the unnest operator from being present in such an expression), and
(2) neither the nest operator nor the empty operator occurs in the expression.

The first example in Section 2.3 is an expression of the flat algebra.

2.6 Flat Calculus

A query of the nested calculus is also a query of the flat calculus iff

(1) every attribute of every relation that occurs in the query is atomic,
(2) every attribute of the scheme of every variable that occurs in the query is atomic, and
(3) no term is a query.

3. TRANSLATION OF THE NESTED ALGEBRA TO THE NESTED CALCULUS

In this section we show how expressions of the nested algebra can be translated into the nested calculus. Let $nae$, $nae_1$, $nae_2$ be expressions of the nested algebra. We suppose that the algebraic expressions $nae$, $nae_1$, and $nae_2$ are, respectively, translated to $\{ t[\lambda] \mid f(t) \}$, $\{ t[\lambda_1] \mid f_1(t) \}$, and $\{ t[\lambda_2] \mid f_2(t) \}$.

—$R$, a relation with scheme $(\lambda)$ ($R$ denotes an element of $Inst((\lambda))$), is translated to $\{ t[\lambda] \mid t \in R \}$.

—Union: $nae_1 \cup nae_2$ (where $\lambda_1$ and $\lambda_2$ are compatible) is translated to $\{ t[\lambda_1] \mid (f_1(t) \vee f_2(t)) \}$.

—Difference: $nae_1 - nae_2$ (where $\lambda_1$ and $\lambda_2$ are compatible) is translated to $\{ t[\lambda_1] \mid (f_1(t) \wedge \neg f_2(t)) \}$.
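—Cross product: $nae_1 \times nae_2$ (where $\lambda_1$ and $\lambda_2$ have no common identifiers) is translated to

$\{ t[\lambda_1, \lambda_2] \mid (\exists t_1[\lambda_1](f_1(t_1) \wedge \bigwedge_{\alpha \in \lambda_1} t(\alpha) = t_1(\alpha)) \wedge \exists t_2[\lambda_2](f_2(t_2) \wedge \bigwedge_{\alpha \in \lambda_2} t(\alpha) = t_2(\alpha))) \}$.

—Projection: $\Pi_{i_1, \ldots, i_k}(nae)$ (where $k \ge 1$ and $i_1, \ldots, i_k$ are the indexes of different attributes $\alpha_{i_1}, \ldots, \alpha_{i_k}$ of $\lambda$) is translated to

$\{ t[\alpha_{i_1}, \ldots, \alpha_{i_k}] \mid \exists t_1[\lambda](f(t_1) \wedge \bigwedge_{j=1}^{k} t(\alpha_{i_j}) = t_1(\alpha_{i_j})) \}$.

—Selection: $\sigma_{i=j}(nae)$ (where $i$ and $j$ are the indexes of different attributes $\alpha_i$ and $\alpha_j$ of $\lambda$) is translated to $\{ t[\lambda] \mid (f(t) \wedge t(\alpha_i) = t(\alpha_j)) \}$.

—Nest: Suppose that $\lambda$ has the form $\lambda_1, \lambda_2$ (where $\lambda_2$ is a nonempty list of attributes), and let $A$ be no identifier of $\lambda$. $\nu_{\lambda_2; A}(nae)$ is translated to

$\{ t[\lambda_1, A(\lambda_2)] \mid \exists t_1[\lambda](f(t_1) \wedge \bigwedge_{\alpha \in \lambda_1} t(\alpha) = t_1(\alpha) \wedge t(A(\lambda_2)) = \{ t_2[\lambda_2] \mid \exists t_3[\lambda](f(t_3) \wedge \bigwedge_{\alpha \in \lambda_2} t_2(\alpha) = t_3(\alpha) \wedge \bigwedge_{\alpha \in \lambda_1} t(\alpha) = t_3(\alpha)) \}) \}$.

—Unnest: Suppose that $\lambda$ has the form $\lambda_1, A(\lambda_2)$. $\mu_{A(\lambda_2)}(nae)$ is translated to [...]

—Empty: Let $(\lambda)$ be the scheme of $nae$, and let $A$ be no identifier of $\lambda$. $\emptyset_A(nae)$ is translated to [...]

—Renaming: $\rho(nae, A, B)$ is translated to the translation of $nae$ where every occurrence of $A$ is replaced by $B$.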
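The nest and unnest operators of Section 2.3 can be illustrated with a short sketch. The encoding is an assumption made for this example only: an instance is a `frozenset` of Python tuples, a set-valued component is itself a `frozenset` of tuples, and, following the paper's simplifying assumption that $\lambda_2$ is an end sequence of $\lambda$, `nest(s, k)` nests everything after the first `k` columns. The function names are hypothetical.

```python
def nest(s, k):
    """Nest the columns from position k onward into one set-valued column.

    Tuples that agree on their first k components are grouped, mirroring
    the set comprehension used in the calculus translation of nest.
    """
    groups = {}
    for t in s:
        groups.setdefault(t[:k], set()).add(t[k:])
    return frozenset(head + (frozenset(tail),) for head, tail in groups.items())

def unnest(s):
    """Unnest the last (set-valued) column of every tuple in s."""
    return frozenset(t[:-1] + u for t in s for u in t[-1])

# A small flat instance over (A, B, C); the values are made up.
flat = frozenset({(0, 0, 0), (1, 0, 0), (1, 0, 1)})
nested = nest(flat, 2)  # scheme (A, B, D(C)) in the paper's notation

# Unnesting a freshly nested instance gives back the original instance.
round_trip = unnest(nested)

# An instance with an empty-valued component: unnest silently drops the tuple.
with_empty = frozenset({(0, frozenset()), (1, frozenset({(5,)}))})
lossy = nest(unnest(with_empty), 1)
```

In this sketch `round_trip` equals `flat`, whereas `lossy` differs from `with_empty` because the tuple with the empty set has no counterpart after unnesting. Nesting an empty instance also yields an empty instance, which is the reason the paper gives for introducing the separate empty operator.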
4. TRANSLATION OF ff-EXPRESSIONS OF THE NESTED ALGEBRA INTO EXPRESSIONS OF THE FLAT ALGEBRA

In this section we show that ff-expressions of the nested algebra can be translated into equivalent expressions of the flat algebra. To establish this result, we first translate the ff-expression into a calculus query that satisfies
certain syntactic restrictions. We then translate, with the help of several transformation rules, this intermediate calculus query into a safe query of the flat calculus (in the sense of Ullman [44]). We finally use Codd’s theorem [11, 44] about the equivalence of the safe flat calculus and the flat algebra to find an algebraic expression. We would like to state that this strategy to achieve the result is completely constructive. [...]

4.1 Existential Normal Form

The syntactic restrictions on the relevant calculus queries are captured by requiring that the calculus formulas are in a certain normal form. Intuitively, this normal form is broad enough to allow the formulation of nested algebra expressions, but restricted enough to disallow the formulation of queries that would otherwise require a powerset operation in an algebraic setting (see also [1]). We call this normal form the existential normal form.

A (nested calculus) formula $g$ is in existential normal form (enf) iff

$g = (h_1 \vee \cdots \vee h_k)$, $k \ge 1$,

such that for all $i$, $1 \le i \le k$,

$h_i = \exists u_{i1}[\lambda_{i1}] \cdots \exists u_{in_i}[\lambda_{in_i}](c_{i1} \wedge \cdots \wedge c_{il_i} \wedge \neg d_i)$, $n_i \ge 0$, $l_i > 0$,

such that $d_i$ is in enf or $d_i$ is FALSE (in which case “$\wedge \neg d_i$” is not written) and each $c_{ij}$ has one of the following forms:

$c_{ij} = (t_1(\alpha_1) = t_2(\alpha_2))$, or
$c_{ij} = (t_1 \in t_2(\alpha))$, or
$c_{ij} = (t \in R)$, or
$c_{ij} = (t_1 \in \{ t[\lambda] \mid f \})$, where $f$ is in enf, or
$c_{ij} = (\{ t[\lambda] \mid f \} = t_1(A(\lambda)))$ or $c_{ij} = (t_1(A(\lambda)) = \{ t[\lambda] \mid f \})$, where $f$ is in enf, or
$c_{ij} = (\{ w_1[\gamma_1] \mid f_1 \} = \{ w_2[\gamma_2] \mid f_2 \})$, where $f_1$ and $f_2$ are in enf.

The formulas of the three examples in Section 2.4 are in enf. We now define the disjunction, the conjunction, the negation, and the existential quantification for formulas in enf so that the resulting formulas again are in enf; we write these enf-preserving operations as $\dot\vee$, $\dot\wedge$, $\dot\neg$, and $\dot\exists$. Let

$g = (\bigvee_{i=1}^{k} h_i)$ with $h_i = \exists u_{i1}[\lambda_{i1}] \cdots \exists u_{in_i}[\lambda_{in_i}](c_{i1} \wedge \cdots \wedge c_{il_i} \wedge \neg d_i)$,

and let

$g' = (\bigvee_{i=1}^{k'} h'_i)$

with $h'_i$ of the same form. [...] The existential quantification is defined as

$\dot\exists t[\lambda] g = (\bigvee_{i=1}^{k} \tilde{h}_i)$, where $\tilde{h}_i = \exists t[\lambda] h_i$.
In the following lemma, we establish some simple logical equivalences:

LEMMA 1. For each formula $f_1$ and $f_2$, we have

(A) If $t_2$ is not a free variable of $f_1$, then $(f_1 \wedge \exists t_2[\lambda] f_2(t_2))$ is logically equivalent to $\exists t_2[\lambda](f_1 \wedge f_2(t_2))$.

(B) If $t_2$ is not a free variable of $f_1(t_1)$ and $t_1$ is not a free variable of $f_2(t_2)$, then $(\exists t_1[\lambda_1] f_1(t_1) \wedge \exists t_2[\lambda_2] f_2(t_2))$ is logically equivalent to $\exists t_1[\lambda_1] \exists t_2[\lambda_2](f_1(t_1) \wedge f_2(t_2))$.

(C) $\exists t[\lambda](f_1(t) \vee f_2(t))$ is logically equivalent to $(\exists t[\lambda] f_1(t) \vee \exists t[\lambda] f_2(t))$.

LEMMA 2. $(g \vee g')$ (resp. $(g \wedge g')$, $\neg g$, $\exists t[\lambda] g$) is logically equivalent to $(g \,\dot\vee\, g')$ (resp. $(g \,\dot\wedge\, g')$, $\dot\neg g$, $\dot\exists t[\lambda] g$).

PROOF. This results from Lemma 1 and elementary logic. ❑
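The three equivalences of Lemma 1 can be checked by brute force on a small finite domain. The sketch below is only a sanity check under stated assumptions, not a proof: quantifiers range over a fixed three-element domain chosen for illustration, formulas are modeled as Python predicates, and a formula in which the quantified variable is not free is modeled as a plain truth value.

```python
from itertools import product

DOMAIN = (0, 1, 2)  # an arbitrary small domain, chosen for illustration

def predicates(domain):
    """Yield every unary predicate over the domain as a callable."""
    for bits in product((False, True), repeat=len(domain)):
        table = dict(zip(domain, bits))
        yield table.__getitem__

def lemma_1a():
    # f1 does not mention t2, so it is modeled as a constant truth value.
    return all(
        (f1 and any(f2(t) for t in DOMAIN))
        == any(f1 and f2(t) for t in DOMAIN)
        for f1 in (False, True)
        for f2 in predicates(DOMAIN)
    )

def lemma_1b():
    # f1 depends only on t1 and f2 only on t2.
    return all(
        (any(f1(a) for a in DOMAIN) and any(f2(b) for b in DOMAIN))
        == any(f1(a) and f2(b) for a in DOMAIN for b in DOMAIN)
        for f1 in predicates(DOMAIN)
        for f2 in predicates(DOMAIN)
    )

def lemma_1c():
    return all(
        any(f1(t) or f2(t) for t in DOMAIN)
        == (any(f1(t) for t in DOMAIN) or any(f2(t) for t in DOMAIN))
        for f1 in predicates(DOMAIN)
        for f2 in predicates(DOMAIN)
    )
```

These are the equivalences that let existential quantifiers be pulled to the front of each disjunct, which is the shape required by the existential normal form.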
4.2 Constructive Queries

We start this section by defining constructive queries. Intuitively, this class of queries admits the formulation of all nested algebra expressions (Lemma 6); however, queries that would require powerset operations in an algebraic setting cannot be expressed as constructive queries. In constructive queries, all of the components of existentially quantified tuple variables have to be reachable from relation components in a nonrecursive (acyclic) way in a constructibility graph. Also, the components of the variables corresponding to the result of the query or subquery have to satisfy this property; these components of tuple variables are collected in a set denoted by $F$. Later, in Section 4.3, we transform constructive queries into safe queries (in the sense of [44]); this is established in Lemma 8. The safety will then facilitate the final step in the translation, the use of Codd’s theorem [11] regarding the translation of a safe flat calculus query into the flat relational algebra.

Let $g$ be in enf, and let $F = \{ v_1(\alpha_{p_1}), \ldots, v_l(\alpha_{p_l}) \}$ be a set¹ of components of tuple variables $v_1, \ldots, v_l$ (these variables are called the variables of $F$) with

$g = (h_1 \vee \cdots \vee h_k)$,
$h_i = \exists u_{i1}[\lambda_{i1}] \cdots \exists u_{in_i}[\lambda_{in_i}](c_{i1} \wedge \cdots \wedge c_{il_i} \wedge \neg d_i)$.

We now define the constructibility graph of $(h_i, F)$. An edge $(x, y)$ denotes informally that the set of possible values of $y$ is “bounded” by the set of possible values of $x$. The constructibility graph of $(h_i, F)$, notated as $C(h_i, F)$, is the directed graph with the following nodes:

(1) all components of the tuple variables $u_{i1}, \ldots, u_{in_i}$ and all the elements of $F$;
(2) all of the tuple variables $u_{i1}, \ldots, u_{in_i}$ and the variables $v_1, \ldots, v_l$;
(3) all queries $Q$ that appear in $h_i$ on the highest level in some $c_{ij}$; and
(4) all relations $R$ that appear in $h_i$ on the highest level in some $c_{ij}$;

and with the following edges:

(1) if $t_1(\alpha_1) = t_2(\alpha_2)$ is one of the $c_{ij}$s, then $(t_1(\alpha_1), t_2(\alpha_2))$ is an edge if $t_1(\alpha_1) \notin F$ and $(t_2(\alpha_2), t_1(\alpha_1))$ is an edge if $t_2(\alpha_2) \notin F$;
(2) if $t_1 \in t_2(\alpha)$ is one of the $c_{ij}$s, then $(t_2(\alpha), t_1)$ is an edge if $t_2(\alpha) \notin F$;
(3) if $t(A(\lambda)) = Q$ or $Q = t(A(\lambda))$ is one of the $c_{ij}$s, $Q$ being a query, then $(Q, t(A(\lambda)))$ is an edge;
(4) if $t \in R$ is one of the $c_{ij}$s, then $(R, t)$ is an edge;
(5) if $t \in Q$ is one of the $c_{ij}$s, $Q$ being a query, then $(Q, t)$ is an edge; and
(6) if $t$ is a variable and $t(\alpha)$ is a component of $t$, then $(t, t(\alpha))$ is an edge.

A component $t(\alpha)$ is called reachable in a subgraph $C'(h_i, F)$ of $C(h_i, F)$ iff there is a path in $C'(h_i, F)$ from a query $Q$ or a relation $R$ to $t(\alpha)$. A component $t(\alpha)$ is called reachable in $(h_i, F)$ iff $t(\alpha)$ is reachable in $C(h_i, F)$.

¹ The notions of constructibility graph, reachability, acyclicity, and constructiveness are defined relative to a set of variable components $F$. These variable components correspond to free variable components of the involved enfs (see, in particular, the definition of a constructive query $\{ v[\lambda] \mid g \}$ in Section 4.2).
Fig. 5. Constructibility graphs of the three examples in Section 2.4. (a) $C(\exists t_1[A, B(C, D)](t_1 \in R \wedge t(A) = t_1(A)), \{ t(A) \})$; (b) $C(\neg\, t \in R, \{ t(A), t(B(C, D)) \})$; (c) $C(\exists t_2[B(C)](t_2(B(C)) = \{ t_3[C] \mid t_3 \in t_2(B(C)) \} \wedge t_1 \in t_2(B(C))), \{ t_1(C) \})$.

The component $t(\alpha)$ is called reachable in $(g, F)$ iff it is reachable in $(h_i, F)$, for all $i$, $1 \le i \le k$. In Figure 5a $t(A)$, $t_1(A)$, and $t_1(B(C, D))$ are reachable. In Figure 5b no component is reachable. In Figure 5c $t_2(B(C))$ and $t_1(C)$ are reachable.

Informally, $(h_i, F)$ is called acyclic if all of the components of its $u$ variables and all of the elements of $F$ are reachable in an acyclic subgraph of $C(h_i, F)$. Formally, $(h_i, F)$ is called acyclic iff $C(h_i, F)$ has a subgraph $A(h_i, F)$ such that

(1) all of the components of the tuple variables $u_{i1}, \ldots, u_{in_i}$ (called the $u$ variables in the sequel) and all of the elements of $F$ are reachable in $A(h_i, F)$; and
(2) $A(h_i, F)$ augmented with all the edges $(t(\alpha), Q)$, with $t(\alpha)$ appearing in the query $Q$, is acyclic. We call the latter graph $AU(h_i, F)$.

Informally, $(h_i, F)$ is called constructive if it is acyclic and if the negated part, $d_i$, and all of the subqueries that occur in $h_i$ are constructive. Formally, $(h_i, F)$ is called constructive iff

(1) $(h_i, F)$ is acyclic;
(2) $(d_i, \emptyset)$ is constructive;
(3) if some $c_{ij}$ has the form $u(\alpha) = \{ w[\gamma] \mid f \}$, or $\{ w[\gamma] \mid f \} = u(\alpha)$, or $u \in \{ w[\gamma] \mid f \}$, then $\{ w[\gamma] \mid f \}$ is constructive;
(4) if some $c_{ij}$ has the form $\{ w_1[\gamma_1] \mid f_1 \} = \{ w_2[\gamma_2] \mid f_2 \}$, then $\{ w_1[\gamma_1] \mid f_1 \}$ and $\{ w_2[\gamma_2] \mid f_2 \}$ are constructive.

$(g, F)$ is called constructive iff for all $i$, $1 \le i \le k$, $(h_i, F)$ is constructive. The query $\{ v[\lambda] \mid g \}$ is called constructive iff $(g, F)$ is constructive, where $F = \{ v(\alpha) \mid \alpha \in \lambda \}$. Note that, whenever $G \subseteq F$, $(g, G)$ is constructive if $(g, F)$ is also constructive.

The definition of the constructiveness of $(h_i, F)$ is clearly recursive (see points (2), (3), and (4)). However, it is clear that in spite of the recursion this (purely syntax-oriented) definition is well defined.

The first example of Section 2.4 is constructive; the second is not (since neither component of $t$ is reachable in $(\neg\, t \in R, \{ t(A), t(B(C, D)) \})$ (Figure 5b)), and neither is the third (since the graph $AU$ would be cyclic and therefore does not exist, as can be seen from Figure 5c).

In Lemmas 3-5 we show how constructiveness is preserved in various situations.
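The reachability and acyclicity conditions of Section 4.2 can be tried out on the three examples of Section 2.4. The sketch below rests on several assumptions that are not in the paper: nodes are plain strings, the edge sets are written out by hand from the edge rules of the constructibility graph, and the subgraph $A(h_i, F)$ is simply taken to be the whole graph $C(h_i, F)$, which is sufficient for these three small examples but is not the general definition. All function names are hypothetical.

```python
def reachable(edges, sources):
    """Return all nodes reachable from the given source nodes."""
    seen, stack = set(), list(sources)
    while stack:
        x = stack.pop()
        if x in seen:
            continue
        seen.add(x)
        stack.extend(y for (a, y) in edges if a == x)
    return seen

def is_acyclic(edges):
    """Depth-first search for a directed cycle."""
    nodes = {n for e in edges for n in e}
    succ = {n: [y for (x, y) in edges if x == n] for n in nodes}
    state = {}

    def visit(n):
        if state.get(n) == "done":
            return True
        if state.get(n) == "open":   # back edge: a cycle was found
            return False
        state[n] = "open"
        ok = all(visit(m) for m in succ[n])
        state[n] = "done"
        return ok

    return all(visit(n) for n in nodes)

def acyclic_check(edges, sources, must_reach, query_edges=frozenset()):
    """Simplified test: use the whole graph as the subgraph A, then add
    the edges (t(a), Q) for components appearing in a query Q."""
    if not set(must_reach).issubset(reachable(edges, sources)):
        return False
    return is_acyclic(set(edges) | set(query_edges))

# Example (a): exists t1 (t1 in R and t(A) = t1(A)), F = {t(A)}.
edges_a = {("R", "t1"), ("t1", "t1(A)"), ("t1", "t1(B(C,D))"),
           ("t1(A)", "t(A)"), ("t", "t(A)")}
ok_a = acyclic_check(edges_a, {"R"}, {"t1(A)", "t1(B(C,D))", "t(A)"})

# Example (b): not (t in R), F = {t(A), t(B(C,D))}; no relation or query node.
edges_b = {("t", "t(A)"), ("t", "t(B(C,D))")}
ok_b = acyclic_check(edges_b, set(), {"t(A)", "t(B(C,D))"})

# Example (c): t2(B(C)) = Q and t1 in t2(B(C)), with Q = {t3 | t3 in t2(B(C))}.
edges_c = {("Q", "t2(B(C))"), ("t2(B(C))", "t1"),
           ("t1", "t1(C)"), ("t2", "t2(B(C))")}
ok_c = acyclic_check(edges_c, {"Q"}, {"t2(B(C))", "t1(C)"},
                     query_edges={("t2(B(C))", "Q")})
```

Only example (a) passes both conditions. Example (b) fails because nothing is reachable, and example (c) fails because the added edge from $t_2(B(C))$ back to the query closes a cycle, in line with the discussion of Figure 5.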
LEMMA 3. Let $g$ and $g'$ be formulas, and let $F$ and $F'$ be sets of components.

(A) If $(g, F)$ and $(g', F)$ are both constructive, then $(g \,\dot\vee\, g', F)$ is constructive.

(B) If $(g, F)$ and $(g', F')$ are both constructive and if $g$ and $g'$ have no common components or common variables, then $(g \,\dot\wedge\, g', F \cup F')$ is constructive.

(C) If $(g, F)$ and $(g', \emptyset)$ are both constructive, then $(g \,\dot\wedge\, \dot\neg g', F)$ is constructive.

(D) If $(g, F)$ is constructive and $F$ contains the set of all of the components of the variable $t$, then $(\dot\exists t[\lambda] g, F_t)$ is constructive, $F_t$ being the set of components of $F$ that are not components of $t$.
PROOF

(A) Trivial, since $(g \,\dot\vee\, g') = (g \vee g')$.

(B) Clearly, $C(h_{ij}, F \cup F') = C(h_i, F) \cup C(h'_j, F')$, where $h_{ij}$ denotes a component of $g \,\dot\wedge\, g'$. Now take $A(h_{ij}, F \cup F') = A(h_i, F) \cup A(h'_j, F')$; then $AU(h_{ij}, F \cup F') = AU(h_i, F) \cup AU(h'_j, F')$. Since $g$ and $g'$ have no common components or common variables (and, hence, clearly no common queries), and since $AU(h_i, F)$ and $AU(h'_j, F')$ are both acyclic, there could only be a cycle in $AU(h_{ij}, F \cup F')$ that contains a relation $R$. This is impossible since no arc ends in $R$. Hence, $(h_{ij}, F \cup F')$ is acyclic, and $(g \,\dot\wedge\, g', F \cup F')$ is constructive.

(C) Clearly, $(\dot\neg g', \emptyset)$ is constructive (see (2) of the definition of constructiveness), so by (B) we see that $(g \,\dot\wedge\, \dot\neg g', F)$ is constructive.

(D) We have to prove that $(\tilde{h}_i, F_t)$ is constructive. We can choose $A(\tilde{h}_i, F_t)$ equal to $A(h_i, F)$, which is an acyclic graph, and hence, $(\tilde{h}_i, F_t)$ is acyclic. Also, $AU(\tilde{h}_i, F_t) = AU(h_i, F)$. We know that $(d_i, \emptyset)$ is constructive, and it is clear that a query that occurs as a term in $\tilde{h}_i$ is constructive; therefore, $(\tilde{h}_i, F_t)$ is constructive. ❑

LEMMA 4. Let $g$ be a formula, and let $F$ be a set of components. If $(g, F)$ is constructive and $t_1(\alpha_1)$ and $t_2(\alpha_2)$ are two elements of $F$, then $((g \,\dot\wedge\, t_1(\alpha_1) = t_2(\alpha_2)), F)$ is constructive.

PROOF. Clearly, $(h_i \wedge t_1(\alpha_1) = t_2(\alpha_2))$ is in enf and, since $t_1(\alpha_1)$ and $t_2(\alpha_2)$ are elements of $F$, we can choose $A((h_i \wedge t_1(\alpha_1) = t_2(\alpha_2)), F) = A(h_i, F)$. This proves that $((h_i \wedge t_1(\alpha_1) = t_2(\alpha_2)), F)$ is constructive. ❑

LEMMA 5. Let $g$ be a formula, and let $F$ be a set of components. Let $(\dot\exists t[\lambda] g, F)$ be constructive.

(A) If $t_1(\alpha_1)$ does not appear in $\dot\exists t[\lambda] g$, then $(\dot\exists t[\lambda](g \,\dot\wedge\, t(\alpha) = t_1(\alpha_1)), F \cup \{ t_1(\alpha_1) \})$ is constructive.

(B) If $t(A(\lambda_1))$ is a component of $t$ and no component of $t_1$ or $t_1$ itself appears in $\dot\exists t[\lambda] g$, then $(\dot\exists t[\lambda](g \,\dot\wedge\, t_1 \in t(A(\lambda_1))), F \cup \{ t_1(\alpha) \mid \alpha \in \lambda_1 \})$ is constructive.

(C) If $Q$ is a constructive query and $t_1(\alpha)$ does not appear in $\dot\exists t[\lambda] g$, then $(\dot\exists t[\lambda](g \,\dot\wedge\, t_1(\alpha) = Q), F \cup \{ t_1(\alpha) \})$ is constructive.
PROOF
(A)
Let
be the ith component of ~t[hl(g We have to prove that { t,(al)}. that
it is acyclic.
We can choose
and ~ t[a)= tl(al)), (i,, F) is constructive. A(~,,
~) as A(3 t[ A]hi,
let ~ be F U We first show F) plus
the edge
tl(al)). Hence, AU(~i, ~) is AU(3 t[~lhi, F) plus the edge (t(a, It is easy to see that all of the components of the u variables, the tl(al)). t and the gle~ents of F and tl(al)are reachable in A(~,, ~). variable tl(al)does not appear in Furthermore, A U(hi, F) is acyclic because (t(a),
(B)
~ t[ A]g.
Clearly
remain
constructive.
(d,,
@)
is
constructive,
and
queries
appearing
in
~,
Let
be
the
ith
component
I~a G Al}. We { tl(ci)
have
of
~t[h](g to
show
ACM Transactions
~ t, e t( A(AJ)), that
(~,, ~)
and is
let
~ = F U
constructive.
We
on Database Systems, Vol. 17, No. 1, March 1992
.
80
J. Paredaens and D. Van Gucht
first
show that
it is acyclic.
the edge (t(A(hI)), A U(3 d h]hz, F) plus the of
components { tl(a)
We can choose
tl) and the edges (tl, the edges just
of the
I a e A ~} and
of
F
are
A U(%,, ~) is acyclic
because
Clearly,
constructive,
(d,,
@)
is
for all
mentioned.
variables,
u
A(%z, ~) as A(3 t[ h]h,,
tl(a))
the
reachable and
~) is
It is easy to see that
all of
in
t and A(Z,,
~).
queries
appearing
the
elements
Furthermore,
of t or t itself
no components
F) plus
AU(%,,
variable
a e Al.
app~ars in
in g.
h remain
constructive. Let
(c)
~,= be the
ith
component
We have acyc~c.
to show ye
variables, in A(~,, appear
in g. Clearly, constructive. 3-5
(d,,
~) as A(3 t[hlhi,
~ = F U { t,(a)}. show
that
it is
F) plus the edge (Q, tl(a)).
and queries
appearing
in ~,
Section into
3, where
a nested
we discussed
calculus
query,
allow
translation
of a
us to establish
expression
of the nested
algebra
can be translated
into
a
query. z Let
nae
by induction
Induction
be an expression on the number
hypothesis.
Every
n (n > O) operators
Basis
let
We first
@) is constructive,
with
expression
Every
6.
ROOF.
than
= Q), and
result:
constructive
lemma
A(~,,
L t,(a)
is constructive.
❑
together
algebra
LEMMA
F)
the variable tl and the elements of F and tl( a) are reachable A U(Z,, ~) is acyclic because tl(ct) does not ~). Furthermore,
the following
be
can choose
remain Lemmas
(h,,
and the edges ( x, Q), F) is AU(3 t[hlh,, F) plus the edge (Q, tl(a)) x appears in Q. It is easy to see that the components of the u
All(h,, where
nested
of_qt[~](g
that
= Q)
3t[~](hLAt1(a)
step.
into
the
expression
query
into
nae = R (with {
nested in
of the
can be translated
(n = O). Then
translated
of the of operators
t[h]Ite R},
algebra.
nested
algebra
a constructive
scheme
which
We
prove
the
nae.
(h)).
This
is clearly
with
fewer
query. expression
can
a constructive
query. Induction Binary
step. operators.
Suppose
nae has
n operators
(n > O).
Then nae = nael
U naez,
nae = nael
– naez,
nae = nael
x nae2.
or or
2 Although
it is not necessary for our results, we conjecture that the converse of this lemma is also true. A possible way to prove this would be to compare constructive queries with the calculus queries introduced in [37] and with the strictly safe calculus queries introduced in [1].
Since $\mathit{nae}_1$ and $\mathit{nae}_2$ have fewer than $n$ operators, they can be translated into the constructive queries $\{t_1[\lambda_1] \mid f_1(t_1)\}$ and $\{t_2[\lambda_2] \mid f_2(t_2)\}$, respectively.

(i) $\mathit{nae} = \mathit{nae}_1 \cup \mathit{nae}_2$. $\mathit{nae}$ can be translated into $\{t_1[\lambda_1] \mid (f_1(t_1) \lor f_2(t_1))\}$. This query is constructive because of Lemma 3(C).

(ii) $\mathit{nae} = \mathit{nae}_1 - \mathit{nae}_2$. $\mathit{nae}$ can be translated into $\{t_1[\lambda_1] \mid (f_1(t_1) \land \lnot f_2(t_1))\}$. This query is constructive because of Lemma 3(A).

(iii) $\mathit{nae} = \mathit{nae}_1 \times \mathit{nae}_2$. Notice that in this case $\lambda_1$ and $\lambda_2$ have no common identifiers. $\mathit{nae}$ can be translated into
$$\{t[\lambda_1, \lambda_2] \mid (\exists t_1[\lambda_1](f_1(t_1) \land \bigwedge_{\alpha \in \lambda_1} t(\alpha) = t_1(\alpha)) \land \exists t_2[\lambda_2](f_2(t_2) \land \bigwedge_{\alpha \in \lambda_2} t(\alpha) = t_2(\alpha)))\}.$$
This query is constructive because of Lemmas 3(D), 5(A), and 3(B).

Unary operators.
Then nae = II ,I,,,,,,(nael),
or nae -
u, =J ( nael),
or ~ae = W;A(n%)) or nae = fl~,,)(nael), or nae = @~( nael), or nae = p(nael). Since nael constructive (i)
nae - II,,,,,,
This (ii)
has fewer than n operators, A ~1I fl( tl)}. query { tl[
query
(nael).
because
(iii)
query
is constructive
nae = v~,, ~(nael), {t[h’,
with
t(A(hg)) = t,(~)
3(D)
the
and 5(A),
3(B).
nae can be translated
‘~~~lt(a)
= {tz[~2]/sts[~](f(t3)
~aEA2t2(@)
into
= h(~,))}
of Lemma
Al = Al, h2. Then
A(h2)]lstl[~](~(tl)
translated
into
A h(%) because
be
into
of Lemmas
nae can be translated {~l[~l]l(fl(h)
This
can
nae can be translated
is constructive
nae = O,=J(nael).
it
= tl(a)
into
L
~ue~’t[a)
= ~3(~))})}. on Database Systems, Vol. 17, No. 1, March 1992.
This query fact that
is constructive
is a constructive (iv)
query
nae = ~~(~l)(nael), {[)+,
(which
with )i2]
(v)
query
ncze = p(rzael,
4.3
because
The
A, B).
Between
of Lemmas
=
3(D),
and the
and 5(A)). into
b(cY))). 5(A),
and 5(B).
into
of Lemmas
renaming
Logically
query
te {d N I f(u)}
form
3(D)
5(C),
~aexl(a)
of
a
3(C) and 5(C). constructive
Equivalent
query
is
a
transformation
rules
Formulas that
ness and that will allow us to “flatten” constructive rules are considered: (1) The query elimination (Tl) the
5(A),
❑
query.
several
because
Lemmas
‘a,x2t(cY)
rme can be translated
Transformations
We now define
from
3(D),
h’. nae can be translated
~ t2et,(Aoi1))
is constructive
constructive
follows
13t,[h1]3t2[h2] (f,(t,) —
is constructive
nae = 0~( nael).
This (vi)
query
of Lemmas
Al = A(hl),
= t,(a) This
because
by the
equivalent
f(t)
preserve
constructive-
calculus queries. Four replaces subformulas of
subformulas.
(2) The query propagation (T2) substitutes, if $t(\alpha) = Q$, every occurrence of $t(\alpha)$ by $Q$. Applying the query propagation repeatedly results in a query in which each structured component occurs at most once.
(3) The structured component elimination (T3) then allows the removal of these components. After applying the structured component elimination, the query no longer contains structured components. The only remaining reason for a query to be nonflat is the occurrence of subformulas of the form $Q_1 = Q_2$ (where $Q_1$ and $Q_2$ are subqueries).
(4) The query equality elimination (T4) can then be used to transform these subformulas.
Let $g$ be a formula, let $F$ be a set of components, and let $(g, F)$ be constructive,
(Tl) query
Query
with
membership
membership
g’=(hlv
~=3u1[AJ
elimination.
elimination
. ..vh._lvh;
””” 3un[~n]
rule vh,+lv”
((cl
A”””
Let substitutes
CJ = t1e{t2[A211 g by g’, with
““vh~), AcJ_lAcj+l
A””
.Ac1A7~)A~(tl)).
f(t2)}.
The
Converting Nested (T2)
Query
Q - {tz[~zl propagation
where
propagation.
I ~(~2)},
rule
CL ( p #j)
(1) Delete
the term
the attribute
(3) If the scheme
LEMMA
7.
Cj = Q = tl(ci)where
of
A(h,,
~).
The
query
with
every
occurrence
the
formula
h,,
from
the scheme
empty,
of u~.
remove
elimination.
Query
structured
3 Un[ 1.
h’,, and let
equality
membership
and
hi.
A(h)
equality
Consider
term in which u~( A( h)) occurs. The substitutes g by g’ in three steps:
c, from
formula
The query
query
only rule
elimination.
of Vn becomes
the resulting
f2(t2)}.
or
edge
is CP(d, respectively)
component
(2) Delete
Query
= Q
Expressions
by Q.
Structured
(T4)
Cj = tl(u)
(d’, respectively)
suppose that Cj is the component elimination
Call
Let
(Q, h(~)) be an g by g’, with
let
substitutes
of tl(a)replaced (T3)
and
Algebra
Let
elimination
substitutes
propagation,
query
I fl(tl)}
= {tz[~zl
I
g by g’, with
structured and
elimination,
c, = {tl[yl]
component equality
elimination,
elimination
preserve
constructiveness.

PROOF. In this proof we use the notation introduced in the transformation rules (T1), (T2), (T3), and (T4).

(T1) Query membership elimination. To show that query membership elimination preserves constructiveness, we first need to establish a property of constructibility graphs. Therefore, let $F$ be a set of components, and assume that $(h_i, F)$ is constructive; then $C(h_i, F)$ contains a subgraph $A'(h_i, F)$

- that satisfies the reachability and acyclicity conditions mentioned in the definition of constructibility graphs; and
- for which, for each component of the $u$ variables and each element of $F$, there exists a unique query or relation from which the component or element can be reached in $A'(h_i, F)$.
To see that this is so, consider $C(h_i, F)$, and let $x(\alpha)$ be a component of either a $u$ variable or an element of $F$. Since $(h_i, F)$ is constructive, there exists a subgraph $A(h_i, F)$ of $C(h_i, F)$ that satisfies the reachability and acyclicity conditions mentioned in the definition of the constructibility graph. Suppose there exist two different nodes $Q_1$, $Q_2$ (each of which is either a query or a relation) in $A(h_i, F)$ from which $x(\alpha)$ is reachable (see Figure 6). Let $y$ be the node where the paths from $Q_1$ and $Q_2$ to $x(\alpha)$ join for the first time, and let $(z, y)$ be the edge of the path from $Q_2$ to $x(\alpha)$ pointing to $y$. Consider the subgraph of $A(h_i, F)$ where the edge $(z, y)$ is removed. It should be clear that this subgraph still satisfies the reachability and acyclicity conditions mentioned in the definition of constructibility graphs and, furthermore, that in this graph $x(\alpha)$ can be reached via one path less than it could in $A(h_i, F)$. Repeated removal of such edges eventually results in a subgraph $A'(h_i, F)$ with the desired properties. (This establishes a result about constructibility graphs that will be necessary in the rest of the proof about the preservation of constructiveness by query membership elimination.)

Let $Q = \{t_2[\lambda_2] \mid f(t_2)\}$ (remember that $c_j = t_1 \in \{t_2[\lambda_2] \mid f(t_2)\}$ in (T1) and, hence,
that
Cj = tl ~ Q, let f(t2)
= (p,(t2)v
““”
vpm(t2)v
““’
Vpr(t,)),
and let
Then
h;=
Let
him
j aul[~l] m=l
be the
constructive.
rrzth
Consider
““ . 3un[An]3wm1[AmJ
disjunct h, and
“ - “ %Jm.m]
to prove that (h~m, l’) of h;. We have p~( tz). We know that (h,, F) and ( Pn(ta),
(where G = { tz( 13)I tz(i3) e Az } ) A( hi, F) and A( pn( tz), G). There
are constructive. are two cases:
Consider
the
is
G)
graphs
(1) $(Q, t_1)$ is an edge of $A'(h_i, F)$ (Figure 7). Consider what happens in this situation when we "join" the graphs $A'(h_i, F)$ and $A(p_m(t_2), G)$ by applying the query membership elimination rule. Figure 8 shows the resulting graph $\mathit{AU}(h_{im}', F)$. Clearly, $\mathit{AU}(h_{im}', F)$ is $A(h_{im}', F)$ plus the edges of the form $(z(\gamma), P)$ if $z(\gamma)$ occurs in the query $P$. (It is important to observe that $t_2$ is replaced by $t_1$ in these graphs and that $Q$ has disappeared.)

Fig. 6. $x(\alpha)$ is reachable from both $Q_1$ and $Q_2$.

Fig. 7. (a) Graph $A'(h_i, F)$; (b) graph $A(p_m(t_2), G)$.

We have to show the reachability of the components of the $u$ variables, the $w$ variables, and the elements of $F$ in $A(h_{im}', F)$, and the acyclicity of $\mathit{AU}(h_{im}', F)$. Let $z(\gamma)$ be a component of a $u$ variable or of an element of $F$. We consider two cases:

(a) $z(\gamma)$ is not reachable from $t_1(\alpha_v)$. Then the reachability follows from the reachability of $z(\gamma)$ in $A(h_i, F)$.
(b) $z(\gamma)$ is reachable from $t_1(\alpha_v)$. Then the reachability follows from the reachability of $t_2(\beta_v)$ in $A(p_m(t_2), G)$.

Let $z(\gamma)$ be a component of a $w$ variable. The reachability of $z(\gamma)$ follows from the reachability of $z(\gamma)$ in $A(p_m(t_2), G)$.
Next we have to show that $\mathit{AU}(h_{im}', F)$ is acyclic. To show this, we first identify two parts of the graph $\mathit{AU}(h_{im}', F)$, called $L$ and $R$, respectively. $L$ is the subgraph of $\mathit{AU}(h_{im}', F)$ corresponding to $\mathit{A'U}(h_i, F)$, and $R$ is the subgraph of $\mathit{AU}(h_{im}', F)$ corresponding to $\mathit{AU}(p_m(t_2), G)$ (see Figure 8). It should be clear that only edges of the form $(z(\gamma), P)$, where $z(\gamma)$ occurs in the query $P$, might create cycles. We consider two cases:

Fig. 8. Graph $\mathit{AU}(h_{im}', F)$.

(a) $z(\gamma)$ is not a component of $t_1$. We consider two cases that may give rise to cyclicity:

(i) $z(\gamma)$ occurs in $L$ and appears in $P$ in $R$. Suppose that there is a path in $\mathit{AU}(h_{im}', F)$ giving rise to a cycle. Notice that such a path necessarily has to go through a component of $t_1$. Now, since $z(\gamma)$ occurs in $P$, it also occurred in $Q$ before the query membership elimination rule was applied. It is easily seen that this would have given rise to a cycle in $\mathit{A'U}(h_i, F)$, a contradiction.

(ii) $z(\gamma)$ occurs in $R$ and appears in $P$ in $L$. Notice that the only paths from $L$ to $R$ originate in $t_1$ or in its components. Because of the special characteristics of $A'(h_i, F)$, there is no path from $P$ to $t_1$ or to any components of $t_1$; hence, the edge $(z(\gamma), P)$ cannot create a cycle in $\mathit{AU}(h_{im}', F)$.

(b) $z(\gamma)$ is a component of $t_1$, say, $t_1(\alpha_v)$. There are two cases:

(i) Suppose that $P$ is in $L$. Then the edge $(t_1(\alpha_v), P)$ was added to $A(h_{im}', F)$. This edge cannot give rise to a cycle, since in $A'(h_i, F)$ there is no path from $P$ to $t_1(\alpha_v)$ because of the acyclicity of $\mathit{A'U}(h_i, F)$.

(ii) Suppose that $P$ is in $R$. Then the edge $(t_1(\alpha_v), P)$ was added to $A(h_{im}', F)$. There are two cases to consider:

(ii.1) $t_1(\alpha_v)$ occurred in $P$ before the query membership elimination rule was applied. In this case, $t_1(\alpha_v)$ occurred in $Q$, giving rise to a cycle in $\mathit{A'U}(h_i, F)$, a contradiction.

(ii.2) $t_2(\beta_v)$ occurred in $P$ before the query membership elimination rule was applied. In this case, the acyclicity of $\mathit{AU}(p_m(t_2), G)$ guarantees that there can be no cycle in $\mathit{AU}(h_{im}', F)$.

We have thus shown that, when $(Q, t_1)$ is an edge of $A'(h_i, F)$, $(h_{im}', F)$ is acyclic. We now turn to the other case.

(2) $(Q, t_1)$ does not appear in $A'(h_i, F)$. Hence, in $A'(h_i, F)$ no edges leave $Q$, so no reachability of variable components results as a consequence of the existence of $Q$; in particular, the reachability of the components of $t_1$ results from other queries or relations (see Figure 9).

Fig. 9. Graphs $A'(h_i, F)$ and $A(p_m(t_2), G)$.
For $A(h_{im}', F)$ we propose using the graph shown in Figure 10. Clearly, $\mathit{AU}(h_{im}', F)$ is $A(h_{im}', F)$ plus the edges $(z(\gamma), P)$ if $z(\gamma)$ appears in the query $P$. Again, we introduce the notation $L$ and $R$. It should be clear that the reachability of the components of $t_1$ and the components of the $u$ variables follows from their reachability in $A'(h_i, F)$. The reachability of the components of the $w$ variables follows from their reachability in $A(p_m(t_2), G)$. The acyclicity of $\mathit{AU}(h_{im}', F)$ follows because there are no paths from $L$ to $R$ and vice versa (see Figure 10). Hence, $(h_{im}', F)$ is acyclic in this case also. Furthermore, since $(h_{im}', F)$ is constructive, it follows from Lemma 3(C) that $(d \lor e_j(t_1), \emptyset)$ is constructive. Hence, $(h_i', F)$ and $(g', F)$ are constructive.
(T2) Query propagation. We have to prove that $(h_i', F)$ is constructive. First note that $t_1(\alpha)$ (remember that $t_1(\alpha) = Q$ or $Q = t_1(\alpha)$ in (T2)) can occur within a subquery $P$ of $h_i$. In $P$, $t_1(\alpha)$ is certainly a free variable component. This implies that at the query $P$, when we consider its associated constructibility graphs, we have that no edge leaves $t_1(\alpha)$ (see conditions (1) and (2) in the edge part of the definition of constructibility graphs) and, therefore, that the substitution of $t_1(\alpha)$ by $Q$ cannot lead to problems of cyclicity in the subquery $P$. The reachability of the components of the $u$ variables and the elements of $F$ in $A(h_i', F)$ follows from the reachability of these components and elements in $A(h_i, F)$. The only interesting observation to make is that, when $x(\beta)$ is reachable from a query $P$ and on the path from $P$ to $x(\beta)$ the node $t_1(\alpha)$ occurs, then in $A(h_i', F)$ the reachability of $x(\beta)$ will result from the fact that there is a path from $Q$ to $x(\beta)$ in $A(h_i, F)$.

Fig. 10. Graph $A(h_{im}', F)$.

We now turn to the acyclicity of $(h_i', F)$. We first remark that the query $Q$ contains no occurrence of $t_1(\alpha)$; otherwise, there would be a cycle in $\mathit{AU}(h_i, F)$. We consider two cases in which the substitution of $t_1(\alpha)$ may potentially give rise to cyclicity:

(a) Let $x(\beta)$ be reachable from a query $P$ in $\mathit{AU}(h_i, F)$, such that $t_1(\alpha)$ occurs on the path from $P$ to $x(\beta)$. After substituting $t_1(\alpha)$ by $Q$, there is a path from $Q$ to $x(\beta)$. Suppose that $x(\beta)$ occurs in $Q$. But then there would be a cycle in $\mathit{AU}(h_i, F)$, namely, the path consisting of the edge going from $Q$ to $t_1(\alpha)$, the path going from $t_1(\alpha)$ to $x(\beta)$, and the edge going from $x(\beta)$ to $Q$, a contradiction.

(b) Let $x(\beta)$ be reachable from a query $P$ in $\mathit{AU}(h_i, F)$, and suppose that $P$ contains an occurrence of $t_1(\alpha)$. Hence, after the substitution, $P$ is transformed into a query $P'$ in which $t_1(\alpha)$ is replaced by $Q$. Suppose that $Q$ contains an occurrence of $x(\beta)$. This would introduce a cycle in $\mathit{AU}(h_i', F)$. This cannot happen, however, because there would be a cycle in $\mathit{AU}(h_i, F)$, namely, the path consisting of the edge going from $Q$ to $t_1(\alpha)$, the edge going from $t_1(\alpha)$ to $P$, the path going from $P$ to $x(\beta)$, and the edge going from $x(\beta)$ to $Q$.

Hence, $(h_i', F)$ is acyclic. Also, it can easily be seen that the substitution of $Q$ for $t_1(\alpha)$ in $d$ does not affect the constructiveness of $(d', \emptyset)$. Finally, constructive subqueries are not affected by the substitution of $Q$ for $t_1(\alpha)$. Hence, $(h_i', F)$ is constructive, and therefore, so is $(g', F)$.

(T3) Structured component elimination. This is obvious since we only eliminate terms, attributes, and, eventually, quantifiers.
(T4) Query equality elimination. We have to show that $(h_i', F)$ is constructive. We first show that $(h_i', F)$ is acyclic. This follows because we can choose for $A(h_i', F)$ the graph $A(h_i, F)$ without the nodes corresponding to the queries $\{t_1[\gamma_1] \mid f_1(t_1)\}$ and $\{t_2[\gamma_2] \mid f_2(t_2)\}$. Clearly, constructive subqueries are preserved by the query equality elimination rule. We only need to show that [...] is constructive. This follows from Lemmas 3(C), 3(D), and 3(A). ❑

LEMMA 8. A constructive query in the flat calculus is safe.

PROOF. According to Ullman [44], a query $\{t[\lambda] \mid f\}$ of the flat calculus is safe iff

(1) whenever $t$ satisfies $f$, each component of $t$ occurs somewhere in a relation of the database; and
(2) whenever $\exists u[\gamma] f_1$ occurs in the query, if $f_1$ is satisfied by $u$ for any values of the other free variables in $f_1$, then each component of $u$ occurs somewhere in a relation of the database.

Let $\{t[\lambda] \mid f\}$ be a constructive query in the flat calculus. Then $(f, F)$ is constructive, where $F = \{t(\alpha) \mid \alpha \in \lambda\}$. In particular,

(1) all of the elements of $F$, hence, all of the components of $t$, are reachable in $(f, F)$, which implies item (1) above, since $t$ only has atomic attribute components; and
(2) in a subformula $\exists u[\gamma] f_1$ all the components of $u$ are reachable in $(f_1, F_1)$, $F_1 = \{u(\alpha) \mid \alpha \in \gamma\}$, which implies item (2) above, since all of the components of the $u$ variables are atomic attribute components. ❑

4.4 Translate Algorithm
We now present an algorithm to construct a query in the flat algebra that is logically equivalent to a formula of the nested algebra with flat operands and a flat result.

Algorithm translate

Input. An ff-expression of the nested algebra.

Output. An expression of the flat algebra.

Method
(1) Transform the given ff-expression into a constructive calculus query, using Lemma 6.
(2) Repeat steps (2.1) and (2.2) until no longer possible.
(2.1) Repeat query membership elimination until no longer possible.
(2.2) Repeat query propagation immediately followed by structured component elimination until no longer possible.
(3) Repeat query equality elimination until no longer possible.
(4) Apply Codd's theorem for translating a safe calculus query into an equivalent flat algebra expression.

We illustrate step (2) of the algorithm with an example. Consider the following constructive query:
$$\{t[A] \mid \exists t_1[B(C)]\, \exists t_2[D(E(F))]\, (t \in t_1(B(C)) \land t_1 \in t_2(D(E(F))) \land t_2(D(E(F))) = \{t_3[G(H)] \mid t_3(G(H)) = \{t_4[I] \mid t_4 \in R\}\})\}.$$
Step (2.1) is not applicable, so we apply step (2.2). This yields the query
$$\{t[A] \mid \exists t_1[B(C)]\, (t \in t_1(B(C)) \land t_1 \in \{t_3[G(H)] \mid t_3(G(H)) = \{t_4[I] \mid t_4 \in R\}\})\}.$$
Now we can apply step (2.1), and we get the query
$$\{t[A] \mid \exists t_1[B(C)]\, (t \in t_1(B(C)) \land t_1(B(C)) = \{t_4[I] \mid t_4 \in R\})\}.$$
Now we apply step (2.2), and we get
$$\{t[A] \mid t \in \{t_4[I] \mid t_4 \in R\}\}.$$
We finally apply step (2.1) to get the query
$$\{t[A] \mid t \in R\}.$$
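The rewriting in this example can be replayed mechanically. The following Python sketch is only an illustration under strong simplifying assumptions, not the paper's algorithm: schemes are omitted, every tuple variable has at most one set-valued component, query bodies are plain conjunctions of atoms, and rules are applied at the top level only. The names `Comp`, `Query`, `subst`, `membership_elimination`, `propagate_and_eliminate`, and `flatten` are hypothetical, and the driver simply prefers step (2.1) whenever it applies, which happens to reproduce the order of steps used above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Comp:
    """The single set-valued component of a tuple variable, e.g. t1(B(C))."""
    var: str


@dataclass(frozen=True)
class Query:
    """{var | EXISTS exists ... (atom AND atom AND ...)}; schemes are left out."""
    var: str
    exists: tuple
    atoms: tuple


# An atom is ("in", tuple_var, Comp | Query | relation_name)
# or ("eq", Comp, Query).

def subst(x, old, new):
    """Replace `old` by `new` everywhere in a term, an atom, or a query body."""
    if x == old:
        return new
    if isinstance(x, Comp) and x.var == old:
        return Comp(new)  # renaming a variable also renames its component
    if isinstance(x, Query):
        return Query(x.var, x.exists, tuple(subst(a, old, new) for a in x.atoms))
    if isinstance(x, tuple):
        return tuple(subst(a, old, new) for a in x)
    return x


def membership_elimination(q):
    """(T1), simplified: replace `x in {v | body}` by body with v renamed to x."""
    for i, atom in enumerate(q.atoms):
        if atom[0] == "in" and isinstance(atom[2], Query):
            inner = atom[2]
            body = tuple(subst(a, inner.var, atom[1]) for a in inner.atoms)
            return Query(q.var, q.exists + inner.exists,
                         q.atoms[:i] + body + q.atoms[i + 1:])
    return None


def propagate_and_eliminate(q):
    """(T2) then (T3), simplified: use `u(...) = Q` to replace u(...) by Q,
    drop the equality, and drop the quantifier of u (its scheme is now empty)."""
    for i, atom in enumerate(q.atoms):
        if atom[0] == "eq" and atom[1].var in q.exists:
            comp, sub = atom[1], atom[2]
            rest = tuple(subst(a, comp, sub) for a in q.atoms[:i] + q.atoms[i + 1:])
            exists = tuple(v for v in q.exists if v != comp.var)
            return Query(q.var, exists, rest)
    return None


def flatten(q):
    """Apply the two simplified rules until neither is applicable."""
    while True:
        nxt = membership_elimination(q) or propagate_and_eliminate(q)
        if nxt is None:
            return q
        q = nxt


# The constructive query of the worked example (schemes omitted).
Q4 = Query("t4", (), (("in", "t4", "R"),))
Q3 = Query("t3", (), (("eq", Comp("t3"), Q4),))
q0 = Query("t", ("t1", "t2"), (
    ("in", "t", Comp("t1")),
    ("in", "t1", Comp("t2")),
    ("eq", Comp("t2"), Q3),
))

result = flatten(q0)
print(result)
```

Under these assumptions the final value of `result` encodes the flat query in which the only remaining atom states that `t` is in `R`, matching the last query of the example. Disjunctions, multi-attribute schemes, and query equality elimination are deliberately left out of the sketch.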
LEMMA 9. Algorithm Translate, taking as input an ff-expression of the nested algebra, stops and produces an expression of the flat algebra that is equivalent to the given ff-expression.

PROOF. First we prove that the algorithm is correct, if it ends. Then we prove that it ends. Lemma 7 indicates how steps (2) and (3) have to be performed. The result is a constructive query. Since every structured component of every variable that occurs in the query is reachable, no structured component can be left after step (2). Nor are any formulas of the form $t(\alpha) \in Q$, $t(\alpha) = Q$ or $Q = t(\alpha)$ left. Since all of the formulas of the form $Q_1 = Q_2$ are eliminated by step (3), and since no new query is created by this step, all of the queries are eliminated after step (3). Hence, the result after step (3) is a constructive query in the flat calculus. This query is safe by Lemma 8 and can be transformed into an equivalent expression in the flat algebra.

In order to prove that the algorithm ends, we call $n_q$ the number of queries and $n_s$ the number of occurrences of structured components in the query. Suppose that we apply step (2) each time on subqueries without occurrences of structured components. Each time step (2.1) or step (2.2) is executed, the value $n_q + n_s$ becomes smaller, since every $(h_i, F)$ is acyclic. Each time step (3) is applied, $n_q$ becomes smaller. ❑
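The termination measure $n_q + n_s$ can be checked by hand on the worked example that illustrates step (2). The Python sketch below is an illustration only: the nested-tuple encoding, the tags `"query"`, `"comp"`, `"in"`, and `"eq"`, and the function name `measure` are assumptions made for this example, quantifiers and schemes are omitted, and the outermost query is counted as a query as well.

```python
def measure(node):
    """Return (n_q, n_s): the number of query nodes and of structured-component
    occurrences in a formula encoded as nested tuples."""
    if not isinstance(node, tuple):
        return (0, 0)  # variable or relation name
    n_q = 1 if node[0] == "query" else 0
    n_s = 1 if node[0] == "comp" else 0
    for child in node[1:]:
        c_q, c_s = measure(child)
        n_q += c_q
        n_s += c_s
    return (n_q, n_s)


# Hand-encoded queries of the worked example, in the order they are produced.
Q4 = ("query", "t4", ("in", "t4", "R"))
Q3 = ("query", "t3", ("eq", ("comp", "t3"), Q4))
steps = [
    ("query", "t", ("in", "t", ("comp", "t1")),
                   ("in", "t1", ("comp", "t2")),
                   ("eq", ("comp", "t2"), Q3)),
    ("query", "t", ("in", "t", ("comp", "t1")), ("in", "t1", Q3)),
    ("query", "t", ("in", "t", ("comp", "t1")), ("eq", ("comp", "t1"), Q4)),
    ("query", "t", ("in", "t", Q4)),
    ("query", "t", ("in", "t", "R")),
]

totals = [sum(measure(q)) for q in steps]
print(totals)
```

With this encoding the totals decrease strictly from one query to the next, which is the behaviour the termination argument relies on; the sketch checks one example and is not a proof.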
We are now able to formulate our main result:

THEOREM. For every ff-expression in the nested algebra, there is an equivalent expression in the flat algebra.

COROLLARY. The transitive closure is not expressible in the nested algebra.
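To make the corollary more concrete, the following Python sketch contrasts a fixpoint computation of the transitive closure with a fixed, bounded number of compositions. It is only an informal illustration and not part of the proof: the function names `compose`, `transitive_closure`, and `bounded_paths` are hypothetical, and `bounded_paths` models just one particular family of expressions (unions of at most `k` successive joins), whereas the corollary covers every expression of the nested algebra.

```python
def compose(r, s):
    """Relational composition: pairs (a, d) with (a, b) in r and (b, d) in s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}


def transitive_closure(r):
    """Iterate to a fixpoint; the number of rounds depends on the data."""
    closure = set(r)
    while True:
        extended = closure | compose(closure, r)
        if extended == closure:
            return closure
        closure = extended


def bounded_paths(r, k):
    """Union of r and its first k compositions with itself: paths of length
    at most k + 1, computed with a number of joins fixed in advance."""
    result, power = set(r), set(r)
    for _ in range(k):
        power = compose(power, r)
        result |= power
    return result


chain = {(1, 2), (2, 3), (3, 4), (4, 5)}
closure = transitive_closure(chain)
print(sorted(closure))
print((1, 5) in bounded_paths(chain, 2))
```

On this chain the pair (1, 5) needs a path of length four, so it is in the closure but is missed when only two compositions are allowed; a longer chain defeats any bound chosen in advance.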
5. CONCLUSION

Our main result states that the flat algebra is as expressive as the nested algebra operating on a flat database and with a flat relation as a result. A simple consequence is, therefore, that recursive queries, such as the transitive closure, are not expressible in the nested algebra. This result clearly separates the nested algebra and the powerset algebra since it is well known that recursive queries are indeed expressible in the powerset algebra. This again suggests that the nested algebra is a conservative extension of the flat algebra, whereas the powerset algebra is not.

Although we did not consider selection operations involving constants (either atomic or structured) or selection operations involving subset conditions, it is straightforward to generalize our results to include these options. This generalization notably expands the range of practical queries that are affected by our results.

We believe that our results can contribute to (flat) relational query optimization. This will clearly depend on how efficiently flat-to-flat nested relational queries can be processed in relational database implementations.

Related to this issue, it would be interesting to obtain a complete algebraic proof of our main result, that is, a proof without having to introduce an intermediate calculus.

Finally, algebraic query languages have also been introduced for object-oriented databases [42]. It would be interesting to investigate if and how our translation strategies and results generalize to such databases.
ACKNOWLEDGMENTS

The authors are grateful to the referees for their many excellent comments and suggestions.
REFERENCES

1. ABITEBOUL, S., AND BEERI, C. On the power of languages for the manipulation of complex objects. INRIA Tech. Rep. 846, 1988.
2. ABITEBOUL, S., AND BIDOIT, N. Non first normal form relations: An algebra allowing data restructuring. J. Comput. Syst. Sci. 33, 3 (Dec. 1986), 361-393.
3. AHO, A. V., AND ULLMAN, J. D. Universality of data retrieval languages. In Proceedings of the 6th ACM Symposium on Principles of Programming Languages (Jan. 1979, San Antonio, Tex.). ACM, New York, 1979, pp. 110-117.
4. ARISAWA, H., MORIYA, K., AND MIURA, T. Operations and the properties on non-first-normal-form relational databases. In Proceedings of the 9th VLDB (Florence). 1983, pp. 197-205.
5. BANCILHON, F. A logic-programming/object-oriented cocktail. ACM SIGMOD Rec. 15, 3 (Sept. 1986), 11-20.
6. BANCILHON, F., AND KHOSHAFIAN, S. A calculus for complex objects. In Proceedings of the 5th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Mar. 1986, Cambridge, Mass.). ACM, New York, 1986, pp. 53-59.
7. BANCILHON, F., RICHARD, P., AND SCHOLL, M. On line processing of compacted relations. In Proceedings of the 8th VLDB (Sept. 1982, Mexico City). VLDB Endowment, 1982, pp. 263-269.
8. BEERI, C., AND KORNATZKY, Y. The many faces of query monotonicity. In Proceedings of Advances in Database Technology–EDBT '90 (Mar. 1990, Venice). Lecture Notes in Computer Science, vol. 416. Springer-Verlag, New York, 1990, pp. 120-135.
9. BEERI, C., NAQVI, S., RAMAKRISHNAN, R., SHMUELI, O., AND TSUR, S. Sets and negation in a logic database language (LDL1). In Proceedings of the 6th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (San Diego, Calif.). ACM, New York, 1987, pp. 21-37.
10. BIDOIT, N. The Verso algebra and how to answer queries without joins. J. Comput. Syst. Sci. 35, 3 (Dec. 1987), 321-364.
11. CODD, E. F. Relational completeness of database sublanguages. In Database Systems, R. Rustin, Ed. Prentice-Hall, Englewood Cliffs, N.J., 1972, pp. 65-98.
12. COLBY, L. A recursive algebra and query optimization for nested relations. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 1–June 2, 1989, Portland, Ore.). ACM, New York, 1989, pp. 273-283.
13. DADAM, P. Advanced information management (AIM): Research in extended nested relations. IEEE Data Eng. 11, 3 (Dec. 1988), 4-14.
14. DADAM, P., KUESPERT, K., ANDERSEN, F., BLANKEN, H., ERBE, R., GUENAUER, J., LUM, V., PISTOR, P., AND WALCH, G. A DBMS prototype to support extended NF2 relations: An integrated view on flat tables and hierarchies. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 1986, Austin, Tex.). ACM, New York, 1986, pp. 356-366.
15. DEPPISCH, U., PAUL, H. B., AND SCHEK, H. J. A storage system for complex objects. In Proceedings of the International Workshop on Object-Oriented Database Systems (Pacific Grove, Calif.). 1986, pp. 183-195.
16. DESHPANDE, A., AND VAN GUCHT, D. A storage structure for unnormalized relations. In Proceedings of the GI Conference on Database Systems for Office Automation, Engineering and Scientific Applications (Apr. 1987, Darmstadt, Germany). Inf. Fachberichte 136 (1987), 481-486.
17. DESHPANDE, A., AND VAN GUCHT, D. An implementation for nested relational databases. In Proceedings of the 14th VLDB (Los Angeles, Calif.). 1988, pp. 76-88.
18. DESHPANDE, V., AND LARSON, P. A. An algebra for nested relations. Res. Rep. CS-87-65, Dept. of Computer Science, Univ. of Waterloo, Ontario, 1987.
19. GYSSENS, M., AND VAN GUCHT, D. The powerset algebra as a result of adding programming constructs to the nested relational algebra. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (June 1988, Chicago, Ill.). ACM, New York, 1988, pp. 225-232.
20. GYSSENS, M., AND VAN GUCHT, D. The powerset algebra as a natural tool to handle nested database relations. J. Comput. Syst. Sci. To be published.
21. GYSSENS, M., PAREDAENS, J., AND VAN GUCHT, D. A uniform approach towards handling atomic and structured information in the nested relational database. J. ACM 36, 4 (Oct. 1989), 790-825.
22. HOUBEN, G., AND PAREDAENS, J. The R2-algebra: An extension of an algebra for nested relations. Tech. Rep., Computer Science Dept., Technical Univ., Eindhoven, The Netherlands, 1987.
23. HULL, R., AND SU, J. On the expressive power of database queries with intermediate types. In Proceedings of the 7th ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Austin, Tex.). ACM, New York, 1988, pp. 39-51.
24. JAESCHKE, G., AND SCHEK, H. J. Remarks on the algebra on non first normal form relations. In Proceedings of the 1st ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (Los Angeles, Calif.). ACM, New York, 1982, pp. 124-138.
25. KUPER, G. M. Logic programming with sets. J. Comput. Syst. Sci. 41, 1 (Aug. 1990), 44-64.
26. KUPER, G. M., AND VARDI, M. On the complexity of queries in the logical data model. In Proceedings of the 2nd International Conference on Database Theory (Bruges, Belgium). Lecture Notes in Computer Science, vol. 326. Springer-Verlag, New York, 1988, pp. 267-280.
27. LARSON, P. A. The data model and query language of Laurel. IEEE Data Eng. 11, 3 (Sept. 1988), 23-30.
28. LINNEMANN, V. Non first normal form relations and recursive queries: An SQL-based approach. In Proceedings of the 3rd IEEE International Conference on Data Engineering (Los Angeles, Calif.). IEEE, New York, 1987, pp. 591-598.
29. MAKINOUCHI, A. A consideration of normal form of not-necessarily-normalized relations in the relational data model. In Proceedings of the 3rd VLDB (Oct. 1977, Tokyo). VLDB Endowment, 1977, pp. 447-453.
30. NAQVI, S., AND TSUR, S. A Logical Language for Data and Knowledge Databases. Freeman, San Francisco, Calif., 1989.
31. OZSOYOGLU, G., OZSOYOGLU, Z. M., AND MATOS, V. Extending relational algebra and relational calculus with set-valued attributes and aggregate functions. ACM Trans. Database Syst. 12, 4 (Dec. 1987), 566-593.
32. PAREDAENS, J., DE BRA, P., GYSSENS, M., AND VAN GUCHT, D. The Structure of the Relational Model. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, New York, 1989.
33. PAUL, H. B., SCHEK, H. J., SCHOLL, M. H., WEIKUM, G., AND DEPPISCH, U. Architecture and implementation of the Darmstadt database kernel system. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (May 1987, San Francisco, Calif.). ACM, New York, 1987, pp. 196-207.
34. PISTOR, P., AND ANDERSEN, F. Designing a generalized NF2 model with an SQL-type language interface. In Proceedings of the 12th VLDB (Aug. 1986, Kyoto). VLDB Endowment, 1986, pp. 278-288.
35. PISTOR, P., AND TRAUNMUELLER, R. A database language for sets, lists and tables. Inf. Syst. 11, 4 (1986), 323-336.
36. ROTH, M. A., KORTH, H. F., AND BATORY, D. S. SQL/NF: A query language for ¬1NF relational databases. Inf. Syst. 12, 1 (1987), 99-114.
37. ROTH, M. A., KORTH, H. F., AND SILBERSCHATZ, A. Extended algebra and calculus for nested relational databases. ACM Trans. Database Syst. 13, 4 (Dec. 1988), 389-417.
38. SCHEK, H. J., AND SCHOLL, M. H. The relational model with relation-valued attributes. Inf. Syst. 11, 2 (1986), 137-147.
39. SCHEK, H. J., PAUL, H. B., SCHOLL, M. H., AND WEIKUM, G. The DASDBS project: Objectives, experiences, and future prospects. IEEE Trans. Knowledge Data Eng. 2, 1 (June 1990), 25-43.
40. SCHOLL, M. H. Theoretical foundation of algebraic optimization utilizing unnormalized relations. In Proceedings of the 1st International Conference on Database Theory (Aug. 1986, Rome). Lecture Notes in Computer Science, vol. 243. Springer-Verlag, New York, pp. 380-396.
41. SCHOLL, M. H., PAUL, H. B., AND SCHEK, H. J. Supporting flat relations by a nested relational kernel. In Proceedings of the 13th VLDB (Sept. 1987, Brighton, Great Britain). VLDB Endowment, 1987, pp. 137-146.
42. SHAW, G., AND ZDONIK, S. A query algebra for object-oriented databases. Tech. Rep. CS-89-19, Dept. of Computer Science, Brown Univ., Providence, R.I., 1989.
43. THOMAS, S. J., AND FISCHER, P. C. Nested relational structure. In Advances in Computing Research III, The Theory of Databases, P. C. Kanellakis, Ed. JAI Press, Greenwich, Conn., 1986, pp. 269-307.
44. ULLMAN, J. D. Principles of Database Systems. 2nd ed. Computer Science Press, Rockville, Md., 1982.

Received July 1989; revised February 1991; accepted March 1991