Incremental Computation of Nested Relational Query Expressions

LARS BAEKGAARD
Aalborg University
and
LEO MARK
Georgia Institute of Technology
Efficient algorithms for incrementally computing nested query expressions do not exist. Nested query expressions are query expressions in which selection/join predicates contain subqueries. To address this problem, we propose a two-step strategy for incrementally computing nested query expressions. In step (1), the query expression is transformed into an equivalent unnested flat query expression. In step (2), the flat query expression is incrementally computed. To support step (1), we have developed a very concise algebra-to-algebra transformation algorithm, and we have formally proved its correctness. The flat query expressions resulting from the transformation make intensive use of the relational set-difference operator. To support step (2), we present and analyze an efficient algorithm for incrementally computing set differences based on view pointer caches. When combined with existing incremental algorithms for SPJ queries, our incremental set-difference algorithm can be used to compute the unnested flat query expressions efficiently. It is important to notice that without our incremental set-difference algorithm the existing incremental algorithms for SPJ queries are useless for any query involving the set-difference operator, including queries that are not the result of unnesting nested queries.

Categories and Subject Descriptors: H.2.2 [Database Management]: Physical Design—access methods; H.2.3 [Database Management]: Languages—query languages; H.2.4 [Database Management]: Systems—query processing

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Incremental computation, nested query expressions, set differences, unnesting, view pointer caches
1. INTRODUCTION

Since the emergence of the relational model of data [Codd 1970], much research effort has been devoted to the problems of efficiently computing
Authors' addresses: L. Bækgaard, Department of Mathematics and Computer Science, Aalborg University, Fr. Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark; L. Mark, College of Computing, Georgia Institute of Technology, Atlanta, GA 30322-0280. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1995 ACM 0362-5915/95/0600-0111 $03.50
ACM Transactions on Database Systems, Vol. 20, No. 2, June 1995, Pages 111-148.
relational query expressions [Blakeley and Martin 1990; Jarke 1985; Negri and Pelagatti 1991; Omiecinski 1989; Ozsu and Meechen 1990; Sacco 1986; Scholl et al. 1987; Sellis 1986; Ullman 1989; Valduriez 1987].

A query can be computed by means of recomputation, or it can be computed incrementally. The basic idea of recomputation [Jarke and Koch 1984; Smith and Chang 1975; Ullman 1989] is to compute a query expression from scratch each time it is referenced. The basic idea of incremental computation [Blakeley et al. 1986; Blakeley and Martin 1990; Hanson 1987; Lindsay et al. 1986; Qian and Wiederhold 1991; Roussopoulos 1991] is to store the result of old queries in persistent caches and to reuse them.

We are not aware of any incremental algorithms for SQL-like query expressions of the following form:

    SELECT ... FROM ... WHERE
        (SELECT ... FROM ... WHERE ...)
        CONTAINS
        (SELECT ... FROM ... WHERE ...)
The most likely explanation is that it is very difficult, if not impossible, to define simple and efficient incremental update rules for such queries. In general, it is very hard to compute such queries efficiently at all. The simplest solution would be to use a nested-iteration strategy, but it is a well-known fact that this can be very expensive. Indices may be utilized in special cases, but we are not aware of any efficient methods that are generally applicable.
Nested query expressions are very useful in many situations, but because of the lack of efficient computation methods, most commercial database management systems do not support the CONTAINS operator. In response to this problem, we suggest a two-step strategy for the incremental computation of nested query expressions. In step (1) all nesting operators, like IN and CONTAINS, are removed by transforming a nested query expression to an equivalent flat expression. In step (2) the flat expressions are computed by conventional recomputation methods or by incremental methods.

With respect to step (1), we present an unnesting algorithm that transforms nested relational algebra expressions into equivalent flat relational algebra expressions. The resulting expressions are based on a combination of selections, projections, joins, and set differences. Other transformation algorithms exist, but we have invented a very simple and concise notation based on algebra-to-algebra transformations. Our novel notation has made it possible to formulate the transformation in a clear and readable way, and it has made it possible for us to construct a simple and convincing correctness proof for the transformation algorithm. In order to facilitate the correctness proof, we have expressed our unnesting algorithm in terms of algebra-to-algebra transformations. In order to make the results directly applicable to SQL, we have extended the relational algebra with simple nesting constructs that resemble the structure of SQL. No existing nested algebra has this desirable characteristic. We have expressed most of our examples in an SQL-like notation in order to make the paper more readable.

Our unnesting algorithm can transform tree queries with arbitrarily deep nesting. Specifically, it transforms nested comparison selections with selection predicates of the form "(value) = (query)," set-membership selections with selection predicates of the form "(value) IN (query)," and set-inclusion selections with selection predicates of the form "(query) CONTAINS (query)." Subqueries and outer queries can be mutually related. Our algorithm does not transform queries with aggregate functions.

With respect to step (2), we present and analyze an efficient incremental set-difference algorithm. Existing incremental algorithms can be used for the computation of SPJ (selection–project–join) queries [Roussopoulos 1991]. However, transformed CONTAINS queries make intensive use of the set-difference operator. We are not aware of any incremental set-difference algorithms. Our incremental set-difference algorithm uses view pointer caches to store the result of old queries as pointers to the qualifying tuples [Roussopoulos 1991]. It is a necessary component in the incremental computation of any query expression involving the set-difference operator, whether generated by our unnesting algorithm or not. Without it, other incremental algorithms for SPJ queries cannot be used. When combined with existing incremental algorithms, our algorithms can be used to compute nested queries efficiently and incrementally. Our cost model computations strongly indicate that incremental query computation is superior to recomputation in many situations.

In Section 2 we discuss related work. In Section 3 we present a nested relational algebra. In Section 4 we present three categories of nested query expressions. In Section 5 we present an unnesting algorithm that is based on the categories from Section 4, and we formally prove its correctness. In Section 6 we discuss the ideas and assumptions of incremental query computation, and we summarize the state of the art. In Section 7 we describe the data structures, view pointer caches, that we use as the basis for incremental computation. In Section 8 we present an incremental set-difference algorithm. In Section 9 we analyze the I/O efficiency of our incremental set-difference algorithm and compare it to the I/O efficiency of a sort–merge algorithm. In Section 10 we conclude the paper.

2. RELATED WORK
Much work on query transformation has aimed at producing semantically equivalent versions of a given query expression that can be computed more efficiently than the original expression [Nakano 1995; Ullman 1989]. Recently, a considerable amount of work has been done on operators that cannot be expressed as SPJ queries [Bækgaard 1993; Bültzingsloewen 1987; Ceri and Gottlob 1985; Dayal 1987; Ganski and Wong 1987; Kim 1982; Muralikrishna 1989].

Kim's [1982] algorithm transforms nested SQL-like queries into equivalent flat SQL queries. The set of source queries includes set-membership queries using the IN operator, set-inclusion queries using the CONTAINS operator, and various aggregate functions. Kim was motivated by the observation that the nested-iteration method used in systems like System R [Astrahan et al. 1979] is inefficient in most cases. By transforming the nested queries to equivalent join queries, Kim enabled the query optimizer to use the most appropriate join computation method.

Ganski and Wong's [1987] algorithm represents a solution to a bug in Kim's algorithm that is caused by the possibility of duplicate rows in SQL. Furthermore, their algorithm extends the set of source queries that can be transformed. For example, it is able to unnest queries containing the EXISTS operator.

Ceri and Gottlob's [1985] algorithm transforms nested SQL-like queries into queries formulated in relational algebra extended with aggregate functions. Whereas Kim and Ganski and Wong focused on efficient query computation, Ceri and Gottlob used their algorithm to define the semantics of SQL in terms of relational algebra. Furthermore, they argued that by transforming into algebra rather than into SQL they facilitate identification of syntactically different but semantically equivalent queries.

Bültzingsloewen's [1987] two-step algorithm transforms nested SQL queries, first into relational calculus and then into algebra. In the first step, the queries are transformed into relational calculus enhanced with a notation for aggregate functions. The primary purpose of this step is to define the semantics of SQL queries. In the second step, the calculus queries are transformed into relational algebra enhanced with a notation for aggregate functions. The primary purpose of this step is to facilitate efficient query computation. As opposed to Kim, Ganski and Wong, and Ceri and Gottlob, Bültzingsloewen showed how to transform queries containing the GROUP-BY/HAVING construct.

Muralikrishna [1989] focused on data-flow-based evaluation of tree-queries. A query is a tree-query if there is more than one subquery at the same level. Kim and Ceri and Gottlob focused mostly on linear queries, in which there is at most one subquery at each level.

Dayal [1987] studied the transformation of nested SQL queries into a variant of the relational algebra that handles the problem of duplicate rows. By doing this he combined a study of some problems that are inherently related to SQL with the advantages of using relational algebra as the target language.

Our unnesting approach differs from the previous work in two ways. First, transformations are applied to nested algebra queries, and the transformed queries are algebra queries as well. This has made it possible for us to use a very concise and precise notation for the various transformations. Second, we have formally proved the correctness of our algorithm. Conceptually, our transformation rules are similar to the ones proposed by Kim [1982] and by Ceri and Gottlob [1985]. Our major contributions are the use of a concise and readable algebra-to-algebra notation and the proof of correctness.

A number of nested algebras have been proposed for nested relations [Ozsoyoglu et al. 1987; Paredaens and Van Gucht 1992; Roth et al. 1988; Schek and Scholl 1986; Scholl et al. 1987]. The nesting in these algebras is due to the occurrence of the operators NEST, which transforms a set of relations into a nested relation, and UNNEST, which removes nesting from a nested relation. Colby [1989] designed a recursive algebra that can do the same processing as the nested algebras without the need for the operators NEST and UNNEST. Gyssens and Van Gucht [1988] suggested using a power set algebra as a means of querying nested relations without the need for the operators NEST and UNNEST.

Among the relational algebra operators, most research attention has been paid to SPJ queries. It has been assumed that end-user queries tend to be dominated by a combination of these operators. The SQL statement SELECT ... FROM ... WHERE ... directly reflects an SPJ query. Very little attention has been paid to the set difference and set union operators.

The efficient computation of relational set differences has not received much research attention. Smith and Chang [1975] developed a set of recomputation algorithms for the computation of set differences. We have used their sort–merge algorithm as a point of reference and comparison for the analysis of the efficiency of our incremental set-difference algorithm. Without our incremental set-difference algorithm, other incremental algorithms cannot be used in queries involving the set-difference operator.

Blakeley et al. [1986] developed an algorithm for change-set computation and incremental computation of SPJ queries. Qian and Wiederhold [1991] developed an algorithm that computes change sets for any query defined as a combination of selections, projections, multiplications, set unions, and set differences. Jensen et al. [1991] studied SPJ queries in the context of transaction-time databases and added the notion of decremental computation of time slices, that is, query results as of some time in the past. None of these approaches provides cost models or cost analysis.

Incremental computation in rule-based systems has been studied by a number of researchers [Rosenthal et al. 1989; Wolfson et al. 1991; Ceri and Widom 1991; Hanson et al. 1990; Stonebraker et al. 1990; Carey et al. 1990; Hanson 1992].

Roussopoulos [1991] studied SPJ queries in detail and provided cost models and cost analysis. He described efficient data structures for view pointer caches and algorithms for their computation and materialization. He used simulations and cost computations to compare the cost of incremental computation to the cost of computation by recomputation. He concluded that his incremental algorithms outperform recomputation algorithms in many situations.
3. NESTED RELATIONAL QUERY EXPRESSIONS

In this section we present a nested relational algebra for nested queries on 1NF relations, that is, queries with nested comparison predicates, set-membership predicates, and set-inclusion predicates. The structure of our nested algebra resembles the structure of SQL directly. This makes it very easy to use our algorithms to process SQL-like nested queries on 1NF relations. A number of nested algebras have been proposed for queries on nested relations [Schek and Scholl 1986; Scholl et al. 1987; Ozsoyoglu et al. 1987].
Table I. Relational Operators

Operator               Definition                                              Comment
---------------------  ------------------------------------------------------  ----------------------
σ[P]R                  {r | r ∈ R ∧ P(r)}                                      Selection
                                                                               R(a_1:d_1, ..., a_n:d_n)
Π[a_1, ..., a_i]R      {(u_1, ..., u_i) | (u_1, ..., u_n) ∈ R}                 Projection
                                                                               R(a_1:d_1, ..., a_n:d_n)
R × S                  {(u_1, ..., u_m) | (u_1, ..., u_n) ∈ R ∧                Cartesian product
                        (u_{n+1}, ..., u_m) ∈ S}                               R(a_1:d_1, ..., a_n:d_n)
                                                                               S(a_{n+1}:d_{n+1}, ..., a_m:d_m)
R ×[P] S               σ[P](R × S)                                             Join
                                                                               R(a_1:d_1, ..., a_n:d_n)
                                                                               S(a_{n+1}:d_{n+1}, ..., a_m:d_m)
R ⋈ S                  R ×[R.a_1 = S.a_1 ∧ ... ∧ R.a_n = S.a_n] S,             Natural join
                        projected so that each common attribute
                        a_1, ..., a_n occurs once
R \ S                  {t | t ∈ R ∧ t ∉ S}                                     Difference
                                                                               R(a_1:d_1, ..., a_n:d_n)
                                                                               S(a_1:d_1, ..., a_n:d_n)
R \[a_1, ..., a_i] S   σ[¬((R.a_1, ..., R.a_i) ∈ Π[a_1, ..., a_i]S)]R          Generalized difference
R ∪ S                  {t | t ∈ R ∨ t ∈ S}                                     Union
                                                                               R(a_1:d_1, ..., a_n:d_n)
                                                                               S(a_1:d_1, ..., a_n:d_n)
R ∩ S                  {t | t ∈ R ∧ t ∈ S}                                     Intersection
                                                                               R(a_1:d_1, ..., a_n:d_n)
                                                                               S(a_1:d_1, ..., a_n:d_n)
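As a reading aid for the operators of Table I, the following is a minimal Python sketch, not part of the original paper. It assumes that a relation is modeled as a Python set of rows and that each row is a tuple of (attribute, value) pairs; the helper names `row`, `select`, `project`, `times`, `join`, and `gen_difference` are hypothetical. Union, intersection, and difference map directly onto Python's `|`, `&`, and `-` on sets.

```python
from itertools import product


def row(**attrs):
    """Build a hashable row: (attribute, value) pairs sorted by attribute name."""
    return tuple(sorted(attrs.items()))


def select(pred, rel):
    """Selection: keep the rows that satisfy the predicate (given a dict view)."""
    return {r for r in rel if pred(dict(r))}


def project(attrs, rel):
    """Projection: keep only the listed attributes; duplicate rows collapse."""
    return {tuple((a, v) for a, v in r if a in attrs) for r in rel}


def times(r_rel, s_rel):
    """Cartesian product; assumes the operands have disjoint attribute names."""
    return {tuple(sorted(r + s)) for r, s in product(r_rel, s_rel)}


def join(pred, r_rel, s_rel):
    """Join, defined as a selection applied to the Cartesian product."""
    return select(pred, times(r_rel, s_rel))


def gen_difference(attrs, r_rel, s_rel):
    """Generalized difference: rows of R whose values on attrs do not occur in S."""
    seen = project(attrs, s_rel)
    return {r for r in r_rel
            if tuple((a, v) for a, v in r if a in attrs) not in seen}


# Hypothetical two-book instance, used only for illustration.
books = {row(Title="A", NoOfLoans=60), row(Title="B", NoOfLoans=10)}
loans = {row(Name="x", Title="A")}
never_lent = gen_difference({"Title"}, books, loans)  # books with no loan row
```

Modeling relations as sets gives duplicate elimination for free, which matches the set semantics of the algebra rather than the bag semantics of SQL.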
The structure of the nested algebras cited above resembles the structure of nested relations. Dadashzadeh [1989] presented an improved division operator for the relational algebra. Ozsoyoglu and Wang [1989] presented a relational calculus with set operators. Both of these approaches represent query facilities that resemble our nested algebra, but they are not easily translated into SQL.

The relational operators are defined in Table I. Single-value, single-attribute queries can be treated as expressions, and vice versa. Two queries are equivalent, R ≡ S, if and only if (iff) they return the same set of tuples whenever the same base relations are substituted for common base relation names in the two queries.

A query expression, R, can be time sliced, that is, it can be computed with a time index by means of the notation R[t], where t is an expression that evaluates to a time point value. The effect is to compute R on the state of the underlying database as of time t. We use the notation Π[S]R to symbolize the projection of R onto the attributes of S.

Our examples will be based on the following relational schemata for a small simplified library database. The underlined attributes are (composite)
keys:

    Borrowers (Name, Address, Type, NoOfBooks)
    Books (Title, Area, Author, NoOfLoans)
    Loans (Name, Title)
    Reservations (Name, Title)
    Authors (Name, Title, Area)

The attributes Books.NoOfLoans and Borrowers.NoOfBooks contain the total number of times each book has been borrowed, and the total numbers of books borrowed by each borrower, respectively. The attributes Loans.Name and Reservations.Name are foreign keys pointing to Borrowers. The attributes Loans.Title and Reservations.Title are foreign keys pointing to Books.

4. THREE CATEGORIES OF NESTED QUERY EXPRESSIONS

We study the unnesting of three categories of nested query expressions. Each category is subdivided into four types, and they are used to structure the unnesting algorithm that we present in Section 5.

A nested comparison selection has one of the following forms, where Op = {=, ≠, <, ≤, >, ≥}:

    σ[E Op T]R,    σ[S Op T]R.

E is an expression of type String or Integer. S and T are single-tuple, single-attribute relational expressions.

A set-membership selection has one of the following forms:

    σ[E ∈ T]R,    σ[S ∈ T]R.

E is an expression of type String or Integer. S is a single-tuple, single-attribute relational expression. T is a single-attribute relational expression.

A set-inclusion selection has one of the following forms, where R, S, and T are relational expressions:

    σ[T = S]R,    σ[T ⊇ S]R,    σ[T ⊃ S]R.

All other queries are flat queries. R, S, and T are called the outer block, the left-hand-side inner block (the contained set), and the right-hand-side inner block (the containing set), respectively.

The level of a predicate (or inner block) is equal to the depth of the surrounding query expression down to the predicate (or inner block). The depth of a query expression is defined recursively as follows: The depth of a flat query expression is 1. If a flat query expression is inserted into a level k predicate of a depth k query expression, the result is a depth k + 1 query
expression.

The term free reference is defined as follows: An attribute reference refers to the deepest higher level relation in its scope. If no such relation exists, then the reference is a free reference. Multiple occurrences of a relation name, R, in a query expression can be distinguished by means of quotes: R', R'', etc. Only subquery expressions can contain free references.

An inner block is selection tuple dependent iff it contains a free reference. Otherwise, it is selection tuple independent. The dependence/independence of a subquery expression of an inner block is defined similarly. A selection predicate can be selection tuple dependent even if it contains no inner blocks. This is the case for the query σ[NoOfLoans > 25]Books.

A set-membership query expression can be formulated as a set-inclusion query expression, σ[E ∈ T]R ≡ σ[T ⊇ E]R, since we can interpret E as a single-attribute, single-tuple relation. We have explicitly included the set-membership queries for three reasons: First, many queries are most naturally formulated as set-membership queries. Second, a set-membership query expression contains the extra information to the expression evaluator that the left-hand side must be a single tuple expression. Third, the unnested query expression resulting when the unnesting algorithm is applied to a set-membership query expression can be computed more efficiently than the result of an equivalent set-inclusion query expression.

For each of the three categories of nested selections, there are four types of selections. The reason is that the left-hand-side inner block may or may not be selection tuple dependent and the right-hand-side inner block may or may not be selection tuple dependent. The corresponding 12 types of nested selections are described in Table II. C, M, and I denote nested comparison selections, set-membership selections, and set-inclusion selections, respectively.

The following Type C1 nested comparison selection extracts all books that have been borrowed more times than the book titled "Tomorrow":

    SELECT *
    FROM Books
    WHERE NoOfLoans >
        (SELECT NoOfLoans
         FROM Books
         WHERE Title = "Tomorrow")

The following Type C2 nested comparison selection extracts all loans where the borrowed book has been borrowed at most 50 times:

    SELECT *
    FROM Loans
    WHERE 50 ≥
        (SELECT NoOfLoans
         FROM Books
         WHERE Title = Loans.Title)

The following query is an equivalent Type C3 nested comparison selection:

    SELECT *
    FROM Loans
    WHERE
        (SELECT NoOfLoans
         FROM Books
         WHERE Title = Loans.Title)
        ≤ 50
[...] all borrowers that have [...] at least 50 times:

    [...]
        (SELECT Title
         FROM Books
         WHERE NoOfLoans > 50))
The following Type I3 set-inclusion selection extracts all borrowers whose currently borrowed books have all been borrowed at least 50 times:

    SELECT *
    FROM Borrowers
    WHERE
        (SELECT Title
         FROM Books
         WHERE NoOfLoans ≥ 50)
        CONTAINS
        (SELECT Title
         FROM Loans
         WHERE Name = Borrowers.Name)

The following Type I4 set-inclusion selection extracts all borrowers that have currently borrowed all of the books that they have currently reserved:

    SELECT *
    FROM Borrowers
    WHERE
        (SELECT Title
         FROM Loans
         WHERE Name = Borrowers.Name)
        CONTAINS
        (SELECT Title
         FROM Reservations
         WHERE Name = Borrowers.Name)
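To make the meaning of CONTAINS concrete, the two set-inclusion selections above can be sketched in Python on a toy library instance. This is only an illustration under stated assumptions: the data values are invented, relations are reduced to the attributes the two queries touch, and the helper `titles` is hypothetical. Each borrower is tested separately, and a borrower with an empty contained set qualifies trivially.

```python
# Hypothetical toy instance; only the attributes used by the two queries are kept.
borrowers = ["Ann", "Bob", "Cy"]                               # Borrowers.Name
books = {"Dune": 75, "Emma": 60, "Ulysses": 12}                # Title -> NoOfLoans
loans = {("Ann", "Dune"), ("Ann", "Emma"), ("Bob", "Ulysses")}  # (Name, Title)
reservations = {("Ann", "Dune"), ("Bob", "Dune"), ("Cy", "Emma")}


def titles(rel, name):
    """Inner block: SELECT Title FROM rel WHERE Name = <outer borrower name>."""
    return {title for n, title in rel if n == name}


# This inner block has no free reference, so it is computed once.
popular = {title for title, n in books.items() if n >= 50}

# Set-inclusion test: the containing set must be a superset of the contained set.
type_i3 = [b for b in borrowers if popular >= titles(loans, b)]
type_i4 = [b for b in borrowers if titles(loans, b) >= titles(reservations, b)]

print(type_i3)  # borrowers whose current loans are all popular books
print(type_i4)  # borrowers who hold every book they have reserved
```

In this instance Cy has no loans, so Cy passes the first test vacuously but fails the second because of the reservation for "Emma".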
5. AN UNNESTING ALGORITHM

In this section we present an algorithm that transforms nested relational algebra queries into equivalent relational algebra queries. The most obvious way to compute the Type I4 set-inclusion selection from Section 4 is simply to compute the subqueries

    SELECT Title
    FROM Reservations
    WHERE Name = Borrowers.Name

and

    SELECT Title
    FROM Loans
    WHERE Name = Borrowers.Name

for each tuple in Borrowers, and to select the tuple if the inclusion predicate is true. This computation method is used to compute nested queries in systems like System R [Astrahan et al. 1979], but as pointed out by Kim [1982], the cost of such a nested-iteration computation can be very high.

Alternatively, Borrowers could be sorted on Name, and Reservations and Loans could be sorted on Name and Title. Then, the three relations could be scanned in parallel, and the desired Borrowers tuples could be extracted. This is possible because both Reservations and Loans are related to Borrowers via Name, and in most cases it is considerably more efficient than nested iteration for the same reason that sort–merge joins usually are more efficient than nested-iteration joins [Ullman 1989].

It is, however, not clear which strategy should be used in the general case to compute arbitrary nested queries more efficiently than nested iteration. Moreover, it is not even clear if such a strategy exists at all. The complexity introduced by the depth of nesting and the possible lack of symmetry in relationships between inner blocks and outer blocks would probably make such a strategy very complex. Therefore, it seems to be more fruitful to transform nested query expressions to equivalent flat expressions and then to apply existing query computation techniques to the transformed expression [Codd 1990].

We formally prove that the algorithm always preserves equivalence. Our transformation rules are inspired by the work of Kim [1982] and by the work of Ceri and Gottlob [1985]. We have reformulated the rules from scratch because our notation differs substantially from the notation used by others. First, our notation is considerably more concise and readable than existing notations. Second, our notation has made it possible for us to construct a concise and convincing proof of the correctness of our algorithm.

Our unnesting algorithm is formulated in terms of query expression mappings, that is, functions that map a set of parameters to a single query expression. For example, q: (R, S, T) → (R × S) \ (R × T) is a query expression mapping. When a query expression mapping is applied, the actual parameters are inserted where the corresponding formal parameters occur in the right-hand side.

Only queries that contain predicates can contain nesting, and since joins are defined in terms of selections, (R ×[P] S) ≡ σ[P](R × S), we only need to describe unnesting of nested selections. The following three equivalences allow us to restrict unnesting to selections with atomic predicates:

    σ[P_1 ∧ P_2]R ≡ σ[P_1]R ∩ σ[P_2]R,    (1)
    σ[P_1 ∨ P_2]R ≡ σ[P_1]R ∪ σ[P_2]R,    (2)
    σ[¬P]R ≡ R \ σ[P]R.                   (3)

5.1 Set-Membership Predicates and Nested Comparison Predicates
We present a set of transformation rules that can be used to unnest selections with set-membership predicates or nested comparison predicates. The rules are inspired by Kim's [1982] unnesting rules and can be viewed as generalizations of these formulated in terms of relational algebra.

The following rule removes one level of nesting from a set-membership selection of the form σ[E ∈ Π[A]S]R, where A is a subset of S's attributes. We assume that S is replaced by q(S_1, ..., S_n), where {S_1, ..., S_n} is the set of base relations in S, and the query expression mapping q is constructed such that q(S_1, ..., S_n) ≡ S. Before the rule is applied, all projections in S are augmented with R's attributes, and all multiplications, S_i × S_j, in S are changed into Π[S_i, S_j, R](S_i × S_j):

    σ[E ∈ Π[A]q(S_1, ..., S_n)]R ≡ Π[R](σ[E = A]q(R × S_1, ..., R × S_n)).    (4)
Recall that an R tuple is selected iff the left-hand-side expression (or inner block) is a member of S. There is potentially one version of S per R tuple. The basic idea underlying (4) is to combine each R tuple with its induced right-hand-side tuples. We do this by multiplying R with all of the base relations in S, selecting all tuples in the resulting relation that satisfy E = A, and projecting the result onto R's attributes. The augmentation of projections in S with R's attributes is done to ensure that these attributes are not lost. The surrounding of Cartesian products with a projection onto the attributes of R and the attributes of the operands is done to remove R's attributes from one of the operands.

Let us illustrate how the following type M2 selection is transformed. We have constructed q, such that q(R) ≡ σ[Title = Loans.Title]R:

    σ[Name ∈ Π[Name]σ[Title = Loans.Title]Reservations]Loans
    → σ[Name ∈ Π[Name]q(Reservations)]Loans
    → Π[Loans]σ[Loans.Name = Reservations.Name]q(Loans × Reservations)
    → Π[Loans](Loans ⋈ Reservations).

Note that type M3 and M4 selections are transformed into nested comparison selections by rule (4).

The following slightly modified version of (4) can be used to remove nesting from comparison selections. We presume the same preprocessing with respect to projections as described above:

    σ[E Op Π[A]q(S_1, ..., S_n)]R ≡ Π[R](σ[E Op A]q(R × S_1, ..., R × S_n)).    (5)
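Rules (4) and (5) can be checked on small instances by comparing a tuple-by-tuple evaluation of the nested selection with the flat expression on the right-hand side. The following Python sketch does this for the type M2 example and for a type C2 selection with Op taken as "at most 50". It is an illustration only: the data are invented, relations are modeled as sets of plain tuples, and the function names are hypothetical.

```python
# Hypothetical instance. loans and reservations hold (Name, Title) pairs,
# books holds (Title, NoOfLoans) pairs.
loans = {("Ann", "Dune"), ("Bob", "Emma"), ("Cy", "Dune")}
reservations = {("Ann", "Dune"), ("Bob", "Dune"), ("Cy", "Emma")}
books = {("Dune", 75), ("Emma", 40)}


def nested_m2(loans, reservations):
    """Nested iteration: one inner block evaluation per Loans tuple."""
    result = set()
    for name, title in loans:
        inner = {r_name for r_name, r_title in reservations if r_title == title}
        if name in inner:                       # E is a member of the inner block
            result.add((name, title))
    return result


def flat_m2(loans, reservations):
    """Right-hand side of rule (4) for the same selection."""
    pairs = {(l, r) for l in loans for r in reservations}   # Loans x Reservations
    q = {(l, r) for l, r in pairs if r[1] == l[1]}          # q: Title = Loans.Title
    sel = {(l, r) for l, r in q if l[0] == r[0]}            # flat predicate E = A
    return {l for l, _ in sel}                              # projection onto Loans


def nested_c2(loans, books):
    """Nested iteration for: 50 >= (NoOfLoans of the borrowed book)."""
    result = set()
    for loan in loans:
        inner = [n for title, n in books if title == loan[1]]
        if len(inner) == 1 and 50 >= inner[0]:  # single-tuple inner block
            result.add(loan)
    return result


def flat_c2(loans, books):
    """Right-hand side of rule (5): Op is copied into the flat selection."""
    q = {(l, b) for l in loans for b in books if b[0] == l[1]}
    return {l for l, b in q if 50 >= b[1]}


print(nested_m2(loans, reservations) == flat_m2(loans, reservations))
print(nested_c2(loans, books) == flat_c2(loans, books))
```

The flat versions build the full product before filtering, which mirrors the algebraic form of the rules rather than an efficient join plan.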
(5) is identical to (4) except that in (4) the set-membership operator, ∈, is replaced by = in the flat selection, whereas in (5) the comparison operator Op is copied from the nested selection to the flat selection. Note that type C4 selections are transformed into type C3 selections by rule (5). Because rule (5) is not designed to transform type C3 selections, C3 selections must be changed into C2 selections before (5) is applied.

LEMMA 1. If q(S_1, ..., S_n) ≡ S, then q(R × S_1, ..., R × S_n) contains all R tuples combined with their induced S tuples.

PROOF. We prove Lemma 1 by induction on the structure of relational algebra queries.

Basis. If S is a base relation, then q(R × S) obviously contains all R tuples combined with induced S tuples.

Induction. In each of the following cases, we assume that the query expression mappings q_1 and q_2 have been successfully applied to T_1 and T_2, respectively: {T_11, ..., T_1i} is the set of base relations in T_1, and {T_21, ..., T_2j} is the set of base relations in T_2.

Projection. If S ≡ Π[A]T_1, then q(R × T_11, ..., R × T_1i) ≡ Π[A]q_1(R × T_11, ..., R × T_1i) contains all R tuples combined with their induced S tuples.

Selection. If S ≡ σ[P]T_1, then q(R × T_11, ..., R × T_1i) ≡ σ[P]q_1(R × T_11, ..., R × T_1i) contains all R tuples combined with their induced S tuples.

Cartesian product. If S ≡ T_1 × T_2, then q(R × T_11, ..., R × T_1i, R × T_21, ..., R × T_2j) ≡ q_1(R × T_11, ..., R × T_1i) × q_2(R × T_21, ..., R × T_2j) contains all R tuples combined with their induced S tuples.

Union. If S ≡ T_1 ∪ T_2, then q(R × T_11, ..., R × T_1i, R × T_21, ..., R × T_2j) ≡ q_1(R × T_11, ..., R × T_1i) ∪ q_2(R × T_21, ..., R × T_2j) contains all R tuples combined with their induced S tuples.

Set difference. If S ≡ T_1 \ T_2, then q(R × T_11, ..., R × T_1i, R × T_21, ..., R × T_2j) ≡ q_1(R × T_11, ..., R × T_1i) \ q_2(R × T_21, ..., R × T_2j) contains all R tuples combined with their induced S tuples.

Given the induction hypothesis, it is clear that the desired tuple set is always maintained. ❑

LEMMA 2. (4) transforms any set-membership selection, where the set-membership predicate contains no nested queries, into an equivalent flat relational algebra query expression.

PROOF. Lemma 1 states that q(R × S_1, ..., R × S_n) contains all R tuples combined with their induced S tuples. Therefore, the selection predicate E = A extracts exactly the combination of R tuples and induced S tuples that satisfy E ∈ Π[A]S. ❑

LEMMA 3. (5) transforms any nested comparison selection, where the comparison predicate contains no nested queries, into an equivalent flat relational algebra query.

PROOF. Similar to the proof for Lemma 2. ❑

5.2 Set-Inclusion Predicates

In this subsection we present a set of transformation rules that can be used to unnest selections with set-inclusion predicates. The rules are inspired by Ceri and Gottlob's [1985] unnesting rules and can be viewed as reformulations of these in terms of relational algebra.

We describe the unnesting of set-inclusion selections like σ[Π[A_2]T ⊇ Π[A_1]S]R, where A_1 is a subset of S's attributes and A_2 is a subset of T's attributes. We assume that S is replaced by q_1(S_1, ..., S_n). {S_1, ..., S_n} is the set of base relations in S. q_1 is constructed such that q_1(S_1, ..., S_n) ≡ S. Furthermore, we assume that T is replaced by q_2(T_1, ..., T_m). {T_1, ..., T_m} is the set of base relations in T. q_2 is constructed such that q_2(T_1, ..., T_m) ≡ T.
TransactIons
on Database
Systems,
Vol
20, No
2, June
1995
Nested Relational Query Expressions · 125

Each R tuple induces a set of tuples from the left-hand side of the selection predicate. Also, each R tuple induces a set of tuples from the right-hand side of the selection predicate. The basic idea in our approach is to compute two sets of R tuples, $R_{left}$ and $R_{right}$.

We define a relation, $R_S$, that combines each R tuple with all of the matching S tuples. Similarly, we define a relation, $R_T$, that combines each R tuple with all of the matching T tuples. Before $R_S$ is constructed, all multiplications, $S_i \times S_j$, in S are changed into $\Pi[S_i, S_j, R](S_i \times S_j)$, and all the projections in S are augmented with R's attributes, for the same reasons as discussed above. Similarly, T is changed before $R_T$ is constructed. $R_S$ contains all R tuples combined with their induced S tuples, and $R_T$ contains all R tuples combined with their induced T tuples:

$$R_S = \Pi[R, A_1]\, q_1(R \times S_1, \ldots, R \times S_n),$$
$$R_T = \Pi[R, A_2]\, q_2(R \times T_1, \ldots, R \times T_m).$$

We use $R_S$ and $R_T$ to define $R_{left}$ and $R_{right}$:

$$R_{left} = \Pi[R](R_S \setminus R_T),$$
$$R_{right} = \Pi[R](R_T \setminus R_S).$$
$R_{left}$ contains a given R tuple iff it induces at least one tuple in S without inducing the same tuple in T. $R_{right}$ contains a given R tuple iff it induces at least one tuple in T without inducing the same tuple in S.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a set-inclusion predicate:

$$\sigma[\Pi[A_2]T \supseteq \Pi[A_1]S]R \;\rightarrow\; R \setminus R_{left} \qquad (6)$$

(6) is based on the fact that the set of R tuples that does not belong to $R_{left}$ is equivalent to the set of R tuples whose induced left-hand-side tuples is a subset of the induced right-hand-side tuples.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a set equality predicate:

$$\sigma[\Pi[A_2]T = \Pi[A_1]S]R \;\rightarrow\; (R \setminus R_{left}) \setminus R_{right} \qquad (7)$$

(7) is based on the fact that the set of R tuples that belongs to neither $R_{left}$ nor $R_{right}$ is equivalent to the set of R tuples whose induced set of left-hand-side tuples is equivalent to the induced set of right-hand-side tuples.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a proper set-inclusion predicate:

$$\sigma[\Pi[A_2]T \supset \Pi[A_1]S]R \;\rightarrow\; R_{right} \setminus R_{left} \qquad (8)$$

(8) is based on the fact that the set of R tuples that belong to $R_{right}$ but do not belong to $R_{left}$ is equivalent to the set of R tuples whose induced set of left-hand-side tuples is a proper subset of the induced set of right-hand-side tuples.
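The three rules above can be sketched with ordinary set operations. The following is a minimal illustration, not the authors' implementation: relations are modeled as Python sets, the helper names `induced_pairs` and `unnest_set_inclusion` are hypothetical, and the sample Borrowers, Loans, and Books data are invented for the example.

```python
def induced_pairs(outer, inner, matches, value):
    """Pair every outer tuple with the values it induces in `inner` (an R_S / R_T relation)."""
    return {(r, value(x)) for r in outer for x in inner if matches(r, x)}


def unnest_set_inclusion(outer, r_s, r_t):
    """Evaluate rules (6)-(8) using set differences only."""
    r_left = {r for (r, _) in r_s - r_t}    # induces an S value that is not a T value
    r_right = {r for (r, _) in r_t - r_s}   # induces a T value that is not an S value
    return {
        "superset": outer - r_left,                # rule (6): T contains S
        "equal": (outer - r_left) - r_right,       # rule (7): T equals S
        "proper_superset": r_right - r_left,       # rule (8): T properly contains S
    }


# Invented sample data.
borrowers = {"ann", "bob", "cy"}
loans = {("ann", "A"), ("ann", "B"), ("bob", "A"), ("bob", "C")}  # (Name, Title)
books = {("A", 60), ("B", 70), ("C", 10)}                          # (Title, NoOfLoans)

# R_S: each borrower combined with the titles he or she has borrowed.
r_s = induced_pairs(borrowers, loans, lambda b, l: l[0] == b, lambda l: l[1])
# R_T: each borrower combined with the titles of books with more than 50 loans.
r_t = induced_pairs(borrowers, books, lambda b, k: k[1] > 50, lambda k: k[0])

flat = unnest_set_inclusion(borrowers, r_s, r_t)
```

With this data, `flat["superset"]` holds the borrowers whose borrowed titles are all among the heavily borrowed books, which is the kind of query the rules are meant to flatten.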
126 · L. Bækgaard and L. Mark

The following example illustrates the transformation of a nested query into an equivalent flat query expression. We have constructed $q_1$ and $q_2$ such that

$$q_1(R) = \sigma[Name = Borrowers.Name]R$$
$$q_2(R) = \sigma[NoOfLoans > 50]R$$

The transformation is as follows:

$$\sigma[\Pi[Title]\sigma[NoOfLoans > 50]Books \supseteq \Pi[Title]\sigma[Name = Borrowers.Name]Loans]Borrowers$$
$$\rightarrow\; \sigma[\Pi[Title]q_2(Books) \supseteq \Pi[Title]q_1(Loans)]Borrowers$$
$$\rightarrow\; Borrowers \setminus (\Pi[Borrowers]((\Pi[Borrowers, Title]q_1(Borrowers \times Loans)) \setminus (\Pi[Borrowers, Title]q_2(Borrowers \times Books))))$$
$$\rightarrow\; Borrowers \setminus (\Pi[Borrowers]((\Pi[Borrowers, Title](\sigma[Loans.Name = Borrowers.Name](Borrowers \times Loans))) \setminus (\Pi[Borrowers, Title](\sigma[Books.NoOfLoans > 50](Borrowers \times Books)))))$$

LEMMA 4. (a) $R_S$ contains all R tuples combined with their induced S tuples, and (b) $R_T$ contains all R tuples combined with their induced T tuples.

PROOF. Similar to the proof of Lemma 1. □
LEMMA 5. (a) $R_{left}$ contains all R tuples for which $T \supseteq S$ is false, and (b) $R_{right}$ contains all R tuples for which $S \supseteq T$ is false.

PROOF. Lemma 4 states that $R_S$ contains all R tuples combined with their induced S tuples and that $R_T$ contains all R tuples combined with their induced T tuples. Therefore, $\Pi[R](R_S \setminus R_T)$ contains all R tuples that induce at least one tuple in S without inducing the same tuple in T. This proves (a), and the proof for (b) is similar. □

LEMMA 6. (6) transforms any query expression of the form $\sigma[T \supseteq S]R$, where S and T contain no nested queries, into an equivalent flat relational algebra expression.

PROOF. Follows from Lemma 5. The set of R tuples that does not belong to $R_{left}$ is equivalent to the set of R tuples whose induced left-hand-side tuples is a subset of the induced right-hand-side tuples. □

LEMMA 7. (7) transforms any query expression of the form $\sigma[S = T]R$, where S and T contain no nested query expressions, into an equivalent flat relational algebra expression.

PROOF. (7) is derived from the equivalence $\sigma[S = T]R = \sigma[T \supseteq S]R \cap \sigma[S \supseteq T]R$. Lemma 6 states that $\sigma[S = T]R = (R \setminus R_{left}) \cap (R \setminus R_{right})$. But, then, $\sigma[S = T]R = (R \setminus R_{left}) \setminus R_{right}$. □

LEMMA 8. (8) transforms any query expression of the form $\sigma[T \supset S]R$, where S and T contain no nested queries, into an equivalent flat relational algebra expression.

PROOF. (8) is derived from the equivalence $\sigma[T \supset S]R = (\sigma[T \supseteq S]R) \setminus (\sigma[S = T]R)$. Lemmas 6 and 7 imply that $\sigma[T \supset S]R = (R \setminus R_{left}) \setminus ((R \setminus R_{left}) \setminus R_{right})$. But, then, $\sigma[T \supset S]R = R_{right} \setminus R_{left}$, since $R \supseteq R_{left}$ and $R \supseteq R_{right}$. □

5.3 The Unnesting Algorithm
Algorithm UNNEST uses the transformation rules (4)-(8) to unnest any nested relational algebra expression as defined in Section 2.

Algorithm UNNEST. Unnesting of a nested relational algebra selection, $\sigma[P]R$.

(1) Apply the transformation space reduction rules (1)-(3) to $\sigma[P]R$ until all level 1 predicates are atomic. Change joins into selections, projections, and multiplications.
(2) FOR each atomic, nested subselection on level 1, $\sigma[P_i]R$, DO
  (2.1) Change all level 1 multiplications, $S_1 \times S_2$, in P to $\Pi[S_1, S_2](S_1 \times S_2)$. Augment all level 1 projections in P with R's attributes.
  (2.2) Use the relevant unnesting rules (4)-(8) to unnest $\sigma[P_i]R$.
  (2.3) Apply UNNEST recursively to the partially transformed query expression.

THEOREM 1. Algorithm UNNEST transforms any nested relational algebra query expression into an equivalent flat relational algebra expression.

PROOF. We prove the correctness of algorithm UNNEST by induction on the depth of the nesting in the input expression.

Basis. The algorithm applies transformation rules (4)-(8) to remove the atomic occurrences of level 1 nesting. Lemmas 1-8 prove that this will work if there is no level 2 nesting. But the transformation rules will only affect level 1 base relations and predicates, and level 2 nesting will thus occur as level 1 nesting in the result. The free references are not affected by the transformation rules.

Induction. The algorithm simply unnests one level at a time, and assuming that k levels have been correctly unnested, the original level k + 1 will now be level 1 and will thus be unnested. The algorithm necessarily terminates because there are only a finite number of atomic nesting occurrences. □

5.4 Other Nesting Forms
We have described the unnesting of comparison queries, set-membership queries, and set-inclusion queries. These are, of course, not the only forms of nested queries. The equivalences below indicate how a number of nested queries can be unnested by means of the same method that is used in the algorithm UNNEST. RevOp means the reverse operator of Op, where the latter is in $\{=, <, >, \leq, \geq\}$. $S(s)\,Op\,T$ means that $S\,Op\,T$ is true for some element in S. $S\,Op(s)\,T$ means that $S\,Op\,T$ is true for some element in T. $S(a)\,Op\,T$ means that $S\,Op\,T$ is true for all elements in S. $S\,Op(a)\,T$ means that $S\,Op\,T$ is true for all elements in T. Such operators are inspired by SQL comparison operators like "> all," "< all," and "= some" [Korth 1991].

In the following equivalent queries, an R tuple is selected iff S and T have at least one common element:

$$\sigma[S\ overlaps\ T]R = \sigma[\neg(S \cap T = \emptyset)]R.$$

In the following equivalent queries, an R tuple is selected iff there exists at least one element in S:

$$\sigma[exists\ S]R = \sigma[\neg(S = \emptyset)]R.$$

In the following equivalent queries, an R tuple is selected iff the predicate is true for at least one right-hand-side element:

$$\sigma[E\ Op(s)\ \Pi[A]q(S_1, \ldots, S_n)]R = \Pi[R]\sigma[E\ Op\ A]q(R \times S_1, \ldots, R \times S_n).$$

In the following equivalent queries, an R tuple is selected iff the predicate is true for all right-hand-side elements:

$$\sigma[E\ Op(a)\ S]R = R \setminus \sigma[E\ RevOp(s)\ S]R.$$

In the following equivalent queries, an R tuple is selected iff there exists at least one combination of a left-hand-side and a right-hand-side element for which the predicate is true:

$$\sigma[\Pi[A_S]q_S(S_1, \ldots, S_n)\ (s)Op(s)\ \Pi[A_T]q_T(T_1, \ldots, T_m)]R = \Pi[R]\sigma[A_S\ Op\ A_T](q_S(R \times S_1, \ldots, R \times S_n) \times q_T(R \times T_1, \ldots, R \times T_m)).$$

In the following equivalent queries, an R tuple is selected iff the predicate is true for all combinations of left-hand-side and right-hand-side elements:

$$\sigma[S(a)\ Op(a)\ T]R = \sigma[NOT(S(s)\ RevOp(s)\ T)]R.$$

In the following equivalent queries, an R tuple is selected iff there exists at least one left-hand-side element for which the predicate is true when combined with any right-hand-side element:

$$\sigma[S(s)\ Op(a)\ T]R = (\sigma[S(s)\ Op(s)\ T]R) \setminus (\sigma[S(s)\ RevOp(s)\ T]R).$$

Notice that S and T here stand for $q_S(S_1, \ldots, S_n)$ and $q_T(T_1, \ldots, T_m)$.

In the following equivalent queries, an R tuple is selected iff the predicate is true for at least one left-hand-side element:

$$\sigma[S(s) = T]R = \sigma[S\ overlaps\ T]R.$$

In the following equivalent queries, an R tuple is selected iff the predicate is true for all left-hand-side elements:

$$\sigma[S(a) \in T]R = \sigma[T \supseteq S]R.$$
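As a small illustration of the "all" equivalence $\sigma[E\ Op(a)\ S]R = R \setminus \sigma[E\ RevOp(s)\ S]R$, the sketch below evaluates a universally quantified comparison through an existential one and a set difference. It assumes that RevOp is the comparison that holds exactly when Op fails; the function names, the `REV_OP` table, and the sample data are hypothetical.

```python
import operator

# Assumed reading of RevOp: the comparison that is true exactly when Op is false.
REV_OP = {
    operator.eq: operator.ne, operator.ne: operator.eq,
    operator.lt: operator.ge, operator.ge: operator.lt,
    operator.gt: operator.le, operator.le: operator.gt,
}


def select_some(outer, expr, op, subquery):
    """sigma[E Op(s) S]R: keep r if the comparison holds for some induced element."""
    return {r for r in outer if any(op(expr(r), v) for v in subquery(r))}


def select_all(outer, expr, op, subquery):
    """sigma[E Op(a) S]R, computed as R minus sigma[E RevOp(s) S]R."""
    return outer - select_some(outer, expr, REV_OP[op], subquery)


# Invented data: S holds (key, value) pairs; an R value r induces the values with key r.
s = {(1, 0), (1, 3), (5, 2), (5, 4), (9, 10)}
r = {1, 5, 7, 9}


def induced(key):
    return {v for (k, v) in s if k == key}


# R values that are greater than all of their induced S values.
result = select_all(r, lambda x: x, operator.gt, induced)
```

An R tuple that induces no S elements survives the difference, which matches the usual vacuous-truth reading of "for all".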
6. INCREMENTAL QUERY COMPUTATION

So far we have described the unnesting of nested relational algebra expressions. In the remaining part of the paper, we focus on incremental computation of unnested queries. We can, therefore, focus on queries where selection/join predicates are simple comparison predicates that contain no subqueries. Consequently, we can utilize the existing algorithms for incrementally computing SPJ queries [Roussopoulos 1991]. As demonstrated in Section 5, nested queries can be reformulated in terms of flat selections, projections, flat joins, and set differences. In this and the following sections, we show how to efficiently compute relational set differences incrementally by means of view pointer caches. When combined with Roussopoulos's incremental SPJ algorithms, our incremental set-difference algorithm makes it possible to compute unnested queries incrementally.

A query expression can be computed by means of recomputation, or it can be computed incrementally. The basic idea of recomputation is to construct a query expression from scratch each time [Jarke and Koch 1984; Smith and Chang 1975; Ullman 1989]. Access paths may be reused, but the results of old queries are not utilized. The basic idea of incremental computation is to store the results of old queries in persistent caches and to reuse them when similar queries are computed [Blakeley et al. 1986; Qian and Wiederhold 1991; Roussopoulos 1991]. Instead of evaluating everything from scratch again, the intermediate changes are used to modify the cached query results. There must be some sort of repeated query pattern in order for the cached query results to be useful. Usually, it is assumed that a given query expression is computed repeatedly or that a set of query expressions shares one or more common subexpressions. When a query expression is used in a view definition, it must be computed each time a reference is made to the view.

When queries are formulated in terms of selections, projections, joins, set differences, and set unions, it is possible to identify fairly simple update rules for old query results [Blakeley et al. 1986; Bækgaard 1993; Qian and Wiederhold 1991; Roussopoulos 1991]. Incremental computation is based on the notion
of change sets as defined in Definition 1:

Definition 1. $R_I$ and $R_D$ are minimal insertion and deletion sets for R iff the following is true for $t_1 < t_2$:

$$R[t_2] = (R[t_1] \setminus R_D) \cup R_I, \qquad (a)$$
$$R_I \cap R_D = \emptyset, \qquad (b)$$
$$R_I \cap R[t_1] = \emptyset, \qquad (c)$$
$$R_D \setminus R[t_1] = \emptyset. \qquad (d)$$

$R_I$ contains the intermediate insertions made to R, and $R_D$ contains the intermediate deletions made to R. Eq. (a) defines the incremental computation of R. It states that the intermediate deletions, $R_D$, must be removed from R and that the intermediate insertions, $R_I$, must be added to R. Eq. (b) states that there must be no cross-references between $R_I$ and $R_D$. Eq. (c) states that $R_I$ must contain no false insertions. Eq. (d) states that $R_D$ must contain no false deletions. In Section 7 we show how to construct minimal change sets.

Fig. 1. Change-set propagation for set differences.

Figure 1 illustrates how $(R \setminus S)_I$ and $(R \setminus S)_D$ can be computed in terms of $R_I$, $R_D$, $S_I$, and $S_D$. The figure is related directly to our incremental algorithms in Section 8. All subsets marked by "+" belong to $(R \setminus S)_I$, and all subsets marked by "-" belong to $(R \setminus S)_D$. Eq. (9)-(11) express Figure 1 in terms of relational algebra:

$$((R \setminus R_D) \cup R_I) \setminus ((S \setminus S_D) \cup S_I) = ((R \setminus S) \setminus (R \setminus S)_D) \cup (R \setminus S)_I, \qquad (9)$$
$$(R \setminus S)_I = (R_I \setminus ((S \setminus S_D) \cup S_I)) \cup (S_D \cap ((R \setminus R_D) \cup R_I)), \qquad (10)$$
$$(R \setminus S)_D = (R \setminus S) \cap (S_I \cup R_D). \qquad (11)$$
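Definition 1 and Eq. (9)-(11) can be checked on small in-memory sets. The following sketch assumes that relations are Python sets and that the operand change sets are minimal; the function names `is_minimal` and `difference_change_sets` and the sample values are hypothetical.

```python
def is_minimal(old, new, ins, dele):
    """Check conditions (a)-(d) of Definition 1 for one relation."""
    return (
        new == ((old - dele) | ins)   # (a) incremental computation
        and not (ins & dele)          # (b) no cross-references
        and not (ins & old)           # (c) no false insertions
        and not (dele - old)          # (d) no false deletions
    )


def difference_change_sets(r_old, r_ins, r_del, s_old, s_ins, s_del):
    """Propagate change sets of R and S to R \\ S, following Eq. (10) and (11)."""
    r_new = (r_old - r_del) | r_ins
    s_new = (s_old - s_del) | s_ins
    diff_ins = (r_ins - s_new) | (s_del & r_new)      # Eq. (10)
    diff_del = (r_old - s_old) & (s_ins | r_del)      # Eq. (11)
    return diff_ins, diff_del


# Invented sample state at time t1 and intermediate changes up to t2.
r_old, r_ins, r_del = {1, 2, 3, 4}, {5, 6}, {2, 4}
s_old, s_ins, s_del = {3, 4, 5}, {1, 7}, {3, 5}

diff_ins, diff_del = difference_change_sets(r_old, r_ins, r_del, s_old, s_ins, s_del)

r_new = (r_old - r_del) | r_ins
s_new = (s_old - s_del) | s_ins
old_diff, new_diff = r_old - s_old, r_new - s_new
```

In this example the propagated sets are themselves minimal change sets for $R \setminus S$, so the cached result can be patched with them exactly as Eq. (9) prescribes.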
Eq. (9)-(11) and Figure 1 presume that $R_I$, $R_D$, $S_I$, and $S_D$ are minimal insertion and deletion sets as defined in Definition 1.

Roussopoulos [1991] demonstrated how view pointer caches can be used to store a set of pointers identifying old query results. Each time a given query expression is computed, the corresponding view pointer caches are updated to reflect the effects of intermediate changes made to the underlying database.

Figure 2 illustrates a typical view pointer cache for a set difference. R and S are base relations. $r_1, r_2, r_3, r_4, r_5, s_1, s_2, s_3, s_4$, and $s_5$ are tuple identifiers. The arrows symbolize page identifiers for disk pages containing tuples in $R \setminus S$. Figure 2 shows that the query $R \setminus S$ contains the R tuples with tuple identifiers $r_1$, $r_4$, and $r_5$. Also, Figure 2 shows that the cache for $R \setminus S$ contains the page identifiers for the tuples. Consequently, $R \setminus S$ can be constructed via a sequential scan of the cache for $R \setminus S$.

Fig. 2. Sample view pointer cache: The arrows symbolize page identifiers.

Results of complex query expressions, that is, expressions that are composed by more than one relational operator, are stored as multilevel cache structures. Figure 3 illustrates a cache structure for an unnested version of the following
query:

$$\sigma[(\Pi[T.c]\sigma[T.a = S.a]T) \supseteq (\Pi[R.c]\sigma[R.a = S.a]R)]S. \qquad (12)$$

The unnested version of the query has the following form:

$$S \setminus \Pi[S]((\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T))). \qquad (13)$$

Fig. 3. Cache structure for unnested query.

The nodes labeled R, S, and T represent base relations with schemata $R(a{:}d_a, c{:}d_c)$, $S(a{:}d_a, b{:}d_b)$, and $T(a{:}d_a, c{:}d_c)$. Each other node represents a relational operator and contains a pointer list that identifies the result of the query defined by the operator.

When the cache structure in Figure 3 is updated incrementally, the intermediate changes made to the base relations are used to update the caches for $S \bowtie R$ and $S \bowtie T$. The changes made to these caches are propagated and used to update the caches for the projections. The changes made to these are then propagated and used to update the cache for

$$(\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T)).$$

Finally, when the cache for

$$S \setminus \Pi[S]((\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T)))$$

has been updated, it is materialized, and the result is displayed.

In general, there is a view pointer cache per operator in a relational algebra expression. An incremental algorithm is associated with each relational algebra operator, and when a request is made to a view, the algorithm corresponding to the defining operator is executed. For the requested query, the algorithm updates the corresponding view pointer cache and materializes it. For subqueries, the algorithm updates the corresponding view pointer cache, propagates changes to higher-level queries, and materializes the view pointer cache.

7. DATA STRUCTURES
Three data structures are used as the basis for incremental computation. First, changes made to base relations are time-stamped and stored on differential files. There is one differential file per base relation. Second, a view pointer cache is stored on a file that contains the appropriate tuple identifiers and page identifiers. Third, a change set is stored as a change file, that is, an extraction from a set of differential files that contain the changes made to a query result within a certain period of time.

7.1 Differential Files

All tuples, including base relation tuples, are augmented with a system-generated and maintained tuple identifier that is globally unique. Tuple identifiers are never modified. All changes made to a base relation, R, are stored on a differential file [Severance and Lohman 1976], $R^\delta$, where they are kept until all relevant queries have been updated. Differential files have the following attributes:

Name:    TID'   Time        TID    PID   Operator     Data
Domain:  Surr   TimePoint   Surr   Ptr   {INS, DEL}   Tuple

The attribute TID' contains the tuple identifier of the differential file tuple. The attribute Time contains a time stamp that identifies the time of the modification. The attribute TID contains the tuple identifier of the modified base relation tuple. The attribute PID contains the (physical) page identifier of the modified base relation tuple. The attribute Operator contains INS for insertions and DEL for deletions. The attribute Data contains the actual content of the tuple after the modification. A tuple update is modeled as a deletion/insertion pair with identical time stamps.

7.2 View Pointer Caches
A view pointer cache is a file with sequential access that identifies the set of tuples in a query result at a given point in time [Roussopoulos 1991]. Figure 2 illustrates a typical view pointer cache for a set difference.

A unary view pointer cache has the following attributes:

Name:    TID'   TID    PID
Domain:  Surr   Surr   Ptr

It can be used to identify the tuples corresponding to a query if each tuple in the query can be created from one operand tuple. This is true for queries defined by operators like selection, projection, and set difference. There is one cache tuple for each tuple in the corresponding query result, and its attributes are interpreted as follows: TID' is a tuple identifier for the cache tuple, and TID and PID contain a tuple identifier and a page identifier for the corresponding operand tuple.

A binary view pointer cache has the following attributes:

Name:    TID'   TID1   PID1   TID2   PID2
Domain:  Surr   Surr   Ptr    Surr   Ptr

It can be used to cache the result of binary operators like Cartesian product and join where the resulting tuples are created as a combination of tuples from both operands. There is one cache tuple for each tuple in the corresponding query, and its attributes are interpreted as follows: TID' contains a tuple identifier for the cache tuple, and TID1, PID1, TID2, and PID2 contain the tuple and page identifiers of the relevant operand tuples.
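The record layouts above can be written down as plain data types. The sketch below is only an in-memory illustration: the class and field names are hypothetical renderings of TID', TID, PID, and so on, and pages are modeled as a dictionary from page identifier to a dictionary from tuple identifier to tuple content, which is an assumption made for the example.

```python
from typing import NamedTuple


class DifferentialTuple(NamedTuple):
    tid_prime: int   # TID': identifier of the differential file tuple
    time: int        # Time: time stamp of the modification
    tid: int         # TID: identifier of the modified base relation tuple
    pid: int         # PID: page identifier of the modified base relation tuple
    operator: str    # Operator: "INS" or "DEL"
    data: tuple      # Data: content of the tuple


class UnaryCacheEntry(NamedTuple):
    tid_prime: int   # identifier of the cache tuple
    tid: int         # identifier of the operand tuple
    pid: int         # page holding the operand tuple


class BinaryCacheEntry(NamedTuple):
    tid_prime: int
    tid1: int
    pid1: int
    tid2: int
    pid2: int


def materialize_unary(cache, pages):
    """Follow each (PID, TID) pointer and return the referenced tuples in cache order."""
    return [pages[entry.pid][entry.tid] for entry in cache]


def materialize_binary(cache, left_pages, right_pages):
    """Combine the two operand tuples referenced by each binary cache entry."""
    return [left_pages[e.pid1][e.tid1] + right_pages[e.pid2][e.tid2] for e in cache]


# Invented example: a cache for a set difference whose result has three tuples.
pages = {1: {101: ("A",)}, 2: {104: ("D",), 105: ("E",)}}
cache = [UnaryCacheEntry(1, 101, 1), UnaryCacheEntry(2, 104, 2), UnaryCacheEntry(3, 105, 2)]
rows = materialize_unary(cache, pages)
```

The cache stores pointers only, so the tuple contents are read from the operand pages at materialization time.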
Entries on a unary view pointer cache are stored on a list of secondary memory pages that are accessed sequentially. For any pair of entries, if $(tid'_b, tid_b, pid_b)$ is located on a page preceded by a page of $(tid'_a, tid_a, pid_a)$, then $pid_a \leq pid_b$. This ensures that the query can be materialized without reading any page more than once [Roussopoulos 1991]. Furthermore, if $pid_a = pid_b$, then $tid_a < tid_b$. This ensures that the view pointer cache can be updated by merging the cache and the change files if these are sorted on (PID, TID).

Entries on a binary view pointer cache are stored on a list of secondary memory pages that are accessed sequentially. For any pair of entries, if $(tid'_b, tid1_b, pid1_b, tid2_b, pid2_b)$ is located on a page preceded by the page of $(tid'_a, tid1_a, pid1_a, tid2_a, pid2_a)$, then $pid1_a \leq pid1_b$. Furthermore, if $pid1_a = pid1_b$, then $pid2_a \leq pid2_b$. This ensures that the view pointer cache can be materialized by a suboptimal number of page fetches [Roussopoulos 1991].

Insertions can make it necessary to split an overfull cache page into two pages, and deletions can make it necessary to combine two or more sparse pages into one. In both cases the changes must be propagated to higher-level view pointer caches in order to update the tuple and page identifiers on these.

7.3 Change Files

Change files for base relations can be constructed by Algorithm CFC, as described in this subsection. At least two strategies can be used to construct change sets for complex query expressions. Qian and Wiederhold [1991] developed an algorithm that takes a relational algebra query as input and generates two queries that define the corresponding insertion and deletion sets. Inspired by Blakeley et al. [1986], we use a change propagation method to extract change sets from differential files. Briefly, the change file corresponding to a cache node defined by a relational operator is extracted from the deletion and insertion set(s) corresponding to the operand(s).

A change-file tuple has the following attributes:

Name:    TID    PID   Data
Domain:  Surr   Ptr   Tuple

The semantics of a change-file tuple is that the tuple described by (TID, PID, Data) has been modified. A change-file tuple describes a deletion if it belongs to a deletion set, and it describes an insertion if it belongs to an insertion set. Change-file tuples do not have tuple identifiers. Algorithm CFC constructs deletion and insertion sets for base relations.

Algorithm CFC. Change file construction.

(1) $R'^\delta \leftarrow \sigma[(Time > t_1)\ AND\ (Time \leq t_2)]R^\delta$.
(2) Sort $R'^\delta$ on (TID, Time).
(3) Scan $R'^\delta$, and do the following for each modification sequence that refers to a given TID. For modifications, the deletion precedes the insertion:
  (3.1) If the first tuple is a deletion, then add (TID, PID, Data) to $R_D$.
  (3.2) If the last tuple is an insertion, then add (TID, PID, Data) to $R_I$.
  (3.3) If a selected deletion/insertion pair has identical data parts, then skip it.
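The steps of Algorithm CFC can be sketched in memory as follows. This is a minimal illustration under stated assumptions: the differential file is a Python list, the record type `DiffEntry` and the function `construct_change_file` are hypothetical names, and step (3.3) is read as "drop the pair when the first deletion and the last insertion carry the same data".

```python
from itertools import groupby
from typing import NamedTuple


class DiffEntry(NamedTuple):
    time: int
    tid: int
    pid: int
    operator: str   # "INS" or "DEL"
    data: tuple


def construct_change_file(diff_file, t1, t2):
    """Return (R_I, R_D) for the modifications with t1 < Time <= t2."""
    # Step (1): select the relevant time interval.
    window = [d for d in diff_file if t1 < d.time <= t2]
    # Step (2): sort on (TID, Time); for an update the deletion precedes the insertion.
    window.sort(key=lambda d: (d.tid, d.time, d.operator != "DEL"))
    r_ins, r_del = [], []
    # Step (3): handle one modification sequence per TID.
    for tid, group in groupby(window, key=lambda d: d.tid):
        seq = list(group)
        deletion = seq[0] if seq[0].operator == "DEL" else None      # (3.1)
        insertion = seq[-1] if seq[-1].operator == "INS" else None   # (3.2)
        if deletion is not None and insertion is not None and deletion.data == insertion.data:
            continue                                                 # (3.3) no visible effect
        if deletion is not None:
            r_del.append((tid, deletion.pid, deletion.data))
        if insertion is not None:
            r_ins.append((tid, insertion.pid, insertion.data))
    return r_ins, r_del


# Invented differential file.
diff_file = [
    DiffEntry(2, 1, 7, "DEL", ("x",)), DiffEntry(2, 1, 7, "INS", ("y",)),   # update of tuple 1
    DiffEntry(3, 2, 7, "INS", ("n",)), DiffEntry(5, 2, 7, "DEL", ("n",)),   # inserted, then deleted
    DiffEntry(4, 3, 8, "DEL", ("z",)), DiffEntry(4, 3, 8, "INS", ("w",)),   # updated ...
    DiffEntry(6, 3, 8, "DEL", ("w",)), DiffEntry(6, 3, 8, "INS", ("z",)),   # ... and changed back
    DiffEntry(20, 4, 8, "INS", ("late",)),                                   # outside the interval
    DiffEntry(7, 5, 9, "DEL", ("q",)),                                       # plain deletion
]
r_ins, r_del = construct_change_file(diff_file, 0, 10)
```

Only the first and last modification of each tuple are inspected, so the resulting change sets contain at most one deletion and one insertion per tuple.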
A change set constructed by Algorithm CFC is minimal relative to the time interval in the sense that it contains only one modification per tuple and that all of its modifications have a visible effect.

Step (3) of Algorithm CFC is motivated by the fact that the sequence of insertions and deletions that have been made to a given tuple can be compressed as described below. The four left-hand sides cover all possible change patterns. Each of the four right-hand sides shows a compressed sequence that has the same net effect as the corresponding left-hand side. The function Compress is defined as follows. Similar compression rules can be found in Hanson [1992]:

$$Compress(INS_x \ldots INS_y) = INS_y,$$
$$Compress(INS_x \ldots DEL_y) = nil,$$
$$Compress(DEL_x \ldots INS_y) = DEL_x\ INS_y,$$
$$Compress(DEL_x \ldots DEL_y) = DEL_x.$$

8. INCREMENTAL ALGORITHMS

In this section we describe an incremental set-difference algorithm. In order to assess the efficiency of the incremental algorithm, we also describe a sort-merge-based recomputation algorithm. Furthermore, we describe Roussopoulos's [1991] incremental SPJ algorithms. The incremental algorithms use view pointer caches to store pointers to the results of old queries. We illustrate how change sets are computed and propagated to higher-level queries. We describe incremental algorithms that maintain the cache, that propagate changes to higher-level queries, and that materialize the cache. We use the symbol $R^C$ to denote the cache corresponding to R.

Smith and Chang [1975] described four recomputation algorithms for the computation of set differences. Their algorithms are based on various assumptions about sorting and indexing, and the following algorithm, SMD, is identical to their sort-merge algorithm. Note that the sort-merge method always works for the set-difference operator. This is not the case for the join operator [Gardarin 1989]:

Algorithm SMD. Sort-merge computation of set differences: $R \setminus S$.

(1) Sort R and S.
(2) Scan R and S in parallel, and do the following:
  (2.1) Display R tuples that are not found on S.

The next two algorithms and cost models are adapted from Roussopoulos [1991]. The algorithm IS uses a unary view pointer cache to handle incremental computation of selections. The algorithm IJ uses a binary view pointer cache to handle incremental computation of joins. Roussopoulos suggested
handling projections as selections and removing duplicates during materialization.

IS is based on the following formula for the incremental computation of selections [Blakeley et al. 1986; Roussopoulos 1991]:

$$\sigma[P]((R_{old} \setminus R_D) \cup R_I) = ((\sigma[P]R_{old}) \setminus R_D) \cup \sigma[P]R_I.$$

Algorithm IS. Incremental computation of selection: $\sigma[P]R$.

(1) $I \leftarrow \sigma[P]R_I$.
(2) Scan $(\sigma[P]R)^C[t_1]$. Let the current page be p.
  (2.1) Remove $R_D$ tuples from p. Propagate to $(\sigma[P]R)_D(t_1, t_2]$.
  (2.2) Add I tuples to p. Propagate to $(\sigma[P]R)_I(t_1, t_2]$.
  (2.3) If all entries on p have been processed, then materialize, split if necessary, and rewrite p.

The notation used in the algorithm is interpreted as follows: When tuples are added to the cache, it is the pointers that are added, and a tuple identifier for the new cache tuple is generated. When tuples are deleted from the cache, it is implicitly assumed that only existing cache tuples are deleted. When a change-set tuple, (PID, TID, Data), is propagated, that is, if the tuple (TID', TID, PID) is deleted from or inserted into a cache page with page number p, the tuple (TID', p, Data) is propagated. "$\leftarrow$" is an assignment operator.

IJ is based on the following formula for incremental computation of joins [Blakeley et al. 1986; Roussopoulos 1991]:

$$((R_{old} \setminus R_D) \cup R_I) \bowtie[P] ((S_{old} \setminus S_D) \cup S_I) = ((R_{old} \bowtie[P] S_{old}) \setminus ((R_D \bowtie[P] S_{old}) \cup (R_{old} \bowtie[P] S_D))) \cup ((R_I \bowtie[P] S_{new}) \cup (R_{new} \bowtie[P] S_I)).$$

Algorithm IJ. Incremental computation of join: $R \bowtie[P] S$.

(1) $I \leftarrow (((R \setminus R_D) \cup R_I) \bowtie[P] S_I) \cup (R_I \bowtie[P] ((S \setminus S_D) \cup S_I))$.
(2) Scan $(R \bowtie[P] S)^C[t_1]$. Let the current page be p.
  (2.1) Remove $R_D$ and $S_D$ tuples from p. Propagate to $(R \bowtie[P] S)_D(t_1, t_2]$.
  (2.2) Add I tuples to p. Propagate to $(R \bowtie[P] S)_I(t_1, t_2]$.
  (2.3) If all entries on p have been processed, then materialize, split if necessary, and rewrite p.

In SPJ queries, selections, projections, and joins can be computed independently because the project operator distributes over selections and joins. This makes it possible to remove duplicates during materialization. The project operator does not distribute over the set-difference operator. The consequence is that management of duplicates must be done continuously. It can be done for the projection caches, but this solution requires bookkeeping that cannot be done without extension to the view pointer cache data structures.
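Two of the claims above are easy to check on small sets: the update formula behind Algorithm IS, and the fact that projection does not distribute over set difference. The sketch below works on in-memory sets rather than paged caches; `incremental_selection`, `project_first`, and the sample data are hypothetical, and the change sets are assumed to be minimal.

```python
def incremental_selection(cached, pred, r_ins, r_del):
    """Update a cached selection result: ((sigma[P]R_old) \\ R_D) union sigma[P]R_I."""
    return (cached - r_del) | {t for t in r_ins if pred(t)}


def project_first(rel):
    """Project a relation of pairs onto its first attribute (duplicates collapse)."""
    return {t[0] for t in rel}


def pred(t):
    return t[1] > 10


# Selection: the patched cache equals a recomputation on the new state.
r_old = {(1, 5), (2, 20), (3, 30)}
r_del = {(1, 5), (3, 30)}
r_ins = {(4, 40), (5, 1)}
cached = {t for t in r_old if pred(t)}
updated = incremental_selection(cached, pred, r_ins, r_del)
recomputed = {t for t in (r_old - r_del) | r_ins if pred(t)}

# Projection versus set difference: the two sides differ.
r = {(1, "a"), (1, "b")}
s = {(1, "a")}
project_then_subtract = project_first(r) - project_first(s)
subtract_then_project = project_first(r - s)
```

In the second example, subtracting first keeps the value 1 because the tuple (1, "b") survives, whereas projecting first removes it; this is why duplicates cannot simply be dropped at materialization time once set differences are involved.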
Our solution is to handle combined projections and set differences in terms of the generalized set-difference operator. Our incremental set-difference algorithm maintains and materializes a view pointer cache for the generalized set-difference operator.

The incremental set-difference algorithm, ID, is based on the following observations regarding change propagation: We assume that R and S have attributes $(a_1, \ldots, a_n, \ldots, a_m)$ and that the content of a change-file tuple is described by $(v_1, \ldots, v_n, \ldots, v_m)$. We also assume that the query expression $R \setminus [a_1, \ldots, a_n]S$ was computed at time $t_1$ and is being computed again at time $t_2$, according to the intermediate changes made to R and S. Figure 1 refers to the tuple identifiers and data parts of the queries and change sets, and it presumes that the involved change sets are minimal. First, there is no overlap between corresponding insertion and deletion sets. Second, the deletion sets are contained in the corresponding queries. Third, the insertion sets and the corresponding queries are disjoint. This ensures that the propagated change sets are minimal.

Algorithm ID. Incremental computation of $R \setminus [a_1, \ldots, a_n]S$.

(1) $INS_1 \leftarrow R_I \setminus [a_1, \ldots, a_n]S[t_2]$.
(2.1) $DEL \leftarrow \Pi[R](S_I \bowtie[a_1, \ldots, a_n] R[t_2])$.
(2.2) $INS_2 \leftarrow \Pi[R](S_D \bowtie[a_1, \ldots, a_n] R[t_2])$.
(3) Sort $R_D$, $INS_1$, $INS_2$, and DEL on (PID, TID).
(4) Scan $(R \setminus [a_1, \ldots, a_n]S)^C[t_1]$, $R_D$, $INS_1$, $INS_2$, and DEL. Let the current cache page be p.
  (4.1) Remove $R_D$ tuples from p. Propagate to $(R \setminus [a_1, \ldots, a_n]S)_D(t_1, t_2]$.
  (4.2) Remove DEL tuples from p. Propagate to $(R \setminus [a_1, \ldots, a_n]S)_D(t_1, t_2]$.
  (4.3) Add $INS_1$ tuples to p. Propagate to $(R \setminus [a_1, \ldots, a_n]S)_I(t_1, t_2]$.
  (4.4) Add $INS_2$ tuples to p. Propagate to $(R \setminus [a_1, \ldots, a_n]S)_I(t_1, t_2]$.
  (4.5) If all entries on p have been processed, then materialize, split if necessary, and rewrite p.
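The change-set logic of Algorithm ID can be sketched without the page-level merge. The sketch below is a simplification, not the algorithm as published: the cache holds R tuples instead of pointers, sorting and paging (steps (3) and (4.5)) are omitted, and S is represented only by the set of its $[a_1, \ldots, a_n]$ values with change sets that are assumed to be minimal with respect to those values. The function and parameter names are hypothetical.

```python
def incremental_generalized_difference(cache, r_new, r_ins, r_del,
                                       s_new_keys, s_ins_keys, s_del_keys, key):
    """Update a cached R \\ [a1..an] S result and return (new_cache, inserted, deleted)."""
    ins1 = {t for t in r_ins if key(t) not in s_new_keys}    # step (1)
    dele = {t for t in r_new if key(t) in s_ins_keys}        # step (2.1)
    ins2 = {t for t in r_new if key(t) in s_del_keys}        # step (2.2)
    deleted = cache & (r_del | dele)                         # steps (4.1) and (4.2)
    inserted = (ins1 | ins2) - cache                         # steps (4.3) and (4.4)
    return (cache - deleted) | inserted, inserted, deleted


def key(t):
    return t[0]   # the difference is taken on the first attribute only


# Invented state at t1 and intermediate changes up to t2.
r_old = {(1, "a"), (2, "b"), (3, "c")}
s_old_keys = {2, 4}
cache = {t for t in r_old if key(t) not in s_old_keys}

r_del, r_ins = {(3, "c")}, {(4, "d"), (5, "e")}
s_del_keys, s_ins_keys = {2}, {1}
r_new = (r_old - r_del) | r_ins
s_new_keys = (s_old_keys - s_del_keys) | s_ins_keys

new_cache, inserted, deleted = incremental_generalized_difference(
    cache, r_new, r_ins, r_del, s_new_keys, s_ins_keys, s_del_keys, key)
```

The returned `inserted` and `deleted` sets play the role of the change sets that the algorithm propagates to higher-level caches.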
Step (1) subtracts $S[t_2]$ from $R_I$, giving the subset of $R_I$ that must be added to $R \setminus [a_1, \ldots, a_n]S$. Step (2) computes the intersection between $S_D$ and $R[t_2]$ (giving the subset of $S_D$ that must be added to $R \setminus [a_1, \ldots, a_n]S$) and the intersection between $S_I$ and $R[t_2]$ (giving the subset of $S_I$ that contains potential deletions from $R \setminus [a_1, \ldots, a_n]S$). In order to facilitate a merge update of the cache, the pointers on $S_D$ and $S_I$ are replaced by corresponding R pointers. Step (4) scans $(R \setminus [a_1, \ldots, a_n]S)^C[t_1]$, $R_D$, $INS_1$, $INS_2$, and DEL in parallel. The cache organization and the sortings in step (3) ensure that no pages from the scanned files have to be fetched more than once. During each iteration of the scan, only hitherto unprocessed TIDs that are relevant for all five pages are processed. Specifically, in each iteration the largest (PID, TID) pair on each of the five pages is found. Among these, the smallest, (pid, tid), is identified. Only hitherto unprocessed entries up to (pid, tid) are processed. At the end of each iteration, the next page is fetched on a scanned file if all entries on its current page have been processed. This is necessarily true for at least one file.

9. COST ANALYSIS AND COMPUTATIONS

In this section we analyze the cost efficiency of incremental computation of unnested versions of nested relational algebra queries. First, we analyze the set-difference operation in isolation. Next, we analyze three computation strategies for a class of nested queries: sort-merge computation of the nested version, sort-merge computation of the unnested version, and incremental computation of the unnested version.

9.1 Cost Models

For each algorithm we describe an I/O-based cost model that can be used to estimate the number of necessary references to secondary memory pages. Thus, we do not consider the algorithms' consumption of CPU time. For all algorithms we have excluded the cost of writing the result since this cost is exactly the same in all cases. Our cost model notation is summarized in Table III. Our cost models for the incremental algorithms presume that the operands are base relations. Only minor changes are needed in order to handle the situations where one or both operands are defined by complex query expressions.

The cost of change file construction can be estimated by means of the following cost model:
c ~*~
= 46,
if O