Incremental computation of nested relational query expressions

LARS BAEKGAARD
Aalborg University
and
LEO MARK
Georgia Institute of Technology

Efficient algorithms for incrementally computing nested query expressions do not exist. Nested query expressions are query expressions in which selection/join predicates contain subqueries. In order to respond to this problem, we propose a two-step strategy for incrementally computing nested query expressions. In step (1), the query expression is transformed into an equivalent unnested flat query expression. In step (2), the flat query expression is incrementally computed. To support step (1), we have developed a very concise algebra-to-algebra transformation algorithm, and we have formally proved its correctness. The flat query expressions resulting from the transformation make intensive use of the relational set-difference operator. To support step (2), we present and analyze an efficient algorithm for incrementally computing set differences based on view pointer caches. When combined with existing incremental algorithms for SPJ queries, our incremental set-difference algorithm can be used to compute the unnested flat query expressions efficiently. It is important to notice that without our incremental set-difference algorithm the existing incremental algorithms for SPJ queries are useless for any query involving the set-difference operator, including queries that are not the result of unnesting nested queries.

Categories and Subject Descriptors: H.2.2 [Database Management]: Physical Design - access methods; H.2.3 [Database Management]: Languages - query languages; H.2.4 [Database Management]: Systems - query processing

General Terms: Algorithms, Performance

Additional Key Words and Phrases: Incremental computation, nested query expressions, set differences, unnesting, view pointer caches

Authors' addresses: L. Baekgaard, Department of Mathematics and Computer Science, Aalborg University, Fr. Bajers Vej 7E, DK-9220 Aalborg Ø, Denmark; L. Mark, College of Computing, Georgia Institute of Technology, Atlanta, GA 30322-0280. Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. © 1995 ACM 0362-5915/95/0600-0111 $03.50

ACM Transactions on Database Systems, Vol. 20, No. 2, June 1995, Pages 111-148.

1. INTRODUCTION

Since the emergence of the relational model of data [Codd 1970], much research effort has been devoted to the problems of efficiently computing relational query expressions [Blakeley and Martin 1990; Jarke 1985; Negri and Pelagatti 1991; Omiecinski 1989; Ozsu and Meechen 1990; Sacco 1986; Scholl et al. 1987; Sellis 1986; Ullman 1989; Valduriez 1987]. A query can be computed by means of recomputation, or it can be computed incrementally. The basic idea of recomputation [Jarke and Koch 1984; Smith and Chang 1975; Ullman 1989] is to compute a query expression from scratch each time it is referenced. The basic idea of incremental computation [Blakeley et al. 1986; Blakeley and Martin 1990; Hanson 1987; Lindsay et al. 1986; Qian and Wiederhold 1991; Roussopoulos 1991] is to store the result of old queries in persistent caches and to reuse them.

We are not aware of any incremental algorithms for SQL-like query expressions of the following form:

SELECT ...
FROM   ...
WHERE  (SELECT ...
        FROM   ...
        WHERE  ...)
       CONTAINS
       (SELECT ...
        FROM   ...
        WHERE  ...)

The most likely explanation is that it is very difficult, if not impossible, to define simple and efficient incremental update rules for such queries. In general, it is very hard to compute such queries efficiently at all. The simplest solution would be to use a nested-iteration strategy, but it is a well-known fact that this can be very expensive. Indices may be utilized in special cases, but we are not aware of any efficient methods that are generally applicable.

Nested query expressions are very useful in many situations, but because of the lack of efficient computation methods, most commercial database management systems do not support the CONTAINS operator. In response to this problem, we suggest a two-step strategy for the incremental computation of nested query expressions. In step (1) all nesting operators, like IN and CONTAINS, are removed by transforming a nested query expression to an equivalent flat expression. In step (2) the flat expressions are computed by conventional recomputation methods or by incremental methods.

With respect to step (1), we present an unnesting algorithm that transforms nested relational algebra expressions into equivalent flat relational algebra expressions. The resulting expressions are based on a combination of selections, projections, joins, and set differences. Other transformation algorithms exist, but we have invented a very simple and concise notation based on algebra-to-algebra transformations. Our novel notation has made it possible to formulate the transformation in a clear and readable way, and it has made it possible for us to construct a simple and convincing correctness proof for the transformation algorithm. In order to facilitate the correctness proof, we have expressed our unnesting algorithm in terms of algebra-to-algebra transformations. In order to make the results directly applicable to SQL, we have extended the relational algebra with simple nesting constructs that resemble the structure of SQL. No existing nested algebra has this desirable characteristic. We have expressed most of our examples in an SQL-like notation in order to make the paper more readable.

Our unnesting algorithm can transform tree queries with arbitrarily deep nesting. Specifically, it transforms nested comparison selections with selection predicates of the form "(value) = (query)," set-membership selections with selection predicates of the form "(value) IN (query)," and set-inclusion selections with selection predicates of the form "(query) CONTAINS (query)." Subqueries and outer queries can be mutually related. Our algorithm does not transform queries with aggregate functions.

With respect to step (2), we present and analyze an efficient incremental set-difference algorithm. Existing incremental algorithms can be used for the computation of SPJ (selection–project–join) queries [Roussopoulos 1991]. However, transformed CONTAINS queries make intensive use of the set-difference operator. We are not aware of any incremental set-difference algorithms. When combined with existing incremental algorithms, our algorithm can be used to compute nested queries efficiently and incrementally. Our incremental set-difference algorithm uses view pointer caches to store the result of old queries as pointers to the qualifying tuples [Roussopoulos 1991]. It is a necessary component in the incremental computation of any query expression involving the set-difference operator, whether generated by our unnesting algorithm or not. Without it, other incremental algorithms for SPJ queries cannot be used. Our cost model computations strongly indicate that incremental query computation is superior to recomputation in many situations.

In Section 2 we discuss related work. In Section 3 we present a nested relational algebra. In Section 4 we present three categories of nested query expressions. In Section 5 we present an unnesting algorithm that is based on the categories from Section 4, and we formally prove its correctness. In Section 6 we discuss the ideas and assumptions of incremental query computation, and we summarize the state of the art. In Section 7 we describe the data structures, that is, view pointer caches, that we use as the basis for incremental computation. In Section 8 we present an incremental set-difference algorithm. In Section 9 we analyze the I/O efficiency of our incremental set-difference algorithm and compare it to the I/O efficiency of a sort–merge algorithm. In Section 10 we conclude the paper.

2. RELATED WORK

Much work on query transformation has aimed at producing semantically equivalent versions of a given query expression that can be computed more efficiently than the original expression [Nakano 1995; Ullman 1989]. Recently, a considerable amount of work has been done on operators that cannot be expressed as SPJ queries [Baekgaard 1993; Bultzingsloewen 1987; Ceri and Gottlob 1985; Dayal 1987; Ganski and Wong 1987; Kim 1982; Muralikrishna 1989].

Kim's [1982] algorithm transforms nested SQL-like queries into equivalent flat SQL queries. The set of source queries includes set-membership queries using the IN operator, set-inclusion queries using the CONTAINS operator, and various aggregate functions. Kim was motivated by the observation that the nested-iteration method used in systems like System R [Astrahan et al. 1979] is inefficient in most cases. By transforming the nested queries to equivalent join queries, Kim enabled the query optimizer to use the most appropriate join computation method.

Ganski and Wong's [1987] algorithm represents a solution to a bug in Kim's algorithm that is caused by the possibility of duplicate rows in SQL. Furthermore, their algorithm extends the set of source queries that can be transformed. For example, it is able to unnest queries containing the EXISTS operator.

Ceri and Gottlob's [1985] algorithm transforms nested SQL-like queries with aggregate functions into queries formulated in relational algebra extended with aggregate functions. Whereas Kim and Ganski and Wong focused solely on efficient query computation, Ceri and Gottlob used their algorithm to define the semantics of SQL in terms of relational algebra. Furthermore, they argued that by transforming into algebra rather than into SQL they facilitate identification of syntactically different but semantically equivalent queries.

Bultzingsloewen's [1987] two-step algorithm transforms nested SQL queries, first into relational calculus and then into algebra. In the first step, the queries are transformed into relational calculus enhanced with a notation for aggregate functions. The primary purpose of this step is to define the semantics of SQL queries. In the second step, the calculus queries are transformed into relational algebra enhanced with a notation for aggregate functions. The primary purpose of this step is to facilitate efficient query computation. As opposed to Kim, Ganski and Wong, and Ceri and Gottlob, Bultzingsloewen showed how to transform queries containing the GROUP-BY/HAVING construct.

Muralikrishna [1989] focused on data-flow-based evaluation of tree-queries. A query is a tree-query if there is more than one subquery at the same level. Kim and Ceri and Gottlob focused mostly on transformations for linear queries, in which there is at most one subquery at each level.

Dayal [1987] studied the transformation of nested SQL queries into a variant of the relational algebra that handles the problem of duplicate rows. By doing this he combined a study of some problems that are inherently related to SQL with the advantages of using relational algebra as the target language.

Our unnesting approach differs from the previous work in two ways. First, transformations are applied to nested algebra queries, and the transformed queries are algebra queries as well. This has made it possible for us to use a very concise and precise notation for the various transformations. Second, we have formally proved the correctness of our algorithm. Conceptually, our transformation rules are similar to the ones proposed by Kim [1982] and by Ceri and Gottlob [1985]. Our major contributions are the use of a concise and readable algebra-to-algebra notation and the proof of correctness.

A number of nested algebras have been proposed for nested relations [Ozsoyoglu et al. 1987; Paredaens and Van Gucht 1992; Roth et al. 1988; Schek and Scholl 1986; Scholl et al. 1987]. The nesting in these algebras is due to the occurrence of the operators NEST, which transforms a relation into a nested relation, and UNNEST, which removes nesting from a nested relation. Colby [1989] designed a recursive algebra that can do the same processing as the nested algebras without the need for the operators NEST and UNNEST. Gyssens and Van Gucht [1988] suggested using a power set algebra as a means of querying nested relations without the need for the operators NEST and UNNEST.

Among the relational algebra operators, most research attention has been paid to SPJ queries. It has been assumed that end-user queries tend to be dominated by a combination of these operators. The SQL statement SELECT ... FROM ... WHERE ... directly reflects an SPJ query. Very little attention has been paid to the set difference and set union operators.

The efficient computation of relational set differences has not received much research attention. Smith and Chang [1975] developed a set of recomputation algorithms for the computation of set differences. We have used their sort–merge algorithm as a point of reference and comparison for the analysis of the efficiency of our incremental set-difference algorithm. Without our incremental set-difference algorithm, other incremental algorithms cannot be used in queries involving the set-difference operator.

Blakeley et al. [1986] developed an algorithm for change-set computation and incremental computation of SPJ queries. Qian and Wiederhold [1991] developed an algorithm that computes change sets for any query defined as a combination of selections, projections, multiplications, set unions, and set differences. Jensen et al. [1991] studied SPJ queries in the context of transaction-time databases and added the notion of decremental computation of time slices, that is, query results as of some time in the past. None of these approaches provides cost models or cost analysis.

Incremental computation in rule-based systems has been studied by a number of researchers [Rosenthal et al. 1989; Wolfson et al. 1991; Ceri and Widom 1991; Hanson et al. 1990; Stonebraker et al. 1990; Carey et al. 1990; Hanson 1992].

Roussopoulos [1991] studied SPJ queries in detail and provided cost models and cost analysis. He described view pointer caches and efficient data structures and algorithms for their computation and materialization. He used simulations and cost computations to compare the cost of incremental computation to the cost of computation by recomputation. He concluded that his incremental algorithms outperform recomputation algorithms in many situations.
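The difference between recomputation and incremental computation can be made concrete for the set-difference operator. The following is a minimal sketch under stated assumptions, not the view-pointer-cache algorithm that the paper develops in its later sections: relations are modelled as in-memory Python sets, updates arrive one tuple at a time, and the class name `IncrementalDifference` and its methods are hypothetical names chosen for this illustration.

```python
class IncrementalDifference:
    """Keeps a cached copy of R \\ S up to date under single-tuple updates.

    Illustrative only: relations are plain Python sets, and the cache holds
    tuples rather than pointers to stored tuples.
    """

    def __init__(self, r, s):
        self.r = set(r)
        self.s = set(s)
        # The cache is filled once by ordinary recomputation.
        self.cache = self.r - self.s

    def insert_r(self, t):
        self.r.add(t)
        if t not in self.s:        # t qualifies only if S does not hold it
            self.cache.add(t)

    def delete_r(self, t):
        self.r.discard(t)
        self.cache.discard(t)      # a tuple outside R can never qualify

    def insert_s(self, t):
        self.s.add(t)
        self.cache.discard(t)      # t is now excluded by S

    def delete_s(self, t):
        self.s.discard(t)
        if t in self.r:            # t reappears if R still holds it
            self.cache.add(t)


view = IncrementalDifference({1, 2, 3}, {2})
view.insert_s(3)
view.insert_r(4)
view.delete_s(2)
recomputed = view.r - view.s       # what recomputation would return
```

Each update touches only the tuple that changed, whereas recomputation scans both operands again. The cached result and the recomputed result agree after every update.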

3. NESTED RELATIONAL QUERY EXPRESSIONS

In this section we present a nested relational algebra for nested queries on 1NF relations, that is, queries with nested comparison predicates, set-membership predicates, and set-inclusion predicates. The structure of our nested algebra resembles the structure of SQL directly. This makes it very easy to use our algorithms to process SQL-like nested queries on 1NF relations. A number of nested algebras have been proposed for queries on nested relations [Schek and Scholl 1986; Scholl et al. 1987; Ozsoyoglu et al. 1987].

Table I. Relational Operators

Operator | Comment | Definition
$\sigma[P]R$ | Selection | $\{\, r \mid r \in R \wedge P(r) \,\}$
$\Pi[a_1, \ldots, a_i]R$ | Projection | $\{\, (u_1, \ldots, u_i) \mid (u_1, \ldots, u_n) \in R \,\}$
$R \times S$ | Cartesian product | $\{\, (u_1, \ldots, u_m) \mid (u_1, \ldots, u_n) \in R \wedge (u_{n+1}, \ldots, u_m) \in S \,\}$
$R \bowtie [P]\, S$ | Join | $\sigma[P](R \times S)$
$R \bowtie S$ | Natural join | [definition illegible in the source]
$R \setminus S$ | Difference | $\{\, t \mid t \in R \wedge t \notin S \,\}$
$R \setminus [a_1, \ldots, a_i]\, S$ | Generalized difference | [definition illegible in the source]
$R \cup S$ | Union | $\{\, t \mid t \in R \vee t \in S \,\}$
$R \cap S$ | Intersection | $\{\, t \mid t \in R \wedge t \in S \,\}$

The operand schemata are written $R(a_1{:}d_1, \ldots, a_n{:}d_n)$ and $S(a_{n+1}{:}d_{n+1}, \ldots, a_m{:}d_m)$.

Their structure resembles the structure of nested relations. Dadashzadeh [1989] presented an improved division operator for the relational algebra. Ozsoyoglu and Wang [1989] presented a relational calculus with set operators. Both of these approaches represent query facilities that resemble our nested algebra, but they are not easily translated into SQL.

The relational operators are defined in Table I. Single-value, single-attribute queries can be treated as expressions, and vice versa. Two queries are equivalent, $R \equiv S$, if and only if (iff) they return the same set of tuples whenever the same base relations are substituted for common base relation names in the two queries.

A query expression, $R$, can be time sliced, that is, it can be computed with a time index by means of the notation $R[t]$, where $t$ is an expression that evaluates to a time point value. The effect is to compute $R$ on the state of the underlying database as of time $t$. We use the notation $\Pi[S]R$ to symbolize the projection of $R$ onto the attributes of $S$.

Our examples will be based on the following relational schemata for a small simplified library database. The underlined attributes are (composite)

keys:

Borrowers (Name, Address, NoOfBooks)
Books (Title, Author, Area, Type, NoOfLoans)
Loans (Name, Title)
Reservations (Name, Title)
Authors (Name, Title, Area)

The attributes Books.NoOfLoans and Borrowers.NoOfBooks contain the total number of times each book has been borrowed and the total number of books borrowed by each borrower, respectively. The attributes Loans.Name and Reservations.Name are foreign keys pointing to Borrowers. The attributes Loans.Title and Reservations.Title are foreign keys pointing to Books.

4. THREE CATEGORIES OF NESTED QUERY EXPRESSIONS

We study the unnesting of three categories of nested query expressions. Each category is subdivided into four types, and they are used to structure the unnesting algorithm that we present in Section 5.

A nested comparison selection has one of the following forms, where $Op \in \{=, \neq, <, \leq, >, \geq\}$:

$\sigma[E \; Op \; T]R$, $\sigma[S \; Op \; T]R$.

$E$ is an expression of type String or Integer. $S$ and $T$ are single-tuple, single-attribute relational expressions.

A set-membership selection has one of the following forms:

$\sigma[E \in T]R$, $\sigma[S \in T]R$.

$E$ is an expression of type String or Integer. $S$ is a single-tuple, single-attribute relational expression. $T$ is a single-attribute relational expression.

A set-inclusion selection has one of the following forms, where $R$, $S$, and $T$ are relational expressions:

$\sigma[T = S]R$, $\sigma[T \supset S]R$, $\sigma[T \supseteq S]R$.

All other queries are flat queries. $R$, $S$, and $T$ are called the outer block, the left-hand-side inner block (the contained set), and the right-hand-side inner block (the containing set), respectively.

The level of a predicate (or inner block) is equal to the depth of the surrounding query expression down to the predicate (or inner block). The depth of a query expression is defined recursively as follows: The depth of a flat query expression is 1. If a flat query expression is inserted into a level $k$ predicate of a depth $k$ query expression, the result is a depth $k + 1$ query expression.

The term free reference is defined as follows: An attribute reference refers to the deepest higher level relation in its scope. If no such relation exists, then the reference is a free reference. Multiple occurrences of a relation

name, $R$, in a query expression can be distinguished by means of quotes: $R'$, $R''$, etc. Only subquery expressions can contain free references. An inner block is selection tuple dependent iff it contains a free reference. Otherwise, it is selection tuple independent. The dependence/independence of a subquery expression of an inner block is defined similarly. A selection predicate can be selection tuple dependent even if it contains no inner blocks. This is the case for the query $\sigma[NoOfLoans > 25]Books$.

A set-membership query expression can be formulated as a set-inclusion query expression, $\sigma[E \in T]R \equiv \sigma[T \supseteq E]R$, since we can interpret $E$ as a single-attribute, single-tuple relation. We have explicitly included the set-membership queries for three reasons: First, many queries are most naturally formulated as set-membership queries. Second, a set-membership query expression contains extra information to the query evaluator. The reason is that the left-hand side must be a single-tuple expression. Third, the unnested expression resulting when the unnesting algorithm is applied to a set-membership query expression can be computed more efficiently than the result of an equivalent set-inclusion query expression.

For each of the three categories of nested selections, there are four types of selections. The left-hand-side inner block may or may not be selection tuple dependent and the right-hand-side inner block may or may not be selection tuple dependent. The corresponding 12 types of nested selections are described in Table II. C, M, and I denote nested comparison selections, set-membership selections, and set-inclusion selections, respectively.

The following Type $C_1$ nested comparison selection extracts all books that have been borrowed more times than the book titled "Tomorrow":

SELECT *
FROM   Books
WHERE  NoOfLoans > (SELECT NoOfLoans
                    FROM   Books
                    WHERE  Title = "Tomorrow")

The following Type $C_2$ nested comparison selection extracts all loans where the borrowed book has been borrowed at most 50 times:

SELECT *
FROM   Loans
WHERE  50 >= (SELECT NoOfLoans
              FROM   Books
              WHERE  Title = Loans.Title)

The following query is an equivalent Type $C_3$ nested comparison selection:

SELECT *
FROM   Loans
WHERE  [left-hand side and comparison operator illegible in the source]
       (SELECT NoOfLoans
        FROM   Books
        WHERE  Title = Loans.Title)

[Pages 119-120 of the original are missing from this copy.]

The following Type $I_3$ set-inclusion selection extracts all borrowers whose currently borrowed books have all been borrowed at least 50 times:

SELECT *
FROM   Borrowers
WHERE  (SELECT Title
        FROM   Books
        WHERE  NoOfLoans >= 50)
       CONTAINS
       (SELECT Title
        FROM   Loans
        WHERE  Name = Borrowers.Name)

The following Type $I_4$ set-inclusion selection extracts all borrowers that have currently borrowed all of the books that they have currently reserved:

SELECT *
FROM   Borrowers
WHERE  (SELECT Title
        FROM   Loans
        WHERE  Name = Borrowers.Name)
       CONTAINS
       (SELECT Title
        FROM   Reservations
        WHERE  Name = Borrowers.Name)
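The last selection above can be evaluated on toy data in two ways. The sketch below is an illustration under stated assumptions: the three relations are small hypothetical in-memory collections, `nested_iteration` evaluates both subqueries once per Borrowers tuple, and `flat_with_difference` is one standard flat formulation built on set difference. It is not claimed to be the exact expression produced by the paper's set-inclusion unnesting rules.

```python
# Hypothetical toy instances of the library relations.
borrowers = [{"Name": "Ann"}, {"Name": "Bob"}, {"Name": "Cy"}]
loans = [("Ann", "Dune"), ("Ann", "Emma"), ("Bob", "Dune")]
reservations = [("Ann", "Dune"), ("Bob", "Emma")]


def titles(relation, name):
    """SELECT Title FROM relation WHERE Name = name."""
    return {title for (n, title) in relation if n == name}


def nested_iteration(borrowers, loans, reservations):
    """Re-evaluate both inner blocks for every outer tuple."""
    result = []
    for b in borrowers:
        borrowed = titles(loans, b["Name"])
        reserved = titles(reservations, b["Name"])
        if borrowed >= reserved:          # borrowed CONTAINS reserved
            result.append(b["Name"])
    return result


def flat_with_difference(borrowers, loans, reservations):
    """Borrowers minus those with a reservation that is not also a loan."""
    unmet = set(reservations) - set(loans)
    blocked = {name for (name, _title) in unmet}
    return {b["Name"] for b in borrowers} - blocked


selected = nested_iteration(borrowers, loans, reservations)
flat = flat_with_difference(borrowers, loans, reservations)
```

Note that Cy, who has no reservations, qualifies in both versions because the empty set is contained in every set. The nested version performs two scans per Borrowers tuple, while the flat version uses one set difference and one projection.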

5. AN UNNESTING ALGORITHM

In this section we present an algorithm that transforms nested relational algebra queries into equivalent flat relational algebra queries. The most obvious way to compute the Type $I_4$ set-inclusion selection from Section 4 is simply to compute the subqueries

SELECT Title
FROM   Reservations
WHERE  Name = Borrowers.Name

and

SELECT Title
FROM   Loans
WHERE  Name = Borrowers.Name

for each tuple in Borrowers, and to select the tuple if the inclusion predicate is true. This computation method is used to compute nested queries in systems like System R [Astrahan et al. 1979], but as pointed out by Kim [1982], the cost of such a nested-iteration computation can be very high.

Alternatively, Borrowers could be sorted on Name, and Reservations and Loans could be sorted on Name and Title. Then, the three relations could be scanned in parallel, and the desired Borrowers tuples could be extracted. This is possible because both Reservations and Loans are related to Borrowers via Name, and in most cases it is considerably more efficient than nested iteration for the same reason that sort–merge joins usually are more efficient than nested-iteration joins [Ullman 1989].

It is, however, not clear that such a strategy should be used in the general case to compute arbitrary nested queries efficiently. Moreover, it is not even clear if such a strategy exists at all. The complexity introduced by the depth of nesting and the possible lack of symmetry in relationships between inner blocks and outer blocks would probably make such a strategy very complex. Therefore, it seems to be more fruitful to transform nested query expressions to equivalent flat expressions and then to apply existing query computation techniques to the transformed expression [Codd 1990].

We formally prove that the algorithm always preserves equivalence. Our transformation rules are inspired by the work of Kim [1982] and by the work of Ceri and Gottlob [1985]. We have reformulated the rules from scratch because our notation differs substantially from the notation used by others. First, our notation is considerably more concise and readable than existing notations. Second, our notation has made it possible for us to construct a concise and convincing proof of the correctness of our algorithm.

Our unnesting algorithm is formulated in terms of query expression mappings, that is, functions that map a set of parameters to a single query expression. For example,

$q : (R, S, T) \rightarrow (R \times S) \setminus (R \times T)$

is a query expression mapping. When a query expression mapping is applied, the actual parameters are inserted where the corresponding formal parameters occur in the right-hand side.

Only queries that contain predicates can contain nesting, and since joins are defined in terms of selections, $(R \bowtie [P]\, S) \equiv \sigma[P](R \times S)$, we only need to describe unnesting of nested selections. The following three equivalences allow us to restrict unnesting to selections with atomic predicates:

$\sigma[P_1 \wedge P_2]R \equiv \sigma[P_1]R \cap \sigma[P_2]R$, (1)

$\sigma[P_1 \vee P_2]R \equiv \sigma[P_1]R \cup \sigma[P_2]R$, (2)

$\sigma[\neg P]R \equiv R \setminus \sigma[P]R$. (3)

5.1 Set-Membership Predicates and Nested Comparison Predicates

We present a set of transformation rules that can be used to unnest selections with set-membership predicates or nested comparison predicates. The rules are inspired by Kim's [1982] unnesting rules and can be viewed as generalizations of these formulated in terms of relational algebra.

The following rule removes one level of nesting from a set-membership selection of the form $\sigma[E \in \Pi[A]S]R$, where $A$ is a subset of $S$'s attributes. We assume that $S$ is replaced by $q(S_1, \ldots, S_n)$, where $\{S_1, \ldots, S_n\}$ is the set of base relations in $S$, and the query expression mapping $q$ is constructed such that $q(S_1, \ldots, S_n) \equiv S$. Before the rule is applied, all projections in $S$ are augmented with $R$'s attributes, and all multiplications, $S_i \times S_j$, in $S$ are changed into $\Pi[S_i, S_j, R](S_i \times S_j)$:

$\sigma[E \in \Pi[A]\,q(S_1, \ldots, S_n)]R \equiv \Pi[R](\sigma[E = A]\,q(R \times S_1, \ldots, R \times S_n))$. (4)

Recall that an $R$ tuple is selected iff the left-hand-side expression (or inner block) is a member of $S$. There is potentially one version of $S$ per $R$ tuple. The basic idea underlying (4) is to combine each $R$ tuple with its induced right-hand-side tuples. We do this by multiplying $R$ with all of the base relations in $S$, selecting all tuples in the resulting relation that satisfy $E = A$, and projecting the result onto $R$'s attributes. The augmentation of projections in $S$ with $R$'s attributes is done to ensure that these attributes are not lost. The surrounding of Cartesian products with a projection onto the attributes of the operands and the attributes of $R$ is done to remove $R$'s attributes from one of the operands.

Let us illustrate how the following type $M_2$ selection is transformed. We have constructed $q$, such that $q(R) \equiv \sigma[Title = Loans.Title]R$:

$\sigma[Name \in \Pi[Name]\,\sigma[Title = Loans.Title]\,Reservations]\,Loans$
$\rightarrow \sigma[Name \in \Pi[Name]\,q(Reservations)]\,Loans$
$\rightarrow \Pi[Loans]\,\sigma[Loans.Name = Reservations.Name]\,q(Loans \times Reservations)$
$\rightarrow \Pi[Loans](Loans \bowtie Reservations)$.
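The effect of rule (4) on this example can be checked on a small instance. The following is a sketch under stated assumptions: Loans and Reservations are hypothetical sets of (Name, Title) pairs, `nested` evaluates the inner block once per Loans tuple, and `flat` follows the right-hand side of (4) step by step (product, the mapping $q$, the selection $E = A$, and the projection onto the Loans attributes).

```python
from itertools import product

# Hypothetical toy instances; tuples are (Name, Title).
loans = {("Ann", "Dune"), ("Bob", "Dune"), ("Bob", "Emma")}
reservations = {("Ann", "Dune"), ("Cy", "Emma"), ("Bob", "Ulysses")}


def nested(loans, reservations):
    """Nested form: Name IN (names reserving the same Title as this loan)."""
    selected = set()
    for (l_name, l_title) in loans:
        names = {r_name for (r_name, r_title) in reservations
                 if r_title == l_title}       # inner block, per Loans tuple
        if l_name in names:
            selected.add((l_name, l_title))
    return selected


def flat(loans, reservations):
    """Flat form obtained from the right-hand side of rule (4)."""
    combined = product(loans, reservations)                # Loans x Reservations
    q = [(l, r) for (l, r) in combined if r[1] == l[1]]    # q: Title = Loans.Title
    matched = [(l, r) for (l, r) in q if l[0] == r[0]]     # selection E = A
    return {l for (l, _r) in matched}                      # project onto Loans


nested_result = nested(loans, reservations)
flat_result = flat(loans, reservations)
```

On this instance both forms return the single loan that is matched by a reservation with the same name and title, which is also what the natural join in the last step of the derivation yields.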

Note that type $M_3$ and $M_4$ selections are transformed into nested comparison selections by rule (4).

The following slightly modified version of (4) can be used to remove nesting from comparison selections. We presume the same preprocessing with respect to projections as described above:

$\sigma[E \; Op \; \Pi[A]\,q(S_1, \ldots, S_n)]R \equiv \Pi[R](\sigma[E \; Op \; A]\,q(R \times S_1, \ldots, R \times S_n))$. (5)

(5) is identical to (4) except that in (4) the set-membership operator, $\in$, is replaced by $=$ in the flat selection, whereas in (5) the comparison operator $Op$ is copied from the nested selection to the flat selection.

Note that type $C_1$ selections are transformed into type $C_2$ selections by rule (5). Type $C_3$ selections must be changed into $C_2$ selections before (5) is applied, because rule (5) is not designed to transform type $C_3$ selections.
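Rule (5) can be illustrated in the same way with the earlier selection of loans whose book has been borrowed at most 50 times. This is a sketch under stated assumptions: the relations are hypothetical toy sets, the comparison operator is passed in as a Python function from the standard `operator` module, and the inner block is assumed to return exactly one tuple per Loans tuple because Title identifies a book.

```python
import operator
from itertools import product

# Hypothetical toy instances.
books = {("Dune", 70), ("Emma", 12)}            # (Title, NoOfLoans)
loans = {("Ann", "Dune"), ("Bob", "Emma")}      # (Name, Title)


def nested(loans, books, e, op):
    """Nested form: e Op (SELECT NoOfLoans FROM Books WHERE Title = Loans.Title)."""
    out = set()
    for loan in loans:
        inner = [n for (title, n) in books if title == loan[1]]
        # The comparison is defined only for a single-tuple inner block.
        if len(inner) == 1 and op(e, inner[0]):
            out.add(loan)
    return out


def flat(loans, books, e, op):
    """Flat form from the right-hand side of rule (5)."""
    q = [(loan, book) for loan, book in product(loans, books)
         if book[0] == loan[1]]                            # q: Title = Loans.Title
    return {loan for loan, book in q if op(e, book[1])}    # E Op A, project onto Loans


nested_result = nested(loans, books, 50, operator.ge)
flat_result = flat(loans, books, 50, operator.ge)
```

The two forms agree here because every loan matches exactly one book. If an inner block could return several tuples, the flat form would select a loan as soon as one of them satisfied the comparison, which is why the single-tuple requirement on the inner block matters.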

LEMMA

tuples

We

PROOF.

algebra

prove

Note

by rule

Because

., S.) E S, then q(R x S1,..., their induced S tuples.

R x S.)

Lemma

structure

1 by induction

on the

that

type

Cl

(5). C~ selections rule

(5) is

contains

all R

of relational

queries.

Basis. If S is a base relation, then tuples combined with induced S tuples. Induction. expression

q(R

x S)

obviously

contains

all

R

In each of the following cases, we assume that the query mappings q ~ and q~ have been successfully applied to Z’l and T2, ACM

Transactions

on Database

Systems,

Vol. 20, No. 2, June

1995

respectively: $\{T_{11},\dots,T_{1i}\}$ is the set of base relations in $T_1$, and $\{T_{21},\dots,T_{2j}\}$ is the set of base relations in $T_2$.

Projection. If $S \equiv \Pi[A]T_1$, then $q(R \times T_{11},\dots,R \times T_{1i}) = \Pi[A]q_1(R \times T_{11},\dots,R \times T_{1i})$ contains all $R$ tuples combined with their induced $S$ tuples.

Selection. If $S \equiv \sigma[P]T_1$, then $q(R \times T_{11},\dots,R \times T_{1i}) = \sigma[P]q_1(R \times T_{11},\dots,R \times T_{1i})$ contains all $R$ tuples combined with their induced $S$ tuples.

Cartesian product. If $S \equiv T_1 \times T_2$, then $q(R \times T_{11},\dots,R \times T_{1i}, R \times T_{21},\dots,R \times T_{2j}) = q_1(R \times T_{11},\dots,R \times T_{1i}) \times q_2(R \times T_{21},\dots,R \times T_{2j})$ contains all $R$ tuples combined with their induced $S$ tuples.

Union. If $S \equiv T_1 \cup T_2$, then $q(R \times T_{11},\dots,R \times T_{1i}, R \times T_{21},\dots,R \times T_{2j}) = q_1(R \times T_{11},\dots,R \times T_{1i}) \cup q_2(R \times T_{21},\dots,R \times T_{2j})$ contains all $R$ tuples combined with their induced $S$ tuples.

Set difference. If $S \equiv T_1 \setminus T_2$, then $q(R \times T_{11},\dots,R \times T_{1i}, R \times T_{21},\dots,R \times T_{2j}) = q_1(R \times T_{11},\dots,R \times T_{1i}) \setminus q_2(R \times T_{21},\dots,R \times T_{2j})$ contains all $R$ tuples combined with their induced $S$ tuples.

Given the induction hypothesis, it is clear that the desired tuple set is always maintained. □

LEMMA 2. (4) transforms any set-membership selection, where the set-membership predicate contains no nested queries, into an equivalent flat relational algebra query.

PROOF. Lemma 1 states that $q(R \times S_1,\dots,R \times S_n)$ contains all $R$ tuples combined with their induced $S$ tuples. Therefore, the selection predicate $E = A$ extracts exactly the combination of $R$ tuples and induced $S$ tuples that satisfy $E \in \Pi[A]S$. □

LEMMA 3. (5) transforms any nested comparison selection, where the comparison predicate contains no nested queries, into an equivalent flat relational algebra query.

PROOF. Similar to the proof for Lemma 2. □

5.2 Set-Inclusion Predicates

In this subsection we present a set of transformation rules that can be used to unnest selections with set-inclusion predicates. The rules are inspired by Ceri and Gottlob's [1985] unnesting rules and can be viewed as reformulations of these in terms of relational algebra. We describe the unnesting of set-inclusion selections like $\sigma[\Pi[A_2]T \supseteq \Pi[A_1]S]R$, where $A_1$ is a subset of $S$'s attributes and $A_2$ is a subset of $T$'s attributes. We assume that $S$ is replaced by $q_1(S_1,\dots,S_n)$, where $\{S_1,\dots,S_n\}$ is the set of base relations in $S$ and $q_1$ is constructed such that $q_1(S_1,\dots,S_n) \equiv S$. Furthermore, we assume that $T$ is replaced by $q_2(T_1,\dots,T_m)$, where $\{T_1,\dots,T_m\}$ is the set of base relations in $T$ and $q_2$ is constructed such that $q_2(T_1,\dots,T_m) \equiv T$.

Each $R$ tuple induces a set of tuples from the left-hand side of the selection predicate. The basic idea in our approach is to define a relation, $R_S$, that combines each $R$ tuple with all of the matching $S$ tuples. Also, each $R$ tuple induces a set of tuples from the right-hand side of the selection. We define a relation, $R_T$, that combines each $R$ tuple with all of the matching $T$ tuples.

Before $R_S$ is constructed, all projections in $S$ are augmented with $R$'s attributes, and all multiplications, $S_i \times S_j$, in $S$ are changed into $\Pi[S_i, S_j, R](S_i \times S_j)$, for the same reasons as discussed above. Similarly, $T$ is changed before $R_T$ is constructed.

$$R_S \equiv \Pi[R, A_1]q_1(R \times S_1,\dots,R \times S_n),$$
$$R_T \equiv \Pi[R, A_2]q_2(R \times T_1,\dots,R \times T_m).$$

$R_S$ contains all $R$ tuples combined with their induced $S$ tuples. $R_T$ contains all $R$ tuples combined with their induced $T$ tuples. We use $R_S$ and $R_T$ to compute two sets of $R$ tuples, $R_{left}$ and $R_{right}$:

$$R_{left} \equiv \Pi[R](R_S \setminus R_T),$$
$$R_{right} \equiv \Pi[R](R_T \setminus R_S).$$

$R_{left}$ contains a given $R$ tuple iff it induces at least one tuple in $S$ without inducing the same tuple in $T$. $R_{right}$ contains a given $R$ tuple iff it induces at least one tuple in $T$ without inducing the same tuple in $S$.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a set-inclusion predicate:

$$\sigma[\Pi[A_2]T \supseteq \Pi[A_1]S]R \rightarrow R \setminus R_{left}. \quad (6)$$

(6) is based on the fact that the set of $R$ tuples that does not belong to $R_{left}$ is equivalent to the set of $R$ tuples whose induced left-hand-side tuples are a subset of the induced right-hand-side tuples.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a set equality predicate:

$$\sigma[\Pi[A_2]T = \Pi[A_1]S]R \rightarrow (R \setminus R_{left}) \setminus R_{right}. \quad (7)$$

(7) is based on the fact that the set of $R$ tuples that belongs to neither $R_{left}$ nor $R_{right}$ is equivalent to the set of $R$ tuples whose induced set of left-hand-side tuples is equivalent to the induced set of right-hand-side tuples.

The following transformation rule uses $R_{left}$ and $R_{right}$ to remove one level of nesting from a selection with a proper set-inclusion predicate:

$$\sigma[\Pi[A_2]T \supset \Pi[A_1]S]R \rightarrow R_{right} \setminus R_{left}. \quad (8)$$

(8) is based on the fact that the set of $R_{right}$ tuples that do not belong to $R_{left}$ is equivalent to the set of $R$ tuples whose induced set of left-hand-side tuples is a proper subset of the induced set of right-hand-side tuples.
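Rules (6)-(8) can be tried out on small in-memory relations. The following is a minimal sketch, not the authors' implementation: relations are modeled as Python sets, and the hypothetical callables `s_of` and `t_of` stand in for the correlated subqueries $\Pi[A_1]S$ and $\Pi[A_2]T$ evaluated for one $R$ tuple.

```python
def unnest_inclusion(R, s_of, t_of):
    """Evaluate rules (6)-(8) using only flat set operations.

    R is a set of hashable tuples; s_of(r) and t_of(r) are assumed helpers
    returning the sets that r induces on the S side and the T side.
    """
    # R_S and R_T pair every R tuple with each of its induced elements.
    R_S = {(r, s) for r in R for s in s_of(r)}
    R_T = {(r, t) for r in R for t in t_of(r)}
    # Project the two differences back onto R.
    R_left = {r for (r, _) in R_S - R_T}
    R_right = {r for (r, _) in R_T - R_S}
    return {
        "superset": R - R_left,               # rule (6): T contains S
        "equal": (R - R_left) - R_right,      # rule (7): S equals T
        "proper_superset": R_right - R_left,  # rule (8): T properly contains S
    }
```

An $R$ tuple that induces nothing on either side appears in neither $R_S$ nor $R_T$, so it stays in the results of (6) and (7), which matches the nested predicates on two empty sets.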

The following example illustrates the transformation of a type $I_{\supseteq}$ selection into an equivalent flat query expression. We have constructed $q_1$ and $q_2$ such that

$$q_1(R) = \sigma[\mathit{Name} = \mathit{Borrowers.Name}]R,$$
$$q_2(R) = \sigma[\mathit{NoOfLoans} > 50]R.$$

The transformation is as follows:

$$\sigma[\Pi[\mathit{Title}]\sigma[\mathit{NoOfLoans} > 50]\mathit{Books} \supseteq \Pi[\mathit{Title}]\sigma[\mathit{Name} = \mathit{Borrowers.Name}]\mathit{Loans}]\mathit{Borrowers}$$
$$\rightarrow \sigma[\Pi[\mathit{Title}]q_2(\mathit{Books}) \supseteq \Pi[\mathit{Title}]q_1(\mathit{Loans})]\mathit{Borrowers}$$
$$\rightarrow \mathit{Borrowers} \setminus (\Pi[\mathit{Borrowers}]((\Pi[\mathit{Borrowers}, \mathit{Title}]q_1(\mathit{Borrowers} \times \mathit{Loans})) \setminus (\Pi[\mathit{Borrowers}, \mathit{Title}]q_2(\mathit{Borrowers} \times \mathit{Books}))))$$
$$\rightarrow \mathit{Borrowers} \setminus (\Pi[\mathit{Borrowers}]((\Pi[\mathit{Borrowers}, \mathit{Title}](\sigma[\mathit{Borrowers.Name} = \mathit{Loans.Name}](\mathit{Borrowers} \times \mathit{Loans}))) \setminus (\Pi[\mathit{Borrowers}, \mathit{Title}](\sigma[\mathit{Books.NoOfLoans} > 50](\mathit{Borrowers} \times \mathit{Books}))))).$$

LEMMA 4. (a) $R_S$ contains all $R$ tuples combined with their induced $S$ tuples, and (b) $R_T$ contains all $R$ tuples combined with their induced $T$ tuples.

PROOF. Similar to the proof of Lemma 1. □

LEMMA 5. (a) $R_{left}$ contains all $R$ tuples for which $T \supseteq S$ is false, and (b) $R_{right}$ contains all $R$ tuples for which $S \supseteq T$ is false.

PROOF. Lemma 4 states that $R_S$ contains all $R$ tuples combined with their induced $S$ tuples and that $R_T$ contains all $R$ tuples combined with their induced $T$ tuples. Therefore, $\Pi[R](R_S \setminus R_T)$ contains all $R$ tuples that induce at least one tuple in $S$ without inducing the same tuple in $T$. This proves (a), and the proof for (b) is similar. □

LEMMA 6. (6) transforms any query expression of the form $\sigma[T \supseteq S]R$, where $S$ and $T$ contain no nested query expressions, into an equivalent flat relational algebra expression.

PROOF. Follows from Lemma 5. The set of $R$ tuples that does not belong to $R_{left}$ is equivalent to the set of $R$ tuples whose induced left-hand-side tuples are a subset of the induced right-hand-side tuples. □

LEMMA 7. (7) transforms any query expression of the form $\sigma[S = T]R$, where $S$ and $T$ contain no nested queries, into an equivalent flat relational algebra expression.

PROOF. (7) is derived from the equivalence $\sigma[S = T]R \equiv \sigma[T \supseteq S]R \cap \sigma[S \supseteq T]R$. Lemma 6 states that $\sigma[S = T]R \equiv (R \setminus R_{left}) \cap (R \setminus R_{right})$. But, then, $\sigma[S = T]R \equiv (R \setminus R_{left}) \setminus R_{right}$. □

LEMMA 8. (8) transforms any query expression of the form $\sigma[T \supset S]R$, where $S$ and $T$ contain no nested queries, into an equivalent flat relational algebra expression.

PROOF. (8) is derived from the equivalence $\sigma[T \supset S]R \equiv (\sigma[T \supseteq S]R) \setminus (\sigma[S = T]R)$. Lemmas 6 and 7 imply that $\sigma[T \supset S]R \equiv (R \setminus R_{left}) \setminus ((R \setminus R_{left}) \setminus R_{right})$. But, then, $\sigma[T \supset S]R \equiv R_{right} \setminus R_{left}$, since $R \supseteq R_{left}$ and $R \supseteq R_{right}$. □

5.3 The Unnesting Algorithm

Algorithm UNNEST uses the transformation rules (4)-(8) to unnest any nested relational algebra expression as defined in Section 2.

Algorithm UNNEST. Unnesting of a nested relational algebra selection, $\sigma[P]R$.

(1) Apply the transformation space reduction rules (1)-(3) to $\sigma[P]R$ until all level 1 predicates are atomic. Change joins into selections, projections, and multiplications.

(2) FOR each atomic, nested subselection on level 1, $\sigma[P_i]R$, DO

(2.1) Change all level 1 multiplications, $S_1 \times S_2$, in $P_i$ to $\Pi[S_1, S_2](S_1 \times S_2)$. Augment all level 1 projections in $P_i$ with $R$'s attributes.

(2.2) Use the relevant unnesting rules (4)-(8) to unnest $\sigma[P_i]R$.

(2.3) Apply UNNEST recursively to the partially transformed query expression.

THEOREM 1. Algorithm UNNEST transforms any nested relational algebra expression into an equivalent flat relational algebra expression.

PROOF. We prove the correctness of algorithm UNNEST by induction on the depth of the nesting in the input query expression.

Basis. The algorithm applies the transformation rules (4)-(8) to remove the atomic occurrences of level 1 nesting. Lemmas 1-8 prove that this will work if there is no level 2 nesting. But the transformation rules only affect level 1 base relations and predicates, and level 2 nesting will thus occur as level 2 nesting in the result. The free references are not affected by the transformation rules.

Induction. The algorithm simply unnests one level at a time, and assuming that $k$ levels have been correctly unnested, the original level $k + 1$ will now be level 1 and will thus be unnested. The algorithm necessarily terminates because there are only a finite number of atomic nesting occurrences. □

5.4 Other Nesting Forms

We have described the unnesting of comparison queries, set-membership queries, and set-inclusion queries. These are, of course, not the only forms of nested queries. The equivalences below indicate how a number of nested queries can be unnested by means of the same method that is used in the algorithm UNNEST. $RevOp$ means the reverse operator of $Op$, where the latter is in $\{=, <, >, \le, \ge\}$. $S(s)\ Op\ T$ means that $S\ Op\ T$ is true for some element in $S$. $S\ Op(s)\ T$ means that $S\ Op\ T$ is true for some element in $T$. $S(a)\ Op\ T$ means that $S\ Op\ T$ is true for all elements in $S$. $S\ Op(a)\ T$ means

that $S\ Op\ T$ is true for all elements in $T$. Such operators are inspired by SQL comparison operators like "> all," "< all," and "= some" [Korth 1991].

In the following equivalent queries, an $R$ tuple is selected iff $S$ and $T$ have at least one common element:

$$\sigma[S\ \mathit{overlaps}\ T]R \equiv \sigma[\neg(S \cap T = \emptyset)]R.$$

In the following equivalent queries, an $R$ tuple is selected iff there exists at least one element in $S$:

$$\sigma[\mathit{exists}\ S]R \equiv \sigma[\neg(S = \emptyset)]R.$$

In the following equivalent queries, an $R$ tuple is selected iff the predicate is true for at least one right-hand-side element:

$$\sigma[E\ Op(s)\ \Pi[A]q(S_1,\dots,S_n)]R \equiv \Pi[R]\sigma[E\ Op\ A]q(R \times S_1,\dots,R \times S_n).$$

In the following equivalent queries, an $R$ tuple is selected iff the predicate is true for all right-hand-side elements:

$$\sigma[E\ Op(a)\ S]R \equiv R \setminus \sigma[E\ RevOp(s)\ S]R.$$

In the following equivalent queries, an $R$ tuple is selected iff there exists at least one combination of a left-hand-side and a right-hand-side element for which the predicate is true:

$$\sigma[\Pi[A_S]q_S(S_1,\dots,S_n)(s)\ Op(s)\ \Pi[A_T]q_T(T_1,\dots,T_m)]R \equiv \Pi[R]\sigma[A_S\ Op\ A_T](q_S(R \times S_1,\dots,R \times S_n) \times q_T(R \times T_1,\dots,R \times T_m)).$$

Notice that $q_S(S_1,\dots,S_n) \equiv S$ and $q_T(T_1,\dots,T_m) \equiv T$.

In the following equivalent queries, an $R$ tuple is selected iff the predicate is true for all combinations of left-hand-side and right-hand-side elements:

$$\sigma[S(a)\ Op(a)\ T]R \equiv \sigma[\mathrm{NOT}(S(s)\ RevOp(s)\ T)]R.$$

In the following equivalent queries, an $R$ tuple is selected iff there exists at least one left-hand-side element for which the predicate is true when combined with any right-hand-side element:

$$\sigma[S(s)\ Op(a)\ T]R \equiv (\sigma[S(s)\ Op(s)\ T]R) \setminus (\sigma[S(s)\ RevOp(s)\ T]R).$$

In the following equivalent queries, an $R$ tuple is selected iff the predicate is true for at least one left-hand-side element:

$$\sigma[S(s) \in T]R \equiv \sigma[S\ \mathit{overlaps}\ T]R.$$

In the following equivalent queries, an $R$ tuple is selected iff the predicate is true for all left-hand-side elements:

$$\sigma[S(a) \in T]R \equiv \sigma[T \supseteq S]R.$$
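The "for all" equivalence $\sigma[E\ Op(a)\ S]R \equiv R \setminus \sigma[E\ RevOp(s)\ S]R$ can be checked on small data. The sketch below assumes that $RevOp$ plays the role of the logical complement of $Op$ (for example, $\ge$ for $<$); that reading, the `COMPLEMENT` table, and the helper names `e_of` and `s_of` are illustrative assumptions rather than definitions taken from the text.

```python
import operator

# Assumed pairing of each comparison with its complement (illustrative).
COMPLEMENT = {
    operator.lt: operator.ge, operator.ge: operator.lt,
    operator.gt: operator.le, operator.le: operator.gt,
    operator.eq: operator.ne, operator.ne: operator.eq,
}

def select_some(R, e_of, s_of, op):
    """R tuples for which e_of(r) op s holds for at least one s in s_of(r)."""
    return {r for r in R if any(op(e_of(r), s) for s in s_of(r))}

def select_all_unnested(R, e_of, s_of, op):
    """'For all' selection written as a set difference over a 'some' selection."""
    return R - select_some(R, e_of, s_of, COMPLEMENT[op])
```

A tuple with an empty induced set is never picked by the "some" selection, so it survives the difference, which agrees with universal quantification over an empty set.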

6. INCREMENTAL QUERY COMPUTATION

So far we have described the unnesting of nested relational algebra expressions. In the remaining part of the paper, we focus on incremental computation of unnested queries. We can, therefore, focus on queries where selection/join predicates are simple comparison predicates that contain no subqueries. Consequently, we can utilize existing incremental algorithms for incrementally computing SPJ queries [Roussopoulos 1991]. As demonstrated in Section 5, nested queries can be reformulated in terms of flat selections, projections, flat joins, and set differences. In this and the following sections, we show how to efficiently compute relational set differences incrementally by means of view pointer caches. When combined with Roussopoulos's incremental SPJ algorithms, our incremental set-difference algorithm makes it possible to compute unnested queries incrementally.

A query expression can be computed by means of recomputation, or it can be computed incrementally. The basic idea of recomputation is to construct a query expression from scratch each time [Jarke and Koch 1984; Smith and Chang 1975; Ullman 1989]. Access paths may be reused, but the results of old queries are not utilized. The basic idea of incremental computation is to store the results of old queries in persistent caches and to reuse them when similar queries are computed [Blakeley et al. 1986; Qian and Wiederhold 1991; Roussopoulos 1991]. Instead of evaluating everything from scratch again, the intermediate changes are used to modify the cached query results. There must be some sort of repeated query pattern in order for the cached query results to be useful. Usually, it is assumed that a given query expression is computed repeatedly or that a set of query expressions shares one or more common subexpressions. When a query expression is used in a view definition, it must be computed each time a reference is made to the view.

When queries are formulated in terms of selections, projections, joins, set differences, and set unions, it is possible to identify fairly simple update rules for old query results [Blakeley et al. 1986; Bækgaard 1993; Qian and Wiederhold 1991; Roussopoulos 1991]. Incremental computation is based on the notion of change sets as defined in Definition 1:

Definition 1. $R_I$ and $R_D$ are minimal insertion and deletion sets for $R$ iff the following is true for $t_1 < t_2$:

$$R[t_2] \equiv (R[t_1] \setminus R_D) \cup R_I, \quad (a)$$
$$R_I \cap R_D = \emptyset, \quad (b)$$
$$R_I \cap R[t_1] = \emptyset, \quad (c)$$
$$R_D \setminus R[t_1] = \emptyset. \quad (d)$$

$R_I$ contains the intermediate insertions made to $R$, and $R_D$ contains the intermediate deletions made to $R$. Eq. (a) defines the incremental computation of $R$. It states that the intermediate deletions, $R_D$, must be removed from $R$ and that the intermediate insertions, $R_I$, must be added to $R$. Eq. (b) states that there must be no cross-references between $R_I$ and $R_D$. Eq. (c)

states that $R_I$ must contain no false insertions. Eq. (d) states that $R_D$ must contain no false deletions. In Section 7 we show how to construct minimal change sets.

[Fig. 1. Change-set propagation for set differences.]

Figure 1 illustrates how $(R \setminus S)_I$ and $(R \setminus S)_D$ can be computed in terms of $R_I$, $R_D$, $S_I$, and $S_D$. The figure is related directly to our incremental algorithms in Section 8. All subsets marked by "+" belong to $(R \setminus S)_I$, and all subsets marked by "-" belong to $(R \setminus S)_D$. Eq. (9)-(11) express Figure 1 in terms of relational algebra:

$$((R \setminus R_D) \cup R_I) \setminus ((S \setminus S_D) \cup S_I) = ((R \setminus S) \setminus (R \setminus S)_D) \cup (R \setminus S)_I, \quad (9)$$
$$(R \setminus S)_I = (R_I \setminus ((S \setminus S_D) \cup S_I)) \cup (S_D \cap ((R \setminus R_D) \cup R_I)), \quad (10)$$
$$(R \setminus S)_D = (R \setminus S) \cap (S_I \cup R_D). \quad (11)$$
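Definition 1 and Eq. (9)-(11) translate almost literally into set operations. The following is a minimal sketch that models relations as Python sets of whole tuples; the function names are illustrative and not taken from the paper.

```python
def is_minimal(R_old, R_new, R_I, R_D):
    """Check conditions (a)-(d) of Definition 1 for one relation."""
    return (R_new == (R_old - R_D) | R_I   # (a) incremental computation
            and not (R_I & R_D)            # (b) no cross-references
            and not (R_I & R_old)          # (c) no false insertions
            and not (R_D - R_old))         # (d) no false deletions

def propagate_difference(R, S, R_I, R_D, S_I, S_D):
    """Return the insertion and deletion sets of R minus S per Eq. (10), (11).

    R and S are the old states; all four change sets are assumed minimal.
    """
    R_new = (R - R_D) | R_I
    S_new = (S - S_D) | S_I
    inserted = (R_I - S_new) | (S_D & R_new)   # Eq. (10)
    deleted = (R - S) & (S_I | R_D)            # Eq. (11)
    return inserted, deleted
```

Applying the two returned sets to the old result, as in the right-hand side of Eq. (9), gives the same set as recomputing the difference on the new states.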

Eq. (9)-(11) and Figure 1 presume that $R_I$ and $R_D$ are minimal insertion and deletion sets as defined in Definition 1.

Roussopoulos [1991] demonstrated how view pointer caches can be used to store a set of pointers identifying old query results. Each time a given query expression is computed, the corresponding view pointer caches are updated to reflect the effects of intermediate changes made to the underlying database. Figure 2 illustrates a typical view pointer cache for a set difference. $R$ and $S$ are base relations. $r_1$, $r_2$, $r_3$, $r_4$, $r_5$, $s_1$, $s_2$, $s_3$, $s_4$, and $s_5$ are tuple identifiers. The arrows symbolize page identifiers for disk pages containing tuples in $R \setminus S$. Figure 2 shows that the query $R \setminus S$ contains the $R$ tuples

with tuple identifiers $r_1$, $r_4$, and $r_5$. Also, Figure 2 shows that the cache for $R \setminus S$ contains page identifiers for the tuples. Consequently, $R \setminus S$ can be constructed via a sequential scan of the cache for $R \setminus S$.

[Fig. 2. Sample view pointer cache: The arrows symbolize page identifiers. $R$ holds the tuples A, B, C, D, E with identifiers $r_1$ to $r_5$; $S$ holds the tuples B, C, F, G, H with identifiers $s_1$ to $s_5$.]

Results of complex query expressions, that is, expressions that are composed by more than one relational operator, are stored as multilevel cache structures. Figure 3 illustrates a cache structure for an unnested version of the following query:

$$\sigma[(\Pi[T.c]\sigma[T.a = S.a]T) \supseteq (\Pi[R.c]\sigma[R.a = S.a]R)]S. \quad (12)$$

The unnested version of the query has the following form:

$$S \setminus \Pi[S]((\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T))). \quad (13)$$

The nodes labeled $R$, $S$, and $T$ represent base relations with schemata $R(a{:}d_a, c{:}d_c)$, $S(a{:}d_a, b{:}d_b)$, and $T(a{:}d_a, c{:}d_c)$. Each other node represents a relational operator and contains a pointer list that identifies the result of the query defined by the operator.

When the cache structure in Figure 3 is updated incrementally, the intermediate changes made to $R$ and $S$ are used to update the caches for $R \bowtie S$ and $S \bowtie T$. The changes made to these caches are propagated and used to update the caches for the projections. The changes made to these are then propagated and used to update the cache for $(\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T))$.

[Fig. 3. Cache structure for unnested query.]

Finally, when the cache for $S \setminus \Pi[S]((\Pi[S, R.c](S \bowtie R)) \setminus (\Pi[S, T.c](S \bowtie T)))$ has been updated, it is materialized, and the result is displayed.

In general, there is a view pointer cache per operator in a relational algebra expression. An incremental algorithm is associated with each relational algebra operator, and when a request is made to a view, the algorithm corresponding to the defining operator is executed. For the requested query, the algorithm updates the corresponding view pointer cache and materializes it. For subqueries, the algorithm updates the corresponding view pointer cache, propagates changes to higher-level queries, and materializes the view pointer cache.

7. DATA STRUCTURES

Three data structures are used as the basis for incremental computation. First, changes made to base relations are time-stamped and stored on differential files. There is one differential file per base relation. Second, a view pointer cache is stored on a file that contains the appropriate tuple identifiers and page identifiers. Third, a change set is stored as a change file, that is, an

extraction from a set of differential files that contain the changes made to a query result within a certain period of time.

7.1 Differential Files

All tuples, including base relation tuples, are augmented with a system-generated and maintained tuple identifier that is globally unique. Tuple identifiers are never modified. All changes made to a base relation, $R$, are stored on a differential file [Severance and Lohman 1976], $R^{\delta}$, where they are kept until all relevant queries have been updated. Differential files have the following attributes:

Name:    TID'   Time        TID    PID   Operator     Data
Domain:  Surr   TimePoint   Surr   Ptr   {INS, DEL}   Tuple

The attribute TID' contains the tuple identifier of the differential file tuple. The attribute Time contains a time stamp that identifies the time of the modification. The attribute TID contains the tuple identifier of the modified base relation tuple. The attribute PID contains the (physical) page identifier of the modified base relation tuple. The attribute Operator contains INS for insertions and DEL for deletions. The attribute Data contains the actual content of the tuple after the modification. A tuple update is modeled as a deletion/insertion pair with identical time stamps.

7.2 View Pointer Caches

A view pointer cache is a file with sequential access that identifies the set of tuples in a query result at a given point in time [Roussopoulos 1991]. Figure 2 illustrates a typical view pointer cache for a set difference. A unary view pointer cache has the following attributes:

Name:    TID'   TID    PID
Domain:  Surr   Surr   Ptr

It can be used to identify the tuples corresponding to a query if each tuple in the query can be created from one operand tuple. This is true for queries defined by operators like selection, projection, and set difference. There is one cache tuple for each tuple in the corresponding query result, and its attributes are interpreted as follows: TID' is a tuple identifier for the cache tuple, and TID and PID contain a tuple identifier and a page identifier for the corresponding operand tuple.

A binary view pointer cache has the following attributes:

Name:    TID'   TID1   PID1   TID2   PID2
Domain:  Surr   Surr   Ptr    Surr   Ptr

It can be used to cache the result of binary operators like Cartesian product and join where the resulting tuples are created as a combination of tuples from both operands. There is one cache tuple for each tuple in the corresponding query, and its attributes are interpreted as follows: TID' contains a tuple identifier for the cache tuple, and TID1, PID1, TID2, and PID2 contain the tuple and page identifiers of the relevant operand tuples.

Entries on a unary view pointer cache are stored on a list of secondary memory pages that are accessed sequentially. For any pair of entries, if $(tid'_b, tid_b, pid_b)$ is located on a page preceded by a page of $(tid'_a, tid_a, pid_a)$, then $pid_a \le pid_b$. This ensures that the query can be materialized without reading any page more than once [Roussopoulos 1991]. Furthermore, if $pid_a = pid_b$, then $tid_a < tid_b$. This ensures that the view pointer cache can be updated by merging the cache and the change files if these are sorted on (PID, TID).

Entries on a binary view pointer cache are stored on a list of secondary memory pages that are accessed sequentially. For any pair of entries, if $(tid'_b, tid1_b, pid1_b, tid2_b, pid2_b)$ is located on a page preceded by the page of $(tid'_a, tid1_a, pid1_a, tid2_a, pid2_a)$, then $pid1_a \le pid1_b$. Furthermore, if $pid1_a = pid1_b$, then $pid2_a \le pid2_b$. This ensures that the view pointer cache can be materialized by a suboptimal number of page fetches [Roussopoulos 1991].

Insertions can make it necessary to split an overfull cache page into two pages, and deletions can make it necessary to combine two or more sparse pages into one. In both cases the changes must be propagated to higher-level view pointer caches in order to update the tuple and page identifiers on these.

7.3 Change Files

Change files for base relations can be constructed by Algorithm CFC, as described in this subsection. At least two strategies can be used to construct change sets for complex query expressions. Qian and Wiederhold [1991] developed an algorithm that takes a relational algebra query as input and generates two queries that define the corresponding insertion and deletion sets. Inspired by Blakeley et al. [1986], we use a change propagation method to extract change sets from differential files. Briefly, the change file corresponding to a cache node defined by a relational operator is extracted from the deletion and insertion set(s) corresponding to the operand(s). A change-file tuple has the following attributes:

Name:    TID    PID   Data
Domain:  Surr   Ptr   Tuple

The semantics of a change-file tuple is that the tuple described by (TID, PID, Data) has been modified. A change-file tuple describes a deletion if it belongs to a deletion set, and it describes an insertion if it belongs to an insertion set. Change-file tuples do not have tuple identifiers. Algorithm CFC constructs deletion and insertion sets for base relations.

Algorithm CFC. Change file construction.

(1) $R'^{\delta} \leftarrow \sigma[(\mathit{Time} > t_1)\ \mathrm{AND}\ (\mathit{Time} \le t_2)]R^{\delta}$.

(2) Sort $R'^{\delta}$ on (TID, Time).

(3) Scan $R'^{\delta}$, and do the following for each modification sequence that refers to a given TID. For modifications, the deletion precedes the insertion:

(3.1) If the first tuple is a deletion, then add (TID, PID, Data) to $R_D$.

(3.2) If the last tuple is an insertion, then add (TID, PID, Data) to $R_I$.

(3.3) If a selected deletion/insertion pair has identical data parts, then skip it.
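The three steps of Algorithm CFC can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the differential file is a Python list of `(time, tid, pid, op, data)` rows, and the function name `build_change_sets` is hypothetical.

```python
from itertools import groupby

def build_change_sets(diff_file, t1, t2):
    """Build (R_D, R_I) from differential-file rows (time, tid, pid, op, data).

    op is "INS" or "DEL"; an update is a DEL/INS pair with equal time stamps.
    """
    # Step (1): keep the modifications made in the interval (t1, t2].
    window = [row for row in diff_file if t1 < row[0] <= t2]
    # Step (2): sort on (TID, Time); at equal time stamps DEL precedes INS.
    window.sort(key=lambda row: (row[1], row[0], row[3] != "DEL"))
    R_D, R_I = [], []
    # Step (3): handle the modification sequence of each TID.
    for tid, rows in groupby(window, key=lambda row: row[1]):
        rows = list(rows)
        first, last = rows[0], rows[-1]
        starts_with_delete = first[3] == "DEL"
        ends_with_insert = last[3] == "INS"
        # Step (3.3): a deletion/insertion pair with identical data is a no-op.
        if starts_with_delete and ends_with_insert and first[4] == last[4]:
            continue
        if starts_with_delete:                       # step (3.1)
            R_D.append((tid, first[2], first[4]))
        if ends_with_insert:                         # step (3.2)
            R_I.append((tid, last[2], last[4]))
    return R_D, R_I
```

A tuple that is inserted and later deleted inside the interval contributes to neither set, so only modifications with a visible net effect are reported.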

A change set constructed by Algorithm CFC is minimal relative to the time interval in the sense that it contains only one modification per tuple and that all of its modifications have a visible effect.

Step (3) of Algorithm CFC is motivated by the fact that the sequence of insertions and deletions that have been made to a given tuple can be compressed as described below. The five left-hand sides cover all possible change patterns. Each of the four right-hand sides shows the compressed sequence that has the same net effect as the corresponding left-hand side. The function Compress is defined as follows. Similar compression rules can be found in Hanson [1992]:

$$\mathit{Compress}(DEL_x \dots INS_x) \equiv \mathit{nil},$$
$$\mathit{Compress}(DEL_x \dots INS_y) \equiv DEL_x\ INS_y,$$
$$\mathit{Compress}(INS_x \dots INS_y) \equiv INS_y,$$
$$\mathit{Compress}(INS_x \dots DEL_y) \equiv \mathit{nil},$$
$$\mathit{Compress}(DEL_x \dots DEL_y) \equiv DEL_x.$$

8. INCREMENTAL ALGORITHMS

In this section we describe an incremental set-difference algorithm. In order to compute the efficiency of the incremental algorithm, we also describe a sort-merge-based recomputation algorithm. Furthermore, we describe Roussopoulos's [1991] incremental SPJ algorithms. The incremental algorithms use view pointer caches to store pointers to old query results. We illustrate how change sets are computed and propagated to higher-level queries. We describe incremental algorithms that maintain the cache, that propagate changes to higher-level queries, and that materialize the cache. We use the symbol $R^C$ to denote the cache corresponding to $R$.

Smith and Chang [1975] described four recomputation algorithms for the computation of set differences. Their algorithms are based on various assumptions about sorting and indexing, and the following algorithm, SMD, is identical to their sort-merge algorithm. Note that the sort-merge method always works for the set-difference operator. This is not the case for the join operator [Gardarin and Valduriez 1989]:

Algorithm SMD. Sort-merge computation of set differences: $R \setminus S$.

(1) Sort $R$ and $S$.

(2) Scan $R$ and $S$ in parallel, and do the following:

(2.1) Display $R$ tuples that are not found on $S$.

The next two algorithms and cost models are adapted from Roussopoulos [1991]. The algorithm IS uses a unary view pointer cache to handle incremental computation of selections. The algorithm IJ uses a binary view pointer cache to handle incremental computation of joins. Roussopoulos suggested

handling projections as selections and removing duplicates during materialization.

IS is based on the following formula for the incremental computation of selections [Blakeley et al. 1986; Roussopoulos 1991]:

$$\sigma[P]((R_{old} \setminus R_D) \cup R_I) \equiv ((\sigma[P]R_{old}) \setminus R_D) \cup \sigma[P]R_I.$$

Algorithm IS. Incremental computation of selection: $\sigma[P]R$.

(1) $I \leftarrow \sigma[P]R_I$.

(2) Scan $(\sigma[P]R)^C[t_1]$. Let the current page be $p$.

(2.1) Remove $R_D$ tuples from $p$. Propagate to $(\sigma[P]R)_D(t_1, t_2]$.

(2.2) Add $I$ tuples to $p$. Propagate to $(\sigma[P]R)_I(t_1, t_2]$.

(2.3) If all entries on $p$ have been processed, then materialize, split if necessary, and rewrite $p$.

The notation used in the algorithm is interpreted as follows: When tuples are added to the cache, it is the pointers that are added, and a tuple identifier for the new cache tuple is generated. When tuples are deleted from the cache, it is implicitly assumed that only existing cache tuples are deleted. When a change-set tuple, (PID, TID, Data), is propagated, the tuple (TID', $p_n$, Data) is propagated if the tuple (TID', TID, PID) is deleted from or inserted into a cache page with page number $p_n$. "$\leftarrow$" is an assignment operator.

IJ is based on the following formula for incremental computation of joins [Blakeley et al. 1986; Roussopoulos 1991]:

$$((R_{old} \setminus R_D) \cup R_I) \bowtie[P] ((S_{old} \setminus S_D) \cup S_I) \equiv ((R_{old} \bowtie[P] S_{old}) \setminus ((R_D \bowtie[P] S_{old}) \cup (R_{old} \bowtie[P] S_D))) \cup ((R_I \bowtie[P] S_{new}) \cup (R_{new} \bowtie[P] S_I)).$$

Algorithm IJ. Incremental computation of join: $R \bowtie[P] S$.

(1) $I \leftarrow (((R \setminus R_D) \cup R_I) \bowtie[P] S_I) \cup (R_I \bowtie[P] ((S \setminus S_D) \cup S_I))$.

(2) Scan $(R \bowtie[P] S)^C[t_1]$. Let the current page be $p$.

(2.1) Remove $R_D$ and $S_D$ tuples from $p$. Propagate to $(R \bowtie[P] S)_D(t_1, t_2]$.

(2.2) Add $I$ tuples to $p$. Propagate to $(R \bowtie[P] S)_I(t_1, t_2]$.

(2.3) If all entries on $p$ have been processed, then materialize, split if necessary, and rewrite $p$.

In SPJ queries, selections, projections, and joins can be computed independently because the project operator distributes over selections and joins. This makes it possible to remove duplicates during materialization. The project operator does not distribute over the set-difference operator. The consequence is that management of duplicates must be done continuously. It can be done for the projection caches, but this solution requires bookkeeping that cannot be done without extension to the view pointer cache data structures.
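The two update formulas behind IS and IJ can be exercised on in-memory sets. The sketch below ignores pages, pointers, and propagation, and only checks the algebra; the function names are illustrative, and all change sets are assumed to be minimal.

```python
def join(R, S, pred):
    """Theta-join of two sets, returned as a set of (r, s) pairs."""
    return {(r, s) for r in R for s in S if pred(r, s)}

def incremental_select(cache, pred, R_I, R_D):
    """Update a cached selection: drop deleted tuples, add qualifying insertions."""
    return (cache - R_D) | {r for r in R_I if pred(r)}

def incremental_join(cache, pred, R_old, S_old, R_I, R_D, S_I, S_D):
    """Update a cached join from the change sets of both operands."""
    R_new = (R_old - R_D) | R_I
    S_new = (S_old - S_D) | S_I
    removed = join(R_D, S_old, pred) | join(R_old, S_D, pred)
    added = join(R_I, S_new, pred) | join(R_new, S_I, pred)
    return (cache - removed) | added
```

Only the change sets are joined against full operands, which is where the saving over recomputation comes from when the change sets are small.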

Our solution is to handle combined projections and set differences in terms of the generalized set-difference operator. Our incremental set-difference algorithm maintains and materializes a view pointer cache for the generalized set-difference operator.

The incremental set-difference algorithm, Algorithm ID, is based on the following observations regarding change propagation. We assume that $R$ and $S$ have attributes $(a_1, \ldots, a_m)$ and that the content of a change-file tuple is described by $(v_1, \ldots, v_m)$. We also assume that the query expression $R \setminus [a_1, \ldots, a_m] S$ was computed at time $t_1$ and is being computed again at time $t_2$, according to the intermediate changes made to $R$ and $S$. Figure 1 refers to the tuple identifiers and data parts of the queries and change sets involved, and it presumes that the change sets are minimal. First, there is no overlap between corresponding insertion and deletion sets. Second, the deletion sets are contained in the corresponding queries. Third, the insertion sets and the corresponding queries are disjoint. This ensures that the propagated change sets are minimal.

Algorithm ID. Incremental computation of $R \setminus [a_1, \ldots, a_m] S$.

(1) $INS_1 \leftarrow R_I \setminus [a_1, \ldots, a_m]\, S[t_2]$.

(2.1) $DEL \leftarrow \Pi_R(S_I \bowtie_{[a_1, \ldots, a_m]} R[t_2])$.

(2.2) $INS_2 \leftarrow \Pi_R(S_D \bowtie_{[a_1, \ldots, a_m]} R[t_2])$.

(3) Sort $R_D$, $INS_1$, $INS_2$, and $DEL$ on (PID, TID).

(4) Scan $(R \setminus [a_1, \ldots, a_m] S)^C[t_1]$, $R_D$, $INS_1$, $INS_2$, and $DEL$. Let the current cache page be p.

(4.1) Remove $R_D$ tuples from p. Propagate to $(R \setminus [a_1, \ldots, a_m] S)^D (t_1, t_2]$.

(4.2) Remove $DEL$ tuples from p. Propagate to $(R \setminus [a_1, \ldots, a_m] S)^D (t_1, t_2]$.

(4.3) Add $INS_1$ tuples to p. Propagate to $(R \setminus [a_1, \ldots, a_m] S)^I (t_1, t_2]$.

(4.4) Add $INS_2$ tuples to p. Propagate to $(R \setminus [a_1, \ldots, a_m] S)^I (t_1, t_2]$.

(4.5) If all entries on p have been processed, then materialize, split if necessary, and rewrite p.
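The set-level logic of Algorithm ID can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: relations are modeled as dictionaries from a tuple identifier to its data part, the pointer cache is a plain set of R tuple identifiers, and the sorted, page-wise merge scan of steps (3) and (4) is replaced by in-memory set operations. The function names `diff` and `incremental_diff` and all sample data are hypothetical. The sketch also assumes minimal change sets and that S holds no two tuples with the same data part.

```python
def diff(R, S):
    """Full computation: TIDs of R tuples whose data part matches no S tuple."""
    s_data = set(S.values())
    return {tid for tid, data in R.items() if data not in s_data}


def incremental_diff(cache, R_new, S_new, R_I, R_D, S_I, S_D):
    """Update the cached result of the set difference from t1 to t2.

    cache        : set of R TIDs in the result at t1 (stands in for the pointer cache)
    R_new, S_new : relation states at t2, as {tid: data}
    R_I, R_D     : tuples inserted into / deleted from R, as {tid: data}
    S_I, S_D     : tuples inserted into / deleted from S, as {tid: data}
    Returns (new_cache, propagated_insertions, propagated_deletions).
    """
    s_new_data = set(S_new.values())
    s_i_data = set(S_I.values())
    s_d_data = set(S_D.values())

    # Step (1): inserted R tuples that match nothing in S[t2].
    ins1 = {tid for tid, d in R_I.items() if d not in s_new_data}
    # Step (2.1): R[t2] tuples matching inserted S tuples (potential deletions).
    dele = {tid for tid, d in R_new.items() if d in s_i_data}
    # Step (2.2): R[t2] tuples matching deleted S tuples (must be added).
    ins2 = {tid for tid, d in R_new.items() if d in s_d_data}

    # Steps (3)-(4), simplified: apply the changes with set operations
    # instead of a sorted merge scan over cache pages.
    removed = (set(R_D) | dele) & cache
    added = (ins1 | ins2) - cache
    return (cache - removed) | added, added, removed


# State at t1 and the cached result.
R_old = {1: ("a",), 2: ("b",), 3: ("c",)}
S_old = {10: ("b",), 11: ("d",)}
cache = diff(R_old, S_old)

# Changes made during (t1, t2].
R_I, R_D = {4: ("d",), 5: ("e",)}, {1: ("a",)}
S_I, S_D = {12: ("c",)}, {10: ("b",)}

# State at t2.
R_new = {t: d for t, d in {**R_old, **R_I}.items() if t not in R_D}
S_new = {t: d for t, d in {**S_old, **S_I}.items() if t not in S_D}

new_cache, added, removed = incremental_diff(cache, R_new, S_new, R_I, R_D, S_I, S_D)
print(sorted(new_cache), sorted(added), sorted(removed))
```

In this example the incremental result agrees with recomputing `diff(R_new, S_new)` from scratch. The `& cache` and `- cache` filters reflect that DEL holds only potential deletions and that only actual changes are propagated.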

Step (1) subtracts $S[t_2]$ from $R_I$, giving the subset of $R_I$ that must be added to $R \setminus [a_1, \ldots, a_m] S$. Step (2) computes the intersection between $S_D$ and $R[t_2]$ (giving the subset of $S_D$ that must be added to $R \setminus [a_1, \ldots, a_m] S$) and the intersection between $S_I$ and $R[t_2]$ (giving the subset of $S_I$ that contains potential deletions from $R \setminus [a_1, \ldots, a_m] S$). In order to facilitate a merge update of the cache, the pointers on $S_D$ and $S_I$ are replaced by corresponding $R$ pointers. Step (4) scans $(R \setminus [a_1, \ldots, a_m] S)^C[t_1]$, $R_D$, $INS_1$, $INS_2$, and $DEL$ in parallel. The cache organization and the sortings in step (3) ensure that no pages from the scanned files have to be fetched more than once. During each iteration of the scan, only hitherto unprocessed TIDs that are relevant for all five pages are processed. Specifically, in each iteration the largest (PID, TID) pair on each of the five pages is found. Among these, the smallest, (pid, tid), is identified. Only hitherto unprocessed entries up to (pid, tid) are processed. At the end of each iteration, the next page is fetched on a scanned file if all entries on its current page have been processed. This is necessarily true for at least one file.

9. COST ANALYSIS AND COMPUTATIONS

In this section we analyze the cost efficiency of incremental computation of unnested versions of nested relational algebra queries. First, we analyze the set-difference operation in isolation. Next, we analyze three computation strategies for a class of nested queries: sort–merge computation of the nested version, sort–merge computation of the unnested version, and incremental computation of the unnested version.

9.1 Cost Models

For each algorithm we describe an I/O-based cost model that can be used to estimate the number of necessary references to secondary memory pages. Thus, we do not consider the algorithms' consumption of CPU time. For all algorithms we have excluded the cost of writing the result since this cost is exactly the same in all cases. Our cost model notation is summarized in Table III. Our cost models for the incremental algorithms presume that the operands are base relations. Only minor changes are needed in order to handle the situations where one or both operands are defined by complex query expressions. The cost of change file construction can be estimated by means of the following cost model:

c ~*~

= 46,

if O