Safety and Translation of Calculus Queries with Scalar Functions* (Extended Abstract)

Martha Escobar-Molano, Richard Hull, and Dean Jacobs
Computer Science Department
University of Southern California
Los Angeles, CA 90089-0782 USA
{marthae,hull,jacobs}@pollux.usc.edu

Abstract

For a comprehensive database programming language, it is desirable to permit as much communication as possible between the underlying programming language and the database access language (here relational). We present here the theoretical framework for supporting scalar functions in relational calculus queries. The notion of embedded allowed (em-allowed) is proposed as a generalization of the notion of allowed of Van Gelder and Topor [GT91] (see also [Top87]), and it is shown that each em-allowed calculus query is embedded domain independent and can be translated into an extension of the relational algebra. To accommodate functions, an extended algebra is used. The translation framework uses a generalization of finiteness dependencies (FinDs) [RBS87]; these are analogous to functional dependencies and carry information about how subformulas involving scalar functions can restrict the possible range of variables. A special family of succinct "reduced" covers for sets of finiteness dependencies is proposed. A number of considerations with regard to applying our framework in practical settings, including efficiency, are discussed.

1 Introduction

An important research goal for the 90's is the development of programming languages which support database functionalities. One approach, illustrated by PASCAL/R [Sch77], is to extend an imperative language (here PASCAL) to incorporate access to a particular database model (here relational). In this approach, it is desirable to permit as much communication as possible between the underlying programming language and the database access language. In particular, it is essential that scalar functions (i.e., interpreted functions that use only atomic values as inputs and outputs), both system-defined and user-defined, can be used inside database queries. While it is relatively straightforward to accommodate this in the context of the relational algebra, accommodating this in the relational calculus is significantly more difficult. Indeed, the calculus sublanguage of PASCAL/R does not permit the use of scalar functions.

We present here the theoretical framework for supporting the flexible use of (total) scalar functions in relational calculus queries. Our framework generalizes previous work on the relational calculus to this context. We use the notion of embedded domain independent (called "bounded depth domain independent" in [AB88]), and generalize the notion of allowed [Top87] to embedded allowed (em-allowed).

To illustrate, consider the following queries:

[Query displays for q1, q2, and q3.]

Queries q1 and q2 are "safe" in our formalism, and can be translated into the (extended) algebra. For example, speaking informally, q1 can be computed by first obtaining R, performing a point-wise evaluation of f for each element of R, performing a second point-wise evaluation of g on these, and finally returning the set of results as the answer. In our extended algebra (a subset of the language Heraclitus[Alg,C] [GHJ92, GHJ93]), point-wise evaluation of scalar functions is accomplished by an extended projection operator, which is analogous to the apply-append operator of the OOAlgebra [Day89]; see also [AB88]. For example,

project([@1, f(@1)], R)

computes the binary relation having one tuple t for each tuple in R, where t(1) is in R and t(2) = f(t(1)). Query q1 is thus equivalent to project([g(f(@1))], R). Also, using a natural selection operator permitting function calls, q2 is equivalent to select({@2 == f(@1)}, R).

Query q3, on the other hand, is not "safe" in our formalism. For one thing, f might map infinitely many values for y to a member of S, in which case the output of the second coordinate of q3 would be infinite. Also, in the context of a conventional programming language, it might be impossible to compute the inverse of f.

*This research was supported in part by NSF grants IRI-9107055 and INT-8817874.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
ACM-PODS-5/93/Washington, D.C.
© 1993 ACM 0-89791-593-3/93/0005/0253...$1.50
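To make the point-wise evaluation concrete, here is a minimal Python sketch that mimics the extended projection and the function-calling selection over in-memory sets of tuples. It is not Heraclitus[Alg,C] syntax: the helpers `col`, `project`, and `select`, the scalar functions `f` and `g`, and the relations `R1` and `R2` are all hypothetical, chosen only so that the three algebra expressions above can be evaluated.

```python
from typing import Callable, Iterable, Set, Tuple

Row = Tuple[object, ...]
Expr = Callable[[Row], object]  # a target term or condition evaluated on one row


def col(i: int) -> Expr:
    """Hypothetical helper mirroring @i: returns column i (1-based) of a row."""
    return lambda row: row[i - 1]


def project(targets: Iterable[Expr], rel: Set[Row]) -> Set[Row]:
    """Extended projection: evaluate each target point-wise on every row."""
    exprs = list(targets)
    return {tuple(e(row) for e in exprs) for row in rel}


def select(conditions: Iterable[Expr], rel: Set[Row]) -> Set[Row]:
    """Selection whose conditions may call scalar functions."""
    conds = list(conditions)
    return {row for row in rel if all(c(row) for c in conds)}


# Made-up scalar functions and relations, for illustration only.
def f(v: int) -> int:
    return 2 * v


def g(v: int) -> int:
    return v + 1


R1 = {(1,), (2,), (3,)}        # a unary R, as in q1
R2 = {(1, 2), (2, 5), (3, 6)}  # a binary R, as in q2

pairs = project([col(1), lambda r: f(col(1)(r))], R1)   # project([@1, f(@1)], R)
q1 = project([lambda r: g(f(col(1)(r)))], R1)           # project([g(f(@1))], R)
q2 = select([lambda r: col(2)(r) == f(col(1)(r))], R2)  # select({@2 == f(@1)}, R)

print(sorted(pairs))  # [(1, 2), (2, 4), (3, 6)]
print(sorted(q1))     # [(3,), (5,), (7,)]
print(sorted(q2))     # [(1, 2), (3, 6)]
```

Each call to `f` or `g` is made once per input row, which is the point-wise behaviour described above; no inverse of `f` is ever required, in contrast with q3.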

The remainder of the paper is organized as follows. Section 2 briefly mentions related work. Section 3 presents an example that illustrates how scalar functions naturally arise in practical queries, and how our framework is used to analyze and translate these queries into the extended algebra; the presentation is intuitive, and formal details are given in subsequent sections. Section 4 gives preliminary notation. Section 5 defines embedded domain independent, and Section 6 defines FinDs and em-allowed. Section 7 highlights the translation algorithm, and Section 8 describes the reduced sets of FinDs. Section 9 considers practical extensions, and the concluding section mentions open issues.

2 Related Work

As suggested in the Introduction, the work here can be viewed primarily as an extension and synthesis of the work of [AB88], [GT91], and [RBS87]. Several other investigations are related to the work.

[AB88] (see also [BM92b]) studies the incorporation of functions into query languages for complex objects, and introduced the notion of (what is called here) embedded domain independent. [BM92a] explores a number of variations of domain independence from the perspective of algebraic specifications; the paper introduces a family of "safe" calculus queries based on "range-restriction", and provides a translation algorithm into the algebra.

[Coh86] defines a notion of safe calculus query and provides a translation into an algebra query language. Again, their notion of safe is based on range-restriction; our notion of em-allowed is much broader than their notion of safe.

Another related investigation is described in [Top91]; this introduces a 2-sorted logic (integers and an uninterpreted domain) and permits scalar functions on the integers. Among other things, [Top91] develops a syntactic notion of safety for calculus queries which ensures the property of universal domain independence ([Top91]'s terminology), and provides a translation into an extended algebra. Their notion of safety is strictly weaker than our em-allowed. For example, $(R(x) \wedge f(x) = y) \vee (R(y) \wedge f(y) = x)$ is safe according to our definition, while it is not safe for [Top91]. Also, [Top91] suggests that it is sufficient to use the natural generalizations of the transformations of [GT91] to perform the translation into the algebra. However, the query of Example 7.1 satisfies [Top91]'s syntactic conditions, but requires a new transformation (called T10 here) in order to be translated.

3 An Example

Example 3.1: Assume that Mgr is a unary relation over strings giving the names of managers; Mges is a binary relation over strings giving manager-managed pairs; and comp is a user-defined scalar function which computes the "compensation" of employees for the current pay period. (The compensation might be a combination of salary, overtime, bonuses, etc.) Consider the query

$$\{x \mid Mgr(x) \wedge \forall y, z\,[(Mges(x, y) \wedge Mges(x, z) \wedge y \neq z) \rightarrow comp(y) + comp(z) < comp(x)]\}$$

This query yields the names of the managers whose compensation exceeds the sum of the compensations of any two of his/her employees. As part of the analysis of this query, the universal quantifier is replaced by an existential quantifier, to obtain:

$$\{x \mid Mgr(x) \wedge \neg\exists y, z\,[(Mges(x, y) \wedge Mges(x, z) \wedge y \neq z) \wedge \neg(comp(y) + comp(z) < comp(x))]\}$$

This query is em-allowed, and furthermore satisfies the definition of "Relational Algebra Normal Form" given in Section 7. We present a translation into the algebra in three stages. First, let

S = select({@2 != @3}, join([@1, @2, @4], {@1 == @3}, Mges, Mges))

The join here forms Mges × Mges, selects on condition @1 == @3, and then projects onto columns [@1, @2, @4]; the outer selection then enforces $y \neq z$. Now let

P = project([@1], select({comp(@2) + comp(@3) >= comp(@1)}, S))

Finally, the full query is equivalent to Mgr - P.

Details about how [...] are mapped into physical relations are specified using "annotations". For example, 'PERSON: {SS#} yields {Annual_Compensation}' indicates that access to Annual_Compensation from SS# is possible. If Annual_Compensation is a computed attribute, 'PERSON: {Annual_Compensation} yields {SS#}' would not hold.
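The three stages of Example 3.1 can be replayed on a toy instance. The sketch below is an illustration under stated assumptions: relations are Python sets of tuples, `join(targets, condition, left, right)` is a hypothetical helper that forms the cross product, applies the condition, and then projects (mirroring the informal description of the join above), and the managers, employees, and compensation figures are invented. It also evaluates the calculus query directly, so the two answers can be compared.

```python
from itertools import product

# Invented data for Example 3.1.
Mgr = {("ann",), ("bob",), ("gil",)}
Mges = {("ann", "cy"), ("ann", "dee"), ("bob", "eve"), ("bob", "fay")}
COMP = {"ann": 100, "bob": 50, "cy": 30, "dee": 40, "eve": 20, "fay": 35}


def comp(person):
    """Stand-in for the user-defined scalar function comp."""
    return COMP[person]


def join(targets, condition, left, right):
    """Hypothetical join: cross product, then selection, then projection.
    Column numbers in targets are 1-based, as in the @i notation."""
    out = set()
    for l_row, r_row in product(left, right):
        row = l_row + r_row
        if condition(row):
            out.add(tuple(row[i - 1] for i in targets))
    return out


# S = select({@2 != @3}, join([@1, @2, @4], {@1 == @3}, Mges, Mges))
S = {row for row in join([1, 2, 4], lambda t: t[0] == t[2], Mges, Mges)
     if row[1] != row[2]}
# P = project([@1], select({comp(@2) + comp(@3) >= comp(@1)}, S))
P = {(x,) for (x, y, z) in S if comp(y) + comp(z) >= comp(x)}
algebra_answer = Mgr - P


def calculus_answer():
    """Direct evaluation of the universally quantified query. The implication
    is vacuous unless Mges(x, y) and Mges(x, z) hold, so y and z only need to
    range over the employees of x."""
    result = set()
    for (x,) in Mgr:
        employees = {e for (m, e) in Mges if m == x}
        if all(comp(y) + comp(z) < comp(x)
               for y in employees for z in employees if y != z):
            result.add((x,))
    return result


print(sorted(algebra_answer))    # [('ann',), ('gil',)]
print(sorted(calculus_answer())) # [('ann',), ('gil',)]
```

Manager gil, who manages nobody, satisfies the universal condition vacuously; the algebra handles this through the difference Mgr - P without ever calling comp on gil.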



4 Preliminary Definitions

In this section we establish some basic notions and terminology. We assume familiarity with the basic notions of relational database theory (cf. [Mai83, Ull88]) and of mathematical logic (cf. [End72]).

We assume a one-sorted logic, and let dom denote a countably infinite set of "uninterpreted" constants. We assume a countably infinite set of variables var, a countably infinite set of relation names with associated arities, and a countably infinite set of function names with associated arities. In general we focus on a fixed finite set $\mathcal{F}$ of function names and a fixed (database) schema, i.e., a finite set $\mathcal{R}$ of relation names. If $d \subseteq dom$, then a pre-interpretation is a pair $(d, F)$, where $F$ maps each element $f$ of $\mathcal{F}$ to a total function from $d^i$ to $d$, where $i$ is the arity of $f$. A (database) interpretation is a triple $(d, F, I)$, where $(d, F)$ is a pre-interpretation and $I$ is a relational database instance, i.e., a function taking each relation name $R$ in $\mathcal{R}$ to a finite set of tuples over $d$ having the arity of $R$.

The notion of the answer of query $q$ on instance $I$ in pre-interpretation $(d, F)$ is defined in the usual manner. In the general case, infinite answers may arise. We sometimes use $\varphi(x_1, \dots, x_n)$ to denote the query $\{x_1, \dots, x_n \mid \varphi(x_1, \dots, x_n)\}$. The active domain of an instance $I$, denoted $adom(I)$, is the set of constants occurring in $I$. This is defined similarly for formulas $\varphi$ and queries $q$. Also, we write, e.g., $adom(q, I)$ to denote $adom(q) \cup adom(I)$.

We use an extended relational algebra based on the language Heraclitus[Alg] [GHJ92]. The extension to incorporate function symbols consists primarily of permitting the use of terms built using function symbols in selection and join conditions, and in the targets of projections.

5 Embedded Domain Independence

Intuitively, a query $q$ is "embedded domain independent" if the answer to $q$ on an instance $I$ depends exclusively on a bounded "neighborhood" (in terms of function application) of $adom(q, I)$.

Definition: Let $P = (d, F)$ be a pre-interpretation, and let $C$ be a subset of $d$. For each $i \geq 0$, $term_i^P(C)$, or $term_i(C)$ if $P$ is understood from the context, denotes the set of values of terms built from elements of $C$ and the functions of $\mathcal{F}$ as interpreted by $P$, with function nesting depth $\leq i$.

It is easily seen that if $C$ is finite, then $term_i(C)$ is finite for each $i$.

Definition: Let $P = (d, F)$ and $P' = (d', F')$ be pre-interpretations, let $C \subseteq d \cap d'$, and let $i \geq 0$. Then $P$ and $P'$ agree on $C$ to level $i$ if (a) [...] and (b) for each sequence $b_1, \dots, b_n$ of constants in $term_i^P(C)$ and function $f$ of arity $n$ in $\mathcal{F}$, $F(f)(b_1, \dots, b_n) = F'(f)(b_1, \dots, b_n)$. ❑

Definition: A query $q$ is embedded domain independent at level $i$ if for all interpretations $S_1 = (d, F, I)$ and $S_2 = (d', F', I)$ whose pre-interpretations agree on $adom(q, I)$ to level $i$, $q$ yields the same output on $S_1$ and $S_2$. Query $q$ is embedded domain independent if for some $i$ it is embedded domain independent at level $i$.

6 Embedded Allowed
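The definition of $term_i(C)$ can be read as a simple iteration: level 0 is $C$ itself, and each further level adds the results of applying every function to argument tuples drawn from the previous level. A minimal Python sketch, assuming a pre-interpretation is given as a dictionary from (hypothetical) function names to an arity and a Python callable; it also makes visible why $term_i(C)$ is finite when $C$ is, since each level applies finitely many functions to finitely many argument tuples.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

# Hypothetical encoding of a pre-interpretation: name -> (arity, total function).
Functions = Dict[str, Tuple[int, Callable[..., object]]]


def term_level(C: Iterable[object], functions: Functions, i: int) -> FrozenSet[object]:
    """Values of terms over C with function nesting depth <= i.

    term_0(C) = C; term_{j+1}(C) = term_j(C) plus every f(b1, ..., bn)
    whose arguments all come from term_j(C).
    """
    current = frozenset(C)
    for _ in range(i):
        new_values = set(current)
        for arity, fn in functions.values():
            for args in product(current, repeat=arity):
                new_values.add(fn(*args))
        current = frozenset(new_values)
    return current


F = {"f": (1, lambda x: x + 1), "g": (2, lambda x, y: x * y)}
print(sorted(term_level({2}, F, 1)))  # [2, 3, 4]
print(sorted(term_level({2}, F, 2)))  # [2, 3, 4, 5, 6, 8, 9, 12, 16]
```

Two pre-interpretations that agree on these finitely many values are indistinguishable to a query that only "looks" to depth $i$, which is the intuition behind embedded domain independence at level $i$.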

In queries such as q1, equalities involving function terms can be used to infer additional bounding information, which will be captured using a generalization of the finiteness dependencies (FinDs) of [RBS87]. We begin with the basic definitions.

Definition: A finiteness dependency (FinD) over (basic-type) variable set $Z$ is a syntactic expression of the form $X \to Y$, where $X$ and $Y$ are (possibly empty) subsets of $Z$. Also,² $x_1 \dots x_n \to y_1 \dots y_m$ denotes the FinD $\{x_1, \dots, x_n\} \to \{y_1, \dots, y_m\}$.

²We use the traditional abbreviations from FDs: given two sets $X$ and $Y$ of variables, $XY$ denotes $X \cup Y$, and $Xz$ denotes $X \cup \{z\}$.

Definition: Let $\varphi$ be a formula and $X \to Y$ a FinD over $free(\varphi)$. Then $\varphi$ satisfies $X \to Y$ iff there is some $j \geq 0$ such that, for each interpretation $(dom, F, I)$ and each assignment $\nu$ satisfying $\varphi$, every value $\nu(y)$ with $y \in Y$ lies in $term_j(adom(\varphi, I) \cup \nu(X))$.

FinDs satisfy the following "inference rules", where $X, Y, U, W$ range over sets of variables:
FD1: if $Y \subseteq X$, then $X \to Y$;
FD2: if $X \to Y$ and $U \to W$, then $XU \to YW$;
FD3: if $X \to Y$ and $Y \to W$, then $X \to W$.
We write $\Gamma \vdash X \to Y$ to denote that $X \to Y$ can be inferred from the set $\Gamma$ of FinDs using FD1, FD2, and FD3.

Definition: If $\Gamma$ is a set of FinDs, then the closure of $\Gamma$ over $V$ is $\Gamma^{*,V} = \{X \to Y \mid XY \subseteq V \text{ and } \Gamma \vdash X \to Y\}$. For a formula $\varphi$, $\Gamma^{*,\varphi}$ is a shorthand for $\Gamma^{*,free(\varphi)}$. Given sets $\Gamma_1$ and $\Gamma_2$ of FinDs over a variable set $V$, $\Gamma_1$ and $\Gamma_2$ are equivalent if $\Gamma_1^{*,V} = \Gamma_2^{*,V}$.

We now turn to the definition of the function bd, which associates FinDs with formulas. This will be defined so that each formula $\varphi$ satisfies $bd(\varphi)$; however, $bd(\varphi)$ may not include all FinDs satisfied by $\varphi$. The bd function can be viewed as a generalization of the gen operator of [GT91] (called pos in [Top87]). Indeed, it can be shown that if $\varphi$ has neither equality nor inequality predicates nor scalar functions, then $bd(\varphi) = \{\emptyset \to X\}^{*,\varphi}$, where $X = \{x \mid gen(x, \varphi) \text{ holds}\}$. Figure 1 presents the overall definition of bd, which includes operators defined below. The incorporation of knowledge that some scalar functions have inverses will be discussed in Section 9.

Following [GT91], the pushnot operator used here "pushes" negations one step towards the atoms in formulas. As indicated in the definition of bd, the bounding information yielded by a conjunction is essentially the union of bd of the conjuncts. For disjunctions, we use a kind of intersection, $\otimes$: intuitively, $\Gamma_1 \otimes_V \Gamma_2$ retains a FinD $X \to Y$ with $XY \subseteq V$ only if it follows from both $\Gamma_1$ and $\Gamma_2$, since a disjunction can only guarantee bounds that hold for every disjunct.

Proposition 6.2: Given a formula $\varphi$, $\varphi$ satisfies every FinD in $bd(\varphi)$.

Figure 1: Definition of function bd, for a formula $\varphi$ of each of the following forms:
B1  $R(\tau_1, \dots, \tau_n)$: $\{\emptyset \to X\}^{*,\varphi}$, where $X$ is the set of variables that are members of $\{\tau_1, \dots, \tau_n\}$
B2  $\neg R(\vec{\tau})$: $\emptyset^{*,\varphi}$
B3  $\neg\psi$, for $\psi$ not of the form $R(\vec{\tau})$: $bd(pushnot(\neg\psi))$
B4  $f(\tau_1, \dots, \tau_n) = \tau$: $\emptyset^{*,\varphi}$, if $\tau$ is not a variable, or $\tau$ is a variable occurring in one of $\tau_1, \dots, \tau_n$
B5  $f(\tau_1, \dots, \tau_n) = \tau$: $\{X \to \tau\}^{*,\varphi}$, if $\tau$ is a variable not occurring in any of $\tau_1, \dots, \tau_n$, where $X$ is the set of variables occurring in $\tau_1, \dots, \tau_n$
B6  $x = y$: $\{x \to y, y \to x\}^{*,\varphi}$
B7  $\tau_1 \neq \tau_2$: $\emptyset^{*,\varphi}$
B8  $\psi_1 \wedge \dots \wedge \psi_n$: $(bd(\psi_1) \cup \dots \cup bd(\psi_n))^{*,\varphi}$
B9  $\psi_1 \vee \dots \vee \psi_n$: $(bd(\psi_1) \otimes \dots \otimes bd(\psi_n))^{*,\varphi}$
B10  $\exists z_1 \dots \exists z_n\,\psi$: $(bd(\psi) - \text{all FinDs in which some variable in } \{z_1, \dots, z_n\} \text{ occurs})^{*,\varphi}$
B11  $\forall z_1 \dots \forall z_n\,\psi$: $(bd(\psi) - \text{all FinDs in which some variable in } \{z_1, \dots, z_n\} \text{ occurs})^{*,\varphi}$

We now present the notion of em-allowed.

Definition: A formula $\varphi$ is embedded allowed (em-allowed) if
(a) $bd(\varphi) \vdash \emptyset \to free(\varphi)$;
(b) for each subformula $\exists \vec{z}\,\psi$ of $\varphi$, $bd(\psi) \vdash free(\exists \vec{z}\,\psi) \to \vec{z}$; and
(c) for each subformula $\forall \vec{z}\,\psi$ of $\varphi$, $bd(\neg\psi) \vdash free(\forall \vec{z}\,\psi) \to \vec{z}$.

The main result can now be stated. It is demonstrated in the course of translating em-allowed formulas into equivalent algebra queries, as described in the next section.

Theorem 6.3: If $\varphi$ is em-allowed, then $\varphi$ is embedded domain independent.
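FinD reasoning can be prototyped with the same attribute-closure loop used for functional dependencies. This is a sketch under two assumptions of ours: that inference with FD1 to FD3 coincides with Armstrong-style inference (so the closure loop decides $\Gamma \vdash X \to Y$), and that $\otimes$ may be represented by a cover of the FinDs that follow from both arguments. The names `closure_of`, `implies`, and `otimes` are hypothetical. The usage part hand-applies rules B1, B2, B5, B8, and B9 of Figure 1 to the ENF formula obtained in Example 7.1 and checks condition (a) of em-allowed; conditions (b) and (c) hold trivially there because that formula has no quantifiers. The explicit $^{*,\varphi}$ closures are skipped because `implies` recomputes them.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Vars = FrozenSet[str]
FinD = Tuple[Vars, Vars]  # the FinD X -> Y


def fs(*names: str) -> Vars:
    """Shorthand for a frozenset of variable names."""
    return frozenset(names)


def closure_of(X: Vars, gamma: Iterable[FinD]) -> Vars:
    """Largest set Y with gamma |- X -> Y (standard attribute-closure loop)."""
    rules = list(gamma)
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(gamma: Iterable[FinD], X: Vars, Y: Vars) -> bool:
    """Decide whether gamma |- X -> Y."""
    return Y <= closure_of(X, gamma)


def otimes(g1: Iterable[FinD], g2: Iterable[FinD], V: Vars) -> List[FinD]:
    """Cover of the FinDs over V that follow from both g1 and g2.

    Keeps X -> (cl1(X) & cl2(X) & V) for every subset X of V, so it is
    exponential in |V|; acceptable for a sketch only.
    """
    rules1, rules2 = list(g1), list(g2)
    names = sorted(V)
    cover: List[FinD] = []
    for size in range(len(names) + 1):
        for combo in combinations(names, size):
            X = frozenset(combo)
            cover.append((X, closure_of(X, rules1) & closure_of(X, rules2) & V))
    return cover


V = fs("x", "y")
X_TO_Y: List[FinD] = [(fs("x"), fs("y"))]  # B5 for f(x) = y (likewise g, h, k)

# (f(x) = y v g(x) = y) ^ ~R(x, y): B9 for the disjunction; ~R adds nothing (B2, B8).
left = otimes(X_TO_Y, X_TO_Y, V)
# (h(x) = y v k(x) = y) ^ ~P(x, y): same shape.
right = otimes(X_TO_Y, X_TO_Y, V)
# Outer disjunction (B9), then conjunction with S(x) (B1) and ~T(y) (B2) via B8.
bd_enf = otimes(left, right, V) + [(fs(), fs("x"))]

print(implies(bd_enf, fs(), V))  # True: condition (a) holds
print(implies(X_TO_Y, fs(), V))  # False: f(x) = y alone does not bound x
```

The same helpers show why a formula such as $R(x) \vee S(y)$ is rejected: $\{\emptyset \to x\} \otimes \{\emptyset \to y\}$ retains no nontrivial FinD with an empty left-hand side.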

7 Calculus to Algebra

In this section we describe a non-trivial generalization of the algorithm of [GT91] for translating allowed formulas into equivalent relational algebra queries. Both [GT91] and our algorithm perform four steps:
(1) Replace all subformulas $\forall \vec{x}\,\psi$ by $\neg\exists \vec{x}\,\neg\psi$, and rename quantified variables "apart", i.e., rename them in such a way that a quantified variable occurs only in the scope of its quantifier.
(2) Put the formula into (generalized) Existential Normal Form (ENF).
(3) Put the formula into (generalized) Relational Algebra Normal Form (RANF).
(4) Translate the formula into a(n extended) algebra expression.
Steps (1), (2), and (3) are accomplished using families of transformations, which map subformulas to subformulas, and step (4) is accomplished with transformations mapping subformulas to algebra expressions. The major differences between our algorithm and that of [GT91] are as follows:
(a) In steps 3 and 4, FinDs influence the choice of some transformation steps.
(b) Our notion of "positive" and "negative" formulas is slightly different. [GT91] views all atoms of the form $\tau_1 = \tau_2$ and $\tau_1 \neq \tau_2$ as "positive", primarily for technical reasons. We view atoms of the form $\tau_1 = \tau_2$ as positive, because they may give bounding information, and atoms of the form $\tau_1 \neq \tau_2$ as negative (see Example 7.2).
(c) In step 2 we introduce a transformation (called T10 here), not present in [GT91].
(d) In step 3 our transformation T15 is slightly different from the analogous transformation of [GT91] (also called T15 there). Their transformation is subsumed by ours.
(e) In step 3 we introduce a transformation T16, not present in [GT91].
Difference (c) is included so that queries such as the one in Example 7.1, which satisfy the definition of em-allowed, can be successfully translated into the algebra. Transformation T10 also plays an important technical role in connection with transformation T15: it is crucial in the proof of Lemma 7.7, which in turn is crucial in proving that certain em-allowed formulas can be transformed into the extended algebra. Difference (e) is largely cosmetic, and involves transformations which map, e.g., $R(x, f(y))$ into $\exists z(z = f(y) \wedge R(x, z))$.

7.1 Transforming a Formula into ENF
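Difference (e) above, replacing a function term inside a relation atom by a fresh existentially quantified variable, is easy to state over a small syntax tree. The following Python sketch is a simplified illustration of that kind of rewriting, not a reproduction of T16: it assumes its own hypothetical formula classes (`Fn`, `Rel`, `Eq`, `And`, `Exists`), assumes the supplied fresh names (`z0`, `z1`, ...) do not clash with variables already in the formula, and only rewrites the top-level arguments of a single atom.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, Tuple, Union


@dataclass(frozen=True)
class Fn:
    """Function term such as f(y); variables are plain strings."""
    name: str
    args: Tuple["Term", ...]


Term = Union[str, Fn]


@dataclass(frozen=True)
class Rel:
    name: str
    args: Tuple[Term, ...]


@dataclass(frozen=True)
class Eq:
    left: Term
    right: Term


@dataclass(frozen=True)
class And:
    parts: tuple


@dataclass(frozen=True)
class Exists:
    variables: Tuple[str, ...]
    body: object


def unnest(atom: Rel, fresh: Iterator[str]):
    """Rewrite R(..., f(y), ...) as Exists z (z = f(y) and R(..., z, ...))."""
    args, equalities, new_vars = [], [], []
    for arg in atom.args:
        if isinstance(arg, Fn):
            z = next(fresh)  # assumed not to occur elsewhere in the formula
            new_vars.append(z)
            equalities.append(Eq(z, arg))
            args.append(z)
        else:
            args.append(arg)
    if not new_vars:
        return atom  # no function terms: nothing to do
    body = And(tuple(equalities) + (Rel(atom.name, tuple(args)),))
    return Exists(tuple(new_vars), body)


fresh_names = (f"z{i}" for i in count())
print(unnest(Rel("R", ("x", Fn("f", ("y",)))), fresh_names))
```

After the rewrite, the equality $z = f(y)$ contributes the FinD $y \to z$ by rule B5 of Figure 1, so the bounding information carried by the function term becomes explicit.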

In this section we discuss the algorithm that transforms a formula into ENF, to be defined shortly. We assume that all universal quantifiers have been removed and all quantified variables renamed apart as indicated in (1) above. First we introduce one important component in the translation, namely the "simplification" of a formula. Then we introduce the additional transformations necessary to get to ENF. Finally, we indicate that each transformation preserves the em-allowed property of the formula.

Definition: A formula $\varphi$ is simplified if and only if
a. There is no occurrence of $\neg\neg\psi$.
b. There is no occurrence of $\neg(\tau_1 = \tau_2)$ or $\neg(\tau_1 \neq \tau_2)$, for terms $\tau_1$ and $\tau_2$.
c. There is no occurrence of $\forall$.
d. The polyadic operators $\wedge$, $\vee$, $\exists$ are flattened; that is:
   i. In a subformula $\psi_1 \wedge \dots \wedge \psi_n$, no operand $\psi_i$ is itself a conjunction.
   ii. In a subformula $\psi_1 \vee \dots \vee \psi_n$, no operand $\psi_i$ is itself a disjunction.
   iii. In a subformula $\exists \vec{x}\,\psi$, $\psi$ does not begin with $\exists$.
e. In every subformula $\exists \vec{x}\,\psi$, each variable $x_i$ is actually free in $\psi$.

In step (2) of our algorithm, transformations T1 to T7 (see Figure 2) are used to simplify formulas. These are applied iteratively until the formula is simplified.

Definition: A simplified formula $\varphi$ is negative if and only if $\varphi \equiv \neg\psi$ for some formula $\psi$, or $\varphi \equiv \tau_1 \neq \tau_2$ for some terms $\tau_1, \tau_2$. A simplified formula is positive if it is not negative.

Condition (3) in the following definition of ENF is not present in [GT91]; it is related to our use of T10 and the justification of T15.

Definition (ENF): A formula is in Existential Normal Form (ENF) if and only if
(1) It is simplified.
(2) Each disjunction in the formula satisfies:
   a. The parent of the disjunction, if it has one, is $\wedge$.
   b. Each operand of the disjunction is a positive formula.
(3) The parent of a conjunction of negative formulas is $\exists$.

We use transformations T8 to T12 (see Figure 2) to put formulas into ENF. Two of these (T11 and T12) are analogous to ones in [GT91]. Two others (T8 and T9) are modified versions of the analogs of [GT91]. Finally, T10 is new. The following example illustrates the use of transformation T10.

Example 7.1: Consider the following formula:

$$\varphi(x, y) = \neg[((f(x) \neq y \wedge g(x) \neq y) \vee R(x, y)) \wedge ((h(x) \neq y \wedge k(x) \neq y) \vee P(x, y))] \wedge S(x) \wedge \neg T(y)$$

If we apply T10 twice, we obtain:

$$\neg[\neg((f(x) = y \vee g(x) = y) \wedge \neg R(x, y)) \wedge \neg((h(x) = y \vee k(x) = y) \wedge \neg P(x, y))] \wedge S(x) \wedge \neg T(y)$$

Now we can apply T8, and we obtain:

$$[((f(x) = y \vee g(x) = y) \wedge \neg R(x, y)) \vee ((h(x) = y \vee k(x) = y) \wedge \neg P(x, y))] \wedge S(x) \wedge \neg T(y)$$

which is in ENF. Notice that without the transformation T10, no transformations could have been applied to the original formula. ❑

The following example illustrates a case where the modified T8 (as opposed to using [GT91]'s analog directly) is necessary to successfully complete the translation of a formula into the algebra.

Example 7.2: Consider the following em-allowed formula:

$$\psi(x, y) = \neg(f(x) \neq y \wedge h(x) \neq y) \wedge R(x)$$

If we apply T8, we obtain

$$(f(x) = y \vee h(x) = y) \wedge R(x)$$

which is in ENF. ❑

In the full paper we define the algorithm ENF, which iteratively applies our transformations T1 to T12. The following technical lemma is needed to prove that the transformations preserve em-allowedness and that the algorithm presented in this paper terminates.

Lemma 7.3: For an arbitrary formula $\varphi$, if $bd(\varphi) \neq \emptyset^{*,\varphi}$, then $bd(\neg\varphi) = \emptyset^{*,\varphi}$.

Finally, the main result of this section is given by:

Lemma 7.4: Given an em-allowed formula $\varphi$, $ENF(\varphi)$ terminates and yields an em-allowed formula that is in ENF and equivalent to $\varphi$.
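Because each transformation must produce an equivalent formula, a quick sanity check is to evaluate the formulas of Example 7.1 and Example 7.2 on some finite interpretation and compare them pointwise. The functions `f`, `g`, `h`, `k` and the relations below are invented for illustration; agreement on one interpretation is evidence, not a proof (these particular rewrites are in fact propositional identities, so any choice of data would agree).

```python
from itertools import product

# Invented finite interpretation over the domain {0, 1, 2, 3}.
D = range(4)


def f(x): return (x + 1) % 4
def g(x): return (2 * x) % 4
def h(x): return (x * x) % 4
def k(x): return 3 - x


R = {(0, 1), (1, 2), (2, 2), (3, 0)}
P = {(0, 0), (1, 1), (2, 3)}
S = {0, 1, 2}
T = {3}


def original(x, y):
    """~[((f!=y & g!=y) | R) & ((h!=y & k!=y) | P)] & S(x) & ~T(y)"""
    left = (f(x) != y and g(x) != y) or (x, y) in R
    right = (h(x) != y and k(x) != y) or (x, y) in P
    return not (left and right) and x in S and y not in T


def after_t10(x, y):
    """~[~((f=y | g=y) & ~R) & ~((h=y | k=y) & ~P)] & S(x) & ~T(y)"""
    a = (f(x) == y or g(x) == y) and (x, y) not in R
    b = (h(x) == y or k(x) == y) and (x, y) not in P
    return not ((not a) and (not b)) and x in S and y not in T


def in_enf(x, y):
    """[((f=y | g=y) & ~R) | ((h=y | k=y) & ~P)] & S(x) & ~T(y)"""
    a = (f(x) == y or g(x) == y) and (x, y) not in R
    b = (h(x) == y or k(x) == y) and (x, y) not in P
    return (a or b) and x in S and y not in T


for x, y in product(D, D):
    assert original(x, y) == after_t10(x, y) == in_enf(x, y)

# Example 7.2: ~(f(x) != y & h(x) != y) & R(x)  versus  (f(x) = y | h(x) = y) & R(x)
R_UNARY = {0, 2}
for x, y in product(D, D):
    before = not (f(x) != y and h(x) != y) and x in R_UNARY
    after = (f(x) == y or h(x) == y) and x in R_UNARY
    assert before == after

print("Examples 7.1 and 7.2: rewritten formulas agree on every (x, y)")
```

Checking the ENF conditions themselves (simplified, disjunction parents, negative conjunctions) is a separate, purely syntactic test on the formula tree.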

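Original  ⟹  Transformed
T1  $\neg\neg\psi$  ⟹  $\psi$
T2  $\neg(\tau_1 = \tau_2)$  ⟹  $\tau_1 \neq \tau_2$
T3  $\neg(\tau_1 \neq \tau_2)$  ⟹  $\tau_1 = \tau_2$
T4  $\psi_1 \wedge \dots \wedge \psi_n$, where an operand $\psi_i$ is a conjunction  ⟹  the same conjunction with the conjuncts of $\psi_i$ as operands
T5  $\psi_1 \vee \dots \vee \psi_n$, where an operand $\psi_i$ is a disjunction  ⟹  the same disjunction with the disjuncts of $\psi_i$ as operands
$\exists \vec{x}\,\psi$  ⟹  $\exists \vec{y}\,\psi$
$\neg(\psi_1 \wedge \dots \wedge \psi_n)$, where for each $i$, $\psi_i$ is negative and, either $\psi_i \equiv$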