Safety
and Translation
of Calculus (Extended
Martha
Escobar-Molano .
and
Computer University
Richard
Science CA
database
tion
model
it is desirable
as possible
language
(here
relational).
to permit
between
In this
as much
the underlying
and the database
ap-
to provide supporting
programming
for a comprehensive
scalar functions,
here the theoretical
allowed
framework
for supporting
scalar functions
of allowed
Gelder-Topor
*This 9107055
paper
is embedded
is demonstrated calculus
[Top87]
In the full query
increase
settings
the problems consider
regard
efficiency.
A
to applying
our
are discussed. raised
by the presence of
the following
queries:
to
embedded
calculus
it is shown domain
into
[GT91]
in
that
ing relation
independent. extension algebra.
part
by
NSF
(em-
by an extended
a point-wise
projection
to the apply-append
This
see also [AB88].
of the van
evaluation
of ~
operator,
operator
which
of the OOAlgebra
For example,
is analogous [Day89];
project([@1, f(@1)], R)
computes the binary relation having one tuple t for each tuple in R, where t(1) is in R and t(2) = f(t(1)). Query
allowed
In order to grants
R, performing
for each element of R, performing a second point-wise evaluation of g on these, and finally returning the set of results as the answer. In our extended algebra (a subset of the language Heraclitus[Alg,C] [GHJ92, GHJ93]) point-wise evaluation of scalar functions is accomplished
each em-
for translating
the relational
research was supported and INT-8817874.
allowed
and can
be translated into the (extended) algebra. For example, speaking informally q1 can be computed by first obtain-
the flex-
in relational
using a non-trivial algorithm
queries
to with
Queries q1 and q2 are “safe” in our formalism,
component
Our framework generalizes previous work on the relational calculus to this context. We use the notion of embedded domain independent (called “bounded depth domain independent” in [AB88]), and generalize the notion of allowed [Top87] to embedded allowed (em-allowed).
in practical
To illustrate
queries.
notion
is proposed
of considerations
this in the context of the relational this in the relational calculus is sig-
We present
ible use of (total)
USA
framework
In particu-
nificantly more difficult. Indeed, the calculus sublanguage of PASCAL/R does not permit the use of scalar functions.
California
number
lar, it is essential that scalar functions (i.e., interpreted functions that use only atomic values as inputs and outputs), both system-defined or user-defined, can be used inside database queries. While it is relatively straightforward algebra,
Jacobs
Department
dependencies
communica-
access language.
Dean
accommodate functions, an extended algebra is used. The translation framework uses a generalization of finiteness dependencies (FinDs) [RBS87]; these are analogous to functional dependencies and carry information about how subformulas involving scalar functions can restrict the possible range of variables. A special family of succinct “reduced” covers for sets of finiteness
An important research goal for the 90’s is the development of programming languages which support database functionalities. One approach, illustrated by PASCAL/R [Sch77], is to extend an imperative language (here PASCAL) to incorporate access to a parproach,
Functions*
}@pollux.usc.edu
Introduction
ticular
and
Hull
90089-0782
{marthae,hull,jacobs 1
Scalar
Abstract)
of Southern
Los Angeles,
with
Queries
q1 is thus equivalent to project([g(f(@1))], R). Also, using a natural selection operator permitting function calls, q2 is equivalent to select({@2 == f(@1)}, R).
IRI-
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. ACM-PODS-5/93/Washington, D.C. © 1993 ACM 0-89791-593-3/93/0005/0253...$1.50
Query q3, on the other hand, is not “safe” in our formalism. For one thing, infinitely many values for y might map to a member of S, in which case the output of the second coordinate of q3 would be infinite. Also, in the context of a conventional programming language, it might be impossible to compute the inverse of f.
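The point-wise evaluation used by the safe queries can be sketched over in-memory relations. This is only an illustrative sketch: relations are modelled as Python sets of tuples, `project` and `select` are simplified stand-ins for the extended algebra operators, and the functions `f` and `g` as well as the sample relations are hypothetical choices, not taken from the paper.

```python
# Relations are modelled as sets of tuples; column @k is row[k - 1].

def project(targets, rel):
    """Point-wise projection: evaluate every target term on every tuple."""
    return {tuple(t(row) for t in targets) for row in rel}

def select(conditions, rel):
    """Keep the tuples satisfying all conditions (function calls allowed)."""
    return {row for row in rel if all(c(row) for c in conditions)}

# Hypothetical total scalar functions standing in for f and g.
def f(x):
    return x + 1

def g(x):
    return 2 * x

# Hypothetical sample data: a unary and a binary relation.
R_unary = {(1,), (2,), (3,)}
R_binary = {(1, 2), (2, 5), (3, 4)}

# project([@1, f(@1)], R): one output tuple per input tuple.
pairs = project([lambda r: r[0], lambda r: f(r[0])], R_unary)

# project([g(f(@1))], R): the algebra form given for q1.
q1 = project([lambda r: g(f(r[0]))], R_unary)

# select({@2 == f(@1)}, R): the algebra form given for q2.
q2 = select([lambda r: r[1] == f(r[0])], R_binary)
```

Both operators only ever apply `f` and `g` forwards to values already present in a finite relation, which is why the outputs stay finite and no inverse of `f` is needed.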
Section presents
2 briefly
mentions
an example
notation.
Section
and Section
6 defines FinDs highlights
8 describes
Related
As suggested
other
paper
domain
the reduced
Sec-
sets of FinDs.
Sec-
primarily
of work
are related
[AB88]
the work
extension
and [RBS87].
also [BM92b])
studies
the
incorporation
of
for complex objects, and is called here) embedded
specifications,
a number
than their based on
[BM92a]
of variations
for the
integers. Among other things, [Top91] develops a syntactic notion of safety for calculus queries which ensures the properties of universal domain independence ([Top91]’s terminology),
an extended
algebra.
weaker A ~(z)
than = y)
our em-allowed.
For ex-
A ~(y)
(called T10 here) in order to be translated. [Coh86] defines a notion of safe calculus and provides a translation from the calculus
= Z) is
queries directly
9
s
illustrates
algebra. The presentaare given in subsequent
vs. FinDs,
that
pairs;
and of
Mgr is a unary relation
relation
over strings
giv-
and comp is a user-defined
$$\{x \mid Mgr(x) \wedge \forall y, z\, [\,(Mges(x, y) \wedge Mges(x, z) \wedge y \neq z) \rightarrow (comp(y) + comp(z) < comp(x))\,]\}$$
This populates p with the names of the managers whose compensation exceeds the sum of the salaries of any two of his/her employees. As part of the analysis of this query the universal quantifier is replaced by an existential quantifier, to obtain:
$$\{x \mid Mgr(x) \wedge \neg\exists y, z\, [\,(Mges(x, y) \wedge Mges(x, z) \wedge y \neq z) \wedge \neg(comp(y) + comp(z) < comp(x))\,]\}$$
This query is em-allowed, and furthermore, satisfies the definition of “Relational Algebra Normal Form” defined in Section 7. We present a translation into the algebra in three stages. First, let
S = select({@2 != @3}, join([@1, @2, @4], {@1 == @3}, Mges, Mges))
Intuitively, the join here forms Mges × Mges, selects on condition @1 == @3, and then projects onto columns [@1, @2, @4]. Now let
P = project([@1], select({(comp(@2) + comp(@3) >= comp(@1))}, S))
Finally, the full query is equivalent to Mgr - P.
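The three stages of this example (build S, build P, subtract P from Mgr) can be traced on a small instance. The sketch below is illustrative only: the relation contents and the pay table behind `comp` are hypothetical, and Python set comprehensions stand in for the join, select and project operators of the extended algebra.

```python
# Hypothetical pay table; comp plays the role of the user-defined scalar function.
PAY = {"ann": 100, "bob": 30, "cat": 40, "dan": 50, "eve": 20, "fay": 35}

def comp(name):
    return PAY[name]

# Hypothetical instance: Mgr is unary, Mges holds manager-managed pairs.
Mgr = {("ann",), ("dan",)}
Mges = {("ann", "bob"), ("ann", "cat"), ("dan", "eve"), ("dan", "fay")}

# Stage 1: S pairs each manager with two distinct employees.
# Join Mges with Mges on equal managers, keep (manager, emp1, emp2), drop emp1 == emp2.
S = {
    (m1, e1, e2)
    for (m1, e1) in Mges
    for (m2, e2) in Mges
    if m1 == m2 and e1 != e2
}

# Stage 2: P holds the managers violated by some pair of their employees.
P = {(m,) for (m, e1, e2) in S if comp(e1) + comp(e2) >= comp(m)}

# Stage 3: the answer is Mgr minus P.
answer = Mgr - P
```

Here ann qualifies (30 + 40 < 100) while dan does not (20 + 35 >= 50). Note that `comp` is only applied to values drawn from the stored relations, which is the intuition behind the query being translatable.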
indicates that access to AnSS# is possible. If Annual_Compensation is a computed attribute, ‘PERSON: {Annual_Compensation} yields {SS#}’ would not hold. of annotations
Assume
{z I itfgr(z)
Intuitively,
implementation. Details about how implemented are specified using “anFor example ‘PERSON: {SS#} yields
comparisons
that
scalar function which computes the “compensation” of employees for the current pay period. (The compensation might be a combination of salary, overtime, bonuses, etc.). Consider the query
on condition
notations”. {Annual.Compensation}’ nual-Compensation from
Detailed
an example
Mges is a binary
A l(cO~P(V)
of safety
(R(y)
3.1:
over strings;
a transla-
notion
V
The
and
safe according to our definition, while it is not safe for [Top91]. Also, [Top91] suggests that it is sufficient to use the natural generalizations of the transformations of [GT91] to perform the translation into the algebra. However, the query of Example 7.1 satisfies [Top90]’s syntactic conditions, but requires a new transformation
into physical relations are
we present
(see
this introduces a 2-sorted logic (integers and an uninterpreted domain) and permits scalar functions on the
(R(z)
section
ing manager-managed
notion of “safe” query, and provides a translation into an algebra query language. Again, their notion of safe is based on range-restriction. Another related investigation is described in [Top91];
ample,
In this
Example
Several
to the work.
explores
of algebraic
is strictly
Example
presented
as a non-trivial
in [GT91]
the perspective
there
An
3
how scalar functions naturally arise in practical queries, and how our framework is used to analyze and translate
and the concluding
Our notion of em-allowed is much broader notion of safe. In a much richer context
into
have not yet been per-
algorithm,
domain independent. The paper introduces a family of “safe” calculus queries based on “range-restriction”, and provides a translation algorithm into the algebra.
tion
algorithms,
these queries into the extended tion is intuitive; formal details sections,
functions into query languages introduced the notion of (what
and finiteness
the two translation formed.
indepen-
of the translation
in the Introduction,
investigations
The
3
Work
here can be viewed and synthesis
Section
and em-allowed.
tion 9 considers practical extensions, section mentions open issues. 2
work.
4 gives preliminary
5 defines embedded
dent, and Section tion 7 mentions
related
and Section
query
is equivalent
to Mgr -
P
❑
4 Preliminary Definitions
In this section we establish some terminology. We assume familiarity with the basic notions of relational database theory (cf. [Mai83, Ull88]), and of mathematical logic (cf. [End72]).
a query
We assume a one-sorted logic, and let dom denote a countably infinite set of “uninterpreted” constants. We assume a countably infinite set of variables var, a countably infinite set of relation names with associated arities, and a countably infinite set of function names with associated arities.
In general
we focus on a fixed finite
names,
and a fixed
basic
relation
“neighborhood”
only a bounded tion)
domain
to q on an instance
a smalll from
of adorn(q,
distance
adorn(q,
independent”
I depends
(in terms
the set of values of terms
Note
set 7? of
1), which of function
on
extends applica-
I).
built
from
the functions of F as interpreted function nesting depth $\leq i$.
names
if the
exclusively
Definition: Let P = (d, F) be a pre-interpretation, and C be a subset of d. For each $i \geq 0$, $term_i^P(C)$, or $term_i(C)$ if P is understood from the context, denotes
set 3 of function
schema, i.e., finite
q is “embedded
answer
that
if C is finite,
elements by P,
then
termi
of C and
which
(C)
have
is finite
for
each i.
relation names. If $d \subseteq$ dom then a pre-interpretation is a pair (d, F), where F maps each element f of $\mathcal{F}$ to a total function from $d^i$ to d, where i is the arity of f. A
Definition: P’ = (d’, F’)
(database)
(b) for each sequence bl, ..,, bn of constants in term~ (C) and function ~ of arity n in F, F(~)(bl, . . . . bn) =
interpretation
is a triple
F) is a pre-interpretation, instance, i.e., a function X? to a finite
(d, F, I), where (d,
and I is a relational database taking each relation name R in
set of tuples
over d having
We write assignment
to indicate
u is defined notion
Zn into
of the answer
q(I)
.,z~
F’($)(bl, I P(z1,..
of query
of an instance
is the set of constants
occurring
use an
extended
relational
. . . ,&)
I, denoted in I. This
algebra
We now have the notion
and that
of embedded
P and P’
domain
inde-
pendent.
adorn(I), is defined
Definition:
we use, based
b~), if , bn} ~ temn$(C)
otherwise
c,
It is easily seen that P is finite, agree on C to level i. ❑
A query q is embedded domain independent at level i if for all interpretations S1 = (d, F, I) and S2 = (d’, F’, I) which agree on adom(q, I) to level i, q yields the same output on S1 and S2. Query q is embedded domain independent if for some i it is embedded domain independent at level i.
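The bounded "neighborhood" of the active domain that such a query may depend on can be enumerated directly when the starting set is finite. The following is a minimal sketch, assuming a finite set C and a dictionary of total functions with known arities; the name `terms_up_to` and the sample successor function are hypothetical and only serve the illustration.

```python
from itertools import product

def terms_up_to(C, funcs, i):
    """Values of terms built from elements of C using the given functions,
    with function nesting depth at most i.

    funcs maps a function name to a pair (arity, callable).
    """
    level = set(C)  # depth 0: the elements of C themselves
    for _ in range(i):
        nxt = set(level)
        for arity, fn in funcs.values():
            # apply each function to every argument tuple from the previous level
            for args in product(level, repeat=arity):
                nxt.add(fn(*args))
        level = nxt
    return level

# Hypothetical example: a single unary successor function over the integers.
funcs = {"succ": (1, lambda x: x + 1)}
depth0 = terms_up_to({0}, funcs, 0)
depth2 = terms_up_to({0}, funcs, 2)
```

Each round adds at most finitely many values to a finite set, which mirrors the observation that the set of such terms is finite for each i whenever C is finite.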
on
6
Embedded
This section
domain
(d, F) is a pre~ O. Let c c pre-interpretation
[
(
projections.
Embedded
=
q on instance
the language Heraclitus[Alg] [GHJ92]. The extension to incorporate function symbols consists primarily of permitting the use of terms built using function symbols in selection and join conditions, and in the targets of
5
that
F(f)(hl,..., {b,,...
.,z~)}.
similarly for formulae $\varphi$ and queries q. Also, we write, e.g., adom(q, I) to denote adom(q) ∪ adom(I). We
i
p
I in pre-interpretation (d,F) is defined in the usual manner. In the general case, infinite answers may arise. We sometimes use $\varphi(x_1, \ldots, x_n)$ to denote the query $\{x_1, \ldots, x_n \mid \varphi(x_1, \ldots, x_n)\}$. The active domain
=
d ~ dom,
(d, F, I) satisfying
{z1,..
P
wz}
“ {~y + ~}*){~!Y!z}]*,{=,v,*} @ {Zg + ~}*!{ ~?Y)z})*){%Y>z}
❑
over variable
set
v
As shown in the full paper,
is a shorthand
for $\Gamma^{*,free(\varphi)}$.
allowed)
Given the sets $\Gamma_1$ and $\Gamma_2$ over a variable set V, $\Gamma_1$ and $\Gamma_2$ are equivalent if $\Gamma_1^{*,V} = \Gamma_2^{*,V}$.
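Equivalence of two sets of FinDs over a small variable set can be checked by comparing closures. The sketch below assumes that FinDs obey the same inference rules as functional dependencies (reflexivity, augmentation, transitivity), so that the standard attribute-closure procedure applies; the function names and the brute-force enumeration of subsets are illustrative choices, and the check is exponential in the size of V.

```python
from itertools import chain, combinations

def closure(xs, finds):
    """All variables derivable from xs, where finds is a list of (lhs, rhs) pairs."""
    result = set(xs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in finds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(vs):
    """Every subset of vs, as tuples."""
    vs = sorted(vs)
    return chain.from_iterable(combinations(vs, k) for k in range(len(vs) + 1))

def equivalent(g1, g2, V):
    """True if g1 and g2 imply the same FinDs X -> Y with X and Y inside V."""
    V = set(V)
    return all(closure(X, g1) & V == closure(X, g2) & V for X in subsets(V))

# Hypothetical FinD sets; () stands for the empty left-hand side.
g1 = [((), ("x",)), (("x",), ("y",))]
g2 = [((), ("x", "y"))]
g3 = [(("x",), ("y",))]
same = equivalent(g1, g2, {"x", "y"})
different = equivalent(g1, g3, {"x", "y"})
```

A FinD with an empty left-hand side says its right-hand variables are bounded outright, so `g1` and `g2` agree, whereas `g3` bounds y only relative to x.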
FD2
will
6.2
Given
a formula
p, ~ ~ bd(p).
I’t-X+Y}
abbreviations from
that
X
(b)
and +
Y
FDs
variable can
be inferred
denotes from
A formula
+ 0 +
in the
of variables
p is embedded
for each subformula’3ZV bd(+) # free(3Z@) +
4Z
r
FD3
of em-allowed. allowed
(em-
free(p)
(c) for each subformula bd(+) + ~ree(V2+)
- specifically, z, XY
the notion
if
(a) bd(p)
of the function bd which This will be defined so
given two sets X and Y of variables X u Y, and Xz denotes X u {z}. FD1,
(e.g., +)
rules”,
bd(pz)
We now turn to the definition associates FinDs with formulas.
3r
have inverses
A witness to the fact that
Definition:
using
can
scalar functions nor then bd(p) = {0 +
9.
Consider
We now present
2We use the traditional
which
in Section
Given
Proposition
Definition set V, rl
to be It
mulas. As indicated in the definition of bd, the bounding information yielded by a conjunction is essentially the union of bd of the conjuncts. For disjunctions, we use a kind of intersection given by:
this property.
$\Gamma^{*,V} = \{X \rightarrow Y \mid XY \subseteq V \text{ and } \Gamma \vdash X \rightarrow Y\}$
p, r“)~
p.
and each i ~ O,
and I
Definition: If $\Gamma$ is a set of FinDs then the closure of $\Gamma$ over V is
For a formula
by a formula
Following [GT91], the pushnot operator used here “pushes” negations one step towards the atoms in for-
X;
FD3
scalar
known
for Z1, . . . . Zn such
they satisfy the following “inference Y, U, W range over sets of variables:
FD1
without
Y a FinD
p satisjies
assignment
(adorn(p,
bd may
is some j z O such that
(dom,
a is a variable
that
Then
Y, ifi There
for each interpretation
and X -
domain
functions
Definition: Let p be a formula
In the context
be shown that if v has neither equality nor inequality predicates,
be discussed
with
Definition: A finiteness dependency (FinD) over (basic-type) variable set Z is a syntactic expression of the form $X \rightarrow Y$, where X and Y are (possibly empty) subsets of Z. Also,$^2$ $x_1 \ldots x_n \rightarrow y_1 \ldots y_m$ denotes the FinD $\{x_1, \ldots, x_n\} \rightarrow \{y_1, \ldots, y_m\}$.
Definition:
However,
by p.
gen holds for a set of variables
arithmetic
[RBS87]
we use them
satisfied
$x \mid gen(x, \varphi) \text{ holds}\}^{*,\varphi}$. Figure 1 presents the overall definition of bd, which includes operators defined below. The incorporation of
allowed
in
q, p ~ bd(p).
all FinDs
The bd function can be viewed as a generalization of the gen operator of [GT91] (called pos in [Top87]
Query ql equalities
involving function terms can be used to infer additional bounding information, which will be captured using a generalization of the finiteness dependencies (FinDs) of We begin the function
for each formula
not include
context
where
occurring
in Z.
of p, [Z n ~ree(~)]
VE@ of p, +
[2
n ~ree(~)]
a set should
appear,
denotes
the
set
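Rule | Form of $\varphi$ | $bd(\varphi)$
B1 | $R(\tau_1, \ldots, \tau_n)$ | $\{\emptyset \rightarrow X\}^{*,\varphi}$, where X = set of variables that are members of $\{\tau_1, \ldots, \tau_n\}$
B2 | $\neg R(\bar{\tau})$ | $\emptyset^{*,\varphi}$
B3 | $\neg\psi$, for $\psi$ not of the form $R(\bar{\tau})$ | $bd(pushnot(\neg\psi))$
B4 | $f(\tau_1, \ldots, \tau_n) = \tau$ | $\emptyset^{*,\varphi}$, if $\tau$ is not a variable, or $\tau$ is a variable occurring in one of $\tau_1, \ldots, \tau_n$
B5 | $f(\tau_1, \ldots, \tau_n) = \tau$ | $\{X \rightarrow \tau\}^{*,\varphi}$, if $\tau$ is a variable not occurring in any of $\tau_1, \ldots, \tau_n$, where X = set of variables occurring in $\tau_1, \ldots, \tau_n$
B6 | $x = y$ | $\{x \rightarrow y, y \rightarrow x\}^{*,\varphi}$
B7 | $\tau_1 \neq \tau_2$ | $\emptyset^{*,\varphi}$
B8 | $\psi_1 \wedge \ldots \wedge \psi_n$ | $(bd(\psi_1) \cup \ldots \cup bd(\psi_n))^{*,\varphi}$
B9 | $\psi_1 \vee \ldots \vee \psi_n$ | $(bd(\psi_1) \otimes \ldots \otimes bd(\psi_n))^{*,\varphi}$
B10 | $\exists x_1 \ldots \exists x_n \psi$ | $(bd(\psi)$ - all FinDs in which some variable in $\{x_1, \ldots, x_n\}$ occurs$)^{*,\varphi}$
B11 | $\forall x_1 \ldots \forall x_n \psi$ | $(bd(\psi)$ - all FinDs in which some variable in $\{x_1, \ldots, x_n\}$ occurs$)^{*,\varphi}$
Figure 1: Definition of function bd($\varphi$)
The main result can now be stated. It is demonstrated in the course of translating em-allowed formulas into equivalent algebra queries, as described in the next section.
Theorem 6.3 If $\varphi$ is em-allowed then $\varphi$ is embedded domain independent.
7 From the Calculus to the Algebra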
In this section we describe a non-trivial generalization of the algorithm of [GT91] for translating allowed formulas into equivalent algebra queries. Both the algorithm of [GT91] and our algorithm perform four steps. (1) Replace all subformulas $\forall\bar{x}\varphi$ by $\neg\exists\bar{x}\neg\varphi$.
called T15 there).
in
Relational
(4) Translate the formula into a(n extended) relational algebra expression. Steps (1), (2) and (3) are accomplished
NorAlge-
T2
as
negative
(see
a transformation
(called
T10
Their
transformation
is subsumed
a transformation so that
algebra.
Transformation
technical
role in connection
T10
not
queries such as the
also plays with
T16
definition into the
an important
transformation
T15:
it is crucial in the proof of Lemma 7.7, which in turn is crucial in proving that certain em-allowed formulas can be transformed into the extended algebra. Difference (e)
relational
is largely
using families of transformations, which map subformulas to subformulas, and step (4) is accomplished with transformations mapping subformulas to algebra expressions. The major differences between our algorithm and that of [GT91] are as follows:
#
one in the Example 7.1 which satisfy the of em-allowed, can be successfully translated
mal FormG (ENF). (3) Put the formula into (generalized) bra Normal FormG (RANF).
‘rl
by ours. (e) In step 3 we introduce a transformation T16 not present in [GT91]. Difference (c) is included
Vp by =37v.
Existential
view
here), not present in [GT91]. (d) In step 3 our transformation T15 is slightly different from the analogous transformation of [GT91] (also
Rename quantified variables “apart”, i.e., rename them in such a way that a quantified variable occurs only in the scope of its quantifier. (2) Put the formula into (generalized)
and
technical T1 = rz bounding
7.2).
(c) In step 2 we introduce
the translation
the choice of some
Our notion of “positive” and “negative” formulas is slightly different. [GT91] views all atoms $\tau_1 = \tau_2$
Example
Algebra
of the form
influence
and $\tau_1 \neq \tau_2$ as “positive” for primarily technical reasons. We view atoms of the form $\tau_1 = \tau_2$ to be positive, because they may give
at levels \\p[[ – 1.
Calculus
z~} occurs)”’~ zfi} occurs) *’W
bd(~)
information,
7
in {XI,..., in {xl,...,
(a) In steps 3 and 4, FinDs influence the choice of some transformation steps.
section.
domain
of TI, . . . , Tnj in TI, . . ..T~
to algebra expressions. our algorithm and that
cosmetic,
and involves
transformations
map, e.g., $R(x, f(y))$
into $\exists z (z = f(y) \wedge R(x, z))$.
7.1
a formula
Transforming
into
which z)).
ENF
In this section we discuss the algorithm that transforms a formula into ENF, to be defined shortly. We assume that all universal quantifiers have been removed and all quantified variables renamed apart as indicated in (1) above. First we introduce one important component
in
the
translation,
formula.
Then,
namely
the
we introduce
“simplification”
additional
of a
transformations
Example
7.1
Consider
necessary to get to ENF. Finally, we indicate that each transformation preserves the em-allowed property of the
the following
$%(z, y)
=
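If we apply
A formula is simplified if and only if
a. There is no occurrence of $\neg\neg\varphi$.
b. There is no occurrence of $\neg(\tau_1 = \tau_2)$ or $\neg(\tau_1 \neq \tau_2)$, for terms $\tau_1$ and $\tau_2$.
c. There is no occurrence of $\forall$.
d. The polyadic operators $\wedge$, $\vee$, $\exists$ are flattened; that is
i. In subformula $\varphi_1 \wedge \ldots \wedge \varphi_n$, no operand $\varphi_i$ is itself a conjunction
ii. In subformula $\varphi_1 \vee \ldots \vee \varphi_n$, no operand $\varphi_i$ is itself a disjunction
iii. In subformula $\exists\bar{x}\varphi$, $\varphi$ does not begin with $\exists$
e. In every subformula $\exists\bar{x}\varphi$, each variable $x_i$ is actually free in $\varphi$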
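The conditions for a simplified formula suggest a straightforward bottom-up rewriting. The sketch below is one possible reading, not the paper's transformations T1 to T7 themselves: the tuple encoding of formulas is hypothetical, universal quantifiers are assumed to have been eliminated already, and quantified variables are assumed to be renamed apart.

```python
# Hypothetical encoding of formulas as nested tuples:
#   ("rel", name, (term, ...))   ("eq", t1, t2)   ("neq", t1, t2)
#   ("not", f)   ("and", [f, ...])   ("or", [f, ...])   ("exists", [var, ...], f)
# A term is a variable name (str) or ("fn", name, (term, ...)).

def term_vars(t):
    if isinstance(t, str):
        return {t}
    return set().union(*(term_vars(a) for a in t[2]))

def free_vars(f):
    tag = f[0]
    if tag == "rel":
        return set().union(*(term_vars(a) for a in f[2]))
    if tag in ("eq", "neq"):
        return term_vars(f[1]) | term_vars(f[2])
    if tag == "not":
        return free_vars(f[1])
    if tag in ("and", "or"):
        return set().union(*(free_vars(g) for g in f[1]))
    if tag == "exists":
        return free_vars(f[2]) - set(f[1])
    raise ValueError(tag)

def simplify(f):
    tag = f[0]
    if tag in ("rel", "eq", "neq"):
        return f
    if tag == "not":
        g = simplify(f[1])
        if g[0] == "not":            # condition a: drop double negation
            return g[1]
        if g[0] == "eq":             # condition b: negated equality
            return ("neq", g[1], g[2])
        if g[0] == "neq":            # condition b: negated inequality
            return ("eq", g[1], g[2])
        return ("not", g)
    if tag in ("and", "or"):         # conditions d.i and d.ii: flatten
        ops = []
        for g in map(simplify, f[1]):
            ops.extend(g[1] if g[0] == tag else [g])
        return (tag, ops)
    if tag == "exists":
        body = simplify(f[2])
        vs = list(f[1])
        if body[0] == "exists":      # condition d.iii: merge nested quantifiers
            vs += body[1]
            body = body[2]
        vs = [v for v in vs if v in free_vars(body)]   # condition e
        return ("exists", vs, body) if vs else body
    raise ValueError(tag)

R_xy = ("rel", "R", ("x", "y"))
double_neg = simplify(("not", ("not", R_xy)))
neg_eq = simplify(("not", ("eq", "y", ("fn", "f", ("x",)))))
nested = simplify(("exists", ["x", "w"], ("exists", ["y"], R_xy)))
```

Because subformulas are simplified before their parent is inspected, a single pass suffices for this encoding; an iterative rule-application loop, as described for the actual algorithm, would reach the same normal form on these inputs.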
In step (2) of our algorithm, T7 (see Figure
are applied
until
Now we present an example formation
These
the formula
Definition: A simplified formula $\varphi$ is negative if and only if $\varphi \equiv \neg\psi$ for some formula $\psi$ or $\varphi \equiv \tau_1 \neq \tau_2$ for
Condition (3) in the following definition of ENF is not present in [GT91]; it is related to our use of T10 and
the justification Definition: (.ENF)
is positive
T8 (as opposed
where the modified
to using
[GT91]’s
rectly) is necessary to successfully complete tion of a formula into the algebra.
is simplified.
formula
T8,
which is in ENF. Notice that without the transformation T10, no transformations could have been applied to the original formula. ❑
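That the rewriting in this example preserves meaning can be checked mechanically at the propositional level. The sketch below is a sanity check only, under the assumption that the three formulas of the example read as encoded here: each equality atom and each relation atom is treated as an independent boolean (a for f(x) = y, b for g(x) = y, c for h(x) = y, d for k(x) = y), and an inequality is treated as the negation of the corresponding equality. The function names are hypothetical.

```python
from itertools import product

def original(a, b, c, d, R, P, S, T):
    # not[ ((f!=y and g!=y) or R) and ((h!=y and k!=y) or P) ] and S and not T
    return (not (((not a and not b) or R) and ((not c and not d) or P))) and S and not T

def after_t10_twice(a, b, c, d, R, P, S, T):
    # not[ not((f=y or g=y) and not R) and not((h=y or k=y) and not P) ] and S and not T
    return (not ((not ((a or b) and not R)) and (not ((c or d) and not P)))) and S and not T

def after_t8(a, b, c, d, R, P, S, T):
    # [ ((f=y or g=y) and not R) or ((h=y or k=y) and not P) ] and S and not T
    return (((a or b) and not R) or ((c or d) and not P)) and S and not T

# Compare the three readings on all 2^8 truth assignments.
all_equal = all(
    original(*v) == after_t10_twice(*v) == after_t8(*v)
    for v in product([False, True], repeat=8)
)
```

Agreement on every assignment shows the steps are instances of De Morgan's laws at this level of abstraction; it says nothing about preservation of the em-allowed property, which is the subject of the lemmas in this section.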
T1 to
some terms $\tau_1$, $\tau_2$. A simplified formula is positive if it is not negative.
we obtain,
$[((f(x) = y \vee g(x) = y) \wedge \neg R(x, y)) \vee ((h(x) = y \vee k(x) = y) \wedge \neg P(x, y))] \wedge S(x) \wedge \neg T(y)$
xi is actually
formulas.
iteratively
Now, we can apply
3
transformations
2) are used to simplify
transformations
with
T10 twice,
V R(s, Y)) A V P($, y))]
$\neg[\neg((f(x) = y \vee g(x) = y) \wedge \neg R(x, y)) \wedge \neg((h(x) = y \vee k(x) = y) \wedge \neg P(x, y))] \wedge S(x) \wedge \neg T(y)$
q and 7-2
is no occurrence
formula.
$\neg[((f(x) \neq y \wedge g(x) \neq y) \vee R(x, y)) \wedge ((h(x) \neq y \wedge k(x) \neq y) \vee P(x, y))] \wedge S(x) \wedge \neg T(y)$
formula. Definition:
em-allowed
Example
7.2
Consider
the following
trans-
analog
di-
the transla-
formula,
if it vz(~, If we apply
Y) = ‘(~($)
# v A h(~)
# v) A R(x)
T8, we obtain,
of T15. A formula
is in Existential
Normal
(~(~)
Form
= Y V h(~) = !/) A R(z)
if and only if which
❑
is in ENF.
(1) It is simplified. (2) Each disjunction a. The parent b. Each
in the formula
of the disjunction,
operand
of the
In the full paper we define the algorithm ENF, which iteratively applies our transformations T1-T12.
satisfies: if it has one, is A
disjunction
The following technical lemma is needed to prove that the transformations preserve em-allowedness and that
is a positive
formula. (3) The parent 3.
the algorithm of a conjunction
of negative
formulas
presented
in this paper
terminates.
is Lemma
7.3:
O*IW then bd(=p)
For an arbitrary
formula
p, if bd(y)
of this section
is given by:
= 0*19.
We use transformations T8 to T12 (see Figure 2) to put formulas into ENF. Two of these (T11 and T12) are analogous to ones in [GT91]. Two others (T8 and T9) are modified versions of the analogs of [GT91]. Finally, T10 is new. The following example illustrates the use of transformation T10.
Finally the main result of this section is given by:
Lemma 7.4 Given an em-allowed formula $\varphi$, ENF($\varphi$) terminates and yields an em-allowed formula that is in ENF and equivalent to $\varphi$.
T1 T2 T3 T4
Original
Transformed
-11
4
7T1=T2 ($ -(n
TI #
)
rl
# rz)
&A...
#l A.,. A#n where, an operand +~ is a conjunction
T5
where, an operand #~ is a disjunction 3;+ 35+ m(@l A.. .A @n) where for each i, $i is negative and, either *i ~ ‘