On the Integrity of Databases with Incomplete Information - CiteSeerX

ON THE INTEGRITY OF DATABASES WITH INCOMPLETE INFORMATION

Extended Abstract

Moshe Y. Vardi* IBM Almaden Research Center

Abstract

We consider the meaningfulness of databases with incomplete information. The basic idea is that such a database is meaningful if it can be completed to a database with complete information that satisfies the integrity constraints. We look at two approaches to defining the notion of completion. The open-world assumption requires that the database with complete information be an extension of the database with incomplete information. The closed-world assumption requires that the database with complete information be a conservative extension of the database with the incomplete information. We study both notions of integrity from both aspects of computational complexity and logical axiomatizability. We prove that both notions are harder than integrity for databases with complete information, and that integrity under the closed-world assumption is harder than integrity under the open-world assumption. This result is somewhat surprising.

* Address: IBM Research K55/801, 650 Harry Rd., San Jose, CA 95120-3099.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1986 ACM-0-89791-179-2/86/0300-0252 $00.75

1. Introduction

A database is a model of the real world. Since in any modelling process not all aspects of the real world are taken into account, incompleteness of information is inherent. In practice, however, incompleteness of information stems not only from our inability to take into account certain aspects of the real world, but also from simple lack of knowledge about the aspects that we did take into account.

Dealing with incomplete information is a central problem in Artificial Intelligence and database theory, and it is the weakest point of current database and knowledge base management systems [Co75, FW83, IL84, Li81, Li83, McDD80, Re84]. There is a manifest need for systematic methods to model incomplete information, for algorithms that modify incomplete information in response to new facts, and for query answering algorithms that are consistent with our models and our update algorithms.

The fundamental problem that we address in this paper is the meaningfulness of incomplete information. In order to address the problem we need to define meaningfulness more precisely. This is quite straightforward for full information. For any given application, only a subset of all possible collections of data is usually of interest. This subset is defined by certain constraints, called integrity constraints. The data is considered to be meaningful if it satisfies the constraints. (An example of an integrity mechanism is that of keys, where there can be no two data items with the same key.) If, however, only incomplete information is given to us, then we might not be able to test whether the constraints are satisfied or not. The intuitive answer is that incomplete information is meaningful if it can be completed to meaningful full information, i.e., if we can complete it to a complete collection of data that satisfies the constraints. (This idea emerged first in the framework of the universal relation model [Ho82].)

We have of course to decide when complete information is considered to be a completion of some incomplete information. We consider two approaches. According to the open-world approach (OWA), what is required is that the complete information be an extension of the incomplete information. The closed-world approach (CWA) requires that the complete information be a conservative extension of the incomplete information.

To be more precise, we have to specify our model for information more concretely. We represent full information as a relation on some set U of attributes. Such relations are called complete relations. We focus here on a particular type of incompleteness, where data are missing in a uniform manner. Specifically, we take incomplete relations to be relations on V, where V is a proper subset of U. A complete relation p is an extension of an incomplete relation q if every tuple in q comes from some tuple in p, i.e., $q \subseteq \pi_V(p)$. p is a conservative extension of q if in addition every tuple in p is reflected by a tuple in q, i.e., $q = \pi_V(p)$. Thus the OWA assumes that our knowledge about V-tuples is possibly incomplete, while the CWA assumes that while our knowledge of U-tuples is incomplete, our knowledge of V-tuples is complete. That is, according to CWA, if a certain V-tuple is not in q, then this tuple does not represent a correct fact [Re78]. The closed-world assumption is an example of what is called in AI default reasoning [Re80].

We specify integrity constraints by dependencies. The class of dependencies is a class of first-order sentences that seem to be suitable for specifying database semantics [Va83, Ul83]. A complete relation is meaningful if it satisfies the given dependencies. We say that an incomplete relation is OWA-consistent with the given dependencies if it has an extension that satisfies the dependencies. We say that an incomplete relation is CWA-consistent with the given dependencies if it has a conservative extension that satisfies the dependencies.
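The two notions of completion defined above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: relations are modelled as sets of tuples ordered by a fixed attribute list, the function names (`project`, `is_extension`, `is_conservative_extension`, `symmetric_in_ab`) are hypothetical, and the symmetry dependency is an example chosen here, not one taken from the paper.

```python
def project(relation, scheme, sub):
    """Projection of `relation` (tuples ordered by `scheme`) onto the attributes in `sub`."""
    idx = [scheme.index(a) for a in sub]
    return {tuple(t[i] for i in idx) for t in relation}

def is_extension(p, q, U, V):
    """Open-world completion: q is contained in the projection of p onto V."""
    return q <= project(p, U, V)

def is_conservative_extension(p, q, U, V):
    """Closed-world completion: q equals the projection of p onto V."""
    return q == project(p, U, V)

def symmetric_in_ab(p):
    """Illustrative total dependency (an assumption): R(a, b, c) implies R(b, a, c)."""
    return all((b, a, c) in p for (a, b, c) in p)

U = ("A", "B", "C")
V = ("A", "B")
q = {(1, 2)}                    # incomplete relation on V
p = {(1, 2, 0), (2, 1, 0)}      # candidate complete relation on U

print(is_extension(p, q, U, V))               # True: p extends q
print(is_conservative_extension(p, q, U, V))  # False: (2, 1) is not in q
print(symmetric_in_ab(p))                     # True: p satisfies the dependency
```

Here p witnesses that q is OWA-consistent with the symmetry dependency. No conservative extension of q can satisfy that dependency, because any relation containing a tuple (1, 2, c) must also contain (2, 1, c), whose projection onto V is not in q. So this q is OWA-consistent but not CWA-consistent with it.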

We investigate here two questions related to consistency:

(1) What is the complexity of testing consistency?

(2) What is the logic required to axiomatize consistency?

More formally, we are given a set C of dependencies that the complete relation is supposed to satisfy. Let $cons(C)$ be the class of incomplete relations that are consistent with C. We try to find out what is the complexity of recognizing incomplete relations in $cons(C)$, and whether we can axiomatize it, that is, construct a set $C'$ of sentences such that $cons(C)$ is exactly the class of incomplete relations that satisfy $C'$.

Our investigation shows that CWA-consistency is harder than OWA-consistency from both aspects of complexity and axiomatizability. For example, according to one complexity measure, OWA-consistency can be checked in polynomial time while CWA-consistency is NP-complete. Interestingly, there are database aspects where CWA makes life easier. We discuss this at the end of the paper.

2. Basic Definitions

2.1. Tuples, Relations, and Databases

We have a finite set $U = \{A_1, \ldots, A_n\}$ of attributes, called the universe, which intuitively are column names. A relation scheme is a nonempty subset of U. A tuple on a relation scheme R is a mapping from R to a set of values called the domain. A relation on R is a finite set of tuples on R. An unrestricted relation on R is a finite or an infinite set of tuples on R. Our interest here is mainly in relations. If t is a tuple of X and $Y \subseteq X$, then $t[Y]$ is a tuple of Y defined as the restriction of t to Y. If r is a relation on X and $Y \subseteq X$, then the projection of r onto Y is given by

$\pi_Y(r) = \{ t[Y] : t \in r \}$

Our definition of relations is different in an inessential manner from the standard definition of relations in mathematical logic. That is, by fixing some linear ordering for the attributes of U we can consider a relation on R to be a finite subset of $D^m$, where $m = |R|$. The domain of this relation is the set of all elements that occur in some tuple, and for our purposes need not be mentioned explicitly.

2.2. Dependencies

For any given application only a subclass of all possible relations is of interest. This subclass is defined by semantic constraints that are to be satisfied by the relations of interest. A family of constraints that was extensively studied in the literature is the family of dependencies. (The reader who is interested in the relationship between the family of dependencies defined here and other families of dependencies is referred to [Fa82].)

The language will be a first-order language with

equality and without function symbols. When talking about relations over R, the language will contain one $|R|$-ary predicate symbol R. This language will be denoted as $L(R)$. We call an atomic formula of the form $R(v_1, \ldots, v_{|R|})$ a relational formula, and an atomic formula of the form $v_1 = v_2$ an equality formula. A dependency is a first-order sentence in the language L of the form

$\forall y_1 \cdots y_k \, \exists z_1 \cdots z_l \, (A_1 \wedge \cdots \wedge A_p \rightarrow B_1 \wedge \cdots \wedge B_q)$,

where:

(1) $k, p, q \geq 1$ and $l \geq 0$.

(2) The A's are relational formulas that use between themselves exactly all the variables $y_1, \ldots, y_k$.

(3) The B's use between themselves all the variables $z_1, \ldots, z_l$ and possibly some y's.

(4) Either all the B's are relational formulas, or $l = 0$ and they are all equality formulas.

If all B's are relational formulas, the dependency is called a tuple-generating dependency (abbr. tgd). Intuitively, a tgd says that if some tuples, satisfying certain equalities, exist in the relation, then some other tuples, satisfying certain other equalities, must also exist in the relation. If all the B's are equality formulas, the dependency is called an equality-generating dependency (abbr. egd). Intuitively, an egd says that if some tuples, satisfying certain equalities, exist in the relation, then these tuples must also satisfy some other certain equalities.

Dependencies without existential quantifiers, i.e., in the syntax above $l = 0$, are called total (called full in [Fa82]). Observe that egd's are necessarily total. Observe also that every total dependency is equivalent to a conjunction of finitely many total dependencies with $q = 1$, i.e., dependencies with a single atomic formula on the right-hand side of the implication. Thus, we assume without loss of generality that all total dependencies are of this form. We will say "dependencies on R" instead of "dependencies in the language $L(R)$".

Dependencies can be typed or untyped [BV81, Fa82]. We mostly focus on total untyped dependencies in this paper; in §6 we shall consider non-total and typed dependencies.

2.3. Satisfaction and Consistency

If we are given a set C of dependencies on U, then it is quite obvious when a complete relation satisfies C. If p is a relation on U, then we just have to check that the relational structure $\langle D, p \rangle$, where D is the set of elements that occur in p, satisfies the dependencies in C. The situation is more complicated with incomplete relations.

Let V be a proper subset of U, and suppose that for some reason we lack information about the data entries for the attributes in $U - V$. We call relations on V incomplete relations, as opposed to complete relations, which are relations on U. There could be many possible reasons for the lack of information. For example, we might not be authorized to read this information or possibly a physical sensor that is supposed to supply this information is broken. At any rate, we now

want to decide whether a given relation on V is semantically meaningful. Intuitively, an incomplete relation is semantically meaningful if it can be completed to a complete relation. To define this notion of completion more precisely, we have to make some more assumptions about the degree of our lack of information.

The open-world assumption (OWA) is in some sense a worst-case assumption. It assumes that not only do we lack information about the data entries for the attributes in $U - V$, but we may also lack information about the data entries for the attributes in V. This motivates the following definition. A relation p on U is an extension of a relation q on V if $q \subseteq \pi_V(p)$. q is said to be OWA-consistent with C (recall that C is a set of dependencies on U) if it has an extension that satisfies C. The open-world approach has been pursued in the context of the universal relation model (cf. [GMV86, MUV84]) and in the context of query processing (cf. [IL84]).

The closed-world assumption (CWA) assumes that though we may lack information about the data entries for the attributes in $U - V$, we have full information about the attributes in V. This motivates the following definition. A relation p on U is a conservative extension of a relation q on V if $q = \pi_V(p)$. q is said to be CWA-consistent with C if it has a conservative extension that satisfies C. Thus the CWA takes the default position that if a tuple t is not in q then this tuple represents a fact that is known not to hold. This is an instance of what is called in AI default reasoning [Re80]. The closed-world approach has been pursued in the context of database modelling (cf. [Re84]) and in the context of query processing (cf. [Re78, Re83, Va85]).

3. Logical Databases

So far we have viewed databases as interpretations, i.e., they associate relations with relation names. This is the intuitive approach when information is complete and the database can be described as a model of the real world. When information is incomplete, it is more appropriate to describe the database as a model of our knowledge of the real world. In this case the database should be viewed as a theory rather than as an interpretation. This is the crux of the paradigm "databases: theory vs. interpretation" [Ko81, NG78, Re84].

We now show how to represent incomplete relations as logical theories. The two approaches, OWA and CWA, give rise to different theories. In both approaches, however, consistency with the given dependencies is reduced to satisfiability of the resulting theories. A similar reduction, in the context of the universal relation model, is described in [GMV86].

Let C be a set of dependencies on U, let V be a subset of U, and let q be a relation on V. To describe q by a logical theory we first augment the language by individual constant symbols corresponding to the entries in q. The language will be $L(U, V, D)$, where U is the relation name for relations on U, V is the relation name for relations on V, and D is the set of elements that occur in q.

D serves as the set of individual constant symbols. The OWA theory of q, denoted $Th_{OWA}(q, C)$, consists of four components:

(1) Integrity constraints: These are the dependencies in C written in the language $L(U)$. These axioms say that the complete relation that we are considering has to satisfy the given dependencies.

(2) Uniqueness axioms: For every pair c, d of distinct constant symbols we have an axiom $\neg(c = d)$. This says that unique elements are indeed unique.

(3) Containment axiom: Assume that $V = \{A_1, \ldots, A_m\}$ and $U = \{A_1, \ldots, A_m, A_{m+1}, \ldots, A_n\}$. Then this axiom is […]. This axiom says that we are considering a complete relation that is an extension of the incomplete relation.

(4) Atomic facts: For each tuple $\langle a_1, \ldots, a_m \rangle$ in q we have an axiom $V(a_1, \ldots, a_m)$.

The CWA theory of q, denoted $Th_{CWA}(q, C)$, consists, in addition to the above axioms, also of the CWA axiom. Let $a^1, \ldots, a^k$ be the list of all tuples in q. For an m-tuple $y = \langle y_1, \ldots, y_m \rangle$ of variables, let $y = a^i$ be a shorthand for $\bigwedge_{j=1}^{m} y_j = a^i_j$. The CWA axiom is

$\forall y \, (V(y) \rightarrow y = a^1 \vee \cdots \vee y = a^k)$

That is, the CWA axiom says that all information about the attributes in V is already represented in q. If q is empty, then the CWA axiom is $\forall y \, (\neg V(y))$.

We can now state the connection between consistency and the above theories. Recall that a theory is finitely satisfiable if it has a finite model.

Theorem 3.1. Let U be a set of attributes, and let C be a set of dependencies on U. Let V be a subset of U, and let q be a relation on V. Then q is OWA-consistent with C if and only if $Th_{OWA}(q, C)$ is finitely satisfiable, and q is CWA-consistent with C if and only if $Th_{CWA}(q, C)$ is finitely satisfiable. □

4. Computational Complexity

We now want to analyze how hard it is to determine whether a given incomplete relation is consistent with a given set of dependencies. Note that this decision problem has two parameters, so we can analyze the complexity in two different ways: complexity with respect to the size of the given relation and complexity with respect to the size of the given dependencies. The former has been termed in [Va82] the data complexity, while the latter has been termed there expression complexity. (We do not consider in this abstract combined complexity, which is complexity with respect to the combined size of the given relation

and the given dependencies.)

To demonstrate the difference between data complexity and expression complexity, let us first survey known results about the complexity of satisfaction. To study data complexity, we have to fix a given set C of dependencies on U and consider the set

$\mathrm{REL}(C) = \{ p : p \text{ is a relation on } U \text{ that satisfies } C \}$.

To study expression complexity, we have to fix a given relation p on U and consider the set

$\mathrm{TDEP}(p) = \{ C : C \text{ is a finite set of total dependencies on } U \text{ that are satisfied by } p \}$.

The following theorem follows from results in [CM77, Ch81].

Theorem 4.1.

(1) For every finite set C of total dependencies, the collection $\mathrm{REL}(C)$ is in LOGSPACE.

(2) For every relation p, the collection $\mathrm{TDEP}(p)$ is in co-NP.

(3) There is a relation p such that the collection $\mathrm{TDEP}(p)$ is co-NP-complete. □

We now refer to OWA-consistency. To study data complexity, we have to fix a given set C of dependencies on U and consider the set

$\mathrm{REL}_{OWA}(C) = \{ \langle V, q \rangle : V \subseteq U \text{ and } q \text{ is a relation on } V \text{ that is OWA-consistent with } C \}$.

Theorem 4.2.

(1) For every finite set C of total dependencies, the collection $\mathrm{REL}_{OWA}(C)$ is in PTIME.

(2) There is a finite set C of total dependencies such that $\mathrm{REL}_{OWA}(C)$ is PTIME-complete. □

Thus while the data complexity of satisfaction is in LOGSPACE, the data complexity of OWA-consistency is complete for PTIME. This strongly suggests that while we can check satisfaction fast by using parallel processing, we cannot do the same for OWA-consistency [Bo77].

To study expression complexity, we have to fix a given relation q on V and consider the set

$\mathrm{TDEP}_{OWA}(q) = \{ \langle U, C \rangle : V \subseteq U \text{ and } C \text{ is a finite set of total dependencies on } U \text{ such that } q \text{ is OWA-consistent with } C \}$.

Theorem 4.3.

(1) For every relation q, the collection $\mathrm{TDEP}_{OWA}(q)$ is in EXPTIME.

(2) There is a relation q such that $\mathrm{TDEP}_{OWA}(q)$ is EXPTIME-complete. □

Thus the expression complexity of OWA-consistency is exponentially harder than its data complexity and it is provably intractable.

We now give the analogous definitions for CWA-consistency. To study data complexity, we have to fix a given set C of dependencies on U and consider the set

$\mathrm{REL}_{CWA}(C) = \{ \langle V, q \rangle : V \subseteq U \text{ and } q \text{ is a relation on } V \text{ that is CWA-consistent with } C \}$.

To study expression complexity, we have to fix a given relation q on V and consider the set

$\mathrm{TDEP}_{CWA}(q) = \{ \langle U, C \rangle : V \subseteq U \text{ and } C \text{ is a finite set of total dependencies on } U \text{ such that } q \text{ is CWA-consistent with } C \}$.

Theorem 4.4.

(1) For every finite set C of total dependencies, the collection $\mathrm{REL}_{CWA}(C)$ is in NP.

(2) There is a finite set C of total dependencies such that $\mathrm{REL}_{CWA}(C)$ is NP-complete. □

Theorem 4.4 is reminiscent of Theorem 2 in [CKS85], which proves an NP-completeness result in the context of the universal relation model [MUV84] and the domain-closure assumption [Re84].

Theorem 4.5.

(1) For every relation q, the collection $\mathrm{TDEP}_{CWA}(q)$ is in NEXPTIME.

(2) There is a relation q such that $\mathrm{TDEP}_{CWA}(q)$ is NEXPTIME-complete. □

According to the above result the gap between OWA-consistency and CWA-consistency is the gap between deterministic and nondeterministic time. Note, however, that practically speaking this is an exponential gap!

To explain why CWA-consistency is probably harder than OWA-consistency, we informally describe an algorithm to check consistency. The idea is that given a set C of total dependencies on U and a relation q on $V \subseteq U$, we try to construct a (conservative) extension p of q that satisfies C. This is done as follows. For every tuple t in q, we construct a tuple in p by extending t with marked nulls. We now chase p with the dependencies in C, possibly equating marked nulls and adding tuples to p. If we are forced to equate two non-null elements, then q is not OWA-consistent with C, otherwise it is OWA-consistent. This process is polynomial in the size of p and exponential in the size of C.

To check for CWA-consistency we also have to check at the end that $q = \pi_V(p)$. If this is not the case we have to guess an assignment of non-nulls to the nulls such that $q = \pi_V(p)$ will be satisfied. After such an assignment we may have to repeat the process of chasing and assigning until we reach convergence or until we are forced to equate non-nulls. Thus the added complexity is due to the nondeterministic assignment.

In the next section we shall see that CWA-consistency is harder than OWA-consistency not only from a computational point of view but also from a logical point of view.

5. Axiomatizability

A subject of great interest in mathematical logic is that of axiomatizability. Given a class $\Omega$ of structures, the logician tries to axiomatize it by defining a logic A, which consists of a language L and a satisfaction relationship between structures and sentences in L. $\Omega$ is axiomatizable by A if there exists a set C of sentences of A, such that a structure M is in $\Omega$ if and only if M satisfies all sentences in C. If C is finite, then $\Omega$ is finitely axiomatizable by A. This notion of axiomatizability enables us to classify the expressive power of logics according to the classes of structures that

they can axiomatize or finitely axiomatize. We show in this section that it is harder to axiomatize CWA-consistency than to axiomatize OWA-consistency.

We first try to axiomatize consistency by first-order logic. We have to bear in mind, however, that every class of relations that is closed under isomorphism is axiomatizable by first-order logic. Furthermore, it is even axiomatizable in a proper subset of first-order logic. This subset, which we call universal-existential logic, is the set of all first-order sentences whose prefix consists of a string of universal quantifiers followed by a string of existential quantifiers. Thus, axiomatizability results for first-order logic are not interesting, unless they talk about finite axiomatizability or about a proper subset of universal-existential logic.

To study axiomatizability of consistency we have to fix a set C of dependencies on U and a relation scheme $V \subseteq U$. Thus we define

$\mathrm{REL}_{OWA}(V, C) = \{ q : q \text{ is a relation on } V \text{ that is OWA-consistent with } C \}$.

$\mathrm{REL}_{CWA}(V, C) = \{ q : q \text{ is a relation on } V \text{ that is CWA-consistent with } C \}$.

Theorem 5.1. Let C be a set of total dependencies on U, and let V be a subset of U. Then $\mathrm{REL}_{OWA}(V, C)$ is axiomatizable by egd's, and $\mathrm{REL}_{CWA}(V, C)$ is axiomatizable by total dependencies. On the other hand, there are particular U and V and a particular finite set of total dependencies C such that $\mathrm{REL}_{CWA}(V, C)$ is not axiomatizable by egd's. □

Theorem 5.1 suggests that CWA-consistency is logically harder than OWA-consistency, but it seems to be only "mildly" harder. To see that CWA-consistency is more than "mildly" harder than OWA-consistency, it is instructive to consider unrestricted relations (i.e., relations that can be either finite or infinite). It is easy to see that the definitions in §2 carry over to unrestricted relations with no modification. We also need the following definitions:

$\mathrm{UREL}_{OWA}(V, C) = \{ q : q \text{ is an unrestricted relation on } V \text{ that is OWA-consistent with } C \}$,

$\mathrm{UREL}_{CWA}(V, C) = \{ q : q \text{ is an unrestricted relation on } V \text{ that is CWA-consistent with } C \}$.

Theorem 5.2. Let C be a set of total dependencies on U, and let V be a subset of U. Then $\mathrm{UREL}_{OWA}(V, C)$ is axiomatizable by egd's. On the other hand, there are particular U and V and a particular finite set C of total dependencies such that $\mathrm{UREL}_{CWA}(V, C)$ is not axiomatizable by first-order logic. □

The above results are interesting theoretically, but do not really have practical significance because the set of dependencies promised by the theorem can be infinite. What we would like to have is finite axiomatizability by first-order logic. Since first-order satisfaction can be tested in logarithmic space [Ch81], finite axiomatizability of consistency by first-order logic will entail, by

Theorem 4.4, that NP=LOGSPACE! This suggests the following result.

Theorem 5.3. There are relation schemes U and V and a finite set C of total dependencies on U such that $\mathrm{REL}_{OWA}(V, C)$ and $\mathrm{REL}_{CWA}(V, C)$ are not finitely axiomatizable by first-order logic. □

Since we can not finitely axiomatize consistency by first-order logic, we try to do it by higher-order logics. Studying the definition of consistency we observe that essentially it consists of existentially quantifying over complete relations, which are relations over a possibly extended domain. The logic of such definition is called in mathematical logic projective logic, or many-sorted logic [Fe74]. It is a very powerful logic, whose satisfaction relationship is not necessarily recursive [Ha76]. Our aim here is to axiomatize consistency by higher-order logic that does not use an extended domain.

We first consider the fixpoint logic of [AU79, CH82]. Let L be a language, let P be a new n-ary relation name, and let $L'$ be the language obtained by adding P to L. The fixpoint sentences of L are of the form $\mu P(\varphi)$, where $\varphi$ is a formula of $L'$ whose free variables are $z_1, \ldots, z_n$, where P occurs positively. Let M be a structure of L with domain D. Let p be the minimal n-ary relation on the domain of M, such that the sentence $\forall z_1 \cdots z_n \, (P(z_1, \ldots, z_n) \equiv \varphi)$ is satisfied in the structure $(M, p)$ of the language $L'$. The relation p is the least fixpoint of $\varphi$ in the structure M. We now define the satisfaction relationship: M satisfies $\mu P(\varphi)$ if $p = D^n$.

The following theorem claims that OWA-consistency can be axiomatized by fixpoint logic, which is not the case for CWA-consistency.

Theorem 5.4. Let C be a finite set of total dependencies on U, and let V be a subset of U. Then $\mathrm{REL}_{OWA}(V, C)$ is finitely axiomatizable by fixpoint logic. On the other hand, there are particular U and V and a particular finite set C of total dependencies such that $\mathrm{REL}_{CWA}(V, C)$ is not axiomatizable by fixpoint logic. □

To axiomatize CWA-consistency we need an even more powerful logic: existential second-order logic (eso logic). Let L be a language, let P be a new n-ary relation name, and let $L'$ be the language obtained by adding P to L. The eso sentences of L are of the form $\exists P(\varphi)$, where $\varphi$ is a first-order formula of $L'$. Let M be a structure of L with domain D. M satisfies the sentence $\exists P(\varphi)$ if there is a relation p on the domain of M such that $\varphi$ is satisfied in the structure $(M, p)$ of the language $L'$.

Theorem 5.5. Let C be a finite set of total dependencies on U, and let V be a subset of U. Then $\mathrm{REL}_{OWA}(V, C)$ and $\mathrm{REL}_{CWA}(V, C)$ are finitely axiomatizable by eso logic. □

6. Non-Total Dependencies

So far we have considered only total dependencies. In this section we consider non-total dependencies. We also wish to distinguish between typed and non-typed dependencies. Intuitively, typed dependencies do not require interaction between different columns of the relations. Formally,

a dependency $\sigma$ is typed if it is subject to the following syntactic constraints:

(1) If a variable x occurs in the i-th argument position of R, then it does not occur in the j-th argument position of R for $j \neq i$.

(2) If a variable x occurs in the i-th argument position of R, and a variable y occurs in the j-th argument position of R, where $j \neq i$, then the equality $x = y$ does not occur in $\sigma$.

Most dependencies studied in the literature, e.g., functional dependencies and multivalued dependencies, are typed, and Fagin's embedded implicational dependencies [Fa82] are also typed. Inclusion dependencies, on the other hand, are an example of untyped dependencies.

Our first observation is that for typed total dependencies OWA-consistency and CWA-consistency coincide.

Theorem 6.1. Let U be a set of attributes, and let C be a set of typed total dependencies on U. Let V be a subset of U, and let q be a relation on V. Then q is OWA-consistent with C iff q is CWA-consistent with C. □

Theorem 6.1 explains why previous works on consistency with respect to typed total dependencies (e.g., [Fa82, GH83, GZ82, Hu84, Hu86]) did not distinguish between open-world and closed-world assumptions. As we shall see later, OWA-consistency and CWA-consistency do not coincide for typed non-total dependencies.

Let us consider now the complexity of consistency in the presence of non-total dependencies. We need first some definitions.

$\mathrm{DEP}_{OWA}(q) = \{ \langle U, C \rangle : V \subseteq U \text{ and } C \text{ is a finite set of dependencies on } U \text{ such that } q \text{ is OWA-consistent with } C \}$,

$\mathrm{DEP}_{CWA}(q) = \{ \langle U, C \rangle : V \subseteq U \text{ and } C \text{ is a finite set of dependencies on } U \text{ such that } q \text{ is CWA-consistent with } C \}$,

where q is a relation on V.

Theorem 6.2.

(1) For every finite set C of dependencies, the collections $\mathrm{REL}_{OWA}(C)$ and $\mathrm{REL}_{CWA}(C)$ are recursively enumerable.

(2) There is a finite set C of typed dependencies such that $\mathrm{REL}_{OWA}(C)$ and $\mathrm{REL}_{CWA}(C)$ are not recursive.

(3) For every relation q, the collections $\mathrm{DEP}_{OWA}(q)$ and $\mathrm{DEP}_{CWA}(q)$ are recursively enumerable.

(4) There is a relation q such that $\mathrm{DEP}_{OWA}(q)$ and $\mathrm{DEP}_{CWA}(q)$ are not recursive. □

According to Theorem 6.2 both notions of consistency are intractable in the presence of non-total dependencies. It follows that there is no point in trying to get finite axiomatizability results in the spirit of §5, since such results would imply decidability. We can still try to axiomatize consistency by infinite sets of sentences.

Theorem 6.3. Let C be a set of dependencies on U, and let V be a subset of U. Then $\mathrm{REL}_{OWA}(V, C)$ and $\mathrm{UREL}_{OWA}(V, C)$ are axiomatizable by egd's.

On the other hand, there are particular U and V and a particular finite set C of typed dependencies such that REL_CWA(V,C) is not axiomatizable by dependencies, and UREL_CWA(V,C) is not axiomatizable by first-order logic. []

The second claim of Theorem 6.3 answers in the negative a question posed by Fagin [Fa82]. We note that the counterexample is actually very simple: U is the set ABC, V is the set AB and C consists of the functional dependency AC → B and the embedded multivalued dependency Q-B] C.

7. Static vs. Dynamic: A Trade-Off

Let us consider now the static manipulation of databases, i.e., query evaluation. We now have to define the semantics of queries on incomplete relations. Our approach is analogous to the approach taken in the definition of consistency. That is, we apply the query to all completions of the incomplete relation.

Let U be an attribute set, let C be a set of dependencies on U, and let V be a subset of U. Recall that V is the predicate symbol that refers to incomplete relations (i.e., relations on V). For simplicity we consider only Boolean queries, i.e., queries that have a yes/no answer. A first-order Boolean query on V is a first-order sentence in L(V).

As with consistency we have two cases. Let q be a relation on V, and let φ be a query on V. We say that φ OWA-holds_C in q if φ holds in π_V(p) for all extensions p of q that satisfy C. We say that φ CWA-holds_C in q if φ holds in π_V(p) for all conservative extensions p of q that satisfy C. It follows from the definitions that if q is CWA-consistent with C, then φ CWA-holds in q iff φ holds in q (i.e., φ holds in q where q is viewed as a complete relation).

To study the computational complexity of query evaluation we consider the following collections:

OWA-holds_C = {(V,q,φ) : φ OWA-holds_C in q},

where V is a subset of U, q is a relation on V, and φ is a query on V, and

CWA-holds_C = {(V,q,φ) : φ CWA-holds_C in q},

where V is a subset of U, q is a relation on V, and φ is a query on V.

Theorem 7.1.
(1) The collection OWA-holds_C is co-r.e.-complete.
(2) The collection CWA-holds_C is PSPACE complete for any finite set C of total dependencies. []

Thus query evaluation under CWA is not harder than standard query evaluation [CM77], while query evaluation under OWA is intractable.

Normally, the consistency of the database needs to be checked only in the process of updating the database. Our results indicate that the closed-world approach makes the dynamic manipulation of incomplete databases more difficult. Thus, there is a trade-off between the complexity of manipulating the database statically and the complexity of manipulating it dynamically.
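The two query semantics of Section 7 can be illustrated by a small brute-force sketch. It is not an algorithm from the paper, and it rests on several assumptions made only for illustration: attribute values are drawn from a small fixed domain (the definitions quantify over all extensions, so a bounded search may report that a query holds when an extension using other values would refute it), an extension of q is taken to be a relation p on U with q contained in π_V(p), a conservative extension is one with π_V(p) = q, and C is represented by a Python predicate. The names project, satisfies_fd, all_relations and holds are hypothetical.

```python
from itertools import combinations, product


def project(p, idx):
    """Projection of relation p onto the attribute positions in idx."""
    return frozenset(tuple(t[i] for i in idx) for t in p)


def satisfies_fd(p, lhs, rhs):
    """True if p satisfies the functional dependency lhs -> rhs."""
    seen = {}
    for t in p:
        key = tuple(t[i] for i in lhs)
        val = tuple(t[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def all_relations(domain, arity):
    """Every relation of the given arity over a finite domain (assumption)."""
    tuples = list(product(domain, repeat=arity))
    for k in range(len(tuples) + 1):
        for subset in combinations(tuples, k):
            yield frozenset(subset)


def holds(query, q, v_idx, arity, domain, constraint, conservative):
    """Bounded check that `query` holds in pi_V(p) for every (conservative)
    extension p of q over `domain` that satisfies `constraint`."""
    q = frozenset(q)
    for p in all_relations(domain, arity):
        proj = project(p, v_idx)
        is_extension = (proj == q) if conservative else (q <= proj)
        if is_extension and constraint(p) and not query(proj):
            return False
    return True


# Illustrative setting: U = ABC, V = AB, C = {AC -> B}, q = {(0, 0)}.
q = {(0, 0)}
fd = lambda p: satisfies_fd(p, (0, 2), (1,))
no_a1 = lambda r: all(t[0] != 1 for t in r)   # "no tuple has A = 1"
has_a0 = lambda r: any(t[0] == 0 for t in r)  # "some tuple has A = 0"

print(holds(no_a1, q, (0, 1), 3, (0, 1), fd, conservative=True))    # True
print(holds(no_a1, q, (0, 1), 3, (0, 1), fd, conservative=False))   # False
print(holds(has_a0, q, (0, 1), 3, (0, 1), fd, conservative=False))  # True
```

The negative query CWA-holds because every conservative extension projects back to q itself, in line with the remark that CWA evaluation reduces to evaluating the query on q when q is CWA-consistent. Under OWA the extension {(0,0,0), (1,0,0)} satisfies AC -> B and refutes it.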

Acknowledgements. I'd like to thank Ron Fagin and Shuky Sagiv for their comments on a previous draft of this paper.

References

[AU79] Aho, A.V., Ullman, J.D.: Universality of data retrieval languages. Proc. 6th ACM Symp. on Principles of Programming Languages, San Antonio, 1979, pp. 110-117.

[Bo77] Borodin, A.B.: On relating time and space to size and depth. SIAM J. Comput. 6(1977), pp. 733-744.

[BV81] Beeri, C., Vardi, M.Y.: The implication problem for data dependencies. Proc. 8th Int. Colloq. on Automata, Languages, and Programming, July 1981, Lecture Notes in Computer Science - Vol. 115, Springer-Verlag, 1981, pp. 73-85.

[Ch81] Chandra, A.K.: Programming primitives for database languages. Proc. 8th ACM Symp. on Principles of Programming Languages, 1981, pp. 50-62.

[CH82] Chandra, A.K., Harel, D.: Structure and complexity of relational queries. J. Computer and System Sciences 25(1982), pp. 99-128.

[CKS85] Cosmadakis, S.S., Kanellakis, P.C., Spyratos, N.: Partition semantics for relations. Proc. 4th ACM Symp. on Principles of Database Systems, Portland, March 1985, pp. 261-275.

[CM77] Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational databases. Proc. 9th ACM Symp. on Theory of Computing, 1977, pp. 77-90.

[Co75] Codd, E.F.: Understanding relations (Installment #7). FDT Bull. of ACM-SIGMOD 7, 3-4(1975), pp. 25-28.

[Fa82] Fagin, R.: Horn clauses and database dependencies. J. ACM 29(1982), pp. 952-985.

[Fe74] Feferman, S.: Two notes on abstract model theory - Properties invariant on the range of definable relations between structures. Fundamenta Math. 82(1974), pp. 153-165.

[FUV83] Fagin, R., Ullman, J.D., Vardi, M.Y.: On the semantics of updates in databases. Proc. 2nd ACM Symp. on Principles of Database Systems, Atlanta, March 1983, pp. 352-365.

[GH83] Ginsburg, S., Hull, R.: Characterizations for functional dependency and Boyce-Codd normal form families. Theoretical Computer Science 29(1983), pp. 243-284.

[GZ82] Ginsburg, S., Zaiddan, S.: Properties of functional dependency families. J. ACM 29(1982), pp. 678-698.

[GMV86] Graham, M., Mendelzon, A.O., Vardi, M.Y.: Notions of dependency satisfaction. To appear in J. ACM.

[GV84] Graham, M.H., Vardi, M.Y.: On the complexity and axiomatizability of consistent database states. Proc. 3rd ACM Symp. on Principles of Database Systems, Waterloo, April 1984, pp. 281-289.

[Ha76] Hajek, P.: Some remarks on observational model-theoretic languages. Proc. 2nd Conf. on Set Theory and Hierarchy Theory, 1975. Lecture Notes in Mathematics - Vol. 537, Springer-Verlag, 1976, pp. 335-345.

[Ho82] Honeyman, P.: Testing satisfaction of functional dependencies. J. ACM 29(1982), pp. 668-677.

[Hu84] Hull, R.: Finitely specifiable implicational dependency families. J. ACM 31(1984), pp. 210-226.

[Hu86] Hull, R.: Non-finite specifiability of projections of functional dependency families. To appear in Theoretical Computer Science.

[IL84] Imielinski, T., Lipski, W.: Incomplete information in relational databases. J. ACM 31(1984), pp. 761-791.

[Ko81] Kowalski, R.A.: Logic as a database language. Proc. Advanced Seminar on Theoretical Issues in Databases, Cetraro, 1981.

[Li81] Lipski, W.: On databases with incomplete information. J. ACM 28(1981), pp. 41-70.

[Li83] Lipski, W.: Logical problems related to incomplete information in databases. Research Report #138, Laboratoire de Recherche en Informatique, Universite de Paris-Sud, 1983.

[Ma83] Maier, D.: The Theory of Relational Databases. Computer Science Press, Rockville, Maryland, 1983.

[MUV84] Maier, D., Ullman, J.D., Vardi, M.Y.: On the foundations of the universal relation model. ACM Trans. on Database Systems 9(1984), pp. 283-308.

[McDD80] McDermott, D., Doyle, J.: Nonmonotonic logic I. Artificial Intelligence 13(1980), pp. 41-72.

[NG78] Nicolas, J.M., Gallaire, H.: Database - theory vs. interpretation. In Logic and Data Bases, Plenum Press, New York, 1978, pp. 33-54.

[Re78] Reiter, R.: On closed world databases. In Logic and Databases (H. Gallaire and J. Minker, eds.), Plenum Press, New York, 1978, pp. 55-76.

[Re80] Reiter, R.: A logic for default reasoning. Artificial Intelligence 13(1980), pp. 81-132.

[Re83] Reiter, R.: A sound and sometimes complete query evaluation algorithm for relational databases with null values. Tech. Report 83-11, Dept. of Computer Science, Univ. of British Columbia, 1983.

[Re84] Reiter, R.: Towards a logical reconstruction of relational database theory. In On Conceptual Modelling (M.L. Brodie, J. Mylopoulos, and J.W. Schmidt, eds.), Springer-Verlag, New York, 1984, pp. 191-233.

[Ul83] Ullman, J.D.: Principles of Database Systems. Computer Science Press, Potomac, Maryland, 1983.

[Va82] Vardi, M.Y.: The complexity of relational query languages. Proc. 14th ACM Symp. on Theory of Computing, San Francisco, May 1982, pp. 137-146.

[Va85] Vardi, M.Y.: Querying logical databases. Proc. 4th ACM Symp. on Principles of Database Systems, March 1985, pp. 57-65. To appear in J. Computer and System Sciences.
