ON THE UPDATABILITY OF RELATIONAL VIEWS+

Umeshwar Dayal and Philip A. Bernstein

Aiken Computation Laboratory Harvard University Cambridge, MA 02138

Abstract

Most relational database systems provide a facility for supporting user views. Permitting this level of abstraction has the danger, however, that update requests issued by a user within the context of his view may not translate correctly into equivalent updates on the underlying database. It is the purpose of this paper to formalize the notion of correct translatability, and to derive constraints on view definitions that ensure the existence of correct update mappings. In summary, our theorems show that there are very few situations in which view updates are possible--even fewer, in fact, than intuition might suggest.

1. INTRODUCTION

Most database systems provide a facility whereby a user can delimit his view of the database to that portion of it which is relevant to his application. This user-defined abstraction is called a user submodel or view. Views have three primary benefits [2]. They simplify the user interface by allowing a user to ignore data that is of no interest to him. They enhance data independence, since most changes in the structure of the database need not impact the view. And, they provide a measure of protection by preventing a user from accessing data outside his view.

The structure of a database is defined by a schema; the content of the database (e.g., sets of tuples) is called the extension of the schema. The structure of a view is defined by a sequence of operations applied to the schema; the content of the view, called the extension of the view, is defined by the same sequence of operations applied to the schema extension. This sequence of operations is a functional mapping from schema extensions to view extensions and is called the view definition function. In the relational model, a view definition can include: renaming attributes; changing the units or representation of values in some domain; creating computed attributes, such as aggregations; and relational-algebraic operations on the base relations [3]. A view extension does not have an independent existence; it is completely defined by applying the definition function to the current schema extension. Because this mapping is functional, updates that cause changes in the schema extension are unambiguously translated into corresponding changes in the view extension.

For a view to be useful, users must be able to apply retrieval and update operations to it. These operations on the view must be translated into functionally equivalent operations on the schema extension. It is commonly known that retrievals are easier to handle than updates, since retrievals from a view extension can always be mapped into equivalent retrievals from the schema extension. To evaluate a retrieval query against a view, one can construct the view extension by applying the view definition to the schema extension and then evaluate the query against the view extension so constructed. Clearly, this procedure retrieves exactly the data requested by the query. (We are not suggesting that this is the most efficient procedure; our aim is only to point out that it is always possible to retrieve exactly the desired data from the database by querying the view.) An alternative approach is to augment the query by the view definition, and then apply the modified query to the schema extension. (This is called "query modification" in [11].)

A mapping is also required to translate view updates into equivalent schema updates. However, such an update mapping does not always exist and, when it does exist, it may not be unique [5]. So a change in the view extension may not translate unambiguously into equivalent changes in the schema extension.

By way of illustration, consider the classical EMPLOYEE schema and view of Fig. 1(a,b). (The notation should be clear; it is precisely defined in Sec. 2.) The deletion of the tuple <...> from the view EM* translates uniquely to the deletion of <...> from ED*. No other action has the desired effect without also excising other tuples from the view. For example, deleting <...> from DM* also causes the deletion of <...>. So, for this view, there does indeed exist a unique mapping from view deletions to schema deletions. However, insertions pose a problem. To insert the tuple <...> into the view, we are faced with the problem of deciding whether Smith is to be assigned to the Concurrency or Implementation DEPT, or to some new DEPT which is also managed by Marill. Hence, in this case there is no unique mapping that translates view insertions into schema insertions.
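The contrast between deletions and insertions on the view EM can be replayed on the data of Fig. 1. The following is a minimal sketch, not the paper's formalism: relations are modelled as Python sets of tuples, and the names `def_em` and `candidate_depts` are hypothetical helpers introduced here. The employee Rothnie is chosen only for illustration; Smith and Marill are the names used in the insertion example of the text.

```python
# Base relation extensions ED* and DM* from Fig. 1, as sets of tuples.
ED = {
    ("Beeri", "Normalization"), ("Rothnie", "Concurrency"),
    ("Goodman", "Implementation"), ("Gagliardi", "Performance"),
    ("Fagin", "Normalization"), ("Mylopoulos", "AI"),
    ("Shipman", "Concurrency"),
}
DM = {
    ("Normalization", "Bernstein"), ("Normalization", "Codd"),
    ("Concurrency", "Marill"), ("Implementation", "Marill"),
}


def def_em(ed, dm):
    """View definition function for EM: join ED and DM on DEPT,
    then keep only (EMPNAME, MGR)."""
    return {(emp, mgr) for (emp, d1) in ed for (d2, mgr) in dm if d1 == d2}


def candidate_depts(dm, mgr):
    """Existing departments managed by `mgr`. Each one is a distinct way
    (hypothetical helper) of making a new (employee, mgr) tuple appear
    in the view by inserting into ED alone."""
    return sorted(dept for (dept, m) in dm if m == mgr)


view = def_em(ED, DM)

# Deleting one ED tuple removes exactly one view tuple.
lost_via_ed = view - def_em(ED - {("Rothnie", "Concurrency")}, DM)

# Deleting the matching DM tuple removes more than was asked for.
lost_via_dm = view - def_em(ED, DM - {("Concurrency", "Marill")})

# An insertion of Smith under Marill has several equally valid translations.
choices = candidate_depts(DM, "Marill")

print(lost_via_ed)
print(lost_via_dm)
print(choices)
```

The deletion through ED* touches only the requested view tuple, while the deletion through DM* also removes Shipman's tuple. The insertion has at least two candidate departments even before new departments are considered, which is the ambiguity the text describes.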

CH1389-6/78/0000-0368$00.75 © 1978 IEEE

The purpose of this paper is to develop correctness criteria for translating view updates, and to determine the conditions that must be imposed on the view definition function to guarantee the existence of update mappings satisfying these criteria. We first derive these conditions in terms of schema and view extensions. Later, we use functional dependencies as a means of representing semantic relationships among data items; this permits a syntactic characterization of the conditions on extensions. We believe that this two-level approach to the problem is an attractive one. The conditions derived in terms of extensions are intuitively comprehensible and show exactly why certain updates on certain views translate correctly to schema updates while others do not. On the other hand, the syntactic characterization provides an efficient implementation: a view update that does not translate correctly into a schema update can be detected at compile time as a syntax violation without having to actually access the database. In summary, our theorems show that there are very few situations in which view updates are possible--even fewer, in fact, than intuition might suggest.

The next section defines precisely the notation used in this paper. The correctness criteria for update mappings are developed in Sec. 3. In Sec. 4, necessary and sufficient conditions for correct translation of updates are derived in terms of extensions. In Sec. 5, these conditions are expressed syntactically in the language of keys and functional dependencies.

FIGURE 1

a. Schema:

   DEFINE RELATION SCHEME ED(EMPNAME,DEPT)
   DEFINE RELATION SCHEME ESA(EMPNAME,SALARY,AGE)
   DEFINE RELATION SCHEME DM(DEPT,MGR)

   Extension:

   ED*   EMPNAME      DEPT
         Beeri        Normalization
         Rothnie      Concurrency
         Goodman      Implementation
         Gagliardi    Performance
         Fagin        Normalization
         Mylopoulos   AI
         Shipman      Concurrency

   DM*   DEPT             MGR
         Normalization    Bernstein
         Normalization    Codd
         Concurrency      Marill
         Implementation   Marill

   ESA*  EMPNAME   SALARY   AGE
         Beeri     55K      25
         Rothnie   75K      28
         Goodman   99K      28

b. View EM:

   RANGE OF ed IS ED
   RANGE OF dm IS DM
   DEFINE VIEW EM(EMPNAME=ed.EMPNAME, MGR=dm.MGR) WHERE ed.DEPT=dm.DEPT

   EM*   EMPNAME   MGR
         Beeri     Bernstein
         Beeri     Codd
         Rothnie   Marill
         Goodman   Marill
         Fagin     Bernstein
         Fagin     Codd
         Shipman   Marill

c. View EDS:

   RANGE OF ed IS ED
   RANGE OF esa IS ESA
   DEFINE VIEW EDS(EMPNAME=ed.EMPNAME, DEPT=ed.DEPT, SALARY=esa.SALARY) WHERE ed.EMPNAME=esa.EMPNAME

   EDS*  EMPNAME   DEPT             SALARY
         Beeri     Normalization    55K
         Rothnie   Concurrency      75K
         Goodman   Implementation   99K

d. View EDM:

   DEFINE VIEW EDM(EMPNAME=ed.EMPNAME, DEPT=ed.DEPT, MGR=dm.MGR) WHERE ed.DEPT=dm.DEPT

   EDM*  EMPNAME   DEPT             MGR
         Beeri     Normalization    Bernstein
         Beeri     Normalization    Codd
         Rothnie   Concurrency      Marill
         Goodman   Implementation   Marill
         Fagin     Normalization    Bernstein
         Fagin     Normalization    Codd
         Shipman   Concurrency      Marill

2. NOTATION

In the relational data model, data is organized in relations. A relation R* over a set of attributes A = {A1,...,Am} is a subset of the Cartesian product dom(R) = dom(A1) x ... x dom(Am), where for each Ai in A, dom(Ai) is the domain of associated values. The structure of a relation is described by a relation scheme R(A1,...,Am), and is defined by the statement DEFINE RELATION SCHEME R(A1,...,Am). Note that the relation scheme R is static while the relation R* (i.e., its extension) changes as tuples are inserted, modified, or deleted.

A schema $ consists of a set of relation schemes $ = {Ri(Ai1,...,Aimi) | i=1,...,n}. At any time the extension $* of the schema is given by $* = {R1*,...,Rn*}, where Ri* is the current extension of Ri.

We allow semantic integrity constraints to be imposed on the data in the database [6,8]. Each constraint is a predicate, SEMj, on the product space 2^dom(R1) x ... x 2^dom(Rn). We denote the conjunction of all such constraints on the database by predicate SEM$ with the interpretation that SEM$(R1*,...,Rn*) is TRUE iff $* = {R1*,...,Rn*} is a semantically consistent extension, i.e., $* satisfies all the semantic integrity constraints.
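The predicate SEM$ can be pictured as a conjunction of ordinary boolean functions over a schema extension. The sketch below is an illustration under stated assumptions: a schema extension is modelled as a dictionary from relation names to sets of tuples, and `key_constraint` and `sem_schema` are hypothetical names introduced here. The one constraint used, that EMPNAME is a key of ED, is the example constraint the paper itself mentions.

```python
from typing import Callable, Dict, List, Set, Tuple

# A schema extension: relation name -> current set of tuples.
Extension = Dict[str, Set[Tuple[str, ...]]]
# One semantic integrity constraint SEMj: a predicate on the whole extension.
Constraint = Callable[[Extension], bool]


def key_constraint(relation: str, key_positions: Tuple[int, ...]) -> Constraint:
    """Build a constraint stating that the attributes at `key_positions`
    form a key of `relation` (no two tuples agree on all of them)."""
    def sem(ext: Extension) -> bool:
        keys = [tuple(t[i] for i in key_positions) for t in ext[relation]]
        return len(keys) == len(set(keys))
    return sem


def sem_schema(constraints: List[Constraint], ext: Extension) -> bool:
    """SEM$: TRUE iff the extension satisfies every constraint."""
    return all(c(ext) for c in constraints)


# Assumed constraint set: EMPNAME (position 0) is a key of ED.
constraints = [key_constraint("ED", (0,))]

extension: Extension = {
    "ED": {("Rothnie", "Concurrency"), ("Shipman", "Concurrency")},
}
# A second Shipman tuple would violate the key constraint.
violating: Extension = {
    "ED": extension["ED"] | {("Shipman", "AI")},
}

print(sem_schema(constraints, extension))
print(sem_schema(constraints, violating))
```

Representing each SEMj as a closure keeps the conjunction open-ended: further constraints can be appended to the list without changing `sem_schema`.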

A view V is defined from schema $ using a set of tuple variables, as in QUEL [9]. A tuple variable t is a variable whose value is a tuple, t, of a relation R*. We say that R is the range of t. Tuple variables are declared by means of range statements: RANGE OF t IS R. We assume, for simplicity, that at most one tuple variable ranges over any relation scheme. The definition of view relation V from schema $ is formally expressed by the statement: DEFINE VIEW V(<target list>) WHERE <qualification>. The <target list> is a list of declarations of the form <...>=<...>, where the right-hand side is: (i) a constant; (ii) an indexed-tuple t.A, where t is a tuple variable and A is an attribute of the range of t; or (iii) a function of indexed-tuples f(t1.A1,...,tp.Ap). In this paper, we

... translates into the insertion of <...> into ED* and <...> into ESA* (where '-' stands for NULL or "undefined"). If, instead, we were to insert the tuple <...> into ESA*, then the AGE-value 25 is extraneous with respect to the desired insertion on the view. Extraneous updates are undesirable because they are not required for the view update and yet alter the extension of the schema.

The set of updates U has no side effects if only the desired update is performed on the view. For example, consider the insertion of <...> into the view extension EDM* (Fig. 1d). This insertion requires inserting <...> into ED* and <...> into DM*. However, now the tuple <...> also appears in the view extension. This is a side effect, which we judge to be undesirable because it is not requested by the user and yet alters the view extension.

A set of update operations on a semantically consistent schema extension preserves semantic consistency if the resulting schema extension is also semantically consistent. An update that violates this criterion is the insertion of <...> into EDM* (Fig. 1d). This requires the insertion of <...> into ED*. But if there is a semantic constraint which prohibits the presence of two tuples in ED* with EMPNAME=Shipman (e.g., if EMPNAME is a key of ED), then this insertion violates semantic consistency.

The uniqueness criterion is controversial and requires some justification. This criterion says that the inverse mapping (of the view definition) from view extensions to schema extensions must be a function. The alternative, of course, is that there are several distinct sets of schema updates that will alter the view extension as desired. The problem in the latter case is how to choose which set to apply. An arbitrary choice seems unacceptable. For example, we have seen in Sec. 1 that to insert a tuple into view extension EM* (Fig. 1b), any value of DEPT will do. An arbitrary choice would (incorrectly) suggest knowledge of the DEPT value, when in fact it is unknown. There might exist semantic integrity constraints that help in disambiguating the several updates. In the absence of such constraints, one could use a unique NULL-value for the DEPT attributes. This last alternative is attractive, but poses other questions. For example, if DEPT were a key of DM (in Fig. 1a it actually is not), then what semantics do we use to replace NULL-valued keys by real domain-values? We have found no simple way out of this quandary, and have decided to follow earlier work (e.g., [2,12]) by selecting uniqueness as the criterion. However, our results are somewhat sensitive to this correctness criterion, and alternative criteria should undoubtedly be formulated and investigated.

There is another uncomfortable dimension to the issue of translatability. In our examples, we translated a view insertion only into ...

So, there is a degree of freedom in defining update mappings. The view definition function, which describes the relationship between the contents of the database and of the view extension, is often not sufficient to express the semantics of the relationship between update operations on the view extension and update operations on the schema extension. Some additional semantics are required. We have chosen to map view updates of one type (insertions, deletions, replacements) to schema updates of the same type. Theoretically, this choice is arbitrary. Intuitively, though, it makes good sense and, in the absence of a theory of update semantics for relations, intuition is the strongest argument we can bring to bear.

Based on this last restriction and on the correctness criteria developed in this section, we have derived formal necessary and sufficient conditions for view updates to be translatable. Admittedly, our correctness criteria have some weaknesses. However, these weaknesses are due to gaps in our understanding of update semantics in the relational model; they are not due to inherent weaknesses in our formal model of views. We see our formal approach to the views problem and our theorems to be equally important contributions. Different update semantics may lead to different theorems, but we see our formal framework as remaining essentially intact.
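The no-side-effects criterion of this section can be checked mechanically by comparing the view extension before and after a proposed set of schema updates. The sketch below does this for the view EDM of Fig. 1d. The inserted tuple (Smith, Concurrency, Codd) is a hypothetical example chosen for this illustration, not necessarily the tuple used in the paper, and `def_edm` and `side_effects` are names introduced here.

```python
# Base relation extensions ED* and DM* from Fig. 1.
ED = {
    ("Beeri", "Normalization"), ("Rothnie", "Concurrency"),
    ("Goodman", "Implementation"), ("Gagliardi", "Performance"),
    ("Fagin", "Normalization"), ("Mylopoulos", "AI"),
    ("Shipman", "Concurrency"),
}
DM = {
    ("Normalization", "Bernstein"), ("Normalization", "Codd"),
    ("Concurrency", "Marill"), ("Implementation", "Marill"),
}


def def_edm(ed, dm):
    """View definition function for EDM: join ED and DM on DEPT,
    keeping (EMPNAME, DEPT, MGR)."""
    return {(emp, d1, mgr) for (emp, d1) in ed for (d2, mgr) in dm if d1 == d2}


def side_effects(before, after, requested):
    """View tuples that appeared or disappeared without being requested."""
    return (before ^ after) - requested


# Hypothetical view insertion and its obvious translation to the schema.
requested = {("Smith", "Concurrency", "Codd")}
before = def_edm(ED, DM)
after = def_edm(
    ED | {("Smith", "Concurrency")},
    DM | {("Concurrency", "Codd")},
)

extra = side_effects(before, after, requested)
for view_tuple in sorted(extra):
    print(view_tuple)
```

The translation does make the requested tuple appear, but it also creates three view tuples nobody asked for: Smith joins the existing manager of Concurrency, and the existing Concurrency employees join the new manager.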

Deletions

Let $ be a schema, and V be a view relation defined on $ by DEFv. Assume that the initial schema extension $* is semantically consistent. The initial view extension is V*=DEFv($*). Let v be a tuple variable ranging over V. The syntax of a deletion operation d on V* is: DELETE v WHERE <qualification>.

Let d: 2^dom(V) → 2^dom(V) be the function that performs the deletion specified by d, and let Vd* be the set of tuples of V* that satisfy the qualification (and are, therefore, to be deleted). Then, the new extension after the deletion is: d(V*) = V* - Vd*.

A set of deletion operations D on $* deletes Vd* if V* - Vd* ⊇ DEFv(D($*)), where D is the function performed by D on $*. The deletion is without side effects if V* - Vd* ⊆ DEFv(D($*)). D preserves semantic consistency if SEM$(D($*)) is TRUE. ... D' = D - Dj ... A set of operations ... with respect to the deletion ...

... by v.X=x, and the deletion by d(X). The results are easily extended to the case where the qualification is in Disjunctive Normal Form; such a deletion can be modelled as a sequence of simple deletions.

Before stating the conditions, let us consider a few examples. Suppose we wish to delete the tuple <Rothnie, Concurrency, Marill> from the view EDM* of Fig. 1d. We can achieve this only by deleting <...> from ED*, for deleting <...> from DM* produces the side effect of excising <...> from the view extension as well. We say that {<...>} is a functionally determining source of <...> because it is not involved in the creation of any other view tuples. On the other hand, {<Concurrency, Marill>} is a source of <...> but not a functionally determining source, because it is also involved in the creation of <...>, which has no other source in DM*.

Next, consider the deletion from EDM* of all tuples with MGR=Marill, i.e., the tuples <...>, <...>, and <...>. We now have the option of deleting either T = {<Rothnie, ...} from ED*, or ... sources of the MGR-value Marill in the view.

Finally, consider the deletion of the tuple <...> from EDM*. There is no way of deleting tuples from ED* and DM* to effect the desired deletion without side effects, since every tuple of these relations is either not involved in the creation of <...> or is involved in the creation of other tuples as well. That is, neither relation contains a functionally determining source of <...>.

Proceeding formally, a tuple tk ∈ Rk* is a source-tuple ...

... in ESA* by <...>. However, the request to replace Beeri by Mylopoulos is not translatable (if EMPNAME is a key of ED, then this violates semantic consistency; otherwise, it causes additive side effects).

THEOREM 3. Let $ = {R1,...,Rn} be a schema, and V a view defined on it. Let r(X,x) be the simple replacement that replaces v with v[X]=x in V* by v^r. For all semantically consistent extensions $*, r(X,x) is translatable to replacements on the schema iff there exist unique (possibly empty)

Ti ⊆ Ri*, i = 1,...,n, such that:

1. For every v ∈ V* such that v[X] = x, there exist ti ∈ Ti, i = 1,...,n, such that the corresponding t1^r,...,tn^r are source-tuples of v^r;
2. The resulting schema extension {R1*',...,Rn*'} is semantically consistent, where Ri*' = (Ri* - Ti) ∪ {ti^r | ti ∈ Ti};
3. The set of operations {replace ti in Ri* by ti^r for all ti ∈ Ti, i = 1,...,n} contains no extraneous replacements;
4. For every ti ∈ Ti, if ti is a source-tuple of some v ∈ V* such that v[X] ≠ x, then there exists another source of v after replacement; and
5. For every ui ∈ Ri* - Ti, i.e., for every ui that is unchanged, if ui is a source-tuple of some view-tuple w after replacement, then there exists before replacement a view-tuple v such that v^r = w.

5. SYNTACTIC CHARACTERIZATION

In Sec. 4, we stated semantic conditions for the existence of correct mappings from view updates to schema updates. Now, we introduce keys as semantic integrity constraints and use them to imply the semantic conditions needed for updates to be translatable. Since keys are syntactic constructs, they will give us a compile-time check on update translatability.

A functional dependency is a time-varying function f: dom(A1) x ... x dom(An) → dom(B1) x ... x dom(Bm), where {A1,...,An}, {B1,...,Bm} are sets of database attributes. At every point in time, for a given n-tuple in dom(A1) x ... x dom(An), there exists at most one m-tuple in dom(B1) x ... x dom(Bm) in the database. For notational convenience, we write f: A1,...,An → B1,...,Bm. A complete axiomatisation of functional dependencies appears in [1].

A subset K of the set of attributes of a relation scheme R(A1,...,An) is a superkey of R if K → Ai, i = 1,...,n. If no proper subset of K has this property, then K is a key of R. We will assume that a set of keys is specified for every relation scheme in the schema and that there are no functional dependencies in the schema other than those implied by keys. One key is identified as the primary key for every relation scheme; NULL values are never specified for the attributes of a primary key [4].

To conveniently describe the syntactic conditions for update translatability, we use two graphical representations of the view definitions and of the semantic information provided by keys. First, we define a view trace graph G(V) to be a directed graph with nodes and arcs as follows. Every attribute of a base relation scheme is represented by a O-node, and every view attribute by a A-node, labelled with the corresponding attribute name. For every declaration D=ti.A in the target list of V, there are arcs <...> and <...>, where Ri is the range of ti. For every equijoin clause ti.A=tj.B in the qualification of V, there are arcs <Ri.A,Rj.B> and <Rj.B,Ri.A>. For every equirestrictive clause ti.A=c in the qualification, introduce a new A-node labelled with the constant value 'c', and add an arc <...>.

If there is a path from a A-node V.D to a O-node Ri.A, then we say that Ri.A is traceable from the view, and V.D is a view-trace of Ri.A. This indicates that given a value for V.D, we can propagate it uniquely to Ri.A. If L is a set of O-nodes, then {V.D | V.D is a view-trace of some A ∈ L} is the view-trace of L. Figure 2 gives the view-trace graphs for views EM, EDS, and EDM, of Fig. 1. Observe that the join attributes ED.DEPT and DM.DEPT are not traceable from view EM, whereas they are traceable from EDM. In Fig. 2b, the join attributes ED.EMPNAME and ESA.EMPNAME are both traceable from view EDS, and have the same view-trace EDS.EMPNAME.

Next, we define an augmented view-graph F(V) by augmenting the view-trace graph G(V) to include the semantic information provided by functional dependencies. For every functional dependency f: A1,...,An → B, add the FD-node f and the arcs of Fig. 3a. (For notational convenience, we represent f: A → B, where A and B are single attributes, by an arc <...>.) Also, if there is a functional dependency A1,...,An → B, and some Ai ∈ {A1,...,An} appears in an equirestrictive clause in the qualification of V, then draw an arc <B,Ai>.

We define paths in F(V) as follows.

i. If there is an arc from node A to node B, then there is a path from A to B.
ii. Let f be an FD-node representing the functional dependency f: A1,...,An → B, and let Y be a set of nodes. If there is a path from Y to every Ai, 1 ≤ i ≤ n, then there is a path from Y to B, i.e., an FD-node may be traversed only if all its "inputs" are traversed (Fig. 3b).

iii. If A and B are nodes and there is a sequence of nodes C0,...,Ck such that C0=A, Ck=B, and for all j, 0 ≤ j ≤ k-1, there is a path from Cj to Cj+1, then there is a path from A to B.
iv. Let Y be a set of nodes. If there is a path from some subset of Y to a node A, then there is a path from Y to A.
v. Let Y and Z be sets of nodes. If there is a path from Y to every node in Z, then there is a path from Y to Z.

The notation Y => Z means that there is a path in F(V) from Y to Z.

A path in F(V) indicates a functionally determining source. Suppose there is a path from a set of nodes Ri.L to a node Rj.B. Then, for all semantically consistent schema extensions {R1*,...,Rn*} and all tk, tk' ∈ Rk*, 1 ≤ k ≤ n, (QUALV(t1,...,tn) ∧ QUALV(t1',...,tn') ∧ ti[L] = ti'[L]) implies tj[B] = tj'[B], i.e., for any two Ri*-tuples that have the same value of Ri.L, all Rj*-tuples that they are "linked to" through QUALV must have the same value of Rj.B. Similarly, if there is a path from a set of O-nodes Ri.L to a A-node V.B, then for all semantically consistent schema extensions {R1*,...,Rn*} and all ...

To see the connection between paths and sources, suppose ti ∈ Ri* is a source-tuple of b ∈ dom(V.B) and that Ri.K => V.B. Then it must be that ti is not "linked to" any other view tuple with b' ≠ b, since this violates Ri.K => V.B. That is, if ti is a source-tuple of b, then it is a functionally determining source-tuple of b. So, we can use the paths in F(V) to check for functionally determining sources, and the paths in G(V) to check for traceability. Figures 4b,c,d give the augmented view graphs of views EM, EDS, and EDM, defined as in Fig. 1 on the schema {ED, ESA, DM} with constraints as defined in Fig. 4a. Observe that while EM.EMPNAME => DM.DEPT (Fig. 4b), DM.DEPT is not traceable from the view EM (Fig. 2a).

FIGURE 2. View-Trace Graphs for the Views of Fig. 1

a. G(EM): O-nodes ED.EMPNAME, ED.DEPT, DM.DEPT, DM.MGR; A-nodes EM.EMPNAME, EM.MGR. [graph not reproduced]
b. G(EDS): O-nodes ED.EMPNAME, ED.DEPT, ESA.EMPNAME, ESA.SALARY, ESA.AGE; A-nodes EDS.EMPNAME, EDS.DEPT, EDS.SALARY. [graph not reproduced]
c. G(EDM): O-nodes ED.EMPNAME, ED.DEPT, DM.DEPT, DM.MGR; A-nodes EDM.EMPNAME, EDM.DEPT, EDM.MGR. [graph not reproduced]

FIGURE 3

a. Representation of f: A1,...,An → B. [graph not reproduced]
b. Path from Y = {C1,...,Ck} to B through f. [graph not reproduced]

FIGURE 4. Integrity Constraints Expressed as ...

a. Schema with Keys:

   DEFINE RELATION SCHEME ED(EMPNAME,DEPT) WITH KEY {EMPNAME}
   DEFINE RELATION SCHEME DM(DEPT,MGR) WITH KEY {DEPT,MGR}
   DEFINE RELATION SCHEME ESA(EMPNAME,SALARY,AGE) WITH KEY {EMPNAME}

b. Augmented View Graph F(EM) for View EM: O-nodes ED.EMPNAME, ED.DEPT, DM.DEPT, DM.MGR; A-nodes EM.EMPNAME, EM.MGR. [graph not reproduced]
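The path rules for F(V) amount to a closure computation: starting from a set of nodes Y, follow arcs, and traverse an FD-node only once all of its inputs have been reached. The sketch below is one way to compute that closure, under stated assumptions. The function `reachable` and the encoding of the graph are introduced here for illustration. The arc set for view EM is an assumption: it takes arcs in both directions between each view attribute and the base attribute it is declared from, and both equijoin arcs between ED.DEPT and DM.DEPT. The only functional dependency used is ED.EMPNAME → ED.DEPT, implied by the key {EMPNAME} of ED in Fig. 4a; the key {DEPT,MGR} of DM implies no non-trivial dependency.

```python
def reachable(start, arcs, fds=()):
    """Return all nodes that have a path from the node set `start`
    (the start nodes themselves are included for convenience).

    arcs: set of (source, target) pairs.
    fds:  iterable of (inputs, output) pairs, one per FD-node; the
          output is reached only when every input has been reached.
    """
    reached = set(start)
    changed = True
    while changed:
        changed = False
        for src, dst in arcs:
            if src in reached and dst not in reached:
                reached.add(dst)
                changed = True
        for inputs, output in fds:
            if set(inputs) <= reached and output not in reached:
                reached.add(output)
                changed = True
    return reached


# Assumed encoding of the view-trace graph G(EM).
G_EM = {
    ("EM.EMPNAME", "ED.EMPNAME"), ("ED.EMPNAME", "EM.EMPNAME"),
    ("EM.MGR", "DM.MGR"), ("DM.MGR", "EM.MGR"),
    ("ED.DEPT", "DM.DEPT"), ("DM.DEPT", "ED.DEPT"),
}
# Functional dependencies implied by the keys of Fig. 4a.
FDS = [(("ED.EMPNAME",), "ED.DEPT")]

view_nodes = {"EM.EMPNAME", "EM.MGR"}

# Traceability uses G(V) only: no FD-nodes.
traceable = reachable(view_nodes, G_EM) - view_nodes

# Paths in the augmented graph F(EM) also use the FD-nodes.
from_empname = reachable({"EM.EMPNAME"}, G_EM, FDS)

print(sorted(traceable))
print(sorted(from_empname))
```

Under these assumptions the result agrees with the observations in the text: DM.DEPT is reached from EM.EMPNAME in F(EM), yet neither ED.DEPT nor DM.DEPT is traceable from the view in G(EM).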
