A Normal Form for Precisely Characterizing Redundancy in Nested Relations

WAI YIN MOK, YIU-KAI NG, and DAVID W. EMBLEY
Brigham Young University

We give a straightforward definition for redundancy in individual nested relations and define a new normal form that precisely characterizes redundancy for nested relations. We base our definition of redundancy on an arbitrary set of functional and multivalued dependencies, and show that our definition of nested normal form generalizes standard relational normalization theory. In addition, we give a condition that can prevent an unwanted structural anomaly in nested relations, namely, embedded nested relations with at most one tuple. Like other normal forms, our nested normal form can serve as a guide for database design.

Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Design - data models; normal forms

General Terms: Design, Theory

Additional Key Words and Phrases: Database design, data redundancy, functional and multivalued dependencies, nested normal form, nested relations, normalization theory, scheme trees

Much of the work on this paper was done while W. Y. Mok was at Hong Kong Polytechnic.
Authors' address: Department of Computer Science, Brigham Young University, Provo, UT 84602.
Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee.
© 1996 ACM 0362-5915/96/0300-0077 $03.50

ACM Transactions on Database Systems, Vol. 21, No. 1, March 1996, Pages 77-106.

1. INTRODUCTION

Although normalization theory for flat relations has a long research history, its extension to nested relations is much more recent. Partition Normal Form (PNF) [Roth et al. 1988], which guarantees essential properties for nesting and unnesting and for keys of nested relations, has been well accepted. Indeed, nested relations are sometimes defined such that only PNF relations are allowed (see Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987]), and for Abiteboul and Bidoit [1986], the definition predates PNF. A normal form for nested relation schemes that detects potential redundancy and the possible update anomalies that accompany redundancy, however, has not been widely accepted, even though some have been proposed [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987]. Although these earlier proposals provided guidance for the design of nested relation schemes, they did not succeed in precisely characterizing potential
redundancy. In this article we propose a new normal form for individual nested relation schemes that completely characterizes redundancy with respect to any given set of functional dependencies (FDs) and multivalued dependencies (MVDs). The result we present is a generalization of standard relational normalization theory. We proceed as follows. In Section 2, we provide our basic definitions for nested relations. Like Abiteboul and Bidoit [ 1986], Ozsoyoglu and Yuan [ 1987, 1989], and Roth and Korth [1987], we define our nested relations to be in PNF. In Section 2 we also give carefully specified redundancy definitions. As illustrations for our redundancy definitions, we give examples, which we use later to show that none of the earlier definitions fully detects redundancy. In Section 3, we present our definition, which we call NNF (Nested Normal Form). As we illustrate our definition, we also compare it to earlier definitions and show that ours can provide greater flexibility in how attributes may be clustered in nested relation schemes. In Section 4, we present a theorem guaranteeing that NNF detects potential redundancy. In Section 5, we investigate the converse of this theorem. We show that a nested relation scheme that is not consistent with the given set of MVDs and FDs, as we define consistency, is automatically not in our normal form. In addition, we are able to show that if a nested relation scheme is consistent with the given set of MVDs and FDs and there is no potential redundancy, then the nested relation scheme satisfies our definition of NNF. In Section 6, we show that our definition of NNF is a generalization of standard relational normalization theory. In particular, we show that 4th Normal Form (4NF), as defined by Fagin [1977], is a special case of NNF, and that Boyce-Codd Normal Form (BCNF) is also a special case when we limit the dependencies to FDs. Thus, like other normal forms, our definition of NNF can provide a guide to database design. 
It also has the drawbacks of these other normal forms, and, in this sense, is not a panacea for database design. We therefore comment on what our definition does and does not provide for the designer. In Section 7, we present a condition that can prevent an unwanted structural characteristic of nested relations, which we call singleton buckets because a nested relation represented by a singleton bucket allows at most one tuple. We then prove that this condition does indeed prevent singleton buckets. Although this condition has nothing to do with redundancy, it is in harmony with earlier definitions [Ozsoyoglu and Yuan 1989; Roth and Korth 1987] that also disallow singleton buckets. In Section 8, we present our conclusions.

2. BASIC DEFINITIONS AND PROPERTIES

2.1. Nested Relations

A nested relation allows each tuple component to be either atomic or another nested relation, which may itself be nested several levels deep. As in Abiteboul and Bidoit [1986], Ozsoyoglu and Yuan [1987, 1989], and Roth and Korth [1987], we are only interested in nested relations that are in PNF. Thus, in a nested relation, there can never be distinct tuples that agree on the atomic attributes of either the nested relation itself or of any nested relation embedded within it [Atzeni and De Antonellis 1993].

Definition 2.1.1. Let U be a set of attributes. A nested relation scheme is recursively defined as follows:

(1) If X is a nonempty subset of U, then X is a nested relation scheme over the set of attributes X.

(2) If $X, X_1, \ldots, X_n$ are pairwise disjoint, nonempty subsets of U, and $R_1, \ldots, R_n$ are nested relation schemes over $X_1, \ldots, X_n$, respectively, then $X(R_1)^* \ldots (R_n)^*$ is a nested relation scheme over $X X_1 \ldots X_n$.

Definition 2.1.2. Let R be a nested relation scheme over a nonempty set of attributes Z. Let the domain of an attribute $A \in Z$ be denoted by dom(A). A nested relation over R is recursively defined as follows:

(1) If R has the form X where X is a set of attributes $\{A_1, \ldots, A_n\}$, $n \geq 1$, then r is a nested relation over R if r is a (possibly empty) set of functions $\{t_1, \ldots, t_m\}$ where each function $t_i$, $1 \leq i \leq m$, maps $A_j$ to an element in dom($A_j$), $1 \leq j \leq n$.

(2) If R has the form $X(R_1)^* \ldots (R_m)^*$, $m \geq 1$, where X is a set of attributes $\{A_1, \ldots, A_n\}$, $n \geq 1$, then r is a nested relation over R if

    (a) r is a (possibly empty) set of functions $\{t_1, \ldots, t_p\}$ where each function $t_i$, $1 \leq i \leq p$, maps $A_j$ to an element in dom($A_j$), $1 \leq j \leq n$, and maps $R_k$ to a nested relation over $R_k$, $1 \leq k \leq m$, and

    (b) $t_i \in r$ and $t_j \in r$ and $t_i(X) = t_j(X)$ implies $t_i = t_j$, $1 \leq i, j \leq p$.

Each function of a nested relation r over nested relation scheme R is a nested tuple of r.

Example 2.1.1. Figure 1 shows a nested relation. Its scheme is Dept Chair (Prof (Hobby)* (Matriculation (Student (Interest)*)*)*)*, and it contains two nested tuples. As in Abiteboul and Bidoit [1986], we draw a bucket for each embedded nested relation. Each bucket also contains nested tuples of its own. For example, (Young, {Chess, Soccer}) and (Barker, {Skiing}) are nested tuples in the first bucket under the embedded nested relation scheme Student (Interest)*. Notice that, as required, PNF is satisfied. Thus the values for the atomic attributes Dept Chair differ, and in each bucket the atomic values differ.

Dept  Chair   (Prof   (Hobby)*          (Matriculation  (Student  (Interest)*)*)*)*
CS    Turing   Jane   {Skiing}           Ph.D.           Young    {Chess, Soccer}
                                                         Barker   {Skiing}
                                         M.S.            Adams    {Skiing}
               Pat    {Hiking}           Ph.D.           Lee      {Travel}
Math  Polya    Steve  {Dance, Hiking}    M.S.            Carter   {Travel, Skiing}

Fig. 1. Nested relation.

Definition 2.1.3. Let R be a nested relation scheme. Let r be a nested relation on R. The total unnesting of r is recursively defined as follows:

(1) If R has the form X, where X is a set of attributes, then r is the total unnesting of r.

(2) If R has the form $X(R_1)^* \ldots (R_n)^*$, where $X_i$ is the set of attributes in $R_i$, $1 \leq i \leq n$, then the total unnesting of r = $\{t \mid$ there exists a nested tuple $u \in r$ such that $t(X) = u(X)$ and $t(X_i)$ is a tuple in the total unnesting of $u(R_i)$, $1 \leq i \leq n\}$.
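The recursion in Definition 2.1.3 can be sketched in a few lines of Python. This is only an illustrative sketch under assumed conventions that are not part of the paper: a scheme is a pair (atomic attributes, child schemes), a nested tuple is a pair (dict of atomic values, list of buckets), and the names `total_unnest`, `leaf`, `students`, `SCHEME`, and `FIG1` are hypothetical. The sample data is transcribed from Figure 1.

```python
from itertools import product

# Assumed encoding (not from the paper):
#   scheme       = (atomic_attrs, [child schemes])
#   nested tuple = (dict of atomic values, [one bucket per child scheme])
#   bucket       = list of nested tuples

def total_unnest(scheme, relation):
    """Return the total unnesting of `relation` as a list of flat dicts."""
    _attrs, children = scheme
    flat = []
    for atoms, buckets in relation:
        # Case (2): first unnest every embedded relation of this nested tuple.
        parts = [total_unnest(c, b) for c, b in zip(children, buckets)]
        # Combine one flat tuple per bucket with the atomic values; an empty
        # bucket yields no combination. Case (1) is the product of nothing.
        for combo in product(*parts):
            row = dict(atoms)
            for part in combo:
                row.update(part)
            flat.append(row)
    return flat

def leaf(attr, values):
    """Bucket over a single-attribute leaf scheme."""
    return [({attr: v}, []) for v in values]

def students(pairs):
    """Bucket over Student (Interest)*."""
    return [({"Student": s}, [leaf("Interest", i)]) for s, i in pairs]

SCHEME = (("Dept", "Chair"), [(("Prof",), [
    (("Hobby",), []),
    (("Matriculation",), [(("Student",), [(("Interest",), [])])]),
])])

FIG1 = [
    ({"Dept": "CS", "Chair": "Turing"}, [[
        ({"Prof": "Jane"}, [leaf("Hobby", ["Skiing"]), [
            ({"Matriculation": "Ph.D."}, [students(
                [("Young", ["Chess", "Soccer"]), ("Barker", ["Skiing"])])]),
            ({"Matriculation": "M.S."}, [students([("Adams", ["Skiing"])])]),
        ]]),
        ({"Prof": "Pat"}, [leaf("Hobby", ["Hiking"]), [
            ({"Matriculation": "Ph.D."}, [students([("Lee", ["Travel"])])]),
        ]]),
    ]]),
    ({"Dept": "Math", "Chair": "Polya"}, [[
        ({"Prof": "Steve"}, [leaf("Hobby", ["Dance", "Hiking"]), [
            ({"Matriculation": "M.S."}, [students(
                [("Carter", ["Travel", "Skiing"])])]),
        ]]),
    ]]),
]

ROWS = total_unnest(SCHEME, FIG1)
```

Under this encoding, `ROWS` holds one flat dictionary per tuple of the total unnesting; the nested tuple (Young, {Chess, Soccer}) contributes the first two of them.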

Definition 2.1.4. Let R be a nested relation scheme. Let r be a nested relation on R. Let t be a nested tuple of r. The total unnesting of t is defined as the total unnesting of q, where q is a nested relation containing the single nested tuple t.

Example 2.1.2. Figure 2 shows the total unnesting of the nested relation in Figure 1. Observe that the first two tuples in the total unnesting contain the total unnesting of the nested tuple (Young, {Chess, Soccer}).

2.2. Scheme Trees

We can graphically represent a nested relation scheme by a tree, called a scheme tree. A scheme tree captures the logical structure of a nested relation scheme and explicitly represents a set of MVDs. Scheme trees have been used for earlier normal form definitions for nested relations [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987]. We use them here for the same purpose.

Definition 2.2.1. A scheme tree T corresponding to a nested relation scheme R is recursively defined as follows:

(1) If R has the form X, then T is a single node scheme tree whose root node is the set of attributes X.

(2) If R has the form $X(R_1)^* \ldots (R_n)^*$, then the root node of T is the set of attributes X, and a child of the root of T is the root of the scheme tree $T_i$, where $T_i$ is the corresponding scheme tree for the nested relation scheme $R_i$, $1 \leq i \leq n$.

Dept  Chair   Prof   Hobby   Matriculation  Student  Interest
CS    Turing  Jane   Skiing  Ph.D.          Young    Chess
CS    Turing  Jane   Skiing  Ph.D.          Young    Soccer
CS    Turing  Jane   Skiing  Ph.D.          Barker   Skiing
CS    Turing  Jane   Skiing  M.S.           Adams    Skiing
CS    Turing  Pat    Hiking  Ph.D.          Lee      Travel
Math  Polya   Steve  Dance   M.S.           Carter   Travel
Math  Polya   Steve  Dance   M.S.           Carter   Skiing
Math  Polya   Steve  Hiking  M.S.           Carter   Travel
Math  Polya   Steve  Hiking  M.S.           Carter   Skiing

Fig. 2. Total unnesting of nested relation in Fig. 1.

The one-to-one correspondence between a scheme tree and a nested relation scheme along with the definition of a nested relation scheme impose several properties on a scheme tree. Let T be a scheme tree. We denote the set of attributes in T by Aset(T). Observe that the atomic attributes of a nested relation scheme, at any level of nesting, constitute a node in a scheme tree. Observe further that because Definition 2.1.1 requires nonempty sets of attributes, every node in T consists of a nonempty set of attributes. Furthermore, because the sets of attributes corresponding to nodes in T are pairwise disjoint and include all the attributes of T, the nodes in T are pairwise disjoint, and their union is Aset(T).

Let N be a node in T. Notationally, Ancestor(N) denotes the union of attributes in all ancestors of N, including N. Similarly, Descendant(N) denotes the union of attributes in all descendants of N, including N. In a scheme tree T each edge (V, W), where V is the parent of W, denotes an MVD Ancestor(V) $\twoheadrightarrow$ Descendant(W). Notationally, we use MVD(T) to denote the union of all the MVDs represented by the edges in T. By construction, each MVD in MVD(T) is satisfied in the total unnesting of any nested relation for T. Because FDs are also of interest, we use FD(T) to denote any set of FDs equivalent to all FDs $X \to Y$ implied by a given set of FDs and MVDs over a set of attributes U such that $\mathrm{Aset}(T) \subseteq U$ and $XY \subseteq \mathrm{Aset}(T)$. (Note that because a set of FDs F together with a set of MVDs M can imply FDs not implied by F alone, FD(T), in general, is not equivalent to the set of FDs in F whose left-hand side is a subset of Aset(T) and whose right-hand side is restricted to Aset(T).)

Example 2.2.1. Figure 3 shows the scheme tree T for the scheme of the nested relation in Figure 1. Figure 3 also gives the set of attributes in Aset(T) and the set of dependencies MVD(T). Observe that each of the MVDs in MVD(T) is satisfied in the unnested relation in Figure 2.
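The way each edge (V, W) of a scheme tree contributes the MVD Ancestor(V) $\twoheadrightarrow$ Descendant(W) can be sketched as follows. The tree encoding (a node is a pair of an attribute tuple and a list of child trees) and the names `descendant`, `mvd_of_tree`, and `FIG3` are assumptions made for this illustration only; the sample tree is the one for the scheme of Figure 1.

```python
def descendant(tree):
    """Union of the attributes in a node and all of its descendants."""
    node, children = tree
    attrs = set(node)
    for child in children:
        attrs |= descendant(child)
    return attrs

def mvd_of_tree(tree, above=frozenset()):
    """List the pairs (Ancestor(V), Descendant(W)), one per edge (V, W)."""
    node, children = tree
    ancestor = above | set(node)  # Ancestor(V) includes V itself
    mvds = []
    for child in children:
        mvds.append((frozenset(ancestor), frozenset(descendant(child))))
        mvds.extend(mvd_of_tree(child, ancestor))
    return mvds

# Scheme tree for Dept Chair (Prof (Hobby)* (Matriculation (Student (Interest)*)*)*)*
FIG3 = (("Dept", "Chair"), [(("Prof",), [
    (("Hobby",), []),
    (("Matriculation",), [(("Student",), [(("Interest",), [])])]),
])])

MVDS = mvd_of_tree(FIG3)
```

Each pair in `MVDS` stands for one MVD (left-hand side, right-hand side); a tree with k edges yields exactly k pairs.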

2.3. Data Redundancy

Data redundancy is a concern in database design. Redundant data can lead to higher storage and access cost. It can lead to update anomalies, forcing multiple copies of the same data value to be updated when one copy changes, and it can lead to data inconsistency if all copies do not agree.

T =        Dept Chair
               |
              Prof
             /    \
        Hobby      Matriculation
                        |
                     Student
                        |
                     Interest

Aset(T) = Dept Chair Prof Hobby Matriculation Student Interest

MVD(T) = {Dept Chair $\twoheadrightarrow$ Prof Hobby Matriculation Student Interest,
          Dept Chair Prof $\twoheadrightarrow$ Hobby,
          Dept Chair Prof $\twoheadrightarrow$ Matriculation Student Interest,
          Dept Chair Prof Matriculation $\twoheadrightarrow$ Student Interest,
          Dept Chair Prof Matriculation Student $\twoheadrightarrow$ Interest}

Fig. 3. Scheme tree T, Aset(T), and MVD(T) for nested relation scheme in Fig. 1.

Except in rare cases, such as Vincent and Srinivasan [1992], papers and textbooks on normalization fail to provide rigorous definitions for redundancy and thus also fail to prove that normalization removes redundancy as expected. Offered instead are motivating examples to illustrate redundancy removal. Thus in the vast body of research literature on normalization, we have mostly only rigorous syntactic justifications for normalization; what we are missing are rigorous semantic justifications. Besides only providing for syntactic characterizations, a danger of not treating redundancy formally is that the examples may be misleading. Indeed, as we show in the following, the definition for 4NF found in most textbooks does not detect potential redundancy for all cases even though some readers of these books are led to believe that it does.

In creating definitions for redundancy, we should try to find simple and intuitive characterizations, but creating a simple and intuitive definition for redundancy is more difficult than one might at first think. Any definition will involve a sophisticated statement, and there are many possible approaches one might use. Our notion of redundancy is based on the idea that an atomic value v in a nested or flat relation is redundant if we can erase v, and then from what remains and from a single FD or MVD that holds, determine what v must have been.

U = {Dept, Chair, Prof, Hobby, Hobby-Equipment, Matriculation, Student, Interest}

F = {Student $\to$ Matriculation, Student $\to$ Prof, Prof $\to$ Dept, Dept $\to$ Chair}

M = {Student $\twoheadrightarrow$ Interest, Prof $\twoheadrightarrow$ Hobby Hobby-Equipment, Hobby $\twoheadrightarrow$ Hobby-Equipment}

Fig. 4. Some given constraints over a set of attributes.

The way we define "holds" is important. Here, we adapt Fagin's [1977] definition, and we explain it thoroughly before proceeding with our definition of redundancy.

Definition 2.3.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. An MVD $X \twoheadrightarrow Y$ holds for T with respect to M and F if $X \subseteq \mathrm{Aset}(T)$ and there exists a set of attributes $Z \subseteq U$ such that $Y = Z \cap \mathrm{Aset}(T)$ and $M \cup F$ implies $X \twoheadrightarrow Z$ on U. An FD $X \to Y$ holds for T with respect to M and F if $XY \subseteq \mathrm{Aset}(T)$ and $M \cup F$ implies $X \to Y$ on U.

This definition is motivated by the following lemma, which is Theorem 5 in Fagin [1977].

LEMMA 2.3.1. Let U be a set of attributes and $R \subseteq U$. Let M be a set of MVDs over U and F be a set of FDs over U. Let $X \subseteq R$, $Z \subseteq U$, and $Y = Z \cap R$. If $M \cup F$ implies $X \twoheadrightarrow Z$ on U, then $M \cup F$ implies $X \twoheadrightarrow Y$ on R.

PROOF. Fagin [1977]. $\Box$
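The restriction step in Definition 2.3.1 and Lemma 2.3.1, which intersects the right-hand side of an MVD on U with the attributes of the scheme tree, can be sketched as below. The function `project_mvd` and the constant `ASET_T` are hypothetical names introduced for this illustration. The sketch only restricts an MVD that is already known to be implied on U; it does not attempt to compute which MVDs $M \cup F$ implies.

```python
def project_mvd(lhs, rhs, aset):
    """Restrict an MVD lhs ->> rhs (implied on U) to the attribute set `aset`.

    Returns (X, Z & aset) when X lies inside aset, mirroring Y = Z ∩ Aset(T);
    returns None when the left-hand side is not contained in aset.
    """
    lhs, rhs, aset = set(lhs), set(rhs), set(aset)
    if not lhs <= aset:
        return None
    return (frozenset(lhs), frozenset(rhs & aset))

# Attributes of a scheme tree that omits Hobby-Equipment (assumed sample data).
ASET_T = {"Dept", "Chair", "Prof", "Hobby", "Matriculation", "Student", "Interest"}

RESTRICTED = project_mvd({"Prof"}, {"Hobby", "Hobby-Equipment"}, ASET_T)
```

With this sample data, an MVD from Prof to Hobby Hobby-Equipment on U is restricted to an MVD from Prof to Hobby on the tree, because Hobby-Equipment is not an attribute of the tree.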

Example 2.3.1. Figure 4 shows a given set of attributes U and a given set of FDs F over U and a given set of MVDs M over U. All the FDs in F in Figure 4 hold for the scheme tree T in Figure 3, as do all the FDs implied by $M \cup F$. Not all the MVDs in M hold for T. In particular, neither Hobby $\twoheadrightarrow$ Hobby-Equipment nor Prof $\twoheadrightarrow$ Hobby Hobby-Equipment hold for T. Because Hobby Hobby-Equipment $\cap$ Aset(T) = Hobby, however, Prof $\twoheadrightarrow$ Hobby does hold for T. Although Prof $\twoheadrightarrow$ Hobby holds for T, observe that it is not implied by $M \cup F$ on U.

As illustrated in Example 2.3.1, certain MVDs hold for a relation scheme even when they are not implied by a given set of FDs and MVDs. It is those that hold that are of interest to us. We now return to our task of defining redundancy. Because our definition depends on the validity of a nested relation, however, we must first define what it means for a relation to be valid for a given set of FDs and MVDs.

Definition 2.3.2. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let r be a nested relation on T. Nested relation r is valid with respect to $M \cup F$ if in the total unnesting of r, every FD and every MVD that holds for T with respect to M and F is satisfied.

We now define redundancy. The definition has two parts: FD redundancy and MVD redundancy.

Definition 2.3.3. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. Let $XY \subseteq \mathrm{Aset}(T)$, and let d be an FD $X \to Y$ or an MVD $X \twoheadrightarrow Y$ that holds for T with respect to M and F and has an attribute $A \in Y$ and $A \notin X$. Let r be a nonempty nested relation on T that is valid with respect to $M \cup F$. Let S be a subtree of T that contains A as an atomic attribute, and let $s_1, \ldots, s_n$ be the nested relations over S in r. Let $v_1$ and $v_2$ be distinct nested tuples of $s_i$ and $s_j$, respectively, $1 \leq i, j \leq n$, such that $v_1(A) = v$, $v_2(A) = v'$, and $v = v'$. (Note that $i = j$ is possible so that $s_i$ and $s_j$ may either be the same nested relation under S or may be in different nested relations under S.) Let $t_1$ and $t_2$ be distinct tuples in the total unnesting of r such that $t_1(\mathrm{Aset}(S))$ and $t_2(\mathrm{Aset}(S))$ are tuples in the total unnesting of $v_1$ and $v_2$, respectively.

- FD redundancy, when d is $X \to Y$: If $t_1(X) = t_2(X)$, then atomic value v is a redundant atomic value in r caused by $X \to Y$.

- MVD redundancy, when d is $X \twoheadrightarrow Y$: If $t_1(X) = t_2(X)$, $t_1(Y) = t_2(Y)$, and $t_1(Z) \neq t_2(Z)$, where $Z = \mathrm{Aset}(T) - XY$, then atomic value v is a redundant atomic value in r caused by $X \twoheadrightarrow Y$.
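For the special case of a flat relation (a scheme tree with a single node, so that nested tuples and unnested tuples coincide), the two conditions of Definition 2.3.3 can be sketched as a pairwise scan. This is a simplified illustration, not the general nested case: the names `agree` and `redundant_cells` are hypothetical, the rows are assumed to be distinct, and the relation is assumed to be valid for the dependency being examined. The sample rows are the flat relations of Figures 5 (unnested) and 7.

```python
def agree(t1, t2, attrs):
    """True when two flat tuples have equal values on every attribute in attrs."""
    return all(t1[a] == t2[a] for a in attrs)

def redundant_cells(rows, attrs, X, Y, kind):
    """Return {(row index, attribute)} flagged as redundant by X -> Y or X ->> Y.

    kind is "fd" or "mvd"; rows is a list of distinct flat tuples (dicts).
    """
    X, Y = set(X), set(Y)
    Z = set(attrs) - X - Y  # Z = Aset(T) - XY
    flagged = set()
    for i, t1 in enumerate(rows):
        for j, t2 in enumerate(rows):
            if i == j or not agree(t1, t2, X):
                continue
            if kind == "mvd" and not (agree(t1, t2, Y) and not agree(t1, t2, Z)):
                continue
            for a in Y - X:  # attributes A in Y but not in X
                if t1[a] == t2[a]:  # v = v'
                    flagged.add((i, a))
    return flagged

FIG5_FLAT = [
    {"Interest": "Skiing", "Dept": "CS", "Student": "Barker"},
    {"Interest": "Skiing", "Dept": "CS", "Student": "Adams"},
    {"Interest": "Skiing", "Dept": "Math", "Student": "Carter"},
    {"Interest": "Travel", "Dept": "CS", "Student": "Lee"},
    {"Interest": "Travel", "Dept": "Math", "Student": "Carter"},
]
FIG7 = [
    {"Prof": "Jane", "Student": "Young", "Hobby": "Reading"},
    {"Prof": "Jane", "Student": "Young", "Hobby": "Skiing"},
    {"Prof": "Jane", "Student": "Barker", "Hobby": "Reading"},
    {"Prof": "Jane", "Student": "Barker", "Hobby": "Skiing"},
]

FD_HITS = redundant_cells(FIG5_FLAT, ("Interest", "Dept", "Student"),
                          {"Student"}, {"Dept"}, "fd")
MVD_HITS = redundant_cells(FIG7, ("Prof", "Student", "Hobby"),
                           {"Prof"}, {"Student"}, "mvd")
```

The quadratic scan is chosen for readability; it mirrors the "pick two distinct tuples" style of the definition rather than aiming for efficiency.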

Example 2.3.2. Let Student $\to$ Dept and consider the nested relation and its total unnesting in Figure 5. Both appearances of Math are redundant in both the nested and unnested relation. We can see this formally as follows. Let $t_1$ be the third tuple and $t_2$ be the last tuple in the unnested relation. Now we have $t_1(\text{Student}) = t_2(\text{Student})$, and thus Math in the third tuple of the unnested relation is redundant. Because Math in the third tuple of the unnested relation comes from the first nested tuple in the nested relation, Math in the first nested tuple of the nested relation is redundant. By reversing $t_1$ and $t_2$, we can see formally that Math in the second nested tuple of the nested relation and in the last tuple of the unnested relation are also redundant.

It is possible for a value not to be redundant in a nested relation and yet be redundant in the total unnesting of the relation. Indeed, this is often the reason we create nested relations: to remove redundant values.

Example 2.3.3. Suppose Student $\twoheadrightarrow$ Interest and we allow students to have multiple majors. Now consider the nested relation and its total unnesting in Figure 6. Observe that Skiing is redundant in the unnested relation. However, in the nested relation, Skiing is not redundant because it appears only once.

Student $\to$ Dept

Interest  (Dept  (Student)*)*
Skiing     CS    {Barker, Adams}
           Math  {Carter}
Travel     CS    {Lee}
           Math  {Carter}

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Carter
Travel    CS    Lee
Travel    Math  Carter

Fig. 5. Redundancy caused by an FD.

Student $\twoheadrightarrow$ Interest

Interest  (Dept  (Student)*)*
Skiing     CS    {Barker, Adams}
           Math  {Barker}
Travel     Math  {Carter}

Interest  Dept  Student
Skiing    CS    Barker
Skiing    CS    Adams
Skiing    Math  Barker
Travel    Math  Carter

Fig. 6. Elimination of redundancy by nesting.

Example 2.3.4. Let Prof $\twoheadrightarrow$ Dept Student and Prof $\twoheadrightarrow$ Hobby Hobby-Equipment and consider the flat relation in Figure 7. Because the scheme of the relation in Figure 7 is Prof Student Hobby and Prof $\twoheadrightarrow$ Dept Student and Prof $\twoheadrightarrow$ Hobby Hobby-Equipment, by Lemma 2.3.1 Prof $\twoheadrightarrow$ Student and Prof $\twoheadrightarrow$ Hobby hold for the scheme Prof Student Hobby. But now all the data values under Student and Hobby are redundant, as can be seen formally by appropriately picking two distinct tuples and choosing which attribute and value to consider. For example, let $t_1$ be the first tuple and $t_2$ be the second tuple; then Young in $t_2$ is redundant because $t_1(\text{Prof}) = t_2(\text{Prof})$, $t_1(\text{Student}) = t_2(\text{Student})$, and $t_1(\text{Hobby}) \neq t_2(\text{Hobby})$.

As an aside, we observe here that by the common definition of 4NF found in most textbooks (e.g., [Korth and Silberschatz 1991; Maier 1983]) the relation scheme in Figure 7 is in 4NF. This is because no nontrivial MVD, given or implied, applies to the scheme, where "applies" means that the set of attributes that constitutes the MVD is a subset of the scheme. In particular for Example 2.3.4, neither Prof $\twoheadrightarrow$ Student nor Prof $\twoheadrightarrow$ Hobby is implied by or is in the given set of MVDs {Prof $\twoheadrightarrow$ Dept Student, Prof $\twoheadrightarrow$ Hobby Hobby-Equipment}. According to the original definition given by Fagin [1977], however, the relation scheme in Figure 7 is not in 4NF. Fagin's definition not only considers all nontrivial MVDs that are given or implied (without regard to the scheme under consideration), but also the MVDs that hold when the scheme is considered.

Example 2.3.5. To show an example of a nested relation (with embedded relations) that has redundancy caused by an MVD, we present one more example of redundancy. Let U = {Prof, Article-Title, Publication-Location} and let Prof $\twoheadrightarrow$ Article-Title and Article-Title $\twoheadrightarrow$ Prof be the MVDs. (Note that Example 2 in Beeri and Kifer [1986] has exactly the same characteristics.) Consider the nested relation and its total unnesting in Figure 8. Based on the MVD Article-Title $\twoheadrightarrow$ Publication-Location, which holds for the nested relation scheme in Figure 8, all the values under Publication-Location in the nested relation are redundant. We can see formally, for example, that the last Hong Kong value under (Publication-Location)* is redundant by letting $t_1$ be the last tuple and $t_2$ be the 4th tuple in the unnested relation. Thus $t_1(\text{Article-Title}) = t_2(\text{Article-Title})$, $t_1(\text{Publication-Location}) = t_2(\text{Publication-Location})$, and $t_1(\text{Prof}) \neq t_2(\text{Prof})$.

Definition 2.3.3 tells us what it means for an individual atomic value to be redundant in a nested relation for an FD or MVD that holds. Our next definition ties together the notion of a redundant data value in a nested relation and the notion of a nested relation scheme that permits valid nested relations that contain redundancy. It is this definition that allows us to later show that our normal form definition detects redundancy.

Definition 2.3.4. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. T is said to have potential redundancy with respect to $M \cup F$ if there exists a redundant atomic value in some valid nested relation for T caused by either an FD or an MVD that holds for T with respect to M and F.

Prof $\twoheadrightarrow$ Dept Student
Prof $\twoheadrightarrow$ Hobby Hobby-Equipment

Prof  Student  Hobby
Jane  Young    Reading
Jane  Young    Skiing
Jane  Barker   Reading
Jane  Barker   Skiing

Fig. 7. A flat relation with redundancy.

Prof $\twoheadrightarrow$ Article-Title
Article-Title $\twoheadrightarrow$ Prof

Prof   (Article-Title)*                            (Publication-Location)*
Steve  {Programming in C++, Programming in Ada}    {USA, Hong Kong}
Pat    {Programming in Ada}                        {USA, Hong Kong}

Prof   Article-Title       Publication-Location
Steve  Programming in C++  USA
Steve  Programming in Ada  USA
Steve  Programming in C++  Hong Kong
Steve  Programming in Ada  Hong Kong
Pat    Programming in Ada  USA
Pat    Programming in Ada  Hong Kong

Fig. 8. Redundancy caused by an MVD.

Example 2.3.6. Because the nested relations in Figures 5, 7, and 8 all have redundancy, the nested relation schemes in Figures 5, 7, and 8 are all said to have potential redundancy with respect to the FDs and MVDs given for the examples.
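Validity in the sense of Definition 2.3.2 requires every FD and MVD that holds for the scheme to be satisfied in the total unnesting. A minimal sketch of the two satisfaction tests on a flat relation follows; the function names `satisfies_fd` and `satisfies_mvd` are hypothetical, and the sample rows are the unnested relation of Figure 8.

```python
def satisfies_fd(rows, X, Y):
    """True when tuples that agree on X also agree on Y."""
    seen = {}
    for r in rows:
        key = tuple(r[a] for a in sorted(X))
        val = tuple(r[a] for a in sorted(Y))
        if seen.setdefault(key, val) != val:
            return False
    return True

def satisfies_mvd(rows, attrs, X, Y):
    """True when X ->> Y is satisfied: for every pair t1, t2 agreeing on X,
    the tuple with the XY-values of t1 and the remaining values of t2 exists."""
    X = set(X)
    Y = set(Y) - X
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X):
                swapped = tuple(t1[a] if (a in X or a in Y) else t2[a]
                                for a in attrs)
                if swapped not in present:
                    return False
    return True

ATTRS = ("Prof", "Article-Title", "Publication-Location")
FIG8_FLAT = [dict(zip(ATTRS, row)) for row in [
    ("Steve", "Programming in C++", "USA"),
    ("Steve", "Programming in Ada", "USA"),
    ("Steve", "Programming in C++", "Hong Kong"),
    ("Steve", "Programming in Ada", "Hong Kong"),
    ("Pat", "Programming in Ada", "USA"),
    ("Pat", "Programming in Ada", "Hong Kong"),
]]

TITLE_TO_PROF = satisfies_mvd(FIG8_FLAT, ATTRS, {"Article-Title"}, {"Prof"})
PROF_TO_TITLE = satisfies_mvd(FIG8_FLAT, ATTRS, {"Prof"}, {"Article-Title"})
```

Both MVDs of Example 2.3.5 are satisfied by these rows; dropping the last row breaks the MVD from Article-Title to Prof, which is exactly why the erased Hong Kong value can be recovered from the remaining tuples.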

3. NESTED NORMAL FORM

We motivate the need for a new normal-form definition for nested relations by making certain observations about the examples we have presented. First, if we are given the FDs and MVDs in Figure 4, none of the earlier normal-form definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allow the scheme of the nested relation in Figure 1, which is also equivalent to the scheme tree T in Figure 3. They therefore do not allow the nested relation in Figure 1 even though it is a good clustering for this application and has no redundancy. For a scheme tree T to satisfy the earlier normal-form definitions, T must satisfy four conditions. It turns out that T in Figure 3 does not satisfy the fourth condition for any of these previous definitions. One requirement of the fourth condition insists that the root of a scheme tree be the left-hand side of a reduced nontrivial MVD, but all (implied) MVDs with Dept Chair as the left-hand side are trivial. In fact, this is not the only violation. In particular, Matriculation cannot be an inner node of T in Figure 3. For the definitions in Ozsoyoglu and Yuan [1987, 1989], there are even partial MVDs in T because of the edges (Prof, Hobby) and (Student, Interest). Because there are unnecessary conditions in these previous normal form definitions, they all restrict attribute clustering and design flexibility, as these examples show. In fact, these conditions can lead to unnecessary decompositions of schemes.

Second, all the earlier definitions [Ozsoyoglu and Yuan 1987, 1989; Roth and Korth 1987] allow the scheme of the nested relation in Figure 8, but as pointed out in Example 2.3.5, the nested relation has redundancy. We can see that the earlier definitions allow the scheme of the nested relation in Figure 8 as follows. Let T be the scheme tree for the scheme of the nested relation in Figure 8, and assume we are given the set of MVDs, M = {Prof $\twoheadrightarrow$ Article-Title, Article-Title $\twoheadrightarrow$ Prof}, and the empty set of FDs. When there are no FDs, all three earlier definitions are equivalent. Now observe that MVD(T) = {Prof $\twoheadrightarrow$ Article-Title, Prof $\twoheadrightarrow$ Publication-Location}. Because M implies MVD(T), the first condition of their definitions is satisfied. Because M does not imply an MVD with a left-hand side that is a proper subset of Prof, T has no partial MVDs, and thus their second condition is satisfied. [...] Article-Title $\twoheadrightarrow$ Prof, T has no transitive MVDs, and thus their third condition is satisfied. Each node in the scheme tree for Figure 8 is a single attribute; therefore there can be no decomposition of nodes, and thus their fourth condition is satisfied.

We now give our definition for Nested Normal Form.

Definition 3.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that $\mathrm{Aset}(T) \subseteq U$. T is in Nested Normal Form (NNF) with respect to $M \cup F$ if the following conditions are satisfied.

(1) If D is the set of MVDs and FDs that hold for T with respect to $M \cup F$, then D is equivalent to $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$ on Aset(T).

(2) For each nontrivial FD $X \to A$ that holds for T with respect to $M \cup F$, $X \to \mathrm{Ancestor}(N_A)$ also holds with respect to $M \cup F$, where $N_A$ is the node in T that contains A.

Example 3.1. Suppose we are given U, F, and M as in Figure 4. Then the scheme tree T in Figure 3 is in NNF. We can see this from our definition as follows. We have observed in Example 2.3.1 that Hobby $\twoheadrightarrow$ Hobby-Equipment does not hold for Aset(T). The set of MVDs and FDs that do hold for T is equivalent to $F \cup$ {Student $\twoheadrightarrow$ Interest, Prof $\twoheadrightarrow$ Hobby} considered on Aset(T). This set is thus equivalent to D in the NNF definition. MVD(T) is the set of MVDs in Figure 3, and we can let F in Figure 4 be FD(T). We can convince ourselves that Condition 1 is satisfied by applying a few standard MVD and FD derivation rules. For example, we can derive Prof $\twoheadrightarrow$ Hobby from MVD(T) and FD(T) by using the FDs in FD(T) to obtain Prof $\to$ Dept Chair Prof, converting this derived FD into an MVD, and then applying transitivity with Dept Chair Prof $\twoheadrightarrow$ Hobby in MVD(T) to obtain Prof $\twoheadrightarrow$ Hobby. To convince ourselves that Condition 2 is satisfied, we consider Student $\to$ Matriculation, which holds for T, and observe that Ancestor(Matriculation) = Matriculation Prof Dept Chair and that Student $\to$ Matriculation Prof Dept Chair is implied. Hence Student $\to$ Ancestor(Matriculation). We similarly check each nontrivial FD in FD(T), which is sufficient to ensure that Condition 2 is satisfied.

Example 3.1 not only illustrates NNF in a nontrivial case, but also shows that our definition accepts the nested relation scheme in Figure 1, which we consider to be good, but which is rejected by all the earlier definitions as previously explained. We now continue by giving two more examples, one that violates Condition 1 of NNF and one that violates Condition 2. Our example that violates Condition 1 also shows that NNF detects the problem of the nested relation scheme in Figure 8. It therefore recognizes a scheme that allows redundancy but is not detected by the earlier definitions.
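Condition 2 of NNF lends itself to a mechanical check. The sketch below is an approximation under a stated assumption: it decides implication of FDs by attribute closure under the given FDs alone, which is exact when no MVDs are given and otherwise may miss FDs that follow only through MVDs (so a reported violation should then be double-checked by hand). The names `closure`, `ancestor_table`, and `condition2_violations` and the tree encoding (node = attribute tuple plus list of child trees) are hypothetical. The sample inputs are the scheme tree of Figure 3 with the FDs of Figure 4, and the tree Interest(Dept(Student)*)* with the single FD from Student to Dept.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of FDs (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def ancestor_table(tree, above=frozenset()):
    """Map each attribute A to Ancestor(N_A), the attributes on the path
    from the root down to (and including) the node that contains A."""
    node, children = tree
    here = above | set(node)
    table = {a: here for a in node}
    for child in children:
        table.update(ancestor_table(child, here))
    return table

def condition2_violations(tree, fds):
    """Return the (lhs, A) pairs whose FD lhs -> A fails Condition 2."""
    anc = ancestor_table(tree)
    bad = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):  # nontrivial part only
            if not anc[a] <= closure(lhs, fds):
                bad.append((frozenset(lhs), a))
    return bad

FIG3 = (("Dept", "Chair"), [(("Prof",), [
    (("Hobby",), []),
    (("Matriculation",), [(("Student",), [(("Interest",), [])])]),
])])
FIG4_F = [({"Student"}, {"Matriculation"}), ({"Student"}, {"Prof"}),
          ({"Prof"}, {"Dept"}), ({"Dept"}, {"Chair"})]

FIG5_TREE = (("Interest",), [(("Dept",), [(("Student",), [])])])
FIG5_F = [({"Student"}, {"Dept"})]

OK_CASE = condition2_violations(FIG3, FIG4_F)
BAD_CASE = condition2_violations(FIG5_TREE, FIG5_F)
```

Only the FDs in a cover are examined, following the remark in Example 3.1 that checking each nontrivial FD in FD(T) is sufficient.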

Example 3.2. Let U = {Prof, Article-Title, Publication-Location}, M = {Prof $\twoheadrightarrow$ Article-Title, Article-Title $\twoheadrightarrow$ Prof}, and $F = \emptyset$. As in Figure 8, let T be Prof (Article-Title)* (Publication-Location)*. T does not satisfy Condition 1 because $\mathrm{MVD}(T) \cup \mathrm{FD}(T)$, which is {Prof $\twoheadrightarrow$ Article-Title, Prof $\twoheadrightarrow$ Publication-Location}, is not equivalent to the set of FDs and MVDs that hold for T. For example, we cannot derive Article-Title $\twoheadrightarrow$ Prof from {Prof $\twoheadrightarrow$ Article-Title, Prof $\twoheadrightarrow$ Publication-Location}. Thus Condition 1 is not satisfied.

Example 3.3. Let U = {Interest, Dept, Student}, $M = \emptyset$, and F = {Student $\to$ Dept}. As in Figure 5, let T = Interest (Dept (Student)*)*. T does not satisfy Condition 2 because Student $\to$ Dept is a nontrivial FD that holds for T and Ancestor(Dept) = Dept Interest, but Student $\not\to$ Dept Interest.

4. NNF AND POTENTIAL REDUNDANCY

In this section we prove one of our main results. In particular, we prove that a nested relation whose scheme is in NNF for a given set of FDs and MVDs cannot have redundancy with respect to the given FDs and MVDs. Many of the lemmas here depend on a set of FD and MVD derivation rules. We use the following rules, where X, Y, Z, V, W, and Z' are all subsets of a set of attributes R:

FD derivation rules:

F1: (reflexivity) Y ⊆ X implies X → Y.
F2: (augmentation) X → Y and V ⊆ W imply XW → YV.
F3: (transitivity) X → Y and Y → Z imply X → Z.

MVD derivation rules:

M1: (reflexivity) Y ⊆ X implies X →→ Y.
M2: (augmentation) X →→ Y and Z' ⊆ Z imply XZ →→ YZ'.
M3: (transitivity) X →→ Y and Y →→ Z imply X →→ Z − Y.
M4: (complementation) X →→ Y implies X →→ R − (XY).
M5: (trivial complementation) X →→ R − X.

Combined FD and MVD derivation rules:

C1: (replication) X → Y implies X →→ Y.
C2: (coalescence) X →→ Y, Z → W, W ⊆ Y, and Y ∩ Z = ∅ imply X → W.
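As an illustration of how F1–F3 are used in practice, the following is a minimal Python sketch of the standard attribute-closure computation for a set of FDs. It assumes that no MVDs are given (M = ∅, as in Example 3.3), so that C2 cannot contribute further FDs. The names `fd_closure` and `implies_fd` are hypothetical helpers introduced here, not part of the paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

# An FD is a pair (left-hand side, right-hand side) of attribute sets.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd_closure(attrs: Iterable[str], fds: Iterable[FD]) -> FrozenSet[str]:
    """Return every attribute A such that attrs -> A follows from fds by F1-F3.

    ASSUMPTION: only FDs are given; with MVDs present, rule C2 could add FDs
    that this sketch does not find.
    """
    closure = set(attrs)  # F1 (reflexivity): attrs -> attrs
    fd_list: List[FD] = list(fds)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fd_list:
            # F2 and F3 combined: once the closure covers lhs, it determines rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return frozenset(closure)


def implies_fd(fds: Iterable[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """True if lhs -> rhs is derivable from fds using F1-F3."""
    return frozenset(rhs) <= fd_closure(lhs, fds)


# Example 3.3: F = {Student -> Dept} over U = {Interest, Dept, Student}.
F = [(frozenset({"Student"}), frozenset({"Dept"}))]
student_closure = fd_closure({"Student"}, F)
holds_dept = implies_fd(F, {"Student"}, {"Dept"})
holds_ancestor = implies_fd(F, {"Student"}, {"Dept", "Interest"})
```

Under these assumptions, `holds_dept` is true while `holds_ancestor` is false, which matches the observation in Example 3.3 that Student does not determine Ancestor(Dept) = Dept Interest.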

These FD and MVD derivation rules are sound and complete [Beeri et al. 1977], but not minimal. Indeed, part of what we show is that M1 (reflexivity) is not needed: the derivation rules remain sound and complete without it. The more common choice, of course, is to retain M1 and omit M5. For our proofs about scheme trees, however, it is often required that our MVDs stretch from root to leaf. We therefore use the alternative choice for trivial MVDs. Because this choice is not common, we prove in Lemma 4.1 that it is possible. In addition, we add a corollary that tailors the lemma to the case in which only MVDs are given.

LEMMA 4.1. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let Z →→ W be an MVD implied by M ∪ F on U. There exists an (M ∪ F)-based derivation sequence for Z →→ W on U that uses only F1–F3, M2–M5, and C1–C2.

PROOF. Because Z →→ W is implied by M ∪ F on U and the derivation rules F1–F3, M1–M5, C1, and C2 are sound and complete, there exists a derivation sequence S for Z →→ W on U using these rules. If S does not include M1, we are done. Otherwise, we replace each usage of M1 as follows:

X →→ Y, by M1 (reflexivity) where Y ⊆ X,

by the following sequence of derivation rules:

X → Y, by F1 (reflexivity) because Y ⊆ X.
X →→ Y, by C1 (replication). □

COROLLARY. If F = ∅, there exists an M-based derivation sequence for Z →→ W that uses only the MVD rules M2–M5.

PROOF. Because M1–M4 are sound and complete when no FDs are given, there exists a derivation sequence S for Z →→ W that uses only M1–M4. If S does not include M1, we are done. Otherwise, we replace each usage of M1 by the following sequence of derivation rules:

X →→ R − X, by M5 (trivial complementation).
X →→ R − (X(R − X)), by M4 (complementation).
X →→ ∅, because R − (X(R − X)) = ∅.
XY →→ Y, by M2 (augmentation).
X →→ Y, because Y ⊆ X. □
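The replacement sequence used in the proof of the Corollary can be replayed mechanically on concrete attribute sets. The sketch below is only an illustration: it represents an MVD X →→ Y as a pair of frozensets and applies M5, M4, and M2 in the order given in the proof. The function names and the sample attributes A, B, C, D are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Attrs = FrozenSet[str]
MVD = Tuple[Attrs, Attrs]  # (X, Y) stands for X ->> Y


def m5_trivial_complementation(x: Attrs, r: Attrs) -> MVD:
    """M5: X ->> R - X."""
    return (x, r - x)


def m4_complementation(mvd: MVD, r: Attrs) -> MVD:
    """M4: X ->> Y implies X ->> R - (XY)."""
    x, y = mvd
    return (x, r - (x | y))


def m2_augmentation(mvd: MVD, z: Attrs, z_prime: Attrs) -> MVD:
    """M2: X ->> Y and Z' subset of Z imply XZ ->> YZ'."""
    if not z_prime <= z:
        raise ValueError("M2 requires Z' to be a subset of Z")
    x, y = mvd
    return (x | z, y | z_prime)


def reflexivity_without_m1(x: Attrs, y: Attrs, r: Attrs) -> List[MVD]:
    """Derive X ->> Y for Y subset of X using only M5, M4, and M2."""
    if not y <= x:
        raise ValueError("reflexivity requires Y to be a subset of X")
    step1 = m5_trivial_complementation(x, r)  # X ->> R - X
    step2 = m4_complementation(step1, r)      # X ->> empty set
    step3 = m2_augmentation(step2, y, y)      # XY ->> Y, and XY = X
    return [step1, step2, step3]


R = frozenset({"A", "B", "C", "D"})
X = frozenset({"A", "B"})
Y = frozenset({"B"})
steps = reflexivity_without_m1(X, Y, R)
```

The last element of `steps` is the pair (X, Y), that is, the MVD that M1 would have produced directly; the middle step has an empty right-hand side, as in the proof.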

Lemma 4.2 guarantees us that if an attribute of a node N in a scheme tree is in the right-hand side, but not the left-hand side, of a nontrivial MVD and if the MVD is implied by the MVDs of the scheme tree, then all the attributes in N are included in the MVD.

LEMMA 4.2. Let U be a set of attributes. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T). Let X →→ Y be an MVD in MVD(T)+ on Aset(T) such that A ∈ Y and A ∉ X. Let A be in node N of T; then N ⊆ XY.

PROOF. Because X →→ Y is an MVD in MVD(T)+ on Aset(T) and MVD(T) consists only of MVDs, there exists an MVD(T)-based derivation sequence S for X →→ Y on Aset(T) that, by the Corollary to Lemma 4.1, uses only the MVD rules M2–M5. We show by induction on the number of MVDs n in S that for every MVD X' →→ Y' in S, if A is an attribute in node N of T such that A ∈ Y' and A ∉ X', then N ⊆ X'Y'. Thus because X →→ Y is in S, N ⊆ XY.

Basis: Suppose n = 1. Because only rules M2–M5 are used and M2–M4 require antecedents, X →→ Y is either given or introduced by M5 (trivial complementation). If X →→ Y is given, then X →→ Y ∈ MVD(T), and thus Y ⊇ N. If X →→ Y is introduced by M5 (trivial complementation), XY = Aset(T), and thus N ⊆ XY.

Induction: X' →→ Y' can be introduced by any of the MVD derivation rules M2–M5 or as a given MVD in MVD(T), and therefore we have five cases to consider. Because the cases for given MVDs and M5 (trivial complementation) have no antecedents, they are the same as in the basis. Therefore, we can reduce the cases to three.

(1) X' →→ Y' is introduced by M2 (augmentation). Let V →→ W be the antecedent MVD and let Z' ⊆ Z such that X' = VZ and Y' = WZ'. If A ∈ Y' − X', then because Z' ⊆ Z, A ∈ W and A ∉ V. By the induction hypothesis, N ⊆ VW and thus N ⊆ X'Y'.

(2) X' →→ Y' is introduced by M3 (transitivity). Let V →→ W and W →→ Z be the antecedent MVDs. Thus X' = V and Y' = Z − W. Now assume there exists an attribute A in node N of T such that A ∈ Y' and A ∉ X', but N ⊄ X'Y'. Then there exists an attribute B such that B ∈ N and B ∉ X'Y'. Because B ∉ X'Y' and X' = V and Y' = Z − W, B ∉ V and either B ∈ W or B ∈ Aset(T) − (VWZ). Suppose B ∈ W. Then because B ∈ W and B ∉ V, by the induction hypothesis, N ⊆ VW. A ∈ N, therefore A ∈ VW. But because A ∉ X' and X' = V, A ∉ V; and because A ∈ Y' and Y' = Z − W, A ∉ W. Thus A ∉ VW, a contradiction. We therefore suppose that B ∈ Aset(T) − (VWZ). But now we have A ∈ Y', Y' = Z − W, and therefore A ∈ Z, A ∉ W, and A ∈ N. Therefore, by the induction hypothesis, we have N ⊆ WZ. Because B ∈ N, B ∈ WZ. However, B ∈ Aset(T) − (VWZ) and thus B ∉ WZ, a contradiction.

(3) X' →→ Y' is introduced by M4 (complementation). Let V →→ W be the antecedent MVD. Thus X' = V and Y' = Aset(T) − (VW). Now assume there exists an attribute A in node N of T such that A ∈ Y' and A ∉ X', but N ⊄ X'Y'. Then there exists an attribute B such that B ∈ N and B ∉ X'Y'. Hence B ∉ V(Aset(T) − (VW)) and thus B ∈ W and B ∉ V. Because B ∈ N, by the induction hypothesis, N ⊆ VW. A ∈ N, therefore A ∈ VW. However, because A ∈ Y' and Y' = Aset(T) − (VW), A ∉ VW, a contradiction. □
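The conclusion of Lemma 4.2 can be checked on small scheme trees. The Python sketch below rests on two assumptions that are not restated in this section: that MVD(T) contains one MVD Ancestor(N) →→ Descendant(N') for each edge (N, N') of T, which agrees with the set MVD(T) listed in Example 3.2, and that Ancestor and Descendant include the node's own attributes, which agrees with Ancestor(Dept) = Dept Interest in Example 3.3. The class `SchemeTree`, the function `covers_touched_nodes`, and the second tree with attributes P, D, H, S, M are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Node = FrozenSet[str]
MVD = Tuple[FrozenSet[str], FrozenSet[str]]  # (X, Y) stands for X ->> Y


class SchemeTree:
    """A scheme tree whose nodes are non-empty, pairwise disjoint attribute sets."""

    def __init__(self, root: Node, children: Dict[Node, List[Node]]):
        self.root = root
        self.children = children
        self.parent = {c: p for p, cs in children.items() for c in cs}

    def nodes(self) -> List[Node]:
        found, stack = [], [self.root]
        while stack:
            node = stack.pop()
            found.append(node)
            stack.extend(self.children.get(node, []))
        return found

    def ancestor(self, node: Node) -> FrozenSet[str]:
        """Attributes of the node and of every node above it (assumed inclusive)."""
        attrs = set(node)
        while node in self.parent:
            node = self.parent[node]
            attrs |= node
        return frozenset(attrs)

    def descendant(self, node: Node) -> FrozenSet[str]:
        """Attributes of the node and of every node below it (assumed inclusive)."""
        attrs = set(node)
        for child in self.children.get(node, []):
            attrs |= self.descendant(child)
        return frozenset(attrs)

    def mvds(self) -> Set[MVD]:
        """ASSUMPTION: one MVD Ancestor(parent) ->> Descendant(child) per edge."""
        return {(self.ancestor(p), self.descendant(c))
                for p, cs in self.children.items() for c in cs}


def covers_touched_nodes(tree: SchemeTree, x: FrozenSet[str], y: FrozenSet[str]) -> bool:
    """Conclusion of Lemma 4.2: every node with an attribute in Y - X lies in XY."""
    xy = x | y
    return all(node <= xy for node in tree.nodes() if (node & y) - x)


# The scheme tree of Example 3.2: Prof(Article-Title)*(Publication-Location)*.
prof = frozenset({"Prof"})
title = frozenset({"Article-Title"})
place = frozenset({"Publication-Location"})
example_3_2 = SchemeTree(prof, {prof: [title, place]})

# A hypothetical tree with multi-attribute nodes: PD(H)*(SM)*.
pd = frozenset({"P", "D"})
h = frozenset({"H"})
sm = frozenset({"S", "M"})
tree = SchemeTree(pd, {pd: [h, sm]})

given_ok = all(covers_touched_nodes(tree, x, y) for x, y in tree.mvds())
candidate_ok = covers_touched_nodes(tree, pd, frozenset({"S"}))
```

Every MVD in the assumed MVD(T) passes the check, as the basis of the proof requires. The candidate PD →→ S fails it, because it splits the node SM, so by Lemma 4.2 it cannot be in MVD(T)+.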

Lemma 4.3 extends Lemma 4.2 to guarantee not only that a node is included, but also that all ancestors and descendants of the node are included. That is, Lemma 4.3 guarantees us that if an attribute of a node in a scheme tree is in the right-hand side of a nontrivial MVD (but not in the left-hand side), and if the MVD is implied by the MVDs of the scheme tree, then both the ancestors and the descendants of the node are included in the MVD.

LEMMA 4.3. Let U be a set of attributes. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T). Let X →→ Y be a nontrivial MVD in MVD(T)+ on Aset(T). Let A be an attribute such that A ∈ Y and A ∉ X, and let A be in node N of T. Then both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY.

PROOF. As in Lemma 4.2, we show by induction on the number of MVDs n in the MVD(T)-based derivation sequence S for X →→ Y on Aset(T) that for every MVD X' →→ Y' in S, if A is an attribute such that A ∈ Y' and A ∉ X', and if A is in node N of T, then both Ancestor(N) ⊆ X'Y' and Descendant(N) ⊆ X'Y'.

Basis: Suppose n = 1. Because S has no MVD introduced by M1 (reflexivity) and there is only one MVD X →→ Y in S, X →→ Y is given or is introduced by M5 (trivial complementation). If X →→ Y is given, then X →→ Y ∈ MVD(T). As argued in Lemma 4.2, Y ⊇ N, and since X →→ Y ∈ MVD(T), both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY. If X →→ Y is introduced by M5 (trivial complementation), XY = Aset(T). Hence every node is a subset of XY, and thus both Ancestor(N) ⊆ XY and Descendant(N) ⊆ XY.

Induction: As in Lemma 4.2, we have only three cases to consider.

(1) X' →→ Y' is introduced by M2 (augmentation). The argument is similar to the proof of Case 1 in Lemma 4.2.

(2) X' →→ Y' is introduced by M3 (transitivity). Let V →→ W and W →→ Z be the antecedent MVDs. Thus X' = V and Y' = Z − W. Let A be an attribute in node N of T such that A ∈ Y' and A ∉ X'. Hence, by Lemma 4.2, N ⊆ X'Y'. We claim that Ancestor(N) ⊆ X'Y'. If not, then there exists an attribute B ∈ Ancestor(N) such that B ∉ X'Y'. Because B ∉ X'Y' and X' = V and Y' = Z − W, B ∉ V(Z − W). Thus B ∉ V and either B ∈ W or B ∈ Aset(T) − (VWZ). We first assume that B ∈ W. Because B ∉ V and B ∈ W, by Lemma 4.2, B is in a node N' such that N' ⊆ VW. B ∈ Ancestor(N) and B ∈ N', thus N ⊆ Descendant(N') and A ∈ Descendant(N'). By the induction hypothesis, Descendant(N') ⊆ VW. Thus A ∈ Descendant(N'), and therefore A ∈ VW. But because A ∈ Y' and Y' = Z − W, A ∉ W, and because A ∉ X' and X' = V, A ∉ V. Thus A ∉ VW, a contradiction. Thus we assume that B ∈ Aset(T) − (VWZ). Because A ∈ Y' and Y' = Z − W, A ∈ Z and A ∉ W. Thus by the induction hypothesis, Ancestor(N) ⊆ WZ. B ∈ Ancestor(N) and Ancestor(N) ⊆ WZ, and therefore B ∈ WZ. Thus B ∉ Aset(T) − (VWZ), a contradiction. By an identical argument with Descendant replacing Ancestor and vice versa, Descendant(N) ⊆ X'Y'.

(3) X' →→ Y' is introduced by M4 (complementation). The argument is similar to the proof of Case 3 in Lemma 4.2. □

Lemma 4.4 tells us that the set D of dependencies that holds for a scheme tree T is the closure of itself on Aset(T).

LEMMA 4.4. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let D be the set of dependencies in (M ∪ F)+ that hold for T. Then D+ = D on Aset(T).

PROOF. The strategy for proving this result is to show that none of the inference rules can add a new dependency that is not already in D. All the cases except for C2 are straightforward, therefore we just prove that C2 cannot add a new FD. Let X →→ Y be an MVD in D, and let Z → W be an FD in D such that W ⊆ Y and Y ∩ Z = ∅. Thus X →→ Y and Z → W hold for T. Because X →→ Y holds for T, XY ⊆ Aset(T) and there exists an MVD X →→ Y' in (M ∪ F)+ such that Y = Y' ∩ Aset(T). Z → W holds for T, and therefore Z → W is in (M ∪ F)+ and ZW ⊆ Aset(T). Because W ⊆ Y and Y ⊆ Y', W ⊆ Y'. Y = Y' ∩ Aset(T), Z ⊆ Aset(T), and Y ∩ Z = ∅, therefore Y' ∩ Z = ∅. Hence X → W is in (M ∪ F)+. Because XW ⊆ Aset(T), X → W is already in D. □

Lemma 4.5 provides an interesting result: the given FDs can be disregarded if we are only interested in certain implied MVDs. In particular, if MVD(T) and FD(T) imply an MVD X →→ Y, then if we close the left-hand side of the MVD under MVD(T) and FD(T) on Aset(T) to obtain X+, MVD(T) alone is sufficient to imply X+ →→ Y. The converse also holds, and although we do not need the converse for Theorem 4.1, we provide it here because we need it later for a lemma required for Theorem 5.2.

LEMMA 4.5. Let U be a set of attributes. Let M be a set of MVDs over U and F be a set of FDs over U. Let T be a scheme tree such that Aset(T) ⊆ U. Let XY ⊆ Aset(T). If MVD(T) ∪ FD(T) implies X →→ Y on Aset(T), then MVD(T) implies X+ →→ Y on Aset(T), where X+ is the closure of X under MVD(T) ∪ FD(T), and conversely.

PROOF. The result can be proved by using Theorem 1 in Beeri and Kifer [1986]. □

Lemma 4.6 begins to address the redundancy issue in nested relations directly. We use it twice in Theorem 4.1, and thus we state it separately as a lemma. Before stating and proving Lemma 4.6, we need a definition for a path in a scheme tree.

Definition 4.1. A path of a scheme tree T is a sequence of nodes N_1, ..., N_n where N_1 is the root of T and N_n is a leaf node of T and N_i is the parent of N_{i+1}, 1
