Catriel Beeri, Ronald Fagin, and John H. Howard. Proc. 1977 ACM ... completeness of Armstrong's rules for functional .... tuples of R. W e will use the term pro-.
A COMPLETE AXIOMATIZATION FOR FUNCTIONAL AND MULTIVALUJZD
DEPENDENCIES IN DATABASE RELATIONS
Catriel Beeri, Ronald Fagin, and John H. Howard
Proc. 1977 ACM SIGMOD, D.C.P.
Smith (ed.),
Toronto, 47-61.
A COMPLETE AXIOMATIZATION FOR FUNCTIONAL AND MULTIVALUED DEPENDENCIES I N DATABASE RELATIONS
C a t r i e l Beeri Department of E l e c t r i c a l E n g i n e e r i n g and Computer S c i e n c e P r i n c e t o n U n i v e r s i t y , P r i n c e t o n , N. J. 08540 Ronald F a q i n and John H. Howard I B M R e s e a r c h L a b o r a t o r y K53/282 San J o s e , C a l i f o r n i a 9 5 1 9 3
mining i t s p r o p e r t i e s . Functional d e p e n d e n c i e s a r e a n i m p o r t a n t example o f such a r e l a t i o n s h i p . Codd, i n h i s f i r s t p a p e r s on t h e r e l a t i o n a l model [Coddl, Codd23, c b s e r v e d t h a t r e l a t i o n s i n which c e r t a i n p a t t e r n s of f u n c t i o n a l depende n c i e s o c c u r e x h i b i t u n d e s i r a b l e behavior. T h i s o b s e r v a t i o n l e d him t o t h e d e f i n i t i o n o f second and t h i r d normal forms f o r r e l a t i o n a l schemas. These normal forms e l i m i n a t e most o f t h e problems r e l a t e d t o t h e e x i s t e n c e o f functional dependencies i n the r e l a t i o n s .
ABSTRACT W e i n v e s t i g a t e the inference r u l e s t h a t c a n b e a p p l i e d t o f u n c t i o n a l and multivalued dependencies t h a t e x i s t i n a database r e l a t i o n . T h r e e t y p e s of r u l e s F i r s t , we list the w e l l a r e discussed. known r u l e s f o r f u n c t i o n a l d e p e n d e n c i e s . Ther, w e i n v e s t i g a t e t h e r u l e s f o r m u l t i v a l u e d d e p e n d e n c i e s . I t i s shown t h a t fcr ...ach r u l e f o r f u n c t i o n a l d e p e n d e n c i e s t h e same rule o r a s i m i l a r r u l e h o l d s f o r n u l t i v a l u e d dependencies. There i s , howc .ier, o n e a d d i t i o n a l r u l e f o r m u l t i v a l u e d d e p e n d e n c i e s t h a t h a s no p a r a l l e l amonq t5e r u l e s f o r f u n c t i o n a l dependencies. Finally, we present rules t h a t invo 1ve f u n c t i o n a 1 and mu l t i v a l u e d depeidencies together. The main r e s u l t of t he paper is t h a t t h e r u l e s p r e s e n t e d a r e r o m p l e t e f o r t h e f a m i l y of f u n c t i o n a l and m u l t i v a l u e d d e p e n d e n c i e s .
1.
*
Because o f t h e i r i m p o r t a n c e f o r t h e understanding of t h e p r o p e r t i e s of r e l a t i o n a l schemas, f u n c t i o n a l d e p e n d e n c i e s have been e x t e n s i v e l y s t u d i e d . Armstrong fArm] d i d a n e x h a u s t i v e s t u d y of t h e i r p r o p e r t i e s and p r e s e n t e d a s e t o f i n f e r e n c e r u l e s ( h e c a l l e d them axioms) f o r the family of functional dependencies. H e proved t h a t i f F i s a s e t o f f u n c t i o n a l d e p e n d e n c i e s t h a t i s c l o s e d under h i s r u l e s , t h e n t h e r e i s a r e l a t i o n R such t h a t t h e set of f u n c t i o n a l dependencies A s shown which h o l d f o r R is e x a c t l y F. by F a g i n [ F a g l ] , t h i s r e s u l t i m p l i e s t h e completeness of Armstrong's r u l e s f o r f u n c t i o n a l d e p e n d e n c i e s . Using A r m s t r o n g ' s r u l e s , B e r n s t e i n and B e e r i ( [ B e r n l , BernZ]) have r e c e n t l y p r e s e n t e d an e f f i c i e n t a lg o rith m f o r s y n th e s iz ing a r e l a t i o n a l schema from a g i v e n s e t o f f u n c t i o n a l d e p e n d e n c i e s . T h e i r work c l a r i f i e s t h e c o n n e c t i o n between t h e u s e r ' s view a s e x p r e s s e d by a s e t o f f u n c t i o n a l d e p e n d e n c i e s and t h e r e l a t i o n a l schema t h a t embodies i t .
INTRODUCTION
I n t h e r e l a t i o n a l model o f d a t a b a s e s , t h e d a t a is o r g a n i z e d i n t o r e l a t i o n s . A r e l a t i o n c a n b e viewed a s a t a b l e where each row o f t h e t a b l e d e s c r i b e s a n e n t i t y and e a c h column corresponds t o an a t t r i b u t e o f the e n t i t y being described. The r e l a t i o n s i n the data base a r e t i m e varying: e n t i t i e s a r e i n s e r t e d , d e l e t e d and m o d i f i e d . However, t h e s t r u c t u r e o f t h e r e l a t i o n s i s i n v a r i a n t . This s t r u c t u r e i s d e s c r i b e d by means o f a r e l a t i o n a l schema. Such a schema c o n s i s t s o f a d e s c r i p t i o n of the s t r u c t u r e of each one of t h e r e l a t i o n s , e . g . , i t s name, t h e a t t r i b u t e s t h a t appear i n it, i n t e g r i t y constraints, etc.
I t h a s b e e n o b s e r v e d b y workers i n t h e a r e a o f d a t a b a s e s e m a n t i c s (see e . g . [SS]) t h a t t h e concept of a f u n c t i o n a l dependency i s n o t g e n e r a l enough t o capt u r e t h e s e m a n t i c s o f the user's view o f t h e d a t a base. There a r e d e p e n d e n c i e s that are nat functional.
I t h a s been o b s e r v e d t h a t t h e r e l a t i o n s h i p s between t h e a t t r i b u t e s i n a r e l a t i o n have a n i m p o r t a n t r o l e i n d e t e r -
47
Suppose, for example, t h a t t h e c h i l d r e n o f employees are l i s t e d i n t h e r e l a t i o n which c o n t a i n s t h e i n f o r m a t i o n a b o u t t h e employees. C l e a r l y , t h e s e t of c h i l d r e n names of an employee depends on t h e i d e n t i f i e r o f t h e employee only: however, s i n c e t h e r e may b e more t h a n o n e c h i l d name i n t h e s e t t h i s i s n o t a f u n c t i o n a l dependency. I t h a s a l s o been observed t h a t t h e e x i s t e n c e of such ' g e n e r a l i z e d ' dependencies i n a r e l a t i o n may l e a d t o redundancy i n t h e r e p r e s e n t a t i o n o f t h e d a t a and, a s a r e s u l t , t o undes i r a b l e properties similiar to those t h a t w e r e observed by codd i n r e l a t i o n s t h a t a r e n o t i n t h i r d normal form. However, s i n c e t h i r d normal form i s based on t h e concept o f f u n c t i o n a l dependency, t h e s e p r o p e r t i e s e x i s t even i n t h i r d normal form schemas.
g i v e n t h e e x i s t e n c e o f f u n c t i o n a l and m u l t i v a l u e d dependencies i n a r e l a t i o n , o n e can i n f e r t h e e x i s t e n c e i n t h e r e l a t i o n o f a d d i t i o n a l dependencies, t h e e x i s t e n c e of which c a n n o t he i n f e r r e d u s i n g r u l e s o f t h e previous types. T h e r u l e s f o r f u n c t i o n a l dependencies a r e w e l l known and w e p r e s e n t them w i t h o u t proofs. Certain inference r u l e s of t h e second type w e r e n o t e d by Faqin [Fag21 and Z a n i o l o [ Z a n ] . The e x i s t e n c e o f r u l e s o f t h e t h i r d t y p e was f i r s t observed by F a q i n , who n o t e d a s p e c i a l cdse. F a g i n ' s work was mainly meant t o b e an i n t r o d u c t i o n t o t h e c o n c e p t of a m u l t i v a l u e d dependency and some.of t h e r u l e s w e r e p r e s e n t ed i n r e s t r i c t e d forms. O u r purpose i n t h i s p a p e r i s t o s t a t e and prove t h e r u l e s i n what w e b e l i e v e i s t h e i r most g e n e r a l form and t o show t h a t t h e r u l e s t h a t w e p r e s e n t a r e c o m p l e t e , t h a t i s , no addi t i o n a l r u l e s a r e needed.
R e c e n t l y , Faqin [Fag21 and, i n d e p e n d e n t l y , Z a n i o l o [Zan) have p r e s e n t e d a formal d e f i n i t i o n o f a n o n f u n c t i o n a l dependency, which Faqin c a l l s a " m u l t i valued dependency. It turns out that t h e s e dependencies are indeed a g e n e r a l j z a t i o n of f u n c t i o n a l d e p e n d e n c i e s . A l s o , t h e y have t h e i m p o r t a n t p r o p e r t y t h a t t h e ? x i s t e n c e o f such dependencies i n a r e l a t i o n i s e q u i v a l e n t t o the f a c t t h a t t h e r e l a t i o n i s t h e n a t u r a l j o i n o f some c f its projections, It seems therefore t h a t t h e s t u d y o f t h e s e dependencies may .ead t o a b e t t e r u n d e r s t a n d i n g o f t h e s t r u c t u r e o f r e l a t i o n a l schemas.
There a r e q u i t e p r a c t i c a l r e a s o n s f o r o b t a i n i n g a complete a x i o m a t i z a t i o n ( t h a t i s , a complete s e t of i n f e r e n c e r u l e s ) . I t i s w e l l known t h a t c o n s i d e r a t i o n o f dependencies t h a t e x i s t among a t t r i b u t e s i n t h e d a t a base i s a v a l u a b l e t o o l f o r t h e d a t a b a s e d e s i g n e r . Assume t h a t a d a t a b a s e d e s i g n e r has noted c e r t a i n f u n c t i o n a l and m u l t i v a l u e d dependencies t h a t h o l d f o r a r e l a t i o n schema t h a t i s b e i n g a n a l y z e d . He can u s e t h e s e depende n c i e s as i n p u t t o a ' b o x ' which ( u s i n g t h e complete s e t o f r u l e s ) can d e t e r m i n e f o r any o t h e r dependency whether i t i s a l o g i c a l consequence o f t h e i n p u t dependencies. (See [ B e e r l ] o r [ B e r n l ] f o r a d e s c r i p t i o n o f such a ' b o x ' f o r f u n c t i o n a l dependencies and [Beer21 f o r a desc r i p t i o n o f a ' b o x ' f o r m u l t i v a l u e d dep e n d e n c i e s . ) I f t h e s e t of r u l e s i s n o t known t o be complete t h e n t h e r e may e x i s t dependencies t h a t are l o g i c a l consequences of t h e i n p u t dependencies b u t cannot be so l a b e l e d by t h e ' b o x ' . Thus, t h e completeness of the r u l e s ensures t h e schema d e s i g n e r t h a t he h a s complete knowledge a b o u t t h e i n p u t dependencies and a l l t h e i r l o g i c a l consequences. For f u r t h e r d i s c u s s i o n of t h i s s u b j e c t s e e [Fag31 *
I'
I n [FagZ], Fagin demonstrated a number o f p r o p e r t i e s of t h e s e m u l t i valued dependencies. In particular, h e showed t h a t t h e e x i s t e n c e of such dependencies i n a r e l a t i o n may indeed r e s u l t i n u n d e s i r a b l e redundancy. T h i s observation led h i m to a defini t i o n o f a new ( ' f o u r t h ' ) normal form f o r r e l a t i o n a l schemas. To f a c i l i t a t e f u r t h e r s t u d y of t h e s e dependencies and t h e i r i n f l u e n c e on t h e s t r u c t u r e o f r e l a t i o n s , w e i n v e s t i g a t e i n t h i s paper t h e i n f e r e n c e r u l e s t h a t can b e a p p l i e d t o t h e s e dependencies. W e present a s e t of such r u l e s and prove t h a t t h i s s e t i s complete. There a r e t h r e e t y p e s of i n f e r e n c e r u l e s t o be d i s c u s s e d . The f i r s t type i s i n f e r e n c e r u l e s f o r T h e second t y p e f u n c t i o n a l dependencies. i s i n f e r e n c e r u l e s f o r m u l t i v a l u e d depend encies, t h a t i s , r u l e s t h a t allow u s t o i n f e r from t h e e x i s t e n c e of some m u l t i valued dependencies i n a r e l a t i o n t h e e x i s t e n c e o f a d d i t i o n a l m u l t i v a l u e d dependencies i n t h e r e l a t i o n . The t h i r d t y p e i s "mixed" i n f e r e n c e r u l e s by which,
I t is interesting t o note t h a t the r u l e s f o r m u l t i v a l u e d dependencies p r e s e n t e d h e r e a r e very s i m i l a r t o t h e r u l e s f o r f u n c t i o n a l dependencies. Specif i c a l l y , w e s h o w t h a t for each r u l e for f u n c t i o n a l d e p e n d e n c i e s , e i t h e r t h e same r u l e o r a very s i m i l a r r u l e a p p l i e s t o m u l t i v a l u e d dependencies and, t o o b t a i n a
48
h a s a s s o c i a t e d w i t h it a domain, d e n o t e d by DOM(A) , which i s t h e s e t o f p o s s i b l e For a s e t of values €or t h a t a t t r i b u t e . a t t r i b u t e s X , a n X-value is a n assignment of v a l u e s t o t h e a t r r i b u t e s of X from t h e i r domains. W e w i l l u s e t h e l e t t e r s A , B, f o r s i n g l e a t t r i b u t e s and t h e letters X, Y, f o r sets o f a t t r i b u t e s . Following common p r a c t i c e i n p a p e r s on r e l a t i o n a l d a t a bases, w e w i l l n o t d i s t i n g u i s h between t h e a t t r i b u t e A and t h e s e t (A). A l s o , i f X and Y a r e s e t s o f attributes (not necessarily d i s j o i n t ) , t h e n w e w r i t e X Y f o r t h e union o f X and Y whenever t h a t u n i o n a p p e a r s i n a dependenc y o r i n an e x p r e s s i o n d e n o t i n g a proj e c t i o n of a r e l a t i o n .
I n t h i s paper w e d e a l o n l y with i n f e r e n c e r u l e s f o r dependencies i n a f i x e d given r e l a t i o n . The problems t h a t a r i s e when t h e r e l a t i o n i s n o t f i x e d a r e of a d i f f e r e n t nature. For example, one may want t o know, g i v e n a s e t o f dependencies i n a r e l a t i o n , which dependencies w i l l b e v a l i d i n a p r o j e c t i o n of t h e r e l a t i o n o r i n a j o i n of t h e r e l a t i o n w i t h some o t h e r relation. These problems a r e o u t s i d e t h e scope of t h i s p a p e r . (The p r o j e c t i o n problem w a s d i s c u s s e d bv Fagin [FagZ].)
...
(Al,
..., ...,
...
There are two o p e r a t i o n s on r e l a t i o n s t h a t a r e of i n t e r e s t t o us - p r o j e c t i o n and t h e n a t u r a l j o i n . I f u is a tuple i n a relation R(X) and A i s an a t t r i b u t e i n x , t h e n u(A] i s t h e A-component o f u. Similarly, i f Y is a s u b s e t of X , t h e n u [ y ] i s t h e t u p l e ( o f s i z e I Y l ) c o n t a i n i n g t h e components of u c o r r e s p o n d i n g t o t h e e l e m e n t s of Y. The p r o j e c t i o n o f R on Y , denoted by R[Y], i s d e f i n e d by
RELATIONS AND CONSTRAINTS
The word r e l a t i o n i s used i n t h e l i t e r a t u r e t o d e n o t e a s e t o f t u p l e s and a l s o t o denote a s t r u c t u r a l d e s c r i p t i o n ~t i s i m p o r t a n t t o o f s e t s of t u p l e s . d i s t i n g u i s h between t h e s e two d e n o t a t i o n s . I n t h i s p a p e r we d e a l w i t h s e t s of t u p l e s and t h e word r e l a t i o n i s used o n l y t o denote a s e t of t u p l e s .
R[Y] = {u[Y]
and
I
u E R)
T h a t i s , R[Y] i s a r e l a t i o n on Y c o n t a i n i n g a l l t u p l e s g e n e r a t e d from t h e t u p l e s o f R by o m i t t i n g t h e components c o r r e s Note t h a t ponding t o a t t r i b u t e s n o t i n Y. R[Y] i s a s e t , so e a c h t u p l e o c c u r s o n l y once even i f i t i s g e n e r a t e d b y s e v e r a l t u p l e s o f R. W e w i l l u s e t h e term proj e c t i o n a l s o f o r s i n g l e t u p l e s ; e.g., u[Y] i s c a l l e d t h e p r o j e c t i o n o f u on Y.
The f u n c t i o n a l and m u l t i v a l u e d depend e n c i e s t r e a t e d i n t h i s paper a r e s p e c i a l t y p e s of c o n s t r a i n t s on r e l a t i o n s . In t h i s s e c t i o n we e x p l a i n what a c o n s t r a i n t i s and d i s c u s s i n f e r e n c e r u l e s and comp l e t e n e s s o f s e t s of i n f e r e n c e r u l e s f o r f a m i l i e s of c o n s t r a i n t s . Relations
...,
...
i s a comDlete s e t of i n f e r e n c e r u l e s . ~n S e c f i o n 3 w e d e f i n e f u n c t i o n a l dependentie:; and l i s t t h e i r i n f e r e n c e r u l e s . Y u l t i v a l u e d dependencies a r e d i s c u s s e d i n icction 4. They a r e d e f i n e d i n S e c t l o n 4 . 1 , t h e i r i n f e r e n c e r u l e s (of t h e second t y p e ) a r e d i s c u s s e d and proved i n S e c t i o n 4 . 2 and t h e i n f e r e n c e r u l e s o f t h e t h l r d t y p e a r e proved i n S e c t i o n 4.3. I n S e c t i o n 5 we prove t h a t t h e r u l e s l i s t e d i n S e c t i o n s 3 and 4 a r e c o c r ’ e t e f o r t h e f a m i l y o f f u n c t i o n a l and mu1:ivalued dependencies.
2.1
. ..
A r e l a t i o n on t h e set of a t t r i b u t e s A ) is a s u b s e t of t h e c r o s s n x DOM(A ) . The ele2 r o d u c t DOM(A1) x ments o f t h e r e l a t i o n are c a l f e d t u p l e s o r rows. Since a r e l a t i o n is a set, the order of t h e rows i s u n i m p o r t a n t . Also, s i n c e t h e a t t r i b u t e s are d i s t i n c t , t h e o r d e r of t h e columns i s u n i m p o r t a n t . A r e l a t i o n R on (A1, An) w i l l b e denoted by S i m i l a r l y , i f R i s a reR(A1, An). l a t i o n o v e r t h e union o f t h e sets X , Y , . . . , t h e n w e u s e t h e n o t a t i o n R ( X , Y, ...). The l e t t e r s u , v , w i l l be used t o denote s i n g l e t u p l e s .
The p a p e r is o r g a n i z e d as f o l l o w s . I n S e c t i o n 2 w e i n t r o d u c e r e l a t i o n s and d e f i n e t h e c o n c e p t of a c o n s t r a i n t on a s e t o f r e l a t i o n s . W e view b o t h f u n c t i o n a l and m u l t i v a l u e d dependencies a s speci a l t y p e s o f c o n s t r a i n t s . W e a l s o exp l a i n what 1s an i n f e r e n c e r u l e and what
2.
Each a t t r i b u t e A
f i n i t e s e t (A1,A 2 . . . ) .
complete s e t of r u l e s f o r m u l t i v a l u e d dep e n d e n c i e s , o n l y one a d d i t i o n a l r u l e is needed. T h i s s i m i l a r i t y h a s been e x p l o r e d by B e e r i [ B e e r 2 1 t o e x t e n d t h e r e s u l t s of [ B e r n l , ~ e r n a ]t o m u l t i v a l u e d dependencies.
R e l a t i o n a l Operations Let R(X,Y) and S ( Y , Z ) be r e l a t i o n s
where X, Y and Z are d i s j o i n t s e t s o f
A t t r i b u t e s a r e symbols t a k e n from a
49
p l i e d by r i f C i s v a l i d i n e v e r y r e l a t ion which obeys a l l t h e c o n s t r a i n t s i n I n o t h e r words, C i s i m p l i e d b y r i f t h e r e e x i s t s no counterexample r e l a t i o n t h a t obeys a l l t h e c o n s t r a i n t s o f r b u t does n o t obey C . An i n f e r e n c e & f o r a fami l y of c o n s t r a i n t s i s a r u l e by which, g i v e n some c o n s t r a i n t s from t h e f a m i l y , o n e c a n i n f e r t h a t some o t h e r c o n s t r a i n t i s i m p l i e d by them.
a t t r i b u t e s (i.e., Y is the set of a t t r i b u t e s t h a t a r e common t o R and S ) . The n a t u r a l join o f R and S i s t h e r e l a t i o n T ( X , Y , Z ) d e f i n e d by T(X,Y,Z)
= ((x,Y,~)
I
R and
(x,Y)
(Y,Z) 6
r.
sl
That i s , t h e n a t u r a l j o i n i s c r e a t e d by gluing t o g e t h e r t u p l e s of R w i t h t u p l e s o f S t h a t have t h e same v a l u e s f o r a l l a t t r i b u t e s t h a t are common t o t h e t w o relations. 2.2
constraints
and
For an example of an i n f e r e n c e r u l e l e t u s look a t t h e w e l l known t r a n s i t i v i t y r u l e f o r f u n c t i o n a l dependencies. ( F u n c t i o n a l dependencies a r e d e f i n e d f o r mally i n t h e n e x t s e c t i o n . ) t h e r u l e s t a t e s t h a t i f t h e f u n c t i o n a l dependenc i e s A->B and B->C are v a l i d i n a r e l a t i o n t h e n t h e f u n c t i o n a l dependency A->C i s also v a l i d i n i t . S i n c e t h e r u l e e n a b l e s u s t o i n f e r t h e v a l i d i t y of A->C i n a r e l a t i o n from t h e v a l i d i t y of A->B and B->C i n t h e r e l a t i o n , i t e n a b l e s u s t o i n f e r t h a t A->C i s i m p l i e d by A->B and
Inference r u l e s
I t is convenient t o d i s c u s s functi o n a l and m u l t i v a l u e d dependencies and t h e i r i n f e r e n c e r u l e s i n t h e c o n t e x t of c o n s t r a i n t s on r e l a t i o n s . A s w e w i l l s e e , t h e s e dependencies are indeed s p e c i a l t y p e s of c o n s t r a i n t s . A c o n s t r a i n t involving t h e set o f A n ) i s a p r e d i c a t e on a t t r i b u t e s (Al, t h e c o l l e c t i o n o f a l l r e l a t i o n s on t h i s set. A relation R(A , An) obeys t h e c o n s t r a i n t i f t h e value of t h e p r e d i c a t e I f R obeys t h e consfor R i s 'TRUE'. t r a i n t , t h e n w e a l s o s a y t h a t t h e conA constraint is s t r a i n t i s v a l i d i n R. d e f i n e d by g i v i n g a n o t a t i o n f o r exp r e s s i n g i t and t h e c o n d i t i o n under which a r e l a t i o n obeys i t .
...,
B->C.
...,
A s e t o f i n f e r e n c e r u l e s i s complete f o r a family o f c o n s t r a i n t s i f f o r each s e t r o f c o n s t r a i n t s from t h e f a m i l y , t h e c o n s t r a i n t s t h a t are i m p l i e d by rare exa c t l y t h o s e t h a t can be d e r i v e d from i t using t h e s e i n f e r e n c e r u l e s . That i s , t h e r u l e s p r e s e n t u s w i t h an e f f e c t i v e way o f f i n d i n g a l l t h e c o n s t r a i n t s t h a t are i m p l i e d by any g i v e n s e t o f c o n s t r a i n t s i n t h e f a m i l y . W e n o t e t h e followi n g i m p o r t a n t consequence of t h e d e f i n i t i o n o f completeness: A s e t of r u l e s i s complete f o r a f a m i l y o f c o n s t r a i n t s i f and o n l y i f , f o r each s e t r o f c o n s t r a i n t s i n t h e f a m i l y and f o r each c o n s t r a i n t c t h a t can n o t be i n f e r r e d from Tby u s i n g the r u l e s i n the set, there e x i s t s a rel a t i o n i n which valid but C i s not v a l i d . This observation is t h e b a s i s o f o u r completeness proof i n S e c t i o n 5 . ( A c t u a l l y , i n a d d i t i o n t o p r o v i n g compl e t e n e s s i n t h e sense defined h e r e , we a l s o prove t h e r e completeness o f o u r r u l e s f o r a s l i g h t l y s t r o n g e r concept o f completeness. )
A s an example, suppose t h a t e m p l o y e e s i n a company a r e t o be d e s c r i b e d by -: r e l a t i o n EMP(EMP-NO, EMP-SAL, EMPAGE,. . ) . A p r i o r i , any r e l a t i o n of t h i s form can e x i s t i n t h e d a t a b a s e . However, i f one s p e c i f i e s a c o n s t r a i n t t h a t ENP-AGE should b e between 18 and 6 5 t h e n o n l y r e l a t i o n s i n which t h i s c o n s t r a i n t i s v a l i d can exist i n t h e data b a s e . Simi l a r l y , t h e s p e c i f i c a t i o n t h a t Em-NO i s a key i s a c t u a l l y a c o n s t r a i n t on t h e r e l a t i o n s . A r e l a t i o n obeys t h i s cons t r a i n t o n l y i f no two d i f f e r e n t t u p l e s i n it have t h e s a m e v a l u e o f t h e a t t r i Note t h a t t o s p e c i f y t h a t b u t e EMP-NO. E W - X O i s a key is t h e same a s t o r e q u i r e t h a t a l l t h e o t h e r a t t r i b u t e s o f t h e rel a t i o n a r e f u n c t i o n a l l y dependent on i t . Thus, f u n c t i o n a l dependencies a r e const r a i n t s . The c o n s t r a i n t s which w e cons i d e r i n t h i s p a p e r a r e f u n c t i o n a l dependencies and m u l t i v a l u e d dependencies.
.
ris
The c o n c e p t of completeness of a s e t o f r u l e s i s o f prime importance i n any system where i n f e r e n c e r u l e s a r e used. As w e n o t e d i n t h e I n t r o d u c t i o n , o n l y i f a complete s e t of r u l e s i s u s e d , can t h e d a t a b a s e d e s i g n e r be a s s u r e d t h a t he h a s complete knowledge o f a l l dependencies t h a t h o l d i n a g i v e n d a t a b a s e . Indeed, t h e f a c t t h a t A r m s t r o n g ' s r u l e s f o r funct i o n a l dependencies a r e complete is an i n h e r e n t and b a s i c assumption i n t h e work
I t i s o f t e n p o s s i b l e , given a s e t of c o n s t r a i n t s , t o prove t h a t some o t h e r constraints a r e i m p l i e d by t h e q i v e n s e t . Given a s e t o f c o n s t r a i n t s = (C I' : " ' C n j , we s a y t h a t t h e c o n s t r a i n t c i s G-
r
50
I
i o n no m a t t e r what o t h e r F D ' s e x i s t i n it. However, t h i s d i s t i n c t i o n i s n o t s i g n i f i c a n t f o r t h e purpose of t h i s paper and w e ignore it.)
on f u n c t i o n a l dependencies r e p o r t e d i n [ B e r n l , Bernz]. Similarly, a study of m u l t i v a l u e d dependencies and t h e i r i n f l u ence on t h e s t r u c t u r e of r e l a t i o n s m u s t be based on a complete s e t of r u l e s f o r them. T h i s need f o r a complete s e t of r u l e s w a s t h e m o t i v a t i o n f o r t h e work r e p o r t e d i n t h i s paper. 3.
Even though t h i s s e t o f r u l e s i s complete, i t i s c o n v e n i e n t t o i n t r o d u c e a d d i t i o n a l r u l e s (which, of c o u r s e , a r e consequences of F D 1 - FD3).
FUNCTIONAL DEPENDENCIES AND THEIR INFERENCE RULES
FD4 ( P s e u d o - t r a n s i t i v i t y ) : I f X->Y and YW-X then Xw-iZ.
F u n c t i o n a l dependencies form a f a m i l y o f c o n s t r a i n t s t h a t h a s been t r e a t e d e x t e n s i v e l y i n t h e l i t e r a t u r e (see, e . g . , [ A r m , B e r n l , Bern21 and t h e r e f e r e n c e s given t h e r e ) . Therefore w e omit t h e proofs i n t h i s Section. A f u n c t i o n a l dependency ( a b b r . F D ) f , i s a s t a t e m e n t f : x+Y where X and Y are s e t s of a t t r i b u t e s . I f R ( X , Y , ...)
FD5 ( u n i o n ) : I F X+Y and X-bZ then X-+YZ. FD6 (Decomposition) : I f X->YZ t h e n X->Y and X->Z. The r u l e s FD5, FD6 s t a t e t h a t € o r a g i v e n s e t o f a t t r i b u t e s w e can combine and decompose t h e s e t s t h a t depend on it a r b i t r arily. I n p a r t i c u l a r , i t follows t h a t t h e i s e q u i v a l e n t t o t h e s e t of F D X->A1.. SAn F D ' s X->A1,. . , X->A n T h i s p r o p e r t y i s v e r y u s e f u l f o r t h e m a n i p u l a t i o n o f FD's and f o r p r o v i n g t h e i r p r o p e r t i e s . (See, f o r example, t h e e x t e n s i v e u s e of t h e s e p r o p e r t i e s i n [Bernl, BernZ].)
,
i s a r e l a t i o n on a s e t o f a t t r i b u t e s t h a t c o n t a i n s X and Y , t h e n R obeys t h e FD f i f e v e r y two t u p l e s of R which have t h e same p r o j e c t i o n on X a l s o have t h e same p r o j e c t i o n on Y. Given f:X-iY, w e say t h a t f i s a f u c c t i o n a l dependency from X t o Y , t h a t Y i 5 f u n c t i o n a l l y dependent on X o r t h a t S f u n c t i o n a l l y determines Y . From the d e f i r i t i o n it f o l l o w s t h a t f o r each p a i r o f s e t s X and Y t h e r e i s a t most one f u n c t i o n a l dependency from X t o Y. Therefore. w e u s u a l l y omit t h e name of t h e FD and w r i t e V->Y.
.
FD5,
FD2
cX cW
(Augmentation) :
If 2
(Transitivity):
I f X->Y
t h e n X+Y.
and X->Y t h e n XW->YZ.
and Y->Z t h e n x-X. ( S t r i c t l y s p e a k i n g , F D 1 i s an axiom schema: t h e " r e f l e x i v e " PD's e x i s t i n e v e r y r e l a t FD3
4.
MULTIVALUED DEPENDENCIES AND T H E I R INFERENCE RULES
4.1
M u l t i v a l u e d Dependencies
The c o n c e p t of f u n c t i o n a l dependency i s n o t g e n e r a l enough t o c a p t u r e t h e v a r i o u s t y p e s o f dependencies t h a t e x i s t in relations. I t is possible t h a t i n a r e l a t i o n t h e values o f t h e a t t r i b u t e s i n a s e t Y depend o n l y on t h e v a l u e s o f t h e a t t r i b u t e s i n a s e t X , b u t t h e r e i s more t h a n one Y-value f o r a g i v e n X-value. Such a dependency i s n o t f u n c t i o n a l . The c o n c e p t o f m u l t i v a l u e d dependency was int r o d u c e d by Fagin [Fag21 and Zaniolo [Zan] t o d e s c r i b e such dependencies. The r e a d e r is r e f e r r e d t o t h e i r papers f o r d e t a i l e d d i s c u s s i o n and examples.
Using t h e d e f i n i t i o n it i s e a s y t o prove t h e v a l i d i t y of t h e f o l l o w i n g i n f e r e n c e r u l e s ( o r axioms) f o r F D ' s . Fagin [ F a g l ] showed t h a t i t f o l l o w s from Armstrong's r e s u l t [ A r m ] t h a t t h e s e r u l e s are complete f o r F D ' s . I n t h e r u l e s , X , Y, 2 and :J a r e a r b i t r a r y s e t s of a t t r i b u t e s . If Y
Note: The s e t o f r u l e s F D 1 , FD3, i s a l s o complete f o r F D ' s .
For a s e t F o f F D ' s t h e c l o s u r e of F, denoted by F + , i s t h e s e t o f a l l F D ' S d e r i v a b l e from F u s i n g t h e r u l e s F D 1 - FD3. Armstrong's r e s u l t can be s t a t e d a s follows: For e v e r y s e t F o f F D ' s , t h e r e e x i s t s a r e l a t i o n such t h a t t h e s e t of F D ' s t h a t are v a l i d i n it is e x a c t l y F+.
Note t h a t even though an FD X->Y involves only t he a t t r i b u t e s of t h e s e t s X a n d Y , i t can b e i n t e r p r e t e d a s a cons t r a i n t on r e l a t i o n s whose a t t r i b u t e s i n c l u d e t h e a t t r i b u t e s i n X and Y and perhaps more. Indeed, it f o l l o w s from t h e d e f i n i t i o n t h a t t h e v a l i d i t y of t h e FD for a p a r t i c u l a r r e l a t i o n depends o n l y on t h e v a l u e s i n t h e columns named by t h e a t t r i b u t e s from X and s.
F D 1 ( R e f l e x i v i t y ):
.
L e t R ( U ) be a r e l a t i o n , and l e t Y be a s u b s e t of U. For each s u b s e t x of u
51
There is a fundamental d i f f e r e n c e between t h e d e f i n i t i o n s o f F D ' s and M V D ' s . An FD X->Y i s d e f i n e d ' i n t e r m s of t h e s e t s X and Y a l o n e and can b e i n t e r p r e t e d a s a c o n s t r a i n t i n v o l v i n g any s e t o f a t t r i b u t e s I n d e e d , t o check t h a t c o n t a i n s X and Y. i s valid i n a relation R(U) i f an FD X->Y w e o n l y have t o check i t s v a l i d i t y i n t h e p r o j e c t i o n R [ X Y ] : t h e v a l i d i t y does n o t depend on t h e v a l u e s of t h e o t h e r a t t r i b u t e s . On t h e o t h e r hand, t h e v a l i d i t y o f X-i->Y i n R(U) depends on t h e v a l u e s of a l l a t t r i b u t e s i n U and c a n n o t be checked i n R [ X Y ] . I t i s p o s s i b l e thaC X-+->Y is n o t v a l i d i n R ( U ) b u t t h a t X->->Y is v a l i d i n t h e p r o j e c t i o n R [ U ' ] , where W e c U. Thus, MVD'S a r e s e n s i t i v e t o "cont e x t " w h i l e f u n c t i o n a l dependencies are not. ( S e e [Fag21 f o r f u r t h e r d i s c u s s i o n o f t h i s i s s u e ) . W e w i l l see l a t e r t h a t t h i s " c o n t e x t dependence" g i v e s r i s e t o an i n f e r e n c e r u l e for MVD's t h a t h a s no It p a r a l l e l among t h e r u l e s for F D ' s . f o l l o w s t h a t t h e s p e c i f i c a t i o n o f U i s an i n t e r g r a l p a r t o f t h e MVD. I t would be more a p p r o p r i a t e , p e r h a p s , t o u s e t h e (U) t o stress the f a c t n o t a t i o n X->->Y Howt h a t t h e MTD i n v o l v e s t h e s e t U . e v e r , i n t h i s p a p e r w e assume t h a t U i s a f i x e d g i v e n s e t of a t t r i b u t e s . For t h i s r e a s o n , w e o m i t t h e r e f e r e n c e t o U and w r i t e simply x->--,Y.
(X and Y a r e n o t n e c e s s a r i l y d i s j o i n t ) , and for each X-value, x, w e d e f i n e
u[X] = x and u[Y] = y ] Y is a f u n c t i o n t h a t g i v e s , f o r e a c h X R and f o r each X-value, t h e s e t o f Y-values t h a t appear w i t h t h i s X-value i n t u p l e s of the relation. I n terms o f r e l a t i o n a l o p e r a t i o n s , t h e v a l u e o f Y ( x ) i s computed tuples ( i f by f i r s t s e l e c t i n g from R any) t h a t have x as t h e i r x - p r o j e c t i o n and then p r o j e c t i n g t h e r e s u l t i n g set o f t u p l es on Y. The set Y ( x ) i s nonempty i f and R o n l y i f x a p p e a r s i n a t l e a s t one t u p l e o f R. The d e f i n i t i o n can be g e n e r a l i z e d t o arguments t h a t are s e t s of X-values i n t h e obvious way. I n p a r t i c u l a r , i t i s possibl e t o c o n s i d e r composition, e.g., ZR(Y*(X) 1
be
-
A m u l t i v a l u e d dependency ( a b b r . MVD) , g , on a s e t o f a t t r i b u t e s U i s a s t a t e ment g:X->->Y, where X and Y a r e s u 5 s e t s o f IJ. L e t Z be t h e complement of t h e u n i o n o f X and Y i n U. A r e l a t i o n R ( v ) obeys i f f o r e v e r y XZ-value, t h e MvD g:X->+Y x z , t h a t a p p e a r s i n R, we have Y ( x z ) = Y (XI. I n words, t h e MVD g i s vRa l i d i n R +'-le s e t o f Y-values t h a t appears i n R w i t h a g i v e n x a p p e a r s w i t h e v e r y combi n a t i o n o f x and z i n R . Thus, t h i s s e t i s a f u n c t i o n of x a l o n e and d o e s n o t depend on t h e Z-values t h a t a p p e a r w i t h x. Given g:X-,->Y, we say t h a t q is a m u l t i valued dependency from X t o Y ( i n t h e s e t U). A s w e do f o r FD's, h e r e a l s o w e u s u a i l y o m i t t h e name g of t h e MVD.
ip
I n F a g i n ' s paper [Fag2], he r e q u i r e d t h a t t h e l e f t and r i g h t s i d e s o f an MvD be d i s j o i n t . That i s , f o r X->->Y t o be d e f i n e d , it i s r e q u i r e d t h a t X and Y be disjoint. The r e a s o n f o r t h i s r e s t r i c t i o n i s t h a t t r a n s i t i v i t y does n o t always hold i f the r e s t r i c t i o n is l i f t e d . We f o l l o w Z a n i o l o [ z a n ] i n t h a t w e do n o t A s w e show i n require t h i s restriction. t h e next s u b s e c t i o n , t h e u s e of t h e more general d e f i n i t i o n leads t o a set of inf e r e n c e r u l e s t h a t a r e simple t o s t a t e , e a s y t o u s e and a l l b u t one of which (complementation) a r e s i m i l a r t o c o r r e s ponding r u l e s f o r FD's. T h i s s i m i l a r i t y i s explored i n [ B e e r 2 1 t o extend t h e r e s u l t s of [ B e r n l , Bern21 t o m u l t i v a l u e d dependencies.
Remark: W e a l l o w X o r Y t o be t h e empty set. I f Y i s t h e empty set, t h e n w e g e t w e a g r e e t h a t t h i s tam t h e MVD X--.->j3; i s v a l i d i n a l l r e l a t i o n s . If X i s t h e empty s e t , w e get %->->Y which i s v a l i d i n a r e l a t i o n i f and o n l y i f t h e s e t of Y-values i n t h e r e l a t i o n i s independent o f t h e v a l u e s of all t h e o t h e r a t t r i b u t e s i n L e t R ( Y , Z ) be a r e l a t i o n the relation. Intuitively, where Y and Z a r e d i s j o i n t . t h e MVD @-->--)Y is v a l i d i n R i f and o n l y i f R i s t h e C a r t e s i a n p r o d u c t o f i t s proj e c t i o n s R[Y) and R [ Z ] . T h i s i s i n accordance w i t h an a l t e r n a t i v e c h a r a c t e r i z a t i o n of MVD's g i v e n by Fagin (see P r o p o s i t i o n 2 b e l o w ) . For f u r t h e r d i s c u s s i o n of t h e s e s p e c i a l c a s e s , -see [ F a g a l .
Fagin's r e s u l t s for the restricted O f MVD's a r e r e l a t e d t o our more g e n e r a l form by t h e f o l l o w i n g p r o p o s i t i o n , which i s a d i r e c t consequence o f t h e def i n it i o n .
form
From t h e d e f i n i t i o n s it f o l l o w s t h a t i f X->Y is v a l i d i n a r e l a t i o n R t h e n X->->Y i s a l s o v a l i d i n R. Thus, each F D i s a l s o an HVD. T h e c o n v e r s e i s n o t t r u e . A n MVD X->+Y i s an F D o n l y i f f o r e a c h x t h e s e t Y ( x ) c o n t a i n s a t most one e l e m e n t , R which i s n o t always t h e case.
P r o p o s i t i o n I: For a l l s e t s of a t t r i b u t e s x,Y and U such t h a t X , Y U and for each
c
r e l a t i o n R ( U ) , t h e MVD X-'.-+Y (on t h e s e t U ) i s v a l i d i n R i f and o n l y i f X->->Y-X is v a l i d i n R . (Y-X i s t h e s e t
52
d i f f e r e n c e o f Y and X . )
of decomposition.) c o n n e c t i o n between and t h e c o n c e p t of p e c t t h a t m's w i r o l e i n t h e theory bases.
Cl
I n o t h e r words, t h e MVD X->->y is e q u i v a l e n t t o t h e r e s t r i c t e d MVD X->->Y-X. The f o l l o w i n g i m p o r t a n t p r o p o s i t i o n g i v e s an a l t e r n a t i v e c h a r a c t e r i z a t i o n o f
4.2
MVD'S.
I n f e r e n c e Rules f o r MVD'S
I n t h i s s u b s e c t i o n w e d i s c u s s "pure" i n f e r e n c e r u l e s for MVD's, t h a t i s , r u l e s t h a t d e a l w i t h t h e i m p l i c a t i o n o f new M w ' s from g i v e n MVD's. W e w i l l d i s c u s s "mixed" r u l e s t h a t i n v o l v e b o t h FD's and MVD's i n t h e next subsection. I n the f o l l o w i n g w e assume t h a t U i s a g i v e n s e t o f a t t r i b u t e s and t h a t a l l o t h e r s e t s of a t t r i b u t e s a r e c o n t a i n e d i n i t . A l l M V D ' s a r e t o be i n t e r p r e t e d i n t h e c o n t e x t of U.
Proposition 2: L e t X , Y and Z b e s e t s such t h a t t h e i r union is U and Y f l Z C X . For each r e l a t i o n R ( U ) , t h e MVD X--,--ZY i s v a l i d i n R i f and o n l y i f R i s t h e n a t u r a l j o i n of i t s p r o j e c t i o n s R[XY] and R[XZ]
.
Proof: Fagin [Fag21 h a s s i t i o n for the case t h a t It is a p a r t i t i o n of U . t o v e r i f y t h a t the claim f a c t and P r o p o s i t i o n 1.
Because o f t h i s c l o s e t h e concept o f an MVD decomposition, w e exl l have an i m p o r t a n t of relational data
proved t h e propoX , Y and 2 form
straightforward f o l l o w s from t h i s
I n P r o p o s i t i o n 2 , t h e s e t s Y and 2 have symmetric r o l e s . A s a c o r o l l a r y w e o b t a i n t h e following i n f e r e n c e r u l e .
3
The most i n t e r e s t i n g s p e c i a l case o f P r o p o s i t i o n 2 i s g i v e n by t h e f o l l o w i n g corollary.
MVDO
Corollary: Let X and Y be s u b s e t s of U and l e t Z be t h e complement ( i n U) o f t h e For each r e l a t i o n R ( U ) , ::;-.ion o f X and Y . t h e MVD X->->Y i s v a l i d i n R i f and o n l y i i R is the natural join of i t s projections R[XY] and R[XZ]
(Complementation): L e t X , Y and Z be s e t s such t h a t t h e i r union i s U and Y f l Z 5 x. Then X+->Y i f and o n l y i f
x--;->z. The r u l e i s c a l l e d t h e complementation r u l e s i n c e i t i s u s u a l l y a p p l i e d when Z i s t h e complement ( i n U ) of e i t h e r Y o r o f t h e u n i o n of X and y . I t is t h e only r u l e f o r MVD's i n which t h e consequense of a p p l y i n g t h e r u l e t o some MVD depends ( t o some e x t e n t ) on t h e u n d e r l y i n g s e t U. ~n a l l t h e o t h e r r u l e s only the sets t h a t a p p e a r i n t h e g i v e n dependencies p a r t i c i p a t e i n forming t h e s i d e s of t h e r e s u l t dependency. A s an example of t h e use of the r u l e , w e note t h a t Proposition 1 is a consequence of t h e r u l e . Indeed, it f o l l o w s by two a p p l i c a t i o n s of M v D O t h a t X--)->Y i f f X-p--,u-(Y-X) i f f X->--.Y-X.
.
p r o p o s i t i o n 2 ( e s p e c i a l l y i n t h e form g i v e n i n t h e c o r o l l a r y ) i s probably t h e m s t important s i n g l e property of m u l t i valued dependencies. F i r s t , a s shown i n [Fag21 and i n t h e n e x t s e c t i o n , i t .-in be used t o prove v a r i o u s o t h e r prope r t i e s of :4vD's. Even more i m p o r t a n t i s t h e r e l a t i o n s h i p it e s t a b l i s h e s between Y V D ' s and decompositions of r e l a t i o n s . E s s e n t i a l l y , t h e p r o p o s i t i o n means t h a t an >lM i s an a l t e r n a t i v e way of s p e c i f y i n g t h a t a r e l a t i o n can be e x p r e s s e d a s t h e n a t u r a l j o i n of some o f i t s p r o j e c t i o n s , t h a t i s , t h a t t h e r e l a t i o n can be decomposed i n t o r e l a t i o n s on s m a l l e r s e t s o f attributes. Decomposition i s a b a s i c component o f t h e t h e o r y o f r e l a t i o n a l d a t a b a s e s . A w e l l known example of i t s u s e i s Codd ' s n o r m a l i z a t i o n p r o c e s s . More gene r a l l y , decomposition can be viewed a s p a r t of t h e p r o c e s s o f schema d e s i g n . A p o s s i b l e approach t o t h e problem o f des i g n i n g a r e l a t i o n a l schema i s t o cons i d e r a l l a t t r i b u t e s t o be i n i t i a l l y cont a i n e d i n one b i g r e l a t i o n . The schema i s t h e r e s u l t of decomposing t h i s r e l a t i o n i n t o a s e t o f r e l a t i o n s on s m a l l e r s e t s of a t t r i S u t e s . A s u i t a b l e decornposition i s determined by t h e r e l a t i o n s h i p s ( e . g . , dep e n d e n c i e s ) t h a t e x i s t amoncj t h e a t t r i (See [Fag31 f o r f u r t h e r d i s c u s s i o n butes.
W e now proceed t o p r e s e n t t h e a d d i t i o n a l inference r u l e s t h a t together with MVDO c o n s t i t u t e a complete s e t of i n f e r ence r u l e s f o r MVD'S.
MVDl (Reflexivity):
If Y X t h e n X-!->Y.
MVl2 (Augmentation): I f 2
C W and
x ->
->y
t h e n xw->->YZ. MVD3 ( T r a n s i t i v i t y ) : I f X+--rY
and
y->-\z t h e n X--,--rZ
-Y
.
The v a l i d i t y o f t h e f i r s t two r u l e s f o l l o w s d i r e c t l y from t h e d e f i n i t i o n and w e o n i t t h e p r o o f s . The most i n t e r e s t i n g Fagin c a s e o f MVD2 o c c u r s when Z = @. [Fag21 h a s proved t r a n s i t i v i t y , u s i n g
53
L
P r o p o s i t i o n 2 , f o r t h e s p e c i a l c a s e where Y and Z a r e p a i r w i s e d i s j o i n t . (Witho u t t h e s e r e s t r i c t i o n s , some o f t h e M v D ' s i n t h e r u l e are u n d e f i n e d a c c o r d i n g t o h i s d e f i n i t i o n o f MVD'S.) Note t h a t when Y and Z a r e d i s j o i n t , M V D 3 g i v e s u s c l a s s i c a l t r a n s i t i v i t y : i f X->->Y and y->->Z, t h e n X->->Z. W e present here a d i r e c t proof f o r t h e g e n e r a l c a s e .
s u b s e t of Y t h e n Y->+Z: however, t h e r e a r e many examples of M V D ' s X->->Y such t h a t t h e r e i s no MVD from X t o a s u b s e t of Y .
x,
I n t h e n e x t s e c t i o n w e prove t h a t t h e r u l e s MVDO - MVD3 a r e complete f o r t h e f a m i l y of MVD's. However, a minimal comp l e t e s e t o f r u l e s i s not; n e c e s s a r i l y convenient t o u s e . A s w e d i d f o r F D ' s , we i n t r o d u c e h e r e some a d d i t i o n a l r u l e s t h a t w e have found t o b e u s e f u l f o r t h e manip u l a t i o n of MVD'S.
w e look f i r s t a t t h e s i m p l e s i t u a t i o n of F D ' s . I f X->Y and Y->Z t h e n t h e proof of X->Z i s a l m o s t immediate. It i s true f o r a l l sets X , Y and Z t h a t 2 ( x ) c Z R ( Y ( x )) I f X-->Y t h e n Y (x)R c o n t a i n s a sing!?e element: i f a l s o Y->z R t h e n 2 (Y (x)) c o n t a i n s a s i n g l e element and R so R is
-
.
MVD4
(Pseudo-transitivity) : and yw->-->z t h e n xw->->z-yw.
I f X--)-->Y
e q u a l t o 2 ( x ) . The f a c t t h a t 2 (x) conR t a i n s a s i n g l e element means t h a P X->Z. For MVD's, however, t h e s i t u a t i o n i s more complicated. Even i f 2 (x) = Z (Y (x)), t h e c o n d i t i o n f o r X->->E may nog b R e satisfied.
T h i s r u l e i s a consequence o f M V D 2 and MW3. u s i n g augmentation on X++Y we o b t a i n Xw--2->yw, t h e n xW->->z-yT? i s obt a i n e d by t r a n s i t i v i t y .
By P r o p o s i t i o n 1 w e know t h a t Y->->Z i s e q u i v a l e n t t o Y->-%-Y. Therefore, w i t h o u t loss o f g e n e r a l i t y , w e assume t h a t Y and Z a r e d i s j o i n t . L e t W b e t h e comLet picxnent i n U of t h e union of X and Z . R ( U ) b e a r e l a t i o n i n which X->->Y and Y->->Z a r e v a l i d . To prove t h a t X->->Z we have t o show t h a t Z ( x ) = Z ( x w ) f o r each XW-value, xw, t h a f a p p e a r R s i n R.
MVD5 (Union) :
If
x->+y
and x->+y2
t h e n x->-->y MvD6
(Decomposition) :
y
1 2'
x->->y t h e n X->->y X->->Y
x ->
F i r s t w e show t h a t , f o r each y i n Y , , ( x ) , w e have Z (x) = 2 ( x y ) . C l e a r l y , U 7,.k xy) C Z R ( x ) . R I f t h e r e R e x i s t s some z
and
I f X->->Y1
->y
1 1
2 y2'
-Y
2
and
2-1.
While MVD5 i s e x a c t l y p a r a l l e l t o F D 5 , r u l e M V D 6 i s more r e s t r i c t e d t h e n F D 6 . W e c a n n o t decompose a s e t t h a t a p p e a r s on t h e r i g h t s i d e o f an M v D i n an a r b i t r a r y manner. The o n l y decomposition allowed i s t h e r e s u l t o f Boolean o p e r a t i o n s on s e v e r a l sets t h a t depend on a common set.
i n the set d i f f e r e n c e z ( x ) - z (xy) then a t h e combination xz a R p p e a r s R i n some t u p l e s of R , and b ) t h e r e i s no t u p l e u ' i n R such t h a t u ' [ X ] = x, u ' [ z ] z and u ' [ Y ] = y hold f o r u ' . I t follows t h a t Y ( X Z ) does n o t c o n t a i n y , a l t h o u g h R Y ( x ) does c o n t a i n y. S i n c e Y and Z a r e R . d i s i o i n t , t h i s c o n t r a d i c t s t h e g i v e n MVD
-
Now, s i n c e Y->->Z, we have immediatel y t h a t Z R ( x y ) = 2 ( y ) and, by t h e p r e v i o u s p a r a g r a p h , 2 (x) = '9Z R ( y ) This i s t r u e f o r each y i n Y yx) S i n c e w e want t o show t h a t Z (xw)R = Z (x), w e need o n l y show t h a t R zR(xw) = z ( y ) For some y i n Y ( x ) But, s i n c e Z an8 Y a r e d i s j o i n t , weRknow t h a t Y C XW. T h e r e f o r e xw h a s a p r o j e c t i o n y ' on Y . T h i s y ' belongs t o Y ( x ) and, s i n c e Y--,--.Z, i t f o l l o w s t h a t Z ( XRW ) = z (17'). R This concludes t h e p r o o f . R
The r u l e s M V D 5 and MVD6 a r e very u s e f u l f o r t h e m a n i p u l a t i o n o f MVD's. The c o r r e s p o n d i n g r u l e s f o r F D ' s , namely F D 5 and FD6, a r e h e a v i l y r e l i e d upon i n esse n t i a l l y a l l t h e papers t h a t d e a l with F D ' s ( e - g . , [Beerl, Bernl, BernZ]). The r u l e s M V D 5 and M V D ~ , though somewhat res t r i c t e d , are equally useful. For example, t h e y are used i n t h e completeness proof o f t h e n e x t s e c t i o n . They a r e also used i n [ B e e r 2 ] , e . g . , i n t h e c o n s t r u c t i o n o f an e f f i c i e n t a l g o r i t h m t o d e c i d e i f a g i v e n MVD can b e d e r i v e d from a given set of MVD'S.
T h e r u l e MVD3 cannot be g e n e r a l i z e d . I f Y and 2 a r e n o t d i s j o i n t t h e n it i s not always t r u e t h a t X--)->Z. See (Fag21 f o r counterexamples. ~n p a r t i c u l a r , i f z i s a
W e p r o v e h e r e o n l y iV3D5. I t i s an e a s y exercise t o p r o v e M V D 6 from MVD5 and MVDO. S t a r t i n g w i t h X--,->Y2, w e augment b o t h s i d e s by X t o o b t a i n X->-,,xy
X->->Y.
.
.
.
2'
54
Z t R ( x y )c o n t a i n s a s i n g l e element and
S i m i l a r l y , augmenting X->->Y1 by Y 2 , w e o b t a i n xJ2->->Y Y I f we w e r e dealing w i t h F D ' s , w e c&u& a p p l y t r a n s i t i v i t y t o terminate t h e proof. However, X Y 2 and YlY2 a r e n o t d i s j o i n t , and a p p l y i n g MVD3 we g e t o n l y X->->y1. Instead, we f i r s t apply complementation t o o b t a i n X Y 2 +->U-X-Y 1Y 2 ' Now t h e s i d e s are d i s j o i n t and w e a p p l y t r a n s i t i v i t y and t h e n complementation once more t o o b t a i n t h e d e s i r e d r e s u l t X->->YIYz
x->z ' . Note t h a t i f w e are g i v e n a s e t F o f F D ' s and a s e t G o f MVD's, t h e u s e o f t h e r u l e FD-MVD2 can d e r i v e new F D ' S t h a t p o s s i b l y c a n n o t be d e r i v e d from t h e s e t F alone. S i m i l a r l y , u s i n g r u l e F D - M V D ~ on t h e s e new F D ' s , w e c a n d e r i v e a d d i t i o n a l
MVD's t h a t p o s s i b l y cannot b e d e r i v e d from G and F w i t h o u t u s i n g r u l e FD-MVD2. A s an example, suppose t h a t w e have t h e a t t r i b u t e s A , B , c , D, E and w e are g i v e n t h e dependencies D->C and A->->BC. Using r u l e FD-MVD2 w e can d e r i v e t h e dependency W e l e a v e i t t o t h e r e a d e r t o check A->C.
C u r r e n t l y , w e know o f no way t o d e r i v e - MM3, t h a t i s , w i t h o u t u s i n g complementation. W e conj e c t u r e t h a t complementation must b e used. T h i s i s n o t j u s t a t h e o r e t i c a l problem. I n [ ~ e e r 2 ]i t i s shown t h a t f o r t h e s y n t h e s i s o f r e l a t i o n a l schemas from M M ' s , one needs t o consider a s e t of inference r u l e s t h a t does n o t c o n t a i n MVDO. It is therefore i n t e r e s t i n g t o know i f i n such a s e t MVD5 and MVD6 a r e independent of t h e o t h e r r u l e s . M V D 5 o r MVD6 from M V D l
4.3
t h a t t h i s dependency c a n n o t be d e r i v e d from t h e g i v e n two dependencies u s i n g all r u l e s e x c e p t FD-MVD2. The next r u l e we introduce i s n o t independent; r a t h e r , i t can b e d e r i v e d from t h e r u l e s f o r MVD's and t h e r u l e s FD-MVD1, FD-MVD2. We p r e s e n t it f o r t h e same r e a son w e i n t r o d u c e d a d d i t i o n a l r u l e s f o r F D ' s and M V D ' s - t o o b t a i n a s e t o f u s e f u l and f l e x i b l e r u l e s . For more d e t a i l s on t h e u s e o f t h i s r u l e see [ B e e r z ] .
Mixed I n f e r e n c e R u l e s
I n t h e previous subsection we d e a l t w i t h t h e f o l l o w i n g problem: Given a set o f M V D ' s , what a r e t h e a d d i t i o n a l MVD's i m o l i e d by t h e s e t ? w e now t u r n o u r a t t e r r t i o n t o t h e more g e n e r a l problem. suppcse w e a r e g i v e n a s e t F of F D ' s , and a s e t G o f MVD's. W e can a p p l y r u l e s F D 1 FD6 t o F to o b t a i n a d d i t i o n a l F D ' s ; w e can -.:so a p p l y YVDO - MVD6 t o G t o o b t a i n A r e t h e r e any a d d i t i o n a i l d i t i o n a l MID'S. a l dependencies t h a t a r e implied by F U G ? billat a r e t h e i n f e r e n c e r u l e s t h a t can be used t o d e r i v e them? One such r u l e h a s a l r e a d y been mentioned i m p l i c i t l y , namely, t h a t each FD i s a l s o an MVD. Thus w e can apply >lVDO - MVD6 to F Y G , n o t Only t o G . I t a l s o t u r n s o u t t h a t c e r t a i n combinations o f FD's and MVD's imply a d d i t i o n a l F D ' s t h a t c a n n o t be d e r i v e d by t h e u s e o f t h e above r u l e s . These o b s e r v a t i o n s l e a d u s t o introduce here three additional r u l e s . FD-MVDI:
I f X-BY
FD-MVD2:
I f X->->Z
FD-MVD~:
t r a n s i t i v i t y w e o b t a i n X->->z-xY. Now, XY+Z i s e q u i v a l e n t t o XY->Z-xY so by and a p p l y i n g FD-MVD2 t o X->+Z-XY xI-->Z-XY, w e o b t a i n X->Z-XY. Clearly, t h i s i m p l i e s X->Z-Y. 5.
L e t F and G be sets of F D ' s and M V D ' s (on a s e t U ) , r e s p e c t i v e l y . The c l o s u r e o f F U G , denoted by (F,G)+, i s t h e s e t o f a l l F D ' s and M v D ' s t h a t can be d e r i v e d from F U G by r e p e a t e d a p p l i c a t i o n s o f t h e r u l e s i n t h e s e t ( F D 1 , F D 2 , F D 3 , MVDO, M V D L , MVD2, MvD3, FD-MVD1, FD-MVD2). By t h e r e s u l t s of t h e p r e v i o u s s e c t i o n s and t h o s e o f [Arm, F a g l ] , each of t h e s e r u l e s i s a v a l i d inference r u l e f o r t h e family of dependencies. T h e r e f o r e , each dependency i n ( F , G ) + i s i m p l i e d by F J G , 1 . t- , it i s v a l i d i n each r e l a t i o n t h a t obeys a l l t h e dependencies i n F and G . To prove completeness o f t h i s s e t o f r u l e s , i t remains t o show t h a t t h e converse i s
Z),
where y and Z a r e d i s j o i n t , t h e n x->Z'
COMPLETENESS O F THE RULES
In this section w e prove t h a t t h e s e t o f r u l e s i n t r o d u c e d i n S e c t i o n s 3 and 4 i s complete f o r t h e f a m i l y of f u n c t i o n a l and m u l t i v a l u e d dependencies.
and Y->Z'
g
and XY->Z, t h e n *
To prove t h i s r u l e , w e f i r s t augment X->->Y by X t o o b t a i n X-->XY. Applying
t h e n X->->Y. (2'
I f X->->Y X->Z-Y
.
FD-MVD1 f o l l o w s from t h e d e f i n i t i o n s . W e prove FD-,MVD2. L e t R ( U ) be a r e l a t i o n i n which 'X-2-z~ and Y - > z ' a r e v a l i d . S i n c e Y
and Z a r e d i s j o i n t , i t f o l l o w s from t h e d e f i n i t i o n o f an MVD t h a t Z ( x ) = z (xy) f o r each xy t h a t a p p e a r s i n R t h e r e l aR t i o n . S i n c e Z ' g 2, it f o l l o w s t h a t a l s o 2' (x) = it follows h a t Z I R ( x y ) . B u t from Y->Z' xI-->z'. So Z ' (x) which i s e q u a l t o R
55
a l s o t r u e , t h a t is , t h a t each dependency t h a t i s i m p l i e d by F U G b e l o n g s t o ( F , G ) + .
i n g s e t s c o v e r U-X*. w e note that since and X*->X, it f o l l o w s t h a t D E P ( X ) = DEP(X*) and t h e dependency basis of X i s t h e s a m e a s t h e dependency b a s i s of X*.
X+X*
R e c a l l t h a t a dependency f i s i m p l i e d by F U G i f t h e r e i s no counterexample rel a t i o n such t h a t a l l dependencies i n F U G a r e v a l i d i n it b u t f i s n o t . To show com p l e t e n e s s o f t h e rules, w e have t o show t h a t f o r each dependency n o t i n ( F , G ) + such a counterexample r e l a t i o n does e x i s t .
Theorem 1 (Completeness theorem f o r F D ' s and MVD's): L e t F and G b e s e t s of F D ' s and MvD's (on a s e t U ) , r e s p e c t i v e l y . For each f u n c t i o n a l o r m u l t i v a l u e d dependency t h a t does n o t b e l o n g t o ( F , G ) + t h e r e e x i s t s a r e l a t i o n R ( U ) such t h a t all t h e dependencies i n (F,G)+ a r e v a l i d i n R b u t t h e g i v e n dependency i s n o t v a l i d i n R.
I n t h e f o l l o w i n g w e assume t h a t s e t s respectively, are given. Before w e p r e s e n t t h e complet+ n e s s theorem w e need a f e w more c o n c e p t s .
.
F and G of F D ' s and M V D ' s ,
Proof: L e t X b e t h e l e f t s i d e of t h e dependency t h a t i s n o t i n ( F , G ) + . The s e t X* i s a p r o p e r s u b s e t o f U s i n c e . otherwise every ( f u n c t i o n a l o r multivalued) dependency w i t h l e f t s i d e X b e l o n g s to (F,G)+. L e t W ,.. . , W m ( m 2 1) b e t h e sets i n t h e depenhency b a s i s o f x t h a t Thus, X*, W1, ,W form a c o v e r U-X*. p a r t i t i o n o f U. The MVD X->->XTIW1l.. wm i s i n ( F , G ) + a n d , f u r t h e r m o r e , i f an MVD i n ( F , G ) + h a s X a s i t s l e f t s i d e t h e n i t s r i g h t s i d e i s a union o f a s u b s e t o f X* and some of t h e s e t s w 1' - ,wm*
L e t X be a s u b s e t of U. There a r e s e v e r a l s e t s Y such t h a t t h e MVD X++Y is i n ( F , G ) + , ( e . g . , x->->u-x i s always i n (F,G)+). Following Fagin, w e u s e t h e n o t a t i o n x+--,Y i . . . [ Y t o denote t h e c o l l e c t i o n o f M 3 D s x->+bl, x->->Y , X--'--)Y . From now o n , when t h i s n o $ a t i o n i s u s e % , w e assume t h a t none of t h e s e t s Y1,...,yk i s t h e empty s e t .
I y2
...
...,
--
L e t u s d e n o t e by D E P ( X ) t h e f a m i l y o f a l l s e t s Y f o r which X->->Y. (DEP(X) i s , o f c o u r s e , a f u n c t i o n of t h e g i v e n s e t s o f dependencies F and G.) W e have s e e n t h a t D E P O ; ) i s c l o s e d under Boolean o p e r a t i o n s (!TJPr, b l V D 6 ) . Therefore. it contains a u n i u .e s u b f a m i l y w i t h t h e f o l l o w i n g properties: -)
The s e t s i n t h e s u b f a m i l y a r e nonempty .
'1
Each p a i r of s e t s i n t h e subfamily i s d i s j o i n t .
c)
Each s e t i n D E P ( X ) i s a union o f s e t s from t h e s u b f a m i l y .
.I
The r e l a t i o n R ( U ) is c o n s t r u c t e d as f o l l o w s : W e choose t h e s e t (0,l) a s t h e domain of each o f t h e a t t r i b u t e s i n U . The r e l a t i o n R h a s 2m rows, one row f o r each sequence of z e r o s and o n e s of l e n g t h m. I n t h e row c o r r e s p o n d i n g t o a sequence < a l y . . . , a > (where a E (0,l)), m i each o f t h e a t t r i b u t e s i n W . i s a s s i g n e d t h e value a ( i =,l. . . , m ) . Sach a t t r i b u t e i i n X* i s a s s i g n e d t h e v a l u e 1 i n a l l t h e rows o f t h e r e l a t i o n . For example, i f m = 3 , t h e n t h e row c o r r e s p o n d i n g t o t h e sequence h a s a l l 1's i n t h e X* columns, a l l 0 ' s i n t h e W columns, a l l 1's i n t h e W columns an& a l l 1's i n t h e 2 W3 columns.
T h i s s u b f a m i l y consists o f all nonempty minimal s e t s i n DEP(X), i . e . , t h o s e s e t s t h a t do n o t c o n t a i n any o t h e r nonernpty s e t of DEP(X) , W e c a l l t h i s subfamily t h e I f yl,. . . ,yk a r e dependency b a s i s o f X. t h e s e t s i n t h e dependency b a s i s o f X , then a s Fagin noted [Fag21 , t h e " g e n e r a l i z e d " NVD Y->JrY1 I . . (Y,--Z t o o b t a i n t h a t U-wi->-=Z is i n (F,G)+. S i n c e x+->u-w is a l s o i n ( F , G ) + it f o l l o w s by t r i n s i t i v i t y ( ~ ~ 3 3 ) t h a t X->+Z-(U-W. ) , t h a t i s X--)->Z 17 w is i n ( F , G ) + . T k i s i s a contradictionito t h e assumption t h a t W i s a member o f t h e dependency b a s i s of X.i Thus Y m u s t i n t e r n wi i s sect W and, by c l a i m 3 , Y++Z valid i i n R . W e have now shown t h a t , f o r each i , Y->->Z n w . i s v a l i d i n R and s o i s a l s o Y->-sZ n X 4 . By t a k i n g t h e union (MvD5) i t follows t h a t Y->->Z is valid i n
W e show t h a t g i s v a l i d i n R . that Y :Z fl X* i s v a l i d i n R .
.
We f i r s t prove one d i r e c t i o n of c l a i m 1. For each f i x e d row of R , a l l t h e a t t r i b u t e s i n W have t h e same v a l u e . I t i f o l l o w s t??at every a t t r i b u t e i n W i s f u n c t i o n a l l y dependent i n R on e vie r y o t h e r a t t r i b u t e i n W. ( a n d , by augmentat i o n , on e v e r y s e t t h a t c o n t a i n s such an a r t r i b u t e ) . Thus w e have proved t h a t i f the l e f t s i d e of t h e F D i n t e r s e c t s Wi then t h e r"D i s v a l i d i n R . From t h i s a ! 30 f o l l o w s t h e c o r r e s p o n d i n g d i r e c t i on o f c l a i m 3 , s i n c e e v e r y FD i s a l s o an f.Y '3
.
W e now prove c l a i m 2 and t h e o t h e r C..rections of c l a i m s 1 and 3 . The rel a t i o n I? i s t h e C a r t e s i a n p r o d u c t o f i t s p! l e c t i o n s R [ W . ] and R[U-W.]. It follows immediately t h a t th; MVD 8--->Wi i s v a l i d i n R and, by augmentation, Y--'---W i s v a l i d i n R , f o r e v e r y s e t Y. i T h i s proves claim 2 . Now l e t y and Z be s e t s such t h a t Y i s d i s j o i n t from W and Z i s a nonempty s u b s e t o f w . . It i
R.
F i n a l l y , l e t u s c o n s i d e r t h e dependency ( w i t h l e f t s i d e X ) which i s known n o t t o b e i n ( F , G ) + . I f i t i s an FD X - ~ Y t h e n Y i s n o t a s u b s e t of X*, so Y i n t e r s e c t s W f o r some i . By FD6 i f X->Y i s valid i n i R so i s X->Y n w. and t h i s con-
f o l l o w s from t h e above f a c t o 1rization of R t h a t f o r each Y-value y , t h e s e t Z ( y ) c o n t a i n s two 2-values a O-assR
-
ignment and a 1-assignment t o t h e a t t r i b u t e s of 2. I n p a r t i c u l a r , t h e F D Y--.Z i s n o t v a l i d i n R ; t h i s concludes t h e proof of c l a i m 1. I f Z i s a p r o p e r s u b s e t o f wi l e t A b e an a t t r i b u t e i n W . -2. Then Z ( y a ) , where ya i s a YA-value, conFains o n l y a s i n g l e 2-value s i n c e t h e a t t r i b u t e s of Z m u s t b e a s s i g n ed t h e same v a l u e a s t h e a t t r i b u t e A . T h e r e f o r e 2 ( y ) # z ( y a ) and i t f o l l o w s R i s n o t v a l i d i n R. t h a t t5e ?I&Y--.-.z
t r a d i c t s c l a i m 1. ( R e c a l i t h a t x * i s d i s Therefore j o i n t from each o f t h e W i . ) X->Y i s n o t v a l i d i n R. I f t h e dependent h e n , f o r same i , cy i s an MVD X-?->Y Y fl W . m u s t be a nonempty p r o p e r s u b s e t o f W 1 ( e l s e s i n c e X--.Y 0 X* and X--'--,Y n w f o r each i i a r e i n ( F , G ) + t h e MTJD i X-+->Y would be i n ( F , G ) + ) . Now, f o r n Wi i s n o t v a l i d i n R by t h i s i , X->->Y claim 3 . S i n c e X->->W is v a l i d i n R , i f X->->Y w e r e a l s o v a l i d i i n R we could app-
T h i s concludes the proof o f c l a i m 3.
l y M V D 6 ( i n t e r s e c t i o n ) t o o b t a i n a conis not v a l i d i n t r a d i c t i o n . Thus X->->Y R. 3
W e now show t h a t R s a t i s f i e s t h e con-
d i t i o n o f t h e theorem.
First,
let f be
57
BY t h e theorem, w e know t h a t any s i n g l e dependency n o t i n ( F , G ) + i s n o t i m p l i ed by F U G . One might a s k whether it is p o s s i b l e t h a t t h e set F U G implies a s t a t e m e n t l i k e 'f o r f i , where f and f 1 The meaning do n o t b e l o n g t o ( F , G ) such an i m p l i c a t i o n i s t h a t e i t h e r f o r f 2 1 must b e v a l i d i n each r e l a t i o n i n which F U G i s v a l i d though none o f them i s v a l i d i n a l l such r e l a t i o n s . O u r n e x t theorem s t a t e s t h a t t h i s i s n o t t h e case. A s w e have n o t e d a t t h e end o f S e c t i o n 2 , t h i s means t h a t w e a r e p r o v i n g completeness of t h e r u l e s u s i n g a c o n c e p t o f completeness t h a t is s t r o n g e r t h a n t h e one d e f i n e d t h e r e . W e n o t e t h a t Fagin [ F a g l ] p r e s e n t s an example of a system i n which completeness h o l d s b u t i n which " s t r o n g completeness" f a i l s .
.
n e s s r e s u l t s f o r t h e s u b f a m i l y of F D ' s and W e f i r s t show f o r t h e s u b f a m i l y o f MVD's. t h a t the r u l e s FD1 FD3 a r e complete f o r FD's. T h i s r e s u l t means t h a t e v e r y FD t h a t i s d e r i v a b l e from a g i v e n s e t o f F D ' S by u s i n g a l l n i n e r u l e s (FDl-FD3, MVDOMVD3, FD-MVD1,FD-MVDZ) presented in t h i s p a p e r , can be d e r i v e d from i t u s i n g o n l y t h e t h r e e r u l e s FD1-FD3. Similarly, we show t h a t t h e r u l e s MVDO-MVD3 a r e complete for MVD's. T h a t i s , e v e r y MVD d e r i v a b l e from a g i v e n s e t of MVD's can b e d e r i v e d from i t u s i n g o n l y t h e s e f o u r rules -
-
02
The completeness r e s u l t for F D ' s i s n o t new, o f c o u r s e . The proof p r e s e n t e d h e r e i s e s s e n t i a l l y t h e proof g i v e n i n [ F a g l ] . W e i n c l u d e it h e r e f o r s e v e r a l reasons. F i r s t l y , w e want t o p r e s e n t a l l completeness r e s u l t s i n one p l a c e . Secondly, t h i s i s t h e f i r s t t i m e t h a t t h e completeness i s s u e i s d i s c u s s e d i n t h e c o n t e x t of FD's and MVD's t o g e t h e r . U n t i l now, e v e r y known i n f e r e n c e r u l e f o r F D ' s w a s e i t h e r one o f FD1-FD3 o r could b e shown t o be p r o v a b l e from them ( e . g . , FD4, F D 5 and FD6). Nowe have two a d d i t i o n a l r u l e s , namely FD-MVD2 and FD-MVD3, which are n o t l o g i c a l consequences of FDl-FD3. W e want t o s t r e s s t h e f a c t t h a t , when o n l y F D ' s are g i v e n , t h e r u l e s FD1-FD3 are sufficient to derive a l l derivable FD's. (The r e a d e r should n o t e , however, t h a t t h i s f a c t does f o l l o w from t h e d e f i n i t i o n o f completeness and t h e p r o o f s o f completeness of t h i s s e t of r u l e s i n [ A r m , F a g l ] . W e want t o s t r e s s a known r e s u l t , n o t t o prove a new r e s u l t . ) Finally, t h e r e is an i n t e r e s t i n g c o r o l l a r y t h a t f o l l o w s from t h e p r o o f . '
Theorem 2 ( S t r o n q completeness theorem) : L e t F and G be s e t s of F D ' s and M V D ' s (on a s e t U ) , r e s p e c t i v e l y . There e x i s t s a r e l a t i o n R ( U ) such t h a t t h e s e t o f depend e n c i e s v a l i d i n R i s e x a c t l y (F,G)+. Proof: W e want t o show t h a t t h e r e e x i s t s a r e l a t i o n i n which a l l dependencies i n (F,; + a r e v a l i d and i n which no o t h e r dependency i s v a l i d . We have s e e n i n t h e p r o r f of Theorem 1 t h a t f o r e a c h dependenc. f n o t i n ( F , G ) + t h e r e e x i s t s a r e l a t i o n R (U) i n which a l l dependencies i n ' ' , G ) $ a r e v a l i d , f i s n o t v a l i d and a l l a t t r i b u t e s assume o n l y v a l u e s from t h e s e t { O , 11. L e t u s now u s e f o r each d e p ,dency f a d i s t i n c t s e t o f v a l u e s (Cf,lf). The r e q u i r e d r e l a t i o n R ( U ) i s thc nion of t h e s e r e l a t i o n s R ( U ) o v e r I t 1s a l l dependencies f n o t i n ( F , G f + . e a s y t o see t h a t R s a t i s f i e s t h e condi3 t i o n o f t h e theorem. Note that o u r approach t o proving s t r o n g completeness i s somewhat d i f f e r e n t from A r m s t r o n g ' s approach. Armstrong proved s t r o n q completeness f o r F D ' s . For t h a t he showed how t o d i r e c t l y c o n s t r u c t t h e counterexample r e l a t i o n i n which t h e dependencies i n t h e c l o s u r e of a g i v e n s e t a r e v a l i d and no o t h e r dependency i s valid. The r e s u l t i n g r e l a t i o n i s q u i t e cumbersome. I n o u r approach, one f i r s t c o n s t r u c t s a counterexample r e l a t i o n f o r each dependency t h a t i s n o t i n t h e c l o s u r e o f t h e g i v e n s e t , a s i n t h e proof o f Theorem 1: then t h e s e r e l a t i o n s a r e g l u e d t o g e t h e r The e n s u i n g a s i n t h e proof o f Theorem 2 . r e l a t i o n i s more e a s i l y understood t h a n Armstrong's r e l a t i o n . I n t h e p r e v i o u s theorems w e s t a t e d t h e completeness of o u r n l e s f o r t h e f a m i l y o f all dependencies. We now p r e s e n t complete-
Theorem 3 ( S t r o n q completeness theorem f o r F D ' s ) : T h e r u l e s F D 1 , F D 2 , FD3 a r e s t r o n g l y complete f o r t h e f a m i l y o f FD'S.
f o r every there exists a relation in which a l l t h e F D ' s t h a t a r e d e r i v a b l e from t h e g i v e n s e t by u s i n g t h e r u l e s FD1-FD3 a r e v a l i d and i n which no o t h e r FD i s v a l i d . What we w i l l show i s t h a t , g i v e n a s e t o f F D ' s , i f an FD c a n n o t be d e r i v e d from i t by u s i n g t h e s e t h r e e r u l e s , t h e n it c a n n o t be d e r i v e d from i t a t a l l ( t h a t i s , by u s i n g a l l n i n e r u l e s ) . The d e s i r e d r e s u l t w i l l then follow a s a c o r o l l a r y of Theorem 2 . W e u s e t h e same t e c h n i q u e t h a t w a s used i n t h e proof of Theorem 1 b u t t h e proof h e r e i s much simpler . Proof:
W e have t o show t h a t ,
s e t o f FD's,
W e f i r s t n o t e t h a t t h e r u l e s FD5 and
58
i
i
r u l e s FDl-FD3. T h i s d e f i n i t i o n is more r e s t r i c t e d t h a n o u r d e f i n i t i o n which i n c l u d e s i n t h e c l o s u r e a l l F D ' s and M V D ' s d e r i v a b l e from the set by u s i n g a l l n i n e r u l e s p r e s e n t e d i n t h i s paper. However, i t f o l l o w s from t h e theorem t h a t , a s f a r as F D ' s are c o n c e r n e d , t h e two d e f i n i t i o n s are e q u i v a l e n t . c l e a r l y , i n any work t h a t d e a l s o n l y w i t h F D ' s and i n which t h e c o n c e p t o f MVD does n o t a p p e a r , no a m b i g u i t y can a r i s e i f t h e more r e s t r i c t e d d e f i n i t i o n i s used.
and F D 6 can be proved from F D 1 - F D 3 . T h i s means t h a t an a p p l i c a t i o n of F D 5 o r F D 6 i s e q u i v a l e n t t o a sequence of a p p l i c a t i o n s of r u l e s from t h e s e t ( F D ~ , FD2, FD3). T h e r e f o r e w e can assume i n t h e following, without l o s s o f g e n e r a l i t y , t h r : a l l F D ' s have a s i n g l e a t t r i b u t e on their right side. L e t F be a g i v e n s e t of F D ' s on a s e t of a t t r i b u t e s U, and l e t f:X->A b e an F D t h a t cannot be d e r i v e d from F by u s i n g t h e r u l e s FDl-FD3. L e t u s d e n o t e by i? t h e s e t of a t t r i b u t e s t h a t a r e f u n c t i o n a l l y depend e n t on X by some F D d e r i v a b l e from F by (We note t h a t , using t h e r u l e s FD1-FD3. because of F D 1 , 2 c o n t a i n s X . I t is a l s o c l e a r t h a t 2 i s c o n t a i n e d i n X* - t h e s e t of a t t r i b u t e s t h a t a r e f u n c t i o n a l l y depend e n t on X by some FD i n F+. A p r i o r i , the 1 a'. +.er containment may b e p r o p e r . - However , i t follows from t h e theorem t h a t X = X*. We c a n n o t , of c o u r s e , u s e t h i s e q u a l i t y u n t i l t h e theorem h a s been proven. ) L e t R ( U ) be a r e l a t i o n c o n s i s t i n g of t h e f o l l o w i n g two t u p l e s . In the first tuple of R a l l t h e a t t r i b u t e s a r e a s s i g n e d t h e v a i L e 1. I n t h e second t u p l e o f R a l l a t t r i b u t e s of >r a r e a s s i g n e d t h e v a l u e 1 an(? a l l t h e a t t r i b u t e s i n U-? a r e a s s i g n e d t h e value 0 . Note t h a t u-? i s n o t empty s i n c e it c o n t a i n s t h e a t t r i b u t e A.
A s a c o r o l l a r y t o t h e p o o f of Theorem 3 w e can now o b t a i n a'simple c h a r a c t e r i z a t i o n of the MVD's i n the closure of a set F o f F D ' s . Namely, each MVD i n F+ i s e i t h e r one o f t h e F D ' s i n F+ o r i s t h e r e s u l t o f a p p l y i n g MVDO (comp l e m e n t a t i o n ) t o such an F D . W e n o t e t h a t t h i s c o r o l l a r y can a l s o be o b t a i n e d from a r e s u l t by Rissanen [ R i S S , The.11.
c o r o l l a r y : L e t F b e a s e t o f F D ' S on a s e t of a t t r i b u t e s U . L e t g:X->-+Y be an MVD i n F+ and l e t be t h e MVD ->U-Y (which i s a l s o i n F + ) . Then e i t h e r g o r g i s t h e r e s u l t of a p p l y i n g r u l e FD-MVD~ t o an FD i n F+. (Recall t h a t r u l e FD-MVD1 e s s e n t i a l l y s t a t e s t h a t e v e r y F D i s a l s o an MVD. Thus t h e c o r o l l a r y means t h a t e i t h e r g o r i t s 'complement' S i s e s s e n t i a l l y an FD i n F ' . )
-Proof:
L e t t h e s e t X be t h e l e f t s i d e o f t h e g i v e n MVD g. we c o n s i d e r t h e r e l a t i o n R ( U ) c o n s t r u c t e d i n t h e proof o f Theorem 3 , i n which a l l a t t r i b u t e s of X* ( = 2) a r e a s s i g n e d t h e v a l u e 1 i n both t u p l e s and t h e a t t r i b u t e s of U-X* a r e assigned t h e value 1 i n t h e f i r s t t u p l e (For and t h e v a l u e 0 i n t h e second t u p l e . t h i s X it i s p o s s i b l e t h a t U-X* i s empty, s i n c e now no a t t r i b u t e A i s g i v e n t h a t i s known t o be i n U-X*.) W e want t o show i s an F D i n F+. t h a t e i t h e r X->Y o r X->U-Y S i n c e g i s i n F', w e know t h a t g i s v a l i d i n R. Now, i f Y i s a s u b s e t o f X* t h e n X->Y i s an FD i n F+. If Y i n t e r s e c t s U-X* t h e n it i s o b v i o u s t h a t X->->Y is valid i n R only i f Y contains a l l a t t r i b u t e s o f U-X*. (Compare t o c l a i m s 2 and 3 i n t h e proof o f Theorem 1. ) B u t then U-Y i s a s u b s e t o f X* and X->U-Y i s an F D i n F+. 0
I t i s e a s y t o see t h a t a ) e v e r y F D whose r i g h t s i d e i s c o n t a i n e d i n i? i s ' J ~ ,d , i n R , and b ) an F D whose r i g h t s i d e L n r e r s e c t s U-2 i s v a l i d i n R i f and o n l y i f t s l e f t s i d e a l s o i n t e r s e c t s U-?. ff:-,--pare t o c l a i m s 1 and 3 i n t h e proof o f Theorem 1 . ) L e t u s now c o n s i d e r an a r b i t r a r y FD Y->B i n t h e g i v e n s e t F. W e w i l l show t h a t t h e FD i s v a l i d i n R . I f Y is a s u b s e t of 2 t h e n X->y can be d e r i v e d f r o m F by t h e r u l e s F D l - F D 3 ; since the FD Y-pB i s i n F , w e o b t a i n by F D 3 ( t r a n s - ' i t i - i i t y ) t h a t B i s i n 2 and Y->B is v a l i d i n R. I f Y i n t e r s e c t s U-? t h e n , by t h e two c l a i m s above, Y->B i s v a l i d i n R . It follows t h a t a l l t h e F D ' s i n F a r e v a l i d i n R and t h e r e f o r e a l l F D ' s and MVD's i n Ff a r e v a l i d i n R .
Now l e t u s c o n s i d e r t h e g i v e n F D f:X->A. S i n c e i t c a n n o t be d e r i v e d from F k y u s i n g t h e r u l e s F D l - F D 3 , A belongs t o U-X. I t follows from ( b ) above t h a t f i s n o t v a l i d i n R and t h e r e f o r e t h a t f i s n o t i n F+. 3 I n many p a p e r s d e a l i n g w i t h F D ' S ( e . g . [ A r m , B e r n l , B e r n 2 , F a g l ] ) , t h e c l o s u r e of a set of F D ' s i s defined t o b e t h e s e t of a l l F D ' s d e r i v a b l e from i t by u s i n g t h e
W e now p r e s e n t t h e completeness res u l t for MVD'S.
Theorem 4 ( S t r o n g completeness theorem for M V D ' s ) : The r u l e s M V D O , MVD1, MVD2 MvD3 a r e complete f o r t h e f a m i l y o f M W ' s . )
,
Proof:
L e t G be a g i v e n s e t of MVD's.
W e want t o show t h a t t h e r e e x i s t s a r e -
59
l a t i o n i n which a l l MVD's d e r i v a b l e from G by t h e u s e of t h e r u l e s MVDO-MVD~ are v a l i d Let R and i n which no o t h e r MVD i s v a l i d . be t h e r e l a t i o n c o n s t r u c t e d i n t h e p r o o f o f Theorem 2 where G i s t h e g i v e n set of MVD'S and F i s t h e empty set of FD's. A s w e showed i n t h e p r o o f o f Theorem 2 , a l l MVD's d e r i v a b l e from t h e s e t F U G (which i s e q u a l t o G , s i n c e F i s empty) by u s i n g FD-MVDI and t h e r u l e s FDl-FD3, MYDO-MVD~, FD-MVD2 a r e v a l i d i n R and no o t h e r MVD's a r e v a l i d i n R. Now, t h e o n l y way t o d e r i v e new MVD's i n Gf w i t h o u t u s i n g t h e r u l e s MVDO-MVD3 i s by a p p l y i n g r u l e FD-MVD1 t o some FD's. However, s i n c e F i s t h e empty s e t , i n i t i a l l y w e h a v e no FD's. The o n l y way t o g e n e r a t e new FD's when no FD's a r e g i v e n i s t o u s e r u l e FD1 which g e n e r a t e s all t h e " r e f l e x i v e " FD's ( o f t h e form X->Y where Y is a subset of X ) . I t i s e a s y t o see t h a t r u l e FD-MVD2 c a n n o t be a p p l i e d when o n l y r e f l e x i v e F D ' s a r e g i v e n and t h a t t h e o t h e r FD-generating r u l e s (FD2, FD3) cann o t g e n e r a t e n o n r e f l e x i v e FD's from ref l e x i v e FD's. Thus, t h e c l o s u r e of G cont 3 i n s o n l y r e f l e x i v e FD's and a p p l y i n g r u l e F'D-MVD~ w e g e t o n l y t h e r e f l e x i v e W ' s . i l \ w e v e r , t h e s e a r e also g e n e r a t e d b y r u l e NVDl. To c o n c l u d e , a l l ~ I V D ' Sv a l i d i n R a r e g e n e r a t e d from G by t h e r u l e s MVDOm3 a s w a s t o be shown. Ll
'.
Note t h a t now w e a l s o have a c h a r a c t i z a t i o n o f a l l FD's i n t h e c l o s u r e o f a - v e n s e t G o f MVD's, namely, e v e r y FD 1 G' i s a r e f l e x i v e FD. I t follows t h a t inhen o n e i s i n t e r e s t e d i n MVD'S, no amb i g u i t y c a n a r i s e i f t h e c l o s u r e of a g i v e n s e t o f t h e W I D ' S IS d e f i n e d t o be t h e s e t o f MVD's d e r i v a b l e from i t by u s i n g the r u l e s MVDO-MVD~. 8
6.
CONCLUSION
I n t h i s p a p e r w e have i n v e s t i g a t e d t h e i n f e r e n c e r u l e s f o r f u n c t i o n a l and m u l t i v a l u e d d e p e n d e n c i e s i n a d a t a b a s e rel a t i o n . A s e t o f r u l e s was p r e s e n t e d and shown t o b e c o m p l e t e f o r t h e f a m i l y of f u n c t i o n a l and m u l t i v a l u e d d e p e n d e n c i e s . I t was a l s o shown t h a t t h e s u b s e t o f r u l e s t h a t apply t o rnultivalued dependencies is a c o m p l e t e s e t of r u l e s f o r t h e s e dependencies. Thus, t h e r u l e s p r e s e n t e d h e r e a r e s u f f i c i e n t for t h e a n a l y s i s of t h e p r o p e r t i e s o f f u n c t i o n a l and m u l t i v a l u e d dependencies.
dependencies. S p e c i f i c a l l y , i t was shown t h a t f o r e a c h r u l e f o r f u n c t i o n a l depende n c i e s , t h e same r u l e o r a s i m i l a r r u l e is v a l i d for m u l t i v a l u e d d e p e n d e n c i e s . T h e r e i s , however, an a d d i t i o n a l r u l e (complementation) f o r multivalued dependencies t h a t h a s no p a r a l l e l among t h e rules for f u n c t i o n a l d e p e n d e n c i e s . Of p a r t i c u l a r i m p o r t a n c e a r e t h e r u l e s f o r t h e manipul a t i o n o f r i g h t s i d e s of d e p e n d e n c i e s t h e Union and t h e Decomposition r u l e s . These two r u l e s w e r e a l r e a d y known t o be very u s e f u l f o r t h e manipulation of funct i o n a l d e p e n d e n c i e s . While t h e Decompo s i t i o n r u l e f o r multivaluea dependencies is s l i g h t l y less g e n e r a l , t h e s e r u l e s a r e s t i l l very u s e f u l f o r t h e manipulation o f these dependencies since they allow u s t o p e r f o r m Boolean o p e r a t i o n s on t h e r i g h t s i d e s of dependencies. I t turns out that a c t u a l l y , i n many c a s e s where t h e s e r u l e s a r e u s e d t o m a n i p u l a t e f u n c t i o n a l depende n c i e s , Boolean o p e r a t i o n s a r e a l l t h a t i s needed. Therefore, s i m i l a r manipulations can be a p p lie d t o multivalued dependencies. An example o f t h e u s e o f t h e s e r u l e s i s o u r p r o o f of Theorem 1. Another example is given i n [Beerz].
-
W e c o n c l u d e b y m e n t i o n i n g some probl e m s t h a t m e r i t further research. Now t h a t w e u n d e r s t a n d t h e p r o p e r t i e s of dependencies i n a r e l a t i o n , w e should t r y t o c l a r i f y t h e i n f l u e n c e o f such depende n c i e s on t h e s t r u c t u r e o f r e l a t i o n a l schemas. Some work i n t h i s d i r e c t i o n h a s a l r e a d y been done ( [ B e r n l , B e r r . 2 , B e e r 2 , F a g z ] ) . However, a g e n e r a l t h e o r y t h a t t i e s t o g e t h e r d e p e n d e n c i e s , r e l a t i o n s and o p e r a t i o n s on r e l a t i o n s i s s t i l l l a c k i n g . A s p e c i f i c problem i n t h i s d i r e c t i o n i s \ t o i n v e s t i g a t e what h a p p e n s t o d e p e n d e n c i e s when r e l a t i o n s a r e j o i n e d . W e hope that t h e r e s u l t s presented i n t h i s paper w i l l serve as a b a s i s for attacking these problems.
REFERENCES [Arm] Armstrong, W. W . , "Dependency S t r u c t u r e s o f Data Base. R e l a t i o n s h i p s " , -. of I F I P ' 7 4 , N o r t h H o l l a n d , 1 9 7 4 , pp. -580-583.
[ B e e r l ] B e e r i , C . and B e r n s t e i n , P . A . , "On t h e A l g o r i t h m i c P r o p e r t i e s o f F u n c t i o n a l Dependencies", i n p r e p a r a t i o n . [Beer21 Beeri, C.
,
" M u l t i v a l u e d Dependenc-
i e s and t h e C o n s t r u c t i o n of R e l a t i o n a l
I n a d d i t i o n , w e have shown a n a n a l o g y between t h e w e l l known r u l e s for f u n c t i o n a l d e p e n d e n c i e s and t h e r u l - e s f o r mu1 t i v a l u e d
-t
Schemas"
, in preparation.
[Bern11 B e r n s t e i n , P . A . ,
"Synthesizing
!
I j
I I i
I
I I
T h i r d Normal Form R e l a t i o n s from F u n c t i o n a l D e p e n d e n c i e s " , ACM T r a n s . % D a t a b a s e Systems 1, N o . 4 ( D e c . 1976), pp. 277-298. [ g e r n 2 ] B e r n s t e i n , P . A . and B e e r i , C . , "Ar, A l g o r i t h m i c Approach t o N o r m a l i z a t i o n o f R ~ l a t i o n a lD a t a b a s e Schemas", T e c h n i c a l Yeport CSRG-73, Computer S y s t e m s R e s e a r c h GroUp, U n i v e r s i t y of T o r o n t o , S e p t . 1976. [ C o d d l l Codd, E . F . , " A R e l a t i o n a l Model f o r L a r g e S h a r e d D a t a B a s e s " , CACM 1 3 , N o . 6 ( J u n e 1970) , p p . 377-387. [CoddZ] Codd, E . F . , " F u r t h e r N o r m a l i z a t i o n of t h e D a t a Base R e l a t i o n a l Model", i n D a t a Base S y s t e m s ( C o u r a n t Computer S c i e n c e Symposium 6 , R . R u s t i n e d . ) , P r e n t i c e - H a l l , 1972, pp. 33-64. IFagl] Fagin, R . , " F u n c t i o n a l Dependencies i n 3 R e l a t i o n a l D a t a b a s e and P r o p o s i t i o n a l Logic", Revision of I B M Research Report R J 1 7 7 6 , San J o s e . C a l i f o r n i a , 1976.
..
[Fag21 F a g i n , R . , " M u l t i v a l u e d Depende n c i e s and a N e w Normal Form for R e l a t i o n a l D a t a b a s e s " , t o a p p e a r i n ACM Tran- on D a t a b a s e S y s t e m s . rFag31 F a g i n , R . , "The D e c o m p o s i t i o n v s . t h e :y n t h e t i c Approach t o R e l a t i o n a l D a t a n a s e Design", IBN R e s e a r c h R e p o r t F(J1976, S a n J o s e , C R l i f o r n i a , 1977. r R i s s 1 R i s s a n e n , J . , " I n d e p e n d e n t Componc ~ t o s f R e l a t i o n s " , I B M Research Re+ t RJ1899, San J o s e , C a l i f o r n i a 1977. .
Schmid, H . A . and Swenson, J. R . , "On t h e S e m a n t i c s o f t h e R e l a t i o n a l D a t a M o d e l " , E .O f ACM SIGMOD C o n f . , San J o s e , C a l i f o r n i a , May 1975, pp. 211-223. ISS:
" A n a l y s i s and D e s i g n [Zan] Z a n i o l o , C . of R e l a t i o n a l Schemata f o r D a t a b a s e S y s t e m s " , ( P h . D t h e s i s ) , Computer S c i e n c e D e p a r t m e n t , UCLA, T e c h n i c a l R e p o r t UCLA-ENG-7769, J u l y 1 9 7 6 .
.
!.
61