
Algebraic Equivalences Among Nested Relational Expressions

Hong-Cheu Liu, Kotagiri Ramamohanarao

Technical Report 94/4

Department of Computer Science, The University of Melbourne, Parkville 3052, Australia

March 1994

Abstract: Algebraic optimization is both theoretically and practically important for query processing in (nested) relational databases. In this paper, we address this issue and investigate algebraic properties of the nested relational operators. We also outline a heuristic optimization algorithm for nested relational expressions that adopts the algebraic transformation rules developed in this paper and in previous related work.


1 Introduction

In the last decade, much research has been carried out on nested relations and complex objects. By relaxing the 1NF assumption, the resulting nested relational model (sometimes called NF²) can support new applications such as office automation, multimedia systems, scientific data processing systems, engineering design systems, and so forth. To model these applications we use hierarchical structures rather than flat tables to represent complex objects. The nested relational model is the basis for bridging the relational model and other extended models (e.g., the object-oriented model). It provides the structural core of object-oriented databases, and plays the role of an intermediate stage in an evolutionary path from the relational model to the object-oriented data model and query language [15]. As noted in [4], in terms of the object-oriented data model, we can view a nested relation as a class and a tuple of the relation as an instance of the class. By transforming object queries into an object algebra in the spirit of (nested) relational algebra, the power of optimization techniques can be applied to query processing in object-oriented systems [4, 15].

However, query optimization and processing for the nested relational model has not yet been treated satisfactorily in the database community. Algebraic optimization techniques for relational database languages, though well understood and quite successful, cannot be fully applied to the nested relational model. Therefore, algebraic properties of nested relational operators are both theoretically and practically important. In this paper, we consider this problem and derive a series of algebraic equivalences.

The join operation is one of the most expensive and critical operations in nested relational query processing. In this paper, we review the P-join operator, which has been proposed in [10, 11]. The P-join operator is very powerful and allows queries to be expressed more succinctly and optimized more easily. We then investigate some algebraic properties of the P-join operator and the extended relational operators [13], which can be used for query optimization in nested relational databases. Finally, we outline the steps of an algorithm that transforms an initial query tree into an optimized tree that is more efficient to execute. This algorithm extends the one developed for 1NF relational databases and forms the basis for evaluating queries in nested relational databases. Our main technical contributions are the derivation of a series of algebraic equivalences and the heuristic optimization algorithm. The paper concludes with some suggestions for further research.


2 The Nested Relational Model

In this section, we briefly review some well-known concepts and present the basic definitions of NF² relational data structures and NF² relational algebra used throughout this paper. A detailed formalism can be found in [12, 13, 16].

2.1 Nested Relational Schemes

We assume familiarity with 1NF relations and relational algebra on these relations.

Definition 2.1 A relation scheme $R$ is recursively defined by the following rules: (1) if $\{A_1, \ldots, A_k\} \subseteq U$ and $A_1, \ldots, A_k$ are atomic-valued attributes, then $R = (A_1, \ldots, A_k)$ is a relation scheme. (2) if $\{A_1, \ldots, A_k\} \subseteq U$, $A_1, \ldots, A_k$ are atomic-valued attributes, and $R_1, \ldots, R_n$ are relation schemes, then $R = (A_1, \ldots, A_k, R_1, \ldots, R_n)$ is a relation scheme. □

The atomic-valued attributes $A_1, \ldots, A_k$ are called zero-order attributes; $R_1, \ldots, R_n$ are called relation-valued attributes or higher-order attributes. The set of top-level attributes in a relation scheme (or a subrelation scheme) $R$ is denoted by $E_R$. The projection of relation $r$ onto attributes $N$ is denoted $r[N]$, and similarly, the projection of tuple $t \in r$ onto attributes $N$ is denoted $t[N]$. A relation structure $\mathcal{R}$ consists of a relation scheme $R$ and an instance $r$ defined on $R$, and is denoted $\langle R, r \rangle$.

The restructuring operators nest and unnest are used, respectively, to add one level of nesting to a relation and to flatten a relation by one level. Nest ($\nu$) takes a relation structure $\mathcal{R} = \langle R, r \rangle$ and groups together tuples with common values in some subset of the attributes in $R$. Unnest ($\mu$), the inverse of the nest operator, takes a relation structure nested on some set of attributes and ungroups it by one level [16].
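The behaviour of nest and unnest described above can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: a relation is a list of dicts, a relation-valued attribute is a frozenset of frozen tuples, and the names `freeze`, `nest` and `unnest` are hypothetical helpers, not notation from [16].

```python
from collections import defaultdict

def freeze(t):
    """Hashable, order-independent form of a dict tuple (assumed encoding)."""
    return frozenset(t.items())

def nest(rel, attrs, name):
    """Group tuples that agree outside `attrs`; collect `attrs` under `name`."""
    groups = defaultdict(set)
    for t in rel:
        rest = freeze({a: v for a, v in t.items() if a not in attrs})
        groups[rest].add(freeze({a: t[a] for a in attrs}))
    return [{**dict(rest), name: frozenset(sub)} for rest, sub in groups.items()]

def unnest(rel, name):
    """Flatten the relation-valued attribute `name` by one level."""
    out = []
    for t in rel:
        rest = {a: v for a, v in t.items() if a != name}
        for sub in t[name]:
            out.append({**rest, **dict(sub)})
    return out

# Hypothetical flat relation over attributes A and B.
flat = [{"A": 1, "B": "x"}, {"A": 1, "B": "y"}, {"A": 2, "B": "x"}]
nested = nest(flat, ["B"], "X")     # one tuple per distinct value of A
restored = unnest(nested, "X")      # same tuples as `flat`, possibly reordered
```

In this sketch, unnesting right after nesting on the same attributes gives back the original tuples, which matches the description of unnest as the inverse of nest.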

2.2 Nested Scheme Tree

A nested relation scheme is structured as a rooted tree in which the nodes are labeled with elements of $U$. Such a tree is called a scheme tree. The set of nodes of scheme tree $T$ is denoted by $node(T)$. The set of leaf nodes of scheme tree $T$ is denoted by $leaf(T)$. Figure 1 shows a relation structure and the corresponding scheme tree. In order to define the selection operator in the next section, we introduce the concept of selection-comparable nodes.

Definition 2.2 For all nodes $N_a, N_b \in node(T)$, where $N_a \neq N_b$, if node $N_a$ is a child of an ancestor of a node $N_b$, then $N_a$ and $N_b$ are called selection-comparable nodes. We denote it by $N_a \rightarrow N_b$. □

For example, in Figure 1(b), $B \rightarrow D$, $A \rightarrow C$, but $C$ and $E$ are not selection-comparable nodes.

[Figure 1: An example of nested relation and nested scheme tree. (a) nested relation s; (b) nested scheme tree $T_s$, with root s and children A, B, X, Y, where X has children C and Z, Z has child D, and Y has children E and F.]

Definition 2.3 The expression of a node $N$ in the scheme tree $T_R$ is denoted by $R_i\,P^i$, where $R_i \in E_R$; $P^i = \emptyset$ if $R_i \in leaf(T_R)$, or $P^i$ = the expression of node $N$ in subtree $T_{R_i}$. □

For example, the expression of node $D$ in the scheme tree $T_s$ in Figure 1 is $X\,Z\,D$.
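As an illustration of Definitions 2.2 and 2.3, the following Python sketch encodes the scheme tree of Figure 1(b) and computes node expressions and the selection-comparable test. The child-to-parent dictionary and the function names are assumptions made for this sketch only.

```python
# Scheme tree of Figure 1(b) as child -> parent; "s" is the root (assumed encoding).
PARENT = {"A": "s", "B": "s", "X": "s", "Y": "s",
          "C": "X", "Z": "X", "D": "Z", "E": "Y", "F": "Y"}

def ancestors(node):
    """Proper ancestors of `node`, nearest first, ending with the root."""
    out = []
    while node in PARENT:
        node = PARENT[node]
        out.append(node)
    return out

def expression(node):
    """Attribute names from a child of the root down to `node` (Definition 2.3)."""
    path = [node] + ancestors(node)[:-1]   # drop the root itself
    return list(reversed(path))

def selection_comparable(na, nb):
    """True if `na` is a child of an ancestor of `nb` (Definition 2.2)."""
    return na != nb and PARENT[na] in ancestors(nb)

print(expression("D"))                  # path to D below the root
print(selection_comparable("B", "D"))   # B is a child of the root, an ancestor of D
print(selection_comparable("C", "E"))   # neither parent is an ancestor of the other
```

The test is directional, like the arrow notation in Definition 2.2: it asks whether the parent of the first node lies on the path from the root to the second node.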

2.3 Selection Operator and Projection Operator We need more sophisticated selection and projection operators which can work on nested relations.

Selection Operator

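Definition 2.4 Let $r$ be a relation with relation scheme $R$. Let $\theta = e_1 \;\mathit{op}\; e_2$ be a selection condition, where $\mathit{op}$ is a comparison operator; let $c$ be a constant which belongs to $C$, where $C$ is the set of all constants including relation-valued constants. Then $\sigma_\theta(r)$ is defined as follows:

(1) $\sigma_\theta(r) = \{ t \mid (t \in r) \wedge (\theta(t) = \mathrm{true}) \}$ if $\theta$ is a condition on $E_R$, i.e., $e_i \in (E_R \cup C)$.

(2) $\sigma_\theta(r) = \{ t \mid (\exists t_r \in r),\ (t[E_R - R_i] = t_r[E_R - R_i]) \wedge (t[R_i] = \sigma_{\theta'}(t_r[R_i]) \neq \emptyset) \}$, where

$\theta' = P^i \;\mathit{op}\; c$ if $\theta = R_i\,P^i \;\mathit{op}\; c$,
$\theta' = P^i \;\mathit{op}\; t_r[R_j]$ if $\theta = R_i\,P^i \;\mathit{op}\; R_j$,
$\theta' = P^{i1} \;\mathit{op}\; P^{i2}$ if $\theta = R_i\,P^{i1} \;\mathit{op}\; R_i\,P^{i2}$,

$R_i, R_j \in E_R$; $P^{i1}$, $P^{i2}$ are the expressions for selection-comparable nodes in subtree $T_{R_i}$. □

Note that comparison operators include set comparison and set membership operators, in addition to the usual arithmetic operators. Figure 2(b) shows an example of a recursive selection, $\sigma_{B = X\,Z\,D}$, on the relation s.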

Projection Operator

Definition 2.5 Let $r$ be a relation with relation scheme $R$. The project-list $L$ of $R$ has the form (1) $L$ is empty, or (2) $L = (R_1 L_1, \ldots, R_n L_n)$, where $R_i \in E_R$ and $L_i$ is a project-list of $R_i$. Then $\pi_L(r)$ is defined as follows:

(1) $\pi_L(r) = r$ if $L$ is empty.

(2) $\pi_{(R_1 L_1, \ldots, R_n L_n)}(r) = \{ t \mid (\exists t_r \in r) \wedge (t[R_i] = \pi_{L_i}(t_r[R_i])),\ i \in \{1, \ldots, n\} \}$ □

For example, Figure 2(c) shows a recursive projection, $\pi_{(A, X(Z(D)))}$, on s.

[Figure 2: Examples of recursive selection and projection on s. Panels: (a) nested relation s; (b) $\sigma_{B = X.Z.D}(s)$; (c) $\pi_{(A, X(Z(D)))}(s)$.]
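The recursive projection of Definition 2.5 can be sketched in a few lines of Python. This is only an illustration under assumed encodings: a relation is a list of dicts, a relation-valued attribute is a nested list of dicts, and a project-list is a dict that maps each kept attribute to its own project-list (`None` or `{}` for the empty list). The sample relation is hypothetical and is not the data of Figure 2.

```python
def project(rel, plist):
    """Recursive projection; an empty project-list keeps the relation unchanged."""
    if not plist:
        return rel
    out = []
    for t in rel:
        row = {}
        for attr, sub in plist.items():
            value = t[attr]
            # Relation-valued attributes are lists of dicts: recurse into them.
            row[attr] = project(value, sub) if isinstance(value, list) else value
        if row not in out:          # a relation is a set: drop duplicate tuples
            out.append(row)
    return out

# Hypothetical instance over the scheme s(A, B, X(C, Z(D)), Y(E, F)).
s = [{"A": 1, "B": 1,
      "X": [{"C": 5, "Z": [{"D": 1}, {"D": 2}]}],
      "Y": [{"E": 2, "F": 2}]}]

# Project-list (A, X(Z(D))) in the assumed dict encoding.
result = project(s, {"A": None, "X": {"Z": {"D": None}}})
```

The duplicate check compares nested lists in order, which is enough for a sketch but would need a canonical ordering in a real implementation.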

2.4 Extended Natural Join and Extended Projection Operators

We state the formal definitions of extended intersection, extended natural join and extended projection from [13].

Definition 2.6 Let $r_1$ and $r_2$ be two nested relations with the same scheme $R$. Let $X$ range over the zero-order names in $E_R$ and $Y$ range over the higher-order names in $E_R$. Then,

$r_1 \cap^e r_2 = \{ t \mid \exists t_1 \in r_1 \wedge \exists t_2 \in r_2 : (\forall X, Y \in E_R : t[X] = t_1[X] = t_2[X],\ t[Y] = (t_1[Y] \cap^e t_2[Y]),\ t[Y] \neq \emptyset) \}$ □

Definition 2.7 Let $X$ be the higher-order attributes in $E_{R_1} \cap E_{R_2}$, $M = E_{R_1} - X$, and $N = E_{R_2} - X$. Then the extended natural join is $\mathcal{R}_1 \bowtie^e \mathcal{R}_2 = \langle R, r \rangle$ where (1) $R = (M, X, N)$, and (2) $r = \{ t \mid \exists u \in r_1, v \in r_2 : t[M] = u[M],\ t[N] = v[N],\ t[X] = (u[X] \cap^e v[X]),\ t[X] \neq \emptyset \}$ □

For example, Figure 3(c) shows an extended natural join on $r_1$, $r_2$.

Definition 2.8 Let $r$ be a relation with relation schema $R$ and $L$ be a project-list of $R$. The extended projection operator is defined as:

$\pi^e_L(r) = \bigcup^e_{t \in \pi_L(r)} (t)$ □

If $L = R$, we simply shorten $\pi^e_L(r)$ to $\pi^e(r)$. Figure 3(d) shows an example of extended projection.

[Figure 3: Examples of $\bowtie^e$ and $\pi^e$. Panels: (a) $r_1$; (b) $r_2$; (c) $r_1 \bowtie^e r_2$; (d) $\pi^e_{E,B,X}(r_1 \bowtie^e r_2)$.]
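To make Definitions 2.6 and 2.7 concrete, here is a small Python sketch of extended intersection and extended natural join. It assumes relations are lists of dicts with relation-valued attributes stored as nested lists of dicts; the function names and the sample data are hypothetical and do not reproduce Figure 3.

```python
def ext_intersect_tuples(t1, t2):
    """Combine two tuples on the same scheme, or return None if they do not match."""
    t = {}
    for attr, v1 in t1.items():
        v2 = t2[attr]
        if isinstance(v1, list):            # higher-order: intersect recursively
            sub = ext_intersect(v1, v2)
            if not sub:                     # an empty subrelation rules the pair out
                return None
            t[attr] = sub
        elif v1 == v2:                      # zero-order: values must agree
            t[attr] = v1
        else:
            return None
    return t

def ext_intersect(r1, r2):
    """Extended intersection of two relations on the same scheme."""
    out = []
    for t1 in r1:
        for t2 in r2:
            t = ext_intersect_tuples(t1, t2)
            if t is not None and t not in out:
                out.append(t)
    return out

def ext_natural_join(r1, r2):
    """Extended natural join: common higher-order attributes are intersected."""
    out = []
    for u in r1:
        for v in r2:
            t, ok = dict(u), True
            for attr, val in v.items():
                if attr not in u:
                    t[attr] = val
                elif isinstance(val, list):
                    t[attr] = ext_intersect(u[attr], val)
                    ok = bool(t[attr])
                else:
                    ok = (u[attr] == val)
                if not ok:
                    break
            if ok and t not in out:
                out.append(t)
    return out

# Hypothetical relations r1(A, B, X(C, D)) and r2(E, B, X(C, D)).
r1 = [{"A": "a1", "B": "b1",
       "X": [{"C": "c1", "D": "d1"}, {"C": "c2", "D": "d2"}]}]
r2 = [{"E": "e1", "B": "b1",
       "X": [{"C": "c1", "D": "d1"}, {"C": "c3", "D": "d3"}]},
      {"E": "e2", "B": "b2", "X": [{"C": "c2", "D": "d2"}]}]
joined = ext_natural_join(r1, r2)
```

Only the first tuple of `r2` joins with the tuple of `r1` here: the second one disagrees on the common zero-order attribute B, which the sketch treats as a mismatch because both M and N in Definition 2.7 contain B.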

3 Path-dependent Join

In this paper we consider the P-join operator when investigating the algebraic properties of nested relational operators. It is more general than other join operators proposed in the context of the nested relational model. We briefly review some auxiliary concepts and describe the P-join operator, which has been presented in [10, 11].

3.1 Extended Cartesian Product and Path-dependent Cartesian Product

The idea behind the extended Cartesian product $\times^e$ is that we form the Cartesian product by combining two relational operands with common higher-order attributes not only at the top level but also at the subscheme levels. Let "$\circ$" denote the concatenation operator for tuples as well as relation schemes. For example, let two tuples be $t_1 = (1, 2)$, $t_2 = (2, 4)$; then $t_1 \circ t_2 = (1, 2, 2, 4)$. Let $E_R = (A, X)$, $E_Q = (B, X)$ be the schemes of relations $r$, $q$ respectively; then $E_R \circ E_Q = (A, X, B, X)$. We distinguish between the two $X$s with suffixes $X_r$, $X_q$.

Definition 3.1 Let $r$ and $q$ be two nested relations, with schemes $R$ and $Q$ respectively. Then the extended Cartesian product, $\times^e$, is defined as follows.

(1) If $r$ and $q$ contain no common higher-order attributes, $r \times^e q = r \times q$.

(2) If $r$ and $q$ contain common higher-order attributes, $r \times^e q = \{ t \mid (\exists t_r \in r, \exists t_q \in q),\ t[(E_R - X) \circ (E_Q - X)] = t_r[E_R - X] \circ t_q[E_Q - X] \wedge t[X_i] = t_r[X_i] \times^e t_q[X_i],\ \forall\, 1 \le i \le k \}$, where $X = \{X_1, \ldots, X_k\}$ are the common higher-order attributes. □

Definition 3.2 Let $R$ be a relation scheme, and $T$ be the scheme tree of $R$. A path $P_r = (N_1 \cdots N_k)$ is a join-path of $R$ if $N_1$ is a child of $root(T)$ and $N_k$ is a non-leaf node of $T$.

Definition 3.3 Let $r$ and $q$ be two relations with schemes $R$ and $Q$ respectively. Then the path-dependent Cartesian product, $\times^p$, is defined as follows.

(1) If $P_r = P_q = \emptyset$, $r(P_r) \times^p q(P_q) = r \times^e q$.

(2) If $P_r = R_i\,P_{r_i}$ and $P_q = Q_j\,P_{q_j}$ are two join-paths of $R$ and $Q$ respectively, $r(P_r) \times^p q(P_q)$ =

(a) $\{ t \mid (\exists t_r \in r),\ (t[E_R - \{R_i\}] = t_r[E_R - \{R_i\}]) \wedge (t[R_i] = t_r[R_i](P_{r_i}) \times^p q(P_q)) \}$, if length of $P_r$ > length of $P_q$.

(b) $\{ t \mid (\exists t_r \in r, t_q \in q),\ (t[(E_R - \{R_i\}) \circ (E_Q - \{Q_j\})] = t_r[E_R - \{R_i\}] \times^e t_q[E_Q - \{Q_j\}]) \wedge (t[R_i Q_j] = t_r[R_i](P_{r_i}) \times^p t_q[Q_j](P_{q_j})) \}$, if length of $P_r$ = length of $P_q$.

(c) $q(P_q) \times^p r(P_r)$, if length of $P_r$ < length of $P_q$. □

Example 3.1 Let $r$ and $q$ be the two relations of Figure 4(a) and (b) respectively. Figure 4(c) shows the path-dependent extended Cartesian product of $r(X\,Y)$ and $q(Z)$.

[Figure 4: An example of path-dependent Cartesian product. Panels: (a) r; (b) q; (c) $r(X\,Y) \times^p q(Z)$.]

3.2 P-join Operator with Single Join-path

The P-join operator with single join-path has been presented in [10]. In this section, we review the basic definitions and concepts regarding this new join operator.

Definition 3.4 We use $r(N_k)$ to denote $r(P_r)$ if the join-path $P_r = (N_1 \cdots N_k)$. $N_k$ is called the path-determining node of $P_r$.

Suppose $N_R$ and $N_Q$ are the path-determining nodes of $P_r$ and $P_q$ respectively; then the scheme of relation $r(N_R) \times^p q(N_Q)$ is denoted by $R(N_R) \times^p Q(N_Q)$.

Definition 3.5 The predicate $\Phi(T_R)$ is true when the condition $\{ \forall N_a, N_b \in leaf(T_R)$: if $N_a$ and $N_b$ have the same name then $(N_a \rightarrow N_b) \vee (N_b \rightarrow N_a) \}$ holds.

Definition 3.6 Let $r$ and $q$ be two relations with two join-paths $P_r$, $P_q$ respectively. In the scheme $R(N_R) \times^p Q(N_Q)$, if (1) $A^1, \ldots, A^k$ are the names used for both schemes $R$ and $Q$, and $A^i_r$, $A^i_q$ have the same parent node, $1 \le i \le k$, (2) $B^1, \ldots, B^l$ are the names used for both $R$ and $Q$ and $B^j_r$, $B^j_q$ have different parent nodes, $1 \le j \le l$, and (3) $\Phi(T_{R(N_R) \times^p Q(N_Q)})$, then

$r(N_R) \bowtie^p q(N_Q) = \sigma_{\theta_B}(\pi_X[\sigma_{\theta_A}(r(N_R) \times^p q(N_Q))])$,

where $N_R$, $N_Q$ are the path-determining nodes of paths $P_r$, $P_q$ respectively, and

$\theta_A$ is $(P^1 A^1_r = P^1 A^1_q) \wedge (P^2 A^2_r = P^2 A^2_q) \wedge \cdots \wedge (P^k A^k_r = P^k A^k_q)$, where $P^i$ is the expression of the parent node of $A^i$,

$X$ is the scheme of $r(N_R) \times^p q(N_Q)$ except the components $P^1 A^1_q, \ldots, P^k A^k_q$,

$\theta_B$ is $(P^1_1 B^1_r = P^1_2 B^1_q) \wedge (P^2_1 B^2_r = P^2_2 B^2_q) \wedge \cdots \wedge (P^l_1 B^l_r = P^l_2 B^l_q)$, where $P^i_1$, $P^i_2$ are the expressions of the parent nodes of $B^i_r$, $B^i_q$ respectively. □

Example 3.2 Figure 5(c) shows an example of natural P-join between $r$ and $q$, which are given in Figure 5(a), (b), where

$r(Y) \bowtie^p q(Z) = \sigma_{X\,B_r = B_q}(\pi_{(A,\ X(B_r, C),\ B_q,\ YZ(D_r, E, F))}[\sigma_{YZ\,D_r = YZ\,D_q}(r(Y) \times^p q(Z))])$

[Figure 5: An example of natural P-join between r and q. Panels: (a) r; (b) q; (c) $r(Y) \bowtie^p q(Z)$.]

3.3 Decomposition P-join Operator

Because the same attribute names in two join relations may appear in multiple subtrees, we can extend P-join to multiple join-paths, which covers this more general situation. Of course, the choice of join-paths depends heavily on the relation scheme structure and the queries. The formal definition of the decomposition P-join follows.

Definition 3.7 Let $(R_1, \ldots, R_n)$ and $(Q_1, \ldots, Q_n)$ be lossless-join decompositions on schemes $R$ and $Q$ respectively. The intersection of all subschemes in each decomposition contains at least one common zero-order attribute, i.e., $E_R = \bigcup_{i=1}^{n} R_i$, $E_Q = \bigcup_{i=1}^{n} Q_i$, and $\bigcap_{i=1}^{n} R_i = A_r$, $\bigcap_{i=1}^{n} Q_i = A_q$, where $A_r$, $A_q$ are zero-order attributes of $R$, $Q$ respectively. The decomposition P-join of two relations $r$ and $q$ is defined as follows: $\forall r \in I_R$, $\forall q \in I_Q$, let $L_r = (N^r_1, \ldots, N^r_n)$ and $L_q = (N^q_1, \ldots, N^q_n)$ be lists of join-paths of $r$ and $q$ respectively. If the predicate $\Phi$ of Definition 3.5 holds for $T_{R_i(N^r_i) \times^p Q_i(N^q_i)}$, $\forall i \in \{1, \ldots, n\}$, then

$r(L_r) \bowtie^p q(L_q) = \sigma_\theta[\bowtie(J_1, \ldots, J_n)]$,

where $J_i = r[R_i](N^r_i) \bowtie^p q[Q_i](N^q_i)$, $1 \le i \le n$, $\bowtie$ is the standard natural join, and $\theta$ is the predicate, with "=" comparison operator, on those selection-comparable nodes with the same attribute names in the scheme tree of $\bowtie(J_1, \ldots, J_n)$.

Example 3.3 Figure 6(c) shows an example of decomposition P-join between $r$ and $q$, shown in Figure 6(a), (b).

[Figure 6: An example of decomposition P-join between r and q. Panels: (a) r; (b) q; (c) $r(Y, U) \bowtie^p q(Y', V)$.]

4 Equivalences of Algebraic Expressions

The essence of query optimization is to find an execution plan that minimizes a cost function. An optimization process involves two closely connected levels, classified as heuristic optimization and systematic cost estimation. The first level is based on heuristic rules for ordering the operations in a query execution strategy, to find an equivalent expression with improved expected performance. The second level uses a cost model based on system information to choose the execution plan with the lowest cost estimate. In this section we discuss heuristic rules for transforming algebraic expressions into equivalent ones. We first present the following theorem regarding the laws of commutativity and associativity of the extended natural join ($\bowtie^e$).

Theorem 1 (1) $r \bowtie^e s = s \bowtie^e r$ (2) $r_1 \bowtie^e (r_2 \bowtie^e r_3) = (r_1 \bowtie^e r_2) \bowtie^e r_3$

Proof: See Appendix.

Because the extended natural join ($\bowtie^e$) is applied at each level of the schema of P-join, the laws of commutativity and associativity are similarly valid for P-join. A detailed description of these results can be found in [10, 11]. The following four theorems and the related lemmas concern commuting a projection ($\pi$ or $\pi^e$) with join operators ($\bowtie^e$ or $\bowtie^p$). These equivalences are direct extensions of those of the relational model.

Theorem 2 If $M \subseteq E_R$ and $N \subseteq E_S$, then $\pi_{MN}(r \bowtie^e s) = \pi_M(r) \bowtie^e \pi_N(s)$ iff $M \cap N = E_R \cap E_S$.

Proof: ($\Leftarrow$) We show inclusion both ways under the given conditions.

$\subseteq$: For every $t \in$ LHS, there is a tuple $t' \in r \bowtie^e s$ such that $t = \pi_{MN}(t')$. Since $t' \in r \bowtie^e s$, there must be tuples $t_r \in r$, $t_s \in s$ such that $t' = t_r \bowtie^e t_s$. Because $M \subseteq E_R$, $N \subseteq E_S$ and $M \cap N = E_R \cap E_S$, we have $t = t'[MN] = (t_r \bowtie^e t_s)[MN] = t_r[M] \bowtie^e t_s[N] \in \pi_M(r) \bowtie^e \pi_N(s)$ = RHS.

$\supseteq$: For every $t \in$ RHS, there exist a tuple $\hat{t}_r \in \pi_M(r)$ and a tuple $\hat{t}_s \in \pi_N(s)$ such that $t = \hat{t}_r \bowtie^e \hat{t}_s$. Since $\hat{t}_r \in \pi_M(r)$, there is $t_r \in r$ such that $\hat{t}_r = t_r[M]$. Similarly, there is $t_s \in s$ such that $\hat{t}_s = t_s[N]$. By the assumption $M \cap N = E_R \cap E_S$, $t = \hat{t}_r \bowtie^e \hat{t}_s = t_r[M] \bowtie^e t_s[N] = (t_r \bowtie^e t_s)[MN] \in \pi_{MN}(r \bowtie^e s)$ = LHS.

($\Rightarrow$) Suppose $M \cap N \neq E_R \cap E_S$; then there exists an attribute $Q \in E_R \cap E_S$ such that $Q \notin M \cap N$. We prove that $t \in \pi_M(r) \bowtie^e \pi_N(s) \Rightarrow t \in \pi_{MN}(r \bowtie^e s)$ does not always hold. Since $t \in \pi_M(r) \bowtie^e \pi_N(s)$, there must be $\hat{t}_r \in \pi_M(r)$, $\hat{t}_s \in \pi_N(s)$ such that $t = \hat{t}_r \bowtie^e \hat{t}_s$. Let $t_r = \{ t' \mid t' \in r,\ t'[M] = \hat{t}_r \}$ and, similarly, $t_s = \{ t'' \mid t'' \in s,\ t''[N] = \hat{t}_s \}$. Assume $t_r[Q] \cap^e t_s[Q] = \emptyset$. Because $Q \in E_R \cap E_S$, the relation $r \bowtie^e s$ contains no tuple having $t$ as its $MN$ component, i.e., $t \notin \pi_{MN}(r \bowtie^e s)$. So $\pi_M(r) \bowtie^e \pi_N(s) \neq \pi_{MN}(r \bowtie^e s)$. We conclude that $(M \cap N \neq E_R \cap E_S) \Rightarrow$ ($\pi_{MN}(r \bowtie^e s) = \pi_M(r) \bowtie^e \pi_N(s)$ does not always hold).
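The role of the condition $M \cap N = E_R \cap E_S$ can be checked on a tiny flat instance. The sketch below is only an illustration: it assumes relations without higher-order attributes, for which Definition 2.7 with empty X reads as the ordinary natural join, and the helper names and data are hypothetical.

```python
def natural_join(r, s):
    """Ordinary natural join on flat relations given as lists of dicts."""
    out = []
    for tr in r:
        for ts in s:
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                t = {**tr, **ts}
                if t not in out:
                    out.append(t)
    return out

def project(rel, attrs):
    """Flat projection onto `attrs`, ignoring attributes a tuple does not have."""
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs if a in t}
        if row not in out:
            out.append(row)
    return out

def same(r1, r2):
    """Set equality of two relations stored as lists."""
    return all(t in r2 for t in r1) and all(t in r1 for t in r2)

r = [{"A": 1, "B": 1}, {"A": 2, "B": 2}]          # scheme (A, B)
s = [{"B": 1, "C": "x"}, {"B": 3, "C": "y"}]      # scheme (B, C)

# M = {A, B}, N = {B, C}: M and N share exactly the join attribute B.
keeps_b = same(project(natural_join(r, s), ["A", "B", "C"]),
               natural_join(project(r, ["A", "B"]), project(s, ["B", "C"])))

# M = {A}, N = {C}: B is projected away before the join.
drops_b = same(project(natural_join(r, s), ["A", "C"]),
               natural_join(project(r, ["A"]), project(s, ["C"])))
```

In the second case the right-hand side degenerates into a Cartesian product of the two projections, so it contains tuples that the left-hand side does not.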

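Lemma 3 $\pi^e(r_1 \bowtie^e r_2) = \pi^e(r_1) \bowtie^e \pi^e(r_2)$

Proof: See Appendix.

Lemma 4 $\pi^e_{MN}(r_1 \bowtie^e r_2) = \pi^e_M(r_1) \bowtie^e \pi^e_N(r_2)$ iff $M \cap N = E_{R_1} \cap E_{R_2}$.

Proof:
$\pi^e_{MN}(r_1 \bowtie^e r_2) = \bigcup^e_{t \in \pi_{MN}(r_1 \bowtie^e r_2)} (t)$
$= \bigcup^e_{t \in (\pi_M(r_1) \bowtie^e \pi_N(r_2))} (t)$ (by Theorem 2)
$= \pi^e(\pi_M(r_1) \bowtie^e \pi_N(r_2))$
$= \pi^e(\pi_M(r_1)) \bowtie^e \pi^e(\pi_N(r_2))$ (by Lemma 3)
$= \pi^e_M(r_1) \bowtie^e \pi^e_N(r_2)$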

If the project-list $L$ can be split into $L_1$ and $L_2$ such that they contain the attributes of $r$ and $s$ in $L$ respectively, and each contains all common attributes involved in the join, then we get the following results regarding commuting a projection ($\pi$ or $\pi^e$) with P-join.

Lemma 5 $\pi_L[r(N_r) \bowtie^p s(N_s)] = \pi_L[\pi_{L_1}(r)(N_r) \bowtie^p \pi_{L_2}(s)(N_s)]$, where $N_r$, $N_s$ are the path-determining nodes of $r$ and $s$ respectively.

Proof: At each level of the schema, P-join is defined in terms of $\bowtie^e$. We get the result by applying Theorem 2 recursively to each level of the schema of P-join, i.e., with $\theta_A$, $X$, $\theta_B$ as in Definition 3.6,

$\pi_L[r(N_r) \bowtie^p s(N_s)] = \pi_L[\sigma_{\theta_B}(\pi_{L'}(\pi_X \sigma_{\theta_A}(r(N_r) \times^p s(N_s))))]$
$= \pi_L[\sigma_{\theta'_B} \pi_{X'} \sigma_{\theta'_A}(\pi_{L_1}(r)(N_r) \times^p \pi_{L_2}(s)(N_s))]$ (by Theorem 2)
$= \pi_L[\pi_{L_1}(r)(N_r) \bowtie^p \pi_{L_2}(s)(N_s)]$

where $\theta'_A$, $X'$, $\theta'_B$ are $\theta_A$, $X$, $\theta_B$ restricted to the schema of $\pi_{L_1}(r)(N_r) \times^p \pi_{L_2}(s)(N_s)$, and $L'$ is the project-list $L$ augmented with the attributes in $\theta_B$. □

The following theorem generalizes Lemma 5 by replacing the single join-path with multiple join-paths. We present it without proof, since the proof is similar to that of Lemma 5.

Theorem 6 $\pi_L[r(L_r) \bowtie^p s(L_s)] = \pi_L[\pi_{L_1}(r)(L_r) \bowtie^p \pi_{L_2}(s)(L_s)]$, where $L_r$, $L_s$ are the lists of join-paths of $r$ and $s$ respectively.

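Lemma 7 $\pi^e_L(r) = \pi^e_L\,\pi^e(r)$

Proof: By the definition of $\pi^e$, the equivalence holds obviously.

Theorem 8 $\pi^e_L(r(L_r) \bowtie^p s(L_s)) = \pi^e_{L_1}(r)(L_r) \bowtie^p \pi^e_{L_2}(s)(L_s)$, where $L_r$, $L_s$ are the lists of join-paths of $r$ and $s$ respectively.

Proof:
LHS $= \pi^e_L(r(L_r) \bowtie^p s(L_s))$
$= \pi^e(\pi_L(r(L_r) \bowtie^p s(L_s)))$
$= \pi^e(\pi_L(\pi_{L_1}(r)(L_r) \bowtie^p \pi_{L_2}(s)(L_s)))$ (by Theorem 6)
$= \pi^e_L(\pi_{L_1}(r)(L_r) \bowtie^p \pi_{L_2}(s)(L_s))$
$= \pi^e_L\,\pi^e(\pi_{L_1}(r)(L_r) \bowtie^p \pi_{L_2}(s)(L_s))$
$= \pi^e_L(\pi^e_{L_1}(r)(L_r) \bowtie^p \pi^e_{L_2}(s)(L_s))$ (by Lemma 3)
$= \pi^e_{L_1}(r)(L_r) \bowtie^p \pi^e_{L_2}(s)(L_s)$
= RHS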

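Lemma 9 (1) $\sigma_\theta(r \bowtie^e s) = \sigma_\theta(r) \bowtie^e s$, if all the attributes mentioned in $\theta$ are attributes of $r$. (2) $\sigma_\theta(r \bowtie^e s) = \sigma_{\theta_1}(r) \bowtie^e \sigma_{\theta_2}(s)$, if $\theta$ is of the form $\theta_1 \wedge \theta_2$, where $\theta_1$ involves only attributes of $r$, and $\theta_2$ involves only attributes of $s$.

By the definition of P-join, $\bowtie^e$ is performed at each level in the schema of P-join. We can therefore generalize the result of Lemma 9 to the case of P-join as follows.

Theorem 10 (1) $\sigma_\theta(r(L_r) \bowtie^p s(L_s)) = \sigma_\theta(r)(L_r) \bowtie^p s(L_s)$, if $\theta$ contains no attributes in common with $s$. (2) $\sigma_\theta(r(L_r) \bowtie^p s(L_s)) = \sigma_{\theta_1}(r)(L_r) \bowtie^p \sigma_{\theta_2}(s)(L_s)$, if $\theta$ is of the form $\theta_1 \wedge \theta_2$, where $\theta_1$ and $\theta_2$ involve only attributes which are selection-comparable nodes in $r$ and $s$ respectively.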

Lemma 11 $\pi_L\,\mu_X(r) = \mu_X\,\pi_{L'}(r)$ if $X \in L'$ and $L = \mu_X(L')$

Proof: See [16].

Theorem 12 $\pi^e_L\,\mu_X(r) = \mu_X\,\pi^e_{L'}(r)$ if $X \in L'$ and $L = \mu_X(L')$

Proof:
LHS $= \pi^e(\pi_L\,\mu_X(r))$
$= \pi^e(\mu_X\,\pi_{L'}(r))$ (by Lemma 11)
$= \mu_X(\pi^e(\pi_{L'}(r)))$
$= \mu_X(\pi^e_{L'}(r))$

The following two propositions illustrate the commutativity of unnest (nest) with the selection operator, for the selection operator defined in this paper. These propositions are easy to check, so we state them without proofs.

Proposition 13 (1) $\mu_X(\sigma_\theta(r)) = \sigma_{\theta'}(\mu_X(r))$, where $\theta'$ is the derived condition on the schema of relation $r$ with attribute $X$ unnested. (2) $\nu_X(\sigma_\theta(r)) = \sigma_{\theta'}(\nu_X(r))$, where $\theta'$ is the derived condition on the schema of relation $r$ with attribute $X$ nested.

Proposition 14 If condition $\theta$ involves only attributes in $L$, then (1) $\pi_L(\sigma_\theta(r)) = \sigma_\theta(\pi_L(r))$, (2) $\pi^e_L(\sigma_\theta(r)) = \sigma_\theta(\pi^e_L(r))$.

5 Outline of a Heuristic Optimization Algorithm

In this section we discuss optimization techniques that use the equivalence rules of Section 4 to optimize nested relational algebraic expressions. We can now outline the steps of an algorithm that transforms an initial query tree into an optimized tree that is more efficient to execute. The main ideas behind this algorithm are similar to those discussed for 1NF relational databases, except that restructuring operators and path-dependent nested relational operators are the new operators considered in the algorithm. The steps of the algorithm are as follows:

(1) Analyze the parse tree according to the input algebraic expression, which includes standard operations as well as nested relational operators.

(2) Using the rule of cascade of selection, separate each SELECT operation with a conjunctive condition into a cascade of SELECT operations.

(3) Using the rules of Propositions 13 and 14 and Theorem 10, move each SELECT operation as far down the query tree as possible.

(4) Using rules concerning associativity of the binary operations $\cup^p$, $\cap^p$, $-^p$, and $\bowtie^p$ [9] (see footnote 1), rearrange the leaf nodes of the tree.

(5) Using rules concerning cascading of PROJECT and commuting of PROJECT with other operations, move projection as far down the tree as possible.

(6) Apply the knowledge of functional dependency, multivalued dependency and mutual data dependency [3, 8] to move the restructuring operations UNNEST and NEST down past the binary operations, while keeping them above the unary operations ($\sigma$, $\pi$).

(7) Identify subtrees that represent groups of operations that can be executed by a single access routine.
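The SELECT-related steps above (cascading a conjunctive SELECT and pushing each SELECT down the tree) can be sketched on a toy expression tree. This is a minimal illustration and not the authors' algorithm: the classes `Rel`, `Join` and `Select`, the attribute-set test, and the sample relations are all assumptions, and the push-down rule mirrors only part (1) of Lemma 9 and Theorem 10 (the condition must use attributes of one operand and none of the other).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rel:
    name: str
    attrs: frozenset

@dataclass(frozen=True)
class Join:
    left: object
    right: object

@dataclass(frozen=True)
class Select:
    conds: tuple      # conjunction of (label, attributes_used) pairs
    child: object

def attrs_of(e):
    """Attribute names visible at the top of expression `e`."""
    if isinstance(e, Rel):
        return e.attrs
    if isinstance(e, Join):
        return attrs_of(e.left) | attrs_of(e.right)
    return attrs_of(e.child)

def cascade(e):
    """Split every conjunctive SELECT into a cascade of single-condition SELECTs."""
    if isinstance(e, Select):
        inner = cascade(e.child)
        for c in reversed(e.conds):
            inner = Select((c,), inner)
        return inner
    if isinstance(e, Join):
        return Join(cascade(e.left), cascade(e.right))
    return e

def push_down(e):
    """Move single-condition SELECTs below joins where one operand covers them."""
    if isinstance(e, Join):
        return Join(push_down(e.left), push_down(e.right))
    if isinstance(e, Select):
        child = push_down(e.child)
        _, used = e.conds[0]                 # assumes cascade() has already run
        if isinstance(child, Select):        # selections commute: look further down
            return Select(child.conds, push_down(Select(e.conds, child.child)))
        if isinstance(child, Join):
            left, right = attrs_of(child.left), attrs_of(child.right)
            if used <= left and not (used & right):
                return Join(push_down(Select(e.conds, child.left)), child.right)
            if used <= right and not (used & left):
                return Join(child.left, push_down(Select(e.conds, child.right)))
        return Select(e.conds, child)
    return e

# Hypothetical relations and conditions.
P = Rel("Product", frozenset({"prodname", "w_period"}))
Q = Rel("Part", frozenset({"p_name", "cost"}))
c1 = ("w_period < 3", frozenset({"w_period"}))
c2 = ("cost > 0", frozenset({"cost"}))
plan = push_down(cascade(Select((c1, c2), Join(P, Q))))
```

For this input both conditions end up directly above the relation they refer to, so the join operates on already filtered operands.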

Let us introduce a database example and illustrate algebraic optimization techniques that apply to the corresponding query.

Example 5.1 Consider the following database, which has nested relations Product and Part.

Product = (prodname, Warranty(premium, country, w-period), Composition(c-name, c-id, Parts(p-name, quantity)), Distributors(company, fee)) Part = (p-name, weight, Source(company, cost))

Consider also the following query.

Q1: Find those products whose warranty period is under three years, together with their parts and those companies that are both a distributor and a parts source, with their corresponding delivery fees and costs. Group the result on prodname and p-name.

Note that we write X + Y for the new attribute XY defined in Definition 3.3. We could express this query as:

$\pi_{(\text{prodname},\ \text{p-name},\ \text{Distributor+Source}(\text{company}, \text{fee}, \text{cost}))}(\sigma_{\text{Warranty.w-period} < 3}(\text{Product}(\text{Distributor}) \bowtie^p \text{Part}(\text{Source})))$

The algebraic transformation of this query is as follows.

$\pi_{(\text{prodname},\ \text{p-name},\ \text{Distributor+Source}(\text{company}, \text{fee}, \text{cost}))}(\sigma_{\text{Warranty.w-period} < 3}(\text{Product}(\text{Distributor}) \bowtie^p \text{Part}(\text{Source})))$

$= \pi_{(\text{prodname},\ \text{p-name},\ \text{Distributor+Source}(\text{company}, \text{fee}, \text{cost}))}((\sigma_{\text{Warranty.w-period} < 3}\,\text{Product})(\text{Distributor}) \bowtie^p \text{Part}(\text{Source}))$

$= \pi_{(\text{prodname},\ \text{p-name},\ \text{Distributor+Source}(\text{company}, \text{fee}, \text{cost}))}([\pi_{(\text{prodname},\ \text{Composition}(\text{Parts}(\text{p-name})))}(\sigma_{\text{Warranty.w-period} < 3}\,\text{Product})](\text{Distributor}) \bowtie^p [\pi_{(\text{p-name},\ \text{Source}(\text{company}, \text{cost}))}\,\text{Part}](\text{Source}))$

The resulting equivalent expression is more efficient than the original one.

Footnote 1: Note that standard relational operators and extended relational operators ($\cup^e$, $\cap^e$, $-^e$, $\bowtie^e$) [13] are special cases of path-dependent nested relational operators [9].

6 Conclusion

The nested relational model provides a better way to represent complex objects than the traditional relational model, and allows users to describe their concepts of real-world data objects more easily. However, the optimization of queries differs from that in the 1NF relational model. In this paper, we have investigated some algebraic properties of nested relational operators that are useful for query optimization in the nested relational model.

Object-oriented databases show great promise as a model for next-generation database systems [1, 7]. Many object-oriented queries are structurally similar to nested relational queries [5, 6]. Most essential techniques for (NF²) relational query processing are directly applicable to object-oriented query processing [5]. The semantics of an explicit join of classes in the object-oriented data model is similar to that of P-join in the NF² data model. Therefore, we hope that the theoretical results obtained for optimization of nested relational algebra can be carried over to an object algebra.

The selection operator presented in Section 2 only considers selection-comparable nodes, which suffices for the purpose of this paper. A more general, more sophisticated selection operator is needed in the nested relational model. A higher-order calculus may be an alternative approach.

Queries are expressed more succinctly and optimized more easily using P-join than using other extended join operators. The complexity of an algorithm developed by the methods indicated in [10] will not be worse than that of other join algorithms recently proposed for nested relational database systems, which involve expensive restructuring operators. However, the design of efficient algorithms for the P-join operation and a detailed performance analysis remain for further research.

References

[1] C. Beeri. New Data Models and Languages - the Challenge. In Proceedings of the 11th ACM Symposium on Principles of Database Systems, pages 1-15, 1992.

[2] L.S. Colby. A Recursive Algebra and Query Optimization for Nested Relations. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 273-283, 1989.

[3] Y. Jan. Algebraic Optimization for Nested Relations. In Proceedings of the 23rd Hawaii International Conference on System Sciences, Vol. 2, pages 278-287, 1990.

[4] H.F. Korth. Optimization of Object-Retrieval Queries. In Proceedings of the 2nd International Workshop on Object-Oriented Database Systems, pages 352-357, 1988.

[5] W. Kim. Introduction to Object-Oriented Databases. The MIT Press, 1990.

[6] M. Kifer, W. Kim, and Y. Sagiv. Querying Object-Oriented Databases. In Proceedings of the ACM SIGMOD International Conference on the Management of Data, pages 393-402, 1992.

[7] W. Kim. Object-Oriented Database System: Promises, Reality, and Future. In Proceedings of the 19th Very Large Data Bases Conference, pages 676-687, 1993.

[8] H.-C. Liu and K. Ramamohanarao. Equivalences of Nested Relational Operators. In Proceedings of the Third Australian Database Conference, pages 124-138, 1992.

[9] H.-C. Liu and K. Ramamohanarao. Path-dependent Nested Relational Algebra. Technical Report 93/13, Department of Computer Science, The University of Melbourne, Australia, 1993.

[10] H.-C. Liu and K. Ramamohanarao. Path-dependent Join for Nested Relations. To appear in Proceedings of the Fifth International Hong Kong Computer Society Database Workshop, February 1994.

[11] H.-C. Liu and K. Ramamohanarao. Multiple Paths Join for Nested Relational Databases. In Proceedings of the Fifth Australasian Database Conference, pages 30-44, 1994.

[12] Z.M. Ozsoyoglu and L.-Y. Yuan. A New Normal Form for Nested Relations. ACM Transactions on Database Systems, 12(1):111-136, 1987.

[13] M.A. Roth, H.F. Korth, and A. Silberschatz. Extended Algebra and Calculus for ¬1NF Relational Databases. ACM Transactions on Database Systems, 13(4):389-417, 1988.

[14] M.H. Scholl. Theoretical Foundation of Algebraic Optimization Utilizing Unnormalized Relations. In Proceedings of the International Conference on Database Theory, pages 380-396, 1986.

[15] M.H. Scholl and H.J. Schek. The Relational Object Model. In Proceedings of the International Conference on Database Theory, pages 89-105, 1990.

[16] S.J. Thomas and P.C. Fischer. Nested Relational Structures. In P.C. Kanellakis, editor, Advances in Computing Research 3, JAI Press, pages 269-307, 1986.

Appendix

Theorem 1 (1) $r \bowtie^e s = s \bowtie^e r$ (2) $r_1 \bowtie^e (r_2 \bowtie^e r_3) = (r_1 \bowtie^e r_2) \bowtie^e r_3$

Proof: (1) Under extended natural join, two tuples contribute to the join if the extended intersection of their projections over common attributes is not empty. Since the extended intersection operator is commutative, extended natural join is commutative.

(2) We show inclusion both ways.

$\subseteq$: For every $t \in$ LHS, there exist a tuple $t_1 \in r_1$ and a tuple $u \in r_2 \bowtie^e r_3$ such that $t$ is the extended natural join of $t_1$ and $u$, i.e., $t = t_1 \bowtie^e u$. Similarly, there must be tuples $t_2$ and $t_3$ in $r_2$ and $r_3$ respectively such that $u = t_2 \bowtie^e t_3$. So $t = t_1 \bowtie^e u = t_1 \bowtie^e (t_2 \bowtie^e t_3)$. We divide the rest of the proof into two parts.

(a) Let $\mathcal{A}$ be the set of zero-order attributes in the scheme of the relation of LHS.

$t[\mathcal{A}] = (t_1 \bowtie^e (t_2 \bowtie^e t_3))[\mathcal{A}]$
$= t_1[E_{R_1} \cap \mathcal{A}] \bowtie^e (t_2 \bowtie^e t_3)[E_{R_2 \bowtie^e R_3} \cap \mathcal{A}]$
$= t_1[E_{R_1} \cap \mathcal{A}] \bowtie (t_2[E_{R_2} \cap \mathcal{A}] \bowtie^e t_3[E_{R_3} \cap \mathcal{A}])$
$= t_1[E_{R_1} \cap \mathcal{A}] \bowtie (t_2[E_{R_2} \cap \mathcal{A}] \bowtie t_3[E_{R_3} \cap \mathcal{A}])$
$= (t_1[E_{R_1} \cap \mathcal{A}] \bowtie t_2[E_{R_2} \cap \mathcal{A}]) \bowtie t_3[E_{R_3} \cap \mathcal{A}]$ (see Section 2.2.2)
$= (t_1 \bowtie^e t_2)[E_{R_1 \bowtie^e R_2} \cap \mathcal{A}] \bowtie t_3[E_{R_3} \cap \mathcal{A}]$
$= ((t_1 \bowtie^e t_2) \bowtie^e t_3)[E_{(R_1 \bowtie^e R_2) \bowtie^e R_3} \cap \mathcal{A}]$
$= ((t_1 \bowtie^e t_2) \bowtie^e t_3)[\mathcal{A}]$

(b) Let $\mathcal{X}$ be the set of higher-order attributes in the scheme of the relation of LHS. There are three cases:

(i) $X \in \mathcal{X}$ and $X$ belongs to precisely one relation scheme, say that of $r_i$; then $r_i[X]$ does not participate in the join with the other two relations. We get $t[X] = (t_1 \bowtie^e (t_2 \bowtie^e t_3))[X] = t_i[X] = ((t_1 \bowtie^e t_2) \bowtie^e t_3)[X]$.

(ii) $X \in \mathcal{X}$ and $X$ is a common attribute of exactly two relations, say $r_i$, $r_j$; then $t[X] = (t_1 \bowtie^e (t_2 \bowtie^e t_3))[X] = t_i[X] \cap^e t_j[X] = ((t_1 \bowtie^e t_2) \bowtie^e t_3)[X]$.

(iii) $X \in \mathcal{X}$ and $X$ is a common attribute of all three relations; then
$t[X] = t_1[X] \bowtie^e (t_2[X] \bowtie^e t_3[X])$
$= t_1[X] \cap^e (t_2[X] \cap^e t_3[X])$ (by the definition of $\bowtie^e$)
$= (t_1[X] \cap^e t_2[X]) \cap^e t_3[X]$ (by associativity of $\cap^e$)
$= (t_1[X] \bowtie^e t_2[X]) \bowtie^e t_3[X]$
$= ((t_1 \bowtie^e t_2) \bowtie^e t_3)[X]$

By (i), (ii) and (iii), $t[\mathcal{X}] = (t_1 \bowtie^e (t_2 \bowtie^e t_3))[\mathcal{X}] = ((t_1 \bowtie^e t_2) \bowtie^e t_3)[\mathcal{X}]$.

By (a) and (b), $t = t_1 \bowtie^e (t_2 \bowtie^e t_3) = (t_1 \bowtie^e t_2) \bowtie^e t_3 \in (r_1 \bowtie^e r_2) \bowtie^e r_3$ = RHS. Since $t$ is an arbitrary element, we conclude LHS $\subseteq$ RHS.

$\supseteq$: The proof is similar to that of part "$\subseteq$".

Lemma 3 $\pi^e(r_1 \bowtie^e r_2) = \pi^e(r_1) \bowtie^e \pi^e(r_2)$

Proof: Let $\mathcal{X}$ be the higher-order attributes in $E_{R_1} \cap E_{R_2}$. Let $M = E_{R_1} - \mathcal{X}$ and $N = E_{R_2} - \mathcal{X}$. We show inclusion both ways to prove equivalence at the instance level.

$\subseteq$: We partition $r_1 \bowtie^e r_2$ on zero-order attributes. By the definition of $\pi^e$, for every $t \in$ LHS, $t$ is the extended union of the tuples in some block $B$ of this partition, i.e., there exists an integer $k$ such that $t = \bigcup^e \{ t_i \mid t_i \in r_1 \bowtie^e r_2,\ 1 \le i \le k,\ t_i \in B \}$. There exist $s_1 \in \pi^e(r_1)$, $s_2 \in \pi^e(r_2)$ such that $s_1[M] = t[M]$, $s_2[N] = t[N]$. We now prove $t = s_1 \bowtie^e s_2$ as follows.

(1) For each $t_i \in r_1 \bowtie^e r_2$, there must exist $t^1_i \in r_1$, $t^2_i \in r_2$ such that $t_i = t^1_i \bowtie^e t^2_i$, $1 \le i \le k$, and $t[M] = t_i[M] = t^1_i[M]$, $t[N] = t_i[N] = t^2_i[N]$.

(2) $\forall X \in \mathcal{X}$: $t[X] = \bigcup^e t_i[X] = \bigcup^e (t^1_i[X] \cap^e t^2_i[X]) = (\bigcup^e t^1_i[X]) \cap^e (\bigcup^e t^2_i[X]) = (\bigcup^e t^1_i)[X] \cap^e (\bigcup^e t^2_i)[X] = s_1[X] \cap^e s_2[X]$.

(3) We claim that $t[X] = s_1[X] \cap^e s_2[X]$. If not, there is some $w \in s_1[X] \cap^e s_2[X]$ with $w \notin t[X]$. There must exist $w_1 \in r_1$ and $w_2 \in r_2$ such that $w \in (w_1 \bowtie^e w_2)[X]$, where $w_1[M] = s_1[M] = t[M]$ and $w_2[N] = s_2[N] = t[N]$. So $(w_1 \bowtie^e w_2)[X] \notin t[X] = \bigcup^e t_i[X]$, i.e., $w_1 \bowtie^e w_2 \notin \bigcup^e t_i$. But $w_1 \bowtie^e w_2$ has the same values on zero-order attributes as $t$. This implies that $\bigcup^e t_i$ is not a block of the partition of $r_1 \bowtie^e r_2$ on zero-order attributes, which is a contradiction. So $t[X] = s_1[X] \cap^e s_2[X]$.

We conclude that $t = s_1 \bowtie^e s_2 \in (\pi^e(r_1) \bowtie^e \pi^e(r_2))$.

$\supseteq$: For every $t \in$ RHS, there exist $t' \in \pi^e(r_1)$ and $t'' \in \pi^e(r_2)$ such that $t = t' \bowtie^e t''$. By the definition of $\pi^e$, $t' = \bigcup^e_{1 \le i \le k} s^1_i$, where $s^1_i \in r_1$ and all $s^1_i$ have the same values on zero-order attributes. Similarly, $t'' = \bigcup^e_{1 \le j \le l} s^2_j$, where $s^2_j \in r_2$ and all $s^2_j$ have the same values on zero-order attributes.

$\forall X \in \mathcal{X}$,
$t[X] = t'[X] \cap^e t''[X]$
$= (\bigcup^e_{1 \le i \le k} s^1_i)[X] \cap^e (\bigcup^e_{1 \le j \le l} s^2_j)[X]$
$= (\bigcup^e_{1 \le i \le k} s^1_i[X]) \cap^e (\bigcup^e_{1 \le j \le l} s^2_j[X])$
$= \bigcup^e_{1 \le i \le k,\ 1 \le j \le l} (s^1_i[X] \cap^e s^2_j[X])$
$= \bigcup^e_{1 \le i \le k,\ 1 \le j \le l} ((s^1_i \bowtie^e s^2_j)[X])$
$= (\bigcup^e_{1 \le i \le k,\ 1 \le j \le l} (s^1_i \bowtie^e s^2_j))[X]$

In $r_1 \bowtie^e r_2$, only those tuples that are joined from some $s^1_i$ and $s^2_j$ can have the same values on zero-order attributes as $t$. So $P = \bigcup^e_{1 \le i \le k,\ 1 \le j \le l} (s^1_i \bowtie^e s^2_j)$ forms a block of the partition of $r_1 \bowtie^e r_2$ on zero-order attributes. We get $t \in$ LHS.
