On the Number of Expressions Modulo Commutativity over a Finite Semi-group

Cong-Cong Xing and Bill P. Buckles
Department of Electrical Engineering & Computer Science
Tulane University
New Orleans, Louisiana

Center for Intelligent and Knowledge-based Systems (CIAKS)
Technical Report EECS/CIAKS-97-0101/TU
June 1997


Abstract

A problem, $\alpha_n$, that is closely related to Catalan numbers, $b_n$, is formulated, analysed, and solved. The result shows that $\alpha_n$ grows at least exponentially, which poses a challenge for finding a polynomial time algorithm when a search space has size $\alpha_n$. Conversely, it inspires attempts to prove that such search problems are NP-complete. This work has immediate applications to the join optimization problem in database systems.

Catalan numbers [7], $b_n = \frac{1}{n+1}\binom{2n}{n}$, can be interpreted as follows: given a finite semi-group $(G, \circ)$ with $|G| = n$, $b_n$ is the number of expressions over $(G, \circ)$ which are equivalent under associativity to a specific expression $e$ of length $n$. Catalan numbers have applications in such diverse areas as determining the number of ways that parentheses may be configured in matrix-chain multiplication [1], determining the number of ways that an $n$-polygon can be triangulated [4], and determining the number of full binary trees [2].

Given the Catalan number $b_n$, it is easy to see that, if we introduce commutativity to $(G, \circ)$, the universe of expressions of length $n$ (in the sense of associativity and commutativity) has size $n!\,b_n$. However, what is the number $\alpha_n$ if we identify those expressions which are equivalent in terms of commutativity? $\alpha_n$ must be of the form

\[ \alpha_n = \frac{n!\,b_n}{x_n} \]

where $x_n$ is a function of $n$. But obtaining $x_n$ is not trivial. The purpose of this paper is to determine a closed form for $\alpha_n$ and analyse it. We seek the solution to $\alpha_n$ directly from the nature of the problem rather than first determining $x_n$.

This problem is manifested in a number of optimization problems including that of "join optimization" in database systems [6]. Join, represented by $\bowtie$, is a binary algebraic operator that produces a result database relation when given two operand relations. $\bowtie$ is both associative and commutative. In database query optimization, a crucial issue arises when given a (sub)expression $\bowtie_{i=1}^{n} R_i$, where each $R_i$ is a relation. One wishes to determine the lowest cost computation sequence among all those possible, recognizing that reassociating terms affects the cost of evaluation but commuting subexpressions does not. The question concerning $\alpha_n$ may be rephrased as: How large is the search space? The result we found is that $\alpha_n > n! > b_n > 2^n$ $(n \ge 6)$, which eliminates brute force as a reasonable search strategy and poses a challenge for finding an efficient algorithm for solving the "join optimization" problem.

Analysis

First, let us state clearly the number $\alpha_n$ with which we are concerned.

To assist the discussion, we present the necessary definitions. Throughout the discussion, we use $G$ to denote the commutative semi-group $(G, \circ)$ when the context is clear.

Definition 1 Given a commutative semi-group $G$, an expression $e$ over $G$ is a non-empty string of the elements of $G$ infixed by $\circ$ (zero or more) and well-balanced, nonredundant parentheses, ( and ), in which each element is distinct. The length of $e$ is the number of elements of $G$ in $e$. For example, if $G = \{a, b, c\}$, then $a \circ b$ and $a \circ (b \circ c)$ are expressions with length 2 and 3 respectively. The set of all expressions over $G$ is denoted by $\mathcal{G}$. A subset $\mathcal{G}_n \subseteq \mathcal{G}$ is the set of all expressions of length $n$.

Definition 2 Given $\mathcal{G}_n$, for all $e_1, e_2 \in \mathcal{G}_n$, $e_1 \cong e_2$ iff after we apply the commutative rule to $e_1$ (zero or more times), $e_1$ is identical to $e_2$.

It is easy to verify that $\cong$ is an equivalence relation on $\mathcal{G}_n$.

Definition 3 Let $X$ be a non-empty set, and $\sim$ be an equivalence relation on $X$. Then the set of equivalence classes, denoted by $X/\!\sim$, is called the quotient set of $X$ modulo the equivalence relation $\sim$.

Now, our interest is to take the expression subset $\mathcal{G}_n$ and form the quotient set $\mathcal{G}_n/\!\cong$ using the equivalence relation $\cong$ defined above. Since each element in $\mathcal{G}_n/\!\cong$ represents a unique computation sequence regardless of commutativity, our goal is therefore to decide the size of $\mathcal{G}_n/\!\cong$ when $|G| = n$. Let us take a look at $\mathcal{G}_3/\!\cong$, where $G = \{a, b, c\}$, $a, b, c \in \mathbb{N}$, and $\circ$ is normal integer addition. We have:

\[
\mathcal{G}_3 = \left\{
\begin{array}{llllll}
(a+b)+c, & a+(b+c), & (a+c)+b, & (b+a)+c, & a+(c+b), & (c+a)+b, \\
c+(a+b), & (b+c)+a, & b+(a+c), & c+(b+a), & (c+b)+a, & b+(c+a)
\end{array}
\right\}
\]

and

\[ \mathcal{G}_3/\!\cong \; = \{[(a+b)+c],\ [a+(b+c)],\ [(a+c)+b]\} \]

where $[e]$ denotes an equivalence class with representative $e$. Clearly, we have 3 distinct ways to form the expression of length 3. We seek a closed formula for the size of $\mathcal{G}_n/\!\cong$ in the general case. To this end, it is helpful to introduce another concept, the op-tree of a set $S$, which is defined inductively as follows.

Definition 4 Given a non-empty finite set $S$, an op-tree of $S$, denoted by opt($S$), is:
• $a \in S$ if $S$ is a singleton set $\{a\}$.
• otherwise, a root denoted by $\circ$ and two children denoted by opt($S_1$) and opt($S_2$), where $S_1, S_2 \subset S$ are non-empty, $S_1 \cap S_2 = \emptyset$ and $S_1 \cup S_2 = S$.

Note that the above definition is that of a strictly binary tree and that we do not distinguish the "positions" (e.g., left or right) of the descendants of an internal ($\circ$) node. That is, if $|S| > 1$ then the root of opt($S$) has exactly two descendants, opt($S_1$) and opt($S_2$). Exchanging the subsets, $S_1$ and $S_2$, does not in itself distinguish the new op-tree from the original. Yet, there are many op-trees for a given set $S$. We use opt($S$) to denote one of them and $\mathbf{opt}(S)$ with $|S| = n$ to denote the entire collection.

Given the notion of op-tree, we can easily see that there is a 1-1 correspondence between the equivalence classes in $\mathcal{G}_n/\!\cong$ with $|G| = n$ and the op-trees in $\mathbf{opt}(G)$ by viewing the last (outermost) operator as the root and its two operands as the descendants. The corresponding op-trees of $\mathcal{G}_3/\!\cong$ are given in Fig. 1.

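[Figure 1: op-trees of the three equivalence classes [(a+b)+c], [a+(b+c)], and [(a+c)+b]; each tree has a + root whose children are one leaf and a + node with two leaves.]

Figure 1: op-trees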
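The correspondence between equivalence classes and op-trees can be checked by brute force for small sets. The following is a minimal sketch and not part of the original report: expressions are modelled as nested tuples, op-trees as nested `frozenset`s (so the two children of a node are unordered), and the names `parenthesizations`, `to_op_tree`, `op_trees`, and `check` are illustrative assumptions.

```python
from itertools import combinations, permutations


def parenthesizations(seq):
    """Yield every full parenthesization of the operand tuple `seq` as nested tuples."""
    if len(seq) == 1:
        yield seq[0]
        return
    for k in range(1, len(seq)):
        for left in parenthesizations(seq[:k]):
            for right in parenthesizations(seq[k:]):
                yield (left, right)


def to_op_tree(expr):
    """Map an expression to its op-tree by forgetting the left/right order of operands."""
    if not isinstance(expr, tuple):
        return expr  # a single element is its own op-tree
    left, right = expr
    return frozenset((to_op_tree(left), to_op_tree(right)))


def op_trees(s):
    """Build all op-trees of the set `s` directly from the inductive definition."""
    s = frozenset(s)
    if len(s) == 1:
        (a,) = s
        return {a}
    anchor = min(s)  # pin one element inside S1 so each split {S1, S2} is produced once
    rest = sorted(s - {anchor})
    trees = set()
    for r in range(len(rest)):  # r < len(rest) keeps S2 non-empty
        for extra in combinations(rest, r):
            s1 = frozenset((anchor, *extra))
            s2 = s - s1
            for t1 in op_trees(s1):
                for t2 in op_trees(s2):
                    trees.add(frozenset((t1, t2)))
    return trees


def check(elements):
    """Return (#expressions, #classes, whether the classes equal the op-trees)."""
    expressions = [e for p in permutations(elements) for e in parenthesizations(p)]
    classes = {to_op_tree(e) for e in expressions}
    return len(expressions), len(classes), classes == op_trees(elements)


if __name__ == "__main__":
    print(check("abc"))
```

For the three-element set used in the example, `check("abc")` reports 12 expressions and 3 classes, and the classes coincide with the directly constructed op-trees, in agreement with Fig. 1.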

Recursive Solution

Now, we are ready to present the partial solution of the problem in terms of the following lemma.

Lemma 1 Given a finite commutative semi-group $G$ with $|G| = n$,

\[ \alpha_n = \frac{1}{2} \sum_{i=1}^{n-1} \binom{n}{i} \alpha_i \alpha_{n-i} \qquad (n > 1) \tag{1} \]

\[ \alpha_1 = 1 \tag{2} \]

Proof. Note that $\alpha_n = |\mathcal{G}_n/\!\cong| = |\mathbf{opt}(G)|$. If $n > 1$, then any opt($G$) partitions $G$ into blocks of size $i$ and $n - i$ $(1 \le i \le n - 1)$. There are $\binom{n}{i}$ ways a partition can be performed, and the two blocks of each partition can be configured into $\alpha_i$ and $\alpha_{n-i}$ op-trees, respectively. Finally, there is one summation term for each value of $i$ and each determines two blocks, distinguished only by their contents. Therefore, $i = k$ and $i = n - k$ enumerate identical op-trees. That is, each partition is enumerated twice, thus the factor of $\frac{1}{2}$.
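The recurrence of Lemma 1 translates directly into a memoized function. This is a minimal sketch under the assumption that the count is written $\alpha_n$ with $\alpha_1 = 1$; the function name `alpha` is illustrative and does not come from the report.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def alpha(n):
    """Number of expressions of length n modulo commutativity, via recurrence (1)-(2)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n == 1:
        return 1
    # Sum over the block size i; each unordered split is counted twice,
    # once as (i, n - i) and once as (n - i, i), so the total is halved.
    total = sum(comb(n, i) * alpha(i) * alpha(n - i) for i in range(1, n))
    return total // 2


if __name__ == "__main__":
    print([alpha(n) for n in range(1, 8)])
```

The value `alpha(3)` is 3, matching the three equivalence classes listed for $G = \{a, b, c\}$.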

Closed Form

While the recurrence (1) is indicative of the magnitude of $\alpha_n$, a closed form solution is more useful. Such may be found using a generating function [5].

Theorem 1 Equation (1) is equivalent to

\[ \alpha_{n+1} = \frac{n!}{2^n} \binom{2n}{n} \qquad (n \ge 0) \tag{3} \]
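As a hand check of the theorem on a small case (a worked instance added for illustration, writing $\alpha_n$ for the number of expressions of length $n$ modulo commutativity), the recurrence (1) with $\alpha_1 = 1$ gives

```latex
\begin{align*}
\alpha_2 &= \tfrac{1}{2}\binom{2}{1}\alpha_1\alpha_1 = 1, \\
\alpha_3 &= \tfrac{1}{2}\left[\binom{3}{1}\alpha_1\alpha_2 + \binom{3}{2}\alpha_2\alpha_1\right]
          = \tfrac{1}{2}(3 + 3) = 3, \\
\alpha_4 &= \tfrac{1}{2}\left[\binom{4}{1}\alpha_1\alpha_3 + \binom{4}{2}\alpha_2\alpha_2 + \binom{4}{3}\alpha_3\alpha_1\right]
          = \tfrac{1}{2}(12 + 6 + 12) = 15,
\end{align*}
while (3) with $n = 3$ gives
\[
\alpha_4 = \frac{3!}{2^3}\binom{6}{3} = \frac{6}{8}\cdot 20 = 15 .
\]
```

The two computations agree, and $\alpha_3 = 3$ matches the three classes found for $G = \{a, b, c\}$.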

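Proof. We can substitute (3) into (1) to verify that (1) holds, but it is more informative to derive (3) from (1). From (1),

\[ 2\alpha_n = \sum_{i=1}^{n-1} \binom{n}{i} \alpha_i \alpha_{n-i} = \sum_{i=1}^{n-1} \frac{n!}{i!\,(n-i)!}\, \alpha_i \alpha_{n-i} \]

For each $i$, let $\beta_i = \alpha_i / i!$, then

\[ 2\beta_n = \sum_{i=1}^{n-1} \beta_i \beta_{n-i} = \beta_1\beta_{n-1} + \beta_2\beta_{n-2} + \cdots + \beta_{n-1}\beta_1 \]

Let the generating function be

\[ g(x) = \beta_1 x + \beta_2 x^2 + \cdots + \beta_n x^n + \cdots \tag{4} \]

We have

\begin{align*}
g^2(x) &= (\beta_1\beta_1)x^2 + (\beta_1\beta_2 + \beta_2\beta_1)x^3 + \cdots + (\beta_1\beta_{n-1} + \cdots + \beta_{n-1}\beta_1)x^n + \cdots \\
       &= 2(\beta_2 x^2 + \beta_3 x^3 + \cdots + \beta_n x^n + \cdots) \\
       &= 2(g(x) - x)
\end{align*}

where the last step uses $\beta_1 = \alpha_1 = 1$. Thus

\[ g^2(x) - 2g(x) + 2x = 0 \]

with roots $1 \pm \sqrt{1 - 2x}$. We pick the root $1 - \sqrt{1 - 2x}$ because $g(x)$ has no constant term in (4). Hence

\[ g(x) = 1 - \sqrt{1 - 2x} \]

Let $h(x) = \sqrt{1 - 2x}$. It is easy to compute the $n$-th derivative $h^{(n)}(x)$ of $h(x)$ at $x = 0$,

\[ h^{(n)}(0) = (-1)^{2n-1} \prod_{i=2}^{n} (2i - 3) \]

So, the Taylor expansion of $g(x)$ is,

\begin{align*}
g(x) &= 1 - \left[ 1 + (-1)x + \frac{(-1)}{2!}x^2 + \cdots + \frac{h^{(n)}(0)}{n!}x^n + \cdots \right] \\
     &= x + \frac{1}{2!}x^2 - \cdots - \frac{h^{(n)}(0)}{n!}x^n - \cdots \tag{5}
\end{align*}

Comparing the coefficients of equations (4) and (5), we have,

\[ \beta_n = -\frac{1}{n!}(-1)^{2n-1} \prod_{i=2}^{n} (2i - 3) = \frac{1}{n!} \prod_{i=2}^{n} (2i - 3) \]

Therefore,

\[ \alpha_n = n!\,\beta_n = \prod_{i=2}^{n} (2i - 3) = \frac{(n-1)!}{2^{n-1}} \binom{2(n-1)}{n-1} \qquad (n \ge 1) \]

That is,

\[ \alpha_{n+1} = \frac{n!}{2^n} \binom{2n}{n} \qquad (n \ge 0) \tag{6} \]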
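The derivation can be cross-checked numerically with exact arithmetic. The sketch below is an illustration rather than part of the report: it assumes the count is written $\alpha_n$ and $\beta_n = \alpha_n / n!$, computes the power-series coefficients of $\sqrt{1 - 2x}$ from the binomial series, and compares them with the recurrence and with the closed form re-indexed to $n \ge 1$. All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def alpha_closed(n):
    """Closed form re-indexed to n >= 1: (n-1)!/2^(n-1) * C(2(n-1), n-1)."""
    m = n - 1
    return factorial(m) * comb(2 * m, m) // 2 ** m


def alpha_recurrence(limit):
    """Values from the recurrence, stored at indices 1..limit (index 0 is unused)."""
    a = [0, 1]
    for n in range(2, limit + 1):
        a.append(sum(comb(n, i) * a[i] * a[n - i] for i in range(1, n)) // 2)
    return a


def sqrt_series_coefficients(limit):
    """Coefficients c_0..c_limit of sqrt(1 - 2x), where c_k = C(1/2, k) * (-2)^k."""
    coeffs = [Fraction(1)]
    for k in range(1, limit + 1):
        # C(1/2, k) = C(1/2, k-1) * (1/2 - (k-1)) / k, and each step adds a factor -2.
        coeffs.append(coeffs[-1] * (Fraction(1, 2) - (k - 1)) / k * (-2))
    return coeffs


def derivation_agrees(limit):
    """True if recurrence, closed form, and series of 1 - sqrt(1 - 2x) agree up to `limit`."""
    a = alpha_recurrence(limit)
    c = sqrt_series_coefficients(limit)
    return all(
        a[n] == alpha_closed(n) == -c[n] * factorial(n)  # beta_n = -c_n for n >= 1
        for n in range(1, limit + 1)
    )


if __name__ == "__main__":
    print(derivation_agrees(12))
```

Using `Fraction` keeps every coefficient exact, so agreement here is an equality of rationals rather than a floating-point approximation.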

Discussion

Equation (6) clearly shows that $\alpha_n$ grows at least exponentially and is thus a potential source of intractability [3] in computation. This is due to the recursive and combinatorial nature of $\alpha_n$ as best exhibited in Lemma 1.

It is desirable to compare $\alpha_n$ with the Catalan number $b_n = \frac{1}{n+1}\binom{2n}{n}$ because the two have similar characteristics. Note that

\[ \alpha_n = b_n\, \frac{(n+1)!}{2^n (2n - 1)} \]

and $(n+1)! > 2^n (2n - 1)$ for $n \ge 4$. Since also $b_n > 2^n$ for $n \ge 5$, we have $\alpha_n > b_n > 2^n$ for $n \ge 5$. A yet more illuminating relation is $\alpha_n > n! > b_n > 2^n$ $(n \ge 6)$. This is expected because $b_n$ gives the number of expressions for a fixed operand sequence taking only associativity into account. Meanwhile $n!$ counts all operand sequences but does not account for associativity. Finally, $\alpha_n$ accounts for associativity over all operand sequences, taken modulo commutativity.
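The comparison with the Catalan numbers can be tabulated for small $n$. This is a minimal sketch assuming the closed form $\alpha_n = \frac{(n-1)!}{2^{n-1}}\binom{2(n-1)}{n-1}$ and $b_n = \frac{1}{n+1}\binom{2n}{n}$; the function names are illustrative.

```python
from math import comb, factorial


def alpha(n):
    """Expressions of length n modulo commutativity (closed form, n >= 1)."""
    m = n - 1
    return factorial(m) * comb(2 * m, m) // 2 ** m


def catalan(n):
    """Catalan number b_n = C(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)


def ratio_identity_holds(n):
    """Check alpha_n * 2^n * (2n - 1) == b_n * (n + 1)! without any division."""
    return alpha(n) * 2 ** n * (2 * n - 1) == catalan(n) * factorial(n + 1)


def chain_holds(n):
    """Check alpha_n > n! > b_n > 2^n."""
    return alpha(n) > factorial(n) > catalan(n) > 2 ** n


if __name__ == "__main__":
    for n in range(1, 9):
        print(n, alpha(n), factorial(n), catalan(n), 2 ** n, chain_holds(n))
```

With these definitions the chain first holds at $n = 6$ (945 > 720 > 132 > 64); at $n = 5$ it fails because $\alpha_5 = 105 < 5! = 120$.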

Conclusion

We have demonstrated a complete solution for $\alpha_n$, the number of expressions modulo commutativity over a finite commutative semi-group $(G, \circ)$ with $|G| = n$. In particular, $\alpha_n > 2^n$ for $n \ge 5$. This has direct applications to one query (join) optimization problem in database systems. The superexponential nature of $\alpha_n$ also suggests the possibility of exploring the NP-completeness [3] of the corresponding decision problem for join optimization.

References

[1] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, 1990.
[2] Shimon Even, Graph Algorithms, Computer Science Press, Potomac, MD, 1979.
[3] M. R. Garey and D. S. Johnson, Computers and Intractability, W. H. Freeman and Company, San Francisco, CA, 1979.
[4] S. Sen Gupta, Geometric classification of triangulations and their enumeration in a convex polygon, Computers & Mathematics with Applications, 27, 7 (1994), pp. 99-115.
[5] Donald E. Knuth, Fundamental Algorithms, The Art of Computer Programming, Vol. I, Addison-Wesley, Reading, MA, 1973.
[6] Dennis Shasha and T. Wang, Optimizing equijoin queries in distributed databases where relations are hash partitioned, ACM Trans. on Database Systems, 16 (1991), pp. 279-308.
[7] Thomas A. Standish, Data Structure Techniques, Addison-Wesley, Reading, MA, 1980.
