Set-Term Matching in a Logic Database language Seung Jin Lim Yiu-Kai Ng Computer Science Department Brigham Young University Provo, Utah 84602, U.S.A. Email:
[email protected] Abstract
Existing set-term matching algorithms [AGS92] for logic-based database languages, of which set terms have the commutative and idempotent properties, have several problems: Given a pair of set terms, matchers can only be computed sequentially and duplicated matchers are generated. Hence, these algorithms cannot take the advantage of existing multiple processors for computing all matchers in parallel. Further, duplicated matchers are redundant and undesirable. In order to overcome these shortcomings, we propose an improved set-term matching algorithm for LDL=NR, a logic database language for nested relations, which generates non-redundant matchers for a given pair of set terms in parallel. The proposed algorithm thus eliminates the sequential and redundant problems in [AGS92].
Keywords: Logic database language, logic programming, set-term, matching, uni cation
1 Introduction Logic-based data languages, such as LDL [NT89] [CGK 90], HILOG [CC90], HILOG-R [CK91], LDL=PNF [Cha93], and LDL=NR (Logic Database Language for Nested Relations) [CN94], extend logic programming in two dierent ways. First, logic programming is extended to a database query language for expressing (recursive) database queries in a logic database system. Second, logic programming is extended from rst-order logic and the Horn clause subset of logic to high-order logic to handle complex data structures (i.e., non- rst-normal-form relations). A common data structure used in these languages is set terms which are of the form fe ; : ::; en g, where each ei; 1 i n, is distinct and may be complex in structure, and a key design issue of these languages is uni cation, which is a computation tool widely used in automated deduction [Sie89] [SS94]. In logic programming, the uni cation problem can be interpreted as the determination of the solution of a system of equations, while in a logic database system, the uni cation problem can be interpreted as the determination of an answer to a database query. A +
1
1
database query is used for retrieving data from a database which can be accomplished by assigning data values in the database to variables in a query. The set-term uni cation problem in a high-order logic database language is more complex than the uni cation problem in a rst-order logic database language since the former deals with set terms. In this paper, we assume that each element of a set term S is either a constant or a variable. Assuming that each element of S is either a constant or a variable is not a restriction since a complex term t (which is neither a constant nor a variable) in a set term can be replaced by a variable T with the additional equality constraint t = T as in [AGS92]. Further, S is treated as the concatenation of a number of elements, i.e., as an ordered list. However, since a set term represents a set and since ordering and repetition of elements are immaterial in a set, the concatenation operation is enhanced with the properties of commutativity and idempotency [NT89] [CGK 90] [AGS92] . In this paper, we investigate the set-term matching problem. Generally speaking, setterm matching is considered as one-way set-term uni cation [KN86] [LC88] since one of the given pair of set terms consists of constants only. Since set-term matching is commonly used in deductive database systems for handling bottom-up computation, considering only setterm matching is not really a restriction [AGS92]. Indeed, the problem of set-term matching has been studied. [NT89] [CGK 90] adopted a rewriting technique that transforms a program with set terms into an equivalent program with standard terms only at compile time. However, this rewriting approach generates a very large target code. [AGS92] proposed two set-term matching algorithms that compute matchers for a given pair of set terms at run time. However, these algorithms, which also handle set terms with the commutative and idempotent properties, have several problems. One of these problems is that matchers are computed sequentially from a given pair of set terms and hence matchers cannot be computed in parallel. Another problem is that duplicated matchers are generated since these algorithms are implemented using a classical permutation algorithm which typically is applied to a set of distinct elements [MAK86]. Set-term matching problems described above must be solved in order to process queries in LDL=NR. LDL=NR is of interest because it captures the constraints of nested relations precisely. Further, since for each complex-object type there is a nested-relation type with the same \information capacity" [Hul86], LDL=NR can handle complex data-type queries. In Section 3 we propose two dierent set-term matching algorithms for LDL=NR. One of these algorithms generates matchers in parallel, and both algorithms generate matchers without duplicates. We compute the time complexity of these algorithms in Section 4. +
1
+
2 Basic De nitions 2.1 LDL/NR
LDL=NR is based on the notions of type, term, formula, rule, fact, and query, which in turn are de ned on an alphabet. Constants and variables in LDL=NR are of atomic type. From now on whenever we refer set-terms, we mean set-terms with the commutative and idempotent properties unless stated otherwise. 1
2
There are two other types: tuple type and set type, de ned in LDL=NR. Set type and tuple type can be used alternatively to form complex data types, as in [CC90] [CK91].
De nition 1 A type is recursively de ned as follows: (i) an atomic type is a type, (ii) a set type sfr g is a type, where r is either an atomic type or a tuple type, and (iii) a tuple type p(s ; : : :; sn ) is a type, where each si, 1 i n, is either an atomic type or a set type. 2 Example 1 The following type declaration de nes dept which is of tuple type with components Dname, projects, and employees. Dname is of atomic type, and projects, which includes all projects controlled by department Dname, and employees, which includes all employees who work for department Dname, are of set types. dept(Dname; projectsfPnameg; employeesfemployee(Ename; EID )g). 2 De nition 2 A term is recursively de ned as follows: (i) a constant is a term, (ii) a variable is a term, (iii) for a set-type sfrg, sft ; : ::; tmg (m 2 I ) is a term called set term, where each ti, 1 i m, is of type r, and (iv) for a tuple-type p(s ; : ::; sn), p(t ; : ::; tn) is a term called tuple term, where ti is of type si, 1 i n. 2 Example 2 An instance of the tuple-type dept is dept(cs, projectsfdb; se; lpg, employees femployee(smith; 123), employee(jones; 567), employee(snow; 201)g). 2 De nition 3 A (well-formed) formula is inductively de ned as follows: (i) a tuple term or set term is a formula, (ii) if T and S are terms of comparable types, then TS is a formula, where is an applicable (set) comparison operator, (iii) if F is a formula and X is a variable, then 9X F and 8X F are formulas, and (iv) if F and G are formulas, so are :F , F _ G, F ^ G, F ! G, and F $ G. 2 A tuple term, set term, or (set) comparison operator with arguments is an atomic formula or simply an atom. A ground formula (term) is a formula (term) without variables. De nition 4 A rule is of the form head : { body, where head is a positive atom, body is a conjunction of positive atoms, and : { is an implication symbol. A unit rule is a rule with an empty body. A fact is a positive unit rule. In a database, a nite set of facts forms an extensional database (EDB ). 2 Example 3 The following rule retrieves all employees who work on another project besides the db project: works on pj(Ename) : { employee(Ename, EID), works on(EID, wprojsfPname; dbg). 2 De nition 5 A program P is a collection of rules such that for each C 2 P , C is either a fact in EDB or a rule. A goal is a rule without a head, denoted ? ? body. A logic database query in LDL=NR consists of a program and a goal. 2 Example 4 Assume that a nested relational database, which consists of a nested relation parent and a at relation person, is represented by the following facts: 1
+
1
1
3
1
parent(john, childrenfchild(mary, 13), child(anthony, 10), child(bill, 5)g). person(john, male). person(mary, female). person(anthony, male). person(bill, male). and the rule R, father(X; Y ) : { person(X; male);parent(X; Y )., which determines each father X and X 's children. The logic database query, which consists of the program that includes R and the facts above and the goal, ? ? father(john; Z )., retrieves all the children of john. 2
2.2 Set-Terms in LDL/NR
In this subsection, we de ne the commutative and idempotent properties of set terms and set-term matchers in LDL=NR. The commutative property of a set term S in LDL=NR allows elements in S to be arranged in any order, while the idempotent property of S treats duplicated elements in S as a single element. Recall that a set term is treated as an ordered list.
De nition 6 A set term S in LDL=NR satis es the commutative property if given any two elements e and e , concat(e ; concat(e ; S)) = concat(e ; concat(e ; S)). A set term S in LDL=NR satis es the idempotent property if given any element e, concat(e; concat(e; S)) = concat(e; S). 2 As in other logic database languages for complex data structures, in LDL=NR the assignment of a term to a variable is called a binding, and a nite set of bindings is called a substitution. De nition 7 Let S and S be two set terms in LDL=NR. A substitution is a uni er for S and S if S = S . If either S or S is a ground term, then is a matcher for S and S . 2 Example 5 Let S = fa; bg and S = fX; Y; bg be two set terms in LDL=NR. Then, = fX=a; Y=ag, = fX=a; Y=bg, and = fX=b; Y=ag are all matchers of S and S . Thus, there can be more than one matcher for any given pair of set terms in LDL=NR. 2 As in [CC90], it is assumed that the satisfaction of sfa, b, cg by an LDL/NR interpretation I implies the satisfaction of sfX g by I , where X fa; b; cg. Similarly, the satisfactions of sfag and sfbg by I imply the satisfaction of sfa; bg by I . 1
2
1
1
1
2
1
2
2
1
2
2
1
2
1
2
1
2
2
1
3
1
2
3 Set-Term Matching Algorithms for LDL/NR In this section, we propose two set-term matching algorithms for LDL=NR and present the approaches for set-term matching adopted by these algorithms. The rst algorithm is simple, while the second one is a parallel algorithm. Both algorithms solve the duplicated matchers problem in [AGS92] that is caused by using the classical permutation approach. It is assumed that an input of our algorithms satis es the following condition, called matching condition, as in [AGS92]. 4
De nition 8 (Matching condition) Given any two set terms S and S in LDL=NR as input to our algorithms, S and S are of the following form: 1
1
2
2
m}|n z S = fa| ; {z ; am}; |am ; {z ; am n{}g m n +
1
1
+1
m p z }| S = fX ; ; X ; X m | {z } | m ; {z ; Xm m
+
{
+
2
1
+1
p}; |am
+
p
+1
; {z ; am n}g +
n
where ai (1 i m + n) denotes a constant, Xi (1 i m + p) denotes a variable, a ; : ::; am are constants in S but not in S , and am ; : ::; am n are common constants, i.e., constants appear in both S and S . It is further required that there exists at least one constant in S and at least one variable in S . Hence, m + n > 0, m + p > 0, and m; n; p 0. Constants a ; : ::; am are called m-compliant constants (or m-compliants). 2 1
1
2
1
+1
+
2
1
2
1
From now on whenever we refer two set terms S and S , S and S are assumed to satisfy the matching condition unless stated otherwise. 1
2
1
2
De nition 9 Given two set terms S and S , a subset (multiset or permutation) S of S is called an m-idempotent subset (multiset or permutation) of S if fa ; : ::; amg S. De nition 10 If the cardinality of a set, subset, multiset, or permutation S is l, then S is called an l-set, l-subset, l-multiset, or l-permutation, respectively. Example 6 Let S = fa; b; cg and S = fW; X; Y; Z; b;cg be two given set terms. Hence, m = 1, n = 2, and p = 3. fa; bg is an 1-idempotent 2-subset of S , and fa; b; b;cg is an 1-idempotent 4-multiset (permutation) of S and is a matcher of S and S . Any non-midempotent (m + p)-multiset (permutation) of S , e.g., fb; c; b; cg, is not a matcher of S and S . 2 1
2
2
1
1
1
1
2
1
1
1
2
1
1
2
3.1 A Simple Set-Term Matching Algorithm for LDL=N R
In this subsection, we propose a simple set-term matching algorithm (Algorithm 1) which generates all unique matchers of two given set terms S and S . These matchers are generated from the permutations of m-idempotent (m + p)-multisets of S (as shown in Proposition 2). These permutations are computed by using an improved permutation algorithm PA which computes all possible m-idempotent (m + p)-permutations of the m + n constants in S in lexicographical order of the indices of these constants. PA thus enhances the permutation algorithm in [AGS92] since the latter adopts a classical permutation algorithm which assumes that an input consists of distinct elements and hence generates duplicated permutations for computed multisets as a result. The process of constructing a permutation and eliminating all non-m-idempotent permutations is described below. 1
2
1
1
A k -multiset is often called k -selections in mathematics, where k denotes the number of (not necessarily distinct) elements in a multiset. This is initiated from the idea that given a set S , select k elements (which are not necessarily distinct) from S . 2
5
(a) Computing permutations. The following strategies are adopted for computing permutations in Algorithm 1: 1. Each component of a permutation is assigned the index of a constant from left to right. 2. The index of a constant in S can be assigned up to m + p times in a permutation. 3. Constants in S are chosen according to the lexicographical order of their indices, i.e., a constant ai is chosen before constant aj if i j. 1
1
(b) Eliminating non-m-idempotent permutations. Algorithm 1 detects the existence of a non-m-idempotent permutation p being constructed if one of the following two conditions holds: 1. The number of variables that have not been bound (denoted by the variable #Vars rem) is less than the number of m-compliant constants that have not been assigned to p (denoted by the variable #M com rem). 2. When #Vars rem = #M com rem, a non-m-compliant constant is being chosen to be assigned. Since each m-compliant in S must be assigned to a variable in S in order to yield a matcher (an m-idempotent permutation yields a matcher) of S and S , if Condition 1 or Condition 2 holds, then it is guaranteed that p cannot be a matcher of S and S . Hence, the construction process of p should be terminated immediately. 1
2
1
2
1
2
Algorithm 1 follows the steps discussed above to compute all unique m-idempotent (m + p)-permutations while at the same time eliminate all non-m-idempotent permutations.
Algorithm 1: A simple set-term matching algorithm. Input: Set terms S and S that satisfy the matching condition. Output: All unique matchers of S and S . var m, n, p : INT; /* m; n; and p of S & S */ perm : array[m+p] of 1..m+n; IsMCompliant : array[m+n] of 0..1; /* 1 denotes an m-compliant constant */ begin extract mnp(); /* extract m; n; and p from S and S */ for i := 1 to m do IsMCompliant[i] := 1; /* initialize m-compliants */ for i := m+1 to m+n do IsMCompliant[i] := 0; /* initialize non-m-compliants */ permute and eliminate(1, m); /* generate all m-idempotent permutations */ end /* Algorithm 1 */ 1
2
1
2
1
2
1
Procedure extract mnp() begin n := # of common constants in S and S ; m := jS j - n; p := jS j - m - n; end /* extract mnp */ 1
1
2
2
6
2
Procedure permute and eliminate(Curr comp : 1..m+p, #M com rem : 0..m) var WasMCompliant: 0..1; /* an assigned constant represented by its index */ #Vars rem : m+p..1; /* # of variables that have not yet been bound */ begin #Vars rem := m+p - Curr comp + 1; for i := 1 to m+n do if #Vars rem > #M com rem or IsMCompliant[i] = 1 then begin /* no violation of m-idempotency */ perm[Curr comp]:= i; /* assign the index of a constant to a component */ if Curr comp = m+p then /* a permutation has been generated */ get a matcher() /* compute a matcher by variable substitution */ else begin WasMCompliant := IsMCompliant[i]; IsMCompliant[i] := 0; /* a constant has been assigned */ permute and eliminate(Curr comp+1, #M com rem-WasMCompliant); IsMCompliant[i] := WasMCompliant; /* Restore previous assigned (non)-m-compliant for next level of permutations */ end end end /* permute and eliminate */ Procedure get a matcher() var X, a: STRING; matcher := ;; /* SET of X=a: X is a variable in S & a is a constant in S */ begin for i := 1 to m+p do begin X := extract variable(i); /* built-in predicate which extracts Xi 2 S */ a:=extract constant(perm[i]); /*built-in predicate that extracts aperm i 2 S */ matcher := matcher [ f X=a g; end print matcher; end /* get a matcher */ Example 7 Given S = fa; b; cg and S = fX; Y; Z; cg, the process of generating all midempotent (m + p)-permutations of S using Algorithm 1 is represented by the tree as shown in Figure 1. The node `aaa' in Figure 1 is not constructed since it does not satisfy m-idempotency. Further, since any permutation with two constants c as components cannot yield a matcher, the node `ccX' and all its descendant nodes are not generated. 2 2
1
2
[ ]
1
1
2
1
3.2 A Parallel Set-Term Matching Algorithm for LDL=N R
In Algorithm 1, permutations are generated directly from an input in a sequential manner. In this subsection, we propose a parallel set-term matching algorithm (Algorithm 2) for LDL=NR that rst generates in parallel all m-idempotent subset groups which are used to 7
? HH j
hx X
START XXX
An `a' is chosen. A `b' is chosen. A `c' is chosen.
PP
PP
m-idempotent non-m-idempotent unbound component
cXX aXX bXX aaX abX acX caX cbX ccX ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ ~ )
A
A ? AU
HH
H j H
?
A
A ? AU
PP P
PP q P
HH HH
HH H
?
?
A
A ? AU
aaa aab aac aba abb abc aca acb acc
A
A ? AU
HH j H
?
A
A
A ? AU
?
caa cab cac cba cbb cbc cca ccb ccc
Figure 1: The Process of Computing 2-idempotent 3-permutations for S = fa; b; cg. 1
compute all m-idempotent (m + p)-multiset groups G in parallel. Using G, all permutations P of G can be generated. Each permutation p in P yields a matcher of the given pair of set terms after applying variable substitution on each component of p. Prior to discuss the parallel approaches adopted by Algorithm 2, we give the de nitions of subset group and multiset group. De nition 11 Let S be a set term and 0 l jSj. A subset group Sl = fSl ; :: :; Snl g, where Sil S and jSilj = l, 1 i n. A multiset group Gl is of the following form: 8 9 > > M l; < = Gl = > ... > ; m l min(m + n; m + p); :M ; 1
1
l;k
where l denotes the number of distinct constants in each m-idempotent (m + p)-multiset Ml;j (1 j k ) of Gl . 2
3.2.1 Generating Multisets in Parallel Given two set terms S and S , in order to generate all m-idempotent (m+p)-multisets of S in parallel, we rst determine all m-idempotent subset groups Sm , Sm , : : :, S min m n;m p of S in parallel by choosing in between m to min(m + n; m + p) constants from S , with the inclusion of all m-compliants, and then generate each m-idempotent (m + p)-multiset group Gl from Sl , m l min(m + n; m + p), by appending constants chosen from the subset SB in Sl to SB, if necessary. Note that any subset group Sl , 0 l m ? 1 or l > min(m + n; m + p), is excluded. S l , 0 l m ? 1, is excluded since there does not exist any m-idempotent subset in Sl (and subsequently, any m-idempotent (m + p)-multiset generated from Sl ) with less than m elements. Sl , l > min(m + n; m + p), is excluded since 1
2
1
+1
1
(
(
+
1
8
A AU
+ ))
we cannot choose each constant in any subset of Sl at least once to generate m-idempotent (m + p)-multisets. Further, since each subset group Sl , m l min(m + n; m + p), can be generated in parallel , we can determine each Gl from Sl in parallel. 3
Example 8 Given S = fa; bg and S = fX; Y; Z; bg, m = n = 1 and p = 2. Choosing a 2 S and choosing both constants a and b in S yield the m-idempotent subset groups S = ffagg and S = ffa; bgg, respectively. G and G , the two m-idempotent (m + p)multiset groups generated from S and S , respectively, are given below: ( ) n o M = f a; b; a g ; G = M ; = fa; a; ag ; and G = M = fa; b; bg : ; In this example, since choosing constant a in S and replicating a m+p-1 times (i.e., twice) to yield M ; in G and choosing two constants a and b in S and replicating one of them to yield M ; (M ; ) in G can be done independently, G and G can be generated in parallel. Indeed, since a and b in S can be chosen independently, G can further be divided into two subgroups G a and G b , where G a = fM ; = fa; b; agg and G b = fM ; = fa; b; bgg and each subgroup can be generated in parallel. 2 Proposition 1 Given two set terms S and S , ! ! n m + p ? 1 l jG j = i ; p?i where m l min(m + n; m + p) and i = l ? m, i.e., 0 i min(n; p). Proof. Since each m-idempotent subset-group Sl has a number of subsets and each subset generates N m-idempotent (m + p)-multisets, jGlj = jSlj N: To determine which subset Sil 2 Sl satis es m-idempotency, there are three cases to be considered: Case l < m. No m-idempotent subset of Sl can be generated with less than m constants chosen from S . Case l = m. Choosing ! m constants a , : : :, am from S and none of am , : : :, am n from S yields n0 = 1 m-idempotent subset. 1
2
1
1
1
2
1
1
2
2
1
2
2 1
1 1
2 2
1
1
1 1
2 1
1
2
2 2
1
2
2
1
2
2
2
2
2 1
1
2 2
2
1
1
1
+1
+
1
Case m < l min(m + n; m + p). In this case, since we can choose l ? m constants from fam , : : :, a!m n g in addition to fa , : : :, am g to form m-idempotent subsets, there are l ?nm m-idempotent subsets. +1
+
1
9
Subset group Number of possible subsets Number of m-idempotent Number of m-idempo in a subset group subsets in a subset group ( m + p)-multisets from a !
S
1
S ...
2
S m?
m+n 1 ! m+n 2 ... ! m+n m?1 ! m+n m ! m+n m+1 ! m+n m+2 ... ! m+n min(m + n; m + p)
0
0
0 ...
0 ...
0 0 ! ! n m + p ? 1 Sm 0! p ! n m + p ? 1 m S 1! p?1 ! n m+p?1 Sm 2 p?2 ... ... ... ! ! n m + p ? 1 min m n;m p S min(n; p) ! p ? min(n; p) Pmin n;p n Pmin n;p m + p ? 1 Total i i i p?i Table 1: Number of m-idempotent subsets and m-idempotent (m + p)-multisets 1
+1
+2
(
+
+ )
(
=0
10
)
(
=0
)
n
!
! n ,0 i
Hence, the number of distinct m-idempotent l-subsets in Sl is l ? m = i min(n; p), as shown in the third column of Table 1. We now consider the number of m-idempotent (m + p)-multisets that can be generated from Sil , 0 i min(n; p). Given l distinct constants fb ; : : :; blg in an l-subset, m l min(m + n; m + p), we can assign each of these constants at least once (up to m+p-l times) over m+p components of an (m + p)-multiset. The number of these assignments is indeed the number of non-negative integer solutions of the equation x + : : :+ xl = m + p ? l, where xj denotes the number of bj (1 j l ) that has been chosen. This is similar to assigning in between 0 and m+p-l balls to each of the l dierent buckets. Thus, the number of these assignments are: ! ! (m + p ? l) + l ? 1 = m + p ? 1 : l?1 m+p?l ! m + p ? 1 Hence, the number of multisets of a given l-subset is , 0 i min(n; p), p?i as shown in the fourth column of Table 1, and the!number of m-idempotent (m+p)-multisets ! p?1 . 2 generated from an (m + i)-subset group is ni m + p?i 1
1
3.2.2 Generating Unique Matchers The proposed parallel set-term matching algorithmsolves the problem of duplicated matchers in [AGS92] as does Algorithm 1 by using an improved permutation algorithm instead of a classical permutation algorithm. There is, however, a dierence between the two improved permutation algorithms used in Algorithms 1 and 2, respectively. The permutation algorithm in Algorithm 2 accepts an m-idempotent (m + p)-multiset M as input and generates unique permutations of M as output, whereas the permutation algorithm in Algorithm 1 eliminates non-m-idempotent permutations, if they exist, while generating all m + p-permutations. The approach for eliminating duplicated permutations in Algorithm 2 is similar to that of Algorithm 1 in the sense that both algorithms use lexicographical ordering of the indices of constants in S . However, in Algorithm 2 (i) an inventory vector V is created that keeps track of the number of each distinct constant ai; 1 i m + n, in a multiset M that can be used for binding, and (ii) permutations of M are generated by choosing a constant c in lexicographical order such that the corresponding index value of c in V is greater than zero. Example 9 Given a 2-idempotent 3-multiset M = fa; b; ag, the initial inventory vector V of M is < 2; 1 > which denotes that there are two a's and one b, and every possible permutation of M is generated in lexicographical order of the indices. For example, fa; a; bg is generated before fa; b; ag. During the process of generating a permutation P from M , each time a constant ai (1 i m + n) is chosen as a component of P , the corresponding component of V that denotes the number of remaining ai's in M is decreased by one. The 1
Choosing i elements from a given set S to form S and choosing j elements from S to form S , where i 6= j , can be done independently. 3
i
11
j
1
START P = fX, X, Xg V = < 2; 1 >HH 6 H
2
P = fa, X, Xg V = < 1; 1 > Q
HH j
4
Q
7
Q
Q s Q
P = fa, a, Xg V = < 0; 1 > HH 3 H
P = fb, X, Xg V = < 2; 0 >
P = fa, b, Xg P = fb, a, Xg V = < 1; 0 > V = < 1; 0 > 5 8
HH j
P = fa, a, bg P = fa, b, ag P = fb, a, ag V = < 0; 0 > V = < 0; 0 > V = < 0; 0 > Figure 2: The Process of Computing 3-permutations for a 2-idempotent multiset fa, b, ag. process of generating all permutations for M is illustrated in Figure 2, where X stands for a component in a particular permutation that has not been assigned any value, and the number 1; : : :; 8 denotes the sequence of execution. This approach guarantees that no duplicated permutation is generated from M , while classical permutation algorithms generate six permutations (3!) of which three are duplicated. 2
3.2.3 Parallelism Revisited We can further enhance the approach of generating unique permutations discussed in the previous subsection to allow all the permutations to be computed in parallel. Since each constant can be chosen from a given set S^ in parallel, we can compute all permutations of S^ in parallel. Hence, Figure 2 shows that permutations (denoted by dierent subtrees at the same level) can be computed in parallel. For instance, P = fa; X; X g and P = fb; X; X g at the rst level of the tree can be computed in parallel, so as P = fa; a; X g, P = fa; b; X g and P = fb; a; X g at the second level. Each of the permutation P generated yields a matcher of S and S after substituting variables in S by the constants in P in order. Figure 3 illustrates the entire parallel approach adopted by Algorithm 2. Based on the approaches described above, we can solve the problem of generating matchers in a sequential order and the problem of duplicated matchers in [AGS92]. 1
2
2
3.2.4 Algorithm 2: A Parallel Set-Term Matching Algorithm In this subsection, we provide the parallel set-term matching algorithm. The algorithm rst computes all m-idempotent subset groups Sl of S in procedure get a subset group. Each Sl , except Sl if jSl j = m + p, is passed as an input to procedure comp multiset which computes all the m-idempotent (m + p)-multisets of S . Hereafter, each multiset is permuted in procedure get permutes and all permutations are transformed into matchers of 1
1
12
Permutation Generator X H HX H H
* Multiset Generator m + n 7 Gm H
m
Input S &S 1
2
HH j H
m+n
PP
m+1
PP q
@ J @ J @@ R J J J m+n min(m + n; m + p) JJ^
( ( ( ( ( aa ( X X a X X X X
+1
*
(
n;m p
+
-
-
matchers
1
matchers
2
: X X X X X X z
Gm
Gmin m
-
( (a( ( ( ( X X a a X X X X
-
( ( ( aa ( ( ( a X X X X X X
-
-
+ )
XXX z X
matchersi \ matchersj = , i 6= j
-
matchersn
+1
Figure 3: Parallelism in Algorithm 2 S and S in procedure get a matcher. In Algorithm 2, variable DepthOfRecur denotes the current component of a subset, multiset, or permutation S^ (recall that a subset, multiset, or permutation is an ordered list) to be assigned a constant (represented by an index), variable Size of Subset denotes the cardinality of the subset being constructed, and variable Min V al denotes the smallest index value that can be assigned to the current component of S^. Algorithm 2. Parallel set-term matching algorithm. Input: Set terms S and S that satisfy the matching condition. Output: All unique matchers of S and S . type ARRAY = array[m+p] of 1..m+n; var m, n, p : INT; subset, multiset, inventory, perm : ARRAY; 1
2
1
2
1
2
begin extract mnp(); /* extract m; n; and p from S & S , same as Algorithm 1 */ for i := m to min(m+n, m+p) do /* compute subset groups in parallel */ beginpar get a subset group(subset, multiset, inventory, perm, i, 1, 1); endpar end /* Algorithm 2 */ Procedure get a subset group(subset, multiset, inventory, perm : ARRAY, Size of Subset, DepthOfRecur, Min Val: INT) 1
13
2
var
Max Val := m+p - Size of Subset + DepthOfRecur; /* max index value */ #M com rem : 0..m; /* # of unassigned m-compliants */
begin for i := Min Val to Max Val do begin subset[DepthOfRecur] := i; if Size of Subset = DepthOfRecur then /* an m-idempotent subset exists */ begin #M com rem := comp rem MC(subset, Size of Subset); if Size of Subset = m+p then /* a multiset has been generated */ begin for i := 1 to Size of Subset do multiset[i] := subset[i]; /* copy subset to multiset */ inventory := get inventory vector(multiset); get permutes(multiset, inventory, perm, 1); end else if #M com rem = 0 then /* to compute an (m + p)-multiset */ comp multiset(subset, multiset, inventory, perm, Size of Subset, 1, 1); end else /* continue to generate a complete Size of Subset-subset */ get a subset group(subset, multiset, inventory, perm, Size of Subset, DepthOfRecur+1, i+1); end end /* get a subset group */ Function comp rem MP(subset:ARRAY, Size of Subset:m..min(m+n, m+p)): INT var #MP select := 0; begin for i := 1 to Size of Subset do if subset[i] m then /* an m-compliant is in subset */ #MP select := #MP select + 1; return (m-#MP select); /* # of remaining m-compliants in subset */ end /* comp rem MP */ Procedure comp multiset(subset, multiset, inventory, perm : ARRAY, Size of Subset, DepthOfRecur, Min Val : INT) begin for i := 1 to Size of Subset do multiset[i] := subset[i]; /* copy subset to multiset */ for j := Min Val to Size of Subset do begin multiset[Size of Subset + DepthOfRecur] := subset[j]; if Size of Subset+DepthOfRecur = m+p then begin /* an (m + p)-multiset */ inventory := get inventory vector(multiset); /* compute inventory vector */ get permutes(multiset, inventory, perm, 1); 14
end else
/* to generate an (m + p)-multiset */ comp multiset(subset, multiset, inventory, perm, Size of Subset, DepthOfRecur+1, j);
end end /* comp multiset */ Procedure get permutes(multiset, inventory, perm : ARRAY, DepthOfRecur : INT) var OldValue : 0..m+n; begin if DepthOfRecur > m+p then /* a permutation has been generated */ get a matcher(); /* generate a matcher, same as Algorithm 1 */ else /* assign components with constants */ for i := 1 to m+n beginpar /* compute permutations in parallel */ if inventory[i] > 0 then begin /* assign index */ perm[DepthOfRecur] := multiset[i]; OldValue := inventory[i]; inventory[i] := inventory[i] - 1; /* update inventory vector */ get permutes(multiset, inventory, perm, DepthOfRecur+1); inventory[i] := OldValue; /* Restore previous # of constants */ end endpar end /* get permutes */ Function get inventory vector(multiset : ARRAY) : ARRAY var accum := 0; inventory : ARRAY; begin Initialize all entries in inventory to zero; for i := 1 to m+p do inventory[i] := inventory[multiset[i]] + 1; for i := 1 to m+n do /* eliminate all zero entries in inventory by shifting non-zero entries to the left */ if inventory[i] = 0 then accum := accum + 1 else inventory[i-accum] := inventory[i]; for i := m+n+1 to m+p do inventory[i] := 0; /* reset all remaining slots to zero */ return inventory; end /* get inventory vector */ Proposition 2 Given S and S , let P = fp ; : : :; pm pg be a permutation that is generated from an (m + p)-multiset M = fm ; : : :; mm pg such that mi 2 S , 1 i m + p. A 1
2
1
1
+
+
15
1
substitution fX =p ; : ::; Xm p=pm pg is a matcher of S and S if and only if M satis es m-idempotency. 1
1
+
+
1
2
Proof. (If ) Let V = fX ; : : :; Xm pg be the set of variables in S , and let M be an (m + p)-multiset that satis es m-idempotency. Then, by De nition 9, M must include at least all m-compliant constants a ; : ::; am, and since P is generated from M by permutation, P also includes all m-compliants. We can generate a variable substitution = fX =p ; : ::; Xm p=pm pg. Hence, S = S , and by de nition is a matcher. (Only if ) Since = fX =p ; : : :; Xm p=pm pg is a matcher of S and S , S = S . Then, S ? fam ; : : :; am n g = fa ; : : :; am g. Hence, P must include at least fa ; : ::; amg, and thus satis es m-idempotency. Further, since P is a permutation of M , M must also satisfy m-idempotent. 2 Example 10 Let S = fa; bg and S = fX; Y; bg. A 2-multiset M = fa; bg of S and S satis es 1-idempotency. S fX=a; Y=bg = S and S fX=b; Y=ag = S . 2 1
+
2
1
1
1
+
+
2
1
1
2
+1
1
1
+
+
+
1
2
2
1
1
1
2
2
1
1
2
2
1
4 Complexity Analysis and Correctness of Set-Term Matching Algorithms 4.1 Complexity Analysis and Correctness of Algorithm 2
In this subsection, we present the time complexity and correctness of Algorithm 2.
Proposition 3 The number of subsets in a subset group generated by get a subset group is s 2 MAXsubset?group = n 2n : (1)
! n Proof. For each subset group Sl , procedure get a subset group generates i subsets, where 0 i min(n; p) and i = l ? m, which follows from the rst half of the proof for Proposition 1. Further, for any n, if min(n; p) = n, then the largest binomial coecient exists at n according to the binomial theorem [MAK86], as shown in Figure! 4. If n ! p < n, then the largest coecient also appears at n . If p < n , then nj < nn , ! n 0 j p. Therefore, MAXsubset?group = n=2 = n=n 2 . By Stirling's formula, np q n 2 ( ne ) n2 = n n n 2 . 2 n= n 2 pn ( 2e ) 2
2
2
2
2
!
(
!
(
2
2)!
2
2)!
Proposition 4 The number of multisets generated from a subset using comp multiset is s 2 m p? : MAXmultisets = (2) (m + p ? 1) 2 +
16
1
. ! n 250 . . . . . . . . . . . p. . d. 2 .e . . . . . . . . ,l i , .. l n
6
200 . 150 . 100 . 50 . 0
-5
n
n
. ll . . . . . . . . . . . . . . . . . . . . . ... . . . . A. . . . . . . . . . . . . . . . . A . . A n = 10 . . . . . . . . . . . . . . . . . . . . . . . . . . . . .A.A . . . . . . . . . . . . . . . A . A . " b n = 9
. " b . . . . . . . . . . . . . .". . . . . . . . . . b. . . . . . J. J. . . . . . . . . . . . " b .
? J @ . n XXX= 8 @
? J . . . . . . . . . . ?. . . . . .. . . . . . .. . . .X. H. H .@ . . . . . .JJ. . . . . . . . . b ? .HHnH= 7 H@ HH b n = 6 ! H H . X ! bb H X HH ! XXX H ! X . H ( ( h h X ( ( h X ( ( h X H X ( h h ( h h XX hh ( . h hh , ,
-4
-3 -2 -1 d n e +1 +2 +3 Figure 4: Distribution of binomial coecients 2
+4
. . . .
+5
-
i
! m + p ? 1 Proof. For each subset procedure comp multiset generates multisets, p?i where 0 i min(n; p) and i = l ? m, which follows from ! the second half of the proof m + p ? 1 for Proposition 1. Hence, MAXmultisets = by the binomial theorem, and m p? applying Sterling's formula to MAXmultisets yields formula (2). 2 Sil ,
+
1
2
Proposition 5 To generate any permutation from an (m + p)-multiset, the number of recursive calls of get permutes is MAXpermutation = m + p: (3) Proof. Since given any (m + p)-multiset M procedure get permutes always makes the same number of recursive calls for generating a permutation of M , MAXpermutation = m + p. 2 Theorem 1 The time complexity of Algorithm 2 is
0 00s 1 s 2 O @(m + p) @@ n 2nA (m + p2 ? 1) 2m
+
p?
! 1
11 + (m + p)AA :
(4)
Proof. A subset group consists of a number of subsets, each of which generates a set of multisets M . Each multiset in M is used for computing a number of permutations in parallel by using procedure get permutes, and each permutation can be generated by the same number of recursive calls. Further, each permutation requires m + p unit time to generate a matcher by a variable substitution for m+p variables. Hence, the time complexity of Algorithm2 is O ((m + p) (MAXsubset?group MAXmultisets + MAXpermutation)) which is formula (4). 2 17
Theorem 2 Algorithm 2 is sound and complete. Proof Algorithm 2 generates all m-idempotent (m + p)-multisets which are permuted to generate all the permutations that are transformed into matchers by variable substitutions. Hence, the algorithm generates all matchers by Proposition 2. 2
4.2 Complexity Analysis and Correctness of Algorithm 1 Theorem 3 The time complexity of Algorithm 1 is
O (m + p)
m X i o
(?1)i
=
! !! n (m + n ? i)m p : i +
Proof. The time complexity of Algorithm 1 is the same as the time complexity of the Optimal Set-term Matching Algorithm in [AGS92]. For details of the proof, see [AGS92]. 2 Theorem 4 Algorithm 1 is sound and complete. Proof. Algorithm 1 generates all possible permutations of (m + p)-multisets and eliminate all non-m-idempotent permutations. Hence, the algorithm generates all matchers by Proposition 2. 2
5 Summary and Future Works In this paper we have proposed two set-term matching algorithms for LDL=NR which generate non-redundant matchers for a given pair of set terms. The proposed parallel algorithm further enhances the set-term matching algorithms in [AGS92] by computing matchers in parallel. These two set-term matching algorithms have been implemented. For future works, we would like to study the set-term uni cation problem in LDL=NR, and provide set-term uni cation algorithms for LDL=NR which generate uni ers in parallel with no duplicates.
References
[AGS92] N. Arni, S. Greco, and S. Sacca. Set-Term Matching in Logic Programming. In International Conference on Database Theory, pages 436{449, 1992. Lecture Notes in Computer Science, 646. [CC90] Q. Chen and W.W. Chu. Deductive and Object-Oriented Database, chapter HILOG: A High-order Logic Programming Language for Non-1NF Deductive Databases. Elsevier Science Publishers, 1990. W. Kim, J. Nicolas, and S. Nishio (Editors). [CGK 90] D. Chimenti, R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur, and C. Zaniolo. The LDL System Prototype. IEEE Transactions on Knowledge and Data Engineering, 2(1):76{89, March 1990. +
18
[Cha93]
Q. Chang. A Logic Database Language for Nested Relations in Partitioned Normal Form. Master's thesis, Brigham Young University, Provo, Utah, December 1993. [CK91] Q. Chen and Y. Kambayashi. Nested Relation Based Database Knowledge Representation. In Proceedings of 1991 ACM SIGMOD International Conference on Management of Data, pages 328{337. ACM, 1991. [CN94] Q. Chang and Y-K. Ng. Built-in Predicates for Extended Relational Algebra Operations in a Logic Database Language. Submitted for Publication, 1994. [Hul86] Richard Hull. Relative Information Capacity of Simple Relational Database Schemata. SIAM Journal of Computing, 15(3):856{886, August 1986. [KN86] D. Kapur and P. Narendran. NP-Completeness of the Set Uni cation and Matching Problems. In Proceedings of 8th International Conference on Automated Deduction, pages 489{495, 1986. Springer-Verlag. [LC88] P. Lincoln and J. Christian. Adventures in Associative-Commutative Uni cation. In Proceedings of 9th Internet Conference on Automated Deduction, pages 359{367, 1988. [MAK86] J. L. Mott and T. P. Baker A. Kandel. Discrete Mathematics for Computer Scientists and Mathematicians. Prentice-Hall, Englewood Clis, NJ, 1986. [NT89] S. Naqvi and S. Tsur. A Logical Language for Data and Knowledge Bases. Computer Science Press, Rockville, MD, 1989. [Sie89] J. H. Siekmann. Uni cation Theory. Journal of Symbolic Computation, 7:207{ 274, 1989. [SS94] L. Sterling and E. Shapiro. The Art of Prolog, Second Edition. The MIT Press, Cambridge, MA, 1994.
19