Membership-Constraints and Complexity in Logic Programming with Sets

Frieder Stolzenburg
Universität Koblenz · Institut für Informatik · Rheinau 1 · D-56075 Koblenz · Germany
E-mail: [email protected]

Abstract. General agreement exists about the usefulness of sets as very high-level representations of complex data structures. Therefore it is worthwhile to introduce sets into constraint logic programming, or set constraints into programming languages in general. We start with a brief overview of different notions of sets. This seems to be necessary since there are almost as many different notions in the field as there are applications, e.g. program analysis, rapid software prototyping, and unification-based grammar formalisms. An efficient algorithm for treating membership-constraints is introduced. It is used in the implementation of an algorithm for unifying finite sets with tails – also presented here – which is needed in any logic programming language embedding sets. Finally it is shown how a full set language including the operators ∈, ∉, ∩, ∪ can be built on membership-constraints. The text closes with a reflection on the complexity of the different algorithms – which is single exponential – showing the efficiency of our new algorithm. It illustrates the usefulness of exploiting constraint logic programming techniques for the implementation of a set constraint language.

Keywords. Constraint logic programming, logic programming with sets, set unification, data structure sets.

1 Introduction

Implementations of logic programming languages have terms as their main data type. But often it may be useful and more natural to represent objects by sets instead of simple terms. General agreement exists about the usefulness of sets as very high-level representations of complex data structures. In particular, sets can be conveniently used in rapid software prototyping. Therefore it is worthwhile to introduce sets into logic programming. Of course there are many more applications for sets; we will list them together with different notions of sets later on in the text.

1.1 The Data Structure Sets

We will first recall some definitions of the (technical) term set which are common in the literature and used in applications. Various systems of set constraints have been set up for the purpose of axiomatising (Zermelo-Fraenkel) set theory and proving theorems in it. However, in practice it seems necessary to restrict the discussion to a fragment of the theory such that problems involving set constraints remain computable and reasonably efficient procedures exist. We will now define and discuss several notions of sets very briefly.

(1) In [10] ground sets are considered. These are finite subsets of the (finite) Herbrand universe, i.e. sets of ground terms or integers. This restricted notion of set leads to efficient algorithms. Set domain variables are introduced, i.e. variables which are attached with a set of ground sets. Such a set is approximated by its greatest lower and least upper bound with respect to set inclusion. This allows us to exploit efficient constraint satisfaction techniques (arc consistency) [23] for the basic set relation constraints inclusion and disjointness of ground sets. Application domains for ground sets are combinatorial problems based on sets, relations or graphs, e.g. set partitioning and bin packing.

(2) Regular sets are sets describable by a finite tree automaton or, equivalently, by a system of simultaneous set equations of a certain form. The authors of [1, 5] consider them for set-based program analysis in both the imperative and the logic programming settings. Only monadic (set) properties of program variables are considered; all interdependencies are ignored. In this context there is only one type of object, namely (possibly infinite) sets of ground terms described by set expressions, no explicit membership predicate, but set inclusion and its negation.

(3) A hereditarily finite set is a set of finite depth that is finite, whose members are finite, the members of whose members are finite, etc. Such sets are considered in a French project on constraint logic programming with sets [15] which aims at rapid prototyping of combinatorial problems with sets, multisets and sequences, writing executable specifications and software modelling, and which also makes use of partial constraint consistency techniques.

(4) We may consider finite sets over arbitrary terms, i.e. including variables. They may be (finitely) nested. These sets are used in unification theory, set and associative-commutative-idempotent unification problems, theorem proving and logic programming with sets [13, 20]. Here e.g. the expression {x, y} may denote a set consisting of one or two entities. This is in the nature of sets, of course, but we cannot know in advance whether x and y become identical or not because they are variable terms.

(5) In the sequel we will mainly consider sets with tails s = {x1, ..., xk | t}, understood as the union {x1, ..., xk} ∪ t; here t is a variable for a set, called the tail of s. These sets may be (finitely) nested too. They are well-suited for the extension of logic programming with sets [7, 8, 21] by exploiting constraint logic programming techniques.

(6) Hypersets [2] are rational, i.e. possibly cyclic, hereditarily finite sets. If this concept is combined with feature structures as done in [17, 16] then it is possible to implement unification-based grammar formalisms, especially head-driven phrase structure grammar [18]. There, sets are used in the treatment of so-called unbounded dependency constructions, relative clauses, questions, linguistic quantifiers and anaphora resolution.

(7) In [6] it is shown that the combination of constraint logic programming with sets and constructive negation opens up the possibility of representing a general class of intensionally defined sets, i.e. sets that are defined by the properties of their elements, not extensionally by enumerating the elements of the sets. The presence of intensional sets leads to an increase in the expressive power and abstraction level offered by the host logic language.

1.2 Notions and Notations

In our context we are mainly concerned with finite sets with tails. Therefore we assume that the signature of our language is endowed with two function symbols: ∅ for the empty set, and {·|·} as the set constructor, where {x|s} stands for {x} ∪ s in usual notation. Furthermore {x, y|s} is an abbreviation of {x|{y|s}}, and {z} of {z|∅}. The following properties hold:

(1) {x, y | s} = {y, x | s}   (permutativity)
(2) {z, z | s} = {z | s}      (absorption)

This set theory corresponds to standard set theory as proven in [8]. In that paper it is also shown how logic programming with sets fits into the framework of constraint logic programming [11]. For this we can fix the set of constraint predicate symbols of our scheme to be ∈ (membership), ∉, = and ≠, and take our set unification algorithm based on membership-constraints as a constraint simplification algorithm. As interpretation domain we may choose a hereditarily finite universe over the Herbrand domain. However, so-called urelements are disallowed. But another view is possible from the standpoint of unification theory [4]. We can take the equivalence classes of the finest equivalence relation over the Herbrand domain that fulfils the above properties as interpretation domain. Then we have to perform set theory unification. Our set theory is finitary, i.e. for two sets there may be finitely many different and most general unifiers in general. We will consider the general unification problem here where additional function symbols of arbitrary arity may occur.

Finite sets with tails shall be used on argument positions in (logic) programs. If we query the system with e.g. {x, y} = {a, b}, the system will answer with two variable bindings, namely [x ← a, y ← b] and [x ← b, y ← a], which represent the solutions of the corresponding set unification problem. It is also useful to introduce more operators into the programming language, namely ∈ and ∉. Furthermore set union ∪ and intersection ∩ may be desirable. There are constraint canonization algorithms like the ones stated in [8, 16]. Both need an algorithm for set unification but they do not develop algorithms that are reasonably efficient in the average case. In this paper we will concentrate on the implementation of an algorithm for set unification which is needed in any logic programming language embedding (finite) sets. Before we address this problem we have to give a precise definition of set unification.

Definition 1. A substitution σ is called a unifier of two sets A = {x1, ..., xm} and B = {y1, ..., yn} iff for every x in A there exists some y in B (and vice versa) such that xσ = yσ [12], or (stated differently) Aσ = Bσ, i.e. Aσ ⊆ Bσ and Bσ ⊆ Aσ.

1.3 Overview of the Paper

In the following we will present an algorithm for sets with tails that behaves efficiently in the average case. Firstly, the case where the set tails are empty can be treated by so-called membership-constraints, which can be seen as a tricky variant of the predicate member/2 of Prolog using non-unifiability constraints [22]. Constraint techniques such as delayed execution and the first-fail principle lead to even more improvements. Secondly, it will be shown how the algorithm can be lifted to the general case with possibly non-empty set tails.

The algorithm compares very well with other approaches since it avoids the computation of many redundant solutions and has good run-time performance. Although it uses ideas from constraint logic programming with finite domains [23] and generalized propagation [14], it goes beyond that. The algorithm can be easily implemented in logic programming languages that provide delay mechanisms and explicit control of delayed goals. A prototype has been implemented in ECLiPSe-Prolog [9].

2 Membership-Constraints

In this section we define two primitive predicates, namely memb/2 and wake/1, that help us deal with membership-constraints. The semantics of memb/2 shall be the same as that of member/2 in Prolog. The difference is that memb/2 is treated as a constraint. Membership-constraints are a means for unifying sets without tails.

2.1 Definitions

We will call C = memb(x, L) a membership-constraint where x is a term that will be unified with (at least) one element of the list L which represents a set. Only a complete list L is admissible, i.e. its tail must be the empty list [ ]. By |L| we denote the length of the list L. If L = [y1, ..., yn] we can also view C as the disjunction (x = y1 ∨ ... ∨ x = yn). We look for a substitution σ that solves one of the disjuncts, i.e. which makes xσ identical with some ykσ (1 ≤ k ≤ n). We need the following definition in the algorithm stated next.

Definition 2. A generalizer of two terms t1 and t2 is a mapping γ from terms into variables (called an anti-substitution) such that t1γ = t2γ. γ is called the most specific generalizer of t1 and t2 iff for any other generalizer δ of t1 and t2 there is an anti-substitution ε such that t1δ = t1γε (or t2δ = t2γε, equivalently).

A membership-constraint C = memb(x, L) where L = [y1, ..., yn] can be simplified by the following Algorithm A. Its implementation in Prolog is sketched in Figure 1. The predicate simplify/3 implements the membership-constraint simplification procedure. It sets up and simplifies the constraint memb(Term,List). If at the end the list List contains more than one element then either the constraint is delayed (Mode=memb) or the term Term is unified with one element of the list (Mode=wake). The mode depends on the context in which simplify/3 is called. The auxiliary predicate extract(Term,List,Rest) has the following semantics: the list Rest (third argument) consists of all those elements of the list List (second argument) which are unifiable with the term Term (first argument); the predicate fails if there is an element in the list which is identical with the term Term. A natural language description of the algorithm follows.

(1) If there is a y ∈ L with x = y (syntactical identity), then C is already solved by this y.
(2) Otherwise remove all y ∈ L which (a) are not unifiable with x or (b) are identical duplicates of other terms in L. Let L' be the list of remaining elements.
(3) If L' = [ ] then the computation is stopped here. Backtracking has to be initiated.

simplify(Mode,Term,List) :-
    ( extract(Term,List,Rest) ->
        Rest \== [],
        sort(Rest,Sort),                                  % (a)
        ( Sort = [Single] ->
            Term = Single
        ;   anti_unify_all(Sort,Term),                    % (b)
            ( Mode == memb -> suspend(memb(Term,Sort))    % (c)
            ; Mode == wake -> memb(Term,Sort)             % (d)
            )
        )
    ;   true
    ).

extract(_,[],[]).
extract(Term,[Head|_],_) :-
    Term == Head, !, fail.
extract(Term,[Head|Tail],Rest) :-
    Term \= Head, !,
    extract(Term,Tail,Rest).
extract(Term,[Head|Tail],[Head|Rest]) :-
    extract(Term,Tail,Rest).

Remarks: (a) The built-in predicate sort/2 sorts a list of Prolog terms and removes duplicates. – (b) The anti-unification (generalization) of all elements in the list Sort is implemented by library functions. – (c) The predicate suspend/2 can be realized by the predicate delay/2 in ECLiPSe, better known as freeze/2 in other dialects. – (d) The definition of memb/2 has to be taken from Algorithm B. – The expression (If -> Then ; Else) is the if-then-else construct in Prolog.

Fig. 1. Simplification Procedure in Prolog

(4) If L' = [y] then x and y have to be unified. Their most general unifier σ is applied to all relevant terms occurring in the problem.
(5) If |L'| > 1 then x is unified with the (most specific) generalization (also called anti-unification) of all y ∈ L', yielding x'. It replaces each occurrence of x in the problem.
(6) After that the membership-constraint C' = memb(x', L') is set up and will be solved later on. The old constraint C is discarded.

Proposition 3. Algorithm A is correct and complete with respect to the satisfiability of membership-constraints.

Proof. The correctness of the procedure is obvious. So we will only provide some remarks on the completeness of the procedure. Firstly, completeness is preserved in step (1) because we are only interested in the most general solutions of the constraint C, and the empty substitution which solves the disjunct x = y is more general than any other substitution. Secondly, step (2b) is justified by a similar argument. Finally, since x must eventually be unified with one y ∈ L' in order to solve the constraint, x may be unified with the above-mentioned generalization in step (5). □
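For illustration, here is a minimal sketch of term anti-unification (the computation of a most specific generalizer) as used in step (5). The predicate name anti_unify/3 and its bookkeeping of already generalized pairs are assumptions of this sketch; it is not the library routine referenced in Figure 1.

    % anti_unify(+T1, +T2, -G): G is a most specific generalization of T1 and T2.
    % A table of already generalized pairs of subterms guarantees that equal
    % pairs are always mapped to the same fresh variable.
    anti_unify(T1, T2, G) :-
        anti_unify(T1, T2, G, [], _).

    anti_unify(T1, T2, G, Tab, Tab) :-
        T1 == T2, !,
        G = T1.                                 % identical subterms stay as they are
    anti_unify(T1, T2, G, Tab0, Tab) :-
        nonvar(T1), nonvar(T2),
        T1 =.. [F|Args1], T2 =.. [F|Args2],
        length(Args1, N), length(Args2, N), !,
        anti_unify_args(Args1, Args2, GArgs, Tab0, Tab),
        G =.. [F|GArgs].                        % same functor: descend into arguments
    anti_unify(T1, T2, G, Tab, Tab) :-
        member(p(S1, S2, V), Tab),
        S1 == T1, S2 == T2, !,
        G = V.                                  % pair seen before: reuse its variable
    anti_unify(T1, T2, V, Tab, [p(T1, T2, V)|Tab]).   % otherwise a fresh variable

    anti_unify_args([], [], [], Tab, Tab).
    anti_unify_args([A|As], [B|Bs], [G|Gs], Tab0, Tab) :-
        anti_unify(A, B, G, Tab0, Tab1),
        anti_unify_args(As, Bs, Gs, Tab1, Tab).

For example, anti_unify(f(a, b, a), f(c, b, c), G) yields G = f(X, b, X) for a fresh variable X.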

2.2 Constraint-Based Treatment

Delayed membership-constraints have to be solved eventually. This is done when a constraint C is woken implicitly or explicitly. If a variable in C is instantiated or unified with another term then it is woken implicitly. Algorithm A is applied to C so that we get a better approximation of this constraint. The predicate wake(C) shall allow the user to wake up a delayed membership-constraint explicitly. By this x is unified with one y ∈ L. A call wake(C) chooses one of the delayed constraints in a non-deterministic fashion. The well-known first-fail principle can serve as a heuristic for that choice, i.e. the constraint memb(x, L) with smallest |L| should be chosen. Also it is a good idea to treat those constraints first which share the most variables with other constraints. See also [23].

Now we want to present another effective optimization. A membership-constraint can be viewed as the disjunction shown in (1) below. We observe that (1) is equivalent to (2), where ≠ denotes the non-unifiability constraint stating that x and y shall never become identical. This can be implemented via a delaying disequality predicate as offered by some Prolog dialects (such as dif/2). It is clear that version (2) can avoid a lot of redundant solutions. In (3) it is shown how to code this optimization into a real logic program for membership-constraints; an executable rendering of (3) is sketched right after the formulae. We will refer to it as Algorithm B.

(1) x = y1 ∨ x = y2 ∨ ... ∨ x = yn
(2) x = y1 ∨ (x = y2 ∧ x ≠ y1) ∨ ... ∨ (x = yn ∧ x ≠ yn−1 ∧ ... ∧ x ≠ y1)
(3) memb(x, [x | L]).
    memb(x, [y | L]) ← x ≠ y ∧ memb(x, L).
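As a concrete rendering of (3), the following sketch uses dif/2 – the delaying disequality available in several Prolog systems – as the non-unifiability constraint; the paper's ECLiPSe prototype uses the corresponding delaying built-in of that system instead, so this choice is an assumption of the sketch.

    % Algorithm B as executable Prolog, with dif/2 standing in for the
    % delaying non-unifiability constraint x ≠ y.
    memb(X, [X|_]).
    memb(X, [Y|L]) :-
        dif(X, Y),        % X may only be unified with a later element if it
        memb(X, L).       % can never become identical to Y

With this definition the query memb(X, [a, b, a]) yields exactly the two answers X = a and X = b; the duplicate a in the list does not produce a redundant third solution because dif(X, a) has already been posted when the second occurrence is reached.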

Proposition 4. The formulae (1) and (2) from above are equivalent.

Proof. It is obvious that (2) implies (1). For the other direction of the proof, let us assume that (1) holds. Then one of the disjuncts of (1) holds; strictly speaking, a non-empty subset D of the disjuncts of (1) holds. Now let x = yk be the disjunct in D with the smallest k (1 ≤ k ≤ n). Then the solution of the disjunct (x = yk ∧ x ≠ yk−1 ∧ ... ∧ x ≠ y1) of (2) will be identical with or more general than the solution of D. Since we are only interested in most general solutions, the completeness of the algorithm is preserved in this pruning step. □

2.3 How to Encode Set Unification

We can express set unification by means of membership-constraints as follows. If we want to unify the sets A and B (represented as the Prolog lists [x1, ..., xm] and [y1, ..., yn] respectively) then we have to solve the membership-constraints in (1). This follows directly from the definition of set unification. The code for automatically reducing unification of finite sets to membership-constraints is shown in (2) and (3); a directly runnable version with an example query is sketched below. We will call the whole procedure Algorithm C. It is sound and complete for set unification without tails.

(1) memb(x1, B) ∧ ... ∧ memb(xm, B) ∧ memb(y1, A) ∧ ... ∧ memb(yn, A)
(2) unify_sets(A, B) ← subset_of(A, B) ∧ subset_of(B, A).
(3) subset_of([ ], _).
    subset_of([x | R], L) ← memb(x, L) ∧ subset_of(R, L).
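Rendered as plain Prolog on top of the memb/2 sketch above, Algorithm C reads as follows; the predicate names unify_sets/2 and subset_of/2 are taken from (2) and (3), while the plain list representation of the sets is an assumption of the sketch.

    % Algorithm C: two lists, read as finite sets without tails, are unified
    % by requiring mutual element-wise containment via membership-constraints.
    unify_sets(A, B) :-
        subset_of(A, B),
        subset_of(B, A).

    subset_of([], _).
    subset_of([X|R], L) :-
        memb(X, L),
        subset_of(R, L).

For instance, unify_sets([X, Y], [a, b]) enumerates exactly the two unifiers X = a, Y = b and X = b, Y = a, matching the example from Section 1.2.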

3 Sets with Tails

Now we will consider the unification of sets with tails and formulate a generalized unification algorithm based on Algorithm C.

3.1 The Unification Algorithm

The algorithm is based on a case distinction. The set tails may both be the empty set (where it reduces to Algorithm C), or exactly one of them is the empty set; if both tails are variables, then we treat the case where both are identical separately in order to improve efficiency. The new algorithm works as follows:

Algorithm D. A unifier of two sets {x1, ..., xm | s} and {y1, ..., yn | t}, where the set tails s and t may be variables or the empty set, can be computed by the following non-deterministic procedure:

(1) Let A, A', B and B' be finite sets (without tails) such that A ⊎ A' = {x1, ..., xm} and B ⊎ B' = {y1, ..., yn}, where ⊎ denotes the union of disjoint sets. If s = ∅ then it must be B' = ∅; if t = ∅ then it must be A' = ∅.
(2) Let σ be a unifier of A and B, computed by Algorithm C. If the sets are not unifiable then the computation stops here with failure. Backtracking is initiated then.
(3) Let A'', B'' and C be finite sets (without tails) such that A'' ⊎ B'' ⊎ C = Aσ (or equivalently Bσ). If s = ∅ then B'' = ∅; if t = ∅ then A'' = ∅. If s and t are identical variables then this step is omitted. This step prunes a lot of branches because of the disjoint union.
(4) If s and t are identical variables then let σ' be defined by the unification equation system [s = t = A'σ ∪ B'σ ∪ N]. Otherwise σ' is defined by [s = B'σ ∪ B'' ∪ N, t = A'σ ∪ A'' ∪ N]. N is a new variable in this context. If s = ∅ or t = ∅ then it is N = ∅.
(5) If all steps can be executed successfully then θ = σσ' is a unifier of the given sets.

In Figure 2 the algorithm is stated more formally as pseudo-code. We will soon present some examples that shall clarify how it works. But before that we show the correctness and completeness of the algorithm.

Theorem 5. Algorithm D computes a correct and complete set of unifiers and is always terminating.

Proof. We will only prove the most complex case where the tails s and t are distinct variables. The other cases can be shown in a similar manner. The correctness proof is straightforward and therefore omitted here. The algorithm computes a complete set of unifiers, i.e. for every unifier θ of the given sets there is a unifier computed by the algorithm that is more general than θ. Let θ be a unifier of the two sets {x1, ..., xm | s} and {y1, ..., yn | t}. During step (1) put all x ∈ {x1, ..., xm} for which there exists a y ∈ {y1, ..., yn} with (∗) xθ = yθ into A; similarly, put all y ∈ {y1, ..., yn} for which there exists an x ∈ {x1, ..., xm} with (∗) into B. For all other x and y, let x ∈ A' and y ∈ B', respectively. – Because of the completeness of Algorithm C, a substitution σ that is more general than θ and satisfies (∗) can be computed in step (2) of Algorithm D.

INPUT:  two sets {x1, ..., xm | s} and {y1, ..., yn | t}
OUTPUT: unifying substitution θ
VARS:   A, B, C and derivatives: finite sets without tails
        N: set (tail) variable

BEGIN
(1)  IF s = ∅ THEN B' := ∅;
     IF t = ∅ THEN A' := ∅;
     A ⊎ A' := {x1, ..., xm};  B ⊎ B' := {y1, ..., yn};
(2)  σ := [A = B];
     IF s ≠ t THEN
(3)     IF s = ∅ THEN B'' := ∅;
        IF t = ∅ THEN A'' := ∅;
        A'' ⊎ B'' ⊎ C := Aσ (= Bσ);
(4)     σ' := [s = B'σ ∪ B'' ∪ N, t = A'σ ∪ A'' ∪ N]
     ELSE
        σ' := [s = t = A'σ ∪ B'σ ∪ N];
(5)  θ := σσ'
END

Remarks: (a) A ⊎ B := C partitions the set C non-deterministically into the sets without tails A and B. If one of the latter sets has been defined earlier as the empty set ∅, then the other one becomes identical with C. Thus ⊎ denotes disjoint union. – (b) [x = y] denotes one of the solutions of the unification equation system containing x = y. – (c) Steps (1) and (2) may be totally interleaved for efficiency reasons in order to avoid too much generating and testing. – (d) In step (2) any unification algorithm for finite sets without tails can be used.

Fig. 2. Main Algorithm as Pseudo-Code
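The only genuinely non-deterministic operation in Figure 2 is the disjoint partitioning ⊎ of remark (a). A minimal sketch of how such a partition can be enumerated in Prolog is given below; the predicate name partition/3 is an assumption of this sketch.

    % partition(Xs, As, Bs): each element of the list Xs is put into exactly
    % one of As and Bs; on backtracking all 2^|Xs| disjoint partitions appear.
    partition([], [], []).
    partition([X|Xs], [X|As], Bs) :- partition(Xs, As, Bs).
    partition([X|Xs], As, [X|Bs]) :- partition(Xs, As, Bs).

In step (1) this corresponds to guessing A ⊎ A' = {x1, ..., xm}; the 2^m choices mentioned later in the proof of Theorem 6 are exactly the answers of partition/3 on an m-element list.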

If (∗) holds and at most one of the conditions (†) xθ ∈ tθ and yθ ∈ sθ holds for some x and y, then we can choose x ∈ B'' or y ∈ A'', respectively, in step (3). Otherwise, if both conditions (†) hold, then a unifier that is more general than θ contains a solution of [s = {y | R1}, t = {x | R2}], where R1 and R2 are the remainders of the sets in question. However, an even more general solution can be computed by the algorithm, namely one that contains a solution of [s = R1, t = R2] in step (4).

The algorithm is always terminating. – If at all, the only source for infinite loops is in step (4) where the equation system [s = B'σ ∪ B'' ∪ N, t = A'σ ∪ A'' ∪ N] has to be solved. The only critical case is where s or t are not variables. That can only happen if s or t are tails of (nested) sets occurring in the original problem. Since there can only be finitely many such tails and every computed σ in each recursive step eliminates at least one such tail variable, the computation will terminate. □

3.2 Example Unifications

Let us now consider the two sets {x1, x2 | t} and {c1, c2 | t} with identical variable tails t, where x1 and x2 are variables and c1 and c2 are constants. In step (1) of Algorithm D we choose {x1, x2} = {x1, x2} ⊎ ∅ and {c1, c2} = {c1} ⊎ {c2}. A unifier of {x1, x2} and {c1}, computed by Algorithm C in step (2), is σ = [x1 ← c1, x2 ← c1]. We omit step (3) because the tails of the sets are identical variables. In step (4) we have to solve the equation t = ∅ ∪ {c2} ∪ N where N is a new variable. We get σ' = [t ← {c2 | N}] as its solution. Thus θ = σσ' = [x1 ← c1, x2 ← c1, t ← {c2 | N}] is one of the most general unifiers of the two given sets. If we take different choices we will find further solutions. They coincide with the minimal set of most general unifiers for this example.

We do not hesitate to give two more examples. The first is {x, y | s} = {z} where all identifiers denote variables. Here step (1) implies A' = ∅ and hence A = {x, y}. This enforces B = {z} and B' = ∅, since otherwise in step (2) A = B would not be solvable. Step (2) yields σ = [x ← z, y ← z] then. In step (3) A'' = ∅ is constrained. Let us choose B'' = {z} and thus C = ∅. Finally, step (4) leads to σ' = [s ← {z}]. So the overall solution is θ = [x ← z, y ← z, s ← {z}]. There is another solution θ' = [x ← z, y ← z, s ← ∅]. Both θ and θ' are most general. – As a last example we want to consider the problem {a | s} = {b | t} where a, b are constants and s, t are set tail variables. Here in steps (1) and (2) there is nothing left for A and B but to become ∅, and hence A' = {a} and B' = {b}. In step (3) it happens that A'' = B'' = C = ∅. The last steps (4) and (5) yield the only most general unifier θ = [s ← {b | N}, t ← {a | N}].

3.3 Implementation

The presented set unification algorithm is implemented in ECLiPSe-Prolog [9] as an extension of Prolog. All solutions can be enumerated via backtracking. In order to avoid a combinatorial explosion, constraint techniques are exploited. Sets on argument positions in Prolog predicates can be written as expected with curly brackets and are transformed into a metaterm. A metaterm is a variable with an associated attribute. It behaves like a normal variable; however, when it is unified with another term, an event is raised and a user-defined handler specifies what the result of the unification will be. Thus rapidly implementing set unification is possible.

Building a full set constraint language is easy then. The Algorithms A and B may be used for the treatment of membership-constraints as an efficient kernel of the language. For the representation of sets we use metaterms (as said above). Another useful concept is coroutining. It modifies the standard left-to-right computation rule by delaying (suspending) the execution of a goal if a certain condition on its arguments holds.

It is shown below how basic set-theoretic operations can be expressed in a clear and concise way within our language, all using the efficiently implemented membership-constraints. – First of all, the ∈-relation can directly be expressed by means of membership-constraints (1). In (2) and (3) the definitions for ∉ and ⊆ are stated. If we are able to treat restricted universal quantifiers of the form ∀x ∈ s, where s is a finite set (possibly) with tail, then we can express the relation ⊆ as shown in (4). In [7] an algorithm for transforming extended Horn clauses with restricted universal quantifiers into ones without them is shown. This allows us to express e.g. the operations intersection ∩ and union ∪ quite naturally; see (5) and (6). A small executable sketch of (2) follows after the list.

(1) x ∈ {y1, ..., yn | s} ← memb(x, [y1, ..., yn]) ∨ x ∈ s.
(2) delay x ∉ s if var(s).
    x ∉ ∅.
    x ∉ {y | r} ← x ≠ y ∧ x ∉ r.
(3) ∅ ⊆ s.
    {x | r} ⊆ s ← x ∈ s ∧ r ⊆ s.
(4) r ⊆ s ← ∀x ∈ r : x ∈ s.
(5) intersection(s2, s3, r) ← ∀x ∈ r : (x ∈ s2 ∧ x ∈ s3).
(6) union(s2, s3, r) ← ∀x ∈ r : (x ∈ s2 ∨ x ∈ s3).
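The following is a minimal executable sketch of definition (2), assuming that a set (possibly with a tail) is represented as a partial list and that freeze/2 and dif/2, as found e.g. in SWI-Prolog or SICStus, are used for coroutining; the ECLiPSe implementation described above relies on its own suspension mechanism and metaterms instead.

    % not_in(X, S): x ∉ s for a set represented as a (possibly partial) list.
    not_in(X, S) :-
        (   var(S) ->
            freeze(S, not_in(X, S))   % delay x ∉ s while the tail s is a variable
        ;   S == [] ->
            true                      % x ∉ ∅ always holds
        ;   S = [Y|R],
            dif(X, Y),                % x ≠ y ...
            not_in(X, R)              % ... and x ∉ r
        ).

The delayed goal is woken as soon as the tail becomes instantiated, mirroring the "delay ... if var(s)" declaration in (2).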

That means we can really incorporate set constraints into logic programming by the above-stated definitions of set-theoretic operations, which are based on membership-constraints. Since unification is one of the main ingredients of logic programming, an algorithm for set unification is absolutely necessary. In addition, an extended constraint simplification algorithm is useful, such as the one in [8]. For example, the following clash x ≠ x can be detected. Here t is a new set tail variable and ↝ means "simplifies to".

x ∈ s ∧ x ∉ s ∧ s = {x | t}   ↝   x ∉ s ∧ s = {x | t}   ↝   x ≠ x ∧ x ∉ t   ↝   FALSE

w; x; y; z = a; b x; y; z = x; y; z x; f (y1 ); g(y1 ); g(z1 ) = x; f (y2); g(y2 ); g(z2 ) x; y t = z x; y t = a; b t u; v; w s = x; y; z t

f f

g

g

f

f

f

g

f

j g

f g

f

j g

f

f

g

g

j g

f

g

j g

f

j g

The results we get with Prolog implementations (done by the author) are listed in the table shown below. The first row indicates the problem number from above. An entry of the form n/t means that n solutions are computed by the respective method and that t is the overall time in ms needed to compute the complete solution set; this notation is used in the middle rows. The last row shows the number of minimal solutions.

  problem       (1)       (2)        (3)         (4)     (5)      (6)
  naïve         48/2.2    729/20.7   969/38.2    –       –        –
  propia [14]   12/68.8   1/17.8     9/141.7     –       –        –
  unify [8]     14/13.9   73/61.2    93/222.0    2/0.6   ?        1372/204.2
  set [20]      14/3.0    15/4.4     17/9.8      2/0.5   9/1.1    829/46.9
  sua [3]       14/28.4   1/17.4     6/65.6      2/7.5   9/37.9   652/784.3
  memb          14/6.1    1/0.6      7/5.1       4/0.6   9/3.0    1900/165.1
  minimal       14        1          3           2       9        652

Firstly, the results of the naïve approach are listed, where the usual definition of member/2 is used instead of membership-constraints. In this case quite a lot of redundant solutions are computed which can be avoided by the other methods. This and the next algorithm are applicable only to finite sets without tails, i.e. for examples (1), (2) and (3).

Secondly, generalized propagation [14] applied to the predicate member/2 is used, as implemented in the library propia of ECLiPSe-Prolog [9]. The main idea of generalized propagation is to extract information from the definition of an arbitrary predicate which is common to all answers to a given goal, say p(x1, ..., xn). This means anti-unification is performed on all rules defining p/n. In case there remains information not yet extracted, the constraint goal must be delayed so that completeness is preserved. When no more information can be extracted by constraint propagation, further progress requires that the system makes some choices, which can be done automatically by a labeling goal. After that the solution may still contain some constraints saying that a variable may take one of several values; so solutions are sometimes bundled. This fact explains the number of 12 solutions for example (1). Generalized propagation (as implemented in propia) does not behave as well as the other methods, e.g. the membership-constraints presented here. This indicates that the optimization via non-unifiability constraints is a good idea which is not incorporated in generalized propagation. In addition the constraint propagation steps appear to be very time-consuming.

Thirdly, the results gained by a rapid Prolog implementation of the constraint simplification procedure named unify in [8] are listed. It is quite fast in computing a single solution, i.e. the ratio of the number of solutions to the time is not too bad, but it produces a lot of redundant unifiers. The case where the set tails are identical variables is a bit complicated because the plain algorithm might go into an infinite loop. This is why we do not have a measurement for example (5).

Fourthly, the row labelled set shows the number of solutions computed by Algorithm D where in step (2) the set unification algorithm presented in [20] is inserted instead of Algorithm C. This illustrates the fact that we can take an arbitrary algorithm for unifying finite sets without tails in Algorithm D. The results are optimal for the examples where there are no variable interdependencies among the elements of the sets and both set tails are empty. But in other cases it may perform badly; see e.g. example (2). The algorithm is quite fast because there are no expensive tests.

Fifthly, the set unification algorithm sua [3] is considered. In that paper the minimal numbers of unifiers are stated for some sample problems. The algorithm presented therein avoids some of the redundant solutions produced by the algorithm named set here. But these optimizations can also be built into Algorithm D. For example, the values of s and t can be constrained not to introduce variables occurring in some left-to-right or right-to-left fork, respectively. We speak of a fork iff an element of one set has been unified (i.e. chosen in a membership-constraint) with two or more elements of the other set. – The algorithm sua computes in most but not all cases a minimal solution set; look at example (3). However, its run-time performance is not so good.

Last but not least, the behaviour of our membership-constraints (with the proposed optimization) on the examples is shown (row memb), i.e. Algorithm D plus C. It turns out that its performance is quite reasonable on average, but there seems to be a trade-off. On the one hand, the non-unifiability constraints avoid redundant solutions if there are variable interdependencies among the elements of the sets. But on the other hand, if not, then Algorithm D does not always find a minimal complete solution set. However, we made an interesting observation on membership-constraints when changing the order in which the elements are chosen by the predicate memb/2 for each membership-constraint: the results suggest that we can almost always achieve that only the minimal solutions are computed, provided we take the right ordering. This point needs further investigation. For optimal orderings both the number of solutions and the run time decrease simultaneously, of course. – More analyses are stated in [20, 22, 3].

Yet another approach can be found in [19]. It is dedicated to database applications. In that paper a compilation technique is proposed which in some cases unfortunately increases the code size exponentially. In addition, only matching of sets without tails is considered. So it is more restricted than the other algorithms.

4.2 Complexity Issues

The algorithm presented here is of course much better than the naïve algorithm that makes use of the ordinary predicate member/2. This leads to an explosion of (at most)
Tna¨ıve(m; n) = mn nm solutions, provided we want to unify the sets A = x1 ; : : : ; xm and B = y1 ; : : : ; yn . (In this context T. shall always denote the maximal number of unifiers of finite sets with

f

g

f

g

out tails computed by the respective algorithm.) In addition, if there are many interdependencies among the terms in the sets, then our algorithm outperforms the others in many cases because of the constraint techniques. Nevertheless, the set unification (decision) problem remains NP-complete, even if only sets without tails are considered as shown in [12]. The algorithms for unifying finite sets with tails have ”only” single exponential complexity, whereas in deduction with associative-commutative (AC) functors (possibly with idempotency and unit element) the number of AC-unifiers may be double exponential [13] (although this could be reduced by constraint techniques in many cases). However, if we restrict ourselves to finite sets with tails single exponential complexity is obtained. Look at the following theorem. Theorem 6. The complexity of Algorithm D is single exponential. That means, given two sets A [ s and B [ t where A and B are as above, and s and t are the set tails, the algorithm does not compute more than cp(m;n) solutions where c is a constant and p(m; n) a polynomial in m and n. Proof. We will show that each step in Algorithm D admits at most a single exponential number of (non-deterministic) choices such that the overall complexity, i.e. the product of the complexities over all steps, is clearly single exponential too. We can partition A and B in 2m and 2n ways in step (1). Their product 2m 2n = 2m+n obviously is single exponential in m and n, i.e. in the input length. Below we will see that step (2) also requires single exponential complexity. So we only have to consider step (3). Here the set A or B is split into three parts. There are at most 3min(m;n) possibilities for doing so, i.e. single exponentially many. t u The last critical step is (2) where two sets without tails have to be unified. If we assume the worst case, i.e. A and B consists of variables that are pairwise distinct, then we have Tmin minimal solutions which is the number of minimal left and right total relations between A and B . From the theory of exponential generating functions it follows that

@f m+n ?0; 0 Tmin(m; n) = m! n! @x m @y n 

where

12

f (x; y) = ex (ey ?1)+y (ex ?1)?xy

which has been proven in [20]. – Anyway, all algorithms for unifying finite sets without tails presented here have single exponential complexity, even the na¨ıve one because of the following proposition: Theorem 7. For all m; n  1 and c  e2=e

min(m; n)!

(1)





2:087 it holds:

Tmin(m; n) Tset (m; n) Tmemb(m; n) Tna¨ıve(m; n) cmn (2)

(3)

(4)

(5)









Proof. Due to the lack of space we will not carry out the proof in detail but give only some remarks on the parts of the chain of inequations: (1)

(2)

(3) (4) (5)

min(m; n)! is a lower bound of the complexity because each permutation of the

smaller set leads to a solution. – Since factorials increase faster than any power cm+n , the degree of the polynomial in the exponent must be greater than 1. Since all considered algorithms compute a complete set of unifiers this relation is quite clear. – It holds Tmin = Tset if there are no variable interdependencies among the terms in the sets. This holds because the solution set computed by the algorithm in [20] always is a subset of the solutions computed with membership-constraints. As in the previous item, also a subset-relationship holds here. The proof for this case requires standard mathematical analysis techniques. – It follows that the degree d of the polynomial in the exponent must be less than 2, i.e. we d estimate the complexity by a function of the form c(m+n) . t u

The lower and upper bounds in the chain of inequations above imply that for the degree d it holds 1 < d < 2. Furthermore d tends to 1 for m; n ! 1. But this only gives a rough estimation of the complexity. Of course Algorithm C whose analysis is given next is orders of magnitudes better than the na¨ıve algorithm. – At first we introduce special numbers which are needed in the subsequent proof. 

Definition 8. nk stands for the number of ways to partition a set of n things into k non-empty subsets. They are called Stirlingnnumbers kind and are defined o of the second  0 n  n?1 n ? 1 as follows: 0 = 1; k = k  k + k?1 for n  k > 0; nk = 0 otherwise. Proposition 9. Algorithm C produces (at most) the following number of solutions :

Tmemb(m; n) =

min( m;n) X

k=0

n on o k! mk nk

Proof. We will show that each solution computed by Algorithm C uniquely determines an equivalence relation on A [ B where every equivalence class contains at least one element from each of both sets A and B . This clearly leads to the above-stated formula for Tmemb. – But how can we uniquely map each solution of the algorithm to such an equivalence relation? For this let  be an arbitrary solution produced by Algorithm C. Then  establishes an equivalence relation  on A [ B – where in fact every class contains at least one element of each set – as follows: z  z 0 iff z = z 0  . Each solution  corresponds to exactly one such relation since for each equivalence class [z ] the following must hold: 13

Let X = A \ [z ] and Y = B \ [z ], and xi0 2 X and yj0 2 Y the elements with the smallest indices i0 and j0 in the respective sets. Then for all x 2 X the membershipconstraint memb(x; B ) permits only one solution, namely x = yj0 , since it must hold x 6= yj for all j < j0 (because of the non-unifiability constraints). For similar reasons the membership-constraint memb(y; A) with y 2 Y must deterministically choose y = xi0 . t u

5 Conclusion The Algorithm D presented here is reasonably efficient and easily implementable. It uses constraint techniques and delaying mechanisms which avoid many redundant solutions and hence combinatorial explosion. It can be embedded in a full set constraint language [8, 6] such that it is possible to use sets as first-class citizens in logic programming. – Furthermore we gave a complexity analysis which gives an asymptotic estimation of our algorithm as well as the problem itself. It shows the good performance of the algorithm presented here.

Acknowledgements I would like to thank Peter Baumgartner, J¨urgen Dix, Dexter Kozen, Bruno Legeard, Andreas Podelski, J¨orn Richts, Gianfranco Rossi, Martin Volk and Graham Wrightson for helpful discussions or comments on this paper.

References 1. A. Aiken, D. Kozen, M. Vardi, and E. Wimmers. The complexity of set constraints. In E. B¨orger, Y. Gurevich, and K. Meinke, editors, Proceedings of the Conference on Computer Science Logic, September 1993, pages 1–17. European Association for Computer Science Logic, Springer, Berlin, Heidelberg, New York, 1993. LNCS 832. 2. D. Aliffi, G. Rossi, A. Dovier, and E. O. Omodeo. Unification of hyperset terms. In E. G. Omodeo and G. Rossi, editors, Proceedings of the Workshop on Logic Programming with Sets, in conjunction with the 10th International Conference on Logic Programming, pages 27–30, Budapest, Hungary, June 1993. 3. P. Arenas-S´anchez and A. Dovier. Minimal set unification. In M. Hermenegildo and S. D. Swierstra, editors, Proceedings of the 7th International Symposium on Programming Language Implementation and Logic Programming, pages 397–414. Springer, Berlin, Heidelberg, New York, 1995. LNCS 982. 4. F. Baader and K. U. Schulz. Unification in the union of disjoint equational theories: Combining decision procedures. In D. Kapur, editor, Proceedings of the 11th International Conference on Automated Deduction, Saratoga Springs, NY, USA, June 1992, pages 50–65. Springer, Berlin, Heidelberg, New York, 1992. LNAI 607. 5. L. Bachmair, H. Ganzinger, and U. Waldmann. Set constraints are the monadic class. In Proceedings of the 8th Annual Symposium on Logic in Computer Science. IEEE, June 1993. 6. P. Bruscoli, A. Dovier, E. Pontelli, and G. Rossi. Compiling intensional sets in CLP. In P. Van Hentenryck, editor, Proceedings of the 11th International Conference on Logic Programming, Santa Margherita, Ligure, Italy, June 1994, pages 647–661. MIT Press, Cambridge, MA, London, England, 1994.

14

7. A. Dovier, E. G. Omodeo, E. Pontelli, and G. Rossi. Embedding finite sets in a logic programming language. In E. Lamma and P. Mello, editors, Proceedings of the 3rd International Workshop on Extensions of Logic Programming, Bologna, Italy, February 1992, pages 150– 167. Springer, Berlin, Heidelberg, New York, 1993. LNAI 660. 8. A. Dovier and G. Rossi. Embedding finite sets in CLP. In D. Miller, editor, Proceedings of the International Logic Programming Symposium. MIT Press, Cambridge, MA, London, England, 1993. 9. ECRC GmbH, M¨unchen. ECLiPSe 3.5: User Manual – Extensions User Manual, February 1995. 10. C. Gervet. Conjunto: Constraint logic programming with finite set domains. In M. Bruynooghe, editor, Proceedings of the International Logic Programming Symposium, Ithaca, NY, November 1994, pages 339–358. MIT Press, Cambridge, MA, London, England, 1994. 11. J. Jaffar and M. J. Maher. Constraint logic programming: a survey. Journal of Logic Programming, 19,20:503–581, 1994. 12. D. Kapur and P. Narendran. NP-completeness of the set unification and matching problems. In J. H. Siekmann, editor, Proceedings of the 8th International Conference on Automated Deduction, Oxford, July 1986, pages 489–495. Springer, Berlin, Heidelberg, 1986. LNCS 230. 13. D. Kapur and P. Narendran. Double-exponential complexity of computing a complete set of AC-unifiers. In Proceedings of the 7th Annual Symposium on Logic in Computer Science, Santa Cruz, CA, pages 11–21, 1992. 14. T. Le Provost and M. Wallace. Generalized constraint propagation over the CLP scheme. Journal of Logic Programming, 16(3&4):319–359, 1993. 15. B. Legeard, H. Lombardi, E. Legros, and M. Hibti. A constraint satisfaction approach to set unification. In Proceedings of the 13th International Conference on Artificial Intelligence, Expert Systems and Natural Language, pages 265–276, Avignon, May 1993. 16. S. Manandhar. An attributive logic of set descriptions and set operations. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, 1994. 17. C. J. Pollard and M. D. Moshier. Unifying partial description of sets. In P. Hanson, editor, Information, Language, and Cognition, pages 285–322. University of British Columbia Press, Vancouver, BC, 1990. 18. C. J. Pollard and I. A. Sag. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, London, 1994. CSLI publication. 19. O. Shmueli, S. Tsur, and C. Zaniolo. Compilation of set terms in the logic data language (LDL). Journal of Logic Programming, 12(1&2):89–119, 1992. 20. F. Stolzenburg. An algorithm for general set unification and its complexity. In E. G. Omodeo and G. Rossi, editors, Proceedings of the Workshop on Logic Programming with Sets, in conjunction with the 10th International Conference on Logic Programming, pages 17–22, Budapest, Hungary, June 1993. Submitted to Journal of Automated Reasoning. 21. F. Stolzenburg. Logic programming with sets by membership-constraints. In N. E. Fuchs and G. Gottlob, editors, Proceedings of the 10th Logic Programming Workshop, Universit¨at Z¨urich, 1994. Institut f¨ur Informatik. Technical Report ifi 94.10. 22. F. Stolzenburg. Membership-constraints and some applications. Fachberichte Informatik 5/94, Universit¨at Koblenz-Landau, Koblenz, May 1994. 23. P. Van Hentenryck. Constraint Satisfaction in Logic Programming. MIT Press, Cambridge, MA, London, England, 1989.

15

Suggest Documents