Minimal Set Unification
Puri Arenas-Sánchez
Dpto. Informática y Automática, Fac. C.C. Matemáticas (U.C.M.), Avda. Complutense s/n, 28040 MADRID.
Agostino Dovier
Dipartimento di Informatica, Università di Pisa, Corso Italia 40, 56100 PISA.
Abstract
A unification algorithm is said to be minimal for a unification problem if it generates exactly the minimal complete set of unifiers, without instances and without repetitions. The aim of this paper is to describe a new set unification algorithm which is minimal for a significant set of sample problems that can be used as benchmarks for testing any set unification algorithm.
Keywords: Logic Programming with Sets, CLP, Unification.
P. Arenas-Sánchez is partially supported by the Spanish National Project TIC92-0793-C02-01 "PDR" and the Esprit BRA Working Group Nr. 6028 "CCL". The work is partially supported by C.N.R. grant 94.00472.CT12, "Logic Programming with Sets".

1 Introduction
The many papers on Constraint Logic Programming with sets (see e.g. [5, 6, 10]) have pointed out that the complexity of the set unification problem (which is NP-complete; see e.g. [3]) is the real bottleneck of any attempt to extend Logic Programming with set entities. The loss of the uniqueness of the most general unifier forces any set unification algorithm to return a complete set of unifiers (i.e., as explained in Sect. 2, a set of unifiers which covers all possible solutions of the problem at hand) for any satisfiable input. If the algorithm returns such unifiers all at once (as a disjunction), the number of computed answers grows exponentially. The non-determinism inherent in any Logic Programming interpreter suggests the use of a non-deterministic unification algorithm which returns exactly one unifier on each non-deterministic branch: in this way any non-deterministic computation can run without suffering the unpleasant effects of the non-uniqueness of the most general unifier. In [2] it has been shown that if the representation for sets also comprises a constant symbol for the set universe and the set-minus operator, the uniqueness of the most general unifier can be recovered. Nevertheless, nested sets are not allowed, and the answer to a unification problem contains a large amount of information, which makes it hard to read. In [10] any set unification problem is delayed until it is transformed into a simple ground `test'. This improves efficiency; however, if the two terms do not become ground, obscure answers such as $\{X_1, f(X_1, X_3)\} = \{Y_1, f(\{Y_3\}, X_1), X_2\}$ are returned.

There are different ways to represent a finite set. Among them are the union-of-singletons representation, which depicts $\{s_1,\dots,s_m\}$ as $\{s_1\} \cup \dots \cup \{s_m\}$, and the list representation, which uses the term $\{s_1 \mid \{s_2 \mid \dots \{s_m \mid \{\,\}\} \dots\}\}$ to denote the same set. The former representation (which is associated with an ACI equational theory; cf. [11]) is more expressive than the latter (which is associated with the equational theory described in Sect. 2). For instance, the problem $X_1 \cup \dots \cup X_m = \{a_1\} \cup \dots \cup \{a_n\}$, where the $X_i$'s are pairwise distinct variables and the $a_j$'s are pairwise distinct constant symbols, admits $(2^m - 1)^n$ independent solutions. Since the semantics of $\{t \mid s\}$ is $s \cup \{t\}$, for $m > 1$ such a problem cannot be expressed in the list representation. Since the minimum cardinality of a complete set of unifiers expressible with the list representation is already considerable (cf. Sect. 3), we prefer to deal with this representation, avoiding the further problems that the union-of-singletons approach would raise. The same choice has been made in [5, 10].

In this paper we present a new set unification algorithm, minimal for a significant set of sample problems. Such problems can be used for testing any set unification algorithm because of their simplicity (which simplifies the analysis) and because they maximize the number of solutions for unification problems of a given size (the presence of distinct variables as elements of the sets to be unified guarantees the maximum number of solutions).
A thorough combinatorial study of these problems, along with the recursive functions computing the cardinality of their minimal complete sets of most general unifiers, can be found in [1]. The aim of obtaining minimal algorithms for set unification has already been addressed in the literature. For instance, in [12] three set unification algorithms are proposed. The third one seems to be the most efficient; however, comparing its results with our minimality study, we can conclude that this algorithm is not minimal for problems (6), (7), and (8) described in Sect. 3. In [3], the naive set unification algorithm presented there (based on [9]) behaves minimally only on the first of these benchmarks.

The paper is organized as follows: in Sect. 2 we comment briefly on some preliminary concepts needed in the rest of the paper. Section 3 presents eight sample unification problems along with tables reporting their minimal numbers of solutions. In Sect. 4 the new set unification algorithm SUA is presented, proving its termination and its minimality for all the suggested sample problems. Finally, some conclusions are drawn in Sect. 5.
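The count $(2^m - 1)^n$ for the union-of-singletons problem above can be checked by brute force. The Python sketch below is only an illustration and not part of the method of this paper: it assumes that each $X_i$ ranges over the finite subsets of $\{a_1,\dots,a_n\}$, the empty set included (as in an ACI theory with a unit for the empty set), so that every solution is ground; the function names are hypothetical. The formula then follows because each constant must belong to a non-empty subset of the $X_i$'s, chosen independently for each of the n constants.

```python
from itertools import combinations, product


def all_subsets(items):
    """All subsets of `items` as frozensets, the empty set included."""
    items = list(items)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def count_union_solutions(m: int, n: int) -> int:
    """Count ground solutions of X1 u ... u Xm = {a1} u ... u {an} by brute force.

    Assumption: each Xi ranges over the subsets of {a1, ..., an} (empty set allowed);
    a tuple of values is a solution iff their union is the whole set of constants.
    """
    constants = frozenset(range(n))
    candidates = all_subsets(constants)
    return sum(
        1
        for values in product(candidates, repeat=m)
        if frozenset().union(*values) == constants
    )


for m, n in [(1, 3), (2, 2), (2, 3), (3, 2)]:
    # brute-force count next to the closed formula (2^m - 1)^n
    print(m, n, count_union_solutions(m, n), (2 ** m - 1) ** n)
```

For m = 2 and n = 2 the enumeration finds 9 solutions, matching $(2^2 - 1)^2 = 9$.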
2 Preliminaries
We will make use of standard CLP (see e.g. [8]) and unification theory (see e.g. [11]) notation. Given a signature $\Sigma$ (a set of function symbols along with their arities) and a denumerable set of variables V, $\tau(\Sigma \cup V)$ and $\tau(\Sigma)$ will denote the sets of terms and ground terms, respectively. We require that the signature contains (at least) the constant symbol $\{\,\}$, representing the empty set, and the binary symbol $\{\cdot \mid \cdot\}$, used as set constructor; the intuitive semantics of $\{t \mid s\}$ is $\{t\} \cup s$. Similarly to lists in Prolog, the term $\{a \mid \{b \mid \{c \mid \{\,\}\}\}\}$ will be denoted simply as $\{a, b, c\}$. The following two equational axioms
(Ab) $\{X, X \mid Z\} = \{X \mid Z\}$
(CR) $\{X, Y \mid Z\} = \{Y, X \mid Z\}$
(Ab stands for absorption and CR for commutativity on the right; see [11]) uniquely identify a finest congruence $=_T$ on $\tau(\Sigma)$. Following [11], two terms s and t are said to be T-unifiable iff there is a substitution $\sigma$ such that $s\sigma =_T t\sigma$; such a $\sigma$ is called a T-unifier. We write, for any set of variables $W \subseteq V$, $\sigma =^W_T \theta$ iff $\forall x \in W\; x\sigma =_T x\theta$. In the same way, $\sigma$ is more general than $\theta$ in T over W ($\sigma \le^W_T \theta$) iff $\exists \lambda\; \sigma\lambda =^W_T \theta$. The corresponding equivalence relation on substitutions is denoted by $\equiv^W_T$; i.e., $\sigma \equiv^W_T \theta$ iff $\sigma \le^W_T \theta$ and $\theta \le^W_T \sigma$. The set of all T-unifiers of two terms s and t is denoted by $\mathcal{U}_T(s,t)$. A set U of T-unifiers is said to be complete w.r.t. s and t if $(\forall \theta \in \mathcal{U}_T(s,t))(\exists \sigma \in U)(\sigma \le^W_T \theta)$, where $W = \mathit{Var}(s) \cup \mathit{Var}(t)$. The set of most general T-unifiers of s and t, denoted by $\mu\mathcal{U}_T(s,t)$, is a complete set of T-unifiers which is minimal, i.e., $(\forall \sigma, \theta \in \mu\mathcal{U}_T(s,t))(\sigma \le^W_T \theta \Rightarrow \sigma =^W_T \theta)$, where $W = \mathit{Var}(s) \cup \mathit{Var}(t)$. Note that, whenever $\mu\mathcal{U}_T(s,t)$ exists, it is unique up to $\equiv^V_T$ [7]. Similar definitions can be given starting from a system of equations E instead of the unification problem s = t. When the context is clear, we will omit the prefix T before the word unifier. For any satisfiable Herbrand system E involving terms from $\tau(\Sigma \cup V)$, a unification algorithm should be able to compute, through non-determinism, each element of a complete set of unifiers of E.
Notice that the algorithm is not required to compute exactly $\mu\mathcal{U}_T(E)$. However, as we will see in detail in Sect. 3, the theory T presented here is such that, even for simple unification problems s = t, $\mu\mathcal{U}_T(s,t)$ grows very quickly with the size of s and t. This means that a valid criterion for comparing two (set) unification algorithms is the length of the list of solutions they compute (the word `list' is needed here to reflect the fact that if a unification algorithm computes exactly $\mu\mathcal{U}_T(s,t)$ but returns some solution more than once, it cannot be considered minimal). If the input system E is T-unsatisfiable, any unification algorithm should terminate reporting failure.
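On ground terms built from constants and $\{\cdot \mid \cdot\}$, the axioms (Ab) and (CR) only let the elements of a set term be permuted and duplicates be absorbed, while the final `rest' term (the kernel) is never touched. The Python sketch below, our own illustration rather than part of SUA, decides $=_T$ on such ground terms by mapping each term to a canonical form; the classes Const and With and the helper set_term are hypothetical names, and function symbols with arguments other than the set constructor are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Const:
    """A constant symbol; Const("{}") stands for the empty set { }."""
    name: str


@dataclass(frozen=True)
class With:
    """The set constructor {elem | rest}."""
    elem: object
    rest: object


EMPTY = Const("{}")


def set_term(*elems, rest=EMPTY):
    """Build {e1, ..., ek | rest} as right-nested With terms."""
    term = rest
    for e in reversed(elems):
        term = With(e, term)
    return term


def canon(t):
    """Canonical form of a ground term modulo (Ab) and (CR).

    A chain {t1, ..., tn | k} becomes ("set", S, k'), where S is the frozenset of
    the canonical forms of t1, ..., tn: (CR) makes order irrelevant and (Ab)
    makes duplicates irrelevant. The kernel k is kept as it is.
    """
    if isinstance(t, With):
        elems = set()
        while isinstance(t, With):
            elems.add(canon(t.elem))
            t = t.rest
        return ("set", frozenset(elems), canon(t))
    return t  # a constant, including the empty set


def equal_T(s, t) -> bool:
    return canon(s) == canon(t)


a, b = Const("a"), Const("b")
print(equal_T(set_term(a, b, a), set_term(b, a)))                      # True
print(equal_T(set_term(set_term(a), b), set_term(b, set_term(a, a))))  # True
print(equal_T(set_term(a), set_term(a, b)))                            # False
```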
3 Sample Problems
In this section we single out eight set unification problems that we propose as `benchmarks' to test the minimality of any set unification algorithm.
Given a unification problem between two sets (for instance $\{\{X, f(g(Y)) \mid R\}, f(X)\} = \{A, \{B, f(C)\} \mid R\}$), it is possible, in principle, to determine the number of most general and independent unifiers for it (i.e., the cardinality of $\mu\mathcal{U}_T(s,t)$). Nevertheless, it is impossible to test optimality (namely, the capability of returning exactly $|\mu\mathcal{U}_T(s,t)|$ solutions) for all possible problems s = t with $s, t \in \tau(\Sigma \cup V)$. A criterion for selecting some sample problems must be chosen: problems with nested sets (i.e., sets containing sets, such as the example above) make the analysis confusing. It is more important to concentrate on the new unification problems between elements of the two sets that must be generated. With the signature $\Sigma$, closed sets (i.e., of the form $\{t_1 \mid \{t_2 \mid \dots \{t_n \mid \{\,\}\} \dots\}\}$) and open sets (i.e., of the form $\{t_1 \mid \{t_2 \mid \dots \{t_n \mid R\} \dots\}\}$, with R a variable) can be described. Any unification problem between two sets has one of the following forms: closed with closed (problems (1)–(4)); open with closed, or vice versa (problems (5) and (6)); open with open (problems (7) and (8)). In each of these cases, if the elements of the sets are distinct variables, the number of possible solutions is maximized (for instance, $\{X_1, X_2, X_3\} = \{Y_1, Y_2, Y_3\}$ admits 15 solutions, whereas $\{X_1, X_2, X_3\} = \{a_1, a_2, a_3\}$, $\{X_1, X_2, X_3\} = \{Y_1, X_2, X_3\}$ and $\{X_1, X_2, X_3\} = \{Y_1, Y_2, X_3\}$ admit 6, 3, and 6 solutions, respectively). As particular cases, it is interesting to analyze how cleverly an algorithm solves problems in which the two sets share elements (problems (3) and (4)). Moreover, a shrewd treatment of the matching problem (namely, when one of the two sets is ground) is also important (problems (1) and (5)). The unification problem between open sets must be treated differently depending on whether the `rest' variables are identical or not.
The announced sample problems are the following:
(1) $\{X_1,\dots,X_m\} = \{a_1,\dots,a_n\}$;
(2) $\{X_1,\dots,X_m\} = \{Y_1,\dots,Y_n\}$;
(3) $\{X_1,\dots,X_m,a_1,\dots,a_k\} = \{Y_1,\dots,Y_n,a_1,\dots,a_k\}$;
(4) $\{X_1,\dots,X_m,Z_1,\dots,Z_k\} = \{Y_1,\dots,Y_n,Z_1,\dots,Z_k\}$;
(5) $\{X_1,\dots,X_m \mid Z\} = \{a_1,\dots,a_n\}$;
(6) $\{X_1,\dots,X_m \mid Z\} = \{Y_1,\dots,Y_n\}$;
(7) $\{X_1,\dots,X_m \mid Z\} = \{Y_1,\dots,Y_n \mid Z\}$;
(8) $\{X_1,\dots,X_m \mid W\} = \{Y_1,\dots,Y_n \mid Z\}$;
where $X_1,\dots,X_m, Y_1,\dots,Y_n, Z_1,\dots,Z_k, W, Z$ are pairwise distinct variables, and $a_1,\dots,a_{\max\{n,k\}}$ are pairwise distinct constant symbols.
Due to space limits we will not go into the details of the functions which compute the cardinality of $\mu\mathcal{U}_T$ for problems (1)–(8). A complete analysis of these functions can be found in [1]. We only give a taste of them by presenting the following tables, which report some numerical values: columns (the `x' axis) give the value of m (the first parameter in our unification problems) and rows the value of n. A dash marks an omitted entry: for problem (1) these are the cases m < n, which have no solution, while problems (2), (3), (7), and (8) are symmetric in m and n, so only half of each of their tables is given.

Problem (1):
n\m         1      2      3      4      5      6      7
1           1      1      1      1      1      1      1
2           -      2      6     14     30     62    126
3           -      -      6     36    150    540   1806
4           -      -      -     24    240   1560   8400
5           -      -      -      -    120   1800  16800
6           -      -      -      -      -    720  15120
7           -      -      -      -      -      -   5040

Problem (2):
n\m           1        2        3        4        5        6        7
1             1        -        -        -        -        -        -
2             1        2        -        -        -        -        -
3             1        6       15        -        -        -        -
4             1       14       48      184        -        -        -
5             1       30      165      680     2945        -        -
6             1       62      558     2664    13080    63756        -
7             1      126     1827    11032    59605   320292  1748803

Problem (3) (numerically equal to problem (4)) would require a three-dimensional matrix to represent its values. Assume k = 3:

Problem (3), k = 3:
n\m             1          2          3          4          5          6          7
1               7          -          -          -          -          -          -
2              19         61          -          -          -          -          -
3              56        195        705          -          -          -          -
4             223        746       2859      12226          -          -          -
5             877       3093      12681      56891     277091          -          -
6            3559      13808      60231     284286    1448325    7888698          -
7           14581      65391     302829    1510483    8044117   45590823  273498973

Problem (5):
n\m             1          2          3          4          5          6          7
1               2          2          2          2          2          2          2
2               4         12         28         60        124        252        508
3               6         30        126        462       1566       5070      15966
4               8         56        344       1880       9368      43736     195224
5              10         90        730       5370      36250     228090    1359130
6              12        132       1332      12372     106452     856212    6505812
7              14        182       2198      24710     259574    2562182   23928758

Problem (6):
n\m             1          2          3          4          5          6          7
1               2          2          2          2          2          2          2
2               5         12         28         60        124        252        508
3              10         42        144        486       1596       5106      16008
4              19        126        584       2584      11208      48248     205864
5              36        360       2200      11930      63000     330450    1733000
6              69       1016       8118      52740     325812    1983084   12073836
7             134       2870      29876     231518    1641444   11310530   77511140

Problem (7):
n\m             1          2          3          4          5          6          7
1               2          4          8         16         32         64        128
2               -         11         30         85        248        735       2194
3               -          -         94        308       1104       4210      16538
4               -          -          -       1041       3920      16981      80260
5               -          -          -          -      14006      59412     303428
6               -          -          -          -          -     221971    1048054
7               -          -          -          -          -          -    4063382

Problem (8):
n\m              1           2           3           4           5           6           7
1                4           9          18          35          68         133         262
2                -          39         131         413        1185        3459       10071
3                -           -         652        2811       11402       44983      175224
4                -           -           -       15937       82499      409897     1997795
5                -           -           -           -      524056     3133773    18217350
6                -           -           -           -           -    21998671   148144723
7                -           -           -           -           -           -  1136372140
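Some entries of the tables for problems (1) and (5) can be cross-checked by brute force. The Python sketch below is our own illustration, not part of SUA or of the analysis in [1]: it relies on the fact that the right-hand sides of problems (1) and (5) are ground, so every solution is a ground substitution and the independent solutions can simply be counted. The function names are hypothetical.

```python
from itertools import product


def count_problem1(m: int, n: int) -> int:
    """{X1, ..., Xm} = {a1, ..., an}: every Xi is bound to a constant and every
    constant must be hit, so the solutions are the surjections onto the aj."""
    return sum(1 for f in product(range(n), repeat=m) if len(set(f)) == n)


def count_problem5(m: int, n: int) -> int:
    """{X1, ..., Xm | Z} = {a1, ..., an}: each Xi is bound to some constant (a map f);
    Z must contain every constant outside image(f) and may contain any subset of
    image(f), which gives 2 ** |image(f)| choices for Z."""
    return sum(2 ** len(set(f)) for f in product(range(n), repeat=m))


for n in (1, 2, 3):
    print("(1) n =", n, [count_problem1(m, n) for m in range(1, 8)])
    print("(5) n =", n, [count_problem5(m, n) for m in range(1, 8)])
```

For n = 3 the first function yields 0 for m = 1, 2 (the cells marked with a dash, where problem (1) has no solution), followed by 6, 36, 150, 540, 1806, as in the table.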
4 The Algorithm SUA
In order to make the description of the algorithm as clear as possible, some local notation will be defined. The set operations `$|\cdot|$' (cardinality), `$\subseteq$' (inclusion), `$\subset$' (strict inclusion), `$\cup$' (union), `$\cap$' (intersection) and `$-$' (set difference) will be used on terms denoting sets. The meaning of the set operators is purely syntactic; for instance $|\{X_1, X_2\}| = 2$: we do not need to distinguish the two cases $X_1 = X_2 \wedge |\{X_1, X_2\}| = 1$ and $X_1 \neq X_2 \wedge |\{X_1, X_2\}| = 2$. The function Can, which removes duplicates from a term representing a set, ensures that these operators can be used without semantic problems. A set of equations E is said to be in solved form if it has the form $\{X_1 = t_1, \dots, X_n = t_n\}$, where the $X_i$'s are distinct variables not occurring in any $t_j$, for all $i, j \in \{1, \dots, n\}$. A solved-form system $\{X_1 = t_1, \dots, X_n = t_n\}$ can be viewed as the substitution $\{X_1/t_1, \dots, X_n/t_n\}$. An equation $X = t$ is said to be solved in E if X occurs neither in t nor in $E \setminus \{X = t\}$. Actions 1–8 of the unification algorithm SUA are identical to those used in the unification algorithm presented in [3] (for further details the reader can consult that paper). SUA takes as input a system of equations E between terms and returns either fail (E is not unifiable) or, non-deterministically, a substitution $\sigma$. The set of all such $\sigma$'s constitutes a complete set of T-unifiers. In the algorithm, X, W, W', Z and Z' denote generic variables, $t, t_1, t_2, \dots, s_1, s_2, \dots$ denote generic terms, and N denotes a new variable introduced by SUA. k and k' denote non-variable terms whose main functional symbol is distinct from $\{\cdot \mid \cdot\}$¹. $\mathit{Var}(\ell)$ represents the set of variables occurring in the term $\ell$, while $\sigma|_S$ restricts the domain of $\sigma$ to the variables contained in S. n.d. abbreviates `non-deterministically'. The algorithm temporarily generates marked equations, written $s =^{\bullet} t$; they are called active equations and they are immediately removed by action 0.
$E^{\bullet}$ denotes the set of active equations.

function SUA(E);
If E is in solved form then return E
elseif $E^{\bullet}$ is not empty then
    choose n.d. one active equation $s =^{\bullet} t$ in $E^{\bullet}$; $E' := E - \{s =^{\bullet} t\}$;
    0. i. $s \equiv X$: if X occurs in t then fail else return SUA($E'[X/t] \cup \{X = t\}$);
       ii. $s \equiv f(s_1,\dots,s_m)$ and $t \equiv f'(t_1,\dots,t_n)$: if $f \not\equiv f'$ then fail else (i.e., $f \equiv f'$ and $m = n$) return SUA($\{s_1 = t_1,\dots,s_n = t_n\} \cup E'$);
else choose n.d. one equation e (not in solved form) in E; $E' := E - \{e\}$;
    case e of:
    1. $X = X$: return SUA($E'$);
    2. $t = X$ and t is not a variable: return SUA($\{X = t\} \cup E'$);
    3. $X = t$, X is a variable occurring in $E'$ but not in t: return SUA($E'[X/t] \cup \{X = t\}$);
    4. $X = \{t_1,\dots,t_m \mid X\}$: return SUA($E' \cup \{X =^{\bullet} \{t_1,\dots,t_m \mid N\}\}$);
    5. $X = \{t_1,\dots,t_m \mid t\}$, where t is a variable or $t \equiv f(t_1,\dots,t_n)$ with $f \not\equiv \{\cdot \mid \cdot\}$, and X occurs in $t_1$, or ..., or in $t_m$, or t is not a variable and X occurs in t: fail;
    6. $X = t$, where $t \equiv f(t_1,\dots,t_n)$, $f \not\equiv \{\cdot \mid \cdot\}$ and X is a variable occurring in t: fail;
    7. $f(s_1,\dots,s_n) = g(t_1,\dots,t_m)$, where $f, g \in \Sigma$, $f \not\equiv g$: fail;
    8. $f(s_1,\dots,s_n) = f(t_1,\dots,t_n)$, $f \in \Sigma$, $f \not\equiv \{\cdot \mid \cdot\}$: return SUA($E' \cup \{s_1 = t_1,\dots,s_n = t_n\}$);
    9. $\{t_1,\dots,t_m \mid k\} = \{s_1,\dots,s_n \mid k'\}$: return SUA(unify_set($\{t_1,\dots,t_m\}$, $\{s_1,\dots,s_n\}$) $\cup \{k =^{\bullet} k'\} \cup E'$);
    10. $\{s_1,\dots,s_n \mid k\} = \{t_1,\dots,t_m \mid Z\}$: return SUA($\{\{t_1,\dots,t_m \mid Z\} = \{s_1,\dots,s_n \mid k\}\} \cup E'$);
    11. $\{t_1,\dots,t_m \mid Z\} = \{s_1,\dots,s_n \mid k\}$: choose n.d. $\{\,\} \neq S \subseteq \mathit{Can}(\{s_1,\dots,s_n\})$: return SUA($\{Z =^{\bullet} \mathit{Can}((\{s_1,\dots,s_n\} - S) \cup Z' \cup k)\} \cup$ limit_1($\{t_1,\dots,t_m\}$, S, $Z'$, $E'$)$|_{\mathit{Var}([s_1,\dots,s_n,t_1,\dots,t_m,Z])}$);
    12. $\{t_1,\dots,t_m \mid Z\} = \{s_1,\dots,s_n \mid Z\}$: select n.d. one of the following actions:
        i. choose n.d. $T \subseteq \{t_1,\dots,t_m\}$ and $S \subseteq \{s_1,\dots,s_n\}$ s.t. $T \cup S \neq \{t_1,\dots,t_m,s_1,\dots,s_n\}$: return SUA(unify_set(T, S) $\cup \{Z =^{\bullet} \mathit{Can}((\{t_1,\dots,t_m\} - T) \cup (\{s_1,\dots,s_n\} - S) \cup N)\} \cup E'$);
        ii. return SUA(unify_set($\{t_1,\dots,t_m\}$, $\{s_1,\dots,s_n\}$) $\cup E'$);
    13. $\{t_1,\dots,t_m \mid W\} = \{s_1,\dots,s_n \mid Z\}$, where Z and W are different variables: choose n.d. $T \subseteq \mathit{Can}(\{t_1,\dots,t_m\})$, $S \subseteq \mathit{Can}(\{s_1,\dots,s_n\})$: return SUA($\{W =^{\bullet} \mathit{Can}((\{s_1,\dots,s_n\} - S) \cup W'),\ Z =^{\bullet} \mathit{Can}((\{t_1,\dots,t_m\} - T) \cup Z')\} \cup$ limit_2(T, S, $Z'$, $W'$, $E'$)$|_{\mathit{Var}([s_1,\dots,s_n,t_1,\dots,t_m,Z,W])}$).

¹ Such entities are named kernels in [3].
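SUA stops as soon as E is in solved form. The Python sketch below shows that test on a hypothetical term representation (Var for variables, Fn for every other symbol, with the set constructor encoded as an ordinary binary symbol such as "{|}"); it is our own illustration of the definition given above, not code from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str          # e.g. "{|}" for the set constructor, "{}" for the empty set
    args: tuple = ()


Equation = Tuple[object, object]  # (left-hand side, right-hand side)


def variables(t) -> Set[str]:
    """Names of the variables occurring in term t."""
    if isinstance(t, Var):
        return {t.name}
    return set().union(*(variables(a) for a in t.args))


def is_solved_form(eqs: List[Equation]) -> bool:
    """True iff eqs = {X1 = t1, ..., Xn = tn}, Xi distinct and not occurring in any tj."""
    if not all(isinstance(lhs, Var) for lhs, _ in eqs):
        return False
    names = [lhs.name for lhs, _ in eqs]
    if len(set(names)) != len(names):
        return False
    rhs_vars = set().union(*(variables(rhs) for _, rhs in eqs))
    return not (set(names) & rhs_vars)


def as_substitution(eqs: List[Equation]) -> Dict[str, object]:
    """View a solved-form system as the substitution {X1/t1, ..., Xn/tn}."""
    if not is_solved_form(eqs):
        raise ValueError("system is not in solved form")
    return {lhs.name: rhs for lhs, rhs in eqs}


X, Y, Z = Var("X"), Var("Y"), Var("Z")
print(is_solved_form([(X, Fn("f", (Y,))), (Z, Fn("a"))]))  # True
print(is_solved_form([(X, Y), (Y, Fn("a"))]))              # False: Y is bound and also used
```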
Active equations $s =^{\bullet} t$ must be handled before the others in order to guarantee termination (see the proof of Theorem 1 for details). They are temporarily introduced by actions 4, 9, 11, 12, and 13. The function unify_set takes as input two terms $S_1$ and $S_2$ representing non-empty sets; it selects non-deterministically which equalities between elements of $S_1$ and $S_2$ should accompany the system E in a recursive call to SUA.

function unify_set($S_1$, $S_2$);
Let $\{t_1,\dots,t_m\} = \mathit{Can}(S_1)$; $\{s_1,\dots,s_n\} = \mathit{Can}(S_2)$;
1. If $\{t_1,\dots,t_m\}$ and $\{s_1,\dots,s_n\}$ are syntactically equal then return SUA(E)
2. elseif $m = 1$ and $n > 1$ then return $\{s_i = t_1 : 1 \le i \le n\}$;
3. elseif $m \ge 1$ and $n = 1$ then return $\{t_i = s_1 : 1 \le i \le m\}$
4. else Common_part := $\{t_1,\dots,t_m\} \cap \{s_1,\dots,s_n\}$; Disagr_1 := $\{t_1,\dots,t_m\}$ - Common_part; Disagr_2 := $\{s_1,\dots,s_n\}$ - Common_part;
   (a) if Common_part = $\{\,\}$ then fix an $i \in \{1,\dots,m\}$ and select n.d. one of the following actions:
       i. return $=_1(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
       ii. return $=_2(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
       iii. return $=_3(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
   (b) if Common_part $\neq \{\,\}$ then choose n.d. $S_0, S_1 \subseteq$ Common_part with $S_0 \cap S_1 = \{\,\}$; choose n.d. $T_0 \subseteq$ Disagr_1 and $T_1 \subseteq$ Disagr_2 such that $|T_0| \le |S_0|$ and $|T_1| \le |S_1|$: return unify_set(Disagr_1 - $T_0$, Disagr_2 - $T_1$) $\cup$ unify_set2($T_1$, $S_1$) $\cup$ unify_set2($T_0$, $S_0$).
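To see how unify_set, through the functions $=_1$ and $=_3$ defined in what follows, enumerates the surjections for problem (1), the Python sketch below specializes it to distinct variables against distinct constants. It is an illustration under stated assumptions, not the authors' implementation: the index i of step 4.(a) is fixed to the first element, $=_2$ is omitted because it would bind one variable to two distinct constants, a call with one empty and one non-empty side is assumed to fail (the text defines unify_set only on non-empty sets), and the names unify_set and match1 are hypothetical Python stand-ins.

```python
from itertools import combinations
from typing import Dict, Iterator, Tuple

Binding = Dict[str, str]  # variable name -> constant name


def unify_set(xs: Tuple[str, ...], cs: Tuple[str, ...]) -> Iterator[Binding]:
    """Enumerate the solutions of {xs} = {cs} (distinct variables vs distinct constants)."""
    m, n = len(xs), len(cs)
    if m == 0 or n == 0:
        if m == n:          # both sides exhausted: success
            yield {}
        return              # assumption: empty vs non-empty fails
    if m == 1 and n > 1:
        return              # step 2: X1 would equal two distinct constants
    if n == 1:
        yield {x: cs[0] for x in xs}  # step 3
        return
    ti, others = xs[0], xs[1:]        # step 4.(a) with i fixed to 1
    yield from match1((ti,), others, cs)          # role of =_1
    # =_2 is skipped: binding ti to >= 2 distinct constants always fails
    for r in range(1, len(others) + 1):           # role of =_3: ti plus a non-empty cone
        for cone in combinations(others, r):
            rest = tuple(x for x in others if x not in cone)
            yield from match1((ti,) + cone, rest, cs)


def match1(group: Tuple[str, ...], rest: Tuple[str, ...],
           cs: Tuple[str, ...]) -> Iterator[Binding]:
    """Role of =_1: bind every variable of `group` to one constant, then recurse."""
    for j, c in enumerate(cs):
        for sol in unify_set(rest, cs[:j] + cs[j + 1:]):
            yield {**{x: c for x in group}, **sol}


solutions = list(unify_set(("X1", "X2", "X3"), ("a1", "a2")))
print(len(solutions))  # 6
```

Each solution is produced once: the fixed variable's class (alone, or together with a cone) and its constant are chosen first, and the rest must be a surjection onto the remaining constants, which is the usual recurrence for surjections.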
Some comments on unify_set are needed in order to relate it to the unification problems (1), (2), (3) and (4). Action 4.(a) is motivated by problems of type (1) and (2). The basic idea is that when an answer is computed by SUA without using the function $=_2$, that answer may be regarded as a surjective function from the leftmost set to the rightmost one. Note that the cardinality of the minimal complete set of most general unifiers for problem (1) is exactly the number of surjective functions from $\{X_1,\dots,X_m\}$ onto $\{a_1,\dots,a_n\}$. Since $=_2$ cannot be applied to problem (1) (otherwise SUA would fail), the algorithm computes surjective functions as solutions for that problem. On the other hand, for problem (2), $=_2$ must be considered in order to capture solutions containing $Y_{k_1} = X_i, \dots, Y_{k_j} = X_i$ (named $k_j$-forks), for some $j > 1$, $1 \le i \le m$. The intended meaning of unify_set2 is to consider non-deterministically either $=_1$ or $=_3$, but never $=_2$, since considering k-forks produces redundant solutions for problem (4)². The formal definition of unify_set2 is the following:

function unify_set2($\{t_1,\dots,t_m\}$, $\{s_1,\dots,s_n\}$);
1. if $m = 1$ and $n > 1$ then return $\{t_1 = s_i : i = 1,\dots,n\}$;
2. if $m \ge 1$ and $n = 1$ then return $\{t_i = s_1 : i = 1,\dots,m\}$;
3. if $m, n > 1$ then fix a value $i \in \{1,\dots,m\}$ and select n.d. one of the following actions:
   i. return $=_1(t_i, \mathit{Can}(\{t_1,\dots,t_m\}), \mathit{Can}(\{s_1,\dots,s_n\}))$;
   ii. return $=_3(t_i, \mathit{Can}(\{t_1,\dots,t_m\}), \mathit{Can}(\{s_1,\dots,s_n\}))$.

We now give the definitions of the functions $=_i$, $1 \le i \le 3$. $=_1$ matches one element $t_i$ of the first set with one element $s_j$ of the second one, and combines the two sets deprived of the selected elements:

function $=_1(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
choose n.d. one $j \in \{1,\dots,n\}$: return $\{t_i = s_j\} \cup$ unify_set($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_{j-1},s_{j+1},\dots,s_n\}$).

$=_2$ captures the concept of k-fork and is defined as follows:

function $=_2(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
choose n.d. $S \subseteq \{s_1,\dots,s_n\}$ such that $|S| \ge 2$: return $\{t_i = s : s \in S\} \cup$ unify_set($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_n\} - S$).

$=_3$ tries to unify $k > 1$ elements of the leftmost set with only one element of the rightmost one. In particular, for problem (2), $=_3$ produces bindings of the form $X_{k_1} = Y_j, \dots, X_{k_i} = Y_j$, for some $1 \le j \le n$, $i > 1$, named $k_i$-cones.

function $=_3(t_i, \{t_1,\dots,t_m\}, \{s_1,\dots,s_n\})$;
choose n.d. $\{\,\} \neq T \subseteq \{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$: return $\{t_i = t : t \in T\} \cup {=_1}(t_i, \{t_1,\dots,t_m\} - T, \{s_1,\dots,s_n\})$.

A problem of type (6) can only be solved by using action 11 of SUA. The function limit_1 must simultaneously solve the unification problem $\{X_1,\dots,X_m\} = S$, for some $S \subseteq \{Y_1,\dots,Y_n\}$ (for this reason, the definitions of $=_1$, $=_2$ and $=_3$ are embedded in its definition), and bind Z to any subset of S not containing variables occurring in any h-fork of the solution being computed.

function limit_1($S_1$, $S_2$, Z, E);
Let $\{t_1,\dots,t_m\} = \mathit{Can}(S_1)$; $\{s_1,\dots,s_n\} = \mathit{Can}(S_2)$;
1. If $\{t_1,\dots,t_m\}$ and $\{s_1,\dots,s_n\}$ are syntactically equal then return SUA($\{Z = \{\,\}\} \cup E$);
2. elseif $m = 1$ and $n > 1$ then return SUA($\{s_i = t_1 : 1 \le i \le n\} \cup \{Z = \{\,\}\} \cup E$)
3. elseif $n = 1$ and $m \ge 1$ then choose n.d. $T \subseteq \{s_1\}$: return SUA($\{t_i = s_1 : 1 \le i \le m\} \cup \{Z = T\} \cup E$)
4. else fix $i \in \{1,\dots,m\}$ and choose n.d. one of the following actions:
   i. choose n.d. $j \in \{1,\dots,n\}$ and $T \subseteq \{s_j\}$: return SUA($\{t_i = s_j\} \cup \{Z = T \cup Z'\} \cup$ limit_1($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_{j-1},s_{j+1},\dots,s_n\}$, $Z'$, E));
   ii. choose n.d. $S \subseteq \{s_1,\dots,s_n\}$ s.t. $|S| \ge 2$: return SUA($\{t_i = s : s \in S\} \cup$ limit_1($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_n\} - S$, Z, E));
   iii. choose n.d. $T' \subseteq \{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $j \in \{1,\dots,n\}$ and $T \subseteq \{s_j\}$: return SUA($\{t = s_j : t \in T'\} \cup \{t_i = s_j\} \cup \{Z = T \cup Z'\} \cup$ limit_1($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\} - T'$, $\{s_1,\dots,s_{j-1},s_{j+1},\dots,s_n\}$, $Z'$, E)).

The definition of limit_2 is similar to that of limit_1. In particular, for problems of type (8), the values of W and Z are constrained in order to avoid the introduction of variables occurring in some h-fork or h-cone, respectively, of the solution being computed. In addition, limit_2 must check that variables bound by a simple binding are not introduced in Z and W simultaneously.

function limit_2($S_1$, $S_2$, Z, W, E);
Let $\{t_1,\dots,t_m\} = \mathit{Can}(S_1)$; $\{s_1,\dots,s_n\} = \mathit{Can}(S_2)$;
1. If $\{t_1,\dots,t_m\}$ and $\{s_1,\dots,s_n\}$ are syntactically equal then return SUA($\{Z = \{\,\}\} \cup \{W = \{\,\}\} \cup E$);
2. elseif $m = 1$ and $n > 1$ then choose n.d. $T \subseteq \{t_1\}$: return SUA($\{s_i = t_1 : 1 \le i \le n\} \cup \{Z = T\} \cup \{W = \{\,\}\} \cup E$);
3. elseif $n = 1$ and $m > 1$ then choose n.d. $S \subseteq \{s_1\}$: return SUA($\{t_i = s_1 : 1 \le i \le m\} \cup \{Z = \{\,\}\} \cup \{W = S\} \cup E$);
4. elseif $m = 1$ and $n = 1$ then choose n.d. $T \subseteq \{t_1\}$, $S \subseteq \{s_1\}$ s.t. $T \cup S \neq \{t_1, s_1\}$: return SUA($\{t_1 = s_1\} \cup \{Z = T\} \cup \{W = S\} \cup E$);
5. else fix $i \in \{1,\dots,m\}$ and choose n.d. one of the following actions:
   i. choose n.d. $j \in \{1,\dots,n\}$ and $S \subseteq \{s_j\}$, $T \subseteq \{t_i\}$ such that $T \cup S \neq \{s_j, t_i\}$: return SUA($\{t_i = s_j\} \cup \{W = S \cup W'\} \cup \{Z = T \cup Z'\} \cup$ limit_2($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_{j-1},s_{j+1},\dots,s_n\}$, $Z'$, $W'$, E));
   ii. choose n.d. $S \subseteq \{s_1,\dots,s_n\}$ s.t. $|S| \ge 2$, and $T \subseteq \{t_i\}$: return SUA($\{t_i = s : s \in S\} \cup \{Z = T \cup Z'\} \cup$ limit_2($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $\{s_1,\dots,s_n\} - S$, $Z'$, W, E));
   iii. choose n.d. $T \subseteq \{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\}$, $j \in \{1,\dots,n\}$ and $S \subseteq \{s_j\}$: return SUA($\{t = s_j : t \in T\} \cup \{t_i = s_j\} \cup \{W = S \cup W'\} \cup$ limit_2($\{t_1,\dots,t_{i-1},t_{i+1},\dots,t_m\} - T$, $\{s_1,\dots,s_{j-1},s_{j+1},\dots,s_n\}$, Z, $W'$, E)).

² For problem (3), the use of function $=_2$ is not possible.
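Problem (5) goes through action 11 and limit_1. The Python sketch below specializes both to distinct variables against distinct constants, so that every solution is ground: action 11 picks the non-empty image S and puts the remaining constants into Z, while limit_1 matches the $X_i$ onto S (steps 4.i and 4.iii) and, one constant at a time, decides whether that constant also goes into Z'. As with any sketch, assumptions apply: step 4.ii is omitted because forks always fail on distinct constants, the cone T' of step 4.iii is taken non-empty (an empty cone would repeat step 4.i), a call with one empty and one non-empty side fails, and the function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterator, Tuple

Solution = Tuple[Dict[str, str], FrozenSet[str]]  # (bindings of the Xi, value of Z)


def limit1(xs: Tuple[str, ...], cs: Tuple[str, ...]) -> Iterator[Solution]:
    """Match xs onto all of cs and bind Z' to any subset of cs (ground case)."""
    m, n = len(xs), len(cs)
    if m == 0 or n == 0:
        if m == n:
            yield {}, frozenset()           # step 1: Z' = { }
        return                              # assumption: empty vs non-empty fails
    if m == 1 and n > 1:
        return                              # step 2: X1 would equal two constants
    if n == 1:
        for extra in ((), cs):              # step 3: Z' = T with T ⊆ {s1}
            yield {x: cs[0] for x in xs}, frozenset(extra)
        return
    ti, others = xs[0], xs[1:]              # step 4 with i fixed to 1
    for r in range(len(others) + 1):        # r = 0: step 4.i; r > 0: step 4.iii
        for cone in combinations(others, r):
            rest = tuple(x for x in others if x not in cone)
            for j, c in enumerate(cs):
                for extra in ((), (c,)):    # T ⊆ {sj}
                    for b, z in limit1(rest, cs[:j] + cs[j + 1:]):
                        yield {**{x: c for x in (ti,) + cone}, **b}, z | frozenset(extra)


def solve_problem5(xs: Tuple[str, ...], cs: Tuple[str, ...]) -> Iterator[Solution]:
    """Action 11 on {xs | Z} = {cs}: Z = (cs - S) ∪ Z' for a non-empty S ⊆ cs."""
    for r in range(1, len(cs) + 1):
        for image in combinations(cs, r):
            outside = frozenset(cs) - frozenset(image)
            for b, z in limit1(xs, image):
                yield b, outside | z


print(len(list(solve_problem5(("X1", "X2"), ("a1", "a2")))))  # 12
```

The 12 solutions for m = n = 2 agree with the corresponding entry of the table for problem (5).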
4.1 Termination, Correctness, and Minimality of SUA
The following definition is helpful for the termination proof. Let E be a system of equations, and let p be the total number of occurrences of function symbols in it. Then a function $\mathit{lev} : \mathit{Var}(E) \to \omega$, extended to non-variable terms as follows:
$\mathit{lev}(f(t_0,\dots,t_n)) = \max\{1 + \mathit{lev}(t_0), \dots, 1 + \mathit{lev}(t_n)\}$ when $f \in \Sigma$, $f \not\equiv \{\cdot \mid \cdot\}$,
$\mathit{lev}(\{t \mid s\}) = \max\{1 + \mathit{lev}(t), \mathit{lev}(s)\}$,
and fulfilling the condition
(i) $\mathit{lev}(\ell), \mathit{lev}(r) \le p$ for any equation $\ell = r$ in E,
always exists. If we require the further condition
(ii) $\mathit{lev}(\ell) = \mathit{lev}(r)$ for any equation $\ell = r$ in E,
then such a lev may not exist (e.g., when $E = \{X = f(X)\}$).
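The extension of lev to terms is a simple recursion. In this Python sketch (a hypothetical term representation in which $\{t \mid s\}$ is encoded as the binary symbol "{|}"), constants are assumed to have level 0, which the definition above leaves implicit; condition (ii) is then checked equation by equation. For $E = \{X = f(X)\}$ no assignment works, since it would require $\mathit{lev}(X) = 1 + \mathit{lev}(X)$.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

SET = "{|}"  # hypothetical name for the set constructor {t | s}


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Fn:
    symbol: str
    args: tuple = ()


def lev(t, levels: Dict[str, int]) -> int:
    """Extend a level assignment on variables to arbitrary terms."""
    if isinstance(t, Var):
        return levels[t.name]
    if t.symbol == SET:
        elem, rest = t.args
        return max(1 + lev(elem, levels), lev(rest, levels))
    # f(t0, ..., tn); assumption: constants (no arguments) get level 0
    return max((1 + lev(arg, levels) for arg in t.args), default=0)


def satisfies_ii(equations: List[Tuple[object, object]], levels: Dict[str, int]) -> bool:
    """Condition (ii): both sides of every equation have the same level."""
    return all(lev(lhs, levels) == lev(rhs, levels) for lhs, rhs in equations)


X, Y = Var("X"), Var("Y")
print(satisfies_ii([(X, Fn("f", (Y,))), (Y, Fn("a"))], {"X": 1, "Y": 0}))  # True
print(any(satisfies_ii([(X, Fn("f", (X,)))], {"X": k}) for k in range(10)))  # False
```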
Lemma 1. Let E be a satisfiable equation system; then a function lev whose extension to terms fulfills both conditions (i) and (ii) always exists. Let $\{X = t\} \cup E$ be an equation system and let p be the number of occurrences of functional symbols in it. Assume that X does not occur in t and that the function lev fulfills condition (i). Moreover, assume that lev is such that $\mathit{lev}(X) = \mathit{lev}(t)$. Then lev fulfills condition (i) also for the system $E[X/t] \cup \{X = t\}$.
Algorithm SUA calls the functions unify_set, limit_1, and limit_2. Proving their termination is straightforward. In the proof of the following theorem we will also make use of their semantics.
Theorem 1 (Termination). For any input system E, SUA always terminates, no matter which non-deterministic sequence of choices is made.
Proof. Assume that there exists an infinite sequence of non-deterministic choices such that SUA(E) does not terminate. Let $E^{(0)}, E^{(1)}, E^{(2)}, \dots$ be the values of E at the 0th, 1st, 2nd, ... iteration, respectively, and let p be the number of occurrences of functional symbols in $E^{(0)}$. A function $\mathit{lev} : \mathit{Var}(\bigcup_{j \ge 0} E^{(j)}) \to \omega$ which fulfills condition (i) for all equation sets $E^{(j)}$, and such that $\mathit{lev}(X) = \mathit{lev}(t)$ any time a substitution $[X/t]$ has been applied (actions 0 and 3), must necessarily exist: if it did not exist, a failure caused by the occur check would arise, making the computation finite. Picking such a lev, we define a measure of complexity $L_E$ for the equation set E:
$L_E =_{\mathrm{Def}} [\#(2p), \#(2p-1), \#(2p-2), \dots, \#(1), \#(0)]$
where $\#(j)$ returns the number of equations $\ell = r$ in E that are not in solved form and satisfy $\mathit{lev}(\ell) + \mathit{lev}(r) = j$. Lists of this form are compared with the well-founded lexicographical ordering. It is easy to see that actions 0, 1, 3, and 8 cause $L_E$ to strictly decrease. Actions 4, 9, 11, 12, and 13 do not increase $L_E$; however, they introduce active (i.e., marked) equations, which immediately fire action 0. Actions 2 and 10 leave $L_E$ unchanged; nevertheless they can at most double the number of actions, hence we may disregard them. Since the lexicographical ordering on constant-length lists of non-negative integers is a well-ordering, this is sufficient to prove termination. □

The above theorem proves that the search tree built by SUA contains no infinite branches. Since this tree is finitely branching, SUA is guaranteed to compute a finite number of unifiers. The following theorem is a direct consequence of the corresponding one presented in [3], and therefore we omit the proof here.

Theorem 2 (Correctness and Completeness). The function SUA is correct with respect to the equational theory presented in Sect. 2. Moreover, it is correct and complete with respect to the well-founded theory of hybrid sets presented in [3, 5].

It remains to show that the unification algorithm SUA is optimal w.r.t. all the chosen sample problems. We will deal only with set terms ended by the constant symbol $\{\,\}$. This simplifies the analysis of actions 9 and 10 of the algorithm (k and k' are both $\{\,\}$). The very long and technical proof is omitted due to space limits.

Theorem 3 (Minimality). SUA is optimal w.r.t. all the eight sample problems described in Sect. 3, i.e., SUA enumerates, without repetitions, a minimal complete set of unifiers for any given unification problem belonging to one of the eight given kinds.
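The measure $L_E$ used in the proof of Theorem 1 is a fixed-length list of counters, and Python happens to compare lists lexicographically, which makes it easy to experiment with. The sketch below is our own illustration: it takes the sums $\mathit{lev}(\ell) + \mathit{lev}(r)$ of the unsolved equations as input instead of computing them, and its example applies action 8 to $f(X, a) = f(Y, a)$ with $\mathit{lev}(X) = \mathit{lev}(Y) = 0$ and p = 4, producing $X = Y$ and $a = a$; even counting both new equations as unsolved, $L_E$ decreases.

```python
from typing import List


def measure(level_sums: List[int], p: int) -> List[int]:
    """Return L_E = [#(2p), #(2p-1), ..., #(1), #(0)].

    `level_sums` holds lev(l) + lev(r) for each equation l = r of E that is
    not in solved form; condition (i) keeps every sum between 0 and 2p.
    """
    counts = [0] * (2 * p + 1)
    for s in level_sums:
        if not 0 <= s <= 2 * p:
            raise ValueError("condition (i) is violated")
        counts[2 * p - s] += 1
    return counts


# Action 8 on f(X, a) = f(Y, a), with lev(X) = lev(Y) = 0 and p = 4:
before = measure([2], 4)      # one equation whose two sides both have level 1
after = measure([0, 0], 4)    # X = Y and a = a, both of level 0
print(after < before)         # True: Python compares lists lexicographically
```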
5 Conclusions
In this paper we have presented a new set unification algorithm, SUA, which is optimal for a significant set of sample problems, i.e., it computes exactly the minimal complete set of most general unifiers for them. Tables reporting the optimal number of solutions for these sample problems have also been presented. These tables show that developing minimal set unification algorithms is an interesting line of research: since the minimal number of solutions is already large, it is very important to avoid repeated solutions and solutions that are instances of other ones. Although we have only proved that SUA is optimal for the eight problems presented in Sect. 3, given their relevance (they can be used for testing any set unification algorithm), it is reasonable to expect SUA to behave well on any instance of the set unification problem. Finally, we believe that SUA can advantageously replace any other known set unification algorithm in implementations of set-based logic programming languages (for instance, in the implementation of {log} presented in [4]).
Acknowledgements
We are very grateful to Giorgio Levi for giving us the possibility of working together in Pisa. We would like to thank Alberto Policriti, Enrico Pontelli, Mario Rodríguez-Artalejo, Gianfranco Rossi, Alvaro Ruiz-Andino, and an anonymous referee for their wise advice. A special `thank you' to Ana Gil-Luezas for her unconditional help.
References
[1] Arenas-Sánchez, P., and Dovier, A. Minimal Set Unification. Technical Report TR-6/95, Università di Pisa, Dipartimento di Informatica, April 1995.
[2] Büttner, W., and Simonis, H. Embedding Boolean Expressions into Logic Programming. Journal of Symbolic Computation 4 (1987), 191–205.
[3] Dovier, A., Omodeo, E., Pontelli, E., and Rossi, G. Embedding Finite Sets in a Logic Programming Language. In 3rd Int'l Workshop on Extensions of Logic Programming (1993), E. Lamma and P. Mello, Eds., vol. 660 of Lecture Notes in Artificial Intelligence, Springer-Verlag, pp. 150–167.
[4] Dovier, A., and Pontelli, E. A WAM based Implementation of a Logic Language with Sets. In Proc. Fifth Int'l Symp. on Programming Language Implementation and Logic Programming (1993), M. Bruynooghe and J. Penjam, Eds., vol. 714 of Lecture Notes in Computer Science, Springer-Verlag, pp. 275–290.
[5] Dovier, A., and Rossi, G. Embedding Extensional Finite Sets in CLP. In Proc. of International Logic Programming Symposium, ILPS'93 (1993), D. Miller, Ed., The MIT Press, pp. 540–556.
[6] Gervet, C. Conjunto: Constraint Logic Programming with Finite Set Domains. In Proc. of International Logic Programming Symposium, ILPS'94 (1994), M. Bruynooghe, Ed., The MIT Press, pp. 339–358.
[7] Huet, G. Résolution d'équations dans des langages d'ordre 1, 2, ..., ω. Thèse d'État, Univ. de Paris VII, 1976.
[8] Jaffar, J., and Maher, M. J. Constraint Logic Programming: A Survey. The Journal of Logic Programming 19–20 (1994), 503–581.
[9] Jayaraman, B. Implementation of Subset-Equational Programs. Journal of Logic Programming 12, 4 (1992), 299–324.
[10] Legeard, B., and Legros, E. Short overview of the CLPS system. In Proc. Third Int'l Symp. on Programming Language Implementation and Logic Programming (August 1991), J. Maluszynski and M. Wirsing, Eds., vol. 528 of Lecture Notes in Computer Science, Springer-Verlag, pp. 431–433.
[11] Siekmann, J. H. Unification theory. In Unification, C. Kirchner, Ed. Academic Press, 1990.
[12] Stolzenburg, F. An Algorithm for General Set Unification and its Complexity. In Proc. of Workshop on Logic Programming with Sets, in conjunction with ICLP'93 (1993), E. G. Omodeo and G. Rossi, Eds.