On Computing Constraint Abduction Answers

Michael Maher and Ge Huang
NICTA and University of NSW, Sydney, Australia
[email protected] [email protected]

Abstract. We address the problem of computing and representing answers of constraint abduction problems over the Herbrand domain. This problem is of interest when performing type inference involving generalized algebraic data types. We show that simply recognizing a maximally general answer or fully maximal answer is co-NP complete. However we present an algorithm that computes the (finite) set of fully maximal answers of an abduction problem. The maximally general answers are generally infinite in number but we show how to generate a finite representation of them when only unary function symbols are present.

1 Introduction

Constraint abduction is the inference procedure that, given constraints B and C, infers constraint A such that A ∧ B → C. Recent work on constraint-based type inference for generalized algebraic data types (GADTs) [15] has used conjunctions of expressions B → C to express the type requirements of a program [11, 13]. Answers to a constraint abduction problem correspond to a well-typing of the program [13, 14]. Some approaches to type inference with GADTs require programmer annotations [9, 8], while others attempt to infer types without such help [13, 14]. In this paper we explore representational and computational issues that arise in the latter approach, by addressing them in the general context of constraint abduction.

In the Herbrand constraint domain the maximally general answers represent all answers, but there are in general infinitely many maximally general answers to a constraint abduction problem instance [3]. Thus computing them is not straightforward. Furthermore, we show that simply recognising that an answer is maximally general is a co-NP complete problem. There are two ways we can address this problem. The first approach is to develop a representation scheme whereby the set of maximally general answers can be finitely presented. This is a difficult problem and we obtain a solution only in the case where all function symbols are unary. The second approach is to find a finite subset of the maximally general answers that is canonically defined and of use in practice. The class of fully maximal answers was identified in [3] as omitting many “unexpected” and unhelpful maximally general answers, and was shown to be finite. It is used in [13, 14]. Here we provide an algorithm that applies without restriction on the function symbols and generates all fully maximal answers.

For both maximally general answers and fully maximal answers we address specifically the case where there are no function symbols. We have simple characterizations in this case, and we use them to show that the number of maximally general answers and fully maximal answers can grow explosively, even in this simple case.

After some preliminaries on constraint abduction and the Herbrand constraint domain (Section 2) we address, in turn, the complexity of recognising answers (Section 3), and the problems of representing all maximally general answers (Section 4) and computing all fully maximal answers (Section 5).

2 Background

The syntax and semantics of constraints are defined by a constraint domain. Given a signature Σ, and a set of variables Vars (which we assume is infinite), a constraint domain is a pair (D, L) where D is a Σ-structure and L (the language of constraints) is a set of Σ-formulas closed under conjunction and renaming of free variables. Constraint abduction is a form of abduction where all predicates in the formulas over which the abduction is being inferred have a semantics determined by a constraint domain.

Definition 1. The Simple Constraint Abduction (SCA) Problem is as follows: Given a constraint domain (D, L), and given two constraints B, C ∈ L such that D |= ∃̃(B ∧ C), for what constraints A ∈ L does

D |= (A ∧ B) → C   and   D |= ∃̃(A ∧ B)?

An instance of the problem has a fixed constraint domain and fixed constraints B and C. A constraint A satisfying the above properties is called an answer. Intuitively, B is background information and C is a conclusion drawn from B and some missing information; each answer A is a candidate for the missing information. Throughout this paper, A, B and C refer to the constraints in a simple constraint abduction problem. We assume B ∧ C is satisfiable; otherwise, there are no answers. Usually we leave the constraint domain implicit and use (A ∧ B) → C, for example, in place of D |= (A ∧ B) → C.

Of all the answers, we are most interested in the maximally general answers, that is, constraints A such that (A ∧ B) → C and, for every A′, if A → A′ and (A′ ∧ B) → C then A′ → A. (That is, there is no answer strictly more general than A.) Under some conditions on the constraint domain, they can also be thought of as a means to (somewhat) compactly represent all answers.

In general, we wish to solve several SCA problems simultaneously, but it is shown in [3] that maximally general answers to such a joint problem can be constructed from the maximally general answers of the individual SCA problems using the algorithm JCA-Solve. Furthermore, if C is c_1 ∧ · · · ∧ c_n then we can reduce the SCA problem involving B and C to the problem of solving simultaneously the SCA problems involving B and c_i, for i = 1, . . . , n. Thus we can assume without loss of generality that C consists of a single constraint.

In general, there can be infinitely many maximally general answers to a SCA problem, many of which result in a conjunction A ∧ B that is not maximally general. In contexts where A will later be combined with B, we might want A ∧ B to be maximally general. This has led to the definition of fully maximal answers [3] – a subset of the maximally general answers that is finite in some constraint domains.

Definition 2. An answer A is fully maximal if A is a maximally general answer and A ∧ B is maximally general among all expressions A′ ∧ B where A′ is an answer. Equivalently, A is a fully maximal answer iff A is a maximally general answer and (A ∧ B) ↔ (B ∧ C).

In this paper we are primarily interested in the Herbrand constraint domain, which consists of (possibly) existentially quantified conjunctions of equations on terms, where equality of ground terms is syntactic identity. We will denote this constraint domain by FT∃. The weaker constraint domain where existential quantifiers are not used is denoted by FT. These domains are widely used for symbolic computation in automated reasoning, logic programming and type systems. Unification [10, 2] is an algorithm for solving equations in these constraint domains.

We assume that in every SCA instance there is a finite set of variables x̃ of interest. Usually we can take x̃ to be vars(B) ∪ vars(C), where vars(o) denotes the free variables of o. A constraint in FT∃ is in standard form if it has the form ∃ỹ. x̃ = t̃(ỹ), where x̃ is a sequence of all variables of interest and ỹ is a disjoint set of existentially quantified variables. The righthandside of such a constraint in standard form is the sequence of terms t̃(ỹ). Every satisfiable constraint in FT∃ (and FT) can be presented in this form. In FT a constraint is in solved form if it has the form x̃ = t̃(ỹ), where x̃ and ỹ are disjoint.

Example 1. We consider a SCA problem instance over FT. Let B be k(h(x), y) = k(v, g(z)) and C be v = h(f(z)). The solved form of B is v = h(x), y = g(z). Among the maximally general answers to this SCA instance are: the trivial answer v = h(f(z)); x = f(z); and x = f(u), y = g(u), for any variable u other than v, x, y, z. The latter class of answers is among the “unexpected” maximally general answers [3] to this instance; they involve a variable not involved in the problem. The answers in this class are not fully maximal. The other two answers are fully maximal. For example, v = h(f(z)) ∧ v = h(x) → x = f(z). If we consider this problem as an instance over FT∃ then the standard form of B is ∃u_1, u_2. v = h(u_1), x = u_1, y = g(u_2), z = u_2 and the standard form of C is ∃u_3. v = h(f(u_3)), z = u_3. Again v = h(f(z)) and x = f(z) are maximally general answers. Another fully maximal answer is ∃u. x = f(u), y = g(u), which is strictly more general than x = f(u), y = g(u) for any variable u.

In the Herbrand constraint domains, whether FT∃ or FT, the maximally general answers of a SCA problem instance represent all answers, and in general there are infinitely many of them, but always only a finite number of fully maximal answers [3].

3 Recognising Answers

It is straightforward to determine whether a given constraint A is an answer for a SCA problem involving B and C. It can be done in time linear in the size of A, B, C, using a linear unification algorithm [7, 5]. However, determining whether A is maximally general or fully maximal is substantially harder.

Theorem 1. Let A, B, and C be constraints of FT, and consider the SCA problem involving B and C.
1. The question whether A is an answer can be decided in linear time.
2. Recognising that A is a maximally general answer is co-NP complete.
3. Recognising that A is a fully maximal answer is co-NP complete.

To prove parts 2 and 3 we reduce SAT to the problem of finding a more general answer than a given answer. We conjecture that in FT∃ recognising that an answer is maximally general or fully maximal is also co-NP complete. However, Theorem 1 and its proof do not directly extend to FT∃.
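To make the answer test of Theorem 1(1) concrete, the following minimal sketch (in Python; the term encoding and helper names are ours, for illustration only) checks whether a candidate A is an answer by computing a most general unifier of A ∧ B and testing whether every equation of C is forced by it. It relies on the standard fact that, over the algebra of finite trees with a non-trivial signature, E → s = t exactly when s and t become syntactically identical under an mgu of E; the naive unifier below is not the linear-time algorithm of [7, 5].

# Terms: variables are Python strings, compound terms are tuples (functor, arg1, ...),
# and constants are 1-tuples such as ("a",).  This encoding is ours.

def is_var(t):
    return isinstance(t, str)

def walk(t, subst):
    # Follow variable bindings to their current representative.
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(x, t, subst):
    t = walk(t, subst)
    if t == x:
        return True
    return not is_var(t) and any(occurs(x, a, subst) for a in t[1:])

def unify(eqs):
    # Return an mgu of the equations (a dict binding variables), or None.
    subst, todo = {}, list(eqs)
    while todo:
        s, t = todo.pop()
        s, t = walk(s, subst), walk(t, subst)
        if s == t:
            continue
        if is_var(s):
            if occurs(s, t, subst):
                return None                      # occurs check fails
            subst[s] = t
        elif is_var(t):
            todo.append((t, s))
        elif s[0] == t[0] and len(s) == len(t):
            todo.extend(zip(s[1:], t[1:]))       # decompose f(...) = f(...)
        else:
            return None                          # clash of function symbols
    return subst

def resolve(t, subst):
    # Apply the substitution exhaustively to a term.
    t = walk(t, subst)
    return t if is_var(t) else (t[0],) + tuple(resolve(a, subst) for a in t[1:])

def is_answer(A, B, C):
    # A is an answer iff A ∧ B is satisfiable and (A ∧ B) → C.
    mgu = unify(A + B)
    return mgu is not None and all(resolve(s, mgu) == resolve(t, mgu) for s, t in C)

# The instance of Example 1: B is k(h(x), y) = k(v, g(z)) and C is v = h(f(z)).
B = [(("k", ("h", "x"), "y"), ("k", "v", ("g", "z")))]
C = [("v", ("h", ("f", "z")))]
print(is_answer([("x", ("f", "z"))], B, C))   # True: x = f(z) is an answer
print(is_answer([("x", "z")], B, C))          # False: x = z is not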

4 Maximally General Answers

In this section we consider only Herbrand domains where the signature Σ contains only constants and unary function symbols. The restriction to unary function symbols does not limit the unruly proliferation of maximally general answers as identified in [3].

Example 2. Let B be x0 = x1, x2 = x3 and C be v = z. Then, in addition to the more obvious maximally general answers is A_y defined by x0 = s(v), x1 = s(y), x2 = t(y), x3 = t(z) for any variable y not occurring in B or C, and any terms s and t.

Before defining an algorithm we must introduce several definitions. Let Σ₁ be the set of unary function symbols in Σ. We will use words constructed from these symbols to represent the repeated application of the functions. For example, the application of functions f(g(h(f(x)))) is represented by the word fghf applied to the variable x. The application of a word w to a term t is written w(t). The empty word is denoted by ε.

While function symbols represent tree (or term) constructors, we introduce inverse elements as the deconstructors for the underlying function symbol. Thus f^{-1} applied to the term f(g(x)), or f^{-1}(f(g(x))), is equal to g(x). In terms of equations, x = f^{-1}(y) is defined to mean y = f(x). We extend the inverse notation to general expressions by defining (uw)^{-1} = w^{-1}u^{-1}. In some cases the application of an inverse element to a term is not meaningful, for example, application of g^{-1} to f(g(x)). It corresponds to a clash of function symbols in unification. As a result, composition of expressions is a partial function. For example, (ff^{-1})g is equivalent to g, but f(f^{-1}g) represents a clash.

We can formulate equational reasoning in a partial algebra W of term constructors and destructors where the values are Σ₁-words, ε is the empty word, each σ ∈ Σ₁ is a constant, there is a binary composition operator (represented by juxtaposition) and the inverse operator ^{-1}. (The problem of solving equations on this partial algebra has some similarity to solving equations on the free group with rational constraints forcing each variable to be a word [1].) The meaning of expressions in the algebra is given by formulas in FT∃ relating two variables, and similarly for equations over the algebra. Let e, e1, e2 be expressions in the algebra and w be a Σ₁-word.

[[x (w) y]] is the equation x = w(y)
[[x (e^{-1}) y]] = [[y (e) x]]
[[x (e1 e2) y]] = ∃z. [[x (e1) z]] ∧ [[z (e2) y]]
[[e1 = e2]] = ∀x, y. ([[x (e1) y]] ↔ [[x (e2) y]])

We introduce an infinite set WVars of variables ranging over words, and extend expressions to incorporate word variables. A word expression is an expression in the algebra that does not involve the inverse operator. Clearly, the composition operator is associative in cases where both results are defined. We adopt the convention that composition associates to the right. For a set S of expressions, we define S^{-1} = {s^{-1} | s ∈ S}.

Recall that we write B → c for FT |= B → c. We say terms s and t are B-equivalent if B → s = t. We write [s] for the B-equivalence class of s.

We group the terms that are (perhaps indirectly) equationally related by B. Given a set of equations B, a B-class is a minimal non-empty set S of terms that is closed under (a) B-equivalence, (b) taking subterms, and (c) taking superterms. Each B-class has at least one term (variable or constant) such that neither it, nor any term B-equivalent to it, has a subterm. For each B-class we fix one such term and refer to it as the base of the B-class. We say a B-class has a constant base if it contains a constant (in which case we will choose the constant to be the base without loss of generality). Every term in a B-class is B-equivalent to a term with the base as a subterm. Thus, for every term t there is a (unique) corresponding word w_t such that B → t = w_t(b), where b is the base of the B-class containing t. Note that all B-classes are disjoint and, given a variable or constant z, all terms containing an occurrence of z are in the same B-class. Since we assume B is satisfiable in FT, each B-class contains at most one constant.

Given two variables x and y in the same B-class, they are equationally related via x = w_x(b), y = w_y(b), where b is the base of the B-class. We can visualize the parts of the B-class relevant to x and y as shown in Figure 1(a). In those terms, w_x = ac and w_y = dc, where c is the greatest common suffix¹ of w_x and w_y.

A B-class contains infinitely many B-equivalence classes, assuming Σ₁ is non-empty. However, a finite representation of the B-classes can be computed by applying a congruence closure algorithm [6] to the equations in B. The constants and variables not congruent to any term with subterms are candidates for the base of a B-class. It is straightforward to use an ordering on variables and constants to canonically choose a base. We will assume that a constant is chosen as base, whenever possible.

Now, let C be v = w(z), where v is a variable, but z may be a variable or a constant. Consider a prospective answer A, which we can assume is in solved form.

¹ The greatest common suffix of words w_1 and w_2 is the longest word w such that there are words u_1 and u_2 with w_1 = u_1 w and w_2 = u_2 w.
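The reasoning with words and their inverses can be mechanised directly. The sketch below (Python; the encoding of words as lists of (symbol, ±1) pairs is ours) simplifies a composite expression by cancelling adjacent σ^{-1}σ pairs and reporting a clash when a deconstructor meets a different constructor, and it computes the greatest common suffix used in Figure 1(a).

CLASH = None   # composition is partial; None marks an undefined (clashing) result

def word(s):
    # A plain word built from constructors, e.g. word("fg") stands for fg.
    return [(ch, +1) for ch in s]

def inv(expr):
    # (uw)^{-1} = w^{-1} u^{-1}
    return [(sym, -sign) for sym, sign in reversed(expr)]

def compose(*exprs):
    # Compose expressions left to right, cancelling sigma^{-1} sigma pairs and
    # reporting a clash when sigma^{-1} is followed by a different constructor.
    out = []
    for sym, sign in (item for e in exprs for item in e):
        if out and out[-1][1] == -1 and sign == +1:
            if out[-1][0] == sym:
                out.pop()              # sigma^{-1} sigma cancels to the empty word
                continue
            return CLASH               # sigma^{-1} tau with sigma != tau
        out.append((sym, sign))
    return out

def greatest_common_suffix(w1, w2):
    n = 0
    while n < min(len(w1), len(w2)) and w1[-1 - n] == w2[-1 - n]:
        n += 1
    return w1[len(w1) - n:]

# h^{-1} h f^{-1} f reduces to the empty word (this label reappears in Example 4):
print(compose(inv(word("h")), word("h"), inv(word("f")), word("f")))   # []
# Applying g^{-1} to f(g(x)), i.e. composing g^{-1} with fg, is a clash:
print(compose(inv(word("g")), word("fg")))                             # None
print(greatest_common_suffix(word("fgh"), word("hgh")))                # the word gh

Note that the sketch leaves constructor-followed-by-deconstructor pairs untouched; a clash-free variable-free expression therefore normalises to a word followed by an inverted word, the u′u^{-1} shape that appears in requirement (4) below.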

Fig. 1. Diagrams of (a) a B-class, and (b) the relevant part of the A, B-graph for Example 3.

The coarse A, B-graph is an undirected graph where the B-classes are vertices, for each equation (s1 = s2) ∈ A there is an edge between the B-class of s1 and the B-class of s2, and there are no other edges. We can use the coarse A, B-graph to eliminate some prospective answers that are not maximally general answers. A simple path between B-classes b1 and b2 is a minimal set of edges that connects b1 and b2. When necessary, we can arrange the edges in a sequence, to form a path as conventionally defined.

Proposition 1. Consider a solved form constraint A and its coarse A, B-graph. If A is a maximally general answer of the SCA problem involving B and C (v = w(z)) then the coarse A, B-graph consists exactly of a simple path between the B-class of z and the B-class of v.

However, the coarse A, B-graph does not address the details of A and cannot even be used to determine whether A ∧ B is satisfiable, much less characterize maximally general answers. Hence we define a more detailed graph. The A, B-graph is a labelled, directed graph where the B-equivalence classes are vertices and there is an edge from [s] to [t] labelled with a word u if either B → t = u(s) or t = u(s) ∈ A. A simple path in the A, B-graph is a simple path in the underlying undirected graph.

We will use paths in this graph to represent equational reasoning that might be used to infer C (i.e. v = w(z)) from A ∧ B. Traversing an edge in the direction of an arrow labelled with u corresponds to applying the word u to the terms in the current B-equivalence class, that is, constructing (the B-equivalence class of) larger terms from the current terms. Conversely, traversing an edge against the direction of an arrow labelled with u corresponds to applying u^{-1} to the current B-equivalence class and deconstructing the current terms, that is, deleting the prefix u from the word defining a current term to produce (the B-equivalence class of) a subterm. If u is not a prefix of the word defining the term then we cannot establish an equational relationship between the terms at the two endpoints of the edge.

An MGA-path from z to v is a sequence of B-equivalence classes E_0, . . . , E_{2n+1} such that E_0 = [z], E_{2n+1} = [v], E_{2i} and E_{2i+1} are in the same B-class, for i = 0, . . . , n, and the path involves at most one B-class with a constant base. (Note that E_{2i} and E_{2i+1} may be the same B-equivalence class.) The MGA-path is induced by the solved form constraint A if {E_{2i+1}, E_{2i+2}} = {[x], [y]} for some equation x = u(y) ∈ A, for i = 0, . . . , n − 1 and, conversely, for each equation x = u(y) ∈ A we have {[x], [y]} = {E_{2i+1}, E_{2i+2}}, for some i. We say that A and the MGA-path correspond. Notice that the edges relevant to the previous definition involve only B-equivalence classes of variables or constants. Hence only finitely many MGA-paths follow the same route as a simple path in the coarse A, B-graph. An MGA-path is simple if the sequence contains either 2 or 0 occurrences of B-equivalence classes from each B-class.

Example 3. Let C be v = z, B be x = hg(w), y = fg(w), z = h(u) and A be x = h(v), y = f(z), where u, v, w, x, y, z are variables. Then there are three non-simple B-classes: one contains v as the base, and is non-simple only because v occurs in C; one contains z and u, with u as the base; and one contains w, x and y, with w as the base. Part of the A, B-graph is shown in Figure 1(b). The MGA-path induced by A is the path from [z] to [v] via [x] and [y]. The coarse A, B-graph consists of the three non-simple B-classes, connected by the dashed edges.

We now refine Proposition 1.

Proposition 2. A maximally general answer A induces a simple MGA-path from [z] to [v] in the A, B-graph.

Since the equational relationship between any two B-equivalence classes [x] and [y] in the same B-class is as described in Figure 1(a), a maximally general answer must induce a path within the A, B-graph of the form shown in Figure 1(b), where the dashed edges correspond to equations in A. Let us index the variables and words. We will use the naming scheme for variables and words in a B-class described in Figure 1(a) with an index i for the i'th B-class (counting from v, on the left), so that B → (x_i = a_i c_i(b_i) ∧ y_i = d_i c_i(b_i)) and equations in A relate y_i and x_{i+1}. These equations may be of the form y_i = u_i(x_{i+1}) or x_{i+1} = u_i(y_i), where u_i is a word. We express these two possibilities compactly as y_i = u_i^{e_i}(x_{i+1}) where e_i ∈ {1, −1}.

We use the algebra of term constructors and their inverses to formulate requirements on A to be a maximally general answer. The main requirement is the condition for A, represented by a path from z to v, to establish that v = w(z).

a_1 d_1^{-1} u_1^{e_1} ⋯ a_i d_i^{-1} u_i^{e_i} ⋯ a_n d_n^{-1} = w    (1)

This is a necessary, but not sufficient, condition for A to be a maximally general answer.

Example 4. Continuing with Example 3 in Figure 1(b), the path from [z] to [v] via [y] and [x] is labelled h^{-1}hf^{-1}f, which is equal to ε, implying that A ∧ B → C. However A ∧ B is inconsistent: there is a clash between the h in z = h(u) and the g in y = fg(w). Thus A is not an answer. If, in place of z = h(u), B contained z = a then we would have a clash between a and g.

The problem is that, while the MGA-path demonstrates the possibility that we have a maximally general answer, a c_i(b_i) (in terms of Figure 1) can be incompatible with either another c_j, as in the above example, or with part of the MGA-path. There are two possibilities: a clash between a constant and a function symbol, or a clash between function symbols.

The first possibility arises only if the path contains a B-class with a constant base b. Suppose it is the m'th B-class. For every simple path from b to another base we require that the result is a word. This ensures that there is no clash between b and a unary function symbol. There are two variations of the constraint, depending on whether the target base is in a B-class between b and z or between b and v. For each j : m < j ≤ n

∃u. c_j^{-1} a_j^{-1} u_{j-1}^{-e_{j-1}} d_{j-1} ⋯ a_{i+1}^{-1} u_i^{-e_i} d_i ⋯ d_{m+1} a_{m+1}^{-1} u_m^{-e_m} d_m c_m = u    (2)

For each j : 1 ≤ j < m

∃u. c_j^{-1} d_j^{-1} u_j^{e_j} a_{j+1} ⋯ a_i d_i^{-1} u_i^{e_i} ⋯ a_{m-1} d_{m-1}^{-1} u_{m-1}^{e_{m-1}} a_m c_m = u    (3)

If there is a clash between function symbols on the MGA-path the requirement (1) will not be satisfied, because the left side will not evaluate to a word. We need to examine all paths between bases to ensure that there is no clash between the function symbols off the MGA-path and any other function symbol. Every variable-free word expression without a clash can be simplified to the form u′u^{-1}, for some words u and u′. Hence we require that each path expression between bases will evaluate to this form. For each j, m : 1 ≤ j < m ≤ n

∃u, u′. c_j^{-1} d_j^{-1} u_j^{e_j} a_{j+1} ⋯ a_i d_i^{-1} u_i^{e_i} ⋯ a_{m-1} d_{m-1}^{-1} u_{m-1}^{e_{m-1}} a_m c_m = u′u^{-1}    (4)

The requirements (1) – (4) are necessary and sufficient for A to be a maximally general answer.

Theorem 2. Consider a SCA problem involving B and C (v = w(z)) and a solved form constraint A. A is a maximally general answer if and only if A corresponds to a simple MGA-path in the A, B-graph from [z] to [v], and, for that path, requirements (1) – (4) are satisfied.

Thus if we can finitely represent all solutions to the requirements (1) and (2) – (4), where we now regard the u_i's as variables, then we also represent all maximally general answers. This leads us to the abstract algorithm in Figure 2. Note that the output equations may contain (possibly constrained) word variables in the solutions for the u_i's; any consistent instantiation of all these variables by words gives a maximally general answer.

For this approach to produce a finite representation, we must have an upper bound m on the number of B-classes in a sequence. A B-class is simple if it contains exactly one variable or constant (which must be the base) and does not contain v or z. A variable (constant) that appears in a simple class is called a simple variable (simple constant). It turns out that no maximally general answer A in solved form can correspond to an MGA-path with two adjacent simple B-classes, because of the syntactic form of solved forms and the lack of any other term B-equivalent to the simple variable or constant.

algorithm MGA(B, C)
  for n = 1, . . . , m do
    for every usable sequence of B-classes B_1, . . . , B_n do
      for i = 1, . . . , n − 1 do
        choose value of e_i
        choose values for [y_i] in B_i and [x_{i+1}] in B_{i+1} to form an MGA-path
      Generate equations (1) – (4) in variables u_i
      Solve equations for u_i
      choose variable or constant representatives p_i and q_i for each [x_i] and [y_i]
      Check that the output describes a sufficiently general solved form
      output equations q_i = u_i^{e_i}(p_i)
end algorithm

Fig. 2. Nondeterministic algorithm for maximally general answers
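The outermost choice in Figure 2 can be made concrete as follows. The sketch below (Python; the representation of B-classes as plain identifiers tagged simple or non-simple is ours) enumerates candidate sequences under the assumption that a usable sequence consists of distinct B-classes running from the B-class of z to the B-class of v, with no two simple classes adjacent and length at most m = 2q − 1, the bound justified just below. The remaining steps of Figure 2 (choosing the exponents e_i and representatives, and solving (1) – (4)) are not modelled.

from itertools import permutations

def usable_sequences(classes, simple, start, end):
    # classes: all B-classes as hashable identifiers; simple: the simple ones;
    # start, end: the B-classes of z and of v respectively.
    if start == end:
        yield (start,)
        return
    q = sum(1 for c in classes if c not in simple)
    max_len = 2 * q - 1
    middle = [c for c in classes if c not in (start, end)]
    for n in range(2, max_len + 1):
        for mid in permutations(middle, n - 2):
            seq = (start,) + mid + (end,)
            if any(a in simple and b in simple for a, b in zip(seq, seq[1:])):
                continue                 # unusable: two adjacent simple classes
            yield seq

# A toy instance with non-simple classes Z, Q, V and one simple class P:
for seq in usable_sequences(["Z", "Q", "P", "V"], {"P"}, "Z", "V"):
    print(seq)
# ('Z', 'V'), ('Z', 'Q', 'V'), ('Z', 'P', 'V'), ('Z', 'Q', 'P', 'V'), ('Z', 'P', 'Q', 'V')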

Hence if q is the number of non-simple B-classes, then we can take m = 2q − 1; no maximally general answer corresponds to a longer MGA-path. A sequence of B-classes is usable if simple B-classes do not appear consecutively in the sequence. Unusable sequences do not correspond to maximally general answers.

There is one further restriction on simple variables in maximally general answers: simple variables must appear on the righthandside of equations in A, but must not appear as a bare variable (i.e. they must appear as part of a larger term). For example, if x_s is a simple variable then y_1 = s(x_s), y_2 = t(x_s) is acceptable as part of A, assuming s and t are non-empty words, but y_1 = x_s, y_2 = t(x_s) is less general than y_2 = t(y_1) and, similarly, x_s = s(y_1), y_2 = t(x_s) is less general than y_2 = ts(y_1), while x_s = s(y_1), x_s = t(y_2) is not in solved form. The same point applies to simple constants. Although non-simple variables may appear on the lefthandside of equations in A, they also may not appear as a bare variable on the righthandside (except for v and z), nor may they appear both on the lefthandside and the righthandside of equations in A. For simplicity, we express these restrictions in the algorithm of Figure 2 as a check before output, but clearly it would be more efficient to enforce them at the time choices are made.

The names of simple variables are unimportant: any renaming of these variables will result in a different maximally general answer. Thus any maximally general answer involving a simple variable other than v and z represents an infinite set of maximally general answers (since Vars is infinite). The answers involving simple variables are some of the “unexpected” maximally general answers discussed in [3] and Example 1.

Rather than solve the equations (1) – (4) directly, we employ a backtracking search procedure that produces a finite representation of the values of the word variables u_i for which the equations A on an MGA-path form a maximally general answer. There is no room in this paper to present the complete algorithm solve, but we give an outline. The state of the algorithm is described by the following parameters: the remainder of the MGA-path to be explored; a variable or constant b that is the deepest base of a B-class discovered so far; word expressions r, s, and t describing the relationship between the current point p, the desired value of v (that is, w(z)) and b; and constraints E on word variables.

The relationship of some of these parameters is displayed in Figure 3. In the algorithm, we iterate along the expression a_1 d_1^{-1} u_1^{e_1} ⋯ a_i d_i^{-1} u_i^{e_i} ⋯ a_n d_n^{-1} of (1) from right to left, preserving the invariants listed below at each point p (in B-class B_j) of that expression by updating the parameters. The algorithm branches non-deterministically when different updates are possible, and on each branch accumulates conditions on the word variables u_i. These conditions E are output when the branch terminates successfully. If E is unsatisfiable then the branch fails (i.e. terminates unsuccessfully). Thus the algorithm has a constraint programming style.

Let N be the number of symbols (variables and constants) in (1). Each branch has length bounded by N. At each point the branching factor is at most N + n. Testing satisfiability of E terminates, since the constraints involved have a restricted form. Hence the algorithm must terminate.

Fig. 3. Diagram of algorithm parameters.

Let e denote the part of the expression already visited. The invariants are:
1. the longest common suffix of s and t is ε
2. A ∧ B |=_E ts^{-1}(p) = tr(b) = w(z)
3. A ∧ B |=_E p = sr(b) = e(z)
4. b is a constant iff some B_i has a constant base for j ≤ i ≤ n
5. A ∧ B |=_E (2)_j ∧ (3)_j ∧ (4)_j

Here A ∧ B |=_E ψ denotes that for every valuation σ of the word variables that satisfies E, FT |= σ(A) ∧ B → σ(ψ) (where σ(A) and σ(ψ) are well-formed (sets of) equations in the language of FT). (i)_j denotes the subset of equations (i) that refers to paths within and between B-classes B_j, B_{j+1}, . . . , B_n. The collection of conditions E output by the algorithm solve constitutes a finite representation of the solutions of equations (1) – (4).

Theorem 3. Consider a SCA problem involving B and C over FT, and equations (1) – (4) for a given MGA-path.
1. The algorithm solve terminates.
2. Let σ be a solution to the equations (1) – (4). Then σ can be extended to a solution of one of the outputs of the solve algorithm.
3. If E is an output of the solve algorithm then every solution of E, when restricted to the free variables of equations (1) – (4), is a solution of those equations.

Combining Theorem 2 and Theorem 3, we establish that the algorithm MGA(B, C) in Figure 2 produces a finite representation of all maximally general answers of a simple constraint abduction problem over FT where C is a single equation.

When C contains multiple equations we apply the JCA-Solve algorithm of [3] to the outputs of MGA(B, c), for each c ∈ C, as mentioned earlier.

Example 5. Consider a variation of Example 3 where C is v = z and B is x = hg(w), y = fg(w). The B-classes of v, y, z form a usable sequence and there is an MGA-path [z], [y], [x], [v]. Of the possible values of the exponents, only one combination leads to equations with a solution. We consider this case, where A will have the form x = u_1(v), y = u_2(z). The resulting constraints on the word variables u_i are u_1^{-1}fh^{-1}u_2 = ε from (1) and the following three constraints from (4): ∃u, u′. g^{-1}h^{-1}u_2 = u′u^{-1}; ∃u, u′. u_1^{-1}fh^{-1}u_2 = u′u^{-1}, which is redundant wrt the first equation; and ∃u, u′. u_1^{-1}fg = u′u^{-1}. The solve algorithm produces the following two descriptions of solutions to the constraints: u_1 = f, u_2 = h and u_1 = fgu, u_2 = hgu for any word u. These correspond to the answers x = f(v), y = h(z) and x = f(g(k(v))), y = h(g(k(z))) for any term k (including the empty term). There are, of course, answers derived from other MGA-paths.

Only small modifications are needed to handle FT∃ constraints: B-classes might contain existential variables, and these cannot be used as representatives of classes in an MGA-path, and simple variables must be considered existential variables.

A Signature of Constants

When there are only constants in the signature the above discussion becomes much simpler. There is no distinction between FT and FT∃, the B-classes reduce to B-equivalence classes, simple B-classes become irrelevant, the A, B-graph is identical to the coarse A, B-graph, and all labels on edges of the A, B-graph are the empty word. Thus every solved form constraint that corresponds to a simple path from [z] to [v] is a maximally general answer. We can use this to count the number of maximally general answers in some cases. For example,

Proposition 3. Consider a SCA problem where the signature Σ consists only of constants, C is v = z and at most one constant appears in B and C. The number of maximally general answers is

∑_{n=2}^{q+2} ∑_{sequences B_1 ... B_n} |B_1| ∗ |B_n| ∗ ∏_{i=2}^{n−1} |B_i| ∗ (|B_i| − 1)

where q is the number of non-simple B-classes, |B_i| denotes the cardinality of a B-class B_i, and we sum over all sequences of non-simple B-classes with B_1 = [z] and B_n = [v].

When there are function symbols the number of maximally general answers can grow rapidly – doubly exponentially – or can be infinite [3]. But even for SCA problems that do not involve function symbols, the number of maximally general answers can grow very rapidly.

Example 6. Consider the SCA problem where C is x_1 = x_{2m} and B is x_1 = x_2, . . . , x_{2i−1} = x_{2i}, . . . , x_{2m−1} = x_{2m}. Then |B_i| = 2, for i = 1, . . . , m, and |B_1| ∗ |B_n| ∗ ∏_{i=2}^{n−1} |B_i| ∗ (|B_i| − 1) = 2^n for any n. For any length n there are (n − 2)! sequences that start with B_1 and end with B_n. Thus there are ∑_{n=2}^{m} (n − 2)! ∗ 2^n maximally general answers. If m ≥ 2 then there is a lower bound of (m − 2)!2^m and an upper bound of (m − 1)!2^m.

Note that this discussion only applies when the signature does not contain any unary (or higher arity) function symbols. It is not sufficient to simply impose that the SCA problem is function-free (i.e. no such function symbols appear in B or C).

Example 7. Let B be u = v and C be x = y. Then the algorithm in Figure 2 generates the maximally general answers x = y, and x = u, y = v, and x = v, y = u. However, if the signature contains function symbols then there are also maximally general answers x = k(u), y = k(v) and v = k(x), u = k(y), among others, for any term k. Nevertheless, it seems that Example 6 provides a lower bound on the growth of the number of expressions needed to represent all maximally general answers in the worst case.

Example 6 is also suggestive of the number of maximally general answers for any constraint domain containing equations, since the problem and the answers are valid in any constraint domain. On the other hand, there are some (non-equational) constraint domains that have a unique most general answer [4].
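The count in Proposition 3 can also be evaluated by brute force. The sketch below (Python; the encoding of B-classes as sets of symbols is ours, and z and v are assumed to lie in different B-classes) enumerates every sequence of distinct non-simple B-classes from the class of z to the class of v and accumulates the product from the proposition.

from itertools import permutations

def count_mgas(classes, z_class, v_class):
    # classes: the non-simple B-classes as sets of variables/constants;
    # z_class, v_class: indices into classes for the classes of z and of v.
    middle = [i for i in range(len(classes)) if i not in (z_class, v_class)]
    total = 0
    for k in range(len(middle) + 1):
        for mid in permutations(middle, k):
            seq = (z_class,) + mid + (v_class,)
            product = len(classes[seq[0]]) * len(classes[seq[-1]])
            for i in seq[1:-1]:
                product *= len(classes[i]) * (len(classes[i]) - 1)
            total += product
    return total

# The instance of Example 2 read over a signature of constants only:
# B is x0 = x1, x2 = x3 and C is v = z.
classes = [{"z"}, {"x0", "x1"}, {"x2", "x3"}, {"v"}]
print(count_mgas(classes, z_class=0, v_class=3))   # 13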

5 Fully Maximal Answers

To a very limited extent we can use the results of the previous section to compute fully maximal answers. For example,

Proposition 4. Consider a SCA problem over a signature of constants. Suppose C consists of a single equation v = z. Then the fully maximal answers are those defined by the algorithm MGA with a sequence of B-classes of length 2. Thus there are |[v]| ∗ |[z]| fully maximal answers.

When C contains more than one equation the JCA-Solve algorithm will not necessarily produce all the fully maximal answers from the individual fully maximal answers.

Example 8. Let B be x = y, z = w and let C be x = a, z = a. Then the smaller SCA problems where c_1 is x = a (and c_2 is z = a) have fully maximal answers x = a and y = a (respectively, z = a and w = a). The JCA-Solve algorithm combines these answers to give four fully maximal answers (such as x = a, z = a). However, there are other fully maximal answers such as x = z, y = a that are not composed from fully maximal answers to smaller problems.

Thus we take a more direct approach to computing fully maximal answers. Fortunately, under this approach we need no restriction on the signature. We will first address the problem over the constraint domain FT∃, and later discuss the small modifications needed to adapt it to FT. Suppose A is a constraint of FT∃ in standard form and S is a nonempty set of positions in A. We define a few auxiliary functions.

algorithm FMA(B, C)
  if (B → C) then return true
  let A be the standard form of B ∧ C
  do forever
    let 𝒜 be next(A)
    if (∀A′ ∈ 𝒜. A′ ∧ B ↛ C) then return A
    choose A ∈ 𝒜 such that A ∧ B → C
end algorithm

Fig. 4. Nondeterministic algorithm for computing fully maximal answers

pos(t, A) returns the set of positions of the term t in the righthandside of A.
repl(S, A) replaces all terms in the righthandside of A occurring at a position in S by a new variable (that is existentially quantified).
next(A) is the set {repl(S, A) | S is a nonempty set of positions of identical terms t in A such that t ∉ Vars, or S ⊂ pos(t, A)}.

next(A) is a set of constraints strictly more general than A. All constraints (up to equivalence) more general than A can be generated by iterating next. The algorithm defined in Figure 4 non-deterministically finds a maximally general answer that is more general than B ∧ C. Using backtracking to implement the nondeterminism, we can enumerate all fully maximal answers.

It is straightforward to see that the algorithm is correct and terminates: By the conditions in the definition of next, each iteration of the loop makes A strictly more general. Since for each constraint there are only finitely many more general constraints, the loop must terminate. An invariant of the loop is that A is an answer and is more general than B ∧ C. When the loop is exited, there are no answers more general than A, and hence A must be a maximally general answer. Thus, by Definition 2, A is fully maximal. Furthermore, since any constraint that is more general than B ∧ C can be obtained from B ∧ C by a sequence of next operations, every fully maximal answer is generated. Thus

Theorem 4. Algorithm FMA outputs all fully maximal answers to the SCA problem over FT∃ and terminates.

There are optimizations that could be applied to the algorithm. In particular, it will find the same fully maximal answer several times because the order in which subterms t of A are generalized by repl is not significant to the final outcome. If we restrict the order in which subterms t are chosen we will avoid this possibility, and restrict the branching factor. We can also require that subterms t that are chosen do not contain new variables (introduced by repl). This prevents the algorithm proceeding by several “partial” generalizations. Another possibility is to require an incremental approach, where next(A) contains only constraints minimally (strictly) more general than A. That can be achieved by restricting t in the definition of next to terms containing at most one function symbol.

We need to vary the algorithm only slightly to find fully maximal answers over the constraint domain FT. We assume A is in solved form, and redefine two auxiliary functions as follows.

repl(S, A, x, t) deletes x = t from A and replaces all occurrences of t at a position in S by x.
next(A) is the set {repl(S, A, x, t) | x = t ∈ A, S ⊆ pos(t, A)}.

A variant of this algorithm has been proposed by Tom Schrijvers [12]. The correctness of the algorithm follows from the same argument as for the previous theorem, underpinned by results from [2] on the structure of constraints in FT.

Theorem 5. Algorithm FMA, with the auxiliary functions modified as above, outputs all fully maximal answers to the SCA problem over FT and terminates.

The correctness of the algorithm relies only on next(A) returning all constraints minimally more general than A, and termination relies only on every constraint having only finitely many more general constraints. Thus the abstract algorithm in Figure 4 can be adapted to any constraint domain where a constraint has only finitely many generalizations, provided next can be defined constructively.

Obviously these algorithms have high complexity. To some extent this cannot be avoided. [3] has an example with a binary function symbol where the number of fully maximal answers in FT grows doubly exponentially with the size of B and C. Again we look at the function-free case, where we have a more direct characterization of the fully maximal answers.

A Signature of Constants

When Σ contains only constants we can consider the A, B-graph as an undirected graph. We say A connects B-classes [s] and [t] if there is a path from [s] to [t] in the A, B-graph. We now characterize the fully maximal answers in the function-free case.

Theorem 6. Consider a function-free SCA problem. A is a fully maximal answer iff A and C connect the same B-equivalence classes and the A, B-graph is a forest.

Unlike the computation of maximally general answers discussed in Section 4, this result permits signatures containing function symbols (provided those symbols do not appear in B or C). Using this characterization we can show that the number of fully maximal answers can grow exponentially.

Example 9. Suppose C is y_i = t_i for i = 1, . . . , n, where each y_i and t_i occurs just once in C, and B consists of the equations y_i = y_i′ for i = 1, . . . , n where the y_i′ are variables not appearing elsewhere in B and C. Then the size of the problem, combining B and C, is Θ(n). The connected subgraphs of the C, B-graph all consist of a single edge connecting two vertices (or the trivial case of an isolated vertex). It follows from Theorem 6 that any fully maximal answer A has an A, B-graph isomorphic to the C, B-graph. The i'th edge might be represented in A by y_i = t_i or y_i′ = t_i. Thus there are 2^n inequivalent fully maximal answers to this problem.

We can make the same point here as in the discussion of maximally general answers over a signature of constants: it appears that most constraint domains involving equations will have a similar growth in the number of fully maximal answers.
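The characterization in Theorem 6 is easy to check mechanically. The following sketch (Python; the union-find helper and the representation of equations as pairs of symbols are ours) computes the B-equivalence classes of a function-free problem and then tests the two conditions of the theorem, assuming, as throughout this section, that B ∧ C is satisfiable and that A is a conjunction of equations between variables and constants.

class UnionFind:
    def __init__(self):
        self.parent = {}
    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x
    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)

def b_equivalence(B):
    uf = UnionFind()
    for s, t in B:
        uf.union(s, t)
    return uf

def is_forest(A, uf):
    # The A,B-graph (vertices: B-equivalence classes, one edge per equation of A)
    # is a forest iff no edge ever closes a cycle.
    comp = UnionFind()
    for s, t in A:
        u, v = comp.find(uf.find(s)), comp.find(uf.find(t))
        if u == v:
            return False
        comp.union(u, v)
    return True

def connects_same_classes(A, C, uf):
    # A and C must connect exactly the same B-equivalence classes.
    def components(eqs):
        comp = UnionFind()
        for s, t in eqs:
            comp.union(uf.find(s), uf.find(t))
        return comp
    ca, cc = components(A), components(C)
    touched = {uf.find(x) for s, t in A + C for x in (s, t)}
    return all((ca.find(x) == ca.find(y)) == (cc.find(x) == cc.find(y))
               for x in touched for y in touched)

def fully_maximal(A, B, C):
    uf = b_equivalence(B)
    return is_forest(A, uf) and connects_same_classes(A, C, uf)

# Example 8: B is x = y, z = w and C is x = a, z = a.
B = [("x", "y"), ("z", "w")]
C = [("x", "a"), ("z", "a")]
print(fully_maximal([("x", "a"), ("z", "a")], B, C))              # True
print(fully_maximal([("x", "z"), ("y", "a")], B, C))              # True
print(fully_maximal([("x", "a"), ("y", "a"), ("z", "a")], B, C))  # False: not a forest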

6 Conclusion

We have investigated constraint abduction over the Herbrand domain. We have shown how to compute fully maximal answers, and represent finitely all maximally general answers in the unary case. However, these problems are intractable in their full generality, and even in terms of the number of fully maximal answers in the function-free case. This suggests that the use of constraint abduction for practical type inference is out of reach until a smaller subset of answers can be identified with the meaning of the type constraints, or more compact representations can be found. For the latter quest, the A, B-graph is a starting point.

Acknowledgements: The authors thank J. Jaffar, T. Schrijvers and P. Stuckey for discussions related to this paper. We thank the referees for their comments, which helped improve the presentation. NICTA is funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Centre of Excellence program.

References

1. V. Diekert, C. Gutierrez & C. Hagenah, The existential theory of equations with rational constraints in free groups is PSPACE-complete, Information and Computation 202(2), 105–140, 2005.
2. J-L. Lassez, M.J. Maher & K.G. Marriott, Unification Revisited, in: Foundations of Deductive Databases and Logic Programming, J. Minker (Ed.), 587–625, Morgan Kaufmann, 1987.
3. M.J. Maher, Herbrand Constraint Abduction, Proc. Symp. on Logic in Computer Science, 397–406, 2005.
4. M.J. Maher, Heyting Domains for Constraint Abduction, Proc. Australian Joint Conf. on Artificial Intelligence, LNAI 4304, Springer, 9–18, 2006.
5. A. Martelli & U. Montanari, An Efficient Unification Algorithm, ACM Trans. Program. Lang. Syst. 4(2), 258–282, 1982.
6. G. Nelson & D.C. Oppen, Fast Decision Procedures Based on Congruence Closure, JACM 27(2), 356–364, 1980.
7. M. Paterson & M.N. Wegman, Linear Unification, J. Comput. Syst. Sci. 16(2), 158–167, 1978.
8. S. Peyton Jones, D. Vytiniotis, S. Weirich & G. Washburn, Simple Unification-based Type Inference for GADTs, Proc. Int. Conf. on Functional Programming, 50–61, ACM Press, 2006.
9. F. Pottier & Y. Régis-Gianas, Stratified type inference for generalized algebraic data types, Proc. POPL, 232–244, ACM Press, 2006.
10. J.A. Robinson, A Machine-Oriented Logic Based on the Resolution Principle, JACM 12(1), 23–41, 1965.
11. V. Simonet & F. Pottier, Constraint-based type inference with guarded algebraic data types, ACM Transactions on Programming Languages and Systems 29(1), 2007.
12. T. Schrijvers, personal communication.
13. P.J. Stuckey, M. Sulzmann & J. Wazny, Type Processing by Constraint Reasoning, Proc. Asian Symp. on Programming Languages and Systems, LNCS 4279, Springer, 1–25, 2006.
14. M. Sulzmann, T. Schrijvers & P.J. Stuckey, Type inference for GADTs via Herbrand constraint abduction, Report CW 507, K.U.Leuven, Dept. of Computer Science, 2008.
15. H. Xi, C. Chen & G. Chen, Guarded recursive datatype constructors, Proc. POPL, 224–235, 2003.