Multi-SLD Resolution

(Appeared in LPAR94)

Donald A. Smith (1) and Timothy J. Hickey (2)

(1) Department of Computer Science, University of Waikato, Hamilton, New Zealand

Email: [email protected], Fax: (07) 838-4155

(2) Department of Computer Science, Brandeis University, Waltham, MA 02254,

Email: [email protected], Fax: (617) 736-2741

Abstract. Multi-SLD resolution is a variant of SLD resolution based on a simple idea: Let the allowed constraints be closed under disjunction, and provide a mechanism for collecting solutions to a goal and turning the solutions into a disjunctive constraint. This idea leads to an operational model of logic programming, called data or-parallelism, in which multiple constraint environments partially replace backtracking as the operational embodiment of disjunction. The model has a natural implementation on data-parallel computers since each disjunct of a disjunctive constraint can be handled by a single (virtual) processor. In this paper, we

- formalize the notions of multi-SLD resolution, multi-derivation, multi-SLD tree, and environment tree;
- prove the soundness and completeness of multi-SLD resolution; and
- describe and justify several useful optimization techniques based on the form of constraints in a multi-derivation: the distinction between engine and multi variables, templates, and sharing of bindings in the environment tree.

Together these results provide the foundations for a new operational semantics of disjunction in logic programming.

1 Introduction

We describe a resolution rule, multi-SLD resolution, that has the novel property of yielding a natural, data-parallel implementation of or-parallelism. We give a formal definition of multi-SLD resolution and the related concepts of multi-derivation and multi-SLD tree. We then go on to prove the soundness and completeness of multi-SLD resolution and to examine some properties of the set of substitutions comprising the constraint component of the state of a multi-derivation. On this basis, we devise several useful optimizations.

Previous papers [23] [21] informally introduced multi-SLD resolution and the Prolog dialect MultiLog; described a machine architecture (the Multi-WAM) for executing MultiLog programs; and presented benchmark results for sequential and parallel implementations of the language. Even on a uniprocessor computer, multi-SLD was shown to be faster than SLD for many (perhaps most) combinatorial search problems. For example, the Instant Insanity puzzle from [29] runs about 3 times faster on uniprocessor MultiLog than on uniprocessor Prolog, using comparable WAM technology. For a contrived program that checks bit strings for palindromicity, uniprocessor MultiLog is over 88 times faster than uniprocessor Prolog. The reason for this speedup is explained briefly below and in more detail in a companion paper [24], where a predictive model is given that accounts for observed speedups to within a constant factor. A second companion paper, [25], presents and analyzes the complexity of several representation schemes for managing substitutions in MultiLog. The first author's dissertation, [27], discusses all of these issues in more detail.

Informally, multi-SLD resolution can be described as follows. The abstract machine state of a multi-SLD interpreter consists of two components: a list of goals, and a disjunction ((multi)set) of substitutions (see footnote 3). There are two sorts of multi-SLD resolution steps.

In a normal multi-SLD resolution step, some atom is selected from the goal list and resolved against some clause in the program; since there are multiple substitutions in the constraint component, unification of the atom with the head of the clause occurs independently in the various substitutions. If any substitutions survive the resolution step, then the surviving substitutions, extended with the bindings resulting from head unification, become the constraint component of the next abstract machine state, whose goal list is found by replacing the selected atom with the body of the clause.

In a disj multi-SLD resolution step, a subcomputation is begun on the selected atom (which in practice is annotated by the unary control operator disj) and some finite, nonempty subset of the solutions to the selected atom is collected and installed as the new constraint component. The new goal component consists of the previous goal list minus the selected atom. If control backtracks into the disj goal, then another nonempty, finite subset of solutions is collected and installed, and so on.
The canonical example that illustrates multi-SLD resolution and the resulting data or-parallelism is the query (see footnote 4)

    | ?- generate(X),test(X).

To solve this query, standard Prolog enumerates the solutions to generate/1 one by one via backtracking and tests each solution separately with test/1. A control or-parallel implementation [1] [8] starts up multiple Prolog search engines to explore subparts of the SLD tree in parallel. In contrast, if we prefix the goal generate(X) with the operator disj, then an implementation based on multi-SLD resolution collects subsets of the solutions to generate/1 and creates a set of binding environments which are tested en masse (in parallel) by test/1. As a result, test/1 is executed once per subset rather than once per solution, and fewer instructions are executed overall.

Moreover, powerful optimizations are available for data or-parallel Prolog that are not available for control or-parallel Prolog due to the latter's multiple threads of control. With multi-SLD resolution, the various substitutions in the constraint component of the abstract machine state share much structure, and the unifications in the various substitutions are not, after all, independent. The shared, common component of computation (in test/1) can be "factored out" so that computations are performed (and structures are built on the heap) only once per subset of solutions.

The plan of the present paper is as follows. Section 2 formally defines multi-SLD resolution and the related concepts of multi-derivation and multi-SLD tree. Section 3 introduces the notion of environment tree, which describes the shared structure of substitutions resulting from multi-derivations; it then uncovers an additional type of sharing among substitutions. Section 4 proves the soundness and completeness of multi-SLD resolution and compares the relative completeness of depth-first SLD and multi-SLD interpreters. Section 5 presents a formal justification of two implementation techniques: the distinction between engine (sequential) and multi (parallel) variables, and templates. The last section considers related and future work.

Footnote 3: More generally and from the viewpoint of CLP, the second component consists of a disjunction of allowed constraints.

Footnote 4: In general, multiple variables can get bound by a disj goal.

2 Multi-SLD Resolution

In this section we make precise the operational semantics of MultiLog by formalizing the notions of multi-SLD resolution, multi-derivation, and multi-SLD tree. The reader interested mainly in the optimizations may wish to skip to Section 3.

We assume the standard definitions and notations of (constraint) logic programming [13] [9]. In an abstract operational model of a CLP language, the state of a derivation is a pair $G \diamond C$, where $G$ (the goal component) is a list of user atoms, $C$ (the constraint component) is an allowed constraint, and $\diamond$ is a synonym for $\wedge$. Starting from an initial state whose goal component is the query and whose constraint component is the empty constraint true, the resolution rule tells how to progress non-deterministically to the next state. In standard CLP languages this rule is just resolution, and a sequence of states progressing according to the resolution rule is called a (unary) derivation.

In multi-SLD resolution the constraint component is a disjunction. This fact by itself is not foreign to the CLP framework, in which the allowed constraints can be any logical formula. What is new, though, is the mechanism for creating disjunctive constraints, as well as the implementation techniques available and the style of programming that results. We describe multi-SLD resolution for the special case where the Herbrand Universe is the domain of computation; the extension to other constraint domains is straightforward.

In order for a derivation to continue from a state $G \diamond C$ it is necessary for $C$ to be solvable. That is, the existential closure of $C$ must evaluate to true in the theory of the constraint domain. Logically, when $C$ is a disjunction, it is sufficient for just one disjunct to be solvable, and we have investigated [27] the effects of lazily evaluating the disjunctive constraints. But here we assume that unsolvable disjuncts are deleted from the constraint component, so that at each step the constraint component consists of a disjunction of solvable formulas.
To this end, let $S$ be a function which, applied to a constraint $C$, reduces $C$ to a simplified form. In MultiLog, $S$ is Herbrand's solved form algorithm (unification) applied to each disjunct, with false disjuncts deleted (subsumed disjuncts could also be removed). The result is a DNF formula: a disjunction of solved form sets of equations.

Let $\Box$ be the empty goal. Assume some fixed computation rule for selecting atoms to participate in resolution steps. Define a multi-derivation from query $G_0$ and program $P$ to be a (finite or infinite) sequence of states $S_0, S_1, \ldots$, where $S_0 \equiv (G_0 \diamond \mathit{true})$ and for $i > 0$, $S_{i-1}$ derives $S_i$, written $S_{i-1} \Longrightarrow S_i$, by the multi-resolution rule of Figure 1. Let $\Longrightarrow^{*}$ be the reflexive transitive closure of $\Longrightarrow$. If a multi-derivation is finite and ends in a state with an empty goal component and with a non-false constraint $C$, then we write $S_0 \Longrightarrow^{*} (\Box \diamond C)$.

The multi-resolution rule is divided into two cases, depending on whether the selected atom is a normal user atom (e.g., p(f(X))) or a disj goal (e.g., disj q(Y,g(X,Y))). In practice, the interpreter or compiler could decide which goals should be labeled with disj operators. In this sense, the choice between the two multi-resolution rules is arbitrary, and each time an atom is selected a new decision could be made which rule to use.

1. (normal multi-resolution rule) Suppose the current state is $A_1, \ldots, A_n \diamond (E_1 \vee \ldots \vee E_m)$; the selected atom, $A_i$, is a normal user atom; $H \leftarrow B_1, \ldots, B_k$ is a new renaming of a clause in $P$; and

$$C \equiv \bigvee_{1 \le j \le m} \bigl(E_j \wedge (A_i = H)\bigr) \quad \text{with } S(C) \not\equiv \mathit{false}.$$

Then

$$A_1, \ldots, A_n \diamond (E_1 \vee \ldots \vee E_m) \;\Longrightarrow\; A_1, \ldots, A_{i-1}, B_1, \ldots, B_k, A_{i+1}, \ldots, A_n \diamond S(C).$$

This rule is just the standard resolution rule for CLP languages explicitly specialized to the case where the constraint component is a disjunction and where the domain of computation is the Herbrand Universe.

2. (disj multi-resolution rule) Suppose the current state is $A_1, \ldots, A_n \diamond C$; the selected atom $A_i$ is a disjunctive goal (disj $A$);

$$F \subseteq_{\mathit{finite}} \{\, C_{Sol} \mid (A \diamond C) \Longrightarrow^{*} (\Box \diamond C_{Sol}) \,\};$$

and $F \neq \emptyset$. Then

$$A_1, \ldots, A_n \diamond C \;\Longrightarrow\; A_1, \ldots, A_{i-1}, A_{i+1}, \ldots, A_n \diamond \bigvee_{C_{Sol} \in F} C_{Sol}.$$

In words, a nonempty, finite subset $F$ of the solutions to $A \diamond C$ is obtained. The new state has a goal component consisting of the remaining goals and a constraint component equal to the disjunction of constraints in $F$.

Fig. 1. Multi-SLD Resolution
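The normal multi-resolution rule can be illustrated with a minimal Python sketch. It is not MultiLog or the Multi-WAM; it only mirrors the rule as stated: the selected atom is unified with the clause head separately in each disjunct, failed disjuncts are deleted, and the step is rejected when no disjunct survives. The term encoding (uppercase strings for variables, tuples for compound terms, dictionaries for substitutions) and the names `unify` and `normal_step` are assumptions made for this illustration, and the unifier omits the occurs check.

```python
def is_var(t):
    """Variables are strings starting with an uppercase letter (Prolog convention)."""
    return isinstance(t, str) and t[:1].isupper()

def walk(t, env):
    """Follow bindings in env until reaching an unbound variable or a non-variable."""
    while is_var(t) and t in env:
        t = env[t]
    return t

def unify(a, b, env):
    """Return an extension of env that unifies a and b, or None on failure.
    Illustrative only: no occurs check."""
    a, b = walk(a, env), walk(b, env)
    if a == b:
        return env
    if is_var(a):
        return {**env, a: b}
    if is_var(b):
        return {**env, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            env = unify(x, y, env)
            if env is None:
                return None
        return env
    return None

def normal_step(goals, disjuncts, i, head, body):
    """Resolve goals[i] against an already renamed clause head :- body.
    Returns the next state (goals, disjuncts), or None if no disjunct survives."""
    survivors = []
    for env in disjuncts:
        extended = unify(goals[i], head, env)
        if extended is not None:          # false disjuncts are deleted
            survivors.append(extended)
    if not survivors:                     # S(C) is false: the rule does not apply
        return None
    return goals[:i] + body + goals[i + 1:], survivors

goals = [("test", "X")]
disjuncts = [{"X": "a"}, {"X": "b"}, {"X": "c"}]
# Clause test(b).  Only the disjunct with X = b survives.
fact_state = normal_step(goals, disjuncts, 0, ("test", "b"), [])
# Clause test(Y1) :- even(Y1).  Every disjunct survives, extended with Y1.
rule_state = normal_step(goals, disjuncts, 0, ("test", "Y1"), [("even", "Y1")])
```

Here each disjunct is unified in a separate loop iteration; the sharing optimizations discussed later in the paper are deliberately left out of the sketch.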

The nondeterminism in the choice of clauses and in the choice of solutions for disj goals is essential: breadth-first search (or some equivalent search strategy such as iterative deepening) is in general necessary to guarantee completeness. For a goal disj A, breadth-first search requires concurrent consideration of subsets of solutions to A.

By answer we mean a constraint (disjunction of substitutions) returned by a successful multi-derivation. By solution we mean a substitution returned as a disjunct in some answer. For 8 queens, MultiLog and Prolog both return 92 solutions. But Prolog returns 92 answers (via backtracking), while MultiLog returns just one answer (assuming all solutions are returned at once to each disj goal).
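The distinction between answers and solutions can be made concrete with a small Python sketch for the 8 queens example. It is a brute-force enumeration, not a model of how MultiLog or Prolog searches; the function names and the `chunk` parameter (a hypothetical stand-in for the size of the finite subset collected by a disj goal) are assumptions for illustration.

```python
from itertools import permutations

def queens_solutions(n=8):
    """All placements of n non-attacking queens; cols[r] is the column used in row r."""
    return [cols for cols in permutations(range(n))
            if len({r + c for r, c in enumerate(cols)}) == n    # distinct diagonals
            and len({r - c for r, c in enumerate(cols)}) == n]  # distinct anti-diagonals

def answers(solutions, chunk):
    """Group solutions into nonempty finite subsets; each subset plays the role
    of one answer, that is, one disjunction of substitutions."""
    return [solutions[k:k + chunk] for k in range(0, len(solutions), chunk)]

solutions = queens_solutions(8)
one_at_a_time = answers(solutions, 1)             # backtracking style: one solution per answer
all_at_once = answers(solutions, len(solutions))  # all solutions returned to the disj goal at once
```

With a chunk size of 1 the number of answers equals the number of solutions; with the chunk size equal to the number of solutions there is a single answer containing every solution as a disjunct. Intermediate chunk sizes correspond to other choices of finite subsets.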

2.1 Multi-SLD Tree

An SLD tree describes the search space of an SLD interpreter with a given selection function for a program P and query Q [13]. For multi-SLD resolution the corresponding structure is the multi-SLD tree. Multi-SLD trees are parameterized by a selection function on atoms in a goal list and by a partition function on the solutions to disj goals. Each node is labeled by a pair consisting of a goal list and a set of substitutions. In addition, disj nodes (defined below) are labeled, recursively, with multi-SLD trees. The root is labeled by the query goal and a set consisting of an empty substitution. Each arc is labeled, in a manner to be made precise below, by a set of substitutions representing the bindings resulting from that multi-SLD resolution step. The set of substitutions labeling each node is obtained by composition of substitutions on arcs along the path from the root.

There are two kinds of nodes, corresponding to the two multi-resolution cases above. For normal nodes, the selected atom is not annotated with disj; normal nodes have one child for each multi-resolvent of the selected user atom. For disj nodes, the selected atom is annotated with disj; disj nodes have a finite or infinite number of child nodes, depending on whether the argument goal has a finite or an infinite number of solutions, respectively.

Each leaf is marked either as a success node (labeled by the empty goal together with an answer), as a failure node (if the node is a normal node with no resolving clauses, or the node is a disj node with no solutions and no infinite multi-derivations), or as a loop node (in which case it is a disj node for a goal that has no solution and that has infinite multi-derivations). A branch in a multi-SLD tree can be identified with a multi-derivation.
Suppose normal node $n$ is labeled with goal list $A_1, \ldots, A_{i-1}, A_i, A_{i+1}, \ldots, A_n$ and set of substitutions $\theta_1, \ldots, \theta_k$; $A_i$ is the selected atom; and $H \leftarrow B_1, \ldots, B_m$ is a (new renaming of a) clause in $P$ such that $H$ can be unified with $A_i\theta_j$ for some $j$. Then there is a child node $c$ of $n$ labeled with $A_1, \ldots, A_{i-1}, B_1, \ldots, B_m, A_{i+1}, \ldots, A_n$ and the set of all substitutions $\theta'$ such that for some $j \in \{1, \ldots, k\}$, $A_i\theta_j = H$ has mgu $\sigma_j$ and $\theta' = \theta_j\sigma_j$. The arc from $n$ to $c$ is labeled with the set of mgus $\sigma_j$. Notice that there are as many child nodes as there are clauses that multi-resolve with the selected atom given the input substitutions $\theta_1, \ldots, \theta_k$.

Suppose disj node $n$ is labeled with goal list $G$ and nonempty set of substitutions $S' = \{\theta_1, \ldots, \theta_k\}$; and disj $A$ is the selected atom. Let $T$ be a multi-SLD tree with root node labeled with goal list $A$ and substitutions $S'$; and let $S$ be the possibly infinite, possibly empty set of solutions appearing at success nodes of $T$. Finally, let $S_1, S_2, \ldots$ be some partition of $S$, dependent on the given partition function, such that each $S_i$ is nonempty and finite. Then $n$ is labeled with the multi-SLD tree $T$, and $n$ has child nodes $c_1, \ldots$, each labeled with goal $G - (\text{disj } A)$ and sets of substitutions $S_1, S_2, \ldots$, respectively. The arc from $n$ to child node $c_i$ is labeled with $\{\sigma \mid \exists\theta \in S_i\ \exists j \in \{1, \ldots, k\} : \theta = \theta_j\sigma\}$. That is, the arc is labeled with the differential substitutions $\sigma$ that represent the bindings resulting from goal $A$, one substitution for each solution in the set $S_i$ at $c_i$.

3 Two Types of Sharing

In this section we uncover some facts about the logical form of environments in a multi-derivation. We expose two types of sharing among environments, corresponding to the two multi-resolution rules. These facts will be used both in proving soundness and completeness and in devising efficient implementations of multi-SLD resolution.

3.1 Sharing from disj Multi-resolution Steps: Environment Trees

After a multi-resolution step, each of the substitutions in the constraint component extends some unique input substitution from before the step. Moreover, for disj multi-resolution steps, any given input substitution can have multiple output substitutions extending it; if the argument goal succeeds m times then each input substitution can have up to m child substitutions. As multiple disj goals are encountered, execution results in an implicit tree of substitutions, organized according to the parent-child relationship. Each surviving disjunct in the constraint component extends some ancestor disjunct in each previous constraint component of the multi-derivation.

These ideas are formalized in the notion of environment tree, which we present with an example. (See [27] for the precise, straightforward definition.) Figure 2 shows the tree of environments resulting from execution of the following program and query. We assume all solutions are obtained for each disj goal. (The atoms disj p(X), etc. on the right are for documentation only and are not part of the environment tree.)

    | ?- disj p(X),disj q(X,Y),r(Y),disj s(Z).

    p(a). p(b). p(c).
    q(a,1). q(a,2). q(c,4). q(c,6).
    r(Y):- Y
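The parent-child structure described above can be sketched in Python for the first two disj goals of the example, using only the p/1 and q/2 facts that appear in full in the program listing; the later goals r(Y) and disj s(Z) are left out. The `Env` class, its fields, and the helper names are assumptions made for this illustration and are not the representation used by the Multi-WAM.

```python
P_FACTS = ["a", "b", "c"]                            # p(a). p(b). p(c).
Q_FACTS = [("a", 1), ("a", 2), ("c", 4), ("c", 6)]   # q(a,1). q(a,2). q(c,4). q(c,6).

class Env:
    """One node of an environment tree: it stores only the bindings added by the
    latest disj step and shares everything older through its parent."""
    def __init__(self, parent, bindings):
        self.parent = parent
        self.bindings = bindings
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def lookup(self, var):
        """Search this node, then its ancestors, for a binding of var."""
        env = self
        while env is not None:
            if var in env.bindings:
                return env.bindings[var]
            env = env.parent
        return None

    def full(self):
        """The complete substitution, collected along the path from the root."""
        out = {} if self.parent is None else self.parent.full()
        out.update(self.bindings)
        return out

root = Env(None, {})
# disj p(X): one child of the root per solution of p/1.
after_p = [Env(root, {"X": x}) for x in P_FACTS]
# disj q(X,Y): each input environment gets one child per matching q/2 fact.
after_q = [Env(env, {"Y": y})
           for env in after_p
           for (x, y) in Q_FACTS
           if x == env.lookup("X")]
```

In this sketch the environment with X = b has no children after the second step, so it does not appear among the surviving disjuncts, while the environments with X = a and X = c each have two children that share the binding of X stored in their parent.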
