A WAM-Based Implementation of a Logic Language with Sets

Agostino Dovier
Dipartimento di Informatica, Universita' di Pisa
C.so Italia 40, 56100 Pisa, Italy
[email protected]

Enrico Pontelli
Dept. of Computer Science, New Mexico State University
Las Cruces, NM 88003, USA
[email protected]

Abstract: The paper analyzes an approach for integrating set-theoretic constructs into a logic programming language. The focus is on describing a new abstract machine, based on the classical WAM, designed to support these new features. A major part of the paper concentrates on the implementation of the new unification algorithm and of the set-constraint management support.

1 Introduction

Set theory is a universally accepted means for representing various forms of knowledge, in both formal and commonsense reasoning. Nevertheless, its use as a programming tool has been quite limited, due to the inherent complexity of computing with sets and to the gap between unordered structures (sets) and ordered ones (computer memory). In recent years, the evolution of declarative programming paradigms (such as functional and logic programming) has resulted in more effort being put towards integrating set theory into these languages. Various proposals [2,5,31] describe the problems introduced by the inclusion of sets, together with different approaches to solving them. Some of these proposals have actually been implemented, leading to new programming languages which gained a certain success (such as SETL [29] and LDL [5]). The language that we propose, called {Log}, has been obtained by merging pure logic programming (Horn clause logic) with finite set theory. The resulting language has proved successful in describing problems in different application fields (planning, operations research, deductive databases). A complete and formal definition of the language is presented in [15,16].

The main purpose of this paper is to present an actual implementation of {Log}, focusing on the main problems encountered and the solutions proposed. The whole implementation is based on a modified Warren Abstract Machine, which allows us to take advantage of the most modern compilation techniques. The first part of the paper is dedicated to a concise description of {Log}; only the features relevant to the implementation analysis are presented. Section 3 gives a general overview of the implementation, based on a modification of the classical Warren Abstract Machine. Section 4 deals with the main aspects of the implementation of the extended unification algorithm used by {Log}; Appendix A contains a formal description of the algorithm adopted in our implementation. Section 5 presents the core of the {Log} resolution procedure, based on constraint management [26,32]. Section 6 draws some considerations regarding the possible interactions between set management and parallel processing. Section 7 concludes the presentation.

2 {Log}

2.1 Syntax

{Log} [15,16] is a pure logic programming language augmented with facilities designed to permit the effective use of finite sets in a logical framework. The language is based on a classical signature ⟨C, F, V, P⟩, where the symbols denote respectively the sets of constants, functional symbols, variables, and predicate symbols. The only restrictions that we impose are:
❥ a constant, denoted {}, should always appear in C;
❥ a binary functional symbol, with, is introduced in F;
❥ four new predicate symbols appear in P, indicated respectively as =, ≠, ∈, ∉.
These syntactical extensions allow the construction of terms used to describe sets. The constant {} denotes the empty set (Ø). The terms constructed with the symbol with have a clear meaning: s with t represents the set s ∪ {t}, that is, the set obtained by adding the element t to the set s. In the rest of this paper we will implicitly assume that the first argument of with represents a set (although often superfluous...). Some considerations on the relaxation of this requirement (leading to the so-called colored sets) can be found in [15].
The new predicate symbols are subdivided into two classes: {=, ∈} are called positive constraints, while {≠, ∉} are called negative constraints. This distinction affects the way in which such predicates are manipulated: positive constraints are solved using the extended unification procedure, while negative constraints are handled by an ad-hoc constraint solver (called Can). For the sake of simplicity we keep this distinction also at the clause level. A program clause is an object of the form
A :- C1 & ... & Cn ◊ B1 & ... & Bm
where the Ci are negative constraints and the Bi are positive constraints and generic atoms. The symbol ◊ is a simple separator. The clause format is analogous to the one adopted in CLP(R) [21].
Example: the following clauses describe very simply how it is possible to compute all the permutations of a set of elements:
permutations({}, []).
permutations(R with A, [A|S]) :-
    A ∉ R ◊
    permutations(R, S).

Note that the program automatically removes any repeated element in the set. ❚
The kind of constructors introduced, although very simple, has proved to be very powerful: some sophisticated set operators (e.g., Restricted Universal Quantifiers [16,8]) can be easily translated using proper combinations of with, constraint symbols, and recursion [15].

2.2 Declarative Semantics

{Log} satisfies most of the semantic properties of a pure logic programming language [4]. Some novel problems are introduced by the necessity of mimicking the 'intuitive' behavior of the new symbols in the various interpretations of the language. A complete development of these semantic features falls outside the scope of this paper; the main results are briefly reported here. A complete description of the semantic structure adopted is reported in [15], while the various semantic results can be found in [16].
Clearly, pure Herbrand interpretations and models no longer comply with our needs: the Herbrand universe does not allow us to represent the intuitive relations existing between terms like ({} with a) and ({} with a with a), and other similar properties. The declarative semantics of {Log} has been built around the notion of E-Herbrand Universe [20], where E identifies the following equational theory¹:
(X with Y with Y = X with Y)∀ ²
(X with Y with Z = X with Z with Y)∀
and the E-Herbrand Universe is defined as the set of the congruence classes induced on the classical Herbrand universe by the finest congruence relation satisfying the axioms of E. Intuitively, the main idea is to replace the usual syntactical equivalence with a more powerful concept of equality, which embodies the notion of equivalence between sets, as expressed in the equational theory above. This concept assumes a basic importance in the context of unification: a complete study of the development of a unification algorithm modulo E will be presented in a successive work.
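The mutual-inclusion reading of the two axioms can be made concrete with a small sketch. This is a toy illustration only: ground set-terms over integer constants are flattened into arrays (possibly with repetitions), which is not the {Log} representation, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* does the flattened set contain the given element? */
static bool member(const int *s, size_t n, int elem) {
    for (size_t i = 0; i < n; i++)
        if (s[i] == elem) return true;
    return false;
}

/* Two ground set-terms are equal modulo the axioms
   (X with Y with Y = X with Y) and (X with Y with Z = X with Z with Y)
   exactly when they contain the same elements (mutual inclusion),
   regardless of order and repetitions. */
bool eq_mod_E(const int *s, size_t ns, const int *t, size_t nt) {
    for (size_t i = 0; i < ns; i++)
        if (!member(t, nt, s[i])) return false;
    for (size_t j = 0; j < nt; j++)
        if (!member(s, ns, t[j])) return false;
    return true;
}
```

Under this reading, ({} with a with a) and ({} with a) flatten to [1, 1] and [1], which eq_mod_E reports as equal.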

1. For the sake of readability we omit the parentheses in expressions built with with, assuming left associativity of the operator.
2. The notation (...)∀ denotes the universal closure of the open formula between brackets.

In this universe it is possible to prove all the basic results required by logic programming (existence and uniqueness of the minimal model, fixpoint semantics, etc.). Apart from this, no other relevant changes are needed in order to give a semantics to {Log}: the constraint symbols are mapped onto the corresponding relations over the interpretation domain (∈ → ∈D, etc.).

2.3 Operational Semantics

The operational semantics of {Log} is based on a variation of the CLP model, as described for example in [21]. Only negative constraints are considered in this framework; positive constraints are directly managed by the unification procedure. The unification algorithm itself has been extended in order to implement the extended concept of term equality described in the previous section. In the rest of this section we briefly present the structure of {Log}'s operational semantics. The presentation is mainly based on examples and intuitive concepts; a formal description can be found in [16]. The presentation is articulated in two steps: first the new unification algorithm is introduced, and only afterwards the complete resolution procedure is described.

Extended Unification. Clearly, standard unification is no longer adequate for our purposes (the only solution to an equation like {} with a with b = {} with X with Y would be {X/a, Y/b}, while we want the substitution {X/b, Y/a} to be a solution as well). Although it has been proven possible to employ for {Log} some pseudo-standard unification procedures (like narrowing and reflection), we have decided to develop our own unification algorithm, essentially for efficiency reasons. Note that the unification problem in {Log} (i.e., the problem of finding a complete and finite set of unifiers) is NP-complete [14]. A formal definition of the unification algorithm employed is presented in Appendix A. The main differences with respect to the classical Robinson approach are:
1. the occur-check rule should be relaxed when a set term is involved. An equation of the form
X = X with a0 with ... with ak
in the case in which X does not appear in a0, ..., ak, has solutions, and the most general one is the substitution
{ X / N with a0 with ... with ak }
where N is a generic new variable;
2. a new case, in which two set-terms are compared, should be added.
Case 1 above is quite intuitive. Regarding case 2, when two set-terms are compared for equality, all the possible combinations between the elements of the two sets should be considered. In the algorithm used, this has been realized by observing that two sets A, B are equal iff A ⊆ B and B ⊆ A.
Example: consider the equation
{} with a with b = R with a
The set {} with a with b should be a subset of R with a. This means that a, b ∈ R with a. A possible justification is that 'a' matches the 'a' of R with a, while 'b' must necessarily belong to R. The first subset test thus leads to the instantiation of R to the set R' with b, where R' is a new variable. Vice versa, since the explicit elements of R with a should belong to {} with a with b, the second subset test reduces to matching the 'a' of R with a with the 'a' of {} with a with b. The whole first phase produces the trivial equations a = a and b = b, plus the instantiation for R. The solution returned for this combination is then R = R' with b (which is another way of expressing the concept b ∈ R). ❚
It is important to notice that the algorithm adopted produces a finite and complete set of unifiers for the system of equations proposed. This set may not be minimal (i.e., it is possible that, given two produced substitutions σ, θ, there exists another substitution γ such that σ = θ ∘ γ). Producing in a single step a complete and minimal set of solutions seems to be a very difficult problem. The algorithm has been proved to be sound, complete, and always terminating.
An aspect to which it is very important to draw attention is the don't-know non-determinism present in set unification. From a practical point of view, this means that unification itself may produce choice points (i.e., backtracking points in a sequential implementation).
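A toy enumerator can make this don't-know non-determinism concrete. The sketch below handles only the special case of distinct ground elements on the left and distinct variables on the right (it is not the algorithm of Appendix A, and all names are illustrative): every assignment of the variables to elements whose image covers the whole left-hand set is a unifier.

```c
#include <stdbool.h>

/* Count the unifiers of  {e_0,...,e_{k-1}} = {X_0,...,X_{m-1}}
   with distinct ground elements e_i and distinct variables X_j.
   'assign' must have room for m entries; assign[i] = j means X_i = e_j. */
int count_unifiers(int k, int m, int *assign, int pos) {
    if (pos == m) {
        for (int j = 0; j < k; j++) {        /* every e_j must be used  */
            bool covered = false;
            for (int i = 0; i < m; i++)
                if (assign[i] == j) covered = true;
            if (!covered) return 0;
        }
        return 1;                            /* one unifier found       */
    }
    int total = 0;
    for (int j = 0; j < k; j++) {            /* choice point: X_pos = e_j */
        assign[pos] = j;
        total += count_unifiers(k, m, assign, pos + 1);
    }
    return total;
}
```

For {} with a with b = {} with X with Y (k = m = 2) the enumerator finds exactly the two unifiers {X/a, Y/b} and {X/b, Y/a}; each loop iteration corresponds to a choice point, i.e., a backtracking point in a sequential implementation.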

Constraint Management. As already mentioned, in the context of {Log} we manage as explicit constraints all the atomic formulas built using the negative constraint symbols ≠, ∉. The first step in the definition of a constraint solver for {Log} is the definition of a canonical form for these constraints³. A constraint is said to be in canonical form if it satisfies one of the following conditions:
a. the constraint has the form τ ∉ γ, where γ ∈ V and γ is not a subterm of τ;
b. the constraint has the form γ ≠ τ, where γ ∈ V and γ is not a subterm of τ.
This means that a constraint like X ≠ 5 is in canonical form, while something of the form a ∉ X with b is not. Intuitively, a constraint is in canonical form if it cannot be further simplified given the current instantiation of the variables. We have proved [16,15] that any conjunction of constraints in canonical form is always satisfiable. The idea is to keep the constraints currently applied to the solution always in canonical form: if the constraints are satisfiable, then a canonical representation exists, while if the constraints have no solution such a canonical form does not exist. This idea is very similar to the solution compactness requirement usually adopted in CLP(X) [21].
An algorithm, called Can, has been developed in order to produce a canonical form (if there is one) for an arbitrary set of constraints. As in the case of extended unification, the constraint manager also contains don't-know non-determinism. This is due to the fact that a generic set of constraints Γ may not be expressible as a single equivalent conjunction of canonical constraints; in general we have
Γ ⇔ D1 ∨ ... ∨ Dk
where each Dj is a conjunction of constraints in canonical form. As before, the constraint manager will introduce some further choice points (i.e., backtracking points) during execution. The complete algorithm for simplifying constraints is described in Appendix B. Here we just present a simple example to give an intuitive idea of the approach.
3. From now on, whenever we use the term 'constraint' we refer to {Log}'s negative constraints.
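The two canonical-form conditions can be sketched over a toy term language. The tags, field names, and pointer-based variable identity below are hypothetical illustrations, not the {WAM} representation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy terms: variables, constants, and binary 'with'. */
typedef enum { VAR, CON, WITH } Tag;
typedef struct Term {
    Tag tag;
    int id;                       /* variable/constant identifier      */
    struct Term *set, *elem;      /* children of a with-term           */
} Term;

/* occurs: is (the variable) v a subterm of t?  Variables are
   identified by pointer equality in this sketch. */
static bool occurs(const Term *v, const Term *t) {
    if (t == v) return true;
    if (t->tag == WITH)
        return occurs(v, t->set) || occurs(v, t->elem);
    return false;
}

/* Condition a:  t ∉ g  is canonical iff g is a variable not in t. */
bool canonical_notin(const Term *t, const Term *g) {
    return g->tag == VAR && !occurs(g, t);
}

/* Condition b:  g ≠ t  is canonical iff g is a variable not in t. */
bool canonical_neq(const Term *g, const Term *t) {
    return g->tag == VAR && !occurs(g, t);
}
```

With these definitions, X ≠ 5 passes the check, while a ∉ (X with b) fails it because the right-hand side is not a variable, matching the examples in the text.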

Example: the set of constraints { b ∉ X, X ≠ {} with Y } is clearly in canonical form. If the variable X is instantiated to {} with b, then we obtain { b ∉ {} with b, {} with b ≠ {} with Y }. The simplification may start from the first constraint: b ∉ {} with b may be reduced to b ≠ b ∧ b ∉ {}, which is obviously always false. The simplification fails and no solution exists for the given system. If X is instantiated instead to {} with c, then the first constraint reduces to b ≠ c ∧ b ∉ {}, which is always true; this means that the first constraint may simply be removed. The second constraint simplifies to Y ≠ c ∧ Y ∉ {} ∧ c ∉ {}, which can be further reduced to Y ≠ c. ❚
The constraint solver Can satisfies the conditions of soundness and completeness; furthermore, its execution always converges to a finite (possibly empty) set of solutions.

Operational Semantics. Given the previous definitions, it is quite easy to introduce the resolution procedure adopted by {Log}. The idea is to extend the well-known Forward-Checking Inference Rule (FCIR [17,32]): a constraint can actually be solved only when a single non-instantiated variable is present; in addition, at each step the constraints are simplified to canonical form. The simplification automatically includes the test for satisfiability. The resolution step is very simple: given a goal
:- C ◊ A & R
where C is a conjunction of constraints in canonical form, A is the selected atom, and R is the rest of the goal; given a clause
H :- C' ◊ R'
and given an element µ belonging to a complete set of unifiers of the system { A = H }, we can compute
Can( (C & C')µ )
and if <D, σ> is one of the solutions produced by Can, the following resolvent can be adopted:
:- D ◊ (R & R')µσ
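The ∉-reduction used in the example above can be sketched as a toy simplifier, restricted to ground sets of integer constants (the real Can works on arbitrary terms and may leave residual canonical constraints; here the outcome is just true/false, and all names are illustrative):

```c
#include <stdbool.h>
#include <stddef.h>

/* Reduction rules for ground ∉-constraints:
       t ∉ (s with u)  ⇔  t ≠ u  ∧  t ∉ s
       t ∉ {}          ⇔  true
   The set (s with set[n-1] with ... ) is given as an array of its
   explicit elements. */
bool simplify_notin(int t, const int *set, size_t n) {
    if (n == 0) return true;                 /* t ∉ {} always holds    */
    if (t == set[n - 1]) return false;       /* t ≠ u fails            */
    return simplify_notin(t, set, n - 1);    /* recurse on t ∉ s       */
}
```

On the example: b ∉ ({} with b) fails at the t ≠ u step, while b ∉ ({} with c) succeeds and the constraint can be removed.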

This definition of extended resolution step is very similar to the one adopted in CLP(X) [21,22].

3 {Log} Implementation

This section presents an overview of the implementation of {Log}, focusing on some general aspects (term representation, etc.); a more detailed description of the implementation of the unification algorithm and of the constraint analyzer is given in the following sections. Two implementations of {Log} have been realized so far. The first was a prototype built as a meta-interpreter in Prolog; it allowed us to verify the practical feasibility of the theoretical ideas proposed, but its lack of efficiency made it absolutely unusable for any decent application. The second implementation⁴, which is described in this paper, has been realized by extending the most common Prolog engine in the literature, the Warren Abstract Machine [33].

Fig. 1. {Log} system structure

Figure 1 visualizes the general structure of the implemented system. A compiler translates the {Log} source code into extended WAM code. This code is then interpreted by an engine, consisting of an extension of the WAM, which interacts with the new unification algorithm and with the constraint manager Can. The whole system runs inside an environment very similar to that of Prolog; a symbolic debugger is also provided. For the sake of simplicity, from now on we will use {WAM} to denote the extended WAM.

4. The {Log} system described has been developed in C and is actually running on a DECstation 5000, under ULTRIX V4.2A.

3.1 {Log} Compiler

The compiler adopted for {Log} does not present particular differences with respect to usual standard WAM-based implementations⁵. The compilation process is articulated into four steps:
1. Preprocessing: this phase removes some syntactic sugar used in writing the source code. The syntactic sugar allows the programmer to use short forms for sets (e.g., {a, b, c} instead of {} with c with b with a) and to use restricted universal quantifiers (for a description of how to translate RUQs into pure {Log} see [16]).
2. Compilation: the compiler accepts input clauses of the form
p(t1, ..., tk) :- Constraints ◊ Body
Each clause is translated into {WAM} code of the form
< environment allocation >
< head unification >
< constraints construction >
< Can hook >
< body >
Different clauses belonging to the same procedure are linked using the usual try_me_else, retry_me_else, and trust_me_else_fail instructions.
3. Linking: the code produced by the compiler is augmented with some standard routines (∈ management, arithmetic, etc.).
4. Loading: the process is terminated by loading the different pieces of code into the code area and resolving the remaining suspended references.
Appendix C gives a sample of {WAM} code.

3.2 {WAM} Data Areas

The {WAM} architecture is analogous to the standard WAM, as described in [3,10,18] (we did not adopt certain popular modifications, like splitting the stack into a local stack and a choice-point stack). Fig. 2 illustrates the various data structures adopted in our model: the STACK, holding environments and choice points; the HEAP, holding terms; the TRAIL, holding references to variables; the PDL, holding pairs of terms; the CODE area, holding the {WAM} instructions; plus the two queues Queue1 and Queue2 and the Constraint Table. We will later describe the new kind of information that is stored in these data structures.

Fig. 2. {WAM} data areas

5. Except for the fact that the compiler has been realized using YACC & Lex, for increased portability and efficiency.

The structure of a memory word on the heap (or stack) is

Tag | Value | Ctrl.Bits

where:
a. Tag identifies the type of term represented; the set of values for the Tag proposed by Warren (ref, con, str, lis) has been extended with two new types: set, used for set-terms, and ctr, used for constraints.
b. Value can be an effective value (a constant) or a pointer to another Heap/Stack cell.
c. Control Bits are used to encode some particular information regarding the status of the term (e.g., identifying the rightmost term in a complex structure).

The objects on the Stack are choice points and environments. The main difference w.r.t. the basic WAM is the extension of the choice point, and the introduction of a new kind of choice point (the Data Choice Point) used to encode the don't-know non-determinism resulting from the unification algorithm. The trail is used to store not only conditional variables (i.e., variables bound after the creation of a choice point), but also certain pointers (e.g., the pointer to the top of the Push-Down List), in order to simplify the backtracking process. The purpose of the two queues and of the constraint table will be explained in a later section.

3.3 Terms Representation

Terms in {WAM} are represented exactly in the same way as in the WAM. The only difference is the introduction of two new kinds of objects on the Heap: set-terms and constraints.
Set-terms. Although the internal representation of sets is based on the use of normal terms (those whose main functor is with), it is useful to distinguish set-terms on the heap from the other terms. A set term s with t is represented on the Heap as shown in Fig. 3: a cell tagged set, followed by the representation of t.

Fig. 3. Set-terms representation on the Heap

Constraints. A (negative) constraint is built explicitly on the Heap, making use of some new instructions. A constraint is represented by a Heap cell marked with the ctr tag, followed by the n cells representing the constraint arguments (if n is the arity of the constraint functor). The representation has been kept general (constraints with arbitrary functors and arities) in order to allow future expansions. A constraint π(t1, ..., tk) is represented as shown in Fig. 4: a cell tagged ctr with value π/k, followed by the representations of t1, ..., tk.

Fig. 4. Representation of a constraint

Typically, the arguments of the constraint are just pointers to other terms on the Heap.
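A possible packing of such a tagged word can be sketched as follows. The field widths and the use of C bit-fields are purely illustrative assumptions; the paper does not fix the exact cell layout.

```c
#include <stdint.h>

/* Hypothetical {WAM} heap word: Warren's tags (ref, con, str, lis)
   extended with the new 'set' and 'ctr' tags. */
typedef enum { REF, CON, STR, LIS, SET, CTR } CellTag;

typedef struct {
    uint32_t tag   : 3;   /* one of the six CellTag values              */
    uint32_t ctrl  : 2;   /* control bits, e.g. rightmost-term marker   */
    uint32_t value : 27;  /* a constant, or the index of another cell   */
} Cell;

/* build a cell for the new 'set' tag, pointing at the representation
   of the element term */
Cell make_set_cell(uint32_t elem_index) {
    Cell c = { SET, 0, elem_index & 0x7FFFFFFu };
    return c;
}
```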


3.4 Instruction Set

Some new instructions have been added to the {WAM} instruction set in order to support set-term management and constraint creation and reduction.


Set-Terms Management. Although it would be possible to treat set-terms as any other generic term, we chose to distinguish them, in order to simplify their identification during the unification process. Two new instructions have been introduced:

• put_set Ai - which is used to create a new set-term cell on the Heap, returning in register Ai a reference to such cell;

• get_set Ai - which is used during head unification to match the argument contained in the Ai register against a set-term.
Otherwise, set-terms are managed exactly in the same way as other complex terms (except during unification, of course).

Control Instructions. First of all, focusing on the specific implementation realized, we would like to point out that the instruction trust_me_else_fail has been removed⁶. The last clause in a procedure is introduced by an instance of the retry_me_else instruction, whose argument in this last occurrence is a system-defined constant called FAIL. Only one new control instruction has been added:

• call_Can - this instruction is used to invoke the Can constraint manager. The instruction passes control to Can; at the end of the constraint simplification phase, if a canonical form is reached, then control is transferred to the next instruction, otherwise control is moved to the backtracking procedure. The argument is used for realizing the classical environment trimming. The current implementation allows the user to specify, using a proper directive, a call threshold: a counter is incremented each time call_Can is encountered, and the instruction is actually executed only when the counter has reached the specified threshold (immediately afterwards the counter is reset). The user can modify this threshold in order to tune the performance of the program.
6. This simplifies the introduction of dynamic predicates.
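The threshold mechanism can be sketched in a few lines; the variable names and the reset-after-firing behavior follow the description above, while everything else is an illustrative assumption.

```c
/* Sketch of the call_Can threshold: Can is actually entered only every
   'can_threshold' executions of the call_Can instruction. */
static int can_counter   = 0;
static int can_threshold = 1;   /* user-settable via a directive */

/* returns nonzero when Can should really be invoked */
int call_can_due(void) {
    if (++can_counter >= can_threshold) {
        can_counter = 0;        /* reset immediately after firing */
        return 1;
    }
    return 0;
}
```

With a threshold of 1 (the default assumed here), every call_Can runs the constraint manager; raising the threshold trades pruning precision for lower overhead.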

Constraint Management. The choice of keeping the constraints distinguishable from the other literals leads to the necessity of introducing some specific instructions for creating constraints on the Heap. Two instructions are used for this purpose:

• write_constraint - this instruction allocates a ctr cell on the heap, storing the information in the value part.

• write_c_arg Ai - this instruction is used to specify an argument of a constraint.
A constraint of the form π(t1, ..., tk) is created using a piece of code of the form:
< build t1 and leave a reference in A1 >
...
< build tk and leave a reference in Ak >
write_constraint π/k
write_c_arg A1
...
write_c_arg Ak

3.5 Code Format

The format of the code produced by the compiler for a set of clauses is very similar to standard WAM code. The main differences are the introduction of a constraint-construction step immediately after head unification and the presence of an explicit Can call used to invoke the constraint manager. Explicit calls to the constraint manager are made for flexibility reasons: since the implementation may easily allow the introduction of different kinds of constraints, it may be necessary to invoke the manager at different moments during execution. This approach allows such extensions in a very simple way. Furthermore, the call threshold allows the frequency of execution of the constraint manager to be tuned depending on the level of overhead considered acceptable.

Fig. 5 depicts the structure of a {WAM} procedure: the clauses of a procedure are chained through try_me_else and retry_me_else instructions, and the code of each clause consists of head unification, constraint building, the Can call, and the subgoals; the last clause is introduced by retry_me_else FAIL.

Fig. 5. {WAM} procedure organization

4 Unification Procedure

The unification algorithm used in the standard WAM has been replaced in {WAM} by a call to an implementation of the extended unification previously described (and completely specified in Appendix A). The main problems faced in this part of the implementation are:
1. finding a suitable representation of the alternative paths which can be taken at the don't-know non-deterministic points;
2. finding an effective (i.e., computable) representation of the non-determinism implicit in the algorithm, and modifying the backtracking algorithm in order to deal with it;
3. developing a schema which reduces the quantity of information saved in each choice point.
The following subsections analyze in detail the solutions adopted for these problems.

4.1 Alternative Representation

The unification algorithm generates non-deterministic choices only in one case: when two terms of the form (. with .) are compared. Since this kind of comparison may be solved in different ways (possibly leading to different solutions), it is necessary to leave on the {WAM} stack a trace of this choice point. This is done, obviously, by making use of choice points. On the other hand, the choice points supported by the standard WAM are not suitable for our needs, since we need to save a completely different kind of information. For this reason we chose to distinguish between two kinds of choice points:
• the choice points originated by the usual Prolog computation are called procedural choice points; their structure is unchanged, except for a new tag field;
• the choice points originated by the extended unification procedure are called data choice points; the layout of a data choice point is shown in Fig. 6 (its fields are: Tag; s1, s2; Sequence; E; CP; B; Top of Trail; Top of Heap; return address).

Fig. 6. Content of a Data Choice Point

The Tag stores a value used to discriminate, during backtracking, between the two kinds of choice points. The fields storing s1, s2, and the sequence will be explained later. The E, CP, B, Top of Trail, and Top of Heap fields store the old values of the corresponding registers. Finally, the return address is the address of the {WAM} instruction to be executed once unification is terminated. During backtracking, whenever a choice point is found on the stack, the backtracking algorithm takes different actions depending on the value of the Tag field. What remains to be described is how to represent the different alternatives of a data choice point (in a procedural choice point the alternatives are simply represented by the address of the next clause to be tried). Let us assume that the two terms to be matched are:
R with ak with ... with a0 = S with bh with ... with b0

The unification procedure tries to express the different ways in which the above equation could be reduced to ∀i . ∃j . ai = bj ∨ ai ∈ S ∀i . ∃j . bi = aj ∨ bi ∈ R This idea has been expressed in {WAM} using the Matching Patterns. From the example above it is possible to generate a sequence of matching patterns: each of them is composed by a pair of sequences of non-negative integers, one of length k+1 and one of length h+1 seq = ( (v0, ..., vk), (w0, ..., wh) ) such that ∀ 0 ≤ i ≤ k . 0 ≤ vi ≤ h+1 and ∀ 0 ≤ j ≤ h . 0 ≤ wj ≤ k+1 Each matching pattern describes one possible matching between the elements of the two set-terms. The corresponding matching is: a. if vi ≤ h, then an equation of the form ai = bvi is assumed; b. if vi = h+1, then a positive constraint of the form

ai ∈ S is assumed. and vice-versa for wj. The original equation is replaced by a set of equations, obtained by the equations (pure equations) generated by the current matching pattern plus the two equations generated from the new ∈-relations. Assuming that {r1, ..., rp} are the indexes of the elements of the first set reduced to ∈-relations and {c1, ..., cq} are the indexes of the elements of the second set reduced to ∈relations, then the two additional equations are R = N with bcq with... with bc1 S = N with arp with... with ar17 Obviously we do not want to generate blindly all the possible matching patterns and the consequent equations. For this reason two optimizations have been implemented: 1. Removal of Redundant Equations: the process of extracting equations from the matching pattern tries to avoid repeated equations; 2. Most General Matching Pattern (MGMP): it is possi-

ble to discard many matching patterns which would 7. if R = S then a single equation of the form R = N with bcq with... with bc1 with arp with... with ar1 is generated

lead to solution less general than others. The idea is that, if vi ≤ h (i.e. an equation of the form ai = bvi) then the only meaningful values for wvi are i and k+1. A different value would generate an equation of the form awvi = bvi which, combined with the previous one, will lead in general to a solution which is less specific than the one obtained from the same matching pattern with wvi equals to i. In the current implementation only MGMP are generated and redundant equations are removed. Example: given the equation R with f(Y) with X = {} with f(Z) with f(a) a pattern like ((0, 1), (0, 1)) is acceptable, and will lead to the system of equations X = f(a) f(Y) = f(Z) R=N {} = N while a pattern as ((0, 1), (1, 1)) will be rejected, since it leads to the system X = f(a) f(Y) = f(Z) f(Y) = f(a) R=N {} = N which is clearly subsumed by the previous one. ❚ 4.2 Unification & Backtracking Algorithm In this section we give a schematic structure for the unification and the backtracking algorithm. Unification. The external structure of the unification function is the following: unify (arg1, arg2, flag) ... if (( ) && (< arg2 is a set> )) if (! flag) k = SetSize (arg1); h = SetSize (arg2); GenerateFirstPattern(k,h); CreateDataChoicePoint(k,h); else GenerateNextPattern (); GenerateEquation (); endif ...

endif The flag is used to signal whether the unification is entered directly or on backtracking. When unification is entered directly (first time) a new data choice point is allocated and the first pattern is generated and stored in it. When unification is entered from backtracking (the backtracking has found a data choice point on the stack), then a new pattern is generated (and stored in the choice point). The GenerateEquations function produces the new equations extracted from the current matching pattern. Backtracking. The pseudo-code for the backtracking procedure assumes the following aspect: if (B ≠ Undefined) case Type(B) of Procedure: P = NextInstruction(B); Data: if (LastPattern(B)) RemoveChoicePoint(B); Backtrack(); else if (! RestartUnify(B)) Backtrack(); endif endcase else EndOfProgram(); endif Whenever a data choice point is encountered, if all the matching patterns have been analyzed (LastPattern), then the choice point is removed and backtracking is continued. Otherwise unification is restarted (RestartUnify): this means that the correct framework is rebuilt, using the information stored in the choice point, and the unify function is reactivated (with the proper flag). 4.3 Pdl Management Another problem related to the implementation of the extended unification procedure regards the management of the Pdl (Push-down list). The Pdl is a stack structure which is used in the WAM architecture to maintain the current set of equations to be solved during the unification process. The new problem emerging regards the necessity of saving the “current status” of the Pdl whenever a data choice

point is created: each time we backtrack inside the unification algorithm, we need to find on the Pdl the proper set of equations to be solved.

Example: given a system of equations of the form

    { {} with a = X with Y,  f(X) = f({}) }

if the first equation is selected first, then a data choice point is created and the reference to the equation f(X) = f({}) must somehow be saved: whenever we backtrack looking for new solutions of the first equation, we also have to solve the second one again (w.r.t. the new solution generated). ❚

Since the elements of the Pdl are just pairs of pointers to the Heap, the problem can be solved simply by transferring the content of the Pdl to the Trail when the new data choice point is created. On backtracking, the content of the Pdl is easily retrieved by unwinding the trail stack. To avoid saving the same pieces of information on the trail many times, a small stack of pointers into the Pdl has been introduced, which allows detecting the parts of the Pdl that have already been moved to the trail.
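The machinery of Sections 4.2-4.3 (pattern enumeration driven by a data choice point, and saving the Pdl on the trail) can be illustrated with a small Python model. This is only a sketch under our own encoding, not the actual {WAM} code: patterns are tuples over 1..h+1, where h+1 stands for "matched against the rest of the set", and Pdl entries are just pairs of term strings.

```python
from itertools import product

class DataChoicePoint:
    """Models a data choice point for a k-against-h set unification:
    it holds the stream of candidate matching patterns together with
    the Pdl content moved to the trail when the choice point was
    created (so pending equations can be re-solved on backtracking)."""
    def __init__(self, k, h, pdl):
        self.patterns = product(range(1, h + 2), repeat=k)
        self.trail = list(pdl)      # Pdl content saved on the trail
        pdl.clear()

    def next_pattern(self, pdl):
        """Restore the saved equations into the Pdl and return the next
        matching pattern, or None when exhausted (LastPattern)."""
        pdl[:] = self.trail         # unwind the trail into the Pdl
        return next(self.patterns, None)

pdl = [("f(X)", "f({})")]           # a pending equation besides the set one
cp = DataChoicePoint(2, 2, pdl)     # created on {a1,a2|R} = {b1,b2|S}
first = cp.next_pattern(pdl)        # plays the role of GenerateFirstPattern
assert first == (1, 1)
assert pdl == [("f(X)", "f({})")]   # f(X) = f({}) must be re-solved
```

Later calls to next_pattern play the role of GenerateNextPattern on backtracking; in the real machine the MGMP filter would additionally discard redundant patterns.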

5 Constraint Management

5.1 Introduction

The operational semantics of {Log} takes full advantage of the management of negative constraints to perform an 'a priori' pruning of the solution search tree. The kernel of the constraint management activity is the Can procedure, which is used to

• simplify conjunctions of constraints;
• verify the satisfiability of a conjunction of constraints.

The abstract structure of Can is shown in Appendix B. In the following we concentrate on some aspects of the implementation of Can and on the integration of the constraint management activity into the {WAM}. This section starts with a description of the idea of constraint status and a presentation of the different phases which characterize the life of a constraint. Then the main data structures introduced to support constraints are described, with particular attention to the idea of constraint indexing.

Finally, we present some general ideas on the actual implementation of Can, focusing on the generality and flexibility of the approach taken.

5.2 Constraint Status

Each constraint present in the system during an execution is subject to a status-based classification: a constraint is in exactly one of the following three states at any time.

1. Not_existing: the constraint has still to be created - its existence is still implicit in the {WAM} instructions yet to be executed;
2. Active: a constraint is active if (a) it has just been created and not yet reduced, or (b) it has been reactivated by the unification procedure;
3. Sleeping: a constraint is sleeping if it is in canonical form and has already been completely reduced.

The current status of a constraint also determines its current location:

1. a constraint in active status is always located in one of the two constraint queues;
2. a constraint in sleeping status is always located in the constraint table.

The possible transitions of a constraint from one status to another are the following: creation moves a constraint from not created to active; constraint simplification moves it from active to sleeping; unification can move it back from sleeping to active.

Fig. 7. Transition Graph for Constraint Status

A constraint, once created, is immediately assigned the active status and loaded into one of the two queues. When Can is invoked, all the active constraints are in the queues and a simplification is attempted. If the Can procedure manages to produce at least one solution (i.e., a normalized system of constraints in canonical form), then the final set of constraints produced is moved to the constraint table.
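The life cycle just described can be sketched with a toy Python model (the names ConstraintStore, create, simplify, and reactivate are ours, not {WAM} primitives; constraints are plain strings):

```python
ACTIVE = "active"

class ConstraintStore:
    """Toy model of the constraint life cycle: active constraints live
    in a queue awaiting Can; canonical ones sleep in a table indexed
    by the variable whose instantiation can reactivate them."""
    def __init__(self):
        self.queue = []   # active constraints
        self.table = {}   # sleeping constraints, indexed by variable

    def create(self, c):
        self.queue.append((c, ACTIVE))          # creation -> active

    def simplify(self, index):
        # Can succeeded: move canonical constraints to the table
        for c, _ in self.queue:
            self.table.setdefault(index(c), []).append(c)
        self.queue.clear()                      # active -> sleeping

    def reactivate(self, var):
        # a variable got bound during unification
        for c in self.table.pop(var, []):
            self.queue.append((c, ACTIVE))      # sleeping -> active

store = ConstraintStore()
store.create("X != {}")
store.simplify(index=lambda c: c.split()[0])    # index on the variable X
store.reactivate("X")                           # X bound -> back to a queue
```

The three transitions of Fig. 7 correspond to the three methods; in the real machine simplification may of course fail or split constraints rather than simply relocating them.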

Constraints in the constraint table are properly indexed in order to guarantee quick access. A constraint should be taken into consideration again whenever one of its variables gets instantiated (as explained by P. Van Hentenryck in [32]). The indexing mechanism allows the unification process to re-activate a constraint in the sleeping state (i.e., to move it back to the active state). Once re-activated, the constraint is transferred back to one of the queues for further simplifications.

5.3 Data Structures

As briefly mentioned before, two new data structures are introduced in the {WAM} in order to support the constraint management process:

• Working Queues;
• Constraint Table.

The working queues simply support the execution of the Can procedure (as is apparent from the algorithm in Appendix B). Basically, the queues have the same role for Can that the push-down list Pdl has for the unification procedure. Constraints which are reduced are moved from one queue to the other; at the end of a cycle the two queues are switched, and the process is repeated until no more simplifications are possible.

Once a simplified system of constraints has been produced, the corresponding constraints are moved to the constraint table. In doing this, an indexing process is applied. A constraint in canonical form can be

❥ X ≠ t, where X does not appear in t;
❥ t ∉ X, where X does not appear in t.

It is quite clear that the only instantiation which may allow such a constraint to be simplified further is one regarding the variable X; any other instantiation will not affect the constraint (X is the forward variable in Van Hentenryck's nomenclature). We say that X is the index of the constraint. All the constraints with the same index are stored in a common area of the constraint table, and a link from the variable (in the Heap/Stack) to that constraint table area is created. A slight difficulty occurs when the constraint has the form X ≠ Y or X ≠ Y with .....: in these two cases the constraint is duplicated and stored under two different indexes (X and Y).

Fig. 8 (Unbound Variable in {WAM}) depicts this situation: an unbound variable X holds a reference to its constraint table area, which collects constraints of the form X ≠ ... and ... ∉ X.
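The indexing scheme, including the duplication of two-variable disequations, can be sketched as follows (an illustrative Python model; store_constraint and the table layout are our own names):

```python
def store_constraint(table, constraint, variables):
    """Store a canonical constraint under every variable whose
    instantiation can reactivate it; X != Y is stored under both
    X and Y, so binding either variable wakes the constraint up."""
    for v in variables:
        table.setdefault(v, []).append(constraint)

table = {}
store_constraint(table, "X != t", ["X"])        # single index X
store_constraint(table, "X != Y", ["X", "Y"])   # duplicated entry
```

On reactivation, the duplicate copy must be discarded (or detected as already solved), a detail omitted in this sketch.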

The unification procedure is modified accordingly: a check is performed whenever a variable is instantiated and, if necessary, all the corresponding constraints are released on the current working queue. This approach is somewhat similar to the one used in the SICStus WAM for constrained variables [9].

5.4 The Can Procedure

Can has been implemented as an extension of the {WAM} instruction set. A new instruction, called call_Can, is introduced to invoke the constraint manager on the constraints currently stored in the current working queue (a register AQ is introduced to point to the current working queue). This procedure, realized as a piece of C code, has been written to be as flexible and easily modifiable as possible. For this reason a number of high-level {WAM} instructions have been designed and implemented (through macros), and the complete Can has been described in terms of these high-level instructions. The following categories of high-level {WAM} instructions have been introduced:

a. Term Analysis Instructions: they have an effect similar to the get/unify instructions of the WAM, but they avoid variable instantiation, allow reading generic terms (without specifying their type), perform occur-checks, etc.
b. Constraint Management: these instructions are used to read constraints, create indexes, store constraints in the constraint table, etc.
c. Comparison Instructions: they are used to perform a pure syntactic check between the specified arguments;
d. Choice Point Management: since the Can procedure is inherently non-deterministic, some instructions have been provided to allocate choice points and to allow a proper integration of Can with the backtracking procedure.

The complete listing of the Can procedure, expressed in terms of these new high-level instructions, can be found in [28].

6 Conclusions

6.1 Results

The solutions described in the previous sections have been adopted in the current implementation of {Log}. Timings for some benchmarks follow; the reader is referred to [28] for the source code of these benchmarks and a more precise analysis of the system performance. In the basic cases the execution times are quite acceptable, while for sets of bigger size the time increases: these problems are such that they lead to a complete exploration of the search space. On the other hand, the complexity and readability of the programs are improved with respect to analogous programs written in Prolog.

The first two examples are obtained by executing the same program:

subset({}, _).
subset(B with A, X) :-
    A ∉ B ◊
    A ∈ X, subset(B, X).

The powerset is obtained by executing subset with the first argument not instantiated.

TABLE 1.

Subset Testing (from a set of 10 elements)

    Number of Elements    Time (seconds)
    1                        0.00391
    2                        0.03125
    3                        0.96598
    4                        2.04451
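The exponential growth visible in these timings reflects the search performed by subset: for each element, the computation chooses between dropping it and keeping it. A Python analogue of the powerset computation (our own sketch of the search space, not the {Log} execution model) is:

```python
def subsets(s):
    """Enumerate all subsets of a list, mirroring the two choices made
    for each element: leave it out or put it in the subset."""
    if not s:
        yield []
    else:
        head, tail = s[0], s[1:]
        for rest in subsets(tail):
            yield rest              # drop head
            yield [head] + rest     # keep head

ps = list(subsets(["a", "b", "c"]))   # 2^3 = 8 subsets
```

Since all 2^n candidates are produced, a complete exploration of the search space is unavoidable, which explains the timings above.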

TABLE 2.

Powerset Computation

    Set Size    Time (seconds)
    1              0.01953
    2              0.05469
    3              0.33592
    4              2.00391
    5             18.10168

The following program computes a path in a directed graph without passing through certain prohibited nodes. The main part of the program consists of only two clauses:

path(From, From, stop).
path(From, To, step(From, Y)) :-
    From ≠ To, From ∉ Z ◊
    prohibit(Z), graph(W),
    edge(From, New) ∈ W,
    path(New, To, Y).

TABLE 3.

Path Search

    Path Length/Prohibited Nodes    Time (seconds)
    2                                  0.02734
    3                                  0.04297
    4                                  0.07421
    5                                  0.14459
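The pruning performed by the From ∉ Z constraint can be mimicked procedurally by rejecting a node before recursing on it, as in this illustrative Python sketch (the graph, the cycle check, and all names are ours; the {Log} program itself has no explicit visited set):

```python
def path(graph, frm, to, prohibited, visited=()):
    """Find a path from frm to to that never enters a prohibited node,
    mirroring the two clauses: the stop clause and the step clause."""
    if frm in prohibited:
        return None                 # the constraint From not-in Z fails
    if frm == to:
        return [frm]                # path(From, From, stop)
    for nxt in graph.get(frm, []):  # edge(From, New) in W
        if nxt in visited:
            continue                # avoid cycles (not in the clauses)
        rest = path(graph, nxt, to, prohibited, visited + (frm,))
        if rest is not None:
            return [frm] + rest
    return None

g = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
p = path(g, "a", "d", prohibited={"b"})   # forced through c
```

Rejecting prohibited nodes as soon as they are reached, rather than after a full path is built, is exactly the 'a priori' pruning that keeps the timings in Table 3 low.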

This benchmark also shows good performance, thanks to the high level of pruning performed by the constraints in the path clause.

6.2 Summary

This paper describes an extension of a Horn-clause logic language (called {Log}) with set theoretical constructs, focusing mainly on the implementation aspects. A complete abstract machine supporting the execution of {Log} is described, extending the classical model adopted for Prolog implementation, the Warren Abstract Machine (WAM). The use of such a well-known model allows a high level of portability of the current implementation, while guaranteeing on pure Prolog programs the same level of efficiency as most common Prolog compilers. The efficiency on problems involving complex sets may be affected by the inherent complexity of set-management problems (e.g., comparison of sets), but the solution proposed here represents, in our opinion, a good trade-off between speed and the expressiveness of programs and computed answers.

The extensions required to convert a WAM into a {WAM} are well localized (set-term support, extended unification, constraint management), allowing a high level of modularity in the implementation. A complete implementation of a unification procedure supporting sets has been studied, introducing the novel idea of matching patterns. This technique seems to offer good performance; furthermore, many heuristics can be developed on top of this concept in order to prune the search space and increase performance. Set-based constraints are managed by a constraint analyzer, whose implementation takes advantage of the possibility of indexing the simplified constraints on single variables. This allows a fast reactivation of suspended constraints during the unification process.

6.3

Future Work

We will not deal here with the innumerable theoretical extensions that can be made around the {Log} idea (many of them are currently under investigation). As far as the implementation is concerned, more can be done to optimize the design and improve the efficiency, in particular for the unification algorithm. The possibility of applying faster algorithms to special cases (like reducing the problem of matching ground sets to the problem of finding the solutions of a linear Diophantine equation) may improve the global performance of the system. Another project currently under consideration is to extend some more powerful WAM-based machines to support {Log} execution; in particular, we are considering an integration between the {WAM} and some parallel execution models for logic programming (like the RAP-WAM [19] and Muse [23]). Finally, some further work has to be done to support at the compiler level some more powerful constructs, like negation and/or set-grouping.

Acknowledgments A special ‘thank you’ goes to the following people who helped in different ways in the realization of this project: I. Cervesato, G. Gupta, E. Lamma, E.G. Omodeo, G. Pieroni, G. Rossi, L.P. Slothouber. Enrico Pontelli is supported by NSF Grant CCR 92-11732. References

[1] Aliffi D., Dovier A., Omodeo E.G., Rossi G. (1993) "Unification of hypersets terms" - Submitted to the Fifth International Conference on Rewriting Techniques and Applications.
[2] Ambert F. (1991) "Systemes de Resolution de Constraintes Ensemblistes" - Research Report, Université de Franche-Comté, Faculté des Sciences de Besançon.
[3] Ait-Kaci H. (1990) "The WAM: A (Real) Tutorial" - DEC Technical Report, January 1990.
[4] Apt K., Blair H., Walker A. (1987) "Towards a theory of declarative knowledge" - in (J. Minker ed.) Foundations of Deductive Databases and Logic Programming, Morgan Kaufmann, Los Altos, CA.
[5] Beeri C., Naqvi S., et al. (1987) "Sets and Negation in a Logic Database Language (LDL1)" - Proceedings of the 6th ACM SIGMOD Symposium.
[6] Buettner K.A. (1989) "Fast Decompilation of Compiled Prolog Clauses".
[7] Campbell J.A. (1984) "Implementations of Prolog" - J. Wiley & Sons.
[8] Cantone D., Ferro A., Omodeo E.G. (1989) "Computable Set Theory" - International Series of Monographs on Computer Science, Oxford Press.
[9] Carlsson M. (1991) "The SICStus Emulator" - SICS Technical Report T91:15, Swedish Institute of Computer Science.
[10] Cavalieri M., Lamma E., Mello P. (1988) "Warren Abstract Machine: una macchina astratta per l'implementazione efficiente di Prolog" - Technical Report, Università degli Studi di Bologna, March 1988.
[11] Conery J.S. (1987) "Parallel Execution of Logic Programs" - Kluwer Academic Publishers.
[12] Debray S.K. (1989) "Register Allocation in a Prolog Machine" - Proceedings of the Symposium on Logic Programming, IEEE Computer Society, Salt Lake City, UT.
[13] Dovier A., Omodeo E.G., Pontelli E., Rossi G. (1991) "{Log}: A Logic Programming Language with Finite Sets" - in Logic Programming: Proceedings of the 8th International Conference, MIT Press.
[14] Dovier A., Omodeo E.G., Pontelli E., Rossi G. (1992) "Embedding Finite Sets in a Logic Programming Language" - Third Workshop on Extensions of Logic Programming, Bologna, Italy.
[15] Dovier A., Omodeo E.G., Pontelli E., Rossi G. (1993) "Embedding Finite Sets in a Logic Programming Language" - Research Report, University of Rome "La Sapienza".
[16] Dovier A., Pontelli E. (1991) "La programmazione logica con insiemi" - Thesis (in Italian), Università degli Studi di Udine, Italy.
[17] Dincbas M., Van Hentenryck P., et al. (1988) "The Constraint Logic Programming Language CHIP" - Proceedings of the International Conference on Fifth Generation Computer Systems, ICOT, Tokyo.
[18] Gabriel J., Lindholm T., Lusk E.L., Overbeek R.A. (1985) "A Tutorial on the Warren Abstract Machine for Computational Logic" - Technical Report ANL-84-84, Argonne National Laboratory.
[19] Hermenegildo M., Greene K. (1991) "The &-Prolog System: Exploiting Independent And-Parallelism" - New Generation Computing, 9(3-4).
[20] Holldobler S. (1989) "Foundations of Equational Logic Programming" - LNCS 353, Springer Verlag.
[21] Jaffar J., Lassez J.L. (1987) "Constraint Logic Programming" - 14th POPL, Munich.
[22] Jaffar J., Michaylov S. (1987) "Methodology and Implementation of a CLP System" - Proceedings of the 4th International Conference on Logic Programming, MIT Press.
[23] Karlsson R. (1992) "A High Performance OR-Parallel Prolog System" - PhD Thesis, SICS Dissertation Series 07.
[24] Kwon K., Nadathur G., Wilson D.S. (1991) "Implementing Logic Programming Languages with Polymorphic Typing" - Technical Report CS-1991-39, Duke University.
[25] Kursawe P. (1987) "How to Invent a Prolog Machine" - New Generation Computing, 5 (1987).
[26] Legeard B., Legros E. (1991) "CLPS: A Set Constraints Logic Programming Language" - Technical Report, Université de Besançon.
[27] Nadathur G., Jayaraman B. (1989) "Towards a WAM Model for λProlog" - Technical Report CS-1989, Duke University.
[28] Pontelli E. (1992) "Logic Programming with Sets: Theory and Implementation" - MSc Thesis, University of Houston, Houston, TX.
[29] Schwartz J.T., Dewar R.B.K., Dubinsky E., Schonberg E. (1986) "Programming with Sets: An Introduction to SETL" - Springer Verlag.
[30] Shen K. (1992) "Exploiting Dependent And-parallelism in Prolog: the Dynamic Dependent And-parallel Scheme" - Proc. Joint Int'l Conf. and Symp. on Logic Programming, MIT Press, 1992.
[31] Sigal R. (1989) "Desiderata for Logic Programming with Sets" - Proceedings GULP'89: Fourth National Conference on Logic Programming, Bologna, Italy.
[32] Van Hentenryck P. (1989) "Constraint Satisfaction in Logic Programming" - MIT Press, Cambridge, MA.
[33] Warren D.H.D. (1983) "An Abstract Prolog Instruction Set" - Technical Note 309, SRI International, Menlo Park, CA.

Appendix A: Unification Algorithm

function unify (E: Herbrand system): Herbrand system;
begin
  if (E is in solved form) then return E;
  else
    select e ∈ E;
    case e of
      X = X:
        return E \ {e};
      t = X (t ∉ V):
        return (E \ {e}) ∪ {X = t};
      X = t (X is not in t and appears elsewhere in E):
        return ((E \ {e})σ ∪ {e}) with σ = {X/t};
      X = X with t0 with ... with tn (X does not appear in t0, ..., tn):
        return (E \ {e}) ∪ {X = N with t0 with ... with tn};
      f(t0, ..., tk) = f(s0, ..., sk):
        return (E \ {e}) ∪ {t0 = s0, ..., tk = sk};
      R with t0 with ... with tn = S with s0 with ... with sm:
        given U, V new variables, compute
          subset({} with t0 with ... with tn, V with s0 with ... with sm);
          subset({} with s0 with ... with sm, U with t0 with ... with tn);
        which return instantiations for U and V
          U ≡ W with a0 with ... with ak
          V ≡ W with b0 with ... with bh
        together with two new sets of equalities EqU and EqV;
        if (R ≠ S) or (R = S = {}) then
          return (E \ {e}) ∪ EqV ∪ EqU ∪
                 {R = W with b0 with ... with bh} ∪
                 {S = W with a0 with ... with ak};
        else
          return (E \ {e}) ∪ EqV ∪ EqU ∪
                 {R = W with b0 with ... with bh with a0 with ... with ak};
        endif;
      otherwise: FAIL;
    endcase;
  endif;
end;

function subset (s1, s2: set terms): Herbrand system;
begin
  let s2 be U with b0 with ... with bh;
  if (s1 = {}) then return ∅;
  else
    let s1 be R with a;
    choose non-deterministically:
      return {a = b0} ∪ subset(R, s2);
      ...
      return {a = bh} ∪ subset(R, s2);
      return {U = N with a} ∪ subset(R, N with b0 with ... with bh);
  endif;
end;

Appendix B: Constraint Manager

function Can (C: constraint system): <constraint system, substitution>;
begin
  for each c ∈ C do enqueue(c, queue1); endfor;
  repeat
    modified := FALSE;
    while (not EmptyQueue(queue1)) do
      dequeue(c, queue1);
      case c of
        t ∉ a (a a constant):
        t ∉ f(...) (f ≠ with):
        t ∉ X, X appears in t and t is not a set term:
        a ≠ b (a, b distinct constants):
        a ≠ f(...) or f(...) ≠ a (a constant, f functional symbol):
        X ≠ t (or t ≠ X), X appears in t:
        X ≠ Y with a0 with ... with ak (or vice-versa), X appears in some ai:
        f(...) ≠ g(...) (f, g distinct functional symbols):
          continue;
        t ∉ X, X does not appear in t:
          enqueue(t ∉ X, queue2); continue;
        X ≠ Y with a0 with ... with ak (or vice-versa), X does not appear in a0, ..., ak, Y:
          enqueue(X ≠ Y with a0 with ... with ak, queue2); continue;
        t ∉ s with v:
          enqueue(t ≠ v, queue2); enqueue(t ∉ s, queue2);
          modified := TRUE; continue;
        X ≠ t (or t ≠ X), X does not appear in t:
          enqueue(X ≠ t, queue2); continue;
        X ≠ X with a0 with ... with ak (or vice-versa), X does not appear in a0, ..., ak:
          <non-deterministically choose one of the following:>
            - enqueue(a0 ∉ X, queue2);
            ...
            - enqueue(ak ∉ X, queue2);
          modified := TRUE; continue;
        f(t0, ..., tk) ≠ f(s0, ..., sk), f is not with:
          <non-deterministically choose one of the following:>
            - enqueue(t0 ≠ s0, queue2);
            ...
            - enqueue(tk ≠ sk, queue2);
          modified := TRUE; continue;
        s with t ≠ s' with t':
          <non-deterministically choose one of the following:>
            - enqueue(Z ∉ s' with t', queue2) and add the substitution produced by Z ∈ s with t;
            - enqueue(Z ∉ s with t, queue2) and add the substitution produced by Z ∈ s' with t';
          modified := TRUE; continue;
        otherwise: FAIL;
      endcase;
    endwhile;
    switch_queue(queue1, queue2);
  until (not modified);
end.
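The two-queue cycle of Can can be illustrated with a toy Python model covering only two of the constraint forms above (our own encoding: ('neq', a, b) stands for a ≠ b over constants, ('nin', a, S) stands for a ∉ S for a finite set S of constants; the non-deterministic cases are omitted):

```python
from collections import deque

def can(constraints):
    """Toy two-queue constraint simplifier in the style of Can.
    Returns the canonical (here: empty) system, or None on failure."""
    q1, q2 = deque(constraints), deque()
    modified = True
    while modified:
        modified = False
        while q1:
            c = q1.popleft()
            if c[0] == "neq":
                if c[1] == c[2]:
                    return None          # a != a: unsatisfiable
                # a != b, distinct constants: satisfied, drop it
            else:                        # ('nin', a, S)
                if c[2]:                 # rule: t nin (s with v)
                    v, *rest = sorted(c[2])
                    q2.append(("neq", c[1], v))
                    q2.append(("nin", c[1], set(rest)))
                    modified = True
                # t nin {}: trivially satisfied, drop it
        q1, q2 = q2, q1                  # switch_queue
    return []

assert can([("nin", "a", {"b", "c"})]) == []      # satisfiable
assert can([("nin", "a", {"a", "b"})]) is None    # a in {a,b}: fails
```

Reduced constraints flow from queue1 to queue2; the queues are then switched and the pass repeated until a pass performs no reduction, exactly as in the pseudo-code above.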

Appendix C: {WAM} code

The following source code is considered:

p(X, Z, B) :-
    X ≠ Z ◊
    X ∈ B.
p(X, Z, B) :-
    X ∉ B ◊
    r(X) & q(X, Z).

The compiler produces the following code:

p/3_0:  try_me_else p/3_1
        allocate 2
        get_variable Y0, A0
        get_variable A3, A1
        get_variable Y1, A2
        put_value A3, A0
        put_value Y0, A1
        write_constraint neq
        write_c_arg A0
        write_c_arg A1
        call_Can, 2
        put_value Y0, A0
        put_value Y1, A1
        deallocate
        execute in_0
p/3_1:  retry_me_else FAIL
        allocate 2
        get_variable Y0, A0
        get_variable Y1, A1
        get_variable A3, A2
        put_value Y0, A0
        put_value A3, A1
        write_constraint nin
        write_c_arg A0
        write_c_arg A1
        call_Can, 2
        put_value Y0, A0
        call r/1_0, 2
        put_value Y0, A0
        put_value Y1, A1
        deallocate
        execute q/2_0

