Performance of Sequential SLG Evaluation

Terrance Swift

Computer Science Department, University at Stony Brook, Stony Brook, NY 11794-4400. [email protected]

David S. Warren



Department of Computer Science, University at Stony Brook, Stony Brook, NY 11794-4400. Tel: 516-632-8454, Fax: 516-632-8334. [email protected]

November 15, 1993

Abstract

SLG is a table-oriented resolution method that is gaining increasing attention due to its ability to combine the deductive database and logic programming paradigms. As an example of its applicability to deductive databases, SLG is asymptotically equivalent to magic template evaluations over the class of definite programs. In terms of the logic programming (and non-monotonic) paradigms, SLG computes the well-founded model for non-stratified programs, and has been extended to compute the 3-valued and stable model semantics. SLG resembles SLD in that it admits a tuple-at-a-time resolution method, so it can make use of many of the techniques developed in implementing SLD over the past decade. Indeed, a program can contain any mixture of SLG and SLD evaluated predicates. As a result, SLG forms a natural basis for extending Prolog systems. This paper presents performance analysis and comparisons of the SLG-WAM, an abstract machine-oriented implementation of SLG. First, the results indicate that when executing SLD, the overhead of the SLG-WAM compared to a similar WAM implementation is minimal, and usually less than 10%. Second, results are presented that indicate tradeoffs between SLD and SLG evaluation for a variety of programs; surprisingly, SLG derivation may be expected to be competitive with SLD derivation on numerous datalog programs, even when SLD terminates and contains no redundant subcomputations. Finally, performance comparisons indicate that the SLG-WAM is about an order of magnitude faster than current deductive database systems, even for datalog queries. These results call into question traditional arguments about top-down versus bottom-up evaluation efficiency, and may provide information of use in designing and implementing the next generation of logic programming and deductive database systems.

Keywords: Bottom-Up Evaluation, Implementation, Performance, Deductive Databases.

1 Introduction

Over the last decade various researchers have explored implementations of logic for data-oriented queries ([15] provides a summary of the research). Despite its efficient engine, Prolog has not always proven useful in these contexts. SLDNF is susceptible to infinite loops in programs as simple as transitive closure, giving rise to semantic problems. Even when it terminates, it may execute slowly because of redundant subcomputations. The deductive database community deserves significant credit for identifying and even solving these problems. Numerous rewriting techniques have been developed that produce the goal-directedness of a top-down evaluation in a bottom-up framework. Nonetheless, producing an efficient bottom-up engine that can moreover be smoothly integrated with the procedural power of Prolog is a difficult enterprise, and one that we believe has not yet been completely accomplished, despite such noteworthy efforts as CORAL [14], LDL [6], and GLUE/NAIL [7].

SLG resolution [5, 4] offers an alternative implementation strategy to those based on magic evaluation methods. SLG is complete for non-floundering programs with finite models, whether they are stratified or not (a program flounders if there is an atom whose truth cannot be proven without making a call to a non-ground negative literal). SLG has been implemented in the XSB logic programming system. XSB's engine, the SLG-WAM, currently implements a version of SLG restricted to modularly stratified programs [16]. A meta-interpreter is provided to evaluate non-stratified programs according to the well-founded semantics [21] or, equivalently, to the three-valued stable model semantics [12] for that program. XSB appears to evaluate stratified queries much faster than current bottom-up approaches, a claim that will be substantiated in section 4. XSB's implementation of SLG relies less on rewriting techniques and on alternate control strategies than is usual in bottom-up approaches. This allows XSB to maintain an operational interpretation of a program almost as elegantly as Prolog, without relying on the user's knowledge of that interpretation to avoid infinite loops and redundancy. Version 1.4 of XSB has been tested on over a dozen hardware and operating system platforms (SPARC, MIPS, Intel 80x86, and Motorola 680x0 chips; the operating systems SUNOS, SOLARIS, IRIX, ULTRIX, LINUX, 386BSD, AMIGA-DOS, HP-UX, System V.3, SCO Unix, and Mach; a port for Windows NT has been partially tested) and on databases whose relations have on the order of hundreds of thousands of tuples. It is available through anonymous ftp from cs.sunysb.edu.

(Supported in part by the National Science Foundation under Grant No. CCR-9102159.)

2 Preliminaries

2.1 SLG Resolution

SLG resolution [5, 4] uses memoing to evaluate general logic programs. The actual details of SLG evaluation have been presented elsewhere and are beyond the scope of this paper; however, a short summary of the important features of SLG can be given here. These features may be presented most clearly by describing the action of SLG first on definite programs, then on stratified programs, and finally on general logic programs.

On definite programs, SLG reduces to SLD resolution with memoing. Each call to a selected subgoal must check whether (a variant of) the subgoal has been previously called during the evaluation. If not, the subgoal is copied to a table and program clauses are resolved against the subgoal exactly as in SLD. If a variant subgoal has been previously called, the subgoal is resolved against answer clauses in the table. These answer clauses, in their turn, are added to the table for a subgoal whenever that subgoal is resolved away during the course of an evaluation. The evaluation itself is completed when all program clauses and answer clauses have been resolved against all applicable subgoals. Since answer clauses may be created during the course of the same evaluation that uses them, the evaluation may be seen as a sort of fixpoint computation. Subgoals for which an evaluation has reached a fixpoint are termed completely evaluated.

Seki in [18] proves that a "top-down" resolution method, QSQR, is asymptotically identical to a bottom-up method, Alexander Templates. There are a number of technical details that would need to be worked out in order to use Seki's result to compare SLG on definite programs to magic templates.


Even so, the similarities between the two methods are striking. For instance, the magic facts of the magic template method appear to correspond to the tabled subgoals of an SLG evaluation.

For stratified programs, SLG differs from methods such as OLDTNF [10] in that SLG completely evaluates goals created in negative contexts, whereas OLDTNF does not. This strategy gives SLG polynomial data complexity for datalog programs (as a practical matter, Existential Negation, explained later, can provide an alternative behavior in SLG programs).

Non-stratified programs need to handle negative subgoals differently than do stratified programs. In SLG, non-stratified programs must delay at least some calls to negated subgoals. If a negative literal in the body of a clause is delayed, other literals in the body may fail, and in this case the delayed literal will never need to be rechecked. On the other hand, if a subgoal is proven subject to a delayed literal (or literals), the delayed literal must be rechecked after the relevant subgoals have reached fixpoint. This rechecking is in fact done to tabled answer clauses. Rechecking the delayed literal may lead to failure of the subgoal (if the literal is false) or to unconditional proof of the subgoal (if the literal is true and there are no other delays). However, it will not always be the case that a subgoal can be proven either true or false: there may be cyclic dependencies among the delayed literals of a set of subgoals. In this last case, the subgoals whose delays cannot be resolved away are given the truth value unknown if the well-founded semantics is used. A minimal illustration appears below.
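As that minimal illustration (our own example, not one of the paper's benchmarks; under XSB 1.4 such a program would go through the meta-interpreter mentioned in the introduction), consider win/1 over a single self-move:

    % A canonical non-stratified program: win(a) depends negatively
    % on itself through the move cycle a -> a.
    win(X) :- move(X,Y), \+ win(Y).
    move(a,a).

    % SLG delays the negative call to win(a); the delay can never be
    % resolved away, so under the well-founded semantics win(a) is
    % given the truth value unknown.  (Under SLDNF the query win(a)
    % would simply loop.)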

2.2 Implementation of SLG

Because of the similarity of SLG to SLD on definite programs, it is natural to ask whether an SLG interpreter (or preprocessor) could be written in Prolog using SLDNF. If so, then Prolog itself could compute SLG and, given suitable extensions for indexing, memory management, and other features, could probably be transformed into a deductive database. Such an interpreter can be, and in fact has been, written, but its speed has turned out to be unacceptable for general programming. (Indeed, the extension tables of [8] could be viewed as a Prolog processor to compute OLDT.) It is not hard to understand why the interpreter is slow. As was mentioned in section 2.1, many subgoals resolve against answer clauses rather than against program clauses. Since answer clauses are derived during the course of an evaluation, an evaluation has a fixpoint quality not present in SLD evaluation. Subgoals that are to be resolved against answer clauses must be retained until the fixpoint is reached: until all applicable answer clauses have been derived and resolved against the subgoals. Likewise, newly derived answer clauses must be queued to resolve against subgoals arbitrarily far away in the search tree. At the level of Prolog implementation, these two tasks correspond to freezing an environment to retain subgoals, and to switching environments to return new answers to extant subgoals. These tasks are impossible in a sequential WAM implementation. Interpreters and preprocessors which try to compute SLG (or OLDT) end up needing computational mechanisms not natural to the WAM, and as a result their performance suffers.
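To make the fixpoint quality concrete, the following is our own deliberately naive sketch of a memoing interpreter for definite programs, in the spirit of the extension tables of [8]; it is not XSB's meta-interpreter. It assumes datalog programs with ground answers whose recursive calls are variants of the original goal (as in the left-recursive path/2 of section 3, declared dynamic so that clause/2 can inspect it, with a fact tabled(path(_,_)) identifying the tabled predicate):

    :- dynamic answer/1.

    memo_query(Goal) :-
        retractall(answer(_)),
        saturate(Goal),
        answer(Goal).                % enumerate answers on backtracking

    % Iterate rounds of clause resolution until the answer table
    % stops growing: the fixpoint described above.
    saturate(Goal) :-
        answer_count(N0),
        derive_round(Goal),
        answer_count(N1),
        ( N1 =:= N0 -> true ; saturate(Goal) ).

    answer_count(N) :- findall(x, answer(_), L), length(L, N).

    % One round: resolve every program clause for Goal, solving tabled
    % body literals against the current answer table only, and record
    % any new answers.
    derive_round(Goal) :-
        ( clause(Goal, Body),
          solve(Body),
          \+ answer(Goal),
          assert(answer(Goal)),
          fail                       % failure-driven loop over all proofs
        ; true
        ).

    solve(true) :- !.
    solve((A,B)) :- !, solve(A), solve(B).
    solve(L) :-
        ( tabled(L) -> answer(L)     % tabled call: resolve against answers
        ; call(L) ).                 % base relation: ordinary SLD call

Even this sketch exhibits the two costs mentioned above: answers are copied through an explicit table (answer/1), and each round re-runs clause resolution from the top rather than returning new answers directly to suspended subgoals.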

The SLG-WAM contains WAM extensions for efficient evaluation of SLG (the current implementation is limited to stratified programs, although detailed design has begun for the full SLG-WAM) and is roughly 100 times faster than its meta-interpreter running on a similar emulator. An associated paper will discuss these extensions in detail; here we mention the major ones only in passing. To begin with, a separate memory area, the table space, is added to the WAM for table manipulation, and routines have been added to copy derived answers from the SLG-WAM program stacks to table space and back. As mentioned in the previous section, the evaluation must not backtrack over subgoals which use answer clauses. To ensure this, freeze registers are added to each of the WAM stacks. During an evaluation these freeze registers protect a subgoal's environment from being overwritten.


As a result, the WAM stack register points to information about the current environment, but may not necessarily denote the top of stack. A consequence of this is that a stack frame's "parent" may not immediately precede it in memory: the "stacks" become trees, and stack frames require back pointers to their parents (this is already the situation in the WAM local stack).

The evaluation must also provide a mechanism to return a new answer clause to an extant subgoal. Here, the WAM environment must be changed. To switch environments, information from choice points can be used to reset registers, as in backtracking. However, switching environments requires variables to be bound as well as unbound, and the SLG-WAM trail is changed to a forward trail to accommodate this. In switching from one environment to another, the engine untrails to a common ancestor, then uses values in the forward trail to reconstitute the target environment.

One of the final additions supports incremental completion, or the ability to determine when a subgoal or set of mutually dependent subgoals has been completely evaluated. As mentioned above, a subgoal is not completely evaluated until all subgoals upon which it depends are completely evaluated. The SLG-WAM uses a subgoal stack to keep track of the dependencies. The subgoal stack is guaranteed to produce an over-approximation of subgoal dependencies, so that the SLG-WAM does not mark subgoals as completed until all possible answers have been derived for them.

It should be noted that SLG-WAM overhead for SLD resolution is minimal. Comparisons of the SLG-WAM with PSB-Prolog, from which it is derived, indicate that the SLG-WAM is usually less than 10% slower than PSB-Prolog's WAM, and is sometimes faster. The comparison is made murky by the fact that numerous improvements were made to the emulator in the course of implementing the SLG-WAM and in preparing XSB for release. As a Prolog system using only SLD, XSB runs at approximately the speed of emulated SICStus, and is about three times slower than Quintus (for these comparisons, Prolog-style hashing was used rather than XSB's first-string indexing; details of the comparisons can be obtained from the authors).

2.3 XSB Programming Environment

The SLG-WAM forms the engine for the XSB logic programming system [17]. This section reviews features of XSB's programming environment which will be needed later in the paper. The discussions in the previous two sections indicate a close connection between SLG and SLD. Indeed, there is no impedance mismatch between the two, so it is not surprising that a user can freely choose between SLG and SLD evaluation. Predicates in XSB are executed using SLDNF by default, and can be declared tabled either by the user on a per-predicate basis, or by the system on a per-module basis through the directive table all.
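For instance (a small sketch of ours; the predicate names are invented):

    :- table reach/2.                     % reach/2 is evaluated by SLG
    reach(X,Y) :- edge(X,Y).
    reach(X,Y) :- reach(X,Z), edge(Z,Y).

    edge(a,b).  edge(b,c).  edge(c,a).    % a cycle: safe under SLG

    % report/1 is an ordinary predicate, evaluated by SLDNF, that
    % freely calls the tabled predicate.
    report(X) :- reach(X,Y), write(Y), nl, fail.
    report(_).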

3 Performance of SLD and SLG-def

The restriction of SLG to definite programs may be called SLG-def. Despite a great deal of folklore, the decision to use SLD (top-down) or SLG (roughly, bottom-up, tuple-at-a-time) is still an art. On the one hand, the bottom-up aspect of tabling in SLG aids in filtering out redundant computations; indeed, it is this property which makes SLG complete for non-floundering datalog programs. The same-generation program furnishes an extreme case of the usefulness of filtering out redundant computations. The program

    sg(X,Y) :- cyl(X,X1), sg(X1,Y1), cyl(Y,Y1).
    sg(X,X).



was executed on a 24x24x2 randomly generated cylinder [1] using the query ?- sg(1,_), fail. A cylinder can be thought of as a rectangular matrix of elements where each element in row N has links to a certain number of elements in row N+1. The 24x24x2 cylinder, then, is an array of 24x24 nodes, where each of the nodes in each row (except the last) is connected to two elements in the next higher row. Executing this query under SLG results in times more than three orders of magnitude faster than SLD, since SLD will search a complete binary tree of depth 24. Clearly, examples can be developed to make SLG execute arbitrarily faster than SLD.

On the other hand, for certain non-datalog programs an overhead accrues in copying information to and from tables. It is well known, for example, that naively copying information from stack space to table space can increase the computational complexity of a query like append(b,f,f) from O(N) to O(N^2). Of course, lists of ground elements can be copied into the table as a whole and the complexity of the query reduced; most deductive database systems, including XSB, have facilities for doing so. However, if a tabled predicate naively recurses over a structure containing variables, SLG execution may increase the complexity of a query (efficient techniques for writing such predicates are described in [22]).

For datalog queries that do not contain redundant computations, SLG performs surprisingly well compared to SLD.

Example 3.1 To compare the efficiency of the right-recursive

    path(X,Y) :- edge(X,Y).
    path(X,Y) :- edge(X,Z), path(Z,Y).

using SLD against the left-recursive

    :- table path/2.
    path(X,Y) :- edge(X,Y).
    path(X,Y) :- path(X,Z), edge(Z,Y).

using SLG, benches were run on chains

edge(1,2), edge(2,3), ..., edge(N-1,N).

and cycles

edge(1,2), edge(2,3), ..., edge(N,1).

of varying lengths, and complete binary trees

    edge(1,2), edge(1,3), ..., edge(2^n - 1, 2^(n+1) - 1).

of varying heights. Tables 1, 2, and 3 show the normalized times for the query path(1,X), fail in version 1.4 of XSB and in the SLG meta-interpreter running under Quintus. Right recursion is the only way to write path/2 in pure SLD so that it will terminate. Left recursion is the most efficient, and arguably the most natural, way to write path/2 in SLG. In addition, [11] has identified programs which are equivalent to left-recursive forms and specified transformations into these forms. However, the SLG-WAM has made no particular optimizations for left recursion.

SLD evaluation is linear for the queries to the chain and the tree, since neither contains a redundant path. However, the left-recursive SLG derivation takes nearly the same time as SLD for the chain and tree (mostly about 20-25% longer), while it also terminates for the cycle. The similarities in speed on the chain and tree are especially significant since the SLG times include the time taken to copy answer clauses to table space, and to abolish and reclaim table space at the end of each iteration.

The comparison of SLG to SLD for datalog programs does not take into account program transformations such as projection pushing, which can make use of mode information. However, there is certainly no reason that the wealth of datalog program transformations developed over the last decade cannot be used with SLG. While [20] contains some preliminary results, it remains an open question which transformations will prove the most useful.
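For reference, the three kinds of benchmark relations can be generated along the following lines (our own sketch; any Prolog with assert/1 suffices):

    :- dynamic edge/2.

    % chain: edge(1,2), ..., edge(N-1,N)
    make_chain(N) :- chain_edges(1, N).
    chain_edges(I, N) :- I >= N, !.
    chain_edges(I, N) :-
        J is I + 1, assert(edge(I,J)), chain_edges(J, N).

    % cycle: a chain of length N closed by edge(N,1)
    make_cycle(N) :- make_chain(N), assert(edge(N,1)).

    % complete binary tree of height H: internal node I has children
    % 2I and 2I+1, for I = 1 .. 2^H - 1 (so 2^(H+1) - 1 nodes in all)
    make_tree(H) :- pow2(H, P), Max is P - 1, tree_edges(1, Max).
    pow2(0, 1).
    pow2(N, P) :- N > 0, N1 is N - 1, pow2(N1, Q), P is 2 * Q.
    tree_edges(I, Max) :- I > Max, !.
    tree_edges(I, Max) :-
        L is 2 * I, R is L + 1,
        assert(edge(I, L)), assert(edge(I, R)),
        I1 is I + 1, tree_edges(I1, Max).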


Length           8    16   32   64   128  256  512  1k   2k
Interpreted SLG  31   33   36   40   34   34   35   36   44
SLD              .7   .62  .61  .67  .84  .84  .82  .83  .83
Compiled SLG     1    1    1    1    1    1    1    1    1

Table 1: Normalized Times for path/2 on Acyclic Linear Lists

Height           6    7    8    9    10   11
Interpreted SLG  32   32   34   32   37   44
SLD              .82  .86  .87  .79  .86  .79
Compiled SLG     1    1    1    1    1    1

Table 2: Normalized Times for path/2 on Complete Binary Trees

Length           8    16   32   64   128  256  512  1k   2k
Interpreted SLG  33   37   38   42   33   35   33   35   45
Compiled SLG     1    1    1    1    1    1    1    1    1

Table 3: Normalized Times for path/2 on Cyclic Linear Lists

The SLG meta-interpreter is slower than the SLG-WAM for this set of problems for the reasons mentioned in section 2.2. The meta-interpreter was benched while running under Quintus Prolog, where it is about 30 times slower than the engine, rather than 100 times. It will be demonstrated in section 4 that, despite being written as a Prolog interpreter, the meta-interpreter running under Quintus may give times roughly comparable with many deductive databases. The same point was made for programs with negation in [4].

3.1 SLG-strat vs SLDNF

One of the significant advantages of SLG over other memoization methods, such as OLDT, is its handling of general negative literals within clauses. This section will focus on win/1.

Experiment 3.1 The program

    win(X) :- move(X,Y), \+ win(Y).

is stratified iff move/2 is acyclic.
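For concreteness, a minimal stratified instance, written exactly as above but with our own small acyclic move/2 relation:

    :- table win/1.
    win(X) :- move(X,Y), \+ win(Y).

    move(1,2).  move(1,3).     % a two-level tree of moves
    move(2,4).  move(3,5).

    % Leaves 4 and 5 have no moves, so win(4) and win(5) fail; hence
    % win(2) and win(3) succeed, and win(1) fails: every move from
    % node 1 leads to a position that wins for the opponent.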

Using the chains and trees from example 3.1, we can retest compiled SLG against SLDNF. Comparing SLG with SLDNF is slightly more complicated than comparing it with SLD. This is due to the fact that with SLDNF, a negative call is failed as soon as the first solution to the underlying call is obtained. In SLG, however, a call to a negated predicate may have created new subgoals, which are all completely evaluated during the course of execution. In other words, SLG will explore the entire search tree rooted in the negated subgoal. Failure to do so may lead to exponential data complexity in certain programs [5]. On the other hand, the strategy of fully evaluating subgoals is not always the best choice, as can be seen from the example of win/1 over the binary tree. Consider the calls made by SLDNF for the query win(1) over a binary tree of height 4 with 31 nodes. The calls are represented as circled nodes in Figure 1. Because SLDNF checks only for the existence of a solution for a negative subgoal, only 13 out of 31 possible subgoals are evaluated by SLDNF, and in general the number of calls made for win(1) over a binary tree grows proportionally to sqrt(2^n) in SLDNF rather than to 2^n (the exact formula is G(n) = 2^(floor(n/2)+2) - 3 + 2(n/2 - floor(n/2))).



[Figure 1: Calls to win/1 over a binary tree (called subgoals shown as circled nodes)]

Version 1.4 of XSB allows three different ways of executing win/1. The first uses pure SLG resolution, in which all subgoals are fully evaluated. The second uses SLDNF resolution. Existential Negation is the third alternative of XSB, which incorporates some of the search strategy of SLDNF resolution into SLG resolution. In Existential Negation, when a definitely true answer is derived for A, the corresponding ground negative subgoal fails. Furthermore, under certain conditions, subgoals that are created during the evaluation of A can be discarded before they are fully evaluated, without losing the termination and correctness properties of SLG resolution. The tables below show normalizations of the execution times of the SLG meta-interpreter, XSB using SLDNF, and XSB without Existential Negation to those of XSB with Existential Negation, for the two benchmark programs that are modularly stratified.

Length           8    16   32   64   128  256
SLG Interp.      30   30   33   32   29   29
XSB / No E-Neg   .99  .99  1    .99  1    .97
XSB / SLDNF      .67  .21  .22  .22  .24  .26
XSB / E-Neg      1    1    1    1    1    1

Table 4: Comparisons of SLG implementations for acyclic linear lists

Height           6    7    8    9    10   11
SLG Interp.      123  116  229  261  812  906
XSB / No E-Neg   4.5  4.25 7.6  8.2  15.4 15.7
XSB / SLDNF      .3   .24  .22  .24  .24  .23
XSB / E-Neg      1    1    1    1    1    1

Table 5: Comparisons of SLG implementations for complete binary trees


As with example 3.1, the SLG-WAM is about 100 times faster than the interpreter (about 30 times faster when the interpreter is run under Quintus). To some extent, the ratios in this comparison may be slightly low, since negation of tabled predicates is not yet fully compiled in XSB.

3.2 Tabled code vs Compiled code

SLG evaluation was originally formulated as a program transformation method. For modularly stratified programs, answers in completed tables take the place of program clauses, so SLG's filtering of redundant subcomputations can be used as a way of amortizing repeated queries. In this context it is worthwhile to compare the speed of executing tabled code in the SLG-WAM against the speed of executing compiled code.

Example 3.2 Consider the query ?- path(1,Y), fail. against the completed table from example 3.1, and against the compiled unit clauses

    edge(1,2). edge(1,3). ... edge(1,N).

Retrieving tabled code consistently takes about 85% longer than retrieving compiled code. As explained in [20], the difference in speed is due partly to table navigation and partly to the fact that instructions which copy information out of a table are not yet as specialized as WAM get instructions. The speed of retrieving solutions from a completed table means that code can be optimized by tabling even small predicates that are likely to be used as common subexpressions. Furthermore, code can also be optimized by factoring out common subexpressions and creating tabled predicates that use them, as sketched below.
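Schematically (our invented predicates, not taken from the benchmarks):

    % The conjunction q(X,Z), r(Z,Y) occurs in several clause bodies;
    % tabling the factored predicate computes and stores it once.
    :- table common/2.
    common(X,Y) :- q(X,Z), r(Z,Y).

    p1(X,Y) :- common(X,Y), s(Y).
    p2(X,Y) :- common(X,Y), t(Y).

    q(1,2).  r(2,3).  s(3).  t(3).   % sample facts for illustration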

4 Comparisons with Deductive Database Systems

Detailed comparisons with other deductive database systems are not complete, since many systems are not publicly available or are not directly comparable. In the case of GLUE/NAIL, which is not publicly available, comparison can be made using published times for certain queries; such comparisons are provided below. Aditi is a multi-user system which allows direct computation on disk-resident data, so comparisons with a single-user system for memory-resident queries seem pointless. As for LDL, [14] presents a comparison of CORAL with LDL, and indicates that for most simple queries CORAL is usually significantly faster than LDL (an assessment largely echoed in [9]). As a result, the comparisons in this section are usually made against CORAL, although we include supporting information about GLUE/NAIL and LDL performance. We include the range-restricted datalog queries on which one would expect set-at-a-time evaluation methods to excel, along with queries involving structures, non-range-restricted queries, and negation. While further benching will surely prove useful, we believe that the times presented, and the analysis thereof, are nonetheless of general significance.

4.1 Benching Environment

A pre-release of XSB version 1.3 was benched against version 0.1 of CORAL. To partially account for the disparity of versions between XSB and CORAL, an XSB version without C compiler optimizations was tested; compiling XSB with C compiler optimizations gives a 40-50% speedup over unoptimized compilation for programs like those below. The benches were all done on a SPARCstation 2, measuring CPU time in seconds. For both systems, several iterations were made of each bench. For CORAL, the iterations were done in a separate, pipelined module. Times given in the tables represent the time for all iterations, and may vary with the number of iterations tested.

Return hash table sizes are presently static in XSB. Accordingly, sizes of return index hash tables were declared explicitly for the XSB times, after making a guess at an efficient size. Between each iteration, tables were cleared using abolish_all_tables/0 in the case of XSB and rel_clear/1 in the case of CORAL. By testing a single iteration with no table clear, it was estimated that clearing tables took 5-15% of CORAL's time and 1-2% of XSB's time. (This is thought to be a reflection of the lack of sophistication of XSB's memory management, rather than any lack of efficiency in CORAL.) As a final point, the time to read data into the database from an external file and to index it is roughly the same in both XSB and CORAL when XSB's formatted data read is used (the alternative is a full reader which reads HiLog [3] terms using operator declarations, a generally unsuitable operation for bulk copying into a database). It should also be noted that XSB's internment of atoms uses a static hash table, and is not suitable as it exists for reading in files over a certain size or, indeed, for running programs which create large numbers of atoms.
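For reference, the XSB side of such a bench has roughly the following shape (our sketch; statistics/2 with the runtime key is the usual Prolog timing interface of the period, and bench_goal/0 stands for whichever query is under test):

    bench(Iterations) :-
        statistics(runtime, [T0,_]),
        run(Iterations),
        statistics(runtime, [T1,_]),
        Total is T1 - T0,
        write(total_msecs(Total)), nl.

    run(0) :- !.
    run(N) :-
        ( bench_goal ; true ),       % run the query; ignore its failure
        abolish_all_tables,          % clear table space between iterations
        N1 is N - 1,
        run(N1).

    % e.g., for the left-recursive transitive closure bench:
    bench_goal :- path(1,_), fail.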

4.2 Execution and Compilation Strategies

CORAL has a number of compilation and execution modes. The modes tested were:

- Default. CORAL uses a variant of supplementary magic templates as its default rewrite mode. In its default state it expects possibly non-ground terms as answers, and eliminates duplicate answers as well as checking for subsumption.

- @factoring. Factoring is a rewrite technique which is applied to programs that have undergone the magic transformation [11]. In the path/2 predicate, factoring allows the removal of a join to a magic predicate in the body of the recursive clause for the path relation. This option is turned off as part of the default.

- @index_deltas. When this option is turned on, the delta set produced by each iteration of the semi-naive algorithm is indexed. Turning this option off will save time when very few facts are returned per iteration, but will lose time when many facts are returned. This option is turned on as part of the default.

- @multiset. Turned on, the engine does not check for duplicate returns. If a predicate is known not to produce duplicate returns, as in the case of, say, the path/2 relation over a tree or chain, turning on this option will save time. If, on the other hand, the data structure traversed by path/2 is a dag or a cycle, the computation will perform redundant work or will not terminate. This option is turned off as part of the default.

- @check_subsumption. This turns off checking for whether a return is subsumed by other returns. Checking for subsumption is especially relevant in a bottom-up context since, in certain cases, magic transformations can lead to programs which return many instances of an answer before returning the general answer itself. This option is turned on in the default system.

- @supplementary_magic_indexing. This allows a given answer to be joined in only one of a number of clauses to which it might otherwise be joined [19]. It is turned off in the default.

- @non_ground_facts-. This may be turned off for safe data queries. If turned off, the engine may simply match arguments without having to unify them. It is turned on in the default.


- @pipelining. When turned on, the engine approximates top-down, tuple-at-a-time evaluation. Unlike bottom-up evaluation, results are not materialized. The CORAL manual states that convenience was chosen over efficiency in the implementation of pipelining, and future versions may be more efficient. It is turned off in the default.

- @ordered_search+. This evaluation strategy becomes useful for certain types of programs with negation. Ordered search hides answers derived in a negative context so that they need not be fully evaluated after an answer is found for a negative goal. This option is turned off in the default.

It is useful to understand how the CORAL modes apply to the SLG-WAM. First of all, the SLG-WAM constructs its derivation tree explicitly as the program is evaluated, and thus directly ensures goal-directedness rather than requiring a rewrite of the program. On the other hand, CORAL's use of supplementary magic templates allows it to filter out duplicates at each join, at the cost of maintaining supplementary magic facts. Due to SLG's direct execution, rewriting options like @factoring are thus not directly applicable. The @index_deltas option is also inapplicable for the tuple-at-a-time SLG. As for @multiset, XSB never allows duplicates when using SLG evaluation (though this would be possible), although it does allow them when using SLD. Nor does the SLG-WAM check for subsumption of returns, although it is possible to do so within the framework of SLG. The tuple-at-a-time strategy automatically provides the functionality of @supplementary_magic_indexing (not to be confused with supplementary magic templates): an answer is returned only to calls which would derive it through a path in an SLG tree. Finally, the SLG-WAM always assumes that non-ground facts are possible.

Surprisingly, XSB's Existential Negation is an analogue of @ordered_search [13]. As mentioned previously, Existential Negation discards tables created after a negative call under certain circumstances, based on the approximation in the subgoal stack. Ordered search is similar in that subgoals created in a negative context are hidden from the rest of the program, and are deleted if an answer is derived for the negative goal.

4.3 Tests of Datalog

In this section we first present tests of left recursion in detail; however, the relative times presented for left recursion do not differ greatly from those for right and double recursion in section 4.3.2.

4.3.1 Tests of Left Recursion

The left-recursive path/2 program of section 3 was tested on a number of different data structures and for several different CORAL optimization modes. The usual query atom ?- path(1,_) was used. A bf annotation was given to CORAL. Let us start the series of benches by calling the path predicate over simple cycles of length 8 to length 2k. The range of these data structures does not reflect any limitations in XSB or CORAL in handling larger relations. The observed times are shown in Figure 2. XSB times appear to be about 10x faster than CORAL times. One of the differences in the times might be due to the fact that the XSB return hash table sizes were declared explicitly, while CORAL sizes were determined dynamically. The time CORAL spends eliminating duplicates can be found by recompiling with @multiset on, for a chain data structure with the same lengths as the cycles, and with @check_subsumption off. Note that under these settings CORAL will not check for duplicate answers, and presumably will not need to index the entire set of returns for duplicate elimination (it should be mentioned that we do not have an intimate knowledge of the CORAL engine). Times for the chain are shown in Figure 3.



[Figure 2: XSB vs CORAL for Cycles (seconds for 1000 iterations against number of elements in the cycle; series: XSB, CORAL-def, CORAL-fac)]

The combination of @multiset+ and @check_subsumption- (not shown) does not differ significantly from the other optimizations; it seems that most of the CORAL time is not spent checking for duplicate returns. However, allowing multisets in returns can be dangerous in general, since an evaluation will not terminate for recursions over cyclic data. (As an aside, XSB could also use SLD resolution on this example by rewriting the path program to be right-recursive, giving a speedup of about 25%.) Also note that turning off the @index_deltas option for the chain reduces the CORAL time for all but the 2k chain. The @index_deltas option indexes the delta set at each iteration, and may be useful for checking duplicates in the delta set as well as for joining when the deltas are used in a fixpoint iteration. Indexing the delta set is usually a waste of time for the chain, because the delta set at the end of each iteration is a singleton. Beyond whether or not the delta set is indexed, comparisons of chains and linear cycles are potentially unfair to CORAL, since these data structures are not especially amenable to set-at-a-time strategies.

Next we try path/2 using the same options over a balanced binary tree of varying height. As seen in Table 6, XSB is still faster than CORAL, although the ratio is somewhat smaller for the default and for factoring. Note the blowup of the @index_deltas option. In general, the CORAL/XSB ratios are somewhat smaller for the tree than for the chain, and one wonders whether there is a branching factor (delta size) large enough that CORAL becomes more efficient than XSB. We next test structures of the form

    edge(1,1), ..., edge(1,N)

to try to determine this. These are trivial data structures, all of whose answers can be found in the first iteration of a bottom-up fixpoint when called with ?- path(1,X). Figure 4 does not indicate a cross-over point between XSB and CORAL, although further testing is probably necessary to conclude that such a point does not exist.

Staying with the path relation, the next suite tests the time to copy a structure into a table. Here the program differs slightly:


[Figure 3: XSB vs CORAL for chains (seconds for 1000 iterations; series: XSB, CORAL-def, CORAL-fac, CORAL-mul, CORAL-idl)]

    path(X,Y,Z) :- edge(X,Y,Z).
    path(X,Y,Z) :- path(X,W,_), edge(W,Y,Z).

and the data structure is a chain of type

edge(N,N+1,g(g(g(g(g(g(g(g(g(g(N))))))))))).

Height                     6      7     8     9      10    11
Iterations                 500    100   100   100    10    10
XSB                        5.82   2.35  4.849 9.36   1.839 3.699
CORAL Default              45.57  17.42 39.1  81.36  17.81 35.36
CORAL Factoring+           39.16  15.28 33.19 70.16  15.93 43.15
CORAL Multiset+            36.66  13.69 31.31 66.39  14.8  29.75
CORAL Non_ground_facts-    45.18  17.48 38.66 81.93  17.71 35.63
CORAL Index_deltas-        114.29 71.97 292.3 1083   ....  ....
CORAL Sup_magic_indexing+  87.34  30.92 73.98 187.11 42.12 125.11

Table 6: Comparison for Binary Trees of Varying Height

Here the third field of edge/3 does not participate in the joins but is carried along, being copied into and out of the table (at least in the SLG-WAM). The CORAL times are affected somewhat less in Table 7 than the XSB times, and the CORAL/XSB ratios are smaller than before. As will be discussed below, CORAL's relatively good times may stem from its structure-sharing approach to using information in its solution store, as opposed to the SLG-WAM's (default) structure-copying approach. Since structure-sharing approaches may become less efficient when variables are introduced, another test replaced the ground structure in the third argument by an existential variable, the data structure now being

    edge(N,N+1,g(g(g(g(g(g(g(g(g(g(X)))))))))))


[Figure 4: XSB vs CORAL for Fanout Experiment (seconds for 1000 iterations against number of leaves; series: XSB, CORAL-def, CORAL-fac)]

Unfortunately, CORAL could not compile and execute the entire file of facts, perhaps because of the early release used. For those chains that did compile, the ratio of CORAL/XSB times appeared to be larger than with ground structures.

Length                     8     16    32    64    128   256   512   1k    2k
Iterations                 500   500   500   500   100   100   100   10    10
XSB                        1.289 2.62  5.75  10.9  4.739 9.45  19.22 3.83  7.64
CORAL Default              5.3   9.54  18.61 35.75 15.26 30.23 63.38 13.43 26.54
CORAL Factoring+           4.82  9.38  17.94 34.74 13.82 27.62 60.07 11.68 25.35
CORAL Multiset-            4.95  8.2   16.24 32.73 13.73 25.92 56.64 11.78 24.63
CORAL Index_deltas-        4.28  7.89  14.57 29.36 12.24 23.65 52.45 10.74 24.08
CORAL Non_ground_facts-    5.1   9.74  18.11 37.42 16.1  29.71 64.25 13.48 28.07
CORAL Sup_magic_indexing+  7.96  14.11 26.19 49.9  20.04 42.2  91.8  16.94 43.82

Table 7: Comparison of Ground Structures of level 10

4.3.2 Other Datalog Queries

A large portion of our comparisons were concerned with left recursion, since the simplicity of this recursion lends itself well to understanding the effects of various options and data structures. However, it is useful to present information about other datalog queries to verify that left-recursive queries are not anomalies. Other tests indicate that left recursion is representative of the performance of the two systems for datalog. CORAL was about 8 times slower than XSB for the same-generation program and the query ?- sg(1,_), fail. on the 24x24x2 cylinder from section 3. On the right-recursive version of path/2 over chains, CORAL was about 15x slower than XSB. For path/2 written as a double recursion, CORAL was 5-10x slower than XSB over the range tested; the variance is perhaps due to inadequate subgoal indexing in XSB.

At this point published results of testing CORAL against LDL may be mentioned. CORAL is usually significantly faster than LDL for non-left-recursive datalog queries, about 5x faster according to [14]. LDL was reported to be about 2x faster than CORAL for a particular left-recursive query, in which case it is still significantly slower than the SLG-WAM.

4.4 Tests of Negation

Both XSB and CORAL compute programs that are left-to-right modularly stratified, and both systems are poised to compute general logic programs, through full SLG in XSB's case and through well-founded magic sets in CORAL's. The program

    win(X) :- move(X,Y), \+ win(Y).

along with the chain and tree data structures from section 3.1, was retested for a comparison. On the chain the times were as in Table 8, while for trees the times were as in Table 9.

Length                  8      16     32     64
Iterations              500    500    500    500
XSB (no e-neg)          .62    1.3    2.77   5.39
XSB (e-neg)             .62    1.23   2.51   5.01
CORAL (Ordered Search)  18.01  37.02  78.93  152.01
CORAL (default)         18.93  37.98  75.26  159.66

Table 8: Comparison for win/1 on Chain

Height                  6       7
Iterations              500     100
XSB (no e-neg)          12.26   5.031
XSB (e-neg)             2.62    1.09
CORAL (Ordered Search)  281.2   116.35
CORAL (default)         272.46  116.09

Table 9: Comparison for win/1 on Complete Binary Tree

[4] tested XSB for the same win/1 program against published GLUE/NAIL times, using the SLG-WAM for the tree and chain and the SLG meta-interpreter for the cycles. Although the GLUE/NAIL times were obtained on a slightly faster machine than the XSB times (S. Morishita, personal communication), the meta-interpreter running under Quintus was very competitive with GLUE/NAIL. For the chains and trees, the SLG-WAM was solidly two orders of magnitude faster than GLUE/NAIL when full SLG evaluation was used, and often three orders of magnitude faster when SLG with Existential Negation or when SLD was used.


4.5 Tests of List Recursion

One would not expect that append/3 would be tabled in a practical program; still, benching this predicate highlights more differences between the two systems. The current version of XSB copies lists from the stacks to table space and back for each SLG call to append/3, giving an overhead linear in the size of the lists (future versions of XSB will obviate this copying when the lists are ground). CORAL contains mechanisms to minimize copying from and to the answer store, and uses an algorithm that adds an overhead linear in the number of variables in the lists [19]. For the comparison, calls to append(b,b,f) were made varying the length of the list in the first argument. For CORAL, we tested the default bottom-up version, an optimized bottom-up version from CORAL's example directory (@check_subsumption and @index_deltas both turned off), and the pipelined version.
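The tabled version benched here is just the textbook definition under a table declaration (our sketch; in a real system the predicate would be renamed, as here, to avoid any library append/3):

    :- table app/3.
    app([], L, L).
    app([H|T], L, [H|R]) :- app(T, L, R).

    % The append(b,b,f) mode: first two arguments bound, third free,
    % e.g. ?- app([a,b,c], [d], Z).  Each tabled call copies its list
    % arguments into table space, whence the linear overhead noted
    % above.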

Length            8      16     32     64     128    256    512
Iterations        10000  10000  1000   1000   100    100    100
CORAL Default     435    1025   291    885    310    1188   4666
XSB / SLG         25     66.4   20.7   69.8   26.3   99.5   395
CORAL sugg. opts  243    396.1  74.1   147    29.6   60.1   108
CORAL Pipelined   42     82.6   15.7   42.9   10     20.0   37.7
XSB / SLD         1.68   2.5    .44    .8     .15    .31    .60

Table 10: Comparison for append/3 on lists of varying lengths

Results in Table 10 indicate that CORAL's current implementation of pipelining is about 30x slower than SLD. Of course, by including a Prolog emulator, CORAL could achieve SLD speeds. Surprisingly, the speed of SLG makes it comparable with CORAL's optimized bottom-up implementation when the length of the lists is below 200 or so, and with pipelining when the length is below about 20-30. When optimizations for ground answer clauses are completed for SLG, we expect SLG times for this program to be competitive with SLD for ground lists. However, for non-ground lists, our implementation of SLG will remain quadratic.

4.6 Analysis of Results

The test suite is incomplete, and is not intended to reflect a balanced comparison of the two systems in their entirety. With that in mind, the results of the previous sections reflect the known result that SLG (SLD with tabling) and magic transformations have the same complexity on definite programs. As support for this statement, while the ratios vary from about 5x to 15x over different programs, they tend to be more or less constant as data size changes. Do the ratios reflect essential aspects of data structures and scheduling for set-at-a-time and tuple-at-a-time? Or is it just the state of the art for the two approaches? These questions are impossible to answer as stated, but to spur further research we can present hypotheses that arise out of the comparison. To begin with, the SLG-WAM spends the majority of its time executing WAM instructions and only a minority of its time executing tabling instructions. Furthermore, we claim that using present technologies, tuple-at-a-time techniques are more efficient than set-at-a-time techniques for crucial datalog operations like joins.

Using instruction counts and timings like those detailed in [20], the approximate percentage of time spent executing WAM instructions can be determined. Approximately 5-10% of execution time is spent in tabling instructions for a left-recursive version of path/2 over a cycle, and about 33% of the time in tabling instructions for same_generation/2. Execution time is largely dependent on WAM execution time for these predicates. The only datalog exception we have found is the double-recursive path/2, where the time in tabling instructions rose to about 65%. For this last program, the tabling time was fairly evenly divided among returning answer clauses to subgoals, checking for duplicate calls and returns, and copying information to and from tables. Of these tasks, the process of returning answers to subgoals conceptually resembles backtracking, and indeed the SLG-WAM answer-return instructions resemble WAM backtracking instructions.

As an indication of WAM efficiency for datalog operations, consider the WAM paradigm as it executes a join. In this test we consider XSB, Quintus Prolog, LDL, and CORAL, though as a point of reference we also include Sybase. Table 11 gives approximate relative times for an indexed join for the various systems. All data was in RAM, either under the control of UNIX or in the Sybase system buffer.

Quintus  1
XSB      3
LDL      8
CORAL    24
Sybase   100

Table 11: Approximate Relative Join Speeds

Sybase uses a fundamentally different paradigm than the other systems: all except Sybase are optimized for memory-resident queries, and none except Sybase has made special provisions for concurrency or recoverability. The two WAM-oriented systems are the fastest; for this problem Quintus is 3 times faster than XSB, due mostly to the fact that Quintus is written directly in assembler. These results indicate the usefulness of separating concurrency out of a query engine, as well as the advantages of using specialized (e.g. indexing) techniques for memory-resident queries.

The results also raise a deeper question about the low-level operations of the implementations. Consider the actions taken by the (SLG-)WAM in resolving a program or answer clause. After accessing the clause, the WAM unifies arguments, altering bindings in variables by trailing their addresses. At backtracking these addresses will be used to untrail the bindings. The machine operations necessary to do this are minimal.

Rule application is the analog of clause resolution in CORAL, and is summarized in [19]. A set-at-a-time technique like CORAL's must avoid unnecessary copying from and to the answer store in applying rules during a fixpoint computation. CORAL's approach uses a sophisticated algorithm that allows sharing of non-ground structures. As a result, its copying algorithm is linear in the number of variables of a structure rather than in the size of the structure. However, implementing this structure-sharing approach may introduce a constant but non-negligible overhead compared to WAM variable binding. The overheads of set-at-a-time scheduling of the application of rules to data also seem greater than WAM overheads for backtracking, at least at present. If set-at-a-time engines are to become competitive with the SLG-WAM, it seems probable that they must compile down to an instruction set of a comparable level. However, since the WAM relies fundamentally on backtracking and on copying non-ground structures, a set-at-a-time abstract machine must be invented from scratch. If this reasoning is correct, and the difference between the two systems lies in their fundamental engine operation, then low-level engine design should become a central issue for set-at-a-time research.
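To make the join comparison concrete: on the WAM side, an indexed join is nothing more than conjunctive backtracking over two relations (a schematic sketch of ours, with invented relations):

    % Join of p/2 and q/2 on the shared variable Y.  With first-argument
    % indexing on q/2, each p tuple leads directly to its matching q
    % tuples; unification binds Y and Z, and the trail undoes the
    % bindings on backtracking.
    join(X, Z) :- p(X, Y), q(Y, Z).

    p(a, 1).  p(b, 2).
    q(1, u).  q(2, v).

    % ?- join(X, Z).  enumerates (a,u) then (b,v) by backtracking.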


5 Conclusion

Judging from results in Prolog implementation, about an order of magnitude speedup appears to be possible for the SLG-WAM. A few of the more important avenues are:

- Improve indexing, especially for SLG calls and answers. The XSB development team is presently working on this, using ideas based on [2].

- Make better use of mode information, both static and dynamically obtained (for instance, detection of ground subterms in answer clauses can reduce copying).

- Introduce bottom-up rewriting techniques into the compiler to transform inefficient programs into programs more suitable for the engine. This will include both datalog-style analysis and new methods specifically designed for XSB.

- Improve the SLG-WAM engine, either by coding parts of it in assembly code as Quintus does, or by generating native code for rules.

Since it factors out redundant subcomputations, set-at-a-time evaluation has traditionally been considered more declarative than tuple-at-a-time, as well as easier to extend to secondary storage. If tuple-at-a-time is taken to mean SLD, this judgement is correct. However, we believe SLG offers the declarative advantages of set-at-a-time along with the speed of the more procedural SLD, making it natural for in-memory computations. Future research into SLG systems needs to address how such systems should handle disk-resident data. It remains open whether a system using SLG should interface with a disk-resident set-at-a-time module, or whether some species of enveloping should be developed for it instead.

Acknowledgments Most of the SLG meta-interpreter was written by Weidong Chen. We would like to thank Kostis Sagonas for his comments and for his contributions to emulator speed. We also sincerely appreciate suggestions from Raghu Ramakrishnan and S. Sudarshan that improved the quality of the XSB and CORAL comparisons.

References

[1] F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. In Proc. of SIGMOD 1986 Conf., pages 16-52. ACM, 1986.

[2] T. Chen, I.V. Ramakrishnan, and R. Ramesh. Multistage indexing algorithms for speeding Prolog execution. In Proc. of the Joint Int'l Conf. and Symp. on Logic Programming, pages 639-653. ACM, 1992.

[3] W. Chen, M. Kifer, and D.S. Warren. HiLog: A foundation for higher-order logic programming. Journal of Logic Programming, 15(3):187-230, 1993.

[4] W. Chen, T. Swift, and D.S. Warren. Efficient implementation of general logical queries. Technical report, State University of New York at Stony Brook, 1993. Submitted.

[5] W. Chen and D.S. Warren. Towards effective evaluation of general logic programs. Technical report, State University of New York at Stony Brook, 1993. Manuscript.

[6] Danette Chimenti, Ruben Gamboa, Ravi Krishnamurthy, Shamim Naqvi, Shalom Tsur, and Carlo Zaniolo. The LDL system prototype. IEEE Transactions on Knowledge and Data Engineering, 2:76-89, 1990.

[7] M. Derr, S. Morishita, and G. Phipps. Design and implementation of the Glue-Nail database system. In Proc. of the SIGMOD 1993 Conf., pages 147-156. ACM, 1993.

[8] Suzanne Dietrich. Extension Tables for Recursive Query Evaluation. PhD thesis, State University of New York at Stony Brook, 1987.

[9] P. Hsu and C. Zaniolo. A new user's impressions on LDL++ and CORAL. Technical report, ILPS'94 Workshop on Programming with Logic Databases, 1993.

[10] D. Kemp and Rodney Topor. Completeness of a top-down query evaluation procedure for stratified databases. In Logic Programming: Proc. of the Fifth Int'l Conf. and Symp., pages 178-194, 1988.

[11] J. Naughton, R. Ramakrishnan, Y. Sagiv, and J. Ullman. Argument reduction through factoring. In Proc. of the 15th Int'l Conf. on Very Large Data Bases, pages 173-182. VLDB Endowment, 1989.

[12] T.C. Przymusinski. The well-founded semantics coincides with the three-valued stable semantics. Fundamenta Informaticae, 1989.

[13] R. Ramakrishnan, D. Srivastava, and S. Sudarshan. Controlling the search in bottom-up evaluation. In Proc. of the Joint Int'l Conf. and Symp. on Logic Programming, 1992.

[14] R. Ramakrishnan, D. Srivastava, and S. Sudarshan. CORAL: Control, relations, and logic. In Proc. of the 18th Int'l Conf. on Very Large Data Bases, pages 238-249. VLDB Endowment, 1992.

[15] R. Ramakrishnan and J. Ullman. A survey of research on deductive database systems. Technical report, University of Wisconsin, 1993. Manuscript.

[16] K.A. Ross. Modular stratification and magic sets for datalog programs with negation. In ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 161-171, 1990.

[17] K. Sagonas, T. Swift, and D.S. Warren. An overview of the design and implementation of XSB, 1993.

[18] H. Seki. On the power of Alexander templates. In Proc. of the Eighth ACM SIGACT-SIGMOD-SIGART Symp. on Principles of Database Systems, pages 150-159. ACM, 1989.

[19] S. Sudarshan and R. Ramakrishnan. Optimizations of bottom-up evaluation with non-ground terms. In Proc. of the Symp. on Logic Programming, 1993.

[20] T. Swift and D.S. Warren. XSB performance measurement. Technical report, State University of New York at Stony Brook, 1993.

[21] A. van Gelder, K.A. Ross, and J.S. Schlipf. The well-founded semantics for general logic programs. Journal of the ACM, 38(3), 1991.

[22] David S. Warren. Programming the PTQ grammar in XSB. Technical report, Department of Computer Science, SUNY at Stony Brook, Oct 1993. Proc. ILPS'94 Workshop on Programming with Logic Databases.

