Dynamic Argument Reduction for In-memory Data Queries

Prasad Rao, I.V. Ramakrishnan, Terrance Swift, David S. Warren
Department of Computer Science, SUNY at Stony Brook, Stony Brook, NY 11794-4400
March 27, 1994
Abstract
Recently, tuple-at-a-time evaluation strategies have begun receiving renewed attention for in-memory data queries, due to the existence of efficient techniques for their implementation. One such technique, the SLG-WAM, forms the basis of the XSB system [9]; performance tests indicate it to be an order of magnitude faster than current deductive databases for a wide range of in-memory queries. The SLG-WAM is based on SLG resolution [5], [4], a resolution method which uses tabling to evaluate normal logic programs (footnote 1). For any tabling method, be it SLG or Magic, access to tabled subgoals and to their answers is critical for efficiency. Concern with this access time has led to optimization techniques, such as those for removing ground arguments from recursive rules. This paper presents table access methods which, combined with the backtracking of the SLG-WAM's tuple-at-a-time evaluation, allow dynamic reductions in the arguments of inferred predicates at an almost imperceptible cost compared to static methods. These access methods are based on representing tabled calls and returns as tries. While tries are a well-known data structure, they have properties of use to tabling methods beyond dynamic argument reduction. Using a trie, the existence of a term in a table can be determined in the same pass that the term is copied into the table. This paper describes the implementation of the new access methods, and of dynamic argument reduction, and provides performance results that validate their use in XSB.
1 The Table Access Problem

Deductive database systems typically use both top-down and bottom-up mechanisms to evaluate queries written in a logical language. In magic evaluation, which is set-at-a-time, the top-down component is an analysis step which transforms a program so that it can be executed by a bottom-up engine in a goal-directed manner. Some magic-based systems, [8] and [6], also provide a top-down execution mode (pipelining) which can interact with bottom-up evaluations. Deductive database systems can also be built upon tuple-at-a-time evaluation methods, such as SLG [5]. SLG allows resolution of subgoals either by program clauses, providing a top-down component, or by answer clauses from a table, providing a bottom-up component.

Footnote 1: The present implementation of the SLG-WAM is restricted to modularly stratified programs.

Recent performance evaluations of SLG [11], as implemented in the XSB system [9], indicate that XSB is an order of magnitude faster than comparable deductive database systems for in-memory queries (footnote 2). We believe that the main reason for the difference in speed is that the engine of XSB, the SLG-WAM [12], uses a paradigm, the WAM, that is more mature than the implementation paradigms available to set-at-a-time methods. Broadly speaking, the compiler for the SLG-WAM analyzes programs at a low level, at the level of clause variables and WAM registers, while the magic compilers analyze programs at a higher level using global information about modes or types. Based on our experience, the greatest efficiency gains under present technology can be made at the level of engine design.

This paper presents the results of some design experiments for reducing the time for an engine to access tabled information, as well as an important optimization technique, substitution factoring. Substitution factoring allows answers to a subgoal to be saved and loaded in time proportional to the sizes of the substitutions to variables of the subgoal, rather than in time proportional to the size of the subgoal itself.

Example 1.1 The importance of substitution factoring can most easily be seen through what is now a canonical datalog example. Substitution factoring can dynamically reduce the evaluation of the program

    a^bf(X,Y) :- p^bf(X,Y).
    a^bf(X,Y) :- a^bf(X,Z), p^bf(Z,Y).

to

    a^f(Y) :- p^bf(X,Y).
    a^f(Y) :- a^f(Z), p^bf(Z,Y).

In this notation, the string bf indicates that the first argument of the predicate is bound (instantiated), and the second is free. The annotations are provided only for explanation, and are not required by substitution factoring. Since the mode of an argument is undecidable in general, a dynamic approach is strictly more powerful than a compile-time optimization.
The table access methods we propose have other advantages as well, and to motivate them we review the modalities of table access for SLG evaluation of definite programs. These table access modalities are not essentially different from those of other tuple-at-a-time tabling methods.

Subgoal Check/Insert. When a tabled subgoal is called, a check must be made to see whether the subgoal is redundant or not. In the case of XSB, this amounts to a check of whether the new subgoal is a variant of one that exists in the table (footnote 3). If it is, the subgoal is termed active and answer clauses are resolved against the subgoal. If not, the subgoal is termed generating, it is entered into the table, and program clause resolution is used instead.

Answer Backtracking. When an active subgoal is created, it backtracks through answers in the table in the course of its evaluation.

Answer Check/Insert. When an answer is derived for a particular subgoal, a check is made to determine whether it has already been entered into the table for the subgoal. If it has, the derivation path fails, a vital step for ensuring finiteness. If not, the computation continues, and the answer is scheduled for return to the applicable active subgoals.

Footnote 2: XSB is available through anonymous ftp from cs.sunysb.edu, as are all Stony Brook tech reports referred to in this paper.

Footnote 3: In logic, one atom is a variant of another if they are the same up to variable renaming. Tabling systems may perform variant checks, as does XSB, or subsumption checks, where an atom $A_1$ subsumes an atom $A_2$ if there is some substitution $\theta$ which unifies $A_1$ and $A_2$ such that the domain of $\theta$ includes only variables of $A_1$.
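The variant test behind the subgoal check/insert step can be illustrated with a small Python sketch. It is not XSB's implementation: the term encoding (tuples for compound terms, uppercase strings for variables), the dictionary used as the table, and the names `variant_key` and `subgoal_check_insert` are all assumptions made for illustration. Variables are renamed in order of first occurrence, so two subgoals receive the same key exactly when they are variants.

```python
def is_var(t):
    # Assumption of this sketch: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def variant_key(term, names=None):
    """Rename variables in order of first occurrence, so variants get equal keys."""
    if names is None:
        names = {}
    if is_var(term):
        return ("$VAR", names.setdefault(term, len(names)))
    if isinstance(term, tuple):          # compound term: (functor, arg1, ..., argn)
        return (term[0],) + tuple(variant_key(a, names) for a in term[1:])
    return term                          # constant

def subgoal_check_insert(table, subgoal):
    """Return 'active' if a variant is already tabled; else table it and return 'generating'."""
    key = variant_key(subgoal)
    if key in table:
        return "active"
    table[key] = []                      # empty answer list for the new subgoal
    return "generating"
```

Note that this sketch builds a key and then looks it up, which is a two-pass scheme; the trie representation discussed in the paper is motivated precisely by avoiding the second pass.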
For in-memory computations, we would like tabled programs to have speed comparable with Prolog programs, an achievement which is only possible if the above three operations take place at Prolog speeds. We can thus speak of the table access problem, which tabling systems (and deductive databases) must solve if they are to run at the speed of Prolog. In the case of the subgoal check/insert step, a call to a tabled predicate must take nearly the same time as a call to a non-tabled predicate. Backtracking through answer clauses must take roughly the same time as backtracking through unit clauses. Lastly, since answers are added for each solution to a tabled predicate, the time of the answer check/insert step should be small relative to the time required to derive the answers. These requirements cannot be met for all programs, but it is critical to make the three operations as efficient as possible. It is also important that the engine modifications which enable the storage of subgoals and answers in tables be general enough not to degrade the performance of the system for any class of problems.

This paper explores a trie-based method for storing subgoals and their returns in tables. We contrast it with a hash-based method used in previous versions of XSB. Our performance results indicate that, in addition to supporting dynamic argument reduction, the trie-based storage offers two salient advantages for table storage:

1. Adaptable discrimination. When using variant checks for subgoals and answers, the trie structure provides discrimination between terms no matter where in a term the discriminating element lies. In contrast to limited-depth hashing, which may suffer if the discriminating element is deeply nested, the trie-based approach loses little time, and may actually reduce space because of greater sharing between terms.

2. Collapsed check/insert. For the subgoal and answer check/insert, a single traversal of the term is necessary, regardless of whether the term needs to be copied into the table. Given the prevalence of these operations in a tabling system, the savings in time over a two-pass operation can be substantial.

This paper first discusses a simple method for storing subgoal and answer tables as tries, and shows how tries give a collapsed check/insert operation. It then discusses the implementation of substitution factoring. Our development is informal, though we hope convincing; full justification for our claims will be provided in the final version of this paper. We present performance results which substantiate our claims, and conclude with a discussion of the relevance of this work to in-memory query optimization techniques. Because of space limitations we must assume a knowledge of the WAM [1].
2 Tries for Check/Insert Operations

Tries were originally invented to index strings, but given a fixed order of term traversal, they can be used to index terms as well, a point that has been made before in the literature (see, as one example, [3]). Example 2.1 provides an example of indexing terms using a trie.

Example 2.1 Suppose we are given the following fragment of a chemical database:

    1)  methoxymethanol(c(1),h(3)).
    2)  methoxymethanol(c(1),h(4)).
    3)  methoxymethanol(c(1),h(5)).
    4)  methoxymethanol(o(2),c(1)).
    5)  methoxymethanol(o(2),c(6)).
    6)  methoxymethanol(c(6),c(2)).
    7)  methoxymethanol(c(6),h(7)).
    8)  methoxymethanol(c(6),h(8)).
    9)  methoxymethanol(o(9),c(6)).
    10) methoxymethanol(o(9),h(10)).
Term traversal based on WAM structures must be depth-first, although the order in which the arguments are searched may be arbitrary. Given a fixed order of traversal for the methoxymethanol/2 facts, an entire fact may need to be traversed in order to discriminate it from other facts. For example, if the traversal is left-to-right through the arguments, the entire terms must be traversed to distinguish between clauses 1 and 2. If the traversal is right-to-left, then the entire terms must be traversed to distinguish between clauses 5 and 9. On the other hand, to distinguish between clauses 3 and 4, only the functor of the first argument needs to be used. Tries give a way to flexibly discriminate between facts: the trie for methoxymethanol/2 is shown in Figure 1.

[Figure 1: Methoxymethanol Represented as a Trie. The root is methoxymethanol/2; interior nodes are the functors c/1, o/1 and h/1, and the remaining nodes are their constant arguments.]

While the clauses in Example 2.1 are program clauses, tries can be used to store both subgoals and answers in a table. Algorithms to insert strings (or terms) into tries are well-known. At a detailed level, given a fixed term traversal, such an algorithm traverses the trie in matching mode as long as the current component of the term matches a child of the current node of the trie. If the algorithm remains in matching mode when the term is exhausted, it reports a match. Otherwise, if the traversal finds a component that does not match, the algorithm begins inserting the remaining elements, and reports no match when the traversal terminates. The answer check/insert and subgoal check/insert steps of tabling systems both require a check, followed by an insert if the term is not present. Since the trie algorithm switches from matching to inserting at the first mismatch, the two steps share one traversal. This reasoning leads to a useful result.

Proposition 2.1 For the subgoal and answer check/insert steps, each element of the term is examined only once.

In the trie insertion algorithm, a simple index check is made to see if the next element of the term exists as a child of the current node of the trie. Since it performs a variant check, the algorithm can treat each unique variable as a unique constant, and is equivalent to indexing a ground term. Figure 2 shows the trie for Example 2.1 as an answer trie for the call ?- methoxymethanol(X,Y).
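The collapsed check/insert described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the SLG-WAM code: terms are encoded as tuples `(functor, arg1, ..., argn)`, children are kept in a dictionary, and the names `TrieNode`, `symbols` and `check_insert` are hypothetical. The term is traversed lazily, depth-first and left-to-right, so each element is examined once whether or not the term turns out to be new.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # symbol -> TrieNode

def symbols(term):
    """Lazily yield the symbols of a ground term, depth-first and left-to-right."""
    if isinstance(term, tuple):                 # compound term: (functor, arg1, ..., argn)
        yield (term[0], len(term) - 1)          # functor/arity symbol, e.g. ("c", 1)
        for arg in term[1:]:
            yield from symbols(arg)
    else:
        yield term                              # constant

def check_insert(root, term):
    """Return True if term was already in the trie; otherwise insert it and return False."""
    node, found = root, True
    for symbol in symbols(term):
        child = node.children.get(symbol)
        if child is None:                       # first mismatch: switch to inserting mode
            found = False
            child = node.children[symbol] = TrieNode()
        node = child
    return found
```

Because a functor symbol carries its arity, no stored term can be a proper prefix of another, so remaining in matching mode until the term is exhausted is enough to report a match.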
The links from nodes to their children are used to traverse the trie in the check/insert operations. The tries also contain a link from each node to its sibling, if any. In version 1.4 of XSB, the insertion algorithm searches sequentially through the list of sibling links until the list reaches a predefined size, at which time it reconfigures the list into a hash table. Each path in the trie corresponds to an answer (or subgoal). There is also a link from each node to its parent, for loading answer clauses, as well as a sequential list of leaf nodes, to support backtracking through answer clauses. Subgoal tries have much the same structure as answer tries, although the sequential list of leaf nodes is not necessary for them, because for subgoals there is no operation equivalent to backtracking through answers.

[Figure 2: Methoxymethanol as Tabled Answer Clauses. The answer trie is linked to its call structure (Call Struct).]
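The node layout described above (sibling lists that are reconfigured into a hash table, parent links, and a list of leaves) might be sketched in Python as below. The threshold value, the class names `Node` and `AnswerTrie`, and the use of a Python dictionary as the hash table are assumptions of this sketch; the text says only that the list is converted at "a predefined size".

```python
HASH_THRESHOLD = 4   # assumed value; the paper only says "a predefined size"

class Node:
    def __init__(self, symbol, parent):
        self.symbol = symbol
        self.parent = parent        # used to rebuild an answer from its leaf
        self.siblings = []          # children kept as a sequential list at first
        self.index = None           # becomes a dict once the list grows large

    def child(self, symbol):
        """Find or create the child for symbol; returns (node, was_present)."""
        if self.index is not None:
            node = self.index.get(symbol)
        else:
            node = next((n for n in self.siblings if n.symbol == symbol), None)
        if node is not None:
            return node, True
        node = Node(symbol, self)
        self.siblings.append(node)
        if self.index is not None:
            self.index[symbol] = node
        elif len(self.siblings) > HASH_THRESHOLD:
            self.index = {n.symbol: n for n in self.siblings}
        return node, False

class AnswerTrie:
    def __init__(self):
        self.root = Node(None, None)
        self.leaves = []            # sequential list of leaves, for backtracking

    def check_insert(self, symbols):
        node, present = self.root, True
        for s in symbols:
            node, was_there = node.child(s)
            present = present and was_there
        if not present:
            self.leaves.append(node)
        return present

    def answers(self):
        """Yield each stored answer by following parent links from its leaf."""
        for leaf in self.leaves:
            path, node = [], leaf
            while node.parent is not None:
                path.append(node.symbol)
                node = node.parent
            yield path[::-1]
```

Iterating over `answers()` plays the role of backtracking through answer clauses: the leaf list fixes the order, and the parent links recover each answer without searching the trie.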
3 Substitution Factoring

In the SLG-WAM, answers are always associated with a particular subgoal (footnote 4). Given a call, A, any answer for A must be subsumed by A, and can be represented as $A\theta$ for some substitution $\theta$. The core idea of substitution factoring is to store only the substitution for each answer, and to create a mechanism of returning answers to active subgoals whose cost is proportional to the size of $\theta$ rather than the size of A. In the language of tabling, substitution factoring ensures that answer tries contain no information found in their associated subgoal tries.

Footnote 4: From a set-at-a-time point of view, and for definite programs, this appears equivalent to MGU Magic Templates [10].

In order to explain the implementation of substitution factoring, we consider first the creation of a (SLG-)WAM choice point for a non-tabled program clause. The WAM creates a choice point by copying the program registers at the time of the call, including registers representing each argument of the goal. For tabled subgoals, the entire subgoal needs to be traversed, either to determine whether it is redundant, or to copy it into the table if it is not. During the term traversal, a set of free variables can be obtained. If the subgoal is non-redundant, a (generator) choice point is created which will backtrack through program clauses. However, instead of copying arguments of the subgoal into the choice point, only the free variables of the call are copied. If the subgoal is redundant, program clause resolution is not used. Instead, an active subgoal frame is created, which serves as an environment into which answers can be returned. Like the generator choice points, these active subgoal frames contain only the free variables of the subgoal. In either case, bindings to the dereferenced values of these variables are trailed through forward execution, whether it is program clause or answer clause resolution. The values are untrailed through backtracking just as argument cells would be in WAM execution. When an answer to a tabled subgoal is detected, the dereferenced values of the variable cells are copied from the generator choice point directly into the table, and can later be loaded directly into the active subgoal frames as a means of answer return.

Substitution factoring adds a slight overhead to subgoal check/insert due to its more sophisticated handling of variables. From the last section it is apparent that when a tabled subgoal is executed, it must be fully traversed in all cases, since it must either be copied into the table completely, or must be fully checked for redundancy against other subgoals in the table. During this term traversal, pointers to the variables must be copied into the call structure, if the subgoal is new to the evaluation, or into the active subgoal frame, if it is redundant.
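The effect of substitution factoring on what is stored can be sketched in Python. This is only a model under stated assumptions: in the SLG-WAM the bindings are read directly from the variable cells of the generator choice point, whereas this sketch recovers them by one-way matching of an answer against its call. The term encoding (tuples for compound terms, uppercase strings for variables) and the names `free_variables`, `match` and `factored_answer` are hypothetical.

```python
def is_var(t):
    # Sketch convention: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()

def free_variables(term, seen=None):
    """Variables of the call, in order of first occurrence in a depth-first traversal."""
    if seen is None:
        seen = []
    if is_var(term):
        if term not in seen:
            seen.append(term)
    elif isinstance(term, tuple):               # compound term: (functor, arg1, ..., argn)
        for arg in term[1:]:
            free_variables(arg, seen)
    return seen

def match(call, answer, binding):
    """One-way matching of a ground answer against its call; fills binding."""
    if is_var(call):
        return binding.setdefault(call, answer) == answer
    if isinstance(call, tuple):
        return (isinstance(answer, tuple) and len(call) == len(answer)
                and call[0] == answer[0]
                and all(match(c, a, binding) for c, a in zip(call[1:], answer[1:])))
    return call == answer

def factored_answer(call, answer):
    """Return only the bindings of the call's variables: all that the table must store."""
    binding = {}
    if not match(call, answer, binding):
        raise ValueError("answer is not an instance of the call")
    return tuple(binding[v] for v in free_variables(call))
```

For the call `("a", 1, "Y")` and the answer `("a", 1, 2)`, only the tuple `(2,)` would be placed in the answer trie; the constant 1 lives solely in the subgoal trie.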
Because the active subgoal frames and generator choice points contain only variable cells, saving and loading of answers is proportional to the number of free arguments of the call in the case of datalog. However, substitution factoring is more powerful than argument reduction for datalog. Figure 3 shows a trie using substitution factoring for the calls methoxymethanol(c(X),c(Y)) and methoxymethanol(o(X),c(Y)). For each answer to these calls, two elements need to be checked or inserted rather than the four that would need to be checked or inserted without substitution factoring. For instance, the answer methoxymethanol(o(2),c(1)) need only check or copy the 2 and the 1, rather than o(2) and c(1). In fact, the cost of answer check/insert, and of answer backtracking, is proportional to the size of the answer substitution, a simple concept whose meaning we make precise.

Definition 3.1 Let $A_1$ and $A_2$ be two terms, standardized apart, such that $A_1$ subsumes $A_2$. Then an answer substitution $\{v_1 \leftarrow t_1, \ldots, v_n \leftarrow t_n\}$ of $A_1$ and $A_2$ is the most general unifier of $A_1$ and $A_2$. The size of an answer substitution is the sum of the sizes of the terms $t_1, \ldots, t_n$, where the size of a term is defined recursively as follows. The size of a variable or a constant is 1. The size of a compound term $f(t_1, \ldots, t_n)$ is one plus the sum of the sizes of its arguments.

Clearly, the size of an answer substitution of a subgoal S and answer A is less than or equal to the size of the answer A. For the program in Example 1.1, an actual form of the subgoal for a^bf(X,Y) might be a(1,Y), while the form of an answer might be a(1,2). If so, the answer substitution would be $\{Y \leftarrow 2\}$, and substitution factoring will store only the value 2 (and associated pointers) rather than the pair (1,2). Given a precise definition of the algorithm presented informally in this section, the following simple statement can be proven.

Proposition 3.1 Let S be a subgoal for which substitution factoring is used, and A a given answer. Then both answer check/insert and answer backtracking can be performed in cost proportional to the size of the answer substitution of S and A.
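The size measure of Definition 3.1 translates directly into a short recursive function. The Python sketch below assumes terms are encoded as tuples `(functor, arg1, ..., argn)` and a substitution as a dictionary from variable names to terms; both encodings and the function names are choices made for illustration only.

```python
def term_size(t):
    """Size per Definition 3.1: 1 for a variable or constant, 1 + argument sizes otherwise."""
    if isinstance(t, tuple):                 # compound term: (functor, arg1, ..., argn)
        return 1 + sum(term_size(a) for a in t[1:])
    return 1

def substitution_size(subst):
    """Sum of the sizes of the terms t1..tn in a substitution {v1: t1, ..., vn: tn}."""
    return sum(term_size(t) for t in subst.values())
```

For the subgoal a(1,Y) and the answer a(1,2), the answer has size 3 while the answer substitution binding Y to 2 has size 1, which is the quantity that Proposition 3.1 bounds the cost by.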
[Figure 3: Call factoring for methoxymethanol. The subgoal trie under methoxymethanol/2 holds the paths c/1 var1, c/1 var2 and o/1 var1, c/1 var2; each path ends in a call structure (Call Struct) whose answer trie stores only the bindings of var1 and var2.]

In terms of related work, substitution factoring bears a certain resemblance to the factoring of [7] (here termed NRSU factoring) in that both reduce the number of arguments copied into or out of a table. However, since substitution factoring is dynamic rather than static, as NRSU factoring is, it has different characteristics. Whether a predicate is NRSU factorable is undecidable, so substitution factoring may reduce arguments not reduced by NRSU factoring. Further, substitution factoring is defined for non-datalog programs. On the other hand, [7] introduces additional optimizations based on the factored program which are not performed by substitution factoring. These optimizations can transform certain right and double recursions into left recursions, an important transformation not done by substitution factoring.
4 Performance Results

Version 1.4 of XSB supports trie-based table structures along with the hash-based table structures that formed the basis of earlier versions. Performance studies of the earlier versions have indicated XSB to be an order of magnitude faster than other deductive databases for in-memory datalog queries [9].

The hash-based table structures have a simple form. Each tabled predicate has its own subgoal hash table. For the subgoal check/insert step, the subgoal is hashed and compared, using a variant check, against any other subgoals in the hash bucket. If the subgoal is not present, it is copied into table space, and the hash chain list is updated. Each subgoal has its own answer hash table, which resembles the subgoal hash tables in its essential details, and which requires a variant check in the case of hash collisions. Subgoals are hashed on the constant or outer functor symbol of their first argument, while answers are hashed on the combination of the constant or outer functor symbols of all their arguments.

The hash technique consists of a quick insert but a slow check if hash collisions occur. The trie technique has a slower insert than the hash (it must set parent and sibling pointers), but it combines the check and insert, and may need to copy less information for answers due to substitution factoring. All times in this section were for 25 iterations of a given query. The sizes of the data sets are not intended to reflect any limitation in XSB.

The first set of tests uses standard datalog transitive closures. Both left recursion

    a(X,Y) :- p(X,Y).
    a(X,Y) :- a(X,Z), p(Z,Y).

and right recursion

    a(X,Y) :- p(X,Y).
    a(X,Y) :- p(X,Z), a(Z,Y).

were tested, each with a query of the form ?- a(b,f). For the data structures, a datalog tree and chain were used:

    edge(1,2), edge(1,3), ..., edge($2^n - 1$, $2^{n+1} - 1$).
    edge(1,2), edge(2,3), ..., edge(n-1,n).
In addition, the tree and chain were also tested after being nested in a small structure as follows:

    edge(g(1),g(2)), edge(g(1),g(3)), ..., edge(g($2^n - 1$),g($2^{n+1} - 1$)).
    edge(g(1),g(2)), edge(g(2),g(3)), ..., edge(g(n-1),g(n)).
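For readers who wish to reproduce data sets of this shape, the following Python sketch generates the edge relations. It rests on assumptions not stated in the paper: that the tree is a complete binary tree in which node k has children 2k and 2k+1, that the parameter n counts the levels of parent nodes, and that the chain links each node to its successor. The function names are hypothetical.

```python
def tree_edges(n):
    """Complete binary tree: node k has children 2k and 2k+1, for k = 1 .. 2**n - 1."""
    return [(k, c) for k in range(1, 2 ** n) for c in (2 * k, 2 * k + 1)]

def chain_edges(n):
    """Chain of n nodes: edge(1,2), edge(2,3), ..., edge(n-1,n)."""
    return [(k, k + 1) for k in range(1, n)]

def nest(edges, functor="g"):
    """Wrap both arguments of every edge in a unary structure, e.g. edge(g(1),g(2))."""
    return [((functor, a), (functor, b)) for a, b in edges]
```

Under these assumptions the last tree edge joins node 2**n - 1 to node 2**(n+1) - 1, and `nest(chain_edges(n))` gives the g/1 variant of the chain.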
For the examples with structures, XSB transformational indexing was used to compile the EDB. Transformational indexing processes the edge/2 clauses into a non-deterministic net which, in this case, gives perfect indexing. As can be seen from the tables, the trie structure of the tables allows an equivalent adaptation, so that the tables are indexed on constants within function symbols, an adaptation not made by the hash technique. The times for the trees were

    Depth  Trie-L  Hash-L  Trie-L-g/1  Hash-L-g/1  Trie-R  Hash-R  Trie-R-g/1  Hash-R-g/1
    3      .011    .01     .03         .069        .05     .1      0.05        .15
    4      .019    .021    .039        .1          .078    .20     .09         .34
    5      .049    .031    .07         .27         .16     .42     .4          1.0
    6      .08     .06     .13         1.09        .39     .86     .39         3.31
    7      .12     .23     .24         3.96        .73     2.19    .91         12.43
    8      .309    .45     .49         15.5        1.7     5.01    1.89        47.53
    9      .479    .62     .84         64.2        3.3     10.29   3.8         187.06
    10     .87     .92     1.83        258.4       ...     ...     ...         ...

while the times for the chains were

    Length  Trie-L  Hash-L  Trie-L-g/1  Hash-L-g/1  Trie-R  Hash-R  Trie-R-g/1  Hash-R-g/1
    32      .021    .01     .031        .089        .289    .26     .42         .99
    64      .04     .029    .04         .279        .79     .689    .95         6.1
    128     .06     .051    .08         1.01        2.31    2.05    2.9         46.52
    256     .109    .11     .16         3.9         7.78    7.75    9.48        399.9
    512     .199    .24     .29         15.32       28.38   30.59   58.89       ...
    1024    .36     .43     .54         61.48       ...     ...     ...         ...
    2048    .731    .84     1.09        ...         ...     ...     ...         ...
    4096    1.56    1.70    2.319       ...         ...     ...     ...         ...
    8192    3.39    3.64    5.0         ...         ...     ...     ...         ...
Several points can be made about these examples. For the datalog examples, the times are generally similar, especially for left recursion, as long as the hash method has hash tables set to an appropriate size. However, as soon as discriminating information is nested within structures, the times for hashing and for tries diverge widely, and the tries are radically more efficient for tabling the structured data of this example.

As a test of substitution factoring, a program of the form

    a(Y) :- query(X), p(X,Y).
    a(Y) :- a(X), p(X,Y).

was tested in the datalog chain against the left recursive form above. As can be seen from the table below, the times were nearly indistinguishable.

    Length  32    64    128   256   512   1024  2048  4096  8192
    a/2     .21   .036  .062  .11   .19   .35   .74   1.48  3.43
    a/1     .21   .035  .059  .10   .18   .35   .74   1.49  3.53

Further proof of the adaptability of the trie technique is obtained by benchmarking variants of the same generation program.

    sg1(X,X).
    sg1(X,Y) :- p(X,Xp), sg1(Xp,Yp), p(Y,Yp).

    sg2(X,X).
    sg2(X,Y) :- p(X,Xp), p(Y,Yp), sg2(Xp,Yp).

The parent relation in this case is a 24x24x2 randomly generated cylinder [2]. The 24x24x2 cylinder can be thought of as an array of 24x24 nodes, where each of the nodes in each row (except the last) is connected to two elements in the next higher row. Same generation queries were made to elements on the bottom row of the cylinder for each program.

    program  sg1/2  sg2/2
    trie     0.15   0.51
    hash     0.16   24.46

In this case, the poor literal ordering of sg2/2 had little effect on the trie access method, compared to the hash method. The degeneration in the hash times appears to be due to the large number of calls to sg^bb/2. While the times for the hash access method could be improved for this example by hashing on the outer functor of both arguments rather than on the first argument only, the trie access method adapts itself automatically for the subgoal check/insert operation on this example, just as it did for the answer check/insert for the chain and tree nested in the g/1 structures.
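The collision behaviour that these timings suggest can be illustrated with a Python sketch of hashing subgoals on the outer symbol of the first argument only. The bucket function is a stand-in chosen for illustration and is not XSB's actual hash function; the tuple encoding of terms and the names `outer_symbol` and `bucket_sizes` are likewise assumptions.

```python
from collections import defaultdict

def outer_symbol(arg):
    """The constant itself, or functor/arity for a compound term (functor, arg1, ...)."""
    return (arg[0], len(arg) - 1) if isinstance(arg, tuple) else arg

def bucket_sizes(subgoals):
    """Bucket subgoals by predicate and outer symbol of the first argument only."""
    buckets = defaultdict(list)
    for goal in subgoals:
        buckets[(goal[0], outer_symbol(goal[1]))].append(goal)
    return sorted(len(b) for b in buckets.values())
```

With eight flat calls such as `("a", k, "Y")` every bucket holds one subgoal, but wrapping the first argument as `("g", k)`, or fixing it and varying the second argument as in the sg^bb/2 calls, places all eight in a single bucket, so each check degenerates into a chain of variant comparisons. A trie keeps discriminating on later symbols and so avoids this.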
5 Conclusions

The trie-based approach which we present has been shown to have useful properties in its adaptability to the form of the data tabled, and in its collapsed check/insert operation. Perhaps more surprising is its extendibility to substitution factoring, which provides dynamic argument reduction, and, indeed, reductions within complex terms. Two aspects of our approach made this possible: the backtracking we use for generating answers, and our association of answers with specific subgoals.

The present paper is a snapshot of work in progress, and we are pursuing two lines of research with respect to table access methods. The first is to integrate the indexing of our table access methods with that of program clauses, so that in a tabled evaluation, no subterm in a subgoal needs to be examined more than once. The second is to formalize the tabling access techniques more thoroughly. We believe that the advantages of the trie structure are not accidental. Every tabling program can be viewed as a partial evaluation technique which produces answer clauses for a program as specialized for a particular query (footnote 5). The trie structures for answer clauses appear to have a close relation to programs transformed by transformational indexing, a source code transformation developed for XSB. If so, then the table access methods discussed in this paper are equivalent to performing transformational indexing on the tabled partial evaluation of a program, and so can be made to fit cleanly into a theory of program transformations.
Footnote 5: For modularly stratified programs, the evaluation is total. In SLG, answer clauses may contain unresolved dependencies for non-stratified programs. It can be proven [5] that all models of the partially evaluated result are partial stable models of the original program.

References

[1] Hassan Ait-Kaci. The WAM: a (real) tutorial. Technical Report 5, DEC Paris Research Report, 1990.
[2] F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. In Proc. of SIGMOD 1986 Conf., pages 16-52. ACM, 1986.
[3] T. Chen, I.V. Ramakrishnan, and R. Ramesh. Multistage indexing algorithms for speeding Prolog execution. In Proc. of the Int'l Conf. and Symp. on Logic Programming, pages 639-653, 1992.
[4] W. Chen, T. Swift, and D.S. Warren. Efficient implementation of general logical queries. Technical report, State University of New York at Stony Brook, 1993. Submitted.
[5] Weidong Chen and David S. Warren. Query evaluation under the well-founded semantics. In Proceedings of the Twelfth Symposium on Principles of Database Systems. ACM, 1993.
[6] D. Chimenti, R. Gamboa, R. Krishnamurthy, S. Naqvi, S. Tsur, and C. Zaniolo. The LDL system prototype. IEEE Transactions on Knowledge and Data Engineering, 2:76-89, 1990.
[7] J. Naughton, R. Ramakrishnan, Y. Sagiv, and J. Ullman. Argument reduction through factoring. In Proc. of the 15th Int'l Conf. on Very Large Data Bases, pages 173-182. VLDB Endowment, 1989.
[8] R. Ramakrishnan, D. Srivastava, and S. Sudarshan. CORAL: Control, relations, and logic. In Proc. of the 18th Int'l Conf. on Very Large Data Bases, pages 238-249. VLDB Endowment, 1992.
[9] K. Sagonas, T. Swift, and D.S. Warren. XSB as an efficient deductive database engine. In Proc. of SIGMOD 1994 Conf. ACM, 1994.
[10] S. Sudarshan. Optimizing Bottom-up Query Evaluation for Deductive Databases. PhD thesis, University of Wisconsin, 1992.
[11] T. Swift and D. S. Warren. Performance of sequential SLG evaluation. Technical report, State University of New York at Stony Brook, 1993. Submitted.
[12] T. Swift and D. S. Warren. An abstract machine for SLG resolution. Technical report, State University of New York at Stony Brook, 1994. Submitted.