Scheduling Strategies for Evaluation of Recursive Queries over Memory and Disk-Resident Data

A Dissertation Presented by

Juliana Freire de Lima e Silva

to the Graduate School in partial fulfillment of the requirements for the Degree of Doctor of Philosophy in Computer Science

State University of New York at Stony Brook

December 1997

© Copyright 1997
by Juliana Freire de Lima e Silva

State University of New York at Stony Brook
The Graduate School

Juliana Freire de Lima e Silva

We, the dissertation committee for the above candidate for the Doctor of Philosophy degree, hereby recommend acceptance of this dissertation.

David S. Warren, Dissertation Director
Professor, Computer Science

I.V. Ramakrishnan, Committee Chair
Professor, Computer Science

Terrance Swift
Research Professor, Computer Science

Vítor Santos Costa
Professor, Computer Science, University of Porto, Portugal

Approved for the University Committee on Graduate Studies:

Dean of Graduate Studies & Research

Abstract of the Dissertation

Scheduling Strategies for Evaluation of Recursive Queries over Memory and Disk-Resident Data

by

Juliana Freire de Lima e Silva

Doctor of Philosophy in Computer Science
State University of New York at Stony Brook
1997

Tabling extends the power of logic programming languages such as Prolog by avoiding redundant computation and guaranteeing termination for programs with finite models. In this dissertation, I investigate alternative scheduling strategies to improve the performance of tabled evaluation of recursive queries. I address the inability of resolution-based systems to deal with data-intensive applications by proposing a set-at-a-time scheduling strategy that is able to evaluate in-memory recursive queries at the best in-memory speeds, while queries to disk have the same access patterns as the best set-at-a-time methods. I also develop other strategies that improve the performance of in-memory queries. One such strategy, Local Scheduling, can arbitrarily improve the performance of some programs that make use of answer subsumption (e.g., aggregate queries). Finally, I argue that there is no single best scheduling strategy: whereas a strategy can achieve very good performance for certain applications, for others it might add overheads and even lead to unacceptable inefficiency. To address this problem, I propose combining multiple strategies in an evaluation as a means of achieving the best possible performance. The scheduling strategies proposed in this dissertation form the foundation of a framework that contains efficient procedural and data retrieval components, both using the language of first-order logic. We believe this framework is a strong candidate to form a computational basis for combining the fields of logic programming and databases.

To Claudio

Contents

List of Figures
List of Tables
Acknowledgements

1  Introduction

2  Overview of Tabling
   2.1  Tabling in a Nutshell
   2.2  SLG Resolution
   2.3  The SLG-WAM: A Virtual Machine for Tabling

3  Single Stack Scheduling

4  Batched Scheduling
   4.1  Motivation
   4.2  Batching the Return of Answers
   4.3  Implementation
   4.4  Experimental Results

5  Local Scheduling
   5.1  Motivation
   5.2  Evaluating one SCC at a Time
   5.3  Using Local Scheduling to Compute the Well-Founded Model of Normal Programs
   5.4  Implementation
   5.5  Experimental Results

6  Set-at-a-Time Scheduling
   6.1  Motivation
   6.2  Semi-Naive Evaluation and Magic Rewriting
   6.3  Breadth-First SLG: A Fair Strategy
   6.4  Tabling and Magic: Iteration Equivalences
   6.5  Implementation
   6.6  A Set-at-a-time Database Interface
   6.7  Experimental Results

7  Combining Scheduling Strategies
   7.1  Motivation
   7.2  Controlling the Search in Tabled Evaluations
   7.3  Implementation
   7.4  Experimental Results

8  Scheduling Tabled Resolution Revisited
   8.1  Motivation
   8.2  SLGsched: Scheduling in SLG
   8.3  Defining Scheduling Strategies
        8.3.1  Single Stack Scheduling
        8.3.2  Batched Scheduling

9  Related Work

10 Conclusions and Future Work

Bibliography

Appendix
A  Benchmarks: Definite Programs
B  Benchmarks: Normal Programs
C  Proofs of Chapter 6
D  Proofs of Chapter 8

List of Figures

1   An infinite SLD tree
2   A finite forest for tabled evaluation
3   SLG evaluation of a program with negation
4   The completion stack and its approximation of SCCs
5   Exact subgoal dependency graph
6   Snapshots of the choice point stack under Single Stack Scheduling
7   SLG evaluation under Batched Scheduling
8   Snapshots of the choice point stack under Batched Scheduling
9   Times for left-recursive transitive closure on chains and trees of varying size (Single Stack vs. Batched)
10  Total memory usage for left-recursive transitive closure (Single Stack vs. Batched)
11  Times to compute the shortest path between 1 and n for varying n (Single Stack vs. Batched)
12  Times to evaluate shortest-path queries on variations of the Words graph (Single Stack vs. Batched)
13  Ancestor relation
14  SLG evaluation under Local Scheduling
15  Subgoal dependency graphs under different search strategies
16  A subgoal dependency graph
17  Completion stack under Batched and Local Scheduling
18  Times for left-recursive transitive closure (Batched vs. Local)
19  Times for the query subsumes(min)(sgi(n-1,n),I) for varying n (Batched vs. Local)
20  Total memory usage for left-recursive transitive closure (Single Stack vs. Batched vs. Local)
21  Times to evaluate shortest-path queries on variations of the Words graph (Single Stack vs. Batched vs. Local)
22  Times to compute the shortest path between 1 and n for varying n (Single Stack vs. Batched vs. Local)
23  Running times for nocycle(X,Y) on different graphs of varying size (Batched vs. Local)
24  Running times for win(X) on different graphs of varying size (Batched vs. Local)
25  SLG evaluation under Breadth-First Scheduling
26  CPU times to compute different shortest-path queries (Single Stack vs. Breadth-First)
27  Times to find all words reachable from "words" in Words1000, Words2000, Words3000 and Words stored in Oracle
28  Overheads for XSB-Integ for left-recursive transitive closure on chains and trees of varying size
29  Times to evaluate subsumes(min)(sgi(n-1,n),I) for varying n (Batched vs. Integ)
30  Scheduling sequences for Single Stack Scheduling
31  SLG system for Single Stack Scheduling
32  SLG system for Batched Scheduling
33  Scheduling sequences for Batched Scheduling
34  Part of a tabled evaluation

List of Tables

1   SLG-WAM code sample
2   SLG-WAM execution profile for left-recursive transitive closure on a linear chain with 1024 nodes (Single Stack vs. Batched)
3   SLG-WAM execution profile for left-recursive transitive closure on a complete binary tree of height 9 (Single Stack vs. Batched)
4   SLG-WAM execution profile for left-recursive transitive closure on a linear chain with 1024 nodes (Batched vs. Local)
5   SLG-WAM execution profile for left-recursive transitive closure on a complete binary tree of height 9 (Batched vs. Local)
6   Running times (in seconds) for some programs with negation (Batched vs. Local)
7   Memory usage (in bytes) for programs with negation (Batched vs. Local)
8   Normalized times for left-recursive transitive closure on linear chains and complete binary trees
9   Normalized times for left-recursive transitive closure for various graphs
10  Normalized times for same-generation on a 24x24 cylinder
11  Normalized times for left-recursive transitive closure on linear chains and binary trees using the Breadth-First engine and a variation using the consuming node optimization
12  Memory utilization for left and right-recursive transitive closure
13  Advantages and disadvantages of various scheduling strategies
14  XSB-Integ can be more efficient than either Batched or Local

Acknowledgments

This dissertation would not have become reality if it were not for the various people who directly and indirectly contributed to it. First of all, I would like to thank my advisor, Professor David S. Warren, for an endless supply of ideas, suggestions, and criticisms. I learned a lot from him. David is a wonderful person to work with, and he is one of the main reasons my years at Stony Brook were so enjoyable and my experience so fulfilling. I am deeply grateful to Professor I.V. Ramakrishnan; he is a great mentor and his enthusiasm is contagious. I thank Professor Anita Wasilewska for being a mentor and a very good friend; Professor Richard Larson, for his support and encouragement; and Dr. Terrance Swift, for helping me learn the intricacies of the SLG-WAM and write papers about it. I am indebted to the members of my committees, Professors David Warren, I.V. Ramakrishnan, Vítor Santos Costa, Terrance Swift, Michael Kifer, Phil Lewis, and Leo Bachmair, for their time and useful criticisms. Many thanks to the computer science staff, in particular Kathy Germana, Betty Knittweis, Peggy Thomas, Pat Saccaro, Brian Tria, and Anne Kilarjian, who made my life at Stony Brook so much easier.

I would like to thank the people who directly contributed to the XSB project and made this work possible: Baoqiu Cui, Steven Dawson, Hasan Davulcu, Rui Hu, Ernie Johnson, Rui Marques, C.R. Ramakrishnan, I.V. Ramakrishnan, Prasad Rao, Abhik Roychoudry, Kostis Sagonas, Terrance Swift, and David Warren. Thanks also to the many friends I made while at Stony Brook, besides the members of the XSB group, for their support and the good times we spent together: Ivan Almeida, Karen Bernstein, Rui Chiou, Mauricio Cortes, Steve Dawson, Patricia Gomez, David Gerstl, Daren Krebsbach, Pedro Souto, and Michael Vernick. I thank Eduardo Prado and Renata Grunberg for being very good friends, and Luciana and Ricardo for their love. Special thanks to Seu Checo, D. Maze, Tita, and Maria: they have been wonderful role models who inspired and supported me throughout my life. Most of all, I thank my husband, Claudio, who helped me throughout my graduate studies, "in both technical and non-technical matters", as he put it.

Chapter 1

Introduction

Much of the success of the relational database model can be attributed to the declarativeness of its query language, SQL. Unfortunately, SQL is not expressive enough: there are many useful queries (e.g., queries that involve recursion) that cannot be expressed in this language. Usually, when one wants to reason about the contents of a database, it is necessary to leave the relational model by embedding SQL into a lower-level language such as C, which results in an impedance mismatch (a mismatch between the data manipulation language and the host programming language) and a consequent loss of declarativeness. Deductive databases [GM78, GMN81, GMN84a, GMN84b, Min88] address this problem by adopting logic programming [Kow74, Llo84] or a restriction such as Datalog [Ull89a] as the query language. Prolog engines have been used to evaluate Datalog queries (see e.g., [CGT90]), but they have proven to be unacceptable for data-oriented queries for two major reasons: their poor termination and complexity properties for Datalog, and their tuple-at-a-time


strategy. Pure bottom-up algorithms (e.g., the semi-naive algorithm [Ull89a]), on the other hand, do terminate for Datalog programs, but because they are not goal-directed, they may generate superfluous answers and result in unacceptable inefficiency. The problem of evaluating recursive (Datalog) queries has been extensively studied and a number of techniques have been proposed (see e.g., [BR86], [Vie89], [CW96]). Recently, two approaches have gained special attention: (1) magic sets [BMSU86, Sek89, BR91, Ram91], which add goal-directedness to bottom-up evaluation, and (2) tabling (or memoization) [CW96, TS86, Vie89, BD93], which adds features of database evaluation to logic programming languages. These two approaches resemble each other in that they combine top-down goal orientation with bottom-up redundancy checking. In fact, under certain assumptions they have been proved to be asymptotically equivalent [Sek89]. Despite these well-known equivalences, magic-style systems have traditionally differed from tabling systems. Magic-style systems, such as LDL [CGK+90], CORAL [RSS92b], Glue-Nail [DMP93], and Aditi [VRK+94], are built upon set-at-a-time engines, and can use set-at-a-time operations like relational joins that may be made efficient for disk-resident data, while tabling systems, such as XSB [SSW94], use a tuple-at-a-time strategy that reflects their genesis in the logic programming community. Presently, for in-memory Datalog queries, due mainly to the use of Prolog compilation technology, the fastest tabling systems show an order of magnitude speedup over magic-style systems [SSW94]. However, because of their tuple-at-a-time strategy, these tabling systems cannot be efficiently extended to disk, and thus they are not


practical for applications that deal with massive amounts of data. Nonetheless, tabling has a number of advantages over magic besides in-memory performance. Tabled evaluations such as SLG [CW96] are well-suited (and efficient) for computing the well-founded model [vRS91] of normal programs with negation; they handle variables in a natural way; they provide a clear connection between selected atoms and computed answers (and, unlike the magic approach, do not require additional join operations to re-establish this connection); their tuple-at-a-time strategy is efficient for computing existential queries; and they can make use of Prolog's compilation technology. A question then arises as to whether it is possible to integrate set-at-a-time processing into a tabled evaluation. Tabled evaluations ensure termination of programs with finite models by keeping track of which subgoals (or subqueries) have been called. Given several variant subgoals in an evaluation, only the first (the generator) will use program clause resolution; the rest (consumers) must perform answer resolution using answers computed by the original invocation. This use of answer resolution prevents the possibility of infinite looping for Datalog programs, which sometimes occurs in SLD. A tabled evaluation can then be seen as a set of generator subgoals producing answers which are asynchronously consumed by consumer subgoals. Tabling systems thus face an important scheduling choice not present in traditional top-down evaluation: when to return answers to consumers. Different scheduling strategies (and, by implication, searches) can be formulated. In particular, a set-at-a-time strategy that is iteration equivalent to the semi-naive evaluation of a magic-transformed program (i.e., one that at each iteration generates the same calls and answers) was proposed in [FSW97b]. In


Chapter 6, we describe this strategy and its implementation, and give experimental results which demonstrate that this set-at-a-time tabled evaluation can evaluate in-memory queries at the best in-memory speeds, while queries to disk have the same access patterns as the best set-at-a-time methods. Because tabling systems can evaluate recursive queries at Prolog speed [SW94b], besides the evaluation of deductive database queries they have proven useful for a number of other fixpoint-style problems, such as program analysis [DRW96, JBD95, CDS96], compiler optimization [DRSS96], and model checking [RRR+97]. Ensuring that these diverse applications run efficiently may require different scheduling strategies, and an important question arises: how does the order of returning answers to consuming nodes affect program efficiency? In this dissertation, we investigate alternative scheduling strategies for tabled evaluations and study their performance characteristics. The original implementation of SLG, the SLG-WAM [SW94a], had a simple mechanism for scheduling answer resolution which was expensive in terms of trailing and choice point creation. In Chapter 4, we propose a more sophisticated scheduling strategy, Batched Scheduling [FSW97a], which reduces the overheads of these operations and provides dramatic space reductions as well as speedups for many programs. In Chapter 5, we define Local Scheduling [FSW97a], which has applications to non-monotonic reasoning and, when combined with answer subsumption, can arbitrarily improve the performance of some programs. Even though a strategy can result in considerable speedups for some applications, for others it may add overheads and even lead to unacceptable inefficiency. Since different applications have different requirements, the ability


to use multiple strategies in an evaluation is likely to be beneficial. In Chapter 7, we discuss the issues involved in combining scheduling strategies in a tabled evaluation and describe an implementation which provides engine-level support for integrating different strategies at the predicate level. The main contributions of this dissertation result from the detailed study of the previously unexplored problem of scheduling in tabled evaluations. More specifically:

• We define a set-at-a-time tabled evaluation that allows tabling engines to efficiently evaluate recursive queries that involve out-of-memory data without incurring significant overheads for in-memory queries (Chapter 6);

• We formulate two different scheduling strategies, Batched Scheduling and Local Scheduling, and analyze how these strategies affect the efficiency of the evaluation (Chapters 4 and 5);

• We argue that no single strategy can be uniformly best, and propose the integration of multiple strategies at the predicate level as a means of allowing the programmer to control the tabled search to achieve the best performance (Chapter 7);

• We present a new and simplified definition of SLG and propose a framework to formally define scheduling strategies for SLG evaluation (Chapter 8).
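The scheduling choice at the heart of these contributions can be made concrete with a small model. The sketch below is a Python analogue of a tabled evaluation of a reachability query (illustrative only; the names and data structures here are invented, not those of the SLG-WAM): answers enter the table once, and a pending queue decides when they are returned for answer resolution. Switching the queue discipline changes the search order but never the set of answers.

```python
from collections import deque

def tabled_tc(edges, start, fifo=True):
    # Toy model of a tabled evaluation of reach(start): 'table' is the
    # answer table for the subgoal, 'pending' holds answers waiting to
    # be returned to consumers, and the pop discipline plays the role
    # of the scheduling strategy.
    table = set()
    pending = deque()
    order = []                        # order in which answers are consumed

    def add_answer(node):
        # Duplicate answers are discarded: the table check that
        # guarantees termination on cyclic graphs.
        if node not in table:
            table.add(node)
            pending.append(node)

    for (x, y) in edges:              # program clause resolution for start
        if x == start:
            add_answer(y)
    while pending:
        # FIFO returns answers breadth-first; LIFO returns them
        # depth-first, stack-like.
        node = pending.popleft() if fifo else pending.pop()
        order.append(node)
        for (x, y) in edges:          # answer resolution feeds consumers
            if x == node:
                add_answer(y)
    return table, order
```

On the cyclic graph {(1,2),(1,3),(2,4),(3,4),(4,1)}, both disciplines compute the same table {1,2,3,4}; only the consumption order differs, which is precisely the degree of freedom the strategies studied in this dissertation exploit.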

Chapter 2

Overview of Tabling

Tabling (or memoization) has been discovered in different guises from a number of different starting points and given a number of different names (e.g., [War79, Die87, TS86, Vie89]). Here we motivate the need for tabling in a logic programming framework and highlight its underlying ideas. For a more comprehensive description, the reader is referred to [War92].

2.1 Tabling in a Nutshell

The basic idea of tabling is simple: during the evaluation of a logic program, remember each subgoal that is called and the answers generated for these subgoals. If a call is made to a subgoal that has already been called, instead of re-evaluating this subgoal, the new call will consume the answers generated by the original invocation. The following example illustrates the actions of a tabled evaluation.


Example 2.1.1 Consider the following program, which computes the transitive closure of a directed graph:

:- table p/2.
p(X,Y) :- p(X,Z), p(Z,Y).
p(X,Y) :- a(X,Y).

a(1,2).    a(1,3).    a(2,3).

and the query :- p(1,Y). The SLD [Llo84, Doe94] tree for this program and query is given in Figure 1. Because the leftmost branch of this tree is infinite, if all solutions are sought for this query, Prolog will go into an infinite loop regardless of the search strategy it uses. Figure 2 shows a forest of trees for the above query at the end of its tabled evaluation. As we will describe in detail later, by avoiding the recomputation of subgoal p(1,Z) (in node 2), tabling ensures the termination of this query.

Figure 1: An infinite SLD tree

Besides the avoidance of infinite loops, tabling can also result in better complexity for some programs, as the following example shows.

Example 2.1.2 Let the Fibonacci program be defined as:
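Tabling turns this tree-shaped recursion from exponentially many subgoal evaluations into linearly many: without tabling, the same subgoals are re-derived over and over, while a tabled evaluation computes each subgoal once. The effect can be sketched in Python as ordinary memoization (an analogue only; the function names are illustrative, not the clauses of the original program):

```python
def fib_naive(n, counter):
    # Re-evaluates shared subgoals: exponentially many calls.
    counter[0] += 1
    if n < 2:
        return n
    return fib_naive(n - 1, counter) + fib_naive(n - 2, counter)

def fib_tabled(n, table, counter):
    # Tabled evaluation: each subgoal fib(k) is computed once; later
    # calls consume the stored answer, so the table stays linear in n.
    counter[0] += 1
    if n not in table:
        table[n] = n if n < 2 else (
            fib_tabled(n - 1, table, counter) +
            fib_tabled(n - 2, table, counter))
    return table[n]
```

For n = 20 both variants return 6765, but the naive version makes tens of thousands of calls where the tabled one makes a few dozen.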


APPENDIX B. BENCHMARKS: NORMAL PROGRAMS

test :- p, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true),
        abolish_all_tables, fail.
test :- q, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true),
        abolish_all_tables, fail.
test :- r, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true).

fr: non-stratified

:- table p/0, q/0, r/0, t/0.

% Model True = {}, False = {t}, Undefined = {p,q,r}.
p :- tnot(q), tnot(r), p.
p :- tnot(t), tnot(r).
q :- tnot(q), tnot(r), p.
r :- tnot(q), p, tnot(r).
t :- fail.

test :- q, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true),
        ( t -> ( tnot(t); true) ; true).

interp: a meta-interpreter

:- dynamic rule/2.
:- table interp_g/1.

interp([]).
interp([G|Gs]) :- interp_g(G), interp(Gs).
interp_g(tnot(G)) :- tnot(interp_g(G)).
interp_g(G) :- rule(G,B), interp(B).

test :- new_program, query(Goal), interp_g(Goal), fail.
test.

new_program :- cleanup, assert(query(p)),
    assert(rule(p, [tnot(q),p])), assert(rule(q, [tnot(q),p])),
    assert(rule(q, [tnot(p)])).
new_program :- cleanup, assert(query(q)),
    assert(rule(p, [tnot(q), p])), assert(rule(q, [tnot(p), q])),
    assert(rule(p, [tnot(p)])).
new_program :- cleanup, assert(query(p)),
    assert(rule(p, [tnot(q), p])), assert(rule(q, [tnot(p), q])),
    assert(rule(q, [tnot(p)])).

simpl: requires simplification

:- table s1/0, s2/2, s3/1, s4/0.

s1 :- tnot(s1).

s2(_,_) :- s1.
s2(a,_) :- tnot(s4).
s2(a,_).

s3(X) :- var(X), tnot(s4).
s3(X) :- var(X), s1.
s3(X) :- var(X), s2(X,Y), s3(Y).
s3(X) :- tnot(s4), s1, s2(X,Y), Y = b.

s4 :- tnot(s4).

test :- s1, fail.
test :- s2(X,_), s3(X), fail.
test.

przy:

:- table p/0, q/0, r/0, s/0.

% Model True = {s}
p :- tnot(s), q, tnot(r).
q :- r, tnot(p).
r :- p, tnot(q).
s :- tnot(p), tnot(q), tnot(r).

test :- p, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true),
        ( s -> ( tnot(s); true) ; true).

residual: computes the residual program

:- import numbervars/1 from num_vars.
:- import get_residual/2 from tables.

test :- test(C,D), fail.
test :- test_2nd_arg_bound, fail.
test.

test_2nd_arg_bound :get_residual(p4(X), [One,tnot(p3),Two,p2|Rest]).

call_them :-
    t(_), u(_), up(_), p, p2, p3, p4(_), p5(_,_,_,_,_),
    p6(_), pfff(_), p(_,_), pos(_,_,_), p(_,_,_),
    L = [_|_], p(_,L,_), nlin(_), nlin(_,_), mdl(_).

t(_,_) :- call_them, fail.
t(C,D) :- C = p, gcr(C,D).
t(C,D) :- C = p2, gcr(C,D).
t(C,D) :- C = p4(_), gcr(C,D).

test(_,_) :- call_them, fail.
test(C,D) :- C = t(_), gcr(C,D).
test(C,D) :- C = u(_), gcr(C,D).
test(C,D) :- C = up(_), gcr(C,D).
test(C,D) :- C = p, gcr(C,D).
test(C,D) :- C = p2, gcr(C,D).
test(C,D) :- C = p3, gcr(C,D).

test(C,D) :- C = p4(_), gcr(C,D).
test(C,D) :- C = p5(_,_,_,_,_), gcr(C,D).
test(C,D) :- C = p6(_), gcr(C,D).
test(C,D) :- C = p(_,_), gcr(C,D).
test(C,D) :- C = p(_,_,_), gcr(C,D).
test(C,D) :- L = [_|_], C = p(_,L,_), gcr(C,D).
test(C,D) :- C = pf(_), gcr(C,D).
test(C,D) :- C = pff(_), gcr(C,D).
test(C,D) :- C = pfff(_), gcr(C,D).
test(C,D) :- C = nlin(_), gcr(C,D).
test(C,D) :- C = nlin(_,_), gcr(C,D).
test(C,D) :- C = mdl(_), gcr(C,D).

gcr(C,D) :- get_residual(C,D).

:- table t/1, pos/3, u/1, up/1, p/0, p2/0, p3/0, p4/1, p5/5, p6/1, p/2, p/3.

t(1).

u(f(1)) :- tnot(u(f(1))).

up(X) :- X = f(1), u(X).

p :- tnot(p).

p2 :- tnot(p2), p.

p3 :- p, p2, tnot(p3).

p4(Y) :- p, tnot(p3), X = 11, p(X,Y), p2.

p5(1,2.1,a,[],55) :- tnot(p5(1,2.1,a,[],55)), tnot(p2).

p6(Y) :- p, p2, tnot(p3), X = 11, p(X,Y), p2, Z = a, p(_,Z,F), float(F).

p(11,22) :- tnot(p(11,22)).

pos(1,2,3) :- X = 11, Y = 22, p(X,Y), p(X,_).

p(1,a,1.1) :- tnot(p(1,a,1.1)), p.
p(1,a,1.1) :- tnot(p(2,f(a,g(b,1),c),3)), p2.
p(2,f(a,g(b,1),c),3) :- p.
p(3,[a],_) :- tnot(p(3,[a],3)).
p(_,[a,b|Y],Y) :- tnot(p(4,[a,b],[])).

:- table pf/1, pff/1, pfff/1.

pf(f(g(a),1,h(i(1,a),[1,2,3]))) :-

    tnot(pf(f(g(a),1,h(i(1,a),[1,2,3])))).

pff(X) :- Y = f(g(X),1,h(i(1,a),[1,2|_])), pf(Y).

pfff(X) :- pff(X), Y = f(g(X),_,h(i(1,a),[1|_])), pf(Y).

:- table nlin/1, nlin/2.

nlin(X) :- nlin(X,X).

nlin(X,X) :-
    X = f(f(f(f(f(1,2,3,4,5))))),
    tnot(nlin(X,X)), p2.

:- table mdl/1.

mdl(X) :- up(X).
mdl(X) :- nlin(X).
mdl(f(1)) :- p, p2, tnot(p3).
mdl(X) :- X = f(1), tnot(u(X)).

ans compl: needs answer completion

:- table p/0, q/0, s/0, r/0.

p :- p.
p :- q.

q :- p.
q :- tnot(s).

s :- tnot(r).
s :- p.

r :- tnot(s), r.

test :- p, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( s -> ( tnot(s); true) ; true),
        ( r -> ( tnot(r); true) ; true).

undef: all undefined

:- table p/0, q/0, r/0.

p :- tnot(q).
p :- tnot(r), fail.

q :- p.

r :- p.

test :- p, fail.
test :- ( p -> ( tnot(p); true) ; true),
        ( q -> ( tnot(q); true) ; true),
        ( r -> ( tnot(r); true) ; true).

nonstrat: non-stratified

% Model Undefined = {a,b,c}
test :- a, fail.
test :- b, fail.
test :- c, fail.
test :- ( a -> ( tnot(a); true) ; true),
        ( b -> ( tnot(b); true) ; true),
        ( c -> ( tnot(c); true) ; true).

:- table a/0, b/0, c/0, d/0.

a :- tnot(c).

b :- a.

c :- tnot(b).


Appendix C

Proofs of Chapter 6

Theorem 6.4.1 Let P be a definite program and let M(P,Q) be its magic rewrite for a query Q. Also let T(P) be the fully tabled program, and assume Q is an element of Subgoals_0. Then, at each iteration t:

1. An SNMT evaluation of P for Q derives a non-magic fact A if Breadth-First Search derives the answer A.

2. An SNMT evaluation of P for Q produces a new fact magic(S) if Breadth-First Search adds S to Subgoals_t.
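The iteration-indexed sets in this statement follow the usual semi-naive discipline: only facts new in the previous round participate in new derivations, so every fact belongs to exactly one delta. A minimal Python sketch of that delta-set mechanism for transitive closure (a generic illustration of semi-naive evaluation, not the SNMT algorithm of Chapter 6):

```python
def semi_naive_tc(edges):
    # Semi-naive evaluation of:
    #   path(X,Y) :- edge(X,Y).
    #   path(X,Y) :- path(X,Z), edge(Z,Y).
    # deltas[t] holds the facts first derived at iteration t; only
    # delta facts are joined with the base relation, so no derivation
    # is repeated across iterations.
    edges = set(edges)
    total = set(edges)
    delta = set(edges)
    deltas = [set(delta)]
    while delta:
        new = {(x, z) for (x, y) in delta for (w, z) in edges if y == w}
        delta = new - total           # keep only genuinely new facts
        total |= delta
        if delta:
            deltas.append(set(delta))
    return total, deltas
```

On the chain 1-2-3-4, the deltas are the edges themselves, then {(1,3),(2,4)}, then {(1,4)}: each derived fact appears at exactly one iteration, which is the property the equivalences below are stated over.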

Proof:

Statements 1 and 2 are proven together by induction on the number of iterations of the Breadth-First evaluation.

• Base case.

  - 1) Non-magic Facts: Suppose the first iteration of the Breadth-First evaluation produces an answer fact A via rule R. By Algorithms

6.4.1-6.4.3, the production of any answer in the Breadth-First evaluation depends on the presence of a corresponding subgoal in the evaluation. Since Q is the only subgoal in Subgoals_0 and since Ans_0 is empty, A must have root subgoal Q and cannot depend on the results of any other tabled subgoals; that is, A can depend only on base facts and not on derived facts. Under these conditions, A would also be produced by the SNMT evaluation.

  - 2) Magic Facts: Suppose the Breadth-First evaluation produces a subgoal S. In this case, there is a rule R1 whose head unifies with Q, and a prefix Lit_1, ..., Lit_m leading to a call to a rule R2 whose head unifies with S. Also, it is clear that each Lit_i must be a base fact. Next in the Breadth-First evaluation, R2 is used for resolution, and the subgoal S is immediately produced. Now consider how R_magic is produced from R1 by rewrite rule 3 of Definition 6.2.1:

    R1:      Q ← Lit_1, ..., Lit_m, S
    R2:      S ← Body_S
       ⟹
    R_magic: magic(S) ← magic(Q), Lit_1, ..., Lit_m

R_magic exists in the magic-transformed program. Because each literal in its body must either be in Ans_0 or be a base fact, magic(S) will be derived by the SNMT evaluation.

• Inductive Case. Assume that statements 1 and 2 hold at all iterations less than N; we demonstrate that the statements also hold at N.

Figure 34: Part of a tabled evaluation

  - 1) Non-magic Facts: In the case of an answer A derived by the Breadth-First evaluation, the rule

    R: Head ← Lit_1, ..., Lit_N

was used to derive A through a program clause resolution step against some query Q, followed by a series of resolution steps of answers against selected literals of consuming nodes. It must be the case that each of the body literals is either a base fact or is in Ans_{N-1}. It remains to show that at least one of the answers is contained in Ans_{N-1}, or that the subgoal Q is in Subgoals_{N-1}. To see this, suppose that in the forest for the Breadth-First evaluation, A is in tree T, and consider the path C = C_1, ..., C_n of consuming nodes leading from the root of T to the particular instance of A derived at N. (Note that there could be more than one such answer leaf A in T; we need consider only a particular instance produced at iteration N.) This situation is depicted in Figure 34. For simplicity, first assume that Q was added as a subgoal before iteration N-1. In this case, the path C cannot


be empty (R cannot be a bodyless clause), otherwise this instance of A would have been added before iteration N-1. Therefore, there must be at least one consuming node, and the path cannot be empty. Consider the actions, in iteration N, of resolving an answer against some consuming node, say C_i, in the path. This resolution produces a new consuming node, C_{i+1}, and any answers to the subgoal of C_{i+1} that were produced in previous iterations are resolved against C_{i+1} in iteration N (cf. the interactions of Algorithms 6.4.2 and 6.4.3). Extending this argument, it is easy to see that the rest of the path from C_{i+1} to A is created at iteration N. Thus, if Q is not in Subgoals_{N-1}, it must be the case that some answer used for resolution along the path C is in Ans_{N-1}. The argument can easily be extended to demonstrate that if no answer used along C is in Ans_{N-1}, then Q must be in Subgoals_{N-1}. This argument shows that in the Breadth-First evaluation, an answer leaf A is produced iff its subgoal is in the delta set, or one of the consuming nodes resolves against an answer in the delta set. Now consider the SNMT evaluation. By rewriting rule 2 of Definition 6.2.1, a rule similar to the one above is used by SNMT:


    R: Q :- Lit_1, ..., Lit_N    ⇒    R_magic: Q :- magic(Q), Lit_1, ..., Lit_N


By the induction hypothesis, any derived literals in Ans_N in the Breadth-First evaluation are also in the SNMT evaluation (Q must have been derived), and by the properties of SNMT, at least one


of the derived or magic facts is in the delta set for SNMT. Thus SNMT will produce A at iteration N.

- 2) Magic Fact: Suppose the Breadth-First evaluation derives a subgoal Q. In this case, the Breadth-First evaluation has used for resolution a rule for a subgoal Q' whose body consists of a prefix Lit_1, ..., Lit_m leading to a call to a rule whose head unifies with S. By rewrite rule 3 of Definition 6.2.1,


    R: Q' :- Lit_1, ..., Lit_m, S, ...    ⇒    R_magic: magic(S) :- magic(Q'), Lit_1, ..., Lit_m


R_magic exists in the magic-transformed program. By an argument essentially the same as that for non-magic facts, for the Breadth-First evaluation either Q' is in Subgoals_{N-1} or at least one answer of the Breadth-First evaluation must be in Ans_{N-1}. In addition, any answers used for resolution on the path of consuming nodes to S must be in Ans_{N-1}. By the induction hypothesis, SNMT contains the same derived and magic answers in the delta sets for the same iteration. Through the above rule it therefore also produces magic(S) in the SNMT evaluation.
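Rewrite rules 2 and 3, as displayed above, amount to a purely syntactic clause transformation. The sketch below uses an illustrative clause encoding (a head plus a list of (predicate, args) literals, with a `magic_` predicate-name prefix standing for magic(...)); it ignores adornments and is not the dissertation's implementation:

```python
def magic_rewrite(clause, tabled_preds):
    """Sketch of rewrite rules 2 and 3 (cf. Definition 6.2.1), ignoring
    adornments. A clause is (head, [lit_1, ..., lit_N]); a literal is a
    (predicate, args) pair; magic(P) is encoded as predicate 'magic_' + P."""
    (hp, hargs), body = clause
    rules = []
    # Rule 2: guard the original clause with magic(head).
    rules.append(((hp, hargs), [("magic_" + hp, hargs)] + body))
    # Rule 3: for each tabled call S preceded by the prefix
    # Lit_1, ..., Lit_m, derive magic(S) from magic(head) and the prefix.
    for m, (p, args) in enumerate(body):
        if p in tabled_preds:
            rules.append((("magic_" + p, args),
                          [("magic_" + hp, hargs)] + body[:m]))
    return rules

# Example: path(X,Z) :- edge(X,Y), path(Y,Z).
clause = (("path", ("X", "Z")),
          [("edge", ("X", "Y")), ("path", ("Y", "Z"))])
for head, body in magic_rewrite(clause, {"path"}):
    print(head, ":-", body)
```

On this example the transformation yields the guarded clause path(X,Z) :- magic_path(X,Z), edge(X,Y), path(Y,Z) and the magic rule magic_path(Y,Z) :- magic_path(X,Z), edge(X,Y).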

□

The proof of the completeness of Algorithms 6.4.1-6.4.3 depends on the following distance metric.

Definition C.1 We define the distance to a consuming, answer, interior, or subgoal (or generator) node recursively:


- Let Q be the initial subgoal. Then distance(Q) is 0.

- Let S be a (non-initial) subgoal. Then distance(S) is the minimum of the distances of all consuming nodes whose selected subgoal is S.

- Let N be a consuming, interior, or answer node. Then
  - if parent(N) is a subgoal node, distance(N) is distance(parent(N)) + 1;
  - if parent(N) is a consuming node, and answer A was resolved against parent(N) to produce N, then distance(N) is
    - distance(parent(N)) if distance(A) < distance(parent(N)),
    - distance(A) + 1 otherwise;
  - if parent(N) is an interior node, then distance(N) is distance(parent(N)).

□
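Definition C.1 can be transcribed almost literally into code. The node encoding below (dicts carrying a kind tag, parent links, the resolved answer for answer-resolution steps, and consumer lists for subgoal nodes) is an assumed representation for illustration only:

```python
def distance(node):
    """Sketch of Definition C.1. A node is a dict with:
      kind: 'initial' | 'subgoal' | 'consuming' | 'interior' | 'answer'
      parent: the parent node (absent for subgoal nodes)
      consumers: consuming nodes selecting this subgoal (subgoal nodes only)
      answer: the answer node resolved against a consuming parent."""
    kind = node["kind"]
    if kind == "initial":            # the initial subgoal has distance 0
        return 0
    if kind == "subgoal":            # minimum over its consuming nodes
        return min(distance(c) for c in node["consumers"])
    parent = node["parent"]
    if parent["kind"] in ("initial", "subgoal"):
        return distance(parent) + 1
    if parent["kind"] == "consuming":
        d_answer = distance(node["answer"])
        d_parent = distance(parent)
        return d_parent if d_answer < d_parent else d_answer + 1
    return distance(parent)          # interior parent: same distance

# A consuming node hanging directly off the initial subgoal has distance 1.
q0 = {"kind": "initial"}
c1 = {"kind": "consuming", "parent": q0}
print(distance(c1))  # 1
```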

Theorem 6.4.2 Let P be a fully tabled definite program evaluated by Breadth-First Search, let M_P be the least model of P, and let G be an element of M_P. Then at some iteration t there is a subgoal S such that G is subsumed by an element of Ans_t^S.

Proof: (Sketch) Like other tabling methods, SLG has the subgoal completeness property [CW96], so proving subgoal completeness for Algorithms 6.4.1-6.4.3 can be reduced to showing that they produce the same answers as SLG. Furthermore, for definite programs, an SLG evaluation contains at most ω


states. Therefore, the equivalence can be proven through (a series of) finite inductions. In our sketch we prove only the outer induction step, which corresponds to Algorithm 6.4.1 (Breadth-First Main). To complete the proof, similar inductions are needed for the loops in Algorithm 6.4.2, Get Program Clause Closure, and Algorithm 6.4.3, Perform Answer Resolution. Our (outer) induction statement is as follows: using the distance measure of Definition C.1, we prove that for definite programs and each natural number N, every subgoal, consuming node, and answer node of distance N produced by SLG will also be produced by Algorithms 6.4.1-6.4.3.

Base Case: The initial subgoal has distance 0 and is created by assumption.

Inductive Case: Assume that all nodes of distance N or less have been created; we show that all nodes of distance N+1 will also be created. Consider the cases for a node Node at distance N+1.

- parent(Node) is a subgoal node, denoted by S. In this case, by the induction assumption, every subgoal node at distance N or less is produced by algorithm Breadth-First Main. Expansion at stage N+1 is done in algorithm Breadth-First Main in lines 4-14. All resolvents of program clauses against subgoals created at stage N are derived in lines 4-5; these resolvents are added to the ClauseCache with status new. Then, in lines 6-14, the various cases are addressed in which Node may be an answer or consuming node.

- Node is a subgoal node. If the distance to Node is N+1, then Node was produced along with a consuming node in line 12 of Breadth-First Main,


line 11 of Get Program Clause Closure, or line 11 of Perform Answer Resolution. Thus the proof that all subgoals of distance N+1 are produced can be reduced to the proof that all consuming nodes of distance N+1 are produced.

- If parent(Node) is a consuming node, and the answer A was resolved against parent(Node) to produce Node, then there are two cases:

  - distance(A) < distance(parent(Node)): In this case, by Definition C.1, distance(parent(Node)) must be N+1 (i.e., parent(Node) ∈ Cns_{N+1}), and the call to Perform Answer Resolution in line 16 of algorithm Breadth-First Main ensures that parent(Node) is resolved properly. Also, since this call is the last step of each iteration, and since Perform Answer Resolution is itself a fixpoint iteration, a straightforward induction on the loop of Perform Answer Resolution shows that every element of Cns_{N+1} has the appropriate resolution performed against it. Finally, it remains to show that all consuming nodes of distance N+1 will in fact be generated. This again requires straightforward inductions on the loops of Get Program Clause Closure and of Perform Answer Resolution, both of which, in addition to algorithm Breadth-First Main, can add elements to Cns_{N+1}.
  - distance(A) ≥ distance(parent(Node)): In this case, the call to Perform Answer Resolution in line 15 of Breadth-First Main performs the appropriate resolution step.

□

Appendix D

Proofs of Chapter 8

Theorem 8.2.1 Let E' be a non-floundering, terminating SLGsched evaluation of a given program P and query Q, such that

E' = {(F_{i_0}, Seq_0), (F_{i_1}, Seq_1), ..., (F_{i_m}, Seq_m)}.

Then there exists an SLG evaluation E = {F_0, F_1, ..., F_n} for P and Q such that for all k ∈ {0, ..., n} there exists i_k ∈ {0, ..., m}, i_k ≥ k, with F_k = F_{i_k}. (That is, for an SLG evaluation E = {F_0, F_1, ..., F_n}, the corresponding SLGsched evaluation might contain multiple tuples with the same forest, for instance E' = {(F_0, Seq_0), ..., (F_j, Seq_k), (F_j, Seq_{k+1}), ..., (F_{i_m}, Seq_m)}.)

Proof: (sketch) Note that in SLGsched, evaluation moves from one SLG system F_{i_k} to another F_{i_{k+1}} by applying an SLG operation stored in the scheduling sequence, at which point new applicable operations might be added to the sequence. However, because operations are scheduled eagerly, it might happen that when an operation is selected, either it is no longer applicable (e.g., ND(S) is in


the sequence, but before it is selected an answer is created for node S), or it cannot be applied yet (e.g., Completion(S) is selected but there are still other operations involving node S in the scheduling sequence). In these cases, by Definition 8.2.3, the selected operation acts as a no-op, and the forest does not change as a result of its application, that is, F_{i_k} = F_{i_{k+1}}. If a selected operation op is applicable, the changes that op makes to the forest F_{i_k} result in F_{i_{k+1}} ≠ F_{i_k}, and if F_{i_k} = F_k then F_{i_{k+1}} = F_{k+1}. □
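The observation underlying this proof, that eagerly scheduled operations may be stale when finally selected and then act as no-ops leaving the forest unchanged, can be sketched as a worklist loop. The operation interface below is assumed for illustration and is not the dissertation's Definition 8.2.3:

```python
from collections import deque

def run(initial_ops):
    """Sketch of an SLGsched-style loop: operations are scheduled
    eagerly; a stale operation leaves the forest unchanged
    (F_{i_k} = F_{i_{k+1}}), so the sequence of *distinct* forests
    embeds into an ordinary step-by-step evaluation."""
    forest = frozenset()
    seq = deque(initial_ops)
    trace = [forest]
    while seq:
        op = seq.popleft()
        if op.applicable(forest):
            forest, new_ops = op.apply(forest)
            seq.extend(new_ops)   # eager scheduling of follow-up ops
        trace.append(forest)      # a no-op repeats the previous forest
    return trace

class AddFact:
    """Toy operation: add a fact; stale (a no-op) once the fact exists."""
    def __init__(self, fact, then=()):
        self.fact, self.then = fact, then
    def applicable(self, forest):
        return self.fact not in forest
    def apply(self, forest):
        return forest | {self.fact}, list(self.then)

# 'a' is scheduled twice; its second selection is a no-op, so the
# trace contains the same forest in two consecutive positions.
trace = run([AddFact("a", then=[AddFact("b")]), AddFact("a")])
print(trace)
```

The duplicated forest in the trace corresponds to the consecutive tuples (F_j, Seq_k), (F_j, Seq_{k+1}) mentioned in the theorem statement.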

Theorem 8.2.2 Let E be a finite and non-floundering SLG evaluation of a given program P and query Q, such that E = {F_0, F_1, ..., F_n}. Then there exists an SLGsched evaluation

E' = {(F_{i_0}, Seq_0), (F_{i_1}, Seq_1), ..., (F_{i_m}, Seq_m)}

such that for all k ∈ {0, ..., n} there exists i_k ∈ {0, ..., m}, i_k ≥ k, with F_k = F_{i_k} and Seq_{i_k} ⊇ App_{F_k}.

Proof: (sketch) To show that SLGsched is complete, we prove by induction that at each point in the evaluation, if an operation is applicable in a forest, it is also present in the scheduling sequence associated with the forest. That is, given a forest F_{i_k} in an SLGsched evaluation,

    F_{i_k} --op--> F_{i_{k+1}}  implies  op ∈ Seq_{i_k}.

- Base case: Given a program P and query Q, the only applicable SLG operation (Definition 2.2.9) is Subgoal Call, which by definition is in the initial scheduling sequence.

- Inductive step: Let us examine each possible applicable operation in the


forest F_n and prove that if an operation op is applicable in F_n, then op ∈ Seq_n.

(op = Subgoal Call) If Subgoal Call is applicable in F_n, then there exists a leaf node NodeNumber: AT :- DS | S, GL (or the original query S) such that S is tabled and there is no tree in F_n with a root that is a variant of S. We claim that SC(NodeNumber) ∈ Seq_n. By definition, whenever a new node is created in SLGsched, if the selected literal of the new node is tabled and not yet present in the evaluation, SC(NodeNumber) is added to the scheduling sequence (see Procedure 8.2.1).

(op = Program Clause Resolution) If Program Clause Resolution is applicable, either (1) there is a root node NodeNumber: S :- | S; or (2) there is a non-root node NodeNumber: AT :- DS | S, GL in which S is not tabled. In either case there is a program clause whose head unifies with S.

If Program Clause Resolution is applicable to the root node NodeNumber and a clause C, then PCR(NodeNumber,C) ∈ Seq_n, since the root node was created as the result of Subgoal Call, which in SLGsched adds PCR tuples for the new root against all matching clauses (see Definition 8.2.3 (1)). If Program Clause Resolution is applicable to the non-root node NodeNumber and a clause C, then PCR(NodeNumber,C) ∈ Seq_n, since when a new node is created whose selected literal is non-tabled, PCR tuples for the new node against all matching clauses


are added (see Procedure 8.2.1).

(op = Answer Clause Resolution) If Answer Clause Resolution is applicable in F_n, there must exist an answer node AN in the tree for a subgoal S, and a non-root node CN whose selected literal is a variant of S. We claim that ACR(AN',CN) ∈ Seq_n, where either AN' = AN, or AN' is an ancestor of AN that has an empty goal list. Let us consider the following possibilities:

- AN' < CN (i.e., the answer was created before the consuming node). In this case, when Subgoal Call was executed for node CN, SLGsched (see Definition 8.2.3 (1)) must have added ACR frames for CN and all answers available in the tree for S, in particular ACR(AN',CN).
- AN' > CN (i.e., the answer was created after the consuming node). When the answer AN' was created, Procedure 8.2.1 (ScheduleNewNode) must have added ACR frames for AN' and all non-root nodes whose selected literals are variants of S, in particular ACR(AN',CN).

In either case, if AN ≠ AN', SLGsched uses AN' simply as a designator of the answer AN, and ACR is actually applied to AN (see Definition 8.2.3 (3)).

(op = Negative Delay) If Negative Delay is applicable, then (1) there exists a leaf node N: AnswerTemplate :- DelaySet | not(S), GoalList, and (2) there exists a tree with root S whose status is non-complete.


In SLGsched, when the leaf node N whose selected literal S is tabled is created, a frame SC(N) is added to the scheduling sequence. Since there must be a tree with root S when Negative Delay becomes applicable, we are guaranteed that SC(N) has already been selected, and when it was selected it must have added ND(N) to the scheduling sequence: by Definition 8.2.3 (1), if a subgoal is called negatively and either it is new to the evaluation or there is a non-completed tree with that subgoal as root, ND is scheduled, and thus ND(N) ∈ Seq_n.

(op = Negative Return) If Negative Return is applicable, there exists a node N: AnswerTemplate :- DelaySet | not(S), GoalList, and (1) S is completed and has no answers, or (2) S has an unconditional answer. In either case, the tabled subgoal S must already have been called, that is, SC(N) has been selected. Then, by Definition 8.2.3 (1), NR(N) ∈ Seq_n, since unconditional answers can never become conditional, and completed subgoals can never become "uncompleted".

(op = Completion) If Completion is applicable to F_n, there must exist a set Subg of subgoals that is completely evaluated. We claim that the tuple Completion(Subg_i) ∈ Seq_n for each subgoal Subg_i in Subg. In SLGsched, a Completion operation is scheduled for each new subgoal at Subgoal Call. Even though Completion for a subgoal S is scheduled at subgoal call, it will only be applicable after all operations (SC, PCR, ACR, ND, and NR) involving S and/or subgoals S depends on have been applied. To ensure that Completion(S) is present in the sequence whenever it becomes


applicable, SLGsched reschedules this operation in case it is selected early.

(op = Simplification) If Simplification is applicable to a conditional answer AN: Ans :- DS |, then there is an atom A_S ∈ DS, or a delayed literal A_S^{A'} ∈ DS, whose truth value is known. Notice that the truth value of such an atom becomes known when: (1) there is an unconditional answer for the subgoal S that is a variant of A; (2) S is completed without any unconditional answers; or (3) the last conditional answer of S is failed (see Definition 2.2.9). In any of these cases SLGsched schedules the simplification of AN. The atom A_S (or A_S^{A'}) is only delayed if S is not completed and has no unconditional answers. If S is later completed, SLGsched schedules simplification for all answers that contain a variant of S in their delay set (see Procedure 8.2.2). Alternatively, S can only get unconditional answers if one of its conditional answers becomes unconditional, and this only happens as a result of Simplification, in which case SLGsched schedules all applicable simplifications (see Definition 8.2.3 (7)). Finally, the last conditional answer of a subgoal can only become failed as a result of Simplification or Answer Completion, and in both cases SLGsched schedules all applicable simplifications (see Definition 8.2.3 (7) and (8)).

(op = Answer Completion) If Answer Completion is applicable, then there exists a set of unsupported answers. We argue that, in SLGsched, if there is a set UA of unsupported answers, then


AnsCompletion(UA_i) ∈ Seq_n, where UA ⊆ ∪_i UA_i. A set of answers can only become unsupported if a subgoal gets completed, as the result of Simplification or Answer Completion (see Definition 2.2.9). In any of these cases SLGsched schedules any available Answer Completion (see Definition 8.2.3 (6), (7), and (8)).

□
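The Simplification case above can be illustrated by a small delay-set propagation sketch. The encoding (answers as a map from head to delay set, with truth values supplied up front rather than discovered incrementally during evaluation) is an assumption for illustration, not the dissertation's mechanism:

```python
def simplify(answers, true_atoms, false_atoms):
    """Sketch of Simplification propagation: known-true delay literals
    are removed from delay sets, and an answer whose delay set contains
    a known-false literal is failed (removed). An answer whose delay
    set becomes empty is unconditional."""
    changed = True
    while changed:
        changed = False
        for head in list(answers):
            ds = answers[head]
            if ds & false_atoms:
                del answers[head]            # failed conditional answer
                changed = True
            elif ds & true_atoms:
                answers[head] = ds - true_atoms
                changed = True
    return answers

answers = {"p": {"q", "r"}, "s": {"t"}}
print(simplify(answers, true_atoms={"q", "r"}, false_atoms={"t"}))
# {'p': set()}  -- p became unconditional, s was failed
```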
