Exploiting Parallelism in Tabled Evaluations*

Juliana Freire, Rui Hu, Terrance Swift, David S. Warren
Department of Computer Science
State University of New York at Stony Brook
Stony Brook, NY 11794-4400
Email: {juliana,ruihu,tswift,[email protected]

Abstract. This paper addresses general issues involved in parallelizing tabled evaluations by introducing a model of shared-memory parallelism, which we call table-parallelism, and by comparing it to traditional models of parallelizing SLD. A basic architecture for supporting table-parallelism in the framework of the SLG-WAM [16] is also presented, along with an algorithm for detecting termination of subcomputations.

1 Introduction

The deficiencies of SLD resolution are well known, and extended efforts have been made to remedy them. For instance, while SLD can be combined efficiently with negation-by-failure in SLDNF, the semantics of SLDNF have proven unacceptable for many purposes, in particular for non-monotonic reasoning. Even without negation, SLD is susceptible to infinite loops and redundant subcomputations, making it unacceptable for deductive databases. The latter deficiency, that of repeating subcomputations, has given rise to several systems which table subcomputations: OLDT [18], SLD-AL [20], and SLG [5, 4] are three tabling methods which have been implemented. At an abstract level, systems which use magic evaluation can also be thought of as tabling systems; substantiation for this claim comes both from the asymptotic results of [14] and from the experimental results of [17]. Tabling is also relevant for computing the well-founded semantics: besides SLG, well-founded ordered search [15] and the tabulated resolution of [3] also use tabling.

While nearly all of these approaches are sequential, there is a natural parallelism inherent in these evaluation methods which we call table-parallelism. At a general level, the idea behind table-parallelism is simple: the table can be thought of as a large structured buffer through which cooperating threads communicate. Parallelization can then take place at each tabled subgoal. If a subgoal is called and is not in the global table, an entry for it is created, and answers are derived for the subgoal using program clause resolution. Otherwise, if the subgoal is a variant of a subgoal present in the table, it consumes the answers stored in the table.

* This work was partially supported by CAPES-Brazil and NSF grants CCR-9102159, CCR-9123200, CCR-9404921, CDA-9303181.

Program:
  t(X,Y) :- r(X,Z), s(Z,Y).
  s(X,Y) :- b(X,Y).
  s(X,Y) :- b(X,Z), s(Z,Y).
  r(1,2). r(1,3). r(3,2).
  b(2,3). b(3,4). b(3,5).

Evaluation of t(X,Y), one branch per r/2 fact:
  r(1,2), s(2,Y)   yields s(2,3), s(2,4), s(2,5)
  r(1,3), s(3,Y)   yields s(3,4), s(3,5)
  r(3,2), s(2,Y)   yields s(2,3), s(2,4), s(2,5)

Fig. 1. Join in Parallel Prolog

Fig. 1 illustrates how, by reducing redundant subcomputations, tabling can improve performance over an SLD-based resolution method ([2, 17]). An or-parallel (or any sequential) Prolog would have to compute the relation s(2,Y) twice. In a tabling system, if the predicate s were declared as tabled, it would be computed only once, and any subsequent call would simply consult the table. This avoidance of redundant computation has long been recognized as necessary for data-oriented queries. On the other hand, a bottom-up method such as tabling can also introduce overhead for copying information into and out of tables, making it unsuitable for programs such as list recursion. We believe that, just as a mixture of SLD and tabled evaluation is arguably most useful for sequential evaluation, a combination of SLD-based and- and or-parallelism with table-parallelism will prove most useful for parallel evaluation.

In terms of practical programs, the mixture of SLD and tabling has proven useful for program analysis [8] over flat domains [6], and for the Unification Factoring compiler optimization [7]. While implementation has so far been sequential, both of these algorithms contain bottom-up subcomputations that are amenable to table-parallelism, along with top-down computations amenable to traditional SLD parallelism.

The structure of the paper is as follows:
- We briefly overview the general concepts underlying tabling using the SLG formalism, and its implementation in the SLG-WAM.
- Using SLG terminology, we introduce the abstract model of table-parallelism, discuss its relation to the more traditional and- and or-parallelism, and briefly discuss issues for the analysis of table-parallelism.
- We describe and prove the correctness of a parallel completion algorithm which detects termination of subcomputations, allowing a great deal of concurrency between subcomputations.
- In the framework of the SLG-WAM [16], we present the extensions to tabling operations necessary to implement table-parallelism.

2 A brief overview of SLG

SLG evaluates programs by keeping a table of answers to subgoals, and resolving repeated subgoals against answer clauses from the table rather than against

[Figure fragment: SLG nodes "0. a(a,Z):gen:a(a,Z)" and "1. a(a,Z):act:a(a,Z)"]

… i are done, another subgoal $r_j$, where $0 \le j < i$, generates a new answer and activates a subgoal $r_k$ that has already been checked by root. The leader will wrongly assume that $r_k$ is done, and it will incorrectly complete. Following the model of [9], we address this problem using a ColorFlag for each tabled subgoal. The ColorFlag indicates that a node in the SCC may have generated new answers after another node was checked as completed by the leader. This flag is either black, if new answers were added, or white otherwise.

Subgoal    DFN    StateFlag   AnswerFlag   ColorFlag
r_{n-1}    n      done        done         white
...
r_k        k+1    done        done         white
...
r_i        i+1    done        done         white
...
r_j        j+1    done        done         black
...
root       1      undone      done         white

(Arrows in the figure are labeled "check flags" and "wake up".)

Fig. 6. Snapshot of Completion Table

Initially, the ColorFlags for all tabled subgoals are set to white. When a subgoal $r$ generates new answers after waking up, it sets its own color to black. If the leader finds a black node during termination detection, it concludes that nodes it has already checked might have been awakened, and it therefore suspends the procedure; leader detection and termination detection must then be restarted later. With the addition of the ColorFlag, Proposition 2 can be strengthened.

Proposition 3. (Termination Detection Algorithm) Given a set S of mutually dependent tabled subgoals, Algorithm 4.2 will mark as complete the subgoals in S iff all subgoals in S are completely evaluated.

The correctness of termination detection depends on the fact that only one node in each SCC, the leader, executes this procedure at any one time. If multiple nodes in an SCC were allowed to execute termination detection simultaneously, they could interfere with each other in checking and setting the ColorFlag. Given a formalization of a parallel tabled evaluation as a whole, and not just of its completion algorithm, it can be proven that the synchronization required to complete tables causes neither deadlock nor starvation (this formalization is provided in the full version of the paper [11]). We now turn to how the operations of Section 3 can be modified to implement table-parallelism.
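The leader's check can be sketched sequentially. The Python below is a minimal sketch under simplifying assumptions, not Algorithm 4.2 itself: each completion-table row is a hypothetical Entry with boolean StateFlag, AnswerFlag and ColorFlag fields, and the leader scans $r_{n-1}, \ldots, r_1$ and then itself. Where the paper's leader waits on an undone subgoal, the sketch returns "busy" so that the caller can retry; a black subgoal causes the whole SCC to be whitened and the check to report "restart". The hypothetical helper wake_up models a subgoal that generates a new answer and reactivates a consumer.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    """One completion-table row (hypothetical field names)."""
    name: str
    state_done: bool = True    # StateFlag == done
    answers_done: bool = True  # AnswerFlag == done
    black: bool = False        # ColorFlag: True = black, False = white
    complete: bool = False     # set once the SCC has been completed


def termination_detection(scc):
    """Leader-side check. scc[0] is the leader (root) and scc[k] is r_k, numbered
    by DFN. Returns "complete", "restart" or "busy"."""
    scc[0].black = False  # the leader whitens itself before checking (Rule A.4 of the appendix)
    for entry in reversed(scc):  # r_{n-1}, ..., r_1, root
        if not (entry.state_done and entry.answers_done):
            return "busy"  # the paper's leader waits here; the sketch reports instead
        if entry.black:
            for e in scc:  # whiten the whole SCC ...
                e.black = False
            return "restart"  # ... and redo leader and termination detection later
    for e in scc:
        e.complete = True
    return "complete"


def wake_up(producer, consumer):
    """A new answer from `producer` reactivates `consumer`."""
    producer.black = True        # a subgoal that generates new answers turns itself black
    consumer.state_done = False  # the woken subgoal has work again
```

A first check that finds a black subgoal only whitens the SCC; once nothing is woken in between, a second check completes it.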

5 Implementation Framework

The instruction-level changes needed to parallelize the SLG-WAM are discussed below. Since the changes mainly involve adding concurrency features to tabling operations, and since SLG-WAM instructions generally correspond to primitive tabling operations, a detailed knowledge of the SLG-WAM is not required to understand this section. For further detail, the SLG-WAM instructions for definite programs are presented in [16]. Changes occur in the following stages of the tabling process:

- In the Tabled Subgoal Call operation;
- In the New Answer operation, when adding an answer for a tabled subgoal;
- In the Completion operation.

Note that the Answer Return operation of Section 3 is not affected by parallelization. We discuss each aspect in the following subsections.

Tabled Subgoal Call

As outlined in Section 3, three different kinds of actions can occur when a tabled subgoal is selected. We first review the actions which take place in a sequential system. (1) If a tabled subgoal is not already in the table, an entry is added for it and a new frame is created in the completion table. In the case of the SLG-WAM, the subtree rooted in this new subgoal is then explored via program clause resolution. (2) If the subgoal is in the table but is not completely evaluated, an active node is set up which will use answer clause resolution on the subgoal. This active node cannot be deallocated until the subgoal is known to be completely evaluated. Finally, (3) if the subgoal is in the table and completed, the answers are treated as if they were facts, and the node may be deallocated as soon as it has backtracked through the answers in the table.

In the parallel model, at a call to a new tabled subgoal, the caller must arrange for that subgoal to be scheduled for execution. In addition, an active node is created which will consume answers, and a pointer to this node is added to the subgoal dependency list of the caller. The calling subgoal then suspends the current call and moves on to find other work in its own subtree. If, on the other hand, the selected subgoal is in the table and not complete, the subgoal dependency list is updated and resolution can then begin using answers in the table. Finally, if the subgoal is in the table and is completed, the actions are the same in the parallel and sequential models: the answers are treated as facts, and neither the completion table nor the subgoal dependency list needs to be updated.

Adding a new SLG subgoal to the table requires a lock to prevent multiple threads from simultaneously adding the same subgoal. In the SLG-WAM the table for each predicate is structured as a tree, so only a subtree needs to be locked, allowing a great deal of concurrency for this operation.
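A minimal Python sketch of the three-way decision at a tabled subgoal call, assuming a hypothetical SubgoalTable class. Subgoals are represented as tuples such as ("s", 2, "_"), so variant checking is reduced to tuple equality, and one lock per predicate approximates the per-predicate locking that the SLG-WAM's tree-structured tables permit (a real trie could lock a finer subtree). The return value names which of the three cases applies; scheduling the new generator and updating the caller's subgoal dependency list are left to the caller.

```python
import threading


class SubgoalTable:
    """Hypothetical global subgoal table shared by all threads."""

    def __init__(self):
        self._guard = threading.Lock()  # protects creation of per-predicate locks
        self._pred_locks = {}           # predicate name -> lock for its part of the table
        self._entries = {}              # subgoal -> {"complete": bool}

    def _lock_for(self, pred):
        with self._guard:
            return self._pred_locks.setdefault(pred, threading.Lock())

    def call(self, subgoal):
        """Return "new", "incomplete" or "complete" for a call to `subgoal`."""
        with self._lock_for(subgoal[0]):  # only this predicate's entries are locked
            entry = self._entries.get(subgoal)
            if entry is None:
                # case (1): create the entry; the caller schedules a generator for it
                self._entries[subgoal] = {"complete": False}
                return "new"
            # case (2): consume answers from the table; case (3): treat answers as facts
            return "complete" if entry["complete"] else "incomplete"

    def mark_complete(self, subgoal):
        with self._lock_for(subgoal[0]):
            self._entries[subgoal]["complete"] = True
```

If several threads call the same subgoal concurrently, exactly one of them sees "new" and becomes responsible for scheduling its generator; the rest see "incomplete" and consume answers.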

New Answer

In principle, parallelism requires minimal change to the operation of adding a new answer to a table. If the answer is not already in the table, the operation sets the subgoal's ColorFlag to black, indicating that other subgoals may have been affected. The AnswerFlag, which is a (pointer to a) list of pointers into the table, is updated as discussed in Section 4. Note that no locking is necessary when adding answers to a table, since there is a single generating node for each table (though possibly many consuming nodes).
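The single-writer argument can be illustrated with a hypothetical AnswerTable: only the generator appends, and each consumer keeps its own index of the first unexamined answer, playing the role of a pointer into the table. This is a sketch under assumptions: in CPython, list.append and len are effectively atomic with respect to other threads, which is what makes lock-free reading safe here; a native engine would need explicit memory-ordering guarantees instead. The ColorFlag is set only when the answer is genuinely new.

```python
class AnswerTable:
    """Hypothetical answer table for one tabled subgoal; written only by its generator."""

    def __init__(self):
        self.answers = []         # append-only list of answers
        self._index = set()       # fast duplicate check
        self.color_black = False  # ColorFlag (False means white)

    def new_answer(self, answer):
        """Generator side: add `answer` if it is new; return True if it was added."""
        if answer in self._index:
            return False  # duplicate: table and ColorFlag unchanged
        self._index.add(answer)
        self.answers.append(answer)
        self.color_black = True  # other subgoals may have been affected
        return True


class Consumer:
    """Consumer side: reads answers without locking, tracking its own position."""

    def __init__(self, table):
        self.table = table
        self.position = 0  # index of the first unexamined answer

    def unexamined(self):
        end = len(self.table.answers)  # snapshot of the current table size
        fresh = self.table.answers[self.position:end]
        self.position = end
        return fresh
```

Each consumer sees every answer exactly once, in table order, regardless of when it starts reading.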

A deeper issue in parallelization is how to schedule the return of new answers to existing active subgoals. For the parallel SLG-WAM, we are implementing a strategy somewhat akin to traditional breadth-first search.

Completion

In a manner similar to traditional semi-naive execution, executions of the completion operation in the sequential engine mark the boundaries of an iteration and control the scheduling of answer resolution. At the end of each iteration, the completion instruction checks whether new answers have been added or whether the SCC has instead reached a fixpoint. An important difference, however, is that while traditional semi-naive methods use static approximations of SCCs, the completion algorithm determines SCCs dynamically. Another difference stems from the fact that the choice point stack controls the scheduling for the engine. The completion instruction is executed when the engine fails back into a completion choice point frame. This instruction in turn might add answer return choice points to the choice point stack. These answers cause new search space to be explored and, in a terminating program, the engine eventually backtracks to the completion frame, causing a new iteration or determining a fixpoint.

This breadth-first strategy can be incorporated into parallel execution: whenever a subgoal executes a completion operation, it returns to its active subgoals any answers added since the last time it checked for completion. If no new answers have been added, then rather than determining the fixpoint immediately as in the sequential case, a parallel consuming subgoal must coordinate with others to determine whether its active subgoals have indeed been completely evaluated. If the current subgoal is a leader, it checks for the fixpoint as described in Section 4. Otherwise, it must poll at intervals to see whether new answers have been added.
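For readers more familiar with bottom-up evaluation, the iteration structure referred to above can be seen in a plain semi-naive computation of s/2 from Fig. 1. The Python below is offered only as an analogy: it iterates with a delta relation over statically known rules and stops at the first iteration that derives nothing new, whereas the SLG-WAM discovers SCCs dynamically and drives iterations through the choice point stack. The function name seminaive_s is hypothetical.

```python
def seminaive_s(b):
    """Semi-naive evaluation of
         s(X,Y) :- b(X,Y).
         s(X,Y) :- b(X,Z), s(Z,Y).
    Returns the fixpoint and the number of iterations, counting the final
    iteration that found no new answers."""
    s = set(b)      # answers from the non-recursive clause
    delta = set(b)  # answers not yet used in the recursive clause
    iterations = 0
    while delta:
        iterations += 1
        # join b only with the answers that were new in the previous iteration
        derived = {(x, y) for (x, z) in b for (z2, y) in delta if z == z2}
        delta = derived - s  # keep only genuinely new answers
        s |= delta
    return s, iterations
```

With the b/2 facts of Fig. 1, the first iteration adds s(2,4) and s(2,5), and the second derives nothing, which is the fixpoint.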
Algorithm 5.1 schedule_answers(Subgoal: S)
  If S.subgoal dependency list is empty
      set S.StateFlag to unconditionally done;
      deallocate the completion frame for S;
      exit;
  Else
      For each subgoal r_i in S.subgoal dependency list
          If (r_i is unconditionally complete and there are no unexamined answers)
              Remove r_i from list;
          Else If there are unexamined answers for r_i
              Set up choice points to backtrack through answers;

The parallel completion operation is specified in Algorithm 5.2, and the portion of it which schedules answer return is summarized in the procedure schedule_answers of Algorithm 5.1. Recall that the subgoal dependency list maintains, for each tabled subgoal, a list of the subgoals which are active in that subgoal's tree. In schedule_answers, a subgoal removes other subgoals from its subgoal dependency list as they become complete. Thus, if the list is empty, the subgoal can mark itself as unconditionally complete. Otherwise, it runs through its subgoal dependency list, removing any subgoals which have become unconditionally complete and scheduling answer resolution for the rest. For each subgoal in the subgoal dependency list, a choice point is set up to backtrack through unresolved answers in the table. The evaluation then returns answers to each subgoal by failing into these answer return choice points.

Algorithm 5.2 completion(Subgoal: S)
  If schedule_answers(S) exits
      exit;
  If S depends on a subgoal which is not complete
      If answers have been returned
          S.StateFlag = undone;
          backtrack through new answers;
      Else
          S.StateFlag = done;
          perform leader_detection(S);
          If S is the leader
              perform termination_detection(S);
          Else
              sleep(S);
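A minimal executable reading of Algorithm 5.1, under assumptions: subgoals are hypothetical Subgoal objects, the subgoal dependency list maps each dependee to the index of its first answer not yet examined, and instead of pushing answer-return choice points the function returns the (dependee, answer) pairs that would be scheduled. Returning None corresponds to the algorithm's exit branch.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # identity-based equality and hashing, so subgoals can be dict keys
class Subgoal:
    name: str
    answers: list = field(default_factory=list)  # answers in table order
    state: str = "undone"  # StateFlag: "undone", "done" or "unconditionally done"
    sdl: dict = field(default_factory=dict)  # subgoal dependency list: dependee -> next unexamined index


def schedule_answers(s):
    """Sketch of Algorithm 5.1. Returns the (dependee, answer) pairs that would get
    answer-return choice points, or None when S becomes unconditionally done."""
    if not s.sdl:
        s.state = "unconditionally done"  # the completion frame would be deallocated here
        return None
    scheduled = []
    for r in list(s.sdl):  # iterate over a copy because entries may be removed
        pending = r.answers[s.sdl[r]:]
        if r.state == "unconditionally done" and not pending:
            del s.sdl[r]  # r can never produce more answers for S
        elif pending:
            scheduled.extend((r, a) for a in pending)
            s.sdl[r] = len(r.answers)  # these answers now count as examined
    return scheduled
```

As in the algorithm, a completed dependee is dropped only after all of its answers have been returned, and S becomes unconditionally done on the first call that finds the list empty.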

In Algorithm 5.2, after a generator subgoal polls for answers (using procedure schedule_answers), it sets its StateFlag to undone if answers have been returned and continues with its normal sequential SLG evaluation. When it runs out of work (or if there was no work to begin with), it resets its StateFlag to done. If the current subgoal is not the leader of its SCC, it then sleeps; otherwise, it starts the termination detection described in Section 4.
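The flag transitions of Algorithm 5.2 for a non-leader subgoal can be sketched with a condition variable. This is an illustration under assumptions rather than the paper's mechanism: where the text has a non-leader poll at intervals, the hypothetical SubgoalState below blocks in threading.Condition.wait_for until either new answers arrive or the leader marks the SCC complete. On going to sleep, the StateFlag becomes done; on being woken with answers, it becomes undone again.

```python
import threading


class SubgoalState:
    """Hypothetical per-subgoal record used by the completion step."""

    def __init__(self):
        self.state = "undone"  # StateFlag
        self.pending = 0       # answers delivered but not yet consumed
        self.complete = False  # set by the leader after termination detection
        self.cv = threading.Condition()

    def deliver_answer(self):
        """Producer side: a new answer arrives; wake the subgoal if it sleeps."""
        with self.cv:
            self.pending += 1
            self.cv.notify()

    def mark_complete(self):
        """Leader side: the SCC has been completed."""
        with self.cv:
            self.complete = True
            self.cv.notify()

    def sleep_until_work(self):
        """Non-leader branch of completion: become done, then block until new
        answers arrive (return their number) or the SCC completes."""
        with self.cv:
            self.state = "done"
            self.cv.wait_for(lambda: self.pending > 0 or self.complete)
            if self.complete:
                return "complete"
            self.state = "undone"  # woken up: there are answers to consume
            count, self.pending = self.pending, 0
            return count
```

Because wait_for checks its predicate before blocking, an answer delivered just before the subgoal goes to sleep is not lost.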

6 Conclusion

Tabling adds important termination properties to evaluations, whether they are sequential or parallel. In doing so, it may avoid recomputation and reduce the complexity of some programs, while adding unnecessary overhead to others. In an actual parallel system, then, one would expect some predicates to use table-parallelism, others to use and- or or-parallelism, and still others to be executed sequentially. To attain this level of integration in a parallel engine, it must be possible to detect dynamic SCCs and their completion, so that resources may be released and negation performed correctly. Section 4 provided algorithms for ensuring this, and Section 5 indicated how the basic tabling operations must change to implement table-parallelism.

While the development in this paper has taken place within the framework of SLG and the SLG-WAM, we believe that many of the results here are generally applicable to other tabling-style evaluation strategies and engines. For instance, magic set evaluations, which determine SCCs statically, could pick leaders statically and would not require the leader detection algorithm presented here; the termination algorithm and its associated flags, however, would still allow detection of the fixpoint of an SCC. Indeed, keeping in mind the well-known similarities between tabling and magic evaluation (see e.g. [17]), table-parallelism can also be seen as a way to parallelize database queries that may include recursion. In our implementation framework, only three elements are shared: the table itself, the completion table, and the subgoal dependency list. If the table itself were distributed using techniques such as those in [19], and if the completion algorithms were distributed, then a model of distributed table-parallelism could be formulated which would serve as a general mechanism for distributed data queries. Our current efforts regarding table-parallelism are thus twofold: in addition to implementing the shared-memory parallelism sketched here, we are also exploring extensions of these algorithms to distributed table-parallelism.

Acknowledgements: We are indebted to Phil Lewis for his contributions to the design of the termination and leader detection algorithms.

References

1. K. R. Apt. Correctness proofs of distributed termination algorithms. ACM Transactions on Programming Languages and Systems, 8(3):388–405, July 1986.
2. F. Bancilhon and R. Ramakrishnan. An amateur's introduction to recursive query processing strategies. In Proc. of SIGMOD 1986 Conf., pages 16–52. ACM, 1986.
3. R. Bol and L. Degerstedt. Tabulated resolution for the well-founded semantics. In Proc. ILPS'93 Workshop on Programming with Logic Databases. MIT Press, 1993.
4. W. Chen, T. Swift, and D. S. Warren. Efficient implementation of general logical queries. J. Logic Programming. To appear.
5. W. Chen and D. S. Warren. Query evaluation under the well-founded semantics. In Proc. of 12th PODS, pages 168–179, 1993.
6. M. Codish and B. Demoen. Analysing logic programs using 'prop'-ositional logic programs and a magic wand. In Proc. of the Int'l Symp. on Logic Programming, pages 114–130, 1993.
7. S. Dawson, C. R. Ramakrishnan, I. V. Ramakrishnan, K. Sagonas, S. Skiena, T. Swift, and D. S. Warren. Unification factoring for efficient execution of logic programs. In Proc. of the 22nd Symp. on Principles of Programming Languages. ACM, 1995.
8. S. Dawson, C. R. Ramakrishnan, and D. S. Warren. Using XSB for abstract interpretation. In Special Workshop on Abstract Interpretation, 1995. Eilat, Israel. To appear.
9. E. W. Dijkstra, W. H. J. Feijen, and A. J. M. van Gasteren. Derivation of a termination detection algorithm for distributed computations. Information Processing Letters, pages 217–219, June 1983.
10. N. Francez. Program Verification. Addison-Wesley, 1992.
11. J. Freire, R. Hu, T. Swift, and D. S. Warren. Exploiting parallelism in tabled evaluations. Technical report, SUNY at Stony Brook, 1995. Full version available at http://www.cs.sunysb.edu/~sbprolog.
12. M. Hermenegildo and F. Rossi. On the correctness and efficiency of independent and-parallelism in logic programs. In N. Amer. Conf. on Logic Programming, 1989.
13. M. Hermenegildo and F. Rossi. Non-strict independent and-parallelism. In Logic Programming: Proc. of the 5th Int'l Conf., pages 237–252, 1990.
14. H. Seki. On the power of Alexander templates. In Proc. of 8th PODS, pages 150–159. ACM, 1989.
15. P. Stuckey and S. Sudarshan. Well-founded ordered search. In 13th Conference on Foundations of Software Technology and Theoretical Computer Science, pages 161–172, 1993.
16. T. Swift and D. S. Warren. An abstract machine for SLG resolution: definite programs. In Proceedings of the Symposium on Logic Programming, pages 633–654, 1994.
17. T. Swift and D. S. Warren. Analysis of sequential SLG evaluation. In Proceedings of the Symposium on Logic Programming, pages 219–238, 1994.
18. H. Tamaki and T. Sato. OLDT resolution with tabulation. In Third Int'l Conf. on Logic Programming, pages 84–98, 1986.
19. J. D. Ullman. Flux, sorting, and supercomputer organization for AI applications. Journal of Parallel and Distributed Computing, 1:133–151, 1984.
20. L. Vieille. Recursive query processing: The power of logic. Theoretical Computer Science, 69:1–53, 1989.
21. D. H. D. Warren. An abstract Prolog instruction set. Technical report, SRI, 1983.
22. D. H. D. Warren. Or-parallel models of Prolog. In Proceedings of the International Conference on Theory and Practice of Software Development. Springer-Verlag, 1987.

This appendix contains only supplementary reference material for the convenience of the referees.

A Appendix: Correctness Proof of Completion Algorithm

In this section we provide an informal proof of the correctness of the completion algorithm presented in Sections 4 and 5. Program verification techniques, such as those described in [10, 1], are needed for a formal proof. The proof here is based on the distributed completion proof of [9].

Algorithm A.1 evaluation(S: subgoal)
  while (true)
      if (S.StateFlag == unconditionally done)
          deallocate entry for S in completion table;
          exit;
      computation(S);
      leader_detection(S);
      termination_detection(S);

At any point during an evaluation, a subgoal is performing one of the following actions, as can be seen in Algorithm A.1: checking whether it is unconditionally done; resolving answer and program clauses (computing); performing leader detection; or executing termination detection. The following properties are assumed as invariants for each subgoal S throughout evaluation:
1. There is only one entry for each subgoal in the answer and completion tables.
2. S has consumed all answers from the previous iterations.
3. S.SDL (the subgoal dependency list of S) contains all subgoals S depends on.
4. After S finishes its computation, S.StateFlag is set to done.

Locking the answer and completion tables at Subgoal Call guarantees that property 1 holds. Property 2 holds because answers are scheduled on the choice point stack, on top of the root choice point, so the engine only starts a new iteration (failing over the root) after all answers are consumed. Properties 3 and 4 follow from the specifications of Subgoal Call and completion described in Section 5. Let I denote the conjunction of these properties; I should hold as a pre- and post-condition for the procedures in Algorithm A.1.

After a subgoal finishes computation, it starts leader detection. Since multiple subgoals might start leader detection simultaneously, we have to guarantee that a single subgoal will succeed and mark itself as the leader of the SCC it is part of. While executing leader detection, a subgoal traverses its subgoal dependency list (which contains all subgoals it directly depends on) to discover the topology of its detected SCC. Notice that the topology of the detected SCC might change after leader detection. If the detected SCC is not the same as the true SCC, termination detection will fail, because one of the subgoals will have its ColorFlag marked as black. In this case, leader detection will be performed again by the leader of the true SCC, in accordance with Algorithm 5.2. In fact, based on the graph-theoretic facts that (1) SCCs are disjoint, and (2) there exists exactly one leader (the node with minimal DFN) for each SCC, it can be proven that in an evaluation with a finite number of subgoals and answers, Algorithm 4.1 (leader detection) will eventually choose a unique leader for each true SCC. The argument is essentially a formalization of the one given in Section 4.

If a single (and correct) leader is elected during leader detection, the termination detection phase will complete the corresponding SCC if and only if all of its components are done, that is, they have no more program or answer clauses to resolve. Termination detection is a deeper property than detection of a unique leader and is more difficult to prove. The termination detection algorithm is stated in Algorithm 4.2, and its correctness is derived from the following theorem.

Theorem 4. (Termination Detection Algorithm) Given a set S of mutually dependent tabled subgoals, Algorithm 4.2 will mark as complete the subgoals in S iff $\forall A \in S$, A is completely evaluated.
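As a sequential point of reference for what leader detection should eventually compute, the sketch below finds the SCCs of a subgoal dependency graph with Tarjan's algorithm and names, for each SCC, the member with minimal DFN as its leader. It is not Algorithm 4.1, which discovers SCCs concurrently and must cope with a changing graph; the graph encoding (DFN numbers as nodes, with an edge from each subgoal to the subgoals it depends on) and the function name scc_leaders are assumptions.

```python
def scc_leaders(graph):
    """graph maps each subgoal's DFN to the DFNs of the subgoals it depends on.
    Returns {leader DFN: frozenset of the DFNs in that leader's SCC}."""
    index, low = {}, {}  # Tarjan discovery index and low-link per node
    stack, on_stack = [], set()
    leaders = {}
    counter = [0]  # mutable so the nested function can update it

    def visit(v):
        index[v] = low[v] = counter[0]
        counter[0] += 1
        stack.append(v)
        on_stack.add(v)
        for w in graph.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of an SCC: pop all of its members
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            leaders[min(component)] = frozenset(component)  # leader = minimal DFN

    for v in sorted(graph):
        if v not in index:
            visit(v)
    return leaders
```

Because SCCs are disjoint, each subgoal ends up in exactly one component, and each component has exactly one minimal-DFN member, matching facts (1) and (2) used above.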

Proof:

($\Rightarrow$) Only the leader of an SCC executes the termination detection phase. At that time, if all subgoals in the SCC are completely evaluated, the leader completes the tables and signals the other subgoals to terminate. If any subgoal is still busy, the leader waits until it becomes idle before resuming the check. If the leader finds any subgoal with a black ColorFlag, it whitens the ColorFlags of all other subgoals in its SCC and suspends the completion check.

In the sequel, a series of refinements of an invariant P will be constructed. P will capture various system states, starting with some simplifying assumptions and evolving to the full problem. While executing termination detection, root, the root of the SCC, checks the status of subgoals $r_{n-1}, \ldots, r_1$, root, in this order. A necessary condition for completing the SCC is that root's check succeeds for all $r_i$. Let $t$ denote the number of the subgoal whose flags are currently being checked by root; completion checking then ends with $t = 0$.

As a first step, we solve the problem in the absence of wakeups: in that case an undone (active) subgoal can become done (passive) when it has no work left, but a done subgoal cannot become undone. The conclusion that all subgoals are done has to follow from (i) the invariant $P_0$ shown below; (ii) $t = 0$; and (iii) the fact that the flags of root itself are done. Furthermore, $P_0$ has to hold, independently of the value of $t$, when root has initiated the completion checking, i.e., when $t = n-1$. These requirements are met by

$P_0: (\forall i: t < i < n: r_i \text{ is done})$

We can now design the checking of the AnswerFlag and StateFlag of each subgoal so as to keep $P_0$ invariant. Initiating completion checking establishes $P_0$. Moving on to the next subgoal, however, decreases $t$ by 1 and may falsify $P_0$. Invariance of $P_0$ is achieved by adopting:

Rule A.1 The root waits until subgoal $r_{i+1}$ is done before it checks $r_i$'s StateFlag and AnswerFlag.

In Algorithm 4.2, Rule A.1 corresponds to the statement
  While ((r_i.StateFlag == undone) or (r_i.AnswerFlag == undone));
When the flags of the other subgoals are done, the check will in due time reach root; when, in addition, root itself is done, termination can be concluded. The above discussion comprises the rules for checking flags. Next to be taken into account is the possibility of wakeup: $P_0$ is falsified when a subgoal $r_i$ with $t < i$ becomes undone on account of being woken up. Since only undone subgoals wake up other subgoals, we deduce that the wakeup that falsified $P_0$ was initiated by a subgoal $r_j$ with $j \le t$. To handle this situation, we adopt the weaker invariant $P_0 \lor P_1$, chosen so that any wakeup that may falsify $P_0$ establishes $P_1$. To this end, each subgoal is postulated to be either black or white (represented by its ColorFlag). For $P_1$ we choose:

$P_1: (\exists j: 0 \le j \le t: r_j \text{ is black})$

Wakeup is prevented from falsifying $P_0 \lor P_1$ by adopting the following rule:

Rule A.2 A subgoal that wakes up another subgoal with a number higher than its own makes itself black.⁸

We have to verify, however, that the information available at root, combined with the weaker $P_0 \lor P_1$, still suffices to conclude termination. Since $(t = 0 \land \text{root is white}) \Rightarrow \lnot P_1$, detection of termination has not been disabled. With the possibility of a subgoal being black, a new phenomenon has been introduced, namely unsuccessful completion checking: when a black subgoal is detected, the conclusion of termination cannot be drawn. As a first measure, this problem can be tackled by adopting:

Rule A.3 After failing the completion check, root restarts checking for completion.

⁸ For consistency with Section 5, a relaxed version of Rule A.2 is used in Algorithm 4.2. This relaxed rule effectively states that any subgoal waking up another subgoal blackens itself.

This corresponds to the statements
  If (r_k.ColorFlag == black)
      suspend and restart leader detection later;
in Algorithm 4.2. Without the possibility of transitions from black to white, such a recheck would fail again, since a black subgoal would remain black. Our next task is therefore to ensure that whitenings do not falsify the invariant $P_0 \lor P_1$. Since initiating a completion check establishes $P_0$, we can safely adopt:

Rule A.4 Root initiates a completion check by making itself white and starting to check the status of subgoal $r_{n-1}$.

Root can thus safely be whitened, and we turn to the other subgoals. Since whitening a subgoal can falsify only $P_1$, and does not do so when that subgoal's number exceeds $t$, we can safely adopt:

Rule A.5 When it restarts completion checking, the root whitens all subgoals with number greater than t.

Since restarting completion checking changes the value of $t$ to $n-1$, which establishes $P_0$, we can further adopt:

Rule A.6 When it restarts completion checking, root whitens all subgoals from $0$ to $n-1$.

In Algorithm 4.2 this corresponds to the statements
  If (r_k.ColorFlag == black)
      For each subgoal r_i in S's SCC
          r_i.ColorFlag = white;
The above whitening protocols suffice to prove that when the algorithm terminates, all subgoals of the SCC are completed. An iteration of completion checking initiated after all subgoals are completely evaluated will end with all subgoals white; hence a reiteration of the completion check is guaranteed to succeed.

($\Leftarrow$) We now prove that, given a set S of mutually dependent tabled subgoals, if $\forall A \in S$, A is completely evaluated, then the completion algorithm (Algorithm 4.2) will mark as complete the subgoals in S. Let us define a new invariant, completed: completed is true if the leader has initiated termination detection and all the subgoals in its SCC are completely evaluated. Let ControlPart be the set of all states formed by each thread executing either termination detection or leader detection:

$\mathit{ControlPart} = [\,[T_1 :: [\mathit{LeaderDetection}; \mathit{TerminationDetection}]]\ \ldots\ [T_n :: [\mathit{LeaderDetection}; \mathit{TerminationDetection}]]\,]$

To prove the above, the following statement has to hold in the sense of total correctness: if a leader enters ControlPart with the pre-conditions I and completed, it will leave ControlPart with the StateFlags of all subgoals in the SCC set to unconditionally done:

$\{I \land \mathit{completed}\}\ \mathit{ControlPart}\ \{\text{StateFlag is unconditionally done}\}$

This statement holds for evaluation because:
1. there exists exactly one node that will be detected as leader by Algorithm 4.1; and
2. since all subgoals in the SCC have their AnswerFlags and StateFlags set to done (because completed holds), the SCC is completely evaluated, and therefore all subgoals will be marked as unconditionally done. □

The concurrent execution of computation, leader detection and termination detection by multiple subgoals might result in starvation and deadlocks. We prove below that our algorithms are deadlock-free and lead to no starvation.
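The invariant argument used in the proof of Theorem 4 can be exercised mechanically. The Python below is a randomized simulation written for this purpose, not code from the paper: subgoal 0 is root, each subgoal is active (undone) or passive (done) and black or white, and random steps interleave subgoal actions (finishing work, or waking a passive subgoal and blackening itself per Rule A.2) with single steps of root's check under Rules A.1, A.3, A.4 and A.6. The assertions check that $P_0 \lor P_1$ holds before every step and that a successful check is reported only when every subgoal is done. All names are hypothetical.

```python
import random


def invariant_holds(active, black, t):
    """P0 or P1: every subgoal numbered above t is done, or some subgoal
    numbered 0..t is black."""
    p0 = not any(active[t + 1:])
    p1 = any(black[: t + 1])
    return p0 or p1


def simulate(n, steps, seed):
    """Return True if root detects termination within `steps` random steps."""
    rng = random.Random(seed)
    active = [True] * n  # active[i]: subgoal r_i is undone; index 0 is root
    black = [False] * n  # ColorFlags, all initially white
    t = n - 1            # Rule A.4: the check starts at r_{n-1}
    for _ in range(steps):
        assert invariant_holds(active, black, t)
        if rng.random() < 0.5:
            i = rng.randrange(n)
            if not active[i]:
                continue  # only undone subgoals act
            if rng.random() < 0.5:
                active[i] = False  # r_i runs out of work
            else:
                k = rng.randrange(n)
                if not active[k]:
                    active[k] = True  # a new answer from r_i wakes up r_k
                    if k > i:
                        black[i] = True  # Rule A.2
        elif not active[t]:  # Rule A.1: root waits while r_t is undone
            if black[t]:
                black = [False] * n  # Rules A.3 and A.6: whiten all and restart
                t = n - 1
            elif t == 0:
                assert not any(active)  # termination is concluded soundly
                return True
            else:
                t -= 1  # r_t is done and white: move on to r_{t-1}
    return False
```

Turning the Rule A.2 line into a no-op is likely to make the invariant assertion fail on some seeds, which shows why a waking subgoal must blacken itself.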

Proposition 5. (Absence of Deadlock and Starvation) There is no deadlock or starvation in Evaluation (Algorithm A.1).

Proof. There are three possible sources of deadlock involving leader detection: within leader detection itself; between leader detection and termination detection; and between leader detection and computation. In the last case, the StateFlag is guaranteed to be marked as done as soon as the subgoal runs out of answer and program clauses to resolve (Algorithm 5.2), or as unconditionally done when all the subgoals it depends on are completed (Algorithm 5.1), and thus there cannot be any cyclic dependencies. The absence of deadlock within leader detection follows from an implicit ordering: only subgoals with smaller DFN wait for subgoals with higher DFN, not the other way around, and subgoals with higher DFN are guaranteed to mark themselves as not leader if they depend on a subgoal with a lower DFN. Another possible source of deadlock is when a subgoal $S_{\mathit{small}}$ depends on a subgoal $S_{\mathit{large}}$ with a higher DFN that is in a different SCC. In this case, $S_{\mathit{small}}$ will wait for $S_{\mathit{large}}$, which by Theorem 4 will eventually complete and mark its StateFlag as unconditionally done, thus breaking the dependency. Termination detection only waits for computation, and since computation need not wait for termination detection, deadlock is prevented.

Starvation might occur if subgoals repeatedly execute only leader detection and termination detection; we have to make sure that all possible computation is performed. A subgoal S is said to be completed if it is the leader of its SCC and all the other subgoals in the SCC are completely evaluated.

To prove that no starvation will occur, the following statement has to hold in the sense of total correctness: if a leader enters ControlPart with the pre-conditions I and $\lnot$completed, it will leave ControlPart with the StateFlags of the subgoals in the SCC not set to unconditionally done:

$\{I \land \lnot\mathit{completed}\}\ \mathit{ControlPart}\ \{\text{StateFlag is not unconditionally done}\}$

The intuition behind this statement is that if a leader is executing termination detection and its SCC is not completely evaluated, or if the topology of the SCC has changed in the meantime, termination detection will fail. At this point either computation or a new leader detection has to be scheduled.
