On Scheduling Collaborative Computations on the Internet, I: Mesh-Dags and Their Close Relatives (Extended Abstract)

Arnold L. Rosenberg
Department of Computer Science, University of Massachusetts Amherst, Amherst, MA 01003, USA
[email protected]

Supported in part by NSF Grant CCR-00-73401.

Abstract

Advancing technology has rendered the Internet a viable medium for collaborative computing, via mechanisms such as Web-Based Computing and Grid-Computing. We present a "pebble game" that abstracts the process of scheduling a computation-dag for computing over the Internet, including a novel formal criterion for comparing the qualities of competing schedules. Within this formal setting, we identify a strategy for scheduling the task-nodes of a computation-dag whose dependencies have the structure of a mesh of any finite dimensionality (a mesh-dag) that is optimal to within a small constant factor (to within a low-order additive term for 2- and 3-dimensional mesh-dags). We show that this strategy remains nearly optimal for a generalization of 2-dimensional mesh-dags whose structures are determined by abelian monoids (a monoid-based version of Cayley graphs).

1. Introduction

Advancing technology has rendered the Internet a viable medium for collaborative computing, via mechanisms such as Web-Based Computing (WBC, for short) and Grid-Computing (GC, for short). Both WBC and GC depend on remotely situated "volunteers" to collaborate in the computation of a massive collection of (usually compute-intensive) tasks. A WBC project proceeds essentially as follows. Interested volunteers register with a WBC website. Thereafter, each registered volunteer visits the website from time to time to receive a task to compute. Some time after completing the task, the volunteer returns the results from that task and receives a new task. And the cycle continues. Interesting examples of WBC projects are [13, 22], which benefit from Internet computing because of the sheer volume of their workloads, and [12, 14, 21], which benefit because of the computational complexity of their individual tasks.

A Computational Grid is a consortium of (usually geographically dispersed) computing sites that contract to share resources; see [3, 4, 7, 8]. In contrast to the "pull"-based allocation of tasks in a WBC project, GC projects often have Grid members "push" tasks to other members. Such differences aside, the collaboration inherent in both WBC and GC enables a large range of computations that cannot be handled efficiently by any fixed-size assemblage of dedicated computing agents (e.g., a multiprocessor or a NOW). Given the initial successes of Internet computing—see the cited sources—one can envision the Internet's becoming the computing platform of choice for a variety of computational problems that are prohibitively consumptive of resources on traditional computing platforms.

As with every major technological advance in collaborative computing, WBC and GC create significant new scheduling challenges even while enabling new levels of computational efficiency. In the case of WBC and GC, a major challenge arises from the fact that both WBC volunteers and participating members of a Grid promise to return the results of executing an allocated task only eventually. This lack of any time guarantee creates a nontrivial scheduling challenge when the tasks comprising the collaborative workload have interdependencies that constrain their order of execution. Specifically, such dependencies can potentially engender gridlock when no new tasks can be allocated for an indeterminate period, pending the execution of already allocated tasks. Although "safety devices" such as deadline-triggered reallocation of tasks address this danger, they do not eliminate it, since the "backup" WBC volunteer or Grid member assigned a given task may be as dilatory as the primary one.


In this paper, we study this scheduling problem for the common situation in which the collaborative workload's intertask dependencies are structured as a dag¹; cf. [9, 10]. Specifically, we study how to schedule the allocation of tasks of a computation-dag in a way that minimizes the danger of gridlock, by provably (approximately) maximizing the number of tasks that are eligible for execution at each step of the computation. The contributions of our study are of three types. (1) We develop a variant, tailored for Internet computing, of the pebble games that have proven so useful in the formal study of a large variety of scheduling problems. A major facet of devising such a computing model is developing a formal criterion for comparing the quality of competing schedules. We formulate such a criterion for computation-dags whose dependencies have the structure of a mesh of some finite dimensionality (a mesh-dag, for short). The quality-measuring component of our model formalizes and quantifies the intuitive goal that a schedule "maximizes the number of tasks that are eligible for execution at each step of the computation." (2) We identify a strategy for scheduling mesh-dags that is provably optimal (according to our formal criterion) to within a small constant factor (to within a low-order additive term for 2- and 3-dimensional mesh-dags). This strategy allocates task-nodes of a mesh-dag $\mathcal{M}$ along $\mathcal{M}$'s successive "diagonal" levels. (3) We augment this second contribution by showing that this level-by-level scheduling strategy remains nearly optimal for a generalization of 2-dimensional mesh-dags whose structures are determined by abelian monoids (a monoid-based version of Cayley graphs).

¹Precise definitions of all formal notions appear in Section 2.

2. A Formal Model for Internet Computing

In this section, we develop the entities that underlie our study: computation-dags and the pebble games that model the process of executing the dags' computations.

2.1. Computation-dags

A directed graph (digraph, for short) $\mathcal{G}$ is given by: a set of nodes $N_{\mathcal{G}}$ and a set of arcs (or, directed edges) $A_{\mathcal{G}}$, each having the form $(u \to v)$, where $u, v \in N_{\mathcal{G}}$. A path in $\mathcal{G}$ is a sequence of arcs that share adjacent endpoints, as in the following path from node $u_1$ to node $u_n$:

$$(u_1 \to u_2),\ (u_2 \to u_3),\ \ldots,\ (u_{n-2} \to u_{n-1}),\ (u_{n-1} \to u_n).$$

A dag (directed acyclic graph) $\mathcal{G}$ is a digraph that has no cycles; i.e., in a dag, the preceding path cannot have $u_1 = u_n$. When a dag $\mathcal{G}$ is used to model a computation (is a computation-dag): (a) each node $v \in N_{\mathcal{G}}$ represents a task; (b) an arc $(u \to v) \in A_{\mathcal{G}}$ represents the dependence of task $v$ on task $u$: $v$ cannot be executed until $u$ is. For each $(u \to v) \in A_{\mathcal{G}}$, we call $u$ a parent of $v$ and $v$ a child of $u$.

We purposely do not posit the finiteness of computation-dags. While the intertask dependencies in nontrivial computational jobs usually have cycles—typically caused by loops—it is useful to "unroll" these loops when scheduling the individual tasks in such jobs. This converts the job's (often modest-size) computation-digraph into an evolving set of expanding "prefixes" of what eventually becomes an enormous computation-dag. One often has better algorithmic control over the "steady-state" scheduling of such jobs if one expands these computation-dags to their infinite limits and concentrates on scheduling tasks in a way that leads to a computationally expedient set of evolving prefixes. In this paper, we study an important class of such jobs: the large variety of (say, linear-algebraic) computations involving often-enormous matrices, whose associated computation-dags are mesh-dags.

2.2. Mesh-structured computation-dags

Let $\mathbb{N}$ denote the set of nonnegative integers. For each positive integer $d$, the $d$-dimensional mesh-dag $\mathcal{M}_d$ has node-set $\mathbb{N}^d$, the set of $d$-tuples of nonnegative integers. The arcs of $\mathcal{M}_d$ connect each node $\langle v_1, v_2, \ldots, v_d \rangle \in \mathbb{N}^d$ to its $d$ children, $\langle v_1, v_2, \ldots, v_j + 1, \ldots, v_d \rangle$, for all $1 \le j \le d$. Node $\langle 0, 0, \ldots, 0 \rangle$ is $\mathcal{M}_d$'s unique parentless node.

The diagonal levels of the dags $\mathcal{M}_d$ play an essential role in our study. Each such level is the subset of $\mathcal{M}_d$'s nodes that share the sum of their coordinates: each level $\ell = 0, 1, \ldots$ of $\mathcal{M}_d$ is the set $L^{(d)}_{\ell} \stackrel{\mathrm{def}}{=} \{\langle v_1, \ldots, v_d \rangle \mid v_1 + \cdots + v_d = \ell\}$. See Fig. 1.

Figure 1. The first four diagonal levels of $\mathcal{M}_2$.
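As a concrete aid, the following short Python sketch (our own illustration, not code from the paper) enumerates a diagonal level $L^{(d)}_{\ell}$ of $\mathcal{M}_d$ and the parent/child relations; all function names are ours.

```python
from itertools import combinations

def diagonal_level(ell, d):
    """Yield the nodes of the d-dimensional mesh-dag M_d on diagonal level ell,
    i.e., all d-tuples of nonnegative integers whose coordinates sum to ell."""
    # Stars and bars: choose the d-1 "bar" positions among ell + d - 1 slots.
    for bars in combinations(range(ell + d - 1), d - 1):
        prev, node = -1, []
        for b in bars:
            node.append(b - prev - 1)
            prev = b
        node.append(ell + d - 2 - prev)
        yield tuple(node)

def children(node):
    """The d children of a node of M_d (one coordinate incremented)."""
    return [node[:j] + (node[j] + 1,) + node[j + 1:] for j in range(len(node))]

def parents(node):
    """The parents of a node of M_d (one positive coordinate decremented)."""
    return [node[:j] + (node[j] - 1,) + node[j + 1:]
            for j in range(len(node)) if node[j] > 0]

if __name__ == "__main__":
    # Level ell of M_2 has ell + 1 nodes; <0,0> is the unique parentless node.
    print(list(diagonal_level(3, 2)))   # [(0, 3), (1, 2), (2, 1), (3, 0)]
    print(children((0, 0)))             # [(1, 0), (0, 1)]
    print(parents((0, 0)))              # []
```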

2.3. The Web-Oriented Pebble Game

A variety of pebble games on dags have been shown, over several decades, to yield elegant formal analogues of a diverse range of problems related to scheduling the task-nodes of a computation-dag $\mathcal{G}$ [5, 11, 15]. Such games use tokens (called pebbles) to model the progress of a computation on $\mathcal{G}$: the placement or removal of pebbles of various types—which is constrained by the dependencies modeled by $\mathcal{G}$'s arcs—represents the changing (computational) status of the tasks represented by $\mathcal{G}$'s nodes. The pebble game that we study here is essentially the "no recomputation allowed" pebble game from [20], but it differs from that game in the resource one strives to optimize. For brevity, we describe only the WBC version of the game here; the simpler GC version should be apparent to the reader.

The rules of the game. The Web-Oriented (W-O, for short) Pebble Game on a dag $\mathcal{G}$ involves one player $S$, the Server, and an indeterminate number of players $C_1, C_2, \ldots$, the Clients. The Server has access to unlimited supplies of three types of pebbles: ELIGIBLE-BUT-UNALLOCATED (EBU, for short) pebbles, ELIGIBLE-AND-ALLOCATED (EAA, for short) pebbles, and EXECUTED pebbles. The Game's rules are described in Fig. 2. The W-O Pebble Game's moves reflect the successive stages in the "life-cycle" of a node in a computation-dag, from eligibility for execution through actual execution. The moves also reflect the danger of a game's being stalled indefinitely by dilatory Clients (the "gridlock" of Section 1).

The quality of a play of the game. Our goal is to determine how to play the W-O Pebble Game in a way that maximizes the number of EBU nodes at every moment—so that we maximize the chance that the Server has a task to allocate when approached by a Client. We believe that no single formalization of this goal is appropriate to all computation-dags; therefore, we focus on specific classes of dags of similar structures—mesh-like dags in this paper, tree-dags in [18]—and craft a formal quality criterion that seems appropriate for each class. We hope that the analyses that uncover the optimal scheduling strategies for these specific classes will suggest avenues toward crafting criteria that are appropriate for other specific classes of computation-dags.

We begin our formulation of a quality criterion by ignoring the distinction between EBU and EAA pebbles, lumping both together henceforth as ELIGIBLE pebbles. We do this because the number $n$ of ELIGIBLE pebbles at a step of the W-O Pebble Game is determined by the structure of the computation-dag in question and by the strategy we use when playing the Game. The breakdown of $n$ into the relative numbers of EBU and EAA pebbles reflects the frequency of approaches by Clients during that play—which is beyond our control! Given our focus on mesh-dags, we restrict attention to families $\mathbf{G}$ of computation-dags such that each $\mathcal{G} \in \mathbf{G}$:

• is infinite: the set $N_{\mathcal{G}}$ is infinite;
• has infinite width: for each integer $n > 0$, in every successful play of the Game on $\mathcal{G}$, there is a step at which $\ge n$ nodes of $\mathcal{G}$ contain ELIGIBLE pebbles.

One final simplifying assumption will lead us to a tractable formalization of our informal quality criterion that is suitable for mesh-like dags:

• Tasks are executed in the same order as they are allocated.

Within the preceding environment, we strive to maximize, at each step $t$ of a play of the W-O Pebble Game on a computation-dag $\mathcal{G}$, the number of ELIGIBLE pebbles on $\mathcal{G}$'s nodes at step $t$, as a function of the number of EXECUTED pebbles on $\mathcal{G}$'s nodes at step $t$. In other words, we seek a schedule $S$ for allocating ELIGIBLE tasks that produces the slowest-growing enabling function for $\mathcal{G}$:

$E_{\mathcal{G}}(n, S)$ $\stackrel{\mathrm{def}}{=}$ the number of nodes of $\mathcal{G}$ that must be EXECUTED under $S$ before there are $n$ ELIGIBLE pebbles on $\mathcal{G}$'s nodes.

In contrast to historical pebble games, which strive to assign EXECUTED pebbles as quickly as possible, we strive to allocate ELIGIBLE pebbles as quickly as possible.
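To make the criterion concrete, here is a small Python sketch (our illustration, not the paper's code) that tabulates the enabling function for a user-supplied execution order on $\mathcal{M}_2$, under the stated assumption that tasks are executed in the order they are allocated; the helper names are ours.

```python
def enabling_function_2d(schedule, n_max):
    """Empirically tabulate E_{M_2}(n, S): E[n] is the number of EXECUTED nodes
    just before n nodes of M_2 are simultaneously ELIGIBLE under schedule S,
    where S is given as the order in which task-nodes are executed."""
    executed = set()
    eligible = {(0, 0)}                      # the unique parentless node
    E, n_seen = {}, 0
    for node in schedule:
        n_now = len(eligible)
        while n_seen < min(n_now, n_max):    # record first time n pebbles are ELIGIBLE
            n_seen += 1
            E[n_seen] = len(executed)
        if n_seen >= n_max:
            break
        assert node in eligible, "a schedule may only execute ELIGIBLE nodes"
        eligible.remove(node)
        executed.add(node)
        x, y = node
        for cx, cy in ((x + 1, y), (x, y + 1)):          # children in M_2
            child_parents = [p for p in ((cx - 1, cy), (cx, cy - 1)) if min(p) >= 0]
            if all(p in executed for p in child_parents):
                eligible.add((cx, cy))
    return E

# A diagonal-level schedule for M_2: execute level 0, then level 1, and so on.
diag = [(i, l - i) for l in range(30) for i in range(l + 1)]
E = enabling_function_2d(diag, n_max=10)
# Theorem 1(a) predicts C(n,2) <= E[n] < C(n,2) + n for this schedule.
print([E[n] for n in range(1, 11)])
```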

3. Near-Optimal Schedules for Mesh-Dags

Theorem 1. (a) (i) For any schedule $S$ for $\mathcal{M}_2$, $E_{\mathcal{M}_2}(n, S) \ge \binom{n}{2}$. (ii) Any schedule $S$ for $\mathcal{M}_2$ that schedules nodes along the diagonal levels of $\mathcal{M}_2$ has $E_{\mathcal{M}_2}(n, S) < \binom{n}{2} + n$.

(b) (i) For any schedule $S$ for $\mathcal{M}_d$, $E_{\mathcal{M}_d}(n, S) = \Omega\bigl(n^{d/(d-1)}\bigr)$. (ii) Any schedule $S$ for $\mathcal{M}_d$ that schedules nodes along diagonal levels has $E_{\mathcal{M}_d}(n, S) = \Theta\bigl(n^{d/(d-1)}\bigr)$. The constants "hiding" in the big-$\Omega$ and the big-$\Theta$ depend on $d$ but not on $n$.

Proof Sketch. Our analysis of schedules for the mesh-dags $\mathcal{M}_d$ proceeds by induction on $d$. Part (a) forms the basis of our induction, part (b) its extension.

(a)(i) Focus on a time $t$ when $n$ nodes of $\mathcal{M}_2$ are ELIGIBLE. Call the induced subdag of $\mathcal{M}_2$ on the set of nodes $\{i\} \times \mathbb{N}$ (resp., $\mathbb{N} \times \{j\}$) the $i$th row (resp., the $j$th column) of $\mathcal{M}_2$. The dependencies of $\mathcal{M}_2$ guarantee that:

• No two ELIGIBLE nodes reside in the same row or the same column of $\mathcal{M}_2$.
• Every row- and column-ancestor (parent, parent's parent, ...) of each ELIGIBLE node must be EXECUTED by time $t$.

We must, thus, have $n$ distinct rows, each containing a distinct (perforce, nonnegative) number of EXECUTED nodes.

• At any step of the Game, $S$ may place an EBU pebble on any unpebbled parentless node of $\mathcal{G}$. /*Such nodes are always eligible for execution, having no parents whose prior execution they depend on.*/
• Say that Client $C_i$ approaches $S$ requesting a task. If $C_i$ has previously been allocated a task that it has not completed, then $C_i$'s request is ignored; otherwise, the following occurs.
  – If at least one node of $\mathcal{G}$ contains an EBU pebble, then $S$ gives $C_i$ the task corresponding to one such node and replaces that node's pebble by an EAA pebble.
  – If no node of $\mathcal{G}$ contains an EBU pebble, then $C_i$ is told to withdraw its request, and this move is a no-op.
• When a Client returns (the results from) a task-node, $S$ replaces that task-node's EAA pebble by an EXECUTED pebble. $S$ then places an EBU pebble on each unpebbled node of $\mathcal{G}$ all of whose parents contain EXECUTED pebbles.
• $S$'s goal is to allocate nodes in such a way that every node $v$ of $\mathcal{G}$ eventually contains an EXECUTED pebble. /*This modest goal is necessitated by the possibility that $\mathcal{G}$ is infinite.*/

Figure 2. The W-O Pebble Game.
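The following Python sketch (ours, not the paper's; names and the finite-prefix restriction are assumptions made so that the sketch is runnable) mirrors the Server's side of the rules in Fig. 2.

```python
import random

EBU, EAA, EXECUTED = "EBU", "EAA", "EXECUTED"

class WOPebbleGame:
    """A minimal sketch of the W-O Pebble Game of Fig. 2 on a finite dag."""

    def __init__(self, parents, children, sources):
        self.parents, self.children = parents, children
        # Rule 1: EBU pebbles go on unpebbled parentless nodes; in this finite
        # sketch we simply place them all at the start.
        self.pebbles = {v: EBU for v in sources}
        self.owner = {}          # maps an EAA node to the Client holding it

    def client_requests_task(self, client):
        """A Client approaches the Server S requesting a task."""
        if client in self.owner.values():
            return None          # ignore Clients holding an uncompleted task
        ebu_nodes = [v for v, p in self.pebbles.items() if p == EBU]
        if not ebu_nodes:
            return None          # no EBU pebble: the request is a no-op
        v = random.choice(ebu_nodes)
        self.pebbles[v] = EAA    # allocate: the EBU pebble becomes an EAA pebble
        self.owner[v] = client
        return v

    def client_returns_task(self, v):
        """A Client returns the results for task-node v."""
        assert self.pebbles.get(v) == EAA
        self.pebbles[v] = EXECUTED
        del self.owner[v]
        # Promote every unpebbled node all of whose parents are EXECUTED.
        for c in self.children(v):
            if c not in self.pebbles and all(self.pebbles.get(p) == EXECUTED
                                             for p in self.parents(c)):
                self.pebbles[c] = EBU

# Usage sketch on a 6-by-6 prefix of M_2 (bounds and names are ours).
def parents_2d(v):
    x, y = v
    return [p for p in ((x - 1, y), (x, y - 1)) if min(p) >= 0]

def children_2d(v):
    x, y = v
    return [c for c in ((x + 1, y), (x, y + 1)) if max(c) < 6]

game = WOPebbleGame(parents_2d, children_2d, sources=[(0, 0)])
task = game.client_requests_task("volunteer-1")
if task is not None:
    game.client_returns_task(task)
print(sum(p == EBU for p in game.pebbles.values()))   # number of EBU pebbles now
```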

The number of EXECUTED nodes must, therefore, be no smaller than $\sum_{k=0}^{n-1} k = \binom{n}{2}$.

(a)(ii) Let $S$ be any schedule for $\mathcal{M}_2$ that allocates ELIGIBLE nodes along successive diagonal levels. To see that $S$ achieves the advertised upper bound, note that: (1) The $n$ nodes of $\mathcal{M}_2$ that lie on level $L^{(2)}_{n-1}$ are all ELIGIBLE as soon as all of their $\binom{n}{2}$ ancestors have been EXECUTED. (2) While proceeding from the position wherein the ELIGIBLE nodes comprise level $L^{(2)}_{n-1}$ to the analogous position for level $L^{(2)}_{n}$, the number of EXECUTED nodes always has the form $\binom{n}{2} + k$ for some $k < n$. See Fig. 3.
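A concrete contrast (our illustration, not a claim made in the paper) indicates why diagonal-level allocation is the natural strategy here: after the $\binom{n}{2}$ nodes on levels $0, \ldots, n-2$ of $\mathcal{M}_2$ have been EXECUTED, the diagonal-level schedule has all $n$ nodes of level $L^{(2)}_{n-1}$ ELIGIBLE, whereas executing the same number of nodes down a single column (say $\langle 0,0\rangle, \langle 1,0\rangle, \ldots$) leaves only the two nodes $\langle k,0\rangle$ and $\langle 0,1\rangle$ ELIGIBLE after $k$ executions:

$$\underbrace{\binom{n}{2}}_{\text{EXECUTED nodes}} \;\longrightarrow\; \begin{cases} n \ \text{ELIGIBLE nodes} & \text{(diagonal-level schedule)},\\[2pt] 2 \ \text{ELIGIBLE nodes} & \text{(single-column schedule)}. \end{cases}$$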

(b)(i) At any time $t$ when $n$ nodes of $\mathcal{M}_d$ are ELIGIBLE for execution, $\mathcal{M}_d$'s dependencies guarantee that all ancestors of each ELIGIBLE node of $\mathcal{M}_d$ must be EXECUTED. Consequently:

Fact 1. For each index $a \in \{1, 2, \ldots, d\}$: for each $\langle u_1, \ldots, u_{a-1}, u_{a+1}, \ldots, u_d \rangle \in \mathbb{N}^{d-1}$, there is at most one $u_a \in \mathbb{N}$ such that $\langle u_1, \ldots, u_{a-1}, u_a, u_{a+1}, \ldots, u_d \rangle$ is an ELIGIBLE node of $\mathcal{M}_d$.

We operate with the following inductive hypothesis.

Inductive Hypothesis. For each dimensionality $\delta < d$, there exists a constant $c_\delta > 0$ such that: for any schedule $S$ for $\mathcal{M}_\delta$, for all $n$, $E_{\mathcal{M}_\delta}(n, S) \ge c_\delta\, n^{\delta/(\delta-1)}$.

Let us decompose $\mathcal{M}_d$ into slices, each of which is a copy of $\mathcal{M}_{d-1}$, by fixing the value in a single coordinate-position. By induction, any resulting slice that contains $m$ ELIGIBLE nodes of $\mathcal{M}_d$ contains $\ge E_{\mathcal{M}_{d-1}}(m, S) \ge c_{d-1}\, m^{(d-1)/(d-2)}$ EXECUTED nodes. By focusing only on the mutually disjoint slices corresponding to fixing $u_1$ at the successive values $0, 1, \ldots$, in the light of the just-described inequalities, we infer that, for any schedule $S$,

$$E_{\mathcal{M}_d}(n, S) \;\ge\; c_{d-1} \cdot \min \Bigl\{ \sum_{j=1}^{r} m_j^{(d-1)/(d-2)} \;:\; m_0 + \cdots + m_r = n,\ m_0 > \cdots > m_r > 0 \Bigr\}. \qquad (3.1)$$

The intention here is that, of the $n$ ELIGIBLE nodes of $\mathcal{M}_d$, $m_j$ appear in the slice $u_1 = j$, for $j = 0, 1, \ldots, r$. The requirement that parents of ELIGIBLE nodes be EXECUTED nodes, coupled with Fact 1, guarantees that $m_0 > \cdots > m_r$, which in turn guarantees that $m_0 > r$. By the convexity of the function $x^{(d-1)/(d-2)}$, the sum in (3.1) is minimized when the $m_i$ are as close to equal as possible. We are, thus, led to the following assignment of the desired $n$ ELIGIBLE nodes to slices. We start with $m_0 = \lfloor n^{1/(d-1)} \rfloor$ and each $m_j = m_0 - j$. If the thus-assigned $m_j$ sum to less than $n$, then we increment $m_0 \leftarrow m_0 + 1$, $m_1 \leftarrow m_1 + 1, \ldots$, in turn, proceeding until we have a total of $n$ ELIGIBLE nodes. We thus have

$$E_{\mathcal{M}_d}(n, S) \;\ge\; c_{d-1} \sum_{j=1}^{\lfloor n^{1/(d-1)} \rfloor} j^{\,d-1} \;=\; \Omega\bigl(n^{d/(d-1)}\bigr).$$
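For orientation, the size of the final sum can be checked with a standard integral comparison (our arithmetic, not spelled out in the paper):

$$\sum_{j=1}^{\lfloor n^{1/(d-1)} \rfloor} j^{\,d-1} \;\ge\; \int_{0}^{\lfloor n^{1/(d-1)} \rfloor} x^{d-1}\, dx \;=\; \frac{\lfloor n^{1/(d-1)} \rfloor^{d}}{d} \;=\; \Omega\bigl(n^{d/(d-1)}\bigr),$$

with the hidden constant depending on $d$ only, as claimed in Theorem 1.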

(b)(ii) Any schedule $S$ that allocates ELIGIBLE nodes of $\mathcal{M}_d$ along successive diagonal levels achieves the advertised upper bound. This is because the $\binom{m+d-1}{d-1}$ nodes on level $L^{(d)}_m$ of $\mathcal{M}_d$ are all ELIGIBLE whenever all of their

$$\sum_{j=0}^{m-1} \binom{j+d-1}{d-1} \;=\; \binom{m+d-1}{d}$$

ancestors have been EXECUTED. As schedule $S$ proceeds from the position wherein the ELIGIBLE nodes comprise level $L^{(d)}_m$, to the position wherein the ELIGIBLE nodes comprise level $L^{(d)}_{m+1}$, the number of EXECUTED nodes always has the form $\binom{m+d-1}{d} + \ell$ for some $\ell \le |L^{(d)}_m| = \binom{m+d-1}{d-1}$. Thus, when there are $n$ ELIGIBLE nodes, $n$ lies in the range $\binom{m+d-2}{d-1} < n \le \binom{m+d-1}{d-1}$ for some level index $m$, at which point at most $\binom{m+d-1}{d}$ nodes have been EXECUTED; it follows that $E_{\mathcal{M}_d}(n, S) = O\bigl(n^{d/(d-1)}\bigr)$.
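For reference, the counting facts invoked in part (b)(ii) are the standard identities (stars-and-bars, the hockey-stick identity, and Pascal's rule):

$$\bigl|L^{(d)}_m\bigr| \;=\; \binom{m+d-1}{d-1}, \qquad \sum_{j=0}^{m-1} \binom{j+d-1}{d-1} \;=\; \binom{m+d-1}{d}, \qquad \binom{m+d-1}{d} + \binom{m+d-1}{d-1} \;=\; \binom{m+d}{d}.$$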

Figure 3. The start of a diagonal-level computation of $\mathcal{M}_2$: "X" denotes an EXECUTED node; "E" denotes an ELIGIBLE node.
4. Mesh-Dags' Close Relatives

Our level-by-level scheduling strategy remains nearly optimal for a generalization of 2-dimensional mesh-dags whose structures are determined by abelian monoids (a monoid-based version of Cayley graphs); we write $\oplus$ for the monoid operation and $\mathcal{M}$ for the resulting computation-dag.³ Now, if node $x \oplus y$ of $\mathcal{M}$ is ELIGIBLE at some step $t$ of the W-O Pebble Game, then all of the nodes $\{u \oplus v \mid [u < x] \text{ and } [v < y]\}$ must be EXECUTED by step $t$.

³When a generator of the monoid coincides with the identity, the digraph $\mathcal{M}$ has self-loops, hence is not acyclic.

Figure 4. A schematic depiction of Fact 3. Node A is $a \oplus b = c \oplus d$; Node B is $a \oplus b \oplus e = c \oplus d \oplus e = a \oplus e \oplus b$.

References

[1] F.S. Annexstein, M. Baumslag, A.L. Rosenberg (1990): Group action graphs and parallel architectures. SIAM J. Comput. 19, 544–569.
[2] A.R. Butt, S. Adabala, N.H. Kapadia, R. Figueiredo, J.A.B. Fortes (2002): Fine-grain access control for securing shared resources in Computational Grids. Intl. Parallel and Distr. Processing Symp. (IPDPS'02).
[3] H. Casanova (2002): Distributed computing research issues in Grid Computing. Typescript, Univ. California, San Diego.
[4] W. Cirne and K. Marzullo (1999): The Computational Co-Op: gathering clusters into a metacomputer. 13th Intl. Parallel Processing Symp., 160–166.
[5] S.A. Cook (1974): An observation on time-storage tradeoff. J. Comp. Syst. Scis. 9, 308–316.
[6] V.V. Dimakopoulos and N.J. Dimopoulos (2001): Optimal total exchange in Cayley graphs. IEEE Trans. Parallel and Distr. Systs. 12, 1162–1168.
[7] I. Foster and C. Kesselman [eds.] (1999): The Grid: Blueprint for a New Computing Infrastructure. Morgan-Kaufmann.
[8] I. Foster, C. Kesselman, S. Tuecke (2001): The anatomy of the Grid: enabling scalable virtual organizations. Intl. J. Supercomputer Applications.
[9] A. Gerasoulis and T. Yang (1992): A comparison of clustering heuristics for scheduling dags on multiprocessors. J. Parallel Distr. Comput. 16, 276–291.
[10] L. He, Z. Han, H. Jin, L. Pan, S. Li (2000): DAG-based parallel real time task scheduling algorithm on a cluster. Intl. Conf. on Parallel and Distr. Processing Techniques and Applications (PDPTA'2000), 437–443.
[11] J.-W. Hong and H.T. Kung (1981): I/O complexity: the red-blue pebble game. 13th ACM Symp. on Theory of Computing, 326–333.
[12] The Intel Philanthropic Peer-to-Peer program. ⟨www.intel.com/cure⟩.
[13] E. Korpela, D. Werthimer, D. Anderson, J. Cobb, M. Lebofsky (2000): SETI@home: massively distributed computing for SETI. In Computing in Science and Engineering (P.F. Dubois, Ed.), IEEE Computer Soc. Press, Los Alamitos, CA.
[14] The Olson Laboratory Fight AIDS@Home project. ⟨www.fightaidsathome.org⟩.
[15] M.S. Paterson, C.E. Hewitt (1970): Comparative schematology. Project MAC Conf. on Concurrent Systems and Parallel Computation, ACM Press, 119–127.
[16] A.L. Rosenberg (1971): Data graphs and addressing schemes. J. Comput. Syst. Scis. 5, 193–238.
[17] A.L. Rosenberg (1972): Addressable data graphs. J. ACM 19, 309–340.
[18] A.L. Rosenberg (2002): On scheduling collaborative computations on the Internet, II: Tree-structured computation-dags. Typescript, Univ. Massachusetts.
[19] A.L. Rosenberg (2003): Accountable Web-computing. IEEE Trans. Parallel and Distr. Systs. 14, to appear.
[20] A.L. Rosenberg and I.H. Sudborough (1983): Bandwidth and pebbling. Computing 31, 115–139.
[21] The RSA Factoring by Web Project. ⟨http://www.npac.syr.edu/factoring⟩ (Foreword by A. Lenstra). Northeast Parallel Arch. Ctr.
[22] C. Weth, U. Kraus, J. Freuer, M. Ruder, R. Dannecker, P. Schneider, M. Konold, H. Ruder (2000): XPulsar@home — schools help scientists. Typescript, Univ. Tübingen.
[23] S.W. White and D.C. Torney (1993): Use of a workstation cluster for the physical mapping of chromosomes. SIAM News, March 1993, 14–17.