DIMACS Series in Discrete Mathematics and Theoretical Computer Science Volume 00, 0000

Actor Languages for Specification of Parallel Computations

GUL AGHA, WOOYOUNG KIM AND RAJENDRA PANWAR

Abstract. We describe high-level language constructs for specifying parallel programs and show how they may be used to provide modular specification of communication, synchronization and placement. The high-level constructs are translated into actors, which provide flexible low-level primitives for interconnecting distributed components and for efficient execution on concurrent computers. We argue that our linguistic constructs allow parallel program specifications that are easier to reason about and efficient to implement.

1. Introduction

Current methods for programming parallel computers involve very low-level mechanisms which allow efficient execution only on particular architectures. In order to raise the level of abstraction at which programs are specified, a number of requirements must be met by high-level languages for parallel computing. Specifically, in order for high-level parallel languages to be practical they must:

- provide constructs which abstract over the coordination structures that are common in concurrent computing;
- simplify programming by separating design concerns from implementation details;
- include only those language constructs which may be transformed into code that is efficiently executable on concurrent architectures.

Actors provide flexible low-level primitives for interconnecting components in order to support their efficient execution on concurrent computers (see Fig. 1).

1991 Mathematics Subject Classification. Primary 68N15; Secondary 68Q10. The research described has been made possible by support from the Office of Naval Research (ONR contract numbers N00014-90-J-1899 and N00014-93-1-0273), by an Incentives for Excellence Award from the Digital Equipment Corporation Faculty Program, and by joint support from the Defense Advanced Research Projects Agency and the National Science Foundation (NSF CCR 90-07195).


Figure 1. Local computation and coordination.

We build programming abstractions based on actors in order to provide high-level specification of communication, synchronization, scheduling and placement. Data and the procedures to manipulate them are encapsulated within actors, much as they are within objects in sequential computing. An advantage of object-oriented programming is that it separates how something is computed (the representation) from what is computed (the interface specification). In sequential languages, the order in which statements are executed is totally determined by the order in which they are given and by the call/return semantics of procedure invocations. For parallel computing, such linearization of the order of execution is an overspecification which is both inefficient and unnecessary. Moreover, specification of where something is to be done is meaningless on a single computer. In actor languages, constraints on the order of execution (when something may be done) and on placement (where it may be done) are specified separately from how and what. The resulting abstraction in the code promotes greater reuse of concurrency patterns, for example, when the same functionality is used in contexts requiring different reactive behavior.

The organization of this paper is as follows. In the next section we describe our computational model. Section 3 illustrates our programming model and motivates some of the linguistic abstractions we develop in subsequent sections. Section 4 describes the specification of synchronization constraints. Section 5 discusses group creation and communication abstractions. Section 6 describes one of the communication abstractions we provide. Abstractions for the specification of placement are discussed in Section 7. Finally, we briefly discuss some related research and open issues.

2. The Computational Model

Actors are independent concurrent objects that interact by sending asynchronous messages. Each actor has a unique mail address which may be used to send it messages. Moreover, a mail address may be included in messages sent to other actors; this allows those actors to communicate with the actor whose mail address they have received. Each actor has a behavior which determines how the actor responds to a given message. The behavior of an actor is defined by a script consisting of a set of methods (or procedures) and a list of acquaintances representing actors whose mail addresses are known to the actor. Communication in actors is buffered: incoming messages are queued until the actor is ready to respond to them.

Figure 2. Actions performed by an actor in response to a communication. In response to the n-th message, an actor specifies a new behavior X_{n+1} which will be used to process the next pending message. Newly created actors have their own mail address and corresponding mail queue.

The behavior of an actor consists of three kinds of actions; in response to a message, an actor may:

- send messages asynchronously to specified actors,
- create actors with specified behaviors, and
- become a new actor, assuming a new behavior to respond to the next message.

Figure 2 illustrates the behavior of an actor in response to a message. We represent each of the three kinds of actions that an actor may take with a corresponding primitive operator.

The send primitive operator causes a message to be put in the receiver's mail queue. Communication is point-to-point, so the recipient's identity (its mail address) must be specified. As noted above, mail addresses may also be communicated in a message, allowing for a dynamic communication topology. Each message invokes a method at the destination. Although the arrival order of messages is nondeterministic, every message sent to an actor is guaranteed to be delivered eventually.

The become primitive operator allows actors to change their behavior: become creates an anonymous actor to carry out the rest of the current computation, alters the behavior of the actor executing the become to be the behavior specified by its argument, and frees that actor to accept another message. This provides additional parallelism. The anonymous actor may send messages or create new actors in the process of completing its computation, but it will never receive any messages, since its mail address can never be known. Note that in open distributed systems, the order of arrival of messages from different external sources is nondeterministic. The become operator is used to provide the history-sensitive behavior necessary to model asynchronous access to shared resources in such systems. A canonical example of the use of become is in modeling a shared bank account accessible by two or more automatic teller machines. Although the current behavior of an actor is always a deterministic function of the sequence of messages that the actor has thus far received, the sequence cannot be predicted a priori.

The create primitive operator is used to dynamically create an actor with a specified behavior. It allocates a unique mail address to the newly created actor and returns this address. Thus, the mail address is returned to the creating context (in the actor executing the create) and is initially known only to the creating actor. Subsequently, it may be communicated to other actors.

The actor primitive operators form a simple but powerful set on which to build a wide range of higher-level abstractions and concurrent programming paradigms. In fact, the actor operators may be used to extend almost any standard sequential language to provide coordination and communication in a distributed environment. In this paper, we extend an imperative language with a syntax similar to C or Modula-3 with actor constructs to obtain a concurrent high-level language. The constructs in our language, Hal, include both the primitive actor operators defined above and high-level communication, synchronization and coordination abstractions. Moreover, we define a meta-architecture which provides a flexible execution model for actors. In particular, this meta-architecture is used to specify the placement or migration policy for a group of actors.

The Actor model was first described by Hewitt [17] and later developed in [1]. A mathematical theory of actors is developed in [4].
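To make the three primitives concrete, the following is a minimal sketch in Python. It is an illustration only, not Hal: the registry, the thread-per-actor scheme, and all names here are our own assumptions.

    import itertools
    import queue
    import threading

    _registry = {}                 # mail address -> actor (toy runtime state)
    _fresh = itertools.count()

    class Actor:
        def __init__(self, behavior):
            self.address = next(_fresh)        # unique mail address
            self.behavior = behavior           # current script
            self.mailbox = queue.Queue()       # buffered, queued communication
            _registry[self.address] = self
            threading.Thread(target=self._run, daemon=True).start()

        def _run(self):
            while True:
                msg = self.mailbox.get()       # take the next pending message
                self.behavior(self, msg)       # respond using the current behavior

    def send(address, msg):
        """Asynchronous point-to-point send: enqueue in the receiver's mail queue."""
        _registry[address].mailbox.put(msg)

    def create(behavior):
        """Create an actor with the specified behavior; return its new mail address."""
        return Actor(behavior).address

    def become(actor, new_behavior):
        """Assume a new behavior for processing the next message."""
        actor.behavior = new_behavior

    # History-sensitive shared resource: the canonical bank account example.
    def account(balance):
        def behavior(actor, msg):
            op, arg = msg
            if op == "deposit":
                become(actor, account(balance + arg))
            elif op == "balance":
                send(arg, balance)             # arg is the mail address to reply to
        return behavior

    addr = create(account(100))
    send(addr, ("deposit", 50))    # processed asynchronously by the account's thread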


3. Cholesky Factorization in Actors

We illustrate the actor operators by providing some example programs. In particular, because we are extending a sequential language, actor programs with different concurrency characteristics may be specified to implement the same algorithm. We use the Cholesky Decomposition algorithm for a dense matrix to illustrate the concurrency characteristics of the specifications that may be given. Moreover, the discussion will also motivate some abstractions for actors that we develop in subsequent sections.

For a given symmetric positive definite matrix A of size n x n, the Cholesky Factorization algorithm computes a lower triangular matrix L of size n x n such that A = LL^T [16]. The algorithm may be described as follows:

    A_{i1} = A_{i1} / sqrt(A_{11}),   for 1 <= i <= n                        (0)
    for j = 2:n
        A_{ij} = A_{ij} - sum_{k=1}^{j-1} A_{ik} * A_{jk},  for j <= i <= n  (1)
        A_{ij} = A_{ij} / sqrt(A_{jj}),                     for j <= i <= n  (2)
    end
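For reference, a direct sequential rendering of steps (0)-(2) follows (a sketch using NumPy; the paper's own implementations are written in Hal):

    import numpy as np

    def cholesky(A):
        """Return L with A = L L^T, following steps (0)-(2) above."""
        A = np.array(A, dtype=float)            # work on a copy
        n = A.shape[0]
        A[:, 0] /= np.sqrt(A[0, 0])             # step (0): scale the first column
        for j in range(1, n):                   # iterations j = 2..n (0-based here)
            for i in range(j, n):
                A[i, j] -= A[i, :j] @ A[j, :j]  # step (1): A_ij -= sum_k A_ik * A_jk
            A[j:, j] /= np.sqrt(A[j, j])        # step (2): A_jj becomes sqrt(A_jj)
        return np.tril(A)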

Figure 3 describes the communication patterns and the concurrency available in the Cholesky Decomposition algorithm. An uninteresting implementation of the Cholesky Decomposition algorithm is a coarse one: all matrix elements are encapsulated within a single actor, and the algorithm may be specified within a single method which implements it sequentially. In order to specify the algorithm with some parallelism, each row of the matrix may be encapsulated as an actor and the matrix itself may be represented as a group of actors. (We discuss linguistic support for a simple form of groups in Section 5.) The following specifies the behavior of a row actor which performs step 1 of an iteration in the Cholesky algorithm. The i-th iteration is started by sending a start_iteration message to the Row_actor whose row index value is equal to i.

behv Row_actor | row_index, element_array, next_iter |
    %% element_array is an array representing the elements of row row_index
    ...
    method start_iteration ()
        broadcast element_array with an iteration message to the rows below;
        start the next iteration;
    end


    method iteration (iter, row)
        prod = vec_vec_product(element_array, row);
        update element_array[iter] using prod and row[iter];
        next_iter = iter + 1;
    end
    ...
end

Figure 3. Communication patterns in the Cholesky Decomposition algorithm. In iteration k = 3, the elements of row 3 are broadcast to all rows below; each element a_ij computes a_ij * a_kj and sends the value to a_ik, which is updated to a_ik - a_ij * a_kj; the updated a_kk is then sent to all elements of column k = 3, and each element a_ik is updated to a_ik / sqrt(a_kk); iteration k = 4 is then started.
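A rough Python rendering of this row-per-actor scheme follows (a sketch: direct method calls stand in for Hal's asynchronous sends, so the iterations are sequentialized rather than pipelined, rows are 0-indexed, and all names are ours):

    import numpy as np

    class RowActor:
        """Holds one row of A and computes the corresponding row of L."""
        def __init__(self, row_index, elements):
            self.row_index = row_index
            self.elem = np.array(elements, dtype=float)
            self.next_iter = 0

        def iteration(self, it, row):
            # restrict iteration(iter, row) with (next_iter == iter)
            assert it == self.next_iter
            j = it                            # completed row j was broadcast
            self.elem[j] = (self.elem[j] - self.elem[:j] @ row[:j]) / row[j]
            self.next_iter = it + 1

        def start_iteration(self, rows_below):
            i = self.row_index
            self.elem[i] = np.sqrt(self.elem[i] - self.elem[:i] @ self.elem[:i])
            self.elem[i + 1:] = 0.0           # keep only the lower triangle
            for r in rows_below:              # broadcast elem to the rows below
                r.iteration(i, self.elem)

    A = np.array([[4.0, 2.0], [2.0, 5.0]])
    rows = [RowActor(i, A[i]) for i in range(len(A))]
    for i, r in enumerate(rows):
        r.start_iteration(rows[i + 1:])
    L = np.vstack([r.elem for r in rows])     # L L^T reconstructs A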

A third implementation encapsulates each matrix element as an actor. Such encapsulation expresses the maximal parallelism available in the algorithm. The behavior of an element actor is given as the behavior Element; its methods perform the computation given in step 1 of the Cholesky algorithm. The i-th iteration is initiated by broadcasting the start_iteration message to the element actors of row i.

behv Element | i, j, self_value |
    method start_iteration ()
        broadcast self_value with a multiply_row message to the column elements below;
        start the next iteration;
    end
    method multiply_row (row_element, iter)
        send update_column(self_value * row_element) to the i-th element of column iter;
    end
    method update_column (prod)
        self_value = self_value - prod;
    end
    ...
end

The efficiency of the above parallel algorithm depends not only on how the elements are mapped to actors (i.e., the domain decomposition), but also on how the resulting actors are placed. Placement specifications are discussed in Section 7.

Note that the Actor model guarantees the eventual delivery of a message but does not guarantee that the message send order is preserved on message delivery. In order to ensure the correct execution of the above version of Cholesky decomposition, we need some way to order the computational phases correctly. This is achieved in our model by allowing the recipient to enforce the correct order on message reception. A naive way of ensuring the correct order of message processing stipulates that the execution of one iteration be completed before starting the next. Such an implementation of the Cholesky Decomposition algorithm is inefficient compared to one which pipelines the execution of iterations by overlapping the communication and computation in the algorithm. In fact, without such pipelining, the Cholesky Decomposition algorithm is not scalable. We show this by looking at a specific analytic measure for scalability. Experimental measurements, reported elsewhere, also confirm the prediction of the analytic model [3].

The isoefficiency function specifies how the problem size must grow to keep the efficiency constant as the number of processors, p, increases. A small isoefficiency function means that a small increment in the problem size is sufficient for the efficient utilization of an increasing number of processors, indicating that the parallel algorithm is highly scalable. For a more detailed discussion of the isoefficiency function, see [22].

Using isoefficiency functions, Table 1 compares the scalability of the Cholesky Decomposition algorithm using sequentialized iterations with that using pipelined iterations on a mesh multicomputer. The analysis of the pipelined case depends on two measures, t_comp and t_comm, which are, respectively, the time taken to perform an iteration and the time taken for communication during an iteration of the algorithm (Figure 3).

    Scheduling Strategy                           Isoefficiency
    ------------------------------------------------------------
    Sequentialized Iterations                     O(p^{4.5})
    Pipelined Iterations (t_comp > t_comm)        O(p^3)
    Pipelined Iterations (t_comp < t_comm)        O(p^{4.5})

Table 1. Isoefficiency function of the Cholesky Decomposition algorithm using different scheduling strategies. t_comp and t_comm are the time taken to perform an iteration and the time taken for communication during an iteration of the algorithm, respectively (see Figure 3).

Suppose an m x m submatrix is assigned to each processor. Although the time to communicate a single number between two nodes is typically much greater than the time to compute a floating point number, t_comp is quadratically proportional to m while t_comm is only linearly proportional to m. Thus, t_comm > t_comp only for very small m. Implementation of the pipelined version is direct in the message-driven actor framework, which requires the recipient to order messages correctly. On the other hand, sequentializing iterations would require complex synchronization between multiple actors, which would be more difficult to express. In the next section, we describe our notation for a more abstract specification of local synchronization constraints.

4. Specifying Synchronization Constraints

As we discussed in the above example, messages corresponding to different iterations of the Cholesky Decomposition algorithm may exist in the system at the same time. Because the arrival order of messages is nondeterministic, messages may be delivered out of order; for example, an actor on processor P1 can send a message to a given actor r to do iteration k while processor P2 sends a message to r to do iteration k+1. Even if the message from P2 is sent after the message from P1, the two messages may follow different paths and reach r in the reverse order. Processing messages for two different iterations in the wrong order leads to incorrect results; the specification must impose the correct processing order on incoming messages.

In order to ensure the correct execution order without compromising efficiency, we use local synchronization constraints. Synchronization constraints specify a set of states under which a particular method of an object may be invoked by a given message. By explicitly postponing certain messages, synchronization constraints can guarantee the correct order of message reception, and thus the data consistency of the receiving actor. Moreover, synchronization constraints, unlike input guards in conventional process-oriented languages [19], do not cause the sender to wait until the recipient is in a state in which it can process the message. Thus synchronization constraints ensure maximal overlap of computation and communication. We specify synchronization constraints as follows:

    restrict <msg-expr> with ( <bool-expr> );

where <bool-expr> is a function of acquaintance variables and method arguments. Messages which match the pattern specified by <msg-expr> are delayed if the actor's current state does not satisfy the <bool-expr>. Such messages are put into the actor's pending queue and wait to be processed until state changes cause them to be no longer disabled. Using synchronization constraints is similar to, but more general than, the notion of enabled sets [27]. In enabled sets, a method cannot be enabled for some messages while disabled for others. In the Cholesky Decomposition example, the synchronization constraint for method iteration can be specified as follows:

    restrict iteration(iter, row) with (next_iter == iter);

The use of synchronization constraints simplifies programming in a large number of cases. For example, Figure 4 shows a printer spooler that receives messages from clients and a printer. A client sends a put message to the spooler to request the execution of a print job. When the printer is free, it sends a get message to the spooler to process the next print job. The spooler may use a synchronization constraint to delay invocation of the get message if there are no pending requests.

behv Spooler | nJobs |
    restrict get() with (nJobs > 0);
    method put (job) ... end
    method get () ... end
end
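One plausible way a runtime could realize such constraints is sketched below in Python. The mechanism (a pending queue that is re-examined after every state change) follows the description above, but ConstrainedSpooler and its method names are our own, not Hal's implementation:

    from collections import deque

    class ConstrainedSpooler:
        """A spooler whose get() is guarded by the constraint (nJobs > 0)."""
        def __init__(self):
            self.jobs = deque()
            self.pending = deque()        # postponed messages (the pending queue)

        def _enabled(self, selector):
            if selector == "get":
                return len(self.jobs) > 0     # restrict get() with (nJobs > 0)
            return True

        def receive(self, selector, *args):
            if not self._enabled(selector):
                self.pending.append((selector, args))  # postpone; sender never blocks
                return
            getattr(self, selector)(*args)
            self._retry_pending()             # state changed: re-check postponed messages

        def _retry_pending(self):
            for _ in range(len(self.pending)):
                selector, args = self.pending.popleft()
                if self._enabled(selector):
                    getattr(self, selector)(*args)
                else:
                    self.pending.append((selector, args))

        def put(self, job):
            self.jobs.append(job)

        def get(self):
            print("printing", self.jobs.popleft())

    s = ConstrainedSpooler()
    s.receive("get")            # no pending requests: the get is delayed
    s.receive("put", "job-1")   # enables the deferred get, which now runs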

The key advantage of using local synchronization constraints is that it frees the programmer from explicitly buffering and testing messages based on the actor's local state. Moreover, synchronization constraints can be incrementally modified when used with an inheritance mechanism, provided that they are specified on a per-class basis and separated from method definitions [14]. In particular, such constraints may be incrementally weakened, allowing substitutability of subclasses with respect to liveness properties; strengthened, allowing substitutability of subclasses with respect to safety properties; or replaced entirely.

Figure 4. A printer spooler. A get message is delayed if the spooler has no pending requests.

5. Group Abstractions

If the computation structure of a problem provides some inherent uniformity in the behavior of a group of actors, the specification can be greatly simplified by using group abstractions. For example, the data parallelism available in an algorithm can be naturally expressed by using abstractions to represent group communication. Besides simplifying the specification, abstractions based on groups often allow compilers to optimize execution. For example, they may reduce message traffic: broadcasting a message to a group may be optimized by sending only one copy of the message to each node and letting the group members on that node reuse the message. Moreover, copying the message for each member actor may be avoided.

A general framework for modeling groups is the ActorSpace paradigm [8], which provides group abstractions as a key component of its semantics. ActorSpace adopts a pattern-based model of communication: messages may be sent to groups of actors, and patterns are used to define such groups. In Hal, we use a simple but restrictive model of groups in order to ensure greater efficiency at the cost of expressiveness. A group is explicitly created with its member actors. Abstractly, member actors are organized as an ordered collection, though they may be physically distributed across the different nodes of a concurrent computer. Groups provide a passive container as in the ActorSpace model, but only flat structures are allowed for groups: groups cannot be nested in other groups and they cannot overlap. Each group is assigned a unique group identifier which is used to name the group. A member actor is referred to by specifying its group identifier and an index expression. For example, the execution of the following method fragment:

    a = B.grpnew (16);
    a[5] <- a_message ();

sends an a_message message to the fifth member actor of the group.

In order to allow member actors to name the group to which they belong, Hal provides the pseudo-variables mygrp and mygrpidx. Pseudo-variables are variables which can be referred to but cannot be modified. mygrp and mygrpidx are instantiated by the creator actor with the group identifier and the actor's index in the group, respectively. A member actor can name a peer actor by using mygrp and an index expression. mygrp[0] is defined to be the mail address of the creator actor of the group.

Group-related primitives are divided into three categories with respect to their functionality: those related to group creation, group communication and group membership manipulation. grpnew and clone are used to create a group with its member actors. A group of actors may be created using the grpnew operator. Such creation distributes member actors across physical nodes in some indeterminate manner. However, the creation may be extended to allow the specification of a particular distribution of member actors by associating a distribution strategy (as we discuss in Section 7). Note that the group size must be explicitly specified in the group creation expression. clone is a specialization of grpnew in that exactly one member actor is created on each node of the system.

If a message is sent to a group identifier (i.e., a group), the message is broadcast to all members of the group (broadcast). Point-to-point communication between member actors can be done by naming individual member actors using the group identifier. However, message sends to an arbitrary indeterminate member of the group, as in the ActorSpace model, are not explicitly supported.

Membership is dynamic but restricted. It may be manipulated using resign and restore, which are similar to, but more restrictive than, the visibility control primitives in the ActorSpace model. Execution of resign by an actor causes the suspension of its own membership in a group; an actor is not allowed to suspend other actors' membership. restore, when executed, restores the membership of all the original members of the given group.

Figure 5 illustrates the use of group abstractions. The methods iteration and eliminate implement the i-th iteration of the Gaussian Elimination algorithm. The following is the corresponding i-th iteration of the forward elimination process of the sequential Gaussian Elimination algorithm without pivoting for a linear system Ax = b:

    for j = i+1:n
        A_{ij} = A_{ij} / A_{ii}
    end
    b_i = b_i / A_{ii}
    A_{ii} = 1
    for j = i+1:n
        for k = i+1:n
            A_{jk} = A_{jk} - A_{ji} * A_{ik}
        end
        b_j = b_j - A_{ji} * b_i
    end

behv Row | rowidx, rowA, eltB, nextiter |
    restrict iteration() with (rowidx == nextiter);
    restrict eliminate(iter, inRow, inB) with (iter == nextiter);
    ...
    method iteration ()
        for i = rowidx + 1 to colsize(rowA) do
            rowA[i] = rowA[i] / rowA[rowidx];
        end
        eltB = eltB / rowA[rowidx];
        rowA[rowidx] = 1;
        mygrp <- eliminate (rowidx, rowA, eltB);
        if (rowidx == colsize(rowA)) then
            mygrp.restore ();
            mygrp <- backsubst (rowidx, eltB);
        else
            resign ();
        end
    end
    method eliminate (iter, inRow, inB) ... end
    method backsubst (i, x) ... end
end

Figure 5. A group formulation of Gaussian Elimination without pivoting.

The i-th iteration is started by normalizing the i-th row of A and the i-th element of b; the iteration then eliminates the i-th column below the i-th row. After the i-th iteration, the i-th row does not participate in the computation until the back substitution process begins. Thus, for the (i+1)-th iteration, even if the eliminate message is broadcast by the actor containing the (i+1)-th row, the actors for the 1st through i-th rows will not process the message. Synchronization constraints may be used to make this behavior explicit, but only with unnecessary overhead. By controlling the scope of the group dynamically with resign and restore, we avoid sending messages to those actors that would not process them, and thus simplify the specification of the algorithm.
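For reference, a direct sequential rendering of the forward-elimination iteration above (a NumPy sketch, operating in place):

    import numpy as np

    def forward_eliminate(A, b):
        """Reduce Ax = b to unit-upper-triangular form (no pivoting), in place."""
        n = A.shape[0]
        for i in range(n):
            piv = A[i, i]
            A[i, i + 1:] /= piv                        # A_ij = A_ij / A_ii, j > i
            b[i] /= piv                                # b_i = b_i / A_ii
            A[i, i] = 1.0                              # A_ii = 1
            for j in range(i + 1, n):
                A[j, i + 1:] -= A[j, i] * A[i, i + 1:] # A_jk = A_jk - A_ji * A_ik
                b[j] -= A[j, i] * b[i]                 # b_j = b_j - A_ji * b_i
                A[j, i] = 0.0                          # column i is now eliminated
        return A, b

    A = np.array([[2.0, 1.0], [4.0, 6.0]])
    b = np.array([3.0, 14.0])
    forward_eliminate(A, b)       # back substitution then yields x = (0.5, 2)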


6. Communication Abstractions


Although point-to-point asynchronous message sending is the most efficient form of communication in scalable distributed networks, concurrent languages must provide a rich set of communication abstractions to simplify programming. Our approach is to define abstractions and use program transformations to translate them into primitive actors for efficient execution. We illustrate our approach using call/return communication.

In call/return communication, an object invokes another object and waits for it to return a value before continuing. A standard mechanism for call/return communication in concurrent programming is the remote procedure call: a procedure calls another procedure at a remote node and waits for the result, which is returned to the place where the call was made. In high-level actor languages, concurrent call/return communication allows a simple expression of functional parallelism.

Specification of actor coordination may require the specification of the dependence of an actor's continuation on a 'remote' result. Such dependence may be explicitly expressed by using asynchronous message sends with continuations and synchronization constraints. However, explicit manipulation of continuations and synchronization constraints can be tedious and error-prone. A call/return communication operator in actor languages provides a simple abstraction to express such dependence in programs. Call/return communication causes the results of computations done by different actors to be sent back to the message-sending context, thus making the continuation implicit in the caller's program and allowing for more intuitive reasoning about programs. In addition, call/return communication may be used to impose a temporal order between computations at two different actors. Call/return communication subsumes remote procedure calls: in actors, addressing is universal and location-transparent.

We generally do not want to block the sender of a call/return message send; if the actor invoked is on a different node, we could unnecessarily lose useful concurrency. Furthermore, the costs of saving and restoring the execution context of an actor (such as stack frames and register values) can be considerable. Whenever feasible, we allow the sender to continue its computation as soon as it has finished the message send. This is achieved by transforming call/return communication into semantically equivalent asynchronous message sends with corresponding continuations. Depending on how the result of a remote computation is to be used, either a behavior template for continuation actors or a continuation method is emitted as the result of the transformation. If the replacement behavior of the sender is dependent on the result, a continuation method will be generated; otherwise, a continuation actor will be generated [21, 2]. Consider the following statement in the body of an actor a:

    b <- msg (v, c.request1 ( ... ), d.request2 ( ... ));

Figure 6. The calling pattern before the transformation: actor a invokes request1 on c and request2 on d, awaits their replies, and sends b the message msg.

Figure 7. The calling pattern after the transformation: a issues asynchronous sends request1 and request2, and a join continuation actor jc collects reply1 and reply2 and sends the result to b.

When the above statement is executed, two message sends (invocations) are executed, which send actors c and d the messages request1 and request2, respectively. Actor b is then sent a message which includes the results from these two invocations along with the value of the expression v (computed by the actor a). Figures 6 and 7 show the calling pattern of the execution before and after the transformation, respectively. In this case, the transformation has generated a continuation actor.

We illustrate the use of call/return communication with the implementation of the bitonic merge network used in the bitonic sort algorithm [22]. The behavior template BitonicMergingNetwork in Figure 8 implements a key operation of the bitonic sorting network, which rearranges a bitonic sequence into a monotonically increasing sequence. A member actor encapsulates an element of the input bitonic sequence. Let s = <a_0, a_1, ..., a_{2n-1}> be a bitonic sequence such that a_0 <= a_1 <= ... <= a_{n-1} and a_n >= a_{n+1} >= ... >= a_{2n-1}. For simplicity, we assume the numbers in the sequence are positive and that its length is a power of 2. In order to sort the sequence in monotonically increasing order, for each pair (a_i, a_{n+i}), the lesser must go to the left subsequence, s1, and the greater must go to the right subsequence, s2. Thus, separating s into s1 and s2 requires a comparison and a possible swap of two numbers for each pair (a_i, a_{n+i}). Actors belonging to the left half of s send a compare message with their value to their corresponding actors in the right half of s. If swapping is needed, the recipient of the compare message replies its value back to the sender. Otherwise, it ignores the value and replies 0 instead. Actors on the left side wait for the arrival of the reply and then update their value accordingly. The wait for the reply is expressed with call/return communication (using dot notation) in Figure 8. Note that the compiler will transform the call/return communication into an asynchronous message send with an appropriate synchronization constraint and, in this case, generate a continuation method.

%% Assume that the number of elements to be sorted is 2n.
behv BitonicMergingNetwork | value |
    init (x) value = x; end
    method swap (dist) | temp |
        if ((mygrpidx-1) % (2*dist) < dist) then
            temp = mygrp[mygrpidx+dist].compare (value);
            if (temp > 0) then value = temp; end
            if (mygrpidx == 1) then
                if (dist >= 2) then
                    mygrp <- swap (dist/2);
                else
                    %% notify the creator of the group of
                    %% finishing one stage of merging.
                    mygrp[0] <- finish_one_stage ();
                end
            end
        end
    end
    method compare (left)
        if (left > value) then
            reply (value);
            value = left;
        else
            reply (0);
        end
    end
end

Figure 8. Behavior template for BitonicMergingNetwork.
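The effect of this transformation can be mimicked with futures, as in the Python sketch below. This illustrates the idea only, not the output of the Hal compiler; request1, request2, and the actor stand-ins are hypothetical:

    from concurrent.futures import ThreadPoolExecutor

    pool = ThreadPoolExecutor(max_workers=4)

    def request1():                 # stands in for the computation at actor c
        return 21

    def request2():                 # stands in for the computation at actor d
        return 2

    def msg_to_b(v, r1, r2):        # stands in for the message finally sent to b
        print("b received:", v, r1, r2)

    def call_return(v):
        # b <- msg(v, c.request1(...), d.request2(...)) becomes two asynchronous
        # sends plus a join continuation that fires once both replies have arrived.
        f1 = pool.submit(request1)                  # asynchronous send to c
        f2 = pool.submit(request2)                  # asynchronous send to d
        pool.submit(lambda: msg_to_b(v, f1.result(), f2.result()))  # join continuation jc
        # the caller continues immediately; it never blocks on the replies

    call_return(42)
    pool.shutdown(wait=True)        # only this sketch waits, so the print is observed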

7. Specifying a Placement Strategy

Because an algorithm's performance on a parallel computer depends in part on how many messages are sent and to which nodes, it is important to place and schedule objects efficiently in order to get good execution performance. For example, consider a dense matrix; such matrices are used in a number of algorithms ranging from Gaussian Elimination to graph algorithms such as the Floyd-Warshall algorithm for the all-pairs shortest-path problem and domain decomposition techniques for solving Partial Differential Equations. A dense matrix may be partitioned on a parallel computer using different placement strategies (see, for example, [22]). In general, the problem of finding an optimal policy is intractable. However, a developer might be able to determine the most efficient policy for a given algorithm and a given architecture. To allow portability of parallel algorithm implementations without sacrificing efficiency, it is necessary to separate the architecture-dependent portion of the specification (i.e., the placement policy) from the algorithm specification. The two specifications may be combined to obtain an efficient implementation of the algorithm for a given problem size and architecture.

We describe the placement policy in terms of the distribution of a group of actors on a concurrent computer. We characterize a placement policy for a given set of actors as fixed with respect to a set of events E if the placement of actors in the set remains the same for all events bounded by E. We define an event as bounded by a set of events if it must occur between some two events in the set regardless of the observer. Note that events in a distributed system are partially ordered but may be mapped to a linear global time as they may be observed by a hypothetical observer [12]. In intuitive terms, a fixed placement policy for a computation performed by a group of actors places the actors before the computation starts and does not change their location during the computation. A placement policy which is not fixed is said to be dynamic.

Fixed placement policies are often sufficient for representing data structures where the number of data elements and their communication pattern during a computation are known before the computation begins; although the actors representing a static data structure may be dynamically created, the computation described by the algorithm starts after the data structure has been instantiated. By contrast, in a dynamic placement policy, the placement of a group of actors performing a computation is determined during the course of the computation. Dynamic placement policies are especially important for parallel algorithms that use dynamic data structures, because the total number of actors involved in the computation and their exact communication topology may not be determined before the computation starts. For example, the number of non-zero elements in a sparse matrix and their topology may change during the computation, and thus the structure of the sparse matrix may get modified. Similarly, the topology between nodes in a binary search tree and its structure may depend on the order in which the data elements are added to the tree. For a good review of parallel algorithms which naturally use dynamic data structures, see [24].

Dynamic placement policies may need to interact with the ongoing computation to place newly created actors or to migrate existing ones. In some cases, installation of placement decisions for dynamic placement policies may be triggered by messages which mark the beginning of a particular phase of the computation. In others, it may be started by a distinguished event, such as the load on a particular node exceeding a predefined threshold.


Initial Placement. An actor with a behavior Element holding an element of an array anArray can be created on a remote node with the statement

    Element.new (anArray[anIndex]) on node_i;

Note that node_i evaluates to the identifier of a node in a concurrent computer on which the actor is to be created (i.e., placed). Each actor is associated with a system actor called its meta-actor; the actor is referred to as the meta-actor's base actor. A meta-actor processes the new operation executed by its base actor. It also traps and executes migrate messages sent to its base actor. We may customize the meta-actor of an actor to implement a particular placement policy. To specify a placement policy for a group, we may customize the create operations of all actors which may create members of the group. Note that the computations related to a placement policy are all implemented as the behaviors of meta-actors, whereas the application algorithm is specified as the behaviors of the base actors. This separates the specification of placement policies from the algorithms.
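One way to picture this separation is the following self-contained Python sketch. The class and method names are our own assumptions, and Hal's meta-architecture is richer than this; the point is only that placement logic lives entirely in the meta-actor:

    import itertools

    class RoundRobinPolicy:
        """A fixed placement policy: cycle newly created actors across nodes."""
        def __init__(self, nodes):
            self._next = itertools.cycle(nodes)
        def choose_node(self):
            return next(self._next)

    class BaseActor:
        """Application-level actor; its behavior knows nothing about placement."""
        def __init__(self, behavior):
            self.behavior = behavior
            self.node = None
        def handle(self, selector, *args):
            self.behavior(self, selector, *args)

    class MetaActor:
        """System actor owning placement: it processes its base actor's create
        operations and traps migrate messages before the base actor sees them."""
        def __init__(self, base, policy):
            self.base, self.policy = base, policy

        def create(self, behavior):
            actor = BaseActor(behavior)
            actor.node = self.policy.choose_node()   # placement decided here only
            return actor

        def handle(self, selector, *args):
            if selector == "migrate":                # trapped by the meta-actor
                self.base.node = args[0]
            else:
                self.base.handle(selector, *args)    # application messages pass through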

Migration. Execution of the migrate operator moves an actor from one node to another during a computation. Two potential reasons to use migration are as follows. First, different phases of an algorithm (or an application) may be implemented more efficiently using different placement policies. Such changes may be specified statically but triggered dynamically when a specified component reaches a particular phase. Second, the irregular nature of a computation may require that the placement be determined based on the current states of the actors that are involved in the computation. In particular, heuristics based on the dynamic behavior of the computation may be used to determine actor placement. Adaptive load balancing and diffusion scheduling are two examples of such strategies.

The migration of an actor may be triggered by sending the actor a migrate message with a node expression as its argument. Such messages are trapped and processed by the actor's meta-actor, which responds by moving the actor to the processor whose address the node expression denotes. The migrate message itself may be sent by other meta-actors; for example, a group of actors may represent a tree data structure, and the meta-actor of an actor may send the migrate message to the actor's children. The migration of the actors in the group to satisfy a new placement policy requires a meta-actor to communicate with the meta-actors of its base actor's two children (called lchild and rchild), but does not otherwise depend on the specific implementation of the algorithm in the tree nodes. For a more detailed discussion of modular placement specifications, see [25].

8. Discussion

Concurrent object-oriented programming is an active area of research interest [5, 9]. The programming constructs used in many concurrent object-oriented languages are closely related to those in actors (e.g., Cantor [6], ABCL [26], Concurrent Aggregates [11] and Charm [20]). In some cases, there are also fundamental differences between the different models of concurrent objects. For example, CC++ is one of several proposals to extend C++ with concurrency constructs. CC++ adds parallel programming constructs such as par, parfor and spawn to C++ [10]. Much as message passing and actor creation do in actors, such constructs provide for functional parallelism as well as explicit concurrency. However, unlike actors, CC++ objects do not necessarily represent units of concurrency: multiple threads may be active in a given object. By contrast, an actor provides both a data abstraction boundary and a unit of concurrency.

The operational semantics of the group abstraction we use here is a particular form of the ActorSpace paradigm [8]. ActorSpace uses communication based on destination patterns. In Hal, a restricted model of patterns provides greater efficiency at the cost of expressiveness. Another restricted model of actor groups is Concurrent Aggregates, which supports static membership [11].

It should be observed that the synchronization constraints we discuss have some important limitations: because they depend only on the local state of a single actor, they are unsatisfactory for describing collective multiactor coordination patterns. Such coordination patterns may be expressed in the form of synchronizers [2, 15]. Synchronizers allow us to specify multiactor constraints such as temporal ordering constraints on invocations and invocation atomicity constraints. A synchronizer is specified in terms of the interface of a group of actors, and independently of their functionality. Thus synchronizers separate the code for coordination from that for the actors' functionality, enabling better description, reasoning, and modification.

Our methodology for modular specification of placement and migration of actors bears resemblance to some extensions of Fortran. In particular, Fortran D [18] and High Performance Fortran (HPF) [23] allow explicit specification of data decomposition and distribution policies to improve execution on distributed memory computers. For irregular problems, such as PDEs on unstructured meshes or sparse matrix algorithms, the communication pattern depends on the input data. In this case, it is not feasible to determine the necessary data distribution a priori. To address this problem, PARTI [13] and Kali [7] transform a user-defined for loop into an inspector/executor pair. The inspector loop determines what data are needed by each processor and where they reside, and prefetches them to localize them. Later, the executor uses the prefetched data to perform the computation specified in the original loop.

In these languages, the compiler is expected to discover some forms of concurrency, such as the potential for pipelining iterations, in parallelizing the loops in the program. Such parallelization works well for programs which fit a Single Program Multiple Data (SPMD) model of computation but does not generally capture functional parallelism. In many regular problems using dense matrices, a compiler may be smart enough to exploit useful parallelism in programs. In other cases, it may not be able to detect all useful parallelism. In our model, actors are the unit of concurrency, and all forms of parallelism can be expressed explicitly in the programs themselves. Moreover, although HPF extends Fortran 90, which supports dynamic data structures, HPF placement directives cannot be used in conjunction with such recursive structures. Our framework allows placement and migration policies for arbitrary data structures to be expressed.

References

1. G. Agha, Actors: A Model of Concurrent Computation in Distributed Systems, MIT Press, 1986.
2. G. Agha, S. Frølund, W. Kim, R. Panwar, A. Patterson, and D. Sturman, Abstraction and Modularity Mechanisms for Concurrent Computing, IEEE Parallel and Distributed Technology: Systems and Applications 1 (1993), no. 2, 3-14.
3. G. Agha, C. Houck, and R. Panwar, Distributed Execution of Actor Systems, Languages and Compilers for Parallel Computing (D. Gelernter, T. Gross, A. Nicolau, and D. Padua, eds.), Springer-Verlag, 1992, Lecture Notes in Computer Science 589, pp. 1-17.
4. G. Agha, I. Mason, S. Smith, and C. Talcott, Towards a Theory of Actor Computation, Third International Conference on Concurrency Theory (CONCUR '92), Springer-Verlag, August 1992, LNCS, pp. 565-579.
5. G. Agha, P. Wegner, and A. Yonezawa (eds.), Research Directions in Concurrent Object-Oriented Programming, MIT Press, Cambridge, Massachusetts, 1993.
6. W. Athas and C. Seitz, Multicomputers: Message-Passing Concurrent Computers, IEEE Computer (1988), 9-23.
7. C. Koelbel and P. Mehrotra, Compiling Global Name-space Parallel Loops for Distributed Execution, IEEE Transactions on Parallel and Distributed Systems 2 (1991), no. 4, 440-451.
8. C. J. Callsen and G. A. Agha, Open Heterogeneous Computing in ActorSpace, Journal of Parallel and Distributed Computing (1994), 289-300.
9. D. Caromel, Toward a Method of Object-Oriented Concurrent Programming, Communications of the ACM 36 (1993), no. 9, 90-102.
10. K. M. Chandy and C. Kesselman, Compositional C++: Compositional Parallel Programming, Research Directions in Concurrent Object-Oriented Programming (G. Agha, P. Wegner, and A. Yonezawa, eds.), MIT Press, 1993.
11. A. Chien, Supporting Modularity in Highly-Parallel Programs, Research Directions in Concurrent Object-Oriented Programming (G. Agha, P. Wegner, and A. Yonezawa, eds.), MIT Press, 1993.
12. W. Clinger, Foundations of Actor Semantics, AI-TR-633, MIT Artificial Intelligence Laboratory, May 1981.
13. R. Das, R. Ponnusamy, J. Saltz, and D. Mavriplis, Distributed Memory Compiler Methods for Irregular Problems: Data Copy Reuse and Runtime Partitioning, Languages, Compilers and Run-Time Environments for Distributed Memory Machines (J. Saltz and P. Mehrotra, eds.), Elsevier Science Publishers, 1992.
14. S. Frølund, Inheritance of Synchronization Constraints in Concurrent Object-Oriented Programming Languages, ECOOP '92 European Conference on Object-Oriented Programming (O. Lehrmann Madsen, ed.), Springer-Verlag, June 1992, Lecture Notes in Computer Science 615, pp. 185-196.
15. S. Frølund and G. Agha, A Language Framework for Multi-Object Coordination, Proceedings of the European Conference on Object-Oriented Programming '93, Springer-Verlag, 1993, LNCS 707, pp. 346-360.
16. G. Golub and C. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1983.
17. C. Hewitt, Viewing Control Structures as Patterns of Passing Messages, Journal of Artificial Intelligence 8 (1977), no. 3, 323-364.
18. S. Hiranandani, K. Kennedy, and C.-W. Tseng, Compiling Fortran D for MIMD Distributed Memory Machines, Communications of the ACM 35 (1992), no. 8, 66-80.
19. C. A. R. Hoare, Communicating Sequential Processes, Communications of the ACM 21 (1978), no. 8, 666-677.
20. L. V. Kale and S. Krishnan, CHARM++: A Portable Concurrent Object Oriented System Based on C++, OOPSLA '93 (A. Paepcke, ed.), ACM Press, October 1993, ACM SIGPLAN Notices 28(10).
21. W. Kim and G. Agha, Compilation of a Highly Parallel Actor-Based Language, Proceedings of the Workshop on Languages and Compilers for Parallel Computing (U. Banerjee, D. Gelernter, A. Nicolau, and D. Padua, eds.), Yale University, Springer-Verlag, 1993, LNCS 757.
22. V. Kumar, A. Grama, A. Gupta, and G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings Publishing Company, Inc., 1994.
23. D. B. Loveman, High Performance Fortran, IEEE Parallel & Distributed Technology: Systems & Applications 1 (1993), no. 1, 25-42.
24. P. Mehrotra, J. Saltz, and R. Voigt (eds.), Unstructured Scientific Computation on Scalable Multiprocessors, MIT Press, Cambridge, Massachusetts, 1992.
25. R. Panwar and G. Agha, A Methodology for Programming Scalable Architectures, Journal of Parallel and Distributed Computing (1994), to appear.
26. K. Taura, S. Matsuoka, and A. Yonezawa, An Efficient Implementation Scheme of Concurrent Object-Oriented Languages on Stock Multicomputers, Fourth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPOPP), May 1993, pp. 218-228.
27. C. Tomlinson and V. Singh, Inheritance and Synchronization with Enabled-Sets, OOPSLA Proceedings, 1989.

Open Systems Laboratory, Department of Computer Science, 1304 W. Springfield Avenue, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA

E-mail address: {agha | wooyoung | panwar}@cs.uiuc.edu
