Gardens: An Integrated Programming Language and System for Parallel Programming Across Networks of Workstations

Paul Roe and Clemens Szyperski
School of Computing Science, Queensland University of Technology, Brisbane, Australia.
{p.roe, [email protected]}
Abstract. Gardens is an integrated programming language and system supporting parallel computation across networks of workstations. It addresses a combination of goals: it (i) maximises performance and (ii) is still safe; it supports (iii) the programming of abstractions (parallel libraries) and (iv) adaptive parallel computation, i.e., computation that adapts at run-time to a changing set of available workstations. In Gardens, tasks represent units of work, and task migration supports adaptation by releasing workstations on demand. To support adaptation, problems are over-decomposed into more tasks than processors. Tasking is non-preemptive, which simplifies the semantics and admits a very efficient implementation. Within its local heap, each task manages a collection of global objects. These support communication via global methods which other tasks may invoke, abstraction, and typed safe asynchronous communication (including freedom from self-inflicted and distributed network deadlocks). The implementation of global objects maps efficiently to high-performance messaging layers, such as Active Messages.
keywords: programming language, programming model, parallel computing, networks of workstations, adaptive parallelism
1 Introduction

Idle workstations represent a considerable computational resource, as yet untapped. Gardens is an integrated programming language and system designed to utilise such resources; it supports parallel computation across networks of otherwise idle workstations. Gardens is targeted at workstation networks which utilise state-of-the-art communications networks such as Myrinet and ATM. Such systems have the potential to provide supercomputer levels of performance; thus Gardens enables a virtual supercomputer to be dynamically constructed from idle workstations. Why is yet another new programming language and system required? Because no existing system meets all the requirements of Gardens (see the next section). Gardens' requirements are: adaptation, safety, abstraction and performance (ASAP).
Performance: the whole point of creating a virtual supercomputer is performance. To achieve overall performance in a parallel system it is crucial to have high-performance communications (in terms of latency and throughput). State-of-the-art communications networks offer the required performance; however, this can be lost in the software layers of high-level communications libraries. (This rules out many distributed-systems approaches to the requirements.) For this reason special lightweight communications libraries have been developed, e.g., Active Messages (AM) [von Eicken et al., 1992]. These communications libraries are very efficient, but rather low level in nature. In order to perform well, Gardens has been heavily influenced by AM. We achieve performance comparable to using C and Active Messages, and meet the other requirements.

Safety: this is the prevention of untrapped program errors, e.g., dereferencing an uninitialised pointer. Static safety checking prevents a large class of subtle run-time errors from occurring. The current trend is towards programming languages which statically guarantee safety, e.g., Java. Unfortunately, high-performance communication libraries like AM have safety requirements exceeding the capabilities of traditional programming languages. Currently, the use of such libraries entails either run-time checking to ensure communications libraries are used correctly (negating performance), or the following of unchecked programming conventions. The Gardens programming language statically enforces the safe use of AM, or similar high-performance messaging libraries.

Adaptation: idle workstations come and go over time. Thus a parallel computation must adapt to a changing set of idle workstations: adaptive parallelism. Such adaptation must be transparent to the workstation user, and ideally to the application programmer too; Gardens achieves both goals.
A modified screen saver, together with user-configurable parameters, is used to determine when a workstation is idle and when a machine should be returned to its user. Lightweight tasks and fast task migration are used to map the parallel computation across the set of available workstations as it changes over time. In particular, seed tasks (new tasks which have never been run) can be migrated very efficiently.

Abstraction: the final requirement of Gardens is to support the programming of abstractions, necessary for large-scale software development. Unfortunately, the low-level nature of AM does not support the programming of abstractions. For this reason Gardens has elevated the level of AM to global objects. Global objects (GOs) map directly to AM with virtually no performance penalty, and support abstraction. GOs also support the addressing of mobile tasks, necessary for adaptive parallelism. These four facets of Gardens are orthogonal; for example, it is possible to use global objects without task migration. What makes Gardens unique, however, is that it addresses all these requirements. The Gardens programming language, Mianjin [Roe and Szyperski, 1997], is based on Oberon, the latest in the Pascal line. Like Java, Oberon is safe and supports abstraction via objects. Unlike Java, Oberon is a very efficient language and has been used for systems programming; in performance terms it is similar
to C. (Nevertheless, the Mianjin extensions could be adapted to Java.) Mianjin extends Oberon with support for global objects and their safe use. The remainder of this paper is organised as follows. The next section covers related work. Sections 3 and 4 describe the communications and tasking model of Gardens. Section 5 summarises the model and its invariants. Section 6 presents some performance figures. The final section discusses conclusions.
2 Related Work

There are three main fields contributing work related to Gardens: traditional parallel computing, parallel computing on clusters, and distributed computing. Traditional parallel computing and, in many cases, cluster computing assume a fixed number of processors for any given run of a parallel program. In other words, such systems do not adapt dynamically to a changing set of available processors. Distributed computing does consider such dynamics, but emphasises availability, transparency and fault-tolerance over performance. Table 1 summarises some of the key differences between the various approaches. While Gardens is efficient and safe, and supports abstraction and adaptation without restricting the model of computation, all the other approaches are deficient in one or more of these dimensions.

                       Java RMI  MPI 2.0  UPVM  Split-C  Orca  Piranha  Cilk v2  Charm(++)  Gardens
  efficient               N        Y       Y      Y       Y      Y        Y        Y          Y
  safe                    Y       n/a1    n/a1    N       Y     n/a1      Y        N          Y
  abstractions            Y        Y       N      N       Y2     N        Y        Y          Y
  adaptive                N3       N       Y      N       N      Y        Y        N          Y
  unrestricted model      Y        Y       Y      Y       N      N        N        N          Y

Table 1. Comparison of Approaches. Remarks: 1 not inherently unsafe, but depends on language binding; 2 limited abstraction since Orca objects cannot refer to each other; 3 adaptation is not part of the current Java specifications.
Java Remote Method Invocation (RMI) [JavaSoft, 1997] is an approach to traditional distributed computing, but like Gardens it advocates communication via remote method invocation. Java RMI is synchronous, and parameter passing uses a costly serialisation protocol to ensure platform independence. The overall model, and definitely its current implementations, are far too costly for most parallel applications. Unlike current Java, a number of attempts have been documented to integrate general process-migration facilities into operating systems, e.g., MOSIX [Barak et al., 1996]. All these systems aim at distributed computing with much coarser granularities than those required for high-performance parallel computing.

The Message Passing Interface (MPI) and the Parallel Virtual Machine (PVM) are both libraries supporting parallel computing based on message passing while abstracting from the specific platform. Both support dynamic process creation (MPI v2.0), but not process migration. MPI, but not PVM, provides a (crude) means to support abstraction (`communicators'). An extension to PVM, UPVM [Konuru et al., 1994], supports lightweight user-level processes and their migration. A more recent project, Mist [Casas et al., 1995], considers migration of full OS processes, rather than lightweight tasks; however, this is very expensive compared with lightweight task migration. Unlike Gardens, Mist addresses fault-tolerance.

Active Messages (AM) is a low-level message-passing library particularly designed to support parallel computing across clusters. Since Gardens builds on AM, a detailed description follows in Section 3.1. Split-C [Culler et al., 1993] is a parallel extension of ANSI C and was originally designed to map efficiently to AM. Split-C is efficient but unsafe, and supports neither abstraction nor adaptation. It is the language used by the NOW project [Anderson et al., 1995].

Orca [Bal et al., 1992] is an integrated language and system that is safe and supports partial abstraction. Orca separates active processes from passive shared objects. Shared objects cannot have mutual references, limiting both the model of computation and the degree of abstraction. All communication between processes is via atomic (indivisible) side-effecting operations on such objects. Shared objects can be transparently replicated or sliced across available machines, but overall efficiency relies heavily on compiler optimisations. Adaptation/process migration is not supported.

Charm is another integrated language and system [Kale and Krishnan, 1993].
The key concept is message-driven execution, where context switching is used to hide communication latencies. Charm uses sophisticated load-balancing strategies and works well for irregular problems. Task migration is only possible for new, never-activated tasks (seed tasks); once a task is running it cannot be migrated, ruling out true adaptation. A particularly interesting concept is that of branch office chares. These are replicated exactly once per processor and allow for programming of processor-local functionality (a similar facility is being considered for Gardens).

Piranha [Carriero et al., 1995] is one of the few adaptive parallel programming systems. It is based on the tuple-space abstraction introduced by Linda, to which it adds a few run-time hooks to support adaptive parallelism. It relies on a restrictive flat master-worker model and requires programmers to explicitly code the actions required to release a machine. Cilk (v2) [Blumofe and Lisiecki, 1997] is an integrated language and system that only supports a very limited functional style of parallel computing. However, Cilk supports transparent adaptation and fault-tolerance on the basis of automatic cancellation and restart of subtasks that have yet to report their result.
3 Communication

Our system is predicated upon high-performance communications networks, characterised as being reliable, low latency, high throughput, connection-less and switched. Efficient utilisation of such hardware requires a very lightweight and efficient messaging software layer, such as Active Messages (AM). Our programming language Mianjin supports global objects and type annotations. Together these provide a safe interface to AM which supports abstraction, and this is achieved without incurring any significant performance overheads. The whole of Gardens has been heavily influenced by AM; however, similar systems such as Fast Messages, U-Net, GM etc. could equally well be utilised as a base.
3.1 Active Messages
In essence, AM [von Eicken et al., 1992] is a form of lightweight asynchronous remote procedure call, with a synchronous poll (accept) mechanism. The AM design assumes a few strong invariants that guarantee the following properties:

1. Non-preemptive semantics.
2. Local AM calls almost as efficient as local procedure calls.
3. No self-inflicted deadlock despite non-preemptive semantics.
4. No distributed network deadlock caused by buffer overflow.

Our programming language Mianjin statically guarantees these important safety properties. In AM, request operations may be issued which asynchronously send a message to a remote processor, consisting of a handler (function pointer) and some data. On receipt, messages are queued until a poll operation is performed. "Poll" processes messages by invoking handlers on the associated data; this gives the recipient control over when handlers are invoked (Property 1). Note that "Poll" may process any message; it is not possible to filter particular messages. "Poll" does guarantee that messages are processed atomically, and in order if from the same sender. A request handler may invoke a reply operation (similar to a request) to return a message to the original sender; however, reply handlers may perform no communication. Thus communication operations cannot be nested via handlers. If the destination of a request (or reply) is the local processor, the addressed handler will be called immediately, approaching the efficiency of a local procedure call (Properties 2 and 3). Thus all request operations also perform a poll on completion; this ensures the semantics of local requests is consistent with that of non-local requests. A credit-counting scheme is used to control network deadlock (Property 4), as opposed to application deadlock. Each host maintains a credit count, representing current buffer usage, for each possible destination host. Credits are lost by sending messages to a host, and gained by receiving reply/acknowledgement messages from hosts.
Therefore, no protocol is needed to handle buffer overflow at the receiving end; in general, AM's performance is largely a result of trimming back traditional network- and transport-protocol overheads. If a request is issued when the credit count is zero, the request operation will poll until credit is available, after which the request will be performed. Thus, a request operation may cause a poll both before and after its operation.
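The request/poll discipline and the credit scheme can be sketched as a small Python simulation (our own toy model, not the AM API; all class and function names are assumptions):

```python
# Toy, single-address-space simulation of Active-Messages-style request/poll
# with credit-based flow control. Real AM runs across hosts; here everything
# lives in one Python process, so the driver polls on the receiver's behalf.
from collections import deque

CREDITS = 4  # buffer slots the receiver guarantees per sender

class Host:
    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # queued messages, delivered only on poll
        self.credits = {}      # destination host -> remaining credits

    def request(self, dest, handler, data):
        self.credits.setdefault(dest, CREDITS)
        # With zero credit a real sender spins in poll() until an ack returns
        # credit; in this single-threaded toy the driver below polls instead.
        assert self.credits[dest] > 0, "out of credit: receiver must poll"
        self.credits[dest] -= 1
        dest.inbox.append((handler, data, self))
        self.poll()            # requests also poll on completion

    def poll(self):
        # The recipient decides when handlers run (non-preemptive semantics);
        # each processed message acknowledges, returning credit to the sender.
        while self.inbox:
            handler, data, sender = self.inbox.popleft()
            handler(data)
            sender.credits[self] += 1

a, b = Host("a"), Host("b")
received = []
for i in range(10):
    if a.credits.get(b, CREDITS) == 0:
        b.poll()               # out of credit: only the receiver can free it
    a.request(b, received.append, i)
b.poll()
print(received)  # [0, 1, ..., 9]: processed atomically and in sending order
```

Note how the sender never outruns the receiver's buffers: at most CREDITS messages are outstanding per destination, which is the property that rules out network deadlock from buffer overflow.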
3.2 Global Objects

Active Messages is very efficient but difficult to use, since all communications occur through static memory. Thus typical programs use global variables for communication, which do not support abstraction, violating one of our goals. AM also does not support the addressing of mobile tasks (see Section 4.3). The key goal of our global objects is to support abstraction of communications, and to do so without sacrificing the performance AM provides. In Gardens, tasks may perform point-to-point communication with other tasks via global objects (GOs). In general a task will manage several GOs, which act as communication `ports' for that task. Global objects support asynchronous remote dynamic dispatch; that is, a task may invoke a method on an object which is located on a different processor. This is implemented by AM's request operation. GOs are ordinary objects created within the heap of a particular task, and are managed by that task. Thus GOs have the same visibility as ordinary programming-language values: no name-space issues arise, cf. distributed systems. The only difference between ordinary local objects and GOs is that GOs are globally contactable. Any object can be made globally contactable by handing out a global reference to it. The task owning a GO may access it as a normal object (a record), and, for it, the GO is indistinguishable from any other heap-allocated record. Global references cannot be dereferenced, and they only support a subset of the original object's methods, in particular those methods labelled GLOBAL. Global methods are required to have a restricted interface. In particular, VAR parameters and return values are disallowed, and local pointers are coerced (demoted) to global ones. These restrictions prevent local (i.e., non-GO) references escaping from tasks, including implicit ones created by VAR parameters.
A simple example is shown below:

  TYPE Accumulator = POINTER TO RECORD
    count: INTEGER;
    sum: REAL
  END;

  GLOBAL PROCEDURE (self: Accumulator) Add (s: REAL);  (* a global method *)
  BEGIN
    self.sum := self.sum + s;
    DEC(self.count)
  END Add;

  POLL PROCEDURE Worker (gsum: GLOBAL Accumulator);
  VAR localsum: REAL;
  BEGIN
    ...  (* expensive calculation of localsum *)
    gsum.Add(localsum)  (* global method invocation *)
  END Worker;

  POLL PROCEDURE Master;
  VAR acc: Accumulator;
  BEGIN
    NEW(acc); acc.sum := 0; acc.count := NTasks;
    ...  (* create NTasks worker tasks performing Worker(acc) *)
    ...  (* poll while waiting for results, i.e., acc.count -> 0 *)
    ...
  END Master;
The example shows how a global object (acc), managed by a master task, may accumulate the sum of several local sums, each calculated by a worker task. Some code has been elided; it will be revealed in subsequent sections, as will the significance of the POLL annotations. The only way worker tasks can access acc is via its global methods, in this case Add. When a worker task invokes the global method Add, the actual parameters and the method index are communicated via an AM request operation to the task owning the object. After communication, the method is invoked locally on the object. Global object reading, corresponding to AM reply operations, is not described here; for details see [Roe and Szyperski, 1997].
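The mapping from a global method invocation to a queued request can be sketched in Python (a toy model: marshalling an object id and a method name is our own simplification of the method-index scheme, and all names are assumptions):

```python
# Toy model of global-object dispatch: a global reference names an owner
# "processor" and an object id; invoking a global method enqueues
# (object id, method name, arguments) at the owner, and the owner's poll
# applies the method to the real local object.
from collections import deque

class Processor:
    def __init__(self):
        self.queue = deque()   # stands in for AM's message queue
        self.objects = {}      # object id -> local heap object

    def export(self, obj):
        # Make an ordinary local object globally contactable.
        oid = id(obj)
        self.objects[oid] = obj
        return GlobalRef(self, oid)

    def poll(self):
        # Dispatch queued invocations locally, in arrival order.
        while self.queue:
            oid, method, args = self.queue.popleft()
            getattr(self.objects[oid], method)(*args)

class GlobalRef:
    """Cannot be dereferenced; carries only (owner, oid) for remote calls."""
    def __init__(self, owner, oid):
        self.owner, self.oid = owner, oid

    def invoke(self, method, *args):
        # Corresponds to an AM request: asynchronous, no return value.
        self.owner.queue.append((self.oid, method, args))

class Accumulator:
    def __init__(self, count):
        self.count, self.sum = count, 0.0
    def Add(self, s):          # the GLOBAL method from the example
        self.sum += s
        self.count -= 1

master = Processor()
acc = Accumulator(3)
gacc = master.export(acc)          # hand out a global reference
for local_sum in (1.0, 2.0, 3.0):  # three "workers" report their sums
    gacc.invoke("Add", local_sum)
master.poll()
print(acc.sum, acc.count)  # 6.0 0
```

The owning task sees acc as a plain record, while other tasks hold only the undereferenceable reference, mirroring the GO visibility rules above.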
3.3 Poll Procedure Annotations

Global objects support abstraction and the addressing of mobile tasks. However, there remain the safety restrictions on message handlers (global object methods). In particular, a global method may not directly or indirectly poll or invoke a global object method. These safety restrictions cannot be enforced by traditional programming languages such as C. An important characteristic of program code is whether it may perform a poll operation or definitely will not. Polling may be performed explicitly via Poll or implicitly, e.g., via global method invocation; in either case such code is termed polling. Code which is not polling is termed atomic. Mianjin captures the notion of polling in its type system. Atomic code is the default; all polling methods and procedures must be labelled POLL. The compiler is able to statically check that:
- Polling code is only invoked by other polling code.
- Global methods are only invoked on GOs by polling code, since such invocations are polling.
- Global method implementations (bodies) are atomic (i.e., cannot poll). Thus a global method may not invoke other global methods or call Poll.
These POLL annotations are also useful for the programmer. In particular, if a library routine is labelled POLL, the programmer knows that poll operations may occur, and hence any GOs must be prepared for global method invocations. If a routine is not labelled POLL (the default), no poll operations (implicit or explicit) will be performed by the library; thus GOs need not be in a consistent state when the library is called. Note that, unlike in other parallel object-based systems, there is no possibility of deadlock as the result of recursive or cyclic invocations, since nested invocations are statically prohibited. (The restrictions required by AM are automatically and naturally met.) Furthermore, a task may invoke a global method on one of its own objects with no possibility of deadlock.
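The check itself amounts to a simple rule over the program's call graph, as the following Python sketch illustrates (a deliberate simplification of Mianjin's type rules; the representation and names are our assumptions):

```python
# Sketch of the static check behind POLL annotations: given each procedure's
# declared effect ("poll" or "atomic") and its call edges, verify that atomic
# code never reaches a polling operation. Global method invocation and
# explicit polling are both modelled as calls to the builtin "Poll".
def check(decls, calls):
    """decls: name -> "poll" | "atomic"; calls: name -> list of callees."""
    errors = []
    for proc, effect in decls.items():
        for callee in calls.get(proc, ()):
            callee_effect = "poll" if callee == "Poll" else decls[callee]
            if effect == "atomic" and callee_effect == "poll":
                errors.append(f"{proc} is atomic but calls polling {callee}")
    return errors

# The example from Section 3.2: Master and Worker poll, the global method
# Add must be atomic.
decls = {"Master": "poll", "Worker": "poll", "Add": "atomic"}
calls = {"Master": ["Poll"], "Worker": ["Poll"], "Add": []}
print(check(decls, calls))   # [] : well-typed

calls["Add"] = ["Poll"]      # a global method body attempting to poll
print(check(decls, calls))   # flags the violation statically
```

Because the effect is part of each procedure's declared type, the check is purely local to each call site; no whole-program analysis is needed.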
4 Adaptation and Tasking

Tasks are our unit of work; they are used for adaptation, that is, for dynamically mapping a parallel computation across a changing set of idle workstations. This is accomplished by over-decomposing a problem into more tasks than there are workstations. To adapt to a changing set of workstations, tasks may be migrated. The task-migration machinery is described in detail in [Beitz et al., 1997]. Task migration is primarily targeted at supporting the release and acquisition of workstations; however, it may also be used for some coarse-grained load balancing of tasks across a stable processor set. The key goal of our tasking system is to support adaptive parallelism efficiently without sacrificing performance. The programmer's model of computation is a network of communicating tasks; these are dynamically mapped onto the network of workstations. Tasks are created dynamically; they communicate with other tasks, using GOs, independently of which processor they occupy. Note that there is no concept of task identifiers, since such values tend to break abstraction. Instead we rely on GOs to support communication between tasks. We only allow a single Gardens application per workstation; this avoids problems associated with parallel multitasking, in particular coscheduling/gang scheduling.
4.1 Multitasking
How can multitasking and AM/GOs coexist? By only allowing a context switch to occur while a task is at a poll point. A poll point is any point in a program where a poll operation may directly or indirectly (e.g., via global method invocation) occur. The Gardens system supports lightweight non-preemptive tasks which typically inhabit a relatively heavyweight OS process (e.g., a Unix or NT process). Each task has its own stack and heap. Non-preemption simplifies programming, is compatible with our AM/GO programming model, and is more efficient than preemptive multitasking. Furthermore, no protection or isolation is enforced at run-time; it is enforced statically by our programming language. Thus when a task executes a purely sequential (atomic) portion of code, performing no communication or blocking (see next section), that code will run at maximum speed, incurring no overheads due to parallel execution. Tasks are created using Fork, a Gardens library procedure similar to the Unix fork operation. The fork operation is atomic, but the new task is not:
  PROCEDURE Fork (p: POLL PROCEDURE (go: GLOBAL ANYPTR); go: GLOBAL ANYPTR);
For example, the worker tasks of the previous example may be created thus:

  FOR i := 1 TO NTasks DO Fork(Worker, acc) END
Fork is asymmetric: there is no implicit synchronisation between child and parent tasks. A single global object is passed to the child task; this is sufficient to bootstrap communication. Notice that no task may communicate with the child task until the child task has initiated some communication. This allows simple migration of new tasks which have not yet been activated; see Section 4.3.
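The economy of seed tasks can be illustrated with a short Python sketch (a toy model; the names are ours, not the Gardens API):

```python
# Toy of Fork and seed tasks: until first activation a task is nothing but
# the two Fork arguments (procedure, global-object reference), so creating
# or migrating a seed costs one small message.
class SeedTask:
    """A forked-but-never-run task: only the Fork arguments, no stack/heap."""
    def __init__(self, proc, go):
        self.proc, self.go = proc, go      # the whole state: two "addresses"

    def activate(self):
        # Stack and heap exist only from first activation onwards.
        self.stack, self.heap = [], {}
        self.proc(self.go)

def fork(task_queue, proc, go):
    task_queue.append(SeedTask(proc, go))  # atomic: no synchronisation

ran = []
tasks = []
fork(tasks, ran.append, "global-ref-to-acc")   # child gets one GO to bootstrap
# Migrating tasks[0] now would mean shipping just (proc, go); in Gardens,
# a single AM request/reply with no global synchronisation.
tasks[0].activate()
print(ran)  # ['global-ref-to-acc']
```

Since nobody can address a seed before it first communicates, moving the pair to another host is trivially safe.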
4.2 Blocking

Blocking has two roles: it causes a context switch, and it eliminates unnecessary context switches to tasks still waiting to synchronise. Gardens prides itself on its unfairness! Unless a task explicitly gives up control by voluntarily blocking, or is migrated (see next section), it will run at top speed to completion, to the exclusion of all other tasks on that processor. Only blocking guarantees to perform a context switch; thus blocking must be used to guarantee progress of a program. The poll operation may perform a context switch, whereas blocking will perform one. Efficient tasking requires support for blocking: even with asynchronous communication, tasks will eventually need to synchronise. The Block and Unblock routines implement blocking; they take no arguments. Block blocks the currently running task by removing it from the runnable queue and putting it in the blocked pool. Semantically, Block is equivalent to a poll and context switch; hence Block is declared as polling. Blocking blocks a task but does not prevent global method invocations on that task's GOs, which may unblock it. Unblock, usually performed by a GO's method, unblocks the associated task. Unblock is atomic and hence is not labelled POLL. (Where no unblocked tasks remain, the kernel continues polling for incoming messages.) Block/Unblock operations have direct effect regardless of their nesting level. In a non-preemptive system like Gardens the sophistication of semaphores, and equivalent mechanisms, is not required. We may add blocking to our previous example thus:

  GLOBAL PROCEDURE (self: Accumulator) Add (s: REAL);
  BEGIN
    self.sum := self.sum + s;
    DEC(self.count);
    (* if last result, unblock master task *)
    IF self.count = 0 THEN Unblock END
  END Add;
  POLL PROCEDURE Master;
  VAR acc: Accumulator;
  BEGIN
    NEW(acc); acc.sum := 0; acc.count := NTasks;
    FOR i := 1 TO NTasks DO Fork(Worker, acc) END;
    WHILE acc.count # 0 DO Block END;
    ...
  END Master;
It is necessary to wrap a WHILE loop around Block since other abstractions may also perform Block and Unblock operations. In general we expect such detailed coding to be encapsulated in abstractions. Our blocking and unblocking is general and efficient; for example, there is no need for a kernel task to repeatedly test a flag to see whether a task should be unblocked.
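The interplay of the runnable queue, the blocked pool and Unblock can be sketched as a toy non-preemptive scheduler in Python (structure and names are our assumptions; Gardens tasks have real stacks, whereas here generators stand in for them):

```python
# Minimal non-preemptive scheduler with Block/Unblock in the spirit of the
# Gardens kernel. Tasks are Python generators: yielding "block" models a
# Block call, and unblock() moves a task from the blocked pool back onto
# the runnable queue.
from collections import deque

runnable, blocked = deque(), set()

def unblock():
    # Atomic (no poll, no context switch): just make a blocked task runnable.
    if blocked:
        runnable.append(blocked.pop())

def run():
    while runnable:
        task = runnable.popleft()
        try:
            request = next(task)        # run until Block or completion
        except StopIteration:
            continue                    # task finished
        if request == "block":
            blocked.add(task)

NTASKS = 3
acc = {"sum": 0.0, "count": NTASKS}     # stands in for the Accumulator GO

def add(s):                             # stands in for the Add global method
    acc["sum"] += s
    acc["count"] -= 1
    if acc["count"] == 0:
        unblock()                       # last result: wake the master

def master():
    while acc["count"] != 0:            # WHILE acc.count # 0 DO Block END
        yield "block"
    print("total:", acc["sum"])

def worker(s):
    add(s)                              # "global method invocation"
    return
    yield                               # (unreachable) makes this a generator

runnable.append(master())
for s in (1.0, 2.0, 3.0):
    runnable.append(worker(s))
run()  # prints: total: 6.0
```

Note that the master is never re-run until Unblock makes it runnable again: no kernel task polls a flag on its behalf.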
4.3 Task migration

To support the acquisition and release of workstations, work must be dynamically reallocated to workstations. Tasks are our unit of work, hence reallocation implies migration of tasks. Currently, task migration is only supported across homogeneous platforms, although we are researching heterogeneous task migration. The tasking implementation partitions the virtual-memory address space of a Gardens application across processors. Tasks occupy disjoint regions of virtual memory. Thus task migration can be achieved by copying a task's heap and stack from one processor to the same regions on another processor. Migration is non-preemptive, and can only occur when all tasks are at poll points (i.e., poll, global method invocation, or block). At each poll point the release-workstation flag is tested, and, if set (see next section), migration is initiated. Thus general migration requires global synchronisation. This enables all messages in transit to be flushed before migration, so no message-forwarding mechanism is needed. It also means that our model invariants are preserved; see Section 5. The programmer must therefore ensure that programs poll frequently enough to enable workstation release within a reasonable time. Programs failing to voluntarily release workstations within a specified time are killed. New tasks which have never been run support a much more efficient form of migration. Until a task is run, no other task can communicate with it, and all the state that needs to be kept in such a seed task are the arguments to Fork (two addresses). Seed tasks can be migrated without requiring global synchronisation; in fact a single AM request/reply communication can be used to migrate a seed task. Such tasks do not even need a heap or stack until they are first run. In Gardens, general load balancing is thus performed using seed tasks, while task migration is normally only used for adaptation to a changing processor set.
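The region-copying idea behind migration can be sketched as follows (a toy model in Python; the region size and all names are our assumptions):

```python
# Sketch of Gardens-style task migration by region copying: the application's
# single virtual address space is partitioned so each task owns a fixed
# region, and migration copies that region byte-for-byte to the SAME
# addresses on the destination host, so internal pointers remain valid
# without any translation.
REGION_SIZE = 64                         # bytes per task region in this toy

class HostMemory:
    def __init__(self, ntasks):
        self.mem = bytearray(ntasks * REGION_SIZE)

    def region(self, task_id):
        base = task_id * REGION_SIZE     # same task -> same addresses everywhere
        return base, base + REGION_SIZE

def migrate(task_id, src, dst):
    # Only legal once all tasks are at poll points and in-transit messages
    # are flushed (global synchronisation): no forwarding mechanism needed.
    lo, hi = src.region(task_id)
    dst.mem[lo:hi] = src.mem[lo:hi]      # copy heap and stack in one move

src_host, dst_host = HostMemory(4), HostMemory(4)
lo, hi = src_host.region(2)
src_host.mem[lo:lo + 5] = b"state"       # task 2's (toy) heap/stack contents
migrate(2, src_host, dst_host)
print(bytes(dst_host.mem[lo:lo + 5]))  # b'state'
```

Keeping the task at identical virtual addresses is what makes the copy sufficient; the price is that two tasks can never occupy the same region, hence the partitioned address space.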
4.4 Gardens Screen Saver

A modified screen saver, together with user-configurable parameters, is used to determine when a workstation becomes idle and when a machine should be returned to its user. A simple signal mechanism is used to communicate between the screen saver and the Gardens application. The signal sets a flag in the Gardens application indicating that the workstation should be released (returned to the user). Task migration, and hence workstation release, is non-preemptive. Thus if voluntary release does not occur before some time limit expires, the Gardens application is killed. Issues of user acceptance, screen saver behaviour and parameters are beyond the scope of this paper.
5 Summary: the Model and its Invariants

Our model of tasking, task migration and communications has the following invariants, which guarantee safe use of AM:
- Tasks can be in one of three states: seed, runnable, or blocked.
- On each host, at most one runnable task is not at a poll point; all blocked tasks are at poll points.
- Message handlers are called only if all local (non-seed) tasks are at poll points.
- Migration can only occur if all (non-seed) tasks on all processors are at poll points (i.e., poll, global method invocation, or block), and there are no outstanding messages.
- No "network deadlock" can occur, as is the case with AM.
- Messages are causally ordered, even across migration.
- Seed tasks have performed no communication and cannot be communicated with.
- Tasks occupy disjoint regions of a single global virtual address space.
6 Performance Results

We have a prototype system implemented and are currently working on optimising its performance. (Note: our GOs require distributed garbage collection, which is currently being implemented.) The test platform consists of four 32 MB, 120 MHz Sun SparcStation-4s connected via a Myrinet. The first set of measurements shows the overhead of using global objects versus raw Active Messages (times are for one-way communication):

  data size (bytes)   AM (µs)  GO (µs)
  16 (short comm)       20       23
  1024 (long comm)     137      139

The following figures demonstrate the efficiency of our current tasking system:

  fork (seed task creation)                      4 µs
  fork (eager task creation)                    26 µs
  block (block task, poll and context switch)   18 µs
  unblock task                                   3 µs

The time taken for task migration depends on the task size and the time required to synchronise processors, which is application dependent. The figures below give times for synchronising all processors and migrating a single task:

  Task size (Kbyte)     8    16   32   64   128
  Migration time (ms)  1.6  2.1  3.7  6.8  13.5

The time to create (via fork) and migrate a seed task, which consequently has no stack or heap data and requires no global synchronisation, is just 24 µs! A typical time to release a workstation, including synchronisation and task migration, is of the order of one second (given several megabytes of task state to migrate).
7 Conclusions

In this paper we have described the approach to parallel computing on networks of workstations that underlies the Gardens language and system. In particular, Gardens allows parallel applications to be safe, efficient, adaptive, and composed using programmable abstractions, without severely restricting the model of computation. As demonstrated by some experimental applications, the current implementation on Sparc/Solaris and Myrinet fulfils all these requirements. The system runs entirely in one user-level process per participating machine and does not require any special permissions to execute, thus not compromising workstation users' safety requirements. Nor does the system interfere with the needs of workstation users: workstations are quickly acquired and released. A first public release of the system is scheduled for early 1998. (For more information see http://www.fit.qut.edu.au/~szypersk/Gardens/.)
Acknowledgements

Our thanks for their efforts in discussing and implementing the model described in this paper go to Ashley Beitz, Siu Yuen Chan, Geoff Elgey, Nickolas Kwiatkowski and all other members of the Gardens Project group. This work has been supported by the Programming Languages and Systems group at QUT and partially by Australian Research Council grants.
References

Anderson, T E, Culler, D E, Patterson, D A, et al. (1995). A case for NOW (Networks of Workstations). IEEE Micro.
Bal, H E, Kaashoek, M F, and Tanenbaum, A S (1992). Orca: A language for parallel programming of distributed systems. IEEE Transactions on Software Engineering, 18(3):190-205.
Barak, A, Braverman, A, Gilderman, I, and La'adan, O (1996). The MOSIX multicomputer operating system for scalable NOW and its dynamic resource sharing algorithms. Technical Report 96-11, Institute of Computer Science, The Hebrew University.
Beitz, A, Chan, S-Y, and Kwiatkowski, N (1997). A migration-friendly tasking environment for Gardens. In Fourth Australasian Conference on Parallel and Real-Time Systems (PART'97), Newcastle, Australia. Springer.
Blumofe, R D and Lisiecki, P A (1997). Adaptive and reliable parallel computing on networks of workstations. In Proceedings of the USENIX 1997 Annual Technical Conference on UNIX and Advanced Computing Systems, Anaheim, California.
Carriero, N, Freeman, E, Gelernter, D, and Kaminsky, D (1995). Adaptive parallelism and Piranha. IEEE Computer, pages 40-49.
Casas, J, et al. (1995). MPVM: A migration transparent version of PVM. Computing Systems, 8(2):171-216.
Culler, D E, et al. (1993). Parallel programming in Split-C. In Proceedings of the Supercomputing '93 Conference.
JavaSoft (1997). Java RMI. Technical report, Sun Microsystems, Inc. http://java.sun.com/products/rmi/.
Kale, L V and Krishnan, S (1993). CHARM++: A portable concurrent object-oriented system based on C++. In Proceedings of the Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA'93).
Konuru, R, Casas, J, Otto, S, Prouty, R, and Walpole, J (1994). A user-level process package for PVM. In Proceedings of the Scalable High Performance Computing Conference, pages 48-55.
Roe, P and Szyperski, C (1997). Mianjin is Gardens Point: A parallel language taming asynchronous communication. In Fourth Australasian Conference on Parallel and Real-Time Systems (PART'97), Newcastle, Australia. Springer.
von Eicken, T, Culler, D, Goldstein, S C, and Schauser, K E (1992). Active Messages: A mechanism for integrated communication and computation. In Proceedings of the 19th International Symposium on Computer Architecture.