Distributed Parallel Computing in Networks of Workstations — A Survey Study
Rinat Khoussainov, Ahmed Patel
Computer Networks and Distributed Systems Research Group, Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland
and
H.D.J. ten Voorde
Department of Information Technology and Systems, Delft University of Technology, Delft, The Netherlands
ABSTRACT
Recent developments in networking have turned computer networks into attractive platforms for parallel computing, bringing in a new concept of network computing, in which a network is viewed as a multi-processor parallel computer. Numerous programming environments have been developed to support parallel execution of programs on computer networks. This paper presents a survey and analysis of existing frameworks for network computing, with a particular focus on computing systems for Networks of Workstations (NOWs). A classification and assessment scheme for such systems is proposed. The advantages and drawbacks of these systems are analysed, and future perspectives are discussed.
Keywords: Distributed Parallel Computing, Networks of Workstations, Survey
INTRODUCTION
The demand for computational power has been one of the major driving forces in computer science since its beginnings. Technology has achieved spectacular progress in computing performance, data storage capacity, circuit integration scale, etc. over the past couple of decades, yet it often appears insufficient to fulfil the resource requirements of modern applications. Ironically, powerful computers stimulate attempts to approach more complex problems, previously considered infeasible to solve, and this trend is very likely to continue in the future. The only way to achieve a practically unlimited increase in performance is parallel computing.
Recent developments in networking have turned computer networks into attractive platforms for parallel computing, bringing in a new concept of network computing [1]. Network computing allows the network to be viewed as a computer, with the obvious benefits of its scalable cumulative power, more efficient use of existing resources, and more effective system management. The basic type of message is no longer simple data but a program, e.g., a mobile agent or applet. The celebrated achievements in cipher breaking on the Internet have clearly demonstrated the capability of network computing systems to solve intractable problems. However, efficient utilisation of the computational potential of computer networks is still an open problem and a research challenge for today and the future.
Numerous programming environments have been developed to support parallel execution of programs on computer networks. These systems operate at various levels of abstraction, employ different formal models for the representation of parallelism, use specialised or general-purpose languages, and range from geographically distributed to local area network-based systems. The development of these systems has usually followed rather ad hoc approaches built around a few basic ideas. This has led to frameworks that often cover only a small part of the issues arising in network computing environments, or to frameworks that are specialised and applicable only to a limited set of tasks.
Different environments have different advantages as well as drawbacks. Although the design of the best-ever system for distributed parallel computing seems an unrealistic target, understanding the sources of these advantages and drawbacks should help researchers to achieve better performance and to improve other important properties of network computing frameworks, such as fault tolerance, ease of programming, etc. The purpose of this paper is to bring some order to the "wild" world of systems for distributed parallel computing. We not only present a survey of existing frameworks, but also analyse the most popular models of parallel computation employed in these systems, discuss how these models are mapped onto particular implementations, and what advantages are achieved by such mappings. Finally, we propose a classification scheme that allows for assessment and comparison of different network computing frameworks.
PLATFORMS FOR NETWORK COMPUTING
Features of the underlying computing platform significantly influence the requirements for, design of, and final characteristics of network computing frameworks. Thus, we begin our study with an analysis of network computing platforms.
Computing networks can be considered as distributed-memory MIMD machines: each network node has its own processor (or processors) and a local memory, and all nodes are interconnected via the network. Although architectures with shared memory are also possible, they are much less common and are mostly used in centralised parallel computers. Depending on the type of the network and the operational mode of separate nodes, platforms for network computing can be classified as Local Area Networks (LANs), Clusters, and Wide Area Networks (WANs). Although clusters may seem an instance of local area networks, there are a number of significant differences between them:
• Nodes in a cluster are homogeneous in terms of the processor and machine architecture.
• Clusters are composed of dedicated machines with centralised management. The whole cluster can be allocated to a single job, while in a LAN the owners of separate network nodes have priority use of node resources.
• Clusters may use very high-speed networks that are usually considered too expensive for an ordinary LAN (e.g., Gigabit Ethernet, Myrinet, etc.).
• Clusters may use specialised system software (e.g., distributed operating systems, etc.) that supports distribution of work and resource management.
Attempts to utilise geographically distributed computing resources have always suffered from poor management, relatively slow network links between participating computers, and security breaches. Besides, network heterogeneity requires the use of platform-independent programming languages, such as Java, which often results in low execution speed of the mobile code. On the other hand, it is much easier to cope with these problems in local networks. Indeed, they already have the required administrative and management infrastructure in place. They provide acceptable throughput, and the parameters of the network are more predictable. Finally, the diversity of computing platforms in a local network is, as a rule, quite limited, thus allowing for efficient multi-platform implementation of applications (e.g., in the most trivial case by pre-compiling the software for every platform used in the computation).
Local networks of workstations (NOWs) have a number of important differences from traditional centralised multiprocessor parallel computers:
• The amount and availability of computation resources in LANs change more dynamically and are harder to predict.
• Workstations are less reliable than separate CPUs in a parallel computer; e.g., they can simply be switched off.
• Communication speed in a LAN is significantly lower than the exchange rate between CPUs in a multi-processor system. Therefore, the communication intensity between processes becomes a more critical factor in network computing environments.
Taking into account the above-mentioned factors, we particularly focus on the following aspects of network computing frameworks:
• Utilisation of computing resources in local area networks rather than in wide area networks.
• Mechanisms for adaptation to changes in the availability of computing resources (workstations) in a network.
• Mechanisms for coping with faults of separate computing nodes in the network.
• Analysis of the properties of individual network nodes (e.g., CPU speed, available RAM, connection speed and bandwidth, etc.) and balancing of the work distribution accordingly.
However, this does not mean that other types of distributed parallel computing systems are of less importance. Techniques used by WAN-based systems and even by centralised parallel computers may be beneficially employed by the LAN-based network computing frameworks and vice versa.
EXISTING ENVIRONMENTS AND TOOLS
In our survey, we group systems according to their most prominent features, although different systems in the same group may use different models of computation, languages, computing platforms, etc. The purpose of this section is to give an overview of existing systems.
Communication Libraries
Communication libraries are usually considered low-level systems for distributed parallel computing. They provide only primitive support, including data exchange between parallel processes and very basic process management (e.g., process creation and destruction). These libraries implement one of two communication techniques: message passing or shared memory.
Message Passing Interface (MPI) [2] and Parallel Virtual Machine (PVM) [3] are the most popular examples of message passing systems. MPI is a standardised interface for inter-process communication. While early versions of MPI provided only messaging primitives, the MPI-2 version of the standard adds process creation and start-up functions. MPI comes in the form of a software library (e.g., MPICH), so the programmer can use calls to library functions to perform process management or data exchange between processes. Implementations of MPI exist for popular programming languages (e.g., Fortran, C, C++, and Java) on a variety of platforms, including networks of workstations.
PVM is more oriented towards networks of workstations. It provides an abstraction of a processor pool, where each processor in the pool can be a separate network node. The programmer can use PVM routines to start or stop a process or to send messages between processes. A NOW-based implementation of PVM consists of a PVM daemon, responsible for managing processes on a network node, and a library containing PVM communication and process management functions. Implementations of PVM exist for different architectures. Moreover, PVM programs written for different architectures can communicate with each other, thus allowing heterogeneous network computing systems to be built.
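To illustrate the level of abstraction at which such communication libraries operate, the following minimal sketch (our own illustration, not an excerpt from the MPI standard) shows two MPI processes exchanging a single integer with blocking point-to-point calls. Note that the programmer explicitly decides what is sent, when, and to which process.

    /* Minimal MPI sketch: rank 0 sends one integer to rank 1.
       Build with an MPI C compiler (e.g. mpicc from MPICH) and
       run with two processes (e.g. mpirun -np 2 ./a.out). */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;
            /* Destination rank (1) and message tag (0) are chosen by the programmer. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
            printf("process 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }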
Shared memory is a more generic communication technique (e.g., message passing can be implemented on top of shared memory). Shared memory allows the use of more standard communication and synchronisation paradigms, such as shared variables and semaphores. A number of systems provide an implementation of shared memory on computer networks, called Distributed Shared Memory (DSM).
TreadMarks [4] provides distributed shared memory as a linear array of bytes. A number of techniques are used to optimise memory performance and to reduce the amount of communication. For example, the system employs a lazy release consistency model, where local memory updates are reflected on remote nodes only when processes running on those nodes synchronise with local processes (e.g., between conflicting memory accesses). A Multiple-Writer Protocol allows the system to give write access to the same page to several processes.
Linda [5] is an example of a structured distributed shared memory. In Linda, the memory is called a tuple space, where a tuple is a set of data values. Processes can put or read tuples in the tuple space. Read access is associative: a process fills in a part of a tuple and then issues a read request. The process is then blocked until there is a tuple in the tuple space matching the filled-in values of the given tuple. The missing values are returned as the result of the read operation.
There are also a number of projects developing system-level communication libraries that significantly speed up message exchange between machines by avoiding the standard OS network stack and delivering messages directly to destination processes. Examples are Active Messages (AM) [7], GAMMA [6] (also based on AM), and M-VIA [8].
Batch-processing systems
Systems of this class are used to run a set of independent jobs on a set of networked machines. The most common use of such systems is the utilisation of idle workstations in a network. Usually, such a system has a queue of submitted jobs. When a workstation becomes idle, the system extracts a job from the queue and starts it on that workstation.
Condor [9] is an example of such a batch-processing system. Condor allows for the specification of the resource requirements of a job and tries to match jobs and machines. When programming with Condor, a user has to link her/his program with a Condor library instead of the standard C library. This allows Condor to redirect all input/output calls from the machine where the job is executed to the machine from which the job was submitted to Condor. Condor can transparently checkpoint the jobs, so that if a workstation crashes, only a small part of the work is lost. Checkpointing also allows for migration of processes, e.g. when a better match between the job requirements and machine characteristics is discovered.
MOSIX [10] is another example of a process migration environment. It is built as an enhancement of the Linux kernel and provides redirection services for I/O operations. MOSIX provides dynamic load balancing, probabilistic dissemination of information about available resources, and memory ushering. Unlike Condor, MOSIX is decentralised, so nodes may join and leave the network independently. MOSIX can be used with parallel programming environments (e.g., PVM or MPI) just as a traditional Linux system can.
Parallel computing frameworks for NOWs
These parallel computing frameworks are, as a rule, oriented towards the execution of a single job on a network of workstations in parallel. The purpose of such systems is to make a computer network act as a virtual parallel computer. The difference from, say, PVM is that these systems provide more high-level support for the programmer. For example, they may conceal the mapping between processes and processors, perform implicit synchronisation or communication between parallel processes, etc. Such frameworks consist of programming language support and runtime support. The programming language support usually extends an existing programming language with constructs required to write parallel programs, e.g., parallel loops, spawning of parallel processes, and synchronisation and communication functions. The runtime support is responsible for parallel execution of a program on a computer network. It may deal with distribution of work between nodes, faults of separate machines, migration of processes, runtime communication and synchronisation, etc.
A-BSP [11] implements the Bulk Synchronous Parallel (BSP) computation model on NOWs. In the BSP model, there is a set of processes executing in parallel. The computation proceeds in a series of supersteps, and at the end of each superstep there is a barrier synchronisation. Processes can communicate with each other, but values sent through the communication network are not guaranteed to reach the destination processes before the end of the current superstep. In brief, the processes work independently in parallel for a while; then they exchange computation results and synchronise; after that, the sequence repeats (see the sketch below). In A-BSP, workers are linked in a logical ring. The initial state of each process is replicated on several workers in the ring. When a worker finishes the computation of one process, it sends the results to its successor in the ring. If a worker does not get a message from its predecessor, it starts executing the process of its predecessor using the replicated initial state. This allows the system to cope with the situation when a worker crashes, or becomes busy and exits the logical ring. By analogy, when a machine becomes idle, it is included into the logical ring and utilised in the computation. This scheme is called the Adaptive Replication System (ARS). When the workers finish the computations of all processes in a superstep, they distribute (replicate) the new process state data and synchronise to continue with the next superstep.
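The superstep structure of the BSP model can be sketched as follows. This is our own illustrative skeleton and does not use the A-BSP runtime; for concreteness it borrows MPI primitives for the exchange and barrier phases, and the local computation is a trivial per-process sum.

    /* Illustrative BSP superstep skeleton (not the A-BSP interface).
       Each superstep: independent local work, an exchange of results,
       and a barrier synchronisation before the next superstep starts. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, local = 0, global = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int step = 0; step < 3; step++) {
            local += rank + 1;                       /* local computation phase */
            MPI_Allreduce(&local, &global, 1, MPI_INT,
                          MPI_SUM, MPI_COMM_WORLD);  /* exchange of results */
            MPI_Barrier(MPI_COMM_WORLD);             /* barrier ends the superstep */
            if (rank == 0)
                printf("superstep %d: global sum = %d\n", step, global);
        }

        MPI_Finalize();
        return 0;
    }

(The explicit barrier is shown only to mark the superstep boundary; in this toy example MPI_Allreduce already synchronises the processes.)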
Cilk-NOW [12] is based on a Directed Acyclic Graph (DAG) model of parallel algorithms. Each thread in Cilk can spawn a child thread that executes in parallel with its parent. When a child finishes its computation, it returns results to its parent. The parent can perform synchronisation, which in Cilk means waiting until all children of the thread have returned their computation results. Cilk is based on the C language and makes a distinction between normal C functions and Cilk functions: a call to a Cilk function spawns a child thread executing that function in parallel. At runtime, each processor executes threads from a queue of ready threads (i.e., threads not waiting for their children). If no threads are available in the ready queue, the processor engages in work stealing: it randomly chooses another processor (the victim) and requests a thread to execute. Adaptive parallelism allows Cilk to take advantage of idle machines. Whenever a workstation is not used, it can "steal" and execute work from others. Vice versa, whenever a workstation becomes busy, it offloads all its threads back to the corresponding victims. Given the Cilk adaptive parallelism, fault tolerance is only a short step away. When a worker learns of a crash, it goes through all of its stolen threads, checking each of them to see if it was stolen by the crashed worker. Each such thread is returned to the ready queue.
Calypso [13], a part of the MILAN project [13], uses an extended BSP model. In this model, a program may have sequential parts as well as BSP-like parallel steps. A manager process in Calypso is responsible for executing the sequential parts of a program and for scheduling parallel tasks between available workers. Calypso uses eager scheduling: the work is scheduled until at least one worker completes it, so the same work may be scheduled to several workers. This allows Calypso to cope with faults of separate machines and with bottlenecks caused by the assignment of a task to a slow worker. Chime [13], also a part of the MILAN project, in addition to the Calypso features also provides multi-processor shared memory (Calypso supports only shared variables), nested task parallelism, and inter-task synchronisation. In Chime, each task in a parallel step can in turn perform parallel steps (nested parallel tasks), so a Chime program forms a DAG. Synchronisation in Chime is performed using single-assignment variables (i.e., a reading process is blocked until a writing process assigns a value to the variable).
A number of systems (e.g., Linda, Piranha [14]) use the Linda DSM and Master/Worker parallelism. The master divides the initial work into pieces that can be executed in parallel and puts their descriptions into the Linda tuple space. The workers read the submitted descriptions, execute them, and return results into the tuple space. Implementation of Single Program Multiple Data (SPMD) algorithms is quite straightforward in this model, although workers may execute different code as well. Piranha also exhibited adaptive parallelism, that is, when a workstation becomes idle it gets and executes a task from the tuple space.
OpenMP [15] is a standardised API for parallel computing on shared memory computers. OpenMP extends an existing language with keywords required for parallel execution, synchronisation, and shared variables. The structure of a parallel algorithm is similar to the one in Calypso: there are sequential parts and there are parallel steps that can perform either the same operations on different data (parallel do) or a set of different tasks in parallel (parallel/end parallel directives); a minimal example is sketched below. The implementation of OpenMP on NOWs [16] uses TreadMarks to provide shared memory. It also introduces a few changes to the original OpenMP specification to obtain a more efficient implementation of DSM.
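As an illustration of this structure, the sketch below uses the C binding of OpenMP (reference [15] above is the Fortran specification) to show a sequential part followed by a data-parallel step; it is our own minimal example, not code taken from the cited systems.

    /* Minimal OpenMP sketch: a sequential part followed by a data-parallel
       step over an array. Compile, for example, with "gcc -fopenmp". */
    #include <stdio.h>

    #define N 1000

    int main(void)
    {
        double a[N], sum = 0.0;

        /* Sequential part: executed by the initial thread only. */
        for (int i = 0; i < N; i++)
            a[i] = (double)i;

        /* Parallel step: loop iterations are divided among the threads,
           and partial sums are combined by the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++)
            sum += a[i];

        printf("sum = %f\n", sum);
        return 0;
    }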
Gardens [17] views a computation as a network of communicating tasks. These tasks are dynamically mapped onto workstations, and Gardens can migrate tasks when the availability of network resources changes. For communication, tasks use Global Objects, an AM-based communication mechanism. The difference between normal objects and Global Objects is that Global Objects are also visible outside the task that owns them, so remote tasks can call certain methods (labelled as "GLOBAL") of a Global Object. There is an analogy between Global Objects and asynchronous Remote Procedure Calls (RPC) with a synchronous acceptance mechanism. Another interesting feature of this system is that the Gardens programming language, Mianjin, is derived from Oberon, which allows for the writing of safe programs.
Web-based Systems
These systems employ the Java language to achieve portability and to utilise globally distributed computing resources. As a rule, in such systems the code running on a web server distributes work between computers connecting to the server. For communication between the central server and workers, the systems may use custom mechanisms or Java Remote Method Invocation (RMI). Charlotte [13] uses shared memory located on the server and a Calypso-like programming style. The parallel steps are composed of routines that can be executed on remote machines. The central server also maintains the coherence of the shared memory. The Manta system [18] optimises RMI by using different communication protocols (TCP/IP or AM) for different communication networks. Java Market [19] targets an Internet-wide metacomputing system that brings together people who have work to execute and people who have spare computing resources. The goal of Java Market is to make it possible to transfer jobs to any participating machine. In addition, Java Market establishes cost-beneficial relations between resource producers and consumers: consumers pay for used resources and producers are paid depending on their quality of service.
CORBA-based systems
CORBA-based systems use CORBA mechanisms for communication between parallel tasks and for task management. Usually, such systems (e.g., PARDIS [20], Cobra [21]) implement the SPMD programming model by replacing a single CORBA object with a distributed collection of objects executing the same CORBA request on different portions of data. The runtime system distributes data between the objects and collects the results. The Cobra system also provides a resource allocator, while in PARDIS the distribution of objects is left to the programmer.
DISCUSSION AND CLASSIFICATION
The survey shows that most of the frameworks for network computing employ a quite limited range of models of parallel computation. These are BSP and its extensions, DAG, DSM with Master/Worker parallelism, and message passing.
For message passing, the programmer has to decide on the decomposition of a task into pieces (processes) that will be executed in parallel, when and where each process should be started, what data should be communicated, between which processes, and when. Obviously, this significantly complicates the development of applications for such libraries and makes the efficiency of such systems a matter of the programmer's talent. On the other hand, such libraries are very flexible and can be adapted to a wide range of application areas. Library interfaces can be well standardised and incorporated into existing programming languages. These advantages facilitate the portability of applications and minimise the time necessary for a programmer to learn the system.
In the A-BSP implementation of BSP, the set of available workers in a network executes the work in a superstep until all processes reach the barrier. The more workers are available and the faster they are, the faster the superstep is executed. If a worker crashes, others do its work. Such an organisation is particularly effective when BSP programs are executed on non-dedicated workstations. On the other hand, on dedicated machines the system produces an undesirable (though not too big) overhead for data replication. In general, BSP is a simple yet efficient model. The BSP extensions combine BSP's advantages with greater flexibility of the program structure.
The DAG model is also quite simple and intuitive for a programmer used to working with function or procedure calls. Disadvantages of the model may come from restrictive implementations. For example, the Cilk-NOW language is essentially functional; besides, Cilk-NOW does not allow for redistribution of work, so a slow worker can slow down the whole computation even if faster workers become available later.
Batch-processing systems may suffer from the lack of communication between processes. Such systems are applicable only to specific sorts of tasks, e.g., when the same program is to be run with different input data. The system cannot speed up the execution of a separate job, but it can run several jobs in parallel if there are available resources, and it can ensure positive progress on a single job over a long period of time (Condor developers call this high-throughput computing). However, these systems can serve as a basis for parallel computing environments, e.g., when combined with other tools, such as PVM. In this case, processes can work on a single parallel computation, while Condor (or MOSIX) is responsible for the optimal allocation of available resources to the processes.
The above discussion leaves the impression that assessment and classification of different network computing systems is not an easy problem. To facilitate the analysis of such frameworks, we developed a classification and assessment scheme that is based on our survey as well as on the analysis of specific properties of network computing environments. The classification concentrates on the following aspects of distributed parallel computing systems:
• Performance: Can the system performance be predicted/bounded? Does this apply to execution time, communication intensity, and communication time?
• Programming interface: Consider a set of generic actions that are performed by either the programmer or the system during the execution of a parallel program. The question is how the responsibilities are divided between the system and the programmer. For example:
  • Create a thread/process (a piece of work that could be executed in parallel with others): What can be a thread body and who defines it? Who decides when it is created and where (on which processor)?
  • Delete a thread/process: Who decides which thread and when?
  • Execute a thread/process: Who decides which thread and when?
  • Migrate a thread/process (transfer a thread from one processor onto another): Who decides which thread, when, and where to migrate?
  • Synchronise threads (wait until threads reach a certain point in their executions): Who specifies which threads should be synchronised and when?
  • Send data between threads: Who decides what to send and when? Is it necessary to specify the destination thread?
  • Receive data: Who decides what to receive and when? Is it necessary to specify the source thread?
• Adaptation capabilities:
  • Processor availability: How capable is the system of adapting to changes in CPU availability? E.g.: a fixed set of processors (dedicated workstations); processors are acquired as they become free and returned when requested; processors are acquired when they become free; work may be redistributed depending on processor capabilities (speed, memory, etc.).
  • Faults of separate processors: Is the system capable of recovering after a fault of a processor? How are faults detected? Is there any adaptation to changes in the environment (e.g., transmission latency, etc.)? How does the system recover (restarts the whole computation, restarts only the crashed thread, restarts the crashed thread from the last checkpoint, etc.)?
• Safety: How safe is the program? Is safety enforced by the programming language (i.e., checked at compile time)? Does the runtime system look after errors (is it capable of correcting an error, or does the thread just crash safely when it does something wrong)?
• Security: How secure is the program execution for the program itself and for the execution platform? That is, can the program corrupt data or crash the node it is executed on, and can the owner of a network node interfere with the program execution (e.g., by falsifying results, etc.)? How does the system maintain its integrity? (Is there any authentication between workers, or protection of inter-process communications?)
• Platform properties: Platform type (SMP, heterogeneous workstations, homogeneous workstations, etc.) and degree of distribution (LAN, WAN, etc.).
The proposed classification scheme reveals the fact, already mentioned in the introduction, that many systems are quite imbalanced. That is, they concentrate only on a limited number of features, leaving other important aspects unresolved. In addition, the most common weak points are support for platform heterogeneity and security.
CONCLUSIONS
In this paper, we presented a survey study of existing environments and tools for distributed parallel computing. Many systems have been developed to date that support the execution of parallel programs on computer networks. Our work attempted to provide a basis for a systematic analysis of such systems. In particular, the survey shows that despite the "mushrooming" of available systems, there is still room for research in this area and opportunities for the development of advanced network computing frameworks. In our opinion, the two most intriguing questions about distributed parallel computing are:
• Why have most of these systems not become widely accepted and used? and
• Where does the future lie for distributed parallel computing frameworks?
Indeed, the most popular systems are the most primitive ones, such as MPI or PVM, while much more advanced frameworks are used mainly inside the walls of the universities that developed them. The answer to the first question may well be the key to a truly successful framework for distributed parallel computing. The authors believe that the future of network computing should be closely related to the development of open hierarchical distributed architectures for parallel computing: architectures that can bring together different existing systems, application-specific requirements, and various hardware platforms distributed all over the globe. The openness of such architectures should allow them to benefit from the advantages offered by different parallel computing models and to adapt to the properties of different applications and network environments. A number of such systems are being developed (e.g., Globus [22], Legion [23]) that provide global resource management, communication, information, and security services for widely distributed computation grids.
REFERENCES
[1] D. Clark et al. "Strategic Directions in Networks and Telecommunications". ACM Computing Surveys, Vol. 28, No. 4, December 1996.
[2] J. Dongarra et al. "An Introduction to the MPI Standard". Technical Report CS-95-274, University of Tennessee, January 1995.
[3] A. Beguelin. "PVM: Experiences, Current Status and Future Directions". Technical Report CS/E 94-015, Oregon Graduate Institute CS, 1994.
[4] C. Amza, A.L. Cox et al. "TreadMarks: Shared Memory Computing on Networks of Workstations". IEEE Computer, Vol. 29, No. 2, February 1996.
[5] N. Carriero, D. Gelernter. How to Write Parallel Programs: A First Course. MIT Press, Cambridge, 1990.
[6] G. Chiola, G. Ciaccio. "GAMMA: a Low-cost Network of Workstations Based on Active Messages". In proc. of PDP'97 (5th EUROMICRO Workshop on Parallel and Distributed Processing), London, UK, January 1997.
[7] A.M. Mainwaring, D.E. Culler. Active Messages: Organization and Applications Programming Interface. Technical document, 1995.
[8] Virtual Interface Architecture Specification. Draft Revision 1.0, December 1997.
[9] http://www.cs.wisc.edu/condor/
[10] A. Barak, O. La'adan. "The MOSIX Multicomputer Operating System for High Performance Cluster Computing". Journal of Future Generation Computer Systems, Vol. 13, No. 4-5, March 1998.
[11] M. Nibhanupudi. "Adaptive Bulk-Synchronous Parallelism in a Network of Non-dedicated Workstations". In proc. of the 12th Annual International Symposium on High Performance Computing Systems and Applications (HPCS'98), 1998.
[12] R.D. Blumofe, P.A. Lisiecki. "Adaptive and Reliable Parallel Computing on Networks of Workstations". In proc. of the USENIX 1997 Annual Technical Conference on UNIX and Advanced Computing Systems, Anaheim, California, January 6-10, 1997.
[13] A. Baratloo, P. Dasgupta, V. Karamcheti, Z.M. Kedem. "Metacomputing with MILAN". In proc. of the Heterogeneous Computing Workshop, International Parallel Processing Symposium, April 1999.
[14] N. Carriero, D. Gelernter, D. Kaminsky, J. Westbrook. "Adaptive Parallelism with Piranha". Technical report, Department of Computer Science, Yale University.
[15] The OpenMP Forum. OpenMP Fortran Application Program Interface, Version 1.0. http://www.openmp.org, October 1997.
[16] H. Lu, Y.C. Hu, W. Zwaenepoel. "OpenMP on Networks of Workstations". In proc. of Supercomputing'98, October 1998.
[17] P. Roe, C. Szyperski. "Gardens: An Integrated Programming Language and System for Parallel Programming Across Networks of Workstations". In proc. of the 21st Australasian Computer Science Conference (ACSC'98), Perth, Western Australia, 4-6 February 1998.
[18] R. van Nieuwpoort et al. "Wide Area Parallel Computing in Java". In proc. of JAVA'99, San Francisco, CA, USA, 1999.
[19] Y. Amir, B. Awerbuch, R. Borgstrom. "The JavaMarket: Transforming the Internet into a Metacomputer". Technical Report CNDS-98-1, The Johns Hopkins University, 1998.
[20] K. Keahey, D. Gannon. "PARDIS: A Parallel Approach to CORBA". In proc. of the 6th IEEE International Symposium on High Performance Distributed Computing, August 1997.
[21] T. Priol, C. René. "Cobra: A CORBA-compliant Programming Environment for High-performance Computing". In proc. of Euro-Par'98, LNCS Vol. 1470, Springer Verlag, Southampton, UK, September 1998.
[22] I. Foster, C. Kesselman. "The Globus Project: A Status Report". In proc. of the IPPS/SPDP '98 Heterogeneous Computing Workshop, 1998.
[23] A.S. Grimshaw, W.A. Wulf et al. "The Legion Vision of a Worldwide Virtual Computer". Communications of the ACM, Vol. 40, No. 1, January 1997.