Development Routes for Message Passing Parallelism in Java

J.A. Mathew, H.A. James and K.A. Hawick
Distributed and High Performance Computing Group
Department of Computer Science, University of Adelaide
Adelaide, SA 5005, Australia
Tel +61 8 8303 4519, Fax +61 8 8303 4366
[email protected]

ABSTRACT

Java is an attractive environment for writing portable message passing parallel programs. Considerable work on message passing interface bindings for the C and Fortran languages has been done. We show how this work can be reused and bindings for Java developed. We have built a Pure Java Message Passing Implementation (PJMPI) that is strongly compatible with the MPI standard. Conversely, the imperative programming style bindings are not entirely appropriate for the Java programming style, and we have therefore also developed a less compatible system, known as JUMP, that enables many of the message passing parallel technological ideas but in a way that we believe will be more appropriate to the style of Java programs. JUMP is also intended as a development platform for many of our higher level ideas in parallel programming and parallel paradigms that MPI enables but does not directly implement. We review ongoing attempts at reconciling Java and MPI. We have looked at some of the more advanced Java technologies, specifically Jini and JavaSpaces, which may contribute to Java message passing, but have found the performance of these to be somewhat deficient at the time of writing. We have therefore designed JUMP to be independent of Jini and JavaSpaces at present, although use of these technologies may become desirable in future. We describe the classloading problem and other techniques we have employed in JUMP to enable a pure Java message passing system suitable for use on local and remote clusters amongst other parallel computing platforms.

Keywords: Java, Java Grande, message passing, MPI, PJMPI, JUMP, Jini, JavaSpaces.

1. INTRODUCTION

Message based parallel computing is an important approach to achieving high performance on many existing computing platforms, and especially on cluster based computing systems [4]. The Java programming language and environment is an attractive implementation technology for such message based parallelism due to the large and growing number of Java programs and programmers in existence. As has been discussed elsewhere [19], Java Virtual Machines and libraries still have some performance limitations, but since these seem certain to improve it is worthwhile to consider the use of Java for high performance message passing systems now. This is an active area with several groups attempting implementations [2, 11, 21]. Developers seem to split into two major camps: those attempting to provide Java bindings for the MPI [13] interface, and those who have implemented non-conforming systems. We find good arguments for both these approaches. Considerable work on message passing in a procedural or imperative style of programming has been carried out by the MPI Forum and this should not be lost or reinvented. Conversely, there seems widespread agreement that the bindings are not entirely appropriate for an Object Oriented language like Java, and that a lot of additional ease of use can be supplied by exploiting method overloading (assumed arguments), encapsulated information and other OO techniques.

Similar controversy and indecision has plagued the message passing community over what should and could be accepted as a de facto standard for message passing in an object based language like C++. This issue is still unresolved, although several good approaches have been proposed. Most previous MPI implementations have been constructed using a native C implementation, with any additional language support provided through bindings, e.g. to Fortran. This has been done for Java [2], but it is not an ideal approach given the dependence on a native MPI system and issues such as the lack of portability. It is possible, and we believe desirable, to consider a pure Java implementation as well. Performance issues are of course important, however, and it seems likely that native and pure implementations may need to coexist for some time yet. In the face of the controversy and difficulty in arriving at an agreed standard approach we have developed two separate message based systems in Java. One, PJMPI, is a partial attempt to conform to the MPI standard, although using pure Java implementation techniques. The second, JUMP, is influenced by MPI ideas but is not intended to be compliant. Our main interest is in developing higher level parallel constructs and harnesses that might ultimately provide a base library for a parallelisation tool. PJMPI uses Java RMI and raw sockets for communications setup and operation respectively. JUMP uses only sockets for communications and is able to load its own classes without the need for shared stubs/skeletons or anything other than a known port number.

In this paper we describe these two systems, characterising the two main approaches to developing Java messaging systems, and compare their relative merits. We discuss performance measurements in section 4. We also consider the impact of emerging technologies like Jini and JavaSpaces services as a vehicle for implementing JUMP.

2. JAVA-MPI IMPLEMENTATIONS

There are numerous publicly available libraries to support distributed computing. Most of these are based either on message passing or on shared distributed objects. The de facto standard for distributed computing based on message passing is MPI [13]. MPI is currently available for a large number of systems, with bindings for C, C++ and Fortran, hence it would be worthwhile having a Java-based implementation. The Java Grande Forum [19] has also been working on developing a standard API for a Java binding to MPI [5].

There are two basic approaches to developing a Java based MPI system. The first is to develop a Java binding for native MPI implementations using the Java Native Interface (JNI). This has the advantage of being able to utilise existing MPI implementations. JavaMPI [6] and mpiJava [2] are implemented as wrappers for native MPI implementations. An effort by the developers of WMPI (an MPI system for Microsoft Windows) to develop a Java interface for WMPI is described in [21].

The second approach is to implement MPI (or a subset of the MPI standard) in Java. One of the key features of Java is its platform independence; hence we believe that for Java to become widely used in the distributed computing community an efficient pure Java MPI implementation is vital. There are a few efforts currently underway to develop such a system. One such system, MPIJ, runs as part of the DOGMA [7] system. The DOGMA runtime system is freely available but the source is not. A limitation of MPIJ is that it requires the DOGMA system to be installed and is not available as a standalone system; the installation of DOGMA onto a system is also non-trivial. A similar, albeit commercial, project is underway by MPI Software Technology [23]; few details about the status of this project are available. Another effort to develop a pure Java version of MPI is the JMPI project [6]. The JMPI system largely complies with the draft Java MPI API [5]. JMPI is layered upon JPVM [9], a pure Java PVM implementation. Published figures [2] for latency and bandwidth show that both JMPI and JPVM have a significant overhead on an UltraSparc 1 (compared to PVM). JMPI is currently not publicly available.

3. PJMPI ARCHITECTURE

Despite the previous work done to provide MPI support for Java, we believe that there is still work to be done in creating a pure Java open source MPI system. Our PJMPI system is designed and implemented as a simple pure Java "MPI-like" system. The MPI specification describes a total of 132 functions; however, only a small subset of these are commonly used. The initial implementation of the PJMPI library includes the functions we consider to be the most important. These are the control and simple messaging routines init, finalize, rank, size, send, isend, recv and irecv, as well as the simpler subset of multicast routines bcast, barrier, reduce, allreduce, gather, allgather and scatter. The syntax we have chosen is largely similar, but not identical, to that specified in [5].

PJMPI is implemented as a pure Java multi-threaded daemon that executes on each participating node. Users submit their program via a client to a daemon. The daemon then communicates with daemons running on other hosts as required using TCP/IP sockets, requesting that they execute the tasks that comprise the job. Each task runs as a separate thread within the daemon program.

The client program is available in two versions. One is based on a command line interface: the user is required to provide as arguments the program (class file) to be executed and the number of nodes to use, and may optionally provide a file listing the nodes to be used for the computation. A GUI-based version of the client, based on Java Swing components, has also been developed.

The program to be executed must implement a wrapper interface MPI_Program, which has an execute() method: this contains the program code to be executed by the daemon. This interface also has a complete() method that returns the percentage of the task that has been completed. The graphical client program periodically requests the amount of the task that has been completed and displays this value using status bars.
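The following is a minimal sketch of what this wrapper interface and a trivial user program might look like; the exact signatures in PJMPI may differ, and the HelloTask class is purely illustrative.

```java
// Hypothetical sketch of the PJMPI wrapper interface; actual PJMPI
// signatures may differ.
public interface MPI_Program {
    // Main body of the user's task, run by the daemon in its own thread.
    void execute();

    // Percentage (0-100) of the task completed so far, polled
    // periodically by the graphical client for its status bars.
    int complete();
}

// A trivial user program implementing the interface.
class HelloTask implements MPI_Program {
    private volatile int done = 0;

    public void execute() {
        // ... perform rank/size queries and message passing here ...
        done = 100;
    }

    public int complete() {
        return done;
    }
}
```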

Once the job has been submitted, output from the various tasks can be redirected to the client. This is implemented via a static method of the MPI class, output(), that enables any task to write to the standard output stream of the client program.

Once a daemon receives a job, it contacts daemons on other nodes as required to start tasks on those nodes. The daemon that initiates the job creates a unique job ID that can be used to identify all tasks associated with a particular job. The initiating daemon starts the task with a rank of 0. Each new task runs as a separate thread in the corresponding daemon program. On calling the init() method, each task obtains a list of all tasks and corresponding hosts for a particular job from the initiating daemon. A socket connection between each pair of nodes participating in the job is created at this point: these sockets are used for communications as required by the program.

As well as supporting the arguments normally used in MPI, the send() and recv() methods have been overloaded to accept fewer arguments, since information about data types and array sizes is available in Java through the use of reflection. This is an example of where the syntax of MPI is "unnatural" in Java.
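A minimal sketch of such an overload is shown below; the reduced and full method signatures are assumed for illustration rather than quoted from the PJMPI source.

```java
import java.lang.reflect.Array;

// Hypothetical sketch of the overloading described above: the reduced
// form infers the element type and count via reflection, then delegates
// to the full MPI-style form. Names and signatures are illustrative.
class SendOverloadSketch {

    // Reduced form: datatype and count arguments omitted by the caller.
    public void send(Object buffer, int dest, int tag) {
        Class<?> elementType = buffer.getClass().getComponentType();
        int count = Array.getLength(buffer);
        send(buffer, count, elementType, dest, tag);
    }

    // Full MPI-style form (transport implementation elided).
    public void send(Object buffer, int count, Class<?> elementType,
                     int dest, int tag) {
        // ... marshal and transmit over the appropriate socket ...
    }
}
```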

The daemon is multi-threaded, and accepts incoming connections even if no matching recv() has been posted. These connections are stored internally until required by the program. Blocking and non-blocking send and receive operations have been implemented, but not buffered or ready communications operations.

The current implementation assumes that class files of the program being executed are present on the CLASSPATH of all daemons; however, this limitation could be removed by using the RMIClassLoader to distribute the class files. A problem with this approach is that the RMIClassLoader obtains class files from an HTTP server, and hence a simple HTTP server would need to be incorporated into the client. An alternative approach would be to develop a custom classloader using sockets, similar to that used by the Code Server [22].
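A minimal sketch of such a socket-fed classloader follows, assuming a simple name-request/length-prefixed-bytecode wire format; it is illustrative, not the Code Server implementation.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.net.Socket;

// Hypothetical sketch: a classloader that fetches bytecode over a
// socket from a known host/port instead of from an HTTP server.
class SocketClassLoader extends ClassLoader {
    private final String host;
    private final int port;

    SocketClassLoader(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        try (Socket s = new Socket(host, port);
             DataInputStream in = new DataInputStream(s.getInputStream())) {
            // Assumed wire format: send the class name, then read a
            // length-prefixed byte array containing the class file.
            s.getOutputStream().write((name + "\n").getBytes("UTF-8"));
            byte[] bytecode = new byte[in.readInt()];
            in.readFully(bytecode);
            return defineClass(name, bytecode, 0, bytecode.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```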

4. PERFORMANCE MEASUREMENTS

In order to determine the performance of the PJMPI system, three sets of measurements were obtained. First we measured the "ping pong" time: the time required to send a single byte from one host to another and to receive a single byte as the response. This test measures the effective latency between user spaces in the PJMPI environment. Measurements were also made of the available bandwidth as a function of the message size. This experiment was performed for the transmission of byte arrays and of integer arrays, in order to determine the effect on performance of marshaling the data into byte arrays. Finally, the performance of a program to numerically solve Laplace's equation was measured as an example of a realistic (although simple) scientific application. Four test systems were used to obtain measurements. The first was a cluster of Digital AlphaStation 255s with 300 MHz CPUs and 128 MB of RAM, running Digital Unix and communicating via an ATM network. The second system was two dual-processor Sun UltraSparc machines running Solaris, an Ultra 2/170 and an Enterprise 250, each with 128 MB of RAM, connected via 100 Mb/s Ethernet. The third and fourth test systems were clusters of dual Pentium III 550 MHz machines, each fitted with 256 MB of RAM and 100 Mb/s network interfaces; one cluster was running Windows NT and the other Linux. The Java environment on the Solaris and Linux systems was JDK 1.2.1, whilst JDK 1.2.2 was installed on the Digital Unix and Windows NT systems.
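The ping pong test can be sketched as the body of a task's execute() method; the rank(), send() and recv() calls used here stand for the PJMPI library routines (in the reduced reflection-based forms described in section 3) and are assumptions, not quotations from the PJMPI source.

```java
// Sketch of the ping pong latency test: rank 0 sends one byte to
// rank 1 and waits for the one-byte reply; the round trip is timed
// and averaged over many repetitions.
public void execute() {
    final int reps = 1000;
    byte[] msg = new byte[1];
    if (rank() == 0) {
        long start = System.currentTimeMillis();
        for (int i = 0; i < reps; i++) {
            send(msg, 1, 0);   // one byte to rank 1, tag 0
            recv(msg, 1, 0);   // one byte back from rank 1
        }
        double avg = (System.currentTimeMillis() - start) / (double) reps;
        System.out.println("ping pong round trip: " + avg + " ms");
    } else {
        for (int i = 0; i < reps; i++) {
            recv(msg, 0, 0);   // wait for the probe byte
            send(msg, 0, 0);   // echo it straight back
        }
    }
}
```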

Table 1: Time for a single byte to be sent to, and received from, a remote machine using the PJMPI and MPICH environments.

  Host Type     | PJMPI Ping Pong (ms) | MPICH (C) Ping Pong (ms)
  Ultra Sparc   | 4.7                  | 0.79
  Digital Alpha | 8.4                  | 1.1
  Linux         | 2.0                  | 0.34
  Windows NT    | 5.4                  | -

Table 1 shows that the latency is substantially higher for the Java based implementation. Despite this, it is worth noting that the above figures are still substantially better than published figures for other pure Java based message passing systems such as JMPI and JPVM, upon which JMPI is implemented. The conclusion that can be drawn from these figures is that, since the cost of communication is larger in Java than in C, it is likely that for optimal performance of a given problem the granularity of the Java program should be higher, i.e. the ratio of computation to communication should be higher. This is also affected by the relative performance of the Java and C versions of the computation component of the programs.

Table 2: Bandwidth measurements for different message types on different target platforms. Windows NT and Linux results are for 450 MHz Pentium III PCs.

  Host Type | Message Size (bytes) | byte B/W (Mbit/s) | int B/W (Mbit/s)
  Sparc     | 1        | 0.004 | 0.016
  Sparc     | 10       | 0.041 | 0.11
  Sparc     | 100      | 0.39  | 0.21
  Sparc     | 1000     | 3.3   | 0.218
  Sparc     | 10000    | 9.3   | 0.17
  Sparc     | 100000   | 9.02  | 0.18
  Sparc     | 1000000  | 9.29  | -
  Alpha     | 1        | 0.002 | 0.007
  Alpha     | 10       | 0.020 | 0.050
  Alpha     | 100      | 0.20  | 0.12
  Alpha     | 1000     | 1.7   | 0.13
  Alpha     | 10000    | 5.7   | 0.14
  Alpha     | 100000   | 9.0   | 0.15
  Alpha     | 1000000  | 13.0  | -
  NT        | 1        | 0.004 | 0.004
  NT        | 10       | 0.044 | 0.040
  NT        | 100      | 0.45  | 0.14
  NT        | 1000     | 3.6   | 0.19
  NT        | 10000    | 18.2  | 0.22
  NT        | 100000   | 32.4  | 0.22
  NT        | 1000000  | 35.9  | -
  Linux     | 1        | 0.005 | 0.005
  Linux     | 10       | 0.053 | 0.051
  Linux     | 100      | 0.53  | 0.17
  Linux     | 1000     | 6.5   | 0.33
  Linux     | 10000    | 21.8  | 0.36
  Linux     | 100000   | 37.8  | 0.40
  Linux     | 1000000  | 48.6  | -

The bandwidth figures in table 2 show some interesting properties. The maximum bandwidth for all systems is less than the maximum rated capacity of 155 Mb/s for ATM and 100 Mb/s for Ethernet. We have found that when trying to send large byte arrays using the write method available in the various Java output stream classes, if the byte array is larger than some maximum size the data being sent is truncated. This is similar to the standard write function for C-language sockets; however, unlike the C-language version, the Java implementation does not return the number of bytes that was transmitted. Hence, to ensure transmission of the data, we have to choose some safe maximum value for each call to the write method. This was selected to be 10 KB, and it is potentially the cause of the relatively low maximum bandwidth that is available. Despite this problem, the observed bandwidth obtained when transmitting byte arrays is quite acceptable.
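A sketch of the resulting chunked write loop follows; the 10 KB limit is the value quoted above, while the surrounding stream handling is assumed for illustration.

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch of the workaround described above: send a large byte array in
// chunks of at most 10 KB so that no single write() call is truncated.
class ChunkedWriter {
    static final int MAX_CHUNK = 10 * 1024;

    static void safeWrite(OutputStream out, byte[] data) throws IOException {
        int offset = 0;
        while (offset < data.length) {
            int n = Math.min(MAX_CHUNK, data.length - offset);
            out.write(data, offset, n);
            offset += n;
        }
        out.flush();
    }
}
```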

However, the maximum bandwidth available when transmitting integer arrays is substantially less than if an equivalent amount of data was transmitted as a byte array: it is only a small fraction of that obtained when transmitting byte arrays. This overhead is due to the marshaling of data into byte arrays, which is discussed in [20]. In order to transmit an array of non-byte data in Java there are three options: 1) use the high level communications routines in the DataOutputStream class to individually transmit each array element; 2) marshall each data element into a byte representation and pack it into a byte array; or 3) transmit the entire array as an object using the writeObject() method of the ObjectOutputStream class, which serializes the array.
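For an int array the three options can be sketched as follows; this is illustrative code, not taken from PJMPI.

```java
import java.io.*;

// Illustrative sketches of the three ways to transmit an int[] listed
// above, writing to an already-connected socket output stream 'raw'.
class IntArraySenders {

    // Option 1: write each element via DataOutputStream.
    static void sendPerElement(OutputStream raw, int[] a) throws IOException {
        DataOutputStream out = new DataOutputStream(raw);
        out.writeInt(a.length);
        for (int v : a) {
            out.writeInt(v);
        }
        out.flush();
    }

    // Option 2: marshall the elements into a byte array first, then
    // send the byte array in one (or a few) write() calls.
    static void sendMarshalled(OutputStream raw, int[] a) throws IOException {
        byte[] buf = new byte[4 * a.length];
        for (int i = 0; i < a.length; i++) {
            buf[4 * i]     = (byte) (a[i] >>> 24);
            buf[4 * i + 1] = (byte) (a[i] >>> 16);
            buf[4 * i + 2] = (byte) (a[i] >>> 8);
            buf[4 * i + 3] = (byte) a[i];
        }
        raw.write(buf);
        raw.flush();
    }

    // Option 3: serialize the whole array as an object.
    static void sendSerialized(OutputStream raw, int[] a) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(raw);
        out.writeObject(a);
        out.flush();
    }
}
```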

All of these approaches introduce some overhead. The mpiJava system uses native libraries for marshaling the data to improve performance; since PJMPI is a pure Java system, we did not want to take this approach. In our current system we have chosen the third option from the list above. However, serialization has been shown to add significant overhead [3] and this results in the relatively poor performance. We accept this tradeoff for the present implementation.

As an example of a more realistic scientific problem, we have measured the performance of the numerical solution of Laplace's equation using finite differences and red/black checker boarding [24]. The parallel program was executed on the Compaq/Digital Alpha cluster (using 4 AlphaStations). For comparison purposes the performance of a serial version of the program was also measured. These results are presented in table 3.

Table 3: Execution time for a solution to Laplace's equation using a single machine and multiple machines.

  Iterations Performed | Serial Time (s) | Parallel Time (s) | Matrix Size
  1100 | 3.32 | 52.7 | 60
  2640 | 30.5 | 60.5 | 120
  1340 | 132  | 302  | 180
  1560 | 375  | 777  | 240

In these results the performance of the serial program is always superior to that of the parallel program. This arises due to a number of factors. Firstly, the Laplace algorithm involves relatively little computation per iteration. The relatively high latency for MPI communications results in the poor performance that is evident: for small arrays the total time is dominated by the communication time. This is not a surprising result. As the array size increases, the relative performance of the parallel algorithm improves. The performance does not improve as rapidly as would be expected, however, due to the limitations on the available bandwidth discussed above. This analysis confirms our belief that it is important to use Java communications technology that will yield the best possible bandwidth.

5. AN ALTERNATIVE TO MPI BINDINGS

As previously mentioned, not all MPI bindings are sensible in an object-oriented programming environment such as Java. We propose a rethink of the message-passing interface specification in the light of OO technology becoming more popular.

In addition to portability, the Java language provides a number of other benefits: single inheritance of classes ensures programming simplicity for new users by not requiring them to locate and compile their programs with arcanely-named libraries; Java is strongly typed, throwing an exception when an object is typecast to an invalid type, unlike C's default behaviour; and Java's security managers are able to specify the permissible actions on resources, and are enforced by the runtime system. We have built an MPI-like environment written completely in Java, known (somewhat tongue in cheek) as the Java Ubiquitous Messaging Environment (JUMP) [17]. JUMP builds on our past experiences in building multi-threaded daemons in Java for software management [18]. We also utilise our experience in classloading, gained in constructing code server databases of dynamically invocable Java byte code [22].

JUMP programs can be run in a similar fashion to conventional parallel programs written in MPI or PVM. We are still experimenting with the configuration of such programs [16] and are trying to determine where configuration information is best located. At present we utilise a flat hostfile of participating node names, in a fashion similar to PVM.

6. JUMP ARCHITECTURE

The JUMP architecture is composed of three levels: user code; a user-level API; and underlying message-passing libraries (or in this case, the Java base class). The similarity between JUMP and other message-passing environments (PVM and MPI implementations) is shown in figure 1.

Figure 1: MPI and JUMP are both three-layer message-passing architectures. In MPI the user imports the MPI API library, which in turn imports the MPI communications substrate; the JUMP user class extends the base class, which provides the API and initialisation routines. The JUMP communications substrate is accessed via an implementation-independent interface.

Figure 2 gives an overview of the JUMP architecture. The architecture is divided into two parts: the JUMP base class and the JUMP daemon. An abstract JUMP base class contains all the common MPI methods and custom initialisation and finalisation methods. A JUMP daemon (JUMPd) runs on each machine in the host pool, waiting to receive requests to start slave instances of user JUMP code. The daemon exists only as a bootstrap mechanism to initialize slave instances of user code. Figure 2 shows a user class extending the JUMP base class, which initiates communications through its local JUMP daemon. A network of socket connections is established between participating daemons on an as-needed basis. These can be reused to minimise socket instantiation overheads. Once a socket is established for a program between two nodes, no other will be created in our present implementation.

Figure 2: Relationships between a user class extending the JUMP base class and the JUMPd daemon, running inside separate Java Virtual Machines.

The user writes JUMP code by extending the JUMP base class. Extending the base class allows the user's code to be used by the JUMP environment through object polymorphism. The user is only required to supply two methods in order to extend the base class: a public void jumpRun(String[] args) method and a public static void main(String[] args) method. The former is the main body of the user's code, run after initialisation, while the latter is used to initialise the computation. The main method usually consists of a call to create a new instance of the user class and a call to jumpRun. The user chooses where in their main method they will call jumpRun, so the master instance is able to load any data it may require to distribute to the slaves. The current version of the JUMP environment communicates via TCP/IP and Internet-domain sockets.
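A minimal user class might therefore look as follows; the base class name JUMPBaseClass follows figure 2, and the remaining details are assumed for illustration.

```java
// Hypothetical sketch of a minimal user program in JUMP.
public class HelloJump extends JUMPBaseClass {

    // Main body of the user's code, run on every instance (master and
    // slaves) after the environment has been initialised.
    public void jumpRun(String[] args) {
        // ... query rank/size and exchange messages here ...
    }

    // Entry point: instantiating the class triggers JUMP initialisation
    // (contacting the local JUMPd, distributing bytecode and the host
    // table); the master then enters jumpRun explicitly.
    public static void main(String[] args) {
        HelloJump program = new HelloJump();
        // The master may load input data here before starting the run.
        program.jumpRun(args);
    }
}
```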

Initialization of the JUMP environment is done automatically by the JUMP base class. Initialization occurs when the user instantiates an instance of their JUMP class in the main() method of the class. The steps that occur are shown in figure 3. When the user's code is first initialized it is not connected to any other programs, as shown in figure 3i. Through functionality built into the JUMP base class, the local JUMP daemon (JUMPd) is contacted, as shown in figure 3ii. If a host list was specified when the user's class was initialized from the main() method, the list is passed to the local daemon. If required, the local JUMPd requests that remote JUMPds become part of the distributed computation by sending them a request bundled with a copy of the user program byte code. As shown in figure 3iii, they respond with their availability by returning their host details (including a port number that the slave instance of the byte code is listening to). Figure 3iv shows a copy of the complete host table being distributed from the daemon local to the master user program. Finally, each of the slaves created as part of the distributed system is able to begin executing, as shown in figure 3v.

Figure 3: Initialization of the JUMP environment. i) The user executes their program. ii) A socket is created that contacts the local daemon. iii) The local JUMP daemon selects the appropriate number of remote hosts and sends their JUMP daemons an initialization request; they respond with the host IP/port that the slave program listens to. iv) When all the remote daemons have responded favourably to the initialization request, the complete host table is distributed from the master JUMP daemon.

In order for JUMP to function on a truly federated cluster of compute nodes, we cannot assume the presence of a shared file system. For this reason we use a custom classloader to transfer the necessary code to the remote machines. This process is shown in figure 4. A copy of the user's bytecode, running in a single Java Virtual Machine (JVM), is shown in figure 4i. When the code is to be transferred to a remote machine, a copy is sent to the local JUMPd. Because it cannot be assumed that these two applications will be running in the same JVM, or even share a classpath, a socket is opened and the bytecode sent to the daemon; this is shown in figure 4ii. When the remote daemons are contacted by the local JUMPd, a copy of the user's bytecode is sent to the remote daemon (figure 4iii). The remote daemon attempts to create an instance of the user's bytecode (figure 4iv), and if the code references any other non-core Java class, a ClassNotFoundException will be thrown. This is shown in figure 4v, where the remote JUMPd contacts the user's code to request the missing bytecode. The remote daemon caches downloaded bytecode so that multiple instances of the slave can be instantiated without incurring a large network overhead. When all of the relevant code has been downloaded, the slave instance can be properly instantiated and the details of the slave are returned to the master. This is shown in figure 4vii.

Figure 4: JUMP distributed classloading mechanism. i) The user program loads a serialized copy of its class byte code in preparation for distribution. ii) The serialized byte code is sent to the local JUMP daemon. iii) The byte code is sent with an initialisation request and run-time parameters to the remote JUMP daemons. iv) The remote JUMP daemon tries to instantiate an instance of the user program on the remote machine. v) If the user program uses other non-Java-core classes, a request is sent to the master instance of the user program. vi) The requested classes are sent to the remote daemon, where they are cached. Steps v) and vi) are repeated as required. vii) The slave instance is instantiated and the jumpRun() method is prepared with the run-time arguments.

To allow for the possibility of multiple, independent user programs running on the same physical machine, messages between user classes and the daemons, and between daemons, are tagged with the source and destination host IP address and the port on which the program is listening. With the exception of the JVM crashing, user programs are protected from interference by other programs running on the same machine.
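A sketch of a message header carrying this addressing information is shown below; the field set follows the description above, while the names and representation are assumed.

```java
import java.io.Serializable;

// Hypothetical sketch of a message header carrying the addressing
// information described above, so daemons can demultiplex traffic for
// multiple independent user programs on one machine.
class MessageHeader implements Serializable {
    String sourceHost;   // IP address of the sending program's host
    int    sourcePort;   // port the sending program listens on
    String destHost;     // IP address of the destination host
    int    destPort;     // port the destination program listens on
    String jobId;        // unique job ID assigned by the initiating daemon
    int    tag;          // user or library message tag
}
```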

Java encapsulation and overloading allow us to provide default communicator tags to partition the message space appropriately, so that library layers avoid colliding with each other's messages. While we are still debating the best syntactic form for the user to provide these explicitly when needed, we have chosen Java objects to be used as descriptors.

7. FUTURE DIRECTIONS FOR JUMP

JUMP uses a similar approach to the message communicator tags and grouping mechanisms that MPI specifies, and which was pioneered in the Edinburgh CHIMP system [8]. This is important to allow library layers of more sophisticated communications to operate with internal messaging that does not impact on the explicit messages sent by user program code. We anticipate that JUMP will provide a platform with which we can investigate the challenging area of data layout specification. For example, HPF [12] provides the ability to distribute data in block and cyclic patterns, depending on the way the data is to be used. We plan to introduce an extra layer between the user's code and the JUMP base class which will allow communications to be automatically routed to the correct user JUMP instance in the current data layout. For example, consider a weather simulation that combines computed predictions with observed data (from ships, aircraft and weather balloons). The Earth is likely to be modeled as a regular grid; therefore the code that operates on the previous predictions for the surface of the Earth for which there is no observed data is likely to use a regular grid decomposition. However, observational data is not spread uniformly across the surface of the Earth; it is tightly clustered around certain points (mostly corresponding with shipping lanes and flight paths). This observational data may be best processed using a task farming model with the ability to update the cells of the regular grid that are affected. When this extra layer of functionality is added to the JUMP environment we expect the architecture to look like that shown in figure 5, which shows the proposed three-level architecture incorporating support for data layout specification.

Figure 5: Future versions of JUMP will include an extra layer of functionality to allow different data layout specifications to be defined and used by users.
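To make the block and cyclic distributions concrete, the index-to-owner mappings such a layout layer would need can be sketched as follows; this is illustrative and not part of the current JUMP implementation.

```java
// Illustrative sketch of HPF-style data layout mappings: given a global
// array index, return the rank of the process that owns it.
class DataLayout {
    // BLOCK distribution: contiguous chunks of ceil(n/p) elements
    // across p processes.
    static int blockOwner(int globalIndex, int n, int p) {
        int blockSize = (n + p - 1) / p;
        return globalIndex / blockSize;
    }

    // CYCLIC distribution: elements dealt out round-robin.
    static int cyclicOwner(int globalIndex, int p) {
        return globalIndex % p;
    }
}
```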

While the current version of JUMP communicates via sockets, it is not reliant on any particular communications technology. The current version is written using TCP/IP and Internet-domain sockets; the architecture does not preclude alternative versions that use, for example, RMI or JavaSpaces [10]. We are experimenting with the tuple space approach offered by JavaSpaces. The tuple space model, where messages are tagged with complex predicate objects, is an attractive mechanism for building higher level systems. However, a limitation of that system is the lack of support for "spaces of spaces": a single node would have to host the entire communications space, which we believe is not suitable for a high performance communications system. We are investigating ways to enable scalable tuple spaces. We have compared this with the performance that would arise if all communications in our system were brokered through a single host; this would typically present too much message congestion for any practical application. We are waiting for the JavaSpaces technology to mature further before we use it to implement the JUMP communications substrate.

JavaSpaces is a Linda-based distributed computing technology developed by Sun, implemented on top of Jini [1]. Whilst Linda-based paradigms can potentially simplify the process of distributed programming, the performance of such systems has been considered to be poor, and our performance measurements confirm this. Sample performance measurements are outlined in table 4. We measured the time to obtain a reference to a JavaSpace running on the local machine, as well as the time to write a message to the space and to read it back. We have also found that accessing a space on a remote machine does not significantly affect the performance, indicating that network capability has only a small impact on the measured performance. These measurements show that JavaSpaces introduces a significant communications overhead that currently limits its usefulness. We have also found that JavaSpaces has significant hardware requirements, both in terms of CPU power and memory. Current versions of JavaSpaces exhibit some instability, and installation and configuration of JavaSpaces can be non-trivial. Based on our results and experiences, we believe that JavaSpaces is currently unsuitable for high performance distributed computing. As the performance and stability improve in future versions of the product, it is likely to become a more viable alternative to MPI systems.

Table 4: Time to write a message to a local JavaSpace and to read it back.

  Host       | Initialisation Time (ms) | Message Time (ms)
  Alpha      | 5080 | 126
  Solaris    | 1410 | 13.3
  Linux      | 902  | 13.3
  Windows NT | 2180 | 16.7
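The measurement can be sketched with the standard JavaSpaces API as follows; the lookup of the space reference is wrapped in a hypothetical helper whose cost corresponds to the initialisation column of table 4, and exception handling is omitted.

```java
import net.jini.core.entry.Entry;
import net.jini.core.lease.Lease;
import net.jini.space.JavaSpace;

// A simple entry type: JavaSpaces templates match on public fields,
// with null fields acting as wildcards.
class Message implements Entry {
    public String content;
    public Message() {}
    public Message(String content) { this.content = content; }
}

class SpaceBenchmark {
    // Timed write/read cycle as measured for table 4. lookupSpace() is
    // a hypothetical helper wrapping the Jini service lookup, whose
    // cost is the initialisation time reported in the table.
    static void timedRoundTrip(JavaSpace space) throws Exception {
        long start = System.currentTimeMillis();
        space.write(new Message("ping"), null, Lease.FOREVER);
        Message back = (Message) space.read(new Message(), null, 10000);
        long elapsed = System.currentTimeMillis() - start;
        System.out.println("write+read: " + elapsed + " ms");
    }
}
```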

8. CONCLUSIONS

We have discussed the main issues for development of a message passing system using Java, and have described our two attempts, PJMPI and JUMP, that approach the problem from two different directions. We believe that both these approaches may need to coexist. It is advantageous to have an MPI-compliant system, as it allows access and interoperability with native, and truly fast, MPI implementations from Java programs. However, as the performance of JVMs improves it will be desirable to allow users to capitalise on Java's other features. In particular, we believe there is considerable scope for investigation into the use of higher level parallelism constructs that will be much easier to write correctly in OO Java than using procedural bindings. JUMP is intended as an experimental framework for carrying out this work. We are encouraged by our performance results so far, but recognize that JVM performance will surely improve significantly in the future.

9. REFERENCES

[1] Ken Arnold, Bryan O'Sullivan, Robert W. Scheifler, Jim Waldo and Ann Wollrath. The Jini Specification. Addison Wesley Longman, June 1999. ISBN 0-201-61634-3.

[2] Mark Baker, Bryan Carpenter, Geoffrey Fox, Sung Hoon Ko, and Xinying Li. mpiJava: A Java MPI Interface. http://www.npac.syr.edu/projects/pcrc/papers/mpiJava/

[3] Bryan Carpenter, Geoffrey Fox, Sung Hoon Ko, and Sang Lim. Object serialization for marshaling data in a Java interface to MPI. In ACM JavaGrande Conference 1999.

[4] Rajkumar Buyya, Mark Baker, Ken Hawick, and Heath James, editors. Proc. First IEEE Computer Society International Workshop on Cluster Computing (IWCC'99). IEEE Computer Society, December 1999. ISBN 0-7695-0343-8.

[5] Bryan Carpenter, Vladimir Getov, Glenn Judd, Tony Skjellum, and Geoffrey Fox. MPI for Java: Position document and draft API specification. Java Grande Forum Technical Report JGF-TR-03, November 1998.

[6] Kivanc Dincer. Ubiquitous Message Passing Interface implementation in Java: JMPI. In Proc. 13th Int. Parallel Processing Symp. and 10th Symp. on Parallel and Distributed Processing. IEEE, 1998.

[7] Dynamic Object Group. DOGMA Home Page. http://ccc.cs.byu.edu/DOGMA

[8] Edinburgh Parallel Computing Centre. CHIMP Concepts. June 1991.

[9] Adam J. Ferrari. JPVM: Network Parallel Computing in Java. In 1998 ACM Workshop on Java for High Performance Network Computing, 1998.

[10] Eric Freeman, Susanne Hupfer and Ken Arnold. JavaSpaces Principles, Patterns, and Practice. Addison Wesley Longman, June 1999. ISBN 0-201-30955-6.

[11] Vladimir Getov, Susan Flynn-Hummel, and Sava Mintchev. High-performance parallel programming in Java: Exploiting native libraries. In 1998 ACM Workshop on Java for High Performance Network Computing, 1998.

[12] High Performance Fortran Forum. High Performance Fortran Language Specification version 2.0. January 1997.

[13] Message Passing Interface Forum. MPI: A message passing interface standard, 1995.

[14] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel Virtual Machine. A Users' Guide and Tutorial for Networked Parallel Computing. MIT Press, 1994.

[15] W. Gropp, E. Lusk, N. Doss, and A. Skjellum. A High-Performance, Portable Implementation of the MPI Message Passing Interface Standard. Argonne National Laboratory, 1996.

[16] K. A. Hawick. The Configuration Problem in Parallel and Distributed Systems. Technical Report DHPC-076, Department of Computer Science, The University of Adelaide, November 1999.

[17] K. A. Hawick and H. A. James. A Java-Based Parallel Programming Support Environment. Technical Report DHPC-081, Department of Computer Science, The University of Adelaide, November 1999.

[18] K. A. Hawick, H. A. James, A. J. Silis, D. A. Grove, K. E. Kerry, J. A. Mathew, P. D. Coddington, C. J. Patten, J. F. Hercus and F. A. Vaughan. DISCWorld: An Environment for Service-Based Metacomputing. Journal of Future Generation Computer Systems, 15:623-635, 1999.

[19] Java Grande Forum. Java Grande Forum Home Page. http://www.javagrande.org

[20] Glenn Judd, Mark Clement, Quinn Snell, and Vladimir Getov. Design Issues for Efficient Implementation of MPI in Java. In ACM JavaGrande Conference 1999.

[21] P. Martins, L. M. Silva, and J. Silva. A Java Interface for WMPI. LNCS, 1497:121-, 1998.

[22] J. A. Mathew, A. J. Silis, and K. A. Hawick. Inter Server Transport of Java Byte Code in a Metacomputing Environment. In Proc. TOOLS Pacific (Tools 28), 1998.

[23] MPI Software Technology web page. January 2000. http://www.mpi-softtech.com

[24] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical Recipes in C. 1988.

[25] Sun Microsystems. Java Remote Method Invocation (RMI). http://java.sun.com/products/jdk/1.2/docs/guide/rmi/index.html
