GMI: Flexible and Efficient Group Method Invocation for Parallel Programming

Jason Maassen, Thilo Kielmann, Henri E. Bal
Faculty of Sciences, Division of Mathematics and Computer Science
Vrije Universiteit, Amsterdam, The Netherlands
{jason, kielmann, bal}@cs.vu.nl
http://www.cs.vu.nl/manta/
ABSTRACT

We present a generalization of Java's Remote Method Invocation (RMI) model, providing an efficient group communication mechanism for parallel programming. Our Group Method Invocation (GMI) model allows methods to be invoked either on a single object or on a group of objects, the latter possibly with personalized parameters. Likewise, result values and exceptions can be returned normally (as with RMI), discarded, or, when multiple results are produced, combined into a single result. The different method invocation and reply handling schemes can be combined orthogonally, allowing us to express a large spectrum of communication mechanisms of which RMI and MPI-style collective communication are special cases. With GMI, the invocation and reply schemes can be selected for each method individually at run time. GMI has been implemented in the Manta high-performance Java system. Using several micro-benchmarks and applications (partially taken from the Java Grande benchmark set), we investigate GMI's runtime efficiency and compare it to the collective communication operations of the mpiJava library. Our prototype implementation performs competitively with, or even faster than, mpiJava.
1. INTRODUCTION
Object-oriented languages like Java support distributed programming using the Remote Method Invocation (RMI) model. The key advantage of RMI is that it extends the object-oriented notion of method invocation to communication between processes. For parallel applications, however, many authors have argued that RMI is inadequate and that support for alternative forms of communication is also needed, allowing interaction between groups of objects [10, 16]. For example, multicast may be implemented by using multiple RMI calls, but this is cumbersome and cannot exploit efficient low-level (hardware) multicast primitives. More complex communication patterns, like personalized multicast or data reduction, are even harder to express using RMI. For many parallel applications, communication between groups of objects is needed, both for code simplicity and for application speed. One approach to introduce group communication is to extend object-oriented languages with support for collective communication
through a native library such as MPI [10, 14, 23]. This increases expressiveness, but at the cost of adding a separate model based on message passing, which does not integrate with (method-invocation based) object models. MPI deals with static groups of processes rather than with objects and threads. In particular, MPI requires that collective operations are explicitly invoked by all participating processes, so processes have to execute in lock-step, which is ill-suited for object-oriented programs. Another approach to introduce group communication is object replication, which can be integrated seamlessly with method invocations [4, 16]. However, object replication can only express multicast (by writing to a replicated object), not other forms of group communication.

In this paper, we introduce an elegant approach to integrating flexible group communication1 into an object-oriented language. Our goal is to express group communication using method invocations, just like RMI is used to express point-to-point communication. We therefore generalize the RMI model in such a way that it can express communication with a group of objects. This extended model is called Group Method Invocation (GMI). The key idea is to extend the way in which a method invocation and its result value are handled. A method invocation can be forwarded to a single object or to a group of objects (possibly personalizing the parameters for each destination). The result value (including exceptions) can be returned normally, discarded, forwarded to a separate object, or, when multiple results are produced, combined into a single result. All schemes for handling invocations can be combined with all schemes for handling results, giving a fully orthogonal design.

Due to this orthogonal approach, GMI is both simple and highly expressive. GMI can express MPI-style collective communication, where all members of a group collectively invoke the same communication primitive. Unlike MPI, GMI also allows a single thread to invoke a method on a group of objects, without requiring active involvement from other threads (e.g., to read information from a group of objects). Because of GMI's orthogonal design, it can also express many communication patterns that have thus far been regarded as unrelated, special primitives. In particular, object replication, futures, and voting can easily be expressed using GMI.

We have implemented GMI as part of the Manta high-performance Java system [17] and have used it for a variety of applications, including parts of the MPJ benchmark suite from the Java Grande Forum [9]. We will show that GMI can be implemented highly efficiently. For example, invoking a method on a group of 64 remote objects takes about 70 µs on a Myrinet cluster.
1 Please note that, in this paper, the term group communication refers to arbitrary forms of communication between groups of objects; in the operating systems community, the term is often used to denote multicast, which we regard as just one specific form of group communication.
In this paper, we present a detailed evaluation of both the low-level performance of GMI and the performance (speedups) of applications. The main contribution of the paper is a generalization of RMI to an expressive, efficient, and easy-to-use framework for parallel programming that supports a rich variety of communication patterns, while seamlessly integrating with object-oriented languages.

The remainder of the paper is structured as follows. In Section 2 we describe our GMI model. In Section 3 we briefly describe the API and the implementation of this model in Manta. Section 4 shows examples, and in Section 5 we describe the performance of benchmarks and six applications on a 64-node Myrinet cluster. Section 6 presents related work and Section 7 concludes.
2. THE GMI MODEL

In this section, we describe the GMI model and how it is used to implement group communication. We first summarize the RMI model and show how we have generalized it to support groups of objects. We then describe how communication patterns within these groups can be used to implement many types of group communication.

2.1 Remote Method Invocation

RMI allows methods of an object to be invoked remotely. Such methods must be defined in a special remote interface. An object can be made suitable for receiving RMIs by implementing this remote interface. When this object is compiled, two extra objects are generated, a stub and a skeleton. The stub contains special, compiler-generated implementations of the methods in the remote interface, and can be used as a remote reference to the object. The example in Figure 1 shows two stubs (on machines 1 and 2), acting as remote references to a single object on machine 3. We use the Sum object from Section 4 as running example.

Figure 1: Stubs acting as remote references in RMI.

When a method of the stub is invoked, the stub forwards the invocation to the skeleton by sending a network message with a description of the method and its parameters (see Figure 2). The skeleton waits for a message, decodes it, invokes the requested method on the object, and returns the method's result to the stub, which returns it to the invoker.

Figure 2: A method invocation via a stub.

With RMI, result handling by stub and skeleton is fixed: they always operate synchronously. After the stub forwards the method to the skeleton, it waits for a reply message before continuing. The skeleton must therefore always send a reply back to the stub (even if the method has no result). Furthermore, a stub in RMI always serves as a remote reference to a single object, which cannot be changed once the stub has been created.

2.2 Group Method Invocation

In GMI, we generalize the RMI model in three ways. First, a stub can be used as a reference to a group of objects, instead of just to a single one. In Figure 3, the stub (on machine 1) serves as a reference to a group of two objects (on machines 2 and 3).

Figure 3: A stub acting as group reference in GMI.

Second, to allow method invocations and result values to be handled in many different ways, several variants of communication code are generated for the stubs and skeletons (one per scheme). As with RMI, exceptions are treated as special cases of result values. Third, through a simple API, the programmer can configure the stubs and skeletons for each method individually at run time. Different forwarding and result-handling schemes can be combined, giving a rich variety of communication mechanisms. By configuring stubs at run time, the communication behavior of the application can easily be adapted to changing requirements. Below, we will first describe the notion of object groups, and then discuss the forwarding schemes, the result-handling schemes, and their combinations. We finally discuss synchronization issues on object groups.

2.2.1 Object Groups
An object group consists of several objects, possibly on different machines. Groups are created dynamically. However, for proper semantics of group operations, groups become immutable upon creation [20]. The objects in the group are ordered and can be identified by their rank. Ranks and the group size are available to the application. The objects in a group may have different types, but they must all implement the same group interface, which serves a similar function to a remote interface: it defines which methods can be invoked on the group and makes the compiler generate stub and skeleton objects for this interface.
2.2.2 Forwarding Schemes

To support communication with multiple objects, GMI allows a stub to act as a reference to a group of objects. Depending on the application, the stub must be able to forward invocations to one or more objects in the group. GMI offers the following forwarding schemes:
single invocation: The invocation is forwarded to a single (possibly remote) object of the group, identified via its rank.

group invocation: The invocation is forwarded to every object in the group.

personalized group invocation: The invocation is forwarded to every object in the group, while the parameters are personalized for each destination using a user-defined method.
The first scheme is similar to a normal RMI. When configuring the stub, the programmer can specify to which group object the method invocations are forwarded. Unlike RMI, however, GMI allows this destination to be different for each invocation. The second scheme can be used to express a multicast. The same method invocation (with identical parameters) is forwarded to each object in a group. The last scheme is suitable for distributing data (or computations) over a group. Before the invocation is forwarded to each of the group objects, a user-defined personalization method is invoked. This method serves as a filter for the parameters to the group method. It gets as input the parameters to the group method, the rank of the destination object, and the size of the group. It produces a personalized set of parameters as output. These parameters are then forwarded to the destination object. Thus a personalized version of the call is sent to each group object. In GMI, each method in a stub can be configured separately, at runtime, to use a specific forwarding scheme. As a result, the group stub can be configured in such a way that each of its methods behaves differently. By configuring a method in a stub, the programmer selects which compiler-generated communication code is used for method forwarding. For every method in a group interface, the compiler generates communication code for each forwarding scheme. Although this increases the code size of the stub, it allows the compiler to use compile-time information to generate optimized communication code for each method.
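To make the personalization mechanism concrete, the following sketch personalizes the parameters of a hypothetical group method void setData(double[] data) so that each member receives only its own slice of the array (a scatter). The Data interface, the DataPersonalizer prototype class, and the p_setData naming convention are assumptions, modeled after the generated combine prototypes discussed in Section 3.2; the text only fixes the inputs (original parameters, destination rank, group size) and the output (personalized parameters).

    import group.*;

    // Hypothetical group interface for distributing an array over a group.
    interface Data extends GroupMethods {
        public void setData(double[] data);
    }

    // Personalization filter for setData: member 'rank' receives only
    // its own contiguous slice of the original array.
    class Scatter extends DataPersonalizer {
        public double[] p_setData(double[] data, int rank, int size) {
            int chunk = data.length / size;          // elements per member
            double[] slice = new double[chunk];
            System.arraycopy(data, rank * chunk, slice, 0, chunk);
            return slice;                            // forwarded to member 'rank'
        }
    }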
2.2.3 Result-Handling Schemes

Java RMI only supports synchronous method invocations, which is too restrictive for parallel programming. The GMI model therefore does not require the invoker to always wait for a result. Instead, it offers a variety of result-handling schemes that can be used to express asynchronous communication, futures, and other primitives. The skeletons in GMI offer the following result-handling schemes:
discard results: No results are returned to the stub; they are discarded by the skeleton.

forward results: All results are returned to the stub, which forwards them to a user-defined handler object (rather than to the original invoker).

return one result: A single result is returned to the stub. If the method is invoked on a single object, the result can be returned directly. Otherwise, one of the results (selected via a rank) is returned to the stub.

combine results: All results are combined into a single one (using a user-defined filter method). The final value is returned to the stub.
discard results allows a method to be invoked asynchronously. The stub forwards the invocation to one or more skeletons and returns immediately (without waiting for a reply). If the method returns a result, a default value (e.g., 0.0 or a null reference) is returned. Any result value returned (or exception thrown) by the method is discarded directly by the skeletons.

The second scheme, forward results, allows results (including exceptions) to be forwarded to a separate handler object. The stub forwards any method invocations and returns immediately. The skeletons, however, do return their result values to the stub, which forwards them to a user-defined handler object, where the original invoker can retrieve the final results later on (a sketch of such a handler is shown below). This mechanism can be used to implement futures (where the result value is stored until it is explicitly retrieved), voting (where all results are collected and the most frequently occurring one is returned), selection (of one result), and combining (of several results).

The third result-handling scheme, return one result, is the default way of handling results in unicast invocations like RMI. The stub forwards a method invocation and blocks until the skeleton returns a result. If multiple results are produced, the user selects one of the group objects in advance by its rank. This has the advantage that only a single skeleton has to return a result, avoiding communication overhead.
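As an illustration of the forward results scheme, the following is a minimal sketch of a handler object implementing a future for a method returning a double. The handler method name and its parameters are assumptions; the text does not specify the handler interface. Voting could be implemented along the same lines, by collecting all results in the handler and returning the most frequent one.

    // Sketch of a user-defined handler object for the forward results
    // scheme, implementing a single-value future.
    class FutureHandler {

        private double value;
        private boolean ready = false;

        // Assumed to be invoked by the stub for each returned result.
        public synchronized void handleResult(double result, int rank) {
            value = result;
            ready = true;
            notifyAll();                  // wake up a waiting invoker
        }

        // Called by the original invoker to retrieve the result later on.
        public synchronized double get() throws InterruptedException {
            while (!ready) {
                wait();
            }
            return value;
        }
    }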
Figure 4: An example of result combining
The final scheme, combine results, is useful when multiple results are produced. Instead of combining results in the stub (as with the forward results scheme), the skeletons communicate amongst each other to combine all results into a single one. An example is shown in Figure 4, where an invocation is forwarded to a group of two objects. Both group objects execute the method and return their result to the local skeleton object. The first skeleton then forwards its result to the second skeleton, which waits for this result to arrive, combines it with its own result (using a user-defined combine method), and then returns the final result to the stub. The combine method acts as a filter: it takes multiple return values (including exceptions) as input and produces a single result as output. This scheme can combine multiple results in parallel, for example by arranging the skeletons in a tree structure. This is important if the object group is large. The scheme can significantly reduce communication overhead, e.g., for data-reduction operations. Any exception thrown during the execution of a combine operation is forwarded in place of the regular result values; the overall result of the combine operation is then one of the thrown exceptions.
Table 1: Combinations of stub-skeleton behavior

                                            reply handling
invocation     discard           forward                        return one             combine
single         asynchronous RMI  future                         RMI                    collective combine (allreduce/allgather)
group          multicast         voting/flat combine            synchronous multicast  group combine (reduce/gather)
personalized   scatter           scatter + voting/flat combine  synchronous scatter    scatter + group combine
2.2.4 Combinations of the Forwarding and Reply-Handling Schemes

We presented three different schemes to forward a method invocation and four schemes for handling the results. GMI allows these schemes to be combined orthogonally, as shown in Table 1. In the table, scatter, (all)gather, and (all)reduce refer to the equivalent collective operations from MPI. Flat combine denotes a combine operation where stub and skeletons are arranged in a flat communication tree. All twelve combinations result in useful communication patterns, making GMI highly expressive. In Section 4, we illustrate several forms of group communication with examples.

One important advantage of GMI is that it subsumes two general forms of communication: MPI-style collective operations, which require active participation from all threads, and RMI-style group operations, which are initiated by a single thread and do not require participation from other threads. We illustrate the importance of RMI-style group operations by showing how a group combine operation can be used to implement a load-balancing mechanism. In Figure 5, a thread produces jobs for three servers, each containing an object; together, these objects form a group. The objects monitor the load of their server. The thread invokes the getLoad method on each of the objects using a group invocation (A). Each object executes this method and returns an estimate of the local load. The thread then selects the least loaded server using a combine operation (B), and sends it the job (C).

Figure 5: Load balancing using GMI

This application is difficult to express using MPI-style collective communication, because it is not known in advance when the load-balancing information will be requested. Such asynchronous events are easily expressed with group invocations, but not with collective communication. The servers could either frequently poll for incoming getLoad messages, or periodically broadcast their latest load information, both of which would degrade performance.
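Using the API of Section 3.1 (setInvoke/setResult, shown in detail in Section 4), the load-balancing pattern of Figure 5 might be coded roughly as follows. This is a sketch: the Server interface, the Load and Job classes, the ServerCombine prototype class, and the Invocation.SINGLE and Result.DISCARD constants (including passing the destination rank to setInvoke) are all assumptions; only Invocation.GROUP and Result.COMBINE appear in the examples of Section 4.

    import group.*;

    // A load report: which server sent it and how loaded it is.
    class Load implements java.io.Serializable {
        int rank;          // rank of the reporting server object
        double value;      // its load estimate
        Load(int rank, double value) { this.rank = rank; this.value = value; }
    }

    class Job implements java.io.Serializable { /* job description */ }

    interface Server extends GroupMethods {
        public Load getLoad();
        public void putJob(Job job);
    }

    // Combine filter: of each pair of reports, keep the lighter one,
    // so the final result identifies the least loaded server.
    class MinLoad extends ServerCombine {
        public Load c_getLoad(Load l1, int rank1,
                              Load l2, int rank2, int size) {
            return (l1.value <= l2.value) ? l1 : l2;
        }
    }

    class Client {
        public static void submit(Server servers, Job job) {
            // (A)+(B): group invocation of getLoad; the replies are
            // combined down to the report of the least loaded server.
            servers.setInvoke("Load getLoad()", Invocation.GROUP);
            servers.setResult("Load getLoad()", Result.COMBINE, new MinLoad());
            Load min = servers.getLoad();

            // (C): forward the job to that server only; discard the reply.
            servers.setInvoke("void putJob(Job job)", Invocation.SINGLE, min.rank);
            servers.setResult("void putJob(Job job)", Result.DISCARD);
            servers.putJob(job);
        }
    }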
2.2.5 Synchronization and Ordering of Method Invocations

In RMI, no guarantees are given about the order in which method invocations are received. For example, if the two stubs in Figure 1 simultaneously forward a method to the skeleton, the order in which the skeleton receives (or handles) these invocations is not defined. It may even run the two invocations concurrently, depending on the behavior of the network and the implementation of the skeleton. The programmer can use the regular Java monitor mechanism (synchronized/wait/notify) to ensure that the object behaves correctly, independent of the order in which method invocations are received.

Similarly, the GMI model does not guarantee any ordering of group invocations. When two group invocations are done simultaneously on the same group, they may be received in different orders by different group objects. Semantically, a group invocation is equivalent to doing multiple RMI calls, and synchronization problems have to be addressed by the programmer in the same way as with RMI. The advantage of using group invocations, however, is that all the details of message forwarding and result processing are
handled efficiently by the GMI implementation. For applications where group objects are merely intended as replicas of a single object, it is preferable to use the RepMI programming model [16]. Our native implementation of RepMI offers replicated objects that are kept consistent using totally-ordered multicast and a replication-aware thread scheduler. GMI was specifically designed for those applications where RepMI's consistency model is too strict and alternative communication forms (like personalized broadcast) are needed. However, RepMI can also be implemented on top of GMI.
3. IMPLEMENTATION

In this section, we briefly describe the API and the implementation of the GMI system. The GMI model is not restricted to any specific programming language. We chose to implement GMI in the Manta high-performance Java system [17]. Below, we describe how we have extended Manta to support the group model.
3.1 API

We have designed an Application Programming Interface (API) for the GMI model. Just like Java RMI, this API uses a special interface and library classes, but no language extensions. The library routines are written entirely in Java and are therefore fully portable. We have defined a Java package group that contains the necessary classes and interfaces for using groups of objects.

The GroupMethods interface is similar to the java.rmi.Remote interface. It does not define any methods, but serves as a signal to the compiler. The compiler recognizes interfaces that extend GroupMethods as group interfaces (see Section 2.2), and then generates the required stub and skeleton classes. The GroupMember class contains basic functionality for group objects (which must always extend this class) and is similar to RMI's java.rmi.UnicastRemoteObject. It contains fields that give the rank of the object in the group and the total size of the group. The GroupStub class is the parent class for generated stubs. It contains the setInvoke and setResult methods, which select the forwarding and reply-handling scheme for a method; setInvoke and setResult thus form the core of the GMI API. The Group class contains methods to create new groups, to add objects to groups, and to create group stubs. In Section 4, we show some examples of how this API can be used in applications.
3.2 Implementation

We have implemented the model and its API in the Manta high-performance Java system [17]. The Manta system has been specifically designed for parallel programming in Java. Manta contains a native Java compiler (i.e., it compiles Java source code directly to native executables instead of byte code), an efficient runtime system, and a highly optimized implementation of RMI. Briefly, the RMI implementation is optimized in four ways: the code for the RMI stubs and skeletons is generated in C instead of Java; the standard Java serialization mechanism is replaced with a more efficient mechanism (also based on generated C code); the RMI library has been replaced with an efficient C-based runtime system; and, instead of using TCP/IP for the actual communication,
we use the Panda communication substrate [4]. An important performance optimization is that our system uses efficient low-level communication primitives to implement group communication. These primitives are provided by the Panda communication layer. For example, Panda uses hardware multicast on Ethernet and exploits support in the network-interface firmware on Myrinet [4].

We have extended the Manta compiler to recognize the GroupMethods interface and generate the required stubs and skeletons. In other Java systems, a preprocessor would be needed, similar to the rmic compiler used for RMI. The group stubs invoke the Panda communication routines, which in turn call the efficient network-level primitives. To do a group invocation, the stub can use the efficient multicast offered by Panda. The combining of result values is implemented using an efficient binomial-tree algorithm.

The API is completely implemented in Java. Creating and joining groups is implemented using RMI. We use a central RMI object which functions as a group registry. When a new group is created, its name is registered in this group registry, allowing other machines to access this group. When an object tries to join a group, a request containing information about this object is sent to this central registry (using RMI). The registry waits until all join requests have been received to complete the group, and then returns all group information (size, location of the objects, rank-to-object mapping, etc.) to every group member.

Every GroupStub contains a table of information about the group methods it implements. There is an entry in this table for every group method, containing information about the forwarding scheme, the reply-handling scheme, and the combine and personalization methods used. Using the setInvoke and setResult methods, the information in this table can be changed. The compiler-generated group method in the stub then checks this table to see which forwarding scheme and reply-handling scheme must be used for this method, and invokes the appropriate (compiler-generated) implementation. To avoid the use of reflection (which is usually inefficient), we use an additional preprocessor to generate classes containing empty prototypes for personalization and combine methods.

The implementation of the API (the library) is written in Java and is more than 3000 lines of code. The implementation of the group-related part of the Manta runtime system (written in C) is approximately the same size. The compiler-related code, required for generating stubs and skeletons, is significantly larger, some 6000 lines of code. Although much of this code (especially the compiler) is fairly straightforward, this gives some indication of the complexity hidden from the programmer by the GMI framework.
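To illustrate the table-driven dispatch, the following is a rough Java rendering of what a generated stub method could look like. Manta actually generates this code in C, and all names below are hypothetical; only the reflection-free, table-based selection of the configured schemes is taken from the text.

    // Per-method configuration entry, updated by setInvoke()/setResult().
    class MethodConfig {
        static final int SINGLE = 0, GROUP = 1, PERSONALIZED = 2;
        int forwarding = SINGLE;    // forwarding scheme for this method
        int resultScheme = 0;       // reply-handling scheme for this method
    }

    // Sketch of a generated stub (not Manta's actual generated code).
    class SumStubSketch {
        private static final int ADD = 0;              // method index
        private MethodConfig[] table = { new MethodConfig() };

        public void add(double v) {
            MethodConfig c = table[ADD];               // plain table lookup
            switch (c.forwarding) {
            case MethodConfig.SINGLE:
                // send an invoke message to one skeleton
                break;
            case MethodConfig.GROUP:
                // multicast the invoke message to all skeletons
                break;
            case MethodConfig.PERSONALIZED:
                // run the personalization method per rank, then send
                break;
            }
            // the reply is then handled according to c.resultScheme
            // (discard / forward / return one / combine)
        }
    }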
4. EXAMPLES
We now show four example applications that illustrate how GMI can be used. Using the API described above, creating a group object is similar to creating an RMI object. A group interface (extending the interface GroupMethods) must be created, which defines the methods that are common to the group objects. To be part of a group, an object must implement this group interface and extend the GroupMember class. An example is shown in Figure 6. Here, the group interface Sum contains two methods, add and get. The add method adds a value to the sum, while get retrieves the current value. The group object, SumImpl, implements these methods.

    import group.*;

    interface Sum extends GroupMethods {
        public void add(double v);
        public double get();
    }

    class SumImpl extends GroupMember implements Sum {

        private double value = 0.0;

        public synchronized void add(double v) {
            value += v;
        }

        public synchronized double get() {
            return value;
        }
    }
Figure 6: A simple group object

The Main class, shown in Figure 7, contains the main program, which is run on all participating machines. One machine creates a new group called SumGroup.2 All machines then create a new SumImpl object and add it to the group. As in RMI, method invocations are done on a stub, rather than directly on the group objects. Therefore, we create a new stub (using the bind method). The object group is now ready for use. As will be shown in the following examples, each of the methods in this stub can be configured separately to use a specific combination of forwarding and reply-handling schemes. Multiple stubs using different configurations for the same method can also be used. In the following, we extend the SumGroup example to illustrate four important forms of group communication, which are obtained by configuring the stub using setInvoke and setResult, as explained in Section 3.1. We do so by giving different implementations of the method doWork, which is called from the main method of Figure 7.
2 We assume that some infrastructure is present to allow multiple Java programs running on different machines to participate in a parallel computation, e.g., that every machine is capable of contacting every other participating machine, and that the machines are ordered by a ranking scheme.
    class Main {
        public static void main(String[] args) {
            // if program rank is 0
            // and number of programs is N
            Group.create("SumGroup", N);

            SumImpl sum = new SumImpl();
            Group.join("SumGroup", sum);

            Sum group = (Sum) Group.bind("SumGroup");
            doWork(sum.rank(), group);
        }
    }
Figure 7: A simple group application
4.1 Group Invocation

Figure 8 illustrates a group invocation in GMI. One group member sets the forwarding scheme for the add method to group invocation using setInvoke. A description of the method ("void add(double v)") is passed as a String parameter. The generated stub contains a table with the descriptions of all its group methods. This table can be used to efficiently select the configuration of a method (without using Java's inefficient reflection mechanism). The invocations of add are then forwarded to all objects in the SumGroup. This example illustrates the ease of expressing group communication in GMI, and the similarity with RMI. Note that although the group method is invoked many times, it has to be configured only once.

    public static void doWork(int rank, Sum group) {
        if (rank == 0) {
            group.setInvoke("void add(double v)",
                            Invocation.GROUP);
            // The source text is truncated inside this loop; the bound
            // and body below are an illustrative reconstruction.
            for (int i = 0; i < 100; i++) {
                group.add(i);
            }
        }
    }

A second implementation of doWork configures the get method as a group invocation whose results are combined into a single value by a user-defined combine method (see Section 2.2.3), matching the scenario of Figure 4:

    class Combine extends SumCombine {
        public double c_get(double result1, int rank1,
                            double result2, int rank2, int size) {
            return (result1 + result2);
        }
    }

    public static void doWork(int rank, Sum group) {
        if (rank == 0) {
            Combine c = new Combine();
            group.setInvoke("double get()", Invocation.GROUP);
            group.setResult("double get()", Result.COMBINE, c);
            double result = group.get();
        }
    }