Programming the Grid with Distributed Objects
Thierry Priol
IRISA/INRIA, Campus de Beaulieu, 35042 Rennes Cedex, France
Abstract. The design of applications for Computational Grids relies partly on communication paradigms. In most Grid experiments, message passing has been the main paradigm, either to let several processes of a single parallel application exchange data or to allow several applications to communicate with each other. In this article, we advocate a modern approach to programming a Grid, based on distributed objects, namely CORBA objects. We give an overview of two projects that deal with building an efficient distributed-object platform for computational grids. The first project aims at efficiently encapsulating parallel codes into distributed objects (PaCO), whereas the second aims at designing a runtime system that allows a CORBA middleware to exploit very fast networking technologies.
1 Introduction
With the availability of high-performance networking technologies, it is now feasible to couple several computing resources together to offer a new kind of computing infrastructure called a Computational Grid [5]. Such a system can be made of a set of heterogeneous computing resources interconnected through multi-gigabit networks. Software infrastructures, such as Globus [4] or Legion [6], provide a set of basic services to support the execution of distributed and parallel programs. One problem that arises immediately is how to program such a computational Grid, and what the most suitable communication model for Grid-enabled applications is. It is very tempting to extend existing message-passing libraries so that they can be used for distributed programming. We believe that this approach is not a viable solution for the future of Grid computing. Instead, we advocate an approach that combines communication paradigms for parallel and distributed programming. This approach, called PaCO, is based on an extension to a well-known and mature distributed object technology, namely CORBA. It is presented in Section 2. The PaCO concept can be implemented using the Padico runtime, which allows a CORBA ORB to use fast networking technologies such as those available in PC clusters or in parallel computers such as the NEC Cenju-4. It is presented in Section 3.
2 Concept of parallel CORBA object
2.1 A short overview of CORBA
CORBA is a specification from the OMG (Object Management Group) to support distributed object-oriented applications. CORBA acts as a middleware that provides a set of services allowing the distribution of objects among a set of computing resources connected to a common network. Transparent remote method invocations are handled by an Object Request Broker (ORB), which provides a communication infrastructure independent of the underlying network. An object interface is specified using the Interface Definition Language (IDL). An IDL file contains a list of operations for a given object that can be invoked remotely. An IDL compiler is in charge of generating a stub for the client side and a skeleton for the server side. A stub is simply a proxy object that behaves as the object implementation at the server side. Its role is to deliver requests to the server. Similarly, the skeleton is an object that accepts requests from the ORB and delivers them to the object implementation.
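The roles of the stub, the ORB and the skeleton described above can be illustrated with a minimal in-process sketch. It is not CORBA code: the classes Calculator, Skeleton, ToyORB and Stub are hypothetical names chosen for this illustration, JSON stands in for the marshalling that a real ORB performs, and in a real system the stub and skeleton would be generated by the IDL compiler rather than written by hand.

```python
import json


class Calculator:
    """Object implementation on the server side (hypothetical example)."""

    def add(self, a, b):
        return a + b


class Skeleton:
    """Accepts marshalled requests and delivers them to the object implementation."""

    def __init__(self, servant):
        self.servant = servant

    def dispatch(self, request_bytes):
        request = json.loads(request_bytes.decode("utf-8"))
        method = getattr(self.servant, request["operation"])
        result = method(*request["args"])
        return json.dumps({"result": result}).encode("utf-8")


class ToyORB:
    """Stand-in for the ORB: routes a request to the skeleton registered under a key."""

    def __init__(self):
        self.skeletons = {}

    def register(self, key, skeleton):
        self.skeletons[key] = skeleton

    def invoke(self, key, request_bytes):
        return self.skeletons[key].dispatch(request_bytes)


class Stub:
    """Client-side proxy: exposes the same operation and forwards it through the ORB."""

    def __init__(self, orb, key):
        self.orb = orb
        self.key = key

    def add(self, a, b):
        request = json.dumps({"operation": "add", "args": [a, b]}).encode("utf-8")
        reply = self.orb.invoke(self.key, request)
        return json.loads(reply.decode("utf-8"))["result"]


orb = ToyORB()
orb.register("calc", Skeleton(Calculator()))
proxy = Stub(orb, "calc")
total = proxy.add(2, 3)  # the client calls the proxy as if it were the object itself
```

The client only ever sees the proxy, so replacing the in-process ToyORB by a networked transport would not change the calling code, which is the transparency property the section describes.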
2.2 PaCO: Parallel CORBA Object
A parallel CORBA object (hereafter, parallel object) is simply a collection of identical CORBA objects, as shown in Fig. 1. It aims at encapsulating an MPI code into CORBA objects so that the MPI code can be fully integrated into a CORBA-based application. Our goal is to hide as many as possible of the problems that appear when dealing with coarse-grain parallelism on a distributed-memory parallel architecture such as a cluster of PCs, without entailing a loss of performance when communicating with the MPI code.

First of all, when a client calls an operation, the associated method is executed by all the objects belonging to the collection at the server side. Execution of parallel objects follows the SPMD execution model. This parallel activation is done transparently by our system. Data distribution between the objects of a collection is entirely handled by the system. However, to let the system carry out parallel execution and data distribution, some specifications have to be added to the component interface. A parallel object interface is thus described in an extended version of IDL, called Extended-IDL, as shown in Fig. 1. It adds a set of new keywords (such as dist and csum in Fig. 1) to the IDL syntax to specify the number of objects in the collection, the shape of the virtual node array onto which the objects of the collection are mapped, the data distribution modes associated with parameters, and the collective operations applied to parameters of scalar types. A more complete description of these extensions is given in [9, 10].

    interface[*:2*n] MatrixOperations {
      const long SIZE=100;
      typedef double Vector[SIZE];
      typedef double Matrix[SIZE][SIZE];
      void multiply(in dist[BLOCK][*] Matrix A, in Vector B, out dist[BLOCK] Vector C);
      void skal(in dist[BLOCK] Vector C, out csum double skal);
    };

[Diagram labels: Machine A with a Client and a Stub produced by the Extended-IDL compiler; CORBA ORB; Cluster of PCs hosting the Parallel CORBA object, whose four members each show a PBOA, a skeleton, an object implementation and SPMD code; MPI communication layer.]
Fig. 1. Encapsulation of MPI-based parallel codes into CORBA objects.

Performance evaluation of this concept has been performed using a cluster of PCs and a NEC Cenju-4 [7]. The conclusion of this evaluation was that good scalability can be achieved when transferring data between two different collections of objects. When two collections communicate, data redistribution has to be performed according to the data distribution mode of each of the two collections. We have shown that this operation has to be performed either on the client collection or on the server collection. For instance, suppose we map the client collection on a PC cluster with an Ethernet network and the server collection on a NEC Cenju-4 parallel machine. In such a case, data redistribution has to be performed on the server side to take full benefit of the NEC Cenju-4 interconnection network.

Very recently, we had access to the VTHD network, an experimental 2.5 Gb/s network that, in particular, interconnects two INRIA laboratories about one thousand kilometers apart. In a peer-to-peer situation using two parallel CORBA objects, we measured a throughput of 11 MB/s, the 100 Mb/s Ethernet card being the limiting factor. In experiments with an 8-node parallel client and an 8-node parallel object, we measured an aggregated bandwidth of 85.7 MB/s, which represents a point-to-point bandwidth of 10.7 MB/s. Portable parallel CORBA objects thus aggregate bandwidth efficiently.
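To make the redistribution step between two collections more concrete, the following sketch computes which index ranges of a vector declared as dist[BLOCK] must travel from each client object to each server object when the two collections have different sizes. It is only an illustration: the text does not specify the exact BLOCK convention used by the system, so the rule used here (contiguous blocks, with any remainder given to the lowest ranks) is an assumption, and the function names block_bounds and redistribution_schedule are hypothetical. The sketch also does not model whether the redistribution is carried out on the client or on the server side.

```python
def block_bounds(size, parts, rank):
    """Half-open index range [lo, hi) owned by `rank` under the assumed BLOCK rule."""
    base, extra = divmod(size, parts)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def redistribution_schedule(size, n_clients, n_servers):
    """List the (client_rank, server_rank, lo, hi) transfers between two collections."""
    schedule = []
    for c in range(n_clients):
        c_lo, c_hi = block_bounds(size, n_clients, c)
        for s in range(n_servers):
            s_lo, s_hi = block_bounds(size, n_servers, s)
            # The data to send is the overlap of the two owned ranges, if any.
            lo, hi = max(c_lo, s_lo), min(c_hi, s_hi)
            if lo < hi:
                schedule.append((c, s, lo, hi))
    return schedule


# Vector of SIZE=100 elements (as in Fig. 1), 2 client objects, 4 server objects.
schedule = redistribution_schedule(100, 2, 4)
```

Each entry of the schedule is one point-to-point transfer, so several client/server pairs can move their pieces concurrently; this is the mechanism by which a collection can aggregate the bandwidth of its members.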
3 Padico: a high-performance distributed objects platform
3.1 Managing networking resources
Due to the high level of heterogeneity in a computational Grid, designing a runtime system for such a computing infrastructure is extremely challenging in many respects. In this section we focus on one particular facet that a grid runtime has to tackle: managing various communication resources and hiding them from the middleware systems so that they can use them transparently and efficiently. Communication resources in a computational grid cover a wide spectrum of communication technologies: from high-bandwidth WANs (Wide Area Networks), such as the VTHD network currently being experimented with in France (2.5 Gbit/s), to LANs (Local Area Networks) or SANs (System Area Networks), such as Myrinet [2], SCI, or a machine-specific networking technology such as the one in the NEC Cenju-4. The main difficulty comes from the various communication protocols associated with these networking technologies. One possible way to hide this heterogeneity would be to have a single communication protocol (such as TCP/IP) on which the various middleware systems could be based. However, such an approach fails because it cannot exploit all these networking technologies efficiently.
3.2 The Padico platform
We developed a research platform for parallel and distributed computing called Padico. The runtime environment is called Padico Task Manager, or PadicoTM for short. The role of PadicoTM is to provide a high-performance infrastructure into which middleware systems such as CORBA, MPI, a JVM or a DSM can be plugged. It offers a framework that deals with communication and thread issues, allowing different middleware systems to share the same process efficiently. Its strength is to offer the same interface to very different networks. In particular, Padico is intended to be our research platform for code-coupling applications based on the concept of parallel CORBA objects [3].
[Diagram labels: Application; JVM, CORBA, MPI, DSM; VSock; Padico Task Manager (Padico NetAccess, Padico ThreadManager); Madeleine, Marcel; TCP, Myrinet, SCI.]
Fig. 2. Padico overview

The design of Padico, derived from software component technology, is very modular. Every module is represented as a component: a description file is attached to the binary (in the form of a dynamically loadable library) that it describes. PadicoTM implements network multiplexing, provided by the Padico NetAccess module, and thread management, provided by the Padico ThreadManager module. Padico NetAccess and Padico ThreadManager, built on top of Madeleine and Marcel, are the core of PadicoTM. Services are then plugged into the PadicoTM core. These services are:
- the virtual socket module VSock, used by CORBA; it may be used by several other modules at the same time;
- the CORBA module, based on omniORB3 or MICO, on top of VSock;
- the MPI module, derived from MPICH/Madeleine [1];
- a basic CORBA gatekeeper that allows the user to dynamically load modules upon CORBA requests.
Currently, we have a functional prototype with all these modules available. Padico is still at an early stage: several important issues such as security, deployment and fault tolerance are not yet addressed.
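The idea of network multiplexing, where several middleware modules share one network access point inside the same process, can be sketched as follows. This is not the Padico NetAccess interface, which is not described in the text: the classes LoopbackDriver and NetMultiplexer, the send/poll driver methods and the channel names are all hypothetical, and a loopback queue stands in for a real network such as SCI or Myrinet.

```python
from collections import defaultdict, deque


class LoopbackDriver:
    """Hypothetical network driver: delivers frames back to the same process."""

    def __init__(self):
        self.frames = deque()

    def send(self, frame):
        self.frames.append(frame)

    def poll(self):
        return self.frames.popleft() if self.frames else None


class NetMultiplexer:
    """Shares one driver between several middleware modules using channel tags."""

    def __init__(self, driver):
        self.driver = driver
        self.handlers = {}

    def register(self, channel, handler):
        self.handlers[channel] = handler

    def send(self, channel, payload):
        # Every frame carries the tag of the module it belongs to.
        self.driver.send((channel, payload))

    def pump(self):
        """Deliver every pending frame to the handler registered for its channel."""
        delivered = 0
        while True:
            frame = self.driver.poll()
            if frame is None:
                return delivered
            channel, payload = frame
            self.handlers[channel](payload)
            delivered += 1


received = defaultdict(list)
mux = NetMultiplexer(LoopbackDriver())
mux.register("corba", lambda data: received["corba"].append(data))
mux.register("mpi", lambda data: received["mpi"].append(data))
mux.send("corba", b"request")
mux.send("mpi", b"block of data")
mux.send("corba", b"reply")
count = mux.pump()
```

Because the modules only talk to the multiplexer, any driver exposing the same two methods could replace the loopback one, which mirrors the stated goal of offering the same interface to very different networks.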
3.3 Performance
[Plot: bandwidth (MB/s) versus message size (bytes, from 32 to 1 MB) for omniORB/SCI, omniORB/Myrinet and omniORB/TCP (reference).]
Fig. 3. Bandwidth of omniORB over SCI and Myrinet networks
[Plot: bandwidth (MB/s) versus message size (bytes, from 32 to 1 MB) for omniORB/SCI and MPI/SCI.]
Fig. 4. Comparison between CORBA and MPI bandwidth
Bandwidth. The bandwidth of our high-performance CORBA implementation is shown in Figure 3. We ran our benchmark on dual-Pentium II 450 machines, with Ethernet-100, SCI and Myrinet. The benchmark consists of a remote invocation of a method that takes an inout parameter of variable size. The peak bandwidth is 86 MB/s on SCI and 91 MB/s on Myrinet. This performance is very good: it amounts to 99% of the maximum bandwidth achievable with Madeleine on SCI and 92% on Myrinet.

MPI vs. CORBA. Figure 4 compares the bandwidth of MPI/Madeleine [1] and of our omniORB/Madeleine on SCI. For small messages, CORBA is a little slower than MPI because of the software overhead introduced by the ORB (see the discussion about latency below). For larger messages, our CORBA implementation outperforms MPI, because MPICH does not fragment large messages very well. The overall performance of CORBA is thus comparable to that of MPI. This validates our approach of using both MPI and CORBA to structure applications better without entailing a loss of performance.
Latency issues. On high-speed networks (e.g. SCI and Myrinet), the latency of our omniORB/Madeleine is around 55 µs. This compares well with the 160 µs latency of the ORB over TCP/Ethernet-100. However, MPI/Madeleine latency is 23 µs. This first high-performance ORB uses the GIOP protocol. GIOP is very time-consuming and is not needed inside a homogeneous part of a grid system. CORBA allows protocols other than GIOP (called ESIOPs) to be written. Thus, to lower the latency, it is possible to write a high-performance network ESIOP (a Grid-aware ESIOP), or to use an existing lightweight protocol like TAO's light-GIOP [8]. This will be investigated in future work.
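The effect of latency on small messages can be estimated with a simple linear cost model, in which sending n bytes takes the latency plus n divided by the peak bandwidth. This model is an assumption introduced here for illustration, not a measurement from the experiments; it also assumes that 1 MB means 10^6 bytes and, since no separate peak is quoted for MPI, that MPI reaches the same 86 MB/s peak as omniORB on SCI. It ignores the fragmentation effect mentioned above for large MPI messages.

```python
def transfer_time_us(size_bytes, latency_us, peak_mb_per_s):
    """Time to move one message under the assumed linear model, in microseconds."""
    # With 1 MB = 10**6 bytes, a rate of 1 MB/s equals 1 byte per microsecond.
    return latency_us + size_bytes / peak_mb_per_s


def effective_bandwidth(size_bytes, latency_us, peak_mb_per_s):
    """Achieved bandwidth in MB/s for a message of the given size."""
    return size_bytes / transfer_time_us(size_bytes, latency_us, peak_mb_per_s)


def half_peak_size(latency_us, peak_mb_per_s):
    """Message size in bytes at which half of the peak bandwidth is reached."""
    return latency_us * peak_mb_per_s


PEAK_SCI = 86.0        # MB/s, peak reported for omniORB on SCI
CORBA_LATENCY = 55.0   # microseconds, omniORB/Madeleine
MPI_LATENCY = 23.0     # microseconds, MPI/Madeleine (same peak assumed, see above)

corba_half = half_peak_size(CORBA_LATENCY, PEAK_SCI)
mpi_half = half_peak_size(MPI_LATENCY, PEAK_SCI)
corba_1kb = effective_bandwidth(1024, CORBA_LATENCY, PEAK_SCI)
mpi_1kb = effective_bandwidth(1024, MPI_LATENCY, PEAK_SCI)
```

Under these assumptions the ORB needs messages of about 4.7 KB to reach half of the peak bandwidth, against about 2 KB for MPI, which is consistent with the observation that CORBA is somewhat slower than MPI for small messages and motivates a lighter protocol than GIOP.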
4 Conclusion
Thanks to the continuous improvement of networks, Computational Grids are becoming more and more popular. Some Grid architectures, like Globus, provide a parallel programming model, which does not appear well suited to certain applications, for example coupled simulations. For such applications, we advocate a programming model based on a combination of parallel and distributed programming models. CORBA has proved to be an interesting technology. However, as it does not handle parallelism, there is a clear need for parallel CORBA objects when interconnecting, for example, two MPI parallel codes. We have shown that this concept is able to exploit very high bandwidth wide-area networking technologies by aggregating the networking resources of a PC cluster or of a parallel machine like the NEC Cenju-4. Moreover, CORBA, through its ORB, is not by itself able to exploit the various networking technologies efficiently. We have shown, thanks to the Padico platform, that CORBA can be adapted to a Grid networking environment without modifying current implementations. Preliminary performance evaluations have shown that CORBA offers the same level of performance as other communication middleware such as MPI. Therefore, using a modern approach (distributed objects) to program computational Grids does not entail a loss of performance.
Acknowledgments
We would like to thank Satoshi Goto and Toshiyuki Nakata for their continuous valuable advice. We also pay tribute to Alexandre Denis, Tsuneiko Kamachi (C&C Media Research Laboratories, NEC Corporation), Christian Pérez and Christophe René for their contributions to the work presented in this paper. This work was carried out within the INRIA-NEC collaboration framework under contract 099C1850031308065.
References
1. O. Aumage, G. Mercier, and R. Namyst. MPICH/Madeleine: a true multi-protocol MPI for high-performance networks. In Proc. 15th International Parallel and Distributed Processing Symposium (IPDPS 2001), San Francisco, April 2001. IEEE. To appear.
2. N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, and W.-K. Su. Myrinet: A gigabit-per-second local area network. IEEE Micro, 15(1):29-36, February 1995.
3. A. Denis, C. Pérez, and T. Priol. Portable parallel CORBA objects: an approach to combine parallel and distributed programming for grid computing. In Proc. of the Intl. Euro-Par'01 conf., Manchester, UK, 2001. To appear.
4. I. Foster and C. Kesselman. Globus: A metacomputing infrastructure toolkit. The International Journal of Supercomputer Applications and High Performance Computing, 11(2):115-128, Summer 1997.
5. I. Foster and C. Kesselman, editors. The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann Publishers, Inc, 1998.
6. A. S. Grimshaw, W. A. Wulf, and the Legion team. The Legion Vision of a Worldwide Virtual Computer. Communications of the ACM, 40(1):39-45, January 1997.
7. T. Kamachi, T. Priol, and C. René. Data distribution for parallel CORBA objects. In EuroPar'00 conference, August 2000.
8. C. O'Ryan, F. Kuhns, D. C. Schmidt, and J. Parsons. Design Patterns in Communications, chapter Applying Patterns to Develop a Pluggable Protocols Framework for ORB Middleware. Cambridge University Press, 2000.
9. T. Priol and C. René. Cobra: A CORBA-compliant Programming Environment for High-Performance Computing. In Euro-Par'98, pages 1114-1122, September 1998.
10. C. René and T. Priol. MPI code encapsulating using parallel CORBA object. In Proceedings of the Eighth IEEE International Symposium on High Performance Distributed Computing, pages 3-10, August 1999.