Connecting Condor Pools into Computational Grids by Jini¹

Gergely Sipos and Péter Kacsuk

MTA SZTAKI Computer and Automation Research Institute
P.O. Box 63., H-1518 Hungary
{sipos, kacsuk}@sztaki.hu
Abstract. This paper describes how Condor pools can be joined together to form a large computational cluster grid. In the architecture Jini provides the infrastructure for resource lookup, while Condor manages job execution on the individual clusters. Semi-on-line application monitoring is also available in this structure, and it works even through firewalls. Besides Condor, the presented Jini-based grid can support other local jobmanager implementations, so various types of sequential and parallel jobs can be executed within the same framework.
1 Introduction
The suitability of the Condor local jobmanager within single administrative domains has been proven in several projects [4]. Other works described how Condor flocking can be applied to connect clusters together [2]. Unfortunately, in such a role Condor meets neither the security nor the functionality requirements that second-generation, service-oriented grids have to satisfy. We have already presented how the Java-based Jini technology can be used as the middleware layer of computational grids [5]. Jini has a service-oriented vision, and through its Lookup Service infrastructure clients can find suitable computational services. To exploit the advantages of both Condor and Jini we integrated them into a single framework. In this system Jini acts as the information system layer, while Condor manages the running jobs on the connected clusters. Applying this structure, there is no need to use Condor flocking, since Jini provides the necessary tools and protocols for inter-domain communication. To enable the cooperation of the two technologies, Condor had to be wrapped into a Jini service program, and a suitable service proxy had to be developed for it. Since neither Jini nor Condor supports application monitoring, the Mercury monitor infrastructure [1] has been integrated into the grid as well. Our system assumes that Mercury has been installed accordingly on the machines of the Condor pools and that clients use the GRM trace collector and the PROVE visualiser tools [3].
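To make the registration step concrete, the following sketch shows how a Jini service program can publish a serializable proxy at the discovered lookup services using the standard Jini join utilities. The CondorClusterProxy class and its constructor argument are hypothetical stand-ins for the proxy described in this paper; only the net.jini.* classes belong to the actual Jini API.

import net.jini.core.entry.Entry;
import net.jini.core.lookup.ServiceID;
import net.jini.discovery.LookupDiscovery;
import net.jini.lease.LeaseRenewalManager;
import net.jini.lookup.JoinManager;
import net.jini.lookup.ServiceIDListener;
import net.jini.lookup.entry.Name;

public class CondorServiceRegistration {
    public static void main(String[] args) throws Exception {
        // Hypothetical serializable proxy that forwards client calls
        // to the server program on the Condor front-end machine.
        Object proxy = new CondorClusterProxy("frontend.example.org");

        // Discover every lookup service reachable via multicast.
        LookupDiscovery discovery =
                new LookupDiscovery(LookupDiscovery.ALL_GROUPS);

        // Register the proxy at all discovered lookup services and
        // keep the registration leases renewed in the background.
        Entry[] attributes = { new Name("Condor executor service") };
        new JoinManager(proxy, attributes,
                new ServiceIDListener() {
                    public void serviceIDNotify(ServiceID id) {
                        System.out.println("Registered with ID " + id);
                    }
                },
                discovery, new LeaseRenewalManager());
    }
}

The MS URL could be published in the same step, for instance as an additional lookup attribute on the registered service item.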
¹ The work presented in this paper was supported by the Ministry of Education under No. IKTA5-089/2002, the Hungarian Scientific Research Fund No. T042459 and IHM 4671/1/2003.
Although Condor can manage different types of sequential and parallel jobs (thus PVM, MPI and Java applications can be executed in our grid), the purpose of this work is to give a general pattern that developers of Jini-based multi-layered grids can follow. Later, following the presented solution, any other jobmanager implementation (e.g. Sun Grid Engine, fork) can be wrapped into the same grid; a possible shape for such a jobmanager-neutral contract is sketched below. In Section 2 the structure and usage scenario of the Condor-Jini grid are presented, while Section 3 outlines conclusions.
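One possible reading of this pattern is the following pair of interfaces between clients and a wrapped jobmanager. The names and signatures are illustrative assumptions for this sketch, not the interfaces of the JGrid prototype.

import java.rmi.Remote;
import java.rmi.RemoteException;

// Jobmanager-neutral executor contract; a Condor, SGE or fork
// wrapper would each provide its own proxy implementing it.
interface ClusterService extends Remote {
    // Submits a compiled PVM, MPI or Java program (shipped as raw
    // bytes) and returns a proxy for controlling the remote job.
    JobProxy submit(byte[] executable, String[] args, String jobType)
            throws RemoteException;
}

// Per-job control handle (the job proxy described in Section 2).
interface JobProxy extends Remote {
    void start() throws RemoteException;
    void stop() throws RemoteException;
    byte[][] downloadResultFiles() throws RemoteException;
    String getMonitorID() throws RemoteException; // for GRM registration
}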
2 Job execution and monitoring in the Jini-based Condor-Grid
The developed server program wraps the job executor functionality of Condor into a Jini service. Using this service, Jini-enabled clients can submit sequential and parallel jobs to remote Condor pools. In the system the job submission and result download processes are performed entirely by the cooperating cluster-side server program and its client-side proxy; the user only has to start these procedures. Figure 1 presents the usage scenario of the high-level job executor and monitor service.
[Figure 1 shows the Grid client machine (grid application, Jini client, cluster proxy, job proxy, GRM and PROVE), the lookup service holding the cluster proxy and MS URL, and the cluster front-end machine running the Jini service program and the Mercury Monitor Service (MS) in front of the Condor pool with its local monitors; firewalls separate the domains, and numbered arrows (1)-(9) mark the steps described below.]

Fig. 1. The usage scenario of a Condor cluster in the Jini-based Grid.
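The paper only states that the server program submits jobs with native calls; one plausible way to realise this from Java is to generate a submit-description file and invoke the standard condor_submit command-line tool, as in the sketch below. The file contents and the output parsing are illustrative assumptions.

import java.io.*;
import java.util.regex.*;

class CondorSubmitter {
    // Hands a received executable to the local Condor pool and
    // returns the cluster id parsed from condor_submit's output.
    static String submit(File executable, String universe)
            throws IOException, InterruptedException {
        File desc = File.createTempFile("job", ".sub");
        try (PrintWriter out = new PrintWriter(new FileWriter(desc))) {
            out.println("executable = " + executable.getAbsolutePath());
            out.println("universe   = " + universe); // e.g. vanilla, java, pvm
            out.println("output     = job.out");
            out.println("error      = job.err");
            out.println("log        = job.log");
            out.println("queue");
        }
        Process p = new ProcessBuilder("condor_submit", desc.getAbsolutePath())
                .redirectErrorStream(true).start();
        String clusterId = null;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            Pattern idPattern = Pattern.compile("submitted to cluster (\\d+)");
            String line;
            while ((line = r.readLine()) != null) {
                Matcher m = idPattern.matcher(line);
                if (m.find()) clusterId = m.group(1);
            }
        }
        p.waitFor();
        return clusterId;
    }
}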
The service program has to be started on the front-end machine of the Condor pool. This separation significantly improves the security of the whole system: since the front-end machine performs every grid-related task, the Condor nodes stay protected. After start-up, the service program discovers the lookup services and registers the cluster proxy, together with the URL of the Mercury Monitor Service (MS URL), at them (1). When an appropriate Jini client application downloads these two objects (2), the proxy can be used to submit compiled PVM, MPI or Java programs to the Condor cluster (3). The proxy forwards the received application to the remote server (4), which submits it into the Condor pool with native calls (5). At the same time a job proxy is returned to the client (6). This second proxy can be used to start or stop the remote grid application and to download its result files. Based on the monitor ID contained in the job proxy and on the MS URL downloaded from the lookup service, the GRM tool can register for the trace of the remote job (7, 8). Applying the Mercury infrastructure (the components inside the broken lines in Fig. 1), the instrumented grid application can forward trace events to the client-side GRM trace collector and the PROVE visualiser (9). Since Mercury is a pre-installed service on the cluster, only one port has to be opened from the front-end machine towards the public network to enable trace forwarding. Besides the service program and the proxies, we have already developed a client application that uses the service in the described way.
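From the client side, steps (2), (3) and (6) might be realised roughly as follows, assuming the hypothetical ClusterService and JobProxy interfaces sketched in Section 1; the unicast lookup address is also an assumption of this sketch.

import java.nio.file.Files;
import java.nio.file.Paths;
import net.jini.core.discovery.LookupLocator;
import net.jini.core.lookup.ServiceRegistrar;
import net.jini.core.lookup.ServiceTemplate;

public class CondorClient {
    public static void main(String[] args) throws Exception {
        // (2) Contact a known lookup service and download the
        // cluster proxy; the host name is illustrative.
        ServiceRegistrar registrar =
                new LookupLocator("jini://lookup.example.org").getRegistrar();
        ServiceTemplate template = new ServiceTemplate(
                null, new Class[] { ClusterService.class }, null);
        ClusterService cluster = (ClusterService) registrar.lookup(template);

        // (3) Submit a compiled job through the proxy and (6) receive
        // the job proxy in return.
        byte[] program = Files.readAllBytes(Paths.get("a.out"));
        JobProxy job = cluster.submit(program, new String[0], "vanilla");
        job.start();

        // (7) The monitor ID, together with the MS URL obtained from
        // the lookup service, lets GRM register for the job's trace.
        System.out.println("Monitor ID: " + job.getMonitorID());
    }
}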
3 Conclusions
The presented Condor-based Jini executor service has been publicly demonstrated during the Grid Dissemination Day organised by the Hungarian Grid Competence Centre, as an important part of the JGrid project [6]. Although the automatic service discovery and usage functionality inherited from Jini results in an easy-to-use and easy-to-install system, due to security issues the present version cannot be opened to public use. The demonstrated version builds on Jini version 1, so authentication, authorisation and dynamic policy configuration could not be handled. We are already working on the next version of the service, which will apply every security solution provided by Jini 2.
References

1. Z. Balaton and G. Gombás: Resource and Job Monitoring in the Grid. Proc. of the Euro-Par 2003 Conference, Klagenfurt, pp. 404-411, 2003.
2. D. H. J. Epema, M. Livny, R. van Dantzig, X. Evers, and J. Pruyne: A Worldwide Flock of Condors: Load Sharing among Workstation Clusters. Technical Report DUT-TWI-95-130, Delft, The Netherlands, 1995.
3. P. Kacsuk: Performance Visualization in the GRADE Parallel Programming Environment. Proc. of the 5th International Conference/Exhibition on High Performance Computing in the Asia-Pacific Region (HPC Asia 2000), Peking, pp. 446-450, 2000.
4. M. J. Litzkow, M. Livny, and M. W. Mutka: Condor – A Hunter of Idle Workstations. Proc. of the 8th IEEE International Conference on Distributed Computing Systems, pp. 104-111, 1988.
5. G. Sipos and P. Kacsuk: Executing and Monitoring PVM Programs in Computational Grids with Jini. Proc. of the 10th EuroPVM/MPI Conference, Venice, pp. 570-576, 2003.
6. JGrid project: http://pds.irt.vein.hu/jgrid