A DRMAA-based Target System Interface Framework for UNICORE
Morris Riedel, Roger Menday, Achim Streit
Central Institute of Applied Mathematics, Research Centre Juelich, D-52425 Juelich, Germany
{m.riedel, r.menday, a.streit}@fz-juelich.de
Piotr Bala
Faculty of Mathematics and Computer Science, Nicolaus Copernicus University, 87-100 Torun, Poland
[email protected]
Abstract
The UNICORE Grid technology provides seamless, secure, and intuitive access to distributed Grid resources. UNICORE is a mature and well-tested Grid middleware system that is used in daily production and in research projects worldwide. The success of the UNICORE technology can at least partially be explained by the fact that UNICORE is based on a three-tier architecture. In this paper we present the evolution of one of these tiers, the one mainly used for job and resource management. This evolution integrates the Distributed Resource Management Application API (DRMAA) of the Global Grid Forum, providing UNICORE with a standardized interface to underlying resource management systems and other Grid systems.
1. Introduction
The Grid is an emerging infrastructure that fundamentally changes the way computational scientists and engineers work. Grid middleware software has evolved into mature and robust systems comprising many different types of Grid resources, e.g. large data-storage facilities holding petabytes of data, databases with valuable content from the life sciences, drug discovery, medical science, high-energy and astrophysics, or compute resources such as the world's most powerful clusters and supercomputers. The UNICORE Grid middleware system has been developed since the mid 1990s and is currently used in daily production at many supercomputing centers worldwide [26]. Besides its powerful features, such as mature workflow capabilities, single sign-on, and command-line interfaces, its successful three-tier architecture offers functionality to support the operating and batch systems of all vendors present at worldwide partner sites. In 1997 the supported systems included, for example, large Cray T3E systems, NEC and Hitachi vector machines, and IBM SP2s. Nowadays, the spectrum of systems is even broader, including modern systems such as
IBM p690 systems, SGI Altix systems, or diverse Linux-based clusters and server-farms. While new forms of systems appear, the major issue when deploying Grid software on these systems is still the same: the Grid software has to be non-intrusive, requiring strictly no changes to the computing centers' hardware and/or software infrastructures. UNICORE addresses this particular issue by providing lightweight Target System Interface (TSI) components [26]. These components implement UNICORE's interface to the underlying supercomputer and its Resource Management System (RMS). Versions of the TSI are provided for common RMS such as PBS Pro [10], IBM LoadLeveler [7], or CCS [22]. This is accomplished by deploying one particular Perl-based TSI component for each supported RMS. In recent years, the widespread deployment of Java technology has even reached supercomputers and clusters, which makes it reasonable to consider new possibilities for Java-based Grid components running directly on large supercomputers. The goals of using this technology at the target system are twofold. Primarily, polymorphism and dynamic code loading allow for easier adaptation to the underlying RMS during the runtime of UNICORE. Secondly, it allows for the support of upcoming RMS that implement this standardized interface, and also allows for the adaptation of systems that were not yet supported in UNICORE, for instance the Torque Resource Manager [12]. In this paper, we present an extensible Target System Interface Framework for UNICORE that improves the scalability, portability, interoperability, flexibility, and standards compliance of UNICORE's Target System Interfaces through the use of Java technology and the standardized interface of the Distributed Resource Management Application API (DRMAA) [17] of the Global Grid Forum [5]. This framework allows for new forms of Target System Interfaces that provide interoperability with other Grid systems such as Condor [2], Sun Grid Engine [11], and Globus [20], and easier support for widespread RMS such as the Torque Resource Manager or LoadLeveler.
The remainder of this paper is structured as follows. In Section 2, we provide an overview of the UNICORE Grid system and introduce the DRMAA technology in Section 3. The extensible Target System Interface Framework is described in Section 4, before the paper closes with concluding remarks and future directions in Section 5.
2. UNICORE technology
The Uniform Interface to Computing Resources (UNICORE) Grid technology provides seamless, secure, and intuitive access to distributed Grid resources such as supercomputers, clusters, or server-farms. As well as production usage (e.g. in the European DEISA infrastructure [3]), it serves as a solid basis in many European and international research projects (e.g. in the OMII-Europe project [9]). As shown in Figure 1, UNICORE is an implementation of a layered architecture consisting of a client, server, and target system tier. The client tier is represented by the UNICORE client, which provides a GUI to exploit the functionality offered by the server and target system tiers and to define complex workflows of interdependent jobs that can be executed on any UNICORE site. Furthermore, the Command Line Interface (CLI) [23] of UNICORE allows an easy integration of UNICORE functionality into user-specific scripts that are often used in the scientific community. As a scientific tool, UNICORE is intended to be extensible for domain-specific applications. As such, UNICORE provides a plug-in mechanism enabling the development of application-specific UNICORE plug-ins and thus provides an extensible architecture for various scientific domains. Over the years, many UNICORE plug-ins have been developed for applications [16, 19] and different user communities [25]. The client communicates with the server tier using the Abstract Job Object (AJO) and the UNICORE Protocol Layer (UPL) over standard Secure Socket Layer (SSL). The server tier consists of a Gateway and the Network Job Supervisor (NJS). While the Gateway is responsible for authenticating users and providing one secure entry point to a UNICORE site (Usite), the NJS controls Grid resources located at a Virtual site (Vsite). A Vsite is a particular Grid resource such as a supercomputer or Linux cluster, and there can be multiple Vsites within a single Usite. Workflows can be created across multiple Vsites in multiple Usites. The Gateway forwards incoming jobs to the NJS for further processing, which includes the mapping of AJO definitions to the corresponding target system job definition, through a process called 'incarnation'. For this, the NJS needs target system-specific information (number of nodes, memory, installed software packages, etc.) about the underlying resources, which is stored in the Incarnation Database (IDB). Hence, the NJS is responsible for the 'virtualization' of the underlying resources by mapping the
abstract jobs onto a specific target system.
Figure 1. Architecture of UNICORE with different Target System Interfaces.
Furthermore, the NJS authorizes a user via the UNICORE User DataBase (UUDB), which leads to a mapping from the user's certificate identity to target system-specific login accounts. The target system tier consists of the Target System Interface (TSI), which realizes the interaction with the underlying resource management system. Currently, each supported resource management system is used in conjunction with a target system-specific TSI implemented in Perl. These Perl TSI implementations are stateless daemons used by UNICORE to submit jobs to and control them on one resource management system. Figure 1 illustrates the various TSI implementations, which can be statically configured before startup.
Finally, the UNICORE technology is maintained via UNICORE@SourceForge [13] by the international UNICORE developer community. For an extensive insight into UNICORE, please refer to [26].
3. DRMAA
The Distributed Resource Management Application API (DRMAA) provides a generalized API to Resource Management Systems (RMS). The DRMAA working group of the GGF published the DRMAA specification as a proposed recommendation in June 2004. The scope of the DRMAA specification [17] is limited to job submission, job monitoring, and job control, but it provides a sophisticated programming model that enables the development of distributed applications and Grid frameworks that are dynamically coupled to an underlying RMS through the DRMAA interfaces. Since June 2004, the interfaces of the DRMAA specification have been implemented by several RMS such as Sun Grid Engine and Torque. Additionally, Grid systems such as Condor have also implemented the DRMAA interface specification, allowing High Throughput Computing (HTC) on large collections of distributively owned computing resources through the DRMAA interface. Recently, the GridWay meta-scheduler [6] has also added support for DRMAA. The DRMAA working group provides C, Perl, .NET, and Java bindings for the DRMAA specification, which is itself described using a programming-language-neutral interface description. Being an API for job management, DRMAA provides interfaces for the submission and control of jobs on an RMS. As such, the specification comprises high-level interfaces to submit jobs to an RMS and to control them via common operations such as termination or suspension. In particular, the API consists of a Session interface that represents the main functionality of the DRMAA specification. Among other functionality, this interface provides methods such as runJob() and control() for job submission and control. In order to allow a DRMAA application to retrieve a vendor-specific implementation of the Session interface, the factory method pattern [21] is used. Hence, the SessionFactory interface of the specification can be used to return different Session implementations that are interchangeably selected at run time by the administrator of the corresponding target system. As a consequence, a DRMAA application can be switched to a different underlying RMS as long as that RMS provides an implementation of the DRMAA Session and SessionFactory interfaces. Each DRMAA job definition consists of an implementation of the JobTemplate interface, which comprises several attributes describing the job in detail. For example, the following attribute setters can be used to define a job: setRemoteCommand(), setWorkingDirectory(), or setJobCategory(). So, the basic design idea of DRMAA is to create a
JobTemplate implementation using the corresponding Session implementation, to specify the required job attributes in the template, and to pass it back to the Session implementation for submission to the underlying RMS. In this context, it is also necessary to consider how to retrieve the results of a submitted job and how to discover information about its resource usage. For these reasons the specification also defines a JobInfo interface with methods such as getResourceUsage() or getExitStatus() that encapsulate information about a job's execution. Furthermore, the JobTemplate interface provides methods such as setOutputPath() and setErrorPath() to define where the results of the job's execution have to be written. A complete description of DRMAA is beyond the scope of this paper; please refer to [17] for further information.
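To make this programming model concrete, the following minimal sketch shows how a job could be submitted and monitored through the DRMAA Java language binding (the org.ggf.drmaa package). The command, the paths, and the use of a null contact string are placeholder assumptions for illustration; exact setter signatures may differ slightly between binding versions.

```java
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobInfo;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

public class DrmaaSubmitExample {
    public static void main(String[] args) throws DrmaaException {
        // Obtain the vendor-specific Session via the factory method pattern.
        Session session = SessionFactory.getFactory().getSession();
        session.init(null); // null selects the default DRMAA contact string

        // Describe the job with a JobTemplate.
        JobTemplate jt = session.createJobTemplate();
        jt.setRemoteCommand("/bin/hostname");   // placeholder command
        jt.setWorkingDirectory("/tmp");         // placeholder working directory
        jt.setOutputPath(":/tmp/example.out");  // DRMAA "[hostname]:path" syntax
        jt.setErrorPath(":/tmp/example.err");

        // Submit the job and wait for it to finish.
        String jobId = session.runJob(jt);
        JobInfo info = session.wait(jobId, Session.TIMEOUT_WAIT_FOREVER);
        if (info.hasExited()) {
            System.out.println("exit status: " + info.getExitStatus());
            System.out.println("resource usage: " + info.getResourceUsage());
        }

        session.deleteJobTemplate(jt);
        session.exit();
    }
}
```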
4. Target System Interface Framework
The success of the UNICORE technology can at least partially be explained by the fact that UNICORE is based on a three-tier architecture. Such a distributed client/server design usually provides good performance, flexibility, maintainability, reusability, and scalability, making UNICORE a natural choice for modern Grid projects such as the German Grid initiative (D-Grid) [4] or the European DEISA infrastructure [3], which use UNICORE to access their Grid resources. Another feature of UNICORE's three-tier architecture is the exchangeability of single layers as long as the protocol between the layers stays the same. Here, we make use of this fact and exchange the TSI layer shown in Figure 1 with another TSI layer as illustrated in Figure 2. Hence, we exchange only the target system tier implementations. Of course, all existing TSI implementations can still be used, because our approach does not affect any other layer (server or client tier) or the NJS-TSI protocol. With this new Java-based TSI, the whole UNICORE stack becomes portable, since every single component is then implemented in Java. This portability simplifies the installation of the whole UNICORE system, as Java is increasingly accepted on supercomputers and clusters, as experience in the context of DEISA and D-Grid shows.
4.1 Integration of DRMAA
Besides the gains in portability provided by the Java technology, the exchange is motivated by the need to integrate standardized interfaces to RMS, leading to an easier integration of new RMS that are coming into common use within Grid environments. One example is the implementation of a TSI for the Torque resource manager, which is not yet supported in UNICORE but implements the GGF-standardized interfaces of the DRMAA working group. The
TSI Framework presented here integrates the DRMAA Java language bindings and thus splits the Java-based TSI into two loosely coupled parts: the Java TSI Core and the corresponding DRMAA implementation already provided by different RMS and Grid systems [17]. It is considered a framework because it goes beyond the integration of RMS by also integrating other Grid systems that implement the DRMAA interface. Figure 2 illustrates various aspects of UNICORE's target system tier and its enhancement with DRMAA functionality.
Figure 2. Multiple DRMAA implementations.
The TSI Framework consists of a TSI Core that is capable of mapping the proprietary UNICORE protocol between NJS and TSI to the standardized DRMAA Session interface and corresponding DRMAA JobTemplates. The communication between the NJS and the TSI is based on a simple text-based protocol known as the NJS-TSI protocol. Since we only want to exchange one layer, we must not change this NJS-TSI protocol. On the other hand, we are also not able to adjust the DRMAA interface to UNICORE's needs. Therefore, we used the adapter design pattern [21] to implement an NJS-TSI-DRMAA adapter. This adapter works as a proxy for incoming job requests and delegates them to the corresponding DRMAA implementation. More precisely, the adapter maps single elements of the NJS-TSI protocol to DRMAA as shown in Table 1. Additionally, in order to provide simple non-distributed execution on a system, the TSI Framework also includes a Nobatch DRMAA implementation that can be run without an RMS, working only with UNIX-style commands such as sudo and fork. Note that exchanging these commands with MS Windows-style commands such as 'cmd /c' allows the easy generation of a Nobatch TSI for MS Windows. The characteristic use case of the TSI Framework is as follows: the TSICore.jar is deployed on a computing resource which runs an RMS or Grid system that implements the DRMAA interface. As an example, Figure 2 illustrates that either Condor, Torque, Sun Grid
Engine, or any other system that implements DRMAA can be configured to work with the TSI Core and thus with UNICORE. Using late binding and polymorphism, the DRMAA implementations are provided as shared modules that can be interchangeably selected at startup of the TSI Core. The benefit is that any upcoming RMS is automatically supported as long as it implements the DRMAA interfaces. Each adapted DRMAA implementation provides a DRMAA Session implementation, which allows a multi-threaded execution of jobs. More precisely, a configuration file of the TSI Core provides, in addition to ports and hosts for communication with the NJS, the explicit class of the factory for the corresponding DRMAA implementation ('TorqueFactory', 'CondorFactory', etc.). Hence, the TSI Framework uses the factory method pattern [21] in conjunction with late binding, polymorphism, and Java's introspection API to obtain a loosely coupled implementation of the Session interface. This Session implementation is used by the TSI Core to explicitly map parts of the NJS-TSI protocol to DRMAA functionality. To understand this relation, it is helpful to take a closer look at a mapping example related to the working directory of jobs during execution. UNICORE uses a directory called UNICORE space (USPACE) as the working directory for jobs [26]. Each job within UNICORE gets a dedicated, unique directory within this USPACE, which is thus a property of the job. In DRMAA, the properties of jobs are encapsulated within a JobTemplate implementation. In this case we use the setWorkingDirectory() property accessor of the template implementation to map the USPACE definition of submitted UNICORE jobs to the DRMAA-style working directory. In a similar way, all other job properties defined within the NJS-TSI protocol are mapped to corresponding DRMAA representations. As shown in Table 1, the runJob(jobTemplate) method of the Session interface can be used to submit the job to the corresponding DRMAA implementation. In addition to job submission, the TSI also needs methods for job control that can manipulate the state of a job. While UNICORE distinguishes between several commands for state changes on the NJS-TSI level (#TSI ABORTJOB, #TSI HOLDJOB, #TSI RESUMEJOB, ...), the DRMAA Session interface provides only the control() method for job control. Therefore, the UNICORE commands are mapped to this method using different parameters for the control(jobName, operation) call. The jobName parameter is a unique job identifier, while the operation parameter is used to map the different UNICORE commands to the corresponding behavior of the DRMAA implementation. Hence, we use the operation parameter with pre-defined values of the specification like DRMAASession.TERMINATE, DRMAASession.HOLD, or DRMAASession.RESUME.
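The following sketch illustrates how such an NJS-TSI-DRMAA adapter could be structured: it loads the configured SessionFactory class via introspection and maps a few NJS-TSI commands to DRMAA calls. The class name, the configuration key, and the adapter method names are hypothetical and serve only to illustrate the design under the assumptions above; only the org.ggf.drmaa types are taken from the DRMAA Java binding.

```java
import java.util.Properties;
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.JobTemplate;
import org.ggf.drmaa.Session;
import org.ggf.drmaa.SessionFactory;

/** Hypothetical adapter between the NJS-TSI protocol and a DRMAA Session (names are illustrative). */
public class NjsTsiDrmaaAdapter {

    private final Session session;

    public NjsTsiDrmaaAdapter(Properties tsiConfig) throws Exception {
        // The TSI Core configuration names the concrete factory class,
        // e.g. "TorqueFactory" or "CondorFactory" ("drmaa.factory.class" is an assumed key).
        String factoryClass = tsiConfig.getProperty("drmaa.factory.class");
        // Late binding via Java introspection: instantiate the vendor-specific SessionFactory.
        SessionFactory factory = (SessionFactory) Class.forName(factoryClass).newInstance();
        session = factory.getSession();
        session.init(null); // default contact string
    }

    /** Map #TSI EXECUTESCRIPT to a DRMAA submission; USPACE becomes the DRMAA working directory. */
    public String executeScript(String scriptPath, String uspaceDir) throws DrmaaException {
        JobTemplate jt = session.createJobTemplate();
        jt.setRemoteCommand(scriptPath);
        jt.setWorkingDirectory(uspaceDir);
        String jobId = session.runJob(jt);
        session.deleteJobTemplate(jt);
        return jobId;
    }

    /** Map the different NJS-TSI job control commands to the single DRMAA control() operation. */
    public void control(String tsiCommand, String jobId) throws DrmaaException {
        if ("#TSI HOLDJOB".equals(tsiCommand)) {
            session.control(jobId, Session.HOLD);
        } else if ("#TSI RESUMEJOB".equals(tsiCommand)) {
            session.control(jobId, Session.RESUME);
        } else if ("#TSI ABORTJOB".equals(tsiCommand)) {
            session.control(jobId, Session.TERMINATE);
        } else {
            throw new IllegalArgumentException("unsupported command: " + tsiCommand);
        }
    }
}
```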
NJS-TSI Protocol        DRMAA
...                     ...
#TSI EXECUTESCRIPT      DRMAASession.runJob(jobTemplate), jobTemplate.category = EXECUTE
#TSI HOLDJOB            DRMAASession.control(jobname, DRMAASession.HOLD)
#TSI RESUMEJOB          DRMAASession.control(jobname, DRMAASession.RESUME)
#TSI ABORTJOB           DRMAASession.control(jobname, DRMAASession.TERMINATE)
#TSI PUTFILES           DRMAASession.runJob(jobTemplate), jobTemplate.setTransferFiles(mode),
                        mode.setOutputStream(USPACE), ...
...                     ...
Table 1. Example mappings at the TSI.
Unfortunately, DRMAA does not expose all of the functionality of an RMS or Grid system. The basic design idea of DRMAA is to keep the interfaces simple, address areas of agreement, and leave room for areas of disagreement between the different systems. One major disagreement concerns the file transfer techniques between the client and server tiers. In some cases the preferred transfer method is the GridFTP standard [14], which provides sophisticated performance; in other cases streaming from client to server is sufficient. This disagreement is the reason why DRMAA does not provide file transfer functionality through its standardized interfaces. There is only a setTransferFiles(mode) operation of a JobTemplate implementation, which allows a more detailed configuration of a DRMAA out-of-band transfer mechanism [17]. However, as the Session interface implementations do not offer such possibilities themselves, they can be enhanced with a more powerful transfer mechanism. This is useful when mapping the file transfer commands of UNICORE (#TSI GETFILECHUNK or #TSI PUTFILES) to a DRMAA out-of-band mechanism. For that reason, UNICORE can be configured to use alternative transfer mechanisms like GridFTP in conjunction with the DRMAA file transfer configuration, as shown in Table 1.
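As a sketch of such a mapping, the following hypothetical helper directs a job's standard streams into its USPACE directory using the FileTransferMode class of the DRMAA Java binding; UNICORE's own out-of-band mechanism (e.g. its default transfer or GridFTP) then moves the files between client and USPACE. The helper name, the uspaceDir parameter, and the staging policy are illustrative assumptions, not the actual UNICORE implementation.

```java
import org.ggf.drmaa.DrmaaException;
import org.ggf.drmaa.FileTransferMode;
import org.ggf.drmaa.JobTemplate;

/** Hypothetical helper: direct a job's standard streams into its USPACE directory. */
public final class UspaceStaging {

    public static void stageIntoUspace(JobTemplate jt, String uspaceDir) throws DrmaaException {
        // Ask the DRMAA implementation to treat the standard streams as files to be transferred.
        FileTransferMode mode = new FileTransferMode();
        mode.setInputStream(true);   // stage stdin to the execution host
        mode.setOutputStream(true);  // stage stdout back
        mode.setErrorStream(true);   // stage stderr back
        jt.setTransferFiles(mode);

        // Write the results into USPACE (DRMAA "[hostname]:path" syntax); UNICORE's own
        // out-of-band mechanism (default UPL transfer or GridFTP) then moves them to the client.
        jt.setOutputPath(":" + uspaceDir + "/stdout");
        jt.setErrorPath(":" + uspaceDir + "/stderr");
    }

    private UspaceStaging() {
    }
}
```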
4.2 Use case scenarios
Apart from the integration of any RMS or Grid system that implements DRMAA, there are several further use case scenarios for the TSI Framework. For instance, there are many issues surrounding the interoperability between UNICORE and Globus systems. Early prototypes for interoperability with Globus 2.4 were developed during the Grid Interoperability Project (GRIP) [24]. Desirable features of interoperability with the Globus Toolkit identified during the project include, in addition to secure interactions, the submission,
management, and monitoring of UNICORE jobs that are delegated to a Globus Grid Resource Allocation Manager (GRAM) [20]. The delegation of jobs to GRAM is realized by a Java-based Globus TSI that provides only prototype functionality. The TSI Framework presented here improves the maturity of the prototype Globus TSI by implementing the DRMAA interfaces for job submission and control on Globus systems. This is accomplished by mapping the UNICORE commands through the DRMAA interface to Globus-specific Resource Specification Language (RSL) commands. Furthermore, the handling of proxies [15] within the TSI Core also allows the creation of TSIs that are interoperable with other Grid systems that need such Globus-based proxies, such as LCG or gLite. In this sense, further interoperability can easily be achieved, for instance, by using the GridWay meta-scheduler [6], which controls and submits jobs by means of DRMAA. Another use case scenario is lightweight dynamic Grids, for example PC pools within universities. As UNICORE allows for the creation of more flexible and dynamic Grid infrastructures through the dynamic registration of NJSs at Gateways, such lightweight dynamic computational Grids become possible. Such computational Grids can be built from a number of PCs running, e.g., Windows. In particular, each PC has an NJS and a Java-based Nobatch TSI installed, optimized for Windows and using 'cmd /c' commands for execution. Both components are activated while the system is idle, for example by the Windows screen saver. As a result, the number of systems registered at the Gateway changes dynamically. The UNICORE Client is used to prepare and submit jobs to this dynamic infrastructure. A dedicated client plugin checks the available NJSs (those currently registered at the Gateway) and creates corresponding jobs. Since the UNICORE Client is able to query Gateways for the number of connected NJSs and the respective TSIs including their configuration, the jobs can be created automatically. In addition, the execution of the submitted jobs is tracked by the UNICORE Client, and in the case of single TSI failures, the affected part of the job is automatically submitted to another PC. The described computation model applies to trivially parallel tasks, for example similarity search or image rendering, but can also be used for other simulations performed in the master-slave model. Hence, this style of Grid can partially be compared to high-throughput computing solutions such as Condor, but in this case the UNICORE infrastructure is more lightweight, with less demand for static configuration. Another use case scenario is related to the integration of the TSI Framework into the new Web Services Resource Framework (WS-RF) [18] based UNICORE 6, currently being developed within the UniGrids [1] project. In this paper we are concerned with the execution of atomic tasks on particular resources, and we have proposed to use an abstract
layer (based on DRMAA) between a networked Grid component (the TSI) and the RMS of a particular resource. Such ease of deployment is equally desirable for the continuing development of UNICORE 6. Here, a Web-services-based interface for the management of abstract 'atomic' tasks on a computational resource has been defined. This notion of abstractness, and thus seamlessness, continues as a strong UNICORE theme. This interface, called the UNICORE Atomic Services (UAS), includes functionality for job management, transfers, imports/exports, and local file management. In the future it is likely that this interface will contribute to, and be aligned with, the work of the OGSA-BES [8] GGF working group. We consider the TSI Framework presented in this paper to be an ideal foundation for the development of a lightweight implementation of the UAS. DRMAA presents an interface to RMS with bindings to multiple languages; thus, by using DRMAA as an internal abstraction API, the internal implementation of the UAS gains a great deal of deployment flexibility, as it is isolated from the specifics of a particular RMS. Note that the UAS includes file management functionality (including imports/exports and transfers), which is executed directly on the target system, bypassing DRMAA and the RMS. The distribution and deployment of the UAS and the TSI implementation can easily be done using a common hosting environment, which can be attractive for certain deployments. Note that the introduction of a new protocol between the components does not introduce any interoperability concerns, as the NJS-TSI protocol is strictly an 'internal' UNICORE protocol.
5. Conclusion
In this paper we presented an extensible TSI Framework for the UNICORE technology that provides significant improvements to its current architecture in terms of portability, flexibility, and compliance with standards. In particular, the framework is based on Java and uses DRMAA interfaces to support a whole range of different resource management systems and other Grid systems. The Target System Interface Core implementation can easily be adapted to any resource management system that implements the GGF DRMAA interface, such as Torque. In addition, any Grid system that also provides a DRMAA implementation, such as Sun Grid Engine or Condor, can easily be integrated without reconfiguring UNICORE. Furthermore, the framework also provides a Nobatch DRMAA TSI for simple job execution and management on a stand-alone PC without using any resource management system.
References
[1] European UniGrids Project. http://www.unigrids.org.
[2] CONDOR High Throughput Computing. http://www.cs.wisc.edu/condor.
[3] DEISA - Distributed European Infrastructure for Supercomputing Applications. http://www.deisa.org.
[4] German Grid Initiative, D-Grid. http://www.d-grid.de.
[5] Global Grid Forum. http://www.gridforum.org/.
[6] GRIDWAY Meta Scheduler. http://www.gridway.org/.
[7] IBM LoadLeveler Resource Management. http://www-03.ibm.com/servers/eserver/clusters/software/loadleveler.html.
[8] OGSA Basic Execution Services WG. https://forge.gridforum.org/projects/ogsa-bes-wg/.
[9] OMII-Europe. http://www.omii-europe.com.
[10] Portable Batch System. http://www.openpbs.org.
[11] SUN Grid Engine. http://www.sun.com/software/gridware/.
[12] TORQUE Resource Manager. http://www.clusterresources.com/.
[13] UNICORE at SourceForge. http://unicore.sourceforge.net.
[14] W. Allcock. GridFTP: Protocol Extensions to FTP for the Grid. http://www.ggf.org/documents/GFD.20.pdf.
[15] J. Basney. MyProxy Protocol. http://www.ggf.org/documents/GFD.54.pdf.
[16] D. Breuer, D. Erwin, D. Mallmann, R. Menday, M. Romberg, V. Sander, B. Schuller, and P. Wieder. Scientific Computing with UNICORE. In Proc. of the NIC Symposium 2004, volume 20, pages 429-440. John von Neumann Institute for Computing, 2004.
[17] R. Brobst, W. Chan, F. Ferstl, J. Gardiner, J. P. Robarts, A. Haas, B. Nitzberg, H. Rajic, and J. Tollefsrud. Distributed Resource Management Application API Specification 1.0. http://www.ggf.org/documents/GFD.22.pdf.
[18] K. Czajkowski et al. The Web Services Resource Framework. http://www.oasis-open.org/committees/download.php/6796/ws-wsrf.pdf.
[19] D. Erwin, editor. UNICORE Plus Final Report. UNICORE Forum e.V., 2003.
[20] I. Foster. Globus Toolkit 4: Software for Service-Oriented Systems. In IFIP International Federation for Information Processing, LNCS 3779, pages 2-13, 2005.
[21] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series, 1995.
[22] M. Hovestadt, O. Kao, A. Keller, and A. Streit. Scheduling in HPC Resource Management Systems: Queuing vs. Planning. In Proc. of the 9th Workshop on Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pages 1-20. Springer, 2003.
[23] R. Menday, L. Kirtchakova, B. Schuller, and A. Streit. An API for Building New Clients for UNICORE. In Proc. of the 5th Cracow Grid Workshop (CGW'05), 2005, to appear.
[24] R. Menday and P. Wieder. GRIP: The Evolution of UNICORE towards a Service Oriented Grid. In Proc. of the 3rd Cracow Grid Workshop (CGW'03), pages 142-150, 2003.
[25] M. Romberg, editor. OpenMolGRID - Open Computing Grid for Molecular Science and Engineering, Final Report. John von Neumann Institute for Computing, 2005.
[26] A. Streit, D. Erwin, T. Lippert, D. Mallmann, R. Menday, M. Rambadt, M. Riedel, M. Romberg, B. Schuller, and P. Wieder. UNICORE - From Project Results to Production Grids. In L. Grandinetti, editor, Grid Computing and New Frontiers of High Performance Processing, pages 357-376. Elsevier, 2005.