CRO-GRID Grid Execution Management System

E. Imamagic, B. Radic, D. Dobrenic
University Computing Centre, University of Zagreb, Croatia
{emir.imamagic, branimir.radic, dobrisa.dobrenic}@srce.hr

Abstract. Computational grids are a platform for demanding scientific and engineering applications. Such applications usually need a complex mapping of tasks to resources. Scheduling in a dynamic and fault-prone grid environment is a very difficult procedure, and a universal solution has not been designed yet. However, a scheduling system is necessary to enable users to easily submit and manage their jobs without detailed knowledge of the underlying grid architecture. In this paper, we give an overview of existing grid scheduling systems. Furthermore, we propose our own system with the following functionalities: support for several job description languages, support for user-defined scheduling algorithms, extensive use of advance reservations and storage of detailed records of users' jobs.
Keywords. Grid, scheduling, execution management
1. Introduction

A computational grid is a distributed platform for demanding applications. One of the most important aspects of a grid is the execution of users' jobs. Standards in this area have reached a mature phase and various implementations exist today.

Grid scheduling (also called superscheduling, metascheduling or grid brokering) is defined as the process of scheduling jobs where resources are distributed over multiple administrative domains [13]. An execution management system (also called a resource management system) is a system whose main objectives are to use the grid monitoring system to discover and evaluate resources, to perform job scheduling and to manage task execution. Often, rescheduling and job migration are needed because of resource failure or incomplete and incorrect resource information.

The majority of computational grids are comprised of clusters. Clusters are managed by local job management systems (JMS), also called local schedulers. In this case, resources can be considered relatively reliable, especially if the JMS supports advance reservations of resources. Furthermore, a cluster JMS can provide a variety of data useful for the grid scheduling process. An important issue that we see is that there is no grid monitoring and discovery system that can provide the cluster JMS's information in depth. Another issue is that execution management systems rarely use JMSs' advance reservations.

There are many differences between grid scheduling and classical cluster batch scheduling. In a grid, the scheduler does not have full control over resources. Furthermore, information about resources is often unreliable and stale, because resources in a grid environment can be volatile and fault-prone. A cluster scheduler deals with a static environment where information is usually valid and where the scheduler has full control over resources.

There are also differences in terms of utilization. Cluster schedulers are usually system centric, meaning that they optimize utilization of the system. In a grid environment, it is hard to optimize utilization of the whole system because of different local policies on individual resources. Therefore, grid schedulers are usually application centric, meaning that they optimize application execution.

This paper is organized as follows. In section 2, we describe requirements for an execution management system and briefly describe existing solutions. In section 3, we define the term job description language and list existing solutions. Grid standards related to execution management are listed in section 4. We describe our grid environment in section 5. In section 6, we describe our proposed system in detail. Conclusions are given in section 7 and future directions in section 8.
2. Grid execution management systems

The basic requirements that an execution management system has to fulfill are described below.

• The execution management system should support existing grid middleware systems for information gathering (e.g. MDS, Ganglia, NWS), job execution (e.g. Globus GRAM [8], UNICORE [14]) and data movement (e.g. GridFTP, UNICORE file staging). The ability to extend support to custom middleware systems is an advantage.
• The execution management system should make use of the JMSs' advance reservation mechanism. By using advance reservations, the grid execution system enables users to run their applications in an explicitly defined timeframe. This functionality is important for interactive applications, where the user needs to get a response in real time. In addition, by using advance reservations, parallel applications distributed over multiple clusters can be guaranteed synchronous startup of processes on all clusters.
• Support for various job types should be implemented. The basic types are parallel and serial jobs, job arrays and workflows.
• Users should be able to define custom scheduling algorithms. Besides support for custom algorithm development, implementations of the most common algorithms should be provided.
• The execution management system should take into account data location and movement (data-aware scheduling) and the characteristics of network links. An example of data-aware scheduling is mapping tasks to resources closer to the data instead of moving large data over the network.
• System scalability is very important due to the nature of a grid system. For example, the execution management system should be implemented using a hierarchical or peer-to-peer model.
• Standard functionalities of job management systems should be provided: checkpointing, preemption, job migration, fault tolerance and rescheduling.

In the rest of the section, we briefly describe existing execution management solutions.
2.1. Condor

Condor [4] offers four solutions for execution management: Condor-G, Condor-G Matchmaking, Condor flocking and Condor Glide-in. Condor-G allows users to use standard Condor tools to submit jobs to Globus resources. Condor-G Matchmaking extends Condor-G with the ability to use the matchmaking algorithm to schedule jobs to Globus resources. The last two solutions perform grid scheduling based on the Condor matchmaking mechanism.

An advantage of Condor is the ClassAds language, which allows users to define custom attributes for resources and jobs. In the current version, it is not possible to use a custom scheduling algorithm. The developers have announced that this option will be supported in future versions. Another issue is that, in the case of Condor-G Matchmaking, the grid administrator needs to develop a program for resource monitoring and advertisement. Furthermore, there is no implicit data-aware scheduling, although users can explicitly specify that resources closer to the input data are preferred.
2.2. Nimrod/G and Gridbus Broker

Nimrod/G [3] is an execution management system designed specifically for Parameter Sweep Applications (PSAs). A PSA consists of a set of independent or loosely coupled tasks. Nimrod/G uses economic-model scheduling [3]. It allows users to define job arrays and certain limits (such as a time deadline and a cost limit). Based on the user's time and cost requirements, Nimrod/G makes scheduling decisions so that both requirements are met. Furthermore, Nimrod/G provides a graphical interface and a web portal for execution steering.

Gridbus [9] is a project which aims to design and implement service-oriented grid middleware. Gridbus Broker [15] is one of the Gridbus projects; it extends Nimrod/G with support for data scheduling and the UNICORE grid middleware.

The drawbacks of these two systems are the lack of use of JMSs' advance reservations, the focus on a specific type of application and the inability to define a custom scheduling algorithm.
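To make the idea of scheduling under a time deadline and a cost limit concrete, the following is a minimal sketch, not Nimrod/G's actual algorithm. It assumes a simplified model in which each resource runs tasks one after another and charges a fixed price per task; the names `Resource`, `seconds_per_task`, `cost_per_task` and `plan_sweep` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    seconds_per_task: float  # assumed: time to run one sweep task
    cost_per_task: float     # assumed: price charged per task


def plan_sweep(resources, n_tasks, deadline, budget):
    """Fill the cheapest resources first; return (plan, cost) or None.

    A resource can take only as many tasks as it can finish, one after
    another, before the deadline. None means that the deadline and the
    budget cannot both be met under this simple greedy strategy.
    """
    plan = {}
    remaining = n_tasks
    total_cost = 0.0
    for r in sorted(resources, key=lambda r: r.cost_per_task):
        if remaining == 0:
            break
        capacity = int(deadline // r.seconds_per_task)
        take = min(capacity, remaining)
        if take > 0:
            plan[r.name] = take
            total_cost += take * r.cost_per_task
            remaining -= take
    if remaining > 0 or total_cost > budget:
        return None
    return plan, total_cost


resources = [Resource("cheap", 60, 1.0), Resource("fast", 10, 5.0)]
print(plan_sweep(resources, n_tasks=10, deadline=300, budget=40))
```

With a tighter budget or deadline the function returns None, which corresponds to the case where the user's limits cannot be satisfied and would have to be renegotiated.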
2.3. AppLeS Parameter Sweep Template

AppLeS Parameter Sweep Template (APST) [2] is another execution management system for PSAs, which uses different scheduling algorithms than Nimrod/G. APST provides several heuristic scheduling algorithms, such as max-min, min-min and sufferage [2]. Another feature of APST is support for various grid information systems, execution protocols and file management protocols. APST also takes advantage of the resource behavior prediction provided by the Network Weather Service. Unfortunately, APST doesn't utilize JMSs' advance reservations, has no support for parallel jobs or workflows, doesn't allow users to define their own scheduling algorithms and has limited support for data-aware scheduling.
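For readers unfamiliar with the heuristics named above, the following sketch shows the textbook form of min-min, not APST's implementation. It assumes that an estimated run time is known for every task on every machine; the matrix `etc` and the function name `min_min` are illustrative.

```python
def min_min(etc):
    """Textbook min-min list scheduling.

    etc[t][m] is the estimated run time of task t on machine m.
    At each step the task with the smallest achievable completion time
    is assigned to the machine that achieves it.
    Returns (schedule, makespan), where schedule is a list of
    (task, machine, completion_time) in assignment order.
    """
    n_machines = len(etc[0])
    ready = [0.0] * n_machines          # time at which each machine is free
    unscheduled = set(range(len(etc)))
    schedule = []
    while unscheduled:
        best = None
        for t in sorted(unscheduled):
            for m in range(n_machines):
                completion = ready[m] + etc[t][m]
                if best is None or completion < best[0]:
                    best = (completion, t, m)
        completion, t, m = best
        ready[m] = completion
        unscheduled.remove(t)
        schedule.append((t, m, completion))
    return schedule, max(ready)


etc = [[4, 6], [3, 5], [8, 2]]
print(min_min(etc))
```

Max-min differs only in the selection step: among the per-task minimum completion times it picks the largest one, so that long tasks are placed first.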
2.4. GridWay

GridWay [10] is an execution management system with the following functionalities: fault tolerance, rescheduling and support for custom scheduling algorithms. However, GridWay doesn't use JMSs' advance reservations, provides only a round-robin algorithm implementation and doesn't support data-aware scheduling.
2.5. CSF

CSF [12] is Platform's execution management system that implements the OGSI-Agreement protocol. Due to the radical change in standards from OGSI to WSRF, CSF is now being adapted to the new WS-Agreement protocol. The advantages of CSF are the use of JMSs' advance reservations, the possibility to define a custom scheduling algorithm and multiple job queues. The drawbacks are a limitation to Globus versions above 3, the lack of data-aware scheduling and the fact that there is no functional version yet.
2.6. Moab/Silver

Moab grid scheduler (Silver) [11] is a commercial solution based on the Maui cluster scheduler. Silver offers all the advantages of the Maui scheduler: advance reservations, support for serial and parallel jobs, and scalability. Silver has recently added support for data-aware scheduling and for decisions based on network characteristics. The drawbacks of Silver are that it has no support for custom algorithms and that advance reservations are supported only for clusters that use Maui.
3. Job description language

A job description language is a language for describing the requirements of computational jobs for submission to grid and other job management systems (such as a cluster JMS) [1]. Various languages exist today. All major execution management systems have developed their own job description language and some of them allow description of resources as well.

Condor ClassAds (Classified Ads) is the language used in the Condor system. Besides describing job requirements, ClassAds is used for describing resources. It is extensible and widely used.

The Globus Resource Specification Language (RSL) is the language used in the Globus Toolkit. There are two versions of RSL: RSL v1.0, used in pre-WS implementations, and the XML-based RSL v2.0, used in WS implementations.

Cluster JMSs also provide simple job description languages. Examples are the languages of the Portable Batch System (PBS) and Sun Grid Engine (SGE). These languages are extremely simple and allow the user to describe basic job properties such as name, input and output files, number of processors, etc.

The grid execution management systems described above provide their own languages. Nimrod/G and Gridbus Broker provide a complex way of describing job requirements by allowing users to create simple programs. APST uses an XML-based language. GridWay uses a simple language similar to the cluster JMSs' languages.
4. Grid standards

The Global Grid Forum (GGF) [7] is an organization whose main objective is the development of grid standards and protocols. Standards are extremely important for the grid, because they enable interaction between heterogeneous resources, services and solutions. GGF is divided into seven areas, which consist of numerous working and research groups. The most important area for grid scheduling is Scheduling and Resource Management (SRM). Here we point to two standards which are important for execution management.

Job Submission Description Language (JSDL) [1] is a job description language. JSDL is being developed by GGF's JSDL working group. It is an XML-based language and it is intended to become the standard language for grid middleware systems. Currently, JSDL is under development.

The WS-Agreement grid standard is being developed by GGF's Grid Resource Allocation Agreement Protocol (GRAAP) working group. The WS-Agreement protocol enables services to negotiate a Service Level Agreement (SLA). An SLA is an agreement between a resource provider and a user, which defines the quality of service that the resource provider agrees to provide to the user. Quality of service covers the level of security, performance and quantity of resources (number of processors, hard disk capacity, etc.). The WS-Agreement standard is still under development.
5. CRO-GRID grid environment

CRO-GRID [5] is a Croatian national initiative which consists of three projects and aims to build a computational grid for the needs of science and enterprise. Three scientific applications are being developed for this grid within the CRO-GRID Application project. The CRO-GRID Infrastructure project [6] is focused on building and maintaining the grid infrastructure. The third project, CRO-GRID Mediator, is developing service-oriented grid middleware based on the recent WSRF specification.
Figure 1. CRO-GRID Infrastructure network
At this moment, the CRO-GRID infrastructure consists of five clusters placed in institutes in four cities. The network used for the CRO-GRID infrastructure is shown in Fig. 1. In cooperation with the Giga CARNet project, all links (except the link to the ETFOS cluster) have gigabit bandwidth.

Table 1. CRO-GRID clusters

Cluster   Local JMS
SRCE      Sun Grid Engine
IRB       Torque & Maui
ETFOS     Torque & Maui
FESB      Torque & Maui
GRADRI    Sun Grid Engine
We use Globus version 3.2 pre-WS services and UNICORE as the basic grid middleware. The local JMSs used on the clusters are shown in Table 1. All these JMSs support advance reservations. However, none of the systems described in section 2 fully utilizes the advance reservation feature. The closest solution is Silver, which uses Maui's ability to create advance reservations. However, Silver does not provide support for SGE's advance reservation mechanism.

Since the grid will be used for running batch jobs and, in the development phase, short test jobs, the grid scheduler should be able to execute test jobs interactively. In order to execute jobs interactively, the scheduler has to be able to preempt running jobs.

We expect that the scheduler will need to handle various types of jobs: serial jobs, parallel jobs and parallel jobs distributed over multiple clusters. Furthermore, most of the users are accustomed to defining job requirements in the cluster JMSs' languages. Therefore, tools for translating those languages into some flavor of grid job description language are necessary.
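As an illustration of what such a translation tool might do, the sketch below converts a small subset of PBS directives into an RSL v1.0 string. It is not the converter developed in the project. The mapping is an assumption made for this example: only `-o`, `-e`, `-l nodes=` and `-l walltime=` are handled, the node count is mapped to the RSL `count` attribute, and the executable has to be supplied separately because a PBS script does not name a single executable. The function name `pbs_to_rsl` is hypothetical.

```python
def pbs_to_rsl(script_text, executable):
    """Translate a small subset of PBS directives into an RSL v1.0 string."""
    attrs = {"executable": executable}
    for line in script_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "#PBS":
            continue  # ordinary shell line or comment, not a directive
        # Directive tokens come in flag/value pairs, e.g. ["-l", "nodes=4"].
        for flag, value in zip(tokens[1::2], tokens[2::2]):
            if flag == "-o":
                attrs["stdout"] = value
            elif flag == "-e":
                attrs["stderr"] = value
            elif flag == "-l":
                for item in value.split(","):
                    key, _, val = item.partition("=")
                    if key == "nodes":
                        # Assumption: node count maps to RSL "count";
                        # any ":ppn=..." suffix is ignored.
                        attrs["count"] = val.split(":")[0]
                    elif key == "walltime":
                        h, m, s = (int(x) for x in val.split(":"))
                        # maxWallTime is given in minutes; round seconds up.
                        attrs["maxWallTime"] = str(h * 60 + m + (1 if s else 0))
    # Unhandled directives (such as -N) are silently dropped in this sketch.
    return "&" + "".join(f"({k}={v})" for k, v in attrs.items())


script = """#!/bin/sh
#PBS -N test
#PBS -l nodes=4,walltime=01:30:00
#PBS -o out.txt
#PBS -e err.txt
./a.out
"""
print(pbs_to_rsl(script, "/home/user/a.out"))
```

A real converter would also have to decide what to do with directives that have no counterpart in the target language, which this sketch simply ignores.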
6. CRO-GRID Grid Execution Management
After a thorough analysis of existing execution management systems, we decided to design a new system, CRO-GRID Grid Execution Management (GEM), which will provide additional features to our users. Our primary idea is to utilize existing systems and extend them with additional features. These additional features will implement the user demands identified in the previous section. We believe that these requirements reflect general needs placed on grid execution management and that these extensions can be useful to other grid systems.

The main features of the proposed system are the following:
• Provide a simple command line interface, which enables users to submit, manage and monitor their jobs.
• Support several job description languages, including the GGF JSDL standard in its present form and the cluster JMSs' languages.
• Provide a thin interface to other execution management systems (such as Condor-G and APST) and allow users to design custom scheduling algorithms.
• Implement extensive gathering of cluster JMS information.
• Make extensive use of the JMSs' advance reservation mechanism.
• Build and maintain a database of job records.
• Enable preemptive scheduling for short test jobs.

The proposed system is currently in an early stage of development. We have identified the set of existing systems that will be utilized. In addition, we have defined the set of job description languages which will be supported, and translation tools are in the development stage. Details about our components and the current state of development are described in the following subsections.
6.1. Proposed architecture

The architecture of the system is shown in Fig. 2. Components colored in gray are parts of the proposed CRO-GRID GEM system. The Scheduler component can be either external or internal. A grid access machine is any computer used for accessing the grid (e.g. an end user's computer or a grid portal).

Figure 2. CRO-GRID GEM architecture

Manager is the central component responsible for the management of users' jobs. It keeps a list of all active jobs and coordinates between the Scheduler and the grid middleware system. Manager uses the grid information system to get information about available grid resources and provides it to the Scheduler. Once the Scheduler decides which jobs will be executed on which resources, Manager uses the underlying grid middleware to start job execution. Manager also monitors job execution, allows users to manage their jobs and maintains the jobs' records database (JRDB).

Reservation Manager is installed on grid resources. The goal of Reservation Manager is to discover which reservations are available on grid resources and publish that information to the grid monitoring system (1). Furthermore, Reservation Manager is responsible for creating reservations by using the local JMS's API (2).

Description Language Converter translates the user's job description language to the language used by the Scheduler and the grid middleware system. Before starting the whole system, the user has to configure which Scheduler, grid middleware system and description language are used.

As mentioned above, the Scheduler can be either an internal or an external component. We are planning to provide support for the existing execution management solutions described in section 2 and to enable users to define their own resource selection algorithms.

6.2. Description language conversion

Users are usually familiar with the cluster JMSs' job description languages. Instead of forcing them to learn new languages, we propose a component which converts the user's selected language to the language of the underlying grid middleware. In the version that is currently being developed, we plan to support Condor ClassAds, Globus RSL v1.0 and v2.0, the PBS and SGE languages and GGF's JSDL.

6.3. Advance reservations management

Since all clusters within the CRO-GRID environment support advance reservations, we decided to extend the existing grid middleware with Reservation Manager. Reservation Manager has two main goals: discovery of active and available reservations and creation of new reservations.

As our grid environment will be used for application development, users will be submitting short test jobs. These jobs usually have to be executed quickly and don't require reservations. In this case, we are planning to utilize the backfilling mechanism where that option is supported. For example, Maui allows shorter jobs to be executed on nodes that are reserved for later use. In addition, resources that don't support reservations can be used for such jobs.

This component is in an early stage of development and is implemented as a simple Perl script. In a future version, we are planning to make a WSRF-compliant implementation.
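The backfilling idea for short test jobs can be sketched as an interval check: a job may start now on a node if it finishes before the node's next reservation begins. This is only an illustration of the principle, not Maui's backfill algorithm and not the Reservation Manager implementation; the function names and the representation of reservations as (start, end) pairs are assumptions.

```python
def can_backfill(now, walltime, reservations):
    """True if a job started at `now` and running for `walltime` does not
    overlap any (start, end) reservation on the node."""
    finish = now + walltime
    return all(finish <= start or now >= end for start, end in reservations)


def pick_node(now, walltime, nodes):
    """Return the first node (by name) on which the job fits, else None.

    nodes maps a node name to its list of (start, end) reservations.
    """
    for name in sorted(nodes):
        if can_backfill(now, walltime, nodes[name]):
            return name
    return None


nodes = {"n1": [(100, 200)], "n2": [(500, 900)]}
print(pick_node(0, 50, nodes))    # short job fits before n1's reservation
print(pick_node(0, 300, nodes))   # longer job only fits on n2
```

The check depends on the requested walltime being realistic: an overestimated walltime makes a job that would in fact fit appear not to.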
6.4. Scheduling

The Scheduler is the most important component and can be an external or a user-designed execution management system. This plugin-based architecture will allow end users to use existing commodity systems and advanced users to develop algorithms which will perform better scheduling of their applications.

The Scheduler will receive a rich set of information about available resources. We plan to utilize various grid monitoring systems in order to provide as much information as possible. The basic set of information consists of various cluster information (number of nodes, architecture, free memory, etc.), available reservations and network information. In the future, we will extend this set with authorization and data location information.

In the current version, we use the Condor-G, APST and GridWay systems, as we find them the most mature. In future versions, we plan to experiment with Gridbus Broker and CSF. The API for developing a custom plugin scheduler is still in development.
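Since the plugin API is not yet defined, the following is only a hypothetical sketch of what a scheduler plugin interface could look like. None of the names (`ClusterInfo`, `SchedulerPlugin`, `MostFreeNodes`) come from the CRO-GRID GEM system. The cluster attributes mirror the basic information set listed above, and the example policy is a deliberately simple one chosen for illustration.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class ClusterInfo:
    """Hypothetical record of the information passed to a scheduler."""
    name: str
    free_nodes: int
    architecture: str
    free_memory_mb: int
    reservations: list = field(default_factory=list)  # (start, end) pairs


class SchedulerPlugin(ABC):
    """Hypothetical plugin interface: map jobs to clusters."""

    @abstractmethod
    def schedule(self, jobs, clusters):
        """jobs: list of (job_id, nodes_needed); returns {job_id: cluster name}."""


class MostFreeNodes(SchedulerPlugin):
    """Example policy: send each job to the cluster with the most free nodes."""

    def schedule(self, jobs, clusters):
        free = {c.name: c.free_nodes for c in clusters}
        mapping = {}
        for job_id, nodes_needed in jobs:
            name = max(sorted(free), key=lambda n: free[n])
            if free[name] < nodes_needed:
                continue  # no cluster can host the job now; leave it queued
            mapping[job_id] = name
            free[name] -= nodes_needed
        return mapping


clusters = [ClusterInfo("A", 8, "x86", 4096), ClusterInfo("B", 4, "x86", 2048)]
jobs = [("j1", 6), ("j2", 4), ("j3", 4)]
print(MostFreeNodes().schedule(jobs, clusters))
```

Keeping the interface to a single method that receives jobs and resource information would let an external system and a user-written algorithm be swapped without changes elsewhere, which is the property the plugin-based architecture aims for.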
6.5. Jobs' records database

Advance reservations are a very powerful mechanism that can assure a certain level of quality of service to users. However, the efficiency of a scheduling algorithm that uses advance reservations depends heavily on the quality of the job descriptions. For example, if a user states that a job needs more time than it really needs, the scheduler might not find available resources, or it might assume that the resources allocated to the job will be unavailable for longer than they actually are, which consequently causes other jobs to be scheduled for a later time.

In order to help users describe their jobs, we propose an extension to execution management. We propose that Manager stores information about users' jobs in a permanent database, the JRDB. Data from the JRDB will help users describe their future jobs. Furthermore, the JRDB can be used for application performance analysis, accounting and optimization of the scheduling process.
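One way the stored records could help with job description is sketched below, using SQLite from the Python standard library. The schema, the function names and the suggestion rule (longest observed run time plus a safety margin) are assumptions made for this example and do not describe the actual JRDB design.

```python
import sqlite3


def open_jrdb(path=":memory:"):
    """Open (or create) a minimal job records database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "owner TEXT, application TEXT, requested_s INTEGER, actual_s INTEGER)"
    )
    return db


def record_job(db, owner, application, requested_s, actual_s):
    """Store the requested and the actually used run time of a finished job."""
    db.execute(
        "INSERT INTO jobs VALUES (?, ?, ?, ?)",
        (owner, application, requested_s, actual_s),
    )
    db.commit()


def suggest_walltime(db, owner, application, margin_percent=20):
    """Suggest a walltime: longest past run plus a margin, or None if unknown."""
    row = db.execute(
        "SELECT MAX(actual_s) FROM jobs WHERE owner = ? AND application = ?",
        (owner, application),
    ).fetchone()
    if row[0] is None:
        return None
    return row[0] * (100 + margin_percent) // 100


db = open_jrdb()
record_job(db, "alice", "sim", requested_s=3600, actual_s=500)
record_job(db, "alice", "sim", requested_s=3600, actual_s=600)
print(suggest_walltime(db, "alice", "sim"))
```

In this example the user requested one hour for jobs that never ran longer than ten minutes, which is the kind of overestimate that the records would make visible.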
7. Conclusion

Seamless job execution management is an extremely important capability of a computational grid. As we see it, today's systems are not mature enough for end users to use them heavily. Other issues with existing systems are limited grid middleware support, the introduction of new job description languages, limited use of information and mechanisms from cluster JMSs, and support for only specific types of applications.

In this article, we propose an execution management system, CRO-GRID GEM, which allows easier job execution. Motivated by the needs of CRO-GRID users, we focused on the following issues: support for various job description languages, enabling users to define custom scheduling algorithms, extensive use of advance reservations, providing a jobs' records database and enabling preemptive job scheduling.
8. Future Work

In the future, we are planning to extend the system in the following directions:
• Designing a simple graphical user interface, a web portal and a web portlet, which will enable integration of the job submission service into portlet-based grid portals.
• Extending Description Language Converter with more job description languages if needed. Upcoming updates to GGF's JSDL are a priority.
• Evolving the Reservation Manager component to support GRAAP's WS-Agreement standard. We also plan to implement a WSRF-compliant version of this component.
• Integrating other GGF standards related to execution management and scheduling. The priority is applying GGF's Usage Records standard format to the data stored in the JRDB.
• Linking CRO-GRID GEM with a virtual organization management system. CRO-GRID GEM will use virtual organization management to discover users' rights on individual resources (authorization filtering). This method will optimize the scheduling process.
• Linking execution management with a data management system, which will enable data-aware scheduling.

At this point of development, we are focused on implementing the defined set of functionalities needed by our users. Once we finish this stage, we will concentrate on benchmarking the performance of the whole system. We are especially interested in the benefits gained by extensive use of advance reservations.
9. Acknowledgements

All given results were achieved through the work on the TEST program (Technological research-development projects) with the support of the Ministry of Science, Education and Sports.
10. References

[1] Anjomshoaa A, Brisard F, Ly A, McGough S, Pulsipher D, Savva A. Job Submission Description Language (JSDL) Specification, version 0.9.2. Global Grid Forum; 2005. [02/28/2005]
[2] Berman F, Wolski R, Casanova H, Cirne W, Dail H, Faerman M, Figueira S, Hayes J, Obertelli G, Schopf J, Shao G, Smallen S, Spring N, Su A, Zagorodnov D. Adaptive Computing on the Grid Using AppLeS. IEEE Transactions on Parallel and Distributed Systems, Volume 14, Number 5. May 2003.
[3] Buyya R, Abramson D, Giddy J. Nimrod-G Resource Broker for Service-Oriented Grid Computing. IEEE Distributed Systems Online, Volume 2, Number 7. November 2001.
[4] The Condor Project Homepage. http://www.cs.wisc.edu/condor [02/28/2005]
[5] CRO-GRID Homepage. www.cro-grid.hr [02/28/2005]
[6] CRO-GRID Infrastructure Project. www.srce.hr/crogrid/infrastructure/ [02/28/2005]
[7] Global Grid Forum. http://www.ggf.org [02/28/2005]
[8] Globus Alliance. http://www.globus.org [02/28/2005]
[9] The Gridbus Project. http://www.gridbus.org [02/28/2005]
[10] GridWay. http://www.gridway.org/ [02/28/2005]
[11] Moab Grid Scheduler. http://www.clusterresources.com/products/mgs/ [02/28/2005]
[12] Open source metascheduling for Virtual Organizations with Community Scheduler Framework (CSF). Technical whitepaper; 2004. http://sourceforge.net/projects/gcsf/ [02/28/2005]
[13] Schopf J M. A General Architecture for Scheduling on the Grid; 2002. http://www-unix.mcs.anl.gov/~schopf/Pubs/sched.arch.2002.pdf [02/28/2005]
[14] UNICORE. http://unicore.sourceforge.net/ [05/05/2005]
[15] Venugopal S, Buyya R, Winton L. A Grid Service Broker for Scheduling Distributed Data-Oriented Applications on Global Grids. In: Proceedings of the 2nd International Workshop on Middleware for Grid Computing; 2004 October 18; Toronto, Canada. ACM Press, 2004, USA.