General Running Service: An Execution Framework for Executing Legacy Program on Grid *

Likun Liu 1, Yongwei Wu 1, Guangwen Yang 1, Ruyue Ma 2, Feng He 1
1 Department of Computer Science and Technology, Tsinghua National Laboratory for Information Science and Technology, Tsinghua University, Beijing 100084, China
2 School of Computer Science and Technology, Shandong University, China
[email protected], {wuyw,ygw}@tsinghua.edu.cn, {maruyue,hefeng}@chinagrid.edu.cn

Abstract

A legacy program execution system, which the Grid needs in order to let users reuse legacy programs without reengineering their source code, is still an open problem. We propose the General Running Service (GRS), a legacy program execution framework that addresses two of the most important problems: shielding the heterogeneity of all kinds of target systems, and exactly configuring the complex execution environments required by legacy programs. In this paper, we first abstract a task model for grid tasks as the core of our execution framework; we then present the main concepts of GRS, including application deployment, JSDL job submission, and the target system adapter. The details of the GRS implementation in the ChinaGrid Support Platform (CGSP) are then provided, and the performance of GRS is evaluated by comparison with the GRAM of GT4.

1. Introduction

The Computational Grid is a platform for demanding scientific and engineering applications. From the execution management perspective, the Grid, much like the Internet is a network of networks, is a job manager of all kinds of heterogeneous native job managers. It provides the ability to execute jobs remotely and transparently, shields the heterogeneity

* This work is supported by the ChinaGrid project of the Ministry of Education of China, the Natural Science Foundation of China (60373004, 60373005, 90412006, 90412011, 60573110, 90612016), and the National Key Basic Research Project of China (2004CB318000, 2003CB316907).

of different target systems, and provides a uniform, user-friendly interface to access computing resources. A legacy program execution service is necessary for the Grid to enable users to reuse legacy programs in a grid environment without reengineering source code. Furthermore, by providing a service-based job execution interface for a legacy program, the program can be invoked or composed into a workflow as an activity, achieving cooperation between programs and integration of computing resources. Two problems of legacy program execution in a grid environment must be addressed: how to exactly configure the complex execution environment needed by a legacy program, and how to shield the heterogeneity of all kinds of target systems while providing a transparent and uniform execution interface. Each grid provides its own solution to both problems: Globus uses GRAM, UNICORE uses its own service, and our approach is GRS. The goal and motivation of GRS is to virtualize computing resources, provide a uniform submission interface for legacy program tasks, and execute legacy programs in the grid environment without any modification. A grid task model is proposed as an abstraction for grid tasks; furthermore, we provide a uniform job submission interface by use of JSDL, shield the heterogeneity of target systems using a target system adapter, and define the execution environment required by a program through an Application Description File (ADF). Together these concepts constitute our GRS execution framework. This paper is organized as follows: Section 2 illustrates our grid task model. The concepts of GRS are described in Section 3. In Section 4, we give more details about how we implement our execution

framework in the ChinaGrid Support Platform (CGSP) [2][3]. Performance evaluation is presented in Section 5. Related work is discussed in Section 6, and we provide our conclusions and future work in Section 7.

2. Grid task abstraction

The core issue of execution management is the notion of job and task. In a Grid environment, a task is an atomic unit of Grid execution with specific data input from, and output to, the Grid; a job is a computational unit that may consist of more than one task. Different local job management systems provide different interfaces for submitting, monitoring, and controlling tasks, so some degree of abstraction is needed to provide a unified way to submit, monitor, and control the execution of different tasks. The goal of our task model is to support a single computing operation with specific data staging in a Grid environment. It need not support complex tasks consisting of many actions, nor conditional execution of actions depending on the states of other actions; that is the responsibility of the Workflow Manager above GRS. To fulfill this goal, we define the minimal set of related abstractions that a task must contain:

- ID: a unique identifier that distinguishes the task among all tasks.
- User credential: specifies which user created and owns the task.
- Status: one of submitted, started, downloading data, running, uploading data, completed, aborted, failed, and error. The status transition diagram for a task is shown in Figure 1.

- Control handler: control handles are used to control the task. Four fundamental handles are required: start, abort, suspend, and resume.

A task execution is defined as a strict order of three phases: data staging in, native computation, and data staging out. Either staging phase can be omitted if it is not required. We do not attempt to perform the data staging and the computation of the same task simultaneously. Each phase can have one or more native processes or threads doing real work concurrently, but no process or thread may span two phases. The task phases are shown in Figure 2.

Figure 2. Task phases
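The minimal task abstraction and the strict three-phase ordering above can be sketched as follows. This is an illustrative model, not CGSP code; all class, method, and status names are our own rendering of the abstractions in the text, and we assume operations within a phase may run as parallel threads.

```python
# Sketch (assumed names): a task with an ID, owner credential, status, and
# a strict three-phase execution -- stage in, native computation, stage out.
from concurrent.futures import ThreadPoolExecutor
from enum import Enum

class Status(Enum):
    SUBMITTED = "submitted"
    STARTED = "started"
    DOWNLOADING = "downloading data"
    RUNNING = "running"
    UPLOADING = "uploading data"
    COMPLETED = "completed"
    ABORTED = "aborted"
    FAILED = "failed"
    ERROR = "error"

class Task:
    def __init__(self, task_id, credential, stage_ins, compute, stage_outs):
        self.id = task_id                  # unique among all tasks
        self.credential = credential       # user who created and owns the task
        self.stage_ins = stage_ins         # list of callables; may run in parallel
        self.compute = compute             # the single native computation
        self.stage_outs = stage_outs       # list of callables; may run in parallel
        self.status = Status.SUBMITTED

    def _run_phase(self, ops):
        # Operations inside one phase may run concurrently, but no
        # operation crosses a phase boundary: we wait for all of them.
        with ThreadPoolExecutor() as pool:
            for future in [pool.submit(op) for op in ops]:
                future.result()

    def start(self):
        self.status = Status.STARTED
        try:
            self.status = Status.DOWNLOADING
            self._run_phase(self.stage_ins)    # phase 1: data staging in
            self.status = Status.RUNNING
            self.compute()                     # phase 2: native computation
            self.status = Status.UPLOADING
            self._run_phase(self.stage_outs)   # phase 3: data staging out
            self.status = Status.COMPLETED
        except Exception:
            self.status = Status.FAILED
            raise
```

A staging phase with an empty operation list is simply skipped, matching the rule that staging in or out can be omitted when there is no requirement.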

3. Concepts of GRS

The aim of GRS is to reliably execute programs with complex environment requirements, shield all kinds of heterogeneous target systems, and provide a consistent view of computing resources for the grid. To achieve this aim, three concepts are proposed: application deployment, JSDL job submission, and the target system adapter. In this section, we describe them in detail.

3.1 Execution environment definition and application deployment

Figure 1. Transition diagram of task status


- Provider: the target system backend used for this task. It plays a key role in translating the abstract task into a protocol-specific task.

Some legacy programs need a complex or specific execution environment in order to execute correctly. Our approach to this problem is application deployment: such a program must be deployed as a GRS Application so that an exact execution environment can be defined. For simple programs with no specific environment requirements, deployment is not necessary. Beyond defining the execution environment, deployment brings other benefits:

- avoiding transferring the program before every execution;

- providing validity checking of the parameters before task execution;
- providing a friendly way to access the legacy program through a uniform interface.

To specify the execution environment required by the program, an Application Description File (ADF) is created during deployment; it tells the executing system how to execute the program within the exact execution environment the program needs.

Figure 3. An example of Application Description File

Figure 3 shows an example ADF file, which consists of five sections. The first section is a description. The parameters section describes the list of parameters; each one includes a name, type, default value, and command line format, which indicates the form the parameter takes on the command line. Restrictions on the parameters, used for input validation, are defined in the restriction section. Three kinds of restriction are defined: required, sequential, and choice. Required indicates that the parameter is mandatory; sequential indicates that the parameters enclosed by this element must appear in order on the command line; and choice indicates that only one of the parameters enclosed by this element may appear on the command line. The tsa section specifies the target system adapter to be used. The last section lists the environment variables that will be set before the program executes. There are three other issues to be addressed:

- Execution environment variables: some execution information is needed when the program is deployed or the task is submitted, but can only be determined at task execution time. Our solution is the execution environment variable, a variable through which the task obtains physical execution information that is only determined at execution time. Execution environment variables give users and administrators a virtual view of the program execution space that requires no knowledge of the physical execution environment. They are processed by the TargetSystemAdapter (see Section 3.3 for details) before the real task execution. Currently, four execution environment variables are defined and must be supported by all TargetSystemAdapters: ${cwd}, ${stdout}, ${stderr}, and ${deploy.dir}. The directory specified by ${deploy.dir} on the computing resource is the location into which the application is deployed. You can define your own execution environment variables as long as your TargetSystemAdapter can accept and process them properly.

- Pre- and post-execution scripts: each legacy program deployed as a GRS application may have pre- and post-execution scripts that run before and after the program executes. They let the user who deploys the application configure the execution environment more flexibly before execution and clean up after the program finishes. If the pre-execution script fails, the program is not executed. Whether the post-execution scripts run when their associated program fails depends on a flag set when the application is deployed.

- Dynamic deployment: GRS supports dynamically deploying and undeploying applications, enabling users to deploy and undeploy GRS applications without stopping the GRS service.
This capability is important for a real Grid system, where frequently shutting down and restarting the service is not acceptable, because some tasks may take several days to finish.
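Two of the ADF mechanisms described in this section can be made concrete with a small sketch: validating the three restriction kinds (required, sequential, choice) against a command line, and expanding the four execution environment variables into physical values before execution. The data layout and function names here are assumptions for illustration, not the CGSP implementation.

```python
# Hedged sketch of ADF processing. Assumed representation: a restriction is
# a (kind, parameter-name-list) pair, and the command line is a list of
# parameter names in order.
import re

def check_restrictions(restrictions, given):
    """Return True iff the parameter list satisfies every restriction."""
    for kind, params in restrictions:
        if kind == "required" and not all(p in given for p in params):
            return False            # a mandatory parameter is missing
        if kind == "sequential":
            # parameters enclosed by this element must appear in order
            pos = [given.index(p) for p in params if p in given]
            if pos != sorted(pos):
                return False
        if kind == "choice" and sum(p in given for p in params) > 1:
            return False            # at most one of these may appear
    return True

def expand(command, context):
    """Replace ${name} variables (cwd, stdout, stderr, deploy.dir, or
    user-defined) with physical values known only at execution time."""
    def repl(match):
        name = match.group(1)
        if name not in context:
            raise KeyError("unknown execution environment variable: " + name)
        return context[name]
    return re.sub(r"\$\{([^}]+)\}", repl, command)
```

In this sketch the TargetSystemAdapter would call `expand` just before launching the native task, e.g. turning `${deploy.dir}/cap3 > ${stdout}` into a concrete path on the chosen computing resource.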

3.2 JSDL job submission

Job submission, a fundamental function provided by any execution system, is not covered in our task model, since the task itself does not exist before submission. We support job submission through the TaskFactory service, an individual service independent of the TaskManager. The different target systems define their own job submission languages and provide different interfaces and utilities for job definition and submission, all of which

should be transparent to Grid end-users; our approach to this problem is JSDL. JSDL [4] is an XML-based language for describing the requirements of computational jobs for submission to resources, particularly in Grid environments (though not restricted to them) [4]. It is developed by the GGF's JSDL working group [5] and is becoming the standard language for Grid job submission. To enable a target system with no JSDL support to execute tasks described in JSDL without significant changes to the target system, a conversion must map the JSDL job submission to a native job submission the target system can accept. The conversion is performed at two levels: to an internal presentation at the GRS task manager, which also adds some information not available to the submitter, and to the target-specific presentation at the TargetSystemAdapter. Figure 4 shows a case study of the conversion between JSDL and GRS's internal presentation. The jsdl:Application is transformed into an exec operation, which uses a fork TSA to do the real execution;

all jsdl:DataStaging elements with jsdl:Source tags are transformed into a set of transfer operations in a parallel section before the exec operation, so that all staging-ins can execute in parallel. All jsdl:DataStaging elements with jsdl:Target tags are transformed in the same way, except that they execute after the exec operation. Additionally, some extra information not available to the submitter, such as cwd and id, is added.
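The first level of this conversion can be sketched as follows, assuming a simplified dictionary stand-in for a parsed JSDL document. The shape of GRS's internal presentation is a guess for illustration; only the ordering rules (parallel staging-ins, then exec, then parallel staging-outs, plus the added cwd and id) come from the text.

```python
# Hypothetical sketch: map a parsed JSDL job to an ordered internal plan.
# 'jsdl' is assumed to hold an 'application' command and a list of
# 'stagings', each with a local 'file' plus either a 'source' or a 'target'.
import uuid

def jsdl_to_internal(jsdl, cwd):
    stage_in = [("transfer", s["source"], s["file"])
                for s in jsdl["stagings"] if "source" in s]
    stage_out = [("transfer", s["file"], s["target"])
                 for s in jsdl["stagings"] if "target" in s]
    return {
        # extra information the submitter does not provide:
        "id": str(uuid.uuid4()),
        "cwd": cwd,
        # strict ordering; each "parallel" group may run concurrently
        "plan": [("parallel", stage_in),
                 ("exec", "fork", jsdl["application"]),
                 ("parallel", stage_out)],
    }
```

The second level, translating this internal plan into a backend-specific submission, would then be the TargetSystemAdapter's job.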

3.3 Target system adapter

To shield GRS from the details of all kinds of target systems, a Target System Adapter (TSA) is used for each kind of target system. The TSA encapsulates the associated underlying target system and provides a unified communication interface that lets GRS control the target system in a consistent way. Additionally, it is responsible for transforming the GRS internal submission into a format the target system can accept.
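A minimal sketch of what such an adapter interface might look like is below, with a fork adapter (running the task as a local process) standing in for backends like PBS. The paper does not show the actual CGSP interface, so every name and method signature here is an assumption.

```python
# Hedged sketch of the TSA contract: submit a GRS internal task, then
# monitor and control it through a backend-specific handle.
import subprocess
from abc import ABC, abstractmethod

class TargetSystemAdapter(ABC):
    """Uniform interface GRS uses to talk to any target system."""
    @abstractmethod
    def submit(self, internal_task):
        """Translate the internal submission and start it; return a handle."""
    @abstractmethod
    def status(self, handle):
        """Report the current state of the native execution."""
    @abstractmethod
    def cancel(self, handle):
        """Abort the native execution."""

class ForkAdapter(TargetSystemAdapter):
    """Runs the internal submission as a plain local process."""
    def submit(self, internal_task):
        return subprocess.Popen(internal_task["command"], shell=True)
    def status(self, handle):
        return "running" if handle.poll() is None else "done"
    def cancel(self, handle):
        handle.terminate()
```

Adding support for a new target system then means writing one more adapter class, without touching the task manager.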

Figure 4. A case study of conversion from JSDL submission to GRS presentation

A schematic diagram of the GRS execution architecture is given in Figure 5 to illustrate how GRS works in detail.

Figure 5. The schematic diagram of GRS execution architecture

When a JSDL request is received by the task factory (step 1), it creates a task resource via the corresponding TSA according to the information in the JSDL (step 2) and returns the resource key to the submitter (step 3); the submitter manages the task through the task manager (step 4), which exploits the underlying TSA to do the real operations (steps 5, 6).

4. GRS implementation in CGSP

We have implemented most of the GRS concepts as a WSRF-compliant service in CGSP. The current version of the GRS implementation supports dynamic legacy program deployment and undeployment, job submission using JSDL, data staging using the local cp command and the CGSP data service, fork and PBS as local task managers, and running MPI programs (support for other local task managers is underway). The whole GRS execution architecture is shown in Figure 6. The two most important functionalities provided by GRS are 1) application deployment management, which is the responsibility of the application manager, and 2) task execution management, which is supported by the task manager.

Application deployment management: three main components, the application packer, the application manager, and the application database, make up the application management system, which is responsible for managing all legacy programs that have been deployed as GRS applications and for carrying out program deployment and undeployment.

Figure 6. GRS Architecture

The arrows tagged with squares in Figure 6 show a complete workflow of application deployment. The administrator packs the program and all other files it needs using the application packer (Step a). During packing, an ADF file is generated from information provided by the administrator and included in the deployment package. In Step b, a deploy request is sent to the application manager. The application manager exploits its internal deployer to complete the deployment, that is, putting the new program into the application database and updating the meta-information of application management (Step c). Finally, an update notification is sent to the information center via the register (Step d). After that, every component of the Grid can find the application and use it by invoking the associated GRS service. A friendly GUI on the CGSP portal makes application deployment easy; Figure 7 shows a snapshot of the GRS application deployment of the cap3 program.

Task execution management is the core of GRS; it is responsible for executing, monitoring, and controlling the execution of native tasks. It consists of three logical components:

- The Task Manager performs task management at an abstract level. It consists of the life-cycle manager, which maintains the life-cycle of all tasks and terminates a task when it expires; the monitor, which tracks the status of all tasks via polling and notification; and the task pool, which stores the information of all tasks. The life-cycle manager plays an important role in GRS: it enables GRS to reclaim resources and destroy a task under any circumstances, which is very important if the job manager of CGSP crashes. There is no scheduler in the task manager; none is required, because GRS acts more like a dispatcher that forwards task requests to underlying target systems, which usually provide their own internal scheduling mechanisms.
- The Task Factory creates the corresponding task for a given submission request. It exploits the proper target system to create the task instance and returns the resource identifier to the task manager.
- The Target System Adapter (TSA) communicates with the corresponding target system to perform the real execution and provides a consistent interface for abstract task management.

Figure 6 outlines the whole execution workflow of a GRS application using the arrows tagged with small numbered circles. An authorized user can obtain information on all available applications by querying the infoCenter (step 1) and submit a JSDL job to the job manager (step 2); the job manager, according to its internal scheduling policy, chooses an appropriate GRS that can satisfy the resource requirements of the job and passes the job to it (step 3). The task factory creates the task resource with the help of the TSA and returns the resource id to the job manager (step 4). The task manager exploits the data manager to stage in the input files (step 5) and starts the native task by virtue of the TSA (steps 6, 7). When the native task completes, the outputs are uploaded to the data space (step 8).
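The life-cycle manager's expiry behavior might look like the following sketch: each task is registered with a lifetime and a cancel handle, and a periodic sweep destroys whatever has expired, so resources are reclaimed even if the submitting job manager has crashed. The names and structure are assumptions, not CGSP code.

```python
# Hypothetical sketch of the life-cycle manager's expiry sweep.
import time

class LifeCycleManager:
    def __init__(self):
        self.tasks = {}   # task id -> (expiry timestamp, cancel callback)

    def register(self, task_id, lifetime_s, cancel):
        """Track a task; after lifetime_s seconds it becomes reclaimable."""
        self.tasks[task_id] = (time.time() + lifetime_s, cancel)

    def sweep(self, now=None):
        """Destroy every expired task via its cancel handle; return the ids."""
        now = time.time() if now is None else now
        expired = [tid for tid, (expiry, _) in self.tasks.items()
                   if expiry <= now]
        for tid in expired:
            self.tasks.pop(tid)[1]()   # invoke the task's cancel handle
        return expired
```

A real implementation would run the sweep on a timer and persist the task pool, but the essential point is that expiry does not depend on the submitter still being alive.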

Figure 7. A snapshot of legacy program deployment

5. Performance evaluation

Experiments were performed to evaluate the performance of GRS under heavy workload. The setup is as follows: two Intel Pentium machines (Pentium IV 2.4 GHz, 2 GB memory, RedHat Linux 9.0) were used, one as the computing node, on which GRS and GRAM were each installed at different times, and the other as the client submitting jobs. The job is an execution of the cap3 program on a data set of 237,934 bytes; cap3 is a standard program widely used in biology. The client executes a multithreaded Java program that runs 5 to 25 threads to submit 100 jobs. The average submission response time (from the sending of the submission request to the start of the job) and the average running time (from the start of the job to its completion) are shown in Figure 8.

Figure 8. Results of performance testing

The results show that GRS is quicker in job submission than GRAM. The main reason may be that GRS and GRAM use different security implementations: GRAM uses GSI [5], an X.509-based certification scheme that requires certificate validation for each request, while GRS uses CGSP's security solution, which is currently password-based. The average running times are almost the same (between 18 s and 19 s; the differences come from occasional experimental factors), as expected, since neither GRS nor GRAM performs the real task execution, which is the responsibility of the underlying target systems.

6. Related work

There are several projects that address issues similar to those addressed by this paper. The basic Grid service for executing applications on remote computers is Globus GRAM [6]. While GRAM performs its basic function adequately, it has some deficiencies. It is so tightly coupled with GSI (the Grid Security Infrastructure, the security solution of Globus) that it cannot work without GSI. It is also platform dependent and not fully transparent to the user. Additionally, it does not always set the environment of the application as specified by the user, due to difficulties interfacing with the many different types of local scheduling systems [8]. Finally, it does not capture the exit code of applications executed through it [8]. Moreover, GRAM does not support JSDL, using RSL as its job submission language instead. The Condor-G system [8][9] and DAGMan [10] build atop the GRAM service and improve on it by enhancing its reliability and supporting the execution of directed acyclic graphs. Unfortunately, this improvement currently comes at a cost in maintenance and administration [11]; it is also complex to implement, and such complexity is not necessary for a system that already has a Workflow manager. UNICORE [14] provides its own services for executing jobs. These jobs can consist of many tasks with execution-order dependencies between them, and the tasks can be composite tasks containing other tasks. UNICORE also provides features such as executing each job in its own file space, which is a convenient abstraction. Unfortunately, UNICORE is a vertical solution and requires adopting all or none of it.

7. Conclusions and future work

This paper presents our General Running Service, which is implemented as a WSRF service and reliably executes legacy programs with complex execution environment requirements. The service is part of our CGSP architecture, whose purpose is to integrate the computing resources of CERNET. An important feature of GRS is that it can reliably execute a program with specific environment requirements and allows users to specify the environment in a

flexible way. Another important feature of GRS is that it shields the grid from all the details of target systems and provides a uniform interface to access computing resources by virtue of standard JSDL. In the future, we plan to extend the system in the following directions. First, as requested by our users, a graphical user interface for management should be provided. Second, our Application Description Language should be extended to describe more complex execution environments. Third, full support for JSDL should be provided to enable our system's interoperability with other Grid systems. Fourth, more local task manager adapters should be provided. Finally, we will try to allow users to suspend submitted tasks, modify them, and then resume them.

References

[1] "The ChinaGrid Project", http://www.chinagrid.edu.cn
[2] Wu Yongwei, Wu Song, Yu Huashan, Hu Chunming, "Introduction to ChinaGrid Support Platform", Lecture Notes in Computer Science, Vol. 3759, pp. 232-240, 2005 (ISPA 2005)
[3] Hai Jin, Zhaoneng Chen, Hsinchun Chen, Qihao Miao, "ChinaGrid: Making Grid Computing a Reality", Digital Libraries: International Collaboration and Cross-Fertilization, Lecture Notes in Computer Science, Vol. 3334
[4] Ali Anjomshoaa, Fred Brisard, An Ly, Stephen McGough, Darren Pulsipher, Andreas Savva, "Job Submission Description Language (JSDL) Specification, Version 1.0 (draft 19)", Global Grid Forum, 27 May 2005
[5] Global Grid Forum, http://www.ggf.org
[6] "The Globus Project", http://www.globus.org
[7] Warren Smith, Chaumin Hu, "An Execution Service for Grid Computing", NAS Technical Report NAS-04-004, April 2004
[8] Kaizar Amin, Gregor von Laszewski, Mihael Hategan, "An Abstraction Model for a Grid Execution Framework", accepted for publication in the Euromicro Journal of Systems Architecture
[9] E. Imamagic, B. Radic, D. Dobrenic, "CRO-GRID Grid Execution Management System"
[10] T. Kiss, T. Delaitre, A. Goyeneche, "GEMLCA: Grid Execution Management for Legacy Code Architecture"
[11] Paul D. Coddington, Lici Lu, Darren Webb, Andrew L. Wendelborn, "Extensible Job Managers for Grid Computing"
[12] Gregor von Laszewski, Ian Foster, Jarek Gawor, Andreas Schreiber, "InfoGram: A Grid Service that Supports Both Information Queries and Job Execution"
[13] Weimin Zheng, Meizhi Hu, Lin Liu, Yongwei Wu, Jing Tie, "FleMA: A Flexible Measurement Architecture for ChinaGrid", International Symposium on Parallel and Distributed Processing and Applications 2005, LNCS Vol. 3759, pp. 297-304 (ISPA 2005)

[14] D. Erwin, "UNICORE Plus Final Report - Uniform Interface to Computing Resources", UNICORE Forum e.V., 2003
[15] Condor, "Condor Version 6.4.7 Manual", University of Wisconsin-Madison, 2003
