Journal of Interconnection Networks
© World Scientific Publishing Company
ON ARCHITECTURE FOR SLA-AWARE WORKFLOWS IN GRID ENVIRONMENTS
Dang Minh Quan
Paderborn Center for Parallel Computing (PC2), University of Paderborn,
Fuerstenallee 11, 33102 Paderborn, Germany
[email protected]

Odej Kao
Department of Computer Science and Paderborn Center for Parallel Computing (PC2), University of Paderborn,
Fuerstenallee 11, 33102 Paderborn, Germany
[email protected]

Received Day Month Year
Revised Day Month Year

Service Level Agreements (SLAs) are currently one of the major research topics in Grid Computing, as they serve as a foundation for reliable and predictable Grids. SLAs define an explicit statement of expectations and obligations in a business relationship between provider and customer. Thus, SLAs should guarantee the desired and a-priori negotiated Quality of Service (QoS), which is a mandatory prerequisite for the Next Generation Grids. This development is reflected in manifold research work on SLAs and on architectures for implementing SLAs in Grid environments. However, this work is mostly related to SLAs for standard, monolithic Grid jobs and neglects the dependencies between different steps of operation. The complexity of an SLA specification for workflows grows significantly, as the characteristics of correlated sub-jobs, the data transfer phases, the deadline constraints and possible failures have to be considered. Thus, an architecture for an SLA-aware workflow implementation needs sophisticated mechanisms for specification and management, sub-job mapping, data transfer optimization and fault reaction. Therefore, this paper presents an architecture for SLA-aware Grid workflows. The main contributions are an improved specification language for SLA-aware workflows, a mapping and optimization algorithm for assigning sub-jobs to Grid resources, and a prototype implementation using standard middleware. Experimental measurements prove the quality of the development.

Keywords: Service Level Agreement (SLA); workflow; Grid computing.
1. Introduction

Service Level Agreements (SLAs) are currently one of the major research topics in Grid Computing, as they serve as a foundation for a reliable and predictable job execution at remote Grid sites. SLAs are defined as an explicit statement of expectations and obligations in a business relationship between service provider and customer.
Thus, an SLA manages the expectations, regulates the resource usage and specifies the costs. The application of such an SLA should guarantee the desired and a-priori negotiated Quality of Service (QoS), which is a mandatory prerequisite for the Next Generation Grids. The complexity of an SLA specification grows significantly if a workflow of multiple, dependent sub-jobs is considered. In current proposals Grid jobs are defined as monolithic entities, where the user sends the input data to a service, computes the data – without dependencies – on this site and receives the results.
Fig. 1. A sample workflow for weather forecast (Web Services 1–9: data collection, cloudiness, visibility, moisture convergence, phase of precipitation, turbulence, linear interpolation, dynamic modelling, visualization).
A broad number of applications require workflow processing beyond the simple execution of monolithic jobs, because the resources (hardware and software) as well as the expertise are distributed over multiple administrative domains. Thus, the processing has to be modelled as a workflow of dependent sub-jobs. Figure 1 depicts a sample scenario with weather forecasting as an example workflow. The main requirement is that a three-hour forecast should be available within 20 minutes after the measurement. In our scenario Web Service 1 collects the input data from many sources such as radars, satellites, lightning detectors etc. Then the data is processed by Web Services 2 to 6 to derive quantities such as cloudiness, visibility, moisture convergence, phase of precipitation etc. Thereafter, the results are processed by Web Service 7 to interpolate between the field analysis and the forecast of the numerical weather prediction models. This data is used in dynamical modelling by Web Service 8 to build high-resolution models with special physical parameterization schemes for the precise prediction of weather events. Finally, the weather information is visualized by Web Service 9.

This example shows that a Grid workflow contains many sub-jobs, where each sub-job receives data from one or multiple preceding sub-jobs. Thus, each sub-job in the workflow differs from a monolithic Grid job in the following aspects:

• Data dependence: A later sub-job needs the output from its preceding sub-job(s). The data dependence leads to a sequence and thus to a time dependence as well.
• Specific sub-jobs should be processed in parallel.
• Input and output data from different sub-jobs are not always in the same file system.
• In the Grid workflow, many sub-jobs must run in parallel to ensure the SLA, and they require certain resources. If the resources of a local RMS cannot meet those demands, the SLA will be violated. Thus it is necessary to re-distribute the sub-jobs to different sites.
• If one sub-job fails, it can affect other related sub-jobs and thus violate the entire SLA.

These differences lead to several new problems when supporting SLAs for Grid workflows, which an existing SLA architecture [3,15] for monolithic jobs cannot solve:

• The user needs a separate SLA for each sub-job, which is a tedious and time-intensive task. Furthermore, mapping each sub-job with methods for singular jobs without workflow consideration leads to a non-optimized solution with higher cost.
• Lack of a failure reaction mechanism: a failed sub-job will be re-started at another site, but other related sub-jobs will be affected without adjustment, and in consequence the entire workflow will fail.
• The description language and the monitoring mechanism for Grid workflows are more complex compared to those for monolithic Grid jobs.

All of the above problems require new methods. Therefore, this paper introduces an architecture and a prototype for the implementation of SLA-aware workflows in Grid environments. Section 2 presents the related work, while the system architecture is depicted in Section 3. Sections 4 and 5 describe the proposed SLA language for workflows and the mapping algorithm, respectively. The implementation and the deployment results are the subject of Section 6.

2. Related work

The SLA idea is well known from many different areas; most recently it has been investigated in connection with Web Services. The main focus there was and is set on building SLA-aware systems and defining an SLA language which can be processed automatically [16,22]. In [16] a novel SLA specification language for spontaneous services is presented. In order to implement the system, Sahai et al. proposed an automated and distributed SLA monitoring engine [22], which is a core component for providing SLA-enabled Web Services. In this architecture the measurement component, the information collection module, the SLA tracking module and the SLA violation module interact in order to ensure SLAs for Web Services.

The adaptation of those ideas to Grid Computing is in progress; however, different Grid goals and requirements create additional demands. Grid Computing is often used for high-performance computation and thus its demands differ significantly from the demands of transaction-oriented Web Services. As the transaction processing is rather short, compute resources assigned to a Web Service are not considered a primary issue. Moreover, solely the customer may set constraints on the service provider. In the case of Grid Computing, resource allocation and usage requires
e.g. runtime and capacity estimation, and thus an increased level of responsibility from the customer, leading to novel requirements for the SLA architecture and language. Therefore, within the scope of the GGF, a number of design efforts for Grid-based SLAs are available, which led to a language recommendation [1], resource management models for negotiation [4], and properties for resource reservation in Grids [9]. However, the proposed language and consequently the architecture consider solely the business aspect, which mainly deals with the obligations of the parties, financial conditions, penalties for failed SLAs, the available time period, etc. A detailed description and support of the Service Level Objective (SLO), which expresses a commitment to maintain a QoS in a given time interval, is still not available.

In order to implement the proposed standards, Burchard et al. developed a novel architecture, the Virtual Resource Manager, which provides SLA features for local resource management systems [3]. The SLA management is achieved via a Virtual Resource Manager (VRM) that enables interaction between a number of local resource management systems on different clusters. This architecture highlights the features of runtime responsibility, resource virtualization, information hiding, autonomy provision, and smooth integration of existing resource management systems. However, the model is intended to work with a group of co-located clusters, not Grid-wide.

Keahey et al. propose an architecture called Virtual Application Service (VAS) for managing QoS in computational Grids [15]. VAS is an extended Grid service with additional interfaces for the negotiation of QoS levels and service demands. The key objective of VAS is to facilitate the execution of real-time services which have specific deadline constraints. A client submits a request to VAS for advance or immediate reservation of a service, supplying only time constraints. The system maintains meta information, consisting of application information and application modelling information associated with every service, which allows it to compute the feasibility of fulfilling the client's request under such time constraints. From this meta information (such as execution time and hardware resource information) the system determines the computational resources required to support the request and subsequently undertakes CPU slot reservation. A Service Level Agreement is then presented to the user. Nevertheless, all this work is related to monolithic Grid job execution only.

Cao et al. present a workflow management system for Grid computing, called GridFlow, which includes a user portal and services for both global Grid workflow management and local Grid sub-workflow scheduling [5]. This is an integrated environment that enables the design of a Grid workflow and the access to Grid services, however without support for SLAs. In this system, the authors use an algorithm that maps each sub-job separately onto individual Grid resources. The algorithm processes one sub-job at a time and schedules it to a suitable RMS with a start time slot that does not conflict with the dependencies of the flow. The selection of the destination resources is optimized with respect to a minimal completion time. When applying this strategy to a workflow within an SLA context, each sub-job will be assigned separately
to the cheapest feasible RMS. This strategy allows a fast computation of a feasible schedule, but it does not consider the entire workflow and the dependencies between the sub-jobs.

Deelman et al. presented a system called GriPhyN which allows request submission by an application-specific description of the desired data product [7]. The system generates a workflow by selecting appropriate application components, assigning the required computing resources and overseeing the successful execution. The mapping of Grid workflows onto Grid resources is based on existing planning technology. This work focuses on encoding the problem to be compatible with the input format of a specific planning system and thus transforms the mapping problem into a planning problem. Although this is a flexible way to pursue different objectives, which may include some SLA criteria, significant disadvantages appeared regarding the time-intensive computation, the long response times and the missing consideration of Grid-specific constraints. The latter is the main reason why the suggested solutions often do not reach the expected quality.

3. System Architecture

Currently, a large number of Grid resource management systems as well as local Resource Management Systems (RMS) exist. However, many local RMSs still do not fulfil all requirements needed for Grid computing, in particular those related to SLA and QoS issues. The Open Grid Services Architecture (OGSA) attempts to converge the diversity in the Grid resource management field. The main aim of our architecture is to enable SLA-aware, Grid-based workflows; it is based on the OGSA model in order to achieve the most compatible solution. The main component is a so-called SLA flow broker, which can be seen as a virtual Grid combining various sites with their individual resources. The general system architecture is presented in Figure 2. The following sections describe the architecture layer by layer.
Fig. 2. General system architecture: clients (UI/Web browser, Web service) interact with the SLA flow broker via SLA negotiation and QoS monitoring; the broker in turn performs sub-SLA negotiation and QoS monitoring with the SLA-aware local RMSs.
3.1. SLA-aware RMS

A Grid middleware can support SLAs only if the underlying local RMS is able to assure their fulfilment, i.e. to guarantee the negotiated QoS. Thus, an SLA-aware RMS has at least the following features:

• Resource reservation: An SLA demands certain resources during its lifetime. To reach the desired performance it is essential that a specified amount of processors, network bandwidth or disk capacity is available at runtime. This can be achieved only with a reservation mechanism which plans resources for the present and the future.
• Resource usage monitoring: To ensure QoS, the health of the resources dedicated to the reservation must be observed. This monitoring information helps the system to detect abnormal activity which can affect the QoS, so that it can react properly.
• Error recovery: Mechanisms for checkpointing and job migration should minimize the number of SLA violations. In case of hardware failures, the running job will be stopped and moved to other resources with a minimal loss of time.

Examples of RMSs fully or partly satisfying those conditions are the system software CCS [12] and Maui ME [13].

3.2. SLA flow broker

The SLA flow broker plays a major role in the system, as it provides the client with a virtual view on the Grid services and resources. Clients do not necessarily know about the underlying structure of the Grid; they solely work with the broker. Thus, the broker must have the ability to manage workflows, to discover Grid resources, to negotiate SLAs and to monitor QoS requirements. Figure 3 presents the architecture of the broker component.
Fig. 3. Broker architecture (components: parser, SLA language, task mapping, sub-SLA negotiation, QoS monitoring, failure reaction and a database).
In the following, the main components of the SLA flow broker are described.
SLA language: The language is the data container which describes the main aspects required for an SLA-aware workflow, such as business description, flow description, job description and resource description. It provides a uniform format that can be processed automatically by each component. The language specification is given in Section 4.

Parser: This component receives the SLA documents from the UI and analyzes three types of information: flow information, resource information and SLA information. These attributes are related to each other and can be changed during the SLA negotiation phase.

Task Mapping: Generally, a workflow includes several monolithic jobs which have to be executed sequentially or in parallel. This module finds an appropriate RMS for each of the sub-jobs based on the information provided by the parser. The mapping between the RMSs and the sub-jobs is done by an optimization algorithm in order to minimize the cost by finding, for each sub-SLA, an appropriate RMS with the lowest price. A second algorithm discovers an appropriate topology among the possible RMSs to optimize the cost for staging data in. Of course, these two types of optimization may lead in opposite directions, thus a third algorithm has to generate the best possible compromise. After finding an appropriate RMS for the sub-jobs, the broker negotiates a suitable sub-SLA. More details about the algorithm are presented in Section 5.

Sub-SLA negotiation: This module executes the SLA negotiation with each possible RMS. If successful, the SLA is signed; otherwise the broker computes another mapping solution and repeats the negotiation process. If no solution is available, the broker notifies the client to allow additional modification.

Monitoring: After signing all sub-SLAs, the broker stages the sub-jobs from the client to the RMSs. The monitoring component can ask the RMS to provide the current status of a job (periodically or on demand) in order to control the job progress and the fulfilment of the required QoS.

Failure reaction: If one of the Grid components fails and cannot finish the assigned sub-job, the entire workflow is affected. In case of a failure, the component initiates one of the following actions:

• Allow the RMS to re-start the job from the nearest checkpoint on an alternative resource within the site.
• If the SLA cannot be fulfilled any more, the penalty is booked and alternative resources are sought for a re-start of the sub-job. Furthermore, the reserved times for the subsequent steps have to be adapted to the new situation.

3.3. User Interface/Web browser

The client module provides features which support clients in describing the workflow as well as the SLA conditions. To construct a Grid workflow, a user needs to define the properties of each sub-job and their execution sequences, such as (a minimal sketch of such a definition follows the list):
• Job flow with sub-jobs including pre- and post-processing conditions
• Resource requirements for each sub-job
• QoS requirements for each sub-job
• SLA information for the workflow
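For illustration only, the information gathered by the UI could be represented with a structure like the following before it is serialized into the SLA document; the class and field names are hypothetical and not part of the prototype.

```python
# Hypothetical sketch of the workflow information collected by the UI;
# names and fields are illustrative, not taken from the prototype.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SubJobSpec:
    job_id: str
    resources: Dict[str, str]                               # resource requirements
    qos: Dict[str, str]                                      # QoS requirements, e.g. deadline
    predecessors: List[str] = field(default_factory=list)    # execution order

@dataclass
class WorkflowSLA:
    title: str
    deadline: str
    subjobs: List[SubJobSpec] = field(default_factory=list)

forecast = WorkflowSLA(
    title="Weather forecast",
    deadline="20 minutes after measurement",
    subjobs=[
        SubJobSpec("sj1", {"numnode": "4"}, {"deadline": "t1"}),
        SubJobSpec("sj2", {"numnode": "2"}, {"deadline": "t2"}, predecessors=["sj1"]),
    ],
)
```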
The XML document describing the SLA can be transferred back and forth between the user interface (UI) and the broker during the negotiation process. Usually users do not need to revise the document, unless no response arrives due to too high QoS or resource requirements. In this case the SLA is modified based on the information supplied by the broker. After submitting the job, the UI receives the required monitoring information from the broker in order to verify the job status against the SLA requirements.

4. SLA language for workflows

The following section describes a language proposal for the specification of SLA-aware workflows, which considers the business aspects, the resources, the job requirements and the service level objectives.

4.1. Context constraint

The SLA-aware workflow in Grid environments is defined on an abstract layer by specifying the data and control flow among software and data resources. The definition of the Grid job is independent of the hardware infrastructure; it is up to the Grid architecture to map this Grid job onto adequate Grid resources using additional meta data and to ensure its reliable execution. Similar to existing approaches, we assume a static Grid architecture consisting of a set of reliable distributed resources, and use static workflow modelling based on Directed Acyclic Graphs (DAG) [7,8]. The proposed language is based on the Common Job Description Markup Language [17] and can describe the main aspects required for an SLA-aware workflow, such as business description, flow description, job description and resource description.

As depicted in Figure 4, an SLA for a Grid workflow includes five components: the general SLA description, the specification of the included sub-jobs with their computational tasks, the related SLO descriptions, the data transfer tasks and the signature.

Fig. 4. Layers of the SLA language for the Grid workflow structure (general SLA description, sub-jobs description, SLO description, data transfer description, signature)

A typical SLA starts with general information such as the SLA title, a description of the main aim, deadlines, the responsible provider, costs, penalties, etc. The specification of sub-jobs is the main part of the SLA, which is compiled into a Grid workflow specification. Each sub-job part describes the information about its own computational task as well as the related SLOs. The computational task gives the requirements on software, hardware, I/O, executables and all other resources needed for a successful run. The computational task description can be divided into three categories:

• Software components required by the task, including specification of the operating system, database system, message passing library and others.
• Hardware components required, including the specification of the CPU architecture, memory/storage capacity, CPU speed, network speed, and other special devices such as scanners, DVD writers etc.
• Task description with specification of the I/O data, executable, environment, start parameters, etc.

An example for modelling those aspects is given in Figure 5. For the implementation of such an example with other languages, e.g. the language proposed in [1], the integration of at least two additional components is mandatory.

COMPUTE_TASK
  Software request: OS=Linux, database=MySQL, meslib=LAM-MPI
  Resource request: numnode=4, arch=x86, mem=256MB, storage=10GB, CPUspeed>=1000MHz, expert=1
  Task description: argument=-a 1024 -p 55, executable=cfp, stdin=cfp_stdin, stdout=cfp_stdout, stdlog=cfp_stdlog, checkpointdir=ckp

Fig. 5. SLA specification of HW/SW resources as well as the task description
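For illustration, the COMPUTE_TASK entry of Figure 5 could be held in memory roughly as the following dictionary; the keys mirror the figure, while the surrounding Python is only a sketch and not part of the SLA language or the prototype.

```python
# Sketch: the Figure 5 COMPUTE_TASK entry as a plain dictionary.
# The keys follow the figure; the representation itself is illustrative only.
compute_task = {
    "software": {"OS": "Linux", "database": "MySQL", "meslib": "LAM-MPI"},
    "resource": {"numnode": 4, "arch": "x86", "mem": "256MB",
                 "storage": "10GB", "CPUspeed": ">=1000MHz", "expert": 1},
    "task": {"argument": "-a 1024 -p 55", "executable": "cfp",
             "stdin": "cfp_stdin", "stdout": "cfp_stdout",
             "stdlog": "cfp_stdlog", "checkpointdir": "ckp"},
}
```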
SLOs focus on deadlines and the number/capacity of required resources. Both the provider and the consumer can cause the failure of a Grid job, for example if the time to finish the computational task exceeds the deadline because the provider's resources failed or because the customer underestimated the task duration. Another scenario is
related to the number of requested and actually used resources: if during runtime more resources are used than allocated, other reservations can be affected and thus the job has to be stopped. None of the available SLA languages considers these issues. The proposed SLO addresses all these problems by implementing the following structure:

• Condition: What type of problem occurred, e.g. the runtime exceeded the deadline or resources were over-used.
• Reason: Why the problem occurred, e.g. system failure or wrong estimation.
• Responsible site: Who is responsible for the problem, provider or consumer.
• Action: How the system will treat the task, e.g. cancel or continue.
• Penalty: Which penalty is to be paid by the site responsible for the problem.
• Monitor information: Which job information has to be monitored so that the consumer can track the progress or observe the failure circumstances.

An example for the statement "If the allocated runtime is exceeded because of a system failure, the provider will be fined 1000 USD per hour overdue. The provider sends an exit code to the customer when the job is finished." is given in Figure 6. Of course, not every SLO needs all of the above information; the number of data fields depends on the scenario under investigation.

SLO
  SLO_CONDITION = Runtime exceeded
  SLO_REASON = System failure
  SLO_RESPON_SITE = Provider
  SLO_ACTION = Continue running
  SLO_PUNISH = 1000 USD
  SLO_MONITOR = exit code when finished

Fig. 6. Sample SLO specification
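A minimal sketch of how a broker component might hold and evaluate such an SLO record is given below; the field names follow Figure 6, while the penalty calculation is an assumed reading of the 1000 USD per hour statement and not taken from the prototype.

```python
# Illustrative SLO record based on Figure 6. The penalty computation is an
# assumed interpretation of "1000 USD per hour overdue", not prototype code.
from dataclasses import dataclass

@dataclass
class SLO:
    condition: str          # e.g. "Runtime exceeded"
    reason: str             # e.g. "System failure"
    respon_site: str        # "Provider" or "Consumer"
    action: str             # e.g. "Continue running"
    punish_per_hour: float  # penalty rate in USD
    monitor: str            # e.g. "exit code when finished"

def penalty(slo: SLO, hours_overdue: float) -> float:
    """Penalty owed by the responsible site for the overdue time."""
    return slo.punish_per_hour * max(0.0, hours_overdue)

slo = SLO("Runtime exceeded", "System failure", "Provider",
          "Continue running", 1000.0, "exit code when finished")
print(penalty(slo, 1.5))   # 1500.0 USD, owed by the provider
```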
In the next step the data transfer between the sub-jobs is described. The data transfers between related sub-jobs also define the rank order between them. Thus, each data transfer represents a directed arc in the flow graph. The arc is described by a pair (source sub-job ID, destination sub-job ID); additional information about the data to be transferred is given as a list of files, as illustrated in the sketch below. The signature describes the legal status of the involved parties. Further details about the SLA language for workflows are given in [19].
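As a sketch (sub-job identifiers and file names are hypothetical), the data-transfer arcs could be encoded as (source, destination) pairs with an attached file list:

```python
# Hypothetical encoding of data-transfer arcs; each arc carries the pair of
# sub-job IDs plus the list of files to be transferred.
data_transfers = [
    {"source": "subjob1", "destination": "subjob2", "files": ["radar.dat", "satellite.dat"]},
    {"source": "subjob2", "destination": "subjob7", "files": ["cloudiness.out"]},
]
# The rank order between sub-jobs follows directly from the arcs:
order = [(t["source"], t["destination"]) for t in data_transfers]
print(order)
```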
5. Mapping algorithm

This section describes the proposed solution for the problem of mapping a workflow to a set of RMSs within an SLA context.
5.1. Problem statement

The formal specification of the described problem includes the following elements:

• Let K be the set of Grid RMSs. This set includes a finite number of RMSs which provide static information about the controlled resources and the current reservations/assignments.
• Let S be the set of sub-jobs in a given workflow, including all sub-jobs with their current resource and deadline requirements.
• Let E be the set of connections (edges) in the workflow, which express the dependencies between the sub-jobs and the necessity of data transfers between them.
• Let Ti be the set of possible start time slots for sub-job Si, Si ∈ S.
• Let Ki be the set of resource candidates for sub-job Si. This set includes all RMSs which can run sub-job Si, Ki ⊆ K.

Based on the given input, a feasible and possibly optimal solution is sought, which allows the most efficient mapping of the workflow onto the Grid environment with respect to the given global deadline. The required solution is a set defined as

R = {(Si, kij, til) | Si ∈ S, kij ∈ Ki, til ∈ Ti}        (5.1)

A feasible solution must satisfy the following conditions:

• Ki ≠ ∅ for every sub-job, i.e. for each sub-job at least one RMS exists which can satisfy all of its resource requirements.
• Earliest start time slot of sub-job Si ≤ til ≤ latest start time slot of Si: each sub-job must have its start time slot in the valid period.
• The dependencies of the sub-jobs are resolved and the execution order remains unchanged.
• Each RMS provides a profile of currently available resources and can run many sub-jobs of a single flow both sequentially and in parallel. The sub-jobs which run on the same RMS determine a resource requirement profile. For each RMS kij running sub-jobs of the Grid workflow and for each time slot, the profile of available resources must be larger than the profile of resource requirements.

In the next phase the feasible solution with the lowest cost is sought. The cost of a Grid workflow in this specific example is defined as the sum of four factors: compute time, memory usage, the cost of using expert knowledge and, finally, the time required for transferring data between the involved resources. If two sequential sub-jobs run on the same RMS, the cost of transferring data from the earlier sub-job to the later one is neglected. Considering the RMS's ability to run several sub-jobs in parallel and the evaluation of resource profiles increases the complexity of the flexible job shop scheduling problem. It can be shown easily that the optimal mapping of Grid-based workflows as described above is an NP-hard problem.
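The cost criterion can be sketched as follows; the price and usage fields are hypothetical, and the per-time-slot evaluation of resource profiles performed by the real broker is omitted for brevity.

```python
# Simplified sketch of the workflow cost of Section 5.1: compute time, memory
# usage, expert knowledge and data transfer. Field and price names are
# hypothetical; transfers between sub-jobs on the same RMS cost nothing.
def workflow_cost(subjobs, assignment, prices, transfers):
    cost = 0.0
    for sj_id, sj in subjobs.items():
        p = prices[assignment[sj_id]]
        cost += sj["runtime"] * (p["cpu"] * sj["nodes"]
                                 + p["mem"] * sj["memory"]
                                 + p["expert"] * sj["experts"])
    for src, dst, amount in transfers:
        if assignment[src] != assignment[dst]:   # same RMS: transfer cost neglected
            cost += prices[assignment[dst]]["transfer"] * amount
    return cost

subjobs = {"sj1": {"runtime": 2, "nodes": 4, "memory": 256, "experts": 1},
           "sj2": {"runtime": 1, "nodes": 2, "memory": 128, "experts": 0}}
prices = {"RMS1": {"cpu": 1.0, "mem": 0.01, "expert": 5.0, "transfer": 0.1},
          "RMS2": {"cpu": 0.8, "mem": 0.02, "expert": 4.0, "transfer": 0.1}}
assignment = {"sj1": "RMS1", "sj2": "RMS2"}
print(workflow_cost(subjobs, assignment, prices, [("sj1", "sj2", 50)]))
```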
5.2. Algorithm

Workflow mapping to RMSs with respect to an agreed SLA context can be seen as a special case of the well-known job shop scheduling problem [2,6]. For solving this problem two classes of methods exist: complete and incomplete methods. A complete method explores the entire search space systematically, while an incomplete (non-exact) method examines as rapidly as possible a large number of points according to a selective or random strategy. Local search is one of the most prominent examples of this approach, realized by a number of methods such as Tabu Search [10,11], Simulated Annealing [21], genetic algorithms [14] etc. However, with the appearance of resource profiles, the evaluation at each step of a local search is hard and, in the case of a large number of variables, time-consuming.

The proposed algorithm for mapping workflows onto Grid resources uses Tabu search to find the best possible assignment of sub-jobs to resources. In order to shorten the computation time caused by the high number of resource profiles to be analysed and by the flexibility in determining start and end times for the sub-jobs, several techniques for reducing the search space are introduced. The core algorithm requires a specification of the workflow and sub-job requirements as well as a description of the available resources as input. This information is necessary in order to resolve the dependencies in the workflow and is stored in a file. The properties of the involved RMSs are contained in a database. This input information is processed according to the following algorithm steps:

(i) Based on the description in the database, all RMSs are selected which fulfil the requirements of at least one sub-job.
(ii) Computation of the earliest and latest start time for each sub-job by analysing the input workflow with traditional graph techniques (a sketch of this computation is given at the end of this section).
(iii) Removing requirement bottlenecks. This step aims to detect bottlenecks with a large number of resource requirements, which can reduce the possible start/end times of a sub-job. Based on this information, sub-jobs are moved to other sites with more available resources in order to gain a longer time span for positioning the sub-jobs in the workflow.
(iv) Definition of the solution space by grouping all RMS candidates which have enough available resources to start a sub-job within the determined slack time and to run it until completion. This step is performed by establishing a connection with the site and retrieving the current and planned schedule.
(v) The gained search space is evaluated with respect to the contained number of feasible solutions (large or small number of possible solutions). Subsequently, an initial solution is created.
(vi) Starting with the initial solution, a Tabu search is applied in order to increase the quality of the solution and to find the best possible and feasible assignment.

A detailed presentation of the experimental results is found in [20].
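Step (ii) can be illustrated with a standard longest-path computation over the workflow DAG. The sketch below is a simplification: durations are assumed to already include data-transfer times, and a single global deadline replaces the prototype's detailed handling.

```python
# Sketch of step (ii): earliest and latest start times of each sub-job in the
# workflow DAG. Simplified illustration, not the prototype's implementation.
from collections import defaultdict

def start_time_windows(durations, edges, deadline):
    """durations: {subjob: duration}; edges: [(pred, succ)]; deadline: total limit."""
    succs, preds = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
    # Topological order (Kahn's algorithm).
    indeg = {n: len(preds[n]) for n in durations}
    order, queue = [], [n for n in durations if indeg[n] == 0]
    while queue:
        n = queue.pop()
        order.append(n)
        for m in succs[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    earliest = {n: 0 for n in durations}
    for n in order:                       # forward pass
        for m in succs[n]:
            earliest[m] = max(earliest[m], earliest[n] + durations[n])
    latest = {n: deadline - durations[n] for n in durations}
    for n in reversed(order):             # backward pass
        for m in preds[n]:
            latest[m] = min(latest[m], latest[n] - durations[m])
    return earliest, latest

e, l = start_time_windows({"sj0": 2, "sj1": 3, "sj2": 1},
                          [("sj0", "sj1"), ("sj1", "sj2")], deadline=10)
print(e, l)   # earliest: sj0=0, sj1=2, sj2=5; latest: sj0=4, sj1=6, sj2=9
```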
6. Prototype implementation and experiment

The implemented prototype fulfils the above constraints and provides basic features which allow any client to negotiate SLAs, to monitor running workflows and to receive the results. Based on standard components such as the Globus Toolkit 3.2, a MySQL database and Maui ME as local RMS, the system is compatible with the existing Grid infrastructure. The overall architecture schema is depicted in Figure 7.
Fig. 7. System implementation layers: the client console, the OGSI-based SLA workflow broker service (parser, database, mapping, monitoring, negotiation) and the OGSI SLA local service with an SLA-aware layer on top of Maui ME at each local RMS.
The system includes three main components: the client plays the consumer role, while the SLA-aware workflow broker and the SLA-aware local RMS act as providers.

6.1. SLA-aware RMS

The local RMS provides SLA services for each individual site on the Grid. This service follows the OGSA principles and is based on the latest version of the Globus Toolkit. When deployed, this service is one among many services provided by the Globus Toolkit. The local RMS must be SLA-aware, as resource reservations are necessary. Furthermore, standard features such as resource monitoring and fault recovery are mandatory. However, most of the existing RMSs do not support those features; therefore we provide an SLA-aware layer on top of Maui ME. The structure of the local RMS is depicted in Figure 8.

Fig. 8. SLA-aware local RMS architecture (SLA negotiation, error recovery, SLA parser, planner and monitoring modules on top of Maui ME)

• Planner. Maui ME supports solely node reservations. Therefore we do not use this feature of Maui ME but developed a planner which supports reservations for nodes, storage and experts.
• Monitoring. This module monitors the state of jobs/nodes/resources. This information is used to detect failures or QoS violations.
• SLA parser. This module is written in Java and parses or creates SLA text using the SLA language for workflows.
• SLA negotiation. This module uses the SLA parser and the planner module to check the feasibility of an SLA. The planner output is used to negotiate the SLA with the other participants.
• Error recovery. This module is responsible for keeping jobs running even if some nodes fail. The monitoring information delivers reports on possible failures. The error recovery module cancels the specific job, uses the planner to allocate new resources and re-executes the job from the checkpoint image.

6.2. SLA workflow broker

The SLA workflow broker is the central system unit and is responsible for processing the client requirements, for dispatching sub-jobs as well as for the SLA negotiation between clients and RMSs. The SLA workflow broker provides services to clients and uses services from the local RMSs. The communication (client – broker) and (broker – local RMS) is done via the OGSI platform. As can be seen in Figure 7, the SLA workflow broker includes many modules. For co-operation among those modules, the SLA broker uses a database to manage all aspects of operation. The database structure, which includes two main information groups, is depicted in Figure 9. The first group describes information about the workflow: the general SLA, the sub-jobs, the SLOs, the monitoring and the data transfers. The second group describes the local RMSs, including resource descriptions and resource reservations.

The SLA broker receives an SLA requirement from the client and parses it to extract all sub-job information. The information is stored in three files: the general SLA description, the sub-job descriptions, and the description of the arcs of the workflow. Based on those files and the data of the local RMSs (resource descriptions and reservations in the database), the SLA broker invokes the mapping algorithm. If a feasible solution is found, it returns the solution, which includes each sub-job ID with its associated RMS ID and starting time slot. The SLA negotiation module uses this information. If the client accepts the solution from the previous step, the SLA broker invokes an insertion module to enter all necessary information into the database. Subsequently, the next module, the SLA local service client module, negotiates with the local RMSs, collects all GFTP handle services and returns the SLA flow ID to the client together with all GFTP handle services. While sub-jobs are running, monitoring information about resource states, job states and reservation states of the local RMSs is sent to the broker and stored in the database. The broker uses this information to fulfil the monitoring requirements posted by the client.
Fig. 9. Database structure used in the SLA flow broker module. The main tables and their fields are:
• gen_sla: id (PK), title, descrip, start_time, end_time, provider, consummer, cost, real_cost
• subjob: id (PK), id_provider, start_time, end_time, id_sla, consummer, type, num_proc, cpu_type, cpu_speed, mem, storage, num_exp, os, mpi_lib, cost_subjob, id_inRMS, state
• slo: id (PK), id_subjob, condition, reason, respon_site, punish, violate, field_affect, field_value
• dat_tran: id (PK), id_sla, start_time, end_time, id_provider, consummer, tran_amount, cost_datran, FTP_addr
• monitoring: sj_id, s_time (PK), sj_state, sj_term, mem_usage
• res_desc: id, name, handle, type, num_proc, cpu_type, cpu_speed, mem, storage, num_exp, os, mpi_lib, soft_other, cost_comp, cost_mem, cost_tran, cost_exp, avail
• reserv_comp: id (PK), id_hpc, num_node, num_mem, num_exp, time_start, time_end
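Two representative tables of this database can be sketched as follows; the field subsets are read from Figure 9, and SQLite stands in for the prototype's MySQL database purely for illustration.

```python
# Sketch of two broker tables with a subset of the Figure 9 fields.
# SQLite is used only for illustration; the prototype relies on MySQL.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE gen_sla (
    id INTEGER PRIMARY KEY, title TEXT, descrip TEXT, start_time TEXT,
    end_time TEXT, provider TEXT, consummer TEXT, cost REAL, real_cost REAL
);
CREATE TABLE subjob (
    id INTEGER PRIMARY KEY, id_provider INTEGER, id_sla INTEGER,
    start_time TEXT, end_time TEXT, num_proc INTEGER, mem TEXT,
    storage TEXT, num_exp INTEGER, cost_subjob REAL, id_inRMS TEXT, state TEXT
);
""")
conn.execute("INSERT INTO gen_sla VALUES (1, 'Sample workflow', 'demo', "
             "'t0', 't1', 'RMS1', 'client', 100.0, 0.0)")
print(conn.execute("SELECT title, cost FROM gen_sla").fetchone())
```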
6.3. Client

The client is implemented in Java and provides a Grid service interface. The SLA text is compiled into a file and transferred to the SLA broker through the OGSI infrastructure. During the negotiation period, the difference between the submitted and the received SLA text is detected and presented to the user. The client component allows the user to supervise the performance of the system, periodically or at random.

6.4. Experimental results with the implemented prototype

The prototype is deployed on three cluster systems running Linux, LAM-MPI 7.1 and Maui ME with different resource configurations as described in Table 1. On the front node of each cluster, the Globus Toolkit 3.2 and the SLA-aware local RMS service module are installed. One separate machine is used to run the SLA workflow broker (Figure 10). On this system, we have performed experiments with several state-of-the-art workflows. The common experiment scenario is presented in Figure 11, with a workflow executed over several RMSs.

The effectiveness of the mapping algorithm in the prototype was demonstrated in [20]. Here we focus on other aspects of the system: the SLA negotiation process and the execution engine.
Fig. 10. Experiment system deployment (three SLA-aware local RMS services connected to the SLA workflow broker, which is accessed by Grid service clients).
Table 1. Resource configuration of the RMSs.

ID      Nodes   Storage   Experts
RMS1    7       200       2
RMS2    14      100       1
RMS3    9       300       2
Fig. 11. Experiment scenario: a workflow of seven sub-jobs (sub-job 0 to sub-job 6) mapped by the SLA workflow broker onto three providers.
In the case of SLAs for workflows, the consumer negotiates the workflow with the broker, the broker negotiates each sub-job with the providers, and the providers negotiate the data transfers with each other. Which SLA text is to be used in each case and how to synchronize these phases are problems which are not addressed in the previous works [1,3,4,18] but are solved in our prototype system.
When the workflow is executed, if sub-job 0 finishes early, sub-jobs 1, 2 and 4 can be shifted to run earlier, which makes the execution more reliable and decreases the fragmentation of the local RMS resource usage. An online demonstration of the running process of a workflow consisting of seven sub-jobs in co-operation of three local RMSs, which includes the mapping, the negotiation process, the state monitoring as well as the execution, can be found at http://pckao3.upb.de:9035/manual/test.html. Figure 12 depicts the Web-based user interface during the running process.
Fig. 12. Initial Web-based client
7. Conclusion

SLAs are currently one of the major research topics in Grid Computing, as they serve as a foundation for reliable and predictable job execution at remote Grid sites and represent a mandatory prerequisite for the Next Generation Grids. Therefore, this paper presented a system architecture for the definition and implementation of SLA-aware workflows in Grid environments consisting of several building blocks. The global architecture includes components for SLA definition and negotiation, task mapping, monitoring and fault reaction.
The proposed language can describe the main aspects required for SLA-aware workflows, such as business description, flow description, job description and resource description. In particular, we focused on aspects related to the Service Level Objective. The developed mapping algorithm is based on a formal problem statement and provides an optimized assignment of sub-jobs to Grid resources with respect to the agreed SLAs and the data transfer cost. Finally, all components are implemented using standard Grid and RMS components and provided as an online demo. Experimental measurements proved the feasibility and quality of the developed architecture.

Future work is related to the completion of the middleware by integrating components for accounting, security and fault tolerance. Moreover, the search for suitable Grid resources during the sub-job assignment can be enhanced by utilizing soft computing approaches. Finally, experiments in large environments should allow reliable statements on the scalability of the developed architecture.

References

1. A. Andrieux, "Web Services Agreement Specification (WS-Agreement)," Global Grid Forum, http://www.ggf.org, 2003.
2. J. W. Barnes and J. B. Chambers, "Flexible Job Shop Scheduling by Tabu Search," Technical Report Series, Graduate Program in Operations Research and Industrial Engineering, The University of Texas at Austin, ORP 96(09), 1996.
3. L. Burchard, M. Hovestadt, O. Kao, A. Keller and B. Linnert, "The Virtual Resource Manager: An Architecture for SLA-aware Resource Management," Proc. IEEE CCGrid 2004, IEEE Press, 2004.
4. K. Czajkowski, I. Foster, C. Kesselman, V. Sander and S. Tuecke, "SNAP: A Protocol for Negotiating Service Level Agreements and Coordinating Resource Management in Distributed Systems," Proc. 8th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS, 2002.
5. J. Cao, S. A. Jarvis, S. Saini and G. R. Nudd, "GridFlow: Workflow Management for Grid Computing," Proc. 3rd IEEE/ACM Int. Symp. on Cluster Computing and the Grid, ACM Press, 2003, pp. 198–205.
6. S. D. Peres, J. Roux and J. B. Lasserre, "Multi-resource shop scheduling with resource flexibility," European Journal of Operational Research 107 (1998), pp. 289–305.
7. E. Deelman, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, K. Blackburn, A. Lazzarini, A. Arbree, R. Cavanaugh and S. Koranda, "Mapping Abstract Complex Workflows onto Grid Environments," Journal of Grid Computing 1(1) (2003), pp. 25–39.
8. M. Erwin and D. William, "UNICORE: A Grid Computing Environment," Concurrency: Practice and Experience 14 (2002), pp. 1395–1410.
9. I. Foster, C. Kesselman, C. Lee, R. Lindell, K. Nahrstedt and A. Roy, "A distributed resource management architecture that supports advance reservation and co-allocation," Proc. International Workshop on Quality of Service, 1999, pp. 27–36.
10. F. Glover, "Tabu search Part I," ORSA Journal on Computing (1989), pp. 190–206.
11. F. Glover, "Tabu search Part II," ORSA Journal on Computing (1990), pp. 4–32.
12. M. Hovestadt, "Scheduling in HPC Resource Management Systems: Queuing vs. Planning," Proc. 9th Workshop on JSSPP at GGF8, LNCS, 2003, pp. 1–20.
13. D. Jackson, Q. Snell and M. Clement, "Core Algorithms of the Maui Scheduler," Proc. 7th Workshop on Job Scheduling Strategies for Parallel Processing, LNCS 2221, 2001, pp. 87–102.
14. I. Kacem, S. Hammadi and P. Borne, "Approach by Localization and Multi-objective Evolutionary Optimization for Flexible Job-Shop Scheduling Problems," IEEE Transactions on Systems, Man, and Cybernetics, Part C 32(1) (2002), pp. 1–13.
15. L. Keahey and K. Motawi, "The Taming of the Grid: Virtual Application Service," Argonne National Laboratory Technical Memorandum No. 262, May 2003.
16. H. Ludwig, "A service level agreement language for dynamic electronic services," Proc. 4th IEEE Workshop on Advanced Issues of E-Commerce and Web-Based Information Systems, IEEE Press, 2002.
17. S. McGough, "A Common Job Description Markup Language written in XML," Global Grid Forum, http://www.ggf.org, 2003.
18. L. Nassif, J. M. Nogueira, M. Ahmed, R. Impey and A. Karmouch, "Agent-based Negotiation for Resource Allocation in Grid," Proc. 3rd Workshop on Computational Grids and Applications, Summer Program LNCC, 2005.
19. D. M. Quan and O. Kao, "On Architecture for an SLA-aware Job Flows in Grid Environments," Proc. 19th IEEE International Conference on Advanced Information Networking and Applications (AINA 2005), IEEE Press, 2005, pp. 287–292.
20. D. M. Quan and O. Kao, "Mapping Grid job flows to Grid resources within SLA context," Proc. European Grid Conference (EGC 2005), LNCS, Springer Verlag, 2005, pp. 1107–1116.
21. N. Sadeh and Y. Nakakuki, "Focused Simulated Annealing Search: An Application to Job Shop Scheduling," Annals of Operations Research 60 (1996), pp. 77–103.
22. A. Sahai, "Automated SLA monitoring for Web Services," Proc. DSOM 2002, LNCS, Springer Verlag, 2002, pp. 28–41.