Grid Workflow Optimization with Inferential Reasoning

Edgardo Ambrosi (1,2), Laura Bocchi (1,3), Tiziana Ferrari (1), and Elisabetta Ronchieri (1)

(1) Istituto Nazionale di Fisica Nucleare - CNAF, Italy
(2) Dept. of Computer Science, University of Firenze, Italy
(3) Dept. of Computer Science, University of Bologna, Italy

Abstract. Grid workload management and scheduling enable the efficient distribution of tasks in Grid systems and allow their transparent execution by hiding the complexity of the Grid infrastructure at the fabric layer. We propose an approach to workload management and scheduling that aims at optimizing resource utilization (some optimization criteria are discussed). This is achieved through a reasoning activity: the reasoner analyzes static and run-time information about the system status to predict the probable future behavior of a workflow execution in the overall Grid infrastructure. The reasoner relies on a knowledge base modeled with Description Logics. In this work, starting from a set of use cases concerning Grid workflows, we model a pool of core Grid services and some typical inter-task dependencies (i.e., workflow patterns and dependencies among task requirements).

1 Introduction

In wide-area heterogeneous distributed systems, Workload Management (WM) is critical to guarantee the efficient usage of resources. The Grid is an example of a system based on the principle of resource sharing by large user communities. At present, scheduling in many modern Grid infrastructures relies only on static properties and pre-determined states of resources. Nevertheless, resource utilization can be enhanced by the addition of run-time information and of forecasting capabilities. For this reason, we propose an approach to WM and scheduling that extends existing solutions with a reasoning activity on Grid properties and states.

The proposed approach relies on the availability of monitoring information, offering an up-to-date snapshot of the system. This information, analyzed by the reasoner, allows the prediction of the future system status. The enhancement of resource utilization in WM is achieved by carrying out workflow manipulation, and by running reactive and proactive actions when specific conditions are predicted. In our framework, diagnosis and proactive self-healing are based on probabilistic and temporal Description Logics (DLs) and on inferential reasoning.

We propose the usage of two knowledge bases to support reasoning. The former models the pool of available resources and services; the latter complements it by describing a selected set of dependencies. In this paper, we present a few use cases and we identify the dependencies among the tasks of the workflows involved in such scenarios. Dependencies among different workflow schedules are also important; however, their analysis is left as future work.

The reasoning activity of the Grid WM requires a statistical characterization of the behavior of users and services. For this reason, we are considering a number of forecasting models and we propose the usage of accounting data collected from real-life Grids for the identification of more accurate inference rules.

The rest of the paper is organized as follows. Section 2 presents some Grid workflow use cases and the dependencies that we consider in the modeling phase. Section 3 describes the high-level view of our approach. Finally, Section 4 presents a preliminary discussion on the modeling work.

2 Grid Workflows

Grid workflows address the submission as a whole of a number of inter-dependent tasks, which constitute a workflow schedule. Typically, a user submits an abstract workflow describing a set of tasks to be executed, together with the corresponding inter-task dependencies. In an abstract workflow, the constituent tasks are not yet associated to a binding; conversely, in a concrete workflow all the component tasks are already bound to a particular service. Given an abstract workflow, the purpose of our framework is to allow a WM system to construct a set of concrete workflows starting from a set of individual and mutual task dependencies. The concrete workflows are selected according to some internal optimization rules discussed in Section 3.

We consider two types of inter-task dependencies: temporal and requirement-based. Temporal dependencies describe the execution ordering among tasks; examples are the sequence, the choice and the parallel task composition. A number of Grid implementations already offer a minimal level of coordination within a set of submitted jobs. In particular, Condor DAGMan [1] allows users to submit multiple jobs and to specify a causal relation among their executions by means of a Directed Acyclic Graph (DAG). Other workflow engines provide higher-level expressiveness, for instance by relying on Petri Nets (e.g., the Grid Job Handler, the YAWL engine, etc.).

In our approach we adopt a pattern-based high-level notation to define workflow schedules (an illustrative sketch is given below). In particular, we start from a preliminary set of use cases to outline a number of meaningful patterns, defined as a subset of those presented in [2]. The notion of compensating transactions as formalized in [3] is also considered. Our use cases are focused on workflows for data transfer with storage space reservation and network bandwidth reservation, for co-allocation of heterogeneous resources such as computing, storage and network bandwidth, and for bulk submission of executables.
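As an illustration only, the following Python sketch shows one possible way to represent an abstract workflow built from sequence, parallel and choice patterns. The class names (Task, Sequence, Parallel, Choice) and the executable strings are hypothetical placeholders; they do not correspond to any specific engine or to the notation of [2].

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical, minimal representation of an abstract workflow:
# tasks are not yet bound to concrete services, and temporal
# dependencies are expressed through composition patterns.

@dataclass
class Task:
    name: str
    executable: str  # placeholder: what to run; no service binding yet

@dataclass
class Sequence:
    steps: List["Node"]          # execute steps one after the other

@dataclass
class Parallel:
    branches: List["Node"]       # execute branches concurrently

@dataclass
class Choice:
    alternatives: List["Node"]   # execute exactly one alternative

Node = Union[Task, Sequence, Parallel, Choice]

# Example (data-transfer use case): reserve storage and bandwidth in
# parallel, then transfer, then either register the copy or clean up.
workflow = Sequence(steps=[
    Parallel(branches=[
        Task("reserve_storage", "reserve_storage.sh"),
        Task("reserve_bandwidth", "reserve_bandwidth.sh"),
    ]),
    Task("transfer", "transfer_data.sh"),
    Choice(alternatives=[
        Task("register_replica", "register_replica.sh"),
        Task("cleanup", "release_reservations.sh"),
    ]),
])
```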

Requirement-based dependencies specify the constraints among tasks that have to be met in order to execute them successfully (e.g., 'T2 and T1 have to be executed on the same machine'). They differ from task requirements, which describe the set of restrictions imposed by a single task in order to run successfully (e.g., 'T1 must be executed on a machine supplying a certain CPU power'); the sketch below illustrates the distinction.
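A minimal sketch, with invented attribute names, of how per-task requirements and requirement-based (inter-task) dependencies could be kept apart when checking a candidate binding:

```python
# Hypothetical sketch: per-task requirements vs. requirement-based
# (inter-task) dependencies. All names and values are illustrative.

# Requirements of a single task: restrictions the chosen service must meet.
task_requirements = {
    "T1": {"min_cpu_mhz": 2000},
    "T2": {"min_free_disk_gb": 10},
}

# Requirement-based dependencies: constraints that span several tasks
# and must hold for the binding as a whole, not for one task in isolation.
inter_task_constraints = [
    {"type": "same_host", "tasks": ["T1", "T2"]},
]

def binding_is_valid(binding, constraints):
    """Check a candidate binding (task -> host) against inter-task constraints."""
    for c in constraints:
        if c["type"] == "same_host":
            hosts = {binding[t] for t in c["tasks"]}
            if len(hosts) != 1:
                return False
    return True

# T1 and T2 bound to the same host satisfies the same_host constraint.
print(binding_is_valid({"T1": "ce01.example.org", "T2": "ce01.example.org"},
                       inter_task_constraints))  # True
```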

3 Optimization of Workflow Scheduling

Various optimization criteria can be adopted by the WM in the transformation of an abstract workflow into a concrete one. We provide an overview of a number of possible criteria, which rely on the knowledge of static information (e.g., the operating system provided by a service) and dynamic information (e.g., the instantaneous service load, the queuing time, etc.). Criteria can pursue different aims: the minimization of the single-task execution time, the minimization of the workflow execution time, the fairness of the load distribution, etc. Optimization rules are based on quantifiable metrics such as the atomic-task average queuing time, the atomic-task and/or workflow average execution time, the workflow reliability, and the distribution fairness; a possible metric-based selection rule is sketched below.

Optimization can be carried out in a number of ways, for example by workflow manipulation and by Grid infrastructure reconfiguration. Workflow manipulation is possible because a group of workflows can be treated by the WM as a whole, in a similar way as a pre-processor can optimize source code (based on a procedural programming language) to generate better object code. Grid reconfiguration, in turn, can be achieved through reactive and proactive actions. Reactive actions are triggered by the occurrence of a problem and aim at performing self-healing operations to minimize and control the consequences that the problem has on the system. For example, reactive actions can prevent the binding of an atomic task to a service whose current and/or predicted status cannot meet the workflow QoS requirements. On the other hand, proactive actions aim at avoiding predicted undesirable impairments before they occur. An example of proactive action is the usage of reservation in case of a foreseen resource shortage. Similarly, a proactive action can trigger the dynamic addition and/or removal of resources to/from Virtual Organization resource pools, according to predicted load fluctuations.
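To make the idea of metric-based selection concrete, the following Python sketch scores candidate concrete workflows by combining a predicted completion time with a load-fairness penalty. The weights, field names and candidate values are hypothetical; they illustrate one possible optimization rule, not the rule actually used by the WM.

```python
# Hypothetical scoring of candidate concrete workflows.
# predicted_queue_s / predicted_exec_s / max_service_load would come from
# monitoring data and the reasoner's forecasts; here they are hard-coded.

candidates = [
    {"id": "wf-A", "predicted_queue_s": 120, "predicted_exec_s": 600, "max_service_load": 0.9},
    {"id": "wf-B", "predicted_queue_s": 300, "predicted_exec_s": 540, "max_service_load": 0.4},
]

# Illustrative weights: favour short completion time, penalize bindings
# that concentrate work on already-loaded services (distribution fairness).
W_TIME, W_FAIRNESS = 1.0, 400.0

def score(c):
    completion = c["predicted_queue_s"] + c["predicted_exec_s"]
    return W_TIME * completion + W_FAIRNESS * c["max_service_load"]

best = min(candidates, key=score)
print(best["id"])  # the concrete workflow selected by this rule
```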

4 Description Logics and Inferential Reasoning

We propose the modeling of resources and workflows in Grids through DLs with temporal and probabilistic extensions (as also suggested in [4] for process-centric workflow specifications), where knowledge is constructed through DL information modeling activities [5]. DL is a formalism supporting knowledge specification and logic rules. It is equipped with a rich set of syntactic terms and operators to express concepts, logical conjunction and disjunction, relations, inverse relations, classifiers and sub-classifiers, relation quantifiers, relation cardinality, etc. Many relevant DL extensions have been proposed, such as the addition of probabilistic and temporal operators. These are particularly relevant in this context, given the incomplete and/or inaccurate information about future resource states.

The reasoning activity relies on two components, namely Knowledge and Deduction. The former comprises: (1) the knowledge of the inter-dependencies among tasks, (2) the knowledge of the current and past state of physical resources, and (3) the knowledge of the inter-dependencies among WFs (left as future work). Deduction, relying on probabilistic/temporal logic rules and inferential procedures, provides the capacity to perform simulations (e.g., to predict the system status at a future instant of time) starting from a potentially incomplete or inaccurate information base. Inferential procedures make it possible to verify the workflow semantics and to compute, through deduction, a large amount of hidden (i.e., implicit) information about workflows.

In DL, a set of logical rules is used for deductive activities. These rules need to take into account a number of aspects, including the intrinsic Grid service behavior and the service usage patterns of the end-users. Rule validation is critical, as scheduling depends on the model accuracy. We propose that the scheduler be instrumented with a number of different forecasting methods: mean and median estimators, and autoregressive methods (a minimal sketch of such estimators is given below). In addition to this, we plan to define forecasting models based on information provided by accounting systems from real-life Grids, such as [6].

Our workflow model relies on two elements: the TASK atomic object and the workflow WF class. The WF class comprises a set of inter-dependent tasks or workflows, together with the related inter-dependencies, which are modeled as relationships between classes. The usage of DLs allows the specification of both necessary and sufficient conditions that a TASK and a WF need to obey in order to be members of a given class.
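As a minimal sketch of the kinds of estimators mentioned above (not the scheduler's actual implementation), the following Python code predicts the next value of a monitored quantity, e.g., a service queuing time, with a mean estimator, a median estimator and a first-order autoregressive (AR(1)) model fitted by least squares. The history values are invented.

```python
import statistics

# Invented history of a monitored quantity (e.g., queuing time in seconds).
history = [110.0, 130.0, 125.0, 160.0, 150.0, 170.0, 165.0]

def mean_forecast(xs):
    """Predict the next value as the sample mean."""
    return statistics.mean(xs)

def median_forecast(xs):
    """Predict the next value as the sample median (robust to outliers)."""
    return statistics.median(xs)

def ar1_forecast(xs):
    """Predict the next value with an AR(1) model x[t] = a + b * x[t-1],
    where a and b are fitted by ordinary least squares on consecutive pairs."""
    prev, curr = xs[:-1], xs[1:]
    n = len(prev)
    mean_p, mean_c = sum(prev) / n, sum(curr) / n
    cov = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    var = sum((p - mean_p) ** 2 for p in prev)
    b = cov / var if var else 0.0
    a = mean_c - b * mean_p
    return a + b * xs[-1]

print(mean_forecast(history), median_forecast(history), ar1_forecast(history))
```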

References

1. The Directed Acyclic Graph Manager (DAGMan). http://www.cs.wisc.edu/condor/dagman/.
2. van der Aalst W. M. P., ter Hofstede A. H. M., Kiepuszewski B., and Barros A. P. Workflow Patterns. Distrib. Parallel Databases, 14(1):5–51, 2003.
3. Bocchi L., Laneve C., and Zavattaro G. A Calculus for Long Running Transactions. In Proceedings of FMOODS 2003, volume 2884 of Lecture Notes in Computer Science, pages 124–138. Springer Verlag, 2003.
4. Barros A. P. and ter Hofstede A. H. M. Modeling Extensions for Concurrent Workflow Coordination. In CoopIS, pages 336–347, 1999.
5. Ambrosi E., Bianchi M., Gambosi G., Gaibisso C., and Lombardi F. A Description Logic Based Grid Inferential Monitoring and Discovery System. In The 2005 International Conference on Grid Computing and Applications. CSREA Press, 2005.
6. Piro R. M., Guarise A., and Werbrouck A. An Economy-based Accounting Infrastructure for the DataGrid. In Proceedings of the 4th Int. Workshop on Grid Computing, pages 202–204. IEEE Computer Society Digital Library, 2003.