Robust Data Placement in Urgent Computing Environments

Jason M. Cope¹, Nick Trebon³, Henry M. Tufo¹, and Pete Beckman²

¹ Department of Computer Science, University of Colorado at Boulder
UCB 430, Boulder, CO 80309
{jason.cope, henry.tufo}@colorado.edu

² Mathematics and Computer Science Division, Argonne National Laboratory
9700 S. Cass Ave, Argonne, IL 60439
[email protected]

³ Department of Computer Science, University of Chicago
1100 East 58th Street, Chicago, IL 60637
[email protected]
Abstract

Distributed urgent computing workflows often require data to be staged between multiple computational resources. Since these workflows execute in shared computing environments where users compete for resource usage, it is necessary to allocate resources that can meet the deadlines associated with time-critical workflows and can tolerate interference from other users. In this paper, we evaluate the use of robust resource selection and scheduling heuristics to improve the execution of tasks and workflows in urgent computing environments that are dependent on the availability of data resources and impacted by interference from less urgent tasks.
1 Introduction
Urgent computing environments provide elevated resource access for time-critical applications and workflows. Time-critical applications, such as severe weather forecasting systems, are often deadline-driven and the performance of these workflows is influenced by several Quality of Service (QoS) metrics. Heuristics that insulate urgent computing workflows from fluctuations in the QoS parameters are necessary to reliably and predictably schedule these workflows. We are evaluating the use of robust computing techniques for scheduling urgent computing workflows that are subject to QoS fluctuations.
Prior robust computing research provides heuristics that insulate applications from unexpected variations in the computing environment. This work focused on developing heuristics for mapping application executions in heterogeneous computing environments. We apply this prior work to scheduling urgent computing tasks with storage and data-related constraints. Our contributions include (1) defining data-specific robustness constraints for urgent computing resource allocations, (2) developing resource allocation heuristics that integrate these constraints, and (3) evaluating our proposed robust storage resource allocation heuristics in simulated urgent computing environments. Our goal is to illustrate how robust resource allocations for data-related activities can improve the predictability and reliability of urgent computing workflow execution. While data placement covers activities such as data transfer, replication, and staging [16], we limit our evaluation to the allocation of storage and the positioning of data for task assignments on resources.

In this paper we present our evaluation of robust storage resource availability metrics and heuristics for urgent computing applications. Urgent computing environments, infrastructure, and data requirements are described in Section 2. Section 3 describes related research in the areas of Grid resource provisioning, metascheduling, and robust task scheduling. Section 4 describes our formulation of the robustness metrics. Section 5 describes our proposed heuristics. Section 6 presents our evaluation of the robust data resource allocation capabilities in a simulated urgent computing environment. In the final sections of this paper, we present our future work and conclusions.
Site      CPUs (Alloc/Total)   TFlops (Alloc/Total)   Storage (TB)
UC/ANL    316 / 316            0.61 / 0.61            4
NCSA      1744 / 1744          10.23 / 10.23          60
TACC      64 / 5840            0.68 / 62.16           106.5
NCAR      256 / 2048           0.72 / 5.73            6
IU        NA / 3072            NA / 30.6              266
Total     2380 / 13020         12.24 / 100.93         442.5

Table 1. SPRUCE urgent computing resources provided by TeraGrid resource providers. The fraction of the computational resources (allocated/total) and total storage space are listed on a per-site basis.

Figure 1. Topology of the TeraGrid network.
2 Background

2.1 Urgent Computing and SPRUCE
Our evaluation of robust computing techniques focuses on their use in urgent computing environments. An example urgent computing environment is SPRUCE [2]. SPRUCE enhances Grid and high-performance computing environments with the resource allocation, authorization, and selection capabilities required for time-critical applications. SPRUCE integrates with Globus-based Grid resource managers [9], local resource managers such as PBS [13] and Cobalt [10], and Condor high-throughput computing environments [27]. The SPRUCE Advisor provides resource selection capabilities for urgent computing workflows [28, 29]. Additional work on urgent computing environments outlines application data requirements and capabilities [6], evaluates authorization and provisioning components for Web services [7], and evaluates the impacts of data management processes on urgent computing resource usage [8].

Recently, several urgent computing applications and workflows have been implemented. They include severe weather forecasting workflows such as Linked Environments for Atmospheric Discovery (LEAD) [11] and the Southeastern Universities Research Association (SURA) Coastal Ocean Observing and Prediction (SCOOP) storm surge modeling applications [3]. LEAD and SCOOP integrate with SPRUCE, and both of these applications use the SPRUCE high-priority resource allocation capabilities to access TeraGrid computational resources.
2.2 Data Transport and Storage Infrastructure
Our focus in this evaluation is on supporting the data services and resources available to urgent computing environments. There are several data resources available in urgent computing environments, including the data storage infrastructure and the networking infrastructure used to transfer data between storage resources. These storage and network resources are a heterogeneous collection of resources with different storage capacities, throughputs, and management policies. Since SPRUCE is supported by the TeraGrid, the SPRUCE environment integrates several TeraGrid resources, including the TeraGrid network and the storage resources integrated with the TeraGrid computational resources. The geographic topology of the TeraGrid is depicted in Figure 1. The TeraGrid network provides a 40 Gigabit/second backbone between Chicago and Los Angeles and additional 10 Gigabit/second links to the other TeraGrid resource providers. The most common storage resources available in urgent computing environments are disk-based storage systems attached to computational resources or archival storage systems. On the TeraGrid, the sizes of these filesystems range from single-terabyte cluster filesystems to multi-petabyte archival storage systems. The specifications of the TeraGrid storage systems attached to SPRUCE-enabled TeraGrid resources are described in Table 1.
2.3 Urgent Computing Data Placement Problem Statement
In this paper, we evaluate the scheduling of urgent computing applications in storage-constrained, heterogeneous environments similar to the SPRUCE urgent computing environment deployed on the TeraGrid. We examine the general problem of assigning tasks to heterogeneous computing resources. We assume that these tasks are composed of several subtasks that include data placement activities, such as input and output data transfers, and computation activities. Every computational resource is coupled to a single storage resource. Each task can interact with only a single storage resource during the task computation phase, but can interact with multiple storage resources during the data placement phases. Each task is assigned a priority. There are four task priorities available in this environment, ranging from non-urgent to extremely urgent. For each computational resource in this environment, tasks are assigned to application management queues based on the priority assigned to the task. The implementation of priorities varies across the resources.

The expected behavior of user applications is known and can be described to the robust resource selection tools. Each task is assumed to execute frequently enough that the expected execution time, the expected application queue wait time, the expected input data requirements, and the expected output data requirements are known during application scheduling in this computing environment. Although the expected behavior is known, it is also assumed that application requirements will change.

We defined a simple model, Equation 1, to describe the data requirements of each application. Each application consists of a set of data placement tasks that bring data to the application, a task that computes a result based on the data and other configuration parameters, and a set of data placement tasks that move the data to another resource if necessary. The storage usage for a task assigned to a storage resource is the sum of all input and output data, including temporary and permanent output data products:

D_{t_j} = \sum_{i=0}^{I} D_{in_i} + \sum_{l=0}^{L} D_{tmp_l} + \sum_{k=0}^{K} D_{out_k}    (1)
The utilization of a storage resource, D_{s_i}, is the sum of the storage requirements of each task assigned to the storage and computation resource pair. Equation 2 describes the resource utilization model; this system model is also illustrated in Figure 2. D_{s_i} is time varying and depends on the tasks assigned to s_i:

D_{s_i} = \sum_{j=0}^{J} D_{t_j}    (2)
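To make the storage model concrete, the following Python sketch computes the per-task usage of Equation 1 and the per-resource utilization of Equation 2, along with the free-space quantity used in the scheduling constraint below. The Task and StorageResource classes and their field names are illustrative assumptions, not part of the SPRUCE implementation.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    inputs: List[float]   # D_in: input data product sizes (GB)
    temps: List[float]    # D_tmp: temporary data sizes (GB)
    outputs: List[float]  # D_out: permanent output data sizes (GB)

    def storage_usage(self) -> float:
        # Equation 1: D_tj is the sum of all input, temporary, and output data.
        return sum(self.inputs) + sum(self.temps) + sum(self.outputs)

@dataclass
class StorageResource:
    total: float                       # total capacity (GB)
    tasks: List[Task] = field(default_factory=list)

    def utilization(self) -> float:
        # Equation 2: D_si is the sum of D_tj over all tasks assigned to s_i.
        return sum(t.storage_usage() for t in self.tasks)

    def free_space(self) -> float:
        # Unallocated storage, D_free(s_i); an assignment is feasible only
        # while utilization() stays within the total capacity.
        return self.total - self.utilization()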
Our goal is to identify execution schedules for urgent computing applications that provide sufficient storage space to all tasks assigned to a resource. That is, D_{s_i} ≤ D_{free}(s_i), where D_{free}(s_i) is the amount of unallocated storage for resource s_i. In large urgent computing environments with many tasks and many computational and storage resource pairs, quickly identifying the optimal schedule that satisfies these constraints is difficult. Identifying a robust task schedule that can tolerate fluctuations in storage resource usage is also computationally expensive. In this paper, we specifically examine the development of storage availability metrics and resource allocation heuristics that schedule tasks to the best resource identified by the heuristics based on those metrics. We also analyze the use of resource robustness metrics to help heuristics identify resources that can tolerate unexpected fluctuations in storage resource availability.

Figure 2. Task usage model of storage resources coupled to an urgent computing resource.
3 Related Work
Areas of work related to the research and evaluation presented in this paper include Grid resource allocation methods, resource monitoring and performance forecasting, and scheduling algorithms and heuristics for robust workflow execution. In multiuser high-performance computing environments, resource managers are typically used to manage, schedule, and allocate resources for executing the breadth of jobs submitted to a resource while abiding by job- and resource-specific constraints. Example local resource managers include PBS [13] and Grid resource managers such as the Globus Toolkit's [12] Grid Resource Allocation and Management service (GRAM) [9]. Additional resource provisioning and scheduling tools are available to meet application-specific requirements. These resource provisioning tools include execution management tools, such as the many-task computing capabilities of Falkon [23], and the bandwidth reservation capabilities of the NLR Sherpa and DVS tools and the ESNet TeraPaths toolset [14]. Advanced resource management capabilities that leverage existing capabilities include co-scheduling capabilities [18], advanced reservation support [22], and workflow scheduling and planning tools [24].

Data placement algorithms and tools are often used in Grid computing workflows to coordinate data transfers or storage with applications. Stork provides a batch scheduler for managing data placement tasks in a variety of computing environments [17]. Data-aware application scheduling has also been investigated in the GFarm project using Grid filesystems [31]. Workflow planning tools, such as Pegasus, have used genetic algorithms and other heuristics to map data and applications to resources in storage-constrained environments [24]. Asynchronous data transfers using the Data Replication Service have been used to manage the placement of data for large scientific workflows in Grid computing environments [5]. Algorithms with task load and data availability metrics have been studied for scheduling applications and data placement activities in data Grids [25].

Resource monitoring and performance tools are often used to guide resource management and scheduling processes. The Network Weather Service (NWS) [30] is an example resource monitoring tool that provides resource Quality of Service (QoS) and availability data and forecasts. The NWS data can be used by application schedulers or performance models to estimate the performance of a schedule or application execution [15]. NWS forecasting tools have been used to estimate the load and queue wait time for a high-performance computing system [20]. This load prediction tool has been used to produce workflow schedules that minimize the wait time for the workflow applications to execute [21] and to estimate whether an application can meet an execution deadline [29].

Another related area of research is robust resource allocation and scheduling in heterogeneous computing environments. A recent survey of robustness methods highlights several robustness computation methods [4]. Additional work has produced a robustness radii method that represents the robustness of a task schedule given multiple performance metrics [1, 26]. In this paper, we focus on applying the robustness radius computation and heuristics to scheduling data placement tasks in urgent computing workflows.
4 Data Placement and Resource Allocation Metrics
In the following section, we present the metrics used by our resource allocation heuristics to assign tasks to urgent computing resources. The goal of the robustness-based metrics is to determine the resilience of resources and applications to variations in availability. The metrics described range from simple resource usage metrics to more detailed metrics that consider the data requirements of the applications assigned to a resource.

Applications using data and computational resources in urgent computing environments must be resilient to unexpected fluctuations in the availability and performance of the required resources. A robust urgent computing data system ensures that the data requirements of urgent computing tasks are achievable under variable resource loads. For storage resources, a primary uncertainty is the availability of the storage space used by urgent computing applications and workflows. Since data storage in urgent computing environments is used by a variety of applications and the amount of data generated by these applications is seldom constant, the availability of storage space is uncertain.

We evaluated the robustness radii method to quantify the robustness of an urgent computing storage system based on its data placement tasks. To compute this robustness quantitatively, we followed the FePIA process [1] and based our solution on dynamic robust scheduling [19]. We defined the usage of a storage resource, D_{used_j}, as a function of the total storage of the resource (D_{total_j}), the amount of free storage on the resource (D_{free_j}), and the sum of all pending data allocations for applications scheduled to the resource (D_i). This is the simplest of several metrics we defined to determine the availability of free storage on a resource j at a point in time t; Equation 3 describes it mathematically:

D_{used_j}(t) = D_{total_j}(t) - D_{free_j}(t) + \sum_{i=0}^{J} D_i(t)    (3)
Using the robustness radius formulation [19, 1], the robustness radius for the storage system is represented by Equation 4. For each storage resource, the robustness radius is calculated from the resource's storage usage at time t (D_{used_j}(t)), the maximum storage usage in the system (β_{D_{used}}(t)), a storage space tolerance (τ_{used}), and the number of data-producing tasks assigned to resource j (J). This formula describes the amount of data that each application assigned to a storage resource can consume before the task assignment is no longer considered robust with respect to the output data requirements:

r(D_{used_j}, t) = (τ_{used} + β_{D_{used}}(t) - D_{used_j}(t)) / \sqrt{J}    (4)
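As a concrete reference, here is a minimal Python sketch of Equations 3 and 4. The function names and flat argument lists are assumptions made for illustration, not the paper's implementation.

import math

def used_storage(total, free, pending):
    # Equation 3: allocated storage plus all pending allocations on resource j.
    return total - free + sum(pending)

def robustness_radius(d_used, beta, tau, num_tasks):
    # Equation 4: per-task slack before the assignment stops being robust.
    # beta is the maximum storage usage in the system, tau is the storage
    # space tolerance, and num_tasks is J, the number of data-producing
    # tasks assigned to the resource.
    if num_tasks == 0:
        return float("inf")  # no data-producing tasks can violate the bound
    return (tau + beta - d_used) / math.sqrt(num_tasks)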
If the input data products for an application are known, we can leverage this information to better assess the robustness of an application schedule. We devised an alternative metric that includes the input data requirements of all applications assigned to a resource. In Equation 5, D_{io_j}(t) sums over the set of unique, shared input data used by the applications on resource j:

D_{io_j}(t) = \sum_{f=0}^{F} D_{in_f}    (5)
The storage robustness radius with integrated application input data requirements, Equation 6, is calculated from the resource's storage allocation of shared files at time t (D_{io_j}(t)), the maximum amount of shared file data available on a storage resource in the system (β_{D_{io}}(t)), a storage space tolerance (τ_{io}), and the number of data-producing tasks assigned to resource j (J). This formula describes the amount of data that each application assigned to a storage resource can consume before the task assignment is no longer considered robust with respect to its input data requirements:

r(D_{io_j}, t) = (τ_{io} + β_{D_{io}}(t) - D_{io_j}(t)) / \sqrt{J}    (6)
The robustness of the urgent computing system at any point in time is the minimum robustness radius over all storage resources in the system. Using these robustness metrics, we developed the data placement heuristics described in the next section.
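A short sketch of this system-level metric, reusing the robustness_radius function from the previous sketch; representing each resource as a (d_used, num_tasks) pair is again an illustrative assumption.

def system_robustness(resources, beta, tau):
    # System robustness is the minimum robustness radius over all
    # storage resources; `resources` holds (d_used, num_tasks) pairs.
    return min(robustness_radius(d, beta, tau, j) for d, j in resources)

# Example: the weakest resource bounds the robustness of the whole system.
print(system_robustness([(400.0, 4), (750.0, 9), (100.0, 1)],
                        beta=800.0, tau=50.0))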
5 Robust Task Scheduling Heuristics

5.1 Non-robust Heuristics
In this subsection, we describe heuristics that are based on metrics other than resource robustness metrics. These heuristics match tasks to resources by finding the minimum or maximum metric value and scheduling each task on the optimal resource. All of the heuristics are immediate-mode heuristics and schedule tasks as they arrive. No task migration or re-scheduling occurs in these heuristics.

The Maximum Available Storage (MAS) heuristic searches the list of storage resources for the resource with the largest amount of unallocated storage space (SR[i].freeSpace) and assigns the task to that resource; its pseudocode is shown in Heuristic 1. The Maximum Data Available (MDA) heuristic identifies the storage resource with the highest percentage of the required data (SR[i].dataAvail(T[j])) for a given task and schedules the task on that resource (Heuristic 2). The Minimum Task Load (MTL) heuristic schedules a task on the resource with the minimum number of tasks assigned to that node (|SR[i].jobQueue|). The Urgent Minimum Task Load (UMTL) heuristic extends MTL by assigning a task to the node with the minimum number of tasks of equivalent or greater priority (SR[i].numUrgentJobs(T[j].priority)); its pseudocode is shown in Heuristic 3.

Heuristic 1 Maximum Available Storage Task Assignment
Require: |SR| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SR[0].freeSpace
    M_ID ⇐ 0
    for i = 0 to |SR| − 1 do
      if SR[i].freeSpace > M then
        M ⇐ SR[i].freeSpace
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SR[M_ID]
  end for

Heuristic 2 Maximum Data Available Task Assignment
Require: |SR| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SR[0].dataAvail(T[j])
    M_ID ⇐ 0
    for i = 0 to |SR| − 1 do
      if SR[i].dataAvail(T[j]) > M then
        M ⇐ SR[i].dataAvail(T[j])
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SR[M_ID]
  end for

Heuristic 3 Urgent Minimum Task Load Task Assignment
Require: |SR| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SR[0].numUrgentJobs(T[j].priority)
    M_ID ⇐ 0
    for i = 0 to |SR| − 1 do
      if SR[i].numUrgentJobs(T[j].priority) < M then
        M ⇐ SR[i].numUrgentJobs(T[j].priority)
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SR[M_ID]
  end for
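The same selection loop underlies all four heuristics, so a compact Python sketch can express them as scoring functions. The resource and task attribute names below are assumptions about the simulator's interface, not its actual API.

from typing import Callable, List

def select_resource(resources: List, score: Callable, maximize: bool = True):
    # Ties go to the earliest resource in the list, matching the strict
    # comparisons in the pseudocode above.
    best = max if maximize else min
    return best(resources, key=score)

def mas(resources, task):
    # Maximum Available Storage: most unallocated space.
    return select_resource(resources, lambda r: r.free_space)

def mda(resources, task):
    # Maximum Data Available: largest fraction of the task's input data.
    return select_resource(resources, lambda r: r.data_avail(task))

def mtl(resources, task):
    # Minimum Task Load: fewest tasks queued on the resource.
    return select_resource(resources, lambda r: len(r.job_queue),
                           maximize=False)

def umtl(resources, task):
    # Urgent Minimum Task Load: fewest tasks at or above the task's priority.
    return select_resource(resources,
                           lambda r: r.num_urgent_jobs(task.priority),
                           maximize=False)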
5.2 Robust Heuristics

The goal of the robust heuristics evaluated in this paper is to maximize the robustness of the computing environment and resources so that unanticipated fluctuations in storage resource usage are tolerable. This goal is complementary to urgent computing environments because it ensures that there are tolerances for unanticipated changes in application or system behavior.

We utilize feasible robustness heuristics that are variants of the non-robust heuristics defined in Section 5.1. All of these heuristics can use either of the robustness metrics defined in Section 4 when computing the feasible set of resources or the maximum robustness. Unlike the non-robust heuristics, the feasible robustness heuristics only consider assigning tasks to resources that meet a robustness constraint, α, such that R(t) ≥ α. Each heuristic first constructs a sublist of feasibly robust resources (SRf) from all of the resources and then executes the underlying heuristic to schedule the outstanding tasks. The goal of these heuristics is to produce only robust task-resource schedules. If no resource is considered robust, scheduling is delayed until at least one robust resource becomes available.

The Feasible Robustness Maximum Available Storage (FRMAS) heuristic searches the list of feasibly robust storage resources for the resource with the largest amount of unallocated storage space (SRf[i].freeSpace) and assigns the task to that resource; its pseudocode is shown in Heuristic 4. The Feasible Robustness Maximum Data Available (FRMDA) heuristic identifies the storage resource that meets the robustness constraint with the highest percentage of the required data (SRf[i].dataAvail(T[j])) for a given task and schedules the task on that resource (Heuristic 5). The Feasible Robustness Urgent Minimum Task Load (FRUMTL) heuristic extends the minimum task load approach by assigning a task to the feasibly robust resource with the minimum number of tasks of equivalent or greater priority (SRf[i].numUrgentJobs(T[j].priority)); its pseudocode is shown in Heuristic 6.
Heuristic 4 Feasible Robustness, Maximum Available Storage Task Assignment
Require: |SRf| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SRf[0].freeSpace
    M_ID ⇐ 0
    for i = 0 to |SRf| − 1 do
      if SRf[i].freeSpace > M then
        M ⇐ SRf[i].freeSpace
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SRf[M_ID]
  end for
Heuristic 5 Feasible Robustness, Maximum Data Available Task Assignment
Require: |SRf| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SRf[0].dataAvail(T[j])
    M_ID ⇐ 0
    for i = 0 to |SRf| − 1 do
      if SRf[i].dataAvail(T[j]) > M then
        M ⇐ SRf[i].dataAvail(T[j])
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SRf[M_ID]
  end for
Heuristic 6 Feasible Robustness, Urgent Minimum Task Load Task Assignment
Require: |SRf| > 0 ∧ |T| > 0
  for j = 0 to |T| − 1 do
    M ⇐ SRf[0].numUrgentJobs(T[j].priority)
    M_ID ⇐ 0
    for i = 0 to |SRf| − 1 do
      if SRf[i].numUrgentJobs(T[j].priority) < M then
        M ⇐ SRf[i].numUrgentJobs(T[j].priority)
        M_ID ⇐ i
      end if
    end for
    Schedule T[j] on SRf[M_ID]
  end for
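A sketch of the feasible-robustness wrapper shared by FRMAS, FRMDA, and FRUMTL, assuming the scoring helpers from the Section 5.1 sketch; radius stands for either robustness metric (Equation 4 or 6).

def feasible_schedule(resources, task, alpha, radius, heuristic):
    # Keep only resources whose robustness radius meets the constraint
    # alpha, then let the underlying heuristic (e.g., mas or umtl)
    # choose among them.
    feasible = [r for r in resources if radius(r) >= alpha]
    if not feasible:
        return None  # delay scheduling until a robust resource appears
    return heuristic(feasible, task)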
6 Evaluation of Robust Scheduling and Data Placement Heuristics

6.1 Simulation Setup
In this evaluation, we present the results of our experiments using robustness metrics for scheduling data-intensive tasks in urgent computing environments. These experiments were conducted in a simulated urgent computing environment we have used in prior work to evaluate the performance of data management processes under various computing environment conditions [8]. For this evaluation, we constructed a 20-node urgent computing environment. Each node was configured with one petabyte of storage that could be accessed at 120 megabytes/second. The simulations assumed that the network capacity connecting the resources was large enough that no network contention issues were noticeable. Each storage resource scheduled applications through an application queue and tracked files stored on the resource through a file registry. Each storage resource was capable of querying the state of other storage resources' application queues, utilization data, and file registries. Our simulated user community submitted a variety of applications with variable data requirements and execution times. A resource selection and scheduling tool was layered on top of the environment and was responsible for distributing tasks to storage resources using the metrics and heuristics described in Sections 4 and 5.

For these simulations, we generated trace data to simulate the usage of the storage resources. The traces included application profiles and storage resource utilization data. Our initial experiments were constructed such that a set of application tasks with similar execution requirements, but heterogeneous data requirements, were executed in the computing environment. Each application had an execution wait time of 300 seconds and an execution time of 300 seconds. The priority of each submitted task was the same. All 1000 application submissions were separated by 600 seconds. This created a low-utilization computing environment in which multiple applications rarely executed simultaneously. All of the tasks were capable of using 1000 files tracked by the storage resource file registries. In additional experiments, we varied the arrival rate and execution time of the applications to simulate increased competition for resource usage.

The storage resources were initialized with data sets distributed to simulate various use cases and environmental conditions. First, several traces were generated to simulate the distribution of data across the 20 storage resources using a gamma distribution. Three file distribution scenarios were considered: nearly all files were located on a single resource, most of the files were located on at least 25% and at most 50% of the resources, and the files were evenly distributed across all of the resources.
Figure 3. Distribution of files used by tasks across the resources in the experiment. The alpha and beta parameters define the shape of the gamma distribution used in these experiments.

Figure 4. Number of requests by tasks for each file. The a parameter defines the shape of the Zipf distribution and N defines the maximum number of files an application can access.
Figure 5. Average distribution of file sizes for the data placement experiments. The mu and std parameters define the shape of the normally distributed file size data.
Figure 3 illustrates the file distribution scenarios. Several traces of application usage of the files in the system were also generated. The file usage traces were generated using a Zipf-like distribution; Figure 4 illustrates the usage of files by applications. The distributions had varying numbers of commonly used files to simulate various degrees of file reuse between applications. Finally, we generated data describing the sizes of the files stored in the simulation environment. We used a normal distribution to generate files of varying sizes; the means and standard deviations of the distributions varied, and the data we used is illustrated in Figure 5. We generated 30 different file size traces for each file size distribution.
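The traces can be reproduced in outline with numpy. The mapping of the paper's (alpha, beta), a, and (mu, std) parameters onto these distribution calls, and the log-scale interpretation of the file sizes suggested by Figure 5's axis, are assumptions made for illustration.

import numpy as np

rng = np.random.default_rng(seed=0)
NUM_RESOURCES, NUM_FILES = 20, 1000

# File placement (Figure 3): gamma-distributed resource IDs; a small scale
# concentrates files on a few resources, a large one spreads them out.
resource_ids = np.minimum(
    rng.gamma(shape=1.0, scale=3.0, size=NUM_FILES).astype(int),
    NUM_RESOURCES - 1,
)

# File popularity (Figure 4): Zipf-like requests so a few files are shared
# by most tasks; numpy's zipf requires an exponent a > 1.
requested_files = rng.zipf(a=1.4, size=10_000) % NUM_FILES

# File sizes (Figure 5): normal draws in log10(MB) space using the mu/std
# parameters, truncated below at 1 MB.
file_sizes_mb = 10 ** np.maximum(
    rng.normal(loc=2.0, scale=2.0, size=NUM_FILES), 0.0
)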
6.2 Low Utilization Resource Selection Experiments
In the first experiment, we evaluated the use of the data-aware task scheduling heuristics in a low-utilization environment. The low resource utilization caused the robust and non-robust resource selection tools to schedule tasks to the same set of resources because the feasible set of resources was every resource in the environment. Since the results were similar, we present only the non-robust results for this experiment. We evaluated the MDA, MTL, and MAS heuristics for all of the generated storage resource usage scenarios. Figure 6 illustrates the average amount of data transferred for each application to run based on the resource selected by the heuristic.

Figure 6. Amount of data transferred in the computing environment for the MDA, MAS, and MTL heuristics with maximum input data set sizes (N) of 10, 50, and 100 files. The file distribution access parameter (alpha) was evaluated at 1.01, 1.4, and 2.0. The file resource distribution parameter (beta) was held at 0.1.

In these experiments, we monitored the amount of data transferred between the storage resources for task input data. The amount of data from these transfers was much larger than the amount of data generated by the applications. In this experiment, the file usage distribution varied. The MDA heuristic with a central file distribution scheduled all of the tasks on the resource that held the required files, so no files were transferred. While this reduced file transfer costs, it did not utilize the other resources in the environment.

Figure 7 illustrates the average amount of data transferred for each application to run based on the resource selected by the heuristic when the file placement on the storage resources varied between several different distributions. As in the previous experiment, the MDA heuristic with a central file distribution transferred very few files.

Figure 7. Amount of data transferred in the computing environment for the MDA, MAS, and MTL heuristics with a maximum input data set size (N) of 100 files. The file distribution access parameter (alpha) was evaluated at 1.01, 1.4, and 2.0, and the file resource distribution parameter (beta) was evaluated at 0.1, 3.0, and 12.0.
6.3 High Utilization Resource Selection Experiments
In the high utilization experiments, we evaluated the robust scheduling heuristics when many tasks competed for usage of the resources. We used the same heterogeneous file and storage data distributions used in the previous experiments. We present the results from a subset of the data (N = 100, mu = 2, std = 2, and beta = 12). For the experiments in this section, we varied the arrival period of the tasks between 15, 30, and 60 seconds. Applications in this experiment also had variable priorities; we assigned 25% of the tasks a higher priority than the others. The task traces with shorter arrival periods induced congestion in the system, since more tasks computed simultaneously.

In the first experiment for a task-congested system, we evaluated the heuristics using the robustness metric described by Equation 4. These experiments varied the feasibility constraint over a range of zero to 7.5. A constraint value of zero always yielded a set of feasible resources equal to the full set of resources. As the task load increased, a resource's robustness value sometimes fell below the feasibility constraint and the resource was removed from the feasible resource list. The amounts of data transferred between the storage resources by each heuristic for each task arrival period are plotted in Figures 8, 9, and 10.

Figure 8. Amount of data transferred in the computing environment for the FRMDA, FRMAS, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 60 seconds.

Figure 8 illustrates the amount of data transferred between storage resources in the simulation environment so that the input data requirements of the tasks, with a 60-second arrival period, were met and the tasks could run. The FRMTL and FRMAS heuristics transferred a constant amount of data between the resources regardless of the feasibility constraint. FRMTL transferred significantly less data than FRMAS because the FRMTL heuristic guided tasks based on the task load of a resource while FRMAS transferred data based on the availability of storage resources. The task utilization in the system was low, so the implementation of FRMTL used a subset of all of the resources and several resources remained idle. The FRMTL resource usage pattern therefore required fewer data transfers, since the core set of files reused between applications was only moved to the subset of resources on which the tasks ran. FRMAS utilized all of the resources in the environment and replicated the core set of files used by each application to all of the resources. Conversely, FRMTL oversubscribed tasks to a subset of the resources and left other resources idle. The FRMDA heuristic required the fewest data transfers between the storage resources when the feasibility constraint was low. When the constraint was zero, the heuristic assigned each task to the resource that stored most of the files in the computing environment. As the constraint increased and more tasks were assigned to resources, the FRMDA heuristic scheduled tasks on other resources and transferred the required data to those resources.

Figure 9. Amount of data transferred in the computing environment for the FRMDA, FRMAS, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 30 seconds.

In the next experiment, we increased the task congestion by decreasing the period between task arrivals to 30 seconds. The amount of data transferred between storage resources in this experiment is illustrated in Figure 9. The main difference from the previous experiment is the increase in the amount of data transferred by the FRMTL and FRMDA heuristics. Since more tasks executed in the system, there were fewer idle and robust resources available to FRMTL and it scheduled tasks on all resources. This scheduling behavior required data transfers to all of the resources on which tasks ran. The increase in the number of data-generating tasks in the system caused a small increase in the amount of data that had to be transferred for most feasibility constraints. The amount of data required by tasks scheduled with the FRMAS heuristic was comparable to the previous experiment.

Figure 10. Amount of data transferred in the computing environment for the FRMDA, FRMAS, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 15 seconds.

In the final experiment, the task arrival period was set to 15 seconds. The results are illustrated in Figure 10. This increase had no effect on the FRMTL and FRMAS heuristics compared to the results from the 30-second experiments, except at higher feasibility constraints. The results for these two heuristics at higher feasibility constraints had considerably more error than the previous experiments and require further evaluation to identify the source of the observed data transfer behavior. As in the previous experiment, there was also an increase in the amount of data transferred by the FRMDA heuristic because of the increased task congestion. It appeared that as the task congestion in the system increased, the amount of data transferred by the heuristics converged to an upper bound; in this experiment, the upper bound was about 22.5 gigabytes.

We executed the same data transfer experiments using the metric described by Equation 6. The results of these experiments are illustrated in Figures 11, 12, and 13. This metric operated over a different range of constraints than the Equation 4 metric used in the previous experiments. The goal of this metric was to find the feasible set of resources that held the most data required by a task to execute. One notable similarity between the two sets of experiments was that the FRMTL heuristic transferred additional data as the task congestion increased. The FRMDA data transfers increased as the feasibility constraint increased and the congestion of the system increased. The FRUMTL heuristic transferred less data than the FRMTL heuristic and followed the trend of FRMTL.
Figure 11. Amount of data transferred in the computing environment for the FRMDA, FRMAS, FRUMTL, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 60 seconds.

Figure 12. Amount of data transferred in the computing environment for the FRMDA, FRMAS, FRUMTL, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 30 seconds.

Figure 13. Amount of data transferred in the computing environment for the FRMDA, FRMAS, FRUMTL, and FRMTL heuristics with variable feasibility constraints. The average task arrival time was 15 seconds.

7 Future Work

In the future we will evaluate the metrics and heuristics presented in this paper with data we have collected from the SPRUCE urgent computing environment deployed on the TeraGrid. Future work also includes integrating decision support tools into this data management environment. We are evaluating the use of decision support tools based on Bayes' Theorem, support vector machines, and decision trees to help select the appropriate resource management or allocation policy. These selection tools will analyze the current state of the urgent computing environment and the urgent computing workload requirements to make informed decisions and selections. Using these tools, our goal is to provide support for evolving urgent computing use cases and scenarios and to learn from and improve upon past resource management and allocation decisions.

8 Conclusions

Robustness metrics can help guide data placement tasks in heterogeneous computing systems, such as Grids. Quantifying the robustness of a task assignment in an urgent computing environment can help insulate urgent computing applications from changes in environmental conditions. In this paper, we evaluated the use of robust data placement techniques to determine the availability of storage resources and found that these techniques can balance storage usage across a variety of tasks with heterogeneous data requirements.

Acknowledgements

University of Colorado computer time was provided by equipment supported under National Science Foundation (NSF) MRI Grant #CNS-0421498, NSF MRI Grant #CNS-0420873, NSF MRI Grant #CNS-0420985, NSF Grant #0503697 "ETF Grid Infrastructure Group: Providing System Management and Integration for the TeraGrid," and a grant from the IBM Shared University Research (SUR) program. This research was supported in part by the NSF through support for the National Center for Atmospheric Research and TeraGrid resources provided by IU, NCAR, NCSA, TACC, and UC/ANL.
References

[1] S. Ali, A. Maciejewski, H. Siegel, and J. Kim. Measuring the robustness of a resource allocation. IEEE Transactions on Parallel and Distributed Systems, 15(1), July 2004.
[2] P. Beckman, I. Beschastnikh, S. Nadella, and N. Trebon. Building an infrastructure for urgent computing. High Performance Computing and Grids in Action, 2007.
[3] P. Bogden, T. Gale, G. Allen, J. MacLaren, G. Almes, G. Creager, J. Bintz, L. Wright, H. Graber, N. Williams, S. Graves, H. Conover, K. Galluppi, R. Luettich, W. Perrie, B. Toulany, Y. Sheng, J. Davis, H. Wang, and D. Forrest. Architecture of a community infrastructure for predicting and analyzing coastal inundation. Marine Technology Society Journal, 41(1):53-71, 2007.
[4] L. Cannon and E. Jeannot. A comparison of robustness metrics for scheduling DAGs on heterogeneous systems. In Proceedings of the 2007 IEEE International Conference on Cluster Computing, September 2007.
[5] A. Chervenak, E. Deelman, M. Livny, M. Su, R. Schuler, S. Bharathi, G. Mehta, and K. Vahi. Data placement for scientific applications in distributed environments. In Proceedings of the 8th IEEE/ACM International Conference on Grid Computing (Grid 2007), September 2007.
[6] J. Cope and H. Tufo. A data management framework for urgent geoscience workflows. In Proceedings of the International Conference on Computational Science (ICCS 2008), June 2008.
[7] J. Cope and H. Tufo. Adapting grid services for urgent computing environments. In Proceedings of the 2008 International Conference on Software and Data Technologies (ICSOFT 2008), July 2008.
[8] J. Cope and H. Tufo. Supporting storage resources in urgent computing environments. In Proceedings of the 2008 IEEE International Conference on Cluster Computing (Cluster 2008), September 2008.
[9] K. Czajkowski, I. Foster, N. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke. A resource management architecture for metacomputing systems. In Proceedings of the IPPS/SPDP 1998 Workshop on Job Scheduling Strategies for Parallel Processing, 1998.
[10] N. Desai, E. Lusk, A. Cherry, and T. Voran. The computer as software component: A mechanism for developing and testing resource management software. In Proceedings of IEEE Cluster 2007, 2007.
[11] K. Droegemeier, V. Chandrasekar, R. Clark, D. Gannon, S. Graves, E. Joseph, M. Ramamurthy, R. Wilhelmson, K. Brewster, B. Domenico, T. Leyton, V. Morris, D. Murray, B. Plale, R. Ramachandran, D. Reed, J. Rushing, D. Weber, A. Wilson, M. Xue, and S. Yalda. Linked Environments for Atmospheric Discovery (LEAD): A cyberinfrastructure for mesoscale meteorology research and education. In Proceedings of the 20th Conference on Interactive Information Processing Systems for Meteorology, Oceanography, and Hydrology, Seattle, WA, January 2004. American Meteorological Society.
[12] I. Foster. Globus Toolkit version 4: Software for service-oriented systems. In IFIP International Conference on Network and Parallel Computing, 2005.
[13] R. Henderson. Job scheduling under the Portable Batch System. In Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing, 1995.
[14] D. Katramatos, D. Yu, B. Gibbard, and S. McKee. The TeraPaths testbed: Exploring end-to-end network QoS. In 2007 3rd International Conference on Testbeds and Research Infrastructures for the Development of Networks and Communities (TridentCom 2007), May 2007.
[15] K. Kennedy. Why performance models matter for grid computing. Grid Based Problem Solving Environments, 239:19-29, 2007.
[16] G. Kola, T. Kosar, and M. Livny. Run-time adaptation of grid data placement jobs. In Proceedings of the International Workshop on Adaptive Grid Middleware, 2003.
[17] T. Kosar and M. Livny. Stork: Making data placement a first class citizen in the grid. In Proceedings of the 24th IEEE International Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
[18] J. MacLaren. HARC: The highly-available resource co-allocator. In Proceedings of the International Symposium on Grid computing, high-performAnce and Distributed Applications 2007 (GADA07), 2007.
[19] A. Mehta, J. Smith, H. Siegel, A. Maciejewski, A. Jayaseelan, and B. Ye. Dynamic resource allocation heuristics that manage tradeoff between makespan and robustness. Journal of Supercomputing, 42(1), 2007.
[20] D. Nurmi, J. Brevik, and R. Wolski. QBETS: Queue bounds estimation from time series. In Proceedings of the 13th Workshop on Job Scheduling Strategies for Parallel Processing, June 2007.
[21] D. Nurmi, A. Mandal, J. Brevik, C. Koelbel, R. Wolski, and K. Kennedy. Evaluation of a workflow scheduler using integrated performance modeling and batch queue wait time prediction. In Proceedings of the 2006 IEEE International Conference on Supercomputing (IEEE SC06), November 2006.
[22] D. Nurmi, R. Wolski, and J. Brevik. Probabilistic advanced reservations for batch-scheduled parallel machines. In Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2008.
[23] I. Raicu, Y. Zhao, C. Dumitrescu, I. Foster, and M. Wilde. Falkon: a fast and light-weight task execution framework. In Proceedings of IEEE/ACM SuperComputing 2007, 2007.
[24] A. Ramakrishnan, G. Singh, H. Zhao, E. Deelman, R. Sakellariou, K. Vahi, K. Blackburn, D. Meyers, and M. Samidi. Scheduling data-intensive workflows onto storage-constrained distributed resources. In Proceedings of the 7th IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2007), 2007.
[25] K. Ranganathan and I. Foster. Simulation studies of computation and data scheduling algorithms for data grids. Journal of Grid Computing, 1(1), March 2003.
[26] V. Shestak, H. Siegel, A. Maciejewski, and S. Ali. Robust resource allocations in parallel computing systems: Models and heuristics. In Proceedings of the 8th International Symposium on Parallel Architectures, Algorithms and Networks, 2005.
[27] D. Thain, T. Tannenbaum, and M. Livny. Distributed computing in practice: the Condor experience. Concurrency - Practice and Experience, 17(2-4):323-356, 2005.
[28] N. Trebon. Deadline-based grid resource selection for urgent computing. Master's thesis, University of Chicago, 2008.
[29] N. Trebon and P. Beckman. Empirical-based probabilistic upper bounds for urgent computing applications. In Proceedings of the 2008 IEEE International Conference on Cluster Computing (Cluster 2008), September 2008.
[30] R. Wolski, G. Obertelli, M. Allen, D. Nurmi, and J. Brevik. Predicting grid resource performance on-line. Handbook of Innovative Computing: Models, Enabling Technologies, and Applications, 2005.
[31] W. Xiaohui, W. Li, O. Tatebe, G. X, H. Liang, and J. Jiubin. Implementing data aware scheduling in Gfarm using LSF scheduler plugin mechanism. In Proceedings of the 2005 International Conference on Grid Computing and Applications (GCA'05), 2005.
Biographies

Jason Cope is a Ph.D. candidate in the Department of Computer Science at the University of Colorado at Boulder. Jason graduated from Virginia Tech in 2003 with a B.S. in computer engineering. He received his M.S. in computer science in 2006 from the University of Colorado at Boulder. Jason is a student assistant in the Computational and Information Systems Laboratory at the National Center for Atmospheric Research. His research interests include the development of resource allocation tools to support scientific workflows in Grid computing environments. His Ph.D. dissertation addresses data management and resource allocation techniques for urgent computing applications, workloads, and systems.

Nick Trebon is a fourth-year doctoral student at the University of Chicago. He graduated from the University of Oregon with a B.S. in computer science in 2002. In 2005, Nick earned an M.S. in computer science from the University of Oregon. In 2008, Nick earned a second M.S. from the University of Chicago. Since 2006, Nick has been involved in the fields of distributed and Grid computing. In particular, his research has focused on the domain of urgent computing.

Henry Tufo is an Associate Professor of Computer Science and Computational Science Center Director at the University of Colorado and Section Head of the Computer Science Section in the Computational and Information Systems Laboratory at the National Center for Atmospheric Research. Tufo conducts research in high-performance scientific computing, parallel algorithms and architectures, Linux clusters, Grid computing, scalable solvers, high-order numerical methods, computational fluid dynamics, climate modeling, and flow visualization. He is co-developer of NEK5000, a state-of-the-art code for simulating unsteady incompressible flows in complex geometries, and leads the development team for the High-Order Methods Modeling Environment (HOMME), a framework for building conservative and accurate atmospheric general circulation models. His work has been recognized with Gordon Bell prizes in 1999 and 2000 for demonstrated excellence in high-performance and large-scale parallel computing.

Pete Beckman is a recognized global expert in high-end computing systems. During the past 20 years, he has designed and built software and architectures for large-scale parallel and distributed computing systems. After receiving his Ph.D. degree in computer science from Indiana University, he helped found the university's Extreme Computing Laboratory, which focused on parallel languages, portable run-time systems, and collaboration technology. In 1997 Pete joined the Advanced Computing Laboratory at Los Alamos National Laboratory, where he founded the ACL's Linux cluster team and launched the Extreme Linux series of workshops and activities that helped catalyze the high-performance Linux computing cluster community. Pete also has been a leader within industry. In 2000 he founded a Turbolinux-sponsored research laboratory in Santa Fe that developed the world's first dynamic provisioning system for cloud computing and HPC clusters. The following year, Pete became Vice President of Turbolinux's worldwide engineering efforts, managing development offices in Japan, China, Korea, and Slovenia. Pete joined Argonne National Laboratory in 2002. As Director of Engineering, and later as Chief Architect for the TeraGrid, he designed and deployed the world's most powerful Grid computing system for linking production HPC computing centers for the National Science Foundation. After the TeraGrid became fully operational, Pete started a research team focusing on petascale high-performance software systems, wireless sensor networks, Linux, and the SPRUCE system to provide urgent computing for critical, time-sensitive decision support. In 2008 he became the Project Director for the Argonne Leadership Computing Facility, which is home to the world's fastest open science supercomputer. He also leads Argonne's exascale computing strategic initiative and explores system software and programming models for exascale computing.