Journal of Grid Computing 1: 53–62, 2003. © 2003 Kluwer Academic Publishers. Printed in the Netherlands.


Simulation Studies of Computation and Data Scheduling Algorithms for Data Grids

Kavitha Ranganathan (1) and Ian Foster (2)

(1) Department of Computer Science, University of Chicago, Chicago, IL 60637, USA. E-mail: [email protected]
(2) Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL 60439, USA. E-mail: [email protected]

Key words: data replication, distributed computing, grid computing, scheduling, simulation

Abstract

Data Grids seek to harness geographically distributed resources for large-scale data-intensive problems. Such problems, involving loosely coupled jobs and large data-sets, are found in fields like high-energy physics, astronomy and bioinformatics. A variety of factors need to be considered for effective scheduling of resources in such environments: e.g., resource utilization, response time, global and local allocation policies and scalability. We propose a general and extensible scheduling architecture that addresses these issues. Within this architecture we develop a suite of job scheduling and data replication algorithms that we evaluate using simulations over a wide range of parameters. Our results show that it is important to evaluate the combined effectiveness of replication and scheduling strategies, rather than study them separately. More specifically, we find that scheduling jobs to locations that contain the data they need, while asynchronously replicating popular data-sets to remote sites, works rather well.

1. Introduction

As network speeds grow, it is possible to envision global Grids that pool resources and allow users to submit jobs from anywhere in the world. These jobs can then be run on any and all available compute power [12, 13]. Data Grids that harness geographically distributed storage and compute resources are being built by projects such as GriPhyN, PPDG and the EU DataGrid. Target applications for such Grids include high-energy physics experiments, which involve a large number of data-intensive, loosely coupled jobs. It is expected that by 2007, experiments like CMS and ATLAS at CERN will have generated petabytes of data that need to be processed by hundreds of physicists around the world [16]. Effective scheduling of jobs in such a system is complicated by a number of factors. The large amount of input data needed for each job suggests that job placement strategies should take data locality into account.

Moreover, intelligent data management strategies like replication and caching may go a long way in reducing the latency involved in remote data access. Also, considering the large number of users and resources that such Grids aim to support, decentralized strategies may perform better than centralized ones. Another important requirement for a community scheduler is to allow local policies to stay in place: a Grid comprises various administrative domains, and meaningful collaboration can only be achieved if each domain is allowed to maintain its local scheduling policy. We propose a general and extensible scheduling architecture that addresses the above-mentioned issues. We assume a system model that consists of many different sites, each with some compute resources (used to run jobs), storage (used to hold data) and users who may submit jobs. At each site, we place three components: an External Scheduler (ES), which determines where (i.e., to which site) to send jobs that originate at that site; a Local Scheduler (LS), which determines the order in which jobs that are allocated to that site are executed;

and a Data Scheduler (DS), responsible for determining if and when to replicate data and/or delete local files. Within this architecture, a specific scheduling system is defined by the choice of ES, DS and LS algorithms. We define a family of five ES and four DS algorithms; LS algorithms have been widely researched in the past [7, 23]. We also identify four parameters that can affect the performance of each strategy: network bandwidth, user access patterns, quality of the information used to make job and data placement decisions, and the ratio of job run time to input data size. To evaluate these different scheduling algorithms and the impact of the factors detailed above, we have developed a modular and extensible discrete event Data Grid simulation system, ChicagoSim (Chicago Grid Simulator), and performed extensive simulation studies. The results reported here (which represent a significant extension to the results reported in a previous paper [22], as detailed below) indicate that in most cases, scheduling jobs to locations where the input data already exists, and asynchronously replicating popular data files across the Grid, works rather well. We also find that it is important to consider the joint performance of scheduling and replication strategies, as opposed to studying each separately. We extend our previous work [22] along a number of axes, considering contention in the network to/from a site, a wide range of bandwidth scenarios, three different user access patterns, varying job characteristics of different applications and varying quality of the information used to make scheduling decisions. We also introduce new scheduling and replication strategies aimed at countering the effect of “herd behavior”, the result of many users simultaneously converging on a desirable site and quickly overloading it. In addition, we model a Poisson arrival rate of jobs instead of assuming that a user submits a new job only when the previously submitted job has finished executing. The rest of the article is organized as follows. Section 2 reviews related work in Grid scheduling and data placement. In Section 3, we discuss the Grid scheduling problem and provide details of our proposed model and algorithms. Section 4 presents our simulation experiments and results, while Section 5 analyzes the impact of different parameters on our algorithms. We conclude and point to future directions in Section 6.

2. Related Work

Most previous scheduling work has considered data locality/storage issues as secondary to job placement. We first discuss work that has recognized the importance of data location in job scheduling, then discuss work that has concentrated on data replication issues alone, and finally mention other scheduling work. Thain et al. [25] describe a system that links jobs and data by binding execution and storage sites into I/O communities that reflect physical reality. They present building blocks for such communities but do not address policy issues; thus this work is complementary to our work on scheduler policies. The same is true for Basney et al.'s work on Execution Domains [5], a framework that defines bindings between computing power and data resources such that applications are scheduled to run at CPUs that have access to required data and storage. Casanova et al. [10] describe an adaptive scheduling algorithm for parameter sweep applications that uses a centralized scheduler to compute an optimal placement of data prior to job execution. In contrast, we actively replicate/pre-stage files at runtime using a decentralized strategy that allows each site to make informed decisions based on its view of the Grid. Heuristics like Max-Min and Min-Min are used for level-by-level scheduling of DAGs by Alhusaini et al. [1]. They consider data location as a parameter but assume that resource performance characteristics are perfectly predictable. We adopt a more realistic approach in which schedulers make decisions based on current Grid conditions. Work on data replication strategies for Grids includes [6] and [21]. In [21] we examined dynamic replica placement strategies in a hierarchical Grid environment. In [6], an economic model for replica creation and deletion is proposed and compared to more traditional algorithms for replica deletion. Closely related to Grid replication is the work done on web caching and replication [9, 14, 20]. Gwertzman et al. [14] maintain detailed access logs to calculate the optimal subnet to move a popular file to. Bestavros et al. [9] use information available in the TCP/IP route record option to create a tree of nodes and periodically replicate popular files down the tree. In RaDaR [20], information from routers is used to create preference paths for files, which are then used to move files closer to potential users.

Job scheduling algorithms for distributed computing have been extensively studied (e.g., [15, 18, 23, 24]). Subramani et al. [24] look at the advantages of sending multiple simultaneous job requests to different sites, while James et al. [18] examine algorithms for scenarios where all jobs are registered before scheduling begins. Hamscher et al. [15] discuss various metacomputing scheduling frameworks, covering both centralized and decentralized approaches and the algorithms suited to each case. What sets our work apart from Grid scheduling research done so far is that we consider dynamic data replication as a fundamental part of the scheduling problem.

3. Scheduling Problem and Proposed Architecture

We model a Data Grid as a set of sites, each comprising a number of processors and storage; a set of users, each associated with a site; and a set of files, each of a specified size, initially mapped to sites according to some distribution. We assume that all processors have the same performance and that all processors at a site can access any storage at that site. Each site can have a different number of processors and hence a different amount of compute power. Sites are connected to each other by links of limited bandwidth. Each user generates jobs according to some distribution. Each job requires that a specific file be available before it can execute. It then executes for a specified amount of time on a single processor. A particular Data Grid Execution (DGE) is defined by a sequence of job submissions, allocations to sites, and executions, along with data movements. A DGE can be characterized according to various metrics, such as elapsed time, average response time, processor utilization, network utilization, and storage utilization. The scheduling problem in a Data Grid is thus to define algorithms that produce DGEs that are both correct and good with respect to one or more metrics.

3.1. Scheduling in a Data Grid

A number of Grid properties distinguish the Grid scheduling problem from scheduling on multiprocessors and clusters.

Geographical Distances: Grid resources may be connected by relatively slow (and/or oversubscribed) WANs, as compared to high-speed LAN connections.

Moreover, the input datasets for jobs may be large (perhaps many gigabytes). Under these conditions, data transfer times and network latency can be significant, compelling job-scheduling decisions to consider the amount of network traffic they generate. Thus, intelligent management of data plays an important role in Grid scheduling.

Administrative Domains: The various sites of the Grid may belong to different administrative domains, each with its own local policies, which may take precedence over global scheduler policies. Distributed control is an inherent property of Grids, and the scheduling framework should enable it.

Size: Many hundreds of sites, thousands of users and millions of resources may eventually be part of a Grid assembled by a community such as high-energy physics. Thus, scalability and reliability are important concerns for any scheduling paradigm, and decentralized, scalable solutions may be preferred.

Dynamic Characteristics: Resources in the Grid are expected to fluctuate in their availability and capabilities. Hence, batch scheduling techniques that assume constant availability and map out the entire schedule before running jobs may not be successful: by the time a particular job can run, the Grid scenario might be quite different from that used by the scheduler to decide where the job would run. Algorithms have to work with the assumption that the information used for mapping jobs to resources will quickly be out of date. To work around this problem, a Grid scheduler can use just-in-time, or online, scheduling, in which jobs are dispatched as and when they arrive at a scheduler, depending on the current status of resources.

3.2. Scheduling Architecture

Using the above factors as guidelines, we define a Grid scheduling architecture that facilitates efficient data management, gives priority to local policies, is decentralized with no single point of failure, and employs online job allocation techniques. The scheduling logic of the architecture is encapsulated in three distinct modules (see Figure 1).

– External Scheduler (ES): Each user in the system is associated with an External Scheduler and submits jobs to that External Scheduler. We can typically imagine one ES per site, but the framework does not enforce this; the study of other user-to-ES mappings is left for future work. The ES then decides the remote site to which to send the job, depending on some scheduling algorithm.


Figure 1. Interactions among Data Grid Components.

It may use external information, such as load at a remote site or the location of a dataset, as input to its decisions.

– Local Scheduler (LS): Once a job is assigned to run at a particular site (and sent to an incoming job queue of that site), it is managed by the Local Scheduler at that site. The LS decides how to schedule all jobs allocated to it on its local resources. It could, for example, decide to give priority to certain kinds of jobs, or it could refuse to run jobs submitted by a certain user.

– Dataset Scheduler (DS): The DS at each site keeps track of the popularity of each locally available dataset. It then replicates popular datasets to remote sites, depending on some algorithm. The DS may use external information, such as whether the data already exists at a site and the load at a remote site, to guide its decisions.

The external information a module needs can be obtained via an information service such as the Globus Toolkit's Monitoring and Discovery Service [11].
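To make the division of responsibilities concrete, the following is a minimal Python skeleton of the three modules. It is illustrative only: ChicagoSim itself is built on Parsec, a C-based simulation language, and the class and method names, as well as helpers such as site.load(), are assumptions rather than the simulator's actual interface.

class ExternalScheduler:
    """One per site: decides to which site each locally submitted job is sent."""
    def __init__(self, site, grid_info):
        self.site = site            # the site this ES serves
        self.grid_info = grid_info  # GIS-like information service (Section 5.3)

    def submit(self, job):
        target = self.select_site(job)         # policy-specific, see Section 3.3
        target.local_scheduler.enqueue(job)

    def select_site(self, job):
        raise NotImplementedError              # JobRandom, JobLeastLoaded, ...


class LocalScheduler:
    """Orders the jobs allocated to its site; plain FIFO in this article."""
    def __init__(self):
        self.queue = []

    def enqueue(self, job):
        self.queue.append(job)

    def next_job(self):
        return self.queue.pop(0) if self.queue else None


class DatasetScheduler:
    """Tracks popularity of locally held datasets and replicates popular ones."""
    def __init__(self, site, grid_info, load_threshold):
        self.site = site
        self.grid_info = grid_info
        self.load_threshold = load_threshold
        self.popularity = {}                   # dataset -> number of jobs that used it

    def record_access(self, dataset):
        self.popularity[dataset] = self.popularity.get(dataset, 0) + 1

    def maybe_replicate(self):
        # replicate only when the local computational load exceeds a threshold
        if self.popularity and self.site.load() > self.load_threshold:
            popular = max(self.popularity, key=self.popularity.get)
            target = self.choose_target(popular)   # DataRandom, DataLeastLoaded, ...
            target.storage.replicate(popular)

    def choose_target(self, dataset):
        raise NotImplementedError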

3.3. Scheduling and Replication Algorithms

Within this architecture, a particular system is defined by a choice of ES, DS and LS algorithms. We focus in this article on external scheduling and data replication, and for each we define (and evaluate) a range of different algorithms. Management of internal resources is a widely researched problem [23], and we use FIFO (first in, first out) as our LS algorithm. We define five different algorithms used by an External Scheduler to select a remote site to which to send a job:

– JobRandom: A randomly selected site.

– JobLeastLoaded: The site that currently has the least load. A variety of definitions for load are possible; here we define it simply as the least number of jobs waiting to run.

– JobRandLeastLoaded: Identify the N least-loaded sites and randomly select one of these. This algorithm is a variation of JobLeastLoaded that aims to minimize hot-spots caused by the symmetry of the actions of various External Schedulers.

– JobDataPresent: A site that already has the required data. If more than one site qualifies, choose the least loaded one.

– JobLocal: Always run jobs at the originating site.

In each case, any data required to run a job is fetched locally before the task is run, if it is not already present at the site. For the Dataset Scheduler, we define four alternative algorithms:

– DataCaching: No active replication is performed by the DS.

– DataRandom: The DS keeps track of the popularity of the datasets that its site contains (by tracking the number of jobs that have needed a particular dataset). If a site's computational load exceeds a threshold, those popular datasets are replicated to a random site on the Grid.

– DataLeastLoaded: The DS keeps track of dataset popularity and chooses the least loaded site as a new host for a popular dataset. Again, it replicates only when the computational load at its site exceeds a threshold.

– DataRandLeastLoaded: A variation of DataLeastLoaded in which a random site is picked from the top N least loaded sites.

In all cases, data is also cached, and the finite storage at each site is managed using LRU replacement. We use N = 5 for both DataRandLeastLoaded and JobRandLeastLoaded. We thus have a total of 5 × 4 = 20 algorithm combinations to evaluate.
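As a rough sketch of how these policies could be expressed, the functions below implement the five site-selection rules and the three active replication targets in Python. Here sites, site.load() (the number of jobs waiting to run) and site.has_data() are assumed helpers for illustration, not part of any published ChicagoSim interface.

import random

N = 5  # top-N cutoff used by the *RandLeastLoaded variants

def job_random(sites, job, origin):
    return random.choice(sites)

def job_least_loaded(sites, job, origin):
    # load = number of jobs currently waiting to run at the site
    return min(sites, key=lambda s: s.load())

def job_rand_least_loaded(sites, job, origin):
    # pick randomly among the N least-loaded sites to avoid herd behavior
    candidates = sorted(sites, key=lambda s: s.load())[:N]
    return random.choice(candidates)

def job_data_present(sites, job, origin):
    # sites that already hold the input file (the model guarantees at least
    # one replica of every dataset); break ties by load
    holders = [s for s in sites if s.has_data(job.input_file)]
    return min(holders, key=lambda s: s.load())

def job_local(sites, job, origin):
    return origin

# Replication targets used by the Dataset Scheduler once local load exceeds
# its threshold (DataCaching replicates nothing).
def data_random(sites):
    return random.choice(sites)

def data_least_loaded(sites):
    return min(sites, key=lambda s: s.load())

def data_rand_least_loaded(sites):
    return random.choice(sorted(sites, key=lambda s: s.load())[:N])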

4. Simulation Studies

We choose to evaluate these algorithms via simulations, to allow for the analysis of different algorithms in a wide range of scenarios. We have constructed a discrete event simulator, ChicagoSim, to evaluate the performance of different combinations of task and data scheduling algorithms. ChicagoSim is built on top of Parsec [4], a C-based simulation language. We describe below the simulation framework and the experiments performed.

4.1. Simulation Framework

In the absence of real traces from Data Grids, we model the amount of processing power needed per unit of data, and the size of input and output datasets, on the values expected for the CMS experiment [16], but otherwise generate synthetic data distributions and workloads, as we now describe. We model a Grid with 120 users distributed among 30 sites. We do not model a network topology but assume that sites are connected to the Grid via a limited-bandwidth network of 10 MBytes/sec (other bandwidth scenarios are considered in Section 5.1). Hence, the bandwidth between any two sites is constant and there is no network contention, except at a site-network boundary, where total input or output cannot exceed 10 MB/sec. There are three different tiers of sites, as envisioned by the GriPhyN project, where Tier1 sites contain more processing power than Tier2 sites and so on [2] (5 compute elements at Tier1 sites, 3 at Tier2 sites and 2 at Tier3 sites). Dataset sizes are selected randomly with a uniform distribution between 500 MB and 2 GB. Datasets are distributed uniformly across sites, and initially only one replica per dataset is in the system. Users are mapped evenly across sites and submit jobs according to a Poisson distribution with an inter-arrival time of 5 seconds. The workload comprises a total of 6000 jobs, 50 per user. In the experiments reported here, each job requires a single input file and runs for 300D seconds, where D is the size of the input file in GB. Variations in the ratio of input file size to job run time are discussed in Section 5.4. The transfer of input files from one site to another incurs a cost corresponding to the size of the file divided by the nominal speed of the link. Since job output is often significantly smaller than the input [17], we ignore output issues for now. The jobs (i.e., input file names) needed by a particular user are generated randomly according to two different distributions, geometric and Zipf (Figure 4a), with the goal of modeling situations in which a community focuses on some datasets more than others. The typically longer tails of Zipf distributions could affect the effectiveness of the caching strategy employed, and thus the amount of data transfer needed and the performance of the replication strategy. We elaborate on these issues in Section 5.2. Note that we do not attempt to model changes in dataset popularity over time.
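A sketch of the synthetic workload generation (and of the completion-time metric defined in the next subsection) is given below. The numeric constants come from the text, while the function names and the exact form of the geometric sampling are illustrative assumptions.

import random

NUM_SITES, NUM_USERS, JOBS_PER_USER = 30, 120, 50
MEAN_INTERARRIVAL = 5.0          # seconds, Poisson job arrivals
C = 300                          # run time = C * D seconds, D = input size in GB

def make_datasets(n):
    # dataset sizes uniform between 500 MB and 2 GB (here expressed in GB)
    return [random.uniform(0.5, 2.0) for _ in range(n)]

def pick_dataset_geometric(num_datasets, alpha=0.98):
    # geometric popularity: dataset k chosen with probability ~ (1 - alpha) * alpha**k
    # (a Zipf variant with beta = 0.68 is also used in the experiments)
    k = 0
    while random.random() > (1 - alpha) and k < num_datasets - 1:
        k += 1
    return k

def generate_jobs_for_user(datasets):
    t = 0.0
    jobs = []
    for _ in range(JOBS_PER_USER):
        t += random.expovariate(1.0 / MEAN_INTERARRIVAL)   # Poisson arrivals
        d = pick_dataset_geometric(len(datasets))
        jobs.append({"arrival": t, "dataset": d, "runtime": C * datasets[d]})
    return jobs

def completion_time(transfer_time, queue_time, compute_time):
    # data transfer overlaps with queueing, so take the maximum of the two
    return max(transfer_time, queue_time) + compute_time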

4.2. Metrics

We ran each set of experiments (5 × 4 = 20 algorithm combinations) five times, with different random seeds, in order to evaluate variance; in practice, we found no significant variation among different runs of the same algorithm. For each experiment, we measured:

– Average amount of data transferred (bandwidth consumed) per job.

– Average job completion time, defined as max(data transfer time, queue time) + compute time.

– Average idle time for a processor.

The amount of data transferred is important from the perspective of overall resource utilization, while system response time is of more concern to users. Since the data transfer needed for a job starts while the job is still in the processor queue of a site, the average job completion time includes the maximum of the queue time and transfer time, in addition to job execution time. Processor idle time provides a measure of system utilization. A processor is idle when either the job queue of that site is empty or the datasets needed for queued jobs are not yet available.

4.3. Results and Discussion

Figures 2 and 3a show the average response time, bandwidth consumed and idle time of processors for the different combinations of data replication and job scheduling algorithms when access patterns follow a geometric distribution with α = 0.98. The results are the average over the five experiments performed for each algorithm pair. When plain caching (DataCaching) is used, algorithm JobRandLeastLoaded (the job is sent to a randomly selected least-loaded site) performs best in terms of response time, and algorithm JobDataPresent (the job is run where the data is) performs worst.

JobRandom and JobLocal perform worse than the least loaded algorithms but significantly better than JobDataPresent. These results can be explained as follows: although data is initially uniformly distributed across the sites, the geometric distribution of dataset popularity causes certain sites to contain often-used datasets. When algorithm JobDataPresent is used, these sites get more than their fair share of jobs, and hence tend to get overloaded. This effect leads to long queues at those particular sites and hence a degradation in performance. Since there is no active replication, the amount of data transferred is zero in this particular case (Figure 2b). JobRandLeastLoaded performs better than JobLeastLoaded because it effectively minimizes the negative impact of the symmetry of decisions made by different sites. Once we introduce a replication policy, however, algorithm JobDataPresent performs markedly better than all other alternatives (Figure 2). Perhaps surprisingly, it does not seem to matter which particular replication strategy is chosen, as long as one is employed. Similarly, the idle time for processors is lowest when JobDataPresent is employed along with a replication policy (Figure 3a). Clearly, in this case, dynamic replication helps to reduce hot spots created by popular data and enables load sharing. The significantly better performance of strategy JobDataPresent when combined with a replication policy can be explained as follows. The JobDataPresent strategy by itself generates minimal data transfer, as jobs are scheduled to the site where the data they need is already present. Datasets are moved only as a result of explicit replication.

Figure 2. Performance of different combinations of scheduling strategies (X axis) and replication strategies (shaded bars) in the case of geometric distribution of dataset popularity.


Figure 3. (a) Percentage of time when processors are idle for different strategies and (b) response times of scheduling algorithms for different bandwidth scenarios (replication strategy used is DataRandLeastLoaded).

Thus the network does not get overloaded. Moreover, since the input data is already present at the site, jobs never wait for required data to arrive, and thus response times tend to be short. We note that these statements will not always be true in the case of multiple input files per job. We are currently working on such scenarios, and preliminary results show that while the gains from using JobDataPresent are not as marked as in the single-file scenario, choosing execution sites based on the data they contain still has significant advantages. Replication policies improve performance by replicating popular data to other sites, thus ensuring that jobs do not accumulate at a few sites that contain popular data. The replication algorithms studied do not have any significant effect when used with a scheduling algorithm other than JobDataPresent. As Figure 2b illustrates, the difference in the average amount of data transferred between algorithm JobDataPresent and the others is large. Clearly, if data locality issues are not considered, even the best scheduling algorithms fall prey to data transfer bottlenecks. These results emphasize the importance of studying scheduling and data replication algorithms together. If the scheduling algorithms were studied by themselves (using plain caching), we would conclude that algorithm JobRandLeastLoaded was the best choice. However, when studied along with the effect of replication strategies, JobDataPresent works much better than any other alternative. Similarly, a replication policy that might work well by itself may not guarantee the best overall performance.

Only by studying the effects of combinations of different replication and scheduling policies were we able to come up with a solution that works better than either isolated study would suggest. Perhaps surprisingly, we found no significant performance differences among the replication algorithms that we evaluated: DataLeastLoaded, DataRandom and DataRandLeastLoaded.

5. Sensitivity of Algorithms to Parameters

Scheduling jobs at data sources and actively replicating popular data turns out to be the best choice for the parameters defined in Section 4.1, as it ensures load sharing in the Grid with minimal data transfer. We next evaluate the sensitivity of our results to four system parameters: network capacity, dataset popularity, quality of information (about load at a site) and ratio of job run time to input data size.

5.1. Network Capacity

The large amount of data transferred suggests that available bandwidth might impact performance. We find, not surprisingly, that if site network bandwidth is decreased from 10 MB/sec to 1 MB/sec (Figure 3b), the performance of all algorithms that involve extensive data transfer (JobRandom, JobLeastLoaded, JobRandLeastLoaded and JobLocal) degrades sharply. Strategy JobDataPresent performs more or less consistently, as it does not involve a large amount of data movement. Similarly, when site bandwidth is increased, the four algorithms with large data transfers perform much better.


Figure 4. (a) The three dataset popularity functions used in our experiments. Geometric popularity with α = 0.95 and 0.99 and Zipf with β = 0.68; (b) the performance of job scheduling strategies for these different usage patterns.

Under these new conditions of ample bandwidth (100 MB/sec), algorithm JobRandLeastLoaded performs almost as well as algorithm JobDataPresent, with a response time of 536 seconds as opposed to 483 seconds for JobDataPresent. Thus, while we believe that the system parameters we used are realistic for a global scientific Grid, we must be careful to evaluate the impact of future technological changes on our results.

5.2. Dataset Popularity Distributions

We ran experiments for three different dataset popularity distributions (Figure 4a): two geometric distributions and one Zipf distribution. Web accesses are usually characterized by Zipf distributions, but since we do not yet know the nature of Data Grid file accesses, it is important to test other possible cases. The Zipf distribution has a characteristically longer tail than the geometric distributions: that is, a large number of files are accessed only once or twice. As shown in Figure 4b, the access pattern does affect the performance of each strategy. Zipf has the highest response time in most cases. This might be due to the large number of files that are accessed only a couple of times, thus minimizing the positive effects of caching and replication. However, JobDataPresent still works best in all cases. Thus, JobDataPresent combined with dynamic replication seems to be preferable even under varying user behavior.

5.3. Quality of Information

Information about load on remote sites is crucial to both JobLeastLoaded and JobRandLeastLoaded, as these algorithms aim to schedule jobs at the least loaded sites.

We envision a service much like the GIS (Grid Information Service) [11], from which sites can obtain information about the current status of remote resources. Depending on how often this service is updated, the information obtained through it could be correct, a little stale or totally misleading. Of course, keeping the GIS up-to-date comes with overheads, so algorithms that are less sensitive to stale information are more robust and thus desirable. Figure 5a shows the performance of each strategy as the quality of information degrades. As the information update frequency is decreased, JobLeastLoaded worsens in performance. JobRandLeastLoaded also suffers, but not as drastically. JobDataPresent, however, is uniform in its performance, as it does not need load information at sites. The replication strategies DataLeastLoaded and DataRandLeastLoaded do use load information from the GIS, but as each site replicates only periodically, the impact of stale information is less harmful for the replication strategies. These results suggest that for JobRandLeastLoaded to work almost on par with JobDataPresent, the GIS has to be highly accurate, a difficult task considering the dynamic nature of the Grid.

5.4. Ratio of Job Run-time to Input-data Size

Next we consider the impact of the ratio of job run time to input data size. Each job runs for C·D seconds, where D is the size of the input data in GBytes. In our earlier experiments, C was estimated at 300, though this depends on the application.


Figure 5. Performance of scheduling strategies as (a) the quality of information about load at sites and (b) job run time, varies. DataRandLeastLoaded is the replication policy used.

We run experiments with C ranging from 100 through 400 to encompass a wider range of scenarios. The results for three algorithms are shown in Figure 5b. Algorithms JobRandom and JobLocal perform much worse than the others and are not included in the graph, so that the relative performance of the others can be observed. As Figure 5b shows, for lower values of C, JobDataPresent has the lowest response time. This implies that for high-energy scientific experiments like CMS and ATLAS (with C ranging from 100 to 300), JobDataPresent would be a suitable scheduling policy. If, however, the nature of jobs changes substantially, with much longer run times for the same amount of input data (with C nearing 400), JobRandLeastLoaded might be preferable. This is because with longer job run times, computational power, and not bandwidth, becomes the bottleneck in the Grid. Hence, there is ample time to fetch required input data for jobs that are waiting in the run queue, without increasing the response time. It should be noted, however, that the amount of network traffic generated by JobRandLeastLoaded is much higher than that of JobDataPresent. Hence, if bandwidth conservation is a priority, JobDataPresent combined with a dynamic replication policy is definitely preferable.
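A rough back-of-the-envelope calculation (assuming the 10 MB/sec site bandwidth of Section 4.1 and a 1 GB input file) illustrates the trade-off: fetching the file takes about 1000 MB / 10 MB/sec = 100 seconds, while the job itself runs for C·D seconds, i.e., 100 seconds at C = 100 but 400 seconds at C = 400. When C is small, the transfer is comparable to the computation, so avoiding it (JobDataPresent) pays off directly; when C is large, the transfer can typically complete while the job waits in the queue behind earlier long-running jobs, so load balance (JobRandLeastLoaded) matters more than data locality.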

6. Conclusions and Future Work

We have investigated the problem of placement of jobs and data in a distributed Data Grid environment, with the goal of achieving good system utilization and response time.

Towards this goal, we developed a modular and extensible scheduling architecture that addresses key Grid properties. A wide variety of job scheduling and data replication algorithms can be instantiated within this architecture. We used simulation studies to evaluate five different scheduling and four different replication algorithms for this architecture. We also identified four parameters that can affect algorithm performance: network bandwidth, user access patterns, quality of the information used to make job and data placement decisions, and the ratio of job run time to input data size. We find that it is important to take data locality into account while scheduling: for example, simply scheduling jobs to idle processors, and then moving data if required, performs significantly less well than algorithms that also consider data location when scheduling. More specifically, our results show that the simple strategy of always scheduling jobs to where the required data is located, combined with independently generating new replicas of popular datasets, works particularly well. This approach has significant implementation advantages when compared to (say) approaches that attempt to generate a globally optimal schedule: first, it effectively decouples job scheduling and data replication, so that these two functions can be implemented and optimized separately, and second, it permits highly decentralized implementations. In future work, we plan to investigate more realistic scenarios with multiple input files and real user access patterns.

We also aim to validate our simulation results on real Grids, such as those being developed within the GriPhyN project [2] and by participants in the International Virtual Data Grid Laboratory [3]. Another interesting direction is to study the nature of Grid access patterns in order to formulate data handling schemes. For example, correlations in file usage could help design more efficient replication/pre-staging strategies. Our ultimate goal is to come up with a suite of adaptive job placement and data movement algorithms that would dynamically choose strategies depending on current and predicted Grid conditions.

Acknowledgements

We thank Jennifer Schopf and Mike Wilde for valuable discussions and feedback, and Koen Holtman for his input on physics workloads. This research was supported by the National Science Foundation's GriPhyN project under contract ITR-0086044.

References

1. A.H. Alhusaini, V.K. Prasanna and C.S. Raghavendra, "A Unified Resource Scheduling Framework for Heterogeneous Computing Environments", in Eighth Heterogeneous Computing Workshop, 1999.
2. P. Avery and I. Foster, "The GriPhyN Project: Towards Petascale Virtual Data Grids", Technical report, GriPhyN, 2001.
3. P. Avery, I. Foster, R. Gardner, H. Newman and A. Szalay, "An International Virtual-Data Grid Laboratory for Data Intensive Science", Technical report, iVDGL, 2001.
4. R. Bagrodia et al., "Parsec: A Parallel Simulation Environment for Complex Systems", Computer, Vol. 31, No. 10, pp. 77–85, 1998.
5. J. Basney, M. Livny and P. Mazzanti, "Harnessing the Capacity of Computational Grids for High Energy Physics", in Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (CHEP 2000), 2000.
6. W.H. Bell, D.G. Cameron et al., "Simulation of Dynamic Grid Replication Strategies in OptorSim", in Proceedings of the Third International Workshop on Grid Computing, 2002.
7. F. Berman, The Grid: Blueprint for a New Computing Infrastructure, Chapter 12. Morgan Kaufmann Publishers, Inc., 1998.
8. F. Berman, R. Wolski, S. Figueira, J. Schopf and G. Shao, "Application-Level Scheduling on Distributed Heterogeneous Networks", in Proceedings of SuperComputing '96, 1996.
9. A. Bestavros and C. Cunha, "Server-Initiated Document Dissemination for the WWW", IEEE Data Engineering Bulletin, Vol. 19, pp. 3–11, 1996.
10. H. Casanova, G. Obertelli, F. Berman and R. Wolski, "The AppLeS Parameter Sweep Template: User-Level Middleware for the Grid", in Proceedings of SuperComputing '00, 2000.
11. K. Czajkowski, S. Fitzgerald, I. Foster and C. Kesselman, "Grid Information Services for Distributed Resource Sharing", in Proceedings of the 10th IEEE International Symposium on High Performance Distributed Computing (HPDC-10), 2001.
12. I. Foster and C. Kesselman (eds.), The Grid: Blueprint for a New Computing Infrastructure. Morgan Kaufmann, 1999.
13. I. Foster, C. Kesselman and S. Tuecke, "The Anatomy of the Grid: Enabling Scalable Virtual Organizations", International J. Supercomputing Applications, Vol. 15, No. 3, 2001.
14. J. Gwertzman and M. Seltzer, "The Case for Geographical Push Caching", in Proceedings of the Workshop on Hot Topics in Operating Systems, 1995.
15. V. Hamscher, U. Schwiegelshohn, A. Streit and R. Yahyapour, "Evaluation of Job-Scheduling Strategies for Grid Computing", in Proceedings of the Seventh International Conference on High Performance Computing, 2000.
16. K. Holtman, "CMS Requirements for the Grid", in Proceedings of the International Conference on Computing in High Energy and Nuclear Physics (CHEP 2001), 2001.
17. K. Holtman, "HEPGRID2001: A Model of a Virtual Data Grid Application", in Proceedings of HPCN Europe 2001, 2001.
18. H.A. James, K.A. Hawick and P.D. Coddington, "Scheduling Independent Tasks on Metacomputing Systems", in Proceedings of the Conference on Parallel and Distributed Computing Systems, 1999.
19. M. Maheswaran, S. Ali, H.J. Siegel and D. Hensgen, "Dynamic Matching and Scheduling of a Class of Independent Tasks onto Heterogeneous Computing Systems", in Proceedings of the Eighth Heterogeneous Computing Workshop, 1999.
20. M. Rabinovich and A. Aggarwal, "RaDaR: A Scalable Architecture for a Global Web Hosting Service", in Proceedings of the Eighth International World Wide Web Workshop, 1999.
21. K. Ranganathan and I. Foster, "Identifying Dynamic Replication Strategies for a High Performance Data Grid", in Proceedings of the Second International Workshop on Grid Computing, 2001.
22. K. Ranganathan and I. Foster, "Decoupling Computation and Data Scheduling in Distributed Data-Intensive Applications", in Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002.
23. B.A. Shirazi, A.R. Hurson and K.M. Kavi (eds.), Scheduling and Load Balancing in Parallel and Distributed Systems. IEEE Computer Society Press, 1995.
24. V. Subramani, R. Kettimuthu, S. Srinivasan and P. Sadayappan, "Distributed Job Scheduling on Computational Grids Using Multiple Simultaneous Requests", in Proceedings of the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11), 2002.
25. D. Thain, J. Bent, A. Arpaci-Dusseau, R. Arpaci-Dusseau and M. Livny, "Gathering at the Well: Creating Communities for Grid I/O", in Proceedings of SuperComputing 2001, 2001.
