Omnivore: Integration of Grid Meta-Scheduling and Peer-to-Peer Technologies
Michael Heidt, Tim Dörnemann, Kay Dörnemann, Bernd Freisleben
Department of Mathematics and Computer Science, University of Marburg
Hans-Meerwein-Str. 3, D-35032 Marburg, Germany
Email: {heidtm, doernemt, doernemk, freisleb}@informatik.uni-marburg.de
Abstract— Dedicated servers remain a common constituent of Grid job scheduling architectures, forcing site administrators to make compromises between administrative expenses and system reliability. Apart from requiring administrative attention, dedicated servers create single points of failure and should not be subject to network churn. This paper presents the design and implementation of Omnivore, a fully decentralized job scheduling system built on a peer-to-peer based meta-scheduler. Omnivore is able to cope with both node failures and network churn, eliminating the need for central administration and continuous resource availability. It is integrated into the Grid landscape (especially the Globus Toolkit 4) by means of the GridWay meta-scheduler and provides scalable distributed scheduling, replicated storage and system monitoring capabilities. Results obtained from an experimental evaluation of our implementation show that Omnivore is both scalable and resilient in the presence of node failures and network churn.
I. INTRODUCTION
Today's Computing Grids [10] are primarily used to connect dedicated compute clusters. Building dedicated compute clusters requires considerable administrative and fiscal resources. Often, the necessary compute power is already available in the form of desktop computers; incorporating them into on-demand resource pools avoids investments in additional computer systems and alleviates the problem of resource wastage.
Peer-to-peer (P2P) technology, on the other hand, is aimed at connecting large numbers of compute nodes, each offering services to other nodes. Most P2P applications to date are based on a vertically integrated architecture, providing a single specialized type of service to their peers, such as file sharing. With the advent of P2P substrates, layered P2P architectures and common APIs [7], new architectural possibilities have opened up. In practice, Grids usually connect smaller numbers of nodes (since a Grid node represents a cluster with up to thousands of nodes), while the P2P world has seen operational deployment of systems on an Internet scale [16]. Thus, Grids can benefit greatly from the adoption of P2P technologies, which eliminate the typically hierarchical organization of Grids and avoid single points of failure.
The convergence of P2P and Grid technologies has been envisioned for several years [11], but only a few systems have reached the implementation stage so far. In this paper, we contribute to these research efforts by presenting a fully decentralized, P2P-based Grid job
scheduling system that allows opportunistic work pools to be created. These work pools are integrated with existing Grid infrastructures at the level of meta-schedulers. In our proposal, decentralized work pools constitute a new type of resource managed by meta-schedulers. Because these pools are accessed through a meta-scheduler, their novel character remains invisible to the Grid user. The P2P-based job scheduling system, called Omnivore, is able to cope with both node failures and network churn, eliminating the need for central administration and continuous resource availability. It provides scalable distributed scheduling, replicated storage and system monitoring capabilities. Our implementation of Omnivore is integrated into the Globus Toolkit 4 by means of the GridWay meta-scheduler, allowing a Grid user to access the new type of resource on demand. Experimental results obtained from an evaluation of our implementation show that Omnivore is both scalable and resilient in the presence of node failures and network churn.
This paper is organized as follows. Section II presents the problem statement. Section III describes the proposed architecture of our P2P approach and its integration into the GridWay meta-scheduling system. Section IV describes implementation details of the Omnivore system, while Section V evaluates Omnivore by means of quantitative test results. Section VI discusses related work. Section VII concludes the paper and outlines areas for future research.
II. PROBLEM STATEMENT
The motivation for the work presented in this paper originates from our hands-on experience in a cooperation with colleagues from the physics faculty of our university. They perform turbulence simulations that require a tremendous amount of computing power. Typically, the compute cluster at our university (142 dual-processor nodes, each with dual-core AMD Opterons, i.e. 568 cores) is used to perform the computations.
Since the cluster is shared by many faculties, the physicists do not achieve the desired throughput. On the other hand, many workstations in the faculties and student computer pools have low utilization, especially during nights and weekends. Generalizing from this concrete use case, the aim of our work is to integrate unused resources (e.g. desktop computers) on demand into a Grid of dedicated cluster resources. In particular, universities and small and medium-sized enterprises (SME)
would benefit from this possibility, since they could avoid investing in dedicated Grid hardware. Furthermore, this dynamic resource acquisition approach makes it possible to react to peak loads caused by exceptional situations without investing in spare hardware. A sample topology of such an infrastructure is shown in Figure 1. The Omnivore pool consists of a dynamically expanding and shrinking number of nodes, whereas the clusters are configured statically.
Fig. 1. Example topology of an Omnivore-enabled Grid
When integrating workstations into a Grid infrastructure, it must be taken into account that these resources tend to be unstable. For example, machines might suddenly be rebooted or disconnected from the network. The infrastructure has to cope with such highly dynamic behavior. Furthermore, the dynamic work pool should be self-configuring, and adding new nodes should be as easy as possible to avoid administrative overhead. Finally, the dynamic work pool should integrate seamlessly with existing Grid middleware without the need to modify middleware code. To sum up, the following goals have to be achieved:
• Scalability - the work pool architecture must scale from a dozen nodes up to very large collections of nodes.
• Resilience - failures of single nodes as well as correlated failures of interconnected nodes must not compromise the integrity of the system as a whole.
• Efficiency - the overhead due to decentralized management should be kept to an absolute minimum, allowing for efficient levels of resource utilization.
• Seamless integration - Grid users should be able to remain oblivious to the distinction between opportunistic and dedicated resources. Jobs should be distributed among available nodes on demand. Grid administrators should not be forced to perform integration efforts by themselves. Installation should be minimally invasive, and no source code adaptation should be necessary, allowing P2P work pools to be integrated without disrupting Grid
operation.
III. OPPORTUNISTIC GRID WORK POOLS
In this section, the design of the P2P scheduling system Omnivore is described. Omnivore combines Grid and P2P paradigms, facilitating Grid-enabled, decentralized job scheduling. It does so by enabling a site to create opportunistic work pools, composed of idle or otherwise unused machines, and integrating them with Grids at the level of meta-scheduling. We have previously presented the design and implementation of a decentralized, P2P-based meta-scheduler called PPM [9]. Omnivore builds on PPM and adds information retrieval mechanisms and a distributed data management solution. Most notably, in contrast to PPM, Omnivore is seamlessly integrated with the GridWay meta-scheduler.
Omnivore's system design is geared towards achieving the goals formulated in the previous section. The performance objectives, namely scalability, resilience and efficiency, are achieved by employing suitable algorithmic patterns, which are implemented on top of proven P2P components. The integration goal is pursued by providing a meta-scheduling link to the Grid, thereby hiding the internals of the system from users. To establish a robust P2P foundation for the architecture, the P2P substrate FreePastry is used as our implementation basis. For the purpose of obtaining global status information, a distributed tree algorithm has been devised. The algorithm is implemented with the help of Scribe [5], a FreePastry component. Storage requirements are addressed by employing distributed list structures whose implementation is based on PAST [15], also developed by the FreePastry project. The integration with meta-scheduling systems is achieved by interfacing Omnivore with the GridWay meta-scheduling system [13]. Omnivore's architecture is based on three components:
• Information subsystem - responsible for collecting information concerning the system as a whole.
Information collected by this component is fed into GridWay's scheduling algorithms as a basis for decision making.
• Storage subsystem - responsible for storing data necessary for the operation of Omnivore itself as well as data relevant for job execution, i.e. input and output files.
• Execution subsystem - responsible for handling job descriptions and dispatching jobs to PPM. It processes scheduling requests received from GridWay and modifies them as necessary before passing them on.
In the following, the design of the components of Omnivore is discussed.
A. Information Subsystem
The information subsystem is responsible for collecting information concerning the system as a whole. For example, the GridWay meta-scheduler needs to know how many nodes are available in the system, how many computing slots are currently available, etc. To obtain this data, the information subsystem employs a tree-based algorithm to reduce
communication complexity while maintaining an ample level of responsiveness to changes in resource availability. Nodes are organized in a single tree structure; churn is dealt with by dynamically managing the tree. Consequently, the tree structure changes over time. The tree management problem is solved by employing Scribe, a multicast application built on FreePastry. A higher-level information retrieval algorithm has been developed as part of Omnivore, as described in the following.
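Ahead of Omnivore's own listing, the general idea of bottom-up aggregation over such a tree can be illustrated with a minimal Python sketch. It assumes an in-memory tree of hypothetical `TreeNode` objects and a hypothetical `aggregate` function; in Omnivore the tree is managed by Scribe and the summaries travel as messages between peers, which this sketch does not model. Each node starts from its local information (here only a node count of one and its free slots), adds the summaries reported by its children, and hands the result to its parent, so the root obtains system-wide totals without any node contacting every other node.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class TreeNode:
    """A peer in a hypothetical in-memory aggregation tree (not a Scribe API)."""
    name: str
    free_slots: int
    children: List["TreeNode"] = field(default_factory=list)


def aggregate(node: TreeNode) -> Tuple[int, int]:
    """Return (node_count, free_slots) for the subtree rooted at `node`.

    Each node starts from its own local information and adds the
    summaries of its children, so one bottom-up pass gives the root
    system-wide totals.
    """
    node_count, free_slots = 1, node.free_slots  # local contribution
    for child in node.children:
        child_count, child_slots = aggregate(child)  # child's subtree summary
        node_count += child_count
        free_slots += child_slots
    return node_count, free_slots


if __name__ == "__main__":
    c, d = TreeNode("c", 2), TreeNode("d", 0)
    b = TreeNode("b", 1, [c, d])
    root = TreeNode("a", 4, [b, TreeNode("e", 3)])
    print(aggregate(root))  # (5, 10)

    # Simulate churn: node "c" leaves, so its contribution drops out of the totals.
    b.children.remove(c)
    print(aggregate(root))  # (4, 8)
```

Removing a node and re-running the aggregation mimics how the totals adapt after churn: once the tree has been repaired, the next aggregation round reports the updated figures. Each node only exchanges summaries with its parent and children, which is what keeps the communication effort per node small.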
local_info = (local_nodecount, local_freeslots, ...);
for i=1 to n do
    receive info from child i;
    child_info[i]