Multiagent and Grid Systems, vol. 2, no. 1, 2006, pp. 45-59. The original publication is available at www.iospress.nl
A Probabilistic Scheduling Heuristic for Computational Grids

Youcef Derbal
School of Information Technology Management, Ryerson University
350 Victoria Street, Toronto, ON, Canada M5B 2K3
Tel: 1 (416) 979-5000 x7918
E-mail: [email protected]
Abstract

Computational grids are large scale distributed networks of peer clusters of computing resources bound by a decentralized management framework for the purpose of providing computing services, called grid services. The scheduling problem consists in finding the clusters that host the required set of grid services with a sufficient available capacity to handle user service requests in compliance with some specified quality of service. The interplay of intermittent resource participation, resource load dynamics, network latency and processing delay, and random subsystem failures creates a ubiquitous uncertainty on the state of the grid capacity to handle user requests. In addition to the need to account for this uncertainty, the scheduling strategy has to be decentralized since computational grids span distinct management domains. In this paper, we propose a decentralized scheduling strategy that views the dynamics of the grid service capacity as a stochastic process modeled by a Markov chain. The proposed scheduling scheme uses this model to predict the future local availability of resources. This is consolidated by a confidence model that approximates the future ability of peer clusters to successfully handle delegated service requests. The scalability of the proposed scheduling strategy is illustrated through simulation.

Keywords: Probabilistic Scheduling, Decentralized Scheduling, Service Capacity, Computational Grid, Markov Chain.
1. Introduction

Computing as a utility is the vision fuelling the emerging field of grid computing. The guiding principle of this vision is the sharing or economic exploitation of computing resources through aggregation and pooling within a virtual organization called a computational grid (CG). The potential utility of CGs is enormous in both scientific research and business, where they are emerging as feasible alternatives to traditional supercomputing systems in the undertaking of grand challenge applications such as pharmaceutical drug discovery, oil reservoir modeling, and large scale environmental modeling [12, 34, 56]. One constant and central question in grid computing is: where will a task be handled when submitted to a grid? This constitutes the basic issue underlying the grid scheduling problem, which resembles its classical counterpart for processor-array-based computing systems [2]. However, there are many prominent distinctions, including the lack of central control, the unbounded number of computing hosts, the variable load and availability, the diverse usage policies of the resource owners, the uncertainty on the knowledge of resource state, and the large scale geographic distribution of CGs [29]. In response to these challenges, new scheduling methods are needed. These may take advantage of the rich pool of research on traditional scheduling in various application domains, in particular high performance computing. In this application scope, scheduling may be approached either from the perspective of the application program or from the view of the underlying processing system [9]. The perspective of the application program divides the scheduling techniques into static and dynamic categories [2]. Static scheduling assumes the knowledge, at compile time, of many properties of the program, including data dependencies and task processing times [4, 27, 40]. The execution process of
submitted jobs can therefore be modeled using a Directed Acyclic Graph (DAG), where the node weights represent the task processing times and the edge weights represent the inter-task dependencies and communication times [20, 45-47]. Dynamic scheduling techniques, on the other hand, are devised to handle programs for which the above assumptions may not always be appropriate [27]. A large number of the latter techniques rely on dynamic processor load balancing through "idle-cycle stealing" so as to minimize the overall execution time of the parallel program as well as the scheduling overhead. The scheduling strategies may also be categorized in relation to the architecture of the system. The Bounded Number of Processors category applies to a fully connected cluster of processors where the underlying physical network is fast enough to neglect the communication overhead [3]. Efficient scheduling can in this case be achieved since the scheduling problem is reduced to the allocation of tasks to one of the available processors irrespective of their network location. On the other hand, the Arbitrary Number of Processors scheduling category is more complex. In this case the scheduling performance is significantly affected by the underlying topology of the inter-processor connectivity and the inter-task communication and synchronization delays [3]. The grid scheduling problem has significant commonality with dynamic scheduling for a system with an arbitrary number of processors. However, while the emphasis in traditional scheduling for high performance computing systems is on the minimization of the execution time for a parallel program, in grid scheduling the focus is on the handling of user requests in compliance with their specified quality of service (QoS), in addition to the maximization of grid resource exploitation and the system throughput. For most of the above mentioned scheduling strategies, the difficulty is well illustrated by the proven NP-completeness of the DAG scheduling variants, with the exception of some special cases [27, 31]. The assumptions behind many of the proposed heuristics with polynomial-time complexity [17, 27] are in practice not feasible for the open grid environment. However, the bulk of the scheduling principles and techniques utilized in traditional distributed computing systems [7, 9, 17] have inspired a significant portion of the grid scheduling strategies, especially those applied to grid enabled applications that are decomposable into independent tasks, including parameter sweep applications [15, 28, 38, 51]. Current grid resource management systems use a variety of scheduling models and strategies [5, 6, 13, 16, 18, 37, 41, 42, 48-50, 61]. Some of these systems, such as Condor [49] and LSF [61], can be integrated as part of an overarching grid infrastructure such as Globus [21] to provide high throughput scheduling without necessarily being concerned with the optimization of the resource allocation function. Batch queuing with first-come-first-served scheduling policies is supported by many of the above systems, including PBS [37], LSF [61], Condor [49], Maui [41], and EASY [48]. Several resource management systems, including Globus [21], Legion [18], NetSolve [16], and Condor [49], utilize scheduling strategies that rely on a fixed-query function based mechanism for the discovery of resources and their subsequent selection to match the requirements of the submitted jobs [43].
This is in contrast to agent-based scheduling where agent-initiated resource advertisement and discovery decisions are adaptively synthesized based on performance prediction data such as the expected task execution time [13]. In order to address the varying load and resource performance in a grid, many adaptive and performance-based scheduling methods have been proposed [5, 6, 13, 14, 30, 36, 44, 50, 54, 55, 57, 59]. For these approaches, the resource allocation is based on a prediction model of performance which may involve resource performance, application execution behavior, and load conditions. Often, the performance model is used to predict the expected task start time and task completion time [13, 36, 55]. In AppLeS [6], application specific schedulers are designed based on predefined templates. The scheduling strategy is adaptively synthesized based on the predicted performance of resources and the user QoS. This custom approach is particularly suited for parameter sweep applications such as MCell [15]. He et al. [36] formulated a scheduling heuristic that takes into
consideration the quality of service. The approach relies on a prediction model of the task completion time proposed in [60]. Yang et al. [59] proposed a work allocation scheme whereby less work is assigned to resources with higher expected load variance so as to achieve the completion of all the tasks for a given job at roughly the same time for all the resources involved in the processing. A time-series based predictor provides the expected load and variance [58]. In [32] the grid is viewed as a collection of independent resource clusters equipped with a hierarchical scheduling mechanism. Tasks that cannot be scheduled at the cluster level are delegated to the grid layer where a grid scheduler will then attempt to find an appropriate cluster that has the required resources. A similar hierarchical scheduling mechanism is used in [35]. However, a more comprehensive performance optimization is applied at each level of the management hierarchy to achieve a quality of service defined by performance metrics such as over-deadline, makespan, and idle-time. Economy-centric scheduling makes up another category of strategies used for the grid environment. These are not the focus of this paper; however, it may be informative to mention that they are built around economic considerations such as resource price, utilization budget, and the overall economic utility of resources to the consumer [1, 10, 11, 19]. The performance models underlying most of the surveyed scheduling strategies are focused on metrics related to the task execution such as the expected task start time and task completion time. These models may be adequate for parameter sweep applications which are essentially dependent on the availability of processor computing cycles and storage memory. However, for the handling of complex grid services, the availability of compute cycles and storage memory are only a few of a potentially long list of relevant variables, which may include the response time of database servers that host datasets, the availability of sufficient software licenses, the response time of application servers that host application services, and the availability of devices such as Digital Signal Processors which may be used for specialized processing. Because of the coupling induced by the competing resource needs of deployed services, the aggregated capability of the hosting environment is a more appropriate measure of its ability to handle service requests. In addition, the interplay of intermittent resource participation, resource load dynamics, network latency and processing delay, and random subsystem failures creates a ubiquitous uncertainty on the state of the grid capacity. Hence, new scheduling approaches are needed where consideration is given to the aggregate capacity of a hosting environment as well as the uncertainty on the resource availability and load information. Furthermore, in addition to the necessary adaptability to the dynamic and uncertain environment of the grid, the scalability of the scheduling strategies, which is critical for the practical feasibility of computational grids, should be a central point of concern. In this respect, many of the surveyed scheduling models rely on a central grid management layer to facilitate the scheduling and migration of jobs among distinct management domains [25, 30, 32, 35, 39]. These centralized approaches may be appropriate for a grid with highly integrated management domains.
However, they are detrimental to scalability and may not be appropriate for grids that span multiple management domains bound by loose dynamic associations. This paper presents a decentralized grid scheduling strategy which views the grid as a dynamic federation of resource clusters contributed by various providers. Each cluster constitutes a private management domain and is defined by a set of service providing agents attached to computing hosts such as desktops, devices and instruments, high performance computing clusters, supercomputers, and special computing systems. The scheduling strategy consists of a series of service request delegations to peer clusters, if any are required, and a single local cluster scheduling step. The decision of delegation versus local scheduling is adaptively synthesized through a comparative assessment of the probabilistic prediction of the future local availability of service capacity and the confidence in the capabilities of peer clusters to handle the request in compliance with a specified quality of service. The measure of confidence in peer clusters is estimated through a confidence model maintained by each
cluster about its peers based on their past performance with respect to their handling of delegated service requests. In clear contrast to the surveyed scheduling strategies, the proposed approach utilizes a service capacity measure that quantifies the aggregate capability of the hosting environment to handle the requests of a hosted grid service [24]. This measure embodies the coupling induced by the resource utilization of the grid services as they compete for the consumption of common resources such as compute cycles, run-time memory, database servers, application servers, software licenses, and network bandwidth. Unlike the task-centric performance metrics, which narrowly characterize the resource performance vis-à-vis a given set of tasks, the service capacity measure characterizes the capability of the provider's hosting environment vis-à-vis an entire grid service. The proposed scheduling strategy uses a Markov chain to model the dynamics of the available service capacity so as to account for the stochastic load and the uncertainty on the monitored resource state information. In this model, the future state of the available service capacity depends only on the current state of the available service capacity. This probabilistic approach may be more appropriate for the stochastic grid load when compared to the many predictive models of performance used in the surveyed works, which assume that the grid load pattern can be adequately modeled by linear statistical models such as time-series [26, 58, 59]. Section 2 provides a formulation of the grid scheduling problem and describes the details of the proposed scheduling strategy. Section 3 includes the simulation results and a discussion about some open issues. Related works and conclusions are provided in section 4.
2. Probabilistic Grid Scheduling

Given the emergence of the service oriented computing paradigm [53], and the increasing acceptance of the service oriented architecture, computational grids are increasingly viewed as large scale distributed environments for service provision. Some of these services include compute cycles, storage services, bandwidth provisioning services, and application services such as data mining and processing. The exploitation of grid resources is enabled through user submitted service requests that may require one or more grid services to be available with sufficient capacity. Hence, we define a User Service Request (USR) as follows:
$$usr = (\Omega, R, \Phi, Q) \qquad (1)$$

where $\Omega = \{s_0, s_1, \dots, s_{m-1}\}$ and $R = \{c_0, c_1, \dots, c_{m-1}\}$. Each $s_i$, $i = 0, \dots, m-1$, is a required grid service and $c_i$, $i = 0, \dots, m-1$, is the capacity required for service $s_i$. The service capacity is expressed as a number of servslots, where a servslot is defined as the unit capacity of the hosting environment necessary to run a single instance of the service in question [24]. This definition is analogous to the combination of server-share and service-share units introduced in [33]. It represents the aggregated capacity of the collection of software and hardware resources required for the successful operation of a single service instance. The resources in question may include CPU slots, RAM, special hardware devices, disk space, and cache size, as well as any required licenses of utility software that the service instance may need for its successful operation. If the service requires for its execution a specific Operating System (OS), a particular processor architecture, or the presence of a Java Virtual Machine (JVM) with a required heap size, then these would be part of the resources attached to a servslot. $Q$ is a user defined set of QoS parameters which may, for example, include the maximum wait time before scheduling. The service handling flow $\Phi$, which may be specified using a state machine, defines the execution sequence of the various tasks associated with the handling of the USR.
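As an informal illustration (not part of the original formulation), a USR of the form (1) can be represented by a simple record. The field names below are hypothetical and chosen only for readability; this is a sketch, not an interface defined by the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class UserServiceRequest:
    """Illustrative container for a USR (equation 1): usr = (Omega, R, Phi, Q)."""
    services: List[str]                  # Omega: required grid services s_0 .. s_{m-1}
    capacities: List[int]                # R: required capacity per service, in servslots
    handling_flow: List[str] = field(default_factory=list)  # Phi: task execution sequence
    qos: Dict[str, float] = field(default_factory=dict)     # Q: e.g. maximum wait time before scheduling

# Example: one required service needing 3 servslots, with a 30 s time-to-schedule limit.
usr = UserServiceRequest(services=["data-mining"],
                         capacities=[3],
                         handling_flow=["stage-in", "execute", "stage-out"],
                         qos={"tts": 30.0})
```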
For the remainder of the paper we will assume that a CG is a collection of service providing clusters (providers), each capable of locally scheduling as well as handling service requests for which it has the required service capacity (figure 1). The solid lines define the topology of the grid, i.e. the pathways for the information exchange about the capacity of hosted services. This in turn defines the notion of a neighbor. Hence, two clusters are considered to be neighbors if and only if there is an information pathway linking them as defined above. Furthermore, any grid cluster may interact with any peer for which it has the locating information, i.e. the address of the peer cluster. This locating information would be supplied by a neighbor. The grid scheduling problem can be formulated along the following scenario. A USR is submitted to a given cluster (submission cluster); the question then is: what is the collection of clusters that have the required service capacity to execute the tasks associated with the USR? Our proposed solution is a multi-step scheduling approach applied independently to each grid service required by the USR. Let the sequence of scheduling steps, needed to find the cluster that possesses the required capacity of the grid service $s_j$, be defined as follows:

$$A_j = \{a_i^j\}_{i=0}^{n_A}, \quad j = 0, \dots, m-1 \qquad (2)$$

where $n_A$ is a non-negative integer. Each step $a_i^j$ is bound to the target cluster $x$ where it is performed. The scheduling step $a_i^j$ has one of the following outcomes: (O1) the Time-To-Schedule (TTS), counted starting from the submission time, is exceeded; (O2) cluster $x$ has a sufficient service capacity for the required grid service $s_j$, or is expected to have a sufficient capacity within some specified future time period; (O3) the scheduling task is delegated to a neighboring cluster. The TTS denotes the maximum allowed time interval separating the USR submission and the scheduling of the last task of the USR. The TTS could be set by the user as part of the USR QoS specification. The first outcome implies that the USR cannot be scheduled in compliance with the required QoS and the entire scheduling operation is aborted. The second outcome implies that the component of the USR is deemed to be scheduled and the corresponding cluster is added to the solution set. For the third outcome, the scheduling step results in the delegation of the task handling to a peer cluster. We will assume that the coordination of the individual scheduling tasks to yield the completed USR scheduling will be performed by the submission cluster. This paper will address the individual task scheduling assuming that the tasks that make up the USR are independent and that they can be executed concurrently without requiring any inter-task communication or synchronization. Our approach views the scheduling problem as a set of two sub-problems, namely: (1) a local scheduling sub-problem; and (2) a delegation sub-problem. For the local scheduling sub-problem we developed a resource state estimation model, based on Markov chains, in order to predict the future availability of local resources. In addition, we propose a confidence model that estimates, based on past interactions between peer clusters, the likelihood that a delegation of a service request to a given cluster would lead to a successful handling. The scheduling strategy built using the above models is fully decentralized so as to achieve the critical scalability characteristic. One of the elements of the proposed scheme that targets the achievement of scalability is the built-in incentive for the consumption of closer resources before seeking the consumption of distant resources at a higher cost of network bandwidth and failure rate. The overall scheduling heuristic is captured using the state chart of figure 2, and is further illustrated in figure 3. In figure 2, the rounded rectangles are action states. The actions consist in evaluating the right-hand side of the associated expression. The expressions to the right of the "/" are actions to be taken on the transition between the corresponding states.
Throughout this paper, each cluster is assumed to be uniquely identified by a natural number. The identifier can, for example, be determined from the IP address of the principal using a universal mapping accessible through an out-of-band process, or simply communicated by a neighboring cluster as part of the process of joining the grid. At time zero of the scheduling life cycle, the target cluster $x$ is set to the submission cluster $x_{sub}$, i.e. the cluster that first received the USR.
The function $\Psi : (G, \Re_G) \rightarrow \{0,1\}$ evaluates to 1 if $\exists\, \hat{s} \in \Re_x$ such that $\hat{s} \equiv s$ and $\hat{s}$ has sufficient residual capacity to meet the requirement associated with the requested service $s$. In all other cases $\Psi(\cdot,\cdot)$ evaluates to zero. $\hat{s} \equiv s$ means that the two services $s$ and $\hat{s}$ are equivalent in the sense that they have the same name, the same published interface, and the same description. $G$ is the set of clusters that make up the grid, and $\Re_G = \bigcup_{x \in G} \Re_x$, where $\Re_x$ is the set of grid services maintained in a registry bound to cluster $x$. The function $g : (G, \Re_G) \rightarrow Ne(x)$ yields the selection of the cluster $y^{(*)} \in Ne(x)$ such that:

$$\alpha(x, y^{(*)}, s_j, n) = \max_{y \in Ne(x)} \alpha(x, y, s_j, n) \qquad (3)$$

$Ne(x)$ is the set of neighbors of cluster $x$. $n \in \mathbb{N}$ represents the discrete time, where $\mathbb{N}$ is the set of natural numbers. The function $\alpha : (G, G, \Re_G, \mathbb{N}) \rightarrow [0,1]$ is called a confidence index. Each cluster maintains a confidence index about its neighbors to quantify the likelihood that a scheduling task delegated to a given cluster would lead to a successful completion. The details of the above heuristic will be explicated in the next subsections.
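To make the flow of the heuristic concrete, the following sketch mirrors the state chart of figure 2: try local scheduling via $\Psi$, otherwise delegate to the neighbor selected by $g$, until the TTS expires. It is a simplification, not the author's implementation; the cluster methods `psi` and `best_neighbor`, and the assumption that the TTS is given in seconds, are illustrative.

```python
import time

def schedule_service(x_sub, s_j, required_capacity, tts):
    """Multi-step scheduling of one required grid service s_j (sketch of figure 2).

    x_sub is the submission cluster. Clusters are assumed to expose two hypothetical
    methods: psi(s, c), implementing the decision function Psi (equation 12), and
    best_neighbor(s, n), implementing g, i.e. the argmax of the confidence index (3).
    """
    t0 = time.time()
    x = x_sub                                   # scheduling starts at the submission cluster
    n = 0                                       # index of the scheduling step
    while time.time() - t0 < tts:               # outcome O1: abort once the TTS is exceeded
        if x.psi(s_j, required_capacity) == 1:  # outcome O2: sufficient (or predicted) capacity
            return x                            # cluster x is added to the solution set
        x = x.best_neighbor(s_j, n)             # outcome O3: delegate to the most trusted neighbor
        n += 1
    return None                                 # scheduling failed within the required QoS
```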
2.1 Prediction Model of Service Capacity

In this paper we assume the existence of a mechanism that provides the instantaneous value of service capacity based on the availability of the resources required by the service in question. A detailed discussion of such a mechanism is given in [24], and an experimental validation of a linear model of the mapping between resource availability and service capacity is given in [23]. The instantaneous service capacity depends on the time-varying load of local and delegated service requests as well as the distribution of the requested service types. In addition, the monitoring and estimation of service capacity is subject to a ubiquitous uncertainty caused by the intermittent host participation, network latency and processing delay, and random subsystem failures. In order to account for the interplay of this uncertainty and the load dynamics, which may be assumed random, the capacity dynamics of a given service is modeled as a stochastic process $\{X_n, n = 0, 1, 2, \dots\}$ that takes on a finite or countable number of possible values of servslots. The set of possible values of the process is the set of non-negative integers $\{0, 1, 2, \dots\}$. The index $n$ represents discrete time. The process is said to be in state $i$ at time $n$ if $X_n = i$. Let us assume that whenever the process is in state $i$, there exists a constant probability $P_{ij}$ that the process will next be in state $j$ such that:

$$P_{ij} = P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \dots, X_0 = i_0\} \qquad (4)$$

Assuming that this is true for all $n \geq 0$ and all states $i_0, i_1, \dots, i_{n-1}, i, j$, the process under consideration is known as a Markov chain [52]. The probabilities $P_{ij}$ are also called the one-step transition probabilities since they are associated with a single increment of the time index variable. We can then define the matrix of one-step transition probabilities as follows:

$$\mathbf{P} = \begin{bmatrix} P_{00} & P_{01} & \dots \\ P_{10} & P_{11} & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \qquad (5)$$
Similarly, we can define the matrix of the n-step transition probabilities as follows:

$$\mathbf{P}^{(n)} = \begin{bmatrix} P_{00}^{(n)} & P_{01}^{(n)} & \dots \\ P_{10}^{(n)} & P_{11}^{(n)} & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \qquad (6)$$

where

$$P_{ij}^{(n)} = P\{X_{n+m} = j \mid X_m = i\} \qquad (7)$$
That is, $P_{ij}^{(n)}$ represents the probability that the service capacity takes a value equal to $j$ at time $n+m$ knowing that it took a value of $i$ at time $m$. Each cluster maintains a registry of its service offerings. Let $s$ be one such service, and let $l_{ij}^{(n)}$ be the total number of events, recorded in the time window $(0, n]$, where the capacity of $s$ has changed from a value $i$ to a value $j$ according to the cluster's resource accounting and management process. Then, we can estimate the one-step transition probabilities at time $n$ as follows:

$$\hat{P}_{ij}(n) = \frac{l_{ij}^{(n)}}{n} \qquad (8)$$
These transition probabilities are maintained in the service registry and updated at every discrete moment of time. The above equation can be rewritten in the following form:

$$\hat{P}_{ij}(n) = \frac{l_{ij}^{(n-1)} + n_0}{n} \qquad (9)$$

where $n_0 = 1$ if a transition $i \rightarrow j$ occurs at time $n$, and $n_0 = 0$ otherwise. Using the above relation we can rewrite (9) in the following recursive form:

$$\hat{P}_{ij}(n) = (1 - \lambda)\,\hat{P}_{ij}(n-1) + \lambda\, n_0, \qquad \lambda = 1/n \qquad (10)$$
For the implementation of the above algorithm we start with $\hat{P}_{ij}(0) = 0$ for all elements of the transition matrix except for $\hat{P}_{hh}(0)$, which is set to 1, where $h$ is the state of high service capacity. Given the estimates of the one-step transition probabilities, the model can provide an n-step-ahead prediction of a specific service capacity state such that:

$$\mathbf{P}^{(n)} = \mathbf{P}^n \qquad (11)$$

The above relation is a direct result of the Chapman-Kolmogorov equations [52]. Using (5), (6), (10), and (11) it is now possible to predict the service capacity state for any arbitrary future time-step. This capability is the basis of the service capacity prediction model used in the next subsection to complete the formulation of the scheduling strategy started earlier. The overall integration of the prediction model within the scheduling strategy is illustrated in figure 4.
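A minimal sketch of this prediction model is given below, assuming the service capacity has been discretized into a small number of states. The class and method names are illustrative only; it applies the recursive update (10) element-wise and the matrix power (11) for the m-step-ahead prediction.

```python
import numpy as np

class CapacityPredictor:
    """Markov-chain model of service capacity: online estimate of the one-step
    transition matrix (equations 8-10) and m-step-ahead prediction (equation 11)."""

    def __init__(self, num_states, high_state):
        # Start with all mass on remaining in the high-capacity state h (Section 2.1).
        self.P = np.zeros((num_states, num_states))
        self.P[high_state, high_state] = 1.0
        self.n = 0  # discrete time index

    def observe(self, i, j):
        """Record that the capacity state changed from i to j at the current time step."""
        self.n += 1
        lam = 1.0 / self.n
        indicator = np.zeros_like(self.P)
        indicator[i, j] = 1.0  # n0 = 1 only for the observed transition i -> j
        # Recursive update (10): P_hat(n) = (1 - lambda) * P_hat(n-1) + lambda * n0
        self.P = (1.0 - lam) * self.P + lam * indicator

    def prob(self, i, k, m):
        """Probability of being in capacity state k, m steps ahead, given state i now (11)."""
        return np.linalg.matrix_power(self.P, m)[i, k]

# Example: four capacity states, state 3 being the high-capacity state h.
pred = CapacityPredictor(num_states=4, high_state=3)
pred.observe(3, 2)   # capacity dropped from state 3 to state 2
pred.observe(2, 2)   # capacity stayed in state 2
print(pred.prob(i=2, k=3, m=5))  # estimated chance of reaching high capacity in 5 steps
```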
2.2 Delegation Decision and Confidence Model

Let us assume that at the discrete time $n$, the capacity of a requested service $s$ hosted by a given cluster is in state $i \in \Sigma$, where $\Sigma$ is the set of all possible states of the service capacity as defined by the above Markov chain model. Let us also assume that at the same time $n$, the cluster hosting the service $s$ has a number of queued requests for this service with a required cumulative capacity of $C_0 > 0$ that is expected, with high probability, to be available at time $n + m$, $m > 0$, as suggested by (5), (6), (10), and (11). In addition to the already queued requests, a request is received for the service $s$ in question with a required capacity $\delta C > 0$ expected to be available at time $n + m$, $m > 0$. In other words, with the received request for the service $s$ at time $n$, the required value of the capacity of $s$ at time $n + m$ is $C_0 + \delta C$. Let $k$ be the state of the capacity of service $s$ associated with the value $C_0 + \delta C$; then the definition of $\Psi(\cdot,\cdot)$ can be detailed as follows:

$$\Psi(s, x) = \begin{cases} 1 & \text{if } \exists\, \hat{s} \in \Re_x \text{ such that } \hat{s} \equiv s \text{ and } \hat{P}_{ik}^{(m)} \geq p_0 \\ 0 & \text{otherwise} \end{cases} \qquad (12)$$

$0 < p_0 \leq 1$ is a positive real number which represents a decision threshold to be experimentally tuned. $\hat{P}_{ik}^{(m)}$ is to be computed using (5), (6), (10), and (11). For the definition of the confidence index, let $n_{yx}(s, n, M)$ be the number of instances, counted over a discrete time window of length $M$, i.e. $(n - M, n]$, where cluster $x$ has delegated to cluster $y$ the scheduling of the task associated with the handling of a request of service $s$. Let $m_{yx}(s, n, M)$ be the size of the set of requests successfully handled in compliance with the specified QoS within the same period of time $(n - M, n]$. Hence, the confidence index definition is detailed as follows:

$$\alpha(x, y, s, n) = \frac{m_{yx}(s, n, M)}{n_{yx}(s, n, M)} \qquad (13)$$
The above confidence index informs the delegating cluster about the track record of a peer cluster with respect to its ability to handle the scheduling of delegated tasks. The confidence index is set to zero if $n_{yx}(s, n, M) = 0$. Initially, with the confidence indices being set to zero, delegations are made to randomly chosen neighbors. As the number of delegations to neighbors reaches an experimentally tuned threshold, the full use of the confidence model is enabled. Other models of confidence may be explored in future works.
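The delegation decision that combines the two models can be sketched as follows. This is only a plausible reading of equations (12) and (13), not an implementation from the paper; the function names, the bookkeeping of delegation statistics, and the reuse of the `prob` method from the earlier CapacityPredictor sketch are all assumptions.

```python
import random

def psi(predictor, i, k, m, p0):
    """Decision function Psi (equation 12): schedule locally if the m-step probability
    of reaching the required capacity state k from the current state i meets the
    experimentally tuned threshold p0."""
    return 1 if predictor.prob(i, k, m) >= p0 else 0

def confidence(delegated, succeeded):
    """Confidence index (equation 13) over the last M time steps: the fraction of
    delegated requests handled in compliance with the specified QoS."""
    return 0.0 if delegated == 0 else succeeded / delegated

def pick_neighbor(neighbors, stats, min_history=10):
    """Select the neighbor maximizing the confidence index; fall back to a random
    neighbor until enough delegation history has accumulated (Section 2.2)."""
    total = sum(stats[y]["delegated"] for y in neighbors)
    if total < min_history:
        return random.choice(neighbors)
    return max(neighbors,
               key=lambda y: confidence(stats[y]["delegated"], stats[y]["succeeded"]))
```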
3. Simulation Results

The Midland Grid Emulator developed by the author at Ryerson University is used for the simulation of the proposed scheduling approach (see figure 5). The emulated grid can be configured for an arbitrary number of clusters. The inter-arrival time of USRs at the various clusters is simulated using a Poisson process. For this simulation, the USRs consist of a single required grid service with a randomly generated required capacity. The scalability of the grid decision-making processes, such as scheduling, is one of the most critical performance criteria for the practical feasibility of CGs. In tangible terms, the scheduling strategy is said to be scalable if an increase in the grid size does not result in a drastic degradation of the grid performance as represented by the most pivotal performance indicators such as response time and throughput. In order to quantify these indicators, two corresponding metrics are defined, namely:

$$\nu = \frac{1}{n_R} \sum_{i=1}^{n_R} nhops_i \qquad (14)$$

$$\mu = \frac{n_H}{n_R} \qquad (15)$$

$\nu$ is the grid-wide average number of hops per successful scheduling decision and $\mu$ is the scheduling throughput rate. $n_R$ is the total grid-wide number of service requests, and $n_H$ is the number of handled service requests. $nhops_i$ is the number of hops before the $i$-th service request was successfully scheduled. The defined performance metrics are used to illustrate the performance of the probabilistic grid scheduling approach through a comparison against a scheduling strategy based on random delegation (figures 6-9). For the random scheduling scheme, a USR is locally scheduled if and only if a sufficient capacity is available at the time of scheduling. Otherwise, the USR is delegated to a randomly selected neighbor, which applies the same scheduling rule. The delegation chain is terminated when the TTS is exceeded. The simulations were conducted for balanced and unbalanced grid resource distributions. For an unbalanced distribution, the grid clusters are configured to host a randomly chosen set of services with randomly generated values of the respective service capacities. For a balanced distribution, all grid clusters are configured to host the same services with the same service capacity levels. The simulation results show that the proposed probabilistic scheduling strategy outperforms the random scheduling scheme with respect to response time (figures 6 and 8) and throughput (figures 7 and 9). The simulations were conducted under the exact same configuration of the Midland Grid Emulator for both strategies, which equally benefit from a multi-tiered scheduling approach with a neighborhood-bound delegation mechanism. The main distinction between the proposed strategy and the random scheduling scheme is in the integrated use of the confidence and prediction models to choose the peer cluster to which the service request is to be delegated when future local availability of
sufficient capacity is not expected. Hence, the encouraging simulation results may be attributed to the performance of the prediction and confidence models. The prediction enables the local queuing of service requests in anticipation of future availability of the required resources, thus avoiding a costly delegation that may require a higher number of hops before convergence. The delegation decision is further constrained by the confidence index, which provides a subset of neighboring peers deemed most likely to succeed in handling the type of service request in question. The resulting reduction of the rejection rate of service requests, due to an exceeded TTS limit, may explain the improved throughput of the proposed approach compared to the random scheduling strategy (figures 7 and 9). Furthermore, from the perspective of a submission cluster, the information provided by the confidence index diminishes the uncertainty on the capacity of peer clusters, hence enabling, in conjunction with the prediction of local capacity, a delegation decision that induces better performance compared to the random scheduling scheme (figures 6 and 8). In essence, the proposed scheduling strategy encourages the consumption of closer resources as opposed to seeking the use of distant ones. This is further facilitated by the assumed CG topology which limits the pathways of service request delegation to the communication links between neighbors. As a result, the service requests tend to be delegated within a relatively small radius (measured in number of hops) from their submission clusters, especially for a balanced resource distribution. In particular, the average number of hops may be approximated by a linear function of the grid size for an unbalanced resource distribution (figure 6). In contrast, the random scheduling scheme exhibits a significantly higher average number of hops for most grid sizes. The improved performance for the balanced resource configuration compared to the unbalanced resource configuration is expected for both scheduling strategies (figure 8). Indeed, given the uniformly distributed load and initial resource availability for all clusters, most service requests would end up being handled by neighbors close to the submission cluster irrespective of the grid size. In this respect, the proposed scheduling approach yields an average number of hops that is upper-bounded by a ceiling and 60% lower compared to the random scheduling scheme (figure 8). However, it would be of great value to investigate the "stability" question associated with the proposed delegation strategy. That is, given a submitted service request, under what conditions on the parameters of the confidence and prediction models, and on the CG topology, would such a service request be successfully handled by a cluster within a given radius r* of the submission cluster? While the proposed scheduling scheme offers some encouraging results with respect to scalability, more investigations are needed for a number of issues such as the capability of the proposed approach to assure compliance with a user specified QoS. Furthermore, although one may claim, from the simulation results, that the increase in the size of the grid does not significantly increase the number of hops, we also need to consider the effect on the quality of service of the actual local wait-time-before-handling of a service request.
For future works, the interplay between the prediction model and the confidence model may require further analysis, in the context of a QoS model, to establish a tradeoff between the local wait-time-before-handling and the number of hops.
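For completeness, the two metrics of equations (14) and (15) can be computed from a per-request log as in the sketch below. The log format, and the treatment of rejected requests (counted in $n_R$ but contributing no hops), are assumptions made for illustration.

```python
def scalability_metrics(hops_per_request):
    """Compute nu (14) and mu (15) from a per-request log.

    hops_per_request: one entry per submitted request; the entry is the number of hops
    for a successfully scheduled request, or None if the request was rejected
    (e.g. because the TTS was exceeded).
    """
    n_R = len(hops_per_request)
    handled = [h for h in hops_per_request if h is not None]
    n_H = len(handled)
    nu = sum(handled) / n_R if n_R else 0.0   # average number of hops (14)
    mu = n_H / n_R if n_R else 0.0            # scheduling throughput rate (15)
    return nu, mu

# Example: five submitted requests, one of which was rejected.
print(scalability_metrics([0, 2, 1, None, 3]))  # (1.2, 0.8)
```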
4. Related Works & Conclusion

Many works surveyed in the first section of this paper are related in various degrees to the proposed approach. This relationship involves the considered grid organization, the adaptability towards the dynamic grid environment, and the scalability of the grid scheduling function [8, 13, 14, 30, 32, 35, 36, 54]. In terms of the adaptability towards the varying resource performance and the dynamic grid load, most cited adaptive strategies include various local scheduling schemes that rely on a performance prediction model of resource and job related metrics such as the expected job start time and the job makespan (job completion time). While the predictions of job completion time may be
appropriate and feasible for a controlled hosting environment with a homogeneous set of deployed grid services, this may not be true for a grid that hosts a diverse spectrum of services. In such a grid, the ability to estimate the aggregate resource behavior of the hosting environment provides a more adequate measure of the likelihood that a service request will be handled with some defined quality of service [24, 33]. Because such aggregate behavior includes the cumulative effect of concurrent exploitation of the local resources, it is hence more indicative of the hosting environment's potential to meet the requirement of a service request. In this respect, the proposed grid scheduling approach is different from these cited works in that the local scheduling strategy relies on a model of service capacity of the hosting environment as opposed to a dictionary of name-value pairs of resource attributes. Some of the advantages of such an approach are the simplification of the decision-making mechanisms and the comprehensive consideration of all parameters relevant to the hosting environment [33]. The works cited above include a subset that is closely related and comparable to the proposed approach [32, 35]. These related scheduling strategies are appropriately formulated within the context of a multi-step scheduling framework and address the delegation and local scheduling phases using different approaches. In [35], the grid is assumed to be a collection of clusters each associated with a local scheduler. At the grid layer, a multi-cluster scheduler is responsible for the assignment of submitted jobs to the appropriate cluster based on execution time predictions of the submitted jobs. At the cluster level, a genetic algorithm is used to schedule parallel jobs over the cluster resources so as to achieve a low value of a metric named comprehensive performance (CP), defined as a linear function of job makespan, over-deadline and resource idle-time. In addition, the cluster load is quantified using a metric based on job centric parameters such as the number of running jobs, the sum of their execution times, and the total job size. This metric is used by the scheduler at the grid layer to balance the load among the various clusters. Compared to [35], the proposed scheduling approach possesses an inherent scalability advantage since no scheduler is required outside the confines of the independent management domain of a cluster. The architecture adopted in this work is coherent with the desirable organization of the grid as a decentralized system of independent providers with distinct management domains and usage policies [22]. Furthermore, in contrast to [35], the proposed strategy provides a mechanism that enables the choice of delegation versus local queuing of service requests through a comparison of the measure of confidence in the local prediction versus the measure of confidence in the expected quality of service that may be received from a distant provider. This mechanism may, as argued earlier, encourage the exploitation of closer resources with a better response time and lower network cost as opposed to the use of distant resources at a higher bandwidth cost and with a longer response time.
Its introduction in this paper is a first step towards the development of a management framework that enables the efficient resource exploitation in the face of the ubiquitous uncertainty on the resource state information and the quality of service delivered by the providers. In [32], a scheduling model is proposed for the execution of parallel jobs on a multi-cluster grid. It consists of a grid scheduler cooperating with a set of cluster bound schedulers. If the availability of cluster processors is not sufficient for the tasks' execution of a given parallel job, the cluster's scheduler attempts to find a neighboring cluster (without the involvement of the grid scheduler) that can execute the job. If such a neighboring cluster is found, the job is migrated accordingly. Otherwise the job is sent to the grid scheduler where it is subsequently assigned to another cluster (chosen among all grid clusters) that can execute the job. Given the inter-neighbor cooperation between clusters, this approach may offer a better scalability compared to [35]. However, the need for a central grid scheduler is detrimental to the overall scalability of the scheduling approach. In addition, the lack of an adaptability mechanism in this approach is an overall barrier against its wider use in a grid with varying resource performance and user load.
The probabilistic grid scheduling approach presented in this paper assumes a decentralized organization of the grid. It uses a Markov chain model to predict the capacity of the hosting environment to handle a given service. This model accounts for the uncertainty on the resource state information, the dynamic user load, and varying resource performance. In addition, an experience-based confidence model is utilized to improve the delegation performance. The scalability advantage of the presented scheduling strategy is well illustrated by the simulation results for both balanced and unbalanced resource distributions. Both defined performance metrics exhibit a relatively stable level as the grid size is increased from 100 to 1000 clusters. The results of the comparison against the random scheduling strategy indicate, at least for the resource distribution and user load patterns used in the simulation, that the confidence and prediction models have a positive effect on the scheduling strategy. However, as part of future works, an explicit consideration of more comprehensive QoS models may be needed if the delegation is performed between clusters cooperating through negotiated service level agreements. Such QoS models may include metrics such as the wait-time-before-handling and the rate of service refusal. The quantified variance of the delivered QoS from the desired QoS may then be added as another factor in the confidence index.
Acknowledgement

I am grateful to Professor Aziz Guergachi and the Canadian Foundation for Innovation for providing the computing resources used to run the simulations reported in this paper.
References

[1] D. Abramson, R. Buyya, and J. Giddy, A computational economy for grid computing and its implementation in the Nimrod-G resource broker, Future Generation Computer Systems 18 (2002), 1061-1074.
[2] I. Ahmad and Y.-K. Kwok, On parallelizing the multiprocessor scheduling problem, IEEE Transactions on Parallel and Distributed Systems 10 (1999), 414-432.
[3] I. Ahmad and Y.-K. Kwok, Parallel Program Scheduling Techniques, in: High Performance Cluster Computing, vol. 1, Prentice Hall PTR, New Jersey, 1999, pp. 553-578.
[4] M. A. Al-Mouhamed, Lower bound on the number of processors and time for scheduling precedence graphs with communication costs, IEEE Transactions on Software Engineering 16 (1990), 1317-1322.
[5] F. Berman, H. Casanova, A. Chien, K. Cooper, H. Dail, A. Dasgupta, W. Deng, J. Dongarra, L. Johnsson, K. Kennedy, C. Koelbel, B. Liu, X. Liu, A. Mandal, G. Marin, M. Mazina, J. Mellor-Crummey, C. Mendes, A. Olugbile, M. Patel, D. Reed, Z. Shi, O. Sievert, H. Xia, and A. Yarkhan, New grid scheduling and rescheduling methods in the GrADS project, International Journal of Parallel Programming 33 (2005), 209-229.
[6] F. Berman, R. Wolski, H. Casanova, W. Cirne, H. Dail, M. Faerman, S. Figueira, J. Hayes, G. Obertelli, J. Schopf, G. Shao, S. Smallen, N. Spring, A. Su, and D. Zagorodnov, Adaptive computing on the grid using AppLeS, IEEE Transactions on Parallel and Distributed Systems 14 (2003), 369-382.
[7] T. D. Braun, H. J. Siegel, N. Beck, L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, and B. Yao, Taxonomy for describing matching and scheduling heuristics for mixed-machine heterogeneous computing systems, in: Proceedings of the IEEE Symposium on Reliable Distributed Systems, 1998, pp. 330-335.
[8] T. D. Braun, H. J. Siegel, N. Beck, L. L. Boloni, M. Maheswaran, A. I. Reuther, J. P. Robertson, M. D. Theys, B. Yao, D. Hensgen, and R. F. Freund, A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems, Journal of Parallel and Distributed Computing 61 (2001), 810-837.
[9] R. Buyya, High performance cluster computing, Prentice Hall PTR, Upper Saddle River, N.J., 1999.
[10] R. Buyya, D. Abramson, J. Giddy, and H. Stockinger, Economic models for resource management and scheduling in grid computing, Concurrency Computation Practice and Experience 14 (2002), 1507-1542.
[11] R. Buyya, D. Abramson, and S. Venugopal, The grid economy, Proceedings of the IEEE 93 (2005), 698-714.
[12] R. Buyya, K. Branson, J. Giddy, and D. Abramson, The Virtual Laboratory: A toolset to enable distributed molecular modelling for drug design on the world-wide grid, Concurrency Computation Practice and Experience 15 (2003), 1-25.
[13] J. Cao, S. A. Jarvis, S. Saini, D. J. Kerbyson, and G. R. Nudd, ARMS: An agent-based resource management system for grid computing, Scientific Programming 10 (2002), 135-148.
[14] J. Cao, D. P. Spooner, S. A. Jarvis, and G. R. Nudd, Grid load balancing using intelligent agents, Future Generation Computer Systems 21 (2005), 135-149.
[15] H. Casanova, T. M. Bartol, J. Stiles, and F. Berman, Distributing MCell simulations on the grid, International Journal of High Performance Computing Applications 15 (2001), 243-257.
[16] H. Casanova and J. Dongarra, NetSolve: A network-enabled server for solving computational science problems, International Journal of Supercomputer Applications and High Performance Computing 11 (1997), 212-223.
[17] T. L. Casavant and J. G. Kuhl, A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems, IEEE Transactions on Software Engineering 14 (1988), 141-155.
[18] S. J. Chapin, D. Katramatos, J. Karpovich, and A. Grimshaw, Resource management in Legion, Future Generation Computer Systems 15 (1999), 583-594.
[19] L. Chunlin and L. Layuan, A distributed utility-based two level market solution for optimal resource scheduling in computational grid, Parallel Computing 31 (2005), 332-351.
[20] M. Cosnard and M. Loi, Automatic Task Graph Generation Techniques, Parallel Processing Letters 54 (1995), 527-538.
[21] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, Grid information services for distributed resource sharing, in: IEEE International Symposium on High Performance Distributed Computing, Proceedings, 2001, pp. 181-194.
[22] K. Czajkowski, I. Foster, and C. Kesselman, Agreement-based resource management, Proceedings of the IEEE 93 (2005), 631-643.
[23] Y. Derbal, Service-oriented model of grid resource availability, International Journal of Computers and Applications (2005), in review.
[24] Y. Derbal, Service oriented grid resource modeling and management, in: Proceedings of the 1st International Conference on Web Information Systems and Technologies (WEBIST 2005), 2005, pp. 146-153.
[25] V. Di Martino and M. Mililotti, Sub optimal scheduling in a grid using genetic algorithms, Parallel Computing 30 (2004), 553-565.
[26] P. A. Dinda, Online prediction of the running time of tasks, in: IEEE International Symposium on High Performance Distributed Computing, Proceedings, 2001, pp. 383-394.
[27] H. El-Rewini, T. G. Lewis, and H. H. Ali, Task scheduling in parallel and distributed systems, Prentice Hall, Englewood Cliffs, N.J., 1994.
[28] M. Faerman, A. Birnbaum, F. Berman, and H. Casanova, Resource allocation strategies for guided parameter space searches, International Journal of High Performance Computing Applications 17 (2003), 383-402.
[29] I. Foster and C. Kesselman, The grid: blueprint for a new computing infrastructure, Elsevier Science, San Francisco, 2004.
[30] Y. Gao, H. Rong, and J. Z. Huang, Adaptive grid job scheduling with genetic algorithms, Future Generation Computer Systems 21 (2005), 151-161.
[31] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, Freeman, San Francisco, 1979.
[32] A. Goldman and C. Queiroz, A model for parallel job scheduling on dynamical computer Grids, Concurrency Computation Practice and Experience 16 (2004), 461-468.
[33] S. Graupner, V. Kotov, A. Andrzejak, and H. Trinks, Service-centric globally distributed computing, IEEE Internet Computing 7 (2003), 36-43.
[34] J. Gustafson, Program of grand challenge problems: expectations and results, in: Aizu International Symposium on Parallel Algorithms/Architecture Synthesis, 1997, pp. 2-7.
[35] L. He, S. A. Jarvis, D. P. Spooner, X. Chen, and G. R. Nudd, Hybrid performance-based workload management for multiclusters and grids, IEE Proceedings: Software 151 (2004), 224-231.
[36] X. He, X. Sun, and G. Von Laszewski, QoS guided Min-Min heuristic for grid task scheduling, Journal of Computer Science and Technology 18 (2003), 442-451.
[37] R. L. Henderson, Job scheduling under the Portable Batch System, in: Lecture Notes in Computer Science, vol. 949, Springer Verlag, Heidelberg, Germany, 1995, pp. 279-294.
[38] E. Huedo, R. S. Montero, and I. M. Llorente, Experiences on adaptive grid scheduling of parameter sweep applications, in: Proceedings of the 12th Euromicro Conference on Parallel, Distributed and Network-Based Processing, Institute of Electrical and Electronics Engineers Inc., 2004, pp. 28-33.
[39] E. Huedo, R. S. Montero, and I. M. Llorente, A framework for adaptive execution in grids, Software - Practice and Experience 34 (2004), 631-651.
[40] J.-J. Hwang, Y.-C. Chow, F. D. Anger, and C.-Y. Lee, Scheduling precedence graphs in systems with interprocessor communication times, SIAM Journal on Computing 18 (1989), 244-257.
[41] D. Jackson, Q. Snell, and M. Clement, Core algorithms of the Maui scheduler, in: Lecture Notes in Computer Science, Springer, Berlin, 2001, pp. 87-102.
[42] N. Kapadia and J. Fortes, PUNCH: An architecture for Web-enabled wide-area network computing, Cluster Computing: The Journal of Networks, Software Tools and Applications 2 (1999), 153-164.
[43] K. Krauter, R. Buyya, and M. Maheswaran, A taxonomy and survey of grid resource management systems for distributed computing, Software - Practice and Experience 32 (2002), 135-164.
[44] N. Krothapalli and A. V. Deshmukh, Dynamic allocation of communicating tasks in computational grids, IIE Transactions (Institute of Industrial Engineers) 36 (2004), 1037-4053.
[45] Y.-K. Kwok and I. Ahmad, Benchmarking the task graph scheduling algorithms, in: Proceedings of the International Parallel Processing Symposium, IPPS, 1998, pp. 531-537.
[46] Y.-K. Kwok and I. Ahmad, Dynamic critical-path scheduling: an effective technique for allocating task graphs to multiprocessors, IEEE Transactions on Parallel and Distributed Systems 7 (1996), 506-521.
[47] Y.-K. Kwok and I. Ahmad, Static Scheduling Algorithms for Allocating Directed Task Graphs to Multiprocessors, ACM Computing Surveys 31 (1999), 406-471.
[48] D. Lifka, The ANL/IBM SP scheduling system, in: Lecture Notes in Computer Science, vol. 2221, Springer, Berlin, 2001, pp. 187-191.
[49] M. J. Litzkow, M. Livny, and M. W. Mutka, Condor - a hunter of idle workstations, in: Proceedings - International Conference on Distributed Computing Systems, 1988, pp. 104-111.
[50] H. Nakada, M. Sato, and S. Sekiguchi, Design and implementations of Ninf: towards a global computing infrastructure, Future Generation Computer Systems 15 (1999), 649-658.
[51] A. L. Rosenberg and M. Yurkewych, Guidelines for scheduling some common computation-dags for internet-based computing, IEEE Transactions on Computers 54 (2005), 428-438.
[52] S. M. Ross, Introduction to Probability Models, Academic Press, Inc., San Diego, CA, 1989.
[53] M. P. Singh and M. N. Huhns, Service-Oriented Computing: Semantics, Processes, Agents, John Wiley & Sons, 2005.
[54] D. P. Spooner, S. A. Jarvis, J. Cao, S. Saini, and G. R. Nudd, Local grid scheduling techniques using performance prediction, IEE Proceedings: Computers and Digital Techniques 150 (2003), 87-96.
[55] X.-H. Sun and M. Wu, Grid Harvest Service: a system for long-term, application-level task scheduling, in: Parallel and Distributed Processing Symposium, 2003, pp. 25-33.
[56] E. Tantoso, H. A. Wahab, and H. Y. Chan, Molecular docking: An example of grid enabled applications, New Generation Computing 22 (2004), 189-190.
[57] C. Weng and X. Lu, Heuristic scheduling for bag-of-tasks applications in combination with QoS in the computational grid, Future Generation Computer Systems 21 (2005), 271-280.
[58] L. Yang, I. Foster, and J. M. Schopf, Homeostatic and Tendency-based CPU Load Predictions, in: Proceedings of the Parallel and Distributed Processing Symposium, 2003, pp. 42-51.
[59] L. Yang, J. M. Schopf, and I. Foster, Conservative Scheduling: Using Predicted Variance to Improve Scheduling Decisions in Dynamic Environments, in: Proceedings of the 2003 ACM/IEEE Conference on Supercomputing, 2003, pp. 31-47.
[60] J. Yi, C. Choi, K. Park, and S. Kim, Efficient dynamic resource reallocation scheme using time-slot connection pattern, in: Proceedings of the International Conference on Parallel and Distributed Processing Techniques and Applications, Association for Computing Machinery, New York, United States, 2003, pp. 677-682.
[61] S. Zhou, X. Zheng, J. Wang, and P. Delisle, Utopia: A load sharing facility for large, heterogeneous distributed computer systems, Software - Practice and Experience 23 (1993), 224-234.
Youcef Derbal, B. Eng., M.Sc., Ph.D. (Electrical and Computer Engineering, Queen’s University). In his dissertation, Dr. Derbal investigated the stability of a class of nonlinear dynamic systems. In particular he formulated a stability theory applicable to the design of non-linear control systems, including neuro-adaptive structures. For over a decade, Dr. Derbal worked in various information and computing industries. His research interests are focused on the various decision-making mechanisms underlying grid systems, as well as the application of grid computing solutions to the simulation of large scale models of environmental and biological systems.
Fig. 1: A computational grid is viewed as a federation of clusters which expose their resources through a set of grid services. A cluster is assumed to include a special agent that organizes the cluster's internal operations and mediates the inter-cluster transactions.
Fig. 2: A UML state chart of the scheduling heuristic.
Fig. 3: Schematic of the multi-step scheduling strategy.
Fig. 4: Integration of the Markov-chain based prediction model with the scheduling strategy.
Fig. 5: The Midland Grid Emulator.
Fig. 6: Average number of hops as a function of the CG size for an unbalanced resource distribution.
Fig. 7: Average throughput as a function of the CG size for an unbalanced resource distribution.
Fig. 8: Average number of hops as a function of the CG size for a balanced resource distribution.
Fig. 9: Average throughput as a function of the CG size for a balanced resource distribution.
Fig. 1 (diagram): clusters composed of a principal agent and member agents.
Fig. 2 (state chart): initialization t = 0, x = x_sub; action p = Ψ(x, s_j); guards [p = 1], [p = 0], [t ≥ TTS]; delegation action x = g(x, s_j).
Fig. 3 (schematic).
Fig. 4 (schematic).
Fig. 5 (screen capture of the Midland Grid Emulator).
Fig. 6 (plot): Average Number of Hops vs. CG Size (100 to 1000); curves: Random Scheduling, Probabilistic Scheduling.
Fig. 7 (plot): Average Throughput Rate vs. CG Size (100 to 1000); curves: Probabilistic Scheduling, Random Scheduling.
Fig. 8 (plot): Average Number of Hops vs. CG Size (100 to 1000); curves: Random Scheduling, Probabilistic Scheduling.
Fig. 9 (plot): Average Throughput Rate vs. CG Size (100 to 1000).