Future Generation Computer Systems 26 (2010) 183–197
Dynamic resource selection heuristics for a non-reserved bidding-based Grid environment

Chien-Min Wang (a), Hsi-Min Chen (b,*), Chun-Chen Hsu (c), Jonathan Lee (b)

(a) Institute of Information Science, Academia Sinica, Taipei, Taiwan
(b) Department of Computer Science and Information Engineering, National Central University, Taoyuan, Taiwan
(c) Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
Article history: Received 19 December 2008; received in revised form 29 July 2009; accepted 3 August 2009; available online 7 August 2009.

Keywords: Grid computing; resource selection; resource management; bidding; matchmaking
Abstract

A Grid system is comprised of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications. As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue. To avoid single point of failure and server overload problems, bidding provides an alternative means of resource selection in distributed systems. However, under the bidding model, the key challenge of resource selection is that there is no global information system to facilitate optimum decision-making; hence requesters can only obtain partial information revealed by resource providers. To address this problem, we propose a set of resource selection heuristics to minimize the turnaround time in a non-reserved bidding-based Grid environment, while considering the level of information about competing jobs revealed by providers. We also present the results of experiments conducted to evaluate the performance of the proposed heuristics. © 2009 Elsevier B.V. All rights reserved.
1. Introduction

With the rapid growth in the number of PCs and clusters, Grid computing technologies have emerged to facilitate resource sharing and coordinated problem solving in distributed systems [1,2]. Such systems consist of large sets of heterogeneous and geographically distributed resources that are aggregated as a virtual computing platform for executing large-scale scientific applications. As the number of resources in Grids increases rapidly, selecting appropriate resources for jobs has become a crucial issue. In essence, Grid resources are heterogeneous and managed independently by different organizations, and resource providers can specify their own access policies for sharing resources and can join or leave Grids dynamically. Thus, exploiting previous cluster-based scheduling heuristics [3–7] to allocate tasks through a centralized manager or mapper is not feasible. In recent years, many matchmaking-based technologies have been proposed to address the issue of Grid resource management [8–15]. Fig. 1(a) presents an abstract matchmaking model generalized from these technologies. However, the matchmaking technique may cause a matchmaker overload problem. Since a resource
* Corresponding author. E-mail address: [email protected] (H.-M. Chen).
doi:10.1016/j.future.2009.08.003
matchmaker is responsible for registering all resource states advertised by providers and executing matching algorithms, an increase in the number of resources and the frequency of job requests creates a performance bottleneck. Moreover, resource states may change minute by minute due to requesters' activities or resource failures, so the matchmaking technique may fail to reflect the dynamic nature of Grid resources. This is because matchmaking is a push-based model in which a matchmaker does not learn about changes in resource states until the resource providers publish their new states. Consequently, matchmaking may return inaccurate results.

To avoid the single-point-of-failure, matchmaker overload, and expired resource information problems, bidding provides an alternative means of resource allocation in distributed systems [16–24]. Fig. 1(b) depicts the abstract process of the bidding model. A resource requester starts a bidding process by sending a set of call-for-proposal (CFP) requests, which contain job requirements, to resource providers. Then, based on their resource utilization and policies, the providers decide whether or not to participate in the bidding process. If they join the bidding process, they return bids that describe the states of their resources to the requester. Finally, the requester evaluates and ranks the collected bids based on its selection strategy and submits the job to the provider that proposes the best-ranked bid. The bidding model has the following advantages over the matchmaking model. (1) Scalability: resource allocations between providers and requesters in the
Fig. 1. (a) The abstract process of the matchmaking model. (b) The abstract process of the bidding model.
bidding model are fully distributed without the intervention of a centralized matchmaker/broker. (2) Autonomy: requesters themselves can determine which of the offered resources are best-suited to execute their jobs, while providers can contribute their resources according to their sharing policies and report up-to-date state information. (3) Reliability: if a resource fails during a job's execution, the requester can select other candidate resources from the received bids.

Under the bidding model, resource providers can usually choose between two bidding strategies, reserved and non-reserved bidding [25]. Providers who adopt the reserved strategy keep the resources for each bid as commitments to guarantee future resource states. However, if the requester subsequently rejects the bid, the reserved resources will be wasted. In this scenario, other requesters may be prepared to accept the bid before the original requester rejects it; hence, there is a high probability that the provider will miss the opportunity to serve other requesters with the reserved resources. In contrast, resource providers who adopt the non-reserved option offer the same resource states to a set of requesters without reserving resources for each bid. This strategy enables providers to fully utilize their resources, but it does not guarantee the resource states. If requesters receiving the same bids submit jobs to the provider simultaneously, they will have to compete for the resources, so the job completion time may not be as expected.

In addition to the above strategies, the bidding model allows providers to reveal different levels of information about competing jobs to requesters. As shown in Fig. 1(b), after resource providers receive CFPs from requesters, they can simply reveal the capabilities of the provided resources, provide information about the number of competitors, or give even more complete information about the competitors.
The level of information revealed is an important factor that affects the performance of resource selection for a job’s execution. In this paper, our objective is to minimize the turnaround time of jobs in a non-reserved bidding-based Grid environment. The turnaround time covers the period from the time a job arrives to the receipt of the executed result. In online systems, users are more sensitive to the turnaround time than the execution time, waiting time or makespan [26]. To minimize the turnaround time in this model, we propose a set of deterministic and probabilistic resource selection heuristics. In contrast to traditional centralized scheduling problems, the key challenge of resource selection in the bidding model is that there is no global information system to facilitate optimum decision-making; hence, requesters are only aware of partial information released by resource providers. Thus, we consider various levels of information about competitors in the proposed heuristics. We want to determine whether requesters could make better scheduling decisions if they have more information about the states of competing jobs. We conduct experiments to evaluate the performance of the heuristics for various levels of information and the impact
of non-cooperative requesters. The experimental results show that the performance of the Dissolve-P heuristic is superior to that of the other heuristics when information about competitors is not provided. However, the MCT-D heuristics outperform the other heuristics when information about the execution times of competitors is provided. We also find that the level of information has a significant effect on the performance of the MCT-D based heuristics, but it does not influence the Dissolve-P based heuristics. Furthermore, requesters who adopt cooperative resource selection strategies achieve better results than those that use non-cooperative strategies.

The contributions of this paper are as follows. (1) To the best of our knowledge, this is the first study of the resource selection issue in an online non-reserved bidding-based Grid system that focuses on minimizing the turnaround time. (2) To address this issue, we propose a set of probabilistic and deterministic resource selection heuristics, as well as a pre-scheduling mechanism, and evaluate their performance. (3) The proposed heuristics consider various levels of information about competing jobs. (4) We examine the impact of cooperative and non-cooperative requesters on the performance of Grid resource selection.

The remainder of the paper is organized as follows. Section 2 contains a review of the literature on resource selection. In Section 3, we formally define the problem considered in this research. Section 4 presents the proposed heuristics for the various levels of information revealed by providers. We describe the simulation setup and evaluate the performance of the proposed heuristics in Section 5. Then, in Section 6, we summarize our conclusions.

2. Related work

A number of resource management approaches have been proposed in various Grid projects.
Globus Toolkit [27], the most popular Grid middleware, integrates distributed computing resources and provides a set of management tools for security, data management, information services, and execution management. For resource management, Globus provides MDS (Monitoring and Discovery Service) [28] to support the discovery and monitoring of resources, services, and computations, and GRAM (Grid Resource Allocation and Management) [29], combined with RSL (Resource Specification Language), for resource allocation tasks. However, Globus only allows users to specify basic configurations, such as the file path, maximum CPU power, required memory, and wall clock time. It does not support job matching/scheduling at the global level; instead, it leaves that task to upper-layer services. The bidding model and the proposed heuristics can be constructed as a high-level resource management service on top of Globus. Condor matchmaker [8,11] is another well-known resource management framework designed for high-throughput computing
Fig. 2. An example of the postponement phenomenon in the non-reserved bidding model.
in Grids. Under this framework, providers and consumers describe their respective capabilities and requirements in classified advertisements (classads), which are pushed to a central matchmaker that does the matching. One of the key features of the framework is that it considers different levels of sharing policies, which are specified in the providers' advertisements. The approaches in [9,10,12] are extensions of the Condor matchmaker for handling specific requirements. However, Condor is based on a centralized matchmaking model, in which the problem of matchmaker overload may occur. Moreover, matchmaking decisions are made by checking the resource states kept by the matchmaker, but those states may not be consistent with the real states of the resources. Therefore, the matching results provided to requesters may be incorrect.

In contrast to the centralized matchmaking-based approaches, many bidding-related studies have been conducted in the field of distributed systems [17]. For example, Xiao et al. [24] presented a bidding-based resource management mechanism called a P2P decentralized scheduling framework. Based on this mechanism, they proposed an incentive-based scheduling scheme to maximize the success rate of job executions and minimize the fairness deviation among resources. In [20], the authors proposed two contract-net based resource selection policies to increase the number of jobs completed successfully according to the given budgets and deadlines. Das Anubhav et al. [19] introduced a combinatorial auction-based resource allocation protocol in which a user bids a price value for each combination of resources available for a task's execution. The CORA (Coallocative, Oversubscribing Resource Allocation) [16] architecture is a market-based resource reservation system that utilizes the trustworthy Vickrey auction to make combinatorial allocations of resources.
These approaches focus on devising economic Grid methods that use resource trading prices to achieve various goals. Although we adopt a similar bidding scheme, our objective, unlike these approaches, is to select appropriate resources for requesters and improve performance by minimizing the turnaround time in a non-reserved bidding-based Grid environment. Surfer [22] is a resource selection and ranking framework that adopts a pull-based protocol (a simplified bidding model) to extract the highest-ranked resources. A pull-based model allows requesters to obtain dynamic information directly from providers, but Surfer only provides a general resource selection framework and is neutral in terms of selection policies. The work in [21] presents an agent-based resource selection mechanism that splits the Grid scheduling process into two phases. In the discovery phase, resources that do not satisfy static resource requirements are filtered out. Then, in the second phase, requesters negotiate directly with providers to determine the current state of the remaining resources and select those suitable for the job's execution. Unlike our work, the approach in [21] focuses on the benefit to individual requesters instead of all requesters, and it assumes that each provider adopts the reserved resource model.

3. Problem statement

Suppose that a set of requesters R = {r1, r2, ..., rm} and a set of providers S = {s1, s2, ..., sn} are given in a non-reserved bidding-based Grid system. A requester ri ∈ R submits a set of jobs Ji = {ji,1, ji,2, ..., ji,li}
dynamically within a given time period T . The arrival rates of jobs are different for each requester. Each job ji,k ∈ Ji has a workload wi,k , which is included in CFP messages and sent to a set of providers. After sending CFP messages, the requester ri is given a deadline di for current bids, and bids received after the deadline will be ignored. Once the deadline has passed, requester ri starts evaluating bids within a time interval ei and finally submits a job to the selected provider. The initiation time tiniti,k is the point at which job ji,k arrives. Note that since we assume this is an online system where jobs arrive dynamically, the workload wi,k and initiation time tiniti,k are not known a priori. We assume each provider sj ∈ S manages a computing resource that has a given CPU capability cj . A bid proposed by a provider sj in reply to a CFP from requester ri includes an expected available time aj,i,k and a predicted execution time etj,i,k , which is approximately wi,k /cj , for executing job ji,k . The expected available time is the time at which a provider finishes the execution of all accepted jobs. As mentioned previously, we focus on the non-reserved bidding model in which providers do not reserve resources for each bid proposed by them; therefore, the expected available time is not updated by a provider until it actually receives a job. In other words, each provider proposes bids with the same expected available time for each CFP before it receives a job. Thus, we define the actual available time uj,i,k as the point at which provider sj starts executing job ji,k . Because resources are not reserved for each bid, the available time may be postponed, i.e., uj,i,k ≥ aj,i,k . Fig. 2 shows an example of the postponement phenomenon in the non-reserved bidding model. Three job requests jm , jm+1 and jm+2 were included in CFPm , CFPm+1 and CFPm+2 , respectively, and sent to provider sj . 
Because the provider did not accept any jobs between the times that CFPm and CFPm+2 were received, under the non-reserved bidding model, the three requesting jobs jm, jm+1 and jm+2 were allocated, respectively, bids bn, bn+1 and bn+2 with the same expected available time aj,n. Since the requester of job jm decided to submit its job before the requester of job jm+1, job jm can be executed at the actual available time um, which is equal to the expected available time aj,n. However, because the time slot after aj,n had been allocated to job jm, the actual available time for executing job jm+1 would be postponed to um+1, i.e., aj,n + etj,m. Thus, only one of the competitors that received bids with the same expected available time can be executed at the proposed expected available time, and the execution of the others will be deferred.

Competitors Pj,i,k of job ji,k are the requesting jobs that receive bids with the same expected available time from provider sj, where each one receives its bid before job ji,k. For example, job jm in Fig. 2 has no competitor. Job jm+1 has one competitor contending for the resource of provider sj, i.e., Pj,m+1 = {jm}, whereas job jm+2 has two competitors contending for the resource of provider sj, i.e., Pj,m+2 = {jm, jm+1}. We also define the order of competitors as the time precedence (≺) in which a provider proposes the bids with the same expected available time to the competitors. For example, in Fig. 2, the order of competitors Pj,m+2 is (jm ≺ jm+1), so provider sj sent a bid to the requester of job jm before sending one to the requester of job jm+1. Not all providers in the system are capable of proposing bids to requesters to satisfy job requests. Hence, we define Qi,k ⊆ S contactable providers in the system (a.k.a.
Table 1
The notations used for the turnaround time and the presented heuristics.

Symbol      Description
R           The set of requesters {r1, r2, ..., rm} in the system.
S           The set of providers {s1, s2, ..., sn} in the system.
ri          A requester, where ri ∈ R.
sj          A provider, where sj ∈ S.
cj          The CPU capability of provider sj.
T           The time period during which requesters submit jobs.
Ji          The set of jobs {ji,1, ji,2, ..., ji,li} that requester ri submits during T.
ji,k        A job submitted by requester ri, where ji,k ∈ Ji.
wi,k        The workload of job ji,k.
tinit_i,k   The time at which job ji,k arrives.
etj,i,k     The execution time (≈ wi,k/cj) predicted by provider sj for executing ji,k.
aj,i,k      The expected available time at which provider sj expects to start executing job ji,k.
uj,i,k      The actual time at which provider sj starts executing job ji,k.
pj,i,k      The probability of selecting provider sj to execute job ji,k.
Qi,k        The contactable providers for job ji,k.
Pj,i,k      The competitors of job ji,k that contend for provider sj.
feasible machines in [30]) that can propose bids to requester ri for job ji,k. A list of contactable providers can be obtained from lightweight/hierarchical matchmakers [31,32] or by employing P2P peer discovery technologies [24,33]. Recall that, in the bidding model, requesters do not have global information about other requesters' selection decisions. Therefore, if all requesters greedily selected the same provider, e.g., the one with the most powerful CPU capability or the minimum completion time, that provider's load would become imbalanced. To address the problem, we adopt a probabilistic concept for allocating jobs, whereby the most powerful providers execute more jobs, but the less powerful ones can still be employed. Suppose that the probability of selecting provider sj to execute job ji,k is pj,i,k. Under the non-reserved bidding model, given a time period T in which requesters dynamically generate jobs for submission to providers, we try to allocate the jobs such that the total turnaround time of all jobs is minimized; that is, we try to find an appropriate pj,i,k. The notations used for the turnaround time and the presented heuristics are listed in Table 1. Eq. (1) defines the formal objective function that we want to minimize. The job execution time is etj,i,k and the waiting time is uj,i,k − tinit_i,k, i.e., the actual available time minus the initiation time. Similar to [34,35], we focus on the job execution time and waiting time when selecting resources, and we assume that the machines are interconnected by high-speed links. Thus, the sum of the waiting time and the job execution time is the turnaround time:

\min \sum_{i=1}^{m} \sum_{k=1}^{l_i} (et_{j,i,k} + u_{j,i,k} - tinit_{i,k}), where j_{i,k} is assigned to s_j.   (1)
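To make Eq. (1) and the postponement phenomenon of Fig. 2 concrete, the following minimal sketch (all names hypothetical, not from the paper) replays jobs submitted to a single non-reserved provider: the provider quotes the same expected available time to every bidder, but each actual start is pushed back by the jobs accepted before it, so only the first competitor starts on time.

```python
# Sketch (hypothetical names): turnaround time under non-reserved bidding.
# A provider quotes the same expected available time to every open CFP and
# only advances it when a job is actually submitted, so later jobs start late.

def replay(provider_available, jobs):
    """jobs: list of (t_init, workload); the provider has unit CPU capability.
    Returns per-job (waiting, execution, turnaround) in submission order."""
    available = provider_available
    results = []
    for t_init, workload in jobs:
        exec_time = workload / 1.0          # et = w / c, with capability c = 1
        start = max(available, t_init)      # actual available time u >= a
        results.append((start - t_init, exec_time, start - t_init + exec_time))
        available = start + exec_time       # postponement seen by the next job
    return results

# Three competitors were all quoted expected available time 10 for equal jobs.
for waiting, execution, turnaround in replay(10, [(0, 4), (0, 4), (0, 4)]):
    print(waiting, execution, turnaround)
# Only the first job starts at the quoted time; the others are deferred.
```

Summing the third column over all jobs gives the objective of Eq. (1) for this single-provider assignment.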
4. Heuristics

Resource selection in the non-reserved bidding-based model presents two major challenges. The first is the lack of a global information system to facilitate optimum decision-making, so a requester cannot determine whether its competitors are competing for the resource it requests. The second challenge is that, since we assume the non-reserved bidding model works in an online system, the job arrival times and job workloads are not known a priori. To address these challenges, we propose a set of resource selection heuristics for the various levels of information released by resource providers under this model. Specifically, we consider four levels of information:
(1) No Competitors' Information: only the predicted job execution time and the expected available time are revealed by providers.
(2) The Number of Competitors: besides the above information, providers list the number of competitors.
(3) Competitors' Execution Times: in addition to the above information, providers report the execution times of competitors.
(4) Complete Information about Competitors: besides the above information, providers release the order of competing jobs.

Previous works [34,36] proposed the Minimum Execution Time and Minimum Completion Time strategies to facilitate centralized task allocation. However, because the non-reserved bidding model does not provide centralized control, the resource loads would become imbalanced if all requesters greedily selected the same resource. Therefore, to allocate jobs, we propose a set of heuristics based on the notion of probability. To find an appropriate probability that minimizes the turnaround time, the probability pj,i,k of allocating job ji,k to provider sj is derived by each proposed heuristic. We also consider the extreme case of probabilistic resource selection, in which the probability of selecting the most preferred provider is one and the probability of selecting the others is zero. We call the extreme case Deterministic selection and the other cases Probabilistic selection. For ease of presentation, we discuss the heuristics in the following order: No Competitors' Information, Complete Information about Competitors, Competitors' Execution Times, and The Number of Competitors.

4.1. No competitors' information

For the level of no competitors' information, we propose one random and three probabilistic resource selection heuristics in addition to the Minimum Execution Time and Minimum Completion Time heuristics. We use the postfixes "-D" and "-P" to distinguish deterministic strategies from probabilistic strategies. The formal definition of each strategy is as follows.
Random selection (RANDOM): For a job ji,k, the probability of selecting one of the contactable providers Qi,k is calculated as follows:

p_{j,i,k} = \frac{1}{|Q_{i,k}|}.
Minimum Execution Time-Deterministic (MET-D): The provider that offers the minimum execution time for job ji,k is selected. The formulation of the MET-D heuristic is as follows:

p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} \text{ is minimum } \forall s_j \in Q_{i,k},\\ 0, & \text{otherwise.} \end{cases}
Minimum Execution Time-Probabilistic (MET-P): For job ji,k, the probability of selecting one of the contactable providers Qi,k is proportional to the CPU capability of the provider relative to that of all contactable providers. The formulation of the MET-P heuristic is as follows:

p_{j,i,k} = \frac{1/et_{j,i,k}}{\sum_{s_n \in Q_{i,k}} 1/et_{n,i,k}}.
Minimum Completion Time-Deterministic (MCT-D): The provider that offers the minimum completion time, i.e., the waiting time plus the execution time, for job ji,k is selected. The formulation of the MCT-D heuristic is as follows:

p_{j,i,k} = \begin{cases} 1, & \text{if } et_{j,i,k} + \max\{a_{j,i,k}, tinit_{i,k}\} - tinit_{i,k} \text{ is minimum } \forall s_j \in Q_{i,k},\\ 0, & \text{otherwise.} \end{cases}
Minimum Completion Time-Probabilistic (MCT-P): For job ji,k, the probability of selecting one of the contactable providers Qi,k is proportional to the inverse of the completion time of the provider relative to that of all contactable providers. The formulation of the MCT-P heuristic is as follows:

p_{j,i,k} = \frac{1/(et_{j,i,k} + \max\{a_{j,i,k}, tinit_{i,k}\} - tinit_{i,k})}{\sum_{s_n \in Q_{i,k}} 1/(et_{n,i,k} + \max\{a_{n,i,k}, tinit_{i,k}\} - tinit_{i,k})}.
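The four strategies above differ only in the score they normalize. As an illustration, the following sketch (hypothetical data layout: each bid is an (et, a) pair and t_init is the job's arrival time) computes the probability vector p_{j,i,k} for one job under each heuristic.

```python
# Sketch (hypothetical names): selection probabilities for one job, given each
# contactable provider's predicted execution time et and expected available time a.

def completion(et, a, t_init):
    # completion time = execution time + waiting time: et + max(a, t_init) - t_init
    return et + max(a, t_init) - t_init

def random_p(bids):
    return [1.0 / len(bids)] * len(bids)

def met_d(bids):                          # deterministic: fastest CPU wins
    best = min(range(len(bids)), key=lambda j: bids[j][0])
    return [1.0 if j == best else 0.0 for j in range(len(bids))]

def met_p(bids):                          # probabilistic, proportional to 1/et
    inv = [1.0 / et for et, _ in bids]
    return [x / sum(inv) for x in inv]

def mct_d(bids, t_init):                  # deterministic: earliest completion wins
    best = min(range(len(bids)), key=lambda j: completion(*bids[j], t_init))
    return [1.0 if j == best else 0.0 for j in range(len(bids))]

def mct_p(bids, t_init):                  # probabilistic, proportional to 1/completion
    inv = [1.0 / completion(et, a, t_init) for et, a in bids]
    return [x / sum(inv) for x in inv]

bids = [(2.0, 5.0), (4.0, 0.0)]           # (et, a) per provider
print(met_d(bids))                        # the fastest CPU wins outright
print(mct_d(bids, 0.0))                   # but the idle provider completes first
```

The example shows why MET and MCT can disagree: the provider with the smaller et has the longer queue, so MCT-D prefers the slower but idle provider.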
Fig. 3. An example of the Dissolve-P heuristic.
Algorithm 1 The Algorithm of Dissolve-P
1:  Q̄_{i,k} ← sort Q_{i,k} of job j_{i,k} by waiting time
2:  w = w_{i,k};
3:  ct = 0;
4:  numOfProviders = 0;
5:  for all s_j ∈ Q̄_{i,k} do
6:      wt_{j,i,k} = max{a_{j,i,k}, tinit_{i,k}} − tinit_{i,k};
7:      c = s_j.capability;
8:      for k ← 1 to j − 1 do
9:          c += s_k.capability;
10:     end for
11:     if j < |Q̄_{i,k}| then
12:         if w > c × (wt_{j+1,i,k} − wt_{j,i,k}) then
13:             ▷ the workload overflows to the next provider
14:             w = w − c × (wt_{j+1,i,k} − wt_{j,i,k});
15:             ct = wt_{j+1,i,k};
16:         else
17:             ct = (w/c) + wt_{j,i,k};
18:             numOfProviders = j;
19:             break;
20:         end if
21:     else
22:         ct = (w/c) + wt_{j,i,k};
23:         numOfProviders = j;
24:     end if
25: end for
26: for j ← 1 to numOfProviders do
27:     wt_{j,i,k} = max{a_{j,i,k}, tinit_{i,k}} − tinit_{i,k};
28:     p_{j,i,k} = (s_j.capability × (ct − wt_{j,i,k})) / w_{i,k};
29: end for
Dissolve-Probabilistic (Dissolve-P): This heuristic is inspired by the way ice cubes dissolve. We treat the workload of a job as an ice cube that can be dissolved by several providers. Fig. 3 shows an example of the Dissolve-P heuristic. First, the heuristic sorts the providers by their waiting times. Then, it tries to dissolve the workload on provider s1, which offers the minimum waiting time, and checks whether s1 has enough capability to perform the job (Fig. 3(a)). However, s1 does not have enough capability to service the job because the completion time ct1, i.e., the time at which s1 could complete the job, would be greater than the waiting time wt2 of provider s2. This means the workload would overflow to provider s2, which would have to help service the job. Likewise, in Fig. 3(b), providers s1 and s2 are not capable of servicing the job because the completion time ct2 would be longer than wt3. The heuristic repeats the process to check whether the workload overflows to provider s3 (Fig. 3(c)). It finds that providers s1, s2 and s3 can service the job, i.e., the completion time ct3 is less than the waiting time wt4. In this way, we can derive the selection probability of each involved provider from its potential contribution to executing the job.
Algorithm 1 details the steps of the Dissolve-P heuristic. Given the set of contactable providers Qi,k of job ji,k, suppose that we want to obtain the selection probability pj,i,k of each provider sj involved in the dissolution process. First, the algorithm sorts the contactable providers for job ji,k in order of waiting time, from the shortest to the longest (line 1). The remaining workload w, which has not yet been dissolved, is initially set to the workload of ji,k, and ct is the final completion time of the providers involved in executing ji,k (lines 2–3). In line 4, numOfProviders denotes the number of providers involved in the dissolution process. The for loop in lines 5–25 determines how many providers are involved in the dissolution and the final completion time for executing ji,k. The aggregate capability c of the involved providers, comprising the current and previously involved providers, is derived in lines 7–9. We use two cases to determine the number of involved providers and the final completion time. The condition in line 11 checks whether a partial set of the providers could have sufficient capability to serve ji,k. If not, the algorithm proceeds to the second case (lines 22–23), indicating that all contactable providers are involved in the dissolution. In the first case, if the aggregate capability is sufficient to serve ji,k (lines 17–19), the number of involved providers and the final completion time can be determined. Otherwise, the algorithm subtracts the workload executed by the aggregate capability from the remaining workload (line 14), updates the completion time to the waiting time of the next provider (line 15), and proceeds to the next iteration. In the second for loop (lines 26–29), after the number of involved providers and the final completion time have been determined, the selection probability of each involved provider is obtained as the proportion of the workload it serves to the whole workload of ji,k.
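Algorithm 1 can be prototyped as follows. This is a sketch with hypothetical names (not from the paper): each provider is a (capability, waiting_time) pair and the job is represented by its workload; the returned probabilities are each provider's share of the dissolved workload.

```python
# Sketch of the Dissolve-P heuristic (Algorithm 1); names are hypothetical.
def dissolve_p(providers, workload):
    """providers: list of (capability, waiting_time) pairs.
    Returns {provider index: selection probability} for the involved providers."""
    order = sorted(range(len(providers)), key=lambda j: providers[j][1])
    w = workload          # remaining (not yet dissolved) workload
    c = 0.0               # aggregate capability of the involved providers
    ct = 0.0              # final completion time
    involved = []
    for pos, j in enumerate(order):
        cap, wt = providers[j]
        c += cap
        involved.append(j)
        if pos + 1 < len(order):
            wt_next = providers[order[pos + 1]][1]
            if w > c * (wt_next - wt):      # workload overflows to the next provider
                w -= c * (wt_next - wt)
                ct = wt_next
            else:                            # the involved providers suffice
                ct = w / c + wt
                break
        else:                                # all contactable providers are involved
            ct = w / c + wt
    # each provider's share of the workload determines its selection probability
    return {j: providers[j][0] * (ct - providers[j][1]) / workload for j in involved}

# Three unit-capability providers with waiting times 0, 2 and 6; workload 6.
print(dissolve_p([(1.0, 0.0), (1.0, 2.0), (1.0, 6.0)], 6.0))
# Providers s1 and s2 finish the job before s3 even becomes available,
# so only they receive non-zero probability, summing to one.
```

In this example the job completes at time 4, so s1 contributes 4 units of work and s2 contributes 2, giving probabilities 2/3 and 1/3.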
4.2. Complete information about competitors

Under the bidding model, a number of competitors can contend for a resource simultaneously. If the provider reveals information about the competitors, we can schedule the competitors in advance and make proper selection decisions to balance the providers' loads, which further minimizes the turnaround time. To this end, we propose a mechanism that pre-schedules the competitors and updates the waiting time of each involved provider. Based on the updated waiting times, we use the previous heuristics to derive appropriate probabilities for resource selection. Before pre-scheduling the competitors, we have to sort the jobs by their time precedence (≺), since we assume that earlier jobs have more opportunities to be allocated first. Fig. 4 shows an example of the sorting process for job ji,k. In this example, Qi,k = {s1, ..., s5} are the contactable providers for job ji,k and
Fig. 4. An example of the process for sorting competitors.
Pi,k = {j1, ..., j8} are the competitors of ji,k. If the providers reveal details of the competitors to the requester, we can construct a bipartite graph that describes the relationships between providers and competitors, as shown in Fig. 4(a). Furthermore, if the order information of the competitors is released, we can construct the precedence links of the competitors for each provider, as shown in Fig. 4(b). Then, we can merge the precedence links into a precedence graph, as shown in Fig. 4(c). Finally, we can use a topological sort [37] to derive the time precedence of the competitors (j1 ≺ j3 ≺ j4 ≺ j5 ≺ j7 ≺ j8 ≺ j2 ≺ j6), as shown in Fig. 4(d). After sorting, we can determine the time precedence of the competitors as well as the providers that may be selected. We start pre-scheduling the competitors from the earliest to the latest based on the previously proposed heuristics. For instance, if we apply the MCT-P heuristic in the pre-scheduling stage, each competing job uses the same heuristic to select providers. Once a competitor selects a provider, the waiting time of the provider is updated to include the execution time of the competitor. The reason for updating the waiting times of the providers selected by competitors is that those jobs have a chance to be submitted to the selected providers before the requester submits its own job. Based on the updated waiting times of the contactable providers, we reuse the previously proposed heuristics to select an appropriate provider for a requester. We propose the following three extensions of the previous heuristics for this level of information.

Minimum Completion Time-Deterministic-Complete (MCT-D-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-D heuristic to select providers, and we use the competitors' execution times as input to derive the updated waiting times of their respective providers.
After pre-scheduling the competitors by the MCT-D heuristic, the provider that offers the minimum completion time is selected for the job's execution.

Minimum Completion Time-Probabilistic-Complete (MCT-P-C): In the pre-scheduling phase, we assume that each competitor adopts the MCT-P heuristic to select providers, and uses the other competitors' execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competitors by the MCT-P heuristic, the probability of selecting one of the providers is proportional to the inverse of the completion time of the provider over that of all contactable providers.

Dissolve-Probabilistic-Complete (Dissolve-P-C): In the pre-scheduling phase, we assume that each competitor adopts the Dissolve-P heuristic to select providers, and uses the competitors' execution times as input to derive the updated waiting times of their respective providers. After pre-scheduling the competing jobs by the Dissolve-P heuristic, Algorithm 1 is reapplied to calculate the selection probability of the providers; however, the waiting time in Algorithm 1 must be replaced by the updated waiting time derived by the pre-scheduling process.

4.3. Competitors' execution times

In this level of information, providers reveal the competitors' execution times, but not the order of the competitors. Due to the lack of order information, we cannot derive the time precedence of the competitors, e.g., the case in Fig. 4(d). To solve this problem, we sort the competitors in an arbitrary order. For this information level, we propose three heuristics: Minimum Completion Time-Deterministic-Execution Time (MCT-D-E), Minimum Completion Time-Probabilistic-Execution Time (MCT-P-E) and Dissolve-Probabilistic-Execution Time (Dissolve-P-E). They are similar to the heuristics for Complete Information about Competitors, but the order of competitors is arranged arbitrarily.

4.4. The number of competitors

Requesters know the number of competitors, but not the order and execution times of the competitors. Without the time information, we cannot determine the updated waiting times of the providers selected by competitors in the pre-scheduling phase. To solve this problem, we substitute the execution time of the requester's job for those of the competitors. For instance, in Fig. 4, the execution time et1,i,k of job ji,k is used as a substitute for the execution times et1,4, et1,5, and et1,6 of jobs j4, j5, and j6.
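As a concrete illustration of this substitution, here is a minimal sketch of an MCT-D-style selection under number-only information (a simplified reading of MCT-D-N, not the paper's exact algorithm): each known competitor is pre-scheduled onto the provider with the minimum completion time, with the requester's own predicted execution time standing in for the unknown competitor times. Provider names and timings are hypothetical.

```python
def mct_d_n_select(providers, n_competitors):
    """Pre-schedule n competitors, then pick a provider for the requester.

    providers: dict mapping provider name -> {"wait": current waiting
    time, "exec": predicted execution time of the requester's job on
    that provider}.  At this information level the competitors'
    execution times are unknown, so the requester's own predicted time
    on each provider is substituted for them.
    """
    wait = {s: p["wait"] for s, p in providers.items()}
    for _ in range(n_competitors):
        # Each competitor greedily takes the provider offering the
        # minimum completion time (waiting time + execution time) ...
        s = min(wait, key=lambda x: wait[x] + providers[x]["exec"])
        # ... and that provider's waiting time grows by the substituted
        # (requester's) execution time.
        wait[s] += providers[s]["exec"]
    # The requester then selects by minimum updated completion time.
    return min(wait, key=lambda x: wait[x] + providers[x]["exec"])

providers = {"s1": {"wait": 10.0, "exec": 30.0},
             "s2": {"wait": 0.0, "exec": 50.0},
             "s3": {"wait": 5.0, "exec": 40.0}}
print(mct_d_n_select(providers, n_competitors=2))  # → s2
```

With two competitors pre-scheduled, s1 and then s3 absorb the substituted load, so the requester falls back to s2 despite its larger execution time.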
For this level of information, we propose three heuristics: Minimum Completion Time-Deterministic-Number (MCT-D-N), Minimum Completion Time-Probabilistic-Number (MCT-P-N) and Dissolve-Probabilistic-Number (Dissolve-P-N). They are similar to the heuristics for Competitors' Execution Times, but they take the execution time of the requester's job as a substitute to derive the updated waiting times of the involved providers in the pre-scheduling phase.

5. Performance evaluation

5.1. Experiment setup

The objective of the experiments described in this section is threefold: (1) to evaluate the performance of each proposed heuristic under the various levels of information revealed by resource providers; (2) to determine the impact of non-cooperative requesters; and (3) to compare the performance with a centralized matchmaking model. For the experiments, we developed a resource selection tool running on Taiwan UniGrid [38] to evaluate the performance of the presented heuristics. Table 2 details the set-up parameters used in the experiments.

Table 2
Experiment settings.

  Parameter                                   Value
  Number of providers (|S|)                   20
  Number of requesters (|R|)                  4, 8, 12, 16 and 20
  Number of contactable providers (|Qi,k|)    20 and Random(20)
  Job arrival rate                            Negative exponential distribution with mean = 50 s
  Deadline for waiting bids (di)              20 s
  Experiment period (T)                       15 min

We selected 20 machines from Taiwan UniGrid as resource providers, as shown in the first column in Fig. 4. For the job types, we adopted the Linpack benchmark [39,40] and four application benchmarks provided by SciMark2 [41], namely, the Fast Fourier Transform (FFT), Jacobi Successive Over-relaxation (SOR), Monte Carlo Integration (MCI) and Dense LU Matrix Factorization (LU). As shown in Table 3, four of the benchmarks comprised two different-size problems; MCI (not shown) only had one problem. Thus, we had a total of 9 benchmarks, from which the type of each experimental job was selected at random. Fig. 4 shows a snapshot of the time required to execute each type of job on each selected machine. Each result is the average of executing that type of job three times.

  Up(jn+1) = Σ_{i=1}^{n} wi · U(ji),  where wi = i / Σ_{j=1}^{n} j.    (2)
To help providers report the predicted execution time of a job in each proposed bid, we adopt a weighted-mean prediction function to predict the execution times of jobs that belong to the same job type. Eq. (2) is the formal definition of the weighted-mean prediction function, where Up(jn+1) is the predicted execution time of the next job and U(ji) denotes the actual execution time of the i-th previous job. Each actual execution time is assigned a weight based on its freshness, i.e., the execution time of the most recently executed job carries more weight than those of the second and third most recently executed jobs. In our experiments, we take the execution times of the last 5 jobs as historical data to predict the execution time of the next job. To address the lack of historical data at the beginning of the experiments, we use the execution times listed in Table 4 as historical data to predict the execution times of the first 5 jobs.
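Eq. (2), with its linearly increasing weights wi = i / Σj, can be computed as in the following sketch; the five history values are hypothetical, not measurements from the experiments.

```python
def predict_exec_time(history):
    """Weighted-mean prediction of Eq. (2).

    history: actual execution times U(j1)..U(jn) of previous jobs of the
    same type, ordered oldest first, so the most recent run (i = n)
    receives the largest weight wi = i / (1 + 2 + ... + n).
    """
    n = len(history)
    total_weight = n * (n + 1) // 2       # 1 + 2 + ... + n
    return sum(i * u for i, u in enumerate(history, start=1)) / total_weight

# Last 5 actual execution times (hypothetical, in seconds), oldest first:
print(round(predict_exec_time([52.0, 48.0, 50.0, 47.0, 46.0]), 2))  # → 47.73
```

Because the weights sum to 1, the prediction always lies within the range of the observed history, while tracking recent behavior more closely than a plain mean would.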
In the following experiments, the deviation rate of the execution time prediction is approximately 7% on average. We conduct four groups of experiments to evaluate the performance of the proposed heuristics for each level of information. To observe the influence of the system load on the performance, we vary the number of requesters from 4 to 20 to represent different relative system loads. Using 4 requesters allows us to assess the performance when only a small number of requesters contend for resources, whereas using 20 requesters allows us to observe the performance when a large number of requesters compete for limited resources. Clearly, the system load is proportional to the number of requesters that join the system. We also consider the impact of the number of contactable providers on the performance of resource selection. Specifically, we assess two groups, comprised of 20 and Random(20) providers respectively. The different numbers of contactable providers represent the levels of heterogeneity of the providers. |Qi,k| = |S| = 20 indicates that the providers are homogeneous in terms of their specifications, except for their computational capability. Thus, potentially, all providers could be accessed by all requesters. In contrast, |Qi,k| = Random(20)