Eighth IEEE International Symposium on Cluster Computing and the Grid

A Proactive Non-Cooperative Game-theoretic Framework for Data Replication in Data Grids Ali H. Elghirani, Student Member, IEEE, Riky Subrata, Member, IEEE, and Albert Y. Zomaya, Fellow, IEEE

Abstract—Data grids, with their cost-effective nature, have taken on a new level of interest in recent years; the amalgamation of different providers results in increased capacity as well as lower energy costs. As a result, there are efforts worldwide to design more efficient data replication algorithms. Replication for grids is further complicated by the fact that the different sites in a grid system are likely to have different ownerships, with their own self-interests and priorities. As such, any replication algorithm that simply aims to minimize total job delays is likely to fail in grids. Further, a grid differs from traditional high performance computing systems in the heterogeneity of the communication links that connect the different nodes together. In this paper, we propose a distributed, non-cooperative game-theoretic approach to the data replication problem in grids. Our proposed replication scheme directly takes into account the self-interests and priorities of the different providers in a grid, and maximizes the utility of each provider individually. Experiments were conducted to show the applicability of the proposed approach. One advantage of our scheme is its relatively low overhead and robust performance against inaccuracies in performance prediction information.

Index Terms—Data grids, framework, non-cooperative game theory, replication.

I. INTRODUCTION

Data grids provide geographically distributed storage resources for large computational problems that require the evaluation and management of large amounts of data. Such infrastructure is attractive due to its cost-effectiveness and fault-tolerant nature. The idea behind data grids, that of integrating the different providers into one virtual system, results in increased capacity as well as lower energy costs. However, such systems have different constraints and requirements from those of traditional high performance computing systems, such as heterogeneous computing resources and considerable communication delays. Data replication in grids is further complicated by the fact that the multitude of sites that comprise a grid are likely to have a multitude of ownerships, each with its own self-interest and priorities. Therefore, a data replication algorithm designed for the grid should not just aim to minimize the total job

The authors are with the School of Information Technologies, University of Sydney, Building J12, Sydney, NSW 2006, Australia. E-mail: {aghirani, efax, zomaya}@it.usyd.edu.au.

978-0-7695-3156-4/08 $25.00 © 2008 IEEE DOI 10.1109/CCGRID.2008.22


delays, but also take into account the self-interest of each individual provider. In general, data replication algorithms can be classified as static or dynamic. In static algorithms, replication decisions are made at compile time and remain constant during runtime. Such algorithms assume that all information governing data replication decisions is known in advance; such an assumption may not apply to a grid environment. In contrast, dynamic data replication algorithms (e.g. [6]) attempt to use runtime state information to make more informed data replication decisions. Undoubtedly, the static approach is easier to implement and has minimal runtime overhead; however, dynamic approaches may result in better performance. One major drawback of dynamic algorithms is their sensitivity to inaccuracies in the performance prediction information that the algorithm uses for replication purposes. Some dynamic replication algorithms are more sensitive to these inaccuracies than others, and can generate extremely poor results even when the information accuracy is only slightly less than 100%; in real grid environments, however, 100% accuracy is very hard to achieve and maintain. Data replication schemes can also be either centralized or decentralized. In the centralized approach, one node in the system acts as the scheduler and makes all the data replication decisions. Information is sent from the other nodes to this node, which is therefore a single point of failure. In the decentralized approach, multiple nodes in the system are involved in the data replication decisions. It is, however, very costly for each node to obtain and maintain the dynamic state information of the whole system; most decentralized approaches therefore have each node obtain and maintain only partial information locally, making sub-optimal decisions. Decentralized approaches nevertheless offer better fault tolerance than centralized ones.
In this paper, we propose a distributed, non-cooperative game theoretic approach to the data replication problem in grids. Our proposed replication scheme directly takes into account the self interest and priorities of the different providers in a grid, and maximizes the utility of each provider individually. We discuss the notion of payment (or utility) received by a storage provider for each dataset retrieved from them. Naturally then, each provider would attempt to maximize its own profit/utility. Multiple providers would result in a competition for profit maximization. In the

experiments we show that such competition is not necessarily bad, and has the effect of driving down the average job delays. That is, through proper utilities, selfish profit maximization and competition have the secondary effect of minimizing the average job delays. The contribution of this work is therefore twofold: (1) considering the self-interest of the different resource providers and the maximization of their payoffs, and (2) showing the applicability of employing non-cooperative replication games at the system level to achieve better data access times and hence enhance grid system performance. We show the existence of the Nash equilibrium in the game, and that the Nash equilibria are relatively efficient in terms of the average job delays experienced in the system. The proposed algorithm can be considered semi-static, as it responds to changes in system state during runtime. However, it does not use as much information as traditional dynamic schemes; as such, it has relatively low overhead, is relatively insensitive to inaccuracies in performance prediction information, and is stable. The next section provides a brief overview of related work. In Section III we present an overview of the system model, including the grid and communication models that we use. This is followed by the development of a game-theoretic algorithm to solve the grid data replication problem. The experiments section provides a number of detailed experiments that show the applicability of the proposed approaches.

II. RELATED WORK

In recent years there has been interest in the use of game-theoretic and market-oriented models for the design and analysis of distributed systems and networking algorithms [11, 13, 14]. A so-called G-commerce model was presented in [18] for the control of computational resources in a grid environment; two markets, a commodities market and auctions, were studied, and it was concluded that the commodities market is the better choice of model. Several systems such as Spawn [17] and POPCORN [12] employ decentralized auctions with resource accounting. Challenger [5] is another system, which implements load balancing with a market approach: requests for bids are sent to agents in the network to provide estimates of job execution times. Nimrod-G [4] is a more flexible system that employs an economy-driven resource broker for scheduling computations on globally distributed resources in a typical grid environment. The Vickrey-Clarke-Groves (VCG) mechanism was studied in [8] for load balancing in distributed systems; in that model, each computer optimizes its profit by considering the payments and costs of handling particular jobs. An auction-style model for data replication was studied in [2]. Another auction-style pricing strategy model for job allocation was discussed in [7]. In [9], distributed self-interested agents are used in a game to enhance the performance of the servers that they represent. The work in [3] uses an empirical model to derive the Nash equilibrium for caching. More recently, a game-theoretic, decentralized load balancing scheme using general service times and taking into account communication delays (which may be significant in web-scale distributed systems) was presented in [15]. The use of general service times (having finite mean and variance) and an M/G/1 queuing system results in a more realistic model than the usual simplifying assumption of exponential service times. The paper uses the Bounded Pareto distribution, with the following pdf, to represent the heavy-tailed property of the internet:

f(x) = \frac{\alpha k^{\alpha}}{1 - (k/p)^{\alpha}} \, x^{-\alpha - 1}, \qquad k \le x \le p

Distributed algorithms for network bandwidth control based on cooperative games have also been reported in the literature [16, 20]; it should be noted that better performance, or even Pareto-optimal solutions, may be achieved through cooperation. However, in cooperative games one must also take into account hostile players: players that do not cooperate fully. From another perspective, an important question for any cooperative algorithm or scheme designed for the internet (or web-scale distributed systems) whose sites have different ownerships is: if these sites behave in a selfish manner (instead of in a 'socially responsible' manner as expected), would the stability of the system still hold? Under these conditions the system as a whole would tend toward a Nash equilibrium, and it is therefore preferable to have relatively efficient Nash equilibria (e.g. see [1, 10]). In relation to the above-mentioned prior work, the approach that we propose is novel in that it directly takes into account the multitude of ownerships of providers in a grid and the incentives for cooperation, and maximizes the utility of each provider individually.

III. SYSTEM MODEL

We assume that the data grid system consists of a set of sites S connected by a communication network. In general, each site may contain multiple computing nodes, where each computing node (CE) may have a single processor or multiple processors. The processors in the nodes are heterogeneous, meaning they may have different processing powers. Without loss of generality, and to emphasize our main ideas, we assume each site has one computing node equipped with a single processor; the processors in the different computing nodes have different processing powers. The sites s1,…,sn in S are fully interconnected, meaning that there exists a communication path between any two sites (si, sj) in S. Our communication model represents the network performance from a site si to a site sj using a data transmission rate cij, representing the bandwidth available on the path from si to sj. For a message of size m to be transmitted from site si to sj, the transmission time is then given by

L_{i,j} = \frac{m}{c_{i,j}} \qquad (1)

cij can be calculated from analytical models or historical information, or dynamically forecasted by facilities such as the


Network Weather Service (NWS) [19]. Each site si in the grid system can represent one or a combination of the following entities:
• A user: a user generates jobs to be executed by the processors. Each user sends its jobs to the scheduler to be scheduled for processing. Each job of the application needs a minimum of one dataset to execute.
• A scheduler: a scheduler receives jobs from a set of users and assigns them to the processors in the grid system. Every time a job is received, the scheduler decides which processors will process the jobs of the application and sends the jobs to the selected processors. Typically, there would be many more users than schedulers in the system; as such, the jobs scheduled by the schedulers are an aggregate from many users.
• A processor: a processor (or computing element, CE) executes and processes the jobs sent to it. Each processor has a queue that holds jobs to be executed; jobs are processed on a first-come-first-serve (FCFS) basis.
• A storage element (SE): the SE can be a storage device attached to the computing host, a mass storage facility connected to the Internet, or a storage server of the site with a limited storage capacity.
Fig. 1 shows the relationship between users, the scheduler, processors, and storages. Note that a site in the grid system can be a user, a scheduler, a processor, and a storage all at the same time. That is, the site generates jobs that need processing, receives jobs from other users and schedules both its own jobs and others' to the processors, and also executes and processes both its own jobs and others'. In this work we consider a single central scheduler that receives and schedules jobs of applications from the different users of the system. Obviously, the jobs that are executed locally at a site will have minimal communication delay Li,j if all of the datasets requested by the job are available locally. We consider much of the delay to be incurred by the transfer of the datasets.
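As a concrete illustration (not from the paper), the transmission-time model of eq. (1) can be sketched as follows, where the bandwidth value stands in for a measurement or an NWS forecast:

```python
# Sketch of the communication model of eq. (1): L_ij = m / c_ij, where m is
# the message (dataset) size and c_ij the available bandwidth on the path.

def transfer_time(message_size_mb: float, bandwidth_mb_per_s: float) -> float:
    """Time in seconds to send a message of size m over a link of rate c_ij."""
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return message_size_mb / bandwidth_mb_per_s

# e.g. a 500 MB dataset over a 25 MB/s link takes 20 s
print(transfer_time(500, 25))
```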
Hence the use of proactive replication of the datasets to mitigate the delay of data transfers at job execution time.

Fig. 1. Relationship of users, scheduler, processors, and storages (users submit jobs to a central scheduler, which dispatches them to compute nodes CE 1,…,CE n with attached storage elements SE 1,…,SE m; a main storage host with data manager DM holds the primary datasets).

For job scheduling and data replication purposes, we define the following for the grid system:
1) A set of N sites, S = {s1,…,sN}, where each site has a processor (CE) and a storage element (SE) with limited computational power and storage capacity respectively, or a CE without an SE (in which case the CE uses remote access to the datasets required by the jobs to be executed), or just an independent SE. For the replication problem, within this system we consider the existence of a set of m storage elements Φ = {se1,…,sem}, a set of n computing nodes Λ = {ce1,…,cen}, and a main repository host that has all the datasets required for the jobs. We also have a set of p users {u1,…,up}.
2) A set of h datasets D = {d1,…,dh}, each identified by a unique name and a size in MB. A minimum of one dataset is needed by each job for execution, and datasets are shared by different jobs. Jobs are sent by the users to the scheduler, which dispatches them to the CEs (Fig. 1). The scheduler is responsible for scheduling jobs to the computation nodes (CEs) with minimum execution time; as the scheduler does not know exactly how long a job will take, it uses the shortest-queue-length rule: a job is sent to the processor with the shortest queue length. Every time a CE receives a job for execution, it checks whether the datasets required by the job are available locally; otherwise it fetches them from the closest SEs, i.e., those with minimum transfer time. If none of the SEs has the required datasets, or not all of the datasets needed are available at the SEs, the CE fetches the required datasets from the main repository. We assume that the sizes of the datasets range from several MB to some order of GBs, and that datasets can be shared by different jobs. We assume the existence of one primary copy of each dataset in the system; this primary copy cannot be de-allocated. We also assume all primary copies are stored in a data repository host with a data manager process that maintains information about the datasets and the whole replication scheme.

IV. PROACTIVE NON-COOPERATIVE REPLICATION GAME

The structure of our game is that of a non-cooperative game, whereby the players have an interest in maximizing their own utilities without regard to the other players. One basic assumption underlying the theory is that players are perfectly rational and pursue well-defined objectives. Before we delve into the description of our formulation of the data replication game, we give a few essential definitions that will enhance the understanding of the structure of the game.

Definition 1 (strategy profile): A strategy profile r = {r_1, r_2, …, r_m} is a set of strategies, one for each SE j (1≤j≤m) in the game, which fully specifies all of its actions. A strategy profile must include one and only one strategy for every SE.

Definition 2 (Nash equilibrium): A Nash equilibrium for the game is a strategy profile r = {r_1, r_2, …, r_m} in which no player can increase its utility by unilaterally changing its strategy. That is, for every player j (1≤j≤m),

r_j \in \arg\max_{r'_j} U_j(r_1, r_2, \ldots, r'_j, \ldots, r_m) \qquad (2)
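As an illustrative aside, Definition 2 can be checked mechanically for small games: a profile is a Nash equilibrium exactly when no unilateral deviation raises the deviator's utility. The strategy sets and the toy payoff function below are hypothetical, not from the paper:

```python
# Sketch of the Nash-equilibrium condition of eq. (2): for every player j,
# no alternative strategy r'_j improves U_j while the others hold fixed.

def is_nash_equilibrium(profile, strategy_sets, utility):
    """profile: tuple of chosen strategies, one per player.
    strategy_sets: list of iterables of available strategies per player.
    utility(j, profile): payoff of player j under the given profile."""
    for j, strategies in enumerate(strategy_sets):
        current = utility(j, profile)
        for s in strategies:
            deviated = profile[:j] + (s,) + profile[j + 1:]
            if utility(j, deviated) > current:
                return False  # player j gains by unilaterally deviating
    return True

# Toy 2-player coordination game: both players prefer matching strategies.
def u(j, p):
    return 1.0 if p[0] == p[1] else 0.0

assert is_nash_equilibrium(("a", "a"), [("a", "b"), ("a", "b")], u)
assert not is_nash_equilibrium(("a", "b"), [("a", "b"), ("a", "b")], u)
```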

We model our data replication problem as a non-cooperative game in which the objective is to reach a Nash equilibrium. In this game, we have a set of CEs, a set of SEs, and a data manager that holds all of the primary copies of the datasets. The players in the game are the SEs, and their objective is individualistic profit maximization from payoffs by the CEs. By doing this, each SE in effect tries to hold a cache of datasets that are close to the CEs; this in turn results in a lower total access cost for the datasets required for the execution of the jobs in the application. We assume each SE receives a certain payoff, or utility, when datasets are fetched from it. Each SE then tries to maximize its own utility, independently of the other SEs. We assume a certain communication price for an SE j, which is the inverse of the nominal time that would be taken to transfer a dataset i:

\delta^i_{j,k} = \frac{c_{j,k}}{d_i} \qquad (3)

where d_i is the dataset size and c_{j,k} is the communication bandwidth from SE j to CE k. This price is dependent on each CE, as c_{j,k} is dependent on each CE. The utility received by an SE will also depend on the rate of access (frequency of use) of the dataset i at CE k, \lambda^i_k, and the probability of CE k getting the dataset i from SE j, p^i_{j,k}. The expected payment/price for an SE j for a dataset i that it holds is therefore:

\kappa^i_j = \sum_{k=1}^{n} \delta^i_{j,k} \cdot \lambda^i_k \cdot p^i_{j,k} \qquad (4)

The probability p^i_{j,k} depends on several factors, including the communication bandwidth available from SE j to CE k and how many other SEs hold the dataset. For our purposes, we will assume that p^i_{j,k} = 1 if SE j has the highest bandwidth c_{j,k} to CE k compared to the other SEs that hold the dataset, and zero otherwise. We currently consider a utility function for an SE j for a dataset i that is equal to the expected payment/price for the dataset:

U^i_j = \sum_{k=1}^{n} \delta^i_{j,k} \cdot \lambda^i_k \cdot p^i_{j,k} \cdot r^i_j \qquad (5)

where r^i_j is zero if SE j does not hold the dataset, and one if it does. The utility function has the following properties. The utility of holding a dataset depends on the bandwidth the SE has to the CEs that need it: as the bandwidth increases, the utility of holding that dataset increases. The utility of holding a dataset also depends on how many copies of it are available at other SEs: as a dataset becomes more common, the utility of holding it decreases; this is taken into account in p^i_{j,k}. Finally, if a dataset is very popular, the potential utility of holding it also increases. The overall utility function for SE j, considering all the possible datasets i (1≤i≤h), is then

U_j = \sum_{i=1}^{h} \sum_{k=1}^{n} \delta^i_{j,k} \cdot \lambda^i_k \cdot p^i_{j,k} \cdot r^i_j \qquad (6)

The goal of each SE is to find a feasible replication strategy r_j = {r^1_j, r^2_j, …, r^h_j} such that its utility U_j is maximized, independently of the other SEs. A strategy r'_j is preferred over a strategy r_j if it results in a higher utility. Naturally, we have the constraint that each SE cannot exceed its total storage capacity:

\sum_{i=1}^{h} r^i_j \cdot d_i \le z_j \qquad (7)

where z_j is the storage capacity of SE j. We also have the constraint that r^i_j can only take on the value zero or one, that is,

r^i_j \in \{0, 1\} \qquad (8)

Theorem 1 (Existence of Nash equilibrium): A Nash equilibrium exists in our game model.

Proof: Let \omega^i_k = \lambda^i_k / d_i denote the potential utility of a dataset i for a computing element (CE) k. Suppose that the potential utility \omega^i_k for every dataset i (1≤i≤h) and every computing element k (1≤k≤n) is known, and is sorted and indexed according to its potential utility value. Let each element in the sorted list be denoted as u_{\Gamma(i,k)}. Without loss of generality, assume that the relationship of adjacent elements in the sorted list is strictly decreasing, that is, \Gamma_1 > \Gamma_2 > \ldots > \Gamma_p, where p has the value h·n. Then \Gamma_1, the dataset with the highest potential utility, will be replicated by the storage element (SE) j (1≤j≤m) with the lowest communication cost — equivalently, the SE with the highest communication bandwidth c_{j,k} — and available storage capacity. There is then no further incentive for the other SEs to replicate this dataset i for the purpose of satisfying this particular CE k, as the probability of the CE requesting the dataset from other SEs is zero. Similarly, \Gamma_2 will be replicated by the SE with the lowest communication cost and available storage capacity, and there is then no further incentive for the other SEs to replicate the dataset to satisfy that particular CE. Continuing in a similar fashion for the rest of the elements in the list, in the end there is no incentive for any SE to unilaterally change its strategy, as doing so would decrease its utility. Therefore a Nash equilibrium exists in the game. ∎

As each player acts independently of the other players, the following process is used to reach the Nash equilibrium: each player periodically calculates a strategy r_j that results in maximum expected utility U_j. This strategy is a best reply given the current state of the system. Each player periodically updates its strategy until an equilibrium is reached (no player wants to change its strategy, as doing so would result in a decrease in its expected utility U_j). The system will then remain in equilibrium until there are changes in the system's state. Periodic replication by the players ensures that optimum strategies for each player are maintained. Note that at each replication


instant a 'best-reply' strategy is employed by each player. Whether or not such a 'best-reply' strategy converges to the Nash equilibrium remains an open problem [3]. In the next section, experiments are conducted with different parameters that show convergence for more than two players. Our objective is to maximize equation (6) subject to constraints (7) and (8). The constrained maximization problem has the solution shown below; the procedure can easily be shown to be correct.

A. Algorithm (Best-Reply Replication)

For a storage element se_j ∈ Φ:
1. For each dataset d_i ∈ D:
2.   For each computing element ce_k ∈ Λ:
3.     \gamma^i_{j,k} \leftarrow \frac{c_{j,k}}{d_i} \cdot \lambda^i_k \cdot p^i_{j,k}
     End
4.   \kappa^i_j \leftarrow sum of the \gamma^i_{j,k}
   End
5. \Psi_j \leftarrow set consisting of the \kappa^i_j
6. Sort and re-index the elements in the set \Psi_j such that \kappa^1_j > \kappa^2_j > \ldots > \kappa^h_j.
7. The replication decision for a dataset i is then made according to
   r^i_j = 1 if \sum_{k=1}^{i} r^k_j \cdot d_k \le z_j, and r^i_j = 0 otherwise.
End

Theorem 2: The Best-Reply Replication algorithm solves the Best-Reply(se_j) optimization problem and is the best-reply strategy for a player j.

Proof: Steps 1 to 5 involve the calculation of the expected payment \kappa^i_j = \sum_{k=1}^{n} \delta^i_{j,k} \cdot \lambda^i_k \cdot p^i_{j,k} for each dataset i. In steps 6 and 7, datasets are selected for replication, with priority given to the datasets with the highest expected payment, subject to the capacity constraint. As such, the Best-Reply Replication algorithm maximizes the utility of se_j within the constraints, and is a best reply for se_j given the current state of the system. ∎

In this algorithm, each SE calculates the probability of each dataset being requested from it and the utility it will receive (expected utility). Based on the bandwidth between SE j and CE k, for example, SE j checks whether there are other SEs that host the dataset and have a higher bandwidth to CE k (for example, through the data manager). If this is the case, SE j may refrain from replicating the dataset and move to the next, less popular dataset instead. The algorithm also considers the number of replicas of each dataset available in the system, that is, how many other SEs hold the dataset. The SE decides on the replication of a dataset if the number of SEs in the system hosting it is still small enough for it to make a positive expected utility compared to hosting other datasets. The SE recognizes that it will not achieve a high payoff through the replication of a dataset that has been replicated by too many SEs. A similar concept exists in business: flooding the market with a certain product can decrease the payoff achieved by businesses, as a result of increased competition. Our approach leads to more datasets being distributed in the system, resulting in fewer datasets being fetched from the main repository host. This in effect provides better scalability and fault tolerance (it prevents bottlenecks and a single point of failure) in the system.

V. EXPERIMENTS AND RESULTS

In order to evaluate the effectiveness of our game-theoretic replication framework, different simulation scenarios were carried out, with different heterogeneous system configurations involving the number of SEs and computing nodes (CEs), the number of datasets and their sizes, dataset access patterns, and the storage capacities and computing powers of the SEs and CEs respectively. The parameters for the experiments are as follows.

The access pattern follows a power-law (Zipf-like) distribution, as it is more realistic and has been used in the literature to model dataset frequency of use; some datasets are used more than others according to a power law. The probability of a dataset being used by a job, in decreasing order of frequency, is given by:

p_i = \frac{i^{-\alpha}}{\beta} \qquad (9)

where \alpha defines the shape of the curve. For simulation purposes, instead of generating the datasets for the jobs at the scheduler level, we generate them at the CE level; to make a more interesting case, we use slightly different values of \alpha at each CE; to this end, \alpha ranges from 1.9 to 2. \beta is a normalization constant given by:

\beta = \sum_{i=1}^{h} i^{-\alpha} \qquad (10)

• The sizes of the network range from 30 to 2000 nodes, including both SEs and CEs. The number of SEs ranges from 10 to 500, while the number of CEs ranges from 20 to 1500.
• The capacity of an SE ranges from 1 to 30 GB. The computational powers of the CEs are assumed to be in the range of 180 to 220 computational units per second.
• The dataset sizes are in the range of 500 MB to 2 GB, and the total size of the file set is over 300 GB. The jobs are configured to require from 3 to 60 datasets for their execution.
• All master datasets are stored initially at a main repository host with a data manager process, which also acts as a replica location service (RLS).
• Each simulation run lasted 200,000 seconds. In this time, the number of jobs executed by the CEs is in the order of thousands.
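A minimal sketch of the Best-Reply Replication algorithm of Section IV, assuming the highest-bandwidth rule for p^i_{j,k} and greedy filling by expected payment κ^i_j (eq. (4)); the input layout (bandwidth matrix, access rates, holdings) is illustrative, not the paper's implementation:

```python
# Sketch of Best-Reply Replication: SE j ranks datasets by expected payment
# kappa^i_j (eq. (4)) and greedily replicates the best-paying ones until its
# storage capacity z_j (constraint (7)) is exhausted.

def best_reply(j, bandwidth, d, lam, holds, capacity):
    """Return the 0/1 replication vector r_j for storage element j.
    bandwidth[s][k]: c_{s,k}; d[i]: dataset sizes; lam[k][i]: access rate of
    dataset i at CE k; holds[s][i]: whether SE s currently holds dataset i."""
    h, n = len(d), len(lam)

    def p_win(i, k):
        # Assume SE j would hold dataset i; it wins CE k if its bandwidth
        # beats every other SE currently holding the dataset.
        rivals = [s for s in range(len(holds)) if s != j and holds[s][i]]
        return 1 if all(bandwidth[j][k] > bandwidth[s][k] for s in rivals) else 0

    def kappa(i):  # eq. (4): expected payment for dataset i
        return sum(bandwidth[j][k] / d[i] * lam[k][i] * p_win(i, k)
                   for k in range(n))

    order = sorted(range(h), key=kappa, reverse=True)  # step 6: highest payment first
    r, used = [0] * h, 0.0
    for i in order:                                    # step 7: fill up to z_j
        if used + d[i] <= capacity:
            r[i], used = 1, used + d[i]
    return r

# One CE, three equally sized datasets with rising popularity, no rival copies:
# the SE keeps the two most popular datasets that fit in its 10 GB capacity.
print(best_reply(0, [[10.0], [1.0]], [5.0, 5.0, 5.0],
                 [[1.0, 2.0, 3.0]], [[False] * 3, [False] * 3], 10.0))
```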

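The power-law access pattern of eqs. (9) and (10) can be sampled directly; the function names below are illustrative:

```python
# Sketch of the access pattern of eqs. (9)-(10): dataset i is requested with
# probability p_i = i^(-alpha) / beta, where beta = sum_i i^(-alpha).
import random

def zipf_probabilities(h, alpha):
    beta = sum(i ** -alpha for i in range(1, h + 1))      # eq. (10)
    return [i ** -alpha / beta for i in range(1, h + 1)]  # eq. (9)

def sample_dataset(h, alpha, rng):
    """Draw a 1-based dataset index according to the Zipf-like law."""
    return rng.choices(range(1, h + 1), weights=zipf_probabilities(h, alpha))[0]

probs = zipf_probabilities(100, 1.9)
assert abs(sum(probs) - 1.0) < 1e-9       # normalized
assert probs[0] > probs[1] > probs[-1]    # decreasing in popularity rank
```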


As we want to evaluate the replication schemes, we assume applications that are data intensive, whereby data transfer (if needed because there is no local copy) takes a considerable amount of time relative to computation. For this purpose we assume exponentially distributed job sizes with a mean of 1 computational unit.
• The communication links have bandwidths in the range of 15 MB/s to 150 MB/s.
The evaluation of the framework uses the average job completion time, also referred to as the makespan, as the metric for comparing the different replication schemes. We compare our game-theoretic approach (labeled GT) to the following:
• The IGRPS replication algorithm presented in [6], labeled IGRPS.
• A new heuristic (CeP), which proactively replicates datasets to SEs closer to the CEs with more computing power. For every dataset, the algorithm checks the power of each CE, finds the SE with the lowest transfer cost, and replicates the dataset there. If that SE does not have enough space, it finds the next closest SE with minimum transfer time to the CE.
• A random replication scheme, which randomly distributes datasets to the different SEs until their storage capacity is reached, labeled RAND.
• Finally, results when no replication is involved (labeled NOREP) are also shown.
For the scheduling of jobs, a queue-length-based scheduler is used: the scheduler checks the number of jobs in each queue and schedules jobs to the computing node with the least number of jobs. For our proposed GT algorithm, in the experiments the SEs update their strategies in a sequential manner. An interesting case occurs when the parameters of the system do not change (that is, they are static). In this case we expect the system to reach a (Nash) equilibrium, whereby no player has a tendency to unilaterally change its strategy. In terms of the periodic replication done by each SE, an equilibrium is reached when the calculated strategy does not change from one period to another. However, changes to the strategy may be needed in the next period due to changes in the system's state. In each period, the GT algorithm is executed once by each SE, which takes a relatively small computation time. The following results are shown after the GT algorithm has converged and the system is in equilibrium.

Figure 2 depicts the performance of the different replication schemes. In all scenarios, the figure clearly shows the superior performance of our game-theoretic approach (GT) in terms of the makespan. The algorithm maintains its position regardless of the scenario in which the system configuration is changed. It significantly outperforms the other algorithms (by 2% to 95%), except for IGRPS, over which its performance is slightly better. Although the improvement over the second-best algorithm (IGRPS), which is 2% to 5%, might not seem very significant, the fact that IGRPS is a centralized algorithm while GT is decentralized makes the advantages of GT outweigh the low percentage improvement. The main advantages of the GT scheme are its distributed structure, low complexity and overhead, and the minimal information required.

Fig. 2. Algorithms' makespan (%) across system scenarios 1–5 (series: GT, CeP, IGRPS, RAND, NOREP).

In the experiments conducted, we also studied the effect of the availability of resources at the different resource providers. For this study we use storage capacity as our measure of the payoff (utility) achieved by the resource providers in the grid system. Figure 3 shows the percentage of utility lost by a resource provider as a result of its limited storage capacity. It is intuitive to conclude that more available resources (storage capacity) lead to more payoff or utility gains. However, this is not necessarily true: the action of a provider j increasing its storage capacity will result in an imbalance in the system and a reaction by the other providers, which may result in a lower actual utility gain for provider j than the potential utility gain shown in the figure. That is, the actual increase in utility will equal the potential increase shown on the graph only if the other providers do not react to the actions of provider j, which in general is not true. Also, the resource providers' utilities tend to reach a point where additional storage capacity does not necessarily increase the utility received.

[Figure: lost utility (%) versus actual utility (series LostUt and ActualUt) across storage elements (SEs) 1-46]

Fig. 3. Percentage of utility lost by resource providers
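The gap between potential and actual utility gain described above can be illustrated with a toy proportional-share model. This model is an assumption made for illustration only, not the paper's actual utility function: each provider's payoff is a share of a fixed total demand proportional to its storage capacity.

```python
def utility(capacities, j, demand=100.0):
    """Toy payoff: provider j receives a share of total demand
    proportional to its storage capacity (illustrative model only)."""
    return demand * capacities[j] / sum(capacities)

# Provider 0 doubles its capacity from 10 to 20 units.
base = utility([10.0, 10.0], 0)       # payoff before the capacity increase
potential = utility([20.0, 10.0], 0)  # gain if the other provider does not react
actual = utility([20.0, 20.0], 0)     # gain if the other provider reacts in kind
```

Here `potential` exceeds `base`, but once the other provider reacts in kind, `actual` falls back to `base`, mirroring the gap between potential and actual utility gains discussed above.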

Figure 4 shows the average utility of the providers as the capacity ratio in the system is increased. The capacity ratio is defined as follows:


CR = \frac{\sum_{j=1}^{m} z_j}{\sum_{i=1}^{h} d_i}    (11)
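The capacity ratio of Eq. (11) is straightforward to compute. A minimal sketch, assuming z holds the storage capacities of the m providers and d the sizes of the h datasets (the argument names and sample values are illustrative):

```python
def capacity_ratio(z, d):
    """CR from Eq. (11): total storage capacity of the m providers
    divided by the total size of the h datasets."""
    return sum(z) / sum(d)
```

For example, `capacity_ratio([4.0, 6.0], [2.0, 3.0])` yields a CR of 2.0: total capacity is twice the total dataset size.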

[Figure: average utility (Ut, roughly 4.6 to 5.2) versus capacity ratio (0 to 10); utility rises and then levels off]

Fig. 4. Effect of storage capacity on utility achieved

The figure clearly shows that the utility of the resource providers levels off: beyond a certain capacity ratio, additional storage capacity does not significantly increase the providers' average utility.


VI. CONCLUSION

In this paper we have addressed the data replication problem in grid environments and proposed a non-cooperative game-theoretic replication framework. Our approach directly takes into account the self-interest and priorities of the resource providers in a grid system. The providers compete to maximize their individual utilities as datasets are retrieved from them, and we show that such competition has the effect of driving down the average job delays. The evaluation of our approach shows its superiority over four other replication schemes. In the experiments, we also examined the effect of providers increasing their resources, for example storage capacity. We show that providers can suffer a loss of payoff as a result of a resource (storage capacity) shortage, and that providers benefit (receive more payoff) from increasing their storage capacities only up to a certain point, beyond which further growth in storage capacity has no effect on the payoff.
