J Supercomput (2011) 55: 321–366 DOI 10.1007/s11227-009-0329-y

Mosaic-Net: a game theoretical method for selection and allocation of replicas in ad hoc networks Samee Ullah Khan

Published online: 4 September 2009 © Springer Science+Business Media, LLC 2009

Abstract Due to the continuous mobility of hosts, an ad hoc network suffers from frequent disconnections. This phenomenon is undesirable when mobile hosts are accessing data from each other, and thus, data accessibility is lower than that in conventional fixed networks. Because one cannot control network disconnections, an alternative solution to this problem is to replicate data objects onto mobile hosts so that when disconnections occur, mobile hosts can still access data. In this paper, a mathematical model for data object replication in ad hoc networks is formulated. The derived model is general, flexible and adaptable to cater for various applications in ad hoc networks. We prove that this problem in general is NP-complete and propose a game theoretical technique in which players (mobile hosts) continuously compete in a non-cooperative environment to improve data accessibility by replicating data objects. The technique incorporates the access frequency from mobile hosts to each data object, the status of the network connectivity, and communication costs. In the proposed scheme, players (mobile hosts) compete through bids in a non-cooperative environment to replicate data objects that are beneficial to themselves and the system as a whole. To cater for the possibility of cartel type behavior of the players, the scheme uses the Vickrey payment protocol that leaves the players with no option than to bid in such a fashion that is beneficial to the system as a whole. The paper also identifies some useful properties of the proposed scheme and the necessary conditions of optimality. The proposed technique is extensively evaluated against some well-known ad hoc network replica allocation methods, such as: (a) randomized, (b) extended static access frequency, (c) extended dynamic access frequency and neighborhood, and (d) extended dynamic connectivity grouping. 
The experimental results reveal that the proposed approach outperforms the four techniques in solution quality and achieves a competitive execution time.

S.U. Khan, Department of Electrical and Computer Engineering, North Dakota State University, Fargo, ND 58108-6050, USA. e-mail: [email protected]

Keywords Ad hoc networks · Replica allocation · Data accessibility · Mobile computing

1 Introduction

There is continuing interest in studying and improving technologies related to ad hoc networks because they are constructed from only mobile hosts and require no particular infrastructure [4]. In such a setup, every mobile host acts as a router and communicates with other mobile hosts. As a consequence, even if the source and destination mobile hosts are not in each other’s communication range, data is still forwarded to the destination mobile host by relaying the transmission through other mobile hosts that lie between the two. Due to the continuous mobility of hosts, an ad hoc network suffers from frequent disconnections. This phenomenon is undesirable when mobile hosts are accessing data from each other. Because one cannot control network disconnections, an alternative solution to this problem is to replicate data onto various mobile hosts so that when disconnections occur, mobile hosts can still access data [16]. Figures 1a and 1b illustrate this notion. If the wireless channels between the mobile hosts M2, M3 and M4 fail, then the set of mobile hosts (M1, M2 and M3) cannot access the data object O2 and the set of mobile hosts (M4, M5 and M6) cannot access the data object O1. However, if replicas of the data objects are allocated at one of the mobile hosts on the opposite side of the divided ad hoc network, as shown in Fig. 1b, then every mobile host can access both data objects even after the network division.

Fig. 1a Ad hoc network division and data access

Fig. 1b Effective data replication for continued data access

The decision of where to place replicated data must trade off the cost of accessing data, which is reduced by additional copies, against the cost of storing and updating the replicas [12, 25]. These costs have severe implications in ad hoc networks because
mobile hosts have limited resources (storage and processing power). In general, mobile hosts experience reduced access latencies provided that data is replicated within their close proximity. However, this holds only when read accesses alone are considered. If updates of the contents are also considered, then the replicas have to be located: (a) in close proximity to the mobile hosts, and (b) in close proximity to the primary copy (assuming a “master” replication environment [17]). Therefore, efficient and effective replication schemes strongly depend on how many replicas are placed in the system and, more importantly, where [5]. Replication has been extensively studied in the context of conventional (wired) networks. For an overview of data replication techniques, see [27]. Although replication has drawn little attention in the context of ad hoc networks, the problem is interesting and much more challenging due to the unpredictable network topology, the limited resources of mobile hosts, and its application in military communications [19]. The ad hoc data replication problem (ADRP) was first introduced by Hara [14] and was further extended in [15, 16] to incorporate various network connectivity related issues. Although the above-mentioned works advance the study of ADRP, none involves reasoning via a concrete mathematical model. Thus, it is imperative to derive and understand an optimization model that is general, flexible and modifiable to cater for various applications of ADRP. We build on the above-mentioned work of Hara and address the selfish behavior of mobile servers in our solution concept. In ad hoc networks, resources may belong to different self-interested servers [29]. These servers may manipulate the resource (replica) allocation mechanism for their own benefit by misrepresenting their preferences, which may result in severe performance degradation [21].
Such scenarios are often modeled and studied using game theory. In this paper, we will first focus on deriving a mathematical model for ADRP and show that this problem in general is NP-complete. We then follow it up by proposing “Mosaic-Net” (an acronym for a game theoretical method for selection and allocation of replicas in ad hoc networks) that exhibits fast execution time and guarantees excellent solution quality. As indicated by Arrow’s impossibility theorem for satisfactory voting systems [3], aggregating players’ valuations (or preferences) to reach a collective decision is a difficult problem. It is further complicated by the possibility that the players might try to manipulate the mechanism. Nisan and Ronen [29] were the first to consider discrete optimization problems in the context of game theory, where the correct valuation is not directly available to the mechanism. Instead, players report some valuation to the mechanism, but they might lie. Their contribution is significant in the sense that they break the traditional barriers of game theory by explicitly stating that the mechanism’s objective function may have nothing to do with social welfare, which is the crux of the theory of collective decision making. Hence, many optimization problems can now be studied by relating player valuations to the objective function of the mechanism because both rely on the player’s valuation and both determine the player’s strategies. Nisan and Ronen termed their framework as Algorithmic Mechanism Design (AMD) [29]. AMD can be used to force the players to always tell the truth and follow the rules by laying out a set of incentives and repercussions. In the literature, such mechanisms are known as “truthful” mechanisms [2]. Each player in AMD has some valuation function that quantifies its benefit or loss. Every player reports the output of its valuation function to a centralized mechanism, which chooses

an outcome that optimizes a given objective function and makes payments to players. The design of payments is important because side payments are used to provide incentives to the players to report to the mechanism truthfully. In this paper, we design a truthful mechanism for the ADRP, termed Mosaic-Net, where each player’s valuation is naturally expressed by a single positive real number. In Mosaic-Net we restrict the form of valuations but allow general objective functions. This is contrary to Vickrey–Clarke–Groves (VCG) mechanisms that allow arbitrary valuation functions but apply only to utilitarian objective functions [2]. Thus, our results are also applicable to optimization problems other than the ADRP. The outcome of the mechanism we consider will always define some set of replica allocations at the player’s mobile server. A player’s valuation will always be the cost it incurs per replica, and this valuation will have some physical significance in terms of the amount of traffic (read requests from servers that do not hold replicas and are the closest to the player’s mobile server plus updates) that it processes. The goal of a player is to maximize its profit, which is payment minus cost. The goal of the mechanism is to minimize the total data item transfer cost in the network due to the read and update accesses. In Mosaic-Net, we use side payments to encourage players to tell the truth. It is known that for some output functions no side payments can make the resulting mechanism truthful [29]. However, Archer and Tardos [2] have shown that output functions that can be used in truthful mechanisms are those where the resource allocation made to a player decreases as the cost increases, where each player’s valuation function is composed of only one parameter. We build on that result and show that the same outcome holds when each player’s valuation function is composed of multiple parameters. 
Briefly, our game theoretical technique involves players (mobile hosts) that compete in a non-cooperative environment to replicate data objects. Because the players do not have a global view of the problem domain, it is always possible that the players, in order to satisfy local queries, replicate data objects that are not beneficial to the system as a whole in terms of saving communication cost (although doing so may be productive from the players’ point of view). To counteract such behavior, a refereeing body is introduced (termed the mechanism). The aim of the mechanism is to direct the competition in such a fashion that a global optimum is achieved even though the players are competing against one another. Note that there is a significant difference between the definition of truthfulness used in [22, 23] and the one used in this paper. The randomized replica allocation technique in [22, 23] yields a truthful dominant strategy for any possible random choice of the algorithm; however, the mechanism is truthful only in expectation, a weaker notion of truthfulness [30]. A randomized mechanism can be seen as a probability distribution over deterministic mechanisms: an element x is selected randomly and the corresponding mechanism is used. So, the mechanism in [23] is truthful for every fixed x. Moreover, in [20], the notion of utility is replaced by that of expected utility: even though the expected utility is maximized when telling the truth, for some x there might exist a better (untruthful) strategy. We distinguish our work from the above by adopting a stricter definition of truthfulness: our mechanism reports a deterministic truthful outcome. Highlighting the fact that revenue is a major concern, Ref. [24] suggests examining auctions (or mechanisms) that do not necessarily maximize the social welfare, and characterizes all truthful mechanisms for this problem. The characterization of [24] is a special case of ours, for bounded (predefined

number of replicas in the system) replica allocations, and it also appears implicitly in [20, 22, 23]. In [21], the authors also ignore the social welfare; instead, they attempt to compute various functions of the agents’ valuations (such as the order statistics) using auctions of minimal communication complexity. The proposed game theoretical technique is extensively evaluated against four ad hoc network replica placement methods. The comparative methods include: (a) randomized, (b) extended static access frequency [15], (c) extended dynamic access frequency and neighborhood [15], and (d) extended dynamic connectivity grouping [15]. Experimental data is recorded to measure the effects of: (a) replica allocation period, (b) read access frequency, (c) update access frequency, (d) storage capacity, and (e) radio communication range. The comparative studies reveal that the proposed approach outperforms the four methods in solution quality and achieves a competitive execution time. The remainder of this paper is organized as follows. In Sect. 2, we give a brief description of the related works and discuss how our approach differs from others. In Sect. 3, we describe the system model and the underlying assumptions. Section 4 focuses on deriving a mathematical model for ADRP, followed by a proof of NP-completeness. Section 5 presents the Mosaic-Net technique for ADRP, followed by experimental evaluations in Sect. 6. Finally, in Sect. 7, we summarize this paper.

2 Related works

The ad hoc data replication problem (ADRP) is an extension of the classical file allocation problem (FAP). Chu [8] studied the file allocation problem with respect to multiple files in a multiprocessor system. Casey [6] extended this work by distinguishing between updates and read file requests. Eswaran [10] proved that Casey’s formulation was NP-complete. In [28] Mahmoud and Reardon provide an iterative approach that achieves good solution quality when solving the FAP for infinite server capacities. Apers [1] considered the data allocation problem (DAP) in distributed databases, where the query execution strategy influences allocation decisions. In [28] the authors proposed several algorithms to solve the data allocation problem in distributed multimedia databases (without replication), also called the video allocation problem (VAP). All the above-mentioned methods aim to improve data accessibility in distributed computing systems. They are similar to our approach in that all address the improvement of data accessibility by replicating data. However, because system failures do not occur frequently in fixed networks, it is usually sufficient to create a few replicas of a database, and thus no special strategy is required. In contrast, frequent division of the network is a special characteristic of ad hoc networks; our approach takes this into account and is therefore completely different from approaches for distributed computing systems. In close connection to the ADRP, several routing protocols for improving connectivity in ad hoc networks have been proposed in the Internet Engineering Task Force (IETF) [19, 31–33]. All these protocols improve the connectivity among mobile hosts that are connected to each other by one-hop or multi-hop links, but they fail to provide any solution when the network is divided. On the other hand, our proposed replica

allocation method appropriately allocates replicas of data items before the division of the network, and thus, improves data accessibility. Several strategies for caching data contents in mobile computing environments have also been proposed [4, 7, 17, 18, 34, 39]. Most of these strategies assume an environment where mobile hosts access contents at sites in a fixed network, and cache data on the mobile hosts because wireless communication is more costly than wired communication. Such strategies address the issue of keeping consistency between original data and its replicas or caches with low communication costs. They are considered to be similar to our approach, because both approaches replicate data on mobile hosts. However, these strategies assume only one-hop wireless communication, and thus, they are completely different from our approach that assumes multi-hop communication in ad hoc networks. Another closely related research topic is of push-based information systems in which a server repeatedly broadcasts data to clients using a broadband channel. Several caching strategies have been proposed to improve user access [13, 36]. In these strategies, clients are typically mobile hosts, and the cache replacement is determined based on several parameters such as the access frequency from each mobile host to each data item, the broadcast frequency of each data item, and the time remaining until each item is broadcast next. They are also considered to be similar to our approach, because both approaches replicate data on mobile hosts. However, comparing the strategies for caching or replicating, both approaches are completely different because the strategies in push-based information systems do not assume that the clients cooperatively share cached data items by constructing ad hoc networks. Hara’s work [14–16] is the closest among all the related works on ADRP compared to this paper. 
However, our work differs from Hara’s in: (a) deriving a mathematical problem formulation for ADRP, (b) proving that the generalized form of ADRP is NP-complete, (c) proposing an optimization technique that allocates replicas so as to minimize the network traffic under storage constraints with “read from the nearest” and “push-based update through the primary mobile host” policies, (d) extensively evaluating the techniques under varying system parameters, (e) using game theoretical techniques.

3 System model and assumptions

The system model under consideration is an ad hoc network where mobile hosts access data objects held by other mobile hosts as the originals. When a mobile host issues a request to access a data object, the request is successful in either of two cases: (a) the mobile host has the original or a replica of the data object in its local database, or (b) at least one mobile host that is connected (possibly via multi-hop links) to the mobile host that issued the access request has the original or a replica of the data object. If neither case applies, then the request fails. Below we list the system related assumptions. The most frequent notations used in this paper are recorded in Table 1:

Table 1 Notations and their meanings

Symbol            Meaning
$m$               Number of mobile hosts in the ad hoc network
$n$               Number of data objects in the ad hoc network
$O_k$             kth data object
$o_k$             Size of data object k
$M^i$             ith mobile host
$s^i$             Storage capacity of mobile host i
$as^i$            Available storage capacity of mobile host i
$r_k^i$           Read frequency for data object k from mobile host i
$R_k^i$           Aggregate read cost of $r_k^i$
$u_k^i$           Update frequency for data object k from mobile host i
$U_k^i$           Aggregate update cost of $u_k^i$
$NN_k^i$          Nearest neighbor of mobile host i holding data object k
$c(i, j)$         Communication cost between mobile hosts i and j
$P_k$             Primary mobile host of the kth data object
$R_k$             Replication scheme of data object k
$C_{overall}$     Total overall network transfer cost
ADRP              Ad hoc data replication problem
NTC               Network transfer cost
Mosaic-Net        Game theoretical method for selection and allocation of replicas in ad hoc networks

1. Each mobile host is assigned a unique host identifier. The set of all the mobile hosts is denoted by $M = \{M^1, M^2, \ldots, M^m\}$, where m is the total number of mobile hosts and $M^i$ (1 ≤ i ≤ m) denotes a mobile host identifier.
2. Each data object is assigned a unique object identifier. The set of all the data objects is denoted by $O = \{O_1, O_2, \ldots, O_n\}$, where n is the total number of data objects and $O_k$ (1 ≤ k ≤ n) denotes an object identifier. The original copy of an object is held by a particular mobile host in the system.
3. Each mobile host has limited storage capacity, denoted by $s^i$.
4. The access frequencies are known a priori (or observed through an access log).
5. For updates we assume a “master” replication environment [17], in which a mobile host $M^i$ “owns” the original copy of an object $O_k$, i.e., it generates all the updates to $O_k$. Note that a mobile host $M^i$ can own more than one data object as the original, e.g., $M^1$ can own $O_2$ and $O_7$, and thus can update both data objects. After a data object is updated, its replicas become invalid if the mobile hosts holding them are not connected to the mobile host that holds the original copy. This is because the update cannot be propagated.

Based on the above system overview and the underlying assumptions, if we are to find an optimal placement of replicas in an ad hoc network, then we must incorporate, among others, the following parameters in a brute force (exhaustive search) method [14]:

1. The access frequency from each mobile host to each data item.
2. The time remaining until each item is updated next.
3. The probability that each mobile host will participate in the network and will disappear from the network.
4. The probability that each two mobile hosts connected by a one-hop link will be disconnected.
5. The probability that each two disconnected mobile hosts will be connected by a one-hop link.

Even if some looping is possible, the computational complexity is very high, and this calculation must be done every time the network topology changes due to mobile host migration. Moreover, parameter 2 changes as time passes, and parameters 3, 4, and 5 cannot be formulated in practical terms because mobile hosts move freely. For these reasons, we take the following heuristic approach:

1. Replicas are reallocated in a specific period (relocation period).
2. At every relocation period, replica allocation is determined based on the access (both read and update) frequency from each mobile host to each data item and the network topology at that moment.

The relocation period (as defined originally in [14]) is the time period between two executions of a replica placement procedure, measured in seconds (or, in this paper, simulation time units). When a replica placement procedure is executed, the system is frozen until the procedure terminates. Thus, the input to a replica placement procedure is sufficient and necessary to compute a valid replica placement. Moreover, this also emphasizes the fact that we must design replica placement procedures that are fast and guarantee high solution quality.
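The access rule stated at the start of this section (a request succeeds if the requesting host, or any host connected to it via one-hop or multi-hop links, holds the original or a replica) can be illustrated with a minimal Python sketch. The link layout inside each partition and the hosts chosen to hold the originals and replicas are assumptions made only for illustration, loosely following the scenario of Figs. 1a and 1b; the function names are hypothetical.

```python
from collections import deque

def reachable_hosts(links, source):
    """Return every host connected to `source` via one-hop or multi-hop links."""
    seen = {source}
    queue = deque([source])
    while queue:
        host = queue.popleft()
        for neighbor in links.get(host, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

def request_succeeds(links, holdings, requester, obj):
    """True if the requester or any connected host holds the original or a replica."""
    return any(obj in holdings.get(host, set())
               for host in reachable_hosts(links, requester))

# Assumed topology after the division: {M1, M2, M3} and {M4, M5, M6}.
divided_links = {
    "M1": ["M2", "M3"], "M2": ["M1"], "M3": ["M1"],
    "M4": ["M5", "M6"], "M5": ["M4"], "M6": ["M4"],
}
# Assumed placement: originals only, then originals plus one replica per side.
originals_only = {"M1": {"O1"}, "M6": {"O2"}}
with_replicas = {"M1": {"O1"}, "M3": {"O2"}, "M4": {"O1"}, "M6": {"O2"}}

print(request_succeeds(divided_links, originals_only, "M2", "O2"))
print(request_succeeds(divided_links, with_replicas, "M2", "O2"))
```

Under these assumptions the first request fails because no host in the left partition holds O2, while the second succeeds once a replica of O2 is placed on that side.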

4 Ad hoc data replication problem (ADRP)

4.1 Problem definition

Consider an ad hoc network comprising m mobile hosts, with each mobile host having its own processing power and storage. Let $M^i$ and $s^i$ be the name and the total storage capacity (in simple data units, e.g. blocks), respectively, of mobile host i, where 1 ≤ i ≤ m. The m mobile hosts can communicate with each other using a wireless communication network. A wireless channel between two mobile hosts $M^i$ and $M^j$ (if it exists) has a positive integer $c(i, j)$ associated with it, giving the communication cost for transferring a data unit between mobile hosts $M^i$ and $M^j$. (In this paper, the communication cost between two mobile hosts is determined by the power-attenuation model [35]. See Sect. 5.3 for details.) If the two mobile hosts are not one-hop connected by a wireless channel, then the above cost is given as the sum of the costs of all the wireless channels (multi-hop) in a chosen path from mobile host $M^i$ to mobile host $M^j$. Without loss of generality we assume that $c(i, j) = c(j, i)$. Such an assumption can be relaxed when the upstream and downstream bandwidths vary for a wireless channel. Let there be n data objects, each identifiable by a unique name $O_k$ and size in simple data units $o_k$, where 1 ≤ k ≤ n. Let $r_k^i$ and $u_k^i$ be the total number of reads and

updates, respectively, initiated from $M^i$ for $O_k$ during a certain time period. In other words, $r_k^i$ and $u_k^i$ are the read and update frequencies, respectively. Our replication policy assumes the existence of one primary copy for each object in the system. Let $P_k$ be the mobile host that holds the primary copy of $O_k$, i.e., the only copy in the network that cannot be de-allocated; it is hence referred to as the primary mobile host of the kth object. Each primary mobile host $P_k$ contains information about the whole replication scheme $R_k$ of $O_k$. This can be done by maintaining a list of the mobile hosts at which the kth object is replicated, called from now on the replicators of $O_k$. When a mobile host $M^i$ initiates a read request $r_k^i$ for a data object $O_k$, the request is redirected to the nearest neighbor mobile host $NN_k^i$ that holds either the original or a copy of the data object $O_k$. In other words, $NN_k^i$ is the mobile host for which the reads from $M^i$ for $O_k$, if served there, would incur the minimum possible communication cost. It is possible that $NN_k^i = M^i$, if $M^i$ is a replicator or the primary mobile host of $O_k$. Another possibility is that $NN_k^i = P_k$, if the primary mobile host is the closest one holding a copy of $O_k$. For the updates we assume that only the mobile host $P_k$ (that owns the data object(s)) can perform such operations. Such an assumption is justifiable when considering sensor networks, which are a special class of ad hoc networks. $P_k$ updates a data object $O_k$ by sending broadcasts to the set of mobile hosts that hold the replicas of $O_k$, i.e., $M^i \in R_k$.

For the ADRP under consideration, we are interested in minimizing the total network transfer cost (NTC) due to data object movement, i.e., the data object movement due to the read and update accesses. Note that the minimization of NTC in turn leads to increased data accessibility. There are two components affecting NTC. The first component of NTC is due to the read requests. Let $R_k^i$ denote the total NTC due to $M^i$’s read requests for object $O_k$, addressed to the nearest mobile host $NN_k^i$. This cost is given by the following equation:

$$R_k^i = r_k^i \, o_k \, c\left(i, NN_k^i\right), \tag{1}$$

where $NN_k^i = \arg\min_{j \in R_k} c(i, j)$, i.e., the mobile host $j \in R_k$ for which $c(i, j)$ is minimal. The second component of NTC is the cost arising due to the updates. Let $U_k^i$ be the total NTC due to $P_k$’s update requests for object $O_k$. This cost is given by the following equation:

$$U_k^i = u_k^i \, o_k \sum_{\forall i \in R_k} c(P_k, i). \tag{2}$$

The cumulative NTC, denoted as $C_{overall}$, due to reads and writes is given by:

$$C_{overall} = \sum_{i=1}^{m} \sum_{k=1}^{n} \left( R_k^i + U_k^i \right). \tag{3}$$
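The multi-hop cost $c(i, j)$ and the read and update costs of Eqs. (1) and (2) can be illustrated with a minimal Python sketch. It assumes that the "chosen path" between two hosts is the cheapest one (computed here with the Floyd–Warshall algorithm) and that hosts are indexed from 0; the function names and the four-host line topology are hypothetical and not taken from the paper.

```python
import math

def all_pairs_cost(m, channels):
    """Cheapest multi-hop cost c[i][j] from symmetric one-hop channel costs."""
    c = [[0 if i == j else math.inf for j in range(m)] for i in range(m)]
    for (i, j), cost in channels.items():
        c[i][j] = c[j][i] = min(c[i][j], cost)   # c(i, j) = c(j, i)
    for via in range(m):                          # Floyd-Warshall relaxation
        for i in range(m):
            for j in range(m):
                if c[i][via] + c[via][j] < c[i][j]:
                    c[i][j] = c[i][via] + c[via][j]
    return c

def nearest_neighbor(c, i, holders):
    """NN_k^i: the holder of the object that is cheapest to reach from host i."""
    return min(holders, key=lambda j: c[i][j])

def read_cost(c, i, reads, size, holders):
    """Eq. (1): R_k^i = r_k^i * o_k * c(i, NN_k^i)."""
    return reads * size * c[i][nearest_neighbor(c, i, holders)]

def update_cost(c, primary, updates, size, holders):
    """Eq. (2): u_k^i * o_k * (sum of c(P_k, i) over all hosts i in R_k)."""
    return updates * size * sum(c[primary][i] for i in holders)

# Assumed example: hosts 0-1-2-3 in a line with one-hop costs 1, 2, 1.
c = all_pairs_cost(4, {(0, 1): 1, (1, 2): 2, (2, 3): 1})
print(read_cost(c, 0, reads=5, size=2, holders={3}))       # only the primary holds it
print(read_cost(c, 0, reads=5, size=2, holders={1, 3}))    # replica added at host 1
print(update_cost(c, primary=3, updates=3, size=2, holders={1, 3}))
```

In this example the replica at host 1 lowers the read cost seen by host 0 but adds an update cost for keeping that replica current, which is the trade-off the objective function captures.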

Let $X_{ik} = 1$ if $M^i$ holds a replica of object $O_k$, and 0 otherwise. The $X_{ik}$s define an m × n replication matrix, named X, with boolean elements. Equation (3) is now refined to:

$$X = \sum_{i=1}^{m} \sum_{k=1}^{n} \left[ (1 - X_{ik}) \left[ r_k^i \, o_k \min\{ c(i, j) \mid X_{jk} = 1 \} \right] + X_{ik} \left( \sum_{x=1}^{m} u_k^x \right) o_k \, c(P_k, i) \right]. \tag{4}$$

Mobile hosts that are not replicators of data object $O_k$ create NTC equal to the communication cost of their reads from the nearest replicator. Mobile hosts belonging to the replication scheme of $O_k$ are associated with the cost of receiving all the updated versions of it. Using the above formulation, the ADRP can be defined as: “Find the assignment of 0, 1 values in the X matrix that minimizes $C_{overall}$, subject to the storage capacity constraint:

$$\sum_{k=1}^{n} X_{ik} \, o_k \le s^i \quad \forall\, (1 \le i \le m),$$

and subject to the primary copies policy:

$$X_{P_k k} = 1 \quad \forall\, (1 \le k \le n).”$$

The minimization of $C_{overall}$ has two impacts on the ad hoc network under consideration. First, data objects are replicated as close as possible to their respective primary mobile hosts to reduce the update costs. Second, data objects are replicated as close as possible to the mobile hosts that access replicated data to minimize the read costs.

4.2 Proof of NP-completeness

An equivalent decision problem for the ADRP can be stated as follows: “Given an instance of an ad hoc network, with only primary copies existing (no replicas) and an integer K, is there an assignment of 0, 1 values to matrix X such that $C_{overall} \le K$ and the constraints are satisfied?” In O(mn) time a non-deterministic algorithm that assigns 0, 1 values to the elements of X, while checking the constraints, can easily decide the problem. Thus, ADRP belongs to NP. In order to prove that ADRP is also NP-complete, we will reduce the knapsack problem to it. The knapsack problem can be stated as follows [11]:

INSTANCE: A finite set $U$, a “size” $s(u) \in Z^+$ and a “value” $v(u) \in Z^+$ for each $u \in U$, a size constraint $B \in Z^+$, and a value goal $K \in Z^+$.

QUESTION: Is there a subset $U' \subseteq U$ such that $\sum_{u \in U'} s(u) \le B$ and $\sum_{u \in U'} v(u) \ge K$?

For every instance of the knapsack problem we can build an ad hoc network such that a solution to the knapsack is also a solution to an instance of ADRP. (The basic idea is to identify a 1-1 mapping between the objects belonging to $U$ and the data objects in the ad hoc network.) We build the ad hoc network as follows:

1. Order the set $U$ as: $n = |U| \wedge o_k = s(k) \;\forall\, 1 \le k \le n$.
2. Pick a random number of mobile hosts $m − 1$ ($m \ge 2$).

3. Allocate all the data objects to each of them, assigning also $\sum_{u \in U} s(u)$ storage capacity, i.e., $X_{ik} = 1 \wedge s^i = \sum_{u \in U} s(u) \;\forall\, 1 \le k \le n,\ 1 \le i \le m − 1$.
4. Randomly decide which of the $m − 1$ copies of a data object k would be the primary one.
5. After defining the primary copies, add an “empty” (holding no data object) mobile host with storage capacity B. Call this the knapsack server ($K_s$).

We can do so because an assignment of objects to the knapsack server that solves the knapsack problem also solves the ADRP. To simplify the derivation, we only consider a primitive version of the ADRP. We set $c(i, j) = 1 \;\forall\, 1 \le i, j \le m$, $u_k^i = 0 \;\forall\, 1 \le i \le m,\ 1 \le k \le n$, and set

$$r_k^{K_s} = \frac{v(k) \sum_{u \in U} s(u)}{s(k)} \quad \forall\, k \in U.$$

The NTC from the requests of all servers other than $K_s$ is zero. This is due to the fact that their read requests are satisfied locally and no updates exist. Every object allocation scheme for $K_s$ that minimizes NTC is a solution to ADRP (the NTC introduced by the rest of the network is zero). Because all wireless channels have communication cost equal to 1, each object allocated to $K_s$ reduces the total NTC by $r_k^{K_s} o_k$, or $v(k) \sum_{u \in U} s(u)$. Let $U'$ be the set of objects allocated to $K_s$. Let $X'$ denote the NTC when $K_s$ is empty and $X_{K_s}$ the NTC after completing the object allocation in $K_s$. Then, the initial ADRP decision problem can be restated as: “Assuming an instance of an ad hoc network that yields NTC equal to $X'$, is there a subset $U'$ of the set of objects that, if allocated to the $K_s$ server (without exceeding its capacity), makes the total NTC $X$ satisfy $X \le K$, with K any integer?”

The inequality $X \le K$ can be restated as $X' − \sum_{k=1}^{n} o_k \cdot \sum_{x \in U'} v(x) \le K$, or $C \cdot \sum_{x \in U'} v(x) \ge E$, or finally as $\sum_{x \in U'} v(x) \ge K'$ (where $K$, $K'$, $E$ are any integers and $C$ is constant). In order to answer the question whether a $U'$ such that $\sum_{x \in U'} v(x) \ge K'$ exists or not, we must answer the relevant knapsack question ($\sum_{u \in U'} v(u) \ge K$). Therefore, for every knapsack instance we can construct an ad hoc network as above, so that a solution to the knapsack is also a solution to the ADRP.
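The construction above can be checked numerically on a tiny instance. The following Python sketch is an illustration only, under the stated simplifications of the proof (unit channel costs, no updates, reads issued solely by the knapsack server, $o_k = s(k)$); the function names and the three-item instance are hypothetical. It computes the NTC of the reduced network and confirms that the allocation minimizing NTC is exactly the one maximizing the knapsack value.

```python
from fractions import Fraction
from itertools import combinations

def knapsack_server_ntc(sizes, values, allocated):
    """NTC of the reduced instance for a given set of objects placed on Ks."""
    total_size = sum(sizes)                       # sum of s(u) over U
    ntc = Fraction(0)
    for k, (s, v) in enumerate(zip(sizes, values)):
        if k not in allocated:                    # (1 - X_ik) = 1: read remotely
            reads = Fraction(v * total_size, s)   # r_k^{Ks} = v(k) * sum s(u) / s(k)
            ntc += reads * s * 1                  # r_k^{Ks} * o_k * c, with c = 1
    return ntc

def best_allocation(sizes, values, capacity):
    """Exhaustive search over allocations that fit into Ks (tiny instances only)."""
    items = range(len(sizes))
    feasible = [set(sub) for r in range(len(sizes) + 1)
                for sub in combinations(items, r)
                if sum(sizes[k] for k in sub) <= capacity]
    return min(feasible, key=lambda sub: knapsack_server_ntc(sizes, values, sub))

sizes, values, capacity = [2, 3, 4], [3, 4, 5], 5   # assumed knapsack instance
empty_ntc = knapsack_server_ntc(sizes, values, set())
chosen = best_allocation(sizes, values, capacity)
chosen_value = sum(values[k] for k in chosen)
print(chosen, chosen_value, knapsack_server_ntc(sizes, values, chosen))
# Each allocated object lowers the NTC by v(k) * sum of s(u), as in the proof.
print(empty_ntc - sum(sizes) * chosen_value)
```

Exact rational arithmetic is used because $r_k^{K_s}$ need not be an integer; the exhaustive search is exponential and serves only to verify the identity on a small example.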

5 Game theoretical technique

In this section we first describe the basics of our proposed game theoretical model and then extend the model to the ADRP.

5.1 The basic model

The Setup: The setup consists of $m$ players. Each player $i$ has some private data $t^i \in \mathbb{R}$. This data is termed the agent's true type or true data. Only player $i$ has knowledge of $t^i$. Let $t$ denote the vector of all the true types, $t = (t^1, \ldots, t^m)$.

Communications: Each player communicates with the mechanism (refereeing body) via bids $b^i$. (We describe the structure of bids in the subsequent text.) Let $b$ denote the vector of all the bids, $b = (b^1, \ldots, b^m)$, and let $b^{-i}$ denote the vector of bids not including player $i$, i.e., $b^{-i} = (b^1, \ldots, b^{i-1}, b^{i+1}, \ldots, b^m)$. It is to be understood that we can also write $b = (b^{-i}, b^i)$.

Components: The game theoretical model has two components: (1) the algorithmic output $x(\cdot)$, and (2) the payment mapping function $p(\cdot)$.

Algorithmic output: The mechanism allows a set of outputs $X$, based on the output function that takes the bidding vector as its argument, i.e., $x(b) = \{x^1(b), \ldots, x^m(b)\}$, where $x(b) \in X$. This output function relays a unique output given a vector $b$. That is, when $x(\cdot)$ receives $b$, it generates an output in the form of (replica) allocations $x^i(b)$. Intuitively, the algorithm takes in the bid vector $b$ and then relays to each player its replica allocation(s).

Monetary cost: Each player $i$ incurs some monetary cost $c^i(b^i, x^i(b))$, i.e., the cost to accommodate the allocation $x^i(b)$. This cost depends on what player $i$ bids and on the output.

Payments: To offset $c^i$, the mechanism makes a payment $p^i(b)$ to player $i$. A player $i$ always attempts to maximize its utility (profit) $u^i(t^i, b) = p^i(b) - c^i(b^i, x^i(b))$. Each player $i$ cares about the other players' bids only insofar as they influence the outcome and the payment.

Bids: Each player $i$ is interested in reporting a bid $b^i$ that maximizes its profit regardless of what the other agents bid, i.e., $u^i(t^i, (b^{-i}, t^i)) \ge u^i(t^i, (b^{-i}, b^i))$ for all $b^{-i}$ and $b^i$. In essence, each player aims to identify a dominant bid. A dominant bid means that player $i$ can always guarantee success for itself no matter what the other players bid. We state the following lemma from the literature, which says that truth-telling is a dominant strategy, i.e., if all the players report the exact worth of a data object, it conforms to dominant strategies.

Lemma 1 Truth-telling for agents having one-parameter data is a dominant strategy.

Proof See [2].
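The dominance condition $u^i(t^i, (b^{-i}, t^i)) \ge u^i(t^i, (b^{-i}, b^i))$ can be checked by brute force in a small setting. The sketch below is only an analogy: it uses the textbook sealed-bid second-price auction (the highest bidder wins and pays the runner-up bid), not the payment scheme of this paper, in which the mechanism pays the players. The function names, the tie rule, and the bid grid are illustrative assumptions.

```python
from itertools import product


def second_price_utility(value, bid, other_bids):
    """Utility of one bidder in a sealed-bid second-price auction."""
    best_other = max(other_bids)
    if bid > best_other:
        # The bidder wins and pays the runner-up bid, not its own bid.
        return value - best_other
    # The bidder loses (assumption: ties are resolved against it).
    return 0.0


def truthful_is_dominant(grid):
    """Check u(t, (b_-i, t)) >= u(t, (b_-i, b)) for every t, b and b_-i on the grid."""
    for value, bid, other_a, other_b in product(grid, repeat=4):
        others = (other_a, other_b)
        truthful = second_price_utility(value, value, others)
        deviating = second_price_utility(value, bid, others)
        if deviating > truthful + 1e-12:
            return False
    return True


# Hypothetical grid of values and bids: 0, 0.25, ..., 5.
grid = [x / 4 for x in range(0, 21)]
print(truthful_is_dominant(grid))  # prints True
```

No deviation on the grid beats the truthful bid, because the price a winner pays never depends on its own bid; a bid only decides whether the player wins.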



The Mechanism: We now put all the pieces together. A mechanism (both the referee and the model) ς consists of a pair, ς = $(x(b), p(b))$, where $x(\cdot)$ is the output function and $p(\cdot)$ is the payment mapping function. The objective of the mechanism is to select an output $x$ that optimizes a given objective function. Below we identify the desired characteristics of the mechanism.

Truthfulness: We say that an output function admits fairness if there exists a payment mapping function $p(\cdot)$ such that the mechanism ς is truthful [2]. Using Lemma 1, this transforms to a mechanism ς that is implemented using dominant strategies, ς = $(x(t), p(t))$. (Notice that the mechanism ς now relies on $t$ and not $b$.)

Voluntary participation: A mechanism is characterized as a voluntary participation mechanism if $\forall i$, $u^i(t^i, (b^{-i}, t^i)) \ge 0$, i.e., no player incurs a net loss.

Objective: The mechanism defined above leaves us with the following two optimization problems:
1. Identify a strategy that is dominant for every player.
2. Identify a payment function that encourages truthfulness and voluntary participation.

5.2 Game model applied to ADRP (Mosaic-Net)

We follow the same pattern as in Sect. 5.1. To be clear, this game model applied to the ADRP is invoked at every relocation period.

The Setup: The ad hoc network described in Sect. 3 is considered, where each mobile host is represented by a player, i.e., the mechanism contains $m$ players. In the context of the ADRP, a player holds two key elements of information: (a) the available mobile host storage, $as^i$, and (b) the read access frequency, $r_k^i$, for a data object $k$. Consider the following three possible cases:
1. ADRP[π]: Each player holds the read frequency, $r_k^i = t^i$, associated with each object $k$ as private information, whereas the available storage of the mobile host and everything else (network topology, system parameters, etc.) is public knowledge.
2. ADRP[σ]: Each player holds the available mobile host storage, $as^i = t^i$, as private information, whereas the read frequency and everything else is public knowledge.
3. ADRP[π, σ]: Each player holds both the read frequencies and the mobile host storage, $\{r_k^i, as^i\} = t^i$, whereas everything else is public knowledge.

Intuitively, if players know the available storage of each other, that gives them no advantage whatsoever.
However, if they come to know the read frequencies, they can easily modify their bids and alter the algorithmic output, as the read frequency directly represents the popularity of data objects [29]. Note that a player $i$ can only calculate a data object $k$'s benefit (the reduction in the cost of read accesses that replication brings) by making use of the access frequency; thus, everything else, such as the network topology, latency on communication lines, and even the mobile host capacities, can be public knowledge. Therefore, ADRP[π] is the only natural choice.

Communications: The players in the mechanism are assumed to be greedy (or selfish), and therefore they project a bid $b^i$ (as opposed to $t^i$) to the mechanism. At this point we want to clarify that the mechanism has no way of knowing whether $b^i$ is $t^i$ or some other value. However, in the subsequent text we will rigorously prove that if the mechanism provides proper incentive (payments to compensate for holding replicas of data objects) to the players, then they are in fact forced to bid $b^i = t^i$.

Components: The mechanism has two components: (1) the algorithmic output $x(\cdot)$, and (2) the payment mapping function $p(\cdot)$.

Algorithmic output: In the context of the ADRP, the algorithm accepts bids from all the players and outputs the maximum beneficial bid, i.e., the bid that incurs the minimum NTC due to object movement (3). (We will give a detailed description of the algorithm in the subsequent text.)

Monetary cost: When a data object is allocated (for replication) to a player $i$, the player becomes responsible for entertaining (read and write) requests to that object. For example, assume object $k$ is replicated to player $i$; then the amount of traffic that the player has to entertain due to the replication of object $k$ is exactly equivalent to the NTC cost, i.e., $c^i = R_k^i + U_k^i$. This fact is easily deducible from (4).

Payments: To offset $c^i$, the mechanism makes a payment $p^i(b)$ to player $i$. This payment is equivalent to the second highest bid that is submitted to the mechanism for a data object $k$. The readers would immediately note that with such a payment function, a player $i$ can never obtain a net profit greater than 0. This is exactly what we want. In a selfish (or greedy) environment, it is possible that the players bid higher than the true value. The mechanism creates an illusion to negate that, by compensating the players with a payment that is lower than their incurred cost. This leaves no room for the players to overbid or underbid. For example: (1) if a player $i$ overbids for an object $k$, it gains no advantage because it receives a payment equivalent to the second highest bid; (2) if a player $i$ underbids for an object $k$, it lessens its chances to win and replicate $k$. In the literature, such a payment function is termed a Vickrey payment [38].
For more details on the optimality of this type of payment function, see [36]. In that paper, the authors have identified many such scenarios, but all fail to exploit this (Vickrey) payment option. We want to emphasize that each player's incentive is to replicate data objects so that queries can be answered locally. If the replicas are made available elsewhere, the cost to access data would be much higher.

Bids: Each player reports a bid that is the direct representation of the true data that it holds, i.e., $b^i = r_k^i o_k c(i, NN_k^i) = R_k^i$. Readers will immediately notice that $b^i$ only represents one-half of the NTC. This is because $U_k^i$ is the private information of $P_k$.

In essence, the mechanism ς$(x(b), p(b))$ takes in the vector of bids $b$ from all the players, adjusts the bids by injecting the cost of updates, and then selects the highest bid. The highest bidder is allocated the data object $k$, which is added to its allocation set $x^i$. The mechanism then pays the bidder $p^i$. This payment is equivalent to the second highest bid submitted for object $k$.

5.3 A numerical example

At a relocation period, each mobile host broadcasts its host identifier. After all mobile hosts complete their broadcasts, every host knows its connected mobile hosts. Consider one such connected set of mobile hosts (shown in Fig. 2a). To keep the example as simple as possible we allow only two primary data objects, i.e., $O_1$ at $M^1$ and $O_2$ at $M^6$. The communication cost of a one-hop wireless channel follows the widely cited power-attenuation model [35]. In this model the radio signal of a mobile host falls off as $1/d^{\kappa}$, where $d$ is the distance between the two mobile hosts and $\kappa$ is a real constant dependent on the wireless environment, typically between 2 and 4 (we set $\kappa = 4$). In this model all the mobile hosts have the same power threshold for signal detection, typically normalized to one. With these properties in mind, the minimum power requirement for supporting a one-hop wireless channel between two mobile hosts $M^i$ and $M^j$ separated by a distance $d$ is $c(i, j) = c(j, i) = d^{\kappa}$. We admit that this may not be a realistic way to measure the communication cost of transferring data between two mobile hosts, but it does provide a lower bound on the communication cost (for details on this claim, see [35]). More sophisticated models, such as the Shannon channel capacity model [37], would be more useful for studying accurate communication costs. However, such a model is more appropriate when the primary focus is on power dissemination, which is not the focus of this paper. Figure 2b shows the cost of the ad hoc network with power-attenuation values, e.g., $c(M^1, M^2) = c(M^2, M^1) = (1.1)^4 = 1.46$. Tables 2, 3, 4, 5, 6 show the system parameters of the ad hoc network under consideration. First, players calculate their local benefits. By local benefit we mean the savings in the network transfer cost (NTC) brought by reading the object locally.
Fig. 2a Connected set of mobile hosts

Fig. 2b Network with communication costs

Table 2 Read access frequencies

Data objects   M1     M2     M3     M4     M5     M6
O1             0.51   0.25   0.50   0.11   0.12   0.01
O2             0.08   0.62   0.17   0.40   0.09   0.78

Table 3 Update access frequencies

Data objects   M1     M2     M3     M4     M5     M6
O1             0.15   ×      ×      ×      ×      ×
O2             ×      ×      ×      ×      ×      0.08

Table 4 Storage capacities of mobile hosts excluding the primary object they currently hold

Mobile hosts   M1       M2       M3       M4        M5        M6
Size           3.4 kB   5.4 kB   3.9 kB   22.3 kB   16.2 kB   2.3 kB

Table 5 Size of data objects

Data objects   O1        O2        O3        O4        O5        O6
Size           1.00 kB   3.56 kB   2.47 kB   6.34 kB   0.82 kB   4.41 kB

Recall that this cost is captured by the $R_k^i$ part of the NTC as described in Sect. 4. For example, Player 1 (Fig. 3a) calculates the savings in the read cost by (1). The agents choose the
maximum (between $O_1$ and $O_2$) of the local benefit as their bids. Figure 3b shows the bids submitted to the mechanism. A player can submit a bid for any object that it desires. This information is revealed to the auctioneer via the bid description vector, i.e., $b^i(O_k)$ describes the bid of amount $b^i$ submitted by player $i$ for object $O_k$. Recall that the NTC for an object also relies on the update cost. This value is not known to any of the players; only the mechanism (and $P_k$) knows it. When the bids are submitted to the mechanism, it calculates the related update cost for the object under consideration and subtracts that value from the actual bid to produce $v^i$ (the adjusted value of player $i$'s bid). The reason for subtracting the update cost is that although reading locally takes a certain amount of traffic off the ad hoc network, replication injects additional traffic due to the updates related to that object. Columns 7 and 8 in Fig. 3b show the adjusted values. Based on these adjusted values the mechanism chooses the highest bid and declares the winner. The only information relayed to the players is "who is allowed to replicate what." This process continues until there is no more positive benefit (due to replication, Fig. 3h) observable by the mechanism. At a relocation period, a mobile host might not connect to another mobile host that has an original or a valid replica of a data item that the host should allocate. In this case, the memory space for the replica is temporarily filled with one of the replicas that have been allocated since the previous relocation period but are not currently selected for allocation. This temporarily allocated replica is chosen from among the possible replicas according to Mosaic-Net. If there is no replica that can be

Table 6 Calculation of least cost wireless channels for connected set of mobile hosts (each entry gives the least-cost path and its cost)

From M1: [M1, M1] = 0; [M1, M2] = 1.46; [M1, M3] = 3.05; [M1, M2, M4] = 7.38; [M1, M2, M4, M5] = 9.82; [M1, M2, M4, M6] = 10.23
From M2: [M2, M1] = 1.46; [M2, M2] = 0; [M2, M1, M3] = 4.51; [M2, M4] = 5.92; [M2, M4, M5] = 8.36; [M2, M4, M6] = 8.77
From M3: [M3, M1] = 3.05; [M3, M1, M2] = 4.51; [M3, M3] = 0; [M3, M4] = 5.63; [M3, M4, M5] = 8.07; [M3, M4, M6] = 8.48
From M4: [M4, M2, M1] = 7.38; [M4, M2] = 5.92; [M4, M3] = 5.63; [M4, M4] = 0; [M4, M5] = 2.44; [M4, M6] = 2.85
From M5: [M5, M4, M2, M1] = 9.82; [M5, M4, M2] = 8.36; [M5, M4, M3] = 8.07; [M5, M4] = 2.44; [M5, M5] = 0; [M5, M6] = 3.13
From M6: [M6, M4, M2, M1] = 10.23; [M6, M4, M2] = 8.77; [M6, M4, M3] = 8.48; [M6, M4] = 2.85; [M6, M5] = 3.13; [M6, M6] = 0
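The least-cost entries of Table 6 can be recomputed from the one-hop costs. The sketch below assumes that the one-hop links of Fig. 2b are exactly the two-host entries of Table 6 (the figure itself is not reproduced here) and uses the Floyd-Warshall all-pairs algorithm, which is a choice made for this illustration and not necessarily the procedure used in the paper. The names `channel_cost`, `ONE_HOP` and `least_costs` are hypothetical.

```python
import math

KAPPA = 4  # attenuation exponent used in the numerical example


def channel_cost(distance, kappa=KAPPA):
    """Power-attenuation cost of a one-hop channel: c(i, j) = d ** kappa."""
    return distance ** kappa


# One-hop costs, read off the two-host paths of Table 6 (assumed links of Fig. 2b).
ONE_HOP = {
    (1, 2): 1.46, (1, 3): 3.05, (2, 4): 5.92, (3, 4): 5.63,
    (4, 5): 2.44, (4, 6): 2.85, (5, 6): 3.13,
}


def least_costs(one_hop, hosts):
    """All-pairs least-cost channels over symmetric links (Floyd-Warshall)."""
    cost = {(i, j): 0.0 if i == j else math.inf for i in hosts for j in hosts}
    for (i, j), c in one_hop.items():
        cost[i, j] = cost[j, i] = c
    for k in hosts:            # allow host k as an intermediate relay
        for i in hosts:
            for j in hosts:
                if cost[i, k] + cost[k, j] < cost[i, j]:
                    cost[i, j] = cost[i, k] + cost[k, j]
    return cost


costs = least_costs(ONE_HOP, range(1, 7))
print(round(channel_cost(1.1), 2))  # 1.46, the M1-M2 link
print(round(costs[1, 6], 2))        # 10.23, the path M1, M2, M4, M6
```

Under these assumed links, the recomputed values agree with Table 6, for example 4.51 between M2 and M3 (through M1) and 8.07 between M3 and M5 (through M4).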

Fig. 3a Round 1: Calculate the local benefit (valuations), with R_k^i = r_k^i o_k c(i, NN_k^i)

Player  as^i      r_1^i  o_1      NN_1^i  c(i,NN_1^i)  R_1^i  r_2^i  o_2      NN_2^i  c(i,NN_2^i)  R_2^i  Max(R_1^i, R_2^i)
M1      3.4 kB    0.51   1.00 kB  M1      0.00         0.00   0.08   3.56 kB  M6      10.23        2.91   2.91
M2      5.4 kB    0.25   1.00 kB  M1      1.46         0.36   0.62   3.56 kB  M6      8.77         19.35  19.35
M3      3.9 kB    0.50   1.00 kB  M1      3.05         1.53   0.17   3.56 kB  M6      8.48         5.13   5.13
M4      22.3 kB   0.11   1.00 kB  M1      7.38         0.81   0.40   3.56 kB  M6      2.85         4.05   4.05
M5      16.2 kB   0.12   1.00 kB  M1      9.82         1.17   0.09   3.56 kB  M6      3.13         1.00   1.17
M6      2.3 kB    0.01   1.00 kB  M1      10.23        0.10   0.78   3.56 kB  M6      0.00         0.00   0.10

Fig. 3b Round 1: Players submit bid and the winner is determined, with U_k^i = u_k^i o_k c(P_k, i)

Player  Bid b^i(O_k)  u_k^i  o_k      P_k  c(P_k,i)  U_k^i  Adjustment (v^i = b^i − U_k^i)
M1      2.91 (O2)     0.08   3.56 kB  M6   10.23     2.91   0.00
M2      19.35 (O2)    0.08   3.56 kB  M6   8.77      2.49   16.86
M3      5.13 (O2)     0.08   3.56 kB  M6   8.48      2.41   2.72
M4      4.05 (O2)     0.08   3.56 kB  M6   2.85      0.81   3.24
M5      1.17 (O1)     0.15   1.00 kB  M1   9.82      1.47   −0.30
M6      0.10 (O1)     0.15   1.00 kB  M1   10.23     1.53   −1.43

Max(v^1, v^2, v^3, v^4, v^5) = 16.86
Announcement: M2 replicates O2 (M2 gets p^2 = 3.24)

Fig. 3c Round 2: Calculate the local benefit (valuations), with R_k^i = r_k^i o_k c(i, NN_k^i)

Player  as^i      r_1^i  o_1      NN_1^i  c(i,NN_1^i)  R_1^i  r_2^i  o_2      NN_2^i  c(i,NN_2^i)  R_2^i  Max(R_1^i, R_2^i)
M1      3.4 kB    0.51   1.00 kB  M1      0.00         0.00   0.08   3.56 kB  M2      1.46         0.41   0.41
M2      1.84 kB   0.25   1.00 kB  M1      1.46         0.36   0.62   3.56 kB  M2      0.00         0.00   0.36
M3      3.9 kB    0.50   1.00 kB  M1      3.05         1.53   0.17   3.56 kB  M2      4.51         2.72   2.72
M4      22.3 kB   0.11   1.00 kB  M1      7.38         0.81   0.40   3.56 kB  M6      2.85         4.05   4.05
M5      16.2 kB   0.12   1.00 kB  M1      9.82         1.17   0.09   3.56 kB  M6      3.13         1.00   1.17
M6      2.3 kB    0.01   1.00 kB  M1      10.23        0.10   0.78   3.56 kB  M6      0.00         0.00   0.10

Fig. 3d Round 2: Players submit bid and the winner is determined, with U_k^i = u_k^i o_k c(P_k, i)

Player  Bid b^i(O_k)  u_k^i  o_k      P_k  c(P_k,i)               U_k^i  Adjustment (v^i = b^i − U_k^i)
M1      0.41 (O2)     0.08   3.56 kB  M6   10.23 + 8.77 = 19.00   5.41   −5.00
M2      0.36 (O1)     0.15   1.00 kB  M1   1.46                   0.21   0.15
M3      2.72 (O2)     0.08   3.56 kB  M6   8.48 + 8.77 = 17.25    4.91   −2.19
M4      4.05 (O2)     0.08   3.56 kB  M6   2.85 + 8.77 = 11.62    3.30   0.75
M5      1.17 (O1)     0.15   1.00 kB  M1   9.82                   1.47   −0.30
M6      0.10 (O1)     0.15   1.00 kB  M1   10.23                  1.53   −1.43

Max(v^1, v^2, v^3, v^4, v^5) = 0.75
Announcement: M4 replicates O2 (M4 gets p2 = 0.15)

Fig. 3e Round 3: Calculate the local benefit (valuations), with R_k^i = r_k^i o_k c(i, NN_k^i)

Player  as^i      r_1^i  o_1      NN_1^i  c(i,NN_1^i)  R_1^i  r_2^i  o_2      NN_2^i  c(i,NN_2^i)  R_2^i  Max(R_1^i, R_2^i)
M1      3.4 kB    0.51   1.00 kB  M1      0.00         0.00   0.08   3.56 kB  M2      1.46         0.41   0.41
M2      1.84 kB   0.25   1.00 kB  M1      1.46         0.36   0.62   3.56 kB  M2      0.00         0.00   0.36
M3      3.9 kB    0.50   1.00 kB  M1      3.05         1.53   0.17   3.56 kB  M2      4.51         2.72   2.72
M4      18.74 kB  0.11   1.00 kB  M1      7.38         0.81   0.40   3.56 kB  M4      0.00         0.00   0.81
M5      16.2 kB   0.12   1.00 kB  M1      9.82         1.17   0.09   3.56 kB  M4      2.44         0.78   1.17
M6      2.3 kB    0.01   1.00 kB  M1      10.23        0.10   0.78   3.56 kB  M6      0.00         0.00   0.10

Fig. 3f Round 3: Agents submit bid and the winner is determined, with U_k^i = u_k^i o_k c(P_k, i)

Player  Bid b^i(O_k)  u_k^i  o_k      P_k  c(P_k,i)                      U_k^i  Adjustment (v^i = b^i − U_k^i)
M1      0.41 (O2)     0.08   3.56 kB  M6   10.23 + 8.77 + 2.85 = 21.85   6.22   −5.81
M2      0.36 (O1)     0.15   1.00 kB  M1   1.46                          0.21   0.15
M3      2.72 (O2)     0.08   3.56 kB  M6   8.48 + 8.77 + 2.85 = 20.10    5.72   −3.00
M4      0.81 (O1)     0.15   1.00 kB  M1   7.38                          1.10   −0.29
M5      1.17 (O1)     0.15   1.00 kB  M1   9.82                          1.47   −0.30
M6      0.10 (O1)     0.15   1.00 kB  M1   10.23                         1.53   −1.43

Max(v^1, v^2, v^3, v^4, v^5) = 0.15
Announcement: M2 replicates O1 (M4 gets p2 = 0.00)

Fig. 3g Round 4: Calculate the local benefit (valuations), with R_k^i = r_k^i o_k c(i, NN_k^i)

Player  as^i      r_1^i  o_1      NN_1^i  c(i,NN_1^i)  R_1^i  r_2^i  o_2      NN_2^i  c(i,NN_2^i)  R_2^i  Max(R_1^i, R_2^i)
M1      3.4 kB    0.51   1.00 kB  M1      0.00         0.00   0.08   3.56 kB  M2      1.46         0.41   0.41
M2      0.84 kB   0.25   1.00 kB  M2      0.00         0.00   0.62   3.56 kB  M2      0.00         0.00   0.00
M3      3.9 kB    0.50   1.00 kB  M1      3.05         1.53   0.17   3.56 kB  M2      4.51         2.72   2.72
M4      18.74 kB  0.11   1.00 kB  M2      5.92         0.65   0.40   3.56 kB  M4      0.00         0.00   0.65
M5      16.2 kB   0.12   1.00 kB  M1      8.36         1.00   0.09   3.56 kB  M4      2.44         0.78   1.00
M6      2.3 kB    0.01   1.00 kB  M1      8.77         0.08   0.78   3.56 kB  M6      0.00         0.00   0.08

Fig. 3h Round 4: Agents submit bid and the winner is determined, with U_k^i = u_k^i o_k c(P_k, i)

Player  Bid b^i(O_k)  u_k^i  o_k      P_k  c(P_k,i)                      U_k^i  Adjustment (v^i = b^i − U_k^i)
M1      0.41 (O2)     0.08   3.56 kB  M6   10.23 + 8.77 + 2.85 = 21.85   6.22   −5.81
M2      N/A
M3      2.72 (O2)     0.08   3.56 kB  M6   8.48 + 8.77 + 2.85 = 20.10    5.72   −3.00
M4      0.65 (O1)     0.15   1.00 kB  M1   7.38 + 1.46 = 8.84            1.32   −0.67
M5      1.00 (O1)     0.15   1.00 kB  M1   9.82 + 1.46 = 11.28           1.69   −0.69
M6      0.08 (O1)     0.15   1.00 kB  M1   10.23 + 1.46 = 11.69          1.75   −1.67

Max(v^1, v^2, v^3, v^4, v^5) = −0.67
Announcement: No replicas; Mosaic-Net ended


Game Theoretical Mechanism for Selection and Allocation of Replicas in Ad hoc Networks (Mosaic-Net)
Initialize:
01 LM, L^i, T_k^i, ς, MT
02 WHILE LM ≠ NULL DO
03   OMAX = NULL; MT = NULL; P^i = NULL;
04   PARFOR each M^i ∈ LM DO
05     FOR each O_k ∈ L^i DO
06       B_k^i = compute(t^i);   /*compute the valuation corresponding to the desired object*/
07     ENDFOR
08     b^i = argmax_k(B_k^i);
09     SEND t^i to ς;
10   ENDPARFOR
11   RECEIVE at ς b^i and immediately adjust b^i to v^i and save in MT;
12   OMAX = argmax_k(MT);   /*Choose the globally dominant valuation*/
13   p^i = argmax_k(delete(argmax_k(MT)));   /*Calculate the payment*/
14   BROADCAST OMAX;
15   SEND P^i to M^i;   /*Send payments to the agent who allocates the object OMAX*/
16   Replicate O_OMAX;
17   as^i = as^i − o_k;   /*Update capacity*/
18   L^i = L^i − O_k;   /*Update the list*/
19   IF L^i = NULL THEN SEND info to ς to update LM = LM − M^i;   /*Update mechanism players*/
20   PARFOR each M^i ∈ LM DO
21     Update NN_OMAX^i   /*Update the nearest neighbor list*/
22   ENDPARFOR   /*Get ready for the next round*/
23 ENDWHILE
Fig. 4 Pseudocode for Mosaic-Net

temporarily allocated, the memory space remains free. When data access to the data item whose replica should be allocated succeeds, the memory space is filled with the valid replica. The example allows us to deduce a pseudocode for Mosaic-Net (Fig. 4).

Description of Algorithm: At a relocation period, each mobile host broadcasts its host identifier. Because the system is effectively frozen during a relocation period, system dynamics do not change during the run of Mosaic-Net. The reason for this design feature is twofold: (a) because we consider static replication procedures, the input must not change during the run of the replica placement procedure, and (b) the exact same design feature was assumed by the comparative techniques (randomized, extended static access frequency, extended dynamic access frequency and neighborhood, and extended dynamic connectivity grouping) that have become a standard for ad hoc data replication procedures. For a thorough understanding of the various classifications of data replication procedures and the differences between static and dynamic data replication procedures, the reader is encouraged to review [26]. After all of the mobile hosts complete their broadcasts, every host knows its connected mobile hosts. Note that Mosaic-Net is invoked in every connected ad hoc network. That is to say, there can be a number of concurrent Mosaic-Net executions at a relocation period. Below we describe the procedure of Mosaic-Net for one such execution.


Within each connected network, every mobile host maintains a list $L^i$. This list contains all of the data objects that can be replicated by player $i$ onto the mobile host $M^i$. We can obtain this list by examining the two constraints of the ADRP. List $L^i$ contains all of the data objects whose size is less than the total available storage $as^i$. Moreover, if the mobile host $M^i$ is the primary host of some object $k'$, then $k'$ should not be in $L^i$. The mechanism maintains a list LM containing all mobile hosts within the connected network that can replicate data objects, i.e., $M^i \in$ LM if $L^i \ne$ NULL. Because Mosaic-Net is invoked during the relocation period, in which the system is frozen, the integrity of LM is ensured. Mosaic-Net works iteratively. In each step the mechanism (ς) asks all of the players to send their preferences (first PARFOR loop). The PARFOR loop is equivalent to the parallel FOR loop in FORTRAN, or in any other programming language that can execute a parallel FOR loop. Such a loop ensures that the code within the PARFOR loop is executed concurrently by each of the players. Each player $i$ calculates the true data of every data object in list $L^i$. Each player then reports the dominant true data (Line 09) to the mechanism. The mechanism receives all the corresponding entries and injects them with the update cost in Line 11. From all of the entries, the mechanism chooses the globally dominant true data (Line 12). The true data is broadcast to all the players, so that they can update their nearest neighbor table $NN_k^i$, as shown in Line 21 ($NN_{OMAX}^i$). The object is replicated and the payment, which is the second highest adjusted true data $v^i$, is made to the player. The mechanism progresses until there are no more players interested in acquiring any data for replication (Line 19).
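The iterative procedure can be traced on the numerical example of Sect. 5.3. The following is a minimal sequential sketch under stated assumptions, not the distributed implementation: the update cost sums the channel costs from the primary host to the bidder and to every current replica holder (as read off the $c(P_k, i)$ column of Figs. 3d, 3f and 3h), the payment is floored at zero, the loop stops when no adjusted value is positive, and the storage constraint is enforced strictly (so M1, with 3.4 kB, never bids for the 3.56 kB object O2, although Fig. 3b lists such a bid). All identifiers are hypothetical.

```python
# Least-cost channel costs between M1..M6 (rows/columns 0..5), copied from Table 6.
COST = [
    [0.00, 1.46, 3.05, 7.38, 9.82, 10.23],
    [1.46, 0.00, 4.51, 5.92, 8.36, 8.77],
    [3.05, 4.51, 0.00, 5.63, 8.07, 8.48],
    [7.38, 5.92, 5.63, 0.00, 2.44, 2.85],
    [9.82, 8.36, 8.07, 2.44, 0.00, 3.13],
    [10.23, 8.77, 8.48, 2.85, 3.13, 0.00],
]
READS = {"O1": [0.51, 0.25, 0.50, 0.11, 0.12, 0.01],   # Table 2
         "O2": [0.08, 0.62, 0.17, 0.40, 0.09, 0.78]}
UPDATES = {"O1": 0.15, "O2": 0.08}                     # Table 3
SIZE = {"O1": 1.00, "O2": 3.56}                        # Table 5 (kB)
PRIMARY = {"O1": 0, "O2": 5}                           # O1 at M1, O2 at M6
STORAGE = [3.4, 5.4, 3.9, 22.3, 16.2, 2.3]             # Table 4 (kB)


def mosaic_net(cost, reads, updates, size, primary, storage):
    """Return the list of (host, object, payment) allocations, one per round."""
    free = list(storage)
    holders = {k: {host} for k, host in primary.items()}
    rounds = []
    while True:
        table = []  # adjusted bids (v, host index, object), the MT of Fig. 4
        for i in range(len(free)):
            # L^i: objects not yet held by host i that fit in its free storage.
            wanted = [k for k in size if i not in holders[k] and size[k] <= free[i]]
            if not wanted:
                continue
            # R_k^i = r_k^i * o_k * c(i, NN_k^i); the player bids its best object.
            savings = {k: reads[k][i] * size[k] * min(cost[i][j] for j in holders[k])
                       for k in wanted}
            k = max(savings, key=savings.get)
            p = primary[k]
            # Assumed update fan-out: primary to the bidder plus primary to each holder.
            fan_out = cost[p][i] + sum(cost[p][j] for j in holders[k])
            table.append((savings[k] - updates[k] * size[k] * fan_out, i, k))
        table.sort(reverse=True)
        if not table or table[0][0] <= 0:
            return rounds  # no positive benefit left
        _, winner, obj = table[0]
        # Second highest adjusted value, floored at zero (assumption).
        payment = max(0.0, table[1][0]) if len(table) > 1 else 0.0
        holders[obj].add(winner)
        free[winner] -= size[obj]
        rounds.append((f"M{winner + 1}", obj, round(payment, 2)))


for allocation in mosaic_net(COST, READS, UPDATES, SIZE, PRIMARY, STORAGE):
    print(allocation)
```

Under these assumptions the sketch allocates O2 to M2, then O2 to M4, then O1 to M2, and stops in the fourth round, which is the sequence announced in Figs. 3b, 3d, 3f and 3h. The payments agree with the figures up to rounding, because the figures round the intermediate values before subtracting.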
We would also like to distinguish between the code that is executed by the mechanism and the code executed by the individual players, to remove any ambiguity that may arise from the combined representation of their code segments. The overall control of Mosaic-Net is maintained by the mechanism; therefore, the WHILE loop in Line 02 is executed by the mechanism. Twice during the execution of the WHILE loop, control is shifted to the players, which execute the localized code encapsulated by the two PARFOR loops in Lines 04 and 20, respectively. The above discussion allows us to deduce the following result about the mechanism.

Theorem 1 Mosaic-Net requires $O(MN^2)$ time.

Proof The worst case scenario is when each site has sufficient capacity to store all objects. In that case, the WHILE loop (Line 02) performs $MN$ iterations. The time complexity of each iteration is governed by the two PARFOR loops (Lines 04 and 20). The first loop uses at most $N$ iterations, while the second loop performs the update in constant time. Hence, we conclude that the worst case running time of the mechanism is $O(MN^2)$.

Theorem 2 In the worst case Mosaic-Net uses $O(M^3 N)$ messages.

Proof First we calculate the number of messages in a single iteration. First, each agent sends its true data to the mechanism, which constitutes $M$ messages. Second, the mechanism broadcasts information about the object being allocated, which constitutes $M$ messages. Third, the mechanism sends a single message about payment to the agent to whom the replica was assigned, which we can ignore since it has little impact on the total number of messages. The total number of messages required in a single iteration is of the order of $M^2$. From Theorem 1, we can conclude that in the worst case the mechanism requires $O(M^3 N)$ messages.

It is evident from Theorem 2 that the mechanism requires a large amount of communication. This might be reduced by using some sophisticated network protocols. In future-generation distributed computing systems, the participating agents might actually be mobile, i.e., they can travel from node to node in the network. In such a scenario the agents can converge to a specific site and execute the mechanism. In that case the number of messages will be reduced drastically because the procedure will be invoked locally. However, to stay true to the distributed implementation of the proposed Mosaic-Net procedure, we utilize GLADE, the distributed annex of the Ada programming language. This annex allows us to implement the procedure with message passing. Therefore, the experimentally observed execution time of the proposed Mosaic-Net also encapsulates the time to propagate messages. In the implementation, we ensure that the message propagation time is realistic by incorporating the propagation delay and transmission time on the wireless channel. Other possible implementations and improvements are left as future research endeavors.

5.4 Supplementary theoretical results on the optimality of Mosaic-Net

We now proceed to prove that the derived Mosaic-Net indeed uses dominant strategies towards implementing truthfulness and voluntary participation. (These were the two objectives stated at the end of Sect. 5.1.) Assume that the mechanism ς$(x(b), p(b))$ is truthful and that each payment $p^i(b^{-i}, b^i)$ and allocation $x^i(b^{-i}, b^i)$ is twice differentiable with respect to $b^i$, for all the values of $b^{-i}$.
We fix some player $i$ and express the payment $p^i$, the allocation $x^i$, and the profit as functions of just player $i$'s bid $b^i$. Because player $i$'s profit is always maximized by bidding truthfully (Lemma 1), the first derivative of the profit is zero and the second derivative is non-positive at $t^i$. Because this holds no matter what the value of $t^i$ is, we can integrate to obtain an expression for $p^i$. We state:

$$p^i(b^i) = p^i(0) + b^i x^i(b^i) - \int_0^{b^i} x^i(u)\,du.$$

This is now the basis of our extended theoretical results.

Definition 2 [2] With the other players' bids $b^{-i}$ fixed, consider $x^i(b^{-i}, b^i)$ as a single variable function of $b^i$. We call this the allocation curve or the allocation profile of player $i$. We say the output function $x$ is decreasing if each of the associated allocation curves is decreasing, i.e., $x^i(b^{-i}, b^i)$ is a decreasing function of $b^i$, for all $i$ and $b^{-i}$.

Based on the above definition, we can state the following theorem.

Theorem 3 A mechanism is truthful if its output function $x(b)$ is decreasing.

Proof We prove this for Mosaic-Net; the proof is similar to that presented in [2]. For simplicity we fix all bids $b^{-i}$ and focus on $x^i(b^{-i}, b^i)$ as a single variable function of $b^i$, i.e., the allocation $x^i$ changes only if $b^i$ is altered. We now consider two bids $b^i$ and $b^{i\prime}$ such that $b^{i\prime} > b^i$. In terms of the true data $t^i$, this conforms to $R_k^{i\prime} > R_k^i$. Let $x^i$ and $x^{i\prime}$ be the allocations made to player $i$ when it bids $b^i$ and $b^{i\prime}$, respectively. For a given allocation, the aggregate savings in NTC due to local reads can be represented as $C^i = \sum_{k \in x^i} R_k^i$. The proof of the theorem reduces to proving that $x^{i\prime} < x^i$, i.e., the allocation computed by the algorithmic output is decreasing in $b^i$. The proof is by contradiction. Assume that $x^{i\prime} \ge x^i$. This implies that $(C^i - R_k^i) > (C^i - R_k^{i\prime}) \ge (C^i - R_k^i)$. This means that there must be a player other than $i$ (which we denote as $-i$) who has a bid that supersedes $b^{i\prime}$. But that is not possible, as we began with the assumption that all other bids are fixed, so there can be no other agent $-i$. If $i = -i$, then that is also not possible because we assumed that $b^{i\prime} > b^i$.

We now extend the result obtained in Theorem 3 and state:

Theorem 4 A decreasing output function admits a truthful payment scheme satisfying voluntary participation if and only if $\int_0^{\infty} x^i(b^{-i}, u)\,du < \infty$ for all $i$, $b^{-i}$. In this case we can take the payments to be:

$$p^i(b^{-i}, b^i) = b^i x^i(b^{-i}, b^i) + \int_{b^i}^{\infty} x^i(b^{-i}, u)\,du.$$

Proof The first term $b^i x^i(b^{-i}, b^i)$ compensates the cost incurred by agent $i$ to host the allocation $x^i$. The second term $\int_{b^i}^{\infty} x^i(b^{-i}, u)\,du$ represents the expected profit of agent $i$. If agent $i$ bids its true value $t^i$, then its profit is

$$u^i(t^i, (b^{-i}, t^i)) = t^i x^i(b^{-i}, t^i) + \int_{t^i}^{\infty} x^i(b^{-i}, u)\,du - t^i x^i(b^{-i}, t^i) = \int_{t^i}^{\infty} x^i(b^{-i}, u)\,du.$$

If agent $i$ bids its true value, then the expected profit is greater than if it bids any other value. We explain this as follows. If agent $i$ bids higher ($b^i > t^i$), then the expected profit is

$$u^i(t^i, (b^{-i}, b^i)) = b^i x^i(b^{-i}, b^i) + \int_{b^i}^{\infty} x^i(b^{-i}, u)\,du - t^i x^i(b^{-i}, b^i) = (b^i - t^i)\,x^i(b^{-i}, b^i) + \int_{b^i}^{\infty} x^i(b^{-i}, u)\,du.$$

Because $\int_{b^i}^{\infty} x^i(b^{-i}, u)\,du < \infty$ and $b^i > t^i$, we can express the profit when agent $i$ bids the true value as $\int_{t^i}^{b^i} x^i(b^{-i}, u)\,du + \int_{b^i}^{\infty} x^i(b^{-i}, u)\,du$. Because $x^i$ is decreasing in $b^i$ and $b^i > t^i$, we have the following relation:

$$(b^i - t^i)\,x^i(b^{-i}, b^i) < \int_{t^i}^{b^i} x^i(b^{-i}, u)\,du.$$

From this relation, it can be seen that the profit with overbidding is lower than the profit with bidding the true data. Similar arguments can be used for underbidding.
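The overbidding and underbidding argument can be checked numerically for one concrete allocation curve. The sketch below assumes the hypothetical decreasing curve $x(b) = e^{-b}$, chosen only because its tail integral $\int_b^{\infty} e^{-u}\,du = e^{-b}$ has a closed form; it is not an allocation curve taken from Mosaic-Net. With a payment of $b\,x(b) + \int_b^{\infty} x(u)\,du$ and a cost of $t\,x(b)$, the profit becomes $(b - t + 1)e^{-b}$, which a grid search should find to peak at $b = t$.

```python
import math


def allocation(bid):
    """Hypothetical decreasing allocation curve x(b) = exp(-b)."""
    return math.exp(-bid)


def tail_integral(bid):
    """Closed form of the integral of exp(-u) for u from bid to infinity."""
    return math.exp(-bid)


def profit(true_value, bid):
    """Payment b*x(b) + tail integral, minus the cost t*x(b) of hosting the allocation."""
    payment = bid * allocation(bid) + tail_integral(bid)
    cost = true_value * allocation(bid)
    return payment - cost


def best_bid(true_value, grid):
    """Bid on the grid that maximizes the profit of a player with the given true value."""
    return max(grid, key=lambda bid: profit(true_value, bid))


grid = [k / 100 for k in range(0, 501)]  # candidate bids 0.00, 0.01, ..., 5.00
for true_value in (0.5, 1.0, 2.5):
    print(true_value, best_bid(true_value, grid))
```

For each of the three true values the maximizing bid on the grid is the true value itself, so both overbidding and underbidding lose profit under this assumed curve.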


6 Experimental setup, results and discussion

6.1 Experimental setup

Mobile hosts exist in a flatland of size 1000 × 1000. Each host moves randomly in all directions, and the movement speed is randomly determined from 0 to $d$. The radio communication range of each mobile host has a radius of $R$. The communication cost of a one-hop wireless channel follows the power-attenuation model [35] (details of which were provided in Sect. 5.3). The number of mobile hosts was set to 200, and the number of data objects was set to 2000. The primary data object's original mobile host was mimicked by choosing random locations. The size of data objects was obtained using the highly cited [9] hybrid lognormal-Pareto distribution, where the distribution's body follows the lognormal and the tail follows the Pareto. In all the experiments, the basic storage capacity of mobile hosts ($C\%$) was proportional to the total size of data objects. In order to ensure that the system had mobile servers with sufficiently diverse storage capabilities, the actual storage capacity, $s^i$, of a mobile host was a random value between $(C/2)\%$ and $(3C/2)\%$. For simulation purposes, replicas were periodically allocated based on the relocation period $T$. The read access frequency $r_k^i$ of each mobile host to $O_k$ followed one of three cases [14]:
1. Case 1: $r_k^i = 0.5(1 + 0.01i)$.
2. Case 2: $r_k^i = 0.025i$.
3. Case 3: $r_k^i$ was determined as a positive value based on the normal distribution with mean $0.5(1 + 0.01i)$ and standard deviation $\sigma$.

Case 1 encapsulates a scenario where mobile hosts have the same read access frequency characteristics varying in a small range. Case 2 is similar to case 1 but the range is much larger. Case 3 depicts a scenario where the access characteristics of mobile hosts are scattered. When $\sigma = 0$, case 3 is equal to case 1. As $\sigma$ gets larger, the difference in access characteristics among mobile hosts gets larger.
The update frequency of each mobile host holding the primary copy of $O_k$ was $u_k^i$ and followed Zipf's law [40]. The Zipf distribution with a parameter θ is frequently used to model non-uniform access. It produces access patterns that become increasingly skewed as θ increases, i.e., the probability of accessing any object numbered k is proportional to $(1/k)^{\theta}$. The reason for choosing different frequency models for reads and updates was to diversify our simulation study. No single frequency model is complete and robust [9], and thus, simulation studies require rigorous testing under various frequency models. Table 7 shows the parameters and their values used in the simulation experiments. Each parameter is fixed to a constant value by default, and is varied over a range only in the simulation experiment that studies it. Because the relative effect of changing R and C% is the same as that of changing the number of mobile hosts and the kinds of data objects, respectively [15], we keep both m and n constant. In all simulation experiments, we examine the average data accessibility, the total traffic and the total network transfer cost of each of the methods over 100,000 simulation time units.
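As a brief illustration of the Zipf model described above, the following sketch normalizes the weights $(1/k)^{\theta}$ into probabilities. The function name is hypothetical, and the way these probabilities are turned into update frequencies in the actual simulator is not specified here.

```python
def zipf_probabilities(n: int, theta: float) -> list[float]:
    """Probability of object k (1-based) proportional to (1/k) ** theta."""
    weights = [(1.0 / k) ** theta for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

# Larger theta puts more probability mass on the lowest-numbered objects.
mild = zipf_probabilities(2000, 0.35)
skewed = zipf_probabilities(2000, 0.95)
```

With θ = 0 the distribution is uniform; the range [0.35, 0.95] used in the experiments moves from mildly to strongly skewed access.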

Table 7 Parameter variance intervals for simulations

Parameter    Value [Range]
m            200
n            2000
d            1
R            7 [1, 20]
$o_k$        Lognormal (μ = 9.357, σ = 1.318); Pareto (k = 1.33K, α = 1.1)
C%           $\sum_{k=1}^{n} o_k$
$s^i$        30% = (3C/10)% [(C/2)%, (3C/2)%]
T            256 [1, 8192]
$r_k^i$      σ = 0.01 (using case 3)
$u_k^i$      Zipf's law
θ            0.90 [0.35, 0.95]
κ            4
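To make the object-size row of Table 7 concrete, here is a rough sketch of drawing sizes from a lognormal body and a Pareto tail. It is a simple two-component mixture used as a stand-in, not necessarily the hybrid construction of [9]: the `tail_fraction` parameter is hypothetical, and reading "k = 1.33K" as 1330 is an assumption.

```python
import random

def sample_object_size(rng: random.Random,
                       mu: float = 9.357, sigma: float = 1.318,
                       pareto_k: float = 1330.0, alpha: float = 1.1,
                       tail_fraction: float = 0.07) -> float:
    """Draw one object size: Pareto tail with probability tail_fraction, else lognormal body."""
    if rng.random() < tail_fraction:
        # random.paretovariate returns values >= 1, so scale by the minimum k.
        return pareto_k * rng.paretovariate(alpha)
    return rng.lognormvariate(mu, sigma)

rng = random.Random(42)
sizes = [sample_object_size(rng) for _ in range(2000)]  # n = 2000 objects
total_size = sum(sizes)  # the quantity that C% is expressed against
```

The total size computed at the end corresponds to $\sum_{k=1}^{n} o_k$, against which the storage capacity percentage is expressed.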

6.2 Comparative techniques

For comparison, we selected four types of replica placement techniques. To provide a fair evaluation, the assumptions and system parameters were kept the same in all the approaches. The techniques studied include: (a) randomized, (b) extended static access frequency (ESAF) [15], (c) extended dynamic access frequency and neighborhood (EDAFN) [15], and (d) extended dynamic connectivity grouping (EDCG) [15]. Note that the proposed game theoretical technique is referred to as Mosaic-Net.

6.2.1 Randomized replica placements (RAND)

At a relocation period, each mobile host allocates replicas of data objects at random. At a relocation period, a mobile host might not connect to another mobile host that has an original or a valid replica of a data object that the host should allocate. In this case, the memory space for the replica is temporarily filled with one of the replicas that have been allocated since the previous relocation period but are not currently selected for allocation. The reason for choosing such a technique was to identify an approach that would provide a baseline (or possibly better) replica placement.

6.2.2 Prefetching technique (PT)

The rest of the chosen comparative techniques use a prefetching technique (PT) [16]. PT defines the value of a data object as the product of the probability, $p_k^i$, that mobile host $M^i$ accesses data object $O_k$, and the time, $\tau_k$, measured in seconds (in this paper, simulation time units), that will elapse before that data object is updated again. The PT value is given as:

$$PT = p_k^i \tau_k = p_k^i (T_k - t_k), \tag{5}$$


where $T_k$ denotes the update period of $O_k$ and $t_k$ denotes the time that has passed since $O_k$ was last updated. Using our model, we can easily find PT values for data objects. The key is to find a one-to-one mapping between $u_k^i$ and $\tau_k$. Recall that $u_k^i$ (the update frequency of object $O_k$) tells us the number of updates made in a certain amount of time. If this time is taken to be the relocation time, then the relationship between $u_k^i$ and $\tau_k$ is captured by:

$$\tau_k = \frac{1}{u_k^i} - T \bmod \frac{1}{u_k^i}. \tag{6}$$

Therefore, according to the method described in [15], the PT value is given as:

$$PT = p_k^i \tau_k = p_k^i \left( \frac{1}{u_k^i} - T \bmod \frac{1}{u_k^i} \right). \tag{7}$$
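A small worked sketch of the PT computation may help. The function and argument names are hypothetical; the code simply evaluates the expression for $\tau_k$ and multiplies by the access probability, using Python's floating-point `%` operator for the modulus.

```python
def time_until_next_update(update_frequency: float, relocation_period: float) -> float:
    """tau_k = 1/u - (T mod 1/u), with u the update frequency and T the relocation period."""
    update_period = 1.0 / update_frequency
    return update_period - (relocation_period % update_period)

def pt_value(access_probability: float, update_frequency: float,
             relocation_period: float) -> float:
    """PT = p * tau_k."""
    return access_probability * time_until_next_update(update_frequency, relocation_period)

# Example: an object updated every 4 time units (u = 0.25), relocation period T = 10.
# 10 mod 4 = 2, so 2 time units remain until the next update.
tau = time_until_next_update(0.25, 10.0)
pt = pt_value(0.5, 0.25, 10.0)
```

In this example `tau` is 2.0 and `pt` is 1.0. Note that when T is an exact multiple of the update period, the expression yields a full update period.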

6.2.3 Extended static access frequency method (ESAF)

At a relocation period, each mobile host allocates replicas of data objects in descending order of PT values. At a relocation period, a mobile host might not connect to another mobile host that has an original or a valid replica of a data object that the host should allocate. In this case, the memory space for the replica is temporarily filled with one of the replicas that have been allocated since the previous relocation period but are not currently selected for allocation. This temporarily allocated replica is chosen from among the possible replicas according to the highest PT value. If there is no replica that can be temporarily allocated, the memory space remains free. When data access to the data object whose replica should be allocated succeeds, the memory space is filled with the valid replica.

6.2.4 Extended dynamic access frequency and neighborhood method (EDAFN)

At a relocation period, each mobile host broadcasts its host identifier and information on access frequencies to data objects. After all mobile hosts complete their broadcasts, every host knows its connected mobile hosts from the received host identifiers. Each mobile host determines the preliminary allocation of replicas based on the ESAF method. In each set of mobile hosts that are connected to each other, starting from the mobile host with the lowest suffix (i) of host identifier ($M^i$), the following procedure is repeated in breadth-first search order. When there is duplication of a data object (original/replica) between two neighboring mobile hosts, and if one of them is the original, the host that holds the replica replaces it with another replica. If both of them are replicas, the host whose PT value to the data object is lower replaces the replica with another replica.
When replacing the replica, from among data objects whose replicas are not allocated at either of the two hosts, a different replicated data object is selected whose PT value is the highest. At a relocation period, a mobile host might not connect to another mobile host that has an original or a replica of a data object that the host should allocate. In this case, in the same way as in the ESAF method, the memory space for the replica is temporarily filled with another replica, and is later filled with the valid replica when data access to the data object succeeds.
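The ESAF ordering that EDAFN uses for its preliminary allocation can be sketched as a greedy fill in descending order of PT value. This is an illustrative sketch only: the names are hypothetical, and skipping an object that does not fit while continuing with smaller ones is an assumption, since the text does not say how a host handles an object larger than its remaining space.

```python
def esaf_allocate(pt_values: dict[int, float],
                  sizes: dict[int, float],
                  capacity: float) -> list[int]:
    """Pick objects in descending PT order until the host's storage is used up."""
    chosen = []
    free = capacity
    for obj in sorted(pt_values, key=pt_values.get, reverse=True):
        if sizes[obj] <= free:  # assumption: skip objects that no longer fit
            chosen.append(obj)
            free -= sizes[obj]
    return chosen

# Three candidate objects on a host with 8 units of storage.
allocation = esaf_allocate(
    pt_values={1: 0.9, 2: 0.5, 3: 0.7},
    sizes={1: 4, 2: 3, 3: 5},
    capacity=8,
)
```

Here object 1 is taken first, object 3 no longer fits in the remaining 4 units, and object 2 fills part of the rest, so `allocation` is `[1, 2]`.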


6.2.5 Extended dynamic connectivity-based grouping method (EDCG)

At a relocation period, each mobile host broadcasts its host identifier and information on access frequencies to data objects. After all mobile hosts complete their broadcasts, every host knows its connected mobile hosts. In each set of mobile hosts that are connected to each other, starting from the mobile host with the lowest suffix (i) of host identifier ($M^i$), an algorithm to find bi-connected components is executed. Then, each bi-connected component is put in a group. If a mobile host belongs to more than one bi-connected component, it can belong only to the group in which the corresponding bi-connected component was found earlier when executing the algorithm. In each group, a group access frequency to each data object is calculated as a summation of access frequencies of mobile hosts in the group to the data object. Then, the PT value of the group to each data object is calculated. These calculations are done by the mobile host with the lowest suffix of host identifier in the group. In descending order of PT values in each group, replicas are allocated until the memory space of all mobile hosts in the group becomes full. Here, replicas of data objects that are held as originals by mobile hosts in the group are not allocated. Each replica is allocated at a mobile host whose PT value to the data object is the highest among hosts that have free memory space to create it. After allocating replicas of all data items that have no original in the group, if there is still free memory space at mobile hosts in the group, replicas are allocated in descending order of PT values until the memory space is full. Each replica is allocated at a mobile host whose PT value to the data object is the highest among hosts that have free memory space to create it and do not hold a replica or the original of the data object. If there is no such mobile host, the replica is not allocated.
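The grouping and group access frequency steps described above can be sketched as follows. The sketch assumes the bi-connected components have already been computed elsewhere and are passed in the order in which they were found; the function names and data layout are hypothetical.

```python
def assign_groups(components: list[set[int]]) -> dict[int, int]:
    """Map each host to the index of the first component (in discovery order) containing it."""
    group_of: dict[int, int] = {}
    for index, component in enumerate(components):
        for host in component:
            group_of.setdefault(host, index)  # keep the earlier group if already assigned
    return group_of

def group_access_frequency(group_of: dict[int, int],
                           read_freq: dict[int, dict[int, float]]) -> dict[int, dict[int, float]]:
    """Sum per-host access frequencies to each object within each group."""
    totals: dict[int, dict[int, float]] = {}
    for host, group in group_of.items():
        per_object = totals.setdefault(group, {})
        for obj, freq in read_freq[host].items():
            per_object[obj] = per_object.get(obj, 0.0) + freq
    return totals

# Host 3 lies in both components and stays in the one found first.
groups = assign_groups([{1, 2, 3}, {3, 4}])
totals = group_access_frequency(groups, {
    1: {10: 1.0},
    2: {10: 2.0},
    3: {10: 0.5, 11: 1.0},
    4: {11: 4.0},
})
```

The summed frequencies would then feed the group PT values, which determine the allocation order within each group.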
When a mobile host does not connect to another mobile host that has an original or a replica of a data object that the host should allocate, it behaves in the same way as in the ESAF method.

6.3 Performance metrics

We use three performance metrics defined as follows:

1. Data accessibility: Percentage of successful hits/accesses to data objects.
2. Traffic: The total hop count of data transmission for allocating/relocating replicas.
3. NTC savings: Network transfer cost (in percentage) that is saved under the replication scheme found by the algorithm, compared to when only primary copies exist.

6.4 Results and discussion

6.4.1 Effects of relocation period

To examine the effects of relocation period, we fix m = 200, n = 2000, C = 30%, d = 1, R = 7 and θ = 0.90. The set of Figs. 5a, 5b, 5c and 6a, 6b, 6c show the simulation results. From Fig. 5a, the Mosaic-Net method gives the highest data accessibility, and the EDCG method gives the next. Because the degree of replication in ESAF is extremely


Fig. 5a Relocation period and data accessibility (case 1)

Fig. 5b Relocation period and traffic (case 1)

high, EDAFN, EDCG and Mosaic-Net give a sweeping improvement simply because they selectively replicate data objects. An important observation is that the relocation period, T, measured in seconds (in this paper, simulation time units), has little effect on the methods studied in this paper. Contrary to our theoretical expectations, a short relocation period does not provide high data accessibility. This is because a shorter relocation period implements a new replication scheme more frequently than a longer relocation period does, and accessibility may be hampered whenever a new replication scheme is implemented. In Fig. 5b we observe that the EDCG method gives the highest traffic, and Mosaic-Net gives the next. In general, the traffic caused


Fig. 5c Relocation period and NTC (case 1)

by all the methods is inversely proportional to the relocation period. It is important to see that Mosaic-Net gives less traffic than EDCG: this is because it incorporates the cost of accessing data and chooses the paths that are less costly. Figure 5c depicts the effects of relocation period on the NTC. The Mosaic-Net method gives the highest NTC savings, and EDCG gives the next. However, the difference in the NTC savings between a smaller relocation period and a larger relocation period is minute. Thus, the relocation period should be set to a value that trades off the issues of: (1) (high) data consistency, (2) (high) data accessibility, and (3) (low) traffic. Similar observations were made based on case 2. Figures 6a–6c show the simulation results. These results illustrate the same characteristics as those based on case 1. Readers will notice that the results obtained using case 2 have slightly lower values of data accessibility, traffic and NTC savings. This is because in case 2, access frequencies vary in a wide range, and thus replicas are also made of objects that have very few read accesses. This causes the drop in all three performance metrics. From the graphs (set of Figs. 5a–5c and 6a–6c) we can observe that an ideal value for the relocation period, T, would be 256.

6.4.2 Effects of scattered access

We examine the effect of scattering the read access frequency of mobile hosts. To observe this, the read access frequencies of mobile hosts are determined based on case 3, and the standard deviation, σ, is changed. When σ = 0, access frequencies to data objects are equivalent to case 1. As σ increases, the scatter of read access frequencies increases. As a consequence, the difference in read access characteristics among the mobile hosts also increases. We fix m = 200, n = 2000, T = 256, C = 30%, d = 1, R = 7 and θ = 0.90. Figures 7a, 7b, 7c show the simulation results.


Fig. 6a Relocation period and data accessibility (case 2)

Fig. 6b Relocation period and traffic (case 2)

Figure 7a shows that as the difference of read access characteristics gets larger, the relative difference in data accessibility increases. This is because when the scatter of access characteristics is larger, each mobile host allocates more replicas, and thus, the mobile hosts share a wide range of data objects. This sharing of data objects in turn generates traffic (for accessing data objects) as depicted in Fig. 7b. However, notice that as the scatter of read access characteristics increases, the traffic generated by the Mosaic-Net method decreases. This phenomenon is attributed to the fact that when the scatter of access characteristics is larger, the degree of duplication is small in Mosaic-Net compared to that of EDCG.


Fig. 6c Relocation period and NTC (case 2)

Fig. 7a Scatter of access and data accessibility

With the increase in read access characteristics, all methods showed increase in NTC savings. Mosaic-Net showed (Fig. 7c) the most savings followed by EDCG. Observe that nearly every method showed almost 90% of its total improvement in NTC savings with an initial increase in σ (as little as 0.01). This is due to the fact that σ injects much more diversity than the number of objects available in the system. Thus, with the later increase in σ , not much of NTC savings is observed. For this reason, from here on, we use case 3 for modeling read access frequency with σ = 0.01.


Fig. 7b Scatter of access and traffic

Fig. 7c Scatter of access and NTC

6.4.3 Effects of change in updates

We examine the effects of change in updates. We fix m = 200, n = 2000, T = 256, C = 30%, d = 1, R = 7 and σ = 0.01. The increase in the frequency of updates in the system requires the replicas be placed as close to the primary site as possible (to reduce the update broadcast). This phenomenon is also interrelated with the system storage capacity, as the update ratio sets an upper bound on the possible traffic reduction through replication. Thus, if we consider a system with unlimited storage


Fig. 8a Updates and data accessibility

Fig. 8b Updates and traffic

capacity, the “replicate everywhere anything” policy is strictly inadequate [21]. Figures 8a, 8b, 8c show the simulation results. Figure 8a shows that as the Zipf parameter (θ ) increases, the accessibility of every method decreases. The reason behind this is that when the update frequency increases, the replica contents change with respect to their originals frequently. This in turn shows that when the update frequency is higher, all the comparative methods (RAND, ESAF, EDAFN and EDCG) discussed in this paper do not perform well. This is attributed to the fact that the replicas quickly become invalid, and thus, relocation considering the time remaining until each data object is updated next has


Fig. 8c Update and NTC

rather adverse effects. On the other hand, if the update frequency is lower, relocation considering the time remaining until each data object is updated next is meaningless and it might also result in adverse effects. Figure 8b shows the relationship between traffic generated and the Zipf parameter (θ). The EDCG method produces the highest amount of traffic and the Mosaic-Net method produces the next highest. Notice that as the update frequency gets higher, the traffic soars. This is because each mobile host must frequently refresh the replicas that it holds after the originals have been updated. This fact is also reflected in Fig. 8c, where all the methods gradually lose NTC savings as the Zipf parameter increases. From Figs. 8a–8c we can observe that the ideal value for the Zipf parameter is 0.90. For this reason, we have used the value of θ = 0.90 in all of our experimental evaluations, as it represents high update frequency.

6.4.4 Effects of storage capacity

Next, we observe the effects of system capacity increase. An increase in the storage capacity means that a larger number of data objects can be replicated. Replicating an object that is already extensively replicated is unlikely to result in significant traffic savings, as only a small portion of the servers will be affected overall. Because data objects are not equally read intensive, an increase in the storage capacity has a great impact at the beginning (initial increase in capacity), but has little effect after a certain point, where the most beneficial objects are already replicated. We fix m = 200, n = 2000, T = 256, d = 1, R = 7, θ = 0.90 and σ = 0.01. Figures 9a, 9b, 9c show the simulation results. From Fig. 9a, as the system storage size increases, the data accessibility also gets larger in every method. In most cases, the Mosaic-Net method gives the highest data accessibility, and the EDCG method gives the next. The accessibility of the ESAF method is linearly affected by the memory size.
The accessibility of the EDAFN


Fig. 9a Storage capacity and data accessibility

Fig. 9b Storage capacity and traffic

method is also linearly affected by the memory size first, then it shows almost the same value as that of the EDCG method. From Fig. 9b, as the system storage capacity increases, the traffic caused by the Mosaic-Net, EDCG and EDAFN methods is also larger at first, but it gets smaller from a certain point. When memory is very small, replica relocation does not cause large traffic because the number of replicas created is small. From Fig. 9c we can observe an immediate initial increase (the point after which further replicating objects is inefficient) in NTC savings by all the methods, but afterwards they all show a near constant performance. RAND although performed the


Fig. 9c Storage capacity and NTC

Fig. 10a Radio communication range and data accessibility

worst, but observably gained the most NTC savings (41%) followed by Mosaic-Net with 34%.

6.4.5 Effects of radio communication range

We examine the effects of radio communication range of mobile hosts. We fix m = 200, n = 2000, d = 1, T = 256, σ = 0.01, θ = 0.90. Figures 10a, 10b, 10c show the simulation results.


Fig. 10b Radio communication range and traffic

Fig. 10c Radio communication range and NTC

From Fig. 10a, as the radio communication range gets larger, the data accessibility also gets larger in every method. In most cases, the Mosaic-Net method gives the highest data accessibility. When the communication range is very small, every method gives almost the same data accessibility. This is because the number of mobile hosts connected to each other is small, and thus replica relocation rarely occurs. When the communication range is very large, every method also gives almost the same data accessibility. This is because most mobile hosts are connected to each other, and thus mobile hosts can access original data items in most cases. From Fig. 10b, as the radio communication range gets larger, the traffic caused by the Mosaic-Net, EDCG and EDAFN methods also gets larger at first, but it gets smaller from a certain point. When the radio communication range is very small, the traffic caused by these three methods is small. This is because the number of mobile hosts connected to each other is small, and thus replica relocation does not cause large traffic. When the radio communication range is very large, the EDCG method gives smaller traffic than the EDAFN method. This is because in the EDCG method, the number of mobile hosts in a group is very large and thus replica relocation rarely occurs. Some interesting observations are made when the system is observed in terms of NTC savings relative to the increase in radio communication range (Fig. 10c). At first all the methods give near identical NTC savings, but when R is increased from 9 to 10, Mosaic-Net and EDCG show an increase of almost 40% in NTC savings, after which the savings remain constant. This is due to the fact that nearly all the mobile hosts in the ad hoc network become connected via multi-hop when R = 10, and thus, an immediate increase is observable. However, this increase does not continue at such a magnitude because a further increase in the range connects only a few more mobile hosts. A natural next step at this point is to investigate an ad hoc network with mobile devices that have heterogeneous radio communication ranges. For this purpose, we fix m = 200, n = 2000, d = 1, T = 256, σ = 0.01, θ = 0.90. Figures 11a, 11b, 11c depict the simulation results. We must note that on the horizontal axis a radio communication range, say x, means that each mobile device was assigned a radio communication range from a uniform random distribution over [1, x]. We also must note that for a radio communication range equal to 1, the results of Figs. 10a–10c and Figs. 11a–11c are identical. This is because a uniform random distribution over [1, 1] would always result in 1.
Moreover, we observe that the results of Figs. 11a–11c are in principle similar to those of Figs. 10a–10c. This is because the measurement of data object accessibility in an ad hoc network is complicated by the movement of

Fig. 11a Heterogeneous radio communication range and data accessibility


Fig. 11b Heterogeneous radio communication range and traffic

Fig. 11c Heterogeneous radio communication range and NTC

mobile devices and it is hard to quantify the affects of the heterogeneity of the radio communication ranges and relative movement of mobile devices. Perhaps a sound future work would be to investigate the phenomenon discussed above with changing, pre-observable, or pre-computable mobility models, such as human mobility models and insect mobility models. 6.4.6 Algorithm termination timings Finally, we compare the termination time of the algorithms. We randomly choose 30 problem instances with varying parameters. The entries (Table 8) in bold repre-

17.91

18.65 18.74

15.94 18.25

8.72 9.50 9.41 9.45 8.63 9.46 4.97 6.40 7.59 9.28 10.37 11.33

m = 200, n = 2000 [d = 1, R = 7, T = 1, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 512, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 1024, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 2048, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 4096, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 8192, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 10%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 15%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 20%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 25%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 35%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 40%, σ = 0.01, θ = 0.90]

24.16

27.51

23.68

20.93

14.62

18.56

18.69

18.04

18.46

27.09

10.73 11.76

m = 200, n = 2000 [d = 1, R = 15, T = 256, C = 30%, σ = 0.01, θ = 0.90]

21.25

m = 200, n = 2000 [d = 1, R = 20, T = 256, C = 30%, σ = 0.01, θ = 0.90]

8.52 10.09

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 10, T = 256, C = 30%, σ = 0.01, θ = 0.90]

14.30 16.24

5.59 6.16

m = 200, n = 2000 [d = 1, R = 1, T = 256, C = 30%, σ = 0.01, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 5, T = 256, C = 30%, σ = 0.01, θ = 0.90]

ESAF

RAND

Problem size

Table 8 Running time in seconds

25.91

22.15

19.69

15.64

15.21

14.19

23.01

20.34

20.59

19.20

24.19

16.65

27.50

22.81

19.48

16.33

15.95

14.28

EDAFN

41.67

30.87

28.66

23.59

21.59

19.54

22.13

21.29

21.49

21.67

21.67

21.53

30.46

25.36

25.88

21.16

19.82

19.24

EDCG

12.32

10.89

10.26

8.47

6.87

5.60

9.75

9.42

9.84

9.63

9.53

9.41

12.12

11.14

10.25

8.85

6.64

6.38

Mosaic-Net


23.60

23.28 22.95

15.41 12.53 17.45 20.36 8.53 13.27 15.05 17.29 17.82 20.01

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.04, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.06, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.08, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.10, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.35]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.45]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.65]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.75]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.85]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.01, θ = 0.95]

35.33

30.58

27.15

21.70

34.52

29.61

29.72

15.40 16.89

10.89 12.14

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.00, θ = 0.90]

m = 200, n = 2000 [d = 1, R = 7, T = 256, C = 30%, σ = 0.02, θ = 0.90]

ESAF

RAND

Problem size

Table 8 (Continued)

30.96

26.05

23.54

22.64

21.28

16.22

33.09

31.17

20.96

25.14

23.46

14.98

EDAFN

40.45

32.38

31.52

29.83

29.42

25.13

45.64

40.18

32.21

30.75

24.91

21.32

EDCG

9.06

13.51

15.40

18.23

18.33

20.31

19.92

17.40

13.06

16.36

12.45

11.81

Mosaic-Net


sent the fastest time recorded over the problem instance. It is observable that RAND and Mosaic-Net terminated faster than all the other techniques, followed by ESAF, EDAFN and DCG. We must note that the termination time of Mosaic-Net also includes the message passing communication times. That is to say the termination time of Mosaic-Net is a true representation of real-life implementation of Mosaic-Net on real mobile devices.

7 Conclusions

We summarize the major results of this paper as follows:

1. A mathematical optimization model for the ad hoc network data replication problem (ADRP).
2. The derived model is general, flexible and modifiable to cater for various applications of ad hoc networks.
3. The ADRP in general is NP-complete.
4. A game theoretical technique (Mosaic-Net) to cater for ADRP.
5. Investigated Mosaic-Net in detail by identifying some useful properties and the necessary conditions of optimality.
6. Showed that Mosaic-Net can effectively counteract the various uncertainties associated with ad hoc networks, in particular the network disconnections.
7. Extensive experimental comparisons are made against some well-known methods, such as: randomized, extended static access frequency [15], extended dynamic access frequency and neighborhood [15] and extended dynamic connectivity group [15].

The proposed game theoretical replica allocation mechanism in ad hoc networks (Mosaic-Net) is a protocol for automatic replication and migration of objects in response to demand changes. Mosaic-Net aims to place objects in the proximity of a majority of requests while ensuring that no mobile hosts become overloaded. With Mosaic-Net, players (mobile hosts) compete through bids in a non-cooperative environment to replicate data objects that are beneficial to themselves and the system as a whole. To cater for the possibility of cartel type behavior of the players, Mosaic-Net uses the Vickrey payment protocol [38], which leaves the players with no option other than to bid in a fashion that is beneficial to the system as a whole. We compared Mosaic-Net with some well-known techniques, such as: randomized, extended static access frequency, extended dynamic access frequency and neighborhood and extended dynamic connectivity group. To provide a fair comparison, the assumptions and system parameters were kept the same in all the approaches.
The experimental results revealed that Mosaic-Net outperformed the four powerful techniques in both the execution time and solution quality. In summary, Mosaic-Net exhibited up to 20% better solution quality and considerable savings in the algorithm termination timings. As future work, we would like to extend Mosaic-Net to incorporate the phenomenon of unstable radio links [16]. That is, when mobile hosts are connected by unstable radio links that are likely to be disconnected after a short time, it is inefficient


to allocate different replicas on them because they cannot share the replicas after disconnections [15]. The stability of radio links is estimated from the current locations of mobile hosts, the radio communication ranges, and mobility characteristics such as speed and direction. Even when obstacles to radio waves exist, e.g., buildings and mountains, the stability of radio links can also be estimated from the degree of radio wave intensity. Thus, it would be interesting and extremely important to incorporate the notion of unstable radio links and study their behaviors according to our proposed ADRP.

References

1. Apers P (1988) Data allocation in distributed database systems. ACM Trans Database Syst 13(3):263–304
2. Archer A, Tardos E (2001) Truthful mechanisms for one-parameter agents. In: Proc of 42nd IEEE annual symposium on foundations of computer science, 2001, pp 482–491
3. Arrow K (1950) A difficulty in the concept of social welfare. J Political Econ 58(4):328–346
4. Barbara D, Imielinski T (1994) Sleepers and workaholics: caching strategies in mobile environments. In: Proc of ACM SIGMOD, 1994, pp 1–12
5. Bayati A, Ghodsnia P, Basseda R, Rahgozar M (2006) A novel way of determining the optimal location of a fragment in a DDBS: BGBR. In: Proc of international conference on systems and networks communication (ICSNC), 2006
6. Casey R (1972) Allocation of copies of a file in an information network. In: Proc spring joint computer conf, IFIPS, 1972, pp 617–625
7. Cay J, Tan KL, Ooi BC (1997) On incremental cache coherency schemes in mobile computing environments. In: Proc of IEEE ICDE, 1997, pp 114–123
8. Chu W (1969) Optimal file allocation in a multiple computer system. IEEE Trans Comput C-18(10):885–889
9. Crovella M, Bestavros A (1997) Self-similarity in world wide web traffic: evidence and possible causes. IEEE/ACM Trans Netw 5(6):835–846
10. Eswaran K (1974) Placement of records in a file and file allocation in a computer network. Inf Process Lett 304–307
11. Garey M, Johnson D (1979) Computers and intractability. W.H. Freeman, New York
12. Gerbia HA, Cenan C (2005) Distributed database replication—a game theory? In: Proc of 7th international symposium on symbolic and numeric algorithms for scientific computing, 2005
13. Grassi V (2000) Prefetching policies for energy saving and latency reduction in a wireless broadcast data delivery system. In: Proc of ACM international workshop on modeling, 2000, pp 77–84
14. Hara T (2001) Effective replica allocation in ad hoc networks for improving data accessibility. In: Proc of IEEE INFOCOM, 2001, pp 1568–1576
15. Hara T (2003) Replica allocation in ad hoc networks with data update. Mobile Netw Appl 8:343–354
16. Hara T, Madria SK (2004) Dynamic data replication using aperiodic updates in mobile ad hoc networks. In: Proc of 9th international conference on database systems for advance applications, 2004, pp 869–881
17. Huang Y, Sistla P, Wolfson O (1994) Data replication for mobile computer. In: Proc of ACM SIGMOD, 1994, pp 13–24
18. Jing J, Elmagarmid A, Helal A, Alonso R (1997) Bit-sequences: an adaptive cache invalidation method in mobile client/server environments. Mobile Netw Appl 2(2):115–127
19. Johnson DB (1994) Routing in ad hoc networks of mobile hosts. In: Proc of IEEE workshop on mobile computing systems and applications, 1994, pp 158–163
20. Khan SU, Ahmad I (2004) Heuristic-based replication schemas for fast information retrieval over the Internet. In: Proc of 17th international conference on parallel and distributed computing systems, 2004, pp 278–283
21. Khan SU, Ahmad I (2005) A powerful direct mechanism for optimal WWW content replication. In: Proc of 19th IEEE International parallel and distributed processing symposium, 2005

22. Khan SU, Ahmad I (2005) RAMM: a game theoretical replica allocation and management mechanism. In: 8th international symposium on parallel architectures, algorithms, and networks (I-SPAN), Las Vegas, NV, USA, December 2005, pp 160–165
23. Khan SU, Ahmad I (2006) A pure Nash equilibrium guaranteeing game theoretical replica allocation method for reducing web access time. In: 12th international conference on parallel and distributed systems (ICPADS), Minneapolis, MN, USA, July 2006, pp 169–176
24. Khan SU, Ahmad I (2006) Replicating data objects in large-scale distributed computing systems using extended Vickrey auction. Int J Comput Intell 3(1):14–22
25. Khan SU, Ahmad I (2007) Discriminatory algorithmic mechanism design based WWW content replication. Informatica 31(1):105–119
26. Khan SU, Ahmad I (2009) A pure Nash equilibrium-based game theoretical method for data replication across multiple servers. IEEE Trans Knowl Data Eng 20(3):346–360
27. Loukopoulos T, Ahmad I, Papadias D (2002) An overview of data replication on the Internet. In: Proc of ISPAN, 2002, pp 31–36
28. Mahmoud S, Riordon J (1976) Optimal allocation of resources in distributed information networks. ACM Trans Database Syst 1(1):66–78
29. Nisan N, Ronen A (1999) Algorithmic mechanism design. In: Proc 31st annual ACM symposium on theory of computing (STOC ’99), 1999, pp 129–140
30. Osborne M, Rubinstein A (1994) A course in game theory. MIT Press, Cambridge
31. Pearlman MR, Haas ZJ (1999) Determining the optimal configuration for the zone routing protocol. IEEE J Select Areas Commun 17(8):1395–1414
32. Perkins CE, Bhagwat P (1994) Highly dynamic destination-sequenced distance-vector routing for mobile computers. In: Proc of ACM SIGCOMM, 1994, pp 234–244
33. Perkins CE, Royer EM (1999) Ad hoc on demand distance vector routing. In: Proc of IEEE workshop on mobile computing systems and applications, 1999, pp 90–100
34. Pitoura E, Bhargava B (1995) Maintaining consistency of data in mobile distributed environments. In: Proc of IEEE ICDCS, 1995, pp 404–413
35. Rappaport TS (1996) Wireless communications: principles and practices. Prentice Hall, New York
36. Saurabh S, Parkes D (2004) Hard-to-manipulate VCG-based auctions. Technical Report, Harvard University
37. Shannon CE (1949) The mathematical theory of information. University of Illinois Press, Urbana, 1949
38. Vickrey W (1961) Counterspeculation, auctions and competitive sealed tenders. J Finance 8–37
39. Wu KL, Yu PS, Chen MS (1996) Energy-efficient caching for wireless mobile computing. In: Proc of IEEE ICDE, 1996, pp 336–343
40. Zipf G (1949) Human behavior and the principle of least-effort. Addison-Wesley, Cambridge