A Game-Theoretic Look at Simple Relay Channel

Yalin Evren Sagduyu and Anthony Ephremides
Electrical and Computer Engineering Department and Institute for Systems Research, University of Maryland, College Park, MD 20742, USA
[email protected], [email protected]

Abstract. In this paper, we address the problem of communication over the relay channel from the perspective of a stochastic game and evaluate the fundamental tradeoffs between direct communication and relaying through a detailed look at throughput, delay and energy efficiency. We consider a simple wireless ad-hoc network with a transmitter, a relay and a common destination. The relay node not only assists in delivering packets of the transmitter node but also generates its own packet traffic for the destination. The transmitter and relay nodes have conflicting throughput objectives, since all packet transmissions share a single, slotted, classical collision channel, which allows at most one successful transmission per time slot. The transmitter node may prefer relaying packets over an intermediate node to direct communication from the energy-efficiency point of view, at the expense of increased energy consumption and decreased throughput for the relay node. These potential conflicts between the interests of the transmitter and relay nodes suggest a game-theoretic formulation of communication strategies over the simple relay channel. Instead of using external enforcement, we rely on a reward-based mechanism to stimulate cooperation for relaying purposes. In this context, we evaluate the non-cooperative Nash equilibrium by allowing nodes to select their own transmission strategies to optimize individual performance measures, which involve throughput, delay and energy costs. We also formulate the full cooperation of the transmitter and relay nodes as a team problem of optimizing the total system performance.

1 Introduction

The relay channel is the fundamental building block of wireless ad-hoc networks (together with the multi-access and broadcast channels), and represents the simplest form of multi-hop routing. Although the relay channel has been extensively analyzed in terms of its capacity properties [1], wireless ad-hoc networks have the additional performance objectives of energy efficiency and delay, which introduce tradeoffs with throughput that are not yet clearly understood. If the transmitted signal power is modeled as decaying with distance, relaying packets over intermediate nodes may be desirable (depending on the node locations) for the purpose of preserving energy and reducing interference to other nodes. However, the additional distance traveled by packets in multi-hop communication is expected to increase the packet delay and reduce the throughput, especially if simultaneous transmission and reception are not allowed. Cooperation for relaying also increases the energy consumption and decreases the throughput of the relay nodes. To understand the tradeoffs involving energy, interference and distance limitations, we strip out the complexities of multiple source-destination pairs and consider a simple relay channel, namely a wireless ad-hoc network of two transmitters and a common destination, with relaying possible only over a single relay node chosen as the one closest to the destination.

In this paper, we formulate communication over the relay channel as a stochastic game between a transmitter and a relay node with conflicting interests in transmitting to a common destination, and evaluate the tradeoffs involving throughput, delay and energy properties. Game theory has been applied before to study the information-theoretic multi-access channel [2] as well as random access over a single collision channel with a single receiver [3], [4], [5], [6]. On the other hand, classical multi-hop communication studies inherently assume cooperation of the nodes for relaying purposes. However, recent studies [7], [8], [9] reflect an emerging interest in analyzing stimulation mechanisms for cooperation as well as in applying game-theoretic tools to model selfish node behavior in wireless ad-hoc networks. Compared to centralized (cooperation-based) medium access control, a system with selfish nodes is inherently distributed and therefore more scalable. Selfishness can also improve the system fairness, since nodes cannot achieve individual performance gains by breaking the rules of the channel access scheme.

In this context, we evaluate the fundamental tradeoffs between cooperation and selfishness in terms of the two basic transmission schemes of direct communication and relaying. Our ultimate objective is to apply game-theoretic tools to understand the efficient operational modes of wireless ad-hoc networks, as discussed in [10] from the perspective of information theory. In our game formulation, the transmitter node selects one of three possible actions: transmitting packets directly to the final destination, delivering them first to the closer relay node, or simply waiting. The relay node, on the other hand, chooses either to transmit directly to the destination or to wait if its queue contains a packet, and decides whether to accept or reject a packet of the transmitter node if its queue was empty at the end of the previous slot. The non-cooperative approach allows selfish nodes to select probability distributions over the available actions in order to optimize their own performance measures, whereas cooperation among the nodes can be set up as a team problem of optimizing the total system performance.

The rest of the paper is organized as follows. Section 2 describes the network model, the relaying mechanism, and the rewards and costs for the actions of the nodes. In Section 3, we introduce the general model for stochastic games with partially observable states. Section 4 formulates communication over the relay channel as a stochastic game and specifies the strategies, the state transition matrix and the expected utilities. Section 5 provides numerical results for the equilibrium strategies. Section 6 introduces an adaptive algorithm for the case of imperfect information. In Section 7, we introduce an additional rule giving transmission priority to forwarded packets. This is followed in Section 8 by a theoretical analysis of the cooperation incentives for the special case without packet generation by the relay node. In Section 9, we modify the game model to enforce immediate transmission of new packets. We collect final remarks in Section 10.

2 Network Model

We consider a simple wireless packet network of a transmitter node (node 1), a relay node (node 2), and a common destination node (node 3), as shown in Figure 1. We assume a slotted synchronous system, where nodes 1 and 2 independently generate packets destined to the common destination at each time slot with probabilities a1 and a2, respectively, under the assumption that their individual queues were empty at the end of the previous slot. To simplify the analysis, we assume a maximum queue size of one.

Figure 1. Simple relay channel model of three nodes: the transmitter node (node 1), the relay node (node 2) and the receiver node (node 3). Arrows indicate the possible transmissions among nodes.

As its transmission strategy, node 1 chooses between sending packets directly to node 3 and relying on node 2 to forward them to the final destination, whereas node 2 has the only option of directly transmitting packets to node 3. We consider the synchronous and slotted classical collision channel model for transmissions to both the relay and the common destination node. All transmitted packets have the same length and require one time unit (equal to one time slot) for transmission. Each receiver node (namely node 2 or 3) observes one of three possible channel outputs, namely idle, success or collision, according to whether zero, one, or more than one packet is transmitted to it in the given time slot. We assume that simultaneous transmissions to different receivers destructively interfere with each other. We also do not allow multi-packet transmission or simultaneous transmission and reception by any node. Another form of unsuccessful transmission, which we do not distinguish from a collision with another packet, occurs if node 1 transmits to node 2 while node 2 already has a packet in its queue or does not accept the packet from node 1 for forwarding. After any collision, the nodes attempt to retransmit their backlogged packets in subsequent slots for reliable communication. We assume that all nodes in the system have either infinite or immediately renewable energy supplies. The transmitting nodes 1 and 2 have immediate access to error-free ternary feedback, i.e. whether a collision, a success or an idle was observed during the preceding slot at their intended receivers. We assume that no feedback information is received from unintended receivers. A separate collision-free channel (based on scheduling rather than random access) is dedicated to feedback control packets.
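The slot-level channel behaviour described above can be summarized in a few lines of code. The following sketch (Python; the function and its name are our own illustration, not part of the paper) classifies the outcome seen by a receiver, encoding the assumptions that transmissions to different receivers interfere destructively and that a node cannot transmit and receive simultaneously.

```python
# Sketch of the per-slot channel outcome under the classical collision model.
# "transmissions" maps each transmitting node to its intended receiver, e.g. {1: 3, 2: 3}.
def outcome_at(rx: int, transmissions: dict[int, int]) -> str:
    arriving = [tx for tx, dst in transmissions.items() if dst == rx]
    if not arriving:
        return "idle"
    # A packet survives only if it is the sole transmission anywhere in the network
    # (transmissions to different receivers interfere destructively) and the receiver
    # is not itself transmitting (no simultaneous transmission and reception).
    if len(transmissions) == 1 and rx not in transmissions:
        return "success"
    return "collision"

print(outcome_at(3, {1: 3}))          # success
print(outcome_at(3, {1: 3, 2: 3}))    # collision
print(outcome_at(3, {1: 2, 2: 3}))    # collision (interference from the 1 -> 2 transmission)
print(outcome_at(2, {}))              # idle
```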

2.1 Relaying Rules

Node 2 cannot receive the packet of node 1 whenever it already has a packet in its queue. Otherwise, node 2 decides whether to accept or reject a possible packet from node 1. In the case of acceptance, node 2 does not generate any new packet during that slot, and it undertakes all future rewards and costs regarding the transmission of the accepted packet while paying an immediate reward c to node 1. Charging the relay node this immediate payment for the forwarded packet and offering it instead a future throughput reward (the same as for its self-generated packets) stimulates node 2 to deliver the packet accepted from node 1 to the final destination. We assume that node 1 is immediately informed of whether its transmitted packet is accepted by node 2 or not.

2.2 Rewards and Costs

For each successful transmission to the common destination, node 1 or 2 receives a throughput reward of value 1. Node 1 receives a reward c from node 2 for delivering a packet to node 2, which can only obtain the full throughput credit of value 1 after successfully transmitting that particular packet to the final destination in a subsequent slot. Each packet transmission attempt from node i to node j incurs an energy cost E_{i,j}. The transmission power is chosen as the smallest value that would result in successful packet reception in the absence of interfering transmissions. If the signal power decays with distance r as 1/r^α, where α ≥ 2 is the path loss exponent, we have E_{i,j} = k d_{i,j}^α for some positive constant k, where d_{i,j} denotes the distance between nodes i and j. Each slot of delay for a packet results in an additive cost d for the node holding that particular packet in its own queue (regardless of the originator of the packet). The immediate utility of any node is defined as the reward received by that node in the given time slot minus the cost it incurs.
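To make the cost model concrete, the following sketch (Python; the distance, the constant k and the path loss exponent are illustrative assumptions, not values from the paper) computes E_{i,j} = k d_{i,j}^α and the immediate utility of one slot as reward minus energy and delay costs.

```python
# Minimal sketch of the cost model; the distance, k and alpha are illustrative assumptions.
def energy_cost(distance: float, k: float = 1.0, alpha: float = 2.0) -> float:
    """E_{i,j} = k * d_{i,j}^alpha: smallest power that reaches the receiver."""
    return k * distance ** alpha

def immediate_utility(success: bool, reward: float, energy: float,
                      delayed: bool, d: float = 0.1) -> float:
    """Reward earned in the slot minus the energy and delay costs incurred in the slot."""
    return (reward if success else 0.0) - energy - (d if delayed else 0.0)

# Example: node 1 transmits directly to node 3 over a hypothetical distance of 2 units
# and succeeds; it earns the throughput reward 1 and pays the energy cost E_{1,3}.
E13 = energy_cost(2.0, k=0.1, alpha=2.0)                   # 0.4 with these illustrative values
print(immediate_utility(True, 1.0, E13, delayed=False))    # 0.6
```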

3 General Model of Stochastic Games

First, we introduce the general model for the stochastic games of interest. A two-person game is described by the tuple (K, A1, A2, T, u1, u2), where K is the state space, Ai is the action space of user i, T : K × A1 × A2 × K → [0, 1] is a transition probability function and ui : K × A1 × A2 → R is an immediate utility function for user i. Suppose that the game is in state k^t ∈ K in time slot t and the users play the actions a1^t ∈ A1 and a2^t ∈ A2. Then, each user i receives the immediate utility ui(k^t, a1^t, a2^t) and the game moves in the next time slot t+1 to state k^{t+1} with probability given by the transition function T(k^{t+1} | k^t, a1^t, a2^t). We define a history h^t at time t to be the sequence of previous states and actions, as well as the current state: h^t = (k^1, a1^1, a2^1, ..., k^{t−1}, a1^{t−1}, a2^{t−1}, k^t). Let H^t be the set of all possible histories until time t. A mixed strategy si for user i is a sequence of maps si^t : H^t → P(Ai), each of which assigns to a history in H^t a probability measure over the set of actions of user i. Let s = (s1, s2) denote the joint mixed strategy of both users. A distribution β for the initial state and a strategy s together define a probability measure P_{s,β} which determines the distribution of the stochastic process {k^t, a1^t, a2^t} of states and actions. The (undiscounted) average utility (per time slot) of user i is defined as

    Ui^β(s) = lim_{t_f→∞} (1/t_f) E_{s,β}[ Σ_{t=1}^{t_f} ui(k^t, a1^t, a2^t) ]                      (1)

where the expectation is taken over P_{s,β}. In the non-cooperative game, user i independently selects si to maximize Ui^β(s). For the cooperative team problem, the users jointly select s to maximize Σ_{i=1}^{2} Ui^β(s).
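For intuition about definition (1), the following sketch (Python) estimates the undiscounted average utility of a fixed stationary strategy by simulating the game and averaging the immediate utilities; the two-state game, its transition function and its utilities are invented toy quantities used only for illustration.

```python
import random

# Toy two-state, two-action game used only to illustrate definition (1).
K = [0, 1]
def T(k, a1, a2):            # next-state distribution; illustrative numbers
    p = 0.8 if (a1 ^ a2) else 0.3
    return [1 - p, p] if k == 0 else [p, 1 - p]
def u(i, k, a1, a2):         # immediate utility of user i; illustrative numbers
    return (a1 if i == 1 else a2) * (1.0 if k == 1 else 0.2) - 0.1

def average_utility(i, strategy, t_f=200_000, k=0, seed=1):
    """Monte Carlo estimate of (1) for a stationary strategy and large t_f."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(t_f):
        a1 = rng.random() < strategy[1][k]    # P(user 1 plays action 1 in state k)
        a2 = rng.random() < strategy[2][k]
        total += u(i, k, a1, a2)
        k = rng.choices(K, weights=T(k, a1, a2))[0]
    return total / t_f

s = {1: {0: 0.5, 1: 0.9}, 2: {0: 0.4, 1: 0.2}}    # stationary mixed strategies
print(average_utility(1, s), average_utility(2, s))
```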

3.1 Stochastic Games with Partially Observed States

The given formulation of stochastic games is based on the assumption that all states of the game are perfectly observable by both users. However, communication over the relay channel does not fit in the described general framework, since the nodes have only partial information about the state of the game. We propose specifically to use the contents of the transmitter and relay queues (i.e. whether the queue of each node holds a new, backlogged, forwarded or no packet) as the states of the game and to restrict the information of each node to its own packet queue. Therefore, we need to modify the game model and redefine the expected utilities for a tractable analysis. We consider only stationary (mixed) strategies of the form si,k(Ai) that assign (stationary) probability distributions to the available actions of user (i.e. node) i at state k ∈ K. We follow the Markov game assumption that the decisions of all users are based only on the current state k^t ∈ K, independent of the time t, instead of the entire history h^t. We define si = {si,k(Ai) : k ∈ K} as the collection of random (non-deterministic) stationary strategies of user i for all possible states and denote by s = (s1, s2) the random stationary strategies of both users. For any stationary strategy s, each state kj ∈ K has a stationary distribution πj(s) uniquely determined by the stationary state transition function T(k2 | k1, a1, a2), k1, k2 ∈ K, a1 ∈ A1, a2 ∈ A2. We define T(s) as the state transition matrix, whose (k1, k2)th entry gives the probability of transition from state k1 to state k2 under the stationary strategy s. For the case of partially observable states, we use the expectation over the state distribution, rather than the exact state, to define Ui(s) as an alternative stationary expected utility per time slot for user i:

    Ui(s) = Σ_{kj ∈ K} πj(s) Ei^u(kj, s)                                                            (2)

where Ei^u(kj, s) denotes the immediate utility expected by user i if the joint strategy s is played at state kj. The objective of each user i in the non-cooperative case is to select independently the strategy si that maximizes Ui(s). For the cooperative team problem, the users jointly select s to maximize Σ_{i=1}^{2} Ui(s).
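Numerically, π(s) in (2) can be obtained by solving the linear system π = πT(s) together with the normalization Σ_j πj(s) = 1. A minimal sketch (Python, assuming numpy; the matrix and utility values below are arbitrary illustrations rather than the relay game, which is constructed in Section 4.3):

```python
import numpy as np

def stationary_distribution(T: np.ndarray) -> np.ndarray:
    """Solve pi = pi T with sum(pi) = 1 for an ergodic chain (rows of T sum to 1)."""
    n = T.shape[0]
    A = np.vstack([T.T - np.eye(n), np.ones((1, n))])   # (T^T - I) pi = 0 and 1^T pi = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def stationary_expected_utility(T: np.ndarray, Eu: np.ndarray) -> float:
    """Equation (2): sum over states of pi_j(s) * E_i^u(k_j, s)."""
    return float(stationary_distribution(T) @ Eu)

# Illustrative numbers only (not the relay-game matrix).
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
Eu = np.array([0.1, 0.5])    # expected immediate utility of user i in each state
print(stationary_expected_utility(T, Eu))
```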

4 Communication over Relay Channel as a Stochastic Game

4.1 State Definition

The state of the game is defined as (Q1, Q2), where Q1 and Q2 denote the queue contents of nodes 1 and 2, respectively. The quantity Qi takes the value 0 or 1, according to whether no packet or one (new, forwarded or backlogged) packet is present at the queue of node i. We assume that new (and forwarded) packets are immediately backlogged before they are transmitted for the first time. There are four states of the game: k1 = (0, 0), k2 = (0, 1), k3 = (1, 0) and k4 = (1, 1). We assume that nodes have complete information on their own queues only. Therefore, the strategy of each node i is based only on Qi rather than on the complete state (Q1, Q2). In Sections 7 and 9, we will extend the state space to include new, forwarded and backlogged packets as additional types of queue content, in order to exploit more efficiently the partial information the nodes have about each other's packet queues.

4.2 Action Space and Mixed Stationary Strategies

The actions of node 1 are A1 (transmitting to node 3), A2 (transmitting to node 2) and A3 (waiting), i.e. we have A1 = {A1, A2, A3}. The corresponding mixed stationary strategies are s1,k(A1, A2, A3) = (p1, p2, 1 − p1 − p2) for states k ∈ {(1, 0), (1, 1)}, where p1, p2 and 1 − p1 − p2 denote the probabilities of selecting actions A1, A2 and A3, respectively. For states k ∈ {(0, 0), (0, 1)}, node 1 has the only stationary strategy s1,k(A1, A2, A3) = (0, 0, 1). The random strategy space of node 1 is given by s1 = (p1, p2). Node 2 has the actions B1 (transmitting to node 3) and B2 (waiting) for states (0, 1) and (1, 1), and the actions C1 (accepting the packet transmitted from node 1) and C2 (rejecting the packet transmitted from node 1) for states (0, 0) and (1, 0). Thus, we have A2 = {B1, B2, C1, C2}. The corresponding mixed stationary strategies are s2,k(B1, B2, C1, C2) = (q, 1 − q, 0, 1) for states k ∈ {(0, 1), (1, 1)}, where q and 1 − q denote the probabilities of selecting actions B1 and B2 in states (0, 1) and (1, 1), and s2,k = (0, 1, r, 1 − r) for states k ∈ {(0, 0), (1, 0)}, where r and 1 − r denote the probabilities of selecting actions C1 and C2 in states (0, 0) and (1, 0). The random strategy space of node 2 is uniquely described by s2 = (q, r), and the joint set of random stationary strategies is given by s = (p1, p2, q, r).
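The following sketch (Python; the function names are ours) spells out these mixed stationary strategies: each node's action distribution depends only on its own queue content and on the joint strategy s = (p1, p2, q, r).

```python
# Sketch of the mixed stationary strategies of Section 4.2: given the joint strategy
# s = (p1, p2, q, r) and a node's own queue content, return its action distribution.
def node1_policy(Q1, s):
    p1, p2, _, _ = s
    if Q1 == 1:                        # states (1, 0) and (1, 1)
        return {"A1": p1, "A2": p2, "A3": 1 - p1 - p2}
    return {"A1": 0.0, "A2": 0.0, "A3": 1.0}

def node2_policy(Q2, s):
    _, _, q, r = s
    if Q2 == 1:                        # states (0, 1) and (1, 1): transmit or wait
        return {"B1": q, "B2": 1 - q}
    return {"C1": r, "C2": 1 - r}      # states (0, 0) and (1, 0): accept or reject

s = (0.4, 0.3, 0.5, 0.6)               # an arbitrary illustrative strategy
print(node1_policy(1, s), node2_policy(0, s))
```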

4.3 State Transition Matrix and Expected Utility Functions

For any stationary strategy s, the evolution of (Q1, Q2) follows a two-dimensional ergodic Markov chain, which is irreducible and aperiodic with a finite number of states. Therefore, there exists a unique stationary distribution π(s) = {πj(s), ∀kj ∈ K}, where πj(s) denotes the stationary probability of state kj if the nodes adopt the joint stationary strategy s. The global balance equations are given by π(s) = π(s)T(s), where the ith row Ti(s) of the state transition matrix T(s) can be expressed for s = (p1, p2, q, r) (with states ordered k1, k2, k3, k4) as follows:

T1(s) = [ (1 − a1)(1 − a2), (1 − a1)a2, a1(1 − a2), a1a2 ]
T2(s) = [ (1 − a1)q(1 − a2), (1 − a1)(1 − q + qa2), a1q(1 − a2), a1(1 − q + qa2) ]
T3(s) = [ p1(1 − a1)(1 − a2), (1 − a1)(p1a2 + p2r), (1 − a2)(1 − p1 − p2 + p1a1 + p2(1 − r)), (1 − p1 − p2)a2 + p2ra1 + p2(1 − r)a2 + p1a1a2 ]
T4(s) = [ 0, p1(1 − q)(1 − a1), (1 − p1 − p2)q(1 − a2), (1 − p1 − p2)(1 − q + qa2) + p1q + p2 + p1a1(1 − q) ]

If the nodes follow the strategy s = (p1, p2, q, r), their expected utilities U1 and U2 are given by

    U1(s) = π3(s)[ p1(1 − E1,3) + p2(rc − (1 − r)d − E1,2) − (1 − p1 − p2)d ]
            + π4(s)[ p1(−E1,3 + 1 − q − qd) + p2(−E1,2 − d) − (1 − p1 − p2)d ]                      (3)

    U2(s) = π2(s)[ q(1 − E2,3) − (1 − q)d ] + π3(s)[ −p2rc ]
            + π4(s)[ q(−E2,3 − (p1 + p2)d + 1 − p1 − p2) − (1 − q)d ]                               (4)

Remark 1: U2 seems to decrease monotonically with increasing r, i.e. accepting packets appears to be always harmful to node 2. However, if we increase r, then π2 and π4 will also increase (if a1 > a2) and π3 will decrease, which can compensate the decrease in U2 caused by the term p2rc. Although node 1 seems to benefit from an increase in r, the resulting increase in T32 can shift the distribution from state (1, 0) to (0, 1) and can decrease U1. Thus, we cannot necessarily expect a pure strategy r ∈ {0, 1} for all values of c.

Remark 2: The collision channel with two uplink nodes is equivalent to the case of direct communication (p2 = 0 or r = 0) in the described game model and has been studied before in [3], [4], [5], [6] as symmetric games with common cost and reward functions for all transmitters. The formulation of this paper, however, allows arbitrary system parameters and includes non-symmetric games. On the other hand, the pure strategy of two-hop relaying (i.e. p1 = 0) can be used to model the case E1,3 ≫ E1,2.
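A direct transcription of the transition rows above and of (3)-(4) into code (Python, assuming numpy) may be useful for reproducing the numerical results of Section 5; the strategy used in the example is an arbitrary interior point, not an equilibrium.

```python
import numpy as np

def transition_matrix(s, a1, a2):
    """State transition matrix T(s) over the states (0,0), (0,1), (1,0), (1,1)."""
    p1, p2, q, r = s
    w = 1 - p1 - p2                     # probability that node 1 waits
    return np.array([
        [(1-a1)*(1-a2),    (1-a1)*a2,              a1*(1-a2),                     a1*a2],
        [(1-a1)*q*(1-a2),  (1-a1)*(1-q+q*a2),      a1*q*(1-a2),                   a1*(1-q+q*a2)],
        [p1*(1-a1)*(1-a2), (1-a1)*(p1*a2 + p2*r),  (1-a2)*(w + p1*a1 + p2*(1-r)),
                            w*a2 + p2*r*a1 + p2*(1-r)*a2 + p1*a1*a2],
        [0.0,              p1*(1-q)*(1-a1),        w*q*(1-a2),
                            w*(1-q+q*a2) + p1*q + p2 + p1*a1*(1-q)],
    ])

def expected_utilities(s, a1, a2, E13, E12, E23, c, d):
    """Stationary expected utilities (3) and (4) for the strategy s = (p1, p2, q, r)."""
    p1, p2, q, r = s
    w = 1 - p1 - p2
    T = transition_matrix(s, a1, a2)
    # stationary distribution: solve pi = pi T together with sum(pi) = 1
    A = np.vstack([T.T - np.eye(4), np.ones((1, 4))])
    b = np.zeros(5); b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    U1 = (pi[2]*(p1*(1-E13) + p2*(r*c - (1-r)*d - E12) - w*d)
          + pi[3]*(p1*(-E13 + 1 - q - q*d) + p2*(-E12 - d) - w*d))
    U2 = (pi[1]*(q*(1-E23) - (1-q)*d)
          + pi[2]*(-p2*r*c)
          + pi[3]*(q*(-E23 - (p1+p2)*d + 1 - p1 - p2) - (1-q)*d))
    return U1, U2

# Parameters of Section 5; the strategy below is an arbitrary point, not an equilibrium.
print(expected_utilities((0.4, 0.3, 0.5, 0.6),
                         a1=0.25, a2=0.25, E13=0.4, E12=0.1, E23=0.1, c=0.6, d=0.1))
```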

4.4 Non-Cooperative (Selfish) and Cooperative (Social) Equilibrium

For the non-cooperative case, we evaluate the strategies of selfish users in Nash equilibrium, such that no user can improve its own utility if the strategies of the other users remain the same. In other words, we are interested in finding equilibrium strategies s* = (s1*, s2*) such that, for any user i and any strategy si of that user, we have U1(s*) ≥ U1(s1, s2*) and U2(s*) ≥ U2(s1*, s2). The best response correspondence of node 1 (if node 2 plays the strategy s2) is B1(s2) = arg max_{s1} U1(s), where s = (s1, s2), and the best response correspondence of node 2 (if node 1 plays the strategy s1) is B2(s1) = arg max_{s2} U2(s). As a result, the non-cooperative Nash equilibrium s* = (s1*, s2*) is given by s1* ∈ B1(s2*) and s2* ∈ B2(s1*). The cooperation between the transmitter and relay nodes can be set up as a team problem of maximizing Utotal(s) = U1(s) + U2(s) over s such that π(s) = π(s)T(s), 0 ≤ πj(s) ≤ 1 for kj ∈ K, and Σ_{kj∈K} πj(s) = 1.

Remark 3: Node mobility can affect the transmission strategies through changes in the energy costs. We assume that node mobility is slow compared to the rate at which the strategies evolve toward the equilibrium.

5 Numerical Investigation

Unless we specify otherwise, the system parameters for the numerical results are E1,3 = 0.4, E1,2 = 0.1, E2,3 = 0.1, d = 0.1, c = 0.6, a1 = 0.25, a2 = 0.25. As the energy measure, we fix E1,2 and define the quantity κ = E1,3/E1,2 as the ratio of the energy consumed by direct and by two-hop relay communication of node 1. For the non-cooperative case, we illustrate the equilibrium strategies of node 1 (namely the probabilities p1 and p2) and the cooperation strategy of node 2 (namely the probability r) as functions of κ, c, a1 and a2 in Figures 2, 3, 4 and 5, respectively.

Figure 2. Effects of energy costs on the non-cooperative equilibrium strategies (equilibrium strategies p1, p2, r versus κ = E1,3/E1,2)

Figure 3. Effects of the cooperation mechanism on the non-cooperative equilibrium strategies (equilibrium strategies p1, p2, r versus the cooperation stimulation parameter c)

From Figure 2, we observe that low values of κ support direct communication, whereas node 1 shifts probability from p1 to p2 as κ increases and eventually operates in the pure two-hop relaying mode for high values of κ. As shown in Figure 3, node 1 selects p2 = 0 for c < 0.51, and node 2 selects r = 0 for c > 0.73, in which case node 1 responds by switching to the strategy p2 = 0 as well. For intermediate values of c, we have mixed strategies of direct communication and relaying. Figures 4 and 5 show that r and p2 decrease as a1 and a2 increase. High values of a2 reduce π4 and restrict node 1 to the strategy p2 = 0 (i.e. direct communication) and node 2 to the strategy r = 0.

Figure 4. Effects of the packet arrival probability at node 1 on the non-cooperative equilibrium strategies (equilibrium strategies versus a1)

Figure 5. Effects of the packet arrival probability at node 2 on the non-cooperative equilibrium strategies (equilibrium strategies versus a2)

We evaluate the non-cooperative and social equilibrium utilities as functions of κ, c, a1 and a2, and summarize the results in Table 1. We define U*_{i,nc} and U*_{i,c} as the expected utilities of node i in non-cooperative and social equilibrium, respectively. The gap between Σ_{i=1}^{2} U*_{i,c} and Σ_{i=1}^{2} U*_{i,nc} increases with increasing a1 and a2, since the transmission decisions of selfish nodes become more aggressive. The sum Σ_{i=1}^{2} U*_{i,nc} depends strongly on c, whereas the total expected utility in social equilibrium, Σ_{i=1}^{2} U*_{i,c}, does not depend on c. We also observe that increasing κ reduces the expected utilities in both non-cooperative and social equilibrium.

Table 1. Comparison of Expected Utilities in Non-cooperative and Cooperative Equilibrium

                        a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25
                        κ=1     κ=1      κ=5     κ=5      κ=1     κ=1      κ=5     κ=5
                        c=0.4   c=0.4    c=0.4   c=0.4    c=0.6   c=0.6    c=0.6   c=0.6
U*_{1,nc}               0.1500  0.0823   0.1034  0.0728   0.1500  0.0855   0.1052  0.0745
U*_{2,nc}               0       0.1602   0.0321  0.1668   0       0.1651   0.0402  0.1695
U*_{1,nc} + U*_{2,nc}   0.1500  0.2425   0.1355  0.2383   0.1500  0.2506   0.1454  0.2440
U*_{1,c} + U*_{2,c}     0.1500  0.2637   0.1467  0.2546   0.1500  0.2637   0.1467  0.2546

6 A Distributed Adaptive Algorithm for the Case of Imperfect Information

In the previous sections, we assumed that the nodes have perfect information about the system parameters, so that they can use the feedback information to compute the immediate utilities and state transition probabilities of any action. It is of interest to study algorithms that do not depend on parameters which cannot be completely known by the nodes in some cases. We extend the stochastic adaptive algorithm of [5] (proposed for slotted Aloha systems) to the relay channel and allow the nodes to vary their strategies as functions of time t depending on the feedback information, i.e. s1 = (p1(t), p2(t)), s2 = (q(t), r(t)). We assume that a successful transmission yields higher immediate utility than waiting or a collision, and that node 2 prefers forwarding packets to staying idle, i.e. 1 − E2,3 > c. If node 1 has a packet to transmit at time t,

    p1(t + 1) = min( max( p1(t) + ε(t) ξ1(t), 0 ), 1 )                                              (5)
    p2(t + 1) = min( max( p2(t) + ε(t) ξ2(t), 0 ), 1 )                                              (6)

where ξ1(t) = 1, ξ2(t) = −1/2 or ξ1(t) = −1/2, ξ2(t) = −1/2, if the transmission of node 1 to node 3 at time t is successful or fails, respectively. Similarly, ξ1(t) = −1/2, ξ2(t) = 1 or ξ1(t) = 1/2, ξ2(t) = −1, if the transmission of node 1 to node 2 at time t is successful or fails, respectively. If (5) and (6) yield an invalid distribution with Σ_{i=1}^{2} pi(t + 1) > 1, we need a further mapping such that ξ1(t) and ξ2(t) are chosen as the largest values that satisfy Σ_{i=1}^{2} pi(t + 1) = 1. On the other hand, if node 2 has a packet to transmit at time t, then we have

    q(t + 1) = min( max( q(t) + ε(t) ξ3(t), 0 ), 1 )                                                (7)

where ξ3(t) = 1 or −1, if the transmission of node 2 at time t is successful or not. If node 2 has an empty queue at time t, then we have

    r(t + 1) = min( max( r(t) + ε(t) ξ4(t), 0 ), 1 )                                                (8)

where ξ4(t) = −1 or 1, if there is a forwarding request at time t or not. The idea is to stimulate node 1 to send more (or fewer) packets to node 2 for low (or high) rates of forwarding requests. The proposed algorithm captures the throughput rewards and delay costs but cannot distinguish the energy properties of direct communication from those of relaying. Therefore, the algorithm does not necessarily converge to the non-cooperative equilibrium strategies, since the expected utilities strongly depend on the unknown values of the energy costs. Hence, we expect poor performance if energy costs dominate the decisions of the nodes. We assume that the system parameters are unknown but fixed and that ε(t) satisfies lim_{t→∞} ε(t) = 0 and Σ_{t=1}^{∞} ε(t) = ∞, as suggested by stochastic approximation theory.

For the numerical results, we assume ε(t) = 1/(10t) for t ≥ 1 and consider the system parameters d = 0.1, c = 0.6, a1 = 0.25, a2 = 0.25. We consider two different sets of energy costs: (I) E1,3 = 0.04, E1,2 = E2,3 = 0.01 and (II) E1,3 = 0.4, E1,2 = E2,3 = 0.1. The distributed algorithm is run for 10^5 time slots. We illustrate the temporal evolution of p1(t) and p2(t) in Figures 6 and 7. For low energy costs, the algorithm converges close to the non-cooperative strategies obtained with perfect information. However, a similar result does not hold for high energy costs, which strongly affect the decisions of the users and the expected utilities.
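The update rules (5)-(8) can be sketched as follows (Python; the function names, the feedback arguments and the simple re-mapping onto the simplex are our own choices, whereas the text above picks the largest increments that keep p1 + p2 at 1).

```python
# Sketch of the stochastic-approximation updates (5)-(8); eps(t) = 1/(10t) as in the text.
# The feedback flags (success, forwarding_request) are assumed to come from the
# error-free ternary feedback described in Section 2.

def clip(x):                      # projection onto [0, 1]
    return min(max(x, 0.0), 1.0)

def update_node1(p1, p2, eps, target, success):
    """Adapt (p1, p2) after node 1 transmitted to 'target' (3: direct, 2: relay)."""
    if target == 3:
        xi1, xi2 = (1.0, -0.5) if success else (-0.5, -0.5)
    else:  # target == 2
        xi1, xi2 = (-0.5, 1.0) if success else (0.5, -1.0)
    p1, p2 = clip(p1 + eps * xi1), clip(p2 + eps * xi2)
    if p1 + p2 > 1.0:             # one simple way to re-map onto the simplex
        excess = p1 + p2 - 1.0
        p1, p2 = p1 - excess / 2, p2 - excess / 2
    return p1, p2

def update_node2_transmit(q, eps, success):
    """Adapt q after node 2 transmitted its backlogged packet, rule (7)."""
    return clip(q + eps * (1.0 if success else -1.0))

def update_node2_accept(r, eps, forwarding_request):
    """Adapt r while node 2's queue is empty, rule (8)."""
    return clip(r + eps * (-1.0 if forwarding_request else 1.0))

# One illustrative step at t = 100:
eps = 1.0 / (10 * 100)
print(update_node1(0.4, 0.3, eps, target=2, success=True))
```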

Figure 6. Temporal evolution of p1(t) according to the distributed adaptive algorithm (for the low and high energy cost sets, compared with the equilibrium values of p1)

Figure 7. Temporal evolution of p2(t) according to the distributed adaptive algorithm (for the low and high energy cost sets, compared with the equilibrium values of p2)

7 First Improvement on the Game Model: A New Relaying Rule

It is possible to stimulate cooperation and prevent otherwise unavoidable collisions if the state of the game further specifies whether a packet in the queue of node 2 is a newly generated one or has been forwarded from node 1 in the previous slot. If the packet of node 1 is accepted by node 2, node 1 is informed of this decision (i.e. of the queue content of node 2) and has the opportunity to avoid transmitting to node 2 in the next slot, which would certainly result in a collision. Therefore, we impose the rule that node 2 immediately forwards the packet received from node 1, whereas node 1 waits for the next slot, independently of new packet arrivals.

7.1 Modified State Definition and Mixed Stationary Strategies

We assume that nodes 1 and 2 agree on the new relaying rule as a form of cooperative packet scheduling (i.e. node 1 waits while node 2 transmits the forwarded packet) to stimulate the cooperation of node 2 as well as to avoid otherwise certain packet collisions. The state of the game is defined as (Q1, Q2), where Q1 ∈ {0, 1} and Q2 ∈ {0, f, f̄}. Q2 = f refers to the case in which a packet of node 1 has been accepted into the queue of node 2 in the previous slot, whereas f̄ represents a newly generated or already backlogged packet at the queue of node 2. Thus, the state space is extended to K = {(0, 0), (0, f), (0, f̄), (1, 0), (1, f), (1, f̄)}. The nodes have the same action spaces A1 = {A1, A2, A3} and A2 = {B1, B2, C1, C2} as defined in Section 4.2. The stationary strategies of node 1 are s1,k(A1, A2, A3) = (p1, p2, 1 − p1 − p2) for k ∈ {(1, 0), (1, f̄)}, where p1, p2 and 1 − p1 − p2 denote the probabilities of selecting actions A1, A2 and A3, and s1,k(A1, A2, A3) = (0, 0, 1) for k ∈ {(0, 0), (0, f), (0, f̄), (1, f)}. The random stationary strategy of node 1 is s1 = (p1, p2). Node 2 has the stationary strategies s2,k(B1, B2, C1, C2) = (q, 1 − q, 0, 1) for k ∈ {(0, f̄), (1, f̄)}, where q and 1 − q denote the probabilities of selecting actions B1 and B2, s2,k(B1, B2, C1, C2) = (1, 0, 0, 1) for k ∈ {(0, f), (1, f)}, and s2,k = (0, 1, r, 1 − r) for k ∈ {(0, 0), (1, 0)}, where r and 1 − r denote the probabilities of selecting actions C1 and C2. The random stationary strategy of node 2 is s2 = (q, r).

7.2 Modified State Transition Matrix and Expected Utility Functions

The joint set of random stationary strategies of nodes 1 and 2 is given by s = (p1, p2, q, r). We denote the six possible states as k1 = (0, 0), k2 = (0, f), k3 = (0, f̄), k4 = (1, 0), k5 = (1, f), k6 = (1, f̄). The ith row Ti(s) of the state transition matrix is given by

T1(s) = [ (1 − a1)(1 − a2), 0, (1 − a1)a2, a1(1 − a2), 0, a1a2 ]
T2(s) = [ (1 − a1)(1 − a2), 0, (1 − a1)a2, a1(1 − a2), 0, a1a2 ]
T3(s) = [ (1 − a1)q(1 − a2), 0, (1 − a1)(1 − q + qa2), a1q(1 − a2), 0, a1(1 − q + qa2) ]
T4(s) = [ p1(1 − a1)(1 − a2), (1 − a1)p2r, (1 − a1)p1a2, (1 − a2)(1 − p1 − p2 + p1a1 + p2(1 − r)), p2ra1, (1 − p1 − p2)a2 + p1a1a2 + p2(1 − r)a2 ]
T5(s) = [ 0, 0, 0, 1 − a2, 0, a2 ]
T6(s) = [ 0, 0, p1(1 − a1)(1 − q), (1 − p1 − p2)q(1 − a2), 0, 1 − p1(1 − a1)(1 − q) − (1 − p1 − p2)q(1 − a2) ]

If nodes 1 and 2 follow the strategy s = (p1, p2, q, r), their expected utilities U1 and U2 are given by

    U1(s) = π4(s)[ p1(1 − E1,3) + p2(rc − (1 − r)d − E1,2) − (1 − p1 − p2)d ] + π5(s)[ −d ]
            + π6(s)[ p1(−E1,3 + 1 − q − qd) + p2(−E1,2 − d) − (1 − p1 − p2)d ]                      (9)

    U2(s) = π2(s)[ 1 − E2,3 ] + π3(s)[ q(1 − E2,3) − (1 − q)d ] + π4(s)[ −p2rc ] + π5(s)[ 1 − E2,3 ]
            + π6(s)[ q(−E2,3 − (p1 + p2)d + 1 − p1 − p2) − (1 − q)d ]                               (10)

7.3 Numerical Analysis of the Performance Improvement

In this section, we evaluate the effects of the modified game model on the expected utilities. We consider the system parameters E1,2 = 0.1, E2,3 = 0.1, d = 0.1, a1 = 0.25 and summarize the results in Table 2. The comparison of Tables 1 and 2 indicates that the proposed changes in the game model improve the performance in all cases, which is an expected result, since giving priority to the transmission of forwarded packets supports the cooperation of the relay node and prevents possible packet collisions.

Table 2. Expected Equilibrium Utilities with the First Improvement on the Game Model

                        a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25
                        κ=1     κ=1      κ=5     κ=5      κ=1     κ=1      κ=5     κ=5
                        c=0.4   c=0.4    c=0.4   c=0.4    c=0.6   c=0.6    c=0.6   c=0.6
U*_{1,nc}               0.1500  0.0835   0.1062  0.0741   0.1500  0.0892   0.1073  0.0768
U*_{2,nc}               0       0.1637   0.0379  0.1708   0       0.1685   0.0408  0.1734
U*_{1,nc} + U*_{2,nc}   0.1500  0.2472   0.1441  0.2449   0.1500  0.2577   0.1481  0.2502
U*_{1,c} + U*_{2,c}     0.1500  0.2690   0.1472  0.2621   0.1500  0.2690   0.1472  0.2621

8 A Closer Look at Cooperation Incentives for the Special Case a2 = 0

In this section, we explore the properties of the reward-based cooperation mechanism introduced in Section 2.1. For tractable analytical solutions, we assume a2 = 0 and evaluate the effects of c on the equilibrium strategies and cooperation incentives. This partially reveals the relationship between the system parameters and the decisions of the users, and provides guidance on how to choose strategies for a2 ≠ 0.

8.1 Non-Cooperative Equilibrium Solutions

From (9) and (10), we can express the expected utilities of nodes 1 and 2 for the case a2 = 0 as follows:

    U1(s) = π4(s)[ p1(1 − E1,3) + p2(rc − (1 − r)d − E1,2 − ra1d) − (1 − p1 − p2)d ]                (11)
    U2(s) = π4(s) p2 r (−c + 1 − E2,3)                                                              (12)

where π4(s) = a1 / [ p1(1 − a1) + p2r(1 − a1 + a1^2) + a1 ]. The quantity U2(s) is monotonically increasing in r if 1 − E2,3 > c, and monotonically decreasing in r if 1 − E2,3 < c. Therefore, to maximize U2(s), node 2 selects the strategy r = 1 if 1 − E2,3 > c, the strategy r = 0 if 1 − E2,3 < c, or chooses r arbitrarily if 1 − E2,3 = c. For the case r = 0, i.e. 1 − E2,3 < c, the necessary condition for the pure strategy of direct communication, i.e. p1 = 1, p2 = 0, is a1(1 − E1,3) > −d, whereas the pure strategy of waiting, i.e. p1 = 0, p2 = 0, requires the condition a1(1 − E1,3) < −d to be satisfied. Note that the pure strategy of two-hop relaying, i.e. p1 = 0, p2 = 1, is not feasible for any value of E1,2 > 0. For the case r = 1, i.e. 1 − E2,3 > c, the necessary condition for the pure strategy of direct communication is (1 + a1^2)(1 − E1,3) ≥ (c − E1,2 − a1d). The pure strategy of two-hop relaying is possible only under the reverse inequality. On the other hand, the necessary conditions for the pure strategy of waiting are a1(1 − E1,3) + d ≤ 0 and a1(c − E1,2) + d ≤ 0.

8.2 Cooperative Equilibrium Solutions to the Team Problem

For the cooperative case, we need r = 1 to preserve the energy resources, and the total expected utility is

    Utotal(s) = a1[ p1(1 − E1,3 + d) + p2(1 − E2,3 − E1,2 + d − a1d) − d ]
                / [ p1(1 − a1) + p2(1 − a1 + a1^2) + a1 ]                                           (13)

For the pure strategy of direct communication, we need (1 + a1^2)(1 − E1,3) ≥ (1 − E2,3 − a1d − E1,2), whereas the pure strategy of two-hop relaying requires the reverse inequality to be satisfied. For the pure strategy of waiting, we need a1(1 − E1,3) + d ≤ 0 and a1(1 − E1,2 − E2,3) + d ≤ 0. For the general case a2 ≠ 0, lower values of c are needed to stimulate the relay node to cooperate, and the results of this section can serve as useful bounds for selecting appropriate values of c.
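These feasibility conditions are easy to check numerically. The sketch below (Python) evaluates node 1's utility (11) at the two pure strategies for a2 = 0 and r = 1 (obtained by substituting p = (1, 0) and p = (0, 1) into (11) with π4 as above) and tests the direct-communication condition, using the Section 5 parameters.

```python
# Sketch: node 1's utility (11) at the two pure strategies for a2 = 0 and r = 1,
# obtained by substituting (p1, p2) = (1, 0) and (0, 1) together with pi_4 as above.
def u1_direct(a1, E13):
    return a1 * (1 - E13)

def u1_relay(a1, E12, c, d):
    return a1 * (c - E12 - a1 * d) / (1 + a1 ** 2)

a1, E13, E12, E23, c, d = 0.25, 0.4, 0.1, 0.1, 0.6, 0.1   # Section 5 parameters
# note: r = 1 is node 2's best response here since 1 - E23 > c
direct, relay = u1_direct(a1, E13), u1_relay(a1, E12, c, d)
prefers_direct = (1 + a1 ** 2) * (1 - E13) >= (c - E12 - a1 * d)
print(direct, relay, prefers_direct)    # direct communication is preferred with these values
```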

9 Second Improvement: Immediate Transmission of New Packets

Immediate transmission of new packets (in the slot following their arrival) is a general rule in distributed random medium access protocols, as implemented in slotted Aloha systems [5]. In this section, we distinguish new and backlogged packets in the state definition and enforce immediate transmission of new (generated) and forwarded packets to avoid unnecessary packet delays, which might otherwise cause unavoidable collisions in subsequent slots. Since nodes with new and backlogged packets are subject to statistically different channel conditions, we allow them to select different distributions over their available actions.

9.1 Modified State Definition and Mixed Stationary Strategies

The state of the game is defined as (Q1, Q2) with Q1 ∈ {0, n, b} and Q2 ∈ {0, n, b, f}, where 0, n, b and f correspond to the cases that no packet, a newly generated packet, a backlogged packet or a forwarded packet is present at the corresponding queue. There are only 11 possible states, since the case Q1 = b, Q2 = f is not possible. The nodes have the same action spaces A1 = {A1, A2, A3} and A2 = {B1, B2, C1, C2} as defined in Section 4.2. The deterministic strategies of node 1 are s1,k(A1, A2, A3) = (0, 0, 1) for k ∈ {(0, 0), (0, n), (0, b), (0, f), (n, f)}. The only random decisions of node 1 are between actions A1 and A2 (with probabilities p1,1 and 1 − p1,1, respectively) in the case of a new packet, and between actions A1, A2 and A3 (with probabilities p1,2, p2 and 1 − p1,2 − p2, respectively) in the case of a backlogged packet. In other words, s1,k(A1, A2, A3) = (p1,1, 1 − p1,1, 0) for k ∈ {(n, 0), (n, n), (n, b)} and s1,k(A1, A2, A3) = (p1,2, p2, 1 − p1,2 − p2) for k ∈ {(b, 0), (b, n), (b, b)}. The random stationary strategy of node 1 is given by s1 = (p1,1, p1,2, p2). Node 2 has the deterministic strategies s2,k(B1, B2, C1, C2) = (1, 0, 0, 1) for k ∈ {(0, f), (n, f), (0, n), (n, n), (b, n)}. The only random decisions of node 2 are between B1 and B2 (with probabilities q and 1 − q, respectively) in the case of a backlogged packet, and between C1 and C2 (with probabilities r and 1 − r, respectively) in the case of an empty queue. In other words, we have s2,k(B1, B2, C1, C2) = (q, 1 − q, 0, 1) for k ∈ {(0, b), (n, b), (b, b)} and s2,k(B1, B2, C1, C2) = (0, 1, r, 1 − r) for k ∈ {(0, 0), (n, 0), (b, 0)}. The random stationary strategy of node 2 is given by s2 = (q, r), whereas the joint set of random stationary strategies can be expressed as s = (p1,1, p1,2, p2, q, r). It is straightforward to determine the state transition matrix and the expected utilities for the modified game model. We skip the exact derivation for brevity and continue with numerical results.
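A quick way to confirm the size of the extended state space is to enumerate it; a small sketch (Python):

```python
from itertools import product

# Sketch: enumerate the states of the extended game of Section 9.
# Q1 in {0, n, b}; Q2 in {0, n, b, f}. The pair (b, f) is excluded: when node 2 holds
# a forwarded packet, node 1 handed over its packet in the previous slot, so any packet
# node 1 now holds is new rather than backlogged.
states = [(q1, q2) for q1, q2 in product(["0", "n", "b"], ["0", "n", "b", "f"])
          if not (q1 == "b" and q2 == "f")]
print(len(states))   # 11
print(states)
```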

9.2 Numerical Analysis of the Performance Improvement

We consider the parameters E1,2 = 0.1, E2,3 = 0.1, d = 0.1, a1 = 0.25 and provide numerical results for the expected utilities in Table 3. The comparison of Tables 1, 2 and 3 verifies that the proposed changes in the game model improve the expected utilities in non-cooperative and social equilibrium. This is an expected result, since delaying the transmission of new packets causes congestion in the system, which increases the stationary probability of states with backlogged packets and reduces the expected utilities.

Table 3. Expected Equilibrium Utilities of the Game Model with the First and Second Improvements

                        a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25  a2=0    a2=0.25
                        κ=1     κ=1      κ=5     κ=5      κ=1     κ=1      κ=5     κ=5
                        c=0.4   c=0.4    c=0.4   c=0.4    c=0.6   c=0.6    c=0.6   c=0.6
U*_{1,nc}               0.1500  0.0842   0.1086  0.0761   0.1500  0.0905   0.1091  0.0779
U*_{2,nc}               0       0.1641   0.0388  0.1733   0       0.1804   0.0401  0.1801
U*_{1,nc} + U*_{2,nc}   0.1500  0.2482   0.1474  0.2494   0.1500  0.2709   0.1492  0.2580
U*_{1,c} + U*_{2,c}     0.1500  0.2708   0.1496  0.2687   0.1500  0.2708   0.1496  0.2687

10 Conclusions

In this paper, we addressed the tradeoffs between direct communication and multi-hop relaying from the perspective of stochastic games in a simple network of a transmitter node and a relay node with conflicting interests in transmitting to a common destination. The transmitter node randomizes between transmitting directly to the destination and relying on the relay node to forward its packets, whereas the relay node decides whether to accept packets from the transmitter node or to transmit its own packets to the destination. As random medium access strategies, we allowed the nodes to select distributions over the available actions depending on the state of the game, which we defined through the contents of the two packet queues. We motivated the cooperation of the relay node by offering a future reward for the successful transmission of forwarded packets. We relied on numerical investigation to derive the equilibrium strategies. For the simple case without packet generation by the relay node, we further provided analytical results on the feasibility regions of direct communication and two-hop relaying. We also developed a distributed adaptive algorithm for the case of unknown system parameters and pointed out its deviations from the equilibrium for high energy costs. Finally, we introduced two changes in the game model that give priority to the transmission of forwarded and newly generated packets, and we verified the resulting performance improvement via numerical results. The game model of this paper can be further extended to incorporate arbitrary queue sizes. It is also worth exploring the general case in which the relay node is itself a potential destination for the packets of the transmitter node. This would introduce additional incentives for cooperation of the relay node besides the reward-based stimulation mechanism introduced in this paper.

References

1. T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Information Theory, vol. IT-25, no. 5, pp. 572-584, September 1979.
2. R. J. La and V. Anantharam, "A game-theoretic look at the Gaussian multiaccess channel," DIMACS Workshop on Network Information Theory, Rutgers University, Piscataway, NJ, USA, Mar. 2003.
3. A. B. MacKenzie and S. B. Wicker, "Selfish users in Aloha: A game theoretic approach," in Proc. Fall 2001 IEEE Vehicular Technology Conference, Atlantic City, NJ, USA, Oct. 2001.
4. Y. Jin and G. Kesidis, "Equilibria of a non-cooperative game for heterogeneous users of an ALOHA network," IEEE Communications Letters, vol. 6, no. 7, pp. 282-284, July 2002.
5. E. Altman, R. El-Azouzi and T. Jimenez, "Slotted Aloha as a stochastic game with partial information," in Proc. WiOpt'03, Sophia-Antipolis, France, Mar. 2003.
6. Y. E. Sagduyu and A. Ephremides, "Power control and rate adaptation as stochastic games for random access," in Proc. 42nd IEEE Conference on Decision and Control, Maui, HI, USA, Dec. 2003.
7. A. Urpi, M. Bonuccelli, and S. Giordano, "Modelling cooperation in mobile ad hoc networks: a formal description of selfishness," in Proc. WiOpt'03, Sophia-Antipolis, France, Mar. 2003.
8. V. Srinivasan, P. Nuggehalli, C. F. Chiasserini, and R. R. Rao, "Cooperation in wireless ad hoc networks," in Proc. IEEE Infocom, San Francisco, CA, USA, Apr. 2003.
9. L. Buttyan and J. P. Hubaux, "Stimulating cooperation in self-organizing mobile ad hoc networks," ACM Journal for Mobile Networks (MONET), special issue on Mobile Ad Hoc Networks, vol. 8, no. 5, Oct. 2003.
10. L. Xie and P. R. Kumar, "New results in network information theory: Scaling laws and optimal operational modes for wireless networks," in Proc. 41st IEEE Conference on Decision and Control, Las Vegas, NV, USA, Dec. 2002.