IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, VOL. 59, NO. 5, JUNE 2010
Distributed Optimal Relay Selection in Wireless Cooperative Networks With Finite-State Markov Channels

Yifei Wei, F. Richard Yu, Senior Member, IEEE, and Mei Song
Abstract—Relay selection is crucial for improving the performance of wireless cooperative networks. Most previous works on relay selection use the currently observed channel conditions to make the relay-selection decision for the subsequent frame. However, this memoryless channel assumption is often unrealistic, given the time-varying nature of some mobile environments. In this paper, we consider finite-state Markov channels in the relay-selection problem. We also incorporate adaptive modulation and coding, as well as residual relay energy, into the relay-selection process. The objectives of the proposed scheme are to increase spectral efficiency, mitigate error propagation, and maximize the network lifetime. The formulation of the proposed relay-selection scheme is based on recent advances in stochastic control algorithms. The obtained relay-selection policy has an indexability property that dramatically reduces the computation and implementation complexity. In addition, no centralized control point is needed in the network, and relays can freely join and leave the set of potential relays. Simulation results are presented to show the effectiveness of the proposed scheme.

Index Terms—Finite-state Markov channel, relay selection, restless bandit, wireless cooperative networks.
I. INTRODUCTION
The advantages of space diversity have been widely acknowledged, and multiple-input–multiple-output (MIMO) techniques, one kind of space diversity, have been incorporated into recent wireless standards. Since it is difficult to equip handheld devices with multiple antennas due to size, cost, or hardware limitations [1], cooperative relaying has been proposed to form a virtual antenna array [2]. The basic idea of cooperative relaying in wireless networks is that nodes that overhear the information transmitted by the source node relay it to the destination node instead of treating it as interference. Since the destination node receives multiple independently faded copies of the transmitted information from the source node and the relay nodes, cooperative diversity is achieved.

Recently, cooperative relaying has been considered a promising technique; it has been included in the IEEE 802.16j standard [3] and is expected to be integrated into Third-Generation Partnership Project Long-Term Evolution multihop cellular networks [4]. Relaying can be implemented using an amplify-and-forward, decode-and-forward (DF), or distributed space-time-coded (STC) scheme. In the STC scheme, the varying number of participating antennas and the difficulty of synchronization make implementation challenging. In this paper, we focus on the DF relaying scheme, since it benefits from digital processing and avoids noise amplification.

Relay selection among the available relays is crucial for improving the performance of cooperative relaying [5]–[10]. In [5], the concept of selection diversity in cooperative systems is proposed, and the authors demonstrate that cooperative relay selection outperforms the distributed STC scheme. Ng and Yu [6] present a centralized optimization framework in which the base station solves the joint relay-strategy and resource-allocation problem based on the receivers' channel-estimation feedback and then informs all users of the appropriate power levels and cooperative strategies. In [7], a semidistributed relaying algorithm is proposed to jointly optimize relay selection and power allocation. Distributed relay-selection methods based on local instantaneous channel measurements, without topology information, are considered in [8]–[10]. The involved nodes exchange their current estimates of the channel-state information (CSI) and decide on the "best" relay for the subsequent frame transmission.

Manuscript received April 13, 2009; revised July 8, 2009 and October 13, 2009; accepted December 22, 2009. Date of publication February 2, 2010; date of current version June 16, 2010. This work was supported in part by the National High-Tech Research and Development Plan of China under Grant 2007AA01Z226, by the National Natural Science Foundation of China under Grant 60971083, by the National International Science and Technology Cooperation Project of China under Grant 2008DFA12090, and by the Natural Sciences and Engineering Research Council of Canada. The review of this paper was coordinated by Dr. M. Dohler. Y. Wei and M. Song are with the School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China. F. R. Yu is with the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON K1S 5B6, Canada (e-mail: richard_yu@carleton.ca). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TVT.2010.2041803
0018-9545/$26.00 © 2010 IEEE

The "best" relay is the one with the maximum instantaneous value of a metric, namely, the minimum or the harmonic mean of its source-to-relay (S2R) and relay-to-destination (R2D) channel gains. In [8], each relay calculates this metric and sends it to the source through a feedback channel; the source then uses its source-to-destination (S2D) channel gain and each relay's metric to determine with which relay to cooperate, and finally, the source sends a control signal to the destination and the relays to indicate its decision. In [9], the estimated CSI is gathered at the destination, which chooses the "best" relay based on the metric and then instructs the selected relay to participate in the relaying phase. Bletsas et al. [10] deploy a timer at each node that is inversely proportional to the channel gains, so that the relay with the best channels responds first. Although relay selection in wireless cooperative networks has been studied in several works, most previous works use the currently observed channel conditions to make the relay-selection decision for the subsequent frame transmission. Specifically, it is assumed that channel fading is slow enough that the channel remains in the same state from the current frame to the next, so the estimated CSI of the current frame is simply taken as the predicted CSI for the next frame. However, this memoryless channel assumption is often unrealistic, given the time-varying nature of some mobile environments [11]. Finite-state Markov channel (FSMC) models have been widely accepted in the literature as an effective approach to characterizing the correlation structure of the fading process, including Rayleigh [12], Ricean [13], and Nakagami [14] fading channels. Considering FSMC models may therefore yield substantial performance improvement over schemes that assume memoryless channels.

In addition, most existing relay-selection schemes assume that the source and the relay use the same modulation and coding scheme (MCS), without considering adaptive modulation and coding (AMC). In this paper, the source and the relay can use different MCSs to achieve high spectral efficiency. Moreover, we consider the residual relay energy in relay selection to maximize the network lifetime. We model the relay-selection problem as a stochastic control problem. The distinct features of the proposed scheme include the following. 1) One relay is selected from all available relays according to their states, including the residual relay energy and the CSI of both the S2R and R2D links. A first-order FSMC is used to model the relay state, and the source and relay can use different MCSs. 2) The objectives of the proposed scheme are to mitigate error propagation, increase spectral efficiency, and maximize the network lifetime. 3) The obtained relay-selection policy has an indexability property that dramatically reduces the computation and implementation complexity of the relay-selection policy.
Online implementation is further simplified by constructing a lookup table that maps the relay states to the priority indices. 4) The proposed scheme is fully distributed and scalable. No centralized control point is needed, and relays can freely join and leave the set of relay candidates.

The rest of this paper is organized as follows. In Section II, the system models and the optimization objectives are described. Section III formulates the problem as a stochastic control problem and solves it with a linear programming (LP) relaxation algorithm. Section IV discusses the distributed relay-selection process in wireless cooperative networks. Simulation results are presented in Section V. Finally, we conclude this study in Section VI.

II. SYSTEM MODELS IN THE RELAY-SELECTION PROBLEM

We consider a distributed cooperative wireless network with peer-to-peer relaying, where each node can relay data packets for other nodes [15]. From the viewpoint of multihop routing diversity, the first hop is more important than all subsequent hops [16]. Therefore, in this study, we only consider two-hop relaying, i.e., the S2R and R2D hops, as shown in Fig. 1.

Fig. 1. Cooperative relaying model.

We assume that the network uses request-to-send (RTS)/clear-to-send (CTS) packet exchanges between the sender and the receiver to avoid collisions and to acquire the current SNR estimate. The network employs AMC; thus, the source and the relay may use different MCSs. The destination can combine packets delivered with different modulation schemes using digital combining techniques such as "chase combining" in hybrid automatic repeat request [17]. Zhang et al. [18] investigate how to perform combining and detection when the data received from S and R may use different modulations, and a modulation-adaptive cooperation scheme has been proposed [19] to increase spectral efficiency and system throughput. The proposed relay-selection scheme does not specify the combining algorithm at the destination; any existing combining algorithm is applicable.

One MCS-adaptive technique is receiver-based autorate [20], in which the MCS is decided at the receiver side according to the observed channel condition. When the receiver receives the RTS packet, it determines the MCS for the subsequent frame based on the average received SNR and informs the sender of the MCS via the CTS packet. In Section IV, we explain our distributed relay-selection procedure and the determination of relay candidates in detail. Here, we give the basic idea. Before each time slot, the source broadcasts an RTS; its destination then estimates $\gamma_{S2D}$ and decides the MCS of the S2D link ($MCS_{S2D}$), and the relays estimate $\gamma_{S2R}$. After the destination broadcasts the CTS with the MCS information, both the source and the relays receive it, and the relays can estimate $\gamma_{R2D}$ and decide the MCS of the R2D link ($MCS_{R2D}$). Each MCS has a corresponding spectral efficiency. We assume that there are $K$ classes of MCSs in total, with spectral efficiencies $\eta_0, \eta_1, \ldots, \eta_{K-1}$ and minimum decoding SNRs $\gamma_0^*, \gamma_1^*, \ldots, \gamma_{K-1}^*$, respectively.

We assume that there are $N$ available relay candidates that can decode both the RTS and the CTS exchanged between a source–destination pair, and we denote the set of available relays as $\mathcal{N} = \{1, 2, \ldots, N\}$. The duration of the whole communication between a source–destination pair, referred to as the horizon in this paper, is divided into $T$ time slots, each corresponding to the interval between two consecutive decisions. Let $t \in \mathcal{T} = \{0, 1, \ldots, T-1\}$ denote the time instant at which a decision needs to be made. The action of each relay $n \in \mathcal{N}$ in time slot $t$ is represented by $a_n(t) \in \mathcal{A} = \{0, 1\}$, where $a_n(t) = 0$ means that relay $n$ is passive (not selected) in time slot $t$, and $a_n(t) = 1$ means that it is active (selected).
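The receiver-based MCS decision described above amounts to a threshold lookup: the receiver picks the most efficient MCS whose minimum decoding SNR is met. The sketch below is a minimal illustration only; `select_mcs`, `THRESHOLDS_DB`, and `ETA` are hypothetical names, the threshold values are assumed (the text leaves them to the target BER), and the spectral efficiencies are those of uncoded BPSK, QPSK, and 8PSK, the modulations used in Section V.

```python
from bisect import bisect_right
from typing import Optional, Sequence

# Hypothetical minimum decoding SNRs (dB) for BPSK, QPSK, 8PSK at some target BER.
# Real values depend on the target BER and coding, which the text leaves open.
THRESHOLDS_DB = (6.8, 9.8, 14.8)
# Spectral efficiency of uncoded M-PSK is log2(M) bit/s/Hz.
ETA = (1.0, 2.0, 3.0)


def select_mcs(snr_db: float,
               thresholds_db: Sequence[float] = THRESHOLDS_DB) -> Optional[int]:
    """Return the index k of the most efficient MCS with gamma_k* <= snr_db.

    thresholds_db must be sorted increasingly (gamma_0* < ... < gamma_{K-1}*).
    Returns None if even the most robust MCS cannot be decoded.
    """
    # bisect_right counts thresholds <= snr_db, so an SNR exactly equal to
    # gamma_k* selects MCS k.
    k = bisect_right(thresholds_db, snr_db) - 1
    return k if k >= 0 else None


for snr in (5.0, 9.8, 12.0, 20.0):
    k = select_mcs(snr)
    print(snr, k, None if k is None else ETA[k])
```

The same quantization defines the $K$ R2D channel levels $\Upsilon_0, \ldots, \Upsilon_{K-1}$ in Section II-B. What a receiver should do when the SNR is below $\gamma_0^*$ is not specified in the text; the sketch simply returns `None`.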
WEI et al.: DISTRIBUTED OPTIMAL RELAY SELECTION IN WIRELESS COOPERATIVE NETWORKS WITH FSMCs
A. S2R Channel

In the first phase of cooperative relaying, the source transmits a data packet to its destination using $MCS_{S2D}$, and the selected relay overhears it. The FSMC model is widely used in the literature to characterize the wireless channel. A first-order FSMC is used in [21] to approximate the channel when designing a decision-feedback maximum-likelihood decoder. Tan and Beaulieu [22] address the limitations of first-order FSMC models used as bit-level models and point out that a block-level Markov model can be accurate. A first-order FSMC is used in [23] to determine the packet error rate in the noninterleaved Rayleigh fading channel. Wang and Chang [12] demonstrate that a first-order FSMC is sufficient to model wireless channels and that the improvements offered by second-order or higher order FSMCs are negligible. Therefore, the first-order FSMC model is used in this paper.

In the FSMC, the channel state is characterized by the received SNR, a parameter commonly used to represent channel quality. The range of the average SNR of a received packet is partitioned (quantized) into $L$ levels, and each level is associated with a state of a Markov chain. The channel moves among these states at each time slot according to a set of Markov transition probabilities. That is, the average received SNR at relay $n$ can be modeled as a random variable $\gamma_{S2R_n}$ evolving according to a finite-state Markov chain with state set $\mathcal{C} = \{C_0, C_1, \ldots, C_{L-1}\}$. The S2R channel-state realization of $\gamma_{S2R_n}$ for relay $n$ in time slot $t$ is $\Gamma_n(t)$. Let $\phi_{g_n h_n}(t)$ denote the probability that $\gamma_{S2R_n}$ moves from state $g_n$ to state $h_n$ at time $t$. The $L \times L$ S2R channel-state transition probability matrix of relay $n$ is defined as

$$\Phi_n(t) = \left[\phi_{g_n h_n}(t)\right]_{L \times L} \tag{1}$$

where $\phi_{g_n h_n}(t) = \Pr\left(\Gamma_n(t+1) = h_n \mid \Gamma_n(t) = g_n\right)$ and $g_n, h_n \in \mathcal{C}$. Relay $n$ can estimate its current channel state $\Gamma_n(t)$ after receiving the RTS from the source at time $t$.

B. R2D Channel

During the second phase (i.e., the relaying phase), the selected relay retransmits the packet to the destination using $MCS_{R2D}$, and the destination combines the directly received and relayed packets. Given the target bit error rate (BER), the minimum decoding SNRs $\gamma_0^*, \gamma_1^*, \ldots, \gamma_{K-1}^*$ for the different MCSs can be calculated. Since we are concerned with the MCS of the R2D link, the channel state of the R2D link is divided into $K$ discrete levels: $\Upsilon_0$ if $\gamma_0^* \le \gamma_{R2D} < \gamma_1^*$; $\Upsilon_1$ if $\gamma_1^* \le \gamma_{R2D} < \gamma_2^*$; $\ldots$; and $\Upsilon_{K-1}$ if $\gamma_{R2D} \ge \gamma_{K-1}^*$. The average received SNR of each packet can be modeled as a random variable $\gamma_{R2D_n}$ evolving according to a $K$-state Markov chain with finite state space $\mathcal{D} = \{D_0, D_1, \ldots, D_{K-1}\}$. The R2D channel-state realization of $\gamma_{R2D_n}$ for relay $n$ in time slot $t$ is $\Upsilon_n(t)$. Let $\psi_{u_n v_n}(t)$ denote the probability that $\gamma_{R2D_n}$ moves from state $u_n$ to state $v_n$ at time $t$. The $K \times K$ R2D channel-state transition probability matrix of relay $n$ is defined as

$$\Psi_n(t) = \left[\psi_{u_n v_n}(t)\right]_{K \times K} \tag{2}$$
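A first-order FSMC such as (1) or (2) can be simulated by drawing each next state from the row of the transition matrix indexed by the current state. The sketch below assumes a time-invariant matrix (the text allows $\Phi_n(t)$ to depend on $t$) and uses the two-state S2R matrix from Section V; `simulate_fsmc` and `empirical_transitions` are hypothetical helper names. Counting observed one-step transitions, as the second function does, is one simple way to obtain the matrix entries from historical observations, as suggested in Section II-C.

```python
import random
from typing import List, Sequence


def simulate_fsmc(P: Sequence[Sequence[float]], start: int, steps: int,
                  rng: random.Random) -> List[int]:
    """Sample a state trajectory [s_0, ..., s_steps] of a first-order FSMC.

    P[g][h] = Pr(state at t+1 = h | state at t = g), as in (1) and (2).
    """
    states = list(range(len(P)))
    traj = [start]
    for _ in range(steps):
        # First-order Markov property: the next state depends only on the current one.
        traj.append(rng.choices(states, weights=P[traj[-1]])[0])
    return traj


def empirical_transitions(traj: Sequence[int], n_states: int) -> List[List[float]]:
    """Estimate the transition matrix from counted one-step transitions."""
    counts = [[0] * n_states for _ in range(n_states)]
    for g, h in zip(traj, traj[1:]):
        counts[g][h] += 1
    return [[c / max(sum(row), 1) for c in row] for row in counts]


# Two-state S2R chain (bad = 0, good = 1) with the values used in Section V.
PHI = [[0.7, 0.3], [0.3, 0.7]]
traj = simulate_fsmc(PHI, start=0, steps=100_000, rng=random.Random(1))
print(empirical_transitions(traj, 2))  # close to PHI for a long trajectory
```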
where $\psi_{u_n v_n}(t) = \Pr\left(\Upsilon_n(t+1) = v_n \mid \Upsilon_n(t) = u_n\right)$ and $u_n, v_n \in \mathcal{D}$. Relay $n$ can estimate its current channel state $\Upsilon_n(t)$ after receiving the CTS from the destination at time $t$.

C. Energy Model

Since most wireless mobile devices are powered by batteries with limited energy, battery power should be consumed carefully and efficiently to maximize the network lifetime. Therefore, we should select relays with high residual energy to avoid overusing any single node. Since a device's battery energy decreases due to any application run on it (e.g., a multimedia application or wireless transmission), we do not know its energy state at the next time slot exactly. Therefore, the battery energy is modeled as a random variable $e_n$. For simplicity, the continuous residual energy realization of $e_n$ is divided into discrete levels, denoted by $\mathcal{E} = \{E_0, E_1, \ldots, E_{H-1}\}$, where $H$ is the number of energy levels. Let $E_n(t)$ denote the residual energy realization of $e_n$ for relay $n$ at time $t$. Hu et al. [24] model the transitions between the energy levels of nodes in wireless networks as a Markov chain. We adopt this model and define the energy-state transition probability matrix of relay $n$ taking action $a$ as

$$\Theta_n^a(t) = \left[\theta_{f_n y_n}^a(t)\right]_{H \times H} \tag{3}$$

where $\theta_{f_n y_n}^a(t) = \Pr\left(E_n(t+1) = y_n \mid E_n(t) = f_n, a_n(t) = a\right)$, $f_n, y_n \in \mathcal{E}$, and $a \in \mathcal{A}$. The energy model used in some other papers, such as [25], assumes that the energy is reduced by a fixed amount after every data-transmission action. That model can be considered a special case of the Markov model, where

$$\theta_{f_n y_n}^1(t) = \begin{cases} 1, & \text{if } y_n \text{ is the energy state immediately below } f_n \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$

In real systems, the values in the aforementioned transition probability matrices can be obtained from historical observations of the wireless network.

D. Objectives

We need to find the optimal relay-selection policy, which sets one relay active in time slot $t$ according to the relays' states, each consisting of the S2R channel state $\Gamma_n(t) \in \mathcal{C}$, the R2D channel state $\Upsilon_n(t) \in \mathcal{D}$, and the residual energy state $E_n(t) \in \mathcal{E}$. In this paper, we use the following optimization objectives. 1) Mitigate error propagation. A better channel state $\Gamma_n(t) \in \mathcal{C}$ mitigates error propagation through a lower BER and should be reflected in a higher reward in our formulation. 2) Increase spectral efficiency. A better channel state $\Upsilon_n(t) \in \mathcal{D}$ enables a higher MCS with higher spectral efficiency and should be reflected in a higher reward in our formulation. 3) Maximize network lifetime. Two approaches are adopted to maximize the network lifetime: minimizing the energy consumed to deliver data packets by selecting the relay with a better channel state, and balancing energy usage among the relays by selecting the relay with high residual energy.

Since one of the objectives of the proposed scheme is to mitigate error propagation in the DF relaying scheme, we use the S2R channel state $\Gamma_n(t) \in \mathcal{C}$ in the formulation in the following section. In fact, an S2R link with a lower BER leads to a lower end-to-end (source-to-destination) BER, as shown in existing works that derive closed-form expressions for the error probability of DF relaying (e.g., [26] and the references therein). These closed-form expressions for the end-to-end BER involve $\gamma_{S2R}$, $\gamma_{R2D}$, and $\gamma_{S2D}$. Since the S2D link is the same for all relays, using the S2R BER as the metric is equivalent to using the end-to-end BER for the purpose of comparing relays. Similarly, a higher spectral efficiency of the R2D link leads to a higher end-to-end spectral efficiency, and since the S2D link is the same for all relays, using the R2D spectral efficiency as the metric is equivalent to using the end-to-end spectral efficiency. Using an end-to-end quality-of-service (QoS) metric that involves $\gamma_{S2R}$, $\gamma_{R2D}$, and $\gamma_{S2D}$ in our formulation would increase the computational complexity of the proposed scheme.

III. STOCHASTIC FORMULATION

The classical multiarmed bandit problem, originally described by Robbins in 1952 [27], is an analogy with a traditional slot machine (one-armed bandit) but with more than one lever. When pulled, each lever provides a reward drawn from a distribution associated with that specific lever. It is a simple model of an agent that simultaneously attempts to acquire new knowledge and to optimize its decisions based on existing knowledge; the central question is how to balance maximizing reward based on the knowledge already acquired against trying new actions to acquire further knowledge.
A restless bandit is a special type of multiarmed bandit. It formulates the stochastic selection problem as follows. Consider a collection of $N$ projects; each project $n$ is in a state $i_n(t) \in S_n$ in each time slot $t = 0, 1, 2, \ldots$. According to their states, $m$ of the $N$ projects are selected to work, i.e., set to be active ($a_n(t) = 1$), and the other projects are set to be passive ($a_n(t) = 0$). The reward $R_{i_n(t)}^{a_n(t)}$ is earned when action $a_n(t)$ is taken, and the project states change in a Markovian fashion according to a transition probability matrix (into state $j_n(t+1) \in S_n$ with probability $p_{i_n j_n}^{a_n}$). Rewards are discounted in time by a discount factor $0 < \beta < 1$. Projects are selected over time under a policy $u \in \mathcal{U}$, where $\mathcal{U}$ is the set of all Markovian policies (which select the current action as a function, possibly randomized, of the current state and time). The problem is to determine the optimal policy $u$ that maximizes the total expected discounted reward over the time horizon. The restless bandit problem can be solved using priority indices of the projects, which are calculated by an LP relaxation algorithm. Recent advances in solving the restless bandit problem make it a powerful modeling framework. In this section, we formulate relay selection as a restless bandit problem and discuss its solution.
A. Relay States

The state of an available relay $n \in \{1, 2, \ldots, N\}$ in time slot $t \in \{0, 1, \ldots, T-1\}$ is determined by the realizations $\Gamma_n(t)$, $\Upsilon_n(t)$, and $E_n(t)$ of the random variables $\gamma_{S2R_n}$, $\gamma_{R2D_n}$, and $e_n$. Consequently, the state of an available relay is their combination, i.e.,

$$i_n(t) = \left[\Gamma_n(t), \Upsilon_n(t), E_n(t)\right]. \tag{5}$$
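Because (5) combines three discrete components, a relay's state space has $L \times K \times H$ elements, and an implementation needs a fixed way to flatten $[\Gamma_n, \Upsilon_n, E_n]$ into a single index. The sketch below is one illustrative choice (S2R outermost, energy innermost), picked because it reproduces the order of the 18 state labels s0d0e0, s0d0e1, ... listed in Section V; the helper names are hypothetical.

```python
from itertools import product
from typing import List, Tuple


def enumerate_states(L: int, K: int, H: int) -> List[Tuple[int, int, int]]:
    """All composite relay states (Gamma, Upsilon, E) of (5), S2R outermost."""
    return list(product(range(L), range(K), range(H)))


def state_index(g: int, u: int, f: int, K: int, H: int) -> int:
    """Flat index of state (g, u, f) under the ordering above."""
    return (g * K + u) * H + f


def state_label(g: int, u: int, f: int) -> str:
    """Label in the style of Section V, e.g., s1d2e0."""
    return f"s{g}d{u}e{f}"


L, K, H = 2, 3, 3  # Section V: 2 S2R states, 3 R2D states, 3 energy levels
states = enumerate_states(L, K, H)
print(len(states))                            # 18
print([state_label(*s) for s in states[:4]])  # ['s0d0e0', 's0d0e1', 's0d0e2', 's0d1e0']
```

With this ordering, the flat index of a state equals its position in the enumeration, so a per-state table (for example, the priority-index lookup table of Section IV) can be a plain list.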
In practice, the changes of the S2R channel state, the R2D channel state, and the residual energy state are independent of each other; i.e., the random variables $\gamma_{S2R_n}$, $\gamma_{R2D_n}$, and $e_n$ are independent. Therefore, the relay state evolves in a Markovian fashion over the finite state space $S_n$ of relay $n$, $i_n(t) \in S_n$, with transition probability matrix

$$P_n^a(t) = \left[\phi_{g_n h_n}(t)\, \psi_{u_n v_n}(t)\, \theta_{f_n y_n}^a(t)\right]_{G_n \times G_n} \tag{6}$$

where $i_n = [g_n, u_n, f_n]$ and $j_n = [h_n, v_n, y_n]$, $\phi_{g_n h_n}(t)$, $\psi_{u_n v_n}(t)$, and $\theta_{f_n y_n}^a(t)$ are defined in (1)–(3), respectively, and $G_n = L \times K \times H$. The element of $P_n^a(t)$ is $p_{i_n j_n}^a(t)$, the probability that the state of relay $n$ changes from $i_n$ to $j_n$, where $i_n, j_n \in S_n$.

B. System Reward

In the restless bandit problem, the system reward represents the optimization objectives. Since the objectives of the proposed scheme are to mitigate error propagation, increase spectral efficiency, and maximize network lifetime, we formulate the system reward as a function of the BER of the S2R link, the spectral efficiency of the R2D link, and the energy consumed in delivering the data packets from the source to the destination. For the objective of balancing energy usage among the relays, the residual energy state $E_n(t)$ is also a factor of the reward function. The action of a relay determines whether the reward is gained. Therefore, we define the system reward as

$$R_{i_n(t)}^{a_n(t)} = a_n(t)\, R\Big(\omega_p P_b\big(MCS_{S2D}, \Gamma_n(t)\big),\; \omega_\eta \eta_k\big(MCS_{R2D}, \Upsilon_n(t)\big),\; \omega_J J(P_t, l, r_k),\; \omega_E E_n(t)\Big) \tag{7}$$
where $|\omega_p| + |\omega_\eta| + |\omega_J| + |\omega_E| = 1$; $\omega_\eta$ and $\omega_E$ are positive weights, and $\omega_p$ and $\omega_J$ are negative weights; $P_b$ is the BER function determined by the channel state $\Gamma_n(t)$ for a given MCS; $\eta_k$ is the spectral efficiency of the modulation scheme adapted to the channel state $\Upsilon_n(t)$; $J$ is the energy consumption as a function of the transmit power, packet length, and data rate; $P_t$ is the transmit power of the relay for retransmitting a data packet; $l$ is the packet length; and $r_k$ is the data rate.

The instantaneous reward $R_{i_n(t)}^{a_n(t)}$ is earned by relay $n$ in state $i_n(t)$ when it takes action $a_n(t)$ in time slot $t$. For a stochastic process, maximizing the immediate reward does not necessarily maximize the expected long-term accumulated reward. Therefore, we must consider more than the instantaneous reward. We denote by $u \in \mathcal{U}$ the policy and by $\beta$ the discount factor. Solving for the optimal policy in infinite-horizon problems requires a discount factor $0 < \beta < 1$ to ensure that the expected total reward is bounded and the sum converges. We assume that the whole communication lasts long enough that $T$ can be treated as approximately infinite. For undiscounted finite-horizon problems (i.e., small $T$), we can set $\beta = 1$. The goal of relay selection is to find a selection policy that maximizes the total expected discounted reward during the whole communication period; the optimal value is

$$Z^* = \max_{u \in \mathcal{U}} E_u\left[\sum_{t=0}^{T-1} \left(R_{i_1(t)}^{a_1(t)} + R_{i_2(t)}^{a_2(t)} + \cdots + R_{i_N(t)}^{a_N(t)}\right) \beta^t\right]. \tag{8}$$

C. Solution to the Restless Bandit Problem
To solve the restless bandit problem, a hierarchy of increasingly stronger LP relaxations is developed in [28] based on the classical result on LP formulations of Markov decision chains (MDCs). To formulate the restless bandit problem as a linear program, we introduce the performance measures

$$x_{i_n}^1(u) = E_u\left[\sum_{t=0}^{T-1} I_{i_n}^1(t)\, \beta^t\right] \tag{9}$$

$$x_{i_n}^0(u) = E_u\left[\sum_{t=0}^{T-1} I_{i_n}^0(t)\, \beta^t\right] \tag{10}$$

where

$$I_{i_n}^1(t) = \begin{cases} 1, & \text{if relay } n \text{ is in state } i_n \text{ and active at time } t \\ 0, & \text{otherwise} \end{cases} \tag{11}$$

$$I_{i_n}^0(t) = \begin{cases} 1, & \text{if relay } n \text{ is in state } i_n \text{ and passive at time } t \\ 0, & \text{otherwise.} \end{cases} \tag{12}$$

The performance measure $x_{i_n}^1(u)$ (respectively, $x_{i_n}^0(u)$) represents the total expected discounted time that relay $n$ is in state $i_n$ and active (respectively, passive) under selection policy $u$. We denote by $X$ the performance region spanned by the performance vector $x = \left(x_{i_n}^{a_n}(u)\right)_{i_n \in S_n, a_n \in \{0,1\}}$ under all Markovian policies $u \in \mathcal{U}$, i.e.,

$$X = \left\{ x = \left(x_{i_n}^{a_n}(u)\right)_{i_n \in S_n,\, a_n \in \{0,1\},\, n \in \mathcal{N}} \;\middle|\; u \in \mathcal{U} \right\}. \tag{13}$$

Since the restless bandit problem is naturally formulated as a discounted MDC, Bertsimas and Niño-Mora [28] proved that the performance region $X$ is a polytope, which they referred to as the restless bandit polytope $P$. The restless bandit problem can thus be formulated as the linear program

$$(\mathrm{LP}) \quad Z^* = \max_{x \in X} \sum_{n \in \mathcal{N}} \sum_{i_n \in S_n} \sum_{a_n \in \{0,1\}} R_{i_n}^{a_n} x_{i_n}^{a_n}. \tag{14}$$

The approach developed in [28] is to construct relaxations of polytope $X$ that yield polynomial-size relaxations of the linear program. Denote by $\hat{X} \supseteq X$ relaxations that lie not in the space of the original variables $x_{i_n}^{a_n}$ but in a higher dimensional space that includes new auxiliary variables. Define the polytope

$$P_n^1 = \left\{ x_n = \left(x_{i_n}^{a_n}(u)\right)_{i_n \in S_n,\, a_n \in \{0,1\}} \;\middle|\; u \in \mathcal{U} \right\}$$

which is precisely the projection of the restless bandit polytope $P$ onto the space of the variables $x_{i_n}^{a_n}$ for relay $n$. Furthermore, $P_n^1$ is also the performance region of the first-order MDC corresponding to relay $n$. Let $\alpha_{i_n}$ denote the probability that the initial state is $i_n$, for $i_n \in S_n$; the initial state probability vector $\alpha = (\alpha_{i_n})_{i_n \in S_n}$ is given. A complete formulation of $P_n^1$ is given by [28]:

$$P_n^1 = \left\{ x_n \in \mathbb{R}_+^{|S_n \times \{0,1\}|} \;\middle|\; x_{j_n}^0 + x_{j_n}^1 = \alpha_{j_n} + \beta \sum_{i_n \in S_n} \sum_{a_n \in \{0,1\}} p_{i_n j_n}^{a_n} x_{i_n}^{a_n},\; j_n \in S_n \right\}. \tag{15}$$

Therefore, the first-order relaxation can be formulated as the linear program

$$(\mathrm{LP}_1) \quad Z^1 = \max \sum_{n \in \mathcal{N}} \sum_{i_n \in S_n} \sum_{a_n \in \{0,1\}} R_{i_n}^{a_n} x_{i_n}^{a_n}$$

subject to

$$x_n \in P_n^1, \quad n \in \mathcal{N}, \qquad \sum_{n \in \mathcal{N}} \sum_{i_n \in S_n} x_{i_n}^1 = \frac{m}{1-\beta}. \tag{16}$$
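Summing (15) over $j_n$ and using $\sum_{j_n} p_{i_n j_n}^{a_n} = 1$ gives $\sum_{j_n}(x_{j_n}^0 + x_{j_n}^1) = 1/(1-\beta)$ for every relay; since exactly $m$ relays are active in each slot, the active occupancies of all relays add up to $m/(1-\beta)$, the right-hand side of (16). The sketch below makes (6), (9), (10), and (15) concrete for a single relay. It builds the 18-state matrices $P^0$ and $P^1$ from the Section V parameters, assuming the product form of (6) (a Kronecker product of $\Phi$, $\Psi$, and $\Theta^a$, which follows from the independence assumption), computes the discounted occupancy measures under a fixed stationary policy, and checks the balance constraints of (15). The policy, $\beta = 0.9$, and the uniform $\alpha$ are hypothetical, and the sketch does not solve (LP1) or compute the priority indices; a general LP solver would be needed for that.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def kron(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    """Kronecker product: entry (i*p+k, j*p+l) equals A[i][j] * B[k][l]."""
    return [[a * b for a in ra for b in rb] for ra in A for rb in B]


def occupancy(P: Sequence[Matrix], policy: Sequence[int], alpha: Sequence[float],
              beta: float, horizon: int = 400) -> Matrix:
    """Discounted occupancy measures x[a][i] of (9)-(10) for one relay under a
    stationary deterministic policy (policy[i] = action taken in state i).

    The infinite sum is truncated after `horizon` slots (beta**horizon is negligible).
    """
    n = len(alpha)
    x = [[0.0] * n, [0.0] * n]
    dist = list(alpha)  # Pr(state at slot t = i)
    disc = 1.0          # beta**t
    for _ in range(horizon):
        for i in range(n):
            x[policy[i]][i] += disc * dist[i]
        dist = [sum(dist[i] * P[policy[i]][i][j] for i in range(n)) for j in range(n)]
        disc *= beta
    return x


def balance_residual(P: Sequence[Matrix], x: Matrix, alpha: Sequence[float],
                     beta: float) -> float:
    """Largest violation of the balance equations that define P_n^1 in (15)."""
    n = len(alpha)
    worst = 0.0
    for j in range(n):
        rhs = alpha[j] + beta * sum(P[a][i][j] * x[a][i] for a in (0, 1) for i in range(n))
        worst = max(worst, abs(x[0][j] + x[1][j] - rhs))
    return worst


# Section V parameters.
PHI = [[0.7, 0.3], [0.3, 0.7]]
PSI = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
THETA = [
    [[1.00, 0.00, 0.00], [0.01, 0.99, 0.00], [0.00, 0.01, 0.99]],  # passive: Theta^0
    [[1.00, 0.00, 0.00], [0.08, 0.92, 0.00], [0.00, 0.08, 0.92]],  # active: Theta^1
]
# Product form of (6); state order s0d0e0, s0d0e1, ..., s1d2e2.
P = [kron(kron(PHI, PSI), THETA[a]) for a in (0, 1)]

beta = 0.9                   # hypothetical discount factor
alpha = [1.0 / 18] * 18      # hypothetical uniform initial distribution
policy = [0] * 9 + [1] * 9   # hypothetical: active only when the S2R state is good (s1)
x = occupancy(P, policy, alpha, beta)
print(balance_residual(P, x, alpha, beta) < 1e-9)  # True
print(round(sum(x[0]) + sum(x[1]), 6))             # 10.0, i.e., 1/(1 - beta)
```

Any occupancy vector produced by an actual policy satisfies (15), which is why $P_n^1$ contains every single-relay performance vector; the relaxation in (LP1) comes from enforcing the coupling in (16) only in discounted aggregate rather than slot by slot.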
In our scheme, only one relay is active in each time slot; therefore, $m = 1$. We refer to the feasible space of (LP1) as the first-order approximation to the restless bandit polytope $P$ and denote it by $P^1$. Note that (LP1) has $O(N|S_{\max}|)$ variables and constraints, where $|S_{\max}| = \max_{n \in \mathcal{N}} |S_n|$; its size is thus polynomial in the problem dimensions.

IV. DISTRIBUTED RELAY-SELECTION SCHEME

In this section, we present the distributed relay-selection scheme for wireless cooperative networks. The proposed scheme is based on the RTS/CTS collision-avoidance mechanism: by exchanging RTS/CTS packets, the current channel state can be observed and the set of available relays can be determined.

A. Available Relay Candidates

A source node initiates its transmission by sending an RTS packet to its destination node. Its destination and all its neighbor nodes receive this RTS packet and estimate $\gamma_{S2D}$ and $\gamma_{S2R}$, respectively. The destination decides the MCS of the S2D link ($MCS_{S2D}$) according to the prediction of the upcoming channel quality and replies to the source with a CTS packet containing $MCS_{S2D}$ and $\gamma_{S2D}$. The source and all neighbor nodes of the destination receive it. The source then adjusts its MCS to $MCS_{S2D}$ for the next time slot. The neighbor nodes also learn which MCS to use when overhearing the data packet from the source and obtain the S2D channel information $\gamma_{S2D}$; in addition, they can estimate $\gamma_{R2D}$ from the CTS packet and decide the MCS of the R2D link ($MCS_{R2D}$) according to the prediction of the upcoming channel quality. Here, we assume that the forward and backward channels between the relay and the destination are the same due to the reciprocity theorem [10] when the transmissions occur on the same frequency band and within the same coherence interval. The common neighbors of the source and the destination that can decode both the RTS and the CTS are potential relays. However, to obtain better relay candidates, we apply the following criterion to determine whether a potential relay belongs to the set of available relays:

$$\gamma_{S2D} < \min(\gamma_{S2R}, \gamma_{R2D}). \tag{17}$$
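Criterion (17) is a simple per-neighbor test once each common neighbor has its two SNR estimates from the RTS/CTS exchange. A minimal sketch follows; the `Neighbor` record, its field names, and the SNR values are hypothetical, and all SNRs are assumed to be expressed in the same unit (dB here).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Neighbor:
    """SNR estimates a common neighbor obtains from the RTS/CTS exchange (dB)."""
    node_id: int
    snr_s2r: float  # estimated from the overheard RTS
    snr_r2d: float  # estimated from the overheard CTS (channel reciprocity assumed)


def available_relays(snr_s2d: float, neighbors: List[Neighbor]) -> List[int]:
    """Ids of common neighbors satisfying (17): gamma_S2D < min(gamma_S2R, gamma_R2D)."""
    return [nb.node_id for nb in neighbors
            if snr_s2d < min(nb.snr_s2r, nb.snr_r2d)]


neighbors = [Neighbor(1, 12.0, 9.0), Neighbor(2, 6.0, 15.0), Neighbor(3, 11.0, 10.5)]
print(available_relays(8.0, neighbors))  # [1, 3]
```

Neighbor 2 is excluded because its weaker hop (6 dB) is worse than the direct link (8 dB), so relaying through it would not improve on the direct path.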
Those common neighbors that satisfy this criterion, i.e., whose direct-link SNR is less than the minimum of the SNRs on the relaying path, constitute the set of available relays $\mathcal{N}$. Note that we use a first-order FSMC model to approximate the time variation of the average SNR in the subsequent time slot, which means that the prediction of the upcoming channel quality is the set of $K$ possible states together with the state transition probabilities. Therefore, the MCS for the upcoming time slot is chosen based on the predicted channel state with the highest probability.

B. Relay-Selection Process

After the handshaking packets, each available relay $n$ obtains its current S2R channel state $\Gamma_n(t)$ and R2D channel state $\Upsilon_n(t)$ at time $t$; it can also detect its energy state $E_n(t)$. Given the current state $i_n(t)$, each available relay calculates its index $\delta_{i_n}$ and broadcasts a candidate index (CI) packet containing its index together with the $MCS_{R2D}$ information. To reduce the probability of collisions between CI packets, relays can first contend for the channel using a standard carrier-sense multiple-access (CSMA) splitting scheme. In CSMA, a node first senses the channel to make sure that it is idle before transmitting. Adopting other channel-reservation mechanisms may solve the "hidden-node" and "exposed-node" problems but imposes additional communication overhead and complexity. When other available relays receive this CI packet, they compare the received index with their own and broadcast their own CI packet only if their own index is smaller than the received index; otherwise, they do not broadcast their CI packet and remain silent without listening to the source transmission, which saves energy from a network perspective. A timer can simply be set at the source and destination nodes to stop receiving delayed CI packets.
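The CI exchange described above can be sketched as a sequential contention in which a relay speaks only if it would lower the smallest index heard so far. The simulation below is a simplified illustration under stated assumptions that go beyond the text: relays access the channel one at a time in a random CSMA order, every relay hears every CI packet, and no CI packet collides or is lost. The function name and the index values are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def ci_contention(indices: Dict[int, float],
                  rng: random.Random) -> Tuple[int, List[int]]:
    """Simulate CI broadcasts among available relays (hypothetical helper).

    indices maps relay id -> priority index delta; a smaller index wins.
    Returns (relay chosen by the source, relays that actually broadcast).
    """
    if not indices:
        raise ValueError("no available relays")
    order = list(indices)
    rng.shuffle(order)  # random channel-access order (idealized CSMA)
    best_heard = None   # smallest index heard on the channel so far
    broadcasters: List[int] = []
    for relay in order:
        if best_heard is None or indices[relay] < best_heard:
            broadcasters.append(relay)  # this relay broadcasts its CI packet
            best_heard = indices[relay]
        # otherwise the relay stays silent (and stops listening) to save energy
    # The source selects the smallest index among the CI packets it received.
    chosen = min(broadcasters, key=lambda r: indices[r])
    return chosen, broadcasters


indices = {1: 0.42, 2: 0.17, 3: 0.88, 4: 0.25}  # hypothetical delta values
chosen, sent = ci_contention(indices, random.Random(7))
print(chosen)  # 2
```

Under these idealized assumptions the source always ends up with the relay of globally smallest index, and the relays that broadcast are exactly the running minima of the random access order; for distinct indices, their expected number is the harmonic number $H_N = 1 + 1/2 + \cdots + 1/N$, so CI traffic grows only logarithmically with the number of available relays.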
After the timer expires, the source receives all the CI packets or the subset of all CI packets due to the collision between them, and it will select the relay with the smallest index with which to cooperate. After receiving all the CI packets or the subset of all CI packets, the destination adopts the (M CR2D ) information from the relay with the smallest index to receive the retransmission from this relay. We would like to emphasize that feeding back the index to the source and making decisions at the source can solve the hiddenterminal problem that the relays may be hidden to each other and cannot hear the CI broadcast from each other. Since the priority indices can be computed and stored into a table offline before transmission, the relay-selection process can be divided into the offline stage and the online stage. 1) Offline computation: Before any transmission is started, the offline computation takes as input the state-transition
probability $p_{i_n j_n}^{a_n}$, the reward $R_{i_n}^{a_n}$, the discount factor β, and the initial state probability vector α, and it produces as output the set of indices $\{\delta_{i_n}\}$ according to Section III-C. Relays store these indices and the corresponding $p_{i_n j_n}^{a_n}$, $R_{i_n}^{a_n}$, and α in a table.
2) Online selection: After each handshake between the source and the destination, each available relay $n \in \mathcal{N}$ looks up its index table to find the index $\delta_{i_n}$ corresponding to its current state $i_n$ and then broadcasts its CI packet if it has not received any CI packet or if its own index is smaller than the received indices.

V. SIMULATION RESULTS AND DISCUSSIONS
In this section, we illustrate the performance of the proposed scheme by simulations. We compare the proposed scheme with an existing memoryless selection method [9] and a random-selection method. We use M-ary phase-shift keying [i.e., binary phase-shift keying (BPSK), quadrature phase-shift keying (QPSK), and 8PSK] modulations as the available MCSs. We assume that the state of the S2R channel can be bad (s0) or good (s1), the state of the R2D channel can be good for BPSK (d0), good for QPSK (d1), or good for 8PSK (d2), and the state of residual energy can be dead (e0), low (e1), or high (e2). Consequently, there are 2 × 3 × 3 = 18 states for each available relay: s0d0e0, s0d0e1, s0d0e2, s0d1e0, s0d1e1, s0d1e2, s0d2e0, s0d2e1, s0d2e2, s1d0e0, s1d0e1, s1d0e2, s1d1e0, s1d1e1, s1d1e2, s1d2e0, s1d2e1, and s1d2e2. In the S2R channel, we set the probability of staying in the same state to 0.7 and the probability of moving to the other state to 0.3. In the R2D channel, we set the probability of staying in the same state to 0.6 and set the probability of moving to an adjacent state to three times that of moving to a nonadjacent state. The state-transition probability matrices of the S2R and R2D channels are

$$\Phi = \begin{pmatrix} 0.7 & 0.3 \\ 0.3 & 0.7 \end{pmatrix}, \qquad \Psi = \begin{pmatrix} 0.6 & 0.3 & 0.1 \\ 0.2 & 0.6 & 0.2 \\ 0.1 & 0.3 & 0.6 \end{pmatrix}.$$

Let the residual-energy state-transition probability matrix be $\Theta^0$ when the relay is passive and $\Theta^1$ when the relay is active, i.e.,

$$\Theta^0 = \begin{pmatrix} 1.00 & 0.00 & 0.00 \\ 0.01 & 0.99 & 0.00 \\ 0.00 & 0.01 & 0.99 \end{pmatrix}, \qquad \Theta^1 = \begin{pmatrix} 1.00 & 0.00 & 0.00 \\ 0.08 & 0.92 & 0.00 \\ 0.00 & 0.08 & 0.92 \end{pmatrix}.$$

Since the S2R channel state, the R2D channel state, and the residual-energy state are independent of each other, the relay-state transition-probability matrices $P^0$ and $P^1$ for the 18 states can easily be obtained according to (6). The reward considered in this paper is given by (7); it is a function of the BER on the S2R link, the spectral efficiency on the R2D link, the energy consumption, and the residual energy, with
different weights. Assume that the BER of a bad channel is about $P_b = 10^{-3}$ and that of a good channel is about $P_b = 10^{-5}$. We take $\lg(P_b/10^{-3})$, weighted by a negative $\omega_p$, as the first component of the reward function; since $\lg(10^{-3}/10^{-3}) = 0$ and $\lg(10^{-5}/10^{-3}) = -2$, this component is 0 when the S2R channel is in state s0 and $-2\omega_p$ when it is in state s1. The spectral efficiencies of BPSK, QPSK, and 8PSK are 1, 2, and 3 b/s/Hz, respectively. In general, the better the channel state, the lower the transmission energy requirement [25]. Here, we assume that the transmit power is fixed and that the energy consumed to retransmit a data packet is inversely proportional to the data rate, i.e., to the spectral efficiency. Let the energy consumption for the d0, d1, and d2 channels be 1, 1/2, and 1/3, respectively. Therefore, the second and third components of the reward function are determined by the state of the R2D channel, weighted by $\omega_\eta$ and $\omega_J$, respectively: the reward is $\omega_\eta + \omega_J$ when the R2D channel is in state d0, $2\omega_\eta + \omega_J/2$ when it is in state d1, and $3\omega_\eta + \omega_J/3$ when it is in state d2. The fourth component of the reward function is determined by the residual-energy state, weighted by $\omega_E$; we set it to 0, $\omega_E$, and $2\omega_E$ for energy states e0, e1, and e2, respectively. The reward function $R_{i_n}^{a_n}$ is zero if the relay is passive ($a_n = 0$) or its energy state is dead (e0); this yields the reward for each of the 18 states of a relay. We set the discount factor β = 0.8 in the simulations. The initial states of the relays are random.
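The following Python sketch builds the 18-state matrices $P^0$ and $P^1$ and the per-state reward from the numbers above, as a way to check the setup. It assumes that (6) reduces to the product of the three independent component transition probabilities, i.e., the Kronecker product $\Phi \otimes \Psi \otimes \Theta^{a}$ with the state order listed above (e varying fastest, then d, then s); the function and variable names are illustrative only and are not taken from the paper.

```python
from itertools import product
from math import log10

# Component FSMCs from Section V (state 0 = s0/d0/e0, etc.).
PHI = [[0.7, 0.3], [0.3, 0.7]]                               # S2R: s0 bad, s1 good
PSI = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]    # R2D: d0, d1, d2
THETA = {0: [[1.00, 0.00, 0.00], [0.01, 0.99, 0.00], [0.00, 0.01, 0.99]],  # passive
         1: [[1.00, 0.00, 0.00], [0.08, 0.92, 0.00], [0.00, 0.08, 0.92]]}  # active

# Relay states (s, d, e) with e varying fastest: s0d0e0, s0d0e1, s0d0e2, s0d1e0, ...
STATES = list(product(range(2), range(3), range(3)))


def kron(a, b):
    """Kronecker product of two matrices stored as nested lists."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def transition_matrix(action):
    """18x18 relay-state transition matrix P^action, assuming independence."""
    return kron(PHI, kron(PSI, THETA[action]))


BER = {0: 1e-3, 1: 1e-5}               # S2R BER in s0 / s1
EFFICIENCY = {0: 1, 1: 2, 2: 3}        # b/s/Hz of BPSK, QPSK, 8PSK
ENERGY_USE = {0: 1, 1: 1 / 2, 2: 1 / 3}
ENERGY_LEVEL = {0: 0, 1: 1, 2: 2}      # e0, e1, e2 -> 0, w_E, 2 w_E


def reward(state, active, w_p=0.0, w_eta=0.0, w_J=0.0, w_E=0.0, with_energy=True):
    """Weighted reward of one relay state (our reading of the text above)."""
    s, d, e = state
    if not active or (with_energy and e == 0):
        return 0.0  # passive relays and dead relays earn nothing
    r = w_p * log10(BER[s] / 1e-3) + w_eta * EFFICIENCY[d] + w_J * ENERGY_USE[d]
    if with_energy:
        r += w_E * ENERGY_LEVEL[e]
    return r


if __name__ == "__main__":
    P1 = transition_matrix(1)
    # Lifetime weights used in Section V-C: w_E = 0.5, w_J = -0.5.
    R1 = [reward(x, True, w_J=-0.5, w_E=0.5) for x in STATES]
    print(len(P1), [round(v, 3) for v in R1])
```

With $\omega_\eta = 1$ (other weights zero) and the energy component dropped, `reward` reproduces the six-state vector $R^1 = \{1, 2, 3, 1, 2, 3\}$ used in Section V-A, and with $\omega_p = -1$ it reproduces $\{0, 0, 0, 2, 2, 2\}$ used in Section V-B.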
Fig. 2. Spectral efficiency using different relay selection schemes.
A. R2D Spectral Efficiency Improvement
In this subsection, we illustrate the R2D spectral efficiency improvement of the proposed scheme. For simplicity of presentation, we do not consider the energy issue here; it is considered later. Without the energy state, each relay has six states: s0d0, s0d1, s0d2, s1d0, s1d1, and s1d2. The relay-state-transition probability matrix P can easily be obtained from the state-transition probability matrices of the S2R and R2D channels. The system reward is obtained by assigning different weights to the components of the reward function. Here we set $\omega_\eta = 1$ and all other weights to 0. Thus, the reward for the six states is $R^1 = \{1, 2, 3, 1, 2, 3\}$ when the relay is selected and $R^0 = \{0, 0, 0, 0, 0, 0\}$ when it is not. We run the simulations for 2000 s with $N = 8$ available relays, one of which is selected at each decision time.
Fig. 2 shows the spectral efficiency using different relay-selection schemes. The proposed scheme selects a relay with a good-for-8PSK (d2) channel for the subsequent frame at almost every decision time, and its spectral efficiency is close to 3 b/s/Hz. The spectral efficiency of the existing memoryless selection method is about 2.5 b/s/Hz, since it selects a relay for the subsequent frame according to the current state, which may change in the subsequent frame. The memoryless method performs better when the wireless channel is nearly static.
Fig. 3 shows the average spectral efficiency for different probabilities of staying in the same state. The performance of the memoryless method approaches that of our proposed scheme as this probability increases, and it performs as well as our scheme when the channel is completely static, i.e., when the probability that the channel stays in the same state is 1. Our proposed scheme achieves the highest R2D spectral efficiency at every transition probability, and the random-selection scheme achieves a spectral efficiency of about 2 b/s/Hz in every case. We also observe that the memoryless method still outperforms the random-selection scheme when the transition probability is 0.5. This is because the transition probability from d2 to d1 is three times the transition probability from d2 to d0, and this method always tries to select a relay in the d2 state, which is better than random selection among the d2, d1, and d0 states.
Fig. 3. Spectral efficiency with different transition probabilities.
We also investigate the effect of the number of available relays N on system performance. Fig. 4 shows the performance with different numbers of available relays. As the number of available relays increases, the probability that at least one relay has a good-for-8PSK (d2) channel becomes high, so there is always a good candidate for the relay-selection schemes. Since the R2D channel of a relay is in the good-for-8PSK (d2) state with probability 1/3, the probability that none of eight relays is in state d2 is only $(2/3)^8 \approx 0.04$; hence, eight available relays are enough to provide a good relay, and the proposed scheme selects it and obtains the highest spectral efficiency. It can be seen that the proposed scheme always achieves better R2D spectral efficiency compared
Fig. 4. Spectral efficiency with different numbers of available relays.
Fig. 6. Average BER with different transition probabilities.
Fig. 7. Average BER with different numbers of available relays.
Fig. 5. Average BER using different relay selection schemes.
with the existing memoryless-selection and random-selection schemes.
B. S2R Error-Propagation-Mitigation Improvement
If we care most about S2R error propagation, we can set $\omega_p = -1$ and all other weights to 0. Then $\omega_p \lg(P_b/10^{-3})$ is the system reward, and the reward of the six states is $R^1 = \{0, 0, 0, 2, 2, 2\}$ when the relay is active and $R^0 = \{0, 0, 0, 0, 0, 0\}$ when it is passive. We run the simulations for 2000 s with $N = 8$ available relays. Fig. 5 shows the average BER using different relay-selection schemes. The proposed scheme selects a relay with a good (s1) channel for the subsequent frame at almost every decision time, and its average BER is about $10^{-5}$. The average BER of the existing memoryless method fluctuates around $4 \times 10^{-5}$, and that of the random-selection scheme fluctuates around $10^{-4}$. Fig. 6 shows the performance for different probabilities that the channel stays in the same state. The performance of the existing memoryless method approaches that of our proposed scheme as this probability increases, and it performs as well as our scheme when the channel is completely static. Notice
that the memoryless method performs the same as the random-selection scheme when the transition probability is 0.5, because the channel is then equally likely to stay in its current state or to change. Our proposed scheme achieves the lowest average BER at every transition probability, and the random-selection scheme performs worst in every case. Fig. 7 compares the performance with different numbers of available relays. Since the S2R channel has only two states, which are equally likely, the probability that none of seven relays has a good (s1) channel is only $(1/2)^7 \approx 0.008$; hence, at least one relay with a good channel is almost always available when there are more than six relays. Indeed, the proposed scheme is able to select a relay with a good (s1) channel at every decision time when the number of available relays is larger than six. We also observe that the proposed scheme always outperforms the existing memoryless-selection and random-selection schemes for different numbers of available relays.
C. Network Lifetime Improvement
In this subsection, we concentrate on the energy issue and conduct simulations with the energy state as a component of the relay state, so each relay has 18 states, as mentioned at the
Fig. 8. Network lifetime comparison under different threshold definitions.
beginning of Section V. The relay-state-transition probability matrices and the system reward corresponding to the 18 states were discussed above. Each simulation lasts 2000 s, and there are $N = 20$ available relays at the beginning of each simulation. Since more and more relays run out of energy as transmission proceeds, the number of available relays steadily decreases. The definition of the network lifetime depends on the underlying network application; one commonly used definition is the time at which the number of dead nodes reaches a threshold $N_{th}$ beyond which the network can no longer achieve the target performance. We use this definition in this paper. If network lifetime is the primary concern, we set the two related weights $\omega_E = 0.5$ and $\omega_J = -0.5$ and all other weights to 0, so that the residual energy and the energy consumption make up the system reward in our proposed scheme. We set the initial state of each relay to high (e2) energy, and the number of dead relays increases with transmission time. Fig. 8 compares the network lifetime of the different selection methods for varying $N_{th}$. As expected, the lifetime of all methods increases with $N_{th}$, and the proposed scheme always has a longer lifetime. The existing memoryless method always selects the relay with the most residual energy at the decision time without considering energy consumption, and the random-selection scheme randomly selects one of the relays that are still alive. As can be seen from Fig. 8, the first dead relay appears at about 250 s with the random-selection scheme, and at about 330 s and 450 s with the memoryless method and our proposed scheme, respectively. Energy also affects spectral efficiency. As shown in Fig. 9, the spectral efficiency declines over time, because more and more relays run out of energy after transmitting data for some time slots.
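The lifetime definition above can be made concrete with a short Monte Carlo sketch in Python. It simulates only the energy chains $\Theta^0$ and $\Theta^1$ under the random-selection baseline (one alive relay is active per slot), not the proposed index policy, and it measures time in decision slots rather than seconds; both are simplifying assumptions of this example, and the function names are hypothetical. `network_lifetime` returns the first time at which the number of dead relays reaches $N_{th}$.

```python
import random

# Residual-energy FSMC from Section V: rows/cols are e0 (dead), e1 (low), e2 (high).
THETA = {0: [[1.00, 0.00, 0.00], [0.01, 0.99, 0.00], [0.00, 0.01, 0.99]],  # passive
         1: [[1.00, 0.00, 0.00], [0.08, 0.92, 0.00], [0.00, 0.08, 0.92]]}  # active
DEAD, HIGH = 0, 2


def next_energy(state, active, rng):
    """Sample the next energy state from row `state` of Theta^active."""
    return rng.choices(range(3), weights=THETA[active][state])[0]


def simulate_death_times(n_relays, n_slots, rng):
    """Random-selection baseline (not the proposed index policy): in every slot
    one alive relay, chosen uniformly, is active and the rest are passive.
    Returns the slot in which each relay died, or None if it survived."""
    energy = [HIGH] * n_relays            # all relays start in e2 (high)
    death = [None] * n_relays
    for t in range(1, n_slots + 1):
        alive = [n for n in range(n_relays) if energy[n] != DEAD]
        if not alive:
            break
        chosen = rng.choice(alive)
        for n in alive:
            energy[n] = next_energy(energy[n], int(n == chosen), rng)
            if energy[n] == DEAD:
                death[n] = t
    return death


def network_lifetime(death_times, n_th):
    """First time at which the number of dead relays reaches n_th (None if never)."""
    times = sorted(t for t in death_times if t is not None)
    return times[n_th - 1] if len(times) >= n_th else None


if __name__ == "__main__":
    deaths = simulate_death_times(20, 2000, random.Random(7))
    print([network_lifetime(deaths, k) for k in (1, 5, 10, 20)])
```

Because the lifetime for a threshold $N_{th}$ is simply the $N_{th}$-th smallest death time, it can only grow with $N_{th}$, which matches the trend reported for Fig. 8. Comparing other selection rules in this sketch only requires changing how `chosen` is picked.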
It can be seen that hardly any relay is still alive at 2000 s, and the spectral efficiency drops to near zero in all three schemes. The proposed scheme outperforms the other two schemes during most of the simulation time.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have presented a distributed relay-selection scheme that considers FSMCs, AMC, and residual relay energy
Fig. 9. Spectral efficiency considering the energy state.
in wireless cooperative networks. We have formulated the relay-selection problem as a restless multiarmed bandit system and solved this stochastic control problem with an LP relaxation algorithm. The obtained relay-selection policy has an indexability property, which dramatically reduces the online computation and implementation complexity. The priority indices can be computed offline and stored in a table. For online relay selection, each relay only needs to look up its index table to find the index corresponding to its current state. Moreover, the proposed relay-selection process operates in a distributed manner, and relays can freely join or leave the set of relay candidates. Simulation results have been presented to show that the proposed relay-selection scheme can significantly increase the R2D spectral efficiency and the network lifetime, as well as mitigate S2R error propagation. Future work is in progress to extend the proposed scheme to consider other parameters, such as power control and application-layer QoS, in wireless cooperative networks. We are also developing a fading simulator to compare its results with those of the Markov model in the relay-selection problem.
ACKNOWLEDGMENT
The authors would like to thank the reviewers for their detailed reviews and constructive comments, which have helped to improve the quality of this paper.
REFERENCES
[1] A. Sendonaris, E. Erkip, and B. Aazhang, “User cooperation diversity part I and part II,” IEEE Trans. Commun., vol. 51, no. 11, pp. 1927–1948, Nov. 2003.
[2] A. Nosratinia, T. E. Hunter, and A. Hedayat, “Cooperative communication in wireless networks,” IEEE Commun. Mag., vol. 42, no. 10, pp. 74–80, Oct. 2004.
[3] W. Ni, G. Shen, S. Jin, T. Fahldieck, and R. Muenzner, “Cooperative relay in IEEE 802.16j MMR,” Alcatel, Shanghai, China, Tech. Rep. IEEE C802.16j-06_006r1, May 2006.
[4] P. H. J. Chong, F. Adachi, S. Hamalainen, and V. Leung, “Technologies in multihop cellular network,” IEEE Commun. Mag., vol. 45, no. 9, pp. 64–65, Sep. 2007.
[5] E. Beres and R. Adve, “Selection cooperation in multi-source cooperative networks,” IEEE Trans. Wireless Commun., vol. 7, no. 1, pp. 118–127, Jan. 2008.
[6] T. C.-Y. Ng and W. Yu, “Joint optimization of relay strategies and resource allocations in cooperative cellular networks,” IEEE J. Sel. Areas Commun., vol. 25, no. 2, pp. 328–339, Feb. 2007.
[7] J. Cai, X. Shen, J. W. Mark, and A. S. Alfa, “Semi-distributed user relaying algorithm for amplify-and-forward wireless relay networks,” IEEE Trans. Wireless Commun., vol. 7, no. 4, pp. 1348–1357, Apr. 2008.
[8] A. S. Ibrahim, A. K. Sadek, W. Su, and K. J. R. Liu, “Relay selection in multi-node cooperative communications: When to cooperate and whom to cooperate with?” in Proc. IEEE GLOBECOM, Nov. 2006, pp. 1–5.
[9] M. M. Fareed and M. Uysal, “A novel relay selection method for decode-and-forward relaying,” in Proc. CCECE, Niagara Falls, ON, Canada, May 2008, pp. 135–140.
[10] A. Bletsas, A. Khisti, D. P. Reed, and A. Lippman, “A simple cooperative diversity method based on network path selection,” IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 659–672, Mar. 2006.
[11] J. Yang, A. K. Khandani, and N. Tin, “Statistical decision making in adaptive modulation and coding for 3G wireless systems,” IEEE Trans. Veh. Technol., vol. 54, no. 6, pp. 2066–2073, Nov. 2005.
[12] H. S. Wang and P.-C. Chang, “On verifying the first-order Markovian assumption for a Rayleigh fading channel model,” IEEE Trans. Veh. Technol., vol. 45, no. 2, pp. 353–357, May 1996.
[13] C. Pimentel, T. H. Falk, and L. Lisbôa, “Finite-state Markov modeling of correlated Rician-fading channels,” IEEE Trans. Veh. Technol., vol. 53, no. 5, pp. 1491–1501, Sep. 2004.
[14] C. D. Iskander and P. T. Mathiopoulos, “Fast simulation of diversity Nakagami fading channels using finite-state Markov models,” IEEE Trans. Broadcast., vol. 49, no. 3, pp. 269–277, Sep. 2003.
[15] J. N. Laneman, D. N. C. Tse, and G. W. Wornell, “Cooperative diversity in wireless networks: Efficient protocols and outage behavior,” IEEE Trans. Inf. Theory, vol. 50, no. 12, pp. 3062–3080, Dec. 2004.
[16] M. Yu and J. Li, “Is amplify-and-forward practically better than decode-and-forward or vice versa?” in Proc. IEEE ICASSP, Mar. 2005, vol. 3, pp. 365–368.
[17] J.-F. Cheng, “Coding performance of hybrid ARQ schemes,” IEEE Trans. Commun., vol. 54, no. 6, pp. 1017–1029, Jun. 2006.
[18] Y. Zhang, Y. Ma, and R. Tafazolli, “Modulation-adaptive cooperation schemes for wireless networks,” in Proc. IEEE VTC Spring, Singapore, May 2008, pp. 1320–1324.
[19] S. Choi, M.-S. Alouini, H.-C. Yang, and M. O. Hasna, “Comparison of relaying strategies for cooperative diversity systems with adaptive modulation,” in Proc. IEEE VTC Spring, Barcelona, Spain, Apr. 2009, pp. 1–5.
[20] G. Holland, N. H. Vaidya, and P. Bahl, “A rate adaptive MAC protocol for multi-hop wireless networks,” in Proc. IEEE Mobicom, Jul. 2001, pp. 236–251.
[21] L. Li and A. J. Goldsmith, “Low-complexity maximum-likelihood detection of coded signals sent over finite-state Markov channels,” IEEE Trans. Commun., vol. 50, no. 4, pp. 524–531, Apr. 2002.
[22] C. C. Tan and N. C. Beaulieu, “On first-order Markov modeling for the Rayleigh fading channel,” IEEE Trans. Commun., vol. 48, no. 12, pp. 2032–2040, Dec. 2000.
[23] H. Bischl and E. Lutz, “Packet error rate in the non-interleaved Rayleigh channel,” IEEE Trans. Commun., vol. 43, no. 2–4, pp. 1375–1382, Feb.–Apr. 1995.
[24] P. Hu, Z. Zhou, Q. Liu, and F. Li, “The HMM-based modeling for the energy level prediction in wireless sensor networks,” in Proc. IEEE 2nd Conf. Ind. Electron. Appl., Harbin, China, May 2007, pp. 2253–2258.
[25] Y. Chen, Q. Zhao, V. Krishnamurthy, and D. Djonin, “Transmission scheduling for optimizing sensor network lifetime: A stochastic shortest path approach,” IEEE Trans. Signal Process., vol. 55, no. 5, pp. 2294–2309, May 2007.
[26] J. Hu and N. C. Beaulieu, “Closed-form expression for the outage and error probabilities of decode-and-forward relaying in dissimilar Rayleigh fading channels,” in Proc. IEEE ICC, Glasgow, U.K., Jun. 2007, pp. 5553–5557.
[27] H. Robbins, “Some aspects of the sequential design of experiments,” Bull. Amer. Math. Soc., vol. 58, no. 5, pp. 527–535, 1952.
[28] D. Bertsimas and J. Niño-Mora, “Restless bandits, linear programming relaxations, and a primal dual index heuristic,” Oper. Res., vol. 48, no. 1, pp. 80–90, Jan./Feb. 2000.
Yifei Wei received the B.S. and Ph.D. degrees in electrical engineering from Beijing University of Posts and Telecommunications (BUPT), Beijing, China, in 2004 and 2009, respectively. He is currently a Lecturer with the School of Electronic Engineering, BUPT. He was involved in several projects funded by the National High Technology Research and Development Program of China and the National Natural Science Foundation of China. He was invited to study electrical engineering at Carleton University, Ottawa, ON, Canada, for one year, supported by the China Scholarship Council. He has published 18 papers in journals and conference proceedings and has applied for eight patents. His research interests are in wireless mesh networks, heterogeneous converged networks, and cooperative relaying networks.
F. Richard Yu (S’00–M’04–SM’08) received the Ph.D. degree in electrical engineering from the University of British Columbia, Vancouver, BC, Canada, in 2003. From 2002 to 2004, he was with Ericsson, Lund, Sweden, where he worked on the research and development of third-generation cellular networks. From 2005 to 2006, he was with a start-up in California, where he worked on research and development in the areas of advanced wireless communication technologies and new standards. In 2007, he joined the Carleton School of Information Technology and the Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada, where he is currently an Assistant Professor. His research interests include cross-layer design, security, and quality-of-service provisioning in wireless networks. Dr. Yu received the Leadership Opportunity Fund Award from the Canada Foundation for Innovation in 2009 and best paper awards at the 2009 IEEE/International Federation for Information Processing (IFIP) TrustCom and the 2005 International Conference on Networking. He has served on the Technical Program Committees (TPC) of numerous conferences and as a Cochair of the 2009 International Conference on Ultra Modern Telecommunications Workshop on Cognitive Wireless Communications and Networking (CWCN) and a TPC Cochair of the 2010 IEEE Conference on Computer Communications (INFOCOM)–CWCN, the 2009 IEEE International Wireless Communications and Mobile Computing Conference, the 2008 Fall IEEE Vehicular Technology Conference Track 4, and the 2007 International Workshop on Wireless Networking for Intelligent Transportation Systems.
Mei Song received the B.S. and M.S. degrees in electrical engineering from Tianjin University, Tianjin, China, in 1983 and 1986, respectively. She is currently a Professor with the Beijing University of Posts and Telecommunications, where she is also the Vice Dean of the School of Electronic Engineering. Her research interests are in future communications, wireless broadband networks, and heterogeneous networks.