Cognitive Spectrum Access Control Based on Intrinsic Primary ARQ Information

Fabio E. Lapiccirella, Zhi Ding and Xin Liu
Electrical and Computer Engineering
University of California, Davis, California 95616

Abstract: We explore a flexible and non-intrusive opportunistic spectrum access control mechanism for cognitive radios that goes beyond the standard "Listen-Before-Talk" (LBT) strategy. We exploit the bi-directional nature of many primary communication systems to control cognitive radio access based on both spectrum sensing and primary receiver acknowledgements. In this work, we limit our attention to the ACK/NAK messages of primary data-link control. We use partially observable Markov decision processes to devise an optimized admission control policy. Our new method achieves high cognitive network throughput while offering robust protection of primary user signals.

I. INTRODUCTION

Cognitive radio access represents a new paradigm for dynamically alleviating the spectrum scarcity problem. Cognitive radios are designed to access a bandwidth already allocated to a primary user network under the constraint that the quality of service of primary user communications is not heavily impaired. Cognitive radio access control has been a popular research topic. For example, the authors of [1] devised distributed spectrum sensing and access strategies under an energy constraint on secondary users. In [2] and [3], the design of sensing policies for tracking spectrum opportunities is explored. The authors of [4] applied a partially observable Markov decision framework to devise an optimal sensing and channel selection policy in a multi-channel opportunistic communication system. The authors of [5] and [6] studied the existence of an SNR threshold below which the sensing outcome is severely impaired regardless of the sensing time. The authors of [7] derived an optimal sensing policy for exponentially distributed idle times, whereas in [8] they addressed optimal sensing strategies for continuous-time systems, exploiting acknowledgement signals from the SU receiver (SU-Rx). Also, the authors of [9] presented a cognitive radio system that varies its transmission power according to all the information available to the spectrum sensor.

It should be noted that, thus far, "listen-before-talk" (LBT) is the state-of-the-art opportunistic spectrum access approach for cognitive radio. LBT relies solely on spectrum sensing: a secondary user (SU) accesses the spectrum of a primary user (PU) only after sensing that the PU band is unoccupied. LBT is natural and practical, as it requires no modification to the existing PU infrastructure. However, LBT only senses primary transmission activity and is unaware of the actual receiver conditions. More specifically, it neither solves the hidden receiver problem nor utilizes any spare capacity that robust, interference-resistant PU networks may provide.

To expand the applications and improve the efficiency of cognitive radios, we enhance the learning capability of LBT by leveraging the feedback information typically available in duplex PU links. Feedback signals for data-link control are available in many systems, such as HSDPA [10] and WiMAX [11], in the form of ACK/NAK packets and downlink/uplink profiles. Such information allows the SU to infer more accurately the effect of SU activity on the PU receivers (PU-Rx). In this work, we emphasize that advanced cognition should require the SUs to learn about primary network characteristics and user interaction. A similar concept of cognition is discussed in [12], where an SU transmits probing signals and observes the resulting changes in PU transmission power to better estimate the SU-to-PU channel gain. Specifically, we utilize the ACK/NAK signals transmitted by the packet receivers for data-link control. Exploiting the ACK/NAK signals from the PU-Rx allows the SUs to detect the presence of any hidden PU-Rx and to assess the quality of PU-Rx reception, thereby better providing the necessary PU protection. It also enables the SU transmitter (SU-Tx) to optimize its access policy by estimating the PU-Rx reception quality under the SU-Tx access control policy, thereby allowing the SU-Tx to better exploit any PU excess capacity. This paper extends our preliminary work [13], where we presented an optimal channel access scheme based on overhearing PU ACK/NAK information.

The rest of this paper is organized as follows. Section II describes the problem and the basic formulation of cognitive spectrum access based on acknowledgement information. Section III develops our optimized spectrum access policy. Section IV presents test results of our access algorithm. Section V summarizes our conclusions and the future directions under pursuit.

II. SYSTEM MODEL

Fig. 1. SU learning environment (PU-Tx, PU-Rx and SU-Tx; PU forward link and reverse link).

Fig. 1 illustrates the wireless scenario under investigation, which involves the co-existence of a primary and a secondary link. The PU access is time-slotted for packet transmissions, and the high-priority PU may transmit its available packets at the beginning of each slot. We assume that the SU uses the same slot length and that the SU actions are synchronized with the PU time slots. The PU traffic is randomly BUSY or IDLE, irrespective of the SU action. The random BUSY/IDLE durations are independently distributed with probability mass functions

$$ p_B(k) = P[D_B = k], \qquad p_I(k) = P[D_I = k]. \tag{1} $$

Here $D_B$ and $D_I$ are discrete random variables denoting the BUSY and IDLE periods, respectively. The SUs know the IDLE/BUSY distributions from prior measurements. We use partially observable Markov decision processes to devise an optimal admission control policy over an infinite time horizon. At the start time $t$ of every slot, we define the binary variable

$$ s_t = \begin{cases} 0, & \text{when the PU channel is BUSY;} \\ 1, & \text{when the PU channel is IDLE.} \end{cases} \tag{2} $$

The SU-Tx can take one of three possible actions:

$$ a_t = \begin{cases} I, & \text{"Idling";} \\ S, & \text{"Sensing";} \\ T, & \text{"Transmitting".} \end{cases} \tag{3} $$

To take action "S", the SU-Tx uses an energy detector to detect whether the PU-Tx spectrum is IDLE or BUSY. The detection (sensing) is not perfect and is characterized by a false-alarm probability $P_F$ and a missed-detection probability $P_M$. A false alarm means declaring an IDLE primary channel to be BUSY, while a missed detection means declaring a BUSY primary channel to be IDLE. Since the state $s_t$ is not directly observable to the SU, we define $\rho_t = P[s_t = 1]$ and $p_t$ as the estimate of $\rho_t$ made by the SU-Tx from its observations up to time $t$. We refer to $p_t$ as the information state of the SU-Tx. When the PU is BUSY, its packet error rates are $Q_1$ and $Q_2$ in the absence and in the presence of SU-Tx transmissions, respectively; naturally, $Q_1 < Q_2$. We let both $Q_1$ and $Q_2$ be known to the SUs a priori from measurements made before deploying access control. We assume that the SU-Tx either decodes a feedback message from the PU-Rx correctly or cannot decode it at all, and we define $\eta$ as the probability that the SU-Tx correctly decodes a feedback signal from the PU-Rx.

To optimize the access control, we need to define the per-stage reward/cost function. Let $p_t = q$ and $a_t = \nu$; then

$$ R(q,\nu) = \begin{cases} r_T - c_T \cdot P[\text{NAK}], & \nu = T; \\ -c_S, & \nu = S; \\ 0, & \nu = I, \end{cases} \tag{4} $$

where $r_T$ is the reward the SU-Tx gets for each packet transmission, and $c_T$ and $c_S$ are the cost of a packet collision with the PUs and the cost of sensing the PU spectrum, respectively. $P[\text{NAK}] = (1-q) \cdot Q_2$ is the prior probability of the PU-Rx sending a NAK. In what follows, we define $\tau_j(t)$ as the estimated instant of the transition to state $j \in \{0, 1\}$ from state $1-j$, based on observations up to time $t$; $\tau_j(t)$ is needed to carry the information state from one time slot to the next. Our algorithm assumes that the SU-Tx knows the starting point of the first PU BUSY cycle; for simplicity and without loss of generality, we assume that the first BUSY cycle starts at time 0. The access control policy we seek consists of a sequence of functions $\pi = \{\mu_0, \mu_1, \ldots, \mu_t, \ldots\}$, where each function $\mu_t(\cdot)$ maps the information state $p_t$ into an action $a_t \in \{I, S, T\}$. At time $t$, upon taking action $a_t$, the SU-Tx uses the current and past observations to compute a maximum-a-posteriori (MAP) estimate of the next PU traffic transition instant. Given a policy $\pi$, we define the value function of the SU-Tx from time $t_s = 0$ as

$$ V_\pi(q, t_s, \tau_0(t_s)) = E_\pi\Big[\sum_{t=t_s}^{+\infty} \alpha^t R(p_t, a_t) \;\Big|\; p_{t_s} = q\Big], \tag{5} $$

where $0 < \alpha < 1$ is the discount factor. The optimized access policy is an admissible policy $\pi^*$ that maximizes the value function (5):

$$ \pi^* = \arg\max_{\pi} \{V_\pi(q, t_s, \tau_0(t_s))\}. \tag{6} $$

III. OPTIMIZED SPECTRUM ACCESS CONTROL
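As a concrete illustration of the per-stage reward (4) and the discounted objective (5) from Section II, the Python sketch below evaluates both. It is a minimal sketch, not the authors' implementation: the function names `per_stage_reward` and `discounted_return` are hypothetical, and the return is truncated to a finite list of rewards instead of the infinite sum in (5).

```python
def per_stage_reward(q, action, r_T, c_T, c_S, Q2):
    """Per-stage reward R(q, nu) of eq. (4).

    q  -- information state p_t, the estimated probability that the PU is IDLE
    Q2 -- PU packet error rate when the SU-Tx transmits during a BUSY slot
    """
    if action == "T":
        p_nak = (1.0 - q) * Q2      # prior probability that the PU-Rx sends a NAK
        return r_T - c_T * p_nak
    if action == "S":
        return -c_S                 # sensing only incurs its cost
    if action == "I":
        return 0.0                  # idling neither earns nor costs
    raise ValueError(f"unknown action: {action!r}")


def discounted_return(rewards, alpha):
    """Finite-horizon truncation of the sum in eq. (5), starting at t_s = 0."""
    return sum(alpha ** t * r for t, r in enumerate(rewards))


# Setting-A values from Section IV: r_T = 5, c_T = 15, c_S = 5, Q2 = 0.99.
print(per_stage_reward(0.9, "T", r_T=5, c_T=15, c_S=5, Q2=0.99))
```

With these values, $q = 0.9$ gives an immediate transmit reward of $5 - 15 \cdot 0.1 \cdot 0.99 = 3.515$. Taken alone, (4) makes transmission immediately profitable only when $(1-q)Q_2 < r_T/c_T$; the optimized policy also weighs the value of future observations, so it is not simply this threshold.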

Fig. 2. Sequences of spectrum access (PU activity: DATA and Feedback over time; SU actions $a_0 = S$, $a_1 = I$, $a_2 = S$, $a_3 = T$, $a_4 = T$ with the associated observations).

Fig. 2 shows a possible sequence of PU/SU activities in time. No observation is associated with the action $\nu = I$. At time $t$, for action $a_t = T$, we have

$$ O_t^{T} \in \Omega^{T} = \{\text{ACK}, \text{NAK}, \varnothing\}, \tag{7} $$

where "ACK" stands for the event of receiving an ACK from the PU-Rx, "NAK" denotes the event of receiving a NAK packet from the PU-Rx, possibly caused by SU interference, and "$\varnothing$" stands for the case in which the SU-Tx fails to decode any feedback from the PU-Rx. This may be due either to a high noise/interference level at the SU-Tx or to an idle PU. For action $a_t = S$, the SU-Tx observes

$$ O_t^{S} \in \Omega^{S} = \{I_\varnothing, I_A, I_N, B_\varnothing, B_A, B_N\}, \tag{8} $$

where $I_\varnothing$ denotes an IDLE sensing outcome while the SU-Tx does not decode any feedback message from the PU-Rx, $I_A$ denotes an IDLE outcome while an ACK is correctly decoded at the end of the time slot, and $I_N$ denotes an IDLE outcome while a NAK from the PU-Rx is correctly decoded. The symbol $B$ follows the same convention for the three cases in which the PU channel is sensed as BUSY. The SU stores the past observations since the latest PU traffic pattern transition in the vector $O_t^{-}$. For example, if at time $t$ the latest transition epoch is $\tau_0(t)$, the vector $O_t^{-}$ is

$$ O_t^{-} = \left[O_{\tau_0(t)}^{a_{\tau_0(t)}}, \ldots, O_{t-1}^{a_{t-1}}\right]. \tag{9} $$

A. Information state update equations

At time $t$, the SU-Tx takes action $a_t = \nu$ and uses $O_t^{\nu}$ and $O_t^{-}$ to determine the next PU traffic transitions via a MAP estimator. Since the MAP estimator is error-prone, the SU-Tx bases its channel admission control on the information state $p_t$, which refines its knowledge of the PU channel state. Let $\tau_j(t)$ be the latest estimated PU traffic transition at $t$; the information state is defined $\forall t > 0$ as

$$ p_t = P\left[s_t = 1 \mid O_{t-1}^{a_{t-1}}, \tau_j(t-1)\right]. \tag{10} $$

Recall that $\tau_j(t-1)$ denotes the last estimated transition instant at $t-1$. After collecting a new $O_t^{\nu}$, the SU-Tx estimates the $j \to 1-j$ transition instant $\tau_{1-j}(t)$ via the MAP principle

$$ \tau_{1-j}(t) = \mathrm{MAP}\left(\tau_{1-j}(t) \mid O_t^{\nu}, O_t^{-}, \tau_j(t-1)\right), \tag{11} $$

which is computed as

$$ \tau_{1-j}(t) = \arg\max_{x > \tau_j(t-1)} \left\{ \ln P\left[O_t^{\nu}, O_t^{-} \mid \tau_{1-j}(t) = x\right] + \ln P\left[\tau_{1-j}(t) = x\right] \right\}. \tag{12} $$

There are two possibilities for $\tau_{1-j}(t)$:
1) $\tau_j(t-1) < \tau_{1-j}(t) < t$: the estimated transition from state $j$ to $1-j$ happened before $t$. In this case, the traffic distribution used to calculate the information state $p_t$ changes from $p_j(k) = P[D_j = k]$ to $p_{1-j}(k) = P[D_{1-j} = k]$. Moreover, the SU-Tx erases the stale $O_t^{-}$ and creates a new one associated with $\tau_{1-j}(t)$.
2) $\tau_{1-j}(t) \ge t$: the MAP estimator does not detect a transition before $t$. The SU-Tx assumes that the PU channel remains in state $j$ and discards the estimate $\tau_{1-j}(t)$.

The SU-Tx updates $p_t$ in two steps.

1) Determining the posterior probability:

$$ \hat{p}_t = P\left[s_t = 1 \mid O_t^{\nu}, \tau_j(t-1)\right]. \tag{13} $$

To find $\hat{p}_t$ in (13), the SU-Tx first determines the probabilities

$$ v_1 = P[O_t^{\nu} \mid s_t = 1], \qquad v_0 = P[O_t^{\nu} \mid s_t = 0]. \tag{14} $$

It then uses (14) to calculate the observation probability $p(O_t^{\nu}) = v_1 \cdot p_t + v_0 \cdot (1 - p_t)$, so that the posterior probability (13) is

$$ \hat{p}_t = \frac{v_1 \cdot p_t}{v_1 \cdot p_t + v_0 \cdot (1 - p_t)}. \tag{15} $$

2) Updating the information state $p_{t+1}$. Let $K_j$ be the number of time slots the PU channel has been in state $j \in \{0, 1\}$. The probability that it is still in state $j$ at time $t+1$ is

$$ P[s_{t+1} = j \mid s_t = j, \ldots, s_{t-K_j} = j] = P[D_j \ge K_j + 1 \mid D_j \ge K_j] = \frac{P[D_j \ge K_j + 1]}{P[D_j \ge K_j]}, \tag{16} $$

where $D_j$ is a random variable denoting the number of time slots spent in state $j$. For computational tractability, we assume that at most one transition may take place since the last $\tau_j(t)$. To update the probability estimate that $s_{t+1} = 1$, we define

$$ w = \sum_{k=1}^{K_j} P[D_j = k]\left(1 - \frac{P[D_{1-j} \ge K_j - k + 1]}{P[D_{1-j} \ge K_j - k]}\right). \tag{17} $$

Hence, the updated information state is

$$ p_{t+1} = \frac{P[D_j \ge K_j + 1]}{P[D_j \ge K_j]} \cdot \hat{p}_t + w \cdot (1 - \hat{p}_t). \tag{18} $$

For simplicity, we refer to this update procedure as the function

$$ p_{t+1} = \mathrm{UPDATE}\left(O_t^{a_t}, p_t, t, \tau_j(t)\right). \tag{19} $$
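The two-step update (13)-(18) can be sketched in Python as follows. This is a hedged illustration under simplifying assumptions: the MAP transition estimate (11)-(12) is taken as given, so the caller supplies $K_j$ directly; the likelihoods $v_1$ and $v_0$ of (14) are passed in rather than derived from a particular sensing/feedback model; and duration pmfs are plain dictionaries `{k: P[D = k]}`. The helper names (`survival`, `stay_probability`, `posterior`, `predict`, `update`) are hypothetical, and `predict` mirrors (17)-(18) exactly as written.

```python
def survival(pmf, m):
    """P[D >= m] for a duration pmf {k: P[D = k]} with support k >= 1."""
    return sum(p for k, p in pmf.items() if k >= m)


def stay_probability(pmf, K):
    """Eq. (16): probability of staying one more slot after K slots in a state."""
    denom = survival(pmf, K)
    # Assumption: if the run already exceeds the support, the state must end.
    return survival(pmf, K + 1) / denom if denom > 0 else 0.0


def posterior(p_t, v1, v0):
    """Eq. (15): Bayes update with v1 = P[O | s=1] and v0 = P[O | s=0]."""
    evidence = v1 * p_t + v0 * (1.0 - p_t)   # observation probability p(O)
    return v1 * p_t / evidence if evidence > 0 else p_t


def predict(p_hat, pmf_j, pmf_other, K_j):
    """Eqs. (17)-(18): one-slot prediction, at most one transition since tau_j."""
    w = sum(pmf_j.get(k, 0.0) * (1.0 - stay_probability(pmf_other, K_j - k))
            for k in range(1, K_j + 1))
    return stay_probability(pmf_j, K_j) * p_hat + w * (1.0 - p_hat)


def update(p_t, v1, v0, pmf_j, pmf_other, K_j):
    """Simplified stand-in for UPDATE in (19): posterior, then prediction."""
    return predict(posterior(p_t, v1, v0), pmf_j, pmf_other, K_j)


idle_pmf = {k: 1 / 20 for k in range(1, 21)}   # IDLE ~ uniform on [1, 20]
busy_pmf = {k: 1 / 10 for k in range(1, 11)}   # BUSY ~ uniform on [1, 10]
print(update(0.5, 0.9, 0.1, idle_pmf, busy_pmf, K_j=5))
```

For instance, with IDLE durations uniform over [1, 20] (the distribution used in Section IV) and $K_1 = 5$ slots already spent IDLE, a certain-IDLE posterior $\hat{p}_t = 1$ propagates to $p_{t+1} = P[D_1 \ge 6]/P[D_1 \ge 5] = 15/16$.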

B. Optimal value function and policy calculation

We define $V(p, t, \tau_j(t))$ as the maximum expected discounted value that the SU-Tx can obtain from time slot $t$ with information state $p$, assuming that the latest PU traffic transition was to state $j$. From the Bellman equation,

$$ V(p, t, \tau_j(t)) = \max_{a \in \{I, S, T\}} \{V_a(p, t, \tau_j(t))\}, \tag{20} $$

where $V_I(p, t, \tau_j(t))$, $V_S(p, t, \tau_j(t))$ and $V_T(p, t, \tau_j(t))$ are the value functions associated with the actions "Idling", "Sensing" and "Transmitting", respectively. They are defined as

$$ \begin{aligned} V_I(p, t, \tau_j(t)) &= R(p, I) + \alpha \cdot V\big(q = \mathrm{UPDATE}(\{\varnothing\}, p, t, \tau_j(t))\big); \\ V_S(p, t, \tau_j(t)) &= R(p, S) + \alpha \sum_{O^S \in \Omega^S} P[O^S]\, V\big(q = \mathrm{UPDATE}(O^S, p, t, \tau_j(t))\big); \\ V_T(p, t, \tau_j(t)) &= R(p, T) + \alpha \sum_{O^T \in \Omega^T} P[O^T]\, V\big(q = \mathrm{UPDATE}(O^T, p, t, \tau_j(t))\big). \end{aligned} \tag{21} $$
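One way to see how (20)-(21) turn into a policy is value iteration on a discretized information state. The Python sketch below is a simplified stand-in, not the authors' solver. It drops the dependence on $t$ and $\tau_j(t)$ by assuming geometric BUSY/IDLE durations (a Markov approximation whose means match the uniform distributions of Section IV), and it assumes a hypothetical observation model: an IDLE PU produces no feedback, while in a BUSY slot the PU-Rx sends a NAK with probability $Q_2$ (SU transmitting) or $Q_1$ (SU not transmitting), which the SU-Tx decodes with probability $\eta$. The values of $\eta$, $P_F$, $P_M$ and $\alpha$ are illustrative assumptions, and all names are hypothetical.

```python
# Illustrative parameters: setting-A costs from Section IV; ETA, P_F, P_M and
# ALPHA are assumptions, not values from the paper.
R_T, C_T, C_S = 5.0, 15.0, 5.0
Q1, Q2 = 0.01, 0.99
ETA, P_F, P_M, ALPHA = 0.5, 0.1, 0.1, 0.9
# Geometric-duration approximation: per-slot probability that a run ends.
END_IDLE, END_BUSY = 1 / 10.5, 1 / 5.5
N = 100
GRID = [i / N for i in range(N + 1)]   # discretized information state p


def interp(x, values):
    """Linear interpolation of a function sampled on GRID."""
    x = min(max(x, 0.0), 1.0)
    i = min(int(x * N), N - 1)
    frac = x * N - i
    return values[i] * (1 - frac) + values[i + 1] * frac


def feedback(q_err):
    """{obs: (P[obs | s=1], P[obs | s=0])} for overheard PU-Rx feedback."""
    return {"none": (1.0, 1 - ETA), "ACK": (0.0, ETA * (1 - q_err)),
            "NAK": (0.0, ETA * q_err)}


def observation_model(action):
    """Hypothetical likelihoods for the observation sets (7) and (8)."""
    if action == "I":
        return {"none": (1.0, 1.0)}          # idling yields no information
    if action == "T":
        return feedback(Q2)                  # SU interference: error rate Q2
    sense = {"idle": (1 - P_F, P_M), "busy": (P_F, 1 - P_M)}
    return {(x, f): (sx1 * f1, sx0 * f0)
            for x, (sx1, sx0) in sense.items()
            for f, (f1, f0) in feedback(Q1).items()}


def reward(p, action):
    """Per-stage reward (4)."""
    if action == "T":
        return R_T - C_T * (1 - p) * Q2
    return -C_S if action == "S" else 0.0


def predict(p_hat):
    """Markov stand-in for the prediction step (16)-(18)."""
    return p_hat * (1 - END_IDLE) + (1 - p_hat) * END_BUSY


def backup(p, V):
    """Right-hand sides V_I, V_S, V_T of (21) at information state p."""
    out = {}
    for a in ("I", "S", "T"):
        total = reward(p, a)
        for v1, v0 in observation_model(a).values():
            prob = v1 * p + v0 * (1 - p)          # P[O]
            if prob > 0:
                p_hat = v1 * p / prob             # posterior (15)
                total += ALPHA * prob * interp(predict(p_hat), V)
        out[a] = total
    return out


def value_iteration(tol=1e-6, max_iter=1000):
    V = [0.0] * len(GRID)
    for _ in range(max_iter):
        V_new = [max(backup(p, V).values()) for p in GRID]   # eq. (20)
        delta = max(abs(a - b) for a, b in zip(V_new, V))
        V = V_new
        if delta < tol:
            break
    policy = []
    for p in GRID:
        q = backup(p, V)
        policy.append(max(q, key=q.get))     # maximizing action of (20)
    return V, policy


V, policy = value_iteration()
```

The returned `policy` lists the maximizing action at each grid point of $p$. At $p = 0$ the sketch always idles, because sensing or transmitting cannot change a certain-BUSY belief in this model but costs $c_S$ or risks $c_T$; at $p = 1$ it always transmits. The structure in between shifts with $\eta$ and $c_S$.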

In (21), the set $\Omega^{\nu}$ is defined in (7) and (8) for $\nu = T$ and $\nu = S$, respectively; $R(p, \nu)$ is the per-stage reward function (4) associated with action $\nu$, and $q = \mathrm{UPDATE}(O^{\nu}, p, t, \tau_j(t))$ is the information-state update procedure (19) described earlier. We omit the time indices of the current and next states for simplicity.

IV. SIMULATION RESULTS

We now test our optimized SU access policy through computer simulations. In the tests, the BUSY and IDLE durations are uniformly distributed over [1, 10] and [1, 20] slots, respectively. Our performance metrics are the PU packet collision probability ($P_{coll}$) and the SU throughput

$$ SU_{th} = \frac{\sum_{t=0}^{T} \mathbb{I}(a_t = T)}{T + 1}, \tag{22} $$

where $\mathbb{I}(\cdot)$ is the indicator function and $T$ is the test duration. In all our tests, we set $T = 100$. To examine the effect of cost/reward on the SU behavior, we test the two settings of costs and reward listed below. The sensing cost in setting B is much lower than in setting A. We let $Q_1 = 0.01$ and $Q_2 = 0.99$, implying that a collision between simultaneous PU and SU transmissions will almost certainly lead to a NAK from the PU-Rx.

Set A: $r_T = 5$, $c_T = 15$, $c_S = 5$.
Set B: $r_T = 5$, $c_T = 20$, $c_S = 0.5$.

Fig. 3. Total percentage of sensing, transmission and idle time as $\eta$ varies: (a) under setting A; (b) under setting B.
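The simulated PU traffic (BUSY and IDLE durations drawn uniformly from [1, 10] and [1, 20] slots, with the first BUSY cycle starting at $t = 0$) and the throughput metric (22) can be reproduced with a few lines of Python. This is an illustrative sketch: the function names are hypothetical, and since the text does not spell out how $P_{coll}$ is computed, `collision_rate` assumes it is the fraction of BUSY slots in which the SU-Tx transmits.

```python
import random


def pu_traffic(n_slots, rng, busy=(1, 10), idle=(1, 20)):
    """PU channel states s_t of eq. (2): 0 = BUSY, 1 = IDLE.

    Durations are uniform integers on the given ranges; the first BUSY cycle
    starts at t = 0, as assumed in Section II.
    """
    states, state = [], 0
    while len(states) < n_slots:
        low, high = busy if state == 0 else idle
        states.extend([state] * rng.randint(low, high))  # inclusive bounds
        state = 1 - state
    return states[:n_slots]


def su_throughput(actions):
    """Eq. (22): share of the T + 1 slots (t = 0..T) in which a_t = T."""
    return sum(a == "T" for a in actions) / len(actions)


def collision_rate(actions, states):
    """Assumed P_coll: fraction of BUSY slots in which the SU-Tx transmits."""
    busy_actions = [a for a, s in zip(actions, states) if s == 0]
    if not busy_actions:
        return 0.0
    return sum(a == "T" for a in busy_actions) / len(busy_actions)


T = 100                                   # test duration used in Section IV
states = pu_traffic(T + 1, random.Random(0))
oracle = ["T" if s == 1 else "I" for s in states]
print(su_throughput(oracle), collision_rate(oracle, states))
```

An oracle SU that transmits exactly in the IDLE slots has a collision rate of 0 and a throughput equal to the fraction of IDLE slots, which gives a convenient collision-free reference for any learned policy.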

Fig. 3 compares the SU behavior as a function of $\eta$ under the two cost/reward settings. The transmission percentage can also be regarded as a throughput metric. As the decoding probability $\eta$ increases, the feedback information becomes more reliable; therefore, the SU throughput increases and the percentage of time spent idle decreases. Since setting A has a high sensing cost, the SU-Tx senses the spectrum less frequently than under setting B. In both cases, as $\eta$ increases and the ACK/NAK information becomes more reliable, the transmission time increases.

Fig. 4(a) compares the SU throughput of the two settings, whereas Fig. 4(b) shows the corresponding collision rate under the same conditions. When $\eta = 0$, the optimal decision is based solely on the sensing outcome. In this case, the SU-Tx achieves a higher throughput under setting B, whose lower sensing cost allows the SU-Tx to sense more and transmit more. When ACK/NAK decoding is less reliable ($0.1 \le \eta \le 0.35$), the SU throughput under setting A is higher, because its higher sensing cost discourages sensing, whereas the occasionally decoded feedback from the PU-Rx may encourage transmission in lieu of sensing. On the other hand, $c_S = 0.5$ allows the SU-Tx to be more conservative and to transmit only when it is confident of the PU channel state. This is reflected by the $P_{coll}$ shown in Fig. 4(b), where sensing more frequently clearly leads to fewer collisions with the PUs. When the PU-Rx feedback becomes more reliable ($\eta > 0.35$), both settings yield similar SU throughput, as the SU can rely more on the PU-Rx feedback and less on sensing. When $c_S = 0.5$ (low sensing cost), $P_{coll}$ varies little, between 5% and 8%. In this case, sensing is more desirable than relying on ACK/NAK feedback ($\eta > 0$), as ACK/NAK does little to improve the performance compared with no feedback ($\eta = 0$). More sensing leads to a lower $SU_{th}$, as less time is devoted to transmission. On the other hand, when $c_S = 5$, reliable feedback information helps reduce $P_{coll}$ from 15% to 5%. This means that when pure sensing is expensive or unreliable, feedback from the PU-Rx constitutes a reliable observation signal for PU protection.

Fig. 4. Comparison of: (a) $SU_{th}$ and (b) PU collision rate, versus $\eta$, for $c_S = 5$ and $c_S = 0.5$.

Fig. 5 shows the percentage of the three possible actions during a PU idle period for the two test cases of high versus low sensing cost. Since the "T" action allows the SU-Tx to listen on the reverse channel and update its information state through the observed feedback, when the sensing cost is higher it is preferable for the SU-Tx to either transmit or stay idle, as shown in Fig. 5(a). Fig. 5(b) shows that when the sensing cost decreases, the SU-Tx changes its behavior and senses the PU channel more frequently. In both cases, when the SU-Tx is able to decode almost all of its feedback messages ($\eta > 0.99$), the data-link-control feedback is a very reliable indicator of spectrum opportunity that allows the SU-Tx to reach a throughput of nearly 90%.

Fig. 5. SU sensing, transmission and idle time percentage during a PU idle period as a function of $\eta$ under: (a) setting A; (b) setting B.

V. CONCLUSIONS AND FUTURE WORK

We investigated means of improving the basic LBT access strategy for cognitive radio systems. By exploiting data-link-control messages that can be overheard by the SU-Tx, our approach enhances traditional spectrum sensing and more accurately determines the operating conditions of the primary reception for protection. Based on the simple ACK/NAK signals from the PU-Rx and prior knowledge of the PU idle/busy probability distributions, we applied partially observable Markov decision processes to devise an optimal channel access control strategy that maximizes the secondary user utility. Our future work includes investigating means for SU capacity enhancement by detecting the robust mode of the PU-Rx under SU interference, as well as developing more versatile ways of SU access, such as multi-level power access.

REFERENCES

[1] Y. Chen, Q. Zhao, and A. Swami, “Distributed Spectrum Sensing and Access in Cognitive Radio Networks With Energy Constraint,” IEEE Trans. Signal Processing, vol. 57, no. 2, pp. 783-797, Feb. 2009.
[2] Q. Zhao, L. Tong, A. Swami, and Y. Chen, “Decentralized cognitive MAC for opportunistic spectrum access in ad hoc networks: A POMDP framework,” IEEE J. Selected Areas in Communications, vol. 25, no. 3, pp. 589-600, Apr. 2007.
[3] Y. Chen, Q. Zhao, and A. Swami, “Joint Design and Separation Principle for Opportunistic Spectrum Access in the Presence of Sensing Errors,” IEEE Trans. Info. Theory, vol. 54, no. 5, pp. 2053-2071, May 2008.
[4] Q. Zhao, B. Krishnamachari, and K. Liu, “On myopic sensing for multichannel opportunistic access: structure, optimality, and performance,” IEEE Trans. Wireless Comm., vol. 7, no. 12, pp. 5431-5440, Dec. 2008.
[5] A. Sahai, N. Hoven, and R. Tandra, “Some fundamental limits on cognitive radio,” 42nd Allerton Conference, 2004.
[6] R. Tandra and A. Sahai, “Fundamental limits on detection in low SNR under noise uncertainty,” IEEE WirelessCom Symp. on Emerging Networks, Technologies and Standards, Hawaii, June 2005.
[7] S. Huang, X. Liu, and Z. Ding, “Short Paper: On Optimal Sensing and Transmission Strategies for Dynamic Spectrum Access,” 3rd IEEE Intl. Symp. on New Frontiers in Dynamic Spectrum Access Networks, 2008.
[8] S. Huang, X. Liu, and Z. Ding, “Optimal Sensing-Transmission Structure for Dynamic Spectrum Access,” IEEE INFOCOM, Apr. 2009.
[9] S. Srinivasa and S. Jafar, “Soft sensing and optimal power control for cognitive radio,” IEEE GLOBECOM, Nov. 2007.
[10] 3GPP Technical Specification Group Radio Access Network, Physical layer procedures (FDD) (Release 5), 3rd Generation Partnership Project Std. S25.214 V5.11.0, 2005.
[11] Air Interface for Fixed Broadband Wireless Access Systems, IEEE Std. 802.16-2004, 2004.
[12] R. Zhang and Y. C. Liang, “Exploiting hidden power feedbacks in cognitive radio networks,” 3rd IEEE Intl. Symp. on New Frontiers in Dynamic Spectrum Access Networks, 2008.
[13] F. E. Lapiccirella, S. Huang, X. Liu, and Z. Ding, “Feedback-based access and power control for distributed multiuser cognitive networks,” Info. Theory and Applications Workshop, pp. 85-89, Feb. 2009.
