Optimal Secondary User Selection Scheme for Primary Users in Cognitive Radio Networks

Mi Zhang, Pengbo Si, Yanhua Zhang
College of Electronics Information and Control Engineering, Beijing University of Technology, Beijing, China
Email: [email protected], [email protected], [email protected]

Abstract—Cognitive radio (CR) is considered one of the most promising solutions to the problem of overcrowded spectrum, since it allows secondary users (SUs) to access the spectrum dynamically. Most previous work on CR networks with primary users’ (PUs’) participation ignores the SU selection scheme that is needed when the spectrum holes cannot satisfy the requirements of all SUs. In this paper, we propose an optimal SU selection scheme whose optimization goal is to maximize the PU’s reward, defined as the sum of the rent collected from the SUs. To solve this problem, a restless bandit approach is used, which dramatically reduces the computational complexity. Furthermore, extensive simulation results are presented to show the significant performance improvement of the proposed scheme over the existing ones.

Keywords-cognitive radio; restless bandits; SU selection optimization
I. INTRODUCTION
Traditional wireless communication systems use frequency bands statically allocated by spectrum management authorities such as the Federal Communications Commission (FCC) in the US and the Bureau of Radio in China. As the density of wireless transceivers grows rapidly, frequency spectrum is becoming the scarcest resource for wireless communications. Therefore, cognitive radio (CR) has been proposed as a smart and agile technology that allows unlicensed users to utilize licensed bands opportunistically [1]. By detecting particular spectrum holes and jumping into them rapidly, CR devices could improve spectrum utilization significantly [2]. Much work has been done on the user behavior and key functions of CR devices for dynamic spectrum utilization. Most of the existing work is based on the coexistence of PUs and SUs, where PUs do not participate in the cognitive process [3]. However, as the number of SUs increases, spectrum utilization may become disordered and the efficiency may stay low. As the license holders, PUs can act as band managers that make rules and benefit from spectrum renting [4]. In such a cooperative cognitive radio scenario, SUs and PUs interact, and a secondary user must ask the primary user for permission to use the spectrum before transmitting. This information exchange gives the primary user an opportunity to guarantee quality of service (QoS) for the secondary user [5]. This is the case of a secondary spectrum market that operates in real time with the PU as manager [6]. Building on the research on cooperative cognitive radio systems, in this paper, with the optimization goal of
978-1-4577-1415-3/12/$26.00 ©2012 IEEE
maximizing the reward of the PU, we propose an optimal secondary user selection scheme. Our motivation is that cooperation between PUs and SUs will minimize interference, and that accurate and efficient selection of SUs is important in real-time cognitive radio networks. Moreover, PUs, as the band managers, have the right to benefit from renting their spectrum to other users. The SU selection system is formulated as a restless bandit system. The restless bandit formulation is an extension of the classical multi-armed bandit, which is a special type of stochastic control problem. In 1988, Whittle proposed an index policy for the restless bandit problem [7] and introduced the concept of indexability, a theory that has been developed into a powerful modeling framework in recent years [8].

The rest of this paper is organized as follows. Section II describes the system model of the secondary user selection problem. Section III formulates the problem as a restless bandit system. Section IV presents the whole SU selection process, and simulation results and discussions of the proposed scheme are presented in Section V. Section VI concludes the paper.

II. SYSTEM MODEL

In this section, we discuss the cognitive radio network architecture, the data arrival model, and the actions and optimization objective of the primary user.

A. Network Architecture

We consider the cognitive radio network based on cooperation between the SUs and the PU shown in Fig. 1, where a large number of secondary users operate in the same frequency band as the primary user. We assume a scenario in which the spectrum demand of the SUs is high, so that the free spectrum bands from the PU are insufficient. SUs exchange information with the primary user without interacting with each other. Besides, since there may be more PUs with overlapping coverage, SUs may decide to access other PUs’ networks when the spectrum resource of the current PU is so scarce that their cached data exceeds a certain threshold.

The whole cognitive network operates as follows. When SUs have data to transmit in their cache, they sense the wireless environment for spectrum holes and send spectrum request messages to the PU together with their cache information. Assume that there are $N$ SUs who send request messages, agree to obey the rules and pay the corresponding rent to the PU. Then one of the secondary users is represented as $u_n \in U = \{u_1, u_2, \ldots, u_N\}$, where $U$ is the set of all the SUs. Assume that the spectrum pooling duration is divided into $T$ equal-length time slots. For the infinite time horizon case, we choose $T$ large enough for approximation. Then one of the slots is $t \in \mathcal{T} = \{t_0, t_1, \ldots, t_{T-1}\}$, where $\mathcal{T}$ is the set of the time slots.
Figure 1. A cognitive network based on cooperation
The available spectrum in the pool is divided into $M$ subbands with a fixed bandwidth $w$. The spectrum is allocated in units of one subband, which means that SU $u_n$ can use at most one subband in each time slot. The PU needs to choose $M$ SUs among the $N$ spectrum applicants, and sends acceptance messages to these $M$ SUs and rejection messages to the others.

B. Data Transmission Model

SUs can be divided into several cache state levels according to the amount of data in their cache. Different cache state levels correspond to different rent prices: the more data an SU has to transmit, the more urgent its spectrum need is, and thus the more rent it has to pay to the PU. This is why the rent is set according to the cache state levels. If there is so much data in the cache that their spectrum needs cannot be met, SUs no longer stay but turn to other PUs’ networks.
For the SUs, data arrivals are assumed to be Poisson processes, and the expected data arrival rate of SU $n$ is denoted by $\lambda(n)$. From $\lambda(n)$, we can calculate the state transition probability matrix of the SU under different PU actions. The system state evolves as a Markov process according to this state transition probability matrix.

C. Actions and Objective

The action of SU $n$ in time slot $t$ is represented by $a_n(t) \in A = \{0, 1\}$. $a_n(t) = 1$ means that SU $n$ is active in time slot $t$, and $a_n(t) = 0$ means that SU $n$ is passive in time slot $t$. The action decision is made by the PU according to the system state, and SUs obey the resulting spectrum access command. SUs pay rent corresponding to their states, and the objective of the PU is to
maximize the total rent reward over the whole spectrum pooling duration.

III. RB FORMULATION
The restless bandit formulation is an extension of the classical multi-armed bandit problem. A multi-armed bandit is a special type of stochastic control problem. In this type of problem, there are $N$ parallel projects, each of which has a finite state space. At each discrete time instant, exactly one project is active. In the restless bandit problem, multiple projects can be active, all projects evolve at each time instant, and a reward is obtained in each time slot. Projects are selected under a policy $u \in U$, where $U$ is the set of all admissible policies. The problem is to find the optimal $u$ so that the optimization objective is achieved.

A. System State

The system state is composed of the states of the potential SUs. The state of SU $n$ in time slot $t$ is determined by the quantity of data in its cache. The set of states of each SU can be represented by $I = \{I_0, I_1, \ldots, I_{G-1}, I_G\}$, where $G+1$ is the total number of cache states, and $i_n$ denotes $I_n(t)$, the state of SU $n$ in time slot $t$. $I_0$ means that there is no data to transmit, and $I_G$ means that there is so much data that the SU turns to other PUs’ spectrum holes. It is also assumed that SUs send their cache messages to the PU at the beginning of each time slot only when $i_n \in \{I_1, \ldots, I_{G-1}\}$.
We use $\varepsilon_g$ to represent the upper bound of state $I_g$, and $\varepsilon$ the actual amount of data in the cache. Then $i_n$ can be described by

$$ i_n = I_g, \quad \varepsilon_{g-1} < \varepsilon \le \varepsilon_g, \quad g \in [1, G]. $$
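As a minimal sketch of this state division, the following Python function maps a cache amount to its state number. The function name `cache_state` and the threshold values are hypothetical, and the sketch assumes that each upper bound belongs to its own state and that the last threshold is infinite.

```python
from bisect import bisect_left
import math

def cache_state(cache_amount, thresholds):
    """Return g such that the SU is in state I_g.

    thresholds holds the upper bounds [eps_1, ..., eps_G] in increasing
    order (last one infinite); state 0 means the cache is empty.
    """
    if cache_amount <= 0:
        return 0
    # bisect_left gives the position of the first threshold >= cache_amount,
    # i.e. the g-1 for which eps_{g-1} < cache_amount <= eps_g.
    return bisect_left(thresholds, cache_amount) + 1

# Hypothetical thresholds for G = 4 (five states I_0 .. I_4).
thresholds = [10.0, 20.0, 30.0, math.inf]
states = [cache_state(x, thresholds) for x in (0, 5, 10, 10.5, 30, 1000)]
```

With these illustrative thresholds, a cache amount above 30 falls into the last state, in which the SU turns to other networks.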
The upper bound of state $I_G$ should be $\varepsilon_G = \infty$. Then the set $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_G\}$ can be used to describe the SUs’ state division thresholds.

B. State Transition Probabilities

The state transition probability matrix of the SU taking action $a$ is denoted as $P_n^a = [p^a_{i_n j_n}(t)]_{(G+1)\times(G+1)}$, where

$$ p^a_{i_n j_n}(t) = \Pr\big(I_n(t+1) = j_n \mid I_n(t) = i_n,\ a_n(t) = a\big), \quad i_n, j_n \in I,\ a \in A. $$

$P_n^a$ can also be written as

$$ P_n^a = \begin{bmatrix} p_{EE} & p^a_{01} & p^a_{02} & \cdots & p^a_{0G} \\ p^a_{10} & p^a_{11} & p^a_{12} & \cdots & p^a_{1G} \\ \vdots & & \ddots & & \vdots \\ p_{BK} & & \cdots & & p_{OT} \end{bmatrix} $$
where $p_{EE}$ denotes the probability that the cache of the SU stays empty during the time slot, so that $p^a_{01} + p^a_{02} + \cdots + p^a_{0G} = 1 - p_{EE}$. $p_{BK}$ denotes the probability that an SU comes back to the PU’s network after turning to other networks because of too much data, and $p_{OT}$ denotes the probability that the SU is still in other networks, with $p_{OT} + p_{BK} = 1$.

The other elements of $P_n^a$ depend on the SUs’ actions, the expected data arrival rate $\lambda(n)$ and the subband bandwidth $w$. Recall the Poisson probability distribution

$$ P(X = k) = \frac{e^{-\lambda} \lambda^k}{k!}, \quad k = 0, 1, 2, \ldots, $$

where $X$ stands for the amount of data that arrives in the current time slot. We assume the upper bound of state $I_i$ to be $b_i$; thus the probability that an SU transfers from state $i$ to state $j$ under action $a = 0$ is given by
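$$ p^0_{ij} = P\left(b_{j-1} - \left[\frac{b_i + b_{i-1}}{2}\right] \le X \le b_j - \left[\frac{b_i + b_{i-1}}{2}\right]\right), \quad i, j = 1, 2, \ldots, G. $$

If $a = 1$, then

$$ p^1_{ij} = P\left(b_{j-1} - \left[\frac{b_i + b_{i-1}}{2}\right] + w \le X \le b_j - \left[\frac{b_i + b_{i-1}}{2}\right] + w\right), \quad i, j = 1, 2, \ldots, G. $$

C. System Reward

We define the rent set as $R = \{r_{I_0}, r_{I_1}, \ldots, r_{I_{G-1}}, r_{I_G}\}$, corresponding to the state set $I$ of the SUs, where $r_{I_0} = r_{I_G} = 0$ and $r_{I_1} < r_{I_2} < \cdots < r_{I_{G-1}}$. An instantaneous reward $R^a_{i_n}(t)$ is accrued in each time slot $t$ when SU $n$ is in state $i_n$ and takes action $a$: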
$$ R^a_{i_n}(t) = \begin{cases} 0, & a = 0 \\ r_{i_n}, & a = 1 \end{cases} \qquad r_{i_n} \in R. $$

Then the instantaneous reward $r^a(t)$ accrued in each time slot for the PU can be presented as $r^a(t) = \sum_{n=1}^{N} R^a_{i_n}(t)$.
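The Poisson-based transition probabilities of Section III-B can be sketched in Python as follows. This is only an illustration under stated assumptions: the current cache of an SU in state $i$ is approximated by the midpoint of that state's interval, the square brackets in the formulas are treated as plain grouping, an active SU ($a = 1$) drains $w$ units of data per slot, and the row for the last state (governed by $p_{BK}$ and $p_{OT}$) is not covered. The function names and the bounds `b` are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def poisson_between(lo, hi, lam):
    """P(lo <= X <= hi) for integer-valued X ~ Poisson(lam); hi may be math.inf."""
    k_lo = max(0, math.ceil(lo))
    if hi == math.inf:
        # Upper tail: one minus the mass strictly below k_lo.
        return 1.0 - sum(poisson_pmf(k, lam) for k in range(k_lo))
    k_hi = math.floor(hi)
    return sum(poisson_pmf(k, lam) for k in range(k_lo, k_hi + 1))

def transition_prob(i, j, a, b, w, lam):
    """Probability of moving from state i to state j under action a.

    b[g] is the upper bound of state g, with b[0] = 0 and b[G] = inf.
    Valid for i = 1 .. G-1 and j = 1 .. G (assumption of this sketch).
    """
    midpoint = (b[i] + b[i - 1]) / 2.0   # assumed current cache amount
    shift = midpoint - a * w             # cache left after (possible) transmission
    return poisson_between(b[j - 1] - shift, b[j] - shift, lam)

# Hypothetical bounds for G = 4, with the subband width and arrival rate
# taken as 8 and 10 units per slot.
b = [0.0, 9.0, 19.0, 29.0, math.inf]
row_passive = [transition_prob(1, j, 0, b, 8.0, 10.0) for j in range(1, 5)]
row_active = [transition_prob(1, j, 1, b, 8.0, 10.0) for j in range(1, 5)]
```

Because both ends of each interval are inclusive, neighbouring intervals share an endpoint; the illustrative bounds are chosen so that no shared endpoint is an integer, which keeps the passive row summing to one. In the active row the missing mass corresponds to arrivals too small to leave any data in the cache.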
D. Indices and Policies

The index for SU $n$ in state $i_n$ is represented as $\delta_{i_n}$. The optimal policy has an index rule: the $M$ SUs with the smallest indices in a given time slot $t$ act as the active SUs. That is, assuming $\{\delta_{k_1}, \delta_{k_2}, \ldots, \delta_{k_N}\}$ to be the set of indices arranged from the smallest value to the largest value in time slot $t$, the SU’s action should be

$$ a_n(t) = \begin{cases} 1, & \text{if } n \in \{k_1, k_2, \ldots, k_M\} \\ 0, & \text{otherwise.} \end{cases} $$
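The index rule and the per-slot reward can be illustrated with a short Python sketch. The function names, the index values and the rent values below are hypothetical; ties between equal indices are broken by SU order, which the text does not specify.

```python
def select_active(indices, M):
    """Return the action vector a_n(t): 1 for the M SUs with the smallest indices."""
    order = sorted(range(len(indices)), key=lambda n: indices[n])  # stable sort
    active = set(order[:M])
    return [1 if n in active else 0 for n in range(len(indices))]

def slot_reward(states, actions, rents):
    """Rent collected by the PU in one slot: active SUs pay the rent of their state."""
    return sum(rents[s] for s, a in zip(states, actions) if a == 1)

# Hypothetical example with N = 5 SUs, M = 2 subbands and G = 4.
indices = [0.3, -1.2, 0.0, -0.5, 2.0]   # one index per SU for its current state
states = [1, 3, 2, 2, 0]                # current state number of each SU
rents = [0, 1, 2, 3, 0]                 # rent per state; first and last are zero
actions = select_active(indices, 2)     # SUs 1 and 3 become active
reward = slot_reward(states, actions, rents)
```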
Let $U$ denote the class of all admissible policies. An admissible policy $u \in U$ is a $T \times N$ matrix whose element in the $t$-th row and the $n$-th column is $a_n(t)$, representing the
action on SU $n$ in time slot $t$. In each time slot, the number of active SUs is equal to $M$. We denote the discount factor by $\beta$ ($0 < \beta < 1$). The total expected discounted reward over the time horizon is

$$ Z(u) = (\beta^0, \beta^1, \ldots, \beta^{T-1}) \times (r(0), r(1), \ldots, r(T-1))'. $$
The optimal policy $u^*$ is the policy that achieves the optimization objective

$$ u^* = \arg\max_{u \in U} Z(u). $$
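For a given sequence of per-slot rewards, the discounted total is simply a weighted sum. The sketch below evaluates it and picks the better of two candidate reward sequences; the function name and the reward values are hypothetical, and a real policy search would of course not enumerate policies this way.

```python
def discounted_reward(rewards, beta):
    """Sum of beta**t * r(t) over the slots t = 0 .. T-1."""
    return sum(beta ** t * r for t, r in enumerate(rewards))

# Hypothetical per-slot rewards produced by two candidate policies.
candidates = {
    "policy_a": [5, 5, 5],
    "policy_b": [6, 4, 4],
}
beta = 0.9
scores = {name: discounted_reward(r, beta) for name, r in candidates.items()}
best = max(scores, key=scores.get)   # arg max over the candidate set
```

With a discount factor below one, early rewards weigh more, so two sequences with similar sums can still rank differently.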
E. Solving the Restless Bandits Problem

To solve the restless bandit problem, a hierarchy of increasingly stronger LP relaxations is developed based on the LP formulations of Markov decision chains (MDCs) [9]. Then, using the information contained in the optimal primal and dual solutions to the first-order relaxation, we build a heuristic. The primal-dual heuristic can also be interpreted as a priority-index heuristic. The priority-index rule is to select the $M$ senders that have the smallest indices to be active. In each time slot, the PU looks up the table for the current $(i_1, i_2, \ldots, i_N)$ and chooses the $M$ SUs with the lowest indices.

IV. SU SELECTION PROCESS
The SU selection process is based on the indices of each SU, which can be computed off-line for each available state of each SU and stored in a table. In the on-line stage, the PU only needs to look up the table to decide the action according to the state. The selection process is as follows.

1) Potential secondary users sense the wireless environment for spectrum holes and send spectrum request messages to the primary user of these spectrum holes, with their cache and data arrival rate information in the messages.
2) The PU divides its spare spectrum into $M$ subbands, collects the spectrum request messages, and decides the number of states, the rent set and the time slot interval. Then the PU announces this information to all potential SUs.
3) If a potential secondary user accepts the rules set by the PU, it sends an acceptance message to the PU.
4) The PU collects the acceptance messages and learns the number $N$. The PU sets up the index table off-line.
5) The PU looks up the table to find the $M$ SUs with the smallest indices according to the state, and sends messages to the SUs to tell them whether they are allowed to access the spectrum in the next time slot.
6) An SU accesses the subband and transmits its data if allowed, and does nothing otherwise.
7) At the end of the time slot, if an SU still has data to transmit and the cached data does not exceed its tolerance, it tells the PU how much data is left in its cache.
8) Go to step 5 until the PU has to use the spectrum. The PU computes the rent of each SU and sends messages to announce the end of transmission and to confirm the rent.

V. SIMULATION RESULTS AND ANALYSIS
In this section, the simulation results of the proposed optimal SU selection scheme are presented and compared with those of the existing scheme, which ignores user selection optimization. Suppose the number of cache state levels is $G+1 = 5$ and the length of each time slot is one second. The average data arrival rate of the SUs is $\lambda = 10$ Mbps, the width of the subband is $w = 8$ Mbps, the discount factor is $\beta = 0.9$, and the probabilities are $p_{EE} = 0.3$ and $p_{BK} = 0.1$. We also assume fixed values for the upper bounds of the state set $I$, $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_G\}$, and for the corresponding rent set $R = \{r_{I_0}, r_{I_1}, \ldots, r_{I_{G-1}}, r_{I_G}\}$.
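A toy simulation loop with these parameters might be sketched as follows. It is not the simulator used for the reported results. The thresholds, rents and index table are hypothetical; in particular the index table is a myopic placeholder (negative rent), not the primal-dual indices. The sketch also assumes that an SU in the empty state receives no data with probability $p_{EE}$, and that an SU returning from another network comes back with an empty cache.

```python
import math
import random

def poisson_sample(lam, rng):
    """Knuth's method: multiply uniform draws until the product drops to e^-lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def to_state(cache, bounds):
    """Map a cache amount to a state number; bounds = [eps_1, ..., eps_G]."""
    if cache <= 0:
        return 0
    return next(g + 1 for g, eps in enumerate(bounds) if cache <= eps)

def simulate(N, M, T, lam, w, beta, p_ee, p_bk, bounds, rents, index_table, seed=0):
    """Return the discounted rent collected by the PU over T slots."""
    rng = random.Random(seed)
    G = len(bounds)
    cache = [0.0] * N
    away = [False] * N            # True while the SU is in the last state I_G
    total = 0.0
    for t in range(T):
        states = [G if away[n] else to_state(cache[n], bounds) for n in range(N)]
        # On-line stage: table lookup, then the M smallest indices become active.
        order = sorted(range(N), key=lambda n: index_table[states[n]])
        active = set(order[:M])
        total += beta ** t * sum(rents[states[n]] for n in active)
        for n in range(N):
            if away[n]:
                if rng.random() < p_bk:          # SU comes back (assumed empty)
                    away[n], cache[n] = False, 0.0
                continue
            if n in active:
                cache[n] = max(0.0, cache[n] - w)
            if states[n] == 0 and rng.random() < p_ee:
                continue                          # cache stays empty this slot
            cache[n] += poisson_sample(lam, rng)
            away[n] = to_state(cache[n], bounds) == G
    return total

# Hypothetical thresholds, rents and placeholder indices for G + 1 = 5 states.
bounds = [10.0, 20.0, 30.0, math.inf]
rents = [0, 1, 2, 3, 0]
index_table = [math.inf, -1, -2, -3, math.inf]
total = simulate(N=8, M=2, T=200, lam=10, w=8, beta=0.9, p_ee=0.3, p_bk=0.1,
                 bounds=bounds, rents=rents, index_table=index_table, seed=1)
```

Replacing `index_table` with indices obtained from the primal-dual heuristic, or the selection line with a rule that ignores the states, would give the two schemes being compared; the sketch makes no claim about the resulting numbers.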
While the upper bounds of the state set $I$ and the corresponding rent set are static, the data arrival rate may differ among SUs. Fig. 4 shows the effect of different $\lambda$, from which we can see that there is a data arrival rate at which the PU obtains the maximum reward. When the data arrival rate is large, the current PU’s network may not be able to satisfy the SUs’ demands; consequently, more SUs leave the current network and the mean total reward of the PU decreases. This implies that it is important for the PUs to arrange proper state levels to get more reward.
Fig. 2 shows that, as the number of potential SUs increases, the proposed optimal decision scheme improves the PU’s reward significantly. More SUs means more choices, and each additional SU increases the mean total reward during 2000 seconds by about 5%.
Figure 4. Mean total reward with different data arrival rate $\lambda$ (x-axis: arrival data rate $\lambda$; y-axis: mean total reward; curves: Existing Selection Scheme, Optimal Selection Scheme)
Figure 2. Mean total reward with different number of SUs (x-axis: the number of SUs $N$; y-axis: mean total reward; curves: Existing Selection Scheme, Optimal Selection Scheme)
Since SUs do not always have data to transmit, $p_{EE}$ is used to describe the probability that the cache stays empty. Fig. 3 shows the effect of different cache-empty probabilities on the mean total reward. The mean total reward does not change much, since SUs in state $I_0$ have little chance of being chosen.

Figure 3. Mean total reward with different probabilities of $p_{EE}$ (x-axis: probability that the cache of an SU stays empty, $p_{EE}$; y-axis: mean total reward; curves: Existing Selection Scheme, Optimal Selection Scheme)

VI. CONCLUSIONS

In this paper, we have proposed an optimal secondary user selection scheme for primary users in cooperative cognitive networks. The problem was formulated as a restless bandit problem, whose solution simply selects the SUs with the lowest indices. The maximized total discounted reward can be obtained by solving this problem. Extensive simulation results were also presented to illustrate that the proposed optimal action decision scheme can improve the PU’s reward compared with the existing scheme.

References

[1] J. Mitola, G. Q. Maguire, “Cognitive radio: making software radios more personal,” IEEE Personal Commun, 1999, vol. 6, pp. 13-18.
[2] T. A. Weiss, F. K. Jondral, “Spectrum pooling: an innovative strategy for the enhancement of spectrum efficiency,” IEEE Commun, 2004, vol. 42, pp. 8-14.
[3] P. B. Si, F. R. Yu, et al., “Optimal cooperative internetwork spectrum sharing for cognitive radios systems with spectrum pooling,” IEEE Transactions on Vehicular Technology, 2010, vol. 59(5), pp. 1760-1768.
[4] A. T. Hoang, Y. C. Liang, M. H. Islam, “Power control and channel allocation in cognitive radio networks with primary users’ cooperation,” IEEE Transactions on Mobile Computing, 2010, vol. 9(3), pp. 348-360.
[5] J. M. Peha, “Sharing spectrum through spectrum policy reform and cognitive radio,” Proceedings of the IEEE, 2009, vol. 97(4), pp. 708-719.
[6] J. M. Peha, S. Panichpapiboon, “Real-time secondary markets for spectrum,” Telecomm Policy, 2004, vol. 28(7-8), pp. 603-618.
[7] P. Whittle, “Restless bandits: activity allocation in a changing world,” Journal of Applied Probability, 1988, Special volume: A celebration of applied probability (A festschrift for Joe Gani), pp. 287-298.
[8] J. Niño-Mora, “Restless bandits, partial conservation laws and indexability,” Adv Applied Probability, 2001, vol. 33, pp. 76-98.
[9] D. Bertsimas, J. Niño-Mora, “Restless bandits, linear programming relaxations, and a primal dual index heuristic,” Operations Research, 2000, vol. 48(1), pp. 80-90.