This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2009 proceedings.
A Distributed Network Selection Scheme in Next Generation Heterogeneous Wireless Networks
Pengbo Si†‡, F. Richard Yu‡, Hong Ji† and Victor C.M. Leung§
† Key Laboratory of Universal Wireless Communication, Ministry of Education, Beijing University of Posts and Telecommunications, Beijing, P.R. China
‡ Department of Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada
§ Department of Electrical and Computer Engineering, The University of British Columbia, Vancouver, BC, Canada
Email: [email protected], richard [email protected], [email protected] and [email protected]
Abstract: The network selection problem in next generation heterogeneous wireless networks is one of the key issues in integrating a wide range of radio access technologies. In most previous work, the network selection problem is modeled as a centralized system, without considering the distributed nature of heterogeneous wireless networks. In this paper, we propose a distributed network selection scheme for heterogeneous wireless networks. The heterogeneous wireless networks are formulated as a restless bandit system, which has an “indexability” property that dramatically reduces the complexity. Extensive simulation results are presented to illustrate the performance improvement of the proposed scheme.
I. INTRODUCTION

Recently, integrating different radio access technologies (RATs) has become popular, because the complementary characteristics of different wireless networks make it attractive to integrate a wide range of RAT standards. Several internetworking architectures have been proposed. The European Telecommunications Standards Institute (ETSI) specified loose coupling and tight coupling as the two generic integration approaches [1]. In the loose coupling approach, data flows from different types of networks go to the external IP network directly, and only signaling is required between cellular networks and other complementary networks. In the tight coupling approach, complementary networks communicate with the external network through the cellular networks. The Third Generation Partnership Project (3GPP) also proposed an internetworking architecture to enable radio resource reuse between the networks as well as authentication, authorization and accounting (AAA) [2].

A number of schemes have been proposed to deal with network integration problems, to improve the performance of heterogeneous networks and to keep users always best connected (ABC) [3]. Game theory is introduced to heterogeneous networks in [4] for radio resource management, including bandwidth allocation and admission control. The authors of [5] propose a Markovian framework for the allocation of multiple services in multiple RATs and a model to embed the evaluation of several RAT selection policies considering different allocation criteria. Analytical hierarchy process (AHP) and grey relational analysis (GRA) are used in [6] to combine multiple network selection criteria and to decide the weights of the criteria according to user preferences and service applications. In [7], several resource management and admission control schemes are proposed for cellular/WLAN integrated networks.

Although some work has been done to integrate heterogeneous wireless networks, most previous work models the network selection problem as a centralized system, without considering the distributed nature of heterogeneous wireless networks. In this paper, we propose a distributed network selection scheme for next generation heterogeneous wireless networks. The proposed scheme is based on the restless bandit approach [8] [9] [10] [11], which has an indexable rule that dramatically reduces the computational complexity. The process of new session admission in the heterogeneous network is modeled as a Markov chain. We take into account the video transmission distortion, which is a key QoS metric. In addition, the price of network access is an important factor to which users are quite sensitive. Consequently, we use a reward that combines the video distortion and the price of network access as the optimization goal of the network selection scheme. Furthermore, the network selection process is discussed.

The rest of the paper is organized as follows. In Section II, the system is modeled as a restless bandit problem. The restless bandit problem is solved and the network selection process is discussed in Section III. Simulation results are presented in Section IV. Finally, we conclude this study in Section V.

This work was jointly supported by the Hi-Tech Research and Development Program (National 863 Program) under Grant 2007AA01Z221 and 2009AA01Z246, the National Natural Science Foundation of China under Grant 60672124 and 60832009, and the Scientific Research Foundation of Graduate School of BUPT under Grant No. 6, 2006.

II. SYSTEM MODEL

In heterogeneous networks, a total of $N$ networks of multiple types cooperate to provide seamless coverage for universal wireless access.
In this paper, we consider an area covered by three types of networks: wireless local area networks (WLANs), WiMAX networks and cellular networks. Given a data rate, the authors of [12] provide a closed-form distortion model that takes into account the varying characteristics of the input video, the coding algorithm, and the intra-refreshing rate. In this paper, we adopt this distortion model.
978-1-4244-2948-6/09/$25.00 ©2009 IEEE
The optimal network selection problem is modeled as a restless bandit problem, which has an “indexability” property that dramatically reduces the computational load [13].

A. Admissible Set in WLANs

In IEEE 802.11e based WLANs, throughput and delay are important QoS metrics. An optimal operating point is determined in [14]. According to these results, we adopt the following admissible set for WLAN $n$:

$$S_n = \left\{ g(n) \in \mathbb{Z}_+^J : B^l(n) \ge TB^l(n),\ E^l(n) \le TE^l(n) \right\},$$

where $B^l(n) \ge TB^l(n)$ is the throughput constraint and $E^l(n) \le TE^l(n)$ is the delay constraint.

B. Admissible Set in WiMAX Networks

To enhance capacity and quality, adaptive modulation and coding are assumed in WiMAX networks. In [15], the probability of active mode $\theta$ is given by $P(\theta)$. We use the average transmission rate as the capacity, $C(n) = \sum_{\theta=0}^{\Theta+1} P(\theta) C(\theta)$, where $C(\theta)$ is the capacity with mode $\theta$. Thus the admissible set of WiMAX network $n$ is [15]

$$S_n = \left\{ g(n) \in \mathbb{Z}_+^J : \sum_{l=1}^{L} U^l(n) W^l(n) \le C(n) \right\},$$

where $L$ is the total number of service types, $U^l(n)$ is the number of sessions of service type $l$ in network $n$, and $W^l(n)$ is the bandwidth required by a type $l$ service.

C. Admissible Set in Cellular Networks

We take CDMA cellular networks with matched filter receivers as one type of network in the heterogeneous networks. One important physical layer QoS requirement for sessions of service type $l$ in CDMA cellular network $n$ is the signal-to-interference ratio (SIR), which should be kept above the target value $\omega_{n,l}$ [16]. Denote by $W$ the total cell bandwidth, $H_{n,l}$ the average data rate, $\rho$ the orthogonality factor, $\sigma$ the ratio between intercell interference and total intracell power, $\Lambda_u$ the path loss of session $u$, $P_p$ the power of the common control channels, and $P_N$ the power of the background noise. To guarantee the SIR requirement, the minimum base station transmission power is

$$P_T = \frac{P_p + P_N \sum_{l=1}^{L} \sum_{u=1}^{U} \dfrac{\Lambda_u}{\frac{W}{\omega_{n,l} H_{n,l}} + \rho}}{1 - \sum_{l=1}^{L} \sum_{u=1}^{U} \dfrac{\rho + \sigma}{\frac{W}{\omega_{n,l} H_{n,l}} + \rho}}.$$

Then the admissible set of CDMA network $n$ with matched filter receivers is [16]

$$S_n = \left\{ g(n) \in \mathbb{Z}_+^J : P_T \le P_T^{MAX} \right\},$$

where $P_T^{MAX}$ is the maximum available base station power.
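The WiMAX and CDMA admissibility tests above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and argument names are hypothetical, all powers and path losses are taken in linear units, and the guard that returns infinity when the load term reaches one is our own addition, not part of the paper.

```python
import math
from typing import Sequence


def wimax_admissible(sessions: Sequence[int], bandwidth: Sequence[float],
                     mode_prob: Sequence[float],
                     mode_capacity: Sequence[float]) -> bool:
    """Check sum_l U^l(n) W^l(n) <= C(n), with C(n) = sum_theta P(theta) C(theta)."""
    capacity = sum(p * c for p, c in zip(mode_prob, mode_capacity))
    load = sum(u * w for u, w in zip(sessions, bandwidth))
    return load <= capacity


def cdma_min_power(path_loss: Sequence[Sequence[float]],
                   sir_target: Sequence[float], rate: Sequence[float],
                   W: float, rho: float, sigma: float,
                   P_p: float, P_N: float) -> float:
    """Minimum base station power P_T for the given sessions.

    path_loss[l] lists the path loss of every type-l session (hypothetical
    input layout). Returns math.inf when the denominator is not positive
    (assumed guard: no finite power can meet the SIR targets).
    """
    numerator = P_p
    load = 0.0
    for l, losses in enumerate(path_loss):
        gain = W / (sir_target[l] * rate[l]) + rho
        for lam in losses:
            numerator += P_N * lam / gain
            load += (rho + sigma) / gain
    if load >= 1.0:
        return math.inf
    return numerator / (1.0 - load)


def cdma_admissible(p_t: float, p_t_max: float) -> bool:
    """Check P_T <= P_T^MAX."""
    return p_t <= p_t_max
```

A state belongs to the admissible set of a network only when the corresponding check returns True, so these predicates could serve as the membership test for $S_n$ when enumerating states.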
D. Decision Epochs and System State Space

In this paper, we assume the decision epochs to be the session arrival and departure time points, because the states change when a session arrives or departs. The time intervals between two adjacent arrival epochs and between two adjacent departure epochs are both exponentially distributed random variables, with rates $\nu = \sum_{l=1}^{L} \nu^l$ and $\sum_{l=1}^{L} U^l(t_k)\mu^l$, respectively, where $U^l(t_k)$ is the total number of type $l$ sessions in the networks at epoch $t_k$, and $\nu^l$ and $\mu^l$ are the type $l$ session arrival and departure rates. Consequently, the time intervals between epochs $(t_k, t_{k+1}]$ are exponentially distributed, with the expected number of epochs in each time unit being $\nu + \sum_{l=1}^{L} U^l(t_k)\mu^l$.

Assume the number of sessions with service type $l$ in network $n$ at epoch $t_k$ is $U^l(n, t_k)$. The state of network $n$ at epoch $t_k$ is defined as $s(n, t_k) = [U^l(n, t_k)]_{l \in \{1,2,\ldots,L\}}$, where $L$ is the number of service types. Thus the state space of network $n$ is the admissible set $S_n$.

The state of network $n$ under action $a$ evolves according to a Markov chain with the transition probability $p^a_{i,j}(n)$ from state $s_i(n) = [u_i^l(n)]_{l \in \{1,2,\ldots,L\}}$ to $s_j(n) = [u_j^l(n)]_{l \in \{1,2,\ldots,L\}}$. Define the expected interval between two epochs for the state $s_i$ as $\tau_i = E\left(t_{k+1} - t_k \mid s_i(n, t_k)\right)$, which is the inverse of the total traffic rate:

$$\tau_i = \left( \nu + \sum_{l=1}^{L} U_i^l(n)\mu^l \right)^{-1}.$$

Define the transition probability matrix of network $n$ with action $a$ to be $P^a(n) = [p^a_{i,j}(n)]_{S(n) \times S(n)}$, where $S(n)$ is the number of available states $s(n)$ of network $n$. Denote by $e(l)$, $1 \le l \le L$, the $L$-element row vector of which the $l$th element is one and the other elements are zero. The transition probabilities can then be represented as

$$p^a_{i,j}(n) = \begin{cases} \nu^l \zeta(s_j(n))\, a\, \tau_i, & \text{if } s_j(n) = s_i(n) + e(l), \\ U_i^l(n)\mu^l \tau_i, & \text{if } s_j(n) = s_i(n) - e(l), \\ 1 - \nu^l \zeta(s_j(n))\, a\, \tau_i - U_i^l(n)\mu^l \tau_i, & \text{if } s_j(n) = s_i(n), \\ 0, & \text{otherwise}, \end{cases}$$

where $\zeta(x)$ is defined as

$$\zeta(x) = \begin{cases} 1, & \text{if } x \in S_n, \\ 0, & \text{otherwise}. \end{cases}$$
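As a rough illustration of the transition probabilities defined above, the following Python sketch builds one row of the transition matrix for a given state and action. The names `transition_row` and `admissible` are hypothetical. For several service types the sketch sums the arrival and departure terms over all $l$ when computing the self-transition probability, which is our assumption for making each row sum to one.

```python
from typing import Callable, Dict, Sequence, Tuple

State = Tuple[int, ...]


def transition_row(state: State, action: int,
                   nu: Sequence[float], mu: Sequence[float],
                   admissible: Callable[[State], bool]) -> Dict[State, float]:
    """Return {next_state: probability} for one network in `state` under `action`."""
    # tau_i: inverse of the total traffic rate nu + sum_l U_i^l mu^l.
    tau = 1.0 / (sum(nu) + sum(u * m for u, m in zip(state, mu)))
    row: Dict[State, float] = {}
    for l in range(len(state)):
        # Arrival of a type-l session: only if the network is active (a = 1)
        # and the resulting state is admissible (zeta = 1).
        up = state[:l] + (state[l] + 1,) + state[l + 1:]
        if action == 1 and admissible(up):
            row[up] = nu[l] * tau
        # Departure of one of the U_i^l type-l sessions.
        if state[l] > 0:
            down = state[:l] + (state[l] - 1,) + state[l + 1:]
            row[down] = state[l] * mu[l] * tau
    # Remaining probability mass stays in the current state.
    row[state] = 1.0 - sum(row.values())
    return row
```

For example, `transition_row((1, 0), 1, [1.0, 2.0], [0.5, 0.5], lambda s: sum(s) <= 2)` yields a row over the neighbouring states `(2, 0)`, `(1, 1)` and `(0, 0)` plus the self-transition.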
E. Policy and System Reward

The policy is a set of actions corresponding to the decision epochs. Define the set of all available policies to be $\mathcal{A} = \{A\}$. The optimal policy is thus $A^* = \arg\max_{A \in \mathcal{A}} Z(A)$.

The action is the network selection decision at the current epoch. At each epoch $t_k$, one of the networks is selected to be active, meaning that it is ready to admit a new session at the next epoch $t_{k+1}$ if a new session arrives at $t_{k+1}$. For each network $n$ at epoch $t_k$,

$$a_n(t_k) = \begin{cases} 1, & \text{if network } n \text{ is active at epoch } t_k, \\ 0, & \text{if network } n \text{ is passive at epoch } t_k. \end{cases}$$

The actions satisfy $\sum_{n=1}^{N} a_n(t_k) = 1$.
The optimization goal is to maximize the total discounted reward, which is defined as

$$Z = \sum_{k=0}^{T-1} \sum_{u=1}^{U(t_k)} \beta^{T-k-1} R_u(t_k), \tag{1}$$

where $T$ is the number of epochs considered, and $R_u(t_k)$ is the reward of session $u$ at epoch $t_k$, which is defined as

$$R_u(D(u), B(u)) = \left[ -c_1 \lg(D(u)) - c_2 B(u) + c_3 \right] \tau_i, \tag{2}$$

where $D(u)$ is the distortion of session $u$ and $B(u)$ is the price paid by session $u$, which is related to the current serving network. $c_1 \ge 0$, $c_2 \ge 0$ and $c_3$ are constant coefficients. By adjusting the coefficients, the balance between distortion and price can be tuned.

Since sessions of the same service type in the same network have consistent properties, they have similar distortion. The costs of these sessions are also the same. Consequently, (1) can also be written as

$$Z = \sum_{k=0}^{T-1} \sum_{n=1}^{N} \sum_{l=1}^{L} \beta^{T-k-1} U^l(n, t_k) R^l(n),$$

where $R^l(n)$ is the reward of a session of type $l$ in network $n$. The objective of our problem is to maximize the total reward to achieve

$$Z^* = \max_{A \in \mathcal{A}} Z(A). \tag{3}$$
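To make the reward definitions (1) and (2) concrete, here is a small Python sketch. It assumes that $\lg$ denotes the base-10 logarithm, and it uses the coefficient values $c_1 = c_2 = 0.5$ and $c_3 = 34$ from the simulation section as defaults. The function names are hypothetical.

```python
import math
from typing import Sequence


def session_reward(distortion: float, price: float, tau: float,
                   c1: float = 0.5, c2: float = 0.5, c3: float = 34.0) -> float:
    """Per-session reward of (2): [-c1 lg(D) - c2 B + c3] * tau (lg assumed base 10)."""
    return (-c1 * math.log10(distortion) - c2 * price + c3) * tau


def total_discounted_reward(rewards_per_epoch: Sequence[Sequence[float]],
                            beta: float) -> float:
    """Total reward of (1): sum_k sum_u beta^(T-k-1) R_u(t_k).

    rewards_per_epoch[k] holds the rewards of all sessions present at epoch t_k.
    """
    T = len(rewards_per_epoch)
    return sum(beta ** (T - k - 1) * sum(epoch)
               for k, epoch in enumerate(rewards_per_epoch))
```

With this weighting, the most recent epoch ($k = T-1$) receives weight one and earlier epochs are discounted by increasing powers of $\beta$.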
F. Indices

The restless bandit approach has an indexable rule that reduces the computational complexity dramatically. For network $n$ in state $i_n$, we denote the index by $\delta_n(i_n)$. According to the restless bandit approach, the optimal policy $A^*$ is a set of optimal actions. Let the element of $A^*$ in row $n$ and column $k$ be $a_n^*(t_k)$, which represents the optimal action for network $n$ at epoch $t_k$; thus

$$a_n^*(t_k) = \begin{cases} 1, & \text{if } \delta_n \text{ is the smallest in } \{\delta_1, \delta_2, \ldots, \delta_N\}, \\ 0, & \text{else}. \end{cases}$$

In our network selection problem, at each epoch, the network with the smallest index $\delta_n$ is set to be active, while the other networks are passive. At the next epoch, if a session arrives, the active network admits the new session; if a session departs, only the corresponding network needs to perform the de-association action.

III. SOLVING THE RESTLESS BANDIT PROBLEM

The standard restless bandit problem allows $M$ out of $N$ objects to be active at epoch $t_k$. The reward $R^a(n)$ is earned by each object, with its state changing according to the transition probability matrix $P^a(n)$. The total reward is time-discounted by the discount factor $\beta$. The aim is to find the optimal policy $A^* \in \mathcal{A}$ to maximize the expected reward $R(A)$.

A. Solving the Restless Bandit Problem by LP Relaxation

To solve the restless bandit problem, a hierarchy of increasingly stronger LP relaxations, the last of which is exact, is developed in [9] based on LP formulations of Markov decision chains (MDCs).
To formulate the problem, we first introduce the indicator

$$I_j^a(t_k) = \begin{cases} 1, & \text{if action } a \text{ is taken at epoch } t_k \text{ in state } j, \\ 0, & \text{otherwise}. \end{cases}$$

With $I_j^a(t_k)$, let

$$x_j^a(A) = E_A\left[ \sum_{k=0}^{T-1} I_j^a(t_k) \beta^{t_k} \right] \tag{4}$$

represent the total discounted time that action $a$ is taken in state $j$ under policy $A$. Denote by $D = \{(i, a) : i \in S,\ a \in \{0, 1\}\}$ the state-action space. Consequently, (3) can be translated into

$$Z^* = \max_{A \in \mathcal{A}} \sum_{(i,a) \in D} R^a_{i_n} x_i^a(A), \tag{5}$$

where $R^a_{i_n}$ is the reward of network $n$ in state $i$ with action $a$.

Let us introduce the performance vector $x(A) = (x_j^a(A))_{j \in S,\ a \in \{0,1\}}$ under all $A \in \mathcal{A}$. We can rewrite (5) as $Z^* = \max_{x \in X} \sum_{(i,a) \in D} R^a_{i_n} x_i^a$, where $X = \{x(A) : A \in \mathcal{A}\}$. Equation (4) can be decomposed for the two admissible actions:

$$x^1_{i_n}(A) = E_A\left[ \sum_{k=0}^{T-1} I^1_{i_n}(t_k) \beta^{t_k} \right], \qquad x^0_{i_n}(A) = E_A\left[ \sum_{k=0}^{T-1} I^0_{i_n}(t_k) \beta^{t_k} \right].$$
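The performance measures in (4) can be evaluated numerically for a fixed stationary policy on a small chain. The Python sketch below does this by propagating the state distribution. It is only an illustration under assumptions: the discount is applied per epoch index ($\beta^k$ rather than $\beta^{t_k}$), the policy is deterministic and state-dependent, and the names `performance_measures`, `P` and `policy` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def performance_measures(P: Sequence[Sequence[Sequence[float]]],
                         policy: Sequence[int],
                         start: Sequence[float],
                         beta: float, T: int) -> Dict[Tuple[int, int], float]:
    """Return x[(j, a)] = sum_{k<T} beta^k * Pr(state j and action a at epoch k).

    P[a][i][j] is the transition probability from state i to j under action a,
    policy[i] is the action (0 or 1) taken in state i, and start is the
    initial state distribution.
    """
    n = len(start)
    dist: List[float] = list(start)
    x = {(j, a): 0.0 for j in range(n) for a in (0, 1)}
    for k in range(T):
        # Accumulate discounted occupancy of each (state, action) pair.
        for j in range(n):
            x[(j, policy[j])] += (beta ** k) * dist[j]
        # Advance the state distribution by one epoch under the policy.
        dist = [sum(dist[i] * P[policy[i]][i][j] for i in range(n))
                for j in range(n)]
    return x
```

A quick sanity check is that the entries of `x` sum to $(1 - \beta^T)/(1 - \beta)$, since exactly one state-action pair is occupied at every epoch.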
Thus the restless bandit problem can be formulated as the linear program

$$Z^* = \max_{x \in X} \sum_{n \in \{1,2,\ldots,N\}} \sum_{i_n \in S_n} \sum_{a_n \in \{0,1\}} R^{a_n}_{i_n} x^{a_n}_{i_n},$$

where $X = \{ x = (x^{a_n}_{i_n}(A))_{i_n \in S_n,\ a_n \in \{0,1\},\ n = 1,\ldots,N} \mid A \in \mathcal{A} \}$.

The approach to solving this problem is to construct relaxations of polytope $X$ that yield polynomial-size relaxations of the linear program. Denote by $\hat{X} \supseteq X$ the relaxations, formulated not on the space of the original variables $x_i^a$, but in a higher-dimensional space that includes new auxiliary variables [9]. The first-order relaxation can now be formulated as the linear program

$$Z^1 = \max \sum_{n \in \{1,2,\ldots,N\}} \sum_{i_n \in S_n} \sum_{a_n \in \{0,1\}} R^{a_n}_{i_n} x^{a_n}_{i_n}$$

subject to

$$x_n \in Q^1_n, \quad n \in \{1, 2, \ldots, N\},$$
$$\sum_{n \in \{1,2,\ldots,N\}} \sum_{i_n \in S_n} x^1_{i_n} = \frac{M}{1 - \beta}. \tag{6}$$

This linear program has $O(N |S_{max}|)$ variables and constraints, where $|S_{max}| = \max_{n \in \{1,2,\ldots,N\}} |S_n|$, so its size is polynomial in the problem dimensions.
B. Primal-Dual Priority-Index Heuristic

In this subsection, a heuristic for the restless bandit problem is presented that uses the information contained in the optimal primal and dual solutions to the first-order relaxation. The primal-dual heuristic can also be interpreted as a priority-index heuristic. The dual of (6) is

$$D^1 = \min \sum_{n \in \{1,2,\ldots,N\}} \sum_{j_n \in S_n} \alpha_{j_n} \lambda_{j_n} + \frac{M}{1 - \beta} \lambda$$

subject to

$$\lambda_{i_n} - \beta \sum_{j_n \in S_n} p^0_{i_n j_n} \lambda_{j_n} \ge R^0_{i_n}, \quad i_n \in S_n,\ n = 1, \ldots, N,$$
$$\lambda_{i_n} - \beta \sum_{j_n \in S_n} p^1_{i_n j_n} \lambda_{j_n} \ge R^1_{i_n}, \quad i_n \in S_n,\ n = 1, \ldots, N,$$
$$\lambda \ge 0. \tag{7}$$

We denote by $\{x^{a_n}_{i_n}\}$ and $\{\lambda_{i_n}, \lambda\}$ the optimal primal and dual solution pair to the first-order relaxation (6) and its dual (7). Let $\{\gamma^{a_n}_{i_n}\}$ represent the corresponding optimal reduced cost coefficients:

$$\gamma^0_{i_n} = \lambda_{i_n} - \beta \sum_{j_n \in S_n} p^0_{i_n j_n} \lambda_{j_n} - R^0_{i_n},$$
$$\gamma^1_{i_n} = \lambda_{i_n} - \beta \sum_{j_n \in S_n} p^1_{i_n j_n} \lambda_{j_n} - R^1_{i_n}, \tag{8}$$

which must be nonnegative. Furthermore, $\gamma^0_{i_n}$ and $\gamma^1_{i_n}$ can be interpreted as the rates of decrease in the objective value of linear program (6) per unit increase in the value of the variables $x^0_{i_n}$ and $x^1_{i_n}$, respectively.

We define a directed graph from the transition probabilities for each network $n \in \{1, 2, \ldots, N\}$: $G_n = (S_n, A_n)$, where $A_n = \{(i_n, j_n) \mid p^0_{i_n j_n} > 0 \text{ and } p^1_{i_n j_n} > 0,\ i_n, j_n \in S_n\}$. Under the mixing assumption that $G_n$ is connected for every $n$, every extreme point $x$ of polytope $P^1$ has the following properties [9]:
1) There is at most one network $m$ and one state $i_m \in S_m$ for which $x^1_{i_m} > 0$ and $x^0_{i_m} > 0$.
2) For all other networks $n$ and all other states, either $x^1_{i_n} > 0$ or $x^0_{i_n} > 0$.

Based on the cost coefficients computed in (8), the index of network $n$ in state $i_n$ is defined as

$$\delta_{i_n} = \gamma^1_{i_n} - \gamma^0_{i_n}. \tag{9}$$

The priority-index rule is to select the network with the smallest index to be active. In case of ties, the network with $x^1_{i_n} > 0$ is set active.

C. The Process of the Optimal Scheme

Network selection is performed in a distributed and cooperative way, and can be divided into an off-line stage and an on-line stage. In the off-line stage, the indices are calculated for all states and actions and stored in a table. In the on-line stage, a network looks up its table to find the index corresponding to the current state and action.

Fig. 1. The expected reward comparison with different session departure rate μ. (Plot: expected reward versus user departure rate, for the optimal scheme and the existing scheme.)
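The off-line and on-line stages of Section III-C can be sketched in Python as follows. The off-line part evaluates the reduced costs (8) and the index (9) for every state of one network; the on-line part looks up the index for each network's current state and activates the network with the smallest one. This is a hedged illustration: the dual solution `lam` is assumed to come from an LP solver that is not shown, ties are broken by the lowest network number (a simplification of the tie rule in the text), and all function names are hypothetical.

```python
from typing import List, Sequence


def index_table(lam: Sequence[float],
                P0: Sequence[Sequence[float]], P1: Sequence[Sequence[float]],
                R0: Sequence[float], R1: Sequence[float],
                beta: float) -> List[float]:
    """Off-line stage: index delta_i = gamma^1_i - gamma^0_i for every state i."""
    n = len(lam)
    table = []
    for i in range(n):
        # Reduced costs of (8) for the passive (0) and active (1) actions.
        gamma0 = lam[i] - beta * sum(P0[i][j] * lam[j] for j in range(n)) - R0[i]
        gamma1 = lam[i] - beta * sum(P1[i][j] * lam[j] for j in range(n)) - R1[i]
        table.append(gamma1 - gamma0)
    return table


def select_active(tables: Sequence[Sequence[float]],
                  states: Sequence[int]) -> List[int]:
    """On-line stage: one-hot action vector activating the smallest-index network."""
    current = [tables[n][states[n]] for n in range(len(tables))]
    best = min(range(len(current)), key=lambda n: current[n])
    return [1 if n == best else 0 for n in range(len(current))]
```

Because each network only needs its own table and its own current state, the lookup can run locally; the networks then only need to exchange their current index values to agree on which one is active.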
IV. SIMULATION RESULTS AND DISCUSSIONS

In this section, extensive simulation results are presented to show the performance improvement of the proposed optimal network selection scheme. The area considered is covered by three networks: a WLAN, a WiMAX network and a cellular network. In the WLAN, the channel bit rate is 11 Mbps, the slot length is 10 μs, and the maximum contention window size is 32. In the WiMAX network, the system bandwidth is 7 MHz, the number of OFDM sub-carriers is 256 with a guard period ratio of 1/4 and a sampling factor of 8/7, and the average SNR is 15 dB. In the cellular network, the target SIR value is 10 dB, the total cell bandwidth is 3.84 Mc/s, the inter/intra cell interference ratio is 0.55, the orthogonality factor is 0.4, the common control channel power is 33 dB, the background noise power is -106 dB, and the maximum base station power is 43 dB.

We use the current Internet access prices in New York City: 3G mobile Internet access provided by AT&T is $60 per month with a limit of 5 GB; WiMAX broadband access by Sprint/Nextel is $59.99 per month, also with a 5 GB limit; WiFi access by AT&T is $19.95 per month. We assume $c_1 = c_2 = 0.5$ and $c_3 = 34$ in (2). We compare the proposed scheme with the existing scheme, in which no application layer QoS is considered and each individual network is optimized separately.

A. Reward with Different Traffic Rates

In this subsection, we present the effect of the traffic rate on the expected reward. We take the average value of the reward after 60 minutes (the convergence time) as the expected reward. As shown in Fig. 1, as the session departure rate μ increases, the number of sessions in the networks decreases, and consequently the reward decreases. We set $\nu_1 = 1.6$ and $\nu_2 = 3.2$ in this simulation. The reward of the optimal scheme is always higher than that of the existing scheme.
B. Reward with Different Price Ratios

Since prices differ between regions, we also vary the WiMAX and cellular network access prices and present the performance under different price ratios. We compare the performance of the proposed scheme with that of the existing scheme for different service prices. In Fig. 2, the reward comparison is presented with fixed WLAN and cellular network prices and a variable WiMAX price. The reward is plotted against the WLAN/WiMAX price ratio. As the WiMAX price decreases (that is, as the WLAN/WiMAX price ratio increases), the total reward goes up, and the optimal scheme always chooses the best network for the newly arriving session and thus obtains a higher reward.

We then vary the WiMAX/cellular network price ratio, with fixed WLAN and WiMAX prices and a variable cellular network price. As shown in Fig. 3, the expected reward also grows as the price ratio increases, that is, as the price of the cellular network decreases. Our optimal scheme performs the optimal action at each epoch to maximize the reward under different price conditions.

Fig. 2. The reward with variable WiMAX price. (Plot: expected reward versus the WLAN/WiMAX price ratio, for the optimal scheme and the existing scheme.)

Fig. 3. The reward with variable cellular network price. (Plot: expected reward versus the WiMAX/CDMA price ratio, for the optimal scheme and the existing scheme.)

V. CONCLUSIONS AND FUTURE WORK

In this paper, we have presented an optimal distributed network selection scheme for next generation heterogeneous wireless networks, taking into account the multimedia distortion as the application layer QoS. We have modeled the network selection problem as a restless bandit system, which has an indexable rule that dramatically reduces the computational complexity. Simulation results have been presented to show that the proposed scheme can improve the performance significantly. Future work is in progress to consider optimizing application layer parameters simultaneously in the proposed framework.

REFERENCES
[1] ETSI, “Requirements and architectures for interworking between HIPERLAN/2 and 3rd Generation cellular systems,” Tech. Rep. ETSI TR 101 957, Aug. 2001.
[2] 3GPP TS 23.234, v.6.2.0, “Group services and system aspects; 3GPP systems to wireless local area network (WLAN) interworking; system description (release 6),” Sept. 2004.
[3] E. Gustafsson and A. Jonsson, “Always best connected,” IEEE Wireless Commun., vol. 10, no. 1, pp. 49–55, 2003.
[4] D. Niyato and E. Hossain, “A noncooperative game-theoretic framework for radio resource management in 4G heterogeneous wireless access networks,” IEEE Trans. Mobile Comput., vol. 7, no. 3, pp. 332–345, Mar. 2008.
[5] X. Gelabert, J. Pérez-Romero, O. Sallent, and R. Agustí, “A Markovian approach to radio access technology selection in heterogeneous multiaccess/multiservice wireless networks,” IEEE Trans. Mobile Comput., vol. 7, no. 10, pp. 1257–1270, Oct. 2008.
[6] Q. Song and A. Jamalipour, “Network selection in an integrated wireless LAN and UMTS environment using mathematical modeling and computing techniques,” IEEE Wireless Commun., vol. 12, no. 3, pp. 42–48, June 2005.
[7] W. Song, H. Jiang, W. Zhuang, and X. Shen, “Resource management for QoS support in cellular/WLAN interworking,” IEEE Netw., vol. 19, no. 5, pp. 12–18, Sep. 2005.
[8] P. Whittle, “Restless bandits: activity allocation in a changing world,” in A Celebration of Applied Probability (J. Gani, ed.), vol. 25 of J. Appl. Probab., pp. 287–298, Applied Probability Trust, 1988.
[9] D. Bertsimas and J. Niño-Mora, “Restless bandits, linear programming relaxations, and a primal dual index heuristic,” Operations Research, vol. 48, no. 1, pp. 80–90, 2000.
[10] J. L. Ny, M. Dahleh, and E. Feron, “Multi-agent task assignment in the bandit framework,” in Proc. 45th IEEE Conf. Decision and Control, (San Diego, California), pp. 5281–5286, Dec. 2006.
[11] J. L. Ny and E. Feron, “Restless bandits with switching costs: Linear programming relaxations, performance bounds and limited lookahead policies,” in Proc. 2006 American Control Conf., (Minneapolis, Minnesota), pp. 1587–1592, June 2006.
[12] Z. He, J. Cai, and C. Chen, “Joint source channel rate-distortion analysis for adaptive mode selection and rate control in wireless video coding,” IEEE Trans. Circ. Sys. Video Tech., vol. 12, no. 6, pp. 511–523, June 2002.
[13] H. Robbins, “Some aspects of the sequential design of experiments,” Bulletin of the American Mathematical Society, vol. 58, pp. 527–535, 1952.
[14] H. Zhu and I. Chlamtac, “A call admission and rate control scheme for multimedia support over IEEE 802.11 wireless LANs,” Wireless Netw., vol. 12, pp. 451–463, July 2006.
[15] A. I. Elwalid and D. Mitra, “Effective bandwidth of general Markovian traffic sources and admission control of high speed networks,” IEEE/ACM Trans. Netw., vol. 1, no. 3, pp. 329–343, Jun. 1993.
[16] H. Holma and A. Toskala, WCDMA for UMTS: Radio Access for Third Generation Mobile Communications. NY: Wiley, 2004.