Optimal Call Admission and Call Dropping Control in Links with Variable Capacity. Antonio Pietrabissa. Computer and System Science Department (DIS), ...
European Journal of Control (2009) 1:56–67. © 2009 EUCA. DOI: 10.3166/EJC.15.56–67
Optimal Call Admission and Call Dropping Control in Links with Variable Capacity Antonio Pietrabissa Computer and System Science Department (DIS), University of Rome ‘‘La Sapienza’’, Rome, Italy
This paper defines a theoretical framework based on Markov Decision Processes (MDPs) to deal with call control algorithms in links with variable capacity supporting multiple classes of service. The variable capacity problem, which arises in wireless network scenarios, is addressed by incorporating the link model into the MDP formulation and by introducing, besides the standard call admission policy, a call dropping policy. In this way, the proposed approach is capable of controlling class-level quality of service in terms of both blocking and dropping probabilities. Numerical simulations show the effectiveness of the approach. Keywords: Call Control, Markov Decision Process (MDP), Linear Programming (LP), Communication Networks.
Correspondence to: A. Pietrabissa, E-mail: pietrabissa@dis.uniroma1.it
Received 22 April 2008; Accepted 20 October 2008. Recommended by J. Lunze, A.J. van der Schaft.

1. Introduction

To support a variety of applications (data, audio, video), the Internet best-effort paradigm, in which the network does not provide any guarantee that the data is delivered or that the user gets a defined service level, is no longer sufficient; thus, recent and next-generation communication networks support different classes of service, characterized by different requirements. Calls of different classes compete for bandwidth and are regulated by a call admission control (CAC) algorithm, which decides whether a new call can be safely accepted by the network or not. The CAC problem has been successfully modelled as a Markov Decision Process (MDP), based on the fact that the decision to accept or reject a call affects whether future calls will be accepted or not ([1]). MDPs are stochastic control processes, and provide a mathematical framework for optimization problems involving both random events and decision makers. MDPs can be classified as unconstrained or constrained, depending on the presence of global constraints: an unconstrained MDP can be solved via dynamic programming, whereas a constrained MDP can be solved (i) by formulating the unconstrained MDP obtained by neglecting the global constraints as a linear programming (LP) problem ([6], [21]), and then (ii) by adding the global constraints to the LP formulation. As described below, global constraints have to be considered in the call control problem due to the presence of class-level requirements, usually specified in terms of target blocking probabilities and dropping probabilities ([10]).

In the MDP formulations introduced so far (e.g., [1], [10]–[16], [18], [19], [26]–[28]), the link capacity is considered as a given constant value. In wireless networks this assumption does not hold, because the link capacity is time-varying, for instance due to weather conditions, interference, speed of mobile users, and adaptive coding and modulation schemes; examples of such networks are CDMA (e.g., UMTS) ([7], [17]), WiMAX ([8]), and DVB-S2 ([5]). If the link capacity decreases, the network might have to drop one or more calls (forced dropping). This last event is rather disruptive and should be avoided as far as possible. Thus, to reduce the dropping probability, the network operator can decide to reserve a portion of the link capacity to best-effort traffic only; the problem of this solution is that it increases the blocking probability of guaranteed traffic. Alternatively, this paper proposes to incorporate the link model within the MDP framework and to compute simultaneously both an optimal admission policy and an optimal dropping policy: as shown below, the developed approach is capable of explicitly controlling both blocking and dropping probabilities.

The main drawback of the proposed solution is that it worsens the 'curse of dimensionality' problem of the MDP approach ([1]), i.e., the state-space explosion as the link capacity and the number of supported classes increase. However, the purpose of this paper is to define the fundamental theoretical framework which is necessary to analyze the time-varying capacity problem; the framework can then be used to develop practical algorithms. In fact, it can be used as a basis for approximate dynamic programming (ADP) algorithms (see [2] and the references therein for a comprehensive overview of ADP); for example, existing ADP algorithms developed for fixed-capacity networks alleviate the scalability problem by state-space and policy-space reduction approaches (as in [14], [18]) or by introducing different problem formulations (as in [19]), and deal with non-stationary environments by reinforcement learning (RL) approaches ([24]).
In the literature, the time-varying link capacity problem has been addressed separately from the MDP-based admission control problem: on the one hand, research has focused on modelling the link characteristics; on the other hand, a significant number of MDP-based admission control algorithms have been proposed, but none considers time-varying link capacity. The most common and effective way to model a link is Markov-chain modelling, where each link state is characterized by a different available capacity. For instance, in [3] and [25], N-state Markov chains are defined to represent environment parameter variations of a satellite link; link parameters are evaluated from experimental data. In [20] and [22], similar approaches, based on N-state Markov-chain modelling, are applied to terrestrial wireless links. As mentioned above, MDP-based admission control algorithms are widespread in the literature, both for wireless and wired networks. For instance, MDP-based CAC is considered in [10] and in [14] for broadband multi-service networks, in [15] for ATM networks, and in [12] for optical networks with the use of weights to enforce fairness among the classes. MDP-based CAC for wireless networks has also been analyzed in the literature: for example, [13] enforces fairness guarantees via the LP formulation; in [26], the LP approach is aimed at maximizing the revenue; in [16] and in [28] the data throughput is maximized under blocking probability constraints; [27] examines the CAC problem considering the bandwidth asymmetry between uplink and downlink which is typical of wireless networks.

The structure of the paper is as follows: Section 2 defines the MDP framework for links with time-varying capacity; Section 3 describes the LP problem formulation which is used to control blocking and dropping probabilities; Section 4 presents the numerical simulations; and in Section 5 the conclusions are drawn.
2. MDP for Links with Variable Capacity

Section 2.1 details a generic time-varying link model, which, in Section 2.2, is integrated in the developed Discrete-Time MDP (DTMDP) model; Section 2.3 illustrates the model properties.

2.1. Time-varying Link Model

In this paper we consider a generic time-varying link modelled as a Markov chain. Each state of the Markov chain is characterized by a different available capacity, denoted with $\xi_l$, $l = 1, \ldots, L$. The number L of link states is defined based on the link characteristics. For instance, in DVB-S2 satellite networks L is the number of the (finite) available {modulation/coding/symbol rate} combinations, each one leading to a different transmission capacity ([5]); in UMTS networks the link capacity is interference-limited ([17]) and varies continuously (soft capacity): in this case, the number of states L is a model parameter driving the trade-off between scalability (the state space grows with L) and granularity (the capacity quantization becomes finer as L grows).

The generic transition frequency $\varphi_{lk}$ between two link states can be directly obtained from link measurements as described in [3]. Let l be a generic link state, and T(l) the time during which the link state was l: $\varphi_{lk}$ is given by the number of transitions from state l to state k divided by T(l). Then, to obtain a Markov chain, a uniformization approach is followed so that the sum of the outgoing transition probabilities of each link state equals one. Because the link model will be integrated within the DTMDP model, the uniformization procedure will be applied to the whole model as described in Section 2.2.3.
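The estimation rule of Section 2.1 (number of l-to-k transitions divided by the time T(l) spent in l) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `estimate_link_frequencies`, the trace format (time-sorted samples of the link state, each state assumed to hold until the next sample) and the sample values are hypothetical and do not come from [3].

```python
from collections import defaultdict

def estimate_link_frequencies(trace):
    """Estimate link transition frequencies from a measured trace.

    trace: list of (timestamp, link_state) samples sorted by time; the link is
    assumed (hypothetically) to stay in a state until the next sample.
    Returns {(l, k): number of l->k transitions / T(l)}.
    """
    dwell = defaultdict(float)   # T(l): total time spent in link state l
    jumps = defaultdict(int)     # number of observed transitions l -> k
    for (t0, l), (t1, k) in zip(trace, trace[1:]):
        dwell[l] += t1 - t0
        if k != l:
            jumps[(l, k)] += 1
    return {(l, k): n / dwell[l] for (l, k), n in jumps.items()}

# Hypothetical trace: (time in minutes, link state index)
trace = [(0.0, 1), (2.0, 2), (5.0, 1), (6.0, 3), (10.0, 1)]
phi = estimate_link_frequencies(trace)
for (l, k), value in sorted(phi.items()):
    print(f"phi[{l}->{k}] = {value:.3f} per minute")
```

The resulting frequencies are rates, not probabilities; they only become transition probabilities after the uniformization step of Section 2.2.3.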
2.2. DTMDP Model

The algorithm is applied as a statistical-based admission control. Let us consider a link supporting C classes of service, each one characterized by a bitrate $e_c$, $c = 1, \ldots, C$. As stated in the previous section, the L values of the link capacity are denoted with $\xi_l$, $l = 1, \ldots, L$. The network can be represented by a discrete-time system, whose generic state x(t) is given by the number of calls of each class c on-going at time t, $n_c(t)$, plus the link state at time t, $l(t)$, which determines the link capacity $\xi_{l(t)}$: $x(t) = (n_1(t), n_2(t), \ldots, n_C(t), l(t))$. At time t, the load $\eta(t)$ associated to state x(t) is then

$$\eta(t) = \eta[x(t)] = \sum_{c=1}^{C} n_c(t)\, e_c .$$

Two cases arise:

1. The state load at time t is less than or equal to the link capacity at time t: $\eta(t) \le \xi_{l(t)}$. In this case, if a call request arrives, the role of the admission controller is to decide whether to admit or reject it based on the current state x(t). As examined below, the admission decisions constitute the admission policy of the network.
2. The state load at time t is greater than the link capacity at time t: $\eta(t) > \xi_{l(t)}$. In this case, one or more calls must be dropped until the system reaches a state whose load does not exceed the link capacity. Thus, besides the admission policy, a dropping policy will be introduced.

Hereafter, a given state x(t) will be denoted as available at time t if $\eta[x(t)] \le \xi_{l(t)}$, unavailable otherwise. Note that, even if the admission policy is such that a given class c call is blocked whenever $\eta(t) + e_c > \xi_{l(t)}$, forced dropping cannot be avoided because the link capacity varies. The system is sampled (see Section 2.2.3), and has the following dynamics:

$$x(t+1) = f\big(x(t), u(t), z(t)\big) \tag{1}$$

where u(t) is the control action, and the disturbance z(t) represents call attempts and terminations, characterized as follows: for each class c, call attempts are distributed according to a Poisson process with mean arrival frequency $\lambda_c$; the call holding time of class c is exponentially distributed with mean termination frequency $\mu_c$. (Poisson call attempts and exponential call holding times are widely used in the literature, and are adequate for voice users, but further research is needed in the area of Markov regenerative decision processes to justify them for the new traffic services ([11], [28]).) The control u(t) is relevant either at call attempts or when a state becomes unavailable due to the variation of the link capacity: in the former case u(t) is the admission decision, in the latter it is the dropping decision. The objective is to find the optimal control law u(t) with respect to an appropriate reward function. To model the system as a DTMDP, the state space S, the action space A, the transition probability matrix T and the reward R have to be defined.

2.2.1. State Space S

The system state at time t is defined by the number of on-going connections and by the link state: $x(t) = (n_1(t), n_2(t), \ldots, n_C(t), l(t))$. Let us consider a generic link state l; there is a finite number of states identified by the following inequality:

$$\sum_{c=1}^{C} n_c(t)\, e_c \le \xi_L . \tag{2}$$
By denoting with M the number of such states, the system states when the link is in state l are denoted as $x_{i,l} = (n_1(i), n_2(i), \ldots, n_C(i), l)$, $i = 1, \ldots, M$. The states whose load is lower than or equal to $\xi_l$ are available, whereas the other states are unavailable. Let $N_l$ be the number of available states when the link state is l, and let the states $x_{i,l}$ be ordered with respect to their load, i.e., $\eta(x_{i+1,l}) \ge \eta(x_{i,l})$, $i = 1, \ldots, M-1$. The set of available states is then defined as follows:

$$S_{AV} = \Big\{ x_{i,l} = \big(n_1(i), n_2(i), \ldots, n_C(i), l\big) \;\Big|\; \sum_{c=1}^{C} n_c(i)\, e_c \le \xi_l,\ i = 1, \ldots, M,\ l = 1, \ldots, L \Big\} = \{x_{i,l}\}_{l=1,\ldots,L;\ i=1,\ldots,N_l} ; \tag{3}$$

the set of unavailable states is

$$S_{UN} = \Big\{ x_{i,l} = \big(n_1(i), n_2(i), \ldots, n_C(i), l\big) \;\Big|\; \xi_l < \sum_{c=1}^{C} n_c(i)\, e_c \le \xi_L,\ i = 1, \ldots, M,\ l = 1, \ldots, L \Big\} = \{x_{i,l}\}_{l=1,\ldots,L;\ i=N_l+1,\ldots,M} ; \tag{4}$$

and the whole state space is

$$S = \Big\{ x_{i,l} = \big(n_1(i), n_2(i), \ldots, n_C(i), l\big) \;\Big|\; \sum_{c=1}^{C} n_c(i)\, e_c \le \xi_L,\ i = 1, \ldots, M,\ l = 1, \ldots, L \Big\} = \{x_{i,l}\}_{l=1,\ldots,L;\ i=1,\ldots,M} = S_{AV} \cup S_{UN} . \tag{5}$$

Obviously, when the link state is L, $N_L = M$, and all the states $x_{i,L}$ are available. The total number of states is LM.
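To make the construction of eqs. (2)–(5) concrete, the following Python sketch enumerates the call-count vectors whose load fits the maximum capacity, orders them by load, and counts the available ones for each link state. It is a minimal illustration, assuming integer bitrates and capacities expressed in the same unit; the function name `enumerate_states` and the numerical values are hypothetical.

```python
from itertools import product

def enumerate_states(bitrates, capacities):
    """Enumerate call-count vectors (n_1..n_C) satisfying eq. (2), ordered by load.

    bitrates:   per-class bitrates e_c (integers, same unit as capacities)
    capacities: link capacities, one per link state
    Returns (occupancies, n_available), where n_available[l] is the number N_l
    of vectors whose load does not exceed capacities[l].
    """
    cap_max = max(capacities)

    def load(n):
        return sum(nc * e for nc, e in zip(n, bitrates))

    # Each class alone can hold at most cap_max // e_c calls.
    ranges = [range(cap_max // e + 1) for e in bitrates]
    occupancies = sorted((n for n in product(*ranges) if load(n) <= cap_max), key=load)
    n_available = [sum(1 for n in occupancies if load(n) <= cap) for cap in capacities]
    return occupancies, n_available

# Hypothetical example: two classes (64 and 96 kbps), two link states (128 and 256 kbps)
occupancies, n_available = enumerate_states((64, 96), (128, 256))
print("M =", len(occupancies), " N_l =", n_available)
print("total number of states L*M =", len(n_available) * len(occupancies))
```

Because the vectors are sorted by load, the first $N_l$ of them are exactly the available states for link state l, which mirrors the index convention used in eqs. (3) and (4).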
2.2.2. Action Space A

In the generic available state $x_{i,l} = (n_1(i), \ldots, n_c(i), \ldots, n_C(i), l) \in S_{AV}$, if a call attempt of class c occurs, the controller might decide to block the call (and the state remains the same) or to accept it, provided that $x_{j,l} = (n_1(i), \ldots, n_c(i)+1, \ldots, n_C(i), l) \in S_{AV}$. Let us denote such decision as $u_{adm}(x_{i,l}, c)$, and let us associate the value 1 if the decision is to accept the new call, 0 if it is to reject it; no decision can be taken on call terminations. By defining $\delta_c$ as a $1 \times (C+1)$ vector of zeros except for the element of column c, which is equal to 1, the above-defined state $x_{j,l}$ is equal to $x_{i,l} + \delta_c$; similarly, the state $x_{i,l}$ is equal to $x_{j,l} - \delta_c$. The action space $A_{AV}$, which associates the admission decisions to the available state space $S_{AV}$, is then defined as follows:

$$A_{AV} = \Big\{ u_{adm}(x_{i,l}) = \big(u_{adm}(x_{i,l},1), \ldots, u_{adm}(x_{i,l},C)\big) \;\Big|\; u_{adm}(x_{i,l},c) \in \{0,1\} \text{ if } x_{i,l} + \delta_c \in S_{AV};\ u_{adm}(x_{i,l},c) = 0 \text{ if } x_{i,l} + \delta_c \notin S_{AV};\ c = 1, \ldots, C;\ i = 1, \ldots, N_l \Big\} \tag{6}$$
In the generic unavailable state, if a call attempt of class c occurs, the controller has to block the call; moreover, the controller must drop one or more calls to try to reach an available state. Let us consider a generic available state $x_{i,l} = (n_1(i), \ldots, n_c(i), \ldots, n_C(i), l) \in S_{AV}$ (i.e., a state such that $\eta(x_{i,l}) = \sum_{c=1}^{C} n_c(i)\, e_c \le \xi_l$), and let us consider a link state transition which decreases the link capacity from $\xi_l$ to $\xi_k$, with $\xi_k < \eta(x_{i,l})$: this capacity variation leads the system to the unavailable state $x_{i,k} = (n_1(i), \ldots, n_c(i), \ldots, n_C(i), k) \in S_{UN}$. Then, the controller must lead the system to an available state by choosing a sequence of one or more call droppings, which reduces the system load by reducing the number of on-going calls. Note that if we had to consider all the possible paths between each unavailable state and the available states, we would cause an action-space explosion leading to severe scalability problems. This paper proposes to limit the controller choice to dropping one call at each control interval, greatly simplifying the action space. The drawback is that the system might remain in the unavailable state space for several time intervals, until the visited state becomes available; however, as described in the following section, this drawback is negligible because the frequency of the controller actions is in practice much higher than the frequencies of the traffic and link state dynamics. In particular, the frequency of the dropping action $f_{drop}$ (which determines the time interval between two consecutive dropping actions) should be set as large as possible considering implementation constraints.

In conclusion, considering a generic unavailable state $x_{i,k} = (n_1(i), \ldots, n_c(i), \ldots, n_C(i), k) \in S_{UN}$, the controller decision is to select the class c of the call to be dropped among all classes c such that $n_c(i) > 0$. The system is then driven to another unavailable state, if $x_{i,k} - \delta_c \in S_{UN}$, or to an available state, if $x_{i,k} - \delta_c \in S_{AV}$. Let us denote such decision as $u_{drop}(x_{i,k}, c)$, and let us associate the value 1 if the decision is to drop a class c call, 0 if it is to drop a call of another class. Note that when the system is in a generic unavailable state $x_{i,k}$, exactly one dropping decision must be equal to 1:

$$\sum_{c=1}^{C} u_{drop}(x_{i,k}, c) = 1, \quad \forall x_{i,k} \in S_{UN} .$$

The action space $A_{UN}$, which associates the dropping decisions to the unavailable state space $S_{UN}$, is then defined as follows:
$$A_{UN} = \Big\{ u_{drop}(x_{i,k}) = \big(u_{drop}(x_{i,k},1), \ldots, u_{drop}(x_{i,k},C)\big) \;\Big|\; u_{drop}(x_{i,k},c) \in \{0,1\} \text{ if } x_{i,k} - \delta_c \in S;\ u_{drop}(x_{i,k},c) = 0 \text{ if } x_{i,k} - \delta_c \notin S;\ \sum_{c=1}^{C} u_{drop}(x_{i,k},c) = 1;\ i = N_k+1, \ldots, M;\ k = 1, \ldots, L \Big\} . \tag{7}$$

The action space A is the union of the available and the unavailable action spaces:

$$A = A_{AV} \cup A_{UN} \tag{8}$$

Summarizing: the admission policy $u_{adm} = \{u_{adm}(x_{i,l})\}_{l=1,\ldots,L;\ i=1,\ldots,N_l}$ maps each available state $x_{i,l} \in S_{AV}$ to the admission control action $u_{adm}(x_{i,l})$; the dropping policy $u_{drop} = \{u_{drop}(x_{i,l})\}_{l=1,\ldots,L;\ i=N_l+1,\ldots,M}$ maps each unavailable state $x_{i,l} \in S_{UN}$ to the dropping control action $u_{drop}(x_{i,l})$; the controller policy is $u = u_{adm} \cup u_{drop}$. The policy space U is defined as the set of all the feasible policies:

$$U = \big\{ u_{adm} \;\big|\; u_{adm}(x_{i,l}) \in A_{AV},\ \forall x_{i,l} \in S_{AV} \big\} \cup \big\{ u_{drop} \;\big|\; u_{drop}(x_{i,l}) \in A_{UN},\ \forall x_{i,l} \in S_{UN} \big\} \tag{9}$$
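The feasibility conditions inside eqs. (6) and (7) can be read as two simple tests on a state. The sketch below is an illustration only: the helper names `admission_options` and `dropping_options` are hypothetical, classes are indexed from 0, and a state is represented by its call-count tuple together with the capacity of the current link state.

```python
def admission_options(n, capacity, bitrates):
    """Classes whose call may be accepted in an available state (condition of eq. (6)).

    A class c is admissible only if the state reached after the acceptance,
    x + delta_c, is still available, i.e. load + e_c <= capacity.
    """
    load = sum(nc * e for nc, e in zip(n, bitrates))
    return [c for c, e in enumerate(bitrates) if load + e <= capacity]

def dropping_options(n):
    """Classes with at least one on-going call, i.e. candidates for the single
    per-interval dropping decision of eq. (7)."""
    return [c for c, nc in enumerate(n) if nc > 0]

# Hypothetical example: one call per class on-going, bitrates 64 and 96 kbps
bitrates = (64, 96)
print(admission_options((1, 1), 256, bitrates))  # both classes still fit
print(admission_options((1, 1), 224, bitrates))  # only the 64 kbps class fits
print(dropping_options((0, 2)))                  # only class index 1 can be dropped
```

A deterministic policy then picks, for every available state, a subset of the admissible classes, and, for every unavailable state, exactly one class among the dropping candidates.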
2.2.3. Transition Matrix T

The transition frequencies $\theta(x_{i,l}, x_{j,k})$ between couples of states $x_{i,l}$, $x_{j,k}$ are inferred from the link model, from the above-stated assumptions on z(t), and from the above-defined action space A:

(a) the transition frequency $\theta(x_{i,l}, x_{j,l})$ between an available state $x_{i,l} \in S_{AV}$ and $x_{j,l} = x_{i,l} - \delta_c$ is equal to $n_c(i)\,\mu_c$ if $x_{j,l} \in S$, to 0 otherwise;
(b) the transition frequency $\theta(x_{i,l}, x_{j,l})$ between an available state $x_{i,l} \in S_{AV}$ and $x_{j,l} = x_{i,l} + \delta_c$ is equal to $u_{adm}(x_{i,l},c)\,\lambda_c$ if $x_{j,l} \in S_{AV}$, to 0 otherwise;
(c) the transition frequency $\theta(x_{i,l}, x_{j,l})$ between an unavailable state $x_{i,l} \in S_{UN}$ and $x_{j,l} = x_{i,l} - \delta_c$ is equal to $f_{drop}\,u_{drop}(x_{i,l},c) + n_c(i)\,\mu_c$ if $x_{j,l} \in S$, to 0 otherwise;
(d) the transition frequency $\theta(x_{i,l}, x_{i,k})$ between the states $x_{i,l} \in S$ and $x_{i,k} \in S$ is equal to $\varphi_{lk}$.

All the other transition frequencies are null. To apply DP algorithms, we have to obtain a uniform, discrete-time Markov chain by defining the transition probabilities. The generic transition probability between two states $x_{i,l} \in S$ and $x_{j,k} \in S$ will be denoted with $p_u(x_{i,l}, x_{j,k})$, where the subscript u explicitly highlights that the transition matrix depends on the adopted policy u. A standard method is to define the transition probabilities as described by the following procedure ([11]):

(i) Divide the transition frequencies of each state by the following constant (uniformization):

$$\gamma > \max_{l=1,\ldots,L;\ i=1,\ldots,M} \Big\{ \sum_{k=1}^{L} \sum_{j=1}^{M} \theta(x_{i,l}, x_{j,k}) \Big\} , \tag{10}$$

which exceeds the maximum total output frequency among all the states, thus obtaining the following transition probabilities:

(a) The probability of the transition between the states $x_{i,l} \in S_{AV}$ and $x_{j,l} = x_{i,l} - \delta_c$ is given by:

$$p_u(x_{i,l}, x_{i,l} - \delta_c) = \begin{cases} \dfrac{n_c(i)\,\mu_c}{\gamma} & \text{if } x_{i,l} - \delta_c \in S \\ 0 & \text{otherwise} \end{cases} \qquad l = 1, \ldots, L;\ i = 1, \ldots, N_l;\ c = 1, \ldots, C. \tag{11}$$

(b) The probability of the transition between the states $x_{i,l} \in S_{AV}$ and $x_{j,l} = x_{i,l} + \delta_c$ is given by:

$$p_u(x_{i,l}, x_{i,l} + \delta_c) = \begin{cases} \dfrac{\lambda_c}{\gamma}\, u_{adm}(x_{i,l},c) & \text{if } x_{i,l} + \delta_c \in S_{AV} \\ 0 & \text{otherwise} \end{cases} \qquad l = 1, \ldots, L;\ i = 1, \ldots, N_l;\ c = 1, \ldots, C, \tag{12}$$

which depends on the admission decision in state $x_{i,l}$;

(c) The probability of the transition between the states $x_{i,l} \in S_{UN}$ and $x_{j,l} = x_{i,l} - \delta_c$ is given by:

$$p_u(x_{i,l}, x_{i,l} - \delta_c) = \begin{cases} \dfrac{f_{drop}}{\gamma}\, u_{drop}(x_{i,l},c) + \dfrac{n_c(i)\,\mu_c}{\gamma} & \text{if } x_{i,l} - \delta_c \in S \\ 0 & \text{otherwise} \end{cases} \qquad l = 1, \ldots, L;\ i = N_l+1, \ldots, M;\ c = 1, \ldots, C, \tag{13}$$

which depends on the dropping decision in state $x_{i,l}$;

(d) The probability of the transition between the states $x_{i,l}$ and $x_{i,k}$ is given by:

$$p_u(x_{i,l}, x_{i,k}) = \frac{\varphi_{lk}}{\gamma}, \qquad i = 1, \ldots, M;\ l = 1, \ldots, L;\ k = 1, \ldots, L. \tag{14}$$

(ii) Add state self-transitions so that the sum of the transition probabilities leaving each state equals 1:

$$p_u(x_{i,l}, x_{i,l}) = 1 - \sum_{\substack{j=1,\ldots,M;\ k=1,\ldots,L \\ (j,k) \ne (i,l)}} p_u(x_{i,l}, x_{j,k}), \qquad i = 1, \ldots, M;\ l = 1, \ldots, L. \tag{15}$$

The elements $p_u(x_{i,l}, x_{j,k})$ define the transition matrix T.

2.2.4. Reward Function

For admission control algorithms, the most common evaluation parameter is the link utilization or throughput; the reward associated to the available state $x_{i,l} \in S_{AV}$ is then defined as the load occupied by the on-going accepted calls:

$$r(x_{i,l}) = \sum_{c=1}^{C} n_c(i)\, e_c, \qquad l = 1, \ldots, L;\ i = 1, \ldots, N_l. \tag{16}$$

In the unavailable states, where the state load is greater than the link capacity, the actual throughput is equal to the link capacity; the reward associated to the unavailable state $x_{i,l} \in S_{UN}$ is then the link capacity $\xi_l$:

$$r(x_{i,l}) = \xi_l, \qquad l = 1, \ldots, L;\ i = N_l+1, \ldots, M. \tag{17}$$

2.3. Model Properties

The described DTMDP is unichain (i.e., all stationary policies have a single recurrent class and possibly a non-empty set of transient states), because blocking
decisions set the associated transition probability to zero (see eq. (12)). Given a policy $u \in U$, the set of recurrent states will be denoted with $S^R_u$, and the set of transient states will be denoted with $S^T_u$ (we recall that $S^R_u \cup S^T_u = S$). For example, let us consider a policy $u_0$ which accepts all calls whenever possible, except for class c calls, which are accepted only if the number of on-going class c calls is lower than a given maximum $n_{MAX}(c)$; then, the states $x_{i,l}$ such that $n_c(i) > n_{MAX}(c)$ constitute the set of transient states $S^T_{u_0}$, because all the transitions towards this set are associated to blocked call arrivals and are null, while the transitions departing from this set are associated either to call terminations, to call dropping or to link state changes, and, thus, are positive (see eqs. (11), (13) and (14)). The balance equations of the DTMDP are the following ([6]):

$$\pi(x_{i,l}) - \sum_{x_{i',l'} \in S} \pi(x_{i',l'})\, p_u(x_{i',l'}, x_{i,l}) = 0, \quad \forall x_{i,l} \in S, \tag{18}$$

subject to:

$$\sum_{x_{i,l} \in S} \pi(x_{i,l}) = 1, \tag{19}$$

and

$$\pi(x_{i,l}) \ge 0, \quad \forall x_{i,l} \in S, \tag{20}$$

where one of the equations of the system (18) is redundant and the unknown $\pi(x_{i,l})$ is the stationary probability that the system is in $x_{i,l}$:

$$\pi(x_{i,l}) = \mathrm{Prob}\{x(t) = x_{i,l}\}, \quad \forall x_{i,l} \in S. \tag{21}$$

Because the DTMDP is unichain, from the Markov-chain properties it follows that, for any policy $u \in U$, the system of eqs. (18)–(20) has a unique solution, denoted with $\pi_u$, such that

$$\pi_u(x_{i,l}) \begin{cases} > 0 & \text{if } x_{i,l} \in S^R_u \\ = 0 & \text{if } x_{i,l} \in S^T_u \end{cases} \tag{22}$$
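As an illustration of the uniformization procedure of Section 2.2.3 and of the balance equations (18)–(20), the following Python sketch builds the transition matrix of a deliberately small instance and approximates its stationary distribution. Everything specific here is an assumption made for the example: a single class whose calls occupy one capacity unit each, two link states, hypothetical rate values, and the names `build_chain`, `stationary` and `admit`. The stationary distribution is obtained by power iteration, which is swapped in for simplicity; solving the linear system (18)–(19) directly would serve equally well.

```python
def build_chain(lam, mu, f_drop, caps, phi, admit):
    """Uniformized transition matrix for a single-class link (one unit per call).

    caps[l]: capacity of link state l (in calls); phi[l][k]: link transition
    frequencies; admit(n, l): admission decision in {0, 1} (or a probability).
    """
    n_max = max(caps)
    states = [(n, l) for l in range(len(caps)) for n in range(n_max + 1)]
    index = {s: i for i, s in enumerate(states)}
    rates = [[0.0] * len(states) for _ in states]
    for (n, l), i in index.items():
        available = n <= caps[l]
        if n > 0:
            # Rules (a)/(c): call terminations, plus forced dropping when the
            # state is unavailable (with one class, the dropped class is fixed).
            rates[i][index[(n - 1, l)]] = n * mu + (0.0 if available else f_drop)
        if available and n + 1 <= caps[l]:
            # Rule (b): arrivals, weighted by the admission decision.
            rates[i][index[(n + 1, l)]] = lam * admit(n, l)
        for k in range(len(caps)):
            if k != l:
                # Rule (d): link state changes.
                rates[i][index[(n, k)]] = phi[l][k]
    gamma = 1.01 * max(sum(row) for row in rates)  # strict inequality of eq. (10)
    probs = [[r / gamma for r in row] for row in rates]
    for i, row in enumerate(probs):
        row[i] = 1.0 - sum(row)                    # self-transitions, eq. (15)
    return states, probs

def stationary(probs, iterations=5000):
    """Approximate the solution of eqs. (18)-(20) by power iteration."""
    size = len(probs)
    pi = [1.0 / size] * size
    for _ in range(iterations):
        pi = [sum(pi[i] * probs[i][j] for i in range(size)) for j in range(size)]
    return pi

# Hypothetical parameters (per minute) and an "accept whenever possible" policy
states, P = build_chain(lam=1.0, mu=0.5, f_drop=10.0, caps=[1, 2],
                        phi=[[0.0, 0.2], [0.1, 0.0]], admit=lambda n, l: 1.0)
pi = stationary(P)
for s, p in zip(states, pi):
    print(s, round(p, 4))
```

Replacing `admit` with a function that always returns 0 reproduces the situation described by eq. (22): the non-empty states become transient and their stationary probability vanishes.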
3. LP Formulation of the DTMDP

The LP formulation of DTMDP problems is based on the definition of randomized policies ([6], [21]): each admission decision $u_{adm}(x_{i,l},c)$ is a real number between 0 and 1, and represents the probability that a call of class c is accepted when the system is in an available state $x_{i,l} \in S_{AV}$; similarly, each dropping decision $u_{drop}(x_{i,l},c)$ is a real number between 0 and 1, and represents the probability that a call of class c is dropped when the system is in an unavailable state $x_{i,l} \in S_{UN}$. In a network supporting C classes, each available state is associated to the decision of accepting/rejecting a class c call. Consequently, the following generic acceptance action $a_k(x_{i,l})$ can be associated to state $x_{i,l} \in S_{AV}$:

$$a_k(x_{i,l}) = \big[ a_k^{(1)} \ \ldots \ a_k^{(C)} \big], \qquad a_k^{(c)} \in \begin{cases} \{0,1\} & \text{if } x_{i,l} + \delta_c \in S_{AV} \\ \{0\} & \text{otherwise} \end{cases} \tag{23}$$

Thus, up to $2^C$ actions are associated to a single state $x_{i,l} \in S_{AV}$. For example, if C = 2 the following four actions can be taken: $a_1 = [0\ 0]$, reject all; $a_2 = [0\ 1]$, admit class 2 only; $a_3 = [1\ 0]$, admit class 1 only; $a_4 = [1\ 1]$, admit all. Note that in the states $x_{i,l}$ such that $x_{i,l} + \delta_c \notin S_{AV}$ for one or more $c \in [1, \ldots, C]$, the actions $a_k$ which accept calls of class c are not applicable. Hereafter, the set of indexes k of the applicable actions in state $x_{i,l} \in S_{AV}$ will be denoted with $A_a(i)$.

Similarly, each state $x_{i,l} \in S_{UN}$ is associated to the decision of dropping or not a class c call. Recalling the definition of $\delta_c$, because one (and only one) dropping decision must be equal to 1 (see eq. (7)), the following dropping actions $d_c$ can be associated to each state $x_{i,l} \in S_{UN}$:

$$d_c(x_{i,l}) = \delta_c, \quad c = 1, \ldots, C, \quad \forall x_{i,l} \in S_{UN}. \tag{24}$$

Thus, up to C actions are associated to a single state $x_{i,l} \in S_{UN}$. For example, if C = 2 the following 2 actions can be taken: $d_1 = [1\ 0]$, drop a class 1 call; $d_2 = [0\ 1]$, drop a class 2 call. In the states $x_{i,l}$ such that $x_{i,l} - \delta_{c'} \notin S$ for one or more $c' \in [1, \ldots, C]$, the actions $d_{c'}$ (which drop calls of class c') are not applicable. The set of indexes c of the applicable dropping actions in state $x_{i,l} \in S_{UN}$ will be denoted with $A_d(i)$.

By introducing the unknowns $z(x_{i,l}, a_k)$, which are the probabilities that the system is in the available state $x_{i,l} \in S_{AV}$ and admission action $a_k$, $k \in A_a(i)$, is chosen, and the unknowns $z(x_{i,l}, d_c)$, which are the probabilities that the system is in the unavailable state $x_{i,l} \in S_{UN}$ and dropping action $d_c$ is chosen, the following LP is defined (see [21]):

Maximize

$$f(z) = \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i)} z(x_{i,l}, a_k)\, r(x_{i,l}) + \sum_{x_{i,m} \in S_{UN}} \sum_{c \in A_d(i)} z(x_{i,m}, d_c)\, r(x_{i,m}) \tag{25}$$
subject to

$$\begin{cases} \displaystyle \sum_{k' \in A_a(i')} z(x_{i',l'}, a_{k'}) - \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i)} z(x_{i,l}, a_k)\, p(x_{i,l}, x_{i',l'}, a_k) - \sum_{x_{i,l} \in S_{UN}} \sum_{c \in A_d(i)} z(x_{i,l}, d_c)\, p(x_{i,l}, x_{i',l'}, d_c) = 0, & \forall x_{i',l'} \in S_{AV} \\[2ex] \displaystyle \sum_{c' \in A_d(i')} z(x_{i',l'}, d_{c'}) - \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i)} z(x_{i,l}, a_k)\, p(x_{i,l}, x_{i',l'}, a_k) - \sum_{x_{i,l} \in S_{UN}} \sum_{c \in A_d(i)} z(x_{i,l}, d_c)\, p(x_{i,l}, x_{i',l'}, d_c) = 0, & \forall x_{i',l'} \in S_{UN} \end{cases} \tag{26}$$

$$\sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(x_{i,l})} z(x_{i,l}, a_k) + \sum_{x_{j,l} \in S_{UN}} \sum_{c \in A_d(x_{j,l})} z(x_{j,l}, d_c) = 1, \tag{27}$$

and

$$z(x_{i,l}, a_k) \ge 0, \ k \in A_a(i),\ x_{i,l} \in S_{AV}; \qquad z(x_{i,l}, d_c) \ge 0, \ c \in A_d(i),\ x_{i,l} \in S_{UN}. \tag{28}$$

Because the stationary probabilities that the system is in state $x_{i,l} \in S$ can be computed as

$$\begin{cases} \displaystyle \sum_{k \in A_a(i)} z(x_{i,l}, a_k) = \pi(x_{i,l}), & x_{i,l} \in S_{AV} \\[1.5ex] \displaystyle \sum_{c \in A_d(i)} z(x_{i,l}, d_c) = \pi(x_{i,l}), & x_{i,l} \in S_{UN} \end{cases} \tag{29}$$

it follows (i) that the reward function (25) computes the expected throughput, and (ii) that constraints (26)–(28) express the balance equations of the DTMDP (18)–(20). As demonstrated in [21], the optimal solution $z^*$ generates the optimal policy $u^*$. Let $D(x_{i,l}, a_k)$ denote the probability of choosing the acceptance action $a_k$ when the system is in state $x_{i,l} \in S_{AV}$, and let $D(x_{i,l}, d_c)$ denote the probability of choosing the dropping action $d_c$ when the system is in state $x_{i,l} \in S_{UN}$:

$$\begin{cases} D(x_{i,l}, a_k) = \dfrac{z^*(x_{i,l}, a_k)}{\sum_{k' \in A_a(i)} z^*(x_{i,l}, a_{k'})}, & k \in A_a(i),\ x_{i,l} \in S_{AV} \\[2.5ex] D(x_{i,l}, d_c) = \dfrac{z^*(x_{i,l}, d_c)}{\sum_{c' \in A_d(i)} z^*(x_{i,l}, d_{c'})}, & c \in A_d(i),\ x_{i,l} \in S_{UN} \end{cases} \tag{30}$$

The optimal policy $u^*$ is defined as

$$\begin{cases} u^*_{adm}(x_{i,l}) = \displaystyle \sum_{k \in A_a(i)} D(x_{i,l}, a_k)\, a_k, & x_{i,l} \in S_{AV} \\[1.5ex] u^*_{drop}(x_{j,m}) = \displaystyle \sum_{c \in A_d(j)} D(x_{j,m}, d_c)\, d_c, & x_{j,m} \in S_{UN} \end{cases} \tag{31}$$

As already mentioned, the interest in the LP formulation is motivated by the possibility of explicitly controlling both blocking and dropping probabilities. The basic objective is the throughput maximization; besides this, different policies can be pursued by Telecom operators. In this paper, we consider three objectives:

1. The throughput must be maximized;
2. The fairness among classes in terms of blocking probabilities must be maximized;
3. The class dropping probabilities must be below given thresholds.
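The mapping of eqs. (30) and (31), from an LP solution to a randomized policy, amounts to a per-state normalization followed by a weighted sum of the 0/1 action vectors. The Python sketch below illustrates it under stated assumptions: the function names `randomized_policy` and `decision_probabilities`, the dictionary layout of the solution and the numerical values are hypothetical, and states with zero total mass (for which the ratio of eq. (30) is undefined) are simply skipped, which is a choice of this sketch rather than something prescribed by the text.

```python
def randomized_policy(z):
    """Eq. (30): normalize the LP unknowns state by state.

    z: {state: {action: mass}}, where each action is a tuple of 0/1 entries
    (an acceptance action a_k or a dropping action d_c).
    Returns {state: {action: D(state, action)}}.
    """
    policy = {}
    for state, masses in z.items():
        total = sum(masses.values())
        if total > 0:  # assumption of this sketch: zero-mass states are skipped
            policy[state] = {a: m / total for a, m in masses.items()}
    return policy

def decision_probabilities(dist):
    """Eq. (31): per-class probability obtained as the D-weighted sum of actions."""
    n_classes = len(next(iter(dist)))
    return tuple(sum(p * a[c] for a, p in dist.items()) for c in range(n_classes))

# Hypothetical LP solution for two states of a two-class system
z = {
    "x_1": {(1, 1): 0.375, (0, 1): 0.125},  # available state: admit all / class 2 only
    "x_2": {(1, 0): 0.0, (0, 1): 0.0},      # state never visited under the solution
}
policy = randomized_policy(z)
print(policy["x_1"])
print(decision_probabilities(policy["x_1"]))
```

In this example, class 2 calls are always accepted in state `x_1`, while class 1 calls are accepted with the probability given by the share of mass placed on the "admit all" action.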
The first two objectives can be achieved by properly defining the cost function f in a multi-objective fashion, whereas the third objective requires adding class-level constraints. By defining the set $B_c$ as the set of indexes k corresponding to the actions which block the admission of a class c call, and recalling that the action $d_c$ corresponds to the dropping of class c calls, we can interpret the quantities

$$\begin{cases} P^{(c)}_{block}(z) = \displaystyle \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i) \cap B_c} z(x_{i,l}, a_k) + \sum_{x_{i,l} \in S_{UN}} \sum_{c' \in A_d(i)} z(x_{i,l}, d_{c'}) \\[2.5ex] P^{(c)}_{drop}(z) = \dfrac{f_{drop}}{\lambda_c \big[ 1 - P^{(c)}_{block}(z) \big]} \displaystyle \sum_{x_{i,l} \in S_{UN}} z(x_{i,l}, d_c) \end{cases} \qquad c = 1, \ldots, C \tag{32}$$

as the blocking probabilities and dropping probabilities of class c. Note that (i) in the computation of the blocking probabilities we consider that if a call attempt arrives when the system is in an unavailable state it must be blocked, and that (ii) the expression of the dropping probabilities takes into account that the call dropping rate has to be evaluated over the admission rate of class c calls, computed as $\lambda_c [1 - P^{(c)}_{block}]$, and not over the dropping frequency $f_{drop}$. Thus, the proposed reward function aimed at objectives 1 and 2 is defined as follows:

$$f(z) = \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i)} z(x_{i,l}, a_k)\, r(x_{i,l}) + \sum_{x_{i,l} \in S_{UN}} \sum_{c \in A_d(i)} z(x_{i,l}, d_c)\, r(x_{i,l}) - w_{fair} \max_c \Big[ P^{(c)}_{block}(z) \Big] \tag{33}$$
where $w_{fair}$ is a weight specifying the relevance of the third term, aimed at obtaining a fair solution, with respect to the first two terms, aimed at maximizing the throughput. The parameter $w_{fair}$ has to be carefully tuned: if the weight of the third term of the reward function is too small, little fairness is obtained; if it is too large, the system considerably increases the blocking probabilities of the other classes to gain a negligible reduction of the worst blocking probability. Cost function (33) can be expressed by a linear cost function by introducing a new variable t and C constraints. The LP becomes:

Maximize

$$f(z, t) = \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i)} z(x_{i,l}, a_k)\, r(x_{i,l}) + \sum_{x_{i,l} \in S_{UN}} \sum_{c \in A_d(i)} z(x_{i,l}, d_c)\, r(x_{i,l}) - w_{fair}\, t, \tag{34}$$

subject to constraints (26), (27), (28) and to the following additional C constraints:

$$t \ge P^{(c)}_{block}(z) = \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i) \cap B_c} z(x_{i,l}, a_k) + \sum_{x_{i,l} \in S_{UN}} \sum_{c' \in A_d(i)} z(x_{i,l}, d_{c'}), \quad c = 1, \ldots, C. \tag{35}$$

Finally, to enforce a pre-defined maximum dropping probability $\varepsilon_c$ tolerated by the network for class c, the following constraints on per-class dropping probabilities must be added to the LP:

$$P^{(c)}_{drop}(z) = \frac{f_{drop}}{\lambda_c \big[ 1 - P^{(c)}_{block}(z) \big]} \sum_{x_{i,l} \in S_{UN}} z(x_{i,l}, d_c) \le \varepsilon_c, \quad c = 1, \ldots, C, \tag{36}$$

which, from eq. (32), are equivalent to the linear constraints:

$$\sum_{x_{i,l} \in S_{UN}} z(x_{i,l}, d_c) + \frac{\varepsilon_c \lambda_c}{f_{drop}} \Big[ \sum_{x_{i,l} \in S_{AV}} \sum_{k \in A_a(i) \cap B_c} z(x_{i,l}, a_k) + \sum_{x_{i,l} \in S_{UN}} \sum_{c' \in A_d(i)} z(x_{i,l}, d_{c'}) \Big] \le \frac{\varepsilon_c \lambda_c}{f_{drop}}, \quad c = 1, \ldots, C. \tag{37}$$

In the following section, we will compare the results obtained by four different LPs:

LP1: Throughput maximization. LP1 is defined by the cost function (34) with $w_{fair} = 0$ and constraints (26), (27), (28) (in this case the additional constraints (35) are not required). The solution of LP1 is the optimal policy which maximizes the total throughput.
LP2: Throughput and fairness maximization. LP2 is defined by the cost function (34) with $w_{fair} > 0$ and constraints (26)–(28), (35). The solution of LP2 is the optimal policy which maximizes the total throughput and minimizes the maximum blocking probability among the classes.
LP3: Throughput maximization and dropping probability constraints. LP3 is defined by the cost function (34) with $w_{fair} = 0$ and constraints (26)–(28), (37). The solution of LP3 is the optimal policy which maximizes the total throughput while keeping the dropping probabilities below given thresholds.
LP4: Throughput and fairness maximization and dropping probability constraints. LP4 is defined by the cost function (34) with $w_{fair} > 0$ and constraints (26)–(28), (35), (37). The solution of LP4 is the optimal policy which maximizes the total throughput and minimizes the maximum blocking probability among the classes, while keeping the dropping probabilities below given thresholds.

Theorem: The solution spaces of the above-defined LP1, LP2, LP3 and LP4 are non-empty.

Proof: Because constraints (26) and (27) are the balance equations for each policy u, and because the DTMDP is unichain, from the Markov-chain properties they define a feasible solution for each policy u. Constraints (35) together with eq. (34) are equivalent to the cost function (33) and, thus, do not reduce the solution space. Constraints (37) can always be met by increasing the blocking actions (i.e., by increasing the blocking probabilities). To prove the last statement, let us consider a policy $u_0$ which sets all the admission actions to 0: by blocking every incoming call, the system is reduced to the set of recurrent states $S^R_{u_0} = \{x_{0,l}\}_{l=1,\ldots,L}$, with $x_{0,l} = (0, \ldots, 0, l) \in S_{AV}$, i.e., the system is in the (available) empty states, and $\sum_{l=1}^{L} \pi(x_{0,l}) = 1$. It follows that all the unavailable states are transient: $S_{UN} \subseteq S^T_{u_0}$. Thus, from eq. (22) it follows that the stationary probabilities of the unavailable states are null:

$$\pi_{u_0}(x_{i,l}) = 0, \quad \forall x_{i,l} \in S_{UN}. \tag{38}$$

From eqs. (29) and (38), given that the unknowns $z(x_{i,l}, d_c)$ of the LP are non-negative, it follows that

$$z(x_{i,l}, d_c) = 0, \quad \forall x_{i,l} \in S_{UN}. \tag{39}$$
The theorem follows from eq. (39) and by noting that, because of constraint (27), the quantity between the square brackets of eq. (37) cannot be greater than 1.

Table 1. Values of the link model.

Link state load:
\[ \eta_1 = 0.25\,\eta_{MAX}, \quad \eta_2 = 0.40\,\eta_{MAX}, \quad \eta_3 = 0.55\,\eta_{MAX}, \quad \eta_4 = 0.70\,\eta_{MAX}, \quad \eta_5 = 0.85\,\eta_{MAX}, \quad \eta_6 = \eta_{MAX} \]

Transition frequency matrix [min$^{-1}$]:
\[ \begin{bmatrix}
0 & 0.2641 & 0.1208 & 0.0302 & 0 & 0 \\
0.0058 & 0 & 0.1968 & 0.2290 & 0.0526 & 0 \\
0.0029 & 0.0049 & 0 & 0.5427 & 0.1715 & 0 \\
0 & 0.0019 & 0.0107 & 0 & 0.5349 & 0.1647 \\
0 & 0.0010 & 0.0049 & 0.0175 & 0 & 0.29331 \\
0 & 0 & 0.0010 & 0.0117 & 0.0370 & 0
\end{bmatrix} \]
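As an illustration of how the link model of Table 1 can be used, the following minimal sketch computes the long-run fraction of time spent in each link state. It assumes that the entries of the transition frequency matrix are the transition rates of a continuous-time Markov chain, with rows indexing the current link state and columns the next one; this reading of the table, the function name `stationary_distribution` and the uniformization-based power iteration are assumptions of the sketch, not part of the paper's procedure.

```python
# Transition frequencies [1/min] between the six link states, as read from
# Table 1 (row = current state, column = next state). The row/column reading
# of the table is an assumption of this sketch.
RATES = [
    [0.0,    0.2641, 0.1208, 0.0302, 0.0,    0.0],
    [0.0058, 0.0,    0.1968, 0.2290, 0.0526, 0.0],
    [0.0029, 0.0049, 0.0,    0.5427, 0.1715, 0.0],
    [0.0,    0.0019, 0.0107, 0.0,    0.5349, 0.1647],
    [0.0,    0.0010, 0.0049, 0.0175, 0.0,    0.29331],
    [0.0,    0.0,    0.0010, 0.0117, 0.0370, 0.0],
]


def stationary_distribution(rates, tol=1e-12, max_iter=100_000):
    """Stationary probabilities of a continuous-time Markov chain.

    Uniformization is used: P = I + Q / big_rate is a stochastic matrix with
    the same stationary vector as the generator Q, so power iteration on P
    converges to the stationary distribution of the chain.
    """
    n = len(rates)
    exit_rates = [sum(row) for row in rates]
    big_rate = 1.1 * max(exit_rates)  # strictly larger than every exit rate
    # One-step transition probabilities of the uniformized chain.
    P = [[rates[i][j] / big_rate for j in range(n)] for i in range(n)]
    for i in range(n):
        P[i][i] = 1.0 - exit_rates[i] / big_rate
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        new_pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_pi, pi)) < tol:
            return new_pi
        pi = new_pi
    return pi


if __name__ == "__main__":
    for state, prob in enumerate(stationary_distribution(RATES), start=1):
        print(f"link state {state}: {prob:.4f}")
```

Under these assumptions the chain spends most of its time in the highest-capacity state, since the rates leaving link state 6 are much smaller than the rates entering it.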
Table 2. Class parameters.

Class of Service          Class 1    Class 2    Class 3
$e_c$ [kbps]              64         96         128
$\lambda_c$ [min$^{-1}$]  2.125      1.25       0.5
$\mu_c$ [min$^{-1}$]      0.333      0.2        0.2
4. Numerical Simulation Results
In Sections 2 and 3, the MDP framework was extended to the case of a link with time-varying capacity. As stated in the introduction, because of the poor scalability of MDPs and of the underlying stationarity hypothesis, the relevance of the extended framework is more theoretical than practical. The objective of the simulations is therefore to compare the optimal solutions of the defined LPs, showing how the proposed framework can be used to control both blocking and dropping probabilities while maximizing the system load.
Table 1 summarizes the link model parameters. The transition frequency matrix was computed from DVB-S2 satellite link data provided by ThalesAlenia Space within the EU SatSix project ([23]). Table 2 collects the parameters of the three classes supported by the considered link; a scaling parameter is used to vary the offered load.
As shown in Table 3, to examine the algorithms' behaviour in different traffic conditions, two simulation sets were performed. The first set was aimed at evaluating the control of dropping probabilities; four simulations were performed with different values of $\rho_c$. The second set was aimed at evaluating the overall algorithm performance in different traffic conditions; five simulations were performed, characterized by different values of the load parameter, leading to different values of offered traffic $\eta_{OFF}$, computed as
\[ \eta_{OFF} = \sum_{c=1}^{C} \frac{\lambda_c}{\mu_c} e_c. \]
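The offered traffic formula can be checked numerically. The sketch below evaluates it with the class parameters of Table 2, under the assumption that the load parameter multiplies the arrival rates of all classes; the names `CLASSES`, `offered_traffic_mbps` and `load_factor` are illustrative and do not appear in the paper.

```python
# Class parameters from Table 2: bandwidth e_c [kbps], arrival rate lambda_c
# [1/min] and termination rate mu_c [1/min].
CLASSES = [
    {"e_kbps": 64.0, "lam": 2.125, "mu": 0.333},
    {"e_kbps": 96.0, "lam": 1.25, "mu": 0.2},
    {"e_kbps": 128.0, "lam": 0.5, "mu": 0.2},
]


def offered_traffic_mbps(classes, load_factor=1.0):
    """Offered traffic: sum over classes of (lambda_c / mu_c) * e_c, in Mbps.

    Assumption of this sketch: the load parameter scales every arrival rate.
    """
    total_kbps = sum(
        load_factor * c["lam"] / c["mu"] * c["e_kbps"] for c in classes
    )
    return total_kbps / 1000.0


if __name__ == "__main__":
    # Load parameter values used in simulation set 2 (Table 3).
    for load in (0.433, 0.471, 0.508, 0.546, 0.584):
        print(f"load {load}: {offered_traffic_mbps(CLASSES, load):.3f} Mbps")
```

With this scaling assumption, the computed values agree with the offered traffic values listed in Table 3 to within about 0.001 Mbps.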
As shown in Table 3, 10 simulation runs per simulation were performed. For each run, the link, class and simulation parameters were used to generate an event list; the events can be connection births, connection terminations and link state variations. At each connection birth event, the admission controller uses its admission policy to decide whether or not to accept the connection; at each link state variation, if the new link state capacity $\eta_l$ is lower than the current state capacity $\eta(t)$, the controller uses its dropping policy to decide which connection (or sequence of connections) to drop. The policies were computed via the proposed procedure by using the LP solver provided by the PCx software (© 1996, University of Chicago, USA, [4]).
Each simulation of the first set was executed twice: the first time without dropping constraints (LP1), the second time with dropping constraints (LP3). Each simulation of the second set was executed four times to evaluate the policies computed by all the LPs defined in Section 3. The LP parameters are shown in Table 4; $w_{fair}$ was tuned by simulation runs.
Simulation set 1 results, summarized in Table 5 and Fig. 1, show that:
(i) LP3 has to increase the blocking probabilities, and thus to reduce the average throughput with respect to the unconstrained LP1, to meet the maximum dropping probability threshold $\rho_c$ when it is set to 0.1%.
(ii) As $\rho_c$ grows, LP3 manages to meet the constraint by selecting the most appropriate dropping action: comparing the dropping probabilities of LP1 and LP3 shows that the latter redistributes the dropping probabilities among the classes in a fairer way. In this way, the average throughput is practically unaffected.
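The event-list generation used in each simulation run can be sketched as follows. This is only an illustrative sketch: it assumes Poisson connection arrivals, exponentially distributed holding times and a continuous-time Markov chain for the link state, and the function name `generate_event_list`, the tuple layout and the two-state link used in the usage example are hypothetical. Each birth event carries its holding time, so that a simulator can schedule the termination only if the connection is actually admitted.

```python
import random


def generate_event_list(arrival_rates, termination_rates, link_rates,
                        horizon, initial_link_state=0, seed=0):
    """Return a time-ordered list of (time, kind, index, holding_time).

    kind is "birth" (index = class, holding_time = connection duration) or
    "link" (index = new link state, holding_time = None). All rates and the
    horizon must use the same time unit (e.g. minutes).
    """
    rng = random.Random(seed)
    events = []

    # Connection births: one Poisson stream per class (assumption).
    for c, (lam, mu) in enumerate(zip(arrival_rates, termination_rates)):
        t = rng.expovariate(lam)
        while t < horizon:
            events.append((t, "birth", c, rng.expovariate(mu)))
            t += rng.expovariate(lam)

    # Link state variations: continuous-time Markov chain (assumption).
    state, t = initial_link_state, 0.0
    while True:
        row = link_rates[state]
        exit_rate = sum(row)
        if exit_rate <= 0.0:
            break  # absorbing state: no further variations
        t += rng.expovariate(exit_rate)
        if t >= horizon:
            break
        # The next state is chosen with probability proportional to its rate.
        state = rng.choices(range(len(row)), weights=row)[0]
        events.append((t, "link", state, None))

    events.sort(key=lambda e: e[0])
    return events


if __name__ == "__main__":
    # Hypothetical two-class, two-state example over 240 min (14400 s).
    demo = generate_event_list([1.0, 0.5], [0.333, 0.2],
                               [[0.0, 0.2], [0.05, 0.0]], horizon=240.0)
    print(len(demo), "events; first:", demo[0])
```

A simulator would then process the list in order, applying the admission policy at each birth event and the dropping policy at each link event that reduces the capacity below the current load.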
Table 3. Simulation parameters.

                                   Set 1                              Set 2
Simulation                         1       2       3       4         1       2       3       4       5
$\eta_{MAX}$ [Mbps]                1       1       1       1         1       1       1       1       1
Load parameter                     0.471   0.471   0.471   0.471     0.433   0.471   0.508   0.546   0.584
$\eta_{OFF}$ [Mbps]                0.625   0.625   0.625   0.625     0.575   0.625   0.675   0.725   0.775
$f_{drop}$ [min$^{-1}$]            600     600     600     600       600     600     600     600     600
$\rho_c$ [%], c = 1,2,3            0.1     0.5     1       1.5       1       1       1       1       1
Length of each sim. run [s]        14400   14400   14400   14400     14400   14400   14400   14400   14400
Number of sim. runs                10      10      10      10        10      10      10      10      10

Table 4. Problem parameters.

Problem                            LP1     LP2     LP3     LP4
$w_{fair}$                         0       2       0       2
Dropping probability constraints   No      No      Yes     Yes
In simulation set 2, to evaluate the obtained fairness we use Jain's fairness index ([9]), widely used in the literature, which rates the fairness of the set of the C blocking probability values (the greater the index, the fairer the policy):
\[ \text{fairness index} = \frac{\left( \sum_{c=1,\dots,C} P_{block}^{(c)} \right)^2}{C \sum_{c=1,\dots,C} \left( P_{block}^{(c)} \right)^2}. \qquad (40) \]
The fairness index is defined only if at least one of the $P_{block}^{(c)}$ is not null, and its range is (0,1]. Intuitively, the fairness index (40) is interpreted as the share of classes subject to a fair blocking probability. For example: if C = 3, classes 1 and 2 are experiencing a blocking probability equal to 0.1 ($P_{block}^{(1)} = P_{block}^{(2)} = 0.1$) and class 3 is experiencing no blocking ($P_{block}^{(3)} = 0$), then the fairness index is equal to 2/3; if $P_{block}^{(1)} = 0.1$ and $P_{block}^{(2)} = P_{block}^{(3)} = 0$, the fairness index is 1/3.
Simulation set 2 results, summarized in Table 6 and Fig. 2, confirm that:
(i) Constraint (37) is effectively enforced by LP3 and LP4: the maximum dropping probability is about 1%, whereas in LP1 and LP2 it is uncontrolled and reaches 4%. Note that, as explained before, by setting $\rho_c = 1\%$ in these simulations the dropping policy manages to redistribute the dropping probabilities among the supported classes, so that the dropping probability constraints are met without affecting the average throughput and the blocking probabilities.
(ii) Cost function (33) with $w_{fair} > 0$ (i.e., LP2 and LP4) manages to enforce fairness among classes without affecting the average throughput: in all the simulations the fairness index was above 98%, whereas with $w_{fair} = 0$ (LP1 and LP3) it was between 89% and 93%; the average throughput obtained by LP2 and LP4 is less than 1% worse than that of LP1 and LP3, respectively.

5. Conclusions and Future Work
The theoretical relevance of this paper is that it extends the MDP framework currently available for the admission control problem in communication networks to the case of networks with time-varying link capacity; this is typically the case of wireless networks such as CDMA networks, DVB-S2 satellite networks and WiMAX networks. The developed theoretical framework can be used to develop more practical algorithms via ADP and RL methods. The innovative approach consists of (i) integrating a Markov-chain-based link model within the MDP framework and (ii) computing a class dropping policy, besides the standard admission policy, to control the call dropping due to the variable link capacity. Numerical simulations validate the effectiveness of the proposed approach in controlling both blocking and dropping probabilities.
On-going work is aimed at obtaining a realistic link model from measurement campaigns on a real DVB-S2 satellite network with Adaptive Coding and Modulation within the EU SATSIX project ([23]). Further work is being performed within the APICE project, aimed at exploiting the presented model, coupled with the state aggregation and policy-space reduction approach presented in [18], to develop RL-based algorithms which can be used in real networks without the scalability problems and the stationarity hypothesis of the MDP framework.
Table 5. Simulation set 1 results: average throughput $\eta_{AVG,i}$ and average dropping probabilities for LPi, i = 1, 3.

Simulation number ($\rho_c$)       1 (0.1%)   2 (0.5%)   3 (1%)   4 (1.5%)
$\eta_{AVG,1}$ [%]                 51.89      51.89      51.89    51.89
$\eta_{AVG,3}$ [%]                 46.57      51.45      51.49    51.61
LP1  $P_{drop}^{(1)}$ [%]          0          0          0        0
LP1  $P_{drop}^{(2)}$ [%]          0.37       0.37       0.37     0.37
LP1  $P_{drop}^{(3)}$ [%]          2.90       2.90       2.90     2.90
LP3  $P_{drop}^{(1)}$ [%]          0.15       0.51       0.12     0
LP3  $P_{drop}^{(2)}$ [%]          0.15       0.51       1.02     0.91
LP3  $P_{drop}^{(3)}$ [%]          0.13       0.43       0.92     1.44
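The throughput cost of the dropping constraints in Table 5 can be quantified with a short calculation. The sketch below computes the relative throughput loss of LP3 with respect to LP1 for each value of $\rho_c$; the helper name `relative_loss_percent` is illustrative, and the data are copied from Table 5.

```python
# Average throughput values from Table 5 (simulation set 1).
RHO_C_PERCENT = [0.1, 0.5, 1.0, 1.5]
ETA_AVG_LP1 = [51.89, 51.89, 51.89, 51.89]
ETA_AVG_LP3 = [46.57, 51.45, 51.49, 51.61]


def relative_loss_percent(reference, value):
    """Relative loss of `value` with respect to `reference`, in percent."""
    return 100.0 * (reference - value) / reference


if __name__ == "__main__":
    for rho, lp1, lp3 in zip(RHO_C_PERCENT, ETA_AVG_LP1, ETA_AVG_LP3):
        loss = relative_loss_percent(lp1, lp3)
        print(f"rho_c = {rho}%: LP3 loses {loss:.2f}% of the LP1 throughput")
```

The loss is about 10% for $\rho_c$ = 0.1% and below 1% for the three larger thresholds, in line with the observation that the average throughput is practically unaffected as $\rho_c$ grows.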
Table 6. Simulation set 2 results: average throughput $\eta_{AVG,i}$, average fairness index fair_index_i and average maximum dropping probability max_drop_prob_i for LPi, i = 1, ..., 4.

Simulation number                  1       2       3       4       5
$\eta_{AVG,1}$ [Mbps]              48.33   51.89   54.84   57.33   59.35
$\eta_{AVG,2}$ [Mbps]              48.12   51.67   54.57   56.94   59.03
$\eta_{AVG,3}$ [Mbps]              47.92   51.49   54.38   56.57   58.84
$\eta_{AVG,4}$ [Mbps]              47.69   51.21   54.07   56.30   58.40
fair_index_1 [%]                   88.95   92.07   90.49   91.04   91.93
fair_index_2 [%]                   98.08   99.60   98.59   99.19   99.63
fair_index_3 [%]                   89.30   91.79   90.79   91.08   92.43
fair_index_4 [%]                   98.29   99.50   98.69   99.16   99.69
max_drop_prob_1 [%]                2.65    2.90    3.79    3.74    4.36
max_drop_prob_2 [%]                2.51    2.96    3.62    3.63    4.27
max_drop_prob_3 [%]                1.05    1.02    1.24    0.92    0.92
max_drop_prob_4 [%]                0.92    1.01    1.28    1.04    1.00
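The fairness index of eq. (40), whose averages are reported in Table 6, is straightforward to compute from a set of per-class blocking probabilities. The following minimal sketch implements Jain's index; the function name `jain_fairness_index` is illustrative, and the inputs in the usage example are the two cases discussed with eq. (40), not simulation data.

```python
def jain_fairness_index(values):
    """Jain's fairness index: (sum of values)^2 / (n * sum of squares).

    The index is undefined when all the values are zero, so a ValueError is
    raised in that case.
    """
    total = sum(values)
    sum_of_squares = sum(v * v for v in values)
    if sum_of_squares == 0:
        raise ValueError("fairness index undefined: all values are zero")
    return total * total / (len(values) * sum_of_squares)


if __name__ == "__main__":
    # Two classes blocked with probability 0.1, one class never blocked.
    print(jain_fairness_index([0.1, 0.1, 0.0]))
    # Only one class experiences blocking.
    print(jain_fairness_index([0.1, 0.0, 0.0]))
```

The two calls reproduce the values 2/3 and 1/3 of the example given with eq. (40), and equal blocking probabilities for all classes give an index of 1.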
[Figure: average throughput $\eta_{AVG}$ [Mbps] and per-class dropping probabilities [%] (Class 1, Class 2, Class 3) for $\rho_c$ = 0.1%, 0.5%, 1% and 1.5%.]
Fig. 1. Simulation set 1: average throughput and average dropping probabilities.

[Figure: average throughput $\eta_{AVG}$ [Mbps], fairness index [%] and maximum dropping probability [%] versus offered traffic $\eta_{OFF}$ [Mbps] for LP1, LP2, LP3 and LP4.]
Fig. 2. Simulation set 2: average throughput, average fairness and maximum dropping probability.

Acknowledgment
This work was partially supported by the SATSIX project ([23]), funded under the European Commission IST (Information Society Technologies) 6th Framework Programme, and coordinated by ThalesAlenia Space (F), and by the APICE project (Algorithms for integrated planning and control of wireless heterogeneous networks), funded by the Italian Ministry of University and Research (MIUR), and coordinated by Teleinformatica e Sistemi (I).
References
1. Altman E. Applications of Markov decision processes in communication networks: a survey. Tech. Rep., INRIA, 2000. Available from www.inria.fr/RRRT/RR-3984.html
2. Bertsekas DP. Dynamic programming and suboptimal control: a survey from ADP to MPC. Fundamental Issues in Control, Eur J Control 2005; 11(4–5)
3. Castanet L, Deloues T, Lemorton J. Channel modelling based on N-state Markov chains for satcom systems simulation. In: Twelfth International Conference on Antennas and Propagation (ICAP 2003), vol. 1, pp. 119–122, 2003
4. Czyzyk J, Mehrotra S, Wagner M, Wright SJ. PCx User Guide (Version 1.1). Optimization Technology Center, Technical Report OTC 96/01, 1997
5. ETSI Standard TR 102 376 V1.1.1: Digital Video Broadcasting (DVB) User guidelines for the second generation system for Broadcasting, Interactive Services, News Gathering and other broadband satellite applications (DVB-S2), 2005
6. Hillier FS, Lieberman GJ. Introduction to Operations Research, Sixth Edition. New York: McGraw Hill, ch. 21, 1995
7. Holma H, Toskala A. WCDMA for UMTS. England: Wiley & Sons, 2004, ch. 9
8. IEEE standard for Local and Metropolitan Area Networks Part 16: Air interface for fixed broadband wireless access systems. IEEE Standard 802.16, 2004
9. Jain RK, Chiu D-MW, Hawe WR. A quantitative measure of fairness and discrimination for resource allocation in shared computer systems. Digital Equipment Corporation, Tech. Rep., 1984
10. Kalyanasundaram S, Chong EKP, Shroff NB. Admission control schemes to provide class-level QoS in multiservice networks. Comput Netw 2001; 35: 307–326
11. Kalyanasundaram S, Chong EKP, Shroff NB. Optimal resource allocation in multi-class networks with user-specified utility functions. Comput Netw 2002; 38: 613–630
12. Mosharaf K, Talim J, Lambadaris I. Call admission and fairness control in WDM networks with grooming capabilities. In: IEEE Proceedings of 43rd Conference on Decision and Control, Nassau (Bahamas), 2004, pp. 3738–3743
13. Nasser N, Hassanein H. An Optimal and Fair Call Admission Control Policy for Seamless Handoff in Multimedia Wireless Networks with QoS Guarantees. In: IEEE Proceedings, Globecom, 2004, pp. 3926–3930
14. Ni J, Tsang DHK, Tatikonda S, Bensaou B. Optimal and structured call admission control policies for resource-sharing systems. IEEE Trans Commun 2007; 55(1): 158–170
15. Nordstrom E, Carlstrom J. Call admission control and routing for integrated CBR/VBR and ABR services: a Markov decision approach. In: IEEE Proceedings of ATM Workshop 1999, 1999, pp. 24–27 and 71–76
16. Park J-S, Huang L, Lee DC, Kuo C-CJ. Optimal Code Assignment and Call Admission Control for OVSF-CDMA Systems Constrained by Blocking Probabilities. In: Proceedings of IEEE Globecom 2004, pp. 3290–3294
17. Perez-Romero J, Sallent O, Agusti R, Diaz-Guerra MA. Radio Resource Management Strategies in UMTS. John Wiley & Sons, 2005
18. Pietrabissa A. Admission control in UMTS networks based on approximate dynamic programming. Eur J Control 2008; 14(1): 62–75
19. Pietrabissa A. An Alternative LP Formulation of the Admission Control Problem in Multi-Class Networks. IEEE Trans Autom Control 2008; 53(3): 839–844
20. Pimentel C, Falk T, Lisbôa L. Finite-State Markov Modelling of Correlated Rician-Fading Channels. IEEE Trans Veh Technol 2004; 53(5): 1491–1501
21. Puterman ML. Markov Decision Processes. New Jersey: Wiley & Sons, 1994
22. Sanchez-Salas DA, Cuevas-Ruiz JL. N-states Channel Model using Markov Chains. Electronics, Robotics and Automotive Mechanics Conference (CERMA 2007), pp. 342–347, 25–28, 2007
23. SATSIX (Satellite-based communications systems within IPv6 networks) project (contract IST-2006–26950): http://www.ist-satsix.org/
24. Tong H, Brown TX. Adaptive call admission control under quality-of-service constraints: A reinforcement learning solution. IEEE J Sel Areas Commun 2000; 18(2): 209–220
25. Vucetic B, Du J. Channel modelling and simulation in satellite mobile communication systems. IEEE J Sel Areas Commun 1992; 10(8): 1209–1218
26. Xiao Y, Chen C, Wang Y. Optimal admission control for multi-class of wireless adaptive multimedia services. IEICE Trans Commun 2001; E84-B(4): 795–804
27. Yang X, Feng G. Optimizing admission control for multiservice wireless networks with bandwidth asymmetry between uplink and downlink. IEEE Trans Veh Technol 2007; 56(2): 907–917
28. Yu F, Krishnamurthy V, Leung VCM. Cross-layer optimal connection admission control for variable bit rate multimedia traffic in packet wireless CDMA networks. IEEE Trans Signal Process 2006; 54(2): 542–555