Optimal Scheduling of A Large-Scale Multiclass Parallel Server System with Ergodic Cost

Ari Arapostathis
Department of Electrical and Computer Engineering
The University of Texas at Austin
1 University Station, Austin, TX 78712
Email: [email protected]

Anup Biswas
Department of Electrical and Computer Engineering
The University of Texas at Austin
1 University Station, Austin, TX 78712
Email: [email protected]

Guodong Pang
The Harold and Inge Marcus Department of Industrial and Manufacturing Engineering
Pennsylvania State University
University Park, PA 16802
Email: [email protected]

Abstract—We consider the optimal scheduling problem for a large-scale parallel server system with one large pool of statistically identical servers and multiple classes of jobs under the expected long-run average (ergodic) cost criterion. Jobs of each class arrive as a Poisson process, are served under the FCFS discipline within each class, and may elect to abandon while waiting in their queue. The service and abandonment rates are both class-dependent. We assume that the system operates in the Halfin-Whitt regime, where the arrival rates and the number of servers grow appropriately so that the system becomes critically loaded while the service and abandonment rates are held fixed. The optimal solution is obtained via the ergodic diffusion control problem in the limit, which forms a new class of problems in the ergodic control literature. A new theoretical framework is provided to solve this class of ergodic control problems. The proof of the convergence of the values of the multiclass parallel server system to that of the diffusion control problem relies on a new approximation method, spatial truncation, in which the Markov policies follow a fixed priority policy outside a fixed compact set.

I. INTRODUCTION

One of the classical problems in queueing theory is to schedule jobs in a network in an optimal way. These problems, known as scheduling problems, arise in a wide variety of applications, in particular whenever different job classes are present in the network and compete for the same resources. The optimal scheduling problem has a long history in the literature. One of the most appealing scheduling rules is the well-known cµ rule. This is a static priority policy in which it is assumed that each class-i job has a marginal delay cost c_i and a service rate µ_i, and the classes are prioritized in decreasing order of c_iµ_i. This static priority rule has been proven asymptotically optimal in many settings [4], [22], [24]. In [10] a single-server Markov modulated queueing network is considered and an averaged cµ rule is shown to be asymptotically optimal for the discounted control problem. Another important aspect of queueing networks is abandonment (reneging), that is, jobs may choose to leave the system while waiting in the queue before their service begins. It is therefore important to include abandonment when modeling queueing systems. In [5], [6], Atar et al. considered a multi-class M/M/N + M queueing network with abandonment and proved that a modified priority policy, referred to as the cµ/θ

rule, is asymptotically optimal for the long-run average cost in the fluid scale. Tezcan and Dai [11] showed the asymptotic optimality of a static priority policy on a finite time interval for a parallel server model under assumed conditions on the ordering of the abandonment rates and running costs. Although static priority policies are easy to implement, they may not be optimal for control problems of many multi-server queueing systems. For the same multi-class M/M/N + M queueing network, discounted cost control problems are studied in [3], [7], [18], and asymptotically optimal controls for these problems are constructed from the minimizer of a Hamilton-Jacobi-Bellman (HJB) equation associated with the controlled diffusions in the Halfin-Whitt regime. The Halfin-Whitt regime is widely used to model large-scale parallel server systems, in which the arrival rates and the number of servers are scaled appropriately so that the system becomes critically loaded and the system operations achieve both high quality (high service levels) and high efficiency (high server utilization). Hence it is also referred to as the Quality-and-Efficiency-Driven (QED) regime; see, e.g., [7], [14]–[17] on the many-server regimes.
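Both static rules above amount to sorting the classes by a scalar index. A minimal sketch (illustrative only; all parameter values are made up for this example) of the classical cµ ranking and of the cµ/θ ranking of Atar et al. [5], [6]:

```python
def cmu_priority(c, mu):
    """Class indices in decreasing order of c_i * mu_i (highest priority first)."""
    scores = [ci * mi for ci, mi in zip(c, mu)]
    return sorted(range(len(c)), key=lambda i: -scores[i])

def cmu_over_theta_priority(c, mu, theta):
    """Class indices in decreasing order of c_i * mu_i / theta_i."""
    scores = [ci * mi / ti for ci, mi, ti in zip(c, mu, theta)]
    return sorted(range(len(c)), key=lambda i: -scores[i])

# Three hypothetical classes: delay costs c, service rates mu, abandonment rates theta.
c, mu, theta = [4.0, 3.0, 2.0], [1.0, 2.0, 1.5], [2.0, 0.5, 1.0]

print(cmu_priority(c, mu))                    # → [1, 0, 2]
print(cmu_over_theta_priority(c, mu, theta))  # → [1, 2, 0]
```

Note how abandonment reshuffles the ranking: a class with a high abandonment rate loses priority under cµ/θ relative to cµ.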

In this article we are interested in an ergodic control problem for a multi-class M/M/N + M queueing network in the Halfin-Whitt regime. The network consists of a single pool of a large number of statistically identical parallel servers and a buffer of infinite capacity. There are multiple classes of jobs, and the arrivals of the different classes are independent Poisson processes. Both the service and reneging rates are class-dependent. Jobs may renege from the queue if their service has not started before their patience times expire. The scheduling policies are work-conserving, that is, no server stays idle if any of the queues is non-empty. The control is the allocation of servers to the different classes of customers at the service completion times. The running cost function is assumed to be a nonnegative convex function with polynomial growth, as a function of the diffusion-scaled queue length processes. We are interested in the existence and uniqueness of asymptotically optimal stable stationary Markov controls for the ergodic control problem, and in the asymptotic behavior of the value functions.

A. Contributions and comparisons

The usual methodology for studying these problems is to consider the associated continuum model, which is the controlled diffusion limit in a heavy-traffic regime, and to study the ergodic control problem for the controlled diffusion. Ergodic control problems governed by controlled diffusions have been well studied in the literature [2], [8] for models that fall into two categories: (a) the running cost is near-monotone, meaning that its value outside a compact set exceeds the optimal average cost, thus penalizing unstable behavior (see Assumption 3.4.2 in [2] for details), or (b) the controlled diffusion is uniformly stable, i.e., every stationary Markov control is stable and the collection of invariant probability measures corresponding to the stationary Markov controls is tight. However, the ergodic control problem at hand does not fall under either of these frameworks. First, the running cost we consider here is not near-monotone, because the total queue length can be 0 even when the total number of customers in the system is O(n). On the other hand, it is not at all clear that the controlled diffusion is uniformly stable (unless one imposes non-trivial hypotheses on the parameters), and this remains an open problem. The multiclass queueing model falls into a new and broad class of non-degenerate controlled diffusions [1], which in a certain way can be viewed as a mixture of the two categories mentioned above. We remark that for a fixed control, the controlled diffusions for the queueing model can be regarded as a special case of the piecewise linear diffusions considered in [12]. It is shown in [12] that these diffusions are stable under constant Markov controls; the proof is via a suitable Lyapunov function. We conjecture that uniform stability holds for the controlled diffusions associated with the queueing model.
For the same multi-class Markovian model, Gamarnik and Stolyar showed that the stationary distributions of the queue lengths are tight under any work-conserving policy and under certain parameter conditions [13, Theorem 2]. Another important contribution of this work is the convergence of the value functions associated with the sequence of multi-class queueing models to the value of the ergodic control problem, say ϱ∗, corresponding to the controlled diffusion model. It is not obvious that one can obtain asymptotic optimality from the existence of optimal stable controls for the HJB equations of controlled diffusions. This fact is relatively straightforward when the cost under consideration is discounted: in that situation the tightness of paths on a finite time horizon is sufficient to prove asymptotic optimality [7]. But we are in a situation where the finite-time behavior of the stochastic process plays no role in the cost. In particular, we need to establish the convergence of the controlled steady states. Although uniform stability of the stationary distributions for this multi-class queueing model under certain parameter conditions is established in [13], it is not obvious that the stochastic model considered here has the property of uniform stability. Therefore we use a different method to establish the asymptotic optimality. First we show that the value functions

are asymptotically bounded below by ϱ∗. To study the upper bound we construct a sequence of Markov scheduling policies that are uniformly stable. The key idea used in establishing such stability results is a spatial truncation technique, under which the Markov policies follow a fixed priority policy outside a given compact set. We believe these techniques can also be used to study ergodic control problems for other many-server queueing models.

The scheduling policies we consider in this paper allow preemption, that is, a customer in service can be interrupted for the server to serve a customer of a different class, and her service will be resumed later. In fact, the asymptotic optimality is shown within the class of work-conserving preemptive policies. In [7], both preemptive and non-preemptive policies are studied, where a non-preemptive scheduling control policy is constructed from the HJB equation associated with preemptive policies and is thus shown to be asymptotically optimal. However, as far as we know, the optimal non-preemptive scheduling problem under the ergodic cost remains open. For a similar line of work in uncontrolled settings we refer the reader to [14], [16]. Admission control of the single-class M/M/N + M model with an ergodic cost criterion in the Halfin-Whitt regime is studied in [21]. For controlled problems and for finite server models, asymptotic optimality is obtained in [9] in the conventional heavy-traffic regime. The main advantage in [9] is the uniform exponential stability of the stochastic processes, which is obtained by using properties of the Skorohod reflection map. A recent work studying ergodic control of multi-class single-server queueing networks is [20].

B. Organization

In Section II we introduce the multi-class queueing model and the scheduling problem under ergodic cost. In Section III we solve the ergodic control problem for the limiting diffusion. In Section IV we prove the results of asymptotic optimality.

II.
SCHEDULING PROBLEM OF THE MULTICLASS PARALLEL-SERVER SYSTEM

A. The multiclass parallel-server system

Let (Ω, F, P) be a given complete probability space; all the stochastic variables introduced below are defined on it. The expectation w.r.t. P is denoted by E. We consider a multi-class Markovian many-server queueing system which consists of d job classes and n parallel servers capable of serving all jobs (see Figure 1). Jobs of class i ∈ {1, . . . , d} arrive according to a Poisson process with rate λ_i^n > 0. Jobs enter the queue of their respective class upon arrival if they are not immediately processed. Jobs of each class are served under the first-come-first-served (FCFS) discipline. While waiting in queue, jobs can abandon the system. The service times and patience times of jobs are class-dependent and both are assumed to be exponentially distributed, that is, class i jobs are served at rate µ_i^n and renege at rate γ_i^n. We assume that job arrivals, service and

Define also R̃_i^n(t) := R_i^n( γ_i^n ∫_0^t Q_i^n(s) ds ). Then the dynamics take the form: for i = 1, . . . , d,

  X_i^n(t) = X_i^n(0) + Ã_i^n(t) − S̃_i^n(t) − R̃_i^n(t).

Fig. 1. A schematic model of the system

abandonment of all classes are mutually independent. The system buffer is assumed to have infinite capacity.

The Halfin-Whitt Regime. We consider a sequence of multiclass parallel server systems indexed by n in the Halfin-Whitt regime (the Quality-and-Efficiency-Driven (QED) regime), in which the arrival rates λ_i^n and the number of servers n both grow appropriately. Let r_i^n := λ_i^n/µ_i^n be the mean offered load of class i customers. The traffic intensity of the nth system is given by ρ^n = n^{−1} Σ_{i=1}^d r_i^n. In the Halfin-Whitt regime, the parameters are assumed to satisfy the following: as n → ∞,

  λ_i^n/n → λ_i > 0,  µ_i^n → µ_i > 0,  γ_i^n → γ_i > 0,
  (λ_i^n − nλ_i)/√n → λ̂_i,  √n (µ_i^n − µ_i) → µ̂_i,
  r_i^n/n → ρ_i := λ_i/µ_i < 1,  Σ_{i=1}^d ρ_i = 1.   (1)

This implies that

  √n (1 − ρ^n) → ρ̂ := Σ_{i=1}^d (ρ_i µ̂_i − λ̂_i)/µ_i ∈ R.
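As a quick numerical sanity check of the scaling in (1) (all limiting parameter values below are made up for illustration; this is not part of the model), one can build the nth system from the limits and watch √n(1 − ρ^n) approach ρ̂:

```python
import math

# Assumed limiting parameters; rho_i = lam_i/mu_i must sum to 1 (critical load).
lam     = [0.5, 0.6]    # lambda_i
mu      = [1.0, 1.2]    # mu_i
lam_hat = [1.0, -0.5]   # hat-lambda_i (second-order arrival perturbation)
mu_hat  = [0.2, 0.3]    # hat-mu_i (second-order service perturbation)

rho = [l / m for l, m in zip(lam, mu)]
assert abs(sum(rho) - 1.0) < 1e-12      # sum_i rho_i = 1, as required in (1)

# rho_hat = sum_i (rho_i * hat-mu_i - hat-lambda_i) / mu_i
rho_hat = sum((r * mh - lh) / m
              for r, mh, lh, m in zip(rho, mu_hat, lam_hat, mu))

def sqrt_n_slack(n):
    """Compute sqrt(n)*(1 - rho^n) for the n-th system in the sequence."""
    lam_n = [n * l + math.sqrt(n) * lh for l, lh in zip(lam, lam_hat)]
    mu_n = [m + mh / math.sqrt(n) for m, mh in zip(mu, mu_hat)]
    rho_n = sum(ln / mn for ln, mn in zip(lam_n, mu_n)) / n
    return math.sqrt(n) * (1.0 - rho_n)

print(rho_hat, sqrt_n_slack(10**6))     # the two numbers nearly agree
```

The discrepancy decays like 1/√n, consistent with the second-order nature of the Halfin-Whitt scaling.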

The above scaling is common in multi-class multi-server models [7], [18]. Note that we do not make any assumption on the sign of ρ̂.

State Descriptors. Let X_i^n = {X_i^n(t) : t ≥ 0} be the total number of class i jobs in the system, Q_i^n = {Q_i^n(t) : t ≥ 0} the number of class i jobs in the queue, and Z_i^n = {Z_i^n(t) : t ≥ 0} the number of class i jobs in service. The following basic relationships hold for these processes: for each t ≥ 0 and i = 1, . . . , d,

  X_i^n(t) = Q_i^n(t) + Z_i^n(t),  e · Z^n(t) ≤ n,   (2)

and Q_i^n(t) ≥ 0 and Z_i^n(t) ≥ 0. We can describe these processes using a collection A_i^n, S_i^n, R_i^n, i = 1, . . . , d, of independent rate-1 Poisson processes. Define

  Ã_i^n(t) := A_i^n(λ_i^n t),  S̃_i^n(t) := S_i^n( µ_i^n ∫_0^t Z_i^n(s) ds ),  t ≥ 0.   (3)
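The state dynamics described above form a continuous-time Markov chain that is easy to simulate. The following discrete-event sketch (an illustration under an assumed static priority allocation, not a policy from this paper) generates X^n = Q^n + Z^n while respecting the constraints in (2):

```python
import random

def allocate(x, n):
    """Work-conserving static priority: class i is served before class i+1."""
    z, free = [], n
    for xi in x:
        zi = min(xi, free)
        z.append(zi)
        free -= zi
    return z

def simulate(lam_n, mu_n, gamma_n, n, horizon, seed=0):
    """Event-driven simulation of the d-class M/M/N+M system.
    Returns the head counts X_i at the end of the horizon."""
    rng = random.Random(seed)
    d = len(lam_n)
    x = [0] * d                       # X_i: class-i jobs in system
    t = 0.0
    while t < horizon:
        z = allocate(x, n)            # Z_i under the priority rule
        q = [xi - zi for xi, zi in zip(x, z)]   # Q_i = X_i - Z_i
        rates = ([lam_n[i] for i in range(d)]           # arrivals
                 + [mu_n[i] * z[i] for i in range(d)]   # services
                 + [gamma_n[i] * q[i] for i in range(d)])  # abandonments
        total = sum(rates)
        t += rng.expovariate(total)   # next event time
        r, acc = rng.random() * total, 0.0
        for k, rk in enumerate(rates):
            acc += rk
            if r <= acc:
                break
        if k < d:
            x[k] += 1                 # arrival of class k
        else:
            x[k % d] -= 1             # service completion or abandonment
    return x

print(simulate([5.0, 5.0], [1.0, 1.0], [0.5, 0.5], n=8, horizon=100.0))
```

Because departure rates vanish when the corresponding Z_i or Q_i is zero, the chain never leaves the nonnegative orthant, matching (2).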

Scheduling Control. Following [7], [18] we only consider work-conserving policies that are non-anticipative and allow preemption. When a server becomes free and there are no jobs waiting in any queue, the server stays idle, but if there are jobs of multiple classes waiting in the queue, the server has to decide which job class to serve. Service preemption is allowed, i.e., service of a job class can be interrupted at any time to serve some other class of jobs, and the original service is resumed at a later time.

A scheduling control policy determines the processes Z^n, which must satisfy the constraints in (2) and the work-conserving constraint, that is,

  e · Z^n(t) = (e · X^n(t)) ∧ n,  t ≥ 0.

Define the action set A^n(x) as

  A^n(x) := { a ∈ Z_+^d : a ≤ x and e · a = (e · x) ∧ n }.

Thus, we can write Z^n(t) ∈ A^n(X^n(t)) for each t ≥ 0. We also assume that all controls are non-anticipative. Define the σ-fields

  F_t^n := σ( X^n(0), Ã_i^n(s), S̃_i^n(s), R̃_i^n(s) : 1 ≤ i ≤ d, s ∈ [0, t] ) ∨ N,
  G_t^n := σ( δÃ_i^n(t, r), δS̃_i^n(t, r), δR̃_i^n(t, r) : 1 ≤ i ≤ d, r ≥ 0 ),

where

  δÃ_i^n(t, r) := A_i^n( λ_i^n t + λ_i^n r ) − Ã_i^n(t),
  δS̃_i^n(t, r) := S_i^n( µ_i^n ∫_0^t Z_i^n(s) ds + µ_i^n r ) − S̃_i^n(t),
  δR̃_i^n(t, r) := R_i^n( γ_i^n ∫_0^t Q_i^n(s) ds + γ_i^n r ) − R̃_i^n(t),

and N is the collection of all P-null sets. The filtration {F_t^n : t ≥ 0} represents the information available up to time t, while G_t^n contains the information about future increments of the processes. We say that a work-conserving control policy is admissible if
(i) Z^n(t) is adapted to F_t^n,
(ii) F_t^n is independent of G_t^n at each time t ≥ 0,
(iii) for each i = 1, . . . , d and t ≥ 0, the process δS̃_i^n(t, · ) agrees in law with S_i^n(µ_i^n · ), and the process δR̃_i^n(t, · ) agrees in law with R_i^n(γ_i^n · ).
We denote the set of all admissible control policies (Z^n, F^n, G^n) by U^n.

B. The scheduling problem with ergodic cost

Define the diffusion-scaled processes

  X̂_i^n(t) := (1/√n)( X_i^n(t) − ρ_i n ),  Q̂_i^n(t) := (1/√n) Q_i^n(t),
  Ẑ_i^n(t) := (1/√n)( Z_i^n(t) − ρ_i n ),   (4)

for t ≥ 0. By (3), we can express X̂_i^n as

  X̂_i^n(t) = X̂_i^n(0) + ℓ_i^n t − µ_i^n ∫_0^t Ẑ_i^n(s) ds − γ_i^n ∫_0^t Q̂_i^n(s) ds
        + M̂_{A,i}^n(t) − M̂_{S,i}^n(t) − M̂_{R,i}^n(t),   (5)

where ℓ_i^n := (1/√n)( λ_i^n − µ_i^n ρ_i n ), and

  M̂_{A,i}^n(t) := (1/√n)( A_i^n(λ_i^n t) − λ_i^n t ),
  M̂_{S,i}^n(t) := (1/√n)( S̃_i^n(t) − µ_i^n ∫_0^t Z_i^n(s) ds ),
  M̂_{R,i}^n(t) := (1/√n)( R̃_i^n(t) − γ_i^n ∫_0^t Q_i^n(s) ds )

are square integrable martingales w.r.t. the filtration {F_t^n}. Note that

  ℓ_i^n → ℓ_i := λ̂_i − ρ_i µ̂_i  as n → ∞.

Define S := { u ∈ R_+^d : e · u = 1 }. For Z^n ∈ U^n we define, for t ≥ 0 and for adapted Û^n(t) ∈ S,

  Q̂^n(t) = (e · X̂^n(t))^+ Û^n(t),
  Ẑ^n(t) = X̂^n(t) − (e · X̂^n(t))^+ Û^n(t).   (6)

If Q̂^n(t) = 0, we define Û^n(t) := e_d = (0, . . . , 0, 1)^T. Thus, Û_i^n represents the fraction of class-i jobs in the queue when the total queue size is positive. As we show later, it is convenient to view Û^n(t) as the control. Note that the controls are non-anticipative and preemption is allowed.

We next introduce the running cost function for the control problem. Let r : R_+^d → R_+ be a given function satisfying

  c_1 |x|^m ≤ r(x) ≤ c_2 ( 1 + |x|^m )  for some m ≥ 1,   (7)

and some positive constants c_i, i = 1, 2. We also assume that r is locally Lipschitz. This assumption covers linear and convex running cost functions. For example, if we let h_i be the holding cost rate for class i customers, then typical running cost functions include r(x) = Σ_{i=1}^d h_i x_i^m, m ≥ 1. These running cost functions evidently satisfy the condition in (7).

Given the initial state X^n(0) and a work-conserving scheduling policy Z^n ∈ U^n, we define the diffusion-scaled cost function as

  J(X̂^n(0), Ẑ^n) := lim sup_{T→∞} (1/T) E[ ∫_0^T r(Q̂^n(s)) ds ],   (8)

where the running cost function r satisfies (7). Note that the cost is defined using the scaled version of Z^n. Then, the associated cost minimization problem becomes

  V̂^n(X̂^n(0)) := inf_{Z^n ∈ U^n} J(X̂^n(0), Ẑ^n).   (9)

We refer to V̂^n(X̂^n(0)) as the diffusion-scaled value function given the initial state X̂^n(0) in the nth system.

We redefine r as r(x, u) = r((e · x)^+ u), and suppose that r : R^d × S → [0, ∞) is locally Lipschitz with polynomial growth and

  c_1 [(e · x)^+]^m ≤ r(x, u) ≤ c_2 ( 1 + [(e · x)^+]^m ),   (10)

for some m ≥ 1 and positive constants c_1 and c_2 that do not depend on u. Some typical examples of such running costs are

  r(x, u) = [(e · x)^+]^m Σ_{i=1}^d h_i u_i^m,  m ≥ 1,   (11)

for some positive vector (h_1, . . . , h_d)^T. From (6) it is easy to see that we can rewrite the control problem as

  V̂^n(X̂^n(0)) = inf J̃(X̂^n(0), Û^n),
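A small numerical check (with assumed values h = (1.5, 3) and m = 2) that the running cost in (11) indeed satisfies the bounds in (10): on the simplex S one has d^{1−m} ≤ Σ_i u_i^m ≤ 1, so (10) holds with c_1 = (min_i h_i) d^{1−m} and c_2 = max_i h_i:

```python
import random

def r(x, u, h, m):
    """Running cost (11): [(e.x)^+]^m * sum_i h_i u_i^m."""
    queue = max(sum(x), 0.0) ** m
    return queue * sum(hi * ui ** m for hi, ui in zip(h, u))

h, m, d = [1.5, 3.0], 2, 2          # assumed cost vector and exponent
c1 = min(h) * d ** (1 - m)          # lower constant in (10)
c2 = max(h)                         # upper constant in (10)

rng = random.Random(1)
for _ in range(1000):
    x = [rng.uniform(-5, 5) for _ in range(d)]
    w = [rng.random() for _ in range(d)]
    u = [wi / sum(w) for wi in w]   # a random point of the simplex S
    q = max(sum(x), 0.0) ** m
    assert c1 * q <= r(x, u, h, m) <= c2 * (1 + q)
print("bounds (10) verified on 1000 random points")
```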

where

  J̃(X̂^n(0), Û^n) := lim sup_{T→∞} (1/T) E[ ∫_0^T r( X̂^n(s), Û^n(s) ) ds ],   (12)

and the infimum is taken over all admissible pairs (X̂^n, Û^n) satisfying (6). For simplicity we assume that the initial condition X̂^n(0) is deterministic and X̂^n(0) → x as n → ∞ for some x ∈ R^d.

III. ERGODIC CONTROL OF THE DIFFUSION LIMIT IN THE HALFIN-WHITT REGIME

A. The ergodic control problem for the limiting diffusion process

As in [7], [18], one formally deduces that, provided X̂^n(0) → x, there exists a limit X for X̂^n on every finite time interval, and the limit process X is a d-dimensional diffusion process with independent components, that is,

  dX_t = b(X_t, U_t) dt + Σ dW_t,   (13)

with initial condition X_0 = x. In (13) the drift b : R^d × S → R^d takes the form

  b(x, u) = ℓ − R( x − (e · x)^+ u ) − (e · x)^+ Γ u,   (14)

with ℓ := (ℓ_1, . . . , ℓ_d)^T, R := diag(µ_1, . . . , µ_d), and Γ := diag(γ_1, . . . , γ_d). The control U_t lives in S and is non-anticipative, W is a d-dimensional standard Wiener process independent of the initial condition X_0 = x, and the covariance matrix is given by ΣΣ^T = diag(2λ_1, . . . , 2λ_d). A formal derivation of the drift in (14) can be obtained from (5) and (6). Let U be the set of all admissible controls for the diffusion model. It is easy to check that the controlled diffusion limit X satisfies the standard assumptions guaranteeing existence and uniqueness of solutions; in particular, the drift b and the diffusion matrix Σ satisfy local Lipschitz continuity, an affine growth condition, and local nondegeneracy [1]. In analogy with (12) we define the ergodic cost associated with the controlled diffusion process X and the running cost function r(x, u) as

  J(x, U) := lim sup_{T→∞} (1/T) E_x^U [ ∫_0^T r(X_t, U_t) dt ],  U ∈ U.

We consider the ergodic control problem

  ϱ∗(x) = inf_{U ∈ U} J(x, U).   (15)
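The limit dynamics (13)-(14) are straightforward to simulate. A hedged Euler-Maruyama sketch (illustrative parameter values; a fixed constant control u ∈ S rather than an optimal one) that also estimates the corresponding long-run average cost for the cost (11) with m = 1:

```python
import math, random

# Assumed parameters for d = 2.
lam = [0.5, 0.6]; mu = [1.0, 1.2]; gamma = [0.5, 1.0]
ell = [0.1, -0.2]                 # ell_i = hat-lambda_i - rho_i*hat-mu_i (assumed)
h = [1.5, 3.0]                    # cost vector in (11)
u = [0.3, 0.7]                    # a fixed constant Markov control in S
sigma = [math.sqrt(2 * l) for l in lam]   # Sigma = diag(sqrt(2*lambda_i))

def drift(x):
    """Drift (14): b(x,u) = ell - R(x - (e.x)^+ u) - (e.x)^+ Gamma u."""
    ex = max(x[0] + x[1], 0.0)
    return [ell[i] - mu[i] * (x[i] - ex * u[i]) - gamma[i] * ex * u[i]
            for i in range(2)]

def ergodic_cost_estimate(T=2000.0, dt=0.01, seed=0):
    rng = random.Random(seed)
    x, cost = [0.0, 0.0], 0.0
    for _ in range(int(T / dt)):
        ex = max(x[0] + x[1], 0.0)
        cost += ex * (h[0] * u[0] + h[1] * u[1]) * dt   # r(x,u), m = 1
        b = drift(x)
        x = [x[i] + b[i] * dt + sigma[i] * math.sqrt(dt) * rng.gauss(0, 1)
             for i in range(2)]
    return cost / T

print(ergodic_cost_estimate())    # a finite long-run average cost
```

Under a constant control the diffusion is piecewise linear and stable [12], so the time average settles to a finite value; an optimal control would make this value ϱ∗.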

We call ϱ∗(x) the optimal value at the initial state x for the controlled diffusion process X. It is shown later that ϱ∗(x) is independent of x.

B. Optimal solution to the ergodic diffusion control problem

The ergodic diffusion control problem in (15) for the limiting diffusion X_t in (13) introduces a new and broad class of ergodic control problems, which is thoroughly studied in [1]. In this section, we state the structural assumptions for the general class of problems and summarize the existence of an optimal solution and its characterization in Theorems III.1 and III.2 (whose proofs can be found in [1]). Let L^u : C^2(R^d) → C(R^d) be the controlled extended generator of the diffusion,

  L^u f(x) := (1/2) a^{ij}(x) ∂_{ij} f(x) + b^i(x, u) ∂_i f(x),  u ∈ U,   (16)

where a := ΣΣ^T, u ∈ U plays the role of a parameter, U is the compact, metrizable control set, and repeated indices are summed. In (16) and elsewhere in this paper we have adopted the notation ∂_i := ∂/∂x_i and ∂_{ij} := ∂^2/∂x_i ∂x_j. The structural assumptions are the following.

Assumption III.1. For some open set K ⊂ R^d, the following hold:
(i) The running cost r is inf-compact on K.
(ii) There exist inf-compact functions V ∈ C^2(R^d) and h ∈ C(R^d × U) such that

  L^u V(x) ≤ 1 − h(x, u)  ∀ (x, u) ∈ K^c × U,
  L^u V(x) ≤ 1 + r(x, u)  ∀ (x, u) ∈ K × U.   (17)

Without loss of generality we assume that V and h are nonnegative. The next assumption is not a structural one, but rather the necessary requirement that the value of the ergodic control problem be finite; otherwise, the problem is vacuous. For U ∈ U define

  ϱ_U(x) := lim sup_{T→∞} (1/T) E_x^U [ ∫_0^T r(X_s, U_s) ds ].

Assumption III.2. There exists U ∈ U such that ϱ_U(x) < ∞ for some x ∈ R^d.

Assumption III.2 alone does not imply that ϱ_v < ∞ for some v ∈ U_SSM, where U_SSM stands for the set of stationary Markov controls under which the associated diffusion is positive recurrent. However, when combined with Assumption III.1, this is the case, as the following lemma asserts.

Lemma III.1. Let Assumptions III.1 and III.2 hold. Then there exists u_0 ∈ U_SSM such that ϱ_{u_0} < ∞. Moreover, there exist a nonnegative inf-compact function V_0 ∈ C^2(R^d) and a positive constant η such that

  L^{u_0} V_0(x) ≤ η − r(x, u_0(x))  ∀ x ∈ R^d.   (18)

Conversely, if (18) holds, then Assumption III.2 holds.

Remark III.1. There is no loss of generality in using only the constant η in (18), since V_0 can always be scaled to achieve this. We also observe that for K = R^d the problem reduces to an ergodic control problem with near-monotone cost, and for K = ∅ we obtain an ergodic control problem under a uniformly stable controlled diffusion.

The controlled diffusion process in (13) belongs to a large class of controlled diffusion processes, called piecewise linear controlled diffusions [12]. We can show that this class of controlled diffusions satisfies Assumptions III.1 and III.2 (the proof can be found in [1]).

Proposition III.1. Let b and r be given by (14) and (10), respectively. Then (13) satisfies Assumptions III.1 and III.2, with h(x) = c_0 ( 1 + |x|^m ) and

  K := { x : δ|x| < (e · x)^+ },   (19)

for appropriate positive constants c_0 and δ.

We next state the existence of a stationary Markov control that is optimal, and its characterization via the HJB equation. Let U_SM be the set of stationary Markov controls.

Theorem III.1. Let G denote the set of ergodic occupation measures corresponding to controls in U_SSM, and G_β those corresponding to controls in U_SM^β := { U ∈ U : ϱ_U(x) ≤ β for some x ∈ R^d }, for β > ϱ∗. Then the following hold:
(a) The set G_β is compact in P(R^d) for any β > ϱ∗.
(b) There exists v ∈ U_SM such that ϱ_v = ϱ∗.

Theorem III.2. There exists a unique function V∗ ∈ C^2(R^d) with V∗(0) = 0, which is bounded below in R^d and satisfies

  min_{u ∈ U} [ L^u V∗(x) + r(x, u) ] = ϱ∗.

A stationary Markov control v is optimal for the ergodic control problem relative to r if and only if it satisfies

  H( x, ∇V∗(x) ) = b( x, v(x) ) · ∇V∗(x) + r( x, v(x) )  a.e.,

where

  H(x, p) := min_{u ∈ U} [ b(x, u) · p + r(x, u) ].

Moreover, for an optimal v ∈ U_SM, we have

  lim_{T→∞} (1/T) E_x^v [ ∫_0^T r( X_s, v(X_s) ) ds ] = ϱ∗  ∀ x ∈ R^d.
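The pointwise minimization defining H(x, p) is a finite-dimensional optimization over the simplex S. A toy sketch for d = 2 (all parameter values are assumptions): for the cost (11) with m = 1, both b(x, u) · p and r(x, u) are linear in u, so the minimum over S is attained at a vertex, which a coarse grid search confirms:

```python
# Assumed model parameters for d = 2.
mu = [1.0, 1.2]; gamma = [0.5, 1.0]; ell = [0.1, -0.2]; h = [1.5, 3.0]

def objective(x, p, u):
    """b(x,u).p + r(x,u) with the drift (14) and the cost (11), m = 1."""
    ex = max(x[0] + x[1], 0.0)
    b = [ell[i] - mu[i] * (x[i] - ex * u[i]) - gamma[i] * ex * u[i]
         for i in range(2)]
    r = ex * (h[0] * u[0] + h[1] * u[1])
    return b[0] * p[0] + b[1] * p[1] + r

def H(x, p, grid=101):
    """Grid-search approximation of min over the simplex S."""
    us = [(k / (grid - 1), 1 - k / (grid - 1)) for k in range(grid)]
    return min(objective(x, p, u) for u in us)

x, p = [1.0, 2.0], [0.4, -0.3]
vertex_min = min(objective(x, p, (1.0, 0.0)), objective(x, p, (0.0, 1.0)))
assert abs(H(x, p) - vertex_min) < 1e-9
print(H(x, p))
```

For m > 1 the inner cost is strictly convex in u, the minimizer can be interior, and the grid search (or a convex solver) becomes genuinely necessary.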

C. Numerical Examples

We give a numerical example with two classes of jobs and the running cost function r in (11). The system parameters are: arrival rates λ_1 = 100, λ_2 = 150; service rates µ_1 = 1, µ_2 = 1.2; abandonment rates γ_1 = 0.5, γ_2 = 1; holding (in queue) cost rates q_1 = 1, q_2 = 1.5; abandonment penalties p_1 = 1, p_2 = 1.5; the number of servers N = 240; and the safety staffing parameter ρ̂ = 1. The total cost (holding plus abandonment) rates are h_i = q_i + p_i γ_i: h_1 = 1.5 and h_2 = 3. The numerical solutions are computed with the policy iteration algorithm. In Fig. 2, the optimal solutions to the ergodic diffusion control problem with linear and quadratic costs are shown for class-1 jobs.
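The actual computation requires a solver for the HJB equation of the diffusion; as a toy stand-in (an assumption, not the paper's solver), the following shows the average-cost policy iteration loop on a small controlled birth-death queue: each iteration solves the Poisson equation g + h = c + Ph with h(0) = 0 by a linear solve, then improves the policy greedily.

```python
N = 5                    # queue capacity (toy)
LAM = 0.4                # arrival probability per slot
MU = {0: 0.3, 1: 0.6}    # service probability under slow/fast action
KAPPA = 2.0              # effort cost of the fast action

def cost(s, a):
    return float(s) + KAPPA * a          # holding cost + effort

def trans(s, a):
    """Transition probabilities {s': p} of the discrete-time chain."""
    p = {}
    up = LAM if s < N else 0.0
    down = MU[a] if s > 0 else 0.0
    p[s + 1] = p.get(s + 1, 0.0) + up
    p[s - 1] = p.get(s - 1, 0.0) + down
    p[s] = p.get(s, 0.0) + 1.0 - up - down
    return {k: v for k, v in p.items() if v > 0.0}

def solve(A, b):
    """Tiny Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def evaluate(policy):
    """Solve g + h(s) = c(s, pi(s)) + sum_s' P h(s'), with h(0) = 0.
    Unknowns ordered as (g, h(1), ..., h(N))."""
    n = N + 1
    A = [[0.0] * n for _ in range(n)]
    b = [0.0] * n
    for s in range(n):
        A[s][0] = 1.0                    # coefficient of g
        for sp, p in trans(s, policy[s]).items():
            if sp >= 1:
                A[s][sp] -= p            # -P h(s'); h(0) is fixed to 0
        if s >= 1:
            A[s][s] += 1.0               # +h(s)
        b[s] = cost(s, policy[s])
    sol = solve(A, b)
    return sol[0], [0.0] + sol[1:]

def improve(h):
    return [min((0, 1), key=lambda a: cost(s, a)
                + sum(p * h[sp] for sp, p in trans(s, a).items()))
            for s in range(N + 1)]

policy = [0] * (N + 1)
for _ in range(20):
    g, h = evaluate(policy)
    new = improve(h)
    if new == policy:
        break
    policy = new

print("average cost:", g, "policy:", policy)
```

The structure of the loop (policy evaluation via a Poisson equation, then greedy improvement) is the same as in the diffusion setting, where evaluation instead means solving a linear elliptic PDE.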

Fig. 2. Optimal Control Solution: Class 1. (a) linear cost (m = 1); (b) quadratic cost (m = 2).

IV. ASYMPTOTIC OPTIMALITY

In this section we prove that the value of the ergodic control problem corresponding to the multi-class M/M/N + M queueing network converges asymptotically to ϱ∗, the value of the ergodic control problem for the controlled diffusion.

Theorem IV.1. Let X̂^n(0) → x ∈ R^d as n → ∞, and assume that (1) and (7) hold. Then

  lim inf_{n→∞} V̂^n(X̂^n(0)) ≥ ϱ∗(x),

where ϱ∗(x) is given by (15).

Theorem IV.2. Suppose the assumptions of Theorem IV.1 hold and, in addition, that r in (8) is convex. Then

  lim sup_{n→∞} V̂^n(X̂^n(0)) ≤ ϱ∗(x).

Thus, we conclude that for any convex running cost function r, Theorems IV.1 and IV.2 establish the asymptotic convergence of the ergodic control problem for the queueing model.

A. The lower bound

In this section we prove Theorem IV.1.

Proof of Theorem IV.1. Recall the definition of V̂^n in (9), and consider a sequence such that sup_n V̂^n(X̂^n(0)) < ∞. Let ϕ ∈ C^2(R) be any function satisfying ϕ(x) := |x|^m for |x| ≥ 1. Applying Itô's formula to ϕ (see, e.g., [19, Theorem 26.7]), we obtain from (5) that

  E[ϕ(X̂_1^n(t))] = E[ϕ(X̂_1^n(0))]
    + E[ ∫_0^t Θ_1^n( X̂_1^n(s), Ẑ_1^n(s) ) ϕ'( X̂_1^n(s) ) ds ]
    + E[ ∫_0^t Θ_2^n( X̂_1^n(s), Ẑ_1^n(s) ) ϕ''( X̂_1^n(s) ) ds ]
    + E[ Σ_{s≤t} ( Δϕ(X̂_1^n(s)) − ϕ'(X̂_1^n(s−)) ΔX̂_1^n(s)
          − (1/2) ϕ''(X̂_1^n(s−)) ΔX̂_1^n(s) ΔX̂_1^n(s) ) ],

where

  Θ_1^n(x, z) := ℓ_1^n − µ_1^n z − γ_1^n (x − z),
  Θ_2^n(x, z) := (1/2)( µ_1 ρ_1 + λ_1^n/n + ( µ_1^n z + γ_1^n (x − z) )/√n ).

Since {ℓ_1^n} is a bounded sequence, it is easy to show that for all n there exist positive constants κ_i, i = 1, 2, independent of n, such that

  Θ_1^n(x, z) ϕ'(x) ≤ κ_1 ( 1 + |(e · x)^+|^m ) − κ_2 |x|^m,
  Θ_2^n(x, z) ϕ''(x) ≤ κ_1 ( 1 + |(e · x)^+|^m ) + (κ_2/4) |x|^m,

provided that x − z ≤ (e · x)^+ and z/√n ≤ 1. We next compute the terms corresponding to the jumps. First we note that the jump size is of order 1/√n. We can also find a positive constant κ_3 such that

  sup_{|y − x| ≤ 1} ϕ''(y) ≤ κ_3 ( 1 + |x|^{m−2} )  ∀ x.

Using Taylor's approximation we obtain the inequality

  Δϕ(X̂_1^n(s)) − ϕ'(X̂_1^n(s−)) ΔX̂_1^n(s)
    ≤ (1/2) sup_{|y − X̂_1^n(s−)| ≤ 1} ϕ''(y) |ΔX̂_1^n(s)|^2.

Hence, combining the above facts, we obtain

  E[ Σ_{s≤t} ( Δϕ(X̂_1^n(s)) − ϕ'(X̂_1^n(s−)) ΔX̂_1^n(s)
        − (1/2) ϕ''(X̂_1^n(s−)) ΔX̂_1^n(s) ΔX̂_1^n(s) ) ]
    ≤ E[ κ_3 Σ_{s≤t} ( 1 + |X̂_1^n(s−)|^{m−2} ) |ΔX̂_1^n(s)|^2 ]
    ≤ E[ ∫_0^t ( κ_4 + (κ_2/4) |X̂_1^n(s)|^m + κ_5 ((e · X̂^n(s))^+)^m ) ds ],   (20)

for some suitable positive constants κ_4 and κ_5, independent of n, where in the second inequality we use the fact that the optional quadratic variation [X̂_1^n] is the sum of the squares of the jumps, and that [X̂_1^n] − ⟨X̂_1^n⟩ is a martingale. Therefore, for some positive constants C_1 and C_2 it holds that

  0 ≤ E[ϕ(X̂_1^n(t))]
    ≤ E[ϕ(X̂_1^n(0))] + C_1 t − (κ_2/2) E[ ∫_0^t |X̂_1^n(s)|^m ds ]
      + C_2 E[ ∫_0^t ((e · X̂^n(s))^+)^m ds ].   (21)

By (7), we have r(Q̂^n(s)) ≥ (c_1/d^m) ((e · X̂^n(s))^+)^m, which, combined with the assumption that sup_n V̂^n(X̂^n(0)) < ∞, implies that

  sup_n lim sup_{T→∞} (1/T) E[ ∫_0^T ((e · X̂^n(s))^+)^m ds ] < ∞.

In turn, from (21) we obtain

  sup_n lim sup_{T→∞} (1/T) E[ ∫_0^T |X̂_1^n(s)|^m ds ] < ∞.

Repeating the same argument for the coordinates i = 2, . . . , d, we obtain

  sup_n lim sup_{T→∞} (1/T) E[ ∫_0^T |X̂^n(s)|^m ds ] < ∞.   (22)

We introduce the process

  U_i^n(t) := ( X̂_i^n(t) − Ẑ_i^n(t) ) / (e · X̂^n(t))^+,  1 ≤ i ≤ d,  if (e · X̂^n(t))^+ > 0,
  U^n(t) := e_d  otherwise.

Since Z^n is work-conserving, it follows that U^n takes values in S. Define the mean empirical measures

  Φ_T^n(A × B) := (1/T) E[ ∫_0^T 1_{A×B}( X̂^n(s), U^n(s) ) ds ]

for Borel sets A ⊂ R^d and B ⊂ S. From (22) we see that the family {Φ_T^n : T > 0, n ≥ 1} is tight. Hence for any sequence T_k → ∞, there exists a subsequence, also denoted by T_k, such that Φ_{T_k}^n → π_n as k → ∞. It is evident that {π_n : n ≥ 1} is tight. Let π_n → π along some subsequence, with π ∈ P(R^d × S). Therefore it is not hard to show that

  lim_{n→∞} V̂^n(X̂^n(0)) ≥ ∫_{R^d × S} r̃(x, u) π(dx, du),

where, as defined earlier, r̃(x, u) = r((e · x)^+ u). To complete the proof of the theorem we only need to show that π is an ergodic occupation measure for the diffusion. To that end, consider f ∈ C_c^∞(R^d). Recall that [X̂_i^n, X̂_j^n] = 0 for i ≠ j [23, Lemma 9.2, Lemma 9.3]. Therefore, using Itô's formula and the definition of Φ_T^n, we obtain

  (1/T) E[ f(X̂^n(T)) ] = (1/T) E[ f(X̂^n(0)) ]
    + ∫_{R^d × S} Σ_{i=1}^d ( A_i^n(x, u) ∂_i f(x) + B_i^n(x, u) ∂_{ii} f(x) ) Φ_T^n(dx, du)
    + (1/T) E[ Σ_{s≤T} ( Δf(X̂^n(s)) − Σ_{i=1}^d ∂_i f(X̂^n(s−)) ΔX̂_i^n(s)
          − (1/2) Σ_{i,j=1}^d ∂_{ij} f(X̂^n(s−)) ΔX̂_i^n(s) ΔX̂_j^n(s) ) ],   (23)

where

  A_i^n(x, u) := ℓ_i^n − µ_i^n ( x_i − (e · x)^+ u_i ) − γ_i^n (e · x)^+ u_i,
  B_i^n(x, u) := (1/2)( µ_i ρ_i + λ_i^n/n + ( µ_i^n x_i + (γ_i^n − µ_i^n)(e · x)^+ u_i )/√n ).

We first bound the last term in (23). Using Taylor's formula we see that

  Δf(X̂^n(s)) − Σ_{i=1}^d ∂_i f(X̂^n(s−)) ΔX̂_i^n(s)
      − (1/2) Σ_{i,j=1}^d ∂_{ij} f(X̂^n(s−)) ΔX̂_i^n(s) ΔX̂_j^n(s)
    ≤ ( k ||f||_{C^3} / √n ) Σ_{i,j=1}^d ΔX̂_i^n(s) ΔX̂_j^n(s),

for some positive constant k, where we use the fact that the jump size is 1/√n. Hence, using the fact that independent Poisson processes do not have simultaneous jumps w.p. 1, and the identity Q̂_i^n = X̂_i^n − Ẑ_i^n, we obtain

  (1/T) E[ Σ_{s≤T} ( Δf(X̂^n(s)) − Σ_{i=1}^d ∂_i f(X̂^n(s−)) ΔX̂_i^n(s)
        − (1/2) Σ_{i,j=1}^d ∂_{ij} f(X̂^n(s−)) ΔX̂_i^n(s) ΔX̂_j^n(s) ) ]
    ≤ ( k ||f||_{C^3} / √n ) (1/T) E[ ∫_0^T Σ_{i=1}^d ( λ_i^n/n + µ_i^n Z_i^n(s)/n + γ_i^n Q_i^n(s)/n ) ds ].   (24)

Therefore, first letting T → ∞ and using (20) and (22), we see that the expectation on the right-hand side of (24) is bounded. Therefore, as n → ∞, the left-hand side of (24) tends to 0. Thus by (23) and the fact that f is compactly supported, we obtain

  ∫_{R^d × S} L^u f(x) π(dx, du) = 0,

where

  L^u f(x) = Σ_{i=1}^d [ λ_i ∂_{ii} f(x) + ( ℓ_i − µ_i ( x_i − (e · x)^+ u_i ) − γ_i (e · x)^+ u_i ) ∂_i f(x) ].

Therefore π ∈ G.

B. The upper bound

The proof of the upper bound in Theorem IV.2 is a little more involved than that of the lower bound. Generally it is very helpful if one has uniform stability across n ∈ N (see, e.g., [9]). In [9] uniform stability is obtained from the reflected dynamics with the Skorohod mapping. Here we establish the asymptotic upper bound by using a new technique referred to as spatial truncation. Let v_δ be any precise continuous control in U_SSM satisfying v_δ(x) = u_0 = (0, . . . , 0, 1) for |x| > K > 1. First we construct a work-conserving admissible policy for each n ∈ N (see [7]). Define a measurable map $ : { z ∈ R_+^d : e · z ∈ Z } → Z_+^d as follows: for z = (z_1, . . . , z_d), let

  $(z) := ( ⌊z_1⌋, . . . , ⌊z_{d−1}⌋, ⌊z_d⌋ + Σ_{i=1}^d ( z_i − ⌊z_i⌋ ) ).

Note that |$(z) − z| ≤ 2d. Define

  u_h(x) := $( (e · x − n)^+ v_δ(x̂^n) ),  x ∈ R_+^d,

where

  x̂^n := ( (x_1 − ρ_1 n)/√n, . . . , (x_d − ρ_d n)/√n ),
  A_n := { x ∈ R_+^d : sup_i |x_i − ρ_i n| ≤ K √n }.

We define a state-dependent, work-conserving policy as follows:
\[
Z^n_i[X^n] \;:=\;
\begin{cases}
X^n_i - u_{h,i}(X^n)\,, & \text{if } X^n \in A_n\,,\\[2pt]
X^n_i \wedge \Bigl(n - \sum_{j=1}^{i-1} X^n_j\Bigr)^+\,, & \text{otherwise,}
\end{cases} \tag{25}
\]
where $u_{h,i}$ denotes the $i$-th coordinate of $u_h$. Therefore, whenever the state of the system is in $A^c_n$, the system works under the fixed priority policy with the least priority given to class-$d$ jobs. First we show that this is a well-defined policy for all large $n$. It is enough to show that $X^n_i - u_{h,i}(X^n) \ge 0$ for all $i$ when $X^n \in A_n$. If not, then for some $i$, $1 \le i \le d$, we must have $X^n_i - u_{h,i}(X^n) < 0$, and so $X^n_i < (e\cdot X^n - n)^+ + d$. Since $X^n \in A_n$, we obtain (using $e\cdot\rho = 1$)
\[
-K\sqrt{n} + \rho_i n \;\le\; X^n_i \;<\; (e\cdot X^n - n)^+ + d
\;=\; \Bigl(\sum_{i=1}^{d}\bigl(X^n_i - \rho_i n\bigr)\Bigr)^+ + d \;\le\; dK\sqrt{n} + d\,.
\]
But this cannot hold for large $n$. Hence the policy is well defined for all large $n$. Under the policy defined in (25), $X^n$ is a Markov process, and its generator is given by
\[
L^n f(x) \;=\; \sum_{i=1}^{d} \lambda^n_i\bigl(f(x+e_i) - f(x)\bigr)
+ \sum_{i=1}^{d} \mu^n_i Z^n_i[x]\bigl(f(x-e_i) - f(x)\bigr)
+ \sum_{i=1}^{d} \gamma^n_i Q^n_i[x]\bigl(f(x-e_i) - f(x)\bigr)\,, \qquad x \in \mathbb{Z}^d_+\,,
\]
where $Z^n[x]$ is as above and $Q^n[x] := x - Z^n[x]$. It is easy to see that, for $x \notin A_n$,
\[
Q^n_i[x] \;=\; \biggl[x_i - \Bigl(n - \sum_{j=1}^{i-1} x_j\Bigr)^+\biggr]^+\,.
\]
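As a concrete illustration of the spatial truncation in (25), the following Python sketch computes $Z^n[x]$ and $Q^n[x]$ for a given state (our own illustration; the function names and the placeholder control `v_delta` are assumptions). Inside $A_n$ the queue vector follows the rounded diffusion-scale control; outside $A_n$ the allocation follows the fixed priority rule with least priority given to class $d$.

```python
import math

def round_preserving_sum(z):
    # the map varpi from the text: floor all but the last coordinate,
    # push the fractional mass into the last one
    floors = [math.floor(zi) for zi in z]
    frac = sum(zi - fi for zi, fi in zip(z, floors))
    return floors[:-1] + [floors[-1] + frac]

def allocation(x, n, rho, K, v_delta):
    """Return (Z, Q) under the spatial-truncation policy (25).

    x       -- jobs of each class in the system (nonnegative integers)
    n       -- number of servers
    rho     -- load fractions, sum(rho) == 1
    K       -- truncation radius in diffusion scale
    v_delta -- control mapping the diffusion-scaled state to a
               probability vector over classes (queue proportions)
    """
    d = len(x)
    x_hat = [(x[i] - rho[i] * n) / math.sqrt(n) for i in range(d)]
    in_A_n = max(abs(x[i] - rho[i] * n) for i in range(d)) <= K * math.sqrt(n)
    if in_A_n:
        excess = max(sum(x) - n, 0)            # total queue (e.x - n)^+
        u = v_delta(x_hat)
        Q = round_preserving_sum([excess * ui for ui in u])
        Z = [x[i] - Q[i] for i in range(d)]    # >= 0 for large n (well-definedness)
    else:
        Z, served = [], 0                      # fixed priority, class d last
        for i in range(d):
            Z.append(min(x[i], max(n - served, 0)))
            served += x[i]                     # served counts sum_{j<i} x_j
        Q = [x[i] - Z[i] for i in range(d)]
    return Z, Q
```

Both branches are work-conserving: the number of busy servers is $\min(e\cdot x,\, n)$ in each case.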

Lemma IV.1. Let $X^n$ be the Markov process corresponding to the above control, and let $q$ be an even positive integer. Then there exists $n_0 \in \mathbb{N}$ such that
\[
\sup_{n \ge n_0}\,\limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \bigl|\hat{X}^n(s)\bigr|^q\,ds\biggr] \;<\; \infty\,,
\]
where $\hat{X}^n = (\hat{X}^n_1,\dots,\hat{X}^n_d)^{\mathsf{T}}$ is the diffusion-scaled process corresponding to $X^n$, as defined in (4). The proof of this lemma can be found in the technical supplement [1].

Proof of Theorem IV.2. Let $r$ be the given running cost with polynomial growth of exponent $m$ as in (7), and let $q = 2(m+1)$. Recall that $\tilde{r}(x,u) = r\bigl((e\cdot x)^+ u\bigr)$ for $(x,u) \in \mathbb{R}^d \times \mathcal{S}$. Then $\tilde{r}$ is convex in $u$ and satisfies (10) with the same exponent

$m$. For any $\delta > 0$ we choose $v_\delta \in \mathfrak{U}_{\mathrm{SSM}}$ such that $v_\delta$ is a continuous precise control with invariant probability measure $\mu_\delta$ and
\[
\int_{\mathbb{R}^d} \tilde{r}\bigl(x, v_\delta(x)\bigr)\,\mu_\delta(dx) \;\le\; \varrho^* + \delta\,. \tag{26}
\]

We also want the control $v_\delta$ to have the property that $v_\delta(x) = (0,\dots,0,1)$ outside a large ball. To obtain such a $v_\delta$, we can find $v'_\delta$ and a ball $B_l$, for $l$ large, such that $v'_\delta \in \mathfrak{U}_{\mathrm{SSM}}$, $v'_\delta(x) = e_d$ for $|x| > l$, $v'_\delta$ is continuous in $B_l$, and
\[
\Bigl|\int_{\mathbb{R}^d} \tilde{r}\bigl(x, v'_\delta(x)\bigr)\,\mu'_\delta(dx) - \varrho^*\Bigr| \;<\; \frac{\delta}{2}\,,
\]
where $\mu'_\delta$ is the invariant probability measure corresponding to $v'_\delta$. We note that $v'_\delta$ might not be continuous on $\partial B_l$. Let $\{\chi_n : n \in \mathbb{N}\}$ be a sequence of cut-off functions such that $\chi_n \in [0,1]$, $\chi_n$ vanishes on $B^c_{l-\frac{1}{n}}$, and $\chi_n$ takes the value $1$ on $B_{l-\frac{2}{n}}$. Define the sequence
\[
v^n_\delta(x) \;:=\; \chi_n(x)\,v'_\delta(x) + \bigl(1 - \chi_n(x)\bigr)\,e_d\,.
\]
Then $v^n_\delta \to v'_\delta$ as $n \to \infty$, and the convergence is uniform on the complement of any neighborhood of $\partial B_l$. Also, by Proposition III.1, the corresponding invariant probability measures $\mu^n_\delta$ are exponentially tight. Thus
\[
\Bigl|\int_{\mathbb{R}^d} \tilde{r}\bigl(x, v^n_\delta(x)\bigr)\,\mu^n_\delta(dx) - \int_{\mathbb{R}^d} \tilde{r}\bigl(x, v'_\delta(x)\bigr)\,\mu'_\delta(dx)\Bigr| \;\xrightarrow[n\to\infty]{}\; 0\,.
\]
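The cut-off construction can be illustrated numerically. The sketch below is our own illustration: the names `chi_n` and `blend`, and the piecewise-linear shape of the cut-off, are assumptions, since the text only prescribes where $\chi_n$ equals $0$ and $1$. It builds a $\chi_n$ that vanishes for $|x| \ge l - 1/n$, equals $1$ for $|x| \le l - 2/n$, and interpolates linearly in between, and then blends $v'_\delta$ with $e_d$.

```python
import math

def chi_n(x, l, n):
    """Piecewise-linear cut-off: equals 1 on B_{l-2/n}, vanishes
    outside B_{l-1/n}, linear in |x| in between."""
    r = math.sqrt(sum(xi * xi for xi in x))
    return min(1.0, max(0.0, n * ((l - 1.0 / n) - r)))

def blend(v_prime, x, l, n, d):
    """v_delta^n(x) = chi_n(x) v'(x) + (1 - chi_n(x)) e_d."""
    c = chi_n(x, l, n)
    e_d = [0.0] * (d - 1) + [1.0]
    v = v_prime(x)
    return [c * vi + (1.0 - c) * ei for vi, ei in zip(v, e_d)]
```

The blended control is a convex combination of two probability vectors, hence again a probability vector, and it agrees with $e_d$ outside $B_{l-1/n}$, as required.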

Combining the above two expressions, we can easily find a $v_\delta$ which satisfies (26). We construct a scheduling policy as in Lemma IV.1. By Lemma IV.1 we see that for some constant $K_1$ we have
\[
\sup_{n \ge n_0}\,\limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \bigl|\hat{X}^n(s)\bigr|^q\,ds\biggr] \;<\; K_1\,, \tag{27}
\]
with $q = 2(m+1)$. Define
\[
v_h(x) \;:=\; \varpi\bigl((e\cdot x - n)^+\, v_\delta(\hat{x}^n)\bigr)\,, \qquad
\hat{v}_h(\hat{x}^n) \;:=\; \varpi\bigl(\sqrt{n}\,(e\cdot\hat{x}^n)^+\, v_\delta(\hat{x}^n)\bigr)\,.
\]
Since $v_\delta(\hat{x}^n) = (0,\dots,0,1)$ when $|\hat{x}^n| \ge K$, it follows that $Q^n[X^n] = X^n - Z^n[X^n] = v_h(X^n)$ for large $n$, provided that $\sum_{i=1}^{d-1} X^n_i \le n$. Define
\[
D_n \;:=\; \Bigl\{x : \sum_{i=1}^{d-1} \hat{x}^n_i > \rho_d\sqrt{n}\Bigr\}\,.
\]
Then
\[
r\bigl(\hat{Q}^n(t)\bigr) \;=\; r\Bigl(\tfrac{1}{\sqrt{n}}\,\hat{v}_h\bigl(\hat{X}^n(t)\bigr)\Bigr)
+ r\bigl(\hat{X}^n(t) - \hat{Z}^n(t)\bigr)\,\mathbb{1}_{\{\hat{X}^n(t)\in D_n\}}
- r\Bigl(\tfrac{1}{\sqrt{n}}\,\hat{v}_h\bigl(\hat{X}^n(t)\bigr)\Bigr)\,\mathbb{1}_{\{\hat{X}^n(t)\in D_n\}}\,.
\]

Define, for each $n$, the mean empirical measure $\Psi^n_T$ by
\[
\Psi^n_T(A) \;:=\; \frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \mathbb{1}_A\bigl(\hat{X}^n(t)\bigr)\,dt\biggr]\,.
\]
By (27), the family $\{\Psi^n_T : T > 0,\ n \ge 1\}$ is tight. We next show that
\[
\lim_{n\to\infty}\,\limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T r\bigl(\hat{Q}^n(t)\bigr)\,dt\biggr]
\;=\; \int_{\mathbb{R}^d} r\bigl((e\cdot x)^+\, v_\delta(x)\bigr)\,\mu_\delta(dx)\,. \tag{28}
\]
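Time averages of the running cost, as on the left-hand side of (28), can be estimated by simulation. The sketch below is our own toy illustration, not the multiclass system of the paper: a single-class M/M/n+M queue with made-up rates, used only to show how the mean empirical average of $r(\hat{Q}^n(t))$ is computed along one trajectory.

```python
import random

def ergodic_cost_estimate(lam, mu, gamma, n, r, T, seed=0):
    """Time-average of r(Q-hat) along one path of a single-class
    M/M/n+M queue: state x = jobs in system, min(x, n) in service,
    (x - n)^+ queued; abandonments occur at rate gamma per queued job."""
    rng = random.Random(seed)
    x, t, acc = n, 0.0, 0.0          # start at the critical level x = n
    while t < T:
        q = max(x - n, 0)
        rate = lam + mu * min(x, n) + gamma * q
        dt = min(rng.expovariate(rate), T - t)
        acc += r(q / n**0.5) * dt    # integrate r(Q-hat) over [t, t+dt)
        t += dt
        if t >= T:
            break
        if rng.random() < lam / rate:
            x += 1                   # arrival
        else:
            x = max(x - 1, 0)        # service completion or abandonment
    return acc / T
```

For large $T$ the returned value approximates $\int r\,d\Psi$-type averages under the simulated policy; here it only serves to make the definition of $\Psi^n_T$ concrete.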

For each $n$, select a sequence $\{T^n_k : k \in \mathbb{N}\}$ along which the `lim sup' in (28) is attained. By tightness there exists a limit point $\Psi^n$ of $\Psi^n_{T^n_k}$. Since $\Psi^n$ has support on a discrete lattice, we have
\[
\int_{\mathbb{R}^d} r\Bigl(\frac{\hat{v}_h(x)}{\sqrt{n}}\Bigr)\,\Psi^n_{T^n_k}(dx)
\;\xrightarrow[k\to\infty]{}\;
\int_{\mathbb{R}^d} r\Bigl(\frac{\hat{v}_h(x)}{\sqrt{n}}\Bigr)\,\Psi^n(dx)\,.
\]
Therefore,
\[
\limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T r\bigl(\hat{Q}^n(t)\bigr)\,dt\biggr]
\;\lessgtr\; \int_{\mathbb{R}^d} r\Bigl(\frac{\hat{v}_h(x)}{\sqrt{n}}\Bigr)\,\Psi^n(dx) \,\pm\, \mathcal{E}^n\,,
\]
where
\[
\mathcal{E}^n \;:=\; \limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \Bigl(r\bigl(\hat{Q}^n(t)\bigr) + r\Bigl(\frac{1}{\sqrt{n}}\,\hat{v}_h\bigl(\hat{X}^n(t)\bigr)\Bigr)\Bigr)\,\mathbb{1}_{\{\hat{X}^n(t)\in D_n\}}\,dt\biggr]\,.
\]

By (27), the family $\{\Psi^n : n \ge 1\}$ is tight, and hence it has a limit point $\Psi$. By definition,
\[
\Bigl|\frac{1}{\sqrt{n}}\,\hat{v}_h(x) - (e\cdot x)^+\, v_\delta(x)\Bigr| \;\le\; \frac{2d}{\sqrt{n}}\,.
\]
Thus, using the continuity property of $r$ and (7), it follows that
\[
\int_{\mathbb{R}^d} r\Bigl(\frac{\hat{v}_h(x)}{\sqrt{n}}\Bigr)\,\Psi^n(dx)
\;\xrightarrow[n\to\infty]{}\;
\int_{\mathbb{R}^d} r\bigl((e\cdot x)^+\, v_\delta(x)\bigr)\,\Psi(dx)\,,
\]
along some subsequence. Therefore, in order to complete the proof of (28), we need to show that $\limsup_{n\to\infty} \mathcal{E}^n = 0$. Since the policies are work-conserving, we observe that $0 \le \hat{X}^n - \hat{Z}^n \le (e\cdot\hat{X}^n)^+$, and therefore, for some positive constants $\kappa_1$ and $\kappa_2$, we have
\[
r\Bigl(\frac{1}{\sqrt{n}}\,\hat{v}_h\bigl(\hat{X}^n(t)\bigr)\Bigr) \vee r\bigl(\hat{Q}^n(t)\bigr)
\;\le\; \kappa_1 + \kappa_2\,\bigl((e\cdot\hat{X}^n)^+\bigr)^m\,.
\]
Given $\varepsilon > 0$, we can choose $n_1$ so that for all $n \ge n_1$,
\[
\limsup_{T\to\infty}\,\frac{1}{T}\,\mathbb{E}\biggl[\int_0^T \bigl((e\cdot\hat{X}^n(s))^+\bigr)^m\,\mathbb{1}_{\{|\hat{X}^n(s)| > \frac{\rho_d}{\sqrt{d}}\sqrt{n}\}}\,ds\biggr] \;\le\; \varepsilon\,,
\]
where we use (27). We observe that $D_n \subset \bigl\{|\hat{x}^n| > \rho_d\sqrt{n/d}\bigr\}$. Thus (28) holds. In order to complete the proof, we only need to show that $\Psi$ is the invariant probability measure corresponding to $v_\delta$. This can be shown using the convergence of generators as in the proof of Theorem IV.1.
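The inclusion $D_n \subset \{|\hat{x}^n| > \rho_d\sqrt{n/d}\}$ follows from a standard Cauchy–Schwarz estimate; a one-line check (our sketch):

```latex
% If \hat x^n \in D_n, then by the Cauchy-Schwarz inequality
\[
  \rho_d\sqrt{n} \;<\; \sum_{i=1}^{d-1}\hat x^n_i
  \;\le\; \sqrt{d-1}\,\Bigl(\sum_{i=1}^{d-1}(\hat x^n_i)^2\Bigr)^{1/2}
  \;\le\; \sqrt{d}\,\bigl|\hat x^n\bigr|,
\]
% so that |\hat x^n| > (\rho_d/\sqrt{d})\sqrt{n} = \rho_d\sqrt{n/d},
% which is the set appearing in the moment bound above.
```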

REFERENCES

[1] A. Arapostathis, A. Biswas, and G. Pang. Ergodic control of multiclass M/M/n+M queues in the Halfin-Whitt regime. Ann. Appl. Probab., forthcoming, 2014.
[2] A. Arapostathis, V. S. Borkar, and M. K. Ghosh. Ergodic Control of Diffusion Processes, volume 143 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 2012.
[3] R. Atar. Scheduling control for queueing systems with many servers: asymptotic optimality in heavy traffic. Ann. Appl. Probab., 15(4):2606-2650, 2005.
[4] R. Atar and A. Biswas. Control of the multiclass G/G/1 queue in the moderate deviation regime. Ann. Appl. Probab., 24(5):2033-2069, 2014.
[5] R. Atar, C. Giat, and N. Shimkin. The cµ/θ rule for many-server queues with abandonment. Oper. Res., 58(5):1427-1439, 2010.
[6] R. Atar, C. Giat, and N. Shimkin. On the asymptotic optimality of the cµ/θ rule under ergodic cost. Queueing Syst., 67(2):127-144, 2011.
[7] R. Atar, A. Mandelbaum, and M. I. Reiman. Scheduling a multi-class queue with many exponential servers: asymptotic optimality in heavy traffic. Ann. Appl. Probab., 14(3):1084-1134, 2004.
[8] V. S. Borkar. Optimal Control of Diffusion Processes, volume 203 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical, Harlow, 1989.
[9] A. Budhiraja, A. P. Ghosh, and C. Lee. Ergodic rate control problem for single class queueing networks. SIAM J. Control Optim., 49(4):1570-1606, 2011.
[10] A. Budhiraja, A. P. Ghosh, and X. Liu. Scheduling control for Markov-modulated single-server multiclass queueing systems in heavy traffic. Queueing Syst., 2014.
[11] J. G. Dai and T. Tezcan. Optimal control of parallel server systems with many servers in heavy traffic. Queueing Syst., 59(2):95-134, 2008.
[12] A. B. Dieker and X. Gao. Positive recurrence of piecewise Ornstein-Uhlenbeck processes and common quadratic Lyapunov functions. Ann. Appl. Probab., 23(4):1291-1317, 2013.
[13] D. Gamarnik and A. L. Stolyar. Multiclass multiserver queueing system in the Halfin-Whitt heavy traffic regime: asymptotics of the stationary distribution. Queueing Syst., 71(1-2):25-51, 2012.
[14] D. Gamarnik and A. Zeevi. Validity of heavy traffic steady-state approximation in generalized Jackson networks. Ann. Appl. Probab., 16(1):56-90, 2006.
[15] O. Garnett, A. Mandelbaum, and M. I. Reiman. Designing a call center with impatient customers. Manufacturing and Service Operations Management, 4(3):208-227, 2002.
[16] I. Gurvich. Diffusion models and steady-state approximations for exponentially ergodic Markovian queues. Ann. Appl. Probab., 2014 (to appear).
[17] S. Halfin and W. Whitt. Heavy-traffic limits for queues with many exponential servers. Oper. Res., 29(3):567-588, 1981.
[18] J. M. Harrison and A. Zeevi. Dynamic scheduling of a multiclass queue in the Halfin-Whitt heavy traffic regime. Oper. Res., 52(2):243-257, 2004.
[19] O. Kallenberg. Foundations of Modern Probability. Probability and its Applications. Springer-Verlag, New York, second edition, 2002.
[20] J. Kim and A. R. Ward. Dynamic scheduling of a GI/GI/1+GI queue with multiple customer classes. Queueing Syst., 75(2-4):339-384, 2013.
[21] Y. L. Kocaga and A. R. Ward. Admission control for a multi-server queue with abandonment. Queueing Syst., 65(3):275-323, 2010.
[22] A. Mandelbaum and A. L. Stolyar. Scheduling flexible servers with convex delay costs: heavy-traffic optimality of the generalized cµ-rule. Oper. Res., 52(6):836-855, 2004.
[23] G. Pang, R. Talreja, and W. Whitt. Martingale proofs of many-server heavy-traffic limits for Markovian queues. Probab. Surv., 4:193-267, 2007.
[24] J. A. van Mieghem. Dynamic scheduling with convex delay costs: the generalized cµ rule. Ann. Appl. Probab., 5(3):809-833, 1995.
