BEYOND THE $c\mu$ RULE: DYNAMIC SCHEDULING OF A TWO-CLASS LOSS QUEUE

Eungab Kim and Mark P. Van Oyen

Faculty of Management, University of Toronto, Toronto, Ontario, M5S 3E6, CANADA
E-mail: [email protected]

Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208-3119, USA
E-mail: [email protected]
URL: http://primal.iems.nwu.edu/vanoyen/
To appear in Mathematical Methods of Operations Research, Vol. 48

Keywords: polling system, stochastic scheduling, finite buffer, loss penalty, threshold policy, $c\mu$ rule

AMS Classification Numbers: 60K25 Queueing Theory; 90B35 Scheduling Theory; 68M20 Performance Evaluation, Queueing, Scheduling

Corresponding Author: Mark P. Van Oyen
Phone: 847-491-7008
FAX: 847-491-8005

March 20, 1997
Beyond the $c\mu$ Rule: Dynamic Scheduling of a Two-class Loss Queue$^1$

Eungab Kim
Faculty of Management, University of Toronto, Toronto, Ontario, M5S 3E6, CANADA

Mark P. Van Oyen
Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208-3119, USA
Abstract
We consider scheduling a single server in a two-class M/M/1 queueing system with finite buffers subject to holding costs and rejection costs for rejected jobs. We use dynamic programming to investigate the structural properties of optimal policies. Provided that the delay of serving a job is always less costly than rejecting an arrival, we show that the optimal policy has a monotonic threshold type of switching curve; otherwise, numerical analysis indicates that the threshold structure may not be optimal.
1. Introduction

We consider the problem of scheduling a single server in a two-class M/M/1 queue with finite buffers and heterogeneous holding costs and service rates. Holding costs are incurred at rate $c_n$ for each unit of time that jobs of type $n$ wait in the system. Preemptive service is assumed; that is, service can be interrupted before its completion. The arrivals of type $n$ who find buffer $n$ full are rejected, in which case a rejection cost, $S_n$, is incurred. In establishing the main results, we assume that there is no switching penalty upon switching from one queue to the other. The goal is to identify a scheduling policy which minimizes the total expected discounted holding and rejection cost over a horizon $T$.

When queueing capacities are unlimited and hence there are no rejection penalties or lost jobs, the scheduling problem has been well investigated in the literature, and the $c\mu$ rule (also known as Smith's rule or the weighted shortest processing time discipline) is known to be optimal (see Buyukkoc et al. [2], Varaiya et al. [17], and Walrand [18]). The $c\mu$ rule is a simple index rule that always serves the available job with the largest $c\mu$ index. Work has also been done to extend the model to allow switching penalties. Hofri and Ross [7] and Liu et al. [11] studied the case of homogeneous systems. For heterogeneous systems, Duenyas and Van Oyen [3], [4] and Koole [10] partially characterized the optimal policy and developed heuristic approaches. Reiman and Wein [12] proposed heuristic scheduling policies under the heavy traffic assumption.

$^1$ This material is based upon work supported by the National Science Foundation under Grant No. DMI-9522795 to Northwestern University and was performed as part of the first author's dissertation under the supervision of the second author.

The feature of finite queue capacity occurs in many communication and manufacturing systems due to inherent limitations of system memory, pallets, and floor space as well as policy-imposed limits on buffer size. Most of the literature, however, has typically assumed infinite buffers, often for tractability. When the queue capacity constraint is introduced, analysis becomes complex even without switching penalties, and few results have been reported in the area of optimal control of queues. Rosberg and Kermani [13] studied the problem of scheduling a shared resource in a finite queueing model without rejection and switching penalties. They found the asymptotically optimal policy under a low traffic assumption. Based on this, they derived a threshold type of policy called the "overflow scheduling policy", which can abandon (i.e., never serve) queues with low $c\mu$ indices. For systems with setup times, Kim and Van Oyen [9] developed and tested heuristic policies for finite-capacity polling systems. This class of polling systems with setup times and setup costs was used in Kim et al. [8] as a model for which various implementations of the dynamic programming algorithm are compared.
In addition to value iteration, policy iteration, modified policy iteration, the replacement process decomposition aggregation-based method, and the block scaling aggregation-based method, a new variation on modified policy iteration called "value iteration with policy evaluation" was developed, and the latter was found to be the most effective overall.

The scheduling problem considered here can be viewed as a two-class polling model with zero set-up/walk times. Our model, however, differs from polling models in the literature in that we consider limited queueing capacity with rejection penalties and address the question of optimization with respect to the scheduling policy employed. Because dynamic scheduling policies can respond appropriately when the system reaches states with at least one buffer near overflow, it is clear that different policies result in different job loss rates. Holding costs actually provide an incentive to keep the queue lengths imbalanced and in some cases provide an incentive to maximize the loss rates. For these reasons, we include rejection penalties.

The problem studied here shares features with the classical problem of scheduling a multiclass M/M/1 system under holding costs and infinite buffers. Because of the lost sales penalties, however, the operating policies of this system should strike a balance between the total size of the queue and the number of jobs to be processed between queues. Therefore, the optimal policies of this problem take a different form from those of the classical system. Our main result is that the optimal policy under the rejection penalty assumption has a monotonic threshold type switching curve, provided that the delay of serving a job is always less costly than rejecting an arrival. When rejection penalties are zero, our numerical results showed that optimal policies are complex and do not possess this structure. In addition to the explicit goal of achieving improved performance, we believe that the study of optimal policies for this class of problems can be valuable in the design of heuristic policies. Toward this end, we contribute a brief discussion of results indicating the impact of switching costs on the policy structure.

The rest of this paper is organized as follows: Section 2 provides a dynamic programming formulation of the problem. In Section 3, we show that the $c\mu$ rule is not optimal for finite multiclass M/M/1 queueing systems. Section 4 presents a proof of the existence of a monotonic threshold switching curve, and discusses generalizations of the basic problem to unequal service rates and switching costs.
2. Problem Formulation

A single server is to be allocated to jobs in a system of two finite parallel queues of lengths $M_1, M_2$ fed by Poisson arrivals. By parallel queues, we mean that a job served in any queue directly exits the system. Each queue $n$ possesses an exponential service distribution with mean $\mu_n^{-1}$ ($0 < \mu_n^{-1} < \infty$). For technical reasons, we assume that service times are associated with the server, and not job type; that is, $\mu_1 = \mu_2 = \mu$. Successive services in queue $n$ are independent and identically distributed (i.i.d.) and independent of all else. Jobs arrive to queue $n$ according to a Poisson process with strictly positive rate $\lambda_n$ (independent of all other processes). Since finite-buffer systems are inherently stable, we assume that $\rho = \sum_{n=1}^{2} \rho_n \in [0, \infty)$, where $\rho_n = \lambda_n/\mu$. A holding cost is assessed at rate $c_n$ for each job of type $n$ in queue $n$. A rejection cost, $S_n$ ($0 < S_n < \infty$), is incurred at each instant an arrival of type $n$ finds that queue $n$ is full.

A policy specifies, at each decision epoch, that the server either remain working in the present queue, idle in the present queue, or switch to another queue for service. Without loss of optimality, the class of admissible strategies, $U$, is taken to be the set of non-anticipative, stationary, nonrandomized, Markov policies that are based on perfect observations of the queue length processes (see Ross [14]). In addition, $U$ is restricted to the class of greedy policies, which never idle in a nonempty queue, so admissible policies are work-conserving. Because of the preemptive service assumption, the set of decision epochs is assumed to be the set of all arrival epochs and service completion epochs.

The optimal scheduling problem considered here can be formulated as a discrete-time stochastic dynamic programming problem by using uniformization (see Bertsekas [1]). This uniformized version has a transition rate $\Lambda = \lambda_1 + \lambda_2 + \mu$ for all states. Without loss of generality, we scale the time unit so that $\Lambda = 1$. Let $\beta$ be the discount factor of this discrete-time MDP. Denote by $x = (x_1, x_2)$ the queue length vector of queues 1 and 2 and the state space by $S = \{0, 1, \ldots, M_1\} \times \{0, 1, \ldots, M_2\} \times \{1, 2\}$. The state of the system under a given policy is described by the vector $(x_1, x_2, n(t)) \in S$ for all $t \in \{0, 1, \ldots, T\}$, where $n(t)$ denotes that the server is located at queue $n(t)$ at time $t$. The last component of the state, $n(t)$, is not essential, but we have found it convenient in relating this model and analysis to those with switching penalties, such as will be discussed in Section 4.2.
Suppose at a decision epoch $t$, the state is $(x_1, x_2, n(t))$, and let the action space be $A = \{1, 2\}$. Then, action $A(t) = n \in A$, where $n \neq n(t)$, causes the server to switch to queue $n$ and serve it (if $x_n = 0$, the server idles). The action $A(t) = n(t)$ results in the service of a job in $n(t)$ if $x_{n(t)} > 0$; otherwise, the server idles in the current empty queue. No other actions are possible.

Let $r_t(x, n)$ ($s_t(x, n)$) denote the expected discounted cost to go from state $(x, n)$ given that the remain (switch) action is taken at $t$. With the initial condition $V_T(x, n) = 0$, the DP equation of the discrete time model under the total expected discounted cost criterion is given as follows:

$$V_t(x, n) = \min\{r_t(x, n),\ s_t(x, n)\}, \quad t = 0, \ldots, T-1, \qquad (2.1)$$

where

$$s_t(x, n) = r_t(x, n \oplus 1), \qquad (2.2)$$

$$r_t(x, n) = \sum_{j=1}^{2} \{c_j x_j + \lambda_j S_j \mathbf{1}(x_j = M_j)\} + \beta \sum_{j=1}^{2} \lambda_j V_{t+1}(A_j x, n) + \beta \mu V_{t+1}(D_n x, n), \qquad (2.3)$$

$$n \oplus 1 = \begin{cases} 2 & \text{if } n = 1, \\ 1 & \text{if } n = 2, \end{cases}$$

$$A_n x = \begin{cases} ((x_1 + 1) \wedge M_1,\ x_2) & \text{if } n = 1, \\ (x_1,\ (x_2 + 1) \wedge M_2) & \text{if } n = 2, \end{cases}$$

and

$$D_n x = \begin{cases} ((x_1 - 1) \vee 0,\ x_2) & \text{if } n = 1, \\ (x_1,\ (x_2 - 1) \vee 0) & \text{if } n = 2. \end{cases}$$
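The recursion (2.1)-(2.3) can be sketched as a short backward induction. The following is a minimal illustration and not the authors' implementation: the function names `arrive`, `depart`, and `solve_finite_horizon` are hypothetical, queues are indexed 0 and 1 rather than 1 and 2, and the rates are assumed to be already scaled so that the two arrival rates and the service rate sum to 1. Ties between remaining and switching are broken in favor of remaining.

```python
import itertools

def arrive(x, j, M):
    """A_j x: add a job to queue j unless its buffer is full."""
    y = list(x)
    y[j] = min(y[j] + 1, M[j])
    return tuple(y)

def depart(x, n):
    """D_n x: remove a job from queue n if one is present."""
    y = list(x)
    y[n] = max(y[n] - 1, 0)
    return tuple(y)

def solve_finite_horizon(M, c, S, lam, mu, beta, T):
    """Backward induction for (2.1)-(2.3), assuming lam[0] + lam[1] + mu == 1.

    Returns the value function and decisions at t = 0 for a horizon of T steps.
    """
    states = list(itertools.product(range(M[0] + 1), range(M[1] + 1)))
    V = {(x, n): 0.0 for x in states for n in (0, 1)}  # terminal condition V_T = 0
    policy = {}
    for _ in range(T):
        r = {}
        for x in states:
            # one-step holding cost plus expected rejection cost
            cost = sum(c[j] * x[j] + lam[j] * S[j] * (x[j] == M[j]) for j in (0, 1))
            for n in (0, 1):
                future = sum(lam[j] * V[(arrive(x, j, M), n)] for j in (0, 1))
                future += mu * V[(depart(x, n), n)]
                r[(x, n)] = cost + beta * future
        new_V = {}
        for x in states:
            for n in (0, 1):
                stay, switch = r[(x, n)], r[(x, 1 - n)]  # s_t(x, n) = r_t(x, other queue)
                new_V[(x, n)] = min(stay, switch)
                policy[(x, n)] = "switch" if switch < stay else "remain"
        V = new_V
    return V, policy
```

Calling `solve_finite_horizon((10, 10), c, S, lam, mu, beta, T)` with user-chosen parameters returns dictionaries keyed by `((x1, x2), n)`; the decisions can then be inspected for the switching-curve structure studied in Section 4.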
3. Sub-optimality of the $c\mu$ Rule for Finite Multi-Class Queueing Systems with Holding Costs

In this section, we assume the rejection penalties to be zero, focus on the case of holding costs alone, and allow $\mu_1 \neq \mu_2$. The $c\mu$ rule is known to be optimal in scheduling multi-class M/M/1 queueing systems with infinite buffers. It is also known, although we are not aware of any formal proofs, that if each queue has finite queueing capacity, the $c\mu$ rule is no longer an optimal scheduling policy for this class of systems. In this section, we provide proof of this.

Suppose the capacity (buffer size) of each queue is $M_1$ and $M_2$, respectively. We begin with an intuitive argument indicating why the $c\mu$ rule fails in the case of finite buffers. The quantity $c_n \mu_n$ can be interpreted as the expected (reward) rate per unit time at which holding cost is reduced when the server works at queue $n$. This reward rate index turns out to be useful for a number of objective functions, including problems with finite horizons, discounting, and the average cost per unit time criterion (see Gittins [5], Varaiya et al. [17], and Walrand [18]). Since the arrival rates cannot be controlled, it is appropriate that the scheduling policy only focus on the controllable rate at which work is performed.

When the queues are limited in size, what we refer to as a "boundary effect" arises. Near the boundaries, $M_1, M_2$, the server is able to in some sense control the arrival processes by allowing queues to overflow, even though overflow may be avoidable. In principle, however, we may think of cost entering the system at rate $\sum_{i=1}^{2} c_i \lambda_i$, and thus an incentive of $\lambda_i c_i$ is gained by allowing queue $i$ to overflow. At the boundaries of the state space, we can in turn redefine the reward rates (expected holding cost reduction rates) for the service of queue $n$ according to

$$c_n \mu_n \mathbf{1}\{x_n > 0\} + c_{n \oplus 1} \lambda_{n \oplus 1} \mathbf{1}\{x_{n \oplus 1} = M_{n \oplus 1}\}$$

to quantify the effective rate of system holding cost reduction when the server selects a job of type $n$. For example, with $x_1 = M_1$ and $x_2$ away from its boundary, if queue 1 is served, the expected rate of holding cost reduction is $c_1 \mu_1$, while service of class 2 yields approximately $c_2 \mu_2 + c_1 \lambda_1$, because queue 1 is full and type 1 arrivals are rejected. If $c_1 \mu_1 = c_2 \mu_2$, one can expect an extra cost reduction rate, $c_1 \lambda_1$, upon choosing the switching action to queue 2, provided queue 1 has nearly full capacity. Thus, for states where one queue is significantly larger than the other, there is an incentive to work in the shorter queue. As the states get farther away from these boundaries, the optimal policy becomes the $c\mu$ rule.

This effect is seen in Figure 1, which presents a graphical description of an optimal policy under the average cost per unit time criterion for a system with $M_1 = M_2 = 10$, $c_1 = 1.1$, $c_2 = 1$, $\mu_1 = \mu_2 = 1$, $\lambda_1 = 0.3$, and $\lambda_2 = 0.1$. The optimal policy was found using value iteration with a termination criterion that ensures the accuracy of the value function to be within $\epsilon = 10^{-5}$. Since $c_1 \mu_1 > c_2 \mu_2$ in this example, queue 1 is exhausted under the $c\mu$ rule (which is optimal if the queue capacity is not limited). Figure 1, however, shows that queue 1 is not exhausted when queue 1 is almost full, because of the boundary effect. For example, consider state $x_1 = 10$, $x_2 = 1$. If the server remains in queue 1, the expected cost reduction rate is $c_1 \mu_1 = 1.1$. If the server switches to queue 2, the expected cost reduction rate becomes $c_2 \mu_2 + c_1 \lambda_1 = 1.33$. Therefore, switching to queue 2 saves more cost, even though $c_1 \mu_1 > c_2 \mu_2$. In states with $x_1 = x_2 = 10$, comparison of the reward rates yields $c_1 \mu_1 = 1.1 < c_2 \mu_2 + c_1 \lambda_1 = 1.33$, which is consistent with choosing queue 2. The only boundary point (even with $x_i = 0$) not correctly predicted by this approximate calculation is state $(x_1 = 9, x_2 = 10, n = 1)$.
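The boundary-adjusted reward rate described above is easy to evaluate directly. The snippet below is an illustrative sketch only: the function name `reward_rate` and the 0-based queue indexing are assumptions made for the example, and the parameter values are those quoted for Figure 1.

```python
def reward_rate(n, x, M, c, mu, lam):
    """Approximate holding-cost reduction rate when the server picks queue n.

    Adds c*mu for the served queue (if nonempty) and c*lambda for the other
    queue when that queue is full, since its arrivals are then rejected.
    """
    other = 1 - n
    rate = c[n] * mu[n] * (x[n] > 0)
    rate += c[other] * lam[other] * (x[other] == M[other])
    return rate

# Parameters of the Figure 1 example (queue 1 -> index 0, queue 2 -> index 1).
M = (10, 10)
c = (1.1, 1.0)
mu = (1.0, 1.0)
lam = (0.3, 0.1)

state = (10, 1)  # queue 1 full, one job in queue 2
stay_rate = reward_rate(0, state, M, c, mu, lam)    # about 1.1
switch_rate = reward_rate(1, state, M, c, mu, lam)  # about 1.33
```

In state $(10, 1)$ the rate for serving queue 2 exceeds the rate for serving queue 1, which matches the comparison of 1.1 against 1.33 in the text.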
We employ these ideas to identify an extremely high-traffic case that disproves the optimality of the $c\mu$ rule in general. To do so, we consider the problem under the average cost per unit time criterion rather than the total expected discounted cost criterion.

Theorem 1: The $c\mu$ rule is not optimal in general for a finite-buffer multi-class M/M/1 queueing system with holding costs (and without rejection costs) under the average cost per unit time criterion.
Proof: Consider a two-queue example where $c_1 \mu_1 > c_2 \mu_2$, $\rho_1 > 1$, and $\rho_2 < 1$, where $\rho_n = \lambda_n / \mu_n$, $n = 1, 2$. Let $X_1$ and $X_2$ be the number of jobs in queues 1 and 2, respectively. Assume that the $c\mu$ rule is selected. Since $c_1 \mu_1 > c_2 \mu_2$, queue 1 is a top priority queue. Furthermore, since $\rho_1 > 1$, queue 2 will be abandoned most of the time and the server will almost always serve queue 1. The stationary probability of the number of jobs, when the server serves queue 1 exhaustively, is given by

$$P(X_1 = i) = \frac{1 - \bar{\rho}_1}{1 - \bar{\rho}_1^{M_1 + 1}}\, \bar{\rho}_1^{M_1 - i}, \quad i = 0, \ldots, M_1, \qquad (3.1)$$

where $\bar{\rho}_1 = \mu_1 / \lambda_1 = \rho_1^{-1}$. Let $\bar{X}_1 = M_1 - X_1$. Then, we have

$$E(\bar{X}_1) = \frac{\bar{\rho}_1 [1 - (M_1 + 1) \bar{\rho}_1^{M_1} + M_1 \bar{\rho}_1^{M_1 + 1}]}{(1 - \bar{\rho}_1^{M_1 + 1})(1 - \bar{\rho}_1)}. \qquad (3.2)$$

Therefore, the expected number of jobs when the server serves queue 1 and never queue 2 is

$$E(X_1) = E(M_1 - \bar{X}_1) = M_1 - E(\bar{X}_1). \qquad (3.3)$$

As $\rho_1 \to \infty$, the fraction of time spent in queue 2 vanishes and the average cost per unit time under the $c\mu$ rule is then arbitrarily close to

$$h_1 = c_1 (M_1 - E(\bar{X}_1)) + c_2 M_2. \qquad (3.4)$$
Now consider an alternative rule that abandons queue 1 and serves only queue 2. The average cost per unit time for this policy is given by

$$h_2 = c_1 M_1 + c_2 E(X_2), \qquad (3.5)$$

where

$$E(X_2) = \frac{\rho_2 [1 - (M_2 + 1) \rho_2^{M_2} + M_2 \rho_2^{M_2 + 1}]}{(1 - \rho_2^{M_2 + 1})(1 - \rho_2)}, \qquad (3.6)$$

the expected number of jobs in queue 2.

[Figure 1: Optimal policy for a finite two-queue system subject only to holding costs. Horizontal axis: First Queue Length (0 to 10); vertical axis: Second Queue Length (0 to 10). Markers distinguish three actions: staying at the current queue; switching from queue 1 to queue 2 (o); and switching from queue 2 to queue 1 (*).]
Supposing $\bar{\rho}_1 = \rho_2 \to 0$, we have $E(\bar{X}_1) \to 0$ and $E(X_2) \to 0$. It follows that $h_1 - h_2 \to c_2 M_2$ because the advantage of the alternative rule with respect to the $c\mu$ rule becomes arbitrarily close to

$$h_1 - h_2 = -c_1 E(\bar{X}_1) + c_2 (M_2 - E(X_2)). \qquad (3.7)$$

Therefore, the $c\mu$ rule is not optimal. □
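The limiting argument in the proof can be checked numerically. The sketch below is illustrative only: the function names `mean_queue_length` and `cost_gap` are hypothetical, the closed form is the standard mean of an M/M/1 queue with a finite buffer as in (3.6), and the cost values are borrowed from the Figure 1 example purely for concreteness. The reciprocal load of queue 1 is passed as `rho_bar1`.

```python
def mean_queue_length(rho, M):
    """Mean number of jobs in an M/M/1 queue with buffer M and load rho != 1."""
    numerator = rho * (1 - (M + 1) * rho**M + M * rho**(M + 1))
    denominator = (1 - rho**(M + 1)) * (1 - rho)
    return numerator / denominator

def cost_gap(c1, c2, M1, M2, rho_bar1, rho2):
    """h1 - h2: limiting cost of the c-mu rule minus cost of serving only queue 2."""
    h1 = c1 * (M1 - mean_queue_length(rho_bar1, M1)) + c2 * M2
    h2 = c1 * M1 + c2 * mean_queue_length(rho2, M2)
    return h1 - h2

# As rho_bar1 = rho2 shrinks, the gap should approach c2 * M2 = 10.
gaps = [cost_gap(1.1, 1.0, 10, 10, r, r) for r in (0.5, 0.1, 0.001)]
```

A positive gap means the alternative rule has the lower average cost, so the values in `gaps` illustrate how serving only queue 2 outperforms the limiting $c\mu$ policy in this regime.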
4. Optimality of a Monotonic Threshold Policy

In this section, with respect to minimizing discounted costs over a horizon, $T$, we show that there exists a monotonic threshold type of switching curve under some technical assumptions. We begin with $T \in \mathbb{N}$, then extend the result in a straightforward manner to the case with $T = \infty$. We define

$$\Delta_n V_t(x, n) = V_t(A_n x, n) - V_t(x, n).$$

In other words, $\Delta_n V_t(x, n)$ is the marginal cost of holding one more job of type $n$ when the server is serving queue $n$. We first state the basic properties held by the value function $V_t$.
Lemma 1:

(i)
$$V_t(x, n) = V_t(x, n \oplus 1); \qquad (4.1)$$
$$\Delta_n V_t(x, n) = \Delta_n V_t(x, n \oplus 1); \qquad (4.2)$$

(ii)
$$\Delta_n V_t(x, m) \ge 0, \quad 0 \le x_n \le M_n - 1,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1}; \qquad (4.3)$$

(iii) If $V_t(x, n) = s_t(x, n)$, then $V_t(x, n \oplus 1) = r_t(x, n \oplus 1)$.

Proof: (i) By definition of $V_t$, it follows that $V_t(x, n) = V_t(x, n \oplus 1)$. Therefore, we have $\Delta_n V_t(x, n) = \Delta_n V_t(x, n \oplus 1)$. (ii) A straightforward induction argument establishes (ii), so we omit it. (iii) Since, by hypothesis, $s_t(x, n) = r_t(x, n \oplus 1) < r_t(x, n) = s_t(x, n \oplus 1)$, we have $V_t(x, n \oplus 1) = r_t(x, n \oplus 1)$. □
For concreteness we assume that if $r_t(x, n) = r_t(x, n \oplus 1)$, such as is the case when $x = (0, 0)$, the action to remain is chosen for both states. If the system is not empty, $V_t(x, n) = r_t(x, n)$, and $r_t(x, n) < r_t(x, n \oplus 1)$, then $V_t(x, n \oplus 1) = s_t(x, n \oplus 1)$. Our analysis of this problem is restricted to the following condition:

Assumption A:
$$S_n \ge c_n / (1 - \beta), \quad n = 1, 2. \qquad (4.4)$$

The right-hand side of (4.4) is the total discounted holding cost incurred to hold a job in the system forever. Assumption A is equivalent to saying that the delay of serving a job is always less costly than a rejected arrival. Intuitively, one might think that this condition suggests that for a policy to be optimal, it must not sacrifice throughput in exchange for reduced holding costs. Actually, this is not so for two reasons. First, it is possible to have $S_{n \oplus 1} \le c_n / (1 - \beta) \le S_n$, so that optimality is achieved at a loss of throughput for type $n \oplus 1$ to prevent large holding costs for type $n$. Moreover, for a particular state such as (0,1,1), the effect of switching versus remaining may have such a small expected discounted impact on the rejection penalty that holding cost considerations dominate. Our numerical experiments verify the fact that Assumption A does not trivialize this problem.

The following theorem states properties held by the $t$-stage optimal cost-to-go function $V_t$ under Assumption A. Property (a) is a sufficient condition to guarantee that a threshold policy is optimal and that it is monotonic. We say $V_t$ is supermodular and convex if it satisfies properties (b) and (d), respectively. In particular, we say $V_t$ is diagonally dominant if it meets property (c) (see Ha [6] for terminology). Property (e) provides an upper bound on the marginal cost incurred when one more job is held in the system. Property (f) is a technical property. Note that all of the following properties are needed to properly justify (a).
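Theorem 2: Suppose Assumption A holds. Then, with $p_n = \lambda_n S_n$, the expected rejection cost during a one-step transition, $m \in \{1, 2\}$, and $n \in \{1, 2\}$, we have

(a)
$$\Delta_n r_t(x, n) \le \Delta_n r_t(x, n \oplus 1), \quad 0 \le x_n \le M_n - 1,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1}; \qquad (4.5)$$

(b)
$$\Delta_n V_t(x, m) \le \Delta_n V_t(A_{n \oplus 1} x, m), \quad 0 \le x_n \le M_n - 1,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1} - 1; \qquad (4.6)$$

(c)
$$\Delta_n V_t(A_{n \oplus 1} x, m) \le \Delta_n V_t(A_n x, m), \quad 0 \le x_n \le M_n - 2,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1} - 1; \qquad (4.7)$$

(d)
$$\Delta_n V_t(x, m) \le \Delta_n V_t(A_n x, m), \quad 0 \le x_n \le M_n - 2,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1}; \qquad (4.8)$$

(e)
$$\Delta_n V_t(x, m) \le \frac{1 - \{\beta(\lambda_{n \oplus 1} + \mu)\}^{T - t}}{1 - \beta(\lambda_{n \oplus 1} + \mu)} (c_n + p_n), \quad 0 \le x_n \le M_n - 1,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1}; \qquad (4.9)$$

(f)
$$\beta \lambda_n \Delta_n V_t(x, m) < p_n, \quad 0 \le x_n \le M_n - 1,\ 0 \le x_{n \oplus 1} \le M_{n \oplus 1}. \qquad (4.10)$$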
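Proof: We use induction. The case $t = T$ is trivial because $V_T(x) = 0$ for all $x \in S$. Assume that (a)-(f) hold up to $t$. Consider $t - 1$.

(a)
$$\Delta_n r_{t-1}(x, n) - \Delta_n r_{t-1}(x, n \oplus 1) \qquad (4.11)$$
$$= \beta \lambda_n \{\Delta_n V_t(A_n x, n) - \Delta_n V_t(A_n x, n \oplus 1)\} \mathbf{1}(x_n < M_n - 1) \qquad (4.12)$$
$$+ \beta \lambda_{n \oplus 1} \{\Delta_n V_t(A_{n \oplus 1} x, n) - \Delta_n V_t(A_{n \oplus 1} x, n \oplus 1)\} \qquad (4.13)$$
$$+ \beta \mu \{\Delta_n V_t(D_n x, n) \mathbf{1}(x_n > 0) - \Delta_n V_t(D_{n \oplus 1} x, n \oplus 1)\} \qquad (4.14)$$
$$\le 0. \qquad (4.15)$$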
By Lemma 1 (i), $\Delta_n V_t(x', n) = \Delta_n V_t(x', n \oplus 1)$, so we have (4.12) = (4.13) = 0. We have (4.14) $\le 0$ by (c) at $t$ if $x_n > 0$ and by Lemma 1 (ii) if $x_n = 0$.

(b) We distinguish all combinations of actions at time $t - 1$ in states $(x, n)$, $(A_n x, n)$, $(A_{n \oplus 1} x, n)$, and $(A_n A_{n \oplus 1} x, n)$, denoted by $(r/s, r/s, r/s, r/s)$, respectively. For example, $(r, r, r, r)$ indicates that $(x, n)$, $(A_n x, n)$, $(A_{n \oplus 1} x, n)$, and $(A_n A_{n \oplus 1} x, n)$ all have remain actions. Using the result of (a), it can be shown that only 6 are admissible among 16 combinations of actions for $m = n$ and $n \oplus 1$, respectively. It is helpful to think in terms of Theorem 3, which is implied by (a) of Theorem 2. The monotonic threshold property rules out cases of the form $(r, s, \cdot, \cdot)$, $(\cdot, \cdot, r, s)$, $(\cdot, s, \cdot, r)$, $(s, \cdot, r, \cdot)$, or $(\cdot, s, r, \cdot)$ if $m = n$, which rules out cases $(r, s, r, r)$, $(r, s, s, s)$, $(r, s, s, r)$, $(r, r, r, s)$, $(s, s, r, s)$, $(r, s, r, s)$, $(s, r, r, s)$, $(s, s, s, r)$, $(s, s, r, r)$, and $(s, r, r, r)$. If $m = n \oplus 1$, it rules out cases of the form $(s, r, \cdot, \cdot)$, $(\cdot, \cdot, s, r)$, $(\cdot, r, \cdot, s)$, $(r, \cdot, s, \cdot)$, or $(\cdot, r, s, \cdot)$, which rules out cases $(s, r, r, r)$, $(s, r, r, s)$, $(s, r, s, r)$, $(s, r, s, s)$, $(r, r, s, r)$, $(r, s, s, r)$, $(s, s, s, r)$, $(r, r, r, s)$, $(r, r, s, s)$, and $(r, s, s, s)$. In the following proof, the first two cases hold regardless of the server position. Cases 3-6 hold when $m = n \oplus 1$ and cases 7-10 hold when $m = n$.

1. $(r, r, r, r)$:
$$\Delta_n r_{t-1}(x, m) - \Delta_n r_{t-1}(A_{n \oplus 1} x, m) \qquad (4.16)$$
$$= \beta \lambda_n \{\Delta_n V_t(A_n x, m) - \Delta_n V_t(A_n A_{n \oplus 1} x, m)\} \mathbf{1}(x_n < M_n - 1) \qquad (4.17)$$
$$+ \beta \lambda_{n \oplus 1} \{\Delta_n V_t(A_{n \oplus 1} x, m) - \Delta_n V_t(A_{n \oplus 1}^2 x, m)\} \mathbf{1}(x_{n \oplus 1} < M_{n \oplus 1} - 1) \qquad (4.18)$$
$$+ \beta \mu \{\Delta_n V_t(D_m x, m) - \Delta_n V_t(D_m A_{n \oplus 1} x, m)\} \qquad (4.19)$$
$$\le 0. \qquad (4.20)$$
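The inequalities (4.17) $\le 0$ and (4.18) $\le 0$ follow by (b) at $t$. If $m = n \oplus 1$ and $x_{n \oplus 1} = 0$, (4.19) = 0. If $m = n$ and $x_n = 0$, $\Delta_n V_t(D_n x, n) = 0$ and (4.19) $\le 0$ by Lemma 1 (ii). Otherwise, (4.19) $\le 0$ by (b) at $t$.

2. $(s, s, s, s)$: This case is identical to that of case 1 because $m$ is arbitrary and $\Delta_n s_{t-1}(x, m) = \Delta_n r_{t-1}(x, m \oplus 1) \le \Delta_n r_{t-1}(A_{n \oplus 1} x, m \oplus 1) = \Delta_n s_{t-1}(A_{n \oplus 1} x, m)$.

3. $(r, s, r, r)$, $m = n \oplus 1$: Using the definition of the value functions and the result of case 1, we get
$$s_{t-1}(A_n x, n \oplus 1) - r_{t-1}(x, n \oplus 1) \le \Delta_n r_{t-1}(x, n \oplus 1) \le \Delta_n r_{t-1}(A_{n \oplus 1} x, n \oplus 1).$$

4. $(s, s, r, s)$, $m = n \oplus 1$: Using the definition of the value functions and the result of case 2, we get
$$s_{t-1}(A_{n \oplus 1} A_n x, n \oplus 1) - r_{t-1}(A_{n \oplus 1} x, n \oplus 1) \ge \Delta_n s_{t-1}(A_{n \oplus 1} x, n \oplus 1) \ge \Delta_n s_{t-1}(x, n \oplus 1).$$

5. $(s, s, r, r)$, $m = n \oplus 1$:
$$\Delta_n s_{t-1}(x, n \oplus 1) - \Delta_n r_{t-1}(A_{n \oplus 1} x, n \oplus 1) \qquad (4.21)$$
$$= \beta \lambda_n \{\Delta_n V_t(A_n x, n) - \Delta_n V_t(A_n A_{n \oplus 1} x, n \oplus 1)\} \mathbf{1}(x_n < M_n - 1) \qquad (4.22)$$
$$+ \beta \lambda_{n \oplus 1} \{\Delta_n V_t(A_{n \oplus 1} x, n) - \Delta_n V_t(A_{n \oplus 1}^2 x, n \oplus 1)\} \mathbf{1}(x_{n \oplus 1} < M_{n \oplus 1} - 1) \qquad (4.23)$$
$$+ \beta \mu \{\Delta_n V_t(D_n x, n) \mathbf{1}(x_n > 0) - \Delta_n V_t(x, n \oplus 1)\} \qquad (4.24)$$
$$\le 0. \qquad (4.25)$$

By (b) at $t$, (4.22) $\le 0$ and (4.23) $\le 0$. If $x_n > 0$, (4.24) $\le 0$ by (d) at $t$; otherwise by Lemma 1 (ii).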
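6. $(r, s, r, s)$, $m = n \oplus 1$: By assumption and Lemma 1 (iii), we have
$$r_{t-1}(A_{n \oplus 1} A_n x, n) - r_{t-1}(A_{n \oplus 1} x, n \oplus 1) \qquad (4.26)$$
$$- \{r_{t-1}(A_n x, n) - r_{t-1}(x, n \oplus 1)\} \qquad (4.27)$$
$$= \Delta_{n \oplus 1} r_{t-1}(A_n x, n) - \Delta_{n \oplus 1} r_{t-1}(x, n \oplus 1) \qquad (4.28)$$
$$\ge \Delta_{n \oplus 1} r_{t-1}(x, n) - \Delta_{n \oplus 1} r_{t-1}(x, n \oplus 1) \quad \text{(by case 1)} \qquad (4.29)$$
$$\ge 0 \quad \text{(by (a) at } t - 1\text{)}. \qquad (4.30)$$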
The cases 7-10, $(s, r, s, s)$, $(r, r, s, r)$, $(r, r, s, s)$, and $(s, r, s, r)$, that are admissible when $m = n$ can be proved by cases 3-6, respectively, using $r_{t-1}(x', n) = s_{t-1}(x', n \oplus 1)$.

(c) Denote by $(r/s, r/s, r/s, r/s)$ the actions of $(A_{n \oplus 1} x, m)$, $(A_n A_{n \oplus 1} x, m)$, $(A_n x, m)$, and $(A_n^2 x, m)$. The monotonic threshold property rules out cases $(r, s, r, r)$, $(r, s, r, s)$, $(r, s, s, r)$, $(r, s, s, s)$, $(r, r, r, s)$, $(s, r, r, s)$, $(s, s, r, s)$, $(r, r, s, r)$, $(r, r, s, s)$, $(s, r, s, r)$, and $(s, r, s, s)$ if $m = n$; and $(s, r, s, s)$, $(s, r, s, r)$, $(s, r, r, s)$, $(s, r, r, r)$, $(s, s, s, r)$, $(r, s, s, r)$, $(r, r, s, r)$, $(s, s, r, s)$, $(s, s, r, r)$, $(r, s, r, s)$, and $(r, s, r, r)$ if $m = n \oplus 1$. In the following proof, the first two cases hold regardless of the server position. Cases 3-5 hold when $m = n$ and cases 6-8 hold when $m = n \oplus 1$.

1. $(r, r, r, r)$:
$$\Delta_n r_{t-1}(A_{n \oplus 1} x, m) = c_n + \beta \lambda_n \Delta_n V_t(A_n A_{n \oplus 1} x, m) \qquad (4.31)$$
$$+ \beta \lambda_{n \oplus 1} \Delta_n V_t(A_{n \oplus 1}^2 x, m) \mathbf{1}(x_{n \oplus 1} < M_{n \oplus 1} - 1) \qquad (4.32)$$
$$+ \beta \mu \Delta_n V_t(D_m A_{n \oplus 1} x, m). \qquad (4.33)$$

$$\Delta_n r_{t-1}(A_n x, m) = c_n + p_n \mathbf{1}(A_n^2 x_n = M_n) + \beta \lambda_n \Delta_n V_t(A_n^2 x, m) \mathbf{1}(x_n < M_n - 2) \qquad (4.34)$$
$$+ \beta \lambda_{n \oplus 1} \Delta_n V_t(A_{n \oplus 1} A_n x, m) \qquad (4.35)$$
$$+ \beta \mu \Delta_n V_t(D_m A_n x, m). \qquad (4.36)$$
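When $x_n < M_n - 2$, the inequality (4.31) $\le$ (4.34) follows by (c) at $t$; otherwise, (4.31) $<$ (4.34) by (f). If $x_{n \oplus 1} < M_{n \oplus 1} - 1$, (4.32) $\le$ (4.35) by (c) at $t$; otherwise, by Lemma 1 (ii). If $m = n \oplus 1$ and $x_{n \oplus 1} = 0$, (4.33) $\le$ (4.36) by (d); if $m = n$ and $x_n = 0$, $\Delta_n V_t(D_n A_n x, n) = 0$ and, thus, the result follows by Lemma 1 (ii); otherwise, by (c) at $t$.

2. $(s, s, s, s)$: This case is identical to that of case 1 because $m$ is arbitrary and $\Delta_n s_{t-1}(A_{n \oplus 1} x, m) = \Delta_n r_{t-1}(A_{n \oplus 1} x, m \oplus 1) \le \Delta_n r_{t-1}(A_n x, m \oplus 1) = \Delta_n s_{t-1}(A_n x, m)$.

3. $(s, s, r, r)$, $m = n$:
$$\Delta_n s_{t-1}(A_{n \oplus 1} x, n) = c_n + \beta \lambda_n \Delta_n V_t(A_n A_{n \oplus 1} x, n \oplus 1) \qquad (4.37)$$
$$+ \beta \lambda_{n \oplus 1} \Delta_n V_t(A_{n \oplus 1}^2 x, n \oplus 1) \mathbf{1}(x_{n \oplus 1} < M_{n \oplus 1} - 1) \qquad (4.38)$$
$$+ \beta \mu \Delta_n V_t(x, n \oplus 1). \qquad (4.39)$$

$$\Delta_n r_{t-1}(A_n x, n) = c_n + p_n \mathbf{1}(A_n^2 x_n = M_n) + \beta \lambda_n \Delta_n V_t(A_n^2 x, n) \mathbf{1}(x_n < M_n - 2) \qquad (4.40)$$
$$+ \beta \lambda_{n \oplus 1} \Delta_n V_t(A_{n \oplus 1} A_n x, n) \qquad (4.41)$$
$$+ \beta \mu \Delta_n V_t(x, n). \qquad (4.42)$$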
The inequality (4.37) $\le$ (4.40) follows by (c) at $t$ when $x_n < M_n - 2$ and by (f) when $x_n = M_n - 2$. The inequality (4.38) $\le$ (4.41) follows by (c) at $t$ if $x_{n \oplus 1} < M_{n \oplus 1} - 1$; otherwise, by Lemma 1 (ii). By Lemma 1 (i), $\Delta_n V_t(x, n) = \Delta_n V_t(x, n \oplus 1)$, and we have (4.39) = (4.42).

4. $(s, s, s, r)$, $m = n$: The result comes from the case $(s, s, r, r)$ because
$$r_{t-1}(A_n^2 x, n) - s_{t-1}(A_n x, n) \ge \Delta_n r_{t-1}(A_n x, n) \qquad (4.43)$$
$$\ge \Delta_n s_{t-1}(A_{n \oplus 1} x, n). \qquad (4.44)$$

5. $(s, r, r, r)$, $m = n$: The result comes from the case $(s, s, r, r)$ because
$$r_{t-1}(A_{n \oplus 1} A_n x, n) - s_{t-1}(A_{n \oplus 1} x, n) \le \Delta_n s_{t-1}(A_{n \oplus 1} x, n) \qquad (4.45)$$
$$\le \Delta_n r_{t-1}(A_n x, n). \qquad (4.46)$$
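The cases 6-8, $(r, r, s, s)$, $(r, r, r, s)$, and $(r, s, s, s)$, that are admissible when $m = n \oplus 1$ can be proved by cases 3-5, respectively, using $r_{t-1}(x', n \oplus 1) = s_{t-1}(x', n)$.

(d) From (b) and (c), we have $\Delta_n V_{t-1}(x, m) \le \Delta_n V_{t-1}(A_{n \oplus 1} x, m) \le \Delta_n V_{t-1}(A_n x, m)$.

(e) Using supermodularity, (b), and convexity, (d), it can be easily shown that $\Delta_n V_{t-1}(M_n - 1, M_{n \oplus 1}, n) \ge \Delta_n V_{t-1}(x, n)$, $0 \le x_n \le M_n - 1$, $0 \le x_{n \oplus 1} \le M_{n \oplus 1}$. Therefore, it suffices to prove that (4.9) holds when $x_n = M_n - 1$ and $x_{n \oplus 1} = M_{n \oplus 1}$. We distinguish 4 combinations of actions in states $(M_n, M_{n \oplus 1}, m)$ and $(M_n - 1, M_{n \oplus 1}, m)$, since (a) disallows the case of switch and remain, respectively, when $m = n$ and remain and switch, respectively, when $m = n \oplus 1$.

1. $(M_n, M_{n \oplus 1}, m)$ = $(M_n - 1, M_{n \oplus 1}, m)$ = remain:
$$\Delta_n r_{t-1}(M_n - 1, M_{n \oplus 1}, m) \qquad (4.47)$$
$$= c_n + p_n + \beta \lambda_{n \oplus 1} \Delta_n V_t(M_n - 1, M_{n \oplus 1}, m) + \beta \mu \Delta_n V_t(M_n - 2, M_{n \oplus 1}, m) \qquad (4.48)$$
$$\le c_n + p_n + \beta (\lambda_{n \oplus 1} + \mu) \Delta_n V_t(M_n - 1, M_{n \oplus 1}, m) \quad \text{(by (d))} \qquad (4.49)$$
$$\le c_n + p_n + \beta (\lambda_{n \oplus 1} + \mu) \frac{1 - \{\beta(\lambda_{n \oplus 1} + \mu)\}^{T - t}}{1 - \beta(\lambda_{n \oplus 1} + \mu)} (c_n + p_n) \quad \text{(by (e) at } t\text{)} \qquad (4.50)$$
$$= \frac{1 - \{\beta(\lambda_{n \oplus 1} + \mu)\}^{T - t + 1}}{1 - \beta(\lambda_{n \oplus 1} + \mu)} (c_n + p_n). \qquad (4.51)$$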
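2. $(M_n, M_{n \oplus 1}, m)$ = $(M_n - 1, M_{n \oplus 1}, m)$ = switch: This case is identical to case 1 because $m$ is arbitrary and $\Delta_n s_{t-1}(M_n - 1, M_{n \oplus 1}, m) = \Delta_n r_{t-1}(M_n - 1, M_{n \oplus 1}, m \oplus 1)$.

3. $(M_n, M_{n \oplus 1}, n)$ = remain and $(M_n - 1, M_{n \oplus 1}, n)$ = switch: The result follows from case 2 because
$$r_{t-1}(M_n, M_{n \oplus 1}, n) - s_{t-1}(M_n - 1, M_{n \oplus 1}, n) \le s_{t-1}(M_n, M_{n \oplus 1}, n) - s_{t-1}(M_n - 1, M_{n \oplus 1}, n).$$

4. $(M_n, M_{n \oplus 1}, n \oplus 1)$ = switch and $(M_n - 1, M_{n \oplus 1}, n \oplus 1)$ = remain: The result follows from case 1 because
$$s_{t-1}(M_n, M_{n \oplus 1}, n) - r_{t-1}(M_n - 1, M_{n \oplus 1}, n) \le r_{t-1}(M_n, M_{n \oplus 1}, n) - r_{t-1}(M_n - 1, M_{n \oplus 1}, n).$$

(f) Using the same argument as in (e), it suffices to prove that (4.10) holds when $x_n = M_n - 1$ and $x_{n \oplus 1} = M_{n \oplus 1}$. Since $\lambda_n + \lambda_{n \oplus 1} + \mu = 1$, we have by (e)
$$\beta \lambda_n \Delta_n V_{t-1}(M_n - 1, M_{n \oplus 1}, m) \le \beta \lambda_n \frac{1 - \{\beta(\lambda_{n \oplus 1} + \mu)\}^{T - t + 1}}{1 - \beta(\lambda_{n \oplus 1} + \mu)} (c_n + p_n) \qquad (4.52)$$
$$< \beta \lambda_n \frac{c_n + p_n}{1 - \beta(\lambda_{n \oplus 1} + \mu)} \qquad (4.53)$$
$$\le \beta \lambda_n \frac{(1 - \beta) S_n + p_n}{1 - \beta(\lambda_{n \oplus 1} + \mu)} \quad \text{(by (4.4))} \qquad (4.54)$$
$$= \beta \lambda_n \frac{(1 - \beta) p_n / \lambda_n + p_n}{1 - \beta(\lambda_{n \oplus 1} + \mu)} \qquad (4.55)$$
$$= \beta \frac{(1 - \beta) + \lambda_n}{1 - \beta + \beta \lambda_n}\, p_n \qquad (4.56)$$
$$< p_n. \qquad (4.57)$$

The last inequality comes from the fact that $\beta < 1$. □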
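The final step of (f) leaves the algebra implicit. The following short derivation is a sketch of how it can be verified, assuming the normalization $\lambda_n + \lambda_{n \oplus 1} + \mu = 1$ and assuming that the last bound before $p_n$ carries the factor $\beta$ inherited from the left-hand side of (4.10); that reading is a reconstruction and should be checked against the published version.

```latex
\begin{align*}
1 - \beta(\lambda_{n \oplus 1} + \mu)
  &= 1 - \beta(1 - \lambda_n) = (1 - \beta) + \beta\lambda_n, \\
\beta\,\frac{(1 - \beta) + \lambda_n}{(1 - \beta) + \beta\lambda_n} < 1
  &\iff \beta(1 - \beta) + \beta\lambda_n < (1 - \beta) + \beta\lambda_n \\
  &\iff \beta(1 - \beta) < 1 - \beta ,
\end{align*}
```

and the last inequality holds because $0 < \beta < 1$ implies $1 - \beta > 0$.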
Lemma 2: Let $x^1 = (x_n, x_{n \oplus 1})$ and $x^2 = (x_n', x_{n \oplus 1})$, where $x_n < x_n'$. Then,
$$r_t(x^2, n) - r_t(x^1, n) \le r_t(x^2, n \oplus 1) - r_t(x^1, n \oplus 1). \qquad (4.58)$$

Proof: The result follows by iteratively applying (a) of Theorem 2. □
Theorem 3: There exists a $T$-stage optimal policy with a nonstationary monotonic threshold type of switching curve. That is, for $t = 0, 1, \ldots, T$ and $n = 1, 2$, there exists a threshold function $\tau_n^t(\cdot) : \{1, 2, \ldots, M_n\} \to \{1, 2, \ldots, M_{n \oplus 1}, \infty\}$, defined by
$$\tau_n^t(x_n) := \inf\{x_{n \oplus 1} \in \{1, 2, \ldots, M_{n \oplus 1}\} : s_t(x, n) < r_t(x, n)\}, \qquad (4.59)$$
such that

(a) it is optimal to switch from $n$ to $n \oplus 1$ at $t$ if $x_{n \oplus 1} \ge \tau_n^t(x_n)$;

(b) the threshold function $\tau_n^t(x_n)$ is increasing in $x_n$:
$$\tau_n^t(x_n) \le \tau_n^t(x_n + 1), \quad 0 \le x_n \le M_n - 1. \qquad (4.60)$$
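The threshold function in (4.59) and the two properties (a) and (b) can be checked mechanically on any computed decision table. The sketch below is illustrative only: the names `threshold_curve` and `is_monotone_threshold` are hypothetical, and the table `switch[a][b]` is assumed to be `True` exactly when switching away from queue $n$ is optimal with $a$ jobs in queue $n$ and $b$ jobs in the other queue (rows start at $a = 0$ here for convenience).

```python
import math

def threshold_curve(switch):
    """For each row, the smallest b >= 1 at which switching is optimal (inf if none)."""
    curve = []
    for row in switch:
        hits = [b for b in range(1, len(row)) if row[b]]
        curve.append(hits[0] if hits else math.inf)
    return curve

def is_monotone_threshold(switch):
    """Check property (a), switch everywhere at or above the threshold,
    and property (b), thresholds nondecreasing in the length of queue n."""
    curve = threshold_curve(switch)
    for row, t in zip(switch, curve):
        if t != math.inf and not all(row[t:]):
            return False
    return all(a <= b for a, b in zip(curve, curve[1:]))

# A toy 3x3 decision table with thresholds 1, 2, and "never switch".
toy = [
    [False, True, True],
    [False, False, True],
    [False, False, False],
]
toy_curve = threshold_curve(toy)
```

Applied to a decision table produced by a dynamic programming solver, a `False` result from `is_monotone_threshold` flags the kind of violation discussed for Figure 3.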
Proof: (a) We prove this claim by contradiction. Let $x^1 = (x_n, \tau_n^t(x_n))$ and $x^2 = (x_n, x_{n \oplus 1})$. Suppose $V_t(x^2, n) = r_t(x^2, n)$ for some $\tau_n^t(x_n) < x_{n \oplus 1} \le M_{n \oplus 1}$. It follows that
$$r_t(x^1, n) > r_t(x^1, n \oplus 1), \qquad (4.61)$$
$$r_t(x^2, n) < r_t(x^2, n \oplus 1). \qquad (4.62)$$
From these two inequalities,
$$r_t(x^2, n \oplus 1) - r_t(x^1, n \oplus 1) > r_t(x^2, n) - r_t(x^1, n). \qquad (4.63)$$
However, this is a contradiction by Lemma 2, which follows from (a) of Theorem 2.

(b) Again, we use contradiction. Suppose $\tau_n^t(x_n) > \tau_n^t(x_n + 1)$ for some $x_n$. Let $x^1 = (x_n, \tau_n^t(x_n + 1))$ and $x^2 = (x_n + 1, \tau_n^t(x_n + 1))$. From the definition of $\tau_n^t(\cdot)$ and (a),
$$V_t(x^1, n) = r_t(x^1, n) < s_t(x^1, n) = r_t(x^1, n \oplus 1). \qquad (4.64)$$
From the definition of $\tau_n^t(x_n + 1)$,
$$V_t(x^2, n) = s_t(x^2, n) = r_t(x^2, n \oplus 1) < r_t(x^2, n). \qquad (4.65)$$
Subtracting Eq. (4.64) from Eq. (4.65),
$$\Delta_n r_t(x^1, n) = r_t(x^2, n) - r_t(x^1, n) > r_t(x^2, n \oplus 1) - r_t(x^1, n \oplus 1) = \Delta_n r_t(x^1, n \oplus 1). \qquad (4.66)$$
However, this is a contradiction by (a) of Theorem 2.
2 Now we extend the above results to the case of T = 1. Since the one stage costs for our scheduling problem are bounded and nonnegative, we have a monotone increasing sequence of value functions in the horizon T , V (x) = limT !1 VT (x) where V is the optimal value function over an in nite horizon (see Proposition 1, Ch 5 of Bertsekas [1]). This implies that Lemma 1 17
S1 S2
c1
c2 1 2 1 2
10 10 1.01 1
2
2
.3 .7
Table 1: Input data for Example 1. and Theorem 2 hold for V (x). Therefore, Lemma 2 and Theorem 3 also hold for V (x) and a stationary, nonrandomized monotonic threshold policy is optimal. Assumption A is restrictive and will not always hold in practice. Nevertheless, it cannot be removed without diculty. The reason is that the structural property of a threshold policy need not be optimal in general, as Example 1 in Table 1 indicates strongly. For = 0:9 and 0.99, respectively, we computed the optimal policies using value iteration, and they are shown in Figures 2 and 3, respectively, with the legend for both placed beneath Figure 3. To interpret these gures, notice that if the optimal policy indicates a switch to queue n 1 from state (x; n), then this implies that the server remains in queue n 1 when the state is (x; n 1). The termination criterion was set to 10?5 . Assumption A does not hold for = :9 or for = :99. Although a monotonic switching curve is still observed in the rst case, in Figure 3, the region with x1 = 10 and 4 x2 7 shows that the threshold property is violated in a small part of the state space. It is interesting to note that for c1 = 1, the optimal policy is similar to Figure 2 (with more states indicating service in queue 2, the one with heavy trac). For cases in which c1 0:7, we found that a sort of \top-priority" service is given to queue 2, and queue 1 is served only when x2 = 0 or x1 = M1 ; x2 < M2 . Problems with 1 6= 2 are more complex. Having discussed Assumption A and proved our result, we pause to clarify the novelty of our insights. Suk and Cassandras [16] treated a variation on the problem solved above by including the possibility of unequal service rates. The condition proposed for a monotonic threshold policy to be optimal was given as n Sn cn for n = 1; 2. Unfortunately, this condition is not sucient in general. To see this, Example 1 above satis es this condition and does not maintain the threshold structure as approaches 1. 
A key point of difficulty with their argument occurs when, for $x_2 = M_2 - 1$, the term $\Delta_2$ in their paper becomes negative, contradicting their inductive proof.
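The value-iteration computation mentioned above can be sketched as follows. This is a minimal illustration rather than the authors' implementation. It assumes a uniformized discrete-time model with equal service rates, no switching cost, a per-stage discount factor `beta`, a per-stage holding cost, and a rejection cost charged when an arrival finds its buffer full. The names `bellman` and `value_iteration`, their arguments, and the parameter values (patterned on the rows of Table 2 with switching costs dropped, and buffer sizes of 10 read off the figure axes) are assumptions made for illustration only.

```python
import numpy as np

def bellman(V, lam, mu, c, S, beta):
    """One value-iteration sweep for the ASSUMED uniformized model."""
    total = lam[0] + lam[1] + mu                    # uniformization rate
    p1, p2, pm = lam[0] / total, lam[1] / total, mu / total
    n1, n2 = V.shape
    x1 = np.arange(n1)[:, None]
    x2 = np.arange(n2)[None, :]
    hold = c[0] * x1 + c[1] * x2                    # per-stage holding cost
    # Arrival to queue n: move up by one job, or pay S_n and stay if full.
    up1 = np.vstack([V[1:, :], V[-1:, :] + S[0]])
    up2 = np.hstack([V[:, 1:], V[:, -1:] + S[1]])
    # Service completion in queue n: move down by one job; serving an
    # empty queue leaves the state unchanged.
    serve1 = np.vstack([V[:1, :], V[:-1, :]])
    serve2 = np.hstack([V[:, :1], V[:, :-1]])
    q = np.stack([serve1, serve2])                  # q[a]: serve queue a + 1
    V_new = hold + beta * (p1 * up1 + p2 * up2 + pm * q.min(axis=0))
    return V_new, q.argmin(axis=0) + 1              # ties go to queue 1

def value_iteration(lam, mu, c, S, M, beta, tol=1e-5, max_iter=100_000):
    """Iterate until successive value functions differ by less than tol."""
    V = np.zeros((M[0] + 1, M[1] + 1))
    for _ in range(max_iter):
        V_new, _ = bellman(V, lam, mu, c, S, beta)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    _, policy = bellman(V, lam, mu, c, S, beta)     # greedy policy for final V
    return V, policy

# Hypothetical parameter values, not claimed to reproduce Example 1.
V, policy = value_iteration(lam=(0.3, 0.7), mu=2.0, c=(1.5, 1.0),
                            S=(20.0, 10.0), M=(10, 10), beta=0.9)
```

Here `policy[x1, x2]` is the queue (1 or 2) that the greedy policy serves in state $(x_1, x_2)$; plotting it over the grid gives a picture of the same kind as Figures 2 and 3, although the exact cost scaling used in the paper may differ from the one assumed here.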
[Figure 2: Optimal policy for Example 1 when the discount factor is 0.9. Horizontal axis: first queue length (0 to 10); vertical axis: second queue length (0 to 10).]

[Figure 3: Optimal policy for Example 1 when the discount factor is 0.99. Axes as in Figure 2.]

Legend for Figures 2 and 3: o = action of switching from queue 1 to queue 2; * = action of switching from queue 2 to queue 1; the remaining marker = action of staying at the current queue.
4.1. Cases with Different Service Rates

In this section, we make Assumption A and assume that the server has different service rates (i.e., $\mu_1 \ne \mu_2$). Although we cannot prove the monotonicity of the switching curve when $\mu_1 \ne \mu_2$, we provide some numerical observations that support our conjecture that there exist monotonic threshold policies which are optimal. We observed that the diagonal dominance property is violated for some asymmetrical test cases; nevertheless, a monotonic threshold policy still remained optimal. Based on our numerical results, we also conjecture that the convexity and supermodularity do hold.

In the case of equal service rates, the diagonal dominance property was crucial for the proof of the monotonicity of an optimal switching curve. The reasoning behind this is as follows. Suppose that for some $x_n$, $\Delta_n V_t(A_{n \oplus 1} x, n) > \Delta_n V_t(A_n x, n \oplus 1)$. Under the equal service rate assumption, this clearly leads to $\Delta_n r_t(x, n) > \Delta_n r_t(x, n \oplus 1)$ by (4.12)–(4.14) and hence to a violation of the monotonicity of the switching curve. Under the assumption of different service rates, however, these equations show that it can be true that $\mu_n \Delta_n V_t(A_{n \oplus 1} x, n) \le \mu_{n \oplus 1} \Delta_n V_t(A_n x, n \oplus 1)$ and hence $\Delta_n r_t(x, n) \le \Delta_n r_t(x, n \oplus 1)$, which implies (a) of Theorem 2.
4.2. Cases with Switching Cost Penalty

In this section, we assume that a switching cost $K_{n \oplus 1}$ ($K_n$) is incurred when the server switches from queue $n$ ($n \oplus 1$) to queue $n \oplus 1$ ($n$) and investigate if the monotonicity of the switching curve is preserved under Assumption A with the symmetric service rates, $\mu_1 = \mu_2$. The DP equation for this MDP model is given by

$$V_t(x, n) = \min\{ r_t(x, n),\; s_t(x, n) \}, \qquad t = 0, \ldots, T - 1, \tag{4.67}$$

with a revised definition only for $s_t(x, n)$:

$$s_t(x, n) = K_{n \oplus 1} + r_t(x, n \oplus 1). \tag{4.68}$$
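The recursion (4.67)-(4.68) amounts to an elementwise minimum between the cost of remaining and the cost of switching. A minimal sketch is given below, assuming the remain-costs $r_t(\cdot, n)$ are already available as arrays; the function name `switching_step`, the array layout, the tie-breaking rule (stay when the two costs are equal), and all numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def switching_step(r, K):
    """Apply (4.67)-(4.68) once.

    r[n] holds the remain-cost r_t(x, n + 1) over all states x, and
    K[n] is the (assumed) cost of switching INTO queue n + 1.
    Returns V_t and a boolean mask of states where switching is chosen.
    """
    other = (1, 0)                        # index of the other queue
    V = np.empty_like(r)
    switch = np.empty(r.shape, dtype=bool)
    for n in (0, 1):
        m = other[n]
        s = K[m] + r[m]                   # (4.68): switch, then act as in queue m
        V[n] = np.minimum(r[n], s)        # (4.67)
        switch[n] = s < r[n]              # ties resolved in favor of staying
    return V, switch

# Hypothetical remain-costs over three states, with switching cost 0.5 each way.
r = np.array([[1.0, 4.0, 6.0],
              [2.0, 3.0, 5.0]])
V, switch = switching_step(r, K=(0.5, 0.5))
print(V)       # [[1.  3.5 5.5], [1.5 3.  5. ]]
print(switch)  # [[False  True  True], [ True False False]]
```

With positive switching costs, a state can favor remaining from both server positions (whenever the two remain-costs differ by less than the switching cost), which is the border region discussed in the intuitive explanation of this section.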
Our numerical results indicate that the switching curve is not monotonic in general. Figure 4 presents the optimal policy for Example 2 of Table 2, in which one point destroys the monotonicity property. As this diagram suggests, optimal policies are nearly monotonic. The termination criterion and discount factor were set to $10^{-5}$ and 0.9, respectively, and we checked that it is strictly suboptimal to switch in state (1,1,2). Consider states (0,2,2), (1,2,2), (2,2,2), (1,1,2), and (2,1,2). The optimal value function values are $V(0,2,2) = 4.375$, $V(1,2,2) = 6.715$, $V(2,2,2) = 8.922$, $V(1,1,2) = 5.097$, and $V(2,1,2) = 7.047$. Since $\Delta_1 V(0,2,2) > \Delta_1 V(1,2,2)$ and $\Delta_1 V(0,2,2) > \Delta_1 V(1,1,2)$, we see that the convexity and diagonal dominance properties do not hold for this example.

Example  K1    K2    S1  S2  c1   c2  mu1  mu2  lambda1  lambda2
2        0.33  0.33  20  10  1.5  1   2    2    .3       .7
3        1     1     20  10  1.5  1   2    2    .3       .7
Table 2: Input data for Examples 2 and 3.

We also tested Example 3 of Table 2, which has larger switching costs than Example 2. Interestingly, when $K_n$ and $K_{n \oplus 1}$ are high, the monotonicity is preserved, and when they are low, the monotonicity is violated. For example, Figure 5 shows that the switching curve is monotonic when the switching costs are increased from 0.33 to $K_1 = K_2 = 1$. This illustrates that the monotonicity of the switching curve is very sensitive to the switching costs.

We explain this phenomenon intuitively as follows. The most obvious effect of the switching penalty is that the incentive not to switch produces a border region between the regions of switching to the other queue. With $K_1 = K_2 = 0$, each state yielded a preferred service location, and the switching curves for switching to queue 1 and for switching to queue 2 met. With $K_1, K_2 > 0$, the states around the switching curve become states in which the action of remaining is preferred regardless of server position.

Supposing queue 2 is already set up and $x_2 = 0$, the threshold for switching to queue 1 is at its lowest level for the following reason. In terms of holding costs, remaining in queue 2 yields an immediate reward rate of zero, and waiting for the next arrival and serving the corresponding busy period yields at best $c_2 \lambda_2$, which is lower than $c_2 \mu_2$, the rate available when $x_2 \ge 1$. For $x_2 \ge 1$, an immediate reward rate of $c_2 \mu_2$ can be earned by serving an available job in queue 2. Since the selection of the largest reward rate is appropriate, this implies that the switching curve should increase significantly in $x_1$ for $x_2 = 1$ compared with $x_2 = 0$.

For $1 \le x_2 \le M_2$, we shift our focus to the sensitivity of the reward rate to $x_1$. An instantaneous switch from queue 2 to queue 1 incurs a switching cost, which we think of as being amortized over the length of time spent in queue 1 prior to returning to queue 2. Even without precisely specifying an equation, it is clear that the effect of the amortization is to increase the reward rate to be gained by switching as $x_1$ (and thus the length of time spent in queue 1) increases. Because an immediate reward rate of $c_2 \mu_2$ is available in queue 2 for $x_2 \ge 1$, the switching curve tends to be insensitive to $x_2$ in this region. Another explanation of the non-monotonic switching curve near the origin is that, because an optimal policy limits the effective rate of switching, the threshold for $x_2 = 1$ is set high so that sufficient time is spent in queue 1 to allow arrivals to build queue 2, thereby resulting in a low switching cost per unit time during the cycle from queue 2 to 1 back to 2 and then switching to 1. For states with $x_1$ in the neighborhood of $M_1$, the threshold values increase significantly in response to the large rejection penalties applied at $x_1 = M_1$. This is consistent with an increasing threshold curve.
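The loss of convexity and diagonal dominance reported for Example 2 can be checked by direct arithmetic on the value-function entries quoted in this section. The sketch below assumes that the difference operator applied to $V$ is the forward difference in $x_1$, that is, $V(x_1 + 1, x_2, n) - V(x_1, x_2, n)$, and that states are written $(x_1, x_2, n)$; the helper name `delta1` is hypothetical.

```python
# Optimal value-function entries quoted in the text for Example 2.
V = {
    (0, 2, 2): 4.375,
    (1, 2, 2): 6.715,
    (2, 2, 2): 8.922,
    (1, 1, 2): 5.097,
    (2, 1, 2): 7.047,
}

def delta1(x1, x2, n):
    """Assumed forward difference of V in the first queue length."""
    return V[(x1 + 1, x2, n)] - V[(x1, x2, n)]

d_022 = delta1(0, 2, 2)
d_122 = delta1(1, 2, 2)
d_112 = delta1(1, 1, 2)

print(round(d_022, 3))  # 2.34
print(round(d_122, 3))  # 2.207
print(round(d_112, 3))  # 1.95

# Convexity in x1 would require d_022 <= d_122; the reverse holds here.
print(d_022 > d_122)    # True
# The second comparison cited in the text also holds.
print(d_022 > d_112)    # True
```

Under the stated assumption, the first difference at $x_1 = 0$ (2.340) exceeds both the difference at $x_1 = 1$ in the same row (2.207) and the difference at (1,1,2) (1.950), which matches the two inequalities the text uses to conclude that convexity and diagonal dominance fail.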
5. Conclusion

In this paper we studied a scheduling problem for finite-buffer two-class queueing systems with rejection penalties, a feature commonly encountered in engineering systems. We proved that there exists a monotonic, threshold-type switching curve under the assumption that the delay of serving a job is always less costly than rejecting an arrival. We also showed by numerical example that the monotonicity of the switching curve does not hold in general when switching costs are introduced. We have offered intuitive explanations of the dominant characteristics observed in optimal policies, thereby making it possible to establish that the $c\mu$ rule cannot be optimal, even without rejection costs. It is our hope that the results obtained can be applied to more complex problems, including the scheduling of multi-class, make-to-stock production systems with lost sales. The development of efficient heuristics for scheduling finite-buffer multi-class queueing systems remains an interesting and complementary topic for future research.
[Figure 4: Optimal policy for Example 2. Horizontal axis: first queue length (0 to 10); vertical axis: second queue length (0 to 10).]

[Figure 5: Optimal policy for Example 3 (increased switching costs). Axes as in Figure 4.]

Legend for Figures 4 and 5: o = action of switching from queue 1 to queue 2; * = action of switching from queue 2 to queue 1; the remaining marker = action of staying at the current queue.
Bibliography

[1] Bertsekas, D.P. (1987) Dynamic Programming: Deterministic and Stochastic Models, Prentice-Hall, Englewood Cliffs.
[2] Buyukkoc, C., Varaiya, P., and Walrand, J. (1985) The $c\mu$-rule revisited, Advances in Applied Probability 17, 237–238.
[3] Duenyas, I. and Van Oyen, M.P. (1995) Stochastic Scheduling of Parallel Queues with Set-Up Costs, Queueing Systems (QUESTA) 19:4, 421–444.
[4] Duenyas, I. and Van Oyen, M.P. (1996) Heuristic Scheduling of Parallel Heterogeneous Queues with Set-Ups, Management Science 42:6, 814–829.
[5] Gittins, J.C. (1989) Multi-armed Bandit Allocation Indices, Wiley, New York.
[6] Ha, A.Y. (1997) Optimal dynamic scheduling policy for a make-to-stock production system, Operations Research 45, 42–53.
[7] Hofri, M. and Ross, K.W. (1987) On the optimal control of two queues with server setup times and its analysis, SIAM Journal on Computing 16, 399–420.
[8] Kim, E., Van Oyen, M.P., and Rieders, M. (1998) General Dynamic Programming Algorithms Applied to Polling Systems, Commun. in Statistics: Stochastic Models 14(5).
[9] Kim, E. and Van Oyen, M.P. (1998) Finite-capacity multi-class production scheduling with setup times, Working paper.
[10] Koole, G. (1994) Assigning a single server to inhomogeneous queues with switching costs, To appear in Theoretical Computer Science.
[11] Liu, Z., Nain, P., and Towsley, D. (1992) On optimal polling policies, Queueing Systems (QUESTA) 11, 59–83.
[12] Reiman, M. and Wein, L.M. (1994) Dynamic scheduling of a two-class queue with setups, Preprint.
[13] Rosberg, Z. and Kermani, P. (1992) Customer scheduling under queueing constraints, IEEE Transactions on Automatic Control 37:2, 252–257.
[14] Ross, S.M. (1983) Introduction to Stochastic Dynamic Programming, Academic Press, New York.
[15] Ross, S.M. (1983) Stochastic Processes, Wiley, New York.
[16] Suk, J.B. and Cassandras, C.G. (1991) Optimal scheduling of two competing queues with blocking, IEEE Transactions on Automatic Control AC-36, 1086–1091.
[17] Varaiya, P., Walrand, J., and Buyukkoc, C. (1985) Extensions of the multi-armed bandit problem, IEEE Transactions on Automatic Control AC-30, 426–439.
[18] Walrand, J. (1988) An Introduction to Queueing Networks, Prentice-Hall, Englewood Cliffs.