On Optimality of Stochastic N-Machine Flowshop with Long-Run Average Cost

Ernst Presman
Central Economics and Mathematics Institute, The Russian Academy of Sciences, Moscow, Russia

Suresh P. Sethi
School of Management, The University of Texas at Dallas, Mail Station JO4.7, Box 830688, Richardson, TX 75083-0688, USA

Hanqin Zhang
Institute of Applied Mathematics, Academia Sinica, Beijing, 100080, P. R. China

Qing Zhang
Department of Mathematics, University of Georgia, Athens, GA 30602, USA
Abstract This paper is concerned with the problem of production planning in a stochastic manufacturing system with serial machines that are subject to breakdown and repair. The machine capacities are modeled by a Markov chain. The objective is to choose the input rates at the various machines over time in order to meet the demand for the system’s production at the minimum long-run average cost of production and surplus, while ensuring that the inventories in internal buffers between adjacent machines remain nonnegative. The problem is formulated as a stochastic dynamic program. We prove a verification theorem and derive the optimal feedback control policy in terms of the directional derivatives of the potential function.
Electronic copy available at: http://ssrn.com/abstract=1125125
1 Introduction
Beginning with Bielecki and Kumar (1988), there has been considerable interest in studying the problem of production planning in stochastic manufacturing systems with the objective of minimizing long-run average cost. Bielecki and Kumar (1988) deal with a single-machine (with two states: up and down), single-product problem with linear holding and backlog costs. Because of the simple structure of their problem, they are able to obtain an explicit solution, and thus verify the optimality of the resulting policy (a hedging point policy). Sharifinia (1988) deals with an extension of the Bielecki-Kumar model to more than two machine states. Liberopoulos and Caramanis (1993) show that Sharifinia's method for evaluating hedging point policies applies even when the transition rates of the machine states depend on the production rate. Liberopoulos and Hu (1995) obtain monotonicity of the threshold levels corresponding to different machine states. At the same time, there have been a number of heuristic analyses of the multi-product problem. Srisvatsan (1993) and Srisvatsan and Dallery (1998) consider a two-product problem. Caramanis and Sharifinia (1991) decompose a multi-product problem into analytically tractable single-product problems in order to obtain near-optimal hedging points. All of these papers, however, are heuristic in nature, since they do not rigorously prove the optimality of the policies for their extensions of the Bielecki-Kumar model. The difficulty in proving optimality lies in the fact that when the problem is generalized to include convex costs and multiple machine capacity levels, explicit solutions are no longer possible, whereas the Bielecki-Kumar proof of optimality depends on being able to obtain the value function explicitly. One needs, therefore, to develop the appropriate dynamic programming equations, existence of their solutions, and verification theorems for optimality.
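The hedging point policy discussed above can be made concrete with a small simulation. The sketch below is purely illustrative and is not part of the Bielecki-Kumar analysis: the failure/repair rates, cost coefficients, and threshold value are all invented for the example. It simulates one unreliable machine whose capacity switches between up and down as a two-state Markov chain, applies the hedging point rule, and estimates the long-run average surplus cost by Euler discretization.

```python
import random

def average_cost_hedging(z, T=1000.0, dt=0.01, d=0.5, m_up=1.0,
                         rate_fail=0.1, rate_repair=0.5,
                         c_plus=1.0, c_minus=5.0, seed=0):
    """Estimate the long-run average surplus cost of a hedging point
    policy for one unreliable machine: produce at full capacity while
    the surplus x is below the threshold z, just meet demand at z,
    and idle above z."""
    rng = random.Random(seed)
    x, up, cost = 0.0, True, 0.0
    for _ in range(int(T / dt)):
        # Two-state Markov capacity process, Euler approximation:
        # switch with probability ~ rate * dt.
        if up and rng.random() < rate_fail * dt:
            up = False
        elif not up and rng.random() < rate_repair * dt:
            up = True
        cap = m_up if up else 0.0
        # Hedging point feedback rule.
        if x < z:
            u = cap
        elif x == z:
            u = min(cap, d)
        else:
            u = 0.0
        x += (u - d) * dt
        cost += (c_plus * max(x, 0.0) + c_minus * max(-x, 0.0)) * dt
    return cost / T
```

With these assumed rates, the machine is up a stationary fraction 0.5/(0.1+0.5) = 5/6 of the time, so its mean capacity 5/6 exceeds the demand d = 0.5 (the single-machine analogue of assumption (A.2) below); sweeping z and taking the minimizer recovers a hedging point numerically.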
This is accomplished by Sethi et al. (1997) and Sethi et al. (1998) for single- and multi-product problems. They use the vanishing discount approach to prove the optimality of hedging point policies for convex surplus costs and linear/convex production costs. These models are also considered by Duncan et al. (2001) with additional features such as Markov product demand. Their results make precise the heuristic treatments of several manufacturing system problems carried out by Sharifinia (1988), Srisvatsan and Dallery (1998), and others. Presman et al. (1995, 1997) considered flowshops and obtained optimal control policies for such problems in the context of the discounted cost criterion. A characteristic difficulty of the flowshop is the presence of state constraints arising from the requirement that the number of parts in the internal buffers between any two adjacent machines must remain nonnegative. Our objective in this paper is to treat their problem with a view to minimizing the long-run average of expected production and surplus costs. We write the Hamilton-Jacobi-Bellman (HJB) equation in terms of directional derivatives and prove a verification theorem. Using the vanishing discount approach for the average-cost problem, we obtain a solution of the HJB equation. Two major contributions are made in order to implement the vanishing discount approach. One is the construction of a control policy that takes any given system state to any other state in a time whose rth moment is finite. The other is a solution of the HJB equation for the problem in terms of directional derivatives, obtained by a limit procedure for
the discounted cost problem as the discount rate tends to zero. The plan of this paper is as follows. In Section 2, we formulate an N-machine flowshop as a continuous-time stochastic optimal control problem and specify the required assumptions. In Section 3, the HJB equation for the problem is specified in terms of directional derivatives, a verification theorem for optimality over a class of admissible controls is given, and, using the vanishing discount approach, it is shown that the HJB equation has a solution. Section 4 concludes the paper.
2 Problem Formulation
We consider a manufacturing system producing a single finished product using N machines in tandem that are subject to breakdown and repair. We are given a stochastic process $m(t) = (m_1(t), \ldots, m_N(t))$ on the standard probability space $(\Omega, \mathcal{F}, P)$, where $m_k(t)$, $k = 1, \ldots, N$, is the capacity of the kth machine at time t. We use $u_k(t)$ to denote the input rate to the kth machine, $k = 1, \ldots, N$, and $x_k(t) \ge 0$ to denote the number of parts in the buffer between the kth and (k+1)th machines, $k = 1, \ldots, N-1$. We assume a constant demand rate d. The difference between cumulative production and cumulative demand, called surplus, is denoted by $x_N(t)$. If $x_N(t) > 0$, we have finished goods inventories, and if $x_N(t) < 0$, we have backlogs. The dynamics of the system can be written as follows:

    \dot{x}_k(t) = u_k(t) - u_{k+1}(t),  x_k(0) = x_k^0,  k = 1, \ldots, N,    (1)

where $u_{N+1}(t) = d$. This relation can be written in the following vector form:

    \dot{x}(t) = A u(t),  x(0) = x^0,    (2)

where $A : R^{N+1} \to R^N$ is the corresponding linear operator. Here and elsewhere we use boldface letters to stand for vectors. Since the number of parts in the internal buffers cannot be negative, we impose the state constraints $x_k(t) \ge 0$, $k = 1, \ldots, N-1$. To formulate the problem precisely, let $S = [0, \infty)^{N-1} \times (-\infty, \infty) \subset R^N$ denote the state constraint domain, $b(S)$ denote the boundary of S, and $S^o = S \setminus b(S)$. For $m = (m_1, \ldots, m_N)$, $m_k \ge 0$, $k = 1, \ldots, N$, let

    U(m) = \{u = (u_1, \ldots, u_N, d) : 0 \le u_k \le m_k, k = 1, \ldots, N\},    (3)

and for $x \in S$, let

    U(x, m) = \{u \in U(m) : x_k = 0 \Rightarrow u_k - u_{k+1} \ge 0, k = 1, \ldots, N-1\}.    (4)
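As a quick sanity check on the dynamics (1)-(2) and the constraint sets (3)-(4), the following sketch (an assumed Euler discretization, not part of the model itself; all names are ours) advances the buffer levels by one time step, first clipping the input rates to the capacity set $U(m)$ and then capping each downstream rate so that no internal buffer is driven negative, which is the discrete analogue of $U(x, m)$.

```python
def step(x, u, m, d, dt):
    """One Euler step of the flowshop dynamics (1):
    x_k' = u_k - u_{k+1}, with u_{N+1} = d.

    Rates are first clipped to machine capacities (the set U(m) in (3));
    then each u_{k+1} is capped so no internal buffer goes negative,
    mimicking the constraint set U(x, m) in (4) at the discrete level."""
    N = len(x)
    u = [min(max(uk, 0.0), mk) for uk, mk in zip(u, m)]  # u in U(m)
    # Enforce x_k >= 0: machine k+1 cannot draw more than buffer k holds.
    for k in range(1, N):
        u[k] = min(u[k], u[k - 1] + x[k - 1] / dt)
    rates = u + [d]  # append u_{N+1} = d
    return [xk + (rates[k] - rates[k + 1]) * dt for k, xk in enumerate(x)]
```

For example, with two machines, an empty first buffer, and machine 1 idle, machine 2's rate is forced to zero so the buffer stays at zero.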
Let the sigma algebra $\mathcal{F}_t = \sigma\{m(s) : 0 \le s \le t\}$. We now define the concept of admissible controls.

Definition 2.1. We say that a control $u(\cdot) = (u_1(\cdot), \ldots, u_N(\cdot), d)$ is admissible with respect to the initial state vector $x^0 = (x_1^0, \ldots, x_N^0) \in S$ if:

(i) $u(\cdot)$ is an $\mathcal{F}_t$-adapted process;

(ii) $u(t)$ is Borel measurable in t a.s. and $u(t) \in U(m(t))$ for all $t \ge 0$;

(iii) the corresponding state process $x(t) = (x_1(t), \ldots, x_N(t)) \in S$ for all $t \ge 0$.
The problem is to find an admissible control $u(\cdot)$ that minimizes the long-run average cost function

    J(x^0, m^0, u(\cdot)) = \limsup_{T \to \infty} \frac{1}{T} E \int_0^T h(x(t), u(t)) \, dt,    (5)

where $h(\cdot, \cdot)$ defines the cost of surplus and production, and $m^0$ is the initial value of $m(t)$. We impose the following assumptions on the process $m(t) = (m_1(t), \ldots, m_N(t))$ and the cost function $h(\cdot, \cdot)$ throughout this paper.

(A.1) Let $\mathcal{M} = \{m^1, \ldots, m^p\}$ for some integer $p \ge 1$, where $m^j = (m_1^j, \ldots, m_N^j)$, with $m_k^j$, $k = 1, \ldots, N$, denoting the capacity of the kth machine in state j, $j = 1, \ldots, p$. The capacity process $m(t) \in \mathcal{M}$ is a finite state Markov chain with the infinitesimal generator

    Q f(m) = \sum_{m' \ne m} q_{m'm} [f(m') - f(m)]

for some $q_{m'm} \ge 0$ and any function $f(\cdot)$ on $\mathcal{M}$. Moreover, the Markov process is strongly irreducible and has the stationary distribution $p_{m^j}$, $j = 1, \ldots, p$.

(A.2) Let $p_k = \sum_{j=1}^p m_k^j p_{m^j}$. Assume that $\min_{1 \le k \le N} p_k > d$.

(A.3) $h(\cdot, \cdot)$ is a non-negative, jointly convex function that is strictly convex in either x or u or both. For all $x, x' \in S$ and $u, u' \in U(m^j)$, $j = 1, \ldots, p$, there exist constants $C_0$ and $K_h \ge 1$ such that

    |h(x, u) - h(x', u')| \le C_0 (1 + |x|^{K_h} + |x'|^{K_h}) (|x - x'| + |u - u'|).

We use $\mathcal{A}(x^0, m^0)$ to denote the set of all admissible controls with respect to $x^0 \in S$ and $m(0) = m^0$. Let $\lambda(x^0, m^0)$ denote the minimal expected cost, i.e.,

    \lambda(x^0, m^0) = \inf_{u(\cdot) \in \mathcal{A}(x^0, m^0)} J(x^0, m^0, u(\cdot)).    (6)
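Assumption (A.2) is easy to verify numerically for a given capacity chain. The sketch below (the generator, capacity table, and demand rate are made-up example data, and the function names are ours) computes the stationary distribution of the capacity Markov chain from its generator and checks that every machine's mean stationary capacity exceeds the demand rate d.

```python
import numpy as np

def stationary_distribution(Q):
    """Stationary distribution p of a CTMC with generator Q:
    solve p Q = 0 with sum(p) = 1, where Q[i][j] is the rate i -> j
    and each row of Q sums to zero."""
    Q = np.asarray(Q, dtype=float)
    n = Q.shape[0]
    # Replace one balance equation with the normalization constraint.
    A = np.vstack([Q.T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def check_A2(Q, capacities, d):
    """Check assumption (A.2): min_k sum_j m_k^j p_{m^j} > d,
    where capacities[j][k] is machine k's capacity in Markov state j."""
    p = stationary_distribution(Q)
    mean_cap = np.asarray(capacities, dtype=float).T @ p
    return bool(mean_cap.min() > d), mean_cap
```

For a two-state chain with generator rows (-1, 1) and (2, -2), the stationary distribution is (2/3, 1/3); if both machines have capacity 1 in state 1 and capacity 0 in state 2, each mean capacity is 2/3, so (A.2) holds for any d below 2/3.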
To write the HJB equation for our problem, we first introduce some notation. Let $\mathcal{G}$ denote the family of real-valued functions $f(\cdot, \cdot)$ defined on $S \times \mathcal{M}$ such that:

(i) $f(\cdot, m)$ is convex for any $m \in \mathcal{M}$;

(ii) there exists a function $C(x, x')$ such that for any $m \in \mathcal{M}$ and any $x, x' \in S$, $|f(x, m) - f(x', m)| \le C(x, x') |x - x'|$.

Remark. By Theorem 10.4 on page 86 of Rockafellar (1972), (i) and (ii) imply that $f(\cdot, m)$ is Lipschitzian on any closed bounded subset of S for any $m \in \mathcal{M}$.

Formally, we write the HJB equation in terms of directional derivatives for our problem as

    \lambda = \inf_{u \in U(x, m)} \{\partial_{Au} f(x, m) + h(x, u)\} + Q f(x, \cdot)(m),    (7)

where $\lambda$ is a constant, $f(\cdot, \cdot) \in \mathcal{G}$, and $\partial_{Au} f(x, m)$ denotes the directional derivative of the function $f(x, m)$ along the direction $Au \in R^N$.
3 Main Results
First we have the following verification theorem.

Theorem 3.1. Assume that $(\lambda, f(\cdot, \cdot))$ with $f(\cdot, \cdot)$ convex on $S \times \mathcal{M}$ satisfies (7), that there exists a function $u^*(x, m)$ for which

    \inf_{u \in U(x, m)} \{\partial_{Au} f(x, m) + h(x, u)\} = \partial_{Au^*(x, m)} f(x, m) + h(x, u^*(x, m)),    (8)

and that the equation $\dot{x}(t) = A u^*(x(t), m(t))$ has, for any initial condition $(x^*(0), m(0)) = (x^0, m^0)$, a solution $x^*(t)$ such that

    \lim_{T \to \infty} \frac{E f(x^*(T), m(T))}{T} = 0.    (9)

Then $u^*(t) = u^*(x^*(t), m(t))$ is an optimal control. Furthermore, $\lambda(x^0, m^0)$ does not depend on $x^0$ and $m^0$, and it coincides with $\lambda$. Moreover, for any $T > 0$,

    f(x^0, m^0) = \inf_{u(\cdot) \in \mathcal{A}(x^0, m^0)} E\Big[\int_0^T (h(x(t), u(t)) - \lambda) \, dt + f(x(T), m(T))\Big]
                = E\Big[\int_0^T (h(x^*(t), u^*(t)) - \lambda) \, dt + f(x^*(T), m(T))\Big].    (10)
Proof. Since $(\lambda, f(\cdot, \cdot))$ is a solution to (7) and $(x^*(t), u^*(t))$ satisfies condition (8), we have

    \partial_{Au^*(t)} f(x^*(t), m(t)) + Q f(x^*(t), \cdot)(m(t)) = \lambda - h(x^*(t), u^*(t)).    (11)

Since $f(\cdot, \cdot) \in \mathcal{G}$, we apply Dynkin's formula and use (11) to get

    E f(x^*(T), m(T)) = f(x^0, m^0) + E \int_0^T \big[\partial_{Au^*(t)} f(x^*(t), m(t)) + Q f(x^*(t), \cdot)(m(t))\big] dt
                      = f(x^0, m^0) + E \int_0^T [\lambda - h(x^*(t), u^*(t))] \, dt
                      = f(x^0, m^0) + \lambda T - E \int_0^T h(x^*(t), u^*(t)) \, dt.    (12)

We can rewrite (12) as

    \lambda = \frac{1}{T}\big[E f(x^*(T), m(T)) - f(x^0, m^0)\big] + \frac{1}{T} E \int_0^T h(x^*(t), u^*(t)) \, dt.    (13)

Using (9) and taking the limit as $T \to \infty$, we get

    \lambda \ge \limsup_{T \to \infty} \frac{1}{T} E \int_0^T h(x^*(t), u^*(t)) \, dt.    (14)
Moreover, for any $u(\cdot) \in \mathcal{A}(x^0, m^0)$ with state process $x(t)$, we have from (7) that

    \partial_{Au(t)} f(x(t), m(t)) + Q f(x(t), \cdot)(m(t)) \ge \lambda - h(x(t), u(t)).

Similarly to (12), we get

    \lambda \le \limsup_{T \to \infty} \frac{1}{T} E \int_0^T h(x(t), u(t)) \, dt.    (15)

By (14) and (15) we obtain that $u^*(t)$ is an optimal control and $\lambda(x^0, m^0) = \lambda$. To prove (10), consider, for a finite horizon T, the problem of minimizing the functional

    E\Big[\int_0^T (h(x(t), u(t)) - \lambda) \, dt + f(x(T), m(T))\Big].

The functions $f(x, m)$ and $u^*(x, m)$ satisfy the HJB equation on the time interval $[0, T]$. According to the verification theorem for a finite time interval, $f(x, m)$ coincides with the optimal value of the functional. This completes the proof of the theorem. □

Our goal in the remainder of the paper is to construct a pair $(\lambda, W(\cdot, \cdot))$ which satisfies (7). To get this pair, we use the vanishing discount approach. Consider the corresponding control problem with the cost discounted at the rate $\rho$. For $u(\cdot) \in \mathcal{A}(x^0, m^0)$, we define the expected discounted cost as

    J^\rho(x^0, m^0, u(\cdot)) = E \int_0^\infty e^{-\rho t} h(x(t), u(t)) \, dt.

Define the value function of the discounted cost problem as

    V^\rho(x^0, m^0) = \inf_{u(\cdot) \in \mathcal{A}(x^0, m^0)} J^\rho(x^0, m^0, u(\cdot)).

In order to get the solution of (7), we need the following result, which is also of independent interest.

Theorem 3.2. For any $(x^0, m^0) \in S \times \mathcal{M}$ and $(y, m') \in S \times \mathcal{M}$, there exists a control policy $u(t)$, $t \ge 0$, such that for any $r \ge 1$,

    E \eta^r \le C_1(r) \Big(1 + \sum_{k=1}^{N-1} |x_k^0 - y_k|^r\Big),    (16)

where $\eta = \inf\{t \ge 0 : x(t) = y, m(t) = m'\}$, and $x(t)$, $t \ge 0$, is the state process corresponding to the control policy $u(t)$ and the initial condition $(x(0), m(0)) = (x^0, m^0)$.

To prove the theorem, we first establish the following lemma concerning the difference between a Markov process and its mean.

Lemma 3.1. Let $\tilde{\tau}$ be a Markov time with respect to an ergodic Markov process $\tilde{m}(t)$ with a finite state space $\tilde{\mathcal{M}} = \{\tilde{m}^1, \ldots, \tilde{m}^p\}$, where $\tilde{m}^j = (\tilde{m}_1^j, \ldots, \tilde{m}_N^j)$, $j = 1, \ldots, p$. Let
$\tilde{p}_{\tilde{m}^j}$ be its stationary distribution, and $\tilde{p} = (\tilde{p}_1, \ldots, \tilde{p}_N)$ be its stationary expectation, i.e., $\tilde{p}_k = \sum_{j=1}^p \tilde{m}_k^j \tilde{p}_{\tilde{m}^j}$. Then for any linear function $l(\tilde{m})$, there exists a constant $C_2$ such that for any $T > 0$,

    E \exp\Big\{\frac{1}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} (l(\tilde{m}(s)) - l(\tilde{p})) \, ds\Big|\Big\} \le C_2.

Proof. Similarly to Corollary C.2 in Sethi and Zhang (1994), we can prove that for any Markov time $\tilde{\tau}$ with respect to $\tilde{m}(t)$ and any $A > 0$, there exists a $C_2(A)$ such that for any $T > 0$,

    E \exp\Big\{\frac{A}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} \big(I_{\{\tilde{m}(s) = \tilde{m}^j\}} - \tilde{p}_{\tilde{m}^j}\big) \, ds\Big|\Big\} \le C_2(A).    (17)

First we show that there exists a constant $C_3$ such that for any $T > 0$,

    E \exp\Big\{\frac{1}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} (\tilde{m}_i(s) - \tilde{p}_i) \, ds\Big|\Big\} \le C_3.    (18)

To do this, we note that

    E \exp\Big\{\frac{1}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} (\tilde{m}_i(s) - \tilde{p}_i) \, ds\Big|\Big\}
    = E \exp\Big\{\frac{1}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} \sum_{j=1}^p \tilde{m}_i^j \big(I_{\{\tilde{m}(s) = \tilde{m}^j\}} - \tilde{p}_{\tilde{m}^j}\big) \, ds\Big|\Big\}
    \le E \exp\Big\{\frac{1}{\sqrt{T}} \sum_{j=1}^p \tilde{m}_i^j \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} \big(I_{\{\tilde{m}(s) = \tilde{m}^j\}} - \tilde{p}_{\tilde{m}^j}\big) \, ds\Big|\Big\}.
Using the Hölder inequality (with weights $\tilde{m}_i^j / \sum_{k=1}^p \tilde{m}_i^k$), we get

    E \exp\Big\{\frac{1}{\sqrt{T}} \sum_{j=1}^p \tilde{m}_i^j \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} \big(I_{\{\tilde{m}(s) = \tilde{m}^j\}} - \tilde{p}_{\tilde{m}^j}\big) \, ds\Big|\Big\}
    \le \prod_{j=1}^p \Big(E \exp\Big\{\frac{\sum_{k=1}^p \tilde{m}_i^k}{\sqrt{T}} \sup_{0 \le t \le T} \Big|\int_{\tilde{\tau}}^{\tilde{\tau}+t} \big(I_{\{\tilde{m}(s) = \tilde{m}^j\}} - \tilde{p}_{\tilde{m}^j}\big) \, ds\Big|\Big\}\Big)^{\tilde{m}_i^j / \sum_{k=1}^p \tilde{m}_i^k}.

From here and (17), we get (18). From (18), by the Schwarz inequality, we get the statement of the lemma. □

Proof of Theorem 3.2. The proof is divided into six steps.

Step 1. We construct an auxiliary process $\hat{m}(t)$. It follows from (A.2) that we can select vectors $\hat{m}^j = (\hat{m}_1^j, \ldots, \hat{m}_N^j)$, $j = 1, \ldots, p$, such that $\hat{m}_1^j = m_1^j$ and $\hat{m}_k^j \le m_k^j$, $j = 1, \ldots, p$, $k = 2, \ldots, N$, and

    \hat{p}_k := \sum_{j=1}^p \hat{m}_k^j p_{m^j} > \hat{p}_{k+1} := \sum_{j=1}^p \hat{m}_{k+1}^j p_{m^j} > d,  k = 1, \ldots, N-1.    (19)
We define the process $\hat{m}(t)$ by $\hat{m}(t) = \hat{m}^j$ whenever $m(t) = m^j$. The process $\hat{m}(t) \in \hat{\mathcal{M}} := \{\hat{m}^1, \ldots, \hat{m}^p\}$ is strongly irreducible and has the stationary distribution $p_{\hat{m}^j} = p_{m^j}$, $j = 1, \ldots, p$. Thus $\hat{p} = (\hat{p}_1, \ldots, \hat{p}_N)$ corresponds to its stationary expectation, and (19) gives

    p_1 = \hat{p}_1 > \hat{p}_2 > \cdots > \hat{p}_N > d.    (20)

Step 2. We construct a family of auxiliary processes $x^0(t|s, x)$, $t \ge s \ge 0$ and $x \in S$. Consider the following function $u^0(x, m) = (u_1^0(x, m), \ldots, u_N^0(x, m))$:

    u_1^0(x, m) = m_1,
    u_k^0(x, m) = m_k if x_{k-1} > 0;  u_k^0(x, m) = m_k \wedge u_{k-1}^0(x, m) if x_{k-1} = 0,  k = 2, \ldots, N.    (21)

We define $x^0(t|s, x)$ as the process which satisfies the following equation (see (2)):

    \dot{x}^0(t|s, x) = A u^0(x^0(t|s, x), \hat{m}(t)),  x^0(s|s, x) = x.

Clearly $x^0(t|s, x) \in S$ for all $t \ge s$. For fixed s, $x^0(t|s, x)$ is the state of the system under the production rates obtained by using the maximum admissible modified capacity at each machine. Define now the Markov time

    \tau(s, x) = \inf\{t \ge s : x_1^0(t|s, x) \ge y_1, \; x_k^0(t|s, x) \ge a + y_k, \; k = 2, \ldots, N\},    (22)

where $a > 0$ is a constant to be specified later. It follows from this definition that $\tau(s, x)$ is the first time the state process $x^0(t|s, x)$ reaches or exceeds $(y_1, a + y_2, \ldots, a + y_N)$ under the production rate $u^0(x^0(t|s, x), \hat{m}(t))$. Since each machine's modified average capacity is larger than that of the machine that follows it, and since the last machine's modified average capacity is larger than the demand rate d (see (20)), we can establish the following result.
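The recursion (21) translates directly into code. The sketch below is a literal transcription of that rule under our own naming: machine 1 runs at its full modified capacity, and each subsequent machine runs at full capacity when its upstream buffer is non-empty, otherwise at no more than the upstream machine's rate.

```python
def u0(x, m_hat):
    """Feedback rule (21): u_1^0 = m_1; for k >= 2, u_k^0 = m_k when
    buffer k-1 is non-empty, else m_k ∧ u_{k-1}^0 so that an empty
    buffer is not driven negative.  x holds the N-1 buffer levels,
    m_hat the N (modified) machine capacities."""
    u = [m_hat[0]]
    for k in range(1, len(m_hat)):
        if x[k - 1] > 0:
            u.append(m_hat[k])           # buffer has stock: full speed
        else:
            u.append(min(m_hat[k], u[k - 1]))  # empty buffer: follow upstream
    return u
```

For instance, with capacities (2, 1, 5) and buffers (0, 3), the rule gives rates (2, 1, 5): machine 2 is the bottleneck but its buffer constraint is inactive, while machine 3 can run at full speed on its stocked buffer.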
Step 3. We prove that there exists a constant $C_4 = C_4(r)$ such that

    E (\tau(s, x) - s)^{2r} < C_4 \Big(1 + \sum_{k=1}^{N-1} ((y_k - x_k)^+)^r\Big)^2.    (23)

For simplicity of exposition, we write $\tau$, $\theta$, $x^r(t)$, and $u^r(t)$ ($r = 0, 1$) instead of $\tau(s, x)$, $\theta(s, x)$, $x^r(t|s, x)$, and $u^r(x^r(t|s, x), \hat{m}(t))$, respectively. Let $\bar{m}_k = \max_{1 \le j \le p} \hat{m}_k^j$, $k = 1, \ldots, N$, and $\bar{m}_{N+1} = \hat{p}_{N+1} = d$. We can choose $\varepsilon > 0$ such that

    (\hat{p}_k - \hat{p}_{k+1})(1 - \varepsilon) - \varepsilon \bar{m}_{k+1} =: b_k > 0  for all  1 \le k \le N.

Let $a_1 = 0$ and $a_k = a$ for $2 \le k \le N$. By the definition of $\tau$,
    P(\tau - s > t) \le \sum_{k=1}^N P\big(x_k^0(s + t) < a_k + y_k\big)
    \le P\Big(\inf_{s+\varepsilon t \le v \le s+t} x_1^0(v) = 0\Big)
    + \sum_{k=1}^{N-2} P\Big(\bigcap_{i=1}^k \Big\{\inf_{s+\varepsilon t \le v \le s+t} x_i^0(v) > 0\Big\} \cap \Big\{\inf_{s+\varepsilon t \le v \le s+t} x_{k+1}^0(v) = 0\Big\}\Big)
    + \sum_{k=1}^{N} P\Big(\bigcap_{i=1}^{N-1} \Big\{\inf_{s+\varepsilon t \le v \le s+t} x_i^0(v) > 0\Big\} \cap \big\{x_k^0(s + t) < a_k + y_k\big\}\Big).    (24)

First we estimate the first term on the right-hand side of (24). Note that $u_2^0(v) \le \hat{m}_2(v)$. Thus, using Lemma 3.1, we get

    P\Big(\inf_{s+\varepsilon t \le v \le s+t} x_1^0(v) = 0\Big)
    \le P\Big(\inf_{\varepsilon t \le v \le t} \int_s^{s+v} (\hat{m}_1(r) - \hat{m}_2(r)) \, dr \le 0\Big)
    \le P\Big(\inf_{\varepsilon t \le v \le t} \int_s^{s+v} (\hat{m}_1(r) - \hat{m}_2(r) - (\hat{p}_1 - \hat{p}_2)) \, dr \le -\varepsilon(\hat{p}_1 - \hat{p}_2) t\Big)
    \le P\Big(\sup_{0 \le v \le t} \Big|\int_s^{s+v} [(\hat{m}_1(r) - \hat{m}_2(r)) - (\hat{p}_1 - \hat{p}_2)] \, dr\Big| \ge \varepsilon(\hat{p}_1 - \hat{p}_2) t\Big)
    \le C_2 \exp\{-\varepsilon(\hat{p}_1 - \hat{p}_2)\sqrt{t}\}.    (25)

If $\inf_{s+\varepsilon t \le v \le s+t} x_i^0(v) > 0$ for all $1 \le i \le k$, $k \le N - 2$, then $u_{k+1}^0(v) = \hat{m}_{k+1}(v)$ and $u_{k+2}^0(v) \le \hat{m}_{k+2}(v)$ for $v \in (s + \varepsilon t, s + t)$. So, just as in the proof of (25), we can show that for $k = 1, \ldots, N-2$,

    P\Big(\bigcap_{i=1}^k \Big\{\inf_{s+\varepsilon t \le v \le s+t} x_i^0(v) > 0\Big\} \cap \Big\{\inf_{s+\varepsilon t \le v \le s+t} x_{k+1}^0(v) = 0\Big\}\Big) \le C_2 \exp\{-\varepsilon(\hat{p}_{k+1} - \hat{p}_{k+2})\sqrt{t}\}.    (26)
½
−1 P ∩N k=1
µ
¾
s+εt≤v≤s+t Z s+t
≤ P xk − εtmk+1 + ≤ P
µZ s+t s+εt
n
x0k (v) > 0 ∩ x0k (s + t) < ak + yk
inf
s+εt
o¶
¶
(m ˆ k (r) − m ˆ k+1 (r))dr < ak + yk ¶ +
(m ˆ k (r) − m ˆ k+1 (r) − (ˆ pk − pˆk+1 )dr < (ak + yk − xk ) − bk t .
(27)
Applying Lemma 3.1 we have from (27): µ −1 P ∩N k=1
(
≤
½
¾
inf
s+εt≤v≤s+t
n
x0k (v) > 0 ∩ x0k (s + t) < ak + yk
o¶
+ 1 n o for 0 ≤ t ≤ (ak + yk − xk ) /bk , + √ k −xk ) C2 exp − bk t−(ak +y for t ≥ (ak + yk − xk )+ /bk . t
8
(28)
R
Note that E(τ − s)2r = 0∞ t2r−1 P (τ − s > t)dt. By substituting from (24), (25), (26), and (28) into this relation, we get (23). Step 4. We construct a family of auxiliary processes x1 (t|s, x), t ≥ s ≥ 0 and x ∈ S. Consider the following function u1 (x, m) = (u11 (x, m), · · · , u1N (x, m)), which is defined only for x such that xi ≥ yi with 1 ≤ i ≤ N − 1: u11 (x, m) = 0, (
u1k (x, m) =
mk , if xk−1 > yk−1 , mk ∧ u1k−1 (x, m), if xk−1 = yk−1
k = 2, · · · , N.
(29)
We define x1 (t|s, x) as a continuous process which coincides with x0 (t|s, x) for s ≤ t ≤ τ (s, x), which satisfies the following equation (see (2)): ˆ x˙ 1 (t|s, x) = Au1 (x1 (t|s, x), m(t)), t ≥ τ (s, x). Clearly x1 (t|s, x) ∈ S for all t ≥ s, and x1i (t|s, x) ≥ yi (1 ≤ i ≤ N − 1) for t ≥ τ (s, x). This process corresponds to a policy in which after τ (s, x), we stop production at the first machine and have the maximum possible production rate at other machines under the restriction that the content of each buffer k, 1 ≤ k ≤ N − 1, is not less than yk . We define now a Markov time θ(s, x) = inf{t ≥ τ (s, x) : x1N (t|s, x) = yN }.
(30)
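The draining rule (29) admits the same kind of transcription as (21); the sketch below (our naming) idles the first machine and lets machine k run at full capacity only while buffer k-1 is strictly above its target level.

```python
def u1(x, y, m):
    """Feedback rule (29), defined when x_i >= y_i for i < N: u_1^1 = 0;
    for k >= 2, u_k^1 = m_k while buffer k-1 exceeds its target y_{k-1},
    else m_k ∧ u_{k-1}^1, so the buffer never falls below y_{k-1}."""
    u = [0.0]
    for k in range(1, len(m)):
        if x[k - 1] > y[k - 1]:
            u.append(m[k])
        else:
            u.append(min(m[k], u[k - 1]))
    return u
```

Once every buffer has drained to its target, all rates collapse to zero through the chain of minima, which is the mechanism behind the stopping time (30).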
Step 5. We establish that (i) the constant a in (22) can be chosen in such a way that for all s, x,

    P\big(x^1(\theta(s, x)|s, x) = y, \; m(\theta(s, x)) = m'\big) \ge 1 - q > 0,    (31)

and (ii) there exists a constant $C_5$ such that

    \frac{a}{d} \le \theta(s, x) - s \le \frac{1}{d}\Big(\sum_{k=1}^N x_k - \sum_{k=1}^N y_k + C_5 [\tau(s, x) - s]\Big),    (32)

    \sum_{k=1}^N x_k^1(\theta(s, x)|s, x) \le \sum_{k=1}^N x_k + C_5 [\tau(s, x) - s].    (33)

First, taking the sum of all the equations in (1), we have $\sum_{k=1}^N x_k^1(t) = \sum_{k=1}^N x_k + \int_s^t (u_1(v) - d) \, dv$ for $s \le t \le \tau$. Consequently,

    \sum_{k=1}^N x_k^1(\tau) \le \sum_{k=1}^N x_k + (\bar{m}_1 - d)(\tau - s).    (34)

Since $u_1^1(t) = 0$ for $t > \tau$, we have as before that $\sum_{k=1}^N x_k^1(\theta) = \sum_{k=1}^N x_k^1(\tau) - d(\theta - \tau)$. Since $\theta > \tau$ and $x_k^1(\theta) \ge y_k$, we have

    \theta - \tau \le \frac{1}{d}\Big(\sum_{k=1}^N x_k^1(\tau) - \sum_{k=1}^N y_k\Big),  \sum_{k=1}^N x_k^1(\theta) \le \sum_{k=1}^N x_k^1(\tau).    (35)

From the definitions of $\theta$ and $\tau$, and (1) with $k = N$, we have $y_N = x_N^1(\theta) = x_N^1(\tau) + \int_\tau^\theta (u_N^1(v) - d) \, dv \ge y_N + a - d(\theta - \tau)$, i.e., $\theta - \tau \ge a/d$. This relation, (34), and (35) prove (32)-(33).

To prove (31), we introduce the following notation:

    \theta(k) = \inf\{t \ge \tau : x_k^1(t) = y_k\},  k = 2, \ldots, N  (with \theta(1) := \tau),
    S(k) = \Big\{\omega : \inf_{t \ge 0} \int_\tau^{\tau+t} (\hat{m}_k(v) - \hat{m}_{k+1}(v)) \, dv > -a/2\Big\},  k = 2, \ldots, N,
    S(0) = \{\omega : m(\theta) = m'\},  \bar{S} = \bigcap_{k=2}^N S(k),  S = \tilde{S} \cap S(0),

where $\tilde{S} = \{\omega : \theta(N) \ge \max_{1 \le k \le N-1} \theta(k)\}$.
From the definition of $u^1(x, m)$ and $x^1(t)$, it then follows that if $\omega \in \bar{S}$, then

    u_{k+1}^1(t) = \hat{m}_{k+1}(t)  for  \tau < t \le \theta(k),  1 \le k \le N-1;  u_{k+1}^1(t) = 0  for  t > \theta(k),  1 \le k < N-1,

and

    \theta(k) - \theta(k-1) \ge \frac{x_k^1(\theta(k-1)) - y_k}{\bar{m}_k} \ge \frac{a}{\bar{m}_k},  k = 2, \ldots, N.    (36)

Therefore, $\bar{S} \subseteq \tilde{S}$ and

    P[S^c] \le \sum_{k=2}^N P(S^c(k)) + P(\bar{S} \cap S^c(0)).    (37)

Note that if $\eta_1$ and $\eta_2$ are Markov times with $\eta_2 - \eta_1 > 1$, then there exists $q_1 < 1$ such that

    \max_{1 \le j \le p} P\big(m(\eta_2) \ne m' \,\big|\, m(\eta_1) = m^j\big) < q_1 < 1.    (38)

Taking the conditional probability with respect to $\theta(N-1)$ and using (36) with $k = N$, $a > 2 / \sum_{k=2}^N (1/\bar{m}_k)$, and (38), we have

    P(\bar{S} \cap S^c(0)) < q_1 < 1.    (39)

Applying Lemma 3.1, we have

    P(S^c(k)) \le \sum_{n=1}^\infty P\Big(\int_\tau^{\tau+n} (\hat{m}_k(v) - \hat{m}_{k+1}(v)) \, dv < -a + \bar{m}_{k+1}\Big)
    \le \sum_{n=1}^\infty P\Big(\Big|\int_\tau^{\tau+n} [(\hat{m}_k(v) - \hat{m}_{k+1}(v)) - (\hat{p}_k - \hat{p}_{k+1})] \, dv\Big| > a + n(\hat{p}_k - \hat{p}_{k+1}) - \bar{m}_{k+1}\Big)
    \le C_5 \sum_{n=1}^\infty \exp\Big\{-\frac{a + n(\hat{p}_k - \hat{p}_{k+1}) - \bar{m}_{k+1}}{\sqrt{n}}\Big\} \le C_6 e^{-C_7 \sqrt{a}}.    (40)

It follows from (37), (39), and (40) that we can choose a and q such that $P(S^c) \le q < 1$. This proves (31).

Step 6. We construct a process $x(t)$ ($t \ge 0$) and the corresponding control policy $u(t)$ which satisfies the statement of Theorem 3.2.
Define a sequence of Markov times $(\theta_i)_{i=0}^\infty$ and the process $x(t)$ for $\theta_i \le t < \theta_{i+1}$ ($i = 1, 2, \ldots$) as follows: $\theta_0 = 0$, $\theta_1 = \theta(0, x^0)$, and $x(t) = x^1(t|0, x^0)$ for $0 \le t \le \theta_1$. If $\theta_i$ is defined for $i \ge 1$ and $x(t)$ is defined for $0 \le t \le \theta_i$, then we let $\theta_{i+1} = \theta(\theta_i, x(\theta_i))$ and $x(t) = x^1(t|\theta_i, x(\theta_i))$ for $\theta_i \le t \le \theta_{i+1}$. According to the left inequality in (32), the process $x(t)$ is defined for all $t \ge 0$. Let $\tau_i = \tau(\theta_{i-1}, x(\theta_{i-1}))$. The control policy corresponding to the process $x(t)$ is given by

    u(t) = u^0(x(t), \hat{m}(t))  if  \theta_{i-1} \le t < \tau_i;  u(t) = u^1(x(t), \hat{m}(t))  if  \tau_i \le t < \theta_i,  i = 1, 2, \ldots.    (41)

It is clear that $u(t) \in \mathcal{A}(x^0, m^0)$. For the process $x(t)$, define the Markov time $\eta = \inf\{t \ge 0 : x(t) = y, m(t) = m'\}$. Let $S_i = \{\omega : x(\theta_i) = y, m(\theta_i) = m'\}$. Using conditional probabilities, we have from (31) that

    P\Big(\bigcap_{l=1}^i S_l^c\Big) \le q^i,  i = 1, 2, \ldots.    (42)

Using (42) and the definition of $x(t)$, we get

    \eta^r \le \sum_{i=1}^\infty \theta_i^r \, I_{\{\cap_{l=0}^{i-1} S_l^c \cap S_i\}}  a.s.,    (43)

where $S_0^c = \Omega$. Using (32) and (33), we have for $n = 1, 2, \ldots$:

    \theta_n - \theta_{n-1} \le \frac{1}{d}\Big(\sum_{k=1}^N x_k(\theta_{n-1}) - \sum_{k=1}^N y_k + C_5(\tau_n - \theta_{n-1})\Big),    (44)

    \sum_{k=1}^N x_k(\theta_n) \le \sum_{k=1}^N x_k(\theta_{n-1}) + C_5(\tau_n - \theta_{n-1}).    (45)

Using (44) and (45), we have for $i = 1, 2, \ldots$:

    \theta_i \le \frac{1}{d}\Big(\sum_{k=1}^N (x_k^0 - y_k)^+ + C_5 \sum_{n=1}^i (\tau_n - \theta_{n-1})\Big),

or

    \theta_i^r \le C_4 i^r \Big(\sum_{k=1}^N ((x_k^0 - y_k)^+)^r + \sum_{n=1}^i (\tau_n - \theta_{n-1})^r\Big).    (46)

Note now that $x(\theta_n) \ge y$ for $n = 1, 2, \ldots$. Using the Schwarz inequality (Corollary 3 on page 104 of Chow and Teicher, 1988), we get from (42) and the moment bound (23):

    E\big(\tau_1^r \, I_{\{\cap_{l=1}^{i-1} S_l^c \cap S_i\}}\big) \le q^{(i-1)/2} \big(E \tau_1^{2r}\big)^{1/2} \le C_2(r) \, q^{(i-1)/2} \Big(1 + \sum_{k=1}^{N-1} ((y_k - x_k^0)^+)^r\Big),  i = 1, 2, \ldots,    (47)

and

    E\big((\tau_n - \theta_{n-1})^r \, I_{\{\cap_{l=1}^{i-1} S_l^c \cap S_i\}}\big) \le q^{(i-1)/2} \big(E(\tau_n - \theta_{n-1})^{2r}\big)^{1/2} \le C_2(r) \, q^{(i-1)/2},  2 \le n \le i,  i = 2, 3, \ldots.    (48)
Substituting (46) into (43), taking expectations, and using (47) and (48), we get (16). □

The next two theorems are concerned with the solution of (7).

Theorem 3.3. There exists a sequence $\{\rho_k : k \ge 1\}$ with $\rho_k \to 0$ as $k \to \infty$ such that for $(x, m) \in S \times \mathcal{M}$:

    \lim_{k \to \infty} \rho_k V^{\rho_k}(x, m) = \lambda,  \lim_{k \to \infty} \big[V^{\rho_k}(x, m) - V^{\rho_k}(0, m)\big] = W(x, m),

where $W(x, m)$ is convex in x for any given m.

Proof. For the value function $V^\rho(x, m)$ of the discounted cost problem, we define the differential discounted value function, also known as the potential function,

    W^\rho(x, m) = V^\rho(x, m) - V^\rho(0, m).

The function $W^\rho(x, m)$ is convex in x. Following the proof of Theorem 3.2 in Sethi et al. (1997), there exist constants $\rho_0$ and $C_7 > 0$ such that for $0 < \rho \le \rho_0$, $\rho V^\rho(0, m) \le C_7$. Thus, there exists a sequence $\{\rho_k : k \ge 1\}$ with $\rho_k \to 0$ as $k \to \infty$ such that for $(x, m) \in S \times \mathcal{M}$,

    \lim_{k \to \infty} \rho_k V^{\rho_k}(0, m) = \lambda.    (49)

The first statement of Theorem 3.3 follows from (49) and the last statement of Theorem 3.2. So it remains to prove the last statement of Theorem 3.3. To do this, we first show that there is a constant $C_8 > 0$ such that

    |W^\rho(x, m)| \le C_8 \big(1 + |x|^{K_h + 2}\big),    (50)

for all $(x, m) \in S \times \mathcal{M}$ and $\rho > 0$. Without loss of generality, suppose that $V^\rho(x, m) \ge V^\rho(0, m)$ (the case $V^\rho(x, m) \le V^\rho(0, m)$ is treated in the same way). By Theorem 3.2 there exists a control policy $u(\cdot)$ such that

    E \eta^r \le C_1(r) \Big(1 + \sum_{k=1}^N |x_k|^r\Big),    (51)

where $\eta = \inf\{t > 0 : (x(t), m(t)) = (0, m)\}$, and $x(t)$ is the state process corresponding to $u(t)$ with the initial condition $(x(0), m(0)) = (x, m)$. From the dynamic programming principle we have

    V^\rho(x, m) \le E\Big(\int_0^\eta e^{-\rho t} h(x(t), u(t)) \, dt + e^{-\rho \eta} V^\rho(x(\eta), m(\eta))\Big)
                = E\Big(\int_0^\eta e^{-\rho t} h(x(t), u(t)) \, dt + e^{-\rho \eta} V^\rho(0, m)\Big)
                \le E\Big(\int_0^\eta e^{-\rho t} h(x(t), u(t)) \, dt\Big) + V^\rho(0, m).

Therefore,

    |W^\rho(x, m)| = V^\rho(x, m) - V^\rho(0, m) \le E\Big(\int_0^\eta e^{-\rho t} h(x(t), u(t)) \, dt\Big).    (52)

By Assumption (A.3), there exists a $\tilde{C}_0 > 0$ such that

    h(x(t), u(t)) \le \tilde{C}_0 \big(1 + |x|^{K_h + 1} + t^{K_h + 1}\big),    (53)

where we use the fact that $u(\cdot)$ is bounded. Therefore, (51) implies that

    E \int_0^\eta e^{-\rho t} h(x(t), u(t)) \, dt \le E \int_0^\eta \tilde{C}_0 \big(1 + |x|^{K_h + 1} + t^{K_h + 1}\big) \, dt \le \tilde{C}_0 \Big(E\eta + |x|^{K_h + 1} E\eta + E\eta^{K_h + 2}\Big) \le C_9 \Big(1 + \sum_{k=1}^N |x_k|^{K_h + 2}\Big),

for some $C_9 > 0$. Thus (52) gives (50).

For $\delta \in (0, 1)$, let $B^\delta = [\delta, 1/\delta]^{N-1} \times [-1/\delta, 1/\delta]$. Based on (50), it follows from Theorem 10.6 on page 88 of Rockafellar (1972) that there is a $C(\delta)$ such that for $x, x' \in B^\delta$,

    |W^\rho(x, m) - W^\rho(x', m)| \le C(\delta) |x - x'|.    (54)

Without loss of generality, we assume that $C(\delta)$ is a decreasing function of $\delta$. For $1 \le n \le N-1$ and $1 \le i_1 < \cdots < i_n \le N-1$, let $S_{i_1 \ldots i_n} = \{x \in b(S) : x_{i_\ell} = 0, \ell = 1, \ldots, n\}$ and $S^o_{i_1 \ldots i_n} = \{x \in S_{i_1 \ldots i_n} : x_j > 0, 1 \le j \le N-1, j \notin \{i_1, \ldots, i_n\}\}$. That is, $S^o_{i_1 \ldots i_n}$ is the interior of $S_{i_1 \ldots i_n}$ relative to $[0, \infty)^{N-n-1} \times (-\infty, +\infty)$. Note that the function $V^\rho(x, m)$ is still convex on $S_{i_1 \ldots i_n}$. Let

    B^\delta_{i_1 \ldots i_n} = \prod_{\ell=1}^{N-1} \Upsilon^\delta_\ell \times [-1/\delta, 1/\delta]

with $\Upsilon^\delta_\ell = \{0\}$ if $\ell \in \{i_1, \ldots, i_n\}$ and $\Upsilon^\delta_\ell = [\delta, 1/\delta]$ if $\ell \notin \{i_1, \ldots, i_n\}$. Using again Theorem 10.6 on page 88 of Rockafellar (1972), in view of (50), there is a $C_{i_1 \ldots i_n}(\delta) > 0$ such that for $x, x' \in B^\delta_{i_1 \ldots i_n}$,

    |W^\rho(x, m) - W^\rho(x', m)| \le C_{i_1 \ldots i_n}(\delta) |x - x'|.    (55)

We also assume that $C_{i_1 \ldots i_n}(\delta)$ is a decreasing function of $\delta$. From the arbitrariness of $\delta$ and (54)-(55), there exist $W(x, m)$ and a sequence $\{\rho_k : k \ge 1\}$ with $\rho_k \to 0$ as $k \to \infty$ such that for $(x, m) \in S \times \mathcal{M}$,

    \lim_{k \to \infty} \big[V^{\rho_k}(x, m) - V^{\rho_k}(0, m)\big] = W(x, m).    (56)

It follows from the convexity of $W^{\rho_k}(x, m)$ in x that the limit function $W(x, m)$ is also convex in x on S. □

Let $\nabla(V^{\rho_k}(x, m) - V^{\rho_k}(0, m^0))$ denote the gradient of $V^{\rho_k}(x, m) - V^{\rho_k}(0, m^0)$ at the point x when it exists.

Theorem 3.4. (i) $\lambda(x^0, m^0)$ does not depend on $(x^0, m^0)$. (ii) The pair $(\lambda, W(\cdot, \cdot))$ defined in Theorem 3.3 satisfies (7) on $S^o$. (iii) Suppose there exists an open subset $\hat{S}$ of S such that (a) $\{\nabla(V^{\rho_k}(x, m) - V^{\rho_k}(0, m^0)) : k \ge 1\}$ is uniformly equi-Lipschitzian on $\hat{S}$, and (b) $b(S) \subseteq b(\hat{S})$, where $b(S)$ and $b(\hat{S})$ are the boundaries of S and $\hat{S}$, respectively. Then the pair $(\lambda, W(\cdot, \cdot))$ defined in Theorem 3.3 satisfies (7) on S, i.e., it is a solution to (7).

Proof. Let $\mathcal{O}$ be the set of all points in the interior of S at which $W(x, m)$ is differentiable. From the convexity of $W(x, m)$, we know that $\mathcal{O}$ is dense in S. It follows from the properties of convex functions that for $x \in \mathcal{O}$ and any direction r,

    \lim_{k \to \infty} \partial_r W^{\rho_k}(x, m) = \partial_r W(x, m).    (57)

Presman et al. (1995) proved that for any $x \in S$, the value function $V^{\rho_k}(x, m)$ of the discounted cost problem satisfies

    \rho_k V^{\rho_k}(x, m) = \inf_{u \in U(x, m)} \{\partial_{Au} V^{\rho_k}(x, m) + h(x, u)\} + Q V^{\rho_k}(x, \cdot)(m).

This implies that

    \rho_k V^{\rho_k}(x, m) = \inf_{u \in U(x, m)} \{\partial_{Au} W^{\rho_k}(x, m) + h(x, u)\} + Q W^{\rho_k}(x, \cdot)(m).    (58)

Taking the limit on both sides, we have that for $x \in \mathcal{O}$,

    \lambda = \inf_{u \in U(x, m)} \{\partial_{Au} W(x, m) + h(x, u)\} + Q W(x, \cdot)(m).    (59)

If $x \notin \mathcal{O}$ and $x \in S^o$, then for any direction r there exists a sequence $\{x_n\}_{n=1}^\infty$ with $x_n \in \mathcal{O}$ such that $\partial_r W(x_n, m) \to \partial_r W(x, m)$. From this fact and the continuity of $W(x, m)$, it follows that (59) holds for all x in the interior of S.

Consider now the boundary $b(S)$ of S. From the uniform equi-Lipschitzian property of $\{\nabla W^{\rho_k}(x, m) : k \ge 1\}$ on $\hat{S}$, we know that (57) holds for all $x \in b(S)$. Therefore, (59) holds on $b(S)$. □
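The vanishing discount limit behind Theorems 3.3 and 3.4 can be visualized on a toy problem. The sketch below is not the flowshop: it is a hypothetical two-state, two-action discrete-time MDP (all transition and cost data invented for illustration) solved by plain value iteration. As the discount factor β tends to 1, the normalized values (1 − β)V_β(s) become nearly independent of the state s and approach a single constant, mirroring the convergence ρV^ρ(x, m) → λ.

```python
def discounted_values(P, c, beta, iters=200000, tol=1e-10):
    """Value iteration for a finite discounted-cost MDP.
    P[a][s] is the next-state distribution under action a in state s,
    c[a][s] the stage cost; returns the optimal discounted values V."""
    n = len(c[0])
    V = [0.0] * n
    for _ in range(iters):
        newV = [min(c[a][s] + beta * sum(P[a][s][t] * V[t] for t in range(n))
                    for a in range(len(c)))
                for s in range(n)]
        if max(abs(u - v) for u, v in zip(newV, V)) < tol:
            return newV
        V = newV
    return V

# Hypothetical example data: action 0 = "produce", action 1 = "idle".
P = [[[0.9, 0.1], [0.5, 0.5]],   # transitions under action 0
     [[0.5, 0.5], [0.1, 0.9]]]   # transitions under action 1
c = [[2.0, 1.0], [1.0, 3.0]]     # stage costs c[a][s]
```

Evaluating (1 − β) times these values at β = 0.99 and β = 0.999 shows the two components agreeing with each other, and across the two discount levels, up to terms that vanish with 1 − β.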
4 Concluding Remarks
In this paper, we have developed a theory of dynamic programming in terms of directional derivatives for an N-machine flowshop with convex costs and the long-run average cost minimization criterion. Further research should focus on extending this analysis to N-machine flowshops with limited buffers. For such systems with two machines, see Presman et al. (2000).
References

[1] T. Bielecki and P. R. Kumar, Optimality of zero-inventory policies for unreliable manufacturing systems, Operations Research, Vol. 36, 1988, 532-546.

[2] M. Caramanis and A. Sharifinia, Optimal manufacturing flow control design, International J. Flexible Manufacturing Systems, Vol. 3, 1991, 321-336.

[3] Y. S. Chow and H. Teicher, Probability Theory, Springer-Verlag, New York, 1988.

[4] F. Clarke, Optimization and Nonsmooth Analysis, Wiley-Interscience, New York, 1983.

[5] T. E. Duncan, B. Pasik-Duncan and L. Stettner, Average cost per unit time control of stochastic manufacturing systems: Revisited, Mathematical Methods of Operations Research, Vol. 54, 2001, 259-278.

[6] W. Fleming and H. Soner, Controlled Markov Processes and Viscosity Solutions, Springer-Verlag, New York, 1992.

[7] G. Liberopoulos and M. Caramanis, Production control of manufacturing systems with production rate dependent failure rate, IEEE Trans. Auto. Control, Vol. 38, 1993, 889-895.

[8] G. Liberopoulos and J. Hu, On the ordering of optimal hedging points in a class of manufacturing flow control models, IEEE Trans. Auto. Control, Vol. 40, 1995, 282-286.

[9] E. Presman, S. Sethi and W. Suo, Existence of optimal feedback production plans in stochastic flowshops with limited buffers, Automatica, Vol. 33, 1997, 1899-1903.

[10] E. Presman, S. P. Sethi, H. Zhang and A. Bisi, Average cost optimal policies for an unreliable two-machine flowshop with limited internal buffer, Annals of Operations Research, Vol. 98, 2000, 333-351.

[11] E. Presman, S. Sethi and Q. Zhang, Optimal feedback production planning in a stochastic N-machine flowshop, Automatica, Vol. 31, 1995, 1325-1332.

[12] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1996 Reprint Edition.
[13] S. P. Sethi, W. Suo, M. I. Taksar and Q. Zhang, Optimal production planning in a stochastic manufacturing system with long-run average cost, J. of Optimization Theory and Applications, Vol. 92, 1997, 161-188.

[14] S. P. Sethi, W. Suo, M. I. Taksar and H. Yan, Optimal production planning in a multi-product stochastic manufacturing system with long-run average cost, Discrete Event Dynamic Systems: Theory and Applications, Vol. 8, 1998, 37-54.

[15] S. Sharifinia, Production control of a manufacturing system with multiple machine states, IEEE Trans. Auto. Control, Vol. 33, 1988, 620-625.

[16] H. Soner, Optimal stochastic control with state-space constraints II, SIAM J. on Control and Optimization, Vol. 24, 1986, 1110-1123.

[17] N. Srisvatsan, Synthesis of Optimal Policies in Stochastic Manufacturing Systems, Ph.D. Thesis, Operations Research Center, MIT, 1993.

[18] N. Srisvatsan and Y. Dallery, Partial characterization of optimal hedging point policies in unreliable two-part-type manufacturing systems, Operations Research, Vol. 46, 1998, 36-45.