Inventory Models with Markovian Demands and Cost Functions of Polynomial Growth (1)

D. Beyer (2), S. P. Sethi (3), and M. Taksar (4)

Communicated by G. Leitmann

To appear in JOTA 93 (1998), No. 8.

(1) This research was supported in part by a Feodor Lynen Grant provided by the A. v. Humboldt-Foundation, NSERC Grant A4619, and NSF Grant DMS 9301200.
(2) Technical Staff Member, Hewlett-Packard Laboratories, Palo Alto, California.
(3) Professor, School of Management, The University of Texas at Dallas, Richardson, Texas.
(4) Professor, Department of Applied Mathematics, State University of New York at Stony Brook, New York.


Abstract. This paper studies stochastic inventory problems with unbounded Markovian demands, ordering costs that are lower semicontinuous, and inventory/backlog (or surplus) costs that are lower semicontinuous with polynomial growth. Finite horizon problems, stationary and nonstationary discounted cost infinite horizon problems, and stationary long-run average cost problems are addressed. Existence of optimal Markov or feedback policies is established. Furthermore, optimality of (s, S)-type policies is proved when, in addition, the ordering cost consists of fixed and proportional cost components and the surplus cost is convex.

Key Words. Dynamic inventory model, Markov chain, dynamic programming, finite and infinite horizon, cyclic demand, (s, S) policy.


1 Introduction

This paper studies stochastic inventory problems with unbounded Markovian demands and more general costs than those considered in the inventory literature. Finite horizon problems, stationary and nonstationary discounted cost infinite horizon problems, and stationary long-run average cost problems are addressed. Existence of optimal Markov or feedback policies is established with unbounded Markovian demand, ordering costs that are lower semicontinuous, and inventory/backlog (or surplus) costs that are lower semicontinuous with polynomial growth. Furthermore, optimality of (s, S)-type policies is proved when, in addition, the ordering cost consists of fixed and proportional cost components and the surplus cost is convex.

The literature on infinite horizon inventory models involving a fixed ordering cost assumes the surplus cost to be of linear growth and uniformly continuous, as in Karlin (Ref. 1), Scarf (Ref. 2), Bensoussan et al. (Ref. 3), and others. Even quadratic surplus costs, which are popular in the production planning literature dating back to the classical HMMS model of Holt et al. (Ref. 4), have not been considered in infinite horizon inventory models.

As for demand, Karlin and Fabens (Ref. 5), Song and Zipkin (Ref. 6), Sethi and Cheng (Ref. 7), and Beyer and Sethi (Ref. 8) have all considered Markovian demands. Karlin and Fabens consider only the class of state-independent (s, S) policies, which does not in general include optimal policies. Song and Zipkin consider Markov-modulated Poisson demand in their analysis. Sethi and Cheng consider general Markovian demands in their treatment of discounted cost problems. Beyer and Sethi (Ref. 8) address average cost problems, but assume demands to be almost surely bounded. See also Beyer and Sethi (Ref. 9) for a discussion of earlier analyses by Iglehart (Ref. 10), Veinott and Wagner (Ref. 11), and others of problems with independent demands.

In this paper, we consider unbounded Markovian demands in our analysis of discounted and average cost problems, but require that a certain number (depending on the growth rate of the surplus cost function) of moments be finite. This is an essential requirement, and yet it is not very restrictive. As both costs and demand are generalized, the paper represents a significant extension of both the discounted cost and average cost infinite horizon inventory problems (involving a fixed ordering cost component) that have appeared in the literature; see Section 4 for a summary of the specific contributions of the paper.

The plan of the paper is as follows. The discounted cost problem is treated in the next section; finite and infinite horizon as well as stationary and nonstationary versions are considered. Section 3 takes up the stationary problem with the criterion of long-run average cost minimization. Section 4 concludes the paper.


2 Discounted Cost Problem

In this section, we conduct a detailed analysis of the discounted cost problems. The problem is carefully formulated in Section 2.1. In Section 2.2, we use the dynamic programming equation to prove the existence of an optimal Markov control for the finite horizon problem. We also provide a verification theorem for the solution of the dynamic programming equation to be the value function. We prove the value function to be continuous when the surplus cost is continuous and the ordering cost is lower semicontinuous. As we shall remark later, this has some implications for whether or not to order at the level s in an optimal (s, S) policy. The nonstationary infinite horizon problem is treated in Section 2.3. With further assumptions on the costs, the optimality of (s, S)-type policies is established in Section 2.4. Finally, the stationary infinite horizon problem is briefly discussed in Section 2.5.

2.1 Formulation of the Model

Let us consider an inventory problem over a finite horizon $[n,N] = \{n, n+1, \dots, N\}$ with an initial inventory of $x$ units at the beginning of period $n$. The demand in each period is assumed to be a random variable defined on a given probability space $(\Omega, \mathcal{F}, P)$, and the demands are not necessarily identically distributed. More specifically, the demand distributions in successive periods are defined as follows. Consider a finite collection of demand states $I = \{1, 2, \dots, L\}$, and let $i_k$ denote the demand state in the $k$th period. We assume that $i_k$, $k \in [n,N]$, with known initial demand state $i_n$, is a Markov chain over $I$ with the transition matrix $P = \{p_{ij}\}$; thus

$$0 \le p_{ij} \le 1, \quad i, j \in I, \qquad \text{and} \qquad \sum_{j=1}^L p_{ij} = 1, \quad i \in I.$$

Let a nonnegative random variable $\xi_{k+1}$ denote the demand in a given period $k$, $k = 0, \dots, N-1$. Demand $\xi_{k+1}$ follows a distribution with the cumulative probability distribution $\Phi_{i,k+1}(x)$ if the demand state $i_k = i$. In the following period, if the state changes to state $j$, which happens with probability $p_{ij}$, then the demand distribution is $\Phi_{j,k+2}$ in that period. We further assume that, for some $\gamma \ge 1$, some $\eta > 0$, and a positive constant $D$,

$$E(\xi_{k+1}^{\gamma+\eta} \mid i_k = i) = \int_0^\infty x^{\gamma+\eta}\, d\Phi_{i,k+1}(x) \le D < \infty, \quad k = 1,\dots,N, \; i \in I, \tag{1}$$

where $E$ denotes the expectation operator; this is not a very restrictive assumption from an applied perspective. We denote by $F_l^k$ the $\sigma$-algebra generated by $i_l, \xi_{l+1}, \dots, i_{k-1}, \xi_k, i_k$, $0 \le l \le k \le N$, and set

$$F^k = F_0^k. \tag{2}$$

Since $i_k$, $k = 1,\dots,N$, is a Markov chain and $\xi_{n+1}$ depends only on $i_n$, we have

$$E(\xi_{n+1} \mid F^n) = E(\xi_{n+1} \mid i_0, i_1, \dots, i_n, \xi_1, \dots, \xi_n) = E(\xi_{n+1} \mid i_n). \tag{3}$$

An admissible decision (ordering quantities) for the problem on the interval $[n,N]$ with initial state $i_n = i$ can be denoted as

$$U = (u_n, \dots, u_{N-1}), \tag{4}$$

where $u_k$ is a nonnegative $F_n^k$-measurable random variable. In simple terms, this means that the decision $u_k$ depends only on the past information. Note that since $i_n$ is known in period $n$, $F_n^n = \{\emptyset, \Omega\}$ is trivial, and hence $u_n$ is deterministic. Moreover, it should be emphasized that this class of admissible decisions is larger than the class of admissible feedback policies. Ordering quantities are decided upon at the beginning of each period. Demand in each period is assumed to occur at the end of the period, after the order has been delivered. Unsatisfied demand is carried forward as backlog. The inventory balance equations are defined by the system

$$y_{k+1} = y_k + u_k - \xi_{k+1}, \quad k = n, \dots, N-1,$$
$$y_n = x, \quad \text{initial inventory level},$$
$$i_k, \; k = n, \dots, N-1, \quad \text{Markov chain with transition matrix } P; \quad i_n = i, \text{ initial state},$$

where $y_k$ is the surplus level at the beginning of period $k$, $u_k$ is the quantity ordered at the beginning of period $k$, $i_k$ is the demand state in period $k$, and $\xi_{k+1}$ is the demand in period $k$. Note that $y_k > 0$ represents an inventory of $y_k$, and $y_k < 0$ represents a backlog (or shortage) of $-y_k$.

Next, we define the various costs involved. For each period $k \in [n,N]$ and demand state $j \in I$:

The production cost function $c_k(j,u) : I \times \mathbb{R}_+ \to \mathbb{R}_+$ is lower semicontinuous, with $c_k(j,0) = 0$, $k = 0,1,\dots,N-1$. (5)

The surplus cost function $f_k(j,x) : I \times \mathbb{R} \to \mathbb{R}_+$ is lower semicontinuous, with $f_k(j,x) \le \bar f(1 + |x|^\gamma)$, $k = 0,1,\dots,N-1$, where $\bar f$ is a nonnegative constant. When $x < 0$, $f_k(j,x)$ is the cost of backlogged sales $x$; and when $x > 0$, $f_k(j,x)$ is the carrying cost of holding inventory $x$ during that period. (6)

The penalty/disposal cost for the terminal surplus, $f_N(j,x) : I \times \mathbb{R} \to \mathbb{R}_+$, is lower semicontinuous with $f_N(j,x) \le \bar f(1 + |x|^\gamma)$. When $x < 0$, $f_N(j,x)$ represents the penalty cost of unsatisfied demand $x$; and when $x > 0$, $f_N(j,x)$ represents the disposal cost of the inventory level $x$. (7)

The objective function to be minimized is the expected value of all the costs incurred during the interval $[n,N]$:

$$J_n(i,x,U) = E\Big\{\sum_{k=n}^{N-1} \alpha^{k-n}\,[c_k(i_k,u_k) + f_k(i_k,y_k)] + \alpha^{N-n} f_N(i_N,y_N)\Big\}, \tag{8}$$

which is always defined, provided we allow this quantity to be infinite. In (8), $\alpha$ denotes the discount factor with $0 < \alpha \le 1$. While introducing $\alpha$ in this finite horizon model does not add to generality, in view of the already time-dependent nature of the cost functions, we do so for convenience of exposition in dealing with the average cost optimality problems later in the paper.
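To make the model dynamics concrete, the following minimal Python sketch simulates the Markov-modulated demand and the inventory balance equations under an arbitrary ordering policy. All numerical values (the transition matrix, the exponential demand distributions, and the base-stock rule used as a placeholder policy) are illustrative assumptions, not data from the paper:

    import numpy as np

    rng = np.random.default_rng(0)
    P = np.array([[0.8, 0.2],        # assumed transition matrix over demand states I = {0, 1}
                  [0.3, 0.7]])
    demand_mean = [1.0, 4.0]         # assumed mean of exponential demand in each state

    def sample_demand(i):
        # xi_{k+1} ~ Phi_i when the current demand state is i (exponential for illustration)
        return rng.exponential(demand_mean[i])

    def simulate(policy, i0, x0, N):
        """Simulate y_{k+1} = y_k + u_k - xi_{k+1} for k = 0,...,N-1."""
        i, y = i0, x0
        path = []
        for k in range(N):
            u = policy(i, y)                 # order quantity, decided before demand occurs
            xi = sample_demand(i)            # demand realized at the end of the period
            path.append((k, i, y, u, xi))
            y = y + u - xi                   # inventory balance; y < 0 is a backlog
            i = rng.choice(len(P), p=P[i])   # Markov transition of the demand state
        return path

    # placeholder policy: order up to 5 units whenever the surplus is at or below 2
    trace = simulate(lambda i, y: 5.0 - y if y <= 2.0 else 0.0, i0=0, x0=0.0, N=10)
    for k, i, y, u, xi in trace:
        print(f"k={k} state={i} surplus={y:6.2f} order={u:5.2f} demand={xi:5.2f}")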

2.2 Dynamic Programming and Existence of Optimal Feedback Policies

Let us first introduce the following definitions:

$B$ = the complete normed linear space of real Borel functions $b : I \times \mathbb{R} \to \mathbb{R}$ of polynomial growth with power $\gamma$ or less. More specifically, if $b \in B$, then $|b(i,x)| \le \|b\|(1+|x|^\gamma)$, where the norm is

$$\|b\| = \max_i \sup_x \frac{|b(i,x)|}{1+|x|^\gamma} < \infty. \tag{9}$$

$\mathcal{L}$ = the subspace of lower semicontinuous functions in $B$; the space $\mathcal{L}$ is closed in $B$.

$\mathcal{L}^-$ = the class of all l.s.c. functions which are of polynomial growth with power $\gamma$ or less on $(-\infty, 0]$.

In view of (3), we can write

$$E[b(i_{n+1}, \xi_{n+1}) \mid F^n] = E[b(i_{n+1}, \xi_{n+1}) \mid i_n], \tag{10}$$

for any $b \in B$. We define on $B$ the operator $F_{n+1}$ as follows:

$$F_{n+1}(b)(i,y) = E[b(i_{n+1}, y - \xi_{n+1}) \mid i_n = i] = \sum_{j=1}^L P(i_{n+1} = j \mid i_n = i)\, E[b(j, y - \xi_{n+1}) \mid i_n = i] = \sum_{j=1}^L p_{ij} \int_0^\infty b(j, y - \xi)\, d\Phi_{i,n+1}(\xi). \tag{11}$$

Lemma 2.1 $F_{n+1}$ is a continuous linear operator from $B$ into $B$.

Proof. It follows immediately from the definition of $F_{n+1}$ that it is a linear operator; the following bound on its norm shows that it is also bounded, hence continuous. By the definition of $\|b\|$ we have $|b(i,x)| \le \|b\|(1+|x|^\gamma)$, and we can conclude

$$\|F_{n+1}\| = \sup_{b\neq 0} \frac{\|F_{n+1}(b)\|}{\|b\|} = \sup_{b\neq 0} \frac{\big\| \sum_{j=1}^L p_{ij} \int_0^\infty b(j, y-\xi)\, d\Phi_{i,n+1}(\xi) \big\|}{\|b\|}$$
$$\le \sup_{b\neq 0} \frac{\big\| \sum_{j=1}^L p_{ij} \int_0^\infty \|b\|\,(1+|y-\xi|^\gamma)\, d\Phi_{i,n+1}(\xi) \big\|}{\|b\|} = \max_i \sup_y \frac{\sum_{j=1}^L p_{ij} \int_0^\infty (1+|y-\xi|^\gamma)\, d\Phi_{i,n+1}(\xi)}{1+|y|^\gamma}$$
$$\le 1 + \max_i \sup_y \frac{\sum_{j=1}^L p_{ij} \int_0^\infty (|y|+|\xi|)^\gamma\, d\Phi_{i,n+1}(\xi)}{1+|y|^\gamma}.$$

Using the inequality $(a+b)^x \le 2^x(|a|^x + |b|^x)$, we obtain

$$\|F_{n+1}\| \le 1 + 2^\gamma \max_i \sup_y \frac{\sum_{j=1}^L p_{ij} \big(|y|^\gamma + \int_0^\infty |\xi|^\gamma\, d\Phi_{i,n+1}(\xi)\big)}{1+|y|^\gamma}$$
$$\le 1 + 2^\gamma \Big(1 + \max_i \sum_{j=1}^L p_{ij} \int_0^\infty |\xi|^\gamma\, d\Phi_{i,n+1}(\xi)\Big) = 1 + 2^\gamma \Big[1 + \max_i \sum_{j=1}^L p_{ij}\, E(|\xi_{n+1}|^\gamma \mid i_n = i)\Big],$$

which is finite on account of Assumption (1). Therefore, $F_{n+1}(b) \in B$. □

In addition to the requirements (5)-(7) on the costs, we also assume that, for $n = 0, 1, \dots, N-1$,

$$c_n(i,u) + F_{n+1}(f_{n+1})(i,u) \to \infty \quad \text{as } u \to \infty. \tag{12}$$

Remark 2.1 Assumption (12) implies that the purchase cost and the expected inventory (or salvage) cost associated with a decision in any given period cannot both be identically zero. The condition rules out the trivial and unrealistic situation in which ordering an infinite amount is optimal.

Let $v_n(i,x)$ represent the optimal value of the expected costs during the interval $[n,N]$ with surplus $x$ and demand state $i$ in period $n$. Then $v_n(i,x)$ satisfies the dynamic programming equations

$$v_n(i,x) = f_n(i,x) + \inf_{u\ge 0}\{c_n(i,u) + \alpha E[v_{n+1}(i_{n+1}, x+u-\xi_{n+1}) \mid i_n = i]\}$$
$$\qquad\;\; = f_n(i,x) + \inf_{u\ge 0}\{c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i, x+u)\}, \quad n = 0,1,\dots,N-1, \tag{13}$$
$$v_N(i,x) = f_N(i,x). \tag{14}$$

We can now state our existence results in the following two verification theorems.

Theorem 2.1 The dynamic programming equations (13) and (14) define a sequence of functions in $\mathcal{L}$. Moreover, there exists a Borel function $\hat u_n(i,x)$ such that the infimum in (13) is attained at $u = \hat u_n(i,x)$ for any $x$. Furthermore, if the functions $f_n(i,\cdot)$, $n = 0,1,\dots,N$, are continuous, then the functions defined by (13) and (14) are continuous.

Proof. We proceed by induction. Assume $v_{n+1}(i,x)$ belongs to $B$. Consider points $x$ such that $|x| \le M$, where $M$ is an arbitrary nonnegative integer. Set

$$B_{n,i}^M = \sup_{|x|\le M} \{\alpha F_{n+1}(v_{n+1})(i,x)\}. \tag{15}$$

Because of (12), we know that the set

$$N_{n,i}^M := \{u : \inf_{|x|\le M}\{c_n(i,u) + \alpha F_{n+1}(f_{n+1})(i,x+u)\} \le B_{n,i}^M\} \tag{16}$$

is bounded, i.e., there is a $u_{n,i}^M$ such that

$$N_{n,i}^M \subseteq [0, u_{n,i}^M]. \tag{17}$$

Because of $v_{n+1} \ge f_{n+1}$, we conclude that

$$\{u : \inf_{|x|\le M}\{c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i,x+u)\} \le B_{n,i}^M\} \subseteq N_{n,i}^M \subseteq [0, u_{n,i}^M], \tag{18}$$

and, therefore, without loss of optimality, we can restrict our attention to $0 \le u \le u_{n,i}^M$ for all $x$ satisfying $|x| \le M$. This is because any $u > u_{n,i}^M$ satisfies

$$c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i,x+u) \ge \inf_{|x|\le M}\{c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i,x+u)\} > B_{n,i}^M$$
$$= \sup_{|x|\le M}\{\alpha F_{n+1}(v_{n+1})(i,x)\} \ge \alpha F_{n+1}(v_{n+1})(i,x) = c_n(i,0) + \alpha F_{n+1}(v_{n+1})(i,x),$$

and thus cannot be an infimum. Since the function $\psi_n(i,x,u) = c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i,x+u)$ is l.s.c. and bounded from below, its minimum over a compact set is attained. Moreover, from the classical selection theorem (see Bensoussan et al., Ref. 3, for example), we know that there exists a Borel function $\hat u_n^M(i,x)$ such that

$$\psi_n(i,x,\hat u_n^M(i,x)) = \inf_{0\le u\le u_{n,i}^M} \psi_n(i,x,u), \quad |x| \le M. \tag{19}$$

Defining

$$\hat u_n(i,x) = \hat u_n^M(i,x) \quad \text{for } M-1 < |x| \le M,$$

we obtain a Borel function such that

$$\psi_n(i,x,\hat u_n(i,x)) = \inf_{u\ge 0} \psi_n(i,x,u), \quad \forall x. \tag{20}$$

Since

$$\inf_{u\ge 0} \psi_n(i,x,u) \le \psi_n(i,x,0) \le c_n(i,0) + \alpha \|F_{n+1}\|\, \|v_{n+1}\|\, (1+|x|^\gamma),$$

we can use (13), (6), and Lemma 2.1 to conclude that $v_n(i,x) \in B$. Furthermore, because $\psi_n(i,\cdot,\cdot)$ is l.s.c., it follows from equation (19) that $v_n(i,\cdot)$ is l.s.c. for each $i$ (e.g., see Aubin, Ref. 12, Section 2.5, Theorem 1), and therefore $v_n \in \mathcal{L}$.

To prove the last part of the theorem, we begin with the fact that, for each $i$, the function $f_n(i,\cdot)$ is continuous, $n = 0,1,\dots,N$. Then $v_N(i,\cdot) = f_N(i,\cdot)$ is continuous for all $i$, and the continuity of $v_n$ can be proved by induction as follows. Assume $v_{n+1}$ to be continuous. From (20) we derive

$$v_n(i,y_0) = f_n(i,y_0) + c_n(i,\hat u(i,y_0)) + \alpha F_{n+1}(v_{n+1})(i, y_0 + \hat u(i,y_0))$$

and, because the infimum in (13) is not necessarily attained at $\hat u(i,y_0)$ for $x = y_1$,

$$v_n(i,y_1) \le f_n(i,y_1) + c_n(i,\hat u(i,y_0)) + \alpha F_{n+1}(v_{n+1})(i, y_1 + \hat u(i,y_0)).$$

Thus,

$$v_n(i,y_1) - v_n(i,y_0) \le f_n(i,y_1) - f_n(i,y_0) + \alpha F_{n+1}(v_{n+1})(i, y_1 + \hat u(i,y_0)) - \alpha F_{n+1}(v_{n+1})(i, y_0 + \hat u(i,y_0)),$$

which, in view of the continuity of $f_n$ and $v_{n+1} \in \mathcal{L}$ along with Assumption (1), yields

$$\limsup_{y_1 \to y_0}\, [v_n(i,y_1) - v_n(i,y_0)] \le 0.$$

On the other hand, we have already proved that $v_n$ is l.s.c., which means

$$\liminf_{y_1 \to y_0}\, [v_n(i,y_1) - v_n(i,y_0)] \ge 0.$$

Therefore, $v_n$ is continuous. □

To solve the problem of minimizing $J_0(i,x,U)$, let us define

$$\hat y_0 = x; \quad \hat u_k = \hat u_k(i_k, \hat y_k), \quad k = 0,\dots,N-1; \quad \hat y_{k+1} = \hat y_k + \hat u_k - \xi_{k+1}, \quad k = 0,\dots,N-1;$$
$$i_k, \; k = 0,\dots,N, \quad \text{Markov chain with transition matrix } P; \quad i_0 = i;$$

where $\hat u_k(i,x)$ is a Borel function for which the infimum in (13) is attained for any $i$ and $x$.

Theorem 2.2 The policy $\hat U = (\hat u_0, \hat u_1, \dots, \hat u_{N-1})$ minimizes $J_0(i,x,U)$ over the class $\mathcal{U}$ of all admissible decisions. Moreover,

$$v_0(i,x) = \min_{U \in \mathcal{U}} J_0(i,x,U). \tag{21}$$

Proof. Let $U = (u_0,\dots,u_{N-1})$ be any admissible decision with the corresponding trajectory $(y_0,\dots,y_N)$. Without loss of generality, we may assume that

$$E c_n(i_n,u_n) < \infty, \quad E f_n(i_n,y_n) < \infty \;\; \forall n, \quad \text{and} \quad E f_N(i_N,y_N) < \infty, \tag{22}$$

or else $J_0(i,x,U) = \infty > v_0(i,x) \in \mathcal{L}$. Next, since $y_n + u_n$ is $F^n$-measurable and $v_n \ge 0$, so that the various expectations of $v_n$ used in what follows are well defined, we can use a result on conditional expectations together with (10) and (11) to obtain

$$E\{v_{n+1}(i_{n+1}, y_{n+1}) \mid F^n\} = E\{v_{n+1}(i_{n+1}, y_n + u_n - \xi_{n+1}) \mid F^n\}$$
$$= E\{v_{n+1}(i_{n+1}, y - \xi_{n+1}) \mid F^n\}\big|_{y = y_n + u_n} = E\{v_{n+1}(i_{n+1}, y - \xi_{n+1}) \mid i_n\}\big|_{y = y_n + u_n}$$
$$= F_{n+1}(v_{n+1})(i_n, y_n + u_n) \quad \text{a.s.} \tag{23}$$

Now using (13), since $U$ is admissible but not necessarily optimal, we can assert that

$$v_n(i_n,y_n) \le f_n(i_n,y_n) + c_n(i_n,u_n) + \alpha F_{n+1}(v_{n+1})(i_n, y_n + u_n) \quad \text{a.s.}$$

Then from the relation (23), we can derive

$$v_n(i_n,y_n) \le f_n(i_n,y_n) + c_n(i_n,u_n) + \alpha E\{v_{n+1}(i_{n+1}, y_{n+1}) \mid F^n\}.$$

Taking the expectation of both sides of the above inequality, we obtain

$$\alpha^n E v_n(i_n,y_n) \le \alpha^n E(f_n(i_n,y_n) + c_n(i_n,u_n)) + \alpha^{n+1} E(v_{n+1}(i_{n+1}, y_{n+1})). \tag{24}$$

Because of $v_N(i_N,y_N) = f_N(i_N,y_N)$ and (22), we can conclude recursively that $E v_n(i_n,y_n) < \infty$ for all $n$, $0 \le n \le N$. Then, by summing (24) from 0 to $N-1$ and canceling terms, we obtain

$$v_0(i,x) \le J_0(i,x,U). \tag{25}$$

Consider now the decision $\hat U$. Using the definition of $\hat u_n(i_n,x)$ as the Borel function for which the infimum in (13) is attained, and proceeding as above, we can obtain

$$\alpha^n E v_n(i_n,\hat y_n) = \alpha^n E(f_n(i_n,\hat y_n) + c_n(i_n,\hat u_n)) + \alpha^{n+1} E(v_{n+1}(i_{n+1}, \hat y_{n+1})).$$

Notice that $\hat y_0 = x$ is deterministic and $v_0(i,x) \in \mathcal{L}$. Therefore $E v_0(i_0,\hat y_0) = v_0(i,x) < \infty$, and we can prove recursively that

$$E c_n(i_n,\hat u_n) < \infty, \quad E f_n(i_n,\hat y_n) < \infty, \quad E v_n(i_n,\hat y_n) < \infty \quad \forall n.$$

Adding up for $n$ from 0 to $N-1$ and canceling terms once again, it follows that

$$v_0(i,x) = J_0(i,x,\hat U).$$

This and the inequality (25) complete the proof. □
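A direct way to exercise the dynamic programming equations (13)-(14) numerically is backward induction on a discretized surplus grid. The sketch below is an illustration only: it replaces the operator $F_{n+1}$ by a Monte Carlo average over sampled demands and restricts orders to a bounded grid, which is in the spirit of the boundedness argument (17)-(18). The costs, grids, and distributions are assumptions:

    import numpy as np

    rng = np.random.default_rng(1)
    P = np.array([[0.8, 0.2], [0.3, 0.7]])
    N, alpha, K, c_lin = 5, 0.95, 3.0, 1.0
    xs = np.linspace(-20.0, 20.0, 81)                   # surplus grid
    us = np.linspace(0.0, 15.0, 31)                     # bounded order grid, cf. (17)-(18)
    xi = [rng.exponential(m, 300) for m in (1.0, 4.0)]  # sampled demands, xi ~ Phi_i

    f = lambda j, x: 0.5 * x if x >= 0 else -2.0 * x    # assumed convex surplus cost
    c = lambda j, u: K + c_lin * u if u > 0 else 0.0    # fixed-plus-linear ordering cost

    v = np.array([[f(j, x) for x in xs] for j in range(2)])    # v_N = f_N, cf. (14)
    for n in range(N - 1, -1, -1):                             # backward induction
        v_new = np.empty_like(v)
        for i in range(2):
            # Monte Carlo version of F_{n+1}(v)(i, y) = sum_j p_ij E[v(j, y - xi)], xi ~ Phi_i
            Fv = lambda y, i=i: sum(P[i, j] * np.interp(y - xi[i], xs, v[j]).mean() for j in range(2))
            for a, x in enumerate(xs):
                v_new[i, a] = f(i, x) + min(c(i, u) + alpha * Fv(x + u) for u in us)   # cf. (13)
        v = v_new

    print("v_0(i=0, x=0) ~", v[0, np.searchsorted(xs, 0.0)])

The index attaining the minimum in the inner loop is a grid approximation of the Borel selector $\hat u_n(i,x)$ of Theorem 2.1.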

2.3 Nonstationary Discounted Infinite Horizon Problem

In this section, we consider an infinite horizon version of the model formulated in Section 2.1. For this purpose, we set $N = \infty$ and keep the same transition matrix $P$. We replace $[n,N]$ by $[n,\infty)$, (4) by an admissible decision

$$U = (u_n, u_{n+1}, \dots), \tag{26}$$

and (8) by the objective function

$$J_n(i,x,U) = \sum_{l=n}^\infty \alpha^{l-n}\, E[c_l(i_l,u_l) + f_l(i_l,y_l)], \tag{27}$$

where $\alpha$ is the given discount factor, $0 < \alpha < 1$. The dynamic programming equations for each $i \in I$ can be written as

$$v_n(i,x) = f_n(i,x) + \inf_{u \ge 0}\{c_n(i,u) + \alpha F_{n+1}(v_{n+1})(i,x+u)\}, \quad n = 0,1,2,\dots. \tag{28}$$

In what follows, we shall show that there exists an $\mathcal{L}$-solution of (28), which is the value function of the infinite horizon problem. Moreover, the decision for which the infimum in (28) is attained is an optimal feedback policy. First, let us examine a finite horizon approximation $J_{n,k}(i,x,U)$ of (27). The approximation is obtained by the first $k$-period truncation of the infinite horizon problem of minimizing $J_n(i,x,U)$, i.e.,

$$J_{n,k}(i,x,U) = E \sum_{l=n}^{n+k-1} \alpha^{l-n}\,[c_l(i_l,u_l) + f_l(i_l,y_l)]. \tag{29}$$

Let $v_{n,k}(i,x)$ be the value function of the truncated problem, i.e.,

$$v_{n,k}(i,x) = \inf_U J_{n,k}(i,x,U). \tag{30}$$

Since the truncated problem is a finite horizon problem defined on the interval $[n, n+k]$, Theorems 2.1 and 2.2 apply. Therefore, its value function can be obtained by solving the corresponding dynamic programming equations:

$$v_{n,k+1}(i,x) = f_n(i,x) + \inf_{u\ge 0}\{c_n(i,u) + \alpha F_{n+1}(v_{n+1,k})(i,x+u)\}, \tag{31}$$
$$v_{n+k,0}(i,x) = 0. \tag{32}$$

Moreover, $v_{n,0}(i,x) = 0$ and

$$v_{n,k}(i,x) = \min_U J_{n,k}(i,x,U).$$

Next, we will show that an a priori upper bound on $\inf_U J_n(i,x,U)$ can be easily constructed. Let us define

$$w_n(i,x) = J_n(i,x,0), \tag{33}$$

where $0 = \{0,0,\dots\}$ denotes the policy of never ordering. Then, noting (5), we have

$$w_n(i,x) = f_n(i,x) + E\Big[\sum_{k=n+1}^\infty \alpha^{k-n} f_k\big(i_k, x - (\xi_{n+1} + \xi_{n+2} + \cdots + \xi_k)\big) \,\Big|\, i_n = i\Big]. \tag{34}$$

Lemma 2.2 $w_n(i,x)$ is well defined and $w_n(i,x) \in \mathcal{L}$.

Proof. On account of $f_n(i,x) \in \mathcal{L}$, it is sufficient to show that

$$E\Big[\sum_{k=n+1}^\infty \alpha^{k-n} f_k\big(i_k, x - (\xi_{n+1} + \cdots + \xi_k)\big) \,\Big|\, i_n = i\Big] \in \mathcal{L}.$$

Assumption (6) yields

$$E\Big[\sum_{k=n+1}^\infty \alpha^{k-n} f_k\big(i_k, x - (\xi_{n+1} + \cdots + \xi_k)\big) \,\Big|\, i_n = i\Big]$$
$$\le \bar f \sum_{k=n+1}^\infty \alpha^{k-n}\big(1 + E[|x - (\xi_{n+1} + \cdots + \xi_k)|^\gamma \mid i_n = i]\big)$$
$$= \frac{\bar f\,\alpha}{1-\alpha} + \bar f \sum_{k=n+1}^\infty \alpha^{k-n}\, E[|x - (\xi_{n+1} + \cdots + \xi_k)|^\gamma \mid i_n = i].$$

Note that it follows from Assumption (1) that $E(\xi_{k+1}^\gamma \mid i_k = i) \le D + 1$ for all $k$ and $i$. Let $M_x = \max\{|x|^\gamma, D+1\}$. Now let us consider the argument of the sum for a fixed $k \ge n+1$. Let

$$\Pi_k := \{(i_n,\dots,i_k) : i_n,\dots,i_k \in I \text{ and } i_n = i\}$$

be the set of all combinations of demand states in periods $n$ through $k$ for the given initial state $i_n = i$. Note that, for a given sequence of demand states, the one-period demands are independent. Then we have

$$E[|x - (\xi_{n+1} + \cdots + \xi_k)|^\gamma \mid i_n = i]$$
$$= \sum_{\pi \in \Pi_k} E[|x - (\xi_{n+1} + \cdots + \xi_k)|^\gamma \mid (i_n,\dots,i_k) = \pi]\, P((i_n,\dots,i_k) = \pi)$$
$$\le \sum_{\pi \in \Pi_k} (k-n+1)^\gamma\, E[|x|^\gamma + |\xi_{n+1}|^\gamma + \cdots + |\xi_k|^\gamma \mid (i_n,\dots,i_k) = \pi]\, P((i_n,\dots,i_k) = \pi)$$
$$\le \sum_{\pi \in \Pi_k} (k-n+1)^\gamma (k-n+1)\, M_x\, P((i_n,\dots,i_k) = \pi) = (k-n+1)^{\gamma+1} M_x.$$

This, in view of Assumption (6), gives

$$E\Big[\sum_{k=n+1}^\infty \alpha^{k-n} f_k\big(i_k, x - (\xi_{n+1} + \cdots + \xi_k)\big) \,\Big|\, i_n = i\Big] \le \frac{\bar f\,\alpha}{1-\alpha} + \bar f M_x \sum_{k=n+1}^\infty \alpha^{k-n} (k-n+1)^{\gamma+1} < \infty.$$

Therefore, $w_n(i,x) < \infty$. Furthermore, as a sum of l.s.c. functions, it is also l.s.c. Moreover, because $M_x = |x|^\gamma$ for $|x|^\gamma \ge D+1$, $w_n(i,x)$ is at most of polynomial growth with power $\gamma$. We thus have $w_n(i,x) \in \mathcal{L}$. □

Now we can state the following result for the infinite horizon problem.

(35)

and

vn;k " vn; a solution of (28) in B : (36) Furthermore, vn 2 L and we can obtain U^ = fu^n; u^n+1; : : :g for which the in mum in (28) is attained. Moreover, U^ is an optimal feedback policy, i.e., vn (i; x) = min J (i; x; U ) = Jn (i; x; U^ ): U n

(37)

Proof. By de nition, vn;0 = 0. Let U~n;k = fu~n ; u~n+1; : : :; u~n+k g be a minimizer of (29). Thus, wn (i; x) = Jn(i; x; 0)  Jn;k (i; x; 0)  vn;k (i; x) = Jn;k (i; x; U~n;k )  Jn;k?1 (i; x; U~n;k?1)  min J (i; x; U ) = vn;k?1(i; x): U n;k?1 This proves (35). Moreover, by the monotone convergence theorem, there is a function vn(i; x) such that vn;k (i; x) " vn(i; x)  wn (i; x): (38) Next, we shall show that the functions vn satisfy the dynamic programming equations de ned in (28). Observe from (31) and (35) that for each k, we have

vn;k (i; x)  fn (i; x) + uinf fc (i; u) + Fn+1 (vn+1;k )(i; x + u)g: 0 n Thus, in view of (38), we obtain

vn(i; x)  fn (i; x) + uinf fc (i; u) + Fn+1 (vn+1)(i; x + u)g: 0 n

(39)

Let the in mum in (31) be attained at u^n;k . In order to obtain the reverse inequality, we rst prove that u^n;k (i; x) is uniformly bounded with respect to k. In the proof of Theorem 2.1 we showed that for jxj  M , there is an uMn;i such that u^n;N ?n(i; x)  uMn;i. Furthermore, if we replace vn+1 by wn+1 in (15) and follow the same line of arguments as in the proof of Theorem 2.1, we obtain an upper bound uMn;i , which does not depend on the horizon N . Therefore, we can conclude that

un(i; x) := uMn;i is an upper bound for u^n;k (i; x) if M ? 1 < jxj  M; independent of k. 13

(40)

Let k < l, so that we can deduce from (31) and (32):

vn;l+1(i; x) = fn (i; x) + cn(i; u^n;l(x)) + Fn+1(vn+1;l)(i; x + u^n;l(x))  fn (i; x) + cn(i; u^n;l(x)) + Fn+1(vn+1;k )(i; x + u^n;l(x)):

(41)

Fix k and let l ! 1. In view of (40), we can choose a sequence of periods l0 such that

u^n;l0 (i; x) ! u~n(i; x) for l0 ! 1:

(42)

Then we conclude from (41) and Fatou's Lemma that

vn(i; x)  fn(i; x) + l0lim c (i; u^n;l0 (i; x)) + lim inf F (v )(i; x + u^n;l0 (i; x)); !1 n l0 !1 n+1 n+1;k Z1 L X inf v (i; x + u^n;l0 (i; x) ? ))di;n+1 ():  fn(i; x) + l0lim c (i; u^n;l0 (i; x)) + pij (lim !1 n l0 !1 n+1;k j =1

0

Since vn+1;k and cn are lower semicontinuous and in view of (42), we can pass to the limit in the argument of these functions to obtain

vn(i; x)  fn (i; x) + cn (i; u~n(i; x)) + Fn+1 (vn+1;k )(i; x + u~n (i; x));  fn (i; x) + uinf fc (i; u) + Fn+1 (vn+1;k )(i; x + u)g: 0 n This, along with (39) and (38), proves (36). Furthermore, for any convergent sequence fyj g1j=1, we have lim inf v (i; yj )  lim inf v (i; yj )  vn;k (i; jlim y ): j !1 n j !1 n;k !1 j Letting k go to in nity gives

lim inf v (i; yj )  vn(i; jlim y ); j !1 n !1 j which proves that vn is l.s.c. Also, since it is bounded by a function wn of polynomial growth, we have vn 2 L . Because wn  vn  fn, it is clear that un (i; x) de ned in (40) is also an upper bound for the minimizer in (28). Therefore, there exists a Borel map u^n(i; x) such that

cn(i; u^n(i; x)) + Fn+1 (vn+1)(i; x + u^n(i; x)) = uinf fc (i; u) + Fn+1 (vn+1)(i; x + u)g: 0 n Summing from n to N ? 1 and canceling terms as in the proof of Theorem 2.2 yields NX ?1 vn (i; x) = E [ k?n (ck (ik ; u^k ) + fk (ik ; y^k ))] + N ?n E [vN (iN ; y^N )]: k=n

Letting N ! 1, we conclude

vn(i; x)  Jn(i; x; U^ ): 14

(43)

From Theorem 2.2 we know that vn;k (i; x)  Jn;k (i; x; U ) for any policy U , and we let k ! 1 to obtain vn(i; x)  Jn(i; x; U ) for any admissible U . (44) Inequalities (43) and (44) together imply vn (i; x) = Jn (i; x; U^ ) = min Jn (i; x; U ); U

which completes the proof.

2

Before we prove the optimality of an (s; S ) policy for the nonstationary nite and in nite horizon problem, we should note that Theorem 2.3 does not imply uniqueness of the solution to the dynamic programming equations (31) and (32). There may well be other solutions. Moreover, one can show that the value function is the minimal positive solution of (31) and (32). It is also possible to obtain a uniqueness proof under additional assumptions. For our purpose, however, it is sucient to have the results of Theorem 2.3.

2.4 Optimality of ( ) Ordering Policies s; S

The existence and optimality of a feedback (or Markov) policy u^n(i; x) was proved in Theorems 2.1 and 2.2. We now make additional assumptions to further characterize the optimal feedback policy. Let us assume that in any demand state i,

fn (i; x) is convex with 8 respect to x; n = 0; 1; : : : ; N; < u = 0; cn (i; u) = : 0; i i Kn + cn  u; u > 0; where n = 0; 1; : : : ; N ? 1, cin  0 and Kni  0, and L X Kni  K ni +1  pij Knj +1 ; n = 0; 1; : : : ; N: j =1

(45) (46)

(47)

It should be noted that (45) implies fn(i; ), n = 0; 1; : : : ; N , to be continuous functions on R.

Remark 2.2 Assumptions (45){(47) re ect the usual structure of costs to prove optimality of an (s; S )-type policy.

Theorem 2.4 Let N be nite. Assume (1), (5)-(7), (12), and (45)-(47). Then there exists a sequence of numbers sin ; Sni ; n = 0; : : : ; N ? 1; i = 1; : : : ; L; with sin  Sni , such that the optimal

feedback policy is

8 i i < u^n(i; x) = : Sn ? x; x  sin : 0; x > sn 15

(48)

Proof. The dynamic programming equations (13) and (14) can be written as follows: vn(i; x) = fn(i; x) ? cin x + hn (i; x); 0  n  N ? 1; i = 1; : : : ; L; vN (i; x) = fN (i; x);

i = 1; : : : ; L;

where

hn (i; x) = yinf [K i 1 + zn(i; y)] x n y>x zn(i; y) = ciny + Fn+1(vn+1)(i; y):

(49) (50)

From (13), we have vn(i; x)  fn (i; x). This inequality along with (12) ensures for n = 1; 2; : : : ; N ?1 and i = 1; 2; : : : ; L that zn(i; x) ! +1 as x ! 1: (51) Furthermore, from the last part of Theorem 2.1, it follows that vn+1 is continuous, and therefore zn(i; x) is continuous. In order to obtain (48), we need to prove that zn(i; x) is Kni -convex. This is done by induction. First, vN (i; x) is convex by de nition and, therefore, K -convex for any K  0. Let us now assume that for a given n  N ? 1 and i, vn+1(i; x) is Kni +1 -convex. By Assumption (47), it is easy to see that zn(i; x) is K ni +1 -convex, hence also Kni -convex. Then, Proposition 4.2 from Sethi and Cheng (Ref. 7) implies that hn (i; x) is Kni -convex. Therefore, vn(i; x) is Kni -convex. This completes the induction argument. Thus, it follows that zn (i; x) is Kni -convex for each n and i. In view of (51), we apply Proposition 4.2 of Sethi and Cheng (Ref. 7) to obtain the desired sin and Sni . According to Theorem 2.2 and the continuity of zn, the feedback policy de ned in (48) is optimal. 2

Theorem 2.5 Assume (1), (5)-(7), (12), and (45){(47) to hold for the in nite horizon problem. Then, there exists a sequence of numbers sin ; Sni ; n = 0; 1; : : : ; with sin  Sni for each i 2 I , such that the feedback policy is optimal.

8 i i < u^n(i; x) = : Sn ? x; x < sni 0; x  sn

(52)

Proof. Let vn denote the value function. De ne the functions zn and hn as above. We know that zn(i; x) ! 1 as x ! +1 and zn(i; x) 2 L for all n and i = 1; 2; : : : ; L.

We now prove that vn is Kn -convex. Using the same induction as in the proof of Theorem 2.5, we can show that vn;k (i; x) de ned in (30) is Kni -convex. This induction is possible since we know that vn;k (i; x) satis es the dynamic programming equations (31) and (32). It is clear from the de nition of K -convexity and from taking the limit as k ! 1 that the value function vn(i; x) is also Kni -convex. 16

From Theorem 2.3 we know that that vn satis es the dynamic programming equations (31) and (32). Therefore, we can obtain an optimal feedback policy U^ = fu^n; u^n+1 ; : : :g for which the in mum in (31) is attained. Because zn is Kni -convex and l.s.c., u^n can be expressed as in (52). 2

Remark 2.3 It is important to emphasize the di erence between the (s; S ) policies de ned in (48)

and (52). In (48), an order is placed when the inventory level is s or below, whereas in (52) an order is placed only when the inventory is strictly below s. Most of the literature uses the policy type (48). While (48) in Theorem 2.4 can be replaced by (52) on account of the continuity of zn , it is not possible to replace (52) in Theorem 2.5 by (48), since zn is proved only to be lower semicontinuous.

Remark 2.4 In the stationary in nite horizon discounted cost case discussed in the next section,

we are able to prove (see Lemma 3.4) that the value function is locally Lipschitz, and therefore continuous. Thus, in this case policies of both types (52) and (48) are optimal.

2.5 Stationary In nite Horizon Discounted Cost Model If the cost functions as well as the distributions of the demands do not explicitly depend on time, i.e., for each k ck (j; u) = c(j; u); fk (j; x) = f (j; x); and j;k+1 = j ; then it can be shown easily that the value function vn(i; x) does not depend on n. In what follows, we will denote the value function of the stationary discounted cost problem by v (; ) to emphasize the dependence on the discount factor. In the same manner as in Section 2.3, it can be proved that the function v satis es the dynamic programming equation

v (i; x) = f (i; x) + uinf fc(i; u) + F (v )(i; x + u)g; 0

(53)

where F is the same as Fn+1 de ned in (11), i.e., Z1 L X F (b)(i; y) = pij 0 b(j; y ? )di (); j =1

for b 2 B . Furthermore, for any , 0 < < 1, there is a stationary optimal feedback policy U = (u (i; x); u (i; x); : : :), where u (i; x) is the minimizer on the right-hand side of (53). Moreover, if the cost functions also satisfy the conditions (45){(47) introduced in Section 2.4, then we can obtain pairs (si ; S i ) such that either of the (si ; S i )-policies of types (52) and (48) is optimal; see Remark 2.4. 17

3 Long-Run Average Cost Problem With the results for the discounted problem in hand, we can now begin the analysis of the average cost problem. In Section 3.1, we provide a precise formulation of the problem. In Section 3.2 we derive relevant properties of the discounted-cost value function that are uniform in and examine the asymptotic behavior of the di erential discounted value function as the discount factor approaches one. In Section 3.3, we employ the vanishing discount approach to establish the average-cost optimality equation. The associated veri cation theorem is proved in Section 3.4, and the theorem is used to show that there exists an optimal policy characterized by parameters (si; S i), i 2 I . Moreover, these parameters can be obtained by taking the limit of a converging subsequence of (si ; S i ) as ! 1

3.1 Formulation of the Problem In what follows we show the existence of an optimal (si; S i)-policy for the stationary long-run average cost problem. To establish this result we have to make assumptions in addition to those already made in Section 2. For the convenience of the reader, we list all the required assumptions here. (A1) The production cost is given by c(i; u) = K + ciu for u > 0 and c(i; 0) = 0, where the xed ordering cost K  0 and the variable cost ci  0. (A2) For each i, the nonnegative surplus cost function f (i; ) is convex and at most of polynomial growth with power , i.e., f (i; x)  f(1 + jxj ) for some f > 0. (A3) There is a state l 2 I such that f (l; x) is not identically zero for x  0. (A4) There is a state g 2 I such that f (g; x) is not identically zero for x  0. (A5) The production and inventory costs satisfy Z1 L X cix + pij f (j; x ? z)di (z) ! 1 as x ! 1: j =1

0

(A6) The Markov chain (ik )1 k=0 is irreducible. (A7) There is a state h 2 I such that 1 ? h(") =  > 0 for some " > 0. (A8) E (k+1 ) +  D for some  > 0.

Remark 3.1 Assumptions (A1), (A2), (A5) and (A8) are the same as the ones needed in Sections 2.4 and 2.5 to establish the optimality of a stationary (si ; S i )-strategy in the stationary discounted in nite horizon problem; see Remarks 2.1 and 2.2 18

Remark 3.2 Assumptions (A3) and (A4) rule out trivial cases which lead to degenerate optimal

policies. In fact, if (A3) is violated, the optimal policy is never to order. If (A3) holds and (A4) is violated, then it is optimal to wait for a period with a demand state for which ci is minimal and order an in nite amount in that period. Conditions (A4) and (A5) generalize the usual assumption made by Scarf (Ref. 2) and others that the unit inventory holding cost h > 0.

Remark 3.3 Assumptions (A6) and (A7) are needed in order to guarantee that the exogenous

demand depletes any given initial inventory in a nite expected time. While (A7) says that in at least one state h the expected demand is strictly larger than zero, (A6) implies that the state h would occur in nitely often with a nite expected interval between successive occurrences. The objective is to minimize the expected long-run average cost (NX ) ?1 1 J (i; x; U ) = lim sup N E [c(ik ; uk ) + f (ik ; xk )] ; N !1 k=0

(54)

with i0 = i and x0 = x, where as before U = (u0; u1; : : :); ui  0; i = 0; 1; : : :, is a history-dependent or nonanticipative decision (order quantities) for the problem. Such a control U is termed admissible. Let U denote the class of all admissible controls. The surplus balance equations are given by

xk+1 = xk + uk ? k+1; k = 0; 1; : : : : Our aim is to show that (i) there exists a constant  termed the optimal average cost, which is independent of the initial i and x, (ii) a control U  2 U such that

 = J (i; x; U )  J (i; x; U ); for all U 2 U ; and (iii)

(NX ) ?1 1   lim E [c(ik ; uk ) + f (ik ; xk)] ; N !1 N k=0 where xk ; k = 0; 1; : : : , is the surplus process corresponding to U  with i0 = i and x0 = x. To prove these results we will use the vanishing discount approach. That is, by letting the discount factor in the discounted cost problem approach one, we will show that we can derive a dynamic programming equation whose solution provides an average-optimal control and the associated minimum average cost .  =

3.2 Properties of the Discounted Value Function As a rst step in the vanishing discount approach, we will establish a uniform lower bound for the largest of the optimal ordering levels si . We will do so by comparing the cost of two policies. The rst one is an (si ; S i )-policy with si less than a negative real number ?V . In the proof of Lemma 3.3 19

we will give a modi ed policy and prove that there is a value V  for V , that is independent of the values of close to one, such that this modi ed policy produces lower cost than the original policy. This will prove that any policy with si < ?V  for all i and in the neighborhood of one cannot be optimal. To do so we need to prove some preliminary results. For convenience of exposition, let us de ne x00 = 0 and x0n as the surplus level at the beginning of period n corresponding to the policy of not ever ordering (i.e. U = 0). Thus,

x0n = ?(1 + 2 + : : : + n ): Let

(55)

V := inf fn : x0n  ?V g = inf fn : 1 + 2 + : : : + n  V g:

(56) !  k P P 1 Lemma 3.1 For V ! 1 and any given real number d, the expression V E k=0 f (ik ; ?t=1 t) is bounded from below by a function d(V ) which approaches d as V ! 1. V

Proof. Fix the initial state i. We de ne recursively the sequence (tjk)1k=0 of periods, in which the

demand state is j , as follows:

tj0 = 0; tjk+1 = inf fn > tjk : in = j g;

k = 0; 1; : : :

(57)

By convention we de ne tj1 = 1. We refer to the interval containing the periods tjk through tjk+1 ? 1 as the kth cycle, k = 0; 1; : : :. The kth cycle demand and surplus cost, respectively, are

Xkj := x0t +1 ? x0t = t +1 + : : : + t +1 ; j k

j k

j k

j k

Ykj :=

tjkX +1 ?1 n=t

j k

f (in ; x0n):

(58)

Because of the Markov property of the demand process and the fact that it = j for all k  1, we know that the random variables Xkj are independent for k  0 and identically distributed for k = 1; 2 : : :. Furthermore, it is easy to prove that EX0j < 1 and EX1j < 1, and thus EXkj < 1, k = 0; 1; : : :. Therefore, fXkJ ; k = 0; 1; : : :g is a delayed (or general) renewal process (see, e.g., Ross, Ref. 13). Let j k

TVj

:= inf fn : x0tjn

 ?V g = inf fn :

t X j n

t=0

t  V g = inf fn :

nX ?1 k=0

Xk  V g

be the index of the rst cycle for which the beginning shortage corresponding to the no-ordering policy is larger or equal to V . Thus, tjT is the index of the rst period in which the demand state is j and the surplus level at the beginning of the period in less than or equal to ?V . On the other j V

20

hand, V is the the index of the rst period in which the starting surplus is less than or equal to ?V , regardless of the demand state. Therefore, it is clear that

tjT

j V

?1 < V

 tjT :

(59)

j V

Note also that TVj is nondecreasing in V . Now consider the case where the demand state j = l, the special state de ned in Assumption (A3). Because of Assumptions (A2) and (A3), we have limx!1 f (l; ?x) = 1. Therefore, we can choose a constant Vd, 0  Vd < 1, such that f (l; ?x)  E (X1l )d for all ?x  ?Vd. Then from (55), (58), and (59), we have  X V

k=0

k X

tT l ?1

t=1

k=0 l TX V ?2

f (ik ; ? t) 



X V

k=0

f (ik ; x0k ) Ykl 

l TX V ?2

k=TVl d

Ykl



l TX V ?2

k=TVl d

E (X1)d  (TVl ? TVl ? 1)E (X1l )d; d

for any V  Vd. Clearly E (TVl ) < 1, and the key renewal theorem (for a delayed renewal process; see Ross, Ref. 13) yields E (TVl ) = 1 : lim V !1 V E (X l ) d

Finally with

d(V ) := E (TVl

? TVl d ? 1)E (X1l )d=V ,

1

we obtain

!  k X 1E X V k=0 f (ik ; ? t=1 t)  d(V ) and V

lim  (V ) = d: V !1 d

2

This completes the proof.

  Lemma 3.2 For V ! 1, the expression V1 E kP=0(c(ik ; k ) + f (ik ; ?k )) is bounded from above by a function (V ) which tends to c < 1 as V ! 1. V

Proof. Let Xkj and TVj be de ned as in the proof of Lemma 3.1. In what follows we set j := i0 = i,

the initial state and omit the superscript. With this, the 0th cycle also begins with its rst period in demand state i. Let tX +1 Zk := (c(in; n) + f (in ; ?n)) k

n=tk +1

represent a 'reward' corresponding to the kth cycle. Then (Xk ; Zk )1k=0 is a renewal reward process. The stationarity of the Markovian demands implies that the pairs (Xk ; Zk ) are independent and identically distributed for all k including k = 0. Furthermore, we already know E (X0) > 0, and because of Assumption (A1), (A2) and (A8), we have E (Z0) < 1. 21

Now we have 0t 1 !  X 1E X 1 @ A V k=0(c(ik ; k ) + f (ik ; ?k ))  V E k=0(c(ik; k ) + f (ik ; ?k )) 0T ?1 1 0T 1 X X 1 1  V E @ Zk A  V E @ Zk A : (60) k=0 k=0 Setting the right-hand side of (60) equal to (V ), we conclude with Proposition 4.1 of Ross (Ref. 14) that 0T 1 1 @X Zk A = E (Z0) := c < 1: lim   ( V ) = lim E V !1 V !1 V E (X ) V

V

V

V

V

0

k=0

This completes the proof.

2

Lemma 3.3 Under Assumptions (A1)-(A3) and (A5)-(A8), there are real numbers 0 < 0 < 1 and C2 > ?1 such that maxi2I fsi g  C2 for all 2 [ 0; 1). Proof. For any xed discount factor 0 < < 1, let U be an optimal strategy with parameters (si ; S i ). Let us x a positive real number V and assume si < ?V for each i 2 I . In what follows, we specify a value of V , namely V , in terms of which we shall construct an alternative strategy that is better than U at least when the initial surplus x = 0. Consider the alternate policy U~ = (~u0; u~1; : : :) de ned as follows: u~0 = 0,

u~k = k , k = 1; 2; : : : ; V ? 1, u~k = [xk + uk ? x~k ]+, k  V , where V is de ned in (56). We investigate the strategy U~ with the initial surplus x~0 = 0. Let x~k denote the inventory level in period k resulting from U~ . With the initial inventory level x0 = x~0 = 0, we know that under the optimal policy U , no order is placed in or before the period V since the corresponding surplus k xk > ?V > si , k = 0; 1; : : : ; V ? 1. It is easy to see that x~k = ?k  xk = ? P t, k  V ? 1. For t=1 k  V , we have the following situation: As long as the inventory level after ordering for U is less than the inventory level before ordering for U~ , the strategy U~ orders nothing, As soon as xk + uk is greater than or equal to x~k , we order up to the level xk + uk , and the inventory level is the same for both strategies from then on. Obviously, the expected cost corresponding to U from V on is not smaller than the one corresponding to U~ from V on. Therefore, ! 1 X k ~ v (i; 0) ? J (i; 0; U ) = E (c(ik ; uk ) + f (ik ; xk ) ? c(ik ; u~k ) ? f (ik ; x~k )) k=0 ! X ?1 k X k  E (f (ik ; ? t) ? c(ik; k ) ? f (ik ; ?k )) : (61) V

t=1

k=0

22

Let

g(V; ) := E

X V ?1 k=0

!

k X

k ; ? t ) ? c(ik ; k ) ? f (ik ; ?k )) t=1 there is a V  > 0 such that

k (f (i

:

It follows from Lemmas 3.2 and 3.1 that 0  ?1 1 0  ?1 1 k X X X g(V ; 1) = E @ f (ik ; ? t)A ? E @ (c(ik; k ) + f (ik ; ?k ))A > 0 V

V

t=1

k=0

k=0

Because lim !1 g(V ; ) = g(V ; 1), it follows that there is an 0 < 1 such that for all  0 we have g(V ; ) > 0. Therefore,

v (i; 0) ? J (i; 0; U~ ) > 0; 1 >  0; for the policy U~ de ned by V = V . This leads to a contradiction that the policy U is optimal, which helps us to conclude that any strategy with maxfsi g < C2 = ?V  cannot be optimal. 2

Lemma 3.4 Under Assumptions (A1)-(A3) and (A5)-(A8) and for 0 obtained in Lemma 3.3, v (i; ) is locally equi-Lipschitzian for  0, i.e., for X > 0 there is a positive constant C5 < 1,

independent of , such that

jv (i; x) ? v (i; x~)j  C5jx ? x~j for all x; x~ 2 [?X; X ];  0:

(62)

Proof. Note that from the convexity and polynomial growth properties of f (i; ), it follows that f (i; ) is locally Lipschitz continuous. Knowing this and given Lemma 3.3, the proof is exactly the

same as for the corresponding result in Beyer and Sethi (Ref. 8). Of course, in this paper Lemma 3.3 is proved under our more general assumptions than those of Beyer and Sethi (Ref. 8). 2

3.3 Vanishing Discount Approach

Theorem 3.1 Under Assumptions (A1)-(A3) and (A5)-(A8) the following results hold:

(i) For 0 obtained in Lemma 3.3, the di erential discounted value function w (i; x) := v (i; x) ? v (1; 0) is uniformly bounded with respect to ;  0 for each xed x and i and (1 ? )v (1; 0) is uniformly bounded on 0 < < 1.  (ii) There exist a sequence ( k )1 k=1 converging to one, a constant  , and a locally Lipschitz continuous function w(; ), such that (1 ? k )v (i; x) !  and w (i; x) ! w(i; x); k

k

locally uniformly in x and i as k goes to in nity. (iii) w(i; ) is K -convex. 23

(63)

Proof. The proof of (i) is essentially the same as the proofs of Lemmas 4.1 and 4.2 in Beyer and

Sethi (Ref. 8). The only di erences are that M in Beyer and Sethi is replaced by in nity and the argument there that the demand is bounded is replaced by Assumptions (A1), (A2) and (A8). The proof of (ii) is exactly the same as the one of the rst part of Theorem 4.1 in Beyer and Sethi (Ref. 8). The proof requires part (i) of Theorem 3.1, Lemma 3.4, and the Arzela-Ascoli Theorem. The proof of (iii) follows from the fact that w as a limit of K -convex functions is K -convex. 2

Lemma 3.5 There are constants 2 2 [0; 1) and C8 > 0 such that for all  2 we have S i  C8 < 1 for any i for which si > ?1. Proof. Let us x the initial state i0 = i for which si > ?1. Fix 2 > 0 and a discount factor  2. Let U = (u(i0; x0); u(i1; x1); : : :) be an optimal strategy with parameters (sj ; S j ), j 2 I . Let us x a positive real number V and assume S i > V . In what follows, we specify a value of V , namely V , in terms of which we shall construct an alternative strategy U~ that is better than U . For the demand state g speci ed in Assumption (A4), let

 g := inf fn > 0 : in = gg be the rst period (not counting the period 0) with the demand state g. Furthermore, let d be the state with the lowest per unit ordering cost, i.e., cd  ci for all i 2 I . Then we de ne

 := inf fn   g : in = dg: Assume x0 = x~0 = x := minf0; si g and consider the policy U~ de ned as under: u~0 = ?x  0

u~k = 0, k = 1; 2; : : : ;  ? 1 u~ = x + u(i ; x ) ? x~  0 u~k = u(ik; xk ), k   + 1. The two policies and the resulting trajectories di er only in the periods 0 through  . Therefore, we have v (i; x) ? J (i; x; U~ ) = J (i; x; U ) ? J (i; x; U~ ) !  X k = E (f (ik ; xk ) ? f (ik ; x~k ) + c(ik ; uk ) ? c(ik ; u~k )) k=0 !  k X X k = E (f (ik ; xk ) ? f (ik ; ? k )) t=1 k=1 !  X k +E c(ik ; uk ) ? E (  c(i ; u~ )) ? c(i; ?x): (64) k=0

24

After ordering in period  , the total accumulated ordered amount up to period  is the same for both the policies, because x0 = x~0 and x +1 = x~ +1. Observe that the policy U~ orders only in the periods 0 and  . In the period 0, u~0  u0 so that U~ orders no more than U at the unit cost ci. The second order of the policy U~ is executed at the lowest possible per unit cost cd in the period  which is not earlier than any of the ordering periods of policy U . Because U~ orders twice and U orders at least once in periods 0; 1; : : : ;  , the total xed ordering cost of U~ exceeds the total xed ordering cost of U by at most an amount K . Thus, !  X k K +E c(ik ; uk )  c(i; ?x) + E (  c(i ; u~ )): k=0

Furthermore, it follows from Assumptions (A2) and (A6){(A8) that !  k X X f (ik ; ? t) < 1: E t=1

k=1

Because    g , we obtain

E and because si  V , we obtain

 X k=1

!

k f (i

k ; xk )

E

xk  V ?

 X g

k=1

k X t=1

! k ; xk ) ;

k f (i

t:

Irreducibility of the the Markov chain (in)1n=0 implies existence of an integer m, 0  m  L, such that P (im = g) > 0. Let m0 be the smallest such m. It follows that  g  m0 and, therefore, !  X E k f (ik ; xk )  m0 E (f (im0 ; xm0 )) k=1 m0 X  m2 0 E (f (g; V ? t)jim0 = g)P (im0 = g); for all  2. (65) g

t=1

Using Assumptions (A2), (A4), and (A8), it is easy to show that the right-hand side of (65) tends to in nity as V goes to in nity. Therefore, we can choose V , 0  V  < 1, such that for all  2 !  k X X k ~ v (i; x) ? J (i; x; U )  E (f (ik ; xk ) ? f (ik ; ? k )) ? K > 0: t=1

k=1

Thus for  2, a policy with S i > C8 := V  cannot be optimal.

2

Lemma 3.6 There is a positive integer N0 such that for k  N0, functions w (i; ) on (?1; 0] are k

uniformly bounded from above by a function of polynomial growth with power . 25

Proof. Choose N1 such that k > 2 for k  N1 with 2 de ned in Lemma 3.5. Substituting v (i; x) = w (i; x) + v (1; 0) in (53) yields k

k

k

w (i; x) + (1 ? k )v (1; 0) = f (i; x) + uinf fc(i; u) + k F (w )(i; x + u)g: 0 k

k

k

(66)

First, let x  C8, where C8 is de ned in Lemma 3.5. Then it follows from the structure of the optimal policy and Lemma 3.5 that the in mum is obtained for u = 0, and we nd

w (i; x) + (1 ? k )v (1; 0) = f (i; x) + k F (w )(i; x): k

k

k

Because limk!1 (1 ? k )v (1; 0) =  and limk!1 w (i; x) = w(i; x), it follows that k

k

lim F (w k )(i; x) = w k!1 k

 (i; x) +  ? f (i; x):

Now let x < C8 and set u = C8 ? x > 0. Then, because the in mum is not necessarily attained at u, we have

w (i; x) + (1 ? k )v (1; 0)  f (i; x) + c(i; C8 ? x) + k F (w )(i; C8): k

k

k

Choose N0, N0  N1, such that for all k  N0

j(1 ? k )v (1; 0) ? j  " and j k F (w )(i; C8) ? w(i; C8) +  ? f (i; C8)j  ": k

k

Then for k  N0, we obtain

w (i; x)  f (i; x) + c(i; C8 ? x) + w(i; C8) +  ? f (i; C8) + 2"; k

which provides a uniform upper bound of growth rate for x  0.

2

Lemma 3.7 For 0 obtained in Lemma 3.5, there is a nite constant C9 such that for all , 0  < 1, 8 < w (i; x)  : C9 ? ci x; x > 0; C; x  0: 9

Proof. First assume x > 0. We know that there is an optimal stationary feedback policy U =

(u(i0; x0); u(i1; x1); : : :). Consider the modi ed policy U~ = (u(i0; x0) + x; u(i1; x1); u(i2; x2); : : :). If we use U with the initial inventory x0 = x and U~ with the initial inventory x~0 = 0, then under both policies the trajectories di er only in the initial period. Therefore, we have

w (i; x) = v (i; x) ? v (i; 0) + w (i; 0)  v (i; x) ? J (i; 0; U~ ) + w (i; 0) = J (i; x; U ) ? J (i; 0; U~ ) + w (i; 0) = c(i; u) ? c(i; u~) + f (i; x) ? f (i; 0) + w (i; 0)  ?K ? ci x + w (i; 0): (67) 26

By Lemma 3.5, w (i; 0) is uniformly bounded for  0. Therefore, there is a constant C9 such that w (i; x)  C9 ? cix for  0 and x > 0: If x  0, it follows from Lemma 3.4 in Beyer and Sethi (Ref. 8) that w (i; x) is decreasing in x. Then by using the same arguments concerning the uniform boundedness of w (i; 0) as above, we obtain w (i; x)  w (i; 0)  C9;  0 and x  0: This completes the proof. 2

Theorem 3.2 The limit $(\lambda^*, w)$ obtained in Theorem 3.1 satisfies the average cost optimality equation

$$w(i,x) + \lambda^* = f(i,x) + \inf_{u\ge0}\{c(i,u) + F(w)(i,x+u)\}. \tag{68}$$

Furthermore, $w \in \mathcal{L}^-$. Moreover, let $U_\alpha$ be an optimal policy for the discounted cost problem for which the infimum in (53) is attained. Let $u^*(i,x)$ be a cluster point of $(u_{\alpha_k}(i,x))_{k=0}^\infty$, where the sequence $(\alpha_k)_{k=1}^\infty$ is defined in Theorem 3.1. Then the infimum in (68), with $w$ as defined in Theorem 3.1, is attained at $u^*(i,x)$.

Proof. For any $u$, it follows from (66) that

$$w_{\alpha_k}(i,x) + (1-\alpha_k)v_{\alpha_k}(1,0) \le f(i,x) + c(i,u) + \alpha_k F(w_{\alpha_k})(i,x+u). \tag{69}$$

Now define the random variable

$$\eta_k(j,y) := w_{\alpha_k}(j, y - \xi^i),$$

where $\xi^i$ is a random variable with the distribution $P(\xi^i < t) = \Phi_i(t)$. Then we can write

$$F(w_{\alpha_k})(i,y) = \sum_{j=1}^L p_{ij}\, E(\eta_k(j,y)).$$

For $\alpha_k$ sufficiently close to one, it follows from Lemma 3.6 that there is a constant $C_3$ such that

$$\eta_k(j,y) \le C_3\big(1 + |y - \xi^i|^\gamma\big) \le C_4\big(1 + (\xi^i)^\gamma\big), \tag{70}$$

and it follows from Lemma 3.7 that

$$\eta_k(j,y) \ge \inf_{x \le y} w_{\alpha_k}(j,x) = C_6 > -\infty, \tag{71}$$

where $C_4$ and $C_6$ depend on $y$, which is fixed. We know from Theorem 3.1 that $w_{\alpha_k}$ converges pointwise to $w$ (moreover, uniformly on compact intervals). Therefore, $\eta_k(j,y)$ converges everywhere to $\eta(j,y) := w(j, y-\xi^i)$. On the other hand, we see from (70) and (71) that

$$E\big(|\eta_k(j,y)|^{(\gamma+\eta)/\gamma}\big) \le \big(\max\{C_4, |C_6|\}\big)^{(\gamma+\eta)/\gamma}\, E\big(1 + (\xi^i)^\gamma\big)^{(\gamma+\eta)/\gamma} < \infty$$

by Assumption (A8). Therefore, the random variables $\eta_k(j,y)$ are uniformly integrable with respect to $k$ for any fixed $j$ and $y$, and we can pass to the limit in (69) to obtain, for each $x$ and $u$,

$$w(i,x) + \lambda^* \le f(i,x) + c(i,u) + F(w)(i,x+u). \tag{72}$$

Now let the infimum in (66) be attained at $u_k$ for fixed $x$ and $i$. Then we have

$$w_{\alpha_k}(i,x) + (1-\alpha_k)v_{\alpha_k}(1,0) = f(i,x) + c(i,u_k) + \alpha_k F(w_{\alpha_k})(i, x+u_k). \tag{73}$$

It follows from Lemma 3.5 that, for sufficiently large $k$, the $u_k$ are uniformly bounded, say by $\bar u$. Therefore, we can choose a subsequence, still denoted by $(\alpha_k)_{k=0}^\infty$, such that $u_k \to u^*$. Let

$$\hat\eta_k(j,x) := w_{\alpha_k}(j, x + u_k - \xi^i).$$

The boundedness of $u_k$ also allows us to prove the inequalities

$$\hat\eta_k(j,x) \le C_3\big(1 + |x + u_k - \xi^i|^\gamma\big) \le \hat C_4\big(1 + (\xi^i)^\gamma\big) \tag{74}$$

and

$$\hat\eta_k(j,x) \ge \inf_{v \le x+\bar u} w_{\alpha_k}(j,v) = \hat C_6 > -\infty. \tag{75}$$

It is easy to see that, as $k \to \infty$,

$$\hat\eta_k(j,x) \to \hat\eta(j,x) := w(j, x + u^* - \xi^i).$$

The inequalities (74) and (75) imply that $\hat\eta_k(j,x)$ is uniformly integrable. By taking the limit in (73) as $k \to \infty$, we obtain

$$w(i,x) + \lambda^* \ge f(i,x) + c(i,u^*) + F(w)(i, x+u^*); \tag{76}$$

see Section 4.5 in Chung (Ref. 15) for details. Note that we write (76) as an inequality and not an equality on account of the lower semicontinuity of $c(i,\cdot)$. Nevertheless, the inequalities (72) and (76) together prove (68) and the last part of the theorem. The result that $w \in \mathcal{L}^-$ follows immediately from Lemmas 3.6 and 3.7. □

Lemma 3.8 For each $i \in I$, the function $w(i,x)$ is decreasing for $x \le 0$, $w(i,\cdot)$ is bounded from below, and $\lim_{x\to\infty} w(i,x) = \infty$.

Proof. It follows from Lemma 3.4 in Beyer and Sethi (Ref. 8) that the function $w_\alpha(i,x)$ is decreasing for $x \le 0$. This implies the same property for the limit $w$. Because $w(i,x)$ is locally Lipschitz by Theorem 3.1, it follows that $w(i,0)$ is finite. Assume

$$\lim_{u\to\infty}\big(c(i,u) + F(w)(i,u)\big) = -\infty.$$

Then it follows from (68) that $w(i,0) = -\infty$, and we obtain a contradiction. Therefore, $c(i,u) + F(w)(i,u)$ is bounded from below, and so is $w(i,x)$.

Let $I_\infty := \{i \in I : \lim_{x\to\infty} w(i,x) = \infty\}$. Because $c(g,u) + F(w)(g,u)$ is bounded from below, and because Assumptions (A1) and (A4) imply that $\lim_{x\to\infty} f(g,x) = \infty$, it follows that $\lim_{x\to\infty} w(g,x) = \infty$. Therefore, $g \in I_\infty \ne \emptyset$. Assume $I_\infty \ne I$ and choose $j \in I_\infty$ and $i \in I \setminus I_\infty$. Then it follows that $p_{ij} = 0$, because otherwise we would have $\lim_{x\to\infty} F(w)(i,x) = \infty$ and, therefore, $\lim_{x\to\infty} w(i,x) = \infty$, which contradicts $i \notin I_\infty$. We can now conclude that $p_{ij} = 0$ for all $i \notin I_\infty$ and $j \in I_\infty$. But this says that the Markov process $(i_n)_{n=0}^\infty$ is not irreducible. This contradicts Assumption (A6), which proves $I_\infty = I$. □

3.4 Verification Theorem

Definition 3.1 An admissible strategy $U = (u_0, u_1, \dots)$ is called stable with respect to $w$ if, for each initial surplus level $x$ and for each $i \in I$,

$$\lim_{k\to\infty} \frac{1}{k}\, E\big(w(i_k, x_k)\big) = 0,$$

where $x_k$ is the surplus level in period $k$ corresponding to the initial state $(i,x)$ and the strategy $U$.

Lemma 3.9 Assume (A1)-(A8). There are constants $S^i < \infty$ and $s^i \le S^i$, $i \in I$, satisfying $\max_{i\in I}\{s^i\} \ge C_2$ with $C_2$ specified in Lemma 3.3, such that

$$u^*(i,x) = \begin{cases} S^i - x, & x \le s^i, \\ 0, & x > s^i, \end{cases}$$

attains the minimum on the RHS of (68) for $w$ as defined in Theorem 3.1. Furthermore, the stationary feedback strategy $U^* = (u^*, u^*, \dots)$ is stable with respect to any $w \in \mathcal{L}^-$.

Proof. Let

$$G_{\alpha_k}(i,y) = c^i y + \alpha_k F(w_{\alpha_k})(i,y) \quad \text{and} \quad G(i,y) = c^i y + F(w)(i,y). \tag{77}$$

Because $w(i,\cdot)$ is $K$-convex, we know that a minimizer in (68) is given by

$$u^*(i,x) = \begin{cases} S^i - x, & x \le s^i, \\ 0, & x > s^i, \end{cases}$$

where $S^i$ minimizes $G(i,y)$ defined in (77), and $s^i$ is a solution of the equation $G(i,y) = K + G(i,S^i)$ in $y$, if one exists, or $s^i = -\infty$ otherwise. To prove that the strategy $U^*$ is stable with respect to any $w \in \mathcal{L}^-$, we derive bounds for $S^i$ and $s^i$. Obviously, $S^i$, as a global minimizer of $G(i,y)$, is less than infinity for all $i$, because $G(i,y) \to \infty$ for $y \to \infty$.

For $\alpha_0 \le \alpha < 1$, an $(s_\alpha^i, S_\alpha^i)$-strategy is optimal, $v_\alpha(i,\cdot)$ is $K$-convex, and from Lemma 3.3 there is a $j \in I$ such that $s_\alpha^j \ge C_2$. We denote the smallest such $j$ by $j(\alpha)$. For the sequence $(\alpha_k)_{k=1}^\infty$ from Theorem 3.1, at least one $j^*$ appears infinitely often among the $j(\alpha_k)$. In the following, we choose a subsequence, still denoted by $(\alpha_k)_{k=1}^\infty$, such that $j(\alpha_k) = j^*$. Then we know that, for all $1 > \alpha_k \ge \alpha_0$, it holds that $s_{\alpha_k}^{j^*} \ge C_2$ and that $v_{\alpha_k}(j^*,x)$ is decreasing for $x \le s_{\alpha_k}^{j^*}$, on account of the $K$-convexity of $v_{\alpha_k}(j^*,x)$ given in Theorem 2.1. Furthermore, by Lemma 3.5 we know that $S_{\alpha_k}^{j^*} \le C_8$ for $\alpha_2 \le \alpha_k < 1$. Let $C_7 := \max\{S^{j^*}, C_8\}$. By the definition of $s_\alpha^i$, we know that the argmin of $v_\alpha(i,x)$ is greater than or equal to $s_\alpha^i$, and therefore greater than or equal to $C_2$. Thus we can conclude that, for $1 > \alpha_k > \alpha_0$, we have

$$c^{j^*} C_2 + \alpha_k F(v_{\alpha_k})(j^*, C_2) - \min_{C_2 \le y \le C_7}\{c^{j^*} y + \alpha_k F(v_{\alpha_k})(j^*, y)\} \ge K.$$

Subtracting $\alpha_k v_{\alpha_k}(1,0)$ from both terms on the LHS, keeping in mind the notation defined in (77), we obtain

$$G_{\alpha_k}(j^*, C_2) - \min_{C_2 \le y \le C_7}\{G_{\alpha_k}(j^*, y)\} \ge K.$$

The functions $G_{\alpha_k}(j^*,\cdot)$ converge uniformly on any closed interval to $G(j^*,\cdot)$. Therefore, we can pass to the limit and get

$$G(j^*, C_2) - \min_{C_2 \le y \le C_7}\{G(j^*, y)\} \ge K. \tag{78}$$

Note that the minimum in (78) is attained at $S^{j^*}$. From (78) and the continuity of $G(j^*,\cdot)$, it follows that we can choose an $s^{j^*} \in [C_2, S^{j^*}]$ such that

$$G(j^*, s^{j^*}) - G(j^*, S^{j^*}) = K.$$

This means that, under $U^*$, there will be infinitely many nonzero orders. Because of the structure of the strategy, the inventory level under $U^*$ will never be higher than $\max\{x, S^1, \dots, S^L\}$. Therefore, we have

$$x_k \le \max\{x, S^1, \dots, S^L\} < \infty \quad \text{a.s.} \tag{79}$$

Furthermore, it follows from $s^{j^*} \ge C_2$ that, if $i_n = j^*$, then $x_{n+1} \ge C_2 - \xi_{n+1}$. It is easy to prove that the cumulative demand between two consecutive periods with demand state $j^*$ has a finite expectation. This implies that $E(|x_k|^\gamma)$ is uniformly bounded for all $k$. This, together with (79), proves the stability of $U^*$ with respect to any $w \in \mathcal{L}^-$. □

Lemma 3.10 Let $\lambda^*$ be defined as in Theorem 3.1. Then, for any admissible strategy $U$,

$$\lambda^* \le J(i,x,U).$$

Proof. Let $U = (u_0, u_1, \dots)$ denote any admissible decision. Suppose

$$J(i,x,U) < \lambda^*. \tag{80}$$

?1 f~(k ) < 1 for each Put f~(k) = E [f (ik ; xk) + c(ik ; uk)]. From (80) it follows immediately that Pnk=0 positive integer n, since otherwise we would have J (i; x; U ) = 1. Note that ?1 1 nX J (i; x; U ) = lim sup f~(k); while n!1 n k=0 1 X (1 ? )J (i; x; U ) = (1 ? ) k f~(k): (81) k=0

Since f~(k) is nonnegative for each k, the sum in (81) is well de ned for 0  < 1, and we can use the Tauberian theorem (see, e.g., Sznadjar and Filar (Ref. 16), Theorem 2.2) to obtain lim sup(1 ? )J (i; x; U )  J (i; x; U ) < : "1

On the other hand, we know from Theorem 2.2 that (1 ? k )v (i; x) !  on a subsequence f k g1k=1 converging to one. Thus, there exists an < 1 such that (1 ? )J (i; x; U ) < (1 ? )v (i; x); which contradicts the de nition of the value function v (i; x). 2 k
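The Tauberian inequality invoked above, $\limsup_{\alpha \uparrow 1}(1-\alpha)\sum_k \alpha^k \tilde f(k) \le \limsup_{n}\frac{1}{n}\sum_{k<n}\tilde f(k)$ for nonnegative sequences, can be checked on a toy example; the periodic sequence below is an arbitrary assumption made only for the illustration.

```python
# Toy check of the Tauberian inequality: for nonnegative f,
# limsup_{a -> 1} (1 - a) * sum_k a^k f(k) <= limsup_n (1/n) * sum_{k<n} f(k).
f = lambda k: 3.0 if k % 2 == 0 else 1.0       # arbitrary nonnegative sequence

def cesaro(n):
    return sum(f(k) for k in range(n)) / n      # Cesaro average; tends to 2

def abel(alpha, terms=200000):
    return (1.0 - alpha) * sum(alpha ** k * f(k) for k in range(terms))

print(cesaro(100000))                           # ~2.0
for alpha in (0.9, 0.99, 0.999):
    print(alpha, abel(alpha))                   # approaches the Cesaro limit 2 as alpha -> 1
```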

Theorem 3.3 (Verification Theorem).
(i) Let $(\lambda, w(\cdot, \cdot))$ be a solution of the average cost optimality equation (68) with $w \in \mathcal{L}^-$. Then $\lambda \le J(i, x, U)$ for any admissible $U$.
(ii) Suppose there exists a $\hat u(i, x)$ for which the infimum in (68) is attained. Furthermore, let $\hat U = (\hat u, \hat u, \ldots)$, the stationary feedback policy given by $\hat u$, be stable with respect to $w$. Then
$$\lambda = J(i, x, \hat U) = \lambda^* = \lim_{N \to \infty} \frac{1}{N} E\left(\sum_{k=0}^{N-1} f(i_k, \hat x_k) + c(i_k, \hat u_k)\right),$$
and $\hat U$ is an average optimal strategy.
(iii) Moreover, $\hat U$ minimizes
$$\liminf_{N \to \infty} \frac{1}{N} E\left(\sum_{k=0}^{N-1} f(i_k, x_k) + c(i_k, u_k)\right)$$
over the class of admissible decisions which are stable with respect to $w$.

Proof. We start by showing that
$$\lambda \le J(i, x, U) \quad \text{for any } U \text{ stable w.r.t. } w. \qquad (82)$$
We assume that $U$ is stable with respect to $w$, and begin with a result on conditional expectations (see, e.g., Gikhman and Skorokhod, Ref. 17), which gives
$$\begin{aligned}
E\{w(i_{n+1}, x_{n+1}) \mid i_1, \ldots, i_n, \xi_1, \ldots, \xi_n\}
&= E\{w(i_{n+1}, x_n + u_n - \xi_{n+1}) \mid i_1, \ldots, i_n, \xi_1, \ldots, \xi_n\} \\
&= E\{w(i_{n+1}, y - \xi_{n+1}) \mid i_1, \ldots, i_n, \xi_1, \ldots, \xi_n\}\big|_{y = x_n + u_n} \\
&= E\{w(i_{n+1}, y - \xi_{n+1}) \mid i_n\}\big|_{y = x_n + u_n} \\
&= F(w)(i_n, y)\big|_{y = x_n + u_n} = F(w)(i_n, x_n + u_n) \quad \text{a.s.}
\end{aligned} \qquad (83)$$

Because $u_k$ does not necessarily attain the infimum in (68), we have
$$w(i_k, x_k) + \lambda \le f(i_k, x_k) + c(i_k, u_k) + F(w)(i_k, x_k + u_k) \quad \text{a.s.},$$
and from (83) we derive
$$w(i_k, x_k) + \lambda \le f(i_k, x_k) + c(i_k, u_k) + E\big(w(i_{k+1}, x_k + u_k - \xi_{k+1}) \mid i_k\big) \quad \text{a.s.}$$
By taking the mathematical expectation on both sides, we obtain
$$E\big(w(i_k, x_k)\big) + \lambda \le E\big(f(i_k, x_k) + c(i_k, u_k)\big) + E\big(w(i_{k+1}, x_{k+1})\big).$$
Summing from $0$ to $n - 1$ yields
$$n\lambda \le E\left(\sum_{k=0}^{n-1} f(i_k, x_k) + c(i_k, u_k)\right) + E\big(w(i_n, x_n)\big) - E\big(w(i_0, x_0)\big). \qquad (84)$$
Divide by $n$, let $n$ go to infinity, and use the fact that $U$ is stable with respect to $w$, to obtain
$$\lambda \le \liminf_{n \to \infty} \frac{1}{n} E\left(\sum_{k=0}^{n-1} f(i_k, x_k) + c(i_k, u_k)\right). \qquad (85)$$
Note that if the above inequality holds for the liminf, it certainly also holds for the limsup. This proves (82).

On the other hand, if there exists a $\hat u$ which attains the infimum in (68), we then have
$$w(i_k, \hat x_k) + \lambda = f(i_k, \hat x_k) + c\big(i_k, \hat u(i_k, \hat x_k)\big) + F(w)\big(i_k, \hat x_k + \hat u(i_k, \hat x_k)\big) \quad \text{a.s.},$$
and we obtain analogously
$$n\lambda = E\left(\sum_{k=0}^{n-1} f(i_k, \hat x_k) + c\big(i_k, \hat u(i_k, \hat x_k)\big)\right) + E\big(w(i_n, \hat x_n)\big) - E\big(w(i_0, \hat x_0)\big). \qquad (86)$$
Because $\hat U$ is assumed stable with respect to $w$, we get
$$\lambda = \lim_{n \to \infty} \frac{1}{n} E\left(\sum_{k=0}^{n-1} f(i_k, \hat x_k) + c\big(i_k, \hat u(i_k, \hat x_k)\big)\right) = J(i, x, \hat U). \qquad (87)$$
For the special solution $(\lambda^*, w^*)$ defined in Theorem 3.1 (see also Theorem 3.2) and the strategy $U^*$ defined in Lemma 3.9, we obtain $\lambda^* = J(i, x, U^*)$. Since $U^*$ is stable w.r.t. any function in $\mathcal{L}^-$ by Lemma 3.9, it follows that
$$\lambda \le J(i, x, U^*) = \lambda^*, \qquad (88)$$
which, in view of Lemma 3.10, proves part (i) of the theorem. Part (i) of the theorem, together with (87), proves the average optimality of $\hat U$ over all admissible strategies. Furthermore, since $\lambda = J(i, x, \hat U) \ge \lambda^*$ by (87) and Lemma 3.10, it follows from (88) that $\lambda = \lambda^*$, and the proof of part (ii) is completed. Finally, part (iii) immediately follows from part (i) and (85). $\Box$
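The conditional-expectation identity (83) used in the proof is also easy to verify numerically in a toy setting. In the sketch below, the chain, the exponential demands, and the test function $w$ are assumptions made for the example; the Monte Carlo estimate of the conditional expectation should match the directly computed $F(w)(i, y)$.

```python
import random

# Numerical sanity check of (83) for a toy chain; every number here is an
# assumption made for the example, not model data.
P = {0: [0.8, 0.2], 1: [0.3, 0.7]}        # transition probabilities
mean = {0: 2.0, 1: 5.0}                    # exponential demand means by state

def w(j, x):
    return 1.0 + x * x                     # test function of polynomial growth

def F_w_mc(i, y, paths=200000, seed=2):
    # Monte Carlo for E{ w(i_{n+1}, y - xi_{n+1}) | i_n = i }.
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(paths):
        j = 0 if rng.random() < P[i][0] else 1
        acc += w(j, y - rng.expovariate(1.0 / mean[j]))
    return acc / paths

def F_w_exact(i, y):
    # For exponential demand with mean m: E w(j, y - xi) = 1 + y^2 - 2*y*m + 2*m^2.
    return sum(p * (1.0 + y * y - 2.0 * y * mean[j] + 2.0 * mean[j] ** 2)
               for j, p in enumerate(P[i]))

print(F_w_mc(0, 4.0), F_w_exact(0, 4.0))   # the two values should agree closely
```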

Remark 3.4 It should be obvious that any solution $(\lambda, w)$ of the average cost optimality equation and control $\hat u$ satisfying (i) and (ii) of Theorem 3.3 will have a unique $\lambda$, since it represents the minimum average cost. On the other hand, if $(\lambda, w)$ is a solution, then $(\lambda, w + c)$ is also a solution, where $c$ is any constant. For the purposes of this paper, we do not require $w$ to be unique up to a constant. If $w$ is not unique up to a constant, then $\hat u$ may not be unique. We also do not need $w^*$ in Theorem 3.1 to be unique. The final result of this section, namely, that there exists an average-optimal policy of $(s, S)$-type, is an immediate consequence of Lemmas 3.9 and 3.3.

Theorem 3.4 Assume (A1)-(A8). Let $s^i$ and $S^i$, $i \in I$, be defined as in Lemma 3.9. The stationary feedback strategy $U^* = (u^*, u^*, \ldots)$ defined by
$$u^*(i, x) = \begin{cases} S^i - x, & x \le s^i, \\ 0, & x > s^i, \end{cases}$$
is average optimal.
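To illustrate Theorem 3.4 and the limit in part (ii) of Theorem 3.3, the long-run average cost of a state-dependent $(s, S)$ policy can be estimated by simulating a single long path. All primitives below (chain, demands, costs, policy parameters) are assumptions made for the example, not the policy guaranteed optimal by the theorem.

```python
import random

# Monte Carlo estimate of the long-run average cost of a state-dependent (s, S)
# policy on one long path; all primitives are assumed for the illustration.
P = {0: [0.8, 0.2], 1: [0.3, 0.7]}
mean = {0: 2.0, 1: 5.0}
s, S = {0: 1.0, 1: 3.0}, {0: 6.0, 1: 10.0}     # hypothetical policy parameters
K, c_unit = 5.0, 1.0                            # assumed ordering cost components

def f(i, x):                                    # assumed convex surplus cost
    return 2.0 * max(-x, 0.0) + max(x, 0.0)

def c(i, u):                                    # fixed plus proportional cost
    return K + c_unit * u if u > 0 else 0.0

rng = random.Random(1)
i, x, total, N = 0, 0.0, 0.0, 200000
for _ in range(N):
    u = S[i] - x if x <= s[i] else 0.0
    total += f(i, x) + c(i, u)
    i = 0 if rng.random() < P[i][0] else 1
    x = x + u - rng.expovariate(1.0 / mean[i])
print(total / N)   # converges to this policy's average cost (lambda* if optimal)
```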

The derivation of an optimal strategy presented above uses only properties of the function $w^*$. We mention here an additional way to obtain an average-optimal state-dependent $(s, S)$ policy.

Corollary 3.1 Let $(\alpha_k)_{k=0}^\infty$ be a sequence of discount factors that converges to one, and let $(s^i, S^i)$ be any cluster point of the pairs of strategy parameters $(s^i_{\alpha_k}, S^i_{\alpha_k})$, which exists if we allow them to take the value $-\infty$. Then $(s^i, S^i)$ defines an average-optimal policy.

Proof. Because $S^i_\alpha$ is uniformly bounded from above for $\alpha$ sufficiently close to one, as proved in Lemma 3.5, a cluster point exists. We choose a subsequence of the sequence $(\alpha_k)_{k=0}^\infty$ defined in Theorem 3.1, still denoted by $(\alpha_k)_{k=0}^\infty$, such that the strategy parameters $s^i_{\alpha_k}$ and $S^i_{\alpha_k}$ converge to $s^i$ and $S^i$, and $w_{\alpha_k}$ converges to $w^*$. Then, by Theorem 3.2, $w^*$ satisfies the average optimality equation (68). Furthermore, it is easy to show that, for any fixed $x \ne s^i$, $u_{\alpha_k}(i, x)$ converges to
$$u(i, x) = \begin{cases} 0, & x > s^i, \\ S^i - x, & x \le s^i. \end{cases}$$
For $x = s^i$, we can choose at least one of the following types of subsequences:
(a) $s^i_{\alpha_k} < x$ for all $k$, or
(b) $s^i_{\alpha_k} \ge x$ for all $k$.
For a type (a) subsequence we obtain $u_{\alpha_k}(i, x) \to 0$, and for a type (b) subsequence we get $u_{\alpha_k}(i, x) \to S^i - s^i$. By Theorem 3.2, the limit $u(i, x)$ is precisely the function for which the infimum in (68) is attained. Then, by Theorem 3.3, the policy $U = (u(i_0, x_0), u(i_1, x_1), \ldots)$ is average optimal. This proves the optimality of at least one type ((48) or (52)) of $(s, S)$ strategy. On the other hand, because of the continuity of $w^*$, if one of the types is optimal, then so is the other type with the same parameters $(s^i, S^i)$. $\Box$
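Corollary 3.1 also suggests a computational route to an average-optimal policy: solve the discounted problem by value iteration for discount factors $\alpha$ approaching one and track the resulting $(s^i_\alpha, S^i_\alpha)$. A rough sketch on a truncated, discretized state space follows; all model data are assumed for the example, and the truncation is a numerical convenience outside the theory.

```python
import numpy as np

# Assumed toy data: two demand states, discrete demand distributions on {0,...,4}.
P = np.array([[0.8, 0.2], [0.3, 0.7]])                 # state transitions
pmf = np.array([[0.3, 0.4, 0.2, 0.1, 0.0],
                [0.0, 0.1, 0.3, 0.4, 0.2]])            # demand pmf given next state
K, c_unit = 5.0, 1.0                                    # assumed ordering costs
xs = np.arange(-25, 26)                                 # truncated surplus grid

def f(x):
    return 2.0 * np.maximum(-x, 0) + np.maximum(x, 0)   # assumed surplus cost

def F(v, i):
    # Expected next value: sum_j P[i, j] * E v(j, y - xi) for every grid level y.
    out = np.zeros(len(xs))
    for j in range(2):
        for d, p in enumerate(pmf[j]):
            idx = np.clip(xs - d - xs[0], 0, len(xs) - 1)   # absorb at boundary
            out += P[i, j] * p * v[j, idx]
    return out

def s_S(alpha, iters=500):
    v = np.zeros((2, len(xs)))
    for _ in range(iters):                              # discounted value iteration
        new = np.empty_like(v)
        for i in range(2):
            G = c_unit * xs + alpha * F(v, i)           # G_alpha(i, y) on the grid
            M = np.minimum.accumulate(G[::-1])[::-1]    # min over y >= x of G(y)
            new[i] = f(xs) + np.minimum(G, K + M) - c_unit * xs
        v = new
    out = []
    for i in range(2):
        G = c_unit * xs + alpha * F(v, i)
        jmin = int(np.argmin(G))                        # S^i_alpha
        below = np.nonzero(G[: jmin + 1] <= K + G[jmin])[0]
        out.append((int(xs[below[0]]) if len(below) else -np.inf, int(xs[jmin])))
    return out

for alpha in (0.9, 0.95, 0.99):
    print(alpha, s_S(alpha))    # the pairs (s^i_alpha, S^i_alpha) settle as alpha -> 1
```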

4 Concluding Remarks

In this paper, we have generalized the infinite horizon inventory models involving a fixed cost that have appeared in the literature to allow for unbounded demand and costs with polynomial growth. In both discounted and average cost problems, we have shown the existence of an optimal Markov policy; moreover, this policy can be of state-dependent $(s, S)$-type.

The paper has made several specific contributions. It has extended the proofs of existence and verification of optimality in the discounted cost case given in Bensoussan et al. (Ref. 3) and Sethi and Cheng (Ref. 7) to allow for more general costs, including lower semicontinuous surplus costs with polynomial growth. In the vanishing discount approach to the average cost problem, relaxation of the bounded Markovian demand assumption in Beyer and Sethi (Ref. 8) to unbounded Markovian demands requires the use of renewal arguments to establish the crucial local equi-Lipschitz property of the discounted cost value function. This has been carried out in Lemmas 3.1-3.4. Finally, the establishment of the ergodic dynamic programming equation requires proving the uniform integrability of the differential discounted value functions, accomplished in Lemmas 3.6 and 3.7 and Theorem 3.2, and the uniform boundedness of the discounted order-up-to levels, accomplished in Lemma 3.5.

Some problems of theoretical interest remain open. One might want to show that the value function in the discounted nonstationary infinite horizon case is continuous if the surplus cost function is continuous. More importantly, the question of whether or not the potential function $w$ is unique up to a constant could be examined. The considerable literature on average cost MDPs, surveyed in Arapostathis et al. (Ref. 18) and Hernandez-Lerma and Lasserre (Ref. 19), could prove useful in addressing these questions. Finally, it may be noted that a book by Beyer, Sethi, and Taksar (Ref. 20), containing the analysis of average cost inventory problems in discrete as well as continuous time, is currently in progress.

References

1. Karlin, S., Optimal Inventory Policy for the Arrow-Harris-Marschak Dynamic Model, Studies in the Mathematical Theory of Inventory and Production, Edited by K.J. Arrow, S. Karlin, and H. Scarf, Stanford University Press, Stanford, California, pp. 135-154, 1958.

2. Scarf, H., The Optimality of (s, S) Policies in the Dynamic Inventory Problem, Mathematical Methods in the Social Sciences, Edited by K. Arrow, S. Karlin, and P. Suppes, Stanford University Press, Stanford, California, pp. 196-202, 1960.

3. Bensoussan, A., Crouhy, M., and Proth, J., Mathematical Theory of Production Planning, North-Holland, Amsterdam, The Netherlands, 1983.

4. Holt, C., Modigliani, F., Muth, J.F., and Simon, H.A., Planning Production, Inventories, and Work Force, Prentice-Hall, Englewood Cliffs, New Jersey, 1960.

5. Karlin, S. and Fabens, A., The (s, S) Inventory Model under Markovian Demand Process, Mathematical Methods in the Social Sciences, Edited by K. Arrow, S. Karlin, and P. Suppes, Stanford University Press, Stanford, California, pp. 159-175, 1960.

6. Song, J.S. and Zipkin, P., Inventory Control in a Fluctuating Demand Environment, Operations Research, Vol. 41, pp. 351-370, 1993.

7. Sethi, S.P. and Cheng, F., Optimality of (s, S) Policies in Inventory Models with Markovian Demand, Operations Research, Vol. 45, No. 6, pp. 931-939, 1997.

8. Beyer, D. and Sethi, S.P., Average Cost Optimality in Inventory Models with Markovian Demands, Journal of Optimization Theory and Applications, Vol. 92, No. 3, pp. 497-526, 1997.

9. Beyer, D. and Sethi, S.P., The Classical Average Cost Inventory Models of Iglehart (1963) and Veinott and Wagner (1965) Revisited, Working Paper, Hewlett-Packard Laboratories, Palo Alto, California, 1998.

10. Iglehart, D., Dynamic Programming and Stationary Analysis of Inventory Problems, Multistage Inventory Models and Techniques, Edited by H. Scarf, D. Gilford, and M. Shelly, Stanford University Press, Stanford, California, pp. 1-31, 1963.

11. Veinott, A. and Wagner, H., Computing Optimal (s, S) Policies, Management Science, Vol. 11, pp. 525-552, 1965.

12. Aubin, J.P., Mathematical Methods of Game and Economic Theory, North-Holland, Amsterdam, The Netherlands, 1979.

13. Ross, S.M., Stochastic Processes, Wiley, New York, New York, 1983.

14. Ross, S.M., Introduction to Probability Models, Academic Press, San Diego, California, 1989.

15. Chung, K.L., A Course in Probability Theory, Academic Press, New York, New York, 1974.

16. Sznajder, R. and Filar, J.A., Some Comments on a Theorem of Hardy and Littlewood, Journal of Optimization Theory and Applications, Vol. 75, pp. 201-208, 1992.

17. Gikhman, I. and Skorokhod, A., Stochastic Differential Equations, Springer-Verlag, Berlin, Germany, 1972.

18. Arapostathis, A., Borkar, V.S., Fernandez-Gaucherand, E., Ghosh, M.K., and Marcus, S.I., Discrete-Time Controlled Markov Processes with Average Cost Criterion: A Survey, SIAM Journal on Control and Optimization, Vol. 31, pp. 282-344, 1993.

19. Hernandez-Lerma, O. and Lasserre, J.B., Discrete-Time Markov Control Processes, Springer, New York, New York, 1996.

20. Beyer, D., Sethi, S.P., and Taksar, M., Production-Inventory Models with Average Cost, Book in Progress, 1999.
