Stochastic Setup-Cost Inventory Model with Backorders and ...

4 downloads 25 Views 393KB Size Report
May 18, 2017 - OC] 18 May 2017. Stochastic Setup-Cost Inventory Model with. Backorders and Quasiconvex Cost Functions. Eugene A. Feinberg, Yan Liang.
arXiv:1705.06814v1 [math.OC] 18 May 2017

Stochastic Setup-Cost Inventory Model with Backorders and Quasiconvex Cost Functions Eugene A. Feinberg, Yan Liang Department of Applied Mathematics and Statistics Stony Brook University, Stony Brook, NY 11794 [email protected], [email protected]

Abstract This paper studies a periodic-review setup-cost inventory control model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. It is shown that this model satisfies assumptions that imply the validity of optimality equations for discounted and average cost criteria. In particular, we show that the family of discounted relative value functions is equicontinuous. The paper establishes two groups of results: (i) the optimality of (s, S) policies for infinitehorizon problems under discounted and average cost criteria, and (ii) convergence of optimal discounted lower thresholds sα , where α is a discount factor, to optimal average-cost lower threshold s and convergence of discounted relative value functions to average-cost relative value function as α ↑ 1. The convergence results of type (ii) previously were known only for subsequences of discount factors even for problems with convex holding/backlog costs. The results of this paper also hold for problems with fixed lead times.

Keywords: Inventory control, average-cost optimality equations, (s, S) policies.

1

Introduction

This paper studies a periodic-review setup-cost inventory control model with backorders and holding/backlog costs satisfying quasiconvexity assumptions. It establishes two groups of results: (i) the optimality of (s, S) policies for infinite-horizon problems under discounted and average cost criteria, and (ii) convergence of optimal discounted lower thresholds sα , where α is a discount factor, to optimal average-cost lower threshold s and convergence of discounted relative value functions to average-cost relative value function as α ↑ 1. The convergence results of type (ii) previously were known only for subsequences of discount factors. The results of this paper hold for problems with deterministic positive lead times. The inventory control literature is vast, and earlier publications are surveyed in Porteus [16] and Zipkin [25]. Here we mention only later references and a few directly relevant earlier references. 1

For problems with convex holding/backlog cost functions, Scarf [18] introduced the concept of K-convexity to prove the optimality of (s, S) policies for finite-horizon problems with continuous demand and convex holding/backlog costs. Zabel [23] indicated some gaps in Scarf [18] and corrected them. References [1, 2, 3, 4, 5, 9, 10, 11, 15, 18, 20, 22], mentioned in this and the following paragraph, deal with convex or linear ordering/backlog functions. Iglehart [15] extended the results in Scarf [18] to infinite-horizon problems with continuous demand. Veinott and Wagner [22] proved the optimality of (s, S) policies for both finite-horizon and infinite-horizon problems with discrete demand. Beyer and Sethi [2] completed the missing proofs in Iglehart [15] and Veinott and Wagner [22]. Chen and Simchi-Levi [3, 4] studied coordinating inventory control and pricing problems and proved the optimality of (s, S) policies without assuming that the demand is discrete or continuous. Under certain assumptions, their results imply the optimality of (s, S) policies for problems without pricing. Beyer et al. [1] and Huh et al. [14] studied problems with parameters depending on exogenous factors modeled by a Markov chain. The analysis of periodic-review inventory control problems is based on the theory of Markov Decision Processes (MDPs). However, most of inventory control papers use only basic facts from the MDP theory, and the corresponding general results had been unavailable for long time. Feinberg et al. [6] developed the results on MDPs with Borel state spaces, possibly noncompact action sets, and possibly unbounded one-step costs. Discrete-time periodic-review inventory control problems are particular examples of such MDPs; see Feinberg [5] for details. Feinberg and Lewis [9] obtained additional convergence results for convergence of optimal actions for MDPs and established the optimality of (s, S) policies for inventory control problems as well as other results. Feinberg and Liang [10] provided descriptions of optimal policies for all possible values of discount factors (for some parameters, optimal (s, S) policies may not exist for discounted and finite-horizon problems). Feinberg and Liang [11] proved that discrete-time periodic-review inventory control models with backorders satisfy the equicontinuity assumption, and this implies several addition properties of optimal average-cost policies including the validity of average-cost optimality equations (ACOEs). Veinott [21] studied the nonstationary setup-cost inventory control model with a fixed lead time, backorders, and holding/backlog costs satisfying quasiconvexity assumptions. He proved the optimality of (s, S) policies for finite-horizon problems and also provided bounds on the values of the optimal thresholds s and S. Zheng [24] proved the optimality of (s, S) policies for models with discrete demands under both discounted and average cost criteria by constructing a solution to the optimality equations. This paper considers the setup-cost inventory control model with holding/backlog costs satisfying quasiconvexity assumptions. Our quasiconvexity assumptions are weaker than those introduced by Veinott [21] and used in Zheng [24]. For inventory control model with holding/backlog costs satisfying quasiconvexity assumptions, this paper establishes the validity of optimality equations for infinite-horizon discounted 2

and average cost criteria. This is done by showing that MDPs, corresponding to inventory control problems, satisfy certain assumptions. It also establishes the optimality of (s, S) policies and convergence properties of optimal discounted thresholds for discounted problems to the corresponding thresholds for average-cost problems. Some of the results are new even for problems with convex holding/backlog costs. While convergence of optimal thresholds and relative discounted value functions was known only for subsequences of discount factors (see [1, 9, 11, 14]), here we show that convergence of the lower thresholds and discounted value functions takes place for all discount factors tending to 1. The rest of the paper is organized in the following way. Section 2 describes the setup-cost inventory control model and introduces the assumptions used in this paper. Section 3 establishes the optimality of (sα , Sα ) policies for the infinite-horizon problem with the discount factor α. Section 4 verifies average-cost optimality assumptions and the equicontinuity conditions for discounted relative value functions. Section 5 establishes the validity of ACOEs for the inventory control model and the optimality of (s, S) policies under the average cost criterion. Section 6 establishes the convergence of discounted optimal lower thresholds sα , when the discount factor α converges to 1. Section 7 establishes the convergence of discounted relative value functions, when the discount factor converges to 1. Section 8 presents a reduction from the inventory control model with positive lead times to the model without lead times using Veinott’s [21] approach.

2

Setup-Cost Inventory Control Model with Backorders: Definitions and Assumptions

Let R denote the real line, Z denote the set of all integers, R+ := [0, +∞) and N0 := {0, 1, 2, . . .}. Consider the stochastic periodic-review setup-cost inventory control model with backorders. At times t = 0, 1, . . . , a decision-maker views the current inventory of a single commodity and makes an ordering decision. Assuming zero lead times, the products are immediately available to meet demand. The cost of ordering is incurred at the time of delivery of the order. Demand is then realized, the decisionmaker views the remaining inventory, and the process continues. The unmet demand is backlogged. The demand and the order quantity are assumed to be nonnegative. The inventory control model is defined by the following parameters. 1. α ∈ [0, 1) is the discount factor; 2. K > 0 is a fixed ordering cost; 3. c¯ > 0 is the per unit ordering cost; 4. {Dt , t = 1, 2, . . . } is a sequence of i.i.d. nonnegative finite random variables representing the demand at periods 0, 1, . . . . We assume that E[D] < ∞ and

3

P (D > 0) > 0, where D is a random variable with the same distribution as D1 ; 5. h(x) is the holding/backlog cost per period if the inventory level is x. Assume that: (i) the function E[h(x − D)] is finite and continuous for all x ∈ X; and (ii) E[h(x − D)] → ∞ as |x| → ∞. Without loss of generality, assume that the function E[h(x − D)] is nonnegative. The assumption P (D > 0) > 0 avoids the trivial case when there is no demand. In addition, E[h(x − D)] → ∞ as x → −∞ implies that h(x) → ∞

as

x → −∞,

(2.1)

because if h(x) < ∞ as x → ∞, then there exist real numbers M1 , M2 > 0 such that h(x) ≤ M1 for x ≤ −M2 , which implies that E[h(x − D)] ≤ M1 for x ≤ −M2 and E[h(x − D)] < ∞ as x → −∞. Recall the definition of quasiconvex functions. Definition 2.1. A function f is quasiconvex on a convex set X, if for all x, y ∈ X and 0 ≤ λ ≤ 1 f (λx + (1 − λ)y) ≤ max{f (x), f (y)}. Consider the following assumptions on the quasiconvexity or convexity of the cost function. Define hα (x) := h(x) + (1 − α)cx + cE[D], Assumption 1.

x ∈ X.

(2.2)

(i) The function E[hα (x − D)] = E[h(x − D)] + (1 − α)cx + αcE[D]

(2.3)

is quasiconvex for all α ∈ [0, 1]; (ii) There exists a ˜ ∈ [0, 1) such that lim E[ha˜ (x − D)] > K + inf {E[ha˜ (x − D)]}.

x→−∞

x∈X

(2.4)

Assumption 2. The function h(·) is convex on X. Note that since E[h(x − D)] → ∞ as x → ∞ and (1 − α)c ≥ 0 for all α ∈ [0, 1], then (2.3) implies that E[hα (x − D)] → ∞ as x → ∞ for all α ∈ [0, 1]. Assumption 1 is slightly weaker than the assumption introduced by Veinott [21] and used by Zheng [24], in which (2.4) is also assumed for all α ∈ [0, 1]. For the discounted criterion, consider the following assumption, that is slightly weaker than Assumption 1, which is used for average-cost problems. 4

Assumption 3. For α ∈ [0, 1) assume that: (i) the function E[ha˜ (x − D)] is quasiconvex for all α ˜ ∈ [0, α]; (ii) Assumption 1(ii) holds. For α ∈ [0, 1], if lim E[hα (x − D)] > inf E[hα (x − D)],

(2.5)

 rα := min Argmin{E[hα (x − D)]} ,

(2.6)

x→−∞

x∈X

then define x∈X

where (2.5), the continuity of the function E[hα (x − D)] and E[hα (x − D)] → ∞ as x → ∞ imply that |rα | < ∞. In addition to Assumption 1, consider the following assumption, which is used to establish the convergence of the discounted optimal lower thresholds and relative value functions in Sections 6 and 7, respectively. Assumption 4. For a given α ∈ [0, 1], the function E[hα (x−D)] is strictly decreasing on (−∞, rα ], where rα is defined in (2.6). Consider the function hα defined in (2.2). Let α∗ := sup{β : lim E[hβ (x − D)] ≤ K + inf E[hβ (x − D)]}, x→−∞

x∈X

(2.7)

where the supremum of an empty set is −∞. If Assumption 3(ii) holds, then α∗ < 1. For α > β > α∗ and x ≤ rβ E[hα (x − D)] − E[hα (rβ − D)] = (1 − α)(x − rβ ) ≥ (1 − β)(x − rβ ) = E[hβ (x − D)] − E[hβ (rβ − D)] ≥ 0,

(2.8)

where the equalities follow from (2.3) and the inequality holds because 1 − α < 1 − β and x − rβ ≤ 0. Therefore, (2.8) implies that rα ≥ rβ if α > β. Lemma 2.2. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). Then: (i) if α ˜ ∈ (α∗ , α], then limx→−∞ E[hα (x − D)] = ∞, equation 2.4 holds, and the function E[hα (x − D)] is nonincreasing on (−∞, rα ]; (ii) if α ˜ ∈ [0, α∗ ), then limx→−∞ E[hα (x − D)] = −∞, equation 2.4 does not hold, and the function E[hα (x − D)] is nondecreasing on X.

5

Proof. The proof is by contradictions. (i) Since α > α∗ , then limx→−∞ E[hα (x − D)] > K + inf x∈X E[hα (x − D)] > −∞. Assume that limx→−∞ E[hα (x − D)] < ∞. Then for all β ∈ [0, α), limx→−∞ E[hβ (x − D)] = limx→−∞ E[hα (x − D)] + (α − β)cx = −∞. The quasiconvexity of E[hβ (x − D)] implies that it is nondecreasing for all β ∈ [0, α). Then α ≤ α∗ , which contradicts with α > α∗ . Therefore, (i) holds. (ii) Since α < α∗ , then limx→−∞ E[hα (x − D)] ≤ K + inf x∈X E[hα (x − D)] < ∞. Assume that limx→−∞ E[hα (x−D)] > −∞. Then for all β ∈ (α, 1], limx→−∞ E[hβ (x− D)] = limx→−∞ E[hα (x − D)] − (β − α)cx = ∞. The quasiconvexity of E[hβ (x − D)] implies that limx→−∞ E[hβ (x − D)] > K + inf x∈X E[hβ (x − D)] for all β ∈ [0, α). Then α ≥ α∗ , which contradicts with α < α∗ . Therefore, (ii) holds. Lemma 2.3. If Assumption 2, then h(x) < 1. x→−∞ cx

α∗ = 1 + lim

(2.9)

Proof. As explained in Feinberg and Liang [10, Equations (2.3), (4.1)], (2.1) and the convexity of the function h imply that 1 + limx→−∞ h(x) cx < 1. h(x) ′ ′ ′ For α > 1 + limx→−∞ cx , since limx→−∞ hα (x) = limx→−∞ cx( h(x) cx + 1 − α ) = ∞, then limx→−∞ E[hα′ (x − D)] = ∞. Therefore, the convexity of the function hα′ implies that for α′ > 1 + limx→−∞ h(x) cx lim E[hα′ (x − D)] > K + inf E[hα′ (x − D)].

x→−∞

x∈X

(2.10)

h(x) ′ ′ For α′ < 1 + limx→−∞ h(x) cx , since limx→−∞ hα (x) = limx→−∞ cx( cx + 1 − α ) = −∞, then limx→−∞ E[hα′ (x − D)] = −∞. Therefore, the convexity of the function hα′ implies that for α′ < 1 + limx→−∞ h(x) cx

lim E[hα′ (x − D)] < K + inf E[hα′ (x − D)].

x→−∞

x∈X

In view of(2.10) and (2.11), α∗ = 1 + limx→−∞

(2.11)

h(x) cx .

The following two lemmas state the relationships between these assumptions. Lemma 2.4. Assumption 2 implies the validity of Assumption 1 and the validity of Assumption 4 for all α ∈ (α∗ , 1], where α∗ is defined in (2.7). Proof. We first prove that Assumption 2 implies the validity of Assumption 1. Since the function h is convex, then the function E[h(x−D)] is convex. Therefore, in view of (2.3), the function E[hα (x−D)] is convex for all α ∈ [0, 1]. Since every convex function is quasiconvex, then Assumption 1(i) holds. According to Lemma 2.3, α∗ < 1, where α∗ is defined in (2.9). Then (2.7) implies that Assumption 1(ii) holds for α ˜ ∈ (α∗ , 1). In view of (2.7), the convexity of the function E[hα˜ (x − D)] implies that Assumption 4 holds for all α ˜ ∈ (α∗ , 1]. 6

Lemma 2.5. Assumption 1 implies Assumption 3. Proof. Since α ∈ [0, 1), then Assumption 1 implies Assumption 3. Now we formulate an MDP for this inventory control model. The state and action spaces can be either (i) X = Z and A = N0 , if the demand D takes only integer values, or (ii) X = R and A = R+ , otherwise. The dynamic of the system is defined by the equation xt+1 = xt + at − Dt+1 ,

t = 0, 1, 2, . . . ,

(2.12)

where xt and at denote the current inventory level and the ordered amount at period t, respectively. The transition probability q(dxt+1 |xt , at ) for the MDP defined by the stochastic equation (2.12) is q(B|xt , at ) = P (xt + at − Dt+1 ∈ B)

(2.13)

for each measurable subset B of R. The one-step cost is c(x, a) := KI{a>0} +¯ ca + E[h(x + a − D)],

(x, a) ∈ X × A.

(2.14)

Let Ht = (X × A)t × X be the set of histories for t = 0, 1, . . . . Let Π be the set of all policies. A (randomized) decision rule at period t = 0, 1, . . . is a regular transition probability πt : Ht → A, that is, (i) πt (·|ht ) is a probability distribution on A, where ht = (x0 , a0 , x1 , . . . , at−1 , xt ), and (ii) for any measurable subset B ⊂ A, the function πt (B|·) is measurable on Ht . A policy π is a sequence (π0 , π1 , . . . ) of decision rules. Moreover, π is called non-randomized if each probability measure πt (·|ht ) is concentrated at one point. A non-randomized policy is called stationary if all decisions depend only on the current state. According to the Ionescu Tulcea theorem (see Hern´andez-Lerma and Lasserre [13, p. 178]), given the initial state x, a policy π defines the probability distribution Pxπ on the set of all trajectories H∞ = (X × A)∞ . We denote by Eπx the expectation with respect to Pxπ . For a finite-horizon N = 0, 1, . . . , let us define the expected total discounted costs π vN,α (x) := Eπx

−1 h NX t=0

i αt c(xt , at ) ,

x ∈ X,

(2.15)

π (x) = 0, x ∈ X. When N = ∞ and where α ∈ [0, 1] is the discount factor and v0,a α ∈ [0, 1), equation (2.15) defines the infinite-horizon expected total discounted cost denoted by vαπ (x). Let vα (x) := inf π∈Π vαπ (x), x ∈ X. A policy π is called optimal for π (x) = v π the respective criterion with discount factor α if vN,α N,α (x) or vα (x) = vα (x) for all x ∈ X. The average cost per unit time is defined as

wπ (x) := lim sup N →+∞

1 π v (x), N N,1

x ∈ X.

(2.16)

Define the optimal value function w(x) := inf π∈Π wπ (x), x ∈ X. A policy π is called average-cost optimal if wπ (x) = w(x) for all x ∈ X. 7

3

Setup-Cost Inventory Control Model with Discounted Costs

This section establishes the existence of optimal (sα , Sα ) polclies for the problems with discounted costs stated in Theorem 3.4. We start this section by verifying the weak continuity of the transition probability q defined in (2.13) and the K-infcompactness of the one-step cost function c defined in (2.14). These properties, stated in Assumption W*, imply the validity of optimality equations and the convergence of value iterations for problems with discounted costs; see Feinberg et al. [6, Theorem 4]. Recalled that a function f : U → R ∪ {+∞} for a metric space U, where U is a subset of a metric space U, is called inf-compact, if for every λ ∈ R the level set {u ∈ U : f (u) ≤ λ} is compact. Definition 3.1 (Feinberg et al. [7, Definition 1.1], Feinberg [5, Definition 2.1]). A function f : X × A → R is called K-inf-compact, if for every nonempty compact subset K of X, the function f : K × A → R is inf-compact. It is know for discounted MDPs that if the one-step cost function c and transition probability q satisfy the following condition, then it is possible to write the optimality equations for the finite-horizon and infinite-horizon problems, these equations define the sets of stationary and Markov optimal policies for infinite-horizon and finitehorizon, respectively, vα (x) = limN →∞ vN,α (x) for all x ∈ X, and the functions vN,α , N = 1, 2, . . . , and vα are lower semicontinuous; see Feinberg et al. [6, Theorems 3, 4]. Assumption W* (Feinberg et al. [6], Feinberg and Lewis [9], or Feinberg [5]). (i) The function c is K-inf-compact and bounded below, and (ii) the transition probability q(·|x, a) is weakly continuous in (x, a) ∈ X × A, ˜ Rthat is, for every bounded continuous function f : X → R, the function f (x, a) := X f (y)q(dy|x, a) is continuous on X × A. Lemma 3.2. The inventory control model satisfies Assumption W*, and the one-step cost function c is inf-compact.

Proof. Since the function E[h(x − D)] is continuous and tends to ∞ as |x| → ∞, then this lemma follows from the same arguments in Feinberg and Lewis [9, Corollary 5.2]. According to Feinberg and Lewis [9], since Assumption W* holds for the MDP corresponding to the described inventory control model, then the optimality equations for the total discounted costs can be written as vt+1,α (x) = min{min[K + Gt,α (x + a)], Gt,α (x)} − c¯x, a≥0

vα (x) = min{min[K + Gα (x + a)], Gα (x)} − c¯x, a≥0

8

t = 0, 1, 2, . . . ,

(3.1) (3.2)

where Gt,α (x) := c¯x + E[h(x − D)] + αE[vt,α (x − D)],

t = 0, 1, 2, . . . ,

Gα (x) := c¯x + E[h(x − D)] + αE[vα (x − D)],

(3.3) (3.4)

and v0,α (x) = 0 for all x ∈ X. Recall the definition of (s, S) policies. Suppose f (x) is a lower semicontinuous function such that lim|x|→∞ f (x) > K + inf x∈X f (x). Let S ∈ Argmin{f (x)},

(3.5)

x∈X

s = inf{x ≤ S : f (x) ≤ K + f (S)}.

(3.6)

Definition 3.3. Let st and St be real numbers such that st ≤ St , t = 0, 1, . . . . A policy is called an (st , St ) policy at step t if it orders up to the level St , if xt < st , and does not order, if xt ≥ st . A Markov policy is called an (st , St ) policy if it is an (st , St ) policy at all steps t = 0, 1, . . . . A policy is called an (s, S) policy if it is stationary and it is an (s, S) policy at all steps t = 0, 1, . . . . In this section, we consider Assumption 3, which guarantees the optimality of (sα , Sα ) policies for infinite-horizon problems with the discount factor α. The following theorem states the optimality of (sα , Sα ) policies for infinite-horizon discounted-cost problems with the discount factor α. Theorem 3.4. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). For the infinite-horizon problem, there exists an optimal (sα , Sα ) policy, where Sα and sa are real numbers such that Sα satisfies (3.5) and sa is defined in (3.6) with f (x) := Gα (x), x ∈ X. Remark 3.5. If Assumption 2 holds, then Theorem 3.4 is formulated in Feinberg and Liang [10, Theorem 4.4] with α∗ defined in (2.9). To prove Theorems 3.4, we first establish several auxiliary facts. Lemma 3.6. For x ≤ y and t = 1, 2, . . . vt,α (x) + cx ≤ vt,α (y) + cy + K,

(3.7)

vα (x) + cx ≤ vα (y) + cy + K,

(3.8)

Gt,α (y) − Gt,α (x) ≥ E[hα (y − D)] − E[hα (x − D)] − αK,

(3.9)

Gα (y) − Gα (x) ≥ E[hα (y − D)] − E[hα (x − D)] − αK.

(3.10)

Proof. In view of (3.1), for x ≤ y and t = 1, 2, . . . vt,α (x) + cx = min{min{K + Gt−1,α (x + a)}, Gt−1,α (x)} ≤ min{K + Gt−1,α (x + a)} a≥0

a≥0

≤ K + min Gt−1,α (x + a) = K + min Gt−1,α (y + a) a≥y−x

a≥0

≤ K + min{min{K + Gt−1,α (y + a)}, Gt−1,α (y)} = K + vt,α (y) + cy, a≥0

9

where the second inequality follows from y − x ≥ 0. Furthermore, (3.2) and the same arguments imply (3.8). In view of (3.3), for x ≤ y and t = 1, 2, . . . Gt,α (y) − Gt,α (x) =E[hα (y − D)] − E[hα (x − D)] + αE[vt,α (y − D) + c(y − D) − vt,α (x − D) − c(x − D)] ≤E[hα (y − D)] − E[hα (x − D)] + αK, where the inequality follows from (3.7). Furthermore, (3.4) and the same arguments imply (3.10). Let S0 := 0 and Sn :=

n X

n = 1, 2, . . . .

Dj ,

(3.11)

j=1

For t ∈ N0 and α ∈ [0, 1), let ft,α (x) be the expected total holding/backlog costs over the first t steps, if orders are not placed, ft,α (x) := c¯x +

t X

αi E[h(x − Si+1 )],

x ∈ X,

(3.12)

i=0

Observe that f0,α (x) = c¯x + E[h(x − D)] = G0,α (x). Let t n X αi ≥ Nα := inf t ∈ N0 : i=0

1 o , 1 − α∗

(3.13)

where the infimum of an empty set is +∞. Lemma 3.7. If the function Gt,α (x) is nondecreasing on X, then the minimum in the optimality equation (3.1) is achieved at the action a = 0 for all x ∈ X and t = 0, 1, . . . . Proof. Since the function Gt,α (x) is nondecreasing, then Gt,α (x) ≤ K + Gt,α (x + a) for all x ∈ X. In view of (3.1), the action a = 0 is optimal at the epoch t. Lemma 3.8. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). If for some t0 = 0, 1, . . . and M0 > 0, such that the inequality Gt,α (y) − Gt,α (x) ≤ αt−t0 M0

(3.14)

holds for t = t0 and x ≤ y ≤ rα , where rα is defined in (2.6), then inequalities (3.14) and vt,α (y) + cy − vt,α (x) − cx ≤ αt−t0 M0 hold for t = t0 + 1, t0 + 2, . . . and for x ≤ y ≤ rα . 10

(3.15)

Proof. The proof is by induction on t. Assume that (3.14) holds for t = k ≥ t0 , then for x ≤ y ≤ rα  vk+1,α (x) + cx + αk−t0 M0 = min Gk,α (x), min{K + Gk,α (x + a)} + αk−t0 M0 a≥0  k−t0 = min Gk,α (x) + α M0 , min{K + Gk,α (x + a) + αk−t0 M0 } a≥0  ≥ min Gk,α (y), min {K + Gk,α (x + a) + αk−t0 M0 }, 0≤a 0 and αk−t0 M0 ≥ 0. Thus, (3.15) holds for t = k + 1. In addition, for x ≤ y ≤ rα Gk+1,α (y) − Gk+1,α (x) =E[hα (y − D)] − E[hα (x − D)] + αE[vk+1,α (y − D) + c(y − D) − vk+1,α (x − D) − c(x − D)] ≤ αk+1−t0 M0 , where the equality follows from (3.3) and the inequality holds because the function E[hα (x − D)] is nonincreasing on (−∞, rα ] and (3.15). Since (3.14) holds for t = t0 , then this completes the induction arguments. Lemma 3.9. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). Then Nα < ∞ and Gt,α (x) = ft,α (x) for t = 0, 1, . . . , Nα , where the functions ft,α (x) and the number Nα are defined in (3.12) and (3.13) respectively. P Proof. The proof is by induction on t. Since ti=0 αi is increasing in t and for α > α∗ lim

t→∞

t X

1 1 > , 1−α 1 − α∗

αi =

i=0

1 then Nα < +∞. Since < 1 under Assumption 3, then α < 1 < 1−α ∗ . Therefore, Pt 1 i i=0 α < 1−α∗ for t = 0, 1, . . . , Nα − 1. In view of (3.3) and (3.12), G0,α = f0,α . Assume that Gt,α = ft,α for t = k < Nα , P 1 where k ≥ 0. Consider α′ < α∗ such that ki=0 αi ≤ 1−α ′ . Since Gk,α = fk,α , then for all x ∈ X

α∗



Gk,α (x) = fk,α(x) = (1 − (1 − α ) = (1 − (1 − α′ )

k X i=0

k X

i

α )cx +

i=0 k X

αi )cx +

i=0

k X i=0

  αi (1 − α′ )cx + E[h(x − Si+1 )]

  αi E[hα′ (x − Si+1 )] + α′ cE[Si+1 ] , 11

(3.16)

P where the second follows from (3.12) and Si+1 is defined in (3.11). Since ki=0 αi ≤ Pk 1 ′ i i=0 α ≥ 0 and the first summand in the last expression in 1−α′ , then 1 − (1 − α ) (3.16) is nondecreasing in x. According to Lemma 2.2(ii), the function E[hα′ (x−D)] is nondecreasing, which implies that the function E[hα′ (x−Si+1 )] = E[hα′ (x−Si −D)] is nondecreasing for all i = 0, 1, . . . . Therefore, (3.16) implies that the function Gt,α (x) is nondecreasing in x. For t = k + 1 Gk+1,α (x) = c¯x + E[h(x − D)] + αE[Gk,α (x − D) − c¯(x − D)] = c¯x + E[h(x − S1 )] + αE[fk,α (x − D) − c¯(x − D)] = c¯x +

k+1 X

αi E[h(x − Si+1 )] = fk+1,α(x),

(3.17)

i=0

where the first equality follows from (3.3), (3.1), and Lemma 3.7, the second one follows from the induction assumption, and the last two equalities follow from (3.12). Lemma 3.10. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). Then there exist real numbers Nα′ ≤ Nα + 1 < ∞ and M0 > 0 such that for all x ≤ y ≤ rα GNα′ ,α (y) − GNα′ ,α (x) ≤ M0 ,

(3.18)

where rα is defined in (2.6). Proof. We first prove that there exist Nα′ ∈ {Nα , Nα + 1} and α′ ∈ [α∗ , α) such that PNα′ 1 i=0 ≥ 1−α′ , limx→−∞ E[hα′ (x − D)] > inf x E[hα′ (x − D)], and GNα′ ,α = fNα′ ,α . P α i 1 In view of the definition of Nα in (3.13), N i=0 α ≥ 1−α∗ . Then we consider two P P α i Nα i 1 1 cases: (a) N i=0 α = 1−α∗ . i=0 α > 1−α∗ ; and (b) P∞ i P Nα i 1 1 (a) Since 1−α ˆ ∈ (α∗ , α) such ∗ < i=0 α < i=0 α = 1−α , then there exists α P Nα i ˆ > α∗ , then limx→−∞ E[hαˆ (x − that i=0 α > 1−1 αˆ . According to Lemma 2.2, since α ′ D)] > inf x E[hαˆ (x − D)]. Therefore, choose Nα = Nα and α′ = α ˆ . According to Lemma 3.9, GNα ,α (x) = fNα ,α (x) and Nα < ∞. P α i 1 (b) In this case, N i=0 α = 1−α∗ , we consider two cases: (b1) limx→−∞ E[hα∗ (x − D)] > inf x E[hα∗ (x − D)]; and (b2) the function E[hα∗ (x − D)] is nondecreasing on X. These two cases are mutually exclusive because of the quasiconvexity of the function E[hα∗ (x − D)]. (b1) Since limx→−∞ E[hα∗ (x − D)] > inf x E[hα∗ (x − D)], choose Nα′ = Nα and ′ α = α∗ . Lemma 3.9 implies that GNα ,α (x) = fNα ,α (x) and Nα < ∞. (b2) Since Lemma 3.9 implies that GNα ,α (x) = fNα ,α (x) and Nα < ∞, then in

12

view of (3.16), Nα X



GNα ,α (x) = (1 − (1 − α )

Nα X

i

α )cx +

i=0

i=0

=

Nα X i=0

  αi E[hα∗ (x − Si+1 )] + α∗ cE[Si+1 ]

  αi E[hα∗ (x − Si+1 )] + α∗ cE[Si+1 ] ,

P α i 1 where the second equality holds because N i=0 α = 1−α∗ . Since the function E[hα∗ (x− D)] is nondecreasing, then the function E[hα∗ (x − Si+1 )] = E[hα∗ (x − Si − D)], i = 0, 1, . . . , is nondecreasing, which implies that the function GNα ,α (x) is nondecreasing. Therefore, (3.17) and Lemma 3.7 imply that GNα +1,α = fNα +1,α . Since PNα +1 i P∞ i 1 1 ˆ ∈ (α∗ , α) such that i=0 α < i=0 α = 1−α , then there exists α 1−α∗ < PNα +1 i 1 ˆ > α∗ , then limx→−∞ E[hαˆ (x − i=0 α > 1−α ˆ . According to Lemma 2.2, since α ′ ˆ. D)] > inf x E[hαˆ (x − D)]. Therefore, choose Nα = Nα + 1 and α′ = α Now we prove that there exists a finite number M0 > 0 such that fNα′ ,α (y) − fNα′ ,α (x) ≤ M0

x ≤ y ≤ rα .

(3.19)

Since α′ ∈ [α∗ , α) and limx→−∞ E[hα′ (x − D)] > inf x E[hα′ (x − D)], then the function E[hα′ (x − D)] is nonincreasing on (−∞, rα′ ], where rα′ is defined in (2.6). In view of (3.16), the function fNα′ ,α (x) can be written as ′



fNα′ ,α (x) = (1 − (1 − α′ )

Nα X

αi )cx +

Nα X i=0

i=0

  αi E[hα′ (x − Si+1 )] + α′ cE[Si+1 ] .

(3.20)

PNα′ i PNα′ i 1 ′ Since i=0 α ≥ 1−α ′ , then 1 − (1 − α ) i=0 α ≤ 0 and the first summand in the last expression in (3.20) is nonincreasing in x on X. Since the function E[hα′ (x − D)] is nonincreasing on (−∞, rα′ ], then the function E[hα′ (x − Si+1 )] = E[hα′ (x − Si − D)] is nonincreasing on (−∞, rα′ ) for all i = 0, 1, . . . . Therefore, (3.20) implies that the function fNα′ ,α (x) is nonincreasing on (−∞, rα′ ], that is, for x ≤ y ≤ rα′ fNα′ ,α (y) − fNα′ ,α (x) ≤ 0.

(3.21)

Since α > α′ , then rα′ ≤ rα ; see the paragraph below (2.8). In view of (3.16), ′



fNα′ ,α (x) = (1 − (1 − α)

Nα X

αi )cx +

i=0

For rα′ ≤ x ≤ y ≤ rα

Nα X i=0

  αi E[hα (x − Si+1 )] + αcE[Si+1 ] .

(3.22)

fNα′ ,α (y) − fNα′ ,α (x) ′

=(1 − (1 − α)

Nα X



αi )c(y − x) +

i=0

≤M0 := c(rα − rα′ ) < ∞,

Nα X

αi E[hα (y − Si+1 ) − hα (x − Si+1 )]

i=0

13

(3.23)

where the first equality follows from (3.22), the first inequality holds because (1 − PNα′ i α ) ≤ 1, 0 ≤ y − x ≤ rα − rα′ , and the function E[hα (z − Si+1 )] = (1 − α) i=0 E[hα (z − Si − D)] is nonincreasing in z on (−∞, rα ], and the last inequality holds because rα and rα′ are finite numbers. For x ≤ rα′ ≤ y ≤ rα fNα′ ,α (y) − fNα′ ,α (x) = fNα′ ,α (y) − fNα′ ,α (rα′ ) + fNα′ ,α (rα′ ) − fNα′ ,α (x) ≤ M0 , (3.24) where the inequality follows from (3.21) and (3.23). Therefore, (3.19) follows from (3.21), (3.23) and (3.24). Since GNα′ ,α = fNα′ ,α , then (3.19) implies (3.18). Lemma 3.11. Let Assumption 3 hold for the discount factor α and α ∈ (α∗ , 1), where α∗ is defined in (2.7). Then for x ≤ y ≤ rα vα (y) + cy − vα (x) − cx ≤ 0

and

(3.25)

Gα (y) − Gα (x) ≤ 0.

(3.26)

Proof. According to Lemmas 3.8 and 3.10, there exist real numbers Nα′ and M0 > 0 such that for t = Nα′ + 1, Nα′ + 2, . . . ′

vt,α (y) + cy − vt,α (x) − cx ≤ αt−Nα M0 t−Nα′

Gt,α (y) − Gt,α (x) ≤ α

and

M0 .

(3.27) (3.28)

According to Lemma 3.2, since Assumption W* holds and the costs are nonnegative, then Feinberg et al. [6, Theorem 2] implies that vt,α (x) ↑ vα (x) and, in view of (3.4) and Lebesgue’s monotone convergence theorem, Gt,α (x) ↑ Gα (x) for each x ∈ X. Therefore, (3.25) and (3.26) follow from (3.27) and (3.28), respectively. Proof of Theorem 3.4. According to Lemma 3.2, since Assumption W* holds, then Feinberg et al. [6, Theorem 2] implies that vα (x) and Gα (x) are lower semicontinuous functions. In view of Lemma 3.11, the function Gα (x) is nonincreasing on (−∞, rα ], where rα is defined in (2.6). In view of (3.4), Gα (x) ≥ cx → ∞ as x → ∞. Therefore, the function Gα (x) is inf-compat. Let Sα satisfy (3.5) and sα be defined in (3.6) with f := Gα . The lower semicontinuity of Gα (x) implies that Gα (sα ) ≤ Gα (Sα ) + K.

(3.29)

Since the function Gα (x) is nonincreasing on (−∞, rα ], then Sα ≥ rα .

(3.30)

To prove the optimality of (sα , Sα ) policies, we consider three cases: (i) x ≥ rα ; (ii) sα ≤ x ≤ rα ; and (iii) x < sα . (i) In view of Lemma 3.6, for rα ≤ x < y Gα (y) + K − Gα (x) ≥ E[hα (y − D)] − E[hα (x − D)] + K − αK > 0, 14

(3.31)

where the first inequality follows from (3.10) and the second one holds because the function E[hα (x − D)] is nondecreasing on [rα , ∞) and K − αK > 0. Therefore, the action a = 0 is optimal for x ≥ rα . In addition, (3.31) implies that Gα (x) < Gα (Sα ) + K for all x ∈ [rα , Sα ], which implies that sα ≤ rα .

(3.32)

(ii) For sα ≤ x ≤ rα Gα (x) ≤ Gα (sα ) ≤ K + Gα (Sα ) = K + min Gα (y), y∈X

where the first inequality follows from (3.26) and the second one follows from (3.29). Therefore, the action a = 0 is optimal for sα ≤ x ≤ rα . (iii) For x < sα Gα (x) > K + Gα (Sα ) = K + min Gα (y), y∈X

where the inequality follows from the definition of sα in (3.6) with f := Gα . Therefore, the action a = Sα − x is optimal for x < sα . Thus, the (sα , Sα ) policy is optimal for the discount factor α. Observe that formulae (3.30) and (3.32) imply that sα ≤ rα ≤ Sα ,

(3.33)

if Assumption 3 holds for the discount factor α.

4

Verification of Average-Cost Optimality Assumptions for the Setup-Cost Inventory Control Model

In this section we show that, in addition to Assumption W*, under Assumption 1, the setup-cost inventory control model satisfies Assumption B introduced by Sch¨ al [19]. This implies the validity of average-cost optimality inequalities (ACOIs) and the existence of stationary optimal policies; see Feinberg et al. [6, Theorem 4]. In addition, we show that, under Assumption 1, the inventory control model satisfies the equicontinuity condition from Feinberg and Liang [11, Theorem 3.2], which implies the validity of the ACOE for the inventory control model. As in Sch¨ al [19] and Feinberg et al. [6], define mα := inf vα (x), x∈X

uα (x) := vα (x) − mα ,

¯ := lim sup(1 − α)mα . w := lim inf (1 − α)mα , w α↑1

(4.1)

α↑1

The function uα is called the discounted relative value function. Assume that the following assumption holds in addition to Assumption W*. 15

Assumption B. (i) w∗ := inf x∈X w(x) < +∞, and (ii) sup uα (x) < ∞, x ∈ X. α∈[0,1)

As follows from Sch¨ al [19, Lemma 1.2(a)], Assumption B(i) implies that mα < +∞ for all α ∈ [0, 1). Thus, all the quantities in (4.1) are defined. According to Feinberg et al. [6, Theorems 3, 4], if Assumptions W* and B hold, then w = w ¯ and therefore, ¯ lim(1 − α)mα = w = w.

(4.2)

α↑1

Define the following function on X for the sequence {αn ↑ 1}n=1,2,... : u ˜(x) :=

lim inf

n→+∞,y→x

uαn (y).

(4.3)

In words, u ˜(x) is the largest number such that u ˜(x) ≤ lim inf n→∞ uαn (yn ) for all sequences {yn → x}. Since uα (x) is nonnegative by definition, then u ˜(x) is also nonnegative. The function u ˜, defined in (4.3) for a sequence {αn ↑ 1}n=1,2,... of nonnegative discount factors, is called an average-cost relative value function. If Assumptions W* and B hold, then Feinberg et al. [6, Corollary 2] implies the validity of ACOIs and ¯ = w∗ , wφ (x) = w = lim(1 − α)vα (x) = w α↑1

x ∈ X.

(4.4)

Furthermore, let us define w := w; see (4.2) and (4.4) for other equalities for w. Consider the renewal process N(t) := sup{n = 0, 1, . . . | Sn ≤ t},

(4.5)

where t ∈ R+ and Sn is defined in defined in (3.11). Observe that since P (D > 0) > 0, then E[N(t)] < +∞, t ∈ R+ ; see Resnick [17, Theorem 3.3.1]. For x ∈ X and y ≥ 0 define Ey (x) := E[h(x − SN(y)+1 )].

(4.6)

Since x − y ≤ x − SN(y) ≤ x and the function E[h(x − D)] is quasiconvex, then Ey (x) = E[h(x − SN(y) − D)] ≤ max{E[h(x − y − D)], E[h(x − D)]} < ∞.

(4.7)

Lemma 4.1. Let Assumption 1 hold. The inventory control model satisfies Assumption B. Proof. Assumption B(i) follows from the same arguments in the first paragraph of the proof of Feinberg and Lewis [9, Proposition 6.3]. The inf-compactness of the function c : X × A → R and the validity of Assumption W* imply that for each α ∈ [0, 1) the function vα is inf-compact (Feinberg and Lewis [8, Proposition 3.1(iv)]), and therefore the set Xα := {x ∈ X| vα (x) = mα }, 16

(4.8)

where mα is defined in (4.1), is nonempty and compact. Furthermore, the validity of Assumption B(i) implies that there is a compact subset K of X such that Xα ⊆ K for all α ∈ [0, 1); Feinberg et. al. [6, Theorem 6]. Following Feinberg and Lewis [9], let us consider a bounded interval [x∗L , x∗U ] ⊆ X such that Xα ⊆ [x∗L , x∗U ]

for all α ∈ [0, 1).

(4.9)

Consider an arbitrary α ∈ [0, 1) and a state xα such that vα (xα ) = mα , where mα is defined in (4.1). Then, in view of (4.9), the inequalities x∗L ≤ xα ≤ x∗U take place. Let E(x) := E[h(x − D)] + Ex−x∗L (x) < ∞,

(4.10)

where the function Ey (x) is defined in (4.6) and its finiteness is stated in (4.7). Since the function E[h(x − D)] is quasiconvex, then for xt = x − St , t = 1, . . . , N(x − x∗L ) + 1 x − SN(x−x∗L )+1 = x − SN(x−x∗L ) − D ≤ xt = xt−1 − D ≤ x − D and E[h(xt )] ≤ E[h(x − D)] + E[h(x − SN(x−x∗L )+1 )] = E(x).

(4.11)

By considering the same policy σ and following the arguments thereafter as in the proof of Feinberg and Lewis [9, Proposition 6.3] with the equation (6.14) there replaced with (4.11), we obtain the validity of Assumption B. Now we establish the boundedness and the equicontinuity of the discounted relative value functions uα defined in (4.1). Consider ( K + c¯(x∗U − x), if x < x∗L , (4.12) U (x) := K + c¯(x∗U − x∗L ) + (E(x) + c¯E[D])(1 + E[N(x − x∗L )]), if x ≥ x∗L , where the real numbers x∗L and x∗U are defined in (4.9) and the function E(x) is defined in (4.10). Lemma 4.2. Let Assumption 1 hold. The following inequalities hold for α ∈ [0, 1) : (i) uα (x) ≤ U (x) < +∞ for all x ∈ X; (ii) If x∗ , x ∈ X and x∗ ≤ x, then C(x∗ , x) := supy∈[x∗ ,x] U (y) < +∞; (iii) E[U (x − D)] < +∞ for all x ∈ X. Proof. The proof of this lemma is identical to the proof of Feinberg and Liang [11, Lemma 4.6].

17

The following theorem is proved in Feinberg and Lewis [9, Theorem 6.10(iii)] under Assumption 2. Under Assumption 1, since Assumptions W* and B hold and there exist α-discounted optimal (sα , Sα ) policies for all α ∈ (α∗ , 1), where α∗ is defined in (2.7), then the proof of the following lemma coincides with the proof of Feinberg and Lewis [9, Theorem 6.10(iii)]. Theorem 4.3. Let Assumption 1 hold. For each nonnegative discount factor α ∈ (α∗ , 1), consider an optimal (s′α , Sα′ ) policy for the discounted criterion with the discount factor α. Let {αn ↑ 1}n=1,2,... be a sequence of negative numbers with α1 > α∗ . Every sequence {(s′αn , Sα′ n )}n=1,2,... is bounded, and each its limit point (s∗ , S ∗ ) defines an average-cost optimal (s∗ , S ∗ ) policy. Furthermore, this policy satisfies the optimality inequality w+u ˜(x) ≥ min{min[K + H(x + a)], H(x)} − c¯x,

(4.13)

H(x) := c¯x + E[h(x − D)] + E[˜ u(x − D)],

(4.14)

a≥0

where

where the function u ˜ is defined in (4.3) for an arbitrary subsequence {αnk }k=1,2,... of {αn }n=1,2,... satisfying (s∗ , S ∗ ) = limk→∞ (s′αn , Sα′ n ). k

k

Recall the following definition of equicontinuity. Definition 4.4. A family of functions H of real-valued functions on a metric space X is called equicontinuous at the point x ∈ X if for each ǫ > 0 there exists an open set G containing x such that |h(y) − h(x)| < ǫ for all y ∈ G and for all h ∈ H. The family of functions H is called equicontinuous (on X) if it is equicontinuous at all x ∈ X. The following theorem provides sufficient conditions under which there exist a stationary policy φ and a function u ˜(·) such that the ACOEs hold for MDPs satisfying Assumptions W* and B. Theorem 4.5 (Feinberg and Liang [11, Theorem 3.2]). Let Assumptions W* and B hold. Consider a sequence {αn ↑ 1}n=1,2,... of nonnegative discount factors. If the sequence {uαn }n=1,2,... is equicontinuous and there exists a nonnegative measurable R function U (x), x ∈ X, such that U (x) ≥ uαn (x), n = 1, 2, . . . , and X U (y)q(dy|x, a) < +∞ for all x ∈ X and a ∈ A, then the following statements hold. (i) There exists a subsequence {αnk }k=1,2,... of sequence {αn }n=1,2,... such that the ˜(x), x ∈ X, where u ˜(x) is defined in functions {uαnk (x)} converges pointwise to u (4.3) for the sequence {αnk }k=1,2,... . In addition, the function u ˜(x) is continuous.

18

(ii) There exists a stationary policy φ satisfying the ACOE with the nonnegative function u ˜ defined for the subsequence {αnk }k=1,2,... mentioned in statement (i), that is, for all x ∈ X Z u ˜(y)q(dy|x, φ(x)) w+u ˜(x) = c(x, φ(x)) + Z X u ˜(y)q(dy|x, a)], (4.15) = min[c(x, a) + a∈A

X

and every stationary policy satisfying (4.15) is average-cost optimal. The following theorem shows that the equicontinuity conditions stated in Theorem 4.5 holds for the inventory control model with holding/backlog costs satisfying quasiconvexity assumptions. Theorem 4.6. Let Assumption 1 hold. Consider α∗ defined in (2.7). Then there exists ǫ∗ > 0 such that α∗ + ǫ∗ < 1 and the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous on X. Proof. To prove this lemma, we first follow the same procedure as in the proof of Feinberg and Liang [11, Lemma 4.7] to prove that the family of function {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous on (−∞, M ] for any given M. According to Theorem 4.3, since each sequence {(sαn , Sαn )}n=1,2,... is bounded and α∗ < 1, then there exist constants ǫ∗ > 0, b > 0, and δ0 > 0 such that sα ∈ (−b, b), Sα ∈ (−b, b) and for all α ∈ [α∗ + ǫ∗ , 1) −b ≤ sα − δ0 < sα + δ0 ≤ b.

(4.16)

Let α ∈ [α∗ +ǫ∗ , 1). Consider M > b, z1 and z2 satisfying M > z1 , z2 ≥ sα . Without loss of generality, assume that z1 < z2 . With αn replaced with α, the arguments provided before Equation (4.19) in the proof of Feinberg and Liang [11, Lemma 4.7] imply that |uα (z1 ) − uα (z2 )| = |vα (z1 ) − vα (z2 )| −sα )+1 N(z1X ˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )) = E[ αj−1 (h(z j=1

+ αN(z1 −sα )+1 (vα (z1 − SN(z1 −sα )+1 ) − vα (z2 − SN(z1 −sα )+1 ))] N(M +b)+1

≤E[

X

˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )|] |h(z

j=1

+ E[|uα (z1 − SN(z1 −sα )+1 ) − uα (z2 − SN(z1 −sα )+1 )|],

(4.17)

˜ where h(x) := E[h(x − D)], the inequality holds because of α < 1, in view of the standard properties of expectations and absolute values, and because −b < sα ≤ z1 < 19

˜ ˜ is M. Recall that the function h(x) is continuous and finite. Therefore, the function h uniformly continuous on the closed interval [−(M +2b), M ]. In addition, Assumption 1 ˜ is quasiconvex. implies that the function h Since −(M +2b) < x−Sj−1 < M for all x ∈ [−b, M ] and j = 1, 2, . . . , N(M +b)+1, ˜ imply that for all x ∈ (−b, M ) then the nonnegativity and quasiconvexity of h ˜ − Sj−1 ) ≤ max{h(−(M ˜ ˜ 0 ≤ h(x + 2b)), h(M )}.

(4.18)

Furthermore, for −b < z1 < z2 < M, N(M +b)+1

E[

X

˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )|] |h(z

X

˜ ˜ max{h(−(M + 2b)), h(M )}]

j=1

N(M +b)+1

≤E[

j=1

˜ ˜ ≤E[N(M + b) + 1] max{h(−(M + 2b)), h(M )} < ∞,

(4.19)

where the first inequality follows from (4.18), the second one follows from Wald’s ˜ Therefore, for identity, and the last one follows from the finiteness of the function h. −b < z1 < z2 < M, N(M +b)+1

lim E[

z1 →z2

˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )|] |h(z

X j=1

N(M +b)+1

X

=E[

j=1

˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )|] = 0, lim |h(z

z1 →z2

(4.20)

where the first equality follows from (4.19) and Lebesgue’s dominated convergence ˜ theorem, and the second one follows from the continuity of h. Consider ǫ > 0. In view of (4.20), since −b < z1 < z2 < M, then there exists δ1 ∈ (0, δ0 ) such that for sα ≤ z1 < z2 satisfying |z1 − z2 | < δ1 N(M +b)+1

E[

X j=1

˜ 1 − Sj−1 ) − h(z ˜ 2 − Sj−1 )|] ≤ ǫ . |h(z 2

(4.21)

The remaining proof coincides with the proof starting from Equation (4.22) in Feinberg and Liang [11] with Equations (4.19) and (4.21) there replaced with Equations (4.17) and (4.21) from this section, respectively, and the Lipshitz continuity ˜ on the interval [−(M + 2b), M ]. replaced with the uniform continuity of the function h Then there exists δ3 such that for all x, y ∈ (−∞, M ) satisfying |x − y| < δ3 |uα (x) − uα (y)| < ǫ. Since M can be chosen arbitrarily large, then the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous on X. 20

5

Setup-Cost Inventory Control Model: Average Costs per Unit Time

The following theorem establishes the validity of the ACOEs for the described inventory control model and the optimality of (s, S) policies under the average cost criterion. Theorem 5.1. Let Assumption 1 hold. Consider the MDP defined for the described inventory control model. For every sequence {αn ↑ 1}n=1,2,... of nonnegative discount factors with α1 ≥ α∗ + ǫ∗ , where α∗ and ǫ∗ are defined in Theorem 4.6, there exist a stationary policy ϕ and a function u ˜ defined in (4.3) such that for all x ∈ X w+u ˜(x) = KI{ϕ(x)>0} + H(x + ϕ(x)) − c¯x = min{min[K + H(x + a)], H(x)} − c¯x, a≥0

(5.1)

where the function H is defined in (4.14). In addition, the functions u ˜ and H are continuous and inf-compact, and a stationary optimal policy ϕ satisfying (5.1) can be selected as an (s∗ , S ∗ ) policy described in Theorem 4.3. It also can be selected as an (s, S) policy with the real numbers S and s satisfying (3.5) and defined in (3.6) respectively for f (x) = H(x), x ∈ X. To prove Theorem 5.1, we first establish several properties of the average-cost relative value function. Lemma 5.2. Let Assumption 1 hold. Consider the function u ˜ defined in (4.3) for ∗ a sequence {αn }n=1,2,... such that αn ↑ 1 and α1 > α , where α∗ is defined in (2.7). Then the following statements hold: (i) For x ≤ y ˜(y) + cy + K u ˜(x) + cx ≤ u

and

H(y) − H(x) ≥ E[h(y − D)] − E[h(x − D)] − αK.

(5.2) (5.3)

(ii) For x ≤ y ≤ r1 u ˜(y) + cy − u ˜(x) − cx ≤ 0

and

H(y) − H(x) ≤ 0.

(5.4) (5.5)

Proof. (i) In view of (3.8), (5.2) holds because uα (x) − uα (y) = vα (x) − vα (y) and uαn (x) converges pointwise to u ˜(x) as n → ∞. For x ≤ y, H(y) − H(x) =E[h(y − D)] + αE[˜ u(y − D) + c(y − D)] − E[h(x − D)] − αE[˜ u(x − D) + c(x − D)] ≤E[hα (y − D)] − E[hα (x − D)] + αK, 21

where the equality follows from (4.14) and the inequality follows from (5.2). (ii) In view of (3.25), (5.4) holds because uα (x) − uα (y) = vα (x) − vα (y) and uαn (x) converges pointwise to u ˜(x) as n → ∞. For x ≤ y ≤ r1 , H(y) − H(x) ˜(x − D) − c(x − D)] ≤ 0, =E[h(y − D)] − E[h(x − D)] + αE[˜ u(y − D) + c(y − D) − u where the equality follows from (4.14) and the inequality follows from that the function E[h(x − D)] is nonincreasing on (−∞, r1 ] and (5.4). Proof of Theorem 5.1. The proof of this thoerem is identical to the proof of Feinberg and Liang [11, Theorem 4.5] with (i) Lemmas 4.6 and 4.7 there replaced with Lemma 4.2 and Theorem 4.6 from this paper, respectively; and (ii) the proof of the K-convexity of the functions u and H and the optimality of (s, S) policies under the average cost criterion replaced with the following arguments: consider the cases (i)-(iii) in the proof of Theorem 3.4 with Gα , hα and rα replaced with H, h and r1 , respectively. Then Lemma 5.2 implies that there exists an optimal (s, S) policy, with the real numbers S and s satisfying (3.5) and defined in (3.6) for f (x) := H(x), x ∈ X. Furthermore, the continuity of average-cost relative value functions implies the following corollary. Corollary 5.3. Let Assumption 1 hold, the state space X = R, and the action space A = R+ . For the (s, S) policy defined in Theorems 5.1, consider the stationary policy ϕ coinciding with this policy at all x ∈ X, except x = s, and with ϕ(s) = S − s. Then the stationary policy ϕ also satisfies the optimality equation (5.1), and therefore the policy ϕ is average-cost optimal. Proof. Since the proof of the optimality of (s, S) policies is based on the fact that K + H(S) < H(x), if x < s, and K + H(S) ≥ H(x), if x ≥ s. Since the function H is continuous, we have that K + H(S) = H(s). Thus both actions are optimal at the state s.

6

Convergence of Optimal Lower Thresholds sα

This section establishes for the inventory control model with holding/backlog costs satisfying quasiconvexity assumptions the convergence of discounted optimal lower thresholds sα → s as α ↑ 1, where s the average-cost optimal lower threshold (stated in Theorem 5.1). In this and the following sections, we assume that the state space X = R and the action space A = R+ . The quasiconvexity of E[h(x − D)] assumed in Assumption 1 implies that the function E[h(x − D)] is nonincreasing on (−∞, r1 ), where r1 is defined in (2.6). The following approach requires a slightly stronger condition. 22

The following theorem establishes the convergence of the discounted optimal lower thresholds sα when the discount factor α converges to 1. Theorem 6.1. Let Assumptions 1 and 4 hold. Then the limit s∗ := lim sα

(6.1)

α↑1

exists and s∗ ≤ r1 , where r1 is defined in (2.6). Remark 6.2. As shown in Corollary 7.4, if Assumption 1 holds and Assumption 4 holds for α = 1, then all sequences {αn ↑ 1}n=1,2,... define the same functions u ˜ and H in (5.1), and there exists a unique threshold s∗ , for which there is an (s∗ , S ∗ ) policy satisfying the ACOE (5.1). Before the proof of Theorem 6.1, we first state several auxiliary facts. Consider v α (x) := vα (x) + cx,

x ∈ X.

(6.2)

Then the optimality equations (3.2) can be written as v α (x) = min{min[K + Gα (x + a)], Gα (x)},

(6.3)

Gα (x) = E[hα (x − D)] + αE[v α (x − D)].

(6.4)

a≥0

where

For x ∈ X, define mα := min{v α (x)} x∈X

and uα (x) := v α (x) − mα .

(6.5)

If there exists an α-discount optimal (sα , Sα ) policy, then (6.3) can be written as ( Gα (x) if x ≥ sα , (6.6) v α (x) = K + Gα (Sα ) if x < sα , which implies that mα = min{Gα (x)} = Gα (Sα ). x∈X

(6.7)

Consider xα ∈ Xα , where Xα is defined in (4.9). For α ∈ [0, 1) mα ≤ v α (xα ) = mα + cxα ≤ mα + cx∗U ,

(6.8)

where x∗U is defined in (4.9). In view of (6.6), the continuity of v α (x) implies that v α (x) = v α (sα ) for all x ≤ sα . Therefore, mα = inf v α (x) = inf {vα (x) + cx} x≥sα

x≥sα

≥ inf {vα (x) + csα } ≥ mα + csα , x≥sα

23

(6.9)

where the first inequality holds because x ≥ sα and the last one follows from mα = inf x vα (x). Combining (6.8) and (6.9) yields mα + csα ≤ mα ≤ mα + cx∗U .

(6.10)

For α ∈ (α∗ , 1) define the set of all possible optimal discounted lower thresholds Gα := {x ≤ Sα : Gα (y) = K + Gα (Sα ) for all y ∈ [sα , x]},

(6.11)

where Sα satisfies (3.5) and sα is defined in (3.6) with f := Gα . Note that sα ∈ Gα and y ≥ sα for all y ∈ Gα . Lemma 6.3. Let Assumption 1 hold. Then, for all α ∈ (α∗ , 1) and y ∈ Gα , (1 − α)(mα + K) = E[hα (y − D)].

(6.12)

Proof. According to (6.6), v α (x) = K + Gα (Sα ) = K + mα for x ≤ y. In view of (6.4) and (6.11), K + mα = Gα (y) = E[hα (y − D)] + αE[v α (y − D)] = E[hα (y − D)] + α(K + mα ), which implies (6.12). Lemma 6.4. Let Assumption 1 hold. Then, for all α ∈ (α∗ , 1) and y ∈ Gα , y ≤ rα ≤ r1 .

(6.13)

Proof. Since E[h(x − D)] ≥ E[h(r1 − D)] and (1 − α)cx ≥ (1 − α)cr1 for x ≥ r1 , then, for x ≥ r1 , E[hα (x − D)] = E[h(x − D)] + (1 − α)cx + αcE[D] ≥ E[h(r1 − D)] + (1 − α)cr1 + αcE[D] = E[hα (r1 − D)]. Therefore, rα ≤ r1 . The following proof is by contradiction. Assume that there exist α ∈ (α∗ , 1) and y ∈ Gα such that y > rα . According to (6.6), v α (x) = K + mα for x ≤ y. Therefore, (6.4) implies that for x ≤ y Gα (x) = E[hα (x − D)] + α(K + mα ).

(6.14)

The definition (2.6) of rα and (6.14) imply that Gα (rα ) ≤ Gα (y). According to the definition of sα in (3.6), Gα (x) ≥ Gα (y) for x ≤ y. Therefore, Gα (rα ) = Gα (y), which implies that E[hα (rα − D)] + α(K + mα ) = Gα (rα ) = Gα (y) = K + Gα (Sα ) = K + E[hα (Sα − D)] + αE[v α (Sα − D)] > E[hα (rα − D)] + α(K + mα ),

(6.15)

where the first equality follows from (6.14), the last equality follows from (6.4), and the inequality follows from K > αK and the definition of rα and mα . The contradiction in (6.15) implies that y ≤ rα for all y ∈ Gα . 24

Lemma 6.5. Let Assumption 1 hold. Then, lim(1 − α)mα = lim E[hα (sα − D)] = w. α↑1

α↑1

(6.16)

Proof. According to the proof of Theorem 4.6, then there exist constants ǫ∗ > 0 and b > 0 such that sα ∈ (−b, b) for all α ∈ [α∗ + ǫ∗ , 1). In view of (4.2), since b and x∗U are real numbers, where x∗U is defined in (4.9), then lim(1 − α)(mα − cb) = lim(1 − α)(mα + cx∗U ) = w. α↑1

α↑1

(6.17)

Therefore, since sα > −b for all α ∈ [α∗ + ǫ∗ , 1), then (6.10) and (6.17) imply that lim(1 − α)mα = w. α↑1

(6.18)

Since (1−α)K → 0 as α ↑ 1, then (6.18) and Lemma 6.3 implies that limα↑1 E[hα (sα − D)] = w. Proof of Theorem 6.1. The proof is by contradiction. According to Theorem 4.3, for αn ↑ 1, n = 1, 2, . . . , with α1 > α∗ , every sequence {(sαn , Sαn )}n=1,2,... is bounded. Consider two real numbers s(1) < s(2) such that there exist two sequences {αn }n=1,2,... and {˜ αn }n=1,2,... satisfying limn→+∞ sαn = s(1) and limn→+∞ sα˜ n = s(2) . Since the function E[h(x − D)] is continuous, then lim E[h(sαn − D)] = E[h(s(1) − D)].

n→+∞

(6.19)

Therefore, lim E[hαn (sαn − D)] = lim

n→+∞



n→+∞ (1)

= E[h(s

E[h(sαn − D)] + (1 − αn )csαn + αn cE[D] − D)] + cE[D],

(6.20)

where the second equality follows from (6.19) and sαn → s(1) ∈ R as αn ↑ 1. According to Lemma 6.5, E[h(s(1) − D)] = w − cE[D]. By the same arguments with αn replaced with α ˜ n , E[h(s(2) − D)] = w − cE[D]. Therefore, E[h(s(1) − D)] = E[h(s(2) − D)].

(6.21)

According to Lemma 6.4, sα ≤ rα for all α ∈ (α∗ , 1). Therefore s(2) = lim sα˜ n ≤ lim inf rα˜ n ≤ r1 , n→+∞

n→+∞

(6.22)

where the last inequality follows from Lemma 6.4. Since s(1) < s(2) ≤ r1 , then Assumption 4 implies that E[h(s(1) − D)] > E[h(s(2) − D)],

(6.23)

which contradicts (6.21). Thus, the limit limα↑1 sα exists and (6.22) implies that s ∗ ≤ r1 . 25

The following theorem establishes the uniqueness of possible optimal lower thresholds for the inventory control model with convex cost functions under the discounted criterion. Theorem 6.6. Let Assumption 1 hold and Assumption 4 hold for the discount factor α ∈ (α∗ , 1), where α∗ is defined in (2.7). Then Gα = {sα }, where Gα and sα are defined in (6.11) and (3.6) with f := Gα , respectively. Proof. Recall that sα ∈ Gα and y ≥ sα for all y ∈ Gα . The proof is by contradiction. Assume that there exists y1 ∈ Gα such that y1 > sα . According to Lemma 6.3, E[hα (y1 − D)] = (1 − α)(mα + K) = E[hα (sα − D)]. Since Assumption 4 holds for the discount factor α, then rα < y1 , where rα is defined in (2.6). However, according to Lemma 6.4, y1 ≤ rα , which implies that y1 ≤ rα < y1 . Therefore, Gα = {sα }.

7

Convergence of Discounted Relative Value Functions

This section establishes the convergence of discounted relative value functions to the average-cost relative value function for the setup-cost inventory control model when the discount factor tends to 1. This is the stronger result than the convergence for a subsequence that follows from Theorem 5.1. Let us define u(x) := lim inf uα (y). α↑1,y→x

(7.1)

According to Feinberg et el. [6, Theorems 3, 4], the ACOI holds for the relative value functions u ˜ and u defined in (4.3) and (7.1) respectively. The following theorem states the convergence of discounted relative value functions, when the discount factor converges to 1, to the average-cost relative value function u. Theorem 7.1. Let Assumption 1 hold and Assumption 4 hold for α = 1. Then, lim uα (x) = u(x), α↑1

x ∈ X,

(7.2)

and the function u is continuous. In particular, Theorem 7.1 implies that the value of the function u ˜ does not depend on the sequence {αn ↑ 1}n=1,2,... chosen in (4.3). In general, this is not true.

26

Before the proof of Theorem 7.1, we first state several auxiliary facts. Consider uα defined in (6.5). If there exists an α-discounted optimal (sα , Sα ) optimal policy, then (6.6) implies that ( Gα (x) − mα if x ≥ sα , uα (x) = (7.3) K if x < sα . Lemma 7.2. Let Assumption 1 hold. Then: (i) the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous on X, where α∗ + ǫ∗ is defined in Theorem 4.6; (ii) supα∈(α∗ ,1) uα (x) < +∞ for all x ∈ X. Proof. (i) For all α ∈ [0, 1) |uα (x) − uα (y)| = |v α (x) − v α (y)| = |vα (x) − vα (y) + c(x − y)| =|uα (x) − uα (y) + c(x − y)| ≤ |uα (x) − uα (y)| + c|x − y|,

(7.4)

where the first equality follows from (6.5), the second one follows from (6.2), and the third one follows from (4.1). Consider ǫ > 0. Since the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous (see Theorem 4.6), then there exists δ > 0 such that |uα (x) − uα (y)| < 2ǫ for all |x − y| < δ ǫ }, c|x − y| < 2ǫ and (7.4) and α ∈ [α∗ + ǫ∗ , 1). Therefore, for |x − y| < δ1 := min{δ, 2c implies that for |x − y| < δ1 and α ∈ [α∗ + ǫ∗ , 1) |uα (x) − uα (y)| ≤ ǫ. Thus, the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous. (ii) Consider x ∈ X. For all α ∈ (α∗ , 1), uα (x) − uα (x) ≤ |uα (x) − uα (x)| =|v α (x) − vα (x) − (mα − mα )| = |cx − (mα − mα )| ≤c|x| + |mα − mα | ≤ c(|x| + |sα | + |x∗U |) ≤c(|x| + b + |x∗U |),

(7.5)

where the last two inequalities follow from (6.10) and Theorem 6.1, respectively. According to Lemma 4.1, since Assumption B holds, then supα∈(α∗ ,1) uα (x) < +∞. Therefore, (7.5) implies that sup uα (x) ≤ c(|x| + b + |x∗U |) +

α∈(α∗ ,1)

27

sup uα (x) < +∞, α∈(α∗ ,1)

x ∈ X.

Lemma 7.3. Let Assumption 1 hold and Assumption 4 hold for α = 1. Then there exists the limit u(x) := lim uα (x), α↑1

x ∈ X,

(7.6)

where the function u is continuous on X. Proof. According to (4.16), there exist ǫ∗ > 0 and b > 0 such that sα ∈ [−b, b] for all α ∈ [α∗ + ǫ∗ , 1). Consider s∗ defined in (6.1). We first prove that the limit limα↑1 uα (x) exists for x < s∗ . For x < s∗ , according to Theorem 6.1, there exists α ˆ > α∗ + ǫ∗ such that sα > x for all α ∈ [ˆ α, 1), then (7.3) implies that uα (x) = K for all α ∈ [ˆ α, 1). Therefore, lim uα (x) = K, α↑1

x < s∗ .

(7.7)

For x > s∗ , according to Theorem 6.1, there exists α ˆ > α∗ + ǫ∗ such that sα < x for all α ∈ [ˆ α, 1). Therefore, in view of (7.3), (6.4), (6.5), and (6.7), for all α ∈ [ˆ α, 1) uα (x) = E[hα (x − D)] + αE[uα (x − D)] − (1 − α)mα .

(7.8)

Then, (7.3) and (7.8) imply that N(x−sα )+1

uα (x) = E[

X

˜ α (x − Sj−1 ) − (1 − α)mα )] + E[αN(x−sα )+1 K], αj−1 (h

(7.9)

j=1

˜ α (x) := E[hα (x − D)], x ∈ X, and N(·) is defined in (4.5). For α ∈ [α∗ + ǫ∗ , 1), where h 1 ≥ E[αN(x−sα )+1 ] ≥ E[αN(x+b)+1 ] ≥ αE[N(x+b)+1] ,

(7.10)

where the first inequality follows from α < 1, the second one follows from sα ≥ −b and α < 1, and the last one follows from Jensen’s inequality. Since P (D > 0) > 0, then E[N(x + b) + 1] < ∞, which implies that lim αE[N(x+b)+1] → 1. α↑1

(7.11)

Therefore, (7.10) and (7.11) imply that lim E[αN(x−sα )+1 K] = K. α↑1

28

(7.12)

The first term of the right hand side of (7.9) can be written as N(x−sα )+1

E[

X

˜ α (x − Sj−1 ) − (1 − α)mα )] αj−1 (h

j=1

=

i+1 +∞ X X

˜ α (x − Sj−1 ) − (1 − α)mα |N(x − sα ) = i]P (N(x − sα ) = i) αj−1 E[h

i=0 j=1

=

+∞ +∞ X X

˜ α (x − Sj ) − (1 − α)mα |N(x − sα ) = i]P (N(x − sα ) = i) αj E[h

j=0 i=j

=

+∞ X

˜ α (x − Sj ) − (1 − α)mα |N(x − sα ) ≥ j]P (N(x − sα ) ≥ j) αj E[h

j=0

=

+∞ X

˜ α (x − Sj ) − (1 − α)mα |Sj ≤ x − sα ]P (Sj ≤ x − sα ), αj E[h

(7.13)

j=0

where the first and third equalities follow from the properties of conditional expectations, the second equality changes the order of summation, and its validity follows from the nonnegativity of h(·) and the finiteness of uα (x), and the last equality follows from the fact that P (N(t) ≥ n) = P (Sn ≤ t). Now we prove that, if α ↑ 1, then the limit of the expression in (7.13) as α ↑ 1 exists almost everywhere on (s∗ , ∞). Consider the sets Dn , n = 0, 1, . . . , on which the distribution function of Sn , is not continuous. Since every distribution function is right-continuous, then Dn := {x ∈ R : lim P (Sn ≤ y) 6= P (Sn ≤ x)}. y↑x

Therefore, each set Dn , n = 0, 1, . . . , is at most countably infinite. Let D = {x > s∗ : x = s∗ + y,

y ∈ ∪+∞ n=0 Dn }

and

C = (s∗ , +∞) \ D.

(7.14)

Hence, D is also at most countably infinite. In addition, P (Sn ≤ x − s∗ ) is continuous at x − s∗ and limα↑1 P (Sn ≤ x − sα) = P (Sn ≤ x − s∗ ) for all x ∈ C and n = 0, 1, . . . . ˜ α (x−Sj )|Sj ≤ Consider x ∈ C. Since the function hα (x) is quasiconvex, then αj E[h x − sα ] ≤ E[hα (x − D)] + E[hα (−b − D)] < +∞. Since P (D > 0) > 0, then there exists a real number δD > 0 such that P (D > δD ) > 0. Consider ( if D < δD , ˜ = 0 D δD otherwise. ˜ = δD P (D ≥ δD ) > 0 and V ar(D) ˜ = δ2 P (D ≥ δD )(1 − P (D ≥ δD )) < ∞. Then E[D] D P n ˜ 0 = 0 and S ˜n = ˜ ˜ Define S i=1 D, n = 1, 2, . . . . Therefore, P (Sn ≤ x) ≤ P (Sn ≤ x) 29

˜ > 0, then there exists N1 > 0 such that for all x ∈ R and n = 0, 1, . . . . Since E[D] ˜ ˜ − (x + b) > 0. Hence, for n > N1 nE[D] > x + b for all n > N1 . Let δ(n) := nE[D] ˜ n ≤ x − sα ) ≤ P (S ˜ n ≤ x + b) P (Sn ≤ x − sα ) ≤ P (S ˜ n − nE[D] ˜ ≤ x + b − nE[D]) ˜ = P (S ˜ ˜ n − nE[D]| ˜ ≥ δ(n)) ≤ V ar(D) , ≤ P (|S δ2 (n) where the last inequality follows from Chebyshev’s inequality. In addition, according to Lemma 6.5, there exists M1 > 0 such that |(1 − α)mα | ≤ M1 for all α ∈ [α∗ + ǫ∗ , 1). P ˜ V ar(D) Therefore, the summation +∞ j=N1 (E[hα (x − D)] + E[hα (−b − D)] + M1 ) δ2 (j) < +∞. By taking the limit of the first and last terms of (7.13) as α ↑ 1, N(x−sα )+1 α↑1

=

+∞ X

˜ α (x − Sj−1 ) − (1 − α)mα )] αj−1 (h

X

lim E[

j=1

˜ 1 (x − Sj ) − w|Sj ≤ x − s∗ ]P (Sj ≤ x − s∗ ) E[h

j=0

N(x−s∗ )+1

=E[

X

˜ 1 (x − Sj−1 ) − w)], (h

(7.15)

j=1

where the first equality follows from the Lebesgue’s dominated convergence theorem, Theorem 6.1, and Lemma 6.5, and the second equality follows from (7.13) with sα replaced with s∗ and αj replaced with 1. In view of (7.9), (7.12), and (7.15), for all x∈C N(x−s∗ )+1

lim uα (x) = E[ α↑1

X

˜ 1 (x − Sj−1 ) − w)] + K. (h

(7.16)

j=1

¯ := D ∪ {s∗ }. The complement of D ¯ is D ¯c = X \ D ¯ = (−∞, s∗ ) ∪ C. In view Let D of (7.7) and (7.16), there exists the limit u(x) := lim uα (x), α↑1

¯ c. x∈D

(7.17)

¯ In view of Lemma 7.2, Now it remains to prove that (7.6) holds for x ∈ D. according to Arzel`a-Ascoli theorem (see Hern´andez-Lerma and Lasserre [13, p. 96]), there exist a sequence {αn ↑ 1}n=1,2,... with α1 ≥ α∗ + ǫ∗ and a continuous function u ¯∗ such that lim uαn (x) = u∗ (x),

n→+∞

30

x ∈ X,

(7.18)

In view of (7.17) and (7.18), u(x) = u∗ (x),

¯c. x∈D

(7.19)

¯ Assume that there exists The following proof is by contradiction. Consider z ∈ D. ∗ ∗ a subsequence {βn ↑ 1}n=1,2,... with β1 ≥ α + ǫ such that limn→∞ uβn (z) 6= u∗ (z). According to Arzel`a-Ascoli theorem, there exist a subsequence {βnk }k=1,2,... of the sequence {βn ↑ 1}n=1,2,... and a continuous function u′ such that lim uβnk (x) = u′ (x),

x ∈ X.

k→+∞

(7.20)

Therefore, u′ (z) 6= u∗ (z). However, in view of (7.17), (7.19), and (7.20), u∗ (x) = u′ (x) ∗ ′ ¯ c , which implies that u∗ (z) = lim for all x ∈ D ¯ c u (y) = limy→z,y∈D ¯ c u (y) = y→z,y∈D ′ u (z). Therefore, there exists the limit u(x) := lim uα (x), α↑1

¯ x ∈ D.

(7.21)

Furthermore, (7.17), (7.21), and (7.18) implies that u = u∗ and the function u is continuous. In view of (4.1), (6.2) and (6.5), uα (x) = uα (x) + mα − mα − cx,

x ∈ X.

(7.22)

Proof of Theorem 7.1. The proof consists of the proofs of the following two statements: (i) there exists the limit u∗ (x) := limα↑1 uα (x), x ∈ X, and the function u∗ is continuous on X; and (ii) u∗ (x) = u(x) := lim inf α↑1,y→x uα (x) for all x ∈ X. (i) We first prove that there exists the limit u∗ (s∗ ) := limα↑1 uα (s∗ ), where s∗ is defined in (6.1). Consider xα ∈ Xα , α ∈ [0, 1), where Xα is defined in (4.8). In view of (4.9), since Xα ⊆ [x∗L , x∗U ] for all a ∈ [0, 1), then for every sequence {˜ an ↑ 1}n=1,2,... , there exists a subsequence {˜ ank ↑ 1}k=1,2,... of the sequence {˜ an ↑ 1}n=1,2,... such that xa˜nk → x∗ ∗ ∗ ∗ as k → ∞ for some x ∈ [xL , xU ]. Let {αn ↑ 1}n=1,2,... be a sequence of discount factors such that xαn → x∗ as n → ∞ for some x∗ ∈ [x∗L , x∗U ] and α1 ≥ α∗ + ǫ∗ , where α∗ and ǫ∗ are defined in Theorem 4.6. Consider ǫ > 0. Since the family of functions {uα }α∈[α∗ +ǫ∗ ,1) is equicontinuous, then there exists an integer M (ǫ) > 0 such that for all n ≥ M (ǫ) |uαn (xαn ) − uαn (x∗ )| < ǫ.

(7.23)

Since uαn (xαn ) = 0 for all n = 1, 2, . . . , then (7.23) implies that for n ≥ M (ǫ) |uαn (x∗ )| < ǫ.

31

(7.24)

Therefore, (7.24) implies that lim uαn (x∗ ) = 0.

(7.25)

n→+∞

Since the function uαn is nonnegative, then (7.24) implies that for n ≥ M (ǫ) uαn (x∗ ) < uαn (x) + ǫ,

x ∈ X.

(7.26)

Then (7.26) and (7.22) imply that for n ≥ M (ǫ) u ¯αn (x∗ ) − c¯x∗ < u ¯αn (x) − c¯x + ǫ,

x ∈ X.

(7.27)

By taking the limit of both sides of (7.27) as n → +∞, Lemma 7.3 implies that u(x∗ ) − c¯x∗ ≤ u(x) − c¯x + ǫ,

x ∈ X.

(7.28)

Since ǫ can be chosen arbitrarily, (7.28) implies that u(x∗ ) − c¯x∗ = min{u(x) − c¯x}. x∈X

(7.29)

Let Mu := u(s∗ ) − c¯s∗ − minx∈X {u(x) − c¯x}. Then lim uαn (s∗ ) − uαn (x∗ ) = lim u ¯αn (s∗ ) − c¯s∗ − [¯ uαn (x∗ ) − c¯x∗ ]

n→+∞

n→+∞ ∗

= u(s ) − c¯s∗ − [u(x∗ ) − c¯x∗ ] = u(s∗ ) − c¯s∗ − min{u(x) − c¯x} = Mu , x∈X

(7.30)

where the first equality follows from (7.22), the second one follows from Lemma 7.3 and the third one follows from (7.29). In view of (7.25) and (7.30), lim uαn (s∗ ) = Mu .

n→+∞

(7.31)

Consider the sequence {˜ an }n=1,2,... of discount factors such that lim ua˜n (s∗ ) = lim inf uα (s∗ ).

n→+∞

α↑1

(7.32)

Since there exists a subsequence {˜ ank }n=1,2,... of the sequence {˜ an }n=1,2,... such that ∗ ∗ ∗ ∗ xa˜nk → x for some x ∈ [xL , xU ], then (7.31) implies that lim ua˜nk (s∗ ) = Mu .

k→+∞

(7.33)

Then (7.32) and (7.33) implies that lim inf α↑1 uα (s∗ ) = Mu . With lim inf replaced with lim sup in (7.32) and (7.33), the arguments after (7.31) imply that lim supα↑1 uα (s∗ ) = Mu . Therefore, u∗ (s∗ ) := lim uα (s∗ ) = Mu . α↑1

32

(7.34)

Now we prove that there exists the limit u∗ (x) := limα↑1 u(x) for all x 6= s∗ . For x ∈ X such that x 6= s∗ lim uα (x) − uα (s∗ ) = lim uα (x) − c¯y − [uα (s∗ ) − c¯s∗ ] α↑1

α↑1

= u(x) − c¯x − [u(s∗ ) − c¯s∗ ],

(7.35)

where the first equality follows from (7.22) and the second one follows from Lemma 7.3. Therefore, (7.34) and (7.35) implies that there exists the limit u∗ (x) := lim uα (x) = Mu + u(x) − c¯x − [u(s∗ ) − c¯s∗ ],

x 6= s∗ .

α↑1

(7.36)

Furthermore, since the family of functions {uα }{α≥α∗ +ǫ∗ } is equicontinuous and Assumption B holds, then Arzel`a-Ascoli theorem implies that the function u∗ is continuous. (ii) Consider sequences of {αn ↑ 1}n=1,2,... and {yn → x}n=1,2,... such that α1 > α∗ + ǫ∗ and limn→∞ uαn (yn ) = lim inf α↑1,y→x uα (y). Then, lim inf uα (y) ≤ lim inf uαn (y) ≤ lim uαn (yn ) = lim inf uα (x),

α↑1,y→x

n→∞,y→x

n→∞

α↑1,y→x

which implies that lim inf uα (y) = lim inf uαn (y).

α↑1,y→x

n→∞,y→x

(7.37)

Since limn→∞ uαn (x) = u∗ (x), by following the same arguments from Equation (3.4) to Equation (3.10) in the proof of Feinberg and Liang [11, Theorem 3.2] with nk and u ˜∗ replaced with n and u∗ , respectively, then lim inf uαn (y) = lim uαn (x) = u∗ (x).

n→∞,y→x

n→∞

(7.38)

Therefore, (7.37) and (7.38) imply that u := lim inf α↑1,y→x uα (y) = u∗ . This completes the proof. Theorem 7.1 implies that (4.14) can be written as H(x) := c¯x + E[h(x − D)] + E[u(x − D)].

(7.39)

Corollary 7.4. Let Assumption 1 hold and Assumption 4 hold for α = 1. Then the conclusions in Theorem 5.1 hold with u ˜ = u defined in (7.2) and s∗ defined in (6.1), that is, the functions u ˜ and the thresholds s∗ defined in (4.3) and Theorem 4.3, respectively, are the same for all sequences {αn ↑ 1}n=1,2,... . Proof. This corollary follows directly from Theorems 5.1, 6.1 and 7.1.

33

Define the set of all possible optimal average-cost lower thresholds G := {x ≤ S : H(y) = K + H(S) for all y ∈ [s, x]}, (7.40)  where S = min Argminx {H(x)} and s is defined in (3.6) with f := H. Note that s ∈ G and y ≥ s for all y ∈ G. The following theorem establishes the uniqueness of possible optimal lower thresholds for the inventory control model with holding/backlog costs satisfying quasiconvexity assumptions under the average cost criterion. Theorem 7.5. Let Assumption 1 hold and Assumption 4 hold for α = 1. Then, G = {s∗ }, where G and s∗ are defined in (7.40) and (6.1), respectively. Proof. Consider G and S defined in (7.40). Recall that s ∈ G and y ≥ s, where s is defined in (3.6) with f := H. According to Theorem 5.1 and Corollary 7.4, for y ∈ G ( K + H(S) if x ≤ y, (7.41) w + u(x) + cx = H(x) if x ≥ y, which implies that for x ≤ y H(x) = cx + E[h(x − D)] + E[u(x − D)] = K + E[h(x − D)] + H(S) + cE[D] − w.

(7.42)

Since H(y) = K + H(S) for y ∈ G, then, in view of (7.42), E[h(y − D)] = w − cE[D],

y ∈ G.

(7.43)

The following proof is by contradiction. Assume that there exists y1 ∈ G such that y1 > s. Then (7.43) implies that E[h(y1 − D)] = E[h(s − D)]. Therefore, implies that r1 < y1 , where r1 is defined in (2.6). Since  Assumption 4 S = min Argminx {H(x)} , then (7.41) implies that for x < S w + u(x) + cx > H(S).

(7.44)

Therefore, H(y1 ) = K + H(S) = K + cS + E[h(S − D)] + E[u(x − D)] > K + E[h(r1 − D)] + H(S) + cE[D] − w,

34

(7.45)

where the first equality holds because y1 ∈ G, the second follows from (7.39), and the inequality holds because E[h(S − D)] ≥ E[h(r1 − D)], (7.44) and P (D > 0) > 0. Since y ∈ G and r1 < y1 , then H(r1 ) ≥ H(y1 ). In view of (7.42), H(y1 ) ≤ H(r1 ) = K + E[h(r1 − D)] + H(S) + cE[D] − w, which contradicts (7.45). Then, G = {s}. In addition, Corollary 7.4 implies that s∗ ∈ G, where s∗ is defined in (6.1). Therefore, s = s∗ and G = {s∗ }. The following corollary states that all the results of this paper hold for inventory control models with convex holding/backlog costs. Corollary 7.6. The conclusions of lemmas, theorems, and corollaries in Sections 6 and 7 hold under Assumption 2. Proof. This corollary holds because Assumption 2 implies Assumptions 1 and 4; see Lemma 2.4

8

Veinott’s Reduction of Problems with Backorders and Positive Lead Times to Problems without Lead Times

In this section, we explain, by using the technique introduced by Veinott [21], that the inventory control model with positive lead times can be reduced to the model without lead times. Therefore, the results of this paper, Feinberg and Lewis [9], and Feinberg and Liang [10, 11] also hold for the inventory control model with positive lead times. Consider the inventory control model defined in Section 2. Instead of assuming zero lead times, assume that the fixed lead time is L ∈ N0 , that is, an order placed at the beginning of time t will be delivered at the beginning of time t + L. In addition, let hL (x) be the holding/backlog cost per period if the inventory level is x. We define ( P E[hL (x − L ∗ i=1 Di )] if L > 0, (8.1) h (x) := L h (x) if L = 0. For the inventory control model, the dynamic of the system is defined by the equation xt+1 = xt + at−L − Dt+1 ,

t = 0, 1, 2, . . . ,

(8.2)

where xt and at are the current inventory level before replenishment and the ordered amount at period t. In addition, at period t, the one-step cost is c˜(ht , at ) := KI{at−L >0} + cat−L + E[hL (xt + at−L − Dt+1 )],

t = 0, 1, . . . , (8.3)

where ht = (a−L , a−L+1 , . . . , a−1 , x0 , a0 , . . . , at−1 , xt ) is the history at period t. 35

Equation (8.2) means that a decision-maker observes at the end of the period t the history ht , places an order of amount at , which will be delivered in L periods (that is, at the end of the period t + L), and the demand occurred during the period t + 1 is Dt+1 . As usual, consider the set of possible trajectories h∞ = (ht , at , xt+1 , at+1 , . . .). An arbitrary policy is a regular probability distribution π(dat |ht ), t = 0, 1, . . . , on R+ . It defines the transition probability for ht to (ht , at ). The transition probability for (ht , at ) can be defined by (8.2). Therefore, given the initial state h0 = (a−L , a−L+1 , . . . , a−1 , x0 ), a policy π defines, in view of the Ionescu Tulcea theorem, the probability distribution Phπ0 on the set of trajectories. We denote by Eπh0 the expectation with respect to Phπ0 . For a finite-horizon N = L, L + 1, L + 2, . . . the expected total discounted cost is π v˜N,α (h0 ) := Eπh0

−1 h NX t=0

N −1 i h L−1 i X X αt c˜(xt+L , at+L ) , αt c˜(xt , at ) + αL αt c˜(xt , at ) = Eπh0 t=0

t=0

(8.4)

π (h ) = 0. When N = ∞ and α ∈ [0, 1), where α ∈ [0, 1] is the discount factor and v˜0,a 0 equation (2.15) defines the infinite-horizon expected total discounted cost denoted by v˜απ (h0 ). The average cost per unit time is defined as

w ˜π (h0 ) := lim sup N →+∞

1 π v˜ (h0 ). N N,1

It is possible to construct an MDP with the state space with the states ( if 0 ≤ t ≤ L − 1, (xt , xt−1 , . . . , x0 , at−1 , at−2 , . . . , at−L ) xt = (xt , xt−1 , . . . , xt−L , at−1 , at−2 , . . . , at−L ) if t ≥ L. However, as Veinott [21] observed, the problem can be simplified by constructing an MDP with the states yt := xt +

L X

at−i = xt+L +

L X

Dt+1 ,

t = 0, 1, . . . ,

(8.5)

i=1

i=1

that is, yt is the sum of the current inventory level and the outstanding orders at the end of period t. Following Veinott’s [21] approach, the states are defined in (8.5) and the actions are the amount of orders that can be placed at each period t. In view of (8.2) and (8.5), the dynamic of the system is defined by the equation yt+1 = yt + at − Dt+1 ,

36

t = 0, 1, 2, . . . .

(8.6)

Let the one-step cost be c∗ (yt , at ) := KI{at >0} +cat + E[h∗ (yt + at − D)].

(8.7)

The following two theorems state the validity of this reduction from the inventory control model with positive lead times to the model without lead times. Theorem 8.1. Consider the MDP {X, A, P, c} corresponding to the inventory control model with backorders, where P and c are defined in (2.13) and (2.14), respectively. If h(x) = h∗ (x) for all x ∈ X, then this MDP also corresponds to the model defined by the equations (8.6) and (8.7). Proof. Since yt ∈ X and the actions are the same for these two models, we need to verify only the correspondence for transition probabilities and costs. If h = h∗ , then formulae (2.14) and (8.7) coincide with xt = yt . The transition probabilities for the MDP corresponding to the equation (8.6) is q ∗ (B|yt , at ) = P (yt + at − Dt+1 ∈ B),

(8.8)

for each measurable subset B of R, which also coincides with (2.13). Theorem 8.2. Consider the inventory control model with lead time L ≥ 1 defined in this section and the inventory control model defined by equations (2.12) and (8.7). π (y), where α ∈ [0, 1], N = 0, 1, . . . , be the expected N -horizon discounted total Let vN,α cost for the latter model. Then for each h0 = (a−L , a−L+1 , . . . , a−1 , x0 ) π π v˜N,α (h0 ) = f (h0 ) + αL vN,α (y0 ).

(8.9)

Proof. For t = 0, 1, . . . c˜(ht+L , at+L ) = KI{at >0} + cat + E[hL (xt+L + at − Dt+L+1 )] = KI{at >0} + cat + E[hL (yt −

L X

Dt+i + at − Dt+L+1 )]

i=1

L X   L Dt+i + at − Dt+L+1 )|Dt+L+1 )] = KI{at >0} + cat + E E[h (yt − i=1

= KI{at >0} + c¯at + E[h∗ (yt + at − Dt+L+1 )] = c∗ (yt , at ),

where the first equality follows from (8.3), the second one follows from (8.5), the third one follows from conditional expectation, and the last two follow from the definition of the functions h∗ and c∗ in (8.1) and (8.7), respectively. In view of (8.4), equation (8.9) holds with f (h0 ) :=

L−1 X t=0

αt E[˜ c(x0 +

t−1 X i=0

37

a−L+i −

t−1 X i=0

Dt+i , at )].

This reduction and Theorems 8.1 and 8.2 imply the following corollary. Corollary 8.3. If the function h∗ satisfies Assumption 1 or 2, then the corresponding statements from this paper, Feinberg and Lewis [9], and Feinberg and Liang [10, 11] hold for the model with the lead time L. Remark 8.4. The reduction discussed in this section does not hold for the inventory control model with lost-sales. For such model with zero lead times the dynamics of the system are defined by the equation xt+1 = (xt + at − Dt+1 )+ := max{xt + at − Dt+1 , 0},

t = 0, 1, 2, . . . ,

where xt and at are the current inventory level and ordered amount at period t respectively. If the lead time is L > 0, the dynamics of the system are defined by the equation xt+1 = (xt + at−L − Dt+1 )+ ,

t = 0, 1, 2, . . . .

Consider the transformation similar to the one defined in (8.5). Then yt+1 = xt+1 +

L X

at+1−i = (xt + at−L − Dt+1 )+ +

at+1−i

i=1

i=1

6= (xt + at−L − Dt+1 +

L X

L X

at+1−i )+ = (yt + at − Dt+1 )+ .

i=1

Therefore, the reduction does not hold. Indeed, the structure of the optimal policies may depend on the lead times. In particular, if the lead times are large, then the constant-order policy performs nearly optimally; see Goldberg et al. [12]. Acknowledgement. This research was partially supported by NSF grants CMMI1335296 and CMMI-1636193.

References [1] Beyer D., Cheng F., Sethi S. P., and Taksar M. (2010). Markovian demand inventory models, Springer, New York. [2] D. Beyer and S. Sethi. (1999). The classical average-cost inventory models of Iglehart and Veinott–Wagner revisited. Journal of Optimization Theory and Applications, 101(3):523–555. [3] X. Chen and D. Simchi-Levi. (2004). Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: the finite horizon case. Oper. Res., 52(6):887–896. 38

[4] Chen X. and Simchi-Levi D. (2004). Coordinating inventory control and pricing strategies with random demand and fixed ordering cost: the infinite horizon case. Math. Oper. Res., 29(3):698–723. [5] Feinberg E. A. (2016). Optimality conditions for inventory control. Tutorials in Operations Research, INFORMS, 14–44. [6] Feinberg E. A., Kasyanov P. O., and Zadoianchuk N. V. (2012). Average cost Markov decision processes with weakly continuous transition probability. Math. Oper. Res., 37(4):591–607. [7] Feinberg E. A., Kasyanov P. O., and Zadoianchuk N. V. (2013). Berge’s theorem for noncompact image sets. J. Math. Anal. Appl., 397(1):255–259. [8] Feinberg E. A. and Lewis M. E. (2007) Optimality inequalities for average cost Markov decision processes and the stochastic cash balance problem. Math. Oper. Res., 32(4):769–783. [9] Feinberg E. A. and Lewis M. E. (2016). On the convergence of optimal actions for Markov decision processes and the optimality of (s, S) inventory policies. Preprint arXiv:1507.05125, http://arxiv.org/pdf/1507.05125.pdf. [10] Feinberg E. A. and Liang Y. (2016). Structure of optimal solutions to periodicreview total-cost inventory control problems. Preprint arXiv:1609.03984, http://arxiv.org/pdf/1609.03984.pdf. [11] Feinberg E. A. and Liang Y. (2016). On the Optimality Equation for Average Cost Markov Decision Processes and its Validity for Inventory Control. Preprint arXiv:1609.08252, http://arxiv.org/abs/1609.08252.pdf [12] Goldberg D. A., Katz-Rogozhnikov D. A., Lu Y., Sharma M., Squillante M. S. (2016). Asymptotic Optimality of Constant-Order Policies for Lost Sales Inventory Models with Large Lead Times. Mathematics of Operations Research, 41(3):898– 913. [13] Hern´ andez-Lerma O. and Lasserre J. B. (1996). Discrete-Time Markov Control Processes: Basic Optimality Creteria. Springer-Verlag, New York. [14] Huh W. T., Janakiraman G., and Nagarajan M. (2011) Average Cost SingleStage Inventory Models: An Analysis Using a Vanishing Discount Approach. Operations Research, 59(1):143–155. [15] D. L. Iglehart. (1963) Dynamic programming and stationary analysis of inventory roblems. In Office of Naval Research Monographs on Mathematical Methods in Logistics (H. Scarf, D. Gilford, and M. Shelly, eds.), pp. 1–31, Stanford University Press, Stanford, CA.

39

[16] E. Porteus. (2002). Foundations of Stochastic Inventory Theory. Stanford University Press, Stanford, CA. [17] Resnick S. I. (1992). Adventures in stochastic processes. Birkhauser, Boston. [18] H. Scarf. (1960). The optimality of (S, s) policies in the dynamic inventory problem. Mathematical Methods in the Social Sciences (K. Arrow, S. Karlin, and P. Suppes, eds.), Stanford University Press, Stanford, CA. [19] Sch¨ al M. (1993). Average optimality in dynamic programming with general state space. Math. Oper. Res., 18(1):163–172. [20] Simchi-Levi D., Chen X., and Bramel J. (2005). The Logic of Logistics: Theory, Algorithms, and Applications for Logistics and Supply Chain Management. Springer-Verlag, New York. [21] Veinott A. F. (1966). On the Optimality of (s, S) Inventory Policies: New Condition and a New Proof. J. SIAM Appl. Math., 14(5):1067–1083. [22] A. F. Veinott and H. M. Wagner. (1965). Computing optimal (s, S) policies. Management Science, 11(5):525–552. [23] E. Zabel. (1962). A note on the optimality of (s, S) policies in inventory theory. Manag. Sci., 9(1):123–125. [24] Zheng Y. (1991). A simple proof for optimality of (s, S) policies in infinite-horizon inventory systems. J. Appl. Prob., 28(4):802–810. [25] P. H. Zipkin. (2000). Foundations of Inventory Management. McGraw-Hill, New York.

40

Suggest Documents