A Multiperiod Newsvendor Problem with Partially Observed Demand ∗

Alain Bensoussan, Metin Çakanyıldırım, Suresh P. Sethi †

International Center for Decision and Risk Analysis, School of Management, P.O. Box 830688, SM 30, University of Texas at Dallas, Richardson, TX 75083-0688

August 25, 2006

Abstract: This paper considers the case of partially observed demand in the context of the newsvendor problem. Demand is observed if it is less than the inventory. Otherwise, only the event that it is larger than or equal to the inventory is observed. These observations are used to update the demand distribution. The state of the resulting dynamic programming equation is the current demand distribution, which is infinite dimensional. This formulation becomes linear with the use of unnormalized probabilities. We prove the existence of an optimal feedback ordering policy and provide an equation for the optimal inventory level. We show that the optimal inventory level is always larger than or equal to the myopic optimal inventory level and that the optimal cost decreases as the demand distribution decreases in the hazard rate order. We apply the theory to a special case of the problem, in which the demand is modeled by a Markov chain taking a finite number of values. We characterize a near-optimal solution in this case by establishing that the value function is piecewise linear.

Keywords: Unobserved unmet demand, Markovian demand, Newsvendor problem.

MSC 2000 Subject Classification: Primary: 93C41; Secondary: 49L20.

OR/MS Subject Classification: Primary: Inventory/Production, Uncertainty, Stochastic; Secondary: Dynamic Programming/Optimal Control, Models.

∗ To appear in Mathematics of Operations Research.

† {alain.bensoussan, metin, sethi}@utdallas.edu

Electronic copy available at: http://ssrn.com/abstract=1089292

1 Introduction

The newsvendor problem studies the optimization of the inventory level at the beginning of a sales season to meet the demand during the season (e.g., p. 961 of [8] and p. 342 of [4]). When the inventory level exceeds the demand during the season, costs are incurred on the basis of the leftover inventory at the end of the season; otherwise, costs are incurred depending on the unmet demand during the season. Although there is an extensive literature on this problem, only recent work has started to emphasize the unobservability of the unmet demand. We consider a multiperiod newsvendor problem in which the demand in each period is observed fully when it is met from the available inventory. Otherwise, only the event that “the demand is larger than or equal to the inventory” is observed. When the underlying demand distribution is not known but estimated from the demand observations, such partial demand observations limit the data available for estimation as well as for optimization. Such problems are referred to as estimation and/or optimization with censored (demand) data.

Ding et al. [6] and Lu et al. [10] study a multiperiod newsvendor model with censored demand. By assuming that leftover inventories are salvaged and unfilled demands are lost in each period, they decouple the periods from the viewpoint of inventory but not from that of the Bayesian demand updates. That is, the state of the system reduces to the distribution of the demand, which is updated in each period based on the partial observations available at that time. Ding et al. and Lu et al. assume that the demands are independently and identically distributed. Prior to these authors, Lariviere and Porteus [9] obtained similar results for the more restricted case of exponential demand distributions with gamma conjugate priors. Unlike [6] and [9], this paper models the demand as a stationary Markov process whose transition probability is known.
Furthermore, we develop a Zakai-type equation [15] for the evolution of the probability distribution of the demand over time. This facilitates the analysis of the dynamic programming (DP) equation for the problem. We prove that the value function is the unique solution of the DP equation, and we show that there exists an optimal feedback policy for the problem. We also establish that the optimal order quantity is at least as large as the myopic one. The problem studied in this paper is an example of problems with partial observations [1, 11]. A related example is given by Treharne and Sox [14], who consider a periodic-review inventory model with Markov-modulated demand; the state of the demand is not known and is estimated in a Bayesian fashion from the sales observed in each period.

The plan for this paper is as follows. In the next section, we obtain the evolution equation for the demand distribution. In Section 3, we provide a dynamic programming equation to find the optimal order quantity, and simplify the equation by using unnormalized probabilities. In Section 4, we establish the existence of an optimal feedback policy and provide an equation satisfied by the optimal order quantity. In Section 5, we compare the optimal and myopic solutions and establish that the value function is monotone in the hazard rate order. We study the case of demands taking a finite number of values in Section 6, and conclude the paper in Section 7.


2 Evolution of Demand Distribution

Let (Ω, F, P) be the probability space and let n ≥ 1 index the periods. Let $x_n \ge 0$ denote the demand occurring at the beginning of period n. The demand is modeled by a Markov process with transition probabilities given by $p(x|\xi) := P(x_{n+1} = x \mid x_n = \xi)$. The inventory available to satisfy the demand $x_n$, or a part thereof, is denoted by $y_n$. We can think of $y_n$ as the order placed and delivered at the beginning of period n before the nth period demand $x_n$ arrives. Then the amount $z_n$ of sales is given by
$$z_n := \min\{x_n, y_n\}. \qquad (1)$$
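Because only the sales $z_n$ are seen, the belief about the next demand has to be updated from a possibly censored observation. The sketch below illustrates this mechanism under an assumption that is not made in this section: the demand takes finitely many values with a known transition matrix (the setting of Section 6). The function names `sales`, `is_censored` and `update_belief` are hypothetical; `update_belief` is a discrete analogue of the update derived below as (6), not the paper's own procedure.

```python
from typing import List, Sequence


def sales(x: float, y: float) -> float:
    """Sales z = min{x, y}, as in (1)."""
    return min(x, y)


def is_censored(z: float, y: float) -> bool:
    """Demand is censored exactly when sales equal the inventory, i.e. x >= y."""
    return z == y


def update_belief(pi: Sequence[float], values: Sequence[float],
                  P: Sequence[Sequence[float]], y: float, z: float) -> List[float]:
    """Discrete analogue of the belief update (illustrative, hypothetical names).

    pi[k]   -- current probability that x_n = values[k]
    P[k][l] -- assumed known P(x_{n+1} = values[l] | x_n = values[k])
    y, z    -- inventory level y_n and observed sales z_n
    """
    n = len(values)
    if is_censored(z, y):
        # Only the event {x_n >= y} is observed: restrict pi to it,
        # propagate through P, and renormalize by the mass of that event.
        mass = sum(pi[k] for k in range(n) if values[k] >= y)
        if mass == 0:
            raise ValueError("censored observation has zero prior probability")
        return [sum(pi[k] * P[k][l] for k in range(n) if values[k] >= y) / mass
                for l in range(n)]
    # Demand met, hence observed exactly: x_n = z, so x_{n+1} follows row z of P.
    return list(P[list(values).index(z)])


# Hypothetical three-value example.
values = [1.0, 2.0, 3.0]
P = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6]]
pi = [1 / 3, 1 / 3, 1 / 3]
print(update_belief(pi, values, P, y=2.0, z=sales(3.0, 2.0)))  # approximately [0.15, 0.4, 0.45]
print(update_belief(pi, values, P, y=2.0, z=sales(1.0, 2.0)))  # [0.5, 0.3, 0.2]
```

When the demand is met, the prior belief is irrelevant and the next-period belief is simply a row of the transition matrix; only censored periods carry the prior forward.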

When $x_n < y_n$, the demand is met and therefore observed. On the other hand, when $x_n \ge y_n$, the inventory is not sufficient to meet the demand in period n. In that case, the amount of sales is $y_n$ and $x_n - y_n$ is the unmet demand. When the demand is not met, the magnitude of the unmet demand is not observed by the inventory manager (IM); indeed, the IM observes only the sales. Let $Z_n$ be the sigma algebra generated by the sales $\{z_j : j \le n\}$, i.e., $Z_n := \sigma(z_1, \ldots, z_n)$. Thus, $Z_n$ is the history available to the IM at the end of period n. Since the IM decides on $y_n$ at the beginning of period n, $y_n$ is $Z_{n-1}$-measurable. However, $x_n$, being partially observed, is not in general $Z_n$-measurable.

Let the function $L(x, y)$, which depends on the demand x and the inventory level y available to meet it, denote the one-period cost function. Ding et al. [6] assume that any inventory in excess of the demand in a period is salvaged and that the unmet demand in a period is lost. This results in the one-period cost function
$$L(x,y) = \begin{cases} cy - h(y-x) & \text{if } x \le y,\\ cy + b(x-y) & \text{if } y \le x \end{cases} = \begin{cases} cy - h(y-x) & \text{if } x \le y,\\ bx + (c-b)y & \text{if } y \le x, \end{cases} \qquad (2)$$
where h, c and b are, respectively, the salvage value per unit, the ordering cost per unit, and the shortage cost per unit. It is reasonable to assume that $0 \le h < c < b$. We use the same cost function and also observe that
$$L(x,y) \le \begin{cases} cy & \text{if } x \le y,\\ bx & \text{if } y \le x, \end{cases} \qquad \text{for } y \ge 0, \qquad (3)$$
where the first bound uses $h \ge 0$ and the second uses $c < b$. With the discount factor $0 < \alpha < 1$ and with $y = \{y_1, y_2, \ldots\}$ denoting the sequence of inventory levels, our objective is to choose y so as to minimize
$$J(y) := \sum_{n=1}^{\infty} \alpha^{n-1}\, E\, L(x_n, y_n). \qquad (4)$$
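The cost structure (2) to (4) can be checked numerically with a short sketch. The parameter values below are hypothetical (chosen only to satisfy $0 \le h < c < b$), and `discounted_path_cost` evaluates the sum in (4) along a single, finite demand path rather than taking the expectation.

```python
from typing import Sequence


def one_period_cost(x: float, y: float, c: float, h: float, b: float) -> float:
    """One-period cost L(x, y) of (2).

    h: salvage value per unit, c: ordering cost per unit, b: shortage cost per unit,
    with 0 <= h < c < b assumed as in the text.
    """
    if x <= y:
        return c * y - h * (y - x)  # leftover y - x salvaged at h per unit
    return b * x + (c - b) * y      # same as c*y + b*(x - y)


def discounted_path_cost(xs: Sequence[float], ys: Sequence[float],
                         c: float, h: float, b: float, alpha: float) -> float:
    """Finite-horizon, single-path analogue of the sum in (4); no expectation is taken."""
    # enumerate starts at 0, which gives the factor alpha^(n-1) for period n = 1, 2, ...
    return sum(alpha ** n * one_period_cost(x, y, c, h, b)
               for n, (x, y) in enumerate(zip(xs, ys)))


# Hypothetical parameters satisfying 0 <= h < c < b.
c, h, b, alpha = 1.0, 0.5, 3.0, 0.9
print(one_period_cost(2.0, 3.0, c, h, b))  # 2.5 (one unit left over)
print(one_period_cost(4.0, 3.0, c, h, b))  # 6.0 (one unit short)
print(one_period_cost(4.0, 0.0, c, h, b))  # 12.0 = b*x when nothing is ordered
```

The last line illustrates why the zero-order policy costs exactly $b x_n$ per period, which is the quantity used below to bound the value function.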

A standard assumption in the infinite-horizon inventory literature with independently and identically distributed demands or Markovian demands is that the mean demand is finite. This ensures a finite value function. Since demand in our model could grow over time, we must make an assumption to limit the rate of demand growth. Specifically, we assume $E(x_1) < \infty$ and
$$E(x_{n+1} \mid x_n = \xi) = \int_0^\infty x\, p(x|\xi)\, dx \le c_0\, \xi \quad \text{for } n \ge 1, \qquad (5)$$

for a constant $c_0 < 1/\alpha$. Note that if the demand process is a supermartingale, then (5) is satisfied with $c_0 = 1$. These conditions ensure that $\sum_{n=1}^{\infty} \alpha^{n-1} E(x_n) < \infty$: taking expectations in (5) gives $E(x_n) \le c_0^{\,n-1} E(x_1)$, so the sum is at most $E(x_1)/(1 - \alpha c_0)$. By (3), this sum, when multiplied by the unit shortage cost b, is greater than or equal to the total discounted cost associated with the policy of ordering zero in every period. Since this cost is an upper bound on the value function, the value function is finite. Later, in Subsection 4.1, we restate the inequality part of (5) in the form of (19), which is used in the subsequent analysis.

Let $\pi_n(x) = P(x_n = x \mid Z_{n-1})$ be the probability density function of the demand $x_n$ conditional on $Z_{n-1}$. This density materializes at the beginning of period n after observing $z_{n-1}$. The corresponding cumulative distribution function is denoted by $\Pi_n$. Starting with a given $\pi_1$, we can evolve this distribution over time as
$$\pi_{n+1}(x) = P(x_{n+1} = x \mid Z_{n-1}, z_n) = \mathbb{1}_{z_n = y_n}\, \frac{\int_{y_n}^{\infty} \pi_n(\xi)\, p(x|\xi)\, d\xi}{\int_{y_n}^{\infty} \pi_n(\xi)\, d\xi} + \mathbb{1}_{z_n < y_n}\, p(x|z_n). \qquad (6)$$

[…] 0, the result follows.

# θi − bθN .

i=1

□

Because of this lemma, a problem with N = 2 is trivial: $\bar G_2(\rho) \ge \bar G_1(\rho)$ if, and only if, $c \ge h\theta_1/(\theta_1+\theta_2) + b\theta_2/(\theta_1+\theta_2)$. Thus, $x_1 + \epsilon$ is near-optimal when c is greater than or equal to a weighted average (specified by the current belief ρ of the demand) of h and b.

Lemma 9 does not apply for j < N − 1, in which case we relate $\bar G_j(\rho) - \bar G_{j-1}(\rho)$ to the condition
$$h\sum_{i=1}^{j-1}\theta_i + b\sum_{i=j}^{N}\theta_i \ge c\sum_{i=1}^{N}\theta_i. \qquad (75)$$

Since h < c < b, the condition is satisfied with j = 1 but fails with j = N + 1. Moreover, the left-hand side of (75) is decreasing in j, while the right-hand side is constant. Hence, if we let J(ρ) be the largest index that satisfies condition (75), this condition is satisfied for j ≤ J(ρ) and fails for j ≥ J(ρ) + 1.

Lemma 10. $\bar G_j(\rho) \le \bar G_{j-1}(\rho)$ if $j \le J(\rho)$.

Proof: From (72), we immediately write



¯ j (ρ) − G ¯ j−1 (ρ) = (xj − xj−1 ) c G 



+α W  



−α W 

N X

θi − h

i=1 N X

k=j

≤ (xj − xj−1 ) c

θi − h

i=1

24

j−1 X i=1

θi  

j X k=1



θk βkN  +

j−1 X

 θk W (βk− ) 

θk W (βk− )

k=1

k=j N X



θk βkN  +

k=j+1 N X

N X i=j

N X

θk βk1 , . . . ,

θk βk1 , . . . ,



θi − b

i=1

k=j+1 N X

j−1 X

θi − b

N X i=j



θi  ,

(76)

where the inequality is due to the superadditivity of W in Lemma 7. Since the right-hand side of (76) is nonpositive for j ≤ J(ρ), we have $\bar G_j(\rho) - \bar G_{j-1}(\rho) \le 0$. □

As a result of Lemma 10, $\bar G_1(\rho) \ge \bar G_2(\rho) \ge \cdots \ge \bar G_{J(\rho)}(\rho)$, so we need to evaluate $\bar G_j(\rho)$ only for $j \ge J(\rho)$ to find the optimal index $j^*(\rho)$. Namely, $j^*(\rho) = \arg\min_{j \ge J(\rho)} \bar G_j(\rho)$. By restricting the search space for $j^*(\rho)$, this refinement speeds up any procedure for finding $j^*(\rho)$.

Application of Lemmas 9 and 10 to N = 3: We consider a special case with $x_1 < x_2 < x_3$, and use Lemmas 9 and 10 to find conditions on h, c, b and $(\theta_1, \theta_2, \theta_3)$ under which $j^*(\theta_1, \theta_2, \theta_3) = 2$. From Lemma 9, $\bar G_3(\theta_1, \theta_2, \theta_3) \ge \bar G_2(\theta_1, \theta_2, \theta_3)$ if $h(\theta_1 + \theta_2) + b\theta_3 \le c(\theta_1 + \theta_2 + \theta_3)$. Similarly, from Lemma 10, $\bar G_1(\theta_1, \theta_2, \theta_3) \ge \bar G_2(\theta_1, \theta_2, \theta_3)$ if $h\theta_1 + b(\theta_2 + \theta_3) \ge c(\theta_1 + \theta_2 + \theta_3)$. Hence, for all h, c, b and $(\theta_1, \theta_2, \theta_3)$ such that
$$h(\theta_1 + \theta_2) + b\theta_3 \le c(\theta_1 + \theta_2 + \theta_3) \le h\theta_1 + b(\theta_2 + \theta_3),$$
we have $j^*(\theta_1, \theta_2, \theta_3) = 2$. Note that we arrive at this conclusion without evaluating any of the $\bar G_j$ values. ♦
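Condition (75) and Lemma 10 translate directly into a search procedure. The following is a minimal sketch, assuming the belief ρ is represented by (possibly unnormalized) weights $\theta_1, \ldots, \theta_N$ on demand values $x_1 < \cdots < x_N$. All function names are hypothetical, and `restricted_argmin` takes a caller-supplied evaluator for $\bar G_j(\rho)$, because computing $\bar G_j$ requires the value function W, which the sketch does not implement.

```python
from typing import Callable, Sequence


def satisfies_75(theta: Sequence[float], j: int, c: float, h: float, b: float) -> bool:
    """Condition (75) for a 1-based index j; theta may be unnormalized."""
    lhs = h * sum(theta[:j - 1]) + b * sum(theta[j - 1:])
    return lhs >= c * sum(theta)


def largest_index_J(theta: Sequence[float], c: float, h: float, b: float) -> int:
    """J(rho): the largest j in 1..N satisfying (75); j = 1 always does when c < b."""
    N = len(theta)
    return max(j for j in range(1, N + 1) if satisfies_75(theta, j, c, h, b))


def restricted_argmin(G: Callable[[int], float], J: int, N: int) -> int:
    """Minimize G over j = J..N only, the search restriction justified by Lemma 10.

    G is a hypothetical evaluator of G-bar_j(rho) supplied by the caller.
    """
    return min(range(J, N + 1), key=G)


def n3_index_is_2(theta: Sequence[float], c: float, h: float, b: float) -> bool:
    """Sufficient condition from the N = 3 example for j*(theta) = 2."""
    t1, t2, t3 = theta
    s = t1 + t2 + t3
    return h * (t1 + t2) + b * t3 <= c * s <= h * t1 + b * (t2 + t3)


h, c, b = 0.5, 1.0, 3.0   # hypothetical costs with h < c < b
theta = (0.5, 0.4, 0.1)   # hypothetical belief over x1 < x2 < x3
print(largest_index_J(theta, c, h, b))  # 2
print(n3_index_is_2(theta, c, h, b))    # True
```

Because the left-hand side of (75) is monotone in j, taking the maximum over all indices that satisfy it is equivalent to scanning upward until the condition first fails.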

7 Concluding Remarks

We have studied a newsvendor problem with partially observed demands. Partial demand observations lead to a dynamic program in the space of probability distributions. This dynamic program is highly nonlinear. We use unnormalized probabilities to linearize the dynamic programming equation. This linearization allows us to prove the existence of an optimal feedback policy; more generally, the methodology of unnormalized probabilities facilitates proofs of existence of a solution to the DP equation and of an optimal feedback policy in problems with partial observations. In addition, we obtain the equation for the optimal inventory level and show that the optimal inventory level is larger than or equal to the myopic optimal inventory level. We also illustrate the computation of the optimal policy for the case when the demands can take only a finite number of specified values.

Future research on this problem includes studying a family of parameterized distributions for π to examine the evolution of the parameter(s) with Bayesian updates as described in [2]. We plan to numerically investigate the evolution of π by restricting it to the convex hull of a given set of probability distributions. Furthermore, we would like to treat the lost sales case, in which the excess inventory is carried from one period to the next. This will considerably complicate the matter, as it brings in the inventory level as an additional, fully observed state variable. Of course, in periods when the demand exceeds the available inventory, it will still be censored as in the present model.

Acknowledgments: This research is supported in part by NSF grant DMS-0509278 and ARP grant 223259. The authors thank J. Adolfo Minjárez-Sosa, the anonymous referees, and the associate editor for meticulously reading the paper and making many suggestions for improvement.


Appendix

Justification of Equation (6): We have
$$J(y) = E\,L(x_1, y_1) + \sum_{n=2}^{\infty} \alpha^{n-1} E\,L(x_n, y_n) = E\,L(x_1, y_1) + \sum_{n=1}^{\infty} \alpha^{n} E\,L(x_{n+1}, y_{n+1}). \qquad (77)$$

Note that $y_1$ is determined with certainty at the beginning of period 1. For later periods, we need $E\,L(x_{n+1}, y_{n+1})$, which can be obtained as $E\{E[L(x_{n+1}, y_{n+1}) \mid Z_n]\}$. To facilitate this, we introduce an arbitrary test function φ(x), for which we compute
$$E[\varphi(x_{n+1}) \mid Z_n] = \int \varphi(x)\, \pi_{n+1}(x)\, dx. \qquad (78)$$
Since $\pi_{n+1}$ is updated from $\pi_n$ based on the history $Z_n$, $\pi_{n+1}(x)$ is $Z_n$-measurable. Since we start with the given distribution $\pi_1$ of $x_1$,
$$E[\varphi(x_1)] = \int \varphi(\xi)\, \pi_1(\xi)\, d\xi. \qquad (79)$$
In order to compute (78), we must first obtain $E[\psi(x_n) \mid Z_n]$ for any test function ψ(x). Note that both (78) and $E[\psi(x_n) \mid Z_n]$ are taken after observing the sales $z_n$, but the former (resp. latter) involves the demand in period n + 1 (resp. n). Since the IM observes only the sales, the demand $x_n$ is not $Z_n$-measurable. But we can write E[ψ(xn )|Zn ] = E[ψ(xn )1Izn =yn |Zn ] + E[ψ(xn )1Izn