Stochastic programming with simple integer recourse

François V. Louveaux
Facultés Universitaires Notre-Dame de la Paix, 8 Rempart de la Vierge, B-5000 Namur, Belgium
Maarten H. van der Vlerk1 Department of Econometrics, University of Groningen, P.O. Box 800, 9700 AV Groningen, The Netherlands
Stochastic integer programs are notoriously difficult. Very few properties are known and solution algorithms are very scarce. In this paper, we introduce the class of stochastic programs with simple integer recourse, a natural extension of the simple recourse case extensively studied in stochastic continuous programs. Analytical as well as computational properties of the expected recourse function of simple integer recourse problems are studied. This includes sharp bounds on this function and the study of the convex hull. Finally, a finite termination algorithm is obtained that solves two classes of stochastic simple integer recourse problems.
Key words: Simple integer recourse, expected value function, convex hull, algorithms.
Abbreviated title: Simple Integer Recourse
1 Introduction
In trying to model realistic mathematical programs, one is very often faced with some type of uncertainty in the data or parameters of a problem. This justifies the interest in the theory and practice of stochastic programming. In stochastic programs, some decisions, called first-stage decisions, are taken before the information on the random elements is known. Later, when the information is revealed, second-stage or recourse actions may be taken. Since the pioneering work of Dantzig [3], much of the research in this field has concentrated on stochastic linear programming. The reader is referred to the book edited by Ermoliev and Wets [4] for a general overview of the field. Results of recent numerical experiments can be found in [5]. However, the most easily solved stochastic programs are those having simple recourse. In simple recourse models, the only possible recourse action is to incur linear penalties for any shortage or surplus. Consequently, many properties of these programs are known (see e.g. [6]) and computational results are much better than in the general case (see e.g. [11]).
Supported by the National Operations Research Network in the Netherlands (LNMB).
Very little is known about stochastic integer programs (SIP). One reason for this is that many nice properties of continuous (simple) recourse models are lost. In general, the expected recourse function of a SIP is discontinuous, and in case it is continuous, it usually is nonconvex [10]. Therefore, research on algorithms has been limited so far, with the exception of stochastic programs with binary first-stage decisions [7]. If properties or methods are ever to become available, they are most likely to be found first for programs with simple integer recourse. It is precisely the object of this paper to study this particular class of problems. A formal definition of stochastic programs with simple integer recourse is given in Section 2. Finiteness, monotonicity and computational formulae are discussed in Section 3, whereas Section 4 handles continuity and differentiability. Bounds and approximations are provided in Section 5, and properties of the convex hull in Section 6. Finally, in Section 7 we present two algorithms to solve some stochastic two-stage integer programs.
2 Simple Integer Recourse
Let ξ be a random vector with support Ξ ⊂ IR^m2 and expectation µ. The cumulative distribution function (cdf) of ξ is denoted by F, with F(t) = Pr{ξ ≤ t}, t ∈ IR^m2.
We consider the following mathematical program:

SIR   min z = cx + Eξ {min q+y+ + q−y− | y+ ≥ ξ − T x, y− ≥ T x − ξ, y+ ∈ ZZ+^m2, y− ∈ ZZ+^m2}   (1)
      s.t.  Ax = b,  x ∈ IR+^n1,

where A, b, c, q+, q− and T are given matrices of appropriate size. We call this mathematical program a stochastic linear program with simple integer recourse. Sometimes x is also integer; in that case, x ∈ IR+^n1 is replaced by x ∈ ZZ+^n1. In the sequel, we assume q+ ≥ 0 and q− ≥ 0. A simple recourse program models the allocation of scarce resources under uncertainty, with linear penalties q+y+ associated with shortages and q−y− with surpluses. Compared to the classical simple recourse model, SIR requires the recourse actions to be integer. This may occur in production planning when production only takes place in batches. Also, in several distribution models, clients are served by fully loaded trucks. Whenever a particular client has a higher demand, extra trucks have to be sent, the extra cost being qi+ per truck for client i, where qi+ denotes component i of q+. The interest in stochastic programs with simple recourse comes not only from practical applications but also from the fact that it is the simplest of all nontrivial stochastic programs.
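Since q+ ≥ 0 and q− ≥ 0, the second-stage minimization decouples per component, and the optimal integer recourse is simply the smallest feasible one: y+ = ⌈ξ − χ⌉+ and y− = ⌈χ − ξ⌉+, with χ = T x the tender. A minimal sketch in Python (the function and variable names are ours, not from the paper):

```python
import math

def second_stage_value(chi, xi, q_plus, q_minus):
    """Optimal recourse cost Q(x, xi) for one scenario of the SIR model.

    With q+ >= 0 and q- >= 0 the cheapest feasible integer recourse is
    y+ = ceil(xi - chi)+ and y- = ceil(chi - xi)+, componentwise.
    chi = T x is the tender; all arguments are per-component sequences.
    """
    total = 0.0
    for c, x, qp, qm in zip(chi, xi, q_plus, q_minus):
        y_plus = max(math.ceil(x - c), 0)   # integer shortage
        y_minus = max(math.ceil(c - x), 0)  # integer surplus
        total += qp * y_plus + qm * y_minus
    return total
```

For example, with a tender of 2.5, a realized demand of 4.2 and unit costs (3, 1), the shortage is ⌈1.7⌉ = 2 trucks, at cost 6.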
The SIR program is easily seen to be a natural extension of the stochastic program with continuous simple recourse:

SR    min z = cx + Eξ {min q+y+ + q−y− | y+ − y− = ξ − T x, y+ ∈ IR+^m2, y− ∈ IR+^m2}
      s.t.  Ax = b,  x ∈ IR+^n1.

To study SIR, we define the second-stage value function

Q(x, ξ) = min{q+y+ + q−y− | y+ ≥ ξ − T x, y− ≥ T x − ξ, y+ ∈ ZZ+^m2, y− ∈ ZZ+^m2}.
Under the assumption q+ ≥ 0 and q− ≥ 0, this function is non-negative. Also, the second stage always being feasible, Q(x, ξ) is finite for every x ∈ IR^n1 and ξ ∈ Ξ. We also define Q(x) = Eξ Q(x, ξ) as the second-stage expected value function. It is customary to define χ = T x as a tender to be bid against future outcomes, so that Q(x) can be seen as a separable function of the components χi of χ:

Q(x) = Σ_{i=1}^{m2} ψi(χi),

with ψi(χi) = Eξi ψi(χi, ξi) and

ψi(χi, ξi) = min{qi+ yi+ + qi− yi− | yi+ ≥ ξi − χi, yi− ≥ χi − ξi, yi+ ∈ ZZ+, yi− ∈ ZZ+}.

Now, define the expected shortage as gi(χi) = E⌈ξi − χi⌉+ and the expected surplus as hi(χi) = E⌈χi − ξi⌉+, where ⌈x⌉ denotes the smallest integer greater than or equal to x, and ⌈x⌉+ = max{⌈x⌉, 0}. The ith term in the definition of the expected value function is thus

ψi(χi) = qi+ gi(χi) + qi− hi(χi).
As indicated before, we assume qi+ ≥ 0 and qi− ≥ 0. Hence, all properties of the expected value function directly depend on the properties of the expected shortage and the expected surplus. In Section 3, we study finiteness, monotonicity and computational formulas for these functions. Continuity and differentiability are studied in Section 4, bounds and approximations in Section 5, and the convex hull in Section 6. Finally, algorithmic procedures to solve (1) are considered in Section 7.
3 The expected value function: definition and computation
3.1 Definitions
We repeat the definitions of expected shortage and expected surplus. For convenience we drop the index i.

Definition 3.1 Let ξ be a random variable and x ∈ IR. The expected shortage is defined as

g(x) = E⌈ξ − x⌉+   (2)

and the expected surplus as

h(x) = E⌈x − ξ⌉+.   (3)

It is convenient to link these functions to their continuous counterparts.

Definition 3.2 Let ξ be a random variable and x ∈ IR. Then

ḡ(x) = E(ξ − x)+   (4)

and

h̄(x) = E(x − ξ)+.   (5)

They can be interpreted as second-stage value functions of programs without integrality conditions on the second-stage variables. On several occasions we will use properties of ḡ and h̄ to analyze the functions g and h. Moreover, in Section 6 it will turn out that the convex hulls of g and h are functions similar to ḡ and h̄ respectively. For easy reference, we state here some known properties of the function ḡ.

Lemma 3.1 Let ξ be a random variable with cdf F. Then the function ḡ has the following properties:
(i) ḡ is a non-negative, non-increasing convex function on IR.
(ii) For all x ∈ IR it holds that ḡ(x) < ∞ if and only if µ+ = E max{ξ, 0} < ∞.
(iii) Suppose µ+ < ∞. Then
(a) ḡ is (Lipschitz-)continuous,
(b) ḡ is subdifferentiable,
(c) ḡ is differentiable in any continuity point of F,
(d) ḡ has a horizontal asymptote y = 0 for x → ∞. If also µ− = E max{−ξ, 0} < ∞, then the line y = µ − x is an asymptote for x → −∞. Outside the support of ξ, the function ḡ is equal to its asymptotes.
□
Similar properties hold for the function h̄. Without proof we give the following well-known formulas for ḡ and h̄:

Lemma 3.2 Let ξ be a random variable with cdf F. Then

ḡ(x) = ∫_x^∞ (1 − F(t)) dt,  x ∈ IR,   (6)

and

h̄(x) = ∫_{−∞}^x F(t) dt,  x ∈ IR.   (7)
□
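Formula (6) can be evaluated numerically when no closed form is available. A sketch, assuming an exponential distribution so that the result can be checked against the known closed form ḡ(x) = e^{−λx}/λ for x ≥ 0 (the truncation point and step count below are our own choices):

```python
import math

def g_bar_numeric(F, x, upper, n=20000):
    """Midpoint-rule approximation of formula (6):
    g_bar(x) = integral from x to `upper` of (1 - F(t)) dt.
    `upper` truncates the integral; adequate once 1 - F is negligible there."""
    h = (upper - x) / n
    return sum((1.0 - F(x + (i + 0.5) * h)) * h for i in range(n))

# Exponential distribution with rate lam: g_bar(x) = exp(-lam*x)/lam for x >= 0.
lam = 2.0
F = lambda t: 1.0 - math.exp(-lam * t) if t > 0 else 0.0
approx = g_bar_numeric(F, 0.5, 30.0)
exact = math.exp(-lam * 0.5) / lam
```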
3.2 Finiteness and monotonicity
By providing appropriate bounds we obtain a necessary and sufficient condition for finiteness of the function g. Noting that

(t − x)+ ≤ ⌈t − x⌉+ ≤ (t − x + 1)+ ≤ (t − x)+ + 1,  ∀t ∈ IR,   (8)

we see that

E(ξ − x)+ ≤ g(x) ≤ E(ξ − x + 1)+ ≤ E(ξ − x)+ + 1,   (9)

so g(x) is finite if and only if ḡ(x) is finite. Hence we have

Theorem 3.1 Let ξ be a random variable and µ+ = E max{ξ, 0}. Then g is a non-negative, non-increasing, extended real-valued function. Moreover, for all x ∈ IR it holds that g(x) < ∞ if and only if µ+ < ∞.

Proof. Use (9) and Lemma 3.1. The monotonicity and non-negativity are obvious. □

In the same way we find

Theorem 3.2 Let ξ be a random variable and µ− = E max{−ξ, 0}. Then h is a non-negative, non-decreasing, extended real-valued function. Moreover, for all x ∈ IR it holds that h(x) < ∞ if and only if µ− < ∞. □

In the rest of this paper we will assume that µ+ and µ− are finite. Since µ+ − µ− = µ, this is equivalent to the finiteness of µ.
3.3 A formula for the expected shortage (surplus) function
To study the function g more closely, we now derive its expression in terms of the cdf F of the random variable ξ.

Theorem 3.3 Let g(x) = E⌈ξ − x⌉+, where ξ is a random variable with cdf F. Then

g(x) = Σ_{k=0}^∞ (1 − F(x + k)),  x ∈ IR.   (10)

Proof.

Σ_{k=0}^∞ (1 − F(x + k)) = Σ_{k=0}^∞ Pr{ξ − x > k}
                         = Σ_{k=0}^∞ Σ_{j=k+1}^∞ Pr{⌈ξ − x⌉+ = j}
                         = Σ_{j=1}^∞ Σ_{k=0}^{j−1} Pr{⌈ξ − x⌉+ = j}
                         = Σ_{j=1}^∞ j · Pr{⌈ξ − x⌉+ = j}
                         = E⌈ξ − x⌉+ = g(x). □

The formula (10) is very similar to the expression for ḡ(x) = E(ξ − x)+ presented in Lemma 3.2. Because of Theorem 3.1, the sum in (10) converges for any x ∈ IR, since µ+ < ∞. In fact, this convergence is uniform in x on each interval [a, ∞), since F is non-decreasing. To present the corresponding result for the function h, it is convenient to define F̂(t) = Pr{ξ < t}. Note that F̂ is continuous from the left for any distribution and that F̂ equals F if ξ has a pdf.

Theorem 3.4 Let h(x) = E⌈x − ξ⌉+, where ξ is a random variable with left-continuous cdf F̂. Then

h(x) = Σ_{k=0}^∞ F̂(x − k),  x ∈ IR.   (11)
□
Formulas (10) and (11) provide a means to compute g(x) and h(x) in finitely many steps in two cases: when analytical expressions can be found for the infinite sums, or when ξ has bounded support. These two situations are illustrated by examples on the exponential distribution and the uniform distribution respectively. In all other situations we have to resort to approximations; see Section 5.
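For a discrete distribution with bounded support, the sum in (10) is finite and can be evaluated directly; the result can be checked against the definition E⌈ξ − x⌉+. A sketch, using a toy three-point distribution of our own choosing:

```python
import math

def g_int(x, pmf):
    """Expected integer shortage via formula (10):
    g(x) = sum over k >= 0 of (1 - F(x + k)); terms vanish once
    x + k reaches the top of the (bounded) support.
    `pmf` maps support points to probabilities."""
    hi = max(pmf)
    total, k = 0.0, 0
    while x + k < hi:
        F_xk = sum(p for s, p in pmf.items() if s <= x + k)  # cdf at x + k
        total += 1.0 - F_xk
        k += 1
    return total

def g_direct(x, pmf):
    """The same quantity straight from the definition E ceil(xi - x)+."""
    return sum(p * max(math.ceil(s - x), 0) for s, p in pmf.items())

pmf = {0.5: 0.2, 1.0: 0.3, 2.5: 0.5}  # toy distribution, for illustration only
```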
Example 3.1 Let ξ be a random variable following an exponential distribution with parameter λ > 0. Then

g(x) = Σ_{k=0}^∞ (1 − F(x + k))
     = ⌈−x⌉+ + Σ_{k=⌈−x⌉+}^∞ e^{−λ(x+k)}
     = ⌈−x⌉+ + e^{−λ(x+⌈−x⌉+)} / (1 − e^{−λ}).   (12)

Since µ+ = 1/λ < ∞, Theorem 3.1 implies that g(x) < ∞, ∀x ∈ IR. Because F(t) = 0, ∀t ≤ 0, it is easy to see that h(x) = 0, ∀x ≤ 0. The following is an explicit formula for the function h restricted to the interval [0, ∞):

h(x) = Σ_{k=0}^∞ F(x − k)
     = Σ_{k=0}^{⌊x⌋} (1 − e^{−λ(x−k)})
     = ⌊x⌋ + 1 − (e^{−λ(x−⌊x⌋)} − e^{−λ(x+1)}) / (1 − e^{−λ}),   (13)

where ⌊x⌋ denotes the largest integer less than or equal to x. Since µ− = 0, Theorem 3.2 implies that h(x) < ∞, ∀x ∈ IR.

Example 3.2 Let ξ be a random variable with a uniform distribution on [a, b], 0 ≤ a < b < ∞ (notation: ξ ∼ U(a, b)). Without loss of generality we may restrict the analysis to distributions on the interval [0, c], c = b − a, since g(x + a) = E⌈ζ − x⌉+ where ζ ∼ U(0, c). Then

g(x) = Σ_{k=0}^∞ (1 − F(x + k))
     = ⌈−x⌉+ + Σ_{k=⌈−x⌉+}^{⌈c−x⌉+ − 1} (1 − (x + k)/c)
     = ⌈c − x⌉+ − (⌈c − x⌉+ + ⌈−x⌉+ − 1)(⌈c − x⌉+ − ⌈−x⌉+) / (2c) − (⌈c − x⌉+ − ⌈−x⌉+) x/c.   (14)
Since µ+ = c/2 < ∞, we know by Theorem 3.1 that g(x) < ∞, ∀x ∈ IR. We conclude this section by stating the relation that exists between function values at points that are an integer distance apart, and some applications thereof.
Corollary 3.1 For all n ∈ ZZ+ we have

g(x + n) = g(x) − Σ_{k=0}^{n−1} (1 − F(x + k)),  ∀x ∈ IR.   (15)

Proof. Directly from Theorem 3.3. □

Corollary 3.2 For all n ∈ ZZ+ we have

h(x + n) = h(x) + Σ_{k=1}^n F̂(x + k),  ∀x ∈ IR.   (16)
□
We can use Corollary 3.1 in several ways. If ξ has a discrete distribution with support Ξ ⊂ ZZ (e.g. a Poisson distribution), then it allows us to compute g(x) in finitely many steps.

Corollary 3.3 Let ξ be a discrete random variable with support Ξ ⊂ ZZ and cdf F. Then

g(x) = µ+ − ⌊x⌋ − Σ_{k=⌊x⌋}^{−1} F(k),  if x < 0;
g(x) = µ+ − ⌊x⌋ + Σ_{k=0}^{⌊x⌋−1} F(k),  if x ≥ 0.

Proof. Observe that g(x) = g(⌊x⌋), ∀x ∈ IR, since F(t) = F(⌊t⌋), ∀t ∈ IR. Trivially, g(0) = µ+. Apply Corollary 3.1 to obtain the result. □

Corollary 3.4 Let ξ be a discrete random variable with support Ξ ⊂ ZZ and cdf F. Then

h(x) = µ− − Σ_{k=⌈x⌉}^{−1} F(k),  if x < 0;
h(x) = µ− + Σ_{k=0}^{⌈x⌉−1} F(k),  if x ≥ 0.
□
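Corollary 3.3 turns the infinite sum (10) into a finite one for integer-supported distributions. A sketch for the Poisson case (for a Poisson distribution µ+ = µ, the mean); the reference function truncates the direct expectation, which is our own device:

```python
import math

def poisson_cdf(k, mu):
    """F(k) = Pr{xi <= k} for xi ~ Poisson(mu), integer k."""
    if k < 0:
        return 0.0
    return sum(math.exp(-mu) * mu**j / math.factorial(j) for j in range(k + 1))

def g_poisson(x, mu):
    """Expected shortage via Corollary 3.3; for Poisson, mu+ = mu."""
    fl = math.floor(x)
    if x < 0:
        return mu - fl - sum(poisson_cdf(k, mu) for k in range(fl, 0))
    return mu - fl + sum(poisson_cdf(k, mu) for k in range(fl))

def g_poisson_direct(x, mu, n=60):
    """Reference value: truncated expectation E ceil(xi - x)+."""
    p, total = math.exp(-mu), 0.0   # pmf at 0, updated iteratively
    for j in range(n):
        total += max(math.ceil(j - x), 0) * p
        p *= mu / (j + 1)
    return total
```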
To see another application of Corollary 3.1, suppose that c = sup{t : F(t) = 0} > −∞. Then for all x ≤ c we have g(x + 1) = g(x) − 1. In this case g is a semi-periodic function with period 1 on (−∞, c + 1]. (We call a function semi-periodic if it can be decomposed into a periodic component and a linear trend.) In Section 6 we will use this property to find the convex hull of g in case the cdf F belongs to a certain class. Corollaries 3.1 and 3.2 will also be used in Section 7 to design algorithms to solve (1).
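The closed forms (12) and (14) of Examples 3.1 and 3.2 can be sanity-checked against simulation. A sketch (our own verification, not part of the paper; the parameter values are arbitrary):

```python
import math, random

def g_exp(x, lam):
    """Formula (12): expected shortage for xi ~ exponential(lam)."""
    m = max(math.ceil(-x), 0)
    return m + math.exp(-lam * (x + m)) / (1.0 - math.exp(-lam))

def g_unif(x, c):
    """Formula (14): expected shortage for xi ~ U(0, c)."""
    a = max(math.ceil(-x), 0)
    b = max(math.ceil(c - x), 0)
    return b - (b + a - 1) * (b - a) / (2 * c) - (b - a) * x / c

random.seed(42)
N = 200000
mc_exp = sum(max(math.ceil(random.expovariate(1.5) - 0.7), 0) for _ in range(N)) / N
mc_unif = sum(max(math.ceil(random.uniform(0, 3.6) - 1.2), 0) for _ in range(N)) / N
```

For ξ ∼ U(0, 3.6) and x = 1.2 one can also check (14) by hand: the shortage is 1, 2 or 3 with probabilities 1/3.6, 1/3.6 and 0.4/3.6, giving 4.2/3.6.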
4 Continuity and differentiability
In Section 4.1 we prove several continuity properties of the function g. In Section 4.2 differentiability is discussed. Corresponding results for the function h are mentioned.
4.1 Continuity
First we discuss (one-sided) continuity and semicontinuity.

Theorem 4.1 Let ξ be a random variable. Then
(i) g is continuous from the right. It is continuous at x = q if and only if Pr{ξ ∈ q + ZZ+} = 0.
(ii) g is lower semicontinuous.

Proof. (i) From (10) we see that the function g is equal to a uniformly convergent sum of terms that are all non-increasing and continuous from the right. Therefore,

lim_{x↑q} g(x) = Σ_{k=0}^∞ lim_{x↑q} Pr{ξ > x + k} = Σ_{k=0}^∞ Pr{ξ ≥ q + k},   (17)

and similarly

lim_{x↓q} g(x) = Σ_{k=0}^∞ Pr{ξ > q + k} = g(q).   (18)

Comparing (17) and (18), part (i) follows.
(ii) Follows immediately from (i) and the fact that g is non-increasing. □
Theorem 4.2 Let ξ be a random variable. Then
(i) h is continuous from the left. It is continuous at x = q if and only if Pr{ξ ∈ q − ZZ+} = 0.
(ii) h is lower semicontinuous. □
Next we consider Lipschitz continuity of the function g. Schultz [9] gives sufficient conditions for the expected value function to be Lipschitz continuous in the general complete recourse case; the proof is rather involved. In the simple recourse case, formula (10) yields a straightforward proof of the result. For the same reason, we are also able to give a sharp upper bound on the Lipschitz constant in a special case. Before we present these results, we prove a lemma.
Lemma 4.1 Let f be a pdf. If f(t) is non-increasing for t sufficiently large, then

Σ_{k=0}^∞ f(x + k) < ∞,  ∀x ∈ IR.   (19)

In fact, this sum is at most 1 if x is sufficiently large.

Proof. Let T ∈ IR be such that f(t) is non-increasing on [T, ∞), and suppose that x ≥ T. Then we have for any k ∈ {1, 2, . . .} that f(x + k) ≤ f(t), for t ∈ [x + k − 1, x + k]. Hence,

Σ_{k=1}^∞ f(x + k) ≤ ∫_T^∞ f(t) dt ≤ 1.

It follows that (19) is true for x ≥ T. Hence, it must be true for x < T too, since in that case

Σ_{k=0}^∞ f(x + k) = Σ_{k=0}^{⌊T−x⌋} f(x + k) + Σ_{k=⌊T−x⌋+1}^∞ f(x + k)
                  = Σ_{k=0}^{⌊T−x⌋} f(x + k) + Σ_{k=0}^∞ f(y + k) < ∞,

where y = x + ⌊T − x⌋ + 1 > T. □
Theorem 4.3 Let ξ be a random variable with a pdf f. If f(t) is non-increasing for t sufficiently large and if f is bounded, then the function g is Lipschitz continuous on any interval [c, ∞).

Proof. Let c ≤ x < y and f(t) ≤ M, ∀t ∈ IR. Let T ∈ IR be such that f is non-increasing on [T, ∞). Define K = ⌈T − c + 1⌉+, so that f is non-increasing on [x + K − 1, ∞). Then

|g(x) − g(y)| = g(x) − g(y)
             = Σ_{k=0}^∞ (1 − F(x + k)) − Σ_{k=0}^∞ (1 − F(y + k))
             = Σ_{k=0}^∞ ∫_{x+k}^{y+k} f(t) dt
             ≤ Σ_{k=0}^{K−1} |x − y| M + Σ_{k=K}^∞ ∫_{x+k}^{y+k} f(t) dt
             ≤ KM |x − y| + |x − y| Σ_{k=K}^∞ f(x + k)
             ≤ (KM + L) |x − y|,  where L = Σ_{k=K}^∞ f(x + k) ≤ 1,

the last inequality holding by (the proof of) Lemma 4.1. □
Corollary 4.1 Let ξ be a random variable with a pdf f. If f is non-increasing and bounded on [c, ∞), then the function g is Lipschitz continuous on [c + 1, ∞) with Lipschitz constant L ≤ 1.

Proof. Directly from Theorem 4.3 with K = 0. □
We finish this section with a theorem on Lipschitz continuity of the function h. In this case, a monotonicity condition on the left tail of the distribution of ξ is needed.

Theorem 4.4 Let ξ be a random variable with a pdf f. If f(t) is non-decreasing for t sufficiently small and if f is bounded, then the function h is Lipschitz continuous on any interval (−∞, c]. □
4.2 Differentiability
We already know that g is in general not differentiable on IR, since it may be discontinuous on IR. In this section we give necessary and sufficient conditions for (one-sided) differentiability of the function g and derive the corresponding formulas. Before presenting these results we introduce some notation. By f′− and f′+ we denote the left and right derivative of a function f respectively. Since we need one-sided continuity of the pdf f in the next theorem, we define the following two functions:

f−(t) = lim_{ε↓0} f(t − ε),   (20)
f+(t) = lim_{ε↓0} f(t + ε).   (21)

Note that f− and f+ are continuous from the left and right respectively.

Theorem 4.5 Let ξ be a random variable having a piecewise continuous pdf f. Assume that µ+ = Eξ+ < ∞ and that f(t) is non-increasing for t sufficiently large. Let D be the set of discontinuity points of f. Define the set D̄ = {t ∈ IR : ∃d ∈ D and ∃k ∈ {0, 1, . . .} such that t = d − k}. Then
(i) the left derivative g′− and the right derivative g′+ exist everywhere, and

g′−(x) = − Σ_{k=0}^∞ f−(x + k),   (22)
g′+(x) = − Σ_{k=0}^∞ f+(x + k),   (23)

where f− and f+ are defined by (20) and (21);
(ii) g is differentiable at x if and only if x ∉ D̄. In this case

g′(x) = − Σ_{k=0}^∞ f(x + k).   (24)
Proof. First note that the assumption on the monotonicity of f ensures, by Lemma 4.1, that the values of the sums in (22), (23) and (24) are finite. The assertions in the first part of the theorem are proven for the right derivative only, since the proof for the left derivative is analogous.

(i) By definition we have

g′+(x) = lim_{h↓0} (g(x + h) − g(x)) / h

if this limit is defined. Substituting the formula for g we get

g′+(x) = lim_{h↓0} (1/h) [Σ_{k=0}^∞ (1 − F(x + k + h)) − Σ_{k=0}^∞ (1 − F(x + k))]
       = lim_{h↓0} Σ_{k=0}^∞ (−1/h) ∫_{x+k}^{x+k+h} f(t) dt
       = Σ_{k=0}^{Tx−1} lim_{h↓0} (−1/h) ∫_{x+k}^{x+k+h} f(t) dt + lim_{h↓0} Σ_{k=Tx}^∞ (−1/h) ∫_{x+k}^{x+k+h} f(t) dt,   (25)

where Tx is chosen such that the pdf f is non-increasing on [x + Tx, ∞). Considering the second term on the right in (25), we have for all k ∈ {Tx, Tx + 1, . . .} that

0 ≤ (1/h) ∫_{x+k}^{x+k+h} f(t) dt ≤ (1/h) · h f(x + k) = f(x + k).

We know that Σ_{k=0}^∞ f(x + k) < ∞. Therefore, by Lebesgue's Dominated Convergence Theorem, we may exchange summation and taking the limit. Since

lim_{h↓0} (1/h) ∫_{x+k}^{x+k+h} f(t) dt = f+(x + k),

the result follows.

(ii) To prove the second part of the theorem we simply note that the left and right derivatives at a point x are equal if and only if x ∉ D̄. In this case f−(x + k) = f+(x + k) = f(x + k), ∀k ∈ {0, 1, . . .}, and formula (24) follows from the first part of the theorem. □
Using Theorem 4.5 we see that the function g is differentiable on the open interval (c, ∞) if and only if the pdf f is continuous on this same interval, and that g is differentiable on IR if and only if f is continuous on IR. If the random variable ξ follows a discrete distribution, then g is a piecewise constant function which is continuous from the right. It follows that in this case g′+(x) = 0, ∀x ∈ IR, and that g′−(x) = 0, ∀x ∈ IR \ Dg, where Dg denotes the set of discontinuity points of the function g. If x ∈ Dg the left derivative does not exist. The results on (one-sided) differentiability of the function h are similar.

Theorem 4.6 Let ξ be a random variable having a piecewise continuous pdf f. Assume that µ− = Eξ− < ∞ and that f(t) is non-decreasing for t sufficiently small. Let D be the set of discontinuity points of f. Define the set D̄ = {t ∈ IR : ∃d ∈ D and ∃k ∈ {0, 1, . . .} such that t = d + k}. Then
(i) the left derivative h′− and the right derivative h′+ exist everywhere, and

h′−(x) = Σ_{k=0}^∞ f−(x − k),   (26)
h′+(x) = Σ_{k=0}^∞ f+(x − k),   (27)

where f− and f+ are defined by (20) and (21);
(ii) h is differentiable at a point x if and only if x ∉ D̄. In this case

h′(x) = Σ_{k=0}^∞ f(x − k).   (28)
□

Results on differentiability on an interval follow from Theorem 4.6. Regarding discrete distributions, similar remarks apply as for the function g.
5 Bounds and Approximations
We already know that ḡ(x) ≤ g(x) ≤ ḡ(x − 1). Now we prove a sharper upper bound and use it to derive an approximation formula.

Theorem 5.1 Let ξ be a random variable with cdf F. Then

g(x) ≤ ḡ(x) + 1 − F(x).   (29)

Proof. Since 1 − F(t) is non-increasing, we have for any x ∈ IR and any k ∈ {1, 2, . . .} that 1 − F(x + k) ≤ 1 − F(t), for t ∈ [x + k − 1, x + k). Hence,

Σ_{k=1}^∞ (1 − F(x + k)) ≤ ∫_x^∞ (1 − F(t)) dt.

Adding 1 − F(x) to both sides gives the result. □

It is easy to show that this bound is indeed sharper. Similarly, for the function h we have

Theorem 5.2 Let ξ be a random variable with cdf F̂(t) = Pr{ξ < t}. Then

h(x) ≤ h̄(x) + F̂(x).   (30)
□

We now turn to approximation formulae, in order to compute g(x) and h(x) by an expression with finitely many terms.

Theorem 5.3 Let ξ be a random variable with cdf F. Let n be an integer, n ≥ 1. Define

gn(x) = Σ_{k=0}^{n−1} (1 − F(x + k)) + ḡ(x + n).   (31)

Then

gn(x) ≤ g(x) ≤ gn(x) + 1 − F(x + n).   (32)
Proof. Directly from Theorem 5.1 and Corollary 3.1. □

A similar result is obtained for h(x).

Theorem 5.4 Let ξ be a random variable with cdf F̂(t) = Pr{ξ < t}. Let n be an integer, n ≥ 1. Define

hn(x) = Σ_{k=0}^{n−1} F̂(x − k) + h̄(x − n).   (33)

Then

hn(x) ≤ h(x) ≤ hn(x) + F̂(x − n).   (34)
□
These two theorems provide useful approximations of g(x) and h(x), using a finite sum of terms. To approximate g(x) within an accuracy ε, we compute the first n terms of gn(x), where n is chosen such that F(x + n) ≥ 1 − ε, and evaluate an integral, i.e. compute ḡ(x + n). We illustrate this approximation technique with an example on the normal distribution.

Example 5.1 Let ξ ∼ N(µ, σ²) with cdf F and pdf f. Then

gn(x) = Σ_{k=0}^{n−1} (1 − F(x + k)) + ∫_{x+n}^∞ (1 − F(t)) dt
      = Σ_{k=0}^{n−1} (1 − F(x + k)) − (x + n)(1 − F(x + n)) + ∫_{x+n}^∞ t f(t) dt.

Using t f(t) = µ f(t) − σ² f′(t), it follows that

gn(x) = Σ_{k=0}^{n−1} (1 − F(x + k)) + (µ − x − n)(1 − F(x + n)) + σ² f(x + n).

In the same way we find

hn(x) = Σ_{k=0}^{n−1} F(x − k) + (x − n − µ) F(x − n) + σ² f(x − n).
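Example 5.1 can be put to work directly; the sandwich (32) shows how tight gn already is for moderate n, and a Monte Carlo estimate of E⌈ξ − x⌉+ confirms the value. A sketch (parameter values and sample size are our own choices):

```python
import math, random

mu, sigma = 1.0, 0.8

def F(t):
    """Normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def f(t):
    """Normal pdf."""
    return math.exp(-((t - mu) ** 2) / (2.0 * sigma**2)) / (sigma * math.sqrt(2.0 * math.pi))

def g_n(x, n):
    """Finite approximation of Example 5.1 for xi ~ N(mu, sigma^2)."""
    s = sum(1.0 - F(x + k) for k in range(n))
    return s + (mu - x - n) * (1.0 - F(x + n)) + sigma**2 * f(x + n)

x = 0.3
lo = g_n(x, 8)
hi = lo + 1.0 - F(x + 8)   # sandwich (32): g_n(x) <= g(x) <= g_n(x) + 1 - F(x+n)

random.seed(0)
mc = sum(max(math.ceil(random.gauss(mu, sigma) - x), 0) for _ in range(200000)) / 200000
```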
6 Convexity and convex hull
In this section we study convexity properties of the functions g and h. Since these functions are in general non-convex, much attention is paid to finding their respective convex hulls. As before, the text and proofs deal with the function g only, but results (and one proof) are presented for both functions.

As noted in Section 4.1, the function g is continuous if and only if the cdf F of ξ is continuous. This is of course also a necessary condition for convexity of a finite function on IR. A sufficient condition for convexity on an interval [c, ∞) is that F is concave on [c, ∞). To see this, use (10) and note that under this condition g equals the finite-valued sum of an infinite number of convex functions, which is itself convex. If ξ has a pdf f, concavity of F on [c, ∞) is equivalent to f being non-increasing on [c, ∞). However, in general g(x) = E⌈ξ − x⌉+ will not be a convex function on IR. Therefore we are interested in the convex hull of this function, which is the largest convex function majorized by g. We denote the convex hull by g∗∗, since it is the biconjugate function of g (see e.g. [8]). In practice it can be very difficult to compute g∗∗ by biconjugation. Before we state Theorem 6.3, which allows us to compute g∗∗ when ξ has a cdf belonging to a special class, we present a result on which it is based.

Theorem 6.1 Let u be a finite convex function on IR with asymptote 0 as x → ∞. Then u is monotone non-increasing and

u(x) = ∫_x^∞ −u′+(t) dt,   (35)

where u′+ denotes the right derivative of u. Moreover, if lim_{x→−∞} u′+(x) = −1, then

U(t) = u′+(t) + 1   (36)

is a cdf. Hence, using Lemma 3.2, we have

u(x) = ∫_x^∞ (1 − U(t)) dt = E(ζ − x)+,  x ∈ IR,   (37)

where ζ is any random variable with cdf U.

Proof. Since u is a finite convex function on IR, u′+ exists at any x. To prove (35) we use ([8], Corollary 24.2.1):

u(x) − u(y) = ∫_x^y (1 − U(t)) dt,  ∀x < y.   (38)

Taking y → ∞ we get the desired result. If u satisfies all conditions specified, then u′+ is non-decreasing and continuous from the right, u′+(−∞) = −1 and u′+(∞) = 0, so that U is a cdf. □

We will show that g∗∗ satisfies all conditions for u in Theorem 6.1. Recall the properties of the function ḡ(x) = E(ξ − x)+ that were discussed in Section 3. Since

ḡ(x) ≤ g∗∗(x) ≤ ḡ(x) + (1 − F(x)),  ∀x ∈ IR,   (39)

and µ+ < ∞, we know that g∗∗ is a finite function on IR with asymptote 0 as x → ∞. Since it is convex too, all right derivatives (g∗∗)′+(x), x ∈ IR, exist. Because of convexity, if g∗∗ were increasing on some interval, say [a, b], it would be increasing on [a, ∞). But this contradicts the right inequality in (39), since both ḡ(x) and 1 − F(x) are non-increasing. It follows that g∗∗ is a non-increasing function. Finally, the convex functions g∗∗ and ḡ differ by at most 1, so that

lim_{x→−∞} (g∗∗)′+(x) = lim_{x→−∞} ḡ′+(x) = lim_{x→−∞} (F(x) − 1) = −1.

As to the convex hull of the function h, it is readily seen to satisfy all conditions for the function v in Theorem 6.2 below.

Theorem 6.2 Let v be a finite convex function on IR with asymptote 0 as x → −∞. Then v is monotone non-decreasing and

v(x) = ∫_{−∞}^x v′+(t) dt,   (40)

where v′+ denotes the right derivative of v. Moreover, if lim_{x→∞} v′+(x) = 1, then

V(t) = v′+(t)   (41)

is a cdf. Hence, using Lemma 3.2, we have

v(x) = ∫_{−∞}^x V(t) dt = E(x − ζ)+,  x ∈ IR,   (42)

where ζ is any random variable with cdf V. □
Now we are ready to state the main results of this section. If ξ has a pdf with support contained in [0, ∞) and if the cdf F of ξ is concave on [0, ∞), we can use the following theorem to obtain the convex hull g∗∗ and, using Theorem 6.1, the associated cdf G.

Theorem 6.3 Let ξ be a random variable having a pdf f with support in [0, ∞). Let the cdf F of ξ be a concave function on [0, ∞). Then
(i) the convex hull of g is

g∗∗(x) = g(θF) + (θF − x),  if x ≤ θF;
g∗∗(x) = g(x),  if x ≥ θF,   (43)

for some θF ∈ (0, 1);
(ii) the random variable ζ satisfying g∗∗(x) = E(ζ − x)+ has a cdf G defined by

G(t) = 0,  if t < θF;
G(t) = 1 − Σ_{k=0}^∞ f(t + k),  if t ≥ θF.   (44)

Proof. (i) Recall that

g(x + 1) = g(x) − (1 − F(x)),  ∀x ∈ IR.

Since ξ has a pdf with support in [0, ∞), the function g is semi-periodic on the interval (−∞, 1], i.e. g(x + 1) = g(x) − 1, ∀x ≤ 0. Moreover, using F(x + k) = 0 for k < ⌈−x⌉+, we find

g(x) = ⌈−x⌉+ + Σ_{k=⌈−x⌉+}^∞ (1 − F(x + k)).

We see that g is a convex function on every interval (−(m + 1), −m), m ∈ {−1, 0, 1, . . .}, since F is concave on [0, ∞). The function g is Lipschitz continuous on the closed interval [0, 1]. Therefore, by Lebourg's Mean Value Theorem (see e.g. [2]),

∃x ∈ (0, 1) : −1 ∈ ∂g(x), i.e. g′−(x) ≤ −1 ≤ g′+(x).

Denote an arbitrary choice of such an x by θF. We see that the affine function through all the points (θF − k, g(θF) + k), k ∈ {0, 1, . . .}, is the convex hull of g restricted to (−∞, θF]. Noting that g is a convex function on [0, ∞) since F is concave on [0, ∞), this completes the proof of the first part of the theorem.

(ii) Using Theorem 6.1 we have

g∗∗(x) = E(ζ − x)+ = ∫_x^∞ (1 − G(t)) dt   (45)

for some cdf G. Since g∗∗ is a convex function, the right derivative (g∗∗)′+ exists everywhere. Substituting (43) and taking the right derivative on both sides of (45), we obtain the cdf G. □
We continue with the analogous result for the function h, accompanied by its proof.

Theorem 6.4 Let ξ be a random variable having a pdf f with support in [0, ∞). Let the cdf F of ξ be a concave function on [0, ∞). Then
(i) the convex hull of h is

h∗∗(x) = 0,  if x ≤ 0;
h∗∗(x) = h(⌊x⌋) + F(⌈x⌉) · (x − ⌊x⌋),  if x ≥ 0;   (46)

(ii) the random variable ζ satisfying h∗∗(x) = E(x − ζ)+ has a cdf H defined by H(t) = F(⌊t + 1⌋+).

[Figure 1: The function g and the convex hull g∗∗. (ξ ∼ E(5))]

Proof. (i) Trivially h(x) = 0 for all x ≤ 0. Furthermore we have

h(x) = Σ_{k=0}^{⌈x⌉+ − 1} F(x − k),

since ξ has support in [0, ∞). We see that h is a concave function on every interval (m, m + 1], m ∈ {0, 1, . . .}. Also h(x + 1) = h(x) + F(x + 1), ∀x ∈ IR, so the affine function h(m) + F(m + 1)(x − m) is the convex hull of h restricted to (m, m + 1]. This completes the proof of the first part of the theorem.

(ii) Using Theorem 6.2 we have

h∗∗(x) = E(x − ζ)+ = ∫_{−∞}^x H(t) dt   (47)

for some cdf H. Since h∗∗ is a convex function, the right derivative (h∗∗)′+ exists everywhere. Substituting (46) and taking the right derivative on both sides of (47), the result follows. □
In the following two examples we apply Theorems 6.3 and 6.4 to the exponential and uniform distribution respectively.

Example 6.1 (Example 3.1 continued) Assume that ξ is exponentially distributed with parameter λ > 0. By ζ ∼ E(λ, κ) we denote a random variable with pdf f(t) = λ e^{−λ(t−κ)} · I_{[κ,∞)}(t), i.e. a random variable that follows a shifted exponential distribution. By Theorem 6.3 we find that the convex hull of the function g is

g∗∗(x) = 1/λ + (θF − x),  if x ≤ θF;
g∗∗(x) = g(x),  if x ≥ θF;
       = E(ζ − x)+,

where ζ ∼ E(λ, θF) and θF = (1/λ) ln(λ / (1 − e^{−λ})). The cdf G of the random variable ζ is

G(x) = 0,  if x < θF;
G(x) = 1 − e^{−λ(x−θF)},  if x ≥ θF.

[Figure 2: The function h and the convex hull h∗∗. (ξ ∼ E(1))]

By Theorem 6.4 the convex hull of the function h is

h∗∗(x) = 0,  if x ≤ 0;
h∗∗(x) = ⌊x⌋ + 1 − (1 − e^{−λ(⌊x⌋+1)}) / (1 − e^{−λ}) + (1 − e^{−λ⌈x⌉})(x − ⌊x⌋),  if x ≥ 0;
       = E(x − η)+,

where the random variable η has a cdf H defined by H(t) = 1 − e^{−λ⌊t+1⌋+}.
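Example 6.1 lends itself to a numerical check: g∗∗ should minorize g, touch it at θF, and coincide with the expectation E(ζ − x)+ for the shifted exponential ζ. A sketch (the Monte Carlo check is our own, not from the paper):

```python
import math, random

lam = 1.0
theta = math.log(lam / (1.0 - math.exp(-lam))) / lam  # theta_F of Example 6.1

def g(x):
    """Formula (12): expected shortage for xi ~ exponential(lam)."""
    m = max(math.ceil(-x), 0)
    return m + math.exp(-lam * (x + m)) / (1.0 - math.exp(-lam))

def g_hull(x):
    """Convex hull g** from Example 6.1: affine left of theta_F, g itself right of it."""
    return 1.0 / lam + (theta - x) if x <= theta else g(x)

random.seed(3)
draws = [theta + random.expovariate(lam) for _ in range(200000)]  # zeta ~ E(lam, theta_F)
x0 = -0.4
mc = sum(max(d - x0, 0.0) for d in draws) / len(draws)  # estimate of E(zeta - x0)+
```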
Example 6.2 (Example 3.2 continued) Assume that ξ is uniformly distributed on the interval [0, c]. Using Theorem 6.3 we find that the convex hull of g is E(ζ − x)+, where ζ is a random variable which has a discrete uniform distribution with support contained in [0, c]. Define γ = ⌈c⌉ − c. The convex hull of the function g is

g∗∗(x) = (⌊c⌋² + ⌊c⌋)/(2c) + (γ − x),  if x ≤ γ;
g∗∗(x) = g(x),  if x ≥ γ;
       = E(ζ − x)+,

where ζ is a random variable with a cdf defined by

G(x) = 0,  if x < γ;
G(x) = 1 − ⌈c − x⌉+/c,  if x ≥ γ.

G has discontinuity points at x = k + γ, k = 1, 2, . . . , ⌊c⌋. In each of these points the jump is equal to 1/c, so G is the cdf of a discrete uniform distribution with support {1 + γ, 2 + γ, . . . , c}. Using the explicit formula (14) for the function g, it is not difficult to prove that g is a convex function if and only if c ∈ ZZ. In that case, g is of course equal to its convex hull, and we have E⌈ξ − x⌉+ = E(ζ − x)+, where ζ is a random variable with a discrete uniform distribution on the points {1, 2, . . . , c}.
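For integer c, the identity E⌈ξ − x⌉+ = E(ζ − x)+ of Example 6.2 can be verified pointwise; a sketch with c = 4 (our own check):

```python
import math

c = 4  # integer c, so g is convex and coincides with its convex hull

def g_cont(x):
    """E ceil(xi - x)+ for xi ~ U(0, c), via formula (14)."""
    a = max(math.ceil(-x), 0)       # ceil(-x)+
    b = max(math.ceil(c - x), 0)    # ceil(c - x)+
    return b - (b + a - 1) * (b - a) / (2 * c) - (b - a) * x / c

def g_disc(x):
    """E(zeta - x)+ for zeta discrete uniform on {1, ..., c}."""
    return sum(max(j - x, 0.0) for j in range(1, c + 1)) / c
```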
7 Convex Approximations and Algorithms
In this section we turn our attention to the function ψi defined in Section 2. Again, until the very last part of this section, we drop the index i. As can be seen from the previous sections, the function ψ is in general not convex. Indeed, it is even discontinuous if ξ has a discrete distribution. It turns out, however, that some form of convexity exists between function values at (not necessarily integer) points that are an integer distance apart. We use this property to define a piecewise linear convex approximating function. It is shown that in some cases this function is the convex hull of ψ. Finally, we discuss two algorithms to solve some classes of integer two-stage problems.

Lemma 7.1 Let m ∈ IN. Define Λ = {0, 1/m, 2/m, . . . , 1}. For any fixed x0 ∈ IR and λ ∈ Λ, define x1 = x0 + m and xλ = λx0 + (1 − λ)x1. Then

ψ(xλ) ≤ λψ(x0) + (1 − λ)ψ(x1).

Proof. We prove that ψ(x + 1) − ψ(x) is a non-decreasing function of x. Using Corollaries 3.1 and 3.2 we compute

ψ(x + 1) − ψ(x) = q+(g(x + 1) − g(x)) + q−(h(x + 1) − h(x)) = q− F̂(x + 1) − q+ [1 − F(x)].

Since F and F̂ are non-decreasing and q− ≥ 0, q+ ≥ 0, the result follows. □
From this, it follows that we can construct a piecewise linear convex function that coincides with ψ at any set of points that are an integer length apart.

Definition 7.1 Consider a fixed $x_0$. Define $I_k = [x_0 + k, x_0 + k + 1)$ and $\Delta_k = \psi(x_0 + k + 1) - \psi(x_0 + k)$, $k \in \mathbb{Z}$. The piecewise linear convex function ρ, defined as

$$\rho(x) = \psi(x_0 + k) + \big(x - (x_0 + k)\big)\Delta_k, \quad \text{if } x \in I_k, \tag{47}$$

is said to be a ρ-approximation of ψ rooted at $x_0$.
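The construction (47) can be sketched directly; ψ is treated here as a black box (any real-valued function), and the convexity of the result relies on Lemma 7.1, not on the construction itself:

```python
import math

# rho-approximation (47): piecewise linear, coincides with psi at x0 + k
# for every integer k, and linear on each interval I_k = [x0 + k, x0 + k + 1).
def make_rho(psi, x0):
    def rho(x):
        k = math.floor(x - x0)                      # x lies in I_k
        delta_k = psi(x0 + k + 1) - psi(x0 + k)     # slope on I_k
        return psi(x0 + k) + (x - (x0 + k)) * delta_k
    return rho

# Stand-in function (hypothetical, not the paper's psi) to exercise the code.
psi_example = lambda x: abs(x) + 0.5 * math.sin(2 * math.pi * x)
rho = make_rho(psi_example, x0=0.25)

# rho coincides with psi at all points an integer distance from the root.
assert all(abs(rho(0.25 + k) - psi_example(0.25 + k)) < 1e-12 for k in range(-3, 4))
```

Between the interpolation points, rho simply follows the chord, which is why all ρ-approximations rooted at points an integer distance apart coincide.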
Note that all ρ-approximations of ψ rooted at $x_0 + m$, $m \in \mathbb{Z}$, are identical. To find a class of distributions such that a ρ-approximation is also a lower bound of the function ψ, we first prove a property of ψ for a rather general class of discrete distributions.

Definition 7.2 Let ξ be a discrete random variable with support Ξ. Define

$$S = \inf\Big\{ n : \Xi \subset \bigcup_{j=1}^{n} \{\xi_j + \mathbb{Z}\} \Big\},$$

i.e. S is the number of different fractional values in the support of ξ.

Lemma 7.2 Let ξ be a discrete random variable. Assume that S is finite, and let $s_1 < \ldots < s_S$ be the ordered sequence of different fractional values. Define $s_{S+1} = 1$. Choose $z = \lfloor z\rfloor + \mathrm{frac}$, with $s_n \le \mathrm{frac} < s_{n+1}$ for some $n \le S$. Then

(i) ψ(x) is constant on the open interval $(\lfloor z\rfloor + s_n, \lfloor z\rfloor + s_{n+1})$,

(ii) $\psi(x) \ge \max\{\psi(\lfloor z\rfloor + s_n), \psi(\lfloor z\rfloor + s_{n+1})\}$, $\forall x \in (\lfloor z\rfloor + s_n, \lfloor z\rfloor + s_{n+1})$.
Proof. (i) By Theorems 3.3 and 3.4,

$$\psi(x) = q^+ \sum_{k=0}^{\infty} \big(1 - F(x+k)\big) + q^- \sum_{k=0}^{\infty} \hat F(x-k).$$
From the assumptions it follows that $F(x+k) = F(\lfloor z\rfloor + s_n + k)$ and $\hat F(x-k) = F(\lfloor z\rfloor + s_n - k)$, for any $x \in (\lfloor z\rfloor + s_n, \lfloor z\rfloor + s_{n+1})$ and any integer k. Hence

$$\psi(x) = q^+ \sum_{k=0}^{\infty} \big(1 - F(\lfloor z\rfloor + s_n + k)\big) + q^- \sum_{k=0}^{\infty} F(\lfloor z\rfloor + s_n - k) \tag{48}$$
is constant on the interval $(\lfloor z\rfloor + s_n, \lfloor z\rfloor + s_{n+1})$.

(ii) Assume $x \in (\lfloor z\rfloor + s_n, \lfloor z\rfloor + s_{n+1})$. Again using Theorems 3.3 and 3.4 to compute $\psi(\lfloor z\rfloor + s_n)$, and comparing with (48), it follows that $\psi(x) \ge \psi(\lfloor z\rfloor + s_n)$, since $\hat F(t) \le F(t)$, $\forall t \in \mathbb{R}$. To prove $\psi(x) \ge \psi(\lfloor z\rfloor + s_{n+1})$, we observe that $\hat F(\lfloor z\rfloor + s_{n+1} - k) = F(\lfloor z\rfloor + s_n - k)$, $\forall k \in \mathbb{Z}$. Therefore,

$$\psi(\lfloor z\rfloor + s_{n+1}) = q^+ \sum_{k=0}^{\infty} \big(1 - F(\lfloor z\rfloor + s_{n+1} + k)\big) + q^- \sum_{k=0}^{\infty} F(\lfloor z\rfloor + s_n - k).$$

Comparing with (48) the result follows, since $F(\lfloor z\rfloor + s_n + k) \le F(\lfloor z\rfloor + s_{n+1} + k)$, $\forall k \in \mathbb{Z}$. □
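Lemma 7.2 is easy to observe numerically. The sketch below again assumes the simple-integer-recourse form $\psi(x) = q^+ E\lceil \xi - x\rceil^+ + q^- E\lfloor x - \xi\rfloor^+$ from Section 2 (an assumption of this illustration), with a discrete ξ whose support points all share the fractional value $s_1 = 0.3$, so $S = 1$ and $s_2 = 1$:

```python
import math

# Assumed simple-integer-recourse form of psi for a discrete xi.
def psi(x, support, probs, q_plus=1.0, q_minus=1.5):
    return (q_plus * sum(p * max(math.ceil(xi - x), 0) for xi, p in zip(support, probs))
            + q_minus * sum(p * max(math.floor(x - xi), 0) for xi, p in zip(support, probs)))

support, probs = [0.3, 1.3, 2.3], [0.2, 0.5, 0.3]

# (i) psi is constant on the open interval (1 + s1, 1 + s2) = (1.3, 2.0):
vals = [psi(x, support, probs) for x in (1.35, 1.5, 1.7, 1.99)]
assert all(abs(v - vals[0]) < 1e-12 for v in vals)

# (ii) on that interval psi dominates its values at both endpoints:
assert vals[0] >= psi(1.3, support, probs) - 1e-12
assert vals[0] >= psi(2.0, support, probs) - 1e-12
```

On each such interval ψ has no support point of ξ to cross, which is exactly why it stays flat; the jumps occur only at the lattice points $k + s_1$.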
Corollary 7.1 Let ξ be a discrete random variable with S = 1. Let ρ be a ρ-approximation of ψ rooted at some point in the support of ξ. Then $\rho(x) \le \psi(x)$, $\forall x \in \mathbb{R}$. Moreover, ρ is the convex hull of the function ψ.

Proof. The function ρ is a lower bound by Lemma 7.2. It is the convex hull of the function ψ since it is convex, piecewise linear, and coincides with ψ in all points at integer distance from the root. □

Among the cases with S = 1, the most natural one in the context of simple integer recourse is when ξ only takes on integer values. Then the ρ-approximation rooted at an integer point is the piecewise linear convex hull of ψ, and coincides with ψ at all integer points.

To conclude this section, we consider algorithms which can be used to solve two main cases of stochastic programs with simple integer recourse. The first case is when the χi's are restricted to be integer. This is a natural situation, since one would typically expect first-stage variables to be integer when second-stage variables are integer. By definition of a ρ-approximation, solving SIR is then equivalent to solving
$$\min_{x \in X} \Big\{ cx + \sum_{i=1}^{m_2} \rho_i(\chi_i) \;\Big|\; Ax = b,\; \chi = Tx \Big\} \tag{49}$$

where T is such that $x \in X$ implies χ is integer, and ρi is a ρ-approximation of ψi rooted at an integer point. Since (49) is piecewise linear and convex, it can typically be solved by the multicut version of the L-shaped method of Van Slyke and Wets. This amounts to solving the current problem

$$\min_{x \in X,\,\theta} \Big\{ cx + \sum_{i=1}^{m_2} \theta_i \;\Big|\; Ax = b,\; \chi = Tx,\; E_{ik} \cdot \chi_i + \theta_i \ge e_{ik},\; i = 1, \ldots, m_2,\; k = 1, \ldots, k_i \Big\} \tag{50}$$

In this problem, the last constraints in (50) are optimality cuts. All together they define the epigraphs of the ρi. If $\chi_i^\nu$ is a current iterate point with $\theta_i^\nu < \psi_i(\chi_i^\nu)$, then generate one optimality cut by defining

$$E_{ik} = \psi_i(\chi_i^\nu) - \psi_i(\chi_i^\nu + 1) \quad \text{and} \quad e_{ik} = (\chi_i^\nu + 1)\psi_i(\chi_i^\nu) - \chi_i^\nu \psi_i(\chi_i^\nu + 1).$$

The algorithm iteratively solves problem (49) and generates cuts (50), until an iterate point $(\chi^\nu, \theta^\nu)$ is found such that $\theta^\nu = \psi(\chi^\nu)$. More details on the
multicut algorithm can be found in [1]. Note that the algorithm is applicable for any type of random variable for which ψ can be computed.

The second case is when the χi's are not necessarily integer and the ξi's have a discrete distribution with $S_i = 1$. A multicut approach is again appropriate. Two different types of cuts are needed. Suppose that the current iterate point $\chi_i^\nu \in (s_i + k - \varepsilon, s_i + k + \varepsilon)$, where $s_i$ is the only fractional value in the support of ξi, $k \in \mathbb{Z}$, and $\varepsilon > 0$ is the tolerance on integer values. Then, by Lemma 7.2, valid cuts of the form (50) can be generated. Consider now an iterate point $\chi_i^\nu \in [s_i + k + \varepsilon, s_i + k + 1 - \varepsilon]$. By Lemma 7.2, the function ψi is constant on the open interval $(s_i + k, s_i + k + 1)$. In this case, the cuts would take the form

$$\chi_i \ge (s_i + k + 1)(1 - \delta_{ik}^u) \tag{51}$$
$$\chi_i \le s_i + k + U_i \delta_{ik}^l \tag{52}$$
$$\theta_i \ge \psi_i(\chi_i^\nu) - M_i(2 - \delta_{ik}^u - \delta_{ik}^l) \tag{53}$$

where

$\delta_{ik}^u$ is a binary variable, $\delta_{ik}^u = 1$ iff $\chi_i \le s_i + k + 1$,
$\delta_{ik}^l$ is a binary variable, $\delta_{ik}^l = 1$ iff $\chi_i > s_i + k$,
$U_i$ is an upper bound on $\chi_i$,
$L_i$ is a lower bound on $\theta_i$,
$M_i = \psi_i(\chi_i^\nu) - L_i$.

Given that (51)–(53) add three constraints and two new binary variables per cut, it would be advisable to first use cuts of the first type, even if the iterate points are not equal (up to some tolerance ε) to $s_i + k$ for some integer k. The exact manner to do this, as well as the exact performance of such a method, remains to be investigated.
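The optimality cuts of the first type can be sketched as follows. The cut defined by $E_{ik}$ and $e_{ik}$ is simply the chord of ψi through the points $\chi_i^\nu$ and $\chi_i^\nu + 1$; the function values below are hypothetical stand-ins (consistent with ξ uniform on {1, 2, 3} and $q^+ = q^- = 1$), not output of the paper's algorithm:

```python
# One optimality cut for (50): with
#   E = psi(chi) - psi(chi + 1)  and  e = (chi + 1)*psi(chi) - chi*psi(chi + 1),
# the constraint E*x + theta >= e, i.e. theta >= e - E*x, is exactly the line
# through (chi, psi(chi)) and (chi + 1, psi(chi + 1)).
def optimality_cut(psi, chi):
    E = psi(chi) - psi(chi + 1)
    e = (chi + 1) * psi(chi) - chi * psi(chi + 1)
    return E, e

# Stand-in values of psi_i at integer points (hypothetical example).
psi_vals = {0: 2.0, 1: 1.0, 2: 2.0 / 3, 3: 1.0}
psi = lambda x: psi_vals[x]

E, e = optimality_cut(psi, 1)
# The cut is tight at both chi and chi + 1, so it supports the epigraph of
# the rho-approximation between consecutive integer points:
assert abs(E * 1 + psi(1) - e) < 1e-12
assert abs(E * 2 + psi(2) - e) < 1e-12
```

Because ρi interpolates ψi linearly between integer points, collecting such chords for all visited iterates builds up the epigraph of ρi, which is what the multicut master problem (50) optimizes over.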
Acknowledgement The authors are grateful to an anonymous referee for pointing out reference [2].
References

[1] J. Birge and F.V. Louveaux, "A multicut algorithm for two-stage stochastic linear programs", European Journal of Operational Research 34 (1988) 384–392.
[2] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983).
[3] G. Dantzig, "Linear programming under uncertainty", Management Science 1 (1955) 197–206.
[4] Yu. Ermoliev and R. J-B. Wets (eds.), Numerical Techniques for Stochastic Optimization (Springer-Verlag, Berlin, 1988).
[5] H. Gassmann, "MSLIP, a computer code for the multistage stochastic linear programming problem", Mathematical Programming 47 (1990) 407–423.
[6] P. Kall, Stochastic Linear Programming (Springer-Verlag, Berlin, 1976).
[7] G. Laporte and F.V. Louveaux, "The integer L-shaped method for stochastic integer programs", Technical Report 788, CRT, University of Montréal (Montréal, 1992).
[8] R.T. Rockafellar, Convex Analysis (Princeton University Press, Princeton, NJ, 1970).
[9] R. Schultz, "Continuity properties of expectation functions in stochastic integer programming", Mathematics of Operations Research, to appear.
[10] L. Stougie, "Design and analysis of algorithms for stochastic integer programming", CWI Tract 37, Centrum voor Wiskunde en Informatica (Amsterdam, 1987).
[11] R. J-B. Wets, "Solving stochastic programs with simple recourse", Stochastics 10 (1984) 219–242.