Stochastic Optimization Problems with CVaR Risk Measure and Their Sample Average Approximation

Fanwen Meng (1), Jie Sun (2), Mark Goh (3)

(1) The Logistics Institute - Asia Pacific, National University of Singapore. Email: [email protected]. Fax: +65 6775 3391.
(2) Corresponding author. School of Business and Singapore-MIT Alliance, National University of Singapore. Email: [email protected].
(3) School of Business and The Logistics Institute - Asia Pacific, National University of Singapore. Email: [email protected].
Abstract. We provide a refined convergence analysis for the sample average approximation (SAA) method applied to stochastic optimization problems with either single CVaR or mixed CVaR risk measures. Under certain regularity conditions, it is shown that any accumulation point of the generalized weak stationary points produced by the SAA method is almost surely a weak stationary point of the original CVaR (or mixed CVaR) optimization problem. In addition, it is shown that, as the sample size increases, the difference between the optimal values of the SAA problems and that of the original problem tends to zero with probability approaching one exponentially fast.
Key words. Conditional Value-at-Risk, Sample Average Approximation, Stochastic Optimization, Variational Analysis
AMS subject classifications. 91B28, 90C90, 62P05
1 Introduction
Of particular importance in financial optimization is the notion of VaR (value-at-risk) for quantifying downside risk. However, as a function of the decision variables, VaR is generally nonconvex and difficult to compute, making the resulting optimization problem numerically
hard to solve. An extensive recent study on a related concept, CVaR (conditional VaR), has revealed that, as the tightest convex approximation of VaR, CVaR is a coherent risk measure that enjoys nice features such as convexity and monotonicity [1, 2, 3, 4, 5, 6, 7]. Therefore, CVaR has numerical advantages over VaR. On the negative side, CVaR has been criticized for its conservativeness [8], but this does not seem to prevent its application to such important areas as portfolio management and robust optimization [4, 5, 9, 10, 11]. Suppose that f : IR^n × IR^m → IR is a loss function which depends on a control vector x ∈ X ⊆ IR^n and a random vector ỹ ∈ IR^m. Here X denotes the set of feasible decisions and ỹ : Ω → Ξ ⊆ IR^m is defined on a probability space (Ω, F, P). For a given confidence level α ∈ (0, 1), the VaR of the loss f(x, ỹ) associated with a decision x is defined as

VaR_α(x) = min{ u | P[f(x, ỹ) ≤ u] ≥ α },    (1.1)
where P(·) stands for the probability. In other words, for fixed x, VaR_α(x) is the minimum α-quantile of the distribution of the random variable f(x, ỹ). While CVaR is conceptually defined as the expectation of f(x, ỹ) in the conditional distribution of its upper α-tail, a more "operationally convenient" definition by Rockafellar and Uryasev [7] is as follows:

CVaR_α(x) = min{ η(x, u, α) | u ∈ IR },    (1.2)

where η(x, u, α) is defined as

η(x, u, α) := u + (1/(1−α)) E[f(x, ỹ) − u]^+,    (1.3)
where the superscript plus sign denotes the plus function [t]^+ := max{0, t} and E denotes the mathematical expectation. It has been shown [7] that VaR_α(x) is the lower endpoint of the closed bounded interval arg min_u η(x, u, α). In particular,

CVaR_α(x) = η(x, VaR_α(x), α).    (1.4)
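As a concrete illustration of relations (1.1)–(1.4), the following Python sketch estimates VaR_α and CVaR_α of a sampled loss by working with the sample analogue of η(x, ·, α); the standard normal loss and the sample size are hypothetical choices made here for illustration only and are not part of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.95
losses = rng.standard_normal(100_000)        # stand-in for realizations of f(x, y~) at a fixed x

def eta(u, losses, alpha):
    # sample analogue of (1.3): u + E[f(x, y~) - u]^+ / (1 - alpha)
    return u + np.mean(np.maximum(losses - u, 0.0)) / (1.0 - alpha)

var_alpha = np.quantile(losses, alpha)        # empirical VaR, cf. (1.1)
cvar_alpha = eta(var_alpha, losses, alpha)    # (1.4): CVaR equals eta evaluated at VaR
# sanity check of (1.2): a crude grid minimization of eta over u gives (almost) the same value
grid = np.linspace(var_alpha - 1.0, var_alpha + 1.0, 2001)
print(var_alpha, cvar_alpha, min(eta(u, losses, alpha) for u in grid))
```

For a standard normal loss, both CVaR estimates should agree, up to sampling error, with the closed-form value φ(Φ^{-1}(α))/(1−α) ≈ 2.06 at α = 0.95.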
Another very interesting measure of risk, called the mixed CVaR, is defined as

λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x),    (1.5)

where α_i ∈ (0, 1), i = 1, . . . , J, denote the probability levels and λ_i > 0 represent weights with Σ_{i=1}^J λ_i = 1. Note that the single CVaR is a special mixed CVaR with J = 1. In general, for a weighting measure λ on (0, 1) which is nonnegative with total measure 1, the mixed CVaR can be written as an integral as follows:

∫_0^1 CVaR_α(x) dλ(α).    (1.6)

It can be shown that both (1.5) and (1.6) are coherent measures of risk. In this paper, we are particularly interested in studying both single CVaR and mixed CVaR minimization problems; namely,

min_{x∈X} CVaR_α(x)    (1.7)

and

min_{x∈X} { λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x) }.    (1.8)
It is known [7] that the CVaR minimization problem (1.7) is equivalent to the following problem

min_{(x,u)∈X×IR}  u + (1/(1−α)) E[f(x, ỹ) − u]^+    (1.9)

in the sense that these two problems achieve the same minimum value and the x-component of a solution to (1.9) is a solution to (1.7). Similarly, one can also derive (see Proposition 2.5 in Section 2) that the following problem

min  λ_1 ( u_1 + (1/(1−α_1)) E[f(x, ỹ) − u_1]^+ ) + · · · + λ_J ( u_J + (1/(1−α_J)) E[f(x, ỹ) − u_J]^+ )
s. t.  (x, u_1, · · · , u_J) ∈ X × IR × · · · × IR    (1.10)

is equivalent to Problem (1.8) in the same sense. If the expectations in problems (1.9) and (1.10) can be evaluated analytically, then these two problems can be regarded as standard nonlinear programming problems and consequently they can be solved by existing numerical methods. However, it might not be easy to evaluate or compute the underlying expectations in most practical situations. For instance, even if the random vector ỹ follows a known continuous distribution, calculation of the expected value involves multidimensional integration, which is computationally expensive, if not impossible. Motivated by this concern, this paper is devoted to the analysis of a Monte Carlo simulation-based sampling approach to problems (1.9) and (1.10), called the sample average approximation (SAA) method. The SAA method has been extensively investigated in stochastic optimization; see, for instance, [12]. More recently, the SAA method was applied to supply chain network design problems under uncertainty [13] and to stochastic mathematical programs with equality or equilibrium constraints [14, 15, 16]. In the present paper we concentrate on the SAA method applied to problems (1.9) and (1.10) and provide some new results. In particular, we show that any accumulation point of the generalized weak stationary points produced by the SAA algorithm is almost surely a weak stationary point of the original CVaR optimization problem. We also show that, as the sample size increases, the difference between the optimal values of the SAA problems and that of the original problem tends to zero with probability approaching one exponentially fast. The rest of this paper is organized as follows. Section 2 presents some basic notions and discusses some properties of CVaR, followed by an introduction to the notion of weak stationary points. We analyze the convergence properties of the SAA method in Section 3.
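As an aside, when the loss is affine in x and the expectation in (1.9) is replaced by a sample average over scenarios y^1, . . . , y^N (the SAA idea developed in Section 3), the resulting problem is a linear program, since the plus function can be lifted with auxiliary variables w_i ≥ [f(x, y^i) − u]^+. The following Python sketch illustrates this for a hypothetical portfolio loss f(x, y) = −y^T x with X the standard simplex; the data and sample size are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
alpha, n, N = 0.95, 4, 5000
Y = rng.normal(0.05, 0.2, size=(N, n))                 # i.i.d. scenarios of the return vector y~

# decision vector: (x in R^n, u in R, w in R^N), w_i standing in for [f(x, y^i) - u]^+
c = np.concatenate([np.zeros(n), [1.0], np.full(N, 1.0 / ((1.0 - alpha) * N))])
A_ub = np.hstack([-Y, -np.ones((N, 1)), -np.eye(N)])   # encodes -y_i^T x - u - w_i <= 0
b_ub = np.zeros(N)
A_eq = np.concatenate([np.ones(n), [0.0], np.zeros(N)])[None, :]   # sum(x) = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * n + [(None, None)] + [(0, None)] * N        # x >= 0, u free, w >= 0
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
x_opt, u_opt = res.x[:n], res.x[n]   # u_opt approximates VaR; res.fun approximates the minimal CVaR
print(res.fun, u_opt, x_opt)
```

The LP lifting mirrors the observation of Rockafellar and Uryasev [7] that (1.9) becomes a linear program whenever f(·, y) is linear and the distribution of ỹ is discrete.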
2 Preliminaries

2.1 Basic notions
In this section, we first recall some basic notions which will be used in the subsequent analysis. We then discuss the properties of CVaR and introduce the optimality conditions of the CVaR and mixed CVaR optimization problems. Throughout this paper, ‖·‖ denotes the Euclidean norm of a vector or a compact set of vectors. When M is a compact set of vectors, we denote the norm of M by ‖M‖ := max_{M∈M} ‖M‖. For two compact sets C and D, the deviation from C to D, or the excess of C over D, is defined by

D(C, D) := sup_{x∈C} d(x, D),
where d(x, D) denotes the distance from point x to set D, defined by d(x, D) := inf_{x'∈D} ‖x − x'‖. Before stating the optimality conditions of the underlying CVaR and mixed CVaR optimization problems in Section 2.2, we recall the notion of the expectation of a random compact set-valued mapping (see [16]). Let A(·, ỹ) : V → 2^{IR^l} be a random compact set-valued mapping, where V ⊂ IR^l is a compact set and ỹ : Ω → Ξ ⊂ IR^m is a random vector. A selection of the random set A(v, ỹ(ω)) is a measurable random vector a(v, ỹ(ω)) ∈ A(v, ỹ(ω)). The expectation of A(v, ỹ(ω)), denoted by E[A(v, ỹ(ω))], is defined as the collection of E[a(v, ỹ(ω))], where a(v, ỹ(ω)) is a selection. Note that such selections exist; see Artstein and Vitale [17] and references therein. We need a general assumption for our discussion.

Assumption 1 E[f(x, ỹ) − u]^+ is finite on IR^n × IR.

Assumption 1 ensures that the objective functions in the CVaR minimization (1.2) and the mixed CVaR minimization are well defined, and hence it is natural. Note that if there exists a measurable function κ(ỹ) such that E[κ(ỹ)] < ∞ and |f(x, ỹ)| ≤ κ(ỹ) for all x ∈ X and ỹ ∈ Ξ, then Assumption 1 holds. For simplicity, here and below, we define ζ(x, α) := CVaR_α(x) and write the random vector ỹ simply as y. Evidently, ζ is a function of x and α. For a locally Lipschitz continuous function Φ : Θ ⊆ U → W, where Θ is open, let D_Φ denote the set where Φ is differentiable. Then the B-subdifferential of Φ at u^0 ∈ Θ, denoted by ∂_B Φ(u^0), is the set of V such that V = lim_{k→∞} JΦ(u^k), where u^k ∈ D_Φ converges to u^0. Hence, Clarke's generalized Jacobian of Φ at u^0 is the convex hull of ∂_B Φ(u^0) [18], i.e., ∂Φ(u^0) = conv{∂_B Φ(u^0)}. Rockafellar and Uryasev [7] studied the stability and sensitivity of CVaR and proved that ζ(x, α) is continuous in α on (0, 1). Their result can be enhanced by showing the global Lipschitz continuity of ζ(x, α) in α. Moreover, ∂_α ζ(x, α^0) reduces to a closed interval with a nonnegative
left endpoint if the measure of {y ∈ Ω | f(x, y) − VaR_{α^0}(x) ≤ 0} is zero. We state these results in the following; the proofs can be found in the Appendix.

Proposition 2.1 (i) Given δ ∈ (0, 1) and x ∈ X, ζ(x, α) as a function of α is globally Lipschitz continuous on (0, δ). That is, there exists a Lipschitz constant L > 0 such that for any α_1, α_2 ∈ (0, δ),

|ζ(x, α_1) − ζ(x, α_2)| ≤ L|α_1 − α_2|.

(ii) Let α^0 ∈ (0, 1) and x ∈ X be given. The generalized Jacobian of the conditional value at risk ζ(x, α) at α^0 is

∂_α ζ(x, α^0) = conv{ (1/(1−α^0)^2) E[f(x, y) − VaR_{α^0}(x)]^+ , (1/(1−α^0)^2) E[f(x, y) − VaR^+_{α^0}(x)]^+ },

where VaR^+_α(x) := min{u | F(x, u) > α} and F(x, ·) represents the cumulative distribution of z (= f(x, y)) on IR.

We next present a result concerning the differentiability of the function η(·, ·, α). The proof is in the Appendix.

Proposition 2.2 Let α ∈ (0, 1). Suppose that: (i) f(x, y) is continuously differentiable in x for a.e. y ∈ Ω; (ii) ∇_x f(x, y) is continuous on X × Ω. Let S(x, u) := {y ∈ Ω | f(x, y) = u}. If S(x, u) is a set of zero measure, then η(·, ·, α) is differentiable at (x, u) with

∇_{x,u} η(x, u, α) = (0, 1)^T + (1/(1−α)) ∫_{S̄(x,u)} (∇_x f(x, y)^T, −1)^T dP(y),

where S̄(x, u) = {y ∈ Ω | f(x, y) ≥ u}.

Note that the conditions in Proposition 2.2 hold automatically in many practical cases. For example, if the loss function f is an affine function, then obviously conditions (i) and (ii) are satisfied. Further, given (x, u) ∈ X × IR, S(x, u) will reduce to either a singleton or an empty set. Clearly, the measure of S(x, u) equals zero in these situations.
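As a quick worked instance of Proposition 2.2 (under assumptions chosen here for concreteness, not stated in the paper), take the affine loss f(x, y) = y^T x with x ≠ 0 and y admitting a density, so that S(x, u) has measure zero. Substituting ∇_x f(x, y) = y into the formula gives

```latex
\nabla_{x,u}\,\eta(x,u,\alpha)
  = \begin{pmatrix} 0 \\ 1 \end{pmatrix}
  + \frac{1}{1-\alpha}\int_{\{y\,:\,y^{\top}x \ge u\}}
      \begin{pmatrix} y \\ -1 \end{pmatrix}\,\mathrm{d}P(y)
  = \begin{pmatrix} \dfrac{1}{1-\alpha}\,\mathbb{E}\bigl[\,y\,\mathbf{1}\{y^{\top}x \ge u\}\bigr] \\[6pt]
      1-\dfrac{1}{1-\alpha}\,P\bigl[y^{\top}x \ge u\bigr] \end{pmatrix}.
```

In particular, the u-component vanishes when P[y^T x ≥ u] = 1 − α, which is consistent with VaR_α(x) minimizing η(x, ·, α) in (1.2).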
2.2 Optimality conditions

2.2.1 The Case of Single CVaR
Let z := (x, u). With a little abuse of notation, we write η(z, α) := η(x, u, α) = u + (1/(1−α)) E[f(x, y) − u]^+, and Z := X × IR. Recall that minimizing CVaR over X is equivalent to solving the following problem:

min{ η(z, α) | z ∈ Z }.    (2.11)
Since, in general, η is nonsmooth in z on Z, (2.11) is actually a nonsmooth problem. Note that dom η(·, α) = IR^n × IR by Assumption 1, where "dom" denotes the domain of a function. Note further that (2.11) can be rewritten as

min_{z∈Z} E[g(z, α, y)].    (2.12)

Here η(z, α) = E[g(z, α, y)] and g(z, α, y) := u + (1/(1−α)) [f(x, y) − u]^+.
Note that if X is convex, then (2.11) is a convex problem. Therefore, according to [19, Theorem 23.8], the optimality condition of problem (2.11) is as follows.

Proposition 2.3 Suppose that the set X is convex and the function f(x, y) is convex in x for a.e. y ∈ Ω. Then a necessary and sufficient condition for z̄ ∈ IR^n × IR to be an optimal solution of problem (2.11) is

0 ∈ ∂_z E[g(z̄, α, y)] + N_Z(z̄),    (2.13)

where N_Z(z̄) denotes the normal cone of Z at z̄. Note that, in general, ∂_z E[g(z, α, y)] ⊂ E[∂_z g(z, α, y)]. However, the former has the disadvantage of requiring derivative information of the expected value of g(z, α, y), which may be difficult to obtain. Fortunately, under some regularity conditions [18, Definition 2.3.4], such as convexity of g(z, α, y) in z or continuous differentiability of g in z, these two sets of subdifferentials coincide; see, for instance, [20, Proposition 2.10] and [21, Proposition 5.1]. For the problems considered in this paper, due to the convexity of the underlying functions, we always have ∂_z E[g(z, α, y)] = E[∂_z g(z, α, y)]. Thus, condition (2.13) can be written as

0 ∈ E[∂_z g(z̄, α, y)] + N_Z(z̄).    (2.14)
We then derive the following result, which further characterizes the optimality conditions for the CVaR minimization problem in terms of mathematical expectation.

Proposition 2.4 Suppose that X is a convex set and f(·, y) is convex for a.e. y ∈ Ω. If z̄ is an optimal solution of (2.11), then

0 ∈ (0, 1)^T + (1/(1−α)) E[A(z̄, y)] + N_Z(z̄),    (2.15)

where

A(z̄, y) := { ν (∂_x f(x̄, y)^T, −1)^T : ν ∈ ∂(max{0, t̄}), t̄ = f(x̄, y) − ū }.
Proof. By convexity, Proposition 2.3, and [22, Theorem 9], the optimality condition can be written as 0 ∈ ∂_z E[g(z̄, α, y)] + N_Z(z̄) = E[∂_z g(z̄, α, y)] + N_Z(z̄). On the other hand, applying some basic operations on the subdifferential of g, we have

∂_z g(z̄, α, y) ⊂ (0, 1)^T + (1/(1−α)) ∂_z [f(x̄, y) − ū]^+.

In addition, note that

∂_z [f(x, y) − u]^+ ⊂ { ν (∂_x f(x, y)^T, −1)^T : ν ∈ ∂(max{0, t}), t = f(x, y) − u }.

Thereby, relation (2.15) follows immediately. This completes the proof.

Definition 2.1 A point z ∈ Z is called a weak stationary point of (2.11) if it satisfies (2.15). A point z ∈ Z is called a stationary point of (2.11) if it satisfies (2.14).

According to Definition 2.1, if a point is a stationary point of (2.11), it must be a weak stationary point, but not vice versa. In the following, we concentrate on the analysis of convergence of the weak stationary points of the associated SAA programs. We now investigate the properties of the mapping A(·, y).

Lemma 2.1 For every y ∈ Ω, the set-valued mapping A(·, y) : Z → 2^{IR^{n+1}} is upper semicontinuous.
Proof. For any z = (x, u), let Υ(z, y) := f(x, y) − u. Clearly, Υ(·, y) is continuous in z. Since ∂_x f(x, y) is upper semicontinuous at x, it follows by definition that for any ε > 0 there exists δ_1 > 0 such that for any x' satisfying ‖x' − x‖ ≤ δ_1,

∂_x f(x', y) ⊂ ∂_x f(x, y) + ε B_n,    (2.16)

where B_n denotes the unit ball in IR^n. For any z ∈ Z, we consider the following two cases: (i) Υ(z, y) > 0; (ii) Υ(z, y) ≤ 0.

For case (i), by the continuity of Υ(·, y), there exists δ_2 > 0 such that for any z' satisfying ‖z' − z‖ ≤ δ_2 we have Υ(z', y) > 0. In this case, A(z', y) = { (∂_x f(x', y)^T, −1)^T }, since ∂(max{0, t}) = {1} when t > 0. In addition, it follows from (2.16) that

(∂_x f(x', y)^T, −1)^T ⊂ (∂_x f(x, y)^T, −1)^T + ε B_{n+1}

for any z' satisfying ‖z' − z‖ < δ_1. Let δ = min{δ_1, δ_2}. Then for any z' satisfying ‖z' − z‖ < δ, A(z', y) ⊂ A(z, y) + ε B_{n+1}, which shows the upper semicontinuity of A(·, y).

For case (ii), noticing that ∂(max{0, t}) = [0, 1] for t ≤ 0, we have

A(z, y) = { (ν ∂_x f(x, y)^T, −ν)^T : ν ∈ [0, 1] }.

Then, for z' ∈ N(z, δ_1) := {z' : ‖z' − z‖ ≤ δ_1}, we consider two cases: (a) Υ(z', y) > 0; (b) Υ(z', y) ≤ 0.

For case (a), we have A(z', y) = { (∂_x f(x', y)^T, −1)^T }. Since ∂_x f(x', y) ⊂ ∂_x f(x, y) + ε B_n, one has ν ∂_x f(x', y) ⊂ ν ∂_x f(x, y) + ν ε B_n ⊂ ν ∂_x f(x, y) + ε B_n for any ν ∈ [0, 1]. Thus, for any ν ∈ [0, 1],

(ν ∂_x f(x', y)^T, −ν)^T ⊂ (ν ∂_x f(x, y)^T, −ν)^T + ε B_{n+1},

which leads to

A(z', y) ⊂ { (ν ∂_x f(x', y)^T, −ν)^T : ν ∈ [0, 1] } ⊂ { (ν ∂_x f(x, y)^T, −ν)^T : ν ∈ [0, 1] } + ε B_{n+1} = A(z, y) + ε B_{n+1}.    (2.17)

For case (b), note that A(z', y) = { (ν ∂_x f(x', y)^T, −ν)^T : ν ∈ [0, 1] }. Again, by the upper semicontinuity of ∂_x f(x, y), it is easy to derive that A(z', y) ⊂ A(z, y) + ε B_{n+1}, which together with (2.17) shows the upper semicontinuity of A in case (ii). This completes the proof.
Lemma 2.1 shows that A(·, y) is upper semicontinuous, which is an important property in analyzing the convergence of the stationary points in the sequel. Note that A(z, y) is actually a compact convex set for each z ∈ Z. Note also that the conditions in Proposition 2.4 are reasonable and moderate. In many practical cases, the loss function f is assumed to be affine or convex piecewise linear-quadratic, which satisfies the conditions in Proposition 2.4. Further, when f is continuously differentiable, the subdifferential ∂_x f(x̄, y) reduces to the singleton {∇_x f(x̄, y)}. In this case, A(z̄, y) is a segment along the direction d with d = (∇_x f(x̄, y)^T, −1)^T.
2.2.2 The Case of Mixed CVaR
We now consider optimality conditions for the mixed CVaR minimization problem associated with the loss function f(x, y), namely,

min { λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x) | x ∈ X }.    (2.18)
Since each term in the above objective is itself a minimization subproblem, we first present an equivalent formulation of (2.18). The proof is given in the Appendix.

Proposition 2.5 The minimization problem (2.18) is equivalent to

min  λ_1 ( u_1 + (1/(1−α_1)) E[f(x, y) − u_1]^+ ) + · · · + λ_J ( u_J + (1/(1−α_J)) E[f(x, y) − u_J]^+ )
s. t.  x ∈ X, u_1, · · · , u_J ∈ IR,    (2.19)

in the sense that they achieve the same minimum value, and moreover, (x*, u_1*, · · · , u_J*) achieves the minimum of (2.19) if and only if x* achieves the minimum of (2.18) and

u_i* ∈ arg min_{u_i∈IR} { u_i + (1/(1−α_i)) E[f(x*, y) − u_i]^+ }  for i = 1, . . . , J.

According to Proposition 2.5, we study problem (2.19) in what follows. Clearly, (2.19) can be rewritten as

min_{(x,u_1,...,u_J)∈X×IR×···×IR}  λ_1 E[g_1(x, u_1, α_1, y)] + · · · + λ_J E[g_J(x, u_J, α_J, y)],    (2.20)

where g_i(x, u_i, α_i, y) := u_i + (1/(1−α_i)) [f(x, y) − u_i]^+, i = 1, . . . , J. For convenience, here we still use the notation z to denote (x, u_1, . . . , u_J) and Z to denote X × IR × · · · × IR. With arguments similar to the single CVaR case, we derive the optimality conditions for the mixed CVaR optimization problem.
Proposition 2.6 Suppose that X is a convex set and f(·, y) is convex for a.e. y ∈ Ω. If z̄ is an optimal solution of (2.19), then

0 ∈ (0, λ)^T + (λ_1/(1−α_1)) E[A_1(z̄, y)] + · · · + (λ_J/(1−α_J)) E[A_J(z̄, y)] + N_Z(z̄),    (2.21)

where λ = (λ_1, . . . , λ_J)^T ∈ IR^J and, for i = 1, . . . , J,

A_i(z̄, y) := { ν (∂_x f(x̄, y)^T, β_i^T)^T : ν ∈ ∂(max{0, t̄}), t̄ = f(x̄, y) − ū_i },

with β_i ∈ IR^J the vector whose entries are all zero except the i-th entry, which is −1, i.e., β_i^T = (0, . . . , −1, . . . , 0).

Proof. With arguments similar to Proposition 2.3, it follows that

0 ∈ ∂_z ( λ_1 E[g_1(x̄, ū_1, α_1, y)] + · · · + λ_J E[g_J(x̄, ū_J, α_J, y)] ) + N_Z(z̄).

By Assumption 1, it is not hard to see that each E[g_i(x, u_i, α_i, y)] is a proper convex function with domain the whole space IR^n × IR × · · · × IR. Then, by [19, Theorem 23.8] and noticing the positivity of the λ_i, we have

∂_z ( λ_1 E[g_1(x, u_1, α_1, y)] + · · · + λ_J E[g_J(x, u_J, α_J, y)] ) = λ_1 ∂_z E[g_1(x, u_1, α_1, y)] + · · · + λ_J ∂_z E[g_J(x, u_J, α_J, y)].

In addition, by [20, Proposition 2.10], the expectation operator and the subdifferential operator in the above equation can be exchanged. Thus, it yields that

0 ∈ λ_1 E[∂_z g_1(x̄, ū_1, α_1, y)] + · · · + λ_J E[∂_z g_J(x̄, ū_J, α_J, y)] + N_Z(z̄).    (2.22)

Then the result follows immediately with the help of Proposition 2.4.

Similarly, we now define the notion of a weak stationary point associated with the mixed CVaR problem.
Definition 2.2 We call z ∈ Z a weak stationary point of problem (2.19) if it satisfies (2.21), and call z ∈ Z a stationary point of problem (2.19) if it satisfies (2.22).

Again, if a point is a stationary point of problem (2.19), it must be a weak stationary point, but not vice versa. Note also that, by Lemma 2.1, each mapping A_i(·, y), i = 1, . . . , J, is upper semicontinuous and compact-valued on Z. In the next section, we use a sampling approach to solve the equivalent counterparts of the CVaR and mixed CVaR minimization problems and analyze the convergence of the stationary points as the sample size increases.
3 Convergence Analysis of the SAA Method
In this section, we introduce the SAA method for solving stochastic programs (2.12) and (2.20). We define the notion of stationary point for the SAA counterparts of the original problems and then analyze the convergence of the weak stationary points as the sample size tends to infinity. The main tool used in the analysis is a uniform strong law of large numbers for random compact set-valued mappings, which was established in [16]. Here, we mainly consider the case where the distribution of the random vector y is continuous.
3.1 Sample Average Approximation
Let us consider a stochastic programming problem in the form

min_{x∈X} { g(x) := E[G(x, ξ)] },    (3.23)

where G(x, ξ) is a function of two vector variables x ∈ IR^l and ξ ∈ IR^d, X ⊂ IR^l is a given set, and ξ = ξ(ω) is a random vector. One of the best known approaches to solving the expected value problem (3.23) is the Monte Carlo simulation based method. The basic idea of the method is to generate an independent identically distributed (i.i.d.) sample ξ^1, . . . , ξ^N of ξ and then approximate the expected value with the sample average, that is,

min_{x∈X} { ĝ_N(x) := (1/N) Σ_{i=1}^N G(x, ξ^i) }.    (3.24)

We refer to (3.23) as the original problem and (3.24) as the sample average approximation problem. Shapiro used the SAA method to solve stochastic mathematical programs with equilibrium constraints [15]. Later, Meng and Xu [14], and Xu and Meng [16], further investigated the SAA method for stochastic mathematical programs with (nonsmooth) equality constraints. There are several advantages to using the SAA method. The method separates sampling procedures from optimization techniques, which makes it easy to implement. Any optimization algorithm developed for a class of stochastic programs can be applied to the constructed SAA problem directly. Also, the method is ideally suited for parallel implementation. The statistical theory of SAA estimators is closely related to the statistical inference of the Maximum Likelihood (ML) method and M-estimators, and there is a well developed statistical inference for the SAA method; see the comprehensive review by Shapiro [12]. Most convergence analysis of SAA problems in the literature concerns the convergence of optimal solutions and optimal values; see [12] and references therein. Our interest here, however, is in the convergence of weak stationary points. Namely, if we can obtain a weak stationary point of (3.24), then what is the accumulation point of the SAA weak stationary sequence? Our motivation for considering this issue is rooted in allowing more flexibility in solving (3.24) in practice.
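In code, the SAA recipe (3.24) amounts to drawing a sample once and handing the resulting deterministic sample-average function to any solver. The following minimal Python sketch illustrates this; the quadratic integrand G and the normal distribution of ξ are hypothetical choices for illustration only.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def G(x, xi):
    # hypothetical integrand G(x, xi); any function of (decision, random vector) would do
    return float(np.sum((x - xi) ** 2))

N = 2000
sample = rng.normal(loc=1.0, scale=0.5, size=(N, 3))     # i.i.d. sample xi^1, ..., xi^N

def g_hat(x):
    # the SAA objective (3.24): a plain deterministic function once the sample is fixed
    return np.mean([G(x, xi) for xi in sample])

res = minimize(g_hat, x0=np.zeros(3))                    # any NLP solver can be used here
print(res.x)   # should be close to E[xi] = (1, 1, 1) for this toy G
```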
3.2 Stationary Points of SAA Counterparts

3.2.1 The Single CVaR Case
We now study the optimality conditions of the SAA counterparts of the true problems. For the single CVaR case, let {y^1, y^2, · · · , y^N} be an i.i.d. sample of y. The SAA program of (2.12) (or (2.11)) is as follows:

min { g_N(z, α) := (1/N) Σ_{i=1}^N g(z, α, y^i) | z ∈ Z }.    (3.25)

Clearly, (3.25) is a deterministic optimization problem. In order to solve the original problem (2.12), the SAA method solves a sequence of problems (3.25) and lets the sample size N increase. Note that, in the analysis, α ∈ (0, 1) is taken as a fixed scalar. Hence, unless otherwise specified, the underlying derivatives are taken with respect to z. Before analyzing the convergence of the stationary points of the SAA programs, we start our discussion with a generalized Karush-Kuhn-Tucker (GKKT) condition for (3.25), which is stated as 0 ∈ ∂_z g_N(z, α) + N_Z(z). By Proposition 2.4, we have

∂_z g_N(z, α) ⊆ (0, 1)^T + (1/(N(1−α))) Σ_{i=1}^N A(z, y^i).

We may consider the weak GKKT condition of (3.25) as follows:

0 ∈ (0, 1)^T + (1/(N(1−α))) Σ_{i=1}^N A(z, y^i) + N_Z(z).    (3.26)

We say z is a weak GKKT point of (3.25) if it satisfies (3.26).
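For a differentiable loss, one element of the sample-average set on the right-hand side of (3.26) (apart from the normal cone term) is easy to assemble: pick ν = 1 whenever f(x, y^i) − u > 0 and ν = 0 otherwise (any ν ∈ [0, 1] is admissible on the kink). The sketch below is a hypothetical helper, not part of the paper, that could feed a subgradient method or a stationarity check for (3.25).

```python
import numpy as np

def saa_weak_gkkt_element(x, u, sample, alpha, f, grad_x_f):
    """Return one vector from (0,1)^T + (1/(N(1-alpha))) * sum_i A((x,u), y^i), cf. (3.26).
    f(x, y) is the loss and grad_x_f(x, y) its gradient in x (assumed to exist)."""
    N = len(sample)
    g = np.zeros(x.size + 1)
    g[-1] = 1.0
    for y in sample:
        t = f(x, y) - u
        nu = 1.0 if t > 0 else 0.0          # a selection from the subdifferential of max{0, t}
        if nu:
            g[:-1] += nu * grad_x_f(x, y) / (N * (1.0 - alpha))
            g[-1] -= nu / (N * (1.0 - alpha))
    return g
```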
3.2.2 The Case of Mixed CVaR
Similarly, for the mixed CVaR minimization case, let {y^1, y^2, · · · , y^N} be an i.i.d. sample of y. The SAA program of (2.20) is as follows:

min_{z∈Z}  g_N(z, α_1, . . . , α_J) := λ_1 ( (1/N) Σ_{i=1}^N g_1(z, α_1, y^i) ) + · · · + λ_J ( (1/N) Σ_{i=1}^N g_J(z, α_J, y^i) ).    (3.27)

Then the GKKT condition for (3.27) can be written as

0 ∈ ∂_z g_N(z, α_1, . . . , α_J) + N_Z(z) = λ_1 ∂_z ( (1/N) Σ_{i=1}^N g_1(z, α_1, y^i) ) + · · · + λ_J ∂_z ( (1/N) Σ_{i=1}^N g_J(z, α_J, y^i) ) + N_Z(z).

Note that, with the help of Proposition 2.6, the above condition can be further relaxed as

0 ∈ (0, λ)^T + (λ_1/((1−α_1)N)) Σ_{i=1}^N A_1(z, y^i) + · · · + (λ_J/((1−α_J)N)) Σ_{i=1}^N A_J(z, y^i) + N_Z(z),    (3.28)

in the sense that a point satisfying the former must satisfy the latter, but not vice versa. We say z is a weak GKKT point of (3.27) if it satisfies (3.28).
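Evaluating the SAA objective (3.27) once a sample is fixed is equally straightforward. The Python sketch below is a hypothetical helper (the loss f, the levels α_i, and the weights λ_i are user-supplied), not an implementation prescribed by the paper.

```python
import numpy as np

def mixed_cvar_saa_objective(x, u, sample, alphas, lambdas, f):
    """Evaluate g_N(z, alpha_1, ..., alpha_J) in (3.27), where z = (x, u_1, ..., u_J)."""
    losses = np.array([f(x, y) for y in sample])
    value = 0.0
    for j, (a_j, lam_j) in enumerate(zip(alphas, lambdas)):
        value += lam_j * (u[j] + np.mean(np.maximum(losses - u[j], 0.0)) / (1.0 - a_j))
    return value
```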
3.3 Convergence Analysis
We now investigate the weak GKKT points of the SAA method; some nonsmooth analysis techniques are employed. We first make some necessary preparations before deriving the main convergence results.
3.3.1 The Case of Single CVaR

Let C(·, y) : V → 2^{IR^l} be a random compact set-valued mapping, where V ⊂ IR^l is a compact set and y is a random vector defined on Ω. Let {y^1, · · · , y^N} be an i.i.d. sample of y. Obviously, for every v ∈ V, the sets C(v, y^i), i = 1, · · · , N, are independent, identically distributed random sets. Let

C_N(v) := (1/N) Σ_{i=1}^N C(v, y^i).    (3.29)

Here the addition of compact sets is in the sense of Minkowski, that is, for two compact subsets K and L, K + L = {x + y : x ∈ K, y ∈ L}. Clearly, C_N(v) is a compact subset. The following lemma, which is taken from [16, Lemma 3.2] with a slight modification, plays a role in analyzing the convergence of the stationary points.
Lemma 3.1 Let V ⊂ IR^l be a compact set, and let C(·, y) : V → 2^{IR^l} be a compact, upper semicontinuous set-valued mapping for every y ∈ Ω. Let C_N(v) be defined as in (3.29) for every v ∈ V. Suppose that there exists π(y) such that

‖C(v, y)‖ := sup_{C∈C(v,y)} ‖C‖ ≤ π(y),    (3.30)

where E[π(y)] < ∞. Then for any µ > 0, there exists τ > 0 such that

lim_{N→∞} C_N(v) ⊂ E[C_τ(v, y)] + µB,  with probability 1 (w.p.1)    (3.31)

uniformly for v ∈ V, where C_τ(v, y) := ∪_{w∈B(v,τ)} C(w, y), B(v, τ) denotes the closed ball in IR^l with radius τ and center v, B denotes the unit ball in IR^l, and the mathematical expectation of C(v, y) is taken entry-wise on C(v, y).

Theorem 3.1 Let α ∈ (0, 1). Let {z_N} be a sequence of weak GKKT points that satisfy (3.26) and let z* be an accumulation point. Suppose that there exist a compact set Z containing a neighborhood B(z*, τ_0) ∩ Z of z*, and a function κ_0(y) such that ‖∂_x f(x, y)‖ ≤ κ_0(y) for any (z, y) ∈ Z × Ω, where τ_0 is a small positive number and E[√(κ_0(y)^2 + 1)] < ∞. Then, w.p.1, z* satisfies

0 ∈ (0, 1)^T + (1/(1−α)) E[A(z*, y)] + N_Z(z*),

namely, z* is a weak stationary point of (2.12).

Proof. First, note that, by assumption, E[A(z, y)] is well defined for z ∈ Z. Define A_τ(z*, y) := ∪_{z∈B(z*,τ)} A(z, y) with τ > 0. By Lemma 2.1, for every y ∈ Ω, A(·, y) is an upper semicontinuous and compact set-valued mapping on Z. Then, by Lemma 3.1, for any µ > 0, there exists τ > 0 such that

lim_{N→∞} (1/N) Σ_{i=1}^N A(z, y^i) ⊂ E[A_τ(z, y)] + µB,  w.p.1    (3.32)

uniformly for z ∈ Z. Since, by assumption, z* is an accumulation point of {z_N}, without loss of generality, we assume that {z_N} → z* with {z_N} ⊂ Z. In addition, we have

0 ∈ (0, 1)^T + (1/(N(1−α))) Σ_{i=1}^N A(z_N, y^i) + N_Z(z_N),    (3.33)
since z_N is a weak GKKT point satisfying (3.26). Now, we study the average of set-valued mappings in (3.33), i.e., (1/N) Σ_{i=1}^N A(z_N, y^i). To this end, we study the relationship between (1/N) Σ_{i=1}^N A(z_N, y^i) and the set E[A_{2τ}(z*, y)] + 2µB by evaluating the deviation from (1/N) Σ_{i=1}^N A(z_N, y^i) to E[A_{2τ}(z*, y)] + 2µB, i.e., D( (1/N) Σ_{i=1}^N A(z_N, y^i), E[A_{2τ}(z*, y)] + 2µB ). Note that D(S, T) = 0 implies S ⊂ T. Also, in general, D(S, T) ≠ D(T, S). By virtue of the triangle inequality, we obtain

D( (1/N) Σ_{i=1}^N A(z_N, y^i), E[A_{2τ}(z*, y)] + 2µB )
  ≤ D( (1/N) Σ_{i=1}^N A(z_N, y^i), (1/N) Σ_{i=1}^N A_τ(z*, y^i) + µB )
  + D( (1/N) Σ_{i=1}^N A_τ(z*, y^i) + µB, E[A_{2τ}(z*, y)] + 2µB ).

We consider the first term on the right hand side of the above inequality. Note that z_N ∈ B(z*, τ) for N sufficiently large, and thereby A(z_N, y^i) ⊂ A_τ(z*, y^i). This gives

D( (1/N) Σ_{i=1}^N A(z_N, y^i), (1/N) Σ_{i=1}^N A_τ(z*, y^i) + µB ) = 0

for N sufficiently large. In addition, by (3.32), it is not hard to see that the second term in the above inequality tends to zero w.p.1 as N → ∞. Thus,

lim_{N→∞} D( (1/N) Σ_{i=1}^N A(z_N, y^i), E[A_{2τ}(z*, y)] + 2µB ) = 0,  w.p.1.
Thereby, with the help of (3.33), we have

0 ∈ (0, 1)^T + (1/(1−α)) E[A_{2τ}(z*, y)] + 2µB + N_Z(z*).    (3.34)

Next, we consider the mapping A_{2τ}(z*, y) and treat τ as a variable. By the boundedness assumption on ∂_x f(x, y) over Z, it is not hard to show that ‖A(z, y)‖ ≤ √(κ_0(y)^2 + 1) for z ∈ B(z*, 2τ) as long as τ is small enough; in fact, this is true for small enough τ ∈ (0, τ_0/2). Thereby, by recalling the definition of A_{2τ}(z*, y), for such small τ, A_{2τ}(z*, y) is dominated by √(κ_0(y)^2 + 1), where E[√(κ_0(y)^2 + 1)] < ∞ according to the assumption. Moreover, note that A_τ(z*, y) is monotonic in τ and A(·, y) is upper semicontinuous; then lim_{τ→0} A_{2τ}(z*, y) = A(z*, y). Hence, by the Lebesgue dominated convergence theorem, it yields that

lim_{τ→0} E[A_{2τ}(z*, y)] = E[ lim_{τ→0} A_{2τ}(z*, y) ] = E[A(z*, y)].    (3.35)

For the τ and µ considered in (3.32), note that if µ tends to zero, then τ is driven to zero as well. Thus, by the arbitrariness of µ, (3.34), and (3.35), we obtain

0 ∈ (0, 1)^T + (1/(1−α)) E[A(z*, y)] + N_Z(z*).

This completes the proof.

Theorem 3.2 Let α ∈ (0, 1). Suppose that the set of global minimizers of (2.12) is bounded and contained in a compact set Z. Suppose further that there exists a function κ_0(y) such that
‖∂_x f(x, y)‖ ≤ κ_0(y) for any (z, y) ∈ Z × Ω, where E[√(1 + κ_0(y)^2)] < ∞. Then:

(i) g(·, α, y) is Lipschitz continuous on Z; that is, there exists L_1(y) > 0 such that

|g(z_1, α, y) − g(z_2, α, y)| ≤ L_1(y) ‖z_1 − z_2‖,  for all z_1, z_2 ∈ Z.

In addition, the moment-generating function of L_1(y), M_{L_1}(t) := E[e^{t L_1(y)}], is finite for every t near zero.

(ii) As the sample size N increases, with probability approaching one exponentially fast, the difference between the global minimum value of (3.25) and the global minimum value of (2.12) becomes arbitrarily small.

Proof. Part (i). It is obvious that g(z, α, y) is locally Lipschitz continuous in z. Note that for any (z, y) ∈ Z × Ω,

∂_z g(z, α, y) ⊂ (0, 1)^T + (1/(1−α)) A(z, y).

Then, by assumption, for any (z, y) ∈ Z × Ω,

‖∂_z g(z, α, y)‖ ≤ 1 + (1/(1−α)) ‖A(z, y)‖ ≤ 1 + (1/(1−α)) √(1 + κ_0(y)^2).

Let L_1(y) := 1 + (1/(1−α)) √(1 + κ_0(y)^2). Clearly, E[L_1(y)] < ∞. Thus, for y ∈ Ω and any z_1, z_2 ∈ Z,

|g(z_1, α, y) − g(z_2, α, y)| ≤ L_1(y) ‖z_1 − z_2‖.

Moreover, by the Lebesgue dominated convergence theorem, it is not hard to show the finiteness of the moment-generating function M_{L_1}(t) for t near zero.

Part (ii). Denote by z* and z_N the global minimizers of (2.12) and its SAA problem (3.25), respectively. According to [23, Theorem 5.1], for any ϱ > 0, there exist positive constants c(ϱ) and ρ(ϱ), independent of the sample size N, such that

P{ sup_{z∈Z} |g_N(z, α) − η(z, α)| > ϱ } ≤ c(ϱ) e^{−N ρ(ϱ)}.    (3.36)

Then, by the well-known Berge stability theorem and (3.36), it follows that

P{ |g_N(z_N, α) − η(z*, α)| > ϱ } ≤ c(ϱ) e^{−N ρ(ϱ)}.

Thus, a global minimizer of the true problem becomes a ϱ-global minimizer of its SAA counterpart, i.e., g_N(z_N, α) < η(z*, α) + ϱ, with probability at least 1 − c(ϱ) e^{−N ρ(ϱ)}. This completes the proof.

Note that in Theorems 3.1 and 3.2, we assume boundedness of ∂_x f(x, y) on a compact set Z. If the loss function f(x, y) is continuously differentiable in x for a.e. y, then this boundedness assumption can be simplified to ‖∇_x f(x, y)‖ ≤ κ_0(y) for all x in a compact set. Note that, in some practical problems from logistics, the loss function f is given by f(x, y) := f_1(x) + f_2(y), where f_1(x) represents the usual operational costs while f_2(y) merely denotes the loss due to the exogenous risks. Then the boundedness of ∂_x f(x, y) depends on the specific expression of f_1(x) only and is independent of the random factor y. In this case, the corresponding arguments in Theorems 3.1 and 3.2 simplify greatly.
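Theorem 3.2(ii) can be observed numerically in the simplest possible setting. If X is a single point, problem (2.12) reduces to computing the CVaR of a fixed loss, for which a normal distribution admits the closed-form value µ + σ φ(Φ^{-1}(α))/(1 − α). The Python sketch below (all data hypothetical, not from the paper) reports how the gap between the SAA optimal value of (3.25) and this exact value shrinks as N grows.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, mu, sigma = 0.95, 0.0, 1.0
true_value = mu + sigma * norm.pdf(norm.ppf(alpha)) / (1.0 - alpha)   # exact CVaR of N(mu, sigma^2)

def saa_optimal_value(sample, alpha):
    # the minimum over u of the SAA objective (3.25) is attained at an empirical alpha-quantile
    u = np.sort(sample)[int(np.ceil(alpha * len(sample))) - 1]
    return u + np.mean(np.maximum(sample - u, 0.0)) / (1.0 - alpha)

for N in (10**2, 10**3, 10**4, 10**5):
    gaps = [abs(saa_optimal_value(rng.normal(mu, sigma, N), alpha) - true_value)
            for _ in range(50)]
    print(N, np.mean(gaps))                 # the average gap shrinks as N increases
```

The exponential tail bound itself is not visible from averages alone, but the steady shrinkage of the gap with N is.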
3.3.2 The Case of Mixed CVaR

We now investigate the convergence of the stationary points of the SAA program associated with the mixed CVaR minimization problem. Similar to Lemma 3.1, we first derive a result related to a strong law of large numbers for the sum of finitely many random compact set-valued mappings.
Lemma 3.2 Let V ⊂ IR^l be a compact set, and let C_k(·, y) : V → 2^{IR^l} be compact, upper semicontinuous set-valued mappings for every y ∈ Ω, k = 1, . . . , p. Consider the mapping C(v, y) defined by C(v, y) := C_1(v, y) + · · · + C_p(v, y). Let {y^1, . . . , y^N} be an i.i.d. sample of y and set C_N(v) := (1/N) Σ_{i=1}^N ( C_1(v, y^i) + · · · + C_p(v, y^i) ) for every v ∈ V. Suppose that there exists π(y) with E[π(y)] < ∞ such that

max{ ‖C_1(v, y)‖, ‖C_2(v, y)‖, . . . , ‖C_p(v, y)‖ } ≤ π(y).

Then for any µ > 0, there exists τ > 0 such that

lim_{N→∞} C_N(v) ⊂ E[C_{1τ}(v, y)] + · · · + E[C_{pτ}(v, y)] + µB,  w.p.1    (3.37)

uniformly for v ∈ V, where C_{kτ}(v, y) := ∪_{w∈B(v,τ)} C_k(w, y), k = 1, · · · , p, B(v, τ) denotes the closed ball in IR^l with radius τ and center v, and B denotes the unit ball in IR^l.

Proof. By assumption, C(·, y) is a compact upper semicontinuous set-valued mapping on V for each y ∈ Ω. In addition, noticing the compactness of V and by the definition of the norm, it follows that

‖C(v, y)‖ = sup_{C∈C(v,y)} ‖C‖ ≤ sup_{C_1∈C_1(v,y)} ‖C_1‖ + · · · + sup_{C_p∈C_p(v,y)} ‖C_p‖ = ‖C_1(v, y)‖ + · · · + ‖C_p(v, y)‖ ≤ p π(y),

which implies that C(v, y) is dominated by an integrable random function. Then, by Lemma 3.1, for any µ > 0, there exists τ > 0 such that lim_{N→∞} C_N(v) ⊂ E[C_τ(v, y)] + µB, w.p.1, uniformly for v ∈ V, where C_τ(v, y) := ∪_{w∈B(v,τ)} C(w, y). Since

C_τ(v, y) = ∪_{w∈B(v,τ)} ( C_1(w, y) + · · · + C_p(w, y) ) ⊂ ∪_{w∈B(v,τ)} C_1(w, y) + · · · + ∪_{w∈B(v,τ)} C_p(w, y) = C_{1τ}(v, y) + · · · + C_{pτ}(v, y),

the desired result follows immediately.

Let C(z, y) := C_1(z, y) + · · · + C_J(z, y), where C_i(z, y) := (λ_i/(1−α_i)) A_i(z, y) and A_i is defined in Proposition 2.6, i = 1, . . . , J. Evidently, each C_i(·, y) is a compact and upper semicontinuous set-valued mapping since λ_i/(1−α_i) > 0. In addition, note that the sum of SAA terms in (3.28) is exactly C_N(z) as given in Lemma 3.2. Thus, it is not hard to derive the convergence result of the stationary points for the mixed CVaR SAA program by virtue of Lemma 3.2 and by mimicking the proofs of Theorems 3.1 and 3.2. For brevity, we state the convergence results below without proof.
Proposition 3.1 Let α_i ∈ (0, 1) and λ_i > 0 with Σ_{i=1}^J λ_i = 1, i = 1, . . . , J. Let {z_N} be a sequence of weak GKKT points satisfying (3.28) and let z* be an accumulation point. Suppose that there exist a compact set Z containing a neighborhood B(z*, τ_1) ∩ Z of z*, and a function κ_1(y) satisfying ‖∂_x f(x, y)‖ ≤ κ_1(y) for any (z, y) ∈ Z × Ω, where τ_1 is a small positive number and E[√(κ_1(y)^2 + 1)] < ∞. Then, w.p.1, z* is a weak stationary point of (2.19).
Moreover, suppose that all assumptions in Theorem 3.2 hold for the corresponding mixed CVaR case. Then, as the sample size N increases, with probability approaching one exponentially fast, the difference between the global minimum value of (2.19) and the global minimum value of (3.27) becomes arbitrarily small.

Appendix

Proof of Proposition 2.1. (i) Recalling the definition of ζ, for any α_1, α_2 ∈ (0, δ), we have

|ζ(x, α_1) − ζ(x, α_2)| = | min_{u∈IR} { u + (1/(1−α_1)) E[f(x, y) − u]^+ } − min_{u∈IR} { u + (1/(1−α_2)) E[f(x, y) − u]^+ } |
  = | min_{u∈IR} η(x, u, α_1) − min_{u∈IR} { η(x, u, α_1) + g(x, u) } |,

where η is defined in (1.3) and g(x, u) := ((α_2 − α_1)/((1−α_2)(1−α_1))) E[f(x, y) − u]^+. Without loss of generality, we assume α_2 ≥ α_1. Then g(x, u) ≥ 0 for every u ∈ IR. Let u*(α) denote the optimal solution of min{η(x, u, α) | u ∈ IR}. We then have

min_{u∈IR} { η(x, u, α_1) + g(x, u) } ≤ η(x, u*(α_1), α_1) + g(x, u*(α_1)) = min_{u∈IR} η(x, u, α_1) + g(x, u*(α_1)).    (3.38)

On the other hand, since η(x, u, α_1) ≤ η(x, u, α_1) + g(x, u) for any u ∈ IR, it follows that min_{u∈IR} η(x, u, α_1) ≤ min_{u∈IR} { η(x, u, α_1) + g(x, u) }. This together with (3.38) leads to −g(x, u*(α_1)) ≤ min_{u∈IR} η(x, u, α_1) − min_{u∈IR} { η(x, u, α_1) + g(x, u) } ≤ 0. Thereby,

|ζ(x, α_1) − ζ(x, α_2)| = | min_{u∈IR} η(x, u, α_1) − min_{u∈IR} { η(x, u, α_1) + g(x, u) } | ≤ |g(x, u*(α_1))|.    (3.39)

In addition, since α_1, α_2 ∈ (0, δ), it follows that 1/((1−α_1)(1−α_2)) < 1/(1−δ)^2. Note that, by Assumption 1, E[f(x, y) − u*(α)]^+ is finite for fixed x ∈ X and α ∈ [0, 1); thereby it is finite for α in the underlying subinterval [0, δ] of [0, 1). Let

L := sup{ E[f(x, y) − u*(α)]^+ / (1−δ)^2 | α ∈ (0, δ) } + ε,

where ε can be taken as any positive number. Noticing α ∈ (0, δ) ⊂ [0, δ], we can see that L is a positive constant and is independent of the choice of α. Then,

|g(x, u*(α_1))| = | (α_2 − α_1)/((1−α_2)(1−α_1)) | E[f(x, y) − u*(α_1)]^+ ≤ L|α_1 − α_2|.

Thus, this together with (3.39) yields that |ζ(x, α_1) − ζ(x, α_2)| ≤ L|α_1 − α_2|, which leads to the global Lipschitz continuity of ζ(x, ·) on (0, δ).
(ii) According to the definition of ∂_α ζ(x, α^0), for any V ∈ ∂_{B,α} ζ(x, α^0), we have V ∈ { lim_{t_k→0^-} ∇_α ζ(x, t_k + α^0), lim_{t_k→0^+} ∇_α ζ(x, t_k + α^0) }. By [7, Proposition 13], for V in ∂_{B,α} ζ(x, α^0), V is either the left-side or the right-side derivative of ζ(x, α) with respect to α at α^0. Hence,

V ∈ { (1/(1−α^0)^2) E[f(x, y) − VaR_{α^0}(x)]^+ , (1/(1−α^0)^2) E[f(x, y) − VaR^+_{α^0}(x)]^+ }.

Thus, we derive the generalized Jacobian ∂_α ζ(x, α^0). This completes the proof.
Proof of Proposition 2.2. To show the differentiability of η in (x, u), it suffices to prove that of E[f(x, y) − u]^+. Let ϱ(x, u, y) = f(x, y) − u. By assumption, ϱ(·, ·, y) is continuously differentiable for a.e. y and ∇_{x,u} ϱ(x, u, y) = (∇_x f(x, y)^T, −1)^T is continuous on X × IR × Ω. Then it is easy to verify that all conditions in [24, Corollary 1] are satisfied. Thereby, the desired results follow immediately from [24, Corollary 1]. This completes the proof.

Proof of Proposition 2.5. Let x* be an optimal solution of (2.18); then for any x ∈ X,

λ_1 CVaR_{α_1}(x*) + · · · + λ_J CVaR_{α_J}(x*) ≤ λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x).

Set η_i(x, u_i, α_i) = u_i + (1/(1−α_i)) E[f(x, y) − u_i]^+ and let u_i* ∈ arg min_{u_i∈IR} η_i(x*, u_i, α_i), i = 1, . . . , J. Then CVaR_{α_i}(x*) = η_i(x*, u_i*, α_i). So, for any x ∈ X and u_1, · · · , u_J ∈ IR,

λ_1 η_1(x*, u_1*, α_1) + · · · + λ_J η_J(x*, u_J*, α_J)
  = λ_1 CVaR_{α_1}(x*) + · · · + λ_J CVaR_{α_J}(x*)
  ≤ λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x)
  = λ_1 min_{u_1∈IR} η_1(x, u_1, α_1) + · · · + λ_J min_{u_J∈IR} η_J(x, u_J, α_J)
  ≤ λ_1 η_1(x, u_1, α_1) + · · · + λ_J η_J(x, u_J, α_J).

Thus, (x*, u_1*, u_2*, . . . , u_J*) is an optimal solution of (2.19). On the other hand, assume (x*, u_1*, u_2*, . . . , u_J*) is an optimal solution of (2.19). Then, for fixed x*,

λ_1 η_1(x*, u_1*, α_1) + · · · + λ_J η_J(x*, u_J*, α_J) ≤ λ_1 η_1(x*, u_1, α_1) + · · · + λ_J η_J(x*, u_J, α_J)

for any u_1, . . . , u_J ∈ IR. Since each λ_i is strictly positive, we then have η_i(x*, u_i*, α_i) ≤ η_i(x*, u_i, α_i) for every u_i ∈ IR. This implies that u_i* ∈ arg min_{u_i∈IR} η_i(x*, u_i, α_i). So, for each i, CVaR_{α_i}(x*) = η_i(x*, u_i*, α_i). Denote by u_i(x, α_i) the optimal solution of min_{u_i∈IR} η_i(x, u_i, α_i). Then,

λ_1 CVaR_{α_1}(x*) + · · · + λ_J CVaR_{α_J}(x*)
  = λ_1 η_1(x*, u_1*, α_1) + · · · + λ_J η_J(x*, u_J*, α_J)
  ≤ λ_1 η_1(x, u_1(x, α_1), α_1) + · · · + λ_J η_J(x, u_J(x, α_J), α_J)
  = λ_1 CVaR_{α_1}(x) + · · · + λ_J CVaR_{α_J}(x)

for any x ∈ X. Hence, x* is an optimal solution of (2.18). This completes the proof.

Acknowledgments

The first author would like to thank Professor Huifu Xu of the University of Southampton for fruitful discussions on SAA methods and related topics.
References

[1] Anderson, F., Mausser, H., Rosen, D., Uryasev, S.: Credit risk optimization with conditional Value-at-Risk criterion. Math. Program. 89, 273–291 (2001)
[2] Artzner, P., Delbaen, F., Eber, J. M., Heath, D.: Coherent measures of risk. Math. Finance. 9, 203–228 (1999)
[3] Hürlimann, W.: Conditional Value-at-Risk bounds for compound Poisson risks and a normal approximation. J. Appl. Math. 3, 141–153 (2003)
[4] Krokhmal, P., Palmquist, J., Uryasev, S.: Portfolio optimization with conditional Value-at-Risk objective and constraints. J. Risk. 4, 43–68 (2002)
[5] Lüthi, H.-J., Doege, J.: Convex risk measures for portfolio optimization and concepts of flexibility. Math. Program. 104, 541–559 (2005)
[6] Pang, J.-S., Leyffer, S.: On the global minimization of the Value-at-Risk. Optim. Methods and Software. 19, 611–631 (2004)
[7] Rockafellar, R. T., Uryasev, S.: Conditional Value-at-Risk for general loss distributions. J. Banking and Finance. 26, 1443–1471 (2002)
[8] Acerbi, C.: Spectral measures of risk: A coherent representation of subjective risk aversion. J. Banking and Finance. 26, 1505–1518 (2002)
[9] Bogentoft, E., Romeijn, H. E., Uryasev, S.: Asset/liability management for pension funds using CVaR constraints. J. Risk Finance. 3, 57–71 (2001)
[10] Chen, W., Sim, M., Sun, J., Teo, C. P.: From CVaR to uncertainty set: Implications in joint chance constrained optimization. Oper. Res. (to appear)
[11] Künzi-Bay, A., Mayer, J.: Computational aspects of minimizing conditional value-at-risk. Comput. Manag. Sci. 3, 3–27 (2006)
[12] Shapiro, A.: Stochastic Programming by Monte Carlo Simulation Methods. Published electronically in: Stochastic Programming E-Print Series (2000)
[13] Santoso, T., Ahmed, S., Goetschalckx, M., Shapiro, A.: A stochastic programming approach for supply chain network design under uncertainty. Eur. J. Oper. Res. 167, 96–115 (2005)
[14] Meng, F., Xu, H.: A regularized sample average approximation method for stochastic mathematical programs with nonsmooth equality constraints. SIAM J. Optim. 17, 891–919 (2006)
[15] Shapiro, A.: Stochastic mathematical programs with equilibrium constraints. J. Optim. Theory Appl. 128, 223–243 (2006)
[16] Xu, H., Meng, F.: Convergence analysis of sample average approximation methods for a class of stochastic mathematical programs with equality constraints. Math. Oper. Res. 32, 648–668 (2007)
[17] Artstein, Z., Vitale, R. A.: A strong law of large numbers for random compact sets. Ann. Prob. 3, 879–882 (1975)
[18] Clarke, F. H.: Optimization and Nonsmooth Analysis. John Wiley and Sons, New York (1983)
[19] Rockafellar, R. T.: Convex Analysis. Princeton University Press, Princeton (1970)
[20] Wets, R.: Stochastic programming. In: Nemhauser, G. L., Rinnooy Kan, A. H. G., Todd, M. J. (eds.) Handbooks in OR & MS, vol. 1, pp. 573–629. North-Holland Publishing Company, Amsterdam (1989)
[21] Homem-de-Mello, T.: Estimation of derivatives of nonsmooth performance measures in regenerative systems. Math. Oper. Res. 26, 741–768 (2001)
[22] Ruszczyński, A., Shapiro, A.: Optimality and duality in stochastic programming. In: Ruszczyński, A., Shapiro, A. (eds.) Handbooks in OR & MS, Stochastic Programming, vol. 10, pp. 65–140. North-Holland Publishing Company, Amsterdam (2003)
[23] Shapiro, A., Xu, H.: Stochastic mathematical programs with equilibrium constraints, modeling and sample average approximation. Optimization. 57, 395–418 (2008)
[24] Qi, L., Shapiro, A., Ling, C.: Differentiability and semismoothness properties of integral functions and their applications. Math. Program. 102, 223–248 (2005)