Optimization Online
On Distributionally Robust Chance Constrained Program with Wasserstein Distance

Weijun Xie*

Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA 24061

June 15, 2018

Abstract

This paper studies a distributionally robust chance constrained program (DRCCP) with a Wasserstein ambiguity set, where the uncertain constraints must be satisfied with probability at least a given threshold for all probability distributions of the uncertain parameters within a chosen Wasserstein distance from an empirical distribution. In this work, we investigate equivalent reformulations and approximations of such problems. We first show that a DRCCP can be reformulated as a conditional value-at-risk (CVaR) constrained optimization problem, and thus admits tight inner and outer approximations. When the metric space of uncertain parameters is a normed vector space, we show that a DRCCP with a bounded feasible region is mixed integer representable by introducing big-M coefficients and additional binary variables. For a DRCCP with pure binary decision variables, by exploiting submodular structure, we show that it admits a big-M free formulation and can be solved by a branch-and-cut algorithm. This result can be generalized to mixed integer DRCCPs. Finally, we present a numerical study to illustrate the effectiveness of the proposed methods.

*Email: [email protected].

1 Introduction

1.1 Setting

A distributionally robust chance constrained program (DRCCP) is of the form:

$$\min_{x}\ c^\top x, \tag{1a}$$
$$\text{s.t.}\ \ x \in S, \tag{1b}$$
$$\inf_{\mathbb{P}\in\mathcal{P}}\ \mathbb{P}\left\{\tilde{\xi} : a(x)^\top \tilde{\xi}_i \le b_i(x),\ \forall i\in[I]\right\} \ge 1-\epsilon. \tag{1c}$$

In (1), the vector x ∈ R^n denotes the decision variables; the vector c ∈ R^n denotes the objective function coefficients; the set S ⊆ R^n denotes deterministic constraints on x; and the constraint (1c) is a chance constraint involving I uncertain inequalities specified by the random vectors ξ̃_i supported on sets Ξ_i ⊆ R^n for each i ∈ [I], with a joint probability distribution P from a family P, termed the "ambiguity set". We let [R] := {1, 2, ..., R} for any positive integer R, and for each uncertain constraint i ∈ [I], a(x) ∈ R^n and b_i(x) ∈ R denote affine mappings of x such that a(x) = ηx + (1−η)e and b_i(x) = B_i^⊤ x + b_i, with η ∈ {0, 1}, all-ones vector e ∈ R^n, B_i ∈ R^n, and b_i ∈ R. For notational convenience, we let Ξ = ∏_{i∈[I]} Ξ_i and ξ̃ = (ξ̃_1, ..., ξ̃_I). Note that (i) for any i, j ∈ [I] with i ≠ j, the random vectors ξ̃_i and ξ̃_j can be correlated; and (ii) we use η ∈ {0, 1} to distinguish whether (1c) involves left-hand uncertainty (η = 1) or right-hand uncertainty (η = 0).

The chance constraint (1c) requires that all I uncertain constraints be simultaneously satisfied, for every probability distribution from the ambiguity set P, with probability at least 1−ε, where ε ∈ (0, 1) is a specified risk tolerance. We call (1) a single DRCCP if I = 1 and a joint DRCCP if I ≥ 2. Also, (1) is termed a DRCCP with right-hand uncertainty if η = 0, and a DRCCP with left-hand uncertainty otherwise. For a joint DRCCP with I = 2 and ξ̃_1 = −ξ̃_2, we call (1) a two-sided DRCCP. We denote the feasible region induced by (1c) as

$$Z := \left\{x \in \mathbb{R}^n : \inf_{\mathbb{P}\in\mathcal{P}}\ \mathbb{P}\left\{\tilde{\xi} : a(x)^\top \tilde{\xi}_i \le b_i(x),\ \forall i\in[I]\right\} \ge 1-\epsilon\right\}. \tag{2}$$

In this paper, we consider a Wasserstein ambiguity set P.

(A1) The Wasserstein ambiguity set P is defined as

$$\mathcal{P} = \left\{\mathbb{P} \in \mathcal{P}_0(\Xi) : \mathbb{E}_{\mathbb{P}\times\mathbb{P}_{\tilde{\zeta}}}\left[d(\tilde{\xi}, \tilde{\zeta})\right] \le \delta\right\}, \tag{3}$$

where P_ζ̃ denotes the discrete empirical distribution of ζ̃ on the countable support Z = {ζ^j}_{j∈[N]} ⊆ Ξ with point mass function {p_j}_{j∈[N]}, d : Ξ × Ξ → R_+ denotes the distance metric, and δ > 0 denotes the Wasserstein radius. We assume that (Ξ, d) is a totally bounded Polish (separable complete metric) space with distance metric d, i.e., for every ε̂ > 0 there exists a finite covering of Ξ by balls with radius at most ε̂. Note that the Wasserstein metric measures the distance between the true distribution and the empirical distribution, and is able to recover the true distribution when the number of sampled data points goes to infinity [14].
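For intuition on the Wasserstein radius δ in (3), recall that for two equally weighted empirical distributions on the real line, the type-1 Wasserstein distance reduces to the mean absolute difference of sorted samples. A minimal illustrative sketch (not part of the paper's formulation; the function name is ours):

```python
def wasserstein_1d(xs, ys):
    """Type-1 Wasserstein distance between two equally weighted empirical
    distributions on the real line: with equal sample sizes, the optimal
    coupling matches sorted samples."""
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

# Shifting every sample by 1 moves the empirical distribution by exactly 1
# in Wasserstein distance, so it leaves the ambiguity set once delta < 1.
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # -> 1.0
```

The ambiguity set (3) thus contains exactly the distributions whose optimal transport cost to the empirical distribution is at most δ.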

1.2 Related Literature

There is a significant body of work on reformulations, convexity, and approximations of the set Z under various ambiguity sets [7, 18, 19, 21, 37, 39]. For a single DRCCP, when P consists of all probability distributions with given first and second moments, the set Z is second-order conic representable [7, 13]. Similar convexity results hold for a single DRCCP when P also incorporates other distributional information, such as the support of ξ̃ [10], the unimodality of P [18, 24], or an arbitrary convex mapping of ξ̃ [37]. For a joint DRCCP, [19] provided the first convex reformulation of Z in the absence of coefficient uncertainty (i.e., η = 0), when P is characterized by the mean, a positively homogeneous dispersion measure, and a conic support of ξ̃. For the more general coefficient uncertainty setting, [37] identified several sufficient conditions for Z to be convex (e.g., when P is specified by one moment constraint), and [36] showed that Z is convex for a two-sided DRCCP when P is characterized by the first two moments. When the DRCCP set Z is not convex, many inner convex approximations have been proposed. In [9], the authors proposed to aggregate the multiple uncertain constraints with positive scalars into a single constraint, and then use the conditional value-at-risk (CVaR) approximation scheme [28] to develop an inner approximation of Z. This approximation is shown to be exact for a single DRCCP when P is specified by first and second moments in [43], or, more generally, by convex moment constraints in [37]. In [38], the authors provided several sufficient conditions under which the well-known Bonferroni approximation of a joint DRCCP is exact and yields a convex reformulation.

Recently, there have been many successful developments on data-driven distributionally robust programs with the Wasserstein ambiguity set (3) [16, 27, 41]. For instance, [16, 27] studied its reformulation under different settings. Later on, [4, 15, 23, 32] applied it to optimization problems related to machine learning. Other relevant works can be found in [3, 17, 22, 26]. However, there is very limited literature on DRCCP with a Wasserstein ambiguity set. In [35], the authors proved that it is strongly NP-hard to optimize over the DRCCP set Z with a Wasserstein ambiguity set, and proposed a bicriteria approximation for a class of DRCCPs with covering uncertain constraints (i.e., S is a closed convex cone and Ξ_i ⊆ R^n_−, B_i ∈ R^n_+, b_i ∈ R_− for each i ∈ [I]). In [11], the authors considered a two-sided DRCCP with right-hand uncertainty and proposed a tractable reformulation, while in [20], the authors studied the CVaR approximation of DRCCP. To the best of the author's knowledge, there is no work on developing tight approximations and exact reformulations of general DRCCPs with a Wasserstein ambiguity set.

1.3 Contributions

In this paper, we study approximations and exact reformulations of DRCCP under a Wasserstein ambiguity set. In particular, our main contributions are summarized as follows.

1. We derive a deterministic equivalent reformulation for the set Z and show that this reformulation admits a conditional value-at-risk (CVaR) interpretation. Based upon this fact, we are able to derive tight inner and outer approximations.

2. When the support Ξ is an n × I-dimensional vector space and the distance metric is a norm (i.e., d(ξ, ζ) = ‖ξ − ζ‖), we show that the feasible region S ∩ Z of a DRCCP, when bounded, is mixed integer representable with big-M coefficients and N × I additional binary variables. We also derive compact formulations for the proposed inner and outer approximations and compare their strengths.

3. When the decision variables are pure binary (i.e., S ⊆ {0, 1}^n), we first show that the nonlinear constraints in the reformulation can be recast as submodular knapsack constraints. Then, by exploiting the polyhedral properties of submodular functions, we propose a new big-M free mixed integer linear reformulation. In a numerical study, we further show that the proposed formulation can be solved effectively by a branch-and-cut algorithm.

The remainder of the paper is organized as follows. Section 2 presents an exact reformulation of the DRCCP set Z as well as its inner and outer approximations under a general setting. Section 3 provides (mixed integer) convex reformulations of the feasible region S ∩ Z and its inner and outer approximations when the metric space of the random variables is a normed vector space. Section 4 studies binary DRCCP (i.e., S ⊆ {0, 1}^n), develops a big-M free formulation, and numerically illustrates the proposed


methods. Section 5 concludes the paper.

Notation: The following notation is used throughout the paper. We use bold letters (e.g., x, A) to denote vectors or matrices, and the corresponding non-bold letters to denote their components. We let e be the all-ones vector, and let e_i be the i-th standard basis vector. Given an integer n, we let [n] := {1, 2, ..., n}, R^n_+ := {x ∈ R^n : x_l ≥ 0, ∀l ∈ [n]}, and R^n_− := {x ∈ R^n : x_l ≤ 0, ∀l ∈ [n]}. Given a real number t, we let (t)_+ := max{t, 0}. Given a finite set I, we let |I| denote its cardinality. We let ξ̃ denote a random vector with support Ξ, and denote one of its realizations by ξ. Given a set R, the characteristic function χ_R(x) = 0 if x ∈ R and ∞ otherwise, while the indicator function I(x ∈ R) = 1 if x ∈ R and 0 otherwise. For a matrix A, we let A_{i•} denote the i-th row of A and A_{•j} the j-th column of A. Given a subset T ⊆ [n], we define an n-dimensional binary vector e_T by (e_T)_τ = 1 if τ ∈ T and (e_T)_τ = 0 if τ ∈ [n] \ T. Additional notation will be introduced as needed.

2 General Case: Reformulations and Approximations

In this section, we study an equivalent reformulation of the set Z under Assumption (A1). This reformulation has a conditional value-at-risk (CVaR) interpretation, which allows us to derive tight inner and outer approximations.

2.1 Exact Reformulation

In this part, we reformulate the set Z into its deterministic counterpart. The main idea of this reformulation is to first use the strong duality result from [16] to express the worst-case chance constraint in its dual form, and then break down the indicator function according to its definition.

Theorem 1. Under Assumption (A1), set Z is equivalent to

$$Z = \left\{x \in \mathbb{R}^n : \delta\lambda - \epsilon \le \frac{1}{N}\sum_{j\in[N]} \min\left\{\lambda f(x, \zeta^j) - 1,\ 0\right\},\ \lambda \ge 0\right\}, \tag{4}$$

where

$$f(x, \zeta) := \min_{i\in[I]}\ \inf_{\xi\in\Xi:\ a(x)^\top \xi_i > b_i(x)} d(\xi, \zeta). \tag{5}$$

Proof. Note that

$$\inf_{\mathbb{P}\in\mathcal{P}}\ \mathbb{P}\left\{\tilde{\xi} : a(x)^\top \tilde{\xi}_i \le b_i(x),\ \forall i\in[I]\right\} \ge 1-\epsilon$$

is equivalent to

$$\sup_{\mathbb{P}\in\mathcal{P}}\ \mathbb{P}\left\{\tilde{\xi} : a(x)^\top \tilde{\xi}_i > b_i(x),\ \exists i\in[I]\right\} \le \epsilon.$$

By Theorem 1 in [16], the supremum above is equivalent to

$$\min_{\lambda\ge 0}\ \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta^j) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x),\ \exists i\in[I]\right)\right]. \tag{6a}$$

Thus, set Z is equivalent to

$$Z := \left\{x \in \mathbb{R}^n : \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta^j) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x),\ \exists i\in[I]\right)\right] \le \epsilon,\ \lambda \ge 0\right\}. \tag{6b}$$

We now break down the indicator function in the infimum of (6b) and reformulate it as below.

Claim 1. For given λ ≥ 0 and ζ ∈ Z, we have

$$\inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x),\ \exists i\in[I]\right)\right] = \min\left\{\min_{i\in[I]}\ \inf_{\xi\in\Xi:\ a(x)^\top \xi_i > b_i(x)}\left[\lambda d(\xi, \zeta) - 1\right],\ 0\right\}. \tag{6c}$$

Proof. We first note that I(a(x)^⊤ξ_i > b_i(x), ∃i ∈ [I]) = max_{i∈[I]} I(a(x)^⊤ξ_i > b_i(x)). Thus,

$$\inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x),\ \exists i\in[I]\right)\right] = \min_{i\in[I]}\ \inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x)\right)\right].$$

Therefore, we only need to show that for any i ∈ [I],

$$\inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta) - \mathbb{I}\left(a(x)^\top \xi_i > b_i(x)\right)\right] = \min\left\{\inf_{\xi\in\Xi:\ a(x)^\top \xi_i > b_i(x)}\left[\lambda d(\xi, \zeta) - 1\right],\ 0\right\}. \tag{6d}$$

There are two cases:

Case 1. If a(x)^⊤ζ_i > b_i(x), then the left-hand side of (6d) equals −1, attained by letting ξ := ζ; this equals the right-hand side, since its infimum is also attained by ξ := ζ.

Case 2. If a(x)^⊤ζ_i ≤ b_i(x), then for any ξ ∈ Ξ we either have a(x)^⊤ξ_i > b_i(x) or a(x)^⊤ξ_i ≤ b_i(x). Hence, the left-hand side of (6d) equals

$$\min\left\{\inf_{\xi\in\Xi:\ a(x)^\top \xi_i > b_i(x)}\left[\lambda d(\xi, \zeta) - 1\right],\ \inf_{\xi\in\Xi:\ a(x)^\top \xi_i \le b_i(x)}\left[\lambda d(\xi, \zeta)\right]\right\} = \min\left\{\inf_{\xi\in\Xi:\ a(x)^\top \xi_i > b_i(x)}\left[\lambda d(\xi, \zeta) - 1\right],\ 0\right\},$$

where inf_{ξ∈Ξ: a(x)^⊤ξ_i ≤ b_i(x)} [λd(ξ, ζ)] = 0, attained by letting ξ := ζ. ⋄

Thus, by Claim 1, set Z is equivalent to (4). □

In Theorem 1, we must have λ > 0; we can therefore define a new variable γ = 1/λ in (4) and reformulate set Z into the following equivalent form.

Theorem 2. Under Assumption (A1), set Z is equivalent to

$$Z = \left\{x \in \mathbb{R}^n : \delta - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} \min\left\{f(x, \zeta^j) - \gamma,\ 0\right\},\ \gamma \ge 0\right\}, \tag{7}$$

where the function f(·, ·) is defined in (5).

Proof. Let Z′ denote the set on the right-hand side of (7); we only need to show that Z = Z′.

(Z ⊆ Z′) Given x ∈ Z, there exists λ ≥ 0 such that (x, λ) satisfies (4). If λ > 0, let γ = 1/λ; it is easy to see that (x, γ) satisfies (7), hence x ∈ Z′. Now suppose that λ = 0; then (4) reads −ε ≤ −1, contradicting ε ∈ (0, 1).

(Z ⊇ Z′) Similarly, given x ∈ Z′, there exists γ ≥ 0 such that (x, γ) satisfies (7). If γ > 0, let λ = 1/γ; then (x, λ) satisfies (4), hence x ∈ Z. Now suppose that γ = 0; then in (7) we have min{f(x, ζ^j) − γ, 0} = 0 for each j ∈ [N], since f ≥ 0. Thus, (7) reduces to δ ≤ 0, contradicting δ > 0. □

Before showing a conditional value-at-risk (CVaR) interpretation of set Z, let us begin with the following two definitions. Given a random variable X̃, let P and F_X̃(·) be its probability distribution and cumulative distribution function, respectively. The (1−ε)-value-at-risk (VaR) of X̃ is

$$\mathrm{VaR}_{1-\epsilon}(\tilde{X}) := \min\left\{s : F_{\tilde{X}}(s) \ge 1-\epsilon\right\},$$

while its (1−ε)-conditional value-at-risk (CVaR) [31] is defined as

$$\mathrm{CVaR}_{1-\epsilon}(\tilde{X}) := \min_{\beta}\left\{\beta + \frac{1}{\epsilon}\,\mathbb{E}_{\mathbb{P}}\left[\tilde{X} - \beta\right]_+\right\}.$$

With the definitions above, we observe that set Z in (7) has a CVaR interpretation.

Corollary 1. Under Assumption (A1), set Z is equivalent to

$$Z = \left\{x \in \mathbb{R}^n : \frac{\delta}{\epsilon} + \mathrm{CVaR}_{1-\epsilon}\left[-f(x, \tilde{\zeta})\right] \le 0\right\}, \tag{8}$$

where f(·, ·) is defined in (5) and CVaR_{1−ε}[−f(x, ζ̃)] = min_γ {γ + (1/ε) E_{P_ζ̃}[−f(x, ζ̃) − γ]_+}.

Proof. First, we observe that the constraint in (7) directly implies γ > 0; thus, the nonnegativity constraint on γ can be dropped, i.e., equivalently,

$$Z = \left\{x \in \mathbb{R}^n : \frac{\delta}{\epsilon} - \gamma + \frac{1}{\epsilon N}\sum_{j\in[N]} \max\left\{-f(x, \zeta^j) + \gamma,\ 0\right\} \le 0\right\}.$$

Next, in the above formulation, substitute γ := −γ and replace the existence of γ by minimizing over γ (the constraint holds for some γ if and only if it holds for the minimizing γ); we arrive at

$$Z = \left\{x \in \mathbb{R}^n : \frac{\delta}{\epsilon} + \min_{\gamma}\left\{\gamma + \frac{1}{\epsilon N}\sum_{j\in[N]} \max\left\{-f(x, \zeta^j) - \gamma,\ 0\right\}\right\} \le 0\right\},$$

which is equivalent to (8). □

In the following sections, our derivations of exact reformulations are based upon Theorem 2, while the approximations mainly rely on the CVaR interpretation in Corollary 1.
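For a fixed x, both characterizations of Z reduce to one-dimensional piecewise-linear tests over γ whose breakpoints are the values f(x, ζ^j), so membership can be checked by scanning finitely many candidates. A sketch under the assumptions of equal point masses p_j = 1/N and precomputed values f_j = f(x, ζ^j); the helper names `in_Z` and `cvar` are ours:

```python
def in_Z(f_vals, delta, eps):
    """Theorem 2 test: exists gamma >= 0 with
    delta - eps*gamma <= (1/N) * sum_j min(f_j - gamma, 0).
    The gap is convex piecewise linear in gamma, so its minimum over
    gamma >= 0 is attained at a breakpoint in {0} union {f_j}."""
    N = len(f_vals)
    def gap(g):
        return delta - eps * g - sum(min(f - g, 0.0) for f in f_vals) / N
    return min(gap(g) for g in [0.0] + list(f_vals)) <= 1e-12

def cvar(samples, eps):
    """(1-eps)-CVaR of a discrete uniform random variable via the
    minimization formula above; the minimizing beta can be taken
    among the sample values."""
    N = len(samples)
    return min(b + sum(max(s - b, 0.0) for s in samples) / (eps * N)
               for b in samples)

# Corollary 1: x in Z iff delta/eps + CVaR_{1-eps}(-f(x, zeta)) <= 0.
f_vals, delta, eps = [1.5, 2.0, 0.8], 0.1, 0.5
lhs = delta / eps + cvar([-f for f in f_vals], eps)
print(in_Z(f_vals, delta, eps), lhs <= 1e-12)  # -> True True
```

On this instance the two tests agree, as Corollary 1 predicts.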


2.2 Outer and Inner Approximations

In this subsection, we introduce one outer approximation and three different inner approximations by exploiting the exact reformulations of the previous section.

Outer Approximation: Note from [31] that for any random variable X̃, we have

$$\mathrm{CVaR}_{1-\epsilon}(\tilde{X}) = \mathrm{VaR}_{1-\epsilon}(\tilde{X}) + \frac{1}{\epsilon}\,\mathbb{E}\left[\tilde{X} - \mathrm{VaR}_{1-\epsilon}(\tilde{X})\right]_+ \ge \mathrm{VaR}_{1-\epsilon}(\tilde{X}).$$

Therefore, in Corollary 1, if we replace CVaR_{1−ε}(·) by VaR_{1−ε}(·), we obtain the following outer approximation of set Z.

Theorem 3. Under Assumption (A1), set Z is outer approximated by

$$Z_{\mathrm{VaR}} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{f(x, \tilde{\zeta}) \ge \frac{\delta}{\epsilon}\right\} \ge 1-\epsilon\right\}. \tag{9}$$

Proof. By the well-known result in [31], CVaR_{1−ε}[−f(x, ζ̃)] ≥ VaR_{1−ε}[−f(x, ζ̃)]. Therefore, set Z is outer approximated by

$$Z_{\mathrm{VaR}} = \left\{x \in \mathbb{R}^n : \frac{\delta}{\epsilon} + \mathrm{VaR}_{1-\epsilon}\left[-f(x, \tilde{\zeta})\right] \le 0\right\},$$

which is equivalent to (9). □

Inner Approximation I - Robust Scenario Approximation: On the other hand, we notice that for any random variable X̃, we have

$$\mathrm{CVaR}_{1-\epsilon}(\tilde{X}) \le \mathrm{CVaR}_{1}(\tilde{X}) := \operatorname{ess\,sup}(\tilde{X}).$$

Thus, in Corollary 1, if we replace CVaR_{1−ε}(·) by ess sup(·), we obtain the following inner approximation of set Z.

Theorem 4. Under Assumption (A1), set Z is inner approximated by

$$Z_{S} = \left\{x \in \mathbb{R}^n : f(x, \zeta^j) \ge \frac{\delta}{\epsilon},\ \forall j\in[N]\right\}. \tag{10}$$

Proof. Since CVaR_{1−ε}[−f(x, ζ̃)] ≤ ess sup[−f(x, ζ̃)] and ζ̃ is a discrete random vector, set Z is inner approximated by

$$Z_{S} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{f(x, \tilde{\zeta}) \ge \frac{\delta}{\epsilon}\right\} = 1\right\},$$

which is equivalent to (10). □

Note that set Z_S has a similar structure to the scenario approach to chance constrained problems [6, 8, 29], and indeed can be viewed as a "robust" scenario approach. We will discuss this further in Section 3.2.

Inner Approximation II - Inner Chance Constrained Approximation: Next, we propose a chance constrained inner approximation of the DRCCP set Z by constructing a feasible γ in (7).

Theorem 5. Under Assumption (A1) with ε ∈ (0, 1) and Nε ∉ Z_+, set Z is inner approximated by

$$Z_{I} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{f(x, \tilde{\zeta}) \ge \frac{\delta}{\epsilon-\alpha}\right\} \ge 1-\alpha,\ 0 \le \alpha \le \frac{\lfloor N\epsilon\rfloor}{N}\right\}. \tag{11}$$

Proof. For any x ∈ Z_I, we show that x ∈ Z. Since x ∈ Z_I, there exists an α such that (x, α) satisfies the constraints in (11). Define γ = Nδ/(Nε − ⌊Nα⌋) and the set C = {j ∈ [N] : f(x, ζ^j) ≤ γ}. Since δ/(ε−α) ≥ Nδ/(Nε − ⌊Nα⌋) =: γ, (11) implies |C| ≤ ⌊Nα⌋. Hence,

$$\frac{1}{N}\sum_{j\in[N]} \min\left\{f(x, \zeta^j) - \gamma,\ 0\right\} = \frac{1}{N}\sum_{j\in C}\left(f(x, \zeta^j) - \gamma\right) \ge -\frac{|C|}{N}\gamma \ge -\frac{\lfloor N\alpha\rfloor}{N}\gamma = \delta - \epsilon\gamma,$$

where the first inequality is due to f(x, ζ^j) ≥ 0 and the second inequality is due to |C| ≤ ⌊Nα⌋. Thus, (x, γ) satisfies (7), i.e., x ∈ Z. □

We remark that this result, together with set Z_VaR, shows that the DRCCP set Z can be inner and outer approximated by regular chance constraints with the empirical probability distribution P_ζ̃. We also note that (i) set Z_S is the special case of set Z_I with α = 0, so Z_S ⊆ Z_I; and (ii) there are ⌊Nε⌋ + 1 non-dominated values of α, that is, we may restrict α ∈ {0, 1/N, ..., ⌊Nε⌋/N}. Indeed, suppose that α ∈ ((i−1)/N, i/N) for some i ∈ [⌊Nε⌋]; then the feasible region expands if we decrease the value of α to (i−1)/N. Therefore, to optimize over set Z_I, we can enumerate these ⌊Nε⌋ + 1 values of α and choose the one yielding the smallest objective value. These two results are summarized below.

Corollary 2. Suppose that Assumption (A1) holds, ε ∈ (0, 1), Nε ∉ Z_+, and set Z_I is defined in (11). Then (i) Z_S ⊆ Z_I; and (ii) set Z_I is equivalent to

$$Z_{I} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{f(x, \tilde{\zeta}) \ge \frac{\delta}{\epsilon-\alpha}\right\} \ge 1-\alpha,\ \alpha \in \left\{0, \frac{1}{N}, \ldots, \frac{\lfloor N\epsilon\rfloor}{N}\right\}\right\}. \tag{12}$$
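By Corollary 2, testing membership in Z_I only requires enumerating the ⌊Nε⌋ + 1 candidate values of α. A sketch under the assumption of equal point masses 1/N, with `f_vals` again standing for precomputed values f(x, ζ^j):

```python
from math import floor

def in_ZI(f_vals, delta, eps):
    """Corollary 2 test (12): exists alpha = k/N, k = 0..floor(N*eps), with
    P(f(x, zeta) >= delta/(eps - alpha)) >= 1 - alpha.
    With equal masses 1/N this means at least N - k scenarios pass."""
    N = len(f_vals)
    for k in range(floor(N * eps) + 1):
        alpha = k / N
        if eps - alpha <= 0:  # guarded; excluded when N*eps is not an integer
            continue
        threshold = delta / (eps - alpha)
        if sum(f >= threshold for f in f_vals) >= N - k:
            return True
    return False

# With eps = 0.55, N = 3 (so N*eps is not an integer), alpha may be 0 or 1/3:
print(in_ZI([0.5, 0.5, 0.05], 0.1, 0.55))  # alpha = 1/3 certifies membership
```

The integer comparison `>= N - k` implements the probability constraint exactly, avoiding floating-point ties.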

Inner Approximation III - CVaR Approximation: Finally, we close this section by studying a well-known convex approximation of a chance constraint, which replaces the nonconvex chance constraint by a convex constraint defined via CVaR (cf. [28]). For a DRCCP, the resulting formulation is

$$Z_{\mathrm{CVaR}} = \left\{x \in \mathbb{R}^n : \sup_{\mathbb{P}\in\mathcal{P}}\ \inf_{\beta}\left\{-\epsilon\beta + \mathbb{E}_{\mathbb{P}}\left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+\right\} \le 0\right\}. \tag{13}$$

Set Z_CVaR in (13) is convex and is an inner approximation of set Z. The following result gives a reformulation of set Z_CVaR. We would like to acknowledge that this result has been independently observed in the recent work [20]; for the completeness of this paper, we present a proof in our notation.

Theorem 6. Z_CVaR ⊆ Z, and Z_CVaR is equivalent to

$$Z_{\mathrm{CVaR}} = \left\{x :\ -\epsilon\beta + \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta^j) - \left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+\right] \le 0,\ \lambda, \beta \ge 0\right\}. \tag{14}$$

Proof. Note that the Wasserstein ambiguity set P is weakly compact [5]; thus, according to Theorem 2.1 in [33], set Z_CVaR is equivalent to

$$Z_{\mathrm{CVaR}} = \left\{x : \inf_{\beta}\left\{-\epsilon\beta + \sup_{\mathbb{P}\in\mathcal{P}}\ \mathbb{E}_{\mathbb{P}}\left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+\right\} \le 0\right\}. \tag{15}$$

Note that in (15) the infimum must be achieved. Indeed, we first note that for any β < 0 the inequality in (15) cannot be satisfied; thus, we must have β ≥ 0. On the other hand,

$$-\epsilon\beta + \sup_{\mathbb{P}\in\mathcal{P}}\ \mathbb{E}_{\mathbb{P}}\left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+ \ge -\epsilon\beta + \mathbb{E}_{\mathbb{P}_{\tilde{\zeta}}}\left[\max_{i\in[I]}\left(a(x)^\top \tilde{\zeta}_i - b_i(x)\right)+\beta\right]_+,$$

where the inequality is due to P_ζ̃ ∈ P. The right-hand side of the above inequality equals (1−ε)β > 0 for any β > max_{i∈[I], j∈[N]} max{b_i(x) − a(x)^⊤ζ_i^j, 0}. Thus, the best β in (15) is bounded, i.e., Z_CVaR is equivalent to

$$Z_{\mathrm{CVaR}} = \left\{x :\ -\epsilon\beta + \sup_{\mathbb{P}\in\mathcal{P}}\ \mathbb{E}_{\mathbb{P}}\left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+ \le 0,\ \beta \ge 0\right\}. \tag{16}$$

By Theorem 1 in [16], the above formulation is further equal to

$$Z_{\mathrm{CVaR}} = \left\{x : \min_{\lambda\ge 0}\left\{-\epsilon\beta + \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \inf_{\xi\in\Xi}\left[\lambda d(\xi, \zeta^j) - \left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+\right]\right\} \le 0,\ \beta \ge 0\right\},$$

which is equivalent to (14). □

3 DRCCP with Normed Vector Space (Ξ, d)

Note that the results in the previous section are quite general. In this section, we show that they can be significantly simplified when (Ξ, d) is a normed vector space. In particular, we make the following assumption.

(A2) The support Ξ is an n × I-dimensional vector space and the distance metric is d(ξ, ζ) = ‖ξ − ζ‖.

3.1 Exact Mixed Integer Program Reformulation

In this subsection, we show that set Z is mixed integer representable under Assumptions (A1)-(A2). To begin with, we observe that under the additional Assumption (A2), the function f(·, ·) in Theorem 2 can be computed explicitly.

Theorem 7. Under Assumptions (A1)-(A2), set Z is equivalent to

$$Z = \left\{x \in \mathbb{R}^n : \delta - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} \min\left\{f(x, \zeta^j) - \gamma,\ 0\right\},\ \gamma \ge 0\right\}, \tag{17}$$

where

$$f(x, \zeta) = \min\left\{\min_{i\in[I]\setminus I(x)} \frac{\max\left\{b_i(x) - a(x)^\top \zeta_i,\ 0\right\}}{\|a(x)\|_*},\ \min_{i\in I(x)} \chi_{\{x:\, b_i(x) < 0\}}(x)\right\}, \tag{18}$$

with I(x) := {i ∈ [I] : a(x) = 0} (i.e., I(x) = [I] if a(x) = 0 and I(x) = ∅ otherwise) and ‖·‖_* denoting the dual norm of ‖·‖.

Next, set Z can be written as the union of two sets.

Theorem 8. Under Assumptions (A1)-(A2), Z = Z1 ∪ Z2, where

$$Z_1 = \left\{x \in \mathbb{R}^n : \begin{aligned} &\delta\nu - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j, && (19a)\\ &z_j + \gamma \le \max\left\{b_i(x) - a(x)^\top \zeta_i^j,\ 0\right\},\ \forall i\in[I], j\in[N], && (19b)\\ &z_j \le 0,\ \forall j\in[N], && (19c)\\ &\|a(x)\|_* \le \nu, && (19d)\\ &\nu > 0,\ \gamma \ge 0, && (19e) \end{aligned}\right\}$$

and

$$Z_2 = \left\{x \in \mathbb{R}^n : a(x) = 0,\ b_i(x) \ge 0,\ \forall i\in[I]\right\}. \tag{20}$$
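Under (A2), evaluating f via (18) no longer requires solving an optimization problem. A sketch for the special case of the ℓ2 norm (whose dual norm is again ℓ2; the paper allows a general norm), with `a` standing for a(x) and `b[i]`, `zeta[i]` for b_i(x) and ζ_i:

```python
import math

def f_closed_form(a, b, zeta):
    """Evaluate (18): the distance from zeta to the nearest violation set
    {xi : a^T xi_i > b_i(x)}. For a != 0 it is
    min_i max(b_i - a^T zeta_i, 0) / ||a||_2; for a = 0 it is
    0 if some b_i < 0 (a constraint is always violated) and +inf otherwise."""
    norm_a = math.sqrt(sum(v * v for v in a))
    if norm_a == 0.0:
        return 0.0 if any(bi < 0 for bi in b) else math.inf
    return min(max(bi - sum(av * zv for av, zv in zip(a, zi)), 0.0)
               for bi, zi in zip(b, zeta)) / norm_a

# Single constraint (I = 1), n = 1: a = (2), b_1 = 1, zeta_1 = (-1):
# f = max(1 - 2*(-1), 0) / 2 = 1.5
print(f_closed_form([2.0], [1.0], [[-1.0]]))  # -> 1.5
```

Plugging these values into the scan over γ from Theorem 2 (see the sketch after Corollary 1) yields a complete membership test for (17).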

Proof. We need to show that Z1 ∪ Z2 ⊆ Z and Z ⊆ Z1 ∪ Z2.

(Z1 ∪ Z2 ⊆ Z) Given x ∈ Z2, we have I(x) = [I]; thus, f(x, ζ) (defined in (18)) is ∞. Letting γ = δ/ε, (γ, x) clearly satisfies all the constraints in (17), i.e., x ∈ Z. Hence, Z2 ⊆ Z.

Given x ∈ Z1, there exists (γ, ν, z, x) satisfying the constraints in (19). Suppose that I(x) = [I], i.e., a(x) = 0. Then for each i ∈ I(x), (19a) and (19b) imply that

$$\delta\nu - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j \le \frac{1}{N}\sum_{j\in[N]}\left(\max\{b_i(x), 0\} - \gamma\right),$$

which is equivalent to max{b_i(x), 0} ≥ δν + (1−ε)γ > 0, that is, b_i(x) > 0. Thus, x ∈ Z2 ⊆ Z. Now suppose that I(x) = ∅. For each j ∈ [N], (19b), (19c), and (19d), together with ν > 0, imply that

$$\frac{z_j}{\nu} \le \min\left\{\min_{i\in[I]} \frac{\max\left\{b_i(x) - a(x)^\top \zeta_i^j,\ 0\right\} - \gamma}{\nu},\ 0\right\} \le \min\left\{\min_{i\in[I]} \frac{\max\left\{b_i(x) - a(x)^\top \zeta_i^j,\ 0\right\}}{\|a(x)\|_*} - \frac{\gamma}{\nu},\ 0\right\} = \min\left\{f(x, \zeta^j) - \frac{\gamma}{\nu},\ 0\right\},$$

where the second inequality is due to (19d). Then, according to (19a), we have

$$\delta - \epsilon\,\frac{\gamma}{\nu} \le \frac{1}{N}\sum_{j\in[N]} \frac{z_j}{\nu} \le \frac{1}{N}\sum_{j\in[N]} \min\left\{f(x, \zeta^j) - \frac{\gamma}{\nu},\ 0\right\},$$

i.e., (γ/ν, x) satisfies the constraints in (17), and hence x ∈ Z. Thus, Z1 ⊆ Z.

(Z ⊆ Z1 ∪ Z2) Conversely, given x ∈ Z, there exists (γ, x) satisfying the constraints in (17). Suppose that a(x) = 0. Then we must have b_i(x) ≥ 0 for all i ∈ [I]; otherwise, f(x, ζ^j) = 0 for all j ∈ [N], and (17a) would read 0 < δ ≤ (ε − 1)γ, contradicting γ ≥ 0 and ε ∈ (0, 1). Hence, x ∈ Z2. From now on, we assume that a(x) ≠ 0. Define γ̂ = γ‖a(x)‖_*, ν = ‖a(x)‖_*, and z_j = min{min_{i∈[I]} max{b_i(x) − a(x)^⊤ζ_i^j, 0} − γ̂, 0} for each j ∈ [N]. Clearly, (γ̂, ν, z, x) satisfies the constraints in (19), i.e., x ∈ Z1. □

We remark that set Z2 is usually trivial.

Remark 1. (i) If η = 1, then Z2 = {x ∈ R^n : x = 0, b_i ≥ 0, ∀i ∈ [I]}; (ii) if η = 0, then Z2 = ∅.

On the other hand, set Z1 can be formulated as a mixed integer set when it is bounded, i.e., we can introduce binary variables to represent constraints (19b).

Theorem 9. Suppose there exists M ∈ R_+^{I×N} such that

$$\max_{x\in Z_1}\ \left|b_i(x) - a(x)^\top \zeta_i^j\right| \le M_{ij}$$

for all i ∈ [I], j ∈ [N]. Then Z1 is mixed integer representable as follows:

$$Z_1 = \left\{x \in \mathbb{R}^n : \begin{aligned} &\delta\nu - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j, && (21a)\\ &z_j + \gamma \le s_{ij},\ \forall i\in[I], j\in[N], && (21b)\\ &s_{ij} \ge b_i(x) - a(x)^\top \zeta_i^j,\ \forall i\in[I], j\in[N], && (21c)\\ &s_{ij} \le M_{ij} y_{ij},\ \ s_{ij} \le b_i(x) - a(x)^\top \zeta_i^j + M_{ij}(1-y_{ij}),\ \forall i\in[I], j\in[N], && (21d)\\ &\|a(x)\|_* \le \nu, && (21e)\\ &\nu > 0,\ \gamma \ge 0,\ s_{ij} \ge 0,\ z_j \le 0,\ y_{ij} \in \{0,1\},\ \forall i\in[I], j\in[N]. && (21f) \end{aligned}\right\}$$

Proof. To prove that Z1 is equivalent to the right-hand side of (21), it is sufficient to show that for each i ∈ [I], j ∈ [N],

$$\max\left\{b_i(x) - a(x)^\top \zeta_i^j,\ 0\right\} = s_{ij}.$$

There are three cases:

Case 1. If b_i(x) − a(x)^⊤ζ_i^j < 0, then we must have y_ij = 0 (otherwise, s_ij ≤ b_i(x) − a(x)^⊤ζ_i^j + M_ij(1 − y_ij) = b_i(x) − a(x)^⊤ζ_i^j < 0, contradicting s_ij ≥ 0). Thus, s_ij = 0 = max{b_i(x) − a(x)^⊤ζ_i^j, 0}.

Case 2. If b_i(x) − a(x)^⊤ζ_i^j = 0, then for any y_ij ∈ {0, 1} we have s_ij = 0 = max{b_i(x) − a(x)^⊤ζ_i^j, 0}.

Case 3. If b_i(x) − a(x)^⊤ζ_i^j > 0, then we must have y_ij = 1 (otherwise, b_i(x) − a(x)^⊤ζ_i^j ≤ s_ij ≤ M_ij y_ij = 0, contradicting s_ij ≥ b_i(x) − a(x)^⊤ζ_i^j > 0). Thus, s_ij = b_i(x) − a(x)^⊤ζ_i^j = max{b_i(x) − a(x)^⊤ζ_i^j, 0}. □
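The case analysis above can be checked by brute force: for a fixed value v = b_i(x) − a(x)^⊤ζ_i^j and a valid big-M (M ≥ |v|), constraints (21c)-(21d) together with s_ij ≥ 0 leave exactly one feasible (y_ij, s_ij) pair, with s_ij = max{v, 0}. A small sketch with hypothetical helper names:

```python
def feasible_s_interval(v, M, y):
    """Feasible s-values under s >= v, s >= 0, s <= M*y, s <= v + M*(1 - y)
    for a fixed binary y; returns (lo, hi), empty when lo > hi."""
    lo = max(v, 0.0)
    hi = min(M * y, v + M * (1 - y))
    return lo, hi

def linearized_max(v, M=10.0):
    """Recover max(v, 0) as the unique value admitted by the system."""
    values = set()
    for y in (0, 1):
        lo, hi = feasible_s_interval(v, M, y)
        if lo <= hi:
            assert abs(hi - lo) < 1e-9  # the interval collapses to a point
            values.add(lo)
    assert len(values) == 1             # exactly one s survives
    return values.pop()

print(linearized_max(-2.0), linearized_max(3.0))  # -> 0.0 3.0
```

For v = 0 both choices of y are feasible but force the same value s = 0, matching Case 2.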

Note that there are various methods in the literature [30, 34] to obtain big-M coefficients; in the numerical study section, we derive the big-M coefficients by inspection. Formulation (21) contains I × N binary variables and big-M coefficients, which makes it very challenging to solve. In the next section, we show that set Z1 can be approximated to arbitrary accuracy by a big-M free formulation, and that a branch-and-cut algorithm can be used to solve the approximated formulation.

DRCCP with Right-hand Uncertainty: In this special case, we consider DRCCP with right-hand uncertainty, i.e., η = 0 and a(x) = e. We first note that, by Theorem 7, set Z with a(x) = e admits a more compact formulation.

Theorem 10. If η = 0 and a(x) = e, then under Assumptions (A1)-(A2), set Z is equivalent to the following mathematical program:

$$Z = \left\{x \in \mathbb{R}^n : \begin{aligned} &\delta - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j, && (22a)\\ &(z_j + \gamma)\sqrt{n} \le \max\left\{b_i(x) - e^\top \zeta_i^j,\ 0\right\},\ \forall j\in[N], i\in[I], && (22b)\\ &z_j \le 0,\ \forall j\in[N],\ \gamma \ge 0. && (22c) \end{aligned}\right\}$$

Proof. The result follows directly from Theorem 7. □

To reformulate set Z in (22) as a mixed integer program, we observe that, without loss of generality, e^⊤ζ_i^j is nonnegative for all j ∈ [N], i ∈ [I]. Indeed, suppose that L := min_{j∈[N], i∈[I]} e^⊤ζ_i^j < 0; then we can redefine e^⊤ζ_i^j := e^⊤ζ_i^j − L and b_i(x) := b_i(x) − L for all j ∈ [N], i ∈ [I].

Theorem 11. Suppose that η = 0, a(x) = e, and e^⊤ζ_i^j ≥ 0 for all i ∈ [I], j ∈ [N], and that there exists M ∈ R_+^{I×N} such that

$$\max_{x\in Z}\ b_i(x) - e^\top \zeta_i^j \le M_{ij}$$

for all j ∈ [N], i ∈ [I]. Then set Z is equal to

$$Z = \left\{x \in \mathbb{R}^n : \begin{aligned} &\delta - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j, && (23a)\\ &(z_j + \gamma)\sqrt{n} \le s_{ij},\ \forall j\in[N], i\in[I], && (23b)\\ &s_{ij} \le b_i(x) - e^\top \zeta_i^j\, y_{ij},\ \forall j\in[N], i\in[I], && (23c)\\ &s_{ij} \le M_{ij} y_{ij},\ \forall j\in[N], i\in[I], && (23d)\\ &s_{ij} \ge b_i(x) - e^\top \zeta_i^j,\ \forall j\in[N], i\in[I], && (23e)\\ &\gamma \ge 0,\ z_j \le 0,\ y_{ij} \in \{0,1\},\ \forall j\in[N], i\in[I]. && (23f) \end{aligned}\right\}$$

Proof. Let Ẑ denote the set on the right-hand side of (23); we show that Ẑ = Z.

(Z ⊆ Ẑ) Given x ∈ Z, there exists (γ, z, x) satisfying the constraints in (22). Let s_ij = max{b_i(x) − e^⊤ζ_i^j, 0}, and let y_ij = 1 if b_i(x) ≥ e^⊤ζ_i^j and y_ij = 0 otherwise, for each j ∈ [N], i ∈ [I]. We only need to show that (γ, s, z, y, x) satisfies the constraints in (23). Clearly, constraints (23a), (23b), (23d), (23e), and (23f) are satisfied, and for each j ∈ [N], i ∈ [I] with y_ij = 1, constraints (23c) are also satisfied, since s_ij = max{b_i(x) − e^⊤ζ_i^j, 0} = b_i(x) − e^⊤ζ_i^j. It remains to show that for each j ∈ [N], i ∈ [I] with y_ij = 0, constraints (23c) are also satisfied, i.e., s_ij = 0 ≤ b_i(x).

Suppose that there exists i_0 ∈ [I] such that b_{i_0}(x) < 0. By assumption, e^⊤ζ_{i_0}^j ≥ 0 for all j ∈ [N]; thus, max{b_{i_0}(x) − e^⊤ζ_{i_0}^j, 0} = 0 for all j ∈ [N]. According to constraints (22b), we then have z_j + γ ≤ 0, i.e., z_j ≤ −γ for each j ∈ [N]. Substituting this inequality into constraint (22a) gives

$$\delta - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j \le -\gamma,$$

which implies γ ≤ −δ/(1−ε) < 0, contradicting γ ≥ 0.

(Z ⊇ Ẑ) Given x ∈ Ẑ, there exists (γ, s, z, y, x) satisfying the constraints in (23). To prove that (γ, z, x) satisfies the constraints in (22), we only need to show that s_ij ≤ max{b_i(x) − e^⊤ζ_i^j, 0} for each j ∈ [N], i ∈ [I]. Indeed, if y_ij = 1 we have s_ij ≤ b_i(x) − e^⊤ζ_i^j ≤ max{b_i(x) − e^⊤ζ_i^j, 0}; otherwise, s_ij ≤ 0 ≤ max{b_i(x) − e^⊤ζ_i^j, 0}. □

We finally remark that formulation (23) can be stronger than (21), since we only need to compute the largest upper bound of b_i(x) − e^⊤ζ_i^j and define it as M_ij, rather than the largest absolute value of b_i(x) − e^⊤ζ_i^j.

3.2 Inner and Outer Approximations

In the previous subsection, we developed exact mixed integer reformulations of set Z under various settings. However, these reformulations might be difficult to solve: when the number of empirical data points N becomes large, they contain a large number (I × N) of binary variables. In this subsection, we investigate compact formulations of the inner and outer approximations of set Z proposed in Section 2, which involve fewer or even no binary variables.

VaR Outer Approximation: We first study the reformulation of the outer approximation Z_VaR.

Theorem 12. Under Assumptions (A1)-(A2), set Z is outer approximated by

$$Z_{\mathrm{VaR}} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{\frac{\delta}{\epsilon}\|a(x)\|_* + a(x)^\top \tilde{\zeta}_i \le b_i(x),\ \forall i\in[I]\right\} \ge 1-\epsilon\right\}. \tag{24}$$

Proof. By Theorem 3 with f given in (18): if a(x) ≠ 0, then f(x, ζ̃) ≥ δ/ε if and only if max{b_i(x) − a(x)^⊤ζ̃_i, 0}/‖a(x)‖_* ≥ δ/ε for all i ∈ [I], which, since δ > 0, is equivalent to (δ/ε)‖a(x)‖_* + a(x)^⊤ζ̃_i ≤ b_i(x) for all i ∈ [I]; if a(x) = 0, then f(x, ζ̃) ≥ δ/ε if and only if b_i(x) ≥ 0 for all i ∈ [I], which coincides with the same condition. This yields (24). □

Inner Approximation I - Robust Scenario Approximation: The inner approximation Z_S admits an analogous reformulation.

Theorem 13. Under Assumptions (A1)-(A2), set Z is inner approximated by

$$Z_{S} = \left\{x \in \mathbb{R}^n : \frac{\delta}{\epsilon}\|a(x)\|_* + a(x)^\top \zeta_i^j \le b_i(x),\ \forall j\in[N], i\in[I]\right\}. \tag{25}$$

Proof. The proof is similar to that of Theorem 12 and is thus omitted. □

We remark that set Z_S in (25) is very similar to the scenario approach to chance constrained programs [6], which generates N i.i.d. samples {ζ^j}_{j∈[N]} and enforces the constraints corresponding to each sample. The difference is that in set Z_S we also add a penalty (δ/ε)‖a(x)‖_* to each of the sampled constraints.

Inner Approximation II - Inner Chance Constraint Approximation: The second inner approximation Z_I is nonconvex and, according to Theorem 5, can be formulated as below.

Theorem 14. Suppose that Assumptions (A1)-(A2) hold, ε ∈ (0, 1), and Nε ∉ Z_+. Then set Z is inner approximated by

$$Z_{I} = \left\{x \in \mathbb{R}^n : \mathbb{P}_{\tilde{\zeta}}\left\{\frac{\delta}{\epsilon-\alpha}\|a(x)\|_* + a(x)^\top \tilde{\zeta}_i \le b_i(x),\ \forall i\in[I]\right\} \ge 1-\alpha,\ 0 \le \alpha \le \frac{\lfloor N\epsilon\rfloor}{N}\right\}. \tag{26}$$

Proof. The proof is similar to that of Theorem 12 and is thus omitted. □

Note that for any given α, set Z_I is mixed integer representable with big-M coefficients. Since, by Corollary 2, there are only ⌊Nε⌋ + 1 effective values of α to choose from, Z_I can be formulated as a disjunction of ⌊Nε⌋ + 1 mixed integer sets.

Inner Approximation III - CVaR Approximation: Next, we study the CVaR approximation.

Theorem 15. Under Assumptions (A1)-(A2), set Z is inner approximated by

$$Z_{\mathrm{CVaR}} = \left\{x \in \mathbb{R}^n : \begin{aligned} &\delta\nu - \epsilon\gamma \le \frac{1}{N}\sum_{j\in[N]} z_j, && (27a)\\ &z_j + \gamma \le b_i(x) - a(x)^\top \zeta_i^j,\ \forall j\in[N], i\in[I], && (27b)\\ &z_j \le 0,\ \forall j\in[N], && (27c)\\ &\|a(x)\|_* \le \nu, && (27d)\\ &\nu \ge 0,\ \gamma \ge 0. && (27e) \end{aligned}\right\}$$

Proof. By Theorem 6, set Z_CVaR equals

$$Z_{\mathrm{CVaR}} = \left\{x \in \mathbb{R}^n : -\epsilon\beta + \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \inf_{\xi}\left[\lambda\|\xi - \zeta^j\| - \left[\max_{i\in[I]}\left(a(x)^\top \xi_i - b_i(x)\right)+\beta\right]_+\right] \le 0,\ \lambda, \beta \ge 0\right\},$$

which is further equivalent to

$$Z_{\mathrm{CVaR}} = \left\{x \in \mathbb{R}^n : -\epsilon\beta + \lambda\delta - \frac{1}{N}\sum_{j\in[N]} \min\left\{\min_{i\in[I]}\left(b_i(x) - a(x)^\top \zeta_i^j\right) - \beta,\ 0\right\} \le 0,\ \|a(x)\|_* \le \lambda,\ \lambda, \beta \ge 0\right\}.$$

In the above formulation, let ν = λ and γ = β, let z_j = min{min_{i∈[I]} (b_i(x) − a(x)^⊤ζ_i^j) − β, 0}, and linearize it for each j ∈ [N]. Thus, we arrive at (27). □

We remark that the reformulation of set Z_CVaR in (27) can also be derived directly from formulation (17). Indeed, since max{b_i(x) − a(x)^⊤ζ_i, 0} ≥ b_i(x) − a(x)^⊤ζ_i, replacing max{b_i(x) − a(x)^⊤ζ_i, 0} with b_i(x) − a(x)^⊤ζ_i shows that the function f(x, ζ) is lower bounded by

$$f(x, \zeta) \ge \underline{f}(x, \zeta) = \min\left\{\min_{i\in[I]\setminus I(x)} \frac{b_i(x) - a(x)^\top \zeta_i}{\|a(x)\|_*},\ \min_{i\in I(x)} \chi_{\{x:\, b_i(x) < 0\}}(x)\right\}.$$

Moreover, when Nε ≤ 1, any (ν, γ, z, x) satisfying (19) also satisfies (27). To see this, we first prove that in (19) we must have z_j + γ > 0 for each j ∈ [N]. Suppose that there exists j_0 ∈ [N] such that z_{j_0} + γ ≤ 0. Then, according to (19a), we have

$$\delta\nu \le \frac{1}{N}\sum_{j\in[N]\setminus\{j_0\}} z_j + \frac{1}{N}\left(N\epsilon\gamma + z_{j_0}\right) \le 0,$$

where the second inequality is due to Nε ≤ 1 and z_{j_0} + γ ≤ 0, contradicting δν > 0. Therefore, in (19b) we must have max{b_i(x) − a(x)^⊤ζ_i^j, 0} = b_i(x) − a(x)^⊤ζ_i^j for each i ∈ [I], j ∈ [N]. Hence, (ν, γ, z, x) satisfies the constraints in (27), i.e., x ∈ Z_CVaR.

Formulation Comparisons: Finally, we compare the sets Z_S and Z_CVaR. Indeed, we can show that Z_S ⊆ Z_CVaR, i.e., set Z_S is at least as conservative as set Z_CVaR.

Theorem 16. Let Z_S and Z_CVaR be defined in (25) and (27), respectively. Then Z_S ⊆ Z_CVaR.


Proof. Given x ∈ ZS, we only need to show that x ∈ ZCVaR. Indeed, let ν = ‖a(x)‖_*, γ = (δ/ε)‖a(x)‖_*, and z_j = 0 for all j ∈ [N]. Then we see that (ν, γ, z, x) satisfies the constraints in (27), i.e., x ∈ ZCVaR. □

We illustrate sets Z, ZVaR, ZCVaR, ZS, ZI with the following example.

Example 1. Suppose N = 3, n = 2, I = 2, δ = 1/6, ε = 2/3, and ζ_1^1 = (1, 0)^⊤, ζ_2^1 = (0, 3)^⊤, ζ_1^2 = (3, 0)^⊤, ζ_2^2 = (0, 1)^⊤, ζ_1^3 = (2, 0)^⊤, ζ_2^3 = (0, 2)^⊤, with a(x) = e, b_1(x) = x_1, b_2(x) = x_2. Then we have

  Z = { (x1, x2) : 2 + √2/2 ≤ x1, 3 + √2/2 ≤ x2 } ∪ { (x1, x2) : 3 + √2/2 ≤ x1, 2 + √2/2 ≤ x2 }
      ∪ { (x1, x2) : 3 ≤ x1, 3 ≤ x2, 6 + √2/2 ≤ x1 + x2 },

  ZVaR = { (x1, x2) : 2 + √2/4 ≤ x1, 3 + √2/4 ≤ x2 } ∪ { (x1, x2) : 3 + √2/4 ≤ x1, 2 + √2/4 ≤ x2 },

  ZCVaR = { (x1, x2) : 3 ≤ x1, 3 ≤ x2, 6 + √2/2 ≤ x1 + x2 },

  ZS = { (x1, x2) : 3 + √2/4 ≤ x1, 3 + √2/4 ≤ x2 },

  ZI = { (x1, x2) : 2 + √2/2 ≤ x1, 3 + √2/2 ≤ x2 } ∪ { (x1, x2) : 3 + √2/2 ≤ x1, 2 + √2/2 ≤ x2 }
       ∪ { (x1, x2) : 3 + √2/4 ≤ x1, 3 + √2/4 ≤ x2 }.

Clearly, we have ZS ⊊ ZCVaR ⊊ Z ⊊ ZVaR and ZS ⊊ ZI ⊊ Z, while ZCVaR ⊈ ZI and ZI ⊈ ZCVaR (see Figure 1 for an illustration).
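The closed-form sets above can be checked numerically. The following sketch (our illustration, not part of the paper; the membership tests simply transcribe the closed forms of Example 1, and in_ZS_from_data rebuilds ZS from the raw samples by enforcing every sampled constraint with the penalty (δ/ε)‖a(x)‖_* described after Theorem 13) verifies the stated inclusions on a grid:

```python
import math

# Data of Example 1: N = 3 samples, I = 2 constraints, a(x) = e, b_i(x) = x_i.
zeta = [((1, 0), (0, 3)), ((3, 0), (0, 1)), ((2, 0), (0, 2))]  # (zeta_1^j, zeta_2^j)
delta, eps = 1 / 6, 2 / 3
r2 = math.sqrt(2)

# Membership tests transcribing the closed forms of Example 1.
def in_Z(x):
    return ((2 + r2 / 2 <= x[0] and 3 + r2 / 2 <= x[1])
            or (3 + r2 / 2 <= x[0] and 2 + r2 / 2 <= x[1])
            or (3 <= x[0] and 3 <= x[1] and 6 + r2 / 2 <= x[0] + x[1]))

def in_ZVaR(x):
    return ((2 + r2 / 4 <= x[0] and 3 + r2 / 4 <= x[1])
            or (3 + r2 / 4 <= x[0] and 2 + r2 / 4 <= x[1]))

def in_ZCVaR(x):
    return 3 <= x[0] and 3 <= x[1] and 6 + r2 / 2 <= x[0] + x[1]

def in_ZS(x):
    return 3 + r2 / 4 <= x[0] and 3 + r2 / 4 <= x[1]

def in_ZI(x):
    return ((2 + r2 / 2 <= x[0] and 3 + r2 / 2 <= x[1])
            or (3 + r2 / 2 <= x[0] and 2 + r2 / 2 <= x[1])
            or in_ZS(x))

# ZS rebuilt from the raw data: every sampled constraint gets the penalty
# (delta/eps)*||a(x)||_*; here a(x) = e and ||e||_2 = sqrt(2) (L2 is self-dual).
def in_ZS_from_data(x):
    pen = delta / eps * r2
    return all(pen + sum(zj[i]) <= x[i] for zj in zeta for i in range(2))

grid = [(1 + 0.05 * p, 1 + 0.05 * q) for p in range(81) for q in range(81)]
for x in grid:
    assert in_ZS(x) == in_ZS_from_data(x)     # (25) reproduces the ZS box
    if in_ZS(x):
        assert in_ZCVaR(x) and in_ZI(x)       # ZS lies inside ZCVaR and ZI
    if in_ZCVaR(x) or in_ZI(x):
        assert in_Z(x)                        # both are inner approximations of Z
    if in_Z(x):
        assert in_ZVaR(x)                     # Z lies inside the outer set ZVaR
# Neither of ZCVaR, ZI contains the other:
assert in_ZI((2 + r2 / 2, 3 + r2 / 2)) and not in_ZCVaR((2 + r2 / 2, 3 + r2 / 2))
assert in_ZCVaR((3.1, 3.61)) and not in_ZI((3.1, 3.61))
```

On the grid, ZS coincides with its penalized-scenario description, all listed inclusions hold, and the two witness points confirm that ZCVaR and ZI are incomparable.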

Finally, the inclusive relationships among sets Z, ZVaR, ZS, ZI, ZCVaR are illustrated in Figure 2, and their reformulations are summarized in Table 1.

Table 1: Summary of formulation results in Section 3

  Set Z:      Mixed-integer  (Theorem 9)
  Set ZVaR:   Mixed-integer  (Theorem 12)
  Set ZS:     Convex         (Theorem 13)
  Set ZI:     Mixed-integer  (Theorem 14)
  Set ZCVaR:  Convex         (Theorem 15)

4 DRCCP with Pure Binary Decision Variables

In this section, we will study DRCCP with pure binary decision variables x ∈ {0, 1}^n, i.e., in addition to Assumptions (A1)-(A2), we further assume that (A3) the set S is binary, that is, S ⊆ {0, 1}^n. Indeed, we remark that if S is a bounded mixed integer set, we can introduce O(n log(n)) additional binary variables to approximate set S with arbitrary accuracy via binary approximation of the continuous variables (cf. [42]). For binary DRCCP, we will show that the reformulations in the previous section can be improved.
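As a quick illustration of the binary-approximation remark above (a hedged sketch under the assumption that each continuous variable has been scaled into [0, 1]; the helper names are ours, not from [42]): a continuous variable can be replaced by L binary digits with truncation error below 2^{−L}, so accuracy 1/n per variable costs O(log n) binaries, i.e., O(n log n) in total.

```python
def binarize(s, L):
    """Greedy binary expansion of s in [0, 1): s ~ sum_k 2^{-k} y_k with
    L binary digits y_k, so the truncation error is below 2^{-L}."""
    y = []
    for k in range(1, L + 1):
        bit = 1 if s >= 2.0 ** (-k) else 0
        y.append(bit)
        s -= bit * 2.0 ** (-k)
    return y

def debinarize(y):
    # Recover the dyadic approximation sum_k 2^{-k} y_k.
    return sum(bit * 2.0 ** (-(k + 1)) for k, bit in enumerate(y))

y = binarize(0.7, 10)
assert y[:4] == [1, 0, 1, 1]                    # 0.7 = 0.1011... in base 2
assert abs(debinarize(y) - 0.7) < 2.0 ** (-10)  # error below 2^{-L}
```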

Figure 1: Illustration of Example 1 (sets ZVaR, Z, ZI, ZCVaR, ZS in the (x1, x2) plane; marked points (2, 2), (2, 3), (3, 2))

Figure 2: Summary of formulation comparisons


4.1 Polyhedral Results of Submodular Functions: A Review

Our main derivation of stronger formulations is based upon polyhedral results for submodular functions, which are briefly reviewed in this subsection. We first begin with the following lemmas on submodular functions.

Lemma 1. Given d1 ∈ R^n_+ and d2, d3 ∈ R, the function f(x) = −max{d1^⊤x + d2, d3} is submodular over the binary hypercube.

Proof. For simplicity, given T ⊆ [n], we define a binary vector e_T ∈ {0, 1}^n such that (e_T)_l = 1 if l ∈ T and (e_T)_l = 0 if l ∈ [n] \ T. According to the definition of a submodular function [12], we only need to show that

  f(e_{T1∪{t}}) − f(e_{T1}) ≥ f(e_{T2∪{t}}) − f(e_{T2})

for any T1 ⊆ T2 and t ∈ [n] \ T2. Let s_T := Σ_{i∈T} d_{1i} + d2. There are three cases:

Case 1. If s_{T1} ≥ d3, then s_{T2} ≥ s_{T1} ≥ d3 since d1 ∈ R^n_+, and we must have f(e_{T1∪{t}}) − f(e_{T1}) = f(e_{T2∪{t}}) − f(e_{T2}) = −d_{1t}.

Case 2. If s_{T1} < d3 but s_{T2} ≥ d3, then f(e_{T1∪{t}}) − f(e_{T1}) = min{d3 − s_{T1} − d_{1t}, 0} ≥ −d_{1t} = f(e_{T2∪{t}}) − f(e_{T2}), where the inequality is due to s_{T1} < d3.

Case 3. If s_{T2} < d3, then f(e_{T1∪{t}}) − f(e_{T1}) = min{d3 − s_{T1} − d_{1t}, 0} ≥ min{d3 − s_{T2} − d_{1t}, 0} = f(e_{T2∪{t}}) − f(e_{T2}), where the inequality is due to s_{T1} ≤ s_{T2}, which follows from d1 ∈ R^n_+. □

Lemma 2. Given q ≥ 1, the function f(x) = ‖x‖_q is submodular over the binary hypercube.

Proof. This is because, for binary x, f(x) = ‖x‖_q = (Σ_{l∈[n]} x_l)^{1/q} = g(e^⊤x) with g(t) = t^{1/q}, and g(e^⊤x) is a submodular function whenever g(·) is concave (cf. [40]). □

Next, we will introduce polyhedral properties of submodular functions. For any given submodular function f(x) with x ∈ {0, 1}^n, let us denote by Π_f its epigraph, i.e.,

  Π_f = { (x, φ) : φ ≥ f(x), x ∈ {0, 1}^n }.

Then the convex hull of Π_f is characterized by the system of "extended polymatroid inequalities" (EPI) [2, 40], i.e.,

  conv(Π_f) = { (x, φ) : f(0) + Σ_{l∈[n]} ρ_{σ_l} x_{σ_l} ≤ φ, ∀σ ∈ Ω, x ∈ [0, 1]^n },   (28)

where Ω denotes the collection of all permutations of the set [n], and ρ_{σ_l} = f(e_{A_l^σ}) − f(e_{A_{l−1}^σ}) for each l ∈ [n], with A_0^σ = ∅, A_l^σ = {σ_1, . . . , σ_l}, and (e_T)_τ = 1 if τ ∈ T and 0 if τ ∈ [n] \ T. In addition, although there are n! inequalities in (28), these inequalities can be easily separated by a greedy procedure.

Lemma 3. ([2, 40]) Suppose (x̃, φ̃) ∉ conv(Π_f), and let σ ∈ Ω be a permutation of [n] such that x̃_{σ_1} ≥ . . . ≥ x̃_{σ_n}. Then (x̃, φ̃) must violate the constraint f(0) + Σ_{l∈[n]} ρ_{σ_l} x_{σ_l} ≤ φ.

From Lemma 3, we see that to separate a point (x̃, φ̃) from conv(Π_f), we only need to sort the coordinates of x̃ in descending order, i.e., x̃_{σ_1} ≥ . . . ≥ x̃_{σ_n}; then (x̃, φ̃) can be separated from conv(Π_f) by the constraint f(0) + Σ_{l∈[n]} ρ_{σ_l} x_{σ_l} ≤ φ. The time complexity of this separation procedure is O(n log n).
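The separation procedure of Lemma 3 is straightforward to implement. The sketch below (our illustration; the function used in the demonstration is the submodular function of Lemma 1) sorts the fractional point, builds the coefficients ρ along that permutation, and returns the violated extended polymatroid inequality, if any:

```python
def separate_epi(f, x_tilde, phi_tilde, tol=1e-9):
    """Greedy separation from conv(Pi_f) per Lemma 3: sort x_tilde in
    descending order, accumulate rho along that permutation, and test the
    extended polymatroid inequality f(0) + sum_l rho_l * x_{sigma_l} <= phi."""
    n = len(x_tilde)
    sigma = sorted(range(n), key=lambda l: -x_tilde[l])  # O(n log n)
    e = [0] * n
    f0 = prev = f(e)
    rho = []
    for l in sigma:
        e[l] = 1
        cur = f(e)
        rho.append(cur - prev)
        prev = cur
    lhs = f0 + sum(r * x_tilde[l] for r, l in zip(rho, sigma))
    return (sigma, rho) if lhs > phi_tilde + tol else None  # cut, or None

# Demonstration with the submodular function of Lemma 1: f(x) = -max{d1.x + d2, d3}.
d1, d2, d3 = [2.0, 1.0, 3.0], 0.0, 1.5
f = lambda x: -max(sum(a * b for a, b in zip(d1, x)) + d2, d3)

assert separate_epi(f, [1, 1, 1], f([1, 1, 1])) is None     # point of Pi_f: no cut
assert separate_epi(f, [0.5, 0.5, 0.5], -10.0) is not None  # infeasible point: cut found
```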

4.2 Reformulating a Binary DRCCP by Submodular Knapsack Constraints: Big-M free

In this section, we will replace the nonlinear constraints defining the feasible region of a binary DRCCP (i.e., set S ∩ Z) by submodular upper bound (knapsack) constraints. These constraints can be equivalently described by the system of EPI in (28); therefore, we obtain a big-M free mixed integer representation of set S ∩ Z.

First, we introduce n binary variables w complementing x, i.e., w_l + x_l = 1 for each l ∈ [n]. With these n additional variables, we can reformulate the function b_i(x) − a(x)^⊤ζ_i^j as

  b_i(x) − a(x)^⊤ζ_i^j = r_{ij}^⊤x + t_{ij}^⊤w + u_{ij}   (29)

for each i ∈ [I], j ∈ [N] with r_{ij} ∈ R^n_+ and t_{ij} ∈ R^n_+. Indeed, since a(x) = ηx + (1 − η)e and b_i(x) = B_i^⊤x + b_i, in (29) we can choose

  r_{ijl} = B_{il} I(B_{il} > 0) − η ζ_{il}^j I(ζ_{il}^j < 0),
  t_{ijl} = −B_{il} I(B_{il} < 0) + η ζ_{il}^j I(ζ_{il}^j > 0),
  u_{ij} = b_i − (1 − η) e^⊤ζ_i^j + Σ_{τ∈[n]} ( B_{iτ} I(B_{iτ} < 0) − η ζ_{iτ}^j I(ζ_{iτ}^j > 0) ),

for each l ∈ [n], i ∈ [I], j ∈ [N]. Thus, from the above discussion, we can formulate S ∩ Z1 (note that Z = Z1 ∪ Z2 according to Theorem 8) as the following mixed integer set with submodular knapsack constraints.

Theorem 17. Suppose that Assumptions (A1)-(A3) hold. Then S ∩ Z = (S ∩ Z1) ∪ (S ∩ Z2), where

  S ∩ Z1 = { x ∈ S :
    δν − εγ ≤ (1/N) Σ_{j∈[N]} z_j,                                            (30a)
    −max{ r_{ij}^⊤x + t_{ij}^⊤w + u_{ij}, 0 } ≤ −z_j − γ, ∀i ∈ [I], j ∈ [N],   (30b)
    z_j ≤ 0, ∀j ∈ [N],                                                        (30c)
    η‖x‖_* + (1 − η)‖e‖_* ≤ ν,                                                (30d)
    w_l + x_l = 1, ∀l ∈ [n],                                                  (30e)
    ν ≥ 1,                                                                    (30f)
    γ ≥ 0, w ∈ {0, 1}^n }                                                     (30g)

and

  S ∩ Z2 = { x ∈ S : a(x) = 0, b_i(x) ≥ 0, ∀i ∈ [I] }.   (31)
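The coefficient construction in (29) can be transcribed directly. The sketch below (an illustration with made-up data B, b, ζ) builds r, t, u for a single constraint/sample pair and verifies, for both η = 0 and η = 1, that b_i(x) − a(x)^⊤ζ_i^j = r^⊤x + t^⊤w + u over all binary x with w = e − x, and that r, t ≥ 0 as required by Lemma 1:

```python
import itertools

def coeffs(B, b, zeta, eta):
    """Coefficients r, t >= 0 and constant u of (29), so that
    b_i(x) - a(x).zeta = r.x + t.w + u with w = e - x for binary x,
    where a(x) = eta*x + (1 - eta)*e and b_i(x) = B.x + b."""
    n = len(B)
    r = [(B[l] if B[l] > 0 else 0.0) - (eta * zeta[l] if zeta[l] < 0 else 0.0)
         for l in range(n)]
    t = [-(B[l] if B[l] < 0 else 0.0) + (eta * zeta[l] if zeta[l] > 0 else 0.0)
         for l in range(n)]
    u = (b - (1 - eta) * sum(zeta)
         + sum((B[l] if B[l] < 0 else 0.0) - (eta * zeta[l] if zeta[l] > 0 else 0.0)
               for l in range(n)))
    return r, t, u

B, b, zeta = [1.0, -2.0, 0.5], 4.0, [3.0, -1.0, 2.0]
for eta in (0, 1):
    r, t, u = coeffs(B, b, zeta, eta)
    assert all(v >= 0 for v in r) and all(v >= 0 for v in t)  # needed for Lemma 1
    for x in itertools.product((0, 1), repeat=3):
        w = [1 - xl for xl in x]
        lhs = b + sum(B[l] * x[l] - (eta * x[l] + 1 - eta) * zeta[l] for l in range(3))
        rhs = u + sum(r[l] * x[l] + t[l] * w[l] for l in range(3))
        assert abs(lhs - rhs) < 1e-9
```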

Proof. From the discussion above and the fact that a(x) = ηx + (1 − η)e with η ∈ {0, 1}, constraints (19b) and (19d) are equivalent to (30b) and (30d). We only need to show that ν ≥ 1 in set S ∩ Z1.

If η = 0, then η‖x‖_* + (1 − η)‖e‖_* = ‖e‖_* ≥ 1 and we are done. Now suppose that η = 1. We note that if x = 0, then the constraints (19) imply that b_i(x) > 0 for each i ∈ [I]; thus, if x = 0, then set Z1 ⊆ Z2. Therefore, without loss of generality, we can assume that x ≠ 0 in set Z1. Note that S ∩ Z1 ⊆ {0, 1}^n; therefore, x ≠ 0 implies that ‖x‖_* ≥ 1, and thus ν ≥ η‖x‖_* + (1 − η)‖e‖_* = ‖x‖_* ≥ 1. □

From the proof of Theorem 17, we note that if η = 1 and b_i ≥ δ/ε for each i ∈ [I], then we have Z2 ⊆ Z1.

Corollary 4. Suppose that Assumptions (A1)-(A3) hold, η = 1, and b_i ≥ δ/ε for each i ∈ [I]. Then S ∩ Z = S ∩ Z1.

Proof. We only need to show that 0 ∈ S ∩ Z1. Suppose x = 0, i.e., w = e. Let us set ν = 1, γ = δ/ε, z = 0. Then it is easy to see that (x, w, z, γ, ν) satisfies the constraints in (30), i.e., 0 ∈ S ∩ Z1. □

We note that the left-hand sides of constraints (30b) and (30d) are submodular functions according to Lemma 1 and Lemma 2; thus, we can equivalently replace these constraints with the convex hulls of the epigraphs of their associated submodular functions. Thus,

Corollary 5. Suppose that Assumptions (A1)-(A3) hold. Then

  S ∩ Z1 = { x ∈ S :
    δν − εγ ≤ (1/N) Σ_{j∈[N]} z_j,                        (32a)
    (x, w, −z_j − γ) ∈ conv(Π_ij), ∀i ∈ [I], j ∈ [N],      (32b)
    z_j ≤ 0, ∀j ∈ [N],                                    (32c)
    (x, ν) ∈ conv(Π_0),                                   (32d)
    w_l + x_l = 1, ∀l ∈ [n],                              (32e)
    ν ≥ 1, γ ≥ 0, w ∈ {0, 1}^n },                          (32f)

where

  Π_ij = { (x, w, φ) : −max{ r_{ij}^⊤x + t_{ij}^⊤w + u_{ij}, 0 } ≤ φ, x, w ∈ {0, 1}^n }, ∀i ∈ [I], j ∈ [N],   (33a)
  Π_0 = { (x, φ) : η‖x‖_* + (1 − η)‖e‖_* ≤ φ, x ∈ {0, 1}^n },   (33b)

and {conv(Π_ij)}_{i∈[I],j∈[N]}, conv(Π_0) can be described by the system of EPI in (28).

Note that the optimization problem min_{x∈S∩Z1} c^⊤x can be solved by a branch and cut algorithm. In particular, at each branch and bound node, denoted by (x̂, ŵ, ẑ, γ̂, ν̂), there might be too many (i.e., N × I + 1) valid inequalities to add, since in (32b) and (32d) there are N × I + 1 convex hulls of epigraphs (i.e., {conv(Π_ij)}_{i∈[I],j∈[N]}, conv(Π_0)) to separate from. Therefore, instead, we can first check and find the epigraphs of the κ (κ ≥ 1) most violated constraints in (30b) and (30d), i.e., find the epigraphs corresponding to the κ largest values in the following set:

  { −max{ r_{ij}^⊤x̂ + t_{ij}^⊤ŵ + u_{ij}, 0 } + ẑ_j + γ̂ }_{i∈[I],j∈[N]} ∪ { η‖x̂‖_* + (1 − η)‖e‖_* − ν̂ }.

Finally, we can generate and add valid inequalities by separating (x̂, ŵ, ẑ, γ̂, ν̂) from the convex hulls of these κ epigraphs according to Lemma 3.
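Selecting the κ most violated candidates at a node is a simple top-κ computation over the values listed above. A minimal sketch (illustrative data and names; lin_vals stores precomputed linear terms r_ij^⊤x̂ + t_ij^⊤ŵ + u_ij):

```python
import heapq

def top_k_violations(lin_vals, z_hat, gamma_hat, norm_term, nu_hat, k):
    """Pick the k largest violation values among the candidates listed above:
    constraint (30b) for (i, j) is violated by z_hat[j] + gamma_hat - max(lin, 0),
    where lin = r_ij.x_hat + t_ij.w_hat + u_ij, and the norm constraint (30d)
    is violated by norm_term - nu_hat (only positive values yield cuts)."""
    cands = [(z_hat[j] + gamma_hat - max(v, 0.0), (i, j))
             for (i, j), v in lin_vals.items()]
    cands.append((norm_term - nu_hat, "norm"))
    return heapq.nlargest(k, cands, key=lambda c: c[0])

lin = {(1, 1): 0.5, (1, 2): -0.2, (2, 1): 2.0}   # illustrative linear terms
z_hat = {1: -0.1, 2: 0.0}
out = top_k_violations(lin, z_hat, 0.3, 1.4, 1.0, 2)
assert [tag for _, tag in out] == ["norm", (1, 2)]  # violations 0.4 and 0.3
```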

4.3 Numerical Study

In this subsection, we present a numerical study to compare the big-M formulation in Theorem 9 with the big-M free formulation in Theorem 17 and its corollaries on the distributionally robust multidimensional knapsack problem (DRMKP) [10, 34, 37]. In DRMKP, there are n items and I knapsacks. Additionally, c_j represents the value of item j for all j ∈ [n], ξ̃_i := [ξ̃_{i1}, . . . , ξ̃_{in}]^⊤ represents the vector of random item weights in knapsack i, and b_i > 0 represents the capacity limit of knapsack i, for all i ∈ [I]. The binary decision variable x_j = 1 if the jth item is picked and 0 otherwise. We use the Wasserstein ambiguity set under Assumptions (A1) and (A2) with the L2-norm as the distance metric. With the notation above, DRMKP is formulated as

  v* = max_{x∈{0,1}^n} c^⊤x
       s.t. inf_{P∈P} P{ ξ̃_i^⊤x ≤ b_i, ∀i ∈ [I] } ≥ 1 − ε.   (34)

To test the proposed formulations, we generate 10 random instances with n = 20 and I = 10, indexed by {1, 2, . . . , 10}. For each instance, we generate N = 1000 empirical samples {ζ^j}_{j∈[N]} ⊆ R^{I×n}_+ from a uniform distribution over the box [1, 10]^{I×n}. For each l ∈ [n], we independently generate c_l from the uniform distribution on the interval [1, 10], while for each i ∈ [I], we set b_i := 100. We test these 10 random instances with risk parameter ε ∈ {0.05, 0.10} and Wasserstein radius δ ∈ {0.1, 0.2}.

Our first approach is to solve the big-M reformulation of DRMKP in Theorem 9, which reads as follows:

  v* = max_{x∈{0,1}^n} c^⊤x
       s.t. δν − εγ ≤ (1/N) Σ_{j∈[N]} z_j,
            z_j + γ ≤ s_{ij}, ∀i ∈ [I], j ∈ [N],
            s_{ij} ≥ b_i − x^⊤ζ_i^j, ∀i ∈ [I], j ∈ [N],
            s_{ij} ≤ M_{ij} y_{ij}, s_{ij} ≤ b_i − x^⊤ζ_i^j + M_{ij}(1 − y_{ij}), ∀i ∈ [I], j ∈ [N],   (35)
            ‖x‖_2 ≤ ν, ν ≥ 1, γ ≥ 0,
            s_{ij} ≥ 0, z_j ≤ 0, y_{ij} ∈ {0, 1}, ∀i ∈ [I], j ∈ [N],

where M_{ij} = max{ b_i, |b_i − e^⊤ζ_i^j| } for each i ∈ [I], j ∈ [N]. We compare this formulation with the big-M free formulation in Theorem 17 and its corollaries, which reads as follows:

  v* = max_{x∈{0,1}^n} c^⊤x
       s.t. δν − εγ ≤ (1/N) Σ_{j∈[N]} z_j,
            (w, −z_j − γ) ∈ conv(Π_ij), ∀i ∈ [I], j ∈ [N],
            z_j ≤ 0, ∀j ∈ [N],   (36)
            (x, ν) ∈ conv(Π_0),
            w_l + x_l = 1, ∀l ∈ [n],
            ν ≥ 1, γ ≥ 0, w ∈ [0, 1]^n,

where

  Π_ij = { (w, φ) : −max{ (ζ_i^j)^⊤w + b_i − (ζ_i^j)^⊤e, 0 } ≤ φ, w ∈ {0, 1}^n }, ∀i ∈ [I], j ∈ [N],   (37a)
  Π_0 = { (x, φ) : ‖x‖_2 ≤ φ, x ∈ {0, 1}^n },   (37b)

and their convex hulls {conv(Π_ij)}_{i∈[I],j∈[N]}, conv(Π_0) can be described by the system of EPI in (28). Note that the fact that (35) and (36) are exact reformulations of DRMKP follows from Corollary 4, since b_i ≥ δ/ε for all i ∈ [I].
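As a sanity check on the big-M constants in (35), the sketch below (illustrative data, much smaller than the paper's n = 20 instances) verifies by brute force that M_ij = max{b_i, |b_i − e^⊤ζ_i^j|} is large enough: for every binary x, s_ij = max{b_i − x^⊤ζ_i^j, 0} lies in [0, M_ij] and exceeds b_i − x^⊤ζ_i^j by at most M_ij, which are the two bounds the indicator constraints in (35) rely on, given nonnegative samples:

```python
import itertools

def big_m(b_i, zeta_ij):
    # M_ij = max{ b_i, |b_i - e^T zeta_i^j| } as used in formulation (35)
    return max(b_i, abs(b_i - sum(zeta_ij)))

b_i = 10.0
samples = [[1.0, 4.0, 7.0], [2.0, 2.0, 2.0], [9.0, 9.0, 9.0]]  # nonnegative weights
for zeta in samples:
    M = big_m(b_i, zeta)
    for x in itertools.product((0, 1), repeat=3):
        slack = b_i - sum(xl * zl for xl, zl in zip(x, zeta))
        s = max(slack, 0.0)
        assert 0.0 <= s <= M   # s <= M*y_ij is satisfiable with y_ij = 1
        assert s - slack <= M  # s <= slack + M*(1 - y_ij) holds with y_ij = 0
```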

We use the commercial solver Gurobi (version 7.5, with default settings) to solve the instances of formulations (35) and (36). We set the time limit of solving each instance to be 3600 seconds. The results are displayed in Table 2. We use UB, LB, GAP, Opt. Val., and Time to denote the best upper bound, the best lower bound, the optimality gap, the optimal objective value, and the total running time, respectively. All instances were executed on a MacBook Pro with a 2.80 GHz processor and 16GB RAM.

From Table 2, we observe that the overall running time of DRMKP formulation (36) significantly outperforms that of (35): almost all of the instances of formulation (36) can be solved within 10 minutes, while the majority of the instances of formulation (35) reach the time limit. The main reasons are two-fold: (i) formulation (35) involves O(N × I + n) binary variables and O(N × I) continuous variables, while formulation (36) only involves O(n) binary variables and O(N) continuous variables; and (ii) formulation (35) contains big-M coefficients, while formulation (36) is big-M free. We also observe that, as the risk parameter ε increases or the Wasserstein radius δ decreases, both formulations take longer to solve, but formulation (36) still significantly outperforms formulation (35). These results demonstrate the effectiveness of our proposed approaches.

5 Conclusion

In this paper, we studied a distributionally robust chance constrained problem (DRCCP) with a Wasserstein ambiguity set. We showed that a DRCCP can be formulated as a conditional value-at-risk constrained optimization problem, and thus admits tight inner and outer approximations. When the metric space of random variables is a normed vector space, we showed that a DRCCP is mixed integer representable with big-M coefficients and additional binary variables, i.e., a DRCCP can be formulated as a mixed integer conic program. We also compared various inner and outer approximations and proved their corresponding inclusion relations. We further proposed a big-M free formulation for a binary DRCCP. The numerical study demonstrated that the developed big-M free formulation can significantly outperform the big-M one.

Acknowledgments The author would like to thank Professor Shabbir Ahmed (Georgia Tech) for his helpful comments on an earlier version of the paper.


Table 2: Performance comparison of formulation (35) and formulation (36)

                           Formulation (35)               Formulation (36)
ε     δ    Inst.  n   I    UB    LB    Time    GAP        Opt. Val.  Time
0.05  0.1  1      20  10   93    86    3600.0  7.5%       89         49.3
           2      20  10   97    90    3600.0  7.2%       95         30.6
           3      20  10   95    84    3600.0  11.6%      90         387.0
           4      20  10   84    74    3600.0  11.9%      78         275.7
           5      20  10   87    81    3600.0  6.9%       82         140.4
           6      20  10   97    85    3600.0  12.4%      88         972.5
           7      20  10   89    75    3600.0  15.7%      84         169.6
           8      20  10   100   88    3600.0  12.0%      96         80.5
           9      20  10   96    78    3600.0  18.8%      92         59.3
           10     20  10   93    93    3542.7  0.0%       93         18.2
           Average:              3594.3  10.4%                       218.3
0.1   0.1  1      20  10   100   NA    3600.0  NA         92         172.9
           2      20  10   106   NA    3600.0  NA         99         164.0
           3      20  10   105   87    3600.0  17.1%      93         569.1
           4      20  10   92    67    3600.0  27.2%      82         600.5
           5      20  10   95    NA    3600.0  NA         86         332.0
           6      20  10   109   NA    3600.0  NA         94         1852.4
           7      20  10   96    NA    3600.0  NA         88         279.8
           8      20  10   108   82    3600.0  24.1%      100        133.2
           9      20  10   102   NA    3600.0  NA         94         389.3
           10     20  10   103   96    3600.0  6.8%       96         149.7
           Average:              3600.0  18.8%                       464.3
0.05  0.2  1      20  10   87    87    665.8   0.0%       87         8.5
           2      20  10   88    88    2473.2  0.0%       88         19.3
           3      20  10   86    86    1391.3  0.0%       86         70.4
           4      20  10   74    74    2881.7  0.0%       74         102.5
           5      20  10   78    78    1553.5  0.0%       78         26.9
           6      20  10   86    86    2776.2  0.0%       86         442.7
           7      20  10   83    83    1413.9  0.0%       83         17.1
           8      20  10   92    92    297.7   0.0%       92         21.0
           9      20  10   90    90    148.5   0.0%       90         14.6
           10     20  10   90    90    1074.2  0.0%       90         8.9
           Average:              1467.6  0.0%                        73.2
0.1   0.2  1      20  10   96    85    3600.0  11.5%      92         34.3
           2      20  10   103   88    3600.0  14.6%      99         16.5
           3      20  10   98    93    3600.0  5.1%       93         175.4
           4      20  10   86    82    3600.0  4.7%       82         243.5
           5      20  10   90    NA    3600.0  NA         86         84.7
           6      20  10   101   81    3600.0  19.8%      94         524.6
           7      20  10   90    88    3600.0  2.2%       88         93.1
           8      20  10   103   NA    3600.0  NA         100        53.4
           9      20  10   97    94    3600.0  3.1%       94         75.5
           10     20  10   99    89    3600.0  10.1%      96         14.1
           Average:              3600.0  8.9%                        131.5

* NA indicates that no feasible solution was found within the time limit.


References

[1] S. Ahmed, J. Luedtke, Y. Song, and W. Xie. Nonanticipative duality, relaxations, and formulations for chance-constrained stochastic programs. Mathematical Programming, 162(1-2):51–81, 2017.
[2] A. Atamtürk and V. Narayanan. Polymatroids and mean-risk minimization in discrete optimization. Operations Research Letters, 36(5):618–622, 2008.
[3] J. Blanchet, L. Chen, and X. Y. Zhou. Distributionally robust mean-variance portfolio selection with Wasserstein distances. arXiv preprint arXiv:1802.04885, 2018.
[4] J. Blanchet, Y. Kang, and K. Murthy. Robust Wasserstein profile inference and applications to machine learning. arXiv preprint arXiv:1610.05627, 2016.
[5] E. Boissard et al. Simple bounds for the convergence of empirical and occupation measures in 1-Wasserstein distance. Electronic Journal of Probability, 16:2296–2333, 2011.
[6] G. C. Calafiore and M. C. Campi. The scenario approach to robust control design. IEEE Transactions on Automatic Control, 51(5):742–753, 2006.
[7] G. C. Calafiore and L. El Ghaoui. On distributionally robust chance-constrained linear programs. Journal of Optimization Theory and Applications, 130(1):1–22, 2006.
[8] M. C. Campi, S. Garatti, and M. Prandini. The scenario approach for systems and control design. Annual Reviews in Control, 33(2):149–157, 2009.
[9] W. Chen, M. Sim, J. Sun, and C.-P. Teo. From CVaR to uncertainty set: Implications in joint chance-constrained optimization. Operations Research, 58(2):470–485, 2010.
[10] J. Cheng, E. Delage, and A. Lisser. Distributionally robust stochastic knapsack problem. SIAM Journal on Optimization, 24(3):1485–1506, 2014.
[11] C. Duan, W. Fang, L. Jiang, L. Yao, and J. Liu. Distributionally robust chance-constrained approximate AC-OPF with Wasserstein metric. IEEE Transactions on Power Systems, 2018.
[12] J. Edmonds. Submodular functions, matroids, and certain polyhedra. Edited by G. Goos, J. Hartmanis, and J. van Leeuwen, 11, 1970.
[13] L. El Ghaoui, M. Oks, and F. Oustry. Worst-case value-at-risk and robust portfolio optimization: A conic programming approach. Operations Research, 51(4):543–556, 2003.
[14] N. Fournier and A. Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162(3-4):707–738, 2015.
[15] R. Gao, X. Chen, and A. J. Kleywegt. Wasserstein distributional robustness and regularization in statistical learning. arXiv preprint arXiv:1712.06050, 2017.
[16] R. Gao and A. J. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. arXiv preprint arXiv:1604.02199, 2016.
[17] G. A. Hanasusanto and D. Kuhn. Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. arXiv preprint arXiv:1609.07505, 2016.
[18] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann. A distributionally robust perspective on uncertainty quantification and chance constrained programming. Mathematical Programming, 151:35–62, 2015.
[19] G. A. Hanasusanto, V. Roitch, D. Kuhn, and W. Wiesemann. Ambiguous joint chance constraints under mean and dispersion information. Operations Research, 65(3):751–767, 2017.
[20] A. R. Hota, A. Cherukuri, and J. Lygeros. Data-driven chance constrained optimization under Wasserstein ambiguity sets. arXiv preprint arXiv:1805.06729, 2018.
[21] R. Jiang and Y. Guan. Data-driven chance constrained stochastic program. Mathematical Programming, 158:291–327, 2016.
[22] R. Kiesel, R. Rühlicke, G. Stahl, and J. Zheng. The Wasserstein metric and robustness in risk management. Risks, 4(3):32, 2016.
[23] J. Lee and M. Raginsky. Minimax statistical learning and domain adaptation with Wasserstein distances. arXiv preprint arXiv:1705.07815, 2017.
[24] B. Li, R. Jiang, and J. L. Mathieu. Ambiguous risk constraints with moment and unimodality information. Mathematical Programming, Nov 2017.
[25] J. Luedtke and S. Ahmed. A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization, 19(2):674–699, 2008.
[26] F. Luo and S. Mehrotra. Decomposition algorithm for distributionally robust optimization using Wasserstein metric. arXiv preprint arXiv:1704.03920, 2017.
[27] P. Mohajerin Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: performance guarantees and tractable reformulations. Mathematical Programming, Jul 2017.
[28] A. Nemirovski and A. Shapiro. Convex approximations of chance constrained programs. SIAM Journal on Optimization, 17(4):969–996, 2006.
[29] A. Nemirovski and A. Shapiro. Scenario approximations of chance constraints. In Probabilistic and Randomized Methods for Design under Uncertainty, pages 3–47. Springer, 2006.
[30] F. Qiu, S. Ahmed, S. S. Dey, and L. A. Wolsey. Covering linear programming with violations. INFORMS Journal on Computing, 26(3):531–546, 2014.
[31] R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[32] S. Shafieezadeh-Abadeh, P. M. Esfahani, and D. Kuhn. Distributionally robust logistic regression. In Advances in Neural Information Processing Systems, pages 1576–1584, 2015.
[33] A. Shapiro and A. Kleywegt. Minimax analysis of stochastic problems. Optimization Methods and Software, 17(3):523–542, 2002.
[34] Y. Song, J. R. Luedtke, and S. Küçükyavuz. Chance-constrained binary packing problems. INFORMS Journal on Computing, 26(4):735–747, 2014.
[35] W. Xie and S. Ahmed. Bicriteria approximation of chance constrained covering problems. Available at Optimization Online, 2018.
[36] W. Xie and S. Ahmed. Distributionally robust chance constrained optimal power flow with renewables: A conic reformulation. IEEE Transactions on Power Systems, 33(2):1860–1867, 2018.
[37] W. Xie and S. Ahmed. On deterministic reformulations of distributionally robust joint chance constrained optimization problems. SIAM Journal on Optimization, 28(2):1151–1182, 2018.
[38] W. Xie, S. Ahmed, and R. Jiang. Optimized Bonferroni approximations of distributionally robust joint chance constraints. Available at Optimization Online, 2017.
[39] W. Yang and H. Xu. Distributionally robust chance constraints for non-linear uncertainties. Mathematical Programming, 155:231–265, 2016.
[40] J. Yu and S. Ahmed. Polyhedral results for a class of cardinality constrained submodular minimization problems. Discrete Optimization, 24:87–102, 2017.
[41] C. Zhao and Y. Guan. Data-driven risk-averse two-stage stochastic program with ζ-structure probability metrics. Available at http://www.optimization-online.org/DB_FILE/2015/07/5014.pdf, 2015.
[42] J. Zou, S. Ahmed, and X. A. Sun. Stochastic dual dynamic integer programming. Mathematical Programming, pages 1–42, 2017.
[43] S. Zymler, D. Kuhn, and B. Rustem. Distributionally robust joint chance constraints with second-order moment information. Mathematical Programming, 137:167–198, 2013.
