Computational aspects of robust optimized certainty equivalents
Daniel Bartl (a,1), Samuel Drapeau (b,2,∗), Ludovic Tangpi (c,3,†)
arXiv:1706.10186v1 [q-fin.RM] 30 Jun 2017
July 3, 2017

ABSTRACT
Accounting for model uncertainty in risk management leads to infinite dimensional optimization problems which are both analytically and numerically intractable. In this article we study when this difficulty can be overcome for the so-called optimized certainty equivalent risk measure (OCE), which includes the average value-at-risk as a special case. First we focus on the case where the set of possible distributions of a financial loss is given by the neighborhood of a given baseline distribution in the Wasserstein distance, or more generally, an optimal-transport distance. Here it turns out that the computation of the robust OCE reduces to a finite dimensional problem, which in some cases can even be solved explicitly. Further, we derive convex dual representations of the robust OCE for measurable claims without any assumptions on the set of distributions and finally give conditions on the latter set under which the robust average value-at-risk is a tail risk measure.
AUTHORS INFO
a Department of Mathematics, University of Konstanz, Universitätsstraße 10, 78464 Konstanz, Germany.
b School of Mathematical Sciences and Shanghai Advanced Institute for Finance (CAFR/CMAR), 211 West Huaihai Road, Shanghai, P.R. 200030 China.
c Faculty of Mathematics, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Wien, Austria.
1 [email protected]
2 [email protected]
3 [email protected]
∗ Financial support from the National Science Foundation of China (Research Fund for International Young Scientists) under Grant 11550110184 is gratefully acknowledged.
† Financial support from Vienna Science and Technology Fund (WWTF) under Grant MA 14-008 is gratefully acknowledged.
PAPER INFO
MSC 2010: 91G80, 90B50, 60E10, 91B30
KEYWORDS: Optimized certainty equivalent; optimal transport; Wasserstein distance; distribution uncertainty; convex duality; average value-at-risk.
1. Introduction
In this article we study properties of the optimized certainty equivalent (OCE for short) of Ben-Tal and Teboulle [6, 7] under model uncertainty. In the context of risk assessment, the rationale behind the definition of the OCE is as follows. Assume that a financial agent faces a future uncertain loss profile with distribution µ, and that she wants to assess its risk. In her assessment, given a loss function l : R → (−∞, +∞], she computes the expectation ∫ l(x) µ(dx), representing the present average cost of her losses. She can however reduce her overall future losses by allocating some liquidity m, resulting in the present value ∫ l(x − m) µ(dx) + m. Minimizing over all possible allocations defines the optimal cost, or OCE, of µ with respect to the loss function l:

$$\mathrm{OCE}(\mu, l) := \inf_{m\in\mathbb{R}} \Big( \int l(x-m)\,\mu(dx) + m \Big). \tag{1.1}$$
Now, if the distribution µ of the future loss is not perfectly known, but is an element of some set D of possible distributions, the worst case overall cost of an allocation m is given by sup_{µ∈D} ∫ l(x − m) µ(dx) + m. Hence, the natural definition of the robust optimized certainty equivalent is

$$\mathrm{OCE}(D, l) := \inf_{m\in\mathbb{R}} \sup_{\mu\in D} \Big( \int l(x-m)\,\mu(dx) + m \Big).$$

✩ We thank Mathias Pohl for fruitful discussions.
The classical OCE satisfies sound economic properties discussed in [6]. In particular, it is a convex monetary risk measure in the sense of Artzner et al. [1] and Föllmer and Schied [20] which is additionally law-invariant, see Frittelli and Rosazza Gianin [22] for the definition and consequences. Furthermore, depending on the specification of the loss function l, it entails classical risk measures such as the entropic risk measure, the average value-at-risk, see Artzner et al. [1], the monotone mean variance of Maccheroni et al. [27] and, as a scaling limit, the shortfall risk measure of Föllmer and Schied [20]. Stated as a classical unconstrained one dimensional optimization problem, the OCE is a smooth quantification instrument, see Cheridito and Li [12] and Cherny and Kupper [14]. The computation of the risk as well as the risk contributions can be explicitly stated in terms of first order conditions and efficiently implemented using Fourier transform methods, see Drapeau et al. [17].

However, when facing model uncertainty, these properties become a priori challenging due to an infinite dimensional optimization problem. In addition, it is not clear by how much the resulting robust quantification of risk deviates from its non-robust counterpart, a crucial question in practice. The goal of this paper is to study the robust OCE and provide several ways to reduce the complexity stemming from the robustness, in order to get explicit formulas allowing for a quantification of the risk under model misspecification.

Our first main result focuses on the case where D = Bδ(µ0) is the δ-neighborhood of a baseline distribution µ0 with respect to an optimal transport-like distance with cost function c(x, y), such as the Wasserstein distance (see Section 2.1 for precise definitions), and states that

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \inf_{\lambda\ge 0} \big( \mathrm{OCE}(\mu_0, l^{\lambda c}) + \lambda\delta \big), \quad\text{where } l^{\lambda c}(x) := \sup_{y\in\mathbb{R}} \big( l(y) - \lambda c(x,y) \big),$$

see Theorem 2.1. This formula shows that the infinite dimensional optimization problem of computing OCE(Bδ(µ0), l) simplifies to a finite dimensional problem with modified loss function l^{λc}. In particular, when Bδ(µ0) is the ball with respect to the first order Wasserstein distance centered at µ0 and with radius δ, and l(x) = x⁺/α, that is, the OCE is the average value-at-risk at level α, then one has AVaRα(Bδ(µ0)) = AVaRα(µ0) + δ/α. That is, the robust AVaR is the same as the AVaR plus an “uncertainty premium”. The above representation formulas rely on the general duality

$$\sup_{\mu\in B_\delta(\mu_0)} \int f\,d\mu = \inf_{\lambda\ge 0} \Big( \lambda\delta + \int f^{\lambda c}\,d\mu_0 \Big),$$
which holds true for every Borel function f bounded from below, see Proposition 3.1. We stress that this formula is interesting on its own, as it can be used for several distributionally robust optimization problems, see for instance Gabrel et al. [23] for an overview and motivations. Moreover, it was also derived by Esfahani and Kuhn [18] and Blanchet and Murthy [10] with different assumptions and techniques.

We further investigate alternative representations of the robust OCE when D is not necessarily of the form Bδ(µ0). In particular, we study the representation of the robust AVaR (corresponding to l(x) = x⁺/α) as a tail risk measure. This representation, first proved by Rockafellar and Uryasev [32] in the non-robust case, has important applications in optimization problems and does not carry over to the robust case unless stronger structural and topological assumptions are put on the set D, see Proposition 2.8. When defined on random variables, convex dual representations of the OCE are particularly relevant, see for instance Cherny and Kupper [14] for applications to optimization problems and Backhoff and Tangpi [2] for applications to dynamic representations. In the present article, we derive a dual representation of the robust OCE on the set of bounded measurable random variables, without any topological assumption on the sample space.

Financial modeling under model ambiguity, known as robust finance, is currently a topic of intensive research. We refer among others to [5, 13, 16, 25] for superhedging problems and [3, 9, 28–30] for robust utility maximization. Distributionally robust problems are also studied in statistics, economics and operations research, see for instance [24, 26, 31].

The paper is organized as follows: The next section summarizes our main findings, namely a finite dimensional representation of the OCE when the ambiguity set is an optimal transport ball, conditions under which the robust AVaR is a tail risk measure, and a convex dual representation when the OCE is defined on random variables. We also give several edifying examples. In the final section, we provide detailed proofs.
2. Main results
For d ∈ N, we denote by M1(R^d) the set of σ-additive probabilities on (the Borel σ-field of) R^d, and put M1 := M1(R). For µ ∈ M1(R^d) and a (Borel-)measurable function f : R^d → (−∞, +∞], the integral of f with respect to µ is written as ∫ f dµ. Throughout, the convention ∫ f dµ = +∞ if ∫ f⁺ dµ = +∞ is used. For a set of probabilities D ⊂ M1 and a measurable function l : R → (−∞, +∞], define the robust optimized certainty equivalent

$$\mathrm{OCE}(D, l) := \inf_{m\in\mathbb{R}} \sup_{\mu\in D} \Big( \int l(x-m)\,\mu(dx) + m \Big).$$

Whenever D = {µ} consists of one measure only, we write OCE(µ, l) = OCE({µ}, l). The OCE is usually defined for random variables, see also Section 2.3, and is law-invariant. Therefore it can just as well be defined for distributions. In this article we follow the latter approach since it is more natural when investigating distribution uncertainty.
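As a numerical illustration of the definition in the case D = {µ}, the following minimal sketch evaluates OCE(µ, l) for an empirical distribution by a golden-section search over m. The helper names (`golden_min`, `oce`), the search bracket [−20, 20] and the sample are illustrative assumptions and not part of the paper; the search presumes that l is convex, so that m ↦ ∫ l(x − m) µ(dx) + m is convex. For the exponential loss l(x) = (exp(ax) − 1)/a the result can be compared with the closed form (1/a) log ∫ exp(ax) µ(dx).

```python
import math

def golden_min(g, lo, hi, tol=1e-9):
    """Minimise a convex (unimodal) function g on [lo, hi] by golden-section search."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    gc, gd = g(c), g(d)
    while b - a > tol:
        if gc < gd:            # a minimiser lies in [a, d]
            b, d, gd = d, c, gc
            c = b - invphi * (b - a)
            gc = g(c)
        else:                  # a minimiser lies in [c, b]
            a, c, gc = c, d, gd
            d = a + invphi * (b - a)
            gd = g(d)
    m = 0.5 * (a + b)
    return m, g(m)

def oce(sample, loss, lo=-20.0, hi=20.0):
    """OCE(mu, l) = inf_m mean(l(x - m)) + m for the empirical measure of `sample`."""
    n = len(sample)
    objective = lambda m: sum(loss(x - m) for x in sample) / n + m
    return golden_min(objective, lo, hi)[1]

# Hypothetical data and risk-aversion parameter, chosen only for illustration.
sample = [-1.0, -0.2, 0.3, 0.9, 2.5]
a = 0.7
exp_loss = lambda x: (math.exp(a * x) - 1.0) / a

numeric = oce(sample, exp_loss)
closed_form = math.log(sum(math.exp(a * x) for x in sample) / len(sample)) / a
print(numeric, closed_form)   # the two values should agree up to the search tolerance
```

The golden-section search is only one possible choice; any one dimensional convex solver would do, since the non-robust OCE is an unconstrained problem in the single variable m.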
2.1. Robustness given by a neighborhood
Our first main result is that when D is the “neighborhood” of a given baseline distribution µ0, the infinite dimensional problem of computing OCE(D, l) becomes finite dimensional. More precisely, fix a lower semicontinuous function c : R × R → [0, +∞] satisfying inf_{y∈R} c(x, y) = 0 for every x ∈ R, and some δ > 0. Define the distance¹ d_c by

$$d_c(\mu,\nu) := \inf\Big\{ \int_{\mathbb{R}^2} c(x,y)\,\pi(dx,dy) : \pi\in M_1(\mathbb{R}^2) \text{ such that } \pi(\cdot\times\mathbb{R})=\mu \text{ and } \pi(\mathbb{R}\times\cdot)=\nu \Big\}$$

for µ, ν ∈ M1 and denote by

$$B_\delta(\mu_0) := \{\mu\in M_1 : d_c(\mu_0,\mu)\le\delta\}$$

the ball of radius δ around µ0. Finally, for any function f : R → (−∞, +∞] and λ ≥ 0 denote by

$$f^{\lambda c}(x) := \sup_{y\in\mathbb{R}} \big( f(y) - \lambda c(x,y) \big)$$

the λc-conjugate of f (with the convention +∞ − ∞ := −∞).

¹ In general d_c is not a metric. Nonetheless, this transport-like distance includes a variety of examples, such as the total variation for c(x, y) = 1_{x≠y} or the famous Wasserstein distance of order p for c(x, y) = |x − y|^p, see for instance [33, Section 6].

Throughout this section we make the standing assumptions that l : R → (−∞, +∞] is bounded from below and that c only depends on the difference, that is, c(x, y) = c̃(y − x) for some function c̃. Both assumptions are satisfied in most of the interesting cases and the latter one is only required to ease the formulas; the general case is given in Theorem 3.2. We stress that l is not assumed to satisfy the typical assumptions of convexity and monotonicity.

Theorem 2.1. Assume that lim inf_{|x|→+∞} c̃(x) = +∞ and inf_{x∈R} c̃(x) = 0. Then

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \inf_{\lambda\ge 0} \big( \mathrm{OCE}(\mu_0, l^{\lambda c}) + \lambda\delta \big).$$
Remark 2.2. Under the same assumptions as in the theorem, one can show a similar formula for the robust expected shortfall; namely

$$\mathrm{ES}(B_\delta(\mu_0), l, \alpha) := \inf\Big\{ m\in\mathbb{R} : \sup_{\mu\in B_\delta(\mu_0)} \int l(x-m)\,\mu(dx) \le \alpha \Big\} = \inf_{\lambda\ge 0} \mathrm{ES}\big(\{\mu_0\}, l^{\lambda c}, \alpha-\lambda\delta\big)$$

for all α such that inf_x l(x) < α.
Remark 2.3.
• There are other popular distances on probability measures, such as the Lévy metric, that one could consider instead of d_c. However, in the relevant cases when l is a loss function (meaning that l is monotone, convex and not constant), considering the Lévy metric does not make much sense. Indeed, given any baseline distribution µ0, one has OCE(Bδ(µ0), l) = +∞ if Bδ(µ0) is a neighborhood of µ0 in the Lévy metric.
• Given that l is a loss function, one has OCE(Bδ(µ0), l) = +∞ if lim inf_{x→+∞} c̃(x) < +∞.
• Let c(x, y) = |x − y|^p so that d_c^{1/p} is the Wasserstein distance of order p. Then, even if µ0 is a Dirac measure, Bδ(µ0) contains heavy / fat-tailed distributions.

One subject of concern when choosing a risk measure is its computational cost. Theorem 2.1 shows in particular that if the ambiguity set is Bδ(µ0), then the computation of the robust OCE is not much harder than that of the classical OCE.

Example 2.4 (Average value-at-risk). Let l(x) = x⁺/α for some α ∈ (0, 1) so that OCE(·, l) becomes the robust average value-at-risk AVaRα at level α. Further let c(x, y) = |x − y|^p so that d_c^{1/p} becomes the Wasserstein distance of order p. Then, if p = 1 or p = 2, one has

$$\mathrm{AVaR}_\alpha(B_\delta(\mu_0)) = \mathrm{AVaR}_\alpha(\mu_0) + \frac{\delta^{1/p}}{\alpha}.$$

♦
This example gives a mathematical justification to an intuitively natural fact known as post-valuation adjustment: When computing the risk of a loss µ0 , it is advisable to add a margin to hedge a possible model misspecification or a computational error, see for instance [15, Chapter 5].
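The identity of Example 2.4 for p = 1 can be illustrated on an empirical baseline distribution. The sketch below is an illustration under assumptions, not the authors' implementation: the function names, the sample and the parameters are hypothetical, α·n is assumed to be an integer, and the order-1 Wasserstein distance between two empirical measures with the same number of atoms is computed through the sorted (monotone) coupling. It evaluates AVaRα(µ0) from (1.1) with l(x) = x⁺/α, adds the premium δ/α, and exhibits one measure in the ball that attains this value by shifting the α·n largest atoms upwards by δ/α.

```python
def avar(sample, alpha):
    """AVaR_alpha of the empirical measure: min over m of mean((x - m)^+) / alpha + m.
    The objective is convex and piecewise linear with kinks at the atoms,
    so it suffices to search over the sample points."""
    n = len(sample)
    return min(sum(max(x - m, 0.0) for x in sample) / (alpha * n) + m
               for m in sample)

def wasserstein1(xs, ys):
    """Order-1 Wasserstein distance of two empirical measures with equally many atoms."""
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

# Hypothetical baseline sample, level and radius.
baseline = [-1.0, 0.0, 0.5, 2.0, 3.0, 0.2, -0.4, 1.1, 0.7, 1.6]
alpha, delta = 0.2, 0.05
k = round(alpha * len(baseline))          # number of atoms in the upper tail

# Robust value according to the formula of Example 2.4 with p = 1.
robust_value = avar(baseline, alpha) + delta / alpha

# A measure in the ball attaining the robust value: shift the k largest atoms by delta / alpha.
ordered = sorted(baseline)
witness = ordered[:-k] + [x + delta / alpha for x in ordered[-k:]]

print(wasserstein1(baseline, witness), avar(witness, alpha), robust_value)
```

The witness measure only shows that the value AVaRα(µ0) + δ/α is reached inside the ball; the matching upper bound is the content of Theorem 2.1.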
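Example 2.5 (Monotone mean-variance). Let l(x) = (((1 + x)⁺)² − 1)/2 so that OCE(·, l) becomes the robust monotone mean-variance risk measure, see [27]. For the cost function c(x, y) = (x − y)²/2, one has

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \inf_{\lambda>1} \Big( \lambda\delta + \frac{1}{2(\lambda-1)} + \mathrm{OCE}\Big(\mu_0, \frac{\lambda}{\lambda-1}\,l\Big) \Big).$$

♦

Example 2.6 (Value-at-risk). Fix some p > 0 and let c(x, y) = |x − y|^p. Then, for the robust value-at-risk at level α ∈ (0, 1), one has

$$\mathrm{VaR}_\alpha(B_\delta(\mu_0)) := \inf\Big\{ m\in\mathbb{R} : \sup_{\mu\in B_\delta(\mu_0)} \mu((m,+\infty)) \le \alpha \Big\} = \inf_{\lambda\ge 0}\, \inf\big\{ m\in\mathbb{R} : \mu_0((m,+\infty)) + e(m,\lambda) \le \alpha - \delta\lambda \big\},$$

where

$$e(m,\lambda) := \int_{(m-\lambda^{-1/p},\,m]} \big( 1 - \lambda|x-m|^p \big)\,\mu_0(dx).$$

♦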
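The finite dimensional formula of Example 2.6 lends itself to a direct numerical evaluation. The following is a rough sketch under stated assumptions rather than a reference implementation: µ0 is an empirical measure, the infimum over λ is approximated on a grid in (0, α/δ] (larger λ make the right-hand side α − δλ negative), and the inner infimum over m is found by bisection, using that m ↦ µ0((m, +∞)) + e(m, λ) is nonincreasing and continuous for λ > 0. All function names and parameters are hypothetical.

```python
def tail_plus_e(sample, m, lam, p):
    """mu0((m, inf)) + e(m, lam) for the empirical measure mu0 of `sample`."""
    width = lam ** (-1.0 / p)
    total = 0.0
    for x in sample:
        if x > m:
            total += 1.0                              # contribution to mu0((m, inf))
        elif x > m - width:
            total += 1.0 - lam * abs(x - m) ** p      # contribution to e(m, lam)
    return total / len(sample)

def robust_var(sample, alpha, delta, p=1.0, grid=2000, iters=60):
    """Approximate VaR_alpha(B_delta(mu0)) for the cost c(x, y) = |x - y|^p."""
    best = float("inf")
    for j in range(1, grid + 1):
        lam = (alpha / delta) * j / grid
        level = alpha - delta * lam
        if level < 0.0:
            continue
        lo = min(sample) - 1.0                        # left-hand side equals 1 > level here
        hi = max(sample) + lam ** (-1.0 / p) + 1.0    # left-hand side equals 0 <= level here
        for _ in range(iters):                        # bisection for inf{m : lhs(m) <= level}
            mid = 0.5 * (lo + hi)
            if tail_plus_e(sample, mid, lam, p) <= level:
                hi = mid
            else:
                lo = mid
        best = min(best, hi)
    return best

# Sanity check with a Dirac baseline at 0 and p = 1, where the formula reduces to delta / alpha.
print(robust_var([0.0], alpha=0.1, delta=0.05))
```

For the Dirac baseline the inner infimum can be solved by hand: it equals (1 − α)/λ + δ for λ ≤ α/δ, which is minimal at λ = α/δ, consistent with the Markov-type bound sup µ((m, +∞)) = δ/m over the ball.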
Remark 2.7. In practice, the estimation of the tails is more likely to be subject to uncertainty. So one can ask how to compute the robust OCE when only the probabilities of sets A ⊂ (−k, k) for some fixed k > 0 are well known. This can be incorporated in our results as follows: Take any function c satisfying the assumptions of Theorem 2.1, respectively Theorem 3.2, and define

$$\hat c(x,y) := \begin{cases} +\infty, & \text{if } x\ne y \text{ and } (|x|<k \text{ or } |y|<k),\\ c(x,y), & \text{else.} \end{cases}$$

Then every measure µ ∈ B̂δ(µ0) (the ball of radius δ with respect to the cost function ĉ) satisfies µ(A) = µ0(A) for every Borel set A ⊂ (−k, k). Moreover, one can check that ĉ again satisfies the assumptions of Theorem 3.2, so that

$$\mathrm{OCE}(\hat B_\delta(\mu_0), l) = \inf_{\lambda\ge 0}\inf_{m\in\mathbb{R}} \Big( \delta\lambda + \int l(\cdot-m)^{\lambda\hat c}(x)\,\mu_0(dx) + m \Big).$$
2.2. Representation as tail risk measure
In the non-robust setting, that is, when D is a singleton, it is well known that the average value-at-risk, see Example 2.4, is a risk measure capturing the “tail risk”. In particular, it satisfies the representation

$$\mathrm{AVaR}_\alpha(\mu) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(\mu)\,du, \tag{2.1}$$

where VaR is the value-at-risk, see Example 2.6. That is, AVaR is roughly speaking the average over the VaR below the α-quantile; an important property for instance in optimal portfolio problems, see Rockafellar and Uryasev [32]. However, we will see in Section 2.3 that AVaRα(D) = sup_{µ∈D} AVaRα(µ), from which it easily follows that (2.1) in general no longer holds true when D consists of more than one element.

A robust version of formula (2.1) can be proven under stronger assumptions on the set D. In fact, if D ≠ ∅ satisfies

$$D \text{ is tight and for all } \mu,\tilde\mu\in D \text{ there is } \nu\in D \text{ such that } \mu(-\infty,t],\ \tilde\mu(-\infty,t] \ge \nu(-\infty,t] \text{ for all } t, \tag{DIR}$$

it also follows that the robust OCE has the same properties as the non-robust one, see Corollary 3.4. Here tight means that for every ε > 0 there is some compact set K ⊂ R satisfying sup_{µ∈D} µ(K^c) ≤ ε.
Proposition 2.8. Assume that (DIR) holds true. Then

$$\mathrm{AVaR}_\alpha(D) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(D)\,du$$

for every α ∈ (0, 1).

Example 2.9. Let (Ω, F, P) be a probability space carrying a Brownian motion (W_t)_{t∈[0,T]}, where T ∈ (0, ∞), equipped with the completion of the natural filtration of W. Let σ, $\underline{b}$, and $\bar{b}$ be three real numbers such that σ > 0. Then, for every strictly increasing function f : R → R and S0 > 0, the set

$$D := \big\{ P\circ f(S_T)^{-1} \text{ with } dS_t = S_t(b\,dt + \sigma\,dW_t) \text{ and } b\in[\underline{b},\bar{b}] \big\}$$

satisfies (DIR). ♦
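Formula (2.1) and its robust version in Proposition 2.8 can be checked on a small example. The sketch below is only an illustration under assumptions: D is a hypothetical finite family of empirical measures obtained by shifting one sample by different constants (so that the cumulative distribution functions are ordered and (DIR) holds), α·n is an integer, and the robust AVaR is evaluated as sup_{µ∈D} AVaRα(µ), the identity quoted in Section 2.2. The function names are not from the paper.

```python
def avar(sample, alpha):
    """AVaR_alpha of an empirical measure via min over m of mean((x - m)^+) / alpha + m."""
    n = len(sample)
    return min(sum(max(x - m, 0.0) for x in sample) / (alpha * n) + m
               for m in sample)

def robust_var(family, u):
    """VaR_u(D) = inf{m : sup_{mu in D} mu((m, inf)) <= u} for empirical measures."""
    def worst_tail(m):
        return max(sum(1 for x in s if x > m) / len(s) for s in family)
    # The worst-case tail is a right-continuous step function of m that only
    # jumps at atoms, so the infimum is attained at one of the atoms.
    candidates = sorted(x for s in family for x in s)
    return min(m for m in candidates if worst_tail(m) <= u)

# Hypothetical family: one sample shifted by several constants.
base = [-1.0, 0.0, 0.5, 2.0, 3.0]
family = [[x + b for x in base] for b in (0.0, 0.1, 0.3)]
n, k = len(base), 2
alpha = k / n

lhs = max(avar(s, alpha) for s in family)        # robust AVaR as sup of AVaRs
# VaR_u(D) is constant for u in [j/n, (j+1)/n); integrate it exactly over [0, alpha].
rhs = sum(robust_var(family, (j + 0.5) / n) for j in range(k)) / (n * alpha)
print(lhs, rhs)
```

Both quantities coincide here because the family has a largest element in the ordering of (DIR), namely the sample with the largest shift, which plays the role of µ∗ in Lemma 3.3.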
2.3. Duality
Let (Ω, F) be a given measurable space endowed with a non-linear expectation ℰ := sup_{P∈𝒫} E_P, where 𝒫 is a convex set of probabilities. In analogy to before and with a slight abuse of notation, we write

$$\mathrm{OCE}(X, l) = \inf_{m\in\mathbb{R}} \big( \mathcal{E}(l(X-m)) + m \big) = \mathrm{OCE}(\mathcal{P}_X, l)$$

for every measurable function X : Ω → R, where 𝒫_X := {P ∘ X⁻¹ : P ∈ 𝒫}. Throughout this section we assume that l : R → R satisfies the usual assumptions

$$l \text{ is increasing, convex, bounded from below, and } l(0)=0,\ l^*(1)=0, \text{ and } l(x)>x \text{ for } |x| \text{ large enough}, \tag{STD}$$

where l∗ denotes the convex conjugate of l defined as

$$l^*(y) := \sup_{x\in\mathbb{R}} \big( xy - l(x) \big)$$

for y ∈ (−∞, +∞) and l∗(+∞) := +∞. Note that l∗(y) ≥ 0 and that if l is continuously differentiable, then l∗(1) = 0 just says that l′(0) = 1. For the remainder of this section, the term Q always refers to a probability measure on F, and dQ/dP denotes the Radon-Nikodym derivative if Q is absolutely continuous with respect to P and dQ/dP ≡ +∞ otherwise.

Theorem 2.10. For every bounded measurable function X : Ω → R one has

$$\mathrm{OCE}(X, l) = \sup_{Q} \Big( E_Q[X] - \inf_{P\in\mathcal{P}} E_P\Big[ l^*\Big(\frac{dQ}{dP}\Big) \Big] \Big).$$

In particular it holds OCE(D, l) = sup_{µ∈D} OCE(µ, l) whenever there exists some k ∈ N such that µ([−k, k]) = 1 for all µ ∈ D.

Example 2.11.
• Relative entropy: Let l(x) = (exp(αx) − 1)/α for some α > 0. Then

$$\mathrm{OCE}(X, l) = \sup_{P\in\mathcal{P}} \frac{1}{\alpha}\log E_P[\exp(\alpha X)] = \sup_{Q} \Big( E_Q[X] - \inf_{P\in\mathcal{P}} \frac{1}{\alpha} E_Q\Big[ \log\frac{dQ}{dP} \Big] \Big).$$

This is a generalization of the well-known Gibbs variational principle, and inf_{P∈𝒫} E_Q[log dQ/dP] can be seen as the Kullback-Leibler divergence between the probability measure Q and the set 𝒫.
• Monotone mean-variance: Let l(x) = (((1 + x)⁺)² − 1)/2. Then

$$\mathrm{OCE}(X, l) = \sup_{Q} \Big( E_Q[X] - \inf_{P\in\mathcal{P}} E_P\Big[ \frac{1}{2}\Big( \Big(\frac{dQ}{dP}\Big)^2 - 1 \Big) \Big] \Big).$$

The function inf_{P∈𝒫} E_P[((dQ/dP)² − 1)/2] can be seen as the Rényi divergence of order 2 between the probability measure Q and the set 𝒫.
• Average value-at-risk: Let l(x) = x⁺/α for some α ∈ (0, 1). Then

$$\mathrm{OCE}(X, l) = \sup\big\{ E_Q[X] : Q \text{ such that } dQ/dP \le 1/\alpha \text{ for some } P\in\mathcal{P} \big\}.$$

♦
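The relative entropy identity of Example 2.11 can be verified numerically on a finite sample space. The sketch below is an illustration with hypothetical data: the sample space has four points, the set of models consists of two probability vectors (convexity of the set is not needed for this particular check, since both sides are suprema over the models), and for each model P the supremum over Q is evaluated at the Gibbs measure with density proportional to exp(aX), the known maximiser of the classical variational principle.

```python
import math

def entropic(p, x, a):
    """(1/a) * log E_P[exp(a X)] on a finite sample space."""
    return math.log(sum(pi * math.exp(a * xi) for pi, xi in zip(p, x))) / a

def penalised_mean(q, p, x, a):
    """E_Q[X] - (1/a) * E_Q[log dQ/dP] for Q absolutely continuous w.r.t. P."""
    mean = sum(qi * xi for qi, xi in zip(q, x))
    kl = sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)
    return mean - kl / a

def gibbs(p, x, a):
    """Probability vector with density proportional to exp(a X) with respect to P."""
    w = [pi * math.exp(a * xi) for pi, xi in zip(p, x)]
    total = sum(w)
    return [wi / total for wi in w]

# Hypothetical data: four states, two candidate models, parameter a.
x = [-1.0, 0.0, 1.0, 2.5]
models = [[0.4, 0.3, 0.2, 0.1], [0.25, 0.25, 0.25, 0.25]]
a = 0.8

primal = max(entropic(p, x, a) for p in models)
dual = max(penalised_mean(gibbs(p, x, a), p, x, a) for p in models)
# Any other Q gives a smaller penalised mean, for instance Q = P itself.
other = max(penalised_mean(p, p, x, a) for p in models)
print(primal, dual, other)
```

Choosing Q = P makes the penalty vanish and returns the plain expectation, which by Jensen's inequality cannot exceed the entropic value; this is why `other` stays below `primal`.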
3. Proofs
3.1. Proofs for Section 2.1
Let d ∈ N and c : R^d × R^d → [0, +∞] be lower semicontinuous. Recall that for a function f : R^d → (−∞, +∞] and λ ≥ 0, its λc-transform is defined by

$$f^{\lambda c}(x) := \sup_{y\in\mathbb{R}^d} \big( f(y) - \lambda c(x,y) \big)$$
with the convention +∞ − ∞ := −∞. Then, if f and c are continuous, f^{λc} is lower semicontinuous. In general, if f is only assumed to be Borel, it follows for instance from [8, Section 7] that f^{λc} is universally measurable and in particular µ-measurable. Also note that f^{λc} is a well-known modification of the classical Fenchel-Legendre transform studied in the context of optimal transport under the name “c-transform”, see for instance [33, Section 5]. Finally define

$$d_c(\mu,\nu) := \inf\Big\{ \int_{\mathbb{R}^{2d}} c(x,y)\,\pi(dx,dy) : \pi\in M_1(\mathbb{R}^{2d}) \text{ such that } \pi(\cdot\times\mathbb{R}^d)=\mu \text{ and } \pi(\mathbb{R}^d\times\cdot)=\nu \Big\}$$

for µ, ν ∈ M1(R^d) and denote by Bδ(µ0) := {µ ∈ M1(R^d) : d_c(µ0, µ) ≤ δ} the ball of radius δ > 0 around the baseline distribution µ0 ∈ M1(R^d).

Proposition 3.1. Assume that for every r ≥ 0 there is k ≥ 0 such that c(x, y) ≥ r if |x − y| ≥ k and inf_{y∈R^d} c(x, y) = 0 for all x ∈ R^d. Then it holds

$$\sup_{\mu\in B_\delta(\mu_0)} \int f\,d\mu = \inf_{\lambda\ge 0} \Big( \lambda\delta + \int f^{\lambda c}\,d\mu_0 \Big) \tag{3.1}$$

for every Borel function f : R^d → (−∞, +∞] which is bounded from below. Moreover, the infimum over λ ≥ 0 is attained.

Proof. For any Borel function f : R^d → (−∞, +∞] that is bounded from below, define

$$\Phi(f) := \inf_{\lambda\ge 0} \Big( \lambda\delta + \int f^{\lambda c}\,d\mu_0 \Big).$$
The goal is to apply Theorem A.1.
Step 1: Monotonicity and sublinearity. The function Φ is increasing and sublinear. Indeed, for any f and g such that f ≤ g it holds f^{λc} ≤ g^{λc} by definition, so that Φ(f) ≤ Φ(g). Moreover, (tf)^{λc} = t f^{(λ/t)c}, showing that Φ(tf) = tΦ(f) for t > 0. Further,

$$(f+g)^{\lambda c} \le f^{\lambda' c} + g^{\lambda'' c} \quad\text{for } \lambda := \lambda' + \lambda'',$$

which implies that Φ(f + g) ≤ Φ(f) + Φ(g). Finally, for every m ∈ R, λ ≥ 0, and x ∈ R^d it holds

$$m^{\lambda c}(x) = m - \inf_{y\in\mathbb{R}^d} \lambda c(x,y) = m$$
so that Φ(m) = m. As Φ is increasing, it follows in particular that Φ(f) ∈ R whenever f is bounded.

Step 2: Continuity from above. Denote by C_b and U_b the sets of bounded continuous and bounded upper semicontinuous functions, respectively. We show that Φ is continuous from above on C_b. Let (f_n) be a sequence in C_b which decreases pointwise to 0. Fix ε > 0 and let m be a constant such that f_1 ≤ m. Further fix k such that µ0([−k, k]^c) ≤ ε. By assumption there is r > 0 such that c(x, y) ≥ m/ε whenever |x − y| ≥ r, hence

$$f_n^{\varepsilon c}(x) = \sup_{y\in[x-r,x+r]} \big( f_n(y) - \varepsilon c(x,y) \big) \le \sup_{y\in[x-r,x+r]} f_n(y).$$

It follows from Dini’s lemma that f_n 1_{[−k−r,k+r]} ≤ ε for n large, thus f_n^{εc} ≤ ε 1_{[−k,k]} + m 1_{[−k,k]^c} for n large. Therefore

$$\Phi(f_n) \le \varepsilon\delta + \int f_n^{\varepsilon c}(x)\,\mu_0(dx) \le \varepsilon\delta + \varepsilon\mu_0([-k,k]) + m\mu_0([-k,k]^c) \le \varepsilon\delta + \varepsilon + m\varepsilon$$

for n large and, as ε > 0 was arbitrary, Φ(f_n) ↓ 0 = Φ(0).

Step 3: Continuity from below. We show that Φ(f_n) ↑ Φ(f) whenever (f_n) is a sequence of Borel functions f_n : R^d → (−∞, +∞] which increases to f. Since Φ is increasing, it suffices to show that Φ(f) ≤ sup_n Φ(f_n). Assume that sup_n Φ(f_n) < ∞, since otherwise there is nothing to prove. For every n fix λ_n ≥ 0 such that

$$\Phi(f_n) \ge \lambda_n\delta + \int f_n^{\lambda_n c}\,d\mu_0 - \frac{1}{n}$$

and m ∈ R with m ≤ f_1 ≤ f_n so that
$$f_n^{\lambda_n c}(x) \ge \sup_{y\in\mathbb{R}^d} \big( m - \lambda_n c(x,y) \big) = m.$$

In particular, since δ > 0, one obtains that the sequence (λ_n) is bounded. Hence, possibly after passing to a subsequence, (λ_n) converges to some λ ∈ [0, +∞). Since

$$f^{\lambda c}(x) = \sup_{y\in\mathbb{R}^d} \lim_n \big( f_n(y) - \lambda_n c(x,y) \big) \le \liminf_n \sup_{y\in\mathbb{R}^d} \big( f_n(y) - \lambda_n c(x,y) \big) = \liminf_n f_n^{\lambda_n c}(x)$$

for every x and f_n^{λ_n c} ≥ m, it follows from Fatou’s lemma that

$$\Phi(f) \le \lambda\delta + \int f^{\lambda c}\,d\mu_0 \le \liminf_n \Big( \lambda_n\delta + \int f_n^{\lambda_n c}\,d\mu_0 \Big) = \sup_n \Phi(f_n) \le \Phi(f),$$

where the last inequality holds since Φ is increasing and f_n ≤ f for every n. Thus the claim follows.
Step 4: Computation of the convex conjugate. We claim that

$$\Phi^*_C(\mu) := \sup_{f\in C_b} \Big( \int f\,d\mu - \Phi(f) \Big) = \Phi^*_U(\mu) := \sup_{f\in U_b} \Big( \int f\,d\mu - \Phi(f) \Big) = \begin{cases} 0, & \text{if } \mu\in B_\delta(\mu_0),\\ +\infty, & \text{otherwise.} \end{cases}$$

First notice that 0 ≤ Φ∗_C ≤ Φ∗_U since Φ(0) = 0 and C_b is a subset of U_b. If µ ∈ Bδ(µ0), then there exists π ∈ M1(R^d × R^d) with marginals π(· × R^d) = µ0 and π(R^d × ·) = µ such that ∫ c dπ ≤ δ, see for instance [33, Theorem 5.9]. For any f ∈ U_b and λ ≥ 0, the pointwise inequality f(y) ≤ λc(x, y) + f^{λc}(x) for all x, y, integrated with respect to π, yields

$$\int_{\mathbb{R}^d} f(y)\,\mu(dy) = \int_{\mathbb{R}^d\times\mathbb{R}^d} f(y)\,\pi(dx,dy) \le \lambda\delta + \int_{\mathbb{R}^d} f^{\lambda c}(x)\,\mu_0(dx), \tag{3.2}$$

so that ∫ f dµ ≤ Φ(f). Thus 0 ≤ Φ∗_C(µ) ≤ Φ∗_U(µ) = 0.

Now assume that µ ∉ Bδ(µ0). If µ is not a probability measure, then Φ∗_C(µ) ≥ sup_{m∈R} (mµ(R^d) − m) = +∞, where it was used that Φ(m) = m for every m. So in the sequel assume that µ is a probability but not an element of Bδ(µ0). Recall the dual problem of d_c,

$$d_c(\mu_0,\mu) = \sup\Big\{ \int\psi_1\,d\mu_0 + \int\psi_2\,d\mu : \psi_1,\psi_2 \text{ bounded continuous such that } \psi_1(x)+\psi_2(y)\le c(x,y) \text{ for all } x,y \Big\},$$

proven in [33, Theorem 5.9]. Hence d_c(µ0, µ) > δ implies the existence of bounded and continuous functions ψ1, ψ2 such that

$$\psi_1(x)+\psi_2(y)\le c(x,y) \text{ for all } x,y \quad\text{and}\quad \int\psi_1\,d\mu_0 + \int\psi_2\,d\mu > \delta.$$

Since ψ2(y) − c(x, y) ≤ −ψ1(x), it follows that ψ2^c ≤ −ψ1, so that Φ(ψ2) ≤ δ − ∫ ψ1 dµ0 and therefore ∫ ψ2 dµ − Φ(ψ2) > 0. Using Φ(tψ2) = tΦ(ψ2) for every t ≥ 0, which was shown in the first step, it follows that

$$\Phi^*_U(\mu) \ge \Phi^*_C(\mu) \ge \sup_{t\ge 0} \Big( \int t\psi_2\,d\mu - \Phi(t\psi_2) \Big) = +\infty,$$

where the second inequality follows since tψ2 ∈ C_b for every t.

The representation (3.1) now follows from an application of Theorem A.1. As for the existence of an optimal λ ≥ 0, apply the argument of Step 3 to the constant sequence f_n = f.
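As a sanity check of the duality (3.1), consider d = 1, µ0 the Dirac measure at 0, c(x, y) = |x − y| and f(x) = min(|x|, 1). Since f(x) ≤ |x|, every µ in the ball satisfies ∫ f dµ ≤ ∫ |x| µ(dx) ≤ δ, with equality for the Dirac measure at δ when δ ≤ 1, so the left-hand side equals δ. The sketch below, with hypothetical grid sizes and function names, evaluates the right-hand side by brute force over grids in λ and y.

```python
def f(y):
    """Bounded test function f(y) = min(|y|, 1)."""
    return min(abs(y), 1.0)

def lambda_c_transform_at_zero(lam, ys):
    """f^{lambda c}(0) = sup_y (f(y) - lam * |0 - y|), approximated on the grid ys."""
    return max(f(y) - lam * abs(y) for y in ys)

delta = 0.3
ys = [j / 100.0 for j in range(-300, 301)]        # grid for y in [-3, 3]
lams = [i / 100.0 for i in range(0, 301)]         # grid for lambda in [0, 3]

# Right-hand side of (3.1): inf over lambda of lam * delta + integral of f^{lam c} d(mu0).
# Here mu0 is the Dirac measure at 0, so the integral is the transform evaluated at 0.
dual_value = min(lam * delta + lambda_c_transform_at_zero(lam, ys) for lam in lams)
primal_value = delta                               # attained by the Dirac measure at delta
print(dual_value, primal_value)
```

In this example the transform equals 1 − λ for λ < 1 and 0 for λ ≥ 1, so the infimum is attained at λ = 1, in line with the attainment statement of Proposition 3.1.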
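Theorem 3.2. Assume that for every r ≥ 0 there is k ≥ 0 such that c(x, y) ≥ r if |x − y| ≥ k and inf_{y∈R} c(x, y) = 0 for all x ∈ R. Then

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \inf_{\lambda\ge 0}\inf_{m\in\mathbb{R}} \Big( \delta\lambda + \int l(\cdot-m)^{\lambda c}(x)\,\mu_0(dx) + m \Big).$$

Proof. By Proposition 3.1 one has

$$\begin{aligned} \mathrm{OCE}(B_\delta(\mu_0), l) &= \inf_{m\in\mathbb{R}} \Big( \sup_{\mu\in B_\delta(\mu_0)} \int l(x-m)\,\mu(dx) + m \Big) \\ &= \inf_{m\in\mathbb{R}}\inf_{\lambda\ge 0} \Big( \delta\lambda + \int l(\cdot-m)^{\lambda c}(x)\,\mu_0(dx) + m \Big) \\ &= \inf_{\lambda\ge 0}\inf_{m\in\mathbb{R}} \Big( \delta\lambda + \int l(\cdot-m)^{\lambda c}(x)\,\mu_0(dx) + m \Big), \end{aligned}$$

which completes the proof.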
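Proof (of Theorem 2.1). If c(x, y) = c̃(y − x) for some function c̃, then l(· − m)^{λc}(x) = l^{λc}(x − m) for all m ∈ R, x ∈ R, and λ ≥ 0. Therefore Theorem 2.1 follows from Theorem 3.2.

Proof (of Remark 2.2). The same arguments as in the proof of Theorem 3.2 show that

$$\begin{aligned} \mathrm{ES}(B_\delta(\mu_0), l, \alpha) &= \inf\Big\{ m\in\mathbb{R} : \min_{\lambda\ge 0} \Big( \lambda\delta + \int l^{\lambda c}(x-m)\,\mu_0(dx) \Big) \le \alpha \Big\} \\ &= \inf_{\lambda\ge 0}\, \inf\Big\{ m\in\mathbb{R} : \lambda\delta + \int l^{\lambda c}(x-m)\,\mu_0(dx) \le \alpha \Big\} \\ &= \inf_{\lambda\ge 0} \mathrm{ES}(\mu_0, l^{\lambda c}, \alpha-\lambda\delta). \end{aligned}$$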
Proof (of Remark 2.3). We only prove that OCE(Bδ(µ0), l) = +∞ if l is a loss function and c̃ does not satisfy the assumption of Theorem 2.1; the proof for the Lévy ball works similarly. Since l is increasing, convex and not constant, there exist a, b > 0 such that l(x) ≥ ax − b for every x ∈ R. Moreover, since lim inf_{x→+∞} c̃(x) =: r < +∞, there is a sequence (x_k) in R such that x_k ≥ k and c̃(x_k) ≤ r. For simplicity assume that c̃(0) = 0 and that µ0 = δ_0 is the Dirac measure at 0, and define µ_k = (1 − δ/r)δ_0 + (δ/r)δ_{x_k}. Then

$$d_c(\mu_0,\mu_k) = \Big(1-\frac{\delta}{r}\Big)\tilde c(0-0) + \frac{\delta}{r}\,\tilde c(x_k-0) \le \delta$$
so that µ_k ∈ Bδ(µ0) for each k. However, since

$$\sup_k \int l(x-m)\,\mu_k(dx) \ge \sup_k \Big( \Big(1-\frac{\delta}{r}\Big) l(0-m) + \frac{\delta}{r}\big( a(x_k-m) - b \big) \Big) = +\infty$$

for every m ∈ R, it follows that OCE(Bδ(µ0), l) = +∞.

To show that Bδ(µ0) contains fat / heavy-tailed distributions, assume for simplicity that µ0 = δ_0. Then d_c(µ0, µ) = ∫ |x|^p µ(dx). The claim follows by existence of fat / heavy-tailed distributions with finite p-th moment.

Proof (of Example 2.4). For every λ ≥ 0 it holds

$$\sup_{y\in\mathbb{R}} \Big( \frac{1}{\alpha}y^+ - \lambda(x-y)^2 \Big) = \frac{1}{\alpha}x^+ + \frac{1}{4\lambda\alpha}$$

so that

$$\mathrm{OCE}(\mu_0, l^{\lambda c}) = \mathrm{OCE}(\mu_0, l) + \frac{1}{4\lambda\alpha}$$

for every λ ≥ 0. Thus, Theorem 2.1 yields

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \mathrm{OCE}(\mu_0, l) + \inf_{\lambda\ge 0} \Big( \delta\lambda + \frac{1}{4\lambda\alpha} \Big) = \mathrm{OCE}(\mu_0, l) + \sqrt{\frac{\delta}{\alpha}}.$$

This proves the claim for p = 2. Similarly, for p = 1, it holds l^{λc}(x) = +∞ if λ < 1/α and l^{λc}(x) = x⁺/α else. Thus, it follows by Theorem 2.1 that

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \mathrm{OCE}(\mu_0, l) + \inf_{\lambda\ge 1/\alpha} \lambda\delta = \mathrm{OCE}(\mu_0, l) + \frac{\delta}{\alpha}.$$
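Proof (of Example 2.5). It holds

$$l^{\lambda c}(x) = \begin{cases} +\infty, & \text{if } \lambda < 1,\\ \dfrac{\lambda}{\lambda-1}\,l(x) + \dfrac{1}{2(\lambda-1)}, & \text{else.} \end{cases}$$

Thus, for the optimized certainty equivalent, one has OCE(µ0, l^{λc}) = 1/(2(λ − 1)) + OCE(µ0, (λ/(λ − 1)) l), so that by Theorem 2.1 it holds

$$\mathrm{OCE}(B_\delta(\mu_0), l) = \inf_{\lambda>1} \Big( \lambda\delta + \frac{1}{2(\lambda-1)} + \mathrm{OCE}\Big(\mu_0, \frac{\lambda}{\lambda-1}\,l\Big) \Big).$$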
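The closed form of the λc-transform in the proof of Example 2.5 can be compared with a brute-force maximisation. The following sketch is illustrative only: the grid radius, the step and the test points are arbitrary assumptions, and the supremum over y is approximated on a finite grid, so agreement is expected only up to the grid resolution.

```python
def loss(x):
    """Monotone mean-variance loss l(x) = (((1 + x)^+)^2 - 1) / 2."""
    return (max(1.0 + x, 0.0) ** 2 - 1.0) / 2.0

def transform_numeric(x, lam, radius=20.0, step=1e-3):
    """sup_y (l(y) - lam * (x - y)^2 / 2), approximated on a grid centred at x."""
    n = int(radius / step)
    return max(loss(x + j * step) - lam * (j * step) ** 2 / 2.0
               for j in range(-n, n + 1))

def transform_closed_form(x, lam):
    """lam / (lam - 1) * l(x) + 1 / (2 (lam - 1)), the expression stated for lam > 1."""
    return lam / (lam - 1.0) * loss(x) + 1.0 / (2.0 * (lam - 1.0))

# Hypothetical test points (x, lambda) with lambda > 1.
for x, lam in [(0.5, 2.0), (-0.3, 1.5), (-2.0, 3.0)]:
    print(x, lam, transform_numeric(x, lam), transform_closed_form(x, lam))
```

For λ close to 1 the maximiser moves far away from x, so the grid radius would have to be enlarged accordingly; the values above are chosen to stay well inside the grid.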
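Proof (of Example 2.6). Note that the value-at-risk is a special case of the expected shortfall, corresponding to the loss function l = 1_{(0,+∞)}. Further, with the convention 0^{−1/p} = +∞, it holds

$$l^{\lambda c}(x) = l(x) + \big( 1 - \lambda|x|^p \big)\,1_{(-\lambda^{-1/p},\,0]}(x)$$

for every x. Therefore ∫ l^{λc}(x − m) µ0(dx) = µ0((m, +∞)) + e(m, λ), so that

$$\begin{aligned} \mathrm{VaR}_\alpha(B_\delta(\mu_0)) = \mathrm{ES}(B_\delta(\mu_0), l, \alpha) &= \inf_{\lambda\ge 0} \mathrm{ES}\big(\{\mu_0\}, l^{\lambda c}, \alpha-\lambda\delta\big) \\ &= \inf_{\lambda\ge 0}\, \inf\big\{ m\in\mathbb{R} : \mu_0((m,+\infty)) + e(m,\lambda) \le \alpha-\lambda\delta \big\}, \end{aligned}$$

where the second equality follows from Remark 2.2.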
3.2. Proofs for Section 2.2
The main argument for the proof of Proposition 2.8 is given in the next lemma.

Lemma 3.3. Assume that D satisfies (DIR). Then, there exists µ∗ ∈ M1 such that

$$\sup_{\mu\in D} \int f\,d\mu = \int f\,d\mu^* \tag{3.3}$$

for every increasing, continuous function f : R → R that is bounded from below. If in addition D is closed in the weak topology induced by all continuous bounded functions, then µ∗ ∈ D.

Proof. First assume that f is bounded. Since D is tight, it can be checked that $\underline{F}$ defined by $\underline{F}(t) := \inf_{\mu\in D} F_\mu(t)$, where F_µ(t) := µ(−∞, t], is a cumulative distribution function. Furthermore, f being increasing, continuous and bounded, it defines a finite Borel measure df on the real line. Hence df is regular and τ-additive, see for instance [11, Proposition 7.2.2]. Let us first show that

$$\int \underline{F}\,df = \inf_{\mu\in D} \int F_\mu\,df. \tag{3.4}$$

Each cumulative distribution function F_µ is increasing and right-continuous, hence upper semicontinuous. Since D satisfies (DIR), the net² (F_µ)_{µ∈D} is decreasing. Thus, (1 − F_µ)_µ is an increasing net of nonnegative lower semicontinuous functions such that $1 - \underline{F} = \lim_\mu (1 - F_\mu)$. It therefore follows from [11, Lemma 7.2.6] that

$$\sup_{\mu\in D} \int (1 - F_\mu)\,df = \lim_\mu \int (1 - F_\mu)\,df = \int (1 - \underline{F})\,df,$$

which shows (3.4). Moreover, since f is continuous, one has $\int \underline{F}(x-)\,df(x) = \int \underline{F}(x)\,df(x)$. Hence, integration by parts yields

$$\begin{aligned} \int f\,d\underline{F} = f(\infty) - \int \underline{F}\,df &= f(\infty) - \inf_{\mu\in D} \int F_\mu\,df \\ &= f(\infty) - \inf_{\mu\in D} \Big( f(\infty) - \int f\,dF_\mu \Big) = \sup_{\mu\in D} \int f\,dF_\mu, \end{aligned}$$

showing (3.3) whenever f is bounded, with µ∗ being the distribution associated to $\underline{F}$. If f is not bounded, we approximate f from below by f^n := f ∧ n.

If D is also closed, then it follows from Prokhorov’s theorem and tightness that D is compact. Suppose for contradiction that µ∗ ∉ D. Then, by the strong separation theorem and (3.3), which was already proven, there exists a continuous, bounded and increasing function f : R → R such that ∫ f dµ∗ > sup_{µ∈D} ∫ f dµ, which clearly contradicts (3.3). Thus, µ∗ ∈ D.

² D is endowed with the ordering µ ⪯ ν if and only if µ(−∞, t] ≥ ν(−∞, t] for every t.
Corollary 3.4. Assume that (DIR) holds and that l : R → R is convex, increasing, bounded from below, and that l(x) > x for |x| large enough. Then there exist µ∗ ∈ M1 and m∗ ∈ R such that

$$\mathrm{OCE}(D, l) = \mathrm{OCE}(\mu^*, l) \quad\text{and}\quad \mathrm{OCE}(D, l) = \int l(x-m^*)\,\mu^*(dx) + m^*.$$

In particular m∗ is characterized by

$$\int l'_-(x-m^*)\,\mu^*(dx) \le 1 \le \int l'_+(x-m^*)\,\mu^*(dx)$$

for the left and right hand derivatives l′₋ and l′₊ of l. If l is continuously differentiable, then the inequalities in the above formula are equalities.
Proof. The existence of a µ∗ ∈ M1 such that OCE(D, l) = OCE(µ∗, l) follows directly from Lemma 3.3. Therefore, the existence and characterization of an optimal allocation m∗ can be deduced from the non-robust case, see for instance [6].

Proof (of Proposition 2.8). It follows from Lemma 3.3 and Corollary 3.4 that one has µ∗(−∞, t] = inf_{µ∈D} µ(−∞, t] for every t and AVaRα(D) = AVaRα(µ∗). Thus, [21, Lemma 4.51] yields

$$\mathrm{AVaR}_\alpha(\mu^*) = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(\mu^*)\,du = \frac{1}{\alpha}\int_0^\alpha \mathrm{VaR}_u(D)\,du.$$
Proof (of Example 2.9). For every $\underline{b} \le b \le \bar{b}$, the process S_t^b = S0 exp((b − ½σ²)t + σW_t) is the solution of dS_t = S_t(b dt + σ dW_t). Further, since S0 > 0 and f is strictly increasing, one has

$$P\big(f(S_T^{\bar b}) \le x\big) \le P\big(f(S_T^{b}) \le x\big) \le P\big(f(S_T^{\underline b}) \le x\big)$$

for all x. Thus a straightforward computation shows that D satisfies (DIR).
3.3. Proof of Theorem 2.10
Define the function

$$J : \mathbb{R}\times\mathcal{P} \to \mathbb{R}, \quad (m, P) \mapsto E_P[l(X-m)] + m \tag{3.5}$$

and notice that J is convex and continuous in m, and concave in P. Moreover, since l is bounded from below and lim_{x→∞} l(x)/x = +∞ by assumption, it follows that inf_{P∈𝒫} J(m, P) increases to +∞ whenever |m| does. In particular there exists m0 such that

$$\inf_{m\in\mathbb{R}}\sup_{P\in\mathcal{P}} J(m,P) = \inf_{m\in[-m_0,m_0]}\sup_{P\in\mathcal{P}} J(m,P) = \sup_{P\in\mathcal{P}}\inf_{m\in[-m_0,m_0]} J(m,P) = \sup_{P\in\mathcal{P}}\inf_{m\in\mathbb{R}} J(m,P),$$

where the middle equality follows for instance from [19, Theorem 2]. Therefore

$$\mathrm{OCE}(\mathcal{P}_X, l) = \inf_{m\in\mathbb{R}}\sup_{P\in\mathcal{P}} J(m,P) = \sup_{P\in\mathcal{P}}\inf_{m\in\mathbb{R}} J(m,P) = \sup_{P\in\mathcal{P}} \mathrm{OCE}(P\circ X^{-1}, l).$$

In particular, it follows from the classical representation of the optimized certainty equivalent that

$$\mathrm{OCE}(X, l) = \sup_{P\in\mathcal{P}}\sup_{Q} \Big( E_Q[X] - E_P\Big[ l^*\Big(\frac{dQ}{dP}\Big) \Big] \Big) = \sup_{Q} \Big( E_Q[X] - \inf_{P\in\mathcal{P}} E_P\Big[ l^*\Big(\frac{dQ}{dP}\Big) \Big] \Big).$$
A. Appendix
Denote by B_b^− the set of all Borel measurable functions from R^d to (−∞, +∞] which are bounded from below, and by U_b and C_b the subsets of bounded upper semicontinuous (resp. bounded continuous) functions. Further write M for the set of all countably additive, finite, positive Borel measures on R^d, that is, M := {tµ : µ ∈ M1(R^d), t ≥ 0}. The following theorem, which builds on Choquet’s theory on the regularity of capacities, is a slight modification of Theorem A1 in [4].

Theorem A.1. Let Φ : B_b^− → (−∞, +∞] be a monotone convex functional such that Φ(f) < +∞ whenever f is bounded. If
• Φ(f_n) ↓ Φ(0) for every sequence (f_n) in C_b which decreases pointwise to 0,
• Φ∗(µ) := sup_{f∈C_b} (∫ f dµ − Φ(f)) = sup_{f∈U_b} (∫ f dµ − Φ(f)) for every µ ∈ M,
• Φ(f_n) ↑ Φ(f) for every sequence (f_n) in B_b^− which increases pointwise to f ∈ B_b^−,
then

$$\Phi(f) = \sup_{\mu\in M} \Big( \int f\,d\mu - \Phi^*(\mu) \Big)$$

for every f ∈ B_b^−.
References
[1] P. Artzner, F. Delbaen, J. M. Eber, and D. Heath. Coherent measures of risk. Math. Finance, 9:203–228, 1999.
[2] J. Backhoff and L. Tangpi. On the dynamic representation of some time-inconsistent risk measures in a Brownian filtration. Preprint: arXiv:1608.07498, 2016.
[3] D. Bartl. Exponential utility maximization under model uncertainty for unbounded endowments. Preprint: arXiv:1610.00999, 2016.
[4] D. Bartl, P. Cheridito, and M. Kupper. Robust expected utility maximization with medial limits. Upcoming preprint.
[5] M. Beiglböck, P. Henry-Labordère, and F. Penkner. Model-independent bounds for option prices: a mass transport approach. Finance and Stoch., 17(3):477–501, 2011.
[6] A. Ben-Tal and M. Teboulle. An old-new concept of convex risk measures: The optimized certainty equivalent. Math. Finance, 17:449–476, 2007.
[7] A. Ben-Tal and M. Teboulle. Expected utility, penalty functions and duality in stochastic nonlinear programming. Management Science, 32:1445–1466, 1986.
[8] D. Bertsekas and S. Shreve. Stochastic Optimal Control: The Discrete-Time Case, volume 23. Academic Press New York, 1978.
[9] R. Blanchard and L. Carassus. Robust optimal investment in discrete time for unbounded utility function. Preprint: arXiv:1609.09205, 2016.
[10] J. Blanchet and K. Murthy. Quantifying distributional model risk via optimal transport. Preprint: arXiv:1604.01446, 2016.
[11] V. I. Bogachev. Measure Theory, volume 2. Springer, 2007.
[12] P. Cheridito and T. Li. Risk measures on Orlicz hearts. Math. Finance, 19(2):189–214, 2009.
[13] P. Cheridito, M. Kupper, and L. Tangpi. Duality formulas for robust pricing and hedging in discrete time. Forthcoming in SIAM J. Fin. Math., 2016.
[14] A. S. Cherny and M. Kupper. Divergence utilities. Available at SSRN: https://ssrn.com/abstract=1023525, 2007.
[15] A. Damodaran. Strategic Risk Taking: A Framework for Risk Management. Wharton School Publishing, 2008.
[16] Y. Dolinsky and H. Soner. Martingale optimal transport and robust hedging in continuous time. Probab. Theory Related Fields, 160(1):391–427, 2014.
[17] S. Drapeau, M. Kupper, and A. Papapantoleon. A Fourier approach to the computation of CVaR and optimized certainty equivalents. Journal of Risk, 16(6):3–29, 2014.
[18] P. M. Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Forthcoming in Mathematical Programming, 2015.
[19] K. Fan. Minimax theorems. Proc. Nat. Acad. Sci. U.S.A., 39:42–47, 1953.
[20] H. Föllmer and A. Schied. Convex measures of risk and trading constraint. Finance Stoch., 6(4):429–447, 2002.
[21] H. Föllmer and A. Schied. Stochastic Finance: An Introduction in Discrete Time. Walter de Gruyter, 3rd Edition, 2011.
[22] M. Frittelli and E. Rosazza Gianin. Putting order in risk measures. Journal of Banking & Finance, 26(7):1473–1486, July 2002.
[23] V. Gabrel, C. Murat, and A. Thiele. Recent advances in robust optimization: An overview. European J. Oper. Res., 235:471–483, 2014.
[24] L. P. Hansen and T. J. Sargent. Robust control and model uncertainty. The American Economic Review, 91(2):60–66, 2001.
[25] D. Hobson. Robust hedging of the lookback option. Finance Stoch., 2(4):329–347, 1998.
[26] P. J. Huber. Robust Statistics. John Wiley & Sons, 1981.
[27] F. Maccheroni, M. Marinacci, and A. Rustichini. Ambiguity aversion, robustness, and the variational representation of preferences. Econometrica, 74(6):1447–1498, 2006.
[28] A. Matoussi, D. Possamaï, and C. Zhou. Robust utility maximization in nondominated models with 2BSDE: the uncertain volatility model. Math. Finance, 25(2):258–287, 2015.
[29] A. Neufeld and M. Sikic. Robust utility maximization in discrete-time markets with frictions. Preprint: arXiv:1610.09230, 2016.
[30] M. Nutz. Utility maximization under model uncertainty in discrete time. Math. Finance, 26(2):252–268, 2016.
[31] G. Pflug, A. Pichler, and D. Wozabal. The 1/n investment strategy is optimal under high model ambiguity. J. Bank. Financ, 36:410–417, 2012.
[32] R. T. Rockafellar and S. Uryasev. Optimization of conditional value-at-risk. Journal of Risk, 2:21–42, 2000.
[33] C. Villani. Optimal Transport: Old and New, volume 338. Springer Science & Business Media, 2008.