13. Convex Function

Let ℝ̄ = ℝ⋃{+∞, −∞} be the extended real line. Let (𝐴, 𝑉) be an affine space, where 𝑉 = (ℝ, 𝑉) is a vector space over the real numbers; then (ℝ, ℝ) is the canonical 1-dimensional affine space over ℝ. The product affine space is (𝐴, 𝑉)×(ℝ, ℝ) = (𝐴×ℝ, 𝑉×ℝ), denoted as 𝐴×ℝ for short. A map 𝑓: 𝐴 → ℝ̄ is an (extended) real-valued convex function if its epigraph, defined as the set epi 𝑓 = {(𝑥, 𝑦) ∈ 𝐴×ℝ: 𝑦 ≥ 𝑓(𝑥)}, is a convex set.

For any map 𝑓: 𝑋 → ℝ̄ where 𝑋 ⊆ 𝐴, we can extend 𝑓 to the whole 𝐴 by letting 𝑓̃(𝑥) = 𝑓(𝑥) if 𝑥 ∈ 𝑋 and 𝑓̃(𝑥) = +∞ if 𝑥 ∉ 𝑋, so that epi 𝑓̃ = epi 𝑓, and thus WLOG we can focus on studying functions 𝑓: 𝐴 → ℝ̄. 𝑓: 𝐴 → ℝ̄ is called a convex function over 𝑋 if {(𝑥, 𝑦) ∈ 𝑋×ℝ: 𝑦 ≥ 𝑓(𝑥)} is convex, and it is easy to see this implies 𝑋 is a convex set whenever 𝑓 < +∞ on 𝑋. 𝑓: 𝐴 → ℝ̄ is a concave function if −𝑓 is convex; 𝑓: 𝑋 → ℝ̄ is a concave function over 𝑋 if −𝑓 is convex over 𝑋.

Given a set 𝐸 ⊆ 𝐴×ℝ, denote by 𝐸_𝐴 = {𝑥 ∈ 𝐴: (𝑥, 𝑦) ∈ 𝐸 for some 𝑦} the projection of 𝐸 onto 𝐴, and by 𝐸ℝ(𝑥) = {𝑦 ∈ ℝ: (𝑥, 𝑦) ∈ 𝐸} the projection of 𝐸 “at point 𝑥” onto ℝ. We will later learn that, when 𝐸 = epi 𝑓, 𝐸_𝐴 is actually the effective domain of 𝑓.

Lemma 13-1 𝐸 is the epigraph of some function 𝑓: 𝐴 → ℝ̄ iff ∀𝑥 ∈ 𝐸_𝐴 = {𝑥: (𝑥, 𝑦) ∈ 𝐸 for some 𝑦} we have 𝐸ℝ(𝑥) = [𝑦∗, +∞) for some 𝑦∗ ∈ [−∞, +∞], i.e. 𝐸ℝ(𝑥) can only be ℝ, [𝑦∗, +∞) for some finite 𝑦∗, or ∅; and 𝑓 can be defined as 𝑓(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ 𝐸} = 𝑦∗, ∀𝑥 ∈ 𝐸_𝐴.

Necessity. If 𝐸 is the epigraph of some function 𝑓, then 𝐸ℝ(𝑥) = {𝑦: 𝑦 ≥ 𝑓(𝑥)} = [𝑓(𝑥), +∞).

Sufficiency. We only need to confirm 𝐸 = epi 𝑓 for 𝑓(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ 𝐸}. First note that if (𝑥, 𝑦₀) ∈ 𝐸 for some 𝑦₀ ∈ ℝ, then (𝑥, 𝑦) ∈ 𝐸 for every 𝑦 ≥ 𝑦₀. On one hand, for any (𝑥, 𝑦) ∈ epi 𝑓, we have 𝑥 ∈ 𝐸_𝐴 and 𝑦 ≥ 𝑓(𝑥) = 𝑦∗, and since either (𝑥, 𝑦∗) ∈ 𝐸 or 𝑦∗ = −∞, in both cases we have (𝑥, 𝑦) ∈ 𝐸. On the other hand, for any (𝑥, 𝑦) ∈ 𝐸, clearly 𝑦 ≥ 𝑦∗ = 𝑓(𝑥) ⇒ (𝑥, 𝑦) ∈ epi 𝑓.

We also note the following.

1) inf_{𝑥∈𝐸_𝐴} 𝑓(𝑥) = inf_{𝑥∈𝐸_𝐴} inf{𝑦: (𝑥, 𝑦) ∈ 𝐸} = inf_{𝑥∈𝐸_𝐴} 𝑦∗ = inf_{𝑥∈𝐸_𝐴} inf 𝐸ℝ(𝑥).

2) We can extend 𝑓 to the whole 𝐴 by letting 𝑓̃(𝑥) = 𝑓(𝑥) if 𝑥 ∈ 𝐸_𝐴 and 𝑓̃(𝑥) = +∞ if 𝑥 ∉ 𝐸_𝐴, and then epi 𝑓̃ = epi 𝑓 = 𝐸. We will later learn that 𝐸_𝐴 is the effective domain of 𝑓̃.

3) Consider two functions 𝑓, 𝑔: 𝐴 → ℝ̄ and write 𝐸 = epi 𝑓. Then epi 𝑓 = epi 𝑔 ⟺ 𝑓 = 𝑔, and in that case 𝑓(𝑥) = 𝑔(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ 𝐸} if 𝑥 ∈ 𝐸_𝐴 and 𝑓(𝑥) = 𝑔(𝑥) = +∞ if 𝑥 ∈ 𝐴\𝐸_𝐴.
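The recipe 𝑓(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ 𝐸} from Lemma 13-1 can be tried numerically. The following is only a minimal sketch under stated assumptions: the set 𝐸 is given by a hypothetical membership test `in_E(x, y)`, each vertical slice is assumed to be upward closed, and the search is confined to a finite window [`lo`, `hi`], so "slice empty" and "slice equal to ℝ" are only detected relative to that window.

```python
import math

def recover_f(in_E, x, lo=-1e6, hi=1e6, tol=1e-9):
    """Approximate f(x) = inf{y : (x, y) in E} by bisection on the slice at x."""
    if not in_E(x, hi):
        return math.inf       # slice looks empty inside the window: x is not in E_A
    if in_E(x, lo):
        return -math.inf      # slice reaches below the window: treated as all of R
    # Invariant: (x, hi) is in E and (x, lo) is not in E.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if in_E(x, mid):
            hi = mid
        else:
            lo = mid
    return hi

# Hypothetical example set: the epigraph of x^2 restricted to [-2, 2].
def in_E(x, y):
    return -2.0 <= x <= 2.0 and y >= x * x
```

With this example set, `recover_f(in_E, 1.5)` is close to 2.25, and any 𝑥 outside [−2, 2] is mapped to +∞, matching note 2) above.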

Property 13-1 If 𝐸 ⊆ 𝐴×ℝ is an epigraph of some function, then for any 𝑥 ∈ 𝐸_𝐴, 𝑦₀ ∈ 𝐸ℝ(𝑥) implies 𝑦 ∈ 𝐸ℝ(𝑥) for any 𝑦 ≥ 𝑦₀. Recall that 𝐸 is an epigraph iff each 𝐸ℝ(𝑥) is ℝ, [𝑦∗, +∞) for some finite 𝑦∗, or ∅. In any of these three cases, once 𝑦₀ ∈ 𝐸ℝ(𝑥), then 𝑦 ∈ 𝐸ℝ(𝑥) for any 𝑦 ≥ 𝑦₀. Note the converse is not true. Consider replacing 𝐸ℝ(𝑥) = [𝑦∗, +∞) by 𝐸ℝ(𝑥) = (𝑦∗, +∞) for some 𝑥, then 𝐸 still satisfies the condition that “(𝑥, 𝑦₀) ∈ 𝐸 implies (𝑥, 𝑦) ∈ 𝐸 for any 𝑦 ≥ 𝑦₀”, but it is no longer an epigraph.

Lemma 13-2 𝐸 ⊆ 𝐴×ℝ is an epigraph of some function 𝑓: 𝐴 → ℝ̄ iff it satisfies Property 13-1 plus that either inf 𝐸ℝ(𝑥) = −∞ or inf 𝐸ℝ(𝑥) ∈ 𝐸ℝ(𝑥) for any non-empty 𝐸ℝ(𝑥). The proof is immediate from Lemma 13-1 and the above property.



Property 13-2 If 𝐸 is an epigraph, then its closure cl 𝐸 is also an epigraph. See a later Figure 13-1 for an example of a non-closed epigraph. For any 𝑥 ∈ (cl 𝐸)_𝐴, if inf (cl 𝐸)ℝ(𝑥) = 𝑦∗ is attainable, then we have (𝑥, 𝑦∗) ∈ cl 𝐸 and there exists a sequence {(𝑥ₖ, 𝑦ₖ)} ⊆ 𝐸 s.t. (𝑥ₖ, 𝑦ₖ) → (𝑥, 𝑦∗) by definition of set closure. Then for any 𝑦 ≥ 𝑦∗ we have 𝑦ₖ + (𝑦 − 𝑦∗) ≥ 𝑦ₖ ⇒ 𝑦ₖ + (𝑦 − 𝑦∗) ∈ 𝐸ℝ(𝑥ₖ) by Property 13-1, and (𝑥ₖ, 𝑦ₖ + (𝑦 − 𝑦∗)) → (𝑥, 𝑦), so 𝑦 ∈ (cl 𝐸)ℝ(𝑥) for any 𝑦 ≥ 𝑦∗. If inf (cl 𝐸)ℝ(𝑥) = −∞, then there exists a sequence {(𝑥ₖ, 𝑦ₖ)} ⊆ 𝐸 s.t. 𝑥ₖ → 𝑥 and 𝑦ₖ → −∞. Then for any 𝑦 ∈ ℝ, we have 𝑦 > 𝑦ₖ for all sufficiently large 𝑘, so (𝑥ₖ, 𝑦) ∈ 𝐸 for all sufficiently large 𝑘, and (𝑥ₖ, 𝑦) → (𝑥, 𝑦) ⇒ 𝑦 ∈ (cl 𝐸)ℝ(𝑥).

Property 13-3 If 𝐸 is an epigraph, then the closure of its convex hull 𝐶(𝐸) is also an epigraph. For any 𝑥 ∈ 𝐶(𝐸)_𝐴, if inf 𝐶(𝐸)ℝ(𝑥) = 𝑦∗ is attainable, then we must have (𝑥, 𝑦∗) ∈ 𝐶(𝐸) and it is a convex combination (𝑥, 𝑦∗) = ∑ᵢ 𝜃ᵢ(𝑥ᵢ, 𝑦ᵢ) of points (𝑥ᵢ, 𝑦ᵢ) ∈ 𝐸. Then for any 𝑦 ≥ 𝑦∗, we have

(𝑥ᵢ, 𝑦ᵢ + (𝑦 − 𝑦∗)) ∈ 𝐸 ⇒ (𝑥, 𝑦) = ∑ᵢ 𝜃ᵢ(𝑥ᵢ, 𝑦ᵢ + (𝑦 − 𝑦∗)) ∈ 𝐶(𝐸)

If inf 𝐶(𝐸)ℝ(𝑥) = −∞, then there is a sequence {(𝑥, 𝑦ₖ)} ⊆ {𝑥}×𝐶(𝐸)ℝ(𝑥) s.t. 𝑦ₖ → −∞, and each (𝑥, 𝑦ₖ) is a convex combination (𝑥, 𝑦ₖ) = ∑ᵢ 𝜃ₖ,ᵢ(𝑥ₖ,ᵢ, 𝑦ₖ,ᵢ) of points (𝑥ₖ,ᵢ, 𝑦ₖ,ᵢ) ∈ 𝐸. Then for any 𝑦 ∈ ℝ, we can choose some sufficiently large 𝑘 s.t. 𝑦 ≥ 𝑦ₖ, and we have

(𝑥ₖ,ᵢ, 𝑦ₖ,ᵢ + (𝑦 − 𝑦ₖ)) ∈ 𝐸 ⇒ (𝑥, 𝑦) = ∑ᵢ 𝜃ₖ,ᵢ(𝑥ₖ,ᵢ, 𝑦ₖ,ᵢ + (𝑦 − 𝑦ₖ)) ∈ 𝐶(𝐸)

Hence 𝐶(𝐸) satisfies the conclusion of Property 13-1, and applying the argument of Property 13-2 to 𝐶(𝐸) shows that its closure is an epigraph.

Note the convex hull 𝐶(𝐸) itself might not be an epigraph, since it is possible that inf 𝐶(𝐸)ℝ(𝑥) = 𝑦∗ ∉ 𝐶(𝐸)ℝ(𝑥). For example, if 𝑓(𝑥) = 1 for 𝑥 ∈ [0,1) and 𝑓(𝑥) = 2 for 𝑥 = 2 (and +∞ elsewhere), then 𝐶(epi 𝑓) will clearly consist of open-end half lines on the interval [1,2) and hence 𝐶(epi 𝑓) is not an epigraph.
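The open-end behaviour in this example can be seen numerically. The sketch below is an illustration only: it looks at the lowest two-point mixes of a point (𝑎, 1) with 𝑎 ∈ [0,1) and the point (2, 2), which is enough here because both pieces of epi 𝑓 are themselves convex. The helper name `mix_height` is made up for this illustration.

```python
def mix_height(x, a):
    """Height above abscissa x of the mix of (a, 1) and (2, 2) that lands on x."""
    assert 0.0 <= a < 1.0 and 1.0 <= x < 2.0
    theta = (2.0 - x) / (2.0 - a)    # weight on (a, 1): theta*a + (1 - theta)*2 == x
    return theta * 1.0 + (1.0 - theta) * 2.0

x = 1.5
# Heights obtained as a approaches 1 from below.
heights = [mix_height(x, a) for a in (0.0, 0.9, 0.99, 0.999999)]
```

The heights decrease towards 𝑥 = 1.5 as 𝑎 → 1 but stay strictly above it, because 𝑎 = 1 is not allowed; this is the infimum 𝑦∗ that is not attained.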

Property 13-4 Define the effective domain of a function 𝑓: 𝐴 → ℝ̄ as the set dom 𝑓 = {𝑥 ∈ 𝐴: 𝑓(𝑥) ∈ [−∞, +∞)}. Note that 𝑓̃, defined by 𝑓̃(𝑥) = 𝑓(𝑥) if 𝑥 ∈ dom 𝑓 and 𝑓̃(𝑥) = +∞ if 𝑥 ∉ dom 𝑓, is a convex function if 𝑓 is convex, since epi 𝑓̃ = epi 𝑓, which is due to {(𝑥, 𝑦) ∈ 𝐴×ℝ: 𝑦 ≥ +∞} = ∅. By this, it is easy to see that the projection of epi 𝑓 onto 𝐴 is the effective domain. For a convex 𝑓 this projection is convex, because epi 𝑓 is convex and projecting a convex set yields a convex set; hence the effective domain of a convex function must be convex.

Given a set 𝑆 in 𝐴×ℝ, we say 𝑆 contains a vertical line if {(𝑝, 𝑧): 𝑧 ∈ ℝ} ⊆ 𝑆 for some 𝑝. A convex function 𝑓 s.t. 𝑓(𝑥) ≡ +∞ (so epi 𝑓 = ∅) or 𝑓(𝑥) = −∞ for some 𝑥 is called improper. In other words, 𝑓 is improper if its epigraph is empty or contains some vertical line. Improper functions are of little interest and practical value due to their very special shapes. An example with 𝑓(𝑥) = −∞ for some 𝑥 on 𝐴 = ℝ is given in EX 29.

Property 13-5 Let 𝑓₁, …, 𝑓ₙ be convex functions from a normed affine space 𝐴 to ℝ̄, then 𝑔(𝑥) = ∑ᵢ 𝜆ᵢ𝑓ᵢ(𝑥), 𝜆ᵢ ∈ ℝ₊ (i.e. 𝜆ᵢ ≥ 0), is also convex.

𝑔(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) = ∑ᵢ 𝜆ᵢ𝑓ᵢ(𝜃𝑥₁ + (1 − 𝜃)𝑥₂)
≤ ∑ᵢ 𝜆ᵢ(𝜃𝑓ᵢ(𝑥₁) + (1 − 𝜃)𝑓ᵢ(𝑥₂))
= 𝜃 ∑ᵢ 𝜆ᵢ𝑓ᵢ(𝑥₁) + (1 − 𝜃) ∑ᵢ 𝜆ᵢ𝑓ᵢ(𝑥₂)
= 𝜃𝑔(𝑥₁) + (1 − 𝜃)𝑔(𝑥₂)

The inequality uses Jensen’s inequality (Theorem 13-1) for each 𝑓ᵢ together with 𝜆ᵢ ≥ 0.

Theorem 13-1 Jensen’s inequality. 𝑓: 𝐴 → ℝ̄ is a convex function iff 𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂) for any 𝜃 ∈ [0,1] and any 𝑥₁, 𝑥₂ ∈ 𝐴 s.t. 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂) is defined. The expression 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂) is not defined when 𝑓(𝑥₁) and 𝑓(𝑥₂) are infinite with opposite signs. Jensen’s inequality is perhaps the most important inequality in many probability-based machine learning methods, such as the EM algorithm.

Necessity. If 𝑓(𝑥₁) = +∞ or 𝑓(𝑥₂) = +∞, the inequality holds trivially. We consider the case when 𝑓(𝑥₁) < +∞, 𝑓(𝑥₂) < +∞. Suppose 𝑓 is convex, then epi 𝑓 is convex. For any two points (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) s.t. 𝑦₁ ≥ 𝑓(𝑥₁), 𝑦₂ ≥ 𝑓(𝑥₂), we have (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) ∈ epi 𝑓, and

𝜃(𝑥₁, 𝑦₁) + (1 − 𝜃)(𝑥₂, 𝑦₂) ∈ epi 𝑓 ⇒ (𝜃𝑥₁ + (1 − 𝜃)𝑥₂, 𝜃𝑦₁ + (1 − 𝜃)𝑦₂) ∈ epi 𝑓 ⇒ 𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃𝑦₁ + (1 − 𝜃)𝑦₂

for any 𝜃 ∈ [0,1]. Note this inequality holds for any 𝑦₁ ≥ 𝑓(𝑥₁), 𝑦₂ ≥ 𝑓(𝑥₂), thus it holds for 𝑦₁ = 𝑓(𝑥₁), 𝑦₂ = 𝑓(𝑥₂) (or in the limit 𝑦₁ → −∞ when 𝑓(𝑥₁) = −∞, and likewise for 𝑦₂), and hence

𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂)

Sufficiency. Suppose 𝑓 satisfies the inequality; we want to prove epi 𝑓 is a convex set. Take any two points (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) in epi 𝑓 ⊆ 𝐴×ℝ, so 𝑦₁ ≥ 𝑓(𝑥₁), 𝑦₂ ≥ 𝑓(𝑥₂). Let 𝜃 ∈ [0,1], then

𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂) ≤ 𝜃𝑦₁ + (1 − 𝜃)𝑦₂ ⇒ (𝜃𝑥₁ + (1 − 𝜃)𝑥₂, 𝜃𝑦₁ + (1 − 𝜃)𝑦₂) ∈ epi 𝑓 ⇒ 𝜃(𝑥₁, 𝑦₁) + (1 − 𝜃)(𝑥₂, 𝑦₂) ∈ epi 𝑓

Note the sufficiency requires Jensen’s inequality to hold for any two points 𝑥₁, 𝑥₂. The following gives two counterexamples in which Jensen’s inequality only holds for a particular 𝑥₁ and any 𝑥₂. Suppose

𝑓(𝑥) = −1.5𝑥 − 1.5 for 𝑥 < −1,
𝑓(𝑥) = 0.4𝑥 + 1.4 for −1 ≤ 𝑥 ≤ 0,
𝑓(𝑥) = −0.4𝑥 + 1.4 for 0 ≤ 𝑥 ≤ 1,
𝑓(𝑥) = 1.5𝑥 − 0.5 for 𝑥 > 1,

and let 𝑥₁ be the 𝑥-coordinate of the intersection between the lines −1.5𝑥 − 1.5 and −0.4𝑥 + 1.4. We can “visually” see that (𝑥₁, 𝑥₂) satisfies Jensen’s inequality for any 𝑥₂ ∈ ℝ, although 𝑓 is not convex. We can also see that the point (0, 1.4) satisfies Jensen’s inequality.
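The “visual” claim for the first counterexample can be spot-checked on a grid. This is a rough numerical sketch, not a proof: the pieces of 𝑓 are taken exactly as written above, the value −29/11 is obtained by solving −1.5𝑥 − 1.5 = −0.4𝑥 + 1.4, and the grid ranges and the helper name `jensen_gap` are choices made only for this illustration.

```python
def f(x):
    """Piecewise function, pieces taken exactly as written in the text."""
    if x < -1.0:
        return -1.5 * x - 1.5
    if x <= 0.0:
        return 0.4 * x + 1.4
    if x <= 1.0:
        return -0.4 * x + 1.4
    return 1.5 * x - 0.5

def jensen_gap(f, x1, x2, theta):
    """Chord height minus function value at the mixed point; negative means violated."""
    return theta * f(x1) + (1 - theta) * f(x2) - f(theta * x1 + (1 - theta) * x2)

X1 = -29.0 / 11.0                               # solves -1.5x - 1.5 == -0.4x + 1.4
grid = [i / 20.0 for i in range(-100, 101)]     # x2 ranging over [-5, 5]
thetas = [j / 50.0 for j in range(51)]          # theta ranging over [0, 1]

# Smallest gap over all sampled (x2, theta) with x1 fixed at X1.
worst_from_x1 = min(jensen_gap(f, X1, x2, t) for x2 in grid for t in thetas)

# A different pair of points: f(0) = 1.4 lies above the chord height 1.0.
gap_between_pm1 = jensen_gap(f, -1.0, 1.0, 0.5)
```

On this grid the gap from `X1` never drops below zero (up to rounding), while the pair (−1, 1) violates Jensen’s inequality at the midpoint, so 𝑓 is not convex even though one particular 𝑥₁ behaves well.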

Another example is 𝑓(𝑥) =

(𝑥 + 1) (𝑥 − 1)

… 𝑓(𝑥) > 𝑦 > liminf 𝑓(𝑥ₖ) for some 𝑦, then there exists a subsequence {𝑥ₖⱼ} s.t. 𝑓(𝑥ₖⱼ) < 𝑦 for large 𝑗, and thus 𝑥ₖⱼ ∈ 𝐿_𝑦 for large 𝑗. Since 𝑥ₖⱼ → 𝑥 and 𝐿_𝑦 is closed, then 𝑥 ∈ 𝐿_𝑦 ⇒ 𝑓(𝑥) ≤ 𝑦, which is a contradiction.

Show 2) → 3). For any {(𝑥ₖ, 𝑦ₖ)} ⊂ epi 𝑓 with (𝑥ₖ, 𝑦ₖ) → (𝑥, 𝑦), we have 𝑓(𝑥ₖ) ≤ 𝑦ₖ for all 𝑘 and hence liminf 𝑓(𝑥ₖ) ≤ 𝑦. Since we assume 𝑓 is LSC, then 𝑓(𝑥) ≤ liminf 𝑓(𝑥ₖ), and we have 𝑓(𝑥) ≤ 𝑦 ⇒ (𝑥, 𝑦) ∈ epi 𝑓, thus epi 𝑓 is closed. Actually, if 𝑓 is proper, the same argument only needs 𝑓 to be LSC w.r.t. dom 𝑓, since every 𝑥ₖ of the sequence {(𝑥ₖ, 𝑦ₖ)} in this proof lies in dom 𝑓, provided the limit point 𝑥 also lies in dom 𝑓 (e.g. when dom 𝑓 is closed, see Property 13-9).

Show 3) → 1). Assume epi 𝑓 is closed, and let {𝑥ₖ} be any sequence in some level set 𝐿_𝑦 with 𝑥ₖ → 𝑥, then (𝑥ₖ, 𝑦) ∈ epi 𝑓 for all 𝑘 since the definition of the level set 𝐿_𝑦 says 𝑓(𝑥ₖ) ≤ 𝑦. Since (𝑥ₖ, 𝑦) → (𝑥, 𝑦) and epi 𝑓 is closed, then (𝑥, 𝑦) ∈ epi 𝑓 ⇒ 𝑓(𝑥) ≤ 𝑦 ⇒ 𝑥 ∈ 𝐿_𝑦, so 𝐿_𝑦 is closed.

Property 13-9 A closed effective domain does not imply a closed function, as shown in Figure 13-1. Instead we have: if a function 𝑓: 𝐴 → ℝ̄ has a closed dom 𝑓, and 𝑓 is LSC over dom 𝑓, then 𝑓 is closed, or equivalently 𝑓 is LSC over the entire 𝐴. This is a generalized claim for 2) → 3) of the previous lemma. For any {(𝑥ₖ, 𝑦ₖ)} ⊂ epi 𝑓 with (𝑥ₖ, 𝑦ₖ) → (𝑥, 𝑦), since dom 𝑓 is closed, 𝑥 ∈ dom 𝑓, which ensures that LSC over dom 𝑓 applies at 𝑥. The remaining proof is exactly the same.

Conversely, also note a closed function might not have a closed effective domain, even if that function is closed convex. Consider, for example, 𝑓(𝑥) = 1/𝑥, 𝑥 > 0, whose epi 𝑓 is clearly closed, but dom 𝑓 = (0, +∞) is an open set.

Property 13-10 If 𝑓 is a convex function, then its level sets are convex. For any 𝑦 ∈ ℝ and any 𝑥₁, 𝑥₂ ∈ 𝐿_𝑦, we have 𝑓(𝑥₁) ≤ 𝑦, 𝑓(𝑥₂) ≤ 𝑦 by the definition of level sets. Then by Jensen’s inequality we have

𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂) ≤ 𝑦 ⇒ 𝜃𝑥₁ + (1 − 𝜃)𝑥₂ ∈ 𝐿_𝑦

However, all level sets being convex does not imply 𝑓 is a convex function. For example, 𝑓(𝑥) = 1 for 𝑥 ≤ 0 and 𝑓(𝑥) = 2 for 𝑥 > 0 is not a convex function since it is not continuous, but all its non-empty level sets (either (−∞, 0] or ℝ) are convex.
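The step-function example can be checked directly. The sketch below is illustrative only: level sets are sampled on a finite grid and "convex" is tested as "the selected grid indices form one contiguous run", which is an assumption that only makes sense in one dimension.

```python
def f(x):
    """Step function from the text: 1 for x <= 0, 2 for x > 0."""
    return 1.0 if x <= 0.0 else 2.0

def is_contiguous(indices):
    """True if the grid indices form a single unbroken run (or are empty)."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))

grid = [i / 10.0 for i in range(-30, 31)]
levels = [0.5, 1.0, 1.5, 2.0, 3.0]
level_sets_ok = all(
    is_contiguous([i for i, x in enumerate(grid) if f(x) <= y]) for y in levels
)

# One pair of points where Jensen's inequality fails.
x1, x2, theta = -1.0, 1.0, 0.25
lhs = f(theta * x1 + (1 - theta) * x2)       # f(0.5) = 2
rhs = theta * f(x1) + (1 - theta) * f(x2)    # 0.25*1 + 0.75*2 = 1.75
```

Every sampled level set is an interval, yet `lhs > rhs`, so Jensen’s inequality is violated and 𝑓 is not convex.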



Theorem 13-3 If 𝑓: 𝐴 → ℝ̄ is a proper closed convex function, then 𝑓 is continuous over dom 𝑓. If 𝑥 ∈ ri(dom 𝑓), then by Theorem 13-2 𝑓 is continuous at 𝑥. If 𝑥 is a boundary point of dom 𝑓, then for any 𝑥ₖ → 𝑥 in dom 𝑓, let 𝑟 = supₖ ‖𝑥𝑥ₖ⃗‖, then {𝑥ₖ} ⊂ 𝐵ᵣ(𝑥)⋂ dom 𝑓. By the proof of Theorem 13-2,

limsup 𝑓(𝑥ₖ) ≤ 𝑓(𝑥)

Then since 𝑓 is closed, 𝑓 is LSC by Lemma 13-3, and

liminf 𝑓(𝑥ₖ) ≥ 𝑓(𝑥)

Thus lim 𝑓(𝑥ₖ) = 𝑓(𝑥) as 𝑘 → ∞.

Property 13-11 Let 𝒜: 𝐴 → 𝐵 be an affine map from a normed affine space 𝐴 to another normed affine space 𝐵, and 𝑓: 𝐵 → ℝ̄ be a proper convex function, then 𝑔 = 𝑓 ∘ 𝒜: 𝐴 → ℝ̄ is a proper convex function, except that it could happen that 𝑓 ∘ 𝒜 ≡ +∞. For any 𝑥₁, 𝑥₂ ∈ 𝐴, we have

𝑔(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) = 𝑓(𝒜(𝜃𝑥₁ + (1 − 𝜃)𝑥₂)) = 𝑓(𝜃𝒜(𝑥₁) + (1 − 𝜃)𝒜(𝑥₂)) ≤ 𝜃𝑓(𝒜(𝑥₁)) + (1 − 𝜃)𝑓(𝒜(𝑥₂)) = 𝜃𝑔(𝑥₁) + (1 − 𝜃)𝑔(𝑥₂)

Moreover, if 𝑓 is closed, then 𝑔 = 𝑓 ∘ 𝒜 is closed. This is simply because the affine map 𝒜 is continuous on 𝐴 by Property 11-12, and 𝑓 is LSC on 𝐵 by Lemma 13-3, so 𝑔 is LSC over 𝐴 as a composition of 𝑓 and 𝒜: for any 𝑥ₖ → 𝑥 in 𝐴, 𝒜(𝑥ₖ) → 𝒜(𝑥) in 𝐵, then liminf 𝑓(𝒜(𝑥ₖ)) ≥ 𝑓(𝒜(𝑥)) since 𝑓 is LSC, which implies liminf 𝑔(𝑥ₖ) ≥ 𝑔(𝑥) for any 𝑥ₖ → 𝑥, so 𝑔 is LSC as well. Now 𝑔 is closed, again by Lemma 13-3.





Property 13-12 Let 𝑓₁, …, 𝑓ₙ be finitely many proper convex functions 𝐴 → ℝ̄, then 𝑓 = ∑ᵢ 𝑐ᵢ𝑓ᵢ is a proper convex function for any 𝑐₁, …, 𝑐ₙ ∈ [0, +∞), except that it could happen that 𝑓 ≡ +∞, and 𝑓 is closed if all 𝑓₁, …, 𝑓ₙ are closed. Clearly for any 𝑥₁, 𝑥₂ ∈ 𝐴, we have

𝑓(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) = ∑ᵢ₌₁ⁿ 𝑐ᵢ𝑓ᵢ(𝜃𝑥₁ + (1 − 𝜃)𝑥₂) ≤ 𝜃 ∑ᵢ₌₁ⁿ 𝑐ᵢ𝑓ᵢ(𝑥₁) + (1 − 𝜃) ∑ᵢ₌₁ⁿ 𝑐ᵢ𝑓ᵢ(𝑥₂) = 𝜃𝑓(𝑥₁) + (1 − 𝜃)𝑓(𝑥₂)

If all 𝑓₁, …, 𝑓ₙ are closed, then for any 𝑥ₖ → 𝑥 in 𝐴, we have

liminf 𝑓(𝑥ₖ) ≥ ∑ᵢ₌₁ⁿ 𝑐ᵢ liminf 𝑓ᵢ(𝑥ₖ) ≥ ∑ᵢ₌₁ⁿ 𝑐ᵢ𝑓ᵢ(𝑥) = 𝑓(𝑥)

implying 𝑓 is LSC and hence closed by Lemma 13-3.

Property 13-13 Given a family of functions {𝑓ᵢ}, 𝑖 ∈ 𝐼, where each 𝑓ᵢ: 𝐴 → ℝ̄ is a proper convex function, their pointwise supremum function, defined as (supᵢ 𝑓ᵢ)(𝑥) = supᵢ 𝑓ᵢ(𝑥) with the supremum taken over 𝑖 ∈ 𝐼, is also a proper convex function 𝐴 → ℝ̄, except that it could happen that supᵢ 𝑓ᵢ ≡ +∞. Moreover, if all 𝑓ᵢ are closed, then supᵢ 𝑓ᵢ is also closed. Note supᵢ 𝑓ᵢ is a notation for the supremum function.

Note 𝑦 ≥ supᵢ 𝑓ᵢ(𝑥) iff 𝑦 ≥ 𝑓ᵢ(𝑥) for every 𝑖 ∈ 𝐼 (this is a property of the supremum and is proved in EX 28). Thus

epi(supᵢ 𝑓ᵢ) = {(𝑥, 𝑦) ∈ 𝐴×ℝ: 𝑦 ≥ supᵢ 𝑓ᵢ(𝑥)} = {(𝑥, 𝑦) ∈ 𝐴×ℝ: 𝑦 ≥ 𝑓ᵢ(𝑥) for every 𝑖 ∈ 𝐼} = ⋂ᵢ {(𝑥, 𝑦) ∈ 𝐴×ℝ: 𝑦 ≥ 𝑓ᵢ(𝑥)} = ⋂ᵢ epi 𝑓ᵢ

which is an intersection of a family of convex sets. Thus epi(supᵢ 𝑓ᵢ) is still a convex set. If every 𝑓ᵢ is closed, then every epi 𝑓ᵢ is closed, and epi(supᵢ 𝑓ᵢ) is closed as an intersection of them.
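A finite family of affine functions gives a quick sanity check of Property 13-13. The following is only a sketch under assumptions: the family `family` of (slope, intercept) pairs is made up for illustration, the supremum is a plain maximum because the family is finite, and Jensen’s inequality is tested on a grid rather than proved.

```python
import itertools

# Hypothetical finite family of affine (hence convex) functions x -> a*x + b.
family = [(-2.0, -1.0), (-0.5, 0.0), (0.0, 0.25), (1.0, -0.5), (3.0, -4.0)]

def sup_f(x):
    """Pointwise supremum (here a maximum) of the family at x."""
    return max(a * x + b for a, b in family)

def in_epi_sup(x, y):
    """Membership in the epigraph of the supremum function."""
    return y >= sup_f(x)

def in_all_epis(x, y):
    """Membership in the intersection of the individual epigraphs."""
    return all(y >= a * x + b for a, b in family)

def max_jensen_violation(f, xs, thetas):
    """Largest f(mix) - (theta*f(x1) + (1-theta)*f(x2)) over the sampled grid."""
    worst = float("-inf")
    for x1, x2 in itertools.product(xs, repeat=2):
        for t in thetas:
            mix = t * x1 + (1 - t) * x2
            worst = max(worst, f(mix) - (t * f(x1) + (1 - t) * f(x2)))
    return worst

xs = [i / 4.0 for i in range(-20, 21)]
thetas = [j / 10.0 for j in range(11)]
violation = max_jensen_violation(sup_f, xs, thetas)
```

On this grid `violation` stays at zero up to rounding, and `in_epi_sup` agrees with `in_all_epis`, mirroring the identity epi(supᵢ 𝑓ᵢ) = ⋂ᵢ epi 𝑓ᵢ used in the proof.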

EX 28. Given a set Ω ⊆ ℝ, prove that 𝑦 ≥ sup Ω iff 𝑦 ≥ 𝑥 for every 𝑥 ∈ Ω. Key. Necessity is obvious. Sufficiency can be proved using the following property of the supremum: if sup Ω = 𝑎 exists, then ∀𝜖 > 0 ∃𝑥 ∈ Ω s.t. 𝑎 − 𝜖 < 𝑥 ≤ 𝑎. If 𝑦 ≥ 𝑥 for every 𝑥 ∈ Ω but 𝑦 < sup Ω, then there exists some 𝑥 ∈ Ω s.t. 𝑦 < 𝑥 ≤ sup Ω, contradicting the assumption that 𝑦 ≥ 𝑥 for every 𝑥 ∈ Ω.

Closure of Convex Functions

Recall a function is called closed if its epigraph is closed. As shown above in Figure 13-1, the epigraph of a function 𝑓: 𝐴 → ℝ̄ need not be closed. Also, recall by Property 13-2 and Property 13-3 that the closure of an epigraph, and the closure of its convex hull, are still epigraphs of some functions. Define the closure 𝑓̅ of 𝑓 as the function whose epigraph is the closure of epi 𝑓, i.e.

𝑓̅(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ cl(epi 𝑓)}

Meanwhile, define the convex closure of 𝑓, written cl conv 𝑓 below, as the convex function whose epigraph is the closure of the convex hull of epi 𝑓, i.e.

(cl conv 𝑓)(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ cl 𝐶(epi 𝑓)}

Lemma 13-4 Define conv 𝑓 as (conv 𝑓)(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ 𝐶(epi 𝑓)}, then conv 𝑓 is a convex function. Note epi(conv 𝑓) is 𝐶(epi 𝑓) plus some “vertical” limit points. That is, for any {(𝑥, 𝑦ₖ)} ⊆ 𝐶(epi 𝑓), if 𝑦ₖ ↓ 𝑦, then (𝑥, 𝑦) ∈ epi(conv 𝑓).

Now take any (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) ∈ epi(conv 𝑓). If both points are in 𝐶(epi 𝑓), then their convex combinations are obviously in epi(conv 𝑓). If one of the points, say (𝑥₁, 𝑦₁), is not in 𝐶(epi 𝑓), then 𝑦₁ = inf{𝑦: (𝑥₁, 𝑦) ∈ 𝐶(epi 𝑓)}, and there exists a sequence {(𝑥₁, 𝓎ₖ)} ⊆ 𝐶(epi 𝑓) s.t. 𝓎ₖ ↓ 𝑦₁. As a result, for any 𝜃 ∈ [0,1], we have 𝜃(𝑥₁, 𝓎ₖ) + (1 − 𝜃)(𝑥₂, 𝑦₂) = (𝜃𝑥₁ + (1 − 𝜃)𝑥₂, 𝜃𝓎ₖ + (1 − 𝜃)𝑦₂) ∈ 𝐶(epi 𝑓) for every 𝑘, and 𝜃𝓎ₖ + (1 − 𝜃)𝑦₂ ↓ 𝜃𝑦₁ + (1 − 𝜃)𝑦₂, and thus 𝜃(𝑥₁, 𝑦₁) + (1 − 𝜃)(𝑥₂, 𝑦₂) ∈ epi(conv 𝑓). The argument is similar when both (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) are not in 𝐶(epi 𝑓).

Lemma 13-5 Note it is possible that conv 𝑓 ≠ cl conv 𝑓. For example, if 𝑓 is a non-closed convex function, then conv 𝑓 = 𝑓, but cl conv 𝑓 = 𝑓̅ ≠ 𝑓 = conv 𝑓. However, cl conv 𝑓 is the closure of conv 𝑓, i.e. epi(cl conv 𝑓) = cl 𝐶(epi 𝑓) = cl(epi(conv 𝑓)), which is simply because epi(conv 𝑓) is 𝐶(epi 𝑓) plus some “vertical” limit points, so its closure must be cl 𝐶(epi 𝑓).

Property 13-14 dom 𝑓 ⊆ dom 𝑓̅ ⊆ dom(cl conv 𝑓), dom 𝑓 ⊆ dom(conv 𝑓) ⊆ dom(cl conv 𝑓). This is a trivial property due to epi 𝑓 ⊆ epi 𝑓̅ ⊆ epi(cl conv 𝑓), epi 𝑓 ⊆ epi(conv 𝑓) ⊆ epi(cl conv 𝑓). Note it is possible that dom 𝑓 ≠ dom 𝑓̅, for example 𝑓(𝑥) = 1, 𝑥 ∈ (0,1), whose closure is 𝑓̅(𝑥) = 1, 𝑥 ∈ [0,1]. Also, note the relation between dom 𝑓̅ and dom(conv 𝑓) is undetermined. As we can see, for 𝑓(𝑥) = 1, 𝑥 ∈ (0,1), we have dom 𝑓̅ = [0,1] ⊃ dom(conv 𝑓) = dom 𝑓 = (0,1), but for 𝑔(𝑥) = 1, 𝑥 ∈ (0,1)⋃(2,3), we have dom(conv 𝑔) = (0,3) while dom 𝑔̅ = [0,1]⋃[2,3], and neither of these two sets contains the other.



Theorem 13-4 For any function 𝑓: 𝐴 → ℝ̄, inf 𝑓 = inf 𝑓̅ = inf(conv 𝑓) = inf(cl conv 𝑓); and if 𝑓 attains its minimum at a point 𝑥∗, then 𝑓̅, conv 𝑓 and cl conv 𝑓 attain their minimum at 𝑥∗. This theorem basically implies that, theoretically, an optimization problem on any function 𝑓 can be converted to an optimization problem on the convex function conv 𝑓 or cl conv 𝑓.

If epi 𝑓 = ∅, then 𝑓 = 𝑓̅ = conv 𝑓 = cl conv 𝑓 ≡ +∞ and the claim trivially holds. If epi 𝑓 ≠ ∅, take any sequence {(𝑥ₖ, 𝑦ₖ)} ⊆ epi 𝑓̅ = cl(epi 𝑓) s.t. 𝑦ₖ → inf 𝑓̅; then for every 𝑘 there exists a sequence {(𝓍ₖ,ⱼ, 𝓎ₖ,ⱼ)} ⊆ epi 𝑓 s.t. (𝓍ₖ,ⱼ, 𝓎ₖ,ⱼ) → (𝑥ₖ, 𝑦ₖ) as 𝑗 → ∞. Note here either inf 𝑓̅ is finite or inf 𝑓̅ = −∞, but in either case we can construct a sequence {(𝓍ₖ, 𝓎ₖ)} ⊆ epi 𝑓 s.t. 𝓎ₖ → inf 𝑓̅: let 𝜖ₖ ↓ 0 (e.g. 𝜖ₖ = 1/𝑘), choose a large 𝑗 s.t. |𝓎ₖ,ⱼ − 𝑦ₖ| < 𝜖ₖ and let (𝓍ₖ, 𝓎ₖ) = (𝓍ₖ,ⱼ, 𝓎ₖ,ⱼ). We have inf 𝑓 ≤ 𝑓(𝓍ₖ) ≤ 𝓎ₖ since {(𝓍ₖ, 𝓎ₖ)} ⊆ epi 𝑓, then

inf 𝑓 ≤ limsup 𝑓(𝓍ₖ) ≤ inf 𝑓̅ ≤ 𝑓̅(𝑥) ≤ 𝑓(𝑥) for every 𝑥 ⇒ inf 𝑓̅ = inf 𝑓

Now let {(𝑥ₖ, 𝑦ₖ)} ⊆ 𝐶(epi 𝑓) s.t. 𝑦ₖ → inf(conv 𝑓), then each (𝑥ₖ, 𝑦ₖ) is a convex combination ∑ᵢ 𝜃ₖ,ᵢ(𝑥ₖ,ᵢ, 𝑦ₖ,ᵢ) of points in epi 𝑓, so 𝑦ₖ = ∑ᵢ 𝜃ₖ,ᵢ𝑦ₖ,ᵢ ≥ ∑ᵢ 𝜃ₖ,ᵢ𝑓(𝑥ₖ,ᵢ) ≥ inf 𝑓 for every 𝑘, which implies inf(conv 𝑓) ≥ inf 𝑓. However, conv 𝑓 ≤ 𝑓 since epi 𝑓 ⊆ epi(conv 𝑓), and so inf(conv 𝑓) ≤ inf 𝑓 ⇒ inf(conv 𝑓) = inf 𝑓. Last, cl conv 𝑓 is the closure of conv 𝑓, so inf(cl conv 𝑓) = inf(conv 𝑓) by the above discussion, and finally inf 𝑓 = inf 𝑓̅ = inf(conv 𝑓) = inf(cl conv 𝑓).

If 𝑥∗ attains inf 𝑓, i.e. 𝑓(𝑥∗) = inf 𝑓, then

inf 𝑓̅ = inf 𝑓 = 𝑓(𝑥∗) ≥ 𝑓̅(𝑥∗) ⇒ 𝑓̅(𝑥∗) = inf 𝑓̅

By exactly the same logic, we have (conv 𝑓)(𝑥∗) = (cl conv 𝑓)(𝑥∗) = inf 𝑓. Of course, 𝑥∗ attaining inf 𝑓̅, inf(conv 𝑓) or inf(cl conv 𝑓) does not imply 𝑓(𝑥∗) = inf 𝑓. For example, take 𝑓(𝑥) = 𝑥² for 𝑥 ∈ (−1,0)⋃(0,1) and 𝑓(𝑥) = 1 for 𝑥 = 0, whose 𝑓̅, conv 𝑓 and cl conv 𝑓 all equal 𝑥² on (−1,1) and attain their minimum at 𝑥 = 0, but 𝑓(0) ≠ inf 𝑓.
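The equality inf 𝑓 = inf(conv 𝑓) can be illustrated on sampled data. The sketch below is a discrete stand-in, not the exact construction of the text: it samples a made-up non-convex function on a grid and takes the lower convex hull of the sample points (the lower half of Andrew's monotone chain) as an approximation of conv 𝑓. The test function and the helper names are assumptions for illustration.

```python
def lower_convex_envelope(points):
    """Vertices of the lower convex hull of (x, y) samples, sorted by x."""
    hull = []
    for p in sorted(points):
        # Drop the last vertex while it lies on or above the segment hull[-2] -> p.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

def envelope_at(hull, x):
    """Piecewise-linear interpolation of the envelope at abscissa x."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= x <= x2:
            return y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    raise ValueError("x outside the sampled range")

def f(x):
    """Hypothetical non-convex test function (a tilted double well)."""
    return (x * x - 1.0) ** 2 + 0.3 * x

samples = [(i / 50.0, f(i / 50.0)) for i in range(-100, 101)]
hull = lower_convex_envelope(samples)

min_f = min(y for _, y in samples)
min_env = min(y for _, y in hull)
```

On the samples the envelope never exceeds 𝑓, it discards the points between the two wells, and its minimum coincides with the minimum of 𝑓, in line with the theorem.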

Theorem 13-5 For any closed function 𝑔: 𝐴 → ℝ̄, if 𝑔 ≤ 𝑓, then 𝑔 ≤ 𝑓̅; for any convex function 𝑔: 𝐴 → ℝ̄, if 𝑔 ≤ 𝑓, then 𝑔 ≤ conv 𝑓; for any closed convex function 𝑔: 𝐴 → ℝ̄, if 𝑔 ≤ 𝑓, then 𝑔 ≤ cl conv 𝑓. In other words, 𝑓̅, conv 𝑓, cl conv 𝑓 are the greatest functions below 𝑓 that are closed, convex and closed-convex respectively, or they differ minimally from 𝑓 w.r.t. their respective restrictions.

Clearly epi 𝑓 ⊆ epi 𝑓̅ and epi 𝑓̅ is the smallest closed set that contains epi 𝑓. In other words, epi 𝑓̅ is the intersection of all closed sets that contain epi 𝑓, and thus for any closed function 𝑔 ≤ 𝑓 we have epi 𝑓 ⊆ epi 𝑔, hence epi 𝑓̅ ⊆ epi 𝑔 ⇒ 𝑔 ≤ 𝑓̅. The argument is the same for showing that any closed convex function 𝑔 ≤ 𝑓 satisfies 𝑔 ≤ cl conv 𝑓, because epi(cl conv 𝑓) is the smallest closed set that contains 𝐶(epi 𝑓). For conv 𝑓, we can see that epi(conv 𝑓) minimally adds points to 𝐶(epi 𝑓) to turn it into an epigraph (remove any one added point, and some (𝐶(epi 𝑓))ℝ(𝑥) will have an open end). For any convex function 𝑔 ≤ 𝑓, epi 𝑔 is a convex epigraph containing epi 𝑓, so we must have epi 𝑔 ⊇ epi(conv 𝑓) ⊇ 𝐶(epi 𝑓), i.e. 𝑔 ≤ conv 𝑓.

Now only consider convex functions 𝑓: 𝐴 → ℝ̄. For convex functions, conv 𝑓 = 𝑓 and 𝑓̅ = cl conv 𝑓.

Property 13-15 cl(dom 𝑓) = cl(dom 𝑓̅), or equivalently ri(dom 𝑓) = ri(dom 𝑓̅). Note

epi 𝑓 = {(𝑥, 𝑦): 𝑥 ∈ dom 𝑓, 𝑦 ≥ 𝑓(𝑥)}
epi 𝑓̅ = {(𝑥, 𝑦): 𝑥 ∈ dom 𝑓̅, 𝑦 ≥ 𝑓̅(𝑥)}

By Property 12-24 we have

ri(epi 𝑓) = {(𝑥, 𝑦): 𝑥 ∈ ri(dom 𝑓), 𝑦 > 𝑓(𝑥)}
ri(epi 𝑓̅) = {(𝑥, 𝑦): 𝑥 ∈ ri(dom 𝑓̅), 𝑦 > 𝑓̅(𝑥)}

Since epi 𝑓̅ = cl(epi 𝑓), we have cl(epi 𝑓) = cl(epi 𝑓̅), then by Property 12-19 we have ri(epi 𝑓) = ri(epi 𝑓̅), implying ri(dom 𝑓) = ri(dom 𝑓̅) and cl(dom 𝑓) = cl(dom 𝑓̅). Note it is possible that cl(dom 𝑓) ⊃ dom 𝑓̅ strictly. Consider, for example, 𝑓(𝑥) = 1/𝑥, 𝑥 > 0: 𝑓 is already a closed convex function whose epigraph is closed, so dom 𝑓̅ = dom 𝑓 = (0, +∞), while cl(dom 𝑓) = [0, +∞).



Property 13-16 𝑓(𝑥) = 𝑓̅(𝑥), ∀𝑥 ∈ ri(dom 𝑓). By the previous property we have

ri(epi 𝑓) = {(𝑥, 𝑦): 𝑥 ∈ ri(dom 𝑓), 𝑦 > 𝑓(𝑥)} = {(𝑥, 𝑦): 𝑥 ∈ ri(dom 𝑓̅), 𝑦 > 𝑓̅(𝑥)} = ri(epi 𝑓̅)

Thus, for any 𝑥 ∈ ri(dom 𝑓) = ri(dom 𝑓̅),

𝑓(𝑥) = inf{𝑦: (𝑥, 𝑦) ∈ ri(epi 𝑓)} = inf{𝑦: (𝑥, 𝑦) ∈ ri(epi 𝑓̅)} = 𝑓̅(𝑥)



Property 13-17 𝑓 is proper iff 𝑓̅ is proper, or equivalently 𝑓̅ is improper iff 𝑓 is improper. Note 𝑓 ≡ +∞ iff 𝑓̅ ≡ +∞, so in this case the property holds trivially. Now consider the case of epi 𝑓 ≠ ∅. Necessity. If 𝑓̅ is improper, then 𝑓̅ = −∞ over dom 𝑓̅ by EX 30, which means 𝑓(𝑥) = 𝑓̅(𝑥) = −∞, ∀𝑥 ∈ ri(dom 𝑓̅) by the previous property, and thus 𝑓 is improper. Sufficiency is trivial: if 𝑓 is improper, then 𝑓(𝑥) = −∞ for some 𝑥, and 𝑓̅(𝑥) = −∞ at the same 𝑥 since 𝑓̅ ≤ 𝑓.



Property 13-18 For any 𝑥₀ ∈ ri(dom 𝑓), we have 𝑓̅(𝑥) = lim 𝑓(𝑥 + ℎ𝑥𝑥₀⃗) as ℎ ↓ 0, ∀𝑥 ∈ 𝐴. This says 𝑓̅(𝑥) can be estimated as a limit of 𝑓 along certain directions.

If 𝑥 ∉ dom 𝑓̅, then 𝑓̅(𝑥) = +∞. 𝑓̅ is LSC by Lemma 13-3, so for any {𝑥ₖ}, 𝑥ₖ → 𝑥 we have 𝑓̅(𝑥) ≤ liminf 𝑓̅(𝑥ₖ) ⇒ liminf 𝑓̅(𝑥ₖ) = +∞ ⇒ 𝑓̅(𝑥ₖ) → +∞. Thus, for any sequence {ℎₖ}, ℎₖ ↓ 0, we have 𝑥 + ℎₖ𝑥𝑥₀⃗ → 𝑥 and 𝑓̅(𝑥 + ℎₖ𝑥𝑥₀⃗) → +∞. Since 𝑓̅ ≤ 𝑓, we also have 𝑓(𝑥 + ℎₖ𝑥𝑥₀⃗) → +∞ for any such sequence, which implies lim 𝑓(𝑥 + ℎ𝑥𝑥₀⃗) = +∞ = 𝑓̅(𝑥) as ℎ ↓ 0.

If 𝑥 ∈ dom 𝑓̅ ⊆ cl(dom 𝑓), note 𝑥₀ ∈ ri(dom 𝑓) = ri(dom 𝑓̅), then 𝑥 + ℎ𝑥𝑥₀⃗, ℎ ∈ [0,1] represents the segment [𝑥, 𝑥₀]. By Theorem 12-2 we have 𝑥 + ℎ𝑥𝑥₀⃗ ∈ ri(dom 𝑓), ℎ ∈ (0,1], and by Property 13-16 we have 𝑓̅(𝑥 + ℎ𝑥𝑥₀⃗) = 𝑓(𝑥 + ℎ𝑥𝑥₀⃗), ∀ℎ ∈ (0,1].

1) If 𝑓̅(𝑥) = −∞ for some 𝑥 ∈ dom 𝑓̅, then 𝑓̅ ≡ −∞ over dom 𝑓̅ by EX 30, which means 𝑓(𝑥) = 𝑓̅(𝑥) = −∞, ∀𝑥 ∈ ri(dom 𝑓̅) by Property 13-16, thus 𝑓(𝑥 + ℎ𝑥𝑥₀⃗) = −∞, ∀ℎ ∈ (0,1], and hence lim 𝑓(𝑥 + ℎ𝑥𝑥₀⃗) = −∞ as ℎ ↓ 0.

2) If 𝑓̅(𝑥) is finite for every 𝑥 ∈ dom 𝑓̅, then both 𝑓̅ and 𝑓 are proper, then 𝑓̅ and 𝑓 are continuous over dom 𝑓̅ and dom 𝑓 respectively by Theorem 13-3, and

𝑓̅(𝑥) = lim 𝑓̅(𝑥 + ℎ𝑥𝑥₀⃗) = lim 𝑓(𝑥 + ℎ𝑥𝑥₀⃗) as ℎ ↓ 0

where we use the fact that if a function is continuous at some point 𝑥, then it is continuous along any direction, proved later by Lemma 14-8.

EX 29. The following gives some discussion of an improper convex function 𝑓: ℝ → ℝ̄ taking −∞ values.

It is possible that 𝑓⁻¹(−∞) = ℝ, since then epi 𝑓 = ℝ² is a convex set.

𝑓⁻¹(−∞) cannot contain two intervals separated by a point where 𝑓 > −∞. WLOG, let (𝑎, 𝑏), (𝑐, 𝑑) be two such intervals with 𝑓(𝑧) > −∞ for some 𝑧 ∈ [𝑏, 𝑐], then 𝑧 = 𝜃𝑥₁ + (1 − 𝜃)𝑥₂ for some 𝑥₁ ∈ (𝑎, 𝑏), 𝑥₂ ∈ (𝑐, 𝑑), and for any 𝑦 < 𝑓(𝑧) we find (𝑥₁, 𝑦), (𝑥₂, 𝑦) ∈ epi 𝑓 but 𝜃(𝑥₁, 𝑦) + (1 − 𝜃)(𝑥₂, 𝑦) ∉ epi 𝑓. The proof is the same for other types of intervals. Thus 𝑓⁻¹(−∞) is a single interval.

If there exist 𝑎, 𝑏 ∈ ℝ s.t. 𝑓⁻¹(−∞) = (𝑎, 𝑏), then we can let 𝑓(𝑎), 𝑓(𝑏) ∈ ℝ and 𝑓⁻¹(+∞) = (−∞, 𝑎)⋃(𝑏, +∞). In this case for any (𝑥₁, 𝑦₁), (𝑥₂, 𝑦₂) ∈ epi 𝑓 with 𝑥₁ ≠ 𝑥₂ (the case 𝑥₁ = 𝑥₂ being trivial), 𝜃𝑥₁ + (1 − 𝜃)𝑥₂ ∈ (𝑎, 𝑏) for 𝜃 ∈ (0,1), and hence (𝜃𝑥₁ + (1 − 𝜃)𝑥₂, 𝜃𝑦₁ + (1 − 𝜃)𝑦₂) ∈ epi 𝑓. It is easy to see we can also let 𝑓(𝑎), 𝑓(𝑏) = ±∞.

It is not possible to let 𝑓(𝑥₁) ∈ ℝ for any 𝑥₁ ∈ (−∞, 𝑎). Suppose not, let 𝑦₁ = 𝑓(𝑥₁), and choose 𝑥₂ ∈ (𝑎, 𝑏). Now if epi 𝑓 is convex, then 𝜃(𝑥₁, 𝑦₁) + (1 − 𝜃)(𝑥₂, 𝑦₂) ∈ epi 𝑓 for any 𝜃 ∈ (0,1) and any 𝑦₂ ∈ (−∞, +∞), then (𝑥₁, 𝑥₂)×(−∞, +∞) ⊂ epi 𝑓 and 𝑓⁻¹(−∞) ⊇ (𝑥₁, 𝑏) ⇒ 𝑎 ≤ 𝑥₁, which is absurd since 𝑥₁ < 𝑎 by assumption. For a similar reason, it is not possible to let 𝑓(𝑥) ∈ ℝ for any 𝑥 ∈ (𝑏, +∞).

As a result, if 𝑓: ℝ → ℝ̄ takes −∞ at some 𝑥 ∈ ℝ, then 𝑓⁻¹(−∞) is a single interval (𝑎, 𝑏) or (𝑎, 𝑏] or [𝑎, 𝑏) or [𝑎, 𝑏].

EX 30. Show that a closed improper convex function 𝑓: 𝐴 → ℝ̄ must have 𝑓(𝑥) = −∞, ∀𝑥 ∈ dom 𝑓. Key. If 𝑓 ≡ +∞, then dom 𝑓 = ∅ and there is nothing to prove. If 𝑓(𝑥) = −∞ for some 𝑥 ∈ dom 𝑓, assume there is another point 𝑥₀ ∈ dom 𝑓 s.t. 𝑓(𝑥₀) is finite, then consider the following sequence

𝑥ₖ = ((𝑘 − 1)/𝑘)𝑥₀ + (1/𝑘)𝑥, 𝑘 = 1,2,… ⇒ 𝑥ₖ → 𝑥₀

We have 𝑓(𝑥ₖ) ≤ ((𝑘 − 1)/𝑘)𝑓(𝑥₀) + (1/𝑘)𝑓(𝑥) = −∞ ⇒ 𝑓(𝑥ₖ) = −∞ for every 𝑘 by Jensen’s inequality. Since 𝑓 is LSC by Lemma 13-3, we have 𝑓(𝑥₀) ≤ liminf 𝑓(𝑥ₖ) = −∞ ⇒ 𝑓(𝑥₀) = −∞, which is a contradiction.
