Variational Methods in Computer Vision
ICCV Tutorial, 6.11.2011

Chapter 1: Introduction and Mathematical Foundations

Daniel Cremers and Bastian Goldlücke
Computer Vision Group, Technical University of Munich

Thomas Pock
Institute for Computer Graphics and Vision, Graz University of Technology

Overview

1. Variational Methods
   • Introduction
   • Convex vs. non-convex functionals
   • Archetypical model: ROF denoising
   • The variational principle
   • The Euler-Lagrange equation
2. Total Variation and Co-Area
   • The space BV(Ω)
   • Geometric properties
   • Co-area
3. Convex analysis
   • Convex functionals
   • Constrained problems
   • Conjugate functionals
   • Subdifferential calculus
   • Proximation and implicit subgradient descent
4. Summary


Introduction

Fundamental problems in computer vision

[Figures: image labeling problems, including segmentation and classification, stereo, and optic flow]

[Figure: 3D reconstruction]

Variational methods

The unifying concept is the variational approach: the solution of a problem is the minimizer of an energy functional E,

    û = argmin_{u ∈ V} E(u).

In the variational framework, we adopt a continuous world view.

Images are functions

A greyscale image is a real-valued function u : Ω → R on an open set Ω ⊂ R².

Surfaces are manifolds

[Figure: in 2D, a segmentation into region Ω₁ (flower) and background Ω₀, separated by a contour Σ; an analogous picture holds in 3D]

A volume is usually modeled as the level set {x ∈ Ω : u(x) = 1} of a binary function u : Ω → {0, 1}.

Convex versus non-convex energies

Non-convex energy:
• Cannot, in general, be minimized globally
• Realistic modeling

Convex energy:
• Efficient global minimization
• Often unrealistic models

Convex relaxation

Convex relaxation: the best of both worlds?

• Start with a realistic non-convex model energy E.
• Relax it to a convex lower bound R, which can be minimized efficiently.
• Find a (hopefully small) optimality bound ε to estimate the quality of the solution.

A simple (but important) example: denoising

The TV-L² (ROF) model, Rudin-Osher-Fatemi 1992. For a given noisy input image f, compute

    argmin_{u ∈ L²(Ω)}  ∫_Ω |∇u|₂ dx + (1/2λ) ∫_Ω (u − f)² dx,

where the first term is the regularizer (prior) and the second is the data (model) term.

Note: in Bayesian statistics, this can be interpreted as a MAP estimate for Gaussian noise.

[Figure: original image, noisy image, and denoised result for λ = 2]
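As a rough numerical illustration (not part of the tutorial; function names, step size and parameter values are illustrative assumptions), the ROF energy can be minimized by plain gradient descent on a smoothed surrogate in which |∇u|₂ is replaced by sqrt(|∇u|₂² + ε²) to make the regularizer differentiable:

```python
import numpy as np

def gradient(u):
    # forward differences with replicate boundary (last row/column of the diff is 0)
    ux = np.diff(u, axis=1, append=u[:, -1:])
    uy = np.diff(u, axis=0, append=u[-1:, :])
    return ux, uy

def divergence(px, py):
    # negative adjoint of `gradient` (standard Chambolle-style discretization)
    dx = np.zeros_like(px)
    dx[:, 0], dx[:, 1:-1], dx[:, -1] = px[:, 0], px[:, 1:-1] - px[:, :-2], -px[:, -2]
    dy = np.zeros_like(py)
    dy[0, :], dy[1:-1, :], dy[-1, :] = py[0, :], py[1:-1, :] - py[:-2, :], -py[-2, :]
    return dx + dy

def rof_energy(u, f, lam, eps):
    # smoothed discrete ROF energy: sum sqrt(|grad u|^2 + eps^2) + (1/2 lam) sum (u-f)^2
    ux, uy = gradient(u)
    return np.sum(np.sqrt(ux**2 + uy**2 + eps**2)) + np.sum((u - f)**2) / (2 * lam)

def rof_denoise(f, lam=0.15, eps=0.1, tau=0.01, iters=200):
    # gradient descent; tau is chosen below 1/L for the smoothed energy,
    # so the energy decreases monotonically
    u = f.copy()
    for _ in range(iters):
        ux, uy = gradient(u)
        mag = np.sqrt(ux**2 + uy**2 + eps**2)
        u = u - tau * (-divergence(ux / mag, uy / mag) + (u - f) / lam)
    return u
```

Smoothing the TV term sidesteps its non-differentiability; the tools developed later in this chapter (subgradients, proximation) handle the non-smooth term without this approximation.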

Rⁿ vs. L²(Ω)

                         V = Rⁿ                            V = L²(Ω)
  Elements               finitely many components          infinitely many "components"
                         xᵢ, 1 ≤ i ≤ n                     u(x), x ∈ Ω
  Inner product          (x, y) = Σᵢ₌₁ⁿ xᵢ yᵢ              (u, v) = ∫_Ω u v dx
  Norm                   |x|₂ = (Σᵢ₌₁ⁿ xᵢ²)^{1/2}          ‖u‖₂ = (∫_Ω |u|² dx)^{1/2}

Derivatives of a functional E : V → R:

  Gradient (Fréchet)        dE(x) = ∇E(x)                  dE(u) = ?
  Directional (Gâteaux)     δE(x; h) = ∇E(x) · h           δE(u; h) = ?
  Condition for minimum     ∇E(x̂) = 0                      ?

Gâteaux differential

Definition. Let V be a vector space, E : V → R a functional, and u, h ∈ V. If the limit

    δE(u; h) := lim_{α → 0} (1/α) (E(u + αh) − E(u))

exists, it is called the Gâteaux differential of E at u with increment h.

• The Gâteaux differential can be thought of as the directional derivative of E at u in direction h.
• A classical term for the Gâteaux differential is the "variation of E", hence the term "variational methods": you test how the functional "varies" when you go into direction h.
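The definition can be checked numerically on a discretized functional. This small sketch (with illustrative names) compares a finite-difference approximation of the Gâteaux differential with the analytic directional derivative of a discrete Dirichlet-type energy:

```python
import numpy as np

def gateaux(E, u, h, alpha=1e-6):
    # central-difference approximation of delta E(u; h) = lim (E(u + a h) - E(u)) / a
    return (E(u + alpha * h) - E(u - alpha * h)) / (2 * alpha)

def E(u):
    # discretized Dirichlet-type energy E(u) = 0.5 * sum (u[i+1] - u[i])^2
    return 0.5 * np.sum(np.diff(u) ** 2)

rng = np.random.default_rng(0)
u, h = rng.normal(size=50), rng.normal(size=50)

# analytic directional derivative: sum over (u[i+1]-u[i]) * (h[i+1]-h[i])
analytic = np.sum(np.diff(u) * np.diff(h))
assert abs(gateaux(E, u, h) - analytic) < 1e-5
```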

The variational principle

The variational principle is a generalization of the necessary condition for extrema of functions on Rⁿ.

Theorem (variational principle). If û ∈ V is an extremum of a functional E : V → R, then

    δE(û; h) = 0  for all h ∈ V.

For a proof, note that if û is an extremum of E, then 0 must be an extremum of the real function t ↦ E(û + th) for all h.

Derivation of the Euler-Lagrange equation (1)

Method:

• Compute the Gâteaux derivative of E at u in direction h, and write it in the form

      δE(u; h) = ∫_Ω φ_u h dx,

  with a function φ_u : Ω → R and a test function h ∈ C_c^∞(Ω).
• At an extremum, this expression must be zero for arbitrary test functions h, thus (due to the du Bois-Reymond lemma) you get the condition φ_u = 0. This is the Euler-Lagrange equation of the functional E.
• Note: the form above is in analogy to the finite-dimensional case, where the gradient satisfies δE(x; h) = ⟨∇E(x), h⟩.

Euler-Lagrange equation

The Euler-Lagrange equation is a PDE which has to be satisfied by an extremal point û. A ready-to-use formula can be derived for energy functionals of a specific, but very common form.

Theorem. Let û be an extremum of the functional E : C¹(Ω) → R, where E is of the form

    E(u) = ∫_Ω L(u, ∇u, x) dx,

with L : R × Rⁿ × Ω → R, (a, b, x) ↦ L(a, b, x) continuously differentiable. Then û satisfies the Euler-Lagrange equation

    ∂_a L(u, ∇u, x) − div_x [∇_b L(u, ∇u, x)] = 0,

where the divergence is computed with respect to the location variable x, and

    ∂_a L := ∂L/∂a,   ∇_b L := (∂L/∂b₁, …, ∂L/∂b_n)ᵀ.

Derivation of the Euler-Lagrange equation (2)

The Gâteaux derivative of E at u in direction h is

    δE(u; h) = lim_{α → 0} (1/α) ∫_Ω L(u + αh, ∇(u + αh), x) − L(u, ∇u, x) dx.

Because of the assumptions on L, we can take the limit below the integral and apply the chain rule to get

    δE(u; h) = ∫_Ω ∂_a L(u, ∇u, x) h + ∇_b L(u, ∇u, x) · ∇h dx.

Applying integration by parts to the second part of the integral with p = ∇_b L(u, ∇u, x), and noting h|_∂Ω = 0, we get

    δE(u; h) = ∫_Ω ( ∂_a L(u, ∇u, x) − div_x [∇_b L(u, ∇u, x)] ) · h dx.

This is the desired expression, from which we can directly read off the definition of φ_u.
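A quick discrete consistency check (illustrative, not from the slides): for L(a, b, x) = b²/2 + (a − f(x))²/2 the Euler-Lagrange equation reads (u − f) − u″ = 0, and the gradient of the discretized energy at a grid node should match the discretized Euler-Lagrange operator scaled by the cell size dx:

```python
import numpy as np

n = 200
dx = 1.0 / n
x = np.linspace(0, 1, n)
f = np.sin(2 * np.pi * x)
u = np.cos(3 * np.pi * x)

def energy(u):
    # discretization of E(u) = int 0.5*(u')^2 + 0.5*(u - f)^2 dx
    return np.sum(0.5 * (np.diff(u) / dx) ** 2) * dx + np.sum(0.5 * (u - f) ** 2) * dx

# finite-difference gradient of the discrete energy at an interior node i
i = 77
eps = 1e-6
e = np.zeros(n); e[i] = eps
num_grad = (energy(u + e) - energy(u - e)) / (2 * eps)

# discrete Euler-Lagrange operator (u - f) - u'' at node i, scaled by dx
upp = (u[i - 1] - 2 * u[i] + u[i + 1]) / dx**2
el = ((u[i] - f[i]) - upp) * dx
assert abs(num_grad - el) < 1e-4
```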

Open questions

• The regularizer of the ROF functional is

      ∫_Ω |∇u|₂ dx,

  which requires u to be differentiable. Yet we are looking for minimizers in L²(Ω). It is necessary to generalize the definition of the regularizer, which will lead to the total variation in the next section.
• The total variation is not a differentiable functional, so the variational principle is not applicable. We need a theory for convex, but not differentiable, functionals.

Total Variation and Co-Area

Definition of the total variation

• Let u ∈ L¹_loc(Ω). Then the total variation of u is defined as

      J(u) := sup { −∫_Ω u · div(ξ) dx : ξ ∈ C_c¹(Ω, Rⁿ), ‖ξ‖_∞ ≤ 1 }.

• The space BV(Ω) of functions of bounded variation is defined as

      BV(Ω) := { u ∈ L¹_loc(Ω) : J(u) < ∞ }.

An idea why this is the same as before: the norm of the gradient |∇u(x)|₂ below the integral can, for a fixed point x, be written as

    |∇u(x)|₂ = sup_{ξ ∈ Rⁿ : |ξ|₂ ≤ 1} ∇u(x) · ξ.

We can then use Gauss' theorem again to shift the gradient from u to a divergence on ξ.
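In one dimension the dual definition has a transparent discrete analogue (illustrative sketch): the supremum of −Σ uᵢ (div ξ)ᵢ over fields |ξᵢ| ≤ 1 is attained at ξ = sign of the forward differences, and equals Σ |u_{i+1} − uᵢ|:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.normal(size=30)

# primal value: sum |u_{i+1} - u_i|
primal = np.sum(np.abs(np.diff(u)))

# maximizing dual field, |xi_i| <= 1 (discrete analogue of ||xi||_inf <= 1)
xi = np.sign(np.diff(u))
# discrete divergence: (div xi)_i = xi_i - xi_{i-1}, with zero boundary values
div = np.diff(np.concatenate(([0.0], xi, [0.0])))
dual = -np.sum(u * div)

assert abs(primal - dual) < 1e-10
```

Summation by parts (the discrete Gauss theorem) turns the dual expression into Σ ξᵢ (u_{i+1} − uᵢ), which is maximized by the sign field.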

Convexity and lower semi-continuity

Below are the main analytical properties of the total variation. It also enjoys a number of interesting geometric relationships, which will be explored next.

Proposition.
• J is a semi-norm on BV(Ω), and it is convex on L²(Ω).
• J is lower semi-continuous on L²(Ω), i.e.

      ‖u_n − u‖₂ → 0  ⟹  J(u) ≤ lim inf_n J(u_n).

Convexity can be shown immediately from the definition; lower semi-continuity requires Fatou's lemma. Lower semi-continuity is important for the existence of minimizers, see the next section.

Characteristic functions of sets

Let U ⊂ Ω. Then the characteristic function of U is defined as

    1_U(x) := 1 if x ∈ U,  0 otherwise.

[Figure: a set U = {1_U = 1} with boundary ∂U inside the domain {1_U = 0}]

Notation. If f : Ω → R, then {f = 0} is a short notation for the set {x ∈ Ω : f(x) = 0} ⊂ Ω. Similar notation is used for inequalities and other properties.

Total variation of a characteristic function

We now compute the TV of the characteristic function of a "sufficiently nice" set U ⊂ Ω with C¹-boundary. Remember: to compute the total variation, one maximizes over all vector fields ξ ∈ C_c¹(Ω, Rⁿ) with ‖ξ‖_∞ ≤ 1:

    −∫_Ω 1_U · div(ξ) dx = −∫_U div(ξ) dx = ∫_∂U n · ξ ds    (Gauss' theorem)

The expression is maximized by any vector field with ξ|_∂U = n, hence

    J(1_U) = ∫_∂U ds = H^{n−1}(∂U).

Here H^{n−1} is the (n−1)-dimensional Hausdorff measure, i.e. the length of the boundary in the case n = 2, or its area for n = 3.
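A 1D sanity check (illustrative): for U = [a, b] ⊂ R the boundary consists of two points, so J(1_U) = H⁰(∂U) = 2. Discretely, the total variation of the indicator counts its two jumps:

```python
import numpy as np

x = np.linspace(0, 1, 1000)
ind = ((x >= 0.3) & (x <= 0.7)).astype(float)   # 1_U for U = [0.3, 0.7]
tv = np.sum(np.abs(np.diff(ind)))               # one jump up + one jump down
assert tv == 2.0
```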

The co-area formula

The co-area formula, in its geometric form, says that the total variation of a function equals the integral over the (n−1)-dimensional areas of the boundaries of all its lower level sets. More precisely:

Theorem (co-area formula). Let u ∈ BV(Ω). Then

    J(u) = ∫_{−∞}^{∞} J(1_{u ≤ t}) dt.

Convex analysis

The epigraph of a functional

Definition. The epigraph epi(f) of a functional f : V → R ∪ {∞} is the set "above the graph", i.e.

    epi(f) := {(x, µ) : x ∈ V and µ ≥ f(x)}.

[Figure: the epigraph epi(f) above the graph of f over its domain dom(f)]

Convex functionals

We choose the geometric definition of a convex function here because it is more intuitive; the usual algebraic property is a simple consequence.

Definition.
• A functional f : V → R ∪ {∞} is called proper if f is not identically ∞, or equivalently, if the epigraph is non-empty.
• A functional f : V → R ∪ {∞} is called convex if epi(f) is a convex set.
• The set of all proper and convex functionals on V is denoted conv(V).

The only non-proper functional is the constant function f = ∞. We exclude it right away, otherwise some theorems become cumbersome to formulate. From now on, every functional we write down will be proper.

Extrema of convex functionals

Convex functionals have some very important properties with respect to optimization.

Proposition. Let f ∈ conv(V). Then
• the set of minimizers argmin_{x ∈ V} f(x) is convex (possibly empty);
• if x̂ is a local minimum of f, then x̂ is in fact a global minimum, i.e. x̂ ∈ argmin_{x ∈ V} f(x).

Both can easily be deduced from the convexity of the epigraph.

Lower semi-continuity and closed functionals

Lower semi-continuity is an important property for convex functionals, since together with coercivity it guarantees the existence of a minimizer. It has an intuitive geometric interpretation.

Definition. Let f : V → R ∪ {∞} be a functional. Then f is called closed if epi(f) is a closed set.

Proposition (closedness and lower semi-continuity). For a functional f : V → R ∪ {∞}, the following two are equivalent:
• f is closed;
• f is lower semi-continuous, i.e.

      f(x) ≤ lim inf_{x_n → x} f(x_n)

  for any sequence (x_n) which converges to x.

An existence theorem for a minimum

Definition. Let f : V → R ∪ {∞} be a functional. Then f is called coercive if it is "unbounded at infinity". Precisely: for any sequence (x_n) ⊂ V with lim ‖x_n‖ = ∞, we have lim f(x_n) = ∞.

Theorem. Let f be a closed, coercive and convex functional on a Banach space V. Then f attains a minimum on V.

The requirement of coercivity can be weakened; a precise condition and proof can be formulated with the subdifferential calculus. On Hilbert spaces (and more generally, the so-called "reflexive" Banach spaces), the requirement of "closed and convex" can be replaced by "weakly lower semi-continuous". See [Rockafellar] for details.

Examples

• The function x ↦ exp(x) is convex and lower semi-continuous, but not coercive on R. The infimum 0 is not attained.
• The function

      x ↦ ∞ if x ≤ 0,  x² if x > 0

  is convex and coercive, but not closed on R. The infimum 0 is not attained.
• The functional of the ROF model is closed and convex. It is also coercive on L²(Ω): from the inverse triangle inequality, |‖u‖₂ − ‖f‖₂| ≤ ‖u − f‖₂. Thus, if ‖u_n‖₂ → ∞, then

      E(u_n) ≥ (1/2λ) ‖u_n − f‖₂² ≥ (1/2λ) (‖u_n‖₂ − ‖f‖₂)² → ∞.

  Therefore, there exists a minimizer of ROF for each input f ∈ L²(Ω).

The indicator function of a set

Definition. For any subset S ⊂ V of a vector space, the indicator function δ_S : V → R ∪ {∞} is defined as

    δ_S(x) := 0 if x ∈ S,  ∞ if x ∉ S.

Indicator functions are examples of particularly simple convex functions, as they take only two different values.

Proposition (convexity of indicator functions). S is a convex set if and only if δ_S is a convex function.

The proposition is easy to prove (exercise). Note that by convention, r < ∞ for all r ∈ R.

Constrained problems

Suppose you want to find the minimizer of a convex functional f : C → R defined on a convex set C ⊂ V. You can always exchange this for an unconstrained problem with the same minimizer: introduce an extended functional

    f̃ : V → R ∪ {∞},   f̃(x) := f(x) if x ∈ C,  ∞ otherwise.

Then

    argmin_{x ∈ C} f(x) = argmin_{x ∈ V} f̃(x),

and f̃ is convex. Similarly, if f : V → R is defined on the whole space V, then

    argmin_{x ∈ C} f(x) = argmin_{x ∈ V} [f(x) + δ_C(x)],

and the functional on the right-hand side is convex.
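A toy numerical illustration (all names and values chosen for the example): minimizing f(x) = (x − 3)² over C = [0, 1] directly, and via the unconstrained surrogate f + δ_C, yields the same minimizer x = 1:

```python
import numpy as np

x = np.linspace(-2, 4, 6001)
f = (x - 3.0) ** 2
in_C = (x >= 0) & (x <= 1)
delta_C = np.where(in_C, 0.0, np.inf)     # indicator function of C = [0, 1]

constrained = x[in_C][np.argmin(f[in_C])]           # minimize over C only
unconstrained = x[np.argmin(f + delta_C)]           # minimize f + delta_C over all x
assert abs(constrained - 1.0) < 1e-9
assert abs(unconstrained - 1.0) < 1e-9
```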

Affine functions

Note: if you do not know what the dual space V* of a vector space is, you can substitute V itself; we work in the Hilbert space L²(Ω), so the two are the same.

Definition. Let ϕ ∈ V* and c ∈ R. Then an affine function on V is given by

    h_{ϕ,c} : x ↦ ⟨x, ϕ⟩ − c.

We call ϕ the slope and c the intercept of h_{ϕ,c}.

[Figure: graph of the affine function h_{ϕ,c} in V × R, with slope ϕ and intercept −c]

Affine functions below a functional

We would like to find the largest affine function below f. For this, consider for each x ∈ V the affine function which passes through (x, f(x)):

    h_{ϕ,c}(x) = f(x)  ⟺  ⟨x, ϕ⟩ − c = f(x)  ⟺  c = ⟨x, ϕ⟩ − f(x).

[Figure: the affine function h_{ϕ, ⟨x,ϕ⟩ − f(x)} touching the epigraph of f at (x, f(x))]

To get the largest affine function of slope ϕ below f, we have to pass to the supremum over x. The intercept of this function is called the conjugate functional of f.

Conjugate functionals

Definition. Let f ∈ conv(V). Then the conjugate functional f* : V* → R ∪ {∞} is defined as

    f*(ϕ) := sup_{x ∈ V} [⟨x, ϕ⟩ − f(x)].

[Figure: the affine function h_{ϕ, f*(ϕ)} with slope ϕ and intercept −f*(ϕ) is the largest affine function of slope ϕ below the epigraph of f]
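The conjugate can be approximated by taking the supremum over a grid. This sketch (grids and tolerances are illustrative) recovers two classical facts: f(x) = x²/2 is self-conjugate, and the conjugate of |x| is the indicator function of [−1, 1]:

```python
import numpy as np

x = np.linspace(-10, 10, 20001)

def conjugate(f_vals, phi):
    # f*(phi) = sup_x [x*phi - f(x)], approximated over the grid x
    return np.max(x * phi - f_vals)

f = 0.5 * x ** 2
for phi in [-2.0, 0.0, 1.5]:
    assert abs(conjugate(f, phi) - 0.5 * phi ** 2) < 1e-5   # f* = f for x^2/2

g = np.abs(x)
assert abs(conjugate(g, 0.5)) < 1e-9     # inside [-1, 1]: f*(phi) = 0
assert conjugate(g, 2.0) > 9.0           # outside: f* = infinity (grows with the grid)
```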

Second conjugate

The epigraph of f* consists of all pairs (ϕ, c) such that h_{ϕ,c} lies below f. It almost completely characterizes f; the reason for the "almost" is that you can recover f only up to closure.

Theorem. Let f ∈ conv(V) be closed and V be reflexive, i.e. V** = V. Then f** = f.

For the proof, note that

    f(x) = sup_{h_{ϕ,c} ≤ f} h_{ϕ,c}(x) = sup_{(ϕ,c) ∈ epi(f*)} h_{ϕ,c}(x) = sup_{ϕ ∈ V*} [⟨x, ϕ⟩ − f*(ϕ)] = f**(x).

The first equality is intuitive, but surprisingly difficult to show; it ultimately relies on the theorem of Hahn-Banach.
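The theorem can be illustrated numerically: conjugating twice on a grid recovers a closed convex function such as f(x) = |x|. The grids below are illustrative; the dual grid must cover all slopes that f actually attains:

```python
import numpy as np

x = np.linspace(-5, 5, 2001)

def conj(vals, grid_in, grid_out):
    # f*(p) = sup over grid_in of [t*p - f(t)], evaluated for each p in grid_out
    return np.array([np.max(grid_in * p - vals) for p in grid_out])

phi = np.linspace(-3, 3, 1201)      # covers the slopes of f, which lie in [-1, 1]
f = np.abs(x)
f_star = conj(f, x, phi)
f_bistar = conj(f_star, phi, x)

# closed convex f is recovered by the biconjugate, up to grid error
assert np.max(np.abs(f_bistar - f)) < 1e-2
```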

The subdifferential

Definition.
• Let f ∈ conv(V). A vector ϕ ∈ V* is called a subgradient of f at x ∈ V if

      f(y) ≥ f(x) + ⟨y − x, ϕ⟩  for all y ∈ V.

• The set of all subgradients of f at x is called the subdifferential ∂f(x).

Geometrically speaking, ϕ is a subgradient if the graph of the affine function h(y) = f(x) + ⟨y − x, ϕ⟩ lies below the epigraph of f. Note that also h(x) = f(x), so it "touches" the epigraph.

Example: the subdifferential of f : x ↦ |x| at 0 is ∂f(0) = [−1, 1].

[Figure: supporting lines of slope ϕ ∈ [−1, 1] through the origin, all lying below the graph of |x|]
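This example can be checked directly (illustrative grid): the subgradient inequality f(y) ≥ f(0) + ϕ·y holds on a sample of y for every ϕ ∈ [−1, 1], and fails for |ϕ| > 1:

```python
import numpy as np

y = np.linspace(-2, 2, 401)
f = np.abs(y)

for phi in np.linspace(-1, 1, 21):
    assert np.all(f >= phi * y)        # every phi in [-1, 1] is a subgradient at 0

assert not np.all(f >= 1.1 * y)        # 1.1 is not: the inequality fails for y > 0
```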

Subdifferential and derivatives

The subdifferential is a generalization of the Fréchet derivative (or of the gradient in finite dimensions), in the following sense.

Theorem (subdifferential and Fréchet derivative). Let f ∈ conv(V) be Fréchet differentiable at x ∈ V. Then ∂f(x) = {df(x)}.

The proof of the theorem is surprisingly involved; it requires relating the subdifferential to one-sided directional derivatives. We will not explore these relationships in this lecture.

Relationship between subgradient and conjugate

[Figure: the affine function h_{ϕ, f*(ϕ)} touching the epigraph of f at x]

ϕ is a subgradient at x if and only if the line h_{ϕ, f*(ϕ)} touches the epigraph at x. In formulas,

    ϕ ∈ ∂f(x)  ⟺  h_{ϕ, f*(ϕ)}(y) = f(x) + ⟨y − x, ϕ⟩  ⟺  f*(ϕ) = ⟨x, ϕ⟩ − f(x).

The subdifferential and duality

The previously seen relationship between subgradients and the conjugate functional can be summarized in the following theorem.

Theorem. Let f ∈ conv(V) and x ∈ V. Then the following conditions on a vector ϕ ∈ V* are equivalent:
• ϕ ∈ ∂f(x);
• x ∈ argmax_{y ∈ V} [⟨y, ϕ⟩ − f(y)];
• f(x) + f*(ϕ) = ⟨x, ϕ⟩.

If, furthermore, f is closed, then more conditions can be added to this list:
• x ∈ ∂f*(ϕ);
• ϕ ∈ argmax_{ψ ∈ V*} [⟨x, ψ⟩ − f*(ψ)].

Formal proof of the theorem

The equivalences are easy to see.

• Rewriting the subgradient definition, one sees that ϕ ∈ ∂f(x) means

      ⟨x, ϕ⟩ − f(x) ≥ ⟨y, ϕ⟩ − f(y)  for all y ∈ V.

  This implies the first equivalence.
• We have seen the second one on the previous slide.
• If f is closed, then f** = f, thus we get

      f**(x) + f*(ϕ) = ⟨x, ϕ⟩.

  This is equivalent to the last two conditions, using the same arguments as above on the conjugate functional.

Variational principle for convex functionals

As a corollary of the previous theorem, we obtain a generalized variational principle for convex functionals. It is a necessary and sufficient condition for the (global) extremum.

Corollary (variational principle for convex functionals). Let f ∈ conv(V). Then x̂ is a global minimum of f if and only if

    0 ∈ ∂f(x̂).

Furthermore, if f is closed, then x̂ is a global minimum if and only if x̂ ∈ ∂f*(0), i.e. minimizing a functional is the same as computing the subdifferential of the conjugate functional at 0.

To see this, just set ϕ = 0 in the previous theorem.

Moreau's theorem

For the remainder of the lecture, we will assume that the underlying space is a Hilbert space H, for example L²(Ω).

Theorem (geometric version of Moreau's theorem). Let f be convex and closed on the Hilbert space H, which we identify with its dual. Then for every z ∈ H there is a unique decomposition

    z = x̂ + ϕ  with  ϕ ∈ ∂f(x̂),

and the unique x̂ in this decomposition can be computed with the proximation

    prox_f(z) := argmin_{x ∈ H} [ (1/2) ‖x − z‖²_H + f(x) ].

This is a corollary to Theorem 31.5 in Rockafellar, page 339. The actual theorem has somewhat more content, but is very technical and quite hard to digest; the above is the essential consequence.
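For f(x) = λ|x| on R, the proximation has the well-known closed form of soft thresholding, which makes Moreau's decomposition easy to verify numerically (illustrative sketch; prox_abs is an assumed helper name):

```python
import numpy as np

def prox_abs(z, lam):
    # prox of lam*|.|: soft thresholding sign(z) * max(|z| - lam, 0)
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

lam = 1.0
for z in [-3.0, -0.4, 0.0, 0.7, 2.5]:
    x_hat = prox_abs(z, lam)
    phi = z - x_hat
    # Moreau decomposition z = x_hat + phi with phi in d(lam|.|)(x_hat):
    assert abs(phi) <= lam + 1e-12
    if x_hat != 0.0:
        assert np.isclose(phi, lam * np.sign(x_hat))

# cross-check the closed form against a brute-force argmin of 0.5(x-z)^2 + lam|x|
xs = np.linspace(-5, 5, 100001)
for z in [-3.0, 0.7, 2.5]:
    brute = xs[np.argmin(0.5 * (xs - z) ** 2 + lam * np.abs(xs))]
    assert abs(brute - prox_abs(z, lam)) < 1e-3
```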

Proof of Moreau's theorem

The correctness of the decomposition is not too hard to see: if x̂ = prox_f(z), then

    x̂ ∈ argmin_{x ∈ H} [ (1/2) ‖x − z‖²_H + f(x) ]
    ⟺  0 ∈ x̂ − z + ∂f(x̂)
    ⟺  z ∈ x̂ + ∂f(x̂).

Existence and uniqueness of the proximation follow because the functional being minimized is closed, strictly convex and coercive.

The geometry of the graph of ∂f

• The map z ↦ (prox_f(z), z − prox_f(z)) is a continuous map from H into the graph of ∂f,

      graph(∂f) := {(x, ϕ) : x ∈ H, ϕ ∈ ∂f(x)} ⊂ H × H,

  with continuous inverse (x, ϕ) ↦ x + ϕ.
• Moreau's theorem now says that this map is one-to-one. In particular, H ≅ graph(∂f), i.e. the sets are homeomorphic.
• In particular, graph(∂f) is always connected.

Fixed points of the proximation operator

Proposition. Let f be closed and convex on the Hilbert space H, and let ẑ be a fixed point of the proximation operator prox_f, i.e. ẑ = prox_f(ẑ). Then ẑ is a minimizer of f. In particular, it also follows that ẑ ∈ (I − prox_f)⁻¹(0).

To prove this, just note that, because of Moreau's theorem,

    ẑ ∈ prox_f(ẑ) + ∂f(ẑ)  ⟺  0 ∈ ∂f(ẑ)

if ẑ is a fixed point.

Subgradient descent

Let λ > 0, z ∈ H and x = prox_{λf}(z). Then

    z ∈ x + ∂(λf)(x)  ⟺  x ∈ z − λ ∂f(x).

In particular, we have the following interesting observation:

The proximation operator prox_{λf} computes an implicit subgradient descent step of step size λ for the functional f.

"Implicit" here means that the subgradient is evaluated not at the original, but at the new location. This improves the stability of the descent. Note that if subgradient descent converges, then it converges to a fixed point ẑ of I − λ∂f; in particular, ẑ is a minimizer of the functional f.
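The implicit step can be iterated (the proximal point method). A toy sketch (illustrative names and values): for f(x) = |x − 3|, each prox step moves the iterate a fixed distance λ towards the minimizer x = 3 and then stays there:

```python
import numpy as np

def prox_step(x, lam=0.5):
    # prox of lam*|. - 3|: soft thresholding of the distance to 3
    d = x - 3.0
    return 3.0 + np.sign(d) * max(abs(d) - lam, 0.0)

x = 0.0
for _ in range(10):
    x = prox_step(x)
assert x == 3.0    # each step moves 0.5 towards 3, then the iterate stays fixed
```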

Summary

• Variational calculus deals with functionals on infinite-dimensional vector spaces.
• Minima are characterized by the variational principle, which leads to the Euler-Lagrange equation as a condition for a local minimum.
• The total variation is a powerful regularizer for image processing problems. For binary functions u, it equals the perimeter of the set where u = 1.
• Convex optimization deals with finding minima of convex functionals, which may be non-differentiable.
• The generalization of the variational principle to convex functionals is the condition that zero is a subgradient at the minimum.
• Efficient optimization methods rely heavily on the concept of duality. It allows certain useful transformations of convex problems, which will be employed in the next chapter.

References (1)

Variational methods

Luenberger, "Optimization by Vector Space Methods", Wiley 1969.
• Elementary introduction to optimization on Hilbert and Banach spaces.
• Easy to read, with many examples from other disciplines, in particular economics.

Gelfand and Fomin, "Calculus of Variations", translated 1963 (original in Russian).
• Classical introduction to variational calculus; somewhat outdated terminology, but inexpensive and easy to get.
• Historically very interesting, with many non-computer-vision applications (classical geometric problems; physics: optics, mechanics, quantum mechanics, field theory).

References (2)

Total variation

Chambolle, Caselles, Novaga, Cremers, Pock, "An Introduction to Total Variation for Image Analysis", Summer School, Linz, Austria 2006.
• Focused introduction to total variation for image processing applications, plus some basics of convex optimization and the numerics of optimization.
• Available online for free.

Attouch, Buttazzo and Michaille, "Variational Analysis in Sobolev and BV Spaces", SIAM 2006.
• Exhaustive introduction to variational methods and convex optimization in infinite-dimensional spaces, as well as to the theory of BV functions.
• Mathematically very advanced; requires solid knowledge of functional analysis.

References (3)

Convex optimization

Boyd and Vandenberghe, "Convex Optimization", Cambridge University Press 2004.
• Excellent recent introduction to convex optimization.
• Reads very well; available online for free.

Rockafellar, "Convex Analysis", Princeton University Press 1970.
• Classical introduction to convex analysis and optimization.
• Somewhat technical and not too easy to read, but very exhaustive.
