Hessian Riemannian Gradient Flows in Convex ... - DIM-UChile

Hessian Riemannian Gradient Flows in Convex Programming
Felipe Alvarez, Jérôme Bolte, Olivier Brahic
INTERNATIONAL CONFERENCE ON MODELING AND OPTIMIZATION MODOPT 2004
Universidad de La Frontera, Temuco, Chile, January 19-22, 2004.

Hessian Riemannian Flows.– p. 1/25

Outline
1. Motivation: scaling the Euclidean gradient.
2. Riemannian gradient flows on convex sets.
3. Hessian metrics, existence, convergence and examples.


1. Motivation: gradient method
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth function and $x_0 \in \mathbb{R}^n$.

$$x_{k+1} = x_k + \lambda_k d_k,$$

where
• $\lambda_k > 0$ is a stepsize and
• $d_k = -\nabla f(x_k)$ is the steepest descent direction.

Continuous gradient method:

$$\frac{dx}{dt} = -\nabla f(x), \quad t > 0.$$

Continuous flow → discrete method.
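A minimal sketch of how the discrete gradient method arises as an explicit Euler discretization of the continuous flow. The quadratic test function, the constant stepsize and the names `grad_f` and `gradient_method` are illustrative assumptions, not part of the talk.

```python
# Explicit Euler discretization of dx/dt = -grad f(x):
# x_{k+1} = x_k + step * d_k with d_k = -grad f(x_k).
# The test function f(x) = 0.5 * (x1^2 + 10 * x2^2) is a hypothetical example.

def grad_f(x):
    """Gradient of f(x) = 0.5 * (x1^2 + 10 * x2^2)."""
    return [x[0], 10.0 * x[1]]

def gradient_method(x0, step, n_iter):
    """Run n_iter steepest descent steps with a constant stepsize."""
    x = list(x0)
    for _ in range(n_iter):
        d = [-g for g in grad_f(x)]                    # steepest descent direction
        x = [xi + step * di for xi, di in zip(x, d)]   # Euler step
    return x

x_final = gradient_method([1.0, 1.0], step=0.05, n_iter=500)
print(x_final)
```

The two coordinates contract at different rates (factors 0.95 and 0.5 per step here), which is the kind of scale dependence that motivates the Newton-type scalings discussed next.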

1. Scaling and Newton's method
Newton's correction:

$$x_{k+1} = x_k - \lambda_k \nabla^2 f(x_k)^{-1} \nabla f(x_k).$$

Continuous Newton's method:

$$\frac{dx}{dt} = -\nabla^2 f(x)^{-1} \nabla f(x)$$
$$\Downarrow$$
$$\frac{d}{dt} \nabla f(x) = -\nabla f(x)$$
$$\Downarrow$$
$$\nabla f(x(t)) = e^{-t} \nabla f(x_0)$$

Scale-invariant rate of convergence on a straight line!
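The two implications for the continuous Newton method follow from the chain rule. A short worked version, assuming (this is not stated on the slide) that $f$ is $C^2$ and that $\nabla^2 f$ is invertible along the trajectory:

```latex
\begin{aligned}
\frac{d}{dt}\nabla f(x(t))
  &= \nabla^2 f(x(t))\,\dot x(t)
   = -\nabla^2 f(x(t))\,\nabla^2 f(x(t))^{-1}\,\nabla f(x(t))
   = -\nabla f(x(t)),\\
g(t) := \nabla f(x(t))
  &\;\Longrightarrow\; \dot g(t) = -g(t),\quad g(0) = \nabla f(x_0)
   \;\Longrightarrow\; g(t) = e^{-t}\,\nabla f(x_0).
\end{aligned}
```

The gradient therefore moves along the segment from $\nabla f(x_0)$ to $0$ at a rate $e^{-t}$ that does not depend on $f$.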

1. Scaling and constraints
Problem:

$$\min\{f(x) \mid x \ge 0\}.$$

ODE approach:

$$\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x), \quad i = 1, \dots, n.$$

Properties:
• $\frac{d}{dt} f(x) = -\sum_{i=1}^{n} x_i \left| \frac{\partial f}{\partial x_i}(x) \right|^2 \le 0$ ⇝ descent method on $\mathbb{R}^n_+$.
• $x_i(0) > 0 \Rightarrow \forall t > 0,\ x_i(t) > 0$ ⇝ interior point trajectory.
• The equation may be written as

$$\frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x).$$

Scaling ⇝ logarithmic barrier to force $x(t) > 0$!
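The descent and interior-point properties can be observed numerically. The sketch below integrates the logarithmic form $\frac{d}{dt}\log(x_i) = -\partial f/\partial x_i$ with explicit Euler steps in the variable $\log(x_i)$, which keeps every iterate strictly positive. The discretization, the quadratic objective and all names are assumptions chosen for illustration.

```python
import math

# Hypothetical data: f(x) = 0.5 * ||x - a||^2 with a = (1, -1), so the
# minimizer over x >= 0 is (1, 0), a point on the boundary.
A_TARGET = (1.0, -1.0)

def f(x):
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, A_TARGET))

def grad_f(x):
    return [xi - ai for xi, ai in zip(x, A_TARGET)]

def scaled_flow(x0, step, n_iter):
    """Euler steps on d/dt log(x_i) = -df/dx_i, i.e. x_i <- x_i * exp(-step * g_i)."""
    x = list(x0)
    values = [f(x)]
    for _ in range(n_iter):
        g = grad_f(x)
        x = [xi * math.exp(-step * gi) for xi, gi in zip(x, g)]
        values.append(f(x))
    return x, values

x_end, f_values = scaled_flow([0.5, 0.5], step=0.01, n_iter=2000)
print(x_end, f_values[-1])
```

The second coordinate decays towards the boundary value 0 without ever reaching it, and the recorded objective values do not increase along the iteration.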

1. Hessian type scaling

$$\frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x)$$
$$\Updownarrow$$
$$\frac{d}{dt} \nabla h(x) = -\nabla f(x), \quad \text{where } h(x) = \sum_{i=1}^{n} x_i \log(x_i) - x_i.$$

Thus

$$\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x) \iff \frac{dx}{dt} = -\nabla^2 h(x)^{-1} \nabla f(x).$$

Note the analogy with Newton's method.
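For reference, the derivatives of the entropy $h$ that make this equivalence work can be written out explicitly. This is a routine computation added as a reading aid:

```latex
\begin{aligned}
h(x) &= \sum_{i=1}^{n} x_i\log(x_i) - x_i,
&\frac{\partial h}{\partial x_i}(x) &= \log(x_i) + 1 - 1 = \log(x_i),\\
\nabla^2 h(x) &= \operatorname{diag}\!\left(\tfrac{1}{x_1},\dots,\tfrac{1}{x_n}\right),
&\nabla^2 h(x)^{-1} &= \operatorname{diag}(x_1,\dots,x_n),\\
\frac{d}{dt}\nabla h(x) &= \nabla^2 h(x)\,\dot x = -\nabla f(x)
&\Longleftrightarrow\quad \dot x_i &= -x_i\,\frac{\partial f}{\partial x_i}(x).
\end{aligned}
```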

2. Riemannian gradient flows
Problem:

$$\min_{x \in C} f(x), \quad \text{with } C = \{x \in \mathbb{R}^n \mid x \in \overline{Q},\ Ax = b\}.$$

• $Q \subset \mathbb{R}^n$ nonempty, open and convex.
• $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ with $m \le n$.

Strategy: Introduce a Riemannian metric $H(x) \in \mathbb{S}^n_{++}$ on $Q$,

$$(u, v)_x = \langle H(x)u, v \rangle = \sum_{i,j=1}^{n} H_{ij}(x) u_i v_j, \quad x \in Q.$$

Consider the gradient flow

$$\frac{dx}{dt}(t) = -\nabla_H f(x(t)), \quad t > 0.$$

2. Riemannian gradient
• Let $M$, $(\cdot, \cdot)$ be a Riemannian manifold.
• $T_x M$: tangent space to $M$ at $x \in M$.

The gradient $\operatorname{grad} f$ of $f \in C^1(M; \mathbb{R})$ is uniquely determined by
Tangency: for all $x \in M$, $\operatorname{grad} f(x) \in T_x M$.
Duality: for all $x \in M$, $v \in T_x M$, $df(x)v = (\operatorname{grad} f(x), v)_x$,
where $df(x) : T_x M \to \mathbb{R}$ is the differential of $f$.

2. Riemannian gradient in our case
$M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$ with $Q$ an open set $\Rightarrow$ $T_x M \simeq \operatorname{Ker} A$.

$(\cdot, \cdot)_x = \langle H(x)\cdot, \cdot \rangle$ with a barrier/penalty effect near $\partial Q$.

$$\Downarrow$$
$$\nabla_H f = H^{-1}\left[I - A^T (A H^{-1} A^T)^{-1} A H^{-1}\right] \nabla f,$$

where $\nabla_H$ stands for grad to stress the dependence on $H$.

Projection ⇝ vector field in the tangent space.
Scaling ⇝ interior point method $x(t) \in Q$.
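The formula for $\nabla_H f$ can be checked against the two defining properties (tangency and duality). The sketch below is restricted, as a simplifying assumption, to a diagonal metric $H = \operatorname{diag}(h_1,\dots,h_n)$ and a single constraint row $A = a^T$ (so $m = 1$ and $A H^{-1} A^T$ is a scalar). The function name and the sample data are hypothetical.

```python
def riemannian_gradient(h_diag, a, grad):
    """Evaluate H^{-1} [I - A^T (A H^{-1} A^T)^{-1} A H^{-1}] grad
    for H = diag(h_diag) and one constraint row A = a (m = 1)."""
    hinv_g = [g / h for g, h in zip(grad, h_diag)]       # H^{-1} grad
    hinv_a = [ai / h for ai, h in zip(a, h_diag)]        # H^{-1} A^T
    s = sum(ai * w for ai, w in zip(a, hinv_a))          # A H^{-1} A^T (scalar)
    mu = sum(ai * u for ai, u in zip(a, hinv_g)) / s     # (A H^{-1} A^T)^{-1} A H^{-1} grad
    return [u - mu * w for u, w in zip(hinv_g, hinv_a)]

# Sample data: a point of the open simplex with H(x) = diag(1/x_i).
x = [0.2, 0.3, 0.5]
h_diag = [1.0 / xi for xi in x]
a = [1.0, 1.0, 1.0]
grad = [1.0, 2.0, 4.0]          # stands in for the Euclidean gradient of f at x
v = riemannian_gradient(h_diag, a, grad)

# Tangency: A v = 0.  Duality: <H v, w> = <grad, w> for any w in Ker A.
w = [1.0, -1.0, 0.0]
tangency = sum(ai * vi for ai, vi in zip(a, v))
lhs = sum(h * vi * wi for h, vi, wi in zip(h_diag, v, w))
rhs = sum(g * wi for g, wi in zip(grad, w))
print(v, tangency, lhs, rhs)
```

For a general $A$ with $m > 1$ the scalar division becomes a linear solve with the $m \times m$ matrix $A H^{-1} A^T$, which requires $A$ to have full row rank.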

2. Example

$$C = \Delta_{n-1} := \{x \in \mathbb{R}^n \mid x \ge 0,\ \textstyle\sum_{i=1}^{n} x_i = 1\}$$

$Q = \mathbb{R}^n_{++}$, $A = (1, \dots, 1) \in \mathbb{R}^{1 \times n}$ and $b = 1$.
$M = \{x \in \mathbb{R}^n \mid x > 0,\ \sum_{i=1}^{n} x_i = 1\}$.
$T_x M = \{v \in \mathbb{R}^n \mid \sum_{i=1}^{n} v_i = 0\}$.

Take $H(x) = \operatorname{diag}(1/x_1, \dots, 1/x_n)$, then

$$(u, v)_x = \sum_{i=1}^{n} \frac{u_i v_i}{x_i}$$ ⇝ Shahshahani metric.

$$\frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i} + x_i \sum_{j=1}^{n} x_j \frac{\partial f}{\partial x_j}$$ ⇝ Lotka-Volterra type eq.

Karmarkar '90, Faybusovich '91,...


2. Barrier effect: Legendre functions
We focus on the case $H(x) = \nabla^2 h(x)$, $x \in Q$, with

(H0)
• $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is closed, convex and proper.
• $\operatorname{int} \operatorname{dom} h = Q$.
• $h$ is of Legendre type.
• $h|_Q \in C^2(Q; \mathbb{R})$ and $\nabla^2 h(x) > 0$.
• $\nabla^2 h$ is locally Lipschitz.

⋆ $h$ is strictly convex and $C^1$ on $\operatorname{int} \operatorname{dom} h$.
⋆ $\operatorname{int} \operatorname{dom} h \ni x^j \to x \in \partial \operatorname{int} \operatorname{dom} h$, $\|\nabla h(x^j)\| \to +\infty$.


2. Example of Legendre function

$$h(x) = \sum_{i \in I} \theta(g_i(x)),$$

where
• $I = \{1, \dots, p\}$, $g_i \in C^3(\mathbb{R}^n)$ concave.
• $Q = \{x \in \mathbb{R}^n \mid g_i(x) > 0,\ i \in I\} \ne \emptyset$.
• $\forall x \in Q$, $\operatorname{span}\{\nabla g_i(x) \mid i \in I\} = \mathbb{R}^n$,
and
(i) $(0, \infty) \subseteq \operatorname{dom} \theta \subseteq [0, \infty)$, $\theta \in C^3(0, \infty)$.
(ii) $\lim_{s \to 0^+} \theta'(s) = -\infty$ and $\forall s > 0$, $\theta''(s) > 0$.
(iii) Either $\theta$ is nonincreasing or $\forall i \in I$, $g_i$ is affine.
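As a sanity check, the two kernels $\theta$ used later in the talk (entropy and logarithm) can be tested against (i) and (ii). The choice $g_i(x) = x_i$, $I = \{1,\dots,n\}$ is assumed here: it gives $Q = \mathbb{R}^n_{++}$, each $g_i$ is affine so (iii) holds, and $\operatorname{span}\{\nabla g_i\} = \mathbb{R}^n$.

```latex
\begin{aligned}
\theta(s) &= s\log(s) - s: &\quad \theta'(s) &= \log(s) \to -\infty \ \ (s \to 0^+), &\quad \theta''(s) &= \tfrac{1}{s} > 0,\\
\theta(s) &= -\log(s): &\quad \theta'(s) &= -\tfrac{1}{s} \to -\infty \ \ (s \to 0^+), &\quad \theta''(s) &= \tfrac{1}{s^2} > 0.
\end{aligned}
```

With the convention $0\log 0 = 0$ the first kernel has $\operatorname{dom}\theta = [0,\infty)$, while the second has $\operatorname{dom}\theta = (0,\infty)$. Both are compatible with (i).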

3. Questions
Given $x^0 \in M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$:

$$\frac{dx}{dt}(t) = -\nabla_H f(x(t)), \quad t > 0.$$

Well posedness: global existence for all $t > 0$.
Asymptotic behavior: convergence to an equilibrium as $t \to \infty$, rate of convergence,...
Main difficulty: singular behavior near $\partial Q$ ⇒ classical results do not apply.

3. Well posedness: global existence
Thm. 2 The trajectory $x(t)$ is defined for all $t \ge 0$ under any of the following conditions:
(C1) $\{x \in C \mid f(x) \le f(x^0)\}$ is bounded.
(C2) (i) $\operatorname{dom} h = \overline{Q}$;
     (ii) $\forall y \in Q$, $\forall \gamma \in \mathbb{R}$, $\{x \in C \mid D_h(y, x) \le \gamma\}$ is bounded;
     (iii) $\operatorname{Argmin}_C f \ne \emptyset$ and $f$ quasiconvex.
(C3) $\exists K \ge 0$, $L \in \mathbb{R}$ such that $\forall x \in Q$, $\|H(x)^{-1}\| \le K|x| + L$.


3. Why Hessian metrics?
Suppose $f$ is convex and $A = 0$ and $b = 0$.
Euclidean case:

$$y \in \operatorname{Argmin}_C f \iff \forall x \in C,\ \langle \nabla f(x), x - y \rangle \ge 0.$$

$\varphi_y(x) = \frac{1}{2}\|x - y\|^2$

$$\Downarrow$$
$$\dot{x} = -\nabla f(x) \ \Rightarrow\ \frac{d}{dt} \varphi_y(x) = \langle \nabla \varphi_y(x), \dot{x} \rangle = \langle x - y, -\nabla f(x) \rangle \le 0.$$
$$\Downarrow$$

$\varphi_y(x)$ is a Lyapunov function for the gradient flow.


3. Characterization of Hessian metrics
Riemannian case:

$$y \in \operatorname{Argmin}_Q f \iff \forall x \in Q,\ (\nabla_H f(x), x - y)_x \ge 0.$$

Thm. 1 $H \in C^1(Q; \mathbb{S}^n_{++})$ satisfies

$$\forall y \in Q,\ \exists \varphi_y \in C^1(Q; \mathbb{R}),\ \nabla_H \varphi_y(x) = x - y$$
$$\Updownarrow$$

$\exists h \in C^3(Q)$ such that $H = \nabla^2 h$ on $Q$ and

$$\varphi_y(x) = D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle$$

= Bregman pseudo-distance induced by $h$.
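The easy direction of Thm. 1 (a Hessian metric admits $\varphi_y = D_h(y,\cdot)$) is a one-line computation. It is sketched here for the unconstrained case $A = 0$, where $\nabla_H = H^{-1}\nabla$:

```latex
\begin{aligned}
\nabla_x D_h(y,x)
  &= -\nabla h(x) - \nabla^2 h(x)\,(y - x) + \nabla h(x)
   = \nabla^2 h(x)\,(x - y),\\
\nabla_H \varphi_y(x)
  &= H(x)^{-1}\,\nabla_x D_h(y,x)
   = \nabla^2 h(x)^{-1}\,\nabla^2 h(x)\,(x - y)
   = x - y.
\end{aligned}
```

So $D_h(y,\cdot)$ plays, for the metric $\nabla^2 h$, the role that $\tfrac12\|x-y\|^2$ plays in the Euclidean case.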

3. Implicit proximal iteration

$$x_{k+1} \in \operatorname{Argmin}\left\{ f(x) + \frac{1}{\lambda_k} D_h(x, x_k) \ \middle|\ Ax = b \right\},$$
$$\Updownarrow$$
$$\frac{1}{\lambda_k}\left[\nabla h(x_{k+1}) - \nabla h(x_k)\right] \in -\nabla f(x_{k+1}) + \operatorname{Im} A^T, \quad A x_{k+1} = b.$$

Bregman 67, Censor-Zenios 92, Teboulle 92, Eckstein 93, Kiwiel 97,...

But

$$\frac{dx}{dt} = -\nabla_H f(x) \iff \begin{cases} \dfrac{d}{dt} \nabla h(x) \in -\nabla f(x) + \operatorname{Im} A^T, \\ Ax(t) = b,\ t \ge 0. \end{cases}$$

⋆ This link was already noticed by Iusem-Svaiter-Da Cruz Neto '99, together with convergence results for a linear objective function.
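In one special case the implicit proximal step has a closed form, which makes the optimality condition easy to verify. Assume (for this sketch only) a linear objective $f(x) = \langle c, x\rangle$, the entropy $h(x) = \sum_i x_i\log(x_i) - x_i$ and the simplex constraint $A = (1,\dots,1)$, $b = 1$. Then $\log(x_{k+1,i}) - \log(x_{k,i}) = -\lambda_k c_i + \text{const}$, so the step is a multiplicative update followed by normalization. The function name and data are hypothetical.

```python
import math

def entropic_prox_step(x, c, lam):
    """One implicit proximal step for f(x) = <c, x> with the entropy kernel
    on the simplex: x_new_i is proportional to x_i * exp(-lam * c_i)."""
    w = [xi * math.exp(-lam * ci) for xi, ci in zip(x, c)]
    z = sum(w)                      # normalization enforces sum(x_new) = 1
    return [wi / z for wi in w]

x_k = [0.25, 0.25, 0.5]
c = [3.0, 1.0, 2.0]
lam = 0.5
x_next = entropic_prox_step(x_k, c, lam)

# Optimality check: (1/lam) * (grad h(x_next) - grad h(x_k)) + c must lie in
# Im A^T, i.e. be a constant vector, since grad h(x) = log(x) componentwise.
residual = [(math.log(yi) - math.log(xi)) / lam + ci
            for yi, xi, ci in zip(x_next, x_k, c)]
print(x_next, residual)
```

For a nonlinear $f$ the step is genuinely implicit and would need an inner solver. The closed form above relies on $\nabla f$ being constant.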

3. Convergence: Bregman functions
A Legendre function $h$ with $\operatorname{dom} h = \overline{Q}$ is of Bregman type if
(i) $\{x \in Q \mid D_h(y, x) \le \gamma\}$ is bounded, $\forall y \in Q$, $\forall \gamma \in \mathbb{R}$.
(ii) $\forall y \in Q$, $\forall y^j \to y$ with $y^j \in Q$, $D_h(y, y^j) \to 0$.

Thm. 3 Suppose (H0) with $h$ of Bregman type, $f$ quasiconvex and $\operatorname{Argmin}_C f \ne \emptyset$. Then $\exists x^\infty \in C$ such that $x(t) \to x^\infty$ as $t \to +\infty$ with

$$-\nabla f(x^\infty) \in N_Q(x^\infty) + (\operatorname{Ker} A)^\perp,$$

where $N_Q(x^\infty)$ is the normal cone to $Q$ at $x^\infty$.

3. Examples on $\Delta_{n-1}$
Boltzmann-Shannon entropy: $h(x) = \sum_{i=1}^{n} x_i \log(x_i) - x_i$.
Shahshahani metric: $H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1, \dots, 1/x_n)$.
Kullback-Leibler divergence: $D_h(y, x) = \sum_{i=1}^{n} y_i \log(y_i/x_i) + x_i - y_i$.

[Figures: graph of $h(x) = x\log(x) - x$; Lotka-Volterra type flow in the $(x_1, x_2, x_3)$ coordinates, with $x(0) = (1/4, 1/4, 1/2)$ and origin $O = (0, 0, 0)$.]
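The identification of the Bregman distance of the entropy with the Kullback-Leibler divergence can be confirmed numerically. In this sketch the helper names `h`, `grad_h`, `bregman` and `kl` and the sample points are illustrative assumptions.

```python
import math

def h(x):
    """Boltzmann-Shannon entropy kernel: sum_i x_i log(x_i) - x_i."""
    return sum(xi * math.log(xi) - xi for xi in x)

def grad_h(x):
    """Gradient of h: componentwise logarithm."""
    return [math.log(xi) for xi in x]

def bregman(y, x):
    """D_h(y, x) = h(y) - h(x) - <grad h(x), y - x>."""
    inner = sum(g * (yi - xi) for g, yi, xi in zip(grad_h(x), y, x))
    return h(y) - h(x) - inner

def kl(y, x):
    """Kullback-Leibler divergence sum_i y_i log(y_i / x_i) + x_i - y_i."""
    return sum(yi * math.log(yi / xi) + xi - yi for yi, xi in zip(y, x))

y = [0.2, 0.3, 0.5]
x = [0.25, 0.25, 0.5]
print(bregman(y, x), kl(y, x))
```

Both expressions are defined for any strictly positive vectors, not only points of the simplex. On the simplex the terms $x_i - y_i$ sum to zero.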

3. Other examples
• $h(x) = -2 \sum_{i=1}^{n} \sqrt{x_i}$

$H(x) = \nabla^2 h(x) = \frac{1}{2} \operatorname{diag}(1/x_1^{3/2}, \dots, 1/x_n^{3/2})$.
$D_h(y, x) = \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{x_i})^2 / \sqrt{x_i}$.
Flow:
$$\frac{dx_i}{dt} = -2 x_i^{3/2} \left( \frac{\partial f}{\partial x_i} - \sum_{j=1}^{n} \frac{x_j^{3/2}}{\sum_{k=1}^{n} x_k^{3/2}} \frac{\partial f}{\partial x_j} \right).$$

• $h(x) = -\sum_{i=1}^{n} \log(x_i)$ ($h(0) = +\infty$ so that $h$ is not Bregman).

$H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1^2, \dots, 1/x_n^2)$.
$D_h(y, x) = \sum_{i=1}^{n} \log(x_i/y_i) + (y_i - x_i)/x_i$.
Flow:
$$\frac{dx_i}{dt} = -x_i^2 \left( \frac{\partial f}{\partial x_i} - \sum_{j=1}^{n} \frac{x_j^2}{\sum_{k=1}^{n} x_k^2} \frac{\partial f}{\partial x_j} \right).$$

3. Further developments
Rate of convergence.
Dual trajectory and its convergence.
Geodesic type characterization of trajectories.
Connections with completely integrable Hamiltonian systems.
Reference: A.-Bolte-Brahic, to appear in SIAM J. on Control Optim.
Continuous version of similar results for proximal iterations: Iusem-Monteiro '00.
Generalizations of results on the log-metric in linear programming: Bayer-Lagarias '89.

4. Duality when $Q = \mathbb{R}^n_{++}$

(P) $\min\{f(x) \mid x \ge 0,\ Ax = b\}$

Assume:
• $f$ is convex and $S(P) \ne \emptyset$.
• $\exists x^0 \in \mathbb{R}^n$, $x^0 > 0$, $Ax^0 = b$.

Dual problem:

(D) $\min\{p(\lambda) \mid \lambda \ge 0\}$

where $p(\lambda) = \sup\{\langle \lambda, x \rangle - f(x) \mid Ax = b\}$. Then

$$S(D) = \{\lambda \in \mathbb{R}^n \mid \lambda \ge 0,\ \lambda \in \nabla f(x^*) + \operatorname{Im} A^T,\ \langle \lambda, x^* \rangle = 0\},$$

where $x^*$ is any solution of (P).

4. Dual trajectory
Integrating the differential inclusion

$$\frac{d}{dt} \nabla h(x(t)) \in -\nabla f(x(t)) + \operatorname{Im} A^T,$$

we obtain

$$\lambda(t) \in c(t) + \operatorname{Im} A^T,$$

where $c(t) = \frac{1}{t} \int_0^t \nabla f(x(\tau))\,d\tau$ and $\lambda(t) = \frac{1}{t}\left[\nabla h(x^0) - \nabla h(x(t))\right]$.

If $h(x) = \sum_{i=1}^{n} \theta(x_i)$, then $\lambda_i(t) = \frac{1}{t}\left[\theta'(x_i^0) - \theta'(x_i(t))\right]$.
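The integration step can be spelled out as follows. It only uses $x(0) = x^0$ and the fact that $\operatorname{Im} A^T$ is a linear subspace, hence stable under integration, sign change and division by $t > 0$:

```latex
\begin{aligned}
\nabla h(x(t)) - \nabla h(x^0)
  &\in -\int_0^t \nabla f(x(\tau))\,d\tau + \operatorname{Im} A^T,\\
\lambda(t) = \frac{1}{t}\left[\nabla h(x^0) - \nabla h(x(t))\right]
  &\in \frac{1}{t}\int_0^t \nabla f(x(\tau))\,d\tau + \operatorname{Im} A^T
   = c(t) + \operatorname{Im} A^T.
\end{aligned}
```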

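4. Dual penalty scheme
We have: $\lambda(t) = \frac{1}{t}\left[\nabla h(x^0) - \nabla h(x(t))\right]$. But

$$h \text{ is Legendre} \ \Rightarrow\ (\nabla h)^{-1} = \nabla h^*,$$

with $h^*(\lambda) = \sum_{i=1}^{n} \theta^*(\lambda_i)$ being the Fenchel conjugate of $h$. Hence

$$x(t) = \nabla h^*(\nabla h(x^0) - t\lambda(t)).$$

Take $\tilde{x}$ such that $A\tilde{x} = b$. Since $Ax(t) = b$, we have

$$\tilde{x} - \nabla h^*(\nabla h(x^0) - t\lambda(t)) \in \operatorname{Ker} A.$$

Then, $\lambda(t)$ solves

$$\min_{\lambda} \left\{ \langle \tilde{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^{n} \theta^*(\theta'(x_i^0) - t\lambda_i) \ \middle|\ \lambda \in c(t) + \operatorname{Im} A^T \right\}.$$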


4. Dual trajectory: convergence
Example: $\theta(s) = s\log(s) - s \ \Rightarrow\ \theta^*(s^*) = \exp(s^*)$, $s^* \in \mathbb{R}$. Then

$$\min_{\lambda} \left\{ \langle \tilde{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^{n} x_i^0 \exp(-t\lambda_i) \ \middle|\ \lambda \in c(t) + \operatorname{Im} A^T \right\}.$$

Convergence:
• $f(x) = \langle c, x \rangle \ \Rightarrow\ c(t) = \frac{1}{t} \int_0^t \nabla f(x(\tau))\,d\tau \equiv c$ ⇒ by Cominetti-San Martin '96, Auslender et al. '97, Cominetti '00,... convergence to the $\theta^*$-center of $S(D)$.
• Otherwise, $x(t)$ bounded ⇒ $\nabla f(x(t)) \to \nabla f(x^*)$ for $x^* \in S(P)$ ⇒ $c(t) \to \nabla f(x^*)$ ⇒ convergence by Iusem-Monteiro '00.
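The conjugate pair in the example, $\theta(s) = s\log(s) - s$ and $\theta^*(s^*) = \exp(s^*)$, can be checked by brute force from the definition $\theta^*(s^*) = \sup_{s > 0}\{s^* s - \theta(s)\}$. The grid search below is only an illustrative sketch: the grid range, its resolution and the function names are assumptions.

```python
import math

def theta(s):
    """Entropy kernel theta(s) = s log(s) - s for s > 0."""
    return s * math.log(s) - s

def conjugate_on_grid(s_star, grid):
    """Approximate sup_{s > 0} { s_star * s - theta(s) } over a finite grid."""
    return max(s_star * s - theta(s) for s in grid)

# Grid on (0, 20] with spacing 1e-3; the exact maximizer is s = exp(s_star),
# which lies inside this range for the sample values used here.
grid = [k * 1e-3 for k in range(1, 20001)]
samples = [-1.0, 0.0, 1.0, 2.0]
approx = [conjugate_on_grid(s_star, grid) for s_star in samples]
exact = [math.exp(s_star) for s_star in samples]
print(list(zip(approx, exact)))
```

Since $(\theta^*)'(s^*) = \exp(s^*)$, the composition $\theta^*(\theta'(x_i^0) - t\lambda_i)$ equals $x_i^0\exp(-t\lambda_i)$, which is the exponential penalty term appearing in the minimization problem.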
