Hessian Riemannian Gradient Flows in Convex ... - DIM-UChile

Hessian Riemannian Gradient Flows in Convex Programming
Felipe Alvarez, Jérôme Bolte, Olivier Brahic
International Conference on Modeling and Optimization, MODOPT 2004
Universidad de La Frontera, Temuco, Chile, January 19-22, 2004.

Hessian Riemannian Flows.– p. 1/25

Outline

1. Motivation: scaling the Euclidean gradient.
2. Riemannian gradient flows on convex sets.
3. Hessian metrics, existence, convergence and examples.

1. Motivation: gradient method

Let $f : \mathbb{R}^n \to \mathbb{R}$ be a smooth function and $x_0 \in \mathbb{R}^n$.

\[ x_{k+1} = x_k + \lambda_k d_k, \]

where
• $\lambda_k > 0$ is a stepsize and
• $d_k = -\nabla f(x_k)$ is the steepest descent direction.

Continuous gradient method:

\[ \frac{dx}{dt} = -\nabla f(x), \quad t > 0. \]

Continuous flow $\leftrightarrow$ discrete method.
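The link between the continuous flow and the discrete method can be illustrated with a minimal Python sketch: the gradient iteration is an explicit Euler step of size $\lambda_k$ on $dx/dt = -\nabla f(x)$. The quadratic objective, the constant stepsize and the function names below are assumptions chosen for illustration, not part of the talk.

```python
# Illustrative sketch: gradient descent as explicit Euler steps on dx/dt = -grad f(x).
# The quadratic f and the constant stepsize are assumptions chosen for the example.

def grad_f(x):
    # f(x) = 0.5 * (x1^2 + 10 * x2^2), so grad f(x) = (x1, 10 * x2)
    return [x[0], 10.0 * x[1]]

def gradient_descent(x0, stepsize, iterations):
    x = list(x0)
    for _ in range(iterations):
        d = [-g for g in grad_f(x)]                       # steepest descent direction d_k
        x = [xi + stepsize * di for xi, di in zip(x, d)]  # x_{k+1} = x_k + lambda_k * d_k
    return x

x_final = gradient_descent([1.0, 1.0], stepsize=0.05, iterations=400)
```

With this stepsize both coordinates contract at every step, so the iterates approach the minimizer at the origin; a stepsize above 0.2 would make the second coordinate diverge, which is one way to see why scaling matters.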

1. Scaling and Newton's method

Newton's correction:

\[ x_{k+1} = x_k - \lambda_k \nabla^2 f(x_k)^{-1} \nabla f(x_k). \]

Continuous Newton's method:

\[ \frac{dx}{dt} = -\nabla^2 f(x)^{-1} \nabla f(x). \]

⇓

\[ \frac{d}{dt} \nabla f(x) = -\nabla f(x). \]

⇓

\[ \nabla f(x(t)) = e^{-t} \nabla f(x_0). \]

Scale invariant rate of convergence on a straight line!
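The two implications above follow from the chain rule along a trajectory. A short worked version, assuming $\nabla^2 f$ is invertible along $x(t)$ and writing $g(t) := \nabla f(x(t))$ (a symbol introduced here only for this derivation):

```latex
\[
\frac{dg}{dt}
  = \nabla^2 f(x(t))\,\frac{dx}{dt}
  = -\nabla^2 f(x(t))\,\nabla^2 f(x(t))^{-1}\,\nabla f(x(t))
  = -g(t),
\qquad\text{hence}\qquad
g(t) = e^{-t}\,g(0) = e^{-t}\,\nabla f(x_0).
\]
```

The gradient therefore shrinks along the fixed direction $\nabla f(x_0)$ at a rate that does not depend on the scaling of $f$.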

1. Scaling and constraints

Problem: $\min\{f(x) \mid x \ge 0\}$.

ODE approach:

\[ \frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x), \quad i = 1, \dots, n. \]

Properties:
• $\frac{d}{dt} f(x) = -\sum_{i=1}^{n} x_i \left| \frac{\partial f}{\partial x_i}(x) \right|^2 \le 0$ $\leadsto$ descent method on $\mathbb{R}^n_+$.
• $x_i(0) > 0 \Rightarrow \forall t > 0,\ x_i(t) > 0$ $\leadsto$ interior point trajectory.
• The equation may be written as

\[ \frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x). \]

Scaling $\leadsto$ logarithmic barrier to force $x(t) > 0$!
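The three properties can be observed numerically. The sketch below takes Euler steps on the logarithmic form $\frac{d}{dt}\log(x_i) = -\partial f/\partial x_i$, which gives a multiplicative update that keeps every coordinate positive. The test objective $f(x) = \tfrac12\|x - a\|^2$, the data `A_TARGET`, the stepsize and the function names are assumptions made for this example only.

```python
import math

A_TARGET = [1.0, -1.0]  # hypothetical data: f(x) = 0.5 * ||x - a||^2

def f(x):
    return 0.5 * sum((xi - ai) ** 2 for xi, ai in zip(x, A_TARGET))

def grad_f(x):
    return [xi - ai for xi, ai in zip(x, A_TARGET)]

def log_euler_flow(x0, dt, steps):
    """Euler steps on d/dt log(x_i) = -df/dx_i, i.e. x_i <- x_i * exp(-dt * g_i)."""
    x = list(x0)
    values = [f(x)]
    for _ in range(steps):
        g = grad_f(x)
        x = [xi * math.exp(-dt * gi) for xi, gi in zip(x, g)]  # stays positive
        values.append(f(x))
    return x, values

x_end, f_values = log_euler_flow([0.5, 0.5], dt=0.01, steps=5000)
```

For this choice of data the constrained minimizer is $(1, 0)$: the first coordinate converges to 1 in the interior, while the second decays toward the boundary without ever reaching it.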

1. Hessian type scaling

\[ \frac{d}{dt} \log(x_i) = -\frac{\partial f}{\partial x_i}(x) \]

⇕

\[ \frac{d}{dt} \nabla h(x) = -\nabla f(x), \quad \text{where } h(x) = \sum_{i=1}^{n} x_i \log(x_i) - x_i. \]

Thus

\[ \frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i}(x) \iff \frac{dx}{dt} = -\nabla^2 h(x)^{-1} \nabla f(x). \]

Note the analogy with Newton's method.
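The equivalence can be checked by differentiating the entropy $h$ directly; the following worked computation uses only the definition of $h$ given on this slide:

```latex
\[
\frac{\partial h}{\partial x_i}(x) = \log(x_i),
\qquad
\nabla^2 h(x) = \operatorname{diag}\!\left(\frac{1}{x_1}, \dots, \frac{1}{x_n}\right),
\qquad
\left[\nabla^2 h(x)^{-1} \nabla f(x)\right]_i = x_i \,\frac{\partial f}{\partial x_i}(x).
\]
```

So the scaling factor $x_i$ in the ODE is exactly the $i$-th diagonal entry of the inverse Hessian of $h$.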

2. Riemannian gradient flows

Problem:

\[ \min_{x \in C} f(x), \quad \text{with } C = \{x \in \mathbb{R}^n \mid x \in \overline{Q},\ Ax = b\}. \]

$Q \subset \mathbb{R}^n$ nonempty, open and convex. $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$ with $m \le n$.

Strategy: Introduce a Riemannian metric $H(x) \in S^n_{++}$ on $Q$,

\[ (u, v)_x = \langle H(x) u, v \rangle = \sum_{i,j=1}^{n} H_{ij}(x) u_i v_j, \quad x \in Q. \]

Consider the gradient flow

\[ \frac{dx}{dt}(t) = -\nabla_H f(x(t)), \quad t > 0. \]

2. Riemannian gradient

• Let $(M, (\cdot, \cdot))$ be a Riemannian manifold.
• $T_x M$: tangent space to $M$ at $x \in M$.

The gradient $\operatorname{grad} f$ of $f \in C^1(M; \mathbb{R})$ is uniquely determined by

Tangency: for all $x \in M$, $\operatorname{grad} f(x) \in T_x M$.
Duality: for all $x \in M$, $v \in T_x M$, $df(x) v = (\operatorname{grad} f(x), v)_x$,

where $df(x) : T_x M \to \mathbb{R}$ is the differential of $f$.

2. Riemannian gradient in our case

$M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$ with $Q$ an open set $\Rightarrow$ $T_x M \simeq \operatorname{Ker} A$.

$(\cdot, \cdot)_x = \langle H(x) \cdot, \cdot \rangle$ with a barrier/penalty effect near $\partial Q$.

⇓

\[ \nabla_H f = H^{-1} \left[ I - A^T (A H^{-1} A^T)^{-1} A H^{-1} \right] \nabla f, \]

where $\nabla_H$ stands for $\operatorname{grad}$ to stress the dependence on $H$.

Projection $\leadsto$ vector field in the tangent space.
Scaling $\leadsto$ interior point method, $x(t) \in Q$.

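2. Example

\[ C = \Delta_{n-1} := \Big\{ x \in \mathbb{R}^n \;\Big|\; x \ge 0,\ \sum_{i=1}^{n} x_i = 1 \Big\} \]

$Q = \mathbb{R}^n_{++}$, $A = (1, \dots, 1) \in \mathbb{R}^{1 \times n}$ and $b = 1$.
$M = \{x \in \mathbb{R}^n \mid x > 0,\ \sum_{i=1}^{n} x_i = 1\}$.
$T_x M = \{v \in \mathbb{R}^n \mid \sum_{i=1}^{n} v_i = 0\}$.

Take $H(x) = \operatorname{diag}(1/x_1, \dots, 1/x_n)$, then

\[ (u, v)_x = \sum_{i=1}^{n} \frac{u_i v_i}{x_i} \quad \leadsto \text{ Shahshahani metric.} \]

\[ \frac{dx_i}{dt} = -x_i \frac{\partial f}{\partial x_i} + x_i \sum_{j=1}^{n} x_j \frac{\partial f}{\partial x_j} \quad \leadsto \text{ Lotka-Volterra type eq.} \]

Karmarkar '90, Faybusovich '91,...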

2. Barrier effect: Legendre functions

We focus on the case $H(x) = \nabla^2 h(x)$, $x \in Q$, with

$(H_0)$
• $h : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is closed, convex and proper.
• $\operatorname{int} \operatorname{dom} h = Q$.
• $h$ is of Legendre type.
• $h|_Q \in C^2(Q; \mathbb{R})$ and $\nabla^2 h(x) > 0$.
• $\nabla^2 h$ is locally Lipschitz.

Legendre type:
⋆ $h$ is strictly convex and $C^1$ on $\operatorname{int} \operatorname{dom} h$.
⋆ If $\operatorname{int} \operatorname{dom} h \ni x^j \to x \in \partial \operatorname{int} \operatorname{dom} h$, then $\|\nabla h(x^j)\| \to +\infty$.

2. Example of Legendre function

\[ h(x) = \sum_{i \in I} \theta(g_i(x)), \]

where
• $I = \{1, \dots, p\}$, $g_i \in C^3(\mathbb{R}^n)$ concave.
• $Q = \{x \in \mathbb{R}^n \mid g_i(x) > 0,\ i \in I\} \ne \emptyset$.
• $\forall x \in Q$, $\operatorname{span}\{\nabla g_i(x) \mid i \in I\} = \mathbb{R}^n$,

and
(i) $(0, \infty) \subseteq \operatorname{dom} \theta \subseteq [0, \infty)$, $\theta \in C^3(0, \infty)$.
(ii) $\lim_{s \to 0^+} \theta'(s) = -\infty$ and $\forall s > 0$, $\theta''(s) > 0$.
(iii) Either $\theta$ is nonincreasing or $\forall i \in I$, $g_i$ is affine.

3. Questions

Given $x_0 \in M = Q \cap \{x \in \mathbb{R}^n \mid Ax = b\}$:

\[ \frac{dx}{dt}(t) = -\nabla_H f(x(t)), \quad t > 0. \]

Well posedness: global existence for all $t > 0$.
Asymptotic behavior: convergence to an equilibrium as $t \to \infty$, rate of convergence,...
Main difficulty: singular behavior near $\partial Q$ $\Rightarrow$ classical results do not apply.

3. Well posedness: global existence

Thm. 2 The trajectory $x(t)$ is defined for all $t \ge 0$ under any of the following conditions:

$(C_1)$ $\{x \in C \mid f(x) \le f(x_0)\}$ is bounded.
$(C_2)$ (i) $\operatorname{dom} h = \overline{Q}$;
      (ii) $\forall y \in Q$, $\forall \gamma \in \mathbb{R}$, $\{x \in C \mid D_h(y, x) \le \gamma\}$ is bounded;
      (iii) $\operatorname{Argmin}_C f \ne \emptyset$ and $f$ is quasiconvex.
$(C_3)$ $\exists K \ge 0$, $L \in \mathbb{R}$ such that $\forall x \in Q$, $\|H(x)^{-1}\| \le K|x| + L$.

3. Why Hessian metrics?

Suppose $f$ is convex and $A = 0$ and $b = 0$.

Euclidean case: $y \in \operatorname{Argmin}_C f \iff \forall x \in C,\ \langle \nabla f(x), x - y \rangle \ge 0$.

\[ \varphi_y(x) = \tfrac{1}{2} \|x - y\|^2 \]

⇓

\[ \dot{x} = -\nabla f(x) \;\Rightarrow\; \frac{d}{dt} \varphi_y(x) = \langle \nabla \varphi_y(x), \dot{x} \rangle = \langle x - y, -\nabla f(x) \rangle \le 0. \]

⇓

$\varphi_y(x)$ is a Lyapunov function for the gradient flow.

3. Characterization of Hessian metrics

Riemannian case: $y \in \operatorname{Argmin}_Q f \iff \forall x \in Q,\ (\nabla_H f(x), x - y)_x \ge 0$.

Thm. 1 $H \in C^1(Q; S^n_{++})$ satisfies

\[ \forall y \in Q,\ \exists \varphi_y \in C^1(Q; \mathbb{R}),\ \nabla_H \varphi_y(x) = x - y \]

⇕

$\exists h \in C^3(Q)$ such that $H = \nabla^2 h$ on $Q$ and

\[ \varphi_y(x) = D_h(y, x) = h(y) - h(x) - \langle \nabla h(x), y - x \rangle \]

= Bregman pseudo-distance induced by $h$.
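The formula for $D_h$ translates directly into code. The sketch below evaluates it for two choices of $h$: the entropy $h(x) = \sum_i x_i \log(x_i) - x_i$ used earlier in the talk, and the squared norm $\tfrac12\|x\|^2$, for which $D_h(y, x)$ reduces to the Euclidean Lyapunov function $\tfrac12\|x - y\|^2$. The function names and the sample points are illustrative assumptions.

```python
import math

def bregman(h, grad_h, y, x):
    """D_h(y, x) = h(y) - h(x) - <grad h(x), y - x>."""
    inner = sum(g * (yi - xi) for g, yi, xi in zip(grad_h(x), y, x))
    return h(y) - h(x) - inner

def entropy(x):
    return sum(xi * math.log(xi) - xi for xi in x)

def grad_entropy(x):
    return [math.log(xi) for xi in x]

def half_sq_norm(x):
    return 0.5 * sum(xi * xi for xi in x)

def identity(x):
    return list(x)  # gradient of 0.5 * ||x||^2

y = [0.2, 0.3, 0.5]      # hypothetical sample points in the positive orthant
x = [0.25, 0.25, 0.5]
d_entropy = bregman(entropy, grad_entropy, y, x)
d_euclid = bregman(half_sq_norm, identity, y, x)
```

For the entropy, the value agrees with the closed form $\sum_i y_i \log(y_i / x_i) + x_i - y_i$, which can be used as an independent check.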

3. Implicit proximal iteration

\[ x_{k+1} \in \operatorname{Argmin} \Big\{ f(x) + \frac{1}{\lambda_k} D_h(x, x_k) \;\Big|\; Ax = b \Big\}, \]

⇕

\[ \frac{1}{\lambda_k} \left[ \nabla h(x_{k+1}) - \nabla h(x_k) \right] \in -\nabla f(x_{k+1}) + \operatorname{Im} A^T, \quad A x_{k+1} = b. \]

Bregman 67, Censor-Zenios 92, Teboulle 92, Eckstein 93, Kiwiel 97,...

But

\[ \frac{dx}{dt} = -\nabla_H f(x) \iff \begin{cases} \dfrac{d}{dt} \nabla h(x) \in -\nabla f(x) + \operatorname{Im} A^T, \\ A x(t) = b, \quad t \ge 0. \end{cases} \]

⋆ This link was already noticed by Iusem-Svaiter-Da Cruz Neto '99, together with convergence results for a linear objective function.
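In one special case the implicit step can be solved in closed form, which makes the optimality condition easy to inspect. Assume a linear objective $f(x) = \langle c, x \rangle$, the simplex constraint $A = (1, \dots, 1)$, $b = 1$, and the entropy $h(x) = \sum_i x_i \log(x_i) - x_i$; then $\nabla h = \log$ and the condition above gives $x_{k+1,i} \propto x_{k,i} \exp(-\lambda_k c_i)$. The sketch below is written under exactly these assumptions, and the function and variable names are hypothetical.

```python
import math

def entropic_prox_step(x, c, lam):
    """One Bregman proximal step for f(x) = <c, x> on the simplex with the entropy h.

    Solves (1/lam) * (log x_new - log x) = -c + mu * (1, ..., 1) with sum(x_new) = 1.
    """
    w = [xi * math.exp(-lam * ci) for xi, ci in zip(x, c)]
    total = sum(w)  # normalization plays the role of the multiplier mu
    return [wi / total for wi in w]

x_k = [0.25, 0.25, 0.5]   # hypothetical current iterate
c = [1.0, 2.0, 3.0]       # hypothetical cost vector
lam = 0.5                 # hypothetical stepsize
x_next = entropic_prox_step(x_k, c, lam)

# Residual of the optimality condition: it should be a multiple of (1, ..., 1),
# that is, an element of Im A^T for A = (1, ..., 1).
residuals = [(math.log(a) - math.log(b)) / lam + ci
             for a, b, ci in zip(x_next, x_k, c)]
```

All entries of `residuals` coincide up to rounding, which is the discrete counterpart of the inclusion $\frac{d}{dt}\nabla h(x) \in -\nabla f(x) + \operatorname{Im} A^T$.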

3. Convergence: Bregman functions

A Legendre function $h$ with $\operatorname{dom} h = \overline{Q}$ is of Bregman type if
(i) $\{x \in Q \mid D_h(y, x) \le \gamma\}$ is bounded, $\forall y \in Q$, $\forall \gamma \in \mathbb{R}$.
(ii) $\forall y \in Q$, $\forall y^j \to y$ with $y^j \in Q$, $D_h(y, y^j) \to 0$.

Thm. 3 Suppose $(H_0)$ holds with $h$ of Bregman type, $f$ is quasiconvex and $\operatorname{Argmin}_C f \ne \emptyset$. Then $\exists x_\infty \in C$ such that $x(t) \to x_\infty$ as $t \to +\infty$ with

\[ -\nabla f(x_\infty) \in N_Q(x_\infty) + (\operatorname{Ker} A)^{\perp}, \]

where $N_Q(x_\infty)$ is the normal cone to $Q$ at $x_\infty$.

3. Examples on $\Delta_{n-1}$

Boltzmann-Shannon entropy: $h(x) = \sum_{i=1}^{n} x_i \log(x_i) - x_i$.

Shahshahani metric: $H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1, \dots, 1/x_n)$.

Kullback-Leibler divergence: $D_h(y, x) = \sum_{i=1}^{n} y_i \log(y_i / x_i) + x_i - y_i$.

[Figure: graph of $h(x) = x \log(x) - x$, and a Lotka-Volterra type flow on the simplex in coordinates $(x_1, x_2, x_3)$ with $x(0) = (1/4, 1/4, 1/2)$ and $O = (0, 0, 0)$.]

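3. Other examples

$h(x) = -2 \sum_{i=1}^{n} \sqrt{x_i}$

$H(x) = \nabla^2 h(x) = \tfrac{1}{2} \operatorname{diag}(1/x_1^{3/2}, \dots, 1/x_n^{3/2})$.

$D_h(y, x) = \sum_{i=1}^{n} (\sqrt{y_i} - \sqrt{x_i})^2 / \sqrt{x_i}$.

Flow:

\[ \frac{dx_i}{dt} = -2 x_i^{3/2} \left( \frac{\partial f}{\partial x_i} - \sum_{j=1}^{n} \frac{x_j^{3/2}}{\sum_{k=1}^{n} x_k^{3/2}} \frac{\partial f}{\partial x_j} \right). \]

$h(x) = -\sum_{i=1}^{n} \log(x_i)$ ($h(0) = +\infty$ so that $h$ is not Bregman).

$H(x) = \nabla^2 h(x) = \operatorname{diag}(1/x_1^2, \dots, 1/x_n^2)$.

$D_h(y, x) = \sum_{i=1}^{n} \log(x_i / y_i) + (y_i - x_i)/x_i$.

Flow:

\[ \frac{dx_i}{dt} = -x_i^2 \left( \frac{\partial f}{\partial x_i} - \sum_{j=1}^{n} \frac{x_j^2}{\sum_{k=1}^{n} x_k^2} \frac{\partial f}{\partial x_j} \right). \]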

3. Further developments

Rate of convergence.
Dual trajectory and its convergence.
Geodesic type characterization of trajectories.
Connections with completely integrable Hamiltonian systems.

Reference: A.-Bolte-Brahic, to appear in SIAM J. on Control Optim.

Continuous version of similar results for proximal iterations: Iusem-Monteiro '00.
Generalizations of results on the log-metric in linear programming: Bayer-Lagarias '89.

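4. Duality when $Q = \mathbb{R}^n_{++}$

\[ (P) \quad \min\{f(x) \mid x \ge 0,\ Ax = b\} \]

Assume:
• $f$ is convex and $S(P) \ne \emptyset$.
• $\exists x^0 \in \mathbb{R}^n$, $x^0 > 0$, $A x^0 = b$.

Dual problem:

\[ (D) \quad \min\{p(\lambda) \mid \lambda \ge 0\} \]

where $p(\lambda) = \sup\{\langle \lambda, x \rangle - f(x) \mid Ax = b\}$. Then

\[ S(D) = \{\lambda \in \mathbb{R}^n \mid \lambda \ge 0,\ \lambda \in \nabla f(x^*) + \operatorname{Im} A^T,\ \langle \lambda, x^* \rangle = 0\}, \]

where $x^*$ is any solution of $(P)$.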

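4. Dual trajectory

Integrating the differential inclusion

\[ \frac{d}{dt} \nabla h(x(t)) \in -\nabla f(x(t)) + \operatorname{Im} A^T, \]

we obtain

\[ \lambda(t) \in c(t) + \operatorname{Im} A^T, \]

where

\[ c(t) = \frac{1}{t} \int_0^t \nabla f(x(\tau))\, d\tau \quad \text{and} \quad \lambda(t) = \frac{1}{t} \left[ \nabla h(x^0) - \nabla h(x(t)) \right]. \]

If $h(x) = \sum_{i=1}^{n} \theta(x_i)$, then $\lambda_i(t) = \frac{1}{t} \left[ \theta'(x_i^0) - \theta'(x_i(t)) \right]$.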

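4. Dual penalty scheme

We have: $\lambda(t) = \frac{1}{t} \left[ \nabla h(x^0) - \nabla h(x(t)) \right]$. But

\[ h \text{ is Legendre} \;\Rightarrow\; (\nabla h)^{-1} = \nabla h^*, \]

with $h^*(\lambda) = \sum_{i=1}^{n} \theta^*(\lambda_i)$ being the Fenchel conjugate of $h$. Hence

\[ x(t) = \nabla h^*(\nabla h(x^0) - t \lambda(t)). \]

Take $\tilde{x}$ with $A \tilde{x} = b$. Since $A x(t) = b$, we have

\[ \tilde{x} - \nabla h^*(\nabla h(x^0) - t \lambda(t)) \in \operatorname{Ker} A. \]

Then, $\lambda(t)$ solves

\[ \min_{\lambda} \Big\{ \langle \tilde{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^{n} \theta^*(\theta'(x_i^0) - t \lambda_i) \;\Big|\; \lambda \in c(t) + \operatorname{Im} A^T \Big\}. \]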

4. Dual trajectory: convergence

Example: $\theta(s) = s \log(s) - s \Rightarrow \theta^*(s^*) = \exp(s^*)$, $s^* \in \mathbb{R}$. Then

\[ \min_{\lambda} \Big\{ \langle \tilde{x}, \lambda \rangle + \frac{1}{t} \sum_{i=1}^{n} x_i^0 \exp(-t \lambda_i) \;\Big|\; \lambda \in c(t) + \operatorname{Im} A^T \Big\}. \]

Convergence:
• $f(x) = \langle c, x \rangle \Rightarrow c(t) = \frac{1}{t} \int_0^t \nabla f(x(\tau))\, d\tau \equiv c$ $\Rightarrow$ by Cominetti-San Martin '96, Auslender et al. '97, Cominetti '00,... convergence to the $\theta^*$-center of $S(D)$.
• Otherwise, $x(t)$ bounded $\Rightarrow \nabla f(x(t)) \to \nabla f(x^*)$ for $x^* \in S(P)$ $\Rightarrow c(t) \to \nabla f(x^*)$ $\Rightarrow$ convergence by Iusem-Monteiro '00.
