Global Secant Methods and Matrix Structures - Math Unipd

0 downloads 0 Views 493KB Size Report
LBk − Bk F = min. X∈L. X − Bk F ,. ·F = Frob.norm where L = algebra ⊂ Cn×n,. S. Fanelli. Padova, February 2007. Prin2004 e Prin2006, Roma ”Tor Vergata” ...
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Global Secant Methods and Matrix Structures Stefano Fanelli

Padova, February 2007

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

1

Global Optimisation Quasi-Newton(QN) algorithms

2

Matrix structures in a global minimization scheme

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L

where L = algebra ⊂ Cn×n ,

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L

where L = algebra ⊂ Cn×n , one can define descent methods LQN [DFLZ]: w0 ∈ R n , d0 = −g0 For k = 0, 1, . . .  wk+1 = wk + λk dk λk > 0    B k+1 = ϕ( LBk , wk+1 − wk , gk+1 − gk ), gk = ∇E (wk ) | {z } | {z }  sk yk   −1 dk+1 = −Bk+1 gk+1

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L

where L = algebra ⊂ Cn×n , one can define descent methods LQN [DFLZ]: w0 ∈ R n , d0 = −g0 For k = 0, 1, . . .  wk+1 = wk + λk dk λk > 0    B k+1 = ϕ( LBk , wk+1 − wk , gk+1 − gk ), gk = ∇E (wk ) | {z } | {z }  sk yk   −1 dk+1 = −Bk+1 gk+1 Remark The classical BFGS method [NW] and the more recent minimization methods introduced in [BDFZ], [DFZ2], [DFZ3] are examples of LQN algorithms, (being L = Cn×n , L = {αI }, {Hartley − type}, Lk ) S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) =  +

S. Fanelli

1 1 yyT − T ssT . yT s s s

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) =  +

1 1 yyT − T ssT . yT s s s

The choice of λk and the properties of ϕ and LBk imply: Bk+1 inherites positive definiteness from Bk Bk+1 (wk+1 − wk ) = gk+1 − gk , ⇒ LQN secant algorithms

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) =  +

1 1 yyT − T ssT . yT s s s

The choice of λk and the properties of ϕ and LBk imply: Bk+1 inherites positive definiteness from Bk Bk+1 (wk+1 − wk ) = gk+1 − gk , ⇒ LQN secant algorithms The structured space L ⇒ LQN of low complexity

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

One can prove the following global convergence theorem [DFZ4]: Theorem 1 Given E ∈ C 2 , let Emin be the value of its global minimum. Assume that: ∀a ∈ 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;

2

λk ≤ εa /k∇E (wk )k2 ;

3

k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk

4

kAk kkA−1 k k≤N

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1

∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;

2

λk ≤ εa /k∇E (wk )k2 ;

3

k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk

4

kAk kkA−1 k k≤N

N.B. The second condition derives from terminal attractors theory [DFZ0].

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1

∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;

2

λk ≤ εa /k∇E (wk )k2 ;

3

k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk

4

kAk kkA−1 k k≤N

N.B. The second condition derives from terminal attractors theory [DFZ0]. The third condition is a sort of weak discrete convexity assumption [P].

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1

∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;

2

λk ≤ εa /k∇E (wk )k2 ;

3

k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk

4

kAk kkA−1 k k≤N

N.B. The second condition derives from terminal attractors theory [DFZ0]. The third condition is a sort of weak discrete convexity assumption [P]. Since every descent direction is associated to a p.d. matrix with a well defined spectral structure, the fourth condition may be satisfied, by suitably modifying the matrices Ak during the optimization process. S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view.

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1

Arep ≈ λrep I + (I /µ + R)

S. Fanelli

Padova, February 2007

, rank(R) ≤ 4

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1

Arep ≈ λrep I + (I /µ + R)

, rank(R) ≤ 4

being, by Condition (2), λrep the maximal scalar repeller, i.e.: λrep =

a , k∇E (wr )k2

S. Fanelli

k∇E (wr )k
Emin

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1

Arep ≈ λrep I + (I /µ + R)

, rank(R) ≤ 4

being, by Condition (2), λrep the maximal scalar repeller, i.e.: λrep =

a , k∇E (wr )k2

k∇E (wr )k
Emin

and R with the following structure: R = µ1 ppT + µ2 qqT + µ3 qpT + µ4 pqT , p e q suitable vectors

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s .

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =

S. Fanelli

Padova, February 2007

1 k∇E (w0 )k 2

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =

1 k∇E (w0 )k 2

. . . for a > E (wr ) − Emin , (1) is satisfied if s =

S. Fanelli

1 min k∇E (wk )k 2 k=0,...,r

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =

1 k∇E (w0 )k 2

. . . for a > E (wr ) − Emin , (1) is satisfied if s =

1 min k∇E (wk )k 2 k=0,...,r

When s is becoming “small” and E (wr ) >> Emin = E (w∗ ), then the ˆ which cannot correspond sequence is converging to a stationary point w to the global minimum Emin .

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Setting

S. Fanelli

Mr

=

Nr

=

max

k=0,...,r

kyk k2 ykT sk

max kAk kkA−1 k k

k=0,...,r

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Setting Mr

=

Nr

=

max

k=0,...,r

kyk k2 ykT sk

max kAk kkA−1 k k

k=0,...,r

K = max{Mr , Nr }. It follows from Condition (3): ∀k ≤ r ,

S. Fanelli

Padova, February 2007

λr ≥

kyk k2 . K ykT dk

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

Setting Mr

=

Nr

=

max

k=0,...,r

kyk k2 ykT sk

max kAk kkA−1 k k

k=0,...,r

K = max{Mr , Nr }. It follows from Condition (3): ∀k ≤ r ,

λr ≥

kyk k2 . K ykT dk

Purpose: Define wr +1 such that, by using the latter point as the new starting vector, the algorithm LQN is convergent to a stationary ˆ ˆˆ ˆ with E (w) 0

Rr +1 (µ) = (1)

being: pr = wr +1 − wr

S. Fanelli

qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)

qr = ∇E (wr +1 ) − ∇E (wr )

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

SOME PRELIMINARY IDEAS: Suggestions for a proper tunneling phase. (1)

a 1) Compute wr +1 = wr −λrep ∇E (wr ), λrep = k∇E (w 2 r )k 2) Define Rr +1 (µ) : rank(Rr +1 (µ)) = 2, µ > 0

Rr +1 (µ) = (1)

being: pr = wr +1 − wr

qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)

qr = ∇E (wr +1 ) − ∇E (wr )

By applying Sherman-Morrison-Woodbury formula, it follows:

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

SOME PRELIMINARY IDEAS: Suggestions for a proper tunneling phase. (1)

a 1) Compute wr +1 = wr −λrep ∇E (wr ), λrep = k∇E (w 2 r )k 2) Define Rr +1 (µ) : rank(Rr +1 (µ)) = 2, µ > 0

Rr +1 (µ) = (1)

being: pr = wr +1 − wr

qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)

qr = ∇E (wr +1 ) − ∇E (wr )

By applying Sherman-Morrison-Woodbury formula, it follows: 

−1 T q T qr pr p T pr q T r + qr pr I /µ + Rr +1 (µ) = µI +(1+µ rT ) T r −µ( ), µ > 0 T q r pr qr pr qr pr

Thus, we obtain a memoryless updating formula. S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

(2)

3) Define wr +1 :  −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) :

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

(2)

3) Define wr +1 :  −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) : h  −1 i (2) (1) E (wr +1 ) = min E wr +1 − I /µ + Rr +1 (µ) ∇E (wr ) µ

Therefore: h  −1 i (2) wr +1 = wr − λrep I + I /µ0 + Rr +1 (µ0 ) ∇E (wr )

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

(2)

3) Define wr +1 :  −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) : h  −1 i (2) (1) E (wr +1 ) = min E wr +1 − I /µ + Rr +1 (µ) ∇E (wr ) µ

Therefore: h  −1 i (2) wr +1 = wr − λrep I + I /µ0 + Rr +1 (µ0 ) ∇E (wr ) 4) Evaluate: (2)

E (wr +1 ) − E (wr )

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

if: (

(2)

E (wr +1 ) < E (wr ) (2) E (wr +1 )

or   (2) − E (wr ) < c E (wr +1 ) − Emin

being c(.) a suitable function

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

(EC )

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

if: (

(2)

E (wr +1 ) < E (wr ) (2) E (wr +1 )

or   (2) − E (wr ) < c E (wr +1 ) − Emin

being c(.) a suitable function (2)

Define wr +1 = wr +1 and start a new local search. Else (2)

5) Set: pr = wr +1 − wr

S. Fanelli

(2)

qr = ∇E (wr +1 ) − ∇E (wr ).

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

(EC )

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

if: (

(2)

E (wr +1 ) < E (wr ) (2) E (wr +1 )

or   (2) − E (wr ) < c E (wr +1 ) − Emin

(EC )

being c(.) a suitable function (2)

Define wr +1 = wr +1 and start a new local search. Else (2)

5) Set: pr = wr +1 − wr

(2)

qr = ∇E (wr +1 ) − ∇E (wr ).

Solve the new minimum problem associated to the corresponding 

S. Fanelli

−1 I /µ + Rr +1 (µ)

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

If one of conditions (EC) is fulfilled, define: (3)

wr +1 = wr +1 and start a new local search. ....................................................

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

If one of conditions (EC) is fulfilled, define: (3)

wr +1 = wr +1 and start a new local search. .................................................... Else: 6) Redefine λr by Condition 3 i.e.: kyr k2 a ≤ λr ≤ T K yr d r k∇E (wr )k2

S. Fanelli

Padova, February 2007

Prin2004 e Prin2006, Roma ”Tor Vergata”

Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme

If one of conditions (EC) is fulfilled, define: (3)

wr +1 = wr +1 and start a new local search. .................................................... Else: 6) Redefine λr by Condition 3 i.e.: kyr k2 a ≤ λr ≤ T K yr d r k∇E (wr )k2 and then assume: λr