LBk â Bk F = min. XâL. X â Bk F ,. ·F = Frob.norm where L = algebra â CnÃn,. S. Fanelli. Padova, February 2007. Prin2004 e Prin2006, Roma âTor Vergataâ ...
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Global Secant Methods and Matrix Structures Stefano Fanelli
Padova, February 2007
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
1
Global Optimisation Quasi-Newton(QN) algorithms
2
Matrix structures in a global minimization scheme
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L
where L = algebra ⊂ Cn×n ,
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L
where L = algebra ⊂ Cn×n , one can define descent methods LQN [DFLZ]: w0 ∈ R n , d0 = −g0 For k = 0, 1, . . . wk+1 = wk + λk dk λk > 0 B k+1 = ϕ( LBk , wk+1 − wk , gk+1 − gk ), gk = ∇E (wk ) | {z } | {z } sk yk −1 dk+1 = −Bk+1 gk+1
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Global Optimisation Quasi-Newton(QN) algorithms Given an approssimation Bk of ∇2 E (wk ), let us define the matrix LBk : kLBk − Bk kF = min kX − Bk kF , k · kF = Frob.norm X ∈L
where L = algebra ⊂ Cn×n , one can define descent methods LQN [DFLZ]: w0 ∈ R n , d0 = −g0 For k = 0, 1, . . . wk+1 = wk + λk dk λk > 0 B k+1 = ϕ( LBk , wk+1 − wk , gk+1 − gk ), gk = ∇E (wk ) | {z } | {z } sk yk −1 dk+1 = −Bk+1 gk+1 Remark The classical BFGS method [NW] and the more recent minimization methods introduced in [BDFZ], [DFZ2], [DFZ3] are examples of LQN algorithms, (being L = Cn×n , L = {αI }, {Hartley − type}, Lk ) S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) = +
S. Fanelli
1 1 yyT − T ssT . yT s s s
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) = +
1 1 yyT − T ssT . yT s s s
The choice of λk and the properties of ϕ and LBk imply: Bk+1 inherites positive definiteness from Bk Bk+1 (wk+1 − wk ) = gk+1 − gk , ⇒ LQN secant algorithms
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
The step λk is determined such that: λk | sT k yk > 0 & E (wk+1 ) < E (wk ) The updating function ϕ in Bk+1 = ϕ (LBk , sk , yk ) is ϕ (, s, y) = +
1 1 yyT − T ssT . yT s s s
The choice of λk and the properties of ϕ and LBk imply: Bk+1 inherites positive definiteness from Bk Bk+1 (wk+1 − wk ) = gk+1 − gk , ⇒ LQN secant algorithms The structured space L ⇒ LQN of low complexity
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
One can prove the following global convergence theorem [DFZ4]: Theorem 1 Given E ∈ C 2 , let Emin be the value of its global minimum. Assume that: ∀a ∈ 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;
2
λk ≤ εa /k∇E (wk )k2 ;
3
k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk
4
kAk kkA−1 k k≤N
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1
∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;
2
λk ≤ εa /k∇E (wk )k2 ;
3
k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk
4
kAk kkA−1 k k≤N
N.B. The second condition derives from terminal attractors theory [DFZ0].
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1
∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;
2
λk ≤ εa /k∇E (wk )k2 ;
3
k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk
4
kAk kkA−1 k k≤N
N.B. The second condition derives from terminal attractors theory [DFZ0]. The third condition is a sort of weak discrete convexity assumption [P].
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Theorem 1 allows the definition of the following Non-Suspiciousness Conditions for a BFGS-type method: 1
∀εa > 0, ∃εs : ||∇E (wk )|| > εs during the BFGS-type descent algorithm, apart for k : E (wk ) − Emin < εa ;
2
λk ≤ εa /k∇E (wk )k2 ;
3
k∇E (wk+1 ) − ∇E (wk )k2 kyk k2 = ≤ M, (∇E (wk+1 ) − ∇E (wk ))T λk dk ykT sk
4
kAk kkA−1 k k≤N
N.B. The second condition derives from terminal attractors theory [DFZ0]. The third condition is a sort of weak discrete convexity assumption [P]. Since every descent direction is associated to a p.d. matrix with a well defined spectral structure, the fourth condition may be satisfied, by suitably modifying the matrices Ak during the optimization process. S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view.
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1
Arep ≈ λrep I + (I /µ + R)
S. Fanelli
Padova, February 2007
, rank(R) ≤ 4
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1
Arep ≈ λrep I + (I /µ + R)
, rank(R) ≤ 4
being, by Condition (2), λrep the maximal scalar repeller, i.e.: λrep =
a , k∇E (wr )k2
S. Fanelli
k∇E (wr )k
Emin
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Matrix structures in a global minimization scheme In order to define a global minimization scheme, we must satisfy Theorem 1 assumptions from an operational point of view. This leads to compute a repeller matrix Arep for each local minimization. The basic idea is ([T]) to approximate Arep by the following expression: −1
Arep ≈ λrep I + (I /µ + R)
, rank(R) ≤ 4
being, by Condition (2), λrep the maximal scalar repeller, i.e.: λrep =
a , k∇E (wr )k2
k∇E (wr )k
Emin
and R with the following structure: R = µ1 ppT + µ2 qqT + µ3 qpT + µ4 pqT , p e q suitable vectors
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s .
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =
S. Fanelli
Padova, February 2007
1 k∇E (w0 )k 2
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =
1 k∇E (w0 )k 2
. . . for a > E (wr ) − Emin , (1) is satisfied if s =
S. Fanelli
1 min k∇E (wk )k 2 k=0,...,r
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
In the first iterations of every local minimization, it is sufficient to verify Condition (1) with a “large” s . Since wk is such that E (wk+1 ) < E (wk ), for a > E (w1 ) − Emin , (1) is satisfied if s =
1 k∇E (w0 )k 2
. . . for a > E (wr ) − Emin , (1) is satisfied if s =
1 min k∇E (wk )k 2 k=0,...,r
When s is becoming “small” and E (wr ) >> Emin = E (w∗ ), then the ˆ which cannot correspond sequence is converging to a stationary point w to the global minimum Emin .
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Setting
S. Fanelli
Mr
=
Nr
=
max
k=0,...,r
kyk k2 ykT sk
max kAk kkA−1 k k
k=0,...,r
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Setting Mr
=
Nr
=
max
k=0,...,r
kyk k2 ykT sk
max kAk kkA−1 k k
k=0,...,r
K = max{Mr , Nr }. It follows from Condition (3): ∀k ≤ r ,
S. Fanelli
Padova, February 2007
λr ≥
kyk k2 . K ykT dk
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
Setting Mr
=
Nr
=
max
k=0,...,r
kyk k2 ykT sk
max kAk kkA−1 k k
k=0,...,r
K = max{Mr , Nr }. It follows from Condition (3): ∀k ≤ r ,
λr ≥
kyk k2 . K ykT dk
Purpose: Define wr +1 such that, by using the latter point as the new starting vector, the algorithm LQN is convergent to a stationary ˆ ˆˆ ˆ with E (w) 0
Rr +1 (µ) = (1)
being: pr = wr +1 − wr
S. Fanelli
qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)
qr = ∇E (wr +1 ) − ∇E (wr )
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
SOME PRELIMINARY IDEAS: Suggestions for a proper tunneling phase. (1)
a 1) Compute wr +1 = wr −λrep ∇E (wr ), λrep = k∇E (w 2 r )k 2) Define Rr +1 (µ) : rank(Rr +1 (µ)) = 2, µ > 0
Rr +1 (µ) = (1)
being: pr = wr +1 − wr
qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)
qr = ∇E (wr +1 ) − ∇E (wr )
By applying Sherman-Morrison-Woodbury formula, it follows:
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
SOME PRELIMINARY IDEAS: Suggestions for a proper tunneling phase. (1)
a 1) Compute wr +1 = wr −λrep ∇E (wr ), λrep = k∇E (w 2 r )k 2) Define Rr +1 (µ) : rank(Rr +1 (µ)) = 2, µ > 0
Rr +1 (µ) = (1)
being: pr = wr +1 − wr
qr q T pr pT r − (I /µ) T r T qr pr pr pr (1)
qr = ∇E (wr +1 ) − ∇E (wr )
By applying Sherman-Morrison-Woodbury formula, it follows:
−1 T q T qr pr p T pr q T r + qr pr I /µ + Rr +1 (µ) = µI +(1+µ rT ) T r −µ( ), µ > 0 T q r pr qr pr qr pr
Thus, we obtain a memoryless updating formula. S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
(2)
3) Define wr +1 : −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) :
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
(2)
3) Define wr +1 : −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) : h −1 i (2) (1) E (wr +1 ) = min E wr +1 − I /µ + Rr +1 (µ) ∇E (wr ) µ
Therefore: h −1 i (2) wr +1 = wr − λrep I + I /µ0 + Rr +1 (µ0 ) ∇E (wr )
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
(2)
3) Define wr +1 : −1 (2) (1) wr +1 = wr +1 − I /µ0 + Rr +1 (µ0 ) ∇E (wr ) : h −1 i (2) (1) E (wr +1 ) = min E wr +1 − I /µ + Rr +1 (µ) ∇E (wr ) µ
Therefore: h −1 i (2) wr +1 = wr − λrep I + I /µ0 + Rr +1 (µ0 ) ∇E (wr ) 4) Evaluate: (2)
E (wr +1 ) − E (wr )
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
if: (
(2)
E (wr +1 ) < E (wr ) (2) E (wr +1 )
or (2) − E (wr ) < c E (wr +1 ) − Emin
being c(.) a suitable function
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
(EC )
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
if: (
(2)
E (wr +1 ) < E (wr ) (2) E (wr +1 )
or (2) − E (wr ) < c E (wr +1 ) − Emin
being c(.) a suitable function (2)
Define wr +1 = wr +1 and start a new local search. Else (2)
5) Set: pr = wr +1 − wr
S. Fanelli
(2)
qr = ∇E (wr +1 ) − ∇E (wr ).
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
(EC )
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
if: (
(2)
E (wr +1 ) < E (wr ) (2) E (wr +1 )
or (2) − E (wr ) < c E (wr +1 ) − Emin
(EC )
being c(.) a suitable function (2)
Define wr +1 = wr +1 and start a new local search. Else (2)
5) Set: pr = wr +1 − wr
(2)
qr = ∇E (wr +1 ) − ∇E (wr ).
Solve the new minimum problem associated to the corresponding
S. Fanelli
−1 I /µ + Rr +1 (µ)
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
If one of conditions (EC) is fulfilled, define: (3)
wr +1 = wr +1 and start a new local search. ....................................................
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
If one of conditions (EC) is fulfilled, define: (3)
wr +1 = wr +1 and start a new local search. .................................................... Else: 6) Redefine λr by Condition 3 i.e.: kyr k2 a ≤ λr ≤ T K yr d r k∇E (wr )k2
S. Fanelli
Padova, February 2007
Prin2004 e Prin2006, Roma ”Tor Vergata”
Global Optimisation Quasi-Newton(QN) algorithms Matrix structures in a global minimization scheme
If one of conditions (EC) is fulfilled, define: (3)
wr +1 = wr +1 and start a new local search. .................................................... Else: 6) Redefine λr by Condition 3 i.e.: kyr k2 a ≤ λr ≤ T K yr d r k∇E (wr )k2 and then assume: λr