J. Edu. & Sci., Vol. (23), No. (1) 2010
A new self-scaling VM-algorithm with an adaptive line search
Abbas Al-Baiati    Salah G. Shareef    Hamsa Th. Chilmerane
Department of Mathematics
College of Computers Sciences & Mathematics
Mosul University
Received 25 / 06 / 2008
Accepted 16 / 02 / 2009
Abstract (Arabic):
In this research, a new self-scaling variable metric algorithm for nonlinear unconstrained optimization is proposed, using a suitable line search. The local convergence property of the BFGS matrix is then used to obtain the global convergence property of the matrix employed. The algorithm has been compared with the standard BFGS algorithm, and the results were very encouraging.
Abstract
In this study, a new self-scaling VM algorithm for solving large-scale nonlinear unconstrained optimization problems is proposed. The new algorithm employs an adaptive line search procedure. The local convergence theory for the BFGS implementation of this variable metric is applied to obtain a globally convergent property. Numerical experiments against the standard BFGS algorithm give very promising results.
Keywords: Self-scaling Variable Metric, Line search procedure, Large-scale optimization, Global convergence algorithm.
1. Introduction
Consider a line search descent method for solving the unconstrained optimization problem
$\min_{x \in R^n} f(x)$ ……(1)
where f is twice continuously differentiable. Each method generates a sequence of points $\{x_k\}$, k = 1, 2, …, until termination. The initial point $x_1$ is given. If $g_k = \nabla f(x_k) = 0$ for some k, then the method terminates with $x_k = x^*$. Otherwise, a search direction $d_k$ is defined for which
$g_k^T d_k < 0$ ……(2)
(the descent property). Then a new iterate is defined by the line search
$x_{k+1} = x_k + \lambda_k d_k$ ……(3)
where $\lambda_k$ is a step length chosen to minimize f along $d_k$, for which
$g_{k+1}^T d_k = 0$ ……(4)
In this case the line search is exact (see [1]). The search direction of Newton's method is defined by
$d_k = -G_k^{-1} g_k$ ……(5)
where $G_k = \nabla^2 f(x_k)$ is the Hessian matrix which, for $x_k$ sufficiently close to $x^*$, is usually positive definite. The methods that we study are all iterative methods, since even in the quadratic case we cannot expect to carry out enough iterations to obtain an exact solution (although the theory allows this possibility), due to the size of n or the build-up of round-off error. For the quadratic case the conjugate gradient (CG) method itself (see [3]), or some preconditioned conjugate gradient (PCG) method (see [4]), is usually the method of choice, although there are other variants, such as the minimum residual (MR) algorithm, that are also applicable when G is symmetric and indefinite. We consider the case when a preconditioned version of a nonlinear conjugate gradient method is used to minimize f; the matrix G will denote the preconditioning matrix. At the first iteration, the search direction is defined by
$d_1 = -G^{-1} g_1$ ……(6)
For k > 1,
$d_k = -G_k^{-1} g_k + \beta_{k-1} d_{k-1}$ ……(7)
where
$\beta_{k-1} = \dfrac{y_{k-1}^T G_k^{-1} g_k}{y_{k-1}^T d_{k-1}}$ ……(8)
and where
$y_{k-1} = g_k - g_{k-1}$ ……(9)
(This method is referred to as the preconditioned conjugate gradient (PCG) method; for more details see [1].) Resetting corresponds to defining $d_k$ for k > 1 as
$d_k = -G_k^{-1} g_k$ ……(10)
which in effect starts a new sequence of iterations.
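To make the resetting rule of equations (6)-(10) concrete, a minimal Python sketch of the PCG direction computation is given below. The function name pcg_direction and its arguments are illustrative only (not from the paper), and a fixed preconditioner inverse G_inv is assumed to be available.

```python
import numpy as np

def pcg_direction(G_inv, g_k, g_prev=None, d_prev=None, reset=False):
    """Preconditioned CG search direction, eqs. (6)-(10) (illustrative sketch).

    G_inv  : inverse of the fixed preconditioning matrix G
    g_k    : current gradient
    g_prev : previous gradient (None on the first iteration)
    d_prev : previous search direction
    reset  : force a restart, eq. (10)
    """
    if g_prev is None or d_prev is None or reset:
        return -G_inv @ g_k                                   # eqs. (6)/(10)
    y_prev = g_k - g_prev                                     # eq. (9)
    beta = (y_prev @ (G_inv @ g_k)) / (y_prev @ d_prev)       # eq. (8)
    return -G_inv @ g_k + beta * d_prev                       # eq. (7)
```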
For the VM-methods, assume that at the k-th iteration an approximation point $x_k$ and an $n \times n$ matrix $H_k$ are available. Then the method proceeds by generating a sequence of approximation points via the equation
$x_{k+1} = x_k + \lambda_k d_k$ ……(11)
and
$d_{k+1} = -H_k g_k$ ……(12)
where $H_k$ is an approximation of $G^{-1}$ which is corrected or updated from iteration to iteration; in general $H_k$ is symmetric and positive definite. There are different choices of $H_k$ (see [5]); we list here some of the most popular forms (see [6]):
$H_{k+1}^{SR} = H_k + \dfrac{(v_k - H_k y_k)(v_k - H_k y_k)^T}{(v_k - H_k y_k)^T y_k}$ ……(13)
which is called the rank-one correction formula, where $v_k = x_{k+1} - x_k$ and $y_k$ is defined as in equation (9);
$H_{k+1}^{BFGS} = H_k + \left[1 + \dfrac{y_k^T H_k y_k}{v_k^T y_k}\right] \dfrac{v_k v_k^T}{v_k^T y_k} - \left[\dfrac{v_k y_k^T H_k + H_k y_k v_k^T}{v_k^T y_k}\right]$ ……(14)
which satisfies the quasi-Newton condition defined by
$H_{k+1} y_k = v_k$ ……(15)
and maintains positive definite matrices if $H_0$ is positive definite.
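As an illustration of the update (14), the following minimal Python sketch (function and variable names are illustrative, not from the paper) applies the BFGS inverse-Hessian update; it can also be used to verify numerically that the quasi-Newton condition (15) holds.

```python
import numpy as np

def bfgs_inverse_update(H, v, y):
    """BFGS update of the inverse-Hessian approximation, eq. (14) (sketch).

    H : current approximation H_k (symmetric positive definite)
    v : step v_k = x_{k+1} - x_k
    y : gradient difference y_k = g_{k+1} - g_k
    """
    vy = v @ y                        # curvature v_k^T y_k (must be positive)
    Hy = H @ y
    H_new = (H
             + (1.0 + (y @ Hy) / vy) * np.outer(v, v) / vy
             - (np.outer(v, Hy) + np.outer(Hy, v)) / vy)
    return H_new

# The update satisfies the quasi-Newton condition H_{k+1} y_k = v_k, eq. (15):
# np.allclose(bfgs_inverse_update(H, v, y) @ y, v)  ->  True
```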
2. Self-scaling VM methods
To improve the performance of VM updates (see [6]), we try to choose $H_{k+1}$ to satisfy the following modified QN equation
$H_{k+1} y_k = \rho_k v_k$ ……(16)
where $\rho_k > 0$ is a scaling parameter. The BFGS update may then be written as:
$H_{k+1} = H_k - \dfrac{H_k y_k v_k^T + v_k y_k^T H_k}{v_k^T y_k} + \left[\dfrac{1}{T_k} + \dfrac{y_k^T H_k y_k}{v_k^T y_k}\right] \dfrac{v_k v_k^T}{v_k^T y_k}$ ……(17)
where
$T_k = \dfrac{1}{\lambda_k} = \dfrac{6\left(f(x_k) - f(x_{k+1}) + v_k^T g_{k+1}\right)}{v_k^T y_k} - 2$ ……(18)
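Assuming the reconstruction of equations (17)-(18) above, a minimal Python sketch of the scaled update is shown below; the function name and arguments are hypothetical and only illustrate how T_k would be computed from stored function and gradient values.

```python
import numpy as np

def modified_bfgs_update(H, v, y, f_k, f_k1, g_k1):
    """Scaled BFGS update of eqs. (17)-(18) (illustrative sketch).

    H          : current inverse-Hessian approximation H_k
    v, y       : v_k = x_{k+1} - x_k,  y_k = g_{k+1} - g_k
    f_k, f_k1  : f(x_k), f(x_{k+1})
    g_k1       : gradient g_{k+1}
    """
    vy = v @ y
    Hy = H @ y
    # eq. (18): T_k = 6(f(x_k) - f(x_{k+1}) + v_k^T g_{k+1}) / (v_k^T y_k) - 2
    T = 6.0 * (f_k - f_k1 + v @ g_k1) / vy - 2.0
    # eq. (17): scaled BFGS-type correction of H_k
    H_new = (H
             - (np.outer(Hy, v) + np.outer(v, Hy)) / vy
             + (1.0 / T + (y @ Hy) / vy) * np.outer(v, v) / vy)
    return H_new
```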
In fact, the self-scaling update proposed by Oren [8] has some good characteristics. With a self-scaling parameter $\gamma_k$, this class of updates can be written as
$H_{k+1} = \left[H_k - \dfrac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \phi_k \,(y_k^T H_k y_k)\, \bar{v}_k \bar{v}_k^T\right] \gamma_k + \dfrac{v_k v_k^T}{v_k^T y_k}$ ……(19a)
with
$\bar{v}_k = \dfrac{v_k}{v_k^T y_k} - \dfrac{H_k y_k}{y_k^T H_k y_k}$
Oren and Luenberger [9] suggest using the self-adaptable values for the parameter $\gamma_k$:
$\gamma_k = t\,\dfrac{g_k^T v_k}{v_k^T y_k} + (1 - t)\,\dfrac{g_k^T H_k y_k}{y_k^T H_k y_k}$ ……(19b)
and usually the value t = 0 is recommended for the updates in the convex class, i.e.
$\gamma_k = \dfrac{v_k^T y_k}{y_k^T H_k y_k}$ and $\sigma_k = \dfrac{1}{\gamma_k} = \dfrac{y_k^T H_k y_k}{v_k^T y_k}$ (Al-Bayati, 1991) ……(20)
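The following minimal Python fragment (illustrative names, not from the paper) shows how the scaling parameters of equations (19b) and (20) might be evaluated from the current step and gradient information:

```python
import numpy as np

def scaling_parameters(H, g, v, y, t=0.0):
    """Self-scaling parameters gamma_k and sigma_k, eqs. (19b)-(20) (sketch).

    H : inverse-Hessian approximation H_k;  g : gradient g_k
    v : step v_k;  y : gradient difference y_k
    t : interpolation parameter of eq. (19b)
    """
    Hy = H @ y
    yHy = y @ Hy
    vy = v @ y
    # eq. (19b): gamma_k = t (g^T v)/(v^T y) + (1 - t) (g^T H y)/(y^T H y)
    gamma_19b = t * (g @ v) / vy + (1.0 - t) * (g @ Hy) / yHy
    # eq. (20): convex-class choice gamma_k = (v^T y)/(y^T H y), sigma_k = 1/gamma_k
    gamma_20 = vy / yHy
    sigma = yHy / vy
    return gamma_19b, gamma_20, sigma
```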
3. Armijo line search procedure
Armijo provided in [10] a modification of the steepest descent method which automatically adapts the step size λ of the iterative scheme
$x_{k+1} = x_k - \lambda_k \nabla f(x_k), \quad k = 1, 2, 3, \ldots$ ……(21)
His method and the corresponding convergence result are as follows:
Theorem (Armijo [10]). Suppose that the objective function $f: R^n \to R$ is continuous on $R^n$ and bounded below on $R^n$. Assume that for a given $x_0 \in R^n$, the function f is continuously differentiable on the bounded level set $S(x_0) = \{x : f(x) \le f(x_0)\}$ and that there exists a unique point $x^* \in R^n$ which minimizes f. Suppose further that the equation $\nabla f(x) = 0$ is satisfied for $x \in S(x_0)$ if and only if $x = x^*$, and that $\nabla f$ is Lipschitz continuous on $S(x_0)$, i.e., there exists a Lipschitz constant K > 0 such that
$\|\nabla f(x) - \nabla f(y)\| \le K \|x - y\|$
for every pair $x, y \in S(x_0)$. Let $\lambda_0$ be an arbitrarily assigned positive number, and consider the sequence $\lambda_m = \lambda_0 / 2^{m-1}$, m = 1, 2, …. Then for the sequence of points $\{x_k\}_{k=0}^{\infty}$
defined by
$x_{k+1} = x_k - \lambda_{m_k} \nabla f(x_k), \quad k = 0, 1, \ldots$ ……(22)
where $m_k$ is the smallest positive integer for which
$f(x_k - \lambda_{m_k} \nabla f(x_k)) - f(x_k) \le -\tfrac{1}{2} \lambda_{m_k} \|\nabla f(x_k)\|^2$
it holds that $\lim_{k \to \infty} x_k = x^*$.
Next, a high-level description of Armijo's algorithm is given, in which the corresponding parameters denote: $x_0$ the initial point, $\lambda_0$ an arbitrary large initial step size, NIT the maximum number of iterations required, and ε the predetermined desired accuracy.
Algorithm 1
1- Input {f; x_0; λ_0; NIT; ε}
2- Set k = 1
3- If k
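A minimal Python sketch of the Armijo procedure outlined by the theorem and Algorithm 1 is given below; the function and parameter names are illustrative, and the step-halving loop implements the acceptance test with the factor 1/2 from the theorem.

```python
import numpy as np

def armijo_steepest_descent(f, grad, x0, lam0=1.0, nit=1000, eps=1e-6):
    """Steepest descent with the Armijo step-size rule of eq. (22) (sketch).

    f, grad : objective function and its gradient
    x0      : initial point;  lam0 : arbitrary large initial step size
    nit     : maximum number of iterations;  eps : gradient tolerance
    """
    x = np.asarray(x0, dtype=float)
    for k in range(nit):
        g = grad(x)
        gnorm2 = g @ g
        if np.sqrt(gnorm2) <= eps:                    # convergence test
            break
        lam = lam0
        # halve the step until f(x - lam*g) - f(x) <= -(1/2)*lam*||g||^2
        while f(x - lam * g) - f(x) > -0.5 * lam * gnorm2:
            lam *= 0.5
        x = x - lam * g                               # eq. (22)
    return x
```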