A Nonsmooth Equation Based BFGS Method for Solving KKT Systems in Mathematical Programming

Donghui Li^1
Department of Applied Mathematics, Hunan University, Changsha 410082, China
e-mail: [email protected]

Nobuo Yamashita
Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
e-mail: [email protected]

Masao Fukushima
Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
e-mail: [email protected]

July 15, 1998

Abstract

In this paper, we present a BFGS method for solving a KKT system in mathematical programming, based on a nonsmooth equation reformulation of the KKT system. We successively split the nonsmooth equation into equivalent equations with a particular structure. Based on the splitting, we develop a BFGS method in which the subproblems are systems of linear equations with symmetric and positive definite coefficient matrices. A suitable line search is introduced, under which the generated iterates exhibit an approximate norm descent property. The method is well defined and, under suitable conditions, converges to a KKT point globally and superlinearly without any convexity assumption on the problem.

Key Words: KKT system, splitting function, BFGS method, global convergence, superlinear convergence

^1 Present address (available until October, 1999): Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan. e-mail: [email protected]

1. Introduction

Let $f$, $g_i$, $i = 1,2,\dots,m$, and $h_j$, $j = 1,2,\dots,r$, be twice continuously differentiable functions from $R^n$ to $R$. Consider the following general constrained mathematical programming problem:
\[
\begin{array}{ll}
\min & f(x) \\
\mbox{s.t.} & g_i(x) \ge 0, \quad i = 1,2,\dots,m, \\
 & h_j(x) = 0, \quad j = 1,2,\dots,r. \tag{1.1}
\end{array}
\]

Associated with problem (1.1) is the Lagrangian function $l : R^{n+m+r} \to R$ defined by
\[
l(z) = f(x) - \lambda^T g(x) - \mu^T h(x) = f(x) - \sum_{i=1}^m \lambda_i g_i(x) - \sum_{j=1}^r \mu_j h_j(x), \tag{1.2}
\]
where $z = (x, \lambda, \mu) \in R^{n+m+r}$. Iterative methods for solving (1.1) typically generate a sequence of points $\{z_k\}$ estimating $z^* = (x^*, \lambda^*, \mu^*)$, a solution of the following KKT system:

\[
\begin{array}{l}
\nabla_x l(x,\lambda,\mu) = \nabla f(x) - \sum_{i=1}^m \lambda_i \nabla g_i(x) - \sum_{j=1}^r \mu_j \nabla h_j(x) = 0, \\
\lambda_i \ge 0, \quad g_i(x) \ge 0, \quad \lambda_i g_i(x) = 0, \quad i = 1,2,\dots,m, \\
h_j(x) = 0, \quad j = 1,2,\dots,r. \tag{1.3}
\end{array}
\]

There have been developed many iterative methods for solving (1.3). We refer to [14] for a comprehensive treatment of these methods. Among the iterative methods, the so-called successive quadratic programming (SQP) method is considered one of the most important and has received much attention (see e.g. [1, 2, 3, 4, 6, 10, 15, 17, 18, 19, 24, 27, 28, 29, 30, 33]). In conventional SQP methods (see e.g. [17, 28, 29, 30]), problem (1.1) is approximated by a sequence of quadratic programming problems
\[
\begin{array}{ll}
\min & \frac12 p^T B_k p + \nabla f(x_k)^T p \\
\mbox{s.t.} & g_i(x_k) + \nabla g_i(x_k)^T p \ge 0, \quad i = 1,2,\dots,m, \\
 & h_j(x_k) + \nabla h_j(x_k)^T p = 0, \quad j = 1,2,\dots,r, \tag{1.4}
\end{array}
\]
where $B_k$ is the Hessian $\nabla_x^2 l(x_k,\lambda_k,\mu_k)$ of $l$ at $z_k = (x_k,\lambda_k,\mu_k)$ or its approximation, corresponding to Newton's method or quasi-Newton methods, respectively. There are other versions of SQP methods in which the constraints of the corresponding quadratic programming subproblems differ from those of (1.4) (see e.g. [3, 4, 15]). Under suitable conditions, SQP methods converge globally and superlinearly. Unfortunately, for most SQP methods, the subproblems sometimes turn out to be inconsistent, i.e., problem (1.4) may have no feasible solution. In recent years, some remedies have been proposed (see e.g. [3, 4, 33]). However, those methods require

additional computational efforts at each step. Another deficiency of SQP methods is that, to establish global convergence of an SQP method, it is often assumed that the matrix sequence $\{B_k\}$ is uniformly bounded and positive definite, i.e., there are constants $0 < m \le M$ such that the following inequalities hold for all $k$:
\[
m\|p\|^2 \le p^T B_k p \le M\|p\|^2, \qquad \forall p \in R^n. \tag{1.5}
\]
However, so far no answer has been given as to whether any particular quasi-Newton method satisfies this condition.

In this paper, we present a BFGS method for (1.3) based on an equivalent system of nonsmooth equations. At each step, the proposed method solves a system of linear equations whose coefficient matrix is symmetric and positive definite. Therefore the subproblems are always consistent and can be solved by any efficient method for linear equations. Moreover, the proposed method exhibits an approximate norm descent property. Under suitable conditions, we prove that the proposed BFGS method converges globally and superlinearly to a KKT point of (1.1) without assuming that condition (1.5) holds for all $k$. In particular, the convergence analysis does not require convexity of the problem.

The paper is organized as follows: In the next section, we first transform (1.3) into an equivalent system of nonsmooth equations and then split it into another form with a special structure. We also discuss several properties of the splitting in Section 2. In Section 3, we present the BFGS method and prove some useful lemmas. In Sections 4 and 5, we prove global convergence and superlinear convergence, respectively, of the proposed BFGS method. We conclude the paper with some remarks in Section 6.

2. Splitting Function and Its Properties

In this section, we first reformulate the KKT system (1.3) as an equivalent nonsmooth equation and then describe a splitting method for the latter equation. The splitting function has a particular structure, which is the basis for designing a BFGS method.

Let $\phi : R^2 \to R$ be the Fischer-Burmeister function [12] defined by
\[
\phi(a,b) = \sqrt{a^2 + b^2} - (a + b).
\]
The Fischer-Burmeister function has been extensively used in the context of the nonlinear complementarity problem and related problems (see e.g. [8, 11, 12, 13, 16, 20, 21, 34]). The function $\phi$ is differentiable everywhere except at the origin. It is easy to see that $\phi(a,b) = 0$ if and only if $a \ge 0$, $b \ge 0$ and $ab = 0$. A function with this property is called an NCP function. By the use of the Fischer-Burmeister function, the KKT system (1.3) can be rewritten in the following form:

\[
\left\{
\begin{array}{l}
L(z) = 0, \\
\phi(g_i(x), \lambda_i) = 0, \quad i = 1,2,\dots,m, \\
h_j(x) = 0, \quad j = 1,2,\dots,r, \tag{2.1}
\end{array}
\right.
\]
where
\[
L(z) = \nabla_x l(z) = \nabla f(x) - \sum_{i=1}^m \lambda_i \nabla g_i(x) - \sum_{j=1}^r \mu_j \nabla h_j(x)
= \nabla f(x) - \nabla g(x)\lambda - \nabla h(x)\mu.
\]
Define $\Phi(x,\lambda) = (\phi_1(x,\lambda), \phi_2(x,\lambda), \dots, \phi_m(x,\lambda))^T$ with elements
\[
\phi_i(x,\lambda) = \phi(g_i(x), \lambda_i), \quad i = 1,2,\dots,m.
\]

Then (2.1) is represented in a compact form:
\[
F(z) \stackrel{\triangle}{=} \left( \begin{array}{c} L(z) \\ -\Phi(x,\lambda) \\ -h(x) \end{array} \right) = 0. \tag{2.2}
\]

The function $F$ is generally not differentiable because of the nondifferentiability of $\phi$, and hence (2.2) cannot be solved by ordinary iterative methods for smooth equations. Recently, there have been developed various Newton-type methods for solving nonsmooth equations with global and superlinear/quadratic convergence properties (see e.g. [18, 25, 26, 27, 31, 32]). However, the study of quasi-Newton methods for nonsmooth equations is relatively scarce. In this paper, we propose a BFGS method for solving the KKT system (1.3), or equivalently the nonsmooth equation (2.2). We construct an iterative process based on the splitting
\[
F(z) = F_k(z) + G_k(z) = 0, \tag{2.3}
\]
where $F_k$ is differentiable everywhere and $G_k$ is not necessarily differentiable but relatively small compared with $F_k$. The idea of approximating a nonsmooth equation using a splitting function first appeared in [7]. Qi and Chen [32] proposed a splitting function based successive Newton method for a class of nonsmooth equations and showed that under suitable conditions, the method converges globally and quadratically. The results obtained in [32] were extended in [35] to Broyden-like methods

for solving nonlinear complementarity problems. In this paper, we propose a BFGS method that is somewhat similar to the one in [35] but utilizes a different line search rule. Moreover, the conditions to ensure global convergence of the method are weaker than those in [35].

We now split $F$ into an equivalent form so that the corresponding nonsmooth equation (2.2) takes the form (2.3). For a given constant $\epsilon > 0$, define
\[
\phi^\epsilon(a,b) = \left\{
\begin{array}{ll}
\sqrt{a^2+b^2} - (a+b), & \mbox{if } \sqrt{a^2+b^2} \ge \epsilon, \\
\frac{1}{2\epsilon}\left[(a^2+b^2) - 2\epsilon(a+b) + \epsilon^2\right], & \mbox{if } \sqrt{a^2+b^2} < \epsilon.
\end{array}
\right.
\]
Then $\phi^\epsilon$ is everywhere continuously differentiable and
\[
\nabla\phi^\epsilon(a,b) = \left\{
\begin{array}{ll}
\left( \frac{a}{\sqrt{a^2+b^2}} - 1,\ \frac{b}{\sqrt{a^2+b^2}} - 1 \right)^T, & \mbox{if } \sqrt{a^2+b^2} \ge \epsilon, \\
\left( \frac{a}{\epsilon} - 1,\ \frac{b}{\epsilon} - 1 \right)^T, & \mbox{if } \sqrt{a^2+b^2} < \epsilon.
\end{array}
\right.
\]
Moreover, we have
\[
|\phi(a,b) - \phi^\epsilon(a,b)| = \left\{
\begin{array}{ll}
0, & \mbox{if } \sqrt{a^2+b^2} \ge \epsilon, \\
\frac{(\sqrt{a^2+b^2} - \epsilon)^2}{2\epsilon}, & \mbox{if } \sqrt{a^2+b^2} < \epsilon
\end{array}
\right\} \le \frac12\epsilon, \qquad \forall (a,b) \in R^2.
\]
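As a concrete illustration of the smoothing just described, here is a minimal Python sketch (our own helper names; not part of the paper's algorithm) of the Fischer-Burmeister function and its smoothed variant:

```python
import math

def fb(a, b):
    # Fischer-Burmeister function: phi(a,b) = sqrt(a^2+b^2) - (a+b);
    # it vanishes exactly when a >= 0, b >= 0 and a*b = 0.
    return math.hypot(a, b) - (a + b)

def fb_smooth(a, b, eps):
    # Smoothed variant phi^eps: quadratic inside the disk sqrt(a^2+b^2) < eps,
    # identical to phi outside; continuously differentiable everywhere.
    r = math.hypot(a, b)
    if r >= eps:
        return fb(a, b)
    return ((a * a + b * b) - 2.0 * eps * (a + b) + eps * eps) / (2.0 * eps)
```

On the boundary of the disk both branches agree, and inside it the gap $|\phi - \phi^\epsilon| = (\sqrt{a^2+b^2} - \epsilon)^2/(2\epsilon)$ never exceeds $\epsilon/2$, matching the bound above.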

Further define
\[
\phi_i^\epsilon(x,\lambda) = \phi^\epsilon(g_i(x), \lambda_i), \quad i = 1,2,\dots,m, \qquad
\Phi^\epsilon(x,\lambda) = (\phi_1^\epsilon(x,\lambda), \phi_2^\epsilon(x,\lambda), \dots, \phi_m^\epsilon(x,\lambda))^T,
\]
\[
F^\epsilon(z) = \left( \begin{array}{c} L(z) \\ -\Phi^\epsilon(x,\lambda) \\ -h(x) \end{array} \right)
= \left( \begin{array}{c} \nabla f(x) - \nabla g(x)\lambda - \nabla h(x)\mu \\ -\Phi^\epsilon(x,\lambda) \\ -h(x) \end{array} \right)
\]
and
\[
G^\epsilon(z) = F(z) - F^\epsilon(z) = \left( \begin{array}{c} 0 \\ -(\Phi(x,\lambda) - \Phi^\epsilon(x,\lambda)) \\ 0 \end{array} \right).
\]
Then we get a splitting form of $F$:
\[
F(z) = F^\epsilon(z) + G^\epsilon(z). \tag{2.4}
\]

It is not difficult to see that for every $\epsilon > 0$, $F^\epsilon$ is continuously differentiable since $\Phi^\epsilon$ is continuously differentiable. We will give a direct expression of the Jacobian $\nabla F^\epsilon(z)$ later. Note that $G^\epsilon$ is not differentiable, but its norm is uniformly bounded in terms of $\epsilon$. This means that if $\epsilon$ is small, then so is $\|G^\epsilon(x,\lambda)\|$ for every $(x,\lambda) \in R^{n+m}$. To be precise, we have
\[
\|G^\epsilon(z)\| = \|\Phi(x,\lambda) - \Phi^\epsilon(x,\lambda)\| \le \frac{\sqrt m}{2}\epsilon, \qquad \forall z \in R^{n+m+r}.
\]

We now prove some useful lemmas for the splitting function (2.4). First, by the previous arguments, it is easy to show the following lemma.

Lemma 2.1 (i) For every $z = (x,\lambda,\mu) \in R^{n+m+r}$,
\[
\|G^\epsilon(z)\| = \|\Phi(x,\lambda) - \Phi^\epsilon(x,\lambda)\| \le \frac{\sqrt m}{2}\epsilon. \tag{2.5}
\]
(ii) $\Phi(x,\lambda) = 0$ if and only if $\min(\lambda, g(x)) = 0$.
(iii) $F(z) = 0$ if and only if $z$ is a KKT point of (1.1).

Let $I^\epsilon(x,\lambda)$ and $J^\epsilon(x,\lambda)$ be the index sets given by
\[
I^\epsilon(x,\lambda) = \Big\{ i \mid \sqrt{\lambda_i^2 + g_i(x)^2} \ge \epsilon \Big\}
\quad \mbox{and} \quad
J^\epsilon(x,\lambda) = \Big\{ i \mid \sqrt{\lambda_i^2 + g_i(x)^2} < \epsilon \Big\},
\]
respectively. Put
\[
a_i^\epsilon(x,\lambda) = \left\{
\begin{array}{ll}
\frac{\lambda_i}{\sqrt{\lambda_i^2 + g_i(x)^2}} - 1, & \mbox{if } i \in I^\epsilon(x,\lambda), \\
\frac{\lambda_i}{\epsilon} - 1, & \mbox{if } i \in J^\epsilon(x,\lambda),
\end{array}
\right.
\qquad
b_i^\epsilon(x,\lambda) = \left\{
\begin{array}{ll}
\frac{g_i(x)}{\sqrt{\lambda_i^2 + g_i(x)^2}} - 1, & \mbox{if } i \in I^\epsilon(x,\lambda), \\
\frac{g_i(x)}{\epsilon} - 1, & \mbox{if } i \in J^\epsilon(x,\lambda). \tag{2.6}
\end{array}
\right.
\]
The next lemma immediately follows from the definition of $a_i^\epsilon(x,\lambda)$ and $b_i^\epsilon(x,\lambda)$.

Lemma 2.2 For any $\epsilon > 0$ and every $(x,\lambda) \in R^{n+m}$, we have $-2 \le a_i^\epsilon(x,\lambda) \le 0$ and $-2 \le b_i^\epsilon(x,\lambda) \le 0$ for all $i = 1,2,\dots,m$. Moreover, $a_i^\epsilon(x,\lambda) = 0$ if and only if $\lambda_i \ge \epsilon$ and $g_i(x) = 0$, or equivalently $b_i^\epsilon(x,\lambda) = -1$; $b_i^\epsilon(x,\lambda) = 0$ if and only if $g_i(x) \ge \epsilon$ and $\lambda_i = 0$, or equivalently $a_i^\epsilon(x,\lambda) = -1$.
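The coefficients in (2.6) and the bounds of Lemma 2.2 are easy to check numerically. Here is a small Python sketch, assuming the reconstruction of (2.6) above (the helper name is ours):

```python
import math

def ab_coeffs(g_i, lam_i, eps):
    # a_i^eps and b_i^eps from (2.6): the gradient coefficients of the
    # smoothed Fischer-Burmeister function at (g_i(x), lambda_i).
    r = math.hypot(lam_i, g_i)
    if r >= eps:                                 # index in I^eps(x, lambda)
        return lam_i / r - 1.0, g_i / r - 1.0
    return lam_i / eps - 1.0, g_i / eps - 1.0    # index in J^eps(x, lambda)
```

Both coefficients always lie in $[-2, 0]$; $a_i^\epsilon = 0$ forces $g_i(x) = 0$ and $\lambda_i \ge \epsilon$, in which case $b_i^\epsilon = -1$, exactly as Lemma 2.2 states.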

The following lemma gives a direct expression of $\nabla F^\epsilon(z)$. It can be derived from the expression of $\nabla\phi^\epsilon$ and the chain rule for the derivatives of composite functions.

Lemma 2.3 Let $f$, $g_i$, $i = 1,2,\dots,m$, and $h_j$, $j = 1,2,\dots,r$, be twice continuously differentiable. Then $\Phi^\epsilon$ and $F^\epsilon$ are continuously differentiable. Moreover, for every $z = (x,\lambda,\mu) \in R^{n+m+r}$, we have the following formulas:
\[
\nabla_x\phi_i^\epsilon(x,\lambda) = b_i^\epsilon(x,\lambda)\nabla g_i(x), \qquad
\nabla_\lambda\phi_i^\epsilon(x,\lambda) = a_i^\epsilon(x,\lambda)e_i,
\]
\[
\nabla_x\Phi^\epsilon(x,\lambda) = \nabla g(x)\,\mathrm{diag}(b_i^\epsilon(x,\lambda)), \qquad
\nabla_\lambda\Phi^\epsilon(x,\lambda) = \mathrm{diag}(a_i^\epsilon(x,\lambda)),
\]
and
\[
\nabla F^\epsilon(z) = \left( \begin{array}{ccc}
\nabla_x L(z) & -\nabla_x\Phi^\epsilon(x,\lambda) & -\nabla h(x) \\
\nabla_\lambda L(z) & -\nabla_\lambda\Phi^\epsilon(x,\lambda) & 0 \\
\nabla_\mu L(z) & 0 & 0
\end{array} \right)
= \left( \begin{array}{ccc}
\nabla_x L(z) & -\nabla g(x)\,\mathrm{diag}(b_i^\epsilon(x,\lambda)) & -\nabla h(x) \\
-\nabla g(x)^T & -\mathrm{diag}(a_i^\epsilon(x,\lambda)) & 0 \\
-\nabla h(x)^T & 0 & 0
\end{array} \right), \tag{2.7}
\]
where $e_i$ denotes the $i$-th column of the identity matrix.

The next lemma gives a regularity condition on the function $F^\epsilon$. For given $z = (x,\lambda,\mu)$ and $\epsilon > 0$, we define
\[
S^\epsilon(z) = \{ i \mid g_i(x) = 0,\ \lambda_i \ge \epsilon \}. \tag{2.8}
\]

Lemma 2.4 Let $z = (x,\lambda,\mu) \in R^{n+m+r}$ and $\epsilon > 0$ be given. Suppose that the vectors $\nabla g_i(x)$, $i \in S^\epsilon(z)$, and $\nabla h_j(x)$, $j = 1,2,\dots,r$, are linearly independent. Suppose also that the matrix
\[
\nabla_x L(z) = \nabla_x^2 l(z) = \nabla^2 f(x) - \sum_{i=1}^m \lambda_i\nabla^2 g_i(x) - \sum_{j=1}^r \mu_j\nabla^2 h_j(x)
\]
is positive definite on the subspace $\{ p \mid \nabla g_i(x)^T p = 0,\ \forall i \in S^\epsilon(z),\ \nabla h(x)^T p = 0 \}$. Then $\nabla F^\epsilon(z)$ is nonsingular.

Proof We verify the nonsingularity of $\nabla F^\epsilon(z)$ by showing that the system of linear equations $\nabla F^\epsilon(z)w = 0$ has the unique solution $w = 0$. Let $w = (p, q, s) \in R^{n+m+r}$. Then the linear system $\nabla F^\epsilon(z)w = 0$ is expressed as
\[
\left\{
\begin{array}{l}
\nabla_x L(z)p - \nabla g(x)\,\mathrm{diag}(b_i^\epsilon(x,\lambda))q - \nabla h(x)s = 0, \\
-\nabla g(x)^T p - \mathrm{diag}(a_i^\epsilon(x,\lambda))q = 0, \\
-\nabla h(x)^T p = 0. \tag{2.9}
\end{array}
\right.
\]

It then follows that
\[
p^T\nabla_x L(z)p + q^T\mathrm{diag}\big(a_i^\epsilon(x,\lambda)b_i^\epsilon(x,\lambda)\big)q = 0. \tag{2.10}
\]
Since $a_i^\epsilon(x,\lambda) \le 0$ and $b_i^\epsilon(x,\lambda) \le 0$ by Lemma 2.2, the second term on the left-hand side of (2.10) is nonnegative. On the other hand, since $p$ satisfies the third equality of (2.9), and $\nabla g_i(x)^T p = 0$, $\forall i \in S^\epsilon(z)$, by Lemma 2.2, the positive definiteness of $\nabla_x L(z)$ on the subspace $\{ p \mid \nabla g_i(x)^T p = 0,\ \forall i \in S^\epsilon(z),\ \nabla h(x)^T p = 0 \}$ implies that the first term on the left-hand side of (2.10) is also nonnegative. Consequently, $p$ must be zero and (2.9) reduces to
\[
\left\{
\begin{array}{l}
\nabla g(x)\,\mathrm{diag}(b_i^\epsilon(x,\lambda))q + \nabla h(x)s = 0, \\
\mathrm{diag}(a_i^\epsilon(x,\lambda))q = 0. \tag{2.11}
\end{array}
\right.
\]

From the second equality of (2.11), it is obvious that $q_i = 0$ for every $i$ with $a_i^\epsilon(x,\lambda) \ne 0$. In other words, the only possible nonzero elements $q_i$ correspond to the indices $i$ satisfying $a_i^\epsilon(x,\lambda) = 0$ which, by Lemma 2.2, coincide with the indices $i \in S^\epsilon(z)$. Moreover, for these $i$, $b_i^\epsilon(x,\lambda) = -1$. Therefore, from the first equality of (2.11) we get
\[
-\sum_{i \in S^\epsilon(z)} q_i\nabla g_i(x) + \sum_{j=1}^r s_j\nabla h_j(x) = 0.
\]
By the assumption that $\nabla g_i(x)$, $i \in S^\epsilon(z)$, and $\nabla h_j(x)$, $j = 1,2,\dots,r$, are linearly independent, it follows that $q_i = 0$, $\forall i \in S^\epsilon(z)$, and $s_j = 0$, $j = 1,2,\dots,r$. In other words, zero is the unique solution of (2.9). Therefore, $\nabla F^\epsilon(z)$ is nonsingular. $\Box$

3. Algorithm

This section is devoted to presenting a BFGS method for the nonsmooth equation (2.2) based on the splitting function (2.4). For convenience, we simplify some notations. For a given positive sequence $\{\epsilon_k\}$, we abbreviate $F^{\epsilon_k}$, $G^{\epsilon_k}$, $a_i^{\epsilon_k}$, $b_i^{\epsilon_k}$ etc. as $F_k$, $G_k$, $a_i^k$, $b_i^k$ etc., respectively.

To describe the method, it is helpful to recall the Newton-type method proposed by Qi and Chen [32]. In their method, the subproblem to be solved at each iteration is the following system of linear equations in $p$:
\[
\nabla F_k(z_k)^T p + F(z_k) = 0.
\]
If $\nabla F_k(z_k)$ is nonsingular, then the subproblem is equivalent to the linear equation
\[
\nabla F_k(z_k)\nabla F_k(z_k)^T p + \nabla F_k(z_k)F(z_k) = 0. \tag{3.1}
\]

Now we consider avoiding the calculation of the second derivatives $\nabla^2 f$, $\nabla^2 g_i$, $i = 1,2,\dots,m$, and $\nabla^2 h_j$, $j = 1,2,\dots,r$, in (3.1). Specifically, we consider the subproblem
\[
B_k p + q_k = 0,
\]
where the matrix $B_k \in R^{(n+m+r)\times(n+m+r)}$ approximates $\nabla F_k(z_k)\nabla F_k(z_k)^T$ and the vector $q_k \in R^{n+m+r}$ approximates $\nabla F_k(z_k)F(z_k)$. When $\nabla F_k$ is symmetric, the authors [23] proposed a quasi-Newton method that generates such an approximating matrix $B_k$ and an approximating vector $q_k$. In the present case, however, $\nabla F_k$ is generally not symmetric, and hence the method of [23] cannot be applied. To obtain suitable approximations $B_k$ and $q_k$, let
\[
Q_k = \left( \begin{array}{ccc}
0 & -\nabla g(x_k)\,\mathrm{diag}(b_i^k(x_k,\lambda_k)) & -\nabla h(x_k) \\
-\nabla g(x_k)^T & -\mathrm{diag}(a_i^k(x_k,\lambda_k)) & 0 \\
-\nabla h(x_k)^T & 0 & 0
\end{array} \right). \tag{3.2}
\]
We notice that $Q_k$ does not contain the second derivatives $\nabla^2 f$, $\nabla^2 g_i$, $i = 1,2,\dots,m$, and $\nabla^2 h_j$, $j = 1,2,\dots,r$. By the definition of $Q_k$, we immediately have the following relationship:
\[
\nabla F_k(z_k) = Q_k + \left( \begin{array}{ccc}
\nabla_x L(z_k) & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right). \tag{3.3}
\]
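As a sketch of how a matrix like $Q_k$ in (3.2) can be assembled from first derivatives only, here is an illustrative NumPy helper (our own code, not from the paper; `Jg` and `Jh` denote the Jacobians of $g$ and $h$, so $\nabla g(x)$ is `Jg.T`):

```python
import numpy as np

def assemble_Q(Jg, Jh, a, b):
    # Q_k per (3.2). Jg: (m, n) Jacobian of g, Jh: (r, n) Jacobian of h,
    # a, b: the vectors (a_i^k) and (b_i^k). Only first derivatives appear.
    m, n = Jg.shape
    r = Jh.shape[0]
    Q = np.zeros((n + m + r, n + m + r))
    Q[:n, n:n + m] = -Jg.T * b           # -grad g(x_k) diag(b_i^k)
    Q[:n, n + m:] = -Jh.T                # -grad h(x_k)
    Q[n:n + m, :n] = -Jg                 # -grad g(x_k)^T
    Q[n:n + m, n:n + m] = -np.diag(a)    # -diag(a_i^k)
    Q[n + m:, :n] = -Jh                  # -grad h(x_k)^T
    return Q
```

Adding the single second-order block $\nabla_x L(z_k)$ at the top-left corner then recovers $\nabla F_k(z_k)$, which is exactly relation (3.3).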

Notice that the matrix in the second term on the right-hand side of (3.3) is symmetric. Let
\[
q_k = \alpha_{k-1}^{-1}\left( \begin{array}{c}
L(x_k + \alpha_{k-1}L(z_k), \lambda_k, \mu_k) - L(z_k) \\ 0 \\ 0
\end{array} \right) + Q_k F_k(z_k), \tag{3.4}
\]
where $\alpha_{k-1}$ is the steplength determined at the $(k-1)$-th iteration. Then we have

\begin{eqnarray*}
q_k &=& \left( \begin{array}{c}
\int_0^1 \nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k)d\tau\, L(z_k) \\ 0 \\ 0
\end{array} \right) + Q_k F_k(z_k) \\
&=& \left( \begin{array}{ccc}
\int_0^1 \nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k)d\tau & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right) F_k(z_k) + Q_k F_k(z_k) \\
&=& \nabla F_k(z_k)F_k(z_k) + \left( \begin{array}{ccc}
\int_0^1 [\nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k) - \nabla_x L(z_k)]d\tau & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right) F_k(z_k) \\
&=& \nabla F_k(z_k)F_k(z_k) + u_k, \hspace{6cm} (3.5)
\end{eqnarray*}
where
\[
u_k = \left( \begin{array}{ccc}
\int_0^1 [\nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k) - \nabla_x L(z_k)]d\tau & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right) F_k(z_k). \tag{3.6}
\]
Note that $u_k$ satisfies
\[
\|u_k\| \le \int_0^1 \|\nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau\,\|F_k(z_k)\|.
\]

Bk+1 = Bk 0

Bk sk sTk Bk yk ykT + T ; sTk Bk sk yk sk

where sk = zk+1 0 zk and yk is determined as speci ed shortly. First, we notice that Bk+1 satis es the secant equation Bk +1 sk = yk . So, we only need to select yk appropriately so that yk  rFk +1 (zk+1 )rFk+1 (zk+1 )T sk . Denote

\[
\gamma_k = F_k(z_{k+1}) - F_k(z_k) = \int_0^1 \nabla F_k(z_k + \tau s_k)^T d\tau\, s_k \tag{3.7}
\]
and
\[
r_k = \left( \begin{array}{c}
L(x_k + L(z_{k+1}) - L(z_k), \lambda_k, \mu_k) - L(z_k) \\ 0 \\ 0
\end{array} \right) + Q_k\gamma_k. \tag{3.8}
\]
Then by (3.3), we get

\begin{eqnarray*}
r_k &=& \left( \begin{array}{c}
\int_0^1 \nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k)d\tau\,(L(z_{k+1}) - L(z_k)) \\ 0 \\ 0
\end{array} \right) + Q_k\gamma_k \\
&=& \left( \begin{array}{ccc}
\int_0^1 \nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k)d\tau & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right)\gamma_k + Q_k\gamma_k \\
&=& \nabla F_k(z_k)\gamma_k + v_k \\
&=& \nabla F_k(z_k)\int_0^1 \nabla F_k(z_k + \tau s_k)^T d\tau\, s_k + v_k \\
&=& \nabla F_k(z_k)\nabla F_k(z_k)^T s_k + \nabla F_k(z_k)\int_0^1 \big[\nabla F_k(z_k + \tau s_k)^T - \nabla F_k(z_k)^T\big]d\tau\, s_k + v_k \\
&=& \nabla F_k(z_k)\nabla F_k(z_k)^T s_k + \bar v_k,
\end{eqnarray*}
where
\[
v_k = \left( \begin{array}{ccc}
\int_0^1 \{\nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k) - \nabla_x L(z_k)\}d\tau & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{array} \right)\gamma_k
\]
and
\[
\bar v_k = \nabla F_k(z_k)\int_0^1 \big[\nabla F_k(z_k + \tau s_k)^T - \nabla F_k(z_k)^T\big]d\tau\, s_k + v_k. \tag{3.9}
\]
We can estimate $v_k$ as
\begin{eqnarray*}
\|v_k\| &\le& \int_0^1 \|\nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau\,\|\gamma_k\| \\
&\le& \int_0^1 \|\nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau \int_0^1 \|\nabla F_k(z_k + \tau s_k)\|d\tau\,\|s_k\|,
\end{eqnarray*}
where the last inequality follows from the definition (3.7) of $\gamma_k$.

Z 1

kvk k  krFk (zk )k

 krFk (zk )k +

1

Z 1

0

Z 1

0

0

Z1

0

[rFk (zk +  sk )T

krFk (zk +  sk ) 0 rFk (zk )kd ksk k

krxL(xk +  (L(zk+1) 0 L(zk )); k ; k ) 0 rxL(zk )kd

krFk (zk +  sk )kd ksk k:

Assume that krFk k is bounded with an upper bound M

kvk k 

M

+

Z 1

0 Z1 0

0 rFk (zk )T ]d

ksk k + kvk k

>

0. Then we have

krFk (zk +  sk ) 0 rFk (zk )kd 

krxL(xk +  (L(zk+1) 0 L(zk )); k ; k ) 0 rxL(zk )kd ksk k: 10

Since $s_k = z_{k+1} - z_k$, we see that $\|\bar v_k\| = o(\|s_k\|)$, which implies $r_k = \nabla F_k(z_{k+1})\nabla F_k(z_{k+1})^T s_k + o(\|s_k\|)$. Thus $r_k$ seems to be a reasonable choice for $y_k$ satisfying $B_{k+1}s_k = y_k$. Nevertheless, since $\nabla F_k(z_k)$ can be singular for some $k$, taking $y_k = r_k$ does not ensure that $y_k^T s_k > 0$ holds for every $k$, which is a sufficient condition for $B_{k+1}$ to inherit the positive definiteness of $B_k$. To cope with this problem, we adopt a technique used in [22] and take
\[
y_k = r_k + \rho_k\|F_k(z_k)\|s_k, \tag{3.10}
\]
where $\rho_k$ is defined by
\[
\rho_k = 1 - \frac{1}{\|F_k(z_k)\|\,\|s_k\|^2}\min\{ r_k^T s_k,\ 0 \}. \tag{3.11}
\]
Note that $\rho_k \ge 1$. If $\|F_k(z_k)\|$ is small, then $y_k \approx r_k$, so $y_k$ is also expected to be a suitable choice. Moreover, we have for every $k$
\begin{eqnarray}
y_k^T s_k &=& r_k^T s_k + \rho_k\|F_k(z_k)\|\,\|s_k\|^2 \nonumber \\
&=& \|F_k(z_k)\|\,\|s_k\|^2 + r_k^T s_k - \min\{ r_k^T s_k,\ 0 \} \nonumber \\
&\ge& \|F_k(z_k)\|\,\|s_k\|^2. \hspace{4cm} (3.12)
\end{eqnarray}
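The safeguarded choice (3.10)-(3.11) and the BFGS update are straightforward to code. The following NumPy sketch (our own illustrative names) shows why the lower bound on $y_k^T s_k$ keeps the update well defined:

```python
import numpy as np

def safeguarded_y(r, s, Fk_norm):
    # y_k = r_k + rho_k ||F_k(z_k)|| s_k with rho_k as in (3.11);
    # by construction y_k^T s_k >= ||F_k(z_k)|| ||s_k||^2 > 0.
    rho = 1.0 - min(float(r @ s), 0.0) / (Fk_norm * float(s @ s))
    return r + rho * Fk_norm * s

def bfgs_update(B, s, y):
    # Standard BFGS update; it satisfies the secant equation B_{k+1} s = y
    # and preserves symmetry and positive definiteness whenever y^T s > 0.
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(y, y) / float(y @ s)
```

With `safeguarded_y`, the curvature condition holds even when `r @ s` is negative, so `bfgs_update` never divides by a nonpositive quantity.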

This means that $B_{k+1}$ will be positive definite provided that $B_k$ is positive definite.

In the algorithm, we use a line search similar to the one proposed in [23]. Let $\{\epsilon_k\}$ be a positive sequence satisfying
\[
\sum_{k=0}^\infty \epsilon_k \le \eta < \infty, \tag{3.13}
\]
where $\eta$ is a positive constant. We determine a steplength $\alpha_k > 0$ so that the following inequality holds for $\alpha = \alpha_k$:
\[
\|F_k(z_k + \alpha p_k)\|^2 - \|F_k(z_k)\|^2 \le -\beta_1\|\alpha F_k(z_k)\|^2 - \beta_2\|\alpha p_k\|^2 + \epsilon_k\|F_k(z_k)\|^2, \tag{3.14}
\]
where $\beta_1$ and $\beta_2$ are given positive constants. It is not difficult to see that (3.14) is satisfied for all sufficiently small $\alpha > 0$, because the last term on the right-hand side is positive and independent of $\alpha$.

We now state a BFGS method for solving (2.2).

Algorithm 1

Step 0. Choose constants $\rho \in (0,1)$, $\sigma \in (0,1)$, $0 < \gamma < \frac{2}{\sqrt m}$, $\beta_1 > 0$, $\beta_2 > 0$, $\alpha_{-1} > 0$. Select a positive sequence $\{\epsilon_k\}$ satisfying (3.13). Choose an initial point $z_0 \in R^{n+m+r}$, a symmetric positive definite matrix $B_0 \in R^{(n+m+r)\times(n+m+r)}$ and $\epsilon_0 \le \frac{\gamma}{2}\|F(z_0)\|$. Let $k := 0$.

Step 1. Let $q_k$ be given by (3.4). Solve the linear equation
\[
B_k p + q_k = 0 \tag{3.15}
\]
to get $p_k$.

Step 2. If
\[
\|F(z_k + p_k)\| \le \sigma\|F(z_k)\|, \tag{3.16}
\]
then let $\alpha_k = 1$ and go to Step 4. Otherwise go to Step 3.

Step 3. Let $i_k$ be the smallest nonnegative integer $i$ such that $\alpha = \rho^i$ satisfies (3.14) and let $\alpha_k = \rho^{i_k}$.

Step 4. Let $z_{k+1} = z_k + \alpha_k p_k$.

Step 5. Update $B_k$ by the BFGS formula
\[
B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{y_k^T s_k}, \tag{3.17}
\]
where $s_k = z_{k+1} - z_k$ and $y_k$ is determined by (3.10) with $r_k$ and $\rho_k$ given by (3.8) and (3.11), respectively.

Step 6. If $\epsilon_k < \gamma\|F(z_{k+1})\|$, take $\epsilon_{k+1} = \epsilon_k$. Otherwise, determine $\epsilon_{k+1}$ by
\[
\epsilon_{k+1} \le \min\Big\{ \frac{\gamma}{2}\|F(z_{k+1})\|,\ \frac12\epsilon_k \Big\}. \tag{3.18}
\]

Step 7. Let $k := k + 1$. Go to Step 1.

We now prove some useful properties of the algorithm. In the rest of the paper, we need the following basic assumption.

Assumption A
(i) The level set
\[
\Omega = \{ z \in R^{n+m+r} \mid \|F(z)\| \le 6e^{2\eta}\|F(z_0)\| \} \tag{3.19}
\]
is bounded, where $\eta$ is a positive constant such that (3.13) holds.
(ii) The functions $f$, $g_i$, $i = 1,2,\dots,m$, and $h_j$, $j = 1,2,\dots,r$, are twice continuously differentiable on a bounded convex set $D$ containing $\Omega$.

We shall show in Theorem 4.1 that under Assumption A, the sequence $\{z_k\}$ generated by Algorithm 1 is contained in $\Omega$. Let $M > 0$ be an upper bound of $\|\nabla F_k(z)\|$ over $D$, which can be chosen independently of $k$ by Lemmas 2.2 and 2.3. Then the following proposition follows from the previous discussion.
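Before analyzing the algorithm, here is a minimal sketch of Steps 1-4 of Algorithm 1 in Python (our own illustrative code: a generic symmetric positive definite `B`, vector `q`, and residual map `Fk` stand in for the quantities above, and the parameter names follow (3.14)):

```python
import numpy as np

def one_iteration(B, q, Fk, zk, rho=0.5, beta1=1e-4, beta2=1e-4, eps_k=0.0):
    # Step 1: solve B p = -q; B is symmetric positive definite, so the
    # subproblem is always consistent (a Cholesky solve would also work).
    p = np.linalg.solve(B, -q)
    f0 = np.linalg.norm(Fk(zk)) ** 2
    # Steps 2-3 (simplified): backtrack alpha = rho^i until the approximate
    # norm-descent condition (3.14) holds.
    alpha = 1.0
    while True:
        lhs = np.linalg.norm(Fk(zk + alpha * p)) ** 2 - f0
        rhs = (-beta1 * alpha ** 2 * f0
               - beta2 * alpha ** 2 * (p @ p)
               + eps_k * f0)
        if lhs <= rhs:
            return zk + alpha * p, alpha    # Step 4
        alpha *= rho
```

Because the last term in (3.14) is positive and independent of $\alpha$ whenever $\epsilon_k > 0$, the backtracking loop terminates for any continuous residual map.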

Proposition 3.1 Let $q_k$, $\gamma_k$, $r_k$ and $y_k$ be given by (3.4), (3.7), (3.8) and (3.10), respectively. Then we have for every $k$
\[
q_k = \nabla F_k(z_k)F_k(z_k) + u_k \tag{3.20}
\]
with $u_k$ satisfying
\[
\|u_k\| \le \int_0^1 \|\nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau\,\|F_k(z_k)\|, \tag{3.21}
\]
and
\[
r_k = \nabla F_k(z_k)\nabla F_k(z_k)^T s_k + \bar v_k \tag{3.22}
\]
with $\bar v_k$ satisfying
\[
\|\bar v_k\| \le M\left[ \int_0^1 \|\nabla F_k(z_k + \tau s_k) - \nabla F_k(z_k)\|d\tau + \int_0^1 \|\nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau \right]\|s_k\| \tag{3.23}
\]
and
\[
\|\bar v_k\| \le 4M^2\|s_k\|. \tag{3.24}
\]

Proof Inequalities (3.21) and (3.23) follow directly from the previous discussion. To prove (3.24), notice that $\|\nabla_x L(z)\| \le \|\nabla F_k(z)\| \le M$ holds for any $z \in D$ and every $k$. Thus from (3.23) we have
\begin{eqnarray*}
\|\bar v_k\| &\le& M\left[ \int_0^1 \big(\|\nabla F_k(z_k + \tau s_k)\| + \|\nabla F_k(z_k)\|\big)d\tau \right. \\
&& \left. +\ \int_0^1 \big(\|\nabla_x L(x_k + \tau(L(z_{k+1}) - L(z_k)), \lambda_k, \mu_k)\| + \|\nabla_x L(z_k)\|\big)d\tau \right]\|s_k\| \\
&\le& 4M^2\|s_k\|. \hspace{6cm} \Box
\end{eqnarray*}

Proposition 3.2 Let $\bar M > 0$ be an upper bound of $\|F_k(z)\|$ over $D$ for all $k$. Then for every $k$, the vector $y_k = r_k + \rho_k\|F_k(z_k)\|s_k$ satisfies
\[
\|y_k\| \le (10M^2 + \bar M)\|s_k\|. \tag{3.25}
\]

Proof By (3.22) and (3.24) we get
\[
\|r_k\| \le \|\nabla F_k(z_k)\|^2\|s_k\| + \|\bar v_k\| \le 5M^2\|s_k\|.
\]
This together with the definition of $y_k$ yields
\begin{eqnarray*}
\|y_k\| &\le& 5M^2\|s_k\| + \rho_k\|F_k(z_k)\|\,\|s_k\| \\
&=& 5M^2\|s_k\| + \Big(\|F_k(z_k)\| - \min\Big\{\frac{r_k^T s_k}{\|s_k\|^2},\ 0\Big\}\Big)\|s_k\| \\
&\le& 5M^2\|s_k\| + \Big(\bar M + \frac{|r_k^T s_k|}{\|s_k\|^2}\Big)\|s_k\| \\
&\le& \Big(5M^2 + \bar M + \frac{\|r_k\|}{\|s_k\|}\Big)\|s_k\| \\
&\le& (10M^2 + \bar M)\|s_k\|. \hspace{4cm} \Box
\end{eqnarray*}

Denote

\[
\bar K_1 = \{ k \mid \epsilon_k \ge \gamma\|F(z_{k+1})\| \} \quad \mbox{and} \quad \bar K_2 = \{ k \mid (3.16) \mbox{ holds} \}. \tag{3.26}
\]
Then we have the following proposition from Algorithm 1 immediately.

Proposition 3.3 Let $\{z_k\}$ be generated by Algorithm 1. Then for every $k$,
\[
\epsilon_k \le \gamma\|F(z_k)\| \quad \mbox{and} \quad \|G_k(z)\| \le \frac{\sqrt m}{2}\epsilon_k \le \delta\|F(z_k)\|, \qquad \forall z \in R^{n+m+r}, \tag{3.27}
\]
where $\delta = \frac{\sqrt m}{2}\gamma \in (0,1)$. Moreover, for every $k$, the following inequality holds:
\[
\|F_k(z_{k+1})\|^2 \le (1 + \epsilon_k)\|F_k(z_k)\|^2. \tag{3.28}
\]

Proof The first inequality in (3.27) follows from the choice of $\epsilon_k$, the second inequality follows from (2.5), and the last inequality then follows from the first. Now we prove (3.28). If $k \in \bar K_2$, then $\|F_k(z_{k+1})\| \le \|F_k(z_k)\| \le (1 + \epsilon_k)\|F_k(z_k)\|$. Otherwise, $k \notin \bar K_2$ means that $z_{k+1}$ is generated by the line search in Step 3. In other words, we have $z_{k+1} = z_k + \alpha_k p_k$ with $\alpha_k$ satisfying (3.14), which also implies (3.28). $\Box$

Proposition 3.4 Let $\{z_k\}$ be generated by Algorithm 1. Then
\[
\liminf_{k\to\infty}\|F(z_k)\| = 0 \tag{3.29}
\]
if and only if
\[
\liminf_{k\to\infty}\|F_k(z_k)\| = 0. \tag{3.30}
\]

Proof By (3.27), we have for every $k$
\[
\|F(z_k)\| \le \|F_k(z_k)\| + \|G_k(z_k)\| \le \|F_k(z_k)\| + \delta\|F(z_k)\|,
\]
which implies
\[
\|F(z_k)\| \le \frac{1}{1-\delta}\|F_k(z_k)\|, \qquad \forall k. \tag{3.31}
\]
On the other hand, we have for every $k$
\[
\|F_k(z_k)\| \le \|F(z_k)\| + \|G_k(z_k)\| \le (1+\delta)\|F(z_k)\|. \tag{3.32}
\]
The inequalities (3.31) and (3.32) show the equivalence between (3.29) and (3.30). $\Box$

Proposition 3.4 reveals that to prove the global convergence of Algorithm 1, it suffices to show that there is a subsequence of $\{F_k(z_k)\}$ converging to zero. In the next section, we will accomplish this under suitable conditions.

4. Global Convergence

In this section, we prove global convergence of Algorithm 1. First, we show a convergence result similar to those in [32] and [35].

Theorem 4.1 Let $\{z_k\}$ be generated by Algorithm 1. Then $\{z_k\} \subset \Omega$. Moreover, if $\bar K_1$ is infinite, then
\[
\liminf_{k\to\infty}\|F(z_k)\| = 0. \tag{4.1}
\]

Proof Let $\bar K_1 = \{k_0 < k_1 < k_2 < \cdots\}$. Then by Step 6 of Algorithm 1, $\epsilon_k = \epsilon_{k_{i-1}+1}$ for every $k$ such that $k_{i-1} < k \le k_i$. Moreover, for every $k_i \in \bar K_1$, $\epsilon_{k_i+1} \le \frac{\gamma}{2}\|F(z_{k_i+1})\|$. It then follows that
\[
\|F(z_{k_i+1})\| \le \gamma^{-1}\epsilon_{k_i} = \gamma^{-1}\epsilon_{k_{i-1}+1} \le \frac12\|F(z_{k_{i-1}+1})\| \le \cdots \le \Big(\frac12\Big)^i\|F(z_{k_0+1})\| \le \gamma^{-1}\Big(\frac12\Big)^i\epsilon_{k_0} \le \Big(\frac12\Big)^i\|F(z_{k_0})\|. \tag{4.2}
\]
This implies (4.1). Moreover, if $\epsilon_0 \ge \gamma\|F(z_1)\|$, i.e., $k_0 = 0$, then $\|F(z_{k_i+1})\| \le (\frac12)^i\|F(z_0)\| \le 6e^{2\eta}\|F(z_0)\|$. Otherwise, for all $k \le k_0$, $\epsilon_k = \epsilon_0$ and $F_k = F_0$, $G_k = G_0$. Thus it follows from (3.28) that

\begin{eqnarray*}
\|F(z_{k_0})\| &\le& \|F_{k_0}(z_{k_0})\| + \|G_{k_0}(z_{k_0})\| = \|F_0(z_{k_0})\| + \|G_0(z_{k_0})\| \\
&\le& (1+\epsilon_{k_0-1})\|F_0(z_{k_0-1})\| + \|G_0(z_{k_0})\| \\
&\le& (1+\epsilon_{k_0-1})(1+\epsilon_{k_0-2})\|F_0(z_{k_0-2})\| + \|G_0(z_{k_0})\| \\
&\le& \cdots \le \Big[\prod_{j=0}^{k_0-1}(1+\epsilon_j)\Big]\|F_0(z_0)\| + \|G_0(z_{k_0})\| \\
&\le& \Big[\prod_{j=0}^{k_0-1}(1+\epsilon_j)\Big]\big(\|F_0(z_0)\| + \|G_0(z_{k_0})\|\big).
\end{eqnarray*}
Since, by (2.5), $\|G_0(z)\| \le \frac{\sqrt m}{2}\epsilon_0 \le \frac{\sqrt m}{4}\gamma\|F(z_0)\|$ for all $z$, it follows that
\[
\|F_0(z_0)\| = \|F(z_0) - G_0(z_0)\| \le \|F(z_0)\| + \|G_0(z_0)\| \le \Big(1 + \frac{\sqrt m}{4}\gamma\Big)\|F(z_0)\|
\]
and
\[
\|F_0(z_0)\| + \|G_0(z_{k_0})\| \le \Big(1 + \frac{\sqrt m}{4}\gamma\Big)\|F(z_0)\| + \frac{\sqrt m}{4}\gamma\|F(z_0)\| = \Big(1 + \frac{\sqrt m}{2}\gamma\Big)\|F(z_0)\|.
\]
Therefore, we have
\[
\|F(z_{k_0})\| \le \Big[\prod_{j=0}^{k_0-1}(1+\epsilon_j)\Big]\Big(1 + \frac{\sqrt m}{2}\gamma\Big)\|F(z_0)\|, \tag{4.3}
\]
which implies
\[
\|F(z_{k_0})\| \le e^\eta\Big(1 + \frac{\sqrt m}{2}\gamma\Big)\|F(z_0)\| \le 2e^\eta\|F(z_0)\|, \tag{4.4}
\]

where the last inequality holds because $\gamma < \frac{2}{\sqrt m}$. The inequalities (4.2) and (4.4) reveal that $z_{k_0} \in \Omega$ and $z_{k_j+1} \in \Omega$ for all $k_j \in \bar K_1$. It then remains to show that $z_k \in \Omega$ for all other $k$. For any of these $k$, it is clear that there exist $k_j, k_{j+1} \in \bar K_1$ satisfying $k_j + 1 < k \le k_{j+1}$. In this case, by Step 6, $\epsilon_k = \epsilon_{k_j+1}$ and $F_k = F_{k_j+1}$, $G_k = G_{k_j+1}$. We deduce again from (3.28) that
\begin{eqnarray}
\|F(z_k)\| &=& \|F_{k_j+1}(z_k) + G_{k_j+1}(z_k)\| \le \|F_{k_j+1}(z_k)\| + \|G_{k_j+1}(z_k)\| \nonumber \\
&\le& (1+\epsilon_{k-1})\|F_{k_j+1}(z_{k-1})\| + \|G_{k_j+1}(z_k)\| \nonumber \\
&\le& \cdots \le \Big[\prod_{t=k_j+1}^{k-1}(1+\epsilon_t)\Big]\|F_{k_j+1}(z_{k_j+1})\| + \|G_{k_j+1}(z_k)\| \nonumber \\
&\le& \Big[\prod_{t=k_j+1}^{k-1}(1+\epsilon_t)\Big]\Big[\|F(z_{k_j+1})\| + \|G_{k_j+1}(z_{k_j+1})\| + \|G_{k_j+1}(z_k)\|\Big]. \hspace{0.6cm} (4.5)
\end{eqnarray}
However, by (3.27), we have
\[
\|G_{k_j+1}(z_k)\| \le \frac{\sqrt m}{2}\epsilon_{k_j+1} \le \frac{\sqrt m}{4}\gamma\|F(z_{k_j+1})\|, \qquad
\|G_{k_j+1}(z_{k_j+1})\| \le \frac{\sqrt m}{4}\gamma\|F(z_{k_j+1})\|.
\]
So, we get from (4.5)
\begin{eqnarray*}
\|F(z_k)\| &\le& \Big[\prod_{t=k_j+1}^{k-1}(1+\epsilon_t)\Big]\Big(1 + \frac{\sqrt m}{2}\gamma\Big)\|F(z_{k_j+1})\| \\
&\le& \Big[\prod_{t=k_j+1}^{k-1}(1+\epsilon_t)\Big]\Big(1 + \frac{\sqrt m}{2}\gamma\Big)\Big(\frac12\Big)^j\|F(z_{k_0})\| \\
&\le& \Big[\prod_{t=0}^{\infty}(1+\epsilon_t)\Big]\Big(1 + \frac{\sqrt m}{2}\gamma\Big)\cdot 2e^\eta\|F(z_0)\| \\
&\le& 6e^{2\eta}\|F(z_0)\|,
\end{eqnarray*}
where the second and third inequalities follow from (4.2) and (4.4), respectively. This means that $z_k \in \Omega$ for all $k$ other than $k_0$ and $k_j + 1$, $k_j \in \bar K_1$. Summarizing the above discussion, we conclude the proof. $\Box$

The following lemma is simple but turns out to be very useful in proving global convergence of Algorithm 1.

Lemma 4.1 Let $\bar K_1$ be finite. Then

\[
\sum_{k=0}^\infty \alpha_k^2\|F_k(z_k)\|^2 < \infty \tag{4.6}
\]
and
\[
\sum_{k=0}^\infty \|s_k\|^2 = \sum_{k=0}^\infty \|z_{k+1} - z_k\|^2 < \infty. \tag{4.7}
\]

Proof The assumption that $\bar K_1$ is finite implies that there is an index $\bar k$ such that $\epsilon_k = \epsilon_{\bar k}$ holds for all $k \ge \bar k$. It means that for all $k \ge \bar k$, $F_k$ is independent of $k$. We denote $F_k \equiv \bar F$ for all $k \ge \bar k$. Then by the line search condition (3.14), we have
\[
\|\bar F(z_{k+1})\|^2 \le \|\bar F(z_k)\|^2 - \beta_1\alpha_k^2\|\bar F(z_k)\|^2 - \beta_2\|s_k\|^2 + \epsilon_k\|\bar F(z_k)\|^2,
\]
which can be rewritten as
\[
\beta_1\alpha_k^2\|\bar F(z_k)\|^2 + \beta_2\|s_k\|^2 \le \|\bar F(z_k)\|^2 - \|\bar F(z_{k+1})\|^2 + \epsilon_k\|\bar F(z_k)\|^2.
\]
Since $\{\|\bar F(z_k)\|\}$ is bounded and $\{\epsilon_k\}$ satisfies (3.13), summing these inequalities from $k = \bar k$ to infinity yields (4.6) and (4.7). $\Box$

The following lemma comes from [5].

Lemma 4.2 Let $\{z_k\}$ be generated by Algorithm 1. If
\[
\liminf_{k\to\infty}\|F(z_k)\| \ne 0, \tag{4.8}
\]
then there are positive constants $c_j$, $j = 1,2,3$, such that, for any $k$ sufficiently large, the inequalities
\[
\|B_i s_i\| \le c_1\|s_i\| \quad \mbox{and} \quad c_2\|s_i\|^2 \le s_i^T B_i s_i \le c_3\|s_i\|^2 \tag{4.9}
\]
hold for at least $\lceil k/2 \rceil$ values of $i \in \{1,2,\dots,k\}$.

Proof By Proposition 3.4, (4.8) implies that there exists a positive constant $\nu$ such that $\|F_k(z_k)\| \ge \nu$ holds for all $k \ge \bar k$, where $\bar k$ is a positive integer. It then follows from (3.12) that $y_k^T s_k \ge \|F_k(z_k)\|\,\|s_k\|^2 \ge \nu\|s_k\|^2$ holds for all $k \ge \bar k$. Combining this with (3.25), we get (4.9) from Theorem 2.1 in [5]. $\Box$

Lemma 4.3

If $\alpha_k \ne 1$, then
\[
\alpha_k \ge \frac{2\rho\big(p_k^T B_k p_k - t_k\|p_k\|\,\|F_k(z_k)\|\big)}{\beta_1\|F_k(z_k)\|^2 + (\beta_2 + M^2)\|p_k\|^2}, \tag{4.10}
\]
where $M$ is an upper bound of $\|\nabla F_k(z)\|$ on $D$ and
\[
t_k = \int_0^1 \|\nabla_x L(x_k + \tau\alpha_{k-1}L(z_k), \lambda_k, \mu_k) - \nabla_x L(z_k)\|d\tau + \int_0^1 \|\nabla F_k(z_k + \tau\alpha_k' p_k) - \nabla F_k(z_k)\|d\tau,
\]
where $\alpha_k' = \alpha_k/\rho$.

Proof If $\alpha_k \ne 1$, then the steplength $\alpha_k$ is determined in Step 3, and by the line search rule, $\alpha_k' = \alpha_k/\rho$ does not satisfy (3.14), i.e.,

\begin{eqnarray*}
\|F_k(z_k + \alpha_k' p_k)\|^2 - \|F_k(z_k)\|^2 &>& -\beta_1\|\alpha_k' F_k(z_k)\|^2 - \beta_2\|\alpha_k' p_k\|^2 + \epsilon_k\|F_k(z_k)\|^2 \\
&\ge& -\beta_1\|\alpha_k' F_k(z_k)\|^2 - \beta_2\|\alpha_k' p_k\|^2,
\end{eqnarray*}
or equivalently
\[
(\alpha_k')^2\big(\beta_1\|F_k(z_k)\|^2 + \beta_2\|p_k\|^2\big) > \|F_k(z_k)\|^2 - \|F_k(z_k + \alpha_k' p_k)\|^2. \tag{4.11}
\]
By an elementary deduction we get
\begin{eqnarray}
\|F_k(z_k + \alpha_k' p_k)\|^2 - \|F_k(z_k)\|^2 &=& \big(F_k(z_k + \alpha_k' p_k) + F_k(z_k)\big)^T\big(F_k(z_k + \alpha_k' p_k) - F_k(z_k)\big) \nonumber \\
&=& 2F_k(z_k)^T\big(F_k(z_k + \alpha_k' p_k) - F_k(z_k)\big) + \|F_k(z_k + \alpha_k' p_k) - F_k(z_k)\|^2 \nonumber \\
&\le& 2F_k(z_k)^T\big(F_k(z_k + \alpha_k' p_k) - F_k(z_k)\big) + M^2\|\alpha_k' p_k\|^2. \hspace{0.8cm} (4.12)
\end{eqnarray}

For the first term of (4.12), we have
\begin{eqnarray}
\lefteqn{ F_k(z_k)^T\big(F_k(z_k + \alpha_k' p_k) - F_k(z_k)\big) } \nonumber \\
&=& \alpha_k' F_k(z_k)^T\int_0^1 \nabla F_k(z_k + \tau\alpha_k' p_k)^T p_k d\tau \nonumber \\
&=& \alpha_k' F_k(z_k)^T\nabla F_k(z_k)^T p_k + \alpha_k' F_k(z_k)^T\int_0^1 \big[\nabla F_k(z_k + \tau\alpha_k' p_k)^T - \nabla F_k(z_k)^T\big]d\tau\, p_k \nonumber \\
&=& \alpha_k' q_k^T p_k - \alpha_k' u_k^T p_k + \alpha_k' F_k(z_k)^T\int_0^1 \big[\nabla F_k(z_k + \tau\alpha_k' p_k)^T - \nabla F_k(z_k)^T\big]d\tau\, p_k \nonumber \\
&=& -\alpha_k' p_k^T B_k p_k - \alpha_k' u_k^T p_k + \alpha_k' F_k(z_k)^T\int_0^1 \big[\nabla F_k(z_k + \tau\alpha_k' p_k)^T - \nabla F_k(z_k)^T\big]d\tau\, p_k \nonumber \\
&\le& -\alpha_k' p_k^T B_k p_k + \alpha_k'\|u_k\|\,\|p_k\| + \alpha_k'\|F_k(z_k)\|\int_0^1 \|\nabla F_k(z_k + \tau\alpha_k' p_k) - \nabla F_k(z_k)\|d\tau\,\|p_k\| \nonumber \\
&\le& -\alpha_k' p_k^T B_k p_k + \alpha_k' t_k\|p_k\|\,\|F_k(z_k)\|, \hspace{3cm} (4.13)
\end{eqnarray}
where the third equality follows from (3.20), the fourth equality follows from (3.15), and the last inequality follows from (3.21). Applying (4.13) to (4.12), we get from (4.11)
\[
(\alpha_k')^2\big(\beta_1\|F_k(z_k)\|^2 + \beta_2\|p_k\|^2\big) \ge 2\alpha_k' p_k^T B_k p_k - 2\alpha_k' t_k\|p_k\|\,\|F_k(z_k)\| - M^2\|\alpha_k' p_k\|^2.
\]
Dividing both sides by $\alpha_k'$ yields
\[
\alpha_k' \ge \frac{2\big(p_k^T B_k p_k - t_k\|p_k\|\,\|F_k(z_k)\|\big)}{\beta_1\|F_k(z_k)\|^2 + (\beta_2 + M^2)\|p_k\|^2}.
\]
Since $\alpha_k = \rho\alpha_k'$, the above inequality is equivalent to (4.10). $\Box$

Now we prove global convergence of Algorithm 1. Denote
\[
\bar K = \{ i \mid (4.9) \mbox{ holds} \}. \tag{4.14}
\]

Theorem 4.2 Let $\bar K$ be defined by (4.14) and $\{z_k\}$ be generated by Algorithm 1. Assume that there exists an accumulation point $\bar z$ of $\{z_k\}_{k\in\bar K}$ such that $\nabla F_k(\bar z)$ is nonsingular for all sufficiently large $k \in \bar K$. Then
\[
\liminf_{k\to\infty}\|F(z_k)\| = 0. \tag{4.15}
\]

Proof From Theorem 4.1, it suffices to consider the case where $\bar K_1$ is finite. We may denote $F_k \equiv \bar F$ for all $k$ sufficiently large. If $\bar K_2 = \{k \mid (3.16) \mbox{ holds}\}$ is infinite, then (4.15) is trivial. So we assume that both $\bar K_1$ and $\bar K_2$ are finite.

By (4.6), it suffices to show that there is a subsequence of $\{\alpha_k\}$ with a positive lower bound, i.e., $\limsup_{k\to\infty}\alpha_k>0$. We assume $\lim_{k\to\infty}\alpha_k=0$ to deduce a contradiction. Taking a further subsequence if necessary, we may assume that $\{z_k\}_{k\in\bar K}$ converges to $\bar z$. The assumption that $\nabla F(\bar z)$ is nonsingular implies that $\nabla F(z_k)$ is uniformly nonsingular when $k\in\bar K$ is sufficiently large. In particular, there exists a constant $M_1>0$ such that $\|\nabla F(z_k)^{-1}\|\le M_1$ holds for all $k\in\bar K$ large enough. By Lemma 4.2, this together with (3.15) and (3.20) implies that, for at least $\lceil k/2\rceil$ values of $i\in\{1,2,\ldots,k\}$,

$$\begin{aligned}
\|F(z_i)\| &= \|\nabla F(z_i)^{-1}(B_ip_i+u_i)\| \le \|\nabla F(z_i)^{-1}\|\bigl(\|B_ip_i\|+\|u_i\|\bigr)\\
&\le M_1\Bigl(\gamma_1\|p_i\| + \int_0^1\|\nabla_xL(x_i+\tau\beta_{i-1}L(z_i),\lambda_i,\mu_i)-\nabla_xL(z_i)\|\,d\tau\,\|F(z_i)\|\Bigr)\\
&= M_1\bigl(\gamma_1\|p_i\| + \delta_i\|F(z_i)\|\bigr), \qquad (4.16)
\end{aligned}$$

where $\delta_i=\int_0^1\|\nabla_xL(x_i+\tau\beta_{i-1}L(z_i),\lambda_i,\mu_i)-\nabla_xL(z_i)\|\,d\tau$ and the last inequality follows from (3.21) and (4.9). It is obvious that $\delta_k\to0$. Therefore, (4.16) implies that there is a constant $M_2>0$ such that for all $k\in\bar K$ sufficiently large, $\|F(z_i)\|\le M_2\|p_i\|$ holds for at least $\lceil k/2\rceil$ values of $i\in\{1,2,\ldots,k\}$. It follows from (4.10) that

$$\alpha_i\ \ge\ \frac{2\rho\bigl(p_i^TB_ip_i - t_i\|p_i\|\,\|F(z_i)\|\bigr)}{\sigma_1\|F(z_i)\|^2+(\sigma_2+M^2)\|p_i\|^2}\ \ge\ \frac{2\rho\bigl(\gamma_2\|p_i\|^2 - M_2t_i\|p_i\|^2\bigr)}{\sigma_1M_2^2\|p_i\|^2+(\sigma_2+M^2)\|p_i\|^2}\ =\ \frac{2\rho(\gamma_2-M_2t_i)}{\sigma_1M_2^2+\sigma_2+M^2}.$$

It is not difficult to see that $t_k\to0$ as $k\to\infty$ with $k\in\bar K$, since $z_k\to\bar z$ as $k\to\infty$ with $k\in\bar K$. Consequently, the above inequality implies that $\{\alpha_k\}_{k\in\bar K}$ contains a subsequence bounded away from zero, which contradicts the assumption that $\lim_{k\to\infty}\alpha_k=0$. The proof is complete. $\Box$

We notice that by Step 6 of Algorithm 1, $\epsilon_k\le\|F(z_k)\|$ for every $k$ (see Proposition 3.3). Since $\{\epsilon_k\}$ is monotonically nonincreasing, we have

$$\lim_{k\to\infty}\epsilon_k = 0 \qquad (4.17)$$

under the conditions of Theorem 4.2. Now we establish global convergence of Algorithm 1 under slightly different conditions than Theorem 4.2.

Theorem 4.3 Let $\bar K$ be defined by (4.14) and $\{z_k\}$ be generated by Algorithm 1. Assume that there exist accumulation points $\bar z=(\bar x,\bar\lambda,\bar\mu)$ of $\{z_k\}_{k\in\bar K}$ and $\bar\epsilon$ of $\{\epsilon_k\}_{k\in\bar K}$ such that $\nabla g_i(\bar x)$, $i\in S^{\bar\epsilon}(\bar z)$, and $\nabla h_j(\bar x)$, $j=1,2,\ldots,r$, are linearly independent, where $S^{\epsilon}(z)$ is defined by (2.8). Suppose that $\nabla_xL(\bar z)$ is positive definite on $\{p\mid\nabla g_i(\bar x)^Tp=0,\ \forall i\in S^{\bar\epsilon}(\bar z),\ \nabla h(\bar x)^Tp=0\}$. Then (4.15) holds.

Proof If $\bar K_1$ is infinite, then Theorem 4.1 has shown the conclusion. If $\bar K_1$ is finite, then $\epsilon_k$ is independent of $k$ when $k$ is sufficiently large. We assume that $\epsilon_k=\bar\epsilon$ holds for all $k\ge\bar k$ with some positive integer $\bar k$. It then follows that $F_k(z)=F_{\bar k}(z)$ holds for all $k\ge\bar k$. By Lemma 2.4, the conditions in the theorem imply that $\nabla F_k(\bar z)$ is nonsingular for all $k\ge\bar k$. This means that the conditions in Theorem 4.2 hold. $\Box$

5. Superlinear Convergence

In this section, we prove superlinear convergence of Algorithm 1. To do this, we need the following further assumptions.

Assumption B (i) The sequence $\{z_k\}$ generated by Algorithm 1 converges to a solution of (1.3), say $\bar z=(\bar x,\bar\lambda,\bar\mu)$, at which the strict complementarity condition holds, i.e.,

$$\bar\lambda_i + g_i(\bar x) > 0, \quad i=1,2,\ldots,m. \qquad (5.1)$$

(ii) $\nabla F_k(\bar z)$ is nonsingular for all $k$ sufficiently large.

(iii) $\nabla^2 f$, $\nabla^2 g_i$, $i=1,2,\ldots,m$, and $\nabla^2 h_j$, $j=1,2,\ldots,r$, are Lipschitz continuous at $\bar x$, i.e., there is a constant $H>0$ and a neighborhood $U(\bar x)$ of $\bar x$ such that for all $x\in U(\bar x)$,

$$\|\nabla^2 f(x)-\nabla^2 f(\bar x)\| \le H\|x-\bar x\|,$$
$$\|\nabla^2 g_i(x)-\nabla^2 g_i(\bar x)\| \le H\|x-\bar x\|, \quad i=1,2,\ldots,m, \qquad (5.2)$$
$$\|\nabla^2 h_j(x)-\nabla^2 h_j(\bar x)\| \le H\|x-\bar x\|, \quad j=1,2,\ldots,r.$$

Since $\nabla L$ is linear in $\lambda$ and $\mu$, it is clear that condition (iii) in Assumption B implies the Lipschitz continuity of $\nabla L$ at $\bar z=(\bar x,\bar\lambda,\bar\mu)$. It is also obvious that $\nabla g$ and $\nabla h$ are Lipschitz continuous at $\bar x$ since $\nabla^2 g$ and $\nabla^2 h$ are continuous. So, there is a constant $H'>0$ such that for all $z$ close to $\bar z$,

$$\|\nabla L(z)-\nabla L(\bar z)\| \le H'\|z-\bar z\|,$$
$$\|\nabla g(x)-\nabla g(\bar x)\| = \|\nabla g(x)^T-\nabla g(\bar x)^T\| \le H'\|x-\bar x\| \le H'\|z-\bar z\|, \qquad (5.3)$$
$$\|\nabla h(x)-\nabla h(\bar x)\| = \|\nabla h(x)^T-\nabla h(\bar x)^T\| \le H'\|x-\bar x\| \le H'\|z-\bar z\|.$$

As mentioned just after the proof of Theorem 4.2, we have $\epsilon_k\to0$. So it is not difficult to see that under condition (i) in Assumption B, $[\lambda_k]_i+g_i(x_k)\ge\sqrt{\epsilon_k}$ holds for all $k$ sufficiently large, where $[\lambda_k]_i$ denotes the $i$-th element of $\lambda_k$. This means that $J^{\epsilon_k}(x_k,\lambda_k)=\emptyset$ for all sufficiently large $k$, where $J^{\epsilon}(x,\lambda)$ is defined by (2.6). In other words, for all $k$ sufficiently large, $\Phi_k(z)=\Phi(z)$ and hence $F_k(z)=F(z)$ and $G_k(z)=0$ for all $z$ close to $\bar z$. As a result, conditions (i) and (ii) in Assumption B imply that there is a neighborhood $U(\bar z)$ of $\bar z$ on which $F$ is continuously differentiable and $\nabla F(z)$ is uniformly nonsingular. That is, we have the following lemma.

Lemma 5.1 Let conditions (i) and (ii) in Assumption B hold. Then there is an index $\bar k$ such that for any $k\ge\bar k$ and every $z$ in a neighborhood $U(\bar z)$ of $\bar z$, we have $F_k(z)=F(z)$ and $G_k(z)=0$. Moreover, $F$ is continuously differentiable and $\nabla F(z)$ is uniformly nonsingular on $U(\bar z)$. In particular, there is a constant $c>0$ such that for all $k\ge\bar k$,

$$c\|z-\bar z\| \le \|F(z)-F(\bar z)\| \le M\|z-\bar z\|, \quad \forall z\in U(\bar z), \qquad (5.4)$$

where $M>0$ is an upper bound of $\|\nabla F(z)\|$ over $D$.

Lemma 5.2 Let Assumptions A and B hold. Then there is a neighborhood $U(\bar z)$ of $\bar z$ and a positive constant $\bar H$ such that

$$\|\nabla F(z)-\nabla F(\bar z)\| \le \bar H\|z-\bar z\|, \quad \forall z\in U(\bar z). \qquad (5.5)$$

Proof From (2.7), by the strict complementarity condition (5.1), there is a neighborhood of $\bar z$ in which $\nabla F(z)$ is represented as

$$\nabla F(z) = \begin{pmatrix} \nabla_xL(z) & -\nabla g(x)\,\mathrm{diag}(b_i(x,\lambda)) & -\nabla h(x)\\ -\nabla g(x)^T & -\mathrm{diag}(a_i(x,\lambda)) & 0\\ -\nabla h(x)^T & 0 & 0 \end{pmatrix},$$

where

$$a_i(x,\lambda) = \frac{\lambda_i}{\sqrt{\lambda_i^2+g_i(x)^2}} - 1, \qquad b_i(x,\lambda) = \frac{g_i(x)}{\sqrt{\lambda_i^2+g_i(x)^2}} - 1.$$

We verify (5.5) by showing that every block in $\nabla F(z)$ is Lipschitz continuous at $\bar z$. First, the Lipschitz continuity of $\nabla_xL$, $\nabla g^T$, $\nabla h^T$, $\nabla g$, and $\nabla h$ follows from (5.3) directly. It then suffices to show that the second column block of $\nabla F(z)$ satisfies the Lipschitz condition. By an elementary deduction we have

$$\begin{aligned}
\|\nabla g(x)\,\mathrm{diag}(b_i(x,\lambda)) - \nabla g(\bar x)\,\mathrm{diag}(b_i(\bar x,\bar\lambda))\|
&\le \|\nabla g(x)-\nabla g(\bar x)\|\,\|\mathrm{diag}(b_i(x,\lambda))\| + \|\nabla g(\bar x)\|\,\|\mathrm{diag}(b_i(x,\lambda))-\mathrm{diag}(b_i(\bar x,\bar\lambda))\|\\
&\le 2H'\|x-\bar x\| + \|\nabla g(\bar x)\|\,\|\mathrm{diag}(b_i(x,\lambda))-\mathrm{diag}(b_i(\bar x,\bar\lambda))\|, \qquad (5.6)
\end{aligned}$$

where the last inequality follows from Lemma 2.2 and (5.3). For the second term of (5.6), we have

$$\begin{aligned}
|b_i(x,\lambda)-b_i(\bar x,\bar\lambda)| &= \left|\frac{g_i(x)}{\sqrt{\lambda_i^2+g_i(x)^2}} - \frac{g_i(\bar x)}{\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}}\right|\\
&= \frac{\bigl|g_i(x)\sqrt{\bar\lambda_i^2+g_i(\bar x)^2} - g_i(\bar x)\sqrt{\lambda_i^2+g_i(x)^2}\bigr|}{\sqrt{\lambda_i^2+g_i(x)^2}\,\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}}\\
&\le \frac{|g_i(x)-g_i(\bar x)|\,\sqrt{\bar\lambda_i^2+g_i(\bar x)^2} + |g_i(\bar x)|\,\bigl|\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}-\sqrt{\lambda_i^2+g_i(x)^2}\bigr|}{\sqrt{\lambda_i^2+g_i(x)^2}\,\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}}. \qquad (5.7)
\end{aligned}$$

Since $g_i$ is continuously differentiable, there is a constant $H_1>0$ such that for all $z$ near $\bar z$,

$$|g_i(x)-g_i(\bar x)| \le H_1\|x-\bar x\| \le H_1\|z-\bar z\|.$$

We also have

$$\bigl|\sqrt{\lambda_i^2+g_i(x)^2}-\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}\bigr| = \frac{\bigl|\lambda_i^2+g_i(x)^2-\bar\lambda_i^2-g_i(\bar x)^2\bigr|}{\sqrt{\lambda_i^2+g_i(x)^2}+\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}} \le \frac{|\lambda_i^2-\bar\lambda_i^2|}{\sqrt{\lambda_i^2+g_i(x)^2}+\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}} + \frac{|g_i(x)^2-g_i(\bar x)^2|}{\sqrt{\lambda_i^2+g_i(x)^2}+\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}}.$$

By the strict complementarity condition (5.1), there is a neighborhood of $\bar z$ in which $\sqrt{\lambda_i^2+g_i(x)^2}$ is bounded away from zero. Therefore, the last inequality shows that there is a positive constant $H_2$ such that for all $z$ near $\bar z$,

$$\bigl|\sqrt{\lambda_i^2+g_i(x)^2}-\sqrt{\bar\lambda_i^2+g_i(\bar x)^2}\bigr| \le H_2\bigl(\|x-\bar x\|+\|\lambda-\bar\lambda\|\bigr) \le H_2\|z-\bar z\|.$$

Combining the above discussion with (5.7), we claim that there is a constant $H_3>0$ such that for all $z$ near $\bar z$,

$$|b_i(x,\lambda)-b_i(\bar x,\bar\lambda)| \le H_3\|z-\bar z\|, \quad i=1,2,\ldots,m.$$

This together with (5.6) reveals that there is a positive constant $H_4$ such that for all $z$ near $\bar z$,

$$\|\nabla g(x)\,\mathrm{diag}(b_i(x,\lambda)) - \nabla g(\bar x)\,\mathrm{diag}(b_i(\bar x,\bar\lambda))\| \le H_4\|z-\bar z\|.$$

Similarly, there is a positive constant $H_5$ such that for all $z$ near $\bar z$,

$$\|\mathrm{diag}(a_i(x,\lambda)) - \mathrm{diag}(a_i(\bar x,\bar\lambda))\| \le H_5\|z-\bar z\|.$$

Hence the second column block of $\nabla F(z)$ satisfies the Lipschitz condition.

Let Assumptions A and B hold. Then k = 1 holds for all k large enough, where k is de ned by (3.11). Moreover, there exist positive constants c1, c2 , C1, C2 and M3 such that for all k suciently large, c1 ksk k2  ykT sk  C1 ksk k2 ; (5:8) Lemma 5.3

c2 ksk k  kyk k  C2 ksk k; and





kyk 0 rF (zk )rF (zk )T sk k  M3 kF (zk )k + k ksk k;

(5:10)

k = maxfkzk+1 0 zk; kzk 0 zkg:

(5:11)

where

Proof

(5:9)

We rst show that there is a positive constant C3 such that when k is suciently large,

kvk k  C3k ksk k;

(5:12)

where vk is de ned by (3.9). Indeed, from (3.23), we deduce that for all k suciently large, Z 1 krF (zk + sk ) 0 rF (z)kd + krF (zk ) 0 rF (z)k kvk k  M 0 Z1  krxL(xk +  (L(zk+1) 0 L(zk )); k ; k ) 0 rxL(z)kd + krxL(zk ) 0 rxL(z)k ksk k +   i h0   M H kzk 0 zk + ksk k + kzk 0 zk + H kzk 0 zk + kL(zk+1) 0 L(zk )k + kzk 0 zk ksk k  h   i  M H 2kzk 0 zk + kzk+1 0 zk k + H 2kzk 0 zk + M kzk+1 0 zk k ksk k   i h   M H 3kzk 0 zk + kzk+1 0 zk + H (2 + M )kzk 0 zk + M kzk+1 0 zk ksk k  M [4H + 2H (1 + M )]k ksk k; 0

0

0

0

where M is an upper bound of krF (z )k on D (and hence an upper bound of krL(z )k on D)  + 2H (1 + M )], we and the second inequality follows from (5.3) and (5.5). Putting C3 = M [4H get (5.12). Next we show that k = 1 for all k suciently large. By the nonsingularity of rF ( z ), there is a positive constant c3 such that when k is suciently large 0

krF (zk )pk  c3kpk;

krF (zk )T pk  c3kpk; 24

8p 2 Rn+m+r :

(5:13)

From (3.22), (5.12) and (5.13), we get rkT sk

= sTk rF (zk )rF (zk )T sk + sTk vk  krF (zk )T sk k2 0 ksk k kvk k



c23 ksk k2 0 C3k ksk k2

= (c23 0 C3 k )ksk k2: Since k

! 0, it follows that when k is suciently large, rkT sk

 12 c23ksk k2 > 0;

which, by the de nition (3.11) of k , implies that k = 1 holds for all k suciently large. Moreover, we have 1 ykT sk = rkT sk + kF (zk )k sTk sk  rkT sk  c23 ksk k2 : 2 1 2 Putting c1 = 2 c3 yields the left-hand inequality of (5.8). We also get

kyk k

=

rk + kF (zk )ksk

 krk k + kF (zk )k ksk k  krF (zk )rF (zk )T sk k + kvk k + kF (zk )k ksk k  (M 2 + C3k + M )ksk k;

 is an upper bound of kF (z )k on D. The above inequalities show the right-hand inwhere M equality of (5.9). Moreover, we have ykT sk

 kyk k ksk k  (M 2 + C3k + M )ksk k2:

This implies the right-hand inequality of (5.8). Also, we have

kyk k



=

rk + kF (zk )ksk

 krk k 0 kF (zk )k ksk k  krF (zk )rF (zk )T sk k 0 kvk k 0 kF (zk )k ksk k  (c23 0 C3k 0 kF (zk )k)ksk k;

where the last inequality follows from (5.12) and (5.13). Since k ! 0 and F (zk ) ! F ( z ) = 0, the last inequality implies that there is a constant c2 > 0 such that the left-hand inequality of (5.9) holds for all k suciently large. 25

Now we verify (5.10). Observe that

kyk 0 rF (zk )rF (zk )T sk k

=

rk + kF (zk )ksk 0 rF (zk )rF (zk )T sk

 kF (zk )k ksk k + krk 0 rF (zk )rF (zk )T sk k = kF (zk )k ksk k + kvk k  (kF (zk )k + C3k )ksk k;

where the last equality follows from (3.22) and the last inequality follows from (5.12). Then (5.10) is obtained by letting M3 = maxf1; C3 g. 2 The next lemma follows from Theorem 4.2.  , either (3.16) holds or Let Assumptions A and B hold. Then for every k 2 K k  ~ holds, where K is de ned by (4.14) and ~ > 0 is some constant. Lemma 5.4

Proof

By Lemma 4.3, for every k large enough, if k = 6 1, then

k 

2(pTk Bk pk 0 tk kpk k kF (zk )k) : 1 kFk (zk )k2 + (2 + M 2 )kpk k2

 . Then in a similar way to the proof for (4.16), we deduce that Recall the de nition (4.14) of K  suciently large, kF (zi )k  M0 kpi k. It there is a positive constant M0 such that for all i 2 K then follows

i

kF (zi)k)  2k(Fp(i zB)ikp2i 0+ t(ikp+i k M 2 )kpi k2 1 i 2 2 2( 2 kpi k 0 M0 ti kpi k2)  M 1 22 kpi k2 + (2 + M 2)kpi k2 2( 2 0 M0ti ) = : T

1 M22 + (2 + M 2 )

 is suciently large, i is bounded away Since ti ! 0, the above inequalities show that when i 2 K from zero. In other words, there is a positive constant  and an index i such that i  minf  ; 1g  with i  i. holds for all i 2 K 2 Lemma 5.5

Let Assumptions A and B hold. Then we have

1 X

k=0 Proof

kzk 0 zk < 1:

(5:14)

It is easy to see by Algorithm 1 that for every i large enough, we have either

kFi(zi+1)k2 = kF (zi+1 )k2  2kF (zi)k2; 26

2 for i 2 K

(5:15)

or

kF (zi )k   +1

(1 0 1 2i + i )kF (zi )k2 0 2ksi k2

2

(1 0 1 2i + i )kF (zi )k2 ;

 2; for i 62 K

(5.16)

 2 is de ned by (3.26). Let K  be de ned by (4.14). Then, from Lemma 5.4, for each where K i 2 K , either (5.15) holds or i  ~ for some constant ~ > 0. In the latter case, it follows from (5.16) that kF (zi+1)k2  (1 + i 0 1 ~2)kF (zi)k2: Since i ! 0, there are an index i0 and a constant 1 2 (0; 1) such that 1 + i 0 1 ~2  1 holds for all i  i0. Let 2 = maxf 2; 1 g < 1. Then

kF (zi )k   kF (zi)k +1

2

(5:17)

2

2

 with i  i0 . However, Lemma 4.2 shows that the number of elements in K  is holds for all i 2 K at least d k2 e. Therefore, for any k > 2i0 , there are at least d k2 e 0 i0 of indices i such that (5.17) holds. Let K3 denote the set of indices i for which (5.17) holds and Nk denote the number of indices in K3 not exceeding k. Then we have Nk  d k2 e 0 i0 for each k large enough. Since (5.15) or (5.16) holds for all i large enough, multiplying (5.16) for i 62 K3 and (5.17) for i 2 K3 from i = i0 to i = k yields

kF (zk )k 

h Y



h Y



h Y(1 +  )id k e0 kF (x

+1

2

k

ii =62 Ki0 3 k

ii =62 Ki0 3

i

(1 + i 0 1 2i ) 2Nk kF (xi0 )k2

i

(1 + i ) 2Nk kF (xi0 )k2

k

i

i=0

k

 2 0(i0 +1)

 e

2

2

2

i0

k

i0 )

2

kF (xi0 )k

2

= c0 ~k ; where c0 = e 20(i0 +1)kF (xi0 )k2 and ~ = 22 2 (0; 1). Since zk ! z, it follows from (5.4) and the above inequalities that kzk +1 0 zk2  c02 c0 ~k holds for all k large enough. Hence we get (5.14) as desired. 2 1

The subsequent analysis follows a similar line to that of [23]. First, we verify the following lemma. 27

Lemma 5.6

Let Assumptions A and B hold. If

k(Bk 0 rF (z )rF (z )T )pk k = 0; k!1 kpk k then k = 1 for all k suciently large. Moreover, fzk g converges superlinearly. lim

Proof

Denote

k =

k(Bk 0 rF (z)rF (z)T )pk k : kpk k

(5:18)

(5:19)

Then we have from (3.15) and (3.20)

= = = =

rF (z)rF (z)T (zk + pk 0 z) rF (z)rF (z)T (zk 0 z) + rF (z)rF (z)T pk rF (z)rF (z)T (zk 0 z) 0 qk + (rF (z )rF (z )T 0 Bk )pk rF (z)rF (z)T (zk 0 z) 0 rF (zk )(F (zk ) 0 F (z)) + (rF (z )rF (z)T 0 Bk )pk 0 uk (5.20) (rF ( z )rF (z )T 0 rF (zk )Ak )(zk 0 z) + (rF (z )rF (z )T 0 Bk )pk 0 uk ;

R where uk is de ned by (3.6) and Ak = 01 rF ( z +  (zk 0 z))T d . From (3.21) and (5.3), we have  Z 1 krxL(xk +  k01L(zk ); k ; k ) 0 rxL(z)kd + krxL(zk ) 0 rxL(z)k kF (zk )k kuk k   0 0  H kzk 0 zk + k k01L(zk )k + kzk 0 zk kF (zk )k    H 0 2kzk 0 zk + kL(zk ) 0 L(z)k kF (zk )k

 H 0(2 + M )kzk 0 zk kF (zk )k  C4k kF (zk )k;

where C4 = H 0(2 + M ) and the third inequality follows form the fact that L( z ) = 0 and k01  1 for every k. Thus we get from (3.15), (3.20) and (5.19)

krF (z)rF (z)T pk k

k(rF (z)rF (z)T 0 Bk )pk 0 qk k  k(rF (z)rF (z)T 0 Bk )pk k + kqk k  k kpk k + krF (zk )F (zk )k + kuk k  k kpk k + krF (zk )F (zk )k + C4k kF (zk )k: =

(5.21)

Since rF ( z ) is nonsingular, there exists a positive constant c4 such that krF (z )pk  c4 kpk and T krF (z) pk  c4kpk holds for all p 2 Rn, which in turn implies that krF (z)rF (z)T pk k  c24kpk k. It then follows from (5.21) that (c24 0 k )kpk k  (krF (zk )k + C4k )kF (zk )k: 28

By the assumption that k ! 0 and the fact that k ! 0 and rF (zk ) ! rF ( z ), we claim that there exists a positive constant M4 such that for all k suciently large

kpk k  M4kF (zk )k = M4kF (zk ) 0 F (z)k  M M4kzk 0 zk: Therefore, taking the norm operation in (5.20) yields

krF (z)rF (z )T (zk + pk 0 z)k  k(rF (z)rF (z)T 0 rF (zk )Ak )(zk 0 z)k + k(rF (z)rF (z)T 0 Bk )pk k + kuk k  krF (z)rF (z )T 0 rF (zk )Ak k kzk 0 zk + k kpk k + C4k kF (zk ) 0 F (z)k  krF (z)rF (z )T 0 rF (zk )Ak kkzk 0 zk + k M4kzk 0 zk + C4M k kzk 0 zk = o(kzk 0 zk): (5.22) This together with (5.13) yields Moreover, we have

kF (zk + pk )k

kzk + pk 0 zk ! 0: kzk 0 zk

(5:23)

kF (zk + pk ) 0 F (z)k  M kzk + pk 0 zk M kzk + pk 0 zk = c kzk 0 zk ckzk 0 zk  Mc kzkkz+ p0k z0k zk kF (zk ) 0 F (z)k k M kzk + pk 0 zk = c kzk 0 zk kF (zk )k; =

where the last inequality follows from (5.4). This and (5.23) indicate that (3.16) is satis ed for all k suciently large. In other words, the unit steplength is accepted for all k suciently large. Consequently, (5.23) implies the superlinear convergence of fzk g. 2 Lemma 5.6 shows that to establish superlinear convergence of Algorithm 1, it suces to verify that fzk g satis es the Dennis-More condition (5.18). In the rest of this section, we devote ourselves to showing that this is true for Algorithm 1. Denote P = [rF ( z )rF ( z )T ]01=2 . For an n 2 n matrix A, de ne a matrix norm kAkP = kP AP kF , where k 1 kF denotes the Frobenius norm of a matrix. We let Hk and Hk+1 stand for the inverse matrices of Bk and Bk+1 , respectively. The following lemma is similar to Lemma 3.6 in [23], which shows that the BFGS formula (3.17) exhibits a similar property to that of the conventional BFGS formula. For completeness, we give a proof. 29

Under Assumptions A and B, there exist positive constants M5 ; M6 ; M7 and (0; 1) such that for all k suciently large Lemma 5.7

2

kBk+1 0 rF (z)rF (z)T kP  kBk 0 rF (z)rF (z)T kP + M5k ; (5:24) kHk+1 0 [rF (z)rF (z)T ]01kP 01  (1 0 12 k2 + M6k )kHk 0 [rF (z)rF (z)T ]01kP 01 + M7k ;

(5:25)

where k is de ned by (5.11) and k is given by kP 01[Hk 0 [rF (z)rF (z)T ]01]yk k : k = kHk 0 [rF (z)rF (z)T ]01kP 01 kP yk k In particular, fkBk kg and fkHk kg are bounded. Proof

(5:26)

From the update formula (3.17), we get

P (Bk +1 0 rF (z )rF (z )T )P [(P Bk P )(P 01 sk )][(P Bk P )(P 01 sk )]T = P (Bk 0 rF ( z )rF (z ))P 0 (P 01sk )T (P Bk P )(P 01 sk ) (P yk )(P yk )T + : (5.27) (P yk )T (P 01sk ) Denote B~k = P Bk P , s~k = P 01 sk , y~k = P yk , Qk +1 = P (Bk+1 0rF ( z )rF (z )T )P = P Bk+1 P 0 I , and Qk = P (Bk 0 rF ( z )rF ( z )T )P = B~k 0 I . Then Qk , Qk+1 and B~k are symmetric, and (5.27) is rewritten as y~ y~T B~ s~ s~T B~ Qk+1 = Qk 0 kT k~ k k + kT k : y~k s~k s~k Bk s~k Taking the norm operation on the both sides, we get

T~ T T T ~ (5:28) kQk+1kF 

Qk 0 BkTs~k~s~k Bk + ks~s~k s~kk2

F +

ks~s~k s~kk2 0 yy~~kTy~s~k

F : s~k Bk s~k k k k k We estimate the two terms on the right-hand side of (5.28). For the rst term, we have

B~k s~k s~T B~k s~k s~Tk

2

+

Qk 0 T k ks~k k2 F s~k B~k s~k n B~ s~ s~T B~ s~ s~T  s~ s~T T o B~ s~ s~T B~ = trace Qk 0 kT k~ k k + k k2 Qk 0 kT k~ k k + k k2 ks~k k ks~k k s~k Bk s~k s~k Bk s~k  kB~ s~ k2B~ s~ s~T B~ s~ s~T Q B~ s~ s~T B~ + B~k s~k s~Tk B~k Qk = trace Q2k + k k T k k 2k k + k k2 0 k k k k Tk ks~k k (~ sk B~k s~k ) s~k B~k s~k T T ~  T T ~ 0 Bk s~k s~kks~+ks~2k s~k Bk + Qk s~k s~kks~+ks~2k s~k Qk k k TB 4 ~ ~ s~T Q s~ s ~ Q B~ s~ s~T B~ s~ k B s ~ k = kQk k2F + T k~ k 2 + 1 0 2 k Tk ~k k k 0 2 k k 2k + 2 k k 2k : ks~k k ks~k k (~ sk Bk s~k ) s~k Bk s~k 30

(5.29)

By the de nition of Qk and B~k , we have ~k Qk B~k s~k = s~Tk B~k (B~k 0 I )B~k s~k s~Tk B = s~Tk B~k3 s~k 0 s~Tk B~k2s~k = s~Tk B~k3 s~k 0 kB~k s~k k2 and

~k s~k 0 ks~k k2: s~Tk Qk s~k = s~Tk B

So, we get from (5.29)

~k s~k s~T B~k B s~k s~Tk

2 +

Qk 0 T k 2 ~k s~k ks~k k F s~k B ~ 3 s~ ~ s~ kB~ s~ k2 s~T B~ s~ s~T B s~T B kB~ s~ k4 = kQk k2F + T k~ k 2 + 1 0 2 kT ~k k + 2 T k~ k 0 2 k k 2k + 2 k k 2k ks~k k ks~k k (~ sk Bk s~k ) s~k Bk s~k s~k Bk s~k h kB ~k s~k k2 2 s~T B~ 3 s~k i h kB~k s~k k2 2 kB~k s~k k2 + 1i k k 0 0 2 = kQk k2F + 2 0 ~k s~k ~k s~k ~k s~k ~k s~k s~Tk B s~Tk B s~Tk B s~Tk B h kB ~k s~k k2 2 s~T B~ 3 s~k i h kB~k s~k k2 i2 k k = kQk k2F + 2 0 0 1 + ~k s~k ~k s~k ~k s~k s~T B s~T B s~T B k

 kQk kF ; 2

k

02

k

(5.30)

where the inequality holds because 3

1

3

1

kB~k s~k k2 = [B~k2 s~k ]T [B~k2 s~k ]  kB~k2 s~k kkB~k2 s~k k = [~sTk B~k3s~k ] 12 [~sTk B~k s~k ] 12 : For the second term of (5.28) we have

s~ s~T

k k

ks~k k2

T

0 yy~~kTy~s~k

F  k k

  =

s~ s~T 0 y~ y~T 1 1  T

k k k k s ~ s ~ + 0

ks~k k2 y~kT s~k k k F y~kT s~k F jy~kT s~k 0 ks~k k2j + ks~k (~sk 0 y~k )T kF + k(~sk 0 y~k )~ykT kF y~kT s~k y~kT s~k ky~k 0 s~k k ks~k k + ks~k k ks~k 0 y~k k + ks~k 0 y~k kky~k k y~kT s~k (2kP 01 sk k + kP yk k)kP yk 0 P 01sk k



ykT sk

By the de nition of P , we get

:

kP yk 0 P 01sk k  kP k kyk 0 P 02sk k = kP k kyk 0 rF ( z )rF ( z )T sk k    kP k kyk 0 rF (zk )rF (zk )T sk k + k(rF (zk )rF (zk )T 0 rF (z)rF (z)T )sk k    M3(k + kF (zk )k) + krF (zk )rF (zk )T 0 rF (z)rF (z)T k kP kksk k; 31

(5.31)

where the last inequality follows from (5.10). Since rF is Lipschitz continuous at z, it follows from the last inequality that there is a positive constant C5 such that 

kP yk 0 P 01sk k    





M3 (k + kF (zk )k) + C5 kzk 0 zk kP kksk k 

M3 (k + kF (zk ) 0 F (z )k) + C5 k kP kksk k





M3 (k + M kzk 0 zk) + C5 k kP kksk k C6 k ksk k;

(5.32)

where C6 = [M3 (1 + M ) + C5 ]kP k. Substituting this into (5.31) yields

s~ s~T

k k

ks~k k2 

 

T

0 yy~~kTy~s~k

F k k



2kP 01 sk k + kP yk k C6 k ksk k

ykT sk





2kP 01 k ksk k + kP k kyk k C6 k ksk k

c1 ksk k2  01  c01 1 2kP k + C2 kP k C6k = M5  k ;







01 where M5 = c01 1 C6 2kP k + C2 kP k , the second inequality follows from (5.8), and the last inequality follows from (5.9). Combining this with (5.28) and (5.30), we get (5.24). Now we verify (5.25). The inverse update formula of BFGS method is represented as T s (s 0 H y )T k k k k Hk+1 = Hk + (sk 0 Hk yk )sk + ykT sk

=



I0

sk ykT H I ykT sk k

0

yk sTk  + sk sTk ; ykT sk ykT sk

0 yk (sk (0yTHsk y)2k )sk sk T

T

k k

which is the dual form of DFP update formula in the sense that Hk $ Bk , Hk+1 $ Bk+1 and sk $ yk . By (5.32), the condition of Lemma 3.1 in [9] is satis ed (with the identi cations s $ yk , y $ sk , B $ Hk , A $ [rF (z )rF (z )T ]01 and M $ P 01). Therefore there are constants 1 > 0; 2 > 0 and 2 (0; 1) such that

kHk+1 0 [rF (z )rF (z)T ]01kP 01 q kP 01sk 0 P yk k kH 0 [rF (z)rF (z)T ]01k 01  1 0 k2 + 1 k P kP yk k ks 0 [rF (z)rF (z )T ]01yk k + 2 k kP yk k 32

=

q kP 01sk 0 P yk k kH 0 [rF (z)rF (z)T ]01k 01 1 0 k2 + 1 k P + 2

ksk 0 P 2yk k

kP yk k

kP yk k ;

(5.33)

where k is de ned by (5.26). By the nonsingularity of P , there is a constant c4 > 0 such that

kP yk k  c4kyk k  c4c2ksk k; where the last inequality follows from (5.9). So, we get from (5.33)

kHk+1 0 [rF (z)rF (z)T ]01kP 01 q C  ks k  C  kP kksk k 1 0 k2 + 1 6 k k kHk 0 [rF (  ; z )rF (z )T ]01 kP 01 + 2 6 k c c ks k c c ks k 4 2 k

4 2 k

01 01 01 T 01  (1 0 21 k2 + 1c01 2 c4 C6 k )kHk 0 [rF (z )rF (z ) ] kP 01 + 2c2 c4 C6k kP k; q where the rst inequality follows from (5.32) and the last inequality holds since 1 0 k2  1 0 21 k2 . In view of (5.32), we get (5.25). Finally, from Lemma 3.4 in [9] and (5.14), we see that fkBk 0 rF (z )rF ( z )T kP g and fkHk 0 [rF ( z )rF (z )T ]01 kP 01 g converge. In particular, fkBk kg and fkHk kg are bounded. 2 Now we prove superlinear convergence of Algorithm 1. Theorem 5.1 Let Assumptions A and B hold. Then

k(Bk 0 rF (z)rF (z)T )pk k = 0: k!1 kpk k lim

Moreover,

Proof

fzk g

converges to

(5:34)

z superlinearly.

We rewrite (5.25) as 1 2  kH 0 rF (z )rF (z )T kP 01 2 k k  kHk 0 [rF (z)rF (z)T ]01kP 01 0 kHk+1 0 [rF (z )rF (z)T ]01kP 01   + M6kHk 0 [rF ( z )rF (z )T ]01kP 01 + M7 k :

Since fkHk 0 [rF ( z )rF (z )T ]01kP 01 g is bounded and k de nition (5.26) of k yields

kP 01(Hk 0 [rF (z)rF (z)T ]01)yk k2 = 0: (5:35) k!1 kHk 0 [rF ( z )rF (z )T ]01 kP 01 kP yk k2

lim k2 kHk 0 [rF ( z )rF (z )T ]01 kP 01 = lim

k!1

! 0, this inequality together with the

33

Moreover, since kHk

0 [rF (z)rF (z)T ]01kP 01 is bounded, (5.35) implies kP 01(Hk 0 [rF (z)rF (z)T ]01)yk k = 0: lim k !1 kP yk k

By the nonsingularity of P and (5.9), there exist some constants M8 > 0 and c5 > 0 such that kP yk k  M8ksk k for all k and kP 01wk  c5w for all w 2 Rn . Hence we get lim k !1

k(Hk 0 [rF (z)rF (z)T ]01)yk k = 0: ksk k

(5:36)

On the other hand, we have

k(Hk 0 [rF (z)rF (z)T ]01)yk k = kHk (rF ( z )rF ( z )T 0 Bk )[rF ( z )rF ( z )T ]01 yk k  kHk (rF (z)rF (z )T 0 Bk )sk k 0 kHk (rF (z)rF (z)T 0 Bk )([rF (z)rF (z)T ]01yk 0 sk )k = kHk (rF ( z )rF ( z )T 0 Bk )sk k 0 kHk (rF ( z )rF ( z )T 0 Bk )P (P yk 0 P 01 sk )k  kHk (rF (z)rF (z )T 0 Bk )sk k 0 kHk (rF (z )rF (z)T 0 Bk )P kkP yk 0 P 01sk k  kHk (rF (z)rF (z )T 0 Bk )sk k 0 C6k kHk (rF (z )rF (z)T 0 Bk )P kksk k = kHk (rF ( z )rF ( z )T 0 Bk )sk k + o(ksk k); where the last inequality follows from (5.32) and the last equality holds because k ! 0 and both fkBk kg and fkHk kg are bounded. Moreover the fact that fkBk kg and fkHk kg are bounded

fkBk kg and fkHk kg are uniformly nonsingular. Therefore, there is a constant c6 > 0 such that kHk (rF ( z )rF ( z )T 0 Bk )sk k  c6k(rF ( z )rF ( z )T 0 Bk )sk k for all k.

particularly implies that So we have

k(Hk 0 [rF (z)rF (z)T ]01)yk k  c6k(rF (z)rF (z)T 0 Bk )sk k + o(ksk k); and hence (5.36) yields (5.34). In view of Lemma 5.6, the proof is complete. 6.

2

Discussion

We have proposed a BFGS method for solving the KKT system of a general constrained optimization problem by means of successive splitting of an equivalent system of nonsmooth equations. Under Assumptions A and B, we have established global and superlinear convergence of the method. Compared with an SQP method, the proposed method has two advantages: First, at 34

every iteration, the subproblem is always solvable and has a unique solution, so that no additional computational e ort is required to restore the solvability of subproblems. Second, to ensure global and superlinear convergence of the method, we are free from the conventional but restrictive condition (1.5). The proposed method is applicable to nonconvex mathematical programming problems. A de ciency of the method is that the subproblems are of full dimension. That is, at each iteration, an n + m + r dimensional linear equation has to be solved. It is well known that under some regularity conditions, the KKT point relies only on active constraints. Reducing the size of subproblems deserves to be considered. We also notice that superlinear convergence has been established under the strict complementarity condition (5.1). It is certainly important to investigate whether the method retains superlinear convergence without this assumption.

35

References [1] B OGGS, P. T., TOLLE, J. W., and WANG , P.,

On the local convergence of quasi-Newton

methods for constrained optimization, SIAM Journal on Control and Optimization, Vol. 20, pp. 161-171, 1982.

[2] B ONNANS , J. F., and LAUNAY, G.,

Sequential quadratic programming with penalization of the

displacement, SIAM Journal on Optimization, Vol. 5, pp. 792-812, 1995. [3] B URKE, J. V.,

A sequential quadratic programming method for potentially infeasible mathe-

matical programs, Journal of Mathematics Analysis and Applications, Vol. 139, pp. 319-351, 1989. [4] B URKE, J. V., and HAN , S. P.,

A robust sequential quadratic programming method , Mathe-

matical Programming, Vol. 43, pp. 277-303, 1989. [5] B YRD, R., and NOCEDAL, J.,

A tool for the analysis of quasi-Newton methods with application

to unconstrained minimization, SIAM Journal on Numerical Analysis, Vol. 26, pp 727-739, 1989. [6] B YRD, R., TAPIA , R. A., and ZHANG , Y.,

An SQP augmented Lagrangian BFGS algorithm

for constrained optimization, SIAM Journal on Optimization, Vol. 2, pp. 210-241, 1992. [7] CHEN , X., and QI, L.,

A parameterized Newton method and a quasi-Newton method for

nonsmooth equations, Computational Optimization and Applications, Vol. 3, pp. 157-179, 1994. [8] DE LUCA , T., FACCHINEI , F., and KANZOW , C.,

A semismooth equation approach to the

solution of nonlinear complementarity problems, Mathematical Programming, Vol. 75, 407440, 1996.

 , J. J., [9] DENNIS, J. E. J R., and MORE

A characterization of superlinear convergence and its

application to quasi-Newton methods, Mathematics of Computation, Vol. 28, 549-560, 1974. [10] FACCHINEI , F., and LUCIDI , S.,

Quadratically and superlinearly convergent algorithms for the

solution of inequality constrained minimization problems, Journal of Optimization Theory and Applications, Vol. 85, pp. 265-289, 1995. [11] FACCHINEI , F., and SOARES, J.,

A new merit function for nonlinear complementarity problems

and a related algorithm , SIAM Journal on Optimization, Vol. 7, pp. 225-247, 1997.

36

[12] FISCHER, A., A special Newton-type optimization method, Optimization, Vol. 24, pp. 269-284, 1992. [13] FISCHER, A., A Newton-type method for positive semide nite linear complementarity problems, Journal of Optimization Theory and Applications, Vol. 86, pp. 585-608, 1995. [14] FLETCHER, R., Practical Methods of Optimization, Second Edition, John Wiley & Sons, Chichester, 1987. [15] FUKUSHIMA, M., A successive quadratic programming algorithm with global and superlinear convergence properties, Mathematical Programming, Vol. 35, pp. 253-264, 1986. [16] GEIGER, C., and KANZOW, A., On the resolution of monotone complementarity problems, Computational Optimization and Applications, Vol. 5, pp. 155-173, 1996. [17] HAN, S. P., A globally convergent method for nonlinear programming, Journal of Optimization Theory and Applications, Vol. 22, pp. 297-309, 1977. [18] HAN, S. P., PANG, J. S., and RANGARAJ, N., Globally convergent Newton method for nonsmooth equations, Mathematics of Operations Research, Vol. 17, pp. 586-607, 1992. [19] HEINKENSCHLOSS, M., Projected sequential quadratic programming methods, SIAM Journal on Optimization, Vol. 6 , pp. 373-417, 1996. [20] JIANG, H., and QI, L., A new nonsmooth equations approach to nonlinear complementarity problems,SIAM Journal on Control and Optimization, Vol. 35, pp. 178-193, 1997. [21] KANZOW, C., and FUKUSHIMA, M., Equivalence of the generalized complementarity problem to di erentiable unconstrained minimization, Journal of Optimization Theory and Applications, Vol. 90, pp. 581-603, 1996. [22] LI, D. H., and FUKUSHIMA, M., A modi ed BFGS method with global convergence in nonconvex minimization, Technical Report 98003, Department of Applied Mathematics and Physics, Kyoto University, January 1998. [23] LI, D. H., and FUKUSHIMA, M., A globally and superlinearly convergent Gauss-Newton based BFGS method for symmetric equations, Technical Report 98006, Department of Applied Mathematics and Physics, Kyoto University, March 1998. 
37

A sequential quadratic programming algorithm using an incomplete solution of the subproblem, SIAM Journal on Optimization, Vol. 5, pp. 590-640, URRAY,

[24] M

RIETO,

W., and P

F.,

1995.

ANG,

[25] P

J. S.,

Newton's method for B-di erentiable equations,

Mathematics of Operations

Research, Vol. 15, pp. 311-341, 1990.

A B-di erentiable equation-based, globally and locally quadratically convergent algorithm for nonlinear programs, complementarity and variational inequality problems, ANG,

[26] P

J. S.,

Mathematical Programming, Vol. 51, pp. 101-131, 1991.

ANG,

[27] P

J. S.,

Serial and parallel computation of Karush-Kuhn-Tucker points via nonsmooth

equations, SIAM Journal on Optimization, Vol.4, pp. 872-893, 1994. The convergence of variable metric methods for nonlinearly constrained optimization calculations, Nonlinear Programming 3, Edited by O.L. Mangasarian, R.R. OWELL,

[28] P

M. J. D.,

Meyer and S.M. Robinson, Academic Press, pp. 27-63, 1978.

OWELL,

[29] P

M. J. D.,

Variable metric methods for constrained optimization,

Mathematical

Programming, the State of the Art, Edited by A. Bachem, M. Gr otschel and B. Korte, Springer-Verlag, pp. 288-311, 1982.

A recursive quadratic programming algorithm that uses di erentiable exact penalty functions, Mathematical Programming, Vol. 35, pp. 265-278, OWELL,

[30] P

M. J. D., and Y

UAN,

Y.,

1986.

I

[31] Q , L.,

Convergence analysis of some algorithms for solving nonsmooth equations,

Mathe-

matics of Operations Research, Vol. 18, pp. 227-244, 1993.

I

HEN,

[32] Q , L., and C

A globally convergent successive approximation method for severely

X.,

nonsmooth equations,

SIAM Journal on Control and Optimization, Vol. 33, pp. 402-418,

1995.

[33] S

AHBA,

M.,

A globally convergent algorithm for nonlinearly constrained optimization prob-

lems, Journal of Optimization Theory and Applications, Vol. 52, pp. 291-309, 1987. Modi ed Newton methods for solving a semismooth reformulation of monotone complementarity problems, Mathematical Programming, Vol. 76,

[34] Y

AMASHITA,

UKUSHIMA ,

N., and F

M.,

pp. 469-491, 1997.

38

[35] ZHOU , S. Z., LI, D. H., and ZENG, J. P., A successive approximation quasi-Newton process for nonlinear complementarity problems, Recent Advances in Nonsmooth Optimization, Edited by D.-Z. Du, L. Qi and R.S. Womersley, World Scienti c, pp. 459-472, 1995.

39

Suggest Documents