A Globally Convergent Lagrangian Barrier Algorithm for Optimization

A Globally Convergent Lagrangian Barrier Algorithm for Optimization with General Inequality Constraints and Simple Bounds

by A. R. Conn¹, Nick Gould² and Ph. L. Toint³

RAL 92-067, November 16, 1992

Abstract. We consider the global and local convergence properties of a class of Lagrangian barrier methods for solving nonlinear programming problems. In such methods, simple bound constraints may be treated separately from more general constraints. The objective and general constraint functions are combined in a Lagrangian barrier function. A sequence of such Lagrangian barrier functions is approximately minimized within the domain defined by the simple bounds. Global convergence of the sequence of generated iterates to a first-order stationary point for the original problem is established. Furthermore, possible numerical difficulties associated with barrier function methods are avoided, as it is shown that a potentially troublesome penalty parameter is bounded away from zero. This paper is a companion to our previous work (see Conn et al., 1991) on augmented Lagrangian methods.

¹ Mathematical Sciences Department, IBM T.J. Watson Research Center, PO Box 218, Yorktown Heights, NY 10598, USA
² Central Computing Department, Rutherford Appleton Laboratory, Chilton, Oxfordshire, OX11 0QX, England
³ Department of Mathematics, Facultés Universitaires ND de la Paix, B-5000 Namur, Belgium

Keywords: Constrained optimization, Lagrangian barrier function, simple bounds, general inequality constraints.

Mathematics Subject Classifications: 65K05, 90C30


1 Introduction

In this paper, we consider the problem of finding a local minimizer of the function

f(x)   (1.1)

where x is required to satisfy the general inequality constraints

c_i(x) ≥ 0,  1 ≤ i ≤ m,   (1.2)

and specific simple bounds

l ≤ x ≤ u.   (1.3)

Here f and the c_i map ℝ^n into ℝ.

[...] c_i(x) + s_i^{(k)} > 0 must hold for all iterates generated during the inner iteration. In particular, it is important from a practical point of view that this condition is satisfied at the starting point for the inner iteration. In one important case, this is trivially so. For we have,

Lemma 3.1 The iterates generated by Algorithm 3.1 satisfy the condition

c_i(x^{(k)}) + s_i^{(k+1)} > 0  for i = 1, ..., m,   (3.11)

for k = −1 and all iterations k ≥ 0 for which (3.8) is satisfied.

Proof. The result is true for k = −1 by choice of the initial Lagrange multiplier estimates and shifts in Steps 0 and 1 of the algorithm. The k-th inner iteration (Step 1) of the algorithm ensures that (3.6) is satisfied. If (3.8) is satisfied, the updates (3.4) and (3.9) apply. For each constraint, there are two possibilities. If c_i(x^{(k)}) > 0, (3.11) follows immediately, as the algorithm ensures that the shifts are always positive. If, on the other hand, c_i(x^{(k)}) ≤ 0,

s_i^{(k)} / (c_i(x^{(k)}) + s_i^{(k)}) ≥ 1.   (3.12)

In this case, the definition (2.3) of the multiplier update ensures that

λ_i^{(k+1)} = λ̄_i(x^{(k)}, λ^{(k)}, s^{(k)}) ≥ λ_i^{(k)}.   (3.13)

Hence s_i^{(k+1)} ≥ s_i^{(k)} follows from (3.4) and (3.9), and thus (3.6) gives (3.11).

Thus, so long as we are able to update the multiplier estimates rather than reducing the penalty parameter, the terminating iterate from one inner iteration gives a suitable starting point for the next. We shall consider what to do in other cases in due course.

3.3 The inner iteration

In order to satisfy the inner-iteration termination test (3.5), one may in theory apply any algorithm for solving simple-bound constrained minimization problems (problems in which the minimizer of an objective function within a region defined by simple bounds on the variables is sought) to the problem of minimizing (1.4) within (1.17). Indeed, as the condition

P(x, ∇_x Ψ(x, λ^{(k)}, s^{(k)})) = 0   (3.14)

is required at optimality for such a problem, (3.5) can be viewed as an inexact stopping rule for iterative algorithms for solving it. We merely mention here that the projected gradient methods of Calamai and Moré (1987), Burke and Moré (1988), Conn et al. (1988a), Conn et al. (1988b) and Burke et al. (1990) and the interior point method of Nash and Sofer (1991) are all appropriate, but that methods which take special account of the nature of (1.4) may yet be preferred.
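As an illustration (a sketch of ours, not code from the paper), the stopping rule can be realized with the usual projection residual P(x, v) = x − proj_{[l,u]}(x − v) for the simple bounds; the array layout is an assumption:

```python
import numpy as np

def projected_gradient(x, grad, l, u):
    """P(x, grad): projection residual for the bound-constrained problem.
    It vanishes exactly at first-order points, cf. condition (3.14)."""
    return x - np.clip(x - grad, l, u)

def inner_test(x, grad, l, u, omega_k):
    # Inexact stopping rule in the spirit of (3.5): accept the inner
    # iterate once the projected gradient falls below omega^(k).
    return np.linalg.norm(projected_gradient(x, grad, l, u)) <= omega_k
```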

3.4 Further discussion

We should also comment on the rather peculiar test (3.8) in Algorithm 3.1. In our previous work on solving problems with general equality constraints c_i(x) = 0, i = 1, ..., m (see Conn et al., 1991), we measure the success or failure of an outer iterate x^{(k)} by the size of the norm of the constraint violation

c(x^{(k)}) ≡ [c_i(x^{(k)})]_{i=1}^m.   (3.15)

Specifically, we ask whether

‖c(x^{(k)})‖ ≤ η^{(k)},   (3.16)

for some convergence tolerance η^{(k)} (see Conn et al., 1991, Test (3.6)). In the current algorithm, we employ a similar test. As one would not in general expect all of the general inequality constraint functions to converge to zero for the problem under consideration in this paper, the test (3.16) is inappropriate. However, one would expect the complementary slacknesses c_i(x)λ_i, i = 1, ..., m, to converge to zero for suitable Lagrange multiplier estimates λ_i. The test (3.8) is designed with this in mind. In fact, there is a stronger similarity between Conn et al. (1991, Test (3.6)) and (3.8) than is directly apparent. For the former test may be rewritten as

‖λ̄^{(k)} − λ^{(k)}‖ ≤ η^{(k)} / μ^{(k)},   (3.17)

using the first-order multiplier update proposed in Conn et al. (1991). The test (3.8) may likewise be written as

‖λ̄^{(k)} − λ^{(k)}‖ ≤ η^{(k)} / μ^{(k)},   (3.18)

because of the definition of the multiplier estimates (2.3) and shifts (3.4). Whenever λ^{(k)} is smaller than one, (3.17) and (3.18) coincide. Our primary aim is now to analyse the convergence behaviour of Algorithm 3.1.
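For concreteness, here is a small sketch (ours, not the paper's) of the shift, multiplier update and acceptance test as quoted in (2.3), (3.4) and (3.8); the vectorized layout is an assumption:

```python
import numpy as np

def shifts(lam, mu, alpha_lam):
    # Shifts (3.4): s_i = mu * lam_i**alpha_lam.
    return mu * lam**alpha_lam

def update_multipliers(c, lam, mu, alpha_lam):
    # First-order update (2.3): lam_bar_i = lam_i * s_i / (c_i + s_i).
    s = shifts(lam, mu, alpha_lam)
    return lam * s / (c + s)

def acceptance_test(c, lam, mu, alpha_lam, eta_k):
    # Test (3.8): the scaled complementarity must fall below eta^(k);
    # when it fails, Step 4 reduces the penalty parameter mu instead
    # of updating the multiplier estimates.
    lam_bar = update_multipliers(c, lam, mu, alpha_lam)
    return np.linalg.norm(c * lam_bar / lam**alpha_lam) <= eta_k
```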

4 Global convergence analysis

In this section, we shall consider the convergence of Algorithm 3.1 from arbitrary starting points. We aim to show that all finite limit points of the iterates generated by the algorithm are Kuhn-Tucker points. We shall analyse the convergence of Algorithm 3.1 in the case where the convergence tolerances ω_* and η_* are both zero. We shall make use of the following assumption.

AS2: The set L(x^*, 0; x^*, F_1) = { λ_{[A^*]} | λ_{[A^*]} ≥ 0 and (g(x^*) − (A(x^*)_{[A^*]})^T λ_{[A^*]})_{[F_1]} = 0 } is bounded for any limit point x^* of the sequence {x^{(k)}} and set F_1 defined by (2.13).

Note that AS2 excludes the possibility that F_1 is empty unless there are no general constraints active at x^*. In view of Lemma 2.1, this seems reasonable, as otherwise we are allowing the possibility that there are more than n active constraints at x^*. As a consequence of AS2 we have:

Lemma 4.1 Suppose that AS2 holds. Then L(x, ω; x^*, F_1) is bounded for all (x, ω) sufficiently close to (x^*, 0).

Proof. The result follows directly from the analysis given by Fiacco (1983, Theorem 2.2.9).

We require the following lemma in the proof of global convergence of our algorithm. The lemma is the analogue of Conn et al. (1991, Lemma 4.1). In essence, the result shows that the Lagrange multiplier estimates generated by the algorithm cannot behave too badly.

Lemma 4.2 Suppose that μ^{(k)} converges to zero as k increases when Algorithm 1 is executed. Then the product μ^{(k)} (λ_i^{(k)})^{1+α_λ} converges to zero for each 1 ≤ i ≤ m.

Proof. If μ^{(k)} converges to zero, Step 4 of the algorithm must be executed infinitely often. Let K = {k_0, k_1, k_2, ...} be the set of the indices of the iterations in which Step 4 of the algorithm is executed and for which

μ^{(k)} ≤ min((1/2)^{1/β_η}, μ^{(1)}),   (4.1)

and thus, in particular, η^{(k)} = η_0 (μ^{(k)})^{α_η}. We consider how the i-th Lagrange multiplier estimate changes between two successive iterations indexed in the set K. Firstly note that λ_i^{(k_p+1)} = λ_i^{(k_p)}. At iteration k_p + j, for k_p + 1 < k_p + j ≤ k_{p+1}, we have

λ_i^{(k_p+j)} = λ_i^{(k_p+j−1)} − c_i(x^{(k_p+j−1)}) λ_i^{(k_p+j)} / (μ^{(k_p+j−1)} (λ_i^{(k_p+j−1)})^{α_λ}),   (4.2)

from (2.5), (3.4) and (3.9), and

μ^{(k_{p+1})} = μ^{(k_p+j)} = μ^{(k_p+1)} = τ μ^{(k_p)}.   (4.3)

Hence, summing (4.2) and using the fact that λ_i^{(k_p+1)} = λ_i^{(k_p)},

λ_i^{(k_p+j)} = λ_i^{(k_p)} − (1/μ^{(k_p+1)}) Σ_{l=1}^{j−1} c_i(x^{(k_p+l)}) λ_i^{(k_p+l+1)} / (λ_i^{(k_p+l)})^{α_λ},   (4.4)

where the summation in (4.4) is null if j = 1. Now suppose that j > 1. Then for the set of iterations k_p + l, 1 ≤ l < j, Step 3 of the algorithm must have been executed and hence, from (3.8), (4.3) and the recursive definition of η^{(k)}, we must also have

‖[ c_i(x^{(k_p+l)}) λ_i^{(k_p+l+1)} / (λ_i^{(k_p+l)})^{α_λ} ]_{i=1}^m‖ ≤ η_0 (μ^{(k_p+1)})^{α_η + β_η (l−1)}.   (4.5)

Combining equations (4.1) to (4.5), we obtain the bound

‖λ^{(k_p+j)}‖ ≤ ‖λ^{(k_p)}‖ + (1/μ^{(k_p+1)}) Σ_{l=1}^{j−1} ‖[ c_i(x^{(k_p+l)}) λ_i^{(k_p+l+1)} / (λ_i^{(k_p+l)})^{α_λ} ]_{i=1}^m‖
≤ ‖λ^{(k_p)}‖ + η_0 (μ^{(k_p+1)})^{α_η − 1} Σ_{l=1}^{j−1} (μ^{(k_p+1)})^{β_η (l−1)}
≤ ‖λ^{(k_p)}‖ + η_0 (μ^{(k_p+1)})^{α_η − 1} / (1 − (μ^{(k_p+1)})^{β_η})
≤ ‖λ^{(k_p)}‖ + 2 η_0 (μ^{(k_p+1)})^{α_η − 1}.   (4.6)

Thus, multiplying (4.6) by (μ^{(k_p+j)})^{γ}, where γ = 1/(1 + α_λ), and using (4.3), we obtain that

(μ^{(k_p+j)})^{γ} ‖λ^{(k_p+j)}‖ ≤ τ^{γ} (μ^{(k_p)})^{γ} ‖λ^{(k_p)}‖ + 2 η_0 τ^{γ + α_η − 1} (μ^{(k_p)})^{γ + α_η − 1}.   (4.7)

Equation (4.7) is also satisfied when j = 1, as equations (3.9) and (4.3) give

(μ^{(k_p+j)})^{γ} ‖λ^{(k_p+j)}‖ = τ^{γ} (μ^{(k_p)})^{γ} ‖λ^{(k_p)}‖.   (4.8)

Hence, from (4.7),

(μ^{(k_{p+1})})^{γ} ‖λ^{(k_{p+1})}‖ ≤ τ^{γ} (μ^{(k_p)})^{γ} ‖λ^{(k_p)}‖ + 2 η_0 τ^{γ + α_η − 1} (μ^{(k_p)})^{γ + α_η − 1}.   (4.9)

We now show that (4.9) implies that (μ^{(k_p)})^{γ} ‖λ^{(k_p)}‖ converges to zero as k increases. For, if we define

α_p := (μ^{(k_p)})^{γ} ‖λ^{(k_p)}‖ and β_p := 2 η_0 (μ^{(k_p)})^{γ + α_η − 1},   (4.10)

equations (4.3), (4.9) and (4.10) give that

α_{p+1} ≤ τ^{γ} α_p + τ^{γ + α_η − 1} β_p and β_{p+1} = τ^{γ + α_η − 1} β_p,   (4.11)

and hence that

0 ≤ α_p ≤ (τ^{γ})^p α_0 + (τ^{γ + α_η − 1})^p Σ_{l=0}^{p−1} (τ^{1 − α_η})^l β_0.   (4.12)

Recall that the requirement (3.2) ensures that γ + α_η − 1 > 0. If α_η < 1, the sum in (4.12) can be bounded to give

0 ≤ α_p ≤ (τ^{γ})^p α_0 + (τ^{γ + α_η − 1})^p β_0 / (1 − τ^{1 − α_η}),   (4.13)

whereas if α_η > 1, we obtain the alternative

0 ≤ α_p ≤ (τ^{γ})^p (α_0 + τ^{α_η − 1} β_0 / (1 − τ^{α_η − 1})),   (4.14)

and if α_η = 1,

0 ≤ α_p ≤ (τ^{γ})^p α_0 + p (τ^{γ})^p β_0.   (4.15)

But both α_0 and β_0 are finite. Thus, as p increases, α_p converges to zero; the second part of equation (4.11) implies that β_p converges to zero. Therefore, as the right-hand side of (4.7) converges to zero, so does (μ^{(k)})^{γ} ‖λ^{(k)}‖ for all k. The truth of the lemma is finally established by raising (μ^{(k)})^{γ} |λ_i^{(k)}| to the power 1/γ = 1 + α_λ.

We note that Lemma 4.2 may be proved under much weaker conditions on the sequence {η^{(k)}} than those imposed in Algorithm 3.1. All that is needed is that, in the proof just given,

Σ_{l=1}^{j−1} ‖[ c_i(x^{(k_p+l)}) λ_i^{(k_p+l+1)} / (λ_i^{(k_p+l)})^{α_λ} ]_{i=1}^m‖

in (4.6) should be bounded by some multiple of a positive power of μ^{(k_{p+1})}. We now give our most general global convergence result, in the spirit of Conn et al. (1991, Theorem 4.4).

Theorem 4.3 Suppose that AS1 holds. Let {x^{(k)}} ⊂ B, k ∈ K, be any sequence generated by Algorithm 1 which converges to the point x^* for which AS2 holds. Then

(i) x^* is a Kuhn-Tucker (first-order stationary) point for the problem (1.15)-(1.17);

(ii) the sequence {λ̄(x^{(k)}, λ^{(k)}, s^{(k)})} remains bounded for k ∈ K, and any limit point of this sequence is a set of Lagrange multipliers λ^* corresponding to the Kuhn-Tucker point at x^*; and

(iii) the gradients ∇_x Ψ^{(k)} converge to g_ℓ(x^*, λ^*) for k ∈ K.

Proof. We consider each constraint in turn and distinguish two cases: 1. constraints for which c_i(x^*) ≠ 0; and 2. constraints for which c_i(x^*) = 0. For the first of these cases, we need to consider the possibility that a. the penalty parameter μ^{(k)} is bounded away from zero; and b. the penalty parameter μ^{(k)} converges to zero.

Case 1a. As μ^{(k)} is bounded away from zero, test (3.8) must be satisfied for all k sufficiently large and hence |c_i^{(k)} λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ}| converges to zero. Thus, as {c_i^{(k)}} converges to c_i(x^*) ≠ 0, for k ∈ K, λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} converges to zero. Hence, using (2.3) and (3.4),

λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} = μ^{(k)} λ_i^{(k)} / (c_i^{(k)} + μ^{(k)} (λ_i^{(k)})^{α_λ}) → 0.   (4.16)

We aim to show that λ_i^{(k)} converges to zero and that c_i(x^*) > 0.

Suppose first that λ_i^{(k)} does not converge to zero. It follows directly from (2.5) and (3.4) that

c_i^{(k)} λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} = μ^{(k)} (λ_i^{(k)} − λ̄_i^{(k)}).   (4.17)

Then, as the left-hand side of (4.17) converges to zero and μ^{(k)} and λ_i^{(k)} are bounded away from zero, we deduce that

λ̄_i^{(k)} = λ_i^{(k)} (1 + ε_i^{(k)})   (4.18)

for some {ε_i^{(k)}}, k ∈ K, converging to zero. But then, by definition (2.3),

μ^{(k)} (λ_i^{(k)})^{α_λ} / (c_i^{(k)} + μ^{(k)} (λ_i^{(k)})^{α_λ}) = 1 + ε_i^{(k)}.   (4.19)

However, as λ_i^{(k)} is bounded away from zero and α_λ ≤ 1, (4.19) contradicts (4.16). Thus λ_i^{(k)} converges to zero, for k ∈ K. It now follows that, as λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} converges to zero, so does λ̄_i^{(k)}. It also follows from (3.6) that c_i^{(k)} + μ^{(k)} (λ_i^{(k)})^{α_λ} > 0. As μ^{(k)} is bounded and λ_i^{(k)} converges to zero, we have that c_i(x^*) ≥ 0. But as c_i(x^*) ≠ 0, we conclude that c_i(x^*) > 0, λ̄_i^{(k)} converges to λ_i^* = 0, for k ∈ K, and c_i(x^*) λ_i^* = 0.

Case 1b. As μ^{(k)} converges to zero, Lemma 4.2 shows that μ^{(k)} (λ_i^{(k)})^{1+α_λ}, and hence μ^{(k)} λ_i^{(k)} and μ^{(k)} (λ_i^{(k)})^{α_λ}, converge to zero. It follows immediately that the numerator of (2.3) converges to zero while the denominator converges to c_i(x^*), and hence that λ̄_i^{(k)} converges to zero for k ∈ K. Furthermore, it follows from (3.6) that c_i^{(k)} + μ^{(k)} (λ_i^{(k)})^{α_λ} > 0; as μ^{(k)} (λ_i^{(k)})^{α_λ} converges to zero, we have that c_i(x^*) ≥ 0. But as c_i(x^*) is, by assumption, nonzero, c_i(x^*) > 0. Hence we may conclude that c_i(x^*) > 0, λ̄_i^{(k)} converges to λ_i^* = 0, for k ∈ K, and c_i(x^*) λ_i^* = 0.

We note from (2.15) and (2.16) that the set I^* ≡ I(x^*) is precisely the set of constraints covered in Case 1. Having thus identified the constraints in A^* ≡ A(x^*) as those in Case 2 above, we consider Case 2 in detail.

Case 2. By construction, at every iteration of the algorithm, λ^{(k)} > 0. Moreover, from (3.5) and Case 1 above,

‖(g(x^{(k)}) − (A(x^{(k)})_{[A^*]})^T λ̄^{(k)}_{[A^*]})_{[F_1]}‖ ≤ ‖((A(x^{(k)})_{[I^*]})^T λ̄^{(k)}_{[I^*]})_{[F_1]}‖ + ‖P(x^{(k)}, ∇_x Ψ^{(k)})_{[F_1]}‖
≤ ‖((A(x^{(k)})_{[I^*]})^T λ̄^{(k)}_{[I^*]})_{[F_1]}‖ + ω^{(k)} ≤ ω̄^{(k)}   (4.20)

for some ω̄^{(k)} converging to zero. Thus, using AS2 and Lemma 4.1, the Lagrange multiplier estimates λ̄^{(k)}_{[A^*]} are bounded and, as L(x^{(k)}, ω̄^{(k)}; x^*, F_1) is non-empty, these multipliers have at least one limit point. If λ^*_{[A^*]} is such a limit, AS1, (4.20) and the identity c(x^*)_{[A^*]} = 0 ensure that (g(x^*) − (A(x^*)_{[A^*]})^T λ^*_{[A^*]})_{[F_1]} = 0, c(x^*)^T_{[A^*]} λ^*_{[A^*]} = 0 and λ^*_{[A^*]} ≥ 0. Thus, from AS2, there is a subsequence K' ⊆ K for which {x^{(k)}} converges to x^* and {λ̄^{(k)}} converges to λ^* as k ∈ K' tends to infinity, and hence, from (2.4), ∇_x Ψ^{(k)} converges to g_ℓ(x^*, λ^*). We also have that

c(x^*)^T λ^* = 0,   (4.21)

with both c_i(x^*) and λ_i^* (i = 1, ..., m) non-negative and at least one of the pair equal to zero. We may now invoke Lemma 2.1 and the convergence of ∇_x Ψ^{(k)} to g_ℓ(x^*, λ^*) to see that

(g_ℓ(x^*, λ^*))_{[F_1]} = 0 and g_ℓ(x^*, λ^*)^T x^* = 0.   (4.22)

The variables in the set F_1 ∩ N_b are, by definition, positive at x^*. The components of g_ℓ(x^*, λ^*) indexed by D_1 are non-negative from (2.10), as their corresponding variables are dominated. This then gives the conditions

x_i^* > 0 and (g_ℓ(x^*, λ^*))_i = 0 for i ∈ F_1 ∩ N_b;
(g_ℓ(x^*, λ^*))_i = 0 for i ∈ F_1 ∩ N_f;   (4.23)
x_i^* = 0 and (g_ℓ(x^*, λ^*))_i ≥ 0 for i ∈ D_1; and
x_i^* = 0 and (g_ℓ(x^*, λ^*))_i = 0 for i ∈ F_4.

Thus we have shown that x^* is a Kuhn-Tucker point, and hence we have established results (i), (ii) and (iii).

Note that Theorem 4.3 would remain true regardless of the actual choice of {ω^{(k)}} provided the sequence converges to zero.

Now suppose we replace AS2 by the following stronger assumption:

AS3: The matrix A(x^*)_{[A^*, F_1]} is of full rank at any limit point x^* of the sequence {x^{(k)}} and set F_1 defined by (2.13).

Furthermore, we define the least-squares Lagrange multiplier estimates (corresponding to the sets F_1 and A^*)

λ(x)_{[A^*]} := −(A(x)^+_{[A^*, F_1]})^T g(x)_{[F_1]}   (4.24)

at all points where the right generalized inverse

A(x)^+_{[A^*, F_1]} := A(x)^T_{[A^*, F_1]} (A(x)_{[A^*, F_1]} A(x)^T_{[A^*, F_1]})^{−1}   (4.25)

of A(x)_{[A^*, F_1]} is well defined. We note that these estimates are differentiable functions of x whenever A(x)_{[A^*, F_1]} is of full rank (see, for example, Conn et al., 1991, Lemma 2.2). Then we obtain the following improvement on Theorem 4.3, which has the same flavour as Conn et al. (1991, Lemma 4.3).
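For illustration (our sketch, not from the paper), the least-squares estimates (4.24) can be computed without forming the generalized inverse (4.25) explicitly, here assuming dense NumPy arrays for the relevant rows and columns of the Jacobian:

```python
import numpy as np

def least_squares_multipliers(A_act_flt, g_flt):
    """Compute lam solving min ||A^T lam + g|| over lam, which equals
    -(A^+)^T g as in (4.24), with A restricted to active constraints
    (rows) and floating variables (columns)."""
    # lstsq on A^T avoids forming (A A^T)^{-1} from (4.25) directly.
    lam, *_ = np.linalg.lstsq(A_act_flt.T, -g_flt, rcond=None)
    return lam
```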

Theorem 4.4 Suppose that the assumptions of Theorem 4.3 hold, except that AS2 is replaced by AS3. Then the conclusions of Theorem 4.3 remain true and, in addition, we have that

(iv) the vector of Lagrange multipliers λ^* corresponding to the Kuhn-Tucker point at x^* is unique;

(v) the Lagrange multiplier estimates λ̄(x^{(k)}, λ^{(k)}, s^{(k)})_i satisfy

λ̄(x^{(k)}, λ^{(k)}, s^{(k)})_i = π_i^{(k)} λ_i^{(k)},   (4.26)

where π_i^{(k)} converges to zero for all i ∈ I^* as k ∈ K tends to infinity; and

(vi) there are positive constants a_1, a_2, a_3 and an integer k_0 such that

‖(λ̄(x^{(k)}, λ^{(k)}, s^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖,   (4.27)

‖(λ(x^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_2 ‖x^{(k)} − x^*‖,   (4.28)

‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ μ^{(k)} [ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + (1 + σ^{(k)}(1 + a_3)) ‖λ^{(k)}_{[I^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A^*]}‖ ]   (4.29)

and

‖c(x^{(k)})_{[A]}‖ ≤ max_{i∈A} [ μ^{(k)} (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ] [ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A]}‖ ]   (4.30)

for all k ≥ k_0 (k ∈ K) and any subset A ⊆ A^*, and where

σ^{(k)} := max_{i∈I^*} π_i^{(k)}   (4.31)

converges to zero as k ∈ K tends to infinity.

Proof. Assumption AS3 implies that there is at most one point in L(x^*, 0; x^*, F_1), and thus AS2 holds. The conclusions of Theorem 4.3 then follow. The conclusion (iv) of the current theorem is a direct consequence of AS3. We have already identified the set of constraints for which c_i(x^*) = 0 with A^*. Let

π_i^{(k)} := s_i^{(k)} / (c_i(x^{(k)}) + s_i^{(k)}).   (4.32)

Then (2.3) shows that λ̄_i^{(k)} = π_i^{(k)} λ_i^{(k)}. We now prove that π_i^{(k)} converges to zero for all i ∈ I^* as k ∈ K tends to infinity. If μ^{(k)} is bounded away from zero, we have established in Case 1a of the proof of Theorem 4.3 that λ_i^{(k)} converges to zero. Hence, as μ^{(k)} is finite, s_i^{(k)} also converges to zero. On the other hand, if μ^{(k)} converges to zero, we have established in Case 1b of the proof of Theorem 4.3 that μ^{(k)} (λ_i^{(k)})^{α_λ}, and hence, once again, s_i^{(k)}, converges to zero. But as i ∈ I^*, c_i(x^{(k)}) is bounded away from zero for all k ∈ K sufficiently large, and therefore π_i^{(k)} converges to zero for all i ∈ I^*, which establishes (v).

To prove (vi), we let Ω be any closed, bounded set containing the iterates x^{(k)}, k ∈ K. We note that, as a consequence of AS1 and AS3, for k ∈ K sufficiently large, A(x^{(k)})^+_{[A^*, F_1]} exists, is bounded and converges to A(x^*)^+_{[A^*, F_1]}. Thus we may write

‖(A(x^{(k)})^+_{[A^*, F_1]})^T‖ ≤ a_1   (4.33)

for some constant a_1 > 0. As the variables in the set F_1 are floating, equations (2.6), (2.7), (2.12) and the inner iteration termination criterion (3.5) give that

‖g(x^{(k)})_{[F_1]} + A(x^{(k)})^T_{[A^*, F_1]} λ̄^{(k)}_{[A^*]} + A(x^{(k)})^T_{[I^*, F_1]} λ̄^{(k)}_{[I^*]}‖ ≤ ω^{(k)}.   (4.34)

By assumption, λ(x)_{[A^*]} is bounded for all x in a neighbourhood of x^*. Thus we may deduce from (4.24), (4.33) and (4.34) that

‖(λ̄^{(k)} − λ(x^{(k)}))_{[A^*]}‖ = ‖(A(x^{(k)})^+_{[A^*, F_1]})^T g(x^{(k)})_{[F_1]} + λ̄^{(k)}_{[A^*]}‖
= ‖(A(x^{(k)})^+_{[A^*, F_1]})^T (g(x^{(k)})_{[F_1]} + A(x^{(k)})^T_{[A^*, F_1]} λ̄^{(k)}_{[A^*]})‖
≤ ‖(A(x^{(k)})^+_{[A^*, F_1]})^T‖ (ω^{(k)} + ‖A(x^{(k)})^T_{[I^*, F_1]}‖ ‖λ̄^{(k)}_{[I^*]}‖)
≤ a_1 ω^{(k)} + a_3 ‖λ̄^{(k)}_{[I^*]}‖,   (4.35)

where a_3 := a_1 max_{x∈Ω} ‖A(x)^T_{[I^*, F_1]}‖. Moreover, from the integral mean-value theorem and the (local) differentiability of the least-squares Lagrange multiplier estimates (see, for example, Conn et al., 1991, Lemma 2.2) we have that

(λ(x^{(k)}) − λ(x^*))_{[A^*]} = [ ∫_0^1 ∇_x λ(x(t))_{[A^*]} dt ] (x^{(k)} − x^*),   (4.36)

where ∇_x λ(x)_{[A^*]} is given by Conn et al. (1991, equation (2.17)) and where x(t) = x^{(k)} + t(x^* − x^{(k)}). Now the terms within the integral sign are bounded for all x sufficiently close to x^*, and hence (4.36) gives

‖(λ(x^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_2 ‖x^{(k)} − x^*‖   (4.37)

for all k ∈ K sufficiently large, for some constant a_2 > 0, which is just the inequality (4.28). We then have that λ(x^{(k)})_{[A^*]} converges to λ^*_{[A^*]}. Combining (4.26), (4.35) and (4.37), we obtain

‖(λ̄^{(k)} − λ^*)_{[A^*]}‖ ≤ ‖(λ̄^{(k)} − λ(x^{(k)}))_{[A^*]}‖ + ‖(λ(x^{(k)}) − λ^*)_{[A^*]}‖
≤ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 ‖λ̄(x^{(k)}, λ^{(k)}, s^{(k)})_{[I^*]}‖
≤ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖,   (4.38)

the required inequality (4.27). It remains to establish (4.29) and (4.30). The relationships (2.5) and (3.4) imply that

c_i(x^{(k)}) = μ^{(k)} ((λ_i^{(k)})^{α_λ} / λ̄_i^{(k)}) (λ_i^{(k)} − λ̄_i^{(k)})   (4.39)

and

c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} = μ^{(k)} (λ_i^{(k)} − λ̄_i^{(k)})   (4.40)

for 1 ≤ i ≤ m. Bounding (4.39) and using the triangle inequality and the inclusion A ⊆ A^*, we obtain

‖c(x^{(k)})_{[A]}‖ ≤ max_{i∈A} [ μ^{(k)} (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ] ‖(λ^{(k)} − λ̄^{(k)})_{[A]}‖
≤ max_{i∈A} [ μ^{(k)} (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ] ( ‖(λ^{(k)} − λ^*)_{[A]}‖ + ‖(λ̄^{(k)} − λ^*)_{[A]}‖ )
≤ max_{i∈A} [ μ^{(k)} (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ] ( ‖(λ̄^{(k)} − λ^*)_{[A^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A]}‖ ).   (4.41)

But then, combining (4.38) and (4.41), we see that (4.30) holds for all k ∈ K sufficiently large. Furthermore, the triangle inequality, the relationships (4.26), (4.27) and

λ^*_{[I^*]} = 0   (4.42)

yield the bound

‖λ^{(k)} − λ̄^{(k)}‖ ≤ ‖λ^{(k)} − λ^*‖ + ‖λ̄^{(k)} − λ^*‖
≤ ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + ‖(λ̄^{(k)} − λ^*)_{[A^*]}‖ + ‖λ^{(k)}_{[I^*]}‖ + ‖λ̄^{(k)}_{[I^*]}‖
≤ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + (1 + (1 + a_3) σ^{(k)}) ‖λ^{(k)}_{[I^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A^*]}‖.   (4.43)

Hence, taking norms of (4.40) and using (4.43), we see that (4.29) holds for all k ∈ K sufficiently large.

5 Asymptotic convergence analysis

We now give our first rate-of-convergence result. It is inconvenient that the estimates (4.27)-(4.29) depend upon ‖x^{(k)} − x^*‖, as this term, unlike the other terms in the estimates, depends on a posteriori information. The next lemma removes this dependence and gives a result similar to the previous theory, in which the errors in x are bounded by the errors in the multiplier estimates ‖(λ^{(k)} − λ^*)_{[A^*]}‖ and ‖λ^{(k)}_{[I^*]}‖ (see Polyak, 1992, Theorem 1). However, as an inexact minimization of the Lagrangian barrier function is made, a term reflecting this is also present in the bound. Once again, the result allows for our handling of simple bound constraints. Before giving our result, which is in the spirit of Conn et al. (1991, Lemma 5.1), we need to make two additional assumptions.

AS4: The second derivatives of the functions f(x) and the c_i(x) are Lipschitz continuous at all points within an open set containing B.

AS5: Suppose that (x^*, λ^*) is a Kuhn-Tucker point for problem (1.15)-(1.17), and that

A_1^* := { i | c_i(x^*) = 0 and λ_i^* > 0 },
A_2^* := { i | c_i(x^*) = 0 and λ_i^* = 0 },   (5.1)

and

J_1 := { i ∈ N_b | (g_ℓ(x^*, λ^*))_i = 0 and x_i^* > 0 } ∪ N_f,
J_2 := { i ∈ N_b | (g_ℓ(x^*, λ^*))_i = 0 and x_i^* = 0 }.   (5.2)

Then we assume that the matrix

( H_ℓ(x^*, λ^*)_{[J,J]}   (A(x^*)_{[A,J]})^T )
( A(x^*)_{[A,J]}          0                  )   (5.3)

is non-singular for all sets A and J, where A is any set made up from the union of A_1^* and any subset of A_2^*, and J is any set made up from the union of J_1 and any subset of J_2.

We note that assumption AS5 implies AS3.

Lemma 5.1 Suppose that AS1 holds. Let {x^{(k)}} ⊂ B, k ∈ K, be any sequence generated by Algorithm 1 which converges to the point x^* for which AS5 holds. Let λ^* be the corresponding vector of Lagrange multipliers. Furthermore, suppose that AS4 holds and that the condition

‖[ (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ]_{i∈A_1^*}‖ ≤ a_4 (μ^{(k)})^{ζ−1}   (5.4)

is satisfied for some strictly positive constants a_4 and ζ and all k ∈ K. Let ξ be any constant satisfying

0 < ξ ≤ ζ.   (5.5)

Then there are positive constants μ_max, a_5, ..., a_13 and an integer value k_0 so that, if μ^{(k_0)} ≤ μ_max,

‖x^{(k)} − x^*‖ ≤ a_5 ω^{(k)} + a_6 (μ^{(k)})^{ξ} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_7 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_8 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖,   (5.6)

‖(λ̄(x^{(k)}, λ^{(k)}, s^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_9 ω^{(k)} + a_10 (μ^{(k)})^{ξ} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_11 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_12 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖   (5.7)

and

‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ μ^{(k)} [ a_9 ω^{(k)} + a_13 ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_11 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + (1 + (1 + a_12) σ^{(k)}) ‖λ^{(k)}_{[I^*]}‖ ]   (5.8)

for all k ≥ k_0 (k ∈ K), and where the scalar σ^{(k)}, as defined by (4.31), converges to zero as k ∈ K tends to infinity.

Proof. We first need to make some observations concerning the status of the variables as the limit point is approached. We pick k sufficiently large that the sets F_1 and D_1, defined in (2.13), have been determined. Then, for k ∈ K, the remaining variables either float (variables in F_2) or oscillate between floating and being dominated (variables in F_3). Now recall the definition (2.14) of F_4 and pick an infinite subsequence, K̃, of K such that: (i) F_4 = F_5 ∪ D_2 with F_5 ∩ D_2 = ∅; (ii) variables in F_5 are floating for all k ∈ K̃; and (iii) variables in D_2 are dominated for all k ∈ K̃. Notice that the set F_2 of (2.13) is contained within F_5. Note, also, that there are only a finite number (at most 2^{|F_4|}) of such subsequences K̃ and that, for k sufficiently large, each k ∈ K is in one such subsequence. It is thus sufficient to prove the lemma for k ∈ K̃. Now, for k ∈ K̃, define

F := F_1 ∪ F_5 and D := D_1 ∪ D_2.   (5.9)

So, the variables in F are floating while those in D are dominated. We also need to consider the status of the constraints in A_2^*. We choose a ξ satisfying (5.5) and pick an infinite subsequence, K̄, of K̃ such that (a) A_2^* = A_s^* ∪ A_b^* with A_s^* ∩ A_b^* = ∅, where A_s^* and A_b^* are defined below; (b) the Lagrange multiplier estimates satisfy

λ̄_i^{(k)} ≤ (μ^{(k)})^{1−ξ} (λ_i^{(k)})^{α_λ}   (5.10)

for all constraints i ∈ A_s^* and all k ∈ K̄; and (c) the Lagrange multiplier estimates satisfy

λ̄_i^{(k)} > (μ^{(k)})^{1−ξ} (λ_i^{(k)})^{α_λ}   (5.11)

for all constraints i ∈ A_b^* and all k ∈ K̄. We note that there are only a finite number (at most 2^{|A_2^*|}) of such subsequences K̄ and that, for k sufficiently large, each k ∈ K̃ is in one such subsequence. It is thus sufficient to prove the lemma for k ∈ K̄. We define

A = A_1^* ∪ A_b^*   (5.12)

and note that this set is consistent with the set A described by AS5. It then follows from (5.1) and (5.12) that

A^* = A ∪ A_s^* with A ∩ A_s^* = ∅.   (5.13)

We note that, if i ∈ A_b^*, (5.11) gives

(λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} < (μ^{(k)})^{ξ−1}   (5.14)

for all k ∈ K̄. Moreover, inequalities (5.4) and (5.5) imply

‖[ (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ]_{i∈A_1^*}‖ ≤ a_4 (μ^{(k)})^{ζ−1} ≤ a_4 (μ^{(k)})^{ξ−1}.   (5.15)

It then follows directly from (5.14) and (5.15) that

‖[ (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ]_{i∈A}‖ ≤ a_14 (μ^{(k)})^{ξ−1}   (5.16)

for some positive constant a_14 and for all k ∈ K̄. Furthermore,

λ^*_{[A_s^*]} = 0,   (5.17)

as A_s^* ⊆ A_2^*. Finally, the same inclusion and (5.10) imply that

‖λ̄^{(k)}_{[A_s^*]}‖ ≤ (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_s^*}‖ ≤ (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖   (5.18)

for all k ∈ K̄. We may now invoke Theorem 4.4, part (vi), the bound (5.16) and the inclusion A ⊆ A^* to obtain the inequalities

‖(λ̄(x^{(k)}, λ^{(k)}, s^{(k)}) − λ^*)_{[A]}‖ ≤ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖   (5.19)

and

‖c(x^{(k)})_{[A]}‖ ≤ a_14 (μ^{(k)})^{ξ} [ a_1 ω^{(k)} + a_2 ‖x^{(k)} − x^*‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A^*]}‖ ]   (5.20)

for all sufficiently large k ∈ K̄. Moreover, λ̄^{(k)} converges to λ^*, and hence (2.4) implies that ∇_x Ψ^{(k)} converges to g_ℓ^*. Therefore, from Lemma 2.1,

x_i^* = 0 for all i ∈ D and (g_ℓ^*)_i = 0 for all i ∈ F.   (5.21)

Using Taylor's theorem and the identities (4.42), (5.13) and (5.17), we have

∇_x Ψ^{(k)} = g^{(k)} + (A^{(k)})^T λ̄^{(k)}
= g(x^*) + H(x^*)(x^{(k)} − x^*) + (A^*)^T λ̄^{(k)} + Σ_{j=1}^m λ̄_j^{(k)} H_j(x^*)(x^{(k)} − x^*) + r_1(x^{(k)}, x^*, λ̄^{(k)})
= g_ℓ^* + H_ℓ^* (x^{(k)} − x^*) + (A^*_{[A]})^T (λ̄^{(k)} − λ^*)_{[A]} + (A^*_{[A_s^*]})^T λ̄^{(k)}_{[A_s^*]} + (A^*_{[I^*]})^T λ̄^{(k)}_{[I^*]} + r_1(x^{(k)}, x^*, λ̄^{(k)}) + r_2(x^{(k)}, x^*, λ̄^{(k)}, λ^*),   (5.22)

where

r_1(x^{(k)}, x^*, λ̄^{(k)}) = ∫_0^1 (H_ℓ(x^{(k)} + t(x^* − x^{(k)}), λ̄^{(k)}) − H_ℓ(x^*, λ̄^{(k)})) (x^{(k)} − x^*) dt   (5.23)

and

r_2(x^{(k)}, x^*, λ̄^{(k)}, λ^*) = Σ_{j=1}^m (λ̄_j^{(k)} − λ_j^*) H_j(x^*)(x^{(k)} − x^*).   (5.24)

The boundedness and Lipschitz continuity of the Hessian matrices of f and the c_i in a neighbourhood of x^*, along with the convergence of λ̄^{(k)} to λ^*, for which the relationships (4.31) and (4.42) hold, then give that

‖r_1(x^{(k)}, x^*, λ̄^{(k)})‖ ≤ a_15 ‖x^{(k)} − x^*‖² and
‖r_2(x^{(k)}, x^*, λ̄^{(k)}, λ^*)‖ ≤ a_16 ‖x^{(k)} − x^*‖ ‖λ̄^{(k)} − λ^*‖ ≤ a_16 ‖x^{(k)} − x^*‖ ( ‖(λ̄^{(k)} − λ^*)_{[A^*]}‖ + σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ )   (5.25)

for some positive constants a_15 and a_16, using (4.26). In addition, again using Taylor's theorem and the fact that c(x^*)_{[A]} = 0,

c(x^{(k)})_{[A]} = A^*_{[A]} (x^{(k)} − x^*) + r_3(x^{(k)}, x^*)_{[A]},   (5.26)

where

(r_3(x^{(k)}, x^*))_i = ∫_0^1 t_2 ∫_0^1 (x^{(k)} − x^*)^T H_i(x^* + t_1 t_2 (x^{(k)} − x^*)) (x^{(k)} − x^*) dt_1 dt_2   (5.27)

for i ∈ A (see Gruver and Sachs, 1980, p. 11). The boundedness of the Hessian matrices of the c_i in a neighbourhood of x^* then gives that

‖r_3(x^{(k)}, x^*)_{[A]}‖ ≤ a_17 ‖x^{(k)} − x^*‖²   (5.28)

for some constant a_17 > 0. Combining (5.22) and (5.26), we obtain

( H_ℓ^*       (A^*_{[A]})^T ) ( x^{(k)} − x^*           )   ( ∇_x Ψ^{(k)} − g_ℓ^* − (A^*_{[A_s^*]})^T λ̄^{(k)}_{[A_s^*]} − (A^*_{[I^*]})^T λ̄^{(k)}_{[I^*]} − r_1 − r_2 )
( A^*_{[A]}   0              ) ( (λ̄^{(k)} − λ^*)_{[A]}  ) = ( c(x^{(k)})_{[A]} − (r_3)_{[A]}                                                              ),   (5.29)

where we have suppressed the arguments of r_1, r_2 and r_3 for brevity. We may then use (5.21) to rewrite (5.29) as

( H_ℓ^*_{[F,F]}   H_ℓ^*_{[F,D]}   (A^*_{[A,F]})^T ) ( (x^{(k)} − x^*)_{[F]}  )
( H_ℓ^*_{[D,F]}   H_ℓ^*_{[D,D]}   (A^*_{[A,D]})^T ) ( x^{(k)}_{[D]}           )
( A^*_{[A,F]}     A^*_{[A,D]}     0                ) ( (λ̄^{(k)} − λ^*)_{[A]} )

= ( (∇_x Ψ^{(k)})_{[F]} − (A^*_{[A_s^*,F]})^T λ̄^{(k)}_{[A_s^*]} − (A^*_{[I^*,F]})^T λ̄^{(k)}_{[I^*]} − (r_1 + r_2)_{[F]} )
  ( (∇_x Ψ^{(k)})_{[D]} − (g_ℓ^*)_{[D]} − (A^*_{[A_s^*,D]})^T λ̄^{(k)}_{[A_s^*]} − (A^*_{[I^*,D]})^T λ̄^{(k)}_{[I^*]} − (r_1 + r_2)_{[D]} )
  ( c(x^{(k)})_{[A]} − (r_3)_{[A]} ).   (5.30)

Then, rearranging (5.30) and removing the middle horizontal block, we obtain

( H_ℓ^*_{[F,F]}   (A^*_{[A,F]})^T ) ( (x^{(k)} − x^*)_{[F]}  )   ( (∇_x Ψ^{(k)})_{[F]} − H_ℓ^*_{[F,D]} x^{(k)}_{[D]} − (A^*_{[A_s^*,F]})^T λ̄^{(k)}_{[A_s^*]} − (A^*_{[I^*,F]})^T λ̄^{(k)}_{[I^*]} − (r_1 + r_2)_{[F]} )
( A^*_{[A,F]}     0                ) ( (λ̄^{(k)} − λ^*)_{[A]} ) = ( c(x^{(k)})_{[A]} − A^*_{[A,D]} x^{(k)}_{[D]} − (r_3)_{[A]}                                                              ).   (5.31)

Roughly, the rest of the proof proceeds by showing that the right-hand side of (5.31) is O(ω^{(k)}) + O(σ^{(k)} ‖λ^{(k)}_{[I^*]}‖) + O((μ^{(k)})^{ξ} ‖(λ^{(k)} − λ^*)_{[A^*]}‖). This will then ensure that the vector on the left-hand side is of the same size, which is the result we require. Firstly observe that

‖x^{(k)}_{[D]}‖ ≤ ω^{(k)},   (5.32)

from (2.11) and (3.5), and

‖(∇_x Ψ^{(k)})_{[F]}‖ ≤ ω^{(k)},   (5.33)

from (2.12). Consequently, using (5.21) and (5.32),

‖x^{(k)} − x^*‖ ≤ ‖(x^{(k)} − x^*)_{[F]}‖ + ω^{(k)}.   (5.34)

Let δx^{(k)} = ‖(x^{(k)} − x^*)_{[F]}‖. Combining (4.31), (4.42), (5.19) and (5.34), we obtain

‖(λ̄^{(k)} − λ^*)_{[A]}‖ ≤ a_18 ω^{(k)} + a_2 δx^{(k)} + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖,   (5.35)

where a_18 := a_1 + a_2. Furthermore, from (5.25), (5.28), (5.34) and (5.35),

‖( (r_1 + r_2)_{[F]} ; (r_3)_{[A]} )‖ ≤ a_19 (δx^{(k)})² + a_20 δx^{(k)} ω^{(k)} + a_21 (ω^{(k)})² + a_22 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ (ω^{(k)} + δx^{(k)}),   (5.36)

where a_19 := a_15 + a_17 + a_16 a_2, a_20 := 2(a_15 + a_17) + a_16 (a_18 + a_2), a_21 := a_15 + a_17 + a_16 a_18 and a_22 := a_16 (1 + a_3). Moreover, from (5.18), (5.20), (5.32), (5.33) and (5.34),

‖( (∇_x Ψ^{(k)})_{[F]} − H_ℓ^*_{[F,D]} x^{(k)}_{[D]} − (A^*_{[A_s^*,F]})^T λ̄^{(k)}_{[A_s^*]} − (A^*_{[I^*,F]})^T λ̄^{(k)}_{[I^*]} ; c(x^{(k)})_{[A]} − A^*_{[A,D]} x^{(k)}_{[D]} )‖
≤ a_23 ω^{(k)} + a_24 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_25 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖
+ a_14 (μ^{(k)})^{ξ} [ a_18 ω^{(k)} + a_2 δx^{(k)} + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + ‖(λ^{(k)} − λ^*)_{[A^*]}‖ ],   (5.37)

where

a_23 := 1 + ‖( H_ℓ^*_{[F,D]} ; A^*_{[A,D]} )‖, a_24 := ‖(A^*_{[I^*,F]})^T‖ and a_25 := ‖(A^*_{[A_s^*,F]})^T‖.   (5.38)

By assumption AS5, the coefficient matrix on the left-hand side of (5.31) is non-singular. Let its inverse have norm M. Multiplying both sides of the equation by this inverse and taking norms, we obtain

‖( (x^{(k)} − x^*)_{[F]} ; (λ̄^{(k)} − λ^*)_{[A]} )‖ ≤ M [ a_19 (δx^{(k)})² + a_20 δx^{(k)} ω^{(k)} + a_21 (ω^{(k)})² + a_22 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ (ω^{(k)} + δx^{(k)}) + a_23 ω^{(k)} + a_24 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_25 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_14 (μ^{(k)})^{ξ} ( a_18 ω^{(k)} + a_2 δx^{(k)} + ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_3 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ ) ]
= (M a_19 δx^{(k)} + M a_20 ω^{(k)} + M a_2 a_14 (μ^{(k)})^{ξ}) δx^{(k)} + (M a_21 ω^{(k)} + M a_14 a_18 (μ^{(k)})^{ξ} + M a_23) ω^{(k)} + M a_14 (μ^{(k)})^{ξ} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + M a_25 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + (M a_24 + M a_22 (ω^{(k)} + δx^{(k)}) + M a_3 a_14 (μ^{(k)})^{ξ}) σ^{(k)} ‖λ^{(k)}_{[I^*]}‖.   (5.39)

The mechanism of Algorithm 1 ensures that ω^{(k)} converges to zero. Moreover, Theorem 4.3 guarantees that δx^{(k)} also converges to zero for k ∈ K̄. Thus, there is a k_0 for which

ω^{(k)} ≤ min(1, 1/(4 M a_20))   (5.40)

and

δx^{(k)} ≤ min(1, 1/(4 M a_19))   (5.41)

for all k ≥ k_0 (k ∈ K̄). Furthermore, let

μ_max ≤ min(1, 1/(4 M a_2 a_14)^{1/ξ}).   (5.42)

Then, if μ^{(k)} ≤ μ_max, (5.39), (5.40), (5.42) and (5.41) give

δx^{(k)} ≤ (3/4) δx^{(k)} + M (a_21 + a_14 a_18 + a_23) ω^{(k)} + M a_14 (μ^{(k)})^{ξ} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + M a_25 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + M (a_24 + 2 a_22 + a_3 a_14) σ^{(k)} ‖λ^{(k)}_{[I^*]}‖.   (5.43)

Cancelling the δx^{(k)} terms in (5.43), multiplying the resulting inequality by four and substituting into (5.34), we obtain the desired inequality (5.6), where a_5 := 1 + 4M(a_21 + a_14 a_18 + a_23), a_6 := 4 M a_14, a_7 := 4 M a_25 and a_8 := 4M(a_24 + 2 a_22 + a_3 a_14). The remaining inequalities (5.7) and (5.8) follow directly by substituting (5.6) into (4.27) and (4.29), the required constants being a_9 := a_1 + a_2 a_5, a_10 := a_2 a_6, a_11 := a_2 a_7, a_12 := a_3 + a_2 a_8 and a_13 := 1 + a_2 a_6.

In order for Lemma 5.1 to be useful, we need to ensure that (5.4) holds. There is at least one case where this is automatic. We consider the following additional assumption.

AS6: The iterates {x^{(k)}} generated by Algorithm 1 have a single limit point x^*.

We then have:

Lemma 5.2 Suppose that AS1 holds and that the iterates {x^{(k)}} generated by Algorithm 1 satisfy AS6 and converge to the point x^* for which AS3 holds. Let λ^* be the corresponding vector of Lagrange multipliers. Suppose furthermore that α_η < 1. Then (i) {λ^{(k)}} converges to λ^*; (ii)

σ^{(k)} ≤ θ^{(k)} μ^{(k)},   (5.44)

where θ^{(k)} converges to zero as k increases; and (iii) inequality (5.4) is satisfied for all k. Moreover, if AS4 and AS5 replace AS3, (iv) the conclusions of Lemma 5.1 hold for all k and any 0 < ξ ≤ 1.

Proof. We have, from Theorem 4.4 and AS6, that the complete sequence of Lagrange multiplier estimates {λ̄^{(k)}} generated by Algorithm 1 converges to λ^*. We now consider the sequence {λ^{(k)}}. There are three possibilities. Firstly, μ^{(k)} may be bounded away from zero. In this case, Step 3 of Algorithm 1 must be executed for all k sufficiently large, which ensures that {λ^{(k)}} and {λ̄^{(k−1)}} are identical for all large k. As the latter sequence converges to λ^*, so does the former. Secondly, μ^{(k)} may converge to zero, but nonetheless there may be an infinite number of iterates for which (3.8) is satisfied. In this case, the only time adjacent members of the sequence {λ^{(k)}} differ, λ^{(k)} = λ̄^{(k−1)}, and we have already observed that the latter sequence converges to λ^*. Finally, if the test (3.8) were to fail for all k > k_1, ‖λ^{(k)}_{[I^*]}‖ and ‖(λ^{(k)} − λ^*)_{[A^*]}‖ will remain fixed for all k ≥ k_1, as Step 4 would then be executed for all subsequent iterations. But then (4.29) implies that

‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ a_26 μ^{(k)}   (5.45)

for some constant a_26, for all k ≥ k_2 ≥ k_1. As μ^{(k)} converges to zero as k increases and α_η < 1,

a_26 μ^{(k)} ≤ η_0 (μ^{(k)})^{α_η} = η^{(k)}   (5.46)

for all k sufficiently large. But then inequality (3.8) must be satisfied for some k ≥ k_1, contradicting the supposition. Hence this latter possibility proves to be impossible. Thus {λ^{(k)}} converges to λ^*.

Inequality (5.44) then follows immediately by considering the definitions (3.4), (4.31) and (4.32) for i ∈ I^* and using the convergence of λ^{(k)}_{[I^*]} to λ^*_{[I^*]} = 0; a suitable representation of θ^{(k)} would be

θ^{(k)} = max_{i∈I^*} ( (λ_i^{(k)})^{α_λ} / (c_i(x^{(k)}) + μ^{(k)} (λ_i^{(k)})^{α_λ}) ).   (5.47)

Hence λ̄_i^{(k)} converges to λ_i^* > 0, and is thus bounded away from zero for all k, for each i ∈ A_1^*. But this and the convergence of {λ^{(k)}} to λ^* imply that (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} is bounded, and hence inequality (5.4), with ζ = 1, holds for all k. The remaining results follow directly from Lemma 5.1 on substituting ζ = 1 into (5.5).

We now show that the penalty parameter will normally be bounded away from zero in Algorithm 1. This is important, as many methods for solving the inner iteration subproblem will encounter difficulties if the parameter converges to zero, since this causes the Hessian of the Lagrangian barrier function to become increasingly ill conditioned. The result is analogous to Theorem 5.3 of Conn et al. (1991). We need to consider the following extra assumption.

AS7: (Strict complementary slackness condition 1) Suppose that (x^*, λ^*) is a Kuhn-Tucker point for problem (1.15)-(1.17). Then

A_2^* = { i | c_i(x^*) = 0 and λ_i^* = 0 } = ∅.   (5.48)

Theorem 5.3 Suppose that the iterates {x^{(k)}} generated by Algorithm 1 of Section 3 satisfy AS6 and that AS1, AS4 and AS5 hold. Furthermore, suppose that either (i) α_λ = 1 holds and we define

δ := min(1/2, α_ω) and ε := min(1/2, β_ω),   (5.49)

or (ii) AS7 holds and we define

δ := min(1, α_ω) and ε := min(1, β_ω).   (5.50)

Then, whenever α_η and β_η satisfy the conditions

α_η < min(1, α_ω),   (5.51)

β_η < ε   (5.52)

and

α_η + β_η < δ + 1,   (5.53)

there is a constant μ_min > 0 such that μ^{(k)} ≥ μ_min for all k.

Proof. Suppose otherwise, that μ^{(k)} tends to zero. Then Step 4 of the algorithm must be executed infinitely often. We aim to obtain a contradiction to this statement by showing that Step 3 is always executed for k sufficiently large. We note that our assumptions are sufficient for the conclusions of Theorem 4.4 to hold.

Lemma 5.2, part (i), ensures that {λ^{(k)}} converges to λ^*. We note that, by definition,

μ^{(k)} ≤ μ^{(1)} < 1.   (5.54)

Consider the convergence tolerance ω^{(k)} as generated by the algorithm. By construction and inequality (5.54),

ω^{(k)} ≤ ω_0 (μ^{(k)})^{α_ω}   (5.55)

for all k. (This follows by definition if Step 4 of the algorithm occurs, and because the penalty parameter is unchanged while ω^{(k)} is reduced when Step 3 occurs.) As Lemma 5.2, part (iii), ensures that (5.4) is satisfied for all k, we may apply Lemma 5.1 to the iterates generated by the algorithm. We identify the set K with the complete set of integers. As we are currently assuming that μ^{(k)} converges to zero, we can ensure that μ^{(k)} is sufficiently small so that Lemma 5.1 applies to Algorithm 1, and thus that there is an integer k_1 and constants a_9, ..., a_13 so that (5.7) and (5.8) hold for all k ≥ k_1. In particular, if we choose

ξ = ξ_0 := 1/2 if assumptions (i) of the current theorem hold; ξ = ξ_0 := 1 if assumptions (ii) of the current theorem hold,   (5.56)

we obtain the bounds

‖(λ̄(x^{(k)}, λ^{(k)}, s^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_9 ω^{(k)} + (a_10 + a_11)(μ^{(k)})^{ξ_0} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_12 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖   (5.57)

and

‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ μ^{(k)} [ a_9 ω^{(k)} + (a_11 + a_13) ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + (1 + (1 + a_12) σ^{(k)}) ‖λ^{(k)}_{[I^*]}‖ ]   (5.58)

for all k ≥ k_1, from (5.54) and the inclusion A_2^* ⊆ A^*. Moreover, as Lemma 5.2, part (ii), ensures that σ^{(k)} converges to zero, there is an integer k_2 for which

σ^{(k)} ≤ μ^{(k)}   (5.59)

for all k ≥ k_2. Thus, combining (5.54), (5.57), (5.58) and (5.59), we have that

‖(λ̄(x^{(k)}, λ^{(k)}, s^{(k)}) − λ^*)_{[A^*]}‖ ≤ a_9 ω^{(k)} + a_27 (μ^{(k)})^{ξ_0} ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_12 μ^{(k)} ‖λ^{(k)}_{[I^*]}‖   (5.60)

and

‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ μ^{(k)} [ a_9 ω^{(k)} + a_28 ‖(λ^{(k)} − λ^*)_{[A^*]}‖ + a_29 ‖λ^{(k)}_{[I^*]}‖ ]   (5.61)

for all k ≥ max(k_1, k_2), where a_27 := a_10 + a_11, a_28 := a_11 + a_13 and a_29 := 2 + a_12. Now, let k_3 be the smallest integer such that

(μ^{(k)})^{1−α_η} ≤ η_0 / (ω_0 a_30),   (5.62)

a_31 (μ^{(k)})^{ξ_0 − β_η} ≤ 1,   (5.63)

μ^{(k)} ≤ η_0 / (2 ω_0 a_9)   (5.64)

and

(μ^{(k)})^{δ + 1 − α_η − β_η} ≤ η_0 / (2 ω_0 (a_29 + a_28 a_31))   (5.65)

for all k ≥ k_3, where a_30 := a_9 + a_28 + a_29 and a_31 := a_9 + a_12 + a_27. Furthermore, let k_4 be such that

‖(λ^{(k)} − λ^*)_{[A^*]}‖ ≤ ω_0 and ‖λ^{(k)}_{[I^*]}‖ ≤ ω_0   (5.66)

for all k ≥ k_4. Finally, define k_5 = max(k_1, k_2, k_3, k_4), let Γ be the set {k | Step 4 is executed at iteration k − 1 and k ≥ k_5}, and let k_0 be the smallest element of Γ. By assumption, Γ has an infinite number of elements. For iteration k_0, ω^{(k_0)} = ω_0 (μ^{(k_0)})^{α_ω} and η^{(k_0)} = η_0 (μ^{(k_0)})^{α_η}. Then (5.61) gives

‖[ c_i(x^{(k_0)}) λ̄_i^{(k_0)} / (λ_i^{(k_0)})^{α_λ} ]_{i=1}^m‖
≤ μ^{(k_0)} [ a_9 ω^{(k_0)} + a_28 ‖(λ^{(k_0)} − λ^*)_{[A^*]}‖ + a_29 ‖λ^{(k_0)}_{[I^*]}‖ ]
≤ ω_0 (a_9 + a_28 + a_29) μ^{(k_0)} = ω_0 a_30 μ^{(k_0)}  [from (5.66)]
≤ η_0 (μ^{(k_0)})^{α_η} = η^{(k_0)}  [from (5.62)].   (5.67)

Thus, from (5.67), Step 3 of Algorithm 1 will be executed with λ^{(k_0+1)} = λ̄(x^{(k_0)}, λ^{(k_0)}, s^{(k_0)}). Notice that the definition (5.56) of ξ_0, the definitions (5.49) and (5.50) and the restriction (5.52) imply that

δ ≤ min(ξ_0, α_ω) ≤ 1   (5.68)

and

β_η < ε ≤ min(ξ_0, β_ω) ≤ 1.   (5.69)

Inequality (5.60), in conjunction with (5.55), (5.66) and (5.68), guarantees that

‖(λ^{(k_0+1)} − λ^*)_{[A^*]}‖ ≤ a_9 ω^{(k_0)} + a_27 (μ^{(k_0)})^{ξ_0} ‖(λ^{(k_0)} − λ^*)_{[A^*]}‖ + a_12 μ^{(k_0)} ‖λ^{(k_0)}_{[I^*]}‖
≤ a_9 ω_0 (μ^{(k_0)})^{α_ω} + a_27 ω_0 (μ^{(k_0)})^{ξ_0} + a_12 ω_0 μ^{(k_0)}
≤ ω_0 a_31 (μ^{(k_0)})^{δ}.   (5.70)

Furthermore, inequality (4.26), in conjunction with (4.31), (5.59), (5.66) and (5.68), ensures that

‖λ^{(k_0+1)}_{[I^*]}‖ ≤ σ^{(k_0)} ‖λ^{(k_0)}_{[I^*]}‖ ≤ μ^{(k_0)} ω_0 ≤ ω_0 (μ^{(k_0)})^{δ}.   (5.71)

We shall now suppose that Step 3 is executed for iterations k_0 + i (0 ≤ i ≤ j) and that

‖(λ^{(k_0+i+1)} − λ^*)_{[A^*]}‖ ≤ ω_0 a_31 (μ^{(k_0)})^{δ + β_η i}   (5.72)

and

‖λ^{(k_0+i+1)}_{[I^*]}‖ ≤ ω_0 (μ^{(k_0)})^{δ + β_η i}.   (5.73)

Inequalities (5.70) and (5.71) show that this is true for j = 0. We aim to show that the same is true for i = j + 1. Under our supposition, we have, for iteration k_0 + j + 1, that μ^{(k_0+j+1)} = μ^{(k_0)}, ω^{(k_0+j+1)} = ω_0 (μ^{(k_0)})^{α_ω + β_ω (j+1)} and η^{(k_0+j+1)} = η_0 (μ^{(k_0)})^{α_η + β_η (j+1)}. Then (5.61) gives

‖[ c_i(x^{(k_0+j+1)}) λ̄_i^{(k_0+j+1)} / (λ_i^{(k_0+j+1)})^{α_λ} ]_{i=1}^m‖
≤ μ^{(k_0)} [ a_9 ω_0 (μ^{(k_0)})^{α_ω + β_ω (j+1)} + a_28 ‖(λ^{(k_0+j+1)} − λ^*)_{[A^*]}‖ + a_29 ‖λ^{(k_0+j+1)}_{[I^*]}‖ ]  [from (5.54)]
≤ μ^{(k_0)} [ a_9 ω_0 (μ^{(k_0)})^{α_ω + β_ω (j+1)} + a_28 a_31 ω_0 (μ^{(k_0)})^{δ + β_η j} + a_29 ω_0 (μ^{(k_0)})^{δ + β_η j} ]  [from (5.72)-(5.73)]
≤ a_9 ω_0 (μ^{(k_0)})^{α_η + β_η (j+1) + 1} + (a_29 + a_28 a_31) ω_0 (μ^{(k_0)})^{δ + 1 + β_η j}  [from (5.51)-(5.52)]
= ω_0 ( a_9 μ^{(k_0)} + (a_29 + a_28 a_31) (μ^{(k_0)})^{δ + 1 − α_η − β_η} ) (μ^{(k_0)})^{α_η + β_η (j+1)}
≤ η_0 (μ^{(k_0)})^{α_η + β_η (j+1)} = η^{(k_0+j+1)}  [from (5.64)-(5.65)].   (5.74)

Thus, from (5.74), Step 3 of Algorithm 1 will be executed with λ^{(k_0+j+2)} = λ̄(x^{(k_0+j+1)}, λ^{(k_0+j+1)}, s^{(k_0+j+1)}). Inequality (5.60) then guarantees that

‖(λ^{(k_0+j+2)} − λ^*)_{[A^*]}‖ ≤ a_9 ω^{(k_0+j+1)} + a_27 (μ^{(k_0+j+1)})^{ξ_0} ‖(λ^{(k_0+j+1)} − λ^*)_{[A^*]}‖ + a_12 μ^{(k_0+j+1)} ‖λ^{(k_0+j+1)}_{[I^*]}‖
≤ a_9 ω_0 (μ^{(k_0)})^{α_ω + β_ω (j+1)} + a_27 ω_0 a_31 (μ^{(k_0)})^{ξ_0 + δ + β_η j} + a_12 ω_0 (μ^{(k_0)})^{1 + δ + β_η j}  [from (5.72)-(5.73)]
≤ a_9 ω_0 (μ^{(k_0)})^{α_ω + β_ω (j+1)} + a_27 ω_0 (μ^{(k_0)})^{δ + β_η (j+1)} + a_12 ω_0 (μ^{(k_0)})^{δ + β_η (j+1)}  [from (5.63)]
≤ a_9 ω_0 (μ^{(k_0)})^{δ + β_η (j+1)} + a_27 ω_0 (μ^{(k_0)})^{δ + β_η (j+1)} + a_12 ω_0 (μ^{(k_0)})^{δ + β_η (j+1)}  [from (5.68)-(5.69)]
= ω_0 (a_9 + a_12 + a_27) (μ^{(k_0)})^{δ + β_η (j+1)} = ω_0 a_31 (μ^{(k_0)})^{δ + β_η (j+1)},   (5.75)

which establishes (5.72) for i = j + 1. Furthermore, inequalities (4.26) and (5.59) ensure that

‖λ^{(k_0+j+2)}_{[I^*]}‖ ≤ σ^{(k_0+j+1)} ‖λ^{(k_0+j+1)}_{[I^*]}‖
≤ μ^{(k_0+j+1)} ‖λ^{(k_0+j+1)}_{[I^*]}‖  [from (4.31)]
≤ ω_0 μ^{(k_0)} (μ^{(k_0)})^{δ + β_η j}  [from (5.73)]
≤ ω_0 (μ^{(k_0)})^{δ + β_η (j+1)}  [from (5.69)],   (5.76)

which establishes (5.73) for i = j + 1. Hence, Step 3 of the algorithm is executed for all iterations k ≥ k_0. But this implies that Γ is finite, which contradicts the assumption that Step 4 is executed infinitely often. Hence the theorem is proved.

It is at present unclear how Algorithm 3.1 behaves when α_λ < 1, in the absence of AS7. The inequalities from Lemma 5.1 appear not to be strong enough to guarantee at least a linear improvement in the error of the Lagrange multiplier estimates λ̄^{(k)} because of the presence of the term a_11 (μ^{(k)})^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ in the bound (5.7).

We should also point out that it is indeed possible to find values α_ω, α_η, β_ω and β_η which satisfy the requirements (3.2), (5.51), (5.52) and (5.53) for any 0 < α_λ ≤ 1. For instance, the values α_ω = 1, α_η = 0.75, β_ω = 1 and β_η = 0.25 suffice.

We caution the reader that, although the result of Theorem 5.3 is an important ingredient in overcoming the numerical difficulties normally associated with barrier function methods, ensuring that μ^{(k)} is bounded away from zero is not sufficient. The numerical difficulties arise because of the singularity of the barrier function when c_i(x) + s_i^{(k)} = 0 for any 1 ≤ i ≤ m. The algorithm is designed so that c_i(x) + s_i^{(k)} > 0 for all 1 ≤ i ≤ m. If, in addition, AS7 holds, Theorem 5.3 ensures that

lim_{x→x^*, k→∞} c_i(x) + s_i^{(k)} = c_i(x^*) + μ_min (λ_i^*)^{α_λ} > 0   (5.77)

for all 1 ≤ i ≤ m, and thus numerical difficulties will not arise as the limit is approached. In the absence of AS7, c_i(x^*) + μ_min (λ_i^*)^{α_λ} = 0 for all i ∈ A_2^*, and thus numerical problems are possible in a small neighbourhood of the limit.

If we make the following additional assumption, our definition of floating variables has a further desirable consequence.

AS8: (Strict complementary slackness condition 2) Suppose that (x^*, λ^*) is a Kuhn-Tucker point for problem (1.15)-(1.17). Then

J_2 = { i ∈ N_b | (g_ℓ(x^*, λ^*))_i = 0 and x_i^* = 0 } = ∅.   (5.78)

We then have the following direct analogue of Conn et al. (1991, Theorem 5.4).

Theorem 5.4 Suppose that the iterates x^{(k)}, k ∈ K, converge to the limit point x^* with corresponding Lagrange multipliers λ^*, and that AS1, AS2 and AS8 hold. Then, for k sufficiently large, the set of floating variables is precisely the set of variables which lie away from their bounds, if present, at x^*.

Proof. From Theorem 4.3, ∇_x Ψ^{(k)} converges to g_ℓ(x^*, λ^*) and, from Lemma 2.1, the variables in the set F_4 then converge to zero and the corresponding components of g_ℓ(x^*, λ^*) are zero. Hence, under AS8, F_4 is null. Therefore, each variable ultimately remains tied to one of the sets F_1 or D_1 for all k sufficiently large; a variable in F_1 is, by definition, floating and, whenever the variable is bounded, converges to a value away from its bound. Conversely, a variable in D_1 is dominated and converges to its bound.

We conclude the section by giving a rate-of-convergence result for our algorithm in the spirit of Conn et al. (1991, Theorem 5.5). For a comprehensive discussion of convergence, the reader is referred to Ortega and Rheinboldt (1970).

Theorem 5.5 Suppose that the iterates {x^{(k)}} generated by Algorithm 1 of Section 3 satisfy AS6, that AS1 and AS3 hold, and that λ^* is the corresponding vector of Lagrange multipliers. Then, if (3.8) holds for all k ≥ k_0,

(i) the Lagrange multiplier estimates for the inactive constraints, λ̄^{(k)}_{[I^*]}, generated by Algorithm 1 converge Q-superlinearly to zero;

(ii) the Lagrange multiplier estimates for the active constraints, λ̄^{(k)}_{[A^*]}, converge at least R-linearly to λ^*_{[A^*]}; the R-factor is at most μ_min^{β_η}, where μ_min is the smallest value of the penalty parameter generated by the algorithm; and

(iii) if AS4 and AS5 replace AS3, x^{(k)} converges to x^* at least R-linearly, with R-factor at most μ_min^{min(1, β_ω, α_λ β_η)}.

Proof. Firstly, as (3.8) holds for all k ≥ k_0, the penalty parameter μ^{(k)} remains fixed at some value μ_min, say, and the convergence tolerances satisfy

ω^{(k+1)} = ω^{(k)} μ_min^{β_ω}, η^{(k+1)} = η^{(k)} μ_min^{β_η} and λ^{(k+1)} = λ̄^{(k)}   (5.79)

for all k > k_0.

The Q-superlinear convergence of the Lagrange multiplier estimates for inactive constraints follows directly from Theorem 4.4, part (v). Lemma 5.2, part (ii), the convergence of θ^{(k)} to zero and the relationships (4.26) and (4.31) then give that

‖λ^{(k+1)}_{[I^*]}‖ ≤ θ^{(k)} μ_min ‖λ^{(k)}_{[I^*]}‖   (5.80)

for all k sufficiently large. The identities (2.5), (3.4) and the assumption that (3.8) holds for all k ≥ k_0 give

‖(λ^{(k+1)} − λ^{(k)})_{[A^*]}‖ = μ_min^{−1} ‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i∈A^*}‖
≤ μ_min^{−1} ‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ μ_min^{−1} η^{(k)}   (5.81)

for all such k. But then the triangle inequality and (5.81) imply that

‖(λ^{(k+j)} − λ^*)_{[A^*]}‖ ≤ ‖(λ^{(k+j+1)} − λ^*)_{[A^*]}‖ + ‖(λ^{(k+j+1)} − λ^{(k+j)})_{[A^*]}‖
≤ ‖(λ^{(k+j+1)} − λ^*)_{[A^*]}‖ + μ_min^{−1} η^{(k+j)}   (5.82)

for all k ≥ k_0. Thus, summing (5.82) from j = 0 to j_max − 1 and using the relationship (5.79) yields

‖(λ^{(k)} − λ^*)_{[A^*]}‖ ≤ ‖(λ^{(k+j_max)} − λ^*)_{[A^*]}‖ + μ_min^{−1} Σ_{j=0}^{j_max−1} η^{(k+j)}
≤ ‖(λ^{(k+j_max)} − λ^*)_{[A^*]}‖ + μ_min^{−1} η^{(k)} (1 − μ_min^{β_η j_max}) / (1 − μ_min^{β_η}).   (5.83)

Hence, letting j_max tend to infinity and recalling that λ^{(k)} converges to λ^*, (5.83) gives

‖(λ^{(k)} − λ^*)_{[A^*]}‖ ≤ μ_min^{−1} η^{(k)} / (1 − μ_min^{β_η})   (5.84)

for all k ≥ k_0. As η^{(k)} converges to zero R-linearly, with R-factor μ_min^{β_η}, (5.84) gives the required result (ii).

The remainder of the proof parallels that of Lemma 5.1. As (3.8) holds for all sufficiently large k, the definition (5.12) of A and the bound (5.16) ensure that

‖c(x^{(k)})_{[A]}‖ ≤ ‖[ (λ_i^{(k)})^{α_λ} / λ̄_i^{(k)} ]_{i∈A}‖ ‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i∈A}‖
≤ a_14 μ_min^{ξ−1} ‖[ c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} ]_{i=1}^m‖ ≤ a_14 μ_min^{ξ−1} η^{(k)}.   (5.85)

Thus, combining (5.32), (5.33) and (5.80), and replacing (5.20) by (5.85), the bound on the right-hand side of (5.37) may be replaced by

a_23 ω^{(k)} + a_24 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_25 μ_min^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_14 μ_min^{ξ−1} η^{(k)},

and consequently (5.39) replaced by

δx^{(k)} ≤ M [ a_19 (δx^{(k)})² + a_20 δx^{(k)} ω^{(k)} + a_21 (ω^{(k)})² + a_22 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ (ω^{(k)} + δx^{(k)}) + a_23 ω^{(k)} + a_24 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_25 μ_min^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_14 μ_min^{ξ−1} η^{(k)} ]
= (M a_19 δx^{(k)} + M a_20 ω^{(k)}) δx^{(k)} + (M a_21 ω^{(k)} + M a_23) ω^{(k)} + (M a_24 + M a_22 (ω^{(k)} + δx^{(k)})) σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + M a_25 μ_min^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + M a_14 μ_min^{ξ−1} η^{(k)}.   (5.86)

Hence, if k is sufficiently large that

δx^{(k)} ≤ 1/(4 M a_19), ω^{(k)} ≤ min(1, 1/(4 M a_20)) and σ^{(k)} ≤ 1,   (5.87)

(5.86) and (5.87) can be rearranged to give

δx^{(k)} ≤ 2M [ (a_21 + a_23) ω^{(k)} + (a_24 + 2 a_22) σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_25 μ_min^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖ + a_14 μ_min^{ξ−1} η^{(k)} ].   (5.88)

But then (5.34) and (5.88) give

‖x^{(k)} − x^*‖ ≤ a_32 ω^{(k)} + a_33 σ^{(k)} ‖λ^{(k)}_{[I^*]}‖ + a_34 μ_min^{ξ−1} η^{(k)} + a_35 μ_min^{1−ξ} ‖[ (λ_i^{(k)})^{α_λ} ]_{i∈A_2^*}‖,   (5.89)

where a_32 := 1 + 2M(a_21 + a_23), a_33 := 2M(a_24 + 2 a_22), a_34 := 2 M a_14 and a_35 := 2 M a_25. Each term on the right-hand side of (5.89) converges at least R-linearly to zero; the R-factors (in order) are no larger than μ_min^{β_ω}, μ_min^{β_η}, μ_min^{β_η} and μ_min^{α_λ β_η}, respectively, following (5.79), (5.80) and (5.84). Hence (5.89) and the restriction α_λ ≤ 1 show that x^{(k)} converges at least R-linearly, with R-factor at most μ_min^{min(1, β_ω, α_λ β_η)}.

As an immediate corollary we have

Corollary 5.6 Under the assumptions of Theorem 5.3, the results of Theorem 5.5 follow, with the R-factor governing the convergence of {x^{(k)}} being at most μ_min^{α_λ β_η}.

Proof. This follows directly from Theorem 5.3, as this ensures that (3.8) is satisfied for all k sufficiently large. The bound on the R-factor for the convergence of x^{(k)} is a consequence of (5.51) and (5.52).

6 An example

We are interested in the behaviour of Algorithm 1 in the case when the generated sequence of iterates has more than one limit point. We know that, under the assumptions of Theorem 4.3, each limit point will be a Kuhn-Tucker point. We aim to show in this section that, in the absence of AS6, the conclusion of Theorem 5.3 is false. We proceed by considering an example which has more than one Kuhn-Tucker point and for which the optimal Lagrange multipliers are distinct. We consider a sequence of iterates which is converging satisfactorily to a single Kuhn-Tucker point (x_1^*, λ_1^*) (and thus the penalty parameter has settled down to a single value). We now introduce an "extra" iterate x^{(k)} near to a different Kuhn-Tucker point (x_2^*, λ_2^*). We make use of the identity

c_i(x^{(k)}) λ̄_i^{(k)} / (λ_i^{(k)})^{α_λ} = μ^{(k)} (λ_i^{(k)} − λ̄_i^{(k)}),   (6.1)

derived from (2.5) and (3.4), to show that if the Lagrange multiplier estimate λ̄_i^{(k)} calculated at x^{(k)} is a sufficiently "accurate" approximation of λ_2^* (while λ_i^{(k)} is an "accurate" representation of λ_1^*), the acceptance test (3.8) will fail and the penalty parameter will be reduced. Moreover, we show that this behaviour can be indefinitely repeated. To be specific, we consider the following problem:

minimize_{x∈ℝ} (x − 1)²  such that  c(x) = x² − 4 ≥ 0.   (6.2)

[...]

1. [...] We obtain the bound

‖P(x^{(k)}, ∇_x Ψ(x^{(k)}, λ^{(k)}, s^{(k)}))‖ ≤ 6τ if τ ≤ 1/3, and ≤ 2(3τ + 3(1 − τ)s^{(k)}/(4(1 − τ))) otherwise.   (6.11)

Now (6.4) implies that s^{(k)} ≤ τ < 1, and thus we obtain the overall bound

‖P(x^{(k)}, ∇_x Ψ(x^{(k)}, λ^{(k)}, s^{(k)}))‖ ≤ (6 + 1/(1 − τ_1)) τ   (6.12)

from (6.8) and (6.11). But then (6.12) and (6.4) give

‖P(x^{(k)}, ∇_x Ψ(x^{(k)}, λ^{(k)}, s^{(k)}))‖ ≤ ω_0 τ^{α_ω} = ω^{(k)},   (6.13)

as τ^{1−α_ω} ≤ τ_0^{1−α_ω} ≤ ω_0 / (6 + 1/(1 − τ_1)). Furthermore, from (6.1) and (6.4),

‖c(x^{(k)}) λ̄^{(k)} / (λ^{(k)})^{α_λ}‖ = ‖λ^{(k)} − λ̄^{(k)}‖ = (3/2)τ ≤ η_0 τ^{α_η} = η^{(k)},   (6.14)

as τ^{1−α_η} ≤ τ_0^{1−α_η} ≤ 2η_0/3. Thus x^{(k)} satisfies (3.6) and (3.8), and hence Step 3 of the algorithm will be executed. Therefore, in particular, ω^{(k+1)} = ω_0 τ^{α_ω + β_ω}, η^{(k+1)} = η_0 τ^{α_η + β_η} and λ^{(k+1)} = (1 − τ) λ_1^*.

2. For i = 1, ..., j − 2, let

x^{(k+i)} = −2 √(1 − (1/4) τ^i (1 − τ) s^{(k+i)} / (1 − τ^{i+1})),   (6.15)

where the shift s^{(k+i)} = μ ((3/2)(1 − τ^i))^{α_λ}. Note that (6.15) is well defined, as the second term within the square root is less than 1/4 in magnitude because (6.4) and (6.8) imply that s^{(k+i)} < 1 and τ^i (1 − τ)/(1 − τ^{i+1}) < 1. It is then easy to verify that λ̄^{(k+i)} = (1 − τ^{i+1}) λ_1^*. Moreover,

∇_x Ψ(x^{(k+i)}, λ^{(k+i)}, s^{(k+i)}) = 2(x^{(k+i)} − 1) − 3(1 − τ^{i+1}) x^{(k+i)}
= −(2 + (1 − 3τ^{i+1}) x^{(k+i)})
= −2 (1 − (1 − 3τ^{i+1}) √(1 − τ^i (1 − τ) s^{(k+i)} / (4(1 − τ^{i+1})))).   (6.16)

Now suppose τ^{i+1} ≤ 1/3. Then (6.16), (6.7), (6.8) and s^{(k)} ≤ τ yield

‖P(x^{(k+i)}, ∇_x Ψ(x^{(k+i)}, λ^{(k+i)}, s^{(k+i)}))‖ ≤ 2 (3τ^{i+1} + (1 − τ)(1 − 3τ^{i+1}) s^{(k+i)} / (8(1 − τ^{i+1}))) ≤ 2τ^{i+1} (3 + 1/(8(1 − τ_1))).   (6.17)

If, on the other hand, τ^{i+1} > 1/3, the same relationships give

‖P(x^{(k+i)}, ∇_x Ψ(x^{(k+i)}, λ^{(k+i)}, s^{(k+i)}))‖ ≤ 6τ^{i+1}.   (6.18)

Thus, combining (6.17) and (6.18), we certainly have that

‖P(x^{(k+i)}, ∇_x Ψ(x^{(k+i)}, λ^{(k+i)}, s^{(k+i)}))‖ ≤ (6 + 1/(1 − τ_1)) τ^{i+1}.   (6.19)

But then (6.19) and (6.4) give

‖P(x^{(k+i)}, ∇_x Ψ(x^{(k+i)}, λ^{(k+i)}, s^{(k+i)}))‖ ≤ ω_0 τ^{α_ω + i β_ω} = ω^{(k+i)},   (6.20)

as τ^{1−α_ω + i(1−β_ω)} ≤ τ^{1−α_ω} ≤ τ_0^{1−α_ω} ≤ ω_0 / (6 + 1/(1 − τ_1)). Furthermore, from (6.1) and (6.4),

‖c(x^{(k+i)}) λ̄^{(k+i)} / (λ^{(k+i)})^{α_λ}‖ = ‖λ^{(k+i)} − λ̄^{(k+i)}‖ = (3/2) τ^{i+1} (1 − τ) ≤ η_0 τ^{α_η + i β_η} = η^{(k+i)},   (6.21)

as τ^{1−α_η + i(1−β_η)} ≤ τ^{1−α_η} ≤ τ_0^{1−α_η} ≤ 2η_0/3. Thus x^{(k+i)} satisfies (3.6) and (3.8), and hence Step 3 of the algorithm will be executed. Therefore, in particular, ω^{(k+i+1)} = ω_0 τ^{α_ω + (i+1) β_ω}, η^{(k+i+1)} = η_0 τ^{α_η + (i+1) β_η} and λ^{(k+i+1)} = (1 − τ^{i+1}) λ_1^*.

3. Let

x^{(k+j−1)} = −2 √(1 − (1/4) τ^{j−1} s^{(k+j−1)}),   (6.22)

where the shift s^{(k+j−1)} = μ ((3/2)(1 − τ^{j−1}))^{α_λ}. Once again, (6.4) and (6.8) imply that s^{(k+j−1)} < 1, and thus (6.22) is well defined. Furthermore, it is easy to verify that λ̄^{(k+j−1)} = λ_1^*. Moreover,

∇_x Ψ(x^{(k+j−1)}, λ^{(k+j−1)}, s^{(k+j−1)}) = 2(x^{(k+j−1)} − 1) − 3 x^{(k+j−1)} = −(2 + x^{(k+j−1)})
= −2 (1 − √(1 − (1/4) τ^{j−1} s^{(k+j−1)})).   (6.23)

But then (6.7), (6.23) and the inequality s^{(k+j−1)} ≤ τ yield

‖P(x^{(k+j−1)}, ∇_x Ψ(x^{(k+j−1)}, λ^{(k+j−1)}, s^{(k+j−1)}))‖ ≤ (1/2) τ^{j−1} s^{(k+j−1)} ≤ (1/2) τ^j.   (6.24)

Thus, (6.24) and (6.4) give

‖P(x^{(k+j−1)}, ∇_x Ψ(x^{(k+j−1)}, λ^{(k+j−1)}, s^{(k+j−1)}))‖ ≤ ω_0 τ^{α_ω + (j−1) β_ω} = ω^{(k+j−1)},   (6.25)

as τ^{1−α_ω + (j−1)(1−β_ω)} ≤ τ^{1−α_ω} ≤ τ_0^{1−α_ω} ≤ 2ω_0. Furthermore, from (6.1) and (6.4),

‖c(x^{(k+j−1)}) λ̄^{(k+j−1)} / (λ^{(k+j−1)})^{α_λ}‖ = (3/2) τ^j ≤ η_0 τ^{α_η + (j−1) β_η} = η^{(k+j−1)},   (6.26)

as τ^{1−α_η + (j−1)(1−β_η)} ≤ τ^{1−α_η} ≤ τ_0^{1−α_η} ≤ 2η_0/3. Thus x^{(k+j−1)} satisfies (3.6) and (3.8), and hence Step 3 of the algorithm will be executed. Therefore, in particular, ω^{(k+j)} = ω_0 τ^{α_ω + j β_ω}, η^{(k+j)} = η_0 τ^{α_η + j β_η} and λ^{(k+j)} = λ_1^*.

4. We pick x^{(k+j)} as the largest root of the nonlinear equation

φ(x) := 2(x − 1) − 3 x s^{(k+j)} / (x² − 4 + s^{(k+j)}) = 0,   (6.27)

where s^{(k+j)} = μ (3/2)^{α_λ}. Equation (6.27) defines the stationary points of the Lagrangian barrier function for the problem (6.2). This choice ensures that (3.6) is trivially satisfied. As φ(2) = −4 and φ(x) increases without bound as x tends to infinity, the largest root of (6.27) is greater than 2. The function λ̄ given by (2.3) is a decreasing function of x as x grows beyond 2. Now let x̂ = √(4 + (1/2) s^{(k+j)}). It is easy to show that λ̄(x̂, λ_1^*, s^{(k+j)}) = 1. Moreover, φ(x̂) = 2(x̂ − 1) − 2x̂ = −2. Therefore, x^{(k+j)} > x̂ and thus λ̄^{(k+j)} < 1. But then, using (6.1),

‖c(x^{(k+j)}) λ̄^{(k+j)} / (λ^{(k+j)})^{α_λ}‖ = μ^{(k+j)} (λ^{(k+j)} − λ̄^{(k+j)}) ≥ μ^{(k+j)} ((3/2) − 1) = (1/2) μ^{(k+j)} > η_0 τ^{α_η + j β_η} = η^{(k+j)}   (6.28)

from (6.5). Thus the test (3.8) is violated and the penalty parameter is subsequently reduced. This ensures that ω^{(k+j+1)} = ω_0 (τ μ)^{α_ω}, η^{(k+j+1)} = η_0 (τ μ)^{α_η} and λ^{(k+j+1)} = λ_1^*.

Hence, a cycle as described at the start of this section is possible, and we conclude that, in the absence of AS6, the penalty parameter generated by Algorithm 1 may indeed converge to zero.
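To make the example's quantities concrete, here is a small numerical sketch (ours, not part of the paper). It assumes the standard shifted-barrier form Ψ(x, λ, s) = f(x) − λ s log(c(x) + s), which is consistent with the multiplier update (2.3) and the identity (6.1) quoted above, applied to problem (6.2):

```python
import numpy as np

def psi(x, lam, s):
    """Lagrangian barrier function for problem (6.2):
    f(x) = (x-1)^2, c(x) = x^2 - 4 >= 0 (assumed barrier form)."""
    c = x**2 - 4.0
    return (x - 1.0)**2 - lam * s * np.log(c + s)

def grad_psi(x, lam, s):
    # d/dx of psi, using c'(x) = 2x; note grad = 2(x-1) - 2x*lam_bar.
    c = x**2 - 4.0
    return 2.0*(x - 1.0) - lam * s * 2.0*x / (c + s)

def lam_bar(x, lam, s):
    # First-order multiplier update (2.3).
    c = x**2 - 4.0
    return lam * s / (c + s)

# The two Kuhn-Tucker points: the stationarity residual of the
# Lagrangian, 2(x-1) - lam*2x, vanishes with lam_1* = 3/2 at x = -2
# and lam_2* = 1/2 at x = +2, so the optimal multipliers differ.
for x_star, lam_star in ((-2.0, 1.5), (2.0, 0.5)):
    print(x_star, 2.0*(x_star - 1.0) - lam_star * 2.0*x_star)
```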

7 Second-order conditions

It is useful to know how our algorithms behave if we impose further conditions on the iterates generated by the inner iteration. In particular, suppose that x^{(k)} satisfies the following second-order sufficiency condition:

AS9: Suppose that the iterates x^{(k)} and Lagrange multiplier estimates λ̄^{(k)}, generated by Algorithm 1, converge to the Kuhn-Tucker point (x^*, λ^*) for k ∈ K, and that J_1 and J_2 are as defined by (5.2). Then we assume that (∇_{xx} Ψ^{(k)})_{[J,J]} is uniformly positive definite (that is, its smallest eigenvalue is uniformly bounded away from zero) for all k ∈ K sufficiently large and all sets J, where J is any set made up from the union of J_1 and any subset of J_2.

With such a condition we have the following result.

Theorem 7.1 Under AS1, AS2, AS7 and AS9, the iterates x^{(k)}, k ∈ K, generated by Algorithm 1 converge to an isolated local solution of (1.15)-(1.17).

Proof. Let J be any set as described in AS9. Then

(∇_{xx} Ψ^{(k)})_{[J,J]} = (H_ℓ(x^{(k)}, λ̄^{(k)}))_{[J,J]} + (A^{(k)}_{[A^*,J]})^T D^{(k)}_{[A^*,A^*]} A^{(k)}_{[A^*,J]} + (A^{(k)}_{[I^*,J]})^T D^{(k)}_{[I^*,I^*]} A^{(k)}_{[I^*,J]},   (7.1)

where D^{(k)} is a diagonal matrix with entries

D^{(k)}_{i,i} = λ_i^{(k)} s_i^{(k)} / (c_i(x^{(k)}) + s_i^{(k)})² = λ̄_i^{(k)} / (c_i(x^{(k)}) + s_i^{(k)})   (7.2)

for 1 ≤ i ≤ m. Let s_{[J]} be any non-zero vector satisfying

A^{(k)}_{[A^*,J]} s_{[J]} = 0.   (7.3)

Then for any such vector,

s_{[J]}^T (∇_{xx} Ψ^{(k)})_{[J,J]} s_{[J]} ≥ 2ε s_{[J]}^T s_{[J]}   (7.4)

for some ε > 0, under AS9. We note that the diagonal entries D^{(k)}_{i,i}, i ∈ I^*, converge to zero. Hence, for k sufficiently large,

s_{[J]}^T (A^{(k)}_{[I^*,J]})^T D^{(k)}_{[I^*,I^*]} A^{(k)}_{[I^*,J]} s_{[J]} ≤ ε s_{[J]}^T s_{[J]},   (7.5)

and thus, combining (7.1)-(7.5),

s_{[J]}^T (H_ℓ(x^{(k)}, λ̄^{(k)}))_{[J,J]} s_{[J]} ≥ ε s_{[J]}^T s_{[J]}.   (7.6)

By continuity of H_ℓ as x^{(k)} and λ̄^{(k)} approach their limits, this gives that

s_{[J]}^T (H_ℓ(x^*, λ^*))_{[J,J]} s_{[J]} ≥ ε s_{[J]}^T s_{[J]}   (7.7)

for all non-zero s_{[J]} satisfying

A(x^*)_{[A^*,J]} s_{[J]} = 0,   (7.8)

which, given AS7, implies that x^* is an isolated local solution to (1.15)-(1.17) (see, for example, Avriel, 1976, Theorem 3.11).

We would be able to relax the reliance on AS7 in Theorem 7.1 if it were clear that the elements D^{(k)}_{i,i}, i ∈ A_2^*, converged to zero for some subsequence of K. However, it is not known if such a result holds in general.

The importance of AS9 is that one might tighten the inner iteration termination test (Step 2 of the algorithm) so that, in addition to (3.5), (∇_{xx} Ψ^{(k)})_{[J,J]} is required to be uniformly positive definite, for all floating variables J and all k sufficiently large. If the strict complementary slackness condition AS8 holds at x^*, Theorem 5.4 ensures that the set J_2 is empty and J_1 is identical to the set of floating variables after a finite number of iterations, and thus, under this tighter termination test, AS9 and Theorem 7.1 hold.

There is a weaker version of this result, proved in the same way: if the assumption of uniform positive-definiteness in AS9 is replaced by an assumption of positive semi-definiteness, the limit point then satisfies second-order necessary conditions (Avriel, 1976, Theorem 3.10) for a minimizer. This weaker version of AS9 is easier to ensure in practice, as certain methods for solving the inner iteration subproblem, for instance that of Conn et al. (1988a), guarantee that the second derivative matrix at the limit point of a sequence of generated inner iterates will be positive semi-definite.
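To illustrate the structure (7.1)-(7.2), here is a small sketch (ours, under our own array-layout assumptions) assembling the barrier Hessian from the Lagrangian Hessian and a diagonally weighted Jacobian product:

```python
import numpy as np

def barrier_hessian(H_lag, A, c, lam, s):
    """Assemble the Hessian of the Lagrangian barrier function as in (7.1):
    H_lag is the Hessian of the Lagrangian at (x, lam_bar), A is the
    m-by-n constraint Jacobian, and D is the diagonal matrix with
    entries (7.2): lam_i * s_i / (c_i + s_i)^2."""
    D = lam * s / (c + s)**2
    # A.T @ diag(D) @ A, written without forming diag(D) explicitly;
    # splitting the rows into active and inactive blocks recovers the
    # two Jacobian terms of (7.1).
    return H_lag + A.T @ (D[:, None] * A)
```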

8 Feasible starting points

We now return to the issue raised in Section 3.2, namely, how to find a point for which

c(x) + s^{(k+1)} > 0 and x ∈ B   (8.1)

from which to start the (k+1)-st inner iteration of Algorithm 1. We saw in Lemma 3.1 that this is trivial whenever (3.8) holds, as the current estimate of the solution x^{(k)} satisfies (3.11). Furthermore, under the assumptions of Theorem 5.3, we know that (3.8) will hold for all sufficiently large k. The main difficulty we face is that, when (3.8) fails to hold, the updates (3.10) do not guarantee that (3.11) holds, and thus we may need a different starting point for the (k+1)-st inner iteration.

There is, of course, one case where satisfying (8.1) is trivial. In certain circumstances, we may know of a feasible point, that is, a point x_feas which satisfies (1.16) and (1.17). This may be because we have a priori knowledge of our problem, or because we encounter such a point as the algorithm progresses. Any feasible point automatically satisfies (8.1), as s^{(k+1)} > 0. One could start the (k+1)-st inner iteration from x_feas whenever (3.11) is violated. There is, however, a disadvantage to this approach in that a "poor" feasible point may result in considerable expense when solving the inner-iteration subproblem. Ideally, one would like a feasible point "close" to x^{(k)} or x^*, as there is then some likelihood that solving the inner iteration will be inexpensive. It may, of course, be possible to find a "good" interpolatory point between x^{(k)} and x_feas satisfying (8.1). This would indeed be possible if the general constraints were linear.

We consider the following alternative. Suppose that the k-th iteration of Algorithm 3.1 involves the execution of Step 4. Let Λ^{(k+1)} be any diagonal matrix of order m whose diagonal entries satisfy

s_i^{(k+1)} / s_i^{(k)} ≤ Λ^{(k+1)}_{i,i} ≤ 1.   (8.2)

Note that Λ^{(k+1)} is well defined, as (3.1) and (3.10) ensure s_i^{(k+1)} ≤ s_i^{(k)} for all i. Consider the auxiliary problem

minimize_{x ∈ [...]
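As a trivial illustration (our sketch, not from the paper), a helper that tests whether a candidate point satisfies condition (8.1), assuming the simple bounds l ≤ x ≤ u define B:

```python
import numpy as np

def satisfies_81(x, c, s_next, l, u):
    """Condition (8.1): c(x) + s^{(k+1)} > 0 componentwise and x in B.
    `c` is the vector of general constraint values at x, and `s_next`
    holds the updated shifts s^{(k+1)}."""
    return bool(np.all(c + s_next > 0)
                and np.all(x >= l) and np.all(x <= u))
```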
