A robust algorithm for optimization with general equality and inequality constraints

Xin-wei Liu and Ya-xiang Yuan

State Key Laboratory of Scientific and Engineering Computing, Institute of Computational Mathematics and Scientific/Engineering Computing, Chinese Academy of Sciences, P.O. Box 2719, Beijing 100080, China

Abstract

An algorithm for general nonlinearly constrained optimization is presented, which solves an unconstrained piecewise quadratic subproblem and a quadratic programming subproblem at each iterate. The algorithm is robust since it circumvents the difficulties associated with the possible inconsistency of the QP subproblem of the original SQP method. Moreover, the algorithm can converge to a point which satisfies a certain first-order necessary optimality condition even when the original problem is itself infeasible, a feature shared with the methods of Burke and Han (1989). Unlike Burke and Han's methods, however, we do not introduce additional bound constraints. The algorithm solves the same subproblems as the Han-Powell SQP algorithm at feasible points of the original problem. Under certain assumptions, it is shown that the algorithm coincides with the Han-Powell method when the iterates are sufficiently close to the solution. Some global convergence results are proved and local superlinear convergence results are also obtained. Preliminary numerical results are reported.

keywords: SQP algorithm, constrained optimization, convergence.

Research partially supported by Chinese NSF grants 19525101 and 19731001, and by a National 9-5 key project.

1 Introduction

We consider the optimization problem with general equality and inequality constraints

$\min\ f(x)$  (1.1)
$\mathrm{s.t.}\ \ c_i(x) = 0, \quad i \in E,$  (1.2)
$\qquad\ \ \, c_i(x) \ge 0, \quad i \in I,$  (1.3)

where $f(x): R^n \to R$ and $c_i(x): R^n \to R$ $(i \in E \cup I)$ are continuously differentiable functions, $E = \{1, 2, \dots, m_e\}$, $I = \{m_e + 1, \dots, m\}$, $m_e$ and $m$ are two positive integers and $m \ge m_e$.
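For readers who want to experiment, the sketches later in the paper assume a simple convention for the data of (1.1)-(1.3): the first $m_e$ components of $c(x)$ are the equality constraints and the remaining components are the inequalities $c_i(x) \ge 0$. The container below is purely my own illustration; the paper itself prescribes no data structure.

```python
# A minimal container (my own convention, not from the paper) for the data of
# problem (1.1)-(1.3). The first `me` components of c(x) are equalities,
# the remaining ones are inequalities c_i(x) >= 0.
from dataclasses import dataclass
from typing import Callable
import numpy as np

@dataclass
class NLP:
    f: Callable[[np.ndarray], float]           # objective f(x)
    grad_f: Callable[[np.ndarray], np.ndarray] # gradient of f
    c: Callable[[np.ndarray], np.ndarray]      # all constraints, stacked
    jac_c: Callable[[np.ndarray], np.ndarray]  # rows are nabla c_i(x)^T
    me: int                                    # number of equality constraints
```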

SQP algorithms for constrained optimization are iterative and generate a new approximation to the solution by the procedure

$x^{+} = x + s d,$  (1.4)

where $x$ is the current point, $d$ is a search direction which minimizes a quadratic model subject to linearized constraints, and $s$ is the steplength along that direction; see, for example, [8, 14, 23]. For $k \ge 1$, the original SQP method developed by Wilson, Han and Powell solves the following QP subproblem

$\min\ g_k^T d + \frac{1}{2} d^T B_k d$  (1.5)
$\mathrm{s.t.}\ \ c_i(x_k) + \nabla c_i(x_k)^T d = 0, \quad i \in E,$  (1.6)
$\qquad\ \ \, c_i(x_k) + \nabla c_i(x_k)^T d \ge 0, \quad i \in I,$  (1.7)

at the $k$th iterate, where $g_k = \nabla f(x_k)$ is the gradient of the objective function and $B_k$ is an estimate of the Hessian of the Lagrangian function

$L(x, \lambda) = f(x) - \sum_{i=1}^{m} \lambda_i c_i(x),$  (1.8)

and $\lambda = (\lambda_1, \lambda_2, \dots, \lambda_m)^T$ is an approximation to the multiplier vector. Because of its nice convergence properties (see, for example, Han (1977), Powell (1977, 1978), Boggs et al. (1982)), the SQP method has attracted the attention of many researchers. It has also been extended to problems other than optimization (Pang and Gabriel (1993), Taji and Fukushima (1996)). The requisite consistency of the linearized constraints of the QP subproblem (1.5)-(1.7) is a serious limitation of the SQP method.
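As a rough illustration of one Wilson-Han-Powell step, the following sketch hands the QP subproblem (1.5)-(1.7) to SciPy's SLSQP solver. It is not the authors' FORTRAN implementation, all names are mine, and, as discussed below, such a step can fail whenever the linearized constraints are inconsistent.

```python
# Hedged sketch: solve the SQP subproblem (1.5)-(1.7) at one iterate.
import numpy as np
from scipy.optimize import minimize

def sqp_direction(g, B, c, A, me):
    """g: grad f(x_k), B: Hessian estimate, c: c(x_k),
    A: Jacobian with rows nabla c_i(x_k)^T, me: number of equalities."""
    n = len(g)
    cons = []
    if me > 0:
        cons.append({'type': 'eq',   'fun': lambda d: c[:me] + A[:me] @ d})
    if len(c) > me:
        cons.append({'type': 'ineq', 'fun': lambda d: c[me:] + A[me:] @ d})
    res = minimize(lambda d: g @ d + 0.5 * d @ B @ d, np.zeros(n),
                   jac=lambda d: g + B @ d, constraints=cons, method='SLSQP')
    # res.success may be False if the linearized constraints are inconsistent.
    return res.x
```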

Within the framework of the SQP method, Powell suggested solving a modified subproblem at each iterate (Powell (1977), Stoer (1985)):

$\min\ g_k^T d + \frac{1}{2} d^T B_k d + \frac{1}{2} \sigma_k (1 - \theta)^2$  (1.9)
$\mathrm{s.t.}\ \ \theta c_i(x_k) + \nabla c_i(x_k)^T d = 0, \quad i \in E,$  (1.10)
$\qquad\ \ \, \theta_i c_i(x_k) + \nabla c_i(x_k)^T d \ge 0, \quad i \in I,$  (1.11)

where $\theta_i = 1$ if $c_i(x_k) > 0$ and $\theta_i = \theta$ if $c_i(x_k) \le 0$, $0 \le \theta \le 1$, and $\sigma_k > 0$ is a penalty parameter. Together with some other techniques, the computational investigations of Schittkowski (1981, 1983) show that this modification works very well. A simple example presented by Burke and Han (1989) and Burke (1992), however, indicates that this approach may not always succeed. Assume there are two constraints on $R$:

$c_1(x) = 1 - e^{x} = 0,$  (1.12)
$c_2(x) = x = 0,$  (1.13)

with any objective function $f(x)$ on $R$. For any infeasible point $x \ne 0$, the linearized constraints are inconsistent, and the only solution of the modified constraints (1.10)-(1.11) is $\theta = 0$ and $d = 0$. Though this example is too special to support a general claim, it shows that the problems caused by the inconsistency of the linearized constraints cannot always be resolved by using (1.10)-(1.11).

Based on a trust region strategy, Fletcher (1981, 1982) developed the S$l_1$QP method for (1.1)-(1.3). Fletcher's approach solves the following QP subproblem at the $k$th iteration:

$\min\ g_k^T d + \frac{1}{2} d^T B_k d + \sigma_k \|(c(x_k) + \nabla c(x_k)^T d)_{-}\|_1$  (1.14)
$\mathrm{s.t.}\ \ \|d\|_{\infty} \le \Delta_k,$  (1.15)

where $c(x_k) = (c_1(x_k), \dots, c_m(x_k))^T$ and $(c(x_k) + \nabla c(x_k)^T d)_{-} \in R^m$ with

$(c_i(x_k) + \nabla c_i(x_k)^T d)_{-} = c_i(x_k) + \nabla c_i(x_k)^T d, \quad i \in E,$  (1.16)
$(c_i(x_k) + \nabla c_i(x_k)^T d)_{-} = \min(0,\ c_i(x_k) + \nabla c_i(x_k)^T d), \quad i \in I,$  (1.17)

$\sigma_k$ is a penalty parameter and $\Delta_k$ is a positive constant. It has been shown that, under certain assumptions, the search direction generated by (1.14)-(1.15) is locally identical to that generated by (1.5)-(1.7).

Burke and Han (1989) show that Fletcher's approach is still incomplete; one reason is that the search direction may point away from the optimal point. Similar to the method of Sahba (1987), Burke and Han (1989) and Burke (1989) present an approach to overcome the difficulties associated with the inconsistency of the QP subproblem (1.5)-(1.7). Their methods are also similar to the methods of Powell (1977) and Fletcher (1981, 1982). A feature different from the other methods is that, even when (1.1)-(1.3) is itself infeasible, their methods can converge to a point which meets a certain first-order necessary optimality condition. However, Burke and Han's method is conceptual.

In this paper, we describe an implementable algorithm which is a modification of the SQP method. With some additional techniques, it is a generalization of the algorithm presented by Liu and Yuan (1997); the motivation of the algorithm is described in that report. The algorithm can circumvent the difficulties associated with the infeasibility of the QP subproblem. Our method is similar to the methods mentioned above. Unlike Burke and Han's method, however, we do not introduce additional bound constraints. By using certain information at the current point and solving two subproblems, we obtain a direction which can be a nonzero descent direction of the merit function even if (1.5)-(1.7) is infeasible. The algorithm solves the same subproblem as (1.5)-(1.7) at a feasible point of (1.1)-(1.3). Moreover, under certain local assumptions, the algorithm and the Han-Powell method generate identical iterates. Some global convergence results are proved and local superlinear convergence is derived. Our algorithm can easily be combined with the trust region approach; thus, it can be extended to a trust region algorithm for optimization with general constraints.

The paper is organized as follows. We present our algorithm in Section 2. The stationary properties of the algorithm are given in Section 3. In Section 4 some global convergence results are proved. We discuss the local properties of the algorithm in Section 5. In Section 6, some preliminary numerical results are reported.

2 The algorithm

Define the penalty function associated with (1.1)-(1.3):

$\phi(x; r) = f(x) + r \|c(x)_{-}\|,$  (2.1)

where $\|\cdot\|$ is any given convex norm on $R^m$, $r > 0$ is a penalty parameter and $c(x)_{-} \in R^m$ with

$c_i(x)_{-} = c_i(x), \quad i \in E,$  (2.2)
$c_i(x)_{-} = \min(0,\ c_i(x)), \quad i \in I.$  (2.3)

It is straightforward to see that $\|c(x)_{-}\| = 0$ if and only if $x$ is a feasible point of (1.1)-(1.3). If the norm $\|\cdot\|$ is the $l_1$ norm, (2.1) is the $l_1$ exact penalty function, which is also the merit function employed by Han (1977) and Powell (1977, 1978). Throughout this paper, if the norm is not specified, it is the same as that used in (2.1).
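For intuition, here is a small helper computing $c(x)_{-}$ from (2.2)-(2.3) and the merit function (2.1) with the $l_1$ norm; the function and variable names are mine, not the paper's.

```python
# Sketch of c(x)_- in (2.2)-(2.3) and the l1 merit function (2.1).
import numpy as np

def c_minus(c, me):
    """c: constraint values c(x); the first me entries are equalities."""
    cm = c.copy()
    cm[me:] = np.minimum(0.0, cm[me:])   # inequalities: only violations count
    return cm

def merit(f_val, c, me, r):
    """phi(x; r) = f(x) + r * ||c(x)_-||_1."""
    return f_val + r * np.linalg.norm(c_minus(c, me), 1)
```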

Define the index sets

$I_k = \{ i \in I : c_i(x_k) \le 0 \},$  (2.4)
$\bar I_k = \{ i \in I : c_i(x_k) > 0 \},$  (2.5)
$J_k = I_k \cup E.$  (2.6)

These index sets are related to the current iterate $x_k$ and can be identified easily. In a practical implementation of the algorithm, a small positive tolerance $\delta$ is introduced and the index sets

$I_k(\delta) = \{ i \in I : c_i(x_k) \le \delta \},$  (2.7)
$\bar I_k(\delta) = \{ i \in I : c_i(x_k) > \delta \},$  (2.8)
$J_k(\delta) = I_k(\delta) \cup E$  (2.9)

are employed instead of (2.4)-(2.6). Under some assumptions, we will show that $J_k$ tends to the index set of the active constraints of (1.1)-(1.3).
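A sketch of how the index sets (2.4)-(2.6), and their tolerance versions (2.7)-(2.9), might be computed; the zero-based indexing convention and all names are mine.

```python
# Sketch of the index sets (2.4)-(2.6); pass delta > 0 for (2.7)-(2.9).
import numpy as np

def index_sets(c, me, delta=0.0):
    """Return (I_k, Ibar_k, J_k) for inequality indices me..m-1 (0-based)."""
    ineq = np.arange(me, len(c))
    I_k    = ineq[c[ineq] <= delta]                   # violated / nearly active
    Ibar_k = ineq[c[ineq] >  delta]                   # strictly satisfied
    J_k    = np.concatenate([np.arange(me), I_k])     # equalities plus I_k
    return I_k, Ibar_k, J_k
```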

Our algorithm solves two subproblems at each iterate: one is an unconstrained piecewise quadratic subproblem (see [13, 21]) and the other is a quadratic programming subproblem. At the $k$th iteration the unconstrained subproblem has the form

$\min_{d \in R^n}\ \psi_k(d) = \frac{1}{2} d^T B_k d + r_k \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d)_{-}\|,$  (2.10)

where $B_k$, positive definite, is an estimate of the Lagrangian Hessian of (1.1)-(1.3), $c_{J_k}(x_k) \in R^{|J_k|}$ is the vector whose components are $c_i(x_k)$ $(i \in J_k)$, $|J_k|$ is the cardinality of the index set $J_k$, and $r_k$ is the penalty parameter.

Let $d_{k1}$ be the solution of (2.10). If $x_k$ is feasible, we have $d_{k1} = 0$. If $d_{k1} \ne 0$, then $d_{k1}$ is a descent direction of $\phi(x_k; r_{k+1})$ for sufficiently large $r_{k+1}$. Moreover, there is an $\alpha_k \in (0, 1]$ such that $c_i(x_k) + \alpha \nabla c_i(x_k)^T d_{k1} \ge 0$ for all $\alpha \in [0, \alpha_k]$ and $i \in \bar I_k$. In fact, we can let $\alpha_k = \min\{1, \hat\alpha_k\}$, where

$\hat\alpha_k = \min\{ -c_i(x_k) / (\nabla c_i(x_k)^T d_{k1}) : i \in \bar I_k \ \text{and}\ \nabla c_i(x_k)^T d_{k1} < 0 \}.$  (2.11)

Let $\hat c_i(x_k) = c_i(x_k) + \alpha_k \nabla c_i(x_k)^T d_{k1}$ for $i \in \bar I_k$. We generate $d_{k2}$ by solving the following QP subproblem

$\min\ g_k^T d + \frac{1}{2} d^T B_k d$  (2.12)
$\mathrm{s.t.}\ \ \nabla c_i(x_k)^T d = 0, \quad i \in E,$  (2.13)
$\qquad\ \ \, \nabla c_i(x_k)^T d \ge 0, \quad i \in I_k,$  (2.14)
$\qquad\ \ \, \hat c_i(x_k) + \nabla c_i(x_k)^T d \ge 0, \quad i \in \bar I_k,$  (2.15)

and let $d_k = \alpha_k d_{k1} + d_{k2}$ be the search direction. It will be shown that $d_k$ is a descent direction of the penalty function provided the penalty parameter is updated appropriately, and the updating is automatic. Therefore, (2.1) can be employed as a merit function to force the global convergence of the algorithm.

The updating of the penalty parameter is important for the SQP approach. In order to obtain global convergence, Han (1977) and Powell (1977) require that

$r \ge \|\lambda_k\|_{\infty}$  (2.16)

for all $k \ge 1$, where $\lambda_k$ is an estimate of the Lagrange multiplier vector at $x_k$. However, (2.16) is generally replaced by some updating procedure in practical implementations of an SQP algorithm, because no information about the multiplier vector of (1.8) is available in advance. Similar to Powell (1977) and Burke and Han (1989), a penalty parameter updating procedure is employed in our algorithm. Since $d_{k2}$ is not related to the constraint violation, the purpose of updating the penalty parameter is to force $d_{k1}$ to be a descent direction of (2.1). Thus, at the $k$th iteration we leave $r_k$ unchanged if $d_{k1}$ is a descent direction; otherwise, $r_k$ is increased in the following way:

$r_{k+1} = \max\left\{ 2 r_k + \rho,\ \ \frac{ g_k^T d_{k1} + d_{k1}^T B_k d_{k1} }{ \|(c_{J_k}(x_k))_{-}\| - \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| } \right\},$  (2.17)

where $\rho$ is a positive number.
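The sketch below shows one possible implementation of the steplength $\alpha_k$ from (2.11) and of the penalty test (2.18) with the update (2.17). It assumes exact subproblem solutions, so the denominator in (2.17) is positive whenever the test fails; all names are mine.

```python
# Hedged sketch of alpha_k from (2.11) and the penalty update (2.17)/(2.18).
import numpy as np

def alpha_k(c, A, d1, Ibar_k):
    """Largest alpha in (0,1] keeping the satisfied constraints nonnegative."""
    alpha = 1.0
    for i in Ibar_k:
        slope = A[i] @ d1
        if slope < 0.0:                       # constraint decreases along d1
            alpha = min(alpha, -c[i] / slope)
    return alpha

def update_penalty(r, g, B, d1, viol_old, viol_new, rho=1.0):
    """viol_old = ||(c_Jk)_-||, viol_new = ||(c_Jk + A_Jk^T d1)_-||."""
    if g @ d1 + 0.5 * d1 @ B @ d1 + r * (viol_new - viol_old) <= 0.0:
        return r                              # test (2.18) holds: keep r
    # When (2.18) fails and d1 solves (2.10) exactly, viol_old > viol_new.
    return max(2.0 * r + rho, (g @ d1 + d1 @ B @ d1) / (viol_old - viol_new))
```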

Now we can state our algorithm as follows.

Algorithm 2.1 (A Robust Algorithm for Optimization)

Step 0. Given an initial approximation $x_0$, an $n \times n$ symmetric positive definite matrix $B_0$, an initial penalty parameter $r_0 > 0$ and positive scalars $\rho$, $\beta$ and $\sigma$, where $\beta < 1$ and $\sigma < \frac{1}{2}$; set $k = 0$.

Step 1. If the stopping criterion is satisfied, stop. Solve subproblem (2.10) to generate $d_{k1}$ and subproblem (2.12)-(2.15) to generate $d_{k2}$.

Step 2. Update the penalty parameter. If

$g_k^T d_{k1} + \frac{1}{2} d_{k1}^T B_k d_{k1} + r_k \left( \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| - \|(c(x_k))_{-}\| \right) \le 0,$  (2.18)

let $r_{k+1} = r_k$; otherwise, update $r_k$ by (2.17).

Step 3. Set $d_k = \alpha_k d_{k1} + d_{k2}$. Select the smallest nonnegative integer $s$ such that

$\phi(x_k + \beta^s d_k; r_{k+1}) - \phi(x_k; r_{k+1}) \le \sigma \beta^s \left( g_k^T d_k + r_{k+1} ( \|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| - \|(c(x_k))_{-}\| ) \right).$  (2.19)

Let $t_k = \beta^s$ and $x_{k+1} = x_k + t_k d_k$.

Step 4. Generate $B_{k+1}$. Set $k = k + 1$ and go to Step 1.
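A minimal backtracking sketch of the Armijo-type test (2.19) in Step 3. Here `phi` is the merit function (2.1) at the current penalty parameter and `pred` is the (negative) predicted reduction $g_k^T d_k + r_{k+1}(\|(c(x_k)+\nabla c(x_k)^T d_k)_{-}\| - \|c(x_k)_{-}\|)$, both supplied by the caller; the cap on the number of backtracks is my own safeguard.

```python
# Hedged sketch of the backtracking line search in Step 3, test (2.19).
def line_search(phi, x, d, pred, beta=0.5, sigma=0.1, max_backtracks=30):
    phi0 = phi(x)
    t = 1.0
    for _ in range(max_backtracks):
        if phi(x + t * d) - phi0 <= sigma * t * pred:   # test (2.19)
            return t
        t *= beta                                       # t = beta**s
    return t                                            # safeguarded fallback
```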

The stopping criterion is not specified in the algorithm. Generally, $\|d_k\|_2 = 0$ can be used as a stopping criterion. Since no regularity assumption on the constraints is made, it is possible that $d_k$ does not tend to zero as $k \to \infty$. Thus, we use the condition $\|x_{k+1} - x_k\|_2 = 0$ as the stopping criterion. In a practical implementation, a positive tolerance is introduced.

Algorithm 2.1 is similar to the methods proposed by Burke and Han (1989) and Burke (1989). Since no additional bound constraints are employed, the algorithm can be implemented in the same way as SQP algorithms. It should be noted that our algorithm solves the same subproblem as (1.5)-(1.7) at a feasible point of (1.1)-(1.3). Two examples presented by Burke and Han (1989) can help us to understand the above algorithm and the differences between our algorithm and Burke and Han's methods.

Example 2.2. The constraint function $c : R \to R^2$ has the form

$c(x) = \begin{pmatrix} 1 - e^{x} \\ x \end{pmatrix}$  (2.20)

and $m_e = m = 2$. The norm is the $l_1$ norm. Thus, (2.10) has the form

$\min_{d \in R}\ \frac{1}{2} B_k d^2 + r_k \left( |1 - e^{x} - e^{x} d| + |x + d| \right).$  (2.21)

For any $x_k = x \ne 0$, by direct calculation, $d_{k2} = 0$ and

if $x > 0$: $\ d_{k1} = e^{-x} - 1$ or $-\frac{r_k}{B_k}(e^{x} + 1)$;  (2.22)
if $x < 0$: $\ d_{k1} = \frac{r_k}{B_k}(e^{x} + 1)$ or $-x$;  (2.23)
if $x = 0$: $\ d_{k1} = 0$.  (2.24)

It is easily seen that $d_{k1}$ has the following properties:

$d_{k1} > 0$ for $x < 0$;  (2.25)
$d_{k1} < 0$ for $x > 0$;  (2.26)
$d_{k1} = 0$ for $x = 0$.  (2.27)

By (2.25)-(2.27), our algorithm will converge to the solution $x^{*} = 0$ from any starting point.
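A quick numerical illustration (not a proof) of Example 2.2: minimize the one-dimensional piecewise quadratic (2.21) for a few values of $x$ and check the sign pattern (2.25)-(2.27); the choice $B_k = r_k = 1$ is arbitrary and mine.

```python
# Numerical check of the sign pattern (2.25)-(2.27) in Example 2.2.
import numpy as np
from scipy.optimize import minimize_scalar

def d_k1(x, B=1.0, r=1.0):
    psi = lambda d: 0.5 * B * d**2 + r * (abs(1 - np.exp(x) - np.exp(x) * d)
                                          + abs(x + d))
    return minimize_scalar(psi, bounds=(-10, 10), method='bounded').x

for x in (-1.0, -0.1, 0.1, 1.0):
    print(x, d_k1(x))   # expect d_k1 > 0 for x < 0 and d_k1 < 0 for x > 0
```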

Example 2.3. The constraint function $c : R \to R^2$ is given by

$c(x) = \begin{pmatrix} -x^2 - 1 \\ -x \end{pmatrix}$  (2.28)

and $m_e = 0$, $m = 2$. Any problem with $c(x)$ as its constraint is infeasible, and there is always a constraint violation since $c_1(x) = -x^2 - 1 < 0$. Let the norm be the $l_1$ norm. By direct calculation, we have that

$d_{k1} = \min\left( -\frac{2 r_k}{B_k} x,\ -\frac{x^2 + 1}{2x} \right) > 0$ and $d_{k2} = 0$ if $x < 0$;  (2.29)

$d_{k1} = -x,\ -\frac{x^2 + 1}{2x},\ -\frac{r_k}{B_k}(2x + 1)$ or $-\frac{2 r_k}{B_k} x$, with $d_{k1} < 0$ and $d_{k2} \le 0$, if $0 < x < 1$;  (2.30)

$d_{k1} = -x,\ -\frac{x^2 + 1}{2x},\ -\frac{r_k}{B_k}$ or $-\frac{r_k}{B_k}(2x + 1)$, with $d_{k1} < 0$ and $d_{k2} \le 0$, if $x > 1$;  (2.31)

$d_{k1} = \max\left( -\frac{3 r_k}{B_k},\ -1 \right)$ and $d_{k2} \le 0$ if $x = 1$;  (2.32)

$d_{k1} = 0$ and $d_{k2} \le 0$ if $x = 0$.  (2.33)

Thus, the search direction generated by our algorithm always points towards the origin, whose image under $c$ is the point closest to $R^2_{+}$ in the $l_1$ norm. Algorithm 2.1 can also solve problem (8.1) of Burke and Han (1989) successfully, since $d_{k2} = 0$ and $d_{k1}$ points towards the optimal solution at any iterate $x \ne 0$.

3 Stationary properties of the algorithm

Examples 2.2 and 2.3 display some properties of Algorithm 2.1. These properties are favorable in practice because much information, such as the consistency of (1.1)-(1.3), is not known beforehand. Since no restrictions are imposed on the constraint functions, a cluster point of the sequence generated by our algorithm can be one of three different types of points. Similar to Yuan (1995), we give their definitions and stationary properties.

Definition 3.1. $x^{*} \in R^n$ is called
(1) a strong stationary point of (1.1)-(1.3) if $x^{*}$ is feasible and there exists a vector $\lambda^{*} = (\lambda_1^{*}, \lambda_2^{*}, \dots, \lambda_m^{*})^T \in R^m$ such that

$g(x^{*}) - \sum_{i=1}^{m} \lambda_i^{*} \nabla c_i(x^{*}) = 0,$  (3.1)
$\lambda_i^{*} \ge 0,\ \ \lambda_i^{*} c_i(x^{*}) = 0, \quad i \in I;$  (3.2)

(2) an infeasible stationary point of (1.1)-(1.3) if $x^{*}$ is infeasible and

$\min_{d \in R^n} \|(c(x^{*}) + \nabla c(x^{*})^T d)_{-}\| = \|(c(x^{*}))_{-}\|;$  (3.3)

(3) a singular stationary point of (1.1)-(1.3) if $x^{*}$ is feasible and there exists an infeasible sequence $\{v_k\}$ converging to $x^{*}$ such that

$\lim_{k \to \infty} \frac{ \min_{d \in R^n} \|(c(v_k) + \nabla c(v_k)^T d)_{-}\| }{ \|(c(v_k))_{-}\| } = 1.$  (3.4)
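The strong stationarity conditions (3.1)-(3.2) are what the quantities R-KT and R-CV reported in Section 6 measure. For intuition, here is a small helper of my own (not the paper's code) for computing such residuals at a candidate pair $(x, \lambda)$:

```python
# Residuals of the strong stationarity (Kuhn-Tucker) conditions (3.1)-(3.2).
import numpy as np

def kkt_residuals(g, A, c, lam, me):
    """g: grad f(x), A: Jacobian of c, c: c(x), lam: multiplier estimate."""
    stat = np.linalg.norm(g - A.T @ lam)                  # stationarity (3.1)
    comp = np.linalg.norm(lam[me:] * c[me:])              # complementarity (3.2)
    sign = np.linalg.norm(np.minimum(lam[me:], 0.0))      # lambda_i >= 0, i in I
    feas = np.linalg.norm(np.concatenate([c[:me], np.minimum(c[me:], 0.0)]))
    return stat, comp, sign, feas
```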

Definition 3.1 is closely related to our algorithm. It should be noted that there are some differences between our definitions and those of Yuan (1995), for example, the definition of a singular stationary point. A strong stationary point defined above is precisely a K-T point of (1.1)-(1.3). If $\|(c(x_k))_{-}\| = 0$ and $d_{k2} = 0$, then by the first-order K-T conditions of (2.12)-(2.15), $x_k$ is a strong stationary point of (1.1)-(1.3).

Throughout this report, we make the following assumption:

Assumption 3.2. (1) $f(x)$ and $c_i(x)$, $i \in E \cup I$, are twice continuously differentiable functions;
(2) The approximation $B_k$ of the Lagrangian Hessian is positive definite, and there exist two positive constants $M_1$ and $M_2$ such that

$M_1 \|d\|_2^2 \le d^T B_k d \le M_2 \|d\|_2^2$  (3.5)

holds for all $d \in R^n$ and all $k \ge 1$.

Lemma 3.3. The following statements hold:
(i) If (3.3) holds at $x_k$, then $d = 0$ is the unique solution of (2.10);
(ii) If $\{x_k\}$ and $\{r_k\}$ are bounded, then $\{d_{k1}\}$ is also bounded.

Proof. (i) For any $d \ne 0$ and $t > 0$ sufficiently small,

$\psi_k(td) = \frac{1}{2} t^2 d^T B_k d + r_k \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T (td))_{-}\|$
$\qquad\ \ = \frac{1}{2} t^2 d^T B_k d + r_k \|(c(x_k) + \nabla c(x_k)^T (td))_{-}\|$  (3.6)
$\qquad\ \ \ge \frac{1}{2} t^2 d^T B_k d + r_k \|(c(x_k))_{-}\| > \psi_k(0),$

where the inequality uses (3.3). Because $\psi_k(d)$ is convex, we can see that $d = 0$ is the unique solution of (2.10).

(ii) The definition of $d_{k1}$ shows that

$\psi_k(0) \ge \psi_k(d_{k1}) \ge \frac{1}{2} M_1 \|d_{k1}\|_2^2 + r_k \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| \ge \frac{1}{2} M_1 \|d_{k1}\|_2^2.$  (3.7)

Therefore,

$\|d_{k1}\|_2^2 \le (2 / M_1)\, \psi_k(0) = (2 / M_1)\, r_k \|(c(x_k))_{-}\|.$  (3.8)

Lemma 3.4. If $x \in R^n$ is an infeasible stationary point or a singular stationary point as defined above, then there exist $\lambda_0 \ge 0$ and $\lambda \in R^m$ such that the first-order necessary optimality condition

$\lambda_0 g(x) - \sum_{i=1}^{m} \lambda_i \nabla c_i(x) = 0,$  (3.9)
$\lambda_i \ge 0, \quad i \in I,$  (3.10)

holds.

Proof. Suppose that $d(x)$ minimizes the unconstrained problem

$\min_{d \in R^n}\ \frac{1}{2} d^T B d + \|(c(x) + \nabla c(x)^T d)_{-}\|$  (3.11)

at the point $x$, where $B$ is any positive definite matrix. Then the first-order optimality conditions at $x$ give

$B d + \nabla c(x) \mu(x) = 0,$  (3.12)
$\mu(x) \in \partial \|u\| \big|_{u = (c(x) + \nabla c(x)^T d)_{-}},$  (3.13)

where $\mu(x) \in R^m$. It follows directly from (3.13) that $(\mu(x))_i \le 0$ for $i \in I$.

If $x$ is an infeasible stationary point, then, similar to the proof of Lemma 3.3, we have $d(x) = 0$. Letting $\lambda_0 = 0$ and $\lambda_i = -(\mu(x))_i$ gives (3.9)-(3.10).

Now suppose that $x$ is a singular stationary point, $\{x_k : k \in K\}$ is an infeasible sequence and $x_k \to x$ as $k \to \infty$ $(k \in K)$. Suppose that $d(x_k)$ is a solution of (3.11) at $x_k$; then (3.12)-(3.13) hold at $x_k$ and

$\min_{d \in R^n} \|(c(x_k) + \nabla c(x_k)^T d)_{-}\| - \|(c(x_k))_{-}\| \le -\frac{1}{2} d(x_k)^T B d(x_k) \le 0.$  (3.14)

Combining this with (3.4), we have

$\lim_{k \to \infty,\ k \in K} \frac{ d(x_k)^T B d(x_k) }{ \|c(x_k)_{-}\| } = 0.$  (3.15)

Thus, for $k \in K$,

$\lim_{k \to \infty} \|d(x_k)\| = 0.$  (3.16)

It follows from (3.16) and (3.12) that

$\lim_{k \to \infty,\ k \in K} \nabla c(x_k) \mu(x_k) = 0.$  (3.17)

Because $\|\mu(x_k)\|' \le 1$ for all $k$ (where $\|\cdot\|'$ is the dual norm of $\|\cdot\|$), there is a cluster point $\mu^{*} \in R^m$ with $(\mu^{*})_i \le 0$ for $i \in I$. We see that (3.9)-(3.10) hold if we let $\lambda_0 = 0$ and $\lambda_i = -(\mu^{*})_i$ for $i \in E \cup I$. This completes our proof.

4 Global convergence

First we show that if our algorithm stops after finitely many iterations, the last iterate must be a strong stationary point or an infeasible stationary point of (1.1)-(1.3).

Lemma 4.1. Suppose that $d_{k1}$ is a solution of (2.10) and $d_{k2}$ solves (2.12)-(2.15). If $d_{k1} = 0$ and $d_{k2} = 0$, then $x_k$ is either a strong stationary point or an infeasible stationary point of (1.1)-(1.3).

Proof. If $d_{k1} = 0$ and $d_{k2} = 0$, it follows from the first-order Kuhn-Tucker conditions of (2.12)-(2.15) that there exists $\lambda_k \in R^m$ such that

$g_k - \nabla c(x_k) \lambda_k = 0,$  (4.1)
$(\lambda_k)_i c_i(x_k) = 0 \ \ \text{for}\ i \in I_k,$  (4.2)
$(\lambda_k)_i \ge 0 \ \ \text{for}\ i \in I.$  (4.3)

If $\|(c(x_k))_{-}\| = 0$, then by (4.1)-(4.3) and Definition 3.1(1), $x_k$ is a strong stationary point of (1.1)-(1.3).

Suppose that $\|(c(x_k))_{-}\| \ne 0$. We want to prove that (3.3) holds at $x_k$. If this is not the case, then there exist $\tilde d_k \ne 0$ and $0 < \gamma_k \le 1$ such that

$\min_{d \in R^n} \|(c(x_k) + \nabla c(x_k)^T d)_{-}\| = \|(c(x_k) + \nabla c(x_k)^T \tilde d_k)_{-}\| < \|(c(x_k))_{-}\|$  (4.4)

and

$\|(c(x_k) + \nabla c(x_k)^T (\gamma_k \tilde d_k))_{-}\| = \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T (\gamma_k \tilde d_k))_{-}\|.$  (4.5)

Let $\hat d_k = \gamma_k \tilde d_k$. Since $d_{k1} = 0$ solves (2.10), it follows that

$r_k \left( \|(c_{J_k}(x_k))_{-}\| - \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T \hat d_k)_{-}\| \right) \le \frac{1}{2} \hat d_k^T B_k \hat d_k.$  (4.6)

Define

$t_0 = \frac{r_k}{2} \cdot \frac{ \|(c_{J_k}(x_k))_{-}\| - \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T \hat d_k)_{-}\| }{ \hat d_k^T B_k \hat d_k };$  (4.7)

then by (4.6), $0 < t_0 \le \frac{1}{4}$, and

$\psi_k(t_0 \hat d_k) - \psi_k(d_{k1}) \le \frac{1}{2} t_0^2 \hat d_k^T B_k \hat d_k + r_k t_0 \left\{ \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T \hat d_k)_{-}\| - \|(c_{J_k}(x_k))_{-}\| \right\}$
$\qquad\qquad\qquad\quad \le \frac{3}{4} t_0 \gamma_k r_k \left\{ \|(c(x_k) + \nabla c(x_k)^T \tilde d_k)_{-}\| - \|(c(x_k))_{-}\| \right\} < 0,$  (4.8)

which gives a contradiction.

The following result shows that the line search procedure in the algorithm is well defined.

Lemma 4.2. Suppose that at least one of $d_{k1}$ and $d_{k2}$ is nonzero and that $\alpha_k$ is defined by (2.11). Then $\alpha_k d_{k1} + d_{k2}$ is a descent direction of the penalty function (2.1) and the line search condition (2.19) is well defined.

Proof. Let $q(x) = \|c(x)_{-}\|$. By Lemma 4.1 of Burke and Han (1986),

$q'(x; d) \le \|(c(x) + \nabla c(x)^T d)_{-}\| - \|c(x)_{-}\|.$  (4.9)

Define $d_k = \alpha_k d_{k1} + d_{k2}$; then

$\phi'(x_k; r_{k+1}; d_k) \le g_k^T d_k + r_{k+1} \left( \|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| - \|c(x_k)_{-}\| \right).$  (4.10)

By (2.12)-(2.15) and the convexity of the norm,

$\|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| - \|c(x_k)_{-}\| \le \alpha_k \left( \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| - \|c(x_k)_{-}\| \right).$  (4.11)

Thus,

$\phi'(x_k; r_{k+1}; d_k) \le g_k^T d_{k2} + \alpha_k \left\{ g_k^T d_{k1} + r_{k+1} \left( \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| - \|c(x_k)_{-}\| \right) \right\}.$  (4.12)

It follows from Step 2 of the algorithm that

$\phi'(x_k; r_{k+1}; d_k) \le -\frac{1}{2} d_{k2}^T B_k d_{k2} - \frac{1}{2} \alpha_k d_{k1}^T B_k d_{k1} < 0.$  (4.13)

Now we prove that the line search condition (2.19) is well defined. By the mean value theorem, for any $t > 0$ there exists $\xi \in (0, t)$ such that

$f(x_k + t d_k) - f(x_k) = t\, g(x_k + \xi d_k)^T d_k.$  (4.14)

Similarly, there exist $\xi_i \in (0, t)$ such that

$c_i(x_k + t d_k) - c_i(x_k) = t\, \nabla c_i(x_k + \xi_i d_k)^T d_k.$  (4.15)

Define $A_k = (\nabla c_1(x_k + \xi_1 d_k), \nabla c_2(x_k + \xi_2 d_k), \dots, \nabla c_m(x_k + \xi_m d_k))$; then

$\phi(x_k + t d_k; r_{k+1}) - \phi(x_k; r_{k+1}) \le t\, g(x_k + \xi d_k)^T d_k + t\, r_{k+1} \left( \|(c(x_k) + A_k^T d_k)_{-}\| - \|c(x_k)_{-}\| \right).$  (4.16)

Since

$\|(c(x_k) + A_k^T d_k)_{-}\| - \|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| \le \|(A_k - \nabla c(x_k))^T d_k\|,$  (4.17)

it follows from the first part of the proof that there always exists a sufficiently small $t_0 > 0$ such that for all $t \in (0, t_0)$ and $\xi \in (0, t)$,

$(g(x_k + \xi d_k) - g_k)^T d_k + r_{k+1} \|(A_k - \nabla c(x_k))^T d_k\| + (1 - \sigma) \left( g_k^T d_k + r_{k+1} ( \|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| - \|c(x_k)_{-}\| ) \right) < 0,$  (4.18)

which completes the proof.

Assumption 4.3. $\{x_k\}$ and $\{d_k\}$ are uniformly bounded.

The assumption on $\{x_k\}$ is common in convergence analyses of such algorithms. Since the objective function of (2.12) is coercive and $d = 0$ is feasible for (2.13)-(2.15), $d_{k2}$ is bounded. If $r_k \to \infty$, then in place of (2.10) we use the following subproblem:

$\min\ \frac{1}{2} d^T B_k d + r_k \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d)_{-}\|$  (4.19)
$\mathrm{s.t.}\ \ \|d\|_2 \le R,$  (4.20)

where $R > 0$ is a constant, and all analyses still hold since the norm is convex. If $r_k \to \infty$, by Lemma 4.2 of Yuan (1995), $\lim_{k \to \infty} \|c(x_k)_{-}\|$ exists.

Lemma 4.4. If $r_k \to \infty$ and $\lim_{k \to \infty} \|c(x_k)_{-}\| \ne 0$, then there exists a convergent subsequence of $\{x_k\}$ which converges to an infeasible stationary point of (1.1)-(1.3).

Proof. Let $S$ be the set of accumulation points of $\{x_k\}$. If the lemma is not true, then for any $x \in S$, $\|c(x)_{-}\| \ne 0$ and (3.3) does not hold. Thus, there exists $v > 0$ such that for $k$ large enough,

$\min_{\|d\|_2 \le \tau} \|(c(x_k) + \nabla c(x_k)^T d)_{-}\| \le \|c(x_k)_{-}\| - v,$  (4.21)

where $\tau$ is a positive constant. Let $\hat d_k$ be a vector such that $\|\hat d_k\| \le \tau$ and

$\|(c(x_k) + \nabla c(x_k)^T \hat d_k)_{-}\| = \min_{\|d\|_2 \le \tau} \|(c(x_k) + \nabla c(x_k)^T d)_{-}\|.$  (4.22)

The facts that $\|\hat d_k\| \le \tau$,

$\|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T \hat d_k)_{-}\| \le \|(c(x_k) + \nabla c(x_k)^T \hat d_k)_{-}\|,$  (4.23)

$r_k \to \infty$, and that $d_{k1}$ solves (2.10) imply that the inequality

$g_k^T d_{k1} + \frac{1}{2} d_{k1}^T B_k d_{k1} + r_k \left( \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}\| - \|(c(x_k))_{-}\| \right)$
$\qquad \le g_k^T d_{k1} + \frac{1}{2} \hat d_k^T B_k \hat d_k + r_k \left( \|(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T \hat d_k)_{-}\| - \|(c(x_k))_{-}\| \right)$  (4.24)
$\qquad \le g_k^T d_{k1} + \frac{1}{2} M_2 \tau^2 - r_k v < 0$

holds for all sufficiently large $k$, which contradicts the parameter updating procedure.

Similarly, we have the following result:

Lemma 4.5. If $r_k \to \infty$ and $\lim_{k \to \infty} \|c(x_k)_{-}\| = 0$, then there exists a convergent subsequence of $\{x_k\}$ which converges to a singular stationary point of (1.1)-(1.3).

Proof. Let $x$ be any accumulation point of $\{x_k\}$. Then $x$ is a feasible point of (1.1)-(1.3). The condition $r_k \to \infty$ implies that there exists an infinite subsequence $\{x_k : k \in K\}$ such that $\|(c(x_k))_{-}\| \ne 0$ for $k \in K$. If the result is not true, then for any convergent subsequence $\{x_k : k \in \tilde K\}$ $(\tilde K \subseteq K)$, (3.4) does not hold. Hence, there exists a positive number $v$ such that (4.21) holds. Similar to Lemma 4.4, the proof can be completed.

The above two lemmas imply that $r_k$ is bounded if no subsequence of $\{x_k\}$ converges to an infeasible stationary point or a singular stationary point of (1.1)-(1.3).

Lemma 4.6. Suppose that $r_k = r$ ($r$ a positive constant) for all $k$ large enough, $\{x_k\}$ is an infinite sequence and $\{x_k : k \in \hat K\}$ is a convergent subsequence. Then $d_k \to 0$ for $k \in \hat K$, $k \to \infty$.

Proof. We proceed by contradiction. Without loss of generality, assume that $r_k = r$ for all $k$. Suppose that there exist an infinite subset $K' \subseteq \hat K$ and a positive constant $\epsilon$ such that $\|d_k\|_2 \ge \epsilon$ for $k \in K'$. By Lemma 4.2, there exists $\hat\eta > 0$ such that

$\left. \frac{d}{dt} \phi(x_k + t d_k; r) \right|_{t=0} \le -\hat\eta < 0.$  (4.25)

Thus, there exist a constant $\eta > 0$ and sufficiently small $t_k > 0$ such that for $k \in K'$,

$\phi(x_k + t_k d_k; r) \le \phi(x_k; r) - \eta.$  (4.26)

The above inequality implies that

$\sum_{k \in K'} \left( \phi(x_k + t_k d_k; r) - \phi(x_k; r) \right) \le -\sum_{k \in K'} \eta = -\infty,$  (4.27)

which is a contradiction. This completes the proof.

In the following theorem, we assume that $(d_{k2}, \lambda_k)$ is a Kuhn-Tucker pair of (2.12)-(2.15) at $x_k$, where $\lambda_k \in R^m$ is a Lagrange multiplier vector associated with $d_{k2}$.

Theorem 4.7. Suppose that $\{x_k\}$ is an infinite sequence generated by the algorithm, $\{r_k\}$ and $\{\lambda_k\}$ are bounded, and $\{x_k : k \in \hat K\}$ is a subsequence converging to $x^{*}$. If $\|c(x^{*})_{-}\| = 0$, then $x^{*}$ is a strong stationary point of (1.1)-(1.3).

Proof. Since $(d_{k2}, \lambda_k)$ is a Kuhn-Tucker pair of (2.12)-(2.15) at $x_k$, we have

$g_k + B_k d_{k2} - \nabla c(x_k) \lambda_k = 0,$  (4.28)
$(\lambda_k)_i \nabla c_i(x_k)^T d_{k2} = 0 \ \ \text{for}\ i \in I_k,$  (4.29)
$(\lambda_k)_i (\hat c_i(x_k) + \nabla c_i(x_k)^T d_{k2}) = 0 \ \ \text{for}\ i \in \bar I_k,$  (4.30)
$(\lambda_k)_i \ge 0 \ \ \text{for}\ i \in I,$  (4.31)

and (2.13)-(2.15) hold. Moreover, $(d_{k1}, \mu_{J_k})$ satisfies

$B_k d_{k1} + r_k \nabla c_{J_k}(x_k) \mu_{J_k} = 0,$  (4.32)
$\mu_{J_k} \in \partial \|u\| \big|_{u = (c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-}},$  (4.33)

where $\mu_{J_k} \in R^{|J_k|}$ is the vector with $(\mu_{J_k})_i$ $(i \in J_k)$ as its components. It follows from (4.33) that $(\mu_{J_k})_i \le 0$ for $i \in I_k$. Thus, we have

$g_k + B_k d_k - \nabla c(x_k) u_k = 0,$  (4.34)
$(u_k)_i (c_i(x_k) + \nabla c_i(x_k)^T d_k) = 0 \ \ \text{for}\ i \in I_k,$  (4.35)
$(u_k)_i \ge 0 \ \ \text{for}\ i \in I,$  (4.36)

where

$(u_k)_i = (\lambda_k)_i - r_k \alpha_k (\mu_{J_k})_i \ \ \text{for}\ i \in J_k,$  (4.37)

and

$(u_k)_i = (\lambda_k)_i \ \ \text{for}\ i \in \bar I_k.$  (4.38)

Let $\hat I(x^{*}) = \{ i : i \in I_k\ \text{for infinitely many}\ k \in \hat K \}$ and $I(x^{*}) = \{ i \in I : c_i(x^{*}) = 0 \}$. Then $\hat I(x^{*}) \subseteq I(x^{*})$. By (4.33), $\|\mu_{J_k}\|' \le 1$, where $\|\cdot\|'$ is the dual norm of the norm $\|\cdot\|$ used in (2.1). It then follows from Lemma 4.6 that there exists a cluster point $u^{*} \in R^m$ of $\{u_k\}$ such that

$g(x^{*}) - \nabla c(x^{*}) u^{*} = 0,$  (4.39)
$(u^{*})_i c_i(x^{*}) = 0 \ \ \text{for}\ i \in I,$  (4.40)

with $(u^{*})_i \ge 0$ for $i \in I$; that is, $x^{*}$ is a strong stationary point of (1.1)-(1.3).

The condition on $\{\lambda_k\}$ is not restrictive. The boundedness of $\{r_k\}$ implies that (2.18) holds for sufficiently large $k$. Thus, by (4.32)-(4.33) and (4.28), we have

$r_k \ge \left( (\nabla c(x_k)^T d_{k1})^T \lambda_k \right) / \|\nabla c(x_k)^T d_{k1}\|$  (4.41)

for sufficiently large $k$. On the other hand, if we suppose that the Mangasarian-Fromovitz condition holds at $x^{*}$, it can be proved that $\{\lambda_k\}$ is bounded.

It should be noted that the above convergence results do not rely on any linear independence assumption on the gradients of the constraints. Thus, the algorithm may terminate at some iterate which is not a Kuhn-Tucker point of (1.1)-(1.3), even if the penalty parameter is bounded. A simple example demonstrates this case.

Example 4.8. Consider the problem

$\min\ y_1 + \frac{1}{2} y_2^2$  (4.42)
$\mathrm{s.t.}\ \ \frac{1}{2} y_1^2 = 0,$  (4.43)
$\qquad\ \ \, y_1 + y_2^3 - 3/2 = 0.$  (4.44)

With the penalty parameter $r = 1$, the algorithm terminates at $(1, 0)$, which is not a Kuhn-Tucker point of (4.42)-(4.44).

5 Local convergence

To study the local convergence properties of the algorithm, we make the following assumption:

Assumption 5.1. (1) $x_k \to x^{*}$, where $x^{*}$ is a Kuhn-Tucker point of (1.1)-(1.3);
(2) Let $I^{*} = \{ i \in I : c_i(x^{*}) = 0 \}$; the gradients $\nabla c_i(x^{*})$ $(i \in E \cup I^{*})$ are linearly independent;
(3) $r_k = r$ for $k \ge \hat k$, where $r > 0$ is a constant and $\hat k$ is a sufficiently large positive integer.

The definitions (2.4)-(2.5) imply that, for infinitely many $k$, there exists a small $\epsilon > 0$ such that $c_i(x_k) \ge \epsilon$ for $i \in \bar I_k$. Thus, by Assumption 4.3 and (2.11), we have $\alpha_k \ge \alpha_0$ for infinitely many $k$, where $\alpha_0 > 0$ is a constant. For sufficiently large $k$, by the definitions (2.4)-(2.5), $I_k \subseteq I_{k+1}$; thus $I_k = I^{*}$ for sufficiently large $k$. Moreover, under Assumption 5.1, it follows from (3.8) that $\|d_{k1}\|_2 \to 0$ as $k \to \infty$. Therefore, $\alpha_k = 1$ for sufficiently large $k$.

Lemma 5.2. Under Assumption 5.1, suppose that $d_{k1}$ is a solution of (2.10) at the point $x_k$. Then there exists a sufficiently large $k_0$ such that for $k \ge k_0$,

$(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-} = 0.$  (5.1)

Proof. The first-order necessary conditions of (2.10) imply that (4.32)-(4.33) hold and that $\|\mu_{J_k}\|' \le 1$, with $\|\cdot\|'$ being the dual norm of the norm $\|\cdot\|$ used in (2.1). If at the $k$th iteration $(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-} \ne 0$, then $\|\mu_{J_k}\|' = 1$.

Define $p_k = \|B_k d_{k1} + r_k \nabla c_{J_k}(x_k) \mu_{J_k}\|_2$ and $J^{*} = E \cup I^{*}$. If $(c_{J_k}(x_k) + \nabla c_{J_k}(x_k)^T d_{k1})_{-} \ne 0$ for sufficiently large $k$, then it follows from Lemma 4.6 and the equivalence of norms that

$p_k = \| r_k \nabla c_{J^{*}}(x^{*}) \mu_{J_k} \|_2 + O(\|x_k - x^{*}\|_2) + O(\|d_{k1}\|_2)$
$\quad\ \ge r_k \|\mu_{J_k}\|_2 / \|\nabla c_{J^{*}}(x^{*})^{+}\|_2 + o(1)$
$\quad\ \ge c_0 r_k \|\mu_{J_k}\|' / \|\nabla c_{J^{*}}(x^{*})^{+}\|_2 + o(1)$  (5.2)
$\quad\ \ge c_0 r / (2 \|\nabla c_{J^{*}}(x^{*})^{+}\|_2),$

where $\nabla c_{J^{*}}(x^{*}) \in R^{n \times |J^{*}|}$ is the matrix with $\nabla c_i(x^{*})$ $(i \in J^{*})$ as its column vectors, $\nabla c_{J^{*}}(x^{*})^{+}$ is its generalized inverse and $c_0 > 0$ is a constant. (5.2) contradicts (4.32). This completes our proof.

The above lemma shows that there exists a sufficiently large integer $k_0$ such that for $k \ge k_0$ the piecewise quadratic subproblem (2.10) is equivalent to the following quadratic programming problem:

$\min\ \frac{1}{2} d^T B_k d$  (5.3)
$\mathrm{s.t.}\ \ c_i(x_k) + \nabla c_i(x_k)^T d = 0, \quad i \in E,$  (5.4)
$\qquad\ \ \, c_i(x_k) + \nabla c_i(x_k)^T d \ge 0, \quad i \in I_k.$  (5.5)

Assumption 5.3. Suppose that $\lambda^{*}$ is a Lagrange multiplier vector associated with $x^{*}$:
(1) The strict complementarity condition holds at $(x^{*}, \lambda^{*})$;
(2) $d^T \nabla^2 L(x^{*}, \lambda^{*}) d > 0$ for all nonzero $d$ in the null space $\{ d : \nabla c_i(x^{*})^T d = 0,\ i \in E \cup I^{*} \}$, where $L(x, \lambda)$ is defined by (1.8).

It follows from Assumption 5.3 that $(x^{*}, \lambda^{*})$ is an isolated Kuhn-Tucker pair of (1.1)-(1.3). If the conditions in Assumption 5.3 hold, then for sufficiently large $k$, $d_{k1}$ derived from (5.3)-(5.5) is a solution of the problem

$\min\ \frac{1}{2} d^T B_k d$  (5.6)
$\mathrm{s.t.}\ \ c_i(x_k) + \nabla c_i(x_k)^T d = 0, \quad i \in E \cup I^{*},$  (5.7)

and $d_{k2}$ generated by (2.12)-(2.15) solves

$\min\ g_k^T d + \frac{1}{2} d^T B_k d$  (5.8)
$\mathrm{s.t.}\ \ \nabla c_i(x_k)^T d = 0, \quad i \in E \cup I^{*}.$  (5.9)

Let $\nabla c_{J^{*}}(x_k)$ be the $n \times |J^{*}|$ matrix with $\nabla c_i(x_k)$ $(i \in J^{*})$ as its columns. By direct calculation, it follows from (5.6)-(5.7) that

$d_{k1} = -B_k^{-1} \nabla c_{J^{*}}(x_k) \left( \nabla c_{J^{*}}(x_k)^T B_k^{-1} \nabla c_{J^{*}}(x_k) \right)^{-1} c_{J^{*}}(x_k),$  (5.10)

and by (5.8)-(5.9),

$d_{k2} = B_k^{-1} \nabla c_{J^{*}}(x_k) \left( \nabla c_{J^{*}}(x_k)^T B_k^{-1} \nabla c_{J^{*}}(x_k) \right)^{-1} \nabla c_{J^{*}}(x_k)^T B_k^{-1} g_k - B_k^{-1} g_k.$  (5.11)

Thus, $d_{k1} + d_{k2}$ is a solution of the problem

$\min\ g_k^T d + \frac{1}{2} d^T B_k d$  (5.12)
$\mathrm{s.t.}\ \ c_i(x_k) + \nabla c_i(x_k)^T d = 0, \quad i \in J^{*}.$  (5.13)

The above discussion can be stated as the following lemma:

Lemma 5.4. If the conditions in Assumptions 5.1 and 5.3 hold, then there exists a sufficiently large $k_1$, $k_1 \ge k_0$, such that for $k \ge k_1$ the algorithm generates directions identical to those of the Han-Powell method.

By Lemma 5.4 and the related results on the SQP method (Boggs et al. (1982), Yuan (1993)), the superlinear convergence of the algorithm is a direct consequence.

Lemma 5.5. Suppose that the conditions in Assumption 5.3 hold. If

$\lim_{k \to \infty} \frac{ \| P^{*} (B_k - \nabla^2 L(x^{*}, \lambda^{*})) d_k \|_2 }{ \|d_k\|_2 } = 0,$  (5.14)

where $P^{*}$ is the projection matrix onto the null space $\{ d : \nabla c_i(x^{*})^T d = 0,\ i \in E \cup I^{*} \}$, then

$\lim_{k \to \infty} \frac{ \|x_k + d_k - x^{*}\|_2 }{ \|x_k - x^{*}\|_2 } = 0.$  (5.15)

A superlinear convergence step may be truncated due to the nonsmoothness of the merit function, which is known as the Maratos effect (see, for example, Yuan (1993), Yuan and Sun (1997)). In order to avoid this, the second-order correction technique has been considered by Mayne and Polak (1982), Coleman and Conn (1982), Fletcher (1982) and others. For our problem, when $\|c_{J_k}(x_k)\| \le \epsilon_1$ ($\epsilon_1$ is a prescribed number), we solve the subproblem

$\min_{d \in R^n}\ \frac{1}{2} d^T B_k d + r_k \|(c_{J_k}(x_k + d_k) + \nabla c_{J_k}(x_k)^T d)_{-}\|$  (5.16)

to generate the second-order correction step $\tilde d_k$. The algorithm with the second-order correction technique, a modification of Algorithm 2.1, is as follows:

Algorithm 5.6

Step 0. Given $x_0 \in R^n$, a symmetric positive definite $B_0 \in R^{n \times n}$, $r_0 > 0$, $0 < \sigma < \frac{1}{2}$, $0 < \beta < 1$, $\rho > 0$ and $\epsilon_0, \epsilon_1, \epsilon_2, \epsilon_3 > 0$; set $k := 0$.

Step 1. Generate $J_k$ and $\bar I_k$ by (2.4)-(2.6). Solve (2.10) to obtain $d_{k1}$; calculate $\alpha_k$ by (2.11); solve (2.12)-(2.15) to obtain $d_{k2}$; set $d_k = \alpha_k d_{k1} + d_{k2}$. If $\|d_k\| \le \epsilon_0$, stop. If $\|c_{J_k}(x_k)\| \le \epsilon_1$, solve (5.16) to obtain $\tilde d_k$; otherwise set $\tilde d_k = 0$.

Step 2. Set $r_{k+1} := r_k$ if (2.18) holds; otherwise compute $r_{k+1}$ by (2.17).

Step 3. For $s = 0, 1, 2, \dots$, test whether

$\phi(x_k + \beta^s d_k + \beta^{2s} \tilde d_k; r_{k+1}) - \phi(x_k; r_{k+1}) \le \sigma \beta^s \left( g_k^T d_k + r_{k+1} ( \|(c(x_k) + \nabla c(x_k)^T d_k)_{-}\| - \|(c(x_k))_{-}\| ) \right).$  (5.17)

For the first $s$ satisfying (5.17), let $t_k = \beta^s$ and $x_{k+1} = x_k + t_k d_k + t_k^2 \tilde d_k$.

Step 4. If $\|x_{k+1} - x_k\| \le \epsilon_3$, stop.

Step 5. Compute the values of $f(x)$, $c(x)$, $\nabla f(x)$ and $\nabla c(x)$ at $x_{k+1}$; generate $B_{k+1}$; set $k := k + 1$ and go to Step 1.

Similar to the discussion of Lemma 5.2 and Lemma 5.4, and to the analyses in Mayne and Polak (1982), Yuan (1993) and Yuan and Sun (1997), we have the following result:

Theorem 5.7. Under Assumptions 5.1 and 5.3, suppose that (5.14) holds, $\epsilon_i = 0$ $(i = 0, 1, 2, 3)$, and $\{x_k\}$ is an infinite sequence generated by Algorithm 5.6. Then

$\lim_{k \to \infty} \frac{ \|x_k + d_k + \tilde d_k - x^{*}\| }{ \|x_k - x^{*}\| } = 0,$  (5.18)

and there exists a sufficiently large $k_2$ such that $t_k = 1$ for $k \ge k_2$. Thus $\{x_k\}$ converges Q-superlinearly.

6 Numerical results

A FORTRAN subroutine was programmed to test our algorithm. Our experiments were carried out on an Indigo workstation at the State Key Laboratory of Scientific and Engineering Computing. The norm in (2.1) is chosen to be the $l_1$ norm. We solve the piecewise quadratic subproblem (2.10) by reformulating it as a positive semidefinite quadratic program:

$\min\ \frac{1}{2} w^T Q_k w + p_k^T w$  (6.1)
$\mathrm{s.t.}\ \ y - \nabla c_i(x_k)^T d \ge c_i(x_k), \quad i \in E,$  (6.2)
$\qquad\ \ \, y + \nabla c_i(x_k)^T d \ge -c_i(x_k), \quad i \in J_k,$  (6.3)

where $Q_k = \begin{pmatrix} B_k & 0 \\ 0 & 0 \end{pmatrix} \in R^{(n+1) \times (n+1)}$, $w = \begin{pmatrix} d \\ y \end{pmatrix} \in R^{n+1}$ and $p_k = \begin{pmatrix} 0 \\ r_k \end{pmatrix} \in R^{n+1}$. The second-order correction subproblem (5.16) is solved in the same way.
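One standard way to pose (2.10) with the $l_1$ norm as a smooth QP is to introduce one slack variable per row of $c_{J_k}$. The sketch below follows that generic recipe (written here with the cvxpy modelling package) rather than transcribing the authors' exact formulation (6.1)-(6.3); all names are mine.

```python
# Hedged sketch: an l1 slack reformulation of the piecewise quadratic (2.10).
import numpy as np
import cvxpy as cp

def solve_piecewise_qp(B, cJ, AJ, me, r):
    """min 0.5 d'Bd + r*||(c_Jk + A_Jk d)_-||_1 via slacks t >= 0.
    cJ, AJ: values and Jacobian rows for the indices in J_k (equalities first)."""
    n, mJ = B.shape[0], len(cJ)
    d = cp.Variable(n)
    t = cp.Variable(mJ, nonneg=True)
    lin = cJ + AJ @ d
    cons = [t >= -lin]                       # t_i >= -(c_i + a_i'd)
    if me > 0:
        cons.append(t[:me] >= lin[:me])      # equalities: t_i >= |c_i + a_i'd|
    cp.Problem(cp.Minimize(0.5 * cp.quad_form(d, B) + r * cp.sum(t)),
               cons).solve()
    return d.value
```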

The first test problem that we solved is taken from Sahba (1987):

$\min\ x_1 x_2$  (6.4)
$\mathrm{s.t.}\ \ -\sin x_1 \ge 0,$  (6.5)
$\qquad\ \ \, \cos x_1 \ge 0,$  (6.6)
$\qquad\ \ \, -x_1^2 - x_2^2 + \pi/2 \ge 0,$  (6.7)
$\qquad\ \ \, x_1 + \pi \ge 0,$  (6.8)
$\qquad\ \ \, x_2 + \pi/2 \ge 0,$  (6.9)

with the standard starting point $x_0 = (0, 5)^T$. Sahba's algorithm terminates at the point $x = (0, -1.25331)^T$, which is an approximate Kuhn-Tucker point but not the approximate minimum point of (6.4)-(6.9).

The other test problems are from Hock and Schittkowski (1981). For each problem, the standard initial point is used. We choose the initial parameters $\sigma = 0.1$, $\beta = 0.5$, $\rho = 1$ and $\epsilon_i = 10^{-6}$ for $i = 0, 1, 2, 3$. The choice of the initial penalty parameter is scale dependent; $r_0 = 1$ is chosen for our test problems.

The initial Lagrangian Hessian estimate is $B_0 = I$, and $B_k$ is updated by the damped BFGS formula ([14]):

$B_{k+1} = B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k},$  (6.10)

where

$y_k = \begin{cases} \hat y_k, & \hat y_k^T s_k \ge 0.2\, s_k^T B_k s_k, \\ \theta_k \hat y_k + (1 - \theta_k) B_k s_k, & \text{otherwise}, \end{cases}$  (6.11)

$\hat y_k = g_{k+1} - g_k + (\nabla c(x_{k+1}) - \nabla c(x_k)) \lambda_k$, $s_k = x_{k+1} - x_k$, $\theta_k = 0.8\, s_k^T B_k s_k / (s_k^T B_k s_k - s_k^T \hat y_k)$, and $\lambda_k$ is the multiplier vector associated with (2.12)-(2.15).
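A sketch of the damped BFGS update (6.10)-(6.11). The quantities $s_k$ and $\hat y_k$ are computed by the caller as described above; the code and variable names are mine.

```python
# Hedged sketch of the damped BFGS update (6.10) with Powell damping (6.11).
import numpy as np

def damped_bfgs(B, s, y_hat):
    """B: current Hessian estimate, s = x_{k+1}-x_k, y_hat = hat{y}_k."""
    Bs = B @ s
    sBs = s @ Bs
    if s @ y_hat >= 0.2 * sBs:
        y = y_hat
    else:                                     # damping (6.11)
        theta = 0.8 * sBs / (sBs - s @ y_hat)
        y = theta * y_hat + (1.0 - theta) * Bs
    # By construction s'y >= 0.2*sBs > 0, so the last term is well defined.
    return B - np.outer(Bs, Bs) / sBs + np.outer(y, y) / (s @ y)
```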

The test problems were also solved by Powell's subroutine VMCWD, which is a very successful code for many nonlinear programming problems. The error tolerance for VMCWD is $10^{-8}$. VMCWD failed to solve Sahba's problem (6.4)-(6.9), since the constraints appear to be inconsistent after the first iteration. The numerical results given by our algorithm are presented in Table 1, where R-KT and R-CV denote the $l_2$ norms of the gradient of the Lagrangian and of the constraint violation, respectively. The algorithm terminates at the approximate minimum point of (6.4)-(6.9).

Table 1.

 k   x_k^(1)   x_k^(2)   r_k   alpha_k   t_k   R-KT        R-CV
 0    0.0       5.0      1.0     1.0     1.0   5.0         2.3429E+1
 1   -3.1416    2.6571   1.0     1.0     1.0   3.2415      1.5391E+1
 2   -1.1116    2.3552   1.0     1.0     1.0   1.7142      5.2119
 3   -0.8262    1.3834   1.0     1.0     1.0   8.8489E-1   1.0257
 4   -0.9236    0.9546   1.0     1.0     1.0   4.4570E-2   1.9339E-1
 5   -0.8899    0.8858   1.0     1.0     1.0   6.1008E-3   5.8572E-3
 6   -0.8870    0.8854   1.0     1.0     1.0   2.4008E-3   7.7911E-6
 7   -0.8863    0.8862   1.0     1.0     1.0   1.3274E-4   0.0

Some numerical results for equality constrained optimization problems have been reported in Liu and Yuan [10]. It has been noticed that our algorithm can overcome the difficulties associated with linear dependence of the gradients of the constraints, since an unconstrained subproblem is solved at each iterate.

The numerical results for the other test problems are listed in Table 2. The problems are numbered in the same way as in Hock and Schittkowski (1981); for example, "HS43" is problem 43 in Hock and Schittkowski (1981). NI, NF and NG denote the numbers of iterations, function evaluations and gradient evaluations, respectively.

Table 2.

 Problem   n    me   m    VMCWD                      Our algorithm
                          NI-NF-NG     Residual      NI-NF-NG    Residual
 HS7       2    1    1    12-14-14     4.94E-08      9-18-10     3.85E-08
 HS14      2    1    2    5-6-6        7.90E-11      4-5-5       2.98E-07
 HS22      2    0    2    5-7-7        3.18E-10      23-46-24    4.58E-08
 HS38      4    0    8    81-104-104   8.96E-04      38-64-39    6.68E-05
 HS43      4    0    3    12-15-15     5.14E-06      12-23-13    4.48E-06
 HS52      5    3    3    5-9-9        2.21E-05      16-21-17    2.43E-12
 HS63      3    2    5    8-9-9        6.72E-07      7-8-8       2.41E-07
 HS76      4    0    7    5-6-6        1.45E-04      6-7-7       2.12E-07
 HS86      5    0    15   4-6-6        1.78E-04      4-7-5       6.22E-05
 HS113     10   0    8    12-17-17     6.46E-06      14-20-15    4.07E-05

The numerical results show that our algorithm is comparable to VMCWD. But our algorithm requires slightly more function evaluations.

References

[1] P.T. Boggs, J.W. Tolle and P. Wang, On the local convergence of Quasi-Newton methods for constrained optimization, SIAM J. Control and Optimization, 20 (1982), pp. 161-171.
[2] J.V. Burke, A sequential quadratic programming method for potentially infeasible mathematical programs, J. Math. Anal. Appl., 139 (1989), pp. 319-351.
[3] J.V. Burke and S.P. Han, A robust sequential quadratic programming method, Math. Programming, 43 (1989), pp. 277-303.
[4] J.V. Burke and S.P. Han, A Gauss-Newton approach to solving generalized inequalities, Math. Operations Research, 11 (1986), pp. 632-643.
[5] T.F. Coleman and A.R. Conn, Nonlinear programming via an exact penalty function: asymptotic analysis, Math. Prog., 24 (1982), pp. 123-136.
[6] R. Fletcher, Practical Methods of Optimization, Vol. 2, Constrained Optimization, John Wiley and Sons, Chichester, 1981.
[7] R. Fletcher, A model algorithm for composite nondifferentiable optimization problems, Math. Prog. Stud., 17 (1982), pp. 67-76.
[8] S.P. Han, A globally convergent method for nonlinear programming, JOTA, 22 (1977), pp. 297-309.
[9] W. Hock and K. Schittkowski, Test Examples for Nonlinear Programming Codes, Lecture Notes in Econ. and Math. Systems 187, Springer-Verlag, 1981.
[10] X. Liu and Y. Yuan, A globally convergent, locally superlinearly convergent algorithm for equality constrained optimization, Research Report ICM-9784, Inst. Comp. Math. Sci./Eng. Computing, Chinese Academy of Sciences, Beijing, China.
[11] D.Q. Mayne and E. Polak, A superlinearly convergent algorithm for constrained optimization problems, Math. Prog. Stud., 16 (1982), pp. 45-61.
[12] J.S. Pang and S.A. Gabriel, NE/SQP: A robust algorithm for the nonlinear complementarity problem, Math. Programming, 60 (1993), pp. 295-337.
[13] J.S. Pang, S.P. Han and N. Rangaraj, Minimization of locally Lipschitzian functions, SIAM J. Opt., 1 (1991), pp. 57-82.
[14] M.J.D. Powell, A fast algorithm for nonlinearly constrained optimization calculations, in: Proc. 1977 Dundee Biennial Conference on Numerical Analysis, G.A. Watson, ed., Springer-Verlag, Berlin, 1978, pp. 144-157.
[15] M.J.D. Powell, The convergence of variable metric methods for nonlinearly constrained optimization calculations, in: Nonlinear Programming 3, O.L. Mangasarian, R.R. Meyer and S.M. Robinson, eds., Academic Press, New York, 1978, pp. 27-63.
[16] M. Sahba, Globally convergent algorithm for nonlinearly constrained optimization problems, JOTA, 52 (1987), pp. 291-309.
[17] K. Schittkowski, The nonlinear programming method of Wilson, Han and Powell with an augmented Lagrangian type line search function, Part 1: convergence analysis, Numer. Math., 38 (1981), pp. 83-114.
[18] K. Schittkowski, On the convergence of a sequential quadratic programming method with an augmented Lagrangian line search function, Math. Operationsforsch. u. Statist., Ser. Optimization, 14 (1983), pp. 197-216.
[19] K. Schittkowski, NLPQL: A FORTRAN subroutine solving constrained nonlinear programming problems, Annals of Operations Research, 5 (1986), pp. 485-500.
[20] J. Stoer, Principles of sequential quadratic programming methods for solving nonlinear programs, in: Computational Mathematical Programming, NATO ASI Series, Vol. F15, K. Schittkowski, ed., Springer-Verlag, Berlin, 1985, pp. 166-207.
[21] J. Sun, On piecewise quadratic Newton and trust region problems, Math. Prog., 76 (1997), pp. 451-467.
[22] K. Taji and M. Fukushima, A new merit function and a successive quadratic programming algorithm for variational inequality problems, SIAM J. Optimization, 6 (1996), pp. 704-713.
[23] R.B. Wilson, A simplicial algorithm for concave programming, Ph.D. thesis, Harvard University, Cambridge, MA, 1963.
[24] Y. Yuan, Numerical Methods for Nonlinear Programming (in Chinese), Modern Mathematics Series, Shanghai Scientific & Technical Publishers, 1993.
[25] Y. Yuan, On the convergence of a new trust region algorithm, Numer. Math., 70 (1995), pp. 515-539.
[26] Y. Yuan and W. Sun, Theories and Methods for Optimization (in Chinese), Scientific Press, Beijing, 1997.
