Mathematical Programming 24 (1982) 123-136 North-Holland Publishing Company

NONLINEAR PROGRAMMING VIA AN EXACT PENALTY FUNCTION: ASYMPTOTIC ANALYSIS*

T.F. COLEMAN**
Applied Mathematics Division, Argonne National Laboratory, Argonne, IL, U.S.A.

A.R. CONN
Computer Science Department, University of Waterloo, Waterloo, Ont., Canada

Received 16 June 1980
Revised manuscript received 16 July 1981

* This work is supported in part by NSERC Grant No. A8639 and the U.S. Dept. of Energy.
** Now at Department of Computer Science, Cornell University, Ithaca, New York.

In this paper we consider the final stage of a 'global' method to solve the nonlinear programming problem. We prove 2-step superlinear convergence. In the process of analyzing this asymptotic behavior, we compare our method (theoretically) to the popular successive quadratic programming approach.

Key words: Nonlinear Programming, Exact Penalty Methods, Successive Quadratic Programming.

1. Introduction

The nonlinear programming problem can be expressed as

minimize f(x),
subject to φ_i(x) ≥ 0, i = 1, ..., m,   (1.1)

where m is a positive integer and f, φ_i, i = 1, ..., m, are continuously differentiable functions mapping R^n to R^1. Many algorithms have been proposed to solve (1.1), and recently successive quadratic programming has been a popular approach. While this method often exhibits fast local behaviour, it is not a robust global procedure. There have been, and continue to be, attempts to 'globalize' this method (for example, [5, 6, 9, 12, 14]); however, to date there does not exist an entirely satisfactory method. (That is, currently there does not exist a method which 'globalizes' the local quadratic programming approach in a consistent and natural way. See Sections 3 and 4 for more details.) We discuss the method of Han [12] in Section 3. In [4], Coleman and Conn introduce a method, based on an exact penalty function, which possesses both global and fast local convergence properties. In [4], numerical results are given which support this claim, and global convergence is proven. It is the intent of this paper to rigorously establish the superlinear properties. In the process we will directly compare our method (theoretically) to the successive quadratic programming approach.

2. Local considerations

2.1. The algorithm

In this section we will carefully consider the search direction produced by the successive quadratic programming method when we are 'near' a solution to (1.1). It will be seen that there may be unnecessary computation and storage. A geometric interpretation of the search direction leads to a modification which eliminates this excess. This new direction is exactly that produced by the algorithm of Coleman and Conn (derived in [4]) when in a neighbourhood of the solution.

Let x* be a local solution to (1.1) and suppose that at x* the active set is {1, ..., t}, where t ≤ n. Furthermore, let A_k denote the n × t matrix (∇φ_1(x^k), ..., ∇φ_t(x^k)), and let φ(x^k) denote (φ_1(x^k), ..., φ_t(x^k))^T. As in [15], we will assume that we are sufficiently close to x* so that the active set has been 'identified' and the successive quadratic programming procedure reduces to the problem

minimize_d   ∇f(x^k)^T d + ½ d^T B_k d,
subject to   φ_i(x^k) + ∇φ_i(x^k)^T d = 0,  i = 1, ..., t.   (2.1)

Here the n × n matrix B_k is an approximation to the Hessian of the Lagrangian function. Using the formulation of Powell [15], the solution to (2.1) can be written as

d^k = q^k + r^k,   (2.2)

where

q^k = -B_k^{-1} A_k (A_k^T B_k^{-1} A_k)^{-1} φ(x^k),
r^k = {B_k^{-1} A_k (A_k^T B_k^{-1} A_k)^{-1} A_k^T B_k^{-1} - B_k^{-1}} ∇f(x^k).

Provided we start sufficiently close to x*, a stepsize of unity is assumed in [15], and thus we have

x^{k+1} = x^k + d^k,   (2.3)

where d^k is given by (2.2).
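As a concrete illustration of (2.2), the following sketch computes q^k and r^k directly from the formulas above. This is a minimal numerical sketch, not the authors' implementation; the toy data (B_k, A_k, φ(x^k), ∇f(x^k)) are assumptions chosen only so the code runs.

```python
# Minimal sketch of the Powell form (2.2) of the SQP step; toy data assumed.
import numpy as np

def sqp_step_powell(grad_f, A, phi, B):
    """d = q + r from (2.2).
    grad_f: gradient of f at xk (n,); A: active-constraint gradients (n, t);
    phi: active-constraint values (t,); B: symmetric Lagrangian-Hessian
    approximation (n, n)."""
    Binv_A = np.linalg.solve(B, A)                           # B^{-1} A
    M = np.linalg.inv(A.T @ Binv_A)                          # (A^T B^{-1} A)^{-1}
    q = -Binv_A @ M @ phi
    r = (Binv_A @ M @ Binv_A.T - np.linalg.inv(B)) @ grad_f
    return q + r

# Toy data with n = 3, t = 1.
B = np.diag([2.0, 3.0, 4.0])
A = np.array([[1.0], [1.0], [0.0]])
phi = np.array([0.5])
grad_f = np.array([1.0, -1.0, 2.0])
d = sqp_step_powell(grad_f, A, phi, B)
# The linearized constraints of (2.1) hold: phi + A^T d = 0.
assert np.allclose(A.T @ d, -phi)
```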

It is instructive to introduce the n × (n - t) matrix Z_k (commonly used by Gill and Murray; see [11], for example) which satisfies

A_k^T Z_k = 0,   Z_k^T Z_k = I_{n-t}.   (2.4)
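One standard way to realize a Z_k satisfying (2.4) (and the 'consistent' QR-based choice mentioned in the footnote to (2.32) below) is to take the trailing n - t columns of the orthogonal factor of a full QR decomposition of A_k. A small sketch, with assumed data:

```python
# Sketch: a null-space basis Zk satisfying (2.4) from a full QR of Ak.
import numpy as np

def null_space_basis(A):
    """Return Z with A^T Z = 0 and Z^T Z = I, for an n x t matrix A of
    full column rank."""
    n, t = A.shape
    Q, _ = np.linalg.qr(A, mode='complete')   # Q: n x n orthogonal
    return Q[:, t:]                           # columns span null(A^T)

A = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]])   # n = 3, t = 2
Z = null_space_basis(A)
assert np.allclose(A.T @ Z, 0.0)
assert np.allclose(Z.T @ Z, np.eye(A.shape[0] - A.shape[1]))
```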


Provided Z_k^T B_k Z_k is positive definite, the solution to (2.1) can be rewritten as

d^k = h^k + v^k,   (2.5)

where

h^k = -Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T (∇f(x^k) + B_k v^k),   (2.6)
v^k = -A_k (A_k^T A_k)^{-1} φ(x^k).   (2.7)

We assume that the columns of A_k are linearly independent.

Let L(x, λ) denote the Lagrangian function f(x) - λ^T φ(x), where λ is a t-vector. Suppose that at iteration k we have available x^k and λ^k, estimates of the optimal primal and dual variables x* and λ*. Let B_k be an estimate of the current Lagrangian Hessian,

G_f(x^k) - Σ_{i=1}^t λ_i^k G_{φ_i}(x^k),   (2.8)

where G_f and G_{φ_i} denote the Hessians of f and φ_i, respectively.
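The decomposition (2.5)-(2.7) can be checked numerically against a direct solve of the equality-constrained quadratic program (2.1) via its KKT system. This is an illustrative sketch with assumed toy data; when Z_k^T B_k Z_k is positive definite and A_k has full column rank, the two computations agree.

```python
import numpy as np

def step_null_space(grad_f, A, phi, B):
    """d = h + v from (2.5)-(2.7)."""
    n, t = A.shape
    Q, _ = np.linalg.qr(A, mode='complete')
    Z = Q[:, t:]                                   # Zk satisfying (2.4)
    v = -A @ np.linalg.solve(A.T @ A, phi)         # (2.7): 'vertical' step
    h = -Z @ np.linalg.solve(Z.T @ B @ Z, Z.T @ (grad_f + B @ v))  # (2.6)
    return h + v

def step_kkt(grad_f, A, phi, B):
    """Solve (2.1) directly from its KKT conditions:
    B d - A lam = -grad_f,  A^T d = -phi."""
    n, t = A.shape
    K = np.block([[B, -A], [A.T, np.zeros((t, t))]])
    sol = np.linalg.solve(K, np.concatenate([-grad_f, -phi]))
    return sol[:n]

B = np.diag([2.0, 3.0, 4.0])
A = np.array([[1.0], [1.0], [0.0]])
phi = np.array([0.5])
grad_f = np.array([1.0, -1.0, 2.0])
assert np.allclose(step_null_space(grad_f, A, phi, B),
                   step_kkt(grad_f, A, phi, B))
```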

Eqs. (2.6) and (2.7) can be interpreted in an interesting geometric fashion. Firstly, v^k is just the least-squares solution to the system

φ(x^k) + A_k^T v = 0.   (2.9a)

That is, v^k is a 'Newton-like' attempt to solve the system

φ(x) = 0,   (2.9b)

starting at the point x^k and using exact information computed at x^k. The step h^k can be viewed as an approximation to the constrained Newton step (with respect to x) for the Lagrangian (in the manifold spanned by the columns of Z_k and containing the point x^k + v^k). This 'Newton' step is based on approximate Lagrangian gradient information at the point x^k + v^k. To see this, consider that

∇f(x^k) = ∇L(x^k, λ^k) + Σ_{i=1}^t λ_i^k ∇φ_i(x^k),

so that, since Z_k^T A_k = 0, Z_k^T ∇f(x^k) = Z_k^T ∇L(x^k, λ^k).

But B_k approximates G_f(x^k) - Σ_{i=1}^t λ_i^k G_{φ_i}(x^k), the Lagrangian Hessian at x^k, which we denote by G_L(x^k, λ^k). Thus,

h^k = -Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T (∇L(x^k, λ^k) + B_k v^k),   (2.10)

which approximates

-Z_k (Z_k^T G_L(x^k, λ^k) Z_k)^{-1} Z_k^T (∇L(x^k, λ^k) + G_L(x^k, λ^k) v^k),   (2.11)

provided the inverse of the projected Hessian exists. But, by Taylor's theorem,

Z_k^T [∇L(x^k, λ^k) + G_L(x^k, λ^k) v^k] ≈ Z_k^T [∇L(x^k + v^k, λ^k)].   (2.12)

Considering (2.10) and (2.12), it is clear that h^k is an approximation to the constrained Newton direction based on approximate gradient information at x^k + v^k.


In summary, then, the direction d^k can be viewed as a two-part process. First, the step v^k is taken, based on exact information at x^k: v^k satisfies φ(x^k + v) = 0, up to first-order terms. From the point x^k + v^k, a step h^k is taken in the space spanned by Z_k: h^k is a 'Newton-like' attempt to satisfy Z_k^T [∇L((x^k + v^k) + h)] = 0; however, only approximate gradient information is used at x^k + v^k. It is difficult to imagine improving on the step v^k (up to first order), since v^k uses exact information. The question should be asked, however: is h^k a good approximation to the true constrained Newton direction at x^k + v^k? This question naturally divides into the following questions:
(i) Is Z_k^T B_k Z_k a good approximation (in some sense) to Z_k^T G_L(x^k, λ^k) Z_k?
(ii) Is Z_k^T [∇L(x^k, λ^k) + B_k v^k] a good approximation (in some sense) to Z_k^T [∇L(x^k + v^k, λ^k)]?
Interestingly, Powell [15] proved that question (ii) can be ignored, to some extent (assuming convergence), and yet a 2-step superlinear convergence rate can be maintained. In particular, the accuracy of Z_k^T B_k v^k is not important. This suggests that one could ignore the computation of Z_k^T B_k v^k altogether. Specifically, let

d̃^k = h̃^k + v^k,

where

h̃^k = -Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T ∇f(x^k).

We note that, since Z_k^T A_k = 0,

Z_k^T ∇f(x^k) = Z_k^T [∇f(x^k) - Σ_{i=1}^t λ_i^k ∇φ_i(x^k)] = Z_k^T ∇L(x^k, λ^k),

and so we can interpret h̃^k as an approximation to the constrained Lagrangian Newton direction, starting at x^k (in the manifold containing x^k and spanned by the columns of Z_k), based on exact gradient information. If we view v^k as being added after h̃^k, then v^k is now an attempt to solve φ(x^k + h̃^k + v) = 0, based on old information (that is, A and φ are computed at x^k, not at x^k + h̃^k). Nevertheless, it can be shown that the iterate

x^{k+1} = x^k + h̃^k + v^k   (2.13)

will result in a 2-step superlinear convergence rate. We note that (i) h̃^k + v^k is not a solution to the quadratic programming problem (2.1), and (ii) only the projected Hessian, Z_k^T G_L(x^k, λ^k) Z_k, need be computed. (Murray and Wright [14] suggest algorithms which, at times, also ignore the term Z_k^T B_k v^k.)
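The simplification is easy to state in code. In the following sketch (shapes and names are assumptions, not from the paper), the full horizontal step (2.6) needs the n × n matrix B_k to form Z_k^T B_k v^k, whereas the reduced step h̃^k needs only the (n - t) × (n - t) projected Hessian, which is note (ii) above:

```python
import numpy as np

def h_full(Z, B, g, v):
    """Horizontal step (2.6): requires the full B to form Z^T B v."""
    return -Z @ np.linalg.solve(Z.T @ B @ Z, Z.T @ (g + B @ v))

def h_reduced(Z, ZBZ, g):
    """Reduced step: the term Z^T B v is dropped, so only the projected
    Hessian ZBZ = Z^T B Z is ever stored or updated."""
    return -Z @ np.linalg.solve(ZBZ, Z.T @ g)
```

Only h_reduced is compatible with maintaining just a projected Hessian approximation, since h_full touches the full matrix B through the product B @ v.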

Since we are now viewing v^k as being taken after h̃^k, and since v^k is based on information evaluated at x^k, it seems reasonable to suggest that v^k be 'improved' by re-evaluating gradients and functions at x^k + h̃^k. Such computation would probably be unjustifiably expensive; however, global convergence considerations [4] demand that the active constraint functions, φ_i, i = 1, ..., t, be evaluated at x^k + h̃^k. (This does not destroy 2-step superlinearity.) Thus we define

ṽ^k = -A_k (A_k^T A_k)^{-1} φ(x^k + h̃^k),   (2.15)

and set

x^{k+1} = x^k + h̃^k + ṽ^k.   (2.16)

We emphasize that the only new information obtained at x^k + h̃^k is the vector function value φ(x^k + h̃^k). The matrix A_k is not re-computed at x^k + h̃^k, but contains gradient information accurate at x^k. (Thus, matrix decompositions are not modified.) We note that properties (i) and (ii) above continue to hold for the step h̃^k + ṽ^k.

Based on the preceding observations, we present the following 'local' algorithm. This local method is exactly that to which the global procedure of Coleman and Conn [4] automatically reduces in a neighbourhood of a solution.

Algorithm 1 (Local)
(0) Select an x^0 sufficiently close to x*, and set k ← 0.
(1) Determine the dual estimates λ^k.
(2) 'Update' Z_k^T B_k Z_k, maintaining positive definiteness.
(3) Determine h̃^k: solve (Z_k^T B_k Z_k) h̄ = -Z_k^T ∇f(x^k), and set h̃^k ← Z_k h̄.
(4) Determine ṽ^k: ṽ^k ← -A_k (A_k^T A_k)^{-1} φ(x^k + h̃^k) (see (2.15)).
(5) Update: x^{k+1} ← x^k + h̃^k + ṽ^k, k ← k + 1; go to (1).

Note. (i) This algorithm statement is not meant to reflect the actual implementation; this question is dealt with in [4]. (ii) Theoretically, it does not matter how step (1) is performed as long as {λ^k} → λ*, where ∇f(x*) = Σ_{i=1}^t λ_i^* ∇φ_i(x*). In practice we use the least-squares solution to A_k λ = ∇f(x^k), computed using a QR decomposition of A_k (see [4]).

Next we establish that Algorithm 1 generates a sequence {x^k} which (under a convergence assumption) satisfies

‖x^{k+1} - x*‖ / ‖x^{k-1} - x*‖ → 0.
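The following self-contained sketch runs Algorithm 1 on a toy problem that is not from the paper: minimize (x_1 - 0.5)^2 + x_2^2 subject to φ(x) = x_1^2 + x_2^2 - 1 ≥ 0, whose solution x* = (1, 0) has the constraint active with λ* = 0.5. For step (2) it simply evaluates the exact projected Lagrangian Hessian; the implementation in [4] instead updates or differences an approximation.

```python
import numpy as np

# Toy problem data (assumed, for illustration only).
f_grad   = lambda x: np.array([2.0 * (x[0] - 0.5), 2.0 * x[1]])
f_hess   = lambda x: 2.0 * np.eye(2)
phi      = lambda x: np.array([x[0]**2 + x[1]**2 - 1.0])
phi_grad = lambda x: np.array([[2.0 * x[0]], [2.0 * x[1]]])   # Ak is n x t
phi_hess = lambda x: 2.0 * np.eye(2)

x_star = np.array([1.0, 0.0])
x = np.array([1.1, 0.2])                                # x0 near x*
for k in range(6):
    A, g = phi_grad(x), f_grad(x)
    lam = np.linalg.lstsq(A, g, rcond=None)[0]          # step (1): least-squares duals
    Q, _ = np.linalg.qr(A, mode='complete')
    Z = Q[:, A.shape[1]:]                               # Zk satisfying (2.4)
    GL = f_hess(x) - lam[0] * phi_hess(x)               # Lagrangian Hessian
    h = -Z @ np.linalg.solve(Z.T @ GL @ Z, Z.T @ g)     # step (3)
    v = -A @ np.linalg.solve(A.T @ A, phi(x + h))       # step (4), eq. (2.15)
    x = x + h + v                                       # step (5)
    print(k, np.linalg.norm(x - x_star))                # errors shrink rapidly
```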


2.2. 2-step superlinear convergence

Before stating and proving the major result of this section, a number of preliminary results are established. We make the following assumptions:
(A) f, φ_i, i = 1, ..., m, are twice continuously differentiable;
(B) the second-order sufficiency conditions (as in Fiacco and McCormick [10]) are satisfied at x*;
(C) {x^k} is generated by Algorithm 1, and {x^k} ⊂ W, a compact set;
(D) the columns of A(x) = (∇φ_1(x), ..., ∇φ_t(x)) are linearly independent for all x ∈ W.

We first establish that the horizontal step, h̃^k, is bounded by the distance between x^k and x*.

Lemma 1. Under assumptions (A)-(D), and assuming that there exist scalars b_1, b_2 (0 < b_1 ≤ b_2) such that

b_1 ‖y‖^2 ≤ y^T (Z_k^T B_k Z_k) y ≤ b_2 ‖y‖^2  for all k and all y ∈ R^{n-t},   (2.17)

then there exists an L_1 > 0 such that

‖h̃^k‖ ≤ L_1 ‖x^k - x*‖.

Proof. By Algorithm 1,

h̃^k = -Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T ∇f(x^k) = -Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T ∇L(x^k, λ*).

But, by (2.17), {(Z_k^T B_k Z_k)^{-1}} is bounded above; thus there exists an L̂_1 > 0 such that

‖h̃^k‖ ≤ L̂_1 ‖∇L(x^k, λ*)‖.   (2.18)

But ∇L(x*, λ*) = 0, and thus, using Lipschitz continuity, the result follows. (Note: unless stated otherwise, ‖·‖ denotes the 2-norm.)

A similar bound exists for ṽ^k.

Lemma 2. Under assumptions (A)-(D) and (2.17), there exists an L_2 > 0 such that

‖ṽ^k‖ ≤ L_2 ‖x^k - x*‖.

Proof. From Algorithm 1,

ṽ^k = -A_k (A_k^T A_k)^{-1} φ(x^k + h̃^k).

But

φ_j(x^k + h̃^k) = φ_j(x^k) + O(‖h̃^k‖),  j = 1, ..., t.

Thus,

‖ṽ^k‖ ≤ ‖A_k‖ · ‖(A_k^T A_k)^{-1}‖ · ‖φ(x^k)‖ + O(‖h̃^k‖).


But φ(x*) = 0. Thus, using the Lipschitz continuity of φ and the boundedness of ‖A_k‖ and ‖(A_k^T A_k)^{-1}‖, the result follows.

Lemma 3. Under the assumptions of Lemmas 1 and 2, there exists an L_3 > 0 such that

‖x^{k+1} - x*‖ ≤ L_3 ‖x^k - x*‖.

Proof. Follows directly from Lemmas 1 and 2.

Clearly, by definition, the columns of (A_k, Z_k) span R^n. Therefore, we can write

x^k - x* = A_k w^k + Z_k u^k,   (2.19)

where w^k ∈ R^t and u^k ∈ R^{n-t}.
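Because A_k^T Z_k = 0 and Z_k^T Z_k = I, the components in (2.19) are recovered by w^k = (A_k^T A_k)^{-1} A_k^T (x^k - x*) and u^k = Z_k^T (x^k - x*). A small sketch with assumed data:

```python
import numpy as np

A = np.array([[1.0], [1.0], [0.0]])        # stands in for Ak
Q, _ = np.linalg.qr(A, mode='complete')
Z = Q[:, 1:]                               # Zk satisfying (2.4)
e = np.array([0.3, -0.1, 0.2])             # stands in for x^k - x*
w = np.linalg.solve(A.T @ A, A.T @ e)      # component along constraint gradients
u = Z.T @ e                                # component in null(Ak^T)
assert np.allclose(A @ w + Z @ u, e)       # (2.19) holds
```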

Using these definitions, the following lemma is easy to establish.

Lemma 4. Under assumptions (A)-(D) and (2.17), if

‖w^{k+1}‖ / ‖x^{k-1} - x*‖ → 0  and  ‖u^{k+1}‖ / ‖x^{k-1} - x*‖ → 0,   (2.20)

then

‖x^{k+1} - x*‖ / ‖x^{k-1} - x*‖ → 0.

Proof. Obvious.

Lemma 4 suggests that 2-step superlinear convergence can be proved in a separable fashion: we show separately that ‖w^{k+1}‖/‖x^{k-1} - x*‖ → 0 and ‖u^{k+1}‖/‖x^{k-1} - x*‖ → 0.

Theorem 1. Under the assumptions of Lemmas 1 and 2, and assuming that {x^k} → x*,

‖w^{k+1}‖ / ‖x^k - x*‖ → 0.

Proof. From Algorithm 1,

x^{k+1} = x^k - A_k (A_k^T A_k)^{-1} φ(x^k + h̃^k) + h̃^k
        = x^k - A_k (A_k^T A_k)^{-1} φ(x^k) + h̃^k + y^k,   (2.21)

where ‖y^k‖ = O(‖h̃^k‖^2). But, for each j ∈ {1, ..., t},

φ_j(x^k) = ∇φ_j(ξ_j^k)^T (x^k - x*),   (2.22)


where ξ_j^k = x^k + θ_j^k (x* - x^k) for some θ_j^k ∈ (0, 1). Thus, if we define E_k to be the t × n matrix whose jth row is [∇φ_j(ξ_j^k) - ∇φ_j(x^k)]^T, then φ(x^k) = (A_k^T + E_k)(x^k - x*). Therefore,

‖w^{k+1}‖ ≤ ‖(A_{k+1}^T A_{k+1})^{-1}‖ · ‖A_{k+1}^T‖ · ‖E_k‖ · ‖x^k - x*‖ + (L_4 + L_5) ‖x^k - x*‖^2.   (2.25b)

But, by the definition of ‖E_k‖ and the convergence assumption, {‖E_k‖} → 0, and therefore our result follows.

Theorem 2. Under the assumptions of Theorem 1, and assuming that Z_k^T B_k Z_k → Z_*^T G_L(x*, λ*) Z_*,

‖u^{k+1}‖ / ‖x^{k-1} - x*‖ → 0.

(Note: Z_* satisfies Z_*^T Z_* = I_{n-t} and A_*^T Z_* = 0, where A_* = (∇φ_1(x*), ..., ∇φ_t(x*)).)

Proof. By Algorithm 1,

x^{k+1} = x^k - Z_k (Z_k^T B_k Z_k)^{-1} Z_k^T ∇L(x^k, λ*) + ṽ^k.   (2.26)

Define

Ē_k = Z_k [(Z_k^T B_k Z_k)^{-1} - (Z_k^T G_L(x^k, λ*) Z_k)^{-1}] Z_k^T,


and combining this with (2.26), we obtain

x^{k+1} = x^k - Z_k (Z_k^T G_L(x^k, λ*) Z_k)^{-1} Z_k^T ∇L(x^k, λ*) - Ē_k ∇L(x^k, λ*) + ṽ^k.   (2.27)

Using a Taylor expansion, there exists a matrix Ḡ_L(x^k, λ*) which satisfies

∇L(x^k, λ*) = Ḡ_L(x^k, λ*)(x^k - x*),   Ḡ_L(x^k, λ*) → G_L(x*, λ*).

Let us define a matrix Ê_k:

Ê_k = Z_k (Z_k^T G_L(x^k, λ*) Z_k)^{-1} Z_k^T [Ḡ_L(x^k, λ*) - G_L(x^k, λ*)].   (2.28)

In light of (2.28), (2.27) can be written as

x^{k+1} = x^k - Z_k (Z_k^T G_L(x^k, λ*) Z_k)^{-1} Z_k^T G_L(x^k, λ*)(x^k - x*) - Ê_k (x^k - x*) - Ē_k Ḡ_L(x^k, λ*)(x^k - x*) + ṽ^k.   (2.29)

Let us define C_k = Z_k (Z_k^T G_L(x^k, λ*) Z_k)^{-1} Z_k^T G_L(x^k, λ*) A_k. Using this definition and combining (2.19) and (2.29), we obtain

x^{k+1} = x^k - Z_k u^k - C_k w^k - Ê_k (x^k - x*) - Ē_k Ḡ_L(x^k, λ*)(x^k - x*) + ṽ^k.   (2.30)

Again, if we apply (2.19) and multiply by Z_k^T, then (2.30) reduces to

Z_k^T (x^{k+1} - x*) = -Z_k^T C_k w^k - Z_k^T (Ê_k + Ē_k Ḡ_L(x^k, λ*))(x^k - x*).   (2.31a)

Adding Z_{k+1}^T (x^{k+1} - x*) to both sides of (2.31a) and using (2.19) yields

u^{k+1} = -Z_k^T C_k w^k - Z_k^T (Ê_k + Ē_k Ḡ_L(x^k, λ*))(x^k - x*) + (Z_{k+1}^T - Z_k^T)(x^{k+1} - x*).   (2.31b)

But, using the Lipschitz continuity of the ∇φ_i's (from assumption (A)), a 'fixed method' for computing the matrices Z_k,† assumption (D), and Lemmas 1, 2 and 3,

‖(Z_{k+1}^T - Z_k^T)(x^{k+1} - x*)‖ ≤ L_6 ‖x^k - x*‖^2

for some L_6 > 0. Therefore,

‖u^{k+1}‖ / ‖x^{k-1} - x*‖ ≤ ‖C_k‖ · ‖w^k‖ / ‖x^{k-1} - x*‖ + (‖Ê_k‖ + ‖Ē_k‖ · ‖Ḡ_L(x^k, λ*)‖) · ‖x^k - x*‖ / ‖x^{k-1} - x*‖ + L_6 ‖x^k - x*‖^2 / ‖x^{k-1} - x*‖.   (2.32)

† It is assumed in several places that small changes in x produce small changes in the matrices Z_k. This is so, even under the stated assumptions on the φ_i's, only if the basis for the null space of A_k^T that gives Z_k is always the 'consistent' one; that is, the ordering and the manner in which it is computed are consistent from iteration to iteration. We have in mind Z_k = Q_k (0, I_{n-t})^T, where A_k = Q_k (R_k^T, 0)^T.


But {‖C_k‖} is bounded and, using the convergence assumption, {‖Ê_k‖} → 0. In addition, by assumption, Z_k^T B_k Z_k → Z_*^T G_L(x*, λ*) Z_*, and thus it follows that {‖Ē_k‖} → 0. But, by Lemma 3,

{‖x^k - x*‖ / ‖x^{k-1} - x*‖}

is bounded, and, by Theorem 1,

‖w^k‖ / ‖x^{k-1} - x*‖ → 0,

and therefore

‖u^{k+1}‖ / ‖x^{k-1} - x*‖ → 0.

Theorem 3. Under the assumptions:
(i) {x^k} → x*;
(ii) there exist scalars b_1, b_2 such that 0 < b_1 ≤ b_2 and b_1 ‖y‖^2 ≤ y^T (Z_k^T B_k Z_k) y ≤ b_2 ‖y‖^2 for all k and all y ∈ R^{n-t};
(iii) f, φ_i, i = 1, ..., m, are twice continuously differentiable;
(iv) the second-order sufficiency conditions hold at x*;
(v) {x^k} is generated by Algorithm 1;
(vi) Z_k^T B_k Z_k → Z_*^T G_L(x*, λ*) Z_*;
(vii) the columns of A(x*) are linearly independent;
then

‖x^{k+1} - x*‖ / ‖x^{k-1} - x*‖ → 0.

Proof. Follows directly from Lemma 4 and Theorems 1 and 2.
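A sketch of how one might observe the 2-step rate of Theorem 3 empirically: store the iterates and check that the ratios ‖x^{k+1} - x*‖ / ‖x^{k-1} - x*‖ tend to zero. The iterate history and solution here are assumed placeholders, not data from the paper.

```python
import numpy as np

def two_step_ratios(history, x_star):
    """history: list of iterates x^0, x^1, ...; returns the ratios
    ||x^{k+1} - x*|| / ||x^{k-1} - x*|| for k = 1, ..., len(history) - 2."""
    errs = [np.linalg.norm(x - x_star) for x in history]
    return [errs[k + 1] / errs[k - 1] for k in range(1, len(errs) - 1)]
```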

2.3. Local convergence

Next we establish that Algorithm 1 is locally convergent. That is, provided x^0 is sufficiently close to x*, {x^k} → x*.

Lemma 5. Suppose that x^1 and x^2 are generated by Algorithm 1 with starting vector x^0. Under assumptions (A)-(D), if x^0 is sufficiently close to x*, it follows that

‖A_2 w^2‖ ≤ ‖x^0 - x*‖.

Proof. By (2.25b),

‖A_2 w^2‖ ≤ ‖(A_2^T A_2)^{-1}‖ · ‖A_2‖ · ‖A_2^T‖ · ‖E_1‖ · ‖x^1 - x*‖ + (L_4 + L_5) ‖A_2‖ · ‖x^1 - x*‖^2.

Now, using Lemma 3, we have, for x^0 sufficiently close to x*,

‖A_2 w^2‖ ≤ ‖x^0 - x*‖.

Lemma 6. Under the assumptions of Lemma 5, for x^0 and Z_0^T B_0 Z_0 sufficiently close to x* and Z_*^T G_L(x*, λ*) Z_*, respectively,

‖u^2‖ ≤ A ‖x^0 - x*‖.

Proof. Using (2.32),

‖u^2‖ ≤ ‖C_1‖ · ‖w^1‖ + (‖Ê_1‖ + ‖Ē_1‖ · ‖Ḡ_L(x^1, λ*)‖) ‖x^1 - x*‖ + L_6 ‖x^1 - x*‖^2.   (2.33)

But, using (2.25b), we can replace ‖w^1‖ with

‖(A_1^T A_1)^{-1}‖ · ‖A_1^T‖ · ‖E_0‖ · ‖x^0 - x*‖ + (L_4 + L_5) ‖x^0 - x*‖^2.   (2.34)

In light of (2.33), (2.34) and Lemma 3, our result follows.

Theorem 4. Under the assumptions of Lemmas 5 and 6, for x^0 and Z_0^T B_0 Z_0

sufficiently close to x* and Z_*^T G_L(x*, λ*) Z_*, respectively,

{x^k} → x*,

where {x^k} is generated by Algorithm 1.

Proof. By Lemmas 5 and 6 and (2.19), ‖x^2 - x*‖ ...

... p(x') and d is the successive quadratic programming direction.

4.2. Future work

The convergence rate results presented in this paper depend on the projected Hessian approximation asymptotically approaching the true projected Hessian. The full n × n Lagrangian Hessian is never approximated, and thus computational expense is reduced. To ensure that the projected Hessian approximation approaches the true projected Hessian necessitates that an expensive method be used (such as gradient differencing along the columns of Z_k), at least in a neighbourhood of x*. In fact, the numerical results given in [4] are based on an implementation which uses a rank-2 updating procedure when far from the solution and then switches to a gradient-differencing method when nearing a solution. It is expected that a full quasi-Newton implementation of our method will be developed. This expectation is fuelled by the result of Powell [15], which states that (using the successive quadratic programming approach) the projected Hessian approximations need only be asymptotically accurate along the directions of search for superlinearity to be maintained. (This result parallels a superlinearity characterization given by Dennis and Moré [8].) We expect that a similar property holds for the method given here, and this gives hope for a full quasi-Newton implementation.

References

[1] C. Charalambous, "A lower bound for the controlling parameter of the exact penalty function", Mathematical Programming 15 (1978) 278-290.
[2] C. Charalambous, "On the conditions for optimality of the nonlinear l1 problem", Mathematical Programming 17 (1979) 123-135.
[3] T.F. Coleman and A.R. Conn, "Second-order conditions for an exact penalty function", Mathematical Programming 19 (1980) 178-185.
[4] T.F. Coleman and A.R. Conn, "Nonlinear programming via an exact penalty function: Global analysis", Mathematical Programming 24 (1982) 137-161. [This issue.]


[5] R.M. Chamberlain, "Some examples of cycling in variable metric methods for constrained optimization", Mathematical Programming 16 (1979) 378-383.
[6] R.M. Chamberlain, H.C. Pederson and M.J.D. Powell, "A technique for forcing convergence in variable metric methods for constrained optimization", presented at the Tenth International Symposium on Mathematical Programming, Montreal (1979).
[7] A.R. Conn and T. Pietrzykowski, "A penalty function method converging directly to a constrained optimum", SIAM Journal on Numerical Analysis 14 (1977) 348-375.
[8] J.E. Dennis and J.J. Moré, "Quasi-Newton methods, motivation and theory", SIAM Review 19 (1977) 46-84.
[9] L.C.W. Dixon, "On the convergence properties of variable metric recursive quadratic programming methods", presented at the Tenth International Symposium on Mathematical Programming, Montreal (1979).
[10] A.V. Fiacco and G.P. McCormick, Non-linear programming: Sequential unconstrained minimization techniques (Wiley, New York, 1968).
[11] P. Gill and W. Murray, Numerical methods for constrained optimization (Academic Press, London, 1974).
[12] S.P. Han, "A globally convergent method for nonlinear programming", Journal of Optimization Theory and Applications 22 (1977) 297-309.
[13] S.P. Han, "Superlinearly convergent variable metric algorithms for general nonlinear programming problems", Mathematical Programming 11 (1976) 263-282.
[14] W. Murray and M. Wright, "Projected Lagrangian methods based on the trajectories of penalty and barrier functions", Technical Report SOL 78-23, Department of Operations Research, Stanford University, Stanford, CA (1978).
[15] M.J.D. Powell, "The convergence of variable metric methods for nonlinearly constrained optimization calculations", in: O.L. Mangasarian, R.R. Meyer and S.M. Robinson, eds., Nonlinear Programming 3 (Academic Press, New York, 1978) pp. 27-63.
[16] W. Zangwill, "Nonlinear programming via penalty functions", Management Science 13 (1967) 344-350.