JOURNAL OF OPTIMIZATION THEORY AND APPLICATIONS: Vol. 50, No. 3, SEPTEMBER 1986

On Differentiable Exact Penalty Functions

C. Vinante¹ and S. Pintos²

Communicated by R. A. Tapia

Abstract. In this work, we study a differentiable exact penalty function for solving twice continuously differentiable inequality constrained optimization problems. Under certain assumptions on the parameters of the penalty function, we show the equivalence of the stationary points of this function and the Kuhn-Tucker points of the constrained problem, as well as their extreme points. Numerical experiments are presented that corroborate the theory, and a rule is given for choosing the parameters of the penalty function.

Key Words. Constrained optimization, nonlinear programming, differentiable exact penalty functions, computational methods, augmented Lagrangian functions.

1. Introduction

Considerable attention has been given in recent years to devising methods for solving nonlinear programming problems via unconstrained minimization techniques. One class of methods which has emerged as very promising is that of exact penalty function methods, which avoid the sequence of unconstrained minimization problems characteristic of the augmented Lagrangian methods. In Refs. 1-4, it is shown that, under suitable assumptions, it is possible to define a continuously differentiable function V(x, λ, ε, α) whose unconstrained minima yield the solution of the constrained problem and its associated multipliers. Moreover, this function can also be used as a line search function for various direction-finding algorithms. One such possibility was studied in Ref. 5 in relation to a recursive quadratic programming algorithm for equality constrained problems.

¹ Associate Professor, Posgrado de Ingeniería, Universidad del Zulia, Maracaibo, Venezuela.
² Assistant Professor, Department of Mathematics, Facultad de Ingeniería, Universidad del Zulia, Maracaibo, Venezuela.


For the equality constrained problem

min f(x),   subject to   g(x) = 0,

where f: R^n → R, g: R^n → R^m, Di Pillo and Grippo (Ref. 2) [see also Boggs and Tolle (Ref. 4)] studied the exact penalty function

S(x, λ, ε) = f(x) + λ^T g(x) + (1/ε)‖g(x)‖² + ‖M(∇f + ∇g λ)‖²,   (1)

for two cases: (i) M = √α I, where I is the identity matrix of appropriate dimensions, so that, if x is a regular point, M∇g is of rank m; (ii) M = √α [∇g]^T, so that M(x)∇g is invertible. Bertsekas (Ref. 1) studied the same functions in relation to enlarging the region of convergence of the Newton method and also showed that there is a close relation between these functions and the class of penalty functions of Fletcher. See also Tapia (Ref. 6). Extensions to inequality constrained problems were studied by Di Pillo and Grippo (Ref. 3) by converting inequality constraints to equality constraints using squared slack variables and a special choice of the matrix M(x), similar to case (ii) for the equality constrained case. In this work, we study the exact differentiable penalty function with M = √α I for the inequality constrained problem. For this choice of the matrix M, M∇g is no longer invertible as in Ref. 3; therefore, some special conditions have to be imposed on the parameters ε and α of the penalty function to guarantee the equivalence of its stationary points and the Kuhn-Tucker (K-T) points of the constrained problem, as well as of their local minima. This choice of the matrix M is very convenient for implementing the algorithms. Some preliminary numerical experiments are also presented with a standard set of problems. The results obtained are in agreement with those of Ref. 3 and with the rule suggested in Ref. 1 for choosing the ratio ε/α.

2. Exact Differentiable Penalty Function

The problem under consideration is the following nonlinear programming problem:

(P)   min f(x),   subject to   g(x) ≤ 0,

where f: R^n → R and g: R^n → R^m. It is assumed, unless otherwise stated, that the functions f and g are twice continuously differentiable on R^n. We denote by L(x, λ) the Lagrangian function

L(x, λ) = f(x) + λ^T g(x).   (2)


An equivalent equality constrained problem is given by

(B)   min f(x),   subject to   g_j(x) + y_j² = 0,   j = 1, 2, ..., m,

involving the additional vector y of squared slack variables. Now, as proposed in Refs. 1 and 2, the search for the solution of problem (B) can be converted into the unconstrained minimization, with respect to (x, y, λ), of the augmented function

S(x, y, λ, ε, α) = L(x, λ) + Σ_{j=1}^{m} λ_j y_j² + (1/ε) Σ_{j=1}^{m} (g_j(x) + y_j²)² + α‖∇_x L(x, λ)‖² + 4α Σ_{j=1}^{m} λ_j² y_j².   (3)

This minimization can be carried out by minimizing first with respect to y and subsequently the resulting function with respect to (x, λ). A straightforward calculation shows that

V(x, λ, ε, α) = min_y S = L(x, λ) + α‖∇_x L(x, λ)‖² + (1/ε)‖g(x)‖² − (1/ε)‖d‖²,   (4)

where the components of the vector d are given by

d_j = −min[0, g_j(x) + (ελ_j/2)(1 + 4αλ_j)].   (5)
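To see where (4) and (5) come from, the following is a brief worked sketch of the componentwise minimization over the squared slacks; the auxiliary variable u_j = y_j² is introduced here only for this sketch and does not appear in the paper.

\[
\begin{aligned}
&\min_{u_j \ge 0}\; \lambda_j(1 + 4\alpha\lambda_j)\,u_j + (1/\epsilon)\,(g_j + u_j)^2
\quad\Longrightarrow\quad
u_j^{*} = -\min\bigl[0,\; g_j + (\epsilon\lambda_j/2)(1 + 4\alpha\lambda_j)\bigr] = d_j,\\
&\lambda_j(1 + 4\alpha\lambda_j)\,d_j + (1/\epsilon)\,(g_j + d_j)^2 = (1/\epsilon)\bigl(g_j^2 - d_j^2\bigr),
\end{aligned}
\]

and summing the last identity over j, together with the terms of S that do not involve y, gives (4).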

It can be easily proved that, under the conditions of problem (P), the function V is continuously differentiable with respect to x and λ, with gradients

∇_x V(x, λ, ε, α) = (I + 2α∇²_{xx}L)∇_x L + (2/ε)∇_x g (g + d),   (6)

∇_λ V(x, λ, ε, α) = (g + d) + 2α[∇_x g]^T ∇_x L + 8αΛd,   (7)

where Λ = diag[λ_1, λ_2, ..., λ_m].
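As a concrete numerical illustration (not part of the paper), the quantities d, V, and the gradients (6)-(7) can be coded directly. The sketch below assumes NumPy and a hypothetical two-variable, one-constraint toy problem; all function names and the test point are illustrative.

```python
import numpy as np

# Hypothetical toy problem:  min x1^2 + x2^2  s.t.  1 - x1 - x2 <= 0   (m = 1).
def f(x):        return x[0]**2 + x[1]**2
def grad_f(x):   return np.array([2.0*x[0], 2.0*x[1]])
def g(x):        return np.array([1.0 - x[0] - x[1]])
def jac_g(x):    return np.array([[-1.0, -1.0]])          # m x n Jacobian

def grad_x_L(x, lam):                                      # grad_x L = grad f + (jac g)^T lam
    return grad_f(x) + jac_g(x).T @ lam

def hess_x_L(x, lam):                                      # Hessian of L in x (constraint is linear here)
    return 2.0 * np.eye(2)

def d_vec(x, lam, eps, alpha):                             # Eq. (5)
    return -np.minimum(0.0, g(x) + (eps*lam/2.0)*(1.0 + 4.0*alpha*lam))

def V(x, lam, eps, alpha):                                 # Eq. (4)
    d, gL = d_vec(x, lam, eps, alpha), grad_x_L(x, lam)
    return (f(x) + lam @ g(x) + alpha*(gL @ gL)
            + (g(x) @ g(x))/eps - (d @ d)/eps)

def grad_V(x, lam, eps, alpha):                            # Eqs. (6)-(7)
    d, gL = d_vec(x, lam, eps, alpha), grad_x_L(x, lam)
    Gx = jac_g(x).T                                        # columns are constraint gradients
    gx = (np.eye(2) + 2.0*alpha*hess_x_L(x, lam)) @ gL + (2.0/eps)*Gx @ (g(x) + d)
    gl = (g(x) + d) + 2.0*alpha*Gx.T @ gL + 8.0*alpha*lam*d
    return gx, gl

# The toy problem has the K-T point x* = (0.5, 0.5), lambda* = 1; both gradients
# vanish there, as Theorem 3.1 below predicts.
x_star, lam_star = np.array([0.5, 0.5]), np.array([1.0])
print(grad_V(x_star, lam_star, eps=0.1, alpha=0.05))
```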

3. Relation between K-T Points and Stationary Points of V

In this section, we will establish the relations between the stationary points of the function V and the Kuhn-Tucker points of problem (P).

Theorem 3.1. Let (x̄, λ̄) be a point satisfying the K-T necessary conditions for problem (P). Then, (x̄, λ̄) is a stationary point of the function V(x, λ, ε, α).

Proof. At the point (x̄, λ̄), we have

∇_x L(x̄, λ̄) = 0,   Λ̄ g(x̄) = 0,   (8)

and by (5) and (8) we obtain

g(x̄) + d(x̄, λ̄, ε, α) = 0,   Λ̄ d(x̄, λ̄, ε, α) = 0.   (9)

Therefore, (6) and (7) with (8) and (9) yield

∇_x V(x̄, λ̄, ε, α) = 0,   ∇_λ V(x̄, λ̄, ε, α) = 0,

which proves the theorem. □

To prove the converse of Theorem 3.1 with M = √α I, we state first the following lemmas.

Lemma 3.1. Let Q be an n × n positive-definite matrix, y = Qx with ‖x‖ = 1, and M = min{x^T Q x : ‖x‖ = 1}. Then,

max{|y_i| : i = 1, 2, ..., n} ≥ k,

where k = M/√n.

Proof. Assume that |y_i| < k, ∀i. Then, for x such that ‖x‖ = 1, we have

x^T Q x = x^T y ≤ Σ_{i=1}^{n} |x_i||y_i| < k Σ_{i=1}^{n} |x_i| ≤ M.

Therefore, a contradiction arises. □
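Lemma 3.1 is also easy to check numerically; the snippet below (an illustration only, not from the paper) verifies the bound for random positive-definite matrices, using the smallest eigenvalue of Q as the constant M.

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(2, 6))
    B = rng.standard_normal((n, n))
    Q = B @ B.T + n * np.eye(n)              # positive definite by construction
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)                   # unit vector
    y = Q @ x
    M = np.linalg.eigvalsh(Q)[0]             # min of x^T Q x over the unit sphere
    assert np.max(np.abs(y)) >= M / np.sqrt(n) - 1e-12
```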

Lemma 3.2. Let Q be an n × n positive-definite matrix and x(ε) a vector whose norm is an infinitesimal of order k for ε → 0. Then, the vector y = Qx(ε) has at least one component y_i whose absolute value is also an infinitesimal of order k for ε → 0.

Proof. Applying the previous lemma to x(ε)/‖x(ε)‖, there exists i such that |y_i|/‖x(ε)‖ ≥ k. Then,

|y_i| ≥ k‖x(ε)‖;   (10)

and, from properties of the norm,

‖y‖ ≤ ‖Q‖ ‖x(ε)‖.   (11)

From (10) and (11), the result follows. □


Lemma 3.3. Let X × T be a compact subset of R^n × R^m, and assume that: (i) the matrix (I + 2α∇²_{xx}L) is positive definite on X × T; (ii) every feasible point in X is regular; (iii) ε is infinitesimal of higher order than α(ε), that is, lim_{ε→0}(ε/α) = 0.

Then, there exists ε̄ > 0, such that, for ε ∈ (0, ε̄], every stationary point of V(x, λ, ε, α) satisfies the relation

g_j + d_j = 0,   j = 1, 2, ..., m,

that is,

g_j − min[0, g_j + (ελ_j/2)(1 + 4αλ_j)] = 0.   (12)

Proof. We will proceed by contradiction, assuming that the lemma is false, that is, there exists a decreasing sequence {ε_k} tending to zero such that (x_k, λ_k) are stationary points of V(x, λ, ε_k, α_k) that do not satisfy (12). Since X × T is compact, there is a subsequence of {(x_k, λ_k)} converging to some (x̄, λ̄) ∈ X × T. To simplify the notation, we will use the same nomenclature {(x_k, λ_k)} for this converging subsequence. At each stationary point (x_k, λ_k) of the function V, we have, from (6) and (7),

∇_x L = −(2/ε_k)[I + 2α_k∇²_{xx}L]^{-1}∇_x g (g + d),   (13)

g + d = −2α_k[(∇_x g)^T ⋮ 2√D][(∇_x L)^T ⋮ (2√D λ)^T]^T,   (14)

where D = diag[d_1, d_2, ..., d_m]. It can be easily shown that

2√D λ = −(2/ε_k)2√D(g + d) − 8α_k√D Λλ.   (15)

Substituting (15) and (13) in (14) yields

g + d = 4(α_k/ε_k)Q(g + d) + 32α_k²DΛλ,   (16)

with

Q = [(∇_x g)^T(I + 2α_k∇²_{xx}L)^{-1}∇_x g + 4D],   (17)

and this matrix (see Ref. 7) is positive definite in a neighborhood of (x̄, λ̄), that is, for small values of α_k. Since DΛλ = Λ²d, (16) can be expressed as

[I − 4(α_k/ε_k)Q − 32α_k²Λ²](g + d) = −32α_k²Λ²g,   (18)

or

[−(ε_k/4α_k)I + Q + 8α_kε_kΛ²](g + d) = 8α_kε_kΛ²g,   (19)


and this expression is valid for every (x_k, λ_k). From (5), it follows that the components of the vector g + d can be expressed as

g_j + d_j = max[g_j(x_k), −(ε_kλ_{jk}/2)(1 + 4α_kλ_{jk})].   (20)

As k → ∞, the matrix that multiplies g on the right-hand side of (18) tends to 0. Then, there is a k̄ sufficiently large and a corresponding pair (ε̄, ᾱ) such that, for k ≥ k̄, the points (x_k, λ_k) belong to the neighborhood of (x̄, λ̄) for which the matrix −(ε_k/4α_k)I + Q + 8α_kε_kΛ² is positive definite. To establish the contradiction, we need to prove that g + d = 0 on the points of the subsequence for k ≥ k̄. The following cases will be considered.

(a) If

g_j(x_k) ≥ −(ε_kλ_{jk}/2)(1 + 4α_kλ_{jk}),   ∀j,   (21)

then g + d = g and, from (19),

[−(ε_k/4α_k)I + Q(x_k, λ_k, ε_k, α_k)](g + d) = 0.

Since [−(ε_k/4α_k)I + Q] is positive definite, g + d = 0.

(b) If (21) is not true for every j, define the set J = {j: g_j(x_k) < −(ε_kλ_{jk}/2)(1 + 4α_kλ_{jk}); λ̄_j ≠ 0}. If J is not empty, i.e., there is at least one sequence {λ_{jk}} not tending to zero, the infinitesimal order of the vector g + d is the same as the order of ε. Since Q is positive definite, by Lemma 3.2 the infinitesimal order of some elements of the left-hand side vector in (19) is the same as the order of ε, but the corresponding elements on the right-hand side of (19) are of the form 8α_kε_kλ_{jk}²g_j(x_k), which are of higher infinitesimal order. Therefore, a contradiction arises, and g + d = 0 for k ≥ k̄.

(c) If (21) is not true for every j and if J is empty, we partition the vector g + d into two sets: set Y, where the maximum in (20) is g_j, and set Z, where the maximum is −(ε_kλ_{jk}/2)(1 + 4α_kλ_{jk}). Considering the case where the infinitesimal order of g + d is given by an element of the set Y, let it be g_j(x_k). By Lemma 3.1, there is at least one component of the left-hand side vector of (19), say the ith component, with infinitesimal order equal to that of g_j(x_k). If i ∈ Y, the corresponding component of the right-hand side vector is 8α_kε_kλ_{ik}²g_i(x_k), an infinitesimal of higher order than g_j(x_k), since |g_i(x_k)| is at most of the order of |g_j(x_k)|; this contradicts the initial assumption, and the lemma is proved. □


Based on Lemma 3.3, we now prove the following theorem.

Theorem 3.2. Let X × T be a compact subset of R^n × R^m, and let the conditions of Lemma 3.3 be satisfied. Then, there exists ε̄ > 0 and a corresponding ᾱ(ε̄), such that, for ε ∈ (0, ε̄], every stationary point (x̄, λ̄) of V(x, λ, ε, α) is a K-T point of problem (P).

Proof. At (x̄, λ̄), ∇_x V(x̄, λ̄) = ∇_λ V(x̄, λ̄) = 0 and, by Lemma 3.3,

g(x̄) + d(x̄, λ̄, ε, α) = 0.

Then, (6) yields

∇_x L(x̄, λ̄) = 0;   (22)

and from (7) we obtain Λ̄d = 0, which implies that

Λ̄g(x̄) = 0.   (23)

Since by definition d ≥ 0, we have

g(x̄) ≤ 0;   (24)

and, from (5), Λ̄d = 0, and g(x̄) + d = 0, it also follows that

λ̄ ≥ 0.   (27)

From (22), (23), (24), (27), the proof follows. □

The results of this section show that α should be taken small enough so that I + 2α∇²_{xx}L is positive definite on X × T and that, for this α, ε should be such that the matrix

−(ε/4α)I + (∇_x g)^T(I + 2α∇²_{xx}L)^{-1}∇_x g + 4D,

where the diagonal matrix D is

D(j, j) = −min[0, g_j(x) + (ελ_j/2)(1 + 4αλ_j)],

is positive definite on X × T. Under these conditions, the critical points of V(x, λ, ε, α) are equivalent to the K-T points of problem (P).
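As a practical illustration of this rule (a sketch only, and not taken from the paper), both positive-definiteness conditions can be checked numerically on a finite sample of points drawn from X × T; the sampling strategy and the callables hess_L, jac_g, g are assumptions made for the example.

```python
import numpy as np

def smallest_eig(A):
    """Smallest eigenvalue of the symmetrized matrix."""
    return np.linalg.eigvalsh(0.5 * (A + A.T))[0]

def parameters_ok(eps, alpha, sample_points, hess_L, jac_g, g):
    """Check, on sample pairs (x, lam) from X x T, that
    (i)  I + 2*alpha*Hess_L is positive definite, and
    (ii) -(eps/(4*alpha))*I + Jg (I + 2*alpha*Hess_L)^{-1} Jg^T + 4*D is positive definite,
    with D(j, j) = -min[0, g_j + (eps*lam_j/2)(1 + 4*alpha*lam_j)]."""
    for x, lam in sample_points:
        H = hess_L(x, lam)                         # Hessian of L in x at (x, lam)
        A = np.eye(H.shape[0]) + 2.0 * alpha * H
        if smallest_eig(A) <= 0.0:
            return False                           # condition on alpha fails
        Jg = jac_g(x)                              # m x n Jacobian of g
        d = -np.minimum(0.0, g(x) + (eps * lam / 2.0) * (1.0 + 4.0 * alpha * lam))
        Q = Jg @ np.linalg.solve(A, Jg.T) + 4.0 * np.diag(d)
        M = -(eps / (4.0 * alpha)) * np.eye(Q.shape[0]) + Q
        if smallest_eig(M) <= 0.0:
            return False                           # condition linking eps to alpha fails
    return True
```

In practice this suggests fixing a small α first and then decreasing ε until the second check passes, which is consistent with condition (iii) of Lemma 3.3 (ε of higher order than α).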


4. Local Optimality Results

In this section, we will establish local optimality results considering second-order derivatives, under the assumption that the functions f and g are three times continuously differentiable. With these assumptions, the first three terms in (4) are twice continuously differentiable. For the fourth term, it can be proved (see Ref. 7) that, if (x̄, λ̄) is a K-T point that satisfies the property of strict complementarity, then there exists a neighborhood of the point in which the function ‖d‖² is twice continuously differentiable. Establishing the following convention to distinguish between active and nonactive constraints,

I_A(x̄) ≜ {j: g_j(x̄) = 0},   I_B(x̄) ≜ {j: g_j(x̄) ≠ 0},

and defining the diagonal matrices

E(j, j) = 0, j ∈ I_A,   E(j, j) = 1, j ∈ I_B,

and G(j, j) = g_j(x), we have

∇_x d = −∇_x g E,   ∇_λ d = −(ε/2)(I + 8αΛ)E,

and

∇²_{xx} V = (∇²_{xx}L)(I + 2α∇²_{xx}L) + (2/ε)(∇_x g)(I − E)(∇_x g)^T,   (28)

∇²_{λx} V = (∇_x g)^T(I + 2α∇²_{xx}L) − E(∇_x g)^T,   (29)

∇²_{λλ} V = 2α(∇_x g)^T∇_x g − (ε/2)E − 8αG.   (30)

We now prove the following theorems.

Theorem 4.1. Let f and g be three times continuously differentiable; and let (x*, λ*) be a regular K-T point for problem (P) where strict complementarity is satisfied and

x^T[∇²_{xx}L(x*, λ*)]x > 0,   ∀x ≠ 0 with ∇g_j(x*)^T x = 0, j ∈ I_A(x*).   (31)

Then, there exists ε̄ and the corresponding ᾱ such that, for ε ∈ (0, ε̄], the function V(x, λ, ε, α) has a strict local minimum at (x*, λ*).

Proof. See Ref. 1, Proposition 2.1. □

We will now prove the converse of Theorem 4.1.


Theorem 4.2. With the assumptions of Theorem 3.2 and the condition of strict complementarity for the K-T points, there exists ε̄ and the corresponding ᾱ such that, for ε ∈ (0, ε̄], every local minimum (x*, λ*) of V(x, λ, ε, α) that satisfies the second-order sufficient conditions is a local minimum of problem (P) satisfying the second-order sufficient conditions

x^T[∇²_{xx}L(x*, λ*)]x > 0,   ∀x ≠ 0, with (∇_x g_j(x*))^T x = 0, j ∈ I_A(x*).   (32)

Proof. Let ε̄ be defined by the conditions of Theorem 3.2. Then, if V(x, λ, ε, α) has a local minimum at (x*, λ*), this point is also a K-T point for problem (P); and, since strict complementarity holds, we have only to show that ∇²_{xx}L(x*, λ*) is positive on the tangent hyperplane. At (x*, λ*), we have

(x, λ)^T[∇²V(x*, λ*)](x, λ) > 0,   ∀(x, λ) ≠ 0.   (33)

Let x be an element of the tangent hyperplane, and let λ be such that λ_i = 0 for i ∈ I_B(x*). Expanding (33) and defining the vectors

g_a(x*) = (g_j(x*)),   j ∈ I_A(x*),   (34)

λ_b = (λ_j),   j ∈ I_B(x*),   (35a)

λ_a = (λ_j),   j ∈ I_A(x*),   (35b)

we have

(x, λ)^T[∇²V(x*, λ*)](x, λ) = (2/ε)‖(∇_x g_a)^T x‖² + x^T∇²_{xx}L x + 2α‖∇²_{xx}L x + ∇_x g_a λ_a‖² + 2λ_a^T(∇_x g_a)^T x − 8αλ^T Gλ − (ε/2)‖λ_b‖² > 0.   (36)

By the conditions of the theorem, the first, fourth, fifth, and sixth terms of the right-hand side are zero. Therefore,

(x, λ)^T[∇²V(x*, λ*)](x, λ) = 2α‖∇²_{xx}L(x*, λ*)x + ∇_x g_a(x*)λ_a‖² + x^T[∇²_{xx}L(x*, λ*)]x > 0.   (37)

Since the norm in the first term on the right is continuous, it is bounded on the compact set, and this term tends to zero with α. Then, x^T[∇²_{xx}L(x*, λ*)]x cannot be negative. Furthermore, it cannot be zero, since this would imply that either ∇²_{xx}L(x*, λ*)x = 0 or that x and ∇²_{xx}L(x*, λ*)x are orthogonal. In the first case, taking λ_a = 0 leads to a contradiction. The second case implies that ∇²_{xx}L(x*, λ*)x belongs to the orthogonal subspace complementary to the tangent hyperplane and is a linear combination of the gradients of the active constraints, ∇g_j(x*), j ∈ I_A(x*). Then, by choosing appropriate values for the vector λ_a, we can have ‖∇²_{xx}L(x*, λ*)x + ∇_x g_a λ_a‖² = 0 and (x, λ)^T[∇²V(x*, λ*)](x, λ) = 0; a contradiction arises, and the theorem is proved. □

It can also be proved (see Ref. 7) that, if problem (P) has a global minimum at an interior K-T point, then the function V(x, λ, ε, α) also has a global minimum there. The converse result also holds under the conditions of Theorem 3.2.

5. Numerical Experiments

In this section, we present the computational experiments done on four problems studied in Ref. 3. These experiments were performed primarily to study the algorithm in relation to the parameters ε and α. The problems were solved by means of a quasi-Newton algorithm using the DFP updating formula and the stopping rule

‖(x, λ)^{k+1} − (x, λ)^k‖ ≤ tol,

and

‖V(x, λ)^{k+1} − V(x, λ)^k‖
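For completeness, here is a minimal sketch (not the authors' code) of the kind of solver just described: a DFP quasi-Newton iteration applied to V in the stacked variables z = (x, λ), with a backtracking line search and a displacement-based stopping rule; the callables V_z and grad_V_z, the tolerances, and the step-size constants are all illustrative assumptions.

```python
import numpy as np

def dfp_minimize(V_z, grad_V_z, z0, tol=1e-6, max_iter=500):
    """Quasi-Newton minimization of V with the DFP inverse-Hessian update.
    V_z and grad_V_z act on the stacked vector z = (x, lambda)."""
    z = np.asarray(z0, dtype=float)
    H = np.eye(z.size)                       # inverse-Hessian approximation
    grad = grad_V_z(z)
    for _ in range(max_iter):
        p = -H @ grad                        # quasi-Newton search direction
        t, f0, slope = 1.0, V_z(z), grad @ p
        # Backtracking (Armijo) line search on V.
        while V_z(z + t * p) > f0 + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        z_new = z + t * p
        s = z_new - z
        if np.linalg.norm(s) <= tol:         # stopping rule on the displacement
            return z_new
        grad_new = grad_V_z(z_new)
        y = grad_new - grad
        sy, Hy = s @ y, H @ y
        if sy > 1e-12:                       # DFP update of the inverse Hessian
            H = H + np.outer(s, s) / sy - np.outer(Hy, Hy) / (y @ Hy)
        z, grad = z_new, grad_new
    return z
```

The functions from the earlier sketch can supply V_z and grad_V_z by splitting z into its x and λ parts and concatenating the two gradient blocks from (6)-(7).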