WEAK SHARP MINIMA IN MATHEMATICAL PROGRAMMING

J.V. BURKE† AND M.C. FERRIS‡

Abstract. The notion of a sharp, or strongly unique, minimum is extended to include the possibility of a non-unique solution set. Such minima will be called weak sharp minima. Conditions necessary for the solution set of a minimization problem to be a set of weak sharp minima are developed in both the unconstrained and constrained cases. These conditions are also shown to be sufficient under the appropriate convexity hypotheses. The existence of weak sharp minima is characterized in the cases of linear and quadratic convex programming and for the linear complementarity problem. In particular, we reproduce a result of Mangasarian and Meyer showing that the solution set of a linear program is always a set of weak sharp minima whenever it is nonempty. Consequences for the convergence theory of algorithms are also examined, with particular attention to conditions yielding finite termination.

1. Introduction. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{-\infty, +\infty\}$. We say that $f$ has a sharp minimum at $\bar{x} \in \mathbb{R}^n$ if
$$f(x) \ge f(\bar{x}) + \alpha \|x - \bar{x}\|$$
for all $x$ near $\bar{x}$ and some $\alpha > 0$. The notion of a sharp minimum, or equivalently, a strongly unique local minimum, has far-reaching consequences for the convergence analysis of many iterative procedures [1, 8, 11, 12, 17, 18]. In this article, we extend the notion of a sharp minimum to include the possibility of a non-unique solution set. We say that $\bar{S} \subset \mathbb{R}^n$ is a set of weak sharp minima for the function $f$ relative to the set $S \subset \mathbb{R}^n$, where $\bar{S} \subset S$, if there is an $\alpha > 0$ such that
$$(1) \qquad f(x) \ge f(y) + \alpha \operatorname{dist}(x \mid \bar{S}) \quad \text{for all } x \in S \text{ and } y \in \bar{S},$$
where
$$\operatorname{dist}(x \mid \bar{S}) := \inf_{z \in \bar{S}} \|x - z\|.$$
The constant $\alpha$ and the set $S$ are called the modulus and domain of sharpness for $f$ over $\bar{S}$, respectively. Clearly, $\bar{S}$ is a set of global minima for $f$ over $S$. The notion of weak sharp minima is easily localized. We will say that $\bar{x} \in \mathbb{R}^n$ is a local weak sharp minimum for $f$ on $S \subset \mathbb{R}^n$ if there exist a set $\bar{S} \subset S$ and a parameter $\epsilon > 0$ with $\bar{x} \in \bar{S}$ such that the set $\bar{S} \cap \{x : \|x - \bar{x}\| \le \epsilon\}$ is a set of weak sharp minima for the function
$$\tilde{f}(x) := \begin{cases} f(x), & \text{if } \|x - \bar{x}\| \le \epsilon \\ +\infty, & \text{otherwise,} \end{cases}$$
relative to the set $S$. Since the restriction to the local setting is straightforward, we will concentrate on the global definition.

This material is based on research supported by National Science Foundation Grant CCR-9157632 and Air Force Office of Scientific Research Grant AFOSR-89-0410.
† Department of Mathematics, GN-50, University of Washington, Seattle, Washington 98195.
‡ Computer Sciences Department, University of Wisconsin, Madison, Wisconsin 53706.
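Before proceeding, a small numerical sketch may help fix the definition. The following snippet (an illustration added here, not part of the original development; it assumes only NumPy) checks inequality (1) by sampling for $f(x) = |x_1|$ on $\mathbb{R}^2$, for which $\bar{S} = \{x : x_1 = 0\}$ is a set of weak sharp minima over $S = \mathbb{R}^2$ with modulus $\alpha = 1$:

```python
import numpy as np

# Check definition (1): f(x) = |x_1| on R^2 has Sbar = {x : x_1 = 0} as a set
# of weak sharp minima with modulus alpha = 1, since dist(x | Sbar) = |x_1|
# and f vanishes on Sbar.
rng = np.random.default_rng(0)

f = lambda x: abs(x[0])
dist_Sbar = lambda x: abs(x[0])              # distance to the x_2-axis
alpha = 1.0

for _ in range(1000):
    x = rng.uniform(-5, 5, size=2)           # arbitrary x in S = R^2
    y = np.array([0.0, rng.uniform(-5, 5)])  # arbitrary y in Sbar
    assert f(x) >= f(y) + alpha * dist_Sbar(x) - 1e-12
print("inequality (1) verified on random samples")
```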

The study of weak sharp minima is motivated primarily by applications in convex and convex composite programming, where such minima commonly occur. For example, they frequently arise in linear programming, linear complementarity, and least distance or projection problems. The goals of this study are to quantify this property, investigate its geometric structure, characterize its occurrence in simple convex programming problems, and finally to analyze its impact on the convergence of algorithms. Furthermore, although our primary interest is in convex programming, we also investigate the significance of weak sharp minima for nonconvex problems. In this latter case, however, rather strong regularity conditions are required to yield significant extensions of the convex case. Nonetheless, we do obtain some very interesting and significant results for differentiable problems with convex constraints. These results extend and refine earlier work of Al-Khayyal and Kyparisis [1] on the finite termination of algorithms at sharp minima. In a later study, we also show how these results can be applied to convex composite optimization problems to establish the quadratic rate of convergence of a variety of algorithms. This study builds on the work initiated in [9].

Our study begins in Section 2 with the derivation of first-order necessary conditions for the solution set of a problem to be a set of weak sharp minima. The unconstrained ($S = \mathbb{R}^n$) and constrained cases are treated separately. When the problem data is convex, it is shown that these conditions are also sufficient. In the third section these results are applied to three important classes of convex programs: quadratic programming, linear programming, and the linear complementarity problem. In the final section we examine certain tools for studying the convergence of algorithms in the presence of weak sharp minima. In particular, it is shown how one can attain finite convergence to weak sharp minima.

The notation that we employ is for the most part standard; nevertheless, a partial list is provided for the reader's convenience. The inner product on $\mathbb{R}^n$ is defined as the bilinear form
$$\langle y, x \rangle := \sum_{i=1}^n y_i x_i.$$

We denote a norm on $\mathbb{R}^n$ by $\|\cdot\|$. Each norm defines a norm dual to it, given by
$$\|x\|^\circ := \sup_{\|y\| \le 1} \langle y, x \rangle.$$
The associated closed unit balls for these norms are denoted by $\mathbb{B}$ and $\mathbb{B}^\circ$, respectively. The 2-norm plays a special role in our development and is denoted by
$$\|x\|_2 := \sqrt{\langle x, x \rangle}.$$
If it is understood from the context that we are speaking of the 2-norm, then we will drop the subscript "2" from this notation. Given two subsets $A$ and $B$ of $\mathbb{R}^n$ and $\lambda \in \mathbb{R}$, we define
$$A + \lambda B := \{a + \lambda b : a \in A,\ b \in B\}.$$

On the other hand,
$$A \setminus B := \{a \in A : a \notin B\}.$$
If $A \subset \mathbb{R}^n$, then the polar of $A$ is defined to be the set
$$A^\circ := \{x^* \in \mathbb{R}^n : \langle x^*, x \rangle \le 1 \ \forall x \in A\}.$$
This notation is consistent with the definition of the dual unit ball $\mathbb{B}^\circ$. The indicator and support functions for $A$ are given by
$$\delta(x \mid A) := \begin{cases} 0, & \text{if } x \in A \\ +\infty, & \text{otherwise,} \end{cases} \qquad \text{and} \qquad \delta^*(x^* \mid A) := \sup\{\langle x^*, x \rangle : x \in A\},$$
respectively. Moreover, we write $\operatorname{int} A$ for the interior of $A$, $\operatorname{cl} A$ for the closure of $A$, and $\operatorname{span} A$ for the linear span of the elements of $A$. The relative interior of $A$, denoted $\operatorname{ri} A$, is the interior of $A$ relative to the affine hull of $A$, which is given by
$$\operatorname{aff} A := \Big\{ \sum_{k=1}^s \lambda_k x_k \ :\ s \in \{1, 2, \dots\},\ x_k \in A \text{ and } \lambda_k \in \mathbb{R} \text{ for } k = 1, \dots, s,\ \text{with } \sum_{k=1}^s \lambda_k = 1 \Big\}.$$
The subspace perpendicular to $A$ is defined to be
$$A^\perp := \{y \in \mathbb{R}^n : \langle y, x \rangle = 0 \text{ for all } x \in A\}.$$
If $A$ is closed, then we define the projection of a point $x \in \mathbb{R}^n$ onto the set $A$ as the set of all points in $A$ that are closest to $x$ in a given norm. In this paper we will only speak of the projection with respect to the 2-norm, and it is denoted by
$$P(x \mid A) := \{y \in A : \|x - y\|_2 = \inf_{z \in A} \|x - z\|_2\}.$$
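As an illustration of the projection operator just defined, the following sketch (purely illustrative; the box and test point are hypothetical) computes $P(x \mid A)$ for a box $A = [a, b]$, where the 2-norm projection decouples coordinatewise into a clip operation:

```python
import numpy as np

# Projection onto a box A = [a, b] in the 2-norm decouples coordinatewise,
# so P(x | A) = clip(x, a, b).
a = np.array([-1.0, 0.0])
b = np.array([ 1.0, 2.0])
project_box = lambda x: np.clip(x, a, b)

x = np.array([3.0, -1.5])
p = project_box(x)                          # -> [1., 0.]

# Sanity check: p is at least as close to x as randomly sampled points of A,
# consistent with single-valuedness of P(. | A) for closed convex sets [2, 16].
rng = np.random.default_rng(1)
zs = rng.uniform(a, b, size=(1000, 2))
assert all(np.linalg.norm(x - p) <= np.linalg.norm(x - z) + 1e-12 for z in zs)
print("P(x | A) =", p)
```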

The projection is an example of a multivalued mapping on $\mathbb{R}^n$. The set $A$ is said to be convex if the line segment connecting any two points in $A$ is also contained in $A$. The convex hull of the set $A$, denoted $\operatorname{co}(A)$, is the smallest convex set which contains $A$; that is, $\operatorname{co}(A)$ is the intersection of all convex sets which contain $A$. It is interesting to note that the projection operator can be used to characterize the closed convex subsets of $\mathbb{R}^n$: the set $A$ is closed and convex if and only if the projection operator for $A$, $P(\cdot \mid A)$, is single-valued on all of $\mathbb{R}^n$ [2, 16]. Given $x \in A$, we define the normal cone to $A$ at $x$, denoted $N(x \mid A)$, to be the closure of the convex hull of all limits of the form
$$\lim_k t_k^{-1}(x_k - p_k),$$
where the sequences $\{t_k\} \subset \mathbb{R}$, $\{p_k\} \subset A$, and $\{x_k\} \subset \mathbb{R}^n$ satisfy $t_k \downarrow 0$, $p_k \in P(x_k \mid A)$, and $p_k \to x$. If $A$ is convex, one can show that this definition implies that
$$N(x \mid A) = \{x^* \in \mathbb{R}^n : \langle x^*, y - x \rangle \le 0 \ \forall y \in A\}.$$

The tangent cone to $A$ at $x$ is defined dually by the relation
$$T(x \mid A) := N(x \mid A)^\circ.$$
If $A$ is convex, we have the relation
$$T(x \mid A) = \operatorname{cl}\Big[\bigcup_{\lambda \ge 0} \lambda (A - x)\Big].$$
The contingent cone to $A$ at $x$ plays a role similar to that of the tangent cone but is, in general, larger. The contingent cone to $A$ at $x$ is given by
$$K(x \mid A) := \{d \in \mathbb{R}^n : \exists\, t_k \downarrow 0,\ d_k \to d,\ \text{with } x + t_k d_k \in A\}.$$
The set $A$ is said to be regular at $x \in A$ if $T(x \mid A) = K(x \mid A)$. In particular, every convex set is regular.

Let $f : \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$. The domain and epigraph of $f$ are given by
$$\operatorname{dom} f := \{x \in \mathbb{R}^n : f(x) < +\infty\} \quad \text{and} \quad \operatorname{epi} f := \{(x, \mu) \in \mathbb{R}^n \times \mathbb{R} : f(x) \le \mu\},$$
respectively. Observe that $f$ is lower semicontinuous (l.s.c.) if and only if $\operatorname{epi} f$ is closed. For $x \in \operatorname{dom} f$, we define the subdifferential of $f$ at $x$ to be the set
$$\partial f(x) := \{x^* : (x^*, -1) \in N((x, f(x)) \mid \operatorname{epi} f)\},$$
and the singular subdifferential of $f$ at $x$ to be the set
$$\partial^\infty f(x) := \{x^* : (x^*, 0) \in N((x, f(x)) \mid \operatorname{epi} f)\}.$$
The mappings $\partial f$ and $\partial^\infty f$ are further examples of multivalued mappings on $\mathbb{R}^n$. We observe that the set $\partial f(x) \cup \partial^\infty f(x)$ is always nonempty even though $\partial f$ may be empty at certain points. Moreover, the function $f$ is locally Lipschitzian on $\mathbb{R}^n$ if and only if $\partial f$ is nonempty and compact-valued on all of $\mathbb{R}^n$. The domain of $\partial f$ is the set
$$\operatorname{dom} \partial f := \{x \in \mathbb{R}^n : \partial f(x) \ne \emptyset\}.$$
If $f$ is convex, then this subdifferential coincides with the usual subdifferential from convex analysis. The generalized directional derivative of $f$ is the support function of $\partial f(x)$,
$$f^\circ(x; d) := \delta^*(d \mid \partial f(x)),$$
and the contingent directional derivative of $f$ at $x$ in the direction $d$ is given by
$$f^-(x; d) := \liminf_{\substack{u \to d \\ t \downarrow 0}} \frac{f(x + tu) - f(x)}{t}.$$
The relation $f^-(x; d) \le f^\circ(x; d)$ always holds. The function $f$ is said to be regular at $x$ if $f^\circ(x; d) = f^-(x; d)$, in which case the usual directional derivative
$$f'(x; d) := \lim_{t \downarrow 0} \frac{f(x + td) - f(x)}{t}$$
exists and equals this common value.
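Since the normal, tangent, and contingent cones are central to everything that follows, a small computational sketch may help fix intuition. For a box (a convex, hence regular, set) these cones are products of rays and lines; the code below (an illustration under hypothetical data, not from the paper) tests membership in $N(x \mid A)$ and spot-checks the convex-case characterization $N(x \mid A) = \{x^* : \langle x^*, y - x \rangle \le 0 \ \forall y \in A\}$ by sampling:

```python
import numpy as np

# Cones of the box A = [a, b] at a point x, computed coordinatewise: at a
# lower bound the normal directions are (-inf, 0], at an upper bound [0, inf),
# and {0} in the interior.
a, b = np.array([0.0, 0.0]), np.array([1.0, 2.0])
x = np.array([0.0, 2.0])                      # a corner of the box

def in_normal_cone(v):
    ok = True
    for i in range(len(x)):
        if x[i] == a[i]:
            ok = ok and v[i] <= 0
        elif x[i] == b[i]:
            ok = ok and v[i] >= 0
        else:
            ok = ok and v[i] == 0
    return ok

v = np.array([-3.0, 5.0])                     # a normal direction at this corner
assert in_normal_cone(v)

# Convex-case characterization: <v, y - x> <= 0 for every y in A.
rng = np.random.default_rng(2)
ys = rng.uniform(a, b, size=(1000, 2))
assert all(v @ (y - x) <= 1e-12 for y in ys)
print("v lies in N(x | A) and satisfies the variational characterization")
```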

2. Subdifferential Geometry. We begin with a study of some geometric consequences of weak sharp minima. Specifically, we are interested in first-order necessary conditions. The general unconstrained ($S = \mathbb{R}^n$) and constrained cases are treated separately. In both cases, it is shown that the necessary conditions are also sufficient under appropriate convexity hypotheses. The following preliminary result is required.

Lemma 2.1. Suppose $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is closed, proper, and convex, the sets $\bar{S} := \arg\min\{f(x) : x \in \mathbb{R}^n\}$ and $C$ are nonempty, closed, and convex subsets of $\mathbb{R}^n$ with $C \subset \bar{S}$, and $\alpha > 0$. The following are equivalent:
1. $\alpha \mathbb{B}^\circ \cap N(x \mid \bar{S}) \subset \partial f(x)$, for all $x \in C$;
2. $\alpha \mathbb{B}^\circ \cap \big[\bigcup_{x \in C} N(x \mid \bar{S})\big] \subset \bigcup_{x \in C} \partial f(x)$.

Proof. [1 ⟹ 2]: Trivial. [2 ⟹ 1]: Let $z \in C$ and $z^* \in \alpha \mathbb{B}^\circ \cap N(z \mid \bar{S})$. Then, by hypothesis, $z^* \in \partial f(u)$ for some $u \in C$. The inclusion $C \subset \bar{S}$ implies that $\partial f(u) \subset N(u \mid \bar{S})$, hence $z^* \in N(u \mid \bar{S})$, whereupon noting that $z^* \in N(z \mid \bar{S})$ gives
$$(2) \qquad \langle z^*, u \rangle = \langle z^*, z \rangle.$$
However, $z^* \in \partial f(u)$ means, by definition, that $f(y) - f(u) \ge \langle z^*, y - u \rangle$ for all $y$. Since $u, z \in \bar{S}$, $f(u) = f(z)$, so that (2) gives $f(y) - f(z) \ge \langle z^*, y - z \rangle$ for all $y$, or equivalently, $z^* \in \partial f(z)$. □

Necessary conditions for weak sharp minima in the unconstrained case now follow.

Theorem 2.2. Let $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ be lower semicontinuous and $\alpha > 0$. Consider the following statements:
1. The set $\bar{S}$ is a set of weak sharp minima for the function $f$ on $\mathbb{R}^n$ with modulus $\alpha$.
2. For all $d \in \mathbb{R}^n$,
$$f^-(\bar{x}; d) \ge \alpha \operatorname{dist}(d \mid K(\bar{x} \mid \bar{S})).$$
3. For all $d \in \mathbb{R}^n$,
$$f^\circ(\bar{x}; d) \ge \alpha \operatorname{dist}(d \mid T(\bar{x} \mid \bar{S})).$$
4. The inclusion
$$\alpha \mathbb{B}^\circ \cap N(\bar{x} \mid \bar{S}) \subset \partial f(\bar{x})$$
holds.
5. The inclusion
$$\alpha \mathbb{B}^\circ \cap \Big[\bigcup_{x \in \bar{S}} N(x \mid \bar{S})\Big] \subset \bigcup_{x \in \bar{S}} \partial f(x)$$
holds.
6. For all $y \in \mathbb{R}^n$,
$$f'(p; y - p) \ge \alpha \operatorname{dist}(y \mid \bar{S}), \quad \text{where } p \in P(y \mid \bar{S}).$$
Statement 1 implies statement 2 for all $\bar{x} \in \bar{S}$. Statement 2 implies statement 3 at points $\bar{x} \in \bar{S}$ at which $\bar{S}$ is regular. Statements 3 and 4 are equivalent. If $f$ is closed, proper, and convex and the set $\bar{S}$ is nonempty, closed, and convex, then statements 1 through 6 are equivalent, with 2, 3, and 4 holding at every point of $\bar{S}$.

Proof. [1 ⟹ 2]: Let $\bar{x} \in \bar{S}$. The hypothesis guarantees that for all $t$ and $d'$,
$$f(\bar{x} + t d') - f(\bar{x}) \ge \alpha \operatorname{dist}(\bar{x} + t d' \mid \bar{S}),$$

which implies that
$$\frac{f(\bar{x} + t d') - f(\bar{x})}{t} \ge \alpha\, \frac{\operatorname{dist}(\bar{x} + t d' \mid \bar{S}) - \operatorname{dist}(\bar{x} \mid \bar{S})}{t}.$$
By taking lim infs of both sides as $d' \to d$ and $t \downarrow 0$ and applying [4, Theorem 4], we obtain the result.

[(2 plus regularity) ⟹ 3]: Simply observe that regularity at $\bar{x} \in \bar{S}$ implies the equivalence $T(\bar{x} \mid \bar{S}) = K(\bar{x} \mid \bar{S})$, and by definition $f^\circ(\bar{x}; \cdot) \ge f^-(\bar{x}; \cdot)$.

[3 ⟺ 4]: We recall from [5, Theorem 3.1] that if $K \subset \mathbb{R}^n$ is a nonempty closed convex cone, then
$$\operatorname{dist}(x \mid K) = \delta^*(x \mid K^\circ \cap \mathbb{B}^\circ).$$
The result now follows from the fact that $f^\circ(\bar{x}; \cdot) = \delta^*(\cdot \mid \partial f(\bar{x}))$.

Observe that if $f$ is closed, proper, and convex and $\bar{S}$ is nonempty, closed, and convex, then $f$ is regular on its domain and $\bar{S}$ is regular at each of its elements. Hence either one of the statements 1 or 2 implies both 3 and 4 for all $\bar{x} \in \bar{S}$.

[(4 holds for all $\bar{x} \in \bar{S}$) ⟹ 5]: Trivial.

[(5 plus convexity) ⟹ 4]: Convexity and Lemma 2.1 combine to establish that 5 implies 4.

[(5 plus convexity) ⟹ 1]: Given $y \in \mathbb{R}^n$, Theorem 1 in [4] implies the existence of an $x^* \in \alpha \mathbb{B}^\circ \cap N(P(y \mid \bar{S}) \mid \bar{S})$ such that $\alpha \operatorname{dist}(y \mid \bar{S}) = \langle x^*, y \rangle - \delta^*(x^* \mid \bar{S})$. Thus, by hypothesis, there exists an $x \in \bar{S}$ with $x^* \in \partial f(x)$. Hence
$$f(y) \ge f(x) + \langle x^*, y - x \rangle = f(x) + \langle x^*, y \rangle - \langle x^*, x \rangle \ge f(x) + \langle x^*, y \rangle - \delta^*(x^* \mid \bar{S}) = f(x) + \alpha \operatorname{dist}(y \mid \bar{S}).$$
Since $y \in \mathbb{R}^n$ is arbitrary, the result is obtained.

[(1 plus convexity) ⟹ 6]: Let $y$ be given and define $p := P(y \mid \bar{S})$, so that $f(y) \ge f(p) + \alpha \operatorname{dist}(y \mid \bar{S}) = f(p) + \alpha \|y - p\|$. Let $z_\lambda := \lambda y + (1 - \lambda) p$ for $\lambda \in [0, 1]$. Then $p = P(z_\lambda \mid \bar{S})$ and
$$f(z_\lambda) \ge f(p) + \alpha \|z_\lambda - p\| = f(p) + \lambda \alpha \|y - p\|,$$
implying that
$$\frac{f(p + \lambda (y - p)) - f(p)}{\lambda} \ge \alpha \|y - p\|.$$
The result now follows in the limit as $\lambda \downarrow 0$.

[(6 plus convexity) ⟹ 1]: Since $f$ is convex, it follows that for all $x$ and $y$
$$f'(x; y - x) = \inf_{t > 0} \frac{f(x + t(y - x)) - f(x)}{t},$$
so that for any $y$ we may take $x = P(y \mid \bar{S}) = p$ and $t = 1$ to obtain
$$f(p + (y - p)) - f(p) \ge f'(p; y - p) \ge \alpha \operatorname{dist}(y \mid \bar{S}). \qquad \square$$

Corollary 2.3. Suppose $f$ is closed, proper, and convex and has a set of weak sharp minima, $\bar{S}$, that is nonempty, closed, convex, and compact. Then
$$0 \in \operatorname{int} \bigcup_{x \in \bar{S}} \partial f(x).$$

Proof. The corollary follows if we can show that
$$\bigcup_{x \in \bar{S}} N(x \mid \bar{S}) = \mathbb{R}^n.$$
Clearly, $\bigcup_{x \in \bar{S}} N(x \mid \bar{S}) \subset \mathbb{R}^n$, so let $y \in \mathbb{R}^n$. By the continuity of $\langle y, \cdot \rangle$ and the compactness of $\bar{S}$, there exists
$$\bar{z} \in \arg\max_{z \in \bar{S}} \langle y, z \rangle,$$
so that $\langle y, z - \bar{z} \rangle \le 0$ for all $z \in \bar{S}$. Hence $y \in N(\bar{z} \mid \bar{S})$. □
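Statement 6 of Theorem 2.2 is the form most easily checked numerically. The following sketch (an added illustration with a hypothetical function, not from the paper) approximates $f'(p; y - p)$ by a one-sided difference quotient for $f(x) = \max(|x_1| - 1, 0)$ on $\mathbb{R}^2$, whose minimizers $\bar{S} = \{x : |x_1| \le 1\}$ form a set of weak sharp minima with modulus $\alpha = 1$:

```python
import numpy as np

# Statement 6 of Theorem 2.2, checked by sampling for the convex function
# f(x) = max(|x_1| - 1, 0), with Sbar = {x : |x_1| <= 1} weakly sharp
# at modulus alpha = 1.
f = lambda x: max(abs(x[0]) - 1.0, 0.0)

def proj_Sbar(y):                       # P(y | Sbar): clip x_1 into [-1, 1]
    return np.array([np.clip(y[0], -1.0, 1.0), y[1]])

dist_Sbar = lambda y: np.linalg.norm(y - proj_Sbar(y))

rng = np.random.default_rng(3)
alpha, t = 1.0, 1e-8
for _ in range(500):
    y = rng.uniform(-5, 5, size=2)
    p = proj_Sbar(y)
    fprime = (f(p + t * (y - p)) - f(p)) / t   # one-sided difference quotient
    assert fprime >= alpha * dist_Sbar(y) - 1e-6
print("statement 6 holds on samples")
```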

In the constrained case, one must introduce a constraint qualification in order to guarantee the validity of the type of first-order optimality conditions that are required for our analysis. For the problem
$$(3) \qquad \text{minimize } f(x) \quad \text{subject to } x \in S,$$
these optimality conditions take the form
$$(4) \qquad 0 \in \partial f(\bar{x}) + N(\bar{x} \mid S).$$
The condition (4) is not always guaranteed to be valid, even in the fully convex case, and so a constraint qualification is required.

Example 2.4. Consider the problem (3) where $f : \mathbb{R} \to \overline{\mathbb{R}}$ is given by
$$f(x) := \begin{cases} -\sqrt{1 - x^2}, & \text{for } x \in [-1, 1] \\ +\infty, & \text{otherwise,} \end{cases}$$
and $S := \{x : x \le -1\}$. This is a convex program with a closed proper convex objective function having the unique global solution $\bar{x} = -1$. However, the condition (4) does not hold since $\partial f(\bar{x}) = \emptyset$.

For this reason we introduce the following constraint qualification due to Rockafellar [20].

Definition 2.5. We say that the basic constraint qualification (BCQ) for (3) is satisfied at $x \in S$ if for every $u \in \partial^\infty f(x)$ and $v \in N(x \mid S)$ such that $u + v = 0$, it must be the case that $u = v = 0$. The BCQ is said to be satisfied on a set $\bar{S} \subset S$ if it is satisfied at every point of $\bar{S}$.

From Rockafellar [20, Corollary 5.2.1], we know that the optimality condition (4) is satisfied at every local solution to (3) at which the BCQ holds. In particular, if $f$ is locally Lipschitzian on $\mathbb{R}^n$, then $\partial^\infty f(x) = \{0\}$ on all of $\mathbb{R}^n$; hence the BCQ is vacuously satisfied on all of $S$, and so (4) holds at every local minimum for (3).

Theorem 2.6. Suppose $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is lower semicontinuous and $\bar{S} \subset S$ are nonempty closed subsets of $\mathbb{R}^n$.
a) The inclusion
$$(5) \qquad \alpha \mathbb{B}^\circ \subset \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ$$
holds at $\bar{x} \in \bar{S}$ if and only if
$$(6) \qquad f^\circ(\bar{x}; z) \ge \alpha \|z\| \quad \forall\, z \in T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S}).$$
b) If $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha > 0$ such that the BCQ holds at every point of $\bar{S}$, then for each $\bar{x} \in \bar{S}$ at which $f$, $S$, and $\bar{S}$ are regular one has the inclusion (5).
c) If one further assumes that $f$ is closed, proper, and convex and the sets $S$ and $\bar{S}$ are nonempty, closed, and convex, then $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha > 0$ if and only if the inclusion (5) holds for all $\bar{x} \in \bar{S}$.

Proof.
a) We show that (5) and (6) are equivalent. Clearly, both statements are false if $\partial f(\bar{x})$ is empty, so we assume it to be nonempty. First note that (6) is equivalent to
$$(7) \qquad \sup\{\langle x^*, z \rangle \mid x^* \in \partial f(\bar{x})\} \ge \alpha \|z\| \quad \forall\, z \in T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S}).$$
We show this is equivalent to
$$(8) \qquad \sup\big\{\langle x^*, z \rangle \ \big|\ x^* \in \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ\big\} \ge \alpha \|z\| \quad \forall\, z \in \mathbb{R}^n.$$
This is accomplished in two parts. First it is shown that the supremum in (8) is infinite if $z \notin T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})$, and then it is shown that the suprema in (7) and (8) are equal if $z \in T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})$.

Suppose $z \notin T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})$. Then there exists $z^* \in \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ$ such that $\langle z^*, z \rangle > 0$. Let $x^* \in \partial f(\bar{x})$, which is nonempty by assumption, and consider $x^* + \lambda z^*$ as $\lambda \uparrow +\infty$. Since $\langle x^* + \lambda z^*, z \rangle \uparrow +\infty$ as $\lambda \uparrow +\infty$, we see that the supremum in (8) is infinite.

Now suppose that $z \in T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})$. Then
$$\sup\{\langle x^*, z \rangle \mid x^* \in \partial f(\bar{x})\} \le \sup\big\{\langle x^*, z \rangle \ \big|\ x^* \in \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ\big\},$$
since $0 \in \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ$. However,
$$\sup\big\{\langle x^*, z \rangle \ \big|\ x^* \in \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ\big\} = \sup\big\{\langle y^* + z^*, z \rangle \ \big|\ y^* \in \partial f(\bar{x}),\ z^* \in \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ\big\} \le \sup\{\langle y^*, z \rangle \mid y^* \in \partial f(\bar{x})\}$$
by the definition of a polar cone. Note that (8) is equivalent to
$$\delta^*(z \mid \alpha \mathbb{B}^\circ) \le \delta^*\big(z \ \big|\ \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ\big),$$
which is equivalent to
$$\alpha \mathbb{B}^\circ \subset \partial f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ,$$
which establishes the result.

b) The definitions imply that $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha > 0$ if and only if $\bar{S}$ is a set of weak sharp minima for the function $h(x) := f(x) + \delta(x \mid S)$ over $\mathbb{R}^n$ with modulus $\alpha > 0$. We will show that this implies (6) for every $\bar{x} \in \bar{S}$ at which $f$, $S$, and $\bar{S}$ are regular. Let $\bar{x} \in \bar{S}$ be such a point. Since $\bar{S}$ is a set of weak sharp minima for $h$ over $\mathbb{R}^n$ with modulus $\alpha > 0$, Theorem 2.2 implies that
$$h'(\bar{x}; d) \ge \alpha \operatorname{dist}(d \mid T(\bar{x} \mid \bar{S})) \quad \text{for all } d.$$
Now, by the BCQ, [20, Corollary 8.1.2], and the regularity of $S$, we know that
$$(9) \qquad h'(\bar{x}; d) \le f'(\bar{x}; d) + \delta(\cdot \mid S)'(\bar{x}; d) = f'(\bar{x}; d) + \delta(d \mid T(\bar{x} \mid S)).$$
Therefore,
$$f'(\bar{x}; d) \ge \alpha \operatorname{dist}(d \mid T(\bar{x} \mid \bar{S})) \quad \text{for all } d \in T(\bar{x} \mid S).$$
This last inequality implies (6) since $\operatorname{dist}(d \mid T(\bar{x} \mid \bar{S})) = \|d\|$ for every $d \in N(\bar{x} \mid \bar{S})$.

c) Since convexity implies regularity, half of this result has already been established in Part (b). It remains to show that (5) holding for all $\bar{x} \in \bar{S}$ implies that $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha$. It was shown in Part (a) that the statement (5) is equivalent to the statement (6). Thus we need only show that if (6) holds for all $\bar{x} \in \bar{S}$, then $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha$. To this end, let $x \in \mathbb{R}^n$ be given and set $\bar{x} = P(x \mid \bar{S})$. By (9) we only need consider the case where $x - \bar{x} \in T(\bar{x} \mid S)$. From the definition of the projection it follows that $x - \bar{x} \in N(\bar{x} \mid \bar{S})$. Therefore, $f'(\bar{x}; x - \bar{x}) \ge \alpha \|x - \bar{x}\|$ for all such $x$, and hence $h'(\bar{x}; x - \bar{x}) \ge \alpha \operatorname{dist}(x \mid \bar{S})$ for all $x$. By Theorem 2.2, $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha$. □

Corollary 2.7. Suppose $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable and $\bar{S} \subset S$ are nonempty closed subsets of $\mathbb{R}^n$.
a) The inclusion
$$(10) \qquad \alpha \mathbb{B}^\circ \subset \nabla f(\bar{x}) + \big[T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S})\big]^\circ$$
holds at $\bar{x} \in \bar{S}$ if and only if
$$\langle \nabla f(\bar{x}), z \rangle \ge \alpha \|z\| \quad \forall\, z \in T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S}).$$
b) If $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with modulus $\alpha > 0$, then for each $\bar{x} \in \bar{S}$ at which $S$ and $\bar{S}$ are regular one has the inclusion (10).
c) If one further assumes that $f$ is convex and the sets $S$ and $\bar{S}$ are nonempty, closed, and convex, then $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ with some modulus $\alpha > 0$ if and only if
$$-\nabla f(\bar{x}) \in \operatorname{int} \bigcap_{x \in \bar{S}} \big[T(x \mid S) \cap N(x \mid \bar{S})\big]^\circ.$$
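The sharpness condition of Corollary 2.7(a) is easy to verify directly in simple cases. In the following sketch (an added illustration with hypothetical data), $f(x) = \langle c, x \rangle$ with $c = (1, 0)$ is minimized over the box $S = [0, 1]^2$; the solution set is the face $\bar{S} = \{0\} \times [0, 1]$, and at $\bar{x} = (0, 0.3)$ one has $T(\bar{x} \mid S) \cap N(\bar{x} \mid \bar{S}) = [0, \infty) \times \{0\}$:

```python
import numpy as np

# Corollary 2.7(a): minimize <c, x> over S = [0,1]^2 with c = (1, 0).
# The solution set is the face Sbar = {0} x [0,1]; at xbar = (0, 0.3),
#   T(xbar | S) ∩ N(xbar | Sbar) = [0, inf) x {0},
# on which <grad f(xbar), z> = z_1 >= alpha * ||z|| holds with alpha = 1.
c = np.array([1.0, 0.0])
alpha = 1.0
for z1 in np.linspace(0.0, 10.0, 101):
    z = np.array([z1, 0.0])               # z ranges over T ∩ N
    assert c @ z >= alpha * np.linalg.norm(z) - 1e-12
print("sharpness condition of Corollary 2.7(a) holds with alpha = 1")
```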

Remark. Corollary 2.7 is a strengthening of [1, Proposition 2.2]. In particular, the equivalence in Part (a) is proven without convexity assumptions on $S$. In fact, under the convexity assumptions in Part (c), the condition given in [1] is equivalent to strong uniqueness. By relaxing strong uniqueness to the assumption of a weak sharp minimum, all the results of [1, Proposition 2.2] still follow, with the exception of uniqueness.

3. Some Special Cases. We now examine three important classes of convex programming problems and characterize when these problems possess weak sharp minima. The problem classes considered are linear and quadratic programming and the linear complementarity problem.

3.1. Quadratic Programming. We will use the results on weak sharp minima from Section 2 to obtain a necessary and sufficient condition for weak sharp minima to occur in convex quadratic programs. The quadratic programming problem is
$$(11) \qquad \text{minimize } \tfrac{1}{2} \langle x, Qx \rangle + \langle c, x \rangle \quad \text{subject to } x \in S,$$

where $S$ is polyhedral and $Q \in \mathbb{R}^{n \times n}$ is symmetric and positive semidefinite. The key to our characterization of when problem (11) has weak sharp minima is the relation (6) in Theorem 2.6. In order to apply this result we must first obtain a tractable description of the tangent cone to the solution set of (11). This is accomplished by using the description of the solution set of a convex program given in [3, 14].

Theorem 3.1. Let $\bar{S}$ be the set of solutions to the problem $\min\{f(x) : x \in S\}$, where both $f : \mathbb{R}^n \to \mathbb{R}$ and $S \subset \mathbb{R}^n$ are taken to be convex, and choose $\bar{x} \in \bar{S}$. Then
$$\bar{S} = \{x \in S \mid \nabla f(x) = \nabla f(\bar{x}),\ \langle \nabla f(\bar{x}), x - \bar{x} \rangle = 0\}.$$

It is clear that for convex quadratic programs this gives the solution set as
$$\bar{S} = S \cap \{x \mid \langle \nabla f(\bar{x}), x - \bar{x} \rangle = 0\} \cap \{x \mid \nabla^2 f(\bar{x})(x - \bar{x}) = 0\},$$
and since $S$ is polyhedral,
$$(12) \qquad T(\bar{x} \mid \bar{S}) = T(\bar{x} \mid S) \cap (\nabla f(\bar{x}))^\perp \cap \ker(\nabla^2 f(\bar{x})).$$
Note that $\nabla f(x)$ is constant on the solution set of a convex program and $\nabla^2 f(x)$ is constant for the problem (11). In the sequel, we shall use the notation $\nabla f(\bar{x})$, $\nabla^2 f(\bar{x})$ for these constants and $\operatorname{span}(d)$, $\ker(A)$ to represent the subspace generated by $d$ and the nullspace of the matrix $A$, respectively.

Theorem 3.2. Let $\bar{S}$ be the set of solutions to (11) and assume that $\bar{S}$ is nonempty. Then $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ if and only if
$$\big[\ker(\nabla^2 f(\bar{x}))\big]^\perp \subset \operatorname{span}(\nabla f(\bar{x})) + N(x \mid S) \quad \forall\, x \in \bar{S},$$
or, equivalently,
$$(\nabla f(\bar{x}))^\perp \cap T(x \mid S) \subset \ker(\nabla^2 f(\bar{x})) \quad \forall\, x \in \bar{S},$$
where $\bar{x}$ is any element of $\bar{S}$.

Proof. (⟸) We show that (6) holds. Let $x \in \bar{S}$ and $d \in T(x \mid S)$. Note that (12) and the hypothesis give
$$K := T(x \mid \bar{S})^\circ = N(x \mid S) + \operatorname{span}(\nabla f(\bar{x})) + \big[\ker(\nabla^2 f(\bar{x}))\big]^\perp = N(x \mid S) + \operatorname{span}(\nabla f(\bar{x})).$$
Therefore,
$$\operatorname{dist}(d \mid T(x \mid \bar{S})) = \delta^*(d \mid \mathbb{B} \cap K).$$
It follows from [21, page 65] that $K = \operatorname{span}(\nabla f(\bar{x})) + \big(K \cap (\nabla f(\bar{x}))^\perp\big)$; hence $z \in \mathbb{B} \cap K$ implies $z = \lambda \nabla f(\bar{x}) + y$ with $|\lambda| \le \beta$, where
$$\beta := \begin{cases} 1 / \|\nabla f(\bar{x})\|, & \text{if } \|\nabla f(\bar{x})\| \ne 0 \\ 0, & \text{otherwise,} \end{cases}$$
and $y \in K \cap (\nabla f(\bar{x}))^\perp$. Therefore,
$$\operatorname{dist}(d \mid T(x \mid \bar{S})) = \sup\big\{\langle \lambda \nabla f(\bar{x}) + y, d \rangle \ \big|\ |\lambda| \le \beta,\ y \in N(x \mid S) \cap (\nabla f(\bar{x}))^\perp\big\} \le \beta \langle \nabla f(\bar{x}), d \rangle \le \alpha^{-1} \langle \nabla f(\bar{x}), d \rangle = \alpha^{-1} f'(x; d),$$
as required. The last two inequalities follow since $d$ and $y$ are polar to each other and by choosing $\alpha \le \|\nabla f(\bar{x})\|$ when $\nabla f(\bar{x}) \ne 0$.

(⟹) Suppose that for some $x \in \bar{S}$, $T(x \mid S) \cap (\nabla f(\bar{x}))^\perp \not\subset \ker(\nabla^2 f(\bar{x}))$. Then there exists $d \in T(x \mid S) \cap (\nabla f(\bar{x}))^\perp$ with $d \notin \ker(\nabla^2 f(\bar{x}))$. Thus, from (12), $d \notin T(x \mid \bar{S})$, and so
$$\operatorname{dist}(d \mid T(x \mid \bar{S})) > 0 = \langle \nabla f(\bar{x}), d \rangle = f'(x; d),$$
which, using (6), implies that (11) does not have weak sharp minima. □

It is possible to illustrate the theorem by adapting a simple example given in [18, page 206].

Example 3.3. The problem is
$$\text{minimize}_{x \in \mathbb{R}^3}\ \tfrac{1}{2} x_1^2 + \tfrac{1}{2} x_2^2 \quad \text{subject to } x_i \in [a_i, b_i],\ i = 1, 2, 3,$$
for given $a, b \in \mathbb{R}^3$ with $a \le b$. We let $S = [a_1, b_1] \times [a_2, b_2] \times [a_3, b_3]$, note that
$$\bar{S} = \{(P(0 \mid [a_1, b_1]),\ P(0 \mid [a_2, b_2]),\ x_3) \mid x_3 \in [a_3, b_3]\},$$
and for each $x \in \bar{S}$, $\nabla f(x) = (x_1, x_2, 0)$. Also,

$$\nabla^2 f(x) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}, \quad \text{so that} \quad \ker \nabla^2 f(x) = \{0\} \times \{0\} \times \mathbb{R}.$$
Furthermore, for $x \in \bar{S}$ we have
$$T(x \mid S) = I_1(x_1) \times I_2(x_2) \times I_3(x_3),$$
where
$$I_j(x_j) = \begin{cases} [0, +\infty), & \text{if } x_j = a_j \\ \mathbb{R}, & \text{if } a_j < x_j < b_j \\ (-\infty, 0], & \text{if } x_j = b_j. \end{cases}$$
It follows that the second equivalence of Theorem 3.2 is satisfied exactly when $0 > b_i$ or $0 < a_i$ for $i = 1, 2$; that is, when the box straddles neither the hyperplane $x_1 = 0$ nor the hyperplane $x_2 = 0$. This is precisely when the problem has weak sharp minima.

A generalization of this result which does not require the set $S$ to be polyhedral is easily obtained. Observe that the argument given above only employs the polyhedrality of $S$ to establish that (12) holds. However, (12) also holds under the assumption
$$\operatorname{ri}(S - \bar{x}) \cap (\nabla f(\bar{x}))^\perp \cap \ker(\nabla^2 f(\bar{x})) \ne \emptyset$$
(see [21, Corollaries 23.8.1 and 16.4.2]), and so the following result is immediate.

Theorem 3.4. Let $\bar{S}$ be the solution set for (11), where it is no longer assumed that $S$ is polyhedral. Suppose $\bar{x} \in \bar{S}$ is such that
$$\operatorname{ri}(S - \bar{x}) \cap (\nabla f(\bar{x}))^\perp \cap \ker(\nabla^2 f(\bar{x})) \ne \emptyset.$$
Then $\bar{S}$ is a set of weak sharp minima for $f$ over $S$ if and only if
$$\big[\ker(\nabla^2 f(\bar{x}))\big]^\perp \subset \operatorname{span}(\nabla f(\bar{x})) + N(x \mid S) \quad \forall\, x \in \bar{S}.$$
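The dichotomy in Example 3.3 can also be observed numerically. The sketch below (an added illustration; the two boxes are hypothetical) estimates the sharpness ratio $(f(x) - \min f)/\operatorname{dist}(x \mid \bar{S})$ by sampling: it stays bounded away from zero for a box with $0 < a_1, a_2$, and collapses toward zero when the box straddles the hyperplanes $x_1 = 0$ and $x_2 = 0$:

```python
import numpy as np

# Example 3.3: f(x) = x_1^2/2 + x_2^2/2 over the box [a, b] in R^3.
# Weak sharpness holds iff 0 lies outside [a_i, b_i] for i = 1, 2.
def sharpness_ratio(a, b, n=20000, seed=4):
    rng = np.random.default_rng(seed)
    f = lambda x: 0.5 * (x[..., 0] ** 2 + x[..., 1] ** 2)
    xbar12 = np.clip(0.0, a[:2], b[:2])          # minimizing (x_1, x_2)
    xs = rng.uniform(a, b, size=(n, 3))
    dist = np.linalg.norm(xs[:, :2] - xbar12, axis=1)
    keep = dist > 1e-9                           # avoid 0/0 on Sbar itself
    return np.min((f(xs[keep]) - f(np.r_[xbar12, 0.0])) / dist[keep])

print(sharpness_ratio(np.array([ 1.0,  1.0, 0.0]), np.array([2.0, 2.0, 1.0])))
print(sharpness_ratio(np.array([-1.0, -1.0, 0.0]), np.array([2.0, 2.0, 1.0])))
# The first ratio stays bounded away from zero (weak sharp minima); the
# second can be made arbitrarily small by sampling closer to Sbar.
```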

3.2. Linear Programming. It was shown in [15] that the solution set of a linear program is a set of weak sharp minima. We show below how this can be obtained as a corollary to Theorem 3.2. The linear programming problem is
$$(13) \qquad \text{minimize } \langle c, x \rangle \quad \text{subject to } x \in S,$$
where $S$ is polyhedral.

Theorem 3.5. If (13) has a solution, then the set of solutions is a set of weak sharp minima for this problem.

Proof. Let $\bar{x}$ be a solution of (13). We note that for linear programming $f(x) = \langle c, x \rangle$, so that
$$\big[\ker(\nabla^2 f(\bar{x}))\big]^\perp = \{0\}.$$
It follows that
$$\big[\ker(\nabla^2 f(\bar{x}))\big]^\perp \subset \operatorname{span}(\nabla f(\bar{x})) + N(x \mid S) \quad \forall\, x \in \bar{S},$$
and so, by Theorem 3.2, (13) has weak sharp minima. □

As was done in Theorem 3.4, one can generalize this result to the case where $S$ is not assumed to be polyhedral.

Remark. It is tempting to consider parametric results for weak sharp minima. In fact, the following example shows that this is not too fruitful. Consider the linear programs $P(i)$, for $i = 1, 2, \dots$, given by
$$\text{minimize } x_1 / i + x_2 \quad \text{subject to } x \ge 0.$$
Then, as shown above, each of these problems has weak sharp minima. However, it is easy to show that there is no constant $\alpha > 0$ which serves as a modulus for all of them.

As a simple application of this result, we have the following corollary.

Corollary 3.6. Suppose $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is a proper polyhedral convex function and the problem
$$(14) \qquad \min_{x \in \mathbb{R}^n} f(x)$$
has a nonempty solution set $\bar{S}$. Then $\bar{S}$ is a set of weak sharp minima for (14).

Proof. It follows from the definition of a polyhedral convex function that $f(x) = h(x) + \delta(x \mid C)$, where
$$h(x) := \max\{\langle x, b_1 \rangle - \beta_1, \dots, \langle x, b_k \rangle - \beta_k\}$$
and
$$C := \{x : \langle x, b_{k+1} \rangle \le \beta_{k+1}, \dots, \langle x, b_m \rangle \le \beta_m\}.$$
It is clear that (14) is equivalent to
$$(15) \qquad \text{minimize } h(x) \quad \text{subject to } x \in C,$$
which in turn is equivalent to the linear program
$$(16) \qquad \text{minimize}_{(x, \gamma)}\ \gamma \quad \text{subject to } \langle x, b_i \rangle - \beta_i \le \gamma,\ i = 1, \dots, k,\quad x \in C,$$
and that the solution set of (16) is $\bar{S} \times \{h(\bar{x})\}$ for any $\bar{x} \in \bar{S}$. Theorem 3.5 implies the existence of $\alpha > 0$ such that
$$\gamma - h(\bar{x}) \ge \alpha \operatorname{dist}((x, \gamma) \mid \bar{S} \times \{h(\bar{x})\}) \ge \alpha \operatorname{dist}(x \mid \bar{S})$$
for all $(x, \gamma)$ feasible for (16). It then follows that $h(x) - h(\bar{x}) \ge \alpha \operatorname{dist}(x \mid \bar{S})$ for all $x \in C$, since $(x, h(x))$ is feasible for (16). Thus (15) has weak sharp minima, as required. □
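Theorem 3.5 can be observed on a concrete linear program. The sketch below (an added illustration with hypothetical data; it assumes SciPy's linprog) minimizes $x_1$ over $\{x \ge 0 : x_1 + x_2 \le 1\}$, whose solution set is the face $\bar{S} = \{0\} \times [0, 1]$, and samples the ratio $(\langle c, x \rangle - \min)/\operatorname{dist}(x \mid \bar{S})$, which here is identically the modulus $\alpha = 1$:

```python
import numpy as np
from scipy.optimize import linprog

# Theorem 3.5 (Mangasarian-Meyer): the solution set of a solvable LP is a set
# of weak sharp minima.  LP: minimize x_1 over {x >= 0, x_1 + x_2 <= 1},
# with solution set Sbar = {0} x [0, 1].
c = np.array([1.0, 0.0])
res = linprog(c, A_ub=np.array([[1.0, 1.0]]), b_ub=np.array([1.0]),
              bounds=[(0, None), (0, None)])
fmin = res.fun                                  # optimal value 0

dist_Sbar = lambda x: np.linalg.norm(x - np.array([0.0, np.clip(x[1], 0, 1)]))

rng = np.random.default_rng(5)
ratios = []
for _ in range(5000):
    x = rng.uniform(0, 1, size=2)
    if x.sum() <= 1 and dist_Sbar(x) > 1e-9:    # feasible, off the face
        ratios.append((c @ x - fmin) / dist_Sbar(x))
print(min(ratios))   # stays bounded away from 0 (here the modulus is 1)
```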

3.3. Sharpness for Linear Complementarity Problems. We will use the analysis given previously to show that nondegenerate monotone linear complementarity problems have weak sharp minima. This was proved in [13]. The linear complementarity problem is to find an $x \ge 0$ with $Mx + q \ge 0$ satisfying $\langle x, Mx + q \rangle = 0$. In order to study this we consider the related optimization problem
$$(17) \qquad \text{minimize } \langle x, Mx + q \rangle \quad \text{subject to } Mx + q \ge 0,\ x \ge 0.$$
Given any feasible point $x$ for (17), we define the sets
$$I(x) = \{i \mid M_i x + q_i = 0\} \quad \text{and} \quad J(x) = \{j \mid x_j = 0\}.$$
It is clear that any solution of (17) satisfies
$$I(x) \cup J(x) = \{1, \dots, n\}.$$
We make a convexity (monotonicity) assumption that $M$ is positive semidefinite and a nondegeneracy assumption that there is a solution of (17), $\hat{x}$, which satisfies
$$I(\hat{x}) \cap J(\hat{x}) = \emptyset.$$
Under these assumptions, it can be shown that any other solution $x$ of (17) satisfies $I(\hat{x}) \subset I(x)$ and $J(\hat{x}) \subset J(x)$ (see, for instance, [13, Lemma 2.2]).

Theorem 3.7. The solution set of a nondegenerate monotone linear complementarity problem (17) is a set of weak sharp minima for the problem (17).

Proof. Let $x$ be any solution of (17) and let $\hat{x}$ be the nondegenerate solution. By Theorem 3.2 we need to show
$$(\nabla f(\hat{x}))^\perp \cap T(x \mid S) \subset \ker(\nabla^2 f(\hat{x})),$$
which for this problem means
$$\big\langle (M + M^T)\hat{x} + q,\ d \big\rangle = 0,\quad M_{I(x)} d \ge 0,\quad d_{J(x)} \ge 0 \quad \Longrightarrow \quad (M + M^T) d = 0.$$
We note that
$$0 = \big\langle (M + M^T)\hat{x} + q,\ d \big\rangle = \langle M\hat{x} + q,\ d \rangle + \langle \hat{x},\ Md \rangle = \sum_{i \in J(\hat{x})} (M\hat{x} + q)_i d_i + \sum_{j \in I(\hat{x})} \hat{x}_j (Md)_j.$$
Since $I(\hat{x}) \subset I(x)$ and $J(\hat{x}) \subset J(x)$, and $M_{I(x)} d \ge 0$ and $d_{J(x)} \ge 0$, every term in both sums is nonnegative, and hence
$$\sum_{i \in J(\hat{x})} (M\hat{x} + q)_i d_i = 0 \quad \text{and} \quad \sum_{j \in I(\hat{x})} \hat{x}_j (Md)_j = 0.$$
Since, by nondegeneracy, $(M\hat{x} + q)_i > 0$ for $i \in J(\hat{x})$ and $\hat{x}_j > 0$ for $j \in I(\hat{x})$, it now follows that $d_{J(\hat{x})} = 0$ and $(Md)_{I(\hat{x})} = 0$, so that $\langle d, Md \rangle = 0$. For positive semidefinite $M$, this is equivalent to $(M + M^T) d = 0$, as required. □
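Theorem 3.7 can be checked on a small instance. In the sketch below (hypothetical data chosen purely for illustration), $M = 2I$ is positive definite and the LCP has the nondegenerate solution $\hat{x} = (1, 0)$; the objective of (17) then dominates the distance to the solution set on the feasible region with modulus 1:

```python
import numpy as np

# A nondegenerate monotone LCP: M = 2I, q = (-2, 1).  The unique solution is
# xhat = (1, 0), with I(xhat) = {0} and J(xhat) = {1} disjoint (0-based).
M = 2.0 * np.eye(2)
q = np.array([-2.0, 1.0])
xhat = np.array([1.0, 0.0])
f = lambda x: x @ (M @ x + q)                   # objective of problem (17)
assert np.isclose(f(xhat), 0.0)

rng = np.random.default_rng(6)
for _ in range(2000):
    # Feasibility (x >= 0 and Mx + q >= 0) reduces to x_1 >= 1, x_2 >= 0 here.
    x = np.array([rng.uniform(1, 4), rng.uniform(0, 3)])
    d = np.linalg.norm(x - xhat)
    if d > 1e-9:
        assert f(x) >= 1.0 * d - 1e-9           # weak sharpness, modulus 1
print("LCP objective is weakly sharp on sampled feasible points")
```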

Note that the sharpness established in this result is that the related optimization problem (17) has weak sharp minima, as opposed to an assumption of the form
$$(18) \qquad -M\hat{x} - q \in \operatorname{int} N(\hat{x} \mid \mathbb{R}^n_+)$$
as made in [1]. Using Theorem 3.7 it is easy to construct examples which are sharp in the sense given above but do not satisfy (18).

4. Finite Termination of Algorithms. In this section we study the convergence properties of algorithms for solving problems of the form
$$(19) \qquad \text{minimize } f(x) \quad \text{subject to } x \in S,$$
where it is assumed that $f : \mathbb{R}^n \to \mathbb{R}$ is differentiable and $S$ is a nonempty closed convex subset of $\mathbb{R}^n$. Under the assumption that the solution set $\bar{S}$ for (19) is a set of weak sharp minima, we will examine certain tools for identifying an element of $\bar{S}$

in a finite number of iterations. Our approach is based on the techniques developed in [6]. Consequently, we need to introduce some elementary facts concerning the face structure of convex sets. Recall that a nonempty convex subset $\hat{C}$ of a closed convex set $C$ in $\mathbb{R}^n$ is said to be a face of $C$ if every convex subset of $C$ whose relative interior meets $\hat{C}$ is contained in $\hat{C}$ (e.g., see [21, Section 18]). In fact, the relative interiors of the faces of $C$ form a partition of $C$ [21, Theorem 18.2]. Thus every point $x \in C$ can be associated with a unique face of $C$, denoted by $F(x \mid C)$, such that $x \in \operatorname{ri}(F(x \mid C))$. A face $\hat{C}$ of $C$ is said to be exposed if there is a vector $x^* \in \mathbb{R}^n$ such that $\hat{C} = E(x^* \mid C)$, where
$$E(x^* \mid C) := \arg\max\{\langle x^*, y \rangle : y \in C\}.$$
The vector $x^*$ is said to expose the face $E(x^* \mid C)$. It is well known and elementary to show that every face $\hat{C}$ of a polyhedron is exposed and that the exposing vectors are precisely the elements of $\operatorname{ri}(N(x \mid C))$ for any $x \in \operatorname{ri} \hat{C}$. With these notions in mind, we have the following key result.

Theorem 4.1. If $\bar{S}$ is a set of weak sharp minima for the problem (19) that is regular, then the set
$$K := \bigcap_{x \in \bar{S}} \big[T(x \mid S) \cap N(x \mid \bar{S})\big]^\circ$$
has nonempty interior, and for each $z \in \operatorname{int} K$ one has the inclusion $E(z \mid S) \subset \bar{S}$. If it is further assumed that the function $f$ is convex, then $\bar{S}$ is an exposed face of $S$ with exposing vector $-\nabla f(\bar{x})$ for any $\bar{x} \in \bar{S}$.

Proof. The fact that the set $K$ has nonempty interior follows immediately from Corollary 2.7; in particular, $-\nabla f(\bar{x}) \in \operatorname{int} K$ for any $\bar{x} \in \bar{S}$. Let $z \in \operatorname{int} K$ and choose $\epsilon > 0$ so that
$$z + \epsilon \mathbb{B} \subset K.$$
Then for each $x \in \bar{S}$,
$$\langle z + \epsilon \mathbb{B},\ d \rangle \le 0 \quad \text{for all } d \in T(x \mid S) \cap N(x \mid \bar{S}),$$
or, equivalently,
$$\langle z, d \rangle \le -\epsilon \|d\| \quad \text{for all } d \in T(x \mid S) \cap N(x \mid \bar{S}).$$
Hence, given $x \in S$ and $p \in P(x \mid \bar{S})$, we have $\langle z, x - p \rangle \le -\epsilon \|x - p\|$ since $(x - p) \in T(p \mid S) \cap N(p \mid \bar{S})$. Consequently, $E(z \mid S) \subset \bar{S}$.

It only remains to show that if $f$ is convex, then $E(-\nabla f(\bar{x}) \mid S) = \bar{S}$ for any $\bar{x} \in \bar{S}$. First observe that
$$(20) \qquad \nabla f(x) = \nabla f(y) \quad \text{for every } x, y \in \bar{S}$$
by Theorem 3.1. Moreover, it has already been established that $E(-\nabla f(\bar{x}) \mid S) \subset \bar{S}$. Hence the result will follow if we can show that $\langle \nabla f(\bar{x}), x \rangle = \langle \nabla f(\bar{x}), y \rangle$ for any choice of $x, y \in \bar{S}$. But this follows immediately from Theorem 3.1. □

Remark. In Theorem 4.1, the nonemptiness of the set $\operatorname{int} K$ followed from the differentiability hypothesis on $f$. In the absence of such a differentiability hypothesis, the result would, in general, be false. Indeed, one need only consider the case $f(x) := \operatorname{dist}(x \mid \bar{S})$, where $\bar{S}$ is any nonempty closed set and $S$ is any set that contains $\bar{S}$ in its interior.

In [10], the notion of minimum principle sufficiency was introduced. Assuming $\bar{x} \in \bar{S}$, we define
$$\hat{S} := \arg\min\{\langle \nabla f(\bar{x}), x \rangle \mid x \in S\}.$$
Minimum principle sufficiency (MPS) is the equality of the sets $\hat{S}$ and $\bar{S}$. Note that in the notation above $\hat{S} \equiv E(-\nabla f(\bar{x}) \mid S)$. M.V. Solodov [22] pointed out the following interesting corollary to Theorem 4.1.

Theorem 4.2. Suppose $f$ is convex and differentiable and $S \subset \mathbb{R}^n$ is a closed convex set. If $\bar{S}$ is a set of weak sharp minima for (19), then MPS is satisfied. If $S$ is polyhedral, then MPS is equivalent to $\bar{S}$ being a set of weak sharp minima for (19).

Proof. The first statement of the theorem follows from Theorem 4.1. For the second part, note that for all $x \in S$,
$$\begin{aligned}
f(x) - f(P(x \mid \bar{S})) &\ge \big\langle \nabla f(P(x \mid \bar{S})),\ x - P(x \mid \bar{S}) \big\rangle && \text{by convexity of } f \\
&= \big\langle \nabla f(\bar{x}),\ x - P(x \mid \bar{S}) \big\rangle && \text{by Theorem 3.1} \\
&= \big\langle \nabla f(\bar{x}),\ x - P(x \mid \hat{S}) \big\rangle && \text{by MPS} \\
&\ge \alpha \big\| x - P(x \mid \hat{S}) \big\| && \text{by Theorem 3.5} \\
&= \alpha \big\| x - P(x \mid \bar{S}) \big\| && \text{by MPS,}
\end{aligned}$$

where $\alpha > 0$ is the modulus supplied by Theorem 3.5, as required. □

The following simple example shows that the assumption of polyhedrality cannot be removed in the above.

Example 4.3. The problem
$$\text{minimize } x_1 \quad \text{subject to } (x_1 - 1)^2 + x_2^2 \le 1$$
has the unique solution $(0, 0)$. It is easy to see that MPS is satisfied. However, for the problem to have weak sharp minima would require the existence of $\alpha > 0$ such that
$$x_1 \ge \alpha \sqrt{x_1^2 + x_2^2}$$
for all feasible points $x$. If we consider points $x$ on the boundary of the circle, then $x_1^2 + x_2^2 = 2 x_1$, and it would follow that $x_1^2 \ge 2 \alpha^2 x_1$, which is not true for $x_1$ sufficiently small.

Another simple application of Theorem 4.1 results in the following strong upper semicontinuity result for linear programs, first proven in [19, Lemma 3.5].

Corollary 4.4. Let $S$ be a polyhedral convex set in $\mathbb{R}^n$. Let $\bar{c} \in \mathbb{R}^n$ and $\bar{S} := \arg\max_{x \in S} \langle \bar{c}, x \rangle$. Then there is a neighborhood $U$ of $\bar{c}$ such that if $c \in U$, then
$$\arg\max_{x \in S} \langle c, x \rangle = \arg\max_{x \in \bar{S}} \langle c, x \rangle.$$

Proof. If $\bar{S} = \emptyset$, the result follows from the fact that a polyhedral set has a finite number of faces and the graph of the subdifferential of a closed proper convex function is closed. Otherwise, it follows from Theorem 3.5 that $\bar{S}$ is a set of weak sharp minima for
$$\max_{x \in S} \langle \bar{c}, x \rangle.$$
By Theorem 4.1, it follows that $\bar{S} = E(\bar{c} \mid S)$ and that $E(c \mid S) \subset E(\bar{c} \mid S)$ for all $c$ in a neighborhood of $\bar{c}$. The required equality $E(c \mid S) = E(c \mid \bar{S})$ now follows easily. □

As another immediate consequence of Theorem 4.1, we obtain the following generalization of a result due to Al-Khayyal and Kyparisis [1].

Corollary 4.5. Suppose $\bar{S}$ is a set of weak sharp minima for the problem (19), and let $\{x^k\} \subset \mathbb{R}^n$. If either
a) $f$ is convex and $\{x^k\}$ is any sequence for which $\operatorname{dist}(x^k \mid \bar{S}) \to 0$ and $\nabla f$ is uniformly continuous on an open set containing $\{x^k\}$, or
b) the sequence $\{x^k\}$ converges to some $\hat{x} \in \bar{S}$, $\nabla f$ is continuous, and $\bar{S}$ is regular,
then there is a positive integer $k_0$ such that for all $k \ge k_0$ any solution of
$$(21) \qquad \text{minimize } \langle \nabla f(x^k), x \rangle \quad \text{subject to } x \in S$$
solves (19).

Proof. Let us first assume that a) holds. By Theorem 2.6,
$$(22) \qquad -\nabla f(\bar{x}) + \alpha \mathbb{B} \subset \bigcap_{x \in \bar{S}} \big[T(x \mid S) \cap N(x \mid \bar{S})\big]^\circ$$
for every $\bar{x} \in \bar{S}$, where $\alpha > 0$ is the modulus of weak sharp minimality for the set $\bar{S}$. Also, by Theorem 3.1, $\nabla f(x) = \nabla f(y)$ for all $x, y \in \bar{S}$. Consequently, the hypotheses imply the existence of an integer $k_0$ such that $\|\nabla f(x^k) - \nabla f(\bar{x})\| < \alpha$ for all $k \ge k_0$. Therefore, by Theorem 4.1, $E(-\nabla f(x^k) \mid S) \subset \bar{S}$; since the solutions of (21) are precisely the elements of $E(-\nabla f(x^k) \mid S)$, the result follows.

If b) holds, then (22) is still valid for every point $\bar{x} \in \bar{S}$. The result follows just as it did under assumption a), since $\|\nabla f(x^k) - \nabla f(\hat{x})\| \to 0$. □

The proof of this result only requires the assumption (22) to hold. Part b) of the above corollary can then be proven under the hypothesis that (22) holds only at $\hat{x}$. This is a weakening of the hypothesis that $-\nabla f(\hat{x}) \in \operatorname{int} N(\hat{x} \mid S)$ made in [1, Theorem 2.1].

Assuming that one can solve (21), Corollary 4.5 can be employed to construct hybrid iterative algorithms for solving the problem (19) that terminate finitely at weak sharp minima. All that needs to be done is to solve the problem (21) occasionally, and if an optimum is found, then stop. However, some algorithms do not require such a "fix" in order to locate weak sharp minima finitely. We show that when the objective function $f$ is convex, one can characterize those algorithms that identify weak sharp minima finitely. We begin with a result that relates the optimality condition given in Theorem 2.6 to the structure of convex subsets of the constraint region $S$.

Lemma 4.6. Let $F$ be any nonempty closed convex subset of the closed convex set $S \subset \mathbb{R}^n$. Then
$$(23) \qquad F + \bigcap_{x \in F} \big[T(x \mid S) \cap N(x \mid F)\big]^\circ \subset \bigcup_{x \in F} \big[x + N(x \mid S)\big] =: K.$$

Proof. Let $\bar{x} \in F$. We need only show that
$$\bar{K} := \bar{x} + \bigcap_{x \in F} \big[T(x \mid S) \cap N(x \mid F)\big]^\circ \subset K.$$
Let $y \in \bar{K}$ and let $\bar{y}$ be the projection of $P(y \mid S)$ onto $F$. Since $y \in \bar{K}$, there is a $z^* \in \bigcap_{x \in F} \big[T(x \mid S) \cap N(x \mid F)\big]^\circ$ such that $y = \bar{x} + z^*$. Hence
$$\begin{aligned}
0 &= \langle y - \bar{x} - z^*,\ P(y \mid S) - \bar{y} \rangle \\
&= \langle P(y \mid S) + (y - P(y \mid S)) - \bar{x} - z^*,\ P(y \mid S) - \bar{y} \rangle \\
&= \langle (P(y \mid S) - \bar{y}) + (y - P(y \mid S)) + (\bar{y} - \bar{x}) - z^*,\ P(y \mid S) - \bar{y} \rangle \\
&= \|P(y \mid S) - \bar{y}\|_2^2 + \langle y - P(y \mid S),\ P(y \mid S) - \bar{y} \rangle + \langle \bar{y} - \bar{x},\ P(y \mid S) - \bar{y} \rangle + \langle -z^*,\ P(y \mid S) - \bar{y} \rangle.
\end{aligned}$$
Observe that each of the terms in the final sum is nonnegative. The second term is nonnegative since $(y - P(y \mid S)) \in N(P(y \mid S) \mid S)$ and $-(P(y \mid S) - \bar{y}) \in T(P(y \mid S) \mid S)$. The third term is nonnegative since $\bar{x} - \bar{y} \in T(\bar{y} \mid F)$ while $(P(y \mid S) - \bar{y}) \in N(\bar{y} \mid F)$. Finally, the fourth term is nonnegative since $(P(y \mid S) - \bar{y}) \in T(\bar{y} \mid S) \cap N(\bar{y} \mid F)$. Hence each term is zero, so that $\bar{y} = P(y \mid S)$; that is, $y \in \bar{y} + N(\bar{y} \mid S) \subset K$. □
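The set $K = \bigcup_{x \in F} [x + N(x \mid S)]$ appearing in Lemma 4.6 has a simple computational signature: projecting any of its points back onto $S$ recovers the base point $x$. The sketch below (an added illustration with a hypothetical box, 2-norm throughout) checks this identity, which is also the mechanism behind the final display in the proof of Theorem 4.7 below:

```python
import numpy as np

# For S = [0,1]^2 and x on the boundary, any point x + v with v in N(x | S)
# projects back onto S at x itself: P(x + v | S) = x.
a, b = np.zeros(2), np.ones(2)
proj_S = lambda y: np.clip(y, a, b)

x = np.array([0.0, 0.4])                 # a point on the face x_1 = 0
# N(x | S) at this point is (-inf, 0] x {0}; sample normals v = (-t, 0).
for t in [0.1, 1.0, 10.0]:
    v = np.array([-t, 0.0])
    assert np.allclose(proj_S(x + v), x)
print("P(x + v | S) = x for v in N(x | S)")
```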

Remarks.
1. It should be noted that one can easily generate examples in which the inclusion (23) is strict.
2. In the fully convex and differentiable case, it was shown in Theorem 4.1 that the set of weak sharp minima $\bar{S}$ is an exposed face of the constraint region $S$. Consequently, the set $F$ in the above lemma may be taken to be the set $\bar{S}$. In this case one may write
$$K = \bigcup_{x \in \bar{S}} \big[F(x \mid S) + N(x \mid S)\big].$$

Lemma 4.6 is now employed to show that the characterization given in [6] of those algorithms that identify the optimal face of $S$ in a finite number of steps also characterizes those algorithms that identify weak sharp minima finitely.

Theorem 4.7. Suppose $f$ is convex, and let $\bar{S} \subset S$ be a set of weak sharp minima for (19). If $\{x^k\} \subset S$ is such that $\operatorname{dist}(x^k \mid \bar{S}) \to 0$ and $\nabla f$ is uniformly continuous on an open set containing $\{x^k\}$, then $x^k \in \bar{S}$ for all $k$ sufficiently large if and only if
$$(24) \qquad P(-\nabla f(x^k) \mid T(x^k \mid S)) \to 0.$$

Proof. If $x^k \in \bar{S}$ for all $k$ sufficiently large, then $-\nabla f(x^k) \in N(x^k \mid S)$ for all $k$ sufficiently large, so that (24) holds trivially. On the other hand, suppose (24) is satisfied. The Moreau decomposition of $-\nabla f(x^k)$ yields
$$-\nabla f(x^k) = P(-\nabla f(x^k) \mid T(x^k \mid S)) + P(-\nabla f(x^k) \mid N(x^k \mid S)).$$
From Theorem 3.1, we have that $\nabla f$ is constant on $\bar{S}$. Thus, for any $\bar{x} \in \bar{S}$, the hypotheses imply that
$$\big\| \nabla f(\bar{x}) + P(-\nabla f(x^k) \mid N(x^k \mid S)) \big\| \to 0,$$
and so
$$\operatorname{dist}\big(x^k + P(-\nabla f(x^k) \mid N(x^k \mid S)) \ \big|\ \bar{S} - \nabla f(\bar{x})\big) \to 0.$$
But, by Theorem 2.6,
$$\bar{S} - \nabla f(\bar{x}) \subset \operatorname{int}\Big[\bar{S} + \bigcap_{x \in \bar{S}} \big[T(x \mid S) \cap N(x \mid \bar{S})\big]^\circ\Big].$$
Thus Lemma 4.6 implies that
$$x^k + P(-\nabla f(x^k) \mid N(x^k \mid S)) \in \operatorname{int}\Big[\bar{S} + \bigcap_{x \in \bar{S}} \big[T(x \mid S) \cap N(x \mid \bar{S})\big]^\circ\Big] \subset \bigcup_{x \in \bar{S}} \big[x + N(x \mid S)\big]$$
for all $k$ sufficiently large. Therefore,
$$x^k = P\big(x^k + P(-\nabla f(x^k) \mid N(x^k \mid S)) \ \big|\ S\big) \in P\Big(\bigcup_{x \in \bar{S}} \big[x + N(x \mid S)\big] \ \Big|\ S\Big) \subset \bigcup_{x \in \bar{S}} \{x\} = \bar{S}$$
for all $k$ sufficiently large. □
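As an informal illustration of Theorem 4.7 (a sketch under hypothetical data, not an experiment from the paper), the gradient projection method applied to a weakly sharp linear program over a box identifies the solution face after finitely many steps, and the identification test (24) can be monitored directly since projections onto the tangent cone of a box decouple coordinatewise:

```python
import numpy as np

# Gradient projection on the weakly sharp problem: minimize <c, x> over the
# box S = [0,1]^2, c = (1, 0), with Sbar = {0} x [0,1].  The test (24) and
# membership in Sbar both occur once x_1 hits the bound.
c = np.array([1.0, 0.0])
proj_S = lambda x: np.clip(x, 0.0, 1.0)

def tangent_residual(x):
    # || P(-grad f(x) | T(x|S)) || for a box: drop the components of -c that
    # point out of the box at active bounds, keep the rest.
    d = -c.copy()
    d[(x <= 0.0) & (d < 0)] = 0.0
    d[(x >= 1.0) & (d > 0)] = 0.0
    return np.linalg.norm(d)

x, step = np.array([0.73, 0.40]), 0.1
for k in range(50):
    if tangent_residual(x) == 0.0:
        print(f"identified the solution face at iteration {k}: x = {x}")
        break
    x = proj_S(x - step * c)
```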

In [6], it was shown that the condition (24) is simple to check in certain cases. In particular, it was established that the standard sequential quadratic programming method and the gradient projection method both satisfy (24) and so will automatically generate sequences that terminate finitely at weak sharp minima. We should also note that Polyak [18, Exercise 2, page 209] indicates that the gradient projection method terminates finitely at weak sharp minima.

REFERENCES

[1] F.A. Al-Khayyal and J. Kyparisis. Finite convergence of algorithms for nonlinear programs and variational inequalities. Journal of Optimization Theory and Applications, 70(2): 319–332, 1991.
[2] L.N.H. Bunt. Bijdrage tot de Theorie der Convexe Puntverzamelingen. Thesis, University of Groningen, 1934.
[3] J.V. Burke and M.C. Ferris. Characterization of solution sets of convex programs. Operations Research Letters, 10(1): 57–60, 1991.
[4] J.V. Burke, M.C. Ferris, and M. Qian. On the Clarke subdifferential of the distance function to a closed set. Journal of Mathematical Analysis and its Applications, 166: 199–213, 1992.
[5] J.V. Burke and S.P. Han. A Gauss-Newton approach to solving generalized inequalities. Mathematics of Operations Research, 11: 632–643, 1986.
[6] J.V. Burke and J.J. Moré. On the identification of active constraints. SIAM Journal on Numerical Analysis, 25: 1197–1211, 1988.
[7] F.H. Clarke. Optimization and Nonsmooth Analysis. John Wiley & Sons, New York, 1983.
[8] L. Cromme. Strong uniqueness. Numerische Mathematik, 29: 179–193, 1978.
[9] M.C. Ferris. Weak sharp minima and penalty functions in mathematical programming. Technical Report 779, Computer Sciences Department, University of Wisconsin, Madison, Wisconsin 53706, June 1988.
[10] M.C. Ferris and O.L. Mangasarian. Minimum principle sufficiency. Mathematical Programming, 57(1): 1–14, 1992.
[11] R. Hettich. A review of numerical methods for semi-infinite optimization. In A.V. Fiacco and K.O. Kortanek, editors, Semi-Infinite Programming and Applications, Springer-Verlag, Berlin, 1983.
[12] K. Madsen. Minimization of Non-Linear Approximation Functions. Dr. Techn. Thesis, Institute for Numerical Analysis, The Technical University of Denmark, DK-2800 Lyngby, Denmark.
[13] O.L. Mangasarian. Error bounds for nondegenerate monotone linear complementarity problems. Mathematical Programming, 48: 437–446, 1990.
[14] O.L. Mangasarian. A simple characterization of solution sets of convex programs. Operations Research Letters, 7(1): 21–26, 1988.

[15] O.L. Mangasarian and R.R. Meyer. Nonlinear perturbation of linear programs. SIAM Journal on Control and Optimization, 17(6): 745–752, November 1979.
[16] T.S. Motzkin. Sur quelques propriétés caractéristiques des ensembles convexes. Rend. Accad. Naz. Lincei, 21: 562–567, 1935.
[17] B.T. Polyak. Sharp minima. Institute of Control Sciences Lecture Notes, Moscow, USSR, 1979. Presented at the IIASA Workshop on Generalized Lagrangians and Their Applications, IIASA, Laxenburg, Austria, 1979.
[18] B.T. Polyak. Introduction to Optimization. Optimization Software, Inc., Publications Division, New York, 1987.
[19] S.M. Robinson. Local structure of feasible sets in nonlinear programming, part II: nondegeneracy. Mathematical Programming Study, 22: 217–230, 1984.
[20] R.T. Rockafellar. Extensions of subgradient calculus with applications to optimization. Nonlinear Analysis, 9: 665–698, 1985.
[21] R.T. Rockafellar. Convex Analysis. Princeton University Press, Princeton, NJ, 1970.
[22] M.V. Solodov. Private communication.
