A Bundle Trust Region Algorithm for Bilinear Bilevel Programming

S. Dempe$^1$, J.F. Bard$^2$

Abstract

The bilevel programming problem (BLPP) is equivalent to a two-person Stackelberg game in which the leader and follower pursue individual objectives. Play is sequential, and the choices of one affect the choices and attainable payoffs of the other. The purpose of this paper is to investigate an extension of the linear BLPP where both players' objective functions are bilinear. To overcome certain discontinuities in the master problem, a regularization term is added to the follower's objective function. Using ideas from parametric programming, the directional derivatives of the regularized follower's solution function are computed along with its generalized Jacobian. This allows us to develop a bundle trust region algorithm. Theoretical results related to the existence of solutions are presented as well as a convergence analysis of the proposed methodology.

Key words: bilevel programming, bundle algorithm, Lipschitz continuity, generalized gradients, nondifferentiable optimization.

$^1$ Freiberg University of Mining and Technology, Germany, [email protected]
$^2$ Graduate Program in Operations Research, University of Texas, Austin, U.S.A., [email protected]

1 Introduction

Bilevel programming is a model for a static, two-person game in which each player tries to optimize his or her individual objective function subject to a set of interdependent constraints [1, 2, 3]. Play is sequential and assumed to be uncooperative. The decision variables are partitioned amongst the players, and the choices of one affect the payoffs and choices available to the other. However, neither player can completely dominate his opponent, nor are solutions likely to be Pareto-optimal as they are in a more traditional two-person game. Applications include government regulation, management of a decentralized firm, and the standard min-max problem (e.g., see [4]). In each of these situations, the leader goes first, and in selecting a strategy, must anticipate all possible reactions of the follower.

In this paper, we consider a bilinear formulation of the bilevel programming problem (BLPP) motivated by an application concerned with determining tax credits for biofuel producers [5]. The specific model is

$$y^\top Q_1 x - P_1 x \;\to\; \text{``min''}_y \qquad (1)$$
$$B_1 y \le b_1 \qquad (2)$$
$$A_1 x \le a_1 \qquad (3)$$
$$x \in \Psi(y) \qquad (4)$$

where $x \in \mathbb{R}^n$, $y \in \mathbb{R}^m$, $b_1 \in \mathbb{R}^{p_1}$, $a_1 \in \mathbb{R}^{q_1}$, all matrices are of appropriate dimension, and $\Psi : \mathbb{R}^m \to 2^{\mathbb{R}^n}$ is a point-to-set map known as the rational reaction set. In our case, $\Psi(y)$ denotes the solution set of the following linear optimization problem with the parameter $y \in \mathbb{R}^m$ in the objective function:

$$y^\top Q_2 x \;\to\; \min_x \qquad (5)$$
$$A_2 x \le a_2 \qquad (6)$$
$$x \ge 0 \qquad (7)$$

Here, $a_2 \in \mathbb{R}^{q_2}$ and the matrices $A_2$, $Q_2$ are again of appropriate dimension. The upper level problem defined by (1)–(4) is associated with the leader, while the lower level problem (5)–(7) is associated with the follower. Note that the lower level feasible region (6)–(7) does not depend on the parameter $y$. This implies that the optimal solution of the follower's problem can only be unique for all parameter values when it is constant for all $y$ and hence independent of $y$. In this case, there is no reason to speak of a bilevel problem, so it will be ruled out in the sequel. Although constraints (2) and (6) can be extended with little consequence to include terms in the follower's and leader's variables, respectively, we refrain from doing so. The current formulation most closely reflects the application we have in mind, which includes several thousand lower level variables and constraints.

To interpret the full model (1)–(7), consider a hierarchical decision problem in which the first player selects a $y$ and announces his choice to the second player, who then computes his response $x = x(y)$ by solving the linear optimization problem (5)–(7) for $y$ fixed. The solution $x(y)$ is conveyed back to the leader, who is now able to evaluate his objective function (1) as well as determine whether or not $y$ has produced a feasible response with respect to (3).
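To make this sequence of play concrete, the sketch below carries out one round numerically: the follower's LP is solved for a fixed $y$, and the leader then evaluates (1) and checks (3). All problem data are randomly generated placeholders (not the data of the biofuel application), and scipy's linprog stands in for the LP code assumed later in Section 5.

```python
# One round of play in the model (1)-(7): the follower solves the LP (5)-(7)
# for the announced y, then the leader evaluates (1) and checks (3). All data
# are randomly generated placeholders, not the biofuel application's data.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, q1, p1 = 3, 2, 2, 2
Q1, Q2 = rng.standard_normal((m, n)), rng.standard_normal((m, n))
P1 = rng.standard_normal(n)
A1, a1 = rng.standard_normal((q1, n)), rng.random(q1) + 1.0
B1, b1 = rng.standard_normal((p1, m)), rng.random(p1) + 1.0
A2, a2 = np.eye(n), np.ones(n)             # follower region: 0 <= x <= 1

def follower_response(y):
    """Solve (5)-(7): min_x (Q2^T y)^T x  s.t.  A2 x <= a2, x >= 0."""
    res = linprog(c=Q2.T @ y, A_ub=A2, b_ub=a2, bounds=[(0, None)] * n)
    assert res.status == 0
    return res.x

y = rng.standard_normal(m)                 # the leader announces y
x = follower_response(y)                   # the follower reacts with some x(y)
print("leader objective (1):", y @ Q1 @ x - P1 @ x)
print("feasible w.r.t. (3):", bool(np.all(A1 @ x <= a1 + 1e-9)))
```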

In trying to solve the BLPP, the leader must consider all values of $y$ that produce a feasible response and select the one that minimizes (1). This interpretation emphasizes the implicit nature of the problem.

To ensure that (1)–(7) is well posed, we assume that the feasible region (2)–(3) is nonempty and that for each decision taken by the leader, the follower has some room to respond; i.e., $\Psi(y) \neq \emptyset$. Even with these assumptions, though, the BLPP may not have a solution in the normal sense. In general, for all values of $y$ for which $\Psi(y)$ is not a singleton, the leader is not able to compute his objective function value or to check whether his decision is indeed feasible prior to the follower selecting an element in $\Psi(y)$. The quotation marks around the `min' operator in the leader's objective function (1) are included to call attention to this potential ambiguity. The majority of the algorithms developed for solving bilevel programs ignore this issue and assume that $\Psi(y)$ is a point-to-point map [2]. Nevertheless, for problem (5)–(7) it is an easy matter to check any algorithmic solution, say $\hat{x} \in \Psi(\hat{y})$, to see if it is unique, and if not, to determine whether any other point in $\Psi(\hat{y})$ yields a smaller objective function value in (1) than $\hat{y}^\top Q_1 \hat{x} - P_1 \hat{x}$. In the next section, we examine the point-to-set nature of $\Psi(y)$ and discuss different methods for dealing with it.

The most common algorithmic approach to bilevel programs is based on solving the nonlinear program obtained by replacing the lower level problem with its Karush-Kuhn-Tucker conditions. Penalty methods or branch and bound centering on the resulting complementarity conditions are then used to achieve convergence (see [2, 3] for several implementations). The effectiveness of these approaches is mainly limited to the case where the leader's objective function is linear and the follower's is convex-quadratic. Shimizu and Lu [6] used an equivalent formulation with an equality formed by the difference of two convex functions to establish a penalty approach converging to a global optimum. Loridan and Morgan [7] developed general convergence properties for related computational methods.

In a different vein, arguments common to sensitivity analysis in parametric nonlinear programming can provide information on the directional derivative of the optimal solution $x(y)$ of the lower level problem, although its gradient is not likely to exist. It is then possible to derive optimality conditions for the general BLPP (e.g., see [8, 9]). Building on this idea, a number of researchers have developed bundle methods that exploit subgradient information to compute directions of descent for the upper level objective function [10, 11, 12]. Under the standard assumptions that the strong second-order sufficient optimality condition and the linear independence constraint qualification hold at the solution to the follower's problem, it can be shown that $x(y)$ is piecewise differentiable and that its generalized Jacobian has a manageable structure.

We take a similar approach. All that is needed is Lipschitz continuity to obtain various types of generalized gradients. Unfortunately, the solution set mapping $\Psi(\cdot)$ of problem (5)–(7) cannot be assumed to be lower semicontinuous.

For each choice of the leader, the follower selects one element $x(y) \in \Psi(y)$, but the resulting selection function is not, in general, continuous. Hence, if we insert this selection function into the leader's objective function (1), we do not get a Lipschitz continuous function. To overcome this difficulty we introduce a regularization term in the lower level objective function (5). The new problem is a strictly convex quadratic program designed to have a unique optimal solution for each value of the parameter $y$. This implies that the reaction function will be Lipschitz continuous, and allows us to use the above approach without the need for any regularity assumption.

2 Theoretical developments

2.1 Existence of solutions

Recall that if the lower level optimal solution is not uniquely determined, the leader will be unable to evaluate his objective function value and to check if his decision is feasible before he is aware of the follower's real choice. The literature discusses several ways out of this situation, each requiring some assumptions about the level of cooperation between the players.

1. Optimistic approach (e.g., [13, 14, 15, 16]). Here it is assumed that the leader is able to influence the follower's choice of elements in $\Psi(y)$. This leads to the `min-min' problem

$$\min_y \Big\{ \min_{x \in \Psi(y)} \{ y^\top Q_1 x - P_1 x : A_1 x \le a_1 \} : B_1 y \le b_1 \Big\}, \qquad (8)$$

which is appropriate, for example, in an economic setting where the follower can participate in the profits realized by the leader [14].

2. Pessimistic approach (e.g., [17, 18]). Here the leader tries to bound the damage resulting from an unfavourable decision by the follower, giving rise to the `min-max' problem

$$\min_y \Big\{ \max_{x \in \Psi(y)} \{ y^\top Q_1 x - P_1 x : A_1 x \le a_1 \} : B_1 y \le b_1 \Big\}. \qquad (9)$$

A comparison of these approaches can be found in [19]. Both (8) and (9) result in a three-level programming problem and are likely to yield different optimal solutions. To establish the existence of optimal solutions, weaker assumptions are needed for the optimistic model. To show this, we need to introduce the idea of set continuity.

Definition 2.1 A point-to-set mapping $\Gamma : \mathbb{R}^s \to 2^{\mathbb{R}^t}$ sending points $z \in \mathbb{R}^s$ into subsets of $\mathbb{R}^t$ is called upper semicontinuous at a point $z^0 \in \mathbb{R}^s$ if for each open set $M \supseteq \Gamma(z^0)$ there exists an open neighborhood $U$ of $z^0$ such that $\Gamma(z) \subseteq M$ for all $z \in U$.

Theorem 2.1 Let the point-to-set mapping $\Psi(\cdot)$ be upper semicontinuous at every $y$ and let the set $\{(x, y) : B_1 y \le b_1,\; A_1 x \le a_1,\; A_2 x \le a_2,\; x \ge 0\}$ be bounded. Then, if problem (8) has a feasible solution, it has an optimal solution.

Proof: The boundedness assumption implies that the function values of (1) are bounded from below on the feasible set. Let $v$ denote the infimal value of (1) with respect to (2)–(4). Then there exists a sequence $\{(x^k, y^k)\}_{k=1}^\infty$ such that
$$\lim_{k\to\infty} \left( (y^k)^\top Q_1 x^k - P_1 x^k \right) = v, \qquad B_1 y^k \le b_1,\; A_1 x^k \le a_1,\; x^k \in \Psi(y^k) \quad \forall\, k.$$
By the boundedness assumption again, there is at least one accumulation point $(x^0, y^0)$ of the sequence $\{(x^k, y^k)\}_{k=1}^\infty$. Without loss of generality, let $\lim_{k\to\infty} (x^k, y^k) = (x^0, y^0)$. By upper semicontinuity of the mapping $\Psi(\cdot)$, we have $x^0 \in \Psi(y^0)$. Hence, $(x^0, y^0)$ is feasible for the leader's problem, and by continuity of (1) we conclude $(y^0)^\top Q_1 x^0 - P_1 x^0 = v$. $\Box$

To show existence of optimal solutions in the pessimistic case we need lower semicontinuity of the point-to-set mapping $\Psi(\cdot)$ [18]. Note that upper semicontinuity of $\Psi(\cdot)$ is implied by boundedness of $\{x \ge 0 : A_2 x \le a_2\}$, while uniqueness of the optimal solution is sufficient to guarantee lower semicontinuity of this mapping [20].

Other approaches used to derive tractable optimization problems rest on regularizing the lower level problem by adding a strongly convex term to the objective function (e.g., see [21, 22, 23]). Let $x^0 \in \mathbb{R}^n$ be a given point, let $\varepsilon > 0$, and consider the regularized lower level problem

$$y^\top Q_2 x + \varepsilon \|x - x^0\|^2 \;\to\; \min_x \qquad (10)$$
$$A_2 x \le a_2 \qquad (11)$$
$$x \ge 0 \qquad (12)$$

with the solution set mapping $\Psi_\varepsilon(\cdot)$. By strong convexity of the objective function (10), if the set of feasible solutions (11)–(12) is not empty, the optimal solution of (10)–(12) is uniquely determined for each $y$ and each $\varepsilon > 0$. Moreover, the unique optimal solution function $\{x_\varepsilon(y)\} = \Psi_\varepsilon(y)$ is locally Lipschitz continuous [24]. Hence, if we insert $x_\varepsilon(y)$ into the upper level objective function (1), the resulting function

$$F_\varepsilon(y) := y^\top Q_1 x_\varepsilon(y) - P_1 x_\varepsilon(y)$$
is now Lipschitz continuous for $\varepsilon$ fixed. Under similar assumptions to those of Theorem 2.1 (boundedness and the existence of feasible solutions), problem (1)–(3) with $x$ replaced by $x_\varepsilon(y)$ has an optimal solution. In addition [20], for $\varepsilon$ tending to zero we obtain a feasible solution for problem (1)–(4) since
$$\lim_{\varepsilon \to 0,\; y \to y^0} x_\varepsilon(y) \in \Psi(y^0).$$
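As an illustration, the regularized follower problem (10)–(12) can be handed to any convex QP solver. The sketch below (with illustrative placeholder data) uses scipy's SLSQP method; for each $y$ the strictly convex objective yields the unique solution $x_\varepsilon(y)$, and nearby parameters give nearby solutions, reflecting the Lipschitz property.

```python
# Sketch: computing x_eps(y) for the regularized follower problem (10)-(12)
# with scipy's SLSQP (illustrative placeholder data; any convex QP solver
# would do). Strict convexity for eps > 0 makes the minimizer unique.
import numpy as np
from scipy.optimize import minimize

n = 3
Q2 = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]])           # m = 2 rows
A2, a2 = np.vstack([np.eye(n), -np.eye(n)]), np.ones(2 * n)  # a small polytope
x0 = np.zeros(n)    # prox center x^0 of the regularizing term
eps = 0.1           # regularization parameter eps > 0

def x_eps(y):
    c = Q2.T @ y    # y^T Q2 x = (Q2^T y)^T x
    res = minimize(lambda x: c @ x + eps * np.sum((x - x0) ** 2), x0,
                   jac=lambda x: c + 2.0 * eps * (x - x0),
                   constraints=[{"type": "ineq", "fun": lambda x: a2 - A2 @ x}],
                   bounds=[(0.0, None)] * n, method="SLSQP")
    return res.x

y = np.array([1.0, -0.5])
print(x_eps(y))           # the unique x_eps(y)
print(x_eps(y + 1e-4))    # nearby y gives a nearby solution (Lipschitz)
```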

The following example shows, however, that the above approach cannot be relied upon to produce an optimistic optimal solution [25].

Example: Consider the bilevel programming problem
$$\min_y \{ (x - y)^2 + y^2 : -20 \le y \le 20,\; x \in \Psi(y) \}$$
where
$$\Psi(y) = \operatorname*{argmin}_x \{ xy : -y - 1 \le x \le -y + 1 \}.$$
The objective function $f(x, y) = (x - y)^2 + y^2 \ge 0$ for all $x, y$. For $x = y = 0$, we have $f(0, 0) = 0$. When the leader selects $y = 0$, the rational reaction set is $\Psi(0) = [-1, 1]$. Hence, $0 \in \Psi(0)$ and $x = y = 0$ gives the optimistic optimal solution. For all feasible values of $y$, we have:
$$\min_x \{ f(x, y) : x \in \Psi(y) \} = f(-y - 1,\, y) > 0 \quad \text{for } y > 0,$$
$$\min_x \{ f(x, y) : x \in \Psi(y) \} = f(-y + 1,\, y) > 0 \quad \text{for } y < 0,$$
$$\min_x \{ f(x, y) : x \in \Psi(y) \} = f(0, 0) = 0 \quad \text{for } y = 0.$$
For $0 < \varepsilon < 0.25$ (or equivalently, $0 < 4\varepsilon < 1$), we have
$$x_\varepsilon(y) = \begin{cases} -y - 1 & \text{if } y > -2\varepsilon/(4\varepsilon - 1), \\ y(1 - 1/(2\varepsilon)) & \text{if } 2\varepsilon/(4\varepsilon - 1) \le y \le -2\varepsilon/(4\varepsilon - 1), \\ -y + 1 & \text{if } y < 2\varepsilon/(4\varepsilon - 1). \end{cases}$$
Now, when $y = 2\varepsilon$, the follower selects $x_\varepsilon(y) = y(1 - 1/(2\varepsilon))$. Letting $\varepsilon$ converge to zero leads to
$$\lim_{\varepsilon \to 0} x_\varepsilon(2\varepsilon) = \lim_{y \to 0} (y - 1) = -1,$$
but
$$-1 \notin \operatorname*{argmin}_x \{ f(x, 0) : x \in \Psi(0) \} = \operatorname*{argmin}_x \{ x^2 : x \in \Psi(0) \} = \{0\},$$
so the optimistic solution is not obtained.
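The piecewise formula and the limit can be checked numerically; the sketch below evaluates the regularized reaction in closed form as the clipped stationary point. Reproducing the displayed formula requires taking the prox center $x^0 = y$ in (10), which is our reading of how the regularization is instantiated in this example from [25], and should be treated as an assumption.

```python
# Numerical check of the example. Taking the prox center x^0 = y in (10)
# (an assumption: it reproduces the piecewise formula above), x_eps(y) is the
# stationary point of x*y + eps*(x - y)**2 clipped to the interval [-y-1, -y+1].
import numpy as np

def x_eps(y, eps):
    x_free = y * (1.0 - 1.0 / (2.0 * eps))      # unconstrained minimizer
    return float(np.clip(x_free, -y - 1.0, -y + 1.0))

for eps in [0.1, 0.01, 0.001]:
    print(eps, x_eps(2.0 * eps, eps))           # tends to -1 as eps -> 0

# But -1 is not in argmin {x**2 : -1 <= x <= 1} = {0}: the regularization
# path misses the optimistic solution x = y = 0.
```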

2.2 Differentiability of lower level optimal solutions

To solve the regularized bilevel program

$$y^\top Q_1 x - P_1 x \;\to\; \min_y \qquad (13)$$
$$B_1 y \le b_1 \qquad (14)$$
$$A_1 x \le a_1 \qquad (15)$$
$$x \in \Psi_\varepsilon(y) \qquad (16)$$

where $\Psi_\varepsilon(y)$ denotes the solution set of the regularized lower level problem (10)–(12), we will use a bundle trust region algorithm, as described in Section 3.1. At each iteration of this algorithm it will be necessary to compute one element of the generalized gradient of the function $F_\varepsilon(y) = y^\top Q_1 x_\varepsilon(y) - P_1 x_\varepsilon(y)$. To do this, we will first have to compute an element of the generalized Jacobian

of the function $x_\varepsilon(\cdot)$, whose existence and Lipschitz continuity follow from [24]. The formulae needed for the computations are derived below.

Consider a point $y^0$ and denote the index sets of active constraints by $I(y^0) := \{i : (A_2 x_\varepsilon(y^0) - a_2)_i = 0\}$ and $J(y^0) := \{j : (x_\varepsilon(y^0))_j = 0\}$. Denote the set of Lagrange multipliers by
$$\Lambda_\varepsilon(y^0) := \{ (\lambda, \mu) : \nabla_x L_\varepsilon(x_\varepsilon(y^0), y^0, \lambda, \mu) = 0,\; \lambda_i = 0 \text{ for } i \notin I(y^0),\; \mu_i = 0 \text{ for } i \notin J(y^0) \},$$
where
$$L_\varepsilon(x, y, \lambda, \mu) := y^\top Q_2 x + \varepsilon \|x - x^0\|^2 + \lambda^\top (A_2 x - a_2) - \mu^\top x$$
is the Lagrangian of the regularized problem (10)–(12). Let $E_\varepsilon(y^0)$ denote the set of all vertices of $\Lambda_\varepsilon(y^0)$. For some vector $v$, let the index set of its positive components be $P(v) := \{j : v_j > 0\}$.

Theorem 2.2 Consider problem (10)–(12) at a point $y = y^0$ for a fixed value of $\varepsilon$. Then,

1. [26] the function $x_\varepsilon(\cdot)$ is directionally differentiable at $y^0$; i.e., the directional derivative
$$x'_\varepsilon(y^0; r) := \lim_{t \to 0^+} t^{-1} \left[ x_\varepsilon(y^0 + tr) - x_\varepsilon(y^0) \right]$$

exists for all directions $r$;

2. [27] the function $x_\varepsilon(\cdot)$ is a $PC^1$-function at $y^0$; i.e., there exist an open neighborhood $V$ of $y^0$ and a finite number of continuously differentiable functions $x^I_\varepsilon : V \to \mathbb{R}^n$, $I \in \mathcal{I}$, such that $x_\varepsilon(y) \in \{x^I_\varepsilon(y) : I \in \mathcal{I}\}$ for each $y \in V$, where $\mathcal{I}$ is a finite family of index sets $I$;

3. [27] the directional derivative of the function $x_\varepsilon(\cdot)$ at $y^0$ is the unique optimal solution of the following quadratic optimization problem:

$$\varepsilon\, d^\top d + r^\top Q_2 d \;\to\; \min_d \qquad (17)$$
$$(A_2 d)_i \begin{cases} = 0 & \text{if } i \in P(\lambda) \\ \le 0 & \text{if } i \in I(y^0) \setminus P(\lambda) \end{cases} \qquad (18)$$
$$d_i \begin{cases} = 0 & \text{if } i \in P(\mu) \\ \ge 0 & \text{if } i \in J(y^0) \setminus P(\mu) \end{cases} \qquad (19)$$
for each vertex $(\lambda, \mu) \in E_\varepsilon(y^0)$.

Corollary 2.1 Let the assumptions of Theorem 2.2 be satisfied. Then,

1. the generalized Jacobian of the function $x_\varepsilon(\cdot)$ at $y = y^0$ has the structure
$$\partial_y x_\varepsilon(y^0) = \operatorname{conv} \{ \nabla_y x^I_\varepsilon(y^0) : I \in \mathcal{I}_e(y^0) \}, \qquad (20)$$
where $\mathcal{I}_e(y^0) = \{ I \in \mathcal{I} : y^0 \in \operatorname{cl\,int} \{ y : x^I_\varepsilon(y) = x_\varepsilon(y) \} \}$ [28];

2. the function $x_\varepsilon(\cdot)$ is semismooth, i.e., for each sequence $\{y^k\}_{k=1}^\infty$ converging to $y^0$ with $\lim_{k\to\infty} (y^k - y^0)/\|y^k - y^0\| = r$ and each sequence $\{v^k\}_{k=1}^\infty$ with $v^k \in \partial_y x_\varepsilon(y^k)$, the limit $\lim_{k\to\infty} v^k r$ exists [29]. Consequently, $\lim_{k\to\infty} v^k r = x'_\varepsilon(y^0; r) = \nabla_y x^I_\varepsilon(y^0)\, r$ for some function $x^I_\varepsilon(\cdot)$, $I \in \mathcal{I}_e(y^0)$.

A selection function $x^I_\varepsilon(\cdot)$ with $I \in \mathcal{I}$ is called active, while $x^I_\varepsilon(\cdot)$ with $I \in \mathcal{I}_e(y^0)$ is termed essentially active. We see from (20) that $\mathcal{I}_e(y^0) \subseteq \mathcal{I}$, a fact that is needed when we compare Clarke's generalized Jacobian with the pseudodifferential in Section 2.3. It has been shown in [27] that in place of the functions $x^I_\varepsilon(y)$ we can use the unique optimal solution of the problem
$$y^\top Q_2 x + \varepsilon \|x - x^0\|^2 \;\to\; \min_x$$
$$(A_2 x - a_2)_i = 0,\quad i \in I_1$$
$$x_i = 0,\quad i \in I_2$$
where $I = I_1 \cup I_2$ satisfies the following two conditions:

(C1) There exists a vertex $(\lambda, \mu) \in E_\varepsilon(y^0)$ such that $P(\lambda) \subseteq I_1 \subseteq I(y^0)$ and $P(\mu) \subseteq I_2 \subseteq J(y^0)$.

(C2) The gradients $\{A_2^i : i \in I_1\} \cup \{e^i : i \in I_2\}$ are linearly independent, where $A_2^i$ and $e^i$ denote the $i$-th row of the matrix $A_2$ and the $i$-th unit vector, respectively.

The family of all sets $I = I_1 \cup I_2$ satisfying conditions (C1) and (C2) coincides with the family $\mathcal{I}$ which has been used in Theorem 2.2. In short, if we consider the set of active constraints of the problem (10)–(12) then, in order to satisfy the Karush-Kuhn-Tucker conditions, we can restrict ourselves to the sets satisfying (C1), (C2). For each such set a uniquely determined Lagrange multiplier (which in fact is a vertex of $\Lambda_\varepsilon(y^0)$) exists such that condition (C1) is satisfied.

For the computation of the Jacobian of the functions $x^I_\varepsilon(y^0)$ we have to solve the following system of equations [30]:
$$2\varepsilon D + Q_2^\top + A_2^\top \Gamma - \Xi = 0 \qquad (21)$$
$$(A_2 D)_i = 0,\quad i \in I_1 \qquad (22)$$
$$D_i = 0,\quad i \in I_2 \qquad (23)$$
$$\Gamma_i = 0,\quad i \notin I_1 \qquad (24)$$
$$\Xi_i = 0,\quad i \notin I_2 \qquad (25)$$

Here, each equation in (21)–(25) is to be read row-wise as vectors of partial derivatives set equal to the zero vector; in particular, $D_i$ is the $i$-th row of the matrix $D$, and $\Gamma$, $\Xi$ are matrices of appropriate dimensions. The difficulty that must be overcome is in verifying that an arbitrarily chosen set $I \in \mathcal{I}$ belongs to $\mathcal{I}_e(y^0)$.

The Karush-Kuhn-Tucker conditions associated with problem (17)–(19) are
$$2\varepsilon d + Q_2^\top r + A_2^\top \gamma - \eta = 0$$
$$(A_2 d)_i \begin{cases} = 0 & \text{if } i \in P(\lambda) \\ \le 0 & \text{if } i \in I(y^0) \setminus P(\lambda) \end{cases} \qquad d_i \begin{cases} = 0 & \text{if } i \in P(\mu) \\ \ge 0 & \text{if } i \in J(y^0) \setminus P(\mu) \end{cases}$$
$$\gamma_i (A_2 d)_i = 0 \;\;\forall\, i, \qquad \eta_i d_i = 0 \;\;\forall\, i,$$
$$\gamma_i \ge 0 \;\text{ for } i \in I(y^0) \setminus P(\lambda), \qquad \gamma_i = 0 \;\text{ for } i \notin I(y^0),$$
$$\eta_i \ge 0 \;\text{ for } i \in J(y^0) \setminus P(\mu), \qquad \eta_i = 0 \;\text{ for } i \notin J(y^0),$$
where $(\gamma, \eta)$ denote the Lagrange multipliers of (17)–(19).

If we use an active-set strategy to split the complementarity conditions of this system, we get

$$2\varepsilon d + Q_2^\top r + A_2^\top \gamma - \eta = 0 \qquad (26)$$
$$(A_2 d)_i \begin{cases} = 0 & \text{if } i \in I_1 \\ \le 0 & \text{if } i \in I(y^0) \setminus I_1 \end{cases} \qquad (27)$$
$$d_i \begin{cases} = 0 & \text{if } i \in I_2 \\ \ge 0 & \text{if } i \in J(y^0) \setminus I_2 \end{cases} \qquad (28)$$
$$\gamma_i \ge 0 \quad \text{for } i \in I_1 \setminus P(\lambda) \qquad (29)$$
$$\gamma_i = 0 \quad \text{for } i \notin I_1 \qquad (30)$$
$$\eta_i \ge 0 \quad \text{for } i \in I_2 \setminus P(\mu) \qquad (31)$$
$$\eta_i = 0 \quad \text{for } i \notin I_2 \qquad (32)$$

For a vertex $(\lambda, \mu) \in E_\varepsilon(y^0)$ and $P(\lambda) \subseteq I_1$, $P(\mu) \subseteq I_2$, let $R(I_1, I_2, \lambda, \mu)$ denote the set of all directions $r$ such that the system (26)–(32) has a feasible solution. Then, since (17)–(19) is a convex quadratic optimization problem with a unique optimal solution, $x'_\varepsilon(y^0; r)$ solves problem (17)–(19) if and only if there are $(I_1, I_2, \lambda, \mu)$ with $I_1 \cup I_2 \in \mathcal{I}$, $(\lambda, \mu) \in E_\varepsilon(y^0)$ such that $r \in R(I_1, I_2, \lambda, \mu)$. This shows
$$\bigcup_{(\lambda, \mu) \in E_\varepsilon(y^0)} \; \bigcup_{I_1 \cup I_2 \in \mathcal{I}} R(I_1, I_2, \lambda, \mu) = \mathbb{R}^m.$$
Given that the number of elements in this union is finite, there are sets $R(I_1, I_2, \lambda, \mu)$ having a nonempty interior. This leads to the following.

Remark 2.1 Consider a uniform distribution over the unit sphere $B^m \subset \mathbb{R}^m$. The sets $R(I_1, I_2, \lambda, \mu)$ are convex cones. Hence, each set $R(I_1, I_2, \lambda, \mu)$ having empty interior is of lower dimension, and so is the union over all sets $R(I_1, I_2, \lambda, \mu)$ with empty interior. Consequently, a randomly chosen vector $r \in B^m$ belongs to the interior of some of the sets $R(I_1, I_2, \lambda, \mu)$ with probability one.

Remark 2.2 In [31, 32, 10, 33, 11], conditions have been formulated which guarantee the computability of generalized Jacobians of the function $x_\varepsilon(\cdot)$. While in [31, 32] the Mangasarian-Fromovitz constraint qualification is used, [10, 33, 11] assume the linear independence constraint qualification to be satisfied, which implies uniqueness of the Lagrange multiplier. The additional conditions in [31, 10] coincide. By use of the Motzkin Theorem of the Alternative, the assumptions in [32, 11] can be shown to be the same. The result in [33] applies to problems with parameter-independent constraints only and implies the condition used in [32, 11].

The following theorem shows that if a direction $r$ is chosen such that it belongs to the interior of one of the sets $R(I_1, I_2, \lambda, \mu)$, then we are able to compute one element of the generalized Jacobian of the solution function. This resolves the main question of the computability of such matrices with probability one, as asserted in Remark 2.1.

Theorem 2.3 Consider problem (10)–(12) at a point $y = y^0$ with a fixed $\varepsilon > 0$. Let $r$ be chosen such that there exist $(I_1^0, I_2^0, \lambda^0, \mu^0)$ with $I^0 = I_1^0 \cup I_2^0 \in \mathcal{I}$, $(\lambda^0, \mu^0) \in E_\varepsilon(y^0)$ and $r \in \operatorname{int} R(I_1^0, I_2^0, \lambda^0, \mu^0)$. Then there exist sets $\hat{I}_1, \hat{I}_2$ with $\hat{I} = \hat{I}_1 \cup \hat{I}_2 \in \mathcal{I}$ such that the function $x^{\hat{I}}_\varepsilon(\cdot)$ is essentially active and
$$\nabla x^{I^0}_\varepsilon(y^0) = \nabla x^{\hat{I}}_\varepsilon(y^0) \in \partial_y x_\varepsilon(y^0).$$

Proof: The proof is done in two steps. First we show that the directional derivative of $x_\varepsilon(\cdot)$ corresponds to a certain active selection function related to the set $R(I_1^0, I_2^0, \lambda^0, \mu^0)$. Second, we use Corollary 2.1 to find an essentially active selection function that also realizes $x'_\varepsilon(y^0; r)$. In the last part of step two, we show that the Jacobians of both selection functions coincide.

1. By Theorem 2.2 there exists a unique optimal solution $x'_\varepsilon(y^0; r)$ to the quadratic optimization problem (17)–(19) for each $(\lambda, \mu) \in E_\varepsilon(y^0)$. Take $(\lambda, \mu) = (\lambda^0, \mu^0)$. Moreover, since the Karush-Kuhn-Tucker conditions are necessary and sufficient optimality conditions for the strongly convex quadratic optimization problem (17)–(19), there exist $(\gamma^0, \eta^0)$ such that $(x'_\varepsilon(y^0; r), \gamma^0, \eta^0)$ solves the system (26)–(32), so we have
$$P(\lambda^0) \cup P(\gamma^0) \subseteq I_1^0, \qquad P(\mu^0) \cup P(\eta^0) \subseteq I_2^0. \qquad (33)$$
Now, given that $\lambda^0_i = 0$ for all $i \notin I_1^0$ and $\mu^0_i = 0$ for all $i \notin I_2^0$, there is an active selection function $x^{I^0}_\varepsilon(\cdot)$ with $x_\varepsilon(y^0) = x^{I^0}_\varepsilon(y^0)$ for $I^0 = I_1^0 \cup I_2^0 \in \mathcal{I}$. Then the Jacobians $D = \nabla_y x^{I^0}_\varepsilon(y^0)$, $\Gamma = \nabla_y \lambda^{I^0}(y^0)$, $\Xi = \nabla_y \mu^{I^0}(y^0)$ (of the selection function and its associated multiplier functions) exist, and simply by inserting $(\nabla_y x^{I^0}_\varepsilon(y^0) r,\, \Gamma r,\, \Xi r)$ into (26)–(32) it can be seen that $\nabla_y x^{I^0}_\varepsilon(y^0)\, r = x'_\varepsilon(y^0; r)$ for all $r \in R(I_1^0, I_2^0, \lambda^0, \mu^0)$. According to our assumptions, we have $\operatorname{int} R(I_1^0, I_2^0, \lambda^0, \mu^0) \neq \emptyset$.

2. By Corollary 2.1 there exist sets $\hat{I}_1, \hat{I}_2$ such that $\hat{I} = \hat{I}_1 \cup \hat{I}_2 \in \mathcal{I}_e(y^0)$ and $x'_\varepsilon(y^0; r) = \nabla_y x^{\hat{I}}_\varepsilon(y^0)\, r$. Let $T(I)$ denote the tangent cone to the set $\{y : x^I_\varepsilon(y) = x_\varepsilon(y)\}$:
$$T(I) = \{ d : \exists\, \{y^k\}_{k=1}^\infty \subseteq \{y : x^I_\varepsilon(y) = x_\varepsilon(y)\},\; \exists\, \{t_k\}_{k=1}^\infty \text{ with } \lim_{k\to\infty} y^k = y^0,\; \lim_{k\to\infty} t_k = 0,\; \lim_{k\to\infty} t_k^{-1}(y^k - y^0) = d \}.$$
Because the number of elements in the set $\mathcal{I}_e(y^0)$ is finite and because the directional derivative $x'_\varepsilon(y^0; \cdot)$ is continuous with respect to changes of the direction, there must be a set $\hat{I} = \hat{I}_1 \cup \hat{I}_2 \in \mathcal{I}_e(y^0)$ with $\nabla_y x^{\hat{I}}_\varepsilon(y^0)\, r = x'_\varepsilon(y^0; r)$ and $\operatorname{int} T(\hat{I}) \neq \emptyset$. But then $S := \operatorname{int}(T(\hat{I}) \cap R(I_1^0, I_2^0, \lambda^0, \mu^0)) \neq \emptyset$ and we have $x'_\varepsilon(y^0; r) = \nabla_y x^{\hat{I}}_\varepsilon(y^0)\, r = \nabla_y x^{I^0}_\varepsilon(y^0)\, r$ for each $r \in S$. This implies $\nabla_y x^{I^0}_\varepsilon(y^0) = \nabla_y x^{\hat{I}}_\varepsilon(y^0)$. $\Box$

Theorem 2.3 suggests the following procedure for computing one element of the generalized Jacobian of the solution function $x_\varepsilon(\cdot)$ at a point $y = y^0$.

Algorithm 1: Computation of an element of the generalized Jacobian

Step 1 Randomly select a direction $r \in \mathbb{R}^m$, $\|r\| = 1$, and compute the directional derivative of the function $x_\varepsilon(\cdot)$ at $y^0$ by solving problem (17)–(19).

Step 2 Compute the sets $I_1, I_2$ satisfying (33) for a vertex $(\gamma, \eta)$ of the Lagrange multiplier set of the quadratic problem (17)–(19).

Step 3 Compute the element of the generalized Jacobian by solving the system (21)–(25).

Because the direction $r$ chosen in Step 1 belongs to the interior of one of the sets $R(I_1, I_2, \lambda, \mu)$ with probability one, the result of this procedure is an element of the generalized Jacobian with probability one by Theorem 2.3. Some more information about the computation of the generalized Jacobian of the function $x_\varepsilon(\cdot)$ can be found in [34].
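The sketch below illustrates the linear-algebra core of Step 3, i.e., solving (21)–(25) for $D = \nabla_y x^I_\varepsilon(y^0)$ once the index sets $I_1, I_2$ from Step 2 are known; Steps 1 and 2 (the QP solve and the vertex choice) are assumed done by standard solvers. This is a minimal sketch of our reading of (21)–(25), not production code; under (C2) the reduced system it solves is nonsingular.

```python
# Sketch of Step 3 of Algorithm 1: given the index sets I1, I2 from Step 2,
# solve the linear system (21)-(25) for D = grad_y x_eps^I(y0). Gamma holds
# the nonzero rows of the multiplier Jacobian; rows I2 of D are zero by (23).
import numpy as np

def selection_jacobian(Q2, A2, eps, I1, I2):
    m, n = Q2.shape
    F = np.setdiff1d(np.arange(n), I2)     # rows of D not fixed to zero
    A = A2[np.ix_(I1, F)]                  # A2 restricted to rows I1, columns F
    QF = Q2.T[F]                           # the corresponding rows of Q2^T
    # Eliminating D_F = -(QF + A^T Gamma) / (2 eps) in (21) and enforcing
    # (22), i.e. A D_F = 0, gives the normal equations (A A^T) Gamma = -A QF.
    Gamma = np.linalg.solve(A @ A.T, -A @ QF) if len(I1) else np.zeros((0, m))
    D = np.zeros((n, m))
    D[F] = -(QF + A.T @ Gamma) / (2.0 * eps)
    return D
```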

2.3 The pseudogradient

We now introduce a second generalization of a gradient that will be useful in subsequent developments.

Definition 2.2 [35] A function $f : \mathbb{R}^p \to \mathbb{R}$ is called pseudodifferentiable at $y^0$ if there exist an open neighborhood $U$ of $y^0$ and an upper semicontinuous point-to-set mapping $\Gamma_f : U \to 2^{\mathbb{R}^p}$ with nonempty, convex and compact images such that
$$f(y) = f(y^0) + g^\top (y - y^0) + o(y, y^0, g) \quad \forall\, y \in U,$$
where $g \in \Gamma_f(y)$ and
$$\lim_{k\to\infty} o(y^k, y^0, g^k)/\|y^k - y^0\| = 0 \qquad (34)$$
for all sequences $\{y^k\}_{k=1}^\infty$, $\{g^k\}_{k=1}^\infty$ with $\lim_{k\to\infty} y^k = y^0$ and $g^k \in \Gamma_f(y^k)$ for all $k$.
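As a simple one-dimensional illustration of this definition (our example, not one taken from [35]), consider the absolute value function. The mapping
$$f(y) = |y|, \qquad \Gamma_f(y) = \begin{cases} \{\operatorname{sign}(y)\} & \text{if } y \neq 0, \\ [-1, 1] & \text{if } y = 0, \end{cases}$$
has nonempty, convex, compact images and is upper semicontinuous, and $f(y) = f(0) + g\,(y - 0)$ holds exactly for every $g \in \Gamma_f(y)$, so the remainder $o(y, 0, g)$ vanishes identically and (34) holds trivially at $y^0 = 0$.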

Because we intend to apply this idea to the vector-valued function $x_\varepsilon(\cdot)$, we define the pseudodifferential of a vector-valued function as the set of all matrices which have the elements of the pseudodifferentials of the component functions as rows. It has been shown in [35] that pseudodifferentiable functions are locally Lipschitz continuous and that locally Lipschitz continuous, semismooth functions are pseudodifferentiable. In the latter case, the generalized gradient in the sense of Clarke can be used as the pseudodifferential $\Gamma_f$. In our case a different approach is used to verify the pseudodifferentiability of the function $x_\varepsilon(\cdot)$, because it shows that a random choice of one selection function guarantees the computation of one element of the pseudodifferential. This is not the case with the generalized Jacobian, by Theorem 2.3. Note that a continuously differentiable function $f$ is pseudodifferentiable with $\Gamma_f(y) = \{\nabla f(y)\}$.

Theorem 2.4 Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous function which is a selection of a finite number of pseudodifferentiable functions:
$$f(y) \in \{ f_i(y) : i = 1, \ldots, k \},$$
where $f_i : \mathbb{R}^n \to \mathbb{R}$ has the pseudodifferential $\Gamma_{f_i}(y)$, $i = 1, \ldots, k < \infty$. Then $f$ is pseudodifferentiable and, as such, we can take
$$\Gamma_f(y) = \operatorname{conv} \{ g : g \in \Gamma_{f_i}(y),\; f_i(y) = f(y) \}.$$

Proof: The arguments used here mainly parallel those used in the proof of Theorem 1.5 in [35]. We show the result for $k = 2$; the general case then follows by induction. Clearly, the point-to-set mapping $\Gamma_f$ has nonempty, convex and compact values. It is easy to see that $\Gamma_f$ is a closed point-to-set mapping. Since each of the mappings $\Gamma_{f_i}$ is locally bounded, the same can be said for $\Gamma_f$, which in turn implies that $\Gamma_f$ is upper semicontinuous [20]. We now show that (34) holds or, in other words, that for each $\epsilon > 0$ there exists a $\delta = \delta(\epsilon) > 0$ such that

?"  (f (y) ? f (x) ? g(y ? x))=ky ? xk  " (35) for all points in the set fy : ky ? xk  ; y = 6 xg and for all g 2 ?f (y). Since the functions fi , i = 1; 2, are pseudodi erentiable there exist corresponding i (") > 0 such that for all gi 2 ?fi (y) and all points in fy : ky ? xk  i; y = 6 xg,

we have

?"  (fi(y) ? fi (x) ? gi(y ? x))=ky ? xk  ":

(36) To show that inequalities (36) imply (35) we will consider all the di erent cases of coincidence between the functions f (); f1 () and f2 (). If f (x) = f1 (x) 6= f2 (x) then, by continuity, there is some open neighborhood fy : ky ? xk  g;  > 0 of x such that f (y) = f1 (y) 6= f2(y) for all y in that neighborhood. Then for each y in that neighborhood, the result follows from (36). Now let f (x) = f1 (x) = f2 (x) and put  (") = minf1("); 2(")g. Take a xed point y 2 fy : ky ? xk  ; y 6= xg. If it happens that f (y ) = f1 (y ) 6= f2 (y ) then the result again follows from (36) since we have only to consider f1 (y ) for evaluating (35). Consider now the last case when f (y ) = f1 (y ) = f2 (y ): Then g 2 ?f (y) if there exists 2 [0; 1] with g = g1 +(1 ? )g2 for some gi 2 ?fi (y). Hence, if we multiply the relations (36) for i = 1 by and for i = 2 by 1 ? and add the two, we obtain (35).

Corollary 2.2 A pseudodifferential of the function $x_\varepsilon(\cdot)$ at $y^0$ is given by
$$\Gamma_{x_\varepsilon}(y^0) = \operatorname{conv} \{ \nabla_y x^I_\varepsilon(y^0) : I \in \mathcal{I} \}.$$

Comparing this with the generalized Jacobian
$$\partial x_\varepsilon(y^0) = \operatorname{conv} \{ \nabla_y x^I_\varepsilon(y^0) : I \in \mathcal{I}_e(y^0) \}$$
we see that $\partial x_\varepsilon(y^0) \subseteq \Gamma_{x_\varepsilon}(y^0)$.

If we compute an element of $\Gamma_{x_\varepsilon}(y^0)$ using Algorithm 1, we indeed compute an element of $\partial x_\varepsilon(y^0)$ with probability one. Note that semismoothness of $x_\varepsilon(\cdot)$ is maintained even if $\partial_y x_\varepsilon(\cdot)$ is replaced by $\Gamma_{x_\varepsilon}(\cdot)$; i.e., $\lim_{k\to\infty} v^k r$ exists for each sequence $\{(y^k, v^k)\}_{k=1}^\infty$ with $\lim_{k\to\infty} (y^k - y^0)/\|y^k - y^0\| = r$ and $v^k \in \Gamma_{x_\varepsilon}(y^k)$ for all $k$. This can be shown, mutatis mutandis, in full analogy to [29].

3 Solution Approach

3.1 Bundle trust region algorithm for minimizing a nonconvex function

We start by outlining the bundle trust region algorithm as given in [36, 37] for minimizing a nonconvex, Lipschitz continuous function $G : \mathbb{R}^m \to \mathbb{R}$ without constraints. To apply this algorithm, we must be able to compute the function value $G(y)$ and one element $v(y) \in \partial G(y)$ of the generalized gradient of $G$ for an appropriate $y \in \mathbb{R}^m$. This must be done at each iteration. We will not give a detailed description of the algorithm's components but restrict ourselves to the main ideas.

The bundle method has its roots in the use of cutting planes for minimizing convex functions. Let $\{y^i\}_{i=1}^k$, $\{z^i\}_{i=1}^k$ be iterates already computed. The traditional cutting plane method minimizes the function
$$\max_{1 \le i \le k} \left\{ v(y^i)^\top d + v(y^i)^\top (z^k - y^i) + G(y^i) \right\}$$

with respect to $d$, where $d = y - z^k$. In bundle algorithms, a quadratic regularization of this function is minimized; namely,
$$\max_{1 \le i \le k} \left\{ v(y^i)^\top d - \alpha_{i,k} \right\} + G(z^k) + (2 t_k)^{-1} d^\top d \qquad (37)$$

where $\alpha_{i,k} = G(z^k) - v(y^i)^\top (z^k - y^i) - G(y^i)$ for all $i, k$, and $t_k$ is positive. Note that $\alpha_{i,k} \ge 0$ can be guaranteed only in the case of minimizing a convex function. Putting it another way, in general, $G(\cdot)$ is approximated accurately by (37) only in the convex case. Because $G(\cdot)$ is not convex, the above functions cannot be used to describe appropriate local approximations of $G(\cdot)$. To overcome this difficulty, $\alpha_{i,k}$ is replaced by

$$\tilde{\alpha}_{i,k} = \max\{ \alpha_{i,k},\; c_0 \|z^k - y^i\| \},$$
where $c_0$ is a small positive constant. The algorithm consists of a sequence of so-called inner iterations, which are given in short below [37]. In the description, let $\bar\epsilon$ and $m$ be small positive constants.

Algorithm 2: Bundle algorithm inner iteration

Step 1 Let $\{z^i\}_{i=1}^k$ and $\{y^j\}_{j=1}^k$ be sequences already computed. Let $\hat{d}$ be a solution of the problem
$$\max_{1 \le i \le k} \left\{ v(y^i)^\top d - \tilde{\alpha}_{i,k} \right\} + (2 t_k)^{-1} d^\top d \;\to\; \min_d. \qquad (38)$$
If
$$- t_k^{-1} \|\hat{d}\|^2 - \max_{1 \le i \le k} \left\{ v(y^i)^\top \hat{d} - \tilde{\alpha}_{i,k} \right\} \le \bar\epsilon \quad \text{and} \quad t_k^{-1} \|\hat{d}\| \le \bar\epsilon, \qquad (39)$$
then $y^k$ is almost stationary and the inner iteration terminates with $z^k = y^k$. Otherwise, put $y^{k+1} := z^k + \hat{d}$ and compute a generalized gradient $v(y^{k+1})$.

Step 2 If $G(y^{k+1}) - G(z^k) < m \max_{1 \le i \le k} \{ v(y^i)^\top \hat{d} - \tilde{\alpha}_{i,k} \}$, then $z^{k+1} = y^{k+1}$ is used and the inner iteration terminates (serious step). Otherwise, either (i) a so-called null step is taken (i.e., a more accurate local approximation of $G(\cdot)$ is computed), or (ii) a line search is carried out (to find a new point where either a serious step or a null step is possible), or (iii) the value of $t_k$ is changed. In the last case, Step 1 is repeated. A null step terminates the inner iteration.
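For illustration, the direction-finding subproblem (38) can be restated in epigraph form and handed to a generic solver. The following sketch uses illustrative bundle data; a production implementation would use the specialized QP machinery of [37] rather than a general-purpose method.

```python
# Sketch: the subproblem (38) in epigraph form,
#   min_{d, xi}  xi + ||d||^2 / (2 t_k)   s.t.  V d - alpha_tilde <= xi,
# where the rows of V are the bundle gradients v(y^i) and alpha_tilde the
# modified linearization errors (illustrative data below).
import numpy as np
from scipy.optimize import minimize

def bundle_direction(V, alpha_tilde, t_k):
    k, m = V.shape
    z0 = np.zeros(m + 1)                              # z = (d, xi)
    obj = lambda z: z[m] + z[:m] @ z[:m] / (2.0 * t_k)
    cons = [{"type": "ineq",
             "fun": lambda z: z[m] - (V @ z[:m] - alpha_tilde)}]
    res = minimize(obj, z0, constraints=cons, method="SLSQP")
    return res.x[:m]                                  # the trial direction d_hat

V = np.array([[1.0, 0.0], [-0.5, 1.0], [0.2, -1.0]])
print(bundle_direction(V, alpha_tilde=np.array([0.0, 0.1, 0.05]), t_k=1.0))
```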

The components of the inner iteration are embedded in an outer loop where two operations are performed. First, convergence of the algorithm is checked. Second, the set of iterates $y^i$ used for describing the function (38), the corresponding elements of the generalized gradients, and the values of $\tilde{\alpha}_{i,k}$ are updated. It should be mentioned that linear constraints can be added without major difficulties [37], while nonlinear constraints can be handled using a feasible directions idea [38], but with a bit more effort.

The following theorem confirms the convergence of the bundle algorithm. Note that if in the definition of semismoothness only points $y^k = y^0 + t_k r$ for a fixed direction $r$ are considered, then the function is called weakly semismooth. Semismooth functions are weakly semismooth but not vice versa.

Theorem 3.1 [37] If $G(\cdot)$ is weakly semismooth, bounded below, and the sequence $\{z^k\}_{k=1}^\infty$ computed by the above algorithm remains bounded, then there exists an accumulation point $\bar{z}$ of $\{z^k\}_{k=1}^\infty$ such that $0 \in \partial G(\bar{z})$.

The following result can be shown by repeating all steps in [36] one-by-one:

Theorem 3.2 If an element $v(y^i) \in \Gamma_G(y^i)$ is used at each iteration of Algorithm 2 instead of $v(y^i) \in \partial G(y^i)$, then there exists an accumulation point $\bar{z}$ of $\{z^k\}_{k=1}^\infty$ such that $0 \in \Gamma_G(\bar{z})$, provided that all the other assumptions of Theorem 3.1 are satisfied.

Remark 3.1 Let all directions used for the computation of the pseudogradients be chosen independently. Then,
$$0 = \sum_{i \in J_s} \lambda_i v(y^i) \quad \text{with } v(y^i) \in \Gamma_G(\bar{z}), \qquad \sum_{i \in J_s} \lambda_i = 1, \quad \lambda_i \ge 0,\; i \in J_s,$$
by the convergence analysis of the bundle trust region algorithm in [37]. Then, by Algorithm 1, $v(y^i) \in \partial G(\bar{z})$ with probability one for all $i$, which implies that also $0 \in \partial G(\bar{z})$ with probability one.

3.2 Prototype algorithm for problem (1)–(4)

We consider a penalty function approach for treating the constraints (2) and (3) (the latter is nondifferentiable when $x$ is replaced by $x_\varepsilon(y)$) as well as the above described regularization of the implicit constraint (4). This leads to the following regularized problem

$$y^\top Q_1 x - P_1 x + p \left( \|A_1 x - a_1\|_+ + \|B_1 y - b_1\|_+ \right) \;\to\; \min_y \qquad (40)$$
$$x \in \Psi_\varepsilon(y) \qquad (41)$$

where the norm $\|a\|_+ = \sum_{i=1}^n \max\{0, a_i\}$ for $a \in \mathbb{R}^n$. For a fixed pair $(\varepsilon, p)$, (40)–(41) is the problem of minimizing a Lipschitz continuous function without constraints, which can be solved with a bundle trust region algorithm. For computing the elements of the generalized gradient of the objective function

$$G_{\varepsilon,p}(y) = y^\top Q_1 x_\varepsilon(y) - P_1 x_\varepsilon(y) + p \|A_1 x_\varepsilon(y) - a_1\|_+ + p \|B_1 y - b_1\|_+$$
we can use Theorem 2.3.
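Evaluating $G_{\varepsilon,p}$ therefore only requires the regularized follower solve plus two one-sided norms. A minimal sketch follows, in which x_eps is a placeholder for a solver routine such as the one sketched in Section 2.1:

```python
# Sketch: evaluating the penalty objective G_{eps,p}(y) of (40). The routine
# x_eps is a placeholder for a regularized follower solve; norm_plus is the
# one-sided norm ||a||_+ = sum_i max{0, a_i}.
import numpy as np

def norm_plus(a):
    return float(np.sum(np.maximum(a, 0.0)))

def G_eps_p(y, p, x_eps, Q1, P1, A1, a1, B1, b1):
    x = x_eps(y)                     # unique regularized follower response
    return (y @ Q1 @ x - P1 @ x
            + p * norm_plus(A1 @ x - a1)
            + p * norm_plus(B1 @ y - b1))
```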

With these developments in mind, we propose the following prototype bundle trust region algorithm for solving problem (40)–(41).

Algorithm 3: Prototype bundle algorithm

Step 1 Select $\bar\epsilon_0 > 0$, $\varepsilon_0 > 0$, $p_0 > 0$ and a starting point $z^0$. Put $s := 0$.

Step 2 Starting with $z^s$, use a bundle algorithm to compute a solution $z^{s+1}$ of problem (40)–(41) for the fixed values $\varepsilon = \varepsilon_s$, $p = p_s$, satisfying the conditions (39) for $\bar\epsilon = \bar\epsilon_s$.

Step 3 Put $\bar\epsilon_{s+1} := \bar\epsilon_s / 2$, $\varepsilon_{s+1} := \varepsilon_s / 2$, $p_{s+1} := 2 p_s$, $s := s + 1$, and repeat Step 2 until some termination criterion is satisfied.
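Schematically, the outer loop is just a parameter-driving wrapper around the bundle code; in the sketch below, bundle_solve is a hypothetical placeholder for a bundle trust region routine (e.g., an implementation of [37]) that minimizes $G_{\varepsilon,p}$ from a warm start until (39) holds.

```python
# Schematic outer loop of Algorithm 3. The function bundle_solve is a
# hypothetical placeholder for a bundle trust region code (e.g. [37]) that
# minimizes G_{eps,p} starting from z until conditions (39) hold for eps_bar.
def algorithm3(z0, bundle_solve, eps_bar=1.0, eps=1.0, p=1.0, n_outer=20):
    z = z0
    for _ in range(n_outer):
        z = bundle_solve(z, eps, p, eps_bar)           # Step 2
        eps_bar, eps, p = eps_bar / 2, eps / 2, 2 * p  # Step 3
    return z
```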

To show convergence, we need a theorem about the exactness of the penalty function (40) which, in turn, makes use of Lemma 3.1 below. Before we can present the results, we need to define a polyhedral point-to-set mapping. We also define the norm $\|a\|_1 = \sum_{i=1}^n |a_i|$ for $a \in \mathbb{R}^n$.

Definition 3.1 [39] A point-to-set mapping $\Phi : \mathbb{R}^m \to 2^{\mathbb{R}^n}$ is called polyhedral if its graph $\{(x, y) : x \in \Phi(y)\}$ is the union of a finite number of polyhedral convex sets. It is called locally upper Lipschitz continuous at a point $y^0 \in \mathbb{R}^m$ if there exist a constant $c < \infty$ and an open neighborhood $V$ of $y^0$ such that for every $y \in V$ the relation
$$\Phi(y) \subseteq \Phi(y^0) + c \|y - y^0\|_1 B_n$$
holds, where $B_n$ denotes the unit ball in $\mathbb{R}^n$.

Polyhedral point-to-set mappings $\Phi(\cdot)$ and their inverses $\Phi^{-1}(\cdot)$, defined by $\Phi^{-1}(z) = \{x : z \in \Phi(x)\}$, are locally upper Lipschitz continuous at each point $y^0 \in \mathbb{R}^m$ [39]. Consider the function
$$H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) = \begin{pmatrix} 2\varepsilon (x - x^0) + Q_2^\top y + A_2^\top \lambda - \mu \\ A_2 x - a_2 \\ -x \\ B_1 y - b_1 \\ A_1 x - a_1 \end{pmatrix}$$
and the polyhedral cone $K(\lambda, \mu) = \{0\} \times \mathbb{R}^{q_2}_+(\lambda) \times \mathbb{R}^n_+(\mu) \times \mathbb{R}^{p_1}_+ \times \mathbb{R}^{q_1}_+$, where the $i$-th component of $\mathbb{R}^p_+(\lambda)$ is equal to $\{0\}$ if $\lambda_i > 0$ and equal to $[0, \infty)$ if $\lambda_i = 0$. Then, by use of the Karush-Kuhn-Tucker conditions for the regularized lower level problem (10)–(12), it can be seen that the constraints of problem (13)–(16) are equivalently formulated as
$$0 \in H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) + K(\lambda, \mu). \qquad (42)$$
Moreover, each point $(x, y)$ is feasible for the regularized upper level problem for a fixed value of $\varepsilon$ if there exists $(\lambda, \mu)$ such that $(x, y, \lambda, \mu)$ solves (42). Note that the point-to-set mapping $N$ defined by $N(x, y, \lambda, \mu, a_1, b_1) = H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) + K(\lambda, \mu)$ is polyhedral, and so is its inverse $N^{-1}$ given by
$$N^{-1}(z) = \{ (x, y, \lambda, \mu, a_1, b_1) : z \in H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) + K(\lambda, \mu) \}.$$

It is easy to see that for $z = (0, 0, 0, z_1, z_2)^\top$,
$$z \in H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) + K(\lambda, \mu) \;\Longleftrightarrow\; 0 \in H(\varepsilon; x, y, \lambda, \mu, a_1 - z_1, b_1 - z_2) + K(\lambda, \mu).$$
Hence, the point-to-set mapping
$$M(a_1, b_1) = \{ (x, y, \lambda, \mu) : 0 \in H(\varepsilon; x, y, \lambda, \mu, a_1, b_1) + K(\lambda, \mu) \}$$
is also polyhedral, and both $N$ and $M$ are locally upper Lipschitz continuous. Denote the distance of a point $z \in \mathbb{R}^p$ to a set $V \subseteq \mathbb{R}^p$ by $\varrho(z, V)$.

Lemma 3.1 Let $\varepsilon > 0$ be fixed. There exist $\delta > 0$ and $c < \infty$ such that for each $(x, y)$ satisfying $x \in \Psi_\varepsilon(y)$ with corresponding Lagrange multipliers $(\lambda, \mu)$ and $\|A_1 x - a_1\|_+ + \|B_1 y - b_1\|_+ < \delta$, we have
$$\varrho\big( (x, y, \lambda, \mu)^\top, M(a_1, b_1) \big) \le c \left( \|A_1 x - a_1\|_+ + \|B_1 y - b_1\|_+ \right).$$

Proof: Since $M(\cdot, \cdot)$ is locally upper Lipschitz continuous, there are $\delta > 0$, $c < \infty$ such that
$$M(a_1', b_1') \subseteq M(a_1, b_1) + c \left( \|a_1' - a_1\|_1 + \|b_1' - b_1\|_1 \right) B$$
for all $a_1', b_1'$ with $\|a_1' - a_1\|_1 + \|b_1' - b_1\|_1 < \delta$. Let $(x, y)$ be given with $x \in \Psi_\varepsilon(y)$. Take any corresponding Lagrange multipliers $(\lambda, \mu)$. Then the following constraints are satisfied:
$$2\varepsilon (x - x^0) + Q_2^\top y + A_2^\top \lambda - \mu = 0, \quad -A_2 x + a_2 \in \mathbb{R}^{q_2}_+(\lambda), \quad x \in \mathbb{R}^n_+(\mu), \quad \lambda \ge 0, \quad \mu \ge 0.$$
Take $a_{1i}' = \max\{a_{1i}, (A_1 x)_i\}$ and $b_{1i}' = \max\{b_{1i}, (B_1 y)_i\}$ for all $i$. Then $(x, y, \lambda, \mu) \in M(a_1', b_1')$. On the other hand,
$$\|a_1' - a_1\|_1 = \sum_{i=1}^{q_1} |a_{1i}' - a_{1i}| = \sum_{i=1}^{q_1} \max\{0, (A_1 x - a_1)_i\} = \|A_1 x - a_1\|_+,$$
and similarly $\|b_1' - b_1\|_1 = \|B_1 y - b_1\|_+$. This implies the desired result. $\Box$

The following theorem says that the objective function in (40) is an exact penalty function.

Theorem 3.3 Let $\varepsilon > 0$ be fixed and let the problem (13)–(16) have a globally optimal solution. There exists a parameter $\bar{p}$ such that

1. for all $p \ge \bar{p}$, if $(x_\varepsilon(\bar{y}), \bar{y})$ is a locally optimal solution of problem (13)–(16), then $(x_\varepsilon(\bar{y}), \bar{y})$ is a locally optimal solution of (40)–(41);

2. if $(x_\varepsilon(\bar{y}), \bar{y})$ solves (40)–(41) locally for some $p \ge \bar{p}$, then it is also a locally optimal solution of the problem (13)–(16).

Proof: The proof essentially uses Lemma 3.1 and ideas from the proof of Theorem 2.1.2 in [40]. From Lemma 3.1 we know that there exist $\delta > 0$, $c < \infty$ such that for each $(x, y)$ satisfying $x \in \Psi_\varepsilon(y)$ with corresponding Lagrange multipliers $(\lambda, \mu)$ and $r(x, y) := \|A_1 x - a_1\|_+ + \|B_1 y - b_1\|_+ < \delta$, we have
$$\varrho\big( (x, y, \lambda, \mu)^\top, M(a_1, b_1) \big) \le c\, r(x, y).$$
By local Lipschitz continuity of the function $x_\varepsilon(\cdot)$, the function $F_\varepsilon(y) = y^\top Q_1 x_\varepsilon(y) - P_1 x_\varepsilon(y)$ is also locally Lipschitz continuous, and for each $\bar{y}$ and some compact set $U$ with $\bar{y} \in \operatorname{int} U$ there exists a constant $L < \infty$ such that $|F_\varepsilon(y) - F_\varepsilon(z)| \le L \|y - z\|_1$ for all $y, z \in U$. Take $\bar{p} > cL$ and let $p \ge \bar{p}$.

1. Let $(x_\varepsilon(\bar{y}), \bar{y})$ be a locally optimal solution of (13)–(16) and let $V$ be some open neighborhood of $\bar{y}$ such that
$$y^\top Q_1 x_\varepsilon(y) - P_1 x_\varepsilon(y) \ge \bar{y}^\top Q_1 x_\varepsilon(\bar{y}) - P_1 x_\varepsilon(\bar{y})$$
for all feasible $y \in V$. Without loss of generality let $V \subseteq U$. Let $y \in V$ be arbitrary and choose $(x_\varepsilon(z), z, \lambda(z), \mu(z)) \in M(a_1, b_1)$ such that
$$\varrho\big( (x_\varepsilon(y), y, \lambda(y), \mu(y)), M(a_1, b_1) \big) \ge \|y - z\|_1,$$
where $(\lambda(y), \mu(y))$ are some corresponding Lagrange multipliers. The point $z$ can be chosen, for example, by projection of $(x_\varepsilon(y), y, \lambda(y), \mu(y))$ onto $M(a_1, b_1)$. Assume that $\|y - \bar{y}\|_1$ is small enough so that $z \in V$ and $r(x_\varepsilon(y), y) < \delta$. This is possible because $x_\varepsilon(\cdot)$ is continuous and $\Lambda_\varepsilon(\cdot)$ is upper semicontinuous. Then $F_\varepsilon(z) \ge F_\varepsilon(\bar{y})$ and $F_\varepsilon(z) - F_\varepsilon(y) \ge -L \|z - y\|_1$, and we get
$$F_\varepsilon(y) + p\, r(x_\varepsilon(y), y) = F_\varepsilon(z) + \big( F_\varepsilon(y) - F_\varepsilon(z) \big) + p\, r(x_\varepsilon(y), y) \ge F_\varepsilon(z) - L \|z - y\|_1 + p \|z - y\|_1 / c$$
$$\ge F_\varepsilon(\bar{y}) + (p/c - L) \|z - y\|_1 \ge F_\varepsilon(\bar{y}) = F_\varepsilon(\bar{y}) + p\, r(x_\varepsilon(\bar{y}), \bar{y}).$$
Hence, $(x_\varepsilon(\bar{y}), \bar{y})$ is a locally optimal solution of problem (40)–(41).

2. Now let $(x_\varepsilon(y^0), y^0)$ be a locally optimal solution of (40)–(41); i.e., let
$$F_\varepsilon(y) + p\, r(x_\varepsilon(y), y) \ge F_\varepsilon(y^0) + p\, r(x_\varepsilon(y^0), y^0)$$
for all $y$ in some open neighborhood $V$ of $y^0$. Let $\bar{y}$ be an optimal solution of problem (13)–(16) restricted to $y \in \operatorname{cl} V$, which exists by the assumptions in the theorem. Then,
$$F_\varepsilon(\bar{y}) = F_\varepsilon(\bar{y}) + p\, r(x_\varepsilon(\bar{y}), \bar{y}) \ge F_\varepsilon(y^0) + p\, r(x_\varepsilon(y^0), y^0) = F_\varepsilon(z) + \big( F_\varepsilon(y^0) - F_\varepsilon(z) \big) + p\, r(x_\varepsilon(y^0), y^0)$$
$$\ge F_\varepsilon(z) - L \|y^0 - z\|_1 + p \|y^0 - z\|_1 / c \ge F_\varepsilon(\bar{y}) + (p/c - L) \|y^0 - z\|_1 \ge F_\varepsilon(\bar{y}),$$
where $z$ has been chosen as in part 1 of the proof. Hence all the inequalities in this sequence are in fact equations. Because $p > cL$, we have $\|y^0 - z\|_1 = 0$, i.e., $(x_\varepsilon(y^0), y^0, \lambda(y^0), \mu(y^0)) \in M(a_1, b_1)$ for some $(\lambda(y^0), \mu(y^0))$. This implies that $y^0$ solves (13)–(16) locally. $\Box$

4 Convergence analysis

Conditions (39) used in Algorithm 2 are equivalent to
$$w_s := \Big\| \sum_{i \in J_s} \lambda_i v(y^i) \Big\| \le \bar\epsilon_s, \qquad \tilde{\alpha}_s := \sum_{i \in J_s} \lambda_i \tilde{\alpha}_{i,s} \le \bar\epsilon_s, \qquad (43)$$
where $\lambda_i \ge 0$ for all $i \in J_s$, $\sum_{i \in J_s} \lambda_i = 1$, and $J_s$ is some subset of the index set of iteration points. From the results in [36], for $\bar\epsilon_s > 0$, the conditions in (39) are satisfied after a finite number of iterations.

For each vertex $\bar{x}$ of the set $\{x \ge 0 : A_2 x \le a_2\}$, let $P(\bar{x})$ denote the region of stability of $\bar{x}$ given by
$$P(\bar{x}) := \{ y : \bar{x} \in \Psi(y) \}.$$

Let $\{z^s\}_{s=1}^\infty$ be the sequence of iteration points generated by the bundle trust region algorithm and assume that it is bounded. Then this sequence has an accumulation point. Let $\bar{z}$ be some accumulation point of the sequence $\{z^s\}_{s=1}^\infty$ and assume without loss of generality that $\lim_{s\to\infty} z^s = \bar{z}$. Assume for the moment that $\bar{z} \in \operatorname{int} P(\bar{x})$. Then, since $x(\cdot)$ is locally constant around $\bar{z}$, the Lipschitz constant of $x_\varepsilon(\cdot)$ remains bounded for $\varepsilon$ tending to zero and $z^s$ approaching $\bar{z}$. This implies that the steps of Algorithm 3 can be realized. Then, by $\lim_{s\to\infty} \bar\epsilon_s = 0$ we have $\lim_{s\to\infty} w_s = 0$ and $\lim_{s\to\infty} \tilde{\alpha}_s = 0$, which imply that
$$0 \in \partial G_\infty(\bar{z}) \quad \text{if } v(y^i) \in \partial G_{\varepsilon,p}(y^i) \;\forall\, i$$
and
$$0 \in \Gamma_{G_\infty}(\bar{z}) \quad \text{if } v(y^i) \in \Gamma_{G_{\varepsilon,p}}(y^i) \;\forall\, i, \qquad (44)$$
depending on the case, where $G_\infty$ denotes the limiting objective function; see [41, Section 3.3] and [36].

Lemma 4.1 All stationary points of problem (8) are either locally minimal or locally maximal.

Proof: We can formulate problem (8) as $\bar\varphi = \min_y \{ \varphi(y) : B_1 y \le b_1 \}$, where
$$\varphi(y) = \min_x \{ y^\top Q_1 x - P_1 x : A_1 x \le a_1,\; x \in \Psi(y) \}.$$
Let $X(y)$ be the (finite) set of vertices of the set $\{x \in \Psi(y) : A_1 x \le a_1\}$. Then
$$\varphi(y) = \min_x \{ y^\top Q_1 x - P_1 x : x \in X(y) \}.$$
Note that by parametric linear programming, the mapping $X(\cdot)$ is piecewise constant and the function $\varphi(\cdot)$ is piecewise linear on each of the constant pieces. Hence, $\varphi(\cdot)$ itself is also piecewise linear and problem (8) is equivalent to minimizing a piecewise linear function. All stationary points of such functions are either locally minimal or locally maximal. This concludes the proof. $\Box$

This lemma together with the calculation of $\Gamma_{G_{\varepsilon,p}}(\cdot)$ implies that $\bar{z}$ is a local minimum of (8), since a local maximum cannot be obtained by a descent algorithm such as Algorithm 3. Moreover, since $\bar{z} \in P(\bar{x})$, which implies that $\bar{x} \in X(\bar{z})$ and $B_1 \bar{z} \le b_1$, for $p \to \infty$ we have the following result.

Theorem 4.1 Applying Algorithm 3 to problem (40)–(41), let the sequence of iterates $\{z^s\}_{s=1}^\infty$ be bounded. Let each accumulation point of this sequence belong to $\operatorname{int} P(\bar{x})$ for some vertex $\bar{x}$ of $\{x \ge 0 : A_2 x \le a_2\}$. Then, there is an accumulation point $\bar{z}$ which is a local minimum of (8), and the value of the penalty parameter $p$ need not go to infinity but can remain bounded.

The last assertion is a direct consequence of Theorem 3.3. It should be noted that if $\{z^s\}_{s=1}^\infty$ converges to a boundary point of some set $P(\bar{x})$, the algorithm will run into trouble due to an unbounded Lipschitz constant. Such a situation would imply that elements of either the generalized Jacobian or the pseudogradient of the function $x_\varepsilon(\cdot)$ could not be computed for sufficiently small $\varepsilon$. When this occurs, the best that we can hope for is that we are sufficiently close to a local minimum.

5 Implementation

To solve problem (1)–(7) using the above ideas we assume that a bundle trust region code (such as the one described in [37]) is available. Initially, let the leader's variables be fixed at $y = y^0$ and let the regularization parameter in (10) be $\varepsilon = \varepsilon_0$.

1. Apply an LP code to solve the follower's problem for the leader's variables fixed.

2. Select a direction $r \in \mathbb{R}^m \setminus \{0\}$ randomly and a vertex $(\lambda^0, \mu^0)$ of the set $\Lambda_\varepsilon(y)$, as defined in Section 2.2. The latter can be done, for example, by solving the LP $\max\{ a^\top \lambda + b^\top \mu : (\lambda, \mu) \in \Lambda_\varepsilon(y) \}$ for a randomly chosen vector $(a, b)^\top \in B^{m+n} \setminus \{0\}$ (a sketch of this step is given at the end of this section).

3. Solve the quadratic optimization problem (17)–(19) with $(\lambda, \mu) = (\lambda^0, \mu^0)$ by applying a standard QP code and find a vertex $(\gamma^0, \eta^0)$ of the set of Lagrange multipliers of this problem.

4. Set $I_1 = P(\lambda^0) \cup P(\gamma^0)$, $I_2 = P(\mu^0) \cup P(\eta^0)$ and solve the system of linear equations (21)–(25).

By Theorem 2.3 the result of this procedure gives an element of the generalized Jacobian of the function $x_\varepsilon(y)$ with probability one. Moreover, these steps can be implemented in polynomial time. Note that in Steps 2 and 3 the use of a random objective function guarantees that a vertex of the Lagrange multiplier set is found, again with probability one. This can also be done in polynomial time. Using the chain rule for generalized gradients, the generalized Jacobian can be used to compute an element of the generalized gradient of the function $G_{\varepsilon,p}(y)$ defined in Section 3.2, which is then used in the bundle algorithm. With respect to Algorithm 3, the final implementation task involves the control of the parameters $\varepsilon$, $p$ and $\bar\epsilon$.

One difficulty that might result from the use of Algorithm 3 is that of discontinuity at the limit point of the sequence generated. This possibility is an inherent property of the model. But if the sequence of iterates converges to such a point, numerical difficulties can arise. This is a consequence of our attempt to use Lipschitz optimization and cannot be avoided unless we take an entirely different approach to the computations, such as applying more traditional enumerative methods (e.g., see [2]) or a recently proposed trust region algorithm [42]. But these methods present their own difficulties, especially with respect to model size. Recall that our motivation comes from an application in biofuel production that has only a handful of upper level variables (fewer than 10). When solving such a problem with the bundle trust region algorithm, we would be working in a very low dimensional space. Using the other methods means that we would be working in a space whose dimension is determined by the number of upper level variables plus the number of lower level variables plus the number of dual variables. Current technology does not permit the solution of problem instances arising from the application that we have in mind.
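The following is a sketch of the vertex-selection LP of Step 2, under the assumption that the given $x$ satisfies the stationarity equation $2\varepsilon(x - x^0) + Q_2^\top y + A_2^\top \lambda - \mu = 0$ with multipliers supported on the active sets and that the multiplier set is bounded. Drawing the random objective from a Gaussian serves the same probability-one purpose as drawing from the unit ball, and the dual simplex option returns a basic solution, i.e., a vertex.

```python
# Sketch of Step 2: pick a vertex of the multiplier set Lambda_eps(y) by
# maximizing a random linear objective over it. Assumes x solves (10)-(12)
# for this y (so the LP is feasible) and a bounded multiplier set.
import numpy as np
from scipy.optimize import linprog

def multiplier_vertex(x, y, x0, Q2, A2, eps, I_act, J_act, rng):
    n = len(x)
    rhs = -(2.0 * eps * (x - x0) + Q2.T @ y)      # A2^T lam - mu = rhs
    # Columns: lambda_i for i in I_act, then mu_j for j in J_act.
    A_eq = np.hstack([A2[I_act].T, -np.eye(n)[:, J_act]])
    c = -rng.standard_normal(A_eq.shape[1])       # random objective, maximized
    res = linprog(c, A_eq=A_eq, b_eq=rhs, bounds=[(0, None)] * A_eq.shape[1],
                  method="highs-ds")              # dual simplex -> a vertex
    assert res.status == 0, "multiplier LP should be feasible at a KKT point"
    lam, mu = np.zeros(A2.shape[0]), np.zeros(n)
    lam[I_act] = res.x[:len(I_act)]
    mu[J_act] = res.x[len(I_act):]
    return lam, mu
```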

As a final point, we note that alternative penalty function approaches exist to the one presented here. Several, for example, penalize the complementarity term in the Karush-Kuhn-Tucker reformulation of the bilevel programming problem [43, 40]. These approaches replace the inherently nonsmooth problem (1)–(7) by a sequence of smooth problems. The latter are likely to exhibit numerical difficulties, however, given the state of standard nonlinear programming codes. Instead, we have proposed to construct a sequence of nonsmooth problems, which we believe will be somewhat better behaved, and to investigate their stability. Generally speaking, we feel that it is better to approximate a nonsmooth function to be minimized by a nonsmooth function and not to force differentiability.

References

[1] G. Anandalingam and T.L. Friesz (eds.). Hierarchical optimization. Annals of Operations Research, 34, 1992.
[2] J.F. Bard. Practical Bilevel Optimization: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht, 1998.
[3] K. Shimizu, Y. Ishizuka, and J.F. Bard. Nondifferentiable and Two-Level Mathematical Programming. Kluwer Academic Publishers, Dordrecht, 1997.
[4] A. Migdalas, P.M. Pardalos, and P. Värbrand (eds.). Multilevel Optimization: Algorithms and Applications. Kluwer Academic Publishers, Dordrecht, 1998.
[5] J.F. Bard, J.C. Plummer, and J.C. Sourie. A bilevel programming approach to determining tax credits for biofuel production. European Journal of Operational Research, 1999, to appear.
[6] K. Shimizu and M. Lu. A global optimization method for the Stackelberg problem with convex functions via problem transformations and concave programming. IEEE Transactions on Systems, Man, and Cybernetics, 25:1635-1640, 1995.
[7] P. Loridan and J. Morgan. New results on approximate solutions in two-level optimization. Optimization, 20:819-836, 1989.
[8] S. Dempe. A necessary and a sufficient optimality condition for bilevel programming problems. Optimization, 25:341-354, 1992.
[9] G. Savard and J. Gauvin. The steepest descent direction for the nonlinear bilevel programming problem. Operations Research Letters, 15:265-272, 1994.
[10] J.F. Falk and J. Liu. On bilevel programming, Part I: General nonlinear cases. Mathematical Programming, 70:47-72, 1995.
[11] J. Outrata and J. Zowe. A numerical approach to optimization problems with variational inequality constraints. Mathematical Programming, 68:105-130, 1995.
[12] J.V. Outrata. On the numerical solution of a class of Stackelberg problems. Zeitschrift für Operations Research, 34:255-277, 1990.
[13] J.F. Bard. Convex two-level optimization. Mathematical Programming, 40:15-27, 1988.
[14] W. Bialas and M. Karwan. Two-level linear programming. Management Science, 30:1004-1020, 1984.
[15] S. Dempe. A simple algorithm for the linear bilevel programming problem. Optimization, 18:373-385, 1987.
[16] P.T. Harker and J.-S. Pang. Existence of optimal solutions to mathematical programs with equilibrium constraints. Operations Research Letters, 7:61-64, 1988.
[17] P. Loridan and J. Morgan. ε-regularized two-level optimization problems: approximation and existence results. In Optimization - Fifth French-German Conference (Varetz), pages 99-113. Lecture Notes in Mathematics, No. 1405, Springer-Verlag, Berlin, 1989.
[18] R. Lucchetti, F. Mignanego, and G. Pieri. Existence theorem of equilibrium points in Stackelberg games with constraints. Optimization, 18:857-866, 1987.
[19] P. Loridan and J. Morgan. Weak via strong Stackelberg problem: New results. Journal of Global Optimization, 8:263-287, 1996.
[20] B. Bank, J. Guddat, D. Klatte, B. Kummer, and K. Tammer. Non-Linear Parametric Optimization. Akademie-Verlag, Berlin, 1982.
[21] S. Dempe and H. Schmidt. On an algorithm solving two-level programming problems with nonunique lower level solutions. Computational Optimization and Applications, 6:227-249, 1996.
[22] P. Loridan and J. Morgan. Least-norm regularization for weak two-level optimization problems. In Optimization, Optimal Control and Partial Differential Equations, volume 107 of International Series of Numerical Mathematics, pages 307-318. Birkhäuser Verlag, Basel, 1992.
[23] D.A. Molodtsov. The solution of a certain class of non-antagonistic games. Zurnal Vycislitel'noi Matematiki i Matematiceskoi Fiziki, 16:1451-1456, 1976.
[24] J.W. Daniel. Stability of the solution of definite quadratic programs. Mathematical Programming, 5:41-53, 1973.
[25] H. Schmidt. Zwei-Ebenen-Optimierungsaufgaben mit mehrelementiger Lösung der unteren Ebene. PhD thesis, Fakultät für Mathematik, Technische Universität Chemnitz-Zwickau, 1995.
[26] A. Shapiro. Sensitivity analysis of nonlinear programs and differentiability properties of metric projections. SIAM Journal on Control and Optimization, 26:628-645, 1988.
[27] D. Ralph and S. Dempe. Directional derivatives of the solution of a parametric nonlinear program. Mathematical Programming, 70:159-172, 1995.
[28] B. Kummer. Newton's method for non-differentiable functions. In Advances in Mathematical Optimization, volume 45 of Mathematical Research. Akademie-Verlag, Berlin, 1988.
[29] R.W. Chaney. Piecewise $C^k$ functions in nonsmooth analysis. Nonlinear Analysis: Theory, Methods & Applications, 15:649-660, 1990.
[30] A.V. Fiacco and G.P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley & Sons, New York, 1968.
[31] S. Dempe. On generalized differentiability of optimal solutions and its application to an algorithm for solving bilevel optimization problems. In D.-Z. Du, L. Qi, and R.S. Womersley, editors, Recent Advances in Nonsmooth Optimization, pages 36-56. World Scientific Publishers, Singapore, 1995.
[32] S. Dempe. An implicit function approach to bilevel programming problems. In A. Migdalas, P.M. Pardalos, and P. Värbrand, editors, Multilevel Optimization: Algorithms and Applications, pages 273-294. Kluwer Academic Publishers, Dordrecht, 1998.
[33] K. Malanowski. Differentiability with respect to parameters of solutions to convex programming problems. Mathematical Programming, 33:352-361, 1985.
[34] S. Dempe and S. Vogel. The subdifferential of the optimal solution in parametric optimization. Technical Report 97-10, Fakultät für Mathematik und Informatik, TU Bergakademie Freiberg, 1997.
[35] V.S. Mikhalevich, A.M. Gupal, and V.I. Norkin. Methods of Nonconvex Optimization. Nauka, Moscow, 1987 (in Russian).
[36] H. Schramm. Eine Kombination von Bundle- und Trust-Region-Verfahren zur Lösung nichtdifferenzierbarer Optimierungsprobleme. Bayreuther Mathematische Schriften, No. 30, Bayreuth, 1989.
[37] H. Schramm and J. Zowe. A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results. SIAM Journal on Optimization, 2:121-152, 1992.
[38] K.C. Kiwiel. Methods of Descent for Nondifferentiable Optimization. Springer-Verlag, Berlin, 1985.
[39] S.M. Robinson. Generalized equations and their solutions. Part I: Basic theory. Mathematical Programming Study, 10:128-141, 1979.
[40] Z.-Q. Luo, J.-S. Pang, and D. Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, Cambridge, 1996.
[41] J. Outrata, M. Kočvara, and J. Zowe. Nonsmooth Approach to Optimization Problems with Equilibrium Constraints. Kluwer Academic Publishers, Dordrecht, 1998.
[42] S. Scholtes and M. Stöhr. Exact penalization of mathematical programs with equilibrium constraints. SIAM Journal on Control and Optimization, 37:617-652, 1999.
[43] D.J. White and G. Anandalingam. A penalty function approach for solving bi-level linear programs. Journal of Global Optimization, 3:397-419, 1993.
