AN UNCONSTRAINED MINIMIZATION APPROACH TO THE SOLUTION OF OPTIMIZATION PROBLEMS WITH SIMPLE BOUNDS

Francisco Facchinei and Stefano Lucidi, Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, Via Buonarroti 12, 00185 Roma, Italy. e-mail (Facchinei): [email protected]; e-mail (Lucidi): [email protected]

Abstract: A new method for the solution of minimization problems with simple bounds is presented. Global convergence of a general scheme requiring the solution of a single linear system at each iteration is proved and a superlinear convergence rate is established without requiring the strict complementarity assumption. The theory presented covers Newton and Quasi-Newton methods, allows rapid changes in the active set estimate and is based on a smooth unconstrained reformulation of the bound constrained problem.

Key Words: Bound constrained problem, penalty function, Newton method, nonmonotone line search, strict complementarity.


1 Introduction

We are concerned with the solution of simple bound constrained minimization problems of the form
\[
\min f(x) \quad \text{s.t.} \quad l \le x \le u, \tag{PB}
\]
where the objective function f is smooth, l and u are constant vectors, and the inequalities hold componentwise. This paper is the continuation of [13], where a wide class of differentiable exact penalty functions for Problem (PB) was introduced and studied in detail. On the basis of these penalty functions it is possible to define several algorithmic schemes for the solution of Problem (PB). This paper is devoted to the detailed study of one of these schemes, which, from the results reported in [13], appears to be promising. Penalty functions (both differentiable and nondifferentiable ones) for the solution of quadratic, box constrained problems have attracted much attention in the last few years and have proved to be a powerful tool that can lead to efficient algorithms [6, 20, 21, 23, 24, 25, 26]. This work can be seen as an attempt to extend this kind of result to nonquadratic problems.

Box constrained problems arise often in applications, and some authors even claim that any real-world unconstrained optimization problem is meaningful only if solved subject to box constraints. These facts have motivated considerable research devoted to the development of efficient and reliable solution algorithms, especially in the quadratic case. The development of such algorithms is a challenging task; on the one hand, the appealing structure of the constraints urges researchers to develop ad hoc minimization techniques that take full advantage of this structure; on the other hand, Problem (PB) still retains the main difficulty generally associated with inequality constrained problems: the determination of the set of active constraints at the solution.

The algorithms most widely used to solve Problem (PB) fall into the active set category. In this class of methods, at each iteration we have a working set that is supposed to approximate the set of active constraints at the solution and that is iteratively updated. In general, only a single constraint can be added to or deleted from the working set at each iteration, and this can unnecessarily slow down the convergence rate, especially when dealing with large-scale problems. Note, however, that, for the special case of Problem (PB), it is possible to envisage algorithms that update the working set more efficiently [17], especially in the quadratic case [9]. Actually, a number of proposals have been made in recent years to design algorithms that quickly identify the correct active set. With regard to Problem (PB), the seminal work is [2] (see also [1]), where it is shown that if the strict complementarity assumption holds, then it is possible, using a projection method, to add to or delete from the current estimated active set many constraints at each iteration and yet identify the correct active set in a finite number of steps. This work has motivated many further studies on projection techniques, both for the general linearly constrained case and for the box constrained case (see, e.g., [3], [4], [5], and [10]), and it is safe to say that algorithms in this class are among the most efficient ones for the solution of large-scale, convex, quadratic problems [29], [30].

More recently, trust region algorithms for unconstrained optimization have been successfully extended to handle the presence of bounds on the variables. The global convergence theory thus developed is very robust [7], [16] and, under appropriate assumptions, it is possible to establish a superlinear convergence rate without requiring strict complementarity [22], [16]. Furthermore, preliminary numerical results on small, dense problems [8], [16] show that these methods are effective and suggest that they are well suited to large-scale problems. Another algorithm also based on a trust region philosophy, but in connection with a nonsmooth merit function, is proposed in [34]. A major difference between this latter algorithm and the techniques considered so far is that the iterates generated are not forced to remain feasible throughout.


We finally mention that interior point methods for the solution of Problem (PB) are currently an active field of research and that some interesting theoretical results can be obtained in this framework, yet computational experience with this class of methods is still very limited (see, however, [31]).

In this paper we propose a new general scheme for the solution of Problem (PB) which does not fit in any of the categories considered above. At each iteration a linear system is solved to compute a search direction. This linear system, whose definition is based on a powerful active set identification technique, can be viewed as the system arising from the application of the Newton method to the solution of the Kuhn-Tucker conditions for Problem (PB). To globalize this Newton-type algorithm it is possible to employ a nonmonotone line search stabilization technique ([19]) in conjunction with a simple continuously differentiable exact penalty function for Problem (PB) whose properties were studied in [13]. Differentiable penalty functions are often blamed for being too computationally expensive; however, the one we employ in this paper takes full advantage of the structure of Problem (PB) and is not expensive. Furthermore, thanks to the nonmonotone stabilization technique, we have to resort to the penalty function very seldom, and in most iterations we do not even compute its value. It is worth mentioning the following points.
(a) A complete global convergence theory is established for the proposed general scheme, which covers Newton and Quasi-Newton algorithms.
(b) It is shown that this general scheme does not prevent superlinear convergence, in the sense that if a step length of one along the search direction yields superlinear convergence then, without requiring strict complementarity, the step length of one is eventually accepted.
(c) Rapid changes in the active set estimate are allowed.
(d) The points generated by the algorithms at each iteration need not be feasible.
(e) The main computational burden per iteration is given by the solution of a square linear system whose dimension is equal to the number of variables estimated to be nonactive.
(f) A particular Newton-type algorithm is described which falls within the general scheme of point (a) and for which it is possible to establish, under the strong second order sufficient condition but without requiring strict complementarity, a superlinear convergence rate.
Numerical results obtained with the algorithm of point (f) were reported in [13], where it was shown that our approach is viable in practice, at least for small-to-medium-size problems. We are currently investigating truncated versions of this algorithm with the aim of tackling large-scale problems as well; we shall report on this topic in a future paper. Regarding point (d), we note that the possible infeasibility of the points generated may constitute a limitation if the bounds are "hard", but often this is not the case, and the possibility to violate some of the constraints may give additional, beneficial flexibility.

The paper is organized as follows. In the next section some basic definitions and assumptions are stated. In Section 3 the main properties of a differentiable merit function for Problem (PB) are recalled. Sections 4, 5, and 6 contain a detailed exposition of the algorithm and an analysis of its main properties (some relevant lengthy proofs, which are extensions of known results for unconstrained minimization problems, are collected in the Appendix).
Conclusions and directions for future research are outlined in Section 7.

Regarding the notation, if M is an n × n matrix with rows M_i, i = 1, ..., n, and if I and J are index sets such that I, J ⊆ {1, ..., n}, we denote by M_I the |I| × n submatrix of M consisting of


rows M_i, i ∈ I, and we denote by M_{I,J} the |I| × |J| submatrix of M consisting of elements M_{i,j}, i ∈ I, j ∈ J. If w is an n-vector, we denote by w_I the subvector with components w_i, i ∈ I, and we denote by Diag[w_i] the n × n diagonal matrix with diagonal elements w_i. A superscript k is used to indicate iteration numbers; furthermore, we shall often omit the arguments and write, for example, f^k instead of f(x^k). Finally, we denote by E the n × n identity matrix and by ‖·‖ the Euclidean norm.

2 Problem formulation and preliminaries

For convenience we recall Problem (PB):
\[
\min f(x) \quad \text{s.t.} \quad l \le x \le u. \tag{PB}
\]
For simplicity we assume that the objective function f : IR^n → IR is three times continuously differentiable (even if weaker assumptions can be used, see Remark 6.1) and that l_i < u_i for every i = 1, ..., n. Note that −∞ and +∞ are admitted values for l_i and u_i respectively, i.e. we also consider the case in which some (possibly all) bounds are not present. In the sequel we denote by F the feasible set of Problem (PB), that is
\[
F := \{ x \in \mathbb{R}^n : l \le x \le u \}. \tag{1}
\]

Let α ∈ IR^n and β ∈ IR^n be two fixed vectors of positive constants and let x^a and x^b be two feasible points such that f(x^a) ≠ f(x^b). Without loss of generality we assume that f(x^a) < f(x^b). The algorithms proposed in this paper generate, starting from x^a, a sequence of points which belong to the following open set:
\[
S := \{ x \in \mathbb{R}^n : l - \alpha < x < u + \beta,\; f(x) < f(x^b) \}.
\]
Roughly speaking, x^a is the starting point, while x^b determines the maximum value which can be taken by the objective function at the points generated by the algorithm. We remark that not every point produced by the algorithm we propose is feasible; feasibility is only ensured in the limit. Note also that α and β are arbitrarily fixed before starting the algorithm and never changed during the minimization process.

We now introduce an assumption that is needed to guarantee that no unbounded sequences are produced by the minimization process. This hypothesis plays the same role as the compactness assumption on the level sets of the objective function in the unconstrained case. We note that this assumption (or a similar one) is needed by any standard algorithm which guarantees the existence of a limit point.

Assumption 1 The set S is bounded.

Assumption 1 is automatically satisfied in the following cases:
- all the variables have finite lower and upper bounds;
- f(x) is radially unbounded, that is, lim_{‖x‖→∞} f(x) = +∞.
Observe also that in the unconstrained case Assumption 1 is equivalent to the standard compactness hypothesis on the level sets of the objective function. In the sequel of this paper we shall consider in detail the results only for the case in which all the variables are box constrained, i.e. the case in which no l_i is −∞ and no u_i is +∞.


The extension to the general case is trivial and, therefore, we omit it. With this assumption, the KKT conditions for x̄ to solve Problem (PB) are
\[
\nabla f(\bar x) - \bar\lambda + \bar\mu = 0, \qquad (l - \bar x)'\bar\lambda = 0, \qquad (\bar x - u)'\bar\mu = 0, \qquad \bar\lambda \ge 0, \quad \bar\mu \ge 0, \quad l \le \bar x \le u, \tag{2}
\]
where λ̄ ∈ IR^n and μ̄ ∈ IR^n are the KKT multipliers. Strict complementarity is said to hold at the KKT triplet (x̄, λ̄, μ̄) if x̄_i = l_i implies λ̄_i > 0 and x̄_i = u_i implies μ̄_i > 0. An equivalent way to write the KKT conditions is the following one:
\[
l \le \bar x \le u; \qquad l_i < \bar x_i < u_i \;\Longrightarrow\; \nabla f(\bar x)_i = 0; \qquad \bar x_i = l_i \;\Longrightarrow\; \nabla f(\bar x)_i \ge 0; \qquad \bar x_i = u_i \;\Longrightarrow\; \nabla f(\bar x)_i \le 0. \tag{3}
\]
In this case the strict complementarity assumption corresponds to having ∇f(x̄)_i > 0 and ∇f(x̄)_i < 0 in the second and third implications of (3). It is also possible to give second order sufficient conditions of optimality for Problem (PB). The most common is the KKT second order sufficient condition, see e.g. [1]. However, in order to prove a superlinear convergence rate without assuming strict complementarity, we shall employ a stronger condition, known as the strong second order sufficient condition. This condition has already been employed, with the same purpose, in [22] (see also [33]).

Condition 1 Let (x̄, λ̄, μ̄) be a KKT triplet for Problem (PB). We say that the strong second order sufficient condition holds at x̄ if
\[
z' \nabla^2 f(\bar x) z > 0, \qquad \forall z \ne 0,\; z \in \{ z \in \mathbb{R}^n : z_i = 0,\; \forall i : \bar\lambda_i > 0 \text{ or } \bar\mu_i > 0 \}.
\]
We note that the strong second order condition boils down to the KKT second order sufficient condition if the strict complementarity assumption holds. In general, however, Condition 1 is stronger than the KKT second order sufficient condition in that it requires positive definiteness of the Hessian of the objective function on a larger subspace.
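For readers who want to experiment, the componentwise conditions (3) translate directly into a few lines of code. The following Python sketch (our own illustration, not part of the paper; names are ours) measures the violation of (3) at a candidate point.

```python
import numpy as np

def kkt_residual(x, grad, l, u, tol=1e-8):
    """Measure the violation of the componentwise KKT conditions (3)
    for the box l <= x <= u; `grad` is the gradient of f at x.
    Returns 0 (up to tol) when x is a KKT point."""
    viol = max(0.0, np.max(l - x), np.max(x - u))              # feasibility
    free = (x > l + tol) & (x < u - tol)
    viol = max(viol, np.max(np.abs(grad[free])) if free.any() else 0.0)
    at_lower = x <= l + tol                                    # need grad_i >= 0
    viol = max(viol, np.max(-grad[at_lower]) if at_lower.any() else 0.0)
    at_upper = x >= u - tol                                    # need grad_i <= 0
    viol = max(viol, np.max(grad[at_upper]) if at_upper.any() else 0.0)
    return viol

# tiny example: minimize f(x) = 0.5*||x||^2 on [1, 2]^2; the solution is x = (1, 1)
x = np.array([1.0, 1.0]); l = np.array([1.0, 1.0]); u = np.array([2.0, 2.0])
print(kkt_residual(x, x, l, u))   # the gradient of f is x itself; prints 0.0
```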

3 A differentiable exact penalty function for Problem (PB)

In this section we introduce a differentiable exact penalty function for Problem (PB). The penalty function belongs to a more general class of penalty functions studied in [13]. Here we report only some very basic facts on this penalty function; the interested reader is referred to [13] for a more complete discussion. The penalty function is given by
\[
P(x;\varepsilon) := f(x) + \sum_{i=1}^{n} \Bigl[ \lambda_i(x)\, r_i(x;\varepsilon) + \frac{1}{\varepsilon}\, \frac{r_i(x;\varepsilon)^2}{c(x)\, a_i(x)} \Bigr]
+ \sum_{i=1}^{n} \Bigl[ \mu_i(x)\, s_i(x;\varepsilon) + \frac{1}{\varepsilon}\, \frac{s_i(x;\varepsilon)^2}{c(x)\, b_i(x)} \Bigr], \tag{4}
\]
where
\[
r_i(x;\varepsilon) := \max\Bigl\{ l_i - x_i,\; -\frac{\varepsilon}{2}\, c(x)\, a_i(x)\, \lambda_i(x) \Bigr\}, \qquad
s_i(x;\varepsilon) := \max\Bigl\{ x_i - u_i,\; -\frac{\varepsilon}{2}\, c(x)\, b_i(x)\, \mu_i(x) \Bigr\}, \tag{5}
\]
and where a_i, b_i and c are barrier functions, see [13]:
\[
a_i(x) := \alpha_i - l_i + x_i, \qquad b_i(x) := \beta_i + u_i - x_i, \qquad c(x) := f(x^b) - f(x), \tag{6}
\]
while λ_i(x) and μ_i(x) are, according to the definition given in [15], multiplier functions:
\[
\lambda_i(x) = \frac{(x_i - u_i)^2}{(x_i - u_i)^2 + (l_i - x_i)^2}\, \nabla f(x)_i, \qquad i = 1, \dots, n, \tag{7}
\]
\[
\mu_i(x) = -\frac{(l_i - x_i)^2}{(x_i - u_i)^2 + (l_i - x_i)^2}\, \nabla f(x)_i, \qquad i = 1, \dots, n. \tag{8}
\]

The multiplier functions are obviously continuously differentiable and it is easy to verify, see [13], that if (x̄, λ̄, μ̄) is a Kuhn-Tucker triplet for Problem (PB) then λ(x̄) = λ̄ and μ(x̄) = μ̄. The penalty function depends, as usual, on a positive parameter ε; furthermore, it is defined only on S (see Section 2). We note that, on the boundary of S, at least one of the terms a_i(x), b_i(x) and c(x) goes to zero, and this causes the level sets of the penalty function to be compact. In particular, this implies that a minimization algorithm applied to the penalty function will never generate unbounded sequences. A detailed study of the properties of P(x;ε) can be found in [13]. It can be proved that, for sufficiently small values of the penalty parameter ε, there is a one-to-one correspondence between (unconstrained) stationary and minimum points of the penalty function on S and stationary and minimum points of Problem (PB). Hence, we can solve Problem (PB) by performing a single, unconstrained minimization of P(x;ε), provided that ε is small enough. From this point of view, another important feature of the penalty function P(x;ε) is that, in spite of the terms (5), it is continuously differentiable on S, so that standard, efficient methods for unconstrained smooth minimization can be employed. The gradient of P(x;ε) is given by (see [13]):
\[
\begin{aligned}
\nabla P(x;\varepsilon) = {} & -\frac{1}{\varepsilon c(x)}\,\mathrm{Diag}\Bigl[\frac{1}{a_i(x)}\Bigr]\,\mathrm{Diag}\Bigl[2 + \frac{r_i(x;\varepsilon)}{a_i(x)}\Bigr]\, r(x;\varepsilon)
+ \frac{1}{\varepsilon c(x)}\,\mathrm{Diag}\Bigl[\frac{1}{b_i(x)}\Bigr]\,\mathrm{Diag}\Bigl[2 + \frac{s_i(x;\varepsilon)}{b_i(x)}\Bigr]\, s(x;\varepsilon) \\
& + \nabla\lambda(x)\, r(x;\varepsilon) + \frac{1}{\varepsilon c(x)^2}\, r(x;\varepsilon)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i(x)}\Bigr]\, r(x;\varepsilon)\, \nabla f(x) \\
& + \nabla\mu(x)\, s(x;\varepsilon) + \frac{1}{\varepsilon c(x)^2}\, s(x;\varepsilon)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i(x)}\Bigr]\, s(x;\varepsilon)\, \nabla f(x). 
\end{aligned} \tag{9}
\]
We remark that the terms of the gradient have been rearranged using the expression of the multiplier functions, so that the expression of the gradient of f(x) does not appear explicitly in the above formula. We finally report the following technical result, which will be used in the sequel and follows from [13, Proposition 2.3].

Proposition 3.1 For every ε > 0 the set {x ∈ S : P(x;ε) ≤ P(x^a;ε)} is closed. Furthermore, there exists a compact set L such that, for every ε > 0, we have
\[
\{ x \in S : P(x;\varepsilon) \le P(x^a;\varepsilon) \} \subseteq L \subset S.
\]
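To make the definition concrete, the following Python sketch (ours; the toy data and argument names are illustrative assumptions, not the authors' code) evaluates P(x;ε) directly from (4)-(8).

```python
import numpy as np

def penalty(x, f, grad_f, l, u, fb_val, alpha, beta, eps):
    """Evaluate the exact penalty function P(x; eps) of (4)-(8).

    f, grad_f   : callables returning f(x) and its gradient;
    fb_val      : f(x^b), the reference value defining c(x) = f(x^b) - f(x);
    alpha, beta : the fixed positive vectors entering a_i and b_i.
    A minimal sketch; it assumes x lies in the open set S."""
    g = grad_f(x)
    a = alpha - l + x                                   # a_i(x), eq. (6)
    b = beta + u - x                                    # b_i(x), eq. (6)
    c = fb_val - f(x)                                   # c(x),   eq. (6)
    w = (x - u) ** 2 + (l - x) ** 2
    lam = (x - u) ** 2 / w * g                          # lambda_i(x), eq. (7)
    mu = -(l - x) ** 2 / w * g                          # mu_i(x),     eq. (8)
    r = np.maximum(l - x, -0.5 * eps * c * a * lam)     # r_i(x;eps),  eq. (5)
    s = np.maximum(x - u, -0.5 * eps * c * b * mu)      # s_i(x;eps),  eq. (5)
    return (f(x)
            + np.sum(lam * r + r ** 2 / (eps * c * a))
            + np.sum(mu * s + s ** 2 / (eps * c * b)))

# toy quadratic on the box [0, 1]^2, with x^b = (1, 1) as the reference point
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
l, u = np.zeros(2), np.ones(2)
x = np.array([0.3, 0.7])
print(penalty(x, f, grad_f, l, u, f(np.ones(2)),
              alpha=np.ones(2), beta=np.ones(2), eps=0.1))
```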


The properties of the penalty function reported so far clearly show that we can solve Problem (PB) by unconstrained optimization techniques. However, if one wants to develop a practical algorithm, at least two important questions have to be answered: how to calculate a suitable value of ε, so that, as discussed above, the unconstrained minimization of the penalty function is equivalent to the solution of Problem (PB); and which unconstrained optimization algorithm to employ for the minimization of the penalty function. In [13] we proposed a very general scheme for updating ε that, coupled with practically any standard unconstrained minimization algorithm, allows us to solve Problem (PB). This scheme, however, has a drawback in that it exploits neither the structure of Problem (PB) nor that of the minimization algorithm employed. The aim of this paper is to present a new algorithm based on P(x;ε) which answers the two questions raised above in a novel way. In particular, the unconstrained minimization algorithm we use is a nonmonotone line search scheme which employs a search direction that is strongly related to the KKT conditions of Problem (PB). Since this algorithm is closely tailored to the structure of the problem, we can use a rule for updating the penalty parameter different from the one proposed in a much broader context in [13], and which, for the problem at hand, is much more efficient from a practical point of view.

To help the reader, we split the presentation of the new scheme into three parts. In the next section we introduce a general algorithm model for the minimization of P(x;ε) which is based on the one proposed in [19]. We show that, for every fixed value of the penalty parameter ε, this algorithm is globally convergent to solutions of Problem (PB), provided that the search direction satisfies suitable, nonstandard assumptions. In Section 5 we show how it is possible, by considering an approximation of the KKT conditions, to compute cheaply and efficiently a search direction which, for sufficiently small values of the penalty parameter ε, satisfies all the assumptions required for the global convergence of the nonmonotone algorithm of Section 4. Finally, in Section 6 we describe the overall algorithm, which includes an automatic updating scheme for the penalty parameter ε and is based on the results of the previous two sections. We also study the local convergence properties of the algorithm and show that, under mild assumptions, it is quadratically convergent to solutions of Problem (PB).

4 A nonmonotone algorithm for the minimization of P(x;ε)

In this section we introduce a nonmonotone algorithm for the minimization of P(x;ε) for a fixed value of the penalty parameter ε. We show that, if the search direction employed satisfies certain nonstandard conditions, then every limit point of the sequence produced is both a stationary point of P(x;ε) and a solution of Problem (PB). In the next two sections we shall show how a direction d satisfying these nonstandard assumptions can be computed. The algorithm we consider is an iterative process of the form

\[
x^{k+1} = x^k + \alpha^k d^k, \tag{10}
\]

where x^0 = x^a ∈ F is the starting point, d^k is the search direction and α^k is the stepsize. In order to establish the convergence properties of the algorithm, we assume that the following condition on the direction d^k is always satisfied.

Assumption 2 The search direction d^k ∈ IR^n satisfies the following conditions:
(a) d^k = 0 if and only if x^k is a stationary point of Problem (PB);


(b) if x^k → x̄ and d^k → 0, then x̄ is a stationary point of Problem (PB).
Furthermore, in certain specific iterations (see below), we shall also assume that the following condition is fulfilled by the direction.

Assumption 3 There exists a positive number σ such that the search direction d^k ∈ IR^n satisfies the condition
\[
\nabla P(x^k;\varepsilon)^T d^k \le -\sigma \|d^k\|^2.
\]

Using these assumptions we can now introduce a general algorithm model for the solution of Problem (PB) which is strongly based on the NonMonotone stabilization algorithm proposed in [19] and which includes, as particular cases, many known linesearch algorithms. The algorithm model is an iterative process of the form (10) that includes different strategies for enforcing global convergence without requiring a monotonic reduction of the merit function. This may be reasonable in many situations. For example, in our case, if the sequence {‖d^k‖} goes to zero, then, by Assumption 2 (b), the corresponding sequence of points {x^k} is converging to stationary points. Then an effective criterion to check whether convergence is taking place is to monitor whether the norm of the direction is decreasing. Thus the "normal step" of the algorithm is to check whether the norm of the direction has "sufficiently" decreased. If it has, the algorithm accepts the unit step size (α^k = 1) without computing the merit function. Otherwise, after a check on the objective function value (and a possible "backtrack", see below), the algorithm performs a nonmonotone Armijo-type linesearch procedure [18]. In order to prevent the sequence of points from leaving the region of interest (with possible convergence to local maxima or occurrence of overflows), a "function step" is performed at least every N ≥ 1 iterations. In a "function step" the objective function is computed and its value is compared with an adjustable reference value R. If the value of the objective function is smaller than the reference value, the algorithm proceeds as in a normal step. Otherwise the algorithm "backtracks" by restoring the vector of variables to the last point where the objective function was smaller than the reference value R. The linesearch procedure is

linesearch: If necessary modify d^k so that Assumption 3 is satisfied. Find the smallest integer i = 0, 1, ... such that
\[
x^k + 2^{-i} d^k \in S, \tag{11}
\]
\[
P(x^k + 2^{-i} d^k;\varepsilon) \le R + \gamma\, 2^{-i}\, \nabla P(x^k;\varepsilon)^T d^k; \tag{12}
\]
set α^k = 2^{-i}, ℓ = k + 1 and update R.

Note that this linesearch procedure is invoked only in certain steps and that only in these steps Assumption 3 is required to hold. In the description of the algorithm that follows, ℓ denotes the iteration index where the merit function has been evaluated and the reference value modified. The precise form of the backtracking procedure can then be described as follows.

backtrack: Replace x^k by x^ℓ and set k = ℓ.
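As an illustration of the acceptance test (11)-(12), the following Python sketch performs the nonmonotone Armijo backtracking on the merit function; the function and parameter names are ours and the value of γ is only an example.

```python
def nonmonotone_armijo(P, x, d, gPd, R, in_S, gamma=1e-4, max_halvings=50):
    """Nonmonotone Armijo step (11)-(12): find the smallest i such that
    x + 2**(-i) * d lies in S and P(x + 2**(-i) * d) <= R + gamma * 2**(-i) * gPd.

    P    : callable evaluating the merit function P(.; eps) at a point;
    gPd  : the directional derivative grad P(x)' d (negative by Assumption 3);
    R    : the current reference value;
    in_S : callable testing membership in the open set S.
    Illustrative sketch only, not the authors' code; x and d are numpy arrays.
    """
    step = 1.0
    for _ in range(max_halvings):
        trial = x + step * d
        if in_S(trial) and P(trial) <= R + gamma * step * gPd:
            return step, trial          # step = alpha^k, trial = x^{k+1}
        step *= 0.5
    raise RuntimeError("line search failed; should not happen under Assumption 3")
```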


Actually, in describing the algorithm we have simplified matters in order to concentrate on the main arguments. In fact, when we try to accept the unit stepsize without computing the merit function value (normal step), we have to check whether x^k + d^k ∈ S, because the merit function is defined only on the open set S and we cannot generate points outside this region. Hence we may have to "scale" d^k in order to ensure the condition x^k + d^k ∈ S. Note that this procedure is standard and common to all the modifications of unconstrained minimization algorithms designed to locate stationary points of an objective function on an open set (see, e.g., [27]). The scaling procedure is

scale: Find the smallest integer i = 0, 1, ... such that
\[
x^k + 2^{-i} d^k \in S \tag{13}
\]
and set d^k = 2^{-i} d^k.
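A minimal sketch of this scaling step, with an assumed membership test for S, is the following (ours, for illustration only).

```python
def scale_into_S(x, d, in_S, max_halvings=50):
    """Procedure `scale', eq. (13): halve d until x + d lies in the open set S.

    in_S is a user-supplied membership test for S.  Illustrative sketch only."""
    for _ in range(max_halvings):
        if in_S(x + d):
            return d
        d = 0.5 * d
    raise RuntimeError("could not scale the direction into S")
```

We can now describe the algorithm in more detail.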

NonMonotone Stabilization Algorithm for Box Constrained Problems (NMSB)

Data: Choose x^0 = x^a ∈ F, ε > 0, Δ_0 ≥ 0, δ ∈ (0,1), γ ∈ (0,1/2) and N ≥ 1.

Initialization: Set k = 0, j = 0, ℓ(j) = 0, and Δ = Δ_0. Compute P(x^0;ε) and set R_j = P(x^0;ε).

Iteration: Compute d^k satisfying Assumption 2. If ‖d^k‖ = 0 stop. If k ≠ ℓ + N perform an n-step to calculate α^k; otherwise perform an f-step to calculate α^k. Set x^{k+1} = x^k + α^k d^k, k = k + 1, and repeat Iteration.

n-step: If ‖d^k‖ ≤ Δ perform (a), otherwise perform (b).
(a) Perform scale and set Δ = δΔ.
(b) Compute P(x^k;ε). If P(x^k;ε) ≥ R_j perform backtrack and linesearch; otherwise set ℓ(j) = k, perform update R_j, set j = j + 1 and perform linesearch.

f-step: Compute P(x^k;ε). If P(x^k;ε) ≥ R_j perform (c); otherwise perform (d).
(c) Perform backtrack and linesearch.
(d) Set ℓ(j) = k, perform update R_j and set j = j + 1. If ‖d^k‖ ≤ Δ perform scale and set Δ = δΔ; otherwise perform linesearch.

In order to complete the description of the algorithm we only need to specify the way in which the reference value R_j is updated. To this end we note that the index j is incremented each time we set ℓ(j) = k, i.e. each time the function is evaluated. Therefore {x^{ℓ(j)}} is the sequence of points where the merit function is evaluated and {R_j} is the sequence of reference values. The reference value is initially set to P(x^0;ε). Whenever a point x^{ℓ(j)} is generated such that P(x^{ℓ(j)};ε) < R_j, the reference value is updated by taking into account the "memory" (i.e. a fixed number m(j) ≤ m̄ of previous values) of the objective function. To be precise, the updating rule for R_j is the following one.


Update R_j: Given m̄ ≥ 0, let m(j+1) be such that m(j+1) ≤ min[m(j)+1, m̄]. Choose the value R_{j+1} to satisfy
\[
P(x^{\ell(j+1)};\varepsilon) \le R_{j+1} \le \max_{0 \le i \le m(j+1)} P(x^{\ell(j+1-i)};\varepsilon). \tag{14}
\]

Note that if, in the procedure Update R_j, m̄ = 0, then each time we update R_j we simply set it to the current penalty function value, so that we perform monotone linesearches. On the other hand, if m̄ > 0, then it is possible to choose a value of R_j which is larger than the current value of the penalty function, so that we perform nonmonotone linesearches and the penalty function value can increase from one iteration to the next. It turns out that nonmonotone linesearches are a very valuable tool from the numerical point of view [18, 19]. The NMSB algorithm is a very general scheme and encompasses many possible extensions of unconstrained algorithms. For example, if we set m̄ = 0 and Δ_0 = 0 we obtain the Armijo stabilization algorithm; if we set m̄ > 0 and Δ_0 = 0 we obtain the box constrained version of the nonmonotone algorithm proposed in [18]. By an almost verbatim repetition of the proof of convergence described in [19] it is easy to show that the following result holds.

Theorem 4.1 Suppose that we generate a sequence {x^k} according to the NMSB algorithm described above. Then:
(i) there exists at least one limit point of the sequence {x^k};
(ii) every limit point of the sequence {x^k} is a KKT point of Problem (PB);
(iii) every limit point x̄ of the sequence {x^k} is such that f(x̄) ≤ f(x^a).

The detailed proof of this theorem, which is just an adaptation of some of the proofs in [19], is long and cumbersome, and we therefore report it in the Appendix. In the statement of Theorem 4.1 we have stressed the properties of the algorithm in terms of properties of Problem (PB). However, we can equivalently see algorithm NMSB as an algorithm for the minimization of the penalty function. From this point of view, we can also see that every accumulation point of the sequence generated by the algorithm is a stationary point of the penalty function. This easily follows from the fact that every KKT point of Problem (PB) is a stationary point of P(x;ε) for every positive value of ε, see [13].
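As an aside, the nonmonotone reference value (14) is easy to realize with a short memory of merit values. The sketch below is ours; the choice R_{j+1} = maximum over the stored values is just the usual nonmonotone choice allowed by (14).

```python
from collections import deque

class ReferenceValue:
    """Reference value R_j of (14): R_{j+1} must lie between the newest merit
    value and the maximum of the last m(j+1)+1 <= m_bar+1 stored values.
    Here we simply take R equal to that maximum; with m_bar = 0 the scheme
    reduces to a monotone (Armijo-like) rule.  Illustrative sketch only."""

    def __init__(self, first_value, m_bar=10):
        self.history = deque([first_value], maxlen=m_bar + 1)

    @property
    def R(self):
        return max(self.history)

    def update(self, new_value):
        # called whenever a point x^{l(j)} with P(x^{l(j)}; eps) < R_j is found
        self.history.append(new_value)
```

A trial point is then accepted, in the linesearch, through the test P(trial) ≤ R + γ α ∇P(x^k;ε)^T d^k, with R the current value returned by this object.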

5 The search direction

In this section we show how to build a direction d^k that satisfies Assumptions 2 and 3. In particular, we define a search direction which satisfies Assumption 2 for every value of ε, while Assumption 3 is fulfilled only if the penalty parameter ε is sufficiently small. The calculation of d^k is based on an identification technique that uses the simple multiplier functions (7) and (8) and on the solution of KKT-like equations for Problem (PB). Based on the multiplier functions we can introduce the following "guesses" of the sets of indices active at their lower or upper bounds:
\[
L(x;\varepsilon) := \Bigl\{ i : x_i \le l_i + \min\Bigl[ \frac{\varepsilon}{2}\, c(x)\, a_i(x)\, \lambda_i(x),\; \frac{u_i - l_i}{3} \Bigr] \Bigr\}, \tag{15}
\]
\[
U(x;\varepsilon) := \Bigl\{ i : x_i \ge u_i - \min\Bigl[ \frac{\varepsilon}{2}\, c(x)\, b_i(x)\, \mu_i(x),\; \frac{u_i - l_i}{3} \Bigr] \Bigr\}, \tag{16}
\]


where ε is the positive parameter used in the penalty function and a_i(x), b_i(x), and c(x) are the barrier functions defined by (6). Furthermore, we denote by N(x;ε) the set of indices that are estimated to be nonactive:
\[
N(x;\varepsilon) := \{1, \dots, n\} \setminus \bigl( L(x;\varepsilon) \cup U(x;\varepsilon) \bigr). \tag{17}
\]

Note that the sets L(x;ε), U(x;ε) and N(x;ε) are pairwise disjoint for every x and for every positive value of ε; a small computational sketch of these estimates is given below. The following theorem shows that they are indeed good estimates, at least in a neighborhood of a KKT triplet of Problem (PB); its validity follows immediately from Theorem 2.1 and Remark 2.1 in [15].
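For concreteness, a minimal numerical sketch of the estimates (15)-(17), built on the multiplier functions (7)-(8), could look as follows (Python; the argument names are ours, not the authors').

```python
import numpy as np

def estimate_active_sets(x, g, l, u, fb_val, f_val, alpha, beta, eps):
    """Active-set estimates L(x;eps), U(x;eps), N(x;eps) of (15)-(17).

    g is the gradient of f at x, fb_val = f(x^b), f_val = f(x), and
    alpha, beta are the fixed positive vectors entering a_i and b_i.
    Returns three boolean masks over the indices 1..n.  Illustrative sketch."""
    a = alpha - l + x                              # a_i(x), eq. (6)
    b = beta + u - x                               # b_i(x), eq. (6)
    c = fb_val - f_val                             # c(x),   eq. (6)
    w = (x - u) ** 2 + (l - x) ** 2
    lam = (x - u) ** 2 / w * g                     # lambda_i(x), eq. (7)
    mu = -(l - x) ** 2 / w * g                     # mu_i(x),     eq. (8)
    third = (u - l) / 3.0
    L = x <= l + np.minimum(0.5 * eps * c * a * lam, third)   # eq. (15)
    U = x >= u - np.minimum(0.5 * eps * c * b * mu, third)    # eq. (16)
    N = ~(L | U)                                              # eq. (17)
    return L, U, N
```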

Theorem 5.1 Let (x̄, λ̄, μ̄) be a KKT triplet for Problem (PB) and let ε̄ be a positive constant. Then, for every ε ∈ (0, ε̄] there exists a neighborhood Ω of x̄ such that, for all x ∈ Ω, we have
\[
\{ i : l_i < \bar x_i < u_i \} \subseteq N(x;\varepsilon) \subseteq \{ i : \bar\lambda_i = 0 \text{ and } \bar\mu_i = 0 \},\quad
\{ i : \bar\lambda_i > 0 \} \subseteq L(x;\varepsilon) \subseteq \{ i : l_i = \bar x_i \},\quad
\{ i : \bar\mu_i > 0 \} \subseteq U(x;\varepsilon) \subseteq \{ i : u_i = \bar x_i \}. \tag{18}
\]
Moreover, if the strict complementarity assumption holds, then, for all x ∈ Ω and ε ∈ (0, ε̄],
\[
N(x;\varepsilon) = \{ i : l_i < \bar x_i < u_i \}, \qquad L(x;\varepsilon) = \{ i : l_i = \bar x_i \}, \qquad U(x;\varepsilon) = \{ i : u_i = \bar x_i \}.
\]

The direction we shall use in our algorithm, then, is defined as the solution of the system

\[
\begin{bmatrix} H^k_{N^k} \\ E_{L^k} \\ E_{U^k} \end{bmatrix} d^k
= - \begin{bmatrix} \nabla f^k_{N^k} \\ (x^k - l)_{L^k} \\ (x^k - u)_{U^k} \end{bmatrix}, \tag{19}
\]

where L^k, U^k and N^k are given by (15), (16), and (17), respectively, and where H^k is an "approximation" of ∇²f^k satisfying the following assumption.

Assumption 4 The matrices H^k are bounded and a positive constant σ̄ exists such that, for every k,
\[
\bar\sigma \|z\|^2 \le z' H^k_{N^k,N^k}\, z, \qquad \forall z \in \mathbb{R}^{|N^k|}.
\]

By Theorem 5.1 (for every ε > 0), there exists a neighborhood of a KKT point of Problem (PB) satisfying the strong second order condition in which the matrices H^k = ∇²f(x^k) satisfy Assumption 4. Furthermore, Assumption 4 obviously guarantees that the direction d^k is well defined, i.e. that system (19) is uniquely solvable. Note that the search direction d^k depends on the current approximation to the solution x^k, on the matrix H^k and on the value of ε, so that we should write d(x^k; H^k; ε). However, when there is no possibility of misunderstanding, we shall use the short notation d^k and reserve the full notation d(x^k; H^k; ε) for when we want to stress explicitly the dependence of d^k on its arguments or when ambiguities could arise. A small computational sketch of how the estimates (15)-(17) and system (19) combine is given below; the next theorem then shows that Assumption 2 (a) is always satisfied.
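Given the boolean masks returned by the earlier sketch, the direction is obtained from one linear solve of dimension |N^k|, as in (20) below. This is our own illustration, not the authors' code.

```python
import numpy as np

def search_direction(x, g, H, l, u, L, U, N):
    """Solve system (19) for the search direction d^k.

    L, U, N : boolean masks as returned by estimate_active_sets;
    H       : the (approximate) Hessian;  g : the gradient of f at x.
    Only the |N| x |N| principal block of H enters a linear solve.
    Illustrative sketch only."""
    d = np.zeros_like(x)
    d[L] = -(x - l)[L]                                   # rows E_{L^k} of (19)
    d[U] = -(x - u)[U]                                   # rows E_{U^k} of (19)
    rhs = -(g[N] + H[np.ix_(N, L)] @ d[L] + H[np.ix_(N, U)] @ d[U])
    d[N] = np.linalg.solve(H[np.ix_(N, N)], rhs)         # cf. eq. (20) below
    return d
```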


Theorem 5.2 For every ε > 0, for every x^k belonging to S, and for every matrix H such that H_{N^k,N^k} is positive definite, d^k is equal to zero if and only if x^k is a stationary point of Problem (PB).

Proof. Suppose that d^k = 0. Then, since d^k is the solution of system (19), and taking into account the definition of the multiplier functions and of the index sets L^k, N^k and U^k, we have that the following implications hold: i ∈ L^k ⟹ x^k_i = l_i and ∇f_i(x^k) ≥ 0; i ∈ N^k ⟹ l_i < x^k_i < u_i and ∇f_i(x^k) = 0; i ∈ U^k ⟹ x^k_i = u_i and ∇f_i(x^k) ≤ 0; that is, (3) holds. Suppose now that x^k is a stationary point of Problem (PB). Since x^k is feasible, we have, by (18),
\[
(x^k - l)_{L^k} = 0, \qquad (x^k - u)_{U^k} = 0, \qquad \forall \varepsilon > 0.
\]
Furthermore, by the first equation of (2) and by (18) we have that
\[
\nabla f(x^k)_{N^k} = 0, \qquad \forall \varepsilon > 0.
\]
Hence the right-hand side of system (19) is 0, and the theorem follows by noting that, by the assumption made on H, system (19) is nonsingular. □

We now consider Assumption 2 (b).

Theorem 5.3 Let {x^k} be a sequence of points such that x^k ∈ S and {d(x^k; H^k; ε)} → 0. Then, for every positive value of ε, every accumulation point x̄ of {x^k} is a KKT point.

Proof. We can write the solution of system (19) in the following form:
\[
d^k_{N^k} = -(H^k_{N^k,N^k})^{-1}\bigl[\nabla f^k_{N^k} + H^k_{N^k,L^k}\, d^k_{L^k} + H^k_{N^k,U^k}\, d^k_{U^k}\bigr], \qquad
d^k_{L^k} = -(x^k - l)_{L^k}, \qquad
d^k_{U^k} = -(x^k - u)_{U^k}. \tag{20}
\]
Taking into account that the number of subsets of {1, ..., n} is finite, we can assume, without loss of generality, that the index sets L^k, N^k, and U^k are constant, so that we can write L(x^k;ε) = L, N(x^k;ε) = N, U(x^k;ε) = U. Passing to the limit in (20), and taking into account that {d^k} → 0 by assumption together with Assumption 4, we get
\[
\bar x_L = l_L, \qquad \bar x_U = u_U, \tag{21}
\]
\[
\nabla f_N(\bar x) = 0. \tag{22}
\]
By (22), (8), and (7) we have λ_N(x̄) = 0 and μ_N(x̄) = 0.


Then, by the definition of the index set N, we have
\[
l_N \le \bar x_N \le u_N, \tag{23}
\]
so that, recalling (21), we conclude that x̄ is feasible. By the definition of L, (21), and (7), we have
\[
\nabla f_L(\bar x) \ge 0. \tag{24}
\]
Analogously, by the definition of U, (21), and (8), we also have
\[
-\nabla f_U(\bar x) \ge 0. \tag{25}
\]
Now the theorem follows by noting that (21)-(22) and (23)-(25) coincide with (3). □

In the last theorem of this section we show that Assumption 3 can also be satisfied. But there is a difference to be stressed. While it was possible to prove that Assumptions 2 (a) and (b) are satisfied by the search direction defined by (19) for every positive value of the penalty parameter ε, it is possible to satisfy Assumption 3 only for sufficiently small values of ε. On the other hand this fact is not unexpected, since there is a complete correspondence between the solutions of Problem (PB) and the unconstrained minimizers of the penalty function only for sufficiently small values of ε.

Theorem 5.4 There exists an ε* > 0 such that, for every {x^k} and {ε} with (i) ε ∈ (0, ε*], (ii) x^k ∈ {x ∈ S : P(x;ε) ≤ P(x^a;ε)}, the following relation holds:
\[
\nabla P(x^k;\varepsilon)'\, d(x^k; H^k; \varepsilon) \le -\sigma\, \| d(x^k; H^k; \varepsilon) \|^2, \tag{26}
\]
for some positive σ.

Proof. The proof is by contradiction. Assume that the theorem is false; then there exist sequences {x^k}, {ε^k}, {H^k}, and {σ^k} such that
\[
\varepsilon^k \downarrow 0, \qquad \sigma^k \downarrow 0, \qquad x^k \in \{x \in S : P(x;\varepsilon^k) \le P(x^a;\varepsilon^k)\}, \qquad
\nabla P(x^k;\varepsilon^k)'\, d^k > -\sigma^k \|d^k\|^2. \tag{27}
\]
Furthermore, analogously to the proof of Theorem 5.3, we shall assume, without loss of generality, that the index sets L^k, N^k, and U^k are constant, namely L(x^k;ε^k) = L, N(x^k;ε^k) = N, U(x^k;ε^k) = U. By (9) we can write
\[
\begin{aligned}
\nabla P(x^k;\varepsilon^k)'\, d^k = {} & -\frac{1}{\varepsilon^k c^k}\, (r^k)'\,\mathrm{Diag}\Bigl[2 + \frac{r_i^k}{a_i^k}\Bigr]\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr]\, d^k
+ \frac{1}{\varepsilon^k c^k}\, (s^k)'\,\mathrm{Diag}\Bigl[2 + \frac{s_i^k}{b_i^k}\Bigr]\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr]\, d^k \\
& + (r^k)'\, (\nabla\lambda^k)'\, d^k
+ \frac{1}{\varepsilon^k (c^k)^2}\, (r^k)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr]\, r^k \,(\nabla f^k)'\, d^k \\
& + (s^k)'\, (\nabla\mu^k)'\, d^k
+ \frac{1}{\varepsilon^k (c^k)^2}\, (s^k)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr]\, s^k \,(\nabla f^k)'\, d^k.
\end{aligned} \tag{28}
\]


By Assumption 1 and by (15), (16) and (19) we have, for ε^k small enough,
\[
\begin{aligned}
i \in L &\;\Longrightarrow\; r_i^k = l_i - x_i^k = d_i^k, & s_i^k &= -\tfrac{\varepsilon^k}{2}\, c^k b_i^k \mu_i^k,\\
i \in N &\;\Longrightarrow\; r_i^k = -\tfrac{\varepsilon^k}{2}\, c^k a_i^k \lambda_i^k, & s_i^k &= -\tfrac{\varepsilon^k}{2}\, c^k b_i^k \mu_i^k,\\
i \in U &\;\Longrightarrow\; r_i^k = -\tfrac{\varepsilon^k}{2}\, c^k a_i^k \lambda_i^k, & s_i^k &= x_i^k - u_i = -d_i^k,
\end{aligned} \tag{29}
\]
so that we can rewrite (28) as
\[
\begin{aligned}
\nabla P(x^k;\varepsilon^k)'\, d^k = {} & -\frac{1}{\varepsilon^k c^k}\, (d_L^k)'\,\mathrm{Diag}\Bigl[2 + \frac{l_i - x_i^k}{a_i^k}\Bigr]_{LL} \Bigl(\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr] d^k\Bigr)_L \\
& + \tfrac{1}{2}(\lambda_N^k)'\,\mathrm{Diag}\Bigl[2 - \frac{\varepsilon^k c^k \lambda_i^k}{2}\Bigr]_{NN} d_N^k
+ \tfrac{1}{2}(\lambda_U^k)'\,\mathrm{Diag}\Bigl[2 - \frac{\varepsilon^k c^k \lambda_i^k}{2}\Bigr]_{UU} d_U^k \\
& - \tfrac{1}{2}(\mu_L^k)'\,\mathrm{Diag}\Bigl[2 - \frac{\varepsilon^k c^k \mu_i^k}{2}\Bigr]_{LL} d_L^k
- \tfrac{1}{2}(\mu_N^k)'\,\mathrm{Diag}\Bigl[2 - \frac{\varepsilon^k c^k \mu_i^k}{2}\Bigr]_{NN} d_N^k \\
& - \frac{1}{\varepsilon^k c^k}\, (d_U^k)'\,\mathrm{Diag}\Bigl[2 + \frac{x_i^k - u_i}{b_i^k}\Bigr]_{UU} \Bigl(\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr] d^k\Bigr)_U
+ (d_L^k)'\bigl[(\nabla\lambda^k)'\, d^k\bigr]_L \\
& - \frac{\varepsilon^k c^k}{2}\, \bigl[\mathrm{Diag}(a_i^k)\,\lambda^k\bigr]_N'\, \bigl[(\nabla\lambda^k)'\, d^k\bigr]_N
- \frac{\varepsilon^k c^k}{2}\, \bigl[\mathrm{Diag}(a_i^k)\,\lambda^k\bigr]_U'\, \bigl[(\nabla\lambda^k)'\, d^k\bigr]_U \\
& - \frac{\varepsilon^k c^k}{2}\, \bigl[\mathrm{Diag}(b_i^k)\,\mu^k\bigr]_L'\, \bigl[(\nabla\mu^k)'\, d^k\bigr]_L
- \frac{\varepsilon^k c^k}{2}\, \bigl[\mathrm{Diag}(b_i^k)\,\mu^k\bigr]_N'\, \bigl[(\nabla\mu^k)'\, d^k\bigr]_N
- (d_U^k)'\bigl[(\nabla\mu^k)'\, d^k\bigr]_U \\
& + \frac{1}{\varepsilon^k (c^k)^2}\, \Bigl[(r^k)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr]\, r^k + (s^k)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr]\, s^k\Bigr]\, (\nabla f^k)'\, d^k.
\end{aligned} \tag{30}
\]
We now make the following readily verifiable observations.

(i) Each element [2 + (l_i − x_i^k)/a_i^k], i ∈ L, and [2 + (x_i^k − u_i)/b_i^k], i ∈ U, appearing in the diagonal matrices of the first and fourth terms of (30), is greater than 1.

(ii) Taking into account (19) we can write
\[
(\nabla f^k)_N = -H^k_{N,N}\, d_N^k - H^k_{N,L}\, d_L^k - H^k_{N,U}\, d_U^k,
\]
and hence we have, for i ∈ N,
\[
\lambda_i^k = -\frac{(x^k - u)_i^2}{(x^k - u)_i^2 + (l - x^k)_i^2}\, \bigl(H^k_{N,N}\, d_N^k + H^k_{N,L}\, d_L^k + H^k_{N,U}\, d_U^k\bigr)_i, \qquad
\mu_i^k = \frac{(l - x^k)_i^2}{(x^k - u)_i^2 + (l - x^k)_i^2}\, \bigl(H^k_{N,N}\, d_N^k + H^k_{N,L}\, d_L^k + H^k_{N,U}\, d_U^k\bigr)_i.
\]

(iii) By (7), (8) and (29) we can write
\[
\lambda_i^k = \frac{(d_i^k)^2}{(x^k - u)_i^2 + (l - x^k)_i^2}\, \nabla f(x^k)_i \quad (i \in U), \qquad
\mu_i^k = -\frac{(d_i^k)^2}{(x^k - u)_i^2 + (l - x^k)_i^2}\, \nabla f(x^k)_i \quad (i \in L).
\]


(iv) The quantities ‖x^k − l‖, ‖x^k − u‖, ‖λ(x^k)‖, ‖μ(x^k)‖, ‖∇λ(x^k)‖, ‖∇μ(x^k)‖ and ‖∇f(x^k)‖ are bounded.

(v) By Assumption 4 and (19), since x^k ∈ S, the sequence ‖d^k‖ is bounded.

Then, taking into account (30) and the points (i)-(iv) above, we can assert that, for ε^k small enough,
\[
\begin{aligned}
\nabla P(x^k;\varepsilon^k)'\, d^k \le {} & -\frac{K_1}{\varepsilon^k c^k}\|d_L^k\|^2 - K_2\|d_N^k\|^2 - \frac{K_3}{\varepsilon^k c^k}\|d_U^k\|^2 + K_4\|d_L^k\|^2 + K_5\|d_U^k\|^2 \\
& + K_6\|d_L^k\|\,\|d_N^k\| + K_7\|d_N^k\|\,\|d_U^k\| + K_8\|d_L^k\|\,\|d_U^k\| \\
& + \frac{1}{\varepsilon^k (c^k)^2}\Bigl[(d_L^k)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr] d_L^k + (d_U^k)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr] d_U^k\Bigr](\nabla f^k)'\, d^k
+ \varepsilon^k K_9\|d^k\|^2,
\end{aligned} \tag{31}
\]
where K_1, ..., K_9 are positive constants. Relations (31) and (27) imply that, for k sufficiently large,
\[
\begin{aligned}
0 \le {} & \sigma^k\|d^k\|^2 - \frac{K_1}{\varepsilon^k c^k}\|d_L^k\|^2 - K_2\|d_N^k\|^2 - \frac{K_3}{\varepsilon^k c^k}\|d_U^k\|^2 + K_4\|d_L^k\|^2 + K_5\|d_U^k\|^2 \\
& + K_6\|d_L^k\|\,\|d_N^k\| + K_7\|d_N^k\|\,\|d_U^k\| + K_8\|d_L^k\|\,\|d_U^k\| \\
& + \frac{1}{\varepsilon^k (c^k)^2}\Bigl[(d_L^k)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr] d_L^k + (d_U^k)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr] d_U^k\Bigr](\nabla f^k)'\, d^k
+ \varepsilon^k K_9\|d^k\|^2 \\[2pt]
= {} & -\bigl(\|d_L^k\|, \|d_N^k\|, \|d_U^k\|\bigr)\, Q^k \,\bigl(\|d_L^k\|, \|d_N^k\|, \|d_U^k\|\bigr)'
+ \underbrace{\varepsilon^k K_9\|d^k\|^2}_{(*)} \\
& + \underbrace{\frac{1}{\varepsilon^k (c^k)^2}\Bigl[(d_L^k)'\,\mathrm{Diag}\Bigl[\frac{1}{a_i^k}\Bigr] d_L^k + (d_U^k)'\,\mathrm{Diag}\Bigl[\frac{1}{b_i^k}\Bigr] d_U^k\Bigr](\nabla f^k)'\, d^k}_{(**)},
\end{aligned} \tag{32}
\]
where Q^k is the matrix defined by
\[
Q^k = \begin{pmatrix}
\dfrac{K_1}{\varepsilon^k c^k} - K_4 - \sigma^k & -\dfrac{K_6}{2} & -\dfrac{K_8}{2} \\[6pt]
-\dfrac{K_6}{2} & K_2 - \sigma^k & -\dfrac{K_7}{2} \\[6pt]
-\dfrac{K_8}{2} & -\dfrac{K_7}{2} & \dfrac{K_3}{\varepsilon^k c^k} - K_5 - \sigma^k
\end{pmatrix}. \tag{33}
\]

We want to show that, for ε^k small enough, Q^k is a positive definite matrix with eigenvalues uniformly bounded away from 0. To this end we note that we can write, for any k,

\[
Q^k = \begin{pmatrix}
\dfrac{K_1}{\varepsilon^k c^k} - \dfrac{K_1}{\tilde\varepsilon \tilde c} - (\sigma^k - \tilde\sigma) & 0 & 0 \\[6pt]
0 & \tilde\sigma - \sigma^k & 0 \\[6pt]
0 & 0 & \dfrac{K_3}{\varepsilon^k c^k} - \dfrac{K_3}{\tilde\varepsilon \tilde c} - (\sigma^k - \tilde\sigma)
\end{pmatrix}
+ \begin{pmatrix}
\dfrac{K_1}{\tilde\varepsilon \tilde c} - K_4 - \tilde\sigma & -\dfrac{K_6}{2} & -\dfrac{K_8}{2} \\[6pt]
-\dfrac{K_6}{2} & K_2 - \tilde\sigma & -\dfrac{K_7}{2} \\[6pt]
-\dfrac{K_8}{2} & -\dfrac{K_7}{2} & \dfrac{K_3}{\tilde\varepsilon \tilde c} - K_5 - \tilde\sigma
\end{pmatrix}, \tag{34}
\]


where ε̃, c̃ and σ̃ are positive constants such that the second matrix on the right-hand side of (34) is a positive definite matrix (this is always possible, for ε̃, c̃ and σ̃ small enough, as can be verified by using Sylvester's theorem). Since ε^k and σ^k are positive quantities tending to 0, and since c^k is a positive quantity bounded from above, we have that, for k sufficiently large, the first matrix on the right-hand side of (34) is positive semidefinite, from which the assertion on the uniform positive definiteness of Q^k readily follows.

By Proposition 3.4 in [13], we can assume, without loss of generality, that x^k → x̄ ∈ F ∩ S. This implies that c^k is bounded away from 0 and, recalling (15), (16), (19) and ε^k ↓ 0, also that ‖d_L^k‖ → 0 and ‖d_U^k‖ → 0. Since by (v) ‖d^k‖ is bounded, we can assume, without loss of generality, that d^k admits a limit, so that two cases can occur: (a) d^k converges to 0; (b) d^k converges to a vector different from 0.

(a) In this case, term (**) can be majorized by the following expression:
\[
M_0 \Bigl( M_1\, \frac{\|d_L^k\|^2}{\varepsilon^k} + M_2\, \frac{\|d_U^k\|^2}{\varepsilon^k} \Bigr)\, \bigl| (\nabla f^k)'\, d^k \bigr|, \tag{35}
\]
where
\[
M_0 \ge \frac{1}{(c^k)^2}, \qquad M_1 \ge \max_{1 \le i \le n} \frac{1}{a_i^k}, \qquad M_2 \ge \max_{1 \le i \le n} \frac{1}{b_i^k}.
\]
Note that M_0, M_1, M_2 satisfying the above relations exist because the respective right-hand sides are bounded from above, by Proposition 3.1 and the definition of the set S. Now, taking into account (34), (35) and the fact that |(∇f^k)'d^k| goes to 0, we easily see that eventually, in (32), the term (**) is dominated by the quadratic term defined by Q^k. Since the same happens for the term (*), because ε^k goes to 0, we have a contradiction from (32).

(b) In this case we have that, as already observed, d_L^k → 0 and d_U^k → 0, so that d_N^k → d̃_N ≠ 0. We can write
\[
(\nabla f^k)'\, d^k = (\nabla f_L^k)'\, d_L^k + (\nabla f_N^k)'\, d_N^k + (\nabla f_U^k)'\, d_U^k.
\]
Then, using observation (ii) above and recalling Assumption 4, we have that eventually
\[
(\nabla f^k)'\, d^k < 0,
\]
so that the term (**) is nonpositive. Since the quadratic term in (32) tends to a negative quantity and the term (*) tends to zero, again we have a contradiction from (32) and the proof is complete. □

6 The algorithm

At this point of our analysis we have all the tools we need to describe our algorithm and to prove its properties. The results described in the previous section show that the direction d^k given by (19) satisfies Assumption 2 for every positive ε, while Assumption 3 is fulfilled if ε is smaller than a threshold value ε* (see Theorem 5.4). However, the value ε* generally is not known in advance, and therefore has to be determined during the minimization process. Actually, it turns out that this is an easy task that can be accomplished by modifying the procedure linesearch in the following way.

ε-linesearch: If
\[
\nabla P(x^k;\varepsilon)^T d^k \le -\varepsilon \|d^k\|^2, \tag{36}
\]

then

find the smallest integer i = 0, 1, ... such that
\[
x^k + 2^{-i} d^k \in S, \tag{37}
\]
\[
P(x^k + 2^{-i} d^k;\varepsilon) \le R + \gamma\, 2^{-i}\, \nabla P(x^k;\varepsilon)^T d^k, \tag{38}
\]
set α^k = 2^{-i}, ℓ = k + 1 and update R; otherwise set ε = 0.5ε and restart the NMSB algorithm with x^0 = x^k if P(x^k;ε) ≤ P(x^a;ε) and x^0 = x^a otherwise.
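The reduction logic of the ε-linesearch can be summarized in a few lines; the following sketch (ours) returns the new value of ε, whether the ordinary linesearch should be performed, and from which point to restart.

```python
def epsilon_test(gPd, d_norm_sq, eps, P_xk, P_xa):
    """Test (36) of the eps-linesearch.

    gPd        : grad P(x^k; eps)' d^k;
    d_norm_sq  : ||d^k||^2;
    P_xk, P_xa : penalty values at x^k and x^a (for the restart rule).
    Returns (new_eps, do_linesearch, restart_from_xa).  Illustrative sketch."""
    if gPd <= -eps * d_norm_sq:
        return eps, True, False               # (36) holds: ordinary linesearch
    restart_from_xa = not (P_xk <= P_xa)      # restart from the better point
    return 0.5 * eps, False, restart_from_xa  # halve eps and restart NMSB
```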

Basically, the procedure ε-linesearch differs from the procedure linesearch in that, while in the procedure linesearch it was "vaguely" required to modify d^k so as to satisfy Assumption 3, in the procedure ε-linesearch it is checked whether this assumption holds; if not, the value of ε is reduced. Roughly speaking, Theorem 5.4 guarantees that after a finite number of reductions the value of ε settles down and Assumption 3 is always satisfied. Note also that when a reduction of ε takes place, and hence, in a sense, we change the objective function, we also restart the minimization process from x^a if this leads to a better penalty function value. We call the algorithm obtained by substituting the procedure linesearch with the procedure ε-linesearch algorithm ε-NMSB. The following theorem can be proved.

Theorem 6.1 Suppose that we employ algorithm ε-NMSB to minimize P(x;ε). Suppose also that Assumptions 1 and 4 hold. Then
(a) after a finite number of iterations the penalty parameter ε stays fixed;
(b) there exists at least one limit point of the sequence {x^k} generated by the algorithm;
(c) every limit point of the sequence {x^k} is a KKT point of Problem (PB).

Proof. We first prove point (a). The proof is by contradiction. Suppose that ε is reduced an infinite number of times. Then there exist subsequences {ε^k}_K and {x^k}_K such that {ε^k}_K ↓ 0 and, for every k ∈ K, the test (36) is violated, i.e.
\[
\nabla P(x^k;\varepsilon^k)^T d^k > -\varepsilon^k \|d^k\|^2.
\]
Since {ε^k}_K ↓ 0 and taking into account (37) and the fact that, by the instructions of the ε-NMSB algorithm, P(x^k;ε^k) ≤ P(x^a;ε^k) and x^k ∈ S, we can apply Theorem 5.4. But then we obtain a contradiction to Theorem 5.4, because eventually ε^k becomes smaller than both ε* and σ, the positive constants whose existence is proved in Theorem 5.4. Thus point (a) is proved. Points (b) and (c) now readily follow from Theorem 4.1. □

We now pass to analyze the local properties of the algorithm. We first show that if convergence occurs towards a point satisfying the strong second order sufficient condition and exact second order information is used, then the convergence rate is quadratic.

Theorem 6.2 Suppose that the sequence {x^k} produced by the algorithm converges to a point x̄ satisfying the strong second order sufficient condition, and that eventually H^k = ∇²f(x^k). Then eventually x^{k+1} = x^k + d^k (i.e. the stepsize of one is eventually accepted) and the convergence rate is quadratic.


Proof. We first make two preliminary observations. The first observation is that the gradient of P is semismooth according to the definition of [28, 32]. This follows easily from the expression (9) and the facts that the composition of semismooth functions is semismooth, that the max operator is semismooth, and that smooth functions are also semismooth [28]. The second observation is that the direction d^k used by the algorithm can also be obtained as the solution of the following linear system:
\[
\begin{bmatrix} H^k & E_{L^k}^T & -E_{U^k}^T \\ E_{L^k} & 0 & 0 \\ -E_{U^k} & 0 & 0 \end{bmatrix}
\begin{bmatrix} d^k \\ z_{L^k} \\ z_{U^k} \end{bmatrix}
= - \begin{bmatrix} \nabla f^k \\ (x^k - l)_{L^k} \\ (u - x^k)_{U^k} \end{bmatrix}. \tag{39}
\]
This easily follows from the special structure of the matrices E_{L^k} and E_{U^k} and of the systems (39) and (19). This means that the direction d^k is the same direction considered in [15] with reference to a local algorithm for the solution of inequality constrained problems of general type. Taking into account these two observations, and the fact that eventually the penalty parameter is no longer changed (see the previous theorem), the theorem readily follows from [15, Theorem 3.2] and [11, Theorem 3.2]. □

Remark 6.1 It may be interesting to note that at the beginning of the paper we made, for simplicity, the blanket assumption that f is three times continuously differentiable. However, it should be clear from the proofs of Theorems 6.1 and 6.2 that to establish global convergence it is sufficient to assume continuous differentiability of the objective function, while to prove the quadratic convergence rate of the algorithm it is enough to assume that the Hessian of f is semismooth. Furthermore, these differentiability assumptions are only needed on S.

Remark 6.2 In Theorem 6.2 we made the assumption that {x^k} → x̄. We remark, however, that it is standard to prove that if one of the limit points of the sequence {x^k} generated by the algorithm satisfies the strong second order sufficient condition, then the whole sequence converges to this point.

Exploiting the results of [15] it is now easy to analyze also the case in which quasi-Newton methods are employed.

Theorem 6.3 Suppose that the sequence {x^k} produced by the algorithm converges to a point x̄ satisfying the strong second order sufficient condition, and that
\[
\lim_{k\to\infty} \frac{\|(H^k_{N^k} - \nabla^2 f^k_{N^k})\, d^k\|}{\|d^k\|} = 0.
\]
Then eventually x^{k+1} = x^k + d^k (i.e. the stepsize of one is eventually accepted) and the convergence rate is superlinear.

Proof. The proof is similar to that of the previous theorem, the only difference being that this time we invoke [15, Theorem 5.2] instead of [15, Theorem 3.2]. □

7 Conclusion

In this paper we described a new globally and superlinearly convergent algorithm for the solution of box constrained optimization problems. The algorithm is based on a continuously differentiable merit function and on a nonmonotone linesearch technique, and uses a new identification


technique for the active constraints which seems promising, see [13, 14]. Among the favourable characteristics of the new algorithm we recall the low computational cost per iteration and the fact that superlinear convergence can be established without requiring strict complementarity. Furthermore, although this issue was not addressed in this paper, it is not difficult to envisage a truncated version of the algorithm described here, see [14], which we believe to be, on the basis of some very preliminary computational experience, promising for the solution of large-scale problems.

8 Appendix A

To establish Theorem 4.1 we first need some technical results. First of all we remark that, by Proposition 3.1, the set
\[
\Omega_0 := \{ x \in S : P(x;\varepsilon) \le P(x^a;\varepsilon) \}
\]
is a compact set (for every fixed ε > 0).

Lemma 8.1 Let
\[
F_j = \max_{0 \le i \le m(j)} P(x^{\ell(j-i)};\varepsilon), \tag{40}
\]
and assume that Algorithm NMSB produces an infinite sequence {x^k}; then:
(a) the sequence {F_j} is nonincreasing and has a limit F̂;
(b) for any index j we have F_i < F_j for all i ≥ j + m̄ + 1;
(c) {x^k} remains in a compact set.

Proof. We first observe that {x^k} contains a subsequence of points x^{ℓ(j)} where the objective function is evaluated. At each of these points, we can define a new value F_{j+1} according to (40). By definition, the number of previous function values taken into account for determining F_{j+1} increases at most by one at each j-update, that is m(j+1) ≤ m(j) + 1. Therefore we can write
\[
F_{j+1} = \max_{0 \le i \le m(j+1)} P(x^{\ell(j+1-i)};\varepsilon) \le \max_{0 \le i \le m(j)+1} P(x^{\ell(j+1-i)};\varepsilon)
= \max\Bigl[ P(x^{\ell(j+1)};\varepsilon),\; \max_{0 \le i \le m(j)} P(x^{\ell(j-i)};\varepsilon) \Bigr] = \max\bigl[ P(x^{\ell(j+1)};\varepsilon),\; F_j \bigr].
\]
On the other hand, the instructions of Algorithm NMSB and the condition on R_{j+1} ensure that
\[
P(x^{\ell(j+1)};\varepsilon) < R_j \le F_j, \tag{41}
\]
and therefore we get, for all j,
\[
F_{j+1} \le F_j. \tag{42}
\]
From (41) and (42) it follows that P(x^{ℓ(j)};ε) ≤ R_0 = F_0 = P(x^a;ε) and hence that x^{ℓ(j)} ∈ Ω_0 for all j. Since Ω_0 is a compact set, the sequence {F_j} is bounded from below, so that, by (42), there exists F̂ such that
\[
\lim_{j\to\infty} F_j = \hat F,
\]


and this establishes (a). Property (b) follows from (41) and the fact that, for all j, F_j is computed by taking the maximum over at most m̄ + 1 previous function values. As regards (c), we first observe that, since the level set Ω_0 is bounded and x^{ℓ(j)} ∈ Ω_0, there exists a number η such that ‖x^{ℓ(j)}‖ ≤ η for all j. Further, the algorithm ensures that the objective function is computed at least every N iterations, that is ℓ(j+1) ≤ ℓ(j) + N, so that, for any x^k ∉ {x^{ℓ(j)}}, there exist an integer ν^k ≤ N and a point x^r ∈ {x^{ℓ(j)}} such that
\[
x^k = x^r + \sum_{i=0}^{\nu^k - 1} d^{r+i}.
\]
Since the points x^{r+i}, i = 1, ..., ν^k, do not belong to {x^{ℓ(j)}}, by the test at n-step we have ‖d^{r+i}‖ ≤ Δ_0, i = 0, 1, ..., ν^k − 1, so that ‖x^k‖ ≤ ‖x^r‖ + Δ_0 ν^k ≤ η + Δ_0 N, which shows that the whole sequence {x^k} is bounded. □

Lemma 8.2 Assume that Algorithm NMSB produces an infinite sequence {x^k}; let {x^{ℓ(j)}} be the sequence of points where the objective function is evaluated and let q(k) be the index defined by
\[
q(k) = \max\,[\, j : \ell(j) \le k \,]. \tag{43}
\]
Then there exists a sequence {x^{s(j)}} satisfying the following conditions:
(a) F_j = P(x^{s(j)};ε), for j = 0, 1, ...;
(b) for any integer k, there exist indices h^k and j^k such that
\[
0 < h^k - k \le N(\bar m + 1), \qquad h^k = s(j^k), \tag{44}
\]
\[
F_{j^k} = P(x^{h^k};\varepsilon) < F_{q(k)}. \tag{45}
\]

Proof. Let s(j) be an index in the set {ℓ(j), ℓ(j−1), ..., ℓ(j−m(j))} such that
\[
P(x^{s(j)};\varepsilon) = \max_{0 \le i \le m(j)} P(x^{\ell(j-i)};\varepsilon);
\]
then (a) follows from the definition of F_j. Since m(j) is bounded by the integer m̄ and ℓ(j) → ∞ for j → ∞, we have that s(j) → ∞. Let now x^k be any point produced by the algorithm and let q(k) be the index defined by (43) (thus ℓ(q(k)) is the largest index not exceeding k of an iteration that evaluates the objective function). We note that, by (43), q(h) > q(k) implies h > k. Consider the index j^k = q(k) + m̄ + 1; by the definition of F_{j^k}, there is a point x^{h^k} = x^{s(j^k)} such that P(x^{h^k};ε) = F_{j^k} and
\[
j^k \ge q(h^k) \ge j^k - m(j^k).
\]
Therefore we have q(h^k) ≥ j^k − m(j^k) ≥ j^k − m̄ = q(k) + 1, and this implies that h^k > k. Moreover, since q(h^k) − q(k) ≤ m̄ + 1 and the function is evaluated at least every N iterations, we have that h^k − k ≤ (m̄ + 1)N. Finally, by (b) of Lemma 8.1 we have F_{j^k} < F_{q(k)}, which completes the proof of assertion (b). □


Lemma 8.3 Assume that Algorithm NMSB produces an infinite sequence {x^k}. Then we have
\[
\lim_{k\to\infty} P(x^k;\varepsilon) = \lim_{j\to\infty} F_j = \hat F, \tag{46}
\]
\[
\lim_{k\to\infty} \|x^{k+1} - x^k\| = 0. \tag{47}
\]

Proof. Let {x^k}_K denote the set (possibly empty) of points satisfying the test
\[
\|d^k\| \le \Delta_0\, \delta^t, \qquad \text{for } k \in K, \tag{48}
\]
where the integer t increases with k ∈ K; when k ∈ K we set, for convenience, α^k = 1. It follows from (48) that, if K is an infinite set, we have
\[
\lim_{k\to\infty,\; k\in K} \alpha^k \|d^k\| = 0. \tag{49}
\]
Let now s(j) and q(k) be the indices defined in Lemma 8.2. We show by induction that, for any fixed integer i ≥ 1, we have
\[
\lim_{j\to\infty} \alpha^{s(j)-i} \|d^{s(j)-i}\| = 0, \tag{50}
\]
\[
\lim_{j\to\infty} P(x^{s(j)-i};\varepsilon) = \lim_{j\to\infty} P(x^{s(j)};\varepsilon) = \lim_{j\to\infty} F_j = \hat F. \tag{51}
\]
(Here and in the sequel we assume that the index j is large enough to avoid the occurrence of negative subscripts.) Assume first that i = 1. If s(j) − 1 ∈ K, (50) holds with k = s(j) − 1. Otherwise, if s(j) − 1 ∉ K, recalling the acceptability criterion of the nonmonotone line search, we can write
\[
F_j = P(x^{s(j)};\varepsilon) = P(x^{s(j)-1} + \alpha^{s(j)-1} d^{s(j)-1};\varepsilon) \tag{52}
\]
\[
\le F_{q(s(j)-1)} + \gamma\, \alpha^{s(j)-1}\, \nabla P(x^{s(j)-1};\varepsilon)^T d^{s(j)-1}. \tag{53}
\]
It follows that
\[
F_{q(s(j)-1)} - F_j \ge \gamma\, \alpha^{s(j)-1}\, \bigl| \nabla P(x^{s(j)-1};\varepsilon)^T d^{s(j)-1} \bigr|. \tag{54}
\]
Therefore, if s(j) − 1 ∉ K for an infinite subsequence, from (a) of Lemma 8.1 and (54) we get α^{s(j)-1} ∇P(x^{s(j)-1};ε)^T d^{s(j)-1} → 0, so that, by Assumption 3 on the search direction and α^{s(j)-1} ≤ 1, we also have α^{s(j)-1} ‖d^{s(j)-1}‖ → 0 for this subsequence. It can be concluded that (50) holds for i = 1. Moreover, since P(x^{s(j)};ε) = P(x^{s(j)-1} + α^{s(j)-1} d^{s(j)-1};ε), by (50) and the uniform continuity of P on the compact set containing {x^k}, equation (51) holds for i = 1. Assume now that (50) and (51) hold for a given i and consider the point x^{s(j)-i-1}. Reasoning as before, we can again distinguish the case s(j) − i − 1 ∈ K, when (48) holds with k = s(j) − i − 1, and the case s(j) − i − 1 ∉ K, in which we have
\[
P(x^{s(j)-i};\varepsilon) \le F_{q(s(j)-i-1)} + \gamma\, \alpha^{s(j)-i-1}\, \nabla P(x^{s(j)-i-1};\varepsilon)^T d^{s(j)-i-1}
\]
and hence
\[
F_{q(s(j)-i-1)} - P(x^{s(j)-i};\varepsilon) \ge \gamma\, \alpha^{s(j)-i-1}\, \bigl| \nabla P(x^{s(j)-i-1};\varepsilon)^T d^{s(j)-i-1} \bigr|. \tag{55}
\]


Then, using (49), (51), (55) and recalling that the direction satisfies Assumption 3, we can assert that equation (50) holds with i replaced by i + 1. By (50) and the uniform continuity of P, it follows that also (51) is satisfied with i replaced by i + 1, which completes the induction.

Let now x^k be any given point produced by the algorithm. Then by Lemma 8.2 there is a point x^{h^k} ∈ {x^{s(j)}} such that
\[
0 < h^k - k \le (\bar m + 1)N. \tag{56}
\]
Then we can write
\[
x^k = x^{h^k} - \sum_{i=1}^{h^k - k} \alpha^{h^k - i}\, d^{h^k - i},
\]
and this implies, by (50) and (56), that
\[
\lim_{k\to\infty} \| x^k - x^{h^k} \| = 0. \tag{57}
\]
From the uniform continuity of P it follows that
\[
\lim_{k\to\infty} P(x^k;\varepsilon) = \lim_{k\to\infty} P(x^{h^k};\varepsilon) = \lim_{j\to\infty} F_j, \tag{58}
\]
and (46) is proved. If k ∉ K, we obtain P(x^{k+1};ε) ≤ F_{q(k)} + γ α^k ∇P(x^k;ε)^T d^k and hence we have that
\[
F_{q(k)} - P(x^{k+1};\varepsilon) \ge \gamma\, \alpha^k\, \bigl| \nabla P(x^k;\varepsilon)^T d^k \bigr|. \tag{59}
\]
Therefore, by (49), (58), (59) and Assumption 3, we can conclude that
\[
\lim_{k\to\infty} \alpha^k \|d^k\| = 0,
\]
which establishes (47). □

Finally we can prove Theorem 4.1.

Proof of Theorem 4.1. We first observe that if the algorithm terminates after a finite number of iterations, the thesis follows from Theorem 5.2 and the stopping criterion. Suppose then that the sequence {x^k} is infinite. By Lemma 8.1 all the points of the sequence belong to a compact set and therefore {x^k} admits at least one limit point. Denote by x̄ any such limit point, and relabel {x^k} a subsequence converging to x̄. By (47) of Lemma 8.3, we have
\[
\lim_{k\to\infty} \alpha^k \|d^k\| = 0. \tag{60}
\]
Then either lim_{k→∞} ‖d^k‖ = 0 or there exists a subsequence {x^k}_{K_1} of {x^k} such that lim_{k→∞, k∈K_1} α^k = 0. In the first case the thesis follows from Theorem 5.3; let us then consider the second case. In the second case we can assume, without loss of generality, that there are two possibilities: (a) the sequence {α^k}_{K_1} is such that x^k + 2α^k d^k ∈ IR^n \ S; (b) the sequence {α^k}_{K_1} is such that
\[
P(x^k + 2\alpha^k d^k;\tilde\varepsilon) > P(x^k;\tilde\varepsilon) + \gamma\, 2\alpha^k\, \nabla P(x^k;\tilde\varepsilon)^T d^k, \tag{61}
\]
where we took into account that R is the maximum among the last m̄ previous values of P. We analyse first case (a). Since {x^k}_{K_1} → x̄ we have, by (60), that
\[
\{ x^k + 2\alpha^k d^k \}_{K_1} \to \bar x. \tag{62}
\]
But, taking into account that by (46) {P(x^k;ε̃)}_{K_1} tends to a finite value, we have that actually x̄ ∈ S. Hence, taking into account that S is an open set, (62) contradicts the assumption of case (a).

Let us then examine case (b). By the mean value theorem we can find, for k ∈ K_1 sufficiently large, a point u^k = x^k + ω^k 2α^k d^k with ω^k ∈ (0,1) such that
\[
\nabla P(u^k;\tilde\varepsilon)^T d^k \ge \gamma\, \nabla P(x^k;\tilde\varepsilon)^T d^k. \tag{63}
\]
Let now {x^k}_{K_2} be a subsequence of {x^k}_{K_1} such that
\[
\lim_{k\to\infty,\; k\in K_2} \frac{d^k}{\|d^k\|} = \bar d.
\]
By (61) we have {u^k}_{K_2} → x̄, so that, dividing both members of (63) by ‖d^k‖ and taking limits, we obtain (1 − γ) ∇P(x̄;ε̃)^T d̄ ≥ 0. Since 1 − γ > 0 we get
\[
\nabla P(\bar x;\tilde\varepsilon)^T \bar d \ge 0. \tag{64}
\]
But, by the condition imposed on the search direction (Assumption 3), we also have, for all k ∈ K_2,
\[
\nabla P(x^k;\tilde\varepsilon)^T \frac{d^k}{\|d^k\|} \le -\sigma \|d^k\|,
\]
which implies ∇P(x̄;ε̃)^T d̄ < 0, which, in turn, contradicts (64). □

References

[1] D.P. Bertsekas: Constrained Optimization and Lagrange Multiplier Methods. Academic Press, New York, 1982.
[2] D.P. Bertsekas: Projected Newton methods for optimization problems with simple constraints. SIAM Journal on Control and Optimization 20, 1982, pp. 221-246.
[3] J. Burke and J. Moré: On the identification of active constraints. SIAM Journal on Numerical Analysis 25, 1988, pp. 1197-1211.
[4] P. Calamai and J. Moré: Projected gradient for linearly constrained problems. Mathematical Programming (Series A) 39, 1987, pp. 93-116.
[5] T. Coleman and L. Hulbert: A direct active set algorithm for large sparse quadratic programs with simple bounds. Mathematical Programming (Series A) 45, 1987, pp. 373-406.
[6] T. Coleman and L. Hulbert: A globally and superlinearly convergent algorithm for convex quadratic programs with simple bounds. SIAM Journal on Optimization 3, 1993, pp. 298-321.
[7] A. Conn, N. Gould and Ph. Toint: Algorithms for minimization subject to bounds. SIAM Journal on Numerical Analysis 25, 1988, pp. 433-460.
[8] A. Conn, N. Gould and Ph. Toint: Testing a class of methods for solving minimization problems with simple bounds on the variables. Mathematics of Computation 50, 1988, pp. 399-430.
[9] R. Cottle and M. Goheen: A special class of large quadratic programs. In Nonlinear Programming 3, O.L. Mangasarian, R.R. Meyer and S. Robinson (eds.), Academic Press, New York, 1978, pp. 361-390.
[10] J. Dunn: Global and asymptotic convergence rate estimates for a class of projected gradient processes. SIAM Journal on Control and Optimization 19, 1981, pp. 368-400.
[11] F. Facchinei: Minimization of SC1 functions and the Maratos effect. Operations Research Letters 17, 1995, pp. 131-137.
[12] F. Facchinei and S. Lucidi: A method for the minimization of a quadratic convex function over the simplex. Operations Research Proceedings 1990, W. Bühler et al. (eds.), Springer-Verlag, Berlin, 1992, pp. 125-132.
[13] F. Facchinei and S. Lucidi: A class of penalty functions for optimization problems with bound constraints. Optimization 26, 1992, pp. 239-259.
[14] F. Facchinei and S. Lucidi: A class of methods for optimization problems with simple bounds. Part 2: Algorithms and numerical results. Technical Report R.336, IASI-CNR, Roma, Italy, 1992.
[15] F. Facchinei and S. Lucidi: Quadratically and superlinearly convergent algorithms for the solution of inequality constrained minimization problems. Journal of Optimization Theory and Applications 85, 1995, pp. 265-289.
[16] A. Friedlander, J.M. Martinez and S.A. Santos: A new trust region algorithm for bound constrained minimization. Applied Mathematics and Optimization 30, 1994, pp. 235-266.
[17] P. Gill, W. Murray and M. Wright: Practical Optimization. Academic Press, New York, 1981.
[18] L. Grippo, F. Lampariello and S. Lucidi: A nonmonotone line search technique for Newton's method. SIAM Journal on Numerical Analysis 23, 1986, pp. 707-716.
[19] L. Grippo, F. Lampariello and S. Lucidi: A class of nonmonotone stabilization methods in unconstrained optimization. Numerische Mathematik 59, 1991, pp. 779-805.
[20] L. Grippo and S. Lucidi: A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization 22, 1991, pp. 557-578.
[21] L. Grippo and S. Lucidi: On the solution of a class of quadratic programs using a differentiable exact penalty function. In System Modelling and Optimization, H.J. Sebastian and K. Tammer (eds.), Springer-Verlag, Berlin, 1990, pp. 764-773.
[22] M. Lescrenier: Convergence of trust region algorithms for optimization with bounds when strict complementarity does not hold. SIAM Journal on Numerical Analysis 28, 1991, pp. 476-495.
[23] W. Li: Differentiable piecewise quadratic exact penalty functions for quadratic programs with simple bound constraints. Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1984. To appear in SIAM Journal on Optimization.
[24] W. Li: Linearly convergent descent methods for unconstrained minimization of convex quadratic splines. Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1994. To appear in Journal of Optimization Theory and Applications.
[25] W. Li and J. Swetits: A new algorithm for strictly convex quadratic programs. Technical Report TR92-1, Department of Mathematics and Statistics, Old Dominion University, Norfolk, USA, 1992. To appear in SIAM Journal on Optimization.
[26] W. Li and J. Swetits: A Newton method for convex regression, data smoothing, and quadratic programming with bounded constraints. SIAM Journal on Optimization 3, 1993, pp. 466-488.
[27] G.P. McCormick: Nonlinear Programming. John Wiley & Sons, New York, 1983.
[28] R. Mifflin: Semismooth and semiconvex functions in constrained optimization. SIAM Journal on Control and Optimization 15, 1977, pp. 957-972.
[29] J. Moré and G. Toraldo: Algorithms for bound constrained quadratic programming problems. Numerische Mathematik 55, 1989, pp. 377-400.
[30] J. Moré and G. Toraldo: Numerical solution of large quadratic programming problems with bound constraints. SIAM Journal on Optimization 1, 1991, pp. 93-113.
[31] S. Nash and A. Sofer: A barrier method for large-scale constrained optimization. ORSA Journal on Computing 5, 1993, pp. 40-53.
[32] L. Qi and J. Sun: A nonsmooth version of Newton's method. Mathematical Programming (Series A) 58, 1993, pp. 353-368.
[33] S.M. Robinson: Generalized equations. In Mathematical Programming: The State of the Art, A. Bachem, M. Groetschel and B. Korte (eds.), Springer-Verlag, Berlin, 1983, pp. 346-367.
[34] S. Wright: Algorithms for minimization subject to bounds. Technical Report MCS-P321288, Argonne National Laboratory, Mathematics and Computer Science Division, December 1988.
