A NEW METHOD FOR LARGE-SCALE BOX CONSTRAINED CONVEX QUADRATIC MINIMIZATION PROBLEMS

Ana Friedlander, José Mario Martínez and Marcos Raydan
ABSTRACT. In this paper, we present a new method for minimizing a convex quadratic
function of many variables with box constraints. The new algorithm is a modification of a method introduced recently by Friedlander and Martínez (SIAM J. on Optimization, February 1994). Following the lines of Moré and Toraldo (SIAM J. on Optimization 1, pp. 93-113), it combines an efficient unconstrained method with gradient projection techniques. The strategy for "leaving the current face" makes it possible to obtain convergence even when the Hessian is singular. Dual nondegeneracy is not assumed anywhere. The unconstrained minimization algorithm used within the faces was introduced by Barzilai and Borwein and analyzed by Raydan (IMA Journal of Numerical Analysis 13, pp. 321-326).
Key words: quadratic programming, bound constrained problems, projected gradients,
Barzilai-Borwein method.
Work supported by FAPESP (Grant 90/3724/6), FINEP, CNPq and FAEP-UNICAMP. This paper appeared in Optimization Methods and Software 5 (1995) 57-74.
Department of Applied Mathematics, IMECC-UNICAMP, University of Campinas, CP 6065, 13081 Campinas SP, Brazil.
Department of Mathematics, Central University of Venezuela, Ap. 47002, Caracas 1041-A, Venezuela.
1 Introduction

The problem of minimizing a convex quadratic function of many variables subject to box constraints appears frequently in applications. See Cea and Glowinski [1983], O'Leary [1980], Lotstedt [1984], Herman [1980], Nickel and Tolle [1989], etc. Many algorithms for solving this type of problem are based on active set strategies (see Gill, Murray and Wright [1981], Fletcher [1987]). Essentially, an active set method generates iterates on a face of the feasible polytope until either a minimizer of the objective function on that face or a point on the boundary of the face is reached. In the first case the iterate is allowed to leave the current face and the algorithm continues working on a face of higher dimension. Since the function values are strictly decreasing, finite convergence is obtained. Many times, finite convergence results are based on the finite termination properties of the Conjugate Gradient method for quadratic functions (Hestenes and Stiefel [1952], Golub and Van Loan [1989]).

This classical approach has two main disadvantages for large-scale problems. One is that constraints are added one at a time to the working set, a fact that leads to an excessive number of iterations. The other disadvantage is that the exact minimizer on the current working face is required before dropping constraints. Of course, obtaining the exact minimizer may require many conjugate gradient iterations if the dimension of the face is large.

In order to avoid the disadvantages of the active set strategy, a different type of algorithm, based on the gradient projection, was proposed by several authors. Bertsekas [1982] used a scaled gradient projection method for minimizing arbitrary functions subject to bound constraints, Dembo and Tulowitzki [1987] proposed the combination of the gradient projection method with conjugate gradients, and Moré and Toraldo [1991] introduced an algorithm of the same type that has finite convergence if the problem is strictly convex and the solution is nondegenerate. Yang and Tolle [1986], Wright [1989], Friedlander and Martínez [1994], and other authors followed the ideas of Dembo and Tulowitzki to introduce algorithms whose efficiency is currently under discussion.

In this paper, we present a new algorithm that combines an active set strategy with the gradient projection method. As in Friedlander and Martínez [1994], we avoid the necessity of finding an exact minimizer on a face by testing a computable criterion at the current iteration that, if satisfied, guarantees that the next point will have a lower function value than the minimum value of the function on a neighborhood of fixed size restricted to the current face. This property is essential to prove convergence for a (not necessarily strictly) convex quadratic without nondegeneracy assumptions. The gradient projection techniques are used to speed up the process of identifying the optimal face, but they play no role in the convergence analysis.

The main difference between the approach of this paper and the one given in
Friedlander and Martínez [1994] is that here we use the Barzilai-Borwein method analyzed by Raydan [1993], instead of the classical Conjugate Gradient method, for minimization inside the faces. The Barzilai-Borwein method, as well as the Conjugate Gradient method, minimizes an unconstrained convex quadratic using few memory locations and few floating point operations per iteration. However, the Conjugate Gradient method requires the storage of one additional vector and also requires one additional scalar-vector multiplication and one additional vector summation, which is not negligible for large problems. On the other hand, this new method does not enjoy finite termination as the Conjugate Gradient method does. The Barzilai-Borwein method always uses the function gradient as search direction, but the choice of the steplength is not the classical choice of the steepest descent method. This new choice of the steplength speeds up the convergence of the gradient method and allows superlinear convergence for a subclass of problems, which is an interesting feature. See Barzilai and Borwein [1988] and Raydan [1993].

The Barzilai-Borwein method is nonmonotone, that is, the objective function does not necessarily decrease at each iteration. For this reason, the active set strategy used here has some differences with the one used in Friedlander and Martínez [1994]. In fact, in the method of Friedlander and Martínez, when conjugate gradient iterations are generated inside a face and one of them turns out to be infeasible, a point on the boundary of the face is computed, using the direction determined by the last (infeasible) iteration. At the new (boundary) point, the objective function is necessarily lower than at the last feasible point. In the Barzilai-Borwein method, the objective function values at infeasible iterations can be greater than at the "best" feasible iteration. However, by the convergence of the method, the value of the objective function eventually decreases. Because of this, we continue the Barzilai-Borwein iterations at infeasible points until we find a point where a decrease of the objective function takes place. This last point is used to define a direction that produces a better iterate on the boundary. In this sense, our strategy for finding a point on the boundary of the face is closer to the one used by Moré and Toraldo, who also work with infeasible points, than to the one used by Friedlander and Martínez. For these reasons, the convergence proof of the main algorithm is different (though similar) from the convergence proof of the Friedlander-Martínez method.

This paper is organized as follows. In Section 2 we prove the convergence of the Barzilai-Borwein method for solving unconstrained positive semidefinite quadratic problems. In Section 3 we define the basic method, which uses the Barzilai-Borwein algorithm inside the faces. In Section 4 we prove that the main algorithm is well defined and convergent. In Section 5 we describe the algorithms that are used to abandon the faces and to go to the
boundary. In Section 6 we make final remarks and indicate directions of current research.
2 Convergence analysis for the Barzilai-Borwein method in the convex quadratic case

In this section we establish the convergence of the Barzilai-Borwein gradient method when applied to the problem

    Minimize $\phi(x) \equiv \frac{1}{2} x^T H x - b^T x$,   (2.1)

where $H \in \mathbb{R}^{n \times n}$ is symmetric and positive semidefinite and $b$ belongs to the range of $H$. The analysis that we present is an extension of the analysis presented in Raydan [1993] for the strictly convex quadratic case. Basically, the idea of our proofs consists in reducing problem (2.1) to the case studied by Raydan [1993]. However, the details cannot be omitted.

Let $\{v_1, v_2, \dots, v_n\}$ be a set of orthonormal eigenvectors of $H$ associated with the eigenvalues $\{\lambda_1, \lambda_2, \dots, \lambda_n\}$. Furthermore, let us assume that

    $\lambda_1 = \lambda_2 = \dots = \lambda_{\ell-1} = 0$,   (2.2)

and

    $0 < \lambda_\ell \le \lambda_{\ell+1} \le \dots \le \lambda_n$   (2.3)

for some $1 \le \ell < n$. Notice that, in our notation, the case $\ell = 1$ corresponds to the strictly convex quadratic case for which all the eigenvalues of $H$ are strictly positive. A necessary and sufficient condition for $x$ being a minimizer of $\phi(x)$ is $\nabla\phi(x) \equiv Hx - b = 0$. On the other hand, $b$ is in the range of $H$ if and only if it does not have a component in the subspace generated by $\{v_1, \dots, v_{\ell-1}\}$. Therefore, the function $\phi(x)$ has minimizers if and only if there exist $\beta_i$, $\ell \le i \le n$, such that

    $b = \beta_\ell v_\ell + \beta_{\ell+1} v_{\ell+1} + \dots + \beta_n v_n$.   (2.4)

Our next result presents a characterization of the minimizers of the quadratic function $\phi(x)$.
Lemma 2.1. Let us assume that $b$ is in the range of $H$. Then, $\hat{x}$ is a minimizer of $\phi(x)$ if and only if there exist $\alpha_i$, $1 \le i \le \ell - 1$, such that

    $\hat{x} = \alpha_1 v_1 + \dots + \alpha_{\ell-1} v_{\ell-1} + \frac{\beta_\ell}{\lambda_\ell} v_\ell + \dots + \frac{\beta_n}{\lambda_n} v_n$,   (2.5)
where $\beta_i$, $\ell \le i \le n$, are defined in (2.4).

Proof: Let the vector $\hat{x}$ be given by (2.5). Then, using (2.2) it follows that

    $H\hat{x} = \frac{\beta_\ell}{\lambda_\ell}\lambda_\ell v_\ell + \dots + \frac{\beta_n}{\lambda_n}\lambda_n v_n = \beta_\ell v_\ell + \dots + \beta_n v_n = b$.
Hence, $\hat{x}$ is a minimizer of $\phi(x)$. On the other hand, let us assume that $\hat{x} \in \mathbb{R}^n$ is a minimizer of $\phi(x)$. Since $\{v_1, \dots, v_n\}$ are orthonormal vectors, there exist $\alpha_i$, $1 \le i \le n$, such that

    $\hat{x} = \alpha_1 v_1 + \alpha_2 v_2 + \dots + \alpha_n v_n$.   (2.6)

From (2.2), (2.4), (2.6) and the fact that $H\hat{x} - b = 0$, we obtain

    $(\lambda_\ell \alpha_\ell - \beta_\ell) v_\ell + \dots + (\lambda_n \alpha_n - \beta_n) v_n = 0$.

Since $\{v_\ell, \dots, v_n\}$ are linearly independent vectors, then $\alpha_i = \beta_i / \lambda_i$ for all $\ell \le i \le n$. Therefore, $\hat{x}$ can be written as in equation (2.5), and the lemma is true. □

Now, let us consider the Barzilai-Borwein method for quadratics. Given $x_0 \in \mathbb{R}^n$ and $\alpha_0 \ne 0$, the sequence of iterates is defined by

    $x_{k+1} = x_k - \frac{1}{\alpha_k} g_k$,   (2.7)

where $g_k \equiv \nabla\phi(x_k) = Hx_k - b$, and $\alpha_k$ is given by

    $\alpha_k = \frac{s_{k-1}^T H s_{k-1}}{s_{k-1}^T s_{k-1}}$,   (2.8)

where $s_{k-1} = x_k - x_{k-1}$. For any $k$, there exist $c_i^k$, $1 \le i \le n$, such that

    $x_k = c_1^k v_1 + \dots + c_n^k v_n$.   (2.9)

In particular, $x_0 = c_1^0 v_1 + \dots + c_n^0 v_n$. Let us now define, for a given $x_0$, the vector $x^*$ as

    $x^* = c_1^0 v_1 + \dots + c_{\ell-1}^0 v_{\ell-1} + \frac{\beta_\ell}{\lambda_\ell} v_\ell + \dots + \frac{\beta_n}{\lambda_n} v_n$,   (2.10)

and for every $k$ let us also define the vector $e_k = x^* - x_k$. From Lemma 2.1 it follows that $x^*$ is a minimizer of $\phi(x)$. Later in this section we will prove that the sequence $\{x_k\}$ converges to $x^*$. First, we need to show the following result.
Lemma 2.2. Let us assume that $b$ is in the range of $H$. Then, for all $k$, the vectors $g_k$, $s_k$ and $e_k$ generated by the Barzilai and Borwein method belong to $\mathrm{span}\{v_\ell, v_{\ell+1}, \dots, v_n\}$. Moreover, for all $k \ge 1$, $\alpha_k$ satisfies

    $0 < \lambda_\ell \le \alpha_k \le \lambda_n$.   (2.11)

Proof: Since $b$ is in the range of $H$ and $g_k = Hx_k - b$, then clearly $g_k \in \mathrm{span}\{v_\ell, \dots, v_n\}$. Therefore, using (2.7) and (2.9) we have for all $k$

    $x_k = c_1^0 v_1 + \dots + c_{\ell-1}^0 v_{\ell-1} + c_\ell^k v_\ell + \dots + c_n^k v_n$.   (2.12)

Using (2.12) and the definition of $s_{k-1}$ it follows that

    $s_{k-1} = (c_\ell^k - c_\ell^{k-1}) v_\ell + \dots + (c_n^k - c_n^{k-1}) v_n$,   (2.13)

which implies that for all $k$, $s_k \in \mathrm{span}\{v_\ell, \dots, v_n\}$. Similarly, from (2.10) and (2.12) we conclude that for all $k$, $e_k \in \mathrm{span}\{v_\ell, \dots, v_n\}$. Finally, from (2.8), (2.13) and the Rayleigh-Ritz theorem we have that $\alpha_k$ satisfies (2.11) for $k \ge 1$. This completes the proof. □

Theorem 2.1 establishes the convergence of the Barzilai and Borwein method when applied to a convex quadratic function that has minimizers.
Theorem 2.1. Let $\phi(x) = \frac{1}{2} x^T H x - b^T x$ be a convex quadratic function with $b$ in the range of $H$. Let $\{x_k\}$ be the sequence generated by the Barzilai and Borwein method and $x^*$ the minimizer of $\phi$ given by (2.10). Then, either $x_j = x^*$ for some finite $j$, or the sequence $\{x_k\}$ converges to $x^*$.

Proof: We need only consider the case in which there is no finite integer $j$ such that $x_j = x^*$. Hence, it suffices to prove that the sequence $\{e_k\}$ converges to zero. Using (2.7) and the fact that $\nabla\phi(x^*) = 0$, we have

    $H e_k = \alpha_k s_k$ for all $k$.   (2.14)

Substituting $s_k = e_k - e_{k+1}$ in (2.14) we obtain for any $k$

    $e_{k+1} = \frac{(\alpha_k I - H)}{\alpha_k} e_k$.   (2.15)

Now, using Lemma 2.2, it follows that for the initial error $e_0$ there exist constants $d_\ell^0, \dots, d_n^0$ such that

    $e_0 = \sum_{i=\ell}^{n} d_i^0 v_i$.

Using (2.15) we obtain for any integer $k$

    $e_{k+1} = \sum_{i=\ell}^{n} d_i^{k+1} v_i$,

where

    $d_i^{k+1} = \frac{(\alpha_k - \lambda_i)}{\alpha_k} d_i^k$.

We observe that the sequence $\{e_k\}$ converges to zero if and only if each one of the sequences $\{d_i^k\}$, $i = \ell, \dots, n$, converges to zero. From this point on, the proof of Theorem 2.1 in Raydan [1993] applies, where now $\lambda_\ell$ plays the role of $\lambda_{\min}$, the sequences $\{e_k\}$, $\{s_k\}$ and $\{g_k\}$ remain in $\mathrm{span}\{v_\ell, \dots, v_n\}$ and the scalars $\alpha_k$ satisfy (2.11). □
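As an illustration of the iteration analyzed above, the following sketch (ours, not part of the original paper) implements (2.7)-(2.8) with dense NumPy arrays; the function name, stopping tolerance and iteration cap are illustrative choices.

```python
import numpy as np

def barzilai_borwein(H, b, x0, alpha0, tol=1e-8, max_iter=10000):
    """Barzilai-Borwein iteration (2.7)-(2.8) for phi(x) = 0.5 x'Hx - b'x.

    Assumes H symmetric positive semidefinite with b in its range, as in
    Section 2; under those hypotheses s'Hs > 0 whenever g != 0."""
    x = x0.astype(float).copy()
    g = H @ x - b                        # g_k = grad phi(x_k) = H x_k - b
    alpha = float(alpha0)                # alpha_0 != 0, chosen by the caller
    for _ in range(max_iter):
        if np.linalg.norm(g) <= tol:     # x is (numerically) a minimizer
            break
        x_next = x - g / alpha           # step (2.7)
        s = x_next - x                   # s_k = x_{k+1} - x_k
        alpha = (s @ (H @ s)) / (s @ s)  # Rayleigh quotient (2.8)
        x, g = x_next, H @ x_next - b
    return x
```

By Lemma 2.2, after the first step every $\alpha_k$ lies in $[\lambda_\ell, \lambda_n]$, which is what makes the division in (2.7) safe in exact arithmetic; in floating point a safeguard on $\alpha_k$ may be advisable.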
3 Basic method

We consider the problem of minimizing a convex quadratic function with bound constrained variables:

    Minimize $\phi(x)$ subject to $x \in \Omega$,   (3.1)

where $\Omega = \{x \in \mathbb{R}^n \mid \ell \le x \le u, \ \ell < u\}$, $\phi(x) = \frac{1}{2} x^T H x - b^T x$, and $H$ is positive semidefinite. We assume that $\ell_i > -\infty$, $u_i < \infty$ for all $i = 1, \dots, n$ and that $b$ belongs to the range of $H$ (observe that rank-deficient bound constrained least-squares problems satisfy this hypothesis). We denote $g(x) \equiv -\nabla\phi(x) \equiv -(Hx - b)$ for all $x \in \mathbb{R}^n$. Let $L > 0$ be such that $\|H\| \le L$. ($\|\cdot\|$ denotes the 2-norm of vectors or matrices.) By the convexity of $\phi$ we have that

    $0 \le \phi(z) - \phi(x) - \langle \nabla\phi(x), z - x \rangle = \frac{1}{2}(z-x)^T H (z-x) \le \frac{L}{2}\|z - x\|^2$   (3.2)

for all $x, z \in \mathbb{R}^n$.

We define an open face of $\Omega$ as a set $F_I$ such that $I$ is a (possibly empty) subset of $\{1, 2, \dots, 2n\}$ such that

    $i$ and $n+i$ cannot belong simultaneously to $I$ for any $i \in \{1, 2, \dots, n\}$   (3.3)

and

    $F_I = \{x \in \Omega \mid x_i = \ell_i \text{ if } i \in I, \ x_i = u_i \text{ if } n+i \in I, \ \ell_i < x_i < u_i \text{ otherwise}\}$.   (3.4)

Let us call $\bar{F}_I$ the closure of each open face, $[F_I]$ the smallest linear manifold which contains $F_I$, $S(F_I)$ the subspace parallel to $[F_I]$, and $\dim F_I$ the dimension of $S(F_I)$. Clearly, $\dim F_I = n - |I|$.

For each $x \in \Omega$ let us define the (negative) projected gradient $g_P(x) \in \mathbb{R}^n$ as

    $g_P(x)_i = 0$ if $x_i = \ell_i$ and $\frac{\partial\phi}{\partial x_i}(x) > 0$, or $x_i = u_i$ and $\frac{\partial\phi}{\partial x_i}(x) < 0$;
    $g_P(x)_i = -\frac{\partial\phi}{\partial x_i}(x)$ otherwise.   (3.5)

A necessary and sufficient condition for $x$ being a global solution of (3.1) is

    $g_P(x) = 0$.   (3.6)

For each $x \in \bar{F}_I$ let us define $g_I(x) \in \mathbb{R}^n$ as

    $g_I(x)_i = 0$ if $i \in I$ or $n+i \in I$;
    $g_I(x)_i = -\frac{\partial\phi}{\partial x_i}(x)$ otherwise.   (3.7)

Thus, $g_I(x)$ is the orthogonal projection of $-\nabla\phi(x)$ on $S(F_I)$. We also define, for $x \in \bar{F}_I$,

    $g_I^C(x)_i = 0$ if $i \notin I$ and $n+i \notin I$;
    $g_I^C(x)_i = 0$ if $i \in I$ and $\frac{\partial\phi}{\partial x_i}(x) > 0$, or $n+i \in I$ and $\frac{\partial\phi}{\partial x_i}(x) < 0$;
    $g_I^C(x)_i = -\frac{\partial\phi}{\partial x_i}(x)$ otherwise.   (3.8)

The vector $g_I^C(x)$ was introduced in Friedlander and Martínez [1989] and named "chopped gradient". Observe that for all $x \in F_I$ we have

    $g_P(x) = g_I(x) + g_I^C(x)$.
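The definitions (3.5), (3.7) and (3.8) reduce to simple componentwise masks. The sketch below is our own illustration, not part of the paper; the tolerance argument is a practical addition for detecting active bounds in floating point.

```python
import numpy as np

def face_gradients(x, grad, lo, hi, atol=0.0):
    """Split -grad phi(x) into the projected gradient g_P, the internal
    gradient g_I and the chopped gradient g_C on the face containing x,
    following (3.5), (3.7) and (3.8)."""
    at_lo = x <= lo + atol
    at_hi = x >= hi - atol
    free = ~(at_lo | at_hi)            # indices with ell_i < x_i < u_i

    g_I = np.where(free, -grad, 0.0)   # (3.7): projection on S(F_I)
    # (3.8): components at active bounds that point into the interior
    chopped = (at_lo & (grad < 0)) | (at_hi & (grad > 0))
    g_C = np.where(chopped, -grad, 0.0)
    g_P = g_I + g_C                    # identity stated above, valid on F_I
    return g_P, g_I, g_C
```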
In Lemma 3.1 we show that a stationary point $\bar{x}$ for $\bar{F}_I$ either is a global solution of (3.1), or has a nonnull $g_I^C(\bar{x})$.

Lemma 3.1. Assume that $\bar{x} \in \bar{F}_I$ is such that

    $\phi(\bar{x}) \le \phi(x)$ for all $x \in \bar{F}_I$.   (3.9)

Then, the following statements are equivalent:

    $\phi(\bar{x}) \le \phi(x)$ for all $x \in \Omega$,   (3.10)

and

    $g_I^C(\bar{x}) = 0$.   (3.11)

Proof: See Friedlander and Martínez [1994]. □
Let us now define the Main Model Algorithm considered in this work. The following description will be "high-level" in the sense that the specification of two important procedures (at Steps 2 and 4) is postponed to Section 5. In fact, the structure of these two steps will be "Compute $x_{k+1}$ such that ..." and the reader does not need to believe at this point that such a computation is possible. In Section 4 we will prove that the algorithm is well defined, which essentially consists in proving that there exist computable points that satisfy the requirements of Steps 2 and 4. Finally, in Section 5 we will give the algorithms which compute the points with the desired conditions in an efficient way.

Algorithm 3.1. Main Model Algorithm.
Let $\Delta > 0$, $\varepsilon > 0$, and $L \ge \|H\|$ be given independently of $k$, and let $x_0 \in \Omega$ be an arbitrary initial point. The algorithm defines a sequence of approximations $\{x_k\}$ and stops when $\|g_P(x_k)\| \le \varepsilon$. Assume that $x_k \in \Omega$ is such that $\|g_P(x_k)\| > \varepsilon$. Let $I = I(x_k)$ be such that $x_k \in F_I$. (Observe that there exists only one set $I \subset \{1, \dots, 2n\}$ with that property.) Define $\Delta_I \equiv \min\{u_i - \ell_i \mid i \in I \text{ or } n+i \in I\}$. The following steps define the procedure for obtaining $x_{k+1}$.
Step 1. Test if $\|g_I(x_k)\|$ is small enough for leaving the face.
If

    $\|g_I^C(x_k)\| > L\Delta_I$ and $\Delta\|g_I(x_k)\| < \frac{L\Delta_I^2}{2}$

or

    $\|g_I^C(x_k)\| \le L\Delta_I$ and $\Delta\|g_I(x_k)\| < \frac{\|g_I^C(x_k)\|^2}{2L}$,

go to Step 2. Else, go to Step 3.

Step 2. Find a new point not belonging to $\bar{F}_I$.
Compute $x_{k+1} \in \Omega - \bar{F}_I$ such that

    $\phi(x_{k+1}) < \phi(x_k) - \Delta\|g_I(x_k)\|$   (3.12)

(using Algorithm 5.1.1) and stop the iteration.

Step 3. Compute the Barzilai-Borwein Direction.
Consider the problem of minimizing $\phi(x)$ subject to $x \in [F_I]$. With an obvious change of variables, this problem takes the form (2.1). Apply the Barzilai-Borwein method to this problem until a point $z \in [F_I]$ is obtained such that $\phi(z) < \phi(x_k)$. In this case, we define $z_k = z$. (The initial $\alpha_0$ for the application of the Barzilai-Borwein method at this step is defined as $\alpha_0 = \|x_k\|$ if $x_{k-1}$ does not belong to $F_I$ or if $k = 0$. Otherwise, $\alpha_0$ comes from the $(k-1)$-th iteration.)

Step 4. Find the new point on $\bar{F}_I$.
If $z_k$ does not belong to $\bar{F}_I$, then obtain $x_{k+1}$ as any point on $\bar{F}_I - F_I$ that satisfies

    $\phi(x_{k+1}) < \phi(x_k)$   (3.13)

(using Algorithm 5.1.2). Else, define $x_{k+1} = z_k$ and stop the iteration.
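Step 1 is the only nontrivial test of the iteration; under our reading of the two inequalities it can be coded as follows (a sketch, with names of our choosing):

```python
import numpy as np

def should_leave_face(g_I, g_C, Delta, Delta_I, L):
    """Step 1 of Algorithm 3.1: return True when the decrease guaranteed
    along the chopped gradient (see the Remarks below) is at least
    Delta * ||g_I(x_k)||, so that Step 2 can safely be executed."""
    nC = np.linalg.norm(g_C)
    nI = np.linalg.norm(g_I)
    if nC > L * Delta_I:
        return Delta * nI < L * Delta_I ** 2 / 2
    return Delta * nI < nC ** 2 / (2 * L)
```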
Remarks
The geometrical meaning of the procedure used in Algorithm 3.1 for leaving the current face was explained in Friedlander and Martínez [1994]. For completeness, let us rephrase their arguments here.

The main property of the algorithm is that, when a closed face $\bar{F}_I$ is abandoned, no future iterate will belong to a ball with center $x_k$ and radius $\Delta$ in $\bar{F}_I$. Since $\phi$ is convex, we have that

    $\phi(x) \ge \phi(x_k) - g_I(x_k)^T (x - x_k) \ge \phi(x_k) - \Delta\|g_I(x_k)\|$

for all $x \in \bar{F}_I$ such that $\|x - x_k\| \le \Delta$, and so the property above follows from the monotonicity of $\phi(x_k)$.

At Step 1, we test sufficient conditions for obtaining the required decrease in Step 2. In fact, let us consider the quadratic function $\psi$ defined by

    $\psi(\tau) = \phi(x_k) - \tau\|g_I^C(x_k)\|^2 + \frac{L}{2}\tau^2\|g_I^C(x_k)\|^2$.

Due to (3.2) we have that, for $\tau \ge 0$,

    $\phi(x_k + \tau g_I^C(x_k)) \le \psi(\tau)$.

The unconstrained minimizer of $\psi$ is $\tilde{\tau} = \frac{1}{L}$. Moreover, the minimizer of $\psi$ for $\tau \in [0, \frac{\Delta_I}{\|g_I^C(x_k)\|}]$ is $\tilde{\tau} = \min\{\frac{1}{L}, \frac{\Delta_I}{\|g_I^C(x_k)\|}\}$. Therefore, the function $\phi$ may be decreased by an amount of at least

    $\Gamma = \tilde{\tau}\|g_I^C(x_k)\|^2 - \frac{L}{2}\tilde{\tau}^2\|g_I^C(x_k)\|^2$

if the new point is the minimizer of $\phi$ on the segment $[x_k, x_k + \Delta_I \frac{g_I^C(x_k)}{\|g_I^C(x_k)\|}]$. At Step 1 of Algorithm 3.1 we essentially test if $\Gamma \ge \Delta\|g_I(x_k)\|$. If this happens, it turns out that the desired decrease can be obtained. (Algorithms for obtaining this decrease will be described in Section 5.) However, we will see in the proof of Theorem 4.1 that, for getting the desired decrease, it is sufficient that, eventually, the choice $x_k + \Delta_I g_I^C(x_k)/\|g_I^C(x_k)\|$ be made. On the other hand, if $\Gamma < \Delta\|g_I(x_k)\|$, we judge that the set $\bar{F}_I$ should be better explored, and our search for a new point is restricted to this set, with the single requirement that $\phi(x_{k+1}) < \phi(x_k)$. We are also going to describe in Section 5 the algorithm for finding $x_{k+1}$ when the minimizer along a search direction is not in $\bar{F}_I$.

The user-given parameter $\Delta$ has a big influence on the performance of the algorithm. If $\Delta$ is large, the criterion for abandoning a face is stringent, so the iterates tend to stay on a face (perhaps going to its boundary) until a minimizer on the face is found. In this case, returning to an abandoned face is practically impossible. On the other hand, if $\Delta$ is very small, the current face tends to be left whenever $g_I^C(x_k) \ne 0$. The optimum value of $\Delta$ depends on the type of problem under consideration. Generally speaking, when no artificial bounds are present in the formulation of the problem (artificial bounds are sometimes introduced to deal with unbounded variables) we prefer to use $\Delta = 10^{-3} \max\{u_i - \ell_i, \ i = 1, \dots, n\}$.
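Evaluating $\Gamma$ in the two cases of $\tilde{\tau}$ gives exactly the two inequalities tested at Step 1; the short derivation below is ours, filling in the step left implicit above.

```latex
\text{If } \|g_I^C(x_k)\| > L\Delta_I \text{ then } \tilde{\tau} = \tfrac{\Delta_I}{\|g_I^C(x_k)\|} \text{ and }
\Gamma = \Delta_I\|g_I^C(x_k)\| - \tfrac{L}{2}\Delta_I^2
       > L\Delta_I^2 - \tfrac{L}{2}\Delta_I^2 = \tfrac{L\Delta_I^2}{2}.

\text{If } \|g_I^C(x_k)\| \le L\Delta_I \text{ then } \tilde{\tau} = \tfrac{1}{L} \text{ and }
\Gamma = \tfrac{1}{L}\|g_I^C(x_k)\|^2 - \tfrac{L}{2}\cdot\tfrac{1}{L^2}\|g_I^C(x_k)\|^2
       = \tfrac{\|g_I^C(x_k)\|^2}{2L}.
```

Hence requiring $\Gamma \ge \Delta\|g_I(x_k)\|$ reduces, case by case, to the two tests of Step 1.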
In our definition of the problem we assumed that $\ell_i < u_i$ for all $i = 1, \dots, n$. A referee observed that there exist practical cases where, for some $i$, $\ell_i = u_i$, but it is not convenient to define a new data structure that excludes the variable $x_i$. It is easy to verify that the algorithm can also be defined in this case and all the convergence results hold, if we modify the definition of $\Delta_I$ in Algorithm 3.1 in such a way that the indices $i$ such that $\ell_i = u_i$ are excluded. That is, if $J = \{i \in \{1, \dots, n\} \mid \ell_i \ne u_i\}$, the appropriate definition of $\Delta_I$ should be $\Delta_I \equiv \min_{i \in J}\{u_i - \ell_i \mid i \in I \text{ or } n+i \in I\}$.
4 Convergence results

In this section we prove that Algorithm 3.1 is well defined (the points required at Steps 2 and 4 can be computed) and that an approximate solution is obtained up to any required precision. The structure of the proofs is the same as in Friedlander and Martínez [1994]. However, some important differences come from the fact that infeasible Barzilai-Borwein iterations are admitted, as we mentioned in the introduction. The main results are stated in Theorems 4.1 and 4.2.
Theorem 4.1. Algorithm 3.1 is well defined.

Theorem 4.2. If $\{x_k\}$ is a sequence generated by Algorithm 3.1, then $\{x_k\}$ terminates at a point that satisfies $\|g_P(x_k)\| \le \varepsilon$ in a finite number of iterations.

Theorems 4.1 and 4.2 are consequences of the following lemmas:
Lemma 4.1. Let $\Delta_I = \min\{u_i - \ell_i \mid i \in I \text{ or } n+i \in I\}$ and $x \in \bar{F}_I$ such that $g_I^C(x) \ne 0$. Define

    $\omega_I^C(x) = \frac{g_I^C(x)}{\|g_I^C(x)\|}$;   (4.1)

then

    $x + \tau\omega_I^C(x) \in \Omega - \bar{F}_I$ for all $\tau \in (0, \Delta_I]$.   (4.2)

Proof: See Friedlander and Martínez [1994]. □
Lemma 4.2. Let $\delta \in (0, \Delta_I]$ and $x \in \bar{F}_I$ such that $g_I^C(x) \ne 0$. Then for all $y \in \bar{F}_I$ such that $\|x - y\| \le \delta$,

    $\phi(x + \delta\omega_I^C(x)) - \phi(y) \le -\delta\|g_I^C(x)\| + \frac{L\delta^2}{2} + \delta\|g_I(x)\|$.   (4.3)

Proof: The convexity of $\phi$ implies that, for all $y \in \bar{F}_I$,

    $\phi(y) \ge \phi(x) - \langle g_I(x), y - x \rangle$.   (4.4)

So, by the Cauchy-Schwarz inequality, we have for all $y \in \bar{F}_I$ such that $\|x - y\| \le \delta$,

    $\phi(y) \ge \phi(x) - \delta\|g_I(x)\|$.   (4.5)

Now, by (3.2), we have:

    $\phi(x + \delta\omega_I^C(x)) - \phi(x) - \delta\langle \nabla\phi(x), \omega_I^C(x) \rangle \le \frac{L\delta^2}{2}$.   (4.6)

Therefore

    $\phi(x + \delta\omega_I^C(x)) - \phi(x) \le -\delta\|g_I^C(x)\| + \frac{L\delta^2}{2}$.   (4.7)

Subtracting (4.5) from (4.7) we obtain (4.3). □
Proof of Theorem 4.1: If $\|g_P(x_k)\| > \varepsilon$, $x_{k+1}$ is defined either at Step 2 or at Step 4. We get into Step 2 if

    $\|g_I^C(x_k)\| > L\Delta_I$ and $\frac{L\Delta_I^2}{2} > \Delta\|g_I(x_k)\|$   (4.8)

or

    $\|g_I^C(x_k)\| \le L\Delta_I$ and $\frac{\|g_I^C(x_k)\|^2}{2L} > \Delta\|g_I(x_k)\|$.   (4.9)

By Lemma 4.1 we have that $x_k + \Delta_I\omega_I^C(x_k) \in \Omega - \bar{F}_I$. But, by (4.7), if (4.8) holds, we have that

    $\phi(x_k + \Delta_I\omega_I^C(x_k)) < \phi(x_k) - \frac{L\Delta_I^2}{2} < \phi(x_k) - \Delta\|g_I(x_k)\|$,

so the point required at Step 2 exists. [...]

Lemma 4.6. There exists $k_0$ such that, for all $k \ge k_0$ with $\|g_P(x_k)\| > \varepsilon$, either (4.11) or (4.12) hold.
Proof: Suppose, on the contrary, that there exists an infinite set of indices $K_1 \subset \mathbb{N}$ such that for all $k \in K_1$, $\|g_P(x_k)\| > \varepsilon$ and $x_{k+1}$ satisfies (4.13). Since the number of different faces $F_I$ is finite, there exists an infinite set $K_2 \subset K_1$ such that $x_k$ belongs to a fixed $F_I$ for all $k \in K_2$. So, for all $k \in K_2$ we have that $\phi(x_{k+1}) < \phi(x)$ for all $x \in \bar{F}_I$ such that $\|x - x_k\| \le \Delta$. Therefore, $\phi(x_{k+1}) < \phi(x)$ for all $k \in K_2$, $x \in \bar{F}_I$, $\|x - x_k\| \le \Delta$. Hence, for any two different $k_1, k_2 \in K_2$, $k_1 < k_2$, we necessarily have that $\|x_{k_1} - x_{k_2}\| > \Delta > 0$. This is impossible, since $\bar{F}_I$ is bounded. Therefore, the desired result is proved. □
Proof of Theorem 4.2: Assume, by contradiction, that $\|g_P(x_k)\| > \varepsilon$ for all $k$. By Lemma 4.6 we know that there exist $k_0$ and $F_I$ such that $x_k \in \bar{F}_I$ and either (4.11) or (4.12) must hold for all $k \ge k_0$. However, we cannot perform an infinite number of iterations where (4.12) holds, since the dimension of the current face decreases at each of these iterations. Therefore, there exist $k_1 \ge k_0$ and a subset $I$ of $\{1, 2, \dots, 2n\}$ such that $x_k \in F_I$ for all $k \ge k_1$. Then, the sequence $\{x_{k_1}, x_{k_1+1}, x_{k_1+2}, \dots\}$ is obtained by successive Barzilai-Borwein iterations inside $F_I$. So, the proof follows by the same arguments used in Lemma 4.4. □
5 Search on the piecewise linear path

In this section we describe the implementation of Steps 2 and 4 of the Model Algorithm 3.1. In both cases we use a backtracking search along the polygonal path defined by a feasible search direction. Our projected search is analogous to the one of Moré and Toraldo [1991], except that we admit singular Hessians and we use different stopping conditions for the search. We denote $P(x)$ the orthogonal projection of $x \in \mathbb{R}^n$ on $\Omega$.

The projected piecewise linear search is used at Steps 2 and 4 of Algorithm 3.1. At Step 2 we wish to find a point $x_{k+1} = P(x_k + \lambda d_k)$ not belonging to $\bar{F}_I$ such that (3.12) holds. First, we try $d_k = g_P(x_k)$. In finite time, we will be able to detect whether the desired decrease is possible using the projected gradient direction. If it is, the new approximation $x_{k+1}$ is computed. Otherwise, we try the direction $d_k = g_I^C(x_k)$. By (4.10), using this choice, the backtracking procedure guarantees that (3.12) will be satisfied.

At Step 4 we wish to find a point $x_{k+1} = P(x_k + \lambda d_k) \in \bar{F}_I$ such that (3.13) holds. The search direction will be $d_k = z_k - x_k$, where $z_k$ is defined at Step 3 of Algorithm 3.1 (so $\phi(z_k) < \phi(x_k)$). Since $\phi$ is convex, we have that $\phi(z) < \phi(x_k)$ for all $z$ in the segment $(x_k, z_k]$. In particular, the value of $\phi$ at the only boundary point of $F_I$ that belongs to this segment is less than $\phi(x_k)$. If, using the piecewise linear search, we are not able to find another point, we can take that boundary point as $x_{k+1}$. This is made in Algorithm 5.1.2, while the search called from Step 2 is performed by Algorithm 5.1.1.

Let us describe the algorithms that define the projected piecewise linear search. Remember that $x_k$ is the current $k$-th approximation of the solution of the problem and that $d_k$ is a feasible descent direction.
Algorithm 5.1.1. Search along the Piecewise Linear Path (from Step 2).

Step 1. Compute the Minimum and the Maximum Breakpoints. Compute

    $\mu_k = \max\{\lambda \ge 0 \mid x_k + \lambda d_k \in \Omega\}$,

    $\bar{\lambda}_k = \max\{\lambda \ge 0 \mid \exists\, i \in \{1, \dots, n\} \text{ such that } \ell_i = (x_k)_i + \lambda (d_k)_i \text{ or } (x_k)_i + \lambda (d_k)_i = u_i\}$.

Step 2. Compute the First Trial Point. If $d_k^T H d_k = 0$, then set $\lambda = \bar{\lambda}_k$. Else, compute

    $\lambda = -\frac{\nabla\phi(x_k)^T d_k}{d_k^T H d_k}$.

Step 3. Case in which the first trial point is feasible. If $\lambda \le \mu_k$, set $x_{k+1} = x_k + \lambda d_k$ and return. Else, go to Step 4.

Step 4. Stopping Criterion for the Search. If

    $\phi(P(x_k + \lambda d_k)) < \phi(x_k) - \Delta\|g_I(x_k)\|$,

then set $x_{k+1} = P(x_k + \lambda d_k)$ and return. Else, go to Step 5.

Step 5. Compute the Minimizer of the Interpolatory Quadratic and Safeguard. Compute $\lambda_{new}$, the minimizer of the quadratic $\psi(\lambda)$ such that $\psi(0) = \phi(x_k)$, $\psi'(0) = \nabla\phi(x_k)^T d_k$ and $\psi(\lambda) = \phi(P(x_k + \lambda d_k))$ at the current trial value $\lambda$. See Moré and Toraldo [1991] for details. If $\lambda_{new} \le 0.1\lambda$ or $\lambda_{new} \ge 0.9\lambda$, then set $\lambda = \lambda/2$. Else, set $\lambda = \lambda_{new}$.

Step 6. Case in which the new trial point is feasible. If $\lambda \le \mu_k$, then set $x_{k+1} = x_k + \lambda d_k$ and return. Else, go to Step 4.
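Under our reading of the text, Algorithm 5.1.1 amounts to the following projected backtracking loop. This is a sketch, not the authors' code: the quadratic interpolation of Step 5 is replaced by its halving safeguard only, and all names (including the final fallback) are ours.

```python
import numpy as np

def piecewise_search(H, b, lo, hi, x, d, decrease, max_backtracks=60):
    """Backtracking along the projected path P(x + lam*d) until phi drops
    below phi(x) - decrease, or until the trial step becomes feasible."""
    phi = lambda y: 0.5 * y @ (H @ y) - b @ y
    P = lambda y: np.clip(y, lo, hi)              # projection on Omega
    phi_x = phi(x)
    grad = H @ x - b                              # grad phi(x)

    # Step 1: largest feasible step mu_k, largest breakpoint lam_bar.
    with np.errstate(divide="ignore", invalid="ignore"):
        steps = np.where(d > 0, (hi - x) / d,
                         np.where(d < 0, (lo - x) / d, np.inf))
    mu = steps.min()
    lam_bar = steps[np.isfinite(steps)].max()

    # Step 2: first trial step (exact line minimizer when it exists).
    dHd = d @ (H @ d)
    lam = lam_bar if dHd <= 0.0 else -(grad @ d) / dHd

    for _ in range(max_backtracks):
        if lam <= mu:                             # Steps 3 and 6
            return x + lam * d
        trial = P(x + lam * d)
        if phi(trial) < phi_x - decrease:         # Step 4
            return trial
        lam *= 0.5                                # Step 5 (safeguard only)
    return P(x + mu * d)                          # fallback, ours
```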
Algorithm 5.1.2. Search along the Piecewise Linear Path (from Step 4).

Step 1. Compute the Minimum and the Maximum Breakpoints. Proceed as in Step 1 of Algorithm 5.1.1.

Step 2. Compute the First Trial Point. Proceed as in Step 2 of Algorithm 5.1.1.

Step 3. Case in which the first trial point is feasible. If $\lambda \le \mu_k$, set $x_{k+1} = x_k + \lambda d_k$ and return. Else, go to Step 4.

Step 4. Stopping Criterion for the Search. If

    $\phi(P(x_k + \lambda d_k)) < \phi(x_k)$,

set $x_{k+1} = P(x_k + \lambda d_k)$ and return. Else, go to Step 5.

Step 5. Compute the Minimizer of the Interpolatory Quadratic and Safeguard. Proceed as in Step 5 of Algorithm 5.1.1.

Step 6. Case in which the new trial point is feasible. If $\lambda \le \mu_k$, set $x_{k+1} = x_k + \mu_k d_k$ and return. Else, go to Step 4.
We have already shown that, due to the convexity of $\phi$, if in Algorithm 5.1.2 we define $x_{k+1} = x_k + \mu_k d_k$, the descent condition (3.13) is satisfied. In Algorithm 5.1.1 the situation is somewhat different. In fact, if we define $d_k = g_I^C(x_k)$, we have already seen, in Section 4, that the minimizer of $\phi$ in the first segment determined by $d_k$ satisfies (3.12). Therefore, since Algorithm 5.1.1 eventually evaluates $\phi$ at this point, it turns out that, with this definition of $d_k$, if $x_{k+1} = P(x_k + \lambda d_k)$, we obtain in finite time a point that satisfies (3.12). However, numerical experimentation and tradition recommend also giving a chance to the projected gradient as a possible direction for defining the piecewise linear path. Frequently, the utilization of the direction $g_P(x_k)$ in Algorithm 5.1.1 produces a point that satisfies (3.12). But, especially in nearly degenerate situations, this could not be the case. Therefore, we cannot base our strategy for leaving the face only on the projected gradient, and so, the direction $g_I^C(x_k)$ is necessary. Because of this, we define two possible strategies for leaving the face at Step 2 of Algorithm 3.1. The first one is based just on $g_I^C(x_k)$, and the second one tries $g_I^C(x_k)$ only in the case of a failure with $g_P(x_k)$. These two strategies are described in Algorithms 5.2 and 5.3. See also Friedlander and Martínez [1994].
Algorithm 5.2. Strategy for leaving the face based on $g_I^C(x_k)$.
Define $d_k = g_I^C(x_k)$. Execute Algorithm 5.1.1 and define $x_{k+1} = P(x_k + \lambda d_k)$.

Algorithm 5.3. Strategy for leaving the face based on $g_P(x_k)$ and $g_I^C(x_k)$.
Define $d_k = g_P(x_k)$. Execute Algorithm 5.1.1. If $\phi(P(x_k + \lambda d_k)) < \phi(x_k) - \Delta\|g_I(x_k)\|$, then define $x_{k+1} = P(x_k + \lambda d_k)$. Else, execute Algorithm 5.2.
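In code, Algorithm 5.3 is a one-level fallback around the search sketched after Algorithm 5.1.1; again this is our paraphrase, reusing the illustrative piecewise_search helper defined above.

```python
import numpy as np

def leave_face(H, b, lo, hi, x, g_P, g_I, g_C, Delta):
    """Algorithm 5.3 (sketch): try d_k = g_P(x_k) first; if the sufficient
    descent condition (3.12) fails, fall back on d_k = g_C (Algorithm 5.2)."""
    phi = lambda y: 0.5 * y @ (H @ y) - b @ y
    decrease = Delta * np.linalg.norm(g_I)        # right-hand side of (3.12)
    trial = piecewise_search(H, b, lo, hi, x, g_P, decrease)
    if phi(trial) < phi(x) - decrease:
        return trial                              # projected gradient worked
    return piecewise_search(H, b, lo, hi, x, g_C, decrease)   # Algorithm 5.2
```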
It follows from the safeguarded interpolatory scheme of Algorithm 5.1.1 that a possible failure of the projected gradient as search direction is detected in finite time. In this case, the output of Algorithm 5.1.1 is the minimizer of $\phi(P(x_k + \lambda d_k))$ for $\lambda \in [0, \mu_k]$, and the sufficient descent condition does not hold because, probably, this interval is excessively small.
6 Final remarks

The Barzilai-Borwein method for unconstrained quadratic minimization, introduced in 1988, attracted much interest because it suggested the possibility of defining fast general minimization methods based only on gradient directions. In this paper we proved that the method can be extended in a natural way for handling bound constrained problems.

In the description of the algorithm and the convergence proofs we assumed that $\ell > -\infty$ and $u < \infty$. We used this assumption to ensure that a fixed face contains a finite number of balls of radius $\Delta$. If $H$ is positive definite, infinite bounds can be allowed. In fact, what we need in the convergence proofs is the compactness of the level sets of the quadratic restricted to the current face, a property that is obviously satisfied when $H$ is positive definite. We conjecture that we can also eliminate the hypothesis of finite bounds in the semidefinite case, but the arguments should be more involved.
We have performed preliminary numerical experiments which suggest that the algorithm introduced in this paper is a useful technique for solving very large-scale quadratic programming problems subject to bound constraints. We have observed, in the strongly convex case, that if the Hessian $H$ is reasonably well-conditioned, the Conjugate Gradient method and the Barzilai-Borwein method require roughly the same number of iterations within the faces. So, in this case, the Barzilai-Borwein method should be preferred, since it requires less computational work per iteration. On the other hand, for both algorithms, the number of iterations increases as the condition number of $H$ does. The convergence of the Barzilai-Borwein method seems to deteriorate faster than the convergence of the Conjugate Gradient method for ill-conditioned problems. Obviously, much numerical investigation is necessary to establish these results conclusively. It would be interesting, for example, to experiment with suitable preconditioning techniques for the Barzilai-Borwein method when applied to special problems. In particular, our attention is now concentrated on the implementation of the algorithm for box constrained ill-conditioned least squares problems.
Acknowledgements
This work was presented as a contributed paper at the NATO-ASI Meeting on Continuous Optimization held in Il Ciocco (Italy) in September 1993. The first two authors are indebted to Prof. E. Spedicato, chairman of the meeting, for financial support. We also acknowledge two anonymous referees for useful suggestions.
References

Barzilai, J.; Borwein, J. M. [1988]: Two point step size gradient methods, IMA Journal of Numerical Analysis 8, pp. 141-148.

Bertsekas, D. P. [1982]: Projected Newton methods for optimization problems with simple constraints, SIAM J. Control and Optimization 20, pp. 221-246.

Cea, J.; Glowinski, R. [1983]: Sur des méthodes d'optimisation par relaxation, RAIRO R-3, pp. 5-32.

Coleman, T. F.; Hulbert, L. A. [1989]: A direct active set algorithm for large sparse quadratic programs with simple bounds, Mathematical Programming 45, pp. 373-406.

Coleman, T. F.; Hulbert, L. A. [1990]: A globally and superlinearly convergent algorithm for convex quadratic programs with simple bounds, Technical Report 90-1092, Dept. of Computer Science, Cornell University, Ithaca, NY.

Dembo, R. S.; Tulowitzki, U. [1987]: On the minimization of quadratic functions subject to box constraints, Working Paper Series B 71, School of Organization and Management, Yale University, New Haven, Connecticut.

Fletcher, R. [1987]: Practical Methods of Optimization (2nd edition), John Wiley and Sons, Chichester, New York, Brisbane, Toronto and Singapore.

Friedlander, A.; Martínez, J. M. [1989]: On the numerical solution of bound constrained optimization problems, RAIRO Operations Research 23, pp. 319-341.

Friedlander, A.; Martínez, J. M. [1994]: On the maximization of a concave quadratic function with box constraints, SIAM Journal on Optimization 4, pp. 177-192.

Gill, P. E.; Murray, W.; Wright, M. H. [1981]: Practical Optimization, Academic Press, London and New York.

Golub, G. H.; Van Loan, Ch. F. [1989]: Matrix Computations, The Johns Hopkins University Press, Baltimore and London.

Herman, G. T. [1980]: Image Reconstruction from Projections: The Fundamentals of Computerized Tomography, Academic Press, New York.

Hestenes, M. R.; Stiefel, E. [1952]: Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Stds. B49, pp. 409-436.

Lotstedt, P. [1984]: Solving the minimal least squares problem subject to bounds on the variables, BIT 24, pp. 206-224.

Moré, J. J.; Toraldo, G. [1991]: On the solution of large quadratic programming problems with bound constraints, SIAM Journal on Optimization 1, pp. 93-113.

Nickel, R. H.; Tolle, J. W. [1989]: A sparse sequential quadratic programming algorithm, Journal of Optimization Theory and Applications 60, pp. 453-473.

O'Leary, D. P. [1980]: A generalized conjugate gradient algorithm for solving a class of quadratic programming problems, Linear Algebra and its Applications 34, pp. 371-399.

Raydan, M. [1993]: On the Barzilai-Borwein choice of steplength for the gradient method, IMA Journal of Numerical Analysis 13, pp. 321-326.

Wright, S. J. [1989]: Implementing Proximal Point Methods for Linear Programming, Preprint MCS-P45-0189, Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL.

Yang, E. K.; Tolle, J. W. [1986]: A class of methods for solving large, convex quadratic programs subject to box constraints, Technical Report UNC/ORSA/TR-86-3, Dept. of Operations Research and Systems Analysis, University of North Carolina, Chapel Hill, NC.