SOLVING GENERAL CONVEX QP PROBLEMS VIA AN EXACT QUADRATIC AUGMENTED LAGRANGIAN WITH BOUND CONSTRAINTS P. SPELLUCCI
Abstract
Large convex quadratic programs, where constraints are of box type only, can be solved quite efficiently [1], [2], [12], [13], [16]. In this paper an exact quadratic augmented Lagrangian with bound constraints is constructed which allows one to use these methods for general constrained convex quadratic programming. This is in contrast to well known exact differentiable penalty functions for this type of problem, which are not quadratic, e.g. [7], [10], [15].
Key words: quadratic programming, exact augmented Lagrangian
AMS(MOS) classification: primary 90C20, secondary 65K05
1 INTRODUCTION

The solution of large convex quadratic programs has interest in its own right, see e.g. [9], [13], and also in the context of large scale nonlinear programming using the SQP method. Large scale quadratic programs can be solved in a variety of ways. The classical way is to satisfy equality constraints exactly during the iteration using a projection or elimination technique. This, however, makes it necessary to use a QR or LU decomposition of the Jacobian of those equality constraints, which may be very costly. Inequality constraints are then handled either by an active set strategy or by interior point methods, e.g. [1], [5], [6].

Using duality theory a general convex QP-problem may be transformed into a bound constrained one. This process involves the solution of a linear equation with the Hessian as coefficient matrix for any intermediate value of the dual variable. In the general case this will be very costly again. However, if the Hessian is of a special structure (diagonal or block diagonal, e.g.), this approach is very attractive. Relevant work has been done by Mangasarian [11]. If there are inequality constraints only, a bound constrained exact differentiable quadratic penalty function can be obtained from the Wolfe dual problem. This has been shown by Han and Mangasarian in [8]. The present work is an extension of this approach to the mixed case.

If there are only box constraints, the active set and the projected gradient technique can be combined to yield highly efficient methods [2], [12], [13], [16]. An alternative is to set up an exact differentiable augmented Lagrangian to be minimized simultaneously with respect to primal and dual variables without any additional constraints [10]. This type of function however is not globally in the class C² and of course not quadratic. Therefore the efficiency of this approach is not so obvious. This is even more the case for exact penalty methods where the dual variables depend on the primal ones [7], [15]. Recently Friedlander, Martinez and Santos [3] showed that the general convex QP-problem can be solved by finding one stationary point of a quartic function subject to bound constraints. However this involves iterative minimization of quadratics subject to bound constraints, whereas the present approach needs only one such minimization. On the other hand, here we need to compute two (finite) penalty parameters adaptively, whereas their approach is free of such parameters. Nevertheless we believe that the present approach should have merits, since minimization of quartics may not be so easy.
We consider the following type of problem:

  f(x) = (1/2) x^T B x + b^T x → min, x ∈ Ω,                                  (1.1)

where

  Ω = {x ∈ ℝⁿ : A^T x − a = 0, x_J ≥ 0},  J ⊆ {1, …, n}.                     (1.2), (1.3)

We assume

  B = B^T,  z^T B z > 0 for all z ≠ 0 such that A^T z = 0,                   (1.4)
  A ∈ ℝ^{n×m},  m < n,  rank(A) = m,                                          (1.5)
  ∃ x⁰ : A^T x⁰ = a,  x⁰_J > 0.                                               (1.6)

Under these hypotheses the problem has a unique solution x* with bounded multipliers λ*, μ*_J (since
the Slater condition (1.6) implies the Mangasarian-Fromowitz condition). If the original problem has the general form

  (1/2) x^T B x + b^T x → min,  A^T x − a = 0,  C^T x − c ≥ 0,

with C ∈ ℝ^{n×u}, we may introduce slack variables y ∈ ℝ^u₊ and write

  (1/2) x̂^T B̂ x̂ + b̂^T x̂ → min,  Â^T x̂ − â = 0,  x̂_Ĵ ≥ 0,

where

  x̂ = (x; y),  x̂_Ĵ = (x_J; y),  B̂ = [ B 0; 0 0 ],  b̂ = (b; 0),
  Â^T = [ A^T 0; C^T −I ],  â = (a; c).

If the original problem satisfies (1.4), (1.5) and the Slater condition ∃ x⁰ : A^T x⁰ = a, C^T x⁰ > c, then the extended problem satisfies (1.4), (1.5), (1.6) again. Therefore it is sufficient to consider the case (1.1), (1.2). The necessary and sufficient optimality conditions for (1.1) read

  Bx + b − Aλ − I_J μ_J = 0,                                                  (1.7)
  x_J ≥ 0,  μ_J ≥ 0,  x_i μ_i = 0 for i ∈ J,                                  (1.8)
  A^T x − a = 0.                                                              (1.9)
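The reduction to the form (1.1), (1.2) is purely mechanical. As an illustration, a minimal sketch in NumPy (the function name and the dense-matrix setting are ours, not part of the paper):

```python
import numpy as np

def extend_to_standard_form(B, b, A, a, C, c):
    """Illustrative sketch: embed min 1/2 x^T B x + b^T x with
    A^T x - a = 0, C^T x - c >= 0 into the form (1.1)-(1.2)
    by adding slacks y >= 0 with C^T x - y = c."""
    n = B.shape[0]
    u = C.shape[1]                       # number of inequality constraints
    # Bhat = [B 0; 0 0], bhat = (b; 0)
    Bh = np.block([[B, np.zeros((n, u))],
                   [np.zeros((u, n)), np.zeros((u, u))]])
    bh = np.concatenate([b, np.zeros(u)])
    # Ahat^T = [A^T 0; C^T -I], ahat = (a; c)
    AhT = np.block([[A.T, np.zeros((A.shape[1], u))],
                    [C.T, -np.eye(u)]])
    ah = np.concatenate([a, c])
    J = np.arange(n, n + u)              # slack components must stay >= 0
    return Bh, bh, AhT.T, ah, J
```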
We briefly describe now the notation used in this paper. All matrices and vectors are real. Vectors are column vectors if not explicitly transposed; ^T is the transposition symbol. If J is a subset of ℕ, A_J denotes the matrix composed from the columns of A with indices in J and similarly x_J the subvector of x with indices in J. Superscripts on vectors not in brackets or square brackets denote elements of sequences. ‖·‖ denotes the Euclidean norm and its subordinate matrix norm.
2 AN EXACT QUADRATIC AUGMENTED LAGRANGIAN SUBJECT TO BOUND CONSTRAINTS

We follow the ideas of Lucidi [10] to form an exact differentiable augmented Lagrangian. However, the sign of the ordinary Lagrangian has to be reversed here in order to obtain convexity. We let

  L(x, λ, μ_J) = f(x) − λ^T(A^T x − a) − μ_J^T x_J

and

  Φ(x, λ, μ_J; ρ, η) = −L(x, λ, μ_J) + (ρ/2)‖A^T x − a‖² + (η/2)‖Bx + b − Aλ − I_J μ_J‖².

(I_J is the submatrix of the identity in ℝ^{n×n} with column numbers in J.) This augmented Lagrangian is to be jointly minimized with respect to x, λ, μ_J under the constraints x_J ≥ 0, μ_J ≥ 0. We also show that for suitable ρ, η > 0 Φ is convex and bounded from below on the set

  Ω̃ = ℝ^{n+m+|J|} ∩ {(x, λ, μ_J) : x_J ≥ 0, μ_J ≥ 0}.

Φ is an ordinary quadratic function of x, λ, μ_J. So the highly efficient methods of e.g. [2], [12], [13], [16] are available to minimize Φ. We will now show that under the assumptions (1.4), (1.5), (1.6) the problem (1.1), (1.2) is equivalent to the problem (2.1), (2.2):

  Φ(x, λ, μ_J; ρ, η) → min over (x, λ, μ_J) ∈ Ω̃,                             (2.1)
  Ω̃ = {(x, λ, μ_J) : x_J ≥ 0, μ_J ≥ 0}.                                      (2.2)

In the following we suppress the argument list of Φ where possible.
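Since Φ is an ordinary quadratic, its evaluation needs only matrix-vector products with B and A. A minimal sketch (dense NumPy; the boolean mask for J and the helper name are our assumptions):

```python
import numpy as np

def phi(x, lam, mu, B, b, A, a, J, rho, eta):
    """Sketch of the augmented Lagrangian Phi of section 2.
    J is a boolean mask of length n, mu the vector mu_J."""
    IJmu = np.zeros_like(x); IJmu[J] = mu          # I_J mu_J
    r_eq  = A.T @ x - a                            # equality residual
    r_lag = B @ x + b - A @ lam - IJmu             # Lagrangian-gradient residual
    f = 0.5 * x @ (B @ x) + b @ x
    L = f - lam @ r_eq - mu @ x[J]
    return -L + 0.5 * rho * (r_eq @ r_eq) + 0.5 * eta * (r_lag @ r_lag)
```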
Theorem 1 Let (1.4), (1.5), (1.6) be satisfied. Then there exists some η₀ > 0 such that for every η > η₀ there exists a ρ₀(η), such that Φ has minimizers on Ω̃ and any minimizer of Φ on Ω̃ gives a solution of (1.7)–(1.9) if ρ > ρ₀(η). Conversely a solution of (1.7)–(1.9) defines a minimizer of Φ on Ω̃, if η and ρ are sufficiently large.
Proof: We first compute the Hessian of Φ (with respect to x, λ, μ_J). This reads

  ∇²Φ = [ −B + ρAA^T + ηB²    A − ηBA     (I − ηB)I_J
          A^T − ηA^T B        ηA^T A      ηA^T I_J
          I_J^T(I − ηB)       ηI_J^T A    η I_|J|     ].                      (2.3)

I_|J| = I_J^T I_J is the identity in ℝ^{|J|×|J|}. We will now show that ∇²Φ is positive semidefinite for suitably chosen η, ρ > 0. To this end we interchange the first and third (block) row and column to get

  P ∇²Φ P = [ η I_|J|         ηI_J^T A     I_J^T(I − ηB)
              ηA^T I_J        ηA^T A       A^T(I − ηB)
              (I − ηB)I_J     (I − ηB)A    −B + ρAA^T + ηB² ],                (2.4)

where P is a permutation matrix. The first diagonal block is a positive multiple of the identity for η > 0. Therefore ∇²Φ will be positive semidefinite if and only if the Schur complement of this block is positive semidefinite. This is a 2×2 block matrix C with entries C_ij, where

  C₁₁ = ηA^T(I − I_J I_J^T)A,                                                (2.5)
  C₁₂ = C₂₁^T = A^T(I − I_J I_J^T)(I − ηB),                                  (2.6)
  C₂₂ = −B + ρAA^T + ηB² − (1/η)(I − ηB)I_J I_J^T(I − ηB).                   (2.7)
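For small test problems the block matrix (2.3) can be assembled explicitly and its semidefiniteness checked numerically. A sketch under the same dense-NumPy assumptions as above, with block ordering (x, λ, μ_J) (the solver itself never forms ∇²Φ):

```python
import numpy as np

def hessian_phi(B, A, J, rho, eta):
    """Assemble the (n+m+|J|) x (n+m+|J|) matrix (2.3);
    a dense sketch for small examples only (J a boolean mask)."""
    n, m = A.shape
    IJ = np.eye(n)[:, J]                  # columns of the identity in J
    k = IJ.shape[1]
    I = np.eye(n)
    return np.block([
        [-B + rho * A @ A.T + eta * B @ B, A - eta * B @ A, (I - eta * B) @ IJ],
        [A.T - eta * A.T @ B,              eta * A.T @ A,   eta * A.T @ IJ],
        [IJ.T @ (I - eta * B),             eta * IJ.T @ A,  eta * np.eye(k)]])

# semidefiniteness can then be inspected via the spectrum, e.g.
# np.linalg.eigvalsh(hessian_phi(B, A, J, rho, eta)).min() >= -1e-12
```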
Let

  I_J̄ I_J̄^T = I − I_J I_J^T,                                                 (2.8)

where J̄ = {1, …, n}\J. Then

  C₁₁ = ηA^T I_J̄ I_J̄^T A,                                                    (2.9)
  C₁₂ = C₂₁^T = A^T I_J̄ I_J̄^T (I − ηB),
  C₂₂ = −B + ρAA^T + ηB² − (1/η)(I − ηB)² + (1/η)(I − ηB)I_J̄ I_J̄^T(I − ηB)
      = B + ρAA^T − (1/η)I + (1/η)(I − ηB)I_J̄ I_J̄^T(I − ηB).                 (2.10)

If

  A = U (Σ; 0) V^T,  Σ = diag(σ₁, …, σ_m)                                    (2.11), (2.12)

is a singular value decomposition with unitary U ∈ ℝ^{n×n} and V ∈ ℝ^{m×m}, then from (1.4), (1.5) it follows that

  B̂ := U^T B U = [ B̂₁₁ B̂₁₂; B̂₂₁ B̂₂₂ ],  B̂₁₁ ∈ ℝ^{m×m},                       (2.13)

has a positive definite right lower minor B̂₂₂ and

  U^T C₂₂ U = [ B̂₁₁ + ρΣ² − (1/η)I   B̂₁₂
                B̂₂₁                  B̂₂₂ − (1/η)I ] + (1/η)H,                (2.14)

where the second term

  H := U^T(ηB − I) I_J̄ I_J̄^T (ηB − I)U                                       (2.15)

is positive semidefinite for every η. If

  η > η₀ = 1/λ_min(B̂₂₂),                                                     (2.16)
then B̂₂₂ − (1/η)I will be positive definite. Therefore C₂₂ will be positive definite provided the Schur complement of B̂₂₂ − (1/η)I, namely

  B̂₁₁ + ρΣ² − (1/η)I − B̂₁₂(B̂₂₂ − (1/η)I)⁻¹B̂₂₁ =: D₁₁,                        (2.17)

is positive definite.
Remark: If B is positive definite and η > 1/λ_min(B), then C₂₂ will be positive definite for every ρ > 0. This follows immediately from (2.10). □
In the general case D₁₁ will be positive definite provided ρ is sufficiently large, since Σ² is positive definite. E.g. if

  η ≥ 2η₀                                                                    (2.18)

and

  ‖B‖ ≤ 1,                                                                   (2.19)

then

  ρ > ρ₀(η) = (2 + η) / min_i(σ_i)²                                          (2.20)

would be sufficient. In the general case we have as a sufficient condition

  ρ > ρ₀(η) = {1/η + ‖B‖ + ‖B‖²/(λ_min(B̂₂₂) − 1/η)} / min_i(σ_i)².           (2.21)
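The bounds (2.16) and (2.21) are directly computable from a singular value decomposition of A. A sketch (dense linear algebra; the helper name and the choice of η as a fixed multiple of η₀ are ours):

```python
import numpy as np

def penalty_bounds(B, A, eta_margin=2.0):
    """Sketch of (2.16) and (2.21): eta0 from the SVD data (2.11)-(2.13)
    and a sufficient rho0(eta) for eta = eta_margin * eta0 (eta_margin > 1)."""
    n, m = A.shape
    U, sigma, _ = np.linalg.svd(A)               # A = U (Sigma; 0) V^T
    Bh = U.T @ B @ U
    lmin = np.linalg.eigvalsh(Bh[m:, m:]).min()  # lambda_min of B22^, > 0 by (1.4)
    eta0 = 1.0 / lmin                            # (2.16)
    eta = eta_margin * eta0
    nB = np.linalg.norm(B, 2)
    rho0 = (1.0 / eta + nB + nB**2 / (lmin - 1.0 / eta)) / sigma.min()**2  # (2.21)
    return eta, rho0
```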
Therefore ∇²Φ will be positive semidefinite, provided the Schur complement of C₂₂ in C, namely

  C₁₁ − C₁₂C₂₂⁻¹C₂₁ =: E₁₁,

is positive semidefinite too and η, ρ are sufficiently large, such that C₂₂ is positive definite. If J = {1, …, n} (J̄ = ∅) this condition is empty, since in that case C₁₁ = 0, C₁₂ = C₂₁^T = 0. Otherwise E₁₁ will be positive semidefinite provided

  Ẽ₁₁ := I_|J̄| − I_J̄^T(I − ηB){η(B + ρAA^T) − I + (I − ηB)I_J̄ I_J̄^T(I − ηB)}⁻¹(I − ηB)I_J̄   (2.22)

is, since η > 0 and

  E₁₁ = ηA^T I_J̄ Ẽ₁₁ I_J̄^T A.                                               (2.23)

Now, if η(B + ρAA^T) − I is positive definite, which has been discussed already, then Ẽ₁₁ is positive definite anyway. To see this let

  (I − ηB)I_J̄ = Ũ (Σ̃; 0) Ṽ^T,  Ũ ∈ ℝ^{n×n}, Ṽ ∈ ℝ^{|J̄|×|J̄|} unitary,

be a singular value decomposition. Then

  Ṽ^T Ẽ₁₁ Ṽ = I_|J̄| − (Σ̃, 0){Ũ^T(η(B + ρAA^T) − I)Ũ + [ Σ̃² 0; 0 0 ]}⁻¹(Σ̃; 0).

Now for sufficiently large η and ρ > ρ₀(η)

  Ũ^T(η(B + ρAA^T) − I)Ũ = WΛ²W^T

with unitary W and regular diagonal matrix Λ = diag(λ₁, …, λ_n). If

  F := Λ⁻¹W^T (Σ̃; 0) = W̃ (Γ; 0) V̄^T,  Γ = diag(γ_i),

is a singular value decomposition, it follows that

  Ṽ^T Ẽ₁₁ Ṽ = I_|J̄| − F^T{I + FF^T}⁻¹F = V̄(I_|J̄| − Γ(I_|J̄| + Γ²)⁻¹Γ)V̄^T = V̄ diag(1/(1 + γ_i²)) V̄^T,

which is positive definite (if J̄ ≠ ∅; otherwise it is vacuous). The proof of semidefiniteness of ∇²Φ for sufficiently large η and ρ > ρ₀(η) is therefore complete.
The next step of the proof shows that Φ is bounded from below on Ω̃. Since Φ is quadratic and Ω̃ is a polyhedron, it then follows that Φ attains its infimum on Ω̃, and from the convexity of Φ that every local minimizer on Ω̃ is a global one. If (x, λ, μ_J) ∈ Ω̃ is arbitrary, we consider a feasible ray

  (x + tΔx, λ + tΔλ, μ_J + tΔμ_J) ∈ Ω̃ for t ∈ ℝ₊, t → ∞.

That implies

  (Δx)_J ≥ 0,  Δμ_J ≥ 0.

Let

  X := (x^T, λ^T, μ_J^T)^T,  y := (Δx^T, Δλ^T, Δμ_J^T)^T ≠ 0.

If (∇²Φ)y ≠ 0, then

  y^T ∇²Φ y > 0

from the positive semidefiniteness of ∇²Φ, and therefore Φ will grow to +∞ along this ray. Therefore it suffices to consider rays for which (∇²Φ)y = 0. In that case for every X⁰, using Taylor's expansion twice,

  Φ(X + ty) = Φ(X) + t∇Φ(X)^T y = Φ(X) + t(∇Φ(X⁰) + (∇²Φ)(X − X⁰))^T y
            = Φ(X) + t(∇Φ(X⁰))^T y ≥ Φ(X⁰) + ∇Φ(X⁰)^T(X − X⁰) + t∇Φ(X⁰)^T y.  (2.24)

Now

  ∇Φ(X) = ( −Bx − b + Aλ + I_J μ_J + ρA(A^T x − a) + ηB(Bx + b − Aλ − I_J μ_J)
            A^T x − a − ηA^T(Bx + b − Aλ − I_J μ_J)
            I_J^T x − ηI_J^T(Bx + b − Aλ − I_J μ_J) ).                        (2.25)

From (1.4)–(1.6) it follows that there exist x⁰, λ⁰, μ⁰_J such that (1.7)–(1.9) have a solution:

  x⁰_J ≥ 0,  μ⁰_J ≥ 0,  A^T x⁰ − a = 0,  Bx⁰ + b − Aλ⁰ − I_J μ⁰_J = 0,  (μ⁰_J)^T x⁰_J = 0.

Let X⁰ := (x⁰^T, λ⁰^T, μ⁰_J^T)^T in (2.24). Then for X ∈ Ω̃ and feasible y

  ∇Φ(X⁰)^T(X − X⁰) = (x⁰_J)^T(μ_J − μ⁰_J) ≥ −(x⁰_J)^T μ⁰_J = 0,
  ∇Φ(X⁰)^T y = (I_J^T x⁰)^T Δμ_J = (x⁰_J)^T Δμ_J ≥ 0.

Clearly Φ is bounded from below on Ω̃.
Since every point of Ω̃ is regular, the Kuhn-Tucker conditions are necessary and sufficient for (2.1), (2.2). They read

  ∇Φ(X) − (I_J; 0; 0) ν_J^[1] − (0; 0; I_|J|) ν_J^[2] = 0,                   (2.26)
  ν_J^[1] ≥ 0,  ν_J^[2] ≥ 0,  (ν_J^[1])^T x_J = 0,  (ν_J^[2])^T μ_J = 0,     (2.27)
  X ∈ Ω̃.                                                                     (2.28)

If (x*, λ*, μ*_J) solves (1.7)–(1.9), then X = (x*^T, λ*^T, μ*_J^T)^T, ν_J^[1] := 0, ν_J^[2] := x*_J solve (2.26)–(2.28). Let (2.26)–(2.28) be satisfied. Then adding −ρA times the second block equation of (2.26) to the first one we obtain

  (−I + ηB + ηρAA^T)(Bx + b − Aλ − I_J μ_J) = I_J ν_J^[1].

Multiplying by (Bx + b − Aλ − I_J μ_J)^T =: z^T from the left we get, using the third block equation,

  z^T(−I + ηB + ηρAA^T)z = z^T I_J ν_J^[1] = (1/η)(x_J^T − ν_J^[2]T)ν_J^[1] = −(1/η)ν_J^[2]T ν_J^[1] ≤ 0.

But −I + ηB + ηρAA^T is positive definite for sufficiently large η, ρ. Then z = 0. From the second block in (2.26) then A^T x − a = 0. Therefore ν_J^[1] = 0 too and ν_J^[2] = x_J. But ν_J^[2]T μ_J = 0 and therefore x_J^T μ_J = 0 and we have a solution of (1.7)–(1.9). This proves the theorem. □
Remark: If the matrix (A, I_𝒜), where 𝒜 = {i ∈ J : x*_i = 0}, is of full rank, then the multipliers λ*, μ* are unique. Because of the equivalence of solutions, Φ will have a unique minimizer on Ω̃.
Moreover the projected Hessian of Φ is positive definite in this case (provided −I + ηB + ηρAA^T is positive definite of course). □

Let us give a little example.
Example: Let n = 2, m = 1, J = {1, 2},

  B = [ −1 0; 0 1 ],  A^T = (2, 1),  a = (−1),  b = 0.

Clearly, there is no x ≥ 0 such that 2x₁ + x₂ + 1 = 0, and therefore we expect the method to fail for every η, ρ > 0, that is, Φ is not bounded from below. We get the data (see (2.12), (2.13))

  σ₁ = √5,  λ_min(B̂₂₂) = 3/5.

From this data we can compute η₀ and ρ₀(η) using (2.16) and (2.21). Computing Φ we obtain

  Φ(x, λ, μ) = −(1/2)(−(x₁)² + (x₂)²) + x₁μ₁ + x₂μ₂ + λ(2x₁ + x₂ + 1)
               + (ρ/2)(2x₁ + x₂ + 1)² + (η/2)((x₁ + 2λ + μ₁)² + (x₂ − λ − μ₂)²)

with the constraints x₁ ≥ 0, x₂ ≥ 0, μ₁ ≥ 0, μ₂ ≥ 0. x₁ = x₂ = 0, μ₁ = −2λ, μ₂ = −λ, λ ≤ 0 is feasible and for λ → −∞ we obtain Φ → −∞, regardless how large η, ρ are chosen, as expected.
If we change a to become a = 1, the problem becomes feasible. The unique optimal solution is

  x₁ = 1/2,  x₂ = 0,  λ = −1/4,  μ₁ = 0,  μ₂ = 1/4.

Now Φ reads

  Φ(x, λ, μ) = −(1/2)(−(x₁)² + (x₂)²) + x₁μ₁ + x₂μ₂ + λ(2x₁ + x₂ − 1)
               + (ρ/2)(2x₁ + x₂ − 1)² + (η/2)((x₁ + 2λ + μ₁)² + (x₂ − λ − μ₂)²).

We get

  ∇Φ(x, λ, μ) = ( x₁ + μ₁ + 2λ + 2ρ(2x₁ + x₂ − 1) + η(x₁ + 2λ + μ₁)
                  −x₂ + μ₂ + λ + ρ(2x₁ + x₂ − 1) + η(x₂ − λ − μ₂)
                  2x₁ + x₂ − 1 + η(2(x₁ + 2λ + μ₁) − (x₂ − λ − μ₂))
                  x₁ + η(x₁ + 2λ + μ₁)
                  x₂ − η(x₂ − λ − μ₂) ),

  ∇Φ(x*, λ*, μ*) = (0, 0, 0, 1/2, 0)^T,

  ∇²Φ(x, λ, μ) = [ 1 + 4ρ + η   2ρ          2 + 2η   1 + η   0
                   2ρ           ρ + η − 1   1 − η    0       1 − η
                   2 + 2η       1 − η       5η       2η      η
                   1 + η        0           2η       η       0
                   0            1 − η       η        0       η     ].

For η > η₀ = 5/3 and

  ρ > ρ₀(η) = (1/5)(3/5 + 1/η + 16/(5(3 − 5/η)))

(see (2.16), (2.17)) ∇²Φ will be positive semidefinite with exactly one eigenvalue 0. The corresponding eigenvector has the form (0, 0, 1, −2, −1)^T. This is not a direction to infinity in Ω̃. Using X* and ∇Φ(X*) given above we get

  Φ(X) = Φ(X*) + ∇Φ(X*)^T(X − X*) + (1/2)(X − X*)^T ∇²Φ(X*)(X − X*) ≥ Φ(X*) + (1/2)μ₁ ≥ Φ(X*) for μ₁ ≥ 0.

But μ₁ ≥ 0 for X ∈ Ω̃. Clearly X* minimizes Φ on Ω̃. □
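The example can be checked numerically: assembling the 5×5 Hessian above for some η > 5/3 and ρ > ρ₀(η) and inspecting its spectrum confirms positive semidefiniteness with a single zero eigenvalue. A small sketch (the concrete values of ρ and η are our choice):

```python
import numpy as np

def example_hessian(rho, eta):
    """The 5x5 Hessian of Phi in the variables (x1, x2, lam, mu1, mu2)."""
    return np.array([
        [1 + 4*rho + eta, 2*rho,         2 + 2*eta, 1 + eta, 0      ],
        [2*rho,           rho + eta - 1, 1 - eta,   0,       1 - eta],
        [2 + 2*eta,       1 - eta,       5*eta,     2*eta,   eta    ],
        [1 + eta,         0,             2*eta,     eta,     0      ],
        [0,               1 - eta,       eta,       0,       eta    ]])

eta = 2.0                                                  # > eta0 = 5/3
rho = (3/5 + 1/eta + 16/(5*(3 - 5/eta))) / 5 + 1.0         # > rho0(eta) = 1.5
print(np.linalg.eigvalsh(example_hessian(rho, eta)))       # one zero, rest > 0
print(example_hessian(rho, eta) @ np.array([0, 0, 1, -2, -1]))  # the null vector
```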
3 A working algorithm using Φ

In this section we will show how to compute appropriate values for the parameters ρ and η. Let us give some comments on choosing a minimization method for (2.1) first. The case of small to medium scale QP-problems is of no interest here. For these problems direct active set methods (e.g. [6]) are to be preferred. These methods have high reliability and are reasonably efficient if the dimension of the problem is not too large. Therefore we will discuss large scale problems, where usually B and A are very sparse. B², AA^T, A^T A and A^T B however, making up the Hessian of Φ, may be much more dense, even completely dense if there are dense columns. The use of minimization methods which make explicit use of the projected Hessian, e.g. the projected Newton method of Moré and Toraldo [13], the method of Coleman and Hulbert [2] or barrier methods using Newton steps, will therefore be restricted to cases where this loss of sparsity is tolerable, e.g. A and B block diagonal. Active set methods or interior point methods using preconditioned conjugate gradients, the limited-memory quasi-Newton or the truncated Newton approach, e.g. [12], [14], [17], give more promise for general application. These methods only require the evaluation of function and gradient values. Function, gradient and second directional derivative values of Φ however can be computed with two applications of each of B, A and A^T on a vector without any explicit use of ∇²Φ. We assume the following properties of a solution method to be used:
1. For fixed ρ and η the method is strongly monotonic with respect to Φ and
   x^{k+1} = x^k + σ_k d^k
or
   x^{k+1} = P_Ω̃(x^k + σ_k d^k),
where d^k is a direction of descent for Φ at x^k and P_Ω̃ denotes projection on Ω̃ (see the sketch following this list).
2. If Φ is bounded from below and ρ and η are fixed, then a Kuhn-Tucker point of (2.1), (2.2) is found irrespective of the convexity of Φ.
3. x^k ∈ Ω̃ for every k.
4. The method has the finite termination property or otherwise is R-linearly convergent (at least).
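Since Ω̃ imposes only lower bounds on x_J and μ_J, the projection P_Ω̃ in property 1 is a componentwise clipping. A minimal sketch (vector layout z = (x, λ, μ_J) and names are ours):

```python
import numpy as np

def P_Omega_tilde(z, n, m, J):
    """Project z = (x, lam, mu_J) onto Omega~: clip x_J and mu_J at zero,
    lam stays free (J: integer indices of the bounded x-components)."""
    z = z.copy()
    z[J] = np.maximum(z[J], 0.0)             # x_J >= 0
    z[n + m:] = np.maximum(z[n + m:], 0.0)   # mu_J >= 0
    return z
```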
These properties are shared by a wide variety of algorithms. Our first tentative implementation uses Nash's code tnb.f, obtainable from netlib, with only minor modification. For a description of this method see [14]. The modifications used are the following: GTIMS (which approximately computes Gv, G representing the Hessian and v a vector, by finite differencing) is replaced by a routine which computes the matrix-vector product exactly. The stepsize computation is replaced by computing the unidimensional minimum exactly, which is trivial in this case. A routine performing the adaptation of ρ and η is called as soon as the current direction of descent d^k = (Δx^k, Δλ^k, Δμ^k_J), say, is computed. It consists of three modification steps and a restart safeguard. Firstly, if we observe a large increase in the penalty terms which penalize infeasibility resp. the violation of the Lagrangian condition, then the corresponding parameters are increased. This is a folklore device in augmented Lagrangian methods, see e.g. [4]. Secondly the penalty parameters are adapted such that the second directional derivative of Φ along the current search direction is positive if it extends to infinity in the feasible domain, and nonnegative otherwise. Remember that Φ is convex on the feasible domain if ρ and η are large enough. Neither of these devices may be sufficient to prevent divergence, since it may occur that the search directions miss a component in some invariant subspace of ∇²Φ. Therefore thirdly there is a simultaneous increase of both η and ρ if we observe divergence of the sum of error measures, which are infeasibility, Lagrangian condition violation and complementarity violation. Finally, if the method stops finitely, then we check the Kuhn-Tucker conditions of the original QP and take a restart if these are violated. Precisely we proceed as follows:
1. If one of the penalty terms did increase since the last change of a penalty parameter (in step k₀ < k, say), the appropriate parameter is increased. This is done according to the following scheme. Let τ_{1,k−1} = ‖A^T x^{k₀} − a‖. If ‖A^T x^k − a‖ > γ₀ τ_{1,k−1} then

  τ_{1,k} := ‖A^T x^k − a‖,
  ρ_k = ρ_{k−1} max{γ₁, min{γ₂, ‖A^T x^k − a‖ / ‖A^T x^{k₀} − a‖}},

otherwise ρ_k = ρ_{k−1} is tried and τ_{1,k} := τ_{1,k−1}. The computation of η_k is done in the same manner, using a variable τ_{2,k} for the level of the Lagrangian violation. The constants γ_i, i ∈ {0, 1, 2}, satisfy γ₀ > 1, 1 < γ₁ < γ₂. The present implementation uses

  γ₀ = γ₁ = 1.5,  γ₂ = 3.
2. The second directional derivative of Φ along the computed direction of descent is computed. Using the obvious decomposition ∇²Φ = H₀ + ρH₁ + ηH₂ this can be written as

  (d^k)^T ∇²Φ d^k = α_k + ρ_k β_k + η_k δ_k,

where

  α_k = −(Δx^k)^T B(Δx^k) + 2((A^T Δx^k)^T Δλ^k + (Δx^k)^T I_J Δμ^k_J),
  β_k = ‖A^T Δx^k‖²,
  δ_k = ‖I_J̄^T(BΔx^k − AΔλ^k)‖² + ‖Δμ^k_J − I_J^T(BΔx^k − AΔλ^k)‖².

α_k, β_k, δ_k and (∇²Φ)d^k can be computed using two applications of each of B, A and A^T on a vector. Observe that the computed values of β_k and δ_k will be nonnegative automatically using this representation. For Φ to be convex we have the necessary condition

  α_k + ρ_k β_k + η_k δ_k =: ω_k ≥ 0.

If ω_k < 0 and β_k = δ_k = 0 we terminate unsuccessfully. Given the conditions of Theorem 1, this case cannot occur. Otherwise in case ω_k ≤ 0 we distinguish the following cases:
a. d^k is a direction to infinity in Ω̃. Then we require Φ to be strictly convex along this direction (remember that we have shown in section 2 that a direction of descent which extends to infinity in Ω̃ cannot have zero second directional derivative). Therefore we let

  ρ_k := max{γ₄ ρ_{k−1}, (ε₁‖d^k‖² − α_k − η_k δ_k)/β_k}

and η_k = η_{k−1}, given β_k > δ_k, and

  η_k := max{γ₄ η_{k−1}, (ε₁‖d^k‖² − α_k − β_k ρ_k)/δ_k},

ρ_k = ρ_{k−1} otherwise, with

  γ₄ > 1,  ε₁ > 0.

b. The step along d^k is restricted by feasibility, i.e. by the condition x^{k+1} ∈ Ω̃. In that case we use the same procedure, however with ε₁ replaced by 0. In the implementation, γ₄ = 1.1, ε₁ = 10⁻⁶ and 0 is replaced by the machine epsilon.
c. If ω_k > 0 and d^k is a direction to infinity in Ω̃, we require ω_k ≥ ε₁‖d^k‖² and proceed as above.
3. Let

  ζ_k = Σ_{j=0}^{k} {‖A^T x^j − a‖ + ‖Bx^j + b − Aλ^j − I_J μ^j_J‖ + (x^j_J)^T μ^j_J}.
If ζ_k > c_{k−1} then

  c_k := c_{k−1}²,  ρ_k = γ₁ ρ_{k−1},  η_k = γ₁ η_{k−1},

and ρ_k = ρ_{k−1}, η_k = η_{k−1}, c_k = c_{k−1} otherwise. Here c₀ ≥ 1 is a preselected constant (presently c₀ = 10³).
4. If the method terminates finitely indicating a solution of (2.1), (2.2), we check whether a solution of (1.1), (1.2) is obtained. If not, then ρ and η are increased by the factor γ₁ and the method is restarted.
The tests are performed in succession. As soon as a change of a penalty parameter occurs, the remaining tests are discarded. If ρ_k > ρ_{k−1} or η_k > η_{k−1} then d^k is recomputed and the tests are repeated. This is necessary since the descent direction already computed depends on the current ρ and η of course. Since in practice the number of changes of the penalty parameters will be quite small, this causes not much overhead.
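Step 2 can be implemented with the matrix-vector products already available. The following sketch (function names, boolean mask convention and the branching details are our assumptions) computes α_k, β_k, δ_k and performs the penalty update of cases a/b:

```python
import numpy as np

def directional_terms(dx, dlam, dmu, B, A, J):
    """Coefficients with (d^k)^T grad^2 Phi d^k = alpha + rho*beta + eta*delta;
    one application each of B and A^T to dx and of A to dlam suffices
    (J: boolean mask of the bounded components of x)."""
    Bdx = B @ dx
    ATdx = A.T @ dx
    w = Bdx - A @ dlam                                   # B dx - A dlam
    alpha = -dx @ Bdx + 2.0 * (ATdx @ dlam + dx[J] @ dmu)
    beta = ATdx @ ATdx                                   # >= 0 automatically
    delta = w[~J] @ w[~J] + (dmu - w[J]) @ (dmu - w[J])  # >= 0 automatically
    return alpha, beta, delta

def adapt_penalties(rho, eta, alpha, beta, delta, d_norm2,
                    to_infinity, gamma4=1.1, eps1=1e-6):
    """Cases a/b of step 2: enforce alpha + rho*beta + eta*delta >= target,
    raising rho if beta dominates, else eta (target uses machine eps in case b).
    beta = delta = 0 cannot occur here under the conditions of Theorem 1."""
    target = (eps1 if to_infinity else np.finfo(float).eps) * d_norm2
    if alpha + rho * beta + eta * delta >= target:
        return rho, eta
    if beta > delta:
        rho = max(gamma4 * rho, (target - alpha - eta * delta) / beta)
    else:
        eta = max(gamma4 * eta, (target - alpha - rho * beta) / delta)
    return rho, eta
```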
Theorem 2 Assume that in addition to (1.4), (1.5), (1.6) the matrix (A, I_𝒜) is of full rank, where 𝒜 = {i ∈ J : x*_i = 0}. Then ρ_k and η_k stay fixed for k sufficiently large and x^k converges to the unique solution of (1.1), (1.2).
Proof: The modification step 4 can occur finitely often only, because Φ becomes convex for ρ and η sufficiently large as shown in section 2. By assumption all level sets

  L_γ = {X ∈ Ω̃ : Φ(X) ≤ γ}

are bounded, if ρ and η are large enough. For ρ and η fixed, Φ(X^k) is monotonically decreasing, and increasing ρ or η increases the value of Φ. Therefore {x^k} remains bounded if ρ and η become large enough. The modification step 1 for ρ or η therefore can apply finitely often only (observe that the sequences {τ_{1,k}} and {τ_{2,k}} are nondecreasing). In case of nonconvergence of x^k to the solution of (1.1), (1.2), {ζ_k} would diverge and therefore by device 3 ρ and η would both be increased infinitely often. Necessarily therefore they both would go beyond the bounds necessary to make Φ convex. The modification step 2 therefore can apply finitely often at most. However, as soon as ρ and η are sufficiently large, by the assumption of R-linear convergence of the minimization method ζ_k cannot diverge. Therefore ρ_k and η_k stay bounded and by construction are held fixed for k sufficiently large. Again, by the assumption on the algorithm used to minimize Φ and by device 3 we have

  ‖A^T x^j − a‖ → 0,  ‖Bx^j + b − Aλ^j − I_J μ^j_J‖ → 0,  (x^j_J)^T μ^j_J → 0,
which proves the theorem. □
This device works quite well and gave reasonable values for η and ρ in any case tried so far. It is in spirit similar to techniques used e.g. in [10]. We present here results obtained for the obstacle problem of a membrane in discretized form. The continuous problem is to minimize

  (1/2) ∫_Ω {(∂₁u)² + (∂₂u)² + 2fu} dω

subject to

  u|∂Ω = 0,  u ≥ ψ a.e. on Ω,

where f : Ω → ℝ is the given load and ψ : Ω → ℝ the given obstacle. For reasons of simplicity we choose Ω = ]0,1[ × ]0,1[ and

  f(ξ, ζ) = −2{ξ(1 − ξ) + ζ(1 − ζ)},

such that

  u(ξ, ζ) = ξ(1 − ξ)ζ(1 − ζ)

is the unconstrained deformation of the membrane. As ψ we choose

  ψ(ξ, ζ) = −ψ₀ − ψ₁{(ξ − p_ξ)² + (ζ − p_ζ)²}

with (p_ξ, p_ζ) ∈ Ω given. Varying ψ₀, ψ₁ and (p_ξ, p_ζ) one obtains different configurations concerning the area of contact (i.e. the number of binding constraints). Discretizing this problem using a quadratic grid with mesh size h = 1/(2N + 2) and the well known five-point stencil we arrive at the QP-problem

  (1/2) u^T Δ_h u + u^T f^h = min,  u ≥ ψ^h,

where u = (u^h_{11}, u^h_{12}, …, u^h_{1,2N+1}, …, u^h_{21}, …, u^h_{2N+1,2N+1})^T with u^h_{ij} an approximation for u(ih, jh), and f^h, ψ^h the vectors of values of f and ψ taken on the grid. Using the substitution v := u − ψ^h we would obtain a strictly convex bound constrained QP-problem directly. This is purposely not done here. Rather we introduce slack variables v ≥ 0, write u − v = ψ^h and obtain the QP-problem

  (1/2) x^T B x + x^T b = min,
  A^T x − a = 0,  a = ψ^h,  x_J ≥ 0,  J = {M + 1, …, 2M},

where M = (2N + 1)² and

  B = [ Δ_h 0; 0 0 ],  b = (f^h; 0),  A^T = (I, −I),  x = (u; v).
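For reference, a sketch constructing the data of this test problem with SciPy sparse matrices (the function name, the sparse format and the default for (p_ξ, p_ζ) are ours):

```python
import numpy as np
from scipy.sparse import identity, kron, diags

def obstacle_qp(N, psi0, psi1, p=(0.75, 0.75)):
    """Sketch of the discretized obstacle problem: returns Delta_h, f_h, psi_h
    on the interior grid with mesh size h = 1/(2N+2)."""
    M1 = 2 * N + 1
    h = 1.0 / (2 * N + 2)
    T = diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(M1, M1))
    S = diags([1.0, 1.0], [-1, 1], shape=(M1, M1))   # block coupling pattern
    Delta_h = (2 * N + 2) ** 2 * (kron(identity(M1), T) - kron(S, identity(M1)))
    xi = h * np.arange(1, M1 + 1)
    X, Y = np.meshgrid(xi, xi, indexing="ij")        # row-wise grid ordering
    f_h = -2.0 * (X * (1 - X) + Y * (1 - Y))
    psi_h = -psi0 - psi1 * ((X - p[0]) ** 2 + (Y - p[1]) ** 2)
    return Delta_h.tocsr(), f_h.ravel(), psi_h.ravel()
```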
Φ then depends on 4(2N + 1)² variables, X^T = (u^T, v^T, λ^T, μ_J^T). The matrix Δ_h represents the discrete Laplacian and reads

  Δ_h = (2N + 2)² block-tridiag(−I, T, −I) ∈ ℝ^{M×M},  T = tridiag(−1, 4, −1) ∈ ℝ^{(2N+1)×(2N+1)}.

In this case the theoretical bounds for η and ρ can be computed explicitly. This gives us the possibility to check the proper working of our algorithm. In the notation of section 2 we have Σ = √2 I,

  B̂ = (1/2)[ Δ_h Δ_h; Δ_h Δ_h ],  B̂₁₁ = B̂₁₂ = B̂₂₁ = B̂₂₂ = (1/2)Δ_h,
  λ_min(B̂₂₂) = 4 sin²(πh/2)/h².

Therefore we obtain the condition

  η > η₀ = (1/4) h²/sin²(πh/2)  (→ 1/π² for h → 0),

and ρ has to be chosen such that

  (1/2)Δ_h + (2ρ − 1/η)I − (1/4)Δ_h((1/2)Δ_h − (1/η)I)⁻¹Δ_h

is positive definite (see (2.17)). This yields the condition

  ν + 2ρ − 1/η − ν²/(ν − 1/η) > 0

for every eigenvalue ν of (1/2)Δ_h and thus

  ρ > (2ν_min + 1/η)/(ην_min − 1),

where ν_min is the minimal eigenvalue of (1/2)Δ_h. E.g. for η = 2/ν_min we get the bound

  ρ ≥ (5/2) λ_min((1/2)Δ_h) = 10 sin²(πh/2)/h²  (→ 5π²/2 for h → 0).

For very large η, ρ may be very small.
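Under the same notation, these explicit bounds can be tabulated as functions of h; a small sketch:

```python
import numpy as np

def obstacle_bounds(N):
    """Evaluate the explicit bounds of this paragraph for mesh size h."""
    h = 1.0 / (2 * N + 2)
    nu_min = 4.0 * np.sin(np.pi * h / 2) ** 2 / h ** 2  # lambda_min((1/2)Delta_h)
    eta0 = 1.0 / nu_min                                 # -> 1/pi^2 as h -> 0
    eta = 2.0 / nu_min                                  # the choice eta = 2/nu_min
    rho = (2.0 * nu_min + 1.0 / eta) / (eta * nu_min - 1.0)  # = (5/2) nu_min
    return eta0, eta, rho

for N in (1, 3, 7, 15, 30):
    print(N, obstacle_bounds(N))   # rho -> 5*pi^2/2 as h -> 0
```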
In our tests we purposely did choose ρ₀ and η₀ inappropriately small, namely

  ρ₀ = η₀ = 10⁻⁵.

The initial value was chosen in the following manner:

  u⁰_{i,j} = max{ψ^h_{i,j}, ih(1 − ih)jh(1 − jh)},                            (3.29)
  v⁰_{i,j} = u⁰_{i,j} − ψ^h_{i,j},                                            (3.30)
  λ⁰_{i,j} = 0 if v⁰_{i,j} > 0, −2(ih(1 − ih) + jh(1 − jh)) otherwise,        (3.31)
  μ⁰_{i,j} = −λ⁰_{i,j},                                                       (3.32)
and (x⁰)^T := ((u⁰)^T, (v⁰)^T, (λ⁰)^T, (μ⁰_J)^T). Double indices (i, j) above correspond to vector indices k = (i − 1)(2N + 1) + j. Some typical results are given in the following tables. The columns have the following meaning: n = number of variables of Φ, iter = # of main iterations in LMQNBC, nftotl = # function/gradient calls, infeas = ‖A^T x − a‖ (final), lagr_viol = ‖Bx + b − Aλ − I_J μ_J‖, compl_viol = x_J^T μ_J. cputim is in seconds on an HP9000/720/60 with f77 compiler options +Obb2000 +OP3 -K.

psi0=-0.015 psi1=-0.01 ksi0=0.75 ksi1=0.75

    n   iter  nftotl     infeas   lagr_viol  compl_viol        rho        eta   cputim
   36     32     254  .1193E-07  .1227E-09   .5092E-08  .2608E+01  .4823E+00       .1
  196    289    2269  .4627E-06  .1091E-09   .2268E-07  .1638E+00  .9678E+01      2.6
  484    709    9300  .1912E-06  .9685E-10   .2614E-09  .7741E+00  .2024E+01     30.4
  900   1146   16568  .4774E-07  .9505E-11   .0000E+00  .5969E+00  .3011E+01    107.6
 1444   1183   21562  .2700E-04  .3255E-08   .0000E+00  .5443E+00  .1917E+01    239.6
 2916   3544   65319  .1212E-05  .5769E-10   .0000E+00  .4155E+00  .3090E+01   1588.5
 4900   6287  118415  .2634E-04  .1128E-08   .0000E+00  .6257E+00  .1847E+01   5150.2
 7396  28343  545709  .2649E-02  .1312E-07   .1162E-02  .1084E+00  .9500E+01  31028.2
10404  13568  267752  .1393E-04  .3096E-09   .4731E-05  .6554E+00  .2226E+01  21731.0
14884  19240  393816  .1226E-04  .1856E-09   .0000E+00  .1044E+01  .2178E+01  48245.9
psi0= -.3000E-01 psi1= -.1000E-01 ksi0= .7500 ksi1= .7500

    n   iter  nftotl     infeas   lagr_viol  compl_viol        rho        eta   cputim
   36     49     311  .1909E-06  .7071E-09   .1559E-07  .1680E+01  .2658E+01       .1
  196    203    1992  .7441E-08  .3111E-10   .3694E-09  .3141E+01  .1311E+01      2.2
  484    523    6429  .3694E-06  .2634E-09   .7384E-07  .1146E+01  .1519E+01     21.4
  900    823   15340  .3005E-05  .5310E-09   .0000E+00  .4986E+00  .2674E+01     93.6
 1444   1802   27474  .8630E-07  .8985E-11   .2824E-07  .5707E+00  .4499E+01    304.5
 2916   2888   51187  .1374E-04  .9206E-09   .0000E+00  .5865E+00  .2362E+01   1204.3
 4900  13651  287998  .1411E-04  .9657E-10   .2743E-05  .9927E-01  .1199E+02  10812.6
 7396   9685  153185  .7884E-06  .2580E-10   .3651E-06  .6554E+00  .2734E+01   9650.4
10404  22031  407913  .7533E-04  .7104E-09   .3124E-04  .2686E+00  .6127E+01  33160.6
14884  21677  395068  .5339E-04  .1023E-08   .0000E+00  .8242E+00  .1484E+01  47955.2
The results show that the mechanism for selecting ρ and η always works properly. Of course the algorithm as implemented presently is not competitive, since Nash's code provides for the deletion of at most a single constraint from the working set per step. The figures below apply to the case N = 30 (n = 14884). They show the independent changes of ρ and η and the development of the three interesting error measures ‖A^T x − a‖, ‖Bx + b − Aλ − I_J μ_J‖ and x_J^T μ_J. The results are typical, other test cases producing quite the same qualitative behaviour.
(figure 1: development of η and ρ over the iterations; N=30, n=14884, psi0=-0.05, psi1=-0.01, ksi0=ksi1=.75)

(figure 2: development of infeasibility, Lagrangian violation and complementarity over the iterations; same run)
4 Conclusion

The general convex quadratic programming problem can be transformed into a convex quadratic programming problem with lower bound constraints only, using the method of this paper. This opens the possibility to solve these problems by means of the highly efficient methods known for the latter. An algorithm for choosing the necessary parameters automatically has been given. It can be incorporated in any solution method for the bound constrained problem without much programming effort. Preliminary numerical results show that it works well.
References
[1] Carpenter, T.J.; Shanno, D.F.: An interior point method for quadratic programs based on conjugate projected gradients. Comput. Optim. Appl. 2, (1993), 5-28
[2] Coleman, Th.F.; Hulbert, L.A.: A globally and superlinearly convergent algorithm for convex quadratic programs with simple bounds. SIOPT 3, (1993), 298-321
[3] Friedlander, A.; Martinez, J.M.; Santos, A.S.: On the resolution of linearly constrained convex minimization problems. SIOPT 4, (1994), 331-339
[4] Gill, Ph.E.; Murray, W.; Wright, M.: Practical methods of optimization. New York: Academic Press 1980
[5] Goldfarb, D.; Liu, S.: An O(n³L) primal interior point algorithm for convex quadratic programming. Math. Prog. 49, (1991), 325-340
[6] Gould, N.I.M.: An algorithm for large scale quadratic programming. IMA J. Numer. Anal. 11, (1991), 299-323
[7] Grippo, L.; Lucidi, S.: A differentiable exact penalty function for bound constrained quadratic programming problems. Optimization 22, (1991), 557-578
[8] Han, S.P.; Mangasarian, O.L.: A dual differentiable exact penalty function. Math. Prog. 25, (1983), 293-306
[9] Lin, Y.Y.; Pang, J.S.: Iterative methods for large convex quadratic programs: A survey. SIAM J. Control and Optimization 25, (1987), 383-411
[10] Lucidi, S.: New results on a class of exact augmented Lagrangians. J.O.T.A. 58, (1988), 259-282
[11] Mangasarian, O.L.: Sparsity preserving SOR-algorithms for separable quadratic and linear programming. Comput. Oper. Res. 11, (1984), 105-112
[12] Moré, J.J.; Toraldo, G.: On the solution of large quadratic programming problems with bound constraints. SIOPT 1, (1991), 93-113
[13] Moré, J.J.; Toraldo, G.: Algorithms for bound constrained quadratic programming problems. Numer. Math. 55, (1989), 377-400
[14] Nash, S.G.: Newton-type minimization via the Lanczos method. SINUM 21, (1984), 770-788
[15] Di Pillo, G.; Grippo, L.: An exact penalty method with global convergence properties for nonlinear programming problems. Math. Prog. 36, (1986), 1-18
[16] Yang, E.K.; Tolle, J.W.: A class of methods for solving large convex quadratic programs subject to box constraints. Math. Prog. 51, (1991), 223-228
[17] Zou, X., et al.: Numerical experience with limited-memory quasi-Newton and truncated Newton methods. SIOPT 3, (1993), 582-608