Mathematical Programming 26 (1983) 1-20 North-Holland Publishing Company
LARGE-SCALE LINEAR PROGRAMMING: GEOMETRY, WORKING BASES AND FACTORIZATIONS*

Michael J. TODD

School of Operations Research and Industrial Engineering, Cornell University, Ithaca, NY, U.S.A.

Received 4 February 1981
Revised manuscript received 19 April 1982

* This research was initiated while the author was visiting D.A.M.T.P., University of Cambridge, England, and was supported by fellowships from the John Simon Guggenheim Foundation and the Alfred P. Sloan Foundation and by National Science Foundation grant ECS-7921279.
This paper is concerned with linear programming problems in which many of the constraints are handled implicitly by requiring that the vector of decision variables lie in a polyhedron X. It is shown that the simplex method can be implemented using a working basis whose size is the number of explicit constraints as long as the local structure of X around the current point is known. Various ways of describing this local structure lead to known implementations when X is defined by generalized or variable upper bounds or flow conservation constraints. In the general case a decomposition principle can be used to generate this local structure. We also show how to update factorizations of the working basis.
Key words: Large-Scale Linear Programming, Geometry, Compact Bases, Matrix Factorizations.
1. Introduction

We are concerned in this paper with the linear programming problem

    min   g^T x,
          Ax = h,      (1)
          x ∈ X,
where A is an m × n real matrix, x, g and h are vectors of appropriate dimension and X ⊆ R^n is a polyhedron. We think of the equations in Ax = h as complicating constraints, perhaps few in number, whereas X is a polyhedron of simple structure, although it may be described by a large number of constraints. Without loss of generality, we assume that A has rank m. Problems of this nature abound in applications; for example X may be the set of vectors x satisfying generalized and/or variable upper bounds, or the set of feasible flows in a network or a multicommodity network. It is also helpful to bear in mind the trivial example where X = R^n_+.

Our aim is to show how the simplex method for problem (1) can be implemented in a way that treats the constraints of x ∈ X implicitly. Our approach is similar to algebraic treatments of (1) using partitioned bases; however we view
the problem from a geometrical perspective that allows greater flexibility. We show in Section 2 that an extreme point of the feasible region of (1) lies in a face of X of dimension at most m. It is then of interest to define directions of motion from this extreme point that either keep us in this face or move into a face of dimension one higher. Such directions form direction matrices P and Q. Section 2 shows that knowledge of P and Q allows us to check the optimality of the current extreme point and, if it is not attained, proceed to an adjacent extreme point with strictly improved objective function value. This analysis requires the solution of linear systems whose coefficient matrix is the 'working basis' AP. An interesting point is that the polyhedron X may be massively degenerate without causing problems, as long as suitable matrices P and Q can be generated.

Section 3 examines some special cases where P and Q can be obtained explicitly. These cases correspond to variable and generalized upper bounds and network polyhedra. In these cases the algorithm simplifies into previously known implementations for the particular structure. We also consider a general case in which P and Q are generated from the constraints of X in a particular way that leads to the partitioned inverse of a full basis. In Section 4 we discuss how the various quantities required in our implementation, i.e., the direction matrices P and Q and a representation of the inverse of the working basis AP, can be updated from one iteration to the next in these cases. We consider a triangular factorization of AP.

A classical way to treat problem (1) is by the price-directive decomposition principle of Dantzig and Wolfe [12]. As we have stated, our approach is merely an implementation of the simplex method; however, our viewpoint permits a geometrical comparison of these two approaches, which we describe in Section 5. Dantzig-Wolfe decomposition motivates our consideration of a form of our method in which the requisite knowledge of X is acquired as needed, rather than being obtained explicitly. In Section 6 we show that our method can be implemented without explicit knowledge of Q by solving a subproblem at each iteration to generate appropriate information concerning X. We call this the 'cone subproblem', as we require an extreme ray of a certain cone related to X rather than an extreme point of X. Besides the information generated by the subproblem, it is necessary to update the direction matrix P. Two ways in which this can be accomplished are presented in Section 7. The method chosen depends on the way the representation of the working basis AP is maintained -- our methods assume that AP is factorized either as LU or as LZ, where L and U are lower and upper triangular respectively and Z is orthogonal.

This paper grew out of the author's treatment [37] of the variable upper bounded problem. In that special case, one of the main contributions was the handling of massive degeneracy. In the present more general setting the algorithm can be seen more clearly in relation to a large body of literature on large-scale programming problems. As we have noted above, there is a very close
relationship to compact inverse methods that have been developed for generalized (variable) upper bounds [11, 26, 35, 8], network-related problems [32, 21, 27, 19, 25, 2, 24, 20] and in a general setting [18, 29]. Our contribution here is to give a more geometrical viewpoint of these methods. There is also a close relationship to a class of methods for nonlinear programming with linear constraints (in which the optimal solution need not lie at a vertex). These methods use direction matrices describing possible motion within the current face -- see Gill and Murray [15] and Murtagh and Saunders [30]. One difference is that in the nonlinear programming context the dimension of the face varies, whereas we are always concerned with m- and (m+1)-faces. However, our Section 5 is clearly analogous to the updating schemes that have been developed in this area, though attuned more to implicit handling of X.
2. Extreme points and optimality

Let V denote the feasible region {x ∈ X: Ax = h} of problem (1), and let x̄ be a given point in V. In this section we show how to determine whether x̄ is extreme in V, and, if so, whether it is optimal, given local information concerning X around x̄. To clarify our results we will frequently refer to the trivial case in which X = R^n_+; the next sections consider more important examples.

First we make precise what we mean by local information about X. We recall that a face of X is either ∅ or X or the intersection of X with a supporting hyperplane. It is called a k-face if its dimension, i.e., the dimension of its affine hull, is k.
Definition 2.1. Suppose x̄ lies in the relative interior of the k-face F of X. We say (P, Q) is a pair of direction matrices for X at x̄ if P is n × k with rank k, Q is n × l, and

    cone(X − {x̄}) ≡ {λ(x − x̄): λ ≥ 0, x ∈ X} = {Py + Qz: z ≥ 0},

and for each column q of Q and sufficiently small positive ε, x̄ + εq lies in a (k+1)-face of X.

Thus if x̄ is a vertex of X, k is 0 and P is a null matrix. The columns of Q are then the edge-vectors of the edges of X incident on x̄. If x̄ lies in the interior of X, k = n, P is n × n nonsingular and Q is a null matrix. If x̄ lies in a k-face F of X with 0 < k < dim X, then the columns of P are a basis for span(F − {x̄}), while each column of Q is a vector which, together with the columns of P, forms a basis for span(G − {x̄}), where G is a (k+1)-face of X containing F. Moreover, the columns are chosen in feasible directions. The final requirement of Definition 2.1, that x̄ + εq lies in a (k+1)-face of X, is crucial; it ensures that our algorithm progresses from extreme point to extreme point.
Let us consider the trivial example where X = R^n_+. Suppose the first k components of x̄ are positive while the rest are zero. Then x̄ lies in the relative interior of the face F = {x ∈ R^n: x_i ≥ 0 for i = 1, 2, ..., k, x_i = 0 for i = k+1, ..., n} of X. We may take

    P = [ I ]      and      Q = [ 0 ]
        [ 0 ]                   [ I ],

where the partitions are into k rows and n − k rows. Of course, P and Q are not unique; we may replace the k × k identity in P by any nonsingular matrix, the k × (n − k) zero matrix in Q by any matrix, and the (n − k) × (n − k) identity in Q by any diagonal matrix with positive diagonal.
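(As a concrete illustration of Definition 2.1 in this trivial case, the following sketch -- ours, not the paper's -- builds such a pair (P, Q) from the support of x̄, assuming numpy; the unit columns chosen are just one of the many valid pairs noted above.)

```python
import numpy as np

def direction_matrices_orthant(x_bar):
    """A pair of direction matrices (P, Q) for X = R^n_+ at x_bar.

    P's columns span the face of R^n_+ whose relative interior contains
    x_bar; each column of Q increases one zero component, i.e., moves
    into an adjacent (k+1)-face, as Definition 2.1 requires."""
    x_bar = np.asarray(x_bar, dtype=float)
    n = x_bar.size
    pos = x_bar > 0                    # support of x_bar; k = |support|
    k = int(pos.sum())
    P = np.zeros((n, k))
    P[pos, :] = np.eye(k)              # unit directions within the face F
    Q = np.zeros((n, n - k))
    Q[~pos, :] = np.eye(n - k)         # unit edge directions out of the face
    return P, Q
```

For x̄ = (2, 1, 0, 0) this returns P = [e₁, e₂] and Q = [e₃, e₄], agreeing with the display above.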
For the rest of this section, x̄ is a given point in V and (P, Q) is a pair of direction matrices for X at x̄. The theorems that follow relate the status of x̄ in problem (1) to the status of the origin in the following 'local problem', obtained by substituting x = x̄ + Py + Qz in (1):

    min   g^T P y + g^T Q z,
          A P y + A Q z = 0,      (2)
          z ≥ 0.

Let W denote the feasible region {(y, z): APy + AQz = 0, z ≥ 0} of (2).
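(A minimal sketch, again ours and assuming numpy arrays, of assembling the data of the local problem (2); in the orthant example with the (P, Q) above, A @ P and A @ Q are exactly the classical partition [B, N] discussed after Corollary 2.5 below.)

```python
import numpy as np

def local_problem_data(A, g, P, Q):
    """Coefficients of the local problem (2) at x_bar:

        min  cP @ y + cQ @ z   s.t.  AP @ y + AQ @ z = 0,  z >= 0."""
    A, g = np.asarray(A), np.asarray(g)
    AP = A @ P        # the 'working basis'
    AQ = A @ Q
    cP = g @ P        # objective row g^T P
    cQ = g @ Q        # objective row g^T Q
    return AP, AQ, cP, cQ
```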
Theorem 2.2. x̄ is an extreme point of V iff AP has full column rank.

Proof. Suppose first that AP does not have full column rank. Then there exists ȳ ≠ 0 with APȳ = 0. Thus for all real λ, A(x̄ + λPȳ) = Ax̄ = h. Moreover, since cone(X − {x̄}) = {Py + Qz: z ≥ 0}, x̄ + λPȳ lies in X for |λ| sufficiently small. Thus x̄ = (1/2)(x̄ + λPȳ) + (1/2)(x̄ − λPȳ) is not extreme in V.

Conversely, assume x̄ is not extreme in V, so that it can be written as (1/2)x^1 + (1/2)x^2 with x̄ ≠ x^i ∈ V, i = 1, 2. We may write x^i = x̄ + Py^i + Qz^i with z^i ≥ 0, i = 1, 2. We now aim to show that x^i = x̄ + Py^i, i = 1, 2. Suppose x̄ lies in the relative interior of the k-face F of X. If k = dim X, Q is null and so x^i = x̄ + Py^i as desired. Suppose now k < dim X. Since x̄ = (1/2)x^1 + (1/2)x^2, we obtain P(y^1 + y^2) + Q(z^1 + z^2) = 0. Now there is a supporting hyperplane to X that meets X in F. Let u be a normal to this hyperplane, with its direction chosen so that u^T x ≥ u^T x̄ for all x ∈ X. Then u^T P = 0 and u^T Q > 0, whence (u^T Q)(z^1 + z^2) = 0. But since z^1, z^2 ≥ 0, this implies that z^1 = z^2 = 0. Hence in either case we have x^i = x̄ + Py^i, i = 1, 2. Now y^1 ≠ 0, and since x̄ + Py^1 ∈ V, we find Ax̄ + APy^1 = h = Ax̄, so that APy^1 = 0. Thus AP does not have full column rank.
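(Numerically, the criterion of Theorem 2.2 is a rank test on the m × k working basis; a one-function sketch, assuming numpy, with a tolerance that is our choice, not the paper's:)

```python
import numpy as np

def is_extreme(A, P, tol=1e-10):
    """x_bar is extreme in V iff AP has full column rank (Theorem 2.2)."""
    AP = np.asarray(A) @ np.asarray(P)
    return np.linalg.matrix_rank(AP, tol=tol) == AP.shape[1]
```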
Corollary 2.3. x̄ is an extreme point of V iff (0, 0) is an extreme point of W.

Proof. Trivial.

Corollary 2.4. Any extreme point of V lies in a face of X of dimension at most m.
Proof. AP has m rows and full column rank if x̄ is extreme. Thus it has at most m columns.

Henceforth we make the following assumption.
Nondegeneracy Assumption. The affine set {x ∈ R^n: Ax = h} meets no face of X of dimension less than m.

Since the affine set has dimension n − m, this assumption is true for almost all choices of h and can be assured by suitable perturbations. It is far weaker than an assumption of nondegeneracy on (1), since it allows degeneracies in the (specially-structured) polyhedron X -- an example is variable upper bounds, which we discuss in Section 3.4. In the presence of the assumption, we obtain the following.

Corollary 2.5. x̄ is an extreme point of V iff AP is square and nonsingular.

In the example where X = R^n_+, our nondegeneracy assumption reduces to the usual one that all nonnegative solutions to Ax = h have at least m positive components. In this example, let x̄, P and Q be as above with k = m, so that x̄_i > 0 for 1 ≤ i ≤ m. Then if A is partitioned into m and n − m columns as [B, N], we find AP = B and AQ = N, so that Corollary 2.5 expresses the classical result that x̄ is extreme iff B is nonsingular. The nondegeneracy assumption also yields the following proposition.
Proposition 2.6. If V has extreme points, then x̄ ∈ V is an extreme point precisely when it lies in a face of X of dimension m.

Proof. By Corollary 2.4, we need only show that if x̄ ∈ V lies in a face F of X of dimension m, it is extreme. Suppose to the contrary that x̄ = (1/2)x^1 + (1/2)x^2 with x^i ∈ V, i = 1, 2, and x^1 ≠ x^2. Let d = x^2 − x^1. If x̄ + λd ∈ V for all λ ∈ R, then d lies in the lineality space of V and V therefore has no extreme points. If on the other hand x̄ + λd ∉ V for some λ ∈ R, then for some λ̄ ∈ R, x̃ = x̄ + λ̄d lies in V but lies in a face of X of dimension less than m. This contradicts our nondegeneracy assumption.

Henceforth we assume V has extreme points. Next we characterize optimality. Let x̄ be an extreme point of V with P and Q as above.
Theorem 2.7. x̄ is optimal in (1) iff g^T Q − g^T P (AP)^{-1} AQ ≥ 0.

Proof. Suppose g^T Q − g^T P (AP)^{-1} AQ ≥ 0 and let x be an arbitrary point of V.
Then we may write x = x̄ + Py + Qz with z ≥ 0. Since Ax = h = Ax̄, APy + AQz = 0, so that y = −(AP)^{-1} AQ z. Thus we find

    g^T x = g^T x̄ + g^T P y + g^T Q z = g^T x̄ + (g^T Q − g^T P (AP)^{-1} AQ) z ≥ g^T x̄.      (3)
Hence x̄ is optimal. The converse follows from Theorem 2.9 below.

Corollary 2.8. x̄ is optimal in (1) iff (0, 0) is optimal in (2).

When X = R^n_+ and x̄, P, Q, B and N are as above, we may partition g^T as (g_B^T, g_N^T) in the usual fashion. The theorem then gives the usual optimality condition g_N^T − g_B^T B^{-1} N ≥ 0, since AP = B and AQ = N. This condition is necessary as well as sufficient since we have assumed nondegeneracy (in the standard sense in this case).

Theorem 2.9. Suppose g^T q − g^T P (AP)^{-1} A q < 0 for some column q of Q. Then x(λ) = x̄ + λ(q − P(AP)^{-1} Aq) lies in V with g^T x(λ) < g^T x̄ for sufficiently small positive λ. Moreover, if λ̄ = sup{λ ≥ 0: x(λ) ∈ V}, then either λ̄ = +∞ and (1) is unbounded or x̄' = x(λ̄) is an extreme point of V with g^T x̄' < g^T x̄.

Proof. That g^T x(λ) < g^T x̄ for all λ > 0 follows from (3) by taking z to be λ times
the appropriate unit vector. We may also write x(λ) as x̄ + P(−λ(AP)^{-1} Aq) + λq, from which it follows that x(λ) ∈ X for sufficiently small positive λ. Trivially Ax(λ) = Ax̄ = h, so the first part of the theorem is proved. If λ̄ = +∞ there is nothing further to prove, so assume λ̄ < ∞. We know from Definition 2.1 that x̄ + λq for sufficiently small positive λ lies in an (m+1)-face of X, say G, and G must contain x̄ and hence the m-face F. Then x(λ) = x̄ + P(−λ(AP)^{-1} Aq) + λq lies in G for sufficiently small positive λ, and hence for all 0 ...

... j + 1) is reduced to upper triangular form using standard techniques [3, 4]. The matrix T, and hence the representation of its inverse T^{-1}, are unchanged. This case corresponds to a nonkey column leaving the basis in an example with generalized upper bounds. Now suppose the first component of x to hit zero is x_{m+i}, 1 ≤ i ≤ r. (This corresponds, in the GUB case, to a key variable leaving the basis.) We compute the ith row of T^{-1}S, namely (e_i^T T^{-1}) S = η_i^T = (η_{i1}, ..., η_{im}). If η_i^T = 0 then we may directly switch the columns, and
thus P, AP, and hence L and U, are unchanged. There is a column exchange in T, so that the representation of T^{-1} is updated in a standard way, and Q is updated by post-multiplying by an elementary matrix analogous to that in (4) below -- with η_i^T replaced by ξ^T = e_i^T T^{-1} H -- so that we need consider this case no further. (In the GUB context, this case arises if a key variable leaves the basis, but no basic non-key variable belongs to the same GUB set.)

So let us assume that η_i^T ≠ 0; then as in the GUB case, we must first interchange the ith columns of C and T with, say, the kth columns of B and S before proceeding as in the first case. The interchange is possible iff η_{ik} ≠ 0. Again T is changed by a simple column exchange, and the representation for its inverse is updated by standard techniques. Let us examine the effects on P and AP. Since the face has not changed, the new direction matrix P̄ has columns that are linear combinations of those of P. However, since x_k has become a 'key' and x_{m+i} a 'non-key' column, the ith row of P̄ must become a unit row. Thus P̄ = PE, where
E is the m × m identity matrix with its kth row replaced by

    (−η_{i1}/η_{ik}, ..., −η_{i,k−1}/η_{ik}, 1/η_{ik}, −η_{i,k+1}/η_{ik}, ..., −η_{im}/η_{ik}),      (4)

the entry 1/η_{ik} occupying the kth (diagonal) position.
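(A sketch, assuming numpy and 0-based indexing, of forming E from the row η_i^T; the function name is ours:)

```python
import numpy as np

def eta_matrix(eta_row, k):
    """The matrix E of (4): the identity with its kth row replaced by
    (-eta_i1, ..., 1, ..., -eta_im) / eta_ik.  Requires eta_row[k] != 0."""
    m = len(eta_row)
    E = np.eye(m)
    E[k, :] = -np.asarray(eta_row, dtype=float) / eta_row[k]
    E[k, k] = 1.0 / eta_row[k]
    return E
```

E is upper triangular precisely when eta_row[l] == 0 for all l < k, which is the choice-of-k rule discussed next.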
Thus, with AP̄ = APE the new working basis, we have Ū ≡ L^{-1}(AP̄) = UE. If E is upper triangular, so is Ū, and no further eliminations are required. This will be the case if we choose for k the first index l with η_{il} ≠ 0. In the GUB case, this is precisely Tomlin's rule [38]. However, it is equally applicable in the general or network case. See also Kallio and Porteus [23]. In the network case, all η_{il}'s are 0 or ±1. Thus E has a very simple form, and there is no reason not to choose the first nonzero η_{il} to maintain the upper triangularity of U. (Alternatively, if triangular factorization is not used, the inverse of E can be added to the product file for the working basis inverse -- as in Maier [27], who uses the Markowitz scheme -- or the explicit inverse can be premultiplied (trivially) by E^{-1}, as in Chen and Saigal [7], for instance.)

In the general case, for reasons of numerical stability, we might prefer not to use the first nonzero η_{il} for η_{ik} if this is excessively small, e.g. if |η_{il}| > 100|η_{ik}| for some l. Thus suppose η_{ik} in (4) is not necessarily the first nonzero η_{il}. Then Ū = UE has the form
    Ū = [ Ū₁  Ū₂ ]
        [ 0   Ū₃ ],      (5)

where Ū₁, of size k × k, differs by a rank-one matrix from an upper triangular matrix and Ū₃ is upper triangular. Indeed, Ū₁ = Ũ₁ + u v^T, where Ũ₁ is the corresponding submatrix of U, u is the first k entries of the kth column of U, and v^T the first k entries of the non-unit row of E. We reduce Ū to upper triangular form in two steps, as follows. Let the first nonzero entry of v^T be v_l, l ≤ k. First we perform permutations and eliminations, forming L₁^{-1}, to reduce u to ū, where ū is zero below its lth entry, in the following order: first we (possibly) interchange the last two entries of u, then we eliminate its final entry; next we proceed similarly on the (k−2)nd and (k−1)st entries of u, and so on. The result is
    L₁^{-1} Ū = [ Û₁  Û₂ ]
                [ 0   Ū₃ ],      (6)

where Û₁ is the result of applying the operations comprising L₁^{-1} to Ū₁, and is upper Hessenberg, and Û₂ arises similarly from Ū₂. The second step is to reduce the upper Hessenberg matrix L₁^{-1}Ū to upper triangular form as in the standard Bartels-Golub scheme, using operations comprising L₂^{-1}. The operations in L₂^{-1}L₁^{-1} are added to L, and the resulting upper triangular matrix is the new U. Clearly there is a tradeoff here between the amount of computation (related to the size of k − l) and numerical stability (depending on the size of |η_{ik}| compared to the other entries in η_i^T).
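(The following dense sketch, assuming numpy, restores upper triangular form after the update Ū = UE. It performs ordinary Gaussian elimination with row interchanges over the whole matrix rather than the paper's two structured steps L₂^{-1}L₁^{-1}, so it is a functional stand-in for illustration, not the economical scheme described above.)

```python
import numpy as np

def refactor_working_basis(U, E):
    """Given upper triangular U and the eta matrix E of (4), factorize
    Ubar = U @ E by Gaussian elimination with partial pivoting
    (cf. Bartels-Golub [3, 4]).

    Returns (Lhat, Unew, perm) with (U @ E)[perm] == Lhat @ Unew,
    Lhat unit lower triangular and Unew upper triangular."""
    Ubar = U @ E
    m = Ubar.shape[0]
    perm = np.arange(m)
    Lhat = np.eye(m)
    Unew = Ubar.copy()
    for c in range(m - 1):
        p = c + int(np.argmax(np.abs(Unew[c:, c])))   # partial pivoting
        if p != c:                                    # row interchange
            Unew[[c, p]] = Unew[[p, c]]
            Lhat[[c, p], :c] = Lhat[[p, c], :c]
            perm[[c, p]] = perm[[p, c]]
        for r in range(c + 1, m):                     # eliminate below pivot
            if Unew[r, c] != 0.0:
                mult = Unew[r, c] / Unew[c, c]
                Unew[r, c:] -= mult * Unew[c, c:]
                Lhat[r, c] = mult
    return Lhat, Unew, perm
```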
5. A comparison with Dantzig-Wolfe decomposition

The decomposition principle of Dantzig and Wolfe ([12], [9, Chapter 23]) is probably the best known method of dealing with some of the constraints of a linear programming problem implicitly, generating information from them as needed. Here we compare this approach with the geometrical viewpoint of Section 2, thus leading into consideration of implicit generation of an appropriate column q of the direction matrix Q by the use of a subproblem in the next section. Again we consider problem (1), which we restate here:

    min   g^T x,
          Ax = h,      (1)
          x ∈ X.
Suppose for simplicity that X is a bounded polyhedron, with extreme points x^0, x^1, ..., x^r. Then any point in X can be written as a convex combination Σ_{j=0}^r λ_j x^j of these extreme points, and substitution in (1) leads to the so-called master problem:
    min   Σ_{j=0}^r λ_j (g^T x^j),

          Σ_{j=0}^r λ_j (A x^j) = h,

          Σ_{j=0}^r λ_j = 1,      (7)

          λ_j ≥ 0,   j = 0, 1, ..., r.
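(To make (7) concrete: a sketch, ours, that assembles and solves the master problem directly with scipy's linprog, assuming the extreme points of X are available explicitly -- the very enumeration decomposition is designed to avoid; it is illustrative only.)

```python
import numpy as np
from scipy.optimize import linprog

def solve_master(g, A, h, extreme_points):
    """Solve (7) given the extreme points x^0, ..., x^r of the bounded
    polyhedron X; returns the point x_bar of V and the weights lambda."""
    Xmat = np.column_stack(extreme_points)           # n x (r+1)
    c = g @ Xmat                                     # costs g^T x^j
    A_eq = np.vstack([A @ Xmat,                      # sum_j lambda_j A x^j = h
                      np.ones((1, Xmat.shape[1]))])  # sum_j lambda_j = 1
    b_eq = np.concatenate([np.asarray(h, dtype=float), [1.0]])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    assert res.success
    return Xmat @ res.x, res.x
```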
The Dantzig-Wolfe decomposition principle is to solve (7) by the revised simplex method, generating columns as necessary by solving a subproblem of optimizing a linear function over X. Assuming nondegeneracy, a basic solution to (7) involves exactly m + 1 positive λ_j's, say λ₀, λ₁, ..., λ_m. This solution corresponds to the point x̄ = Σ_{j=0}^m λ_j x^j of V. For the solution to be basic it is necessary that the x^j's, j = 0, ..., m, be affinely independent, so that F = conv{x^0, ..., x^m} has dimension m. However, F need not lie in an m-face of X even if it lies in the boundary of X. We will call any such set F an 'internal m-face'. It follows that x̄ may not be extreme in V. Let us assume however that x̄ is extreme and lies in the relative interior of F, which is a (true) m-face of X. Note that by solving for λ₀ in (7) we obtain
    min   Σ_{j=1}^r λ_j (g^T (x^j − x^0)) + g^T x^0,

          Σ_{j=1}^r λ_j (A(x^j − x^0)) = h − A x^0,

          Σ_{j=1}^r λ_j ≤ 1,      (8)

          λ_j ≥ 0,   j = 1, 2, ..., r.
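To see how (8) arises from (7), eliminate λ₀ via the substitution λ₀ = 1 − Σ_{j=1}^r λ_j: the objective Σ_{j=0}^r λ_j (g^T x^j) becomes g^T x^0 + Σ_{j=1}^r λ_j (g^T(x^j − x^0)), the coupling constraint becomes Σ_{j=1}^r λ_j (A(x^j − x^0)) = h − A x^0, and the discarded constraint λ₀ ≥ 0 becomes Σ_{j=1}^r λ_j ≤ 1.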
There is a close relationship between (8) and (2). The directions x^j − x^0, j = 1, 2, ..., m, are a basis for span(F − {x̄}), and thus could form the columns of the matrix P. Similarly, except for the requirement that the columns of Q be in some sense extreme (last sentence of Definition 2.1), the directions x^j − x^0, j = m+1, ..., r, could form the columns of Q. If λ₀, ..., λ_m are positive in the representation x̄ = Σ_{j=0}^m λ_j x^j, then the inequalities Σ_{j=1}^r λ_j ≤ 1 and λ_j ≥ 0, j = 1, 2, ..., m, are not binding at the current solution. Thus (8) is indeed similar to (2). However, if we make a basis change in (7) or (8), the effect is very different from that in our approach. Increasing λ_j corresponds to moving in the direction x^j − x^0, adjusting the weights on x^1 − x^0, ..., x^m − x^0 (i.e., moving the base point within F) as we proceed. However, as we noted above, since the directions x^j − x^0, j > m, are not chosen to be extreme, we may be led to a k-face of X, k > m + 1. Even if we moved as far as possible in this direction while remaining in X, we would in general penetrate the boundary of X in a (k − 1)-face; thus our
new solution would not be extreme in V if k > m + 1. However, since we in fact maintain feasibility in (7) or (8), we will generally not move as far as the boundary of X, but instead terminate with a point x̄' lying in an internal surface of X. Indeed, the simplex interpretation [9, pp. 160-166] is here very apparent; the point x̄ moves through V by traversing a sequence of (m+1)-simplices, with each successive pair meeting in an internal m-face. The computational inefficiency of the decomposition approach for many classes of problems is thus elucidated by the geometrical contrast between moving through X from (external) m-face to (external) m-face (as in the simplex method) and moving through X via a sequence of (m+1)-simplices meeting possibly in internal m-faces of X.

Of course, the beauty of Dantzig-Wolfe decomposition is the way it generates the required information about X only as needed, by solving a subproblem at each iteration. By contrast, our method seems to require much more knowledge of X to generate the direction matrices P and Q. We will show, however, in the next section how a suitable column q of Q (as required by Theorem 2.9) can be generated by solving a certain 'cone subproblem'. Finally, Section 7 will demonstrate how P and a factorization of AP can be initialized and then updated from iteration to iteration. Thus we will have an implicit method analogous to that of Dantzig-Wolfe, but implementing the simplex method on the original problem (1).

6. A new decomposition principle
Suppose we are given an extreme point x̄ of V = {x ∈ X: Ax = h}. We will suppose that an appropriate direction matrix P, and a factorization of the working basis AP, are available, but that the direction matrix Q is not. Let us define π^T by π^T AP = g^T P. Then, according to Theorems 2.7 and 2.9, we wish to know whether (g^T − π^T A) q ≥ 0 for each column q of Q; if this inequality fails for some q, we need to know any such q. In this section we show how these tasks can be performed without explicit knowledge of Q. Consider the problem

    min   (g^T − π^T A) q,   q ∈ cone(X − {x̄}).      (9)
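(A sketch, assuming numpy, of the data entering (9): the prices π from π^T AP = g^T P and the reduced-cost vector that prices q. Solving the cone subproblem itself depends on the structure of X, as the examples below indicate.)

```python
import numpy as np

def cone_subproblem_data(A, g, P):
    """Prices pi solving pi^T (A P) = g^T P, plus the reduced-cost vector
    rbar = g - A^T pi; x_bar is optimal iff rbar @ q >= 0 for all q in
    cone(X - {x_bar}) (Theorems 2.7 and 2.9)."""
    AP = A @ P                               # square, nonsingular (Cor. 2.5)
    pi = np.linalg.solve(AP.T, P.T @ g)      # transpose solve for pi
    rbar = g - A.T @ pi
    return pi, rbar
```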
If the optimal value is 0, x̄ is optimal. Otherwise, (9) is unbounded and we wish to find some q with (g^T − π^T A) q < 0 lying in a face of cone(X − {x̄}) of minimal dimension m + 1. (Note that the face of cone(X − {x̄}) of minimal dimension without the inequality is its lineality space, the column space of P; but (g^T − π^T A) P = 0.) We call the problem of determining whether the optimal value of (9) is zero, and, if not, finding such a q, the 'cone subproblem'. Given its solution, we can proceed with an iteration without further knowledge of Q. The cone subproblem may vary considerably in difficulty, depending on the structure of X. We present two examples.
First, let X = {x ∈ R^n: Rx ...