MINIMIZATION METHODS WITH CONSTRAINTS

B. T. Polyak                                                          UDC 518:519

The present survey deals with numerical methods of solving extremum problems with constraints, i.e., mathematical programming problems. The survey encompasses the period beginning with the discovery of the theory of mathematical programming (early 1950's) to the present day.

1. INTRODUCTION
The general problem of mathematical programming may be formulated in the following way:

min f(x),  (1)
g(x) ≤ 0.

The generalized gradient method uses at each step a vector ∂f(x^k) satisfying f(x) ≥ (∂f(x^k), x − x^k) + f(x^k) (for all x). Such a method was first proposed by N. Z. Shor and studied by Yu. M. Ermol'ev and B. T. Polyak [59, 60, 64, 107, 110, 152, 153]. N. Z. Shor also proposed methods for accelerating the convergence of the generalized gradient method [154-156]. Other minimization methods for nondifferentiable functions use the concept of piecewise-linear approximation and require the solution of auxiliary linear programming type problems [89, 110, 204, 248]. Special methods for minimization problems of nondifferentiable functions of the type f(x) = max_{y∈Q} φ(x, y) will be discussed in Sec. 11.

Let us now pass to a second important class of optimization problems, called linear programming problems. The first results in this field are due to L. V. Kantorovich [76] and date from the late 1930's, though the theory and methods of linear programming did not begin to be intensively developed until the 1950's, with the works of Dantzig, von Neumann, Kuhn, Tucker, and others. Many original and translated monographs and textbooks on linear programming have been published in Russian, including Gass [24], E. G. Gol'shtein and D. B. Yudin [28], Dantzig [38], Zoutendijk [67], S. I. Zukhovitskii and L. I. Avdeeva [68], Karlin [82], and D. B. Yudin and E. G. Gol'shtein [159, 160]. The simplex method, also called the method of successive improvement of a plan, is a fundamental method for solving linear programming problems. In idea it is close to the Gauss elimination method (more exactly, the Jordan method) in linear algebra and yields an exact solution of a problem in a finite number of steps, assuming the absence of computational errors.
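The generalized (sub)gradient iteration mentioned above admits a very short implementation; the following is a minimal sketch (the l1-norm test function, the step rule γ_k = 1/k, and the iteration count are hypothetical choices, not taken from the survey).

```python
import numpy as np

def subgradient_descent(f, subgrad, x0, steps=2000):
    """Generalized gradient method for a (possibly nondifferentiable)
    convex f: x_{k+1} = x_k - gamma_k * g_k, with gamma_k -> 0 and
    sum(gamma_k) = infinity (here gamma_k = 1/k)."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(1, steps + 1):
        g = subgrad(x)
        x = x - (1.0 / k) * g      # divergent-series step lengths
        if f(x) < best_f:          # the iterates need not decrease monotonically
            best_x, best_f = x.copy(), f(x)
    return best_x, best_f

# Hypothetical example: f(x) = ||x - a||_1, minimized at a.
a = np.array([1.0, -2.0, 0.5])
f = lambda x: np.abs(x - a).sum()
subgrad = lambda x: np.sign(x - a)   # a subgradient of the l1-norm
x_star, f_star = subgradient_descent(f, subgrad, np.zeros(3))
```

Because the function value need not decrease at every step, the sketch keeps track of the best point seen so far, which is the quantity for which convergence is actually proved.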
Actual practice has shown that the number of simplex steps is remarkably small (on the order of 2m-3m, where m is the number of constraints). There exist different computational schemes for the simplex method (direct algorithm, algorithm with inverse matrix, multiplicative algorithm) and also different modifications of it (dual method, a method of simultaneous solution of the direct and dual problems). The general method is sometimes substantially simplified for particular problems of linear programming, for example, for the transportation problem. Iterative methods of linear programming have been developed to a significantly lesser extent. A survey of certain early studies in this field can be found in [28]. New iterative methods have recently appeared, some of which may be competitive with finite methods [7, 13, 21, 49, 57, 111, 116, 126, 127, 138, 144-146, 180, 226, 227, 235, 256, 281].

Still another class of problems for which there exist finite methods of solution are quadratic programming problems. Here different algorithms belonging to the simplex-type methods, which lead to a solution in a finite number of steps, have been developed. A survey of this class of problems can be found in Kunzi and Krelle [88] and Boot [189] (cf. also [67]), and numerical experiments with different quadratic programming methods have been described in [194]. A simpler quadratic programming problem is often encountered in which the quadratic form is defined by a unit matrix: min ‖x − a‖², where Ax ≤ b. In this case the dual problem is simply to minimize a quadratic form on an orthant: min (1/2)‖A^T y − a‖², where y ≥ 0, and can be solved using special methods, such as the conjugate gradient method generalized to this case [23, 112].

3. MINIMIZATION ON SIMPLE SETS
In this section we will consider the problem

min f(x),  x ∈ Q ⊂ R^n,  (3)

where Q is a "simple" set (cf. Sec. 1). We will limit ourselves to situations when the structure of Q is not defined in detail, and where it is only important that simple minimization-type auxiliary problems for a linear or quadratic function be solvable on Q. The case of an explicit definition of Q by means of linear constraints, and methods using this structure of Q, will be treated below (Sec. 5). It turns out that many absolute minimization methods generalize to problems with simple constraints.
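For such "simple" sets the auxiliary operations are often available in closed form; a minimal sketch of a projection-type gradient method under these assumptions (the box Q, the quadratic test objective, and the step length are hypothetical choices for illustration):

```python
import numpy as np

def project_box(z, a, b):
    """Closed-form projection onto the simple set Q = {x : a <= x <= b}."""
    return np.minimum(np.maximum(z, a), b)

def projected_gradient(grad, x0, a, b, alpha=0.1, steps=500):
    """Gradient projection iteration x_{k+1} = P_Q(x_k - alpha * grad f(x_k))."""
    x = project_box(np.asarray(x0, float), a, b)
    for _ in range(steps):
        x = project_box(x - alpha * grad(x), a, b)
    return x

# Hypothetical example: minimize ||x - c||^2 over the box [0, 1]^3 with c
# partly outside the box; the minimizer is the projection of c onto the box.
c = np.array([2.0, 0.5, -1.0])
grad = lambda x: 2.0 * (x - c)
x_star = projected_gradient(grad, np.zeros(3), 0.0, 1.0)
```

The point of the sketch is that the only operation beyond an ordinary gradient step is one call to the (trivial) projection, which is exactly what makes such sets "simple."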
a. Gradient Methods. The first method for introducing constraints is to return at each step to the set Q by projection (gradient projection method): x^{k+1} = P_Q(x^k − α_k ∇f(x^k)). Here P_Q is the projection operator on Q, i.e., ‖P_Q(z) − z‖ = min_{x∈Q} ‖x − z‖. For a number of cases it is possible to describe explicitly the form of the projection operator. For example, if Q = {x ∈ R^n : x ≥ 0}, we have P_Q(z) = z_+ (z_+ is the vector with coordinates (z_1)_+, …, (z_n)_+, where (z_i)_+ = max{0, z_i}). If Q = {x ∈ R^n : a ≤ x ≤ b}, we have P_Q(z)_i = b_i if z_i > b_i, P_Q(z)_i = z_i if a_i ≤ z_i ≤ b_i, and P_Q(z)_i = a_i if z_i < a_i.

The modified (augmented) Lagrangian function

M(x, y) = f(x) + (y, h(x)) + (K/2)‖h(x)‖²,

where K > 0 is some numerical parameter, is the simplest example of such a modified function. Clearly, when K = 0 the modified function coincides with the ordinary Lagrangian function, and when y = 0, M(x, 0) is the same as in the penalty function method. The function M(x, y) is (under natural assumptions and for sufficiently large K) convex with respect to x in the neighborhood of (x*, y*), whereas L(x, y) lacks this property. Moreover, a minimum of M(x, y*) with respect to x is attained only at the solutions of the initial problem (a stability property not inherent to L(x, y)). This makes the modified Lagrangian function very suitable for formulating extremum conditions (for example, it is possible to formulate duality theorems for nonconvex problems by means of it) and also for developing numerical methods. The function M(x, y) was first introduced in the monograph of Arrow, Hurwicz, and Uzawa ([158], Chap. 11). Then, roughly simultaneously and independently, a number of authors (Hestenes [239], Powell [290], and Haarhoff and Buys [236]) proposed a different method based on the use of M(x, y):

M(x^{k+1}, y^k) = min_x M(x, y^k),   y^{k+1} = y^k + K h(x^{k+1}).

Powell [290] and B. T. Polyak and N. V. Tret'yakov [117] investigated convergence conditions for the method.
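The multiplier iteration above can be sketched in a few lines; in this illustration the inner minimization of M(·, y^k) is done by plain gradient descent, and the equality-constrained quadratic test problem, K, and the step sizes are hypothetical choices, not taken from the survey.

```python
import numpy as np

def augmented_lagrangian(f_grad, h, h_jac, x0, K=10.0, outer=20, inner=500, lr=0.01):
    """Method of multipliers for min f(x) s.t. h(x) = 0, using
    M(x, y) = f(x) + (y, h(x)) + (K/2)||h(x)||^2.
    Inner step: approximately minimize M(., y) in x by gradient descent;
    outer step: multiplier update y <- y + K * h(x)."""
    x = np.asarray(x0, float)
    y = np.zeros(len(h(x)))
    for _ in range(outer):
        for _ in range(inner):
            # grad_x M(x, y) = grad f(x) + J_h(x)^T (y + K h(x))
            gM = f_grad(x) + h_jac(x).T @ (y + K * h(x))
            x = x - lr * gM
        y = y + K * h(x)
    return x, y

# Hypothetical test: min x1^2 + x2^2  s.t.  x1 + x2 = 1  ->  x* = (1/2, 1/2), y* = -1.
f_grad = lambda x: 2.0 * x
h = lambda x: np.array([x[0] + x[1] - 1.0])
h_jac = lambda x: np.array([[1.0, 1.0]])
x_star, y_star = augmented_lagrangian(f_grad, h, h_jac, np.zeros(2))
```

Note that, unlike a pure penalty method, K stays fixed here; it is the multiplier update that drives the constraint residual to zero.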
The convergence conditions turned out to be substantially less restrictive than for methods based on the minimization of L(x, y) (strong convexity-type conditions are not required), and the method converges at the rate of a geometric progression whose ratio is smaller the greater K is. The method has been given different names by various authors because of different interpretations, including the method of multipliers, the displaced penalty method, the auxiliary parabola method, and the penalty estimate method. Several similar methods have been considered by Fletcher [219, 220] and Martensson [266]. Computational experiments have also been described [277, 311].

5. PROBLEMS WITH LINEAR CONSTRAINTS
Problems with linear constraints, i.e., of the form

min f(x),  (d_i, x) ≤ b_i,  i = 1, …, m,  (6)

can be treated by special methods. First of all, to solve Eq. (6) it is possible to use all the methods described in Sec. 3. They reduce to a sequence of linear or quadratic programming problems that can be solved under the initial constraints. Here we will describe a different type of method, generally associated with auxiliary problems of lesser dimension that can be solved on separate faces of the polyhedron defined by the constraints. Nearly all these methods are feasible direction methods, in terms of Zoutendijk's classification [67]. In other words, all the iterations x^1, …, x^k satisfy the constraints, while the direction of motion is selected such that a step along it does not immediately violate the constraints and the function to be minimized decreases along it.

a. Method of Coordinate-by-Coordinate Descent. If the constraints have the form a ≤ x ≤ b, minimization can be carried out successively with respect to each coordinate within its bounds; the direction-finding auxiliary problem involves only the ε-active constraints (those within ε of being satisfied as equalities), where ε > 0 is a given parameter.
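The coordinate-by-coordinate descent just described can be sketched as follows for a convex quadratic objective, where the exact one-dimensional minimizer in each coordinate is available in closed form and is simply clipped to the bounds (the quadratic form and the test data are hypothetical illustrations).

```python
import numpy as np

def coordinate_descent_box(Q, c, a, b, x0, sweeps=100):
    """Coordinate-by-coordinate descent for
    min (1/2) x'Qx + c'x  s.t.  a <= x <= b,
    with Q positive definite: exact 1-D minimization per coordinate,
    clipped to [a_i, b_i]."""
    x = np.asarray(x0, float)
    n = len(x)
    for _ in range(sweeps):
        for i in range(n):
            # unconstrained minimizer in coordinate i, the others held fixed
            xi = -(c[i] + Q[i] @ x - Q[i, i] * x[i]) / Q[i, i]
            x[i] = min(max(xi, a[i]), b[i])
    return x

# Hypothetical example: min ||x - t||^2 over [0, 1]^2 with t = (2, -0.3);
# here Q = 2I, c = -2t, and the answer is the clipped target (1, 0).
t = np.array([2.0, -0.3])
x_star = coordinate_descent_box(2 * np.eye(2), -2 * t, np.zeros(2), np.ones(2), np.zeros(2))
```

Each coordinate step stays feasible by construction, so the sketch is a feasible direction method in the sense described above.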
The solution of the auxiliary problem, p^k, serves for constructing a new approximation x^{k+1} = x^k + α_k p^k, where the length of the step 0 < α_k ≤ 1 is chosen so that the new point remains feasible. If the multipliers in the auxiliary problem correspond to the constraints with indices i ∈ I_k and j ∈ J_k, and its solution is denoted by u^k, v^k, then

p^k = −(∇f(x^k) + Σ_{i∈I_k} u_i^k ∇g_i(x^k) + Σ_{j∈J_k} v_j^k ∇h_j(x^k)),

the similarity of the Pshenichnyi method to that described in Sec. 5 for the simultaneous solution of the direct and dual problems then being evident. If the problem is convex, the Pshenichnyi method approaches the feasible direction method with normalization N1. Finally, if no inequality-type constraints occur, we have a projection-type method described in Sec. 4. A second method of solution, also using only first derivatives of the functions, is based on the concept of eliminating variables. It was presented in [162], where the Wolfe reduced gradient method was generalized to the case of general constraints.

b. Newton-Type Methods. If the calculation of the second derivatives of the functions occurring in the problem is simply accomplished, it is possible to use a different analog of Newton's method. All the constraints are linearized at x^k in this method, the Lagrangian function

L(x, u, v) = f(x) + Σ_{i=1}^m u_i g_i(x) + Σ_{j=1}^p v_j h_j(x)

is approximated by a quadratic function, and an auxiliary quadratic programming problem is solved, namely,

min [(∇f(x^k), x − x^k) + (1/2)(∇²_{xx}L(x^k, u^k, v^k)(x − x^k), x − x^k)],
g_i(x^k) + (∇g_i(x^k), x − x^k) ≤ 0,  i = 1, …, m.

The optimal value function V_k then satisfies the recurrence relation (dynamic programming equation)

V_{k−1}(x_{k−1}) = min_{x_k∈Q_k} [f_{k−1}(x_{k−1}, x_k) + V_k(x_k)].

The application of various approximate (and, in separate cases, exact) methods for successively solving such recurrence equations makes it possible to obtain a solution of the initial problem. Some new variants of the dynamic programming method (for continuous problems) have been described by Jacobsen and Mayne [245].

c. Resource Distribution Problems. An optimization problem in which the function to be minimized and the constraints are separable,

min Σ_{k=1}^n f_k(x_k),
Σ_k g_{ik}(x_k) ≤ a_i,  i = 1, …, m,   Σ_k h_{jk}(x_k) = b_j,  j = 1, …, p,

is usually called an optimal resource distribution problem. Some optimization methods are particularly effective for these problems. Since the Lagrangian function in this case is separable in x, its minimization reduces to one-dimensional problems. Therefore, methods based on minimizing the Lagrangian function are extraordinarily simplified.
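The reduction of the separable Lagrangian to one-dimensional problems can be sketched concretely; the quadratic terms f_k(x_k) = (x_k − c_k)², the single coupling constraint, and the bisection tolerance below are hypothetical choices for illustration.

```python
import numpy as np

def allocate(c, total=1.0, tol=1e-10):
    """Resource distribution for min sum_k (x_k - c_k)^2
    s.t. sum_k x_k = total, x_k >= 0.
    The Lagrangian is separable: minimizing (x_k - c_k)^2 + y*x_k over
    x_k >= 0 gives the one-dimensional solution x_k(y) = max(0, c_k - y/2);
    the multiplier y is then found by bisection so that sum_k x_k(y) = total."""
    x_of_y = lambda y: np.maximum(0.0, c - y / 2.0)
    lo, hi = -2 * np.abs(c).max() - 2 * total, 2 * np.abs(c).max() + 2 * total
    while hi - lo > tol:            # sum_k x_k(y) is nonincreasing in y
        mid = 0.5 * (lo + hi)
        if x_of_y(mid).sum() > total:
            lo = mid
        else:
            hi = mid
    return x_of_y(0.5 * (lo + hi))

# Hypothetical example: c = (0.9, 0.6, -0.5); the coordinate with
# negative c ends up at its bound 0.
x = allocate(np.array([0.9, 0.6, -0.5]))
```

However many variables the problem has, the search is over the single scalar y, which is exactly the simplification the separable structure provides.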
In particular, in the simplest problem

min Σ_{k=1}^n f_k(x_k),
Σ_{k=1}^n x_k = 1,  x_k ≥ 0,  k = 1, …, n,

the solution is reduced to selecting a number y such that Σ_{k=1}^n x_k(y) = 1, where x_k(y) is the point that minimizes f_k(x_k) + y x_k with respect to x_k ≥ 0. It is also possible to use dynamic programming-type methods when m and p are of low dimension. A survey of methods of solving optimal resource distribution problems has been given by L. S. Gurin, Ya. S. Dymarskii, and D. A. Merkulov [32].

d. Geometric Programming.
If a mathematical programming problem has the form

min g_0(x),
g_i(x) ≤ 1,  i = 1, …, m,

where g_0, g_1, …, g_m are posynomials, i.e., expressions of the form g(x) = Σ_{j=1}^r c_j x_1^{a_{1j}} ⋯ x_n^{a_{nj}}, it is called a geometric programming problem. There exist special methods of solution for such problems, based on the fact that the problem becomes convex following substitutions of the form x_k = e^{z_k}, and that the problem dual to it possesses linear constraints (Duffin, Peterson, and Zener [39]).

e. Block Problems. The basic concept of decomposition methods is the same as for the dynamic programming method, namely, to reduce an initial problem of higher dimension to the solution of a sequence of problems, each of lesser dimension. Methods of solving block problems of linear programming have been highly developed [19, 28, 37, 61, 126, 127, 210, 235, 256]. Block problems of convex programming are of the same form as resource distribution problems, with the difference that the x_i are not scalars but vectors, and constraints of the type x_i ∈ Q_i ⊂ R^{n_i} are imposed on them. Decomposition methods are based on the same concepts as methods of solving separable problems [106, 241].

f. Composite Functionals. Suppose the function to be minimized has the form f(x) = φ(R(x)), where φ(z), z ∈ R^r, is a continuous function and R : R^n → R^r is a nonlinear operator. Then, in addition to general methods for minimizing f(x) on a given set Q, special methods exist, intermediate between Newton's method and the gradient method, based on a quadratic approximation of φ(z) and a linear approximation of R(x). For example, at the k-th step it is possible to solve the auxiliary problem

min_{x∈Q} [(∇R(x^k)^T ∇φ(R(x^k)), x − x^k) + (1/2)(∇R(x^k)^T ∇²φ(R(x^k)) ∇R(x^k)(x − x^k), x − x^k)]

and to take its solution as the x^{k+1}.
If Q = R^n and φ(z) = ‖z‖², this method coincides with the well-known Gauss-Newton method for minimizing a sum of squares. Other variants of the method for problems without constraints can be found in Daniel [209], and problems with constraints have also been considered [115, 185, 304]. Many results of numerical experiments on minimizing composite functions have been presented by Bard [176].

g. Extremum Problems on Graphs. A special class of problems are those related to optimizing networks, for example, the transportation problem in a network formulation. It is usually unsuitable to reduce them to general mathematical programming problems, since their structure would not then be used. The reader who wishes to become acquainted with extremum problems on graphs and methods for solving them is referred to the monograph of Yu. M. Ermol'ev and I. M. Mel'nik [62]. We mention still other special types of mathematical programming problems. These include fractional linear programming and parametric programming [24, 28], the generalized linear programming problem [38, 210], linear dynamic programming [121], optimization problems for multiply connected systems [97], and others.
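The Gauss-Newton special case mentioned above (Q = R^n, φ(z) = ‖z‖²) can be sketched as follows; the exponential curve-fitting residuals, the starting point, and the iteration count are hypothetical choices for illustration.

```python
import numpy as np

def gauss_newton(r, jac, x0, iters=20):
    """Gauss-Newton method for min ||r(x)||^2: linearize the operator
    r at x_k and solve the resulting linear least-squares problem
    for the step (the phi(z) = ||z||^2 case of the composite scheme)."""
    x = np.asarray(x0, float)
    for _ in range(iters):
        J, res = jac(x), r(x)
        step, *_ = np.linalg.lstsq(J, -res, rcond=None)
        x = x + step
    return x

# Hypothetical fitting problem: residuals r_i(x) = x0 * exp(x1 * t_i) - y_i,
# with data generated exactly by (x0, x1) = (2, -1.5).
t = np.linspace(0.0, 1.0, 6)
y = 2.0 * np.exp(-1.5 * t)
r = lambda x: x[0] * np.exp(x[1] * t) - y
jac = lambda x: np.column_stack([np.exp(x[1] * t), x[0] * t * np.exp(x[1] * t)])
x_fit = gauss_newton(r, jac, np.array([1.0, -1.0]))
```

Only first derivatives of R enter the sketch; the second-derivative information of f is supplied implicitly by the quadratic φ, which is what places the method between the gradient and Newton methods.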
11. RELATED PROBLEMS
Here we will treat a number of problems which, though not formulated directly as extremum problems, are closely related to them.

a) Solution of Inequalities and Finding the Common Point of Convex Sets. The problem of finding the solution of a system of convex inequalities g_i(x) ≤ 0, where i = 1, …, m, can be considered as a particular case of a convex programming problem in which the function to be minimized is a constant. On the other hand, this problem reduces to the absolute minimization of a nondifferentiable function, for example, f(x) = Σ_{i=1}^m g_i(x)_+. However, there also exist specific methods for solving inequalities, which we now consider. We begin with the general problem of finding a common point of a system of convex sets Q_i ⊂ R^n, where i = 1, …, m. The successive projection method consists in projecting in turn on each of the sets, i.e., a sequence x^1 = P_{Q_1}(x^0), x^2 = P_{Q_2}(x^1), …, x^m = P_{Q_m}(x^{m−1}), … is constructed. In a second variant of the method, projection is performed at each step onto the most distant set. It is possible to prove that x^k → x* ∈ Q = ∩_{i=1}^m Q_i. This method was proposed in 1937 by Kaczmarz for the case of Q_i a hyperplane (i.e., for solving a system of linear equations), and by Agmon [165] for linear inequalities, while Motzkin and Schoenberg [279] extended it in 1954 to the case of arbitrary sets. Different modifications and generalizations of the method, as well as its rate of convergence, have been investigated by L. M. Bregman [9], L. G. Gurin, B. T. Polyak, and E. V. Raik [31], I. I. Eremin [51], A. I. Lobyrev [93], V. A. Yakubovich [161], and Tompkins [308]. In particular, it has been proved [31, 107, 161] that if Q has inner points and a step is made in the direction from x^k to P_i(x^k), but of length γ_k, where γ_k → 0 and Σ_{k=1}^∞ γ_k = ∞, the method leads to a solution in a finite number of steps. Other finite variants of the method have also been presented [279, 161]. If all the Q_i are half-spaces, i.e., Q_i = {x : g_i(x)
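The successive projection scheme described above can be sketched in a few lines; the two sets below (a hyperplane, corresponding to the Kaczmarz case, and the nonnegative orthant) and the sweep count are hypothetical choices for illustration.

```python
import numpy as np

def successive_projections(projections, x0, sweeps=200):
    """Find a common point of convex sets Q_1, ..., Q_m by projecting
    onto each set in turn: x <- P_{Q_1}(x), x <- P_{Q_2}(x), ..."""
    x = np.asarray(x0, float)
    for _ in range(sweeps):
        for P in projections:
            x = P(x)
    return x

# Hypothetical example: Q1 = {x : x1 + x2 = 1} (a hyperplane) and
# Q2 = {x : x >= 0} (the nonnegative orthant); both projections are closed-form.
P1 = lambda x: x - (x[0] + x[1] - 1.0) / 2.0 * np.array([1.0, 1.0])
P2 = lambda x: np.maximum(x, 0.0)
x_common = successive_projections([P1, P2], np.array([-3.0, 0.5]))
```

The limit point lies in the intersection of the sets, in agreement with the convergence statement x^k → x* ∈ ∩ Q_i above.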