Cost approximation: A unified framework of descent algorithms for nonlinear programs*

Michael Patriksson†


March 31, 1995

Abstract. The paper describes and analyzes the cost approximation algorithm, a general framework of iterative descent algorithms for nonlinear programs and variational inequality problems. A common property of the methods included in the framework is that their search direction finding subproblems may be characterized by monotone cost approximating mappings, which replace an additive part of the original cost mapping in an iterative manner; alternately, a line search is made in the direction obtained in order to reduce the value of a merit function for the original problem. By characterizing iterative algorithms through their corresponding sequences of cost approximating mappings, their relationships and differences in convergence conditions may be described precisely. The framework encompasses a large number of iterative algorithms as well as several algorithm classes earlier presented for nonlinear programming problems. For applications to nondifferentiable optimization and its specializations to differentiable optimization with and without constraints, the convergence characteristics of the algorithm are analyzed under varying monotonicity assumptions on the cost approximating mappings and conditions on the choices of step lengths. The convergence analysis refines and extends that of two earlier studies [Partial linearization methods in nonlinear programming, J. Optim. Theory Appl., 78 (1993), pp. 227–246], [A unified description of iterative algorithms for traffic equilibria, European J. Oper. Res., 71 (1993), pp. 154–176], and includes results for both exact and inexact computations and a convergence rate analysis. As an application of the latter, we establish the linear convergence of a general regularization algorithm.

Key Words. nondifferentiable optimization, differentiable optimization, cost approximation, partial linearization, descent algorithms, algorithmic equivalence, convergence analysis

AMS Subject Classification. 49M37, 90C30

Abbreviated Title. Cost approximation algorithms for nonlinear programs

1 Introduction

The field of nonlinear programming has seen a dramatic development in iterative algorithms over the past few decades. This development has taken place in a wide variety of areas of application, where different features of iterative algorithms and assumptions on the problem have been considered. In many cases, developments in one field of application have not been transferred to other fields where they may be of interest. As a result, there presently exist several versions of what is essentially the same algorithm; the use of very different descriptions of them contributes to the difficulty of realizing this fact. The main motivation for the work leading to the formulation of the cost approximation algorithm was a wish to determine the relationships between iterative algorithms for mathematical programming problems; the natural framework for such an analysis was found to be a unification of existing algorithms in different fields of application of nonlinear programs and variational inequalities.

* This research was sponsored in part by grants from the Swedish Institute (303 GH/MLH) and the Swedish Research Council for Engineering Sciences (TFR) (282-93-1195).
† Department of Mathematics, Linköping Institute of Technology, S-581 83 Linköping, Sweden; presently at Department of Mathematics, GN-50, University of Washington, Seattle, WA 98195.


A summary of the results obtained can be found in the author's doctoral dissertation [78]. Unifying frameworks are of value both as a means of summarizing existing knowledge and as a source of tools for investigating relationships and performing analyses.

This paper presents a framework of descent methods for nonlinear programming problems. A common property of the methods included in the framework is that their search direction finding subproblems may be characterized by cost approximating mappings, which replace an additive part of the original cost mapping in an iterative manner. The number of methods that can be described within the framework is large, and included are many important, and quite diverse, algorithms.

The main contribution of the paper is the construction of the framework of cost approximation algorithms. This new concept contributes to the understanding and interpretation of seemingly unrelated iterative methods for different problem classes and is, in our view, a natural and very elegant way of describing them. Benefits gained from the construction of the framework are the possibilities of relating algorithms in several ways, e.g., with respect to their requirements for guaranteeing convergence, and a cross-fertilization among the fields of application where the methods included have originally been developed. The technical contribution of the paper is a convergence analysis of the algorithm in applications to a general nondifferentiable optimization problem and its specializations to constrained and unconstrained differentiable optimization. The convergence analysis includes investigations of inexact solutions of the auxiliary problems, different step length rules, and convergence rate results.

The remainder of the paper is organized as follows. In the next section, we introduce the problem under consideration and discuss some of its most important properties. In Section 3, we state the cost approximation algorithm. In Sections 4–6, we relate the cost approximation algorithm to existing methods and analyze its convergence properties for applications to nondifferentiable optimization problems, and their specializations to constrained and unconstrained optimization. Finally, in Section 7, we discuss possible extensions of the results presented.

2 The generalized equation

Let $u: \Re^n \mapsto \Re \cup \{+\infty\}$ [...]

[...] i.e.,

$$\tilde t > \frac{2\beta(\alpha - 1)\,[\Phi(x) - \Phi(y)]^T d}{M_{\nabla f}\,\|d\|^2}. \qquad (4.21)$$

From (4.20) we may also find conditions under which the unit step, i.e., $\ell = 0$, is accepted by Rule A'. Especially, for a strongly monotone mapping $\Phi$, the Armijo Rule A' yields a unit step ($\ell = 0$) if

$$\frac{2(1 - \alpha)\,m_\Phi}{M_{\nabla f}} \geq 1. \qquad (4.22)$$

Note that, from (4.21), the step length given by Rule A' always satisfies

$$t \geq \min\left\{1,\ \frac{2\beta(1 - \alpha)\,[\Phi(y) - \Phi(x)]^T d}{M_{\nabla f}\,\|d\|^2}\right\},$$

which reduces to

$$t \geq \min\left\{1,\ \frac{2\beta(1 - \alpha)\,m_\Phi}{M_{\nabla f}}\right\} \qquad (4.23)$$

whenever $\Phi$ is strongly monotone.
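For concreteness, the Armijo-type Rule A' can be sketched as a standard backtracking loop on the merit function $T$; the following minimal sketch uses our own names and the descent estimate $[\Phi(y) - \Phi(x)]^T d$ that appears in the bounds above.

```python
def rule_a_prime(T, x, d, descent_est, alpha=0.1, beta=0.5, max_backtracks=50):
    """Backtracking step length in the spirit of Rule A' (sketch).

    Accepts t = beta**l for the smallest nonnegative integer l with
        T(x + t*d) - T(x) <= -alpha * t * descent_est,
    where descent_est stands for [Phi(y) - Phi(x)]^T d, which is
    nonnegative for a monotone cost approximating mapping Phi.
    """
    t = 1.0
    T_x = T(x)
    for _ in range(max_backtracks):
        if T(x + t * d) - T_x <= -alpha * t * descent_est:
            return t
        t *= beta
    return t  # fallback: the smallest step tried
```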

The next convergence result corresponds to the choice of strictly monotone cost approximating mappings $\Phi^k$. In order to ensure convergence under this monotonicity property, the sequence $\{\Phi^k\}$ must further be given by a continuous mapping $\Phi$ on $\mathrm{dom}\,u \times \mathrm{dom}\,u$ of the form $\Phi(x, y)$; this mapping is then strictly monotone on $\mathrm{dom}\,u$ with respect to $x$. The introduction of strict monotonicity means that, compared to the requirements of Theorem 4.4, the mappings $\Phi^k$ need not be gradients, and the exact line search Rule E may be replaced by the inexact step length Rule A'. Further, continuity of $u$ is replaced by Lipschitz continuity of $\nabla f$ and a qualification ($\mathrm{rint}(\mathrm{dom}\,u) \neq \emptyset$).

Theorem 4.5 (Convergence under Rule A') Assume that $\mathrm{rint}(\mathrm{dom}\,u) \neq \emptyset$ and that $\nabla f$ is Lipschitz continuous on $\mathrm{dom}\,u$ (with modulus $M_{\nabla f}$). Let $\Phi: \mathrm{dom}\,u \times \mathrm{dom}\,u \mapsto \Re^n$ be a continuous mapping on $\mathrm{dom}\,u \times \mathrm{dom}\,u$ of the form $\Phi(x, y)$, maximal and strictly monotone on $\mathrm{dom}\,u$ with respect to $x$. Assume that the point $x^0$ is chosen so that the level set $L_T(x^0)$ [...]

[...] we have

$$T(x^{k+1}) - T(x^k) \leq \left(-m_{\Phi^k} + \frac{M_{\nabla f}}{2}\right)\|d^k\|^2,$$

where $-m_{\Phi^k} + M_{\nabla f}/2 < 0$.

Rule A': From (4.19), (4.21) and the strong monotonicity of $\Phi^k$, we have

$$T(x^{k+1}) - T(x^k) \leq -\alpha\,t_k\,[\Phi^k(y^k) - \Phi^k(x^k)]^T d^k \leq -\frac{2\alpha\beta(1 - \alpha)}{M_{\nabla f}\,\|d^k\|^2}\left([\Phi^k(y^k) - \Phi^k(x^k)]^T d^k\right)^2 \leq -\frac{2\alpha\beta(1 - \alpha)\,m_{\Phi^k}^2}{M_{\nabla f}}\,\|d^k\|^2.$$

Rule P: With $t_k \leq 1$, from (4.25) we obtain a decrease in $T$ whenever $t_k$ belongs to a compact subset of $(0,\ 2m_{\Phi^k}/M_{\nabla f})$.

Since $T$ is lower bounded and, from the above, $\{T(x^k)\}$ is decreasing, $\{T(x^{k+1}) - T(x^k)\} \to 0$. Thus, under each of the three rules we obtain that $\{d^k\} \to 0$. From (4.7) and the upper bound on $M_{\Phi^k}$, as $\{d^k\} \to 0$,

$$\{\nabla f(x^k) + u^k\} = \{\Phi^k(x^k) - \Phi^k(y^k)\} \to 0. \qquad (4.26)$$

Assume now that $\{x^k\}$ has an accumulation point, $x^\infty$, corresponding to a convergent subsequence $\{x^k\}_{k \in K}$ ($K \subseteq \mathbb{N}$). Since $\{y^k - x^k\} = \{d^k\} \to 0$, we have that $\{y^k\}_{k \in K} \to x^\infty$. From the closedness of $\partial u$ ([94, Th. 24.4]) and the continuity of $\nabla f$, (4.26) yields that $\{u^k\}_{k \in K} \to u^\infty \in \partial u(x^\infty)$, and, again appealing to (4.26), $\nabla f(x^\infty) + u^\infty = 0$. Thus, $x^\infty$ solves [GE].

(b) The result (4.13) follows from Lemma 4.3.a.

(c) The result follows from the equality $\|x^{k+1} - x^k\| = t_k\|d^k\|$, the result of (a) that $\{d^k\} \to 0$, the upper bound on $t_k$, and Lemma 4.3.b. $\Box$

From the proof for Rule A', we may conclude that convergence is guaranteed for any step length rule that results in a larger reduction of $T$ at each iteration than the Armijo Rule. To ensure that a unit step implies convergence in Rule P, the cost approximating mapping $\Phi$ should be chosen such that

$$\frac{2m_\Phi}{M_{\nabla f}} > 1. \qquad (4.27)$$

[This condition is implied by (4.22).] The maximal allowed step length in Rule P is, from the above theorem, bounded by the constant $2m_{\Phi^k}/M_{\nabla f}$.

Now, suppose that $f$ may be written as $f = f_1 + f_2$, where $f_1$ and $f_2$ are convex with Lipschitz continuous gradients (with moduli $M_{\nabla f_1}$ and $M_{\nabla f_2}$, respectively), and that $f_1$ is strongly convex. For the sake of this example, we further assume that $f_1$ is quadratic. To enhance the speed of convergence of the cost approximation algorithm, we consider redefining $\Phi^k$ as $\Phi^k := \Phi^k + \nabla f_1$. (This operation corresponds to applying a cost approximation to the function $[u + f_1] + f_2$, instead of to $u + f$.) Indeed, this operation yields an increased maximal allowed step length, since

$$\frac{2m_{\Phi^k + \nabla f_1}}{M_{\nabla f_2}} = \frac{2(m_{\Phi^k} + m_{\nabla f_1})}{M_{\nabla(f - f_1)}} > \frac{2m_{\Phi^k}}{M_{\nabla f}}.$$

It also yields faster convergence for the other step length rules; for Rule A', cf. the change in the linear convergence ratio in Theorem 4.8 below.

From this example, we are led to conclude that, in order to achieve the best possible rate of convergence, the function $\varphi^k := \varphi^k + f_1$ should be constructed such that $f_2 = f - f_1$ is not strongly convex. (The desire to obtain a high rate of convergence must of course be weighed against the computational difficulty of the subproblems.) Chen and Rockafellar [19] study splitting methods for the problem of finding a zero of the sum of two maximal monotone operators. They argue that all the strong monotonicity inherent in a problem defining mapping should be kept in the mapping defining the forward step; this result corresponds to the above for this special case of cost approximation (see Section 5.2).
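As a small numerical illustration (the particular functions below are chosen here for concreteness and are not taken from the analysis above), let $\Phi = I$ and

$$f_1(x) = 2x^2, \qquad f_2(x) = \ln(1 + e^x), \qquad f = f_1 + f_2,$$

so that $m_\Phi = 1$, $m_{\nabla f_1} = 4$, $M_{\nabla f_2} = 1/4$ and $M_{\nabla f} = 4 + 1/4$. Redefining $\Phi := \Phi + \nabla f_1$ then enlarges the maximal allowed step length in Rule P from

$$\frac{2m_\Phi}{M_{\nabla f}} = \frac{2}{4.25} \approx 0.47 \qquad \text{to} \qquad \frac{2(m_\Phi + m_{\nabla f_1})}{M_{\nabla f_2}} = \frac{2 \cdot 5}{1/4} = 40.$$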

4.7 A truncated version of the cost approximation algorithm

From Lemma 4.1.b.2, $d = y - x$ defines a direction of descent with respect to $T$ whenever $T_\varphi(y) < T_\varphi(x)$. The idea behind the truncated cost approximation method is to reduce the work performed on [NDP$_\varphi$] by limiting the number of iterations performed when solving it with a descent algorithm. This strategy introduces a trade-off between the computational effort spent on solving the subproblem and the quality of the search direction obtained. Compared to the assumptions of Theorem 4.4, the requirement that the subproblem [NDP$_\varphi$] be solved exactly is replaced by the requirement that the procedure for approximately solving [NDP$_\varphi$] is a descent method with a closed algorithmic map (in order to ensure the fixed point property), and the requirement that (4.4) holds for every $x \in \mathrm{dom}\,u$ is replaced by the assumption that the resulting sequence $\{y^k\}$ of approximate subproblem solutions is bounded.
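In outline, a truncated subproblem solve might look as follows; this is a minimal sketch with hypothetical names, where `inner_step` stands for one iteration of the inner descent method, and the test from Lemma 4.1.b.2 certifies that a descent direction has been obtained.

```python
def truncated_subproblem_solve(x, T_phi, inner_step, max_inner=5):
    """Approximate solution of the subproblem [NDP_phi] (sketch).

    Applies at most max_inner iterations of a descent method for the
    subproblem objective T_phi, starting from the current iterate x.
    By Lemma 4.1.b.2, d = y - x is a descent direction for the merit
    function T as soon as T_phi(y) < T_phi(x), so further inner work
    only improves the quality of the search direction.
    """
    y = x
    for _ in range(max_inner):
        y = inner_step(y)  # one descent step on T_phi
    if not (T_phi(y) < T_phi(x)):
        raise RuntimeError("no descent achieved; increase max_inner")
    return y
```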

Theorem 4.7 (Convergence of truncated cost approximation algorithms) Let $u$ be continuous on $\mathrm{dom}\,u$. Let $\varphi: \mathrm{dom}\,u \times \mathrm{dom}\,u \mapsto \Re$ be a continuous function on $\mathrm{dom}\,u \times \mathrm{dom}\,u$ of the form $\varphi(x, y)$, convex and in $C^1$ on $\mathrm{dom}\,u$ with respect to $x$. Assume that $x^0$ is chosen such that the level set $L_T(x^0)$ [...]

[...] $\{x^k\} \to x^*$, a solution of [GE], where $f$ is twice continuously differentiable and $\nabla^2 f(x^*)$ is positive definite. If $B$ is chosen so that

$$\delta = \frac{M_\Phi}{m_\Phi}\,\|I - B^{-1}\nabla^2 f(x^*)\| < 1, \qquad (4.38)$$

then $\{x^k\}$ converges Q-linearly with the ratio

$$q \leq \max\left\{\delta,\ 1 - \frac{2\beta(1 - \alpha)\,m_\Phi}{M_{\nabla f}}\,(1 - \delta)\right\}$$

for Rule A', and

$$q \leq \max\{\delta,\ 1 - t(1 - \delta)\}$$

for Rule P, where $t = \inf_k\{t_k\} \geq a_1$.

Proof The proof is based on that of [38, Th. 5.1], given for the special case $B = (1/\gamma)I$, $\gamma > 0$, and for the corresponding step length Rule A'. From Lemma 4.1.a.1 and the nonexpansiveness property of $P$, we have

$$\|y^k - x^*\| = \|P(Q(x^k)) - P(Q(x^*))\| \leq \|Q(x^k) - Q(x^*) - \nabla Q(x^*)(x^k - x^*)\| + \|\nabla Q(x^*)(x^k - x^*)\|,$$

where $\nabla Q(x^*) = (1/m_\Phi)(B - \nabla^2 f(x^*))$. Let $\varepsilon > 0$ be arbitrary. Since $\{x^k\} \to x^*$, we have for a sufficiently large $k$ that

$$\|Q(x^k) - Q(x^*) - \nabla Q(x^*)(x^k - x^*)\| \leq \varepsilon\,\|x^k - x^*\|.$$

We also have that $\|\nabla Q(x^*)\| \leq (M_\Phi/m_\Phi)\,\|I - B^{-1}\nabla^2 f(x^*)\| = \delta$, so that

$$\|y^k - x^*\| \leq (\delta + \varepsilon)\,\|x^k - x^*\|.$$

Then,

$$\|x^{k+1} - x^*\| = \|x^k + t_k d^k - x^*\| \leq t_k\|y^k - x^*\| + (1 - t_k)\|x^k - x^*\| \leq [1 - t_k(1 - \delta - \varepsilon)]\,\|x^k - x^*\|.$$

For Rule A', from (4.23), we then have, for a large enough $k$,

$$\|x^{k+1} - x^*\| \leq q_\varepsilon\,\|x^k - x^*\|, \qquad (4.39)$$

where

$$q_\varepsilon = \max\left\{\delta + \varepsilon,\ 1 - \frac{2\beta(1 - \alpha)\,m_\Phi}{M_{\nabla f}}\,(1 - \delta - \varepsilon)\right\},$$

while, for Rule P, (4.39) holds for large $k$ with

$$q_\varepsilon = \max\{\delta + \varepsilon,\ 1 - t(1 - \delta - \varepsilon)\}.$$

Since $\varepsilon$ was arbitrary, we have asymptotically

$$q = \limsup_{k \to \infty}\,\frac{\|x^{k+1} - x^*\|}{\|x^k - x^*\|} \leq q_0,$$

where

$$q_0 = \max\left\{\delta,\ 1 - \frac{2\beta(1 - \alpha)\,m_\Phi}{M_{\nabla f}}\,(1 - \delta)\right\}$$

in the case of Rule A', and

$$q_0 = \max\{\delta,\ 1 - t(1 - \delta)\}$$

for Rule P. If $\delta < 1$, then $q_0 < 1$, and the theorem is proved. $\Box$

If (4.27) holds, then $t_k \equiv 1$ is a valid step length in Rule P and $q \leq \delta$. Hence, the choice $B = \nabla^2 f(x^*)$ yields superlinear convergence, i.e., $q = 0$ in (4.36). For Rule A', the same conclusion may be drawn, provided that (4.22) holds. We have not been able to establish linear convergence for Rule E, since no lower bound on the step length is available. Nor have we been able to establish linear convergence for nonaffine or iteration dependent cost approximating mappings; in the special case where $u$ is an indicator function of a nonempty, closed and convex set, however, such results are available (see Sections 5.10 and 6.7). Despite the fact that the linear convergence theorem is valid only for iteration independent affine cost approximating mappings, it is strong enough to reproduce some well known linear convergence results in differentiable optimization, as we illustrate below.

Corollary 4.1 (Linear convergence of some instances of cost approximation algorithms) Let $B = (1/\gamma)I$, $\gamma > 0$.

(a) The condition (4.27) for convergence under Rule P using unit steps implies the linear

convergence condition (4.38). (b) Let Rule P be applied, using unit steps. Then the linear convergence ratio is

$$q \leq \max\{|1 - \gamma m|,\ |1 - \gamma M|\}, \qquad (4.40)$$

where $m$ and $M$ are the smallest and largest eigenvalues of $\nabla^2 f(x^*)$, respectively.

(c) Let further $u$ be the indicator function of $X$, where $X$ is a nonempty, closed and convex set in $\Re^n$ [...]

[...] $\gamma_k > 0$ and $B_k$ is a symmetric and positive semidefinite matrix on $X$, yields a subproblem in which

$$\nabla f(x^k)^T(x - x^k) + \frac{1}{2\gamma_k}(x - x^k)^T B_k (x - x^k)$$

is minimized over $x \in X$. If the matrix $B_k$ is positive definite on $X$, $y^k$ is given by

$$y^k = P_X^{B_k}\!\left(x^k - \gamma_k B_k^{-1}\nabla f(x^k)\right).$$

This is the subproblem of the class of (deflected) gradient projection methods, including, as special cases, the gradient projection method ($B_k \equiv I$) ([43, 58]) and Newton's method ($B_k = \nabla^2 f(x^k)$, $\gamma_k = 1$, for all $k$) ([58, 90, 31]).

The framework contains a class of regularization methods. Let $\varphi^k(x) = f(x) + 1/(2\gamma_k)D_k(x)$, where $\gamma_k > 0$ and where the function $D_k$ is convex and in $C^1$ on $X$. Furthermore, if we assume that $\nabla D_k(x^k) = 0$ holds, then

$$f_{\varphi^k}(x) = f(x) + \frac{1}{2\gamma_k}D_k(x),$$

which is a generalization of the subproblem objective in regularization methods ([58, 107, 87]). In particular, if we let $D_k \equiv D$ have the form

$$D_r(x, y) = r(x) - r(y) - \nabla r(y)^T(x - y),$$

where $r: X \mapsto \Re$ is a Bregman function, then $D_r$ is the D-function of Bregman [10], also studied in [29, 17, 106, 33, 18]. (Usually, $\gamma_k \equiv 1$ is used.) An extensively studied special case is the result of choosing $r(x) = \frac{1}{2}\|x\|^2$, which yields

$$f_{\varphi^k}(x) = f(x) + \frac{1}{2\gamma_k}\|x - x^k\|^2,$$

i.e., the subproblem objective of the proximal point method (e.g., [65, 97, 88, 98, 99, 8, 9]). (The main reason for considering regularization methods is the wish to strictly convexify a nonconvex objective; regularization methods are usually studied under the additional assumption that $D$ is strongly convex.)
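A minimal sketch of the (deflected) gradient projection subproblem solution above, assuming a user-supplied projection onto $X$ (the names are ours):

```python
import numpy as np

def deflected_gradient_projection_step(x, grad_f, B, gamma, project_B):
    """One (deflected) gradient projection step as a CA subproblem (sketch).

    For positive definite B, the subproblem solution is the projection,
    in the norm induced by B, of x - gamma * B^{-1} grad_f(x) onto X;
    project_B is a user-supplied projection onto X in that norm.
    """
    return project_B(x - gamma * np.linalg.solve(B, grad_f(x)))

# Special case B = I with a box X = {x : lo <= x <= hi}: the Euclidean
# projection reduces to componentwise clipping.
def project_box(z, lo, hi):
    return np.clip(z, lo, hi)
```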

Closely related to the proximal point algorithm is the class of splitting methods ([11, 60, 108, 34]) for finding a zero of the sum of two (maximal) monotone operators. (For an objective of the form $f = g + h$, in a splitting algorithm the function $\varphi^k$ is given by $\varphi^k(x) = g(x) + 1/(2\gamma_k)\|x - x^k\|^2$.) It has been shown ([57, 34]) that a number of algorithms in this class, such as Lions–Mercier splitting ([60]) and Douglas–Rachford splitting ([30]), are instances of the proximal point algorithm. It is also known ([40, 108]) that the method of alternating directions ([42, 41]) is a special case of Lions–Mercier splitting, and that the same holds for the method of partial inverses ([102, 103]) and the method of Han and Lou [45]. It has also been observed that augmented Lagrangean methods (e.g., the method of multipliers [46, 89, 44, 8]) define special cases of proximal point methods for dual programs; see for instance [97, 98, 8] and [9, Sec. 3.4]. All the above mentioned methods are a fortiori cost approximation methods.
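For the splitting choice $\varphi^k(x) = g(x) + 1/(2\gamma_k)\|x - x^k\|^2$ with a smooth $h$, one standard reading is a forward-backward step; a minimal sketch (the proximal operator `prox_g` is a stand-in for a user-supplied routine):

```python
import numpy as np

def forward_backward_step(x, grad_h, prox_g, gamma):
    """One forward-backward splitting step for min g + h (sketch).

    Keeping g exactly, regularizing with (1/(2*gamma))*||. - x||^2, and
    linearizing the smooth part h yields
        x_next = prox_{gamma*g}(x - gamma * grad_h(x));
    prox_g(z, gamma) is a user-supplied proximal operator of g.
    """
    return prox_g(x - gamma * grad_h(x), gamma)

# Example: g = ||.||_1, whose proximal operator is soft-thresholding.
def prox_l1(z, gamma):
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)
```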

Several specialized algorithms for Lagrangean dual formulations of convex programs can be identified as cost approximation methods. Apart from those already mentioned, classical multiplier methods such as Uzawa's method [110] and the Arrow–Hurwicz algorithm [3], and the price coordination methods of Takahara [105] and Bernhard [7], are included, as are sequential linear and quadratic programming algorithms for the solution of the KKT conditions ([5]). Cohen [22, 23, 24] discusses at length dual decomposition/coordination methods (e.g., [3, 105, 56]) in the context of the auxiliary problem principle, and gives several examples of how it can be applied in two-level algorithms and price coordination schemes; from the algorithmic equivalence Theorem 5.1.c, this discussion applies to the class of cost approximation methods as well. However, a further analysis of applications to dual programs is beyond the scope of this paper.

In the special case where $f$ is quadratic [$f(x) = \frac{1}{2}x^T Q x + q^T x$ for some symmetric and positive semidefinite matrix $Q \in \Re^{n \times n}$] [...]

[...] $\varepsilon_k > 0$, or, with $\varepsilon_k \equiv 1$, the sequence $\{t_k\}$ satisfying $t_k \leq \max\{t \mid x^k + t\,d^k \in X\}$ and (5.10). Cohen proves that any accumulation point of $\{x^k\}$ (at least one exists) is optimal in [CDP]. This result essentially corresponds to that for Rule P in Theorem 4.6.b in the case where $\Phi^k$ is a gradient mapping.

Note that the convergence conditions for sequences $\{x^k\}$ defined by a scaling of $\varphi^k$ with $\varepsilon_k$ (see Theorem 4.1) and using unit steps, and sequences $\{x^k\}$ defined by unscaled subproblems and predetermined step lengths (see Theorem 4.6.b), are equivalent, in the sense that the scaling $\varepsilon_k$ in the first method and the step length $t_k$ in the second are both subject to the same restrictions, namely the interval constraint (5.10); the sequences generated by the two methods are, however, not necessarily the same, since the subproblems differ.

Assuming that $f$ is a convex function in $C^1$ on the nonempty, compact and convex set $X$ and $\varphi: X \mapsto \Re$ is a strictly convex function in $C^1$ on $X$, Dussault et al. [32], as well as Larsson and Migdalas [67, 54], under Rule E, establish the optimality of any accumulation point of $\{x^k\}$ (at least one exists). (See Theorem 5.1.c for a comparison with the cost approximation algorithm.) Migdalas [68] assumes that $f$ is a pseudoconvex function on the nonempty, compact and convex set $X$ and that $\varphi: X \times X \mapsto \Re$ is a continuous function which is strictly convex and in $C^1$ on $X$ with respect to its first argument. Under Rule E, any accumulation point of $\{x^k\}$ (at least one exists) is optimal in [CDP]. (See Theorem 5.1.b for a comparison with the cost approximation algorithm.) Tseng [109] assumes that $f$ is a convex and lower bounded function in $C^1$ on the nonempty, closed and convex set $X$ and that $\Phi: X \times X \mapsto \Re^n$ [...]

[...] there exists an $\varepsilon > 0$ such that

$$x, y \in \Gamma,\ f(x) \neq f(y) \implies \|x - y\| \geq \varepsilon.$$

This assumption holds whenever $f$ takes on a finite number of values on $\Gamma$; it holds in particular when $f$ is convex. The linear convergence result established below extends that of Theorem 4.8 to non-affine and iteration dependent cost approximating mappings $\Phi^k$, for algorithms using unit steps. It establishes that the sequence $\{f(x^k)\}$ converges Q-linearly, while (in slight contrast to the result of Theorem 4.8) the sequence $\{x^k\}$ converges with an R-linear rate, i.e., for some constants $c > 0$ and $\theta \in (0, 1)$,

$$\|x^k - x^\infty\| \leq c\,\theta^k, \qquad \forall k.$$

Theorem 5.5 (Linear convergence of cost approximation algorithms) Let the assumptions of Theorem 5.4 hold. Let further $f$ be lower bounded on $X$, and let Assumptions 5.1 and 5.2 hold. Then $\{f(x^k)\}$ converges at least Q-linearly, and $\{x^k\}$ converges at least R-linearly to an element of $\Gamma$.

Proof It follows from Theorem 5.4 that, under the given assumptions, the cost approximation algorithm is of the form (5.13)–(5.15). The result then follows from Theorem 3.1 of [63]. $\Box$

Luo and Tseng also consider milder conditions than (5.14) on the convergence rate of $\{e^k\}$. In the first result, it is replaced by the requirement that $\{\|e^k\|\}$ converges with an R-linear rate, in which case $\{f(x^k)\}$ can be shown to converge R-linearly; in the second result, it is replaced by the even milder requirement that

$$\sum_{k=0}^{\infty} \|e^k\| < \infty,$$

in which case the linear convergence of $\{f(x^k)\}$ and $\{x^k\}$ is replaced by convergence only. (From (5.19), this requirement is implied by $\sum_{k=0}^{\infty} \|d^k\| < \infty$.) Luo and Tseng use their result to establish the linear convergence of the gradient projection algorithm (cf. Corollary 4.1.c), the extragradient algorithm of Korpelevich [52], the proximal point algorithm, the coordinate descent method, and a matrix splitting algorithm. With the exception of Korpelevich's algorithm, these are all instances of the cost approximation algorithm (see Section 5.2 and [78]). By Theorem 5.4, several other instances of cost approximation are linearly convergent, such as the more general gradient projection type algorithms and regularization methods presented in Section 5.2.

As an example, we establish below the linear convergence of a general regularization algorithm. Let $\varphi^k(x) = f(x) + 1/(2\gamma_k)D_k(x)$, where $\gamma_k > 0$ and $D_k: X \mapsto \Re$ is a strongly convex function in $C^1$ on $X$ (with modulus $m_{D_k}$) with a Lipschitz continuous gradient on $X$ (with modulus $M_{\nabla D_k}$). In the cost approximation algorithm, let $x^{k+1} = y^k$, i.e.,

$$x^{k+1} = \arg\min_{x \in X} f_{\varphi^k}(x), \qquad k = 0, 1, \ldots, \qquad (5.20)$$

where

$$f_{\varphi^k}(x) = f(x) + \frac{1}{2\gamma_k}\left[D_k(x) - D_k(x^k) - \nabla D_k(x^k)^T(x - x^k)\right]. \qquad (5.21)$$

This algorithm is slightly more general than the regularization algorithms presented in Section 5.2, since we here do not assume that $\nabla D_k(x^k) = 0$. From (5.16), it then follows that (5.13) holds with

$$e^k = x^{k+1} - x^k + 2\gamma_k\left[\nabla f(x^k) - \nabla f(x^{k+1})\right] + \nabla D_k(x^k) - \nabla D_k(x^{k+1}),$$

and, with $\nabla f$ Lipschitz continuous on $X$ (with modulus $M_{\nabla f}$), $\bar\gamma = \sup_k\{\gamma_k\} < \infty$, $m_D = \inf_k\{m_{D_k}\} > 0$ and $M_{\nabla D} = \sup_k\{M_{\nabla D_k}\} < \infty$, (5.14) holds with $\kappa_1 = 1 + 2\bar\gamma M_{\nabla f} + M_{\nabla D}$. The strong convexity of $D_k$ yields, finally, that (5.15) holds with $\kappa_2 = m_D/(4\bar\gamma)$. Thus, the algorithm (5.20)–(5.21) is of the form (5.13)–(5.15). Under the additional assumptions that $f$ is lower bounded on $X$, $\inf_k\{\gamma_k\} > 0$, and that Assumptions 5.1 and 5.2 hold, the algorithm (5.20)–(5.21) yields a sequence $\{f(x^k)\}$ which is at least Q-linearly convergent and a sequence $\{x^k\}$ which is at least R-linearly convergent to an element of $\Gamma$. This result generalizes that of Luo and Tseng for the proximal point algorithm (which is obtained from the choice $D_k \equiv D = \|\cdot\|^2$).
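A minimal sketch of iteration (5.20)–(5.21), for simplicity with $X = \Re^n$ and a generic local solver standing in for the exact subproblem solution (all names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def regularization_algorithm(f, D, grad_D, x0, gamma, n_iter=50):
    """Sketch of iteration (5.20)-(5.21) with a Bregman-type term.

    Each step minimizes f(x) + (1/(2*gamma_k)) * [D(x) - D(x_k)
    - grad_D(x_k)^T (x - x_k)]; nabla D(x_k) = 0 is not assumed.
    gamma is a callable k -> gamma_k, bounded away from 0 and infinity.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(n_iter):
        g, xk = gamma(k), x.copy()

        def subobjective(y, xk=xk, g=g):
            bregman = D(y) - D(xk) - grad_D(xk) @ (y - xk)
            return f(y) + bregman / (2.0 * g)

        x = minimize(subobjective, xk).x  # stand-in for the exact solve
    return x
```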

The convergence of algorithms of the form (5.20)–(5.21) with non-quadratic regularization functions has been studied previously, in particular for Bregman functions $D_k \equiv D$ in regularization methods for convex programs (e.g., [17, 18, 33]), and for augmented Lagrangean type methods for dual convex programs (e.g., [53]). To our knowledge, this is, however, the first linear convergence result for this general class of algorithms; observe that the result does not require $f$ to be convex. It is an open problem whether the above linear convergence analysis can be extended to allow for the use of step length rules which are valid under less restrictive assumptions on the cost approximating mappings, such as Rule A, or to nondifferentiable optimization.

6 Unconstrained optimization

In this section we study [GE] in its simplest form, where $u \equiv 0$ (or $X = \Re^n$) [...]

[...] let $r > 0$ and $R < +\infty$ denote the smallest and largest eigenvalues of $B_k^{-1}\nabla^2 f(x^k)$ over all $k$, respectively. If, in Rule P, $t_k \in [a_1,\ 2m_{\Phi^k}/M_{\nabla f} - a_2]$ for some constants $a_1, a_2 > 0$, then the conclusion in (a) holds for Rule P. In particular, if $t_k \equiv 2/(R + r)$, then the ratio is

$$q = \frac{R - r}{R + r}.$$

Proof (a) Follows from the characterization (6.4) of cost approximation methods as gradient related methods, and the results in [84, 113] or [82, pp. 242–246]. (b) Follows from the equivalence with deflected gradient methods (Theorem 6.1), and the results in [21] or [90, Ths. II.1.2 and II.1.6]. (c) Follows as in (b), and from the results in [90, p. 55], [9, Prop. 3.2.4] or [84, 21]. $\Box$

Note that $f$ need only be strongly convex in a neighbourhood of $x^*$. If the Armijo rule is slightly modified, linear convergence is ensured also for nonquadratic functions $\varphi^k$ ([21]). These convergence results are probably the strongest possible given the assumptions. (In fact, in [1] it is shown that the Kantorovich ratio $(R - r)/(R + r)$ is the exact convergence rate of the steepest descent method.) By imposing further conditions on the functions $\varphi^k$, however, so that the obtained search direction approaches the Newton direction in the limit, giving quasi-Newton methods, stronger convergence rate results may be achieved (e.g., [28]). [For these methods, the convergence rate analysis made above is too conservative, since $\{B_k\}$ is not related to $\nabla^2 f(x^*)$.]
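As a small numerical check of the fixed-step ratio $q = (R - r)/(R + r)$, the following sketch runs a fixed-step gradient method on a convex quadratic; the matrix and iteration count are chosen purely for illustration.

```python
import numpy as np

# Fixed-step gradient method on 0.5 * x^T Q x with eigenvalue bounds
# r = 1 and R = 10; with t = 2/(R + r) every coordinate of the error
# contracts by exactly (R - r)/(R + r) = 9/11 per iteration.
Q = np.diag([1.0, 10.0])
r, R = 1.0, 10.0
t = 2.0 / (R + r)
x = np.array([1.0, 1.0])
for _ in range(20):
    x = x - t * (Q @ x)                # gradient of 0.5 * x^T Q x is Q x
print(np.linalg.norm(x) / np.sqrt(2))  # ~ (9/11)**20, about 0.018
```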

7 Extensions and further research

In this paper we have introduced the principle of cost approximation as a means to characterize iterative descent algorithms for nonlinear programming problems. Many well-known algorithms are included in the framework, as are several classes of iterative algorithms presented earlier. The large freedom in choosing the form of the cost approximating mappings enables one to adapt cost approximation algorithms to the given problem structure. This is of importance for the efficient solution of large-scale problems, and is one of the main merits of the proposed framework. (One example, Evans' algorithm, was presented in Section 5.) When a nonseparable function is to be minimized over a Cartesian product of feasible sets, separability may be induced by choosing the cost approximating mappings to be separable with respect to these sets. Thus, one may develop decomposition algorithms in which the independent subsets of the variables are updated either in sequence or in parallel, depending on the computing facilities available. (See [78] for a detailed analysis of these decomposition algorithms.)

The convergence analysis was performed on instances of the general problem [GE] in decreasing order of generality. It was observed that in some important cases, a specialization leads to improvements of the results. For each problem instance, the results were presented under progressively stronger monotonicity requirements on the cost approximating mappings. It was then observed that with stronger monotonicity requirements there are more degrees of freedom in choosing both the form of these mappings and the step lengths to use; also, the analysis confirms that the rate of convergence is improved by choosing the cost approximating mappings to be strongly monotone.

In [78] we have shown that in applications to the constrained differentiable problem [CDP], the class of cost approximation algorithms has the property of forcing the projected gradient to zero. This result may be used in conjunction with the results obtained in [14, 15] to establish that cost approximation algorithms, when applied to linearly constrained problems, identify the optimal face in a finite number of iterations and thus eventually reduce to their unconstrained versions. Under an additional sharpness assumption on the set of optimal solutions (e.g., [87, 13]), we have also shown that cost approximation algorithms, applied to [NDP] or [CDP], converge finitely. In [75, 78, 55, 79], some of the results obtained here are extended to variational inequality problems, and in [77, 80], the framework is used to analyze and interrelate algorithms for traffic equilibrium problems.

Some interesting subjects for further research have been mentioned in the text. We finally mention that some of the theoretical developments lead to opportunities for computational investigations; a particularly interesting one concerns the construction of strategies for terminating the solution of [CDP$_\varphi$] in order to define practically efficient truncated cost approximation algorithms.

References

[1] H. Akaike, On the successive transformation of probability distributions and its application to the analysis of the optimum gradient method, Ann. Inst. Math. Stat., 11 (1959), pp. 1–17.
[2] L. Armijo, Minimization of functions having Lipschitz continuous first partial derivatives, Pacific J. Math., 16 (1966), pp. 1–3.
[3] K. J. Arrow and L. Hurwicz, Decentralization and computation in resource allocation, in Essays in Economics and Econometrics, R. W. Pfouts, ed., University of North Carolina Press, Raleigh, NC, 1960, pp. 34–104.
[4] M. S. Bazaraa and C. M. Shetty, Foundations of Optimization, vol. 122 of Lecture Notes in Economics and Mathematical Systems, Springer-Verlag, Berlin, 1976.
[5] M. S. Bazaraa and C. M. Shetty, Nonlinear Programming: Theory and Algorithms, John Wiley & Sons, New York, NY, 1979.
[6] C. Berge, Topological Spaces, Oliver & Boyd, Edinburgh, 1963.
[7] P. Bernhard, Commande Optimale, Décentralisation, et Jeux Dynamiques, Dunod, Paris, 1975.
[8] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, San Diego, CA, 1982.
[9] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and Distributed Computation: Numerical Methods, Prentice-Hall, London, 1989.
[10] L. M. Bregman, The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, U.S.S.R. Comput. Math. and Math. Phys., 7 (1967), pp. 200–217.
[11] H. Brézis, Opérateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de Hilbert, North-Holland, Amsterdam, 1973.
[12] F. E. Browder, Nonlinear maximal monotone mappings in Banach spaces, Math. Ann., 175 (1968), pp. 81–113.
[13] J. V. Burke and M. C. Ferris, Weak sharp minima in mathematical programming, SIAM J. Control Optim., 31 (1993), pp. 1340–1359.
[14] J. V. Burke and J. J. Moré, On the identification of active constraints, SIAM J. Numer. Anal., 25 (1988), pp. 1197–1211.
[15] J. V. Burke and J. J. Moré, Exposing constraints, SIAM J. Optim., 4 (1994), pp. 573–595.
[16] A. Cauchy, Méthode générale pour la résolution des systèmes d'équations simultanées, Comptes Rendus Hebdomadaires des Séances de l'Académie des Sciences (Paris), Série A, 25 (1847), pp. 536–538.
[17] Y. Censor and S. A. Zenios, Proximal minimization algorithm with D-functions, J. Optim. Theory Appl., 73 (1992), pp. 451–464.
[18] G. Chen and M. Teboulle, Convergence analysis of a proximal-like minimization algorithm using Bregman functions, SIAM J. Optim., 3 (1993), pp. 538–543.
[19] G. H.-G. Chen and R. T. Rockafellar, Forward-backward splitting methods in Lagrangian optimization, unpublished report, Department of Applied Mathematics, University of Washington, Seattle, WA, 1992.
[20] F. H. Clarke, Generalized gradients and applications, Trans. Amer. Math. Soc., 205 (1975), pp. 247–262.
[21] A. I. Cohen, Stepsize analysis for descent methods, J. Optim. Theory Appl., 33 (1981), pp. 187–205.
[22] G. Cohen, Optimization by decomposition and coordination: a unified approach, IEEE Trans. Automat. Control, AC-23 (1978), pp. 222–232.
[23] G. Cohen, Auxiliary problem principle and decomposition of optimization problems, J. Optim. Theory Appl., 32 (1980), pp. 277–305.
[24] G. Cohen and D. L. Zhu, Decomposition coordination methods in large scale optimization problems: the nondifferentiable case and the use of augmented Lagrangians, in Advances in Large Scale Systems, Volume 1, J. B. Cruz, ed., JAI Press, Greenwich, CT, 1984, pp. 203–266.
[25] R. S. Dembo and U. Tulowitzki, Computing equilibria on large multicommodity networks: an application of truncated quadratic programming algorithms, Networks, 18 (1988), pp. 273–284.
[26] V. F. Dem'yanov and A. M. Rubinov, Approximate Methods in Optimization Problems, American Elsevier, New York, NY, 1970.
[27] V. F. Dem'yanov and L. V. Vasil'ev, Nondifferentiable Optimization, Optimization Software, New York, NY, 1985.
[28] J. E. Dennis and J. J. Moré, Quasi-Newton methods, motivation and theory, SIAM Rev., 19 (1977), pp. 46–89.
[29] A. R. De Pierro and A. N. Iusem, A relaxed version of Bregman's method for convex programming, J. Optim. Theory Appl., 51 (1986), pp. 421–440.
[30] J. Douglas and H. H. Rachford, On the numerical solution of heat conduction problems in two and three space variables, Trans. Amer. Math. Soc., 82 (1956), pp. 421–439.
[31] J. C. Dunn, Newton's method and the Goldstein step-length rule for constrained minimization problems, SIAM J. Control Optim., 18 (1980), pp. 659–674.
[32] J.-P. Dussault, J. A. Ferland, and B. Lemaire, Convex quadratic programming with one constraint and bounded variables, Math. Programming, 36 (1986), pp. 90–104.
[33] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Math. Oper. Res., 18 (1993), pp. 202–226.
[34] J. Eckstein and D. P. Bertsekas, On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators, Math. Programming, 55 (1992), pp. 293–318.
[35] S. P. Evans, Derivation and analysis of some models for combining trip distribution and assignment, Transportation Res., 10 (1976), pp. 37–57.
[36] R. Fletcher, Practical Methods of Optimization, John Wiley & Sons, Chichester, U.K., second ed., 1987.
[37] M. Frank and P. Wolfe, An algorithm for quadratic programming, Naval Res. Logist. Quart., 3 (1956), pp. 95–110.
[38] M. Fukushima and H. Mine, A generalized proximal point algorithm for certain non-convex minimization problems, Internat. J. Systems Sci., 12 (1981), pp. 989–1000.
[39] M. Fukushima, V. H. Nguyen, and J.-J. Strodiot, A globally convergent algorithm for a class of nonsmooth optimization problems and its application to parallel decomposition of convex programs, Technical Report 91002, Department of Applied Mathematics and Physics, Kyoto University, Kyoto, Japan, 1991.
[40] D. Gabay, Applications of the method of multipliers to variational inequalities, in Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems, M. Fortin and R. Glowinski, eds., North-Holland, Amsterdam, 1983, pp. 299–331.
[41] D. Gabay and B. Mercier, A dual algorithm for the solution of a nonlinear variational problem via finite element approximation, Comput. Math. Appl., 2 (1976), pp. 17–40.
[42] R. Glowinski and A. Marrocco, Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité, d'une classe de problèmes de Dirichlet non linéaires, Revue Française d'Automatique, Informatique et Recherche Opérationnelle, R-2 (1975), pp. 41–76.
[43] A. A. Goldstein, Convex programming in Hilbert space, Bull. Amer. Math. Soc., 70 (1964), pp. 709–710.
[44] P. C. Haarhoff and J. D. Buys, A new method for the optimization of a nonlinear function subject to nonlinear constraints, Comput. J., 13 (1970), pp. 178–184.
[45] S.-P. Han and G. Lou, A parallel algorithm for a class of convex programs, SIAM J. Control Optim., 26 (1988), pp. 345–355.
[46] M. R. Hestenes, Multiplier and gradient methods, J. Optim. Theory Appl., 4 (1969), pp. 303–320.
[47] C. Hildreth, A quadratic programming procedure, Naval Res. Logist. Quart., 4 (1957), pp. 79–85. Erratum, ibid., p. 361.
[48] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms, Springer-Verlag, Berlin, 1993.
[49] A. N. Iusem, On the convergence of iterative methods for symmetric linear complementarity problems, Math. Programming, 59 (1993), pp. 33–48.
[50] K. C. Kiwiel, Methods of Descent for Nondifferentiable Optimization, vol. 1133 of Lecture Notes in Mathematics, Springer-Verlag, Berlin, 1985.
[51] K. C. Kiwiel, A method for minimizing the sum of a convex function and a continuously differentiable function, J. Optim. Theory Appl., 48 (1986), pp. 437–449.
[52] G. M. Korpelevich, The extragradient method for finding saddle points and other problems, Matekon, 13 (1977), pp. 35–49.
[53] B. W. Kort and D. P. Bertsekas, Combined primal-dual and penalty methods for convex programming, SIAM J. Control Optim., 14 (1976), pp. 268–294.
[54] T. Larsson and A. Migdalas, An algorithm for nonlinear programs over Cartesian product sets, Optimization, 21 (1990), pp. 535–542.
[55] T. Larsson and M. Patriksson, A class of gap functions for variational inequalities, Math. Programming, 64 (1994), pp. 53–79.
[56] L. S. Lasdon, Optimization Theory for Large Systems, Macmillan, New York, NY, 1970.
[57] J. Lawrence and J. E. Spingarn, On fixed points of non-expansive piecewise isometric mappings, Proc. London Math. Soc., 55 (1987), pp. 605–624.
[58] E. S. Levitin and B. T. Polyak, Constrained minimization methods, U.S.S.R. Comput. Math. and Math. Phys., 6 (1966), pp. 1–50.
[59] Y. Y. Lin and J.-S. Pang, Iterative methods for large convex quadratic programs: a survey, SIAM J. Control Optim., 25 (1987), pp. 383–411.
[60] P. L. Lions and B. Mercier, Splitting algorithms for the sum of two nonlinear operators, SIAM J. Numer. Anal., 16 (1979), pp. 964–979.
[61] Z.-Q. Luo and P. Tseng, On the convergence of a matrix splitting algorithm for the symmetric monotone linear complementarity problem, SIAM J. Control Optim., 29 (1991), pp. 1037–1060.
[62] Z.-Q. Luo and P. Tseng, Error bound and convergence analysis of matrix splitting algorithms for the affine variational inequality problem, SIAM J. Optim., 2 (1992), pp. 43–54.
[63] Z.-Q. Luo and P. Tseng, Error bounds and convergence analysis of feasible descent methods: a general approach, Ann. Oper. Res., 46 (1993), pp. 157–178.
[64] O. L. Mangasarian, Convergence of iterates of an inexact matrix splitting algorithm for the symmetric monotone linear complementarity problem, SIAM J. Optim., 1 (1991), pp. 114–122.
[65] B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Revue Française d'Informatique et de Recherche Opérationnelle, R-3 (1970), pp. 154–158.
[66] B. Martos, Nonlinear Programming Theory and Methods, North-Holland, Amsterdam, 1975.
[67] A. Migdalas, Mathematical programming techniques for analysis and design of communication and transportation networks, Doctoral dissertation, Department of Mathematics, Linköping Institute of Technology, Linköping, Sweden, 1988.
[68] A. Migdalas, A regularization of the Frank–Wolfe method and unification of certain nonlinear programming methods, Math. Programming, 65 (1994), pp. 331–345.
[69] H. Mine and M. Fukushima, A minimization method for the sum of a convex function and a continuously differentiable function, J. Optim. Theory Appl., 33 (1981), pp. 9–23.
[70] G. J. Minty, Monotone (nonlinear) operators in Hilbert space, Duke Math. J., 29 (1962), pp. 341–346.
[71] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, NY, 1970.
[72] J.-S. Pang, Necessary and sufficient conditions for the convergence of iterative methods for the linear complementarity problem, J. Optim. Theory Appl., 42 (1984), pp. 1–17.
[73] J.-S. Pang, More results on the convergence of iterative methods for the symmetric linear complementarity problem, J. Optim. Theory Appl., 49 (1986), pp. 107–134.
[74] J.-S. Pang, Convergence of splitting and Newton methods for complementarity problems: an application of some sensitivity results, Math. Programming, 58 (1993), pp. 149–160.
[75] M. Patriksson, A descent algorithm for a class of generalized variational inequalities, Report LiTH-MAT-R-93-35, Department of Mathematics, Linköping Institute of Technology, Linköping, Sweden, 1993.
[76] M. Patriksson, Partial linearization methods in nonlinear programming, J. Optim. Theory Appl., 78 (1993), pp. 227–246.
[77] M. Patriksson, A unified description of iterative algorithms for traffic equilibria, European J. Oper. Res., 71 (1993), pp. 154–176.
[78] M. Patriksson, A unified framework of descent algorithms for nonlinear programs and variational inequalities, Doctoral dissertation, Department of Mathematics, Linköping Institute of Technology, Linköping, Sweden, 1993.
[79] M. Patriksson, On the convergence of descent methods for monotone variational inequalities, Oper. Res. Lett. (to appear).
[80] M. Patriksson, The Traffic Assignment Problem: Models and Methods, VSP, Utrecht, 1995.
[81] R. R. Phelps, Convex Functions, Monotone Operators and Differentiability, vol. 1364 of Lecture Notes in Mathematics, Springer-Verlag, Berlin, 1989.
[82] E. Polak, Computational Methods in Optimization: A Unified Approach, Academic Press, New York, NY, 1971.
[83] E. Polak, D. Q. Mayne, and Y. Wardi, On the extension of constrained optimization algorithms from differentiable to nondifferentiable problems, SIAM J. Control Optim., 21 (1983), pp. 179–203.
[84] B. T. Polyak, Gradient methods for the minimisation of functionals, U.S.S.R. Comput. Math. and Math. Phys., 3 (1963), pp. 864–878.
[85] B. T. Polyak, A general method of solving extremum problems, Soviet Math. Dokl., 8 (1967), pp. 593–597.
[86] B. T. Polyak, Minimization of unsmooth functionals, U.S.S.R. Comput. Math. and Math. Phys., 9 (1969), pp. 14–29.
[87] B. T. Polyak, Introduction to Optimization, Optimization Software, New York, NY, 1987.
[88] B. T. Polyak and N. V. Tret'yakov, An iterative method for linear programming and its economic interpretation, Matekon, 10 (1974), pp. 81–100.
[89] M. J. D. Powell, A method for nonlinear constraints in minimization problems, in Optimization, R. Fletcher, ed., Academic Press, New York, NY, 1969, pp. 283–298.
[90] B. N. Pshenichny and Yu. M. Danilin, Numerical Methods in Extremal Problems, MIR Publishers, Moscow, 1978.
[91] S. M. Robinson, Generalized equations and their solutions, part I: basic theory, Math. Programming Study, 10 (1979), pp. 128–141.
[92] R. T. Rockafellar, Characterization of the subdifferentials of convex functions, Pacific J. Math., 17 (1966), pp. 497–510.
[93] R. T. Rockafellar, Local boundedness of nonlinear, monotone operators, Michigan Math. J., 16 (1969), pp. 397–407.
[94] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.
[95] R. T. Rockafellar, On the maximal monotonicity of subdifferential mappings, Pacific J. Math., 33 (1970), pp. 209–216.
[96] R. T. Rockafellar, On the maximality of sums of nonlinear monotone operators, Trans. Amer. Math. Soc., 149 (1970), pp. 75–88.
[97] R. T. Rockafellar, The multiplier method of Hestenes and Powell applied to convex programming, J. Optim. Theory Appl., 12 (1973), pp. 555–562.
[98] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Math. Oper. Res., 1 (1976), pp. 97–116.
[99] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877–898.
[100] R. T. Rockafellar, The Theory of Subgradients and its Applications to Problems of Optimization: Convex and Nonconvex Functions, Heldermann Verlag, Berlin, 1981.
[101] N. Z. Shor, Minimization Methods for Non-Differentiable Functions, Springer-Verlag, Berlin, 1985.
[102] J. E. Spingarn, Partial inverse of a monotone operator, Appl. Math. Optim., 10 (1983), pp. 247–265.
[103] J. E. Spingarn, Applications of the method of partial inverses to convex programming: decomposition, Math. Programming, 32 (1985), pp. 199–223.
[104] J. Stoer and C. Witzgall, Convexity and Optimization in Finite Dimensions I, Springer-Verlag, Berlin, 1970.
[105] Y. Takahara, Multilevel approach to dynamic optimization, Report SRC-50-C-64-18, Systems Research Center, Case Western Reserve University, Cleveland, OH, 1964.
[106] M. Teboulle, Entropic proximal mappings with applications to nonlinear programming, Math. Oper. Res., 17 (1992), pp. 670–690.
[107] A. N. Tikhonov and V. Y. Arsenin, Solutions of Ill-Posed Problems, John Wiley & Sons, New York, NY, 1977.
[108] P. Tseng, Applications of a splitting algorithm to decomposition in convex programming and variational inequalities, SIAM J. Control Optim., 29 (1991), pp. 119–138.
[109] P. Tseng, Decomposition algorithm for convex differentiable minimization, J. Optim. Theory Appl., 70 (1991), pp. 109–135.
[110] H. Uzawa, Iterative methods for concave programming, in Studies in Linear and Nonlinear Programming, K. J. Arrow, L. Hurwicz, and H. Uzawa, eds., Stanford University Press, Stanford, CA, 1958, pp. 154–165.
[111] J. van Tiel, Convex Analysis: An Introductory Text, John Wiley & Sons, Chichester, U.K., 1984.
[112] W. I. Zangwill, Nonlinear Programming: A Unified Approach, Prentice-Hall, Englewood Cliffs, NJ, 1969.
[113] P. Wolfe, Convergence conditions for ascent methods, SIAM Rev., 11 (1969), pp. 226–235.
[114] E. Zeidler, Nonlinear Functional Analysis and its Applications II/B: Nonlinear Monotone Operators, Springer-Verlag, New York, NY, 1990.
