On Dual Convergence of the Generalized Proximal Point Method with Bregman Distances

Alfredo Iusem*    Renato D.C. Monteiro†

October 28, 1997

Abstract

The use of generalized distances (e.g. Bregman distances), instead of the Euclidean one, in the proximal point method for convex optimization, allows for elimination of the inequality constraints from the subproblems. In this paper we consider the proximal point method with Bregman distances applied to linearly constrained convex optimization problems, and study the behavior of the dual sequence obtained from the optimal multipliers of the linear constraints of each subproblem. Under rather general assumptions, which cover most Bregman distances of interest, we obtain an ergodic convergence result, namely that a sequence of weighted averages of the dual sequence converges to the centroid of the dual optimal set. As an intermediate result, we prove under the same assumptions that the dual central path generated by a large class of barriers, including the generalized Bregman distances, converges to the same point.

Keywords: generalized proximal point methods, barrier function, centroid of the optimal set, Bregman distances, convergence of dual sequence, central path.

1 Introduction

The main goal of this paper is to analyze the behavior of dual sequences generated by generalized proximal point (GPP) algorithms with separable Bregman distances for solving the linearly constrained problem

$$\min\{f(x) : Ax = b,\ x \ge 0\}, \qquad (1)$$

* Instituto de Matemática Pura e Aplicada, Estrada Dona Castorina 110, Rio de Janeiro, RJ, CEP 22460-320, Brazil ([email protected]). The work of this author was partially supported by CNPq grant no. 301280/86.
† School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332 ([email protected]). The work of this author was partially supported by the National Science Foundation under grants INT-9600343 and CCR-9700448.


where $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable convex function, $A$ is an $m \times n$ real matrix, $b$ is a real $m$-vector and the variable $x$ is a real $n$-vector. These algorithms generate a sequence $\{x^k\}$ according to the iteration

$$x^{k+1} = \operatorname{argmin}\{f(x) + \lambda_k D_\varphi(x, x^k) : Ax = b\}, \qquad (2)$$

where $x^0 > 0$ is arbitrary, $\{\lambda_k\}$ is a bounded sequence of positive scalars and $D_\varphi$ is the Bregman distance determined by a convex barrier function $\varphi : \mathbb{R}^n_{++} \to \mathbb{R}$ of the form $\varphi(x) = \sum_{j=1}^n \varphi_j(x_j)$ according to (45). In a more general form than described above, this method is a generalization of the classical proximal point method studied in [20]. The optimality condition for (2) naturally determines a sequence of dual variables $\{s^k\}$ defined as $s^k \equiv \lambda_k[\nabla\varphi(x^k) - \nabla\varphi(x^{k+1})]$, which satisfies the dual condition $s^k \in \nabla f(x^{k+1}) + \operatorname{Im} A^T$ but not necessarily $s^k \ge 0$. A natural question is whether, under appropriate conditions, $\{s^k\}$ converges to the set of dual optimal solutions of (1) or, even stronger, to a specific dual optimal solution of (1). In this paper, we study the related issue of analyzing the behavior of the averaged dual sequence $\{\bar s^k\}$ constructed from $\{s^k\}$ as $\bar s^k \equiv \sum_{i=1}^k \eta^k_i s^i$, where the weights $\eta^k_i$ are determined as $\eta^k_i \equiv \lambda_i^{-1}/\sum_{l=1}^k \lambda_l^{-1}$ for $i = 1,\dots,k$. The main result we obtain is that $\{\bar s^k\}$, under appropriate conditions, converges to a specific dual optimal solution of (1), namely the centroid of the dual optimal set with respect to the barrier $h$.

Partial results regarding the behavior of the dual sequence have been obtained in a few papers, which we now discuss. Most of these results are described in a somewhat different framework, with a $\varphi$-divergence $d_\varphi(x,y)$ instead of a Bregman distance $D_\varphi(x,y)$ in (2) (see (60) and (61) in Section 5). For the entropic barrier, which can be seen as either the Bregman distance or the $\varphi$-divergence induced by the functions of Examples 1(a) and 2(a) in Section 3, respectively, it was proved in [22] that all cluster points of the sequence $\{\bar s^k\}$ are dual optimal solutions. The case of the shifted logarithmic barrier (i.e., the $\varphi$-divergence induced by the function of Example 2(b) in Section 3) was considered in [9], where it was proved that some cluster points of $\{\bar s^k\}$ are dual optimal solutions. This result was improved upon in [17], where it is proved that all cluster points of $\{\bar s^k\}$ are dual optimal solutions for a larger class of $\varphi$-divergences, but under a rather restrictive assumption, namely log-convexity of the conjugate functions $\varphi^*_j$, $j = 1,\dots,n$ (see the paragraph following (62)). These papers deal with the more general case of convex (rather than linear) constraints, but none of them establishes convergence of the whole sequence $\{\bar s^k\}$. The only result of this type appears in [18], where convergence of $\{\bar s^k\}$ to the centroid of the dual optimal set is proved, but only for linear programming with the shifted logarithmic barrier. We mention that, up to multiplicative and additive constants, the entropic barrier of Examples 1(a) and 2(a) in Section 3 is the only one which gives rise both to a Bregman distance and to a $\varphi$-divergence, so that all the results just mentioned apply essentially to only one Bregman distance, namely the entropic one.

With the goal of analyzing the behavior of the sequence $\{\bar s^k\}$, we first study the behavior of the path of solutions of the following family of problems, parametrized by a parameter $\mu > 0$:

$$\min\{f(x) + \mu D_\varphi(x, x^1) : Ax = b\}. \qquad (3)$$
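To fix ideas, here is a minimal numerical sketch (our illustration, not the authors' code) of iteration (2) and of the averaged dual sequence just defined, using the entropic kernel $\varphi_j(t) = t\log t$ of Example 1(a) in Section 3; the data $A$, $b$, $f$ and the constant stepsizes $\lambda_k$ are arbitrary choices made only for the demonstration.

```python
# Sketch of the generalized proximal point iteration (2) with the entropic
# Bregman kernel phi_j(t) = t*log(t) (Example 1(a)).  A, b, f are toy data.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0, 1.0]])                            # m = 1 linear constraint
b = np.array([1.0])
f = lambda x: np.sum((x - np.array([0.2, 0.5, 0.9]))**2)   # differentiable convex f

grad_phi = lambda x: 1.0 + np.log(x)                       # phi_j(t) = t log t
bregman  = lambda x, y: np.sum(x*np.log(x/y) - x + y)      # D_phi(x, y), cf. (45)

x = np.full(3, 1.0/3.0)                                    # x^0 > 0 with A x^0 = b
lam_hist, s_hist = [], []
for k in range(25):
    lam = 1.0                                              # bounded positive lambda_k
    res = minimize(lambda z, xk=x: f(z) + lam*bregman(z, xk), x,
                   method='SLSQP',
                   constraints=[{'type': 'eq', 'fun': lambda z: A @ z - b}],
                   bounds=[(1e-10, None)]*3)
    x_new = res.x
    s = lam*(grad_phi(x) - grad_phi(x_new))                # dual iterate s^k
    lam_hist.append(lam); s_hist.append(s)
    x = x_new

# Ergodic average: eta_i^k = lambda_i^{-1} / sum_l lambda_l^{-1}
eta = 1.0/np.array(lam_hist); eta /= eta.sum()
s_bar = eta @ np.array(s_hist)
print('x^k  =', x.round(4))
print('sbar =', s_bar.round(4))
```

A dedicated solver would normally replace the generic SLSQP call; it is used here only to keep the sketch self-contained.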

The path of solutions of (3) has been systematically studied in the paper by Iusem et al. [7]. The results in [7] are in turn related to the work of Auslender et al. [2], where existence, uniqueness and convergence of this path of solutions and of an associated dual path are established, under appropriate conditions, for barriers with different properties than those of $D_\varphi(\cdot, x^1)$. Extending the work of [7] from the dual point of view, in Section 2 we study the existence and convergence of the dual path of solutions associated with the above family of problems. Our analysis in this part uses several ideas from [2], which in turn generalizes several works dealing with the convergence behavior of the central path and of the continuous trajectories of various interior point algorithms for linear and convex programming (e.g., see McLinden [12, 11], Megiddo [13], Kojima et al. [10], Adler and Monteiro [1], Monteiro [14, 15] and Monteiro and Zhou [16]).

Our paper is organized as follows. In Section 2 we study the behavior of the dual path of solutions associated with the family of problems (3) and develop a convergence result for dual sequences that asymptotically behave like the dual path. Using this asymptotic result and the fact that $\{s^k\}$ asymptotically approaches points of the dual path for (3), we establish in Section 4 the convergence of $\{\bar s^k\}$ to a unique dual optimal solution of (1). In Section 3, we give several examples of well-known barriers that satisfy the required assumptions of the convergence results developed in Sections 2 and 4. We end the paper by giving in Section 5 some remarks and open problems.

1.1 Notation and terminology

The following notation is used throughout the paper. The superscript $T$ denotes transpose. $\mathbb{R}^p$ denotes the $p$-dimensional Euclidean space. The set of all $p \times q$ matrices with real entries is denoted by $\mathbb{R}^{p\times q}$. If $J$ is a finite index set then $|J|$ denotes its cardinality, that is, the number of elements of $J$. The Euclidean norm is denoted by $\|\cdot\|$. For a matrix $E$, $\operatorname{Im}(E)$ denotes the subspace generated by the columns of $E$ and $\operatorname{Null}(E)$ denotes the subspace orthogonal to the rows of $E$. The $i$-th component of a vector $w \in \mathbb{R}^n$ is denoted by $w_i$ for every $i = 1,\dots,n$. Given an index set $J \subset \{1,\dots,n\}$ and a vector $w \in \mathbb{R}^n$, we denote the subvector $[w_i]_{i\in J}$ by $w_J$; conversely, a vector $x \in \mathbb{R}^{|J|}$ is often denoted by $x_J$ (when we want to index its components by elements of $J$) and the set of these indexed vectors is denoted by $\mathbb{R}^J$. For $Y \subset \mathbb{R}^n$, we let $\operatorname{int} Y$ and $\operatorname{ri} Y$ denote, respectively, the interior and relative interior of $Y$. Given a convex function $g : \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}$, we denote its effective domain by $\operatorname{dom} g$, its conjugate function by $g^*$ and its subdifferential by $\partial g$; moreover, we denote the set $\{x \in \mathbb{R}^n : \partial g(x) \ne \emptyset\}$ by $\operatorname{dom}(\partial g)$.

2 The dual central path associated with general barriers

We consider the linearly constrained convex programming problem

$$\min\{f(x) : Ax = b,\ x \ge 0\}, \qquad (4)$$

with $f : \mathbb{R}^n \to \mathbb{R}$ convex and differentiable, $A \in \mathbb{R}^{m\times n}$, $b \in \mathbb{R}^m$. We make two assumptions on problem (4), whose solution set will be denoted by $X^*$:

A1) $X^* \ne \emptyset$.

A2) $\{x \in \mathbb{R}^n : Ax = b\} \cap \mathbb{R}^n_{++} \ne \emptyset$.

Associated with problem (4), we have the Lagrangian dual problem

$$\max\{\psi(s) : s \ge 0\}, \qquad (5)$$

where $\psi : \mathbb{R}^n \to \mathbb{R}\cup\{-\infty\}$ is defined as $\psi(s) = \inf\{f(x) - x^T s : Ax = b\}$ for all $s \in \mathbb{R}^n$. Under condition A1, it is known that the set of optimal solutions of (5), which we denote by $S^*$, is a nonempty polyhedral set, which is bounded when in addition A2 holds.

We consider separable barrier functions $h$ for the nonnegative orthant, of the form $h(x) \equiv \sum_{j=1}^n h_j(x_j)$, where the functions $h_j : \mathbb{R}_{++} \to \mathbb{R}$ satisfy:

A3) $h_j$ is strictly convex and differentiable for $j = 1,\dots,n$.

A4) there exists $\hat x \in \mathbb{R}^n_{++}$ such that $A\hat x = b$ and $\nabla h(\hat x) = 0$.

A5) $\lim_{t\to 0} h'_j(t) = -\infty$ for $j = 1,\dots,n$.

We also need a joint assumption on problem (4) and the barrier function $h$:

A6) either (i) $\lim_{t\to 0} h_j(t) < \infty$ for $j = 1,\dots,n$, or (ii) $X^*$ is bounded, or (iii) $f$ is linear.

Condition A6(i) means that $h$ can be continuously extended to $\mathbb{R}^n_+$. The central path $\{x(\mu) : \mu > 0\}$ with respect to the barrier $h$ is defined as follows. For $\mu \in \mathbb{R}_{++}$, let

$$x(\mu) \equiv \operatorname{argmin}\{f(x) + \mu h(x) : Ax = b\}. \qquad (6)$$

A more general framework, studied in [7], considers a pair $(T, C)$, consisting of an operator $T$ and a convex set $C$, together with the path $\{x(\mu) : \mu > 0\}$ where, for every $\mu > 0$, $x(\mu)$ is the unique solution of

$$0 \in T(x) + \mu\nabla h(x), \qquad (7)$$

with $h$ having the property that $\nabla h$ "diverges" on the boundary of $C$ ($h$ is not required to be separable). Problem (1)-(3) is a special case of this framework in which the pair $(T, C)$ is given by $C = \mathbb{R}^n_+$ and $T(x) = \nabla f(x) + \partial I_L(x)$, where $L = \{x \in \mathbb{R}^n : Ax = b\}$ and $I_L$ is the indicator function of $L$, i.e. $I_L(x) = 0$ if $x \in L$, and $I_L(x) = \infty$ otherwise. It can be easily verified that the hypotheses made in [7] on $h$, $T$ and $C$ for the next three results to hold are valid when some of the conditions A1-A6 are assumed.

Proposition 1. Under conditions A1-A5, $x(\mu)$ exists, is unique and strictly positive for every $\mu > 0$.

Proof. See Proposition 2 of [7].

We say that $\bar x$ is a cluster point of the central path $\{x(\mu) : \mu > 0\}$ if $\bar x = \lim_{k\to\infty} x(\mu_k)$ for some sequence $\{\mu_k\} \subset \mathbb{R}_{++}$ such that $\lim_{k\to\infty}\mu_k = 0$.

Proposition 2. Under conditions A1-A6, the central path $\{x(\mu) : \mu > 0\}$ is bounded and all its cluster points belong to $X^*$.

Proof. See Proposition 5 of [7].

In some cases it is possible to prove that the central path converges to a specific point in $X^*$. We present a result dealing with this issue in the next proposition, though we will not need it in our analysis of the dual central path. Let $B \equiv \{j : x_j > 0 \text{ for some } x \in X^*\}$ and $N \equiv \{1,\dots,n\}\setminus B$.

Proposition 3. Under conditions A1-A5, the following two implications hold.

i) If A6(i) holds then $\lim_{\mu\to 0} x(\mu) = x^*$, where $x^* \equiv \operatorname{argmin}_{x\in X^*} h(x)$.

ii) If A6(iii) holds then $\lim_{\mu\to 0} x(\mu) = x^*$, where $x^* \equiv \operatorname{argmin}_{x\in X^*} \sum_{j\in B} h_j(x_j)$.

Proof. See Theorem 1 of [7] for a proof of (i) and Theorem 2 of [7] for a proof of (ii).

For every $\mu > 0$, define $s(\mu) \in \mathbb{R}^n$ as

$$s(\mu) \equiv -\mu\nabla h(x(\mu)). \qquad (8)$$

The optimality condition for $x(\mu)$ to be a solution of (6) is that

$$s(\mu) \in \nabla f(x(\mu)) + \operatorname{Im} A^T. \qquad (9)$$
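Before analyzing $s(\mu)$, here is a small numerical sketch (ours, with made-up data) of the objects just defined: it computes $x(\mu)$ from (6) with a generic constrained solver, for a toy linear $f$ and the entropic Bregman-type barrier of Example 1(a) in Section 3, and then evaluates $s(\mu)$ via (8).

```python
# Sketch: dual central path s(mu) = -mu * grad h(x(mu)) of (8) for a toy LP,
# with the entropic barrier h_j(t) = t log(t/xhat_j) - t + xhat_j (so that
# grad h(xhat) = 0, condition A4).  All data below are assumptions.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0, 1.0]]); b = np.array([1.0])
c = np.array([1.0, 1.0, 2.0])                   # linear objective f(x) = c.x
xhat = np.full(3, 1.0/3.0)                      # A xhat = b, xhat > 0
h      = lambda x: np.sum(x*np.log(x/xhat) - x + xhat)
grad_h = lambda x: np.log(x/xhat)

for mu in [1.0, 0.5, 0.2, 0.1]:
    res = minimize(lambda z: c @ z + mu*h(z), xhat, method='SLSQP',
                   constraints=[{'type': 'eq', 'fun': lambda z: A @ z - b}],
                   bounds=[(1e-12, None)]*3)
    s_mu = -mu*grad_h(res.x)                    # definition (8)
    print(f'mu={mu:5.2f}  x(mu)={res.x.round(4)}  s(mu)={s_mu.round(4)}')
# (9) holds up to solver tolerance, and as mu decreases s(mu) approaches the
# unique dual optimal solution of this toy instance, here (0, 0, 1),
# illustrating Propositions 6 and 7 below.
```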

The path $\{s(\mu) : \mu > 0\}$ will be called the dual central path with respect to the barrier $h$. We are interested in the behavior of $s(\mu)$ as $\mu$ goes to 0. We start by characterizing $s(\mu)$ as the solution of a convex optimization problem. Define

$$\hat h^*_\mu(s) = \sum_{j=1}^n h^*_j\left(\frac{-s_j}{\mu}\right). \qquad (10)$$

We need a preliminary result on $\operatorname{dom}\hat h^*_\mu$.

Proposition 4. For some scalars $\beta_1,\dots,\beta_n \in (0,\infty]$, we have

$$(-\mu\beta_1,\infty)\times\cdots\times(-\mu\beta_n,\infty) = \operatorname{dom}(\partial\hat h^*_\mu) = \operatorname{int}[\operatorname{dom}\hat h^*_\mu].$$

Proof. Observe that $h'_j$ is strictly increasing by A3. Let $\beta_j \equiv \lim_{t\to\infty} h'_j(t)$ ($\beta_j$ is possibly $+\infty$). By A5, $h'_j(\mathbb{R}_{++}) = (-\infty,\beta_j)$. By A3 and A4, we have $0 = h'_j(\hat x_j) < h'_j(\hat x_j + 1) < \beta_j$, where $\hat x$ is as in A4. Hence, $\{\beta_1,\dots,\beta_n\} \subset (0,\infty]$. It is well known (e.g., see Section 26 of [19]) that $(h^*_j)'$ is the inverse function of $h'_j$. Therefore $\operatorname{dom}(h^*_j)' = h'_j(\mathbb{R}_{++}) = (-\infty,\beta_j)$, which immediately yields the first equality of the proposition. Since $\operatorname{dom}(\partial\hat h^*_\mu) \subset \operatorname{dom}\hat h^*_\mu$ and the first set is open, it follows that $\operatorname{dom}(\partial\hat h^*_\mu) \subset \operatorname{int}[\operatorname{dom}\hat h^*_\mu]$. The reverse inclusion follows from Theorem 23.7 of [19].
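As a concrete instance of Proposition 4 (our computation, anticipating Example 2(b) of Section 3), take the divergence-type barrier built from $\varphi_j(t) = t - \log t - 1$, for which $h_j(t) = t - \hat x_j\log(t/\hat x_j) - \hat x_j$. Then

$$h'_j(t) = 1 - \frac{\hat x_j}{t}, \qquad \beta_j = \lim_{t\to\infty} h'_j(t) = 1,$$

so $\operatorname{dom}(\partial\hat h^*_\mu) = (-\mu,\infty)^n$: the dual central path may have negative components, but none smaller than $-\mu$.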
Next we present the optimization problem whose solution is $s(\mu)$.

Proposition 5. Take any $\tilde x \in \{x \in \mathbb{R}^n : Ax = b\}$. Then $s(\mu)$ belongs to $\operatorname{int}[\operatorname{dom}\hat h^*_\mu]$ and is the unique solution of

$$\min\left\{\tilde x^T s + \mu\hat h^*_\mu(s) : s \in \nabla f(x(\mu)) + \operatorname{Im} A^T\right\}. \qquad (11)$$

Proof. Observe that by (9), $s(\mu)$ is a feasible solution of (11). Using (8), (10) and the relation between $h'_j$ and $(h^*_j)'$ (namely, $h'_j(t) = r \Leftrightarrow (h^*_j)'(r) = t$), we obtain

$$\nabla\hat h^*_\mu(s(\mu)) = -\mu^{-1}\nabla h^*\left(\frac{-s(\mu)}{\mu}\right) = -\mu^{-1}x(\mu), \qquad (12)$$

where $\nabla h^* \equiv ((h^*_1)',\dots,(h^*_n)')$. Hence, by Proposition 4, we have $s(\mu) \in \operatorname{dom}(\partial\hat h^*_\mu) = \operatorname{int}[\operatorname{dom}\hat h^*_\mu]$. Using (12), we obtain

$$\tilde x + \mu\nabla\hat h^*_\mu(s(\mu)) = \tilde x - x(\mu) \in \operatorname{Null}(A), \qquad (13)$$

which shows that $s(\mu)$ satisfies the optimality conditions of (11). By A3 and A5, each $h_j$ is essentially smooth, so each $h^*_j$ is strictly convex on the interior of its domain (cf. Theorem 26.3 of [19]), and hence so is $\hat h^*_\mu$. As a consequence, it follows that $s(\mu)$ is the unique optimal solution of problem (11).

Next we prove that the dual central path $\{s(\mu) : \mu > 0\}$ is bounded. We first need a preliminary result of some interest on its own, which requires some notation. Let $\mathcal{P}$ denote the set of all triplets $P = (I,J,K)$ where $I, J, K$ are pairwise disjoint subsets of $\{1,\dots,n\}$ satisfying $I\cup J\cup K = \{1,\dots,n\}$. For $a \in \mathbb{R}^n$ and $P \in \mathcal{P}$, define $O^a_P \subset \mathbb{R}^n$ as

$$O^a_P \equiv \{x \in \mathbb{R}^n : x_j > a_j \text{ for } j \in I,\ x_j = a_j \text{ for } j \in J,\ x_j < a_j \text{ for } j \in K\}. \qquad (14)$$

For $Y \subset \mathbb{R}^n$ and $a \in \mathbb{R}^n$, let

$$B^a_{\mathcal{P}}(Y) \equiv \bigcup\left\{O^a_P\cap Y : P \in \mathcal{P} \text{ such that } O^a_P\cap Y \text{ is nonempty and bounded}\right\}. \qquad (15)$$

For an arbitrary $Y \subset \mathbb{R}^n$, the set $B^a_{\mathcal{P}}(Y)$ may be empty. As a consequence of Lemmas 1 and 2 stated below, it follows that $B^a_{\mathcal{P}}(Y)$ is nonempty whenever $Y$ is an affine manifold or, more generally, a certain set of parallel affine manifolds. Observe that $B^a_{\mathcal{P}}(Y)$ is always bounded, since it is the union of a finite number of bounded sets.

Lemma 1. Let vectors $a, u \in \mathbb{R}^n$ and a linear subspace $H \subset \mathbb{R}^n$ be given. Suppose that $g : \mathbb{R}^n \to \mathbb{R}\cup\{\infty\}$ is a closed convex function of the form $g(w) = \sum_{j=1}^n g_j(w_j)$ such that $\operatorname{ri}(\operatorname{dom} g)\cap(H+u) \ne \emptyset$ and, for each $j = 1,\dots,n$, $g_j$ is strictly convex and $0 \in \partial g_j(a_j)$. Then, the problem

$$\min\{g(w) : w \in H + u\} \qquad (16)$$

has a unique solution $\bar w$, which belongs to $B^a_{\mathcal{P}}(H+u)$.

Proof. Observe that $a$ is the unique unconstrained minimizer of $g$, since $0 \in \partial g(a) = \partial g_1(a_1)\times\cdots\times\partial g_n(a_n)$ and each $g_j$, and hence $g$, is strictly convex. Since $g$ is a closed convex function, it follows that all nonempty level sets of $g$ are compact. The assumption implies that $\operatorname{dom} g\cap(H+u) \ne \emptyset$, which together with the compactness of the level sets of $g$ and the closedness of $H+u$ guarantees that (16) has a solution $\bar w \in H+u$. Moreover, $\bar w$ is unique due to the strict convexity of $g$. The assumption that $\operatorname{ri}(\operatorname{dom} g)\cap\operatorname{ri}(H+u) \ne \emptyset$ and Theorem 27.4 of [19] imply that $\bar w$ satisfies the optimality condition that, for some $q \in \partial g(\bar w)$,

$$q^T d = 0 \text{ for all } d \in H. \qquad (17)$$

We now show that $\bar w \in B^a_{\mathcal{P}}(H+u)$. Since $\mathbb{R}^n = \bigcup_{P\in\mathcal{P}}\operatorname{ri} O^a_P$, there exists $P \in \mathcal{P}$ such that $\bar w \in (\operatorname{ri} O^a_P)\cap(H+u)$. We claim that $O^a_P\cap(H+u)$ is bounded. Indeed, assume for contradiction that this set is unbounded. Applying Proposition 2.2.3 of [4] to the closure of the convex set $O^a_P\cap(H+u)$, we conclude that there exists $d \in \mathbb{R}^n$, $d \ne 0$, such that the halfline $\{\bar w + td : t \ge 0\}$ is contained in $O^a_P\cap(H+u)$. Using the definition of $O^a_P$ and the facts that $\bar w \in \operatorname{ri} O^a_P$ and $H$ is a subspace, we easily see that $d \in H$ and

$$d_j > 0 \Rightarrow \bar w_j > a_j, \qquad (18)$$
$$d_j < 0 \Rightarrow \bar w_j < a_j. \qquad (19)$$

Noting that $\partial g_j$ is strictly monotone due to the strict convexity of $g_j$, and using the facts that $0 \in \partial g_j(a_j)$ and $q_j \in \partial g_j(\bar w_j)$, we conclude that

$$\bar w_j > a_j \Rightarrow q_j > 0, \qquad (20)$$
$$\bar w_j < a_j \Rightarrow q_j < 0. \qquad (21)$$

It follows from the implications (18)-(21) that $d_j \ne 0 \Rightarrow q_j d_j > 0$. Since $d \ne 0$, this implies that $q^T d > 0$. On the other hand, since $d \in H$, it follows from (17) that $q^T d = 0$, thus yielding the desired contradiction. Hence $O^a_P\cap(H+u)$ is bounded, and therefore $\bar w \in B^a_{\mathcal{P}}(H+u)$.

We observe that the above result for the case in which $g(w) = \|Dw\|^2$, for some positive diagonal matrix $D$, has already been derived in Todd [21] (see the paragraph before Proposition 2 of [21]). The following result allows us to show that the solution $\bar w$ of problem (16) remains bounded when the (parameter) vector $u$ varies in a bounded set. We omit its trivial proof.

Lemma 2. If $U \subset \mathbb{R}^n$ is a bounded set and $H \subset \mathbb{R}^n$ is a linear subspace then, for any $a \in \mathbb{R}^n$, we have

$$\bigcup_{u\in U} B^a_{\mathcal{P}}(H+u) = B^a_{\mathcal{P}}(H+U).$$
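The objects (14)-(15) are easy to visualize computationally. The following sketch (ours; the data $a$, $u$, $d$ are made up, and openness of interval endpoints is ignored) enumerates the patterns $P = (I,J,K)$ for a line $H + u$ in $\mathbb{R}^2$ and collects the nonempty bounded pieces forming $B^0_{\mathcal{P}}(H+u)$:

```python
# Sketch: the orthant pieces O_P^a of (14) and the bounded union B_P^a of (15)
# for a line H + u = {u + t*d : t real} in R^2, with a = 0 and every d_j != 0.
import itertools, math

a = (0.0, 0.0)
u, d = (1.0, -1.0), (1.0, 1.0)                 # toy data
bounded_pieces = []
for P in itertools.product('>=<', repeat=2):   # coordinatewise pattern (I, J, K)
    lo, hi = -math.inf, math.inf               # O_P^0 cap (H+u) as a t-interval
    for aj, uj, dj, rel in zip(a, u, d, P):
        t0 = (aj - uj)/dj                      # u_j + t*d_j = a_j at t = t0
        if rel == '=':
            lo, hi = max(lo, t0), min(hi, t0)
        elif (rel == '>') == (dj > 0):         # u_j + t*d_j > a_j iff t > t0 when d_j > 0
            lo = max(lo, t0)
        else:
            hi = min(hi, t0)
    if lo <= hi and math.isfinite(lo) and math.isfinite(hi):
        bounded_pieces.append((''.join(P), lo, hi))

print(bounded_pieces)   # nonempty, so B_P^0(H+u) is nonempty, as Lemma 1 predicts
```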

Next we use Lemmas 1 and 2 to establish boundedness of the dual central path $\{s(\mu)\}$.

Proposition 6. Under conditions A1-A6, the curve $\{s(\mu) : \mu > 0\}$ defined by (8) is bounded.

Proof. We will cast the optimization problem (11) in the framework of Lemma 1. Take $H \equiv \operatorname{Im} A^T$, $u \equiv \nabla f(x(\mu))$, $a \equiv 0$ and $g(s) = \sum_{j=1}^n g_j(s_j)$, where $g_j(t) \equiv \hat x_j t + \mu h^*_j(-t/\mu)$ for every $t \in \mathbb{R}$ and $\hat x$ is as in A4. By (10) and Proposition 5 with $\tilde x = \hat x$, $s(\mu)$ is the solution $\bar w$ of problem (16) for this choice of $g$, $H$ and $u$. We now check that the hypotheses of Lemma 1 hold. Since each $h_j$ is essentially smooth, $h^*_j$, and hence $g_j$, is strictly convex. The closedness of $g$ follows from the closedness of the conjugate functions $h^*_j$. By Proposition 5, $s(\mu) \in \operatorname{int}(\operatorname{dom}\hat h^*_\mu)\cap(\operatorname{Im} A^T + \nabla f(x(\mu))) = \operatorname{int}(\operatorname{dom} g)\cap(H+u)$. Finally, we check that $0 \in \partial g_j(a_j)$, or equivalently $g'_j(0) = 0$, for $j = 1,\dots,n$. Note that $g'_j(t) = \hat x_j - (h^*_j)'(-t/\mu)$, so that $g'_j(0) = \hat x_j - (h^*_j)'(0)$. By A4 and the relation between $(h^*_j)'$ and $h'_j$, we conclude that $g'_j(0) = 0$. Since all the assumptions of Lemma 1 hold, it follows that $s(\mu) \in B^0_{\mathcal{P}}(H+u) = B^0_{\mathcal{P}}(H + \nabla f(x(\mu)))$ for all $\mu > 0$. Now, let $U \equiv \{\nabla f(x(\mu)) : \mu > 0\}$. Note that the set $\{x(\mu) : \mu > 0\}$ is bounded by Proposition 2. Together with the convexity and differentiability of $f$, whose effective domain is $\mathbb{R}^n$, this implies that $U$ is bounded. Hence, by Lemma 2 we conclude that $s(\mu) \in B^0_{\mathcal{P}}(H + \nabla f(x(\mu))) \subset B^0_{\mathcal{P}}(H+U)$ for all $\mu > 0$. Since $B^0_{\mathcal{P}}(H+U)$ is bounded, the result follows.

A point $\bar s \in \mathbb{R}^n$ will be said to be a cluster point of $\{s(\mu)\}$ if $\bar s = \lim_{k\to\infty}s(\mu_k)$ for some $\{\mu_k\} \subset \mathbb{R}_{++}$ such that $\lim_{k\to\infty}\mu_k = 0$. We prove next that the cluster points of $\{s(\mu)\}$, which exist by Proposition 6, are dual solutions of problem (4), i.e. solutions of problem (5).

Proposition 7. Under conditions A1-A6, if $\bar s$ is a cluster point of $\{s(\mu)\}$ then $\bar s$ belongs to $S^*$.

Proof. By well known duality results applied to problem (4), it suffices to show that

$$\bar s \in \operatorname{Im} A^T + \nabla f(\bar x), \qquad \bar x^T\bar s = 0, \qquad \bar s \ge 0 \qquad (22)$$

for some $\bar x \in X^*$. Assume that $\bar s = \lim_{k\to\infty}s(\mu_k)$ with $\lim_{k\to\infty}\mu_k = 0$. In view of Proposition 2, we can assume without loss of generality (by refining the sequence $\{\mu_k\}$ if necessary) that $\bar x \equiv \lim_{k\to\infty}x(\mu_k)$ exists. By Proposition 2, $\bar x \in X^*$. By (9), we have $s(\mu_k) \in \nabla f(x(\mu_k)) + \operatorname{Im} A^T$ for all $k$. Letting $k \to \infty$ in this relation, and using the facts that $\operatorname{Im} A^T$ is a closed set and that the gradient of a convex and differentiable function is continuous in the interior of its effective domain, which in the case of $f$ is the whole $\mathbb{R}^n$, we obtain the first relation of (22). To verify the second and third relations of (22), it suffices to show that $\bar s_j = 0$ whenever $\bar x_j > 0$, and $\bar s_j \ge 0$ whenever $\bar x_j = 0$. Indeed, by (8) we have

$$s(\mu_k)_j = -\mu_k h'_j(x(\mu_k)_j), \qquad j = 1,\dots,n. \qquad (23)$$

When $\bar x_j > 0$, (23) implies that $\bar s_j = \lim_{k\to\infty}s(\mu_k)_j = 0$, due to the facts that $x(\mu_k)_j$ converges to $\bar x_j > 0$, $h'_j(t)$ is finite for all $t > 0$ and $\lim_{k\to\infty}\mu_k = 0$. When $\bar x_j = 0$, we have $\lim_{k\to\infty}x(\mu_k)_j = 0$, and hence $\lim_{k\to\infty}h'_j(x(\mu_k)_j) = -\infty$ by A5. By (23), this implies that $s(\mu_k)_j \ge 0$ for large enough $k$, and hence $\bar s_j = \lim_{k\to\infty}s(\mu_k)_j \ge 0$.

We have established that all cluster points of $\{s(\mu)\}$ are dual optimal solutions. We would like to characterize them in a way similar to the primal result in Proposition 3. For this purpose, we need to impose a slightly convoluted assumption on $h$. We will show later on that this assumption holds in most significant cases. Let $N' \subset \{1,\dots,n\}$ be defined as

$$N' = \{j : s_j > 0 \text{ for some } s \in S^*\}, \qquad (24)$$
$$B' = \{1,\dots,n\}\setminus N'. \qquad (25)$$

We remark that for linear $f$, i.e. in the linear programming case, it holds that $N' = N$ and $B' = B$, with $B$ and $N$ as defined just before Proposition 3. This is not true for nonlinear $f$. For instance, the problem $\min f(x_1,x_2) = x_1^2$ s.t. $x_2 = 1$, $x \ge 0$ has $X^* = \{(0,1)\}$ and $S^* = \{(0,0)\}$, and hence $N = \{1\}$ and $N' = \emptyset$. Clearly, $N' \subset N$ and $B' \supset B$. Since, by definition, $s_j = 0$ for all $j \in B'$ and $s \in S^*$, it follows from Propositions 6 and 7 that conditions A1-A6 imply that $\lim_{\mu\to 0}s(\mu)_j = 0$ for all $j \in B'$. In particular, if $N' = \emptyset$ then $\lim_{\mu\to 0}s(\mu) = 0$ and $S^* = \{0\}$. The interesting case is therefore $N' \ne \emptyset$, which we will analyze in the next proposition. Define $h^*_{\mu,J} : \mathbb{R}^J \to \mathbb{R}\cup\{\infty\}$ by

$$h^*_{\mu,J}(s_J) = \sum_{j\in J} h^*_j\left(\frac{-s_j}{\mu}\right), \qquad \text{for all } s_J \in \mathbb{R}^J. \qquad (26)$$

The required assumption is stated next.

A7) There exist an interval $T \subset \mathbb{R}$ and a function $\Phi : T\times\mathbb{R}_{++} \to \mathbb{R}$ such that:

i) $\Phi(\cdot,\mu)$ is nondecreasing on $T$ for all $\mu > 0$;

ii) for every $\mu > 0$ and $J \subset N'$, we have $h^*_{\mu,J}(\mathbb{R}^J_{++}) \subset T$ and the function $\Phi(h^*_{\mu,J}(\cdot),\mu)$ is convex over $\mathbb{R}^J_{++}$;

iii) for every $J \subset N'$, there exists a closed convex function $\chi_J : \mathbb{R}^J_+ \to \mathbb{R}\cup\{\infty\}$ such that $\operatorname{dom}(\chi_J) \supset \mathbb{R}^J_{++}$ and $\chi_J(s_J) = \lim_{\mu\to 0}\Phi(h^*_{\mu,J}(s_J),\mu)$ for all $s_J > 0$.

We mention that, in view of (i), a sufficient condition for (ii) is that $\Phi(\cdot,\mu)$ be convex for all $\mu > 0$. However, in some relevant examples $\Phi(\cdot,\mu)$ is not convex, while (ii) holds. Under A7, we have the following characterization of the cluster points of $\{s(\mu)\}$.

Proposition 8. Assume that conditions A1-A7 hold and $N' \ne \emptyset$. Then, all cluster points of $\{s(\mu)\}$ are solutions of the problem $\min\{\chi_{N'}(s_{N'}) : s \in S^*\}$, with $\chi_{N'}$ as in A7(iii).

Proof. Let $\bar s$ be a cluster point of $\{s(\mu)\}$. Assume that $\bar s = \lim_{k\to\infty}s(\mu_k)$ with $\lim_{k\to\infty}\mu_k = 0$. Take any $s' \in \operatorname{ri} S^* \ne \emptyset$ and fix $\epsilon \in (0,1)$. Let $\hat s^k \equiv s(\mu_k) + \epsilon(s' - \bar s)$ and $\tilde s^k \equiv s(\mu_k) + s' - \bar s$ for all $k$. Then $\lim_{k\to\infty}\hat s^k = \bar s + \epsilon(s' - \bar s)$ and $\lim_{k\to\infty}\tilde s^k = s'$. Let $x^* \in X^*$ be given. Then $\bar s, s' \in \nabla f(x^*) + \operatorname{Im} A^T$, from which it follows that $s' - \bar s \in \operatorname{Im} A^T$. This implies that $\tilde s^k \in \nabla f(x(\mu_k)) + \operatorname{Im} A^T$ for all $k$. Using this, Proposition 5 with $\tilde x = x^*$, the convexity of the minimand in (11) and the fact that $\hat s^k = (1-\epsilon)s(\mu_k) + \epsilon\tilde s^k$, we conclude that

$$(x^*)^T\hat s^k + \mu_k\hat h^*_{\mu_k}(\hat s^k) \le \max\{(x^*)^T s(\mu_k) + \mu_k\hat h^*_{\mu_k}(s(\mu_k))\,,\ (x^*)^T\tilde s^k + \mu_k\hat h^*_{\mu_k}(\tilde s^k)\} \le (x^*)^T\tilde s^k + \mu_k\hat h^*_{\mu_k}(\tilde s^k). \qquad (27)$$

Since $x^* \in X^*$ and $\bar s, s' \in S^*$, we have $(x^*)^T\bar s = (x^*)^T s' = 0$, from which it follows that $(x^*)^T\hat s^k = (x^*)^T\tilde s^k$ for all $k$. Using this in (27), we conclude that

$$\sum_{j=1}^n h^*_j\left(\frac{-\hat s^k_j}{\mu_k}\right) = \hat h^*_{\mu_k}(\hat s^k) \le \hat h^*_{\mu_k}(\tilde s^k) = \sum_{j=1}^n h^*_j\left(\frac{-\tilde s^k_j}{\mu_k}\right). \qquad (28)$$

Moreover, since $s'_{B'} = \bar s_{B'} = 0$, we have $\hat s^k_{B'} = \tilde s^k_{B'}$ for all $k$. This together with the last inequality implies that

$$h^*_{\mu_k,N'}(\hat s^k_{N'}) = \sum_{j\in N'} h^*_j\left(\frac{-\hat s^k_j}{\mu_k}\right) \le \sum_{j\in N'} h^*_j\left(\frac{-\tilde s^k_j}{\mu_k}\right) = h^*_{\mu_k,N'}(\tilde s^k_{N'}). \qquad (29)$$

Using the definition of $N'$ and the fact that $s' \in \operatorname{ri} S^*$, we easily see that $s'_{N'} > 0$. Hence, $\lim_{k\to\infty}\hat s^k_{N'} > 0$ and $\lim_{k\to\infty}\tilde s^k_{N'} > 0$, from which it follows that, for some $k_0$, $\hat s^k_{N'} > 0$ and $\tilde s^k_{N'} > 0$ for all $k \ge k_0$. By A7(i)-(ii) and (29), we obtain

$$\Phi(h^*_{\mu_k,N'}(\hat s^k_{N'}),\mu_k) \le \Phi(h^*_{\mu_k,N'}(\tilde s^k_{N'}),\mu_k), \qquad \forall k \ge k_0. \qquad (30)$$

By A7(ii) and A7(iii), the functions $\Phi(h^*_{\mu_k,N'}(\cdot),\mu_k)$ restricted to the set $\mathbb{R}^{N'}_{++}$ are finite, convex and converge pointwise to $\chi_{N'}$. By Theorem 10.8 of [19], such convergence is uniform on any compact subset of $\mathbb{R}^{N'}_{++}$. This together with the facts that $\lim_{k\to\infty}\hat s^k_{N'} = \bar s_{N'} + \epsilon(s'_{N'} - \bar s_{N'}) > 0$ and $\lim_{k\to\infty}\tilde s^k_{N'} = s'_{N'} > 0$ allows us to conclude, after letting $k$ tend to $\infty$ in (30), that

$$\chi_{N'}(\bar s_{N'} + \epsilon(s'_{N'} - \bar s_{N'})) \le \chi_{N'}(s'_{N'}). \qquad (31)$$

Since (31) holds for any $\epsilon \in (0,1)$ and $s' \in \operatorname{ri} S^*$, and $\chi_{N'}$ is a closed convex function by A7(iii), it follows from Proposition IV.1.2.5 of [4] that

$$\chi_{N'}(\bar s_{N'}) = \lim_{\epsilon\to 0}\chi_{N'}(\bar s_{N'} + \epsilon(s'_{N'} - \bar s_{N'})) \le \chi_{N'}(s'_{N'}), \qquad \forall s' \in \operatorname{ri} S^*. \qquad (32)$$

Since any boundary point of $S^*$ can be approached by points in $\operatorname{ri} S^*$ lying on a segment (see Lemma III.2.1.6 of [4]), it follows again from Proposition IV.1.2.5 of [4] that (32) holds for any $s' \in S^*$.

The next corollary gives the characterization of the limit of the dual path $\{s(\mu)\}$.

Corollary 1. Assume that conditions A1-A7 hold and $N' \ne \emptyset$. If the function $\chi_{N'}$ of A7(iii) is strictly convex on $S^*$, then $\lim_{\mu\to 0}s(\mu)$ exists and is the unique solution $\bar s$ of the problem $\min\{\chi_{N'}(s_{N'}) : s \in S^*\}$.

We will see in the next section that in all but one of the examples it is possible to find $\Phi$ such that $\chi_{N'}$ is strictly convex on $S^*$. In the other case (perhaps the most relevant one), $\chi_{N'}$ is convex but not strictly convex. For this case, the convergence of $\{s(\mu)\}$ can be proved by using a refinement of Proposition 8, which we discuss next. When $\chi_{N'}$ is not strictly convex on $S^*$, the problem $\min\{\chi_{N'}(s_{N'}) : s \in S^*\}$ may have multiple solutions. Let $S_1$ denote the optimal solution set of this problem and define the index set $N_1 \equiv \{j \in N' : s_j \text{ is not constant on } S_1\}$. Consider now the problem $\min\{\chi_{N_1}(s_{N_1}) : s \in S_1\}$. Let $S_2$ denote its optimal solution set and define the index set $N_2 \equiv \{j \in N_1 : s_j \text{ is not constant on } S_2\}$. Continuing in this way, we obtain a sequence of sets $S^* = S_0 \supset S_1 \supset S_2 \supset \cdots$ and a sequence of index sets $N' = N_0 \supset N_1 \supset N_2 \supset \cdots$. The result stated below imposes the following condition on these sequences.

A8) There exists $r \ge 0$ such that $S_r = \{s^c\}$ for some $s^c \in \mathbb{R}^n$ (and hence $N_r = \emptyset$).
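To see the recursion at work, consider a hand-worked toy instance (our illustration, not taken from the paper), assuming the entropic limit function $\chi_J(s_J) = \max_{j\in J} e^{-s_j}$ of Example 1(a) in Section 3 and the hypothetical dual optimal set

$$S^* = \{(1,\,1,\,t,\,4-t) : t \in [1,3]\}, \qquad N' = N_0 = \{1,2,3,4\}.$$

On $S^*$ we have $\min(s_3,s_4) \ge 1$, so $\chi_{N_0} \equiv e^{-1}$ is constant there; hence $S_1 = S^*$ and $N_1 = \{3,4\}$, since only $s_3$ and $s_4$ fail to be constant on $S_1$. Minimizing $\chi_{N_1}(s_{N_1}) = \max(e^{-s_3}, e^{-s_4})$ over $S_1$ forces $s_3 = s_4 = 2$. Thus $S_2 = \{(1,1,2,2)\}$, condition A8 holds with $r = 2$, and the centroid is $s^c = (1,1,2,2)$.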

The point $s^c$ is referred to as the centroid of $S^*$ with respect to the barrier $h$. Note that condition A8 holds if and only if the sequence $N' = N_0 \supset N_1 \supset N_2 \supset \cdots$ is strictly decreasing, i.e. when at least one variable $s_i$ with $i \in N_{i-1}$ is constant on $S_i$, for $i = 1, 2, \dots$.

Proposition 9. Assume that conditions A1-A8 hold and $N' \ne \emptyset$. Then, the dual path $\{s(\mu)\}$ converges to the centroid $s^c$.

Proof. The proof is similar to the one of Proposition 8, except for a few minor points which we now discuss. Again, let $\bar s$ be as before. To show that $\bar s = s^c$, it is sufficient to prove that $\bar s \in S_i$ for every $i = 1,\dots,r$, due to condition A8. Clearly, by Proposition 8, we have $\bar s \in S_1$. Assume that $\bar s \in S_i$ and $i < r$. We will show that $\bar s \in S_{i+1}$. Clearly, this implies that $\bar s \in S_r = \{s^c\}$, and hence that the proposition holds. Indeed, let $s' \in \operatorname{ri} S_i$ be given and fix $\epsilon \in (0,1)$. Define the sequences $\{\hat s^k\}$ and $\{\tilde s^k\}$ as in the proof of Proposition 8. Arguing as in that proof, we easily see that (28) holds. Now, since $\bar s, s' \in S_i$, we have $\bar s_l = s'_l$ for every $l \notin N_i$, due to the definitions of the sets $N_j$. This implies that $\hat s^k_l = \tilde s^k_l$ for every $k$ and $l \notin N_i$. Hence, from (28) we deduce that (29) holds with $N'$ replaced by $N_i$. The rest of the proof goes exactly like in Proposition 8 with $N'$ and $S^*$ replaced by $N_i$ and $S_i$, respectively, and yields the conclusion that $\bar s$ solves the problem $\min\{\chi_{N_i}(s_{N_i}) : s \in S_i\}$, i.e. $\bar s \in S_{i+1}$.

We observe that a convergence result similar to Proposition 9 has been derived in [2], under different assumptions on the barrier $h$ and for the case of a linear objective function $f$. We will now provide a variation of the results stated above which will be used in Section 4 to analyze the behavior of dual sequences generated by generalized proximal point methods with Bregman distances.

Proposition 10. Suppose that conditions A1-A5 hold and let $\{x^k\} \subset \{x \in \mathbb{R}^n : Ax = b,\ x > 0\}$, $\{s^k\} \subset \mathbb{R}^n$, $\{g^k\} \subset \mathbb{R}^n$ and $\{\mu_k\} \subset \mathbb{R}_{++}$ be sequences such that

$$s^k + \mu_k\nabla h(x^k) = 0, \qquad s^k \in g^k + \operatorname{Im} A^T, \qquad \forall k \ge 0. \qquad (33)$$

Assume also that $\{x^k\}$ is bounded, $\lim_{k\to\infty}\mu_k = 0$ and $\lim_{k\to\infty}[g^k - \nabla f(x^k)] = 0$. Then:

a) for any $\tilde x \in \mathbb{R}^n$ such that $A\tilde x = b$ and any $k \ge 0$, $s^k$ is the unique optimal solution of the problem

$$\min\ \tilde x^T s + \mu_k\hat h^*_{\mu_k}(s) \qquad (34)$$
$$\text{s.t.}\ \ s \in g^k + \operatorname{Im} A^T; \qquad (35)$$

b) any cluster point of $\{x^k\}$ is a solution of (4);

c) the sequence $\{s^k\}$ is bounded and all its cluster points are contained in $S^*$;

d) if in addition conditions A7 and A8 hold, then $\{s^k\}$ converges to the centroid $s^c$ of $S^*$ with respect to $h$.


Proof. The proof of (a) is similar to the proof of Proposition 5. The boundedness of $\{s^k\}$ can be proved with the same arguments of the proof of Proposition 6, using the set $U \equiv \{g^k : k \ge 0\}$, which is clearly bounded due to the assumptions that $\{x^k\}$ is bounded and $\lim_{k\to\infty}[g^k - \nabla f(x^k)] = 0$. Assume now that $(\bar x, \bar s)$ is a cluster point of the sequence $\{(x^k, s^k)\}$. Using the fact that $\lim_{k\to\infty}[g^k - \nabla f(x^k)] = 0$ and arguments similar to those used in the proof of Proposition 7, it can be shown that

$$A\bar x = b, \qquad \bar x \ge 0, \qquad (36)$$
$$\bar s \in \operatorname{Im} A^T + \nabla f(\bar x), \qquad \bar x^T\bar s = 0, \qquad \bar s \ge 0. \qquad (37)$$

This clearly implies that $(\bar x, \bar s) \in X^*\times S^*$. It is now easy to see that (b) and (c) follow from the observations above. The proof of (d) is exactly like the one of Proposition 9.

It is worth emphasizing that Proposition 10 does not assume condition A6. Instead, it explicitly assumes that $\{x^k\}$ is bounded and $\lim_{k\to\infty}[g^k - \nabla f(x^k)] = 0$. Observe that for any sequence $\{\mu_k\} \subset \mathbb{R}_{++}$ such that $\lim_{k\to\infty}\mu_k = 0$, the hypotheses of Proposition 10 are satisfied when the sequences $\{x^k\}$, $\{s^k\}$ and $\{g^k\}$ are given by $x^k \equiv x(\mu_k)$, $s^k \equiv s(\mu_k)$ and $g^k \equiv \nabla f(x^k)$ for all $k$. Conclusions (a), (c) and (d) of Proposition 10 for this special case are analogous to the ones obtained earlier in Propositions 5, 6, 7 and 9; moreover, conclusion (b) yields an alternative proof of the second part of Proposition 2 (assuming that its first part is known).

3 Examples of barriers

In this section we give several examples of barriers that satisfy conditions A3-A5 of Section 2. We consider two types of barriers, called Bregman type and divergence type, respectively. For both types, we take a fixed $\hat x > 0$ such that $A\hat x = b$. Bregman type barriers are of the form

$$h(x) = \varphi(x) - \varphi(\hat x) - \langle\nabla\varphi(\hat x), x - \hat x\rangle \qquad (38)$$

with

$$\varphi(x) = \sum_{j=1}^n \varphi_j(x_j), \qquad (39)$$

where each $\varphi_j : \mathbb{R}_{++} \to \mathbb{R}$ is strictly convex, differentiable and satisfies $\lim_{t\to 0}\varphi'_j(t) = -\infty$. Neglecting constant terms, this is equivalent to $h(x) = \sum_{j=1}^n h_j(x_j)$ with

$$h_j(t) = \varphi_j(t) - \varphi'_j(\hat x_j)t. \qquad (40)$$

It follows easily that $h'_j(\hat x_j) = 0$ and that conditions A3-A5 hold. In this case the function $h^*_{\mu,J}$ of (26) takes the form

$$h^*_{\mu,J}(s_J) = \sum_{j\in J}\varphi^*_j\left(\varphi'_j(\hat x_j) - \frac{s_j}{\mu}\right). \qquad (41)$$

The divergence type barriers are of the form

$$h(x) = \sum_{j=1}^n \hat x_j\varphi_j\left(\frac{x_j}{\hat x_j}\right), \qquad (42)$$

with each $\varphi_j : \mathbb{R}_{++} \to \mathbb{R}$ strictly convex, differentiable and satisfying $\varphi_j(1) = \varphi'_j(1) = 0$ and $\lim_{t\to 0}\varphi'_j(t) = -\infty$. In the notation of the previous section, we have

$$h_j(t) = \hat x_j\varphi_j\left(\frac{t}{\hat x_j}\right). \qquad (43)$$

Again, it is easy to check that conditions A3-A5 hold for this type of barriers. The function $h^*_{\mu,J}$ of (26) takes the form

$$h^*_{\mu,J}(s_J) = \sum_{j\in J}\hat x_j\varphi^*_j\left(\frac{-s_j}{\mu}\right). \qquad (44)$$

Proposition 5 holds for the dual central path of any barrier of either type. To establish convergence of the whole dual central path, i.e. Proposition 9, we need to check conditions A6-A8. We will do this for several examples of each type. In each case we give the expressions of $\varphi_j$, $\varphi^*_j$, $h^*_{\mu,J}$, $T$, $\Phi$ and $\chi_J$.

1) Bregman type.

a) Let $\varphi_j(t) = t\log t$, and so $\varphi^*_j(t) = e^{t-1}$ and

$$h^*_{\mu,J}(s_J) = \sum_{j\in J}\hat x_j\,e^{-s_j/\mu}.$$

Define $T = \mathbb{R}_+$, $\Phi(t,\mu) = t^\mu$ and $\chi_J(s_J) = \max_{j\in J}\{e^{-s_j}\}$.

b) Let $\varphi_j(t) = -\log t$, and so $\varphi^*_j(t) = -\log(-t) - 1$ and

$$h^*_{\mu,J}(s_J) = -\sum_{j\in J}\log\left(\frac{1}{\hat x_j} + \frac{s_j}{\mu}\right) - |J|.$$

Define $T = \mathbb{R}$, $\Phi(t,\mu) = t + |J|(1 - \log\mu)$ and $\chi_J(s_J) = -\sum_{j\in J}\log s_j$.

c) Let $\varphi_j(t) = \gamma t - t^\gamma$ with $0 < \gamma < 1$, and so $\varphi^*_j(t) = (1-\gamma)(1 - t/\gamma)^{\gamma/(\gamma-1)}$ and

$$h^*_{\mu,J}(s_J) = (1-\gamma)\sum_{j\in J}\left(\hat x_j^{\gamma-1} + \frac{s_j}{\gamma\mu}\right)^{\gamma/(\gamma-1)}.$$

Define $T = \mathbb{R}_+$, $\Phi(t,\mu) = \kappa\mu^{-\alpha}t$ and $\chi_J(s_J) = \sum_{j\in J}s_j^{-\alpha}$, where $\alpha \equiv \gamma/(1-\gamma) > 0$ and $\kappa \equiv \gamma^{-\alpha}/(1-\gamma)$.

d) Let $\varphi_j(t) = t^\beta - t^\gamma$ with $\beta > 1$ and $0 < \gamma < 1$. In this case it is not possible to give closed expressions for $\varphi^*_j$ and $h^*_{\mu,J}$, but it can be proved that condition A7 holds with the same formulae as in Example 1(c).

2) Divergence type.

a) Let $\varphi_j(t) = t\log t - t + 1$. Define $h^*_{\mu,J}$, $T$, $\Phi$ and $\chi_J$ as in Example 1(a).

b) Let $\varphi_j(t) = t - \log t - 1$, and so $\varphi^*_j(t) = -\log(1 - t)$ and

$$h^*_{\mu,J}(s_J) = -\sum_{j\in J}\hat x_j\log\left(1 + \frac{s_j}{\mu}\right).$$

Define $T = \mathbb{R}$, $\Phi(t,\mu) = t - \left(\sum_{j\in J}\hat x_j\right)\log\mu$ and $\chi_J(s_J) = -\sum_{j\in J}\hat x_j\log s_j$.

c) Let $\varphi_j(t) = \gamma t - t^\gamma + (1-\gamma)$ with $0 < \gamma < 1$, and so $\varphi^*_j(t) = (1-\gamma)[(1 - t/\gamma)^{\gamma/(\gamma-1)} - 1]$ and

$$h^*_{\mu,J}(s_J) = (1-\gamma)\sum_{j\in J}\hat x_j\left[\left(1 + \frac{s_j}{\gamma\mu}\right)^{\gamma/(\gamma-1)} - 1\right].$$

Define $T = \mathbb{R}$, $\Phi(t,\mu) = \kappa\mu^{-\alpha}\left(t + (1-\gamma)\sum_{j\in J}\hat x_j\right)$ and $\chi_J(s_J) = \sum_{j\in J}\hat x_j s_j^{-\alpha}$, where $\alpha \equiv \gamma/(1-\gamma)$ and $\kappa \equiv \gamma^{-\alpha}/(1-\gamma)$.

d) Let $\varphi_j(t) = \nu t + t^{-\nu} - (\nu+1)$ with $\nu > 0$, and so $\varphi^*_j(t) = (\nu+1)[1 - (1 - t/\nu)^{\nu/(\nu+1)}]$ and

$$h^*_{\mu,J}(s_J) = (\nu+1)\sum_{j\in J}\hat x_j\left[1 - \left(1 + \frac{s_j}{\nu\mu}\right)^{\nu/(\nu+1)}\right].$$

Define $T = \mathbb{R}$, $\Phi(t,\mu) = \kappa\mu^\delta t$ and $\chi_J(s_J) = -\sum_{j\in J}\hat x_j s_j^\delta$, where $\delta \equiv \nu/(\nu+1)$ and $\kappa \equiv \nu^\delta/(\nu+1)$.

Note that the functions $\chi_J$ of cases 1(b)-(d) are strictly convex on $S^*$, and hence conditions A7 and A8 hold for these cases. In case 1(a), $\chi_J$ is convex, but not strictly convex. Hence, for this case condition A7 holds, and condition A8 can be easily established using the definition of $\chi_J$.

Condition A6(i) holds for cases 1(a), 1(c), 1(d), 2(a) and 2(c). In the other cases, the results of Corollary 1 and Proposition 9 (i.e. convergence of the whole dual path $\{s(\mu)\}$ to a unique point in $S^*$) are valid if either A6(ii) or A6(iii) holds (i.e. bounded primal optimal set or linear programming problem).
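The conjugate pairs listed above are easy to spot-check numerically. The following small script (ours, not from the paper) verifies $\varphi^*_j(r) = \sup_t\{rt - \varphi_j(t)\}$ for Examples 1(a) and 1(b) at sample points; the bracketing interval and the sample values of $r$ are arbitrary choices made for the test.

```python
# Numerical spot-check of the conjugates in Examples 1(a) and 1(b):
# phi(t) = t log t  has  phi*(r) = e^{r-1}   (any r),
# phi(t) = -log t   has  phi*(r) = -log(-r) - 1   (r < 0).
import numpy as np
from scipy.optimize import minimize_scalar

pairs = [
    (lambda t: t*np.log(t), lambda r: np.exp(r - 1.0),   0.5),   # Example 1(a)
    (lambda t: -np.log(t),  lambda r: -np.log(-r) - 1.0, -0.5),  # Example 1(b)
]
for phi, phi_star, r in pairs:
    sup = -minimize_scalar(lambda t: -(r*t - phi(t)),
                           bounds=(1e-9, 50.0), method='bounded').fun
    print(sup, phi_star(r))   # the two numbers agree to solver tolerance
```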

4 The dual sequence of the proximal point method with separable Bregman distances

A separable Bregman function with zone $\mathbb{R}^n_+$ is a function $\varphi : \mathbb{R}^n_{++} \to \mathbb{R}$ of the form $\varphi(x) = \sum_{j=1}^n\varphi_j(x_j)$, satisfying A3, A6(i) and additionally three technical conditions related to the behavior of $\nabla\varphi$ near the boundary of $\mathbb{R}^n_+$ (assumptions B3-B5 in [5]). From $\varphi$, we construct the Bregman distance $D_\varphi : \mathbb{R}^n_+\times\mathbb{R}^n_{++} \to \mathbb{R}$ as

$$D_\varphi(x,y) = \varphi(x) - \varphi(y) - \nabla\varphi(y)^T(x-y). \qquad (45)$$

Bregman functions which also satisfy A5 are said to be boundary coercive. The functions $\varphi$ with $\varphi_j$ as in Examples 1(a), 1(c) and 1(d) of Section 3 are boundary coercive separable Bregman functions with zone $\mathbb{R}^n_+$; $\varphi$ with $\varphi_j$ as in Example 1(b) fails to satisfy only condition A6(i). The proximal point method with Bregman distance $D_\varphi$ for solving problem (4) generates a sequence $\{x^k\} \subset \mathbb{R}^n_{++}$ defined as

$$x^0 \in \mathbb{R}^n_{++}, \qquad (46)$$
$$x^{k+1} = \operatorname{argmin}\{f(x) + \lambda_k D_\varphi(x, x^k) : Ax = b\}, \qquad (47)$$

where $\{\lambda_k\} \subset \mathbb{R}_{++}$ satisfies

$$\sum_{k=0}^\infty \lambda_k^{-1} = \infty. \qquad (48)$$

The following result on the convergence of $\{x^k\}$ given by (46)-(47) is known.

Proposition 11. Assume that conditions A1 and A2 hold and that $\varphi$ is a boundary coercive Bregman function with zone $\mathbb{R}^n_+$. Then, the sequence $\{x^k\}$ generated by (46)-(47) converges to a solution of problem (4).

Proof. See Theorem 4.1 of [6].

We mention that in [6] a condition stronger than (48), namely boundedness of $\{\lambda_k\}$, is assumed, but the proof can be easily modified to hold under our weaker assumption (48), as done in [3], which on the other hand imposes a condition on $\varphi$ stronger than A5, namely that $\nabla\varphi$ is onto. The optimality condition for $x^{k+1}$ to be a solution of (47) is that

$$s^k \in \nabla f(x^{k+1}) + \operatorname{Im} A^T, \qquad (49)$$

where, for every $k \ge 0$,

$$s^k \equiv \lambda_k[\nabla\varphi(x^k) - \nabla\varphi(x^{k+1})]. \qquad (50)$$

We are interested in the convergence properties of the dual sequence $\{s^k\}$. Using the relation between $\varphi'_j$ and $(\varphi^*_j)'$ and the fact that $\varphi(x) = \sum_{j=1}^n\varphi_j(x_j)$, we see that (50) is equivalent to

$$x^{k+1}_j - (\varphi^*_j)'\left(\varphi'_j(x^k_j) - \frac{s^k_j}{\lambda_k}\right) = 0, \qquad j = 1,\dots,n.$$

As in the proof of Proposition 5, it is easy to show that, for any $\tilde x$ satisfying $A\tilde x = b$, $s^k$ is the solution of

$$\min\left\{\tilde x^T s + \lambda_k\sum_{j=1}^n\varphi^*_j\left(\varphi'_j(x^k_j) - \frac{s_j}{\lambda_k}\right) : s \in \nabla f(x^{k+1}) + \operatorname{Im} A^T\right\}.$$
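As a concrete instance (our specialization), for the entropic kernel $\varphi_j(t) = t\log t$ of Example 1(a) one has $\varphi'_j(t) = 1 + \log t$ and $(\varphi^*_j)'(r) = e^{r-1}$, so the componentwise relation above becomes the multiplicative update

$$x^{k+1}_j = x^k_j\,e^{-s^k_j/\lambda_k}, \qquad j = 1,\dots,n,$$

which makes explicit how positive (respectively negative) dual components shrink (respectively grow) the corresponding primal components.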

Here each $\varphi_j : \mathbb{R}_{++} \to \mathbb{R}$ is strictly convex, differentiable and satisfies $\varphi_j(1) = \varphi'_j(1) = 0$ and $\lim_{t\to 0}\varphi'_j(t) = -\infty$. Convergence of $\{x^k\}$ as defined by (60) to a solution of problem (4) has been proved in [8] under the additional condition (62) on $\varphi_j$. A result on the corresponding averaged dual sequence $\{\bar s^k\}$ similar to Proposition 14 can be found in [17] for the more general case of convex, rather than linear, constraints. Assuming that $\varphi_1 = \cdots = \varphi_n$, that $\{\lambda_k\}$ is constant, that $\varphi_j$ satisfies (62) and that $\log\varphi^*_j(t)$ is convex, it is proved in [17] that $\{\bar s^k\}$ is bounded and that all its cluster points are dual optimal solutions.

Finally, we mention three open problems related to our results. The first one is to prove that the cluster points of $\{s^k\}$ as given by (50) are dual optimal solutions. As mentioned above, the basic difficulty is to prove that they are nonnegative. The second one is the convergence of the primal sequence $\{x^k\}$ in the absence of A6(i) (e.g. Example 1(b)). For linear programming $\{x^k\}$ converges to a primal solution, since $x^k = x(\mu_k)$ as discussed above and $\{x(\mu_k)\}$ converges by Proposition 3(ii). However, the problem is open for other situations (e.g. under A6(ii)).

The third problem is to decide whether the limits of $\{x(\mu)\}$ and $\{x^k\}$ coincide when both exist (e.g. under A6(i)). It has been proved in [7, Corollary 1] that they do coincide under the additional assumption that the rank of the Hessian matrix of $f$ is constant over the feasible set of problem (4), but the problem remains open without this hypothesis.

References

[1] I. Adler and R. D. C. Monteiro, Limiting behavior of the affine scaling continuous trajectories for linear programming problems, Mathematical Programming, 50 (1991), pp. 29-51.

[2] A. Auslender, R. Cominetti, and M. Haddou, Asymptotic analysis for penalty and barrier methods in convex and linear programming, Mathematics of Operations Research, 22 (1997), pp. 43-62.

[3] G. Chen and M. Teboulle, Convergence analysis of a proximal-like optimization algorithm using Bregman functions, SIAM Journal on Optimization, 3 (1993), pp. 538-543.

[4] J.-B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms I, vol. 305 of Comprehensive Study in Mathematics, Springer-Verlag, New York, 1993.

[5] A. Iusem, On some properties of generalized proximal point methods for variational inequalities, manuscript, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil, 1994. To appear in Journal of Optimization Theory and Applications.

[6] A. Iusem, On some properties of generalized proximal point methods for quadratic and linear programming, Journal of Optimization Theory and Applications, 85 (1995), pp. 593-612.

[7] A. Iusem, B. Svaiter, and J. Cruz, Generalized proximal point methods and Cauchy trajectories in Riemannian manifolds, manuscript, Instituto de Matemática Pura e Aplicada, Rio de Janeiro, Brazil, 1996.

[8] A. Iusem, B. Svaiter, and M. Teboulle, Entropy-like proximal methods in convex programming, Mathematics of Operations Research, 19 (1994), pp. 790-814.

[9] D. L. Jensen and R. A. Polyak, The convergence of a modified barrier method for convex programming, IBM Journal of Research and Development, 38 (1994), pp. 307-321.

[10] M. Kojima, S. Mizuno, and T. Noma, Limiting behavior of trajectories generated by a continuation method for monotone complementarity problems, Mathematics of Operations Research, 15 (1990), pp. 662-675.

[11] L. McLinden, An analogue of Moreau's proximation theorem, with application to the nonlinear complementarity problem, Pacific Journal of Mathematics, 88 (1980), pp. 101-161.

[12] L. McLinden, The complementarity problem for maximal monotone multifunctions, in Variational Inequalities and Complementarity Problems, R. Cottle, F. Giannessi, and J.-L. Lions, eds., Wiley, New York, 1980, pp. 251-270.

[13] N. Megiddo, Pathways to the optimal set in linear programming, in Progress in Mathematical Programming: Interior Point and Related Methods, N. Megiddo, ed., Springer-Verlag, New York, 1989, pp. 131-158. Identical version in: Proceedings of the 6th Mathematical Programming Symposium of Japan, Nagoya, Japan, pp. 1-35, 1986.

[14] R. D. C. Monteiro, Convergence and boundary behavior of the projective scaling trajectories for linear programming, Mathematics of Operations Research, 16 (1991), pp. 842-858.

[15] R. D. C. Monteiro, On the continuous trajectories for a potential reduction algorithm for linear programming, Mathematics of Operations Research, 17 (1992), pp. 225-253.

[16] R. D. C. Monteiro and F. Zhou, On the existence and convergence of the central path for convex programming and some duality results, manuscript, School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0205, USA, April 1994. To appear in Computational Optimization and Applications.

[17] R. Polyak and M. Teboulle, Nonlinear rescaling and proximal-like methods in convex optimization, Mathematical Programming, 76 (1997), pp. 265-284.

[18] M. J. D. Powell, Some convergence properties of the modified log barrier method for linear programming, SIAM Journal on Optimization, 5 (1995), pp. 695-739.

[19] R. T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[20] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM Journal on Control and Optimization, 14 (1976), pp. 877-898.

[21] M. Todd, A Dantzig-Wolfe-like variant of Karmarkar's interior-point linear programming algorithm, Operations Research, 38 (1990), pp. 1006-1018.

[22] P. Tseng and D. Bertsekas, On the convergence of the exponential multiplier method for convex programming, Mathematical Programming, 60 (1993), pp. 1-19.
