International Mathematical Forum, 2, 2007, no. 37, 1811 - 1830

A Generalized Like-distance in Convex Programming

Romulo A. Castillo (1)
Department of Mathematics, Universidad Centroccidental "Lisandro Alvarado", Barquisimeto, Venezuela
[email protected]

Eibar Hernandez (2)
Department of Mathematics, Universidad Centroccidental "Lisandro Alvarado", Barquisimeto, Venezuela
[email protected]

Jorge Campos
Section of Mathematics, UNEXPO, Barquisimeto, Venezuela
[email protected]

Abstract

We consider a generalized like-distance which contains as degenerate cases φ-divergences and like-distances with second order homogeneous kernels. The motivation for this like-distance comes from studying shifted penalty functions in the primal space. These penalty functions do not necessarily pass through the origin with slope one, and their conjugate functions allow negative values. For a particular case we obtain a generalization of the Kullback-Leibler entropy distance. This like-distance can be seen as the difference between a sequence of Bregman distances and their linear approximations for specific values of the arguments. Dual and primal convergence results are shown; in particular, we show that each limit point of the sequence generated by the proximal method defined by the generalized like-distance applied to the dual problem is an optimal dual solution.

(1) This author was partially supported by CDCHT-UCLA-Venezuela.
(2) This author was partially supported by DFPA-UCLA-Venezuela.

Mathematics Subject Classification: 90C99, 90C25, 49M30

Keywords: Proximal point methods, convex programming, multiplier methods

1 Introduction

We consider the convex programming problem

$$ (P) \qquad f^* = \inf\{f_0(x) : f_i(x) \le 0,\ i = 1,\dots,m\} $$

where $f_i : \mathbb{R}^n \to \mathbb{R}$ for $i = 0,1,\dots,m$ are closed proper convex functions. The associated dual convex problem can be written as

$$ (D) \qquad d^* = \inf\{-d(\mu) : \mu \ge 0\} \qquad (1) $$

where $d(\mu) = \inf\{l(x,\mu) : x \in \mathbb{R}^n\}$ and

$$ l(x,\mu) = f_0(x) + \sum_{i=1}^{m} \mu_i f_i(x) \qquad (2) $$

is the Lagrangian function. We assume the following hypotheses:

(H1) The optimal set of problem (P) is nonempty and compact.

(H2) There exists $\bar{x}$ such that $f_i(\bar{x}) < 0$ for $i = 1,\dots,m$.

Problem (P) can be solved using different approaches; for example, we can use augmented Lagrangian methods, which can be introduced from a primal viewpoint, see [3], [14], [2], or from a dual viewpoint, where the multiplier method is constructed by applying Fenchel's duality theory to the proximal point method, see [17], [9], [6], [1].

Proximal point methods with φ-divergences to solve (D) generate a sequence $\{\mu^k\} \subset \mathbb{R}^m_{++}$ such that $\mu^0 \in \mathbb{R}^m_{++}$ and

$$ \mu^{k+1} = \operatorname{argmin}_{\mu \ge 0}\{-d(\mu) + r_k\, d_\varphi(\mu, \mu^k)\} \qquad (3) $$

where $0 < \underline{r} < r_k < \bar{r} \le 1$ and $d_\varphi : \mathbb{R}^m_+ \times \mathbb{R}^m_{++} \to \mathbb{R}$ is given by

$$ d_\varphi(s,\mu) = \sum_{i=1}^{m} \mu_i\, \varphi\!\left(\frac{s_i}{\mu_i}\right) \qquad (4) $$

and φ is a nonnegative, strictly convex function that passes through the point (1, 0); it is well studied in [9]. Other like-distances have been used in

different contexts with different properties and results; for example, in [2] the parameter r was proposed as a function of the multiplier μ, and when $r(\mu) = \mu$ we obtain the like-distance studied in [1], which is homogeneous of order two and given by the expression

$$ d_\varphi(s,\mu) = \sum_{i=1}^{m} \mu_i^2\, \varphi\!\left(\frac{s_i}{\mu_i}\right). \qquad (5) $$

Proximal point methods with Bregman distances solve (D) by using the iterates in (3) with $D_h$ in place of $d_\varphi$, where

$$ D_h(s,\mu) = h(s) - h(\mu) - \nabla h(\mu)^t (s - \mu) \qquad (6) $$

and h is a Bregman function, see [12].

In section 2 we consider a new like-distance to solve problem (D), which has interesting properties and is given by the expression

$$ d^p_{\theta^*}(s,\mu) = \sum_{i=1}^{m} \left[ \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{\mu_i^p}{c}\, \theta^*(c) - \mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i) \right] \qquad (7) $$

where $\theta^*$ is the conjugate function of the penalty function θ, which does not necessarily pass through the origin and satisfies $\theta'(0) = \kappa$, $\kappa > 0$; $p \ge 0$; and $c \in \mathbb{R}_{++}$ satisfies $(\theta^*)'(c) = \tilde{y} \in \mathbb{R}$.

For $\mu \in \mathbb{R}^m_{++}$ fixed, if we define $h_\mu : \mathbb{R}^m_+ \to \mathbb{R}$ by $h_\mu(s) = \sum_{i=1}^{m} \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right)$, we get

$$ d^p_{\theta^*}(s,\mu) = h_\mu(s) - h_\mu(\mu) - \nabla h_\mu(\mu)^t (s - \mu), $$

i.e., $d^p_{\theta^*}$ can be seen as the difference between $h_\mu(s)$ and its linear approximation at $s = \mu$. We observe that $\theta^*$ is a translation of the function φ used in φ-divergences, in this case with minimal point at $(\kappa, \theta^*(\kappa))$ and $\theta^*(\kappa) \in \mathbb{R}$.

In section 3 we show dual and primal convergence results based on a proximal method and an augmented Lagrangian algorithm. Section 4 presents the concluding remarks and section 5 contains the appendix. We obtain the like-distance by applying conjugacy theory to a shifted penalty function used in an augmented Lagrangian approach, so we first consider a family of penalty functions and show how the like-distance defined in (7) arises naturally.

2 Getting the like-distance

In order to obtain the generalized like-distance in a natural way, we first study a family of penalty functions in the context of multiplier methods.

2.1 Penalty functions

Let θ be a strictly increasing, twice differentiable, strictly convex function defined on $(-\infty, b)$, $0 < b \le +\infty$, such that:

1a) $0 < \theta'(0) = \kappa < +\infty$.
2a) $\lim_{t \to b} \theta'(t) = +\infty$.
3a) $\lim_{t \to -\infty} \theta'(t) = 0$.
4a) $\theta''(t) \ge \frac{1}{M}$ for all $t \in [0, b)$ and some $M > 0$.

Condition 1a) has already been considered in the Bregman distance approach, see [12], [5], but not in the like-distance one, see [9], [1], [14], [2]; in our case the penalty function does not need to pass through the origin with slope one.

Associated with the function θ we consider its conjugate function $\theta^*$, see [15], which satisfies the following properties:

1b) $\theta^*$ is a strictly convex differentiable function on $(0, +\infty)$.
2b) $\theta^*$ is decreasing on $(0, \kappa)$ and increasing on $(\kappa, +\infty)$, with $\theta^*(\kappa) \in \mathbb{R}$.
3b) $\lim_{t \to 0^+} (\theta^*)'(t) = -\infty$ and $\lim_{t \to +\infty} (\theta^*)'(t) = +\infty$.
4b) $(\theta^*)''(t) \le M$ for all $t \ge \kappa = \theta'(0)$.

Condition 2b) with κ = 1 is used in all the like-distances known until now. Observe that negative values for $\theta^*$ are allowed.

Example 2.1. For $\theta(t) = e^{t+1}$, we have $\theta(0) = e$, $\theta'(0) = e = \kappa$, $(\theta^*)'(e) = 0$ and $\theta^*(e) = -e$. So $\theta^*$ has its minimal point at $(e, -e)$. The graphs of θ and $\theta^*$ are shown in the next figure.

Fig. 1: Graphs of θ and $\theta^*$.
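The conjugate in this example has a closed form: $\theta^*(s) = \sup_t\{st - e^{t+1}\} = s \ln s - 2s$, attained at $t = \ln s - 1$. The following minimal sketch (ours, not part of the paper; it assumes numpy and scipy are available) compares this closed form with a direct numerical evaluation of the supremum:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def theta(t):
    # Penalty function of Example 2.1.
    return np.exp(t + 1.0)

def theta_star_closed(s):
    # Closed-form conjugate: sup_t { s t - e^{t+1} } = s ln(s) - 2 s.
    return s * np.log(s) - 2.0 * s

def theta_star_numeric(s):
    # Direct numerical evaluation of the conjugate sup_t { s t - theta(t) }.
    res = minimize_scalar(lambda t: theta(t) - s * t)
    return -res.fun

for s in [0.5, 1.0, np.e, 5.0]:
    print(s, theta_star_closed(s), theta_star_numeric(s))
# Both columns agree; at s = e the minimal value -e is attained.
```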

2.2 Shift in penalty functions

We use a constant shift to define a generalized penalty function which will be used in section 3.2 in the context of the multiplier method.

We choose $\tilde{y} \in \mathbb{R}$ so that $\theta'(\tilde{y}) = c$ with $c \in \mathbb{R}_{++}$. According to corollary 23.5.1 in [15] we have

$$ (\theta')^{-1} = (\theta^*)' \qquad (8) $$

and so

$$ \theta'(\tilde{y}) = c \iff \tilde{y} = (\theta^*)'(c). \qquad (9) $$
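For the θ of Example 2.1 the inversion rule (8)-(9) can be checked directly; a small sketch of ours (numpy assumed):

```python
import numpy as np

# theta(t) = e^{t+1}: theta'(t) = e^{t+1} and (theta*)'(s) = ln(s) - 1.
theta_prime      = lambda t: np.exp(t + 1.0)
theta_star_prime = lambda s: np.log(s) - 1.0

for y_tilde in [-1.0, 0.0, 0.7]:
    c = theta_prime(y_tilde)            # c = theta'(y~) for the shift y~
    print(y_tilde, theta_star_prime(c)) # recovers y~ = (theta*)'(c), i.e. (9)
```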

Given $p \ge 0$ and $r \in (0, 1]$, we define the generalized penalty function with shift $P_p$ as

$$ y \in \mathbb{R}^m,\ \mu \in \mathbb{R}^m_{++} \ \mapsto\ P_p(y, \mu, r, c) = \sum_{i=1}^{m} P_{p,i}(y_i, \mu_i, r, c) \qquad (10) $$

where

$$ P_{p,i}(y_i, \mu_i, r, c) = \frac{r\,\mu_i^p}{c}\left[\theta\!\left(\frac{y_i}{\mu_i^{p-1} r} + \tilde{y}\right) - \theta(\tilde{y})\right] \quad \text{for } i = 1,\dots,m, $$

θ satisfies conditions 1a)-4a) and $\theta'(\tilde{y}) = c$. Note that $P_{p,i}(0, \mu_i, r, c) = 0$ for $i = 1,\dots,m$ and

$$ (P_{p,i})'_1(y_i, \mu_i, r, c) = \frac{\mu_i}{c}\, \theta'\!\left(\frac{y_i}{\mu_i^{p-1} r} + \tilde{y}\right), $$

where $(P_{p,i})'_1 = \frac{\partial P_{p,i}}{\partial y_i}$, so by (9)

$$ (P_{p,i})'_1(0, \mu_i, r, c) = \frac{\mu_i}{c}\, \theta'(\tilde{y}) = \mu_i \quad \text{for } i = 1,\dots,m. \qquad (11) $$

Remark. An interesting feature of the penalty function $P_p$ is that it considers, for the first time, rational exponents on the multiplier vectors, although we show convergence results only for $p \ge 2$. Geometrically, the shift is a translation that satisfies equation (11). The conditions θ(0) = 0 and θ'(0) = 1 were used in all the multiplier methods known until now that lead to like-distances, see [9], [1], [2], [14], but not in this one; what really matters is that equation (11) holds. Another relevant aspect of the shift in this penalty function is that it allows us to construct a generalized like-distance in the dual space by applying conjugacy theory, as shown in the next proposition.

Proposition 2.2. Let θ be a penalty function satisfying conditions 1a)-4a). Given $p \ge 0$, $r \in (0,1]$, $\tilde{y} \in \mathbb{R}$, $c \in \mathbb{R}_{++}$, consider

$$ y \in \mathbb{R}^m,\ \mu \in \mathbb{R}^m_{++} \ \mapsto\ P_p(y,\mu,r,c) = \sum_{i=1}^{m} \frac{r\,\mu_i^p}{c}\left[\theta\!\left(\frac{y_i}{\mu_i^{p-1} r} + \tilde{y}\right) - \theta(\tilde{y})\right]. \qquad (12) $$

If $\theta'(\tilde{y}) = c$ then

$$ P^*_{p,\mu,r,c}(s) = r \sum_{i=1}^{m} \left[ \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{\mu_i^p}{c}\, \theta^*(c) - \mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i) \right]. \qquad (13) $$

Proof. Consider

$$ P_{p,i}(y_i, \mu_i, r, c) = \frac{r\,\mu_i^p}{c}\left[\theta\!\left(\frac{y_i}{\mu_i^{p-1} r} + \tilde{y}\right) - \theta(\tilde{y})\right] \quad \text{for } i = 1,\dots,m, $$

where θ satisfies conditions 1a)-4a) and $\theta'(\tilde{y}) = c$. Let $r, c, \mu_i$, for $i = 1,\dots,m$, be fixed. By proposition 1.3.1 in [11] we have

$$ P^*_{p,\mu_i,r,c}(s_i) = \left[\frac{r\,\mu_i^p}{c}\, \theta\!\left(\frac{\cdot}{\mu_i^{p-1} r} + \tilde{y}\right)\right]^{\!*}(s_i) + \frac{r\,\mu_i^p}{c}\, \theta(\tilde{y}) = \frac{r\,\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - r\,\mu_i^{p-1} s_i \tilde{y} + \frac{r\,\mu_i^p}{c}\, \theta(\tilde{y}). $$

By theorem 25.3 in [15], if $\tilde{y} \in \operatorname{dom} \theta$ and $c \in \operatorname{dom} \theta^*$ with $c = \theta'(\tilde{y})$, we know that $\theta(\tilde{y}) + \theta^*(c) = c\,\tilde{y}$. Using (9), i.e. $\tilde{y} = (\theta^*)'(c)$, we then have, for $i = 1,\dots,m$,

$$ P^*_{p,\mu_i,r,c}(s_i) = \frac{r\,\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - r\,\mu_i^{p-1} s_i \tilde{y} + \frac{r\,\mu_i^p}{c}\,[c\,\tilde{y} - \theta^*(c)] = \frac{r\,\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{r\,\mu_i^p}{c}\, \theta^*(c) - r\,\mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i). $$

Finally,

$$ P^*_{p,\mu,r,c}(s) = r \sum_{i=1}^{m} \left[ \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{\mu_i^p}{c}\, \theta^*(c) - \mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i) \right]. $$

2.3 The generalized like-distance

According to definition 2.1 in [10], given $S \subset \mathbb{R}^n$, a map $d : S \times S \to \mathbb{R}$ is called a divergence measure in S if and only if:

i) $d(x,y) \ge 0$ for all $x, y \in S$.
ii) If $\{x^k\} \subset S$ and $x \in S$, then $\lim_{k \to +\infty} d(x, x^k) = 0 \iff \lim_{k \to +\infty} x^k = x$.
iii) The partial level sets $\Gamma_1(y, \nu) = \{x \in S : d(x,y) \le \nu\}$ are bounded for all $y \in S$ and all $\nu > 0$.
iv) The partial level sets $\Gamma_2(x, \nu) = \{y \in S : d(x,y) \le \nu\}$ are bounded for all $x \in S$ and all $\nu > 0$.

Using (13), we define the generalized like-distance as

$$ d^p_{\theta^*}(x, y) = \sum_{i=1}^{m} \left[ \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i}{y_i}\right) - \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p-1} (\theta^*)'(c)(x_i - y_i) \right]. \qquad (14) $$

In the appendix we show that it is a divergence measure. Observe that $d^p_{\theta^*}(\cdot, y)$ is a strictly convex function because $\theta^*$ is.
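To make the definition concrete, the following sketch (ours, not from the paper) evaluates (14) for a user-supplied conjugate; here it is instantiated with the closed-form $\theta^*$ of Example 2.1. All function names are ours:

```python
import numpy as np

def generalized_like_distance(x, y, theta_star, theta_star_prime, p=2.0, c=1.0):
    """Evaluate d^p_{theta*}(x, y) from (14); x >= 0 and y > 0 componentwise."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sum(
        (y**p / c) * theta_star(c * x / y)
        - (y**p / c) * theta_star(c)
        - y**(p - 1) * theta_star_prime(c) * (x - y)
    )

# Example 2.1: theta(t) = e^{t+1}, theta*(s) = s ln s - 2s, (theta*)'(s) = ln s - 1.
ts  = lambda s: s * np.log(s) - 2.0 * s
tsp = lambda s: np.log(s) - 1.0
print(generalized_like_distance([1.0, 2.0], [1.0, 2.0], ts, tsp))  # 0.0 at x = y
print(generalized_like_distance([2.0, 1.0], [1.0, 2.0], ts, tsp))  # > 0 otherwise
```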

Note that for $p = 1$, $c = 1$, $\tilde{y} = 0$ with $\theta(\tilde{y}) = 0$ we get

$$ P^*_1(s, \mu, r, c) = r\, d_{\theta^*}(s, \mu) = r \sum_{i=1}^{m} \mu_i\, \theta^*\!\left(\frac{s_i}{\mu_i}\right), $$

which is used in proximal methods with φ-divergences, see [10], [14]. For $p = 2$, $c = 1$, $\tilde{y} = 0$ with $\theta(\tilde{y}) = 0$ we get

$$ P^*_2(s, \mu, r, c) = r\, \tilde{d}_{\theta^*}(s, \mu) = r \sum_{i=1}^{m} \mu_i^2\, \theta^*\!\left(\frac{s_i}{\mu_i}\right), $$

the second order homogeneous kernel used in [1], and in [2] for a specific case.

Example 2.3. For $m = 1$ and $\theta(t) = e^t$, we have $\theta^*(s) = s \ln(s) - s$, $(\theta^*)'(s) = \ln(s)$ and

$$ d^p_{\theta^*}(s, \mu) = \mu^{p-1}\left[ s \ln\!\left(\frac{s}{\mu}\right) - s + \mu \right]. $$

This like-distance can be considered a generalization of the Kullback-Leibler entropy distance. The same expression is obtained from $\theta(t) = e^t - 1$ or from $\theta(t) = e^{t+K}$ with $K \in \mathbb{R}$.
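A quick numerical check (our sketch, numpy assumed) confirms both the closed form above and the fact, used implicitly in the example, that the value of (14) does not depend on the choice of c:

```python
import numpy as np

# theta(t) = e^t gives theta*(s) = s ln s - s and (theta*)'(s) = ln s.
ts, tsp = (lambda v: v * np.log(v) - v), (lambda v: np.log(v))

def d_general(s, mu, p, c):
    # The general formula (14) for m = 1.
    return (mu**p / c) * ts(c * s / mu) - (mu**p / c) * ts(c) \
           - mu**(p - 1) * tsp(c) * (s - mu)

def d_kl(s, mu, p):
    # The closed form of Example 2.3: mu^{p-1} [ s ln(s/mu) - s + mu ].
    return mu**(p - 1) * (s * np.log(s / mu) - s + mu)

s, mu, p = 3.0, 2.0, 2.5
for c in [0.5, 1.0, 4.0]:
    print(c, d_general(s, mu, p, c), d_kl(s, mu, p))  # same value for every c
```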

2.3.1 Geometric interpretation

Consider the generalized like-distance defined in (14). Given $\mu \in \mathbb{R}^m_{++}$, we define the convex function $h_\mu : \mathbb{R}^m_+ \to \mathbb{R}$ by

$$ h_\mu(s) = \sum_{i=1}^{m} \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right). $$

Note that $h_\mu(\mu) = \sum_{i=1}^{m} \frac{\mu_i^p}{c}\, \theta^*(c)$ and

$$ \nabla h_\mu(s) = \left( \mu_1^{p-1} (\theta^*)'\!\left(\frac{c s_1}{\mu_1}\right), \dots, \mu_m^{p-1} (\theta^*)'\!\left(\frac{c s_m}{\mu_m}\right) \right)^{t}, $$

so that $\nabla h_\mu(\mu) = (\mu_1^{p-1} (\theta^*)'(c), \dots, \mu_m^{p-1} (\theta^*)'(c))^t$. We then have

$$ d^p_{\theta^*}(s, \mu) = h_\mu(s) - h_\mu(\mu) - \nabla h_\mu(\mu)^t (s - \mu), $$

that is, for each μ, $d^p_{\theta^*}(s,\mu)$ is the difference between $h_\mu(s)$ and its linear approximation at $s = \mu$.

It can be observed that for each $\mu > 0$, $h_\mu$ is a strictly convex function, and it is not difficult to prove that for each $\mu > 0$ it generates a Bregman distance $D_{h_\mu} : \mathbb{R}^m_+ \times \mathbb{R}^m_{++} \to \mathbb{R}$ given by

$$ D_{h_\mu}(x, y) = h_\mu(x) - h_\mu(y) - \nabla h_\mu(y)^t (x - y), \qquad (15) $$

see [12]; so, for each $\mu > 0$, the generalized like-distance satisfies

$$ d^p_{\theta^*}(s, \mu) = h_\mu(s) - h_\mu(\mu) - \nabla h_\mu(\mu)^t (s - \mu) = D_{h_\mu}(s, \mu). \qquad (16) $$

If we consider a sequence $\{\mu^k\}$ in (15) and in (16), we have in (15) a sequence of Bregman distances, so that the generalized like-distance at $\mu = \mu^k$ coincides with $D_{h_{\mu^k}}(x, y)$ at $y = \mu^k$. This suggests the possibility of studying proximal methods with a sequence of Bregman distances depending on a parameter, or varying the induced Bregman function at each iteration. This will not be considered in this paper.

Figure 2 below shows the graph of $h_\mu(s)$ and its linear approximation at μ for $\mu = 2, \frac{3}{2}, 1$, $c = \frac{1}{2}$, $p = \frac{5}{2}$ and $\theta^*$ as in Example 2.1. On the right, the graph of $d^p_{\theta^*}(s, \mu)$ for the same values.

Fig. 2: Graphs of $h_\mu(s)$ with its linear approximation at μ (left) and of $d^p_{\theta^*}(s,\mu)$ (right).
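The identity (16) is easy to verify numerically. In the following sketch (ours; $\theta^*$ from Example 2.1 and the parameters of Fig. 2) the gradient of $h_\mu$ is taken by finite differences, so the test is independent of the algebra above:

```python
import numpy as np

# theta* of Example 2.1 and its derivative; parameters as in Fig. 2.
ts, tsp = (lambda v: v * np.log(v) - 2.0 * v), (lambda v: np.log(v) - 1.0)
p, c = 2.5, 0.5

def h_mu(s, mu):
    # h_mu(s) = sum_i (mu_i^p / c) theta*(c s_i / mu_i)
    return np.sum((mu**p / c) * ts(c * s / mu))

def bregman_diff(s, mu, eps=1e-6):
    # h_mu(s) - h_mu(mu) - grad h_mu(mu)^T (s - mu), gradient via central differences.
    grad = np.array([(h_mu(mu + eps * e, mu) - h_mu(mu - eps * e, mu)) / (2 * eps)
                     for e in np.eye(len(mu))])
    return h_mu(s, mu) - h_mu(mu, mu) - grad @ (s - mu)

def d_p(s, mu):
    # The like-distance (14), written out directly.
    return np.sum((mu**p / c) * ts(c * s / mu) - (mu**p / c) * ts(c)
                  - mu**(p - 1) * tsp(c) * (s - mu))

s, mu = np.array([1.5, 3.0]), np.array([2.0, 1.0])
print(bregman_diff(s, mu), d_p(s, mu))  # equal, illustrating (16)
```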

3 Methods and theorems

3.1 Proximal method

We consider the generalized like-distance applied directly to the dual problem. The proximal method to solve problem (D) defined in (1) generates a sequence $\{\mu^k\}$ such that $\mu^0 \in \mathbb{R}^m_{++}$ and

$$ \mu^{k+1} = \operatorname{argmin}\{-d(\mu) + r_k\, d^p_{\theta^*}(\mu, \mu^k)\} \qquad (17) $$

where $r_k \in [\underline{r}, \bar{r}] \subset (0, +\infty)$ and $\theta^*$ is as defined in section 2.1. By the optimality condition we have

$$ 0 \in \partial\left( -d(\mu^{k+1}) + r_k\, d^p_{\theta^*}(\mu^{k+1}, \mu^k) \right) $$

or equivalently

$$ -r_k \left( (\mu_1^{k+1})^{p-1}\!\left[(\theta^*)'\!\left(\frac{c\,\mu_1^{k+1}}{\mu_1^k}\right) - (\theta^*)'(c)\right], \dots, (\mu_m^{k+1})^{p-1}\!\left[(\theta^*)'\!\left(\frac{c\,\mu_m^{k+1}}{\mu_m^k}\right) - (\theta^*)'(c)\right] \right) \in \partial(-d(\mu^{k+1})). \qquad (18) $$

Proposition 3.1. $\{-d(\mu^k)\}$ is a non-increasing convergent sequence.

Proof. Due to the optimality conditions in (17) we have $\mu^{k+1} = \operatorname{argmin}_{\mu > 0}\{-d(\mu) + r_k\, d^p_{\theta^*}(\mu, \mu^k)\}$, so

$$ -d(\mu^{k+1}) + r_k\, d^p_{\theta^*}(\mu^{k+1}, \mu^k) \le -d(\mu^k) + r_k\, d^p_{\theta^*}(\mu^k, \mu^k) = -d(\mu^k), $$

that is, $-d(\mu^{k+1}) \le -d(\mu^k)$. By weak duality, $-f^*$ is a lower bound for $\{-d(\mu^k)\}$, so $\{-d(\mu^k)\}$ is convergent.

Proposition 3.2. The sequence $\{\mu^k\}$ generated by (17) is bounded.

Proof. By (H2), and since $-d$ is a proper convex function, the level set $\Lambda = \{\mu \in \mathbb{R}^m_+ : -d(\mu) \le -d(\mu^0)\}$ is compact, and by proposition 3.1, $\mu^k \in \Lambda$ for all k, so $\{\mu^k\}$ is bounded.

In the next subsection, based on proposition 2.2, a primal multiplier method associated to the proximal one is obtained using the generalized penalty functions defined in (10).

3.2 Augmented Lagrangian algorithm

Consider problem (P) with hypotheses (H1) and (H2) and the θ-functions satisfying conditions 1a)-4a) of section 2.1. Given $p \ge 0$, $r \in (0,1]$, $\tilde{y} \in \mathbb{R}$, $c \in \mathbb{R}_{++}$ with $\theta'(\tilde{y}) = c$, the augmented Lagrangian function is given by

$$ L_{r,c}(x, \mu) = f_0(x) + \sum_{i=1}^{m} P_{p,i}(f_i(x), \mu_i, r, c) $$

where $P_{p,i}(f_i(x), \mu_i, r, c) = \frac{r\,\mu_i^p}{c}\left[\theta\!\left(\frac{f_i(x)}{\mu_i^{p-1} r} + \tilde{y}\right) - \theta(\tilde{y})\right]$ for $i = 1,\dots,m$.

The multiplier method associated to the proximal one is given by

$$ x^{k+1} \in \operatorname{argmin}\left\{ f_0(x) + \sum_{i=1}^{m} P_{p,i}(f_i(x), \mu_i^k, r_k, c) \right\}, \qquad (19) $$

$$ \mu_i^{k+1} = \frac{\mu_i^k}{c}\, \theta'\!\left(\frac{f_i(x^{k+1})}{(\mu_i^k)^{p-1} r_k} + \tilde{y}\right) \quad \text{for } i = 1,\dots,m \text{ and } r_k \in (0,1]. \qquad (20) $$

Observe that $0 \in \partial_x L_{r_k,c}(x^{k+1}, \mu^k) \iff 0 \in \partial_x l(x^{k+1}, \mu^{k+1})$, where l is the Lagrangian function defined in (2). The next proposition shows that the sequences defined in (20) and (17) are the same.

Proposition 3.3. Let $\{\hat{\mu}^k\}$ be the sequence generated by (17), which solves the dual problem (D), and let $\{x^k\}$ and $\{\mu^k\}$ be the sequences generated by (19) and (20), which solve the primal problem (P). If $\mu^0 = \hat{\mu}^0$, then $\mu^k = \hat{\mu}^k$ for all $k > 0$.

The proof is direct from theorem 7.1 in [8].
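The multiplier method (19)-(20) can be sketched directly. Here is our illustration (not from the paper; scipy assumed) on the same toy instance as before, with θ(t) = e^t as in Example 2.3, so ỹ = 0 and c = θ'(0) = 1; by proposition 3.3 it generates the same multiplier sequence as the proximal sketch above:

```python
import numpy as np
from scipy.optimize import minimize

# Tiny instance: f0(x) = x^2, f1(x) = 1 - x <= 0 (optimum x* = 1, mu* = 2).
f0 = lambda x: x[0]**2
fs = [lambda x: 1.0 - x[0]]

theta = lambda t: np.exp(t)        # penalty; theta' = theta, y~ = 0, c = 1
p, c, y_tilde = 2.0, 1.0, 0.0

def P_pi(fx, mu_i, r):
    # Shifted penalty term P_{p,i} from (10) for this theta.
    return (r * mu_i**p / c) * (theta(fx / (mu_i**(p - 1) * r) + y_tilde)
                                - theta(y_tilde))

def multiplier_step(mu, r=0.5, x0=np.zeros(1)):
    # (19): minimize the augmented Lagrangian in x.
    aug = lambda x: f0(x) + sum(P_pi(f(x), m, r) for f, m in zip(fs, mu))
    x_new = minimize(aug, x0).x
    # (20): mu_i <- (mu_i/c) theta'( f_i(x)/(mu_i^{p-1} r) + y~ ).
    mu_new = np.array([(m / c) * theta(f(x_new) / (m**(p - 1) * r) + y_tilde)
                       for f, m in zip(fs, mu)])
    return x_new, mu_new

x, mu = np.zeros(1), np.array([1.0])
for _ in range(30):
    x, mu = multiplier_step(mu, x0=x)
print(x, mu)  # x -> 1, mu -> 2
```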

3.3 Convergence results

This section is inspired by the convergence theorems in [2] and [14]. From (14) we can write

$$ d^p_{\theta^*}(s, \mu) = \sum_{i=1}^{m} d(s_i, \mu_i) \qquad (21) $$

where $d(s_i, \mu_i) = \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{\mu_i^p}{c}\, \theta^*(c) - \mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i)$ for $i = 1,\dots,m$, and so

$$ d'_1(s_i, \mu_i) = \frac{\partial d(s_i, \mu_i)}{\partial s_i} = \mu_i^{p-1}\left[ (\theta^*)'\!\left(\frac{c s_i}{\mu_i}\right) - (\theta^*)'(c) \right]. $$

Lemma 3.4. Let s, μ be positive real numbers with $s > \mu$. Then

$$ d(s, \mu) \ge \frac{1}{2}\, \frac{[d'_1(s, \mu)]^2}{c M \mu^{p-2}} $$

where $M = \sup\{(\theta^*)''(t) : t \ge \kappa = \theta'(0)\}$ and $c = \theta'(\tilde{y})$.

Proof. Consider the quadratic function

$$ q(t) = d(s, \mu) + (t - s)\, d'_1(s, \mu) + \frac{1}{2}(t - s)^2\, c M \mu^{p-2}, $$

so $q'(t) = d'_1(s, \mu) + (t - s)\, c M \mu^{p-2}$, and $q'(t) = 0$ if and only if

$$ t^* = \frac{-d'_1(s, \mu)}{c M \mu^{p-2}} + s, \qquad (22) $$

the minimizer of $q(\cdot)$. Since $s > \mu > 0$,

$$ d'_1(s, \mu) = \mu^{p-1}\left[ (\theta^*)'\!\left(\frac{c s}{\mu}\right) - (\theta^*)'(c) \right] > 0. $$

Then, from (22), $t^* < s$. Furthermore, by the mean value theorem there is $\hat{\mu} \in [\mu, s]$ such that

$$ d'_1(s, \mu) = d'_1(s, \mu) - d'_1(\mu, \mu) = (s - \mu)\, \mu^{p-2}\, (\theta^*)''\!\left(\frac{c \hat{\mu}}{\mu}\right) c \le (s - \mu)\, \mu^{p-2}\, c M. $$

So $\frac{d'_1(s,\mu)}{c M \mu^{p-2}} \le s - \mu$ and

$$ \mu \le s - \frac{d'_1(s, \mu)}{c M \mu^{p-2}} = t^*, \qquad (23) $$

hence $\mu \le t^* < s$. On the other hand, for all $t \in [\mu, s)$ there is $\tilde{t} \in [t, s]$ such that

$$ d'_1(s, \mu) - d'_1(t, \mu) = (s - t)\, \mu^{p-2}\, (\theta^*)''\!\left(\frac{c \tilde{t}}{\mu}\right) c \le (s - t)\, \mu^{p-2}\, c M, $$

so $d'_1(s, \mu) \le d'_1(t, \mu) + (s - t)\, \mu^{p-2} c M$ for all $\mu \le t \le s$, and

$$ d'_1(t, \mu) \ge d'_1(s, \mu) + (t - s)\, \mu^{p-2} c M = q'(t). $$

We then have $d'_1(t, \mu) \ge q'(t)$ for all $\mu \le t \le s$. Since $\mu \le t^* < s$, integrating from $t^*$ to s we get $d(s, \mu) - d(t^*, \mu) \ge q(s) - q(t^*)$. Since $d(t^*, \mu) \ge 0$,

$$ d(s, \mu) \ge q(s) - q(t^*) + d(t^*, \mu) \ge q(s) - q(t^*) = -(t^* - s)\, d'_1(s, \mu) - \frac{1}{2}(t^* - s)^2\, c M \mu^{p-2}. \qquad (24) $$

From (22), $t^* - s = \frac{-d'_1(s, \mu)}{c M \mu^{p-2}}$; replacing in (24) we have

$$ d(s, \mu) \ge \frac{[d'_1(s, \mu)]^2}{c M \mu^{p-2}} - \frac{1}{2}\, \frac{[d'_1(s, \mu)]^2}{c M \mu^{p-2}} = \frac{1}{2}\, \frac{[d'_1(s, \mu)]^2}{c M \mu^{p-2}}. \qquad (25) $$

Finally,

$$ d(s, \mu) \ge \frac{1}{2}\, \frac{[d'_1(s, \mu)]^2}{c M \mu^{p-2}}. $$
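The bound of lemma 3.4 can be checked numerically. In the sketch below (ours, not from the paper) we use the θ of Example 2.1, for which $(\theta^*)''(v) = 1/v$, so $M = \sup_{v \ge e} 1/v = 1/e$, and take $c = \theta'(0) = e$:

```python
import numpy as np

# theta(t) = e^{t+1}: theta*(v) = v ln v - 2v, (theta*)'(v) = ln v - 1,
# (theta*)''(v) = 1/v, so M = 1/e on v >= kappa = e.  Take c = theta'(0) = e.
ts, tsp = (lambda v: v * np.log(v) - 2.0 * v), (lambda v: np.log(v) - 1.0)
c, M, p = np.e, 1.0 / np.e, 2.5

def d(s, mu):
    # One-dimensional kernel d(s_i, mu_i) from (21).
    return (mu**p / c) * ts(c * s / mu) - (mu**p / c) * ts(c) \
           - mu**(p - 1) * tsp(c) * (s - mu)

def d1(s, mu):
    # Partial derivative d'_1(s_i, mu_i).
    return mu**(p - 1) * (tsp(c * s / mu) - tsp(c))

for s, mu in [(2.0, 1.0), (5.0, 0.5), (1.1, 1.0)]:
    lhs = d(s, mu)
    rhs = 0.5 * d1(s, mu)**2 / (c * M * mu**(p - 2))
    print(lhs >= rhs, lhs, rhs)  # the inequality of lemma 3.4 holds
```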

The next proposition implicitly uses a version of property (D4) in [2], and it is valid for $p \ge 2$.

Proposition 3.5 (Asymptotic feasibility). Consider $p \ge 2$. The sequence $\{[f_i(x^k)]_+\}$ converges to 0 for all $i = 1,\dots,m$, where $[y]_+ = \max\{0, y\}$.

Proof. Suppose by contradiction that $\{[f_i(x^k)]_+\}$ does not converge to 0. Then there exist $\{l_k\}$ with $l_k \to +\infty$, $l_k < l_{k+1}$, and $\epsilon > 0$ such that $f_{i_0}(x^{l_k+1}) > \epsilon$ for some $i_0 \in \{1,\dots,m\}$. For any $\mu \ge 0$ we have

$$ d(\mu) = \inf_x\left\{ f_0(x) + \sum_{i=1}^{m} \mu_i f_i(x) \right\} \le f_0(x^{k+1}) + \sum_{i=1}^{m} \mu_i f_i(x^{k+1}) = f_0(x^{k+1}) + \sum_{i=1}^{m} \mu_i^{k+1} f_i(x^{k+1}) + \sum_{i=1}^{m} (\mu_i - \mu_i^{k+1}) f_i(x^{k+1}) = d(\mu^{k+1}) + \sum_{i=1}^{m} (\mu_i - \mu_i^{k+1}) f_i(x^{k+1}). $$

From (20),

$$ \mu_i^{k+1} = \frac{\mu_i^k}{c}\, \theta'\!\left(\frac{f_i(x^{k+1})}{(\mu_i^k)^{p-1} r_k} + \tilde{y}\right) \quad \text{for } i = 1,\dots,m, $$

and by using $(\theta')^{-1} = (\theta^*)'$ we get

$$ r_k (\mu_i^k)^{p-1}\left[ (\theta^*)'\!\left(\frac{c\,\mu_i^{k+1}}{\mu_i^k}\right) - (\theta^*)'(c) \right] = f_i(x^{k+1}) \quad \text{for } i = 1,\dots,m. \qquad (26) $$

Using the notation in (21),

$$ d(s_i, \mu_i) = \frac{\mu_i^p}{c}\, \theta^*\!\left(\frac{c s_i}{\mu_i}\right) - \frac{\mu_i^p}{c}\, \theta^*(c) - \mu_i^{p-1} (\theta^*)'(c)(s_i - \mu_i) $$

and

$$ d'_1(s_i, \mu_i) = \mu_i^{p-1}\left[ (\theta^*)'\!\left(\frac{c s_i}{\mu_i}\right) - (\theta^*)'(c) \right] \quad \text{for } i = 1,\dots,m, \qquad (27) $$

so from (26) and (27)

$$ \frac{f_i(x^{k+1})}{r_k} = d'_1(\mu_i^{k+1}, \mu_i^k) \quad \text{for } i = 1,\dots,m. \qquad (28) $$

Since $f_{i_0}(x^{l_k+1}) > \epsilon$ and $r_{l_k} \le 1$, (28) gives $d'_1(\mu_{i_0}^{l_k+1}, \mu_{i_0}^{l_k}) > \epsilon > 0$, and so $\mu_{i_0}^{l_k+1} > \mu_{i_0}^{l_k}$; hence lemma 3.4 applies.

Since $\mu^{k+1} = \operatorname{argmin}\{-d(\mu) + r_k\, d^p_{\theta^*}(\mu, \mu^k)\}$, we have $-d(\mu^{k+1}) + r_k\, d^p_{\theta^*}(\mu^{k+1}, \mu^k) \le -d(\mu^k) + r_k\, d^p_{\theta^*}(\mu^k, \mu^k) = -d(\mu^k)$, so

$$ d(\mu^{k+1}) - d(\mu^k) \ge r_k\, d^p_{\theta^*}(\mu^{k+1}, \mu^k) \ge 0. $$

Then

$$ d(\mu^{l_k+1}) - d(\mu^{l_k}) \ge r_{l_k}\, d^p_{\theta^*}(\mu^{l_k+1}, \mu^{l_k}) = r_{l_k} \sum_{i=1}^{m} d(\mu_i^{l_k+1}, \mu_i^{l_k}) \ge r_{l_k}\, d(\mu_{i_0}^{l_k+1}, \mu_{i_0}^{l_k}). $$

By proposition 3.2, let $\bar{\mu}$ be an upper bound of $\{\mu^k\}$. By lemma 3.4, and since $r_{l_k} \ge \underline{r} > 0$ and $p \ge 2$,

$$ d(\mu^{l_k+1}) - d(\mu^{l_k}) \ge \frac{r_{l_k}}{2 c M}\, \frac{[d'_1(\mu_{i_0}^{l_k+1}, \mu_{i_0}^{l_k})]^2}{(\mu_{i_0}^{l_k})^{p-2}} \ge \frac{\underline{r}\, \epsilon^2}{2 c M\, \bar{\mu}_{i_0}^{\,p-2}} = \delta > 0. $$

Then $d(\mu^{l_k+1}) - d(\mu^{l_k}) \ge \delta > 0$, and taking limits as $k \to \infty$ we conclude that $0 \ge \delta > 0$, since $\{d(\mu^k)\}$ converges; this is a contradiction. Finally, $\lim_{k \to +\infty} [f_i(x^k)]_+ = 0$.

The next proposition uses the following fact.

Fact 1. For all $t \in \mathbb{R}$, $\ t\, \dfrac{\theta'(t + \tilde{y})}{c} \ge t$.

Proof. If $t > 0$ then $t + \tilde{y} > \tilde{y}$, so $\theta'(t + \tilde{y}) > \theta'(\tilde{y}) = c$, hence $\frac{\theta'(t+\tilde{y})}{c} > 1$ and $t\, \frac{\theta'(t+\tilde{y})}{c} > t$. If $t < 0$ then $t + \tilde{y} < \tilde{y}$, so $\theta'(t + \tilde{y}) < \theta'(\tilde{y}) = c$, hence $\frac{\theta'(t+\tilde{y})}{c} < 1$ and $t\, \frac{\theta'(t+\tilde{y})}{c} > t$. For $t = 0$ the inequality holds with equality.

Proposition 3.6. Let $\{x^k\}$ and $\{\mu^k\}$ be the sequences generated by (19) and (20). Then $\lim_{k \to +\infty} \mu_i^k f_i(x^k) = 0$ for $i = 1,\dots,m$.

Proof. Suppose by contradiction that there exist $i_0 \in \{1,\dots,m\}$, $\epsilon > 0$ and an infinite index set $\{k_j\}$ such that

$$ |\mu_{i_0}^{k_j+1} f_{i_0}(x^{k_j+1})| \ge \epsilon \quad \text{for all } j. $$

Since $\{\mu^k\}$ is bounded, there exists $\bar{\mu} > 0$ such that

$$ 0 < \mu^k \le \bar{\mu} \quad \text{for all } k, \qquad (29) $$

hence $|\mu_{i_0}^{k_j+1} f_{i_0}(x^{k_j+1})| \ge \epsilon$ implies $|f_{i_0}(x^{k_j+1})| \ge \frac{\epsilon}{\bar{\mu}_{i_0}}$. But $\{[f_i(x^k)]_+\}$ converges to 0 for all $i = 1,\dots,m$, hence $f_{i_0}(x^{k_j+1}) \ge \frac{\epsilon}{\bar{\mu}_{i_0}}$ holds only for a finite set of indices $k_j$, so we can assume without loss of generality that

$$ f_{i_0}(x^{k_j+1}) \le -\frac{\epsilon}{\bar{\mu}_{i_0}} \quad \text{for all } j. \qquad (30) $$

Since $(f_1(x^{k+1}),\dots,f_m(x^{k+1}))^t \in \partial d(\mu^{k+1})$ and d is a concave function,

$$ \sum_{i=1}^{m} f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) \le d(\mu^{k_j+1}) - d(\mu^{k_j}). $$

Since $\mu_i^{k+1} = \frac{\mu_i^k}{c}\, \theta'\!\left(\frac{f_i(x^{k+1})}{(\mu_i^k)^{p-1} r_k} + \tilde{y}\right)$ for $i = 1,\dots,m$, we have

$$ f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) = \mu_i^{k_j} f_i(x^{k_j+1}) \left[ \frac{\mu_i^{k_j+1}}{\mu_i^{k_j}} - 1 \right] \qquad (31) $$
$$ = \mu_i^{k_j} f_i(x^{k_j+1}) \left[ \frac{1}{c}\, \theta'\!\left(\frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}} + \tilde{y}\right) - 1 \right] \qquad (32) $$
$$ = \frac{\mu_i^{k_j} f_i(x^{k_j+1})}{c}\, \theta'\!\left(\frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}} + \tilde{y}\right) - \mu_i^{k_j} f_i(x^{k_j+1}). \qquad (33) $$

Since θ is strictly convex, $\theta''(t) > 0$ for all $t \in (-\infty, b)$, so $\theta'$ is increasing; using (33) and Fact 1 with $t = \frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}}$,

$$ f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) = \frac{r_{k_j} (\mu_i^{k_j})^p}{c} \cdot \frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}} \cdot \theta'\!\left(\frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}} + \tilde{y}\right) - \mu_i^{k_j} f_i(x^{k_j+1}) \ge \mu_i^{k_j} f_i(x^{k_j+1}) - \mu_i^{k_j} f_i(x^{k_j+1}) = 0, $$

hence

$$ 0 \le \sum_{i=1}^{m} f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) \le d(\mu^{k_j+1}) - d(\mu^{k_j}). $$

Since $\{d(\mu^k)\}$ is convergent, $\lim_{j \to +\infty} [d(\mu^{k_j+1}) - d(\mu^{k_j})] = 0$, and so

$$ \lim_{j \to +\infty} \sum_{i=1}^{m} f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) = 0. $$

Since $f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) \ge 0$ for all $i = 1,\dots,m$, we have $\lim_{j \to +\infty} f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) = 0$ for all $i = 1,\dots,m$. So, from (32),

$$ \lim_{j \to +\infty} \mu_i^{k_j} f_i(x^{k_j+1}) \left[ \frac{1}{c}\, \theta'\!\left(\frac{f_i(x^{k_j+1})}{(\mu_i^{k_j})^{p-1} r_{k_j}} + \tilde{y}\right) - 1 \right] = 0 \quad \text{for all } i = 1,\dots,m. \qquad (34) $$

On the other hand, from (30),

$$ \frac{f_{i_0}(x^{k_j+1})}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}} + \tilde{y} \le \frac{-\epsilon}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}\, \bar{\mu}_{i_0}} + \tilde{y} < \tilde{y}, $$

and since $\theta'$ is increasing,

$$ \theta'\!\left(\frac{f_{i_0}(x^{k_j+1})}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}} + \tilde{y}\right) \le \theta'\!\left(\frac{-\epsilon}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}\, \bar{\mu}_{i_0}} + \tilde{y}\right) < \theta'(\tilde{y}) = c, $$

so

$$ \frac{1}{c}\, \theta'\!\left(\frac{f_{i_0}(x^{k_j+1})}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}} + \tilde{y}\right) \le \frac{1}{c}\, \theta'\!\left(\frac{-\epsilon}{r_{k_j} (\mu_{i_0}^{k_j})^{p-1}\, \bar{\mu}_{i_0}} + \tilde{y}\right) < 1. \qquad (35) $$

Moreover, since $r_{k_j} \le 1$ and $\mu_{i_0}^{k_j} \le \bar{\mu}_{i_0}$, the middle term in (35) is bounded above by $\frac{1}{c}\, \theta'(\tilde{y} - \epsilon/\bar{\mu}_{i_0}^{\,p}) < 1$, so the bracket in (34) for $i = i_0$ stays bounded away from 0. From (34) and (35),

$$ \lim_{j \to +\infty} \mu_{i_0}^{k_j} f_{i_0}(x^{k_j+1}) = 0. $$

Since $\lim_{j \to +\infty} f_i(x^{k_j+1})(\mu_i^{k_j+1} - \mu_i^{k_j}) = 0$, then $\lim_{j \to +\infty} \mu_{i_0}^{k_j+1} f_{i_0}(x^{k_j+1}) = 0$, which is a contradiction.

Consequently, $\mu_i^k f_i(x^k) \to 0$ for all $i = 1,\dots,m$.

Theorem 3.7. The sequence $\{f_0(x^k)\}$ converges to $\bar{f}$, and each limit point of the sequences $\{x^k\}$ and $\{\mu^k\}$ generated by the method (19)-(20) (or (19) and (17)) is an optimal solution of problems (P) and (D), respectively.

Proof. Since $\{x^k\}$ is asymptotically feasible, for all $\epsilon > 0$, $f_0(x^k) \ge \bar{f} - \epsilon$ for sufficiently large k. According to (H2), the optimal set of problem (D) is nonempty and compact and $\bar{f} = \bar{d}$, where

$$ \bar{f} = \min\{f_0(x) : x \in \mathbb{R}^n,\ f_i(x) \le 0,\ i = 1,\dots,m\} \quad \text{and} \quad \bar{d} = \sup\{d(\mu) : \mu \in \mathbb{R}^m_+\}. $$

Furthermore, for each $\beta < \bar{d}$ the level set $\{\mu \in \mathbb{R}^m_+ : d(\mu) \ge \beta\}$ is compact. But

$$ \bar{f} = \bar{d} \ge d(\mu^k) = \inf_x\{l(x, \mu^k)\} = l(x^k, \mu^k) = f_0(x^k) + \sum_{i=1}^{m} \mu_i^k f_i(x^k) \qquad (36) $$

and, according to proposition 3.6, $\{\mu_i^k f_i(x^k)\}$ converges to 0 for all $i = 1,\dots,m$, so for all $\epsilon > 0$,

$$ \bar{f} - \epsilon \le f_0(x^k) \le \bar{f} - \sum_{i=1}^{m} \mu_i^k f_i(x^k) < \bar{f} + \epsilon $$

for sufficiently large k, and so

$$ f_0(x^k) \to \bar{f}. \qquad (37) $$

By proposition 3.5, for all $\epsilon > 0$ and for $i = 1,\dots,m$, $f_i(x^k) \le \epsilon$ for sufficiently large k; then $f_0(x^k) \le \bar{f} + \epsilon$ and $f_i(x^k) \le \epsilon$ for $i = 1,\dots,m$ and sufficiently large k. According to (H1) and corollary 20 in [7], the set $\{x \in \mathbb{R}^n : f_i(x) \le \alpha,\ i = 1,\dots,m,\ f_0(x) \le \beta\}$ is compact for any α, β; then $\{x^k\}$ is bounded, and by proposition 3.2 the sequence $\{\mu^k\}$ is also bounded. If $\bar{x}$ is a limit point of $\{x^k\}$, then by proposition 3.5 and (37), $\bar{x}$ is a primal optimal solution; and if $\bar{\mu}$ is a limit point of $\{\mu^k\}$, then, using (36) and proposition 3.6, $d(\bar{\mu}) = \bar{f} = \bar{d}$. Hence $\bar{\mu}$ is a dual optimal solution.

4 Concluding remarks

For the first time, rational powers p on the multiplier vector appear; this leads us to ask for the value of p with the best convergence rate in the context of multiplier methods. This was not treated in this paper and remains an open problem, although we observe in computational testing that the number of iterations of the main algorithm decreases when p increases, at the cost of worse numerical behavior. A multiplier method without shift and with p = 3 was considered in [4], without a complete convergence rate study but with a similar convergence analysis. A convergence analysis for the case p = 1 can be obtained following hypotheses and theorems similar to those in [9]. In [13], convergence results were obtained for p = 0 for a specific penalty function without shift. Other considerations about the value of p, for example $p \le 0$, remain open problems. Finally, we could consider shifts at each iteration, i.e. $\theta'(\tilde{y}_i^k) = c_i^k$, and choose $c_i^k = r^k \alpha^k$ as in [16] in order to relate both approaches.

5 Appendix

Following definition 2.1 in [10], we next show that the generalized like-distance considered in this work is a divergence measure.

Proposition 5.1. Given $\theta^*$ satisfying conditions 1b)-4b), $p \ge 0$, $r \in (0, 1]$ and $c > 0$, then

$$ d^p_{\theta^*}(x, y) = \sum_{i=1}^{m} \left[ \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i}{y_i}\right) - \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p-1} (\theta^*)'(c)(x_i - y_i) \right] $$

is a divergence measure in $\mathbb{R}^m_{++}$.

Proof. 1) Since $\theta^*$ is convex, for all $z, w \in \mathbb{R}$, $\theta^*(z) \ge \theta^*(w) + (\theta^*)'(w)(z - w)$; in particular, for $w = c$ and $z = \frac{c x_i}{y_i}$, $i = 1,\dots,m$, with $x_i \ge 0$, $y_i > 0$ and $\operatorname{dom} \theta^* = (0, +\infty)$, we have

$$ \theta^*\!\left(\frac{c x_i}{y_i}\right) \ge \theta^*(c) + (\theta^*)'(c)\left(\frac{c x_i}{y_i} - c\right) $$

and, for all $i = 1,\dots,m$ and $p \ge 0$,

$$ \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i}{y_i}\right) - \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p-1} (\theta^*)'(c)(x_i - y_i) \ge 0, \qquad (38) $$

so $d^p_{\theta^*}(x, y) \ge 0$.

2) Let $\{x^k\}$ be a sequence in $(0, +\infty)^m$. We next show that $\lim_{k \to +\infty} d^p_{\theta^*}(x, x^k) = 0 \iff \lim_{k \to +\infty} x^k = x$. Note that:

i) $\theta^*$ is non-increasing on $(0, \kappa)$ and increasing on $(\kappa, +\infty)$;
ii) $(\theta^*)'(\kappa) = 0$;
iii) $\lim_{t \to +\infty} \theta^*(t) = +\infty$.

So,

$$ \lim_{k \to +\infty} d^p_{\theta^*}(x, x^k) = 0 $$
$$ \iff \lim_{k \to +\infty} \sum_{i=1}^{m} \left[ \frac{(x_i^k)^p}{c}\, \theta^*\!\left(\frac{c x_i}{x_i^k}\right) - \frac{(x_i^k)^p}{c}\, \theta^*(c) - (x_i^k)^{p-1} (\theta^*)'(c)(x_i - x_i^k) \right] = 0 $$
$$ \iff \lim_{k \to +\infty} \left[ \frac{(x_i^k)^p}{c}\, \theta^*\!\left(\frac{c x_i}{x_i^k}\right) - \frac{(x_i^k)^p}{c}\, \theta^*(c) - (x_i^k)^{p-1} (\theta^*)'(c)(x_i - x_i^k) \right] = 0 \quad \text{for all } i = 1,\dots,m $$
$$ \iff \lim_{k \to +\infty} \theta^*\!\left(\frac{c x_i}{x_i^k}\right) = \theta^*(c) \text{ for all } i \iff \lim_{k \to +\infty} \frac{c x_i}{x_i^k} = c \text{ for all } i \iff \lim_{k \to +\infty} \frac{x_i}{x_i^k} = 1 \text{ for all } i \iff \lim_{k \to +\infty} x^k = x. $$

3) The level set $\Gamma_1(y, \upsilon) = \{x \in (0, +\infty)^m : d^p_{\theta^*}(x, y) \le \upsilon\}$ is bounded for all $y \in (0, +\infty)^m$ and all $\upsilon \in (0, +\infty)$. Indeed, suppose that for some y and some υ, $\Gamma_1(y, \upsilon)$ is not bounded. Then there exists $\{x^k\}$ in $(0, +\infty)^m$ with some component $x_i^k \to +\infty$ but $d^p_{\theta^*}(x^k, y) \le \upsilon$. From (14),

$$ 0 \le \sum_{i=1}^{m} \left[ \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i^k}{y_i}\right) - \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p-1} (\theta^*)'(c)(x_i^k - y_i) \right] \le \upsilon $$

and, for $i = 1,\dots,m$,

$$ 0 \le \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i^k}{y_i}\right) - \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p-1} (\theta^*)'(c)(x_i^k - y_i) \le \upsilon, $$

so

$$ 0 \le \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i^k}{y_i}\right) - y_i^{p-1} (\theta^*)'(c)\, x_i^k \le \upsilon + \frac{y_i^p}{c}\, \theta^*(c) - y_i^{p} (\theta^*)'(c). $$

If $M_i = \upsilon + \frac{y_i^p}{c}\, \theta^*(c) - y_i^p (\theta^*)'(c)$ for $i = 1,\dots,m$, we have

$$ 0 \le \frac{y_i^p}{c}\, \theta^*\!\left(\frac{c x_i^k}{y_i}\right) - y_i^{p-1} (\theta^*)'(c)\, x_i^k \le M_i $$

and, dividing by $x_i^k$ for the unbounded component,

$$ 0 \le \frac{y_i^p}{c} \cdot \frac{\theta^*\!\left(\frac{c x_i^k}{y_i}\right)}{x_i^k} \le \frac{M_i}{x_i^k} + y_i^{p-1} (\theta^*)'(c). $$

On the other hand, by 3b),

$$ \lim_{k \to +\infty} \frac{y_i^p}{c} \cdot \frac{\theta^*\!\left(\frac{c x_i^k}{y_i}\right)}{x_i^k} = \lim_{k \to +\infty} \frac{y_i^p}{c} \cdot \frac{c}{y_i}\, (\theta^*)'\!\left(\frac{c x_i^k}{y_i}\right) = y_i^{p-1} \cdot (+\infty) = +\infty, $$

while $\frac{M_i}{x_i^k} + y_i^{p-1} (\theta^*)'(c) \to y_i^{p-1} (\theta^*)'(c)$; we obtain $+\infty \le y_i^{p-1} (\theta^*)'(c)$, which is a contradiction.

4) Next we prove that $\Gamma_2(x, \upsilon) = \{y \in (0, +\infty)^m : d^p_{\theta^*}(x, y) \le \upsilon\}$ is bounded for all $x \in (0, +\infty)^m$ and all $\upsilon \in (0, +\infty)$. Suppose there exists $\{y^k\} \subset \Gamma_2(x, \upsilon)$ such that, for some j, $\{y_j^k\}$ is not bounded, and without loss of generality $y_j^{k+1} > y_j^k$ for all k. For x fixed, since $\frac{x_j}{y_j^k} \to 0$ as $k \to +\infty$, for sufficiently large k we have $\frac{x_j}{y_j^k} < 1$, implying $\frac{c x_j}{y_j^k} < c$, so by the mean value theorem there is ξ with $\frac{c x_j}{y_j^k} < \xi < c$ such that

$$ \upsilon \ge \frac{(y_j^k)^p}{c}\, \theta^*\!\left(\frac{c x_j}{y_j^k}\right) - \frac{(y_j^k)^p}{c}\, \theta^*(c) - (y_j^k)^{p-1} (\theta^*)'(c)\, x_j + (y_j^k)^{p-1} (\theta^*)'(c)\, y_j^k $$
$$ = \frac{(y_j^k)^p}{c}\, (\theta^*)'(\xi)\left(\frac{c x_j}{y_j^k} - c\right) - (y_j^k)^{p} (\theta^*)'(c)\left(\frac{x_j}{y_j^k} - 1\right) $$
$$ = (y_j^k)^p \left[(\theta^*)'(\xi) - (\theta^*)'(c)\right]\left(\frac{x_j}{y_j^k} - 1\right) = (y_j^k)^p\, (\theta^*)''(\bar{\xi})(\xi - c)\left(\frac{x_j}{y_j^k} - 1\right) \to +\infty \quad (\xi < \bar{\xi} < c), $$

which is a contradiction. Finally, $d^p_{\theta^*}(x, y)$ is a divergence measure.

References

[1] A. Auslender, M. Teboulle and S. Ben-Tiba, Interior proximal and multiplier methods based on second order homogeneous kernels, Mathematics of Operations Research, 24 (1999), no. 3, 645-668.

[2] A. Ben-Tal and M. Zibulevsky, Penalty/barrier multiplier methods for convex programming problems, SIAM Journal on Optimization, 7 (1997), 347-366.

[3] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Academic Press, New York, 1982.

[4] J. Campos, R.A. Castillo and E. Hernandez, A multiplier method with a third order homogeneous kernel, Technical Report, U.C.L.A., Venezuela, 2005.

[5] R.A. Castillo and C. Gonzaga, Penalidades generalizadas e métodos de lagrangeano aumentado para programação não linear, D.Sc. thesis, U.F.R.J., Brazil, 1998.

[6] J. Eckstein, Nonlinear proximal point algorithms using Bregman functions, with applications to convex programming, Mathematics of Operations Research, 18 (1993), 202-226.

[7] A. Fiacco and G. McCormick, Nonlinear Programming: Sequential Unconstrained Minimization Techniques, Classics in Applied Mathematics, SIAM, Philadelphia, 1990.

[8] A. Iusem, Métodos de ponto proximal em otimização, 20º Colóquio Brasileiro de Matemática, IMPA, Rio de Janeiro, Brazil, 1995.

[9] A. Iusem, M. Teboulle and B. Svaiter, Entropy-like proximal methods in convex programming, Mathematics of Operations Research, 19 (1994), no. 4, 790-814.

[10] A. Iusem, Augmented Lagrangian methods and proximal point methods for convex optimization, Investigación Operativa, 8 (1997), 11-50.

[11] J.B. Hiriart-Urruty and C. Lemaréchal, Convex Analysis and Minimization Algorithms II, Springer-Verlag, New York, 1996.

[12] K. Kiwiel, Proximal minimization methods with generalized Bregman functions, SIAM Journal on Control and Optimization, 35 (1997), 1142-1168.

[13] L. Matioli, Doctoral thesis, Universidade Federal de Santa Catarina, Brazil, 2000.

[14] R. Polyak and M. Teboulle, Nonlinear rescaling and proximal-like methods in convex optimization, Mathematical Programming, 76 (1997), 265-284.

[15] R.T. Rockafellar, Convex Analysis, Princeton University Press, Princeton, NJ, 1970.

[16] P.J. Silva, J. Eckstein and C. Humes, Rescaling and stepsize selection in proximal methods using separable generalized distances, SIAM Journal on Optimization, 12 (2001), no. 1, 238-261.

[17] M. Teboulle, Entropic proximal mappings with applications to nonlinear programming, Mathematics of Operations Research, 17 (1992), 97-116.

Received: November 21, 2006
