Journal of Optimization Theory and Applications manuscript No. (will be inserted by the editor)
The Log-Exponential Smoothing Technique and Nesterov's Accelerated Gradient Method for Generalized Sylvester Problems

Nguyen Thai An · Daniel Giles · Nguyen Mau Nam · R. Blake Rector
Received: date / Accepted: date
Abstract  The Sylvester or smallest enclosing circle problem involves finding the smallest circle enclosing a finite number of points in the plane. We consider generalized versions of the Sylvester problem in which the points are replaced by sets. Based on the log-exponential smoothing technique and Nesterov's accelerated gradient method, we present an effective numerical algorithm for solving these problems.

Keywords  log-exponential smoothing technique · majorization minimization algorithm · Nesterov's accelerated gradient method · generalized Sylvester problem

Mathematics Subject Classification (2000)  Primary 49J52 · 49J53 · Secondary 90C30

Communicated by Horst Martini.

N. T. An: Thua Thien Hue College of Education, 123 Nguyen Hue, Hue City, Vietnam. E-mail: [email protected]
D. Giles · N. M. Nam · R. B. Rector: Fariborz Maseeh Department of Mathematics and Statistics, Portland State University, PO Box 751, Portland, OR 97207, United States. E-mail: [email protected], [email protected], [email protected]
1 Introduction
The smallest enclosing circle problem can be stated as follows: given a finite set of points in the plane, find the circle of smallest radius that encloses all of the points. This problem was introduced in the 19th century by the English mathematician James Joseph Sylvester (1814–1897) [1]. It is both a facility location problem and a major problem in computational geometry. Over a century later, research related to the smallest enclosing circle problem remains very active due to its important applications to clustering, nearest neighbor search, data classification, facility location, collision detection, computer graphics, and military operations. The problem has been widely treated in the literature from both theoretical and numerical standpoints; see [2–11] and the references therein.

A generalized Sylvester problem called the smallest intersecting ball problem (SIBP, for brevity) was considered in [12, 13]. This problem asks for a smallest ball that intersects a finite number of target sets in R^n. Note that when the given sets are singletons, the SIBP reduces to the classical Sylvester problem. Besides the mathematical motivation, this problem appears in more complicated models of facility location in which the sizes of the locations are not negligible, as in bilevel transportation problems. The SIBP can be solved by minimizing a nonsmooth objective function represented as the maximum of the distances to a finite number of target sets. Minimizing functions of distances to sets also arises in other problems of optimization and computational statistics, such as the generalized Fermat-Torricelli problem, the feasible point problem, and the support vector machine; see [14–16] and the references therein.

The nondifferentiability of the objective function in the SIBP makes it difficult to develop effective numerical minimization algorithms. In [12], after establishing theoretical properties of the problem such as the existence and uniqueness of solutions as well as optimality conditions, the authors used the subgradient method for numerically solving the problem. In fact, the subgradient method allows us to solve the SIBP involving not only the Euclidean distance but also distances generated by different norms. However, the subgradient method is known to be slow in general, so the main goal of this paper is to develop a more effective numerical algorithm for solving the SIBP. A natural approach is to approximate the nonsmooth objective function by a smooth one that is convenient for applying available smooth optimization schemes. Based on the log-exponential smoothing technique and Nesterov's accelerated gradient method, we present an effective numerical algorithm for solving this problem.

Our paper is organized as follows. Section 2 contains tools of convex optimization used throughout the paper. We study the existence and uniqueness of optimal solutions to the smallest intersecting ball problem in Section 3. In Section 4, we focus on the analysis of the log-exponential smoothing technique applied to the SIBP. Section 5 is devoted to developing an effective algorithm based on the majorization minimization algorithm and Nesterov's accelerated gradient method to solve the problem. We also analyze the convergence of the algorithm. Finally, we present some numerical examples in Section 6.
2 Problem Formulation and Tools of Convex Optimization
In this section, we introduce the mathematical models of the generalized Sylvester problems under consideration and some important tools of convex optimization used throughout the paper. The reader is referred to [17–20] for a more complete theory of convex analysis and its applications to optimization from both theoretical and numerical aspects.

We use R^n to denote the n-dimensional Euclidean space, ⟨·, ·⟩ to denote the usual inner product in R^n, and ‖·‖ to denote the associated Euclidean norm. The distance function to a nonempty set Q is defined by

    d(x; Q) := inf{‖x − ω‖ : ω ∈ Q},  x ∈ R^n.   (1)

Given x ∈ R^n, the Euclidean projection from x to Q is the set

    P(x; Q) := {ω ∈ Q : d(x; Q) = ‖x − ω‖}.

If Q is a nonempty, closed and convex set, then d(·; Q) is a convex function and P(x; Q) is a singleton for every x ∈ R^n. Furthermore, the projection operator is nonexpansive. The projection operator and the distance function to a nonempty, closed and convex set Q are related through the identity

    ∇d²(x; Q) = 2[x − P(x; Q)],  x ∈ R^n.   (2)

A standard proof of this fact can be found in [19, p. 186]. If d²(x; Q) > 0, i.e., x ∉ Q, then it follows from the chain rule that

    ∇d(x; Q) = ∇√(d²(x; Q)) = (x − P(x; Q))/d(x; Q).

One has ∇d(x; Q) = 0 if x is an interior point of Q. However, the differentiability of d(·; Q) at the boundary points of Q is not guaranteed.

Given a convex function f : R^n → ]−∞, +∞] with x̄ ∈ dom f := {x ∈ R^n : f(x) < +∞}, the subdifferential (collection of subgradients) of f at x̄ is defined by

    ∂f(x̄) := {v ∈ R^n : ⟨v, x − x̄⟩ ≤ f(x) − f(x̄) for all x ∈ R^n}.
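To make identity (2) concrete, the following Python/NumPy sketch (not part of the original paper; the helper names are our own) projects onto a Euclidean ball and checks the gradient 2[x − P(x; Q)] of the squared distance against a finite-difference approximation.

import numpy as np

def proj_ball(x, center, radius):
    # Euclidean projection onto the closed ball B(center, radius)
    d = x - center
    nd = np.linalg.norm(d)
    return x if nd <= radius else center + radius * d / nd

def sq_dist(x, proj):
    # squared Euclidean distance to the set represented by its projection map
    return float(np.sum((x - proj(x)) ** 2))

rng = np.random.default_rng(0)
x = 3.0 * rng.normal(size=3)
proj = lambda z: proj_ball(z, center=np.zeros(3), radius=1.0)

grad_formula = 2.0 * (x - proj(x))            # identity (2)
eps = 1e-6
grad_fd = np.array([(sq_dist(x + eps * e, proj) - sq_dist(x - eps * e, proj)) / (2 * eps)
                    for e in np.eye(3)])
print(np.allclose(grad_formula, grad_fd, atol=1e-5))   # expected: True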
Let Q be a convex set in R^n and let x̄ ∈ Q. The normal cone to Q at x̄ is defined by

    N(x̄; Q) := {v ∈ R^n : ⟨v, x − x̄⟩ ≤ 0 for all x ∈ Q}.

By [19, p. 259], the subdifferential of the distance function (1) to a nonempty, closed and convex set Q at x can be computed by the formula

    ∂d(x; Q) = N(x; Q) ∩ IB                  if x ∈ Q,
    ∂d(x; Q) = {(x − P(x; Q))/d(x; Q)}       if x ∉ Q,   (3)

where IB denotes the Euclidean closed unit ball of R^n.

Let Ωi for i = 1, . . . , m and Ω be nonempty, closed and convex subsets of R^n. The mathematical model of the smallest intersecting ball problem generated by the target sets Ωi and the constraint set Ω is

    minimize D(x) := max_{1≤i≤m} d(x; Ωi) subject to x ∈ Ω.   (4)

A solution to this problem gives the center of a smallest Euclidean ball (with center in Ω) that intersects all target sets Ωi.

We consider a more general setting in what follows. Let F be a closed, bounded and convex set that contains the origin as an interior point. We hold this as a standing assumption on the set F for the remainder of the paper. The support function associated with F is defined by

    σF(x) := sup{⟨x, ω⟩ : ω ∈ F}.   (5)

Note that if F = {x ∈ R^n : ‖x‖_X ≤ 1}, where ‖·‖_X is a norm in R^n, then σF is the dual norm of the norm ‖·‖_X. Let Q be a nonempty subset of R^n. The generalized distance from a point x ∈ R^n to Q generated by F is given by

    dF(x; Q) := inf{σF(x − ω) : ω ∈ Q}.   (6)

The generalized distance function (6) reduces to the distance function (1) when F is the closed unit ball of R^n with respect to the Euclidean norm. The reader is referred to [21] for important properties of the generalized distance function (6). Using (6), a more general model of problem (4) is given by

    minimize DF(x) := max_{1≤i≤m} dF(x; Ωi) subject to x ∈ Ω.   (7)
The generalized distance function generated by a Minkowski gauge, and in particular the distance function generated by a norm studied in [12, 16, 22], can be written in terms of the generalized distance function (6). In fact, the Minkowski gauge

    ρK(x) := inf{t > 0 : x ∈ tK}

generated by a closed and convex set K containing the origin as an interior point has the representation ρK(x) = σ_{K°}(x), where K° denotes the polar of K given by

    K° := {u ∈ R^n : ⟨u, x⟩ ≤ 1 for all x ∈ K};

see, e.g., [22, Proposition 2.1]. Thus, considering the generalized distance function generated by a support function allows us to solve the SIBP in many different settings where the intersecting ball is not necessarily the Euclidean ball or the ball generated by some ℓp-norm.

The class of convex functions plays an important role in many applications of mathematics, especially applications to optimization. It is well known that a convex function f : R^n → R has an absolute minimum on a convex set Ω at x̄ if and only if it has a local minimum on Ω at x̄. Moreover, if f : R^n → R is a differentiable convex function, then x̄ ∈ Ω is a minimizer of f on Ω if and only if

    ⟨∇f(x̄), x − x̄⟩ ≥ 0 for all x ∈ Ω.   (8)
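As a small numerical illustration of (5) and of the dual-norm remark above (our own example, not from the original text): the support function of the ℓ∞ unit ball is the ℓ1 norm, and that of the Euclidean unit ball is the Euclidean norm.

import numpy as np

def support_linf_ball(x):
    # sigma_F(x) for F = {w : ||w||_inf <= 1}; the supremum is attained at w = sign(x)
    return float(np.sum(np.abs(x)))

def support_l2_ball(x):
    # sigma_F(x) for F = the Euclidean unit ball; the supremum is attained at w = x/||x||
    return float(np.linalg.norm(x))

x = np.array([1.5, -2.0, 0.5])
print(support_linf_ball(x), np.linalg.norm(x, 1))   # both equal 4.0
print(support_l2_ball(x), np.linalg.norm(x, 2))     # both equal the Euclidean norm of x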
3 Existence and Uniqueness of Optimal Solutions
In this section, we study the SIBP in its general form (7). The main theorem of the section gives necessary and sufficient conditions for the SIBP (7) to have a unique optimal solution. Our results complete known results in [12, 13].

Given an element v ∈ R^n, the cone generated by v is defined by cone{v} := {λv : λ ≥ 0}. Let us review the following definition from [21]. We recall that F is a closed, bounded and convex set that contains zero in its interior.

Definition 3.1 (i) The set F is called normally smooth iff for every x ∈ bd F there exists a_x ∈ R^n such that N(x; F) = cone{a_x}.
(ii) A subset Q of R^n is called strictly convex iff λx + (1 − λ)y ∈ int Q whenever x, y ∈ Q, x ≠ y and λ ∈ ]0, 1[.

Let us present the main theorem of this section.
Theorem 3.1 Suppose that F is normally smooth, all of the target sets Ωi for i = 1, . . . , m are strictly convex, and at least one of the sets Ω, Ω1, . . . , Ωm is bounded. Then the SIBP (7) has a unique optimal solution if and only if ⋂_{i=1}^m (Ω ∩ Ωi) contains at most one point.

Proof Obviously, every point in the set ⋂_{i=1}^m (Ω ∩ Ωi) is a solution of (7). Thus, if (7) has a unique optimal solution, we must have that ⋂_{i=1}^m (Ω ∩ Ωi) contains at most one point, so the necessary condition has been proved.

For the sufficient condition, assume that ⋂_{i=1}^m (Ω ∩ Ωi) contains at most one point. The existence of an optimal solution is guaranteed by the assumption that at least one of the sets Ω, Ω1, . . . , Ωm is bounded, which implies that each level set {x ∈ Ω : DF(x) ≤ α}, α ∈ R, is bounded. What remains to be shown is the uniqueness of this solution. We consider two cases.

In the first case, assume that ⋂_{i=1}^m (Ω ∩ Ωi) contains exactly one point x̄. Observe that DF(x̄) = 0 and DF(x) ≥ 0 for all x ∈ R^n, so x̄ is a solution of (7). If x̂ ∈ Ω is another solution, then we must have DF(x̂) = DF(x̄) = 0. Therefore, dF(x̂; Ωi) = 0 for all i ∈ {1, . . . , m} and hence x̂ ∈ ⋂_{i=1}^m (Ω ∩ Ωi) = {x̄}.
We conclude that x̂ = x̄ and the problem has a unique solution in this case.

For the second case, we assume that ⋂_{i=1}^m (Ω ∩ Ωi) = ∅ and show that the function

    S(x) := max{ (dF(x; Ω1))², . . . , (dF(x; Ωm))² }

is strictly convex on Ω. This will prove the uniqueness of the solution. Take any x, y ∈ Ω, x ≠ y, and t ∈ ]0, 1[. Denote xt := tx + (1 − t)y. Let i ∈ {1, . . . , m} be such that (dF(xt; Ωi))² = S(xt). Let u, v ∈ Ωi be such that σF(x − u) = dF(x; Ωi) and σF(y − v) = dF(y; Ωi). Then we have

    S(xt) = (dF(xt; Ωi))² = [dF(tx + (1 − t)y; Ωi)]² ≤ [t dF(x; Ωi) + (1 − t) dF(y; Ωi)]²
          = [t σF(x − u) + (1 − t) σF(y − v)]²
          = t² (σF(x − u))² + 2t(1 − t) σF(x − u) σF(y − v) + (1 − t)² (σF(y − v))²
          ≤ t² (σF(x − u))² + t(1 − t)[(σF(x − u))² + (σF(y − v))²] + (1 − t)² (σF(y − v))²
          = t (σF(x − u))² + (1 − t) (σF(y − v))²
          = t (dF(x; Ωi))² + (1 − t) (dF(y; Ωi))² ≤ t S(x) + (1 − t) S(y).
Recall that we need to prove the strict inequality S(xt) < t S(x) + (1 − t) S(y). Suppose by contradiction that S(xt) = t S(x) + (1 − t) S(y). Then all of the inequalities in the previous estimate turn into equalities, and thus we have

    dF(xt; Ωi) = t dF(x; Ωi) + (1 − t) dF(y; Ωi) and σF(x − u) = σF(y − v).   (9)

Hence,

    dF(xt; Ωi) = σF(x − u) = σF(y − v).   (10)

Observe that σF(w) = 0 if and only if w = 0, and hence it follows from (10) that x = u if and only if y = v. We claim that x ≠ u and y ≠ v. Indeed, if x = u and y = v, then x, y ∈ Ωi and hence xt ∈ Ωi by the convexity of Ωi. Thus, dF(xt; Ωi) = 0, a contradiction to the fact that dF(xt; Ωi) = DF(xt) > 0 guaranteed by the assumption ⋂_{i=1}^m (Ω ∩ Ωi) = ∅.

Now, we will show that u ≠ v. Denote c := tu + (1 − t)v ∈ Ωi. The properties of the support function and (9) give

    dF(xt; Ωi) ≤ σF(xt − c) = σF(t(x − u) + (1 − t)(y − v)) ≤ σF(t(x − u)) + σF((1 − t)(y − v)) = t σF(x − u) + (1 − t) σF(y − v) = σF(x − u).

By (10), all of these inequalities hold as equalities, so in particular

    σF(t(x − u) + (1 − t)(y − v)) = σF(t(x − u)) + σF((1 − t)(y − v)).

Since F is normally smooth, it follows from [21, Remark 3.4] that there exists λ > 0 satisfying t(x − u) = λ(1 − t)(y − v). Now, by contradiction, suppose u = v. Then x − u = β(y − u), where β = t⁻¹λ(1 − t). Note that β ≠ 1 since x ≠ y, and σF(y − u) > 0 since y − u ≠ 0. It follows that

    σF(x − u) = sup{⟨x − u, ω⟩ : ω ∈ F} = sup{⟨β(y − u), ω⟩ : ω ∈ F} = β sup{⟨y − u, ω⟩ : ω ∈ F} = β σF(y − u) ≠ σF(y − u),

which contradicts (10). Thus, u ≠ v.
Since u, v ∈ Ωi, u ≠ v, and Ωi is strictly convex, c ∈ int Ωi. The assumption ⋂_{i=1}^m (Ω ∩ Ωi) = ∅ gives dF(xt; Ωi) = DF(xt) > 0. Therefore, xt ∉ Ωi and thus xt ≠ c. Let δ > 0 be such that IB(c; δ) ⊂ Ωi. Then c + γ(xt − c) ∈ Ωi with γ = δ/(2‖xt − c‖) > 0. We have

    dF(xt; Ωi) ≤ σF(xt − c − γ(xt − c)) = (1 − γ) σF(t(x − u) + (1 − t)(y − v)) < σF(t(x − u) + (1 − t)(y − v)) = σF(x − u).

This contradicts (10) and completes the proof. □
4 Smoothing Techniques for Generalized Sylvester Problems
In this section, we study the log-exponential smoothing technique for the SIBP. Although it is possible to formulate the results for the SIBP in its general form (7), for simplicity we present our results for the SIBP generated by the Euclidean norm (4). We employ the approach developed by G. Zhou et al. [11] to approximate the nonsmooth function D in problem (4) by a smooth one which is convenient for applying available smooth numerical algorithms. The difference here is that we use distances to sets instead of distances to points. If all of the target sets have a common point which belongs to the constraint set, then such a point is a solution of problem (4), so we always assume that ⋂_{i=1}^m (Ωi ∩ Ω) = ∅. We also assume that at least one of the sets Ω, Ω1, . . . , Ωm is bounded, which guarantees the existence of an optimal solution. These are our standing assumptions for the remainder of this paper.

Let us start with some useful well-known results. We include the proofs for the convenience of the reader.
Lemma 4.1 Given positive numbers ai for i = 1, . . . , m, m > 1, and 0 < s < t, one has
(i) (a1^s + a2^s + . . . + am^s)^{1/s} > (a1^t + a2^t + . . . + am^t)^{1/t};
(ii) (a1^{1/s} + a2^{1/s} + . . . + am^{1/s})^s < (a1^{1/t} + a2^{1/t} + . . . + am^{1/t})^t;
(iii) lim_{r→0+} (a1^{1/r} + a2^{1/r} + . . . + am^{1/r})^r = max{a1, . . . , am}.

Proof (i) Since t/s > 1 and ai^s / Σ_{j=1}^m aj^s ∈ ]0, 1[ for each i,

    (a1^s / Σ_{j=1}^m aj^s)^{t/s} + · · · + (am^s / Σ_{j=1}^m aj^s)^{t/s} < a1^s / Σ_{j=1}^m aj^s + · · · + am^s / Σ_{j=1}^m aj^s = 1.

It follows that

    a1^t / (Σ_{j=1}^m aj^s)^{t/s} + · · · + am^t / (Σ_{j=1}^m aj^s)^{t/s} < 1,

and hence

    a1^t + · · · + am^t < (Σ_{j=1}^m aj^s)^{t/s}.

This implies (i) by raising both sides to the power 1/t.
(ii) The inequality in (ii) follows directly from (i).
(iii) Defining a := max{a1, . . . , am} yields

    a ≤ (a1^{1/r} + a2^{1/r} + . . . + am^{1/r})^r ≤ m^r a → a as r → 0+,

which implies (iii) and completes the proof. □

For p > 0, the log-exponential smoothing function of D is defined by

    D(x, p) := p ln Σ_{i=1}^m exp(Gi(x, p)/p),  where Gi(x, p) := √(d(x; Ωi)² + p²).   (11)

The sets Ωi for i = 1, . . . , m are said to be non-collinear iff it is impossible to draw a straight line that intersects all of these sets.
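For readers who wish to experiment with (11), the following NumPy sketch (our own illustration; the target sets are represented by user-supplied projection maps) evaluates D(x, p), shifting by the largest exponent before exponentiating to avoid overflow.

import numpy as np

def smoothed_max_distance(x, projections, p):
    # D(x, p) = p * ln sum_i exp(G_i(x, p)/p), with G_i(x, p) = sqrt(d(x; Omega_i)^2 + p^2)
    d = np.array([np.linalg.norm(x - proj(x)) for proj in projections])
    G = np.sqrt(d ** 2 + p ** 2)
    Gmax = G.max()                                   # shift to keep exp() from overflowing
    return Gmax + p * np.log(np.sum(np.exp((G - Gmax) / p)))

# two singleton targets in the plane; D(x, p) approaches D(x) = max_i d(x; Omega_i) as p -> 0
projections = [lambda z: np.array([0.0, 0.0]), lambda z: np.array([4.0, 0.0])]
x = np.array([1.0, 2.0])
D = max(np.linalg.norm(x - proj(x)) for proj in projections)
for p in (1.0, 0.1, 0.01):
    print(p, smoothed_max_distance(x, projections, p) - D)   # gap lies between 0 and p*(1 + ln 2)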
Theorem 4.1 The function D(x, p) defined in (11) has the following properties:
(i) If x ∈ R^n and 0 < p1 < p2, then D(x, p1) < D(x, p2).
(ii) For any x ∈ R^n and p > 0,

    0 ≤ D(x, p) − D(x) ≤ p(1 + ln m).

(iii) For any p > 0, the function D(·, p) is convex. If we suppose further that the sets Ωi for i = 1, . . . , m are strictly convex and non-collinear, then D(·, p) is strictly convex.
(iv) For any p > 0, D(·, p) is continuously differentiable, with gradient in x computed by

    ∇x D(x, p) = Σ_{i=1}^m Λi(x, p) (x − x̃i)/Gi(x, p),

where x̃i := P(x; Ωi) and

    Λi(x, p) := exp(Gi(x, p)/p) / Σ_{j=1}^m exp(Gj(x, p)/p).

(v) If at least one of the target sets Ωi for i = 1, . . . , m is bounded, then D(·, p) is coercive in the sense that

    lim_{‖x‖→+∞} D(x, p) = +∞.
Proof (i) Define ai(x, p) := exp(Gi(x, p)), a∞(x, p) := max_{i=1,...,m} ai(x, p), and G∞(x, p) := max_{i=1,...,m} Gi(x, p). Then ai(x, p) is strictly increasing on ]0, +∞[ as a function of p, and

    ai(x, p)/a∞(x, p) = exp(Gi(x, p) − G∞(x, p)) ≤ 1,  G∞(x, p) ≤ D(x) + p.

For 0 < p1 < p2, it follows from Lemma 4.1(ii) that

    D(x, p1) = ln[ Σ_{i=1}^m (ai(x, p1))^{1/p1} ]^{p1} < ln[ Σ_{i=1}^m (ai(x, p1))^{1/p2} ]^{p2} < ln[ Σ_{i=1}^m (ai(x, p2))^{1/p2} ]^{p2} = D(x, p2),

which justifies (i).

(ii) It follows from (11) that for any i ∈ {1, . . . , m} we have

    D(x, p) ≥ p ln exp(Gi(x, p)/p) = Gi(x, p) ≥ d(x; Ωi).

This implies D(x, p) ≥ D(x) for all x ∈ R^n and p > 0. Moreover,

    D(x, p) = ln[ a∞(x, p) ( Σ_{i=1}^m (ai(x, p)/a∞(x, p))^{1/p} )^p ]
            = ln a∞(x, p) + p ln Σ_{i=1}^m (ai(x, p)/a∞(x, p))^{1/p}
            ≤ G∞(x, p) + p ln m ≤ D(x) + p + p ln m.

Thus, (ii) has been proved.

(iii) Given p > 0, the function fp(t) := √(t² + p²)/p is increasing and convex on the interval [0, +∞[, and d(·; Ωi) is convex, so the function ki(x, p) := Gi(x, p)/p = fp(d(x; Ωi)) is also convex with respect to x. For any x, y ∈ R^n and λ ∈ ]0, 1[, by the convexity of the function u = (u1, . . . , um) ↦ ln Σ_{i=1}^m exp(ui), one has

    D(λx + (1 − λ)y, p) = p ln Σ_{i=1}^m exp(ki(λx + (1 − λ)y, p))
                        ≤ p ln Σ_{i=1}^m exp(λ ki(x, p) + (1 − λ) ki(y, p))   (12)
                        ≤ λ p ln Σ_{i=1}^m exp(ki(x, p)) + (1 − λ) p ln Σ_{i=1}^m exp(ki(y, p))
                        = λ D(x, p) + (1 − λ) D(y, p).

Thus, D(·, p) is convex. Suppose that the sets Ωi for i = 1, . . . , m are strictly convex and non-collinear, but D(·, p) is not strictly convex. Then there exist x, y ∈ R^n with x ≠ y and 0 < λ < 1 such that D(λx + (1 − λ)y, p) = λ D(x, p) + (1 − λ) D(y, p). Therefore, all the inequalities in (12) become equalities. Since the functions ln and exp are strictly increasing on ]0, +∞[, one has

    ki(λx + (1 − λ)y, p) = λ ki(x, p) + (1 − λ) ki(y, p)   (13)

for all i = 1, . . . , m. Since ki(·, p) = fp(d(·; Ωi)) and the function fp(·) is strictly increasing and convex on [0, +∞[, it follows from (13) that

    d(λx + (1 − λ)y; Ωi) = λ d(x; Ωi) + (1 − λ) d(y; Ωi) for all i = 1, . . . , m.

The result now follows directly from the proof of [21, Proposition 4.5].

(iv) Let φi(x) := [d(x; Ωi)]². According to (2), φi is continuously differentiable and ∇φi(x) = 2(x − x̃i), where x̃i := P(x; Ωi). Therefore, the function D(x, p) is continuously differentiable as a function of x. Consider the functions g : R^m → R defined by g(y) := p ln Σ_{i=1}^m exp(yi) and f : R^n → R^m defined by f(x) := (f1(x), . . . , fm(x)), where fi(x) := Gi(x, p)/p for i = 1, . . . , m. Then D(x, p) is the composition of g and f. Note that, for any x ∈ R^n and y ∈ R^m, one has

    ∇fi(x) = (x − x̃i)/(p Gi(x, p)) and ∇g(y) = (p / Σ_{i=1}^m exp(yi)) (exp(y1), . . . , exp(ym)).

The gradient formula for D(x, p) is obtained by the chain rule.

(v) Without loss of generality, we assume that Ω1 is bounded. It then follows from (ii) that

    lim_{‖x‖→+∞} D(x, p) ≥ lim_{‖x‖→+∞} D(x) ≥ lim_{‖x‖→+∞} d(x; Ω1) = +∞.

Therefore, D(·, p) is coercive, which justifies (v). The proof is complete. □
Remark 4.1 (i) In general, D(·, p) is not strictly convex. For example, in R², consider the sets Ω1 = {−1} × [−1, 1] and Ω2 = {1} × [−1, 1]. Then D(·, p) takes a constant value on {0} × [−1, 1].
(ii) To avoid working with large numbers when implementing algorithms for (4), we often use the identity

    Λi(x, p) := exp(Gi(x, p)/p) / Σ_{j=1}^m exp(Gj(x, p)/p) = exp((Gi(x, p) − G∞(x, p))/p) / Σ_{j=1}^m exp((Gj(x, p) − G∞(x, p))/p),

where G∞(x, p) := max_{i=1,...,m} Gi(x, p).
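Continuing the earlier sketch, the gradient formula of Theorem 4.1(iv), evaluated with the stabilized weights Λi(x, p) from part (ii) of this remark, might be coded as follows (again our own hedged illustration with hypothetical helper names).

import numpy as np

def smoothed_grad(x, projections, p):
    # gradient of D(., p): sum_i Lambda_i(x, p) * (x - P(x; Omega_i)) / G_i(x, p),
    # with Lambda_i computed after subtracting G_inf = max_i G_i (Remark 4.1(ii))
    P = [proj(x) for proj in projections]
    d = np.array([np.linalg.norm(x - Pi) for Pi in P])
    G = np.sqrt(d ** 2 + p ** 2)
    w = np.exp((G - G.max()) / p)                    # overflow-safe, unnormalized weights
    Lam = w / w.sum()
    return sum(Lam[i] * (x - P[i]) / G[i] for i in range(len(P)))

def smoothed_value(x, projections, p):
    d = np.array([np.linalg.norm(x - proj(x)) for proj in projections])
    G = np.sqrt(d ** 2 + p ** 2)
    return G.max() + p * np.log(np.sum(np.exp((G - G.max()) / p)))

# finite-difference check of the gradient formula
projections = [lambda z: np.array([0.0, 0.0]), lambda z: np.array([4.0, 0.0])]
x, p, eps = np.array([1.0, 2.0]), 0.5, 1e-6
fd = np.array([(smoothed_value(x + eps * e, projections, p)
                - smoothed_value(x - eps * e, projections, p)) / (2 * eps) for e in np.eye(2)])
print(np.allclose(smoothed_grad(x, projections, p), fd, atol=1e-5))   # expected: True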
We now clarify the relation between problem (4) and the problem of minimizing the function (11) on Ω. Note that the assumption involving the uniqueness of an optimal solution is guaranteed by Theorem 3.1.
Proposition 4.1 Let {p_k} be a sequence of positive real numbers converging to 0. For each k, let y^k ∈ argmin_{x∈Ω} D(x, p_k). Then {y^k} is a bounded sequence and every cluster point of {y^k} is an optimal solution of (4). Suppose further that (4) has a unique optimal solution. Then {y^k} converges to that optimal solution.

Proof First, observe that {y^k} is well defined because of the assumption that at least one of the sets Ω, Ω1, . . . , Ωm is bounded and the coercivity of D(·, p_k). By Theorem 4.1(ii), for all x ∈ Ω we have

    D(x, p_k) ≤ D(x) + p_k(1 + ln m) and D(y^k) ≤ D(y^k, p_k) ≤ D(x, p_k).

Thus, D(y^k) ≤ D(x) + p_k(1 + ln m), which implies the boundedness of {y^k} using the boundedness of Ω or the coercivity of D(·) from Theorem 4.1(v). Suppose that the subsequence {y^{kℓ}} converges to y^0. Passing to the limit along this subsequence in the inequality above gives D(y^0) ≤ D(x) for all x ∈ Ω, and hence y^0 is an optimal solution of problem (4). If (4) has a unique optimal solution ȳ, then y^0 = ȳ and hence y^k → ȳ. □
Recall that a function f : Ω → R is called strongly convex with modulus m > 0 on a convex set Ω iff f(x) − (m/2)‖x‖² is a convex function on Ω. From the definition, it is obvious that any strongly convex function is also strictly convex. Moreover, when f is twice differentiable, f is strongly convex with modulus m on an open convex set Ω if ∇²f(x) − mI is positive semidefinite for all x ∈ Ω; see [19, Theorem 4.3.1(iii)].
Proposition 4.2 Suppose that all the sets Ωi for i = 1, . . . , m reduce to singletons. Then for any p > 0, the function D(·, p) is strongly convex on any bounded convex set, and ∇x D(·, p) is globally Lipschitz continuous on R^n with Lipschitz constant 2/p.

Proof Suppose that Ωi = {ci} for i = 1, . . . , m. Then

    D(x, p) = p ln Σ_{i=1}^m exp(gi(x, p)/p),

and the gradient of D(·, p) at x becomes

    ∇x D(x, p) = Σ_{i=1}^m (λi(x, p)/gi(x, p)) (x − ci),

where

    gi(x, p) := √(‖x − ci‖² + p²) and λi(x, p) := exp(gi(x, p)/p) / Σ_{j=1}^m exp(gj(x, p)/p).

Let us denote

    Qij := (x − ci)(x − cj)^T / (gi(x, p) gj(x, p)).

Then

    ∇²x D(x, p) = Σ_{i=1}^m [ (λi(x, p)/gi(x, p)) (In − Qii) + (λi(x, p)/p) Qii ] − Σ_{i=1}^m Σ_{j=1}^m (λi(x, p) λj(x, p)/p) Qij.

For a bounded convex set Ω, let K > 0 be such that the open ball with radius K centered at the origin contains Ω. For any x ∈ R^n with ‖x‖ < K and any z ∈ R^n, z ≠ 0, one has

    (1/gi(x, p)) (‖z‖² − z^T Qii z) ≥ (1/gi(x, p)) (‖z‖² − ‖z‖² ‖(x − ci)/gi(x, p)‖²)
        = (1/√(‖x − ci‖² + p²)) · (p²/(‖x − ci‖² + p²)) ‖z‖²
        ≥ (p² / [2(‖x‖² + ‖ci‖²) + p²]^{3/2}) ‖z‖²
        ≥ ℓ ‖z‖²,

where

    ℓ := p² / [2K² + 2 max_{1≤i≤m} ‖ci‖² + p²]^{3/2}.

For any real numbers a1, . . . , am, since λi(x, p) ≥ 0 for all i = 1, . . . , m and Σ_{i=1}^m λi(x, p) = 1, the Cauchy-Schwarz inequality yields

    ( Σ_{i=1}^m λi(x, p) ai )² = ( Σ_{i=1}^m √λi(x, p) · √λi(x, p) ai )² ≤ Σ_{i=1}^m λi(x, p) ai².

This implies

    z^T ∇²x D(x, p) z = Σ_{i=1}^m (λi(x, p)/gi(x, p)) (‖z‖² − z^T Qii z) + Σ_{i=1}^m (λi(x, p)/p) z^T Qii z − Σ_{i=1}^m Σ_{j=1}^m (λi(x, p) λj(x, p)/p) z^T Qij z
        ≥ ℓ ‖z‖² + (1/p) [ Σ_{i=1}^m λi(x, p) ai² − ( Σ_{i=1}^m λi(x, p) ai )² ]
        ≥ ℓ ‖z‖²,

where ai := z^T (x − ci)/gi(x, p). This shows that D(·, p) is strongly convex on Ω. The fact that for any p > 0 the gradient of D(x, p) with respect to x is Lipschitz continuous with constant L = 2/p was proved in [23, Proposition 2]. □
As shown in [23], ∇x D(·, p) is Lipschitz with constant 2/p when the sets involved are balls in R^n. In Proposition 4.2, the sets reduce to points, which can be viewed as balls of radius zero. In the general case, for example when the sets are boxes, it is unknown whether ∇x D(·, p) is Lipschitz. In the next section, we discuss how the majorization minimization principle allows us to minimize D(·, p) in either case.
5 The Majorization Minimization Algorithm for Generalized Sylvester Problems
In this section, we apply the majorization minimization principle, well known in computational statistics, along with the log-exponential smoothing technique from the previous section to develop an algorithm for solving the SIBP. We also provide some examples showing that minimizing functions involving distances to convex sets not only allows one to study a generalized version of the smallest enclosing circle problem, but also leads to other applications in constrained optimization.

Let f : R^n → R be a convex function. Consider the optimization problem

    minimize f(x) subject to x ∈ Ω.   (14)

A function g : R^n → R is called a surrogate of f at z̄ ∈ Ω iff

    f(x) ≤ g(x) for all x ∈ Ω, and f(z̄) = g(z̄).

The set of all surrogates of f at z̄ is denoted by S(f, z̄).
The majorization minimization algorithm for solving (14) is given as follows; see [24].

Algorithm 1.
INPUT: x^0 ∈ Ω, N
for k = 1, . . . , N do
    Find a surrogate g_k ∈ S(f, x^{k−1})
    Find x^k ∈ argmin_{x∈Ω} g_k(x)
end for
OUTPUT: x^N
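A minimal Python rendering of Algorithm 1, assuming the caller supplies a surrogate builder and a surrogate minimizer (both hypothetical callables), could look like this.

def mm_algorithm(x0, build_surrogate, minimize_surrogate, num_iters):
    # build_surrogate(x_prev) returns g in S(f, x_prev): g >= f on Omega and g(x_prev) = f(x_prev)
    # minimize_surrogate(g) returns a minimizer of g over Omega
    x = x0
    for _ in range(num_iters):
        g = build_surrogate(x)        # choose g_k in S(f, x^{k-1})
        x = minimize_surrogate(g)     # x^k in argmin_{x in Omega} g_k(x)
    return x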
Clearly, the choice of surrogate g_k ∈ S(f, x^{k−1}) plays a crucial role in the majorization minimization algorithm. In what follows, we consider a particular method of choosing surrogates for the majorization minimization algorithm; see, e.g., [15, 25, 26]. An objective function f : Ω → R is said to be majorized by M : Ω × Ω → R iff f(x) ≤ M(x, y) and M(y, y) = f(y) for all x, y ∈ Ω. Given x^{k−1} ∈ Ω, we can define g_k(x) := M(x, x^{k−1}), so that g_k ∈ S(f, x^{k−1}). Then the update x^k ∈ argmin_{x∈Ω} M(x, x^{k−1}) defines a majorization minimization algorithm. It has been shown in [15] that the majorization minimization algorithm using distance majorization provides an effective tool for solving many important classes of optimization problems. The key step is to use the following:

    d(x; Q) ≤ ‖x − P(y; Q)‖ and d(y; Q) = ‖y − P(y; Q)‖.

In the example below, we revisit an algorithm based on distance majorization and provide the convergence analysis for this algorithm.

Example 5.1 Given a data set S := {(ai, yi)}_{i=1}^m, where ai ∈ R^n and yi ∈ {−1, 1}, consider the support vector machine problem

    minimize (1/2)‖x‖² subject to yi⟨ai, x⟩ ≥ 1 for all i = 1, . . . , m.

Let Ωi := {x ∈ R^n : yi⟨ai, x⟩ ≥ 1}. Using the quadratic penalty method (see [15]), solving the support vector machine problem can be reduced to solving the following problem:

    minimize f(x) := (1/2)‖x‖² + (µ/2) Σ_{i=1}^m [d(x; Ωi)]² subject to x ∈ R^n,   (15)

where µ > 0. Using the majorization minimization algorithm with the surrogates

    g_k(x) = (1/2)‖x‖² + (µ/2) Σ_{i=1}^m ‖x − P(x^{k−1}; Ωi)‖²

for (15) yields

    x^k = (µ/(1 + mµ)) Σ_{i=1}^m P(x^{k−1}; Ωi).

Let

    h_k(x) := g_k(x) − f(x) = (µ/2) Σ_{i=1}^m [ ‖x − P(x^{k−1}; Ωi)‖² − [d(x; Ωi)]² ].

We can show that ∇h_k(x) is Lipschitz with constant L = mµ, h_k(x^{k−1}) = 0, and ∇h_k(x^{k−1}) = 0. Moreover, g_k is strongly convex with parameter ρ = 1 + mµ and f is strongly convex with parameter 1. By [24, Proposition 2.8], the majorization minimization method applied to (15) gives

    f(x^k) − V* ≤ (mµ/(mµ + 2))^{k−1} (mµ/2) ‖x^0 − x*‖² for all k ∈ N,

where x* is the optimal solution of (15) and V* is the optimal value.
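Both the projection onto the half-space Ωi = {x : yi⟨ai, x⟩ ≥ 1} and the closed-form update of Example 5.1 are easy to code; a tentative NumPy sketch on synthetic data (our own, purely illustrative) follows.

import numpy as np

def proj_halfspace(x, a, y):
    # projection onto Omega_i = {x : y * <a, x> >= 1}
    c = y * a
    slack = 1.0 - c @ x
    return x if slack <= 0 else x + (slack / (c @ c)) * c

def mm_svm(A, y, mu, x0, num_iters):
    # MM iterations for the penalized problem (15): x^k = mu/(1 + m*mu) * sum_i P(x^{k-1}; Omega_i)
    m = len(y)
    x = x0
    for _ in range(num_iters):
        x = mu / (1.0 + m * mu) * sum(proj_halfspace(x, A[i], y[i]) for i in range(m))
    return x

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 3))
y = np.where(A[:, 0] >= 0, 1.0, -1.0)     # toy labels driven by the first coordinate
x = mm_svm(A, y, mu=10.0, x0=np.zeros(3), num_iters=200)
print(x)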
In what follows, we apply the majorization minimization algorithm in combination with the log-exponential smoothing technique to solve the SIBP (4). In the first step, we approximate the cost function D in (4) by the log-exponential smoothing function (11). Then the new function is majorized in order to apply the majorization minimization algorithm. For x, y ∈ R^n and p > 0, define

    G(x, y, p) := p ln Σ_{i=1}^m exp( √(‖x − P(y; Ωi)‖² + p²) / p ).

Then G(x, y, p) serves as a majorization of the log-exponential smoothing function (11). From Proposition 4.2, for p > 0 and y ∈ R^n, the function G(·, y, p) is strongly convex on any bounded convex set and continuously differentiable with Lipschitz gradient on R^n.

Our algorithm is explained as follows. Choose a small number p̄ > 0. In order to solve the SIBP (4), we minimize its log-exponential smoothing approximation (11):

    minimize D(x, p̄) subject to x ∈ Ω.   (16)

Pick an initial point x^0 ∈ Ω and apply the majorization minimization algorithm with

    x^k := argmin_{x∈Ω} G(x, x^{k−1}, p̄).   (17)

The algorithm is summarized by the following.
Algorithm 2.
INPUT: Ω, p̄ > 0, x^0 ∈ Ω, m target sets Ωi, i = 1, . . . , m, N
for k = 1, . . . , N do
    use a fast gradient algorithm to solve approximately x^k := argmin_{x∈Ω} G(x, x^{k−1}, p̄)
end for
OUTPUT: x^N
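The inner objective G(·, x^{k−1}, p̄) that Algorithm 2 hands to the fast gradient solver is just the smoothing function of Section 4 with each set replaced by the fixed point P(x^{k−1}; Ωi); a sketch of its value and gradient (our own helper names) is given below. Since the anchors are single points, Proposition 4.2 suggests 2/p̄ as a Lipschitz constant for the gradient.

import numpy as np

def surrogate_value_grad(x, anchors, p):
    # value and gradient of G(x, y, p) with anchors_i = P(y; Omega_i) held fixed
    diffs = x - anchors                               # shape (m, n)
    g = np.sqrt(np.sum(diffs ** 2, axis=1) + p ** 2)  # g_i = sqrt(||x - anchor_i||^2 + p^2)
    w = np.exp((g - g.max()) / p)
    lam = w / w.sum()
    value = g.max() + p * np.log(w.sum())
    grad = (lam / g) @ diffs
    return value, grad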
Proposition 5.1 Given p̄ > 0 and x^0 ∈ Ω, the sequence {x^k} defined by x^k := argmin_{x∈Ω} G(x, x^{k−1}, p̄) has a convergent subsequence.

Proof Denoting α := D(x^0, p̄) and using Theorem 4.1(v) imply that the level set L≤α := {x ∈ Ω : D(x, p̄) ≤ α} is bounded. For any k ≥ 1, because G(·, x^{k−1}, p̄) is a surrogate of D(·, p̄) at x^{k−1}, one has

    D(x^k, p̄) ≤ G(x^k, x^{k−1}, p̄) ≤ G(x^{k−1}, x^{k−1}, p̄) = D(x^{k−1}, p̄).

It follows that D(x^k, p̄) ≤ D(x^{k−1}, p̄) ≤ . . . ≤ D(x^1, p̄) ≤ D(x^0, p̄). This implies {x^k} ⊂ L≤α, which is a bounded set, so {x^k} has a convergent subsequence. □

The convergence of the majorization minimization algorithm depends on the algorithm map

    ψ(x) := argmin_{y∈Ω} G(y, x, p̄) = argmin_{y∈Ω} p̄ ln Σ_{i=1}^m exp( √(‖y − P(x; Ωi)‖² + p̄²) / p̄ ).   (18)
In the theorem below, we show that the conditions in [15, Proposition 1] are satisfied.

Theorem 5.1 Given p̄ > 0, the function D(·, p̄) and the algorithm map ψ : Ω → Ω defined by (18) satisfy the following conditions:
(i) For any x^0 ∈ Ω, the level set L(x^0) := {x ∈ Ω : D(x, p̄) ≤ D(x^0, p̄)} is compact.
(ii) ψ is continuous on Ω.
(iii) D(ψ(x), p̄) < D(x, p̄) whenever x ≠ ψ(x).
(iv) Any fixed point x̄ of ψ is a minimizer of D(·, p̄) on Ω.
Proof Observe that the function D(·, p̄) is continuous on Ω. Then the level set L(x^0) is compact for any initial point x^0, since D(·, p̄) is coercive by Theorem 4.1(v), and hence (i) is satisfied.

From the strict convexity on Ω of G(·, x, p̄), guaranteed by Proposition 4.2, we can show that the algorithm map ψ : Ω → Ω is a single-valued mapping. Let us prove that ψ is continuous. Take an arbitrary sequence {x^k} ⊂ Ω with x^k → x̄ ∈ Ω as k → +∞. It suffices to show that the sequence y^k := ψ(x^k) tends to ψ(x̄). It follows from the continuity of D(·, p̄) that D(x^k, p̄) → D(x̄, p̄), and hence we can assume D(x^k, p̄) ≤ D(x̄, p̄) + δ for all k ∈ N, where δ is a positive constant. One has the estimates

    D(ψ(x^k), p̄) ≤ G(ψ(x^k), x^k, p̄) ≤ G(x^k, x^k, p̄) = D(x^k, p̄) ≤ D(x̄, p̄) + δ,

which implies that {y^k} is bounded by the coerciveness of D(·, p̄). Consider any convergent subsequence {y^{kℓ}} with limit z. Since y^{kℓ} is a solution of the smooth optimization problem min_{y∈Ω} G(y, x^{kℓ}, p̄), by the necessary and sufficient optimality condition (8) for this smooth convex constrained optimization problem, we have

    ⟨∇G(y^{kℓ}, x^{kℓ}, p̄), x − y^{kℓ}⟩ ≥ 0 for all x ∈ Ω.

This is equivalent to

    ⟨ Σ_{i=1}^m (λi(y^{kℓ}, p̄)/gi(y^{kℓ}, p̄)) (y^{kℓ} − P(x^{kℓ}; Ωi)), x − y^{kℓ} ⟩ ≥ 0 for all x ∈ Ω,

where

    gi(y^{kℓ}, p̄) = √(‖y^{kℓ} − P(x^{kℓ}; Ωi)‖² + p̄²) and λi(y^{kℓ}, p̄) = exp(gi(y^{kℓ}, p̄)/p̄) / Σ_{j=1}^m exp(gj(y^{kℓ}, p̄)/p̄).

Since the Euclidean projection mapping onto a nonempty, closed and convex set is continuous, by passing to the limit we have

    ⟨∇G(z, x̄, p̄), x − z⟩ ≥ 0 for all x ∈ Ω.

Thus, applying (8) again implies that z is also an optimal solution of the problem min_{y∈Ω} G(y, x̄, p̄). By the uniqueness of the solution and ψ(x̄) = argmin_{y∈Ω} G(y, x̄, p̄), one has that z = ψ(x̄), and so {y^{kℓ}} converges to ψ(x̄). Since this conclusion holds for all convergent subsequences of the bounded sequence {y^k}, the sequence {y^k} itself converges to ψ(x̄), which shows that (ii) is satisfied.

Let us verify that D(ψ(x), p̄) < D(x, p̄) whenever ψ(x) ≠ x. Observe that ψ(x) = x if and only if G(x, x, p̄) = min_{y∈Ω} G(y, x, p̄). Since G(·, x, p̄) has a unique minimizer, we have the strict inequality G(ψ(x), x, p̄) < G(x, x, p̄) whenever x is not a fixed point of ψ. Combining this with D(ψ(x), p̄) ≤ G(ψ(x), x, p̄) and D(x, p̄) = G(x, x, p̄), we arrive at conclusion (iii).

Finally, we show that any fixed point x̄ of the algorithm map ψ is a minimizer of D(·, p̄) on Ω. Fix any x̄ ∈ Ω such that ψ(x̄) = x̄. Then G(x̄, x̄, p̄) = min_{y∈Ω} G(y, x̄, p̄), which is equivalent to

    ⟨∇G(x̄, x̄, p̄), x − x̄⟩ ≥ 0 for all x ∈ Ω.

This means

    ⟨ Σ_{i=1}^m (λi(x̄, p̄)/gi(x̄, p̄)) (x̄ − P(x̄; Ωi)), x − x̄ ⟩ ≥ 0 for all x ∈ Ω,

where

    gi(x̄, p̄) = √(‖x̄ − P(x̄; Ωi)‖² + p̄²) = √(d(x̄; Ωi)² + p̄²) = Gi(x̄, p̄)

and

    λi(x̄, p̄) = exp(gi(x̄, p̄)/p̄) / Σ_{j=1}^m exp(gj(x̄, p̄)/p̄) = Λi(x̄, p̄).

This inequality, however, is equivalent to ⟨∇x D(x̄, p̄), x − x̄⟩ ≥ 0 for all x ∈ Ω, which in turn holds if and only if x̄ is a minimizer of D(·, p̄) on Ω. □
Corollary 5.1 Given p̄ > 0 and x^0 ∈ Ω, the sequence {x^k} of exact solutions x^k := argmin_{x∈Ω} G(x, x^{k−1}, p̄) generated by Algorithm 2 has a subsequence that converges to an optimal solution of (16). If we suppose further that problem (16) has a unique optimal solution, then {x^k} converges to this optimal solution.

Proof It follows from Proposition 5.1 that {x^k} has a subsequence {x^{kℓ}} that converges to some x̄. Applying [15, Proposition 1] implies that ‖x^{kℓ+1} − x^{kℓ}‖ → 0 as ℓ → ∞. From the continuity of the algorithm map ψ and the equation x^{kℓ+1} = ψ(x^{kℓ}), one has that ψ(x̄) = x̄. By Theorem 5.1(iv), the element x̄ is an optimal solution of (16). The last conclusion is obvious. □
In what follows, we apply Nesterov's accelerated gradient method introduced in [27, 28] to solve (17) approximately. Let f : R^n → R be a smooth convex function with Lipschitz gradient; that is, there exists ℓ ≥ 0 such that

    ‖∇f(x) − ∇f(y)‖ ≤ ℓ‖x − y‖ for all x, y ∈ R^n.
Let Ω be a nonempty, closed and convex set. In his seminal papers [27, 28], Nesterov considered the optimization problem

    minimize f(x) subject to x ∈ Ω.

For x ∈ R^n, define

    TΩ(x) := argmin{ ⟨∇f(x), y − x⟩ + (ℓ/2)‖x − y‖² : y ∈ Ω }.

Let d : R^n → R be a strongly convex function with parameter σ > 0, and let x^0 ∈ Ω be such that x^0 = argmin{d(x) : x ∈ Ω}. Further, assume that d(x^0) = 0. For simplicity, we choose d(x) = (1/2)‖x − x^0‖², where x^0 ∈ Ω, so σ = 1. It is not hard to see that

    y^k = TΩ(x^k) = P(x^k − ∇f(x^k)/ℓ; Ω).

Moreover,

    z^k = P( x^0 − (1/ℓ) Σ_{i=0}^k ((i + 1)/2) ∇f(x^i); Ω ).

Nesterov's accelerated gradient algorithm is outlined as follows.

Algorithm 3.
INPUT: ℓ, x^0 ∈ Ω
set k = 0
repeat
    find y^k := TΩ(x^k)
    find z^k := argmin{ (ℓ/σ) d(x) + Σ_{i=0}^k ((i + 1)/2) [f(x^i) + ⟨∇f(x^i), x − x^i⟩] : x ∈ Ω }
    set x^{k+1} := ((k + 1)/(k + 3)) z^k + (2/(k + 3)) y^k
    set k := k + 1
until a stopping criterion is satisfied.
OUTPUT: y^k.
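A compact NumPy rendering of Algorithm 3 for the choice d(x) = (1/2)‖x − x^0‖² (so σ = 1) is sketched below; value_grad and proj are user-supplied callables, and the whole listing is our own hedged illustration rather than the authors' code.

import numpy as np

def nesterov_accelerated_gradient(x0, value_grad, proj, lip, num_iters):
    # value_grad(x) returns (f(x), grad f(x)); proj(x) is the Euclidean projection onto Omega;
    # lip is a Lipschitz constant of grad f
    x = x0.copy()
    weighted_grad_sum = np.zeros_like(x0, dtype=float)   # accumulates sum_i (i+1)/2 * grad f(x^i)
    y = x
    for k in range(num_iters):
        _, g = value_grad(x)
        y = proj(x - g / lip)                                 # y^k = T_Omega(x^k)
        weighted_grad_sum += (k + 1) / 2.0 * g
        z = proj(x0 - weighted_grad_sum / lip)                # z^k
        x = (k + 1) / (k + 3) * z + 2.0 / (k + 3) * y         # x^{k+1}
    return y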
It has been experimentally observed that the algorithm is more effective if, instead of choosing a small value of p ahead of time, we change its value dynamically, starting from an initial value p0 and setting p_s := σ p_{s−1}, where σ ∈ ]0, 1[. We finally present the main algorithm of the paper. This algorithm combines the majorization minimization algorithm and Nesterov's accelerated gradient method to give an effective optimization scheme for solving the SIBP.
Algorithm 4.
INPUT: Ω, ε > 0, p0 > 0, x^0 ∈ Ω, m target sets Ωi, i = 1, . . . , m, N
set p = p0, y = x^0
for k = 1, . . . , N do
    use Nesterov's accelerated gradient method to solve approximately y := argmin_{x∈Ω} G(x, y, p)
    set p := σp
end for
OUTPUT: y
Remark 5.1 (i) When implementing this algorithm, we usually wish to maintain p_k > ε, where ε < p0. The factor σ can then be calculated from the desired number of iterations N to be run, i.e., σ = (ε/p0)^{1/N}.
(ii) In Nesterov's accelerated gradient method, at iteration k we often use the stopping criterion ‖∇x G(x, y, p)‖ < γ_k, where γ0 is chosen and γ_k = σ̃ γ_{k−1} for some σ̃ ∈ ]0, 1[. The factor σ̃ can be calculated from the number of iterations N and a lower bound ε̃ > 0 for γ_k, as above.
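Putting the pieces together, the outer loop of Algorithm 4 with the geometric schedules of this remark could be organized as follows; inner_solve stands for an approximate solver of the surrogate subproblem (e.g., the accelerated gradient sketch above) and, like all names here, is an assumption of this illustration.

import numpy as np

def algorithm4(x0, projections, proj_constraint, inner_solve,
               p0=5.0, eps=1e-6, gamma0=0.5, eps_tilde=1e-5, num_outer=10):
    # inner_solve(y, anchors, p, proj_constraint, tol) approximately minimizes G(., y, p) over Omega
    sigma = (eps / p0) ** (1.0 / num_outer)                   # p after num_outer steps is about eps
    sigma_tilde = (eps_tilde / gamma0) ** (1.0 / num_outer)   # gamma ends near eps_tilde
    p, gamma, y = p0, gamma0, x0.copy()
    for _ in range(num_outer):
        anchors = np.array([proj(y) for proj in projections])   # P(y; Omega_i)
        y = inner_solve(y, anchors, p, proj_constraint, tol=gamma)
        p *= sigma
        gamma *= sigma_tilde
    return y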
6 Numerical Implementation

For solving problem (7), we can use the subgradient method. At the k-th iteration, we need to find a subgradient w^k ∈ ∂DF(x^k). Using the formula for subdifferentials of maximum functions, we have

    ∂DF(x^k) = co{ ∂dF(x^k; Ωi) : i ∈ I(x^k) },

where I(x^k) := {i ∈ {1, . . . , m} : DF(x^k) = dF(x^k; Ωi)}. By [21], the subdifferential of the function dF(·; Ω) can be computed by

    ∂dF(x̄; Ω) = ∂σF(x̄ − w̄) ∩ N(w̄; Ω),  x̄ ∈ R^n,

where w̄ is any element chosen from the projection set πF(x̄; Ω) := {ω ∈ Ω : dF(x̄; Ω) = σF(x̄ − ω)}. Note that if x^k ∈ Ωi with i ∈ I(x^k), the subdifferential ∂dF(x^k; Ωi) always contains 0, so we can choose w^k = 0. In the case where x^k ∉ Ωi, we first find p^{k,i} ∈ πF(x^k; Ωi) and then find w^k ∈ ∂σF(x^k − p^{k,i}) ∩ N(p^{k,i}; Ωi). After that, we perform the iteration

    x^{k+1} = P(x^k − αk w^k; Ω),

where αk is a stepsize.
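For the Euclidean case σF = ‖·‖, the subgradient scheme just described takes a particularly simple form; the sketch below (our own, with a diminishing stepsize chosen purely for illustration) keeps the best iterate found.

import numpy as np

def subgradient_sibp(x0, projections, proj_constraint, num_iters):
    # subgradient method for min_{x in Omega} max_i d(x; Omega_i) in the Euclidean case;
    # a subgradient of the active distance at x outside Omega_i is (x - P(x; Omega_i)) / d(x; Omega_i)
    x = x0.copy()
    best_x, best_val = x0.copy(), np.inf
    for k in range(1, num_iters + 1):
        dists = np.array([np.linalg.norm(x - proj(x)) for proj in projections])
        i = int(np.argmax(dists))
        val = dists[i]
        if val < best_val:
            best_x, best_val = x.copy(), val
        w = np.zeros_like(x) if val == 0 else (x - projections[i](x)) / val
        x = proj_constraint(x - (1.0 / k) * w)    # alpha_k = 1/k, a standard diminishing stepsize
    return best_x, best_val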
Fig. 1 The smallest intersecting ball problems with different norms
Example 6.1 Let us first apply the subgradient method to an unconstrained generalized Sylvester problem (4) in R² in which Ωi for i = 1, . . . , 6 are squares with centers at (−6, 9), (12, 9), (−1, −6), (−8, 5), (−7, 0), (7, 1) and radii 1, 2.5, 2.5, 1, 2, 4, respectively. For the case where σF(x) = ‖x‖, the Euclidean norm, a MATLAB program yields an approximate solution x* = (1.444, 3.967) and an approximate optimal value V* = 8.444. With the same target sets but σF(x) = ‖x‖∞, an approximate optimal value is V* = 8.25 and an approximate optimal solution is x* = (1.25, 0). Similarly, if σF(x) = ‖x‖1, then we obtain an approximate optimal value V* = 10.131 and an approximate solution x* = (1.500, 4.369); see Figure 1.
We implement Algorithm 4 to solve the generalized Sylvester problem in a number of examples. In each of the following examples, we implement Algorithm 4 with the parameters described in this algorithm and in Remark 5.1: ε = 10⁻⁶, ε̃ = 10⁻⁵, p0 = 5, γ0 = 0.5, and N = 10. Observations suggest that when the number of dimensions is large, speed is improved by starting with the relatively high γ0 and p0 and decreasing each (thereby reducing error) with each iteration. Choosing σ, σ̃ as described in Remark 5.1 ensures that the final iterations achieve the desired accuracy. The approximate radius of the smallest intersecting ball that corresponds to the approximate optimal solution x^k is r_k := D(x^k).
Fig. 2 An illustration of the majorization minimization algorithm.
Example 6.2 Consider the unconstrained generalized Sylvester problem (4) in R² in which Ωi for i = 1, . . . , 6 are disks with centers at (−6, 9), (12, 9), (−1, −6), (−8, 5), (−7, 0), (7, 1) and radii 3, 2.5, 2.5, 1, 2, 4, respectively. This setup is depicted in Figure 2. Algorithm 4 gives a very fast convergence rate in this example, and the rate does not seem to depend much on the starting point. We use the same starting point x^0 = (−15, 3) for both Algorithm 4 and the subgradient method. With N = 10 iterations, Algorithm 4 yields an approximate smallest intersecting ball with center x* ≈ (1.65, 4.83) and approximate radius r* ≈ 8.65. Figure 2 shows a significant move toward the optimal solution of the problem in one step of the majorization minimization algorithm.
Fig. 3 A smallest intersecting ball problem for cubes in R³.
Example 6.3 We consider the SIBP in which the target sets are square boxes in R^n. In R^n, a square box S(ω, r) with center ω = (ω1, . . . , ωn) and radius r is the set

    S(ω, r) := {x = (x1, . . . , xn) : max{|x1 − ω1|, . . . , |xn − ωn|} ≤ r}.

Note that the Euclidean projection from x to S(ω, r) can be expressed componentwise as follows:

    [P(x; S)]i = ωi − r   if xi ≤ ωi − r,
    [P(x; S)]i = xi       if ωi − r ≤ xi ≤ ωi + r,
    [P(x; S)]i = ωi + r   if ωi + r ≤ xi.
Consider the case where n = 3. The target sets are 5 square boxes with centers (−5, 0, 0), (1, 4, 4), (0, 5, 0), (−4, −3, 2) and (0, 0, 5) and radii ri = 1 for i = 1, . . . , 5 (see Figure 3). Our results show that both Algorithm 4 and the subgradient method give an approximate radius r∗ ≈ 3.18 although the latter outperforms the former.
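The componentwise projection formula above amounts to a coordinatewise clip; a short sketch for the square boxes S(ω, r) of this example (our own helper names) reads as follows.

import numpy as np

def proj_square_box(x, omega, r):
    # projection onto S(omega, r) = {x : max_i |x_i - omega_i| <= r}
    return np.clip(x, omega - r, omega + r)

# the five boxes of the n = 3 instance, all of radius 1
centers = np.array([[-5, 0, 0], [1, 4, 4], [0, 5, 0], [-4, -3, 2], [0, 0, 5]], dtype=float)
x = np.array([0.0, 1.0, 1.0])
dists = [np.linalg.norm(x - proj_square_box(x, c, 1.0)) for c in centers]
print(max(dists))    # the value D(x) of the max-distance objective at this particular x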
[Figure 4 contains four panels — 10 boxes in 10 dimensions, 50 boxes in 50 dimensions, 100 boxes in 100 dimensions, and 100 boxes in 1000 dimensions — each plotting the objective function value against the iteration count j for the three methods compared.]
Fig. 4 Comparison between Algorithm 4, a subgradient algorithm, and a BFGS algorithm.
Example 6.4 Now we illustrate the performance of Algorithm 4 in higher dimensions with the same setting as in Example 6.3. We choose a modification of the pseudo-random sequence from [11] with a0 = 7 and

    a_{i+1} = mod(445 a_i + 1, 4096),  b_i = a_i/40.96 for i ∈ N.
The radii r_i and centers c_i of the square boxes are successively set to b1, b2, . . . in the following order: 10r1, c1(1), . . . , c1(n); 10r2, c2(1), . . . , c2(n); . . . ; 10r_m, c_m(1), . . . , c_m(n).

Algorithm 5 (BFGS Algorithm with Wolfe conditions).
INPUT: H0, x^0
set k = 0
repeat
    set p^k := −H_k ∇f(x^k)
    use a line search method to solve approximately α_k := argmin_α f(x^k + α p^k), requiring α_k to satisfy the Wolfe conditions for 0 < c1 < c2 < 1:
        (i) f(x^k + α p^k) ≤ f(x^k) + c1 α ∇f(x^k)^T p^k
        (ii) ∇f(x^k + α p^k)^T p^k ≥ c2 ∇f(x^k)^T p^k
    set x^{k+1} := x^k + α_k p^k
    set s^k := x^{k+1} − x^k
    set y^k := ∇f(x^{k+1}) − ∇f(x^k)
    set H_{k+1} := ( I − s^k (y^k)^T / ((y^k)^T s^k) ) H_k ( I − y^k (s^k)^T / ((y^k)^T s^k) ) + s^k (s^k)^T / ((y^k)^T s^k)
    set k := k + 1
until a stopping criterion is satisfied.
OUTPUT: x^k.
We also implement Algorithm 4 in comparison with the subgradient algorithm (minimizing the max-distance function) and a BFGS algorithm with inexact line search satisfying the Wolfe conditions (minimizing the log-exponential smoothing function with p = 10⁻⁶). The BFGS algorithm is outlined in Algorithm 5; for details the reader is directed to [29]. From our numerical results (Figure 4), we see that Algorithm 4 performs better than the other algorithms in both accuracy and speed. Note that in Figure 4 we count every iteration of Nesterov's accelerated gradient method in the total iteration count along the horizontal axis. The empty circles on the plot of Algorithm 4 mark those points at which the accelerated gradient method has sufficiently minimized a surrogate function of the log-exponential approximation with fixed p.
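For reproducibility, the pseudo-random box data used in this comparison can be generated in a few lines; the sketch below follows the recurrence and packing order stated above, under the assumption (ours, not stated in the original) that the stream b1, b2, . . . starts from a1.

import numpy as np

def generate_boxes(m, n, a0=7):
    # a_{i+1} = (445*a_i + 1) mod 4096, b_i = a_i/40.96;
    # consume the b's in the order 10*r_k, c_k(1), ..., c_k(n) for each box k
    a = a0
    def next_b():
        nonlocal a
        a = (445 * a + 1) % 4096
        return a / 40.96
    radii, centers = [], []
    for _ in range(m):
        radii.append(next_b() / 10.0)          # the stream value equals 10 * r_k
        centers.append([next_b() for _ in range(n)])
    return np.array(radii), np.array(centers)

radii, centers = generate_boxes(m=10, n=10)
print(radii[:3])
print(centers[0])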
7 Conclusions
Based on the log-exponential smoothing technique and the majorization minimization algorithm, we have developed an effective numerical algorithm for solving the SIBP. The problem under consideration not only generalizes the Sylvester smallest enclosing circle problem, but also leads to the possibility of applications to other problems in constrained optimization, especially those that appear frequently in machine learning. Our numerical examples show that the algorithm works well for the problem in high dimensions.
Acknowledgements The authors would like to thank the referees and Prof. Jie Sun for reading the paper carefully and giving valuable comments to improve the content and the presentation of the paper. The research of the first author was supported by the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under grant #101.01-2014.37. The research of Daniel Giles was partially supported by the USA National Science Foundation under grant DMS-1411817. The research of Nguyen Mau Nam was partially supported by the USA National Science Foundation under grant DMS-1411817 and the Simons Foundation under grant #208785.
References

1. Sylvester, J.J.: A question in the geometry of situation. Quarterly Journal of Pure and Applied Mathematics 1, 79 (1857)
2. Alonso, J., Martini, H., Spirova, M.: Minimal enclosing discs, circumcircles, and circumcenters in normed planes. Comput. Geom. 45, 258–274 (2012)
3. Cheng, C., Hu, X., Martin, C.: On the smallest enclosing balls. Commun. Inf. Syst. 6, 137–160 (2006)
4. Fischer, K., Gärtner, B.: The smallest enclosing ball of balls: combinatorial structure and algorithms. Comput. Geom. 14, 341–378 (2004)
5. Hearn, D.W., Vijay, J.: Efficient algorithms for the (weighted) minimum circle problem. Oper. Res. 30, 777–795 (1981)
6. Nielsen, F., Nock, R.: Approximating smallest enclosing balls with applications to machine learning. Internat. J. Comput. Geom. Appl. 19, 389–414 (2009)
7. Saha, A., Vishwanathan, S., Zhang, X.: Efficient approximation algorithms for minimum enclosing convex shapes. In: Proceedings of SODA (2011)
8. Welzl, E.: Smallest enclosing disks (balls and ellipsoids). In: Maurer, H. (ed.) Lecture Notes in Comput. Sci. 555, 359–370 (1991)
9. Yildirim, E.A.: On the minimum volume covering ellipsoid of ellipsoids. SIAM J. Optim. 17, 621–641 (2006)
10. Yildirim, E.A.: Two algorithms for the minimum enclosing ball problem. SIAM J. Optim. 19, 1368–1391 (2008)
11. Zhou, G., Toh, K.C., Sun, J.: Efficient algorithms for the smallest enclosing ball problem. Comput. Optim. Appl. 30, 147–160 (2005)
12. Nam, N.M., An, N.T., Salinas, J.: Applications of convex analysis to the smallest intersecting ball problem. J. Convex Anal. 19, 497–518 (2012)
13. Mordukhovich, B.S., Nam, N.M., Villalobos, C.: The smallest enclosing ball problem and the smallest intersecting ball problem: existence and uniqueness of optimal solutions. Optim. Lett. 7, 839–853 (2013)
14. Mordukhovich, B.S., Nam, N.M.: Applications of variational analysis to a generalized Fermat-Torricelli problem. J. Optim. Theory Appl. 148, 431–454 (2011)
15. Chi, E., Zhou, H., Lange, K.: Distance majorization and its applications. Math. Program., Ser. A 146, 409–436 (2014)
16. Jahn, T., Kupitz, Y.S., Martini, H., Richter, C.: Minsum location extended to gauges and to convex sets. J. Optim. Theory Appl. 166, 711–746 (2015)
17. Bertsekas, D., Nedic, A., Ozdaglar, A.: Convex Analysis and Optimization. Athena Scientific, Boston (2003)
18. Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge, England (2004)
19. Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I. Fundamentals. Springer-Verlag, Berlin (1993)
20. Mordukhovich, B.S., Nam, N.M.: An Easy Path to Convex Analysis and Applications. Morgan & Claypool Publishers (2014)
21. Nam, N.M., An, N.T., Rector, R.B., Sun, J.: Nonsmooth algorithms and Nesterov's smoothing technique for generalized Fermat-Torricelli problems. SIAM J. Optim. 24(4), 1815–1839 (2014)
22. He, Y., Ng, K.F.: Subdifferentials of a minimum time function in Banach spaces. J. Math. Anal. Appl. 321, 896–910 (2006)
23. Zhai, X.: Two Problems in Convex Conic Optimization. Master's thesis, National University of Singapore (2007)
24. Mairal, J.: Incremental majorization-minimization optimization with application to large-scale machine learning. arXiv preprint arXiv:1402.4419 (2014)
25. Hunter, D.R., Lange, K.: A tutorial on MM algorithms. Amer. Statistician 58, 30–37 (2004)
26. Lange, K., Hunter, D.R., Yang, I.: Optimization transfer using surrogate objective functions (with discussion). J. Comput. Graph. Statist. 9, 1–59 (2000)
27. Nesterov, Yu.: Smooth minimization of non-smooth functions. Math. Program. 103, 127–152 (2005)
28. Nesterov, Yu.: A method for unconstrained convex minimization problem with the rate of convergence O(1/k²). Doklady AN SSSR (translated as Soviet Math. Dokl.) 269, 543–547 (1983)
29. Nocedal, J., Wright, S.: Numerical Optimization, 2nd edn. Springer (2006)