manuscript No. (will be inserted by the editor)

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

Xiaolong Qin · Nguyen Thai An

Received: date / Accepted: date

Abstract In this paper, the problem of computing the projection, and therefore the minimum distance, from a point onto a Minkowski sum of general convex sets is studied. Our approach is based on the minimum norm duality theorem originally stated by Nirenberg and on Nesterov's smoothing techniques. It is shown that the projection onto a Minkowski sum of sets can be represented as the sum of points on the constituent sets such that, at these points, all of the sets share the same normal vector, which is the negative of the dual solution. For numerically solving the problem, the most suitable algorithm is the one suggested by Gilbert [SIAM J. Contr., vol. 4, pp. 61–80, 1966]. This algorithm has been widely used in collision detection and path planning in robotics. However, a main drawback of this method is that in some cases it turns out to be very slow as it approaches the solution. In this paper we propose NESMINO, whose $O\big(\frac{1}{\sqrt{\epsilon}}\ln(\frac{1}{\epsilon})\big)$ complexity bound is better than the worst-case complexity bound $O(\frac{1}{\epsilon})$ of Gilbert's algorithm.

Keywords Minimum norm problem · Minkowski sum of sets · Gilbert's algorithm · Nesterov's smoothing technique · Fast gradient method · SAGA

Mathematics Subject Classification (2010) Primary 49J52, 49M29, Secondary 90C30.

* Corresponding author

Xiaolong Qin Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 611731, China E-mail: [email protected] Nguyen Thai An Institute of Fundamental and Frontier Sciences University of Electronic Science and Technology of China, Chengdu 611731, China Institute of Research and Development, Duy Tan University, Danang, Vietnam E-mail: [email protected]


1 Introduction

Let $A$ and $B$ be two subsets of $\mathbb{R}^n$. Recall that the Minkowski sum of these sets is defined by
$$A + B := \{a + b : a \in A,\ b \in B\}.$$
The case of more than two sets is defined in the same way by induction. Note that if all the sets $A_i$ for $i = 1, \ldots, m$ are convex, then every linear combination of these sets, $\sum_{i=1}^{m} \lambda_i A_i$ with $\lambda_i \in \mathbb{R}$ for $i = 1, \ldots, m$, is also convex. The Euclidean distance function associated with a subset $Q$ is defined by $d(q; Q) := \inf\{\|x - q\| : x \in Q\}$, where $\|\cdot\|$ is the Euclidean norm. The optimization problem we are concerned with in this paper is the following minimum norm problem
$$d\Big(0; \sum_{i=1}^{p} T_i(\Omega_i)\Big) := \min\Big\{\|x\| : x \in \sum_{i=1}^{p} T_i(\Omega_i)\Big\}, \qquad (1.1)$$
where $\Omega_i$, for $i = 1, \ldots, p$, are nonempty convex compact sets in $\mathbb{R}^m$ and $T_i : \mathbb{R}^m \to \mathbb{R}^n$ are affine mappings satisfying
$$T_i(y) = A_i y + a_i, \quad y \in \mathbb{R}^m,$$
where $A_i$, for $i = 1, \ldots, p$, are $n \times m$ matrices and $a_i$, for $i = 1, \ldots, p$, are given points in $\mathbb{R}^n$. Since $\sum_{i=1}^{p} T_i(\Omega_i)$ is closed and convex, and the norm under consideration is Euclidean, (1.1) has a unique solution, which is the projection from the origin onto $\sum_{i=1}^{p} T_i(\Omega_i)$. We denote this solution by $x^*$ throughout

this paper. We assume in problem (1.1) that each set $\Omega_i$ is simple enough that the corresponding projection operator $P_{\Omega_i}$ is easy to compute. It is worth noting that there is no known algorithm for computing the Minkowski sum of general convex sets, except for the cases of balls and polytopes. Moreover, in general, the projection onto a Minkowski sum of sets cannot be represented as the sum of the projections onto the constituent sets.

Minimum norm problems for the case of polytopes have been well studied in the literature from both theoretical and numerical points of view; see, e.g., [24, 34, 12] and the references therein. The most suitable algorithm for solving (1.1) is perhaps the one suggested by Gilbert [14]. The original Gilbert algorithm was devised for solving the minimum norm problem associated with just one convex compact set. The algorithm does not require the explicit projection operator of the given set. Instead, it requires in each step the computation of a support point of the set along a certain direction. Since, for a given direction, a support point of a Minkowski sum of sets can be represented in terms of support points of the constituent sets, Gilbert's algorithm can thus be applied to the general case of (1.1). Gilbert's algorithm can be analyzed from a geometrical point of view and is easy to implement. However, a serious problem with the algorithm is that, in some cases, it loops infinitely and becomes very slow as it approaches the solution; see [20, 23, 4]. Following [14], Gilbert's algorithm is a descent method that generates a sequence $\{z_k\}$ such that $\|z_k\|$ converges downward to $\|x^*\|$ at a rate of $O(\frac{1}{k})$.

Another effective algorithm for distance computation between two convex objects is the Gilbert–Johnson–Keerthi (GJK) algorithm proposed in [15] and its enhanced versions [16, 3, 1]. The original GJK algorithm was restricted to computing the distance between objects that can be approximately represented as convex polytopes. In order to reduce the error of the polytope approximations in finding the minimum distance, Gilbert and Foo [16] modified the original GJK algorithm to handle general convex objects. The modified algorithm is based on Gilbert's algorithm and has the same bound on the number of iterations.

Problem (1.1) is challenging due to its complex constraint $x \in \sum_{i=1}^{p} T_i(\Omega_i)$. To overcome this difficulty we

are going to employ a duality approach.

Contributions. The contributions of this paper are two-fold. On the theoretical side, we show that the projection onto a Minkowski sum of convex sets can be represented as the sum of points on the constituent sets such that, at these points, all the sets share the same normal vector. This result is obtained by investigating the relationship between solutions of (1.1) and its Fenchel dual. We also give conditions for the uniqueness of the solutions of the primal and dual problems. On the numerical side, we propose NESMINO, which is based on the smoothing technique developed by Nesterov [27, 28]. To this end, we first approximate the dual objective function by a smooth and strongly convex function and solve this smooth problem via a fast gradient scheme. After that, we show how an approximate solution of the primal problem, i.e., an approximation of the projection of the origin, can be reconstructed from the dual iterative sequence. Our algorithm has a complexity bound of $O\big(\frac{1}{\sqrt{\epsilon}}\ln(\frac{1}{\epsilon})\big)$, which is better than the worst-case complexity $O(\frac{1}{\epsilon})$ of Gilbert's algorithm. Moreover, the algorithm also provides elements of each constituent set whose sum is equal to the projection $x^*$. When the number of sets in the Minkowski sum is large, we introduce SAGA-NESMINO, which is comparable to Gilbert's algorithm in both running time and accuracy.

The rest of the paper is organized as follows. In Section 2, we provide tools of convex analysis that are widely used in the sequel. Nesterov's smoothing technique and the fast gradient method are recalled in Section 3. In Section 4, we state some duality results concerning minimum norm problems. Section 5 is devoted to an overview of Gilbert's algorithm. Section 6 is the main part of the paper, devoted to developing a smoothing algorithm for solving (1.1). Some numerical experiments are also provided in this section.

2 Tools of Convex Analysis

Let $\langle \cdot, \cdot \rangle$ be the inner product associated with the Euclidean norm $\|\cdot\|$ in $\mathbb{R}^n$. An extended real-valued function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is said to be convex if $f((1-\lambda)x + \lambda y) \le (1-\lambda)f(x) + \lambda f(y)$ for all $x, y \in \mathbb{R}^n$ and $\lambda \in (0,1)$. We say that $f$ is strongly convex with modulus $\gamma$ if $f - \frac{\gamma}{2}\|\cdot\|^2$ is a convex function. Let $Q$ be a subset of $\mathbb{R}^n$; the support function of $Q$ is defined by
$$\sigma_Q(u) := \sup\{\langle u, x \rangle : x \in Q\}, \quad u \in \mathbb{R}^n. \qquad (2.1)$$
It follows directly from the definition that $\sigma_Q$ is positively homogeneous and subadditive. The set-valued mapping $S_Q : \mathbb{R}^n \to \mathbb{R}^n$ defined by $S_Q(u) = \{x \in Q : \langle u, x \rangle = \sigma_Q(u)\}$, $u \in \mathbb{R}^n$, is called the support point mapping of $Q$. If $Q$ is compact, then $\sigma_Q(u)$ is finite and $S_Q(u) \ne \emptyset$ for all $u \in \mathbb{R}^n$.


In order to study minimum norm problems in which the Euclidean distance is replaced by distances generated by different norms, we consider a more general setting. Let $F$ be a closed, bounded and convex subset of $\mathbb{R}^n$ that contains the origin as an interior point. The minimal time function associated with the dynamic set $F$ and the target set $Q$ is defined by
$$T_F(x; Q) := \inf\{t \ge 0 : (x + tF) \cap Q \ne \emptyset\}. \qquad (2.2)$$
The minimal time function (2.2) can be expressed as
$$T_F(x; Q) = \inf\{\rho_F(\omega - x) : \omega \in Q\}, \qquad (2.3)$$
where $\rho_F(x) := \inf\{t \ge 0 : x \in tF\}$ is the Minkowski function associated with $F$. Moreover, $T_F(\cdot; Q)$ is convex if and only if $Q$ is convex; see [25]. We denote by
$$\Pi_F(x; Q) := \{q \in Q : \rho_F(q - x) = T_F(x; Q)\}$$
the set of generalized projections from $x$ to $Q$.

Note that if $F$ is the closed unit ball generated by some norm $|||\cdot|||$ on $\mathbb{R}^n$, then $\rho_F = |||\cdot|||$, $\sigma_{F^\circ} = |||\cdot|||$, and $T_F(\cdot; Q)$ reduces to the ordinary distance function $d(x; Q) = \inf\{|||\omega - x||| : \omega \in Q\}$, $x \in \mathbb{R}^n$. The set $\Pi_F(x; Q)$ in this case is denoted by $\Pi(x; Q) := \{q \in Q : d(x; Q) = |||q - x|||\}$. When $|||\cdot|||$ is the Euclidean norm, we simply use the notation $P_Q(x)$ instead. If $Q$ is a nonempty closed convex set, then the Euclidean projection $P_Q(x)$ is a singleton for every $x \in \mathbb{R}^n$.

The following results, whose proofs can be found in [17], allow us to represent the support function of a general set in terms of the support functions of one or more simpler sets.

Lemma 2.1 Consider the support function (2.1). Let $\Omega, \Omega_1, \Omega_2$ be subsets of $\mathbb{R}^m$ and let $T : \mathbb{R}^m \to \mathbb{R}^n$ satisfying $T(x) = Ax + a$ be an affine transformation, where $A$ is an $n \times m$ matrix and $a \in \mathbb{R}^n$. The following assertions hold:
(i) $\sigma_\Omega = \sigma_{\mathrm{cl}\,\Omega} = \sigma_{\mathrm{co}\,\Omega} = \sigma_{\overline{\mathrm{co}}\,\Omega}$.
(ii) $\sigma_{\Omega_1 + \Omega_2}(u) = \sigma_{\Omega_1}(u) + \sigma_{\Omega_2}(u)$ and $\sigma_{\Omega_1 - \Omega_2}(u) = \sigma_{\Omega_1}(u) + \sigma_{\Omega_2}(-u)$, for all $u \in \mathbb{R}^m$.
(iii) $\sigma_{T(\Omega)}(v) = \sigma_\Omega(A^\top v) + \langle v, a \rangle$, for all $v \in \mathbb{R}^n$.

A subset $\Omega \subset \mathbb{R}^m$ is said to be strictly convex if $tu + (1-t)v \in \mathrm{int}(\Omega)$ whenever $u, v \in \Omega$, $u \ne v$, and $t \in (0,1)$.

Lemma 2.2 Let $\Omega, \Omega_1, \Omega_2$ be convex compact subsets of $\mathbb{R}^m$ and let $T : \mathbb{R}^m \to \mathbb{R}^n$ be an affine transformation satisfying $T(x) = Ax + a$, where $A$ is an $n \times m$ matrix and $a \in \mathbb{R}^n$. The following assertions hold:
(i) $S_{\Omega_1 + \Omega_2}(u) = S_{\Omega_1}(u) + S_{\Omega_2}(u)$, for all $u \in \mathbb{R}^m$.
(ii) $S_{T(\Omega)}(v) = T\big(S_\Omega(A^\top v)\big) = A\, S_\Omega(A^\top v) + a$, for all $v \in \mathbb{R}^n$.
(iii) If, in addition, $\Omega$ is a strictly convex set, then $S_\Omega(u)$ is a singleton for any $u \in \mathbb{R}^m \setminus \{0\}$.

Proof (i) The compactness assumption ensures that the support point sets involved are nonempty. Let $\bar w \in S_{\Omega_1+\Omega_2}(u)$ be any support point. This means that $\bar w \in \Omega_1 + \Omega_2$ and $\langle u, \bar w \rangle = \sigma_{\Omega_1+\Omega_2}(u)$. There exist $\bar w_1 \in \Omega_1$ and $\bar w_2 \in \Omega_2$ such that $\bar w = \bar w_1 + \bar w_2$. Employing Lemma 2.1(ii), we have
$$\langle u, \bar w_1 \rangle + \langle u, \bar w_2 \rangle = \sigma_{\Omega_1}(u) + \sigma_{\Omega_2}(u). \qquad (2.4)$$
From the definition of the support function, $\langle u, \bar w_1 \rangle \le \sigma_{\Omega_1}(u)$ and $\langle u, \bar w_2 \rangle \le \sigma_{\Omega_2}(u)$. Therefore, equality (2.4) holds if and only if $\langle u, \bar w_1 \rangle = \sigma_{\Omega_1}(u)$ and $\langle u, \bar w_2 \rangle = \sigma_{\Omega_2}(u)$. Thus $\bar w \in S_{\Omega_1}(u) + S_{\Omega_2}(u)$, and we have justified the "$\subset$" inclusion in (i). The converse inclusion is straightforward.

Now let $\bar w \in \Omega$ be such that $A\bar w + a \in S_{T(\Omega)}(v)$. By Lemma 2.1(iii), we have $\langle v, A\bar w + a \rangle = \sigma_{T(\Omega)}(v) = \sigma_\Omega(A^\top v) + \langle v, a \rangle$. This is equivalent to $\langle A^\top v, \bar w \rangle = \sigma_\Omega(A^\top v)$. Therefore $\bar w \in S_\Omega(A^\top v)$, and thus $A\bar w + a \in A\, S_\Omega(A^\top v) + a$. The converse inclusion of (ii) is proved similarly.

For (iii), suppose that there exist $\bar w_1, \bar w_2 \in \Omega$ with $\bar w_1 \ne \bar w_2$ such that $\langle u, \bar w_1 \rangle = \langle u, \bar w_2 \rangle = \sigma_\Omega(u)$. It then follows from the properties of the support function that
$$\Big\langle u, \frac{\bar w_1 + \bar w_2}{2} \Big\rangle = \frac{1}{2}\langle u, \bar w_1 \rangle + \frac{1}{2}\langle u, \bar w_2 \rangle = \frac{1}{2}\sigma_\Omega(u) + \frac{1}{2}\sigma_\Omega(u) = \sigma_\Omega(u).$$
Since $\Omega$ is strictly convex, $\bar w := \frac{\bar w_1 + \bar w_2}{2} \in \mathrm{int}(\Omega)$. Take $\epsilon > 0$ small enough such that $I\!B(\bar w; \epsilon) \subset \Omega$. Then the element $\hat w := \bar w + \frac{\epsilon}{2}\frac{u}{\|u\|}$ belongs to $I\!B(\bar w; \epsilon) \subset \Omega$. Moreover, since $u \ne 0$, we have
$$\langle u, \hat w \rangle = \langle u, \bar w \rangle + \frac{\epsilon}{2}\|u\| > \langle u, \bar w \rangle = \sigma_\Omega(u).$$
This is a contradiction. The proof is complete. □
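To illustrate how Lemma 2.1(iii) and Lemma 2.2 are used later for the constraint set of (1.1), the following MATLAB sketch computes a support point, together with the support function value, of a Minkowski sum of affinely transformed sets $\sum_i T_i(\Omega_i)$ from support points of the $\Omega_i$. It is only a minimal illustration under the assumption that each $\Omega_i$ is a polytope given by its vertices; the names (V, Amat, avec) are ours, not from the paper.

```matlab
function [s, sigma] = support_of_sum(u, V, Amat, avec)
% Support point and support function value of Q = sum_i ( Amat{i}*Omega_i + avec{i} )
% in the direction u, where Omega_i is the polytope with vertex matrix V{i}
% (one vertex per column), using Lemma 2.1(iii) and Lemma 2.2(i)-(ii).
    s = zeros(length(u), 1); sigma = 0;
    for i = 1:numel(V)
        w = Amat{i}' * u;                        % pulled-back direction A_i' * u
        [val, j] = max(w' * V{i});               % support of Omega_i along A_i' * u
        s = s + Amat{i} * V{i}(:, j) + avec{i};  % S_{T_i(Omega_i)}(u) = A_i S_{Omega_i}(A_i'u) + a_i
        sigma = sigma + val + u' * avec{i};      % sigma_{T_i(Omega_i)}(u) = sigma_{Omega_i}(A_i'u) + <u,a_i>
    end
end
```

This is exactly the kind of support-point oracle that Gilbert's algorithm in Section 5 requires.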

The Fenchel conjugate of a convex function $f : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is defined by
$$f^*(v) := \sup\{\langle v, x \rangle - f(x) : x \in \mathbb{R}^n\}, \quad v \in \mathbb{R}^n.$$
If $f$ is proper and lower semicontinuous, then $f^* : \mathbb{R}^n \to \mathbb{R} \cup \{+\infty\}$ is also a proper, lower semicontinuous convex function. From the definition, the support function $\sigma_Q$ is the Fenchel conjugate of the indicator function $\delta_Q$ of $Q$, which is defined by $\delta_Q(x) = 0$ if $x \in Q$ and $\delta_Q(x) = +\infty$ otherwise.

The polar of a subset $E \subset \mathbb{R}^n$ is the set $E^\circ = \{u \in \mathbb{R}^n : \sigma_E(u) \le 1\}$. When $E$ is the closed unit ball of a norm $|||\cdot|||$, then $E^\circ$ is the closed unit ball of the corresponding dual norm $|||\cdot|||^*$. Some basic properties of the polar set are collected in the following result, whose proof can be found in [33, Proposition 1.23].

Proposition 2.1 The following assertions hold:
(i) For any subset $E$, the polar $E^\circ$ is a closed convex set containing the origin, and $E \subset E^{\circ\circ}$;
(ii) $0 \in \mathrm{int}(E)$ if and only if $E^\circ$ is bounded;
(iii) $E^{\circ\circ} = E$ if $E$ is closed convex and contains the origin;
(iv) If $E$ is closed convex and contains the origin, then $\rho_E = \sigma_{E^\circ}$ and $(\rho_E)^* = \delta_{E^\circ}$.

If $F$ is a closed convex and bounded set with $0 \in \mathrm{int}(F)$, then $F^\circ$ is also a closed convex and bounded set with $0 \in \mathrm{int}(F^\circ)$. Moreover, from the subadditivity property, $\rho_F = \sigma_{F^\circ}$ is a Lipschitz function with modulus $\|F^\circ\| := \sup\{\|x\| : x \in F^\circ\}$.

inf tf pxq

P

x Rm

g pAxqu ¥ sup tf  pA uq  g  puqu

P

u Rn

6

Xiaolong Qin, Nguyen Thai An

holds. If furthermore f and g are convex and satisfy the following condition Apdom f q X contg

 H,

then the equality holds and the supremum is attained if it is finite.

3 Nesterov's Smoothing Technique and Fast Gradient Method

In a celebrated work [29], Nesterov introduced a fast first-order method for solving convex smooth problems in which the objective functions have Lipschitz continuous gradients. In contrast to the complexity bound of $O(1/\epsilon)$ possessed by the classical gradient descent method, Nesterov's method gives a complexity bound of $O(1/\sqrt{\epsilon})$, where $\epsilon$ is the desired accuracy for the objective function.

When the problem under consideration is nonsmooth and the objective function has an explicit max-structure of the form
$$f(u) := \max\{\langle Au, x \rangle - \phi(x) : x \in Q\}, \quad u \in \mathbb{R}^n,$$
where $A$ is an $m \times n$ matrix and $\phi$ is a continuous convex function on a compact set $Q$ of $\mathbb{R}^m$, then, in order to overcome the complexity bound $O(\frac{1}{\epsilon^2})$ of the subgradient method, Nesterov [27] made use of the special structure of $f$ to approximate it by a function with Lipschitz continuous gradient and then applied a fast gradient method to minimize the smooth approximation. With this combination, we can solve the original nonsmooth problem up to accuracy $\epsilon$ in $O(\frac{1}{\epsilon})$ iterations. To this end, let $d$ be a continuous strongly convex function on $Q$ and let $\mu$ be a positive number called a smoothing parameter. Define
$$f_\mu(u) := \max\{\langle Au, x \rangle - \phi(x) - \mu d(x) : x \in Q\}. \qquad (3.1)$$

Since $d(x)$ is strongly convex, problem (3.1) has a unique solution. The following statement is a simplified version of [27, Theorem 1].

Theorem 3.1 The function $f_\mu$ in (3.1) is well defined and continuously differentiable on $\mathbb{R}^n$. The gradient of the function is $\nabla f_\mu(u) = A^\top x_\mu(u)$, where $x_\mu(u)$ is the unique element of $Q$ at which the maximum in (3.1) is attained. Moreover, $\nabla f_\mu$ is a Lipschitz function with Lipschitz constant $\ell_\mu = \frac{1}{\mu \sigma_1}\|A\|^2$, where $\sigma_1$ is the strong convexity parameter of $d$, and
$$f_\mu(u) \le f(u) \le f_\mu(u) + \mu D \quad \forall u \in \mathbb{R}^n,$$
where $D := \max\{d(x) : x \in Q\}$.

For the reader's convenience, we conclude this section with a presentation of the simplest optimal method for minimizing smooth and strongly convex functions; see [28] and the references therein. Let $g : \mathbb{R}^n \to \mathbb{R}$ be strongly convex with parameter $\gamma > 0$ and suppose its gradient is Lipschitz continuous with constant $L \ge \gamma$. Consider the problem $g^* = \inf\{g(u) : u \in \mathbb{R}^n\}$ and denote by $u^*$ its unique optimal solution.


Fast Gradient Method

INITIALIZE: $\gamma$, $v_0 = u_0 \in \mathbb{R}^n$. Set $k = 0$.
Repeat the following
    Set $u_{k+1} := v_k - \frac{1}{L}\nabla g(v_k)$
    Set $v_{k+1} := u_{k+1} + \frac{\sqrt{L} - \sqrt{\gamma}}{\sqrt{L} + \sqrt{\gamma}}\,(u_{k+1} - u_k)$
    Set $k := k + 1$
Until a stopping criterion is satisfied.
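As a concrete reference, the following MATLAB sketch runs the fast gradient method above for a smooth, strongly convex function supplied through a gradient handle. The handle name gradg, the iteration cap, and the gradient-norm stopping rule are our own illustrative choices, not part of the scheme in [28].

```matlab
% Fast gradient method for a gamma-strongly convex g with L-Lipschitz gradient.
% gradg : function handle returning the gradient of g at a point.
function u = fast_gradient(gradg, u0, L, gamma, maxit, tol)
    u = u0; v = u0;
    beta = (sqrt(L) - sqrt(gamma)) / (sqrt(L) + sqrt(gamma));  % momentum coefficient
    for k = 1:maxit
        gv = gradg(v);
        if norm(gv) <= tol, break; end      % simple stopping criterion
        unew = v - gv / L;                  % gradient step from the extrapolated point
        v = unew + beta * (unew - u);       % extrapolation (momentum) step
        u = unew;
    end
end
```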



γ } u0  u }2 1 2



γ k } u0  u }2 e 2

g puk q  g  ¤ g pu0 q  g 

¤ gpu0 q  g

¤ 2 pgpu0 q  g q ek

?

γ L

b

c

2 L

k

γ Lµ

.

(3.2)

Since g is a differentiable strongly convex function and u is its unique minimizer on Rn , we have ∇g pu q  0. Using [28, Theorem 2.1.5], we also have the estimate 1 }∇gpuk q}2 2L

¤ gpuk q  g ¤ 2 pgpu0 q  g q ek (3.2)

?

γ L

.

(3.3)

4 Duality for Minimum Norm Problems

In this section, we give some duality results concerning the minimum norm problem (1.1). Let us first recall the duality theorem originally stated by Nirenberg [30].

Theorem 4.1 (Minimum norm duality theorem) Given $\bar x \in \mathbb{R}^n$, let $d(\cdot; \Omega)$ be the distance function to a nonempty closed convex set $\Omega$ associated with some norm $|||\cdot|||$ on $\mathbb{R}^n$. Then
$$d(\bar x; \Omega) = \max\{\langle u, \bar x \rangle - \sigma_\Omega(u) : |||u|||^* \le 1\},$$
where the maximum on the right is achieved at some $\bar u$. Moreover, if $\bar w \in \Pi(\bar x; \Omega)$, then $\bar u$ is aligned with $\bar x - \bar w$, i.e., $\langle \bar u, \bar x - \bar w \rangle = |||\bar u|||^* \cdot |||\bar x - \bar w|||$.

According to this theorem, the minimum distance from a point to a convex set is equal to the maximum of the distance from the point to hyperplanes separating the point and the set; see Figure 1. A standard proof of this theorem can be found in [22, p. 136]. We also refer the readers to the recent paper [8] for more types of minimum norm duality theorems concerning the width and the length of symmetrical convex bodies.
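To make the duality concrete in the Euclidean case, the following MATLAB sketch compares the two sides of the identity $d(\bar x; \Omega) = \max_{\|u\|\le 1}\{\langle u, \bar x\rangle - \sigma_\Omega(u)\}$ for a disk, where both the distance and the support function have closed forms; the disk data and the grid of unit directions are our own illustrative choices.

```matlab
% Euclidean instance of the minimum norm duality theorem for the disk
% Omega = {x : ||x - c|| <= r}, whose support function is sigma(u) = <u,c> + r*||u||.
c = [3; 1]; r = 1; xbar = [0; 0];

primal = max(norm(xbar - c) - r, 0);        % distance from xbar to the disk

theta = linspace(0, 2*pi, 20000);
U = [cos(theta); sin(theta)];               % unit directions u
dualvals = U' * xbar - (U' * c + r);        % <u,xbar> - sigma_Omega(u) on ||u|| = 1
dual = max(max(dualvals), 0);               % u = 0 gives the value 0

fprintf('primal = %.6f, dual = %.6f\n', primal, dual);
```

Up to the grid resolution, the two printed values agree, and the maximizing direction is the unit vector pointing from the projection toward $\bar x$.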


Fig. 1 An illustration of the minimum norm duality theorem

Lemma 4.1 Let $Q$ be a nonempty closed subset of $\mathbb{R}^n$. Then the generalized projection $\Pi_F(x; Q)$ is nonempty for any $x \in \mathbb{R}^n$.

Proof Since $F$ is a closed, bounded and convex set that contains the origin as an interior point, $0 \le T_F(x; Q) < +\infty$ for all $x \in \mathbb{R}^n$, and the following number exists:
$$R = \sup\{r : I\!B(0; r) \subset F^\circ\} < +\infty.$$
Then we have $\rho_F(x) = \sigma_{F^\circ}(x) \ge R\|x\|$ for all $x \in \mathbb{R}^n$. Fix $x \in \mathbb{R}^n$. For each $n \in \mathbb{N}$, from (2.3) there exists $w_n \in Q$ such that
$$T_F(x; Q) \le \rho_F(w_n - x) < T_F(x; Q) + \frac{1}{n}. \qquad (4.1)$$
It follows from (4.1) and the triangle inequality that
$$R\|w_n\| \le R\big(\|w_n - x\| + \|x\|\big) \le \rho_F(w_n - x) + R\|x\| \le T_F(x; Q) + 1 + R\|x\|$$
for all $n$. Thus the sequence $\{w_n\}$ is bounded. We can take a subsequence $\{w_{k_n}\}$ that converges to a point $\bar w \in Q$, due to the closedness of $Q$. By taking the limit on both sides of (4.1) and using the continuity of the Minkowski function, we conclude that $\bar w \in \Pi_F(x; Q)$. □

Theorem 4.1 is in fact a consequence of the Fenchel duality theorem, which is used to prove the following extension for minimal time functions.

Theorem 4.2 The generalized distance $T_F(0; A(\Omega))$ from the origin $0 \in \mathbb{R}^n$ to the image $A(\Omega)$ of a nonempty closed convex set $\Omega \subset \mathbb{R}^m$ under a linear mapping $A : \mathbb{R}^m \to \mathbb{R}^n$ can be computed by
$$T_F(0; A(\Omega)) := \inf\{\rho_F(Aw) : w \in \Omega\} = \max\big\{-\sigma_\Omega(-A^\top u) : u \in F^\circ\big\},$$
where the maximum on the right is achieved at some $\bar u \in F^\circ$. If $A\bar w \in \Pi_F(0; A(\Omega))$ is a projection from the origin onto $A(\Omega)$, then
$$\langle A\bar w, \bar u \rangle = \sigma_{F^\circ}(A\bar w) = -\sigma_\Omega(-A^\top \bar u).$$

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

Proof Applying Theorem 2.1 for g

9

 ρF and f  δΩ , the following qualification condition holds

Adomf

X contpgq  ApΩ q X Rn  ApΩ q  H,

where contpg q  Rn is due to the fact that ρF is a continuous function on Rn . It follows that TF p0; ApΩ qq  inf tδΩ pxq

ρF pAxq : x P Rn u

 supt pδΩ q



AJ u 

AJ u

 suptσΩ

 pρF q puq : u P Rn u

 δF  puq : u P Rn u 

 suptσΩ AJ u  δF  puq : u P Rn u  suptσΩ AJ u



: u P F  u,

and the supremum is attained because TF p0; ApΩ qq is finite. If the supremum on the right is achieved at

some u ¯ P F  and the infimum on the left is achieved at some w ¯ σ F  p Aw ¯ q  ρ F p Aw ¯ q  TF p0, ApΩ qq  maxtσΩ Since w ¯

P Ω, then 

 AJ u

: u P F  u  σΩ pAJ u ¯q.

P Ω, we also have 

xAJ u¯, w¯y ¤ σΩ AJ u¯  σF  pAw¯q. This implies that xAJ u ¯, w ¯y

¥ σF  pAw¯q. On the other hand, σF  pAw¯q ¥ xAw, ¯ u ¯y  xAJ u ¯, w ¯ y, because   u ¯ P F . Thus, xAw, ¯ u ¯y  σF pAw ¯ q. This completes the proof. l Note that, given a closed set Ω, the set ApΩ q need not to be closed and therefore, we cannot use the min to replace the inf in the primal problem in Theorem 4.2. Proposition 4.1 Let Q be a nonempty, closed convex subset of Rn . The following holds TF p0; Qq : mintρF pq q : q

P Qu  maxtσQ puq : u P F  u.

(4.2)

If the maximum on the right is achieved at u ¯ P F  and the infimum on the left is attained at q¯ P Q, then

xq¯, u¯y  σF  pq¯q  σQ pu¯q. If F

(4.3)

 IB is the Euclidean closed unit ball, then the projection q¯ exists uniquely and dp0; Qq : mint}q } : q

If suppose further that 0 R Q, then

P Qu  maxtσQ puq : u P IB u.

q¯ }q¯} is the unique solution of the dual problem.

Proof The first assertion is a direct consequence of Theorem 4.2 with Ω

 Q and A is the identity mapping

of Rn . Note that, by Lemma 4.1, the infimum is attained here. When F is the Euclidean ball, the minimal time function reduces to the Euclidean distance function and therefore the projection q¯

 PQ p0q exists

uniquely. If 0 R Q, then q¯  0. Moreover, we have xq¯, x  q¯y ¤ 0 for all x P Q. This implies, B

F

 }qq¯¯} , x

¤ }q¯}, for all x P Q.

10

Xiaolong Qin, Nguyen Thai An





 }qq¯¯} ¤ }q¯}  dp0; Qq. This means that }qq¯¯} is a solution of the following dual problem dp0; Qq  maxtσQ puq : u P IB u. Using (4.3), any dual solution u ¯ must satisfy u ¯ P SF  pq¯q. Since F  IB, we have F   IB is a strictly convex set. Thus, by Lemma 2.2(iii), u ¯  }qq¯¯} is the unique solution of dual problem. The proof is now complete. l From (4.3), for any primal-dual pair pq¯, u ¯q, we have the following relationship u ¯ P SF  pq¯q and q¯ P SQ pu ¯ q. (4.4) Hence σQ

This observation seems to be useful from numerical point of view in the sense that if a dual solution u ¯ is found exactly, then a primal solution q¯ can be obtained by taking a support point in SQ pu ¯q. However, for a

general convex set Q, the set SQ pu ¯q might contain more than one point and there might be some points in

this set which is not a desired primal solution; see Figure 2. Thus, the above task is possible when SQ pu ¯q is a singleton. When the distance function under consideration is non-Euclidean, the primal problem may have infinitely many solutions and we may not recover a dual solution from a primal one q¯ by setting

q¯ }q¯} as in the

Euclidean case.

 tx P R2 : 2 ¤ x1 ¤ 5 and 1 ¤ x2 ¤ 4u in which the distance function generated by the `8 -norm. In this case, F  tx P R2 : maxt|x1 |, |x2 |u ¤ 1u and F   tx P R2 : |x1 | |x2 | ¤ 1u and we have TF p0; Qq  2. The primal problem in (4.2) has the solution set ΠF p0; Qq  tx P R2 : x1  2 and 1 ¤ x2 ¤ 2u and the corresponding dual problem has a unique solution u ¯  p1, 0q. We can see that, for any primal solution q¯, q¯ q¯ the element }q¯}  u ¯. Thus, }q¯} is not a dual solution; see Figure 2.

Example 4.1 In R2 , consider the problem of finding the projection onto the set Q

Fig. 2 A minimum norm problem with non-Euclidean distance.

We now give a sufficient condition for the uniqueness of solution of primal and dual problems in (4.2). We recall the following definition from [26]. The set F is said to be normally smooth if and only if for every

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

11

boundary point x ¯ of F , the normal cone of F at x ¯ defined by N px ¯; F q : tu P Rn : xu, x  x ¯y ¤ 0,

@x P F u is generated exactly by one vector. That means, there exists ax¯ P Rn such that N px ¯; F q  cone tax¯ u. From [26, Proposition 3.3], we have that F is normally smooth if and only if its polar F  is strictly convex. Proposition 4.2 Let Q be a nonempty, closed convex subset of Rn . We have the following: (i) If Q is strictly convex or if F is strictly convex, then the primal problem in (4.2) has a unique solution, i.e., the generalized projection set ΠF p0; Qq is a singleton.

(ii) If F is normally smooth, then the dual problem in (4.2) has a unique solution. Proof (i) Let u ¯ be a dual solution in (4.2). Since Q is nonempty and closed, the set ΠF p0; Qq is nonempty

by Lemma 4.1. Consider the case where Q is strictly convex. Suppose that ΠF p0; Qq contains two distinct elements q1

 q2 . Then, by relation (4.4), both q1 and q2 belong to the set SQ pu¯q. This contradicts Lemma

2.2 by the strictly convexity of Q.

P Q then ΠF p0; Qq  t0u is a singleton. Consider the case 0 R Q. Suppose by contradiction that there exist q¯1 , q¯2 P ΠF p0; Qq with q¯1  q¯2 . For γ : TF p0; Qq ¡ 0, we have ρF pq¯1 q  ρF pq¯2 q  γ ¡ 0. By the positive homogeneity of ρF , we have ρF p q¯γ q  ρF p q¯γ q  1. This  q¯ implies q¯γ , q¯γ P F and therefore 12 q¯γ P intpF q by the strictly convexity of F . It follows again by γ  q¯ q¯ the homogeneity of ρF that ρF   γ  TF p0; Qq  inf tρF pqq : q P Qu. This is a contradiction. 2 Thus, (i) has been justified. The proof of (ii) is similar by using the strictly convexity of F  . l Assume that F is strictly convex. If 0

2

1

1

2

1

1

2

2

From above result, when F is strictly convex and also normally smooth (for example, F is an Euclidean ball or an Ellipsoid), both the primal and dual problems in (4.2) have a unique solution. Minkowski sum of two closed sets is not necessarily closed. For example, for Q1

e

x1

u and Q2  tx P R

2

: x2

 0u, the sum Q1

Q2

 tx P R2 :

x2

 tx P R2 :

x2

¥

¡ 0u

is an open set. In what follows, in order to ensure the existence of support point for the Minkowski sum, we assume that all component sets are compact. We now show that (4.4) can allow us to characterize points on each constituent sets in the Minkowski sum so that their sum is equal to the projection point; see Figure 3 for an illustration. Corollary 4.1 Let tQi upi1 be a finite collection of nonempty convex compact sets in Rn . It holds that 

TF

0;



p ¸



Qi

  min

#



i 1

+

p ¸

σQi puq : u P F  .

i 1

Moreover, if the minimum on the right hand side is attained at u ¯ P F  , then any generalized projection q¯ of the origin onto the set

°p

 Qi satisfies

i 1

q¯ P SQ1 pu ¯q

...

SQp pu ¯ q.

Thus, the projection q¯ is the sum of points on component sets such that at these points all the sets have the same normal vector u ¯. If F and

 IB is the Euclidean closed unit ball, then the projection q¯ exists uniquely



d 0;

p ¸



i 1



Qi

  min

#

p ¸



i 1

+

σQi puq : u P IB .


16

Fig. 3 The Minkowski sum of a polytope and two ellipses is approximately plotted by the red set. The projection of the origin onto the red is the sum of points on three constituent sets such that, at these points, three sets share the same normal vector. This Figure is plotted via Ellipsoidal Toolbox [21].

If in addition, 0 R

°p

q¯  Qi then }q¯} is the unique solution of the dual problem and we have

i 1

q¯ P SQ1 pq¯q Proof Let Q :

...

SQp pq¯q.

°p

 Qi . Using Lemma 2.1 and Lemma 2.2, we have

i 1

σQ puq  σQ1 puq

...

σQp puq and SQ puq  SQ1 puq

...

SQp puq.

Note that, the support point mapping SQ puq does not depend on the magnitude of u, using Proposition 4.1

l

and relation (4.4), we clarify the desired conclusion easily.

The problem of finding a pair of closest points, and therefore the Euclidean distance, between two given convex compact sets P and Q can be reduced to the minimum norm problem associated with the Minkowski

sum Q  P by observing that dpP, Qq

 dp0, Q  P q. A note here is that although there may be several

pairs of closest points, the latter problem always has a unique solution which is the projection from 0 onto Q  P. By noting that σP puq  σP puq and SP puq  SP puq, we have the following result.

Corollary 4.2 Let tQi upi1 and tPj u`j 1 be two finite collection of nonempty convex compact sets in Rn and let Q 

°p

 Qi , P 

i 1

°`

 Pj . It holds that

j 1

d pP, Qq   min

#

p ¸



i 1

σQi puq

` ¸



+

σPj puq : u P IB .

j 1

Moreover, if q¯ is the projection of the origin onto Q : Q  P and if pa ¯, ¯bq is a pair of closest points of Q and P, then q¯  a ¯  ¯b and a ¯ P SQ1 pq¯q

...

SQp pq¯q and ¯b P SP1 pq¯q

...

SP` pq¯q.

Thus, a ¯ is the sum of points in Qi for i  1, . . . , p such that at these points all Qi have the same normal vector q¯ and ¯b is the sum of points in Pj for j  1, . . . , ` such that at these points all Pj have the same normal vector q¯.


5 The Gilbert Algorithm

We now give an overview of Gilbert's algorithm and clarify how it can be applied to solving (1.1). Let us define the function $g_Q : \mathbb{R}^n \times Q \to \mathbb{R}$ by
$$g_Q(z, x) := \sigma_Q(-z) + \langle z, x \rangle,$$
where $Q = \sum_{i=1}^{p} T_i(\Omega_i)$. From the definition, $g_Q(z, z) \ge 0$ for all $z \in Q$. A point $z \in Q$ is the solution of (1.1) if and only if $\langle -z, x - z \rangle \le 0$ for all $x \in Q$, which amounts to saying that $g_Q(z, z) = 0$.

Lemma 5.1 If two points $z$ and $\bar z$ satisfy $\|z\|^2 - \langle z, \bar z \rangle > 0$, then there is a point $\tilde z$ in the line segment $\mathrm{conv}\{z, \bar z\}$ such that $\|\tilde z\| < \|z\|$.

Proof If $\|\bar z\|^2 \le \langle z, \bar z \rangle$, then we can choose $\tilde z = \bar z$. Consider the case $\|\bar z\|^2 > \langle z, \bar z \rangle$. Combining this with the assumption $\|z\|^2 - \langle z, \bar z \rangle > 0$, we have
$$0 < \lambda^* := \frac{\|z\|^2 - \langle z, \bar z \rangle}{\|z - \bar z\|^2} < 1.$$
This implies that the quadratic function
$$f(\lambda) = \|\bar z - z\|^2 \lambda^2 + 2\langle z, \bar z - z \rangle \lambda + \|z\|^2$$
attains its minimum on $[0, 1]$ at $\lambda^*$, and therefore $f(\lambda^*) = \|z + \lambda^*(\bar z - z)\|^2 < f(0) = \|z\|^2$. Thus $\tilde z := z + \lambda^*(\bar z - z)$ is the desired point. □

Fig. 4 An illustration of Gilbert's algorithm.

The Gilbert algorithm can be interpreted as follows. Starting from some $z \in Q$: if $g_Q(z, z) = 0$, then $z$ is the solution. If $g_Q(z, z) > 0$, then $\bar z \in S_Q(-z)$ satisfies $\|z\|^2 - \langle z, \bar z \rangle > 0$. Using Lemma 5.1, we find a point $\tilde z$ on the line segment connecting $z$ and $\bar z$ such that $\|\tilde z\| < \|z\|$. The algorithm is outlined as follows.

Gilbert's Algorithm
0. Initialization step: Take an arbitrary point $z_0 \in Q$.
1. If $g_Q(z, z) = 0$, then return $z = x^*$ as the solution; else, set $\bar z \in S_Q(-z)$.
2. Compute the point $\tilde z \in \mathrm{conv}\{z, \bar z\}$ of minimum norm, set $z = \tilde z$, and go back to step 1.

Figure 4 illustrates some iterations of Gilbert's algorithm for finding the closest point to an ellipse in two dimensions. Lemma 5.1 also suggests an effective way to find $\tilde z$ in step 2. We have $\tilde z := z + \lambda^*(\bar z - z)$, where
$$\lambda^* = \begin{cases} 1, & \text{if } \|\bar z\|^2 \le \langle z, \bar z \rangle, \\[4pt] \dfrac{\|z\|^2 - \langle z, \bar z \rangle}{\|z - \bar z\|^2}, & \text{otherwise.} \end{cases} \qquad (5.1)$$
To implement the algorithm, it remains to show how to compute a support point of $Q = \sum_{i=1}^{p} T_i(\Omega_i)$. Fortunately, this can be done by using Lemma 2.2.

Gilbert showed that, if the sequence $\{z_k\}_{k=1}^{\infty}$ generated by the algorithm does not stop with $z = x^*$ at step 1 within a finite number of iterations, then $z_k \to x^*$ asymptotically. According to [14, Theorem 3], we have
$$\|z_k\| - \|x^*\| \le \frac{C_1}{k} \quad \text{and} \quad \|z_k - x^*\| \le \frac{C_2}{\sqrt{k}},$$

where $C_1$ and $C_2$ are some positive constants. From the above estimates, in order to find an $\epsilon$-approximate solution, i.e., a point $z$ such that $\|z\| - \|x^*\| \le \epsilon$, we need to perform $O(\frac{1}{\epsilon})$ iterations of the algorithm.
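For reference, here is a minimal MATLAB sketch of Gilbert's algorithm as outlined above, written for a convex compact set supplied through a support-point oracle; for the Minkowski-sum case, the support-point construction of Lemma 2.2 (sketched in Section 2) can be plugged in. The oracle interface, the iteration cap and the tolerance test are our own choices.

```matlab
% Gilbert's algorithm for min{ ||x|| : x in Q }, with Q given by a support
% oracle: support(u) returns a point of S_Q(u). The start z0 must lie in Q.
function z = gilbert(support, z0, maxit, tol)
    z = z0;
    for k = 1:maxit
        zbar = support(-z);                      % support point in direction -z
        gap = norm(z)^2 - z' * zbar;             % g_Q(z,z) = sigma_Q(-z) + ||z||^2
        if gap <= tol, break; end                % z is (near) optimal
        if norm(zbar)^2 <= z' * zbar             % step size rule (5.1)
            lambda = 1;
        else
            lambda = (norm(z)^2 - z' * zbar) / norm(z - zbar)^2;
        end
        z = z + lambda * (zbar - z);             % minimum norm point on [z, zbar]
    end
end
```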

6 Smoothing Algorithm for Minimum Norm Problems

Let us first consider a function of the following type:
$$\sigma_{A,Q}(u) = \sup\{\langle Au, x \rangle : x \in Q\}, \quad u \in \mathbb{R}^n,$$
where $A$ is an $m \times n$ matrix and $Q$ is a closed bounded subset of $\mathbb{R}^m$. Observe that $\sigma_{A,Q}(u)$ is the composition of a linear mapping and the support function of $Q$. As we will see, this function can be approximated by the following function:
$$\sigma^{\mu}_{A,Q}(u) = \sup\Big\{\langle Au, x \rangle - \frac{\mu}{2}\|x\|^2 : x \in Q\Big\}, \quad u \in \mathbb{R}^n.$$

The following statement is a direct consequence of Theorem 3.1. However, in this case the approximating function, as well as its gradient, has a closed form expressed in terms of the Euclidean projection. This feature makes it reliable from a numerical point of view.


Proposition 6.1 The function $\sigma^{\mu}_{A,Q}$ has the explicit representation
$$\sigma^{\mu}_{A,Q}(u) = \frac{\|Au\|^2}{2\mu} - \frac{\mu}{2}\Big[d\Big(\frac{Au}{\mu}; Q\Big)\Big]^2$$
and is continuously differentiable on $\mathbb{R}^n$ with its gradient given by $\nabla \sigma^{\mu}_{A,Q}(u) = A^\top P_Q\big(\frac{Au}{\mu}\big)$. The gradient $\nabla \sigma^{\mu}_{A,Q}$ is a Lipschitz function with constant $\ell_\mu = \frac{1}{\mu}\|A\|^2$. Moreover,
$$\sigma^{\mu}_{A,Q}(u) \le \sigma_{A,Q}(u) \le \sigma^{\mu}_{A,Q}(u) + \frac{\mu}{2}\|Q\|^2 \quad \text{for all } u \in \mathbb{R}^n, \qquad (6.1)$$
where $\|Q\| := \sup\{\|q\| : q \in Q\}$.

Proof We have µ σA,Q puq  sup

!

)

xAu, xy  µ2 }x}2 : x P Q

"



 sup  µ2 }x}2  µ2 xAu, xy   µ2 inf

"

}x 

*

: xPQ

*

Au 2 }Au}2 }  µ2 : x P Q µ "

*

}  µ inf }x  Au }2 : x P Q  }Au 2µ 2 µ 2

 

2 µ Au } Au}2  d ;Q . 



2

µ

Since ψ pxq : rdpx; Qqs2 is a differentiable function satisfying ∇ψ pxq

 2rx  PQ pxqs for all x P Rm ,

we find from the chain rule that µ puq  ∇σA,Q



1 J µ 2 J A pAuq  A µ 2 µ



Au µ



 PQ p Au q µ

 AJ PQ p Au q. µ From the property of the projection mapping onto convex sets and Cauchy-Schwarz inequality, we find, for any u, v

P Rn , that

µ µ }∇σA,Q puq  ∇σA,Q pvq}2  }AJ PQ p Au q  AJ PQ p Av q}2 µ µ

Av 2 ¤ }A}2 }PQ p Au q  PQ p q} µ µ F B Av ¤ }A}2 Au µ Av , PQ p Au q  PQ p q µ µ

B F } A}2 Au Av J J  µ u  v, A PQ p µ q  A PQ p µ q µ µ puq  ∇σA,Q pvqy  }Aµ} xu  v, ∇σA,Q 2 ¤ }A} }u  v}}∇σµ puq  ∇σµ pvq}. 2

µ

A,Q

A,Q

16

Xiaolong Qin, Nguyen Thai An

This implies that

µ µ }∇σA,Q puq  ∇σA,Q pvq} ¤ }Aµ} }u  v}. 2

The lower and upper bounds in (6.1) follow from

xAu, xy  µ2 }x}2 ¤ xAu, xy ¤ xAu, xy  µ2 }x}2

µ sup 2

for all x P Q. The proof is now complete.

(

}q}2 : q P Q

l

From Proposition 4.1, we have the duality result 

d 0;

p ¸



Ti pΩi q



i 1

#

p ¸

,

+

¸   min σΩi pAJ ai y : u P IB . i uq  xu, i1 i1 p

This dual problem is a non-smooth convex problem with constraint. This makes it not favorable to apply optimization scheme. We now make use of the strong convexity of the squared Euclidean norm to state another dual problem for (1.1) in which the dual objective function is strongly convex and the constraint is removed. Proposition 6.2 The following duality result holds  

d 0;

p ¸



2

Ti pΩi q

  min

#

p ¸





σ Ωi



Ai J u

xu,

i 1

i 1

p ¸



ai y

i 1

Moreover if u is the unique solution of the dual problem in (6.2), then x Proof By observing that the dual of the function }}2 is

1 4

+

1 }u}2 : u P Rn . 4

(6.2)

  21 u .

}}2 , applying Theorem 2.1 for Q : °pi1 Ti pΩi q,

we have (

rd p0; Qqs2  inf }x}2 : x P Q !

 max  pδQ q puq  }  }2  puq : u P Rn 

"

 max σQ puq  14 }  u}2 : u P Rn   min

"

σQ puq

)

*

*

1 }u}2 : u P Rn . 4

Equality (6.2) now follows directly from Lemma 2.1. The objective dual function in (6.2) is strongly convex, solution u exists uniquely. Recall that x

 PQ p0q, from this duality result, we have }x }2  σQ pu q 1 2       2 1 }u }2 ¤ 4 }u } . Moreover, σQ pu q  supq PQ xu , q y ¥ xu , x y since x P Q. We have }x } 4 xu , x y which is equivalent to }x 12 u }2 ¤ 0 and therefore x   21 u . l In order to solve minimum norm problem (1.1), we solve dual problem (6.2) by approximating the dual objective function by a smooth and strongly convex function with Lipschitz continuous gradient and then apply a fast gradient scheme to this smooth one. Let us define the dual objective function by f puq :

p ¸



i 1





σ Ωi Ai J u

xu,

p ¸



ai y

i 1

The following result is a direct consequence of Proposition 6.1.

1 }u}2 , 4

u P Rn .

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

17

Proposition 6.3 The function f puq has the following smooth approximation fµ puq :

p ¸





}AJi u}2  µ rdp AJi u ; Ω qs2 xu, ¸p a y i i 2µ 2 µ

1 }u}2 , 4



i 1

i 1

 12 and its gradient is given by

Moreover, fµ is a strongly convex function with modulus γ p ¸

∇fµ puq 





Ai P Ωi

i 1

The Lipschitz constant of ∇fµ is Lµ :

u P Rn .

AJ i u µ



°p

2  }Ai }

i 1

µ

p ¸



ai

i 1

1 u. 2

1 . 2

(6.3)

(6.4)

Moreover, we have the following estimate fµ puq ¤ f puq ¤ fµ puq where Df :

1 °p }Ωi }2 2 i1

µDf ,

  8.

We now apply the Nesterov fast gradient method introduced in Section 3 to minimize fµ . The NEsterov Smoothing algorithm for MInimum NOrm problem (NESMINO) is outlined as follows: NESMINO INITIALIZE:

Set k

Ωi , Ai , ai for i  1, . . . , p and v0 , u0 , µ.

 0.

Repeat the following Compute ∇fµ pvk q using (6.3) Compute Lµ using (6.4) Set uk

1

:  vk 

Set vk

1

: uk

Set k : k

p vk q ?L ?1{2 ?L ?1{2 puk 1  uk q

1 Lµ ∇fµ µ

1

µ

1

Until a stopping criterion is satisfied. We denote by uµ the unique minimizer of fµ on Rn . We also denote by u a minimizer of f and by
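The following MATLAB sketch is a minimal instance of NESMINO for problem (1.1), assuming each $\Omega_i$ comes with a projection routine proj{i}; it uses the gradient formula (6.3), the Lipschitz constant (6.4), the fast gradient step above, and recovers a primal iterate as in Theorem 6.1 below. The function-handle interface, names, and the gradient-norm stopping rule are our own choices.

```matlab
% One NESMINO run for a fixed smoothing parameter mu.
% proj{i}(y): projection onto Omega_i;  A{i}, a{i}: data of T_i(y) = A{i}*y + a{i}.
function [u, x] = nesmino(proj, A, a, mu, u0, maxit, tol)
    p = numel(A);
    Lmu = sum(cellfun(@(M) norm(M)^2, A)) / mu + 1/2;   % Lipschitz constant (6.4)
    gamma = 1/2;                                        % strong convexity modulus of f_mu
    beta = (sqrt(Lmu) - sqrt(gamma)) / (sqrt(Lmu) + sqrt(gamma));
    u = u0; v = u0;
    for k = 1:maxit
        g = v / 2;                                      % gradient (6.3), accumulated below
        for i = 1:p
            g = g + A{i} * proj{i}(A{i}' * v / mu) + a{i};
        end
        if norm(g) <= tol, break; end
        unew = v - g / Lmu;                             % gradient step
        v = unew + beta * (unew - u);                   % momentum step
        u = unew;
    end
    x = zeros(size(u0));                                % primal recovery as in Theorem 6.1
    for i = 1:p
        x = x + A{i} * proj{i}(A{i}' * u / mu) + a{i};
    end
end
```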

f  : f pu q  inf xPRn f pxq its optimal value on Rn . From the duality result (6.2), we have  

2

¸ f    d 0; Ti pΩi q i1 p

 }x }2 .

(6.5)

Recall that, our objective function in primal problem (1.1) is the Euclidean norm function }  }. A feasible point x P

°p

 Ti pΩi q is said to be an -approximate solution of problem (1.1) if it satisfies }x}  }x } ¤ .

i 1

18

Xiaolong Qin, Nguyen Thai An

Our purpose is to solve the original minimum norm problem (1.1) with an accuracy . From the structure of (1.1), it is very challenging for us to deal with the constraint x

P °pi1 Ti pΩi q and therefore we have

employed the dual approach. In our approach, the one that we are optimizing is the smooth approximation of the dual objective function. It remains to show how to recover an approximate primal solution from the dual iterative sequence.

8 Theorem 6.1 Let tuk u8 k1 be the sequence generated by NESMINO algorithm. Then the sequence txk uk1 defined by xk :



p ¸



A i PΩ i



i 1

AJ i uk µ





ai

converges to an -approximate solution of minimum norm problem (1.1) within k

O



?1 ln

1 



itera-

tions. Proof Using (3.2), we find that the iterative sequence tuk u8 k0 satisfies fµ puk q  fµ

¤2



fµ pu0 q  fµ e

k

b

γ Lµ

.

(6.6)

From fµ pu0 q ¤ f pu0 q and the following estimate fµ

 fµ puµ q ¥ f puµ q  µDf ¥ f pu q  µDf  f   µDf ,

we have

fµ pu0 q  fµ

¤ f pu0 q  f  µDf . Moreover, since fµ puk q  fµ ¥ f puk q  µDf  f  , we find from (6.6) and (6.7) that f puk q  f 

¤ µDf

fµ puk q  fµ

¤ µDf

2 pf pu0 q  f 

µDf q e

k

b

γ Lµ

, for all k

(6.7)

¥ 0.

(6.8)

Since fµ is a differentiable strongly convex function and uµ is its unique minimizer on Rn , we have ∇fµ uµ



 0. It follows from (3.3) that 1 }∇fµ puk q}2 2Lµ

¤ fµ puk q  fµ ¤

(6.6)

This implies

}∇fµ puk q}

2

¤ 4Lµ pfµ pu0 q  f  qek µ

b

γ Lµ

¤

(6.7)



2 fµ pu0 q  fµ e

k

4Lµ pf pu0 q  f 

b

γ Lµ

.

µDf qe

k

b

γ Lµ

.

(6.9)

For each k and for each i P t1, . . . , pu, let wki be the unique solution to the problem 

σµ,Ωi AJ i uk : sup

!

xAJi uk , wy  µ2 }w}2 : w P Ωi

We have !

µ sup xAJ }w}2 : w P Ωi i uk , w y  2

)

 sup

#

)

.

}AJi uk }2  µ  AJi uk  w2 : w P Ω 2





µ

  J

2 2 } AJ µ Ai u k i uk }   d ,Ω .



2

µ

i

i

+

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

 PΩ

Hence wki

 i

dk :

AJ i uk . µ

19

For each k, we have

 p ¸  Ai wki  



i 1

2  ai  



 

d 0,

2

p ¸

Ti pΩi q





2  ai  

 p ¸  Ai wki  



i 1

i 1

where the equality is due to (6.5). Observe that the sequence txk u is primal feasible, i.e., xk p °



i 1

f ,



p °



i 1

Ai wki

ai

Ti pΩi q. From the property of the projection onto convex sets, we have xx , xk  x y ¤ 0 and hence

}xk  x }2  }xk }2  }x }2

2 xx , xk  x y ¤ }xk }2  }x }2

This implies that txk u converges to x whenever dk 2}x } p}xk }  }x }q ¤ p}xk }

 dk .

Ñ 0 as k Ñ 8. Moreover, we have

}x }q p}xk }  }x }q  }xk }2  }x }2  dk .

(6.10)

We have the following dk

     

 2 p ¸    i Ai wk ai  f    i1  2 p ¸   i Ai wk ai  fµ uk f  fµ u k    i1 2  p p p  ¸ ¸ ¸ µ i 2 1   i i AJ u , w Ai wk ai  w u , ai uk 2  k i k k k   2 4 i1 i1 i1  2 p p p ¸  ¸ @ D 1 µ¸ i 2   Ai wki ai  uk , Ai wki ai uk 2 wk f    4 2 i1 i1 i1  2 p p ¸   1  µ¸ i 2  uk  Ai wki ai wk f  fµ u k    2 2 i1 i1 2  p p p ¸ ¸ AJ 1  µ¸ i 2  i uk A i PΩ i ai uk  wk f  fµ uk    µ 2 2 i1 i1 i1

p

q

p

q

p

q

p q

 p q

rx

y } } s x

p

q



p

 }∇fµ puk q}  2

Observe that |fµ puk q  f  |

p

} }

¤ |f puk q  f  |

(6.3)

¤ }∇fµ puk q}2 |f puk q  f  | ¤ 4Lµ pf pu0 q  f  ¤ 2p2Lµ

For a fix 

µDf qe

1q pf pu0 q  f 

} }

 fµ puk q

 p q

f   fµ puk q. µDf and

b k

p °



i 1

(6.9), we have dk

f   fµ puk q

} }

 p q



µ¸ i 2 }w } 2 i1 k

} } 

} }

q

y

}wki }2 ¤ 2Df , taking into account (6.8) and

2µDf γ Lµ

µDf q e

µDf

b k

γ Lµ

2pf pu0 q  f 

µDf qe

k

b

γ Lµ

2µDf .

3µDf .

¡ 0, from (6.10) in order to achieve an  - approximate solution for the primal problem, we

should force each of the two terms in the above estimate less than or equal to 2 . If we choose the value of



P

20

Xiaolong Qin, Nguyen Thai An

smooth parameter µ to be

 6Df

k where L µ k

O



°

? ln 1 

p i1

 1

} }2 Df

6 Ai 



, we have dk d

¥ 1 2.

Lµ ln γ



¤  whenever 4p2Lµ

1q f pu0 q  f  

 6



,

(6.11)

Thus, we can find an  - approximate solution for primal problem within

l

iterations. The proof is complete.

Remark 6.1 In the NESMINO algorithm, a smaller smoothing parameter $\mu$ is often better, because it reduces the error made when approximating $f$ by $f_\mu$. However, a small $\mu$ implies a large value of the Lipschitz constant $L_\mu$, which in turn reduces the convergence rate by (6.11). Thus the time cost of the algorithm is expensive if we fix a value of $\mu$ ahead of time. In practice, a sequence of smooth problems with decreasing smoothing parameter $\mu$ is solved, and the solution of the previous problem is used as the initial point for the next one. The algorithm stops when a preferred $\mu$ is attained. The optimization scheme is outlined as follows.

INITIALIZE: $\Omega_i, A_i, a_i$ for $i = 1, \ldots, p$, and $w_0$, $\sigma \in (0,1)$, $\mu_0 > 0$ and $\bar\mu > 0$. Set $k = 0$.
Repeat the following
    1. Apply the NESMINO algorithm with $\mu = \mu_k$, $u_0 = v_0 = w_k$ to find $w_{k+1} = \mathrm{argmin}_{w \in \mathbb{R}^n} f_\mu(w)$.
    2. Update $\mu_{k+1} := \sigma \mu_k$ and set $k := k + 1$.

Until $\mu \le \bar\mu$.

We highlight the fact that the algorithm does not require computation of the Minkowski sum, but rather only the projection onto each of the constituent sets $\Omega_i$. Fortunately, many useful projection operators are easy to compute. Explicit formulas for the projection operator $P_\Omega$ exist when $\Omega$ is a closed Euclidean ball, a closed rectangle, a hyperplane, or a half-space. Although there are no analytic solutions, fast algorithms for computing the projection operators exist for the unit simplex, the closed $\ell_1$ ball (see [10, 6]), and ellipsoids (see [7]). An advantage of the smoothing approach is that in many cases, by making use of the special structure of the support function of $\Omega$, we can choose a suitable smoothing that avoids working with an implicit projection operator $P_\Omega$ or that employs a fast projection algorithm. We consider two important cases as follows.

The case of ellipsoids. Consider the case of ellipsoids associated with the Euclidean norm,

E pA, cq : x P Rn : px  cqJ A1 px  cq ¤ 1 , where the shape matrix A is positive definite and the center c is some given point in Rn . It is well known that the support function of this Ellipsoid is σE puq is

? Au J

u Au



?

uJ Au

uJ c and the support point in direction u

c. We can rewrite the support function as follows σE puq  }A1{2 u}



uJ c  σIB A1{2 u



uJ c,

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

21

where IB stands for the closed unit Euclidean ball and A1{2 is the square root of A. The smooth approximation gµ of the function g

 σE has the following explicit representation }A1{2 u}2  µ dp A1{2 u ; IB q2

gµ puq 



2

µ

uJ c. 

A1{2 u µ projecting onto the Ellipsoid, we just need to project onto the closed unit ball. and is differentiable on Rn with its gradient given by ∇g puq

The case of polytopes. Consider the polytope S

 A1{2 PIB



c. Thus, instead of

 convta1 , . . . , am u generated by m point in Rn . By [31,

Theorem 32.2], we have σS puq  suptxu, xy : x P S u  max xu, ai y,

¤¤

1 i m

and the support point S is some point ai such that xu, ai y

 σS puq. For α  pα1 , . . . , αm qJ P Rm , we

have max αi

¤¤

1 i m

 sup

#

x1 α1

...

xm αm : xi

¥ 0,

m ¸



+

xi

 1  sup txα, xy : x P ∆m u .

i 1

P ∆m u  σ∆ pAuq, where A  ra1 , a2 , . . . , am sJ is an m  n m matrix whose ith row is aJ i and ∆m is the unit simplex in R . The smooth

approximate function of g  σS  2   µ Au Au } Au} 2 J  2 dp µ ; ∆m q , with ∇gµ puq  A P∆ µ . We thus can employ the fast is gµ puq  2µ

Therefore, σS puq

 suptxAu, xy :

x

m

m

and simple algorithms for computing the projection onto a unit simplex, for example in [5, 6], instead of projection onto a polytope. Remark 6.2 The classical Frank-Wolfe method for solving the problem mintf pxq :

x

P Qu has the

following form xk

 xk

λk psk  xk q,

P Qu; see [11, 18]. Gilbert’s algorithm can be seen as a FrankWolfe type method in which f pxq  }x}2 and the step-size sequence λk is chosen in a special way according to (5.1). It is well-known that the FW method has slow convergence rate of Op1{k q because of where sk



1

argmintx∇f pxk q, sy : s

1 2

the so-called zig-zagging phenomenon [18, 19]. Especially when the optimal solution x does not lie in the

relative interior of Q, the FW iterate tends to zig-zag amongst the vertices that define the face containing x . Theorem 6.1 shows that, the complexity bound O

?1 lnp 1 q of NESMINO is better than the worst-case

complexity Op 1 q of Gilbert’s algorithm. Moreover, we can also reduce the expensive task of projection onto original constituent sets in NESMINO to projection onto some much simpler ones. Very recently, Garber and Hazan [13] proved that in case where the feasible set Q is strongly convex, the Frank-Wolfe method converges at an accelerated rate of Op k12 q. The step-size λk in [13, Algorithm 1] is updated by the following rule λk

ÐÝ argmin λxsk  xk , ∇f pxk qy Pr s

λ 0,1

λ2

βf }sk  xk }2 . 2

From the proof of Lemma 5.1, this is exactly the same as Gilbert’s algorithm with f pxq βf

 1.

 21 }x}2 and

22

Xiaolong Qin, Nguyen Thai An

The rest of this section is devoted to conducting some numerical examples for both NESMINO and Gilber’s algorithm. All the tests are implemented by MATLAB R2016b on a personal computer with an Intel Core i5 CPU 1.6 GHz and 4G of RAM. All of the codes can be found at https://github.com/thaian2784 Example 6.1 Let us first consider a simple example where the optimal solution is known in advance. Con-

sider the minimum norm problem associated with a polytope P in R2 whose vertices are p2, 1q, p2, 1q and

p1, 2q. The projection of the origin onto P is x  p0, 1q. Starting from p 23 , 23 q, we implement NESMINO and Gilbert’s algorithm in 104 iterations and report the result in Figure 5. The NESMINO algorithm, with a fixed value µ

 0.1 converges to the optimal solution x within 10 steps. In contrast, the approximate

values in Gilbert’s algorithm are still changing after 104 iterations. In this case, as the number of iterations is increasing, the Gilbert algorithm alternately chooses the two vertices p2, 1q and p2, 1q as support points of P and turns to be very slow when it approach the solution x ; see Figure 5.

100 Gilbert NESMINO

Iter. 1

10-5

10 100 300

10-10

1000 10000 10-15

0

5

10

15

20

25

30

35

40

45

NESMINO

Gilbert

p1.5, 1.5q p0, 1q p0, 1q p0, 1q p0, 1q p0, 1q

p1.5, 1.5q p0.0611, 1.1071q p0.0090, 1.0177q p0.0032, 1.0063q p0.0010, 1.0020q p0.0001, 1.0002q

50

Fig. 5 A comparison of NESMINO and Gilbert’s algorithm for finding the projection onto a polytope.

The above simple case can be modified to get a polytope in arbitrary n dimension to which the Gilbert

 2` 1. We generate a polytope P of m vertices according to the MATLAB syntax: A  randp`, n  1q, B  rrA; As, onesp2`, 1qs and P  rB; rrandp1, n  1q, 10ss. Polytope P in this case has all vertices belonging to the hyperplane xn  1 except the last one belonging to the hyperplane xn  10 and the projection of the origin in this case is x  rzerospn  1, 1q; 1s. For this polytope, NESMINO converges in several steps while Gilbert’s algoalgorithm usually has the zig-zagging phenomenon. Let m

rithm converges very slowly as above.

Example 6.2 We now consider the problem of computing the projection onto a sum of polytopes in hight dimension. We generate p polytopes in n dimension space, each of them has m vertices. The vertices of

the ith polytope are rows of an m  n matrix Ai that is randomly generated by the MATLAB function (2*i)*rand(m,n).

For general problem, the exact solution x to (1.1) cannot be computed analytically. Evaluating the

complexity of the algorithms based on the number of iterations requires defining a nearly optimal solution.

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

10-1

23

10-1 Gilbert NESMINO

Gilbert NESMINO

10-2

10-2

10-3

10-3

10-4

10-4

10-5

0

20

40

60

80

100

120

140

160

180

10-5

200

100

0

20

40

60

80

100

120

140

160

180

200

10-1 Gilbert NESMINO

Gilbert NESMINO

10-1 10-2 10-2

10-3 10-3 10-4

10-5

0

20

40

60

80

100

120

140

160

180

10-4

200

0

20

40

60

80

100

120

140

160

180

200

Fig. 6 Performance of NESMINO and Gilbert’s algorithm in finding the projection onto the sum of two polytopes. We use µ0 10, σ



 0.5 for all cases.

For each problem, we first run Gilbert’s algorithm in a large enough number of iterations to find such a referenced solution. (a) Let us first consider the case where p

 2. For each value of the pair pm, nq, we generated 100

different problems and track the progress of each algorithm during the iteration by computing the relative error

}xk }}x } . We implement NESMINO with the geometrically decreasing sequence µ k }x  }

smooth parameter µ, i.e., µ0

 10 and σ 

1 2

 10

 1 k 2

of

in Remark 6.1. We switch to the next smaller µ whenever

}∇fµ puk q} ¤   103 . Each step of NESMINO requires to compute the full gradient ∇fµ pv q 

p ¸



i 1

A J P∆ i

 m

Ai v µ



1 v. 2

We can reduce computation time without using the for loop by the following line of MATLAB code P1  reshapepsimplexprojpreshapepp1{muq  P  v, rs, pqq, rs, 1q where P

p1{2q  v;

 rAJ1 , . . . , AJp sJ P Rpmn and simplexproj(Y) is a fast procedure to projection each column in

Y onto the unit simplex, see [6]. Figure 6 plots the average of the relative error on a log scale at each iteration of the two methods. From this test, we can see that Gilbert’s algorithm usually decreases extremely fast at beginning iterations and turns to be slow after that. In contrast, smoothing algorithm decreases slowly at starting iterations and its improvement will be faster than that of Gilbert’s algorithm from a certain iteration k0 . The value of k0 is
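For completeness, a self-contained (if not the fastest) way to carry out this column-wise simplex projection is the classical sorting-based routine sketched below; it is our own illustrative implementation and not the faster routine of [6], but it can stand in for simplexproj in the code above.

```matlab
% Project each column of Y onto the unit simplex {x >= 0, sum(x) = 1}
% using the sorting-based method.
function X = simplexproj(Y)
    [m, N] = size(Y);
    S = sort(Y, 1, 'descend');
    CS = (cumsum(S, 1) - 1) ./ (1:m)';           % candidate thresholds per column
    rho = sum(S > CS, 1);                        % largest index with S above its threshold
    tau = CS(sub2ind([m, N], rho, 1:N));         % active threshold for each column
    X = max(Y - tau, 0);                         % clip below the threshold
end
```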

24

Xiaolong Qin, Nguyen Thai An

10-1

10-1 Gilbert NESMINO

Gilbert NESMINO

10-2

10-2

10-3 10-3 10-4 10

-4

10-5 10-5

10-6

10-6

0

100

200

300

400

500

600

700

800

900

10-7

1000

0

100

200

300

400

500

Fig. 7 Performance of NESMINO and Gilbert’s algorithm for medium scale problems. We use µ0 µ0

600

700

800

900

1000

 20, σ  0.5 for the left and and

 200, σ  0.5 for the right.

larger when the number of vertices m is larger. When m is quite small, the NESMINO scales well with the space dimensions and performs better than Gilbert’s algorithm. NESMINO is quite sensitive to the choice of the smooth parameter µ and therefore to both µ0 and σ. When the size of the problems is large, increasing the value of µ0 can significantly improve the performance of NESMINO. Figure 7 shows the results for the cases of larger p and m. We can see that NESMINO performs well at final iterations. (b) Now we consider the case with very large number of components p in the Minkowski sum. NESMINO gets stuck because its computational cost in each of its iteration turns to be very expensive when p large. An advantage of smoothing approach is that it allows us to use some stochastic methods in this case. For our minimum norm problem with large p, we apply the SAGA [9] to minimize the objective function fµ instead of fast gradient method as before. The resulting algorithm is called SAGA-NESMINO. It is known that SAGA is inspired from SAG (Stochastic Average Gradient) [32]. However, instead of using a biased gradient estimate as in SAG, SAGA use an unbiased update direction. Given x0 and

 ∇fi px0 q for i  1, . . . , p, to minimize f pxq  p1 °pi1 fi pxq, at the kth iteration, SAGA picks an index j uniformly at random from t1, . . . , pu, sets yki  ∇fj pxk1 q if i  j and yki  yki 1 otherwise and

y0i

then updates xk

 xk1  α



ykj







1¸ i y . p i1 k1 p

ykj 1

Recall that our objective function for the case of p polytopes can be written as fµ puq 

p ¸



fi puq 

i 1

p ¸



i 1



}Ai u}2  µ dp Ai u ; ∆ q2 m 2µ 2 µ

and the gradient of the ith component is ∇fi puq

AJ i P∆m





Ai u 1 u for i  1, . . . , p. Each µ 2p 1 component function fi is µ-strongly convex with L-Lipschitz continuous gradient, where µ  2p and L



1 µ i max 1,...,p



}Ai }2

1 2p .





1 }u}2 , 4p

At each iteration, SAGA-NESMINO requires to compute only one projection

instead of p projection as in NESMINO. Therefore, the computational cost of SAGA-NESMINO is much cheaper when p is large. In this test, we set the smooth parameter µ and use the constant step-size α

 0.01, take x0  0 and yoi  ∇fi px0 q for i  1, . . . , p

10  64L . To have a fair comparison between a stochastic method and a full 4

Smoothing algorithms for computing the projection onto a Minkowski sum of convex sets

100

25

100 Gilbert SAGA-NESMINO

10

Gilbert SAGA-NESMINO

-2

10

-2

10-4

10-4

10-6

10-6

10-8

10-8

10-10

10-10

10-12

0

10

20

30

40

50

60

70

80

90

100

10-12

0

10

20

30

40

50

60

70

80

90

100

Fig. 8 Comparison between SAGA-NESMINO and Gilbert’s algorithm for large scale problems.

gradient method, we identify p iterations of SAGA-NESMINO with 1 iteration of Gilbert’s algorithm. For each case of pp, m, nq, we run SAGA-NESMINO in 20 times and plot the relative error at each iteration on Figure 8. Once again, we can see that smoothing algorithm can reach a high accuracy approximation solution (for example, with relative error up to 1010 ) faster than Gilbert’s algorithm.

7 Conclusions

Minimum norm problems have been studied from both theoretical and numerical points of view in this paper. Based on a duality approach, it is shown that the projection onto a Minkowski sum of sets can be represented as the sum of points on the constituent sets such that, at these points, all of the sets share the same normal vector. By combining Nesterov's smoothing technique and his fast gradient scheme, we have developed a numerical algorithm for solving these problems. The proposed NESMINO is proved to have a better complexity bound than the worst-case complexity bound of Gilbert's algorithm. Gilbert's algorithm usually decreases slowly as it approaches the solution, and therefore it is slow in finding an approximate solution with high accuracy. In such situations, the smoothing-based method can be seen as a good alternative.

Acknowledgments This article was supported by the National Natural Science Foundation of China under Grant No. 11401152. Research of the second author was supported by the China Postdoctoral Science Foundation under Grant No. 2017M622991 and the Vietnam National Foundation for Science and Technology Development under Grant No. 101.01-2017.325.

References 1. Bergen, G.: A fast and robust GJK implementation for collision detection of convex objects, Tech. report, Department of Mathematics and Computing Science, Eindhoven University of Technology, 1999. 1


2. Borwein, J.M., Lewis, A.S.: Convex Analysis and Nonlinear Optimization: Theory and Examples, CMS books in Mathematics. Canadian Mathematical Society, 2000. 2.1 3. Cameron, S.: Enhancing GJK: Computing minimum and penetration distances between convex polyhedra, In Proceedings of International Conference on Robotics and Automation, 3112-3117, 1997. 1 4. Chang, L., Qiao, H., Wan, A., Keane, J.: An Improved Gilbert Algorithm with Rapid Convergence, Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, pp. 3861- 3866, 2006. 1 5. Chen, Y., Ye,X.: Projection onto a simplex, CoRR, abs/1208.4873 6 6. Condat, L.: Fast projection onto the simplex and the `1 ball, Math. Program., 158 (2016), 575-585. 6, 6.2 7. Dai, Y.H.: Fast algorithms for projection on an ellipsoid, SIAM J. Optim., 16 (2006), 986-1006. 6 8. Dax, A.: A new class of minimum norm duality theorems, SIAM J. Optim., 19 (2009) 1947-1969. 4 9. Defazio, A., Bach, F., Lacoste-Julien, S.: Saga: A fast incremental gradient method with support for non-strongly convex composite objectives. Advances in Neural Information Pro- cessing Systems, 2014. 6.2 10. Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the `1 -ball for learning in high dimensions, in Proceedings of the 25th ACM International Conference on Machine learning, 2008, 272-279. 6 11. Frank, M., Wolfe, P.: An algorithm for quadratic programming, Naval Res. Logis. Quart., 3 (1956), 95-110. 6.2 12. Gabidullina, Z.R.: The problem of projecting the origin of euclidean space onto the convex polyhedron, http://arxiv.org/abs/1605.05351. 1 13. Garber, D., Hazan, E.: Faster rates for the frank-wolfe method over strongly-convex sets. In ICML, pages 541?549, 2015. 6.2 14. Gilbert, E.G.: An iterative procedure for computing the minimum of a quadratic form on a convex set, SIAM J. Contr., 4 (1966), 61-80. 1, 5 15. Gilbert, E.G., Johnson, D.W., Keerthi, S.S.: A fast procedure for computing the distance between complex objects in threedimensional space, IEEE Trans. Robot. Autom., 4 (1988), 193-203. 1 16. Gilbert, E.G., Foo, C.-P.: Computing the distance between general convex objects in three-dimensional space, IEEE Trans. Robot. Autom. 6 (1990), 53-61. 1 17. Hiriart-Urruty, J.B., Lemar´echal, C.: Convex Analysis and Minimization Algorithms, I and II, Grundlehren Math. Wiss. 305 and 306. Springer-Verlag, Berlin, 1993. 2 18. Jaggi, M.: Revisiting Frank-Wolfe: Projection-Free Sparse Convex Optimization, In ICML (1), pp. 427-435, 2013. 6.2 19. Lacoste-Julien, S., Jaggi, M.: On the global linear convergence of Frank-Wolfe optimization variants. In Advances in Neural Information Processing Systems, pages 496?504, 2015. 6.2 20. Keerthi, S.S., Shevade, S.K., Bhattacharyya, C., Murthy, K.R.K.: A fast iterative nearest point algorithm for support vector machine classifier design, IEEE Trans. Neural Netw., 11 (2000), 124-136. 1 21. Kurzhanskiy, A.A., Varaiya, P.: Ellipsoidal Toolbox, Tech. Report EECS-2006-46, EECS, UC Berkeley, 2006. 3 22. Luenberger, D.G.: Optimization by Vector Spaces Method, John Wiley and Sons, Inc., New York, 1969. 4 23. Martin, S.: Training support vector machines using Gilbert’s algorithm, The 5th IEEE International Conference on Data Mining (ICDM), 2005, 306-313. 1


24. Mitchell, B.F., Demyanov, V.F., Malozemov, V.N.: Finding the point of a polyhedron closest to the origin, SIAM J. Control Optim., 12 (1974), 19-26. 1 25. Mordukhovich, B.S., Nam, N.M.: Limiting subgradients of minimal time functions in Banach spaces, J. Global Optim., 46 (2010), 615-633. 2 26. Nam, N.M., An, N.T., Rector, R.B., Sun,J.: Nonsmooth algorithms and Nesterov smoothing technique for generalized FermatTorricelli problems, SIAM J. Optim., 24 (2014), No. 4, 1815-1839. 4 27. Nesterov, Y.: Smooth minimization of non-smooth functions, Math. Program. 103 (2005), 127-152. 1, 3, 3 28. Nesterov, Y.: Introductory lectures on convex optimization. A basic course, Appl. Optim. 87, Kluwer Academic Publishers, Boston, MA, 2004. 1, 3, 3 29. Nesterov, Y.: A method for unconstrained convex minimization problem with the rate of convergence O(

1 ), Dokl. Akad. Nauk k2

SSSR, 269 (1983), 543-547. 3 30. Nirenberg, L.: Functional Analysis, Academic Press, New York, 1961. 4 31. Rockafellar, R.T.: Convex Analysis, Princeton University Press, Princeton, NJ, 1970. 6 32. Schmidt, M., Le Roux, N., Bach, F.: Minimizing finite sums with the stochastic average gradient. Technical report, INRIA, hal-0086005, 2013. 6.2 33. Tuy, H.: Convex Analysis and Global Optimization. Nonconvex Optimization and Its Applications, Kluwer Academic Publishers, 1998. 2 34. Wolfe, P.: Finding the nearest point in a polytope, Math. Programm., 11 (1976), 128-149. 1