PIECEWISE LINEAR APPROXIMATIONS IN NONCONVEX NONSMOOTH OPTIMIZATION*

M. GAUDIOSO†, E. GORGONE†, AND M.F. MONACO†

Abstract. We present a bundle type method for minimizing nonconvex nondifferentiable functions of several variables. The algorithm is based on the construction of both a lower and an upper polyhedral approximation of the objective function. In particular, at each iteration, a search direction is computed by solving a quadratic program aiming at maximizing the difference between the lower and the upper model. A proximal approach is used to guarantee convergence to a stationary point under the hypothesis of weak semismoothness.

Key words. Nonsmooth optimization, cutting planes, bundle methods.

AMS subject classifications. 90C26, 65K05

* This research has been partially supported by the Italian "Ministero dell'Istruzione, dell'Università e della Ricerca Scientifica", under the PRIN project "Numerical methods for global optimization and for some classes of nonsmooth optimization problems" (2005017083 002).
† Dipartimento di Elettronica Informatica e Sistemistica, Università della Calabria, 87036 Rende (CS), Italia. E-mail: [email protected], [email protected], [email protected]

1. Introduction. Nonsmooth optimization deals with the problem of finding the minima of real functions of several variables in the absence of differentiability hypotheses. Historically, nonsmooth optimization was first aimed at treating convex functions, under the impulse provided by the development of convex analysis [26]. Cutting plane methods [3, 15], the subgradient method [5], the space dilatation approach [28], and bundle methods [18, 30] were the most important proposals in the area. In more recent years convex nonsmooth optimization has grown considerably and many variants of the basic approaches have been developed. Special classes of convex problems have been tackled, in areas such as eigenvalue functions [13], variational inequalities [17, 20], and network optimization [10]. Minmax problems, a typical class of nonsmooth problems, have been dealt with by exploiting their peculiar structure [25, 9].

Nonsmooth nonconvex problems have often been treated by adapting algorithms primarily designed for the convex case (see, e.g., [23] and [27]), while several proposals, conceived for dealing with both nonsmoothness and nonconvexity, have recently appeared in the literature (see [29], [2] and [12]). In addition, new fields of application of nonconvex nonsmooth optimization have come into focus, as in [1]. The bundle approach, primarily devised for convex minimization, has been specialized in [7, 8] to the nonconvex case by splitting the bundle into two subsets related to the points that exhibit, respectively, some kind of "convex" or "nonconvex" behavior.

In this paper the bundle splitting idea is still present, giving rise to two models of the objective function (one convex and the other concave) which are both piecewise affine.


The two models are combined in a unique quadratic subproblem embedding a typical proximity measure, whose solution provides a tentative displacement with respect to the current point (the stability center) in a descent procedure. Termination of the algorithm at an approximate optimal solution is proved under the hypothesis that the function is locally Lipschitz and weakly semismooth.

The paper is organized as follows. In section 2 the bundle-splitting model is presented. The algorithm is discussed in section 3, while the convergence is proved in section 4. Finally, some numerical results are reported in section 5.

The following notations are adopted throughout the paper. We denote by $\|\cdot\|$ the Euclidean norm in $\mathbb{R}^n$, by $a^T b$ the standard inner product of the vectors $a$ and $b$, and by $e$ a vector of ones of appropriate dimension. The generalized gradient of a Lipschitz function $f : \mathbb{R}^n \to \mathbb{R}$ at any point $x$ is denoted by $\partial f(x)$.

2. The model. Consider the following unconstrained minimization problem:
\[
  \min_{x \in \mathbb{R}^n} f(x),
\]

where $f : \mathbb{R}^n \to \mathbb{R}$ is not necessarily differentiable. We assume that $f$ is weakly semismooth and, consequently, locally Lipschitz, i.e. Lipschitz on every bounded set. Since $f$ is locally Lipschitz, it is differentiable almost everywhere. It is well known [4] that, under the above hypotheses, the generalized gradient (or Clarke gradient, or subdifferential)
\[
  \partial f(x) = \operatorname{conv}\{g \mid g \in \mathbb{R}^n,\ \nabla f(x_k) \to g,\ x_k \to x,\ x_k \notin \Omega_f\}
\]
is defined at each point $x$, $\Omega_f$ being the set (of zero measure) where $f$ is not differentiable. An extension of the generalized gradient is the Goldstein $\epsilon$-subdifferential $\partial_\epsilon^G f(x)$, defined as
\[
  \partial_\epsilon^G f(x) = \operatorname{conv}\{\partial f(y) \mid \|y - x\| \le \epsilon\}.
\]
We assume also that we are able to calculate at each point $x$ both the objective function value and a subgradient $g \in \partial f(x)$, i.e. an element of the generalized gradient. We recall that a function $f : \mathbb{R}^n \to \mathbb{R}$ is weakly semismooth at $x$ (see [19, 24, 27]) if it is Lipschitz around $x$ and the limit
\[
  \lim_{t \downarrow 0} g(t)^T d
\]
exists for all $d \in \mathbb{R}^n$, where $g(t) \in \partial f(x + t d)$. In particular, if $f$ is weakly semismooth at $x$, the directional derivative $f'(x, d)$ of $f$ along the direction $d$ exists for all $d \in \mathbb{R}^n$ and
\[
  f'(x, d) = \lim_{t \downarrow 0} g(t)^T d.
\]
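As a concrete illustration of the oracle just assumed (the function value together with one element of the generalized gradient at any query point), the following minimal Python sketch implements such an oracle for a simple nonconvex nonsmooth function; the example function and the helper name are ours, introduced only for illustration, and are not taken from the paper.

import numpy as np

def oracle(x):
    """f(x) = |x[0]| - |x[1]|: a simple nonconvex, nonsmooth (difference-of-convex) example.

    Returns f(x) and one element of the Clarke generalized gradient.  At a kink
    (a zero component) any convention yields a valid subgradient; here sign(0)
    is treated as +1.
    """
    s = np.where(x >= 0.0, 1.0, -1.0)        # componentwise sign with sign(0) := +1
    g = np.array([s[0], -s[1]])              # d/dx1 |x1| = s[0],  d/dx2 (-|x2|) = -s[1]
    return abs(x[0]) - abs(x[1]), g

# usage: value and one subgradient at a point where f happens to be differentiable
val, g = oracle(np.array([0.5, -0.2]))       # val = 0.3, g = [1., 1.]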


Moreover, $f$ is weakly semismooth on $\mathbb{R}^n$ if it is weakly semismooth at each $x \in \mathbb{R}^n$.

Now we describe our model, focusing on some differences with respect to the standard bundle methods valid for the convex case. We denote by $x_j$ the current estimate of the minimum (i.e. the "stability center") in an iterative procedure and by $g_j$ any subgradient of $f$ at $x_j$. In the sequel we will assume that an initial estimate of the minimum, say $x_0$, is available and that at least the stability center $x_j$ belongs to the level set defined by $x_0$, i.e. $x_j \in \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}$.

The bundle of available information is the set of elements
\[
  (x_i, f(x_i), g_i, \alpha_i, a_i), \qquad i \in I \triangleq \{1, \ldots, j\},
\]
where the $x_i$, $i \in I$, are the points previously generated during the procedure, $g_i$ is a subgradient of $f$ at $x_i$, and $\alpha_i$ is the linearization error between the actual value of the objective function at $x_j$ and the linear expansion generated at $x_i$ and evaluated at $x_j$, i.e.
\[
  \alpha_i \triangleq f(x_j) - f(x_i) - g_i^T (x_j - x_i),
\]
and
\[
  a_i \triangleq \|x_j - x_i\|.
\]
The classical cutting plane method [3, 15] minimizes at each iteration the cutting plane function $f_j(x)$ defined as
\[
  f_j(x) \triangleq \max_{i \in I} \left\{ f(x_i) + g_i^T (x - x_i) \right\}.
\]
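Since the linearization errors $\alpha_i$ and the distances $a_i$ are measured with respect to the stability center, they have to be recomputed whenever the center moves. A minimal sketch of that bookkeeping follows; the function name and the data layout are ours, introduced only for illustration.

import numpy as np

def bundle_info(x_stab, f_stab, points, f_vals, subgrads):
    """Compute alpha_i and a_i for every bundle element w.r.t. the stability center x_stab.

    points, f_vals, subgrads: the previously generated x_i, f(x_i) and g_i in del f(x_i).
    """
    alphas, dists = [], []
    for x_i, f_i, g_i in zip(points, f_vals, subgrads):
        alphas.append(f_stab - f_i - g_i @ (x_stab - x_i))   # linearization error at x_stab
        dists.append(np.linalg.norm(x_stab - x_i))           # distance from the stability center
    return np.array(alphas), np.array(dists)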

It is worth noting that in the nonconvex case $\alpha_i$ may be negative, since the first order expansion at any point does not necessarily support from below the epigraph of the function. Thus we partition the set $I$ into two sets $I^+$ and $I^-$, defined as
\[
  (2.1) \qquad I^+ \triangleq \{i \mid \alpha_i \ge -\sigma\} \quad \text{and} \quad I^- \triangleq \{i \mid \alpha_i < -\sigma\},
\]
for some $\sigma > 0$. The bundles corresponding to the index sets $I^+$ and $I^-$ are characterized by points that exhibit, respectively, a "convex behavior" and a "concave behavior" relative to $x_j$. We observe that $I^+$ is never empty, as at least the element $(x_j, f(x_j), g_j, 0, 0)$ belongs to the bundle.

Let $h(d) \triangleq f(x_j + d) - f(x_j)$ be the difference function. The basic idea of our approach is to construct two piecewise affine models of $h$, using separately the two bundles.


In particular, setting $\alpha_i = \max\{\alpha_i, 0\}$ for all $i \in I^+$, we define the two piecewise affine functions
\[
  \Delta^+(d) \triangleq \max_{i \in I^+} \left\{ g_i^T d - \alpha_i \right\}
\]
and
\[
  \Delta^-(d) \triangleq \min\left\{ 0,\ \min_{i \in I^-} \{ g_i^T d - \alpha_i \} \right\},
\]
which are convex and concave, respectively. We remark that $\Delta^-(d)$ is equal to zero around $d = 0$ and $\Delta^+(0) = \Delta^-(0) = h(0)$.

Our approach, at the current point $x_j$, consists in finding a tentative displacement by solving the following problem:
\[
  (2.2) \qquad \min_d \ \frac{1}{2}\rho\, d^T d + \Delta(d),
\]
where $\Delta \triangleq \Delta^+ - \Delta^-$ and $\rho > 0$ is the "proximity parameter", introduced for both stabilization and well-posedness purposes. Observe that at the minimum of (2.2) both $\Delta^+$ and $\Delta^-$ are nonpositive, thus they are concordant in predicting non-increase in the objective function. On the other hand, since we minimize the difference $\Delta^+ - \Delta^-$, the model tends to locate the new "sample point" in an area where the disagreement of the two models is maximal. Problem (2.2) is equivalent to
\[
  QP(\rho) \qquad
  \begin{array}{rl}
    z_\rho^* = \displaystyle\min_{v,w,d} & \dfrac{1}{2}\rho\|d\|^2 + v - w \\[4pt]
    & v \ge g_i^T d - \alpha_i, \quad i \in I^+ \\[2pt]
    & w \le g_i^T d - \alpha_i, \quad i \in I^- \\[2pt]
    & w \le 0,
  \end{array}
\]
whose dual is

\[
  DP(\rho) \qquad
  \begin{array}{rl}
    \zeta_\rho^* = \displaystyle\min_{\lambda \ge 0,\, \mu \ge 0} & \dfrac{1}{2\rho}\|G_+\lambda - G_-\mu\|^2 + \alpha_+^T\lambda - \alpha_-^T\mu \\[4pt]
    & e^T\lambda = 1; \qquad e^T\mu \le 1,
  \end{array}
\]
where $G_+$ and $G_-$ are matrices whose columns are, respectively, the vectors $g_i$, $i \in I^+$, and $g_i$, $i \in I^-$. Analogously, the terms $\alpha_i$, $i \in I^+$, and $\alpha_i$, $i \in I^-$, are grouped into the vectors of appropriate dimension $\alpha_+$ and $\alpha_-$, respectively.
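To make the computation of the search direction concrete, here is a minimal sketch that assembles $QP(\rho)$ from the two bundles and solves it with an off-the-shelf convex modeling tool (cvxpy), reading the multipliers of the cuts as an approximation of the $DP(\rho)$ solution. The paper itself solves the dual with the IMSL routine DQPROG; the function name and data layout below are ours and are only an illustration.

import numpy as np
import cvxpy as cp

def solve_qp(rho, bundle_plus, bundle_minus):
    """Solve QP(rho); bundle_plus / bundle_minus are lists of triples (g_i, alpha_i, a_i)."""
    n = bundle_plus[0][0].size
    d, v, w = cp.Variable(n), cp.Variable(), cp.Variable()
    # cuts of the convex model Delta+ (with alpha_i replaced by max(alpha_i, 0))
    cuts_plus = [v >= g @ d - max(alpha, 0.0) for g, alpha, _ in bundle_plus]
    # cuts of the concave model Delta-
    cuts_minus = [w <= g @ d - alpha for g, alpha, _ in bundle_minus]
    prob = cp.Problem(cp.Minimize(0.5 * rho * cp.sum_squares(d) + v - w),
                      cuts_plus + cuts_minus + [w <= 0])
    prob.solve()
    lam = np.array([c.dual_value for c in cuts_plus])
    mu = np.array([c.dual_value for c in cuts_minus]) if cuts_minus else np.zeros(0)
    return d.value, v.value, w.value, lam, mu

Returning the multipliers as well is convenient because, in the algorithm of the next section, steps 2 and 5 inspect $\lambda^*$ and $\mu^*$.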


Denoting by $(v_\rho^*, w_\rho^*, d_\rho^*)$ and $(\lambda_\rho^*, \mu_\rho^*)$ the optimal solutions of $QP(\rho)$ and $DP(\rho)$ respectively, the following primal-dual relations hold:
\[
  (2.3a) \qquad d_\rho^* = \frac{1}{\rho}\left( -G_+\lambda_\rho^* + G_-\mu_\rho^* \right),
\]
\[
  (2.3b) \qquad v_\rho^* = (G_+\lambda_\rho^*)^T d_\rho^* - \alpha_+^T\lambda_\rho^*,
\]
\[
  (2.3c) \qquad w_\rho^* = (G_-\mu_\rho^*)^T d_\rho^* - \alpha_-^T\mu_\rho^*.
\]
We observe that, since the triple $(v, w, d) = (0, 0, 0)$ is feasible for $QP(\rho)$, $z_\rho^* \le 0$. Consequently, $v_\rho^* \le -\frac{1}{2}\rho\|d_\rho^*\|^2 + w_\rho^* \le 0$ and $v_\rho^* - w_\rho^* \le 0$.

Assuming that the set
\[
  F_0 \triangleq \{x \in \mathbb{R}^n \mid f(x) \le f(x_0)\}
\]
is compact, we indicate by $L_0$ a Lipschitz constant of $f$ on $F_0$ and by $L_\rho$ a Lipschitz constant of $f$ on
\[
  F_\rho = \left\{ x \in \mathbb{R}^n \ \Big|\ \operatorname{dist}(x, F_0) \le \frac{2L_0}{\rho} \right\}.
\]
Before giving a formal description of the algorithm, we state some simple properties of problem $QP(\rho)$, assuming that $\rho$ is not smaller than a fixed positive threshold $\rho_{\min}$.

Lemma 2.1. Let $\delta > 0$ and $v_\rho^* - w_\rho^* \ge -\delta$; then
\[
  \frac{1}{\rho}\|G_+\lambda_\rho^* - G_-\mu_\rho^*\|^2 \le \delta.
\]
Proof. The property follows from (2.3) and from, respectively, the nonnegativity of $\alpha_+$ and the negativity of $\alpha_-$. In fact we have
\[
  -\delta \le v_\rho^* - w_\rho^* = -\frac{1}{\rho}\|G_+\lambda_\rho^* - G_-\mu_\rho^*\|^2 - \alpha_+^T\lambda_\rho^* + \alpha_-^T\mu_\rho^*
  \le -\frac{1}{\rho}\|G_+\lambda_\rho^* - G_-\mu_\rho^*\|^2.
\]
Lemma 2.2. For any $\rho > 0$ the following inequality holds:
\[
  \|d_\rho^*\| \le \frac{2L_0}{\rho}.
\]


Proof. Since $\frac{1}{2}\rho\|d_\rho^*\|^2 + v_\rho^* - w_\rho^* \le 0$, we have
\[
  v_\rho^* + \frac{1}{2}\rho\|d_\rho^*\|^2 \le w_\rho^* \le 0.
\]
Hence, taking into account that
\[
  0 \ge v_\rho^* + \frac{1}{2}\rho\|d_\rho^*\|^2 \ge g_j^T d_\rho^* + \frac{1}{2}\rho\|d_\rho^*\|^2 \ge -\|g_j\|\,\|d_\rho^*\| + \frac{1}{2}\rho\|d_\rho^*\|^2,
\]
the thesis follows by considering that $\|g_j\| \le L_0$, as by hypothesis $x_j$ belongs to $F_0$.

Lemma 2.3. Let $\hat\alpha_\rho \triangleq f(x_j) - f(x_j + d_\rho^*) + \hat g^T d_\rho^*$, with $\hat g \in \partial f(x_j + d_\rho^*)$. For $\sigma > 0$ there exists a threshold value $\bar\rho(\sigma)$ such that $|\hat\alpha_\rho| \le \sigma$ for all $\rho \ge \bar\rho(\sigma)$.

Proof. By Lemma 2.2 we have, for all $\rho > 0$, $\|d_\rho^*\| \le \dfrac{2L_0}{\rho}$. Since $x_j + d_\rho^* \in F_\rho$, taking into account $\rho \ge \rho_{\min}$, we have
\[
  |\hat\alpha_\rho| \le |f(x_j) - f(x_j + d_\rho^*)| + |\hat g^T d_\rho^*| \le 2L_{\rho_{\min}}\|d_\rho^*\| \le \frac{4L_{\rho_{\min}}L_0}{\rho}.
\]
Consequently, for the threshold value we have
\[
  \bar\rho(\sigma) = \frac{4L_{\rho_{\min}}L_0}{\sigma}.
\]

3. The algorithm. In this section we describe our algorithm, based on repeatedly solving problem $QP(\rho)$, or equivalently $DP(\rho)$. The core of the algorithm is the "main iteration", i.e. the set of steps where the stability center remains unchanged. Two exits from the "main iteration" may occur:
(i) termination of the whole algorithm, due to satisfaction of an approximate stationarity condition;
(ii) update of the stability center, due to satisfaction of a sufficient decrease condition.
The initialization of the algorithm requires a starting point $x_0 \in \mathbb{R}^n$. The initial stability center is set equal to $x_0$. The initial bundle consists of just one element $(x_0, f(x_0), g_0, 0, 0)$, where $g_0 \in \partial f(x_0)$. Consequently $I^-$ is the empty set, while $I^+$ is a singleton.
The following global parameters are to be set:
• the descent parameter $m_1 \in (0, 1)$;
• the concave cut parameter $m_2 > 1$;
• the lower threshold on the proximity parameter $\rho_{\min} > 0$;
• the increase parameter $R > 1$;
• the distance parameter $\epsilon > 0$;
• the linearization error threshold parameter $\sigma > 0$;
• the stopping parameter $\eta > 0$.


The algorithm can be summarized as follows.

Algorithm Outline
1. Initialization.
2. "Main iteration".
3. Bundle updating and return to step 2.

To simplify the notation, in the sequel we drop the subscript $\rho$ for all quantities defined in the previous section. We remark that in general the "main iteration" maintains the (updated) bundle of information from previous iterations. Updating the bundle is necessary since the quantities $\alpha_i$ and $a_i$ depend on the stability center.

Algorithm 3.1 (Main Iteration).
Step 0. Select $\rho \ge \rho_{\min}$.
Step 1. Solve $QP(\rho)$ (or $DP(\rho)$) and determine $(d^*, v^*, w^*, \lambda^*, \mu^*)$. Set $\delta = \dfrac{\eta^2}{\rho}$. If $v^* - w^* < -\delta$ go to step 3.
Step 2. Two cases can occur:
Case a): $\mu^* = 0$. If $a_i \le \epsilon$ for all $i \in I^+$ such that $\lambda_i^* > 0$ then stop, else delete from $I^+$ all the $i$'s such that $a_i > \epsilon$, set $\rho = R\rho$ and return to step 1.
Case b): $\mu^* \ne 0$. Reset $I^-$. Set $\rho = R\rho$ and return to step 1.
Step 3. Evaluate $f(x_j + d^*)$. If $f(x_j + d^*) - f(x_j) \le m_1 v^*$, then exit from the main iteration (stability center update).
Step 4. Calculate $\hat g \in \partial f(x_j + d^*)$ and $\hat\alpha = f(x_j) - f(x_j + d^*) + \hat g^T d^*$.
Step 5. Three cases can occur:
Case a): $\hat\alpha \ge 0$. Update the bundle by introducing a new element in $I^+$, and return to step 1.


Case b): $-\sigma \le \hat\alpha < 0$. Set $\hat\alpha = 0$ and find a scalar $t \in (0, 1]$ such that $g(t) \in \partial f(x_j + t d^*)$ satisfies the condition $g(t)^T d^* \ge m_1 v^*$. Update the bundle by introducing a new element in $I^+$, and return to step 1.
Case c): $\hat\alpha < -\sigma$. If $\hat g^T d^* - \hat\alpha \le m_2 w^*$, update the bundle by introducing a new element in $I^-$. Set $\rho = R\rho$ and return to step 1.

Before discussing the convergence, the following remarks are in order.

Remark 3.2. We observe that, from the definition of $\hat\alpha$, whenever cases a) and b) at step 5 of the main iteration occur, the condition $\hat g^T d^* - \hat\alpha > m_1 v^*$ holds. On the other hand, whenever case b) occurs, since $\hat\alpha$ is set equal to zero, the condition $\hat g^T d^* - \hat\alpha > m_1 v^*$ holds as well. We observe that the problem of finding the scalar $t$ at case b) is well posed. In fact, since the directional derivative $f'(x_j + t d^*, d^*)$ exists for any $t \ge 0$, from the mean value theorem (see [6], Chap. 3, Prop. 3.1) it follows that
\[
  (3.1) \qquad f(x_j + d^*) - f(x_j) = c
\]
for some $c \in [f'_{\inf}, f'_{\sup}]$, where
\[
  -\infty < f'_{\inf} \triangleq \inf_{0 \le t \le 1} f'(x_j + t d^*, d^*)
\]
and
\[
  f'_{\sup} \triangleq \sup_{0 \le t \le 1} f'(x_j + t d^*, d^*) < \infty.
\]
Moreover, taking into account that the sufficient decrease condition is not satisfied, i.e.
\[
  f(x_j + d^*) - f(x_j) > m_1 v^*,
\]
by (3.1) and the definition of $f'_{\sup}$ there exists a scalar $\bar t \in (0, 1]$ such that
\[
  f'(x_j + \bar t d^*, d^*) > m_1 v^*
\]
and, by the weak semismoothness assumption, we have
\[
  \lim_{t \downarrow \bar t} g(t)^T d^* > m_1 v^*,
\]
where $g(t) \in \partial f(x_j + t d^*)$. Consequently the inequality $g(t)^T d^* > m_1 v^*$ holds in some interval $(\bar t, \hat t)$.

Remark 3.3. When the stopping criterion at step 2 is met, the following conditions hold:
\[
  (3.2) \qquad v^* - w^* \ge -\delta;
\]
\[
  (3.3) \qquad \mu^* = 0;
\]
\[
  (3.4) \qquad a_i \le \epsilon \quad \forall\, i : \lambda_i^* > 0.
\]


Hence, by equations (2.3), we have
\[
  -\delta \le v^* - w^* = -\frac{1}{\rho}\|G_+\lambda^* - G_-\mu^*\|^2 - \alpha_+^T\lambda^* + \alpha_-^T\mu^*.
\]
From (3.3) and the nonnegativity of $\alpha_+$, we have
\[
  (3.5) \qquad \frac{1}{\rho}\|G_+\lambda^*\|^2 \le \delta,
\]
which in turn, taking into account (3.4) and $\delta \le \dfrac{\eta^2}{\rho}$, implies both $g \triangleq G_+\lambda^* \in \partial_\epsilon^G f(x_j)$ and $\|g\| \le \eta$.
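To summarize the control flow described above, here is an illustrative sketch of one "main iteration". It reuses the hypothetical helpers oracle and solve_qp from the earlier sketches (the data layouts, names and default parameter values are ours, not the paper's), and it replaces the line search of case b) with a crude halving loop, so it is a reading aid rather than a faithful implementation.

import numpy as np

def main_iteration(x, oracle, bundle_plus, bundle_minus, solve_qp, rho,
                   rho_min=1e-2, R=6.0, m1=0.2, m2=1.2, eps=1e-2, sigma=1e-2, eta=1e-4):
    """One pass with a fixed stability center x; returns ('stationary' | 'descent', point, rho).

    bundle_plus / bundle_minus: lists of (g_i, alpha_i, a_i), ordered consistently
    with the multipliers returned by solve_qp.
    """
    fx, _ = oracle(x)
    rho = max(rho, rho_min)                                          # Step 0
    while True:
        d, v, w, lam, mu = solve_qp(rho, bundle_plus, bundle_minus)  # Step 1
        delta = eta ** 2 / rho
        if v - w >= -delta:                                          # Step 2
            if mu.size == 0 or np.allclose(mu, 0.0):                 # case a): mu* = 0
                if all(a <= eps for k, (_, _, a) in enumerate(bundle_plus) if lam[k] > 0.0):
                    return "stationary", x, rho                      # approximate stationarity
                bundle_plus[:] = [e for e in bundle_plus if e[2] <= eps]
                rho *= R
                continue
            bundle_minus.clear()                                     # case b): reset I-
            rho *= R
            continue
        f_trial, g_hat = oracle(x + d)                               # Step 3
        if f_trial - fx <= m1 * v:
            return "descent", x + d, rho                             # stability center update
        alpha_hat = fx - f_trial + g_hat @ d                         # Step 4
        if alpha_hat >= 0.0:                                         # Step 5, case a)
            bundle_plus.append((g_hat, alpha_hat, np.linalg.norm(d)))
        elif alpha_hat >= -sigma:                                    # case b): set alpha_hat = 0 and
            t, g_t = 1.0, g_hat                                      # look for t with g(t)^T d >= m1 v
            while g_t @ d < m1 * v and t > 1e-8:                     # (crude stand-in for the line search)
                t *= 0.5
                _, g_t = oracle(x + t * d)
            bundle_plus.append((g_t, 0.0, t * np.linalg.norm(d)))
        else:                                                        # case c): alpha_hat < -sigma
            if g_hat @ d - alpha_hat <= m2 * w:
                bundle_minus.append((g_hat, alpha_hat, np.linalg.norm(d)))
            rho *= R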

4. Convergence. In this section we prove the termination of the algorithm at a point satisfying an approximate stationarity condition. In particular we prove that, for any given $\epsilon > 0$ and $\eta > 0$, it is possible to set the input parameters so that, after a finite number of "main iteration" executions, the algorithm stops at a point $x_j$ satisfying the condition $\|g\| \le \eta$, with $g \in \partial_\epsilon^G f(x_j)$.

In the proofs, for simplicity of notation, we drop the superscript "$*$"; in addition, by $x_j$ we still indicate the current stability center. We start by proving the following.

Lemma 4.1. The "main iteration" algorithm cannot loop infinitely many times without entering step 2.

Proof. Suppose that the algorithm loops infinitely many times (i.e. the descent test at step 3 is never satisfied) without entering step 2. We index by $k$ all the quantities referred to the $k$-th passage through steps 1-5. We observe that case c) at step 5 cannot occur infinitely many times. In fact, whenever $\hat\alpha_k < -\sigma$, the parameter $\rho_k$ is increased and consequently there exists an index $\hat k$ such that $\hat\alpha_k \ge -\sigma$ for all $k > \hat k$ (see Lemma 2.3). Thus only cases a) or b) can occur infinitely many times and, taking into account Remark 3.2 at the end of the previous section, the condition $d_k^T \hat g_k - \hat\alpha_k > m_1 v_k$ is met infinitely many times too. Letting $\bar k$ index the last passage through case c) at step 5, we note that $\rho_k$ remains constant for all $k \ge \bar k$. Consequently the sequence $\{z_k\}$, for $k \ge \bar k$, is monotonically nondecreasing, bounded from above, and hence convergent. Moreover, the condition $\rho_k \ge \rho_{\min}$ implies, by Lemma 2.2, $\|d_k\| \le \dfrac{2L_0}{\rho_{\min}}$, hence $\{d_k\}$ belongs to a compact set and there exists a convergent subsequence, say $\{d_k\}_{k \in K'}$. Thus the subsequence $\{v_k - w_k\}_{k \in K'}$ is convergent. Furthermore, we have
\[
  -\delta > v_k \ge g_j^T d_k \ge -\frac{2L_0^2}{\rho_{\min}},
\]


hence $\{v_k\}_{k \in K'}$ is bounded and consequently there exist two convergent subsequences $\{v_k\}_{k \in K'' \subset K'}$ and $\{w_k\}_{k \in K'' \subset K'}$. Now let $i$ and $s$ be two successive indices in $K''$ and let $\lim_{k \in K''} v_k = \bar v$; then we have
\[
  d_i^T \hat g_i - \hat\alpha_i \ge m_1 v_i, \qquad d_s^T \hat g_i - \hat\alpha_i \le v_s,
\]
that is, $v_s - m_1 v_i \ge (d_s - d_i)^T \hat g_i$, which implies $\bar v \ge 0$ (passing to the limit, since $d_s - d_i \to 0$ along $K''$ and the subgradients $\hat g_i$ are bounded). Observe that $\bar v \ge 0$ contradicts the hypothesis that the algorithm never enters step 2. In fact, in this case we would have
\[
  v_k < w_k - \delta < -\delta \quad \text{for all } k,
\]
which, taking into account $v_k \to \bar v$, would imply $\bar v \le -\delta$.

Now we can prove finite termination of the "main iteration".

Lemma 4.2. If the "main iteration" does not terminate with satisfaction of the sufficient decrease condition at step 3, then the stopping condition is met after a finite number of passages through step 2.

Proof. Assume that the sufficient decrease condition at step 3 of the "main iteration" is never satisfied. If we assume also that the stopping condition is never satisfied, by Lemma 4.1 we have that step 2 is entered infinitely many times and the parameter $\rho$ grows indefinitely. Let $\tilde k$ be an index such that $\rho_{\tilde k} \ge \dfrac{2L_0}{\epsilon}$ and, consequently, all points newly generated by the algorithm for $k \ge \tilde k$ are characterized by a distance from the stability center $a_i \le \epsilon$ (Lemma 2.2). We observe also (see the proof of Lemma 4.1) that, for sufficiently large values of $\rho$, only modifications of $I^+$ can occur. Thus, taking into account the reset of $I^-$ and the deletion of elements of $I^+$, for sufficiently large values of $\rho$ we have $I^- = \emptyset$ and $a_i \le \epsilon$ for all $i \in I^+$. Thus the stopping condition is met after a finite number of passages through step 2.

Remark 4.3. The proofs of the previous lemmas ensure also that the value of the proximity parameter $\rho$ cannot become arbitrarily large.

Now we are ready to prove the overall finiteness of the algorithm.

Theorem 4.4. For any $\epsilon > 0$ and $\eta > 0$, the algorithm stops in a finite number of "main iterations" at a stability center $x_j$ satisfying the approximate stationarity condition
\[
  (4.1) \qquad \|g\| \le \eta, \quad \text{with } g \in \partial_\epsilon^G f(x_j).
\]
Proof. We prove, by contradiction, that the stopping criterion is satisfied in a finite number of "main iterations".


Suppose, in fact, that an infinite number of main iterations occurs. Then the descent condition at step 3 is verified infinitely many times. Let $x_j^{(k)}$ be the stability center at the $k$-th main iteration and $v^{(k)}$ and $\delta^{(k)}$ be, respectively, the values of $v^*$ and $\delta$ for which the descent condition at step 3 has been fulfilled. Then
\[
  f(x_j^{(k+1)}) \le f(x_j^{(k)}) + m_1 v^{(k)}
\]
and, after $k$ main iterations,
\[
  f(x_j^{(k)}) \le f(x_j^{(0)}) - k m_1 \delta^{(k)} \le f(x_j^{(0)}) - k m_1 \frac{\eta^2}{\rho_{\max}},
\]
where $\rho_{\max}$ is any upper bound on the proximity parameter $\rho$ (see Remark 4.3). Passing to the limit we have
\[
  \lim_{k \to \infty} f(x_j^{(k)}) - f(x_j^{(0)}) \le -\infty,
\]
which is a contradiction, since $f$ is bounded from below by hypothesis.

5. Implementation and numerical results. It is worth noting that the algorithm described in section 3 can produce a bundle whose size grows indefinitely. Thus, to make the method implementable, it is important to introduce bounded storage for the bundle. Of course it is necessary as well to show that the convergence properties proved in section 4 are retained under such a hypothesis. To tackle the problem we introduce an aggregation scheme (see Kiwiel [16]) widely used in bundle methods [14]. In particular, let $(d^*, v^*, w^*)$ and $(\lambda^*, \mu^*)$ be, respectively, the solutions of $QP(\rho)$ and $DP(\rho)$ at step 1 of the "main iteration". If we define the aggregate quantities
\[
  g_p \triangleq G_+\lambda^*, \qquad \alpha_p \triangleq \alpha_+^T\lambda^*
\]
and, in case $\mu^* \ne 0$,
\[
  g_m \triangleq G_-\mu^*, \qquad \alpha_m \triangleq \alpha_-^T\mu^*,
\]
it is easy to verify that the aggregate problem
\[
  QP^a(\rho) \qquad
  \begin{array}{rl}
    \displaystyle\min_{v,w,d} & \dfrac{1}{2}\rho\|d\|^2 + v - w \\[4pt]
    & v \ge g_p^T d - \alpha_p \\[2pt]
    & v \ge g_i^T d - \alpha_i, \quad i \in \bar I^+ \\[2pt]
    & w \le g_m^T d - \alpha_m \\[2pt]
    & w \le g_i^T d - \alpha_i, \quad i \in \bar I^- \\[2pt]
    & w \le 0
  \end{array}
\]
has the same optimal solution $(d^*, v^*, w^*)$ as $QP(\rho)$, where $\bar I^+$ and $\bar I^-$ are arbitrary subsets of $I^+$ and $I^-$, respectively.
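A minimal sketch of this aggregation step follows: it keeps the retained cuts indexed by $\bar I^+$ and $\bar I^-$ and appends the single aggregate cut built from the dual multipliers of $DP(\rho)$. The function name, the data layout and the zero stored as the distance entry of the aggregate cut are ours, for illustration only.

import numpy as np

def aggregate(bundle_plus, bundle_minus, lam, mu, keep_plus, keep_minus):
    """Collapse the discarded cuts into one aggregate cut per bundle.

    g_p = G+ lam, alpha_p = alpha+^T lam and, if mu != 0, g_m = G- mu, alpha_m = alpha-^T mu.
    keep_plus / keep_minus are the retained index sets (I-bar+ and I-bar-).
    """
    G_plus = np.column_stack([g for g, _, _ in bundle_plus])
    a_plus = np.array([max(alpha, 0.0) for _, alpha, _ in bundle_plus])
    g_p, alpha_p = G_plus @ lam, a_plus @ lam
    new_plus = [bundle_plus[i] for i in keep_plus] + [(g_p, alpha_p, 0.0)]  # 0.0: placeholder distance
    new_minus = [bundle_minus[i] for i in keep_minus]
    if len(bundle_minus) > 0 and np.any(mu > 0.0):
        G_minus = np.column_stack([g for g, _, _ in bundle_minus])
        a_minus = np.array([alpha for _, alpha, _ in bundle_minus])
        new_minus.append((G_minus @ mu, a_minus @ mu, 0.0))
    return new_plus, new_minus

In an actual implementation this aggregation would be triggered only when the bundle reaches its maximal allowed size, as described in the text.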


Of course, in case $I^- = \emptyset$ or $\mu^* = 0$, the formulation of the aggregate problem does not contain the constraint $w \le g_m^T d - \alpha_m$ and $(d^*, v^*, 0)$ is still optimal. Now suppose that at a certain execution of the "main iteration" the quadratic program $QP(\rho)$ (or $DP(\rho)$) is solved and the corresponding optimal dual vector $(\lambda^*, \mu^*)$ is calculated. If we calculate also the quantities $g_p$, $\alpha_p$, $g_m$, $\alpha_m$, it is possible to construct the aggregate problem $QP^a(\rho)$ by inserting the aggregated constraints into $QP(\rho)$ and deleting part of its bundle elements. Thus the new quadratic program can be obtained by inserting the new constraint, corresponding to the new bundle element calculated at step 5 of the "main iteration", into the aggregated problem $QP^a(\rho)$. Of course, such an aggregation task is carried out only each time a given maximal bundle dimension is reached. The aggregation mechanism does not impair convergence. Indeed, the key argument is that the monotonicity of the sequence $\{z_k^*\}$, necessary in the proof of Lemma 4.1, is still guaranteed.

The algorithm, equipped with the aggregation scheme, has been implemented in double precision Fortran 77 under the Windows XP system. Our code, called NCNS, has been tested on a set of 25 problems [21] available on the web at the URL http://www.cs.cas.cz/~luksan/test.html. All test problems, except the Rosenbrock problem, are nonsmooth. The input parameters have been set as follows: $\epsilon = 10^{-2}$, $\eta = 10^{-4}$, $\sigma = 10^{-2}$, $m_1 = 0.2$, $m_2 = 1.2$, $\rho_{\min} = 10^{-2}\|g(x_0)\|$, $R = 6$.

In Table 5.1 we report the computational results in terms of the number $N_f$ of function evaluations. By $f^*$ and $f$ we indicate, respectively, the minimum value of the objective function and the function value reached by the algorithm when the stopping criterion is met. At each iteration we solve the dual program $DP(\rho)$ by using the subroutine DQPROG provided by the IMSL library, based on M.J.D. Powell's implementation of the Goldfarb and Idnani [11] dual quadratic programming algorithm. We compare the results provided by our code NCNS with those available in the literature for the bundle method NCVX [7] and the variable metric algorithms VN [22] and VMNC [29]. The performance of our algorithm appears comparable with that of the considered methods.

Table 5.1
NCNS: computational results

 #  Problem       n    f*           NCNS              NCVX              VN                VMNC
                                    Nf    f           Nf    f           Nf    f           Nf    f
 1  Rosenbrock    2    0             54   5.137e-06    70   5.009e-07    34   2.759e-11    33   0.320e-07
 2  Crescent      2    0             53   5.112e-06    22   8.022e-06    16   9.489e-11    15   0.949e-10
 3  CB2           2    1.9522245     16   1.9522255    18   1.9522245    17   1.9522247    16   1.9522250
 4  CB3           2    2             14   2.0000000    15   2.0000000    17   2.0000000    17   2.0000000
 5  DEM           2   -3             13  -3.0000000    21  -2.9999999    20  -2.9999996    20  -2.9999997
 6  QL            2    7.2           15   7.2000001    28   7.2000005    19   7.2000000    18   7.2000023
 7  LQ            2   -1.4142136     15  -1.4142136     9  -1.4142135    10  -1.4142133    10  -1.4142133
 8  Mifflin1      2   -1            165  -0.9993895   127  -0.9999977   127  -0.9999924    59  -0.9999925
 9  Mifflin2      2   -1             14  -0.9999915    13  -1.0000000    13  -0.9999997    35  -0.9999998
10  Rosen-Suzuki  4   -44            33  -43.999997    29  -44.000000    38  -43.999999    32  -43.999975
11  Shor          5    22.600162     27   22.600163    44   22.600162    40   22.600162    30   22.600186
12  Maxquad      10   -0.8414083     90  -0.8413860    56  -0.8414078    89  -0.8414057    89  -0.8414057
13  Maxq         20    0            187   1.561e-06   293   1.660e-07   123   1.468e-06   111   0.898e-05
14  Maxl         20    0             23   4.493e-15    44   1.110e-15    23   0.0000000    23   0.0000000
15  Goffin       50    0             56   1.984e-13   148   1.142e-13   360   4.153e-06   368   0.332e-05
16  El-Attar      6    0.5598131    172   0.5598143   152   0.5598163    83   0.5598155    76   0.5598184
17  Wolfe         2   -8             43  -7.9999998    21  -7.9999998    14  -7.9999998    14  -7.9999998
18  MXHILB       50    0             24   1.764e-05    33   1.768e-05    66   3.272e-06    67   0.201e-05
19  L1HILB       50    0             30   1.709e-05   104   6.978e-07    67   9.457e-07    64   0.153e-05
20  Colville1     5   -32.348679     36  -32.348677    47  -32.348679    53  -32.348678    47  -32.348675
21  Gill         10    9.7857721    308   9.7858381   164   9.7857746   241   9.7858732   108   9.7862324
22  HS78          5   -2.9197004    237  -2.9191783   159  -2.9196589    32  -2.9197003     —   —
23  TR48         48   -638565      1662  -638514.80   353  -638565.00   359  -638564.91   295  -638562.27
24  Shell Dual   15    32.348679    642   32.348687  1497   32.349404   315   32.349159   289   32.349018
25  Steiner2     12    16.703838     96   16.703844   196   16.703838    89   16.703838    62   16.703937


REFERENCES

[1] A. Astorino and A. Fuduli, Nonsmooth optimization techniques for semi-supervised classification, IEEE Transactions on Pattern Analysis and Machine Intelligence. To appear.
[2] J. Burke, A. Lewis, and M. L. Overton, A robust gradient sampling algorithm for nonsmooth nonconvex optimization, SIAM Journal on Optimization, 15 (2005), pp. 751–779.
[3] E. W. Cheney and A. A. Goldstein, Newton's method for convex programming and Tchebycheff approximation, Numerische Mathematik, 1 (1959), pp. 253–268.
[4] F. Clarke, Optimization and nonsmooth analysis, John Wiley and Sons, 1983.
[5] V. F. Demyanov and V. N. Malozemov, Introduction to Minimax, Wiley, 1974.
[6] V. F. Demyanov and A. Rubinov, Quasidifferential calculus, Optimization Software Inc., New York, 1986.
[7] A. Fuduli, M. Gaudioso, and G. Giallombardo, Minimizing nonconvex nonsmooth functions via cutting planes and proximity control, SIAM Journal on Optimization, 14 (2004), pp. 743–756.
[8] A. Fuduli, M. Gaudioso, and G. Giallombardo, A DC piecewise affine model and a bundling technique in nonconvex nonsmooth minimization, Optimization Methods and Software, 19 (2004), pp. 89–102.
[9] M. Gaudioso, G. Giallombardo, and G. Miglionico, An incremental method for solving convex finite min-max problems, Mathematics of Operations Research, 31 (2006), pp. 173–187.
[10] J.-L. Goffin, J. Gondzio, R. Sarkissian, and J.-P. Vial, Solving nonlinear multicommodity flow problems by the analytic center cutting plane method, Mathematical Programming, 76 (1997), pp. 131–154.
[11] D. Goldfarb and A. Idnani, A numerically stable dual method for solving strictly convex quadratic programs, Mathematical Programming, 27 (1983), pp. 1–33.
[12] W. Hare and C. Sagastizábal, Computing proximal points of nonconvex functions, Mathematical Programming. To appear.
[13] C. Helmberg and F. Oustry, Bundle methods to minimize the maximum eigenvalue function, in Handbook of Semidefinite Programming, H. Wolkowicz, R. Saigal, and L. Vandenberghe, eds., Kluwer's International Series, Kluwer Academic Publishers, Boston, 2000, pp. 307–337.
[14] J. Hiriart-Urruty and C. Lemaréchal, Convex analysis and minimization algorithms Vol. I-II, Springer-Verlag, 1993.
[15] J. E. Kelley, The cutting-plane method for solving convex programs, Journal of SIAM, 8 (1960), pp. 703–712.
[16] K. C. Kiwiel, An aggregate subgradient method for nonsmooth convex minimization, Mathematical Programming, 27 (1983), pp. 320–341.
[17] K. C. Kiwiel, Proximal level bundle methods for convex nondifferentiable optimization, saddle-point problems and variational inequalities, Mathematical Programming, 69 (1995), pp. 89–109.
[18] C. Lemaréchal, An extension of Davidon methods to nondifferentiable problems, in Nondifferentiable optimization, M. Balinski and P. Wolfe, eds., vol. 3 of Mathematical Programming Study, North-Holland, Amsterdam, 1975, pp. 95–109.
[19] C. Lemaréchal, A view of line-searches, in Optimization and optimal control, A. Auslender, W. Oettli, and J. Stoer, eds., vol. 30 of Lecture Notes in Control and Information Sciences, Springer Verlag, 1981, pp. 59–78.
[20] C. Lemaréchal, A. Nemirovskii, and Y. Nesterov, New variants of bundle methods, Mathematical Programming, 69 (1995), pp. 111–147.
[21] L. Lukšan and J. Vlček, Test problems for nonsmooth unconstrained and linearly constrained optimization, Tech. Report 798, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, 2000.
[22] L. Lukšan and J. Vlček, Variable metric methods for nonsmooth optimization, Tech. Report 837, Institute of Computer Science, Academy of Sciences of the Czech Republic, Prague, 2001.
[23] M. Mäkelä and P. Neittaanmäki, Nonsmooth optimization, World Scientific, 1992.


[24] R. Mifflin, An algorithm for constrained optimization with semismooth functions, Mathematics of Operations Research, 2 (1977), pp. 191–207.
[25] E. Polak, On the mathematical foundations of nondifferentiable optimization in engineering design, SIAM Review, 29 (1987), pp. 21–89.
[26] R. T. Rockafellar, Convex analysis, Princeton University Press, 1970.
[27] H. Schramm and J. Zowe, A version of the bundle idea for minimizing a nonsmooth function: conceptual idea, convergence analysis, numerical results, SIAM Journal on Optimization, 2 (1992), pp. 121–152.
[28] N. Shor, Minimization methods for nondifferentiable functions, Springer-Verlag, Berlin, 1985.
[29] J. Vlček and L. Lukšan, Globally convergent variable metric method for nonconvex nondifferentiable unconstrained minimization, Journal of Optimization Theory and Applications, 111 (2001), pp. 407–430.
[30] P. Wolfe, A method of conjugate subgradients for minimizing nondifferentiable functions, in Nondifferentiable optimization, M. Balinski and P. Wolfe, eds., vol. 3 of Mathematical Programming Study, North-Holland, Amsterdam, 1975, pp. 145–173.