Optimization Methods and Software, 1995, Vol. 6, pp. 59-80. Reprints available directly from the publisher. Photocopying permitted by license only.

© 1995 OPA (Overseas Publishers Association) Amsterdam B.V. Published under license by Gordon and Breach Science Publishers SA. Printed in Singapore.

A QUADRATICALLY CONVERGENT LINE-SEARCH ALGORITHM FOR PIECEWISE SMOOTH CONVEX OPTIMIZATION*

EVGENI A. NURMINSKI

Institute of Applied Mathematics, Far Eastern Branch of the Russian Academy of Sciences, 7, Radio Street, 690041 Vladivostok, Russia

(Received 30 December 1993; in final form 27 March 1995)

A quadratically convergent line-search algorithm for piecewise smooth convex optimization based on a discontinuous piecewise linear approximation of the subgradient of the objective function is proposed. The algorithm safeguards the optimal point and has a global linear rate of convergence with locally quadratic convergence in the case of an isolated non-degenerate kink at the solution. For practical purposes it can be combined with the line-search routine based on cubic approximation of the objective function to produce an algorithm suitable for both smooth and nonsmooth optimization.

KEY WORDS: Line-search, convex nondifferentiable optimization

1 INTRODUCTION

Line-search procedures intended to solve one-dimensional optimization problems constitute an important part of the optimization repertoire. The problem of one-dimensional optimization is important in its own right, but it is more often considered as an essential part of multidimensional techniques. Many failures to solve specific optimization problems by different optimization subroutines are caused by, or attributed to, inefficiency of line-search solvers. This stresses the need for reliable and efficient line-search algorithms.

Within the context of multidimensional optimization, line-search techniques can be classified, with a certain degree of oversimplification, into two broad categories: exact and inexact. The first are intended for solving one-dimensional optimization problems with arbitrary (infinitely small, theoretically speaking) accuracy. The aim of the second usually consists in satisfying some condition of sufficient descent that ensures the convergence of the other (multidimensional) algorithms in which they are embedded.

*This work was supported in part by the Russian Foundation for Fundamental Research under grant 94-01-01771.


Classical procedures of dichotomy, golden section, etc., which form the standard part of any textbook on optimization, belong to the first class, together with their more recent relatives such as quadratic or cubic interpolation, Newton or secant methods, which often have a more rapid rate of convergence. An extensive discussion of this topic can be found, for instance, in [5]. Appropriate examples of the second class are Fibonacci search, specifically constructed for a number of iterations predetermined in advance, and the Armijo rule [1], which is extensively used in many optimization methods (a minimal illustrative sketch of this rule is given at the end of this section). The boundary between these two classes is rather fuzzy: to be implementable, exact line searches are provided with stopping criteria which make them inexact; on the other hand, for an inexact line-search to produce a sufficient descent it may in fact require rather accurate solutions of one-dimensional optimization problems. According to this classification scheme, this paper is devoted to an exact line-search algorithm for convex nondifferentiable optimization.

Nondifferentiability presents a special challenge for line-search techniques, as it rules out even linear approximations of an objective function, not to mention quadratic or higher-order approximations which are usually employed to speed up local convergence. Among the first attempts to develop fast algorithms for one-dimensional nondifferentiable optimization problems one has to mention [2], in which the sum of piecewise linear (PWL) and quadratic (Q) models of an objective function was suggested, with a cutting-plane-like PWL part and a linear approximation of the derivatives on the same side of the solution. Although the use of a single second-order term does not capture the possible difference in curvature on the opposite sides of the solution, a superlinear rate of convergence was established in theoretical studies of this algorithm. The practical performance of this algorithm is unknown, as no computational experiments were given in the pioneering paper [2]. In the 5-point method of [3], which is a further development of [4], the use of left, right and central Q-models together with PWL functions competing for the best value of the objective was suggested. Again, superlinear convergence has been demonstrated under piecewise twice differentiability.

The main idea of the current paper consists in a piecewise quadratic approximation of the objective function, and we demonstrate that if there is a nontrivial kink at the solution of the one-dimensional optimization problem it is possible to achieve a local quadratic rate of convergence. We conclude the paper with a discussion of implementation strategies for the algorithm and numerical experiments.
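For reference, a minimal sketch of the Armijo backtracking rule is given below. It is illustrative only and not part of the paper: the objective phi, its derivative dphi and the parameter values c and beta are assumed, and the rule is stated for a differentiable objective.

    def armijo_step(phi, dphi, x, direction, t0=1.0, c=1e-4, beta=0.5, max_backtracks=50):
        # Backtracking (Armijo) rule: shrink the trial step until the objective
        # decreases by at least the fraction c of the decrease predicted by the
        # directional derivative.
        t = t0
        slope = dphi(x) * direction          # directional derivative along `direction`
        for _ in range(max_backtracks):
            if phi(x + t * direction) <= phi(x) + c * t * slope:
                break
            t *= beta                        # reduce the step and try again
        return t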

2 PRELIMINARIES

We consider the line-search problem in the following notation:

φ(x_*) = min_{x ∈ R} φ(x),    (1)

and denote by [a, b] the initial guess for x_*, that is, the interval which is known to contain x_*.


It is assumed that φ : R → R is convex, hence directionally differentiable. We denote its right derivative as

d_+(x) = lim_{Δ→0+} Δ^{-1} (φ(x + Δ) - φ(x)).

Throughout the paper we assume that d_+(x) may have only a finite number of points of discontinuity. We shall also use the left derivative

d_-(x) = lim_{Δ→0+} Δ^{-1} (φ(x) - φ(x - Δ)).

From convex analysis one has: d_+(x) is non-decreasing, right continuous, upper semicontinuous; d_-(x) is non-decreasing, left continuous, lower semicontinuous; and d_+(x) ≥ d_-(x). It is commonly assumed in nondifferentiable optimization that at any given point x it is possible to compute a subgradient d(x) of φ(x):

d(x) ∈ [d_-(x), d_+(x)].

However, there is no explicit control over the choice of d(x) within this interval. The optimality condition for the line-search problem (1) requires that 0 ∈ [d_-(x), d_+(x)], or d_-(x) ≤ 0 ≤ d_+(x); consequently d(x) = 0 is the ultimate stopping criterion, with little chance to ever occur in practice. We assume that the initial interval [a, b] contains in its interior exactly one point of discontinuity of d_-(x), d_+(x), namely the solution x_* of (1):

d_+(x_* - 0) = d_-(x_*) < d_+(x_*) = d_-(x_* + 0).    (2)

The subgradient d(x) is assumed to be Lipschitz continuous on [a, x_*) and (x_*, b]:

|d(x) - d(y)| ≤ Λ|x - y|

for x, y ∈ [a, x_*) or x, y ∈ (x_*, b]. It is also necessary to impose a certain nondegeneracy condition on d(x): |d(x) - d(y)| > λ|x - y| for x, y ∈ [a, b], x ≠ y, and some λ > 0. Denote the difference between the left- and right-hand sides of (2) with x = x_* as H:

H = d_+(x_*) - d_-(x_*).
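To fix ideas, these assumptions can be illustrated on the convex function φ(x) = x²/2 + |x|, which is not taken from the paper; the following minimal sketch shows a corresponding subgradient oracle and the resulting constants.

    def phi(x):
        # Convex and piecewise smooth, with a single kink at x_* = 0.
        return 0.5 * x * x + abs(x)

    def d(x):
        # A subgradient of phi at x; any value in [d_-(x), d_+(x)] is admissible.
        if x > 0:
            return x + 1.0   # here d_-(x) = d_+(x) = x + 1
        if x < 0:
            return x - 1.0   # here d_-(x) = d_+(x) = x - 1
        return 0.0           # at the kink any value from [-1, 1] may be returned

    # For this phi: condition (2) holds with d_-(0) = -1 < 1 = d_+(0), so H = 2;
    # the Lipschitz constant of d on each side of the kink is Lambda = 1, and the
    # nondegeneracy condition holds for any lambda < 1.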


Curiously enough, the quadratic convergence of the algorithm depends on the assumption that H is not zero, that is, that the problem (1) is essentially nonsmooth or nondifferentiable. More precisely, this type of nondifferentiability for the problem (1) is introduced by the following definition.

Definition 1: Problem (1) will be called essentially nondifferentiable if there exists γ > 0 such that

In what follows a line-search algorithm with a locally quadratic rate of convergence will be proposed. The radius of the quadratic rate of convergence depends on H, γ, Λ, λ and is of the order H/Λ, λ/Λ, γ/Λ. Exact estimates are formulated in Theorem 3.5.

3 ALGORITHM

The algorithm operates with four points a_2, a_1, b_1, b_2 within the initial interval [a, b], ordered as a ≤ a_2 < a_1 < b_1 < b_2 ≤ b.

Lemma 3.3. Let (1) be essentially nondifferentiable with the corresponding constant γ > 0. Then, for δ < γ/4Λ,

d^a(x) < 0 < d^b(x), for x ∈ [a_2, b_2].

Proof: For the right-hand side of (15)

The left part of (15) is demonstrated by similar reasoning.

Lemma 3.4. Let (1) be essentially nondifferentiable with the corresponding constant γ > 0. Then, for δ < min{γ/4Λ, H/5Λ}, there is a unique x̂ ∈ (a_1, b_1) such that

Proof: Define

Then


Taking into account that

and applying Lemma 3.1 to the first and the last terms one obtains

The first term is estimated as follows:

∫_{a_1}^{b_1} (d^b(x) - d(x)) dx = ∫_{a_1}^{x_*} (d^b(x) - d(x)) dx + ∫_{x_*}^{b_1} (d^b(x) - d(x)) dx ≥

The combination of these estimates yields

Symmetrically

Again, the first and the last terms are estimated using Lemma 3.1, which yields

The integral term is estimated as

∫_{a_1} (d^a(x) - d(x)) dx =


Substitution of this into (16) yields

As ψ(x) is obviously continuous, there is x̂ ∈ (a_1, b_1) such that

To prove uniqueness, notice that as d^a(x) < 0 < d^b(x) under the hypotheses of the lemma, due to Lemma 3.3, d̂(x, x̂) is point-wise monotone with respect to x̂, so ψ(x) is a monotone function of x, which implies the uniqueness of x̂. ∎

The basis for the quadratic convergence is established by the following proposition.

Theorem 3.5. If

then for x̂, ŷ determined by (8), (9) or (10) we have

|x̂ - ŷ| ≤ 3Λδ²/H,    |x̂ - x_*| ≤ 14Λδ²/H,

x_* ∈ (min{x̂, ŷ}, max{x̂, ŷ}) = (ā, b̄).

Proof: Under the hypotheses of the theorem, according to Lemma 3.4 there is a unique x̂ such that

Assume, for the sake of definiteness, that x̂ < x_*, that is, that d(x̂) < 0. Then (18) can be rewritten as


As in Lemma 3.4 the first and last two terms can be estimated via Lemma 3.1:

∫_{a_2}^{a_1} (d^a(x) - d(x)) dx ≤ Λ|a_2 - a_1|²/4 = Λδ²/4,

and so (18) leads to the estimate

According to Lemma 3.2,

d^b(x) > d(x) + H/2, for x ∈ [a_2, x_*),

and the case x̂ > x_* can be considered in the same way using simply the other part of Lemma 3.2.

Assume again, for the sake of definiteness, that d(x̂) < 0. It usually implies that x̂ < x_*, but it may also happen that x̂ = x_*, only we picked a negative value from the interval d_-(x_*) < 0 < d_+(x_*). Then, as prescribed by the algorithm, ŷ is determined by (9), and we demonstrate first that this equation has a unique solution ŷ ∈ (x_*, b_1). A typical situation of this kind is shown in Figure 2. Toward this purpose, we define a (monotone) function of ŷ


FIGURE 2: Piecewise constant approximation for finding the bracketing point ŷ.

At the left endpoint of the interval (x_*, b_1) the function ψ(x) can be estimated by applying Lemma 3.1 and the nondegeneracy condition d(x) ≤ d(b_1) - λ|b_1 - x| for x < b_1:

for δ < (H/14Λ)(λ/Λ)^{1/2}. At the other endpoint of this interval


From these two estimates and the monotonicity of ψ(·), the existence of a unique root of (9) follows. Simultaneously it was proved that ŷ ≥ x_*. Having proved this, rewrite (9) as follows

Consequently

if δ ≤ H/(7√2 Λ). The integral on the left-hand side can be estimated from below as

Combination of the last two estimates yields

|x_* - ŷ| ≤ 3Λδ²/H, for

To complete the proof, it is sufficient to add that the case d(x̂) > 0 can be considered exactly in the same manner, resulting in identical estimates. ∎

From Theorem 3.5,

and this value can be considered as a measure of uncertainty for the position of the solution of (1). To be meaningful it should not exceed the initial interval of uncertainty of 5δ, which leads to the upper estimate of δ:


which is consistent with the assumptions of Theorem 3.5 ensuring the convergence of the algorithm.
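As a purely numerical illustration of what the quadratic rate means, assume (only for this sketch) that the uncertainty contracts with constant 1; Theorem 3.5 gives a per-pass factor of the order Λ/H instead:

    # Illustration only: if the uncertainty contracted as delta_{k+1} = C * delta_k**2
    # with C = 1 assumed here, the number of correct digits would double at every pass.
    delta = 0.1
    for k in range(4):
        print(k, delta)      # roughly 0.1, 0.01, 1e-4, 1e-8
        delta = delta ** 2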

4 THE SMOOTH CASE

Apart from its other uses, (17) can be used to assess H. If the estimate for H becomes too small, it may serve as an indicator that we are dealing in fact with the smooth case, that is, with a differentiable φ(·). For this case a cubic interpolation algorithm seems to be very suitable. Again, we describe the algorithm for passing from one set of points a^k, b^k obtained at the k-th iteration to the next set a^{k+1}, b^{k+1} used further on in the (k+1)-st iteration by the following pseudocode using literate-programming-like syntax:

(Algorithm for smooth convex optimization)

while (Solution of (1) is not found with desirable accuracy) {
    (Determine parameters of cubic approximation)
    (Find x̂)
    (Find ŷ)
    (Correct if the interval is too wide)
    (Center new interval of uncertainty)
    (Round up iteration)
}

where the major steps of the algorithm are detailed as follows:

(Determine parameters of cubic approximation) = For given a^k, b^k with the corresponding values of d(x) at these points, construct a quadratic approximation

such that

(Find x̂) = Solve the equation

and let x̂ be the root of this equation lying in the interval [a^k, b^k].

(Find ŷ) = Find the point ŷ bracketing the solution x_* of (1) as

where C > 0 is some normalizing constant and k is the first nonnegative integer such that d(ŷ)d(x̂) ≤ 0. If ŷ computed by the rule (21) is less than a^k, set ŷ = a^k. If d(x̂)
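By way of illustration of the cubic-approximation step, the following sketch gives the conventional Hermite-cubic trial point used in smooth line searches. It is a textbook formula, not the paper's own rules; the names a, b, fa, ga, fb, gb (values of φ and its derivative at the current pair of points) are assumptions of this sketch.

    import math

    def cubic_trial_point(a, fa, ga, b, fb, gb):
        # Minimizer of the cubic interpolating phi and its derivative at a and b
        # (the standard Hermite-cubic formula used in smooth line searches).
        d1 = ga + gb - 3.0 * (fa - fb) / (a - b)
        rad = d1 * d1 - ga * gb
        if rad < 0.0:
            return 0.5 * (a + b)                  # no real minimizer: fall back to bisection
        d2 = math.copysign(math.sqrt(rad), b - a)
        denom = gb - ga + 2.0 * d2
        if denom == 0.0:
            return 0.5 * (a + b)
        x = b - (b - a) * (gb + d2 - d1) / denom
        return min(max(x, min(a, b)), max(a, b))  # safeguard: keep the point inside [a, b]

In the scheme above such a trial point would play the role of x̂, while the bracketing point ŷ would still be produced by a rule like (21).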
