Line Search Algorithms with Guaranteed Sufficient Decrease

JORGE J. MORÉ and DAVID J. THUENTE
Argonne National Laboratory

The development of software for minimization problems is often based on a line search method. We consider line search methods that satisfy sufficient decrease and curvature conditions, and formulate the problem of determining a point that satisfies these two conditions in terms of finding a point in a set T(μ). We describe a search algorithm for this problem that produces a sequence of iterates that converge to a point in T(μ) and that, except for pathological cases, terminates in a finite number of steps. Numerical results for an implementation of the search algorithm on a set of test functions show that the algorithm terminates within a small number of iterations.

Categories and Subject Descriptors: G.1.6 [Numerical Analysis]: Optimization—constrained optimization; gradient methods; nonlinear programming; G.4 [Mathematics of Computing]: Mathematical Software—algorithm analysis; efficiency; reliability and robustness

General Terms: Algorithms

Additional Key Words and Phrases: Conjugate gradient algorithms, line search algorithms, nonlinear optimization, truncated Newton algorithms, variable metric algorithms

1. INTRODUCTION

Given a continuously differentiable function φ with φ′(0) < 0, the line search problem is to find an α > 0 such that

    φ(α) ≤ φ(0) + μφ′(0)α   (1.1)

and

    |φ′(α)| ≤ η|φ′(0)|.   (1.2)

The development of a search procedure that satisfies these conditions is a crucial ingredient in a line search method for minimization. The search algorithm described in this paper has been used by several authors, for

This work was supported by the Office of Scientific Computing, U.S. Department of Energy, under contract W-31-109-Eng-38.
Authors' addresses: J. J. Moré, Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439; D. J. Thuente, Department of Mathematical Sciences, Indiana-Purdue University, Fort Wayne, IN 46805.
Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.
© 1994 ACM 0098-3500/94/0900-0286 $03.50
ACM Transactions on Mathematical Software, Vol. 20, No. 3, September 1994, Pages 286-307.

example, by Liu and Nocedal [1989], O'Leary [1990], Schlick and Fogelson [1992a; 1992b], and Gilbert and Nocedal [1992]. This paper describes this search procedure and the associated convergence theory.

In a line search method, we are given a continuously differentiable function f

: Rn → R and a descent direction p for f at a given point x in Rn. Thus, if

    φ(α) = f(x + αp),  α ≥ 0,   (1.3)

then conditions (1.1) and (1.2) define an acceptable step. The motivation for requiring (1.1) and (1.2) in a line search method should be clear. If α is not too small, condition (1.1) forces a sufficient decrease in the function. However, this condition is not sufficient to guarantee convergence, because it allows arbitrarily small choices of α > 0. Condition (1.2) rules out arbitrarily small choices of α and usually guarantees that α is near a local minimizer of φ.

Condition (1.2) is a curvature condition because it implies that

    φ′(α) − φ′(0) ≥ (1 − η)|φ′(0)|,

and thus, the average curvature of φ on (0, α) is positive. The curvature condition (1.2) is particularly important in a quasi-Newton method, because it guarantees that a positive definite quasi-Newton update is possible. See, for example, Dennis and Schnabel [1983] and Fletcher [1987].

As final motivation for the solution of (1.1) and (1.2), we mention that if a step satisfies these conditions, then the line search method is convergent for reasonable choices of direction. See, for example, Dennis and Schnabel [1983] and Fletcher [1987] for gradient-related methods; Powell [1976] and Byrd et al. [1987] for quasi-Newton methods; and Al-Baali [1985], Liu and Nocedal [1989], and Gilbert and Nocedal [1992] for conjugate gradient methods.

In most practical situations, it is important to impose additional requirements on α. In particular, it is natural to require that α satisfy the bounds

    α_min ≤ α ≤ α_max.   (1.4)

The main reason for requiring a lower bound α_min is to terminate the iteration, while the upper bound α_max is needed when the search is used for linearly constrained optimization problems or when the function φ is unbounded below. In linearly constrained optimization problems, the parameter α_max is a function of the distance to the nearest active constraint. An unbounded problem can be approached by accepting any α in [α_min, α_max] such that φ(α) ≤ φ_min, where φ_min < φ(0) is a lower bound specified by the user of the search. In this case

    α_max = (φ_min − φ(0)) / (μφ′(0))   (1.5)

is a reasonable setting, because if α_max satisfies the sufficient decrease condition (1.1), then φ(α_max) ≤ φ_min. On the other hand, if α_max does not satisfy the sufficient decrease condition, then we will show that it is possible to determine an acceptable α.
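Conditions (1.1) and (1.2) are easy to evaluate once φ and φ′ are available. The following minimal sketch (not the paper's code; the sample quadratic and the parameter values are illustrative assumptions) shows the two tests:

```python
# Sketch (not the paper's code): checking the sufficient decrease (1.1) and
# curvature (1.2) conditions for a one-dimensional function phi.
def satisfies_conditions(phi, dphi, alpha, mu=1e-4, eta=0.1):
    """Return (sufficient_decrease, curvature) for a trial step alpha."""
    decrease = phi(alpha) <= phi(0.0) + mu * dphi(0.0) * alpha   # (1.1)
    curvature = abs(dphi(alpha)) <= eta * abs(dphi(0.0))         # (1.2)
    return decrease, curvature

# Example: phi(alpha) = (alpha - 1)^2 - 1, so phi'(0) = -2 < 0.
phi = lambda a: (a - 1.0) ** 2 - 1.0
dphi = lambda a: 2.0 * (a - 1.0)
print(satisfies_conditions(phi, dphi, 1.0))   # minimizer: both conditions hold
print(satisfies_conditions(phi, dphi, 1e-8))  # tiny step: (1.1) holds, (1.2) fails
```

The second call illustrates the point made above: arbitrarily small steps can satisfy (1.1), and it is the curvature condition (1.2) that rules them out.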


The main that

problem

a belongs T(p)

considered

= {a >0:

By phrasing

in this

paper

is to find

an acceptable

a such

to the set @(cl!) < 0(0)

our results

in terms

+ clv#’(o),

1+’(cl)l

of T( P), we make

< pl@J’(o)l}.

it clear

that

the

search

algorithm is independent of q; the parameter q is used only in the termination test of the algorithm. Another advantage of phrasing the results in terms of 7’( p) is that T( ~) is usually not empty. For example, I“( w) is not empty when

@ is bounded

Several For

example,

univariate did

not

have

done

and

Murray

Gill

minimization satisfy

solution

below.

authors

(1.1),

~ * to (1.1).

related

work

[1974]

on the

attacked

solution

algorithm for @ to find a solution halved in then a” was repeatedly Of course,

p * did

not

of (1.1)

(1. 1) and

necessarily

(1.2)

and

by

(1.2).

using

a

a * to (1.2). If a * order to obtain a

satisfy

(1.2);

but

if

p

was sufficiently small, then it was argued that this was an unlikely event. In a similar vein, we mention that the search algorithm of Shanno and Phua [1976; 1980] is not guaranteed to work in all cases. In particular, the sufficient decrease condition (1.2), and then the algorithm Gill they

et al. [1979] argued

that

to compute

proposed if there

a point

(1. 1) can rule out many of the points is not guaranteed to converge. an interesting

is no solution

has

intervals

a solution,

such

or just (1.6). that satisfies Lemarechal satisfies

and (1.2) when

to (1.1) and (1.2), then

that

then

= 4(0) their

it is necessary

Their algorithm, (1.1) and (1.2).

contains

however,

[1981]

developed decrease

computes points

a sequence

that

satisfy

is not guaranteed

a search condition o’(a)

(1.6)

+ @’(o)a.

algorithm

each interval

the sufficient

Moreover, iterations

on (1.1)

satisfy

such that +(a)

If (1.6)

variation

that

algorithm

for

of nested

(1. 1) and (1.2),

to produce finding

a point

a step

that

(1. 1) and (1.7)

2 q+’(o).

he proved that the algorithm terminates in a finite number of whenever v < q. These conditions are of interest because (1.1) and

(1.7) are adequate for many optimization algorithms. A possible disadvantage of these two conditions is that acceptable steps have higher function values than those where (1.2) holds. On the other hand, satisfying the stronger conditions (1. 1) and (1.2) requires a more elaborate algorithm. Fletcher [1980] suggested nested intervals that contain prove

any

result

along

these

that it is points that lines.

This

possible to compute a sequence of satisfy (1. 1) and (1.2), but he did not suggestion

led

to the

algorithms

developed by A1-Baali and Fletcher [1984] and Mor6 and Sorensen [1984]. In this paper we provide a convergence analysis, implementation details, and numerical results for the algorithm of Mor6 and Sorensen [1984]. In addition to the preceding references, there are numerous references on univariate particular, of A1-Baali

minimization and their rates of convergence. We mention, in the paper of Hager [1989] because of its connection with the work and Fletcher [1984] and Mor6 and Sorensen [1984]. In this paper

ACM Transactions on Mathematical Software, Vol. 20, No 3, September 1994

Line Search Algorithms we establish convergence The search

finite results

search

termination of the are not appropriate.

algorithm

algorithm

for

7’( p), and that, finite sequence choice

(1.5), then a that

Termination

of bounds.

either

The results an

is defined

a sequence

procedure,

and

in Section

of iterates

thus,

For

converge

at one of the bounds

example,

am lies in 7’(W)

of Section satisfies

2 show

(1.1)

a~,.

or O( an)

that

and

if

(1.2)

rate-of-

2. We show

that

that

the

to a point

except for pathological cases, the search algorithm amaxl,where ao, ..., am of trial values in [ a~,.,

or is one of the bounds. suitable

7’(p)

produces

search

289

.

can be avoided

= O and

am..

in

produces a am ~ 7’( p) by a

is defined

by

< ~~,..

the search when

algorithm

can be used to find

w s q. A result

for

an

arbitrary

q e (O, I-L) requires additional assumptions, because there may not be an a that satisfies (1. 1) and (1.2) even if @ is bounded below. In Section 3 we show that if the search algorithm generates an iterate a~ that satisfies the sufficient

decrease

terminates

condition

satisfies an.. ] , the

Given a. in [an,., nested intervals {~~} Section

and

at an ah that

4 describes

our algorithm.

and

@‘( a~ ) > 0, then (1.1) and (1.2). search algorithm

a sequence choices

for the trial

numerical

results

indicate

Our

that

search

generates

of iterateS

the specific

the

ak



algorithm

a sequence

lk n [ a~,.,

of

am..

1.

values

a~ that

are used in

these

choices

lead

to fast

termination. Section

5 describes

a set of test

problems

and

numerical

results

for

the

search procedure. The first three functions in the test set have regions of concavity, while the last three functions are convex. In all cases the functions have a unique minimizer. The emphasis in the numerical results is to explain the

qualitative

behavior

of the

algorithm

for

a wide

range

of values

of

~

and rjI. 2. THE

SEARCH

In

section

this

7’( p).

We

@‘(0) 0, then

O(a”)

for

on [0,

p < l\2,

the global

an

1 with

a~.z

because

minimizer

a in if d is

a * of @

+ $a*@’(o),

and thus a * satisfies (1.1) only if w s 1/2. The restriction p < 1/2 also allows a = 1 to be ultimately acceptable to Newton and quasi-Newton methods. In this section we need only assume that p lies in (O, 1). generates a sequence of Given a. in [ am,., a.,.. ] , the search algorithm nested

intervals

cording Search

{Ik}

and

to the following Algorithm.

a sequence

Set I.

Fork =0, l,... Choose a safeguarded a. Test for convergence. Update the interval 1~. ACM

of iterates

a~ = ~k fl [ a~,.,

a~.,

1 ac-

procedure: = [0, CQ]. ●

Transactions

Ik

n [ am:.,

a

ma.Z1

on Mathematical

Software,

Vol. 20, No. 3, September

1994.

290 In

. this

J. J. Mor6 and D J, Thuente description

convergence choice

is made

The

aim

generate

so that

on the endpoints intersection with specified in terms

We assume

that 2.1.

THEOREM

PROOF.

Assume

Clm

if

process

al + aU, but

rules

that

the

intervals

Ik

that

force

a safeguarded

is to identify

T( p) n Ih is not empty, not empty.

do not assume

Let I be a closed

a * in I with

that

interval

@(a*)

and then

and

to refine

We now specify

that with

< V(al)

aU > al; the proof

=Sup{a=

[al, *‘( al)

on au shows

al and

conditions

aU are ordered.

endpoints

am < aU, because

that

V( al).

The

the case if am = aU. This

also

definition *

Define

on

[ al,

~(a~) of

am

am].

implies

We

claim

that

cannot be achieved at al, because ~‘( al ) < that V( am) > @(al), the global minimum

achieved at am. Hence, a * is in the interior of [ al, @’(a*) = O, and thus, l@ ’(a*)l = pl@’(0)1. We also know

(1,1), because

that

+( a) < 0 for all

a in [ al,

am]. Hence,

a*

am]. that

In a’

T( ~), as





Theorem 2.1 provides the motivation for the search algorithm by showing a* in the that if the endpoints of 1 satisfy (2.1), then @ has a minimizer interior of I and, moreover, that a * belongs to T( ~). Thus the search algorithm can be viewed as a procedure for locating a minimizer of +. The assumptions that Theorem 2.1 imposes on the endpoints al and aU cannot be relaxed, because if we fix al and aU by the assumption +( al) < @(a.), then the result fails to hold if either of the other two assumptions are violated. The assumptions (2. 1) can be paraphrased by saying that al is the endpoint condition

with the lowest (1. 1), and that

* value, that al satisfies the sufficient aU – al is a descent direction for @ at

4(a) < @(al) for all a in 1 sufficiently close to a,. In particular, assumption guarantees that 4 can be decreased by searching near ACM

Transactions

on Mathematical

Software,

Vol. 20, No

3, September

1994

decrease al

so that

this al.

last

Line Search Algorithms We now describe an algorithm for updating how to use this algorithm to obtain an interval Theorem

Algorithm.

of the updated

Case Ul:

291

1 and then show the conditions of

2.1.

Updating a:

the interval that satisfies

.

interval

Given a trial value I+ are determined

If v(cq)

> @(al),

Case U2:

If ~(a~)

< +(al)and

Case U3:

If @(a~) s ~(al)

It is straightforward

then

and

to show

a:=

at in I, the as follows:

al and

a;=

endpoints

a;

and

af.

~’(a~)(al

– at) >0,

then

al=

at and

a;=

aU.

~’(a~)(al

– at) q@’(0), while Scheme S2 the

as follows:

If *(c+) >0 or if ~(a~) > @(al), then al+= al and a;= at, else if ~’(a~)(al – au) > 0, then % ‘= CYfand a;= au, else a:= at and a:= al. The two updating

algorithms

but

treatment

differ

in their

produce

of the situation

In this case, our updating algorithm sets 1+= [ at, aU ] if ~‘( at ) < 0. Our situation, interval

because generated

the same iterates

the interval by Scheme

as long

as

where

chooses algorithm

1+= [ al, at], while Scheme S2 seems to be preferable in this

1+ contains an acceptable point, while the S2 is not guaranteed to contain an acceptable

point. We now show how the updating algorithm can be used to determine an interval 1 in [0, a~~z ] with endpoints that satisfy (2.1). Initially, al = O and au = ~. Consider any trial value at in [ a~l ~, an~X ]. If Case U1 or U3 hold, then we have determined an interval with endpoints al and aU that satisfy the conditions of Theorem 2.1. Otherwise, Case U2 holds, and we can repeat the process for some a; in [ a,, a~~$ ]. We continue generating trial values in [ al, am~X] as long as Case U2 holds, but require as a trial value. This is done by choosing

for some factor

S~ ~~ > 1. In our implementation at ‘=

min{a~

+ 8(a~

– al),

a~.X},

that

eventually

a~a,

be used

we use 8=

[1.1,4].

ACM Transactions on Mathematical Software, Vol. 20, No. 3, September 1994.

292

J. J. More and D. J. Thuente

.

This choice produces trial values that satisfy a; – at = 8( cq – aj ), where is the previous value of at. Thus, if at > 80+ , then af+ > acq. Hence, induction

argument

shows

Since al = O initially is increasing with @(ci~)

that

and

o,

or satisfy

*(at)

shows that the sequence we use an cq with @(at)

>0

k= or

(2.4)

O, l,....

~‘( at ) >0,

then

the

updating

of trial values is decreasing. If, on the s O and v‘( at ) < 0, then the updating

algorithm shows that the interval 1+ lies to the right of at. Since we only use trial values cq in [ an,,, a ~~X], the interval 1+ will contain [ a~,,~, a~~z ]. We force the search algorithm to use am,. as a trial value when (2.4) holds and a~,. > 0. This is done by choosing CYJG [amln,

max{8mzn

for some factor ti~, ~ < 1. In our implementation, For more details, see Section 4.

at, amln}] (2.5) holds

(2.5) with

8~, ~ = 7\ 12.

The search algorithm terminates at a~,~ if 4( a~,~) >0 or 4 ‘(am, n) >0. This is a reasonable termination criterion, because Theorem 2.1 shows that with a* < an,. there is an a“ ● T(w) when these conditions hold. Thus, after a finite number of trial values, either the search algorithm terminates at an,., or the search algorithm generates an interval 1 in [ a~, ~, am ~~] with endpoints that satisfy conditions (2. 1). The requirements (2.2) and (2.5) are two of the safeguarding rules. Note that (2.2) is enforced only when (2.3) holds, while (2.5) is used when (2.4) holds. If the search algorithm generates an interval 1 in [ am,,, a~~ ~], then that the choice of at forces the length of 1 we need a third rule to guarantee to zero. In our implementation this is done by monitoring the length of 1; if the length of I does not decrease by a factor of d < 1 after two trials, then a ACM Transactions

on

Mathematical Software, Vol. 20, No. 3, September 1994

Line Search Algorithms bisection

step

is used

for

the

next

trial

at. In

our

.

implementation

293 we use

8 = 0.66. 2.2.

The

a~~z ] such that conditions holds:

THEOREM

after

search

algorithm

a finite

produces

number

a sequence

of trial

values

{ ak}

in

[ a~,.,

one of the following

The search and (2.3)

terminates holds.

at

a~~z,

the

sequence

of trial

values

is increasing,

The search and (2.4)

terminates holds.

at

am ;R, the

sequence

of trial

values

is decreasing,

An interval

Ih

atmax]

PROOF. above.

c

[ am,n,

In this

Let

proof

al ‘k) and

is generated.

we essentially

a(k) u

summarize

be the endpoints

& (~) = min{ajk),

au (k)

~Jk)

},

the

==

max{afk),

The left endpoint ~jk ) of Ik is nondecreasing, while nonincreasing. We first show that @~k) = ~ cannot hold for all k >0. Case U2 of the updating endpoints

algorithm

are set to finite

(2.3) holds, eventually

holds,

values.

Since

arguments

presented

of 1~, and define

because only

a~k)}. the

right

endpoint

is

If ~~k ) = ~, then

in the other

Case U2 holds,

and thus the safeguarding rule (2.2) shows that used as a trial value. Given an ~, as a trial

either terminates at a~~ ~ or /3Jk) = a~~ *. A similar argument shows that /3jk ) = O cannot

only

two cases both it is clear

that

the bound a~~ ~ is value, the search

hold

for

all

k >0.

If

algorithm holds, because in /$ (k) = O then only Case U1 or U3 of the updating Case U; both endpoints are set to positive values. Moreover, in this case (2.4) holds. The safeguarding rule (2.5) shows that (2.4) cannot hold for all h >0 when a~,~ = O, because a!~,. > 0, the safeguarding value,

and thus

We have search

thus

either

The

most

mzn~

amax

that

in this

interesting ]

after

at one of the

~~k) < ~. Of course,

[a

the search

shown

terminates

we have assumed rule shows that

last

terminates a finite two

at an ~~ or /?~k) = a~L~. number

bounds

of trial

In this

values,

a~l ~ or am .x,

case Ik is a subset

case of Theorem

is generated.

that V(O) 0.

(2.7)

2.2 shows that an interval number of trial values.

lk c

on Mathematical

Software,

Vol

20, No, 3, September

1994,

294

.

J J. More and D J. Thuente

Conditions then

(2.6) and (2.7) can be easily

(2.6) holds.

a strict

lower

THEOREM

Condition

bound

algorithm

satisfied.

For example,

if am .X is defined

for ~. Condition

If the bounds

2.3.

the search

(2.7) holds

(2.7) also holds

an,.

and

a~~,

if

am,,

= O,

by ( 1.5) and if ~n,~

is

if @‘( a~~ ~) >0.

satisfy

(2.6)

and

(2.7),

then

in a finite number of steps with an ah s T( p), = O. If the search or the iterates { ah} converge to some a * E T( p) with @’(a*) algorithm does not terminate in a finite number of steps, then there is an Ik satisfy Q)k) < a+ index k ~ such that the endpoints a~k’, aUck] of the interval < a~k~. Moreover, if $(a”) = O, then +’ changes sign on [ a~k), a“] for all k > ko, while

term inates

if @(a”)

ko. PROOF.

search lengths

tend

common with

that

Assume

algorithm.

to zero,

limit

ak @ T(W)

Since a*.

the

any

~‘( a *) = O. k ~ by noting

are

implies

that

k. >0 such that ~’(a) ~(a~h~).

H

If the search algorithm does not terminate in a finite number of steps, then Theorem 2.3 implies that ~‘ changes sign an infinite number of times, in the sense that there is a monotone sequence { ~k} that converges to a * and such that 4‘( ~k ) @‘( /3k, ~) IL. In this section we modify the search algorithm and show that under reasonable conditions we can guarantee that the modified search algorithm generates an a~ that satisfies (1.1) and (1.2) for any q > 0. A difficulty with setting q < p is that even if T( ~) is not empty, there may not be an a > 0 that satisfies (1.1) and (1.2). We illustrate this point with a minor

modification

of an example

of A1-Baali

and Fletcher

[1984].

Define

(3.1)

where q < w < p. The solid plot in Figure 1 is the function ~ with m = O.1; the dashed plot is the function 1(a) = +(0) + ~@ ’(0)a with p = 0.25. A computation shows that ~ is continuously differentiable and that

for all

a >0.

Moreover,

if p < ~, then 1–(7

l–p — [ l–a’2(p–

T(/-L)=

a)”

1

Thus 2’(p) is a nonempty interval with a = 1 in the interior. have set a = 0.1 and ~ = 0.25, and thus 2’(p) = [5/6, 3].

In Figure

1 we

We now show that if during the search for T(~) we compute a trial value a~ such that $( CYk ) s O and ~‘( ak ) > 0, then a~ belongs to 2’( p) or we have identified an interval that contains points that satisfy the sufficient decrease condition (1. 1) and the curvature condition (1.2). THEOREM 3.1.

Assume

that

the bounds

(2.7). Let { ak) be the sequence generated and a~k ) be the endpoints of the interval If ak is the first iterate that satisfies

then a~k) < a~k). Moreover, interval

contains

an a * that

satisfies

@(a ) < +( ah ) also satisfies PROOF.

We

first

claim

an,, ~ and

ah = I!’( ~) or @’(ak)

(1.1)and

a~~ ~ satisfy

(2.6)

and

by the search algorithm, and let ajk) Ik generated by the search algorithm.

> 0. If @’(ak)

4’( a *) = O. Moreover,

> 0, then

any a G I“

the

with

(1.1). that

v‘( afk))

0 for some index j < k, because a~h ) is a previous iterate. However, this contradicts the assumption that ak is the first iterate that satisfies (3.2). This proves that +‘( CE}h)) < 0. Since @‘( a~k’) CYjk’, so 1 * is well defined. ACM

TransactIons

on Mathematical

Software,

a}k ) < a~k’. This

implies,

Vol. 20, No 3, September

1994

.

296

J J, More and D. J. Thuente

J,)?

-

-0.3 -0.4 -0.5 -0.6 -0.7 -0.8 -

-..

-0.9 -1.0 0.0 Fig. 1

0.5

1.0

1.5

20

2.5

30

-....

..

.. .0

35

Solid plot is ~, defined by (3.1). Dashed plot is 1(a) = O(O)+ LLO‘(0)a.

If ~’(a~)

>0

and

~’(a~)

0 and a* is a minimizer, a * + ah. Similarly, since we proved above that

@‘( a)~)) < 0 and since

a * is a minimizer,

a * # a[k).

a* is in the interior of l“. Hence, @‘( a”) = O, as We have shown that desired. We complete the proof by noting that if +(a) < O( ak ) for some a ● 1’, then

The second inequality equality holds because satisfies

(1.1).

holds because cr~ satisfies (1.1), a < ah. Hence, any a = 1“ with

while +(a)

the third in< @(ah) also



There is no guarantee that the search algorithm will generate an iterate a~ such that +( ah ) < 0 and @‘( a~ ) > 0. For example, if @ is the function shown in Figure 1, then @‘(a) < 0 for all a. Even if @ has a minimizer a * that satisfies

the sufficient

decrease

condition,

the search

any points that satisfy (1. 1) and (1.2), and instead T( ~). We illustrate this point with the function

I#(a)=

–(a:;2)

will

algorithm converge

may

not find

to a point

in

(3.3)

–pa’,

with ~ = 0.03. In Figure 2 the set 2“( ~) is highlighted; the set of a that satisfies (1. 1) and (1.2) is larger than T( ~) if q > ~, and smaller than T( p) if q < ~. Theorem 2.3 guarantees finite termination at an ak ● T( p), but a~ may not be near the minimizer a*. We now develop a search procedure that minimizer a*, provided 4( a~ ) 0 for some iterate.

Vol. 20, No. 3, September

1994

near the Theorem

Line Search Algorithms 0.0

..

297

.

...

-(),2

-0.4 -0.6 -0.8 1

-.\ -1.2 -1.’4. 0

5

10

15

....

..

..

..

9 .5

20

30

Fig. 2. Solid plot is ~, defined by (3.3). Dashed plot is 1(a) = q5(0)+ ,ucj‘(0)a.

3.1 is one specified

of the

by this

ingredients; result

THEOREM 3.2. endpoints

there

Let I be a closed

is an a’

PROOF.

The

minimizer is

in

of

the

The

need

interval

1“

3.2, We

of

on of

interval

to show

that

the

interval

of the following

with

endpoints

this

1 and

thus

specified

assumed

< O( al)

result the

the

o’(~1)(%

~(a*)

1, then

because

have

s @(au),

in I with

proof +

interior

Theorem points.

also

the assumptions

1*

result:

al and

aU. If the

satisfy @(al)

then

we

satisfies

is

by

*)

that

=

al

3.1 ~

@‘( ak ) >0.

@’(a*)

= O. If

and

au

a*

is

the

global

guarantee

that

a *



O.

of

a$k), and

V( ak ) s O and

thus

the

modified

[%@)> %]

=

because Case U2 does not hold. Moreover, Theorem 3.1 guarantees that any a ~ Ih + ~ with 4(a) < #1(a~ ) satisfies (1.1). This implies that for any iteration j > k the

endpoint

with

Oh ● 1~ must

that

there

Hence,

a}~) satisfies converge

a 6h ● 1~ such

is

af~) satisfies

modified

search

4. TRIAL

VALUE

We also know

that

limit

@’(0~)

a*.

at an iterate

that

that

any

Since

= O, we

(1.2) for all j > k sufficiently

terminates

satisfies

{ ~~}

3.2 shows

that

4’( a *) = O.

obtain

large.

sequence

Theorem This

proves

that

the ❑

(1.1) and (1.2).

SELECTION

Given the endpoints updating algorithm 1+ that

(1.1).

to a common

contains

1, and a trial value al and aU of the interval described in the preceding section produces

acceptable

points.

We now specify

the new trial

af in 1, the an interval value

a;

in

I+. We assume at, we have values function obtained

that

in addition

function

values

to the endpoints

fl, fu, ft

fl, fu, ft and derivatives @ or the auxiliary from

the

auxiliary

gl,

function

the trial minimizer

value

function

@ until

a;

by interpolating lies between

of the cubic

of the quadratic quadratic that ACM

al

Transactions

that

on Mathematical

some

al and

fl, ft, gl,

and

Software,

Vol

iterate

satisfied, into four

the function

interpolates

that interpolates fl, interpolates

aU, and the trial

point

g,, gU, g~. The function

gU, g~ can be obtained from *. The function and derivative

+( ah) ft.

Case 1:

In this o!

+_ Cl!t —

Both aC and a~ lie in choice that is close to and

Both

Thus,

for the above

A choice this

aq

close to

so. Indeed,

if a~(

I+,

so they are both since this is the

al,

choice

al

all,

candidates point with

for a;. We desire a the lowest function

close to al, because

of a~,

is clearly

desirable

step is closer

ft ) is

a~ and set

otherwise.

+ a.),

aC are relatively

case the quadratic

aC and

if]ac–all

+(aq

{

value.

case compute

299

.

the value

ft

when

to al than

is much

larger

to aC, but usually

of a~ as a function

of

ft,

than

fl.

In

abnormally

then

a~(ft) = al.

lim f,+~

On the other

hand,

a computation

shows

that

aC(f,) = al + ~(au – at).

lim f,+=

Thus,

the midpoint

Case 2:

f, s fl

of aC and and

gtgl

+= fft

Both

aC and

a~ is a reasonable



In this

case compute

ae and

as, and set

if [aC – atl 2 Ias — atl,

(1S9

otherwise.

so they

are both

a minimizer lies between al and af tends to generate a step that

compromise.

candidates

for a;.

Since

gfgl

< 0,

cq. Choosing the step that is farthest from straddles a minimizer, and thus the next

step is also likely to fall into this case. In the next case, we choose a: by extrapolating the function values at al of the interval with at and al as and cq, so the trial value a: lies outside endpoints. We define a; in terms of a, (the minimizer of the cubic that interpolates interpolates

fl, ft, g~ and

gz, and g~).

g~ ) and

a,

(the

minimizer

of the

quadratic

that

In this case the cubic that interCase 3: ft s fl, gtgl 20, and lg~l < Igll. gl and g~ may not polates the function values fl and ft and the derivatives have a minimizer. Moreover, even if the minimizer aC exists, it maybe in the wrong direction. For example, we may have af > al, but aC < at. On the other hand, the secant step as always exists and is in the right direction. ACM

Transactions

on Mathematical

Software,

Vol. 20, No. 3, September

1994.

300

J. J. Mor6 and D. J. Thuente

.

If the cubic tends the cubic

to infinity

is beyond

+_ at —

Otherwise,

set

in the direction

of the step and the minimum

of

at, set

a;=

( a

as. This

atl < la.

iflaC–

c>

– atl,

otherwise.

a ,5 7

choice

is based

on the

observation

that

during

extrapolation it is sensible to be cautious and to choose the step closest to at. The trial value a~ defined above may be outside of the interval with at and aU as endpoints, or it may be in this interval but close to aU. Either situation is undesirable, so we redefine a; by setting min{at

+_ at — {

+ ~(au

max{a~+

8(aU

for some 8

– at),

a:},

otherwise,

we use 8 = 0.66. available at al and

al,

at indicates

that

function is decreasing rapidly in the direction of the step, but there seem to be a good way to choose a: from the available information. Case 4: f~ < fz, gtgl >0, minimizer of the cubic that 5. NUMERICAL The

set of test

algorithm have

the

problems

that

convex

of concavity,

functions

have

done in double precision The region of concavity

we use to illustrate general

functions.

while

the

three

a unique

last

minimizer.

The

first

functions Our

as the

of the

search

the second d(a)

is concave

are

functions

convex. results

In

left

of the

all

were

set is to the right

to the

-(a,:b)

function

three

numerical

on an IPX Sparcstation. of the first function in the test

+(a)=

P = 2, while

the behavior

and

the minimizer, while the second function mizer. The first function is defined by

with

a:

RESULTS

includes

regions

cases

In this case we choose and Igtl > Igll. fu, ft, gu, and g~. interpolates

the

does not

of

mini-

(5.1)

is defined

by

= (a+/3)5-2(a+/?)4

(5.2)

with ~ = 0.004. Plots for these two functions appear in Figures 3 and 4. The third function in the test set was suggested by Paul Plassmann. This function

is defined

in terms

Cj(a) ACM

Transactions

on Mathematical

of the parameters

= @o(a) Software,

+ Vol

2(1–6) ~m 20, No

1 and

sin

~ by

1’T —CY (1 2

3, September

,

1994

(5.3)

Line Search Algorithms

301

.

0.00 -0.05 -0.10 -0.15 . -0.20 . -0.25

Fig. 3.

Plot

of function

(5.1) with

~ = 2.

.

-0.30 . -0.35 -0.40

0

2

4

6

8

10

12

14

0.5

16

7

0,0 -0.5 -1.0

Fig. 4. Plot of function (5.2) with ~ = 0.004.

-1.5 . -2.0 . -2,5 . -3.0 0.0 0.2 0.4 0.6 0.8 10 1.2 1.4 1.6 18 2.0

where

    φ₀(α) = 1 − α,                   if α ≤ 1 − β,
    φ₀(α) = α − 1,                   if α ≥ 1 + β,
    φ₀(α) = (α − 1)²/(2β) + β/2,     if |α − 1| ≤ β.

The parameter β controls the size of the interval where the curvature condition (1.2) can hold: φ'(0) = −β, and |φ'(α)| ≥ β for |α − 1| ≥ β, so (1.2) can hold only for |α − 1| < β. The parameter l controls the number of oscillations in the function for |α − 1| ≥ β, because in that interval φ″(α) is a multiple of sin((lπ/2)α). Note that if l is odd, then φ'(1) = 0, and that if l = 4k − 1 for some integer k ≥ 1, then φ″(1) > 0. Also note that φ is convex for |α − 1| < β if βlπ(1 − β) ≤ 2. We have chosen β = 0.01 and l = 39. A plot of this function with these parameter settings appears in Figure 5.
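The first three test functions translate directly into code. A Python sketch for experimentation (the names are ours; `phi3` uses the piecewise φ₀ given above):

```python
import math

def phi1(a, beta=2.0):
    # Function (5.1): minimizer at sqrt(beta), concave to its right.
    return -a / (a * a + beta)

def phi2(a, beta=0.004):
    # Function (5.2): minimizer at 1.6 - beta; phi'(0) = beta**3 * (5*beta - 8).
    return (a + beta) ** 5 - 2.0 * (a + beta) ** 4

def phi3(a, beta=0.01, ell=39):
    # Function (5.3): piecewise phi0 plus an oscillatory term.
    if a <= 1.0 - beta:
        phi0 = 1.0 - a
    elif a >= 1.0 + beta:
        phi0 = a - 1.0
    else:
        phi0 = (a - 1.0) ** 2 / (2.0 * beta) + beta / 2.0
    return phi0 + 2.0 * (1.0 - beta) / (ell * math.pi) * math.sin(ell * math.pi * a / 2.0)
```

Central differences reproduce the stated properties, for instance φ'(0) ≈ −β for (5.3) and a stationary point of (5.1) at α* = √2 ≈ 1.4.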

Fig. 5. Plot of function (5.3) with β = 0.01 and l = 39.

The other three functions in the test set are from Yanai et al. [1981]. These functions are defined in terms of parameters β₁ and β₂ by

    φ(α) = γ(β₁) [(1 − α)² + β₂²]^{1/2} + γ(β₂) [α² + β₁²]^{1/2},   (5.4)

where

    γ(β) = (1 + β²)^{1/2} − β.

These functions are convex, but different choices of β₁ and β₂ lead to functions with quite different characteristics. This can be seen clearly in Figures 6–8.
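Function (5.4) can be sketched as follows (again the names are ours):

```python
import math

def gamma(b):
    # gamma(beta) = (1 + beta^2)^(1/2) - beta
    return math.sqrt(1.0 + b * b) - b

def phi4(a, beta1, beta2):
    # Function (5.4) from Yanai et al. [1981].
    return (gamma(beta1) * math.sqrt((1.0 - a) ** 2 + beta2 ** 2)
            + gamma(beta2) * math.sqrt(a ** 2 + beta1 ** 2))
```

For β₁ = β₂ the two terms are mirror images about α = 1/2, so by symmetry and convexity the minimizer is at α = 1/2; asymmetric settings move the minimizer toward one endpoint.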

In Tables I–VI we present numerical results for different choices of the problems and for different starting points. We have used α₀ = 10^i for i = ±1, ±3; we are particularly interested in the behavior of the search algorithm from the remote starting points α₀ = 10^{±3}. In our numerical results we have used different values of μ and η in order to illustrate different features of the problems and the behavior of the search algorithm. In many problems we have used η = 0.1 because this value is typical of those used in an optimization setting, and we comment on what happens for other values of μ and η. The general trend is for the number of function evaluations to decrease if μ is decreased or if η is increased. The reason for this trend is that as μ is decreased or η is increased, the measure of the set of acceptable values of α increases.

An interesting feature of the numerical results for the function in Figure 3 is that values of α much larger than α* = 1.4 can satisfy (1.1) and (1.2). This should be clear from Figure 3 and from the results in Table I. These results show that if we use μ = 0.001 and η = 0.1, then the starting point α₀ = 10 satisfies (1.1) and (1.2), and thus the search algorithm exits with α₀. Similarly, the search algorithm exits with α_m = 37 when the starting point is α₀ = 10⁺³. We can avoid termination at points far away from the minimizer α* by increasing μ or decreasing η. If we increase μ and set μ = η = 0.1, then the algorithm terminates with α_m = 1.6 when α₀ = 10 and with α_m = 1.6 when α₀ = 10⁺³. There is no change in the behavior of the algorithm from the other two starting points.
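This behavior of Table I can be checked directly. For function (5.1), conditions (1.1) and (1.2) amount to two inequalities; the sketch below (helper names are ours) confirms that α = 10 and α = 37 are acceptable for μ = 0.001 and η = 0.1, while decreasing η to 0.001 rejects α = 37:

```python
def phi(a, beta=2.0):
    # Test function (5.1).
    return -a / (a * a + beta)

def dphi(a, beta=2.0):
    # Derivative of (5.1): (a^2 - beta) / (a^2 + beta)^2.
    return (a * a - beta) / (a * a + beta) ** 2

def acceptable(a, mu, eta):
    """True if alpha satisfies the sufficient decrease condition (1.1)
    and the curvature condition (1.2) for phi."""
    decrease = phi(a) <= phi(0.0) + mu * dphi(0.0) * a
    curvature = abs(dphi(a)) <= eta * abs(dphi(0.0))
    return decrease and curvature
```

Note that |φ'(37)| ≈ 7.3·10⁻⁴, matching the value reported in Table I.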


Fig. 6. Plot of function (5.4) with β₁ = 0.001 and β₂ = 0.001.

Fig. 7. Plot of function (5.4) with β₁ = 0.01 and β₂ = 0.001.

Fig. 8. Plot of function (5.4) with β₁ = 0.001 and β₂ = 0.01.

If we decrease η by setting η = 0.001 but leave μ unchanged at μ = 0.1, then the final iterate α_m is near α* for all starting points. For η = 0.001 the search algorithm needs 6 function evaluations for α₀ = 10 and ten function evaluations for α₀ = 10⁺³. The number of function evaluations for α₀ = 10⁻³ and α₀ = 10⁻¹ is, respectively, 8 and 4. This increase in the number of function evaluations is to be expected because now the set of acceptable α is smaller.

Another interesting feature of the results in Table I is that the six function evaluations needed for α₀ = 10⁻³ could have been predicted from the nature of the extrapolation process.

Table I. Results for Function in Figure 3 with μ = 0.001 and η = 0.1

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      6     1.4      −9.2·10⁻³
    10⁻¹      1      3     1.4       4.7·10⁻³
    10⁺¹      1      1     10        9.4·10⁻³
    10⁺³      1      4     37        7.3·10⁻⁴

Table II. Results for Function in Figure 4 with μ = η = 0.1

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      12    1.6       7.1·10⁻⁹
    10⁻¹      1      8     1.6       1.0·10⁻¹⁰
    10⁺¹      1      8     1.6      −5.0·10⁻⁹
    10⁺³      1      11    1.6      −2.3·10⁻⁸

Table III. Results for Function in Figure 5 with μ = η = 0.1

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      12    1.0      −5.1·10⁻⁵
    10⁻¹      1      12    1.0      −1.9·10⁻⁴
    10⁺¹      1      10    1.0      −2.0·10⁻⁶
    10⁺³      1      13    1.0      −1.6·10⁻⁵

Table IV. Results for Function in Figure 6 with μ = η = 0.001

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      4     0.08     −6.9·10⁻⁵
    10⁻¹      1      1     0.10     −4.9·10⁻⁵
    10⁺¹      1      3     0.35     −2.9·10⁻⁶
    10⁺³      1      4     0.83      1.6·10⁻⁵

Table V. Results for Function in Figure 7 with μ = η = 0.001

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      6     0.075     1.9·10⁻⁴
    10⁻¹      1      3     0.078     7.4·10⁻⁴
    10⁺¹      1      7     0.073    −2.6·10⁻⁴
    10⁺³      1      8     0.076     4.5·10⁻⁴

Table VI. Results for Function in Figure 8 with μ = η = 0.001

    α₀       Info    m     α_m      φ'(α_m)
    10⁻³      1      13    0.93      5.2·10⁻⁴
    10⁻¹      1      11    0.93      8.4·10⁻⁵
    10⁺¹      1      8     0.92     −2.4·10⁻⁴
    10⁺³      1      11    0.92     −3.2·10⁻⁴

This can be explained by noting that in a typical situation the extrapolation process generates iterates by setting α⁺ = α_t + δ(α_t − α_l) with δ = 4, and thus the search algorithm generates the iterates 0.005, 0.021, 0.085, 0.341, and 1.365 until the minimizer α* = 1.4 is bracketed or until one of these iterates satisfies the termination conditions. This implies, for example, that if the minimizer is bracketed, then either one of these iterates satisfies (1.1) and (1.2), or at least six function evaluations are required before the search algorithm exits.

The number of function evaluations needed to find an acceptable α depends on the measure of the set of acceptable α. From this point of view, the only difficult test problems are the functions in Figures 4 and 5, because for these functions the set of acceptable α is small. The choice of β = 0.004 for the function in Figure 4 guarantees that this function has a large region of concavity, but it also forces φ'(0) to be quite small (approximately −5·10⁻⁷). As a consequence, (1.2) is quite restrictive for any η < 1. Similar remarks apply to the numerical results for the function in Figure 5. This is a difficult test problem because information based on derivatives is unreliable as a result of the oscillations in the function. Moreover, as already noted, (1.2) can hold only for |α − 1| < β = 0.01.

The results in Table VI for the function in Figure 8 were obtained with μ = η = 0.001; we would tend to use η = 0.001 when an accurate estimate of the minimizer is desired, and with these settings the final iterate α_m is near a minimizer for each starting point. Recall that the search algorithm exits as soon as it generates an iterate α_k that satisfies (1.1) and (1.2), so less restrictive values of μ and η yield an acceptable α sooner. For example, for the function in Figure 8 with μ = 0.001 and η = 0.1, the number of function evaluations needed to obtain an acceptable α from the starting points α₀ = 10^i for i = −3, −1, 1, and 3 would be, respectively, 2, 1, 3, and 4. Similar results would be obtained for the functions in Figures 6 and 7.
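The extrapolation pattern described above is easy to reproduce: starting from α_l = 0 and α_t = α₀ = 10⁻³, repeated application of α⁺ = α_t + δ(α_t − α_l) with δ = 4 generates the quoted iterates (a sketch; the loop structure is ours):

```python
def extrapolation_iterates(a0, steps=5, delta=4.0):
    """First few iterates of a+ = a_t + delta * (a_t - a_l),
    starting from a_l = 0 and a_t = a0."""
    a_l, a_t = 0.0, a0
    out = []
    for _ in range(steps):
        a_l, a_t = a_t, a_t + delta * (a_t - a_l)
        out.append(a_t)
    return out
```

With a0 = 0.001 this yields 0.005, 0.021, 0.085, 0.341, 1.365, matching the sequence above.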

ACKNOWLEDGMENTS

This work benefited from discussions and suggestions from many sources. We thank, in particular, Bill Davidon, Roger Fletcher, and Michael Powell. We are also grateful to Jorge Nocedal for his encouragement during the final phases of this work, to Paul Plassmann for providing the monster function of Section 5, and to Gail Pieper for her careful reading of the manuscript. Finally, we thank the referees for their helpful comments.

REFERENCES

AL-BAALI, M. 1985. Descent property and global convergence of the Fletcher-Reeves method with inexact line searches. IMA J. Numer. Anal. 5, 121–124.

AL-BAALI, M., AND FLETCHER, R. 1984. An efficient line search for nonlinear least squares. J. Optim. Theory Appl. 48, 359–377.

BYRD, R. H., NOCEDAL, J., AND YUAN, Y. 1987. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24, 1171–1190.

DENNIS, J. E., AND SCHNABEL, R. B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, N.J.

FLETCHER, R. 1980. Practical Methods of Optimization. Vol. 1, Unconstrained Optimization. Wiley, New York.

FLETCHER, R. 1987. Practical Methods of Optimization. 2nd ed. Wiley, New York.

GILBERT, J. C., AND NOCEDAL, J. 1992. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42.

GILL, P. E., AND MURRAY, W. 1974. Safeguarded steplength algorithms for optimization using descent methods. Rep. NAC 37, National Physical Laboratory, Teddington, England.

GILL, P. E., MURRAY, W., SAUNDERS, M. A., AND WRIGHT, M. H. 1979. Two steplength algorithms for numerical optimization. Rep. SOL 79-25, Systems Optimization Laboratory, Stanford Univ., Stanford, Calif.

HAGER, W. W. 1989. A derivative-based bracketing scheme for univariate minimization and the conjugate gradient method. Comput. Math. Appl. 18, 779–795.

LEMARECHAL, C. 1981. A view of line-searches. In Optimization and Optimal Control, A. Auslender, W. Oettli, and J. Stoer, Eds. Lecture Notes in Control and Information Science, vol. 30. Springer-Verlag, Heidelberg, 59–78.

LIU, D. C., AND NOCEDAL, J. 1989. On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528.

MORÉ, J. J., AND SORENSEN, D. C. 1984. Newton's method. In Studies in Numerical Analysis, G. H. Golub, Ed. Mathematical Association of America, Washington, D.C., 29–82.

O'LEARY, D. P. 1990. Robust regression computation using iteratively reweighted least squares. SIAM J. Matrix Anal. Appl. 11, 466–480.

POWELL, M. J. D. 1976. Some global convergence properties of a variable metric method without exact line searches. In Nonlinear Programming, R. W. Cottle and C. E. Lemke, Eds. SIAM-AMS Proceedings, vol. IX. American Mathematical Society, Providence, R.I., 53–72.

SCHLICK, T., AND FOGELSON, A. 1992a. TNPACK—A truncated Newton minimization package for large-scale problems: I. Algorithm and usage. ACM Trans. Math. Softw. 18, 1, 46–70.

SCHLICK, T., AND FOGELSON, A. 1992b. TNPACK—A truncated Newton minimization package for large-scale problems: II. Implementation examples. ACM Trans. Math. Softw. 18, 1, 71–111.

SHANNO, D. F., AND PHUA, K. H. 1976. Minimization of unconstrained multivariate functions. ACM Trans. Math. Softw. 2, 87–94.

SHANNO, D. F., AND PHUA, K. H. 1980. Remark on Algorithm 500. ACM Trans. Math. Softw. 6, 618–622.

YANAI, H., OZAWA, M., AND KANEKO, S. 1981. Interpolation methods in one dimensional optimization. Computing 27, 155–163.

Received December 1992; revised and accepted August 1993

ACM Transactions on Mathematical Software, Vol. 20, No. 3, September 1994.
