Line Search Algorithms with Guaranteed Sufficient Decrease

JORGE J. MORÉ and DAVID J. THUENTE
Argonne National Laboratory

The development of software for minimization problems is often based on a line search method. We consider the problem of finding a point that satisfies the sufficient decrease and curvature conditions of a line search method. We formulate the problem of determining such a point in terms of finding a point in a set T(μ). We describe a search algorithm for this problem that produces a sequence of iterates that converge to a point in T(μ) and that, except for pathological cases, terminates in a finite number of steps. Numerical results for an implementation of the search algorithm on a set of test functions show that the algorithm terminates within a small number of iterations.

Categories and Subject Descriptors: G.1.6 [Numerical Analysis]: Optimization: constrained optimization; gradient methods; nonlinear programming; G.4 [Mathematics of Computing]: Mathematical Software: algorithm analysis; efficiency; reliability and robustness

General Terms: Algorithms

Additional Key Words and Phrases: Conjugate gradient algorithms, line search algorithms, truncated Newton algorithms, variable metric algorithms
1. INTRODUCTION

Given a continuously differentiable function φ with φ′(0) < 0, we consider the problem of finding an α > 0 such that

    φ(α) ≤ φ(0) + μφ′(0)α    (1.1)

and

    |φ′(α)| ≤ η|φ′(0)|.    (1.2)

The development of a search procedure that satisfies these conditions is a crucial ingredient of a line search method for minimization. The search algorithm described in this paper has been used by several authors, for example, by Liu and Nocedal [1989], O'Leary [1990], Schlick and Fogelson [1992a; 1992b], and Gilbert and Nocedal [1992]. This paper describes this search procedure and the associated convergence theory.

This work was supported by the Office of Scientific Computing, U.S. Department of Energy, under contract W-31-109-Eng-38. Authors' addresses: J. J. Moré, Mathematics and Computer Science Division, Argonne National Laboratory, 9700 South Cass Avenue, Argonne, IL 60439; D. J. Thuente, Department of Mathematical Sciences, Indiana-Purdue University, Fort Wayne, IN 46805. Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1994 ACM 0098-3500/94/0900-0286 $03.50. ACM Transactions on Mathematical Software, Vol. 20, No. 3, September 1994, Pages 286-307.
In a line search method, we are given a continuously differentiable function f: Rⁿ → R and a descent direction p for f at a given point x ∈ Rⁿ. Thus, if

    φ(α) = f(x + αp),  α ≥ 0,    (1.3)

then conditions (1.1) and (1.2) define an acceptable step in a line search method. The motivation for requiring (1.1) should be clear: condition (1.1) forces a sufficient decrease in the function. However, this condition alone is not sufficient to guarantee convergence, because it allows arbitrarily small choices of α > 0. Condition (1.2) rules out arbitrarily small choices of α and usually guarantees that α is near a local minimizer of φ. Condition (1.2) is a curvature condition because it implies that

    φ′(α) − φ′(0) ≥ (1 − η)|φ′(0)|,

and thus the average curvature of φ on (0, α) is positive. The curvature condition (1.2) is particularly important in a quasi-Newton method, because it guarantees that a positive definite quasi-Newton update is possible. See, for example, Dennis and Schnabel [1983] and Fletcher [1987]. As final motivation for the solution of (1.1) and (1.2), we mention that if a step satisfies these conditions, then the line search method is convergent for reasonable choices of direction. See, for example, Dennis and Schnabel [1983] and Fletcher [1987] for gradient-related methods; Powell [1976] and Byrd et al. [1987] for quasi-Newton methods; and Al-Baali [1985], Liu and Nocedal [1989], and Gilbert and Nocedal [1992] for conjugate gradient methods.

In most practical situations, it is important to impose additional requirements on α. In particular, it is natural to require that α satisfy the bounds

    α_min ≤ α ≤ α_max.    (1.4)

The main reason for requiring a lower bound α_min is to terminate the iteration, while the upper bound α_max is needed when the search is used for linearly constrained optimization problems or when the function φ is unbounded below. In linearly constrained optimization problems, the parameter α_max is a function of the distance to the nearest active constraint. An unbounded problem can be approached by accepting any α in [α_min, α_max] such that φ(α) ≤ φ_min, where φ_min < φ(0) is a lower bound specified by the user of the search. In this case

    α_max = (φ_min − φ(0)) / (μφ′(0))    (1.5)

is a reasonable setting, because if α_max satisfies the sufficient decrease condition (1.1), then φ(α_max) ≤ φ_min. On the other hand, if α_max does not satisfy the sufficient decrease condition, then we will show that it is possible to determine an acceptable α.
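In code, the two conditions translate directly into a small test. The sketch below uses hypothetical callables phi and dphi for φ and φ′; it is an illustration, not the authors' implementation:

```python
def satisfies_conditions(phi, dphi, alpha, mu=1e-4, eta=0.1):
    """Check the sufficient decrease condition (1.1) and the curvature
    condition (1.2) for a trial step alpha along a descent direction."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    assert dphi0 < 0.0, "phi must be decreasing at 0 (descent direction)"
    sufficient_decrease = phi(alpha) <= phi0 + mu * dphi0 * alpha   # (1.1)
    curvature = abs(dphi(alpha)) <= eta * abs(dphi0)                # (1.2)
    return sufficient_decrease and curvature
```

For the quadratic φ(α) = α² − 2α, the step α = 1 satisfies both conditions, while α = 2 fails the sufficient decrease condition.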
The main problem considered in this paper is to find an acceptable α such that α belongs to the set

    T(μ) = {α > 0: φ(α) ≤ φ(0) + αμφ′(0), |φ′(α)| ≤ μ|φ′(0)|}.

By phrasing our results in terms of T(μ), we make it clear that the search algorithm is independent of η; the parameter η is used only in the termination test of the algorithm. Another advantage of phrasing the results in terms of T(μ) is that T(μ) is usually not empty. For example, T(μ) is not empty when φ is bounded below.

Several authors have done related work on the solution of (1.1) and (1.2). For example, Gill and Murray [1974] attacked the solution of (1.1) and (1.2) by using a univariate minimization algorithm for φ to find a solution α* to (1.1). If α* did not satisfy (1.1), then α* was repeatedly halved in order to obtain a solution to (1.1). Of course, α* did not necessarily satisfy (1.2), but it was argued that if μ was sufficiently small, then this was an unlikely event. In a similar vein, we mention that the search algorithm of Shanno and Phua [1976; 1980] is not guaranteed to work in all cases. In particular, the sufficient decrease condition (1.1) can rule out many of the points computed by their algorithm, and then the algorithm is not guaranteed to converge.

Gill et al. [1979] proposed an interesting algorithm to compute a point that satisfies (1.1) and (1.2), or just (1.1), when the equation

    φ(α) = φ(0) + μφ′(0)α    (1.6)

has a solution; in that case each interval generated by their algorithm contains points that satisfy these conditions. Their algorithm, however, is not guaranteed to converge if there is no solution to (1.6).

Lemaréchal [1981] developed a search algorithm for finding a step that satisfies the sufficient decrease condition (1.1) and the variation

    φ′(α) ≥ ηφ′(0)    (1.7)

of the curvature condition (1.2), and he proved that the algorithm terminates in a finite number of iterations whenever μ < η. These conditions are of interest because (1.1) and (1.7) are adequate for many optimization algorithms. A possible disadvantage of these two conditions is that acceptable steps can have higher function values than those where (1.2) holds. On the other hand, satisfying the stronger conditions (1.1) and (1.2) requires a more elaborate algorithm.

Fletcher [1980] suggested that it is possible to compute a sequence of nested intervals that contain points that satisfy (1.1) and (1.2), but he did not prove any result along these lines. This suggestion led to the algorithms developed by Al-Baali and Fletcher [1984] and Moré and Sorensen [1984]. In this paper we provide a convergence analysis, implementation details, and numerical results for the algorithm of Moré and Sorensen [1984].

In addition to the preceding references, there are numerous references on univariate minimization algorithms and their rates of convergence. We mention, in particular, the paper of Hager [1989] because of its connection with the work of Al-Baali and Fletcher [1984] and Moré and Sorensen [1984].
In this paper we establish convergence and finite-termination results for the search algorithm that is defined in Section 2; rate-of-convergence results are not appropriate, because the algorithm either terminates in a finite number of steps or produces a sequence of iterates that converge to a point in T(μ). We show that the search algorithm produces a finite sequence α_0, ..., α_m of trial values in [α_min, α_max] and that, except for pathological cases, the search algorithm terminates at an α_m that either lies in T(μ) or is one of the bounds α_min or α_max. Termination at one of the bounds can be avoided by a suitable choice of bounds. For example, if α_min = 0 and α_max is defined by (1.5), then α_m lies in T(μ) or satisfies φ(α_m) ≤ φ_min. The results of Section 2 show that the search algorithm can be used to find a point that satisfies (1.1) and (1.2) when μ ≤ η. A result for an arbitrary η ∈ (0, μ) requires additional assumptions, because there may not be an α that satisfies (1.1) and (1.2) even if φ is bounded below. In Section 3 we show that if the search algorithm generates an iterate α_k that satisfies the sufficient decrease condition and φ′(α_k) ≥ 0, then a modified search algorithm terminates at an α_k that satisfies (1.1) and (1.2).

Given α_0 in [α_min, α_max], the search algorithm generates a sequence of nested intervals {I_k} and a sequence of iterates α_k in I_k ∩ [α_min, α_max]. Section 4 describes the specific choices for the trial values α_k that are used in our algorithm. Our numerical results indicate that these choices lead to fast termination.

Section 5 describes a set of test problems and numerical results for the search procedure. The first three functions in the test set have regions of concavity, while the last three functions are convex. In all cases the functions have a unique minimizer. The emphasis in the numerical results is to explain the qualitative behavior of the algorithm for a wide range of values of μ and η.
2. THE SEARCH ALGORITHM

In this section we define the search algorithm for T(μ). We assume that φ′(0) < 0. Many convergence results require that μ < 1/2, because if φ is a quadratic with a minimizer on [0, ∞), then the global minimizer α* of φ satisfies

    φ(α*) = φ(0) + ½α*φ′(0),

and thus α* satisfies (1.1) only if μ ≤ 1/2. The restriction μ < 1/2 also allows α = 1 to be ultimately acceptable to Newton and quasi-Newton methods. In this section we need only assume that μ lies in (0, 1).

Given α_0 in [α_min, α_max], the search algorithm generates a sequence of nested intervals {I_k} and a sequence of iterates α_k ∈ I_k ∩ [α_min, α_max] according to the following procedure:

Search Algorithm. Set I_0 = [0, ∞].
For k = 0, 1, ...:
- Choose a safeguarded α_k ∈ I_k ∩ [α_min, α_max].
- Test for convergence.
- Update the interval I_k.
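The loop above can be sketched in code. The following is a minimal illustration, not the authors' implementation: bisection stands in for the interpolation-based trial values of Section 4, and the safeguarding rules are reduced to simple doubling and clamping.

```python
import math

def line_search(phi, dphi, a0, a_min, a_max, mu=1e-4, eta=0.1, max_iter=20):
    """Sketch of the search algorithm: safeguarded trial values in nested intervals."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    psi = lambda a: phi(a) - phi0 - mu * dphi0 * a       # auxiliary function
    dpsi = lambda a: dphi(a) - mu * dphi0
    al, au = 0.0, math.inf                               # I0 = [0, inf]
    at = min(max(a0, a_min), a_max)                      # safeguarded initial trial
    for _ in range(max_iter):
        # Test for convergence: conditions (1.1) and (1.2).
        if phi(at) <= phi0 + mu * dphi0 * at and abs(dphi(at)) <= eta * abs(dphi0):
            return at
        # Update the interval Ik (Cases U1-U3 of the updating algorithm).
        if psi(at) > psi(al):
            au = at                                      # U1
        elif dpsi(at) * (al - at) > 0.0:
            al = at                                      # U2
        else:
            al, au = at, al                              # U3
        # Choose the next trial value, safeguarded into [a_min, a_max].
        at = min(2.0 * at, a_max) if math.isinf(au) else 0.5 * (al + au)
        at = min(max(at, a_min), a_max)
    return at
```

For the quadratic φ(α) = α² − 2α this sketch reaches a step near the minimizer α = 1 in a handful of iterations.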
In this description of the search algorithm, the convergence test is not specified, and the choice of α_k is made with a safeguarded procedure; the safeguarding rules that force finite termination are given later in this section. The aim of the search algorithm is to generate intervals I_k such that T(μ) ∩ I_k is not empty, and then to refine the intervals. We now specify the conditions on the endpoints of the intervals I_k; we do not assume that the endpoints α_l and α_u of I_k are ordered. The conditions are specified in terms of the auxiliary function

    ψ(α) = φ(α) − φ(0) − μφ′(0)α,

and require that

    ψ(α_l) ≤ ψ(α_u),  ψ(α_l) ≤ 0,  ψ′(α_l)(α_u − α_l) < 0.    (2.1)

THEOREM 2.1. Let I be a closed interval with endpoints α_l and α_u that satisfy conditions (2.1). Then there is an α* in T(μ) ∩ I.

PROOF. Assume that α_u > α_l; the proof for the case α_u < α_l is analogous. Define

    α_m = sup{α ∈ [α_l, α_u]: ψ(α) ≤ ψ(α_l)},

and let α* be a global minimizer of ψ on [α_l, α_m]. We claim that ψ(α_m) ≥ ψ(α_l): this is clear in the case α_m = α_u, because ψ(α_u) ≥ ψ(α_l), and if α_m < α_u, then ψ(α_m) = ψ(α_l) by continuity. The global minimum of ψ on [α_l, α_m] cannot be achieved at α_l, because ψ′(α_l) < 0, and given that ψ(α_m) ≥ ψ(α_l), the global minimum also cannot be achieved at α_m. Hence, α* is in the interior of [α_l, α_m], so ψ′(α*) = 0, and thus |φ′(α*)| = μ|φ′(0)|. We also know that α* satisfies (1.1), because ψ(α*) ≤ ψ(α_l) ≤ 0. Hence, α* ∈ T(μ), as desired. □

Theorem 2.1 provides the motivation for the search algorithm by showing that if the endpoints of I satisfy (2.1), then ψ has a minimizer α* in the interior of I and, moreover, that α* belongs to T(μ). Thus the search algorithm can be viewed as a procedure for locating a minimizer of ψ. The assumptions that Theorem 2.1 imposes on the endpoints α_l and α_u cannot be relaxed, because if we fix α_l and α_u by the assumption ψ(α_l) ≤ ψ(α_u), then the result fails to hold if either of the other two assumptions is violated. The assumptions (2.1) can be paraphrased by saying that α_l is the endpoint with the lowest ψ value, that α_l satisfies the sufficient decrease condition (1.1), and that α_u − α_l is a descent direction for ψ at α_l, so that ψ(α) < ψ(α_l) for all α in I sufficiently close to α_l. In particular, this last assumption guarantees that ψ can be decreased by searching near α_l.
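The auxiliary function ψ and the endpoint conditions (2.1) translate directly into code; this is a sketch with hypothetical helper names:

```python
def make_psi(phi, dphi, mu):
    """Return psi(a) = phi(a) - phi(0) - mu*phi'(0)*a and its derivative."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    return (lambda a: phi(a) - phi0 - mu * dphi0 * a,
            lambda a: dphi(a) - mu * dphi0)

def endpoints_ok(psi_f, psi_g, al, au):
    """Conditions (2.1); the endpoints al, au need not be ordered."""
    return psi_f(al) <= psi_f(au) and psi_f(al) <= 0.0 and psi_g(al) * (au - al) < 0.0
```

For φ(α) = α² − 2α and μ = 0.25, the endpoints α_l = 0, α_u = 2 satisfy (2.1), so by Theorem 2.1 the interval [0, 2] contains a point of T(μ); with the endpoints exchanged, the conditions fail.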
We now describe an algorithm for updating the interval I, and then show how to use this algorithm to obtain an interval that satisfies the conditions of Theorem 2.1.

Updating Algorithm. Given a trial value α_t in I, the endpoints α_l+ and α_u+ of the updated interval I+ are determined as follows:

Case U1: If ψ(α_t) > ψ(α_l), then α_l+ = α_l and α_u+ = α_t.
Case U2: If ψ(α_t) ≤ ψ(α_l) and ψ′(α_t)(α_l − α_t) > 0, then α_l+ = α_t and α_u+ = α_u.
Case U3: If ψ(α_t) ≤ ψ(α_l) and ψ′(α_t)(α_l − α_t) < 0, then α_l+ = α_t and α_u+ = α_l.

It is straightforward to show that if the endpoints of I satisfy conditions (2.1), then the endpoints of I+ also satisfy (2.1). Our updating algorithm can be compared with Scheme S2 of Al-Baali and Fletcher [1984], which determines the endpoints of I+ as follows: if ψ(α_t) > 0 or φ(α_t) ≥ φ(α_l), then α_l+ = α_l and α_u+ = α_t; else if φ′(α_t)(α_l − α_u) > 0, then α_l+ = α_t and α_u+ = α_u; else α_l+ = α_t and α_u+ = α_l. The two updating algorithms produce the same iterates as long as φ′(α_t) ≤ ηφ′(0), but they differ in their treatment of the situation where ψ(α_t) ≤ 0 and φ′(α_t) > ηφ′(0). In this case, our updating algorithm chooses I+ = [α_l, α_t], while Scheme S2 sets I+ = [α_t, α_u] if φ′(α_t) < 0. Our algorithm seems to be preferable in this situation, because the interval I+ = [α_l, α_t] contains an acceptable point, while the interval generated by Scheme S2 is not guaranteed to contain an acceptable point.

We now show how the updating algorithm can be used to determine an interval I in [0, α_max] with endpoints that satisfy (2.1). Initially, α_l = 0 and α_u = ∞. Consider any trial value α_t in [α_min, α_max]. If Case U1 or U3 holds, then we have determined an interval with endpoints α_l and α_u that satisfy the conditions of Theorem 2.1. Otherwise, Case U2 holds, and we can repeat the process for some α_t+ in [α_t, α_max]. We continue generating trial values in [α_l, α_max] as long as Case U2 holds, but require that eventually α_max be used as a trial value. This is done by choosing

    α_t+ = min{α_t + δ(α_t − α_l), α_max}    (2.2)

for some factor δ > 1. In our implementation we use δ ∈ [1.1, 4].
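Cases U1-U3 can be written as a small function; this sketch takes ψ and ψ′ as callables:

```python
def update_interval(al, au, at, psi_f, psi_g):
    """One step of the updating algorithm: return the endpoints of I+.

    al is the endpoint with the lowest psi value; the endpoints are
    not assumed to be ordered, as in conditions (2.1).
    """
    if psi_f(at) > psi_f(al):          # Case U1
        return al, at
    if psi_g(at) * (al - at) > 0.0:    # Case U2
        return at, au
    return at, al                      # Case U3
```

With ψ(α) = α² − 1.5α (which arises from φ(α) = α² − 2α and μ = 0.25), a trial value α_t = 2 triggers Case U1, α_t = 0.5 triggers Case U2, and α_t = 1 triggers Case U3.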
This choice produces trial values that satisfy α_t+ − α_t = δ(α_t − α_l), where α_l is the previous value of α_t. Thus, if α_t ≥ δα_l, then α_t+ ≥ δα_t, and an induction argument shows that the trial values grow geometrically, so that α_max is reached after a finite number of steps. Since α_l = 0 initially, the sequence of trial values is increasing as long as the iterates satisfy

    ψ(α_k) ≤ 0 and ψ′(α_k) < 0,  k = 0, 1, ....    (2.3)

If, on the other hand, we use an α_t with

    ψ(α_t) > 0 or ψ′(α_t) > 0,    (2.4)

then the updating algorithm shows that the sequence of trial values is decreasing, while if ψ(α_t) ≤ 0 and ψ′(α_t) < 0, then the updating algorithm shows that the interval I+ lies to the right of α_t; since we only use trial values α_t in [α_min, α_max], in this last case the interval I+ contains [α_t, α_max].

We force the search algorithm to use α_min as a trial value when (2.4) holds and α_min > 0. This is done by choosing

    α_t+ ∈ [α_min, max{δ_min α_t, α_min}]    (2.5)

for some factor δ_min < 1. In our implementation we use δ_min = 7/12. For more details, see Section 4.

The search algorithm terminates at α_min if ψ(α_min) > 0 or ψ′(α_min) > 0. This is a reasonable termination criterion, because when these conditions hold, Theorem 2.1 shows that there is an α* ∈ T(μ) with α* < α_min. Thus, after a finite number of trial values, either the search algorithm terminates at α_min, or the search algorithm generates an interval I in [α_min, α_max] with endpoints that satisfy conditions (2.1).

The requirements (2.2) and (2.5) are two of the safeguarding rules. Note that (2.2) is enforced only when (2.3) holds, while (2.5) is used when (2.4) holds. If the search algorithm generates an interval I in [α_min, α_max], then we need a third rule to guarantee that the choice of α_t forces the length of I to zero. In our implementation this is done by monitoring the length of I: if the length of I does not decrease by a factor of δ < 1 after two trials, then a bisection step is used for the next trial α_t. We use δ = 0.66.
THEOREM 2.2. The search algorithm produces a sequence of trial values {α_k} in [α_min, α_max] such that after a finite number of trial values one of the following conditions holds:

(1) The search terminates at α_max, the sequence of trial values is increasing, and (2.3) holds.
(2) The search terminates at α_min, the sequence of trial values is decreasing, and (2.4) holds.
(3) An interval I_k ⊂ [α_min, α_max] with endpoints that satisfy (2.1) is generated.

PROOF. In this proof we essentially summarize the arguments presented above. Let α_l(k) and α_u(k) be the endpoints of I_k, and define

    β_l(k) = min{α_l(k), α_u(k)},  β_u(k) = max{α_l(k), α_u(k)}.

The left endpoint β_l(k) of I_k is nondecreasing, while the right endpoint β_u(k) is nonincreasing. We first show that β_u(k) = ∞ cannot hold for all k ≥ 0. If β_u(k) = ∞, then only Case U2 of the updating algorithm holds, because in the other two cases both endpoints are set to finite values. Since Case U2 holds, (2.3) holds, and thus the safeguarding rule (2.2) shows that eventually the bound α_max is used as a trial value. Given α_max as a trial value, it is clear that the search either terminates at α_max or β_u(k) = α_max. A similar argument shows that β_l(k) = 0 cannot hold for all k ≥ 0. If β_l(k) = 0, then only Case U1 or U3 of the updating algorithm holds, because in Case U2 both endpoints are set to positive values; moreover, in this case (2.4) holds. The safeguarding rule (2.5) shows that (2.4) cannot hold for all k ≥ 0 when α_min = 0, because ψ(0) = 0 and ψ′(0) < 0, while if α_min > 0, the safeguarding rule shows that eventually α_min is used as a trial value, and thus the search either terminates at α_min or β_l(k) = α_min. We have thus shown that after a finite number of trial values, the search terminates at one of the bounds α_min or α_max, or an interval I_k with β_l(k) > 0 and β_u(k) < ∞ is generated; in this last case I_k is a subset of [α_min, α_max]. □

The most interesting case of Theorem 2.2 is the last case. Theorem 2.2 shows that an interval I_k ⊂ [α_min, α_max] is generated after a finite number of trial values if

    ψ(α_max) > 0 or ψ′(α_max) > 0    (2.6)

and

    ψ(α_min) ≤ 0 and ψ′(α_min) < 0.    (2.7)
Conditions (2.6) and (2.7) can be easily satisfied. For example, if α_max is defined by (1.5) and φ_min is a strict lower bound for φ, then (2.6) holds; condition (2.6) also holds if ψ′(α_max) > 0. Condition (2.7) holds if α_min = 0.

THEOREM 2.3. If the bounds α_min and α_max satisfy (2.6) and (2.7), then the search algorithm terminates in a finite number of steps with an α_k ∈ T(μ), or the iterates {α_k} converge to some α* ∈ T(μ) with ψ′(α*) = 0. If the search algorithm does not terminate in a finite number of steps, then there is an index k_0 such that the endpoints α_l(k), α_u(k) of the interval I_k satisfy

    β_l(k) < α* < β_u(k),  k ≥ k_0.

Moreover, if ψ(α*) = 0, then ψ′ changes sign on [β_l(k), α*] for all k ≥ k_0, while if ψ(α*) < 0, then φ′ changes sign on [β_l(k), β_u(k)] for all k ≥ k_0.

PROOF. Assume that the search algorithm does not terminate in a finite number of steps. The safeguarding rules force the lengths of the intervals I_k to tend to zero, so the iterates {α_k} and the endpoints of I_k converge to a common limit α*. Since the endpoints of each I_k satisfy (2.1), Theorem 2.1 shows that each I_k contains a point of T(μ), and thus α* ∈ T(μ) with ψ′(α*) = 0. We obtain the index k_0 by noting that the endpoints β_l(k) and β_u(k) converge to α* from opposite sides. □
k. >0 such that ~’(a) ~(a~h~).
H
If the search algorithm does not terminate in a finite number of steps, then Theorem 2.3 implies that ~‘ changes sign an infinite number of times, in the sense that there is a monotone sequence { ~k} that converges to a * and such that 4‘( ~k ) @‘( /3k, ~) IL. In this section we modify the search algorithm and show that under reasonable conditions we can guarantee that the modified search algorithm generates an a~ that satisfies (1.1) and (1.2) for any q > 0. A difficulty with setting q < p is that even if T( ~) is not empty, there may not be an a > 0 that satisfies (1.1) and (1.2). We illustrate this point with a minor
modification
of an example
of A1-Baali
and Fletcher
[1984].
Define
(3.1)
where q < w < p. The solid plot in Figure 1 is the function ~ with m = O.1; the dashed plot is the function 1(a) = +(0) + ~@ ’(0)a with p = 0.25. A computation shows that ~ is continuously differentiable and that
for all
a >0.
Moreover,
if p < ~, then 1–(7
l–p — [ l–a’2(p–
T(/-L)=
a)”
1
Thus 2’(p) is a nonempty interval with a = 1 in the interior. have set a = 0.1 and ~ = 0.25, and thus 2’(p) = [5/6, 3].
In Figure
1 we
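Membership in T(μ) for this example is easy to check numerically. The sketch below assumes the piecewise quadratic/linear form of the Al-Baali and Fletcher example given above, with σ = 0.1 and μ = 0.25:

```python
SIGMA, MU = 0.1, 0.25

def phi(a):
    # Example (3.1): quadratic for a <= 1, linear with slope -sigma afterward.
    if a <= 1.0:
        return -a + 0.5 * (1.0 - SIGMA) * a * a
    return -SIGMA * a - 0.5 * (1.0 - SIGMA)

def dphi(a):
    return -1.0 + (1.0 - SIGMA) * a if a <= 1.0 else -SIGMA

def in_T(a):
    """T(mu) = {a > 0: phi(a) <= phi(0) + mu*phi'(0)*a and |phi'(a)| <= mu*|phi'(0)|}."""
    return (a > 0.0
            and phi(a) <= phi(0.0) + MU * dphi(0.0) * a
            and abs(dphi(a)) <= MU * abs(dphi(0.0)))
```

Consistent with T(μ) = [5/6, 3], in_T(1.0) holds while in_T(0.5) and in_T(4.0) do not; and since |φ′(α)| ≥ σ everywhere, no α satisfies the curvature condition (1.2) when η < σ.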
We now show that if during the search for T(μ) we compute a trial value α_k such that ψ(α_k) ≤ 0 and φ′(α_k) > 0, then α_k belongs to T(μ) or we have identified an interval that contains points that satisfy the sufficient decrease condition (1.1) and the curvature condition (1.2).

THEOREM 3.1. Assume that the bounds α_min and α_max satisfy (2.6) and (2.7). Let {α_k} be the sequence generated by the search algorithm, and let α_l(k) and α_u(k) be the endpoints of the interval I_k generated by the search algorithm. If α_k is the first iterate that satisfies

    ψ(α_k) ≤ 0 and φ′(α_k) ≥ 0,    (3.2)

then α_l(k) < α_k, and α_k ∈ T(μ) or φ′(α_k) > 0. If φ′(α_k) > 0, then the interval

    I* = [α_l(k), α_k]

contains an α* that satisfies (1.1) and φ′(α*) = 0. Moreover, any α ∈ I* with φ(α) ≤ φ(α_k) also satisfies (1.1).

PROOF. We first claim that ψ′(α_l(k)) < 0. The endpoint α_l(k) is a previous iterate and satisfies ψ(α_l(k)) ≤ 0, so if ψ′(α_l(k)) ≥ 0 held, then some iterate α_j with j < k would satisfy (3.2). However, this contradicts the assumption that α_k is the first iterate that satisfies (3.2). This proves that ψ′(α_l(k)) < 0. Since ψ′(α_l(k)) < 0 while φ′(α_k) ≥ 0, we have α_k ≠ α_l(k), and the updating algorithm guarantees that α_l(k) < α_k, so I* is well defined.
Fig. 1. Solid plot is φ, defined by (3.1). Dashed plot is l(α) = φ(0) + μφ′(0)α.
If φ′(α_k) > 0, let α* be the global minimizer of φ on I*. Since φ′(α_k) > 0 and α* is a minimizer, α* ≠ α_k. Similarly, since we proved above that ψ′(α_l(k)) < 0 and since α* is a minimizer, α* ≠ α_l(k). We have shown that α* is in the interior of I*. Hence, φ′(α*) = 0, as desired. We complete the proof by noting that if φ(α) ≤ φ(α_k) for some α ∈ I*, then

    φ(α) ≤ φ(α_k) ≤ φ(0) + μφ′(0)α_k ≤ φ(0) + μφ′(0)α.

The second inequality holds because α_k satisfies (1.1), while the third inequality holds because α ≤ α_k and φ′(0) < 0. Hence, any α ∈ I* with φ(α) ≤ φ(α_k) also satisfies (1.1). □

There is no guarantee that the search algorithm will generate an iterate α_k such that ψ(α_k) ≤ 0 and φ′(α_k) > 0. For example, if φ is the function shown in Figure 1, then φ′(α) < 0 for all α. Even if φ has a minimizer α* that satisfies the sufficient decrease condition, the search algorithm may not find any points that satisfy (1.1) and (1.2), and may instead converge to a point in T(μ). We illustrate this point with the function

    φ(α) = −α/(α² + 2) − βα,    (3.3)

with β = 0.03. In Figure 2 the set T(μ) is highlighted; the set of α that satisfies (1.1) and (1.2) is larger than T(μ) if η > μ, and smaller than T(μ) if η < μ. Theorem 2.3 guarantees finite termination at an α_k ∈ T(μ), but α_k may not be near the minimizer α*. We now develop a search procedure that terminates near the minimizer α*, provided ψ(α_k) ≤ 0 and φ′(α_k) > 0 for some iterate.
Fig. 2. Solid plot is φ, defined by (3.3). Dashed plot is l(α) = φ(0) + μφ′(0)α.
Theorem 3.1 is one of the ingredients; we also need the result specified by the following theorem.

THEOREM 3.2. Let I be a closed interval with endpoints α_l and α_u. If the endpoints satisfy

    φ(α_l) ≤ φ(α_u) and φ′(α_l)(α_u − α_l) < 0,

then there is an α* in I with φ(α*) ≤ φ(α_l) and φ′(α*) = 0.

PROOF. The proof is similar to the proof of Theorem 2.1. Let α* be the global minimizer of φ on I; then φ(α*) ≤ φ(α_l), so we need only guarantee that α* is in the interior of I, because then φ′(α*) = 0. The assumption φ′(α_l)(α_u − α_l) < 0 shows that φ can be decreased by moving from α_l into the interior of I, so the global minimum is not achieved at α_l; since φ(α_l) ≤ φ(α_u), it is also not achieved at α_u. Thus α* is in the interior of I. □
Given an iterate α_k with ψ(α_k) ≤ 0 and φ′(α_k) > 0, the modified search algorithm replaces the update of the interval by

    I_{k+1} = [α_l(k), α_k],

because Case U2 does not hold. Moreover, Theorem 3.1 guarantees that any α ∈ I_{k+1} with φ(α) ≤ φ(α_k) satisfies (1.1). This implies that for any iteration j > k the endpoint α_l(j) satisfies (1.1). We also know that the iterates {α_j} must converge to a common limit α*, and Theorem 3.2 shows that there is a θ_k ∈ I_k such that φ(θ_k) ≤ φ(α_l(k)) and φ′(θ_k) = 0. Since the sequence {θ_k} also converges to α*, we obtain φ′(α*) = 0, and thus α_l(j) satisfies (1.2) for all j > k sufficiently large. This proves that the modified search terminates at an iterate that satisfies (1.1) and (1.2).

4. TRIAL VALUE SELECTION
Given the endpoints α_l and α_u of the interval I, and a trial value α_t in I, the updating algorithm described in the preceding section produces an interval I+ that contains acceptable points. We now specify the new trial value α_t+ in I+. We assume that in addition to the endpoints α_l and α_u and the trial value α_t, we have the function values f_l, f_u, f_t and the derivatives g_l, g_u, g_t. These function and derivative values can be obtained from the function φ or from the auxiliary function ψ; the search uses the auxiliary function ψ until an iterate satisfies (3.2), and uses φ thereafter. The trial value selection is divided into four cases, and the new trial value α_t+ is obtained by interpolating this information. In these cases we make use of α_c, the minimizer of the cubic that interpolates f_l, f_t, g_l, and g_t; α_q, the minimizer of the quadratic that interpolates f_l, g_l, and f_t; and α_s, the minimizer of the quadratic that interpolates f_l, g_l, and g_t.
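The minimizers α_c, α_q, and α_s can be computed as in the following sketch. The helper names are hypothetical, and the numerically stable form of the cubic step follows the pattern used in MINPACK-style line searches, not necessarily the authors' exact code:

```python
import math

def quad_min(a_l, f_l, g_l, a_t, f_t):
    """Minimizer of the quadratic interpolating f_l, g_l at a_l and f_t at a_t (alpha_q)."""
    h = a_t - a_l
    c = (f_t - f_l - g_l * h) / (h * h)   # leading coefficient; c > 0 for a minimizer
    return a_l - g_l / (2.0 * c)

def secant_min(a_l, g_l, a_t, g_t):
    """Minimizer of the quadratic interpolating g_l at a_l and g_t at a_t (alpha_s)."""
    return a_l + (a_t - a_l) * (-g_l) / (g_t - g_l)

def cubic_min(a_l, f_l, g_l, a_t, f_t, g_t):
    """Local minimizer of the cubic interpolating f_l, f_t, g_l, g_t (alpha_c)."""
    theta = 3.0 * (f_l - f_t) / (a_t - a_l) + g_l + g_t
    s = max(abs(theta), abs(g_l), abs(g_t))
    gamma = s * math.sqrt((theta / s) ** 2 - (g_l / s) * (g_t / s))
    if a_t < a_l:
        gamma = -gamma
    p = (gamma - g_l) + theta
    q = ((gamma - g_l) + gamma) + g_t
    return a_l + (p / q) * (a_t - a_l)
```

For data sampled from φ(α) = α² − 2α at α_l = 0 and α_t = 2, all three steps return the minimizer α = 1.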
Case 1: f_t > f_l. In this case compute α_c and α_q, and set

    α_t+ = α_c  if |α_c − α_l| < |α_q − α_l|,
    α_t+ = ½(α_q + α_c)  otherwise.

Both α_c and α_q lie in I+, so they are both candidates for α_t+. We desire a choice that is close to α_l, since this is the point with the lowest function value, and a choice close to α_l is clearly desirable when f_t is much larger than f_l. Indeed, if we consider the value of α_q as a function of f_t, then a computation shows that

    lim_{f_t → ∞} α_q(f_t) = α_l,

and a similar computation shows that α_c(f_t) also tends to α_l, with the cubic step usually closer to α_l than the quadratic step. Thus, when the cubic step is not the closer one, the midpoint of α_q and α_c is a reasonable compromise.
Case 2: f_t ≤ f_l and g_t g_l < 0. In this case compute α_c and α_s, and set

    α_t+ = α_c  if |α_c − α_t| ≥ |α_s − α_t|,
    α_t+ = α_s  otherwise.

Both α_c and α_s lie in I+, so they are both candidates for α_t+. Since g_t g_l < 0, a minimizer lies between α_l and α_t. Choosing the step that is farthest from α_t tends to generate a step that straddles a minimizer, and thus the next step is also likely to fall into this case.

In the next case, we choose α_t+ by extrapolating the function values at α_l and α_t, so the trial value α_t+ lies outside of the interval with α_t and α_l as endpoints. We define α_t+ in terms of α_c (the minimizer of the cubic that interpolates f_l, f_t, g_l, and g_t) and α_s (the minimizer of the quadratic that interpolates f_l, g_l, and g_t).
Case 3: f_t ≤ f_l, g_t g_l ≥ 0, and |g_t| ≤ |g_l|. In this case the cubic that interpolates the function values f_l and f_t and the derivatives g_l and g_t may not have a minimizer. Moreover, even if the minimizer α_c exists, it may be in the wrong direction; for example, we may have α_t > α_l, but α_c < α_t. On the other hand, the secant step α_s always exists and is in the right direction. If the cubic tends to infinity in the direction of the step and the minimum of the cubic is beyond α_t, set

    α_t+ = α_c  if |α_c − α_t| < |α_s − α_t|,
    α_t+ = α_s  otherwise;

otherwise, set α_t+ = α_s. This choice is based on the observation that during extrapolation it is sensible to be cautious and to choose the step closest to α_t. The trial value α_t+ defined above may be outside of the interval with α_t and α_u as endpoints, or it may be in this interval but close to α_u. Either situation is undesirable, so we redefine α_t+ by setting

    α_t+ = min{α_t + δ(α_u − α_t), α_t+}  if α_t > α_l,
    α_t+ = max{α_t + δ(α_u − α_t), α_t+}  otherwise,

for some δ < 1. In our implementation we use δ = 0.66.
set of test
algorithm have
the
problems
that
convex
of concavity,
functions
have
done in double precision The region of concavity
we use to illustrate general
functions.
while
the
three
a unique
last
minimizer.
The
first
functions Our
as the
of the
search
the second d(a)
is concave
are
functions
convex. results
In
left
of the
all
were
set is to the right
to the
-(a,:b)
function
three
numerical
on an IPX Sparcstation. of the first function in the test
+(a)=
P = 2, while
the behavior
and
the minimizer, while the second function mizer. The first function is defined by
with
a:
RESULTS
includes
regions
cases
In this case we choose and Igtl > Igll. fu, ft, gu, and g~. interpolates
the
does not
of
mini-
(5.1)
is defined
by
= (a+/3)5-2(a+/?)4
(5.2)
with ~ = 0.004. Plots for these two functions appear in Figures 3 and 4. The third function in the test set was suggested by Paul Plassmann. This function
is defined
in terms
Cj(a) ACM
Transactions
on Mathematical
of the parameters
= @o(a) Software,
+ Vol
2(1–6) ~m 20, No
1 and
sin
~ by
1’T —CY (1 2
3, September
,
1994
(5.3)
Fig. 3. Plot of function (5.1) with β = 2.

Fig. 4. Plot of function (5.2) with β = 0.004.
where

    φ₀(α) = 1 − α  if α ≤ 1 − β,
    φ₀(α) = α − 1  if α ≥ 1 + β,
    φ₀(α) = (α − 1)²/(2β) + β/2  if |α − 1| ≤ β.

The parameter β controls the size of the interval where (1.2) can hold: since φ′(0) = −β and |φ′(α)| ≥ β for |α − 1| ≥ β, condition (1.2) can hold only for |α − 1| < β. The parameter l controls the number of oscillations in the function for |α − 1| ≥ β, because in that interval φ″(α) is a multiple of sin((lπ/2)α). Note that if l is odd, then φ′(1) = 0, and that if l = 4k − 1 for some integer k ≥ 1, then φ″(1) > 0. Also note that φ is convex for |α − 1| ≤ β if β(1 − β)lπ ≤ 2. We have chosen β = 0.01 and l = 39. A plot of this function with these parameter settings appears in Figure 5.
Fig. 5. Plot of function (5.3) with β = 0.01 and l = 39.
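For reference, the first three test functions can be coded as below. The piecewise φ₀ used in f3 is the smoothing of |α − 1| consistent with φ′(0) = −β described above; treat its exact form as an assumption of this sketch:

```python
import math

def f1(a, beta=2.0):
    """Test function (5.1): concave to the right of its minimizer sqrt(beta)."""
    return -a / (a * a + beta)

def f2(a, beta=0.004):
    """Test function (5.2): concave to the left of its minimizer 1.6 - beta."""
    return (a + beta) ** 5 - 2.0 * (a + beta) ** 4

def f3(a, beta=0.01, l=39):
    """Test function (5.3): oscillating, with a unique minimizer near a = 1."""
    if a <= 1.0 - beta:
        phi0 = 1.0 - a
    elif a >= 1.0 + beta:
        phi0 = a - 1.0
    else:
        phi0 = (a - 1.0) ** 2 / (2.0 * beta) + beta / 2.0
    return phi0 + 2.0 * (1.0 - beta) / (l * math.pi) * math.sin(l * math.pi * a / 2.0)
```

With β = 2, the minimizer of f1 is √2 ≈ 1.4, which matches the value α* = 1.4 quoted in the discussion of Table I below.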
The other three functions in the test set are from Yanai et al. [1981]. These functions are defined in terms of parameters β₁ and β₂ by

    φ(α) = γ(β₁)[(1 − α)² + β₂²]^(1/2) + γ(β₂)[α² + β₁²]^(1/2),    (5.4)

where

    γ(β) = (1 + β²)^(1/2) − β.

These functions are quite convex, but different choices of β₁ and β₂ lead to functions with different characteristics. This can be seen clearly in Figures 6-8.

In Tables I-VI we present numerical results for different starting points. We have used α₀ = 10^i for i = -3, -1, 1, 3. This illustrates the behavior of the search algorithm from different starting points; we are particularly interested in the behavior from the remote starting points α₀ = 10^(±3). In our numerical results we have used different values of μ and η in order to illustrate different features of the problems and the behavior of the search algorithm.
many problems we have used q = 0,1 because this value is typical of those used in an optimization setting. We comment on what happens for other values of ~ and q. The general trend is for the number of function evaluations to decrease if p is decreased or if q is increased. The reason for this trend is that as K is decreased or q is increased, the measure of the set of acceptable values of a increases. An interesting feature of the numerical results for the function in Figure 3 is that should
values of a much larger than a * = 1.4 can satisfy (1.1) and (1.2). This be clear from Figure 3 and from the results in Table I. These results
show that if we use ~ = 0.001 and q = 0.1, then the starting satisfies (1. 1) and (1.2), and thus the search algorithm exits larly, the a!o = 10+3. We can increasing algorithm
search
exits
with
a~ = 37 when
the
avoid termination w or decreasing terminates with
a. = 10+3. There ACM
algorithm
Transactions
point a. = 10 with a.. Simistarting
point
is
at points far away from the minimizer a * by q. If we increase p and set w = q = 0.1, then the aa = 1.6 when a. = 10 and with a~ = 1.6 when is no change in the behavior of the algorithm from the other
on Mathematical
Software,
Vol
20, No
3, September
1994
Fig. 6. Plot of function (5.4) with β₁ = 0.001 and β₂ = 0.001.
Fig. 7. Plot of function (5.4) with β₁ = 0.01 and β₂ = 0.001.
Fig. 8. Plot of function (5.4) with /31= 0.001 and & = 0.01.
two starting points. If we decrease η by setting η = 0.001 but leave μ unchanged at μ = 0.1, then the final iterate α_m is near α* for all starting points. For η = 0.001 the search algorithm needs six function evaluations for α₀ = 10 and ten function evaluations for α₀ = 10⁺³. The number of function evaluations for α₀ = 10⁻³ and α₀ = 10⁻¹ is, respectively, 8 and 4. This increase in the number of function evaluations is to be expected because now the set of acceptable α is smaller.

Another interesting feature of the results in Table I is that the six function evaluations needed for α₀ = 10⁻³ could have been predicted from the nature of the extrapolation process.
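The connection between (μ, η) and the size of the acceptable set can be illustrated on a model problem. The sketch below is our own illustration, not code from the paper: it uses the convex quadratic φ(α) = (α − 1)², for which φ(0) = 1 and φ′(0) = −2, and estimates on a grid the measure of the set of α satisfying (1.1) and (1.2).

```python
def acceptable_measure(mu, eta, n=200000, alpha_max=3.0):
    # Grid estimate of the measure of {alpha > 0 : (1.1) and (1.2) hold}
    # for the model function phi(alpha) = (alpha - 1)^2, phi'(0) = -2.
    phi = lambda a: (a - 1.0) ** 2
    dphi = lambda a: 2.0 * (a - 1.0)
    h = alpha_max / n
    count = 0
    for i in range(1, n + 1):
        a = i * h
        decrease = phi(a) <= phi(0.0) + mu * (-2.0) * a   # condition (1.1)
        curvature = abs(dphi(a)) <= eta * 2.0             # condition (1.2)
        if decrease and curvature:
            count += 1
    return count * h

print(acceptable_measure(0.001, 0.1))    # ~0.2: the set is [1 - eta, 1 + eta]
print(acceptable_measure(0.001, 0.001))  # ~0.002: decreasing eta shrinks the set
```

For this model the acceptable set is the interval [1 − η, min(1 + η, 2(1 − μ))], so its measure grows as η is increased or μ is decreased, which is exactly the trend observed in the evaluation counts.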
Table I. Results for Function in Figure 3 with μ = 0.001 and η = 0.1

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      6     1.4     −9.2·10⁻³
    10⁻¹      1      3     1.4      4.7·10⁻³
    10⁺¹      1      1     10       9.4·10⁻³
    10⁺³      1      4     37       7.3·10⁻⁴

Table II. Results for Function in Figure 4 with μ = η = 0.1

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      12    1.6      7.1·10⁻⁹
    10⁻¹      1      8     1.6      1.0·10⁻¹⁰
    10⁺¹      1      8     1.6     −5.0·10⁻⁹
    10⁺³      1      11    1.6     −2.3·10⁻⁸

Table III. Results for Function in Figure 5 with μ = η = 0.1

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      12    1.0     −5.1·10⁻⁵
    10⁻¹      1      12    1.0     −1.9·10⁻⁴
    10⁺¹      1      10    1.0     −2.0·10⁻⁶
    10⁺³      1      13    1.0     −1.6·10⁻⁵

Table IV. Results for Function in Figure 6 with μ = η = 0.001

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      4     0.08    −6.9·10⁻⁵
    10⁻¹      1      1     0.10    −4.9·10⁻⁵
    10⁺¹      1      3     0.35    −2.9·10⁻⁶
    10⁺³      1      4     0.83     1.6·10⁻⁵

Table V. Results for Function in Figure 7 with μ = η = 0.001

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      6     0.075    1.9·10⁻⁴
    10⁻¹      1      3     0.078    7.4·10⁻⁴
    10⁺¹      1      7     0.073   −2.6·10⁻⁴
    10⁺³      1      8     0.076    4.5·10⁻⁴
Table VI. Results for Function in Figure 8 with μ = η = 0.001

    α₀       Info    m     α_m     φ′(α_m)
    10⁻³      1      13    0.93     5.2·10⁻⁴
    10⁻¹      1      11    0.93     8.4·10⁻⁵
    10⁺¹      1      8     0.92    −2.4·10⁻⁴
    10⁺³      1      11    0.92    −3.2·10⁻⁴

This can be explained by noting that in a typical situation the extrapolation process generates iterates by setting α_t⁺ = α_t + δ(α_t − α_l) with δ = 4, and thus α₀ = 10⁻³ generates the iterates

    α₁ = 0.005,  α₂ = 0.021,  α₃ = 0.085,  α₄ = 0.341,  α₅ = 1.365,

until the minimizer is bracketed or until one of these iterates satisfies the termination conditions. This implies, for example, that if the minimizer is α* = 1.4, then either one of the above iterates satisfies (1.1) and (1.2), or at least six function evaluations are required before the search algorithm exits.

The number of function evaluations needed to find an acceptable α is usually dependent on the measure of the set of acceptable α. From this point of view, the only difficult test problems are those based on the functions in Figures 4 and 5, because for these functions the set of acceptable α is small. The choice of β = 0.004 for the function in Figure 4 guarantees that this function has a large region of concavity, but also forces φ′(0) to be quite small (approximately −5·10⁻⁷). As a consequence, (1.2) is quite restrictive for any η < 1. Similar remarks apply to the numerical results for the function in Figure 5. This is a difficult test problem because information based on derivatives is unreliable as a result of the oscillations in φ. Moreover, as already noted, (1.2) can hold only for |α − 1| < β.

Recall that the search algorithm terminates once it generates an iterate α_k that satisfies the conditions; the value α_m is then α_k. In an optimization setting we would tend to use η = 0.1, and in this case the results in Tables IV–VI would change. Consider, for example, the function in Figure 8 with μ = 0.001 and η = 0.1. For these settings, the number of evaluations needed to obtain an acceptable α from the starting points α₀ = 10^i for i = −3, −1, 1, and 3 would be, respectively, 2, 1, 3, and 4. Similar results would be obtained for the functions in Figures 6 and 7.
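The extrapolation schedule quoted above can be checked directly. The following sketch is our reconstruction: the update α_t⁺ = α_t + δ(α_t − α_l) with δ = 4 is taken from the text, and we assume that the endpoint α_l advances to the previous trial point at each extrapolation step. Under that assumption it reproduces the iterates 0.005, 0.021, 0.085, 0.341, 1.365 from α₀ = 10⁻³.

```python
def extrapolation_iterates(alpha0, delta=4.0, steps=5):
    # alpha_t+ = alpha_t + delta * (alpha_t - alpha_l), where alpha_l is
    # the previous trial point (initially 0), as in the typical situation
    # described in the text.
    alpha_l, alpha_t = 0.0, alpha0
    iterates = []
    for _ in range(steps):
        alpha_next = alpha_t + delta * (alpha_t - alpha_l)
        iterates.append(alpha_next)
        alpha_l, alpha_t = alpha_t, alpha_next
    return iterates

print(extrapolation_iterates(1e-3))  # [0.005, 0.021, 0.085, 0.341, 1.365] up to rounding
```

Since each step roughly quintuples the distance from the origin, six evaluations are needed before the iterates pass α* = 1.4, which is consistent with the count in Table I.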
ACKNOWLEDGMENTS

This work benefited from discussions and suggestions from many sources. We thank, in particular, Bill Davidon, Roger Fletcher, and Michael Powell. We are also grateful to Jorge Nocedal for his encouragement during the final phases of this work, to Paul Plassmann for providing the monster function of Section 5, and to Gail Pieper for her careful reading of the manuscript. Finally, we thank the referees for their helpful comments.
REFERENCES

AL-BAALI, M. 1985. Descent property and global convergence of the Fletcher-Reeves method with inexact line searches. IMA J. Numer. Anal. 5, 121–124.
AL-BAALI, M., AND FLETCHER, R. 1984. An efficient line search for nonlinear least squares. J. Optim. Theory Appl. 48, 359–377.
BYRD, R. H., NOCEDAL, J., AND YUAN, Y. 1987. Global convergence of a class of quasi-Newton methods on convex problems. SIAM J. Numer. Anal. 24, 1171–1190.
DENNIS, J. E., AND SCHNABEL, R. B. 1983. Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Prentice-Hall, Englewood Cliffs, N.J.
FLETCHER, R. 1980. Practical Methods of Optimization. Vol. 1, Unconstrained Optimization. Wiley, New York.
FLETCHER, R. 1987. Practical Methods of Optimization. 2nd ed. Wiley, New York.
GILBERT, J. C., AND NOCEDAL, J. 1992. Global convergence properties of conjugate gradient methods for optimization. SIAM J. Optim. 2, 21–42.
GILL, P. E., AND MURRAY, W. 1974. Safeguarded steplength algorithms for optimization using descent methods. Rep. NAC 37, National Physical Laboratory, Teddington, England.
GILL, P. E., MURRAY, W., SAUNDERS, M. A., AND WRIGHT, M. H. 1979. Two steplength algorithms for numerical optimization. Rep. SOL 79-25, Systems Optimization Laboratory, Stanford Univ., Stanford, Calif.
HAGER, W. W. 1989. A derivative-based bracketing scheme for univariate minimization and the conjugate gradient method. Comput. Math. Appl. 18, 779–795.
LEMARÉCHAL, C. 1981. A view of line-searches. In Optimization and Optimal Control, A. Auslender, W. Oettli, and J. Stoer, Eds. Lecture Notes in Control and Information Science, vol. 30. Springer-Verlag, Heidelberg, 59–78.
LIU, D. C., AND NOCEDAL, J. 1989. On the limited memory BFGS method for large scale optimization. Math. Program. 45, 503–528.
MORÉ, J. J., AND SORENSEN, D. C. 1984. Newton's method. In Studies in Numerical Analysis, G. H. Golub, Ed. Mathematical Association of America, Washington, D.C., 29–82.
O'LEARY, D. P. 1990. Robust regression computation using iteratively reweighted least squares. SIAM J. Matrix Anal. Appl. 11, 466–480.
POWELL, M. J. D. 1976. Some global convergence properties of a variable metric algorithm for minimization without exact line searches. In Nonlinear Programming, R. W. Cottle and C. E. Lemke, Eds. SIAM-AMS Proceedings, vol. IX. American Mathematical Society, Providence, R.I., 53–72.
SCHLICK, T., AND FOGELSON, A. 1992a. TNPACK—A truncated Newton minimization package for large-scale problems: I. Algorithms and usage. ACM Trans. Math. Softw. 18, 1, 46–70.
SCHLICK, T., AND FOGELSON, A. 1992b. TNPACK—A truncated Newton minimization package for large-scale problems: II. Implementation examples. ACM Trans. Math. Softw. 18, 1, 71–111.
SHANNO, D. F., AND PHUA, K. H. 1976. Minimization of unconstrained multivariate functions. ACM Trans. Math. Softw. 2, 87–94.
SHANNO, D. F., AND PHUA, K. H. 1980. Remark on Algorithm 500. ACM Trans. Math. Softw. 6, 618–622.
YANAI, H., OZAWA, M., AND KANEKO, S. 1981. Interpolation methods in one dimensional optimization. Computing 27, 155–163.
Received December 1992; revised and accepted August 1993