SLAC PUB-2006 (Rev) August 1977 July 1978 (Rev)
A NESTED PARTITIONING PROCEDURE FOR NUMERICAL MULTIPLE INTEGRATION
Jerome H. Friedman*
Stanford Linear Accelerator Center, Stanford, California
and
European Organization for Nuclear Research, CERN, Geneva, Switzerland
and
Margaret H. Wright**
Department of Operations Research, Stanford University, Stanford, California
ABSTRACT
An algorithm is presented for adaptively partitioning a multidimensional coordinate space based on optimization of a scalar function of the coordinates. The goal is to construct a set of hyperrectangular regions, such that the variation of function values within each region is small. These regions are then used as the basis for a stratified sampling estimate of the definite integral of the function.
(Submitted to ACM Transactions on Mathematical Software)
*Work partially supported by Department of Energy under contract number DE-AC03-76SF00515.
**Work partially supported by the Department of Energy under contract number DE-AS03-76-SF00326, National Science Foundation Grant MCS7620019-AOl, and U.S. Army Research Office contract DAAG29-79-C-0110.
1. INTRODUCTION
Although the numerical evaluation of multiple integrals has received considerable attention, it remains today a difficult problem. Most general-purpose techniques apply to integrands that are relatively "well behaved" in that the integrand can be reasonably approximated by a low order polynomial within the region of integration. One technique for trying to deal with integrands that are not so "well behaved" is to partition the region of integration into subregions, so that the integrand is well-behaved in each subregion, and the integral is taken as the sum of the integrals over the subregions. Standard quadrature techniques can then be applied over each subregion.

Several strategies have been proposed, in which the partitioning is "adaptively" constructed based on estimated properties of the integrand [Halton and Zeidman, 1971; Lautrup, 1971; Genz, 1972; Kahaner and Wells, 1979; Sasaki, 1978; LePage, 1978]. Some of these strategies [Lautrup, 1971; Sasaki, 1978; LePage, 1978] rely on factored approximations, while the others are general adaptive procedures [Halton and Zeidman, 1971; Genz, 1972; Kahaner and Wells, 1979].

This paper presents an adaptive partitioning strategy for multidimensional integration that is based on the use of numerical optimization. Although the function evaluations required to develop the partition are not retained to evaluate the final integral, this investment is repaid by the efficiencies resulting from a well chosen partition. This approach appears to be competitive with existing methods on high-dimensional problems.

All of these
adaptive techniques (as well as the one presented here) are iterative, and are based on top-down successive refinement. At each iteration, a particular region is considered; initially, this is the entire integration region. A sampling strategy is used to estimate various properties of the integrand within the region, which then guide the division of this region into several subregions. This process continues until the partitioning has achieved some specified reduction in the estimated error of the approximate integral.

The effectiveness of a partitioning strategy depends, in large part, upon the degree to which the properties of the integrand, deduced during sampling, accurately reflect the characteristics of the function. If the function is badly behaved, its predicted and actual behavior may not even resemble one another; this possibility may lead to ineffective or counterproductive partitions. This is especially likely in high dimensions where even samplings of large cardinality are very sparse; for example, a sampling of 60,000 points in ten dimensions is equivalent to about three points per coordinate. Thus, a good partitioning strategy must be based on a computationally feasible way of assessing the integrand's behavior. On the other hand, a complex strategy that requires a very large number of integrand evaluations may be inefficient, because the additional evaluations devoted to partitioning might be better expended simply to increase the number of points used to compute the final integral estimate.

The main distinctions of the new partitioning strategy from previously proposed methods are:
(1) The behavior of the integrand within a region is estimated by means of multiparameter optimization rather than by sampling;
(2) All subregions are defined by simple bounds on the coordinates.
2. OVERVIEW OF THE ADAPTIVE REFINEMENT PROCEDURE
Consider a hyperrectangular region R, defined by simple bounds on each coordinate:

$$R = \{\, x \mid x_i^L \le x_i \le x_i^U \,\}, \qquad (1)$$

where x is the vector $(x_1, x_2, \ldots, x_n)^T$. The essence of an adaptive strategy for partitioning R can be specified by three attributes:
(1) A measure s(R) that indicates the "badness" of the integrand's behavior within R;
(2) A method for subdividing the region after s(R) has been determined;
(3) A procedure for processing the new subregions, and for terminating the partitioning (a global stopping criterion).

The quantity used in the new algorithm to characterize the integrand is the difference of extreme values within R, weighted by the volume of R. Let

$$v(R) = \max_{x \in R} f(x) - \min_{x \in R} f(x), \qquad (2)$$

where f(x) is a scalar-valued function (presently, the integrand function). The spread s(R) is then defined by:

$$s(R) = v(R) \cdot \mathrm{vol}(R). \qquad (3)$$
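As an illustration of definitions (2) and (3), the following minimal sketch evaluates the spread of a box-shaped region. It is not the procedure of this paper: the extreme values are only estimated here by crude uniform random sampling, which stands in for the bounds-constrained optimization that the algorithm actually uses. The names `volume` and `spread`, and the parameters `n_samples` and `seed`, are hypothetical.

```python
import random


def volume(lower, upper):
    """Volume of the box {x : lower[i] <= x[i] <= upper[i]}."""
    vol = 1.0
    for lo, hi in zip(lower, upper):
        vol *= (hi - lo)
    return vol


def spread(f, lower, upper, n_samples=2000, seed=0):
    """Estimate s(R) = (max f - min f) * vol(R) over the box R.

    ASSUMPTION: max f and min f are estimated by random sampling,
    used only as a simple stand-in for numerical optimization.
    """
    rng = random.Random(seed)
    f_min = float("inf")
    f_max = float("-inf")
    for _ in range(n_samples):
        x = [rng.uniform(lo, hi) for lo, hi in zip(lower, upper)]
        fx = f(x)
        f_min = min(f_min, fx)
        f_max = max(f_max, fx)
    return (f_max - f_min) * volume(lower, upper)
```

Because sampling can only underestimate the difference of extreme values, a spread computed this way is a lower bound on (3); this is one way to see why a sampled estimate may misjudge a badly behaved integrand.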
The spread measure s(R) bounds the uncertainty of a quadrature or Monte Carlo estimate of the integral over R, and is taken to indicate the contribution of R to the uncertainty of a global estimate of the integral.

The choice of the measure (3) depends in two crucial ways on the simply-bounded form (1) of R. First, the volume of such a region is easy to compute:

$$\mathrm{vol}(R) = \prod_{i=1}^{n} (x_i^U - x_i^L).$$

This would not be true if more complicated regions were allowed. The other term in (3), given by (2), may seem, at first glance, to be computationally intractable since two optimization problems must be solved to calculate it. However, methods are well developed for optimization with simple bounds on the variables and, thus, the sub-problems associated with (2) can be solved quite efficiently if f is a reasonable function. Section 3 gives some details of the optimization algorithm.

The second element of the partitioning procedure involves dividing a single region into disjoint simply-bounded subregions. The strategy for subdivision is based on the assumption that the same quadrature rule will be applied in each region, so that the aimed-for final result at the conclusion of the partitioning is a list of regions with "similar" spread measures. The method by which a given region is refined to achieve this goal is described in Section 4.

Finally, after R has been partitioned, the daughter subregions are merged into the list of all regions. If the global stopping criteria
are satisfied, the partitioning terminates. Otherwise, the list of regions is scanned for the one with the largest spread measure, which is then considered for refinement at the next iteration. Details of this aspect of the algorithm are given in Section 4.

Figure 1 illustrates the recursive partitioning achieved by applying the above procedure to the function

$$f(x_1,x_2) = \exp\{-15[x_1^2 + (x_2 - 0.5)^2]\} + \exp\{-15[(x_1 - 0.433)^2 + (x_2 + 0.25)^2]\} + \exp\{-15[(x_1 + 0.433)^2 + (x_2 + 0.25)^2]\},$$

$$-1 \le x_1 \le 1, \qquad -1 \le x_2 \le 1.$$

Figure 1a shows an isometric representation of the surface $y = f(x_1,x_2)$, and Figure 1b displays some isopleths of the function on the $(x_1,x_2)$ plane. Figure 1c shows the partitioning achieved by applying this procedure recursively, in this case creating eleven subregions. The numbers indicate the order in which the cuts were made.

3. OPTIMIZATION WITH SIMPLY-BOUNDED VARIABLES
To compute the spread measure (3) at each step of the partitioning procedure, it is necessary to solve two bounds-constrained optimization problems of the form:
$$\text{M1:} \quad \min f(x) \quad \text{subject to } x^L \le x \le x^U$$

$$\text{M2:} \quad \max f(x) \quad \text{subject to } x^L \le x \le x^U,$$

where the scalar-valued function f drives the partitioning, and the vectors $x^L$ and $x^U$ contain, respectively, the lower and upper bounds that define the desired region. The problem M2 can be treated as a minimization problem involving $(-f(x))$, and therefore all subsequent discussion will concern minimization only.

In a typical quadrature problem, f(x) will be twice continuously differentiable, or at least will be non-smooth only at isolated points. The algorithm selected to solve problem M1 should consequently be able to perform well on a smooth function. However, in most instances, derivatives of f will not be available, so that the method of choice should require function values only. Based on these considerations, the optimization method used in the partitioning algorithm is a bounds-constrained quasi-Newton method with finite-difference approximations to first derivatives.

Quasi-Newton methods for unconstrained optimization have a remarkable history, beginning with Davidon [1959], and Fletcher and Powell [1963]; a recent summary of their motivation and properties is given by Dennis and Moré [1977]. Quasi-Newton methods have been extremely successful on a wide variety of problems; if properly implemented, they are quite robust and usually display superlinear convergence. The idea of a quasi-Newton
method is to build up second-order information about the function to be minimized, by incorporating the observed changes in the gradient into a matrix that approximates the underlying matrix of second partial derivatives (Hessian matrix), so that the method should eventually behave like Newton's method.

A typical iteration of an unconstrained quasi-Newton method begins with the current iterate, x; the gradient vector of f, g; and an approximation to the Hessian, B.
(i) If the norm of the gradient is sufficiently small, the procedure terminates. Otherwise, proceed to step (ii).
(ii) Solve the linear system $Bp = -g$ for the search direction, p. In practice, numerical stability is insured by using a Cholesky factorization of the matrix B, so that p is always a direction of descent for f. This essential feature is due to Gill and Murray [1972].
(iii) Find a steplength $\alpha > 0$ that yields a sufficient decrease in f, so that $f(x + \alpha p) < f(x)$. The steplength algorithm used in the current procedure is the safeguarded quadratic interpolation procedure implemented by Gill and Murray [1974a].
(iv) Evaluate the gradient at $x + \alpha p$, and produce an updated Hessian approximation by modifying the Cholesky factorization of B with the BFGS quasi-Newton update [see Gill and Murray, 1974b]. Return to step (i) with $x + \alpha p$ as the next iterate.
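Steps (i) to (iv) of the unconstrained quasi-Newton iteration can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the paper: the BFGS update is applied directly to an approximation of the inverse Hessian (instead of updating Cholesky factors of B), and a plain backtracking rule replaces the safeguarded quadratic interpolation steplength procedure. The gradient is approximated by forward differences. All function names and tolerances are hypothetical.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference approximation to the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += h
        grad.append((f(shifted) - fx) / h)
    return grad


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def quasi_newton_minimize(f, x0, tol=1e-5, max_iter=200):
    n = len(x0)
    x = list(x0)
    g = fd_gradient(f, x)
    # ASSUMPTION: H approximates the INVERSE Hessian (identity to start),
    # a substitute for the Cholesky-factored B described in the text.
    H = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_iter):
        # (i) Terminate when the gradient norm is sufficiently small.
        if dot(g, g) ** 0.5 < tol:
            break
        # (ii) Search direction p = -H g (the analogue of solving B p = -g).
        p = [-dot(row, g) for row in H]
        slope = dot(g, p)
        if slope >= 0.0:
            # Safeguard: revert to steepest descent if p is not downhill.
            H = [[float(i == j) for j in range(n)] for i in range(n)]
            p = [-gi for gi in g]
            slope = -dot(g, g)
        # (iii) Backtrack until a sufficient decrease in f is obtained.
        fx = f(x)
        alpha = 1.0
        x_new = None
        while alpha > 1e-12:
            trial = [xi + alpha * pi for xi, pi in zip(x, p)]
            if f(trial) <= fx + 1e-4 * alpha * slope:
                x_new = trial
                break
            alpha *= 0.5
        if x_new is None:
            break  # no acceptable step was found
        # (iv) BFGS update of the inverse-Hessian approximation.
        g_new = fd_gradient(f, x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = dot(s, y)
        if sy > 1e-12:  # skip the update if curvature is not positive
            Hy = [dot(row, y) for row in H]
            scale = (1.0 + dot(y, Hy) / sy) / sy
            for i in range(n):
                for j in range(n):
                    H[i][j] += (scale * s[i] * s[j]
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x
```

For example, `quasi_newton_minimize(lambda v: (v[0] - 1.0)**2 + 2.0 * (v[1] + 0.5)**2, [0.0, 0.0])` should return a point close to (1, -0.5). Working with the inverse Hessian keeps the sketch short, at the cost of the numerical stability that the factored form is intended to provide.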
In the present algorithm, it is assumed that the analytic gradient of f is not available, so that the calculation of the vector g is carried out using finite differences.

When the variables are constrained to be between simple bounds, the above algorithm can be modified in a straightforward manner. At each iteration, it is determined whether a given variable is "free" to vary, or is to be held "fixed" at one of its bounds. After this decision, the unconstrained algorithm is applied with the following changes:
(1) The gradient, direction of search, and approximate Hessian represent only the free variables.
(2) The steplength in step (iii) may need to be restricted to prevent a free variable from violating a bound during an iteration. In this case, the variable subsequently becomes fixed on that bound.
(3) The updates to B involve the free variables only;
(4) The test for convergence in (i) is based on the norm of the gradient with respect to the free variables. When this quantity is sufficiently small, it is necessary to check whether any variable currently held fixed on its bound can be released from its bound. This determination is made by checking the sign of the gradient with respect to all fixed variables - e.g., if the ith variable is fixed on a lower bound and the ith component of the gradient is negative, freeing the ith variable will lead to a reduction in f.
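The sign test in change (4) and the steplength restriction in change (2) can be sketched as follows, for the minimization problem. This is only an illustrative sketch: the function names and the tolerance `eps` are hypothetical, and the rule for an upper bound is inferred by symmetry from the lower-bound example given in the text.

```python
def free_variables(x, g, lower, upper, eps=1e-12):
    """Indices of variables that are free to vary when minimizing f.

    A variable sitting on its lower bound stays fixed unless its gradient
    component is negative (so that increasing it reduces f).
    ASSUMPTION: the upper-bound rule is the mirror image of that test.
    """
    free = []
    for i, (xi, gi, lo, hi) in enumerate(zip(x, g, lower, upper)):
        on_lower = xi - lo <= eps
        on_upper = hi - xi <= eps
        if on_lower and gi >= 0.0:
            continue  # moving off the lower bound would not reduce f
        if on_upper and gi <= 0.0:
            continue  # moving off the upper bound would not reduce f
        free.append(i)
    return free


def max_feasible_step(x, p, lower, upper):
    """Largest alpha >= 0 such that lower <= x + alpha * p <= upper."""
    alpha = float("inf")
    for xi, pi, lo, hi in zip(x, p, lower, upper):
        if pi > 0.0:
            alpha = min(alpha, (hi - xi) / pi)
        elif pi < 0.0:
            alpha = min(alpha, (lo - xi) / pi)
    return alpha
```

A steplength procedure would then search only over steplengths not exceeding `max_feasible_step(...)`; if that limit is attained, the variable that reached its bound would be treated as fixed at the next iteration.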
The many additional details of the algorithm are given in full in the software documentation [Friedman and Wright, 1979], including the user-controlled tolerances that define, for example, "sufficiently small" in step (i).

Before beginning to solve problems M1 and M2, the function f is evaluated at a random sample of points in R, and the point with the smallest (largest) function value is used as the initial point for the minimization (maximization). (The size of this initial sample is user controllable; values of 50 to 100 seem to be adequate, depending on the problem.) Although the extrema computed at previous states could be used as part of the initial sample, this information is not retained, in order to improve robustness.

An additional robustness feature is a "local search", to be used if the gradient is small at the initial point. The local search is designed to avoid convergence to a spurious saddle point. Especially in high dimensions, the convergence criteria might be satisfied at the initial point for some regions; a spurious indication of convergence at a saddle point must be avoided, since this would preclude any further search for the true extremum.

The details of the local search are rather complicated, and only the general idea will be sketched. First, each coordinate is perturbed by a small, feasible amount from the initial point until the function value changes sufficiently. A feasible descent direction is then constructed, and an exact line search is carried out along that direction. Next, a second feasible descent direction, orthogonal to the first, is generated at the lowest point found, and a second exact line search is performed.
If this fails to yield a "sufficiently lower" function value, the initial point is accepted; otherwise, the quasi-Newton procedure begins at the new point. This local search cannot be guaranteed to move away from a saddle point since the function is evaluated only at a finite number of points along two directions. In practice, the local search has been quite successful on all the examples tested.

4. REFINEMENT INTO SUB-REGIONS
Given that the spread measure (3) has been computed by the numerical solution of two optimization problems, we next consider the procedure by which the region is subdivided. In this section, we shall be concerned with a particular region, R, defined by (1). Let $x^{\max}$ and $x^{\min}$ denote the points in R where f achieves its maximum and minimum, respectively, with $f^{\max}$ and $f^{\min}$ the corresponding function values. For simplicity, we assume that there are no other local extrema in R; the implemented procedure contains provisions to handle situations where this assumption is not satisfied.

A general partitioning strategy might divide R into two disjoint parts, with equal spread measures, as follows. Suppose that for a given value $\bar f$, $f^{\min} \le \bar f \le f^{\max}$, one could determine an isoplethic surface $\{x \mid f(x) = \bar f\}$ that completely separates R into two regions $R^{\max}$ (which contains $x^{\max}$) and $R^{\min}$ (which contains $x^{\min}$). Under the uniqueness assumption, the differences of extreme function values within the parts $R^{\max}$ and $R^{\min}$, respectively, would be $(f^{\max} - \bar f)$ and $(\bar f - f^{\min})$.
The desired choice for $\bar f$ would make the spread measures associated with $R^{\max}$ and $R^{\min}$ equal, i.e.,

$$(f^{\max} - \bar f)\,\mathrm{vol}(R^{\max}) = (\bar f - f^{\min})\,\mathrm{vol}(R^{\min}). \qquad (4)$$

The strategy just described is, of course, impractical in general, since the determination of the isopleths would, in general, be an extremely complex numerical problem. Since the resulting subregions $R^{\max}$ and $R^{\min}$ would no longer be defined by simple bounds, the next optimization problem would be much more difficult to solve. Furthermore, it would also be much more complicated to compute the volume of such a general region.

The strategy adopted in the present algorithm divides the region into a collection of simply-bounded subregions by constructing an axis-oriented hyperrectangular approximation to either $R^{\max}$ or $R^{\min}$. Either $f^{\max}$ or $f^{\min}$ is selected as the "major" extremum ($f^M$), i.e., that for which the corresponding function value is farthest from the mean function value $\tilde f$ in R ($\tilde f$ is the average of function values at the initial random sample of points). If

$$\frac{f^{\max} + f^{\min}}{2} \ge \tilde f,$$

then $f^M = f^{\max}$, $f^m = f^{\min}$; otherwise, $f^M = f^{\min}$, $f^m = f^{\max}$ (with the corresponding choices for $x^M$, $x^m$). We then seek to define a region $R^M$ containing $x^M$ by two sets of "cuts" ($\delta^+$, $\delta^-$) along the positive and negative coordinate directions from $x^M$:

$$R^M = \{\, x \mid x_i^M - \delta_i^- \le x_i \le x_i^M + \delta_i^+, \quad \delta_i^{\pm} \ge 0, \quad i = 1, \ldots, n \,\}$$
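with $\delta^+$, $\delta^-$ chosen such that

$$f(x^M + \delta_i^+ e_i) = \bar f$$
$$f(x^M - \delta_i^- e_i) = \bar f, \qquad i = 1, \ldots, n, \qquad (5)$$

where $e_i$ is the ith column of the identity matrix. The equation to be satisfied by $\bar f$ is a re-arrangement of (4):

$$\gamma f^M + (1-\gamma) f^m = \bar f, \qquad (6)$$

where $\gamma = \mathrm{vol}(R^M)/\mathrm{vol}(R)$. Because $R^M$ is defined by simple bounds,

$$\mathrm{vol}(R^M) = \prod_{i=1}^{n} (\delta_i^+ + \delta_i^-).$$

Thus, the vectors $\delta^+$, $\delta^-$ are the solution of the 2n nonlinear equations:

$$f(x^M + \delta_i^+ e_i) = \frac{\prod_{j=1}^{n} (\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)}\, f^M + \left[ 1 - \frac{\prod_{j=1}^{n} (\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)} \right] f^m$$

$$f(x^M - \delta_i^- e_i) = \frac{\prod_{j=1}^{n} (\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)}\, f^M + \left[ 1 - \frac{\prod_{j=1}^{n} (\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)} \right] f^m, \qquad i = 1, \ldots, n. \qquad (7)$$

Several considerations affect the choice of solution method for solving the nonlinear system (7). It is undesirable to expend too much effort in solving (7) since the solution need not be computed with very high accuracy. This means that a Newton-type method (for solving nonlinear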
equations) based on standard finite differences is unacceptable because of the 2n function evaluations required to compute the Jacobian at every iteration. A reasonable alternative is to use a secant-type method, where the elements of the Jacobian are approximated by differencing function values from the previous iteration. The switch to a secant method is worthwhile because such methods display superlinear convergence, and are typically as effective as a Newton-type method in moving from a poor initial estimate of the solution to a reasonably good one (which is all that is required in this case) [Moré, 1977].

Even a secant method for solving (7) could be considered objectionable because it requires the solution of a 2n by 2n linear system at each iteration. However, the special form of (7) allows it to be transformed to an equivalent, but simpler, nonlinear system. The right-hand side of (7) is a vector whose components are all equal, and hence the vectors $\delta^+$, $\delta^-$ also satisfy the 2n nonlinear equations

$$
\begin{aligned}
f(x^M + \delta_1^+ e_1) - f(x^M + \delta_2^+ e_2) &= 0 \\
&\;\;\vdots \\
f(x^M + \delta_{n-1}^+ e_{n-1}) - f(x^M + \delta_n^+ e_n) &= 0 \\
f(x^M + \delta_n^+ e_n) - f(x^M - \delta_1^- e_1) &= 0 \qquad (8) \\
f(x^M - \delta_1^- e_1) - f(x^M - \delta_2^- e_2) &= 0 \\
&\;\;\vdots \\
f(x^M - \delta_{n-1}^- e_{n-1}) - f(x^M - \delta_n^- e_n) &= 0 \\
f(x^M + \delta_1^+ e_1) - \bar f &= 0
\end{aligned}
$$
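To make the structure of system (8) concrete, the sketch below evaluates its left-hand side for given cuts. It assumes that the last equation compares the first cut value with the level of equation (6), evaluated with the volume fraction implied by the current cuts; the function name `cut_residuals` and its argument names are hypothetical.

```python
def cut_residuals(f, x_major, d_plus, d_minus, f_major, f_minor, vol_R):
    """Left-hand side of the chained system for the cuts (d_plus, d_minus).

    The first 2n-1 entries difference the function values at adjacent
    cut points; the last entry compares the first cut value with the
    level f_bar = gamma * f_major + (1 - gamma) * f_minor, where gamma
    is the volume fraction of the box defined by the cuts.
    """
    n = len(x_major)

    def value_at(i, step):
        x = list(x_major)
        x[i] += step
        return f(x)

    # Function values at the 2n cut points: positive cuts, then negative.
    values = [value_at(i, d_plus[i]) for i in range(n)]
    values += [value_at(i, -d_minus[i]) for i in range(n)]

    residuals = [values[k] - values[k + 1] for k in range(2 * n - 1)]

    gamma = 1.0
    for dp, dm in zip(d_plus, d_minus):
        gamma *= (dp + dm)
    gamma /= vol_R
    f_bar = gamma * f_major + (1.0 - gamma) * f_minor
    residuals.append(values[0] - f_bar)
    return residuals
```

As a check on a case that can be solved by hand, take $f(x) = -(x_1^2 + x_2^2)$ on $-1 \le x_i \le 1$, with major extremum $f^M = 0$ at the origin and $f^m = -2$. By symmetry all four cuts equal some d, the volume fraction is $d^2$, and the last equation reduces to $2 - 3d^2 = 0$, so every residual should vanish at $d = \sqrt{2/3}$.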
The attractive feature of the system (8) is that the Jacobian displays the following special structure, since each equation (except for the last) involves only two adjacent unknowns:

$$
\begin{pmatrix}
\times & \times & 0 & \cdots & \cdots & 0 \\
0 & \times & \times & 0 & \cdots & 0 \\
0 & 0 & \times & \times & \cdots & 0 \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & \times & \times \\
\times & \times & \times & \cdots & \times & \times
\end{pmatrix} \qquad (9)
$$

If no interchanges are necessary, the matrix (9) will be reduced to upper triangular form very easily, by simply subtracting multiples of each successive row from the last. This means that solving the linear system at each iteration is extremely fast.

Each iteration of the secant method for solving the nonlinear system (8) requires 2n function evaluations with the latest values of $\{\delta_i^+, \delta_i^-\}$. Typically, three or four iterations are required in order for (8) to be satisfied with reasonable accuracy.

Numerous safeguards are included in the secant procedure - in particular, each variable $\delta_i^+$, $\delta_i^-$ is constrained to remain within the range where the solution must lie, and the norm of the vector that is the left-hand side of (8) is required to decrease at every iteration.
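The elimination described for a matrix with structure (9) can be sketched as follows. The sketch assumes that no row interchanges are needed (all diagonal entries and the final pivot are nonzero); the function name and the storage of the matrix as a diagonal, a superdiagonal, and a dense last row are hypothetical choices made for illustration.

```python
def solve_bordered_bidiagonal(diag, sup, last_row, rhs):
    """Solve J z = rhs where J has the structure (9).

    Rows k = 0..m-2:  diag[k] * z[k] + sup[k] * z[k+1] = rhs[k]
    Row  m-1:         sum_j last_row[j] * z[j]         = rhs[m-1]
    ASSUMPTION: no pivoting is required.
    """
    m = len(rhs)
    last = list(last_row)
    b_last = rhs[m - 1]
    # Subtract a multiple of each successive row from the last row,
    # which leaves an upper triangular (bidiagonal) system.
    for k in range(m - 1):
        factor = last[k] / diag[k]
        last[k + 1] -= factor * sup[k]
        b_last -= factor * rhs[k]
        last[k] = 0.0
    # Back-substitution.
    z = [0.0] * m
    z[m - 1] = b_last / last[m - 1]
    for k in range(m - 2, -1, -1):
        z[k] = (rhs[k] - sup[k] * z[k + 1]) / diag[k]
    return z
```

The work is proportional to the number of unknowns, rather than to its cube as for a general dense system, which is consistent with the remark that the linear solve at each iteration is extremely fast.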
For simplicity cluded
in the preceding
particular,
the fact
candidates
for
same, it Certain close
of exposition,
cuts.
is prudent directions
Since
no cut
nor along
the
ith
to disregard
cuts
are eliminated
for
'i"*
bound,
xiL
i
Furthermore,
that
ith
strategy
appear
for
- in
any cut
is the
First,
xM may be very be insignificant.
direction
if
XiL),
I a(xi" -
if XiL),
p = .05).
the solution
values
for
6; are constrained
to satisfy
0