SLAC PUB-2006 - Stanford University

SLAC PUB-2006 (Rev) August 1977 July 1978 (Rev)

A NESTED PARTITIONING PROCEDURE FOR NUMERICAL MULTIPLE INTEGRATION

Jerome H. Friedman*
Stanford Linear Accelerator Center, Stanford, California
and
European Organization for Nuclear Research, CERN, Geneva, Switzerland

and

Margaret H. Wright**
Department of Operations Research, Stanford University, Stanford, California

ABSTRACT

An algorithm is presented for adaptively partitioning a multidimensional coordinate space based on optimization of a scalar function of the coordinates. The goal is to construct a set of hyperrectangular regions, such that the variation of function values within each region is small. These regions are then used as the basis for a stratified sampling estimate of the definite integral of the function.

(Submitted to ACM Transactions on Mathematical Software)

*Work partially supported by Department of Energy under contract number DE-AC03-76SF00515. **Work partially supported by the Department of Energy under contract number DE-AS03-76-SF00326, National Science Foundation Grant MCS7620019-AOl, and U.S. Army Research Office contract DAAG29-79-C-0110.

1. INTRODUCTION

Although

considerable

the numerical attention,

purpose

techniques

behaved"

in that

low order for

the

the

Zeidman,

is taken This

sulting

integrals

in which

is because the

and subdivision constructed partitioning

on estimated

properties

of the

approximations

and Zeidman,

are general

1971;

strategy

1979;

Sasaki,

to evaluate region

for

multi-

is based on the evaluations

the final

required

integration,

this

methods on high-dimeninvestment

is repaid

and

the entire

algorithm

existing initial

[Halton

subregions.

the function for

chosen

in function

values

by the efficiencies

re-

have been proposed,

based

partition.

automatic

the others

the

is to

subregions,

be applied over

subdivision

with

behaved"

each subregion

partitioning

Although

Several

while

over

by a

One technique

into

integral

"well

approximated

Kahaner and Wells,

are not retained

This

from a well

on factored

the

Most general-

so "well

can then

and the

to be competitive

the optimization

within

has received

are relatively well

are not

techniques

optimization.

problems.

that

"adaptively"

an adaptive

the partition appears

that

Genz, 1972;

Standard

quadrature

to develop

for

1971;

sum of the

problem.

of integration.

is well-behaved

Lautrup,

use of numerical

approach

the region

integrands

paper presents

dimensional

a difficult

integrals

can be reasonably

in each subregion,

as the

of multiple

to integrands

of integration

LePage 19781.

integral

sional

with

integrand

1971;

apply

integrand

the region

so that

remains

within

to deal

partition

the

today

polynomial

trying

1978;

it

evaluation

strategies integrand.

[Lautrup,

1971;

multidimensional

Genz, 1972;

Some of these

strategies

LePage,

Sasaki,

1978;

adaptive

Kahaner and Wells,

procedures

19791.

All

rely 19781 [Halton

of these

adaptive techniques (as well as the one presented here) are iterative, and are based on top-down successive refinement. At each iteration, a particular region is considered; initially this is the entire integration region. A sampling of the integrand within the region is used to estimate various properties of the integrand, which then guide the subdivision into several subregions. This process continues until the partitioning has achieved some specified reduction in the estimated error of the approximate integral.

The effectiveness of a partitioning strategy depends, in large part, upon the degree to which the properties of the integrand deduced during sampling accurately reflect the characteristics of the function. If the function is badly behaved, its predicted and actual behavior may not even resemble one another; this may lead to ineffective or counterproductive partitions. This possibility is especially likely in high dimensions, where even samplings of large cardinality are very sparse; for example, in ten dimensions a sampling of 60,000 points is equivalent to about three points per coordinate. On the other hand, a complex strategy that requires a very large number of integrand evaluations for partitioning may be inefficient, because the additional evaluations might be better expended simply to increase the number of points used to compute the final integral estimate. Thus, a good partitioning strategy must be based on a computationally feasible way of assessing the integrand's behavior.

The main distinctions of the new partitioning strategy from previously proposed methods are:

(1) The behavior of the integrand within a region is estimated by means of multiparameter optimization rather than by sampling;

(2) All subregions are defined by simple bounds on the coordinates.
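The sparsity figure quoted earlier in this section (60,000 points in ten dimensions being equivalent to about three points per coordinate) can be checked with a one-line calculation. The sketch below is illustrative only; the function name `points_per_coordinate` is an assumption introduced here and does not come from the paper.

```python
def points_per_coordinate(n_points: int, n_dims: int) -> float:
    """Edge length of the full grid that would hold n_points in n_dims dimensions."""
    return n_points ** (1.0 / n_dims)


if __name__ == "__main__":
    # The same sample becomes rapidly sparser per axis as the dimension grows.
    for dims in (2, 5, 10):
        print(dims, round(points_per_coordinate(60_000, dims), 2))
```

The point of the calculation is that the per-axis resolution falls off as the n-th root of the sample size, which is why a sampling-based estimate of the integrand's behavior becomes unreliable in high dimensions.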

2. OVERVIEW OF THE ADAPTIVE REFINEMENT PROCEDURE

Consider a hyperrectangular region R, defined by simple bounds on each coordinate:

$$R = \{\, x \mid x_i^L \le x_i \le x_i^U,\ i = 1, \ldots, n \,\}, \qquad (1)$$

where x is the vector $(x_1, x_2, \ldots, x_n)^T$. The essence of an adaptive strategy for partitioning R can be specified by three attributes:

(1) A measure s(R) that indicates the "badness" of the integrand's behavior within R;

(2) A method for subdividing the region after s(R) has been determined, and for processing the new subregions;

(3) A procedure for terminating the partitioning (a global stopping criterion).

The quantity used in the new algorithm to characterize the integrand is the difference of extreme values within R, weighted by the volume of R. Let

$$v(R) = \max_{x \in R} f(x) - \min_{x \in R} f(x), \qquad (2)$$

where f(x) is a scalar-valued function (presently, the integrand function). The spread s(R) is then defined by:

$$s(R) = v(R) \cdot \mathrm{vol}(R). \qquad (3)$$
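The three attributes listed above (a badness measure, a subdivision method, and a global stopping rule) can be arranged into a simple loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the extremes in (2) are found by a coarse grid scan rather than by optimization, regions are split by halving their longest edge, and the stopping rule is a fixed region count. All function names (`grid_extremes`, `volume`, `spread`, `refine`) are hypothetical.

```python
import heapq
import itertools


def grid_extremes(f, lower, upper, steps=9):
    """Stand-in for (2): min and max of f over a coarse grid on the region."""
    axes = [[lo + (hi - lo) * k / (steps - 1) for k in range(steps)]
            for lo, hi in zip(lower, upper)]
    values = [f(p) for p in itertools.product(*axes)]
    return min(values), max(values)


def volume(lower, upper):
    """Volume of a simply-bounded region: product of the edge lengths."""
    vol = 1.0
    for lo, hi in zip(lower, upper):
        vol *= hi - lo
    return vol


def spread(f, lower, upper):
    """s(R) = v(R) * vol(R), with v(R) = max f - min f, as in (2) and (3)."""
    fmin, fmax = grid_extremes(f, lower, upper)
    return (fmax - fmin) * volume(lower, upper)


def refine(f, lower, upper, max_regions=8):
    """Greedy loop: repeatedly split the region with the largest spread.

    Halving the longest edge and stopping at max_regions are placeholders
    chosen for brevity; they are assumptions, not the paper's rules.
    """
    heap = [(-spread(f, lower, upper), 0, list(lower), list(upper))]
    counter = 1  # unique tie-breaker so the heap never compares the lists
    while len(heap) < max_regions:
        _, _, lo, hi = heapq.heappop(heap)
        axis = max(range(len(lo)), key=lambda i: hi[i] - lo[i])
        mid = 0.5 * (lo[axis] + hi[axis])
        left_hi = hi[:]
        left_hi[axis] = mid
        right_lo = lo[:]
        right_lo[axis] = mid
        for a, b in ((lo, left_hi), (right_lo, hi)):
            heapq.heappush(heap, (-spread(f, a, b), counter, a, b))
            counter += 1
    return [(lo, hi) for _, _, lo, hi in heap]
```

Because every subregion is defined by simple bounds, the volume is a plain product and the daughter regions tile the parent exactly, which keeps the bookkeeping trivial.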

The spread measure s(R) bounds the uncertainty of a quadrature or Monte Carlo estimate of the integral over R, and is taken to indicate the contribution of R to the uncertainty of a global estimate of the integral.

The choice of the measure (3) depends in two crucial ways on the simply-bounded form (1) of R. First, the volume of such a region is easy to compute:

$$\mathrm{vol}(R) = \prod_{i=1}^{n} (x_i^U - x_i^L).$$

This would not be true if more complicated regions were allowed. The other term, (2), may seem, at first glance, to be computationally intractable, since two optimization problems must be solved to calculate the spread measure (3) associated with a given subregion. However, the optimization involves only simple bounds on the variables; methods for such simply-bounded problems are well developed and, thus, the sub-problems in (2) can be solved quite efficiently if f is a reasonable function. Section 3 gives some details of the optimization algorithm.

The second element of the partitioning procedure is the method by which a single region is refined into disjoint subregions. The strategy for dividing the region is based on the assumption that the same quadrature rule will be applied in each final subregion, so that the aimed-for result of subdivision is a list of regions with "similar" spread measures. The algorithm to achieve this goal is described in Section 4.

Finally, after R has been partitioned, the daughter subregions are merged into the list of all regions. If the global stopping criteria

are satisfied, the partitioning procedure terminates. Otherwise, the list of regions is scanned for the one with the largest spread measure, which is then considered for refinement at the next iteration. Details of this aspect of the algorithm are given in Section 4.

Figure 1 illustrates the recursive partitioning achieved by applying the above procedure to the function

$$f(x_1, x_2) = \exp\{-15[x_1^2 + (x_2 - 0.5)^2]\} + \exp\{-15[(x_1 + 0.433)^2 + (x_2 + 0.25)^2]\} + \exp\{-15[(x_1 - 0.433)^2 + (x_2 + 0.25)^2]\},$$

$$-1 \le x_1 \le 1, \qquad -1 \le x_2 \le 1.$$

Figure 1a shows an isometric representation of the surface $y = f(x_1, x_2)$, and Figure 1b displays some isopleths of the function on the $(x_1, x_2)$ plane. Figure 1c shows the partitioning of this plane achieved by applying the procedure recursively, in this case creating eleven subregions. The numbers indicate the order in which the corresponding cuts were made.

3. OPTIMIZATION WITH SIMPLY-BOUNDED VARIABLES

To compute the spread measure (3) at each step of the partitioning procedure, it is necessary to solve two bounds-constrained optimization problems of the form:

M1: $\min f(x)$ subject to $x^L \le x \le x^U$

M2: $\max f(x)$ subject to $x^L \le x \le x^U$,

where the scalar-valued function f drives the partitioning, and the vectors $x^L$ and $x^U$ contain, respectively, the lower and upper bounds that define the desired region. The problem M2 can be treated as a minimization problem involving $(-f(x))$, and therefore all subsequent discussion will concern minimization only.

In a typical quadrature problem, f(x) will be twice continuously differentiable, or at least will be non-smooth only at isolated points. The algorithm selected to solve M1 should consequently be able to perform well on a smooth function. However, in most instances, derivatives of f will not be available, so that the method of choice will require function values only. Based on these considerations, the optimization method used in the partitioning algorithm is a bounds-constrained quasi-Newton method with finite-difference approximations to first derivatives.

Quasi-Newton methods for unconstrained optimization have a remarkable history, beginning with Davidon [1959], and Fletcher and Powell [1963]; a recent summary of their motivation and properties is given by Dennis and Moré [1977]. Quasi-Newton methods have been extremely successful on a wide variety of problems; if properly implemented, they are quite robust and usually display superlinear convergence. The idea of a quasi-Newton

method is to build up second-order information about the function to be minimized, by incorporating the observed changes in the gradient into a matrix that approximates the underlying matrix of second partial derivatives (Hessian matrix), so that the method should eventually behave like Newton's method.

A typical iteration of an unconstrained quasi-Newton method begins with the current iterate, x; the gradient vector of f, g; and an approximation to the Hessian, the matrix B.

(i) If the norm of the gradient is sufficiently small, the procedure terminates. Otherwise, proceed to step (ii).

(ii) Solve the linear system $Bp = -g$ for the search direction, p. In practice, numerical stability is insured by using a Cholesky factorization of the matrix B, so that p is always a direction of descent for f. This essential feature is due to Gill and Murray [1972].

(iii) Find a steplength $\alpha > 0$ that yields a sufficient decrease in f, so that $f(x + \alpha p) < f(x)$. The steplength algorithm used in the current procedure is the safeguarded quadratic interpolation procedure implemented by Gill and Murray [1974a].

(iv) Evaluate the gradient at $x + \alpha p$, and produce an updated Hessian approximation by modifying the Cholesky factorization of B with the BFGS quasi-Newton update [see Gill and Murray, 1974b]. Return to step (i) with $x + \alpha p$ as the next iterate.
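Steps (i) to (iv) can be sketched compactly in code. The version below is a simplified illustration and departs from the procedure described in the text in three labeled ways: it updates an approximation H to the inverse Hessian rather than a Cholesky factorization of B, it uses a backtracking (Armijo) steplength rule rather than safeguarded quadratic interpolation, and it uses forward differences for the gradient. The names `fd_gradient` and `bfgs_minimize` and all tolerances are assumptions made for this example.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference approximation to the gradient of f at x."""
    fx = f(x)
    grad = []
    for i in range(len(x)):
        xh = list(x)
        xh[i] += h
        grad.append((f(xh) - fx) / h)
    return grad


def bfgs_minimize(f, x0, tol=1e-4, max_iter=200):
    n = len(x0)
    x = list(x0)
    # H approximates the INVERSE Hessian (a simplification; the text instead
    # maintains a Cholesky factorization of the Hessian approximation B).
    H = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    g = fd_gradient(f, x)
    for _ in range(max_iter):
        # (i) terminate when the gradient norm is sufficiently small
        if sum(gi * gi for gi in g) ** 0.5 < tol:
            break
        # (ii) search direction p = -H g, equivalent to solving B p = -g
        p = [-sum(H[i][j] * g[j] for j in range(n)) for i in range(n)]
        slope = sum(gi * pi for gi, pi in zip(g, p))
        if slope >= 0.0:  # safeguard: fall back to steepest descent
            p = [-gi for gi in g]
            slope = -sum(gi * gi for gi in g)
        # (iii) backtracking steplength that yields a sufficient decrease
        alpha, fx = 1.0, f(x)
        while alpha > 1e-12:
            x_new = [xi + alpha * pi for xi, pi in zip(x, p)]
            if f(x_new) <= fx + 1e-4 * alpha * slope:
                break
            alpha *= 0.5
        else:
            break  # no acceptable step was found
        # (iv) new gradient and BFGS update of the inverse approximation
        g_new = fd_gradient(f, x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        sy = sum(si * yi for si, yi in zip(s, y))
        if sy > 1e-12:  # skip the update if it would lose positive definiteness
            Hy = [sum(H[i][j] * y[j] for j in range(n)) for i in range(n)]
            yHy = sum(yi * hyi for yi, hyi in zip(y, Hy))
            for i in range(n):
                for j in range(n):
                    H[i][j] += ((1.0 + yHy / sy) * s[i] * s[j] / sy
                                - (Hy[i] * s[j] + s[i] * Hy[j]) / sy)
        x, g = x_new, g_new
    return x
```

Keeping the approximation positive definite (here by skipping the update when the curvature condition fails) plays the same role as the Cholesky factorization in step (ii): it guarantees that p is a descent direction.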

In the present algorithm, it is assumed that the analytic gradient of f is not available, so that the calculation of the vector g is carried out using finite differences.

When the variables are constrained to be between simple bounds, the above algorithm can be modified in a straightforward manner. At each iteration, a decision is made whether a given variable is "free" to vary or is to be held "fixed" at one of its bounds. After this determination, the unconstrained algorithm is applied with the following changes:

(1) The gradient, direction of search, and approximate Hessian represent the free variables only;

(2) The steplength determined in step (iii) may need to be restricted to prevent a free variable from violating a bound during an iteration. In this case, the variable subsequently becomes fixed on that bound.

(3) The updates to B involve only the free variables.

(4) The test for convergence in (i) is based on the norm of the gradient with respect to the free variables. When this quantity is sufficiently small, it is necessary to check whether any variable currently held fixed on a bound can be released from its bound. This decision is made by checking the sign of the gradient with respect to all fixed variables; e.g., if the ith variable is fixed on its lower bound and the ith component of the gradient is negative, freeing the ith variable will lead to a reduction in f.
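The free/fixed decision in change (4) and the steplength restriction in change (2) can be illustrated as follows. This is a sketch of the sign test for a minimization problem, written under the assumption that a variable counts as "on a bound" when it lies within a small tolerance of it; the function names and the tolerance are hypothetical.

```python
def classify_variables(x, grad, lower, upper, tol=1e-12):
    """Return (free, fixed) index lists for a minimization step.

    A variable on a bound stays fixed unless the sign of its gradient
    component shows that moving it into the region would reduce f.
    """
    free, fixed = [], []
    for i, (xi, gi, lo, hi) in enumerate(zip(x, grad, lower, upper)):
        on_lower = xi - lo <= tol
        on_upper = hi - xi <= tol
        if on_lower and gi >= 0.0:
            fixed.append(i)   # increasing x_i would not reduce f to first order
        elif on_upper and gi <= 0.0:
            fixed.append(i)   # decreasing x_i would not reduce f to first order
        else:
            free.append(i)    # interior variable, or one released from its bound
    return free, fixed


def max_feasible_step(x, p, lower, upper):
    """Largest steplength alpha for which x + alpha * p stays within the bounds."""
    alpha = float("inf")
    for xi, pi, lo, hi in zip(x, p, lower, upper):
        if pi > 0.0:
            alpha = min(alpha, (hi - xi) / pi)
        elif pi < 0.0:
            alpha = min(alpha, (lo - xi) / pi)
    return alpha
```

For example, a variable sitting on its lower bound with a negative gradient component is returned as free, matching the case described in the text.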


The many additional the

scrftware

controlled step

documentation tolerances

of the algorithm

[Friedman

that

define,

and Wright, for

beginning

to solve

19791,

example,

problems

at a random sample of points

(largest)

function

value

in full

in

including

"sufficiently

user-

small"

in R, and the point

of this

in

as part

of the

to improve spurious

the extrema initial

saddle

depending

tremum since

this

Especially

point

point

for

the

smallest

the minimizati

information

states

might

preclude

the convergence

criteria

any further would

Although

could

be used

is not retained

in high dimensions,

in order

convergence

search

for

be satisfied

on

values

on the problem.)

computed at previous

sample,

robustness.

initial

with

f is eval-

sample is user controllable;

of 50 to 100 seem to be adequate, some regions

Ml and M2, the function

is used as the

(The size

(maximization).

for

are given

(i). Before

uated

details

to a

the true

at the

ex-

initial

point. An additional robustness the

feature

is a "local

initial

point.

search",

The idea

convergence

at a saddle

complicated

and only

perturbed

from the

amount along

a feasible

lower

and an exact

idea will

point

point

line

direction search

descent found

be sketched.

direction,

by moving

the function

value

is constructed

is carried

local

and a second exact

indication

First,

a point

a small,

search

feasible

sufficiently. point

direction.

to the first, line

of

are rather

changes

that

at

search

at whichever

out along

orthogonal

to improve is small

a spurious

of the

is generated

until

is designed

the gradient

to avoid

the general

descent

a second feasible

is again

The details

initial

that

to be used if

point.

each coordinate

Next,

at the lowest

of the algorithm

is Then,

is generated

is performed.

If this local search fails to yield a "sufficiently lower" function value, the initial point is accepted; otherwise, the quasi-Newton procedure begins at the new point.

This local search cannot be guaranteed to move away from a saddle point, since the function is evaluated only at a finite number of points along two directions. In practice, the local search has been quite successful on all the examples tested.

4. REFINEMENT INTO SUB-REGIONS

Given that the spread measure (3) has been computed by the numerical solution of two optimization problems, we next consider the procedure by which the region is subdivided. In this section, we shall be concerned with a particular region, R, defined by (1). Let $x^{max}$ and $x^{min}$ denote the points in R where f achieves its maximum and minimum, respectively, with $f^{max}$ and $f^{min}$ the corresponding function values. For simplicity, we assume that there are no other local extrema in R; the implemented procedure contains provisions to handle situations where this assumption is not satisfied.

A partitioning strategy that allowed general regions might divide R into two parts with equal spread measures, as follows. Suppose that for a given value $\bar f$, $f^{min} \le \bar f \le f^{max}$, one could determine an isoplethic surface $\{x \mid f(x) = \bar f\}$ that completely separates R into two disjoint parts, $R^{max}$ (which contains $x^{max}$) and $R^{min}$ (which contains $x^{min}$). Under the uniqueness assumption, the differences of extreme function values within $R^{max}$ and $R^{min}$, respectively, would be $(f^{max} - \bar f)$ and $(\bar f - f^{min})$.

The desired choice for $\bar f$ would make the spread measures associated with $R^{max}$ and $R^{min}$ equal, i.e.,

$$(f^{max} - \bar f)\, \mathrm{vol}(R^{max}) = (\bar f - f^{min})\, \mathrm{vol}(R^{min}). \qquad (4)$$

The strategy just described is, of course, impractical, since the determination of the isopleths would, in general, be an extremely complex numerical problem. Since the resulting subregions $R^{max}$ and $R^{min}$ would in general no longer be defined by simple bounds, the next optimization problem would be much more complicated and would also be much more difficult to solve. Furthermore, it would be much more difficult to compute the volume of such a general region.

The strategy adopted in the present algorithm divides the region R into a collection of simply-bounded subregions by constructing an axis-oriented hyperrectangular approximation to either $R^{max}$ or $R^{min}$. Either $f^{max}$ or $f^{min}$ is selected as the "major" extremum ($f^M$), i.e., that for which the corresponding function value is farthest from the mean function value $\tilde f$ in R ($\tilde f$ is the average of function values at the initial random sample of points). If

$$\frac{f^{max} + f^{min}}{2} \ge \tilde f,$$

then $f^M = f^{max}$, $f^m = f^{min}$; otherwise, $f^M = f^{min}$, $f^m = f^{max}$ (with the corresponding choices for $x^M$, $x^m$).

We then seek to define a region $R^M$ containing $x^M$ by two sets of "cuts" $(\delta^+, \delta^-)$ along the positive and negative coordinate directions from $x^M$:

$$R^M = \{\, x \mid x_i^M - \delta_i^- \le x_i \le x_i^M + \delta_i^+,\ \ \delta_i^+, \delta_i^- \ge 0,\ \ i = 1, \ldots, n \,\}$$

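with $\delta^+$, $\delta^-$ chosen such that

$$f(x^M + \delta_i^+ e_i) = \bar f, \qquad f(x^M - \delta_i^- e_i) = \bar f, \qquad i = 1, \ldots, n, \qquad (5)$$

where $e_i$ is the ith column of the identity matrix. The equation to be satisfied by $\bar f$ is a re-arrangement of (4):

$$\gamma f^M + (1 - \gamma) f^m = \bar f, \qquad (6)$$

where $\gamma = \mathrm{vol}(R^M)/\mathrm{vol}(R)$. Because $R^M$ is defined by simple bounds,

$$\mathrm{vol}(R^M) = \prod_{i=1}^{n} (\delta_i^+ + \delta_i^-).$$

Thus, the vectors $\delta^+$, $\delta^-$ are the solution of the 2n nonlinear equations:

$$f(x^M + \delta_i^+ e_i) = \frac{\prod_{j=1}^{n}(\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)}\, f^M + \left[ 1 - \frac{\prod_{j=1}^{n}(\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)} \right] f^m,$$

$$f(x^M - \delta_i^- e_i) = \frac{\prod_{j=1}^{n}(\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)}\, f^M + \left[ 1 - \frac{\prod_{j=1}^{n}(\delta_j^+ + \delta_j^-)}{\mathrm{vol}(R)} \right] f^m, \qquad i = 1, \ldots, n. \qquad (7)$$

Several considerations affect the choice of solution method for the nonlinear system (7). It is undesirable to expend too much effort in solving (7), since the solution need not be computed with very high accuracy. This means that a Newton-type method (for solving nonlinear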

equations) based on standard finite differences is unacceptable, because 2n function evaluations are required at every iteration to compute the Jacobian. A reasonable alternative is to use a secant-type method, where the elements of the Jacobian are approximated by differencing function values from the previous iteration. The switch to a secant method is worthwhile because such methods are typically as effective as a Newton-type method in moving from a poor initial estimate of the solution to a reasonably good one (which is all that is required in this case), and display superlinear local convergence [Moré, 1977].

Even a secant method for solving (7) could be considered objectionable because it requires the solution of a 2n by 2n linear system at each iteration. However, the special form of (7) allows it to be transformed to an equivalent, but simpler, nonlinear system. The right-hand side of (7) is a vector whose components are all equal, and hence the vectors $\delta^+$, $\delta^-$ also satisfy the 2n nonlinear equations

$$
\begin{aligned}
f(x^M + \delta_1^+ e_1) - f(x^M + \delta_2^+ e_2) &= 0 \\
&\;\;\vdots \\
f(x^M + \delta_{n-1}^+ e_{n-1}) - f(x^M + \delta_n^+ e_n) &= 0 \\
f(x^M + \delta_n^+ e_n) - f(x^M - \delta_1^- e_1) &= 0 \\
f(x^M - \delta_1^- e_1) - f(x^M - \delta_2^- e_2) &= 0 \\
&\;\;\vdots \\
f(x^M - \delta_{n-1}^- e_{n-1}) - f(x^M - \delta_n^- e_n) &= 0 \\
f(x^M + \delta_1^+ e_1) - \bar f &= 0
\end{aligned}
\qquad (8)
$$

where $\bar f$ denotes the common right-hand side of (7).
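To make the structure of the chained system (8) concrete, the sketch below evaluates its left-hand side for trial cut vectors. It assumes the ordering of unknowns $(\delta_1^+, \ldots, \delta_n^+, \delta_1^-, \ldots, \delta_n^-)$ and a final equation that ties the first cut point to the target level given by (6); the function name `cut_residuals` and its argument names are hypothetical.

```python
import math


def cut_residuals(f, x_major, d_plus, d_minus, f_major, f_minor, vol_R):
    """Left-hand side of the chained system (8) for trial cut vectors."""
    n = len(x_major)

    def along_axis(i, step):
        # Evaluate f at x_major displaced by `step` along coordinate i.
        point = list(x_major)
        point[i] += step
        return f(point)

    # Function values at the 2n cut points: +delta_1..+delta_n, then -delta_1..-delta_n
    values = [along_axis(i, d_plus[i]) for i in range(n)]
    values += [along_axis(i, -d_minus[i]) for i in range(n)]

    # Target level from (6): gamma * f_M + (1 - gamma) * f_m
    vol_RM = math.prod(dp + dm for dp, dm in zip(d_plus, d_minus))
    gamma = vol_RM / vol_R
    f_bar = gamma * f_major + (1.0 - gamma) * f_minor

    # The first 2n - 1 equations link adjacent cut points; the last ties to f_bar
    residuals = [values[k] - values[k + 1] for k in range(2 * n - 1)]
    residuals.append(values[0] - f_bar)
    return residuals
```

Only the last residual depends on every unknown (through the volume of $R^M$); each of the others involves exactly two, which is what gives the Jacobian its sparse pattern.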

The attractive feature of the system (8) is that the Jacobian displays the following special structure, since each equation (except for the last) involves only two adjacent unknowns:

$$
\begin{pmatrix}
\times & \times & 0 & \cdots & \cdots & 0 \\
0 & \times & \times & 0 & \cdots & 0 \\
0 & 0 & \times & \times & \cdots & 0 \\
\vdots & & & \ddots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \times & \times \\
\times & \times & \times & \cdots & \times & \times
\end{pmatrix}
\qquad (9)
$$

If no interchanges are necessary, the matrix (9) will be reduced to upper triangular form very easily, by simply subtracting multiples of each successive row from the last. This means that solving the linear system at each iteration of the secant method for solving (8) is extremely fast. Each iteration requires 2n function evaluations, one for each variable, with the latest values of $\{\delta_i^+, \delta_i^-\}$. Typically, three or four iterations are required in order for the nonlinear system (8) to be satisfied with reasonable accuracy.

Numerous safeguards are included in the secant procedure; in particular, each variable $\delta_i^+$, $\delta_i^-$ is constrained to remain within the range where the solution must lie, and the norm of the vector that is the left-hand side of (8) is required to decrease at every iteration.
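The reduction of a matrix with the pattern (9) to upper triangular form can be sketched in a few lines. The routine below assumes, as the text does, that no row interchanges are needed (all pivots are nonzero); the storage layout (`diag`, `upper`, `last_row`) and the function name are choices made for this illustration.

```python
def solve_chained_system(diag, upper, last_row, rhs):
    """Solve J p = rhs, where J has the pattern shown in (9).

    Row k (for k < N - 1) holds diag[k] in column k and upper[k] in column
    k + 1; the final row, last_row, is full. No pivoting is attempted.
    """
    N = len(rhs)
    last = list(last_row)
    last_rhs = rhs[-1]
    # Eliminate the full last row by subtracting a multiple of each row in turn.
    for k in range(N - 1):
        m = last[k] / diag[k]
        last[k] = 0.0
        last[k + 1] -= m * upper[k]
        last_rhs -= m * rhs[k]
    # Back-substitution on the resulting upper-bidiagonal matrix.
    p = [0.0] * N
    p[N - 1] = last_rhs / last[N - 1]
    for k in range(N - 2, -1, -1):
        p[k] = (rhs[k] - upper[k] * p[k + 1]) / diag[k]
    return p
```

The work is linear in the number of unknowns, compared with the cubic cost of factorizing a general 2n by 2n matrix, which is why the linear solve adds little to the cost of each secant iteration.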


For simplicity cluded

in the preceding

particular,

the fact

candidates

for

same, it Certain close

of exposition,

cuts.

is prudent directions

Since

no cut

nor along

the

ith

to disregard

cuts

are eliminated

for

'i"*

bound,

xiL

i

Furthermore,

that

ith

strategy

appear

for

- in

any cut

is the

First,

xM may be very be insignificant.

direction

if

XiL),

I a(xi" -

if XiL),

p = .05).

the solution

values

for

6; are constrained

to satisfy

0