Lecture Notes in Mathematics Edited by A. Dold and B. Eckmann

1349 Erich Novak

Deterministic and Stochastic Error Bounds in Numerical Analysis

Springer-Verlag Berlin Heidelberg New York London Paris Tokyo

Author

Erich Novak, Mathematisches Institut, Universität Erlangen-Nürnberg, Bismarckstr. 1 1/2, D-8520 Erlangen, Federal Republic of Germany

Mathematics Subject Classification (1980): 65-02, 41A46, 65C05, 65J05, 68Q25, 28C20
ISBN 3-540-50368-4 Springer-Verlag Berlin Heidelberg New York
ISBN 0-387-50368-4 Springer-Verlag New York Berlin Heidelberg

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways, and storage in data banks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law of September 9, 1965, in its version of June 24, 1985, and a copyright fee must always be paid. Violations fall under the prosecution act of the German Copyright Law. © Springer-Verlag Berlin Heidelberg 1988 Printed in Germany Printing and binding: Druckhaus Beltz, Hemsbach/Bergstr. 2146/3140-543210

Acknowledgments
This is a revised version of my notes "Deterministic, stochastic, and average error bounds in optimal recovery", written in 1984-1986. The author is indebted to several people for generous advice and comments, in particular to Professor D. Kölzow, who also initiated and supervised his doctoral thesis.
Erlangen, June 1988

Erich Novak

Contents

Introduction ............................................................ 1
1. Deterministic error bounds ........................................... 9
   1.1 Basic assumptions and definitions ................................ 9
   1.2 Deterministic error bounds and n-widths .......................... 14
   1.3 Results for special problems ..................................... 22
2. Error bounds for Monte Carlo methods ................................. 43
   2.1 Basic properties of Monte Carlo methods .......................... 43
   2.2 Results for special problems ..................................... 53
3. Average error bounds ................................................. 66
   3.1 Averages over the class of problem elements ...................... 66
   3.2 The average over the set of information .......................... 79
Appendix: Existence and uniqueness of optimal algorithms ................ 90
Bibliography ............................................................ 100
Notations ............................................................... 110
Index ................................................................... 112

Introduction

In these notes we want to investigate different deterministic and stochastic error bounds of numerical analysis. For many computational problems (such as approximation, optimization, and quadrature) we have only partial information, and consequently such problems can only be solved with uncertainty in the answer. The information-based approach asks for optimal methods and optimal error bounds if only the type of the available information is indicated. We begin with worst case error bounds for deterministic methods and consider relations between these error bounds and the n-widths of the class of problem elements (1.2). In 1.3 we give worst case error bounds for some special problems. We are mainly interested in the problems of approximation (App), optimization (Opt and Opt*), and quadrature or integration (Int). We consider different function classes, for both adaptive and nonadaptive methods. First of all, I explain the information-based approach by means of an example.

Example 1.

If f : [0,1] → R is a continuous (or Riemann integrable) function and

S_n(f) = \frac{1}{n} \sum_{i=1}^{n} f\left(\frac{2i-1}{2n}\right),

then

\lim_{n\to\infty} S_n(f) = \int_0^1 f(x)\,dx.

For the error \Delta(S_n(f), f) = …
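A direct numerical illustration may help here; the following short Python sketch (not part of the original text; the integrand sin and the values of n are our own illustrative choices) evaluates S_n and its error.

```python
# Minimal sketch of the midpoint rule S_n from Example 1 (illustrative only).
import math

def midpoint_rule(f, n):
    """S_n(f) = (1/n) * sum_{i=1..n} f((2i - 1) / (2n))."""
    return sum(f((2 * i - 1) / (2 * n)) for i in range(1, n + 1)) / n

f = math.sin                      # sample integrand, an arbitrary choice
exact = 1 - math.cos(1)           # its integral over [0, 1]
for n in (4, 16, 64):
    approx = midpoint_rule(f, n)
    print(n, approx, abs(approx - exact))
```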
e_{n+1}(F_s, App) …, a_n^{ad}(F_s, Opt) ≥ (1/2 − ε) · 2^{−1/s} · e_n(F_s, Opt), a_n^{ad}(F_s, Opt*) ≥ (1/2 − ε) · 2^{−1/s} · e_n(F_s, Opt*), a_n(F_s, λ_s) ≤ … For ε > 0 there is a linear S_2 ∈ A_{n(ε)} with ‖S_2(f) − f‖ < ε for all f ∈ K. Now the mapping S^+ = S − S_2 = Id − S_2 is linear and defines a problem which is solvable and lipschitz with a lipschitz constant less than ε. By Proposition 1.2.5 we get

e_m(F + K, S^+) ≤ …, d_m(F) ≥ d_n(F), and hence

\lim_{n\to\infty} d_n(F) = 0  ⟺  \lim_{n\to\infty} e_n(F, App) = 0.

c) Let S : F → M be a linear problem (see 1.3.1 for the definition) with \lim_{n\to\infty} e_n^{ad}(F, S) = 0. It can be proved that in this case S(F) ⊂ M is of the form S(F) ⊂ K + V, where K is compact and V is finite dimensional. This condition, however, is not sufficient for \lim_{n\to\infty} e_n(F, S) = 0. Consider, for example, the problem S = Int = λ with F = {f ∈ C([0,1]) | ‖f‖_∞ ≤ 1}.

1.2.8 The problem Int

Let S = Int : B(X) → R be a bounded and linear mapping. By means of Proposition 1.2.5 we have e_n(F, S) ≤ … For ε > 0 there exist a_1, …, a_m ∈ X and c_1, …, c_m ∈ R with

L(f) = \sum_{i=1}^{m} c_i f(a_i)

for all f ∈ V and

‖L‖ ≤ \sum_{i=1}^{m} |c_i| ≤ ‖L‖ + ε.

Proof: If X is compact and V consists of continuous functions, this statement even holds for ε = 0 (this is the mentioned interpolation theorem, see Shapiro (1971)). We reduce the general case to this special case: The set M = {f ∈ V | ‖f‖ = 1} is compact, so for ε_1 > 0 there exist f_1, …, f_m ∈ M with

M = \bigcup_{i=1}^{m} \{f ∈ M \mid ‖f − f_i‖ ≤ ε_1\}.

For ε_2 > 0 let b_i be given so that |f_i(b_i)| ≥ 1 − ε_2 (i = 1, …, m). By eventually making K' = {b_1, …, b_m} larger (K' ⊂ X finite) we can assume that for V' = {f|_{K'} | f ∈ V} the statement dim V' = n holds. Now we apply the special case to K', V', and L' defined by L'(f|_{K'}) = L(f). Because of

‖f|_{K'}‖ ≥ ‖f‖ · (1 − ε_1 − ε_2)

for all f ∈ V we get

L(f) = \sum_{i=1}^{m} c_i f(a_i)

(for all f ∈ V and suitable c_i ∈ R, a_i ∈ K') with

\sum_{i=1}^{m} |c_i| = ‖L'‖ ≤ ‖L‖ · (1 − ε_1 − ε_2)^{-1}

and the statement follows.

Proof of the Proposition: Let F, S, n and ε > 0 be given. There is a vector space V ⊂ B(X) with dim V = n and

\sup_{f∈F} \inf_{g∈V} ‖f − g‖ ≤ d_n(F) + ε.

Because of the last lemma there is an L : B(X) → R of the form L(f) = \sum_{i=1}^{m} c_i f(a_i) with L(f) = S(f) for all f ∈ V and ‖L‖ ≤ ‖S‖ + ε. Therefore

|L(f) − S(f)| ≤ …

We have e_n^{ad}(F, S) ≥ \frac{1}{2} e_n(F, S) for any linear problem, see Gal, Micchelli (1980) and Traub, Woźniakowski (1980). It is easy to sharpen this inequality using the notion of Jung's constant.

Definition
Let M be a normed space. The Jung constant c(M) is defined by

c(M) = sup{rad(D)/diam(D) | ∅ ≠ D ⊂ M is bounded}.

Here rad(D) and diam(D) are defined by

rad(D) = \inf_{x∈M} \sup_{y∈D} d(x, y)   and   diam(D) = \sup_{x,y∈D} d(x, y).

Proposition 1
Let S : F → M be a linear problem. Then

e_n^{ad}(F, S) ≥ \frac{1}{2 c(M)} \, e_n(F, S).

Proof: … diam(S(N_0^{-1}(x))) ≥ \frac{1}{c(M)} rad(S(N_0^{-1}(x))). Since this holds for all x we obtain diam(S(N_0^{-1}(0))) ≥ \frac{1}{c(M)} r_{max}(N_0). Thus

r_{max}(N) ≥ rad(S(N^{-1}(0))) = rad(S(N_0^{-1}(0))) = \frac{1}{2} diam(S(N_0^{-1}(0))) ≥ \frac{1}{2 c(M)} r_{max}(N_0).

Together with c(M) ≤ 1 this completes the proof.

The following results concerning c(M) are well known, see Holmes (1972) and Amir (1985). Together with Proposition 1 these results yield lower bounds for the adaption constant a(M), which we define by

a(M) = inf{e_n^{ad}(F, S)/e_n(F, S) | S : F → M is a linear problem and n ∈ N}.

In particular, we obtain a(L_∞(μ)) = 1, i.e. adaption cannot help in the case M = L_∞(μ). This result is essentially due to Bakhvalov (1971).

Proposition 2
1) 1/2 ≤ c(M) ≤ 1 for every normed space M;

2) c(l_2^m) = \left(\frac{m}{2(m+1)}\right)^{1/2}, where l_2^m is the euclidean space R^m;
3) c(H) = 2^{-1/2}, where H is an infinite dimensional Hilbert space;
4) c(C[0,1]) = 1 and, more generally, c(C(K)) = 1 if K is a compact space which is not extremally disconnected;
5) c(L_∞(μ)) = 1/2 and, more generally, c(M) = 1/2 if and only if M is a P_1-space, i.e. M is isometric with some C(K), where K is an extremally disconnected compact space.
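Statement 2) can be checked numerically on the regular simplex, for which the ratio rad(D)/diam(D) attains the value (m/(2(m+1)))^{1/2}; the following Python sketch (our own illustration, assuming numpy is available) does this for a few dimensions.

```python
# Numerical check of c(l_2^m) = sqrt(m / (2(m+1))) via the regular simplex (own sketch).
import numpy as np

def simplex_ratio(m):
    # The unit vectors e_1, ..., e_{m+1} of R^{m+1} span a regular m-simplex of edge sqrt(2);
    # its Chebyshev centre is the centroid.
    verts = np.eye(m + 1)
    centre = verts.mean(axis=0)
    rad = np.linalg.norm(verts - centre, axis=1).max()
    diam = max(np.linalg.norm(v - w) for v in verts for w in verts)
    return rad / diam

for m in (1, 2, 3, 10):
    print(m, simplex_ratio(m), (m / (2 * (m + 1))) ** 0.5)
```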

Example
It was an unsolved problem for some years whether there is a linear problem where adaptive methods are really better than nonadaptive ones, see Problem 1 of Packel, Woźniakowski (1986). The following example was independently found by Mark A. Kon and the author in 1987, see Kon, Novak (1988) for more details. Let K ⊂ R³. We define

R_x(K) = \sup_{d∈R} rad\{k ∈ K \mid k · x = d\}    (x ∈ R³, x ≠ 0)

and

R(K) = \inf_{x∈R³,\, x≠0} R_x(K).

We assume that T ⊂ R³ is the convex hull of the following six points:

(0, \sqrt{3}/3, ±1/4)   and   (±1/2, −\sqrt{3}/6, ±1/4).

It can be proved that R_{(0,0,1)}(T) = \sqrt{3}/3 ≈ 0.577, R_{(1,0,0)}(T) = 1/2, and R(T) = R_{(1,0,0)}(T) = 1/2. For M₀ = {(t₁ + t₂)/2 | t₁ ∈ T, t₂ ∈ −T} we get R(M₀) = R_{(0,0,1)}(M₀) = 1/2. The set M₀ is the convex hull of the points

(±1/4, ±\sqrt{3}/4, ±h₀)   and   (±1/2, 0, ±h₀),

where h₀ = 1/4. If we replace h₀ by h = \sqrt{21}/12 ≈ 0.382 we obtain a convex and symmetric set M ⊂ R³ with R(M) = R_{(0,0,1)}(M) = 1/2 and we also have R_{(1,0,0)}(M) = \sqrt{3}/3 ≈ 0.577. Now we consider the information

x_λ = (0,0,1) if λ ≤ …,   and   x_λ = (1,0,0) if λ > …,

for the convex body M_λ = λT + (1 − λ)M. We obtain

(1)   R(M_λ) ≤ R_{x_λ}(M_λ) ≤ 0.530    (0 ≤ λ ≤ 1).

It can be shown, however, that there is no single x ∈ R³ with

(2)   R_x(M_λ) < \sqrt{3}/3 ≈ 0.577

for all λ. Now we can give an example for a linear problem such that adaption helps in the worst case. Let F ⊂ R⁴ be the convex hull of the points

{±(x₁, x₂, x₃, x₄) | (x₁, x₂, x₃) ∈ T and x₄ = k},
{(x₁, x₂, x₃, 0) | (x₁, x₂, x₃) ∈ M},

where k > 0. We consider the linear problem

S : F → l_2^4,   S(x) = x

(observe that F is a symmetric and convex set). It follows from (1) that

e_2^{ad}(F, S) ≤ 0.530

(we use the information I₁(x) = x₄ and I₂(x) = x₃ if |x₄| ≤ …). On the other hand,

Δ_{max}(S̃) ≥ \sqrt{3}/3 − ε ≈ 0.577 − ε

for every nonadaptive S̃ ∈ A₂, where ε > 0 and k ≥ k(ε). This follows from (2).

Remark
As a corollary, we obtain 0.790 ≤ a(l_2^4) ≤ …

Proposition 3
Let M be a subspace of M̃ for which there is a projection P : M̃ → M with ‖P‖ = 1. Then

c(M̃) ≥ c(M)   and   a(M̃) ≤ a(M).

Proof: Let ε > 0. Then there is a bounded set D ⊂ M with

rad(D)/diam(D) ≥ c(M) − ε.

Here rad(D) = rad_M(D) is the radius of D in M. Because of the existence of a norm-1 projection P : M̃ → M, we have

rad_M(D) = rad_{M̃}(D)

(see Garkavi (1964)) and hence

rad_{M̃}(D)/diam(D) ≥ c(M) − ε.

We conclude that c(M̃) ≥ c(M). Now let S : F → M be a linear problem and n ∈ N. Then we also have S₁ : F → M̃ with the same operator S₁ = S. We show that

e_n^{(ad)}(F, S₁) = e_n^{(ad)}(F, S),

which proves our statement. Let N : F → R^n be some (adaptive or nonadaptive) information operator. Then the radius r_M(N) of N with respect to the problem S : F → M is given by

r_M(N) = \sup_{x∈R^n} rad_M S(N^{-1}(x)),

and an analogous formula holds for r_{M̃}(N). It follows that r_M(N) = r_{M̃}(N) …

The following Proposition 4 deals with L_p-spaces. Our result implies that a(M) < 1 (and adaption helps for certain linear problems) if M is an L_p-space with dim M > 3 and |p − 2| < δ for some δ > 0. A more careful analysis of our example in four dimensions would probably show that a(l_p^4) < 1 for all 1 < p < ∞. It would be interesting to have more information on the numbers a(l_p^m).

Proposition 4
i) c(l_p^m) and a(l_p^m) depend (for fixed m) continuously on p, 1 ≤ p ≤ ∞.
ii) a(l_p^m) ≥ a(l_p^{m+1}) ≥ a(l_p(N)) ≥ a(L_p(R)).

Proof: First we prove i). Let D ⊂ R^m be a bounded set and let rad_p(D) and diam_p(D) denote its radius and diameter in l_p^m. It is easy to prove the following estimates for 1 ≤ p ≤ p̃ ≤ ∞:

rad_p(D) ≥ rad_{p̃}(D) ≥ m^{1/p̃ − 1/p} · rad_p(D)

and

diam_p(D) ≥ diam_{p̃}(D) ≥ m^{1/p̃ − 1/p} · diam_p(D).

It follows that p ↦ c(l_p^m) is a continuous function, 1 ≤ p ≤ ∞. Let S : F → R^m be a given problem, let N : F → R^n be a given information and let r_p(N) be the radius of N with respect to the norm of l_p^m in R^m. It follows that

r_p(N) ≥ r_{p̃}(N) ≥ m^{1/p̃ − 1/p} · r_p(N)

and the continuity of p ↦ a(l_p^m) can easily be concluded. The inequalities in ii) follow from Proposition 3 because the projections

P : l_p^{m+1} → l_p^m,   P((x₁, …, x_{m+1})) = (x₁, …, x_m),
P : l_p(N) → l_p^m,   P((x₁, x₂, …)) = (x₁, …, x_m),
P : L_p(R) → …

all have norm 1.

R e m a r k o n linear a l g o r i t h m s We have seen t h a t adaptive methods are only slightly better than nonadaptive ones for linear problems. Let N • An be a given nonadaptive information, N ( f ) = (f(al),...f(a,~)). We could ask whether there is a linear method

S(f) = ¢o

N(f) = fif(a,).m, i=1

(with mi • M ) such t h a t r(N) = Am~,(S). This is true for the problems App and Int (see 1.2.6) and more general in the case M = t t or M = B ( X ) (Smolyak's lemma, see Bakhvalov (1971), Sukharev (1986), and Osipenko (1976) for the complex case). However, there are linear problems for which no linear optimal algorithm exists, see Packel (1986) and Werschulz, Woiniakowski (1986).

1.3.2 Proposition
Let F ⊂ B(X) be convex and symmetric, and let n ∈ N. Then

e_{n+1}(F, App) ≤ …

…

1.3.5 Proposition
a) Assume that … ≥ ε for all i = 1, …, n+1. Then

e_n^{ad}(F, App) ≥ ε,   e_{n−1}^{ad}(F, Opt) ≥ ε,   and   e_n^{ad}(F, Opt*) ≥ 1/2 · ε.

b) Assume that f₁, …, f_m (where m > n) have mutually disjoint supports and

\sum_{i=1}^{m} δ_i f_i ∈ F

for all δ_i ∈ {−1, 1}. Let Int(f_i) ≥ ε for i = 1, …, m. Then

e_n^{ad}(F, Int) ≥ (m − n) · ε.

c) Assume that f₁, …, f_{n+1} have mutually disjoint supports, Int(f_i) ≥ ε, and ±f_i ∈ F for i = 1, …, n+1. Then

e_n^{ad}(F, Int) ≥ ε.

Proof: The proof of these lower bounds can easily be given by means of the radius of information, see 1.1.2. To give an example, we show b). Let S̃ ∈ A_n^{ad} and f = \sum_{i=1}^{m} f_i. Then S̃ uses some information N_f(f) = (f(a₁), …, f(a_n)) to compute S̃(f) = φ ∘ N_f(f). Renumbering the f_i we can assume that N_f(f⁻) = N_f(f), where

f⁻ = \sum_{i=1}^{n} f_i − \sum_{i=n+1}^{m} f_i.

Because of |S(f) − S(f⁻)| ≥ 2 · (m − n) · ε and S̃(f) = S̃(f⁻) the maximal error of S̃ must be at least (m − n) · ε.
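The fooling-function argument of part b) is easy to reproduce numerically. The toy example below is entirely our own; the hat functions and node positions are arbitrary illustrative choices and are not claimed to lie in any particular class F. It builds m bumps with disjoint supports, flips the sign of every bump that contains no sample node, and compares the two integrals.

```python
# Toy illustration of the lower-bound argument in b): two functions with identical data
# at the n nodes whose integrals differ by 2*(m - n)*eps (own example).
m, n = 8, 3
nodes = [0.05, 0.5, 0.95]                      # arbitrary sample points of a quadrature

def bump(i):
    """Hat of height 1 supported on the i-th of m subintervals of [0, 1]."""
    a, b = i / m, (i + 1) / m
    return lambda x: max(0.0, 1.0 - abs(x - (a + b) / 2) / ((b - a) / 2))

hit = {i for i in range(m) for x in nodes if i / m <= x < (i + 1) / m}
signs = [1 if i in hit else -1 for i in range(m)]

f_plus  = lambda x: sum(bump(i)(x) for i in range(m))
f_minus = lambda x: sum(signs[i] * bump(i)(x) for i in range(m))

eps = 1 / (2 * m)                              # integral of one bump
print([(f_plus(x), f_minus(x)) for x in nodes])           # identical information
print(2 * (m - len(hit)) * eps, ">=", 2 * (m - n) * eps)   # gap between the integrals
```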

1.3.6 The problems App, Opt, and Opt* for classes of Lipschitz functions and related classes of functions

Let (X, d) be a bounded metric space and let f : X → R be a continuous function. We put

ω(f, h) = sup{|f(x) − f(y)| : d(x, y) ≤ h}.

The function ω(f, ·) : R⁺ → R⁺ ∪ {∞} is called the modulus of continuity of f. If the function f is uniformly continuous then ω(f, ·) has the following properties:

(1) ω(f, ·) is nondecreasing, finite, and continuous;
(2) \lim_{h\to 0} ω(f, h) = 0;
(3) ω(f, h₁ + h₂) ≤ ω(f, h₁) + ω(f, h₂).
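For a concrete feeling for this quantity, ω(f, h) can be approximated on a finite grid; the sketch below (our own illustration, with √x as an arbitrary sample function on [0,1]) does exactly that and compares with the exact value ω(√·, h) = √h.

```python
# Discrete modulus of continuity on a uniform grid of [0, 1] (illustrative sketch).
import math

def modulus(f, h, grid=501):
    xs = [i / (grid - 1) for i in range(grid)]
    vals = [f(x) for x in xs]
    return max(abs(vals[j] - vals[i])
               for i in range(grid) for j in range(i, grid)
               if xs[j] - xs[i] <= h + 1e-12)

f = math.sqrt
for h in (0.25, 0.04, 0.01):
    print(h, modulus(f, h), math.sqrt(h))   # for sqrt the supremum is attained at x = 0
```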

In the following we always assume that ω : R⁺ → R⁺ is a function which satisfies (1)-(3). We consider the set

F_ω = {f : X → R | ω(f, h) ≤ ω(h) for all h > 0}.

First we note that F_ω is a convex and symmetric subset of B(X). Let n ∈ N. We define the n-th covering constant c_n of X by

c_n = \inf_{a_1, …, a_n ∈ X} \; \sup_{a∈X} \; \inf_i d(a, a_i).

Proposition
(1)  e_n^{ad}(F_ω, App) = e_n(F_ω, App) = ω(c_n)
(2)  e_n^{ad}(F_ω, Opt) = e_n(F_ω, Opt) = ω(c_n)
(3)  e_n^{ad}(F_ω, Opt*) = e_n(F_ω, Opt*) = 1/2 · ω(c_n).

Proofi (1) Using 1.2.6 we get

e~d( F~, App) = en( F~, App) = =

inf

al,... , a n E X

sup

inf

sup

al,... , a n E X

. l E F t , f(al)=O

Ilflloo

{ca(infd(a,a,))la E X } = ca(c.).

(2) Let S = ¢ o N C A~d for the p r o b l e m Opt, let No(f) i n f o r m a t i o n which is t a k e n for g = 0 a n d let S(0) = a,~+l. We assume t h a t A , ~ , ( S ) = ca(c,) - e with e > 0. We define

= (f(a~),... , f ( a ~ ) ) be the

fi(a) = ca(d(a, ai)) (i = 1 , . . . , n) f , + l ( a ) = ca(d(a,a~+l)) -ca( m i n d(ai,a,+l)) i = 1,... ,n f(a) =

min

i=l,...,n+l

f~(a).

We have fi E F~, hence f E F~, and and supf(a) -

and

f(ai) = 0 for i = 1 , . . . , n and therefore S ( f ) = a~+l

f(a,+l) = s u p f ( a ) +ca(

min

i : 1,... ,n

d(ai,an+l))
0 a n d a l , . . . , a~ • X w i t h

s u p i n f d ( x , a i ) < c,, + e. .vEX

'

We also assume that {X₁, …, X_n} forms a partition of X such that

X_i ⊂ {x | d(x, a_i) ≤ c_n + ε}.

Clearly the diameter of X_i is less than 2(c_n + ε) and therefore each f ∈ F_ω can be approximated by a g such that

‖f − g‖ ≤ 1/2 · ω(2(c_n + ε))

and g|_{X_i} is a constant for each i (define g by g|_{X_i} = 1/2 · (\sup_{x∈X_i} f(x) + \inf_{x∈X_i} f(x))).

Example
Let X = [0,1]^s with d(x, y) = max_i |x_i − y_i|. It is easy to prove that p_n = 1/2 · m^{−1} for n = m^s + 1 up to n = (m+1)^s, where m ∈ N. Hence we have c_n = p_{n+1} for all n ∈ N and Proposition 1.3.7 yields the exact values of d_n(F_ω) for all n ∈ N:

1/2 · ω(2c_n) = d_n(F_ω) = 1/2 · ω(2p_{n+1}).

Only the special case s = 1 of this statement seems to be known, see Grigorian (1973).

2n

I n t ( f ) = ~n ~

f(ai).

i=l

Define fi by

f~(x)

f w(p2, - e - d),

[

0,

if d = d(x, ai) < p ~ - e; otherwise.

34 I t is easy t o see t h a t fi • F~, it follows by means of 1.3.5 t h a t 1

e:a(Fo,,Int) > (2n - n ) . ~ n " co(p2,) = 1 / 2 . w(p2,).

Example T h e case X C R ~ a n d I n t = )~ : F~, --+ R is s t u d i e d in Sukharev (1979). We give only one result. Let F , = { f : [0, 1]" ---* t t I If(x) - f(Y)l 1, b u t e , ( F w ~ , , ~ ) = 1 / ( 2 n ) (the last result is due to Zubrzycki (1963/64), see Novak (1983)).

1.3.9 Hblder classes of functions W e define t h e Hblder classes C,k,~ by C~ '~ : { f : [0, 1] ~ ---* R I ] D ( O f ( x ) - D ( O f ( Y ) l

0 such that

n and pairs of xi are separated by at least 2e 1/(~+"). We can choose n of $ -

Because of fi C C~ '", the lower bounds now follow from 1.3.5. To prove the upper bounds we can use known methods using piecewise polynomials, see Ivanov (1971).

Remarks a) Biickner (1950a,b) seems to be the first to obtain similar lower bounds for the problem Int. b) F o r a = 1 and k = 0 o r s = l we know that

d.( C~ '~) = e,( App, C~'~), see 1.3.7 and 1.3.10. It would be interesting to know whether this holds for arbitrary k and s. (What are perfect splines in several variables?) c) By means of 1.2.8 it is easy to prove that _

sup e,(C~'",Int) × n

k+~

,

]]l~tll_ s then Wvk,' may be regarded as a subset of C([0, 1]~) ("imbedding condition"). If, in addition, 2k > s then the following is known:

×

(

n-k/s n-k/n-1/2+l/p k

and L

k,s

dn(W;



{ _k/,+,/2, n-k/s+l/P:

forp> 2 forl s o r p = l a n d k = s . Then there i s a c > 0 s u c h that for each f 6 W~"([a,b] ~) there is a polynomial g E P ( k , s ) (i.e. the degree o f g is less than k) with

Ill

-

glloo

IID 'flI,.

< c . (b - a y - S / ' .

Ic,l=k Proof of Proposition 1 Again the lower bound follows from 1.3.5. To prove the upper bound we divide the cube [0, 1]' into m s little cubes with edge length 1/m (m E N). Consider a quadrature formula on each of the little cubes with dim P(k, s) knots and norm m -s which is exact for all p E P ( k , s ) (such a formula exists, see 1.2.8). Let S be the sum of these quadrature formulas, let

n=

"mS'

k --1

and let [[D~f[[(0 denote the p-norm on the ith subcube (i = 1 , . . . , mS). Then S E A, and for f E W~ '8 we get by Lemma 1:

m

s

I s ( f ) - s ( f ) ] ~ c. m -'-k+'/p ~_~ ~

IID~fll(p o

i=1 M=k

< c. m -'-k+'/p

• m'-'/P.

~_, I,~l=k

IID~'fllp

__.< C • T n - k ~

which proves the statement. Remark

The special case p = 2 of Proposition 1 is treated in Sobolev (1965) with Hilbert space methods. The case 1 < p < oo is considered in Polovinkin (1974). See Besov (1980,1981) for further results. It is not difficult to prove the following stronger result.

38

Proposition 2 Let k p > s o r p = l a n d

k=s.

Then

exists.

Proof: Let h ( n ) = n kl" . e . ( W ; , ks

, ).

First we h a v e i n f n e N h ( n ) = c > 0. Let e > 0 and no E N with c 0, 0 < 5 < 1. We consider the following partition Z1 of [0, 1]' in subcubes: Beginning with [0, 1]' each (sub-) cube is divided into 2" cubes with half the edge length provided t h a t for the respective cube the inequality

(1)

lt ,ll- . ;
5.

Let l 1 = 2 -a be the edge length of the biggest cube in Z2, and let 12 = 2 -b be the edge length of the smallest cube. Then (7)

N(5) _< ( b - a ) . 2 ~. M.

Let M~ E Z2 have edge length I~ and let Mb E Z, have edge length I2. We further assume that (6) is valid for Mb. We define a measure #' by the following properties: (8)

#'=p

outside ofM~ a n d M b ,

#'--0

on Mb and

" = rl'l[M " 2 °'.

on Mo.

An easy computation yields

(9)

II 'll

1.

We apply the technique which yields to the partition Z~ to the measure #' and obtain a partition Z; = { M i t i = 1 , . . . , N'(5)}. Now we compare N'(5) with .hr(5): First it can be seen that by the change of the measure on Mb the number of the cubes becomes smaller at most by (b - a) • 2". On the other hand new cubes m a y be added by the partition of M~. First we have (10)

p ' = k,~"

with k > 5 . 2 ~ + ~

onM~.

Now we define m E N by

(11)

(2s) m-2
(x ~. y~)~+"

(18)

U~-

--

'

which is valid for u, v, x, y > 0 (see Mitrinovid (1970, p.30)), we obtain 5

(19)

M'_< 6- ~-7.

By (5), (15), (16) and (19) we get s

N(~) _< 2 2' • c~,~. ~ ~¥~ and hence the lemma is proved. P r o o f o f P r o p o s i t i o n 3:

Let be given a Borel measure # on [0,1] ' with IIPH < 1, 0 < 6 < 1 and k,p such that

pk>s. We put a = k - sip > 0 and apply L e m m a 2. As in the proof of P r o p o s i t i o n l we approximate each #~ (i = 1 , . . . , N ) by a quadrature formula/5/ with dim P(k, s) knots and norm II#,ll 0 a n d ~ i =r l m i inequality

= 1.

T h e n for t h e m e a n value

r .~ = )-~i=1 mixi the

i:l

is valid for all x E M . Let l i m ~ _ ~ a,~(F,S) = ~ E R and let e > 0. T h e n t h e r e is a n C N a n d a Q E C(A,~) w i t h A m a , ( Q ) _< ~ + e . Let fl = { W l , . . . ,w~} w i t h m({w~}) = m~ be the p r o b a b i l i t y space t h a t is t a k e n as a basis. For each f E F we have r

A(Q, f ) _- ~ i=l

IIQ(~,,)(/) - s ( f ) l l . m ,

0. Then there is a n E N and a Q E C(A,) with A,~a~(Q ) < ~ + E. Let f~ = { w l , - - - , w ~ } with m ( ( ~ i ) ) = m~ be t h e probability space t h a t is taken as a basis. For each f E F we have

A(Q,f)

= ~ [f(Q(w,)(f))- sup fI" m,

< 6 + £.

i=1

We define S E Ant by

S(f) = Q(~.,i)(f) where i E { 1 , . . . , r ) is the least number with

f(Q(c~,)(f)) = m axf(Q(c~j)(f)). 3

Now we have If(S(f)) - sup fl ~ ~ + ~ for all f E F which proves the first inequality. The proof of the second one is analogous.

2 . 1 . 9 L o w e r b o u n d s o f the a(ad)(F,S) Estimates from above of the numbers a(~d)(F, S) can be given by the more or less explicit indication of suitable Monte Carlo methods. Because only generalized Monte Carlo methods have been investigated up to now, one often has to discretize known methods appropriately. In this section we deal with lower bounds of the a(ad)(F, S). These estimates only use properties of deterministic methods. First we define the numbers e(ad)(F, S, #) which will be discussed in the third chapter.

Definition Let # be a (Borel) probability measure on F and let f* h d# denote the upper integral of a function h : F ~ It. The number £x~(S) =

2x(S(f),

f) dp(f)

51 is called the average error of the (deterministic) method S : F --+ M with respect to the measure #. Analogously to the e(ad) we define e(a~)(F, S, #) =

inf

Au(S ).

SeA ('a)

Because of Au(S ) < Am~,(S) we have

e(.°")(F, s,,) _< e(.°d)(F, S). By the following proposition it follows that the e(,"d)(F, S, #) are also lower bounds for the a(~d)(F, S) in many cases.

Proposition Let # be of the form # = ~irn=l Cie]i with ci > 0, ~ ci = 1, and fi E F (where e! denotes the respective Dirac measure). Then the inequalities

e(2~)(F, s,u) V n/ -~1 .

~,

and (iii)

~d %~_2(F, Opt) > e.

Proposition 2 If we additionally assume that - f i C F for i = 1 , . . . , 2n, then the following bounds for Opt* hold: 1

(i)

~ , Opt*) _> *c rad'F

.e,

(ii)

*&~d(F, Opt*) 2" lV/-2 . e,

and (iii)

1 .e. %aa, - I ( F , Opt , ) > -~

55 Proof: We show *a~d(F, Opt) _> ~n-1 • e under the conditions of Proposition 1. Consider sup f - f ( S ( f ) ) ,

where S E A i d and P = {fi I i = 1 , . . . , 2 n } . information which is taken for g = 0. Then

Let N o ( f ) = ( f ( x l ) , . . . , f ( x ~ ) )

= ¢(N0(O))

be the

=

for at least n different i = 1 , . . . , 2n and hence sup f

>_

- 1). ,.

]e# T h e s t a t e m e n t follows by means of 2.1.9 using the equidistribution on _P. T h e proof of the other inequMities is analogous.

2.2.4 L o w e r b o u n d s

for the problem

Int

Proposition 1 Let F C B ( X ) and fi (i = 1 , . . . , 2n) with the following conditions: i) the f~ have disjoint supports and satisfy I n t ( f i ) = S(f~) > e for i = 1 , . . . ,2n. ii) for all 6i E { 1 , - 1 } the function ~i=1 ~" 6ifi is an element of F. Then (i)

*a~d(F, I n t ) > l x / 2 . e. n 1/2,

(ii)

*o~d(F, I n t ) > 2" ~" nl/2'

1

and (iii)

e,~d(F, I n t ) >_ n . E.

In some cases, another e s t i m a t e yields better results: Proposition 2 If instead of ii) (in Proposition 1) the p r o p e r t y ii') for all i = 1 , . . . , 2n the functions +fi are an element of F holds, then (i)

*a ,ad'F [ , Int) > _ ~ '1e ,

(ii)

*a, ~-ad~F,I n t ) _> l v ~ ' e ,

56 and e ad 2 . _ l ( F , I n t ) > e.

(iii)

Proof: To prove the inequality *-~a 1¢nl/2 under the conditions of Proposition 1, we a , (F, I n t ) >__ -~_.o assume that S E A,~d and consider ~lS(f)/ep

S ( f ) l 2,

where 2n

P =

e

{1,-1)}.

i~l

Collecting those f E /~ with the same information we see that --°

i=o

1eP

2"

If # denotes the equidistribution o n / ~ we get inf # ( A ( S ( f ) , f ) : ) l / 2 > 1 .nl/: ~eA~ -- 2 and the statement follows by means of 2.1.9. Similarly we see t h a t IS(f)-

S ( f ) l >- 2 " .

y~p

Ii - ~ ] . e >_ i=o

> 2 = ~ . ~ . l v f ~.n112 -

4

which proves the inequality c~. ~r, I n t ) >__

• ad,~

1

4v ~

• ¢. n 112

To prove Proposition 2 one has to consider the quantities i s ( f ) _ ~(f)12 /~P and Is(f)-

s(f)l

ye# with :~ c A aa and P = {-/-fi I i = 1 , . . . , 2n} in the same way.
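The key elementary fact behind these estimates is that a sum of m independent random signs has root mean square m^{1/2}; this is where the factor n^{1/2} in the lower bounds comes from. A short simulation (our own illustration, not part of the text) confirms it.

```python
# RMS of a sum of m independent +-1 signs is sqrt(m) (own sketch of the mechanism
# behind the n^(1/2) factor in the Monte Carlo lower bounds above).
import random

def rms_sign_sum(m, trials=20000):
    total = 0.0
    for _ in range(trials):
        s = sum(random.choice((-1, 1)) for _ in range(m))
        total += s * s
    return (total / trials) ** 0.5

for m in (4, 16, 64):
    print(m, rms_sign_sum(m), m ** 0.5)
```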

4

57

2.2.5 R e m a r k We will give applications of 2.2.2-2.2.4 to special function classes in the following sections. Bakhvalov (1959) seems to be the first to obtain similar lower bounds for the error of Monte Carlo methods. His result can be regarded as a (weaker) version of 2.2.4.1.

2.2.6 T h e p r o b l e m s App, Opt, Opt*, a n d Int f o r c l a s s e s o f L i p s c h i t z f u n c t i o n s In this section we treat Monte Carlo methods for the class F, (s E N ) , defined by

F, = { f : [0,1]' -~ R I I f ( ~ ) - f(Y)l -
_ (8"21D • (s +

(ii)

. n -112-11s

1)) -1

for n -- 2 S - l m s, m E N . *

(iii) for n

--

a na d [ LF,, M) _> (8.21/2+1/s • (s + 1)) - 1 . T/,-1/2-1/s

2s-lrtzS~ m E N.

Proof: We consider the following generalized Monte Carlo method Q ∈ *C(A_n), where n = m^s, m ∈ N. Let

Q(ω)(f) = \frac{1}{n} \sum_{i=1}^{n} f(x_i(ω)),

where the x_i are independent and each x_i is equidistributed on a cube C_i with edge length 1/m and [0,1]^s is the union of the C_i. The random variable Q(f) is a random Riemann sum. Because of

Δ(Q, f)^2 = \frac{1}{n} \sum_{i=1}^{n} \int_{C_i} \Big( f(x) − n \int_{C_i} f(y)\,dy \Big)^2 dx

we can conclude (see Haber (1966)) that

Δ(Q, f)^2 ≤ \frac{1}{12} · n^{−1−2/s}    (f ∈ F_s)

and hence

*a_n^{ad}(F_s, λ^s) ≤ 12^{−1/2} · n^{−1/2−1/s}.

Since the set F_s is compact we can replace Q by discrete approximations Q_l ∈ C(A_n). This proves the upper bound. The lower bounds follow from 2.2.4.1.

Remark
We give a special example. Let s = 10. Then e_1(F_{10}, λ^{10}) = 5/11 and

e_n(F_{10}, λ^{10}) = \frac{5}{44}   for   n = 2^{20} > 10^6.

We see that e_n(F, S) tends to zero very slowly. Because of a_1(F_{10}, λ^{10}) ≤ 0.2887 and a_n(F_{10}, λ^{10}) ≤ 7.1667 · 10^{−5} for n = 2^{20} the stochastic errors of suitable Monte Carlo methods are much smaller in this case.
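The stratified method used in the proof above is a few lines of code: one point is drawn uniformly from each of the n = m^s subcubes and the sample mean is returned. The sketch below (our own illustration; the integrand |x − y| is an arbitrary Lipschitz test function) estimates its root mean square error empirically and prints n^{-1/2-1/s} for comparison.

```python
# Stratified random Riemann sum Q(w)(f) = (1/n) * sum_i f(x_i(w)), with x_i uniform on
# the i-th of n = m^s subcubes of [0,1]^s (own sketch of the method from the proof).
import itertools, random

def stratified_mc(f, m, s):
    pts = [[(c + random.random()) / m for c in cell]
           for cell in itertools.product(range(m), repeat=s)]
    return sum(f(x) for x in pts) / m ** s

f = lambda x: abs(x[0] - x[1])        # Lipschitz test integrand on [0,1]^2
exact = 1 / 3                          # its integral over the unit square
m, s, runs = 8, 2, 200
errs = [abs(stratified_mc(f, m, s) - exact) for _ in range(runs)]
rmse = (sum(e * e for e in errs) / runs) ** 0.5
n = m ** s
print(n, rmse, n ** (-0.5 - 1 / s))
```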

59

2.2.7 Integration of functions of bounded variation Int = ~ for the class

We consider the problem

F v ~ = { f : [0,1] ~

R ] V a r ( f ) _ C. n Gna d (vvp,

~

s

!

2,

!

n - - - s -- 2

k

_

1 1 + ~

(p ~ 2),

(1 < p < 2 ) ,

where c > 0 depends on the class F. Proof: These lower bounds follow from 2.2.4. T h e fi are taken similar as in the proofs of 1.3:9 and 1.3.11. To prove the last s t a t e m e n t , we take 2.2.4.2, whereas the other estimates follow from 2.2.4.1. Remarks

i) Proposition 1 is known, see Bakhvalov (1959) for the first s t a t e m e n t . T h e other estimates are stated without proof in Bakhvalov (1962). ii) To prove the respective upper bounds (not only for I n t = As, but for a r b i t r a r y measures on [0, 1]'), we need - - a stochastic anMog of the interpolation theorem 1.2.8 ( L e m m a 1) - - inequalities similar to L e m m a 1 of 1.3.12 ( L e m m a 2) - - the partitioning l e m m a for measures on [0, 1]' (see L e m m a 2 of 1.3.12). Lemnla 1 Let (X, A, #) be a measure space with a positive finite measure # and let V C L2(X, #) be a vector space with dim V = n and 1 6 V. Then there is a stochastic q u a d r a t u r e formula Q 6 *C(An) with the following properties:

(i)

m(Q(f)) = ~(f)

Q(f) = #(f)

(ii)

(iii)

m((Q(f)

-

for ali f 6 L l ( X , tt);

for all f 6 V;

H~II inf # ( ( f - g)2) gEV

for all f 6 L2(X, #).

61

Remarks i) L e m m a 1 is due to Ermakov, Zolotukhin (1960), see also Ermakow (1975) and Hammersley, Handscomb (1964). Examples can be found in Handscomb (1964). Stochastic q u a d r a t u r e formulas which are exact for certain low order polynomials are also investigated in Bogues, Morrow, Patterson (1981) and Siegel, O'Brian (1985). ii) Let # as in L e m m a 1 and F C L2(X, #). Then

a,,+a(r,#) 2; 1+~

ifl_ 2, kp > s);

k

sup ~r~..p(,W 'k, Int) × n

$

1 + v!

(1_ 1/2. To give an estimation for Au(S), we collect those f E F1 = { f E F0 I n(f) < 2n} which yield the same information N(f). We may assume that Fa contains 24'~-1 functions of the form 2n

4n

i=l

/=2n-bl

whereby the signs in the first sum are known and the signs in the second sum are unknown and arbitrary. We get

:eFo

-

-

2

i=o

- 4

and this proves the inequality >

1

nl/2 .

64 Now we prove b). The optimal information satisfies n(f) = 1 for two functions, n(f) = 2 for two functions and so on till n(f) = k for two functions. Because of f r n(f) dp(f) 1/2. By L e m m a l a [Lemma lb] we conclude that m { ~ I A , ( Q @ ) ) _> ¼e(n)n ~/2} > 1/2

[m{,~ I A , ( Q @ ) ) _> ~-2-~e(n)} _> 1/2]

and therefore

JoJ.,,(:)-

I..._>

65 which yields

Amo.(Q) >

1

112

-->

~U_~e(n)].2

Using Lemma 2a for C~ ,~ and W~ ,s for p > 2 and Lemma 2b for Fvar and W~,' in the case 1 _< p < 2 we obtain our statements.

2.2.11 N o t e s a n d r e f e r e n c e s The lower bounds 2.2.2-2.2.4 and some of its consequences seem to be new although similar error estimates have been used beginning with Bakhvalov (1959). Proposition 3 of 2.2.9 gives new upper bounds for the error of stochastic quadrature formulas, see Novak (1983) for similar results. The results of 2.2.10 are from Novak (1987b).

3. A v e r a g e 3.1 Averages

error bounds

over the class of problem

elements

3.1.1 R e m a r k Let a p r o b l e m S : F ~ M and an error f u n c t i o n A : M x F --* t t + be given. One m a y argue t h a t the m a x i m a l error A.,,~(,~) = sup A ( S ( f ) , f ) ]EF

of a d e t e r m i n i s t i c m e t h o d S is a very pessimistic e s t i m a t e for the t r u e error A ( S ( f 0 ) , f0) for some given f0 E F . Hence it seems to be reasonable, a d d i t i o n a l l y to consider the average error

;

where # is a In Chapters consider the change if we

A(S(f), f) d#(f),

suitable p r o b a b i l i t y m e a s u r e on F . 1 a n d 2 we have seen t h a t a d a p t i o n does not help for m a n y p r o b l e m s if we m a x i m a l error (over F ) of d e t e r m i n i s t i c or stochastic m e t h o d s . This might consider average case errors.

Example Let F1 = { f : [0,1]--* R I I f ( x ) - f ( y ) l have proved the following facts:

(i) (ii)

0 such t h a t for all r > 0 and f l , f2 E F t h e following i n e q u a l i t y holds: # ( { g I d(g, fl) < r} > c. #({g I d(g, f2) < r}).

69 W e show t h a t on F~ no such m e a s u r e exists. More general, t h e following p r o p o s i t i o n holds.

Proposition Let F C B(X) be c o m p a c t a n d convex a n d assume t h a t F c o n t a i n s a m a x i m a l element f0 a n d an e l e m e n t go with inf(f0 - go) > 0. T h e n there is a h o m o g e n e o u s m e a s u r e # on F if a n d only if F is finite dimensional.

Proof a) If F is c o m p a c t , convex, a n d finite dimensional, t h e n o r m e d Lebesgue m e a s u r e on F is a h o m o g e n o u s m e a s u r e . W e do not give the formal a r g u m e n t s , b e c a u s e we are m a i n l y i n t e r e s t e d in t h e case where F is infinite dimensional. b) Now let F C B(X) w i t h t h e given p r o p e r t i e s and let # be a h o m o g e n o u s m e a s u r e on F . Let f0 E F be m a x i m a l a n d g0 e F with a = inf(f0 - g0) > 0, b = sup(f0 - g0). T h e n

forO < r < a +b {h[d(fo,h)_ 0, hi C H, zl + z2 0, z~ + z2 _< 1}. Using formula ( . ) we can c o m p u t e the average error

of the trapezoidal rule i=1 rt

forn=2 m (re•N0). We obtain / ~ ( $ 1 ) = 120 -1/2 and 1

F u r t h e r m o r e we can prove t h a t S~ (n = 2 m) is not far from being optimal in the sense that ~(F, Int,p) × n -l°g~/a°g4,

73 (where l o g 6 / l o g 4 = 1.29248... ), see Graf, Novak (1987). It is interesting to compare this result with the worst case error bound 1

e,(F, Int) - 2 n + 1" See also Proposition 2 of 3.1.9 for a related result.

3.1.8 "Can adaptlon help on the average?" We have seen t h a t adaption does not help for m a n y problems if we consider the maximal error of deterministic or stochastic methods. Wasilkowski, WoSniakowski (1984) proved t h a t even in an average case setting adaption does not help for certain linear problems. Let F and M be real separable Hilbert spaces and let S : F --+ M be a continuous linear operator. Let the average error of S : F --+ M be defined by

where # is a (Borel) probability measure on F. Wasilkowski, Woiniakowski (1984) show t h a t adaption does not help for a wide class of measures including Gaussian measures. See Wasilkowski (1985) and Packel, Woiniakowski (1987) for a survey of related results. There are also linear problems, however, where adaption helps on the average for m a n y probability measures.

Example We consider a simple discrete analog of the problem Int for the class F~ and show that a d a p t i o n helps on the average for every probability measure with full support. (The reader will observe, however, t h a t adaptive methods are only slightly better t h a n nonadaptive ones.)

Let m E N ,

m>5,

and

F = { f : {0, 1 / m , . . . , 1} --+ {0, i l / m , . . . , - 4 - 1 } We consider the problem

I If(x) - f(Y)[ -
ai+.~. This is possible because of m >_ 5 and 5 < n < in. Consider the following adaptive information

Na(f) =

( f ( a l ) , . . . , f(a~)) if [f(a~+~) - f(ai)[ < [a,+= - a,[ (f(al), , f(a~), f ( a i + 2 ) , . . . , f ( a ~ ) , f ( x ) ) else.

74 It is easy to see that the information N~ admits an approximation S' = ¢' o N~ such that

A(S'(f),f) < A(~(f),f) for all f • F and A ( S ' ( f ) , f ) < A ( S ( f ) , f ) for some f • F. S* • A~d with least average error is adaptive.

As a consequence, each

3.1.9 B o u n d s f o r t h e e(~'d)(F, S, # ) T w o questions concerning the e,~(F,S, #) and e,]a(F,S, #) are of main interest: i) How much do the numbers e(~d)(F,S,#) differ from the e(~d)(F,S) ? ii) How much do the e n~d(F, S, p) differ from the e~(F, S, #), i.e. how much does adaption help on the average? T h e answer to both questions of course depends on the measure #. There are problems, however, where the numbers e(~a)(F,S, #) are much smMler t h a n the e(aa)(F,S) for every probability measure. For the class F,, as well as for m a n y other classes, the following statements hold: a) For each n E N there is a probability measure # on F such t h a t e,a d (F, S , # ) is only slightly smaller t h a n e~(F, S) if S = App, Opt, or Opt*. For such an unfavourable measure the average error of optimal methods is not much smaller t h a n the maximal error and adaption does not help on the average. This is interesting particularly for the problems Opt and Opt* because one could guess that adaptive methods are better t h a n nonadaptive ones on the average. b) Concerning the problem Int = M, the numbers e~(F,S,#) are much smaller than en(F, S) for arbitrary measures p, while adaptive methods are again only slightly better: For all four problems the numbers

sup e(.°~)(F, S, ~ ) hardly differ from the numbers

supe,(F,S,#), P

where # runs t h r o u g h all (Borel) probability measures on F. Using results from C h a p t e r 2, we can give lower and upper bounds of these numbers. First we consider the classes Fs = { f : [0, 1]~ --+ R I If(x)

Proposition

(i)

- f(y)[ S

max

Ix, - y,I}.

1

e,~+l(Fs,App) 0. ii) Let e > 0, kp > s, a n d 1 _< p < 2. T h e n t h e r e is a m e a s u r e # on W~ ,~ such t h a t inf

cost,(~)_ c. n -kl'-l+l/v-~

for some c > 0. iii) Let ¢ > 0. T h e n t h e r e is a m e a s u r e # on C~ ,~ such t h a t inf

cost,(g)__c . n -(k+a)/s-x/2-"

for some c > 0.

Proof: W e use L e m m a 1 a n d L e m m a 2 of 2.2.10. W e a s s u m e t h a t F is a class satisfying the a s s u m p t i o n s of L e m m a l a [ L e m m a l b , respectively] w i t h

~(n) .~ n -z-1/2 a n d ,8 > 0. Let

(Cm)meNbe

[(~(n) × n - z ]

a sequence of positive n u m b e r s w i t h

fi

(1)

Cm = 1.

m----1

F o r m E N let /z,~ be art e q u l d i s t r i b u t i o n 1 =

2'~

1 [w

i=l

= s--g

s,~ i=1

as considered in L e m m a l a [ L e m m a lb], where ¢V denotes t h e respective Dirac measure. Now we define the m e a s u r e p by oo

m=l

C l e a r l y # is a p r o b a b i l i t y m e a s u r e on F . conclude t h a t

(2)

~

Let S = A o N with fF n ( f ) d r ( f ) 0 such that ~ = 1

c,~ = 1. It follows from (5) that

]~(n) ~

(8)

n 1/(1-~')

in this case and by (6) we get

(9) To conclude the proof take Lemma la and Lemma 2a for the classes C~ ,~ and W~ ,s for p > 2 and take Lemma lb and Lemma 2b for W~," in the case ] < p < 2.

3.1.11 Notes

and references

The average case analysis of algorithms for problems which are defined in infinite dimensional spaces began with the papers Suldin (1959, 1960) and Larkin (1972). These papers only consider linear algorithms, nonadaptive information, and linear problems such as App and Int. Arbitrary algorithms and adaptive information for linear problems in a finite dimensional space are studied by Traub, Wasilkowski, Wodniakowski (1984). The analysis of the infinite dimensional setting requires the techniques of measure theory in abstract spaces, see Kuo (1975), Parthasarathy (1967), and Skorohod (1974). Linear problems in Hilbert and Banach spaces are studied in Wasilkowski, Wodniakowski (1984), Micchelli (1984), Farber (1984), Wasilkowski (1986), Voronin, Temirgaliev (1986), Lee (1986), and in Lee, Wasilkowski (1986), see Wasilkowski (1985) and Packel, Wodniakowski (1987) for a survey. The results of 3.1.10 are from Novak (1987b).

3.2 The

average

over

the

set of information

3.2.1 R e m a r k Let there be given a m a p p i n g S : F ~ M and an error function A : M × F ~ R +. We want to define the average a posteriori error of an approximation S = ¢ o N (S E A~d, N E I,~d). The estimate ~ ( S ( f ) , f ) < A~o~(~) is the best one, valid for all f E F. This estimate is an a priori error bound, it does not depend on the information N(f). Knowing N(f) = x E R " , of course A(S(f),f) is valid.


1. To give an e x a m p l e , we t r e a t t h e case n=2. 2 In the case n = 2, the o p t i m a l i n f o r m a t i o n N*ve r is unique a n d given by N * ~ r ( f ) = ( f ( 1 / 2 - 1 / 6 - x/3), f ( 1 / 2 + 1 / 6 . V~)). F u r t h e r m o r e , a2(F~, Opt) = 1 / 4 . ( v ~ - 1) holds. Proposition

P roof: T h e o p t i m a l i n f o r m a t i o n N*.e~ has to be of the form N ( f ) = ( f ( a ) , f ( 1 - a)) w i t h a < 1/4. F i r s t we i n d i c a t e the o p t i m a l a p p r o x i m a t i o n and the local error a ( x ) for such an i n f o r m a t i o n according to P r o p o s i t i o n 1. Because of the s y m m e t r y we assume x2 = 0 and xl >_ 0. T h e n we have S*(f)

f 1/4-

1/4.xl

l

a

if xl > 1 - 4 a if xl _< 1 - 4a

and

a(x)= {

if xl > 1 - 4 a if xl < 1 - 4 a .

1/4-1/4.x, 1 / 2 - 1 / 2 - xl - a

T h e r a d i u s r . . . . ( N ) is given by

1 /1--2a r .... (N)-

0)) dXl

1 - 2 a j0

a n d an easy c a l c u l a t i o n yields 7". . . . ( N ) -

1 - 4a + 6a 2

4 - 8a

from which the s t a t e m e n t follows. Remark It seems to be very difficult to c o m p u t e the o p t i m a l i n f o r m a t i o n N2ver for large n. Numerical calculations show t h a t - - similar as for the p r o b l e m App - - t h e o p t i m a l worst case i n f o r m a t i o n is n e a r l y o p t i m a l . F u r t h e r m o r e we conjecture t h a t the a s y m p t o t i c e s t i m a t e 1

a~(F1, Opt) ~ 3n

85 is true. 3.2.6 Nonadaptive methods for the problem [nt for Lipschitz functions We t r e a t n o n a d a p t i v e m e t h o d s for the p r o b l e m I n t = A and the Lipschitz class F1. As i n f o r m a t i o n , we consider g ( f ) = ( f ( a ~ ) , . . . , f(a~)) where 0 _< a~ < a2 < . . . < a , < 1 are fixed. In this s i t u a t i o n t h e following s t a t e m e n t s hold.

Proposition 1 T h e r e is e x a c t l y one m e t h o d S * , ~ using N with least average error and S ~

S~

is given by

rt

= F,i=l ci" f(ai), where cl = al + 1/2(a2 -- a l ) ,

for i = 2 , . . . , n -

ci = 1/2(ai - ai_l) + 1/2(ai+l - ai)

1,

and cn = 1/2(a~ - a n _ l ) + (1 - an). Moreover r . . . . = 1/2a~ + 1/2(1 - a . ) 2 + 1 / 6 has least m a x i m a l error a n d

~i=l~'~-~a~i+1 -

a~) 2. T h e m e t h o d S~*~ also

r m ~ ( N ) = 1/2a~ + 1/2(1 - a~) 2 + 1 / 4 . E ( a i + l

- ai) 2.

i=l

Proof: F o r given x • N ( F ) t h e set { S ( f ) [ f • F, N ( f ) = x} is an i n t e r v a l in R . Therefore, t h e r e is a unique m e t h o d S ~ r = ¢* o N which is central in the average a n d ¢* is the m i d p o i n t of this interval. One easily shows t h a t ¢*(x) = ~i~=1 cixi w i t h t h e given c~. It is well k n o w n t h a t S ~ , ~ is a m e t h o d 'with least m a x i m a l error and t h a t rma~(N) is as given. Now we have to show only the giw,n f o r m u l a f o r / , . . . . (¢*, N ) = r . . . . ( N ) . F i r s t we have a ( f + const) = a ( f ) for each f • F a n d hence A . . . . (¢*, N ) with P = { ( x l , . . . , x . ) ] x ~ one shows t h a t

1

IF a(Xl ..... , x~) dx2 . . . dx~

= 0 and ] x i + ~ - x i [ ~ a i + l - a i

fori=

1.... ,n-l}.

Now

rt--1

a ( x ) = 1 / 2 . a~ + 1 / 2 - ( 1 - a~) 2 + 1 / 4 . E

[(ai+l - ai) 2 - (xi+l - xi) 2]

i=1

and i n t e g r a t i o n over P leads to the given value of r . . . . ( N ) .

Remark For fixed N the o p t i m a l S~w r is unique (up to a change of ¢* on some null set of course) a n d at t h e s a m e t i m e a m e t h o d with least m a x i m a l error (S~a ~ is not unique, however).

86 Now we ask for the optimal knots, i.e. we want to find the optimal information N * a , and N~,,~ in the class I~. It is interesting to observe t h a t N~,,~ ¢ N*a v e r is valid for all n > 1.

Proposition 2 In the class I , the optimal N~,~, and N*ve ~ are unique and given by

((1)

N*a*(f)= and

N;'e~(f)=

f

~n

'f

( ( 1 ) f

~

(3) ~n

""'

(4)

'f

f\

(2n-l~ 2n

//

(3n- 2hh

~

""'fk3n-1//"

Proof: Proposition 2 follows from Proposition 1 by means of easy calculations. The following values are also easy to compute:

e_n(F_1, Int) = r_{max}(N*_{wor}) = \frac{1}{4n},
r_{max}(N*_{aver}) = \frac{9n − 5}{4(3n − 1)^2},
r_{aver}(N*_{wor}) = \frac{1}{6n} + \frac{1}{12n^2},
a_n(F_1, Int) = r_{aver}(N*_{aver}) = \frac{1}{6n − 2}.

The optimal information N*_{wor} for the worst case is of course well known.
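Both node sets and all four radii above are easy to check numerically with the formulas of Proposition 1; the following Python sketch (our own verification, not part of the text) does this for n = 5.

```python
# Worst-case and average radii of Proposition 1 for given nodes, evaluated at the two
# optimal node sets of Proposition 2 (own verification sketch, n = 5).
def radii(a):
    gaps = [a[i + 1] - a[i] for i in range(len(a) - 1)]
    boundary = 0.5 * a[0] ** 2 + 0.5 * (1 - a[-1]) ** 2
    r_max = boundary + 0.25 * sum(g * g for g in gaps)
    r_aver = boundary + sum(g * g for g in gaps) / 6
    return r_max, r_aver

n = 5
worst_nodes = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
aver_nodes = [(3 * i - 2) / (3 * n - 1) for i in range(1, n + 1)]

print(radii(worst_nodes), (1 / (4 * n), 1 / (6 * n) + 1 / (12 * n ** 2)))
print(radii(aver_nodes), ((9 * n - 5) / (4 * (3 * n - 1) ** 2), 1 / (6 * n - 2)))
```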

posteriorierrorof

3.2.7 The definition of the average a adaptive methods Up to now, the average a posteriori error A . . . . ( ¢ , N ) is defined only for nonadaptive m e t h o d s S = ¢ o N E A , . The following example shows t h a t the same definition is inadequate in the case of adaptive methods.

Example Let F = { f : [ 0 , 1 ] ~ I t [ f ( 0 ) = 0 and [ f ( x ) - / ( y ) [ < I x - y [ } Consider the adaptive information N ( f ) = (f(1), f(a)) with a

f 1/2

if If(l)[ > 1 -

and S ( f ) = f ~ f ( x ) d x .

6

if I/(~)I < I - 6,

where 6, e < 1/2. The local error of the optimal m e t h o d S* = ¢ o N is given by

~((xl, x~)) = 1/4. (a 2

-

x~) + 1/4. ((1

-

a) 2

-

(xl

-

x2)2),

where a = 1/2 if IXl[ > 1 - 6 and a = e if [xll < 1 - 6 . For suitably chosen small 6, e > 0 we can obtain methods S C A~ d such that the error A . . . . (¢, N ) according to the definition in 3.2.2 is arbitrary small.

87 W e conclude t h a t t h e a v e r a g i n g m e t h o d of 3.2.2 is not a p p r o p r i a t e in this case. A c c o r d i n g to our i d e a of a fair a v e r a g i n g process it seems to be m o r e sensible to a s s u m e t h a t t h e d i s t r i b u t i o n of f(al) = f ( 1 ) E [ - 1 , 1] is the n o r m e d Lebesgue m e a s u r e 1/2-A[_1,~]. Similar, for fixed f(a~) we assume t h a t t h e d i s t r i b u t i o n of f(a2) is t h e n o r m e d Lebesgue m e a s u r e in t h e r e s p e c t i v e interval

~ F,f(a~)

v = {/(a2)].[

=

f(al)}.

P r o c e e d i n g in this way, we o b t a i n t h e following definition. Definition

Let S = ¢ o N E A~ d a n d let a : N ( F ) ~ It + be t h e respective local error. i) F o r z = (xa,...,x,~) e g ( F ) we w r i t e

~(~,...,~,):= ~(~). ii) For 1 _< m < n we define a ( x l , . . . ,x,,,_l) = c~(x~,... ,x,,) if a ( x l , . . . ,xm-~, .) is constant and

a ( x l , . . . , x m _ l ) - A(Z)

a(x:,...,xm)dxm

if t h e d o m a i n Z of the function a ( x l , . . . , xm-1, • ) has a finite positive measure. In this w a y we define a(x~,... ,x,~) for all ( x ~ , . . . ,x,~) such t h a t ( x l , . . . ,x,~) C N ( F ) for c e r t a i n x m + ~ , . . . , xn. In t h e l a s t step we define = A ad . . . . (¢, N ) which we call t h e average a posteriori error of the ( a d a p t i v e ) m e t h o d S --- ¢ o N . A g a i n we define -~a(F,S) =

inf

A~¢r(¢,N),

¢oNEA~. a

where t h e infimum is t a k e n over all S = ¢ o N E A~d for which A ~d . . . . (¢, N ) exists.

3.2.8 Examples

T h e r e are l i n e a r p r o b l e m s where the a p o s t e r i o r i error b o u n d A ( S ( f ) , f ) 0 a n d 0 < a < 1.

3.2.10 Notes and references Some of t h e results of 3.2.1-3.2.6 are c o n t a i n e d in Novak (1986b, 1986c). T h e definition of t h e a v e r a g e a p o s t e r i o r i error for a d a p t i v e m e t h o d s is new.

Appendix: Existence and uniqueness of optimal algorithms A.1 Remark In this appendix we deal with different problems concerning the existence and uniqueness of optimal algorihms. We always assume that S : F ~ M , where M is a metric space and A ( S ( f ) , f ) = diS(f), S(f)). First of all we could ask whether there is a S:, E A~ with =

s).

We consider this problem for S(f) = fo~ f i x )dx and F C Ci[0, 1 ] ) i n A.2. Even in this relatively simple case, the problem of existence of an optimal S~ can only be solved in some cases using special methods (i.e. theory of monosplines). Therefore we treat a more speciM problem in the following sections: We assume t h a t an information operator N : F ~ R " is given and ask whether there is an optimal algorithm ¢* o N in the class A(N) = { C o N I ¢ : R ~ --+ M}. I n A . 3 we deal with the worst case and indicate some resu]ts using well known facts on Chebyshev centers. Then we introduce the notion of a center of a probability measure in a metric space and prove statements on its existence and uniqueness. Similarly as in the worst case these results can be used to prove the existence and uniqueness of optimal average case algorithms. These results are due t o Novak, Ritter (1987) and use the definition of the average error of Wasilkowski (1983). We also show that nonmeasurable algorithms are not better than measurable ones.

A.2 Existence and uniqueness of optimal quadrature formulas There are m a n y papers dealing with the existence and uniqueness of optimal q u a d r a t u r e formulas. The following proposition is a s u m m a r y of the results of different authors, see Bojanov (1986) and Zhensykbaev i1981).

Proposition Let F be one of the classes W~ = { f : [0, 1] -+ R I llf(k)llp _< 1} or W~ = { f C W~ I f(~)(0) = f(~)(1) for r = 0 , . . . , k - 1}, where 1 _< p < co and k e N . a) There is a S* E An with A,~a,(S~) = e~(F, )~). b) S~ can be chosen linear, i.e. S~(I) = ~,"=1 c,f(ai). c) In the case F = W~ and n _> k the optimal linear m e t h o d is unique. d) In the case F = W~ the only optimal linear m e t h o d (up to a rigid shift) is the rectangle formula. We would like to stress t h a t the proof of these results is difficult and relies heavily on the theory of monosplines. The following example shows t h a t the existence of an optimal q u a d r a t u r e formula can not be shown by means of a simple compactness argument.

Example Let F be the closure of the set { f : [0, 1] --+ R I f ( 0 ) = 0, llf'll~ -< 1, []f"llo~ O for all :~ E A , and n E N. Remark

It is widely believed that the order of exactness is a good measure of the quality of an quadrature formula, at least if F is a class of analytic functions. There are situations, however, where optimal methods are much better than Gauss quadrature. An example is the class F={f:[-1,1]--*R]

there is an analytic e x t e n s i o n f : { z e C l i z ]

< 1}~C with ]]][l~ - 1}.

For the problem S ( f ) = f l l f ( x ) dx it is known (see gowalski, WerschuIz, Woiniakowski (1985)) that

while the error of the Gauss method S~ satisfies A.,o.(&)

> ~. n -~.

A.3 Chebyshev centers and optimal worst case algorithms Let (M, d) be a Banach space and let D C M be a bounded set. We define

tad(D, x) = sup d(x, y) yED

and

rad(D) = radM(D)= inf rad(D, x). xEM

The number tad(D) is called the Chebyshev radius of D and a point x E M is a Chebyshev center of D if

rad(D) = rad(D, x). The notion of a Chebyshev center is well established, see Garkavi (1964) and Amir (1984).

92 W e a s s u m e t h a t S : F ~ M and N : F ~ called o p t i m a l (in t h e worst case) if A,~,(S')=

R ~ are given. A n a l g o r i t h m

inf

SeA(N)

S* E A(N) is

A,~,(S).

A m e t h o d ¢ o N E A(N) is called c e n t r a l if ¢ ( x ) is a C h e b y s h e v center of each x. It is easy to see t h a t every central a l g o r i t h m is o p t i m a l a n d

S(N-I(x)) for

r , ~ ( N ) = sup raHS(N-I(x)). xER,~

If M is a s u b s p a c e of a B a n a c h space ~ r (for e x a m p l e M C M** = M b y t h e n a t u r a l i m b e d d i n g ) t h e n tad(D, x) with x C M and rad~(D) = i n f e ~ tad(D, x) are defined as well. T h e following p r o p o s i t i o n is a s u m m a r y of well k n o w n facts. W e shall see in A.4 t h a t some of our results for t h e average case are similar, b u t t h e r e are i n t e r e s t i n g differences as well.

Proposition 1) In t h e case M =

C(K) (where K is c o m p a c t ) a C h e b y s h e v center exists for every

DcM. 2) In t h e case M =

B(T) (where T is a r b i t r a r y ) a C h e b y s h e v center exists for every

DCM. 3) Let M = C(K) and M = B(K). T h e n radM(D) < rad~(D) for each D C M and this i n e q u a l i t y is strict for some D. 4) Let M be t h e d u a l space of a n o r m e d linear space. T h e n a C h e b y s h e v center exists for every D C M . 5) Let P : M -~ M be a p r o j e c t i o n with [[PH = 1. T h e n we have

radM(D) = rad~(D) for every D C M a n d t h e p r o j e c t i o n of a C h e b y s h e v center in M is a C h e b y s h e v center in M . 6) T h e C h e b y s h e v center of any b o u n d e d n o n e m p t y set D C M is unique if and only if M is u n i f o r m l y convex in every direction.

Remarks T h e first s t a t e m e n t is due to Kadets, Zamyatin (1968), t h e second one is trivial. For t h e t h i r d s t a t e m e n t consider D = { f [ Hf[[ _ 0} C C ( [ - 1 , 1 ] ) . T h e n radc([_l,1])(D) = 1 b u t radB([-1,1l)(D) = ½. T h e s t a t e m e n t s 4 ) - 6 ) are due to Garkavi (1964). W e m e n t i o n t h a t every u n i f o r m l y convex space is u n i f o r m l y convex in every direction a n d therefore every b o u n d e d n o n e m p t y set in a u n i f o r m l y convex space has a unique C h e b y s h e v center by 4), 6), and a well k n o w n t h e o r e m of M i l m a n .
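In B(T) the Chebyshev centre can be written down explicitly: take the coordinatewise midpoint of the upper and lower envelope; the radius is half of the largest pointwise oscillation. The Python sketch below (our own illustration with an arbitrary finite set of points, viewed as functions on a three-element set T) computes it.

```python
# Chebyshev centre of a finite set in the sup-norm (the B(T) case): coordinatewise
# midpoint of the envelope, radius = half of the largest oscillation (own sketch).
def chebyshev_centre_sup(points):
    dims = range(len(points[0]))
    upper = [max(p[k] for p in points) for k in dims]
    lower = [min(p[k] for p in points) for k in dims]
    centre = [(u + l) / 2 for u, l in zip(upper, lower)]
    radius = max((u - l) / 2 for u, l in zip(upper, lower))
    return centre, radius

D = [(0.0, 1.0, -1.0), (2.0, 0.0, 0.0), (1.0, 3.0, 0.5)]   # arbitrary sample "functions"
print(chebyshev_centre_sup(D))
```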

A.4 Centers of probability measures W e are m M n l y i n t e r e s t e d in o p t i m a l average case a l g o r i t h m s . To establish results on the existence a n d uniqueness of o p t i m a l a l g o r i t h m s in this s e t t i n g we i n t r o d u c e the n o t i o n of a center of a p r o b a b i l i t y m e a s u r e in a m e t r i c space.

93 Let m be a (finite Borel) m e a s u r e on the m e t r i c space M and let 1 < p < co. W e define

a n d a s s u m e t h a t radV(m, . ) is a finite function, i.e. radV(m, x) < oo for some x 6 M ( a n d hence for a n y x 6 M ) . T h e n u m b e r

radV(m) = radvM(m) = inf radV(m, x) xEM

is called t h e p - r a d i u s of m a n d a p o i n t x 6 M is called a p - c e n t e r of m if

radV(m) = radV(m, x). W e use t h e following n o t a t i o n s . A m e a s u r e on a m e t r i c space is u n d e r s t o o d to be a finite Borel m e a s u r e . M e a s u r a b i l i t y of a m a p p i n g m e a n s m e a s u r a b i l i t y w i t h respect to the Borel sets. W e a s s u m e t h a t M is a B a n a c h space a n d t h a t m is a m e a s u r e on M with radV(m__. ) < ec for some given p, 1__ p < co. If M is a closed s u b s p a c e of t h e B a n a c h space M t h e n radP(m, x) w i t h x 6 M a n d radV~(m) = i n f e ~ radV(m, x) are defined as well. Now we s t a t e our results concerning t h e existence of p - c e n t e r s .

Proposition 1) In t h e case M = C([0, 1]) a p - c e n t e r does not always exist. 2) Let M = B(T) be t h e space of b o u n d e d functions on a set T, and let m be a t i g h t m e a s u r e on M . T h e n a p - c e n t e r always exists. 3) Let M = C ( K ) a n d M = B(K), where K is a c o m p a c t m e t r i c space. T h e n

radvM(m) = rad~(m). 4) Let M be the d u a l space of a n o r m e d linear space a n d m be a t i g h t m e a s u r e on M . T h e n a p - c e n t e r always exists. 5) Let P : M ---* M be a p r o j e c t i o n with lIP[[ = 1. T h e n we have

radvM(m ) = radV~(m ) and the p r o j e c t i o n of a p - c e n t e r in M is a p - c e n t e r in M . 6) Let 1 < p < oo. T h e n the p - c e n t e r of every m e a s u r e with radvM(m, ) < oo is unique if a n d only if M is s t r i c t l y convex. Remarks and Proofs: A m e a s u r e m on a m e t r i c space M is said to be tight if for each e > 0 t h e r e exists a c o m p a c t set D~ C M such t h a t m(M \ D~) < e. If M is a c o m p l e t e s e p a r a b l e m e t r i c space, t h e n every m e a s u r e on M is tight, see Parthasarathy (1967, p. 29). T h e s t a t e m e n t s 2), 4), and 5), are analogous to the respective results on C h e b y s h e v centers. Also the proofs are similar for 4) (it can be shown t h a t the function x ~ radV(m, x) is weak* lower semicontinuous and hence takes its infimum on M ) a n d 5). For the p r o o f of 2) we use t h e fact t h a t x ~ radP(m, x) is lower semicontinuous with respect to t h e p r o d u c t

94 topology on I - c , C]T. If M is strictly convex then radP(rn,. ) is a strictly convex m a p p i n g for 1 < p < c~ and 6) follows. (The case p = 1 is slightly more complicated, see Novak, Ritter (1987))• We should remark that 4) (for the case where the support of m is compact and X is the dual of a separable space) is due to Beauzamy, Maurey (1977)• The expectation in metric spaces (i.e. the case p = 2) is also studied in Pick (1987). Now we prove the statements 1) and 3) which are different as in the worst case. P r o o f o f 1). We construct a probability measure m on M = C([0, 1]) with the following properties: The support of m is bounded and hence radP(m, x) < c~ for each x E M and 1 _< p < c¢. There is no x e M with radP(m, x) = radv(m). First we define the functions x,,y, EMfornEN. Let 0, x.(t) =

if t

2n(t-

½),

and

y~(t) =

} + 2--lg

if½ < t < ½+2-1£

-1,

if t _< =1

O,

if t _>

2n(t-}),

if½-~

2.1

1

< t < £2 "

Now we consider tile probability measure

1 ~__1 .(e=. +e~.), m=~. 2" n=l

where e~ is the respective Dirac measure• Let z e M = C([0, 1]). Then

radP(m, z) =

-~

ii > --

(51 --

n=l

2n



( z

+ z

(1~(o)1 + 11+ z(O)l)

>_ ~.

1 1 Considering the function z~ = ~(x, + Yn) (for n -+ oc), we see t h a t radP(m) = 5" assume t h a t rP(m, z*) = ½ for some z* E M . The above estimate shows that

I[z* -

+ IIz" - y.ll

NOW

we

= 1

for all n. If z *(5) 1 < 0 then Ilz* - x,I I > 1 for large n. If z*(}) > 0 then IIz* - Y-II > 1 for l a r g e n . Ifz*(5)l = 0 t h e n I I z * - x ~ l I >__ ~2 and IIz * - y . [ [ >_ ~ for l a r g e n . Hence we have proved t h a t such a z* does not exist. P r o o f o f 3).

Let z* C M = B(K) be a p-center of m in M , i.e.

r a g ( m , z*) =

95 a n d let e > 0. T h e r e is a c o m p a c t set

/D IIz for all

D C C(K) with

xll' din(x))

\ lip

_>

z) -

z e B ( K ) , Nz[l < IIz*ll. We m a y assume t h a t D is of t h e form D = {x 6 C ( K ) ll]x N (F,S) A.(~) f" hd. e(Z+(F, s,,)

el C(d~ ~r) ~ar(r, S) Z~.(Q) ~(Z+(F, s, ~) F~ ~(~) I, .... (¢,N)

~.(F,S) 1". . . . ( N )

~(~,... , ~ ) Aad .... (¢,N)

a~.d(F, S)

w~ radP(rn)

z~ ( ~) r~(N)

error b o u n d s for M o n t e C a r l o m e t h o d s , 2.1.4 b o u n d s for t h e dispersion of M o n t e C a r l o m e t h o d s , 2.1.4 average error of S, 2.1.9, 3.1.2 u p p e r i n t e g r a l of h w i t h respect t o #, 2.1.9 b o u n d s for t h e average error, 2.1.9, 3.1.2 Dirac m e a s u r e , 2.1.9 class of M o n t e Carlo m e t h o d s w i t h v a r y i n g c a r d i n a l i t y , 2.1.10 error b o u n d s for M o n t e C a r l o m e t h o d s , 2.1.10 average error of Q, 3.1.3 b o u n d s for t h e average error of M.C. m e t h o d s , 3.1.3 a class of Lipschitz functions, 3.1.4 average c a r d i n a l i t y of a m e t h o d , 3.1.10 local error of a m e t h o d S, 3.2.1 average a p o s t e r i o r i error, 3.2.2 b o u n d s for the average a p o s t e r i o r i error, 3.2.2 average case radius of N , 3.2.2 certain m e a n s of t h e local error, 3.2.7 average a posteriori error of a d a p t i v e m e t h o d s , 3.2.7 b o u n d s for the average a p o s t e r i o r i error, 3.2.7 a class of periodic Sobolev functions, A.2 p - r a d i u s of a measure, A.4 p-average error of a m e t h o d , A.5 p-average r a d i u s of an i n f o r m a t i o n , A.5

Index adaption constant of a normed space adaptive method, adaptive information approximation of S approximation which uses a certain information center of a set center of a measure central algorithm complexity coapproximation, best concentration of measure phenomenon covering constant of a metric space diameter of a set differential equations doubling condition e-approximation equidistribution on a compact metric space equivalence (strong and weak) of sequences errors (and error bounds) error function maximal error of a deterministic method error of a Monte Carlo method dispersion of a Monte Carlo method average error of a deterministic method average error of a Monte Carlo method average a posteriori error local error Gauss quadrature H61der classes of functions homogeneous measure on a compact metric space ill-posed problems information, information operator information-based complexity integral equations Jung constant of a normed space linear problems Lipschitz classes of functions lipschitz problem measurable algorithm metric dimension Monte Carlo method (generalized and restricted) nonadaptive method, nonadaptive information optimal algorithm (worst case) optimal algorithm (average case)

1.3.1 1.1.3 1.1.1 1.1.2 A.3 A.4 3.2.2, A.3 1.1.7 A.4 3.1.4 1.3.6 1.3.1 1.1.5 3.1.5 1.1.2 3.1.6 1.3.6, 1.3.9 1.1.1 1.1.1, 2.1.2 2.1.2 2.1.9, 3.1.3 3.2.2, 3.2.1 A.2 1.3.9 3.1.5 1.1.6 1.1.2, 1.1.7 1.1.5 1.3.1 1.3.1 1.3.6 1.2.4 A.6 3.1.5

1.1.3

3.1.2, A.5 3.2.7

1.1.3

2.1.2, 2.1.4 1.1.3 A.3 A.5

113

optimal quadrature formulas optimal recovery packing constant of a metric space problem S problem App problems Opt and Opt* problem Int quasi Monte Carlo methods Sobolev classes of functions solvable problem radius Chebyshev radius of a set p-radius of a measure maxima] radius of an information average radius of an information p-average radius of an information random, random number selection theorem varying cardinality widths in B(X), linear widths in B(X)

1.3.12, A.2 1.1.7 1.3.7 1.1.1 1.1.4 1.1.4 1.1.4 2.1.6 1.3.10, 1.3.11 1.2.4 1.3.1, A.3 A.4 1.1.2 3.2.2 A.5 2.1.3, 2.1.5 A.6 2.1.10, 3.1.10 1.2.1, 1.2.6