Strong Approximation of Systems of Stochastic Differential Equations

Habilitation Thesis by

Dr. Thomas Müller-Gronbach

accepted by the Department of Mathematics, Darmstadt University of Technology, Germany

Referees: Prof. Dr. Peter Kloeden
Prof. Dr. Jürgen Lehn
Prof. Dr. Klaus Ritter
Prof. Dr. Henryk Woźniakowski

Darmstadt, January 2002

Acknowledgements

I am grateful to several people for fruitful discussions and valuable comments. My particular thanks go to Norbert Hofmann, Erich Novak, Klaus Ritter, and Dirk Werner.

Contents

Chapter I. Introduction 1
1. Main Results 4
2. An Outlook 7
3. Notation 10

Chapter II. Basic Facts about Strong Approximation 11
1. Strong Solutions 11
2. The Commutativity Property 13
3. Strong Approximations 14
4. Classical Methods 15
5. Minimal Errors and Optimality 18
6. Pathwise Convergence, Maximum Step-Size and Asymptotic Efficiency 22

Chapter III. Global Approximation 27
1. L2-Approximation of Systems with Commutative Noise 27
1.1. Local Smoothness in Terms of the Driving Brownian Motion 28
1.2. The Adaptive Method 28
1.3. Error Analysis and Asymptotic Optimality 32
1.4. Proofs 35
2. Lp-Approximation of Systems with Commutative Noise, p < ∞ 43
2.1. Local Smoothness in Terms of Conditional Hölder Constants 43
2.2. Adaptive Methods with Component Independent Discretization 44
2.3. Error Analysis and Asymptotic Optimality 45
2.4. Proofs 49
3. L2-Approximation of Systems with Non-Commutative Noise 60
3.1. The Sampling Strategy 65
3.2. Adaptive Methods with Component Independent Discretization 68
3.3. Error Analysis and Optimality Properties 75
3.4. Proofs 77
4. L∞-Approximation 98
4.1. Maximal Conditional Hölder Constants 98
4.2. Adaptive Methods with Component Independent Discretization 99
4.3. Error Analysis and Asymptotic Optimality 100
4.4. Proofs 102


Chapter IV. One-Point Approximation 103
1. One-Point Approximation of Scalar Equations 103
1.1. The Relation to Weighted Integration 103
1.2. The Adaptive Method 105
1.3. Error Analysis and Asymptotic Optimality 107
1.4. Proofs 112
2. One-Point Approximation of Systems with Non-Commutative Noise 123
2.1. The Sampling Strategy 123
2.2. Adaptive Methods with Component Independent Discretization 125
2.3. Error Analysis and Asymptotic Optimality 126
2.4. Proofs 128

Chapter V. Appendix 135
1. Gronwall’s Lemma 135
2. Error Bounds for Auxiliary Processes 136
2.1. The Milstein Process X^M 137
2.2. The Truncated Milstein Process X^{M,t} 142
2.3. The Modified Milstein Process X^{M,m} 143
2.4. The Wagner-Platen Process X^{WP} 147
2.5. The Truncated Wagner-Platen Process X^{WP,t} 151
2.6. The Modified Wagner-Platen Process X^{WP,m} 152

References 159

CHAPTER I

Introduction

The topic of this work is strong approximation of d-dimensional systems of stochastic differential equations in the Itô sense,

dX(t) = a(t, X(t)) dt + σ(t, X(t)) dW(t),    t ∈ [0, 1],

with an m-dimensional driving Brownian motion W. In the majority of cases, such a system cannot be solved explicitly, so that numerical methods must be used to approximate the solution X. The problem of estimating quantities that depend only on the law of the solution, such as the expected value of some functional of X, is called the weak approximation problem. Strong approximation, on the other hand, means pathwise approximation of the solution X, either globally on the interval [0, 1] or at finitely many points.

Historically, the first methods for strong approximation are the Euler scheme and the Milstein scheme, introduced and analyzed by Maruyama (1955) and Milstein (1974), respectively. Meanwhile, a great variety of numerical methods for strong approximation exists, which in part reflect basic ideas from the numerical solution of deterministic differential equations. This includes, e.g., Itô–Taylor methods, which are based on the stochastic Taylor expansion derived by Wagner and Platen (1978), and stochastic Runge–Kutta methods, as well as multistep and implicit versions of these schemes. The book by Kloeden and Platen (1995) is a standard reference in this field.
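To make the two classical schemes concrete, the following is a minimal sketch of the Euler and Milstein schemes for a scalar equation on an equidistant grid. The coefficients used at the bottom (a geometric Brownian motion) are illustrative placeholders and are not taken from this thesis.

```python
import numpy as np

def euler_milstein_paths(a, sigma, dsigma_dx, x0, n_steps, rng):
    """Simulate one trajectory of a scalar Ito equation on [0, 1] with the
    Euler and Milstein schemes, driven by the same Brownian increments."""
    dt = 1.0 / n_steps
    t = 0.0
    x_euler = x_milstein = x0
    path_euler, path_milstein = [x0], [x0]
    for _ in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        # Euler scheme (Maruyama 1955): first-order Ito-Taylor truncation.
        x_euler = x_euler + a(t, x_euler) * dt + sigma(t, x_euler) * dw
        # Milstein scheme (1974): adds the iterated-integral correction term.
        s = sigma(t, x_milstein)
        x_milstein = (x_milstein + a(t, x_milstein) * dt + s * dw
                      + 0.5 * s * dsigma_dx(t, x_milstein) * (dw**2 - dt))
        path_euler.append(x_euler)
        path_milstein.append(x_milstein)
        t += dt
    return np.array(path_euler), np.array(path_milstein)

# Hypothetical test equation: dX = 0.1 X dt + 0.5 X dW (geometric Brownian motion).
rng = np.random.default_rng(0)
euler, milstein = euler_milstein_paths(
    a=lambda t, x: 0.1 * x,
    sigma=lambda t, x: 0.5 * x,
    dsigma_dx=lambda t, x: 0.5,
    x0=1.0, n_steps=1000, rng=rng)
```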


In the literature, up to a few exceptional cases, the error analysis of methods for strong approximation deals only with upper bounds for the error at the discretization points in the interval [0, 1]. Upper bounds do not answer the following natural question: Assume that the driving Brownian motion W may be evaluated at N points. Where in the unit interval should these evaluations be made, and how should the data be used in order to obtain the best possible approximation to the solution? We answer this question in an asymptotic sense, i.e., as N → ∞, by providing sharp upper and lower error bounds together with the corresponding asymptotic constants. The lower bounds hold for all methods that depend in any measurable way on at most N sequential evaluations of W on the average. In particular, this includes all methods that are implementable on a computer, e.g., via C code, and use a standard normal random number generator to simulate W at finitely many points.

Our asymptotically optimal methods, which achieve the upper bounds, are easy to implement. These methods choose the location as well as the number of the discretization points adaptively for every trajectory of the solution X. Their computation time is proportional to the number of evaluations of W, with a small constant of proportionality. Furthermore, it turns out that choosing the discretization adaptively is necessary: methods based on a fixed discretization cannot achieve asymptotic optimality.

We study two variants of strong approximation, namely global approximation of X for every t ∈ [0, 1] and approximation of X at the single point t = 1.

Global Approximation. Here, one wants to find a d-dimensional stochastic process X̂ whose trajectories are close to the corresponding trajectories of the solution on the whole unit interval. It is then natural to measure the distance between X and X̂ pathwise by the L_p-distance of the respective trajectories,

‖X − X̂‖_p = ( ∫_0^1 |X(t) − X̂(t)|_p^p dt )^{1/p}    if 1 ≤ p < ∞,
‖X − X̂‖_∞ = sup_{0≤t≤1} |X(t) − X̂(t)|_∞             if p = ∞.

Here, |·|_p denotes the ℓ_p-norm of a d-dimensional vector. The error e_{p,q}(X̂) of the global approximation X̂ is defined by averaging over all trajectories, i.e.,

e_{p,q}(X̂) = ( E ‖X − X̂‖_p^q )^{1/q}

for some q ∈ [1, ∞[. We relate the error e_{p,q}(X̂) to the mean number n(X̂)¹ of evaluations of the driving Brownian motion that are used by X̂.

One-Point Approximation. Here, one wants to find a d-dimensional random vector X̂(1) that is close to X(1). More generally, one considers the approximation of the solution X at finitely many points in the unit interval. For every realization, the distance between X(1) and X̂(1) is measured in the ℓ_p-norm |X(1) − X̂(1)|_p. Averaging over all trajectories as above, the error is defined by

e_p(X̂(1)) = ( E |X(1) − X̂(1)|_p^p )^{1/p}.

Similar to the case of global approximation, we let n(X̂(1)) denote the mean number of evaluations of W that are used by X̂(1).

¹More precisely, we will permit evaluations of the different components of W at different points and study n(X̂), the mean number of evaluations of the single components. If all components are evaluated at the same points in [0, 1], then the mean total number of these evaluations equals m · n(X̂).
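To illustrate how the error e_{p,q}(X̂) can be estimated in practice, the following sketch computes a discretized L_p-distance between two paths sampled on a common grid and averages its q-th power over independent trajectories by Monte Carlo. The function simulate_pair, assumed to return a fine-grid reference path (standing in for X) together with its approximation, is a hypothetical placeholder and not a construction from this thesis.

```python
import numpy as np

def lp_distance(x_ref, x_hat, p):
    """Discretized L_p([0,1])-distance of two d-dimensional paths sampled on a
    common equidistant grid (rows: time points, columns: components)."""
    pointwise = np.linalg.norm(x_ref - x_hat, ord=p, axis=1)  # |X(t) - X^(t)|_p
    if np.isinf(p):
        return pointwise.max()
    dt = 1.0 / (len(pointwise) - 1)                           # grid includes both endpoints
    return (np.sum(pointwise**p) * dt) ** (1.0 / p)

def estimate_error(simulate_pair, n_paths, p, q):
    """Monte Carlo estimate of e_{p,q} = (E ||X - X^||_p^q)^{1/q}, where
    simulate_pair() returns (reference path, approximation) for one trajectory."""
    dists = [lp_distance(*simulate_pair(), p) ** q for _ in range(n_paths)]
    return np.mean(dists) ** (1.0 / q)
```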


Minimal Errors. We study methods that use a finite number of evaluations w(τ_1), ..., w(τ_ν) of a trajectory w of the driving Brownian motion W for the approximation of the corresponding trajectory of the solution X. For every n ∈ {1, ..., ν}, the choice of the point τ_n may depend in any measurable way on the observed initial value x = X(0) and on the previously computed values w(τ_1), ..., w(τ_{n−1}). Moreover, the total number ν of evaluations of w may be determined by any measurable termination criterion. Thus n(X̂) = E(ν) for a global approximation X̂.

The minimal error for global approximation that can be obtained by methods that use at most N evaluations of W on the average is given by

e_{p,q}(N) = inf{ e_{p,q}(X̂) : n(X̂) ≤ N }.

The task is to determine the rate of convergence of this quantity and to provide implementable methods that achieve this rate. Similarly, for one-point approximation, we study the minimal error

e_p(N) = inf{ e_p(X̂(1)) : n(X̂(1)) ≤ N }.

Complexity. Our investigation of strong approximation relies on the general concept of information-based complexity; see Traub, Wasilkowski, and Woźniakowski (1988). In fact, our results essentially yield the complexity of strong approximation of systems of stochastic differential equations. For instance, in the case of global approximation, the ε-complexity is the minimal computational cost necessary to achieve a global error of at most ε > 0, i.e.,

comp(ε) = inf{ cost(X̂) : e_{p,q}(X̂) ≤ ε }.

Here cost(X̂), the computational cost of a method X̂, consists of

(i) the mean number n(X̂) of evaluations of W,
(ii) the mean number of function values or derivative values, e.g., of the drift coefficient a and the diffusion coefficient σ, that are used,
(iii) the mean number of arithmetic operations needed for the calculation of the approximation.

Clearly,

m · n(X̂) ≤ cost(X̂),

if the cost of evaluating a single component of the m-dimensional Brownian motion at a single point in the interval [0, 1] is one by definition. On the other hand, our asymptotically optimal methods satisfy

cost(X̂) ≤ const · n(X̂)

with a small constant that does not depend on the particular system of equations.
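Purely as an illustration of how these cost bounds turn error statements into complexity statements (the calculation below is not part of the thesis text; the rate used is the type of behaviour established later), suppose that e_{p,q}(N) ≈ c · N^{−1/2} for some constant c > 0. A method with error at most ε then needs roughly N(ε) ≈ (c/ε)² evaluations of W on the average, and since the cost is bounded above and below by constant multiples of n(X̂), the ε-complexity satisfies

comp(ε) ≍ (c/ε)²,

i.e., it grows quadratically in 1/ε, with the square of the asymptotic error constant as the leading factor.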


In the present work, we therefore use the mean number n(X̂) of evaluations of W as a rough measure of the computational cost. Practically, the quantity m · n(X̂) is the mean number of calls of a standard normal random number generator, and cost(X̂) is the average computation time. The relation between the ε-complexity and the minimal error introduced above is obvious. In particular, all results in this work may also be formulated in terms of complexity.

1. Main Results

In the following we present main results for global and one-point approximation. These results hold under moment conditions on the initial value X(0) and certain regularity conditions on a and σ, such as linear growth and Lipschitz conditions.

Lp-Approximation, p < ∞. For global approximation in the case p < ∞, it turns out to be crucial whether the diffusion coefficient σ has the so-called commutativity property. This property ensures that the trajectories of the solution X depend continuously on the trajectories of the driving Brownian motion W. In particular, this condition is satisfied in the case of additive noise, i.e., σ(t, x) = σ(t), or if W is one-dimensional. In the commutative case, if q = p = 2 then

lim_{N→∞} N^{1/2} · e_{2,2}(N) = 1/√6 · E ∫_0^1 ( ∑_{i=1}^d |σ_i(t, X(t))|_2^2 )^{1/2} dt,

where σ_i denotes the i-th row of the R^{d×m}-valued diffusion coefficient σ. Hence, the order of convergence of the minimal error e_{2,2}(N) is 1/√N unless σ = 0. The latter means that X is the solution of a system of ordinary differential equations. Due to the diffusion property of the solution,

E( |X(t + δ) − X(t)|_2^2 | X(t) = x ) = (s(t, x))^2 · δ + o(δ),

where

s(t, x) = ( ∑_{i=1}^d |σ_i(t, x)|_2^2 )^{1/2}.
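The diffusion property displayed above can be checked numerically. The following sketch compares a Monte Carlo estimate of E(|X(t+δ) − X(t)|_2^2 | X(t) = x) with s(t, x)^2 · δ for a small δ; the exact one-step transition is replaced by a single Euler step, which matches the expansion to leading order in δ, and the coefficients a and σ below are made-up placeholders.

```python
import numpy as np

def conditional_increment_moment(a, sigma, t, x, delta, n_samples, rng):
    """Monte Carlo estimate of E(|X(t+delta) - X(t)|_2^2 | X(t) = x), using a
    single Euler step as a stand-in for the exact transition (sketch only)."""
    x = np.asarray(x, dtype=float)
    sig = sigma(t, x)                                   # d x m diffusion matrix at (t, x)
    dw = rng.normal(0.0, np.sqrt(delta), size=(n_samples, sig.shape[1]))
    increments = delta * a(t, x) + dw @ sig.T           # shape (n_samples, d)
    return np.mean(np.sum(increments**2, axis=1))

def s_squared(sigma, t, x):
    """(s(t, x))^2: sum of the squared l_2-norms of the rows of sigma(t, x)."""
    return float(np.sum(sigma(t, x) ** 2))

# Hypothetical coefficients with d = 2, m = 2, not taken from the thesis.
a = lambda t, x: np.array([-x[0], 0.5 * x[1]])
sigma = lambda t, x: np.array([[1.0, 0.2 * x[0]], [0.0, 0.3 + x[1] ** 2]])

rng = np.random.default_rng(1)
delta = 1e-3
lhs = conditional_increment_moment(a, sigma, 0.2, [1.0, 0.5], delta, 200_000, rng)
rhs = s_squared(sigma, 0.2, np.array([1.0, 0.5])) * delta
# lhs and rhs agree up to terms of order delta**2.
```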

Therefore, X is Hölder continuous of order 1/2 in the mean-square sense, and s(t, x) might be called a conditional Hölder constant given X(t) = x. Up to the factor 1/√6, the asymptotic constant

1/√6 · ∫_0^1 E s(t, X(t)) dt


for the minimal errors e_{2,2}(N) is merely the conditional Hölder constant averaged in time and space with respect to the Lebesgue measure and the distribution of the solution. Furthermore, the asymptotically optimal method derived in these notes uses an adaptive discretization with step-size at (t, X(t)) roughly proportional to the inverse (s(t, X(t)))^{−1} of the conditional Hölder constant.

For finite q = p ≠ 2, the order of convergence of the minimal error e_{p,p}(N) remains 1/√N, and the asymptotic constant is determined by the 2p/(p + 2)-th average in time and space of the conditional Lp-Hölder constant

H_p(t, x) = m_p · ( ∑_{i=1}^d |σ_i(t, x)|_2^p )^{1/p},

where m_p^p denotes the p-th absolute moment of a standard normal variable. Here, the asymptotically optimal method chooses the step-size roughly proportional to (H_p(t, X(t)))^{−1}.

In the non-commutative case, not only the local smoothness of the solution but also the deviation of the diffusion coefficient from commutativity has to be taken into account. For every pair σ^{(j_1)} and σ^{(j_2)} of distinct columns of σ, this deviation is measured in time and space by the function

nc_{j_1,j_2}(t, x) = ( ∇_x σ^{(j_2)} · σ^{(j_1)} − ∇_x σ^{(j_1)} · σ^{(j_2)} )(t, x),

where ∇_x σ^{(j)} denotes the Jacobian matrix of σ^{(j)} with respect to the state variable x. The effect of a deviation at time t depends on the variability of the solution with respect to its state at time t. This variability is determined by a d × d-dimensional stochastic field Φ(t, s), 0 ≤ t ≤ s ≤ 1, which, roughly speaking, is the L_2-derivative of the solution at time s as a function of the initial value X(t). The effect of non-commutativity is quantified by the process

Ψ(t) = ∑_{1≤j_1<j_2≤m} ∫_t^1 |Φ(t, s) · nc_{j_1,j_2}(t, X(t))|_2^2 ds,    0 ≤ t ≤ 1.
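As a rough illustration of the adaptive discretization described above for the commutative case, the following sketch takes the current step roughly proportional to the inverse conditional Hölder constant 1/s(t, X(t)). The Euler update, the basic step-size 1/n_basic and the caps c_min, c_max are simplifications chosen for the sketch only; the optimal methods constructed in Chapter III use Milstein-type steps and a more careful step-size rule.

```python
import numpy as np

def adaptive_euler_path(a, sigma, x0, n_basic, rng, c_min=0.1, c_max=10.0):
    """Euler path on [0, 1] with step size roughly proportional to the inverse
    conditional Hoelder constant 1/s(t, X(t)) (illustrative sketch only)."""
    t, x = 0.0, np.asarray(x0, dtype=float)
    times, path = [t], [x.copy()]
    while t < 1.0:
        sig = sigma(t, x)                               # d x m diffusion matrix
        s = np.sqrt(np.sum(sig ** 2))                   # conditional Hoelder constant s(t, x)
        h = 1.0 / (n_basic * max(s, c_min))             # step ~ 1 / (N * s(t, X(t)))
        h = min(h, c_max / n_basic, 1.0 - t)            # cap the step and hit t = 1 exactly
        dw = rng.normal(0.0, np.sqrt(h), size=sig.shape[1])
        x = x + a(t, x) * h + sig @ dw
        t += h
        times.append(t)
        path.append(x.copy())
    return np.array(times), np.array(path)
```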
