Numerical Complexity Analysis of Weak Approximation of Stochastic Differential Equations
Raúl Tempone Olariaga
TRITA-NA-0220
Doctoral Dissertation
Royal Institute of Technology
Department of Numerical Analysis and Computer Science
Academic dissertation which, by permission of the Royal Institute of Technology (Kungl Tekniska Högskolan), will be presented for public examination for the degree of Doctor of Technology on Friday, October 11, 2002, at 10:00 in room D3, Lindstedtsvägen 5, entrance level, Royal Institute of Technology, Stockholm.
ISBN 91-7283-350-5, TRITA-NA-0220, ISSN 0348-2952, ISRN KTH/NA/R-02/20

© Raúl Tempone Olariaga, October 2002
Royal Institute of Technology, Stockholm 2002
Abstract

The thesis consists of four papers on numerical complexity analysis of weak approximation of ordinary and partial stochastic differential equations, including illustrative numerical examples. Here by numerical complexity we mean the computational work needed by a numerical method to solve a problem with a given accuracy. This notion offers a way to understand the efficiency of different numerical methods.

The first paper develops new expansions of the weak computational error for Itô stochastic differential equations using Malliavin calculus. These expansions have a computable leading order term in a posteriori form and are based on stochastic flows and discrete dual backward problems. Besides this, the expansions lead to efficient and accurate computation of error estimates and give the basis for adaptive algorithms with either deterministic or stochastic time steps.

The second paper proves convergence rates of adaptive algorithms for Itô stochastic differential equations. Two algorithms, based on either stochastic or deterministic time steps, are studied. The analysis of their numerical complexity combines the error expansions from the first paper with an extension of the convergence results for adaptive algorithms approximating deterministic ordinary differential equations. Both adaptive algorithms are proven to stop with an optimal number of time steps, up to a problem independent factor defined in the algorithm.

The third paper extends the techniques to the framework of Itô stochastic differential equations in infinite dimensional spaces, arising in the Heath-Jarrow-Morton term structure model for financial applications in bond markets. Error expansions are derived to identify the different error contributions arising from time and maturity discretization, as well as the classical statistical error due to finite sampling.

The last paper studies the approximation of linear elliptic stochastic partial differential equations, describing and analyzing two numerical methods. The first method generates iid Monte Carlo approximations of the solution by sampling the coefficients of the equation and using a standard Galerkin finite element variational formulation. The second method is based on a finite dimensional Karhunen-Loève approximation of the stochastic coefficients, turning the original stochastic problem into a high dimensional deterministic parametric elliptic problem. A deterministic Galerkin finite element method, of either h or p version, then approximates the stochastic partial differential equation. The paper concludes by comparing the numerical complexity of the Monte Carlo method with the parametric finite element method, suggesting intuitive conditions for an optimal selection between these methods.

2000 Mathematics Subject Classification. Primary 65C05, 60H10, 60H35, 65C30, 65C20; Secondary 91B28, 91B70.

Keywords and phrases. Adaptive methods, a posteriori error estimates, stochastic differential equations, weak approximation, Monte Carlo methods, Malliavin calculus, HJM model, option price, bond market, stochastic elliptic equation, Karhunen-Loève expansion, perturbation estimates, numerical complexity.

ISBN 91-7283-350-5 • TRITA-NA-0220 • ISSN 0348-2952 • ISRN KTH/NA/R-02/20
Acknowledgments

First I wish to thank my advisor, Anders Szepessy, for all his enthusiastic and continuous support, and for his enlightening guidance during my PhD studies. Without his help this dissertation could hardly have been written. I am also grateful to Georgios Zouraris and Kyoung-Sook Moon for many fruitful discussions, scientific and otherwise, and for their nice company at different scientific meetings.

All my good friends at NADA have contributed to making my stay a pleasant one, and I am thankful for that. Among them, I wish to particularly thank Jesper Oppelstrup and Ingrid Melinder. I would like to thank Germund Dahlquist for sharing his exciting views of numerical analysis and finding time for scientific discussions. Special thanks also to my friends Ulf and Christer Andersson, for their infinite patience and invaluable computer consulting.

Ivo Babuška arranged the financial support for my two visits to the Texas Institute for Computational and Applied Mathematics (TICAM) and gave me the opportunity to learn from him there. Many thanks to him and to the people at TICAM for their warm hospitality. Tomas Björk introduced me to mathematical finance; my appreciation to him for sharing his expertise in the area.

Last but not least, I want to specially thank my dear wife Roxana for all her love and patience during these years.
This work was financially supported by the Swedish Council for Engineering Science (TFR), grant #22-148, by UdelaR and UdeM in Uruguay, and by the National Science Foundation through grant DMS-9802367. All these grants are gratefully acknowledged.
Contents

1 Introduction
  1.1 Itô Stochastic Differential Equations
    1.1.1 Weak Approximation of SDEs
    1.1.2 Overview of Paper 1
    1.1.3 Overview of Paper 2
    1.1.4 Weak Approximation of an Infinite Dimensional SDE
    1.1.5 Overview of Paper 3
  1.2 Stochastic Partial Differential Equations
    1.2.1 Overview of Paper 4
List of Papers

Paper I: A. Szepessy, R. Tempone and G. Zouraris. Adaptive weak approximation of stochastic differential equations. Communications on Pure and Applied Mathematics, 54(10):1169–1214, 2001.

Paper II: K.-S. Moon, A. Szepessy, R. Tempone and G. Zouraris. Convergence Rates for Adaptive Weak Approximation of Stochastic Differential Equations. Presented at the following scientific meetings: First SIAM-EMS Conference "Applied Mathematics in our Changing World", Berlin, September 2-6, 2001; Third International Conference on Applied Mathematics for Industrial Flow Problems, Lisbon, Portugal, April 17-20, 2002.

Paper III: T. Björk, A. Szepessy, R. Tempone and G. Zouraris. Monte Carlo Euler approximation of HJM term structure financial models. Presented at the following scientific meetings: Stochastic Numerics 2001, ETH, Zurich, Switzerland, February 19-21, 2001; First SIAM-EMS Conference "Applied Mathematics in our Changing World", Berlin, September 2-6, 2001; The 2nd World Congress of the Bachelier Finance Society, Crete, June 12-15, 2002.
Paper IV: I. Babuška, R. Tempone and G. Zouraris. Galerkin finite element approximations of stochastic elliptic partial differential equations. Presented at "Current and future trends in numerical PDEs: Where is the field, and where is it going?", February 8-9, 2002, Texas Institute for Computational and Applied Mathematics, The University of Texas at Austin, Austin, Texas, USA.
To Roxana.
Chapter 1
Introduction

This work studies numerical methods for weak approximation of stochastic differential equations. In particular, it focuses on the numerical complexity of different numerical methods. Here by numerical complexity we mean the computational work needed by a numerical method to solve a problem with a given accuracy. This work first considers adaptive numerical methods for weak approximation of Itô Stochastic Differential Equations (SDEs) and then analyzes Galerkin finite element approximations for Stochastic Partial Differential Equations (SPDEs) that are linear and elliptic.

SDEs are often part of mathematical models that describe the evolution of dynamical systems under uncertainty. A mathematical model establishes mathematical relations between the relevant variables of a given system. For example, a differential equation modeling the temperature of a hot metal surface subject to water cooling describes the relation between the given initial condition (the initial temperature), the flux function (which tells how heat is convected in the system), and the final value (the final temperature we want to know). The purpose of a mathematical model is to predict the outcome of events (past, present or future), for example the result of a certain physical experiment, and possibly take advantage of that knowledge, since an accurate mathematical model may be used as a basic tool to control the outcomes.

Mathematical models can be deterministic or stochastic. The first case arises when the data and the relations described in the model are deterministic, as for an ordinary differential equation with fixed data, whereas in the second case either the data or the relations between the variables are stochastic. As an example of this, consider an ordinary differential equation whose initial value is not deterministic but follows a given probability distribution. Another example is a perturbation of an ordinary differential equation, where the evolution itself is affected by some "noise". Setting up a formal description of this intuitive
notion leads to the concept of Stochastic Differential Equations. The field of applications is quite wide; it comprises, e.g., ground water flow and financial markets [KP92, Øks98].

Uncertainty comes basically from two sources: the lack of complete information about the dynamics of the system to model, or the fact that, for fixed data, the system does not always offer the same outcome. As an example of the second case, when rolling a fair die a sufficiently large number of times, we tend to observe that all the values appear in similar proportions among the outcomes. It is then possible to use this statistical information within a probability model to answer questions related to dice games. This second step is related to the approach pursued here: we shall assume that the stochastic model has been properly identified by some statistical procedure and is given, and then try to compute some related quantities. The need to compute expected values or averages (functionals) depending on the solution of an SDE will guide us towards the notion of weak convergence, as opposed to strong convergence, where good approximation of realization paths is required. The discretization of an SDE can be more subtle than for ODEs; for example, forward and backward differences do not in general converge to the same limit. Therefore, the model must also include information on the discretization to be used.

Numerical methods offer approximate computable solutions to mathematical problems, and are usually applied when the exact solution is either unknown or costly or involved to compute. In particular, adaptive numerical methods aim for efficient use of computational resources by trying to minimize the degrees of freedom in the numerical discretizations, as well as to provide accurate estimates of the different sources of error present in the computations, like the time discretization error in the solution of an ordinary differential equation. Efficient adaptive numerical methods rely on a posteriori information, i.e. information offered by the computable numerical solution, both to estimate the error present in the numerical solution and to apply a refinement criterion when adding degrees of freedom to a given discretization. On the other hand, a priori information, i.e. information about the unknown and usually non computable exact solution, is of a qualitative kind, e.g. it provides smoothness properties of the exact solution, and may be used to prove convergence of numerical approximations, to identify the order of such convergence, as well as to select an appropriate numerical method [EEHJ95, EEHJ96].

Regarding applications, mathematical finance is an area where stochastic modeling with SDEs has been soundly successful, in particular in contingent claims pricing theory. A derivative product or contingent claim is a financial contract whose value depends on a risk factor, also known as the underlying, such as the price of a bond, commodity, currency or share, a yield or rate of interest, an index of prices or yields, etc. These contracts are also known as "derivatives", for short, and are common in financial markets. The application of derivatives is increasing; consider for example the case of energy derivatives [Boh98, EL00], currently traded in many new regional
markets, arising from the deregulation of former national monopolies. A simple example of a derivative is the so-called European call option, which gives its owner the right, but not the obligation, to buy the underlying asset at a previously agreed-upon price on the expiration date [Hul93].

The usual valuation method assumes that the financial markets are efficient, that is, that there is no opportunity for making riskless profits, or in other words, that there exists no arbitrage opportunity in the market. This assumption leads to a consistency requirement between the price of the underlying, e.g. a stock price, which can be observed directly in the market, and a fair price for a related derivative product, e.g. the call option introduced above. Mathematically this consistency relation can be expressed by the existence of a probability measure Q such that, given today's date and today's stock price, the price of the derivative is the expectation under Q of its discounted final payoff. The relevant point is that the expectation must be taken under Q, which is known as a martingale measure, and not under the objective probability measure [BR96, Bjö98].

After the celebrated work of Black and Scholes [BS73], stochastic differential equations have played a major role in financial applications. The Black and Scholes model can be used to fit observed data through implied quantities, and the related valuation formula can be interpreted as a nonlinear interpolation procedure to estimate derivative prices. Even though few of the model's assumptions are fully respected in practice, e.g. constant volatility and constant riskless interest rate, the model is quite robust, especially for relatively short maturity options. However, when the life of the option becomes longer, extensions of the Black and Scholes model, e.g. allowing stochastic volatility, are of practical use to explain the so-called volatility smile effect observed in the market [FSP00, SP99, WO97]. On the other hand, since only relatively few stochastic differential equations have explicit solutions, as the financial models get more and more refined the need for deeper understanding and better numerical methods increases.

Thanks to Kolmogorov's stochastic representation formulae, see [KS88], numerical methods for weak approximation of SDEs can be based either on the numerical solution of a Kolmogorov backward partial differential equation, see [Bjö94a, Bjö94b], using finite difference schemes [WHD95, Wil98] or the finite element method [BS94], or on the time discretization of the SDE and the computation of sample averages by the Monte Carlo Euler method, see [KP92]. The convergence properties of finite difference schemes and the finite element method make them the best tools whenever the dimension of the given system of SDEs is low, say less than or equal to four, since their computational cost increases exponentially with that dimension. On the other hand, the computational cost of Monte Carlo methods is only polynomial in the dimension of the SDE system, making them a feasible alternative for computations with large systems of SDEs. Tree methods [CRR79] are popular and have pedagogical advantages. They may be thought of as a special case of explicit finite difference schemes, although they are non optimal, i.e. with the same amount of computational work there exist other finite difference schemes with better convergence properties.
This work uses the Monte Carlo Euler method for weak approximation of SDEs, developing a posteriori error approximations and proposing and analyzing related adaptive numerical methods for weak approximation that are well suited to solving problems with systems of SDEs.

Stochastic Partial Differential Equations (SPDEs) are also used to describe the behavior of systems under uncertainty. Due to the great development in computational resources and scientific computing techniques, more mathematical models can be solved efficiently. Ideally, these techniques could be used to solve many classical partial differential equations (PDEs) to high accuracy. However, in many cases the information available to solve a given problem is far from complete. This is the case when solving a partial differential equation whose coefficients depend on material properties that are known only to some accuracy. The same may occur with its boundary conditions, and even with the geometry of its domain, see for example the works [BCa, BCb]. Naturally, since current engineering trends are toward more reliance on computational predictions, the need for assessing the level of accuracy in the results grows accordingly. More than ever, the goal then becomes to represent and propagate the uncertainties from the available data to the desired result through the partial differential equation.

By uncertainty we mean either intrinsic variability of physical quantities or simply lack of knowledge about some physical behavior, cf. [Roa98]. If variability is interpreted as randomness then we can naturally apply probability theory. To be fruitful, probability theory requires considerable empirical information about the random quantities in question, generally in the form of probability distributions or their statistical moments. Uncertainties may arise at different levels. They could appear in the mathematical model, e.g. if we are not sure about the linear behavior of some material, or in the variables that describe the model, e.g. if the linear coefficient that describes the material is not completely known. Here we shall discuss the second alternative and use a probabilistic description for the coefficient variability, leading us to the study of stochastic partial differential equations.

Regarding the approximation of SPDEs, this thesis describes and analyzes two numerical methods for a linear elliptic problem with stochastic coefficients and homogeneous Dirichlet boundary conditions. The first method generates iid approximations of the solution by sampling the coefficients of the equation and using a standard Galerkin finite element variational formulation. The Monte Carlo method then uses these approximations to compute corresponding sample averages. The second method is based on a finite dimensional approximation of the stochastic coefficients, turning the original stochastic problem into a deterministic parametric elliptic problem. A Galerkin finite element method, of either h or p version, then approximates the corresponding deterministic solution, yielding approximations of the desired statistics. We include a comparison of the computational work required by each method to achieve a given accuracy, the numerical complexity, to illustrate their nature and possible use.

The thesis is organized as follows: Section 1.1 describes the problem of weak approximation of Itô stochastic differential equations and the contributions from
papers I, II and III. Finally, Section 1.2 describes a problem from linear elliptic SPDEs and the contribution from Paper IV.
1.1 Itô Stochastic Differential Equations
1.1.1 Weak Approximation of SDEs
Let (Ω, F, P) be a probability space, where Ω is a set of outcomes, F is a set of events in Ω and P : F → [0, 1] is a probability measure, and let W : [0, T] × Ω → R^{ℓ_0} be a Wiener process on (Ω, F, P). In what follows, {F_t^W}_{t ∈ [0, T]} denotes the natural filtration, i.e. the filtration of σ-algebras generated by W, or equivalently the filtration generated by the random variables {W(s) : 0 ≤ s ≤ t}. Let a(t, x) ∈ R^d and b^ℓ(t, x) ∈ R^d, ℓ = 1, ..., ℓ_0, be given drift and diffusion fluxes and consider the Itô stochastic differential equation in R^d
dX_k(t) = a_k(t, X(t)) dt + \sum_{\ell=1}^{\ell_0} b_k^\ell(t, X(t)) dW^\ell(t),   k = 1, ..., d,   t > 0,
X_k(0) = X_{0,k},   k = 1, ..., d.      (1.1)

A classical reference for SDEs is the book [KS88]. An existence proof for strong solutions of SDEs, based on Picard iterations and Lipschitz continuity of the drift and diffusion coefficients, can be found in e.g. [Øks98], while a description of an alternative proof based on the Euler method can be found in [MSTZ00b].

The weak approximation of SDEs consists in approximating the expectation E[g(X(T))], where g : R^d → R is a given function, T is a given positive number and the stochastic process X is the solution of (1.1) with initial datum X(0). In finance applications the function g can be a discounted payoff function of a T-contingent claim, and the fluxes a, b describe the dynamics of the underlying process, e.g. a vector of stock values X. Figure 1.1 shows a simple example of such a problem, with g(x) = max(x − 0.9, 0) corresponding to the payoff diagram of a call option with strike price 0.9, maturity time T = 1 and an underlying process that follows a geometric Brownian motion, in this case

dX(t) = (X(t)/10) dt + (X(t)/5) dW(t),

with initial condition X(0) = 1. The functional to compute is then

E[g(X(1))] = \int_R max(x − 0.9, 0) p(1, x; 0, X_0) dx,      (1.2)

where p(1, ·; 0, X_0) denotes the probability density of X(1) given X(0) = X_0.
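As a concrete illustration, the following sketch (in Python; not part of the original thesis, and all parameter values are illustrative choices) approximates the functional (1.2) with the Monte Carlo Euler approach introduced below: Euler paths of the geometric Brownian motion are simulated and the payoff is averaged.

```python
import numpy as np

def mc_euler_call_price(M=100_000, N=100, T=1.0, X0=1.0, K=0.9, seed=0):
    """Monte Carlo Euler estimate of E[max(X(T) - K, 0)] for the example
    dX = (X/10) dt + (X/5) dW, X(0) = X0, i.e. the functional (1.2)."""
    rng = np.random.default_rng(seed)
    dt = T / N
    X = np.full(M, X0)
    for _ in range(N):
        dW = rng.normal(0.0, np.sqrt(dt), size=M)    # Wiener increments
        X += (X / 10.0) * dt + (X / 5.0) * dW        # forward Euler step
    payoff = np.maximum(X - K, 0.0)
    estimate = payoff.mean()
    half_width = 1.96 * payoff.std(ddof=1) / np.sqrt(M)  # ~95% confidence
    return estimate, half_width

if __name__ == "__main__":
    value, err = mc_euler_call_price()
    print(f"E[g(X(1))] approx {value:.4f} +/- {err:.4f}")
```

Both error sources discussed below are present in such an estimate: the confidence half-width only reports the statistical part, not the time discretization error.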
where p(1, ·; 0, X0 ) is the probability density of X(1). A first step towards the development of numerical solutions of the weak approximation problem is the forward Euler method, which is a time discretization of
Figure 1.1. Weak approximation example. Left: realizations of the Wiener process, Δt = 0.01. Right: realizations of the process X(t), the function g and a final sample density p(1, x; 0, X_0) corresponding to M = 1000 realizations.
(1.1). Consider the time nodes 0 = t_0 < t_1 < ... < t_N = T and define the discrete time stochastic process X̄ by

X̄(t_{n+1}) = X̄(t_n) + a(t_n, X̄(t_n)) Δt_n + \sum_{\ell=1}^{\ell_0} b^\ell(t_n, X̄(t_n)) ΔW_n^\ell,   0 ≤ n ≤ N − 1,
X̄(0) = X_0.      (1.3)
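For reference, here is a minimal sketch of one realization of the scheme (1.3) for a general system, with user-supplied drift a(t, x) and diffusion b(t, x); the function names and signatures are illustrative assumptions, not code from the thesis.

```python
import numpy as np

def euler_realization(a, b, X0, t_nodes, rng):
    """One realization of the forward Euler scheme (1.3).

    a(t, x) returns the drift, shape (d,);
    b(t, x) returns the diffusion fluxes b_k^l, shape (d, l0);
    t_nodes is the (possibly nonuniform) grid 0 = t_0 < ... < t_N = T.
    """
    X = np.array(X0, dtype=float)
    l0 = b(t_nodes[0], X).shape[1]
    for n in range(len(t_nodes) - 1):
        dt = t_nodes[n + 1] - t_nodes[n]
        dW = rng.normal(0.0, np.sqrt(dt), size=l0)   # independent Wiener increments
        X = X + a(t_nodes[n], X) * dt + b(t_nodes[n], X) @ dW
    return X
```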
Even though a realization of X̄(t_n) is computable, the expectation E[g(X̄(T))] is in general not; however, E[g(X̄(T))] can be approximated by a sample average of M independent realizations, (1/M) \sum_{j=1}^M g(X̄(T; ω_j)), which is the basis of Monte Carlo methods [KP92]. Therefore, the exact computational error, E_C, naturally separates into the two parts

E_C ≡ E[g(X(T))] − (1/M) \sum_{j=1}^M g(X̄(T; ω_j))
    = E[ g(X(T)) − g(X̄(T)) ] + ( E[g(X̄(T))] − (1/M) \sum_{j=1}^M g(X̄(T; ω_j)) ) ≡ E_T + E_S,      (1.4)

where the first term, E_T ≡ E[ g(X(T)) − g(X̄(T)) ], is the time discretization error, and the second, E_S ≡ E[g(X̄(T))] − (1/M) \sum_{j=1}^M g(X̄(T; ω_j)), is the statistical error. The time steps for the trajectories X̄ are determined from statistical approximations of the time discretization error E_T. The number of realizations, M, of X̄ is
determined from the statistical error E_S. Therefore, the number of realizations can be asymptotically determined by the Central Limit Theorem

√M E_S ⇀ χ,

where the stochastic variable χ has the normal distribution with mean zero and variance var[g(X̄(T))]. The objective here is to choose the time nodes, which may be different for different realizations of W,

0 = t_0 < t_1 < ... < t_N = T,

and the number of realizations, M, so that the absolute value of the computational error is below a given tolerance, |E_C| ≤ TOL, with probability close to one and with as few time steps and realizations as possible.

Other aspects of the use of the Euler method for the weak approximation of SDEs have been addressed before. Milstein [Mil78] proved that the weak order of the Euler method is 1, i.e. that for uniform deterministic time steps Δt = T/N,

E[ g(X(T)) − g(X̄(T)) ] = O(1/N),

where N is the number of time steps. Later, Talay and Tubaro [TT90] proved that for uniform deterministic time steps there is an a priori expansion

E[ g(X(T)) − g(X̄(T)) ] = (T/N) \int_0^T E[Ψ(s, X(s))] ds + O(1/N^2),

where

Ψ(t, x) ≡ (1/2)(a_k a_n ∂_{kn} u)(t, x) + (a_i d_{jk} ∂_{ijk} u)(t, x) + (1/2)(d_{ij} d_{kn} ∂_{ijkn} u)(t, x)
        + (1/2)(∂_t ∂_t u)(t, x) + (a_i ∂_i ∂_t u)(t, x) + (d_{ij} ∂_{ij} ∂_t u)(t, x)

is based on the definition of the conditional expectation

u(t, x) ≡ E[ g(X(T)) | X(t) = x ]

and the following notation:

d_{ij} ≡ (1/2) b_i^\ell b_j^\ell,   ∂_k ≡ ∂/∂x_k,   ∂_{ki} ≡ ∂^2/(∂x_k ∂x_i),   ...

with the summation convention, i.e. if the same subscript appears twice in a term, the term denotes the sum over the range of this subscript, e.g.

c_{ik} ∂_k b_j ≡ \sum_{k=1}^d c_{ik} ∂_k b_j.
For a derivative ∂_α, the notation |α| denotes its order. Kloeden and Platen [KP92] extended the results of Talay and Tubaro on the existence of leading order error expansions in a priori form, for first and second order schemes, to general weak approximations of higher order. Bally and Talay [BT95, BT96] extended the proof to the case where the payoff function g is not smooth, using Malliavin calculus [Nua95]. This expansion motivates the use of Richardson extrapolation for the development of higher order methods. The case of killed diffusions, e.g. arising in the computation of barrier options, where the distribution of X is not absolutely continuous with respect to the Lebesgue measure, was analyzed by Gobet [Gob00]. An introduction to numerical approximation of SDEs and an extensive review of the literature can be found in the inspiring book by Kloeden and Platen [KP92], including information about the construction and the analysis of the convergence order of higher order methods, either implicit or explicit. Asymptotically optimal, adapted adaptive methods for strong approximation of stochastic differential equations are analyzed in [HMGR00] and [MG00]; these works include the hard problem of obtaining lower error bounds for any method based on the number of evaluations of W, and require roughly that the L2 norm in time of the diffusion, max_i d_{ii}, be positive pathwise. The work [GL97] is a first study of strong adaptive approximation.
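The a priori expansion above is exactly what justifies Richardson extrapolation: since the leading error term is proportional to the step size, two Euler approximations computed with steps Δt and Δt/2 can be combined to cancel it. A minimal sketch, assuming both estimates use enough realizations for the statistical error to be negligible:

```python
def richardson_extrapolation(estimate_dt, estimate_dt_half):
    """Combine Monte Carlo Euler estimates of E[g(Xbar(T))] computed with
    uniform steps dt and dt/2 so that the O(dt) term of the a priori
    expansion cancels, leaving a higher order approximation."""
    return 2.0 * estimate_dt_half - estimate_dt
```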
1.1.2 Overview of Paper 1
The main result is new expansions of the computational error, with computable leading order term in a posteriori form, based on stochastic flows and discrete dual backward problems. The expansions lead to efficient and accurate computation of error estimates. In the first, simpler expansion, the size of the time steps Δt_n may vary in time but the steps are deterministic, i.e. the mesh is fixed for all samples. This is useful for solutions with singularities, or approximate singularities, at deterministic times, or for problems with small noise. The second error expansion uses time steps which may vary for different realizations of the solution X̄. Stochastic time steps are advantageous for problems with singularities at random times. Stochastic time steps use Brownian bridges and require more work for a given number of time steps. The optimal stochastic steps depend on the whole solution X̄(t), 0 < t < T, and in particular the step Δt(t) at time t depends also on W(τ), τ > t. In stochastic analysis the concept adapted to W means that a process at time t only depends on events generated by {W(s), s < t}. In numerical analysis an adaptive method means that the approximate solution is used to control the error, e.g. to determine the time steps. Our stochastic steps are in this sense adaptive but non adapted, since Δt(t) depends slightly on W(τ), τ > t.

The number of realizations needed to determine the deterministic time steps is asymptotically at most O(TOL^{-1}), while the number of realizations for the Monte Carlo method to approximate E[g(X̄(T))] is O(TOL^{-2}). Therefore, the additional work to determine optimal deterministic time steps becomes negligible as the error tolerance tends to zero.
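As an aside, the Central Limit Theorem mentioned above gives a practical rule for picking the number of realizations from a pilot batch; the confidence constant c0 below is an illustrative choice, not a prescription from the thesis.

```python
import numpy as np

def sample_size_from_clt(pilot_values, tol_stat, c0=1.65):
    """Smallest M such that c0 * sqrt(var/M) <= tol_stat, where var is the
    sample variance of g(Xbar(T)) estimated from a pilot batch.  By the
    Central Limit Theorem the statistical error E_S of (1.4) then stays
    below tol_stat with probability roughly Phi(c0)."""
    variance = np.var(pilot_values, ddof=1)
    return int(np.ceil(c0**2 * variance / tol_stat**2))
```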
Efficient adaptive time stepping methods, with theoretical basis, use a posteriori error information, since the a priori knowledge usually cannot be as precise as the a posteriori. This work develops adaptive time stepping methods by proving, in Theorems 2.2 and 3.3, error estimates of E_T with leading order terms in computable a posteriori form. Theorem 2.2 uses deterministic time steps, while Theorem 3.3 also holds for stochastic time steps, which are not adapted. The main new idea here is the efficient use of stochastic flows and dual functions to obtain the error expansion with computable leading order term in Theorems 2.2 and 3.3, including also non adapted adaptive time steps. The use of dual functions is standard in optimal control theory and in particular for adaptive mesh control for ordinary and partial differential equations, see [BMV83], [Joh88], [JS95], [EEHJ96], and [BR96]. The authors are not aware of other error expansions in a posteriori form or adaptive algorithms for weak approximation of stochastic differential equations. In particular, error estimates with stochastic non adapted time steps seem not to have been studied before.

Theorem 2.2 describes a computable error expansion, with deterministic time steps, to estimate the computational error,

E[ g(X(T)) − g(X̄(T)) ] ≃ \sum_n E[ρ(t_n, ω)] (Δt_n)^2,      (1.5)

where ρ(t_n, ω) Δt_n is the corresponding error density function. Section 3 proves in Theorem 3.3 an analogous error expansion,

E[ g(X(T)) − g(X̄(T)) ] ≃ E[ \sum_n ρ(t_n, ω) (Δt_n)^2 ],      (1.6)
which can also be used for stochastic time steps. The leading order terms of this expansion have smaller variance than those of the expansion in Theorem 2.2, but use up to the third variation, which requires more computational work per realization.

The focus in the paper is on computable error estimates for weak convergence of stochastic differential equations. The technique used here is based on the transition probability density and Kolmogorov's backward equation, which was developed in [SV69] and [SV79] to analyze uniqueness and dependence on initial conditions for weak solutions of stochastic differential equations. The analogous technique for deterministic equations was introduced in [Grö67] and [Ale61].
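To convey the idea behind error expansions such as (1.5), the sketch below refines a deterministic time mesh from indicators of the form |ρ_n|(Δt_n)^2, halving intervals whose indicator exceeds an equidistributed share of the tolerance. This is only an illustration with hypothetical names; the actual refinement and stopping rules of Papers I and II are more elaborate.

```python
import numpy as np

def refine_time_mesh(t_nodes, error_density, tol, max_sweeps=50):
    """Crude adaptive refinement driven by indicators r_n = |rho_n| * dt_n^2.

    error_density(t_nodes) returns one (sample-averaged) density estimate
    rho_n per interval [t_n, t_{n+1}), cf. the expansion (1.5)."""
    t = np.asarray(t_nodes, dtype=float)
    for _ in range(max_sweeps):
        dt = np.diff(t)
        r = np.abs(error_density(t)) * dt**2         # error indicators
        if r.sum() <= tol:                           # accuracy reached
            return t
        too_big = r > tol / len(dt)                  # equidistribution criterion
        midpoints = t[:-1][too_big] + 0.5 * dt[too_big]
        t = np.sort(np.concatenate([t, midpoints]))  # halve the flagged steps
    return t
```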
1.1.3 Overview of Paper 2
Convergence rates of adaptive algorithms for weak approximation of Itô stochastic differential equations are proved for the Monte Carlo Euler method. Here the focus is on the adaptivity procedures, and we derive convergence rates of two algorithms, including dividing and merging of time steps, with either stochastic or deterministic time steps. The difference between the two algorithms is that the stochastic time steps may use different meshes for each realization, while the deterministic time steps use the same mesh for all realizations. The construction and
the analysis of the adaptive algorithms are inspired by the related work [MSTZ00a] on adaptive algorithms for deterministic ordinary differential equations, and use the error estimates from [STZ01]. The main step in the extension is the proof of the almost sure convergence of the error density. Both adaptive algorithms are proven to stop with an optimal number of steps, up to a problem independent factor defined in the algorithm.

There are two main results on the efficiency and accuracy of the adaptive algorithms described in Section 3. Regarding accuracy, with probability close to one the approximation errors in (1.4) are asymptotically bounded by the specified error tolerance times a problem independent factor as the tolerance parameter tends to zero. Regarding efficiency, the algorithms with stochastic steps and with deterministic steps stop with the optimal expected number of final time steps and the optimal number of final time steps, respectively, up to a problem independent factor. The number of final time steps is related to the numerical effort needed to compute the approximation. To be more precise, the total work with deterministic steps is roughly M · N, where M is the final number of realizations and N is the final number of time steps, since the work to determine the mesh turns out to be negligible. On the other hand, the total work with stochastic steps is on average bounded by M · E[N_tot], where the total number of steps including all refinement levels, N_tot, is bounded by O(N log N) with N steps in the final refinement; for each realization it is necessary to determine the mesh, which may vary from realization to realization.

The accuracy and efficiency results are based on the fact that the error density ρ, which measures the approximation error on each interval following (1.5)-(1.6), converges almost surely (a.s.) as the error tolerance tends to zero. This convergence can be understood from the a.s. convergence of the approximate solution X̄ as the maximal step size tends to zero. Once this convergence is established, the techniques used to develop the accuracy and efficiency results are similar to those in [MSTZ00a]. Although, for the stochastic time stepping algorithm, the time steps are not adapted to the standard filtration generated by W, the work [STZ01] proved that the corresponding approximate solution converges to the correct adapted solution X. This result makes it possible to prove the martingale property of the approximate error term with respect to a specific filtration, see Lemma 4.2. Therefore, Theorems 4.1 and 4.4 use Doob's inequality to prove the a.s. convergence of X̄. Similar results of pointwise convergence with constant step sizes, adapted to the standard filtration, are surveyed by Talay in [Tal95]. This work can easily be modified following [MSTZ02], yielding adaptive algorithms with no merging that have several theoretical advantages.
1.1.4 Weak Approximation of an Infinite Dimensional SDE
The extension of the weak approximation problem described in Section 1.1.1 to the infinite dimensional case is here motivated by financial applications, in particular the valuation of contingent claims that have the market interest rate as the underlying.
Due to the relatively long life span of these products, the interest rate is modeled as a stochastic process, which may be Markovian, as in the case of spot rate models. In the context of interest rate products, the most elementary one is the zero coupon bond with maturity time τ, which gives its owner the right to receive one unit of currency on the date τ. The price at time t < τ of such a contract is denoted by P(t, τ). Since we can in principle choose any possible value of τ > t, we have an unlimited number of bonds. However, due to the assumption of absence of arbitrage in the market, the bond prices corresponding to different maturities are not independent. Mathematical models that take into account the joint evolution of zero coupon bonds with different maturities can use the so-called forward rate, f(t, τ) ≡ −∂_τ log P(t, τ), for t ∈ [0, τ] and τ ∈ [0, τ_max], see [BR96, Bjö94b, Bjö98, Hul93]. Intuitively, we can think of the instantaneous forward rate f(t, τ) as the risk-free rate at which we can borrow and lend money over the infinitesimal time interval [τ, τ + dτ], provided that the contract is written at time t.

In such models, the absence of arbitrage and friction in the market implies that the drift and the diffusion of the forward rate dynamics must fulfill the Heath-Jarrow-Morton condition [HJM90, HJM92], i.e. that under a risk neutral probability measure the forward rate f(t, τ) follows an infinite dimensional Itô stochastic differential equation of the form
df(t, τ) = ( \sum_{j=1}^J σ^j(t, τ) \int_t^τ σ^j(t, s) ds ) dt + \sum_{j=1}^J σ^j(t, τ) dW^j(t),
f(0, τ) = f_0(τ),   τ ∈ [0, τ_max].      (1.7)

Here W(t) = (W^1(t), ..., W^J(t)) is a J-dimensional Wiener process with independent components, and σ^j(t, τ), j = 1, ..., J, are stochastic processes, adapted to the filtration generated by W, that may depend on f. Besides this, the initial datum f_0 : [0, τ_max] → R is a given deterministic C^1 function, obtained from the observable prices P(0, τ). Figure 1.2 depicts a typical realization of the surface f(t, τ). Observe that the t-sections, f(t, ·), are smooth functions of τ, while the τ-sections, f(·, τ), are continuous but not smooth.

A basic contract to price is a call option, with exercise time t_max and strike price K, on a zero coupon bond with maturity τ_max. The price of this option can be written as

E[ exp( −\int_0^{t_max} f(s, s) ds ) max( exp( −\int_{t_max}^{τ_max} f(t_max, τ) dτ ) − K, 0 ) ].
Figure 1.2. Forward rate modeling: a typical realization of f(t, τ).
With this motivation, we consider computation of the functionals

\mathcal{F}(f) ≡ E[ F( \int_0^{t_max} f(s, s) ds ) G( \int_{τ_a}^{τ_max} Q(f(t_max, τ)) dτ ) + \int_0^{t_max} F( \int_0^s f(v, v) dv ) U(f(s, s)) ds ].
The functions F : R → R, G : R → R, Q : R → R, U : R → R, and their derivatives up to a sufficiently large order m_0, are assumed to have polynomial growth. Besides this, 0 < t_max ≤ τ_a < τ_max are given positive numbers. The aim here is to provide a computable approximation of the above functional \mathcal{F}(f). This is accomplished in two steps, namely by a t and τ discretization of (1.7), yielding a numerical solution f̄, and then the computation of sample averages by the Monte Carlo method. As in the previous section, an important issue is to estimate the different sources of computational error. Here the new ingredient is the analysis of the τ discretization error, which appears together with the time discretization error and the statistical error introduced in Section 1.1.1.
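A minimal sketch of the Monte Carlo Euler approach for (1.7), under the strong simplification of a single factor with constant volatility σ (so the HJM drift is σ^2(τ − t)); the uniform grids and the nearest-node short rate below are further illustrative shortcuts, not the discretization analyzed in Paper III.

```python
import numpy as np

def hjm_bond_price_mc(f0, sigma, t_max, tau_max, Nt, Ntau, M, seed=0):
    """Monte Carlo Euler sketch of a single-factor HJM model (1.7) with
    constant volatility, estimating E[exp(-int_0^{t_max} f(s,s) ds)].
    With the no-arbitrage drift this should reproduce the initial bond
    price exp(-int_0^{t_max} f0(s) ds), up to discretization and sampling
    errors, which serves as a simple sanity check."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, t_max, Nt + 1)
    tau = np.linspace(0.0, tau_max, Ntau + 1)   # requires tau_max >= t_max
    dt = t[1] - t[0]
    f = np.tile(f0(tau), (M, 1))                # f(0, tau) for every realization
    integral_short_rate = np.zeros(M)
    for n in range(Nt):
        i = np.argmin(np.abs(tau - t[n]))       # nearest maturity node to t_n
        integral_short_rate += f[:, i] * dt     # accumulate int f(s, s) ds
        drift = sigma**2 * np.maximum(tau - t[n], 0.0)  # HJM drift sigma * int_t^tau sigma ds
        dW = rng.normal(0.0, np.sqrt(dt), size=(M, 1))  # one factor, shared over maturities
        f = f + drift * dt + sigma * dW                 # Euler step in t
    return np.exp(-integral_short_rate).mean()
```

Pricing the bond option above would additionally require the maturity integral at t_max; the sketch stops at the bond price to stay short.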
1.1.5 Overview of Paper 3
This work studies the problem introduced in Section 1.1.4, considering numerical solutions, based on the so-called Monte Carlo Euler method, for the price of financial instruments in the bond market, using the Heath-Jarrow-Morton model for the forward rate [HJM90, HJM92]. The main contribution is to provide rigorous error expansions, with leading error term in computable a posteriori form, offering computational reliability in the use of more complicated HJM multifactor models,
Figure 1.3. Forward rate modeling: a typical realization of f̄(t, τ), an approximation of f(t, τ).
where no explicit formula can be found for the pricing of contingent claims. These error estimates can be used to handle simultaneously different sources of error, e.g. time discretization, maturity discretization, and finite sampling. To develop the error estimates we use a Kolmogorov backward equation in an extended domain and carry further the analysis in [STZ01], from general weak approximation of Itô stochastic differential equations in R^n to weak approximation of the HJM Itô stochastic differential equation in infinite dimensional spaces. The main new ingredient here is therefore to provide error estimates useful for adaptive refinement not only in time t but also in maturity time τ, see Figure 1.3. Another contribution is the removal of the error in the numerical approximations produced by the representation of the initial term structure on a finite maturity partition. Finally, the formulae to compute sharp error approximations are simplified by exploiting the structure of the HJM model, reducing the work needed to compute such error estimates.
1.2 Stochastic Partial Differential Equations
Let D be a convex bounded polyhedral domain in R^d and (Ω, F, P) be a complete probability space. Here Ω is the set of outcomes, F ⊂ 2^Ω is the σ-algebra of events and P : F → [0, 1] is a probability measure. Consider the stochastic linear elliptic boundary value problem: find a stochastic process u : Ω × D → R such that, P-almost everywhere in Ω, or in other words almost surely (a.s.), we have

−∇ · ( a(ω, ·) ∇u(ω, ·) ) = f(ω, ·)   on D,
u(ω, ·) = 0   on ∂D.      (1.8)
Here a, f : Ω × D → R are stochastic processes that are correlated in space. If we denote by B(D) the Borel σ-algebra generated by the open subsets of D, then a, f
are jointly measurable with respect to the σ-algebra F ⊗ B(D). It is usual to assume that the elliptic operator is bounded and uniformly coercive, i.e.

∃ a_min, a_max ∈ (0, +∞) :   a(ω, x) ∈ [a_min, a_max],   ∀x ∈ D,   a.s.      (1.9)
To ensure regularity of the solution u we also assume that a has a uniformly bounded and continuous first derivative, i.e. there exists a real deterministic constant C such that

a(ω, ·) ∈ C^1(D)   and   max_D |∇_x a(ω, ·)| < C   a.s.      (1.10)
In addition, the right hand side in (1.8) satisfies

\int_Ω \int_D f^2(ω, x) dx dP(ω) < +∞,   which implies   \int_D f^2(ω, x) dx < +∞   a.s.      (1.11)
Stochastic Sobolev spaces are used to obtain existence and uniqueness results for the solution of (1.8). As in the deterministic case, cf. [BS94], the tools are Hilbert spaces, a suitable weak formulation and the application of the Lax-Milgram lemma. All the necessary definitions then have to be extended to the stochastic setting using tensor products of Hilbert spaces, cf. [Lar86] and [ST02], which is a standard procedure.

Definition 1.1. Let H_1, H_2 be Hilbert spaces. The tensor space H_1 ⊗ H_2 is the completion of formal sums u(y, x) = \sum_{i=1}^n v_i(y) w_i(x), {v_i} ⊂ H_1, {w_i} ⊂ H_2, with respect to the inner product (u, û)_{H_1 ⊗ H_2} = \sum_{i,j} (v_i, v̂_j)_{H_1} (w_i, ŵ_j)_{H_2}.

For example, let us consider two domains, y ∈ Γ, x ∈ D, and the tensor space L^2(Γ) ⊗ H^1(D), with tensor inner product

(u, û)_{L^2(Γ) ⊗ H^1(D)} = \int_Γ \int_D u(y, x) û(y, x) dx dy + \int_Γ \int_D ∇_x u(y, x) · ∇_x û(y, x) dx dy.

Thus, if u ∈ L^2(Γ) ⊗ H^k(D) then u(y, ·) ∈ H^k(D) a.e. on Γ and u(·, x) ∈ L^2(Γ) a.e. on D. Moreover, we have the isomorphism L^2(Γ) ⊗ H^k(D) ≃ L^2(Γ; H^k(D)) ≃ H^k(D; L^2(Γ)) with the definitions

L^2(Γ; H^k(D)) = { v : Γ × D → R | v is jointly measurable and \int_Γ ||v(y, ·)||^2_{H^k(D)} dy < +∞ },

H^k(D; L^2(Γ)) = { v : Γ × D → R | v is jointly measurable, ∀|α| ≤ k ∃ ∂_α v ∈ L^2(Γ) ⊗ L^2(D) and
\int_D ∂_α v(y, x) φ(x) dx = (−1)^{|α|} \int_D v(y, x) ∂_α φ(x) dx, ∀φ ∈ C_0^∞(D), a.e. on Γ }.
Intuitively, a function v(ω, x) that belongs to the stochastic Sobolev space W̃^{s,q}(D) will have correspondingly regular realizations, i.e. v(ω, ·) ∈ W^{s,q}(D) a.s. We first recall the definition of stochastic weak derivatives. Let v ∈ L^2_P(Ω) ⊗ L^2(D); then the α stochastic weak derivative of v, w = ∂_α v ∈ L^2_P(Ω) ⊗ L^2(D), satisfies

\int_D v(ω, x) ∂^α φ(x) dx = (−1)^{|α|} \int_D w(ω, x) φ(x) dx,   ∀φ ∈ C_0^∞(D),   a.s.

We shall work with stochastic Sobolev spaces W̃^{s,q}(D) = L^q_P(Ω; W^{s,q}(D)) containing stochastic processes v : Ω × D → R that are measurable with respect to the product σ-algebra F ⊗ B(D) and equipped with the averaged norms

||v||_{W̃^{s,q}(D)} = E[ ||v||^q_{W^{s,q}(D)} ]^{1/q} = ( E[ \sum_{|α| ≤ s} \int_D |∂^α v|^q dx ] )^{1/q},   1 ≤ q < +∞,

and

||v||_{W̃^{s,∞}(D)} = max_{|α| ≤ s} ( ess sup_{Ω × D} |∂^α v| ).

Observe that if v ∈ W̃^{s,q}(D) then v(ω, ·) ∈ W^{s,q}(D) a.s. and ∂^α v(·, x) ∈ L^q_P(Ω) a.e. on D for |α| ≤ s. Whenever q = 2, the above space is a Hilbert space, i.e. W̃^{s,2}(D) = H̃^s(D) ≃ L^2_P(Ω) ⊗ H^s(D).

Now we recall the definition of weak solutions for (1.8). Consider the bilinear form B : H̃^1_0(D) × H̃^1_0(D) → R,

B(v, w) ≡ E[ \int_D a ∇v · ∇w dx ],   ∀v, w ∈ H̃^1_0(D).
The standard assumption (1.9) yields both the continuity and the coercivity of B, i.e.

|B(v, w)| ≤ a_max ||v||_{H̃^1(D)} ||w||_{H̃^1(D)},   ∀v, w ∈ H̃^1_0(D),      (1.12)

and

a_min ||v||^2_{H̃^1(D)} ≤ B(v, v),   ∀v ∈ H̃^1_0(D).      (1.13)
A direct application of the Lax-Milgram lemma, see [BS94], implies existence and uniqueness of the solution to the variational formulation: find u ∈ H̃^1_0(D) such that

B(u, v) = L(v),   ∀v ∈ H̃^1_0(D).      (1.14)

Here L(v) ≡ E[ \int_D f v dx ], ∀v ∈ H̃^1_0(D), defines a bounded linear functional, since the random field f satisfies (1.11). Moreover, standard arguments from measure theory show that the solution to (1.14) also solves (1.8).

Usually in practical problems the information about the stochastic processes a and f is limited. For example, we may only have approximations of their
expectations and covariance functions to use in the implementation of a numerical method for (1.8). The Karhunen-Loève expansion, also known as the Proper Orthogonal Decomposition (POD), is a suitable tool for the approximation of stochastic processes. This expansion is used extensively in the fields of detection, estimation, pattern recognition, and image processing as an efficient tool to approximate random processes.

We now describe the Karhunen-Loève expansion of a stochastic process. Consider a stochastic process a with continuous covariance function Cov[a] : D × D → R. Besides this, let {(λ_i, b_i)}_{i=1}^{+∞} denote the sequence of eigenpairs associated with the compact self adjoint operator that maps

f ∈ L^2(D) ↦ \int_D Cov[a](x, ·) f(x) dx ∈ L^2(D).

The real and non-negative eigenvalues λ_1 ≥ λ_2 ≥ ... satisfy

0 ≤ λ_i ≤ ( \int_D \int_D ( Cov[a](x_1, x_2) )^2 dx_1 dx_2 )^{1/2},   i = 1, ...,

\sum_{i=1}^{+∞} λ_i = \int_D Var[a](x) dx,

and λ_i → 0. The corresponding eigenfunctions are orthonormal, i.e.

\int_D b_i(x) b_j(x) dx = δ_{ij},

and by Mercer's theorem, cf. [RSN90] p. 245,

|| Cov[a](x, y) − \sum_{n=1}^N λ_n b_n(x) b_n(y) ||_{L^∞(D × D)} → 0.      (1.15)

The Karhunen-Loève expansion of the stochastic process a, cf. [Lév92, Yag87a, Yag87b], is

a_N(ω, x) = E[a](x) + \sum_{i=1}^N \sqrt{λ_i} b_i(x) Y_i(ω),

where {Y_i}_{i=1}^{+∞} is a sequence of uncorrelated real random variables with mean zero and unit variance. These random variables are uniquely determined by

Y_i(ω) = (1/\sqrt{λ_i}) \int_D ( a(ω, x) − E[a](x) ) b_i(x) dx
for λ_i > 0. Then applying (1.15) yields the uniform convergence

sup_{x ∈ D} E[ (a − a_N)^2 ](x) = sup_{x ∈ D} ( Var[a](x) − \sum_{i=1}^N λ_i b_i^2(x) ) → 0,   as N → ∞.
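In computations the eigenpairs are usually approximated by discretizing the covariance operator; the sketch below does this with a simple quadrature (Nyström) rule on a one-dimensional grid and draws samples of the truncated expansion a_N. Taking the Y_i Gaussian is an extra assumption made only for illustration, since in general the Y_i are given by the projection formula above.

```python
import numpy as np

def sample_truncated_kl(mean, cov, x_grid, N_terms, n_samples, rng):
    """Draw samples of a truncated Karhunen-Loeve expansion a_N on a grid.

    mean(x) and cov(x1, x2) are the mean and covariance functions of a.
    The covariance operator is discretized with quadrature weights w and
    symmetrized (Nystrom), so (lam, B) approximate the eigenpairs (lambda_i, b_i)."""
    x = np.asarray(x_grid, dtype=float)
    w = np.gradient(x)                                    # quadrature weights
    C = cov(x[:, None], x[None, :])                       # covariance matrix
    sw = np.sqrt(w)
    lam, V = np.linalg.eigh(sw[:, None] * C * sw[None, :])
    order = np.argsort(lam)[::-1][:N_terms]
    lam = np.clip(lam[order], 0.0, None)                  # approximate lambda_i
    B = V[:, order] / sw[:, None]                         # approximate b_i(x_j), L2-orthonormal
    Y = rng.standard_normal((n_samples, N_terms))         # mean zero, unit variance
    return mean(x) + (Y * np.sqrt(lam)) @ B.T             # a_N(omega_k, x_j)
```

For instance, cov(x1, x2) = exp(-|x1 - x2|) on a uniform grid reproduces the familiar exponential-covariance eigenvalue decay.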
A standard approach is then to approximate the stochastic coefficients in (1.8) using the Karhunen-Loève expansion and to solve the resulting stochastic partial differential equation. In this thesis we study the approximation of some statistical moments of the solution of (1.8), e.g. the deterministic function E[u]. For example, we are interested in approximating this function in either L^2(D) or H^1(D).

Depending on the structure of the noise that drives a stochastic elliptic partial differential equation, there are different numerical approximations. For example, when the size of the noise is relatively small, a Neumann expansion around the mean value of the equation's operator is a popular alternative. It requires only the solution of standard deterministic partial differential equations, their number being equal to the number of terms in the expansion. Equivalently, a Taylor expansion of the solution around its mean value with respect to the noise yields the same result. Similarly, the work [KH92] uses formal Taylor expansions of the solution up to second order but does not study their convergence properties. Recently, the work [BC02] proposed a perturbation method with successive approximations. It also proves that the condition of uniform coercivity of the diffusion is sufficient for the convergence of the perturbation method. When only the load is stochastic, it is also possible to derive deterministic equations for the moments of the solution. This case was analyzed in [Bab61, Lar86] and more recently in the work [ST02], where a new method to solve these equations with optimal complexity is presented.

On the other hand, the works by Babuška et al. [Deb00, DBO01] and by Ghanem and Spanos [GS91] address the general case where all the coefficients are stochastic. Both approaches transform the original stochastic problem into a deterministic one in higher dimensions, and they differ in the choice of the approximating functional spaces. The work [Deb00] uses finite elements to approximate the noise dependence of the solution, while [GRH99, GS91] use a formal expansion in terms of Hermite polynomials.

Monte Carlo methods are both general and simple to code, and they are naturally suited for parallelization. They generate a set of independent identically distributed (iid) approximations of the solution by sampling the coefficients of the equation and using a spatial discretization of the partial differential equation, e.g. by a Galerkin finite element formulation. Then, using these approximations, we can compute corresponding sample averages of the desired statistics. The drawback of Monte Carlo methods is their slow rate of convergence. It is worth mentioning that in particular cases their convergence can be accelerated by variance reduction techniques [JCM01], or the convergence rate can even be improved with quasi-Monte Carlo methods [Caf98, Sob94, Sob98]. Moreover, if the probability density of the
random variable is smooth, the convergence rate of the Monte Carlo method for the approximation of the expected value can be improved, cf. [Nov88, TW98].

Another way to give a notion of stochastic partial differential equations is based on the Wick product and the Wiener chaos expansion, see [HØUZ96] and [Våg98]. This approach yields solutions in Kondratiev spaces of stochastic distributions, which are not the same as those from (1.14). The choice between (1.14) and [HØUZ96] is a modeling decision, based on the physical situation under study. For example, with the Wick product we have E[au] = E[a]E[u] regardless of the correlation between a and f, whereas this is in general not true with the usual product. A numerical approximation for Wick stochastic linear elliptic partial differential equations is studied in [The00], yielding a priori convergence rates.
1.2.1 Overview of Paper 4
We describe and analyze two numerical methods for the linear elliptic problem (1.8). Here the aim of the computations is to approximate the expected value of the solution. Since the approximation of the stochastic coefficients a and f by the Karhunen-Loève expansion is in general not exact, we derive related a priori error estimates.

The first method generates iid approximations of the solution by sampling the coefficients of the equation and using a standard Galerkin finite element variational formulation. The Monte Carlo method then uses these approximations to compute corresponding sample averages. More explicitly, we proceed as follows (a sketch of these steps is given after the list):

1. Choose a number of realizations, M, and a piecewise linear finite element space X_h^d on D.

2. For each j = 1, ..., M, sample iid realizations of the diffusion a(ω_j, ·) and the load f(ω_j, ·) and find a corresponding approximation u_h(ω_j, ·) ∈ X_h^d such that

   \int_D a(ω_j, x) ∇u_h(ω_j, x) · ∇χ(x) dx = \int_D f(ω_j, x) χ(x) dx,   ∀χ ∈ X_h^d.      (1.16)

3. Finally, use the sample average (1/M) \sum_{j=1}^M u_h(ω_j, ·) to approximate E[u].
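A hedged one-dimensional illustration of steps 1-3 (piecewise linear elements on (0, 1), elementwise constant samples of a; the mesh, sampling routines and names are assumptions made for this sketch, not the discretization used in Paper IV, which works on a domain D ⊂ R^d):

```python
import numpy as np

def fem_solve_1d(a_elem, f_node, h):
    """Piecewise linear FEM for -(a u')' = f on (0,1), u(0) = u(1) = 0,
    with a constant on each element and f given at the n interior nodes."""
    n = len(f_node)
    K = np.zeros((n, n))
    for e in range(n + 1):                    # element (x_e, x_{e+1}), x_j = j*h
        ke = a_elem[e] / h                    # local stiffness a_e/h * [[1,-1],[-1,1]]
        if e >= 1:
            K[e - 1, e - 1] += ke
        if e <= n - 1:
            K[e, e] += ke
        if 1 <= e <= n - 1:
            K[e - 1, e] -= ke
            K[e, e - 1] -= ke
    return np.linalg.solve(K, h * np.asarray(f_node))   # lumped load vector

def mc_expected_solution(M, n, sample_a, sample_f, seed=0):
    """Steps 1-3: average M iid Galerkin solutions to approximate E[u]."""
    rng = np.random.default_rng(seed)
    h = 1.0 / (n + 1)
    x_mid = (np.arange(n + 1) + 0.5) * h      # element midpoints (samples of a)
    x_int = np.arange(1, n + 1) * h           # interior nodes (samples of f)
    mean_u = np.zeros(n)
    for _ in range(M):
        a = sample_a(x_mid, rng)              # one realization a(omega_j, .)
        f = sample_f(x_int, rng)              # one realization f(omega_j, .)
        mean_u += fem_solve_1d(a, f, h)
    return x_int, mean_u / M
```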
Here we only consider the case where X_h^d ⊆ H^1_0(D) is the same for all realizations, i.e. the spatial triangulation is deterministic.

The second method is based on a finite dimensional approximation of the stochastic coefficients, turning the original stochastic problem into a deterministic parametric elliptic problem. In many problems the source of randomness can be approximated using just a small number of mutually uncorrelated, sometimes mutually independent, random variables; take for example the case of a truncated Karhunen-Loève expansion described previously.
Assumption 1.2. Whenever we apply some numerical method to solve (1.8) we assume that the coefficients used in the computations satisfy

a(ω, x) = a(Y_1(ω), ..., Y_N(ω), x)   and   f(ω, x) = f(Y_1(ω), ..., Y_N(ω), x),      (1.17)

where {Y_j}_{j=1}^N are real random variables with mean value zero and unit variance, are mutually independent, and their images Γ_{i,N} ≡ Y_i(Ω) are bounded intervals in R for i = 1, ..., N. Moreover, we assume that each Y_i has a density function ρ_i : Γ_{i,N} → R^+ for i = 1, ..., N. We use the notation ρ(y) = \prod_{i=1}^N ρ_i(y_i), ∀y ∈ Γ, for the joint probability density of (Y_1, ..., Y_N), and Γ ≡ \prod_{i=1}^N Γ_{i,N} ⊂ R^N for the support of this probability density.

After making assumption (1.17), we have by the Doob-Dynkin lemma, cf. [Øks98], that u, the solution of the stochastic partial differential equation (1.8), can be described by just a finite, hopefully small, number of random variables, i.e. u(ω, x) = u(Y_1(ω), ..., Y_N(ω), x). Now the goal is to approximate the function u(y, x). The stochastic variational formulation (1.14) then has the deterministic equivalent: find u ∈ L^2_ρ(Γ) ⊗ H^1_0(D) such that

\int_Γ ρ(y) \int_D a(y, x) ∇u(y, x) · ∇v(y, x) dx dy = \int_Γ ρ(y) \int_D f(y, x) v(y, x) dx dy,   ∀v ∈ L^2_ρ(Γ) ⊗ H^1_0(D).      (1.18)

A Galerkin finite element method, of either h or p version, then approximates the corresponding deterministic solution, yielding approximations of the desired statistics, i.e. we seek ū_h ∈ Z_k^p ⊗ X_h^d that satisfies

\int_Γ ρ(y) \int_D a(y, x) ∇ū_h(y, x) · ∇χ(y, x) dx dy = \int_Γ ρ(y) \int_D f(y, x) χ(y, x) dx dy,   ∀χ ∈ Z_k^p ⊗ X_h^d.      (1.19)

Here Z_k^p ⊆ L^2_ρ(Γ) is a finite element space that contains tensor products of polynomials of degree less than or equal to p on a mesh of size k. Section 7 explains how to use the tensor product structure to efficiently compute ū_h from (1.19). Finally, Section 8 uses the a priori convergence rates of the different discretizations to compare the computational work required by (1.16) and (1.19) to achieve a given accuracy in the approximation of E[u], i.e. their numerical complexity, offering a way to understand the efficiency of each numerical method.
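The flavor of that comparison can be captured with back-of-the-envelope work models; the rates below are assumptions chosen only to illustrate the trade-off, not results quoted from Paper IV.

```python
def monte_carlo_fem_work(tol, d=2, s=2):
    """Heuristic work for (1.16): M ~ tol^-2 realizations (statistical error
    ~ M^(-1/2)) times one spatial solve of cost ~ h^-d with h ~ tol^(1/s)
    (spatial error ~ h^s, optimal linear solver assumed)."""
    return tol**-2 * tol**(-d / s)

def parametric_fem_work(tol, d=2, s=2, N=4, r=2):
    """Heuristic work for (1.19): spatial cost ~ h^-d as above, times ~ k^-N
    degrees of freedom in y for N random variables, with an assumed
    convergence ~ k^r in the y-discretization, so k ~ tol^(1/r)."""
    return tol**(-d / s) * tol**(-N / r)
```

Comparing the factors tol^{-2} and tol^{-N/r} conveys the intuitive conclusion: the parametric method is preferable when the number of random variables N is small and the dependence on y is smooth, while the Monte Carlo method becomes competitive as N grows.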
Bibliography

[Ale61] V. M. Alekseev. An estimate for the perturbations of the solutions of ordinary differential equations. II. Vestnik Moskov. Univ. Ser. I Mat. Mech., 3:3–10, 1961.

[Bab61] I. Babuška. On randomized solutions of Laplace's equation. Časopis Pěst. Mat., 1961.

[BCa] I. Babuška and J. Chleboun. Effects of uncertainties in the domain on the solution of Dirichlet boundary value problems in two spatial dimensions. To appear in Num. Math.

[BCb] I. Babuška and J. Chleboun. Effects of uncertainties in the domain on the solution of Neumann boundary value problems in two spatial dimensions. To appear in Math. Comp.

[BC02] I. Babuška and P. Chatzipantelidis. On solving linear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Engrg., 191:4093–4122, 2002.

[Bjö94a] T. Björk. Stokastisk kalkyl och kapitalmarknadsteori, del 1: grunderna. Matematiska Institutionen, KTH, Stockholm, 1994.

[Bjö94b] T. Björk. Stokastisk kalkyl och kapitalmarknadsteori, del 2: specialstudier. Matematiska Institutionen, KTH, Stockholm, 1994.

[Bjö98] T. Björk. Arbitrage Theory in Continuous Time. Oxford University Press Inc., New York, 1998.

[BMV83] I. Babuška, A. Miller, and M. Vogelius. Adaptive computational methods for partial differential equations. In Adaptive Computational Methods for Partial Differential Equations (College Park, Md., 1983), pages 57–73, Philadelphia, Pa., 1983.

[Boh98] M. Bohlin. Pricing of options with electricity futures as underlying assets. Master's thesis, Royal Institute of Technology, 1998.
[BR96]
M. Baxter and A. Rennie. Financial Calculus: An Introduction to Derivate Pricing. Cambridge University Press, Cambridge, UK, 1996.
[BS73]
F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy, 81:637–659, 1973.
[BS94]
S. C. Brenner and L. R. Scott. The Mathematical Theory of Finite Element Methods. Springer–Verlag, New York, 1994.
[BT95]
V. Bally and D. Talay. The Euler scheme for stochastic differential equations: Error analysis with Malliavin calculus. Math. Comput. Simulation, 38:35–41, 1995.
[BT96]
V. Bally and D. Talay. The law of the Euler scheme for stochastic differential equations. II. convergence rate of the density. Monte Carlo Methods Appl., 2:93–128, 1996.
[Caf98]
R. E. Caflisch. Monte Carlo and quasi-Monte Carlo methods. In Acta numerica, 1998, pages 1–49. Cambridge Univ. Press, Cambridge, 1998.
[CRR79]
J.C. Cox, S.A. Ross, and M. Rubinstein. Option pricing: A simplified approach. Journal of Financial Economics, 7:229–263, 1979.
[DBO01]
M.-K. Deb, I. M. Babuˇska, and J. T. Oden. Solution of stochastic partial differential equations using Galerkin finite element techniques. Comput. Methods Appl. Mech. Engrg., 190:6359–6372, 2001.
[Deb00]
M.-K. Deb. Solution of stochastic partial differential equations (SPDEs) using Galerkin method: theory and applications. PhD thesis, The University of Texas at Austin, 2000.
[EEHJ95]
K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Introduction to adaptive methods for differential equations. Acta Numerica, pages 105–158, 1995.
[EEHJ96]
K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Computational Differential Equations. Cambridge University Press, 1996.
[EL00]
F. Ericsson and M. Lüttgen. Risk management on the Nordic electricity market. Master's thesis, Stockholm School of Economics, January 2000.
[FSP00]
J-P Fouque, K. R. Sircar, and G. Papanicolaou. Mean reverting stochastic volatility. International Journal of Theoretical and Applied Finance, 3:101–142, 2000.
[GL97]
J.G. Gaines and T.J. Lyons. Variable step size control in the numerical solution of stochastic differential equations. SIAM J. Appl. Math., 57:1455–1484, 1997.
[Gob00]
E. Gobet. Weak approximation of killed diffusion using Euler schemes. Stochastic Processes and their applications, 87:167–197, 2000.
[GRH99]
R. Ghanem and J. Red-Horse. Propagation of probabilistic uncertainty in complex physical systems using a stochastic finite element approach. Phys. D, 133(1-4):137–144, 1999. Predictability: quantifying uncertainty in models of complex phenomena (Los Alamos, NM, 1998).
[Gr¨o67]
W. Gr¨ obner. Die Lie-Reihen und ihre Anwendungen. VEB Deutscher Verlag der Wissenschaften, Berlin, 1967.
[GS91]
R. G. Ghanem and P. D. Spanos. Stochastic finite elements: a spectral approach. Springer-Verlag, New York, 1991.
[HJM90]
D. Heath, R. Jarrow, and A. Morton. Bond pricing and the term structure of interest rates: A discrete time approximation. Journal of Financial and Quantitative Analysis, 25:419–440, 1990.
[HJM92]
D. Heath, R. Jarrow, and A. Morton. Bond pricing and the term structure of interest rates: A new methodology for contingent claims valuation. Econometrica, 60:77–105, 1992.
[HMGR00]
N. Hofmann, T. Müller-Gronbach, and K. Ritter. Optimal approximation of stochastic differential equations by adaptive step-size control. Math. Comp., 69(231):1017–1034, 2000.
[HØUZ96]
H. Holden, B. Øksendal, J. Ubøe, and T. Zhang. Stochastic partial differential equations. Birkhäuser Boston Inc., Boston, MA, 1996. A modeling, white noise functional approach.
[Hul93]
J. Hull. Options, Futures and Other Derivatives. Prentice Hall, Upper Saddle River, NJ, 1993.
[JCM01]
E. Jouini, J. Cvitani´c, and M. Musiela, editors. Option Pricing, Interest Rates and Risk Management. Cambridge University Press, 2001.
[Joh88]
C. Johnson. Error estimates and adaptive time-step control for a class of one-step methods for stiff ordinary differential equations. SIAM J. Numer. Anal., 25:908–926, 1988.
[JS95]
C. Johnson and A. Szepessy. Adaptive finite element methods for conservation laws based on a posteriori error estimates. Comm. Pure Appl. Math., 48:199–234, 1995.
[KH92]
M. Kleiber and T-D Hien. The stochastic finite element method. John Wiley & Sons Ltd., Chichester, 1992. Basic perturbation technique and computer implementation, With a separately available computer disk.
[KP92]
P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Springer-Verlag, Berlin, 1992.
[KS88]
I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus, volume 113 of Graduate Texts in Mathematics. Springer–Verlag, 1988.
[Lar86]
S. Larsen. Numerical analysis of elliptic partial differential equations with stochastic input data. PhD thesis, University of Maryland, MD, 1986.
[L´ev92]
P. Lévy. Processus stochastiques et mouvement Brownien. Éditions Jacques Gabay, Paris, 1992.
[MG00]
T. M¨ uller-Gronbach. The optimal uniform approximation of systems of stochastic differential equations. Preprint, Freie Universit¨at Berlin, 2000.
[Mil78]
G. N. Milstein. A method with second order accuracy for the integration of stochastic differential equations. Teor. Verojatnost. i Primenen., 23(2):414–419, 1978.
[MSTZ00a]
K-S. Moon, A. Szepessy, R. Tempone, and G. Zouraris. Adaptive approximation of differential equations based on global and local errors. Technical Report TRITA-NA-0006, NADA, KTH, Stockholm, Sweden, February 2000.
[MSTZ00b]
K-S. Moon, A. Szepessy, R. Tempone, and G. Zouraris. Stochastic and partial differential equations with adaptive numerics, 2000. Lecture Notes.
[MSTZ02]
K-S. Moon, A. Szepessy, R. Tempone, and G. Zouraris. Convergence rates for adaptive approximation of ordinary differential equations based on global and local errors. 2002.
[Nov88]
E. Novak. Stochastic properties of quadrature formulas. Numer. Math., 53(5):609–620, 1988.
[Nua95]
D. Nualart. The Malliavin Calculus and Related Topics. Probability and its Applications. Springer–Verlag, 1995.
[Øks98]
B. Øksendal. Stochastic differential equations. Springer-Verlag, Berlin, fifth edition, 1998. An introduction with applications.
[Roa98]
P. J. Roache. Verification and Validation in Computational Science and Engineering. Hermosa, Albuquerque, New Mexico, 1998.
[RSN90]
F. Riesz and B. Sz.-Nagy. Functional analysis. Dover Publications Inc., New York, 1990.
[Sob94]
I. M. Sobolʹ. A primer for the Monte Carlo method. CRC Press, Boca Raton, FL, 1994.
[Sob98]
I. M. Sobolʹ. On quasi-Monte Carlo integrations. Math. Comput. Simulation, 47(2-5):103–112, 1998. IMACS Seminar on Monte Carlo Methods (Brussels, 1997).
[SP99]
K. R. Sircar and G. Papanicolaou. Stochastic volatility, smile and asymptotics. Mathematical Finance, 6:107–145, 1999.
[ST02]
C. Schwab and R.-A. Todor. Finite element method for elliptic problems with stochastic data. Technical report, Seminar for Applied Mathematics, ETHZ, 8092 Zurich, Switzerland, April 2002. Research Report 2002-05.
[STZ01]
A. Szepessy, R. Tempone, and G. E. Zouraris. Adaptive weak approximation of stochastic differential equations. Comm. Pure Appl. Math., 54(10):1169–1214, 2001.
[SV69]
D.W. Stroock and S.R.S. Varadhan. Diffusion processes with continuous coefficients, I and II. Comm. Pure Appl. Math., 22:345–400 and 479–530, 1969.
[SV79]
D.W. Stroock and S.R.S. Varadhan. Multidimensional Diffusion Processes. Grundlehren der Mathematischen Wissenschaften 233. Springer–Verlag, 1979.
[Tal95]
D. Talay. Probabilistic Methods in Applied Physics, chapter Simulation of Stochastic Differential Systems, pages 54–96. Number 451 in Lecture notes in physics. Springer, 1995. P. Kr´ee ; W. Wedig Eds.
[The00]
T. G. Theting. Solving Wick-stochastic boundary value problems using a finite element method. Stochastics Stochastics Rep., 70(3-4):241–270, 2000.
[TT90]
D. Talay and L. Tubaro. Expansion of the global error for numerical schemes solving stochastic differential equations. Stochastic Anal. Appl., 8:483–509, 1990.
[TW98]
J. F. Traub and A. G. Werschulz. Complexity and information. Cambridge University Press, Cambridge, 1998.
[V˚ ag98]
G. V˚ age. Variational methods for PDEs applied to stochastic partial differential equations. Math. Scand., 82(1):113–137, 1998.
[WHD95]
P. Wilmott, S. D. Howison, and J. N. Dewynne. The Mathematics of Financial Derivatives: A Student Introduction. Cambridge University Press, 1995.
[Wil98]
P. Wilmott. Derivatives : The Theory and Practice of Financial Engineering. John Wiley & Sons, November 1998.
[WO97]
P. Wilmott and A. Oztukel. Uncertain parameters, an empirical stochastic volatility model and confidence limits. International Journal of Theoretical and Applied Finance, 1(1):175–189, 1997.
[Yag87a]
A. M. Yaglom. Correlation theory of stationary and related random functions. Vol. I. Springer-Verlag, New York, 1987. Basic results.
[Yag87b]
A. M. Yaglom. Correlation theory of stationary and related random functions. Vol. II. Springer-Verlag, New York, 1987. Supplementary notes and references.
Adaptive Weak Approximation of Stochastic Differential Equations

ANDERS SZEPESSY
RAÚL TEMPONE
Kungl. Tekniska Högskolan

AND

GEORGIOS E. ZOURARIS
University of the Aegean

Abstract
Adaptive time-stepping methods based on the Monte Carlo Euler method for weak approximation of Itô stochastic differential equations are developed. The main result is new expansions of the computational error, with computable leading-order term in a posteriori form, based on stochastic flows and discrete dual backward problems. The expansions lead to efficient and accurate computation of error estimates. Adaptive algorithms for either stochastic time steps or deterministic time steps are described. Numerical examples illustrate when stochastic and deterministic adaptive time steps are superior to constant time steps and when adaptive stochastic steps are superior to adaptive deterministic steps. Stochastic time steps use Brownian bridges and require more work for a given number of time steps. Deterministic time steps may yield more time steps but require less work; for example, in the limit of vanishing error tolerance, the ratio of the computational error and its computable estimate tends to 1 with negligible additional work to determine the adaptive deterministic time steps. © 2001 John Wiley & Sons, Inc.
1 Introduction to the Monte Carlo Euler Method
This work develops adaptive methods and proves a posteriori error expansions, with a computable leading-order term, for weak approximations of Itô stochastic differential equations

(1.1)  dX_k(t) = a_k(t, X(t)) dt + Σ_{ℓ=1}^{ℓ₀} b_{kℓ}(t, X(t)) dW^ℓ(t),  k = 1, 2, …, d,  t > 0,

where (X(t; ω)) is a stochastic process in R^d, with randomness generated by the independent Wiener processes W^ℓ(t; ω), ℓ = 1, 2, …, ℓ₀, on R; cf. [18, 26]. The
functions a(t, x) ∈ R^d and b^ℓ(t, x) ∈ R^d, ℓ = 1, 2, …, ℓ₀, are given drift and diffusion fluxes. The goal is to construct approximations to the expected value E[g(X(T))] by a Monte Carlo method for a given function g. Examples of such Monte Carlo simulations are to compute option prices in mathematical finance or to simulate stochastic dynamics; cf. [18, 19, 26].
The Monte Carlo method approximates the unknown process X by the Euler method X̄(t_n) (cf. [19, 20]), which is a time discretization based on the nodes 0 = t₀ < t₁ < t₂ < ⋯ < t_N = T, with

(1.2)  X̄(t_{n+1}) − X̄(t_n) = (t_{n+1} − t_n) a(t_n, X̄(t_n)) + Σ_{ℓ=1}^{ℓ₀} (W^ℓ(t_{n+1}) − W^ℓ(t_n)) b^ℓ(t_n, X̄(t_n)).

The aim is to choose the size of the time steps Δt_n ≡ t_{n+1} − t_n and the number M of independent identically distributed samples X̄(·; ω_j), j = 1, 2, …, M, such that for a given tolerance TOL,

(1.3)  | E[g(X(T))] − (1/M) Σ_{j=1}^{M} g(X̄(T; ω_j)) | ≤ TOL,

with high probability and as few time steps and realizations as possible.
We study two algorithms for determining the time steps Δt_n. In the first, simpler algorithm, the size of the time steps Δt_n may vary in time but is deterministic; i.e., the mesh is fixed for all samples. This is useful for solutions with singularities, or approximate singularities, at deterministic times or for problems with small noise. The second algorithm uses time steps that may vary for different realizations of the solution X̄. These steps are optimal in the sense that the expected number of final steps is minimal for this error density and a given global error. Stochastic time steps are advantageous for problems with singularities at random times. The optimal stochastic steps depend on the whole solution X̄(t), 0 < t < T; in particular, the step Δt(t) at time t depends also on W(τ), τ > t. In stochastic analysis the concept adapted to W means that a process at time t only depends on events generated by {W(s) : s < t}. In numerical analysis, an adaptive method means that the approximate solution is used to control the error, e.g., to determine the time steps. Our stochastic steps are in this sense adaptive nonadapted, since Δt(t) depends on W(τ), τ > t.
For a fixed number of time steps, the algorithm with deterministic steps requires less work than the algorithm with stochastic steps. The number of realizations needed to determine the deterministic time steps is asymptotically at most O(TOL⁻¹), while the number of realizations for the Monte Carlo method to approximate E[g(X(T))] is O(TOL⁻²). Therefore, the additional work to determine
optimal deterministic time steps becomes negligible as the error tolerance tends to zero. Section 4 compares deterministic with stochastic time steps.
The computational error naturally separates into the two parts,

E[g(X(T))] − (1/M) Σ_{j=1}^{M} g(X̄(T; ω_j)) = E[g(X(T)) − g(X̄(T))] + ( E[g(X̄(T))] − (1/M) Σ_{j=1}^{M} g(X̄(T; ω_j)) ) ≡ E_T + E_S.

The time steps for the trajectories X̄ are determined from statistical approximations of the time discretization error E_T. The number of realizations, M, of X̄ is determined from the statistical error E_S. Therefore, the number of realizations can be asymptotically determined by the central limit theorem √M E_S ⇀ χ, where the stochastic variable χ has the normal distribution with mean zero and variance Var[g(X(T))].
Efficient adaptive time-stepping methods, with theoretical basis, use a posteriori error information, since the a priori knowledge usually cannot be as precise as the a posteriori. This work develops adaptive time-stepping methods by proving in Theorems 2.2 and 3.3 error estimates of E_T with leading-order terms in computable a posteriori form. Theorem 2.2 uses deterministic time steps, while Theorem 3.3 also holds for stochastic time steps, which are not adapted.
The main inspiration of Theorems 2.2 and 3.3 is the work by Talay and Tubaro in [32]. They proved that for uniform deterministic time steps Δt = T/N,

(1.4a)  E[g(X(T)) − g(X̄(T))] = (T/N) ∫_0^T E[Ψ(s, X(s))] ds + O(1/N²),

where

(1.4b)  Ψ(t, x) ≡ ½ (a_k a_n ∂_{kn} u)(t, x) + (a_i d_{jk} ∂_{ijk} u)(t, x) + ½ (d_{ij} d_{kn} ∂_{ijkn} u)(t, x) + ½ ∂²u/∂t² (t, x) + (a_i ∂_i ∂u/∂t)(t, x) + (d_{ij} ∂_{ij} ∂u/∂t)(t, x)

is based on the definition of the conditional expectation u(t, x) ≡ E[g(X(T)) | X(t) = x] and the notation

d_{ij} ≡ ½ b_{iℓ} b_{jℓ},  ∂_k ≡ ∂/∂x_k,  ∂_{ki} ≡ ∂²/(∂x_k ∂x_i),  …,

with the summation convention; i.e., if the same subscript appears twice in a term, the term denotes the sum over the range of this subscript, e.g., c_{ik} ∂_k b_j ≡ Σ_{k=1}^{d} c_{ik} ∂_k b_j.
For a derivative ∂_α, the notation |α| is its order. The proof of Talay and Tubaro extends directly to nonuniform deterministic time steps. The difference between the expansion in (2.5) of Theorem 2.2 and (1.4) is in the leading-order terms of the expansion. In (2.5) the leading terms are directly computable known variables and involve computed stochastic flow approximations of only up to second derivatives of u(t, x). On the other hand, the expansion (1.4) is an a priori estimate based on the unknown exact solution X and the unknown derivatives ∂_α u up to fourth order. An adaptive time-stepping algorithm can, of course, use only computed variables. Suppose that X(t) ∈ R^d and that the Jacobians of the fluxes a and b are sparse; i.e., #{ j : ∂_j a_i Δt + ∂_j b_{iℓ} ΔW^ℓ ≠ 0 } is independent of the dimension d. Then these reductions decrease the work to evaluate the error estimate for one realization and one time step to the order O(d²) in Theorem 2.2, compared to the work of order O(d⁴) for evaluating all fourth-order derivatives.
Kloeden and Platen [19] extend the results of Talay and Tubaro on the existence of leading-order error expansions in a priori form, for first- and second-order schemes, to general weak approximations of higher order. Extensions of [32] to approximation of nonsmooth functions g and the probability density of X are studied in [3, 4].
The main new idea here is the efficient use of stochastic flows and dual functions to obtain the error expansion with computable leading-order term in Theorems 2.2 and 3.3, also including nonadapted adaptive time steps. The use of dual functions is standard in optimal control theory and in particular for adaptive mesh control for ordinary and partial differential equations; see [2, 5, 7, 16, 17]. The authors are not aware of other error expansions in a posteriori form or adaptive algorithms for weak approximation of stochastic differential equations. In particular, error estimates with stochastic nonadapted time steps do not seem to have been studied before. Asymptotically optimal adapted adaptive methods for strong approximation of stochastic differential equations are analyzed in [15, 24], which include the hard problem of obtaining lower error bounds for any method based on the number of evaluations of W and require roughly the L²-norm in time of the diffusion max_i d_{ii} to be positive pathwise. The work [11] treats a first study on strong adaptive approximation.
Theorem 2.2 describes the basis of an adaptive algorithm, with deterministic time steps, to estimate the computational error; the time steps are then chosen by

(Δt_n)² |E[ρ(t_n, ω)]| = const,

where ρ(t_n, ω)Δt_n is the function defined by the sum of the terms inside the two square brackets in (2.5), based on the weight functions ϕ and ϕ′ defined in (2.7) and (2.9). Provided the path X̄(t_n), n = 0, 1, …, N, is stored, the leading-order error bound can be evaluated by solving, step by step, the two backward problems (2.7) and (2.9). The backward evolutions (2.7) and (2.9) of the weight functions ϕ and ϕ′ avoid solving for the two variables t and s present in ∂X̄(t)/∂x(s), which appears in the forward t-evolution equation for ∂X̄(t)/∂x(s) in the identity ϕ_i(t_n) = ∂_j g(X̄(T)) ∂X̄_j(T)/∂x_i(t_n); cf. (2.37). A solution with two variables s and t would require work of the order N² for each realization instead of the corresponding work of the order N in Theorems 2.2 and 3.3. Computational aspects of the algorithms are discussed in more detail in Sections 3 and 5.
Section 3 proves in Theorem 3.3 an analogous error expansion

E[g(X(T)) − g(X̄(T))] ≃ E[ Σ_n ρ(t_n, ω)(Δt_n)² ],

which can also be used for stochastic time steps. The leading-order terms of the expansion have less variance compared to the expansion in Theorem 2.2 but use up to the third variation, which requires more computational work per realization. The optimal time steps, which for given error tolerance and error density minimize the expected number of time steps, are then obtained by adapting the step sizes to each individual realization X̄(·, ω) by choosing

(Δt_n)² |ρ(t_n, ω)| = const.

The constant is the same for all realizations. Here the evaluation of a realization of the Wiener process W(·, ω) for the Euler method is constructed by successive Brownian bridges, cf. [18], in order to adaptively increase the resolution of W; see the sketch below. These optimal steps depend on the whole path W.
The focus in the paper is on error estimates for weak convergence of stochastic differential equations. However, the estimates in Theorems 2.2 and 3.3 also work without noise, and they give a contribution to the analysis of a posteriori error estimates and adaptive methods for deterministic ordinary differential equations. A posteriori error estimates for deterministic differential equations are usually based on linearization around the solution path (cf. [5, 8, 14, 16, 17]). Linearization in the classical sense, by directly comparing two different solution trajectories, is not relevant for weak convergence, i.e., convergence in distribution, since the trajectories might not be close. The technique used here is based on the transition probability density and Kolmogorov's backward equation, which was developed in [30, 31] to analyze uniqueness and dependence on initial conditions for weak solutions of stochastic differential equations. The analogous technique for deterministic equations was introduced in [1, 12]. The deterministic restriction of Theorem 2.2 is extended to general higher-order methods for ordinary and partial differential equations in [22, 23].
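The Brownian bridge construction referred to above can be summarized in a short sketch: given W at the endpoints of a step, the value at a new intermediate point is sampled conditionally on the endpoints, so a stochastic mesh can be refined without discarding increments that were already generated. The following Python fragment is illustrative only; the function name and the midpoint-halving refinement are assumptions, not the paper's implementation.

```python
import numpy as np

def refine_with_brownian_bridge(t, W, rng):
    """Insert a midpoint in every interval of the grid t, sampling the new
    values of the Wiener process W conditionally on its known values.
    For t_l < s < t_r, W(s) given W(t_l), W(t_r) is normal with mean the
    linear interpolant and variance (s - t_l)(t_r - s)/(t_r - t_l)."""
    t_new, W_new = [t[0]], [W[0]]
    for n in range(len(t) - 1):
        tl, tr = t[n], t[n + 1]
        s = 0.5 * (tl + tr)
        mean = W[n] + (s - tl) / (tr - tl) * (W[n + 1] - W[n])
        var = (s - tl) * (tr - s) / (tr - tl)
        W_mid = mean + np.sqrt(var) * rng.standard_normal()
        t_new += [s, tr]
        W_new += [W_mid, W[n + 1]]
    return np.array(t_new), np.array(W_new)

# usage: start from a coarse sample of W and refine it twice
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 5)
W = np.concatenate([[0.0], np.cumsum(np.sqrt(np.diff(t)) * rng.standard_normal(4))])
for _ in range(2):
    t, W = refine_with_brownian_bridge(t, W, rng)
print(len(t), "grid points after refinement")
```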
Section 2 proves error estimates for deterministic time steps. Section 3 derives an alternative error expansion with less statistical error but more computational work, which also works for stochastic time steps. Section 4 compares deterministic step size control with stochastic time steps, which can be better when the noise is not small. Section 5 presents implementations of adaptive algorithms and numerical experiments.
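Before turning to the error expansions, the following minimal Python sketch illustrates the plain Monte Carlo Euler method of (1.2)-(1.3) for a scalar example; the drift, diffusion, and sample sizes are illustrative choices, not taken from the paper.

```python
import numpy as np

def euler_paths(a, b, x0, T, N, M, rng):
    """Forward Euler (1.2) on a uniform grid with N steps for M realizations."""
    dt = T / N
    X = np.full(M, x0, dtype=float)
    for n in range(N):
        t = n * dt
        dW = np.sqrt(dt) * rng.standard_normal(M)   # W(t_{n+1}) - W(t_n)
        X = X + a(t, X) * dt + b(t, X) * dW
    return X

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = lambda t, x: 0.05 * x          # drift flux
    b = lambda t, x: 0.2 * x           # diffusion flux
    XT = euler_paths(a, b, x0=1.0, T=1.0, N=100, M=100_000, rng=rng)
    g = XT                              # g(x) = x, so E[g(X(T))] = exp(0.05) here
    estimate = g.mean()
    stat_err = 1.96 * g.std(ddof=1) / np.sqrt(len(g))   # CLT bound on E_S
    print(f"E[g(X(T))] ~ {estimate:.4f} +/- {stat_err:.4f} (exact {np.exp(0.05):.4f})")
```

The statistical part of the error is controlled by the reported confidence half-width; controlling the time discretization part E_T is the subject of the a posteriori expansions below.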
2 An Error Estimate with Deterministic Time Steps
This section first proves an error expansion for deterministic time steps in Theorem 2.2. The starting point for the analysis is Lemma 2.1. It uses the fact that the Euler method can be extended, for theoretical purposes only, to t ∈ [t_n, t_{n+1}) by

(2.1)  X̄(t) − X̄(t_n) = ∫_{t_n}^{t} ā(s; X̄) ds + Σ_{ℓ=1}^{ℓ₀} ∫_{t_n}^{t} b̄^ℓ(s; X̄) dW^ℓ(s),

where ā and b̄^ℓ are the piecewise constant approximations

(2.2)  ā(s; X̄) = a(t_n, X̄(t_n)) and b̄^ℓ(s; X̄) = b^ℓ(t_n, X̄(t_n)) for s ∈ [t_n, t_{n+1}).

The dependence on the noise ω_j is often omitted in the notation. We also use the space C^{m₀}([0, T] × R^d) of functions with bounded continuous derivatives of order up to m₀ in the supremum norm on [0, T] × R^d, and the space C^{m₀}_loc([0, T] × R^d) of functions with continuous derivatives of order up to m₀ which have bounded maximum norm on each compact set in [0, T] × R^d. The space C^{m₁,m₂}_loc([0, T] × R^d) denotes the subset of C^{m₁+m₂}_loc([0, T] × R^d) with time derivatives of up to order m₁ and space derivatives of up to order m₂.

LEMMA 2.1  Suppose that, for some m₀ > [d/2] + 10, there are positive constants k and C such that

g ∈ C^{m₀}_loc(R^d),  |∂_α g(x)| ≤ C(1 + |x|^k) for all |α| ≤ m₀,
E[ |X(0)|^{2k+d+1} + |X̄(0)|^{2k+d+1} ] ≤ C,
and a and b are bounded in C^{m₀}([0, T] × R^d).
Then the solution X of the Itô differential equation (1.1) and its Euler approximation X̄ in (1.2) and (2.1), based on deterministic time steps, satisfy

(2.3)  E[g(X(T)) − g(X̄(T))] = ∫_0^T E[(a_k(t, X̄(t)) − ā_k(t; X̄)) ∂_k u(t, X̄(t))] dt
        + ∫_0^T E[(d_{ij}(t, X̄(t)) − d̄_{ij}(t; X̄)) ∂_{ij} u(t, X̄(t))] dt
        + E[u(0, X(0)) − u(0, X̄(0))],

where

d_{ij} ≡ ½ b_{iℓ} b_{jℓ},  d̄_{ij} ≡ ½ b̄_{iℓ} b̄_{jℓ},

and

(2.4)  u(t, x) = E[g(X(T)) | X(t) = x].

Lemma 2.1 is combined with stochastic flows to derive the a posteriori error expansion in Theorem 2.2. Theorem 2.2 is based on the variations of the processes X and X̄. For a process X, the first variation of a function F(X(T)) with respect to a perturbation in the initial location of the path X at time s is denoted by

F′(T; s) = ∂_{x(s)} F(X(T)) ≡ ( ∂/∂x₁ F(X(T); X(s) = x), …, ∂/∂x_d F(X(T); X(s) = x) ).

The proof of Theorem 2.2 uses mainly that the error in replacing g(X(T)) in Lemma 2.1 by g(X̄(T)) in the representation (2.4) of ∂_α u yields the small deterministic remainder term ∫_0^T O((Δt)²) dt in (2.5) of Theorem 2.2, which is analogous to the O(1/N²) term in (1.4), and needs some a priori estimate to be controlled. Lemma 2.1 can be applied to estimate this error. The second important ingredient in the proof is the Markov property of X̄. Let the standard σ-algebra generated by {W(s) : s ≤ t_n} be F_{t_n}. Then the nested expected values E[ a_j(t, X̄(t)) ∂_{x_j(t)} E[g(X̄(T)) | F_t] ] in (2.3) can, by definition (2.7) of ϕ, be decoupled to E[ a_j(t, X̄(t)) ϕ_j(t) ], which reduces the computational complexity substantially; see Lemma 2.5.

THEOREM 2.2  Suppose that a, b, g, X, and X̄ satisfy the assumptions in Lemma 2.1 and that the initial data X(0) and its approximation X̄(0) have the same
distribution. Then the time discretization error has the expansion

(2.5)  E[g(X(T)) − g(X̄(T))] =
        Σ_{n=0}^{N−1} Σ_{j=1}^{M} [a_i(t_{n+1}, X̄(t_{n+1}; ω_j)) − a_i(t_n, X̄(t_n; ω_j))] ϕ_i(t_{n+1}; ω_j) Δt_n/(2M)
      + Σ_{n=0}^{N−1} Σ_{j=1}^{M} [d_{ik}(t_{n+1}, X̄(t_{n+1}; ω_j)) − d_{ik}(t_n, X̄(t_n; ω_j))] ϕ′_{ik}(t_{n+1}; ω_j) Δt_n/(2M)
      + Σ_{n=0}^{N−1} (Δt_n)² ( O(Δt_n) + Σ_{m=n}^{N−1} O((Δt_m)²) ) + ∫_0^T (I_M + II_M) dt,

where the two leading-order terms are in computable a posteriori form and based on the discrete dual functions ϕ(t_n) ∈ R^d and ϕ′(t_n) ∈ R^{d×d}, which are determined as follows: For all t_n and x ∈ R^d, let ΔW_n ≡ W(t_{n+1}) − W(t_n) and

(2.6)  c_i(t_n, x) ≡ x_i + Δt_n a_i(t_n, x) + ΔW_n^ℓ b_{iℓ}(t_n, x);

then the function ϕ is defined by the dual backward problem

(2.7)  ϕ_i(t_n) = ∂_i c_j(t_n, X̄(t_n)) ϕ_j(t_{n+1}),  t_n < T,
        ϕ_i(T) = ∂_i g(X̄(T)),

and its first variation

(2.8)  ϕ′_{ik}(t_n; ω) = ∂_{x_k(t_n)} ϕ_i(t_n; ω) ≡ ∂ϕ_i(t_n; X̄(t_n) = x)/∂x_k

satisfies for t_n < T

(2.9)  ϕ′_{ik}(t_n) = ∂_i c_j(t_n, X̄(t_n)) ∂_k c_p(t_n, X̄(t_n)) ϕ′_{jp}(t_{n+1}) + ∂_{ik} c_j(t_n, X̄(t_n)) ϕ_j(t_{n+1}),
        ϕ′_{ik}(T) = ∂_{ik} g(X̄(T)).

The distributions of the statistical errors √M I_M and √M II_M tend to the normal distributions with mean zero and variance

(2.10)  Var[(a_i(t_{n+1}, X̄(t_{n+1})) − a_i(t_n, X̄(t_n))) ϕ_i(t_{n+1})] = O(Δt_n)

and

(2.11)  Var[(d_{ik}(t_{n+1}, X̄(t_{n+1})) − d_{ik}(t_n, X̄(t_n))) ϕ′_{ik}(t_{n+1}; t_{n+1})] = O(Δt_n),

respectively.
PROOF OF LEMMA 2.1: A standard energy estimate, using the regularity assumptions on a, b, and g, can be combined with the Sobolev inequality to show that there exists a unique solution u ∈ C^{1,6}_loc([0, T] × R^d) of the Kolmogorov backward equation

(2.12)  Lu ≡ −∂u/∂t − a_k ∂_k u − ½ b_{kℓ} b_{nℓ} ∂_{kn} u = 0,  u(T, ·) = g,

satisfying the polynomial growth condition

max_{0 ≤ t ≤ T} |∂_α u(t, x)| ≤ C(1 + |x|^{k+(d+1)/2}),  |α| ≤ 6,

for some positive k and C; cf. [10]. The Feynman-Kac formula without potential (cf. [18]) implies that u can be represented by the expected value

(2.13)  u(t, x) = E[g(X(T)) | X(t) = x].

The Itô formula

du(t, X̄(t)) = ( ∂u/∂t (t, X̄(t)) + ā_i(t; X̄) ∂_i u(t, X̄(t)) + d̄_{ij} ∂_{ij} u(t, X̄(t)) ) dt + b̄_{iℓ}(t; X̄) ∂_i u(t, X̄(t)) dW^ℓ(t),

combined with the Kolmogorov equation (2.12) to eliminate ∂u/∂t, yields

(2.14)  u(0, X̄(0)) − g(X̄(T)) = ∫_0^T (a_i(t, X̄(t)) − ā_i(t; X̄)) ∂_i u(t, X̄(t)) dt
        + ∫_0^T (d_{ij}(t, X̄(t)) − d̄_{ij}(t; X̄)) ∂_{ij} u(t, X̄(t)) dt
        + ∫_0^T b̄_{iℓ}(t; X̄) ∂_i u(t, X̄(t)) dW^ℓ(t).

The expected value of the last integral is zero, by the martingale property of Itô integrals, and the construction of u in (2.13) shows that E[u(0, X(0))] = E[g(X(T))]. Therefore, the expected value of equality (2.14) implies the result (2.3).
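The representation (2.13) can be checked numerically in a simple case. For geometric Brownian motion dX = aX dt + bX dW with g(x) = x², the Kolmogorov backward equation has the explicit solution u(t, x) = x² exp((2a + b²)(T − t)), and a Monte Carlo average of g(X(T)) started from X(t) = x should reproduce it. The sketch below uses the exact GBM update; all parameter values are chosen only for illustration.

```python
import numpy as np

aa, bb, T = 0.1, 0.4, 1.0
u_exact = lambda t, x: x**2 * np.exp((2*aa + bb**2) * (T - t))   # solves (2.12) for g(x) = x^2

def u_monte_carlo(t, x, M=200_000, seed=0):
    """Estimate u(t,x) = E[g(X(T)) | X(t) = x] as in (2.13), sampling X(T) exactly."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(M)
    XT = x * np.exp((aa - 0.5*bb**2) * (T - t) + bb * np.sqrt(T - t) * Z)
    return np.mean(XT**2)

print("exact  :", u_exact(0.5, 1.2))
print("sampled:", u_monte_carlo(0.5, 1.2))
```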
PROOF OF THEOREM 2.2: The purpose of Theorem 2.2 is to replace the estimate in Lemma 2.1 by an expansion with computable leading-order term. The expected values ∂_α u(t, x) = ∂_α E[g(X(T)) | X(t) = x] will be replaced by the corresponding derivatives of ū(t, x) ≡ E[g(X̄(T)) | X̄(t) = x], based on the known approximate solution X̄. This proof is divided into the three steps of Lemmas 2.3 through 2.5: the first lemma estimates the quadrature error; the second lemma shows the error in replacing ∂_α u by ∂_α ū; and the third lemma derives the representation of ∂_α ū by the dual functions ϕ and ϕ′ solving the backward evolution problems (2.7) and (2.9). The lemmas are prepared by first deriving the
representation of ∂α u and ∂α u¯ in terms of expected values of stochastic flows, i.e., first and higher variations of X and X , respectively. In the theorem, the derivatives ∂α u¯ are computed by means of the discrete dual functions ϕ and ϕ 0 , defined in (2.6)–(2.9). In this first step, ∂α u and its approximation ∂α u¯ are evaluated by expected values of stochastic flows of X and X , respectively. Lemma 2.3 then relates the duals and the stochastic flows, which have the following equations. Recall the definition of the variation of a process Y : The first variation of a function F(Y (T )) with respect to a perturbation in the initial location of the path Y at time s is denoted by (2.15)
F 0 (T ; s) = ∂ y(s) F(Y (T )) ∂ ∂ F(Y (T ); Y (s) = y), . . . , F(Y (T ); Y (s) = y) . ≡ ∂ y1 ∂ yd
( 0 , i 6= k , δik ≡ 1, i = k . The equations for the first variation, Let
(2.16)
d X i0 j = ∂k ai (s, X (s))X k0 j (s)ds + ∂k bi` (s, X (s))X k0 j (s)dW ` (s), X i0 j (t)
s >t,
= δi j ,
the second variation,
d X i00jn = ∂k ai (s, X (s))X k00jn (s) + ∂kr ai (s, X (s))X k0 j (s)X r0 n (s) ds + ∂k bi` (s, X (s))X k00jn (s)
+ ∂kr bi` (s, X (s))X k0 j (s)X r0 n (s) dW ` (s) ,
(2.17) X i00jn (t)
s >t,
= 0,
the third variation, for s > t, d X i000jnm = ∂k ai (s, X (s))X k000jnm (s) + ∂kr ai (s, X (s))X k0 j (s)X r00nm (s)
(2.18)
0 0 + ∂kr ai (s, X (s))X kn (s)X r00jm (s) + ∂kr ai (s, X (s))X km (s)X r00jn (s) 0 + ∂kr v ai (s, X (s))X k0 j (s)X r0 n (s)X vm (s) ds
+ ∂k bi` (s, X (s))X k000jnm (s) + ∂kr bi` (s, X (s))X k0 j (s)X r00nm (s) 0 0 + ∂kr bi` (s, X (s))X kn (s)X r00jm (s) + ∂kr bi` (s, X (s))X km (s)X r00jn (s) 0 + ∂kr v bi` (s, X (s))X k0 j (s)X r0 n (s)X vm (s) dW ` (s) ,
X i000jnm (t) = 0 ,
and similarly the fourth variation, (2.19)
d X i0000jnmp = ∂ p (right-hand side of (2.18)) , X i0000jnmp (t) = 0 ,
imply the representation with expected values of stochastic flows, cf. [28, 29], 0 (2.20) (T ) | X i0 j (t) = δi j , X (t) = x , ∂k u(t, x) = E ∂i g(X (T ))X ik 00 ∂kn u(t, x) = E ∂i g(X (T ))X ikn (2.21) (T ) 0 + ∂ir g(X (T ))X ik (T )X r0 n (T ) | 00 X ikn (t) = 0, X i0 j (t) = δi j , X (t) = x
(2.22)
000 0 ∂knm u(t, x) = E ∂i g(X (T ))X iknm (T ) + ∂ir g(X (T ))X ik (T )X r00nm (T ) 0 + ∂ir g(X (T ))X in (T )X r00km (T ) 0 (T )X r00kn (T ) + ∂ir g(X (T ))X im 0 0 (T )X r0 n (T )X vm (T ) | + ∂ir v g(X (T ))X ik
000 00 X iknm (t) = X ikn (t) = 0, X i0 j (t) = δi j , X (t) = x ,
and (2.23)
∂knmp u(t, x) = ∂ p (right-hand side of (2.22)) .
Let Y = (X, X 0 , X 00 , X 000 , X 0000 )T and let I denote the d × d identity matrix. Then the system (2.16)–(2.19) can be written (2.24) dY = A(t, Y )dt + B ` (t, Y )d W ` (t) , t > t0 ,
Y (t0 ) = (x, I, 0, 0, 0)T .
Furthermore, rewrite the representation (2.20)–(2.23) as 0 f i (Y ) ≡ ∂k g(X )X ki , 00 0 0 f i j (Y ) ≡ ∂k g(X )X ki j + ∂kn g(X )X ki X n j ,
(2.25)
000 0 00 f i jm (Y ) ≡ ∂k g(X )X ki jm + ∂kn g(X )X ki X n jm 00 0 + ∂kn g(X )X k0 j X nim + ∂kn g(X )X km X n00ji 0 0 + ∂knv g(X )X ki X n0 j X vm , f i jmn (Y ) ≡ ∂n (right-hand side of f i jm (Y ) in (2.25)) .
The Euler approximation Y = (Y 0 , Y 1 , Y 2 , Y 3 , Y 4 )T of Y in (2.24) with piecewise constant drift and diffusion fluxes (2.26)
dY = A(t; Y )dt + B ` (t; Y )dW ` (t) ,
defined as in (2.1), yields the following error of the expected value: E f α (Y (T )) − f α (Y (T )) | Y (t) = Y¯ (t) = (x, I, 0, 0, 0)T . An important consequence of the Euler method is that the first variation of Y 0 ≡ X is in fact equal to the function Y 1 in Y . The expected value has, by Lemma 2.1, the
estimate
E f α (Y (T )) − f α (Y (T )) | Y (t) = Y (t) = (x, I, 0, 0, 0)T = Z T E (A − A)k ∂k ν α (s, Y (s)) | Y (t) = (x, I, 0, 0, 0)T ds t Z T + E (D − D)kn ∂kn ν α (s, Y (s)) | Y (t) = (x, I, 0, 0, 0)T ds
(2.27)
t
to be used in Lemmas 2.3 through 2.5, where Dkn = 12 Bk` Bn` ,
D kn = 12 B `k B `n ,
and
∂ α ν − Ak ∂k ν α − Dkn ∂kn ν α = 0 , t < T , ∂t The next steps in the proof are the three lemmas. −
ν α (T, ·) = f α .
L EMMA 2.3 Suppose that the assumptions in Lemma 2.1 hold. Then the quadrature error satisfies Z tn+1 E ai (t, X (t)) − a i (t; X ) ∂i u(t, X (t)) dt tn
and Z
1tn − E ai (tn+1 , X (tn+1 )) − ai (tn , X (tn )) ∂i u(tn+1 , X (tn+1 )) = 2 O (1tn )3
tn+1
tn
E di j (t, X (t)) − d i j (t; X ) ∂i j u(t, X (t)) dt
1tn = − E di j (tn+1 , X (tn+1 )) − di j (tn , X (tn )) ∂i j u(tn+1 , X (tn+1 )) 2 O (1tn )3 .
P ROOF : Introduce the notation f (t, X (t)) ≡ di j (t, X (t)) − d i j (t; X ) ∂i j u(t, X (t)) , t − tn di j (tn+1 , X (tn+1 )) − d i j (tn ; X ) ∂i j u(tn+1 , X (tn+1 )) . f¯(t) ≡ 1tn Then the quadrature error satisfies Z tn+1 E[ f (t, X (t)) − f¯(t)]dt = tn
Z
tn+1 tn
E di j (t, X (t)) − d i j (t; X )) ∂i j u(t, X (t)) dt 1tn − E di j (tn+1 , X (tn+1 )) − d i j (tn ; X ) ∂i j u(tn+1 , X (tn+1 )) , 2
and E[ f¯(t)] is the linear nodal projection of the smooth function E[ f (t, X (t))] in the interval [tn , tn+1 ]. Therefore, a standard interpolation estimate yields Z tn+1 Z tn+1 2 d 1 2 dt . E f (t, X (t)) − f¯(t) dt ≤ (1tn ) E[ f (t, X (t))] (2.28) 8 dt 2 tn tn Itô’s formula shows that d E[ f (t, X (t))] = E[L f (t, X (t))] ≤ C , dt (2.29) d2 E[ f (t, X (t))] = E[L 2 f (t, X (t))] ≤ C , dt 2 where ∂ 1 Lf ≡ f + a i ∂i f + d i j ∂i j f , ∂t 2 which combined with (2.28) proves the estimate of the diffusion terms in the lemma. The estimate of the drift terms follows analogously. L EMMA 2.4 Let the standard σ -algebra generated by {W (s) : s ≤ tn } be Ftn , and let the piecewise constant mesh function 1t be defined by 1t (τ ) ≡ 1tn
for τ ∈ [tn , tn+1 ) and n = 0, 1, . . . , N − 1 .
Suppose that the assumptions in Lemma 2.1 hold. Then the discretization errors of the stochastic flows satisfy, for tn ≤ t < tn+1 , Z T (2.30) ∂α (u − u)(t, ¯ X (t)) = O(1t (s))ds, |α| ≤ 4 , t
and (2.31)
(2.32)
E ai (t, X (t)) − a i (tn ; X ) ∂i u(t, X (t)) − ∂i u(t, ¯ X (t)) = Z T O(1t (s))ds , 1tn t E di j (t, X (t)) − d i j (tn ; X ) ∂i j u(t, X (t)) − ∂i j u(t, ¯ X (t)) = Z T O(1t (s))ds . 1tn t
P ROOF : Lemma 2.1 and the representation of ∂α u and ∂α u¯ by the stochastic flows (2.20)–(2.21) and (2.25)–(2.26) in (2.27) show that, for |α| ≤ 4, ¯ X (t)) = (2.33) ∂α (u − u)(t, Z T E (Ai − Ai )∂i ν α (s, Y (s)) + (Di j − D i j )∂i j ν α (s, Y (s)) | Ft ds . t
Introduce the notation f (s, Y (s)) ≡ (Ai − Ai )∂i ν α (s, Y (s)) + (Di j − D i j )∂i j ν α (s, Y (s)) ,
and let, for tm ≤ s < tm+1 ,
L Y w(s, Y (s)) ≡
∂ w + An ∂n w + D kn ∂kn w ∂t
s, Y (s) .
Then Itô’s formula implies, as in (2.29), that Z s E f (s, Y (s)) | Ft = E[L Y f | Ft ](τ )dτ = O(1tm ) ,
tm ≤ s < tm+1 ,
tm
which combined with (2.33) proves (2.30). The estimate (2.32) follows similarly by defining f˜(t, X (t)) ≡ (di j − d i j )∂i j (u − u)(t, ¯ X (t)) . Then the Itô’s formula shows as in (2.28) and (2.29) Z t ˜ E[L f˜](s)ds , tn ≤ t < tn+1 . E[ f (t, X (t))] = tn
The final step to prove (2.32) is to establish Z T ˜ E[L f ](s) = O(1t (τ ))dτ . s
The function L f˜ splits into the two types of terms f 1 ≡ (d − d)v and f 2 ≡ v∂α (u − u), ¯ with smooth functions v of (s, X (s)). The Itô formula again shows that Z s E[ f 1 ](s) =
E[L f 1 ](τ )dτ = O(1tm ) ,
tm ≤ s < tm+1 .
tm
Moreover, (2.30) implies
Z E[ f 2 ](s) =
T
O(1t (τ ))dτ ,
s
and consequently (2.32) holds. The estimate (2.31) of the drift terms follows analogously. L EMMA 2.5 Suppose that the assumptions in Lemma 2.1 hold. Then the dual functions ϕ and ϕ 0 , defined by (2.6)–(2.9), satisfy (2.34)
∂i u(t ¯ n , X (tn )) = E[ϕi (tn ) | Ftn ] ,
(2.35)
∂i j u(t ¯ n , X (tn )) = E[ϕi0 j (tn ) | Ftn ] ,
and for t = tn+1
(2.36)
E ai (t, X (t)) − a i (tn ; X ) E[ϕi (t) | Ft ] = E ai (t, X (t)) − a i (tn ; X )) ϕi (t) , E di j (t, X (t)) − d i j (tn ; X ) E[ϕi0 j (t)| Ft ] = E di j (t, X (t)) − d i j (tn ; X )) ϕi0 j (t) .
P ROOF : Equations (2.1), (2.16), and (2.20) show that the first variation of the Euler approximation X is in fact equal to the Euler approximation of the first variation X 0 and consequently 0 ¯ X (t)) = E ∂ j g(X (T ))X ji (T ; t) | Ft , ∂i u(t, 0
where X ji (s; t), s > t, is the Euler approximation (2.26) of X 0 with initial data 0
X ji (t; t) = δ ji . Let ϕ be the solution of (2.7). Then (2.26) implies 0=
N −1 X
0 ϕi (tn ) − ∂i c j tn , X (tn ) ϕ j (tn+1 ) X ik (tn ; tm )
n=m
=
N −1 X
0 0 ϕi (tn+1 ) X ik (tn+1 ; tm ) − ∂ j ci tn , X (tn ) X jk (tn ; tm )
n=m 0
0
+ ϕi (tm )X ik (tm ; tm ) − ϕi (T )X ik (T ; tm ) 0
0
= ϕi (tm )X ik (tm ; tm ) − ϕi (T )X ik (T ; tm ) . Therefore, the initial conditions of (2.7) and (2.26), in (2.24), yield (2.37)
0 (T ; tm ) , ϕk (tm ) = ∂i g(X (T ))X ik
which proves (2.34). Equality (2.37) implies that
∂i j u(t ¯ n , X (tn )) = E ∂x j (tn ) ϕi0 (tn ) | Ftn .
The next step is to verify that the first variation of ϕ, ∂ϕi (tn ; X (tn ) = x) , ∂ xj satisfies the backward recursive equation (2.9). First, differentiate equation (2.7) for ϕ to obtain (2.38)
(2.39)
ϕi0 j (tn ) ≡ ∂x j (tn ) ϕi (tn ) ≡
0 ϕik (tn ) = ∂i c j (tn , X (tn ))∂xk (tn ) ϕ j (tn+1 ) + ∂k ∂i c j (tn , X (tn ))ϕ j (tn+1 ) , tn < T , 0 ϕik (T ) = ∂ik g(X (T )) .
Then the linear backward equation (2.7) shows that ϕ(tn+1 ) depends only on the point values {X (tm ) : tn+1 ≤ tm ≤ T } so that (2.40)
∂xk (tn ) ϕ j (tn+1 ) = ∂x p (tn+1 ) ϕ j (tn+1 )∂xk (tn ) X p (tn+1 ) .
Finally, the definitions of X and c in (2.1) and (2.6) imply (2.41)
∂k c p (tn , X (tn )) = ∂xk (tn ) X p (tn+1 ) ,
which combined with (2.39) and (2.40) prove that ϕ 0 satisfies recursive equation (2.9). The measurability of (ai (tn+1 , X (tn+1 )) − a i (tn , X (tn ))) ∈ Ftn+1 proves for t = tn+1 that E ai (t, X (t)) − a i (tn , X (tn )) E[ϕi (t) | Ft ] = E E ai (t, X (t)) − a i (tn ; X ) ϕi (t) | Ft = E ai (t, X (t)) − a i (tn ; X ) ϕi (t) , and a similar argument for the diffusion term proves (2.36).
The proof of Theorem 2.2 is now concluded by combining Lemmas 2.3–2.5 and the central limit theorem to estimate I_M and II_M. This section ends with three remarks on higher-order methods, the variance in the error bound, and general expected values.

Remark 2.6 (Higher-Order Methods). Talay and Tubaro point out in [32] that expansion (1.4) justifies Romberg extrapolation to obtain second-order accuracy provided the statistical error is sufficiently small. By Theorems 2.2 and 3.3, Romberg extrapolation also works for variable time steps if every time step is halved. However, the error density for this second-order method is not known in the computation and therefore a choice of time steps would be suboptimal. Let us compare the computational work for ordinary and stochastic differential equations approximated by pth-order accurate methods. Assume that the solution of a deterministic ordinary differential equation is smooth and bounded, so that to approximate it with accuracy TOL requires the work O(p TOL^(−1/p)) for a pth-order method. To approximate the expected value E[g(X(T))] with accuracy TOL by a pth-order method requires the additional work of O(TOL⁻²) realizations; therefore the total work for the Monte Carlo method is O(p TOL^(−2−1/p)). With these assumptions, the optimal choice of the order is p = log(TOL⁻¹) for both deterministic and stochastic problems. However, if we compare, for fixed work, the gain in accuracy from using high-order methods instead of first-order, the result is as follows: for a deterministic differential equation, a first-order method with error TOL requires the same work as the optimal higher-order method with error O(e^(−TOL⁻¹)), while for a stochastic differential equation a first-order method with error TOL requires the same work as the optimal higher-order method with error slightly larger than TOL^(3/2). Hence the maximal gain in accuracy from using higher-order methods is much less in the stochastic case and in a sense comparable with the improvement of using a third-order instead of a second-order accurate method in the deterministic case, provided Var[g(X(T))] ≫ TOL². The opposite case with negligible noise, satisfying Var[g(X(T))] ≤ TOL², behaves like the deterministic case. The work [21] assumes small noise to construct new, simplified higher-order methods for weak approximations of stochastic differential equations.
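The work comparison in Remark 2.6 is easy to reproduce numerically. The sketch below evaluates the work models quoted above, W_ODE(p) = p·TOL^(−1/p) and W_MC(p) = p·TOL^(−2−1/p), over a range of orders p and reports the minimizing order, which should be close to log(TOL⁻¹); it is a plain illustration of the remark, not code from the paper.

```python
import numpy as np

def optimal_order(tol, work):
    """Return the integer order p in 1..50 minimizing the given work model."""
    p = np.arange(1, 51)
    return p[np.argmin(work(p, tol))]

w_ode = lambda p, tol: p * tol ** (-1.0 / p)           # deterministic ODE work model
w_mc  = lambda p, tol: p * tol ** (-2.0 - 1.0 / p)      # Monte Carlo SDE work model

for tol in [1e-2, 1e-4, 1e-6]:
    print(f"TOL={tol:g}: p_ODE={optimal_order(tol, w_ode)}, "
          f"p_MC={optimal_order(tol, w_mc)}, log(1/TOL)={np.log(1.0/tol):.1f}")
```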
Remark 2.7 (Variance of the Error Bound). The number of realizations to determine a reliable error estimate is in general much smaller than the number of realizations to approximate E[g(X(T))], which is proportional to TOL⁻². Let us now study how many realizations M are needed for the error estimate. Write the sum of the leading-order term in (2.5), i.e., the sum of the terms inside the two square brackets, as

(2.42)  ξ ≡ Σ_{n=0}^{N−1} Σ_{j=1}^{M} (ρ_n(ω_j)/M) Δt_n²,

where M is the number of realizations used to evaluate the error estimate. The stochastic variable ξ has mean ∫_0^T O(Δt) dt and, by (2.10) and (2.11), variance ∫_0^T O(Δt) dt / M. A useful error estimate therefore requires

√( ∫_0^T O(Δt) dt / M ) ≪ TOL,

and consequently

(2.43)  M ≫ ∫_0^T O(Δt) dt / TOL² = O(TOL⁻¹).

Estimates (2.10) and (2.11) imply that a good approximation of the density based directly on Σ_j ρ_n(ω_j)/M would require M ∼ O((ε² Δt)⁻¹), with statistical error ε, which can be more demanding than (2.43) for a nonuniform mesh. A remedy for this is to use approximate ergodicity and local averages in time to obtain a statistically good approximation of the density with M based on (2.43); see Section 5 and the appendix. In Theorem 3.3 an alternative error expansion with leading-order term

Σ_{n=0}^{N−1} Σ_{j=1}^{M} (ρ̃_{jn}/M) Δt_n²

is given, which has the same mean but a smaller variance ( ∫_0^T O(Δt) dt )² / M. A consequence for deterministic time steps is that M = O(1) is enough in this case. However, the estimate in Theorem 3.3 requires more computational work per realization.

Remark 2.8 (More General Expected Values). Suppose that h : [0, T] × R^d → R is sufficiently smooth. Then the error estimates in Lemma 2.1 and Theorems 2.2 and 3.3 include estimates of expected values of the form

E[ ∫_0^T h(t, X(t)) dt + g(X(T)) ]
by introducing the additional variable X_{d+1} and the equation dX_{d+1} = h(t, X(t)) dt to (1.1). By eliminating the additional variables in X̄ and ϕ, (2.7) is replaced by

ϕ_i(t_n) = ∂_i c_j(t_n, X̄(t_n)) ϕ_j(t_{n+1}) + ∂_i h(t_n, X̄(t_n)) Δt_n,  t_n < T,
ϕ_i(T) = ∂_i g(X̄(T)),

and (2.9) by

ϕ′_{ik}(t_n) = ∂_i c_j(t_n, X̄(t_n)) ∂_k c_p(t_n, X̄(t_n)) ϕ′_{jp}(t_{n+1}) + ∂_{ik} c_j(t_n, X̄(t_n)) ϕ_j(t_{n+1}) + ∂_{ik} h(t_n, X̄(t_n)) Δt_n,  t_n < T,
ϕ′_{ik}(T) = ∂_{ik} g(X̄(T)).

3 An Error Estimate with Stochastic Time Steps
This section derives error estimates with time steps that are stochastic and determined individually for each realization by the whole solution path X̄. The analysis will use the Malliavin derivative, ∂_{W(t)} Y, which is the first variation of a process Y with respect to a perturbation dW(t) at time t of the Wiener process; cf. [25]. The Malliavin derivative for a stochastic integral X is related to the first variation ∂_{x(t)} X for a perturbation of the position at time t by

(3.1)  ∂_{W^ℓ(t)} X(τ) = (∂X_k(t)/∂W^ℓ(t)) ∂_{x_k(t)} X(τ) = b_{kℓ}(X(t)) ∂_{x_k(t)} X(τ),  τ > t,
        ∂_{W^ℓ(t)} X(τ) = 0,  τ < t.

and conditioned on the σ-algebra M(t, Δt̂), the probability to change the mesh by a change in only ΔŴ(t) is arbitrarily small; thus X̄(s) will be essentially independent of ΔŴ(t) for s < t. Conditioned
ˆˆ ) ≡ E[r | M(t, 1t)] ˆˆ ˆˆ denote the dependence of the error on M(t, 1t), let rn (1W n ˆ ˆ . The Malliavin derivative and Taylor’s formula imply indicator rn on the noise 1W Z 1 ˆˆ ) = r (0) + 1W ˆˆ ` ˆˆ )ds . rn (1W ∂W ` (t)rn (s 1W n 0
ˆˆ ) and r (0) is the same provided The mesh generated by r· (1W · Z 1 ˆˆ ) = δ − r (0) + 1W ˆˆ ` ˆˆ )ds 0 < δ − rn (1W ∂ ` r (s 1W n W (t) n 0 (3.5a) ˆˆ · r 0 ) for all n ≡ δ − (rn (0) + 1W n
and min(δ − rn (0)) ≡ (ω) > 0 .
(3.5b)
n
ˆˆ · r 0 < (ω). Let Therefore, (3.5a) and (3.5b) hold if 1W n ˇ ≡ ; (supn 1tn )2 C(TOL) ˆˆ | < ˇ implies 1W ˆˆ · r 0 < , since by (3.3) then |1W n R1 ˆˆ 2 (1t ) | n ˆ ˆ 0 ∂W (t) ρ(tn , s 1W )ds| 0 0 0 ˆ ˆ ≤, 1W · rn ≤ |1W ||rn | < ˇ |rn | = (sup 1tn )2 C(TOL) ˆˆ ) = E[ρ(t ) | M(t, 1t)]. ˆˆ Consequently, the following indepenwhere ρ(tn , 1W n dence claim holds: ˆˆ (t) conditioned on |1W ˆˆ (t)| < ˇ and X (tn ) is independent of 1W (3.6a) ˆˆ for t < t, M(t, 1t) n ˆˆ (t)) is, for sufand the probability of having different meshes with r· (0) or r· (1W ˆˆ bounded by ficiently small 1t, (ˇ )2 ˆ ˆ ˆˆ 3 . ˆ | ≥ ˇ | M(t, 1t) ˆ (3.6b) P |1W ≤ C`0 exp − (1t) ˆ ˆ (41t) ˆ on Let Xˆ be a forward Euler approximation of X with uniform time steps 1t ˆ ˆ a much finer grid than 1t so that {n 1t : n = 0, 1, . . . , N } includes the time ˆˆ (t, ω) ≤ 1t ˆ ≤ 1t (t, ω). The standard proof of strong steps {tm } for X and 1t ˆ and convergence for the Euler method starts by writing t = nˆ 1t ˆ = ˆ − X (nˆ 1t) Xˆ (nˆ 1t)
n−1 ˆ X
ˆ i + 1i b` 1W ˆ i` , 1i a 1t
i=0
where ˆ Xˆ (i 1t)) ˆ 1i a = a(i 1t, − a(tn , X (tn )) ,
ˆ < tn+1 , tn ≤ i 1t
ˆ Xˆ (i 1t)) ˆ 1i b` = b` (i 1t, − b` (tn , X (tn )) ,
ˆ < tn+1 , tn ≤ i 1t
ˆ i` = W ` ((i + 1)1t) ˆ − W ` (i 1t) ˆ . 1W Therefore, ˆ − X (nˆ 1t)| ˆ 2 = (3.7) E | Xˆ (nˆ 1t) X ˆ i + 1i b`j 1W ˆ i` 1i 0 a j 1t ˆ i 0 + 1i 0 b`j 0 1W ˆ i`0 0 . E 1i a j 1t i,i 0
To prove convergence, we shall use the uniform bound E[|X (t)|2 ] ≤ C. This bound follows by the same steps as the convergence proof below applied to X ˆ 2 = ˆ i + b`j 1W ˆ i` )(a j 1t ˆ i 0 + b`j 0 1W ˆ i`0 0 ) E (a j 1t E |X (nˆ 1t)| i,i 0
instead of (3.7). Let us first analyze one term in the right-hand side of (3.7), X ˆ i` 1i 0 b`j 0 1W ˆ i`0 0 = E 1i b`j 1W i,i 0
2
X X ˆ i` 1i 0 b`j 0 1W ˆ i`0 0 + ˆ i` 1i b`j 0 1W ˆ i`0 . E 1i b`j 1W E 1i b`j 1W i 2Φ(c0 ) − 1 ≥ 0.901 and the event S(Y ; M ) √ M has probability close to one, which involves the additional step to approximate σY by S(Y ; M ), cf. [12]. Thus, in the computations ES (Y ; M ) is a good approximation of the statistical error ES (Y ; M ). For a given TOLS > 0, the goal is to find M such that ES (Y ; M ) ≤ TOLS . The following algorithm adaptively finds the number of realizations M to compute the sample average A(Y ; M ) as an approximation to E[Y ]. With probability close to one, depending on c0 , the statistical error in the approximation is then bounded by (3.5)
(3.5)  |E_S(Y; M)| ≤ E_S(Y; M) ≡ c₀ S(Y; M)/√M ≤ TOL_S,

see Theorems 3.3 and 3.9. Technical reasons to prove a.s. convergence in Lemma 4.4 motivate our choice M = 2ⁿ, n ∈ N.

routine Monte-Carlo(TOL_S, M₀; E_Y)
  Set the batch counter m = 1, M[1] = M₀ and E_S[1] = +∞.
  Do while (E_S[m] > TOL_S)
    Compute M[m] new samples of Y, along with the sample average E_Y ≡ A(Y; M[m]),
      the sample variance S[m] ≡ S(Y; M[m]) and the deviation E_S[m+1] ≡ E_S(Y; M[m]).
    Compute M[m+1] by change M(M[m], S[m], TOL_S; M[m+1]).
    Increase m by 1.
  end-do
end of Monte-Carlo routine

change M(M_in, S_in, TOL_S; M_out)
(3.6)  M* = min{ integer part of (c₀ S_in / TOL_S)², MCH × M_in },
       n = integer part of (log₂ M*) + 1,
       M_out = 2ⁿ.
end of change M

Here, M₀ is a given initial value for M, and MCH > 1 is a positive integer parameter introduced to avoid a large new number of realizations in the next batch due to a possibly inaccurate sample standard deviation S[m]. Indeed, M[m+1] cannot be greater than MCH × M[m].

Control of the time discretization error. For given time nodes 0 = t₁ < … < t_{N+1} = T, let the piecewise constant mesh function Δt be determined by Δt(τ) ≡ Δt_n for τ ∈ [t_n, t_{n+1}) and n = 1, …, N. Then the number of time steps that corresponds to a mesh Δt, for the interval [0, T], is given by

(3.7)  N(Δt) ≡ ∫_0^T (1/Δt(τ)) dτ.

Consider, for τ ∈ [t_n, t_{n+1}) and n = 1, …, N, the piecewise constant function

(3.8)  ρ̄(τ) ≡ sign(ρ_n) max(|ρ_n|, δ),

which measures the density of the time discretization error, where ρ_n = ρ(t_n, X̄) is defined by (2.6) for the stochastic time stepping algorithm or ρ_n = ρ(t_n, X̄) from (2.13) for the deterministic time stepping algorithm. Here we have sign(x) ≡ x/|x| for x ∈ R − {0}, sign(0) ≡ 1, and we use the positive parameter δ, satisfying δ + TOL²/δ → 0, in order to guarantee that Δt_sup → 0 and that ρ̄(t) and ρ_n have the same limit as TOL_T → 0. From now on, with a slight abuse of notation, ρ(t_n) = ρ_n denotes the modified density (3.8). Following the error expansion in Theorem 2.2, the time discretization error is approximated by

(3.9)  |E_T| = |E[g(X(T)) − g(X̄(T))]| ≲ E[ Σ_{n=1}^{N} r_n ],
using the error indicator, r_n, defined by

(3.10)  r_n ≡ |ρ(t_n)| Δt_n²,

with the modified error density defined by (3.8). To motivate the adaptivity procedure for the time partition, let us now formulate an optimal choice of the time steps by minimizing the expected computational work subject to the accuracy constraint

(3.11)  E[ Σ_{n=1}^{N} r_n ] ≤ TOL_T.
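The sample-size control of the Monte-Carlo routine above is straightforward to express in code. The following Python sketch is an illustrative reading of (3.5)-(3.6): batch sizes are powers of two and are increased until the estimated statistical error c₀·S/√M falls below TOL_S; the sampler of Y and all constants are placeholder assumptions, not the paper's implementation.

```python
import numpy as np

def monte_carlo(sample_Y, TOL_S, M0=64, c0=1.65, MCH=16, rng=None):
    """Batch Monte Carlo with the sample-size update of routine change M (3.6)."""
    rng = rng or np.random.default_rng()
    M, E_S = M0, np.inf
    while E_S > TOL_S:
        Y = sample_Y(M, rng)                         # M new iid samples of Y
        EY, S = Y.mean(), Y.std(ddof=1)
        E_S = c0 * S / np.sqrt(M)                    # statistical error estimate (3.5)
        M_star = min(int((c0 * S / TOL_S) ** 2), MCH * M)
        M = 2 ** (int(np.log2(max(M_star, 1))) + 1)  # next batch size, a power of two
    return EY, E_S

# usage with a placeholder integrand, e.g. Y = g(X(T)) replaced by exp(Z)
sample_Y = lambda M, rng: np.exp(rng.standard_normal(M))
print(monte_carlo(sample_Y, TOL_S=0.01))
```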
More precisely, solve

(3.12)  min E[N_Δt̃]  over Δt̃ ∈ K,  such that  E[ Σ_{n=1}^{N_Δt̃} r_n ] ≤ TOL_T,

where K is the feasible set for the mesh function Δt̃ and N_Δt̃ is the corresponding number of steps. The optimal choice of time steps in K is based on the given density ρ_n[k], which is piecewise constant on the mesh Δt[k], k = 1, 2, …. The choice of K determines either deterministic time steps or stochastic time steps. For example, if we let

K ≡ { Δt̃ ∈ L²_dt((0, T)) : Δt̃ is deterministic, positive and piecewise constant on Δt[k] },

then the objective function in (3.12) becomes deterministic and a standard application of a Lagrange multiplier shows that the minimizer of the problem (3.12) satisfies

(3.13)  E[r_n] = constant, for all time steps n,

which sets the basis for the refinement procedure with deterministic time steps. On the other hand, letting

K ≡ { Δt̃ ∈ L²_{dt×P}((0, T) × Ω) : Δt̃ is stochastic, positive and piecewise constant on Δt[k](ω) }

leads to

(3.14)  r_n(ω) = constant, for all time steps n and for all realizations ω,

which sets the basis for the refinement procedure with stochastic time steps. Thus, the adaptive algorithm with stochastic time steps uses the optimal conditions (3.11) and (3.14) to construct the mesh, which may be different for each realization. On the other hand, the adaptive algorithm with deterministic time steps uses the optimal conditions (3.11) and (3.13) to construct the mesh, which is the same for all realizations.

3.1. Convergence rates for stochastic time steps. The optimal conditions (3.11), (3.14) and the restriction (2.2) motivate that the goal of the adaptive algorithm is to construct a time partition Δt of [0, T] for each realization such that

(3.15)  s₂ TOL_T / E[N] ≤ |ρ_n| Δt_n² ≤ s₁ TOL_T / E[N],  ∀ n = 1, …, N,

where s₁ and s₂ are given constants satisfying 0 < s₂ < s₁. Note that in practice the quantity E[N] is not known and we can only estimate it by a sample average
A(N; M) from the previous batch of realizations. The statistical error |E[N] − A(N; M)| is then bounded by E_S(N; M), with probability close to one, by the same argument as in (3.5). The remainder of this section analyzes an adaptive algorithm based on (3.15) with respect to stopping, accuracy and efficiency.
Let N̄[j] ≡ A(N; M[j]) be the sample average of the final number of time steps in the j-th batch of M[j] realizations. To achieve (3.15), start with an initial partition Δt[1] and then specify iteratively a new partition Δt[k+1], from Δt[k], using the following dividing and merging strategy: for each realization in the m-th batch and for each time step n = 1, 2, …, N[k],

(3.16)  if r_n[k] ≥ s₁ TOL_T / N̄[m−1], then divide Δt_n[k] into H substeps,
(3.17)  elseif max(r_n[k], r_{n+1}[k]) ≤ s₂ TOL_T / N̄[m−1], then merge Δt_n[k] and Δt_{n+1}[k] into one step, and increase n by 1,
        else let the new step be the same as the old one. endif

Here H is a given integer greater than 1, which bounds the increment of the number of time steps from one iteration to the next. The following analysis, for fixed H, can easily be extended to bounded and varying H. The dividing and merging strategy (3.16, 3.17) motivates the following stopping criteria: for each realization of the m-th batch,

(3.18)  if r_n[k] < S₁ TOL_T / N̄[m−1], ∀ n = 1, …, N[k], and
(3.19)  max(r_n[k], r_{n+1}[k]) > S₂ TOL_T / N̄[m−1], ∀ n = 1, …, N[k] − 1,

then stop the dividing-merging process. Here S₁ and S₂ are given constants satisfying 0 < S₂ < s₂ < s₁ < S₁. The combination of (3.9), (3.11) and the upper bound (3.18) asymptotically guarantees a given level of accuracy, |E[g(X(T)) − g(X̄(T))]| ≲ S₁ TOL_T, while the lower bound (3.19) implies efficiency by almost optimal time steps. When almost all r_n satisfy r_n < s₁ TOL_T/N, the reduction of the error may be slow. Therefore the algorithm stops if max_n r_n < S₁ TOL_T/N and min_n (max(r_n, r_{n+1})) > S₂ TOL_T/N. What is the right choice for the constants S₂ < s₂ < s₁ < S₁ of the dividing and merging strategy? Before determining sufficient conditions for the constants, we will describe the algorithm with stochastic time steps in detail.

Remark 3.1. In practice, our numerical tests show that

|E[g(X(T))] − A(g(X̄(T)); M)| / TOL ≈ 1/2 < S₁/3,

so we choose s₁ = 2.

The adaptive algorithm. The adaptive, stochastic time stepping algorithm has a structure similar to the Monte-Carlo routine. First we split the specified error tolerance by (3.2), and the outer loop computes the batches of realizations of X̄, until
an estimate for the statistical error (3.5) is below the tolerance, TOLS . In the inner loop for each realization, we apply our dividing and merging strategy (3.16, 3.17) to a given initial time mesh iteratively until the approximate solution is sufficiently resolved, in other words, until the approximate error density and the time steps satisfy the stopping criteria (3.18, 3.19) with a given time discretization tolerance TOLT . This procedure needs to sample the Wiener process, W , on finer or coarser partitions, given its values on coarser or finer ones respectively. Therefore, the use of Brownian bridges is natural to preserve the required independence between the Wiener increments. Now we are ready for the detailed definition of the adaptive algorithm with stochastic steps: Algorithm S Initialization Choose: an error tolerance, TOL ≡ TOLS + TOLT , a number, N [1], of initial uniform steps ∆t[1] for [0, T ] and set N = N [1], a number, M [1], of initial realizations, an integer H ≥ 2 for the number of subdivisions of a refined time step, a number, s1 = 2 in (3.16) and a rough estimate of c in (3.20) to compute s2 , S1 , S2 using (3.21, 3.22, 3.23), and (5) a constant c0 ≥ 1.65 and an integer MCH≥ 2 to determine the number of realizations in (3.6). (1) (2) (3) (4)
Set the two iteration counters, m for batches and k for time refinement levels, to 1. Set the stochastic error ES [1] = +∞. Do while ( ES [m] > TOLS ) For realizations j = 1, . . . , M [m] Set k = 1 and r[1] = +∞. Start with the initial partition ∆t[1] and generate ∆W [1]. Do while ( r[k] violates the stopping (3.18, 3.19) ) Compute the approximation X[k] and the error indicator r[k] in (3.10) using the error density (2.6) on ∆t[k] with the known Wiener increments ∆W [k]. If ( r[k] violates the stopping (3.18, 3.19) ) For time steps i = 1, . . . , N [k] Do the dividing and merging process (3.16, 3.17) to compute ∆t[k + 1] from ∆t[k] and compute ∆W [k + 1] from ∆W [k] using Brownian bridges. end-for end-if Increase k by 1. end-do end-for Compute the sample average Eg ≡ A g(X(T )); M [m] , the sample standard deviation S[m] ≡ S(g(X(T )); M [m]) and the a posteriori bound for the statistical error ES [m] ≡ ES (g(X(T )), M [m]) in (3.5). if ( ES [m] > TOLS ) Compute M [m + 1] by change M (M [m], S[m], TOLS ; M [m + 1]), cf. (3.6), and update N = A (N ; M [m]), where the random variable N is
12
K.-S. MOON, A. SZEPESSY, R. TEMPONE, AND G.E. ZOURARIS
the final number of time steps on each realization. end-if Increase m by 1. end-do Accept Eg as an approximation of E[g(X(T ))], since the estimate of the computational error is bounded by TOL. Note that the dividing and merging strategy in (3.16, 3.17) may do infinite loops at some time steps if the constant s2 is too close to s1 . Clearly, we want to avoid the case where the time step ∆t(t)[k] is divided but in the next iteration, ∆t(t)[k + 1] is merged. An important ingredient to avoid such instability is the proof in Section 4 that the error density converges a.s. as the maximal time step tends to zero. Therefore, for sufficiently small initial meshes, which may depend on the realization and small tolerance, there exists a positive constant c such that for all t ∈ [0, T ] and for each realization ρ(t)[k + 1] ≤ c−1 . (3.20) c≤ ρ(t)[k] Stopping of the adaptive algorithm. The right choice of the parameters S2 < s2 < s1 < S1 is explained by Theorem 3.2 (Stopping). Suppose the assumptions of Theorem 2.2 hold and the adaptive algorithm uses the strategy (3.8), (3.16, 3.17), (3.18, 3.19). Assume that (3.20) holds with c ≥ 4−1 , and that c (3.21) s1 , s2 < H2 (3.22) S2 < cs2 , 1 (3.23) s1 . S1 > c Then for each realization the adaptive dividing-merging process stops by the stopping criteria (3.18, 3.19), after a finite number of operations. Proof. If s2 is too close to s1 , the algorithm may be unstable in the sense that a time step is first divided and then in the next iteration the step is merged, or similarly a step is first merged and then divided. To analyze this unstable situation assume that (3.20) holds. Merging two steps, where at least one has just been divided requires that TOLT TOLT , and max (ri [k + 1], ri+1 [k + 1]) ≤ s2 N N with ∆ti [k + 1] = ∆ti [k]/H, so that by (3.10) and (3.20) 2 N |ρ[k + 1]| ∆t[k + 1] (3.25) s2 ≥ r[k] TOLT |ρ[k]| ∆t[k] N 1 TOLT c ≥ c 2 s1 = 2 s1 . TOLT H H N (3.24) ri [k] ≥ s1
Similarly, dividing a step following a merging requires s2 ≥ c s1/4, which, since H ≥ 2, is at least as demanding as (3.25). Therefore the condition (3.21) avoids the dividing-merging instability.
The next step is to understand the evolution of

rmax[k] ≡ max_i ri[k]   and   rmin[k] ≡ min_i max(ri[k], ri+1[k])

for iterations k = 1, 2, . . . . It is advantageous if rmax decreases and rmin increases quickly to levels close to the bounds s1 TOLT/N and s2 TOLT/N, respectively. Indeed, there holds

(3.26)   rmax[k+1] ≤ rmax[k]/(c H²),   provided rmax[k] > H² s1 TOLT/N, and
(3.27)   rmin[k+1] > 4c rmin[k],   provided rmin[k] < s2 TOLT/(4N).

In other words, the error indicators rmax and rmin decrease and increase, respectively, with a constant factor away from the bounds rmax > H² s1 TOLT/N and rmin < s2 TOLT/(4N). If rmax is close to the dividing bound, i.e. if

(3.28)   rmax[k] ≤ H² s1 TOLT/N,

then in the next iteration rmax[k+1] ≤ c^{-1} s1 TOLT/N, since a divided step cannot be merged by (3.21). Therefore rmax[k+1] enters the stopping region, rmax < S1 TOLT/N, provided S1 > s1/c, and the condition (3.28) remains satisfied for the later iterations provided c^{-1} ≤ H². Similarly, if rmin is close to the merging bound, i.e. if

(3.29)   rmin[k] ≥ s2 TOLT/(4N),

then rmin[k+1] ≥ c s2 TOLT/N. Therefore rmin belongs to the stopping region, rmin[k+1] > S2 TOLT/N, provided S2 < c s2, and the condition (3.29) remains satisfied for the later iterations provided c ≥ 4^{-1}. When both rmax and rmin have entered their stopping regions, the dividing-merging process stops, and the algorithm continues with the next realization.
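To make the dividing step and the Brownian bridge resampling concrete, the following sketch shows how one coarse step can be divided into H substeps while the already sampled Wiener increment over that step is preserved. This is an illustrative NumPy sketch added here, with made-up function names; it is not code from the paper.

```python
import numpy as np

def divide_step(dt, dW, H, rng):
    """Divide one step of size dt with Wiener increment dW into H substeps.
    Conditioned on the total increment dW, the substep increments are sampled
    sequentially as a Brownian bridge, so the coarse increment is preserved."""
    sub_dt = np.full(H, dt / H)
    sub_dW = np.empty(H)
    remaining_dW, remaining_dt = dW, dt
    for i in range(H - 1):
        h = sub_dt[i]
        # conditional law of the next increment given the remaining total
        mean = remaining_dW * h / remaining_dt
        var = h * (remaining_dt - h) / remaining_dt
        sub_dW[i] = mean + np.sqrt(var) * rng.standard_normal()
        remaining_dW -= sub_dW[i]
        remaining_dt -= h
    sub_dW[-1] = remaining_dW  # last increment fixed by consistency
    return sub_dt, sub_dW

# Example: refine the third step of a uniform mesh on [0, 1] with H = 2.
rng = np.random.default_rng(0)
N, T, H = 8, 1.0, 2
dt = np.full(N, T / N)
dW = np.sqrt(dt) * rng.standard_normal(N)
sub_dt, sub_dW = divide_step(dt[2], dW[2], H, rng)
assert abs(sub_dW.sum() - dW[2]) < 1e-14  # coarse increment preserved
```

Merging two steps is the reverse operation: the two increments are simply summed, which requires no new randomness.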
Accuracy of the adaptive algorithm. The adaptive algorithm guarantees that the estimated error is bounded by S1 TOLT + TOLS = (S1/3 + 2/3) TOL. The next question is whether the true error is bounded by (S1/3 + 2/3) TOL asymptotically. Using the upper bound (3.18) of the error indicators and the a.s. convergence of ρ, proved in Section 4, the approximate error has the estimate

Theorem 3.3 (Accuracy). Suppose that the assumptions of Theorem 2.2 hold. Then the adaptive algorithm (3.8), (3.16, 3.17), (3.18, 3.19) satisfies, for any constant c0 > 0 defined in (3.6),

(3.30)   lim inf_{TOL→0} P( |E[g(X(T))] − A(g(X̄(T)); M)| / TOL ≤ S1/3 + 2/3 ) ≥ ∫_{−c0}^{c0} e^{−x²/2}/√(2π) dx.
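For instance, the initialization of Algorithm S suggests c0 ≥ 1.65; the corresponding asymptotic confidence level in (3.30) is about 90%, as the following small check shows. This is a sketch added here for illustration only, not part of the original text.

```python
from math import erf, sqrt

def normal_confidence(c0):
    # evaluates the integral of exp(-x^2/2)/sqrt(2*pi) over [-c0, c0]
    return erf(c0 / sqrt(2.0))

print(normal_confidence(1.65))  # about 0.901, i.e. roughly 90% confidence
```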
Proof. To simplify the proof, we split the fraction on the left hand side of (3.30) into a statistical error and a time discretization error: for all s ∈ R+,

(3.31)
lim inf_{TOL→0} P( |E[g(X(T))] − A(g(X̄(T)); M)| / TOL ≤ S1/3 + 2/3 )
≥ lim inf_{TOL→0} P( |E[g(X(T)) − g(X̄(T))]|/TOL + |E[g(X̄(T))] − A(g(X̄(T)); M)|/TOL ≤ S1/3 + 2/3 )
≥ lim inf_{TOL→0} P( |E[g(X(T)) − g(X̄(T))]|/TOL ≤ s  and  |E[g(X̄(T))] − A(g(X̄(T)); M)|/TOL ≤ S1/3 + 2/3 − s )
= lim inf_{TOL→0} [ P( |E[g(X(T)) − g(X̄(T))]|/TOL ≤ s ) × P( |E[g(X̄(T))] − A(g(X̄(T)); M)|/TOL ≤ S1/3 + 2/3 − s ) ].
The time discretization error. When the adaptive algorithm stops, the error estimate (2.1) and the stopping bound for ∆t (3.18) imply

|E[ g(X(T)) − g(X̄(T)) ]| ≤ E[ Σ_{i=1}^{N} ( ∫_{t_{i−1}}^{t_i} |ρ̃(τ)| dτ ) ∆t_i ],

and the right hand side is asymptotically bounded by S1 TOLT = (S1/3) TOL. Hence, for each s > S1/3,

(3.34)   lim inf_{TOL→0} P( |E[ g(X(T)) − g(X̄(T)) ]| / TOL ≤ s ) = 1.

The statistical error. Use that in (3.6) the number of realizations is

M ≥ ( c0 S(g(X̄(T)); M) / TOLS )²

to rewrite 1/TOL ≤ (2/(3c0)) √M / S(g(X̄(T)); M) by (3.2), so that

P( |E[g(X̄(T))] − A(g(X̄(T)); M)| / TOL ≤ S1/3 + 2/3 − s )
≥ P( ( |E[g(X̄(T))] − A(g(X̄(T)); M)| / S(g(X̄(T)); M) ) √M ≤ (3c0/2)( S1/3 + 2/3 − s ) ).

Let χ ≡ (E[g(X(T))] − A(g(X(T)); M))/S(g(X(T)); M) and χ̄ ≡ (E[g(X̄(T))] − A(g(X̄(T)); M))/S(g(X̄(T)); M). Then, write χ̄ = (χ̄ − χ) + χ and note that χ̄ − χ ⇀ 0, weakly as ∆t → 0, and that S(g(X(T)); M) → σ_{g(X(T))}, strongly as M → ∞. Therefore the central limit theorem shows that lim_{TOL→0} √M χ̄ is normally distributed with mean zero and variance one and consequently

(3.35)
sup_{s > S1/3} lim_{TOL→0} P( ( |E[g(X̄(T))] − A(g(X̄(T)); M)| / S(g(X̄(T)); M) ) √M ≤ (3c0/2)( S1/3 + 2/3 − s ) )
= sup_{s > S1/3} ∫_{−(3c0/2)(S1/3 + 2/3 − s)}^{(3c0/2)(S1/3 + 2/3 − s)} e^{−x²/2}/√(2π) dx = ∫_{−c0}^{c0} e^{−x²/2}/√(2π) dx.
Finally the combination of (3.31), (3.34) and (3.35) proves the theorem.
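The outer batch loop that controls the statistical error can be summarized as follows. This is an illustrative sketch assuming the CLT-based bound ES = c0 S/√M in the spirit of (3.5), the batch-enlargement rule of (3.6) with MCH = 2, and batch sizes rounded to powers of two; the function names and the toy sample functional are chosen here for illustration and are not from the paper.

```python
import numpy as np

def monte_carlo_to_tolerance(sample_g, TOL_S, c0=1.65, M0=100, MCH=2, rng=None):
    """Increase the number of realizations until the statistical error bound
    E_S = c0 * S / sqrt(M) is below TOL_S."""
    rng = rng or np.random.default_rng()
    M = M0
    while True:
        g = np.array([sample_g(rng) for _ in range(M)])
        mean, std = g.mean(), g.std(ddof=1)
        E_S = c0 * std / np.sqrt(M)
        if E_S <= TOL_S:
            return mean, E_S, M
        # new batch size: at least (c0*S/TOL_S)^2 and at least MCH times the
        # previous size, rounded up to the next power of two
        M_new = max(int((c0 * std / TOL_S) ** 2) + 1, MCH * M)
        M = 1 << (M_new - 1).bit_length()

# Toy usage: estimate E[g(X(T))] with X(T) = exp(W(1) - 1/2) and g(x) = x.
mean, E_S, M = monte_carlo_to_tolerance(
    lambda rng: np.exp(rng.standard_normal() - 0.5), TOL_S=0.01)
```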
Efficiency of the adaptive algorithm. The minimal expected number of time steps in the class of stochastic time steps ∆t(t, ω) depends on the stochastic data and the individual realizations X through the constraint E[g(X(T)) − g(X̄(T))] = TOLT. The conditions (3.11) and (3.14) imply that the optimal expected number of time steps, E[NS], satisfies (cf. [31])

(3.36)   E[NS] = (1/TOLT) ( E[ ∫_0^T √|ρ(τ)| dτ ] )².

On the other hand, for constant time steps ∆t = constant, the number of steps, NC, to achieve Σ_{i=1}^N |ρi| ∆ti² = TOLT becomes

(3.37)   NC = (T/TOLT) ∫_0^T E[|ρ(τ)|] dτ.

Therefore, Jensen's inequality shows that the adaptive method with stochastic time steps uses fewer time steps than the method with a constant time step, i.e.,

(3.38)   E[NS] ≤ NC.
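As a small numerical illustration of (3.36)-(3.38), one may compare the two step counts for an arbitrary, made-up error density. The sketch below is not from the paper; it uses ρ(t, ω) = exp(2 W(t)) on [0, 1] purely as an example and checks Jensen's inequality (3.38) by sampling.

```python
import numpy as np

def expected_steps(rho_paths, dt, TOL_T):
    """Monte Carlo estimates of (3.36) and (3.37) for sampled error-density
    paths rho_paths[j, i] ~ rho(t_i, omega_j) on a uniform grid of size dt."""
    T = dt * rho_paths.shape[1]
    int_sqrt = (np.sqrt(np.abs(rho_paths)) * dt).sum(axis=1)        # per-path integral of sqrt|rho|
    E_NS = (int_sqrt.mean()) ** 2 / TOL_T                            # (3.36)
    N_C = T * (np.abs(rho_paths) * dt).sum(axis=1).mean() / TOL_T    # (3.37)
    return E_NS, N_C

rng = np.random.default_rng(1)
n_paths, n_steps, dt = 10_000, 200, 1.0 / 200
W = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)
E_NS, N_C = expected_steps(np.exp(2.0 * W), dt, TOL_T=1e-3)
print(E_NS <= N_C)   # Jensen's inequality (3.38): True
```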
The following theorem uses the lower bound (3.19) of the error indicators to show that the algorithm (3.16)-(3.19) generates a mesh which is optimal, up to a multiplicative constant C = 4/S2.

Theorem 3.4 (Efficiency). Suppose that the assumptions of Theorem 2.2 hold. Then there exists a constant C > 0, asymptotically bounded by 4/S2, such that, using a final batch of M[m] realizations, the final sample average N[m] ≡ A(N; M[m]) of the number of adaptive steps of the algorithm (3.16, 3.17), (3.18, 3.19) satisfies

(3.39)   N[m] ≤ (C/TOLT) ( A( ∫_0^T √|ρ(τ)| dτ ; M[m] ) )².

... t̂ + ∆t̂}, the probability to change the mesh by a change in only ∆Ŵ(t̂) is arbitrarily small; thus X̄(τ) will be essentially independent of ∆Ŵ(t̂) for τ < t̂. Conditioned on M(t̂, ∆t̂), let rn(∆Ŵ) ≡ E[ rn | M(t̂, ∆t̂) ] denote the dependence of the error indicator rn on the noise ∆Ŵ. The Malliavin derivative, ∂_{W^ℓ(t̂)} rn, and Taylor's formula imply

rn(∆Ŵ) = rn(0) + ∆Ŵ^ℓ ∫_0^1 ∂_{W^ℓ(t̂)} rn(τ ∆Ŵ) dτ.
The mesh generated by r·(∆Ŵ) and the mesh generated by r·(0) are the same provided that

(4.14)   S1 TOLT/N − rn(∆Ŵ) = S1 TOLT/N − ( rn(0) + ∆Ŵ^ℓ ∫_0^1 ∂_{W^ℓ(t̂)} rn(τ ∆Ŵ) dτ ) ≡ S1 TOLT/N − ( rn(0) + ∆Ŵ · rn′ ),  for all n,

has the same sign as S1 TOLT/N − rn(0), and similarly

(4.15)   S2 TOLT/N − rn(∆Ŵ) ≡ S2 TOLT/N − ( rn(0) + ∆Ŵ · rn′ ),  for all n,

has the same sign as S2 TOLT/N − rn(0).
Therefore, (4.14, 4.15) hold if ∆Ŵ · rn′ < ε(ω) for all n. Let

ε̌ ≡ ε / ( (sup_n ∆tn)² C(TOL) );

then |∆Ŵ| < ε̌ implies ∆Ŵ · rn′ < ε, since then

|∆Ŵ · rn′| ≤ |∆Ŵ| |rn′| < ε̌ |rn′| = ε (∆tn)² | ∫_0^1 ∂_{W(t̂)} ρ(tn, τ ∆Ŵ) dτ | / ( (sup_n ∆tn)² C(TOL) ) ≤ ε,

provided ρ(tn, ∆Ŵ) ≡ E[ ρ(tn) | M(t̂, ∆t̂) ] and ρ is an approximate error density function satisfying (2.3). Consequently, the following independence claim holds:

(4.16)   X̄(tn) is independent of ∆Ŵ(t̂) conditioned on |∆Ŵ(t̂)| < ε̌ and M(t̂, ∆t̂), for tn < t̂,

and the conditional probability to have different meshes with r·(0) or r·(∆Ŵ(t̂)) is, for sufficiently small ∆t̂, bounded by

(4.17)   P( |∆Ŵ| ≥ ε̌ | M(t̂, ∆t̂), Fs ) = 1 − P( |∆Ŵ| < ε̌ | M(t̂, ∆t̂), Fs ) ≤ (∆t̂)³
since

P( |∆Ŵ| < ε̌ | M(t̂, ∆t̂), Fs ) = P( |∆Ŵ| < ε̌ | M(t̂, ∆t̂) ) ≥ 1 − C ℓ0 exp( − ε̌² / (4 ∆t̂) ) > 1 − (∆t̂)³.

Let us define the σ-algebra G generated by { dW(τ) : τ < n̂ ∆t̂ or τ > n̂ ∆t̂ + ∆t̂ } and { ∆t(τ) : τ < n̂ ∆t̂ }. Then, using the results (4.17) and (4.16), we get

E[ ∆b^ℓ ∆Ŵ^ℓ_n̂ | Fs ] = E[ E[ ∆b^ℓ ∆Ŵ^ℓ_n̂ | G ] | Fs ]
= E[ E[ ∆b^ℓ ( 1_{|∆Ŵ| < ε̌} + 1_{|∆Ŵ| ≥ ε̌} ) ∆Ŵ^ℓ_n̂ | G ] | Fs ]

..., Z(t0) = (x, I, 0, 0, 0)^T.
The Euler approximation Z̄ = (X̄, X̄′, X̄″, X̄‴)^T of Z in (4.29), with piecewise constant drift and diffusion fluxes, satisfies

dZ̄ = Ā(t; Z̄) dt + B̄^ℓ(t; Z̄) dW^ℓ,

defined as in (4.1). Applying Theorem 4.1 to Z̄ and the augmented system (4.29) shows that the approximation Z̄ converges a.s. to the process Z. Therefore, the representations (4.26)-(4.28) and the regularity assumptions on the functions g, a and b imply that ϕ, ϕ′ and ϕ″ converge a.s. Consequently, the error density ρ converges a.s. to the true error density ρ̂ as the specified error tolerance, TOL, tends to zero, i.e., as the maximum step size tends to zero.

4.2. Deterministic time steps.

Lemma 4.4. Suppose that a, b, g and X satisfy the assumptions in Lemma 2.1. Then for any α ∈ (0, 1/2)

(4.30)   lim_{M→∞} M^α sup_{t∈[0,T]} | E[X(t)] − A(X(t); M) | = 0   a.s.
Proof. The idea is to split the difference in (4.30) into a sum of a martingale and a bounded integral and use Doob's inequality for the martingale. Thus, representation (4.1) shows

(4.31)
E[X(t)] − A(X(t); M)
= E[ ∫_0^t b̄^ℓ(s; X(s)) dW^ℓ ] − A( ∫_0^t b̄^ℓ(s; X(s)) dW^ℓ ; M )
+ ∫_0^t ( E[ ā(s; X(s)) ] − A( ā(s; X(s)); M ) ) ds,

where the first expectation vanishes. Denote by {F_{M,t}}_{t≥0} the filtration generated by the M independent Wiener processes that define the sample average A(X(t); M) and let the σ-algebra generated by the step sizes be σ(∆t). The construction of Algorithm D implies that σ(∆t) is independent of F_{M,t}. The conditional expectation process

Y_M(t) ≡ E[ A( ∫_0^t b̄^ℓ(s; X(s)) dW^ℓ ; M ) | σ(∆t) ]
is then an F_{M,t} martingale and Doob's inequality yields

P( M^α sup_{t∈[0,T]} |Y_M(t)| ≥ ε ) ≤ M^{2α} Var[Y_M(T)] / ε²
= M^{2α} Var[ E[ A( ∫_0^T b̄^ℓ(s, X(s)) dW^ℓ(s); M ) | σ(∆t) ] ] / ε²
= Var[ E[ ∫_0^T b̄^ℓ(s, X(s)) dW^ℓ(s) | σ(∆t) ] ] / ( M^{1−2α} ε² )
≤ C_T / ( M^{1−2α} ε² ).

Here C_T is a constant that, due to the smoothness assumptions in the drift and diffusion coefficients, bounds uniformly the variance of ∫_0^T b̄^ℓ(s, X(s)) dW^ℓ(s). The construction (3.6) shows that all M belong to the set {2^k | k ∈ N}. Let M_k ≡ 2^k; then

(4.32)   Σ_{k=0}^∞ P( M_k^α sup_{t∈[0,T]} |Y_{M_k}(t)| ≥ ε ) ≤ (C_T/ε²) Σ_{k=0}^∞ 2^{−k(1−2α)}

... > 0 and θ_f(ŷ1, τ) > 0 such that

||d_n||_{H01(D)} ≤ ( C_D θ_f (2n+1) / (τ ã1 2^n) ) ∫_{−1}^{1} ( (1 − t²)/(t + 1 + C(1 − τ)) )^n dt.

Proof. Consider

d_n = ((2n+1)/2) ∫_{−1}^{1} Ψ(t) p_n(t) dt = ( (2n+1)(−1)^n / (n! 2^{n+1}) ) ∫_{−1}^{1} (d^n Ψ(t)/dt^n) (1 − t²)^n dt.
Use the analytic continuation of the real function Ψ to the complex domain as in Lemma 6.1. The application of Cauchy's formula gives

(d^n/dt^n) Ψ(t) = ( n! (−1)^n / (2πi) ) ∫_{γ_t} Ψ(η) / (η − t)^{n+1} dη,

where γ_t is a positively oriented closed circumference with center at the real point t ∈ (−1, 1), radius R(t), and such that all singularities from Ψ are exterior to γ_t. Denote the complex closed ball B(t, R(t)) = { z ∈ C : |z − t| ≤ R(t) }. Estimate (6.8) implies

(6.10)
|| (d^n/dt^n) Ψ(t) ||_{H01(D)} ≤ ( C_D n! / (2π) ) ∫_{γ_t} ( || f(y1(η), ŷ1, ·) ||_{L2(D)} / ( ã(η) |η − t|^{n+1} ) ) |dη|
≤ ( C_D n! / (2π) ) sup_{η∈γ_t} || f(y1(η), ŷ1, ·) ||_{L2(D)} ∫_{γ_t} |dη| / ( ã(η) |η − t|^{n+1} )
≤ C_D n! sup_{η∈γ_t} || f(y1(η), ŷ1, ·) ||_{L2(D)} sup_{η∈γ_t} ( 1/ã(η) ) ( 1/R(t)^n ).
Let

θ_f ≡ sup_{t∈[−1,1]} sup_{η∈γ_t} || f(y1(η), ŷ1, ·) ||_{L2(D)};

then estimate (6.10) implies

(6.11)   ||d_n||_{H01(D)} ≤ ( (2n+1) C_D θ_f / 2^{n+1} ) ∫_{−1}^{1} ( 1 / inf_{η∈γ_t} ã(η) ) ( (1 − t²)/R(t) )^n dt.
Now choose R(t) to control the above integral as follows. Let τ ∈ (0, 1) and consider the interval I = { y ∈ R : |y| ≤ (1 − τ) ν̃ / ||b1||_{L∞(D)} }. By construction, we have the strict inclusion Γ1 ⊂ I with dist(Γ1, ∂I) > (1 − τ) ν̃ / C1 > 0. Then, the corresponding preimages satisfy the inclusion (y1)^{−1}(Γ1) = [−1, 1] ⊂ (y1)^{−1}(I) and (y1)^{−1}(I) = (−1 − δ1, 1 + δ2) with δ1, δ2 > 0. Choose the radius

(6.12)   R(t) ≡ min{ t + 1 + δ1, −t + 1 + δ2 } > 0,

which implies, by virtue of (6.9),

(6.13)   inf_{η∈γ_t} ã(η) ≥ τ ã1.
Let

(6.14)   δ(τ, y2, . . . , yN) ≡ min{ δ1, δ2 } ≥ 2 (1 − τ) ν̃ / ( C1 (ymax − ymin) ),

and observe that (6.11)-(6.14) imply

||d_n||_{H01(D)} ≤ ( (2n+1) C_D θ_f / (τ ã1 2^n) ) ∫_{−1}^{1} ( (1 − t²)/(t + 1 + δ) )^n dt,
which is what we wanted to prove.

Now we use a result from [26], namely that we have

Lemma 6.3 (Integral estimate). Let ξ < −1 and define

r ≡ 1 / ( |ξ| + √(ξ² − 1) ),   0 < r < 1.

Then there holds

(−1)^n ∫_{−1}^{1} ( (t² − 1)/(t + ξ) )^n dt = (2r)^n 2^{n+1} ( n! / (2n+1)!! ) Φ_{n,0}(r²),

where Φ_{n,0}(r²) is the Gauss hypergeometric function. Moreover, we have

Φ_{n,0}(r²) = √(1 − r²) + O( 1/n^{1/3} )

uniformly with respect to 0 < r < 1.

Lemma 6.4. Let τ ∈ (0, 1). Under Assumption 6.1 there exist positive constants C, θ_f > 0 such that

||d_n||_{H01(D)} ≤ ( 2 C_D θ_f / (τ ã1) ) √(π n / 2) ( √(1 − r²) + O( 1/n^{1/3} ) ) r^n

with

r ≡ 1 / ( 1 + δ + √( (1 + δ)² − 1 ) ),   0 < r < 1.

... where the constant C > 0 is independent of u, h, p_i and r_i. The proof of the previous theorem uses Theorem 6.2 and is completely similar to the proof of Theorem 5.2.

7. Double orthogonal polynomials

Here we explain the use of double orthogonal polynomials to compute efficiently the solution of the k×h−version and the p×h−version studied in Sections 5 and 6, respectively. The idea is to use a special basis that decouples the system in the y-direction, yielding a number of uncoupled systems, each one with the size and structure of one Monte Carlo realization of (4.1). Without any loss of generality we focus on the p-version, i.e. find u_h^p ∈ Z^p ⊗ X_h^d such that

(7.1)   ∫_Γ ρ(y) ∫_D a(y, x) ∇u_h^p(y, x) · ∇v(y, x) dx dy = ∫_Γ ρ(y) ∫_D f(y, x) v(y, x) dx dy   ∀ v ∈ Z^p ⊗ X_h^d.
Let {ψj(y)} be a basis of the subspace Z^p ⊂ L2(Γ) and {ϕi(x)} be a basis of the subspace X_h^d ⊂ H01(D). Write the approximate solution as

(7.2)   u_h^p(y, x) = Σ_{j,i} u_{ij} ψj(y) ϕi(x)

and use test functions v(y, x) = ψk(y) ϕℓ(x) to find the coefficients u_{ij}. Then (7.1) gives

Σ_{j,i} ( ∫_Γ ρ(y) ψk(y) ψj(y) ∫_D a(y, x) ∇ϕi(x) · ∇ϕℓ(x) dx dy ) u_{ij} = ∫_Γ ρ(y) ψk(y) ∫_D f(y, x) ϕℓ(x) dx dy,   ∀ k, ℓ,
which can be rewritten as Σ_{j,i} ( ∫_Γ ρ(y) ψk(y) ψj(y) K_{i,ℓ}(y) dy ) u_{ij} = ∫_Γ ρ(y) ψk(y) f_ℓ(y) dy, ∀ k, ℓ, with K_{i,ℓ}(y) ≡ ∫_D a(y, x) ∇ϕi(x) · ∇ϕℓ(x) dx and f_ℓ(y) ≡ ∫_D f(y, x) ϕℓ(x) dx. Now, if the diffusion coefficient is a truncated Karhunen-Loève expansion, a(y, x) = E[a](x) + Σ_{n=1}^N b_n(x) y_n, and, by the independence of the Y components, its joint probability density decouples into the product ρ(y) = Π_{m=1}^N ρ_m(y_m), then we have a corresponding "Karhunen-Loève"-like expression for the stiffness matrix

K_{i,ℓ}(y) ≡ ∫_D ( E[a](x) + Σ_{n=1}^N b_n(x) y_n ) ∇ϕi(x) · ∇ϕℓ(x) dx = K^0_{i,ℓ} + Σ_{n=1}^N y_n K^n_{i,ℓ}

with deterministic coefficients

K^0_{i,ℓ} ≡ ∫_D E[a](x) ∇ϕi(x) · ∇ϕℓ(x) dx   and   K^n_{i,ℓ} ≡ ∫_D b_n(x) ∇ϕi(x) · ∇ϕℓ(x) dx.
By the same token we have

∫_Γ ρ(y) ψk(y) ψj(y) K_{i,ℓ}(y) dy = K^0_{i,ℓ} ∫_Γ ρ(y) ψk(y) ψj(y) dy + Σ_{n=1}^N K^n_{i,ℓ} ∫_Γ y_n ρ(y) ψk(y) ψj(y) dy.
Since ψk ∈ Z^p, with multiindex p = (p1, . . . , pN), it is enough to take it as the product

ψk(y) = Π_{r=1}^N ψ_{kr}(y_r),

where ψ_{kr} : Γ_r → R is a basis function of the subspace Z^{p_r} = span[1, y, . . . , y^{p_r}] = span[ψ_{hr} : h = 1, . . . , p_r + 1]. Keeping this choice of ψk in mind,

∫_Γ ρ(y) ψk(y) ψj(y) K_{i,ℓ}(y) dy = K^0_{i,ℓ} ∫_Γ Π_{m=1}^N ρ_m(y_m) ψ_{km}(y_m) ψ_{jm}(y_m) dy + Σ_{n=1}^N K^n_{i,ℓ} ∫_Γ y_n Π_{m=1}^N ρ_m(y_m) ψ_{km}(y_m) ψ_{jm}(y_m) dy.
Now, for every set Γ_n, n = 1, . . . , N, choose the polynomials, ψj(y) = Π_{n=1}^N ψ_{jn}(y_n), to be biorthogonal, i.e. for n = 1, . . . , N they must satisfy

(7.3)   ∫_{Γ_n} ρ_n(z) ψ_{kn}(z) ψ_{jn}(z) dz = δ_{kj},   ∫_{Γ_n} z ρ_n(z) ψ_{kn}(z) ψ_{jn}(z) dz = c_{kn} δ_{kj}.

To find the polynomials ψ we have to solve N eigenproblems, each of them with size (1 + p_n). The computational work required by these eigenproblems is negligible with respect to the one required to solve for u_{ij}, cf. [25], Section 8.7.2. The orthogonality properties (7.3) for ψ imply the decoupling

∫_Γ ρ ψk ψj dy = δ_{kj},   ∫_Γ y_n ρ ψk ψj dy = c_{kn} δ_{kj}.

By means of this decoupling we now conclude that

∫_Γ ρ(y) ψk(y) ψj(y) K_{i,ℓ}(y) dy = K^0_{i,ℓ} ∫_Γ ρ(y) ψk(y) ψj(y) dy + Σ_{n=1}^N K^n_{i,ℓ} ∫_Γ y_n ρ(y) ψk(y) ψj(y) dy = ( K^0_{i,ℓ} + Σ_{n=1}^N c_{kn} K^n_{i,ℓ} ) δ_{kj}.
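As an illustration of how the biorthogonal polynomials in (7.3) can be computed in practice, the following sketch builds, for a single direction Γ_n = [−1, 1] with the uniform density ρ_n = 1/2, the matrix of the bilinear form ∫ y ρ_n q_i q_j dy in an orthonormal Legendre basis and diagonalizes it; the eigenvectors give the coefficients of the ψ_{kn} and the eigenvalues are the constants c_{kn}. This is an assumption-laden example added here (the choice of density, degrees and function names is illustrative), not code from the paper.

```python
import numpy as np
from numpy.polynomial import legendre as L

def double_orthogonal(p):
    """Double orthogonal polynomials on [-1, 1] with density rho = 1/2.
    Returns (c, V): c[k] are the constants c_{kn} in (7.3) and V[:, k] holds
    the coefficients of psi_k in the orthonormal Legendre basis."""
    # Orthonormal basis q_j = sqrt(2j+1) P_j, so that the rho-weighted Gram matrix is the identity.
    x, w = L.leggauss(p + 2)          # quadrature exact for all needed degrees
    rho = 0.5
    Q = np.array([np.sqrt(2 * j + 1) * L.legval(x, np.eye(p + 1)[j])
                  for j in range(p + 1)])
    Y = (Q * (w * rho * x)) @ Q.T     # Y[i, j] approximates the integral of y*rho*q_i*q_j
    c, V = np.linalg.eigh(Y)          # symmetric eigenproblem of size p + 1
    return c, V

c, V = double_orthogonal(p=4)
# For the uniform density the c_k coincide with the Gauss-Legendre nodes:
print(np.allclose(np.sort(c), np.sort(L.leggauss(5)[0])))   # True
```

For a non-uniform density the same construction applies with any basis orthonormalized with respect to ρ_n, which is why only N small eigenproblems of size 1 + p_n are needed.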
The structure of the linear system that determines u_{ij} now becomes block diagonal, with each block being coercive and with a sparsity structure identical to one deterministic FEM stiffness matrix, i.e.

diag( K^0 + Σ_{n=1}^N c_{1n} K^n ,  K^0 + Σ_{n=1}^N c_{2n} K^n ,  . . . ,  K^0 + Σ_{n=1}^N c_{Nn} K^n ).

The conclusion is that the work to find the coefficients u_{ij} in (7.2) is the same as the one needed to compute Π_{i=1}^N (1 + p_i) Monte Carlo realizations of u_h defined in (4.1). Besides this, observe that, as a consequence of the uniform coercivity assumption, each of the diagonal blocks in the system above is symmetric and strictly positive definite.

8. Asymptotical efficiency comparisons

In this section we compare the asymptotical numerical complexity of the Monte Carlo Galerkin finite element method, cf. Section 4, with that of the Stochastic Galerkin finite element method introduced in Sections 5 and 6. In all cases, the spatial discretization is done by piecewise linear finite elements on globally quasiuniform meshes. For the k×h−SGFEM the Γ partitions are also assumed to be globally quasiuniform. Besides this, the diffusion function a is assumed to be a truncated Karhunen-Loève expansion.

8.1. MCGFEM versus k×h−SGFEM. Here we consider the computational work to achieve a given accuracy, bounded by a positive constant TOL, for both the MCGFEM and the k×h−SGFEM methods. This optimal computational work indicates under which circumstances one method may be better suited than the other. When using the MCGFEM method to approximate the solution of (1.1) in the energy
norm, the error estimate follows by applying Proposition 4.2 together with (4.5): given a confidence level 0 < c0 < 1, there exists a constant C > 0, depending only on c0, such that

(8.1)   P( || E[u] − (1/M) Σ_{j=1}^M u_h(·; ω_j) ||_{H01(D)} ≤ C( h + 1/√M ) ) ≥ c0.

Then, in the sense of (8.1), we write E_MCGFEM = O(h) + O(1/√M). The corresponding computational work for the MCGFEM method is

Work_MCGFEM = O( ( (1/h^d)^r + 1/h^d ) M ),

where the parameter 1 ≤ r ≤ 3 relates to the computational effort devoted to solving one linear system with n unknowns, O(n^r). From now on we continue the discussion with the optimal r = 1, which can be achieved by means of the multigrid method, cf. [12, 13, 28]. Thus, choosing h and M to minimize the computational work for a given desired level of accuracy TOL > 0 yields the optimal work

(8.2)   Work*_MCGFEM = O( TOL^{−(2+d)} ).

On the other hand, if we apply a k×h−SGFEM with piecewise polynomials of order q in the y−direction, the computational error in the H01(D) norm is, cf. Proposition 5.1,

E_SGFEM = O(h) + N O(k^{q+1}),

and the corresponding computational work for the k×h−version is

Work_SGFEM = O( h^{−d} (1 + q)^N k^{−N} ).

Here N is the number of terms in the truncated Karhunen-Loève expansion of the coefficients a and f, and k is the discretization parameter in the y direction. Similarly as before, we can compute the optimal work for the k×h−SGFEM method, yielding

Work*_SGFEM = O( (1 + q)^N (TOL/N)^{−N/(q+1)} TOL^{−d} ).

Therefore, a k×h−version SGFEM is likely to be preferred whenever TOL is sufficiently small and N/2 < 1 + q, i.e. if the number of terms in the Karhunen-Loève expansion of a is large then the degree of approximation in the y direction, q, has to become correspondingly large. We summarize the comparison results in Table 1, where we also include corresponding results from the p×h−version, discussed in Subsection 8.2.

                  MCGFEM            k×h−version SGFEM              p×h−version SGFEM
Work              M/h^d             (1+q)^N/(h^d k^N)              (1+p)^N/h^d
H01(D) Error      h + 1/√M          h + k^(q+1)                    h + r^(p+1)
H01(D) Work*      TOL^(−(2+d))      TOL^(−N/(q+1)) TOL^(−d)        (log_r(TOL))^N TOL^(−d)

Table 1. Approximation of the function E[u] in H01(D). Asymptotical numerical complexity for the MCGFEM and SGFEM methods.
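To get a feeling for the crossover described above, one can simply evaluate the work expressions of Table 1 for a given target accuracy. The sketch below follows the stated asymptotic models while ignoring all hidden constants; the parameter values are arbitrary assumptions, and the snippet is added here for illustration only.

```python
def work_mcgfem(tol, d):
    # Table 1: h ~ tol and M ~ tol**-2 balance the two MCGFEM error terms
    return tol ** -(2 + d)

def work_kh_sgfem(tol, d, N, q):
    # Table 1: h ~ tol and k ~ (tol/N)**(1/(q+1)) balance the two SGFEM error terms
    return (1 + q) ** N * (tol / N) ** (-N / (q + 1)) * tol ** -d

d, N, q = 2, 4, 3          # assumed: 2d domain, 4 Karhunen-Loeve terms, cubics in y
for tol in [1e-2, 1e-3, 1e-4]:
    print(tol, work_mcgfem(tol, d), work_kh_sgfem(tol, d, N, q))
# With N/2 < 1 + q the SGFEM work grows more slowly as tol -> 0, as in Section 8.1.
```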
Similarly, if we are interested in controlling the difference ||E[u] − E[u_h]||_{L2(D)}, the application of (4.2) and Properties 4.1 and 4.2 for the MCGFEM method and Theorem 5.2 on the superconvergence of the k×h−SGFEM method imply the results shown in Table 2. In this case k×h−SGFEM is likely to be preferred whenever N/4 < (q + 1) and TOL is sufficiently small. In addition, the comparison tells us that, to be competitive with the Monte Carlo method when the number of relevant terms in the Karhunen-Loève expansion is not so small, an optimal method should have a high order of approximation and should avoid as much as possible the coupling between the different components of the numerical solution, to preserve computational efficiency.

                  MCGFEM             k×h−version SGFEM                   p×h−version SGFEM
Work              M/h^d              (1+q)^N/(h^d k^N)                   (1+p)^N/h^d
L2(D) Error       h² + 1/√M          h² + k^(2(q+1))                     h² + r^(2(p+1))
L2(D) Work*       TOL^(−(2+d/2))     (TOL)^(−N/(2(q+1))) TOL^(−d/2)      (log_r(TOL))^N TOL^(−d/2)

Table 2. Approximation of the function E[u] in L2(D). Asymptotical numerical complexity for the MCGFEM and SGFEM methods.
The approach proposed by Ghanem and Spanos [24] based on orthogonal polynomials has, whenever the approximate diffusion satisfies (1.2), a high order of approximation but introduces coupling between the different components of the numerical solution. The uncoupling can be achieved for linear equations using double orthogonal polynomials, see the description in Section 7. With this motivation, Section 6 studies the convergence of the p×h−SGFEM.

8.2. MCGFEM versus p×h−SGFEM. Here we consider the computational work to achieve a given accuracy, for both the p×h−version of SGFEM defined in (6.1) and the MCGFEM method for the approximation of E[u] defined in Section 4, i.e. we are interested in controlling the difference ||E[u] − E[u_h^p]||_{L2(D)} or ||E[u] − (1/M) Σ_{j=1}^M u_h(·; ω_j)||_{L2(D)}, respectively. This computational work indicates under which circumstances one method may be better suited than the other. Besides this, let us assume that we use in our computations a piecewise linear FEM space in D. When using the MCGFEM method to approximate the expected value of the solution of (1.1), we have the optimal work required to achieve a given desired level of accuracy TOL > 0, cf. (8.2),

Work*_MCGFEM = O( 1/TOL^{2+d/2} ).
On the other hand, if we apply a p×h−version of the SGFEM, with p_i = p, i = 1, . . . , N, the computational error is, cf. Theorem 6.3,

E_SGFEM = O(h²) + O(r^{2(p+1)}),   0 < r < 1,

and the corresponding computational work is, cf. Section 7,

Work_SGFEM = O( (1 + p)^N / h^d ).

Recall that N is the number of terms in the truncated Karhunen-Loève expansion of the coefficients a and f and k is the discretization parameter in the y direction. As before, we can compute the optimal work for the SGFEM method, yielding

Work*_SGFEM ≤ O( (log_r(TOL))^N TOL^{−d/2} )

and the asymptotical comparison

lim_{TOL→0} Work*_SGFEM / Work*_MCGFEM = lim_{TOL→0} (log_r(TOL))^N TOL² = 0.

Therefore, for sufficiently strict accuracy requirements, i.e. sufficiently small TOL, in the computation of E[u], SGFEM requires less computational effort than MCGFEM. The work of Bahvalov and its subsequent extensions, cf. [9, 27, 54, 40], generalizes the standard Monte Carlo method, taking advantage of the available integrand's smoothness and yielding a faster order of convergence. The optimal work of such a method is, for our case, i.e. the approximation of E[u] in L2(D),

O( C(N) TOL^{−1/(1/2 + q/N)} TOL^{−d/2} ),

where it is assumed that the integrand u has bounded derivatives up to order q with respect to y and the integral is done in the N-dimensional unit cube.
The result on the computational work of the p×h−version of the SGFEM presented in this work is then related to the case q = ∞, since u is analytic with respect to y. This analyticity allows the exponential convergence with respect to p, cf. Theorem 6.3. Notice that we only discussed the optimal asymptotical computational work required by both methods; in practice the constants involved in the asymptotic approximations make these comparisons only indicative, not conclusive. In addition, we have only studied the case where the integrals ∫_{Γi} ρ_i y^k dy can be computed exactly for k = 0, 1, . . ., and have not considered the more general case where quadrature rules are needed to approximate such integrals.

Remark 8.1 (Use of higher order FEM approximations on D). Based on Remark 5.1 and following the approach from Sections 8.1 and 8.2, we can discuss the possible use of higher order FEM approximations on D. This use seems a priori always useful for the p×h−version of the SGFEM, while for the k×h−version higher order FEM on D is attractive provided d ≫ N/(q + 1). On the other hand, an MCGFEM piecewise linear approximation on D with error TOL requires the same work as the optimal higher order method with error slightly larger than TOL^{1+d/4}. See [52], p. 1884, for a similar discussion on the weak approximation of ordinary SDEs.

Remark 8.2 (Combination of MCGFEM with SGFEM). It is possible to combine in a natural way the Monte Carlo method with an SGFEM version, partitioning Γ in order to take advantage of their different convergence rates. Split the domain Γ into Γ = Γ_MC ∪ Γ_G, Γ_MC ∩ Γ_G = ∅, and approximate

(8.3)
E[u(Y, ·)] = E[u(Y, ·) | Y ∈ Γ_MC] P(Y ∈ Γ_MC) + E[u(Y, ·) | Y ∈ Γ_G] P(Y ∈ Γ_G),
using SGFEM in Γ_G and MCGFEM in Γ_MC.

Remark 8.3 (Successive approximation method). The work [5] proposes a successive approximation for the solution of (1.1). Like any Neumann expansion with K terms, whenever it converges it has an error of the form Error_BC ≤ C( h + ξ^{K+1}/(1 − ξ) ) for some 0 < ξ < 1. The computational work
to achieve this error is Work_BC ≤ C (N^{K+1} − 1) / ( (N − 1) h^d ). Then, a comparison between this method and the MCGFEM yields that if ξ < 1/√N then the method from [5] is likely to be preferred. On the other hand, for sufficiently small tolerances the p×h−version of the SGFEM is more efficient than the successive approximation.

Acknowledgements

The first author was partially supported by the National Science Foundation (grant DMS-9802367) and the Office of Naval Research (grant N00014-99-1-0724). The second author was partially supported by the Swedish Council for Engineering Science grant # 222-148, the National Science Foundation grant DMS-9802367, and UdelaR and UdeM in Uruguay. The third author was partially supported by the European Union TMR grant No ERBFMRX-CT980234 (Viscosity Solutions and their Applications). The authors would like to thank Prof. Anders Szepessy for many valuable discussions regarding this work.

References

[1] S. Agmon. Lectures on elliptic boundary value problems. D. Van Nostrand Company, Inc., 1965.
[2] K. Alvin, W. Oberkampf, B. Rutheford, and K. Diegert. Methodology for characterizing modeling and discretization uncertainties in computational simulation. Technical Report SAND2000-0515, Sandia National Laboratories, Albuquerque, New Mexico, 2000.
[3] I. Babuška. On randomized solutions of Laplace's equation. Casopis Pest. Mat., 1961.
[4] I. Babuška, B. Andersson, P. J. Smith, and K. Levin. Damage analysis of fiber composites. I. Statistical analysis on fiber scale. Comput. Methods Appl. Mech. Engrg., 172(1-4):27–77, 1999.
[5] I. Babuška and P. Chatzipantelidis. On solving linear elliptic stochastic partial differential equations. Comput. Methods Appl. Mech. Engrg., 191:4093–4122, 2002.
[6] I. Babuˇska and J. Chleboun. Effects of uncertainties in the domain on the solution of Dirichlet boundary value problems in two spatial dimensions. To appear in Num. Math. [7] I. Babuˇska and J. Chleboun. Effects of uncertainties in the domain on the solution of Neumann boundary value problems in two spatial dimensions. To appear in Math. Comp. [8] I. Babuˇska, K-M. Liu, and R. Tempone. Solving stochastic partial differential equations based on the experiment data. Submitted. [9] N. S. Bahvalov. Approximate computation of multiple integrals. Vestnik Moskov. Univ. Ser. Mat. Meh. Astr. Fiz. Him., 1959(4):3–18, 1959. [10] Y. Ben-Haim and I. Elishakoff. Convex models of uncertainty in applied mechanics. Elsevier Scientific Publishing Co., Amsterdam, 1990. [11] P. Billingsley. Probability and measure. J. Wiley & Sons, Inc., second edition, 1986. [12] J. H. Bramble. Multigrid methods. Longman Scientific & Technical, Harlow, 1993. [13] S. C. Brenner and L. R. Scott. The Mathematical Theory of Finite Element Methods. Springer–Verlag, New York, 1994. [14] R. E. Caflisch. Monte Carlo and quasi-Monte Carlo methods. In Acta numerica, 1998, pages 1–49. Cambridge Univ. Press, Cambridge, 1998. [15] P. G. Ciarlet. The finite element methods for elliptic problems. North-Holland, New York, 1987. [16] M.-K. Deb. Solution of stochastic partial differential equations (SPDEs) using Galerkin method: theory and applications. PhD thesis, The University of Texas at Austin, TX, 2000. [17] M.-K. Deb, I. M. Babuˇska, and J. T. Oden. Solution of stochastic partial differential equations using Galerkin finite element techniques. Comput. Methods Appl. Mech. Engrg., 190:6359–6372, 2001. [18] I. Elishakoff, editor. Whys and hows in uncertainty modelling. Springer-Verlag, Vienna, 1999. [19] I. Elishakoff, Y. K. Lin, and L. P. Zhu. Probabilistic and convex modelling of acoustically excited structures. Elsevier Science B.V., Amsterdam, 1994. [20] I. Elishakoff and Y. Ren. The bird’s eye view on finite element method for structures with large stochastic variations. Comput. Methods Appl. Mech. Engrg., 168(1-4):51–61, 1999. [21] L. Evans. Partial Differential Equations, volume 19 of Graduate Studies in Mathematics. AMS, 1998. [22] R. Ghanem and J. Red-Horse. Propagation of probabilistic uncertainty in complex physical systems using a stochastic finite element approach. Phys. D, 133(1-4):137–144, 1999. [23] R. Ghanem and S. Shigehiro. Simulation of multi-dimensional non-gaussian non-stationary random fields. Probabilistic Mechanics, 2002. to appear. [24] R. G. Ghanem and P. D. Spanos. Stochastic finite elements: a spectral approach. Springer-Verlag, New York, 1991. [25] G. H. Golub and C. F. Van Loan. Matrix computations. Johns Hopkins University Press, Baltimore, MD, third edition, 1996. [26] W. Gui and I. Babuˇska. The h, p and h-p versions of the finite element method in 1 dimension. I. The error analysis of the p-version. Numer. Math., 49(6):577–612, 1986. [27] S. Haber. Stochastic quadrature formulas. Math. Comp., 23:751–764, 1969. [28] W. Hackbusch. Multigrid methods and applications. Springer-Verlag, Berlin, 1985. [29] E. Haugen. Probabilistic Mechanical Design. John Wiley & Sons, New York, 1980. [30] H. Holden, B. Øksendal, J. Ubøe, and T. Zhang. Stochastic partial differential equations. Birkh¨ auser Boston Inc., Boston, MA, 1996. A modeling, white noise functional approach. [31] J. L. Jensen, L. W. Lake, P. W. Corbett, and D. Logan. Statistics for Petroleum Engineers and Geoscientists. Prentice Hall, 1997. [32] E. 
Jouini, J. Cvitani´ c, and M. Musiela, editors. Option Pricing, Interest Rates and Risk Management. Cambridge University Press, 2001. [33] M. Kleiber and T-D Hien. The stochastic finite element method. John Wiley & Sons Ltd., Chichester, 1992. [34] P. E. Kloeden and E. Platen. Numerical solution of stochastic differential equations. Springer-Verlag, Berlin, 1992. [35] J. E. Lagnese. General boundary value problems for differential equations of sobolev type. SIAM J. Math. Anal., 3(1):105–119, February 1972. [36] S. Larsen. Numerical analysis of elliptic partial differential equations with stochastic input data. PhD thesis, University of Maryland, MD, 1986. ´ [37] P. L´ evy. Processus stochastiques et mouvement Brownien. Editions Jacques Gabay, Paris, 1992. [38] G. Maymon. Some Engineering Applications in Random Vibrations and Random Structures, volume 178 of Progress in Astronautics and Aeronautics. American Institute of Aeronautics and Astronautics, Inc., Reston, Virginia, 1998. 29
[39] M. P. Mignolet and P. D. Spanos. A direct determination of ARMA algorithms for the simulation of stationary random processes. Internat. J. Non-Linear Mech., 25(5):555–568, 1990. [40] E. Novak. Stochastic properties of quadrature formulas. Numer. Math., 53(5):609–620, 1988. [41] W. Oberkampf, S. DeLand, B. Rutheford, and K. Diegert. Estimation of total uncertainty in modeling and simulation. Technical Report SAND2000-0824, Sandia National Laboratories, Albuquerque, New Mexico, April 2000. [42] B. Øksendal. Stochastic differential equations. An introduction with applications. Springer–Verlag, Berlin, fifth edition, 1998. [43] P. A. Raviart and J. M. Thomas. Introduction ´ a l’analyse num´ erique des ´ equations aux d´ eriv´ ees partielles. Masson, Paris, 1988. [44] J. Red-Horse, T. Paez, R. Field, and V. Romero. Nondeterministic analysis of mechanical systems. Technical Report SAND2000-0890, Sandia National Laboratories, Albuquerque, New Mexico, April 2000. [45] F. Riesz and B. Sz.-Nagy. Functional analysis. Dover Publications Inc., New York, 1990. [46] P. J. Roache. Verification and Validation in Computational Science and Engineering. Hermosa, Albuquerque, New Mexico, 1998. [47] J. B. Roberts and P.-T. D. Spanos. Random vibration and statistical linearization. John Wiley & Sons Ltd., Chichester, 1990. [48] C. Schwab and R.-A. Todor. Finite element method for elliptic problems with stochastic data. Technical report, Seminar for Applied Mathematics, ETHZ, 8092 Zurich, Switzerland, April 2002. Research Report 2002-05. [49] I. M. Sobol0 . A primer for the Monte Carlo method. CRC Press, Boca Raton, FL, 1994. [50] I. M. Sobol0 . On quasi-Monte Carlo integrations. Math. Comput. Simulation, 47(2-5):103–112, 1998. [51] J. S´ olnes. Stochastic processes and random vibrations. John Wiley & Sons Ltd., 1997. Theory and practice. [52] A. Szepessy, R. Tempone, and G. E. Zouraris. Adaptive weak approximation of stochastic differential equations. Comm. Pure Appl. Math., 54(10):1169–1214, 2001. [53] T. G. Theting. Solving Wick-stochastic boundary value problems using a finite element method. Stochastics Stochastics Rep., 70(3-4):241–270, 2000. [54] J. F. Traub and A. G. Werschulz. Complexity and information. Cambridge University Press, Cambridge, 1998. [55] G. V˚ age. Variational methods for PDEs applied to stochastic partial differential equations. Math. Scand., 82(1):113–137, 1998. [56] A. M. Yaglom. Correlation theory of stationary and related random functions. Vol. I. Springer-Verlag, New York, 1987. [57] A. M. Yaglom. Correlation theory of stationary and related random functions. Vol. II. Springer-Verlag, New York, 1987.