APPROXIMATION OF INFINITELY DIVISIBLE RANDOM VARIABLES WITH APPLICATION TO THE SIMULATION OF STOCHASTIC PROCESSES

MAGNUS WIKTORSSON

Centre for Mathematical Sciences
Mathematical Statistics

Mathematical Statistics
Centre for Mathematical Sciences
Lund Institute of Technology
Lund University
Box 118
SE-221 00 Lund
Sweden
http://www.maths.lth.se/

Doctoral Theses in Mathematical Sciences 2001:1
ISSN 1404-0034
ISBN 91-628-4640-X
LUTFMS-1014-2001

Magnus Wiktorsson, 2001
Printed in Sweden by KFS AB, Lund 2001
Contents

Acknowledgments                                                        iii

List of papers                                                           v

Introduction                                                             1
  1  Historical overview                                                 1
  2  Stochastic integrals                                                1
  3  Stochastic differential equations                                   4
  4  Numerical approximations of SDEs                                    7
  5  Infinitely divisible distributions                                 10

A  On the simulation of iterated Itô integrals                          19
  1  Introduction                                                       20
  2  Distributional properties of the iterated Itô integral             22
  3  Simulation algorithms                                              24
  4  Improved rate of convergence through tail approximation            30
  5  Applications to the simulation of SDEs                             37

B  Joint characteristic function and simultaneous simulation of
   iterated Itô integrals for multiple independent Brownian motions     43
  1  Introduction                                                       43
  2  Representation of the iterated Itô integrals                       46
  3  Conditional joint characteristic function of the stochastic
     area integrals                                                     47
  4  Simulation of the iterated Itô integrals                           51

C  Improved convergence rate for the simulation of Lévy processes
   of type G                                                            65
  1  Introduction                                                       66
  2  Lévy processes                                                     66
  3  Representations of the tail-sum process                            72
  4  Simulation algorithms                                              74
  5  Convergence of the subordinator                                    82
  6  Generalisations                                                    86

D  Simulation of stochastic integrals with respect to Lévy processes
   of type G                                                            93
  1  Introduction                                                       93
  2  Lévy processes                                                     94
  3  Stochastic integrals with respect to type G Lévy processes         96
Acknowledgments

I would like to thank ... Mona and James for making the practical things work. My supervisor Tobias for careful proofreading and helpful suggestions. Everybody at the department for the pleasant and open atmosphere. All the PhD students for extended lunch-breaks and countless discussions about life, the universe and everything. My sister Maria for being there and my parents for me being here. My lovely daughter Klara for reminding me that there is more to life than mathematics, although her fifth birthday coincides with my thesis defence, and her mother Kristina for coping with a self-centred thesis-writer.
List of papers

This thesis is based on four papers, referred to in the text by the capital letters A, B, C and D.

A. Tobias Rydén & Magnus Wiktorsson (2001): On the simulation of iterated Itô integrals. Stochastic Process. Appl. 91, 151–168.

B. Magnus Wiktorsson (2001): Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions. To appear in Annals of Applied Probability.

C. Magnus Wiktorsson: Improved convergence rate for the simulation of Lévy processes of type G. Working paper.

D. Magnus Wiktorsson: Simulation of stochastic integrals with respect to Lévy processes of type G. Working paper.
Introduction

1 Historical overview

Stochastic differential equations and stochastic calculus form a vast field of study. In a few pages it is virtually impossible to give anything more than a very brief outline of the field. It all started in the summer of 1827, when the biologist Robert Brown discovered that small particles from inside pollen grains, when suspended in water and observed in a microscope, moved in a highly irregular manner. This phenomenon, which we now call "Brownian motion", was described in Brown (1828, 1829). An explanation of Brownian motion was not given until the end of the century: the highly irregular movement is caused by water molecules repeatedly hitting the small particles. The first more mathematical treatment of Brownian motion was given by Bachelier (1900), who was interested in modelling stock prices¹. A quantitative explanation of the phenomenon was given by Einstein (1905). Einstein also gave the density of the movement along the x-axis during a time interval t, as well as a physical explanation of the scale parameter in the density function. In the following decades Brownian motion was placed in a stricter mathematical framework through the works of Norbert Wiener (1923, 1924) and Paul Lévy (1939, 1948). Stochastic integrals with respect to Brownian motion were first constructed by Itô (1944).
2 Stochastic integrals

An ordinary Riemann (Riemann–Stieltjes) integral is defined as a limit of sum approximations. Let $\{t_k^{(n)}\}$, $k = 0, 1, \ldots, n$, for each $n > 0$ be an increasing (in $k$) sequence of time-points with $t_0^{(n)} = T_0$ and $t_n^{(n)} = T_f$, such that $\max_k (t_{k+1}^{(n)} - t_k^{(n)}) \to 0$ as $n \to \infty$. Then

$$\int_{T_0}^{T_f} f(t)\, dt = \lim_{n \to \infty} \sum_{k=0}^{n-1} f(\tilde t_k^{(n)})\, (t_{k+1}^{(n)} - t_k^{(n)}),$$

where $\tilde t_k^{(n)}$ is some arbitrary point in the interval $[t_k^{(n)}, t_{k+1}^{(n)}]$. Now assume that we want to define a stochastic integral in which we replace the infinitesimal $dt$ by $dW(t)$, where the stochastic process $W(t, \omega)$ is a Brownian motion defined on a probability space $(\Omega, \mathcal{F}, P)$. The above definition cannot be employed to define the stochastic integral

$$I[f](\omega) = \text{``} \int_{T_0}^{T_f} f(t, \omega)\, dW(t, \omega) \text{''},$$
¹ This was 73 years before Black & Scholes' famous article about option pricing, where they used geometric Brownian motion as a model for stock prices (Black & Scholes, 1973).
since the Brownian motion $W(t)$ does not have finite variation. As a consequence of this, we cannot in general define stochastic integrals path by path in the Riemann–Stieltjes sense. Assume that:

(i) $f(t, \omega)$ is jointly measurable in $(t, \omega)$ for $T_0 \le t \le T_f$;

(ii) $f(t, \cdot)$, $T_0 \le t \le T_f$, is adapted to the filtration $\mathcal{F}_t = \sigma(W(s),\, s \le t)$;

(iii) the Riemann integral $\int_{T_0}^{T_f} f(t, \omega)^2\, dt$ has finite expectation.

We can then define the stochastic integral as a limit in mean square sense, i.e.

$$I[f] = \operatorname*{L^2\text{-}lim}_{n \to \infty} \sum_{k=0}^{n-1} f(\tilde t_k^{(n)})\, (W(t_{k+1}^{(n)}) - W(t_k^{(n)})).$$
However, this limit depends on at which point $\tilde t_k^{(n)}$ in each time-interval we choose to evaluate the function $f(\cdot)$. A classical example of this (following Øksendal, 1996) is to take $f(\cdot) = W(\cdot)$ and compare the limits $I_1[W]$ and $I_2[W]$ resulting from choosing $\tilde t_k^{(n)} = t_k^{(n)}$ and $\tilde t_k^{(n)} = t_{k+1}^{(n)}$, respectively. Thus define

$$I_1[f] = \operatorname*{L^2\text{-}lim}_{n \to \infty} \sum_{k=0}^{n-1} W(t_k^{(n)})\, (W(t_{k+1}^{(n)}) - W(t_k^{(n)})),$$

$$I_2[f] = \operatorname*{L^2\text{-}lim}_{n \to \infty} \sum_{k=0}^{n-1} W(t_{k+1}^{(n)})\, (W(t_{k+1}^{(n)}) - W(t_k^{(n)})).$$
Then, by the independent increments of Brownian motion,

$$\mathrm{E}\, I_1[f] = \mathrm{E} \lim_{n \to \infty} \sum_{k=0}^{n-1} W(t_k^{(n)})\, (W(t_{k+1}^{(n)}) - W(t_k^{(n)})) = 0,$$

but

$$\mathrm{E}\, I_2[f] = \mathrm{E} \lim_{n \to \infty} \sum_{k=0}^{n-1} W(t_{k+1}^{(n)})\, (W(t_{k+1}^{(n)}) - W(t_k^{(n)})) = \lim_{n \to \infty} \sum_{k=0}^{n-1} \mathrm{E}\bigl[(W(t_{k+1}^{(n)}) - W(t_k^{(n)}))^2\bigr] = \lim_{n \to \infty} \sum_{k=0}^{n-1} (t_{k+1}^{(n)} - t_k^{(n)}) = t_n^{(n)} - t_0^{(n)} = T_f - T_0.$$
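The dependence on the evaluation point is easy to see numerically. The sketch below (an illustration added here, not part of the thesis) simulates discretised Brownian paths and compares the left-endpoint and right-endpoint sums; their Monte Carlo means come out close to $0$ and $T_f - T_0$, respectively.

```python
import numpy as np

def endpoint_sums(n_paths=20000, n_steps=200, T=1.0, seed=1):
    """Monte Carlo means of the left-endpoint (Ito-type) and
    right-endpoint sums sum_k W(t_k) dW_k and sum_k W(t_{k+1}) dW_k."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    W = np.cumsum(dW, axis=1)                                # W(t_1), ..., W(t_n)
    W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])  # W(t_0), ..., W(t_{n-1})
    I1 = np.sum(W_left * dW, axis=1)   # left-endpoint (Ito) sum
    I2 = np.sum(W * dW, axis=1)        # right-endpoint sum
    return I1.mean(), I2.mean()

m1, m2 = endpoint_sums()
# m1 is close to 0 and m2 is close to T = 1, matching E I1 = 0, E I2 = Tf - T0
```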
The two most common evaluation points are (a) $\tilde t_k^{(n)} = t_k^{(n)}$, which leads to the Itô integral, and (b) $\tilde t_k^{(n)} = (t_k^{(n)} + t_{k+1}^{(n)})/2$, which leads to the Stratonovich integral. These integrals are denoted by $\int_{T_0}^{T_f} f(t)\, dW(t)$ and $\int_{T_0}^{T_f} f(t) \circ dW(t)$, respectively. The Stratonovich calculus obeys the usual chain rule while the Itô calculus does not. The Itô integral, as opposed to the Stratonovich integral, is a martingale. It is also possible to define stochastic integrals with respect to more general processes than Brownian motion, for example continuous local martingales (see e.g. Karatzas & Shreve, 1991) or $\alpha$-stable Lévy motions (Janicki, Michna & Weron, 1996); in the last case the integral is defined as a limit in probability. More generally we can integrate with respect to semimartingales (see e.g. Jacod & Shiryaev, 1987; Protter, 1990). Recall that a semimartingale $X$ on a filtration $\{\mathcal{F}_t\}$ can be decomposed as $X = X_0 + M + A$, where $X_0$ is $\mathcal{F}_0$-measurable, $M$ is a local martingale and $A$ is a process of finite variation over bounded intervals. An important subclass of the semimartingales is the class of Lévy processes, i.e. semimartingales with independent stationary increments starting at zero. We will treat the class of Lévy processes more thoroughly below. If the integrator is a Brownian motion one can instead weaken the conditions on the integrand. For instance, if we drop the adaptedness condition on the integrand we get the Skorohod integral, which we denote $\int_0^T f(t)\, \delta W(t)$ (see e.g. Øksendal, 1996, pp. 2.1–2.6). For adapted integrands the Skorohod integral and the Itô integral coincide. From ordinary integration we are used to the fact that $\int_0^1 f(1) g(t)\, dt = f(1) \int_0^1 g(t)\, dt$, i.e. we can move out any factor that does not depend on the integration variable. The Skorohod integral does not obey this rule (unless the integrand is adapted, of course), which we see from the following simple example. If we calculate the integral $\int_0^T W(T)\, \delta W(t)$ we get $\int_0^T W(T)\, \delta W(t) = W(T)^2 - T$, as opposed to $W(T) \int_0^T dW(t) = W(T)^2$.
The Itô formula

As mentioned above, the Itô integral with respect to integrators of infinite variation does not obey the usual chain rule of classical calculus. Let $X(t)$ be a stochastic Itô process defined as

$$X(t) = X(0) + \int_0^t e(s, \omega)\, ds + \int_0^t f(s, \omega)\, dW(s),$$

where $e$ and $f$ satisfy conditions (i)–(iii) above. Let $g(t, x) \in C^{1,2}([0, \infty) \times \mathbb{R})$ and $Y(t) = g(t, X(t))$. It then holds that

$$dY(t) = \left( \frac{\partial g(t, X(t))}{\partial t} + e(t, \omega)\, \frac{\partial g(t, X(t))}{\partial x} + \frac{1}{2} f(t, \omega)^2\, \frac{\partial^2 g(t, X(t))}{\partial x^2} \right) dt + f(t, \omega)\, \frac{\partial g(t, X(t))}{\partial x}\, dW(t),$$

which should be interpreted as

$$Y(t) = Y(0) + \int_0^t \left( \frac{\partial g(s, X(s))}{\partial s} + e(s, \omega)\, \frac{\partial g(s, X(s))}{\partial x} + \frac{1}{2} f(s, \omega)^2\, \frac{\partial^2 g(s, X(s))}{\partial x^2} \right) ds + \int_0^t f(s, \omega)\, \frac{\partial g(s, X(s))}{\partial x}\, dW(s).$$

Taking $e \equiv 0$, $f \equiv 1$ and $g(t, x) = x^2$ yields the classical example

$$\frac{1}{2} W(t)^2 = \frac{1}{2} \int_0^t ds + \int_0^t W(s)\, dW(s),$$

which shows that

$$\int_0^t W(s)\, dW(s) = \frac{W(t)^2 - t}{2}.$$

We note that a modified version of the Itô formula is valid also for semimartingales (see e.g. Protter, 1990, Theorem 32, p. 71).
3 Stochastic differential equations
We now want to give meaning to the stochastic differential equation (SDE)

$$\text{``} \frac{dX(t)}{dt} = a(X(t), t) + b(X(t), t)\, \frac{dW(t)}{dt} \text{''}.$$

The term "$dW(t)/dt$" can be viewed as white noise in continuous time. The trouble is that "$dW(t)/dt$" does not even exist as a stochastic process in the ordinary sense. It is, however, possible to define it in a distributional sense; by distribution is here meant a generalised function, e.g. a function such as the Dirac delta. White noise must then be interpreted as a random distribution, or more precisely as a probability measure on the space of tempered distributions (the dual space of the rapidly decreasing smooth functions) (see e.g. Hida, 1980). In order to get around this rather technical approach we simply interpret

$$\text{``} \frac{dX(t)}{dt} = a(X(t), t) + b(X(t), t)\, \frac{dW(t)}{dt}, \quad X(0) = X_0 \text{''}$$

as one of the following integral equations,

$$X(t) = X_0 + \int_0^t a(X(s), s)\, ds + \int_0^t b(X(s), s)\, dW(s)$$

or

$$X(t) = X_0 + \int_0^t a(X(s), s)\, ds + \int_0^t b(X(s), s) \circ dW(s).$$
The first equation is in the Itô sense and the second one is in the Stratonovich sense. If the function $b(t, x)$ has a continuous derivative with respect to $x$, it holds that the Itô SDE

$$dX(t) = a(X(t), t)\, dt + b(X(t), t)\, dW(t), \quad X(0) = X_0, \tag{3.1}$$

has the same solution as the Stratonovich SDE

$$dX(t) = \left( a(X(t), t) - \frac{1}{2}\, b(X(t), t)\, \frac{\partial b(X(t), t)}{\partial x} \right) dt + b(X(t), t) \circ dW(t), \quad X(0) = X_0.$$
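As a concrete illustration of this correspondence (a worked example added here, not from the original text), consider geometric Brownian motion, for which the drift correction can be computed explicitly:

```latex
% Ito SDE for geometric Brownian motion:
%   dX(t) = a X(t) dt + b X(t) dW(t).
% Here b(t,x) = bx, so (1/2) b(t,x) (d/dx) b(t,x) = (1/2)(bx)(b) = (b^2/2) x,
% and the equivalent Stratonovich SDE is
\[
  dX(t) = \Bigl(a - \tfrac{b^2}{2}\Bigr) X(t)\, dt + b\, X(t) \circ dW(t).
\]
% Since Stratonovich calculus obeys the ordinary chain rule, this form can be
% solved like an ODE, which is consistent with the explicit solution
% X(t) = X_0 \exp\bigl((a - b^2/2)\,t + b\,W(t)\bigr).
```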
Uniqueness and existence of strong and weak solutions

If the drift term $a$ and the dispersion term $b$ satisfy:

(i) $a(t, x)$ and $b(t, x)$ are jointly measurable in $(t, x) \in [T_0, T_f] \times \mathbb{R}^d$;

(ii) the Lipschitz condition

$$|a(t, x) - a(t, y)| + |b(t, x) - b(t, y)| \le L\, |x - y| \quad \text{for } x, y \in \mathbb{R}^d,$$

for some positive constant $L$;

(iii) the linear growth bound

$$|a(t, x)| + |b(t, x)| \le C (1 + |x|) \quad \text{for } x \in \mathbb{R}^d,$$

for some positive constant $C$;

(iv) the initial value $X(T_0)$ is $\mathcal{F}_{T_0}$-measurable and has finite variance;

then there exists a unique strong solution to the SDE, i.e. for every given Brownian motion $W(t)$ we can find a solution $X(t)$ which is adapted to the filtration generated by $W(t)$. The Lipschitz condition provides uniqueness and the linear growth bound gives global existence of the solution. If (ii) is violated the uniqueness of the solution cannot be guaranteed. A classical example of this is the ordinary differential equation (ODE)

$$\frac{dX(t)}{dt} = \sqrt{X(t)}, \quad X(0) = 0,$$

which has the solutions

$$X(t) = 0 \quad \text{and} \quad X(t) = \begin{cases} 0, & t \le C, \\ \frac{1}{4}(t - C)^2, & t > C, \end{cases}$$

for any $C \ge 0$. If (iii) is violated, existence can in general only be guaranteed for a limited amount of time, which usually is a function of the initial value. A deterministic example of this is the ODE

$$\frac{dX(t)}{dt} = X^2(t), \quad X(0) = X_0 > 0,$$

which for $t \in [0, T_e)$, with $T_e = 1/X_0$, has the solution

$$X(t) = \frac{1}{T_e - t}.$$

The time $T_e$ is usually referred to as the explosion time. The solution to the ODE thus reaches infinity in finite time. There exist SDEs which do not have any strong solution, not even a non-unique one with limited existence in time. Such SDEs can, however, have a weak solution, i.e. we can find a pair of processes $(X(t), W(t))$ on some probability space with a filtration $\{\mathcal{F}_t\}$ such that $X(t)$ is $\mathcal{F}_t$-adapted and $W(t)$ is a Brownian motion which is a martingale with respect to $\mathcal{F}_t$, so that the SDE is satisfied. For SDEs without any strong solutions we can of course not have strong uniqueness, but we can still have weak uniqueness, i.e. any two processes that are weak solutions to the SDE have the same finite-dimensional distributions.
Examples of SDEs with explicit solutions

We now give some examples of SDEs which have explicit solutions. The linear SDE

$$dX(t) = a X(t)\, dt + b X(t)\, dW(t), \quad X(0) = X_0,$$

has the solution

$$X(t) = X_0 \exp\bigl( (a - b^2/2)\, t + b\, W(t) \bigr).$$

This process is called geometric Brownian motion.
All SDEs of the form

$$dX(t) = \frac{1}{2}\, b(X(t))\, b'(X(t))\, dt + b(X(t))\, dW(t), \quad X(0) = X_0,$$

have solutions of the form

$$X(t) = g^{-1}\bigl( W(t) + g(X_0) \bigr),$$

where $g(x) = \int^x \frac{1}{b(y)}\, dy$. The following SDE is an example of an SDE having this form:

$$dX(t) = -\frac{1}{2}\, a^2 X(t)\, dt + a \sqrt{1 - X^2(t)}\, dW(t), \quad X(0) = X_0,$$

with $|X_0| \le 1$, which has the solution

$$X(t) = \sin\bigl( a W(t) + \arcsin(X_0) \bigr).$$

These examples and several more can be found in (Kloeden & Platen, 1995, pp. 118–127).
4 Numerical approximations of SDEs

In the general case we cannot find an explicit solution to an SDE. Therefore it is necessary to compute numerical solutions. A numerical scheme is a time discretisation and a set of rules for advancing the solution from one discrete time-point to the next one. In some cases the schemes use intermediate time-points when updating the solution. Let $\{t_n^{(N)}\}$, $n = 0, 1, \ldots, N$, be a discretisation of the interval $[T_0, T_f]$, i.e. a sequence of time-points that satisfy

(i) $T_0 = t_0^{(N)} < t_1^{(N)} < \cdots < t_{N-1}^{(N)} < t_N^{(N)} = T_f$;

(ii) $\max_n (t_{n+1}^{(N)} - t_n^{(N)}) \to 0$ as $N \to \infty$.

The sequence of time-points can be deterministic or stochastic. The stochastic time-points can be chosen to depend on the trajectory of the underlying Brownian motion (see e.g. Gaines & Lyons, 1997). This approach is often referred to as adaptive step size control. The simplest choice of discretisation, however, is the fixed step size, i.e. $t_n^{(N)} = T_0 + hn$, $n = 0, 1, \ldots, N$, where $h = (T_f - T_0)/N$ is called the step size. Assume that the functions $a(\cdot)$ and $b(\cdot)$ of the Itô SDE (3.1) satisfy the conditions defined above which guarantee a unique strong solution. Let, for the fixed step size
$h > 0$, $\tilde X_n^{(h)}$ be a numerical approximation of (3.1) defined at the discrete time-points $t_n^{(N)}$, $n = 0, 1, \ldots, N$. We can then define a numerical approximation $X^{(h)}(t)$ for an arbitrary $t \in [T_0, T_f]$ by interpolation. The straightforward approach is to use linear interpolation between the discrete time-points. Note that with linear interpolation we obtain a $t$-continuous approximation. A numerical scheme $\{X^{(h)}(t),\, 0 \le t \le T\}$ is said to converge at rate $h^\gamma$ if $\mathrm{E}\,|X(T) - X^{(h)}(T)| = O(h^\gamma)$ as $h \to 0$. The simplest algorithm is the Euler scheme, which is given by

$$X^{(h)}(t + h) = X^{(h)}(t) + a(X^{(h)}(t), t)\, h + b(X^{(h)}(t), t)\, \Delta W(t, t+h), \quad X^{(h)}(T_0) = X(T_0),$$

where $\Delta W(t, t+h) = W(t+h) - W(t)$ and $t = T_0 + nh$, $n = 0, 1, \ldots, N - 1$. This scheme has a convergence rate of $h^{1/2}$ as $h \to 0$. There are also numerical schemes converging faster than $h^{1/2}$. For example, Milshtein (1974) proposed a scheme, which in the one-dimensional case is given by

$$X^{(h)}(t + h) = X^{(h)}(t) + a(X^{(h)}(t), t)\, h + b(X^{(h)}(t), t)\, \Delta W(t, t+h) + \frac{\Delta W^2(t, t+h) - h}{2}\, b'(X^{(h)}(t), t)\, b(X^{(h)}(t), t), \quad X^{(h)}(0) = X(0),$$
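The two schemes can be compared numerically. The sketch below (an illustration with assumed parameter values, not from the thesis) applies both to geometric Brownian motion, whose exact solution $X(T) = X_0 \exp((\mu - \sigma^2/2)T + \sigma W(T))$ lets us measure the strong error directly; the Milshtein error is markedly smaller, reflecting strong order 1 versus 1/2.

```python
import numpy as np

def strong_errors(mu=0.5, sigma=1.0, X0=1.0, T=1.0, n_steps=100,
                  n_paths=5000, seed=2):
    """Mean absolute error at time T of the Euler and Milshtein schemes
    for dX = mu*X dt + sigma*X dW, against the exact GBM solution."""
    rng = np.random.default_rng(seed)
    h = T / n_steps
    dW = rng.normal(0.0, np.sqrt(h), size=(n_paths, n_steps))
    Xe = np.full(n_paths, X0)   # Euler iterate
    Xm = np.full(n_paths, X0)   # Milshtein iterate
    for k in range(n_steps):
        dw = dW[:, k]
        Xe = Xe + mu * Xe * h + sigma * Xe * dw
        # extra Milshtein term: (dW^2 - h)/2 * b'(x) b(x) with b(x) = sigma*x
        Xm = Xm + mu * Xm * h + sigma * Xm * dw + 0.5 * sigma**2 * Xm * (dw**2 - h)
    W_T = dW.sum(axis=1)
    X_exact = X0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W_T)
    return np.mean(np.abs(X_exact - Xe)), np.mean(np.abs(X_exact - Xm))

err_euler, err_milshtein = strong_errors()
```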
and it converges at rate $h$ as $h \to 0$, provided that $a \in C^{1,1}(\mathbb{R} \times [0, T])$ and $b \in C^{2,1}(\mathbb{R} \times [0, T])$. The Milshtein scheme is in fact just a truncated Itô–Taylor expansion. In the multi-dimensional case, where $X(t) \in \mathbb{R}^d$, $W(t) \in \mathbb{R}^m$, $a : \mathbb{R}^d \times [0, T] \to \mathbb{R}^d$ and $b : \mathbb{R}^d \times [0, T] \to \mathbb{R}^{d \times m}$, the Milshtein scheme becomes more complicated. To simplify the notation we suppress the evaluation points of the functions $a$ and $b$, thus letting $a$ denote $a(X^{(h)}(t), t)$ and $b$ denote $b(X^{(h)}(t), t)$. The $k$th component of the scheme is then given by
$$X_k^{(h)}(t + h) = X_k^{(h)}(t) + a_k h + \sum_{i=1}^m b_{ki}\, \Delta W_i(t, t+h) + \sum_{i=1}^m \sum_{j=1}^m L^i b_{kj}\, I_{ij}(t, t+h), \quad X^{(h)}(0) = X(0),$$

where

$$L^i = \sum_{l=1}^d b_{li}\, \frac{\partial}{\partial x_l}$$

and $I_{ij}(t, t+h) = \int_t^{t+h} (W_i(s) - W_i(t))\, dW_j(s)$. The integral $I_{ij}(t, t+h)$ is called a multiple or iterated Itô integral. These integrals and the increments of the Brownian motion are difficult to
simulate simultaneously with prescribed precision. Therefore we need to approximate $I_{ij}(t, t+h)$ for $i, j = 1, \ldots, m$ conditioned on the increments of the Brownian motion, cf. the next section. If $L^i b_{kj} = L^j b_{ki}$ for $i, j = 1, \ldots, m$ and $k = 1, \ldots, d$, then the Milshtein scheme simplifies to

$$X_k^{(h)}(t + h) = X_k^{(h)}(t) + \left( a_k - \frac{1}{2} \sum_{j=1}^m L^j b_{kj} \right) h + \sum_{i=1}^m b_{ki}\, \Delta W_i(t, t+h) + \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m L^i b_{kj}\, \Delta W_i(t, t+h)\, \Delta W_j(t, t+h), \quad X^{(h)}(0) = X(0).$$

This is called the commutative case. In this case we do not have to simulate the iterated Itô integrals to obtain the convergence rate of order $h$. We note that, just as in the ODE case, there are explicit and implicit numerical schemes. Moreover there is the whole family of different stochastic Runge-Kutta schemes (see e.g. Burrage, 1999). In the non-commutative case it is necessary to approximate the various iterated Itô integrals, also for these numerical schemes, to obtain a convergence rate faster than $O(h^{1/2})$ (Rümelin, 1982).
Approximation of iterated Itô integrals

In order to obtain a convergence rate of order $h$ for a numerical scheme we need to approximate the iterated Itô integrals with a mean square error (MSE) of $C h^3$, where the positive constant $C$ is chosen so that $C h^3$ is negligible compared to the one-step discretisation error of the numerical scheme. Therefore it is important to have fast algorithms that can generate the iterated Itô integrals with an MSE of prescribed order. Kloeden, Platen & Wright (1992) gave an approximation of the iterated Itô integrals based on a truncated sum representation derived from the Fourier expansion of the Brownian bridge process. If the sum is truncated after $n$ terms, an MSE of order $h^2/n$ is obtained. In the case where $m = 2$, i.e. we have a two-dimensional Brownian motion as the driving term in the SDE, there is only one iterated Itô integral $I_{12}(t, t+h)$ that needs to be approximated, since $I_{21}(t, t+h) = \Delta W_1(t, t+h)\, \Delta W_2(t, t+h) - I_{12}(t, t+h)$. Gaines & Lyons (1994) proposed a method for the exact simulation of $I_{12}(t, t+h)$. It is, however, quite complicated to implement. Lévy (1951) showed that the conditional distribution of $I_{12}(t, t+h)$ given $\Delta W_1(t, t+h), \Delta W_2(t, t+h)$ is infinitely divisible (ID) (cf. the next section). In (A) we propose an alternative method for the approximation of $I_{12}(t, t+h)$, based on a truncated shot-noise representation (cf. the next section) of an infinitely divisible distribution with an additional tail-sum approximation, which has an MSE of order $h^2/n^2$. We also show that the tail-sum is asymptotically Gaussian. In (B) we propose an approximation method for arbitrary $m$ with an MSE of order $h^2/n^2$. The method in (B) is based on a truncation of an infinite sum representation and a tail-sum approximation. We also show that this tail-sum is asymptotically Gaussian and calculate its covariance matrix. Moreover we give an explicit coupling between the true tail-sum and its approximation. In addition we calculate the previously unknown conditional joint characteristic function of the $m(m-1)/2$ iterated Itô integrals obtained when pairing $m$ independent Wiener processes, given the increments of the Wiener processes over the integration interval, i.e. the $m$-dimensional random variable $(W_1(t+h) - W_1(t), \ldots, W_m(t+h) - W_m(t))$.
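For intuition, the antisymmetry relation $I_{21} = \Delta W_1 \Delta W_2 - I_{12}$ can be checked numerically. The sketch below (an illustration, not the thesis's algorithm) approximates $I_{12}$ and $I_{21}$ by fine-grid Itô sums; the discrete sums satisfy an exact summation-by-parts identity whose correction term, the sum of products of increments, vanishes as the grid is refined.

```python
import numpy as np

def iterated_ito_fine(n_fine=4000, h=1.0, seed=3):
    """Fine-grid Ito-sum approximations of I12 and I21 on [0, h]."""
    rng = np.random.default_rng(seed)
    dt = h / n_fine
    dW1 = rng.normal(0.0, np.sqrt(dt), n_fine)
    dW2 = rng.normal(0.0, np.sqrt(dt), n_fine)
    # left-endpoint values W_i(t_k), starting from W_i(0) = 0
    W1 = np.concatenate([[0.0], np.cumsum(dW1)])[:-1]
    W2 = np.concatenate([[0.0], np.cumsum(dW2)])[:-1]
    I12 = np.sum(W1 * dW2)   # approximates int (W1(s) - W1(0)) dW2(s)
    I21 = np.sum(W2 * dW1)
    return I12, I21, dW1.sum(), dW2.sum(), np.sum(dW1 * dW2)

I12, I21, D1, D2, cross = iterated_ito_fine()
# summation by parts gives the exact discrete identity
#   I12 + I21 + cross == D1 * D2   (up to rounding),
# and cross -> 0 as the grid is refined, so I21 = D1*D2 - I12 in the limit
```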
5 Infinitely divisible distributions
A random variable $X$ on $\mathbb{R}$ is said to be infinitely divisible (ID) if for every $n$ there exist i.i.d. random variables $X_1^{(n)}, \ldots, X_n^{(n)}$ such that

$$X \stackrel{d}{=} X_1^{(n)} + \cdots + X_n^{(n)}.$$

This implies that the characteristic function $\varphi_X(t)$ of $X$ can be written as

$$\varphi_X(t) = \bigl( \varphi_{X^{(n)}}(t) \bigr)^n,$$

where $\varphi_{X^{(n)}}(t)$ is a characteristic function for each $n \ge 1$. Every characteristic function of an ID random variable $X$ can be written in the following form, the so-called Lévy–Khinchine canonical representation,

$$\varphi_X(\theta) = \exp\left( i a \theta - \frac{\sigma^2 \theta^2}{2} + \int_{\mathbb{R}} \bigl( e^{i \theta x} - 1 - i \theta x\, I(|x| \le 1) \bigr)\, L(dx) \right),$$

where $L$ is called the Lévy measure. If $\sigma^2 = 0$ then $X$ is said to have no Gaussian component. Any positive $\sigma$-finite measure $L$ which assigns finite mass to sets bounded away from zero and satisfies

$$\int_{\mathbb{R}} \min(1, x^2)\, L(dx) < \infty$$

can be used as a Lévy measure. If the ID random variable is positive, the Laplace transform of the distribution can be represented as

$$\Psi_X(\theta) = \mathrm{E}\, e^{-\theta X} = \exp\left( -a \theta + \int_0^\infty \bigl( e^{-\theta x} - 1 \bigr)\, L(dx) \right), \quad a \ge 0.$$

The Lévy measure $L$ then satisfies the stronger integrability condition

$$\int_0^\infty \min(1, x)\, L(dx) < \infty.$$
Any ID random variable $X$ with no Gaussian component and Lévy measure $L$ can be represented as an infinite series of shot-noise type, i.e.

$$X \stackrel{d}{=} \sum_k Y(T_k),$$

where $\{Y(u),\, u > 0\}$ is a family of independent random variables such that $\int_0^\infty P(Y(u) \in dy)\, du = L(dy)$, and $\{T_k\}$ are the points of a homogeneous unit-rate Poisson process on $\mathbb{R}_+$ that is independent of $\{Y(u)\}$.
5.1 Lévy processes

We now state some elementary properties of Lévy processes. For a more general treatment see Bertoin (1996) and Sato (1999). A Lévy process $\{X(t)\}$ is a stochastic process with independent stationary increments and $X(0) = 0$. Every Lévy process $X(t)$ can be decomposed as

$$X(t) = at + \sigma W(t) + Z(t),$$

where $at$ is a linear drift, $W(t)$ is a standard Wiener process and $Z(t)$ is a pure jump process. Note that the process $Z(t)$ has no fixed jump times. The distribution of $X(1)$ completely determines the finite-dimensional distributions of $\{X(t)\}$. Moreover, there is a one-to-one correspondence between the infinitely divisible distributions and the distributions of $X(1)$. If a Lévy process has finite expectation and variance, they are linear functions of $t$, i.e.

$$\mathrm{E}\, X(t) = t\, \mathrm{E}\, X(1), \qquad \operatorname{Var} X(t) = t \operatorname{Var} X(1).$$
5.2 Simulation of Lévy processes

There are many different ways of simulating Lévy processes. Perhaps the simplest method is to define a grid of time-points and simulate the increments between the grid points. This method works fine if we can easily generate the increments and if we are only interested in approximating the Lévy process on a grid. If we cannot simulate the increments exactly, or if we want to have a representation for each $t$ in some closed bounded set, $[0, 1]$ say, we can use shot-noise type series representations of the Lévy process. A shot-noise representation of a Lévy process $\{X(t),\, 0 \le t \le 1\}$ can easily be obtained by a random thinning of the shot-noise representation of the ID random variable $X(1)$. More precisely, we have that

$$X(t) \stackrel{d}{=} \sum_k Y(T_k)\, I(U_k \le t),$$
where $\{T_k\}$ and $\{Y(u)\}$ are as defined above and $\{U_k\}$ is an i.i.d. sequence of random variables uniformly distributed on $(0, 1)$. From this representation we see that the jumps of a Lévy process are i.i.d. random variables and that the corresponding jump-times are uniformly distributed over the simulation interval. For a general discussion of series representations of shot-noise type for Lévy processes, see Rosiński (2000). We have two qualitatively different situations, called the finite jump rate case and the infinite jump rate case. The jump rate is the average number of jumps in an interval of unit length, and this is further the same as the total mass of the Lévy measure. In the finite case there are a.s. only a finite number of jumps in any time interval of bounded length, whereas with infinite jump rate we have a.s. a countably infinite number of jumps in each compact interval of positive length. If we have finite jump rate, the Lévy process is a compound Poisson process. In the case of infinite jump rate we can write the Lévy process as an infinite sum of independent compound Poisson processes. We have, however, only finitely many jumps (upwards or downwards) larger than $\varepsilon$, for any $\varepsilon > 0$, even with infinite jump rate. This says that most of the jumps are very small. The main idea when simulating Lévy processes by series representations is to include all (upwards and downwards) jumps greater than some fixed level, $\varepsilon$ say. The remaining small jumps can either be neglected or approximated by some process that we can simulate. Asmussen & Rosiński (2000) show that, for a large class of Lévy processes, the process consisting of the remaining jumps converges weakly to a Brownian motion as the maximum jump-size tends to zero, if we subtract its mean value and scale it by its standard deviation at time $t = 1$. In paper (C) we treat the approximation of Lévy processes of type G.
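The truncate-and-compensate idea can be sketched for a concrete infinite-jump-rate example. All specifics below, the symmetric truncated stable-like Lévy measure $\nu(dx) = |x|^{-1-\alpha}\, I(0 < |x| \le 1)\, dx$ and the parameter values, are assumptions chosen for illustration, not taken from the thesis: jumps larger than $\delta$ form a compound Poisson process with explicit rate and jump distribution, and the neglected small jumps are replaced, following the Asmussen–Rosiński idea, by a centred Gaussian with matching variance $\sigma(\delta)^2 = \int_{|x| \le \delta} x^2\, \nu(dx)$.

```python
import numpy as np

def sample_X1(alpha=0.5, delta=0.01, n_samples=20000, seed=5):
    """Approximate X(1) for the symmetric Levy measure
    nu(dx) = |x|^(-1-alpha) dx on 0 < |x| <= 1: jumps of size > delta
    simulated exactly (compound Poisson), small jumps replaced by a
    Gaussian with variance sigma(delta)^2 = 2*delta^(2-alpha)/(2-alpha)."""
    rng = np.random.default_rng(seed)
    lam = 2.0 * (delta**(-alpha) - 1.0) / alpha         # rate of jumps > delta
    sigma_small = np.sqrt(2.0 * delta**(2.0 - alpha) / (2.0 - alpha))
    out = np.empty(n_samples)
    for i in range(n_samples):
        n = rng.poisson(lam)
        u = rng.random(n)
        # inverse-CDF sample of |jump| on (delta, 1]
        mag = (delta**(-alpha) - u * (delta**(-alpha) - 1.0))**(-1.0 / alpha)
        signs = rng.choice([-1.0, 1.0], size=n)
        out[i] = np.sum(signs * mag) + sigma_small * rng.normal()
    return out

x = sample_X1()
# by symmetry E X(1) = 0, and Var X(1) = 2/(2 - alpha) = 4/3 here
```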
A real-valued random variable $X$ is said to be of type G if $X \stackrel{d}{=} V^{1/2} G$, where $G$ is a standard Gaussian variable and $V$ is a non-negative infinitely divisible random variable. A Lévy process is said to be of type G if its increments are of type G. Every real-valued type G Lévy process can be seen as a subordinated Brownian motion, i.e. $X(t) \stackrel{d}{=} W(V(t))$, $t \ge 0$, where $\stackrel{d}{=}$ means equality in finite-dimensional distributions, $W$ is a standard Brownian motion and $V$ is a non-negative increasing Lévy process. The process $V$ is called a subordinator. Rosiński (1991) suggested

$$X(t) = \sum_k G_k\, g(T_k)^{1/2}\, I(U_k \le t), \quad 0 \le t \le 1,$$

as a series representation of a type G Lévy process with no Gaussian component, where $\{U_k\}$ is an i.i.d. sequence of uniform variables on $(0, 1)$, $\{G_k\}$ is an i.i.d. sequence of standard Gaussian variables and $\{T_k\}$ are the points of a homogeneous Poisson process on $\mathbb{R}_+$ with unit intensity. The function $g$ is the generalised inverse of the tail of the Lévy measure $M$, defined as

$$g(u) = \inf\{x > 0 : M((x, \infty)) \le u\}, \tag{5.1}$$
where $M$ is the Lévy measure of the subordinator $V$. In paper (C) we show that, for the class of type G Lévy processes, we can for each fixed truncation level for the jumps of the subordinator obtain an explicit coupling between the remainder process and a scaled Brownian motion. We use this coupling to approximate Lévy processes of type G and calculate the mean integrated square error (MISE) of the approximation. We also show that it is possible to generalise this coupling idea to any real-valued Lévy process obtained by subordination of a Lévy process. This can be used for the approximation of the Lévy process provided that the subordinand can be simulated exactly. For the case where the subordinand has two finite moments we calculate the MISE for this approximation.
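As a sketch of how the series representation with the tail inverse (5.1) is used in practice, the example below assumes a subordinator Lévy measure $M(dx) = x^{-1-\rho}\, dx$ on $(0, 1]$ (this measure and all parameter values are illustrative assumptions, not the thesis's choices). For this $M$ the tail is $M((x, \infty)) = (x^{-\rho} - 1)/\rho$ for $x \le 1$, so the generalised inverse is available in closed form, $g(u) = (1 + \rho u)^{-1/\rho}$.

```python
import numpy as np

def type_g_path(rho=0.5, t_grid=None, tau_max=2000.0, seed=6):
    """Truncated Rosinski series X(t) = sum_k G_k g(T_k)^(1/2) 1(U_k <= t)
    for the subordinator Levy measure M(dx) = x^(-1-rho) dx on (0, 1],
    whose tail inverse is g(u) = (1 + rho*u)^(-1/rho). Poisson points
    T_k > tau_max are dropped; the neglected tail contribution
    integral_{tau_max}^inf g(u) du is small for large tau_max."""
    if t_grid is None:
        t_grid = np.linspace(0.0, 1.0, 11)
    rng = np.random.default_rng(seed)
    n = rng.poisson(tau_max)               # unit-rate Poisson points on (0, tau_max]
    T = rng.uniform(0.0, tau_max, n)
    U = rng.random(n)
    G = rng.normal(size=n)
    jumps = G * (1.0 + rho * T)**(-0.5 / rho)   # G_k * g(T_k)^(1/2)
    return np.array([np.sum(jumps[U <= t]) for t in t_grid])

path = type_g_path()
# Var X(1) = E V(1) = integral_0^1 x M(dx) = 1/(1 - rho) = 2 for rho = 0.5
```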
5.3 Stochastic integrals with respect to Lévy processes

In paper (D) we study stochastic integrals of the form

$$Z(t) = \int_0^S f(t, s, \omega)\, dX(s),$$

where $X(s)$ is a type G Lévy process and $f(t, s)$ is adapted in $s$ for each $t \in [0, S]$, with càdlàg (RCLL) paths.

Representations of the stochastic integrals

Rosiński (1991) suggested

$$Z(t) \stackrel{d}{=} \sum_k G_k\, g(T_k)^{1/2}\, f(t, U_k), \quad 0 \le t \le S,$$

as a series representation of stochastic integrals with respect to a type G Lévy process with no Gaussian component, where $\{U_k\}$, $\{G_k\}$ and $\{T_k\}$ are as defined above. Depending on the properties of $f$ we have to use different approaches to obtain useful approximations of the stochastic integral $Z(t)$. The above series representation is useful when we can simulate $f$ exactly but not $X$ exactly, e.g. when $f$ is a stochastic process independent of $X$ which can easily be simulated, or when $f$ is a deterministic function. If the problem instead is to approximate $f$, we have to use a different approach. The difficult case is when we can simulate neither $X$ nor $f$ exactly. For certain special cases it is still possible to obtain good approximations. One such case is when $f$ is a smooth function of $X$. In paper (D) we propose approximation algorithms for the case where $f$ is of finite variation on compacts and has four finite moments. If $f$ is independent of $X$ it is enough for $f$ to have two finite moments. We also propose an approximation for the case where $f$ is a smooth function of $X$ and $X$ has finite variation. It is in fact possible to drop the finite variation condition if $f$ is independent of $X$. This case will be treated in the next section.
Stochastic time change representation

Stochastic time change representations of stochastic integrals with respect to symmetric stable Lévy processes were first given by Rosiński & Woyczyński (1986), together with a necessary and sufficient condition for the existence of these stochastic integrals. Let $X(t)$ be a symmetric $\alpha$-stable Lévy process with $0 < \alpha \le 2$. We then have that

$$Z(t) = \int_0^t f(s, \omega)\, dX(s) = \tilde X\!\left( \int_0^t |f(s)|^\alpha\, ds \right) \quad \text{a.s.},$$

where $\tilde X \stackrel{d}{=} X$, provided that $f$ satisfies the condition $\int_0^t |f(s)|^\alpha\, ds < \infty$ a.s. for any finite $t$. Moreover, the process $\tilde X(t)$ can explicitly be constructed as $\tilde X(t) = Z(\tau(t))$, where

$$\tau(t) = \inf\left\{ s \ge 0 : \int_0^s |f(u)|^\alpha\, du \ge t \right\}.$$

Kallenberg (1992) generalised these results to asymmetric stable Lévy processes and indicated possible multi-dimensional extensions. Kallsen & Shiryaev (2000) showed that this time change property is valid only for the class of $\alpha$-stable Lévy processes. In paper (D) we show that a modification of the time change representation is valid also for type G Lévy processes in the finite-dimensional distribution sense, provided that the integrand $f$ is independent of the integrator $X$. Using this representation we obtain an approximation of the stochastic integrals also in the case where the integrand is not of finite variation, provided that it is independent of the integrator and a.s. square integrable with respect to the subordinator $V$.
References

Asmussen, S. & Rosiński, J. (2000). Approximations of small jumps of Lévy processes with a view towards simulation. Preprint. Available at: http://www.math.utk.edu/~rosinski/manuscripts.html

Bachelier, L. (1900). Théorie de la spéculation. Ann. Sci. École Norm. Sup. 17, 21–86.

Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge.

Black, F. & Scholes, M. (1973). The pricing of options and corporate liabilities. J. Polit. Economy 81, 637–659.
Brown, R. (1828). A brief account of microscopical observations made in the months of June, July & Aug., 1827, on the particles contained in the pollen of plants; and on the existence of active molecules in organic & inorganic bodies. Phil. Mag. 4, 161–173.

Brown, R. (1829). Additional remarks on active molecules. Phil. Mag. 6, 161–166.

Burrage, P.M. (1999). Runge-Kutta Methods for Stochastic Differential Equations. PhD Thesis, Dept. Maths., University of Queensland, Australia.

Einstein, A. (1905). Über die von der molekularkinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Ann. Physik 17, 549–560.

Gaines, J.G. & Lyons, T.J. (1994). Random generation of stochastic area integrals. SIAM J. Appl. Math. 54, 1132–1146.

Gaines, J.G. & Lyons, T.J. (1997). Variable step size control in the numerical solution of stochastic differential equations. SIAM J. Appl. Math. 57, 1455–1484.

Hida, T. (1980). Brownian Motion. Springer-Verlag, New York.

Itô, K. (1944). Stochastic integral. Proc. Imperial Acad. Tokyo 20, 519–524.

Jacod, J. & Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin.

Janicki, A., Michna, Z. & Weron, A. (1996). Approximation of stochastic differential equations driven by α-stable Lévy motion. Appl. Math. (Warsaw) 24, 149–168.
Kallenberg, O. (1992). Some time change representations of stable integrals, via transformations of local martingales. Stoch. Proc. Appl. 40, 199–223.

Kallsen, J. & Shiryaev, A.N. (2000). Time Change Representation of Stochastic Integrals. Preprint. Available at: http://neyman.mathematik.uni-freiburg.de/~kallsen/

Karatzas, I. & Shreve, S.E. (1991). Brownian Motion and Stochastic Calculus. Springer-Verlag, New York.

Kloeden, P.E., Platen, E. & Wright, W. (1992). The approximation of multiple stochastic integrals. Stoch. Anal. Appl. 10, 431–441.

Kloeden, P.E. & Platen, E. (1995). Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin.

Lévy, P. (1939). Sur certains processus stochastiques homogènes. Compositio Math. 7, 283–339.
Lévy, P. (1948). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.

Lévy, P. (1951). Wiener's random functional and other Laplacian random functionals. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability, J. Neyman, ed., pp. 171–187, University of California Press, Berkeley.

Milshtein, G.N. (1974). Approximate integration of stochastic differential equations. Theor. Prob. Appl. 19, 557–562.

Øksendal, B. (1995). Stochastic Differential Equations: An Introduction with Applications, 4th ed. Springer-Verlag, Berlin.

Øksendal, B. (1996). An Introduction to Malliavin Calculus with Applications to Economics. Working Paper no. 3/96, Norwegian School of Economics, Bergen, Norway. (Lecture notes.) Available at: http://www.nhh.no/for/wp/1996/0396.pdf

Protter, P. (1990). Stochastic Integration and Differential Equations. Springer-Verlag, Berlin.

Rosiński, J. & Woyczyński, W.A. (1986). On Itô stochastic integration with respect to p-stable motion: Inner clock, integrability of sample paths, double and multiple integrals. Ann. Prob. 14, 271–286.

Rosiński, J. (1991). On a class of infinitely divisible processes represented as mixtures of Gaussian processes. In Stable Processes and Related Topics, Cambanis, S., Samorodnitsky, G. & Taqqu, M.S. (eds.). Birkhäuser, Boston, 405–430.

Rosiński, J. (2000). Series representations of Lévy processes from the perspective of point processes. In Lévy Processes – Theory and Applications, Barndorff-Nielsen, O.E., Mikosch, T. & Resnick, S.I. (eds.). Birkhäuser, Boston.

Rümelin, W. (1982). Numerical treatment of stochastic differential equations. SIAM J. Numer. Anal. 19, 604–613.

Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge.

Wiener, N. (1923). Differential space. J. Math. Phys. 2, 131–174.

Wiener, N. (1924). Un problème de probabilités dénombrables. Bull. Soc. Math. France 52, 569–578.
Paper A

On the simulation of iterated Itô integrals

Tobias Rydén & Magnus Wiktorsson
Centre for Mathematical Sciences, Lund University, Box 118, 221 00 Lund, Sweden
Abstract

We consider algorithms for simulation of iterated Itô integrals with application to the simulation of stochastic differential equations. The fact that the iterated Itô integral

$$ I_{ij}(t_n, t_n+h) = \int_{t_n}^{t_n+h} \int_{t_n}^{s} dW_i(u)\, dW_j(s), $$

conditioned on $W_i(t_n+h) - W_i(t_n)$ and $W_j(t_n+h) - W_j(t_n)$, has an infinitely divisible distribution is utilised for the simultaneous simulation of $I_{ij}(t_n, t_n+h)$, $W_i(t_n+h) - W_i(t_n)$ and $W_j(t_n+h) - W_j(t_n)$. Different simulation methods for the iterated Itô integrals are investigated. We show mean square convergence rates for approximations of shot-noise type and asymptotic normality of the remainder of the approximations. This, together with the fact that the conditional distribution of $I_{ij}(t_n, t_n+h)$, apart from an additive constant, is a Gaussian variance mixture, is used to achieve an improved convergence rate. This is done by a coupling method for the remainder of the approximation.
Keywords: iterated Itô integral, infinitely divisible distribution, multi-dimensional stochastic differential equation, numerical approximation, type G distribution, variance mixture, coupling

2000 Mathematics Subject Classification: Primary 60H05; Secondary 60H10
1 Introduction

The numerical solution of stochastic differential equations (SDEs) has attracted quite a lot of attention over the years. Consider the multi-dimensional SDE

$$ dX(t) = b(X(t), t)\, dt + \sigma(X(t), t)\, dW(t), \qquad (1.1) $$

where $X(t)$ is a $d$-dimensional vector and $W(t)$ is an $m$-dimensional vector of independent standard Brownian motions. The functions $b(X(t), t)$ and $\sigma(X(t), t)$ are measurable mappings from $\mathbb{R}^d \times \mathbb{R}$ to $\mathbb{R}^d$ and from $\mathbb{R}^d \times \mathbb{R}$ to $\mathbb{R}^{d \times m}$, respectively. The above equation is here interpreted in the Itô sense. A solution to (1.1) is said to be strong if there exists a solution for each given Wiener process $W$. A solution is said to be path-wise unique if any two strong solutions to (1.1) for a given Wiener process $W$ and a given initial value $X(0)$ have the same sample paths a.s. If the functions $b(x, t)$ and $\sigma(x, t)$ are Lipschitz continuous in $x$ and satisfy a linear growth condition in $x$, then a unique strong solution to (1.1) exists. If the linear growth condition is violated the solution can "explode", i.e. reach infinity in finite time. This means that the solution only exists on a bounded time interval, whose length in general is a function of the initial value. This is of course a well-known problem, present already in the deterministic setup.

Explicit solutions to (1.1) can be found only in a few special cases, so that in general we are confined to computing numerical solutions. A sequence $\{X^h(t),\ 0 \le t \le T\}$ of numerical approximations, for $h \downarrow 0$, of a strong solution $\{X(t),\ 0 \le t \le T\}$ is said to converge at rate $O(h^\gamma)$ if

$$ \mathrm{E}\,|X(T) - X^h(T)| = O(h^\gamma) \quad \text{as } h \downarrow 0. $$

Here $h$ is called the step size. The diffusion coefficient $\sigma$ of the SDE (1.1) satisfies the so-called commutativity condition if

$$ L^i \sigma_{kj} = L^j \sigma_{ki}, \qquad i, j = 1, \dots, m,\ \ k = 1, \dots, d, \qquad (1.2) $$

where the differential operator $L^i$ is given by

$$ L^i = \sum_{l=1}^{d} \sigma_{li}(x, t)\, \frac{\partial}{\partial x_l}. $$

In the general case where $\sigma(x, t)$ does not satisfy (1.2), it is not possible to generate numerical approximations converging faster than $O(h^{1/2})$ unless the iterated Itô integrals

$$ I_{ij}(t_n, t_n+h) = \int_{t_n}^{t_n+h} \int_{t_n}^{s} dW_i(u)\, dW_j(s) $$
are included in the numerical scheme (see e.g. Rümelin, 1982). Milshtein (1974) proposed a numerical scheme that converges strongly at rate $O(h)$ if $b \in C^{1,1}(\mathbb{R}^d \times \mathbb{R})$ and $\sigma \in C^{2,1}(\mathbb{R}^d \times \mathbb{R})$. In this scheme the $k$th component of the approximation is given by

$$ X_k^h(t_n+h) = X_k^h(t_n) + b_k h + \sum_{i=1}^{m} \sigma_{ki}\, \Delta W_i(t_n, t_n+h) + \sum_{i=1}^{m} \sum_{j=1}^{m} L^i \sigma_{kj}\, I_{ij}(t_n, t_n+h), \qquad X^h(t_0) = X(t_0), $$

where $\Delta W_i(t_n, t_n+h) = W_i(t_n+h) - W_i(t_n)$.

In the present paper we study methods for simulation of iterated Itô integrals. Since the distributions of $I_{ij}(t_n, t_n+h)$, $\Delta W_i(t_n, t_n+h)$ and $\Delta W_j(t_n, t_n+h)$ do not depend on $t_n$, we hereafter set $t_n = 0$ and write $\Delta W_i(h)$ for $\Delta W_i(0, h)$, $\Delta W_j(h)$ for $\Delta W_j(0, h)$ and $I_{ij}(h)$ for $I_{ij}(0, h)$. Note that $I_{ii}(h) = (\Delta W_i(h))^2/2 - h/2$. It is quite a difficult task to simultaneously generate the iterated Itô integrals $I_{ij}(h)$ and Wiener increments $\Delta W_i(h)$ and $\Delta W_j(h)$ with prescribed accuracy. Kloeden & Platen (1995, p. 347) describe an approximate method based on Fourier expansion of the Brownian bridge process. Gaines & Lyons (1994) suggest a method based on Marsaglia's "rectangle-wedge-tail" method for the case $m = 2$. In the present paper we consider a number of different methods that make use of the fact that the iterated Itô integrals, conditioned on the Wiener increments, have an infinitely divisible distribution. We also show this infinitely divisible distribution to be, apart from an additive constant, of so-called class G type. The methods cover the case $m = 2$. In simulating strong approximations of a given SDE it is of primary interest to generate approximations of iterated Itô integrals with a prescribed mean square error (MSE), see e.g. Kloeden & Platen (1995, pp. 362–363); we therefore focus our attention on this measure of deviation below.

Lévy (1951) calculated the characteristic function of the conditional distribution of the so-called Lévy stochastic area integral. This integral, denoted by $A_{ij}(h)$, is defined as

$$ A_{ij}(h) = \frac{I_{ij}(h) - I_{ji}(h)}{2}; \qquad (1.3) $$

obviously $A_{ii}(h) = 0$. There is also another important relation between $A_{ij}(h)$ and $I_{ij}(h)$, $\Delta W_i(h)$ and $\Delta W_j(h)$. Using

$$ I_{ij}(h) + I_{ji}(h) = \Delta W_i(h)\, \Delta W_j(h) \quad \text{a.s.}, \quad i \ne j, $$

it is clear that

$$ I_{ij}(h) = A_{ij}(h) + \frac{\Delta W_i(h)\, \Delta W_j(h)}{2} \quad \text{a.s.}, \quad i \ne j. \qquad (1.4) $$
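The Milshtein update is mechanical once the Wiener increments and the iterated Itô integrals are supplied. A minimal Python sketch of one step of the scheme (the helper names and calling convention are ours, not the paper's):

```python
import numpy as np

def milstein_step(x, t, h, dW, I, b, sigma, Lsigma):
    """One Milshtein step for a d-dimensional SDE driven by m Brownian motions.

    x      : state, shape (d,)
    dW     : Brownian increments over [t, t+h], shape (m,)
    I      : iterated Ito integrals, I[i, j] ~ I_ij(t, t+h), shape (m, m)
    b      : drift, b(x, t) -> shape (d,)
    sigma  : dispersion, sigma(x, t) -> shape (d, m)
    Lsigma : Lsigma(x, t)[i, k, j] = (L^i sigma_kj)(x, t), shape (m, d, m)
    """
    s = sigma(x, t)
    # correction term: sum_{i,j} (L^i sigma_kj) * I_ij for each component k
    corr = np.einsum("ikj,ij->k", Lsigma(x, t), I)
    return x + b(x, t) * h + s @ dW + corr
```

For a single Brownian motion and $\sigma(x) = x$ (geometric Brownian motion) the step reduces to the familiar $x + x\,\Delta W + x(\Delta W^2 - h)/2$, using $I_{11}(h) = (\Delta W^2 - h)/2$.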
2 Distributional properties of the iterated Itô integral

2.1 Characteristic functions

Lévy (1951) (see also Talacko, 1956; Lévy, 1965, pp. 329–333) showed that the characteristic function of the conditional distribution of $A_{ij}(h)$ given $\Delta W_i(h)$ and $\Delta W_j(h)$ is

$$ \varphi_{A_{ij}(h)\,|\,\Delta W_i(h), \Delta W_j(h)}(t) = \frac{th/2}{\sinh(th/2)} \exp\!\left( -\frac{\sigma^2}{2} \big( (th/2) \coth(th/2) - 1 \big) \right), $$

where

$$ \sigma^2 = \big( \Delta W_i(h)^2 + \Delta W_j(h)^2 \big) / h. \qquad (2.1) $$

Hence (1.4) gives

$$ \varphi_{I_{ij}(h)\,|\,\Delta W_i(h), \Delta W_j(h)}(t) = \frac{th/2}{\sinh(th/2)} \exp\!\left( -\frac{\sigma^2}{2} \big( (th/2) \coth(th/2) - 1 \big) + itha \right), \qquad (2.2) $$

where $a = \Delta W_i(h)\, \Delta W_j(h) / (2h)$. From (2.2) it is evident that

$$ I_{ij}(h) \stackrel{d}{=} h\, I_{ij}(1). $$

The conditional characteristic function $\varphi_{I_{ij}(h)\,|\,\Delta W_i(h), \Delta W_j(h)}(t)$ can be viewed as the characteristic function of a sum $Y_1(h) + Y_2(h) + Y_3(h)$ of three independent random variables. The first one, $Y_1(h)$, has characteristic function $\varphi_{Y_1(h)}(t) = (th/2)/\sinh(th/2)$, which is the characteristic function of a logistic random variable. We can view $Y_1(h)$ as the distribution of $I_{ij}(h)$ conditioned on $\Delta W_i(h) = \Delta W_j(h) = 0$. We can generate $Y_1(h)$ by the inverse method, i.e. pick $U \sim U(0,1)$ and let $Y_1(h) = (h/2\pi) \log(U/(1-U))$. The second random variable, $Y_2(h)$, has characteristic function

$$ \varphi_{Y_2(h)}(t) = \exp\!\left( -\frac{\sigma^2}{2} \big( (th/2) \coth(th/2) - 1 \big) \right), \qquad (2.3) $$

and the third one, $Y_3(h)$, has a distribution degenerate at $ah$. From a simulation point of view $Y_2(h)$ is the difficult part, since there is no known closed form for its distribution function. Lévy (1951) proved that $I_{ij}(h)$ has an infinitely divisible distribution. We also see that $Y_1(h)$, $Y_2(h)$ and $Y_3(h)$ have infinitely divisible distributions. Before proceeding we recall some facts about such distributions.
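The inverse-method recipe for $Y_1(h)$ is immediate to implement (a minimal illustration, not code from the paper); the scale $h/2\pi$ gives $\mathrm{Var}\,Y_1(h) = (h/2\pi)^2 \pi^2/3 = h^2/12$:

```python
import numpy as np

def sample_Y1(h, size, rng):
    """Logistic component Y1(h) via the inverse method:
    (h / (2*pi)) * log(U / (1 - U)) with U ~ U(0, 1)."""
    u = rng.uniform(size=size)
    return (h / (2.0 * np.pi)) * np.log(u / (1.0 - u))
```

$Y_3(h) = ah$ is deterministic given the Wiener increments, so only $Y_2(h)$ requires a non-trivial algorithm.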
2.2 Infinitely divisible distributions

A random variable $X$ is said to be infinitely divisible (ID) if for every $n$ there exist i.i.d. random variables $X_1^{(n)}, \dots, X_n^{(n)}$ such that

$$ X \stackrel{d}{=} X_1^{(n)} + \dots + X_n^{(n)}. $$

This implies that the characteristic function of $X$, $\varphi_X(t)$, can be written as

$$ \varphi_X(t) = \big( \varphi_{X^{(n)}}(t) \big)^n, $$

where $\varphi_{X^{(n)}}(t)$ is a characteristic function for each $n \ge 1$ (see e.g. Breiman, 1968, pp. 191–192). The characteristic function of an ID random variable can be written in the following form, the so-called Lévy–Khinchine canonical representation,

$$ \varphi_X(t) = \exp\!\left( ita + \int_{-\infty}^{\infty} \Big( e^{itx} - 1 - \frac{itx}{1+x^2} \Big) \frac{1+x^2}{x^2}\, d\theta(x) \right), $$

where $\theta(x)$ is called the Lévy–Khinchine measure (see e.g. Lukacs, 1970). Another possible representation is the Lévy canonical representation

$$ \varphi_X(t) = \exp\!\left( ita - \frac{\sigma^2 t^2}{2} + \int_{-\infty}^{0^-} \Big( e^{itx} - 1 - \frac{itx}{1+x^2} \Big)\, dM(x) + \int_{0^+}^{\infty} \Big( e^{itx} - 1 - \frac{itx}{1+x^2} \Big)\, dN(x) \right), $$

where

$$ M(u) = \int_{-\infty}^{u} \frac{1+x^2}{x^2}\, d\theta(x) \quad (u < 0), \qquad N(u) = -\int_{u}^{\infty} \frac{1+x^2}{x^2}\, d\theta(x) \quad (u > 0), \qquad \sigma^2 = \theta(\{0\}). $$

If $\sigma^2 = 0$ then $X$ is said to have no Gaussian component. If $X$ has finite variance the somewhat simpler Kolmogorov representation

$$ \varphi_X(t) = \exp\!\left( ita + \int_{-\infty}^{\infty} \big( e^{itx} - 1 - itx \big) \frac{dK(x)}{x^2} \right) $$

can be used (see e.g. Lukacs, 1970). If the Kolmogorov measure $K(x)$ has no mass at zero then $X$ has no Gaussian component. If $X$ has a symmetric distribution the corresponding characteristic function is real and symmetric. For a symmetric ID random variable with finite variance and no Gaussian component we have the representation

$$ \varphi_X(t) = \exp\!\left( \int_{0}^{\infty} 2 \big( \cos(tx) - 1 \big) \frac{dK(x)}{x^2} \right). \qquad (2.4) $$
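Representation (2.4) is easy to check numerically on a familiar example. For the standard Laplace distribution the Kolmogorov measure is $dK(x) = x e^{-x}\,dx$ on $(0,\infty)$ and $\log \varphi(t) = -\log(1+t^2)$; the following sketch (our illustration, not part of the paper) verifies this with a trapezoidal quadrature:

```python
import numpy as np

def log_cf_from_K(t, x):
    """Right-hand side of (2.4) for the standard Laplace law, where
    dK(x) = x * exp(-x) dx, i.e. dK(x)/x^2 = (exp(-x)/x) dx."""
    f = 2.0 * (np.cos(t * x) - 1.0) * np.exp(-x) / x
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))  # trapezoidal rule

grid = np.linspace(1e-8, 60.0, 400_000)
# log phi(t) of the standard Laplace law is -log(1 + t^2)
errors = [log_cf_from_K(t, grid) + np.log(1.0 + t * t) for t in (0.5, 1.0, 2.0)]
```

The discrepancies in `errors` are far below 1e-4, confirming the representation on this example.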
2.3 Properties of the characteristic function of the iterated Itô integral

As stated before, $I_{ij}(h)$ can be viewed as a sum of three independent random variables $Y_1(h)$, $Y_2(h)$ and $Y_3(h)$. We now focus our attention on $Y_2(h)$, with characteristic function (2.3). It is easily seen that this is the characteristic function of an ID random variable. Moreover, since $\varphi_{Y_2(h)}(t)$ is integrable, the distribution of $Y_2(h)$ has a density; in fact $t^n \varphi_{Y_2(h)}(t)$ is integrable for every $n$, giving that the density of $Y_2(h)$ is infinitely differentiable. The characteristic function is itself infinitely differentiable, which implies that $Y_2(h)$ has finite moments of all orders. Indeed, $\varphi_{Y_2(h)}(z)$ is analytic for $-2\pi/h < \operatorname{Im}(z) < 2\pi/h$, whence the distribution has exponential moments. Therefore the tail of the distribution, $T(x)$, is exponentially decreasing, i.e.

$$ T(x) = 1 - F(x) + F(-x) = O(e^{-rx}), \quad r < 2\pi/h, \quad \text{as } x \to \infty, $$

where $F(x)$ is the distribution function of $Y_2(h)$ (Lukacs, 1983, p. 16). The density of $Y_2(h)$ is also unimodal at zero, since $\varphi_{Y_2(h)}(t)$ is real and symmetric, having the representation

$$ \varphi_{Y_2(h)}(t) = \frac{1}{t} \int_{0}^{t} g(u)\, du, $$

where $g(t)$ is a characteristic function; see e.g. Lukacs (1983, p. 49). Lévy (1951) showed that $Y_2(2\pi)$ has Kolmogorov measure

$$ dK(x) = \frac{\sigma^2}{2}\, \frac{x^2 \exp(x)}{(\exp(x) - 1)^2}\, dx. \qquad (2.5) $$

This implies that $Y_2(h)$ has no Gaussian component. The Lévy–Khinchine measure can now be obtained as

$$ d\theta(x) = \frac{\sigma^2}{2}\, \frac{1}{1+x^2}\, \frac{x^2 \exp(x)}{(\exp(x) - 1)^2}\, dx. \qquad (2.6) $$

3 Simulation algorithms
3.1 A generalisation of Bondesson's method

Bondesson (1982) proposed a method for simulating positive ID random variables, based on a so-called shot-noise representation. The basic idea is to approximate the ID random variable by a sum of random variables, one for each point $T_k$ of a homogeneous Poisson process on $(0, \infty)$. Let $\{X(u),\ u \ge 0\}$ be a family of independent (of each other and of the Poisson process) but in general not identically distributed random variables. More precisely, $X(u)$ has distribution function $H(x, u)$, where $\{H(x, u)\}$ is a family of distribution functions on $[0, \infty)$ indexed by $u \in (0, \infty)$, such that $\lambda \int_0^\infty H(dx, u)\, du = N(dx)$, $\lambda$ being the intensity of the Poisson process. This is an integral representation of the Lévy measure $N$ of the ID random variable. Written out in detail, Bondesson's method is as follows:

1. Let $T_k$, $k = 1, 2, \dots$ be the points of a Po($\lambda$) process on $(0, \infty)$, in increasing order.

2. Let $X(T_k) \sim H(x, T_k)$.

3. Define

$$ Z(T) = \sum_{T_k \le T} X(T_k), $$
where T 0 is a truncation time. As T , Z (T ) converges in distribution to the appropriate ID distribution. Since Bondesson’s method only deals with positive ID random variables we need to generalise it to the case of symmetric ID random variables with finite variance. In this case we need that 0 H (dx u) du K (dx) x 2 , i.e. an integral representation of the Kolmogorov measure. This can be shown quite straightforwardly. Let Z be a symmetric ID random variable with finite variance. From (2.4) we have that the characteristic function of Z , Z (t), can be represented as
log
Z (t)
0
2 (cos(tx) 1)
dK (x) x2
where the Kolmogorov measure is symmetric. It easily seen that we can choose each distribution H (dx u) u 0, symmetric as well. Now let T 0 be a truncation time and Tk as above be points of a homogeneous Poisson process J (s) s 0 with intensity . Let
Z (T )
Tk T
X (Tk )
Then
Z (T ) (t)
log E[exp(itZ (T ))]
log EE exp
J (T )
J (T ) The points of the Poisson process on (0 T ) conditioned on J (T ) is the ordered sample log
i
X (Tk )
k 1
from J (T ) i.i.d. random variables uniformly distributed on (0 T ). But the distribution 25
from $J(T)$ i.i.d. random variables uniformly distributed on $(0, T)$. But the distribution of the sum is independent of the ordering of the points, so we can take $T_1, \dots, T_{J(T)}$ to be i.i.d. with $T_k \sim U(0, T)$. Thus

$$ \log \varphi_{Z(T)}(t) = \log \sum_{n=0}^{\infty} \left( \frac{1}{T} \int_0^T \varphi_{X(u)}(t)\, du \right)^{\!n} \exp(-\lambda T)\, \frac{(\lambda T)^n}{n!} = \lambda \int_0^T \big( \varphi_{X(u)}(t) - 1 \big)\, du, $$

where $\varphi_{X(u)}(t)$ is the characteristic function of $X(u)$. Now

$$ \varphi_{X(u)}(t) - 1 = \int_{-\infty}^{\infty} \big( e^{itx} - 1 \big) H(dx, u) = \int_0^\infty 2 \big( \cos(tx) - 1 \big) H(dx, u), $$

since $H(dx, u)$ is symmetric in $x$. Hence

$$ \log \varphi_{Z(T)}(t) = \lambda \int_0^T \int_0^\infty 2 \big( \cos(tx) - 1 \big) H(dx, u)\, du. $$

A change of the order of integration yields

$$ \log \varphi_{Z(T)}(t) = \int_0^\infty 2 \big( \cos(tx) - 1 \big)\, \lambda \int_0^T H(dx, u)\, du. $$

Now if

$$ \lim_{T \to \infty} \lambda \int_0^T H(dx, u)\, du = \lambda \int_0^\infty H(dx, u)\, du = \frac{dK(x)}{x^2}, $$

we have exactly the Kolmogorov representation of $Z$.

There are of course several possible choices of $H(x, u)$. From a practical point of view we want to have control over the behaviour of the tail sum

$$ Z_{\text{tail}}(T) = \sum_{T_k > T} X(T_k). $$

Two extreme cases can be obtained; either the convergence is fast enough for the tail sum to be neglected, or the convergence is slow enough for the tail sum to be approximated by a Gaussian variable. Another important point is that it should be easy to simulate from $H(x, u)$. A further property, which we will utilise below to improve the simulation algorithms, is that the tail sum $Z_{\text{tail}}(T)$ is independent of $Z(T)$. This follows since $\{X(u)\}$ is a family of independent random variables and the Poisson process has independent increments.

As mentioned above, we are interested in the MSE of the approximation. In order to compute the MSE we need to define the random variable $Z$ and its approximation $Z(T)$ on the same probability space. This is easily achieved if the random variable $Z(\infty)$ is well-defined, i.e. if the sum $\sum X(T_k)$ converges, since we can then take $Z = Z(\infty)$. It follows easily from the independent increments of the Poisson process that $Z(\infty)$ has variance

$$ \mathrm{E}\,Z(\infty)^2 = \lambda \int_0^\infty \mathrm{E}\,X(u)^2\, du. $$

Provided this variance is finite, which we have assumed, the above sum converges a.s. and in mean square sense. This follows by using independent increments once again, and by invoking the two-series theorem for the a.s. convergence.
3.2 Method A

According to (2.5) we should choose $H(x, u)$ such that

$$ \lambda \int_0^\infty H(dx, u)\, du = \frac{\sigma^2}{2}\, \frac{\exp(x)}{(\exp(x) - 1)^2}\, dx. $$

One possible choice is to let $H(dx, u)$ have point masses 1/2 at $g(u)$ and $-g(u)$, where $g(u) = \log((1+u)/u)$, and let $\lambda = \sigma^2$. This leads to the following algorithm:

A1. Simulate $Y_2^{(T)}(h) = (h/2\pi)\, Z(T)$ from the generalised Bondesson method with

$$ Z(T) = \sum_{T_k \le T} \log\!\left( \frac{1 + T_k}{T_k} \right) B(T_k), $$

where $\{B(t)\}$ is a family of i.i.d. random variables with

$$ \mathrm{P}(B(t) = 1) = \mathrm{P}(B(t) = -1) = 1/2. $$

As $T \to \infty$, $Y_2^{(T)}(h)$ converges in distribution to $Y_2(h)$.
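Algorithm A1 can be sketched as follows (our illustration of the paper's recipe; the truncated variance used in the check below is $\sigma^2 (h/2\pi)^2 \int_0^T \log(1+1/u)^2\, du$):

```python
import numpy as np

def method_A(h, sigma2, T, size, rng):
    """Shot-noise approximation Y2^(T)(h) of Y2(h) (method A).

    Poisson points of intensity sigma2 on (0, T); each point T_k contributes
    +/- log((1 + T_k)/T_k), with sign +1 or -1 with probability 1/2 each.
    """
    out = np.empty(size)
    for i in range(size):
        npts = rng.poisson(sigma2 * T)          # number of points in (0, T)
        tk = rng.uniform(0.0, T, size=npts)     # unordered Poisson points
        signs = rng.choice([-1.0, 1.0], size=npts)
        out[i] = np.sum(np.log1p(1.0 / tk) * signs)
    return (h / (2.0 * np.pi)) * out
```

Note that the points need not be sorted: only the sum over the points enters.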
3.3 Method B

The characteristic function $\varphi_{Y_2(2\pi)}(t)$ can be written as

$$ \varphi_{Y_2(2\pi)}(t) = \exp\!\left( -\sigma^2 \sum_{k=1}^{\infty} \frac{t^2}{t^2 + k^2} \right) = \prod_{k=1}^{\infty} \exp\!\left( \sigma^2 \Big( \frac{1}{1 + t^2/k^2} - 1 \Big) \right) $$

(Lévy, 1951). Hence $Y_2(h)$ can be viewed as a sum of compound Poisson random variables. This leads to the following simulation algorithm:

B1. Simulate $N_k \sim$ Poisson($\sigma^2$), $k = 1, \dots, n$.

B2. Simulate $X_{ik} \sim$ Laplace($1/k$), $i = 1, \dots, N_k$, $k = 1, \dots, n$.

B3. Define

$$ Y_2^{(n)}(h) = \frac{h}{2\pi} \sum_{k=1}^{n} \sum_{i=1}^{N_k} X_{ik}. $$

As $n \to \infty$, $Y_2^{(n)}(h)$ converges in distribution to $Y_2(h)$. This method is in fact equivalent to choosing $\lambda = \sigma^2$, $H(dx, u) = (\lceil u \rceil / 2) \exp(-\lceil u \rceil |x|)\, dx$ and $T = n$ in the generalised Bondesson method, i.e. $X(u) \sim$ Laplace($1/\lceil u \rceil$).
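Steps B1–B3 vectorise nicely using the identity that a sum of $N$ i.i.d. Laplace($1/k$) variables is distributed as the difference of two independent Gamma($N$, 1) variables scaled by $1/k$ (a sketch under that identity, not code from the paper):

```python
import numpy as np

def method_B(h, sigma2, n, size, rng):
    """Compound Poisson approximation Y2^(n)(h) (method B).

    N_k ~ Po(sigma2); the sum of N_k Laplace(1/k) variables is generated
    as (Gamma(N_k, 1) - Gamma(N_k, 1)) / k, with Gamma(0, 1) := 0.
    """
    k = np.arange(1, n + 1, dtype=float)
    N = rng.poisson(sigma2, size=(size, n))
    z = ((rng.gamma(N, 1.0) - rng.gamma(N, 1.0)) / k).sum(axis=1)
    return (h / (2.0 * np.pi)) * z
```

The theoretical variance of the partial sum is $2\sigma^2 (h/2\pi)^2 \sum_{k=1}^{n} k^{-2}$, which the check below uses.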
3.4 Method C

Damien, Laud & Smith (1995) proposed the following method for simulating ID random variables:

1. Let $X_1, \dots, X_n$ be $n$ i.i.d. samples from the distribution $(1/k)\, \theta(x)$, where $\theta(x)$ is the finite Lévy–Khinchine measure of the ID random variable and $k$ is its total mass.

2. Simulate

$$ Y_i \sim \text{Po}\!\left( \frac{k (1 + X_i^2)}{n X_i^2} \right), \quad i = 1, \dots, n. $$

3. Define

$$ Z^{(n)} = \sum_{i=1}^{n} X_i Y_i. $$

As $n \to \infty$, $Z^{(n)}$ converges in distribution to the appropriate ID distribution. We now obtain the following method for simulating iterated Itô integrals:

C1. Generate $Y_2^{(n)}(h) = (h/2\pi)\, Z^{(n)}$, where $Z^{(n)}$ is a sample from the Damien, Laud and Smith algorithm with $d\theta(x)$ given by (2.6), $\sigma^2$ given by (2.1) and with $k = 1.176680161\, \sigma^2$.

As $n \to \infty$, $Y_2^{(n)}(h)$ converges in distribution to $Y_2(h)$. The samples from $d\theta(x)/k$ can be generated by rejection from the Laplace distribution with rejection constant $r = 1.10528854$. The constants $k$ and $r$ were computed numerically.
3.5 Mean square rate of convergence

In this section we compute the MSE for methods A and B. For method C we have not been able to carry out an analysis of this kind; indeed, for this method we could not define $Y_2(h)$ and its approximation on a common probability space. Note that all expectations in this and the following sections are taken conditionally on $\sigma^2$ unless explicitly stated.

We start with method A. Let $\Delta_T = Y_2(h) - Y_2^{(T)}(h)$ be the tail of the approximating sum in this method and let $\sigma_T^2$ denote its variance.

Theorem 3.1 The MSE for method A is

$$ \sigma_T^2 = \mathrm{E}\,\big| Y_2(h) - Y_2^{(T)}(h) \big|^2 \sim \sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \frac{1}{T} \quad \text{as } T \to \infty. $$

Moreover, the right-hand side is an upper bound on $\sigma_T^2$ for each $T > 0$.

Proof. We have

$$ \varphi_{\Delta_T}(t) = \exp\!\left( \sigma^2 \int_T^\infty \Big( \cos\big( (th/2\pi) \log\tfrac{1+u}{u} \big) - 1 \Big)\, du \right). $$

A change of variables $y = 1/u$ yields

$$ \varphi_{\Delta_T}(t) = \exp\!\left( \sigma^2 \int_0^{1/T} \Big( \cos\big( (th/2\pi) \log(1+y) \big) - 1 \Big) \frac{dy}{y^2} \right), $$

so that

$$ \sigma_T^2 = -\varphi_{\Delta_T}''(0) = \sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \int_0^{1/T} \frac{\log(1+y)^2}{y^2}\, dy. $$

L'Hôpital's rule shows that

$$ \lim_{T \to \infty} T \int_0^{1/T} \frac{\log(1+y)^2}{y^2}\, dy = \lim_{T \to \infty} T^2 \log(1 + 1/T)^2 = 1. $$

The bound $\sigma_T^2 \le \sigma^2 (h/2\pi)^2 / T$ follows from the inequality $\log(1+y) \le y$, $y \ge 0$. $\square$

Thus the mean square distance between $Y_2(h)$ and $Y_2^{(T)}(h)$ is asymptotically decreasing at rate $1/T$. We now turn to method B. Let $\Delta_n = Y_2(h) - Y_2^{(n)}(h)$ be the tail of the approximating sum in this method and let $\sigma_n^2$ denote its variance.
Theorem 3.2 The MSE for method B is

$$ \sigma_n^2 = \mathrm{E}\,\big| Y_2(h) - Y_2^{(n)}(h) \big|^2 = 2\sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \sum_{k=n+1}^{\infty} \frac{1}{k^2} \sim 2\sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \frac{1}{n} \quad \text{as } n \to \infty. $$

Moreover, $2\sigma^2 (h/2\pi)^2 / n$ is an upper bound on $\sigma_n^2$ for each $n \ge 1$.

Proof. The characteristic function of $\Delta_n$ is

$$ \varphi_{\Delta_n}(t) = \exp\!\left( \sigma^2 \sum_{k=n+1}^{\infty} \Big( \frac{1}{1 + c^2 t^2 / k^2} - 1 \Big) \right), $$

where $c = h/2\pi$. The variance of $\Delta_n$ is

$$ \sigma_n^2 = -\varphi_{\Delta_n}''(0) = 2\sigma^2 c^2 \sum_{k=n+1}^{\infty} \frac{1}{k^2} \le 2\sigma^2 c^2 \int_n^\infty \frac{dx}{x^2} = \frac{2\sigma^2 c^2}{n}, $$

and $\sigma_n^2 \sim 2\sigma^2 c^2 / n$ as $n \to \infty$, since the same rate of decay is obtained by approximating the sum by an integral from below. $\square$

Hence the mean square distance between $Y_2(h)$ and $Y_2^{(n)}(h)$ is asymptotically decreasing at rate $1/n$.
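The bound and rate of Theorem 3.2 are easy to confirm numerically (a quick check of the formulas, not part of the paper):

```python
import numpy as np

def tail_var_B(h, sigma2, n, terms=2_000_000):
    """sigma_n^2 = 2 sigma^2 (h/(2 pi))^2 sum_{k>n} 1/k^2 (truncated sum)."""
    k = np.arange(n + 1, n + 1 + terms, dtype=float)
    return 2.0 * sigma2 * (h / (2.0 * np.pi)) ** 2 * np.sum(1.0 / k ** 2)
```

For every $n$ the tail variance sits between $2\sigma^2 (h/2\pi)^2/(n+1)$ (since $\sum_{k>n} k^{-2} \ge \int_{n+1}^\infty x^{-2}\,dx$) and the bound $2\sigma^2 (h/2\pi)^2/n$ of the theorem.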
4 Improved rate of convergence through tail approximation

4.1 Asymptotic normality of the tail sums

For both method A and method B the variance of the tail of the approximating sum is asymptotically decreasing at rate $T^{-1}$ as $T \to \infty$. We will now show that both tail sums are asymptotically Gaussian. Again we first look at method A. Let

$$ \tilde{\sigma}_T^2 = \sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \frac{1}{T} $$

be the asymptotic variance of $\Delta_T$, i.e. $\mathrm{V}(\Delta_T)/\tilde{\sigma}_T^2 \to 1$ as $T \to \infty$.

Theorem 4.1 $\Delta_T/\tilde{\sigma}_T \to N(0,1)$ and $\Delta_T/\sigma_T \to N(0,1)$ in distribution as $T \to \infty$.

Proof. The logarithm of the characteristic function of the normalised tail sum is

$$ \log \varphi_{\Delta_T/\tilde{\sigma}_T}(t) = \sigma^2 \int_0^{1/T} \Big( \cos\big( (t\sqrt{T}/\sigma) \log(1+y) \big) - 1 \Big) \frac{dy}{y^2}. $$

By the mean value theorem,

$$ \log \varphi_{\Delta_T/\tilde{\sigma}_T}(t) = \sigma^2\, \frac{1}{T}\, f(\xi), \qquad \xi \in [0, 1/T], \qquad f(y) = \frac{\cos\big( (t\sqrt{T}/\sigma) \log(1+y) \big) - 1}{y^2}. $$

For sufficiently large $T$ the integrand $f(y)$ is increasing on $[0, 1/T]$. We can therefore bound $f(\xi)$ from above and below by $f(1/T)$ and $f(0)$, respectively, i.e.

$$ \frac{\sigma^2}{T} f(0) \le \log \varphi_{\Delta_T/\tilde{\sigma}_T}(t) \le \frac{\sigma^2}{T} f(1/T). $$

Now $f(0) = -t^2 T/(2\sigma^2)$, so that $(\sigma^2/T) f(0) = -t^2/2$, and

$$ \frac{\sigma^2}{T} f(1/T) = \sigma^2 T \Big( \cos\big( (t\sqrt{T}/\sigma) \log(1 + 1/T) \big) - 1 \Big). $$

From the inequality

$$ -\frac{x^2}{2} \le \cos(x) - 1 \le -\frac{x^2}{2} + \frac{x^4}{4!} $$

it follows that

$$ -\frac{t^2}{2}\, T^2 \log(1 + 1/T)^2 \le \frac{\sigma^2}{T} f(1/T) \le -\frac{t^2}{2}\, T^2 \log(1 + 1/T)^2 + \frac{t^4}{4!\, \sigma^2}\, T^3 \log(1 + 1/T)^4. $$

Since $T^2 \log(1 + 1/T)^2 \to 1$ and $T^3 \log(1 + 1/T)^4 \to 0$ as $T \to \infty$, both bounds tend to $-t^2/2$ for each $t$, and the first part of the result follows. The second part follows as $\tilde{\sigma}_T/\sigma_T \to 1$. $\square$

We now turn to method B. Let

$$ \tilde{\sigma}_n^2 = 2\sigma^2 \left( \frac{h}{2\pi} \right)^{\!2} \frac{1}{n} $$

be the asymptotic variance of $\Delta_n$, i.e. $\mathrm{V}(\Delta_n)/\tilde{\sigma}_n^2 \to 1$ as $n \to \infty$.

Theorem 4.2 $\Delta_n/\tilde{\sigma}_n \to N(0,1)$ and $\Delta_n/\sigma_n \to N(0,1)$ in distribution as $n \to \infty$.

Proof. If we normalise $\Delta_n$ by its asymptotic standard deviation $\tilde{\sigma}_n$ we obtain

$$ \log \varphi_{\Delta_n/\tilde{\sigma}_n}(t) = -\sigma^2 \sum_{k=n+1}^{\infty} \frac{b n t^2}{b n t^2 + k^2}, $$

where $b = 1/(2\sigma^2)$. We approximate the sum from above and below by integrals,

$$ -\sigma^2 \int_n^\infty \frac{b n t^2}{b n t^2 + x^2}\, dx \le \log \varphi_{\Delta_n/\tilde{\sigma}_n}(t) \le -\sigma^2 \int_{n+1}^\infty \frac{b n t^2}{b n t^2 + x^2}\, dx. \qquad (4.1) $$

Evaluating the last integral we get

$$ \log \varphi_{\Delta_n/\tilde{\sigma}_n}(t) \le -\sigma^2 \sqrt{bn}\, |t| \left( \frac{\pi}{2} - \arctan \frac{n+1}{\sqrt{bn}\, |t|} \right). $$

Now

$$ \arctan x = \frac{\pi}{2} - \frac{1}{x} + O\!\left( \frac{1}{x^3} \right) \quad \text{as } x \to \infty, $$

whence

$$ \limsup_{n \to \infty} \log \varphi_{\Delta_n/\tilde{\sigma}_n}(t) \le -\sigma^2 b t^2 = -\frac{t^2}{2}. $$

Using the same technique on the first integral in (4.1) leads to the same lim inf, so that

$$ \lim_{n \to \infty} \log \varphi_{\Delta_n/\tilde{\sigma}_n}(t) = -\frac{t^2}{2}, $$

which completes the proof of the first part. The second part follows as $\tilde{\sigma}_n/\sigma_n \to 1$. $\square$
4.2 Modified simulation algorithms

The asymptotic normality of the tail sums, together with their independence of the corresponding main approximating random variables, suggests modifying methods A and B by adding Gaussian random variables with suitable variances. Hence we define the following methods:

A′: $\tilde{Y}_2^{(T)}(h) = Y_2^{(T)}(h) + \sigma_T G_A$
A″: $\tilde{Y}_2^{(T)}(h) = Y_2^{(T)}(h) + \tilde{\sigma}_T G_A$
B′: $\tilde{Y}_2^{(n)}(h) = Y_2^{(n)}(h) + \sigma_n G_B$
B″: $\tilde{Y}_2^{(n)}(h) = Y_2^{(n)}(h) + \tilde{\sigma}_n G_B$

where $G_A$ and $G_B$ are standard Gaussian variables independent of $Y_2^{(T)}(h)$ and $Y_2^{(n)}(h)$. We remark that methods A′ and B′ provide random variables with the correct variance $\mathrm{E}\,Y_2(h)^2$, while the variances given by methods A″ and B″ are somewhat too large. To calculate the MSE for methods B′ and B″ we shall use that the tail sum $\Delta_n$ has a class G distribution. We therefore first recall some facts about such distributions.
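Method B′ can be sketched by combining the method-B partial sum with an independent Gaussian carrying the exact tail variance $\sigma_n^2$, computed here by truncating the tail series (our illustration, not code from the paper):

```python
import numpy as np

def method_Bprime(h, sigma2, n, size, rng, tail_terms=1_000_000):
    """Method B': Y2^(n)(h) + sigma_n * G, where
    sigma_n^2 = 2*sigma2*(h/2pi)^2 * sum_{k>n} 1/k^2 (truncated)."""
    c = h / (2.0 * np.pi)
    k = np.arange(1, n + 1, dtype=float)
    N = rng.poisson(sigma2, size=(size, n))
    # sum of N_k Laplace(1/k) terms as a difference of Gamma(N_k, 1) variables
    main = c * ((rng.gamma(N, 1.0) - rng.gamma(N, 1.0)) / k).sum(axis=1)
    ktail = np.arange(n + 1, n + 1 + tail_terms, dtype=float)
    sigma_n = np.sqrt(2.0 * sigma2 * c * c * np.sum(1.0 / ktail ** 2))
    return main + sigma_n * rng.standard_normal(size)
```

Method B″ is identical except that $\sigma_n$ is replaced by the closed-form $\tilde{\sigma}_n = (h/2\pi)\sqrt{2\sigma^2/n}$. By construction the B′ output has the full variance of $Y_2(h)$, namely $\sigma^2 (h/2\pi)^2 \pi^2/3$.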
4.3 Class G distributions

An interesting subclass of the symmetric ID distributions is the so-called class G. This class consists of variance mixtures of standard Gaussian random variables with the mixing distributions being positive and ID. Some very common symmetric ID distributions are of class G, e.g. the Laplace, Gaussian and logistic distributions. Now let $X_g$ be a class G random variable. This is equivalent to $X_g$ admitting a factorisation as a product of independent random variables

$$ X_g = G \sqrt{Y}, $$

where $G$ is a standard Gaussian random variable and $Y$ is a positive ID random variable. The density of $X_g$ is given by

$$ f_{X_g}(x) = \int_0^\infty \frac{1}{\sqrt{2\pi y}} \exp\!\left( -\frac{x^2}{2y} \right) dF_Y(y). $$

A random variable $X_g$ has a class G distribution if and only if its characteristic function has the form

$$ \varphi_{X_g}(t) = \exp\big( -\psi(t^2) \big), $$

where the function $\psi(t)$, $t \ge 0$, has a completely monotone derivative and $\psi(0) = 0$ (Rosiński, 1990). Recall that a function $f(t)$ is called completely monotone if

$$ (-1)^n \frac{d^n f(t)}{dt^n} \ge 0 \quad \text{for each } n \ge 0. $$

Since class G distributions are conditionally Gaussian, the conditional characteristic function is of the type

$$ \varphi_{X_g | Y}(t) = \exp\!\left( -\frac{t^2 Y}{2} \right). $$

Hence

$$ \varphi_{X_g}(t) = \mathrm{E} \exp\!\left( -\frac{t^2 Y}{2} \right), $$

which is the Laplace transform of the mixing distribution evaluated at $t^2/2$. Thus $Y$ has Laplace transform

$$ \mathcal{L}_Y(t) = \varphi_{X_g}(\sqrt{2t}\,), \quad t \ge 0. \qquad (4.2) $$
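The factorisation $X_g = G\sqrt{Y}$ and formula (4.2) can be illustrated with the Laplace distribution: for $Y$ exponential with mean $2b^2$ one has $\mathcal{L}_Y(t^2/2) = 1/(1 + b^2 t^2)$, the Laplace($b$) characteristic function (a small illustration of ours, not code from the paper):

```python
import numpy as np

def class_G_sample(b, size, rng):
    """Laplace(b) as a class G variance mixture: G * sqrt(Y), with
    Y ~ Exp(mean 2 b^2) a positive ID mixing variable and G ~ N(0, 1)."""
    y = rng.exponential(2.0 * b * b, size=size)
    return rng.standard_normal(size) * np.sqrt(y)
```

The samples are symmetric with variance $2b^2$ and kurtosis 6, the Laplace values.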
4.4 Coupling of tails

Proposition 4.1 The tail sum $\Delta_n$ for method B is of class G.

Proof. We have that

$$ \log \varphi_{\Delta_n}(\sqrt{2t}\,) = -\sigma^2 \sum_{k=n+1}^{\infty} \frac{2c^2 t}{2c^2 t + k^2}, \quad t \ge 0, $$

where $c = h/(2\pi)$. Each term in the sum has a completely monotone derivative, hence so has the sum, and the result follows. $\square$

Notice that by taking $n = 0$ in the above proof it follows that $Y_2(h)$ itself is of class G. A different and perhaps more intuitive proof of Proposition 4.1 is obtained by utilising the infinite divisibility of the Poisson process and by observing that each random variable $X(T_k)$ added in the Bondesson interpretation of method B is Laplace distributed, i.e. is a normal variance mixture.

The tail sum of method B is asymptotically Gaussian, and since we know that the tail sum has a class G distribution this implies that the normalised mixing distribution converges in distribution to one, cf. (4.2). We will now show that the normalised mixing distribution converges to one in mean square sense. This can then be utilised to increase the mean square convergence rate of the approximating sum. We can write the tail sum $\Delta_n$ as a product of a standard Gaussian random variable $G$ and the square root of an independent positive ID random variable $V_n$, i.e.

$$ \Delta_n = G \sqrt{V_n}. \qquad (4.3) $$

Notice that $\sigma_n^2 = \mathrm{E}G^2\, \mathrm{E}V_n = \mathrm{E}V_n$. The log Laplace transform of $V_n$ is

$$ l_{V_n}(t) = \log \varphi_{\Delta_n}(\sqrt{2t}\,) = -\sigma^2 \sum_{k=n+1}^{\infty} \frac{2tc^2}{2tc^2 + k^2}, $$

where $c = h/(2\pi)$. This function is closely related to the cumulant generating function of $V_n$.

Lemma 4.1

$$ \mathrm{E}\!\left( \frac{V_n}{\sigma_n^2} - 1 \right)^{\!2} \le \frac{2}{3\sigma^2 n} $$

and

$$ \mathrm{E}\!\left( \frac{V_n}{\tilde{\sigma}_n^2} - 1 \right)^{\!2} \le \frac{2}{3\sigma^2 n} + \frac{1}{n^2} $$

for each $n \ge 1$.

Proof. Direct calculation shows that

$$ \mathrm{E}\!\left( \frac{V_n}{\sigma_n^2} - 1 \right)^{\!2} = \frac{\mathrm{E}(V_n - \sigma_n^2)^2}{\sigma_n^4} = \frac{l_{V_n}''(0)}{\sigma_n^4} = \frac{8\sigma^2 c^4 \sum_{k=n+1}^{\infty} k^{-4}}{\sigma_n^4} = \frac{2}{\sigma^2}\, \frac{\sum_{k=n+1}^{\infty} k^{-4}}{\big( \sum_{k=n+1}^{\infty} k^{-2} \big)^2}. $$

From integral approximations of the sums we obtain

$$ \mathrm{E}\!\left( \frac{V_n}{\sigma_n^2} - 1 \right)^{\!2} \le \frac{2}{\sigma^2}\, \frac{\int_{n+1/2}^{\infty} x^{-4}\, dx}{\big( \int_{n+3/4}^{\infty} x^{-2}\, dx \big)^2} = \frac{2}{\sigma^2}\, \frac{(n+3/4)^2}{3(n+1/2)^3} \le \frac{2}{3\sigma^2 n}. $$

Furthermore, we have that

$$ \mathrm{E}\!\left( \frac{V_n}{\tilde{\sigma}_n^2} - 1 \right)^{\!2} = \frac{\mathrm{E}(V_n - \tilde{\sigma}_n^2)^2}{\tilde{\sigma}_n^4} = \frac{l_{V_n}''(0)}{\tilde{\sigma}_n^4} + \frac{(\tilde{\sigma}_n^2 - \sigma_n^2)^2}{\tilde{\sigma}_n^4}. $$

Approximating the sums by integrals gives

$$ \mathrm{E}\!\left( \frac{V_n}{\tilde{\sigma}_n^2} - 1 \right)^{\!2} \le \frac{8\sigma^2 c^4}{3n^3\, \tilde{\sigma}_n^4} + \frac{1}{\tilde{\sigma}_n^4} \left( 2\sigma^2 c^2 \Big( \frac{1}{n} - \frac{1}{n+1} \Big) \right)^{\!2} \le \frac{2}{3\sigma^2 n} + \frac{1}{n^2}; $$

the bound on the second term holds since $\tilde{\sigma}_n^2$ is an upper bound on $\sigma_n^2$ and $\sum_{k=n+1}^{\infty} k^{-2} \ge \int_{n+1}^{\infty} x^{-2}\, dx = 1/(n+1)$, which proves the result. $\square$
We now consider a coupling between the random variable $\Delta_n$ and its approximations in methods B′ and B″ respectively, by using the same standard Gaussian random variable in both the true tail sum and its approximations. That is, we put the random variable $G_B$ in methods B′ and B″ equal to the random variable $G$ in (4.3). Thus the true tail sum $\Delta_n$ and its approximations $\sigma_n G_B$ and $\tilde{\sigma}_n G_B$ are now defined on a common probability space. This is called a coupling of these random variables. By a coupling is generally meant defining random variables with prescribed marginal distributions on a common probability space. Usually this construction involves introducing some kind of dependence. The following theorem shows that the particular coupling given above works well for our present purposes.

Theorem 4.3 For method B′ we obtain by coupling the MSE

$$ \mathrm{E}\,\big| Y_2(h) - \tilde{Y}_2^{(n)}(h) \big|^2 \le \frac{4h^2}{3(2\pi)^2 n^2} $$

for each $n \ge 1$. For method B″ we obtain by coupling the MSE

$$ \mathrm{E}\,\big| Y_2(h) - \tilde{Y}_2^{(n)}(h) \big|^2 \le \frac{4h^2}{3(2\pi)^2 n^2} + \frac{2\sigma^2 h^2}{(2\pi)^2 n^3} $$

for each $n \ge 1$.

Proof. For method B′ we have

$$ \mathrm{E}\,\big| Y_2(h) - \tilde{Y}_2^{(n)}(h) \big|^2 = \mathrm{E}\,\big| \Delta_n - \sigma_n G_B \big|^2 = \mathrm{E}G_B^2\, \mathrm{E}\big( \sqrt{V_n} - \sigma_n \big)^2 \le \sigma_n^2\, \mathrm{E}\!\left( \frac{V_n}{\sigma_n^2} - 1 \right)^{\!2}, $$

where the last inequality follows from $(\sqrt{x} - 1)^2 \le (x - 1)^2$, $x \ge 0$. Using Lemma 4.1 and the bound $\sigma_n^2 \le 2\sigma^2 (h/2\pi)^2 / n$, the first part of the theorem follows. The second part follows similarly. $\square$

The theorem thus shows that methods B′ and B″ have MSEs decreasing at rate $1/n^2$, as opposed to the slower rate $1/n$ for methods A and B. This can be explained as follows. For method B, the MSE is equal to $\sigma_n^2 = \mathrm{E}V_n$. For method B′, the MSE is dictated by the variance of $V_n$, which decays faster than its mean. For method B″ an extra smaller-order term is added because of the difference between $\sigma_n^2$ and $\tilde{\sigma}_n^2$.
By the $L^2$-Wasserstein distance between two distributions $F_1$ and $F_2$, both with finite variance, is meant the minimum mean square distance between random variables defined on a common probability space and having marginal distributions $F_1$ and $F_2$, respectively. Theorem 4.3 thus provides an upper bound on the $L^2$-Wasserstein distance between the distribution of $\Delta_n$ and a normal distribution. For method A we have not been able to carry out an analysis similar to the above one. We do conjecture, however, that the distribution of its tail sum $\Delta_T$ is a Gaussian variance mixture, although we can show that it is not of class G. If the conjecture is true one could of course construct a coupling as above to analyse methods A′ and A″.
5 Applications to the simulation of SDEs

The results in the previous sections can now be utilised for the simulation of SDEs. We can approximate $I_{ij}(h)$ by $\tilde{I}_{ij}(h) = Y_1(h) + \tilde{Y}_2(h) + Y_3(h)$, where $\tilde{Y}_2(h)$ is an approximation of $Y_2(h)$ obtained from one of the methods described above. The simulation of $Y_1(h)$ and $Y_3(h)$ is exact, so the MSE in the approximation of the iterated Itô integral is just the MSE in the approximation of the random variable $Y_2(h)$. Corollary 10.6.5 in Kloeden & Platen (1995, p. 362) states that the MSE should be $Dh^3$ to obtain strong convergence of order 1. The constant $D$ should be chosen such that the MSE in the approximation of $I_{ij}(h)$ is negligible compared to the error terms in the SDE approximation. Thus we need to choose

$$ T = \frac{\sigma^2}{(2\pi)^2}\, \frac{1}{Dh} \quad \text{for method A}, \qquad n = \frac{1}{\pi} \left( \frac{1}{3Dh} \right)^{\!1/2} \quad \text{for method B′}, \qquad (5.1) $$

and $n$ as the root of $4h^2/(3(2\pi)^2 n^2) + 2\sigma^2 h^2/((2\pi)^2 n^3) = Dh^3$ for method B″, to obtain the desired convergence rate in the numerical approximation of the SDE (1.1).

Finally we briefly compare our methods to each other and to the methods proposed by Kloeden, Platen & Wright (1992) (see also Kloeden & Platen, 1995, pp. 200–205) and Gaines & Lyons (1994). The method of Kloeden, Platen & Wright is based on a Fourier expansion of the Brownian bridge, and has MSE decaying at rate $1/n$, where $n$ is the number of terms included in the approximating sum. More precisely, the MSE is bounded by $h^2/(2\pi^2 n)$, which is on the average equal to the MSE for method A (taking $T = n$), since the expectation of $\sigma^2$, which has a $\chi^2(2)$-distribution, is 2. Their method requires simulation of $4n + 2$ standard normal variables, whereas method A requires one logistic variable, one Poisson variable and, on the average, $2T$ uniform variables (yielding the points of the Poisson process). We find that method A requires less work than does the method of Kloeden, Platen & Wright, although method A involves logarithms, and we thus advocate
method A in preference to the latter. Method B has an MSE twice as big as that of method A and does not require less computation for a given $n$, and should thus not be used. However, our main interest is of course in the case where $h$ and $Dh$ are small, so that the faster convergence rate of methods B′ and B″ becomes an advantage. Both methods require, for a given $n$, the simulation of one logistic variable, one Poisson variable, one normal random variable and, on the average, $2n$ uniform variables and $2n$ Laplace variables. Method B′ suffers from the slight drawback that $\sigma_n^2$ is not available in closed form, but this number can easily be computed beforehand for various $n$ and tabulated. Method B″ has an MSE that is bounded by $1 + 3\sigma^2/(2n)$ times the MSE of method B′, cf. Theorem 4.3. Thus, for a given step size $h$, method B″ on the average requires an $n$-value that is at most $\mathrm{E}\sqrt{1 + 3\sigma^2/2} \approx 1.89$ times larger than that of method B′. However, as the step size $h$ tends to zero, methods B′ and B″ asymptotically require the same number of terms in the approximating sum. To see this, one needs to do a more careful analysis than in (5.1) of the $n$ required to achieve a given MSE in method B″; it involves solving a cubic equation and we do not show these computations here. From a practical point of view, the difference in efficiency between method A on the one hand and methods B′ and B″ on the other will be more pronounced as the step size $h$ tends to zero.

The notion of complexity of a method can also be viewed in a different way. Assume that we want to simulate an SDE with a mean error $\mathrm{E}\,|X^h(T) - X(T)| \le \epsilon$; how much work is required to accomplish this? If we measure work by the number of Gaussian random variables that need to be simulated, we obtain $W_{\mathrm{KPW}}(\epsilon) \asymp \epsilon^{-2}$ for Milshtein combined with the Kloeden, Platen & Wright method and $W_{\mathrm{NM}}(\epsilon) \asymp \epsilon^{-3/2}$ for Milshtein combined with our new methods B′ and B″.
The notation $W_M(\varepsilon) \asymp \varepsilon^{-\gamma}$ means that, as $\varepsilon \to 0$, the number of Gaussian variables needed to achieve the accuracy $\varepsilon$ for the method $M$ is $O(\varepsilon^{-\gamma})$. If we compare this with the Euler method, which has $W_{\mathrm{EULER}}(\varepsilon) \asymp \varepsilon^{-2}$, it is evident that there is no gain in using Milshtein combined with the Kloeden, Platen & Wright method, since it requires no less (in practice even more) work than the Euler method to obtain the same accuracy. The Euler method is also easier to implement and faster to execute, provided that the evaluations of the drift and dispersion functions are not too time-consuming compared to the generation of the Gaussian random variables. This clearly shows why it is crucial to have a convergence rate faster than $h^2/n$ in the approximation of the iterated Itô integrals. The method by Gaines & Lyons differs from all the above ones in that it is exact and based on inversion of the joint characteristic function of $\sigma^2$ and the Lévy stochastic area integral (see (1.3)) rather than on a probabilistic representation and analysis of the iterated Itô integral. It is certainly also the fastest method; Gaines & Lyons report that simulation of two Wiener increments and one iterated Itô integral takes about the same time as the simulation of approximately fourteen standard normal variables. However, the method is also by far the most complicated one to implement, and indeed it sometimes, although seldom, requires on-line numerical Fourier inversion of the characteristic function. Hence code for this operation must be included in the simulation package. Moreover, exact simulation of the iterated Itô integrals is not really necessary, since other sources of error, such as the time discretisation, also influence the precision of the solution. A problem that has not been addressed in this article is the weak approximation of SDEs, i.e. estimation of an expectation $\mathrm{E}[f(X(T))]$ of the process at some time $T$ rather than a pathwise approximation. For weak approximation it is possible to replace the iterated Itô integrals by random variables with a considerably less complicated probabilistic structure and still obtain a convergence rate of $O(h)$ in the weak sense. Moreover, it is possible to use the same type of extrapolation methods as in the deterministic case to improve the accuracy of the approximation (e.g. Romberg extrapolation, Talay & Tubaro, 1990).
References

Bondesson, L. (1982). On simulation from infinitely divisible distributions. Adv. Appl. Prob. 14, 855–869.
Breiman, L. (1968). Probability. Addison-Wesley, Reading. Reprinted 1992 in SIAM series.
Damien, P., Laud, P.W. & Smith, A.F.M. (1995). Approximate random generation from infinitely divisible distributions with applications to Bayesian inference. J. Roy. Statist. Soc. B 57, 547–563.
Gaines, J.G. & Lyons, T.J. (1994). Random generation of stochastic area integrals. SIAM J. Appl. Math. 54, 1132–1146.
Kloeden, P.E., Platen, E. & Wright, W. (1992). The approximation of multiple stochastic integrals. Stoch. Anal. Appl. 10, 431–441.
Kloeden, P.E. & Platen, E. (1995). Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin.
Lévy, P. (1951). Wiener's random functional and other Laplacian random functionals. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. J. Neyman, ed., pp. 171–187, University of California Press, Berkeley.
Lévy, P. (1965). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Lukacs, E. (1970). Characteristic Functions. Charles Griffin, London.
Lukacs, E. (1983). Developments in Characteristic Function Theory. Charles Griffin, London.
Milshtein, G.N. (1974). Approximate integration of stochastic differential equations. Theor. Prob. Appl. 19, 557–562.
Rosiński, J. (1990). On the representation of infinitely divisible random vectors. Ann. Probab. 18, 405–430.
Rümelin, W. (1982). Numerical treatment of stochastic differential equations. SIAM J. Numer. Anal. 19, 604–613.
Talacko, J. (1956). Perks distributions and their role in the theory of Wiener's stochastic variables. Trabajos de Estadistica 7, 159–174.
Talay, D. & Tubaro, L. (1990). Expansion of the global error for numerical schemes solving stochastic differential equations. Stoch. Anal. Appl. 8, 483–509.
B
Paper B Joint characteristic function and simultaneous simulation of iterated Itô integrals for multiple independent Brownian motions M AGNUS W IKTORSSON Lund University
Abstract

We consider all two times iterated Itô integrals obtained by pairing $m$ independent standard Brownian motions. First we calculate the conditional joint characteristic function of these integrals, given the Brownian increments over the integration interval, and show that it has a form entirely similar to what is obtained in the univariate case. Then we propose an algorithm for the simultaneous simulation of the $m^2$ integrals conditioned on the Brownian increments, which achieves a mean square error of order $1/n^2$, where $n$ is the number of terms in a truncated sum. The algorithm is based on approximation of the tail-sum distribution, which is a multivariate normal variance mixture, by a multivariate normal distribution.
2000 Maths Subject Classification: Primary 60H05; Secondary 60H10
Key words: iterated Itô integral, multi-dimensional stochastic differential equation, numerical approximation, variance mixture
1 Introduction

Consider the multi-dimensional stochastic differential equation (SDE)
$$dX(t) = b(X(t),t)\,dt + \sigma(X(t),t)\,dW(t), \qquad (1.1)$$
where $X(t)$ is a $d$-dimensional vector and $W(t)$ is an $m$-dimensional vector of independent standard Brownian motions. The functions $b(X(t),t)$ and $\sigma(X(t),t)$ are measurable mappings from $\mathbb{R}^d\times\mathbb{R}_+$ to $\mathbb{R}^d$ and from $\mathbb{R}^d\times\mathbb{R}_+$ to $\mathbb{R}^{d\times m}$, respectively. The above equation is here interpreted in the Itô sense.
Explicit solutions to (1.1) can only be found in a few special cases, so that in general we are confined to computing numerical approximations. Consider a sequence $\{X^h(t),\ 0\le t\le T\}$, $h>0$, of numerical approximations of a (strong) solution $\{X(t),\ 0\le t\le T\}$, where $X^h(\cdot)$ is defined for $t\in\{0,h,2h,\ldots,T\}$; here $h$ is called the step size. This sequence is said to converge at rate $O(h^\gamma)$ if
$$\mathrm{E}\,|X(T)-X^h(T)| = O(h^\gamma) \quad\text{as } h\to 0.$$
For example, the simplest scheme, the Euler one, converges at rate $1/2$. The dispersion matrix $\sigma(x,t)$ of the SDE (1.1) is said to satisfy the so-called commutativity condition if
$$L^i\sigma_{kj} = L^j\sigma_{ki}, \qquad i,j = 1,\ldots,m,\quad k = 1,\ldots,d, \qquad (1.2)$$
where the differential operator $L^i$ is given by
$$L^i = \sum_{\ell=1}^d \sigma_{\ell i}(x,t)\,\frac{\partial}{\partial x_\ell}.$$
In the general case where $\sigma(x,t)$ does not satisfy (1.2), it is not possible to generate numerical approximations converging faster than $O(h^{1/2})$ unless the iterated Itô integrals
$$I_{ij}(t_n,t_n+h) = \int_{t_n}^{t_n+h}\int_{t_n}^{s} dW_i(u)\,dW_j(s) \qquad (1.3)$$
are included in the numerical scheme (see e.g. Rümelin, 1982). Here $t_n$ are the time points used in the discretisation. Milshtein (1974) proposed a numerical scheme that converges strongly at rate $O(h)$ if $b\in C^{1,1}(\mathbb{R}^d\times\mathbb{R}_+)$ and $\sigma\in C^{2,1}(\mathbb{R}^d\times\mathbb{R}_+)$. In this scheme the $k$th component of the approximation is given by
$$X_k^h(t_n+h) = X_k^h(t_n) + b_kh + \sum_{i=1}^m\sigma_{ki}\,\Delta W_i(t_n,t_n+h) + \sum_{i=1}^m\sum_{j=1}^m L^i\sigma_{kj}\,I_{ij}(t_n,t_n+h), \qquad X^h(t_0) = X(t_0),$$
where $\Delta W_i(t_n,t_n+h) = W_i(t_n+h)-W_i(t_n)$. The purpose of this paper is twofold. First we derive the conditional joint characteristic function of the iterated Itô integrals given the Brownian increments, and secondly we propose an algorithm for the simultaneous simulation of the iterated Itô integrals and the Brownian increments. Before proceeding we note that the joint distribution of $\Delta W_i(t,t+h) = W_i(t+h)-W_i(t)$, $\Delta W_j(t,t+h) = W_j(t+h)-W_j(t)$ and $I_{ij}(t,t+h)$ for
$i,j = 1,\ldots,m$ does not depend on $t$, and hereafter we write $\Delta W_i(h)$ for $\Delta W_i(t,t+h)$ and $I_{ij}(h)$ for $I_{ij}(t,t+h)$. In the case $m=2$ the conditional characteristic function of $I_{12}(h)$ given $\Delta W_1(h)$ and $\Delta W_2(h)$ is given by
$$\varphi_{I_{12}(h)\,|\,\Delta W_1(h),\Delta W_2(h)}(t) = \frac{th/2}{\sinh(th/2)}\exp\Big(-\frac{\sigma^2}{2}\big((th/2)\coth(th/2)-1\big)+\imath tha\Big), \qquad (1.4)$$
where $\sigma^2 = (\Delta W_1(h)^2+\Delta W_2(h)^2)/h$, $a = \Delta W_1(h)\,\Delta W_2(h)/(2h)$ and $\imath$ is the imaginary unit. This expression was derived by Lévy (1951) (see also Talacko, 1956; Lévy, 1965, pp. 329–333). In Section 3 we show that a similar expression holds true in the multi-dimensional case. There is no simple way to simultaneously simulate the iterated Itô integrals and Brownian increments exactly. In the case $m=2$, Gaines & Lyons (1994) proposed an algorithm for exact simulation of $\sigma^2$ (see above) and the single iterated Itô integral $I_{12}(h)$, based on Marsaglia's "rectangle-wedge-tail" method. This method is complicated to implement, however, and occasionally requires numerical inversion of the joint characteristic function of $\sigma^2$ and $I_{12}(h)$. In higher dimensions, Kloeden, Platen & Wright (1992) suggested a simulation algorithm essentially based on truncation of an infinite series representation of the iterated Itô integrals. In order to accomplish a convergence rate of $h$ for a numerical scheme approximating an SDE, the mean square error (MSE) in the approximation of the iterated Itô integrals must be negligible compared to the discretisation error of the numerical scheme. More precisely, an MSE of $Ch^3$ for some positive constant $C$ is required (Kloeden & Platen, 1995, Corollary 10.6.5). Hence it is important to have an algorithm that simulates the iterated Itô integrals with small MSE in short time. The algorithm of Kloeden, Platen & Wright (1992) has an MSE of order $h^2/n$, where $n$ is the number of terms in the truncated sum. In Section 4 we show that a slight modification of this algorithm yields an MSE of order $h^2/n^2$. Hence, with this improved convergence rate, $n$ needs to be proportional to $h^{-1/2}$ rather than $h^{-1}$, resulting in a considerable speed-up of the simulation. This can also be viewed in a different way. Assume that we want to simulate an SDE with a mean error $\mathrm{E}\,|X^h(T)-X(T)| \le \varepsilon$; how much work is required to accomplish this?
If we measure work by the number of Gaussian random variables that need to be simulated, we obtain $W_{\mathrm{KPW}}(\varepsilon)\asymp\varepsilon^{-2}$ for Milshtein combined with the Kloeden, Platen and Wright algorithm and $W_{\mathrm{NA}}(\varepsilon)\asymp\varepsilon^{-3/2}$ for Milshtein combined with our new algorithm. The notation $W_M(\varepsilon)\asymp\varepsilon^{-\gamma}$ means that, as $\varepsilon\to0$, the number of Gaussian variables needed to achieve the accuracy $\varepsilon$ for the method $M$ is $O(\varepsilon^{-\gamma})$. If we compare this with the Euler method, which has $W_{\mathrm{EULER}}(\varepsilon)\asymp\varepsilon^{-2}$, it is evident that there is no gain in using Milshtein combined with the Kloeden, Platen and Wright method, since it requires no less (in practice even more) work than the Euler method to obtain the same accuracy. The Euler method is also easier to implement and faster to execute, provided that the evaluations of the drift and dispersion functions are not too time-consuming compared to the generation of the Gaussian random variables. This clearly shows why it is crucial to have a convergence rate faster than $h^2/n$ in the approximation of the iterated Itô integrals. Before closing this section we give some general notation used throughout the paper. The matrix $I_n$ is an $n\times n$ identity matrix, $0_{n\times m}$ is an $n\times m$ matrix of zeros and $0_n$ is a column vector of $n$ zeros. Furthermore, $A^T$ will denote the transpose of $A$. The imaginary unit $\sqrt{-1}$ will be denoted by $\imath$.
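The work comparison above can be illustrated numerically. The following sketch is purely an illustration of the scaling argument, ignoring all constant factors: step sizes are matched to a target strong error $\varepsilon$ (Euler has order $1/2$, Milshtein order $1$), and the extra Gaussians per step are $n\sim 1/h$ for the truncated series and $n\sim 1/\sqrt h$ for the tail-corrected version.

```python
import numpy as np

def work_counts(eps, T=1.0):
    """Rough Gaussian-draw counts (constants dropped) to reach strong error eps."""
    # Euler: order 1/2  =>  h ~ eps^2, one batch of draws per step
    w_euler = T / eps ** 2
    # Milstein + truncated series: h ~ eps, n ~ 1/h extra draws per step
    w_kpw = (T / eps) * (1.0 / eps)
    # Milstein + Gaussian tail correction: h ~ eps, n ~ 1/sqrt(h) per step
    w_new = (T / eps) * (1.0 / np.sqrt(eps))
    return w_euler, w_kpw, w_new
```

For small $\varepsilon$ the first two counts coincide at order $\varepsilon^{-2}$, while the third grows only like $\varepsilon^{-3/2}$, and the gap widens as $\varepsilon\to 0$.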
2 Representation of the iterated Itô integrals
The iterated Itô integrals are closely related to the so-called Lévy stochastic area integrals, denoted by $A_{ij}(h)$ and defined by
$$A_{ij}(h) = \frac{I_{ij}(h)-I_{ji}(h)}{2}$$
for $i,j = 1,\ldots,m$. These integrals have a nice geometric interpretation; $A_{ij}(h)$ would, if the Brownian motion had finite variation, equal the signed area enclosed by the two-dimensional Brownian motion $W(t) = (W_i(t),W_j(t))$ from $0$ to $h$ and the chord connecting $W(h)$ and $W(0) = (0,0)$ (see Figure 1). We can think of it as a stochastic generalisation of area. We now state some useful relations between $I_{ij}(h)$, $A_{ij}(h)$, $\Delta W_i(h)$ and $\Delta W_j(h)$ for $i\ne j$:
$$I_{ij}(h)+I_{ji}(h) = \Delta W_i(h)\,\Delta W_j(h) \quad\text{a.s.}, \qquad I_{ij}(h) = \frac{\Delta W_i(h)\,\Delta W_j(h)}{2}+A_{ij}(h) \quad\text{a.s.},$$
$$A_{ji}(h) = -A_{ij}(h) \quad\text{a.s.}, \qquad I_{ii}(h) = \frac{\Delta W_i(h)^2-h}{2} \quad\text{a.s.}, \qquad A_{ii}(h) = 0. \qquad (2.1)$$
Kloeden, Platen & Wright (1992) gave the following simultaneous representation of $I_{ij}(h)$, $\Delta W_i(h)$ and $\Delta W_j(h)$ for $i,j = 1,\ldots,m$:
$$I_{ij}(h) = \frac{\Delta W_i(h)\,\Delta W_j(h)-\delta_{ij}h}{2}+A_{ij}(h),$$
$$A_{ij}(h) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k\left(X_{ik}\Big(Y_{jk}+\sqrt{\tfrac2h}\,\Delta W_j(h)\Big)-X_{jk}\Big(Y_{ik}+\sqrt{\tfrac2h}\,\Delta W_i(h)\Big)\right),$$

[Figure 1: Illustration of Lévy's stochastic area integral.]
where $\Delta W_i(h)\sim N(0,h)$, $X_{ik}\sim N(0,1)$ and $Y_{ik}\sim N(0,1)$, $i = 1,\ldots,m$, $k = 1,2,\ldots$, are all independent. If we let $I(h)$ and $A(h)$ be the matrices whose element $(i,j)$ equals $I_{ij}(h)$ and $A_{ij}(h)$ respectively, we can rewrite this representation in matrix form as
$$I(h) = \frac{\Delta W(h)\,\Delta W(h)^T-hI_m}{2}+A(h),$$
$$A(h) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k\left(X_k\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)^T-\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)X_k^T\right), \qquad (2.2)$$
where now $\Delta W(h)\sim N(0_m,hI_m)$, $X_k\sim N(0_m,I_m)$ and $Y_k\sim N(0_m,I_m)$, $k = 1,2,\ldots$, are all independent. Indeed, $\Delta W(h) = (\Delta W_1(h),\ldots,\Delta W_m(h))^T$, $X_k = (X_{1k},\ldots,X_{mk})^T$ and $Y_k = (Y_{1k},\ldots,Y_{mk})^T$.
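As a concrete illustration, the series (2.2) can be truncated and simulated directly. The following sketch (Python/numpy; the variable names and the truncation level $n$ are our own choices) produces one sample of the matrix of iterated integrals, exact except for the discarded tail of the area series:

```python
import numpy as np

rng = np.random.default_rng(1)
m, h, n = 3, 0.01, 50                     # dimension, step size, truncation level

dW = rng.normal(0.0, np.sqrt(h), m)       # Brownian increments over the step
A = np.zeros((m, m))                      # truncated series for A(h), cf. (2.2)
for k in range(1, n + 1):
    X = rng.normal(size=m)
    Y = rng.normal(size=m)
    V = Y + np.sqrt(2.0 / h) * dW
    A += (np.outer(X, V) - np.outer(V, X)) / k
A *= h / (2 * np.pi)

# iterated integrals: I(h) = (dW dW^T - h I_m)/2 + A(h)
I = (np.outer(dW, dW) - h * np.eye(m)) / 2 + A
```

By construction $A$ is skew-symmetric and $I + I^T = \Delta W\,\Delta W^T - hI_m$ holds exactly; only the antisymmetric (area) part carries the truncation error.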
3 Conditional joint characteristic function of the stochastic area integrals
Recall that $m$ is the number of independent Brownian motions. Let $\theta_{ij}$ be the variable which corresponds to the random variable $A_{ij}(h)$ in the joint characteristic function. From (2.1) it follows that we only need to calculate the characteristic function of the random variables $A_{ij}(h)$, $i<j$, since the other stochastic area integrals depend in a deterministic
way on these. We denote this set of Aij (h)’s by A(h). Now define triangular matrix, with zeros on the diagonal, given by
Let A(h) W(h).
W(h) (
ij
as the m
m upper
for i j otherwise.
ij
0
) be the conditional joint characteristic function of A(h) given
Theorem 3.1 The conditional joint characteristic function of $\tilde A(h)$ given $\Delta W(h)$ can be written as
$$\varphi_{\tilde A(h)\,|\,\Delta W(h)}(\Theta) = \det\big(\operatorname{sinch}(\Lambda(\Theta))\big)^{-1/2}\exp\left(-\operatorname{tr}\left(\frac{\Delta W(h)\,\Delta W(h)^T}{2h}\Big(\cosh(\Lambda(\Theta))\operatorname{sinch}(\Lambda(\Theta))^{-1}-I_m\Big)\right)\right), \qquad (3.1)$$
where $\operatorname{sinch}(x) = \sinh(x)/x$, $\Lambda(\Theta) = \frac h2\big((\Theta-\Theta^T)(\Theta-\Theta^T)^T\big)^{1/2}$, and where the hyperbolic functions and the square root should be interpreted in the matrix sense.
Before giving the proof we note that the characteristic function $\varphi_{\tilde A(h)\,|\,\Delta W(h)}$ has a form that is similar to what is obtained for $m=2$, cf. (1.4). The first factor does not depend on $\Delta W(h)$, and it follows by taking $\Delta W(h) = 0_m$ that it is itself a characteristic function. In the univariate case $m=2$ it is the characteristic function of a logistic random variable and so its density is known; for $m>2$ all marginals are of course still logistic, but the joint distribution involves dependencies and we have not been able to find a closed form for the joint density. The second factor is also a characteristic function itself when $m=2$, but its density admits no simple closed form expression. When $m>2$ we do not even know if this factor is a characteristic function. The factor $\exp(\imath tha)$ in (1.4) comes from the second relation in (2.1) and so does not appear in $\varphi_{\tilde A(h)\,|\,\Delta W(h)}$.
Proof of Theorem 3.1. First note that
$$\varphi_{\tilde A(h)\,|\,\Delta W(h)}(\Theta) = \mathrm{E}\Big[\exp\Big(\imath\sum_{i<j}\theta_{ij}A_{ij}(h)\Big)\,\Big|\,\Delta W(h)\Big] = \mathrm{E}\big[\exp\big(\imath\operatorname{tr}(\Theta^TA(h))\big)\,\big|\,\Delta W(h)\big].$$
Using (2.2) it is clear that
$$\operatorname{tr}(\Theta^TA(h)) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k\,X_k^T(\Theta-\Theta^T)\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big).$$
Now since $\{X_k\}_{k\ge1}$ and $\{Y_k\}_{k\ge1}$ are two independent i.i.d. sequences of random vectors, it follows that
$$\varphi_{\tilde A(h)\,|\,\Delta W(h)}(\Theta) = \prod_{k=1}^{\infty}\varphi\Big(\frac{h}{2\pi k}\Big),$$
where $\varphi(\gamma) = \mathrm{E}\big[\exp\big(\imath\gamma X_1^T\bar\Theta(Y_1+\sqrt{2/h}\,\Delta W(h))\big)\,\big|\,\Delta W(h)\big]$; in order to simplify the notation we write $\bar\Theta$ for $\Theta-\Theta^T$. To compute $\varphi(\gamma)$, first calculate the conditional characteristic function given $Y_1$:
$$\mathrm{E}\Big[\exp\Big(\imath\gamma X_1^T\bar\Theta\Big(Y_1+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\Big)\,\Big|\,Y_1,\Delta W(h)\Big] = \exp\Big(-\frac{\gamma^2}{2}\Big(Y_1+\sqrt{\tfrac2h}\,\Delta W(h)\Big)^T\bar\Theta\bar\Theta^T\Big(Y_1+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\Big).$$
The random variable $Q = \big(Y_1+\sqrt{2/h}\,\Delta W(h)\big)^T\bar\Theta\bar\Theta^T\big(Y_1+\sqrt{2/h}\,\Delta W(h)\big)$ is a quadratic form in $Y_1+\sqrt{2/h}\,\Delta W(h)$ with matrix $\bar\Theta\bar\Theta^T$. Thus $\varphi(\gamma) = \mathrm{E}_{Y_1}\exp(-\gamma^2Q/2)$, which is the moment generating function of $Q$ evaluated at the point $-\gamma^2/2$. From Mathai & Provost (1992, Theorem 3.2a.1) we get
$$\mathrm{E}_{Y_1}\exp(tQ) = \det(I_m-2t\bar\Theta\bar\Theta^T)^{-1/2}\exp\Big(-\frac12\cdot\frac2h\,\Delta W(h)^T\big(I_m-(I_m-2t\bar\Theta\bar\Theta^T)^{-1}\big)\Delta W(h)\Big).$$
Hence
$$\varphi(\gamma) = \det(I_m+\gamma^2\bar\Theta\bar\Theta^T)^{-1/2}\exp\Big(-\frac1h\,\Delta W(h)^T\big(I_m-(I_m+\gamma^2\bar\Theta\bar\Theta^T)^{-1}\big)\Delta W(h)\Big)$$
and thus
$$\varphi_{\tilde A(h)\,|\,\Delta W(h)}(\Theta) = \prod_{k=1}^{\infty}\det\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big)^{-1/2}\exp\Big(-\frac1h\,\Delta W(h)^T\Big(I_m-\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big)^{-1}\Big)\Delta W(h)\Big)$$
$$= \det\Big(\prod_{k=1}^{\infty}\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big)\Big)^{-1/2}\exp\Big(-\frac1h\,\Delta W(h)^T\sum_{k=1}^{\infty}\Big(I_m-\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big)^{-1}\Big)\Delta W(h)\Big),$$
where $c = h/(2\pi)$. If $a\ge0$, then
$$\prod_{k=1}^{\infty}\Big(1+\frac{a}{k^2}\Big) = \frac{\sinh(\pi\sqrt a)}{\pi\sqrt a} \qquad\text{and}\qquad \sum_{k=1}^{\infty}\Big(1-\Big(1+\frac{a}{k^2}\Big)^{-1}\Big) = \frac12\Big(\pi\sqrt a\,\frac{\cosh(\pi\sqrt a)}{\sinh(\pi\sqrt a)}-1\Big);$$
for $a = 0$ the functions by continuity equal one and zero, respectively. Since $\bar\Theta\bar\Theta^T$ is a symmetric and non-negative definite matrix, it then follows, by the spectral lemma for normal matrices (see e.g. Golub & van Loan, 1996, Theorem 11.1.3), that for $\bar\Theta\bar\Theta^T$ positive definite,
$$\prod_{k=1}^{\infty}\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big) = \sinh\Big(\frac h2\sqrt{\bar\Theta\bar\Theta^T}\Big)\Big(\frac h2\sqrt{\bar\Theta\bar\Theta^T}\Big)^{-1}$$
and
$$\sum_{k=1}^{\infty}\Big(I_m-\Big(I_m+\frac{c^2\bar\Theta\bar\Theta^T}{k^2}\Big)^{-1}\Big) = \frac12\Big(\frac h2\sqrt{\bar\Theta\bar\Theta^T}\,\cosh\Big(\frac h2\sqrt{\bar\Theta\bar\Theta^T}\Big)\sinh\Big(\frac h2\sqrt{\bar\Theta\bar\Theta^T}\Big)^{-1}-I_m\Big),$$
where the square root and the hyperbolic functions should be interpreted in the matrix sense. If $\bar\Theta$ is singular some extra precaution is necessary, and in fact $\bar\Theta$ is always singular if $m$ is odd. This follows from the fact that $\bar\Theta$ is skew-symmetric, and since skew-symmetric real matrices only have eigenvalues with zero real part, at least one of the eigenvalues must then be zero. To avoid problems in the singular case we simply use the matrix version of the function $\operatorname{sinch}(x) = \sinh(x)/x$ instead. This leads to the representation (3.1) and the proof is complete.
Note that the function $1/\operatorname{sinch}(x)$ is well-behaved on the entire real line and is in fact analytic in the strip $|\operatorname{Im}(x)| < \pi$. This implies that the characteristic function is analytic in the region where all eigenvalues of $\frac h2\big(\bar\Theta\bar\Theta^T\big)^{1/2}$ have their imaginary parts in $(-\pi,\pi)$.
We further remark that the conditional joint characteristic function of all the stochastic area integrals can in fact easily be obtained from the one calculated above, since exactly the same calculations go through with $\Theta$ replaced by a matrix that has arbitrary real-valued elements in all positions.
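As a small numerical check of this section's formulas, (3.1) can be evaluated through an eigendecomposition of $\bar\Theta\bar\Theta^T$. The sketch below (Python/numpy; the function name is ours) does this, and for $m=2$ it should reproduce Lévy's formula (1.4) once the factor $\exp(\imath\theta\,\Delta W_1\Delta W_2/2)$ coming from the second relation in (2.1) is put back:

```python
import numpy as np

def phi_area(Theta, dW, h):
    """Conditional characteristic function (3.1) of the area integrals,
    evaluated via the eigenvalues of Lambda = (h/2) sqrt(Tb Tb^T)."""
    Tb = Theta - Theta.T
    lam2, U = np.linalg.eigh((h / 2) ** 2 * (Tb @ Tb.T))
    lam = np.sqrt(np.clip(lam2, 0.0, None))          # eigenvalues of Lambda
    sinch = np.where(lam > 0.0,
                     np.sinh(lam) / np.where(lam > 0.0, lam, 1.0), 1.0)
    G = U @ np.diag(np.cosh(lam) / sinch - 1.0) @ U.T
    W = np.outer(dW, dW) / (2.0 * h)
    return np.prod(sinch) ** (-0.5) * np.exp(-np.trace(W @ G))
```

Multiplying `phi_area` for $m=2$ by the phase factor above matches the scalar formula (1.4) to machine precision, which is a useful sanity check on an implementation.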
4 Simulation of the iterated Itô integrals

Because the Brownian increments $\Delta W(h)$ can easily be simulated exactly while the iterated Itô integrals cannot, we consider an algorithm that first simulates $\Delta W(h)$ and then simulates approximations $\hat I_{ij}(h)$ of $I_{ij}(h)$ conditional on the realised $\Delta W(h)$. Furthermore, since the iterated Itô integrals and stochastic area integrals only differ by products of the Brownian increments, it is enough to simulate approximations $\hat A_{ij}(h)$ of $A_{ij}(h)$; then $\hat I_{ij}(h)$ can easily be constructed. Because of the relations (2.1) we only need to approximate $A_{ij}(h)$ for $i<j$; cf. the previous section. The MSE of interest is $\max_{i,j}\mathrm{E}(I_{ij}(h)-\hat I_{ij}(h))^2$, see Kloeden & Platen (1995, Corollary 10.6.5). It turns out, however, to be more convenient to work with the sum of squared errors, which is of course larger. Thus the error we consider is
$$\mathrm{E}\sum_{i<j}\big(I_{ij}(h)-\hat I_{ij}(h)\big)^2 = \mathrm{E}\sum_{i<j}\big(A_{ij}(h)-\hat A_{ij}(h)\big)^2.$$
The representation (2.2) immediately suggests a simulation algorithm: truncate the sum after $n$ terms. This algorithm is essentially the one proposed by Kloeden, Platen & Wright (1992) and it has an MSE of order $h^2/n$. In order to improve on this rate a careful analysis of the discarded tail-sum is needed. We shall show that this sum asymptotically has a multivariate Gaussian distribution, and that approximating it with a Gaussian random vector yields a convergence rate of order $h^2/n^2$. To carry out this proof, some more notation is needed, and we also need to formalise the operation of picking out elements with indices $i<j$. We now show how to do this. The Kronecker tensor product between two matrices $A$ and $B$ will be denoted $A\otimes B$ and is defined as
$$A\otimes B = \begin{pmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B\\ a_{21}B & a_{22}B & \cdots & a_{2n}B\\ \vdots & \vdots & \ddots & \vdots\\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{pmatrix}.$$
If $A$ is an $m\times n$ matrix and $B$ is a $p\times q$ matrix, then $A\otimes B$ is an $mp\times nq$ matrix. The operation vec is defined as the $mn\times1$ vector obtained by stacking the columns of a
matrix on top of each other, i.e.
$$\operatorname{vec}(A) = \begin{pmatrix} A_1\\ A_2\\ \vdots\\ A_n \end{pmatrix}$$
for an $m\times n$ matrix $A$ with columns $A_1,\ldots,A_n$. The representation (2.2) can now be written with column vectors as
$$\begin{pmatrix}\Delta W(h)\\ \operatorname{vec}(I(h)^T)\end{pmatrix} = \begin{pmatrix}\Delta W(h)\\ \dfrac{\Delta W(h)\otimes\Delta W(h)-\operatorname{vec}(hI_m)}{2}+\operatorname{vec}(A(h)^T)\end{pmatrix},$$
$$\operatorname{vec}(A(h)^T) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k\Big(X_k\otimes\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)-\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big). \qquad (4.1)$$
Let $P_m$ be the $m^2\times m^2$ permutation matrix which swaps rows $i$ and $j = 1+m((i-1)\bmod m)+(i-1)\operatorname{div}m$ for $i = 1,2,\ldots,m^2$. Then $P_m(X_k\otimes Y_k) = Y_k\otimes X_k$. From the definition of $P_m$ we see that it is symmetric, and thus $P_m$ is its own inverse. We rewrite the representation of $\operatorname{vec}(I(h)^T)$ as
$$\begin{pmatrix}\Delta W(h)\\ \operatorname{vec}(I(h)^T)\end{pmatrix} = \begin{pmatrix}\Delta W(h)\\ \dfrac{\Delta W(h)\otimes\Delta W(h)-\operatorname{vec}(hI_m)}{2}+\operatorname{vec}(A(h)^T)\end{pmatrix},$$
$$\operatorname{vec}(A(h)^T) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k(P_m-I_{m^2})\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big). \qquad (4.2)$$
We now want to pick out the $M$-dimensional subset of $A_{ij}(h)$'s corresponding to $i<j$, where $M = m(m-1)/2$. Thus define
$$\tilde A(h) = \big[A_{12}(h),\ldots,A_{1m}(h),A_{23}(h),\ldots,A_{2m}(h),\ldots,A_{m-1,m}(h)\big]^T.$$
This column vector can also be written $\tilde A(h) = K_m\operatorname{vec}(A(h)^T)$, where $K_m$ is an $M\times m^2$ matrix which picks out the elements $(k-1)m+k+1,\ldots,km$, for $k = 1,\ldots,m-1$, from $\operatorname{vec}(A(h)^T)$. The matrix $K_m$ is thus
$$K_m = \begin{pmatrix} 0_{m-1\times1} & I_{m-1} & 0_{m-1\times m(m-1)}\\ 0_{m-2\times m+2} & I_{m-2} & 0_{m-2\times m(m-2)}\\ \vdots & \vdots & \vdots\\ 0_{m-k\times(k-1)m+k} & I_{m-k} & 0_{m-k\times m(m-k)}\\ \vdots & \vdots & \vdots\\ 0_{1\times(m-2)m+m-1} & 1 & 0_{1\times m} \end{pmatrix}.$$
One may verify that $K_m$ and $P_m$ satisfy the following relations:
$$\begin{aligned} K_mK_m^T &= I_M,\\ K_m^TK_m &= \operatorname{diag}\big(0,1_{m-1}^T,0_2^T,1_{m-2}^T,\ldots,0_k^T,1_{m-k}^T,\ldots,0_{m-1}^T,1,0_m^T\big),\\ K_mP_mK_m^T &= 0_{M\times M},\\ K_m(I_{m^2}-P_m)K_m^T &= I_M,\\ (I_{m^2}-P_m)K_m^TK_m(I_{m^2}-P_m) &= I_{m^2}-P_m, \end{aligned} \qquad (4.3)$$
where $\operatorname{diag}(x^T)$ is the diagonal matrix with $x^T$ on its diagonal. Given $\tilde A(h)$, the other stochastic area integrals can easily be generated from $\tilde A(h)$ using the relations (2.1). More precisely,
$$\operatorname{vec}(A(h)^T) = (I_{m^2}-P_m)K_m^T\tilde A(h).$$
It follows from (4.2) that we can write $\tilde A(h)$ as
$$\tilde A(h) = \frac{h}{2\pi}\sum_{k=1}^{\infty}\frac1k\,K_m(P_m-I_{m^2})\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big).$$
We now split this sum into two by defining
$$\tilde A^{(n)}(h) = \frac{h}{2\pi}\sum_{k=1}^{n}\frac1k\,K_m(P_m-I_{m^2})\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big)$$
and
$$T_n = \frac{h}{2\pi}\sum_{k=n+1}^{\infty}\frac1k\,K_m(P_m-I_{m^2})\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big).$$
It is easy to see from these definitions that, given $\Delta W(h)$, $T_n$ and $\tilde A^{(n)}(h)$ are conditionally independent. We proceed by examining the tail-sum $T_n$ in closer detail.
The term $(P_m-I_{m^2})\big((Y_k+\sqrt{2/h}\,\Delta W(h))\otimes X_k\big)$ in the above sum can for each $k$, given $Y_k$ and $\Delta W(h)$, be seen as a conditionally Gaussian column vector with conditional mean $0_{m^2}$ and conditional covariance matrix $\Sigma(Y_k)$, where
$$\Sigma(Y_k) = (I_{m^2}-P_m)\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)^T\otimes I_m\Big)(I_{m^2}-P_m). \qquad (4.4)$$
Hence, given $Y = \{Y_k\}_{k\ge1}$ and $\Delta W(h)$, the conditional distribution of $T_n$ is Gaussian with mean $0_M$ and covariance matrix $(h/2\pi)^2\sum_{k=n+1}^{\infty}\tilde\Sigma(Y_k)/k^2$, where $\tilde\Sigma(Y_k) = K_m\Sigma(Y_k)K_m^T$. Write
$$T_n = \frac{h}{2\pi}\Big(\sum_{k=n+1}^{\infty}\frac{\tilde\Sigma(Y_k)}{k^2}\Big)^{1/2}G_n,$$
where $G_n$ is the random vector
$$G_n = \frac{2\pi}{h}\Big(\sum_{k=n+1}^{\infty}\frac{\tilde\Sigma(Y_k)}{k^2}\Big)^{-1/2}T_n.$$
The conditional distribution of $G_n$ given $Y$ and $\Delta W(h)$ is thus a standard Gaussian distribution $N(0_M,I_M)$, i.e. the conditional distribution does not depend on $Y$ and $\Delta W(h)$. Hence $G_n$ is a standard Gaussian vector independent of $Y$ and $\Delta W(h)$. From the above calculations it is evident that the random vector $T_n$ is a Gaussian variance mixture with random covariance matrix $(h/2\pi)^2\sum_{k=n+1}^{\infty}\tilde\Sigma(Y_k)/k^2$. We remark that in the scalar case ($m=2$), the distribution of the mixing random variance is infinitely divisible, which is equivalent to saying that $T_n$ has a so-called class G distribution. This was shown in Rydén & Wiktorsson (2001). For the definition of class G distributions, see e.g. Rosiński (1990). In the case $m>2$ it holds that each element in the random covariance matrix is infinitely divisible, but we have not been able to prove that $T_n$ has a multivariate class G distribution. We shall now examine the asymptotic properties of the random covariance matrix. Let $\Sigma_n$ denote the normalised version, i.e.
$$\Sigma_n = \frac1{a_n}\sum_{k=n+1}^{\infty}\frac{\tilde\Sigma(Y_k)}{k^2},$$
where $a_n = \sum_{k=n+1}^{\infty}1/k^2$. Define $\Sigma = \mathrm{E}_{Y_1}\tilde\Sigma(Y_1)$. Taking the expectation in (4.4) and using (4.3), it follows that
$$\Sigma = 2I_M+\frac2h\,K_m(I_{m^2}-P_m)\big(\Delta W(h)\,\Delta W(h)^T\otimes I_m\big)(I_{m^2}-P_m)K_m^T. \qquad (4.5)$$
We will show below that $\Sigma_n$ converges in mean square sense to the constant matrix $\Sigma$ as $n\to\infty$. This implies that $(2\pi/h)\,a_n^{-1/2}\,T_n$ converges weakly to a Gaussian vector with zero mean and covariance matrix $\Sigma$. This property gives rise to the following improved simulation algorithm for $\operatorname{vec}(I(h)^T)$:
1. Simulate $\Delta W(h)$ from $N(0_m,hI_m)$.

2. First approximate the stochastic area integrals as
$$\tilde A^{(n)}(h) = \frac{h}{2\pi}\sum_{k=1}^{n}\frac1k\,K_m(P_m-I_{m^2})\Big(\Big(Y_k+\sqrt{\tfrac2h}\,\Delta W(h)\Big)\otimes X_k\Big),$$
where $X_k\sim N(0_m,I_m)$ and $Y_k\sim N(0_m,I_m)$.

3. Simulate $G_n\sim N(0_M,I_M)$ and add the tail-sum approximation:
$$\hat A^{(n)}(h) = \tilde A^{(n)}(h)+\frac{h}{2\pi}\,a_n^{1/2}\,\Sigma^{1/2}G_n.$$

4. Finally define the approximation $\operatorname{vec}(I(h)^T)^{(n)}$ of $\operatorname{vec}(I(h)^T)$ as
$$\operatorname{vec}(I(h)^T)^{(n)} = \frac{\Delta W(h)\otimes\Delta W(h)-\operatorname{vec}(hI_m)}{2}+(I_{m^2}-P_m)K_m^T\hat A^{(n)}(h).$$
The following result gives a bound on the maximal conditional MSE given $\Delta W(h)$. We also give an explicit expression for the square root of the asymptotic covariance matrix $\Sigma$.
Theorem 4.1 (i) The maximal conditional MSE for the approximation of the iterated Itô integrals given $\Delta W(h)$, for $n\ge1$, satisfies
$$\max_{i,j}\,\mathrm{E}\big[|I_{ij}(h)-I_{ij}^{(n)}(h)|^2\,\big|\,\Delta W(h)\big] \le \sum_{i<j}\mathrm{E}\big[|I_{ij}(h)-I_{ij}^{(n)}(h)|^2\,\big|\,\Delta W(h)\big] \le \frac{h^2\,m(m-1)\big(m+4\|\Delta W(h)\|^2/h\big)}{24\pi^2n^2}. \qquad (4.6)$$
(ii) The matrix square root $\Sigma^{1/2}$ can be explicitly written as
$$\Sigma^{1/2} = \frac{\Sigma+2\sqrt{1+\|\Delta W(h)\|^2/h}\;I_M}{\sqrt2+\sqrt{2\big(1+\|\Delta W(h)\|^2/h\big)}}. \qquad (4.7)$$
As mentioned in the introduction, the MSE in the simulation of the iterated Itô integrals should be no larger than $Ch^3$. We then see that it is enough to choose
$$n \ge \sqrt{\frac{m(m-1)\big(m+4\|\Delta W(h)\|^2/h\big)}{24\pi^2Ch}} \qquad (4.8)$$
in the approximation of the iterated Itô integrals. Taking the expectation over $\Delta W(h)$ of the right-hand side of (4.6), it follows that
$$\max_{i,j}\,\mathrm{E}\big[|I_{ij}(h)-I_{ij}^{(n)}(h)|^2\big] \le \frac{5h^2m^2(m-1)}{24\pi^2n^2}.$$
Hence, there are two ways of choosing $n$. Either we first simulate $\Delta W(h)$ and then choose $n$ according to (4.8), or we fix $n$ beforehand according to
$$n \ge \sqrt{\frac{5m^2(m-1)}{24\pi^2Ch}} \qquad (4.9)$$
and thus do not take $\Delta W(h)$ into account when selecting $n$. By Jensen's inequality, the mean of the right-hand side of (4.8) is smaller than the right-hand side of (4.9), so that the first way of choosing $n$ yields a smaller $n$ on the average.
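The two rules (4.8) and (4.9) translate directly into code. A sketch (Python, ours), where `C` is the constant in the target MSE bound $Ch^3$:

```python
import numpy as np

def n_adaptive(m, h, C, dW):
    """Rule (4.8): uses the realised Brownian increments dW (array of length m)."""
    d = m * (m - 1) * (m + 4 * (dW @ dW) / h)
    return int(np.ceil(np.sqrt(d / (24 * np.pi ** 2 * C * h))))

def n_fixed(m, h, C):
    """Rule (4.9): fixed before dW is seen."""
    return int(np.ceil(np.sqrt(5 * m ** 2 * (m - 1) / (24 * np.pi ** 2 * C * h))))
```

Both rules scale like $h^{-1/2}$, which is the source of the $\varepsilon^{-3/2}$ work bound quoted in the introduction.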
Proof of Theorem 4.1. The first part is an immediate consequence of Theorem 4.3 below. The second part follows by direct calculation. First let $a = \sqrt{1+\|\Delta W(h)\|^2/h}$ to simplify the notation. Then, with $A$ as in the right-hand side of (4.7),
$$A^2 = \frac{\Sigma^2+4a\Sigma+4a^2I_M}{2(1+a)^2}.$$
Hence we need to show that
$$\Sigma^2-(2+2a^2)\Sigma+4a^2I_M = 0_{M\times M},$$
which is the same as saying that $\Sigma$ has minimal polynomial $(x-2)(x-2a^2)$. This is further the same as saying that $\Sigma$ only has two different eigenvalues, namely $2$ and $2+2\|\Delta W(h)\|^2/h$, and since $\Sigma = 2I_M+B$, where $B$ is the non-negative definite matrix $K_m(I_{m^2}-P_m)\big((2/h)\,\Delta W(h)\,\Delta W(h)^T\otimes I_m\big)(I_{m^2}-P_m)K_m^T$, it is enough to show that $B^2 = \big(2\|\Delta W(h)\|^2/h\big)B$. This is equivalent to $B$ having minimal polynomial $x\big(x-2\|\Delta W(h)\|^2/h\big)$. Writing $\bar W$ for $\sqrt{2/h}\,\Delta W(h)$ to simplify the notation and using (4.3), it follows that
$$\begin{aligned} B^2 &= K_m(I_{m^2}-P_m)(\bar W\bar W^T\otimes I_m)(I_{m^2}-P_m)K_m^T\,K_m(I_{m^2}-P_m)(\bar W\bar W^T\otimes I_m)(I_{m^2}-P_m)K_m^T\\ &= K_m(I_{m^2}-P_m)(\bar W\bar W^T\otimes I_m)(I_{m^2}-P_m)(\bar W\bar W^T\otimes I_m)(I_{m^2}-P_m)K_m^T\\ &= K_m(I_{m^2}-P_m)\Big(\|\bar W\|^2(\bar W\bar W^T\otimes I_m)-(\bar W\bar W^T\otimes I_m)P_m(\bar W\bar W^T\otimes I_m)\Big)(I_{m^2}-P_m)K_m^T. \end{aligned}$$
Now using the definition of $P_m$, so that $(\bar W\bar W^T\otimes I_m)P_m(\bar W\bar W^T\otimes I_m) = (\bar W\bar W^T\otimes\bar W\bar W^T)P_m$, it follows that
$$B^2 = \|\bar W\|^2B-K_m(I_{m^2}-P_m)\big(\bar W\bar W^T\otimes\bar W\bar W^T\big)P_m(I_{m^2}-P_m)K_m^T.$$
The last term is zero, since
$$\bar W\bar W^T\otimes\bar W\bar W^T = (\bar W\otimes\bar W)(\bar W\otimes\bar W)^T$$
and from the definition of $P_m$ it is evident that $(I_{m^2}-P_m)(\bar W\otimes\bar W) = 0_{m^2}$. This concludes the proof.
In the following, the usual operator norm will be denoted $\|B\|$ and the Frobenius norm of a $p\times q$ matrix $B$, defined as $\big(\sum_{i=1}^p\sum_{j=1}^q B_{ij}^2\big)^{1/2}$, will be denoted $\|B\|_F$.
Theorem 4.2 Conditional on $\Delta W(h)$, $\Sigma_n$ converges in mean square sense to $\Sigma$. Moreover,
$$\mathrm{E}_Y\|\Sigma_n-\Sigma\|_F^2 \le \frac{d_m}{3n} \qquad\text{for } n\ge1,$$
where $d_m = m(m-1)\big(m+4\|\Delta W(h)\|^2/h\big)$.
Proof. First observe that
$$\mathrm{E}_Y\Sigma_n = \frac1{a_n}\sum_{k=n+1}^{\infty}\frac{\mathrm{E}_Y\tilde\Sigma(Y_k)}{k^2} = \mathrm{E}_Y\tilde\Sigma(Y_1) = \Sigma.$$
Hence, by the definition of the Frobenius norm,
$$\mathrm{E}_Y\|\Sigma_n-\Sigma\|_F^2 = \sum_{p,q=1}^{M}\mathrm{E}_Y\big((\Sigma_n)_{pq}-\Sigma_{pq}\big)^2 = \sum_{p,q=1}^{M}\mathrm{V}_Y\big((\Sigma_n)_{pq}\big),$$
and since $\{\tilde\Sigma(Y_k)\}_{k>n}$ is an i.i.d. sequence of matrices we obtain
$$\mathrm{E}_Y\|\Sigma_n-\Sigma\|_F^2 = \frac1{a_n^2}\sum_{k=n+1}^{\infty}\frac1{k^4}\sum_{p,q=1}^{M}\mathrm{V}_Y\big(\tilde\Sigma(Y_k)_{pq}\big) = \frac{b_n}{a_n^2}\,d_m,$$
where $b_n = \sum_{k=n+1}^{\infty}1/k^4$ and $d_m = \sum_{p,q=1}^{M}\mathrm{V}_Y\big(\tilde\Sigma(Y_1)_{pq}\big)$. Approximating the sums $a_n$ and $b_n$ by integrals yields
$$\frac{b_n}{a_n^2} \le \frac{\int_{n+1/2}^{\infty}x^{-4}\,dx}{\big(\int_{n+3/4}^{\infty}x^{-2}\,dx\big)^2} = \frac{(n+3/4)^2}{3(n+1/2)^3} \le \frac1{3n} \qquad\text{for } n\ge1.$$
We now turn to the calculation of $d_m$. Recall the definition (4.4) of $\Sigma(Y_k)$. To simplify the notation we drop the index $k$ on $Y_k$, write $\bar W = \sqrt{2/h}\,\Delta W(h)$, and define
$$Q = (Y+\bar W)(Y+\bar W)^T\otimes I_m-\mathrm{E}\big[(Y+\bar W)(Y+\bar W)^T\otimes I_m\big] = \big(YY^T+Y\bar W^T+\bar WY^T-I_m\big)\otimes I_m.$$
Then we can write
$$\begin{aligned} d_m &= \mathrm{E}_Y\operatorname{tr}\big[\big(\tilde\Sigma(Y)-\mathrm{E}_Y\tilde\Sigma(Y)\big)\big(\tilde\Sigma(Y)-\mathrm{E}_Y\tilde\Sigma(Y)\big)^T\big]\\ &= \mathrm{E}_Y\operatorname{tr}\big[K_m(I_{m^2}-P_m)Q(I_{m^2}-P_m)K_m^T\,K_m(I_{m^2}-P_m)Q(I_{m^2}-P_m)K_m^T\big]\\ &= \mathrm{E}_Y\operatorname{tr}\big[Q(I_{m^2}-P_m)K_m^TK_m(I_{m^2}-P_m)\,Q\,(I_{m^2}-P_m)K_m^TK_m(I_{m^2}-P_m)\big]\\ &= \mathrm{E}_Y\operatorname{tr}\big[Q(I_{m^2}-P_m)Q(I_{m^2}-P_m)\big] = \mathrm{E}_Y\operatorname{tr}\big[Q^2-2Q^2P_m+QP_mQP_m\big], \end{aligned}$$
where we used (4.3). Now start with $\operatorname{tr}(Q^2)$:
$$\mathrm{E}_Y\operatorname{tr}(Q^2) = \operatorname{tr}(I_m)\,\mathrm{E}_Y\operatorname{tr}\big((YY^T+Y\bar W^T+\bar WY^T-I_m)^2\big).$$
Expanding the square, the terms that are odd in $Y$ have zero expectation, and using $\mathrm{E}_Y\operatorname{tr}(YY^TYY^T) = m(m+2)$, $\mathrm{E}_Y\operatorname{tr}(Y\bar W^TY\bar W^T) = \mathrm{E}_Y\operatorname{tr}(\bar WY^T\bar WY^T) = \|\bar W\|^2$ and $\mathrm{E}_Y\operatorname{tr}(Y\bar W^T\bar WY^T) = m\|\bar W\|^2$ gives
$$\mathrm{E}_Y\operatorname{tr}(Q^2) = m\big(m(m+2)-2m+m+(2+2m)\|\bar W\|^2\big) = m(m+1)\big(m+4\|\Delta W(h)\|^2/h\big).$$
Continue with $\operatorname{tr}(QP_mQP_m)$. By the definition of $P_m$,
$$P_mQP_m = I_m\otimes\big(YY^T+Y\bar W^T+\bar WY^T-I_m\big),$$
and thus
$$\begin{aligned} \mathrm{E}_Y\operatorname{tr}(QP_mQP_m) &= \mathrm{E}_Y\big[\big(\operatorname{tr}(YY^T+Y\bar W^T+\bar WY^T-I_m)\big)^2\big] = \mathrm{E}_Y\big[(Y^TY+2\bar W^TY-m)^2\big]\\ &= \mathrm{E}_Y\big[(Y^TY)^2+4(\bar W^TY)^2+m^2+4Y^TY\,\bar W^TY-2mY^TY-4m\bar W^TY\big]\\ &= 2\big(m+4\|\Delta W(h)\|^2/h\big). \end{aligned}$$
Finally
$$2\,\mathrm{E}_Y\operatorname{tr}(Q^2P_m) = 2(m+1)\big(m+4\|\Delta W(h)\|^2/h\big)$$
is obtained similarly, giving
$$d_m = \big(m(m+1)-2(m+1)+2\big)\big(m+4\|\Delta W(h)\|^2/h\big) = m(m-1)\big(m+4\|\Delta W(h)\|^2/h\big).$$
This concludes the proof.
In order to calculate the MSE in the approximation of $\operatorname{vec}(I(h)^T)$, the following lemma will be useful.
Lemma 4.1 If $A$ and $B$ are two symmetric positive definite matrices, then
$$\|A^{1/2}-B^{1/2}\|_F \le \frac{\|A-B\|_F}{\sqrt\lambda},$$
where $\lambda$ is the smallest eigenvalue of $B$.
Proof. First observe that for $a>0$,
$$\sqrt a = \frac1\pi\int_0^{\infty}\frac{a}{a+t}\,t^{-1/2}\,dt.$$
By the spectral lemma for normal matrices it then follows that
$$A^{1/2} = \frac1\pi\int_0^{\infty}A(A+tI)^{-1}\,t^{-1/2}\,dt$$
for any symmetric positive definite matrix $A$. Thus
$$A^{1/2}-B^{1/2} = \frac1\pi\int_0^{\infty}\big(A(A+tI)^{-1}-B(B+tI)^{-1}\big)\,t^{-1/2}\,dt = \frac1\pi\int_0^{\infty}(A+tI)^{-1}(A-B)(B+tI)^{-1}\,t^{1/2}\,dt,$$
so that
$$\|A^{1/2}-B^{1/2}\|_F \le \frac1\pi\int_0^{\infty}\big\|(A+tI)^{-1}(A-B)(B+tI)^{-1}\big\|_F\,t^{1/2}\,dt.$$
Using twice that $\|CD\|_F\le\|C\|\,\|D\|_F$ for any pair of matrices $C$ and $D$ of compatible dimensions, we obtain
$$\big\|(A+tI)^{-1}(A-B)(B+tI)^{-1}\big\|_F \le \big\|(A+tI)^{-1}\big\|\,\|A-B\|_F\,\big\|(B+tI)^{-1}\big\|.$$
We bound these norms from above by $\|(A+tI)^{-1}\|\le1/t$ and $\|(B+tI)^{-1}\|\le1/(\lambda+t)$, yielding
$$\|A^{1/2}-B^{1/2}\|_F \le \frac{\|A-B\|_F}{\pi}\int_0^{\infty}\frac{t^{-1/2}}{\lambda+t}\,dt = \frac{\|A-B\|_F}{\sqrt\lambda}.$$
Theorem 4.3 The conditional MSE, given $\Delta W(h)$, in the approximation $(h/2\pi)\,a_n^{1/2}\,\Sigma^{1/2}G_n$ of $T_n$ is bounded by
$$\mathrm{E}\Big[\Big\|T_n-\frac{h}{2\pi}a_n^{1/2}\Sigma^{1/2}G_n\Big\|^2\,\Big|\,\Delta W(h)\Big] \le \frac{d_mc^2}{6n^2} \qquad\text{for } n\ge1,$$
where $c = h/(2\pi)$ and $d_m = m(m-1)\big(m+4\|\Delta W(h)\|^2/h\big)$.
Proof. Since $T_n = c\,a_n^{1/2}\,\Sigma_n^{1/2}G_n$,
$$\begin{aligned} \mathrm{E}\Big[\Big\|T_n-\frac{h}{2\pi}a_n^{1/2}\Sigma^{1/2}G_n\Big\|^2\,\Big|\,\Delta W(h)\Big] &= c^2a_n\,\mathrm{E}\big[\|(\Sigma_n^{1/2}-\Sigma^{1/2})G_n\|^2\,\big|\,\Delta W(h)\big]\\ &= c^2a_n\,\mathrm{E}\big[\mathrm{E}[\|(\Sigma_n^{1/2}-\Sigma^{1/2})G_n\|^2\mid Y,\Delta W(h)]\,\big|\,\Delta W(h)\big]\\ &= c^2a_n\,\mathrm{E}\big[\|\Sigma_n^{1/2}-\Sigma^{1/2}\|_F^2\,\big|\,\Delta W(h)\big]. \end{aligned}$$
By Lemma 4.1,
$$\mathrm{E}_Y\|\Sigma_n^{1/2}-\Sigma^{1/2}\|_F^2 \le \frac1\lambda\,\mathrm{E}_Y\|\Sigma_n-\Sigma\|_F^2 \le \frac12\,\mathrm{E}_Y\|\Sigma_n-\Sigma\|_F^2;$$
here $\lambda\ge2$ follows from the proof of Theorem 4.1, since the smallest eigenvalue of $\Sigma$ is at least 2. Now use Theorem 4.2 and the bound $a_n\le1/n$ to finish the proof.
We remark that the tail-sum associated with the full vector $\operatorname{vec}(A(h)^T)$ of stochastic area integrals is a Gaussian variance mixture as well, and its conditional covariance matrix converges in mean square sense to a non-stochastic limit. However, this limit, as well as the conditional covariance matrix, is singular and indeed has rank $M$, so that the corresponding Gaussian variance mixture has support on a subspace of dimension $M$. This invalidates the use of Lemma 4.1 for the asymptotic analysis of the algorithm and is the main reason for working with the set $\tilde A(h)$ throughout this section.
Acknowledgments We thank Professor Anders Melin for giving the proof of Lemma 4.1. We also thank the referee for the correction of some minor misprints.
References

Gaines, J.G. & Lyons, T.J. (1994). Random generation of stochastic area integrals. SIAM J. Appl. Math. 54, 1132–1146.
Golub, G.H. & van Loan, C.F. (1996). Matrix Computations, 3rd ed. Johns Hopkins University Press, London.
Kloeden, P.E., Platen, E. & Wright, W. (1992). The approximation of multiple stochastic integrals. Stochastic Anal. Appl. 10, 431–441.
Kloeden, P.E. & Platen, E. (1995). Numerical Solution of Stochastic Differential Equations. Springer-Verlag, Berlin.
Lévy, P. (1951). Wiener's random functional and other Laplacian random functionals. Proceedings of the Second Berkeley Symposium on Mathematical Statistics and Probability. J. Neyman, ed., pp. 171–187, University of California Press, Berkeley.
Lévy, P. (1965). Processus Stochastiques et Mouvement Brownien. Gauthier-Villars, Paris.
Mathai, A.M. & Provost, S.B. (1992). Quadratic Forms in Random Variables: Theory and Applications. Marcel Dekker, New York.
Milshtein, G.N. (1974). Approximate integration of stochastic differential equations. Theory Prob. Appl. 19, 557–562.
Rosiński, J. (1990). On the representation of infinitely divisible random vectors. Ann. Probab. 18, 405–430.
Rümelin, W. (1982). Numerical treatment of stochastic differential equations. SIAM J. Numer. Anal. 19, 604–613.
Rydén, T. & Wiktorsson, M. (2001). On the simulation of iterated Itô integrals. Stochastic Process. Appl. 91, 151–168.
Talacko, J. (1956). Perks distributions and their role in the theory of Wiener's stochastic variables. Trabajos de Estadistica 7, 159–174.

Centre for Mathematical Sciences
Mathematical Statistics
Lund University
Box 118
SE-221 00 Lund, Sweden
E-mail:
[email protected]
62
C
Paper C
Improved convergence rate for the simulation of Lévy processes of type G M AGNUS W IKTORSSON Centre for Mathematical Sciences Lund University Box 118 221 00 Lund, Sweden
Abstract A random variable is said to be of type G if it is a Gaussian variance mixture whose mixing distribution is infinitely divisible. A Lévy process is said to be of type G if its increments are of type G. Every such Lévy process on [0, 1] can be represented as an infinite series which converges uniformly a.s. In practice, however, we must truncate this infinite series. The question is then how to approximate the neglected terms in the series, the tail-sum process, by a simpler stochastic process so that a good convergence rate is achieved. To do this we note that both the original process and the tail-sum process can be represented as subordinated Wiener processes. The main idea is then to approximate the subordinator (i.e. a non-negative increasing Lévy process) by its mean function. This leads to an approximation of the original process with a better integrated mean square convergence rate than that obtained using the truncated series representation alone. We also show that this approach generalises to arbitrary real-valued subordinated Lévy processes provided that the subordinand has two finite moments and can be simulated exactly.
Key words: type G distribution, variance mixture, Lévy process, shot noise representation, stochastic time-change, subordination. 2000 Mathematics Subject Classification: Primary 60G51; Secondary 60E07, 60G15
1 Introduction
Recently, various Lévy processes have been used as driving processes in stock price modelling as an alternative to the Wiener process, and the simulation of Lévy processes has therefore received new attention. Rydberg (1997) used a Normal inverse Gaussian Lévy process, which is of type G, as a model for financial data. A more extensive treatment of Normal inverse Gaussian processes can be found in Barndorff-Nielsen (1998) (see also Barndorff-Nielsen & Pérez-Abreu, 1999). Protter & Talay (1997) studied stochastic differential equations (SDEs) driven by Lévy processes. Jacod & Protter (1998) found that the asymptotic error distribution for the Euler method, when simulating certain SDEs driven by Lévy processes with a Gaussian component, converges weakly to a type G Lévy process with a Gaussian component. An important difference compared to the case where the driving process is a standard Wiener process is that in general there is no method for exact simulation of the driving Lévy process. If an approximation of the driving process is used together with an approximate solution method for the SDE, such as the Euler method, a slower convergence rate of the solution is expected. This makes it important to obtain approximations of the driving process with small error.
2 Lévy processes

We now state some elementary properties of Lévy processes; for a more general treatment see Bertoin (1996) and Sato (1999). A (homogeneous) Lévy process is a stochastic process with independent stationary increments. Every Lévy process $X(t)$ can be decomposed as

$$X(t) = at + \sigma W(t) + Z(t),$$

where $at$ is a linear drift, $W(t)$ is a standard Wiener process and $Z(t)$ is a pure jump process. The distribution of $X(1)$ completely determines the finite-dimensional distributions of $\{X(t)\}$. Moreover, there is a one-to-one correspondence between the infinitely divisible distributions and the distributions of $X(1)$. If a Lévy process has finite expectation and variance they are linear functions of $t$, i.e.

$$E\,X(t) = t\,E\,X(1), \qquad \operatorname{Var} X(t) = t \operatorname{Var} X(1).$$
2.1 Infinitely divisible distributions
A random variable $X$ on $\mathbb{R}$ is said to be infinitely divisible (ID) if for every $n$ there exist i.i.d. random variables $X_1^{(n)}, \ldots, X_n^{(n)}$ such that

$$X \overset{d}{=} X_1^{(n)} + \cdots + X_n^{(n)}.$$
This implies that the characteristic function $\varphi_X$ of $X$ can be written as

$$\varphi_X(t) = \bigl( \varphi_{X^{(n)}}(t) \bigr)^n,$$

where $\varphi_{X^{(n)}}(t)$ is a characteristic function for each $n \ge 1$. Every characteristic function of an ID random variable $X$ can be written in the following form, the so-called Lévy–Khinchine canonical representation,

$$\varphi_X(\xi) = \exp\Bigl( i \xi a - \frac{\sigma^2 \xi^2}{2} + \int_{\mathbb{R}} \bigl( e^{i \xi x} - 1 - i \xi x \, I(|x| \le 1) \bigr) \, L(dx) \Bigr),$$

where $L$ is called the Lévy measure. If $\sigma^2 = 0$ then $X$ is said to have no Gaussian component. Any positive $\sigma$-finite measure $L$ which assigns finite mass to sets bounded away from zero and satisfies

$$\int_{\mathbb{R}} \min(1, x^2) \, L(dx) < \infty$$

can be used as a Lévy measure. If the ID random variable is positive, the Laplace transform of the distribution can be represented as

$$\tilde\varphi_X(\lambda) = \exp\Bigl( -a \lambda + \int_0^\infty \bigl( e^{-\lambda x} - 1 \bigr) \, L(dx) \Bigr).$$

The Lévy measure $L$ then satisfies the stronger integrability condition

$$\int_0^\infty \min(1, x) \, L(dx) < \infty.$$
2.2 Type G distributions

An ID random variable $X$ is said to be of type G if it can be represented as a Gaussian variance mixture with an ID mixing distribution, i.e. $X \overset{d}{=} G V^{1/2}$, where $G$ is a standard Gaussian variable and $V \ge 0$ is ID. From Rosiński (1991) we recall the following result.

Theorem 2.1 A random variable $X$ on $\mathbb{R}$ has a type G distribution if and only if the function $\Psi$ defined by $\Psi(\xi^2/2) := -\log E \exp(i \xi X)$ satisfies $\Psi(0) = 0$ and has a completely monotone derivative on $(0, \infty)$.

Recall that a function $f(x)$, $x \ge 0$, is called completely monotone if its derivatives have alternating signs, i.e. if $(-1)^n f^{(n)}(x) \ge 0$ for $x > 0$ and $n = 0, 1, 2, \ldots$. If $\lim_{\lambda \to \infty} \Psi(\lambda)/\lambda = 0$ the corresponding type G distribution has no Gaussian component (Rosiński, 1991). Since it is well known how to simulate Gaussian random variables we from now on assume that $\lim_{\lambda \to \infty} \Psi(\lambda)/\lambda = 0$.
It is apparent from Theorem 2.1 that

$$\Psi(\lambda) = -\log E \exp\bigl( i (2\lambda)^{1/2} X \bigr) = -\log E \exp(-\lambda V) \quad \text{for } \lambda \ge 0.$$

This implies that the Lévy measures $N$ and $M$ of the type G distribution and the variance mixing distribution, respectively, satisfy

$$\int_{\mathbb{R}} \bigl( \cos(\xi x) - 1 \bigr) \, N(dx) = \int_0^\infty \bigl( \exp(-\xi^2 y/2) - 1 \bigr) \, M(dy).$$

We can rewrite this as

$$n(x) = \int_0^\infty \frac{1}{\sqrt{2\pi y}} \exp\Bigl( -\frac{x^2}{2y} \Bigr) \, M(dy), \qquad (2.1)$$

where $n$ is the density of $N$ with respect to Lebesgue measure. Hence, the Lévy measure of a type G distribution is a Gaussian variance mixture with a density supported on the whole of $\mathbb{R}$. From this relation it is also obvious that $N$ and $M$ have the same total mass, since the mixture is mass-preserving. Rosiński (1991) showed that every type G random variable $X$ with no Gaussian component can be represented as a series of shot noise type. More precisely,
$$X \overset{d}{=} \sum_{k=1}^{\infty} g(T_k)^{1/2} G_k \overset{d}{=} G \Bigl( \sum_{k=1}^{\infty} g(T_k) \Bigr)^{1/2}, \qquad (2.2)$$

where $T_k$ are the points of a homogeneous Poisson process on $(0, \infty)$ with intensity $\lambda$,

$$g(u) = \inf\{ x > 0 : M((x, \infty)) < \lambda u \}, \quad u > 0, \qquad (2.3)$$

is the generalised inverse of the tail of the Lévy measure $M$, $\{G_k\}$ is an i.i.d. sequence of standard Gaussian variables independent of $\{T_k\}$, and $G$ is a standard Gaussian variable. More generally we can use the ideas in Bondesson (1982) to represent the random variable $X$ by means of a family $\{Y(u)\}$ of independent non-negative random variables as

$$X \overset{d}{=} \sum_{k=1}^{\infty} Y(T_k)^{1/2} G_k \overset{d}{=} G \Bigl( \sum_{k=1}^{\infty} Y(T_k) \Bigr)^{1/2}, \qquad (2.4)$$

provided that their distribution functions $H(\cdot, u)$ satisfy the relation

$$\lambda \int_0^\infty H(dy, u) \, du = M(dy).$$
The representation (2.2) is a special case of (2.4), obtained when $H(dy, u)$ has a single atom at $g(u)$. There is also another special case of (2.4) which is of specific interest. Let $Y$ be a positive random variable such that $E\,Y = \mu_1 < \infty$ and $E\,Y^2 = \mu_2 < \infty$. Take $Y(u) \overset{d}{=} c(u) Y$, where $c$ is non-increasing with $\int_T^\infty c(u)\,du < \infty$ and $\int_T^\infty c(u)^2\,du < \infty$ for $T > 0$. Then $H(y, u) = F(y/c(u))$, where $F$ is the distribution function of $Y$, and $M(dx) = \lambda \int_0^\infty F(dx/c(u)) \, du$. The corresponding series representation is

$$X \overset{d}{=} \sum_{k=1}^{\infty} \bigl( Y_k \, c(T_k) \bigr)^{1/2} G_k, \qquad (2.5)$$
where $Y_k$ are independent copies of $Y$.

If $\Psi(\infty) < \infty$ the Lévy measures $N$ and $M$ have finite mass (Rosiński, 1991). This implies that we can find a series representation of shot noise type for which the function $g(u)$ has finite support. In fact, this support can always be taken to be $[0, 1]$. To see this, let $\lambda = M((0, \infty))$. Then $g$, as defined above, vanishes outside $[0, 1]$ and hence $V \overset{d}{=} \sum_{k: T_k \le 1} g(T_k)$ and $X \overset{d}{=} G V^{1/2}$.

Example 2.1 (Inverse method) Assume that the variance mixing random variable $V$ has Lévy measure $M(dx) = (1/\cosh(x)) \, dx$, $x > 0$. Then $\lambda = M((0, \infty)) = \pi/2$ and $g(u) = \log(\cot(\pi u/4))$, $0 \le u \le 1$. This means that we on average have to simulate $\pi/2$ random variables to obtain a realisation of $X$.

If the variance mixing random variable has a Lévy measure with finite mass and a density, we can use a rejection method to simulate from the mixing distribution.

Example 2.2 (Rejection method) Assume that $M(dx) = \exp(-1/x - x) \, dx$. Then $\lambda = M((0, \infty)) = 2 K_1(2) \approx 0.2797$ and thus $M/\lambda$ is a probability measure on $(0, \infty)$; here $K_1$ is the modified Bessel function of the second kind of order one. We can simulate from $M/\lambda$ by rejection from the standard exponential distribution, with rejection constant $C = 1/\lambda$.

From now on we assume that $\Psi(\infty) = \infty$, i.e. that $M$ has infinite total mass, so that the series representation always has to be truncated.
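As a concrete illustration of Example 2.2, the following sketch (Python; the function name and sample size are our own choices) simulates from the normalised measure $M/\lambda$ by rejection: a standard exponential proposal $x$ is accepted with probability $\exp(-1/x)$, the ratio of the target density to $\lambda^{-1}$ times the proposal density, so that on average $C = 1/\lambda = 1/(2K_1(2)) \approx 3.575$ proposals are needed per accepted sample.

```python
import math
import random

def sample_mixing_levy(rng):
    """Draw one sample from M(dx)/M((0,inf)) with M(dx) = exp(-1/x - x) dx,
    by rejection from the standard exponential distribution.
    Returns the accepted value and the number of proposals used."""
    trials = 0
    while True:
        trials += 1
        x = rng.expovariate(1.0)            # proposal density exp(-x)
        if x > 0.0 and rng.random() < math.exp(-1.0 / x):
            return x, trials

rng = random.Random(12345)
results = [sample_mixing_levy(rng) for _ in range(20000)]
mean_trials = sum(t for _, t in results) / len(results)
# mean_trials should be close to 1 / (2 K_1(2)) ~ 3.575
```

The empirical mean number of proposals per sample estimates the rejection constant $C$ and can be compared with $1/(2K_1(2))$.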
2.3 Lévy processes of type G

Type G processes were first treated by Marcus (1987) and then in a somewhat more general form (concerning the $d$-dimensional situation) described by Rosiński (1990) and more extensively by Rosiński (1991). A type G Lévy process is a Lévy process whose increments are of type G. We here restate a theorem from Rosiński (1991) for such processes.

Theorem 2.2 A Lévy process $\{X(t)\}_{0 \le t \le 1}$ is of type G if and only if

$$E \exp\Bigl( i \sum_{k=1}^n X(t_k) a_k \Bigr) = \exp\Bigl( -\sum_{k=1}^n (t_k - t_{k-1}) \, \Psi\Bigl( \tfrac12 \Bigl( \sum_{m=k}^n a_m \Bigr)^2 \Bigr) \Bigr)$$

for every $0 = t_0 < t_1 < \cdots < t_n \le 1$, $a_1, \ldots, a_n \in \mathbb{R}$ and $n \ge 1$, where $\Psi(0) = 0$ and $\Psi$ has a completely monotone derivative on $(0, \infty)$.
Rosiński (1991) showed that every Lévy process $\{X(t)\}_{0 \le t \le 1}$ of type G admits the series representation

$$\{X(t)\}_{0 \le t \le 1} \overset{d}{=} \Bigl\{ \sum_{k=1}^{\infty} Y(T_k)^{1/2} G_k \, I(U_k \le t) \Bigr\}_{0 \le t \le 1}, \qquad (2.6)$$

where $\overset{d}{=}$ means equality in finite-dimensional distributions, $T_k$ are the points of a homogeneous Poisson process on $(0, \infty)$ with intensity $\lambda$, $\{G_k\}$ is a sequence of independent standard Gaussian random variables, $\{Y(u)\}$ is a family of independent non-negative random variables with distribution functions $H(\cdot, u)$ such that $\lambda \int_0^\infty H(dy, u) \, du = M(dy)$, or equivalently

$$\lambda \int_0^\infty \bigl( E[\exp(-\theta\,Y(u))] - 1 \bigr) \, du = -\Psi(\theta),$$

$\{U_k\}$ is a sequence of independent random variables uniformly distributed on $(0, 1)$, and $\{Y(u)\}$, $\{G_k\}$, $\{T_k\}$ and $\{U_k\}$ are mutually independent. This is just a slight modification of the representation (2.4). We also define the variance process $\{V(t)\}_{0 \le t \le 1}$ by

$$V(t) = \sum_{k=1}^{\infty} Y(T_k) \, I(U_k \le t). \qquad (2.7)$$
The process $\{V(t)\}$ is a non-decreasing positive Lévy process with càdlàg (RCLL) paths. Now define the truncated series representation $\{X_T(t)\}_{0 \le t \le 1}$ by

$$X_T(t) = \sum_{k: T_k \le T} Y(T_k)^{1/2} G_k \, I(U_k \le t). \qquad (2.8)$$

The right-hand side of (2.8) converges uniformly in $t$ a.s. as $T \to \infty$, i.e.

$$\lim_{T \to \infty} X_T(t) = \tilde X(t), \quad 0 \le t \le 1, \quad \text{where } \{\tilde X(t)\} \overset{d}{=} \{X(t)\}.$$

The same type of result is true for $\{V_T(t)\}_{0 \le t \le 1}$, defined by

$$V_T(t) = \sum_{k: T_k \le T} Y(T_k) \, I(U_k \le t) \qquad (2.9)$$

(see Rosiński, 2000, and the references therein). Note that we can define $\tilde X(t)$ and $\tilde V(t)$ on the same probability space as $\{X_T(t)\}$, $T \ge 0$, and $\{V_T(t)\}$, $T \ge 0$, and thus identify $\tilde X(t)$ with $X(t)$ and $\tilde V(t)$ with $V(t)$.
For completeness we note that there are in fact three Lévy measures related to each type G Lévy process; each of them uniquely determines the other two. First there is the Lévy measure $N$ of $X(1)$, then the Lévy measure $M$ of $V(1)$, and finally the symmetric Lévy measure $M_0$ of $S_0(1)$, where $\{S_0(t)\}_{0 \le t \le 1}$ is a Lévy process with series representation

$$S_0(t) = \sum_{k=1}^{\infty} R_k \, g(T_k)^{1/2} \, I(U_k \le t).$$

Here $g$, $\{U_k\}$ and $\{T_k\}$ are as defined above and $\{R_k\}$ is an i.i.d. sequence of random variables, independent of the other sequences, with $P(R_k = 1) = P(R_k = -1) = 1/2$. The Lévy process $\{V(t)\}$ is the quadratic variation process of $\{S_0(t)\}$. We can view the process $\{X(t)\}$ as the process obtained by replacing the sequence $\{R_k\}$ with the Gaussian sequence $\{G_k\}$ in the series representation of $\{S_0(t)\}$, i.e. by Gaussian randomisation of the process $\{S_0(t)\}$. This turns out to be a natural way of defining multivariate Lévy processes of type G (Maejima & Rosiński, 2000). The Lévy measures $N$, $M$ and $M_0$ satisfy the relations

$$n(x) = \int_0^\infty \frac{1}{\sqrt{2\pi y}} \exp\Bigl( -\frac{x^2}{2y} \Bigr) M(dy), \quad
n(x) = 2 \int_0^\infty \frac{1}{\sqrt{2\pi}\, y} \exp\Bigl( -\frac{x^2}{2y^2} \Bigr) M_0(dy), \quad
M((x, \infty)) = 2 M_0((x^{1/2}, \infty)) \qquad (2.10)$$
for $x > 0$.

A non-negative increasing Lévy process $V(t)$ is called a subordinator, and a process of the form $W(V(t))$, where $W(t)$ is a Wiener process, is called a subordinated Wiener process. It is known that a subordinated Wiener process is of type G (Feller, 1971, XVII.4(e)). It is thus possible to obtain a conditionally Gaussian representation of type G Lévy processes in the form of a specific stochastic time-change (subordination) of a standard Wiener process. This representation can be viewed as the composition of a standard Wiener process with the variance process (subordinator) $\{V(t)\}$ defined in (2.7). More on subordinators can be found in Bertoin (1996) and Sato (1999).

Proposition 2.1 (Stochastic time-change) Every Lévy process $\{X(t)\}_{0 \le t \le 1}$ of type G admits the representation

$$\{X(t)\}_{0 \le t \le 1} \overset{d}{=} \{W(V(t))\}_{0 \le t \le 1},$$

where $\overset{d}{=}$ means equality in finite-dimensional distributions, $\{W(t)\}_{t \ge 0}$ is a standard Wiener process and $\{V(t)\}_{0 \le t \le 1}$ is the Lévy process defined in (2.7).
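Proposition 2.1 gives a direct simulation recipe whenever the subordinator itself can be simulated exactly. The following sketch is an illustration under our own assumptions (a gamma subordinator, whose increments over disjoint intervals are independent gamma variables), not the thesis's construction: $W(V(t))$ is built from Gaussian increments whose conditional variances are the subordinator increments.

```python
import random

def subordinated_wiener_path(n_steps, rng, shape_per_unit=1.0):
    """Simulate X(t_i) = W(V(t_i)) on a uniform grid of [0, 1], where V is
    a gamma subordinator with V(1) ~ Gamma(shape_per_unit, scale 1) and W
    is an independent standard Wiener process."""
    dt = 1.0 / n_steps
    x = 0.0
    v = 0.0
    path = [0.0]
    for _ in range(n_steps):
        dv = rng.gammavariate(shape_per_unit * dt, 1.0)  # increment of V
        x += rng.gauss(0.0, dv ** 0.5)                   # W(v + dv) - W(v)
        v += dv
        path.append(x)
    return path, v

rng = random.Random(7)
path, v_total = subordinated_wiener_path(1000, rng)
```

For this choice the increments of $X$ have a Laplace-type distribution, in line with the symmetric gamma example in Section 4.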
Proof. It is enough to show that the characteristic functions

$$\varphi_1(\xi) = E \exp\Bigl( i \sum_{k=1}^n \xi_k \bigl( X(t_k) - X(t_{k-1}) \bigr) \Bigr)
\quad\text{and}\quad
\varphi_2(\xi) = E \exp\Bigl( i \sum_{k=1}^n \xi_k \bigl( W(V(t_k)) - W(V(t_{k-1})) \bigr) \Bigr)$$

coincide for $0 = t_0 < t_1 < \cdots < t_n \le 1$ and $n \ge 1$. We start with $\varphi_1$. By the independent stationary increments of $\{X(t)\}$,

$$\varphi_1(\xi) = \prod_{k=1}^n E \exp\bigl( i \xi_k ( X(t_k) - X(t_{k-1}) ) \bigr)
= \exp\Bigl( -\sum_{k=1}^n (t_k - t_{k-1}) \, \Psi(\xi_k^2/2) \Bigr).$$

We now turn to $\varphi_2$. By the independent stationary increments of $\{W(t)\}$ and $\{V(t)\}$,

$$\varphi_2(\xi) = \prod_{k=1}^n E \exp\bigl( i \xi_k ( W(V(t_k)) - W(V(t_{k-1})) ) \bigr)
= \prod_{k=1}^n E \exp\bigl( -\xi_k^2 ( V(t_k) - V(t_{k-1}) )/2 \bigr)$$
$$= \prod_{k=1}^n \bigl( E \exp( -\xi_k^2 V(1)/2 ) \bigr)^{t_k - t_{k-1}}
= \exp\Bigl( -\sum_{k=1}^n (t_k - t_{k-1}) \, \Psi(\xi_k^2/2) \Bigr).$$
3 Representations of the tail-sum process

Now let $T \ge 0$ be an arbitrary but fixed truncation time and define the tail-sum processes $\{\Delta^T X(t)\}_{0 \le t \le 1}$ and $\{\Delta^T V(t)\}_{0 \le t \le 1}$ by

$$\Delta^T X(t) = X(t) - X_T(t) = \sum_{k: T_k > T} Y(T_k)^{1/2} G_k \, I(U_k \le t) \qquad (3.1)$$

and

$$\Delta^T V(t) = V(t) - V_T(t) = \sum_{k: T_k > T} Y(T_k) \, I(U_k \le t), \qquad (3.2)$$
respectively. By the independent increments of the Poisson process it follows that the tail-sum processes $\{\Delta^T X(t)\}$ and $\{\Delta^T V(t)\}$ are independent of $\{X_T(t)\}$ and $\{V_T(t)\}$.

Lemma 3.1 If $Y(u) = g(u)$, where $g(u)$ is defined in (2.3), then for each $T \ge 0$, $\{\Delta^T V(t)\}$ is a subordinator. Moreover, the Lévy measure $M_T$ of $\Delta^T V(1)$ is the Lévy measure $M$ of $V(1)$ restricted to $[0, g(T))$, i.e. $M_T(A) = M(A \cap [0, g(T)))$.

Proof. The Laplace transform $\tilde\varphi_{\Delta^T V(1)}(\theta)$ of $\Delta^T V(1)$ is given by

$$\tilde\varphi_{\Delta^T V(1)}(\theta) = \exp\Bigl( \lambda \int_T^\infty \bigl( e^{-\theta g(u)} - 1 \bigr) \, du \Bigr).$$

The change of variable $z = g(u)$, i.e. $u = \bar M(z)/\lambda$ with $\bar M(x) = \int_x^\infty M(dz)$, yields

$$\tilde\varphi_{\Delta^T V(1)}(\theta) = \exp\Bigl( \int_0^\infty \bigl( e^{-\theta z} - 1 \bigr) \, I(0 < z < g(T)) \, M(dz) \Bigr).$$

The uniqueness of Lévy measures concludes the proof.
For completeness we also state the lemma for the representation (2.8) in the general case.

Lemma 3.2 If $P(Y(u) \le y) = H(y, u)$ then for each $T \ge 0$, $\{\Delta^T V(t)\}$ is a subordinator. Moreover, the Lévy measure $M_T$ of $\Delta^T V(1)$ satisfies

$$M_T(dx) = \lambda \int_T^\infty H(dx, u) \, du.$$
We now want to show that a representation of the same type as in Proposition 2.1 is valid for $\{\Delta^T X(t)\}_{0 \le t \le 1}$ for each fixed $T \ge 0$.

Proposition 3.1 For each fixed $T \ge 0$ the tail-sum process $\{\Delta^T X(t)\}_{0 \le t \le 1}$ admits the representation

$$\{\Delta^T X(t)\}_{0 \le t \le 1} \overset{d}{=} \{W_T(\Delta^T V(t))\}_{0 \le t \le 1},$$

where $\overset{d}{=}$ means equality in finite-dimensional distributions, $\{W_T(t)\}_{t \ge 0}$ is a standard Wiener process and $\{\Delta^T V(t)\}_{0 \le t \le 1}$ is the Lévy process defined in (3.2).

Proof. By Lemmas 3.1 and 3.2, $M_T$ is a Lévy measure for each fixed $T$. We can then use Proposition 2.1 with $M$ replaced by $M_T$ to conclude the proof.
It is possible to derive an alternative representation of the tail-sum process. First we recall that if $\{W(t)\}_{t \ge 0}$ is a standard Wiener process, so is $\{Z(t)\}_{t \ge 0} = \{C^{-1/2} W(tC)\}_{t \ge 0}$ for any $C > 0$, provided that $C$ is independent of $\{W(t)\}_{t \ge 0}$. Using this we can find an alternative representation of $\{\Delta^T X(t)\}_{0 \le t \le 1}$ for each $\omega$. If $\Delta^T V^{(\omega)}(1) > 0$ then set

$$\Delta^T X^{(\omega)}(t) = \bigl( \Delta^T V^{(\omega)}(1) \bigr)^{1/2} \, W_T\bigl( \Delta^T V^{(\omega)}(t) / \Delta^T V^{(\omega)}(1) \bigr),$$

where $\{W_T(t)\}_{t \ge 0}$ is a Wiener process. If $\Delta^T V^{(\omega)}(1) = 0$ then set $\Delta^T X^{(\omega)}(t) = 0$. This modification does not alter the distribution, since the set $\{\omega : \Delta^T V^{(\omega)}(1) = 0\}$ has measure zero. We formalise this in the following result.

Proposition 3.2 For each fixed $T \ge 0$ the tail-sum process $\{\Delta^T X(t)\}_{0 \le t \le 1}$ admits the representation

$$\{\Delta^T X(t)\}_{0 \le t \le 1} \overset{d}{=} \bigl\{ \Delta^T V(1)^{1/2} \, W_T\bigl( \Delta^T V(t) / \Delta^T V(1) \bigr) \bigr\}_{0 \le t \le 1},$$

where $\overset{d}{=}$ means equality in finite-dimensional distributions, $\{W_T(t)\}_{t \ge 0}$ is a standard Wiener process and $\{\Delta^T V(t)\}_{0 \le t \le 1}$ is the Lévy process defined in (3.2).

Proof. Let, for $\Delta^T V(1) > 0$, $Z_T(t) = \Delta^T V(1)^{-1/2} \, W_T(t \, \Delta^T V(1))$, $t \ge 0$. All we need to show is that $\{Z_T(t)\}_{t \ge 0}$ is a standard Wiener process independent of $\Delta^T V(1)$. We now replace $W_T$ in Proposition 3.1 by $Z_T$ to finish the proof.

4 Simulation algorithms
A main point of the paper is that in many cases $\Delta^T V(t)/E\,\Delta^T V(1) \to t$ as $T \to \infty$ for all $0 \le t \le 1$ in some mode of convergence. This implies that $\{W_T(\Delta^T V(t)) / (E\,\Delta^T V(1))^{1/2}\}_{0 \le t \le 1}$ converges weakly to a standard Wiener process in the Skorohod topology. Such convergence will be illustrated in examples and general conditions ensuring it are given below. The representation in Proposition 3.1 leads to the following simulation algorithm.

Algorithm 1
1. Simulate $\{X_T(t)\}_{0 \le t \le 1}$ as $X_T(t) := \sum_{k: T_k \le T} Y(T_k)^{1/2} G_k \, I(U_k \le t)$.
2. Approximate $\{\Delta^T X(t)\}_{0 \le t \le 1}$ by $\hat\Delta^T X(t) := W_T(t \, E\,\Delta^T V(1))$.
3. Define the approximation $\{\hat X_T^{(1)}(t)\}_{0 \le t \le 1}$ of $\{X(t)\}_{0 \le t \le 1}$ by $\hat X_T^{(1)}(t) := X_T(t) + \hat\Delta^T X(t)$.
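A minimal sketch of Algorithm 1 (Python), under the assumptions of the symmetric gamma setting of Example 4.3 below: intensity $\lambda = 1$, $Y(u)$ exponential with mean $2e^{-u}$, and hence $E\,\Delta^T V(1) = 2e^{-T}$. The grid resolution, seed and function name are illustrative choices, not part of the thesis.

```python
import math
import random

def algorithm_1_gamma(T, n_grid, rng):
    """Algorithm 1 for a symmetric gamma (Laplace) type G process:
    truncated shot-noise series plus a Wiener approximation of the tail.
    Unit-rate Poisson points T_k; Y(u) ~ Exp(mean 2 exp(-u))."""
    grid = [i / n_grid for i in range(n_grid + 1)]
    x_t = [0.0] * (n_grid + 1)
    # Step 1: truncated series X_T(t) over the points T_k <= T.
    t_k = rng.expovariate(1.0)
    while t_k <= T:
        y = 2.0 * math.exp(-t_k) * rng.expovariate(1.0)  # Y(T_k)
        jump = math.sqrt(y) * rng.gauss(0.0, 1.0)        # Y(T_k)^{1/2} G_k
        u = rng.random()                                 # jump location U_k
        for i, t in enumerate(grid):
            if u <= t:
                x_t[i] += jump
        t_k += rng.expovariate(1.0)
    # Steps 2-3: tail-sum approximated by W_T(t * E Delta^T V(1)).
    mu_T = 2.0 * math.exp(-T)
    w = 0.0
    path = [x_t[0]]
    for i in range(1, n_grid + 1):
        w += rng.gauss(0.0, math.sqrt(mu_T / n_grid))
        path.append(x_t[i] + w)
    return path

rng = random.Random(2001)
path = algorithm_1_gamma(T=4.0, n_grid=200, rng=rng)
```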
We now turn to the representation in Proposition 3.2 and suggest $\{\mu_T^{1/2} W_T(t)\}_{0 \le t \le 1}$, where $\mu_T = E\,\Delta^T V(1)$, as an approximation of $\{\Delta^T V(1)^{1/2} W_T(\Delta^T V(t)/\Delta^T V(1))\}_{0 \le t \le 1}$. The main reason for doing so is that this is an optimal coupling at $t = 1$ in mean square sense. To see this note that

$$E \Bigl( \Delta^T V(1)^{1/2} W_T\bigl( \Delta^T V(t)/\Delta^T V(1) \bigr) - \mu_T^{1/2} W_T(t) \Bigr)^2 \Big|_{t=1}
= E \bigl( \Delta^T V(1)^{1/2} W_T(1) - \mu_T^{1/2} W_T(1) \bigr)^2
= E \bigl( \Delta^T V(1)^{1/2} - \mu_T^{1/2} \bigr)^2.$$

It is well known that the optimal coupling (in mean square sense) of two zero mean Gaussian random variables is to scale a single standard Gaussian variable with the respective standard deviations (see e.g. Dowson & Landau, 1982). This fact, however, does not automatically imply that the above coupling is optimal for each $t$. We will examine this coupling in more detail below. First we propose the corresponding simulation algorithm.

Algorithm 2
1. Simulate $\{X_T(t)\}_{0 \le t \le 1}$ as $X_T(t) := \sum_{k: T_k \le T} Y(T_k)^{1/2} G_k \, I(U_k \le t)$.
2. Approximate $\{\Delta^T X(t)\}_{0 \le t \le 1}$ by $\tilde\Delta^T X(t) := \mu_T^{1/2} W_T(t)$.
3. Define the approximation $\{\hat X_T^{(2)}(t)\}_{0 \le t \le 1}$ of $\{X(t)\}_{0 \le t \le 1}$ by $\hat X_T^{(2)}(t) := X_T(t) + \tilde\Delta^T X(t)$.

We note that the two approximating processes $\{\hat X_T^{(1)}(t)\}$ and $\{\hat X_T^{(2)}(t)\}$ have the same finite-dimensional distributions.
4.1 Mean integrated square error

We will now examine the distance between the process $\{X(t)\}$ and the approximations $\{\hat X_T^{(1)}(t)\}$ and $\{\hat X_T^{(2)}(t)\}$, respectively. The distance that will be used is the mean integrated square error (MISE), defined as

$$\mathrm{MISE}^{(A)}(T) = E \int_0^1 \bigl( X(t) - \hat X_T^{(A)}(t) \bigr)^2 \, dt,$$

where $A$ is 0, 1 or 2. Method 0 is to neglect the tail-sum process altogether, and its MISE is given by

$$\mathrm{MISE}^{(0)}(T) = E \int_0^1 \Delta^T X(t)^2 \, dt = \tfrac12 \, E\,\Delta^T V(1) \quad \text{for } T \ge 0,$$

since $E\,\Delta^T X(t)^2 = E\,\Delta^T V(t) = t \, E\,\Delta^T V(1)$.
This method is included as a reference to compare the other two against. We now turn to method 1.

Theorem 4.1 (MISE for representation 1) Let $\{X(t)\}_{0 \le t \le 1}$ be a Lévy process of type G such that $E\,\Delta^T X(1)^2 = E\,\Delta^T V(1) < \infty$. Then this process and the approximating process $\{\hat X_T^{(1)}(t)\}_{0 \le t \le 1}$ can be defined on a common probability space so that

$$\mathrm{MISE}^{(1)}(T) = \int_0^1 E \bigl| \Delta^T V(t) - E\,\Delta^T V(t) \bigr| \, dt \quad \text{for } T \ge 0.$$

In addition,

$$\mathrm{MISE}^{(1)}(T) \le \tfrac23 \operatorname{Var}\bigl( \Delta^T V(1) \bigr)^{1/2} \quad \text{for } T \ge 0,$$

where $\operatorname{Var}(\Delta^T V(1)) = \lambda \int_T^\infty E\,Y(u)^2 \, du$.

Proof. Using the representation in Proposition 3.1, elementary properties of the Wiener process and Fubini's theorem, we find that

$$E \int_0^1 \bigl( X(t) - \hat X_T^{(1)}(t) \bigr)^2 dt
= \int_0^1 E\,E\Bigl[ \bigl( W_T(\Delta^T V(t)) - W_T(E\,\Delta^T V(t)) \bigr)^2 \,\Big|\, \Delta^T V(t) \Bigr] dt
= \int_0^1 E \bigl| \Delta^T V(t) - E\,\Delta^T V(t) \bigr| \, dt,$$

which shows the first part. The second part follows trivially from the monotonicity of $L_p$-norms, linearity of the variance and the fact that $\operatorname{Var}(\Delta^T V(1)) = \lambda \int_T^\infty E\,Y(u)^2 \, du$: indeed $E|\Delta^T V(t) - E\,\Delta^T V(t)| \le (t \operatorname{Var}\Delta^T V(1))^{1/2}$ and $\int_0^1 t^{1/2} \, dt = 2/3$.
We see from Theorem 4.1 that the MISE depends on the integrated expected absolute deviation between the subordinator and its mean function. It is well known that the expected absolute deviation is minimised by the median. Thus replacing the mean function by the median function would yield a lower MISE. The drawback is that in general it is much more difficult to calculate the median function.
We further note that the expected absolute deviation can be quite difficult to calculate in the general case. To get upper and lower bounds for it we can use a result from Marcus & Rosiński (2000), which we state without proof.
Lemma 4.1 ($L_1$ inequality) If $X$ is an ID random variable with no Gaussian component, $E\,X = 0$ and $E|X| < \infty$, such that

$$E \exp(i \theta X) = \exp\Bigl( \int \bigl( e^{i \theta x} - 1 - i \theta x \bigr) \, M(dx) \Bigr),$$

then $e^{-1}(1)/4 \le E|X| \le 3\, e^{-1}(1)$, where $e^{-1}(1)$ is the solution to the equation $e(z) = 1$ and

$$e(z) = \int \min\bigl( x^2/z^2, \, |x|/z \bigr) \, M(dx).$$

Using this lemma we can bound the $L_1$-norm for each $t$, obtaining the following corollary to Theorem 4.1.

Corollary 4.1

$$\mathrm{MISE}^{(1)}(T) \le \int_0^1 3\, e_T^{-1}(1/t) \, dt \le 3\, e_T^{-1}(1),$$

where $e_T^{-1}(1/t)$ is the solution to $t\, e_T(z) = 1$ and $e_T^{-1}(1)$ is the solution to $e_T(z) = 1$, with $e_T(z) = \int \min(x^2/z^2, x/z) \, M_T(dx)$, $M_T$ being the Lévy measure of $\Delta^T V(1)$.

Proof. It is evident that $X = \Delta^T V(t) - E\,\Delta^T V(t)$ satisfies the conditions of Lemma 4.1. Then use that $t M_T(\cdot)$ is the Lévy measure of $\Delta^T V(t)$. Furthermore,

$$\int_0^1 e_T^{-1}(1/t) \, dt \le \sup_{0 \le t \le 1} e_T^{-1}(1/t) = e_T^{-1}(1),$$

and thus the last inequality follows from the fact that $e_T^{-1}(1/t)$ is increasing in $t$.
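The bound $3\, e_T^{-1}(1)$ is rarely explicit, but $e_T^{-1}(1)$ is easy to obtain numerically since $e_T$ is strictly decreasing. As an assumed illustration (not from the thesis), take the gamma tail measure $M_T(dx) = x^{-1} e^{-\beta x}\,dx$ with $\beta = e^T$ (cf. Example 5.4); then $e_T(z) = (1 - e^{-\beta z}(1 + \beta z))/(\beta z)^2 + e^{-\beta z}/(\beta z)$ in closed form, and $e_T(z) = 1$ can be solved by bisection.

```python
import math

def e_T(z, beta):
    """e_T(z) = integral of min(x^2/z^2, x/z) M_T(dx)
    for M_T(dx) = x^{-1} exp(-beta x) dx, in closed form."""
    bz = beta * z
    part1 = (1.0 - math.exp(-bz) * (1.0 + bz)) / (bz * bz)  # x <= z part
    part2 = math.exp(-bz) / bz                              # x > z part
    return part1 + part2

def e_T_inverse(target, beta, tol=1e-12):
    """Solve e_T(z) = target by bisection; e_T is strictly decreasing."""
    lo, hi = 1e-12, 1.0
    while e_T(hi, beta) > target:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if e_T(mid, beta) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

beta = math.exp(4.0)                 # truncation time T = 4
z_star = e_T_inverse(1.0, beta)      # e_T^{-1}(1)
```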
We now proceed to approximation 2.

Theorem 4.2 (MISE for representation 2) Let $\{X(t)\}_{0 \le t \le 1}$ be a Lévy process of type G such that $E\,\Delta^T X(1)^2 = E\,\Delta^T V(1) < \infty$. Then this process and the approximating process $\{\hat X_T^{(2)}(t)\}_{0 \le t \le 1}$ can be defined on a common probability space so that

$$\mathrm{MISE}^{(2)}(T) = \int_0^1 E\Bigl[ \Delta^T V(t) + t\,\mu_T
- 2 \bigl( \mu_T \, \Delta^T V(1) \bigr)^{1/2} \min\bigl( \Delta^T V(t)/\Delta^T V(1), \, t \bigr) \Bigr] \, dt.$$

We can alternatively write this as, for $T \ge 0$,

$$\mathrm{MISE}^{(2)}(T) = \mathrm{MISE}^{(1)}(T)
+ 2 \int_0^1 E\Bigl[ \min\bigl( t\,\mu_T, \, \Delta^T V(t) \bigr)
- \bigl( \mu_T \, \Delta^T V(1) \bigr)^{1/2} \min\bigl( \Delta^T V(t)/\Delta^T V(1), \, t \bigr) \Bigr] \, dt,$$

where $\mu_T = E\,\Delta^T V(1)$.
Proof. As in the proof of Theorem 4.1 we use the elementary properties of the Wiener process and Fubini's theorem to obtain

$$E \int_0^1 \bigl( X(t) - \hat X_T^{(2)}(t) \bigr)^2 dt
= \int_0^1 E\,E\Bigl[ \Bigl( \Delta^T V(1)^{1/2} \, W_T\bigl( \Delta^T V(t)/\Delta^T V(1) \bigr) - \mu_T^{1/2} W_T(t) \Bigr)^2 \,\Big|\, \Delta^T V \Bigr] dt$$
$$= \int_0^1 E\Bigl[ \Delta^T V(t) + t\,\mu_T - 2 \bigl( \mu_T \, \Delta^T V(1) \bigr)^{1/2} \min\bigl( \Delta^T V(t)/\Delta^T V(1), \, t \bigr) \Bigr] dt,$$

since for fixed $s_1, s_2 \ge 0$ and constants $a, b$ we have $E( a W(s_1) - b W(s_2) )^2 = a^2 s_1 + b^2 s_2 - 2ab \min(s_1, s_2)$. The second part follows on observing that

$$\int_0^1 E\Bigl[ \Delta^T V(t) + t\,\mu_T - 2 \min\bigl( \Delta^T V(t), \, t\,\mu_T \bigr) \Bigr] dt
= \int_0^1 E \bigl| \Delta^T V(t) - t\,\mu_T \bigr| \, dt = \mathrm{MISE}^{(1)}(T).$$
We conjecture that

$$\mathrm{MISE}^{(2)}(T) \le \mathrm{MISE}^{(1)}(T),$$

and also that the left-hand side has the same convergence rate as the right-hand side when $T \to \infty$, but with a smaller constant.
Example 4.1 (Stable processes of type G) Suppose that $\{X(t)\}$ is a standard symmetric $\alpha$-stable Lévy process, i.e. $E \exp(i \xi X(t)) = \exp(-t |\xi|^\alpha)$, $0 < \alpha < 2$.

(i) We can then simulate the subordinator with the inverse method, for which (taking $\lambda = 1$)

$$g(u) = 2 \bigl( \Gamma(1 - \alpha/2) \, u \bigr)^{-2/\alpha}.$$

This gives the MISEs

$$\mathrm{MISE}^{(1)}_{(i)}(T) \le \tfrac23 \operatorname{Var}\bigl( \Delta^T V(1) \bigr)^{1/2}
= \tfrac43 \, \Gamma(1 - \alpha/2)^{-2/\alpha} \Bigl( \frac{\alpha}{4 - \alpha} \Bigr)^{1/2} T^{1/2 - 2/\alpha}$$

and

$$\mathrm{MISE}^{(0)}_{(i)}(T) = \tfrac12 \, E\,\Delta^T V(1)
= \frac{\alpha}{2 - \alpha} \, \Gamma(1 - \alpha/2)^{-2/\alpha} \, T^{1 - 2/\alpha}.$$

(ii) Alternatively we can use a family $\{Y(u)\}$ with distribution functions $H(\cdot, u)$ of gamma type, scaled so that $c(u)$ is of the order $u^{-2/\alpha}$. This gives MISEs of the same orders,

$$\mathrm{MISE}^{(1)}_{(ii)}(T) \le \tfrac23 \operatorname{Var}\bigl( \Delta^T V(1) \bigr)^{1/2} = O\bigl( T^{1/2 - 2/\alpha} \bigr), \qquad
\mathrm{MISE}^{(0)}_{(ii)}(T) = \tfrac12 \, E\,\Delta^T V(1) = O\bigl( T^{1 - 2/\alpha} \bigr),$$

with different constants.

(iii) We can also take $Y(u)$ Pareto distributed with density

$$f_{Y(u)}(y) = k \, \beta^k u^{-2k/\alpha} \, y^{-k-1} \, I\bigl( y \ge \beta u^{-2/\alpha} \bigr), \quad k > 2,$$

for a suitable constant $\beta = \beta(k, \alpha)$. This again gives MISEs of the orders $T^{1/2 - 2/\alpha}$ and $T^{1 - 2/\alpha}$, with yet other constants.
Example 4.2 If $N(dx) = \tfrac{C}{2}\, e^{|x|} (e^{|x|} - 1)^{-2} \, dx$ then $M(dx) = C \sum_{k=1}^\infty \tfrac{k^2}{2} \exp(-x k^2/2) \, dx$, a superposition of exponential components of mass $C$ each. We simulate this with $Y(u)$ having an exponential distribution with mean $2 \lceil u/C \rceil^{-2}$ (and $\lambda = 1$). This gives the MISEs

$$\mathrm{MISE}^{(1)}(T) \le \tfrac23 \Bigl( 8C \sum_{k \ge \lceil T/C \rceil} k^{-4} \Bigr)^{1/2} \approx \tfrac23 \Bigl( \frac{8 C^4}{3} \Bigr)^{1/2} T^{-3/2}$$

and

$$\mathrm{MISE}^{(0)}(T) = \frac{C}{2} \sum_{k \ge \lceil T/C \rceil} \frac{2}{k^2} \approx C^2 \, T^{-1}.$$

We see that in both the above examples we gain a factor $T^{-1/2}$ in convergence rate. This result is in fact true for any simulation where $g(T)$ or $c(T)$ is of $O(T^{-\gamma})$, $\gamma > 1$, as $T \to \infty$.
Example 4.3 (Symmetric gamma process) Here $X(1)$ has a standard Laplace distribution and the corresponding subordinator is the gamma process with mean two at $t = 1$. We can simulate the gamma process with $H(y, u) = 1 - \exp(-y e^u/2)$, i.e. $Y(u)$ has an exponential distribution with mean $2 e^{-u}$. This gives the MISEs

$$\mathrm{MISE}^{(1)}(T) = \int_0^1 E \bigl| \Delta^T V(t) - t\, E\,\Delta^T V(1) \bigr| \, dt
= 4 e^{-T} \int_0^1 \frac{t^t e^{-t}}{\Gamma(t)} \, dt \approx 0.9029 \, e^{-T}$$

and

$$\mathrm{MISE}^{(0)}(T) = \tfrac12 \, E\,\Delta^T V(1) = e^{-T}.$$

In this case we only obtain a small improvement in the constant and no improvement in the convergence rate. This is due to the fact that we do not have convergence to a Wiener process; this will be discussed in more detail below. We also note that for the gamma process we can in fact find a better coupling than the one of method 1. The coupling we propose is $\hat\Delta^T X(t) := W_T(t \, \Delta^T V(1))$ and it has a MISE of

$$\mathrm{MISE}(T) = E \int_0^1 \bigl| \Delta^T V(t) - t \, \Delta^T V(1) \bigr| \, dt
= 2 e^{-T} \int_0^1 \frac{2 \sin(\pi t)}{\pi} \, t^t (1 - t)^{1 - t} \, dt \approx 0.4530 \, e^{-T}.$$

Note that $\Delta^T V(1)$ is easily simulated since it has an exponential distribution with mean $2 e^{-T}$.
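The two numerical constants in Example 4.3 can be checked by elementary quadrature. The sketch below evaluates $4 \int_0^1 t^t e^{-t}/\Gamma(t)\, dt$ and $2 \int_0^1 (2/\pi) \sin(\pi t)\, t^t (1-t)^{1-t}\, dt$ with a composite Simpson rule; the integrands are extended by their limits at the endpoints.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return s * h / 3.0

def f1(t):
    # t^t e^{-t} / Gamma(t); the limit at t = 0 is 0 since Gamma(t) -> infinity
    if t == 0.0:
        return 0.0
    return t ** t * math.exp(-t) / math.gamma(t)

def f2(t):
    # (2/pi) sin(pi t) t^t (1 - t)^{1 - t}; vanishes at both endpoints
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return (2.0 / math.pi) * math.sin(math.pi * t) * t ** t * (1.0 - t) ** (1.0 - t)

c1 = 4.0 * simpson(f1, 0.0, 1.0, 2000)   # constant in MISE^(1), ~0.9029
c2 = 2.0 * simpson(f2, 0.0, 1.0, 2000)   # constant in the improved coupling, ~0.4530
```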
4.2 The work required for a given accuracy

Assume that we want to simulate a process of type G with a MISE not exceeding $\varepsilon$. How do we choose the truncation time $T = T^{(1)}(\varepsilon)$ to accomplish this, and how much work is then required? We can measure work by the expected number of random variables $N(\varepsilon)$ that need to be simulated, thus disregarding the additional work required to simulate the Wiener process. Note that the expected number of random variables is $N(\varepsilon) = \lambda T(\varepsilon)$. Using Theorem 4.1 we find that

$$\mathrm{MISE}^{(1)}(T) \le \tfrac23 \operatorname{Var}\bigl( \Delta^T V(1) \bigr)^{1/2}
= \tfrac23 \Bigl( \lambda \int_T^\infty E\,Y(u)^2 \, du \Bigr)^{1/2}
= \tfrac23 \Bigl( \lambda \mu_2 \int_T^\infty c(u)^2 \, du \Bigr)^{1/2},$$

assuming that $Y(u) \overset{d}{=} c(u) Y$ for some random variable $Y$ with $E\,Y = \mu_1$ and $E\,Y^2 = \mu_2$. Moreover,

$$\mathrm{MISE}^{(0)}(T) = \tfrac12 \, E\bigl( \Delta^T V(1) \bigr) = \tfrac12 \lambda \int_T^\infty E\,Y(u) \, du = \tfrac12 \lambda \mu_1 \int_T^\infty c(u) \, du.$$

For simplicity we only consider the case where $c(T)$ asymptotically has a polynomial decrease rate, i.e. $c(T) \sim T^{-\gamma}$ as $T \to \infty$ for some $\gamma > 1$. This gives

$$\mathrm{MISE}^{(1)}(T) \sim \tfrac23 \Bigl( \frac{\lambda \mu_2}{2\gamma - 1} \Bigr)^{1/2} T^{1/2 - \gamma},$$

and thus

$$T(\varepsilon) \sim \Bigl( \tfrac23 \Bigl( \frac{\lambda \mu_2}{2\gamma - 1} \Bigr)^{1/2} \Bigr)^{2/(2\gamma - 1)} \varepsilon^{-2/(2\gamma - 1)}$$

yields a MISE less than $\varepsilon$. The expected number of random variables is therefore

$$N^{(1)}(\varepsilon) \sim \text{const} \cdot \varepsilon^{-2/(2\gamma - 1)} \quad \text{as } \varepsilon \to 0,$$

which should be compared to the work required if we neglect the tail-sum,

$$N^{(0)}(\varepsilon) \sim \text{const} \cdot \varepsilon^{-1/(\gamma - 1)} \quad \text{as } \varepsilon \to 0.$$

Note that in the $\alpha$-stable case we have polynomial decrease rate of order $T^{-2/\alpha}$, i.e. $\gamma = 2/\alpha$, $0 < \alpha < 2$.
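The work estimates can be made concrete under the assumption $c(T) = T^{-\gamma}$ with $\lambda = \mu_1 = \mu_2 = 1$ (the function names and normalisations below are our own illustrative choices): inverting the MISE bounds gives $T(\varepsilon)$ in closed form, and halving $\varepsilon$ multiplies the work by about $2^{2/(2\gamma - 1)}$ for method 1 but by $2^{1/(\gamma - 1)}$ for method 0.

```python
def work_method1(eps, gamma):
    """Truncation time (= expected number of terms when lambda = 1) needed
    for MISE^(1)(T) <= (2/3) (2 gamma - 1)^{-1/2} T^{1/2 - gamma} <= eps."""
    const = (2.0 / 3.0) / (2.0 * gamma - 1.0) ** 0.5
    return (const / eps) ** (1.0 / (gamma - 0.5))

def work_method0(eps, gamma):
    """Truncation time needed for MISE^(0)(T) = T^{1 - gamma} / (2 (gamma - 1)) <= eps."""
    const = 0.5 / (gamma - 1.0)
    return (const / eps) ** (1.0 / (gamma - 1.0))

gamma = 2.0   # e.g. the alpha-stable case with alpha = 1, since gamma = 2/alpha
r1 = work_method1(1e-4, gamma) / work_method1(2e-4, gamma)
r0 = work_method0(1e-4, gamma) / work_method0(2e-4, gamma)
# r1 ~ 2^{2/3} ~ 1.587, while r0 = 2: the tail approximation pays off
```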
5 Convergence of the subordinator
We will now examine if and when the scaled subordinator $\Delta^T V(t)/E\,\Delta^T V(1)$ of the tail-sum process converges in $L_1$ or $L_2$ to $t$ for $0 \le t \le 1$, i.e. whether $E|\Delta^T V(t)/E\,\Delta^T V(1) - t| \to 0$ or $E[(\Delta^T V(t)/E\,\Delta^T V(1) - t)^2] \to 0$ as $T \to \infty$. We start with $L_2$ convergence. Assume that $E\,Y(u) = \mu_1 c(u)$ and $E\,Y(u)^2 = \mu_2 c(u)^2$, where $c$ is a non-increasing function such that $E\,\Delta^T V(1) = \lambda \mu_1 \int_T^\infty c(u)\,du < \infty$ and $\operatorname{Var} \Delta^T V(1) = \lambda \mu_2 \int_T^\infty c(u)^2\,du < \infty$ for $T \ge 0$. We then have that

$$E \int_0^1 \Bigl( \frac{\Delta^T V(t)}{E\,\Delta^T V(1)} - t \Bigr)^2 dt
= \frac{\operatorname{Var} \Delta^T V(1)}{2 \, \bigl( E\,\Delta^T V(1) \bigr)^2}
= \frac{\mu_2}{2 \lambda \mu_1^2} \cdot \frac{\int_T^\infty c(u)^2 \, du}{\bigl( \int_T^\infty c(u) \, du \bigr)^2}.$$

Note that this result also covers the case $Y(u) = g(u)$.
Proposition 5.1 ($L_2$ convergence) (i) If $\lim_{T\to\infty} c(T)/E\,\Delta^T V(1) = 0$ then $\{\Delta^T V(t)/E\,\Delta^T V(1)\}_{0 \le t \le 1}$ converges in $L_2(\Omega \times [0,1], P \otimes \mathrm{Leb})$ to $t$.

(ii) If $c(T)$ is differentiable for all large enough $T$ then

$$\lim_{T\to\infty} \frac{d}{dT} \log\bigl( 1/c(T) \bigr) = 0$$

is sufficient for the conclusion in (i) to hold true.

Proof. Use that $c(u)$ is non-increasing to obtain

$$\limsup_{T\to\infty} \frac{\mu_2}{2 \lambda \mu_1^2} \frac{\int_T^\infty c(u)^2 \, du}{\bigl( \int_T^\infty c(u) \, du \bigr)^2}
\le \limsup_{T\to\infty} \frac{\mu_2}{2 \lambda \mu_1^2} \frac{c(T) \int_T^\infty c(u) \, du}{\bigl( \int_T^\infty c(u) \, du \bigr)^2}
= \limsup_{T\to\infty} \frac{\mu_2}{2 \mu_1} \cdot \frac{c(T)}{E\,\Delta^T V(1)}.$$

This proves the first part. The second part follows from using L'Hôpital's rule on the first part to obtain

$$\lim_{T\to\infty} \frac{c(T)}{\int_T^\infty c(u) \, du} = \lim_{T\to\infty} \frac{-c'(T)}{c(T)} = \lim_{T\to\infty} \frac{d}{dT} \log\bigl( 1/c(T) \bigr),$$

provided that the limits exist, which concludes the proof.
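Criterion (ii) of Proposition 5.1 is easy to check numerically; the sketch below (with our own function names) approximates $\frac{d}{dT} \log(1/c(T))$ by a central difference, working with $\log(1/c)$ directly to avoid underflow, and contrasts a polynomial tail (criterion holds) with an exponential tail (criterion fails, as for the symmetric gamma process discussed next).

```python
import math

def log_derivative(log_inv_c, T, h=1e-6):
    """Central-difference approximation of d/dT log(1/c(T))."""
    return (log_inv_c(T + h) - log_inv_c(T - h)) / (2.0 * h)

# log(1/c(T)) for c(T) = T^{-2} (polynomial) and c(T) = exp(-T) (exponential)
poly = lambda T: 2.0 * math.log(T)
expo = lambda T: T

rates_poly = [log_derivative(poly, T) for T in (10.0, 100.0, 1000.0)]
rates_expo = [log_derivative(expo, T) for T in (10.0, 100.0, 1000.0)]
# rates_poly decreases to 0 (L2 convergence holds); rates_expo stays at 1
```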
When does the above condition fail? We can e.g. take $g(T) \sim \exp(-cT)$ as $T \to \infty$. Then

$$\lim_{T\to\infty} \frac{d}{dT} \log\bigl( 1/g(T) \bigr) = c > 0.$$

This happens e.g. for the symmetric gamma (Laplace) process, where $c = 1$. We also note that the above limit does not necessarily exist.

We continue with $L_1$ convergence. In general it is much harder to calculate $L_1$-norms than $L_2$-norms. We can of course use Lemma 4.1 together with Corollary 4.1 to calculate the bound

$$\int_0^1 E \Bigl| \frac{\Delta^T V(t)}{E\,\Delta^T V(1)} - t \Bigr| \, dt \le \frac{3\, e_T^{-1}(1)}{E\,\Delta^T V(1)}.$$

This leads to the following trivial result.

Proposition 5.2 If

$$\lim_{T\to\infty} E \Bigl| \frac{\Delta^T V(1)}{E\,\Delta^T V(1)} - 1 \Bigr| = 0,$$

then the scaled tail-sum subordinator converges in $L_1(\Omega \times [0,1], P \otimes \mathrm{Leb})$ to $t$ as $T \to \infty$.

Proof. By using the upper and lower bounds in Lemma 4.1 we obtain that the convergence of $E|\Delta^T V(1)/E\,\Delta^T V(1) - 1|$ to zero implies the convergence of $3\, e_T^{-1}(1)/E\,\Delta^T V(1)$ to zero, which concludes the proof.
When $Y(u) = g(u)$ we can make more precise statements. If we examine the expression for $e(z)$ in Lemma 4.1 more carefully, we see that if $\operatorname{Var} \Delta^T V(1)/g(T)^2 \gg 1$ for all large enough $T$, then $e_T^{-1}(1) \asymp (\operatorname{Var} \Delta^T V(1))^{1/2}$ as $T \to \infty$. This is evident from

$$e_T(z) = \int_0^{g(T)} \min\bigl( x^2/z^2, \, x/z \bigr) \, M(dx)
= \int_0^{\min(z, g(T))} \frac{x^2}{z^2} \, M(dx) + I\bigl( z \le g(T) \bigr) \int_z^{g(T)} \frac{x}{z} \, M(dx).$$

The condition $\operatorname{Var} \Delta^T V(1)/g(T)^2 \gg 1$ thus assures that $e_T(g(T)) \ge 1$ for all large enough $T$, which implies $e_T^{-1}(1) \asymp (\operatorname{Var} \Delta^T V(1))^{1/2}$, since $e_T(z) = \operatorname{Var} \Delta^T V(1)/z^2$ for all $z \ge g(T)$. This is the same as saying that $(\operatorname{Var} \Delta^T V(1))^{1/2}$ asymptotically gives the correct rate for the $L_1$ distance. So in this case $L_1$ and $L_2$ convergence become equivalent, but what does the condition $\operatorname{Var} \Delta^T V(1)/g(T)^2 \gg 1$ for all large enough $T$ really mean? If we suppose that $g(T)$ is differentiable for all large enough $T$, we can invoke L'Hôpital's rule to obtain

$$\lim_{T\to\infty} \frac{\operatorname{Var} \Delta^T V(1)}{g(T)^2}
= \lim_{T\to\infty} \frac{\lambda \int_T^\infty g(u)^2 \, du}{g(T)^2}
= \lim_{T\to\infty} \frac{-\lambda\, g(T)^2}{2\, g'(T)\, g(T)}
= \lim_{T\to\infty} \frac{\lambda}{2 \frac{d}{dT} \log(1/g(T))},$$

provided that the limits exist. This is precisely the reciprocal of the condition that assures $L_2$ convergence, and thus if $\lim_{T\to\infty} \operatorname{Var} \Delta^T V(1)/g(T)^2 = \infty$ and $g(T)$ is differentiable for all large enough $T$, we have $L_1$ and $L_2$ convergence of the scaled tail-sum subordinator. What happens if $\lim_{T\to\infty} \operatorname{Var} \Delta^T V(1)/g(T)^2 = C$ with $1 \le C < \infty$? For this case we have neither $L_2$ nor $L_1$ convergence of $\Delta^T V(1)/E\,\Delta^T V(1)$ to unity.
5.1 Weak convergence of the subordinator and the tail-sum process

When does the tail-sum process scaled by its standard deviation converge weakly to a standard Wiener process? If it does not, when does it converge weakly to some other non-zero Lévy process? Asmussen & Rosiński (2000) address this problem for arbitrary real-valued Lévy processes in a slightly different setting, namely that of truncating the original Lévy measure rather than, as in the present paper, truncating the Lévy measure of the underlying subordinator. Hence the results are not immediately comparable. The main difference is that although the Lévy measure of the scaled tail-sum process of the subordinator $\Delta^T V(t)/E\,\Delta^T V(1)$ may have bounded support, the Lévy measure of $\Delta^T X(t)/E\,\Delta^T V(1)^{1/2}$ is always supported on the whole of $\mathbb{R}$ for any finite $T$, which is easily seen from (2.1). Another way of viewing this is that Asmussen & Rosiński (2000) truncate the jump heights of the Lévy process while we truncate the variance of the jump heights. Four qualitatively different things can happen to the scaled tail-sum subordinator as $T \to \infty$: it can converge weakly to $t$; it can converge weakly to a non-deterministic subordinator; it can converge weakly to zero; or it might not converge weakly at all. It is only in the first case that we have weak convergence of the type G tail-sum process to a standard Wiener process. The second case implies weak convergence to a type G process other than the Wiener process. The third case implies weak convergence to the null process, and since the scaled tail-sum subordinator has mean one for every finite $T \ge 0$, weak convergence to zero implies lack of uniform integrability for the scaled tail-sum subordinator. Note, however, that the scaled type G tail-sum process is still uniformly integrable, since $\operatorname{Var}(\Delta^T X(t)/E\,\Delta^T V(1)^{1/2}) = t$, $0 \le t \le 1$, for all $T \ge 0$.
Due to the independent stationary increments and bounded mean of the subordinator $\Delta^T V(t)/E\,\Delta^T V(1)$, weak convergence of the subordinator is implied by weak convergence of $\Delta^T V(1)/E\,\Delta^T V(1)$, and thus if $\lim_{T\to\infty} \Psi_T(\theta/\mu_T) = \theta$ then $\Delta^T V(t)/E\,\Delta^T V(1)$ converges weakly to $t$, where

$$\Psi_T(\theta) = -\log E \exp\bigl( -\theta \, \Delta^T V(1) \bigr)$$
with T E TV (1). We now give some examples of scaled tail-sum subordinators that do not converge to their mean value functions. The first two examples converge weakly to the same subordinator.
Example 5.1 If we take the Lévy measure with density $m(x) = 1/(\exp(x) - 1)$ and use the inverse tail-measure method and scale with its mean value we obtain convergence to the subordinator with Lévy measure $t\,x^{-1} I(0 < x < 1)\,dx$.
Example 5.2 (A discretely supported Lévy measure) If we take a subordinator with $M(dx) = \sum_{k=1}^{\infty} (1/k)\,\delta(x - 1/k)\,dx$, use the inverse tail-measure method for the simulation and scale the tail-sum subordinator by $E\,{}^TV(1)$ we obtain weak convergence to the subordinator with Lévy measure $t\,x^{-1} I(0 < x < 1)\,dx$.
We now give two examples of subordinators that are self-normalising. The term self-normalising should here be understood as the property that the scaled tail-sum subordinator ${}^TV(t)/E\,{}^TV(1)$ converges weakly to $V(t)$ as $T \to \infty$. In the two following examples the subordinators in fact satisfy the stronger self-normalising condition ${}^TV(t)/E\,{}^TV(1) \stackrel{d}{=} V(t)$ for all $T \ge 0$, where $\stackrel{d}{=}$ means equality in finite-dimensional distributions.
Example 5.3 (Exponential scaling of Pareto distributions) Let $Y(u)$ be a Pareto distributed random variable with parameter $k > 1$ multiplied by $\exp(-u)(k-1)/k$. We then for each $T \ge 0$ obtain the subordinator for which $V(1)$ has Lévy measure

$$M(dx) = \frac{1}{x}\,\min\!\left(1, \left(\frac{kx}{k-1}\right)^{-k}\right) I(x > 0)\,dx.$$
Example 5.4 (Gamma process) Let $Y(u)$ be exponentially distributed with mean $\exp(-u)$. Then ${}^TM(dx) = \exp(-x e^T)\,x^{-1}\,dx$, $\mu_T = \exp(-T)$ and $\Psi_T(\theta/\mu_T) = -\log(1+\theta)$ for each $T \ge 0$. In this case we thus have weak convergence to the gamma process, i.e. the subordinator we started with.
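As a numerical sanity check of Example 5.4 (ours, not part of the thesis), the exponential-scaling series can be simulated directly: with $T_k$ the points of a unit-rate Poisson process and $Y_k$ i.i.d. Exp(1), the sum $V(1) = \sum_k \exp(-T_k)\,Y_k$ should be a standard gamma variable with unit shape, i.e. Exp(1). The function name and the truncation level `T` are our own choices; standard numpy is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)

def gamma_subordinator_V1(T=30.0, rng=rng):
    """One draw of V(1) via the exponential-scaling series
    V(1) = sum_k exp(-T_k) * Y_k, with T_k unit-rate Poisson points and
    Y_k i.i.d. Exp(1); jumps with T_k > T are truncated (tail mean exp(-T))."""
    total, t = 0.0, 0.0
    while True:
        t += rng.exponential()          # next Poisson arrival T_k
        if t > T:
            break
        total += np.exp(-t) * rng.exponential()
    return total

draws = np.array([gamma_subordinator_V1() for _ in range(20000)])
print(draws.mean(), draws.var())        # both close to 1 for Exp(1)
```

With the truncation level at 30 the neglected tail has mean $e^{-30}$, so the sample mean and variance should both be close to 1.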
We have not been able to obtain a full description of the class of self-normalising subordinators. It should, however, be noted that the last two examples are both self-decomposable (Lévy class L) subordinators. We will show below that all class L subordinators satisfy the strong self-normalising condition. Recall that a random variable $X$ is self-decomposable if

$$X \stackrel{d}{=} \exp(-t)X + Y_t, \qquad t \ge 0,$$

where $Y_t$ is some family of random variables independent of $X$. Alternatively we can characterise class L as

$$X \in L \iff X \stackrel{d}{=} \int_0^\infty \exp(-t)\,dY(t),$$

where the Lévy process $Y$ satisfies $E \log(1 + |Y(1)|) < \infty$ (see e.g. Jurek, 1997). All self-decomposable subordinators have Lévy measures which are absolutely continuous and have infinite mass. The corresponding density $m(x)$ satisfies the property that $x\,m(x)$ is non-increasing in $x$ with finite limit as $x \to 0$ (Bondesson, 1982). Moreover Bondesson (1982) states that all subordinators obtained from exponential scaling of the type used above are self-decomposable and in fact that all self-decomposable subordinators can be obtained in this way. To see that this class is strongly self-normalising, let $Y(u)$, $u \ge 0$, be a family of independent random variables with $Y(u) \stackrel{d}{=} \exp(-u)Y$ and $EY = 1$, where $P(Y > y) = \bar H(y)$. We then have that
$${}^TM(x\exp(-T)) = \int_T^\infty \bar H(x\exp(u-T))\,du = \int_x^\infty \frac{\bar H(y)}{y}\,dy = M(x),$$

where we used the change of variables $y = x\exp(u-T)$. From this we conclude that all self-decomposable subordinators $V(t)$ with $E\,V(1) = 1$ are self-normalising in the above sense. We note that the subordinator with $M(dx) = x^{-1} I(0 < x < 1)\,dx$, which was obtained as the limit in the first two examples, also is self-decomposable. It is moreover the only self-decomposable distribution with $E\,V(1) = 1$ for which the exponential scaling and the inverse simulation method coincide. We do not know, however, if class L fully exhausts the class of self-normalising subordinators. Before closing this section we point out that the strong self-normalising property can be used to obtain representations of non-negative stationary class L processes. If we instead view $T$ as the time of the process and $t$ as some shape parameter of the class L distribution we see that
$$Z(T) = \frac{{}^TV(t)}{E\,{}^TV(1)}, \qquad T \ge 0,$$

is a non-negative stationary self-decomposable process where all the marginals have the same distribution as $V(t)$.
6 Generalisations
The approach in this paper only to some extent relies on the type G property of the simulated Lévy processes. It can in fact be generalised to a larger class of Lévy processes obtained as subordinations of other Lévy processes, provided that we can simulate the subordinand exactly and that it has two finite moments. The decomposition into a compound Poisson process and a remainder process is still valid, as is the first coupling, even if the subordinand has no finite moments. In that case we cannot calculate any mean square distances; it is, however, possible to look at distances in probability. We will not proceed in this direction and therefore hereafter assume that the subordinand has two finite moments. Assume that $Y(t)$, $t \ge 0$, is a Lévy process with at least two finite moments. Let $X(t)$, $0 \le t \le 1$, be the Lévy process obtained by subordination of $Y$,
i.e. $X(t) \stackrel{d}{=} Y(V(t))$, where $\stackrel{d}{=}$ means equality in finite-dimensional distributions. We can then decompose $X(t)$, $0 \le t \le 1$, as

$$X(t) \stackrel{d}{=} Y_1(V_T(t)) + Y_2({}^TV(t)),$$

where $Y_1$ and $Y_2$ are independent copies of $Y$. We now propose the approximation

$$\bar X_T(t) = Y_1(V_T(t)) + Y_2(t\,E\,{}^TV(1)).$$
If the subordinand does not have zero expectation we will get an additional term in the MSE. This term, however, is asymptotically negligible as $T \to \infty$. To see this we start by noting that

$$E|X(t) - \bar X_T(t)|^2 = E|Y_2({}^TV(t)) - Y_2(t\,E\,{}^TV(1))|^2 = \mathrm{Var}\,Y_2({}^TV(t)) + \mathrm{Var}\,Y_2(t\,E\,{}^TV(1)) - 2\,\mathrm{Cov}(Y_2({}^TV(t)),\,Y_2(t\,E\,{}^TV(1))).$$

Now use that

$$\mathrm{Var}\,Y_2({}^TV(t)) = E\,\mathrm{Var}(Y_2({}^TV(t)) \mid {}^TV(t)) + \mathrm{Var}\,E(Y_2({}^TV(t)) \mid {}^TV(t)) = t\,\mathrm{Var}\,Y(1)\,E\,{}^TV(1) + t\,(E\,Y(1))^2\,\mathrm{Var}\,{}^TV(1)$$

and that

$$\mathrm{Cov}(Y_2({}^TV(t)),\,Y_2(t\,E\,{}^TV(1))) = \mathrm{Var}\,Y(1)\,E\min({}^TV(t),\,t\,E\,{}^TV(1))$$

to obtain

$$E|X(t) - \bar X_T(t)|^2 = \mathrm{Var}\,Y(1)\,E|{}^TV(t) - t\,E\,{}^TV(1)| + t\,(E\,Y(1))^2\,\mathrm{Var}\,{}^TV(1) \le t^{1/2}\,\mathrm{Var}\,Y(1)\,(\mathrm{Var}\,{}^TV(1))^{1/2} + t\,(E\,Y(1))^2\,\mathrm{Var}\,{}^TV(1).$$

Thus the MISE is given by

$$\epsilon^{(1)}(T) = \int_0^1 E|X(t) - \bar X_T(t)|^2\,dt \le \frac{2}{3}\,\mathrm{Var}\,Y(1)\,(\mathrm{Var}\,{}^TV(1))^{1/2} + \frac{1}{2}\,(E\,Y(1))^2\,\mathrm{Var}\,{}^TV(1).$$

The additional term $\frac{1}{2}(E\,Y(1))^2\,\mathrm{Var}\,{}^TV(1)$ is now easily seen to be asymptotically negligible compared to the first term.
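The approximation $\bar X_T(t) = Y_1(V_T(t)) + Y_2(t\,E\,{}^TV(1))$ above can be sketched in a few lines for a concrete case. The example below (our own, assuming standard numpy) takes $Y$ to be a standard Wiener process and $V$ the standard gamma subordinator of Example 5.4, so that $X$ is a variance gamma process; replacing the tail ${}^TV(1)$ by its mean $e^{-T}$ leaves the variance of $\bar X_T(1)$ exactly equal to $E\,V(1) = 1$. The function name and truncation level are ours.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 4.0                                  # truncation level; tail mean is exp(-T)

def approx_X1(rng=rng):
    """One draw of X_bar_T(1) = Y1(V_T(1)) + Y2(E ^TV(1)) for the variance
    gamma case: Y a standard Wiener process, V a standard gamma subordinator
    simulated by exponential scaling as in Example 5.4."""
    vT, t = 0.0, 0.0
    while True:
        t += rng.exponential()           # Poisson arrivals T_k
        if t > T:
            break
        vT += np.exp(-t) * rng.exponential()   # retained jumps (T_k <= T)
    tail_mean = np.exp(-T)               # E ^TV(1) for this subordinator
    # Y1(V_T(1)) and Y2(tail_mean) are independent centred Gaussians
    return rng.normal(0.0, np.sqrt(vT)) + rng.normal(0.0, np.sqrt(tail_mean))

x = np.array([approx_X1() for _ in range(20000)])
print(x.mean(), x.var())                 # X(1) is symmetric with Var X(1) = E V(1) = 1
```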
References

Asmussen, S. & Rosiński, J. (2000). Approximations of small jumps of Lévy processes with a view towards simulation. Preprint. Available at: http://www.math.utk.edu/~rosinski/manuscripts.html

Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type. Finance Stoch. 2, 41–68.

Barndorff-Nielsen, O.E. & Pérez-Abreu, V. (1999). Stationary and self-similar processes driven by Lévy processes. Stoch. Proc. Appl. 84, 357–369.

Barndorff-Nielsen, O.E. & Shephard, N. (2000). Modelling by Lévy processes for financial econometrics. In Lévy Processes – Theory and Applications. Barndorff-Nielsen, O.E., Mikosch, T. & Resnick, S.I. (eds.). Birkhäuser, Boston.

Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge.

Bondesson, L. (1982). On simulation from infinitely divisible distributions. Adv. Appl. Prob. 14, 855–869.

Dowson, D.C. & Landau, B.V. (1982). The Fréchet distance between multivariate normal distributions. J. Multivariate Anal. 13, 450–455.

Feller, W. (1971). An Introduction to Probability Theory and Its Applications. Vol. II, 2nd ed. Wiley, New York.

Jacod, J. & Protter, P. (1998). Asymptotic error distributions for the Euler method for stochastic differential equations. Ann. Prob. 26, 267–307.

Jurek, Z.J. (1997). Selfdecomposability: an exception or a rule? Ann. Univ. Mariae Curie-Skłodowska Sect. A 51, 93–107.

Marcus, M.B. (1987). ξ-radial processes and random Fourier series. Mem. Amer. Math. Soc. 368, American Mathematical Society, Providence.

Marcus, M.B. & Rosiński, J. (2000). L¹-norm of infinitely divisible random vectors and certain stochastic integrals. Preprint. Available at: http://www.math.utk.edu/~rosinski/manuscripts.html

Maejima, M. & Rosiński, J. (2000). Type G distributions on R^d. Preprint. Available at: http://www.math.utk.edu/~rosinski/manuscripts.html

Protter, P. & Talay, D. (1997). The Euler scheme for Lévy driven stochastic differential equations. Ann. Prob. 25, 393–423.

Rosiński, J. (1990). On the representation of infinitely divisible random vectors. Ann. Prob. 18, 405–430.

Rosiński, J. (1991). On a class of infinitely divisible processes represented as mixtures of Gaussian processes. In Stable Processes and Related Topics. Cambanis, S., Samorodnitsky, G. & Taqqu, M.S. (eds.). Birkhäuser, Boston. pp. 27–41.

Rosiński, J. (2000). Series representation of Lévy processes from the perspective of point processes. In Lévy Processes – Theory and Applications. Barndorff-Nielsen, O.E., Mikosch, T. & Resnick, S.I. (eds.). Birkhäuser, Boston.

Rydberg, T. (1997). The normal inverse Gaussian Lévy process: simulation and approximation. Comm. Stat. Stoch. Models 13, 887–910.

Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge.
Paper D
Simulation of stochastic integrals with respect to Lévy processes of type G M AGNUS W IKTORSSON Centre for Mathematical Sciences Lund University Box 118 221 00 Lund, Sweden
Abstract We study the simulation of stochastic processes defined as stochastic integrals with respect to type G Lévy processes for the case where it is not possible to simulate the type G process exactly. The type G Lévy process as well as the stochastic integral can on compact intervals be represented as an infinite series. In a practical simulation we must truncate this representation. We examine the approximation of the remaining terms with a simpler process to get an approximation of the stochastic integral. We also show that a stochastic time change representation can be used to obtain an approximation of stochastic integrals with respect to type G Lévy processes provided that the integrator and the integrand are independent.
Key words: type G distribution, stochastic integral, variance mixture, Lévy process, shot noise representation, stochastic time change, subordination. 2000 Mathematics Subject Classification: Primary 60G51; Secondary 60H05, 60E07
1 Introduction

Lévy processes and certain stochastic integrals with respect to Lévy processes are used as building blocks when modelling phenomena related to financial markets, e.g. stochastic modelling of stock prices, volatility and interest rates. The use of general Lévy processes instead of the traditional Wiener process makes it possible to incorporate processes with
jumps and infinite variance. Type G Lévy processes (cf. below) form an important subclass of all Lévy processes which allows us to retain some of the Gaussian properties of the Wiener process and yet is rich enough to incorporate processes with jumps and infinite variance. Moreover, the class of type G Lévy processes contains the important subclass of symmetric stable Lévy processes. In e.g. Barndorff-Nielsen & Pérez-Abreu (1999), Barndorff-Nielsen (1998) and Rydberg (1997) type G processes are used for modelling financial data. It is in these cases important to have efficient algorithms that can simulate such processes with prescribed accuracy.
2 Lévy processes

We now state some elementary properties of Lévy processes. For a more general treatment see Bertoin (1996) and Sato (1999). A Lévy process is a stochastic process with independent stationary increments. Every Lévy process $X(t)$ can be decomposed as

$$X(t) = at + \sigma W(t) + Z(t),$$

where $at$ is a linear drift, $W(t)$ is a standard Wiener process and $Z(t)$ is a pure jump process. The distribution of $X(1)$ completely determines the finite-dimensional distributions of $X(t)$. Moreover there is a one-to-one correspondence between the infinitely divisible (ID) distributions and the distributions of $X(1)$. The characteristic function $\varphi_{X(t)}(\theta) = E\exp(i\theta X(t))$ of a Lévy process can always be written in the form

$$\varphi_{X(t)}(\theta) = \exp\left(t\left[i\theta a - \frac{\sigma^2\theta^2}{2} + \int_{\mathbb{R}}\left(\exp(i\theta x) - 1 - i\theta x\,I(|x| \le 1)\right) L(dx)\right]\right),$$

where $L$ is called the Lévy measure. If $\sigma = 0$ the ID distribution is said to have no Gaussian component. If a Lévy process has finite expectation and variance they are linear functions of $t$, i.e.

$$E\,X(t) = t\,E\,X(1), \qquad \mathrm{Var}\,X(t) = t\,\mathrm{Var}\,X(1).$$
2.1 Lévy processes of type G

A random variable $X$ on $\mathbb{R}$ is said to be of type G if $X \stackrel{d}{=} V^{1/2} G$, where $G$ is a standard Gaussian variable and $V$ is non-negative and ID. In this case

$$\varphi_X(\theta) = E\exp(i\theta X) = E\exp(-\theta^2 V/2)$$

and therefore

$$\int_{\mathbb{R}} (\cos(\theta x) - 1)\,N(dx) = \int_0^\infty \left(\exp(-\theta^2 x/2) - 1\right) M(dx), \qquad (2.1)$$

where $N$ and $M$ are the Lévy measures of $X$ and $V$ respectively. A Lévy process $X(t)$ is said to be of type G if its increments are of type G. Rosiński (1991) suggested

$$X(t) = \sum_{k=1}^\infty G_k\,g(T_k)^{1/2}\,I(U_k \le t)$$
for $0 \le t \le S$ as a series representation of a type G Lévy process with no Gaussian component, where $U_k$ is an i.i.d. sequence of uniform variables on $(0, S)$, $G_k$ is an i.i.d. sequence of standard Gaussian variables and $T_k$ are the points of a homogeneous Poisson process on $[0, \infty)$ with intensity $S$. The function $g$ is the generalised inverse of the tail Lévy measure $M((x, \infty))$,

$$g(u) = \inf\{x > 0 : M((x, \infty)) \le u\}. \qquad (2.2)$$

If the Lévy measure $M$ has infinite mass then $g$ has support on the whole of $[0, \infty)$ and moreover every interval in $[0, S]$ with positive length will contain a countably infinite number of jumps. This is usually referred to as the infinite jump intensity case. If $M$ has finite mass then the process $X(t)$ is a compound Poisson process. The process $X(t)$ can also be seen as a subordinated Wiener process, i.e. $X(t) \stackrel{d}{=} W(V(t))$, where $W(t)$ is a Wiener process and the subordinator $V(t)$ is a Lévy process with Lévy measure $M$ and series representation

$$V(t) = \sum_{k=1}^\infty g(T_k)\,I(U_k \le t).$$
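The series above can be sketched directly in code. The example below (our own illustration, assuming standard numpy) uses the subordinator with $M(dx) = x^{-1} I(0 < x < 1)\,dx$, for which the generalised inverse tail measure has the closed form $g(u) = e^{-u}$; evaluating $X$ at $t = S = 1$ makes every indicator $I(U_k \le t)$ equal to one, so the draws of $U_k$ can be skipped.

```python
import numpy as np

rng = np.random.default_rng(3)

def type_g_X1(T=30.0, rng=rng):
    """One draw of X(1) from the series sum_k G_k g(T_k)^{1/2} I(U_k <= t)
    for the subordinator with Levy measure M(dx) = x^{-1} I(0<x<1) dx,
    so g(u) = exp(-u).  Jumps with T_k > T are dropped; at t = S = 1 the
    indicator is always 1."""
    x, t = 0.0, 0.0
    while True:
        t += rng.exponential()          # Poisson points T_k, intensity S = 1
        if t > T:
            break
        x += rng.standard_normal() * np.exp(-t / 2.0)   # G_k g(T_k)^{1/2}
    return x

xs = np.array([type_g_X1() for _ in range(20000)])
print(xs.mean(), xs.var())   # approx 0 and Var X(1) = E V(1) = 1
```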
We can now split the series representation of $X(t)$ into two independent type G Lévy processes,

$$X(t) = X_T(t) + {}^TX(t) = \sum_{k: T_k \le T} G_k\,g(T_k)^{1/2}\,I(U_k \le t) + \sum_{k: T_k > T} G_k\,g(T_k)^{1/2}\,I(U_k \le t).$$

The first process $X_T(t)$ has jumps with variances larger than $g(T)$ and the second process ${}^TX(t)$ has jumps with variances smaller than $g(T)$. We also split the subordinator in the same way, i.e.

$$V(t) = V_T(t) + {}^TV(t) = \sum_{k: T_k \le T} g(T_k)\,I(U_k \le t) + \sum_{k: T_k > T} g(T_k)\,I(U_k \le t).$$

The first process $V_T(t)$ has jumps larger than $g(T)$ and the second process ${}^TV(t)$ has jumps smaller than $g(T)$. The process ${}^TV(t)$ is a subordinator with bounded moments of all orders for $T > 0$. In Wiktorsson (2000) we showed that the subordinator ${}^TV(t)/E\,{}^TV(1)$ converges in $L^2(\Omega \times [0,1], P \otimes \mathrm{Leb})$ to $t$ as $T \to \infty$ if

$$\lim_{T \to \infty} \frac{d}{dT} \log(1/g(T)) = 0$$

for the case where $g$ is differentiable, and if $\lim_{T \to \infty} g(T)/E\,{}^TV(1) = 0$ in the general case.

3 Stochastic integrals with respect to type G Lévy processes
We study stochastic integrals of the form

$$Z(t) = \int_0^S f(t, s-)\,dX(s),$$

where $X(s)$ is a type G Lévy process and $f(t, s)$ is adapted in $s$ for each $t \in [0, S]$ with càdlàg (RCLL) paths. We will also denote $\int_0^S f(t, s-)\,dX(s)$ by $I_X(f)_t$ and $\int_0^t k(s-)\,dX(s)$ by $I_X(k)_t$.

3.1 Representations of the stochastic integrals

Rosiński (1991) suggested
$$Z(t) \stackrel{d}{=} \sum_{k=1}^\infty G_k\,g(T_k)^{1/2}\,f(t, U_k), \qquad 0 \le t \le S, \qquad (3.1)$$
as a series representation of stochastic integrals with respect to a type G Lévy process with no Gaussian component, where Uk , Gk and Tk are as defined above. Depending on the properties of f we have to use different approaches to obtain useful approximations of the stochastic integral Z (t). The above series representation is useful when we can simulate f but not X exactly, e.g. when f is a stochastic process independent of X such that f can be easily simulated or when f is a deterministic function. If the problem instead is to approximate f we have to use a different approach. The difficult case is when we can simulate neither X nor f exactly. For certain special cases it is still possible to obtain good approximations. One such case is when f is a smooth function of X . In the next section we propose approximations for these different cases.
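For a deterministic integrand, representation (3.1) can be evaluated by truncating the series. The sketch below (ours, not from the paper, assuming standard numpy) takes $f(t, s) = I(s \le t)\,s$ evaluated at $t = 1$ and the same assumed subordinator with $g(u) = e^{-u}$ as before, so that $\mathrm{Var}\,Z(1) = E\,V(1)\int_0^1 s^2\,ds = 1/3$.

```python
import numpy as np

rng = np.random.default_rng(4)

def Z1(T=30.0, rng=rng):
    """One draw of Z(1) = int_0^1 f(s) dX(s), f(s) = s, via the truncated
    series (3.1): sum_k G_k g(T_k)^{1/2} f(U_k), with g(u) = exp(-u) for the
    assumed subordinator with M(dx) = x^{-1} I(0<x<1) dx."""
    z, t = 0.0, 0.0
    while True:
        t += rng.exponential()            # Poisson points T_k on (0, T]
        if t > T:
            break
        u = rng.uniform()                 # jump location U_k in (0, 1)
        z += rng.standard_normal() * np.exp(-t / 2.0) * u   # f(U_k) = U_k
    return z

zs = np.array([Z1() for _ in range(20000)])
print(zs.var())   # Var Z(1) = E V(1) * int_0^1 s^2 ds = 1/3
```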
3.2 Approximations of stochastic integrals

We begin with a very general decomposition of stochastic integrals that relies only on the additivity of the integral. The decomposition is obtained by first writing both the integrand and the integrator as sums of two terms: $X(t) = X_T(t) + {}^TX(t)$ and $f(t, s) = f_T(t, s) + {}^Tf(t, s)$ respectively. The first terms $X_T$ and $f_T$ can be simulated exactly and the second parts ${}^TX$ and ${}^Tf$ have to be approximated. Note that $T$ is a parameter that determines how much $f_T$ and $X_T$ resemble $f$ and $X$ respectively, i.e. the larger $T$ is the smaller the remainder terms ${}^TX$ and ${}^Tf$ are. For different specific cases we will give $T$ a more precise meaning. The decomposition of $X$ and $f$ gives in a natural way a corresponding decomposition of the stochastic integral $Z = I_X(f)$,

$$Z = I_X(f) = I_{X_T}(f_T) + I_{X_T}({}^Tf) + I_{{}^TX}(f_T) + I_{{}^TX}({}^Tf).$$

To obtain approximations of $Z$ we first approximate $f$ and $X$. Let $\tilde X_T$ be an approximation of $X$ given by

$$\tilde X_T = X_T + {}^T\tilde X,$$

where ${}^T\tilde X$ is an approximation of ${}^TX$. Let also $\tilde f_T$ be an approximation of $f$ given by

$$\tilde f_T = f_T + {}^T\tilde f,$$

where ${}^T\tilde f$ is an approximation of ${}^Tf$. Using this we obtain $\tilde Z_T$ as an approximation of $Z$ given by

$$\tilde Z_T = I_{\tilde X_T}(\tilde f_T) = I_{X_T}(f_T) + I_{X_T}({}^T\tilde f) + I_{{}^T\tilde X}(f_T) + I_{{}^T\tilde X}({}^T\tilde f).$$

A straightforward approximation would be to let ${}^T\tilde f = {}^T\tilde X = 0$, thus letting $\tilde Z_T = I_{X_T}(f_T)$. For several cases we can obtain better approximations than this one, as will be shown below. We start with the case where we can simulate $f$ exactly. We do not assume that $f$ is independent of $X$. It might seem odd to assume that we can simulate $f$, being dependent on $X$, yet not being able to simulate $X$, but we do not want to rule out this possibility. We proceed by splitting the series representation of the integral into two processes,

$$Z(t) = Z_T(t) + {}^TZ(t) = \sum_{k: T_k \le T} G_k\,g(T_k)^{1/2}\,f(t, U_k) + \sum_{k: T_k > T} G_k\,g(T_k)^{1/2}\,f(t, U_k).$$
The simplest approximation is to use $Z_T(t)$ as an approximation of $Z(t)$. We can however, when $f$ has finite variation, do better than this. We first rewrite the tail-sum ${}^TZ(t)$ as

$${}^TZ(t) = \int_0^S f(t, s-)\,d\,{}^TX(s).$$

We will now utilise the convergence of ${}^TX(t)/(\mathrm{Var}\,{}^TX(1))^{1/2}$ to a Wiener process to obtain an approximation of ${}^TZ(t)$. In order to avoid some measurability technicalities for the approximating process ${}^T\tilde Z(t)$ we first make a partial integration of the process ${}^TZ(t)$. We then obtain

$${}^TZ(t) = f(t, S)\,{}^TX(S) - \int_0^S {}^TX(s-)\,d_s f(t, s) - [f(t, \cdot),\,{}^TX(\cdot)]_S,$$

where $[X, Y]_S$ is the quadratic co-variation of the processes $X$ and $Y$ evaluated at time $S$ (Protter, 1990, pp. 58–60). Note that if $f$ is continuous and has finite variation or if $f$ is independent of $W_T({}^TV(\cdot))$,

$$[f(t, \cdot),\,W_T({}^TV(\cdot))]_S = 0$$

(Jacod & Shiryaev, 1987, Proposition 4.49, p. 52). We now propose the approximation

$${}^T\tilde Z(t) = f(t, S)\,{}^T\tilde X(S) - \int_0^S {}^T\tilde X(s)\,d_s f(t, s),$$

where ${}^T\tilde X(s) = W_T(s\,E\,{}^TV(1))$ and $W_T(t)$ is a standard Wiener process. Note that since ${}^T\tilde X(s)$ is continuous we can, when $f$ has finite variation, define this integral path-wise for each fixed $\omega$ (see e.g. Protter, 1990, Theorem 49, p. 38). We now utilise the coupling ${}^TX(s) = W_T({}^TV(s))$, where we use the same Wiener process as for ${}^T\tilde X(s)$. To calculate the difference between $Z(t) = Z_T(t) + {}^TZ(t)$ and its approximation $\tilde Z_T(t) = Z_T(t) + {}^T\tilde Z(t)$ we first note that

$$\Delta(T)_t = Z(t) - \tilde Z_T(t) = {}^TZ(t) - {}^T\tilde Z(t).$$
Thus

$$\Delta(T)_t = f(t, S)({}^TX(S) - {}^T\tilde X(S)) - \int_0^S ({}^TX(s-) - {}^T\tilde X(s))\,d_s f(t, s) - [f(t, \cdot),\,{}^TX(\cdot)]_S.$$

Theorem 3.1

(i) If $f$ has finite variation and four finite moments then

$$E|\Delta(T)_t|^2 \le 27(\mathrm{Var}\,{}^TV(1))^{1/2}\left[(S\,E f^4(t, S))^{1/2} + \int_0^S s^{1/2}\,(E|d_s f(t, s)|^2)^{1/2}\right] + 3S\,E\,{}^TV(1)\,E[f(t, \cdot),\,f(t, \cdot)]_S.$$

(ii) If in addition $f$ is continuous in $s$ for each $t \in [0, S]$ then

$$E|\Delta(T)_t|^2 \le 12(\mathrm{Var}\,{}^TV(1))^{1/2}\left[(S\,E f^4(t, S))^{1/2} + \int_0^S s^{1/2}\,(E|d_s f(t, s)|^2)^{1/2}\right].$$

(iii) If $f$ in addition to (i) and (ii) is absolutely continuous in $s$ for each $t \in [0, S]$ then

$$E|\Delta(T)_t|^2 \le 12(\mathrm{Var}\,{}^TV(1))^{1/2}\left[(S\,E f^4(t, S))^{1/2} + \int_0^S s^{1/2}\,(E|f_s(t, s)|^2)^{1/2}\,ds\right].$$
(iv) If $f$ has two finite moments, finite variation and is independent of $X$ then

$$E|\Delta(T)_t|^2 \le 2\,E f^2(t, S)\,E|{}^TV(S) - S\,E\,{}^TV(1)| + 2\int_0^S E|{}^TV(s) - s\,E\,{}^TV(1)|\,E|d_s f(t, s)| \le 2(\mathrm{Var}\,{}^TV(1))^{1/2}\left[S^{1/2}\,E f^2(t, S) + \int_0^S s^{1/2}\,E|d_s f(t, s)|\right].$$
Before proving the theorem we note that in (i) we have a slower convergence rate than in (ii)–(iv). This follows from the fact that if ${}^TV(t)/E\,{}^TV(1)$ converges to $t$ in $L^2(\Omega \times [0, S], P \otimes \mathrm{Leb})$ then the asymptotically dominant term in (i) will be $3S\,E\,{}^TV(1)\,E[f(t, \cdot),\,f(t, \cdot)]_S$ as $T \to \infty$. We should compare $E|\Delta(T)_t|^2$ to the error obtained when we simply neglect the tail-sum ${}^TZ(t)$. We then have that the MSE is given by

$$E|{}^TZ(t)|^2 = E\,{}^TV(1) \int_0^S E f(t, s)^2\,ds.$$

We see that this MSE is asymptotically of the same order as the dominant term in (i). So for this case we do not accomplish an increase in convergence rate using the above approximation of the tail-sum. We now proceed with the proof.

Proof. We start with (i). Under the assumption that $f$ has four finite moments we obtain
$$E[\Delta(T)_t^2] = E\left|f(t, S)({}^TX(S) - {}^T\tilde X(S)) - \int_0^S ({}^TX(s-) - {}^T\tilde X(s))\,d_s f(t, s) - [f(t, \cdot),\,{}^TX(\cdot)]_S\right|^2$$
$$\le 3\,E|f(t, S)({}^TX(S) - {}^T\tilde X(S))|^2 + 3\,E\left|\int_0^S ({}^TX(s-) - {}^T\tilde X(s))\,d_s f(t, s)\right|^2 + 3\,E[f(t, \cdot),\,{}^TX(\cdot)]_S^2,$$

where we used that $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$. We now apply the Cauchy–Schwarz inequality and the Kunita–Watanabe inequality (see e.g. Protter, 1990, Theorem 25, p. 61) to obtain

$$E[\Delta(T)_t^2] \le 27(S\,E f^4(t, S))^{1/2}(\mathrm{Var}\,{}^TV(1))^{1/2} + 3\,E\left(\int_0^S |{}^TX(s-) - {}^T\tilde X(s)|\,|d_s f(t, s)|\right)^2 + 3\,E[f(t, \cdot),\,f(t, \cdot)]_S\,E[{}^TX(\cdot),\,{}^TX(\cdot)]_S.$$

Apply the Cauchy–Schwarz inequality again on the second term and then Fubini's theorem to find

$$E[\Delta(T)_t^2] \le 27(S\,E f^4(t, S))^{1/2}(\mathrm{Var}\,{}^TV(1))^{1/2} + 27\int_0^S (E|{}^TV(s-) - s\,E\,{}^TV(1)|^2)^{1/2}\,(E|d_s f(t, s)|^2)^{1/2} + 3\,E[f(t, \cdot),\,f(t, \cdot)]_S\,E[{}^TX(\cdot),\,{}^TX(\cdot)]_S$$
$$= 27(\mathrm{Var}\,{}^TV(1))^{1/2}\left[(S\,E f^4(t, S))^{1/2} + \int_0^S s^{1/2}\,(E|d_s f(t, s)|^2)^{1/2}\right] + 3S\,E\,{}^TV(1)\,E[f(t, \cdot),\,f(t, \cdot)]_S,$$

where the last equality follows from the fact that $E[{}^TX(\cdot),\,{}^TX(\cdot)]_S = E\,{}^TV(S) = S\,E\,{}^TV(1)$. We continue with (ii). This part follows from the proof of (i) and, as noted above, the fact that $[f(t, \cdot),\,{}^TX(\cdot)]_S = 0$ when $f$ is continuous. Part (iii) follows from (ii) and

$$(E|d_s f(t, s)|^2)^{1/2} = (E|f_s(t, s)|^2)^{1/2}\,ds$$

if $f$ is absolutely continuous in $s$. Finally, (iv) follows from

$$E f(t, S)^2\,({}^TX(S) - {}^T\tilde X(S))^2 = E f(t, S)^2\,E({}^TX(S) - {}^T\tilde X(S))^2 = E f(t, S)^2\,E|{}^TV(S) - S\,E\,{}^TV(1)|$$

by the independence of $f$ and ${}^TX$, ${}^T\tilde X$. The last inequality in (iv) follows by Jensen's inequality, which concludes the proof.
We continue with the case where $f(t, s) = I(s \le t)\,k(X(s-))$ is a smooth function of $X$, in the sense that $k \in C^1(\mathbb{R})$ with $k'$ Lipschitz. For this case we propose

$$\tilde Z_T(t) = I_{X_T}(k(X_T))_t + k(X_T(t))\,{}^T\tilde X(t) - I_{k(X_T)}({}^T\tilde X)_t + I_{X_T}(k'(X_T)\,{}^T\tilde X)_t$$

as an approximation of $Z(t) = I_X(k(X))_t$, where $X_T$, ${}^TX$ and ${}^T\tilde X$ are defined as above and we use the same coupling as before. Using the mean value theorem, partial integration and the independence of $X_T$ and ${}^TX$ we can rewrite $Z$ as

$$Z(t) = I_{X_T}(k(X_T))_t + k(X_T(t))\,{}^TX(t) - I_{k(X_T)}({}^TX)_t + I_{X_T}(k'(\xi)\,{}^TX)_t + I_{{}^TX}(k'(\xi)\,{}^TX)_t,$$

where $\xi_t$ for each $t$ is a point between $X_T(t)$ and $X(t)$. Thus

$$\Delta(T)_t = \{k(X_T(t))({}^TX(t) - {}^T\tilde X(t))\} - \{I_{k(X_T)}({}^TX - {}^T\tilde X)_t\} + \{I_{X_T}((k'(\xi) - k'(X_T))\,{}^TX)_t\} + \{I_{X_T}(k'(X_T)({}^TX - {}^T\tilde X))_t\} + \{I_{{}^TX}((k'(\xi) - k'(X_T))\,{}^TX)_t\} + \{I_{{}^TX}(k'(X_T)\,{}^TX)_t\}$$
$$= A_1 + A_2 + A_3 + A_4 + A_5 + A_6,$$

where $A_1, A_2, A_3, A_4, A_5$ and $A_6$ are the six terms in curly brackets.
Theorem 3.2 If $f(t, s) = I(s \le t)\,k(X(s-))$ where $k \in C^1(\mathbb{R})$ has a Lipschitz continuous derivative, $k(X_T)$ and $X_T$ have finite variation, and $E\,k(X_T(s))^2 < \infty$, $E|X_T(s)|^2 < \infty$ and $E\,k'(X_T(s))^2 < \infty$ for $0 \le s \le S$, then

$$E\,\Delta(T)_t^2 \le 6(\mathrm{Var}\,{}^TV(1))^{1/2}\left(t^{1/2}\,E\,k(X_T(t))^2 + \int_0^t s^{1/2}\,E|d\,k(X_T(s))|\right) + 18L^2\int_0^t \left(s\,\mathrm{Var}\,{}^TV(1) + s^2(E\,{}^TV(1))^2\right) E|dX_T(s)| + 6(\mathrm{Var}\,{}^TV(1))^{1/2}\int_0^t E\,k'(X_T(s))^2\,s^{1/2}\,E|dX_T(s)| + 18L^2\,E\,{}^TV(1)\int_0^t \left(s\,\mathrm{Var}\,{}^TV(1) + s^2(E\,{}^TV(1))^2\right) ds + 6(E\,{}^TV(1))^2\int_0^t E\,k'(X_T(s))^2\,s\,ds,$$

where $L$ is the Lipschitz constant of $k'$.

Proof. The first two terms follow from the independence of $X_T$ and ${}^TX$, using Theorem 3.1(iv). The third term follows from the Lipschitz continuity of $k'$ and Theorem 3.1(iv). The fourth is proven similarly to Theorem 3.1(ii). The fifth and sixth terms follow from the Lipschitz continuity of $k'$ and the fact that ${}^TX$ is a martingale with all moments finite.
In Theorem 3.1 we assume that the integrand has finite variation, but this restriction can in fact be removed if $f$ is independent of $X$. For this special case we can still obtain a good approximation of the stochastic integral provided that we can simulate the integrand exactly. The method for doing this is a stochastic time change representation of stochastic integrals. This will be treated in the next section.
3.3 Time change representations of stochastic integrals

Stochastic time change representations of stochastic integrals with respect to symmetric stable Lévy processes were first studied by Rosiński & Woyczyński (1986), who also gave a necessary and sufficient condition for the existence of these stochastic integrals. Let $X_\alpha(t)$ be a symmetric $\alpha$-stable Lévy process with $0 < \alpha \le 2$. We then have that

$$Z(t) = \int_0^t f(s)\,dX_\alpha(s) \stackrel{d}{=} \tilde X_\alpha\left(\int_0^t |f(s)|^\alpha\,ds\right),$$

where $\tilde X_\alpha(t) \stackrel{d}{=} X_\alpha(t)$, provided that $f$ satisfies the condition $\int_0^t |f(s)|^\alpha\,ds < \infty$ a.s. for any finite $t$. Moreover, the process $\tilde X_\alpha(t)$ can explicitly be constructed as

$$\tilde X_\alpha(t) = Z(\tau(t)), \qquad \text{where} \quad \tau(t) = \inf\left\{s \ge 0 : \int_0^s |f(u)|^\alpha\,du \ge t\right\}.$$

Kallenberg (1992) generalised these results to asymmetric stable Lévy processes and indicated possible multi-dimensional extensions. Kallsen & Shiryaev (2000) showed that this time change property is valid only for the class of $\alpha$-stable Lévy processes. We will, however, show that a modification of the time change property is valid for type G Lévy processes in a finite-dimensional distribution sense, provided that the integrand $f$ and the integrator $X$ are independent.
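The stable time change is easy to illustrate numerically in the Gaussian case $\alpha = 2$, where both sides of the identity are centred Gaussian with variance $\int_0^1 f(s)^2\,ds$. The sketch below (our own check, assuming standard numpy; the grid size and integrand are arbitrary choices) compares an Euler approximation of the left-hand side with a direct draw of the time-changed Wiener process.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 400, 5000
s = (np.arange(n) + 0.5) / n
f = np.sin(2 * np.pi * s)                  # a deterministic integrand

# Left side: Euler sums of int_0^1 f(s) dW(s) over m independent paths
dW = rng.standard_normal((m, n)) / np.sqrt(n)
lhs = (f * dW).sum(axis=1)

# Right side: W~(tau) with tau = int_0^1 f(s)^2 ds and W~ a fresh Wiener process
tau = (f ** 2).mean()                      # = 1/2 for sin(2*pi*s) on this grid
rhs = rng.standard_normal(m) * np.sqrt(tau)

print(lhs.var(), rhs.var())                # both approx 1/2
```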
Proposition 3.1 If $X(t) \stackrel{d}{=} W(V(t))$ is a type G Lévy process, the process $Z(t) = \int_0^t f(s-)\,dX(s)$, $0 \le t \le S$, can be represented as

$$Z(t) \stackrel{d}{=} \tilde Z(t) = \tilde W\left(\int_0^t f(s)^2\,dV(s)\right),$$

where $\tilde W(t)$ is a Wiener process independent of $V(t)$ and $f(t)$, provided that $f(t)$ is independent of $X(t)$ and satisfies

$$\int_0^t f(s)^2\,dV(s) < \infty \quad \text{a.s. for } 0 \le t \le S.$$
Proof. By the independence of $W$, $V$ and $f$ we can view $Z(t)$ as a conditionally Gaussian process given $f(t)$ and $V(t)$. The conditional covariance function $r_{Z|V,f}(s, t)$ of $Z(t)$ given $f(t)$ and $V(t)$ is

$$r_{Z|V,f}(s, t) = E[Z(t)Z(s) \mid V, f] = E\left[\int_0^t \int_0^s f(u)f(v)\,dW(V(u))\,dW(V(v)) \,\Big|\, V, f\right] = \int_0^{\min(s,t)} f(u)^2\,dV(u) \quad \text{a.s.},$$

where the last equality follows from Fubini's theorem, the fact that $V$ is increasing and the independent increments of $W$. By using the independent increments of $\tilde W$ and the fact that $\int_0^t f(u)^2\,dV(u)$ is non-decreasing, we find that $r_{\tilde Z|V,f}(s, t)$, the conditional covariance function of $\tilde Z(t)$ given $f(t)$ and $V(t)$, a.s. equals $r_{Z|V,f}(s, t)$. Now use that two Gaussian processes with a.s. equal covariance functions have the same finite-dimensional distributions to conclude the proof.
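Proposition 3.1 also gives a direct recipe for simulation when $f$ is deterministic: draw the jumps of $V$, evaluate the clock $\int_0^1 f^2\,dV$, and feed it into a fresh Gaussian draw. The sketch below (ours, assuming standard numpy) does this for $f(s) = s$ and the standard gamma subordinator of Example 5.4, for which $\mathrm{Var}\,Z(1) = E\,V(1)\int_0^1 s^2\,ds = 1/3$; names and truncation level are our own.

```python
import numpy as np

rng = np.random.default_rng(6)

def Z1_timechange(T=30.0, rng=rng):
    """One draw of Z(1) = W~(int_0^1 f(s)^2 dV(s)) for f(s) = s and the
    standard gamma subordinator V (truncation level T)."""
    q, t = 0.0, 0.0
    while True:
        t += rng.exponential()                   # Poisson points T_k
        if t > T:
            break
        jump = np.exp(-t) * rng.exponential()    # jump size of V
        u = rng.uniform()                        # its location U_k in (0, 1)
        q += (u ** 2) * jump                     # int f^2 dV picks up f(U_k)^2
    return np.sqrt(q) * rng.standard_normal()    # W~ evaluated at the clock q

zs = np.array([Z1_timechange() for _ in range(20000)])
print(zs.mean(), zs.var())    # mean approx 0, Var = E V(1) * int_0^1 s^2 ds = 1/3
```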
We note that it is in fact possible to obtain a strong version of this proposition, i.e. we can find a Wiener process $\tilde W$ such that

$$\int_0^t f(s)\,dW(V(s)) = \tilde W\left(\int_0^t f(s)^2\,dV(s)\right) \quad \text{a.s.}$$

The problem is that we cannot uniquely construct $\tilde W$ from the stochastic integral process $Z(t)$. This is due to the fact that the inverse time shift

$$\tau(t) = \inf\left\{s \ge 0 : \int_0^s f(u)^2\,dV(u) \ge t\right\}$$

is not strictly increasing. Thus the process $Q(t) = Z(\tau(t))$ is not a Wiener process, since it is piecewise constant. We can, however, in principle modify $Q(t)$ on each interval of constancy by inserting a suitable Brownian bridge to obtain a Wiener process. Note that this modification does not change the process $Q\left(\int_0^t f(s)^2\,dV(s)\right)$, since the process $\int_0^t f(s)^2\,dV(s)$ never hits the interior of these intervals. There are a few unsolved technicalities regarding finding a filtration which makes this modification of the process $Q$ adapted. In order to obtain approximations of the stochastic integral we first split the integral into a sum of two terms,
$$Z(t) = Z_T(t) + {}^TZ(t) = I_{X_T}(f)_t + I_{{}^TX}(f)_t,$$

where $I_{X_T}(f)_t$ and $I_{{}^TX}(f)_t$ are conditionally independent given $f$. Using the weak time change property of Proposition 3.1 we can represent the stochastic integral $Z(t)$ by

$$Z(t) \stackrel{d}{=} W_1(I_{V_T}(f^2)_t) + W_2(I_{{}^TV}(f^2)_t),$$

where $W_1(t)$ and $W_2(t)$ are independent standard Wiener processes. We now propose an approximation $\tilde Z_T(t)$ of $Z(t)$. For $0 \le t \le S$ define $\tilde Z_T(t)$ by

$$\tilde Z_T(t) = W_1(I_{V_T}(f^2)_t) + W_2\left(E\,{}^TV(1) \int_0^t f(s)^2\,ds\right).$$

The difference $\Delta(T)_t$ between $Z(t)$ and its approximation $\tilde Z_T(t)$ is thus given by

$$\Delta(T)_t = Z(t) - \tilde Z_T(t) = W_2(I_{{}^TV}(f^2)_t) - W_2\left(E\,{}^TV(1) \int_0^t f(s)^2\,ds\right).$$

Theorem 3.3
(i) If $f$ has two finite moments then the MSE of the approximation is given by

$$E|\Delta(T)_t|^2 = E\left|\int_0^t f(s)^2\,d({}^TV(s) - E\,{}^TV(s))\right|.$$

(ii) If $f$ has four finite moments then

$$E|\Delta(T)_t|^2 \le (\mathrm{Var}\,{}^TV(1))^{1/2}\left(\int_0^t E f(s)^4\,ds\right)^{1/2}.$$

Proof. By using the independence of $W_2$, $f$ and ${}^TV$ we obtain

$$E\left|W_2(I_{{}^TV}(f^2)_t) - W_2\left(E\,{}^TV(1)\int_0^t f(s)^2\,ds\right)\right|^2 = E\left|\int_0^t f(s)^2\,d({}^TV(s) - E\,{}^TV(s))\right|,$$

which shows part (i). Part (ii) follows from Jensen's inequality, the independence of $f$ and ${}^TV$, and the fact that ${}^TV(t) - t\,E\,{}^TV(1)$ is a martingale with all moments finite.
We can also obtain an upper bound for the mean integrated square error (MISE), i.e.

$$\epsilon^{(1)}(T) = E \int_0^S |\Delta(T)_t|^2\,dt.$$
Corollary 3.1 (MISE) If $f$ has four finite moments then

$$\epsilon^{(1)}(T) \le 2S(\mathrm{Var}\,{}^TV(1))^{1/2}\left(\int_0^S E f(s)^4\,ds\right)^{1/2}.$$

Proof. We first observe that

$$\epsilon^{(1)}(T) \le S\,E \sup_{0 \le t \le S}\left|\int_0^t f(s)^2\,d({}^TV(s) - E\,{}^TV(s))\right|.$$

Now use that $\int_0^t f(s)^2\,d({}^TV(s) - E\,{}^TV(s))$ is an $L^2$ martingale, Doob's inequality and Theorem 3.3(ii) to obtain

$$\epsilon^{(1)}(T) \le 2S\left(E\left|\int_0^S f(s)^2\,d({}^TV(s) - E\,{}^TV(s))\right|^2\right)^{1/2} \le 2S(\mathrm{Var}\,{}^TV(1))^{1/2}\left(\int_0^S E f(s)^4\,ds\right)^{1/2},$$

which concludes the proof.
References

Barndorff-Nielsen, O.E. (1998). Processes of normal inverse Gaussian type. Finance Stoch. 2, 41–68.

Barndorff-Nielsen, O.E. & Pérez-Abreu, V. (1999). Stationary and self-similar processes driven by Lévy processes. Stoch. Proc. Appl. 84, 357–369.

Bertoin, J. (1996). Lévy Processes. Cambridge University Press, Cambridge.

Jacod, J. & Shiryaev, A.N. (1987). Limit Theorems for Stochastic Processes. Springer-Verlag, Berlin.

Kallenberg, O. (1992). Some time change representations of stable integrals, via predictable transformations of local martingales. Stoch. Proc. Appl. 40, 199–223.

Kallsen, J. & Shiryaev, A.N. (2000). Time Change Representation of Stochastic Integrals. Preprint. Available at: http://neyman.mathematik.uni-freiburg.de/~kallsen/

Protter, P. (1990). Stochastic Integration and Differential Equations. Springer-Verlag, Berlin.

Protter, P. & Talay, D. (1997). The Euler scheme for Lévy driven stochastic differential equations. Ann. Prob. 25, 393–423.

Rosiński, J. & Woyczyński, W.A. (1986). On Itô stochastic integration with respect to p-stable motion: inner clock, integrability of sample paths, double and multiple integrals. Ann. Prob. 14, 271–286.

Rosiński, J. (1991). On a class of infinitely divisible processes represented as mixtures of Gaussian processes. In Stable Processes and Related Topics. Cambanis, S., Samorodnitsky, G. & Taqqu, M.S. (eds.). Birkhäuser, Boston. pp. 27–41.

Rydberg, T. (1997). The normal inverse Gaussian Lévy process: simulation and approximation. Comm. Stat. Stoch. Models 13, 887–910.

Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press, Cambridge.

Wiktorsson, M. (2000). Improved convergence rate for the simulation of Lévy processes of type G. Working paper.