Chapter 3

Stochastic Integration and Continuous Time Models

3.1 Brownian Motion

The single most important continuous time process in the construction of financial models is the Brownian motion process. Brownian motion is the oldest continuous time model used in finance, going back to Bachelier (1900) around the turn of the last century, and it is the most common building block for more sophisticated continuous time models called diffusion processes.

The Brownian motion process is a random continuous time process, denoted $W(t)$ or $W_t$, defined for $t \ge 0$ such that $W(0)$ takes some predetermined value, usually 0, and for each $0 \le s < t$, $W(t) - W(s)$ has a normal distribution with mean $\mu(t-s)$ and variance $\sigma^2(t-s)$. The parameters $\mu$ and $\sigma$ are the drift and diffusion parameters of the Brownian motion, and in the special case $\mu = 0$, $\sigma = 1$, $W(t)$ is often referred to as a standard Brownian motion or a Wiener process. Further important properties of the Brownian motion process are:

1. A Brownian motion process exists such that the sample paths are each continuous functions of $t$ (with probability one).

2. The increments $W(t_2) - W(t_1), W(t_4) - W(t_3), \ldots, W(t_k) - W(t_{k-1})$ over non-overlapping intervals are independent normal random variables, provided that $0 \le t_1 < t_2 \le t_3 < t_4 \le \cdots \le t_{k-1} < t_k$.
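As a concrete illustration of this definition, the following sketch (Python with NumPy; the grid size and parameter values are arbitrary choices, not from the text) simulates a Brownian motion path by summing independent normal increments with mean $\mu\,\Delta t$ and variance $\sigma^2\,\Delta t$.

```python
import numpy as np

def brownian_path(mu=0.0, sigma=1.0, T=1.0, n=1000, seed=0):
    """Simulate W(t_i) on a grid 0 = t_0 < ... < t_n = T using the fact that
    the increments W(t_i) - W(t_{i-1}) are independent N(mu*dt, sigma^2*dt)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    increments = mu * dt + sigma * np.sqrt(dt) * rng.standard_normal(n)
    t = np.linspace(0.0, T, n + 1)
    W = np.concatenate(([0.0], np.cumsum(increments)))  # W(0) = 0
    return t, W

t, W = brownian_path()  # standard Brownian motion (mu = 0, sigma = 1)
```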
Further Properties of the (standard) Brownian Motion Process

The covariance between $W(t)$ and $W(s)$ is $\mathrm{Cov}(W(t), W(s)) = \min(s,t)$. Brownian motion is an example of a Gaussian process, a process for which every finite-dimensional distribution, such as that of $(W(t_1), W(t_2), \ldots, W(t_k))$, is normal (multivariate or univariate). In fact, Gaussian processes are uniquely determined by their covariance structure. In particular, if a Gaussian process has $E(X_t) = 0$ and $\mathrm{Cov}(X(t), X(s)) = \min(s,t)$, then it has independent increments. If in addition it has continuous sample paths and $X_0 = 0$, then it is a standard Brownian motion.

[Figure 3.1: A sample path of the standard Brownian motion (Wiener process).]

Towards the construction of a Brownian motion process, define the triangular function
$$\Delta(t) = \begin{cases} 2t & \text{for } 0 \le t \le \tfrac12 \\ 2(1-t) & \text{for } \tfrac12 \le t \le 1 \\ 0 & \text{otherwise} \end{cases}$$
and similar functions with base of length $2^{-j}$,
$$\Delta_{j,k}(t) = \Delta(2^j t - k) \quad \text{for } j = 1, 2, \ldots \text{ and } k = 0, 1, \ldots, 2^j - 1, \qquad \Delta_{0,0}(t) = t, \quad 0 \le t \le 1.$$

Theorem A38 (Wavelet construction of Brownian motion) Suppose the random variables $Z_{j,k}$ are independent $N(0,1)$ random variables. Then the series below converges uniformly (almost surely) to a standard Brownian motion process $B(t)$ on the interval $[0,1]$:
$$B(t) = \sum_{j=0}^{\infty} \sum_{k=0}^{2^j - 1} 2^{-j/2-1} Z_{j,k}\, \Delta_{j,k}(t).$$
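A truncated version of this series gives a practical way to generate approximate Brownian paths. The sketch below (Python with NumPy) uses the standard Lévy–Ciesielski form of the expansion, with the term corresponding to $\Delta_{0,0}(t) = t$ written separately; the normalization conventions may differ slightly from those in Theorem A38, and the truncation level `J` is an arbitrary choice.

```python
import numpy as np

def triangle(x):
    """The triangular (Schauder) function Delta, supported on [0, 1]."""
    return np.where((x >= 0) & (x <= 0.5), 2 * x,
                    np.where((x > 0.5) & (x <= 1), 2 * (1 - x), 0.0))

def wavelet_brownian(t, J=10, seed=0):
    """Truncated Schauder-series approximation to standard Brownian motion on [0, 1]."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal() * t          # term corresponding to Delta_{0,0}(t) = t
    for j in range(J + 1):
        Z = rng.standard_normal(2 ** j)
        for k in range(2 ** j):
            B = B + 2.0 ** (-j / 2 - 1) * Z[k] * triangle(2.0 ** j * t - k)
    return B

t = np.linspace(0.0, 1.0, 1025)
B = wavelet_brownian(t)   # approximate Brownian path; Var(B(1)) is close to 1
```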
The standard Brownian motion process can be extended to the whole interval $[0, \infty)$ by generating independent Brownian motion processes $B^{(n)}$ on the interval $[0,1]$ and defining $W(t) = \sum_{j=1}^{n} B^{(j)}(1) + B^{(n+1)}(t-n)$ whenever $n \le t < n+1$. Figure 3.1 gives a sample path of the standard Brownian motion. Evidently the path is continuous, but examined locally it appears to be just barely continuous, with no higher-order smoothness properties. For example, derivatives do not appear
to exist because of the rapid fluctuations of the process everywhere. There are various modifications of the Brownian motion process that result in a process with exactly the same distribution.

Theorem A39 If $W(t)$ is a standard Brownian motion process on $[0, \infty)$, then so are the processes $X_t = \frac{1}{\sqrt{a}}W(at)$ and $Y_t = tW(1/t)$, for any $a > 0$.

A standard Brownian motion process is an example of a continuous time martingale because, for $s < t$,
$$E[W(t) \mid H_s] = E[W(t) - W(s) \mid H_s] + E[W(s) \mid H_s] = 0 + W(s),$$
since the increment $W(t) - W(s)$ is independent of the past and normally distributed with expected value 0. In fact it is a continuous martingale in the sense that its sample paths are continuous (with probability one) functions of $t$. It is not the only continuous martingale, however. For example, it is not difficult to show that both $X_t = W_t^2 - t$ and $\exp(\alpha W_t - \alpha^2 t/2)$, for $\alpha$ any real number, are continuous martingales. Of course neither is a Gaussian process: their finite-dimensional distributions cannot be normal, since $\exp(\alpha W_t - \alpha^2 t/2)$ takes only positive values and $W_t^2 - t$ is bounded below by $-t$.

We discussed earlier the ruin probabilities for a random walk using martingale theory, and a similar theory can be used to establish the boundary crossing probabilities for a Brownian motion process. The following theorem establishes the relative probability that a Brownian motion hits each of two boundaries, one above the initial value and the other below.

Theorem A40 (Ruin probabilities for Brownian motion) If $W(t)$ is a standard Brownian motion and the stopping time $\tau$ is defined by $\tau = \inf\{t;\, W(t) = -b \text{ or } a\}$, where $a$ and $b$ are positive numbers, then $P(\tau < \infty) = 1$ and
$$P[W_\tau = a] = \frac{b}{a+b}.$$
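Theorem A40 can be checked by simulation. The following sketch (Python with NumPy; the barriers $a = 1$, $b = 2$, the discretization and the horizon are arbitrary choices) estimates the probability that a discretized standard Brownian path reaches $+a$ before $-b$ and compares it with $b/(a+b)$.

```python
import numpy as np

def ruin_probability_mc(a=1.0, b=2.0, n_paths=2000, n_steps=20000, dt=1e-3, seed=1):
    """Estimate P[W hits +a before -b] from discretized Brownian paths.
    Paths hitting neither barrier before the horizon (rare here) count as not hitting a."""
    rng = np.random.default_rng(seed)
    hits_a = 0
    for _ in range(n_paths):
        W = np.cumsum(np.sqrt(dt) * rng.standard_normal(n_steps))
        up = np.argmax(W >= a) if np.any(W >= a) else n_steps + 1
        down = np.argmax(W <= -b) if np.any(W <= -b) else n_steps + 1
        if up < down:
            hits_a += 1
    return hits_a / n_paths

print(ruin_probability_mc(), "vs b/(a+b) =", 2 / 3)
```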
Although this establishes which boundary is hit with what probability, it says nothing about the time at which the boundary is first hit. The distribution of this hitting time (the first passage time distribution) is particularly simple.

Theorem A41 (Hitting times for a flat boundary) If $W(t)$ is a standard Brownian motion and the stopping time $\tau_a$ is defined by $\tau_a = \inf\{t;\, W(t) = a\}$, where $a > 0$, then

1. $P(\tau_a < \infty) = 1$.

2. $\tau_a$ has Laplace transform given by $E(e^{-\lambda\tau_a}) = e^{-\sqrt{2\lambda}\,|a|}$.
3. The probability density function of $\tau_a$ is $f(t) = a t^{-3/2}\phi(a t^{-1/2})$, where $\phi$ is the standard normal probability density function.

4. The cumulative distribution function of $\tau_a$ is given by $P[\tau_a \le t] = 2P[W(t) > a] = 2[1 - \Phi(a t^{-1/2})]$ for $t > 0$, and zero otherwise.

5. $E(\tau_a) = \infty$.

The last property is surprising. The standard Brownian motion has no general tendency to rise or fall, but because of its fluctuations it is guaranteed to strike a barrier placed at any level $a > 0$. However, the time before this barrier is struck can be very long, so long that the expected time is infinite. The following corollary provides an interesting connection between the maximum of a Brownian motion over an interval and its value at the end of the interval.

Corollary If $W_t^* = \max\{W(s);\, 0 < s < t\}$, then for $a \ge 0$,
$$P[W_t^* > a] = P[\tau_a \le t] = 2P[W(t) > a].$$

Theorem A43 (Time of last return to 0) Consider the random time $L = \sup\{t \le 1;\, W(t) = 0\}$. Then $L$ has cumulative distribution function
$$P[L \le s] = \frac{2}{\pi}\arcsin(\sqrt{s}), \quad 0 < s < 1,$$
and corresponding probability density function
$$\frac{d}{ds}\,\frac{2}{\pi}\arcsin(\sqrt{s}) = \frac{1}{\pi\sqrt{s(1-s)}}, \quad 0 < s < 1.$$
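The corollary lends itself to a quick Monte Carlo check. The sketch below (Python with NumPy and SciPy; the barrier $a$, horizon $t$ and sample sizes are arbitrary choices) estimates $P[W_t^* > a]$ from discretized paths and compares it with $2P[W(t) > a] = 2[1 - \Phi(a t^{-1/2})]$; the discrete-time maximum slightly understates the continuous one.

```python
import numpy as np
from scipy.stats import norm

def max_exceeds_prob(a=1.0, t=1.0, n_paths=5000, n_steps=1000, seed=2):
    """Estimate P[max_{s<=t} W(s) > a] from discretized Brownian paths."""
    rng = np.random.default_rng(seed)
    dt = t / n_steps
    W = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n_steps)), axis=1)
    return np.mean(W.max(axis=1) > a)

a, t = 1.0, 1.0
print(max_exceeds_prob(a, t))                # simulated P[W_t^* > a]
print(2 * (1 - norm.cdf(a / np.sqrt(t))))    # reflection-principle value 2 P[W(t) > a]
```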
3.2 Continuous Time Martingales

As usual, the value of the stochastic process at time $t$ may be denoted $X(t)$ or $X_t$ for $t \in [0, \infty)$, and $H_t$ is a sub-sigma-field of $H$ such that $H_s \subset H_t$ whenever $s \le t$. We call such a family a filtration. $X_t$ is said to be adapted to the filtration if $X(t)$ is measurable with respect to $H_t$ for all $t \in [0, \infty)$. Henceforth, we assume that all stochastic processes under consideration are adapted to the filtration $H_t$. We also assume that the filtration $H_t$ is right continuous, i.e. that
$$\bigcap_{\epsilon > 0} H_{t+\epsilon} = H_t. \tag{3.1}$$
We can make this assumption without loss of generality because if $H_t$ is any filtration, then we can make it right continuous by replacing it with
$$H_{t+} = \bigcap_{\epsilon > 0} H_{t+\epsilon}. \tag{3.2}$$
We use the fact that the intersection of sigma-fields is a sigma-field. Note that any process that was adapted to the original filtration is also adapted to the new filtration $H_{t+}$. We also typically assume, by analogy to the definition of the Lebesgue measurable sets, that if $A$ is any set with $P(A) = 0$, then $A \in H_0$. These two conditions, that the filtration is right continuous and contains the $P$-null sets, are referred to as the standard conditions. The definition of a martingale is, in continuous time, essentially the same as in discrete time.

Definition Let $X(t)$ be a continuous time stochastic process adapted to a right continuous filtration $H_t$, $0 \le t < \infty$. Then $X$ is a martingale if $E|X(t)| < \infty$ for all $t$ and
$$E[X(t) \mid H_s] = X(s) \tag{3.3}$$
for all $s < t$. The process $X(t)$ is a submartingale (respectively a supermartingale) if the equality is replaced by $\ge$ (respectively $\le$).

Definition A random variable $\tau$ taking values in $[0, \infty]$ is a stopping time for a martingale $(X_t, H_t)$ if for each $t \ge 0$ the event $[\tau \le t]$ is in the sigma-algebra $H_t$.
Stopping a martingale at a non-decreasing sequence of stopping times preserves the martingale property, and there are also some operations on Brownian motion which preserve the Brownian motion measure:
[Figure 3.2: The process $\widetilde{W}(t)$ obtained by reflecting a Brownian motion about $W(\tau)$.]

Theorem A44 (Reflection and the Strong Markov Property) If $\tau$ is a stopping time with respect to the usual filtration of a standard Brownian motion $W(t)$, then the process
$$\widetilde{W}(t) = \begin{cases} W(t) & t < \tau \\ 2W(\tau) - W(t) & t \ge \tau \end{cases}$$
is a standard Brownian motion.

The process $\widetilde{W}(t)$ is obtained from the Brownian motion process as follows: up to time $\tau$ the original Brownian motion is left alone, and for $t > \tau$, the process $\widetilde{W}(t)$ is the reflection of $W(t)$ about a horizontal line drawn at $y = W(\tau)$. This is shown in Figure 3.2.
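The reflected process of Theorem A44 is easy to construct path by path on a discrete grid. A sketch (Python with NumPy; the reflection level and the grid are arbitrary choices, with $\tau$ taken to be the first time the discretized path reaches that level):

```python
import numpy as np

def reflect_at_level(W, level=0.5):
    """Reflect the path about W(tau), where tau is the first time the discretized
    path reaches `level`; before tau the path is left unchanged."""
    hit = np.argmax(W >= level) if np.any(W >= level) else None
    W_ref = W.copy()
    if hit is not None:
        W_ref[hit:] = 2 * W[hit] - W[hit:]   # 2 W(tau) - W(t) for t >= tau
    return W_ref

rng = np.random.default_rng(3)
n, T = 1000, 1.0
W = np.concatenate(([0.0], np.cumsum(np.sqrt(T / n) * rng.standard_normal(n))))
W_tilde = reflect_at_level(W)   # has the same distribution as W (Theorem A44)
```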
Theorem A45 Let $\{(M_t, H_t),\, t \ge 0\}$ be a (right-)continuous martingale and assume that the filtration satisfies the standard conditions. If $\tau$ is a stopping time, then the process $X_t = M_{t\wedge\tau}$ is also a continuous martingale with respect to the same filtration.
Various other results are essentially the same in discrete or continuous time. For example, Doob's $L^p$ inequality
$$\Big\| \sup_{0 \le t \le T} M_t \Big\|_p \le \frac{p}{p-1}\, \| M_T \|_p, \quad \text{if } p > 1,$$
holds for right-continuous non-negative submartingales. Similarly, the submartingale convergence theorem holds as stated earlier, but with $n \to \infty$ replaced by $t \to \infty$.
3.3 Introduction to Stochastic Integrals

The stochastic integral arose from attempts to use the techniques of Riemann-Stieltjes integration for stochastic processes. However, Riemann-Stieltjes integration requires that the integrating function have locally bounded variation in order that the Riemann-Stieltjes sums converge.

Definition (locally bounded variation) If the process $A_t$ can be written as the difference of two non-decreasing processes, it is called a process of locally bounded variation. Similarly, a function is said to have locally bounded variation if it can be written as the difference of two non-decreasing functions.

For any function $G$ of locally bounded variation, random or not, integrals such as $\int_0^T f\,dG$ are easy to define because, writing $G = G_1 - G_2$ as the difference between two non-decreasing functions $G_1, G_2$, the Riemann-Stieltjes sum
$$\sum_{i=1}^n f(s_i)[G(t_i) - G(t_{i-1})],$$
where $0 = t_0 < t_1 < t_2 < \cdots < t_n = T$ is a partition of $[0,T]$ and $t_{i-1} \le s_i \le t_i$, will converge to the same value regardless of where we place $s_i$ in the interval $(t_{i-1}, t_i)$ as the mesh size $\max_i |t_i - t_{i-1}| \to 0$. By contrast, many stochastic processes do not have paths of bounded variation. Consider, for example, a hypothetical integral of the form
$$\int_0^T f\,dW,$$
where $f$ is a nonrandom function of $t \in [0,T]$ and $W$ is a standard Brownian motion. The Riemann-Stieltjes sum for this integral would be
$$\sum_{i=1}^n f(s_i)[W(t_i) - W(t_{i-1})],$$
where again $0 = t_0 < t_1 < t_2 < \cdots < t_n = T$ and $t_{i-1} \le s_i \le t_i$. In this case, as $\max_i |t_i - t_{i-1}| \to 0$, the Riemann-Stieltjes sum will not converge because the
Brownian motion paths are not of bounded variation. When $f$ has bounded variation, we can circumvent this difficulty by formally defining the integral using integration by parts. Thus if we formally write
$$\int_0^T f\,dW = f(T)W(T) - f(0)W(0) - \int_0^T W\,df,$$
then the right hand side is well defined and can be used as the definition of the left hand side. Unfortunately, this simple interpretation of the stochastic integral does not work for many applications: the integrand $f$ is often replaced by some function of $W$ or another stochastic process which does not have bounded variation. There are other difficulties. For example, integration by parts to evaluate the integral
$$\int_0^T W\,dW$$
leads to $\int_0^T W\,dW = W^2(T)/2$, which is not the Ito stochastic integral. Consider for a moment the possible limiting values of the Riemann-Stieltjes sums
$$I_\alpha = \sum_{i=1}^n f(s_i)\{W(t_i) - W(t_{i-1})\}, \tag{3.4}$$
where $s_i = t_{i-1} + \alpha(t_i - t_{i-1})$ for some $0 \le \alpha \le 1$. If the Riemann integral were well defined, then $I_1 - I_0 \to 0$ in probability. However, when $f(s) = W(s)$, this difference is
$$I_1 - I_0 = \sum_{i=1}^n [W(t_i) - W(t_{i-1})]^2,$$
and this cannot possibly converge to zero because, in fact, its expected value is
$$E\left(\sum_{i=1}^n [W(t_i) - W(t_{i-1})]^2\right) = \sum_{i=1}^n (t_i - t_{i-1}) = T.$$
In fact, since the increments $[W(t_i) - W(t_{i-1})]^2$ are independent, we can show by a version of the law of large numbers that
$$\sum_{i=1}^n [W(t_i) - W(t_{i-1})]^2 \to_p T,$$
and more generally $I_\alpha - I_0 \to \alpha T$ in probability as the partition grows finer. In other words, unlike the Riemann-Stieltjes integral, it makes a difference where we place the point $s_i$ in the interval $(t_{i-1}, t_i)$ for a stochastic integral. The Ito stochastic integral corresponds to $\alpha = 0$ and approximates the integral $\int_0^T W\,dW$ with partial sums of the form
$$\sum_{i=1}^n W(t_{i-1})[W(t_i) - W(t_{i-1})],$$
the limit of which is, as the mesh size decreases, $\frac12(W^2(T) - T)$. If we evaluate the integrand at the right end point of the interval (i.e. taking $\alpha = 1$), we obtain $\frac12(W^2(T) + T)$. Another natural choice is $\alpha = 1/2$ (called the Stratonovich integral), and this definition gives the answer $W^2(T)/2$, which is the same result obtained from the usual Riemann integration by parts. Which definition is "correct"? The Stratonovich integral has the advantage that it satisfies most of the traditional rules of deterministic calculus; for example, if the integral below is interpreted as a Stratonovich integral,
$$\int_0^T \exp(W_t)\,dW_t = \exp(W_T) - 1.$$
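The dependence on the evaluation point $\alpha$ is easy to see numerically. The sketch below (Python with NumPy; the grid size is an arbitrary choice) computes the three Riemann-Stieltjes sums for $\int_0^T W\,dW$ with $\alpha = 0, 1/2, 1$ on a single simulated path and compares them with the limits quoted above.

```python
import numpy as np

rng = np.random.default_rng(4)
T, n = 1.0, 50000
dt = T / n
# simulate W on a grid twice as fine so the time-midpoints s_i are available
W_fine = np.concatenate(([0.0], np.cumsum(np.sqrt(dt / 2) * rng.standard_normal(2 * n))))
W = W_fine[::2]          # W(t_0), W(t_1), ..., W(t_n)
W_mid = W_fine[1::2]     # W evaluated at the midpoints s_i = (t_{i-1} + t_i)/2
dW = np.diff(W)

I0 = np.sum(W[:-1] * dW)      # alpha = 0 (Ito sum)
I_half = np.sum(W_mid * dW)   # alpha = 1/2 (Stratonovich-type sum)
I1 = np.sum(W[1:] * dW)       # alpha = 1
WT = W[-1]
print(I0, (WT**2 - T) / 2)    # Ito limit (W^2(T) - T)/2
print(I_half, WT**2 / 2)      # Stratonovich limit W^2(T)/2
print(I1, (WT**2 + T) / 2)    # right-endpoint limit (W^2(T) + T)/2
```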
While all definitions of a stochastic integral are useful, the main applications in finance are those in which the values $f(s_i)$ appearing in (3.4) are the weights on various investments in a portfolio and the increment $[W(t_i) - W(t_{i-1})]$ represents the change in price of the components of that portfolio over the next interval of time. Obviously one must commit to one's investments before observing the changes in the values of those investments. For this reason the Ito integral ($\alpha = 0$) seems the most natural for these applications.

We now define the class of functions $f$ to which this integral will apply. We assume that $H_t$ is a standard Brownian filtration and that the interval $[0,T]$ is endowed with its Borel sigma-field. Let $H^2$ be the set of functions $f(\omega, t)$ on the product space $\Omega \times [0,T]$ such that

1. $f$ is measurable with respect to the product sigma-field on $\Omega \times [0,T]$.

2. For each $t \in [0,T]$, $f(\cdot, t)$ is measurable $H_t$ (in other words, the stochastic process $f(\cdot, t)$ is adapted to $H_t$).

3. $E[\int_0^T f^2(\omega, t)\,dt] < \infty$.
The set of processes $H^2$ is the natural domain of the Ito integral. However, before we define the stochastic integral on $H^2$ we need to define it in the obvious way on the step functions in $H^2$. Let $H_0^2$ be the subset of $H^2$ consisting of functions of the form
$$f(\omega, t) = \sum_{i=0}^{n-1} a_i(\omega)\, 1(t_i < t \le t_{i+1}),$$
where the random variables $a_i$ are measurable with respect to $H_{t_i}$ and $0 = t_0 < t_1 < \cdots < t_n = T$. These functions $f$ are predictable in that their value $a_i(\omega)$ in the interval $(t_i, t_{i+1}]$ is determined before we reach this interval. A typical step function is graphed in Figure 3.3. For such functions, the stochastic integral has only one natural definition:
$$\int_0^T f(\omega, t)\,dW(t) = \sum_{i=0}^{n-1} a_i(\omega)\big(W(t_{i+1}) - W(t_i)\big),$$
and note that, considered as a function of $T$, this forms a continuous time square integrable martingale.
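For a predictable step function the Ito integral is literally the sum in the display above. A minimal sketch (Python with NumPy; the breakpoints and the weights $a_i$ are hypothetical choices, each depending only on the path up to the left endpoint of its interval):

```python
import numpy as np

rng = np.random.default_rng(5)
T, n = 1.0, 1000
dt = T / n
W = np.concatenate(([0.0], np.cumsum(np.sqrt(dt) * rng.standard_normal(n))))

# a predictable step function: on (t_i, t_{i+1}] the weight uses only W up to t_i
breaks = [0, 250, 500, 750, n]                 # indices of t_0 < t_1 < ... < t_4 = T
a = [1.0, W[250], -W[500], 2 * W[750]]         # a_i is H_{t_i}-measurable

integral = sum(a[i] * (W[breaks[i + 1]] - W[breaks[i]]) for i in range(4))
print(integral)   # a single realization of the stochastic integral (mean zero)
```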
[Figure 3.3: A typical step function $f(\omega, t)$.]
There is a simple definition of an inner product between two square integrable random variables $X$ and $Y$, namely $E(XY)$, and we might ask how this inner product behaves when applied to random variables obtained from stochastic integration, like $X_T(\omega) = \int_0^T f(\omega, t)\,dW(t)$ and $Y_T(\omega) = \int_0^T g(\omega, t)\,dW(t)$. The answer is simple, in fact, and lies at the heart of Ito's definition of a stochastic integral. For reasons that will become a little clearer later, let us define the predictable covariation process to be the stochastic process described by
$$\langle X, Y\rangle_T(\omega) = \int_0^T f(\omega, t)\,g(\omega, t)\,dt.$$

Theorem A46 For functions $f$ and $g$ in $H_0^2$,
$$E\{\langle X, Y\rangle_T\} = E\{X_T Y_T\} \tag{3.5}$$
and
$$E(\langle X, X\rangle_T) = E\Big\{\int_0^T f^2(\omega, t)\,dt\Big\} = E(X_T^2). \tag{3.6}$$

These identities establish an isometry, a relationship between inner products, at least for functions in $H_0^2$. The norm on stochastic integrals defined by
$$\Big\|\int_0^T f\,dW\Big\|_{L^2(P)}^2 = E\Big(\int_0^T f(\omega, t)\,dW(t)\Big)^2$$
agrees with the usual $L^2$ norm on the space of random functions,
$$\|f\|_2^2 = E\Big\{\int_0^T f^2(\omega, t)\,dt\Big\}.$$
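The Ito isometry can be illustrated by Monte Carlo. The sketch below (Python with NumPy; the integrand $f(\omega, t) = W(t)$ and the grid are arbitrary choices) compares sample averages of $X_T^2$ and of $\int_0^T f^2(\omega, t)\,dt$, both of which should be close to $T^2/2$ for this integrand.

```python
import numpy as np

rng = np.random.default_rng(6)
T, n, n_paths = 1.0, 500, 10000
dt = T / n

W = np.zeros((n_paths, n + 1))
W[:, 1:] = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n)), axis=1)
dW = np.diff(W, axis=1)

# f(omega, t) = W(t), evaluated at the left endpoint (the Ito/predictable convention)
X_T = np.sum(W[:, :-1] * dW, axis=1)                 # Ito sums approximating int_0^T W dW
lhs = np.mean(X_T ** 2)                              # estimate of E[X_T^2]
rhs = np.mean(np.sum(W[:, :-1] ** 2 * dt, axis=1))   # estimate of E[int_0^T W(t)^2 dt]
print(lhs, rhs, T ** 2 / 2)                          # both close to T^2/2
```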
We use the notation $\|f\|_2^2 = E\{\int_0^T f^2(\omega, t)\,dt\}$. If we now wish to define a stochastic integral for a general function $f \in H^2$, the method is fairly straightforward. First we approximate $f \in H^2$ by a sequence of step functions $f_n \in H_0^2$ such that $\|f - f_n\|_2^2 \to 0$. To construct the approximating sequence $f_n$, we can construct a mesh $t_i = \frac{i}{2^n}T$ for $i = 0, 1, \ldots, 2^n$ and define
$$f_n(\omega, t) = \sum_{i=0}^{2^n - 1} a_i(\omega)\, 1(t_i < t \le t_{i+1}) \tag{3.7}$$
with
$$a_i(\omega) = \frac{1}{t_i - t_{i-1}} \int_{t_{i-1}}^{t_i} f(\omega, s)\,ds,$$
the average of the function over the previous interval.

The definition of a stochastic integral for any $f \in H^2$ is now clear from this approximation. Choose a sequence $f_n \in H_0^2$ such that $\|f - f_n\|_2^2 \to 0$. Since the sequence $f_n$ is Cauchy, the isometry property (3.6) shows that the stochastic integrals $\int_0^T f_n\,dW$ also form a Cauchy sequence in $L^2(P)$. Since this space is complete (in the sense that Cauchy sequences converge to a random variable in the space), we can define $\int_0^T f\,dW$ to be the limit of the sequence $\int_0^T f_n\,dW$ as $n \to \infty$. Of course there is some technical work to be done; for example, we need to show that two approximating sequences lead to the same integral and that the Ito isometry (3.5) still holds for functions $f$ and $g$ in $H^2$. The details can be found in Steele (2001).

So far we have defined integrals $\int_0^T f\,dW$ for a fixed value of $T$, but how should we define the stochastic process $X_t = \int_0^t f\,dW$ for $t < T$? To do so we define a similar integral but with the function set to 0 for $s > t$:

Theorem A47 (Ito integral as a continuous martingale) For any $f$ in $H^2$, there exists a continuous martingale $X_t$ adapted to the standard Brownian filtration $H_t$ such that
$$X_t = \int_0^T f(\omega, s)\,1(s \le t)\,dW(s) \quad \text{for all } t \le T.$$
This continuous martingale we will denote by $\int_0^t f\,dW$.
So far we have defined a stochastic integral only for functions $f$ which are square integrable, i.e. which satisfy $E[\int_0^T f^2(\omega, t)\,dt] < \infty$, but this condition is too restrictive for some applications. A larger class of functions to which we can extend the notion of integral is the set of locally square integrable functions, $L^2_{LOC}$. The word "local" in martingale and stochastic integration theory is a bit of a misnomer: a property holds locally if there is a sequence of stopping times $\nu_n$, each of which is finite but with $\nu_n \to \infty$, such that the property holds when restricted to times $t \le \nu_n$.

Definition Let $L^2_{LOC}$ be the set of functions $f(\omega, t)$ on the product space $\Omega \times [0,T]$ such that

1. $f$ is measurable with respect to the product sigma-field on $\Omega \times [0,T]$.

2. For each $t \in [0,T]$, $f(\cdot, t)$ is measurable $H_t$ (in other words, the stochastic process $f(\cdot, t)$ is adapted to $H_t$).

3. $P(\int_0^T f^2(\omega, s)\,ds < \infty) = 1$.
Clearly this space includes $H^2$ and arbitrary continuous functions of a Brownian motion. For any function $f$ in $L^2_{LOC}$, it is possible to define a sequence of stopping times
$$\nu_n = \min\Big(T,\ \inf\Big\{s;\ \int_0^s f^2(\omega, t)\,dt \ge n\Big\}\Big)$$
which acts as a localizing sequence for $f$. Such a sequence has the properties:

1. $\nu_n$ is a non-decreasing sequence of stopping times.

2. $P[\nu_n = T \text{ for some } n] = 1$.

3. The functions $f_n(\omega, t) = f(\omega, t)\,1(t \le \nu_n) \in H^2$ for each $n$.
The purpose of the localizing sequence is essentially to provide approximations of a function $f$ in $L^2_{LOC}$ by functions $f(\omega, t)\,1(t \le \nu_n)$ which are in $H^2$ and therefore have a well-defined Ito integral as described above. The integral of $f$ is obtained by taking the limit as $n \to \infty$:
$$\int_0^t f(\omega, s)\,dW_s = \lim_{n \to \infty} \int_0^t f(\omega, s)\,1(s \le \nu_n)\,dW_s.$$

If $f$ happens to be a continuous non-random function on $[0,T]$, the integral $\int_0^T f(s)\,dW_s$ is a limit in probability of the Riemann sums
$$\sum_i f(s_i)\big(W_{t_{i+1}} - W_{t_i}\big)$$
for any $t_i \le s_i \le t_{i+1}$. The integral is the limit of sums of the independent normal zero-mean random variables $f(s_i)(W_{t_{i+1}} - W_{t_i})$ and is therefore normally distributed. In fact,
$$X_t = \int_0^t f(s)\,dW_s$$
is a zero mean Gaussian process with $\mathrm{Cov}(X_s, X_t) = \int_0^{\min(s,t)} f^2(u)\,du$. Such Gaussian processes are essentially time-changed Brownian motion processes according to the following:
Theorem A48 (Time change to Brownian motion) Suppose $f(s)$ is a continuous non-random function on $[0, \infty)$ such that
$$\int_0^\infty f^2(s)\,ds = \infty.$$
Define the function $t(u) = \int_0^u f^2(s)\,ds$ and its inverse function $\tau(t) = \inf\{u;\ t(u) \ge t\}$. Then
$$Y(t) = \int_0^{\tau(t)} f(s)\,dW_s$$
is a standard Brownian motion.
Definition (local martingale) The process $M(t)$ is a local martingale with respect to the filtration $H_t$ if there exists a non-decreasing sequence of stopping times $\tau_k \to \infty$ a.s. such that the processes
$$M_t^{(k)} = M(\min(t, \tau_k)) - M(0)$$
are martingales with respect to the same filtration.

In general, for $f \in L^2_{LOC}$, stochastic integrals are local martingales; more formally, there is a continuous local martingale equal (with probability one) to the stochastic integral $\int_0^t f(\omega, s)\,dW_s$ for all $t$. We do not usually distinguish among processes that differ on a set of probability zero, so we assume that $\int_0^t f(\omega, s)\,dW_s$ is a continuous local martingale. There is a famous converse to this result, the martingale representation theorem, which asserts that a martingale can be written as a stochastic integral. We assume that $H_t$ is the standard filtration of a Brownian motion $W_t$.

Theorem A49 (The martingale representation theorem) Let $X_t$ be an $H_t$-martingale with $E(X_T^2) < \infty$. Then there exists $\phi \in H^2$ such that
$$X_t = X_0 + \int_0^t \phi(\omega, s)\,dW_s \quad \text{for } 0 < t < T,$$
and this representation is unique.
3.4 Differential Notation and Ito's Formula

It is common to use the differential notation for stochastic differential equations, such as
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t,$$
to indicate (this is its only possible meaning) a stochastic process $X_t$ which is a solution of the equation written in integral form:
$$X_t = X_0 + \int_0^t \mu(s, X_s)\,ds + \int_0^t \sigma(s, X_s)\,dW_s.$$
We assume that the functions $\mu$ and $\sigma$ are such that these two integrals, one a regular Riemann integral and the other a stochastic integral, are well-defined, and we would like conditions on $\mu, \sigma$ under which existence and uniqueness of a solution is guaranteed. The following result is a standard one in this direction.

Theorem A50 (Existence and uniqueness of solutions of SDEs) Consider the stochastic differential equation
$$dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t \tag{3.8}$$
with initial condition $X_0 = x_0$. Suppose that for all $0 < t < T$,
$$|\mu(t,x) - \mu(t,y)|^2 + |\sigma(t,x) - \sigma(t,y)|^2 \le K|x - y|^2$$
and
$$|\mu(t,x)|^2 + |\sigma(t,x)|^2 \le K(1 + |x|^2).$$
Then there is a unique (with probability one) continuous adapted solution to (3.8), and it satisfies $\sup_{0 \le t \le T} E(X_t^2) < \infty$.

When these conditions fail, a solution may exist but fail to be unique; in such examples there are at least as many distinct solutions as there are possible values of a parameter $a > 0$ indexing them.

Now suppose a process $X_t$ is a solution of (3.8) and we are interested in a new stochastic process defined as a function of $X_t$, say $Y_t = f(t, X_t)$. Ito's formula is used to write $Y_t$ as the solution of a stochastic differential equation similar to (3.8). Suppose we attempt this using a Taylor series expansion, where we temporarily regard differentials such as $dt$ and $dX_t$ as small increments of time and of the process respectively (notation such as $\Delta t$, $\Delta W$ might have been preferable here). Let the partial derivatives of $f$ be denoted by
$$f_1(t,x) = \frac{\partial f}{\partial t}, \quad f_2(t,x) = \frac{\partial f}{\partial x}, \quad f_{22}(t,x) = \frac{\partial^2 f}{\partial x^2}, \quad \text{etc.}$$
Then Taylor's series expansion can be written
$$dY_t = f_1(t,X_t)\,dt + \tfrac12 f_{11}(t,X_t)(dt)^2 + \cdots + f_2(t,X_t)\,dX_t + \tfrac12 f_{22}(t,X_t)(dX_t)^2 + \cdots + f_{12}(t,X_t)(dt)(dX_t) + \cdots \tag{3.9}$$
and although there are infinitely many terms in this expansion, all but a few turn out to be negligible. The contribution of these terms is largely determined by some simple rules, often referred to as the rules of box algebra. In an expansion to terms of order $dt$, as $dt \to 0$, higher order terms such as $(dt)^j$ are negligible for $j > 1$. For example $(dt)^2 = o(dt)$ as $dt \to 0$ (intuitively this means that $(dt)^2$ goes to zero faster than $dt$ does). Similarly, cross terms such as $(dt)(dW_t)$ are negligible because the increment $dW_t$ is normally distributed with mean 0 and standard deviation $(dt)^{1/2}$, so $(dt)(dW_t)$ has standard deviation $(dt)^{3/2} = o(dt)$. We summarize some of these order arguments with the oversimplified rules below, where the symbol "$\sim$" is taken to mean "is of the order of, as $dt \to 0$."

Summary 1 (Rules of box algebra)
$$(dt)(dt) \sim 0, \qquad (dt)(dW_t) \sim 0, \qquad (dW_t)(dW_t) \sim dt.$$

From these we can obtain, for example,
$$(dX_t)(dX_t) = [\mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t][\mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t]$$
$$= \mu^2(t,X_t)(dt)^2 + 2\mu(t,X_t)\sigma(t,X_t)(dt)(dW_t) + \sigma^2(t,X_t)(dW_t)(dW_t) \sim \sigma^2(t,X_t)\,dt,$$
which indicates the order of the small increments in the process $X_t$. If we now use these rules to evaluate (3.9), we obtain
$$dY_t \sim f_1(t,X_t)\,dt + f_2(t,X_t)\,dX_t + \tfrac12 f_{22}(t,X_t)(dX_t)^2$$
$$\sim f_1(t,X_t)\,dt + f_2(t,X_t)\big(\mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t\big) + \tfrac12 f_{22}(t,X_t)\sigma^2(t,X_t)\,dt,$$
which is the differential expression of Ito's formula.

Theorem A51 (Ito's formula) Suppose $X_t$ satisfies $dX_t = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t$. Then for any function $f$ such that $f_1$ and $f_{22}$ are continuous, the process $f(t, X_t)$ satisfies the stochastic differential equation
$$df(t,X_t) = \Big\{\mu(t,X_t)f_2(t,X_t) + f_1(t,X_t) + \tfrac12 f_{22}(t,X_t)\sigma^2(t,X_t)\Big\}\,dt + f_2(t,X_t)\sigma(t,X_t)\,dW_t.$$

Example (Geometric Brownian motion) Suppose $X_t$ satisfies
$$dX_t = aX_t\,dt + \sigma X_t\,dW_t$$
and $f(t, X_t) = \ln(X_t)$. Then, substituting into Ito's formula, since $f_1 = 0$, $f_2 = X_t^{-1}$, $f_{22} = -X_t^{-2}$,
$$dY_t = X_t^{-1} a X_t\,dt - \tfrac12 X_t^{-2}\sigma^2 X_t^2\,dt + X_t^{-1}\sigma X_t\,dW_t = \Big(a - \frac{\sigma^2}{2}\Big)dt + \sigma\,dW_t,$$
and so $Y_t = \ln(X_t)$ is a Brownian motion with drift $a - \frac{\sigma^2}{2}$ and volatility $\sigma$.
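The geometric Brownian motion example can be checked numerically: the sketch below (Python with NumPy; the parameter values are arbitrary choices) simulates the SDE $dX_t = aX_t\,dt + \sigma X_t\,dW_t$ with a simple Euler discretization and compares it with the exact solution $X_t = x_0\exp\{(a - \sigma^2/2)t + \sigma W_t\}$ implied by the calculation above.

```python
import numpy as np

def gbm_paths(a=0.1, sigma=0.3, x0=1.0, T=1.0, n=1000, seed=7):
    """Euler scheme for dX = a X dt + sigma X dW, compared with the exact
    solution X_t = x0 * exp((a - sigma^2/2) t + sigma W_t)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    dW = np.sqrt(dt) * rng.standard_normal(n)
    W = np.concatenate(([0.0], np.cumsum(dW)))
    t = np.linspace(0.0, T, n + 1)

    X_euler = np.empty(n + 1)
    X_euler[0] = x0
    for i in range(n):
        X_euler[i + 1] = X_euler[i] + a * X_euler[i] * dt + sigma * X_euler[i] * dW[i]

    X_exact = x0 * np.exp((a - sigma ** 2 / 2) * t + sigma * W)
    return t, X_euler, X_exact

t, X_euler, X_exact = gbm_paths()
print(abs(X_euler[-1] - X_exact[-1]))   # the discrepancy shrinks as the grid is refined
```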
Example (Ornstein-Uhlenbeck process) Consider the stochastic process defined as
$$X_t = x_0 e^{-\alpha t} + \sigma e^{-\alpha t}\int_0^t e^{\alpha s}\,dW_s$$
for parameters $\alpha, \sigma > 0$. Using Ito's lemma,
$$dX_t = \Big(-\alpha x_0 e^{-\alpha t} - \alpha\sigma e^{-\alpha t}\int_0^t e^{\alpha s}\,dW_s\Big)dt + \sigma e^{-\alpha t}e^{\alpha t}\,dW_t = -\alpha X_t\,dt + \sigma\,dW_t,$$
with the initial condition $X_0 = x_0$. This process has Gaussian increments and covariance structure $\mathrm{cov}(X_s, X_t) = \sigma^2\int_0^s e^{-\alpha(s+t-2u)}\,du$ for $s < t$, and is called the Ornstein-Uhlenbeck process.

Example (Brownian bridge) Consider the process defined as
$$X_t = (1-t)\int_0^t \frac{1}{1-s}\,dW_s, \quad \text{for } 0 < t < 1,$$
subject to the initial condition $X_0 = 0$. Then
$$dX_t = -\int_0^t \frac{1}{1-s}\,dW_s\,dt + (1-t)\frac{1}{1-t}\,dW_t = -\frac{X_t}{1-t}\,dt + dW_t.$$
This process, satisfying $X_0 = X_1 = 0$ and
$$dX_t = -\frac{X_t}{1-t}\,dt + dW_t,$$
is called the Brownian bridge. It can also be constructed as $X_t = W_t - tW_1$, and the distribution of the Brownian bridge is identical to the conditional distribution of a standard Brownian motion $W_t$ given that $W_0 = 0$ and $W_1 = 0$. The Brownian bridge is a Gaussian process with covariance $\mathrm{cov}(X_s, X_t) = s(1-t)$ for $s < t$.
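A simple way to simulate the Brownian bridge is through the representation $X_t = W_t - tW_1$ quoted above. The sketch below (Python with NumPy; the grid size and the chosen time points are arbitrary) generates many bridge paths this way and compares a simulated covariance with $s(1-t)$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, n_paths = 500, 10000
dt = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)

W = np.zeros((n_paths, n + 1))
W[:, 1:] = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n)), axis=1)
X = W - np.outer(W[:, -1], t)           # Brownian bridge: X_t = W_t - t W_1

i, j = n // 4, n // 2                   # s = 0.25, t = 0.5
print(np.mean(X[:, i] * X[:, j]))       # simulated cov(X_s, X_t)
print(t[i] * (1 - t[j]))                # theoretical s(1 - t) = 0.125
```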
Theorem A52 (Ito's formula for two processes) If
$$dX_t = a(t,X_t)\,dt + b(t,X_t)\,dW_t, \qquad dY_t = \alpha(t,Y_t)\,dt + \beta(t,Y_t)\,dW_t,$$
then
$$df(X_t, Y_t) = f_1(X_t,Y_t)\,dX_t + f_2(X_t,Y_t)\,dY_t + \tfrac12 f_{11}(X_t,Y_t)b^2\,dt + \tfrac12 f_{22}(X_t,Y_t)\beta^2\,dt + f_{12}(X_t,Y_t)b\beta\,dt.$$

There is an immediate application of this result to obtain the product rule for differentiation of diffusion processes. If we put $f(x,y) = xy$ above, we obtain
$$d(X_t Y_t) = Y_t\,dX_t + X_t\,dY_t + b\beta\,dt.$$
This product rule reduces to the usual one when either $b$ or $\beta$ is identically 0.
3.5 Quadratic Variation

One way of defining the variation of a process $X_t$ is to choose a partition $\pi = \{0 = t_0 \le t_1 \le \cdots \le t_n = t\}$ and then define $Q_\pi(X_t) = \sum_i (X_{t_i} - X_{t_{i-1}})^2$. For a diffusion process
$$dX_t = \mu(t,X_t)\,dt + \sigma(t,X_t)\,dW_t$$
satisfying standard conditions, as the mesh size $\max_i |t_i - t_{i-1}|$ converges to zero we have $Q_\pi(X_t) \to \int_0^t \sigma^2(s,X_s)\,ds$ in probability. This limit $\int_0^t \sigma^2(s,X_s)\,ds$ is the process that we earlier denoted $\langle X, X\rangle_t$. For brevity, the redundancy in the notation is usually removed and the process $\langle X, X\rangle_t$ is denoted $\langle X\rangle_t$. For diffusion processes, variation of lower order, such as $\sum_i |X_{t_i} - X_{t_{i-1}}|$, approaches infinity, and variation of higher order, e.g. $\sum_i (X_{t_i} - X_{t_{i-1}})^4$, converges to zero as the mesh size shrinks. We will return to the definition of the predictable covariation process $\langle X, Y\rangle_t$ in a more general setting shortly.

The Stochastic Exponential

Suppose $X_t$ is a diffusion process and consider the stochastic differential equation
$$dY_t = Y_t\,dX_t \tag{3.10}$$
with initial condition $Y_0 = 1$. If $X_t$ were an ordinary differentiable function, we could solve this equation by integrating both sides of
$$\frac{dY_t}{Y_t} = dX_t$$
to obtain the exponential function
$$Y_t = c\exp(X_t), \tag{3.11}$$
where $c$ is a constant of integration. We might try to work backwards from (3.11) to see if this is the correct solution in the general case in which $X_t$ is a diffusion. Letting $f(X_t) = \exp(X_t)$ and using Ito's formula,
$$df(X_t) = \Big\{\exp(X_t)\mu(t,X_t) + \tfrac12\exp(X_t)\sigma^2(t,X_t)\Big\}\,dt + \exp(X_t)\sigma(t,X_t)\,dW_t \ne f(X_t)\,dX_t,$$
so this solution is not quite right. There is, however, a minor fix of the exponential function which does provide a solution. Suppose we try a solution of the form $Y_t = f(t, X_t) = \exp(X_t + h(t))$, where $h(t)$ is some differentiable stochastic process. Then, again using Ito's lemma, since $f_1(t,X_t) = Y_t h'(t)$ and $f_2(t,X_t) = f_{22}(t,X_t) = Y_t$,
$$dY_t = f_1(t,X_t)\,dt + f_2(t,X_t)\,dX_t + \tfrac12 f_{22}(t,X_t)\sigma^2(t,X_t)\,dt = Y_t\Big\{h'(t) + \mu(t,X_t) + \tfrac12\sigma^2(t,X_t)\Big\}\,dt + Y_t\sigma(t,X_t)\,dW_t,$$
and if we choose just the right function $h$, so that $h'(t) = -\tfrac12\sigma^2(t,X_t)$, we get a solution to (3.10). Since $h(t) = -\tfrac12\int_0^t\sigma^2(s,X_s)\,ds$, the solution is
$$Y_t = \exp\Big(X_t - \tfrac12\int_0^t\sigma^2(s,X_s)\,ds\Big) = \exp\Big\{X_t - \tfrac12\langle X\rangle_t\Big\}.$$
We may denote this solution $Y = \mathcal{E}(X)$. We saw earlier that $\mathcal{E}(\alpha W)$ is a martingale for $W$ a standard Brownian motion and $\alpha$ real. Since the solution to this equation is an exponential in the ordinary calculus, the term "stochastic exponential" seems justified. The "extra" term in the exponent, $\tfrac12\langle X\rangle_t$, is a consequence of the infinite local variation of the process $X_t$. One of the most common conditions for $\mathcal{E}(X)$ to be a martingale is:

Novikov's condition: Suppose for $g \in L^2_{LOC}$,
$$E\exp\Big\{\tfrac12\int_0^T g^2(s, X_s)\,ds\Big\} < \infty.$$
Then $M_t = \mathcal{E}\big(\int_0^t g(\omega, s)\,dW_s\big)$ is a martingale.
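For the special case $X_t = \alpha W_t$ we have $\langle X\rangle_t = \alpha^2 t$, so $\mathcal{E}(X)_t = \exp(\alpha W_t - \alpha^2 t/2)$. The sketch below (Python with NumPy; the parameter values are arbitrary choices) checks numerically that the discrete quadratic variation of $X$ approximates $\alpha^2 T$ and that the sample mean of $\mathcal{E}(X)_t$ stays near 1, consistent with the martingale property (Novikov's condition holds trivially for constant $\alpha$).

```python
import numpy as np

rng = np.random.default_rng(9)
alpha, T, n, n_paths = 1.5, 1.0, 500, 5000
dt = T / n
t = np.linspace(0.0, T, n + 1)

W = np.zeros((n_paths, n + 1))
W[:, 1:] = np.cumsum(np.sqrt(dt) * rng.standard_normal((n_paths, n)), axis=1)
X = alpha * W

# discrete quadratic variation of X approximates <X>_T = alpha^2 T
QV = np.sum(np.diff(X, axis=1) ** 2, axis=1)
print(np.mean(QV), alpha ** 2 * T)

# stochastic exponential E(X)_t = exp(X_t - <X>_t / 2); its sample mean stays near 1
M = np.exp(X - 0.5 * alpha ** 2 * t)
print(np.mean(M[:, n // 2]), np.mean(M[:, -1]))
```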
3.6 Semimartingales

Suppose $M_t$ is a continuous martingale adapted to a filtration $H_t$ and $A_t$ is a continuous adapted process that is nondecreasing. It is easy to see that the sum $A_t + M_t$ is a submartingale. But can this argument be reversed? If we are given a submartingale $X_t$, is it possible to find a nondecreasing process $A_t$ and a martingale $M_t$ such that $X_t = A_t + M_t$? The fundamental result in this direction is the Doob-Meyer decomposition.
Theorem A53 (Doob-Meyer decomposition) Let $X$ be a continuous submartingale adapted to a filtration $H_t$. Then $X$ can be uniquely written as $X_t = A_t + M_t$, where $M_t$ is a local martingale and $A_t$ is an adapted nondecreasing process such that $A_0 = 0$.

Recall that if $M_t$ is a square integrable martingale, then $M_t^2$ is a submartingale (this follows from Jensen's inequality). According to the Doob-Meyer decomposition, we can then decompose $M_t^2$ into two components, one a martingale and the other a non-decreasing continuous adapted process, which we call the (predictable) quadratic variation process $\langle M\rangle_t$. In other words, $M_t^2 - \langle M\rangle_t$ is a continuous martingale. We may take this as the more general definition of the process $\langle M\rangle$, met earlier for processes obtained as stochastic integrals. For example, suppose
$$X_t(\omega) = \int_0^t f(\omega, t)\,dW(t)$$
where $f \in H^2$. Then with $\langle X\rangle_t = \int_0^t f^2(\omega, t)\,dt$ and $M_t = X_t^2 - \langle X\rangle_t$, notice that for $s < t$,
$$E[M_t - M_s \mid H_s] = E\Big[\Big\{\int_s^t f(\omega, u)\,dW(u)\Big\}^2 - \int_s^t f^2(\omega, u)\,du \,\Big|\, H_s\Big] = 0$$
by (3.5). This means that our earlier definition of the process $\langle X\rangle$ coincides with the current one. For two martingales $X, Y$, we can define the predictable covariation process $\langle X, Y\rangle$ by
$$\langle X, Y\rangle_t = \tfrac{1}{4}\big\{\langle X+Y\rangle_t - \langle X-Y\rangle_t\big\}$$
then the predictable covariation is < X, Y >t (ω) =
Z
and this also follows from the Ito isometry.
t
f (ω, t)g(ω, t)dt 0
78 Definition (semimartingale) A continuous adapted process Xt is a semimartingale if it can be written as the sum Xt = At + Mt of a continuous adapted process At of locally bounded variation, and a continuous local martingale Mt . The stochastic integral for square integrable martingales can be extended to the class of semimartingales. Let Xt = At + Mt be a continuous semimartingale. We define Z Z Z h(t)dXt =
h(t)dAt +
h(t)dMt .
(3.12)
The first integral on the right hand side of (3.12) is understood to be a LebesgueStieltjes integral while the second is an Ito stochastic integral. There are a number of details that need to be checked with this definition, for example whether when we decompose a semimartingale into the two components, one with bounded variation and one a local martingale in two different ways (this decomposition is not unique), the same integral is obtained.
3.7 Girsanov’s Theorem Consider the Brownian motion defined by dXt = µdt + dWt with µ a constant drift parameter and denote by Eµ (.) the expectation when the drift is µ. Let fµ (x) be the N (µ, T ) probability density function. Then we can compute expectations under non-zero drift µ using a Brownian motion which has drift zero since Eµ (g(XT )) = E0 {g(XT )MT (X)} where
1 Mt (X) = E(µX) = exp{µXt − µ2 t}. 2
This is easy to check since the stochastic exponential MT (X) happens to be the ratio of the N (µT, T ) probability density function to the N (0, T ) density. The implications are many and useful. We can for example calculate moments or simulate under the condition µ = 0 and apply the results to the case µ 6= 0. By a similar calculation, for a bounded Borel measurable function g(Xt1 , ..., Xtn ),where 0≤ t1 ≤ ... ≤ tn , Eµ {g(Xt1 , ..., Xtn )} = E0 {g(Xt1 , ..., Xtn )Mtn (X)}. Theorem A54 (Girsanov’s Theorem for Brownian Motion) Consider a Brownian motion with drift µ defined by Xt = µt + Wt .
79 Then for any bounded measurable function g defined on the space C[0, T ] of the paths we have Eµ [g(X)] = E0 [g(X)MT (X)] where again MT (X) is the exponential martingale E(µX) = exp(µXT − 12 µ2 T ). Note that if we let P0 , Pµ denote the measures on the function space corresponding to drift 0 and µ respectively, we can formally write Z Z dPµ dP0 Eµ [g(X)] = g(x)dPµ = g(x) dP0 dPµ = E0 {g(X) } dP0 which means that MT (X) plays the role of a likelihood ratio dPµ dP0 for a restriction of the process to the interval [0, T ]. If g(X) only depended on the process up to time t < T then, from the martingale property of Mt (X), Eµ [g(X)] = E0 [g(X)MT (X)] = E0 {E[g(X)MT (X)|Ht ]} = E0 {g(X)Mt (X)} which shows that Mt (X) plays the role of a likelihood ratio for a restriction of the process to the interval [0, t]. We can argue for the form of Mt (X) and show that it “should” be a martingale under µ = 0 by considering the limit of the ratio of the finite-dimensional probability density functions like fµ (xt1 , ..., xtn ) f0 (xt1 , ..., xtn ) where fµ denotes the joint probability density function of Xt1 , Xt2 , ..., Xtn for t1 < t2 < ... < tn = T. These likelihood ratios are discrete-time martingales under P0 . For a more general diffusion, provided that the diffusion terms are identical, we can still express the Radon-Nikodym derivative as a stochastic exponential. Theorem A55 (Girsanov’s Theorem) Suppose P is the measure on C[0, T ] induced by X0 = 0, and dXt = µ(ω, t)dt + σ(ω, t)dWt under P. Assume the standard conditions so that the corresponding stochastic integrals are well-defined. Assume that the function θ(ω, t) =
µ(ω, t) − ν(ω, t) σ(ω, t)
80 is bounded. Then the stochastic exponential Z t Z t Z 1 t 2 Mt = E(− θ(ω, s)dWs ) = exp{− θ(ω, s)dWs − θ (ω, s)ds} 2 0 0 0 is a martingale under P . Suppose we define a measure Q on C[0, T ] by dQ = MT , dP or, equivalently for measurable subsets A, Q(A) = EP [1(A)MT ]. Then under the measure Q, the process Wt0 defined by Wt0 = Wt −
Z
t
θ(ω, s)dWs
0
is a standard Brownian motion and Xt has the representation dXt = ν(ω, t)dt + σ(ω, t)dWt0