Lecture 4: Introduction to stochastic processes and stochastic calculus

Cédric Archambeau
Centre for Computational Statistics and Machine Learning
Department of Computer Science, University College London
[email protected]

Advanced Topics in Machine Learning (MSc in Intelligent Systems) January 2008

Discrete-time vs continuous-time?

Real systems are continuous: can we gain something by modelling them in continuous time? Often a physical model is available: can we exploit this information?

[Figure: a continuous-time signal sampled over a time window.]

Outline

Some definitions
Stochastic processes
Lévy processes
Markov processes
Diffusion processes
Itô's formula
Variational inference for diffusion processes

Elements of probability theory

A collection A of subsets of the sample space Ω is a σ-algebra if
A contains Ω: Ω ∈ A.
A is closed under complementation: Ω\A ∈ A if A ∈ A.
A is closed under countable unions: ∪_n An ∈ A if A1, A2, ..., An, ... ∈ A.
This implies that A is closed under countable intersections. We say that (Ω, A) is a measurable space if Ω is a non-empty set and A is a σ-algebra of Ω.

Elements of probability theory (continued)

A measure H(·) on (Ω, A) is a nonnegative valued set function on A satisfying
H(∅) = 0,
H(∪_n An) = Σ_n H(An) if Ai ∩ Aj = ∅ for i ≠ j,
for any sequence A1, A2, ..., An, ... ∈ A. If A ⊆ B, it follows that H(A) ≤ H(B). If H(Ω) is finite, i.e. 0 ≤ H(Ω) < ∞, then H(·) can be normalized to obtain a probability measure P(·):
P(A) = H(A)/H(Ω), P(A) ∈ [0, 1],
for all A ∈ A. We say that (Ω, A, P) is a probability space if P is a probability measure on the measurable space (Ω, A).

Elements of probability theory (continued)

Let (Ω1, A1) and (Ω2, A2) be two measurable spaces. The function f : Ω1 → Ω2 is measurable if the pre-image of any A2 ∈ A2 is in A1:
f⁻¹(A2) = {ω1 ∈ Ω1 : f(ω1) ∈ A2} ∈ A1, for all A2 ∈ A2.
Let (Ω, A, P) be a probability space. We call the measurable function X : Ω → R^D a continuous random variable.

Stochastic process

Let T be the time index set and (Ω, A, P) the underlying probability space. The function X : T × Ω → R^D is a stochastic process, such that
Xt = X(t, ·) : Ω → R^D is a random variable for each t ∈ T,
Xω = X(·, ω) : T → R^D is a realization or sample path for each ω ∈ Ω.
When considering continuous-time systems, T will often be equal to R+. In practice, we call a collection of random variables X = {Xt, t ≥ 0} defined on a common probability space a stochastic process. We can think of Xt as the position of a particle at time t, changing as t varies. The particle moves continuously, or jumps at some t ≥ 0:
ΔXt = Xt+ − Xt− = lim_{ε↓0} Xt+ε − lim_{ε↓0} Xt−ε.
In general, we will assume that the process is right-continuous, i.e. Xt+ = Xt.

Independence

Let {Y1, ..., Yn} be a collection of random variables, with Yi ∈ R^{Di}. They are independent if
P(Y1 ∈ A1, ..., Yn ∈ An) = ∏_{i=1}^{n} P(Yi ∈ Ai),
for all Ai ⊂ R^{Di}. An infinite collection is said to be independent if every finite subcollection is independent.
A stochastic process X = {Xt, t ≥ 0} has independent increments if the random variables Xt0, Xt1 − Xt0, ..., Xtn − Xtn−1 are independent for all n ≥ 1 and t0 < t1 < ... < tn.

Stationarity

A stochastic process is (strictly) stationary if all the joint marginals are invariant under a time displacement h > 0, that is,
p(Xt1+h, Xt2+h, ..., Xtn+h) = p(Xt1, Xt2, ..., Xtn),
for all t1, ..., tn. The stochastic process X = {Xt, t ≥ 0} is wide-sense stationary if there exist a constant m ∈ R^D and a function C : R+ → R^{D×D}, such that
μt ≡ ⟨Xt⟩ = m,
Σt ≡ ⟨(Xt − μt)(Xt − μt)ᵀ⟩ = C(0),
Vs,t ≡ ⟨(Xt − μt)(Xs − μs)ᵀ⟩ = C(t − s),
for all s, t ∈ R+. We call Vs,t the two-time covariance. The stochastic process X = {Xt, t ≥ 0} has stationary increments if Xt+s − Xt has the same distribution as Xs for all s, t ≥ 0.

Example: Poisson process

The Poisson process with intensity parameter λ > 0 is a continuous-time stochastic process X = {Xt, t ∈ R+} with independent, stationary increments:
Xt − Xs ∼ P(λ(t − s)), X0 = 0,
for all 0 ≤ s ≤ t. The Poisson process is not wide-sense stationary:
μt = λt, σt² = λt, vs,t = λ min{s, t}.

[Figure: sample path of a Poisson process; Xt jumps by unit steps as t increases.]

The Poisson process is right-continuous and, in fact, it is a Lévy process (see later) consisting only of jumps.

The Poisson distribution (or law of rare events) is defined as
n ∼ P(λ), P(n) = (λⁿ/n!) e^{−λ}, n ∈ N,
where λ > 0. The mean and the variance are given by
⟨n⟩ = λ, ⟨(n − ⟨n⟩)²⟩ = λ.
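To make the jump structure concrete, here is a minimal simulation sketch (not from the original slides; the function name, seed and parameter values are my own). It relies on the standard fact that the inter-arrival times of a Poisson process with intensity λ are iid Exponential(λ).

```python
import numpy as np

def poisson_jump_times(lam, t_max, rng):
    """Jump times of a Poisson process with intensity lam on [0, t_max].

    Inter-arrival times are iid Exponential(lam); accumulate them
    until the horizon is exceeded. X_t is the number of jumps <= t.
    """
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)
        if t > t_max:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(0)
jumps = poisson_jump_times(lam=0.2, t_max=50.0, rng=rng)
# Sanity check against the slide: <X_t> = lam * t and var(X_t) = lam * t.
print("X_50 =", len(jumps), "; expected mean:", 0.2 * 50.0)
```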

Lévy process

A stochastic process X = {Xt, t ≥ 0} is a Lévy process if
The increments on disjoint time intervals are independent.
The increments are stationary: increments over equally long time intervals are identically distributed.
The sample paths are right-continuous with left limits, i.e. lim_{ε↓0} Xt+ε = Xt and lim_{ε↓0} Xt−ε = Xt−.
Lévy processes are usually described in terms of the Lévy–Khintchine representation. A Lévy process can have three types of components: a deterministic drift, a random diffusion component and a random jump component. It is implicitly assumed that a Lévy process starts at X0 = 0 with probability 1.
Applications:
Financial stock prices: Black–Scholes
Population models: birth-and-death processes
...

Interpretation of Lévy processes

Lévy processes are the continuous-time equivalent of random walks. A random walk over n time units is a sum of n independent and identically distributed random variables:
Sn = Σ_{k=1}^{n} Δxk,
where the Δxk are iid random variables. Random walks have independent and stationary increments.

[Figure: example of a Gaussian random walk with S0 = 1.]
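The figure is easy to reproduce with a short sketch (my own; the step variance, number of steps and seed are assumptions), where the walk is a cumulative sum of iid Gaussian steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100                      # number of steps on the unit time interval
dt = 1.0 / n
# iid Gaussian steps Delta_x ~ N(0, dt): independent, stationary increments.
steps = rng.normal(0.0, np.sqrt(dt), size=n)
S = 1.0 + np.cumsum(steps)   # Gaussian random walk started at S_0 = 1
print("S at t = 1:", S[-1])
```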

Interpretation of Lévy processes (continued)

A random variable X has an infinitely divisible distribution if for every m ≥ 1 we can write
X ∼ Σ_{j=1}^{m} Xj^{(m)},
where the {Xj^{(m)}}_j are iid.
For example, the Normal, Poisson and Gamma distributions are infinitely divisible. The Bernoulli distribution is not infinitely divisible. Lévy processes are infinitely divisible, since the increments over non-overlapping time intervals are independent and stationary:
Xs = Σ_{j=1}^{m} (X_{js/m} − X_{(j−1)s/m}),
for all m ≥ 1. In fact, it can be shown that there is a Lévy process for each infinitely divisible probability distribution.
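A quick numerical illustration of infinite divisibility for the Normal distribution (my own check, not part of the slides): summing m iid N(0, s/m) parts reproduces the variance of N(0, s) for any m.

```python
import numpy as np

rng = np.random.default_rng(2)
s, m, n_samples = 2.0, 7, 100_000
# Decompose X ~ N(0, s) into m iid parts X_j^(m) ~ N(0, s/m).
parts = rng.normal(0.0, np.sqrt(s / m), size=(n_samples, m))
X = parts.sum(axis=1)
print("sample variance:", X.var(), "; target:", s)  # both close to 2.0
```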

Markov process

The stochastic process X = {Xt, t ≥ 0} is a (continuous-time, continuous-state) Markov process if
p(Xt | Xs) = p(Xt | Xr1, ..., Xrn, Xs),
for all 0 ≤ r1 ≤ ... ≤ rn ≤ s ≤ t. We call p(Xt | Xs) the transition density. It can be time dependent. The Chapman–Kolmogorov equation follows from the Markov property:
p(Xt | Xs) = ∫ p(Xt | Xτ) p(Xτ | Xs) dXτ,
for all s ≤ τ ≤ t. The Chapman–Kolmogorov equation already played an important role in (discrete-time) dynamical systems. Lévy processes satisfy the Markov property.
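The Chapman–Kolmogorov equation can be checked by Monte Carlo for a process whose transition density is known. A sketch of my own, using the Wiener transition density p(Xt | Xs) = N(Xs, t − s) introduced later: sampling s → τ → t in two steps should reproduce the one-step marginal.

```python
import numpy as np

rng = np.random.default_rng(3)
xs, s, tau, t, n = 0.5, 0.0, 0.4, 1.0, 200_000
# Two-step sampling: X_tau | X_s = xs, then X_t | X_tau.
x_tau = rng.normal(xs, np.sqrt(tau - s), size=n)
x_t = rng.normal(x_tau, np.sqrt(t - tau))
# Chapman-Kolmogorov: marginalizing X_tau gives p(X_t | X_s) = N(xs, t - s).
print("mean:", x_t.mean(), "; var:", x_t.var())  # close to 0.5 and 1.0
```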

Markov process (continued)

A Markov process is homogeneous if its transition density depends only on the time difference:
p(Xt+h | Xt) = p(Xh | X0),
for all h ≥ 0. The Poisson process is a homogeneous discrete-state Markov process:
P(nt+h | nt) = P(λ(t + h − t)) = P(nh | n0).
Let f(·) be a bounded function. A Markov process is ergodic if the time average coincides with the spatial average, i.e.
lim_{T→∞} (1/T) ∫₀ᵀ f(Xt) dt = ⟨f⟩,
where the expectation is taken with respect to the stationary probability density.

Martingale (fair game)

A martingale is a stochastic process such that the expectation of some future value given the past and the present is the same as if given only the present:
⟨Xt | {Xτ, 0 ≤ τ ≤ s}⟩ = Xs, for all t ≥ s.
More formally, let (Ω, A, P) be a probability space and {At, t ≥ 0} a filtration¹ of A. The stochastic process X = {Xt, t ≥ 0} is a martingale if
⟨Xt | As⟩ = Xs, with probability 1,
for all 0 ≤ s < t. When the process Xt satisfies the Markov property, we have ⟨Xt | As⟩ = ⟨Xt | Xs⟩.

¹A filtration {At, t ≥ 0} of A is an increasing family of σ-algebras on the measurable space (Ω, A), that is, As ⊆ At ⊆ A for any 0 ≤ s ≤ t. This means that more information becomes available as time increases.

Diffusion process

A Markov process X = {Xt, t ≥ 0} is a diffusion process if the following limits exist for all ε > 0:
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|>ε} p(Xt | Xs) dXt = 0,
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|<ε} (Xt − Xs) p(Xt | Xs) dXt = α(s, Xs),
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|<ε} (Xt − Xs)(Xt − Xs)ᵀ p(Xt | Xs) dXt = β(s, Xs)βᵀ(s, Xs).
The drift α is the instantaneous rate of change of the mean of the process, given that Xs = x at time s. The diffusion matrix D = ββᵀ is the instantaneous rate of change of the squared fluctuations of the process, given that Xs = x at time s.

Diffusion process (continued)

Diffusion processes are almost surely continuous functions of time, but they need not be differentiable. Diffusion processes are Lévy processes (without the jump component). The time evolution of the transition density p(y, t | x, s), with s ≤ t, given some initial condition or target constraint, was described by Kolmogorov:
The forward evolution of the transition density is given by the Kolmogorov forward equation (also known as the Fokker–Planck equation):
∂p/∂t = −Σ_i ∂/∂yi {αi(t, y) p} + (1/2) Σ_{i,j} ∂²/∂yi∂yj {Dij(t, y) p},
for a fixed initial state (s, x).
The backward evolution of the transition density is given by the Kolmogorov backward equation (or adjoint equation):
−∂p/∂s = Σ_i αi(s, x) ∂p/∂xi + (1/2) Σ_{i,j} Dij(s, x) ∂²p/∂xi∂xj,
for a fixed final state (t, y).
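For intuition, the forward equation can be integrated numerically. Below is a small explicit finite-difference sketch (entirely my own choices of grid, time step and initial density) for zero drift and constant scalar D, where the exact solution spreads like a Gaussian with variance growing as D·t; the explicit scheme is only stable when 0.5·D·dt/dy² ≤ 0.5.

```python
import numpy as np

# Solve dp/dt = 0.5 * D * d^2p/dy^2 (zero drift, constant D) explicitly.
D, dy, dt, t_end = 1.0, 0.05, 0.001, 0.5
y = np.arange(-5.0, 5.0 + dy, dy)
v0 = 0.01                                  # small initial variance (near-delta)
p = np.exp(-y**2 / (2 * v0)) / np.sqrt(2 * np.pi * v0)
for _ in range(int(t_end / dt)):
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dy**2
    p = p + dt * 0.5 * D * lap
    p[0] = p[-1] = 0.0                     # absorbing far-away boundaries
var = np.sum(y**2 * p) / np.sum(p)
print("variance:", var, "; exact:", v0 + D * t_end)
```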

Wiener process

The Wiener process was proposed by Wiener as a mathematical description of Brownian motion. It characterizes the erratic motion (i.e. diffusion) of a pollen grain on a water surface due to its being continually bombarded by water molecules. It can be viewed as a scaling limit of a random walk on any finite time interval (Donsker's theorem). It is also commonly used to model stock market fluctuations.

[Figure: sample path of a standard Wiener process Wt on [0, 1].]

Wiener process (continued)

A standard Wiener process is a continuous-time Gaussian Markov process W = {Wt, t ≥ 0} with (non-overlapping) independent increments for which
W0 = 0,
the sample path Wω is almost surely continuous for all ω ∈ Ω,
Wt − Ws ∼ N(0, t − s), for all 0 ≤ s ≤ t.
The sample paths Wω are almost surely nowhere differentiable. The expectation ⟨Wt⟩ is equal to 0 for all t. W is not wide-sense stationary, as vs,t = min{s, t}, but it has stationary increments. W is homogeneous, since p(Wt+h | Wt) = p(Wh | W0). W is a diffusion process with drift α = 0 and diffusion coefficient β = 1, such that Kolmogorov's forward and backward equations are given by
∂p/∂t − (1/2) ∂²p/∂y² = 0, ∂p/∂s + (1/2) ∂²p/∂x² = 0.
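A minimal simulation sketch of these properties (mine; path count, grid and seed are arbitrary choices): a Wiener path is a cumulative sum of independent N(0, dt) increments.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, T = 5000, 1000, 1.0
dt = T / n_steps
# Independent increments W_{t+dt} - W_t ~ N(0, dt), with W_0 = 0.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
# Empirical checks: <W_t> = 0, var(W_t) = t, v_{s,t} = min{s, t}.
s_idx, t_idx = n_steps // 2, n_steps       # s = 0.5, t = 1.0
print("mean:", W[:, t_idx].mean())                   # ~ 0
print("var :", W[:, t_idx].var())                    # ~ 1.0
print("cov :", np.mean(W[:, s_idx] * W[:, t_idx]))   # ~ min{0.5, 1} = 0.5
```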

Informal proof that a Wiener process is not differentiable:

Consider the partition of a bounded time interval [s, t] into subintervals [τ_k^(n), τ_{k+1}^(n)] of equal length, such that
τ_k^(n) = s + k (t − s)/2ⁿ, k = 0, 1, ..., 2ⁿ − 1.
Consider a sample path Wω(τ) of the standard Wiener process W = {Wτ, τ ∈ [s, t]}. It can be shown (Kloeden and Platen, p. 72) that
lim_{n→∞} Σ_{k=0}^{2ⁿ−1} (W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω))² = t − s.
Hence, taking the limit superior, i.e. the supremum² of all the limit points, we get
t − s ≤ lim sup_{n→∞} max_k |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| Σ_{k=0}^{2ⁿ−1} |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)|.
From the sample path continuity, we have max_k |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| → 0 with probability 1 as n → ∞, and therefore
Σ_{k=0}^{2ⁿ−1} |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| → ∞.
As a consequence, the sample paths almost surely do not have bounded variation on [s, t] and cannot be differentiated.

²For S ⊆ T, the supremum of S is the least element of T that is greater than or equal to all elements of S.
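The two limits are easy to observe numerically (a sketch under my own discretization choices; it draws a fresh path for each n rather than refining a single path, which suffices for illustration): the sum of squared increments settles near t − s, while the sum of absolute increments keeps growing.

```python
import numpy as np

rng = np.random.default_rng(5)
s, t = 0.0, 1.0
for n in (6, 10, 14):
    m = 2**n                                # dyadic partition of [s, t]
    dW = rng.normal(0.0, np.sqrt((t - s) / m), size=m)
    print(f"n={n:2d}  sum dW^2 = {np.sum(dW**2):.4f}"   # -> t - s = 1
          f"   sum |dW| = {np.sum(np.abs(dW)):8.2f}")   # grows like sqrt(2m/pi)
```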

Let s ≤ t. The two-time covariance is then given by
vs,t = ⟨Wt Ws⟩ = ⟨(Wt − Ws + Ws) Ws⟩ = ⟨Wt − Ws⟩⟨Ws⟩ + ⟨Ws²⟩ = 0 · 0 + s.
The transition density of W is given by p(Wt | Ws) = N(Ws, t − s). Hence, the drift and the diffusion coefficient of a standard Wiener process are
α(s, Ws) = lim_{t↓s} (⟨Wt⟩ − Ws)/(t − s) = 0,
β²(s, Ws) = lim_{t↓s} (⟨Wt²⟩ − 2⟨Wt⟩Ws + Ws²)/(t − s) = lim_{t↓s} (⟨Wt²⟩ − Ws²)/(t − s) = lim_{t↓s} (t − s)/(t − s) = 1.
The same results are found by directly differentiating the transition density, as required by Kolmogorov's equations.

Brownian bridge

A Brownian bridge is a Wiener process pinned at both ends, i.e. the sample paths all go through an initial state at time t = 0 and a given state at a later time t = T. Let W = {Wt, t ≥ 0} be a standard Wiener process. The Brownian bridge B(x0, yT) = {Bt(x0, yT), 0 ≤ t ≤ T} is a stochastic process, such that
Bt(x0, yT) = x0 + Wt − (t/T)(x0 + WT − yT).
A Brownian bridge Bt(x0, yT) is a Gaussian process with mean function and two-time covariance given by
⟨Bt⟩ = x0 − (t/T)(x0 − yT),
vs,t = min{s, t} − st/T,
for 0 ≤ s, t ≤ T.
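The pinning construction translates directly into code (a sketch; endpoints, horizon and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 5.0, 1000
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])   # standard Wiener path
x0, yT = 1.0, -2.0                           # pinned initial and final states
# B_t = x0 + W_t - (t/T) * (x0 + W_T - yT), so B_0 = x0 and B_T = yT.
B = x0 + W - (t / T) * (x0 + W[-1] - yT)
print("B_0 =", B[0], "; B_T =", B[-1])       # exactly x0 and yT
```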

Brownian bridge (continued)

[Figure: sample path examples of a Brownian bridge for different initial and final states.]

Diffusion processes revisited

Let W = {Wt, t ≥ 0} be a standard Wiener process. The time evolution of a diffusion process can be described by a stochastic differential equation (SDE):
dXt = α(t, Xt) dt + β(t, Xt) dWt, dWt ∼ N(0, dt I_D),
where X = {Xt, t ≥ 0} is a stochastic process with drift α ∈ R^D and diffusion coefficient β ∈ R^{D×D}. This representation corresponds to the state-space representation of discrete-time dynamical systems. An SDE is interpreted as a (stochastic) integral equation along a sample path ω, that is,
X(t, ω) − X(s, ω) = ∫ₛᵗ α(τ, X(τ, ω)) dτ + ∫ₛᵗ β(τ, X(τ, ω)) (dW(τ, ω)/dτ) dτ.
This representation is symbolic, as a Wiener process is almost surely not differentiable; the difference quotient corresponds to Gaussian white noise:
(W(τ + h, ω) − W(τ, ω))/h ∼ N(0, 1/h),
whose variance diverges as h → 0. This means that Gaussian white noise cannot be realized physically!
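The SDE notation suggests the simplest numerical integrator, the Euler–Maruyama scheme, which replaces dt and dWt by finite increments. A generic one-dimensional sketch (the scheme is standard; the function names and the OU example parameters are my own):

```python
import numpy as np

def euler_maruyama(alpha, beta, x0, T, n_steps, rng):
    """Integrate dX_t = alpha(t, X_t) dt + beta(t, X_t) dW_t on [0, T].

    Each step replaces dt by a finite increment and dW_t by a N(0, dt) draw.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + alpha(k * dt, x[k]) * dt + beta(k * dt, x[k]) * dW
    return x

rng = np.random.default_rng(7)
# Example drift/diffusion: the OU process of a later slide,
# dX_t = -gamma * (X_t - mu) dt + sigma dW_t with gamma = 2, mu = 0, sigma = 1.
path = euler_maruyama(lambda t, x: -2.0 * x, lambda t, x: 1.0,
                      x0=1.0, T=5.0, n_steps=5000, rng=rng)
print("X_T =", path[-1])
```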

Construction of Itô's stochastic integral

The central question is how to compute a stochastic integral of the form
∫ₛᵗ β(τ, X(τ, ω)) dW(τ, ω) = ?
K. Itô's starting point is the following. Consider the standard Wiener process W = {Wt, t ≥ 0} and a (scalar) constant diffusion coefficient β(t, Xt) = β for all t. The integral along the sample path ω is equal to
∫ₛᵗ β dW(τ, ω) = β {W(t, ω) − W(s, ω)}
with probability 1. The expected integral and the expected squared integral are thus given by
⟨∫ₛᵗ β dW(τ, ω)⟩ = 0, ⟨(∫ₛᵗ β dW(τ, ω))²⟩ = β²(t − s).

Construction of Itô's stochastic integral (continued)

Consider the integral of the random function f : T × Ω → R:
I[f](ω) = ∫ₛᵗ f(τ, ω) dW(τ, ω).
It is assumed that f is mean square integrable.
1. If f is a random step function, that is, f(t, ω) = fj(ω) on [tj, tj+1[, then
I[f](ω) = Σ_{j=1}^{n−1} fj(ω){W(tj+1, ω) − W(tj, ω)},
with probability 1 for all ω. Since fj(ω) is constant on [tj, tj+1[, we get
⟨I[f]⟩ = 0, ⟨I²[f]⟩ = Σ_j ⟨fj²⟩(tj+1 − tj).
2. If f^(n) is a sequence of random n-step functions converging to the general random function f, such that f^(n)(t, ω) = f(tj^(n), ω) on [tj^(n), tj+1^(n)[, then
I[f^(n)](ω) = Σ_{j=1}^{n−1} f(tj^(n), ω){W(tj+1^(n), ω) − W(tj^(n), ω)},
with probability 1 for all ω. The same results follow.

The Itˆo stochastic integral

Theorem: The Itˆ o stochastic integral I [f ] of a random function f : T × Ω → R is the (unique) mean square limit of sequences of stochastic integrals I [f (n) ] for any sequence of random n-step functions f (n) converging to f : I [f ](ω) = m.s. lim

n−1 X

n→∞

(n)

(n)

(n)

f (tj , ω){W (tj+1 , ω) − W (tj , ω)}

j=1 (n)

(n)

with probability 1 and s = t1 < . . . < tn−1 < t. The Itˆ o integral of f with respect to W is a zero mean random variable. Since the Itˆ o integral is constructed from the sequence f (n) evaluated at tj ’s, it defines a stochastic process which is a martingale. The chain rule from classical calculus does not apply (see later)! The Stratonivich construction preserves the classical chain rule, but not the martingale property. ˙ ¸ Rt We call I 2 [f ] = s hf 2 idt the Itˆ o isomery.

Itˆ o’s stochastic integral follows from the fact that n−1 D E X (n) (n) (n) I 2 [f (n) ] = hf 2 (tj , ω)i(tj+1 − tj ), j=1 (n)

(n)

is a proper Riemann integral for tj+1 − tj

→ 0.
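Both the zero mean and the isometry can be probed by Monte Carlo. A sketch with my own choice of integrand f(t, ω) = W(t, ω), for which ∫₀ᵀ ⟨f²⟩ dt = ∫₀ᵀ t dt = T²/2; note the integrand is evaluated at the left endpoint of each subinterval, as the Itô construction requires.

```python
import numpy as np

rng = np.random.default_rng(8)
n_paths, n_steps, T = 100_000, 500, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW        # W at the LEFT endpoints t_j
# Ito sums: I[f] ~ sum_j f(t_j) * (W_{t_{j+1}} - W_{t_j}) with f = W.
I = np.sum(W_left * dW, axis=1)            # approximates int_0^T W dW
print("mean:", I.mean())                   # ~ 0 (martingale property)
print("second moment:", np.mean(I**2), "; isometry value:", T**2 / 2)
```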

Itô's formula

Let Yt = U(t, Xt) and consider the process X = {Xt, t ≥ 0} described by the following SDE:
dXt = α(t, Xt) dt + β(t, Xt) dWt.
The stochastic chain rule is given by
dYt = {∂U/∂t + α ∂U/∂x + (β²/2) ∂²U/∂x²} dt + β (∂U/∂x) dWt,
with probability 1. The additional term comes from the fact that
the symbolic SDE is to be interpreted as an Itô stochastic integral, i.e. with equality in the mean square sense;
dWt² is of O(dt).

Chain rule in classical calculus: consider y = u(t, x). Discarding the second and higher order terms in the Taylor expansion of u leads to
dy = u(t + dt, x + dx) − u(t, x) = (∂u/∂t) dt + (∂u/∂x) dx.

Chain rule in stochastic calculus: for Yt = U(t, Xt), the Taylor expansion of U leads to
dYt = U(t + dt, Xt + dXt) − U(t, Xt)
= (∂U/∂t) dt + (∂U/∂x) dXt + (1/2){(∂²U/∂t²)(dt)² + 2(∂²U/∂t∂x) dt dXt + (∂²U/∂x²)(dXt)²} + ...,
where
(dXt)² = α²(dt)² + 2αβ dt dWt + β²(dWt)².
Hence, we need to keep the additional term of O(dt), such that
dYt = {∂U/∂t + (β²/2) ∂²U/∂x²} dt + (∂U/∂x) dXt.
Substituting dXt leads to the desired result.
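A pathwise check of the stochastic chain rule (my own example, not from the slides): take Xt = Wt (so α = 0, β = 1) and U(t, x) = x², which gives dYt = dt + 2Wt dWt. Cumulating the right-hand side along a simulated path should recover Wt² up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(9)
n_steps, T = 100_000, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.cumsum(dW)
W_left = np.concatenate([[0.0], W[:-1]])        # W at the left endpoints
# Ito's formula for U(x) = x^2: dY = 1 * dt + 2 * W dW (the dt term is
# exactly the Ito correction (beta^2/2) * U'' = 1).
Y_ito = np.cumsum(dt + 2.0 * W_left * dW)
print("Y via Ito's formula:", Y_ito[-1], "; direct W_T^2:", W[-1]**2)
```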

Application: Black–Scholes option-pricing model

Assume the evolution of a stock price Xt is described by a geometric Wiener process:
dXt = ρ Xt dt + σ Xt dWt,
where ρ is called the risk-free rate (or drift) and σ the volatility. Consider the change of variable Yt = log Xt. Applying the stochastic chain rule leads to the Black–Scholes formula:
dYt = (ρ − σ²/2) dt + σ dWt.
This leads to the following solution for the stock price at time t:
Xt = X0 exp{(ρ − σ²/2) t + σ Wt}.
Assumptions:
No dividends or charges
European exercise terms
Markets are efficient
Interest rates are known
Returns are log-normal
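Since the solution is in closed form, the stock price can be simulated exactly on any grid (a sketch; the values of ρ, σ, X0 and the horizon are my own):

```python
import numpy as np

rng = np.random.default_rng(10)
rho, sigma, X0, T, n = 0.05, 0.2, 100.0, 1.0, 252
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
# Exact solution from the slide: X_t = X0 * exp((rho - sigma^2/2) t + sigma W_t).
X = X0 * np.exp((rho - 0.5 * sigma**2) * t + sigma * W)
print("X_T =", X[-1])       # log-normal, with <X_T> = X0 * exp(rho * T)
```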

Ornstein–Uhlenbeck (OU) process

The Ornstein–Uhlenbeck process with drift parameter γ > 0 and mean μ is defined as follows:
dXt = −γ(Xt − μ) dt + σ dWt.
The OU process is known as the mean-reverting process.
It is a Gaussian process with covariance function
vs,t = (σ²/2γ) e^{−γ|t−s|}, s ≤ t.
It is wide-sense stationary.
It is a homogeneous Markov process.
It is a diffusion process.
It is the continuous-time equivalent of the discrete-time AR(1) process.
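Because the OU transition density is Gaussian, the process can be sampled exactly at any step size. A sketch (the exact one-step mean and variance below follow from integrating the SDE; parameter values and the seed are my own):

```python
import numpy as np

rng = np.random.default_rng(11)
gamma, mu, sigma, T, n = 2.0, 0.0, 1.0, 5.0, 1000
dt = T / n
decay = np.exp(-gamma * dt)
step_sd = np.sqrt(sigma**2 * (1.0 - decay**2) / (2.0 * gamma))
x = np.empty(n + 1)
x[0] = -5.0                                 # start far from the mean mu
for k in range(n):
    # Exact Gaussian transition over one step of length dt.
    x[k + 1] = mu + (x[k] - mu) * decay + step_sd * rng.normal()
# The path reverts to mu; the stationary variance is sigma^2 / (2 gamma).
print("X_T =", x[-1], "; stationary sd =", np.sqrt(sigma**2 / (2 * gamma)))
```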

OU process (continued)

[Figure: sample path examples of an OU process with different drift and diffusion coefficients. The same mean μ and initial condition were used.]

References

Crispin W. Gardiner. Handbook of Stochastic Methods. Springer, 2004 (3rd edition).
Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations. Springer, 1999.
Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer, 2000 (5th edition).
Christopher K. I. Williams. A Tutorial Introduction to Stochastic Differential Equations: Continuous-time Gaussian Markov Processes. NIPS Workshop on Dynamical Systems, Stochastic Processes and Bayesian Inference, 2006.
Matthias Winkel. Lévy Processes and Finance.
