Lecture 4: Introduction to stochastic processes and stochastic calculus

Cédric Archambeau
Centre for Computational Statistics and Machine Learning
Department of Computer Science, University College London
[email protected]

Advanced Topics in Machine Learning (MSc in Intelligent Systems) January 2008

Discrete-time vs continuous-time?

Real systems are continuous: can we gain something by modelling them in continuous time? Often a physical model is available: can we exploit this information?

[Figure: a continuous-time signal sampled over a time window.]

Outline

Some definitions
Stochastic processes
Lévy processes
Markov processes
Diffusion processes
Itô's formula
Variational inference for diffusion processes

Elements of probability theory

A collection A of subsets of the sample space Ω is a σ-algebra if
A contains Ω: Ω ∈ A.
A is closed under complementation: Ω\A ∈ A if A ∈ A.
A is closed under countable unions: ∪_n An ∈ A if A1, A2, ..., An, ... ∈ A.
This implies that A is closed under countable intersections. We say that (Ω, A) is a measurable space if Ω is a non-empty set and A is a σ-algebra of Ω.

Elements of probability theory (continued)

A measure H(·) on (Ω, A) is a nonnegative valued set function on A satisfying
H(∅) = 0,
H(∪_n An) = Σ_n H(An) if Ai ∩ Aj = ∅ for i ≠ j,
for any sequence A1, A2, ..., An, ... ∈ A. If A ⊆ B, it follows that H(A) ≤ H(B). If H(Ω) is finite, i.e. 0 ≤ H(Ω) < ∞, then H(·) can be normalized to obtain a probability measure P(·):
P(A) = H(A)/H(Ω), P(A) ∈ [0, 1],
for all A ∈ A. We say that (Ω, A, P) is a probability space if P is a probability measure on the measurable space (Ω, A).

Elements of probability theory (continued)

Let (Ω1, A1) and (Ω2, A2) be two measurable spaces. The function f : Ω1 → Ω2 is measurable if the pre-image of any A2 ∈ A2 is in A1:
f⁻¹(A2) = {ω1 ∈ Ω1 : f(ω1) ∈ A2} ∈ A1, for all A2 ∈ A2.
Let (Ω, A, P) be a probability space. We call the measurable function X : Ω → R^D a continuous random variable.

Stochastic process

Let T be the time index set and (Ω, A, P) the underlying probability space. The function X : T × Ω → R^D is a stochastic process, such that
Xt = X(t, ·) : Ω → R^D is a random variable for each t ∈ T,
Xω = X(·, ω) : T → R^D is a realization or sample path for each ω ∈ Ω.
When considering continuous-time systems, T will often be equal to R+. In practice, we call a collection of random variables X = {Xt, t ≥ 0} defined on a common probability space a stochastic process. We can think of Xt as the position of a particle at time t, changing as t varies. The particle moves continuously, or jumps at some t ≥ 0:
ΔXt = Xt+ − Xt− = lim_{ε↓0} Xt+ε − lim_{ε↓0} Xt−ε.
In general, we will assume that the process is right-continuous, i.e. Xt+ = Xt.

Independence

Let {Y1, ..., Yn} be a collection of random variables, with Yi ∈ R^{Di}. They are independent if
P(Y1 ∈ A1, ..., Yn ∈ An) = ∏_{i=1}^{n} P(Yi ∈ Ai),
for all Ai ⊂ R^{Di}. An infinite collection is said to be independent if every finite subcollection is independent.
A stochastic process X = {Xt, t ≥ 0} has independent increments if the random variables Xt0, Xt1 − Xt0, ..., Xtn − Xtn−1 are independent for all n ≥ 1 and t0 < t1 < ... < tn.

Stationarity

A stochastic process is (strictly) stationary if all the joint marginals are invariant under a time displacement h > 0, that is,
p(Xt1+h, Xt2+h, ..., Xtn+h) = p(Xt1, Xt2, ..., Xtn),
for all t1, ..., tn. The stochastic process X = {Xt, t ≥ 0} is wide-sense stationary if there exist a constant m ∈ R^D and a function C : R+ → R^{D×D}, such that
μt ≡ ⟨Xt⟩ = m,
Σt ≡ ⟨(Xt − μt)(Xt − μt)ᵀ⟩ = C(0),
Vs,t ≡ ⟨(Xt − μt)(Xs − μs)ᵀ⟩ = C(t − s),
for all s, t ∈ R+. We call Vs,t the two-time covariance. The stochastic process X = {Xt, t ≥ 0} has stationary increments if Xt+s − Xt has the same distribution as Xs for all s, t ≥ 0.

Example: Poisson process

The Poisson process with intensity parameter λ > 0 is a continuous-time stochastic process X = {Xt, t ∈ R+} with independent, stationary increments:
Xt − Xs ∼ P(λ(t − s)), X0 = 0,
for all 0 ≤ s ≤ t. The Poisson process is not wide-sense stationary:
μt = λt, σt² = λt, vs,t = λ min{s, t}.

[Figure: sample path of a Poisson process; Xt jumps by unit steps as t increases.]

The Poisson process is right-continuous and, in fact, it is a Lévy process (see later) consisting only of jumps.

The Poisson distribution (or law of rare events) is defined as
n ∼ P(λ), P(n) = (λⁿ/n!) e^{−λ}, n ∈ N,
where λ > 0. The mean and the variance are given by
⟨n⟩ = λ, ⟨(n − ⟨n⟩)²⟩ = λ.
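To make the jump structure concrete, here is a minimal simulation sketch (not from the original slides; the function name, seed and parameter values are my own). It relies on the standard fact that the inter-arrival times of a Poisson process with intensity λ are iid Exponential(λ).

```python
import numpy as np

def poisson_jump_times(lam, t_max, rng):
    """Jump times of a Poisson process with intensity lam on [0, t_max].

    Inter-arrival times are iid Exponential(lam); accumulate them
    until the horizon is exceeded. X_t is the number of jumps <= t.
    """
    times, t = [], 0.0
    while True:
        t += rng.exponential(1.0 / lam)
        if t > t_max:
            return np.array(times)
        times.append(t)

rng = np.random.default_rng(0)
jumps = poisson_jump_times(lam=0.2, t_max=50.0, rng=rng)
# Sanity check against the slide: <X_t> = lam * t and var(X_t) = lam * t.
print("X_50 =", len(jumps), "; expected mean:", 0.2 * 50.0)
```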

Lévy process

A stochastic process X = {Xt, t ≥ 0} is a Lévy process if
The increments on disjoint time intervals are independent.
The increments are stationary: increments over equally long time intervals are identically distributed.
The sample paths are right-continuous with left limits, i.e. lim_{ε↓0} Xt+ε = Xt and lim_{ε↓0} Xt−ε = Xt−.
Lévy processes are usually described in terms of the Lévy–Khintchine representation. A Lévy process can have three types of components: a deterministic drift, a random diffusion component and a random jump component. It is implicitly assumed that a Lévy process starts at X0 = 0 with probability 1.
Applications:
Financial stock prices: Black–Scholes
Population models: birth-and-death processes
...

Interpretation of Lévy processes

Lévy processes are the continuous-time equivalent of random walks. A random walk over n time units is a sum of n independent and identically distributed random variables:
Sn = Σ_{k=1}^{n} Δxk,
where the Δxk are iid random variables. Random walks have independent and stationary increments.

[Figure: example of a Gaussian random walk with S0 = 1.]
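The figure is easy to reproduce with a short sketch (my own; the step variance, number of steps and seed are assumptions), where the walk is a cumulative sum of iid Gaussian steps:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100                      # number of steps on the unit time interval
dt = 1.0 / n
# iid Gaussian steps Delta_x ~ N(0, dt): independent, stationary increments.
steps = rng.normal(0.0, np.sqrt(dt), size=n)
S = 1.0 + np.cumsum(steps)   # Gaussian random walk started at S_0 = 1
print("S at t = 1:", S[-1])
```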

Interpretation of Lévy processes (continued)

A random variable X has an infinitely divisible distribution if for every m ≥ 1 we can write
X ∼ Σ_{j=1}^{m} Xj^{(m)},
where the {Xj^{(m)}}_j are iid.
For example, the Normal, Poisson and Gamma distributions are infinitely divisible. The Bernoulli distribution is not infinitely divisible. Lévy processes are infinitely divisible, since the increments over non-overlapping time intervals are independent and stationary:
Xs = Σ_{j=1}^{m} (X_{js/m} − X_{(j−1)s/m}),
for all m ≥ 1. In fact, it can be shown that there is a Lévy process for each infinitely divisible probability distribution.
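A quick numerical illustration of infinite divisibility for the Normal distribution (my own check, not part of the slides): summing m iid N(0, s/m) parts reproduces the variance of N(0, s) for any m.

```python
import numpy as np

rng = np.random.default_rng(2)
s, m, n_samples = 2.0, 7, 100_000
# Decompose X ~ N(0, s) into m iid parts X_j^(m) ~ N(0, s/m).
parts = rng.normal(0.0, np.sqrt(s / m), size=(n_samples, m))
X = parts.sum(axis=1)
print("sample variance:", X.var(), "; target:", s)  # both close to 2.0
```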

Markov process

The stochastic process X = {Xt, t ≥ 0} is a (continuous-time, continuous-state) Markov process if
p(Xt | Xs) = p(Xt | Xr1, ..., Xrn, Xs),
for all 0 ≤ r1 ≤ ... ≤ rn ≤ s ≤ t. We call p(Xt | Xs) the transition density. It can be time dependent. The Chapman–Kolmogorov equation follows from the Markov property:
p(Xt | Xs) = ∫ p(Xt | Xτ) p(Xτ | Xs) dXτ,
for all s ≤ τ ≤ t. The Chapman–Kolmogorov equation already played an important role in (discrete-time) dynamical systems. Lévy processes satisfy the Markov property.
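The Chapman–Kolmogorov equation can be checked by Monte Carlo for a process whose transition density is known. A sketch of my own, using the Wiener transition density p(Xt | Xs) = N(Xs, t − s) introduced later: sampling s → τ → t in two steps should reproduce the one-step marginal.

```python
import numpy as np

rng = np.random.default_rng(3)
xs, s, tau, t, n = 0.5, 0.0, 0.4, 1.0, 200_000
# Two-step sampling: X_tau | X_s = xs, then X_t | X_tau.
x_tau = rng.normal(xs, np.sqrt(tau - s), size=n)
x_t = rng.normal(x_tau, np.sqrt(t - tau))
# Chapman-Kolmogorov: marginalizing X_tau gives p(X_t | X_s) = N(xs, t - s).
print("mean:", x_t.mean(), "; var:", x_t.var())  # close to 0.5 and 1.0
```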

Markov process (continued)

A Markov process is homogeneous if its transition density depends only on the time difference:
p(Xt+h | Xt) = p(Xh | X0),
for all h ≥ 0. The Poisson process is a homogeneous discrete-state Markov process:
P(nt+h | nt) = P(λ(t + h − t)) = P(nh | n0).
Let f(·) be a bounded function. A Markov process is ergodic if the time average coincides with the spatial average, i.e.
lim_{T→∞} (1/T) ∫₀ᵀ f(Xt) dt = ⟨f⟩,
where the expectation is taken with respect to the stationary probability density.

Martingale (fair game)

A martingale is a stochastic process such that the expectation of some future value given the past and the present is the same as if given only the present:
⟨Xt | {Xτ, 0 ≤ τ ≤ s}⟩ = Xs, for all t ≥ s.
More formally, let (Ω, A, P) be a probability space and {At, t ≥ 0} a filtration¹ of A. The stochastic process X = {Xt, t ≥ 0} is a martingale if
⟨Xt | As⟩ = Xs, with probability 1,
for all 0 ≤ s < t. When the process Xt satisfies the Markov property, we have ⟨Xt | As⟩ = ⟨Xt | Xs⟩.

¹A filtration {At, t ≥ 0} of A is an increasing family of σ-algebras on the measurable space (Ω, A), that is, As ⊆ At ⊆ A for any 0 ≤ s ≤ t. This means that more information becomes available as time increases.

Diffusion process

A Markov process X = {Xt, t ≥ 0} is a diffusion process if the following limits exist for all ε > 0:
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|>ε} p(Xt | Xs) dXt = 0,
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|<ε} (Xt − Xs) p(Xt | Xs) dXt = α(s, Xs),
lim_{t↓s} (1/(t − s)) ∫_{|Xt−Xs|<ε} (Xt − Xs)(Xt − Xs)ᵀ p(Xt | Xs) dXt = β(s, Xs)βᵀ(s, Xs).
The drift α is the instantaneous rate of change of the mean of the process, given that Xs = x at time s. The diffusion matrix D = ββᵀ is the instantaneous rate of change of the squared fluctuations of the process, given that Xs = x at time s.

Diffusion process (continued)

Diffusion processes are almost surely continuous functions of time, but they need not be differentiable. Diffusion processes are Lévy processes (without the jump component). The time evolution of the transition density p(y, t | x, s), with s ≤ t, given some initial condition or target constraint, was described by Kolmogorov:
The forward evolution of the transition density is given by the Kolmogorov forward equation (also known as the Fokker–Planck equation):
∂p/∂t = −Σ_i ∂/∂yi {αi(t, y) p} + (1/2) Σ_{i,j} ∂²/∂yi∂yj {Dij(t, y) p},
for a fixed initial state (s, x).
The backward evolution of the transition density is given by the Kolmogorov backward equation (or adjoint equation):
−∂p/∂s = Σ_i αi(s, x) ∂p/∂xi + (1/2) Σ_{i,j} Dij(s, x) ∂²p/∂xi∂xj,
for a fixed final state (t, y).
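For intuition, the forward equation can be integrated numerically. Below is a small explicit finite-difference sketch (entirely my own choices of grid, time step and initial density) for zero drift and constant scalar D, where the exact solution spreads like a Gaussian with variance growing as D·t; the explicit scheme is only stable when 0.5·D·dt/dy² ≤ 0.5.

```python
import numpy as np

# Solve dp/dt = 0.5 * D * d^2p/dy^2 (zero drift, constant D) explicitly.
D, dy, dt, t_end = 1.0, 0.05, 0.001, 0.5
y = np.arange(-5.0, 5.0 + dy, dy)
v0 = 0.01                                  # small initial variance (near-delta)
p = np.exp(-y**2 / (2 * v0)) / np.sqrt(2 * np.pi * v0)
for _ in range(int(t_end / dt)):
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / dy**2
    p = p + dt * 0.5 * D * lap
    p[0] = p[-1] = 0.0                     # absorbing far-away boundaries
var = np.sum(y**2 * p) / np.sum(p)
print("variance:", var, "; exact:", v0 + D * t_end)
```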

Wiener process

The Wiener process was proposed by Wiener as a mathematical description of Brownian motion. It characterizes the erratic motion (i.e. diffusion) of a pollen grain on a water surface due to its being continually bombarded by water molecules. It can be viewed as a scaling limit of a random walk on any finite time interval (Donsker's theorem). It is also commonly used to model stock market fluctuations.

[Figure: sample path of a standard Wiener process Wt on [0, 1].]

Wiener process (continued)

A standard Wiener process is a continuous-time Gaussian Markov process W = {Wt, t ≥ 0} with (non-overlapping) independent increments for which
W0 = 0,
the sample path Wω is almost surely continuous for all ω ∈ Ω,
Wt − Ws ∼ N(0, t − s), for all 0 ≤ s ≤ t.
The sample paths Wω are almost surely nowhere differentiable. The expectation ⟨Wt⟩ is equal to 0 for all t. W is not wide-sense stationary, as vs,t = min{s, t}, but it has stationary increments. W is homogeneous, since p(Wt+h | Wt) = p(Wh | W0). W is a diffusion process with drift α = 0 and diffusion coefficient β = 1, such that Kolmogorov's forward and backward equations are given by
∂p/∂t − (1/2) ∂²p/∂y² = 0, ∂p/∂s + (1/2) ∂²p/∂x² = 0.
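A minimal simulation sketch of these properties (mine; path count, grid and seed are arbitrary choices): a Wiener path is a cumulative sum of independent N(0, dt) increments.

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_steps, T = 5000, 1000, 1.0
dt = T / n_steps
# Independent increments W_{t+dt} - W_t ~ N(0, dt), with W_0 = 0.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.concatenate([np.zeros((n_paths, 1)), np.cumsum(dW, axis=1)], axis=1)
# Empirical checks: <W_t> = 0, var(W_t) = t, v_{s,t} = min{s, t}.
s_idx, t_idx = n_steps // 2, n_steps       # s = 0.5, t = 1.0
print("mean:", W[:, t_idx].mean())                   # ~ 0
print("var :", W[:, t_idx].var())                    # ~ 1.0
print("cov :", np.mean(W[:, s_idx] * W[:, t_idx]))   # ~ min{0.5, 1} = 0.5
```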

Informal proof that a Wiener process is not differentiable:

Consider the partition of a bounded time interval [s, t] into subintervals [τ_k^(n), τ_{k+1}^(n)] of equal length, such that
τ_k^(n) = s + k (t − s)/2ⁿ, k = 0, 1, ..., 2ⁿ − 1.
Consider a sample path Wω(τ) of the standard Wiener process W = {Wτ, τ ∈ [s, t]}. It can be shown (Kloeden and Platen, p. 72) that
lim_{n→∞} Σ_{k=0}^{2ⁿ−1} (W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω))² = t − s.
Hence, taking the limit superior, i.e. the supremum² of all the limit points, we get
t − s ≤ lim sup_{n→∞} max_k |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| Σ_{k=0}^{2ⁿ−1} |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)|.
From the sample path continuity, we have max_k |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| → 0 with probability 1 as n → ∞, and therefore
Σ_{k=0}^{2ⁿ−1} |W(τ_{k+1}^(n), ω) − W(τ_k^(n), ω)| → ∞.
As a consequence, the sample paths almost surely do not have bounded variation on [s, t] and cannot be differentiated.

²For S ⊆ T, the supremum of S is the least element of T that is greater than or equal to all elements of S.
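The two limits are easy to observe numerically (a sketch under my own discretization choices; it draws a fresh path for each n rather than refining a single path, which suffices for illustration): the sum of squared increments settles near t − s, while the sum of absolute increments keeps growing.

```python
import numpy as np

rng = np.random.default_rng(5)
s, t = 0.0, 1.0
for n in (6, 10, 14):
    m = 2**n                                # dyadic partition of [s, t]
    dW = rng.normal(0.0, np.sqrt((t - s) / m), size=m)
    print(f"n={n:2d}  sum dW^2 = {np.sum(dW**2):.4f}"   # -> t - s = 1
          f"   sum |dW| = {np.sum(np.abs(dW)):8.2f}")   # grows like sqrt(2m/pi)
```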

Let s ≤ t. The two-time covariance is then given by
vs,t = ⟨Wt Ws⟩ = ⟨(Wt − Ws + Ws) Ws⟩ = ⟨Wt − Ws⟩⟨Ws⟩ + ⟨Ws²⟩ = 0 · 0 + s.
The transition density of W is given by p(Wt | Ws) = N(Ws, t − s). Hence, the drift and the diffusion coefficient of a standard Wiener process are
α(s, Ws) = lim_{t↓s} (⟨Wt⟩ − Ws)/(t − s) = 0,
β²(s, Ws) = lim_{t↓s} (⟨Wt²⟩ − 2⟨Wt⟩Ws + Ws²)/(t − s) = lim_{t↓s} (⟨Wt²⟩ − Ws²)/(t − s) = lim_{t↓s} (t − s)/(t − s) = 1.
The same results are found by directly differentiating the transition density, as required by Kolmogorov's equations.

Brownian bridge

A Brownian bridge is a Wiener process pinned at both ends, i.e. the sample paths all go through an initial state at time t = 0 and a given state at a later time t = T. Let W = {Wt, t ≥ 0} be a standard Wiener process. The Brownian bridge B(x0, yT) = {Bt(x0, yT), 0 ≤ t ≤ T} is a stochastic process, such that
Bt(x0, yT) = x0 + Wt − (t/T)(x0 + WT − yT).
A Brownian bridge Bt(x0, yT) is a Gaussian process with mean function and two-time covariance given by
⟨Bt⟩ = x0 − (t/T)(x0 − yT),
vs,t = min{s, t} − st/T,
for 0 ≤ s, t ≤ T.
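The pinning construction translates directly into code (a sketch; endpoints, horizon and seed are my own choices):

```python
import numpy as np

rng = np.random.default_rng(6)
T, n = 5.0, 1000
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])   # standard Wiener path
x0, yT = 1.0, -2.0                           # pinned initial and final states
# B_t = x0 + W_t - (t/T) * (x0 + W_T - yT), so B_0 = x0 and B_T = yT.
B = x0 + W - (t / T) * (x0 + W[-1] - yT)
print("B_0 =", B[0], "; B_T =", B[-1])       # exactly x0 and yT
```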

Brownian bridge (continued)

[Figure: sample path examples of a Brownian bridge for different initial and final states.]

Diffusion processes revisited

Let W = {Wt, t ≥ 0} be a standard Wiener process. The time evolution of a diffusion process can be described by a stochastic differential equation (SDE):
dXt = α(t, Xt) dt + β(t, Xt) dWt, dWt ∼ N(0, dt I_D),
where X = {Xt, t ≥ 0} is a stochastic process with drift α ∈ R^D and diffusion coefficient β ∈ R^{D×D}. This representation corresponds to the state-space representation of discrete-time dynamical systems. An SDE is interpreted as a (stochastic) integral equation along a sample path ω, that is,
X(t, ω) − X(s, ω) = ∫ₛᵗ α(τ, X(τ, ω)) dτ + ∫ₛᵗ β(τ, X(τ, ω)) (dW(τ, ω)/dτ) dτ.
This representation is symbolic, as a Wiener process is almost surely not differentiable; the difference quotient corresponds to Gaussian white noise:
(W(τ + h, ω) − W(τ, ω))/h ∼ N(0, 1/h),
whose variance diverges as h → 0. This means that Gaussian white noise cannot be realized physically!
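The SDE notation suggests the simplest numerical integrator, the Euler–Maruyama scheme, which replaces dt and dWt by finite increments. A generic one-dimensional sketch (the scheme is standard; the function names and the OU example parameters are my own):

```python
import numpy as np

def euler_maruyama(alpha, beta, x0, T, n_steps, rng):
    """Integrate dX_t = alpha(t, X_t) dt + beta(t, X_t) dW_t on [0, T].

    Each step replaces dt by a finite increment and dW_t by a N(0, dt) draw.
    """
    dt = T / n_steps
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dW = rng.normal(0.0, np.sqrt(dt))
        x[k + 1] = x[k] + alpha(k * dt, x[k]) * dt + beta(k * dt, x[k]) * dW
    return x

rng = np.random.default_rng(7)
# Example drift/diffusion: the OU process of a later slide,
# dX_t = -gamma * (X_t - mu) dt + sigma dW_t with gamma = 2, mu = 0, sigma = 1.
path = euler_maruyama(lambda t, x: -2.0 * x, lambda t, x: 1.0,
                      x0=1.0, T=5.0, n_steps=5000, rng=rng)
print("X_T =", path[-1])
```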

Construction of Itô's stochastic integral

The central question is how to compute a stochastic integral of the form
∫ₛᵗ β(τ, X(τ, ω)) dW(τ, ω) = ?
K. Itô's starting point is the following. Consider the standard Wiener process W = {Wt, t ≥ 0} and a (scalar) constant diffusion coefficient β(t, Xt) = β for all t. The integral along the sample path ω is equal to
∫ₛᵗ β dW(τ, ω) = β {W(t, ω) − W(s, ω)}
with probability 1. The expected integral and the expected squared integral are thus given by
⟨∫ₛᵗ β dW(τ, ω)⟩ = 0, ⟨(∫ₛᵗ β dW(τ, ω))²⟩ = β²(t − s).

Construction of Itô's stochastic integral (continued)

Consider the integral of the random function f : T × Ω → R:
I[f](ω) = ∫ₛᵗ f(τ, ω) dW(τ, ω).
It is assumed that f is mean square integrable.
1. If f is a random step function, that is, f(t, ω) = fj(ω) on [tj, tj+1[, then
I[f](ω) = Σ_{j=1}^{n−1} fj(ω){W(tj+1, ω) − W(tj, ω)},
with probability 1 for all ω. Since fj(ω) is constant on [tj, tj+1[, we get
⟨I[f]⟩ = 0, ⟨I²[f]⟩ = Σ_j ⟨fj²⟩(tj+1 − tj).
2. If f^(n) is a sequence of random n-step functions converging to the general random function f, such that f^(n)(t, ω) = f(tj^(n), ω) on [tj^(n), tj+1^(n)[, then
I[f^(n)](ω) = Σ_{j=1}^{n−1} f(tj^(n), ω){W(tj+1^(n), ω) − W(tj^(n), ω)},
with probability 1 for all ω. The same results follow.

The Itˆo stochastic integral

Theorem: The Itˆ o stochastic integral I [f ] of a random function f : T × Ω → R is the (unique) mean square limit of sequences of stochastic integrals I [f (n) ] for any sequence of random n-step functions f (n) converging to f : I [f ](ω) = m.s. lim

n−1 X

n→∞

(n)

(n)

(n)

f (tj , ω){W (tj+1 , ω) − W (tj , ω)}

j=1 (n)

(n)

with probability 1 and s = t1 < . . . < tn−1 < t. The Itˆ o integral of f with respect to W is a zero mean random variable. Since the Itˆ o integral is constructed from the sequence f (n) evaluated at tj ’s, it defines a stochastic process which is a martingale. The chain rule from classical calculus does not apply (see later)! The Stratonivich construction preserves the classical chain rule, but not the martingale property. ˙ ¸ Rt We call I 2 [f ] = s hf 2 idt the Itˆ o isomery.

Itˆ o’s stochastic integral follows from the fact that n−1 D E X (n) (n) (n) I 2 [f (n) ] = hf 2 (tj , ω)i(tj+1 − tj ), j=1 (n)

(n)

is a proper Riemann integral for tj+1 − tj

→ 0.
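Both the zero mean and the isometry can be probed by Monte Carlo. A sketch with my own choice of integrand f(t, ω) = W(t, ω), for which ∫₀ᵀ ⟨f²⟩ dt = ∫₀ᵀ t dt = T²/2; note the integrand is evaluated at the left endpoint of each subinterval, as the Itô construction requires.

```python
import numpy as np

rng = np.random.default_rng(8)
n_paths, n_steps, T = 100_000, 500, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W_left = np.cumsum(dW, axis=1) - dW        # W at the LEFT endpoints t_j
# Ito sums: I[f] ~ sum_j f(t_j) * (W_{t_{j+1}} - W_{t_j}) with f = W.
I = np.sum(W_left * dW, axis=1)            # approximates int_0^T W dW
print("mean:", I.mean())                   # ~ 0 (martingale property)
print("second moment:", np.mean(I**2), "; isometry value:", T**2 / 2)
```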

Itô's formula

Let Yt = U(t, Xt) and consider the process X = {Xt, t ≥ 0} described by the following SDE:
dXt = α(t, Xt) dt + β(t, Xt) dWt.
The stochastic chain rule is given by
dYt = {∂U/∂t + α ∂U/∂x + (β²/2) ∂²U/∂x²} dt + β (∂U/∂x) dWt,
with probability 1. The additional term comes from the fact that
the symbolic SDE is to be interpreted as an Itô stochastic integral, i.e. with equality in the mean square sense;
dWt² is of O(dt).

Chain rule in classical calculus: consider y = u(t, x). Discarding the second and higher order terms in the Taylor expansion of u leads to
dy = u(t + dt, x + dx) − u(t, x) = (∂u/∂t) dt + (∂u/∂x) dx.

Chain rule in stochastic calculus: for Yt = U(t, Xt), the Taylor expansion of U leads to
dYt = U(t + dt, Xt + dXt) − U(t, Xt)
= (∂U/∂t) dt + (∂U/∂x) dXt + (1/2){(∂²U/∂t²)(dt)² + 2(∂²U/∂t∂x) dt dXt + (∂²U/∂x²)(dXt)²} + ...,
where
(dXt)² = α²(dt)² + 2αβ dt dWt + β²(dWt)².
Hence, we need to keep the additional term of O(dt), such that
dYt = {∂U/∂t + (β²/2) ∂²U/∂x²} dt + (∂U/∂x) dXt.
Substituting dXt leads to the desired result.
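A pathwise check of the stochastic chain rule (my own example, not from the slides): take Xt = Wt (so α = 0, β = 1) and U(t, x) = x², which gives dYt = dt + 2Wt dWt. Cumulating the right-hand side along a simulated path should recover Wt² up to discretization error.

```python
import numpy as np

rng = np.random.default_rng(9)
n_steps, T = 100_000, 1.0
dt = T / n_steps
dW = rng.normal(0.0, np.sqrt(dt), size=n_steps)
W = np.cumsum(dW)
W_left = np.concatenate([[0.0], W[:-1]])        # W at the left endpoints
# Ito's formula for U(x) = x^2: dY = 1 * dt + 2 * W dW (the dt term is
# exactly the Ito correction (beta^2/2) * U'' = 1).
Y_ito = np.cumsum(dt + 2.0 * W_left * dW)
print("Y via Ito's formula:", Y_ito[-1], "; direct W_T^2:", W[-1]**2)
```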

Application: Black–Scholes option-pricing model

Assume the evolution of a stock price Xt is described by a geometric Wiener process:
dXt = ρ Xt dt + σ Xt dWt,
where ρ is called the risk-free rate (or drift) and σ the volatility. Consider the change of variable Yt = log Xt. Applying the stochastic chain rule leads to the Black–Scholes formula:
dYt = (ρ − σ²/2) dt + σ dWt.
This leads to the following solution for the stock price at time t:
Xt = X0 exp{(ρ − σ²/2) t + σ Wt}.
Assumptions:
No dividends or charges
European exercise terms
Markets are efficient
Interest rates are known
Returns are log-normal
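Since the solution is in closed form, the stock price can be simulated exactly on any grid (a sketch; the values of ρ, σ, X0 and the horizon are my own):

```python
import numpy as np

rng = np.random.default_rng(10)
rho, sigma, X0, T, n = 0.05, 0.2, 100.0, 1.0, 252
t = np.linspace(0.0, T, n + 1)
dW = rng.normal(0.0, np.sqrt(T / n), size=n)
W = np.concatenate([[0.0], np.cumsum(dW)])
# Exact solution from the slide: X_t = X0 * exp((rho - sigma^2/2) t + sigma W_t).
X = X0 * np.exp((rho - 0.5 * sigma**2) * t + sigma * W)
print("X_T =", X[-1])       # log-normal, with <X_T> = X0 * exp(rho * T)
```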

Ornstein–Uhlenbeck (OU) process

The Ornstein–Uhlenbeck process with drift parameter γ > 0 and mean μ is defined as follows:
dXt = −γ(Xt − μ) dt + σ dWt.
The OU process is known as the mean-reverting process.
It is a Gaussian process with covariance function
vs,t = (σ²/2γ) e^{−γ|t−s|}, s ≤ t.
It is wide-sense stationary.
It is a homogeneous Markov process.
It is a diffusion process.
It is the continuous-time equivalent of the discrete-time AR(1) process.
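Because the OU transition density is Gaussian, the process can be sampled exactly at any step size. A sketch (the exact one-step mean and variance below follow from integrating the SDE; parameter values and the seed are my own):

```python
import numpy as np

rng = np.random.default_rng(11)
gamma, mu, sigma, T, n = 2.0, 0.0, 1.0, 5.0, 1000
dt = T / n
decay = np.exp(-gamma * dt)
step_sd = np.sqrt(sigma**2 * (1.0 - decay**2) / (2.0 * gamma))
x = np.empty(n + 1)
x[0] = -5.0                                 # start far from the mean mu
for k in range(n):
    # Exact Gaussian transition over one step of length dt.
    x[k + 1] = mu + (x[k] - mu) * decay + step_sd * rng.normal()
# The path reverts to mu; the stationary variance is sigma^2 / (2 gamma).
print("X_T =", x[-1], "; stationary sd =", np.sqrt(sigma**2 / (2 * gamma)))
```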

OU process (continued)

[Figure: sample path examples of an OU process with different drift and diffusion coefficients. The same mean μ and initial condition were used.]

References

Crispin W. Gardiner. Handbook of Stochastic Methods. Springer, 2004 (3rd edition).
Peter E. Kloeden and Eckhard Platen. Numerical Solution of Stochastic Differential Equations. Springer, 1999.
Bernt Øksendal. Stochastic Differential Equations: An Introduction with Applications. Springer, 2000 (5th edition).
Christopher K. I. Williams. A Tutorial Introduction to Stochastic Differential Equations: Continuous-time Gaussian Markov Processes. NIPS Workshop on Dynamical Systems, Stochastic Processes and Bayesian Inference, 2006.
Matthias Winkel. Lévy Processes and Finance.
