everywhere central limit theorems are established and the large deviation results are ... 4.4 Large deviation principle for sequence of i.i.d. random variables .
LARGE DEVIATION PRINCIPLE FOR FUNCTIONAL LIMIT THEOREMS
by ADINA OPRISAN
Presented to the Faculty of the Graduate School of The University of Texas at Arlington in Partial Fulfillment of the Requirements for the Degree of
DOCTOR OF PHILOSOPHY
THE UNIVERSITY OF TEXAS AT ARLINGTON August 2009
c by Adina Oprisan 2009 Copyright All Rights Reserved
This dissertation is dedicated to the memory of my father in law Professor Gh. Oprisan
ACKNOWLEDGEMENTS First and foremost, I would like to express my indebted gratitude to my adviser Prof. Andrzej Korzeniowski for his warm encouragement, thoughtful guidance and support and for sharing with me a lot of his expertise and research insight. Also, I would like to thank Prof. Han Chien-Pai, Prof. D.L. Hawkins, Prof. Jianzhong Su and Prof. Shan Sun-Mitchell for their kind agreement to serve on my committee. I feel very privileged to have met the families of Professors Corduneanu and Dragan, who substituted my home family and I would like to thank them very much for all unforgettable moments spent together. Among the friends I made as a graduate student, a special mention is due to Violeta Yurita and Silviu Minut. I shall cherish their friendship forever. I owe so much thanks to my parents who were continuously supporting me throughout my life. I am forever indebted to my husband Paul for his support and assistance in every step of my thesis. I would like to thank to my daughter Dana for being such an understanding child and for giving me unlimited happiness and strength to finish this thesis. July 7, 2009
ABSTRACT
LARGE DEVIATION PRINCIPLE FOR FUNCTIONAL LIMIT THEOREMS Adina Oprisan, Ph.D. The University of Texas at Arlington, 2009
Supervising Professor: Andrzej Korzeniowski We study a family of stochastic additive functionals of Markov processes with locally independent increments switched by jump Markov processes in an asymptotic split phase space. Based on an averaging limit theorem, we obtain a large deviation result for this stochastic evolutionary system using a weak convergence approach. Examples, including compound Poisson processes, illustrate cases in which the rate function is calculated in an explicit form. We prove also a large deviation principle for a class of empirical processes in C[0, ∞) associated with additive functionals of Markov processes that were shown to have a martingale decomposition. Functional almost everywhere central limit theorems are established and the large deviation results are derived.
TABLE OF CONTENTS ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
v
Chapter
Page
1. INTRODUCTION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
1
2. GENERAL FRAMEWORK . . . . . . . . . . . . . . . . . . . . . . . . . . . .
3
2.1 Markov processes: notations and definitions . . . . . . . . . . . . . . . .
3
2.1.1 Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
5
2.1.2 Continuous-time Markov processes . . . . . . . . . . . . . . . . . .
8
2.1.3 Diffusion processes . . . . . . . . . . . . . . . . . . . . . . . . . . .
13
2.1.4 Processes with independent increments . . . . . . . . . . . . . . . .
14
2.1.5 Processes with locally independent increments . . . . . . . . . . . .
17
2.2 Weak convergence of stochastic processes . . . . . . . . . . . . . . . . . .
18
2.2.1 Martingale characterization of Markov processes . . . . . . . . . . .
18
2.2.2 Reducible-invertible operators . . . . . . . . . . . . . . . . . . . . .
21
2.2.3 Perturbation of reducible-invertible operators . . . . . . . . . . . .
23
2.2.4 Weak convergence . . . . . . . . . . . . . . . . . . . . . . . . . . .
25
2.3 Large deviation principle . . . . . . . . . . . . . . . . . . . . . . . . . . .
28
3. LARGE DEVIATION PRINCIPLE FOR AVERAGE LIMIT THEOREMS . .
38
3.1 Average approximation . . . . . . . . . . . . . . . . . . . . . . . . . . . .
39
3.1.1 Stochastic additive functionals . . . . . . . . . . . . . . . . . . . .
39
3.1.2 Phase merging principle . . . . . . . . . . . . . . . . . . . . . . . .
41
3.2 Large deviation principle for ergodic Markov processes . . . . . . . . . .
54
3.3 Large deviations for stochastic additive functionals . . . . . . . . . . . .
57
4. LARGE DEVIATION PRINCIPLE FOR FUNCTIONAL CENTRAL LIMIT THEOREMS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
79
4.1 Functional a.e. central limit theorems for additive functionals . . . . . .
79
4.2 Large deviation principle for Ornstein-Uhlenbeck processes . . . . . . . .
82
4.3 Large deviation principle for Brownian motion . . . . . . . . . . . . . . .
84
4.4 Large deviation principle for sequence of i.i.d. random variables . . . . .
86
4.5 Martingale decomposition of Markov functionals . . . . . . . . . . . . . .
92
4.6 Large deviation principle for additive functionals . . . . . . . . . . . . . .
99
5. FURTHER RESEARCH DIRECTIONS . . . . . . . . . . . . . . . . . . . . . 102 REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 BIOGRAPHICAL INFORMATION . . . . . . . . . . . . . . . . . . . . . . . . . 108
vii
CHAPTER 1 INTRODUCTION The theory of large deviations concerns the asymptotic behavior of remote tails of sequences of probability distributions. It has applications in many areas including statistics, communication networks, information theory, statistical mechanics, risksensitive control and queuing systems. The first rigorous large deviations results were obtained by Harald Cram´er in the late 1930s, who applied them to model the insurance business. He gave a large deviation principle (LDP) result for a sum of i.i.d. random variables with the rate function expressed by a power series. A general abstract framework for the LDP was proposed by S.R.S. Varadhan in 1966. At that time, the only modern large deviation principles available were: Schilder (LDP for Brownian motion), Sanov (LDP for ergodic processes), and Freidlin and Wentzell (LDP for Itˆo diffusions) with some abstract foundation of LDP. A crucial step forward was achieved through a series of papers of Donsker and Varadhan, starting in the mid 1970s in which they developed a systematic large deviation theory for empirical measures in the i.i.d. and Markov cases. For his seminal contributions to the field, Varadhan has won the 2007 Abel Prize. A unified approach, including Non-Uniform and Hypoeliptic Diffusions is due to D.W. Stroock. In this dissertation we establish several large deviations results for stochastic additive functionals that are strongly or weakly convergent under different scalings. In Chapter 2 we introduce the necessary notations, definitions and state basic facts and results related to our study. Furthermore, the fundamental objects such as Markov
1
2 chains, Markov processes and some of their sub-families, including their properties are discussed and the framework for our analysis is presented. In Chapter 3 we present the notions of switching and switched processes via additive functionals of processes with locally independent increments subject to Markov switching in phase split and merging scheme. The method is based on a splitting of the phase space into disjoint classes Ek , k ∈ V and further merging these classes into distinct states k ∈ V . Large deviation principle for phase merging principle and average approximation theorem are established. These results are published in [1]. For phase merging principle the large deviation result was obtained via the LDP considerations for ergodic processes. The large deviation result for average approximation theorem follows from a weak convergence approach. That is, we represent the normalized logarithms of the expectations as variational formulas that are interpreted as the minimal cost functions of associated stochastic optimal control problems. In Chapter 4 we study additive functionals associated with functional central limit theorems using entirely different techniques than those used in Chapter 3. We establish functional almost everywhere central limit theorems and derive the large deviation principle through the rate function for Ornstein-Uhlenbeck processes and a martingale decomposition of the additive functionals. These results are published in [2]. Finally, in Chapter 5 we outline some ideas for future work, including the extension of the results obtained in Chapter 4 for additive functionals in continuous time and for stochastic additive functionals.
CHAPTER 2 GENERAL FRAMEWORK 2.1
Markov processes: notations and definitions Let (E, r) be a complete, separable metric space, and let E be its Borel σ-
algebra of all Borel subsets of E. We will call the measurable space (E, BE ) a standard state space. The space IRd , with the Euclidian metric, is a complete, separable metric space, and define by Bd its Borel σ-algebra. The space where the trajectories of processes are considered is D[0, ∞), the space of right continuous functions having left hand side limits. This embedded with Skorokhod metric becomes a complete, separable metric space. Let also (C[0, ∞), k·k) be the space of continuous functions with the sup-norm, kxk = supt≥0 |x(t)|, x ∈ C[0, ∞). Let B(E) the Banach space of all bounded, Borel measurable functions. Endowed with the norm kϕk = supx∈E |ϕ(x)|, (B(E), k · k) is a Banach space. Let us consider the stochastic basis (Ω, F, (Ft )t≥0 , IP) where (Ft )t≥0 is a filtration of sub-σ-algebras of F, that is: Fs ⊆ Ft ⊆ F, for all s < t and t ≥ 0. The filtration is said to be complete if F0 contains all the IP-null sets. Set Ft+ = ∩s>t Fs , t ≥ 0. If for any t ≥ 0, Ft = Ft+ , then the filtration Ft , t ≥ 0 is said to be rightcontinuous. If a filtration is complete and right-continuous, we say that it satisfies the usual conditions. A mapping T : Ω → [0, ∞], such that {T ≤ t} ∈ Ft is called a stopping time. If T is a stopping time, we denote by FT the collection of all sets A ∈ F such that A ∩ {T ≤ t} ∈ Ft .
3
4 An (E, BE )-valued stochastic process x(t), t ∈ I, defined on Ω is adapted to the filtration if for any t ∈ I, x(t) is Ft -measurable. The set of values E is said to be the state (phase) space of the process x(t), t ≥ 0. Given a probability measure µ on (E, BE ), we define the probability measures IPµ , by Z µ(dx)IPx (B),
IPµ (B) = µIP(B) =
x ∈ E, B ∈ F
E
where IPx (B) := IP(B|x(0) = x). We denote by IEµ and IEx the expectations corresponding respectively to IPµ and IPx . A positive-valued function P (x, B), x ∈ E, B ∈ BE is called a Markov transition function or a Markov kernel if 1) for any fixed x ∈ E, P (x, ·) is a probability measure on (E, BE ) 2) for any fixed B ∈ BE , P (·, B) is a Borel measurable function. If P (x, E) ≤ 1 for a x ∈ E, then P is said to be a sub-Markov kernel. If for a fixed x ∈ E, the P (x, ·) is a signed measure, then it is said to be a signed kernel. In that case, we will suppose that the signed kernel P is of bounded variation, that is, |P |(x, E) < ∞. If E is a finite or countable set, we take BE = PE (the set of all subsets of E), the Markov kernel is determined by the matrix (P (i, j) : i, j ∈ E), with P P (i, B) = j∈B P (i, j), B ∈ BE .
5 2.1.1
Markov chains A time-homogeneous Markov chain associated to a Markov kernel P (x, B) is an
adapted sequence of random variables xn , n ≥ 0 defined on Ω, satisfying for every n ∈ IN, x ∈ E and B ∈ BE , the following relation:
IP(xn+1 ∈ B|Fn ) = IP(xn+1 ∈ B|xn ) =: P (xn , B),
(a.s.)
(2.1.1)
which is called the Markov property. In most cases, we consider the Markov property with respect to Fn := σ(xk , k ≤ n), n ≥ 0, the natural filtration generated by the chain xn , n ≥ 0. If the Markov property (2.1.1) is satisfied for any finite Fn -stopping time, then it is called strong Markov property and the chain is called a strong Markov chain. The product of two Markov kernels P and Q defined on (E, BE ), is also a R Markov kernel, defined by P Q(x, B) = E P (x, dy)Q(y, B). Let us denote by P n (x, B) = IP(xn ∈ B|x0 = x) = IP(xn+m ∈ B|xm = x) the n-step transition probability which is defined inductively by the above relation. Using Markov property one gets the following property: for any n, m ∈ IN,
n+m
IP
Z (x, B) =
n
m
Z
P (x, dy)P (y, B) = E
P m (x, dy)P n (y, B)
E
which is called the Chapman-Kolmogorov equation. A subset B ∈ BE , is called accessible from a state x ∈ E, if IPx (xn ∈ B, for some n ≥ 1) > 0 or equivalently P n (x, B) > 0. A Markov chain xn , n ≥ 0, is called Harris recurrent if there exists a σ-finite measure ψ on (E, BE ), with ψ(E) > 0, such that for any x ∈ E, for any A ∈ BE and ψ(A) > 0, IPx (∪n≥1 {xn ∈ A}) = 1.
6 If IPx (∪n≥1 {xn ∈ A}) is positive then the Markov chain is called ψ-irreducible. The Markov chain is said to be uniformly irreducible if for any A ∈ BE , supx IPx (τA > N ) → ∞ as N → ∞, where τA := inf{n ≥ 0 : xn ∈ A}. A Markov chain xn , n ≥ 0, is said to be d-periodic (d > 1), if there exists a cycle, that is, a sequence (C1 , ..., Cd ) of sets, Ci ∈ BE , 1 ≤ i ≤ d, with P (x, Cj+1 ) = 1, x ∈ Cj , 1 ≤ j ≤ d − 1 and P (x, C1 ) = 1, x ∈ Cd , such that - the set E\ ∪di=1 Ci is ψ-null - if (C10 , ..., Cd0 0 ) is another cycle, then d0 divides d and Ci0 differs from a union of d/d0 members of (C1 , ..., Cd ) only by a ψ-null set which is of type ∪i≥1 Vi , where, for any i ≥ 1, IPx (lim sup{xn ∈ Vi }) = 0. If d = 1 the the Markov chain is called aperiodic. A probability measure ρ on (E, BE ) is said to be a stationary distribution or invariant probability for the Markov chain xn , n ≥ 0 if for any B ∈ BE , Z ρ(B) =
ρ(dx)P (x, B). E
If a Markov chain is ψ-irreducible and has invariant probability, it is called positive, otherwise it is called null. If a Markov chain is Harris recurrent and positive it is called Harris positive. If a Markov chain is aperiodic and Harris positive it is called (Harris) ergodic. Proposition 2.1 ([3]) Let xn , n ≥ 0, be an ergodic Markov chain, then 1) for any probability measure α on (E, BE ), we have
kαP n − ρk → 0,
n→∞
7 2) for any ϕ ∈ B(E), n−1
1X lim ϕ(xk ) = n→∞ n k=0
Z ρ(dx)ϕ(x),
IPµ − a.s.
(2.1.2)
E
for any probability measure µ on (E, BE ). Let us denote by Π the stationary projector in B(E) defined by the stationary distribution ρ(B), B ∈ BE of the Markov chain xn as: Z
Z ρ(dx)ϕ(y)1(x) = ϕ1(x), ˆ
Πϕ(x) :=
ϕ ˆ :=
E
ρ(dx)ϕ(x), E
where 1(x) = 1 for all x ∈ E. Then Π2 = Π. Let us denote by P the operator of transition probabilities on B(E) defined by Z P ϕ(x) = IE[ϕ(xn+1 )|xn = x] =
P (x, dy)ϕ(y) E
and denote by P n the n-step transition operator corresponding to P n (x, B). The Markov property (2.1.1) can be represented in the following form
IE[ϕ(xn+1 )|Fn ] = P ϕ(xn ).
The Markov chain is uniformly ergodic if
sup k(P n − Π)ϕk → 0,
n → ∞.
(2.1.3)
kϕk≤1
Uniform ergodicity of the Markov chain xn implies the convergence of the series
R0 :=
∞ X n=0
[P n − Π]
(2.1.4)
8 which is called the potential operator of the Markov chain xn , n ≥ 0 (see [4]) and satisfies the property R0 [I − P ] = [I − P ]R0 = I − Π.
2.1.2
Continuous-time Markov processes Let us consider a family of Markov kernels {Pt = Pt (x, B), t ∈ IR+ }. Let
x(t) ≥ 0 be an adapted stochastic process with values in (E, BE ) defined on Ω. The process x(t), t ≥ 0, is said to be a time-homogeneous Markov process if for any fixed s, t ∈ IR+ and B ∈ BE ,
IP(x(t + s) ∈ B|Fs ) = IP(x(t + s) ∈ B|x(s)) = Pt (x(s), B),
(a.s.).
(2.1.5)
When the Markov property (2.1.5) holds for any finite stopping time τ , instead of a deterministic time s, we say that the Markov process x(t), t ≥ 0, satisfies the strong Markov property and that the process x(t) is a strong Markov process.
On the
Banach space B(E), the operator of transition probability Pt is defined by Z ϕ(y)Pt (x, dy), ϕ ∈ B(E)
Pt ϕ(x) = IEx (ϕ(x(t))) = E
This is a contractive operator, i.e., kPt ϕk ≤ kϕk. The Chapman-Kolmogorov equation is equivalent to the following
Pt Ps = Pt+s , for all t, s ∈ IR+
9 which is called semigroup property. The Markov process x(t), t ≥ 0 has a stationary distribution π if for any B ∈ BE , Z π(B) =
π(dx)Pt (x, B),
π(E) = 1, t ≥ 0.
E
The Markov process x(t), t ≥ 0 is ergodic if for every ϕ ∈ B(E) we have t
Z
1 lim t→∞ t
Z ϕ(x(s)) ds =
0
IPµ − a.s.
π(dx)ϕ(x),
(2.1.6)
E
for any probability µ on (E, BE ). The stationary projector Π, of an ergodic Markov process with stationary distribution π, is defined as Z Πϕ(x) :=
Z π(dx)ϕ(y)1I(x) = ϕ1 ˆ I(x),
E
ϕˆ :=
π(dx)ϕ(x), E
where 1I(x) = 1 for all x ∈ E. We have Π2 = Π. Let us consider a Markov process x(t), t ≥ 0 with trajectories in D[0, ∞) and semigroup {Pt , t ≥ 0}. The linear operator Q : B(E) → B(E) defined by 1 Qϕ = lim (Pt ϕ − ϕ), t↓0 t where the limit exists in norm, is called the infinitesimal operator. Let the domain of the operator D(Q) be the subset of B(E) for which the above limit exists. A Markov semigroup Pt , t ≥ 0 is said to be uniformly continuous on B(E), if
lim ||Pt − I|| = 0 t→0
10 where I is the identity operator on B(E). A time-homogeneous Markov process is said to be (purely) discontinuous or of jump type, if its semigroup is uniformly continuous. In that case, the process stays in any state for a positive (strict) time, and after leaving a state it moves directly to another one. This process we will call a jump Markov process. Let x(t), t ≥ 0 be a time-homogeneous jump Markov process. Let τn , n ≥ 0 be the jump times for which we have 0 = τ0 ≤ τ1 ≤ · · · ≤ τn ≤ · · · . The stochastic process xn , n ≥ 0 defined by xn = x(τn ), n ≥ 0 is called the embedded Markov chain of the Markov process x(t), t ≥ 0. Let P (x, B) be the transition probability of xn , n ≥ 0. The infinitesimal generator Q of the jump Markov process x(t), t ≥ 0, is Z P (x, dy)[ϕ(y) − ϕ(x)]
Qϕ(x) = q(x)
(2.1.7)
E
where the kernel P (x, B) is the transition kernel of the embedded Markov chain, and q(x), x ∈ E is the intensity of jumps function. If the Markov process x(t) has a stationary distribution π, then the embedded Markov chain xn has also a stationary distribution ρ having the following property: Z π(dx)q(x) = qρ(dx),
q :=
π(dx)q(x). E
Example 2.2 The infinitesimal generator of a Poisson process. Let S1 , S2 , ... be i.i.d. random variables such that IP(Sn > t) = e−λt , λ > 0. Let T1 = P S1 , Tn = S1 + · · · + Sn , n ≥ 2. The process Nt , t ≥ 0 defined by Nt = ∞ n=1 1I(Tn ≤t) is a Markov process with respect to the natural filtration Ft = σ{Ns : s ≤ t} and its
11 trajectories t → Nt are right continuous step functions with jumps of height 1 at Tn and they belong to the D[0, ∞). The random variable Nt has Poisson distribution IP(Nt = n) = e−λt
(λt)n , n!
IE(Nt ) = λt.
(2.1.8)
We can consider the time-homogeneous Poisson process Nt , t ≥ 0 as a Markov family on the state E = ZZ+ = {0, 1, 2, · · · } with the probability measure IPx being such that N0 = x, x ∈ E and the transition probability function
IP(t, x, {z}) =
e−λt
1 (λt)z−x , (z−x)!
z≥x
0,
z1
The processes with stationary independent increments satisfy the Markov property, that is, the transition probabilities are generated by the Markov semigroup
Γt ϕ(u) := IEϕ(u + x(t)).
(2.1.12)
16 The generator of the semigroup (2.1.12) with the cumulant (2.1.11) has the following representation σ2 IΓϕ(u) = aϕ (u) − ϕ00 (u) + 2 0
Z
[ϕ(u + v) − ϕ(u) − vϕ0 (u)1I(|v|≤1) ]H(dv).
IR
The most important processes with stationary independent increments are Brownian motion, Poisson process, and L´evy process. The standard Wiener process with the cumulant ψ(λ) = −σ 2 λ2 /2 has the generator IΓϕ(u) = σ 2 ϕ00 (u)/2. The Compound Poisson process x(t) =
Pν(t)
k=1 ξk ,
where νt , t ≥ 0 is a homoge-
neous Poisson process with intensity λ > 0 and ξk , k ≥ 1 is an i.i.d. sequence of real random variables, independent of ν(t), t ≥ 0, with common distribution function F , has the cumulant of the form Z ψ(λ) = λ
[eiλz − 1]F (dz)
IR
and the infinitesimal generator of the form Z [ϕ(u + v) − ϕ(u)]F (dv).
IΓϕ(u) = λ IR
One can associate uniquely a homogeneous Markov process to the process with independent increments x(t) by
ξ(t) := x(s + t) − x(s) + u s ≥ 0, u ∈ IRd .
17 The semigroup Γt ϕ(u) = IEu ϕ(ξ(t)) = IEϕ(u + x(t)) has the following remarkable property: let Sv : Bd → Bd be the shift operator, Sv ϕ(u) = ϕ(u + v); then Γt Sv ϕ(u) = Sv Γt ϕ(u) for any u, v ∈ IRd . The Markov processes with the property that their semigroup commutes with the operator Sv , v ∈ IRd are called homogeneous in space. Thus, processes with stationary independent increments can be identified in a certain sense with Markov processes that are homogeneous in both time and space.
2.1.5
Processes with locally independent increments The processes with locally independent increments considered are jump Markov
processes with drift and without diffusion part. These processes are also called weakly differentiable Markov processes [3] or piecewise-deterministic Markov processes [6]. It is worth noticing that such processes include strictly the independent increment processes. For their detailed presentation and applications see [6]. These processes are defined by the infinitesimal generator IΓ as follows
0
Z
IΓϕ(u) = a(u)ϕ (u) +
[ϕ(u + v) − ϕ(u) − vϕ0 (u)]Γ(u, dv)
(2.1.13)
IRd
with drift velocity a(u), the intensity kernel Γ(u, dv) such that Γ(u, dv) ∈ IR+ and satisfying the above conditions of the spectral measure. It is understood that when d > 1, we have vϕ0 (u) =
d X k=1
vk
∂ϕ (u). ∂uk
Let us be given the Euclidian space IRd with the Borel σ-algebra Bd and the compact measurable space (E, E). We consider the family of time-homogeneous Markov processes η(t; x), t ≥ 0, x ∈ E, with trajectories in D[0, ∞), with locally independent
18 increments. These processes take values in the Euclidian space IRd (d ≥ 1), and depend on the state x ∈ E, and their infinitesimal generators are given by
0
Z
[ϕ(u + v) − ϕ(u) − vϕ0 (u)]Γ(u, dv; x)
IΓ(x)ϕ(u) = a(u; x)ϕ (u) +
(2.1.14)
IRd
These processes will be used in Chapter 3 as switched processes. The drift velocity a(u; x) and the measure of the random jumps Γ(u, dv; x) depend on the state x ∈ E.
2.2 2.2.1
Weak convergence of stochastic processes Martingale characterization of Markov processes Let (Ω, F, (Ft )t≥0 , IP) be a probability space and let Mt , t ≥ 0 be a process on
Ω and adapted to the filtration (Ft )t≥0 . A real-valued process Mt , t ≥ 0 adapted to the filtration (Ft )t≥0 is called a martingale (submartingale, supermartingale) if (i) IE|Mt | < ∞ for all t ≥ 0 (ii) for any s < t,
IE(Mt |Fs ) = Ms , (IE(Mt |Fs ) ≥ Ms , IE(Mt |Fs ) ≤ Ms ), a.s.
The process Mt , t ≥ 0 is called a square integrable martingale if
sup IEMt2 < ∞ t≥0
If Mt , t ≥ 0 is a square integrable martingale then the process Mt2 , t ≥ 0 is an Ft -submartingale, and the Doob-Meyer decomposition [7] holds
Mt2 = hMt i + Nt
19 where Nt , t ≥ 0 is a martingale and the increasing process hMt i, t ≥ 0 is called the square characteristic of the martingale Mt . A process Mt , t ≥ 0 is called local martingale if there exists an increasing sequence of stopping times τn such that the stopped process Mtn := Mt∧τn , t ≥ 0 is a martingale for each n ≥ 1. Let x(t), t ≥ 0 be a Markov process with a standard state space (E, E), defined on Ω. Let Pt (x, B), x ∈ E, B ∈ E, t ≥ 0 be its transition function, and Pt , t ≥ 0, its strongly continuous semigroup defined on the Banach space B(E) of real-valued measurable functions defined on E. Let Q be the infinitesimal generator of the semigroup Pt , t ≥ 0, with the dense domain of definition D(Q) ∈ B. For any function ϕ ∈ D(Q) ⊂ B(E) and t > 0 we have the Dynkin formula t
Z
QPs ϕ(x) ds.
Pt ϕ(x) = ϕ(x) + 0
From this formula, using conditional expectation, we get Z IEx [ϕ(x(t)) − ϕ(x) −
t
Qϕ(x(s)) ds] = 0. 0
Thus, the process Z µ(t) := ϕ(x(t)) − ϕ(x) −
t
Qϕ(x(s)) ds
(2.2.1)
0
is an Ftx = σ(x(s), s ≤ t)-martingale. The following theorems gives the martingale characterization of Markov processes. Theorem 2.7 ([8]) Let (E, E) be a standard state space and let x(t), t ≥ 0 be a stochastic process on it, adapted to the filtration (Ft )t≥0 . Let Q be the generator of
20 a strongly continuous semigroup Pt , t ≥ 0, on the Banach space B(E), with dense domain D(Q) ∈ B(E). If for any ϕ ∈ D(Q), the process µ(t), t ≥ 0, defined by (2.2.1) is an Ft -martingale, then x(t), t ≥ 0 is a Markov process generated by the infinitesimal generator Q. The process x(t), t ≥ 0 is said to solve the martingale problem for the infinitesimal generator Q. Let xn , n ≥ 0 be a Markov chain on a measurable state space (E, E) induced by a stochastic kernel P (x, B), x ∈ E, B ∈ E. Let P be the corresponding transition operator defined on the Banach space B(E). Let us construct now the following martingale as a sum of martingale differences n−1 X µn = [ϕ(xk+1 ) − IE(ϕ(xk+1 )|Fk )]. k=0
By using the Markov property and the rearrangement of terms the martingale is written as µn = ϕ(xn ) − ϕ(x0 ) −
n−1 X
[P − I]ϕ(xk ).
(2.2.2)
k=1
This representation of the martingale is associated with a Markov chain characterization. Proposition 2.8 Let xn , n ≥ 0 be a sequence of random variables taking values in a measurable space (E, E) and adapted to the filtration Fn , n ≥ 0. Let P be a bounded linear positive operator on the Banach space B(E) induced by a transition probability kernel P (x, B) on (E, E). If for every ϕ ∈ B(E), the right hand side of (2.2.2) is a martingale then the sequence xn , n ≥ 0 is a Markov chain with transition probability kernel P (x, B) induced by the operator P.
21 2.2.2
Reducible-invertible operators A bounded linear operator Q : B(E) → B(E) is called reducible-invertible if
the Banach space B(E) can be decomposed in direct sum of two subspaces:
B(E) = NQ ⊕ RQ ,
dimNQ ≥ 1
(2.2.3)
where NQ := {ϕ : Qϕ = 0} is the null space, and RQ := {ψ : Qϕ = ψ, ϕ ∈ B(E)} is the range space. [4] The decomposition (2.2.3) generates the projector Π on NQ
Πϕ :=
ϕ, ϕ ∈ NQ 0, ϕ ∈ RQ
The operator I − Π is a projector on the subspace RQ
(I − Π)ϕ :=
0, ϕ ∈ NQ ϕ, ϕ ∈ RQ
where I is the identity operator in B(E). Let Q be a reducible-invertible operator. The operator
R0 := (Q + Π)−1 − Π
is called the potential operator of Q. The potential can also be written as R0 = (Π − Q)−1 − Π.
(2.2.4)
22 Proposition 2.9 ([4]) The following equalities hold:
QR0 = R0 Q = I − Π ΠR0 = R0 Π = 0 QR0n = R0n Q = R0n−1 , n ≥ 1.
Proposition 2.10 ([4]) Let Q : B(E) → B(E) be a reducible-invertible operator. The the equation Qϕ = ψ
(2.2.5)
under the solvability condition Πψ = 0 has a general solution of the form:
ϕ = −R0 ψ + ϕ0 , ϕ0 ∈ NQ .
(2.2.6)
If moreover, the condition Πϕ = 0 holds then the equation has a unique solution represented by ϕ = −R0 ψ.
(2.2.7)
Note that for a uniformly ergodic Markov chain the operator Q := P − I is reducibleinvertible. For a uniformly ergodic Markov process with infinitesimal generator Q and semigroup Pt , t ≥ 0 the potential R0 is a bounded operator defined by ∞
Z
(Pt − Π)dt,
R0 = 0
where the projector Π is defined as Z Πϕ(u) =
ρ(dx)ϕ(x)1I(x), E
(2.2.8)
23 with ρ(B), B ∈ E the stationary distribution of the Markov chain. Let Qε : B(E) → B(E), ε > 0, be a family of linear operators. We say that it converges in the strong sense as ε → 0 to the operator Q if
lim kQε ϕ − Qϕk = 0
ε→0
for any ϕ ∈ B(E). We use the notation s − limε→0 Qε = Q.
2.2.3
Perturbation of reducible-invertible operators Let Q be a bounded reducible-invertible operator on B(E). The problem of asymptotic singular perturbation of a reducible-invertible oper-
ator ([4]) Q with small parameter ε > 0 and perturbing operator Q1 is formulated in the following way: we have to construct the vector ϕε = ϕ + εϕ1 which realizes the asymptotic representation
[ε−1 Q + Q1 ]ϕε = ψ + εθε
(2.2.9)
for some given vector ψ and uniformly bounded in norm vector θε , kθε k ≤ c as ε → 0.
Proposition 2.11 ([4]) Let the following conditions be satisfied: (a) the bounded operator Q is reducible-invertible (b) the perturbing operator Q1 is closed with a dense domain B0 (E) ⊆ B(E) ˆ 1 , defined by ΠQ1 Π = Q ˆ 1 Π, has the inverse operator (C) the contracted operator Q ˆ −1 Q 1
24 Then the asymptotic representation (2.2.9) is realized by the vectors which are determined by
ˆ 1 ϕˆ = ψˆ Q ϕ1 = R0 (ψ − Q1 ϕ) θε = Q1 R0 (ψ − Q1 ϕ) R0 = (Q + Π)−1 − Π
Proof: The left hand side of (2.2.9) can be represented in the form:
[ε−1 Q + Q1 ](ϕ + εϕ1 ) = ε−1 Qϕ + (Qϕ1 + Q1 ϕ) + εQ1 ϕ1
In order to realize the right-hand side of (2.2.9) we set
Qϕ = 0 Qϕ1 + Q1 ϕ = ψ Q1 ϕ1 = θε
The first relation implies that ϕ ∈ NQ , from the third we get that θε = Q1 ϕ1 is independent of ε. The main problem is to solve the equation
Qϕ1 = ψ − Q1 ϕ
(2.2.10)
The solvability of (2.2.10) is Π(ψ − Q1 ϕ) = 0. Since Πϕ = ϕ we have
Πψ = ΠQ1 Πϕ
(2.2.11)
25 ˆQ , introduce the contracted operator Q ˆ 1 which is deterOn the contracted space N ˆ ∈N ˆ 1 Π and set ψˆ := Πψ ˆQ . Thus, the relation Q ˆ 1 ϕˆ = ψˆ mined by relation ΠQ1 Π = Q ˆQ . establishes a connection between two vectors ϕˆ and then ψˆ in N The solution of (2.2.10) is ϕ1 = R0 (ψ − Q1 ϕ) where R0 is the potential operator of Q, and θε = Q1 ϕ1 = Q1 R0 (ψ − Q1 ϕ).
2.2.4
Weak convergence Consider the functional space DE [0, ∞) as the space of sample paths of stochas-
tic processes. The main type of convergence of stochastic processes is the weak convergence of finite-dimensional distributions, that is for the family of stochastic processes xε (t), t ≥ 0, ε > 0 there is a process x(t), t ≥ 0 such that
lim IEϕ(xε (t1 ), ..., xε (tN )) = IEϕ(x(t1 ), ..., x(tN ))
ε→0
for any test function ϕ(x1 , ..., xN ) ∈ Cb (E N ), the space of real-valued bounded continuous functions on E N , and for any finite set {t1 , ..., tN } ∈ S, where S is a dense set in IR+ . We will denote this convergence by D
xε (t) ⇒ x(t),
ε → 0.
A more general type of convergence of stochastic processes is the weak convergence of associated measures, that is, the probability distributions
Pε (B) := IP(xε (·) ∈ B),
B ∈ DE
26 where DE is the Borel σ-algebra on DE [0, ∞) converge weakly to P (B). Therefore, the following limit holds lim IEϕ(xε (·)) = IEϕ(x(·))
ε→0
for all ϕ ∈ Cb (DE [0, ∞)), the space of all bounded continuous real-valued functions on DE [0, ∞). We will denote the weak convergence of processes and that of the associated measures by xε ⇒ x,
Pε ⇒ P,
ε → 0.
The connection between the above two types of convergence of stochastic processes is realized by means of the relative compactness of the family of associated measures (Pε , ε > 0), i.e., for any sequence (Pεn ) of (Pε , ε > 0) there exists a weak convergent subsequence (Pε0n ) ⊂ (Pεn ). Theorem 2.12 ([9]) Let xε (t), t ≥ 0, ε > 0, and x(t), t ≥ 0, be processes with sample paths in DE [0, ∞). D
(a) if xε ⇒ x as ε → 0, then xε ⇒ x, for the set S = {t > 0 : IP(x(t) = x(t− )) = 1}. (b) if (xε , ε > 0) is relatively compact and there exists a dense set S ⊂ IR+ such D
that xε ⇒ x as ε → 0 on the set S, then xε ⇒ x as ε → 0. Theorem 2.13 (Prohorov’s theorem [10]) The relative compactness of a family of probability measures on DE [0, ∞) is equivalent to the tightness of this family. For Markov processes, the best way to verify the relative compactness is to use the martingale characterization approach of Stroock and Varadhan, described in the following theorem. Theorem 2.14 [8] Let the family of IRd -valued stochastic processes xε (t), t ≥ 0, ε > 0 such that:
27 H1 : for all nonnegative functions ϕ ∈ C0∞ (IRd ) there exists a constant Aϕ ≥ 0 such that (ϕ(xε (t)) + Aϕ (t), Ftε ) is a nonnegative submartingale H2 : given a nonnegative ϕ ∈ C0∞ (IRd ), the constant Aϕ can be chosen such that it does not depend on the translates of ϕ. Then, under the initial condition
lim sup IP(|xε (0)| ≥ c) = 0,
c→∞ 0 0 is relatively compact. It is worth noticing that the relative compactness conditions formulated in CE [0, ∞) are valid for the space DE [0, ∞). In order to verify the weak convergence of a family of stochastic processes in DE [0, ∞) we have to establish the relative compactness and the weak convergence of finite-dimensional distributions. For a family of Markov processes these are derived from the martingale characterization of Markov processes and the convergence of their infinitesimal generators. Theorem 2.15 ([5]) Let xε (t), t ≥ 0, ε > 0 be a family of stochastic processes with values in the standard phase space (E, E) and adapted to the filtration of σ-algebras Ft , t ≥ 0, ε > 0. Let x(t) be a homogeneous stochastically continuous Markov process whose transition probabilities are uniquely determined by the generating operator L which is defined on a dense domain DL in the space Cb (E) and in addition assume that C0 (E) is contained in the closure of DL . Suppose that for all ϕ ∈ DL
ε
Z
ϕ(x (t)) − 0
t
Lϕ(xε (s)) ds = µεt + ψtε
(2.2.12)
28 where (µεt , Ft , t ≥ 0) is a family of square integrable martingales with the quadratic characteristic admitting representation in the form Z
ε
hµ it =
t
ζ ε (s) ds
0
with random functions ζ ε (s), s ≥ 0 satisfying
IE sup |ζ ε (s)|1+δ ≤ c < ∞
(2.2.13)
0≤t≤T
and the process ψtε , ε > 0 verifying sup IE sup |ψtε | → 0 as
ε≤ε0
ε0 → 0
0≤t≤T
Then xε ⇒ x as ε → 0 and the limiting process x(t), t ≥ 0 is characterized by the martingale Z ϕ(x(t)) −
t
Lϕ(x(s)) ds = µ0t .
0
If the condition (2.2.13) is replaced by
IE sup |ζ ε (s)| → 0
as
ε→0
0≤t≤T
then the limiting process x(t) is given by the solution of the deterministic evolutionary equation dϕ(x(t)) = Lϕ(x(t)). dt 2.3
Large deviation principle Let {X n , n ∈ IN} be a sequence of random variables defined on a probability
space (Ω, F, IP) and taking values in a complete, separable metric space (X , d). The
29 theory of large deviations focuses on random variables {X n } for which the probabilities IP{X n ∈ A} converge to 0 exponentially fast for a class of Borel sets A. The exponential decay rate of these probabilities is expressed in terms of a function I mapping X into [0, ∞]. This function is called a rate function if it has compact level sets, i.e., for each M < ∞ the level set {x ∈ X : I(x) ≤ M } is a compact subset of X . A function having compact level sets is automatically lower semicontinuous and it attains its infimum on any nonempty closed set. Definition 2.16 The sequence {X n } is said to satisfy the large deviation principle on X with rate function I if the following two conditions hold: (a) Large deviation upper bound: for each closed subset F of X
lim sup n→∞
1 log IP{X n ∈ F } ≤ − inf I(x) x∈F n
(b) Large deviation lower bound: for each open subset G of X
lim inf n→∞
1 log IP{X n ∈ G} ≥ − inf I(x). x∈G n
Definition 2.17 Let I be a rate function on X . The sequence {X n } is said to satisfy the Laplace principle on X with rate function I if for all bounded continuous functions h lim
n→∞
1 log IE{exp[−nh(X n )]} = − inf {h(x) + I(x)}. x∈X n
(2.3.1)
The large deviation principle can be formulated in terms of the Laplace principle by using the following fundamental results. Theorem 2.18 (Varadhan’ theorem [11]) Assume that the sequence {X n } satisfies the large deviation principle on X with rate function I. Then for all bounded con-
30 tinuous functions h mapping X into IR, the equation (2.3.1) holds and therefore the sequence {X n } satisfies the Laplace principle on X . A converse of Varadhan’s theorem is also due to Varadhan and it states: Theorem 2.19 ([11]) The Laplace principle implies the large deviation principle with the same rate function. More precisely, if I is a rate function on X and the limit 2.3.1 is valid for all bounded continuous functions h, then {X n } satisfies the large deviation principle on X with rate function I. Definition 2.20 The sequence {X n } is said to be exponentially tight if for each M ∈ (0, ∞) there exists a compact set K such that
lim sup n→∞
1 log IP{X n ∈ K c } ≤ −M. n
The next theorem is another converse to Varadhan’s theorem due to Bryc. Theorem 2.21 (Bryc’s theorem [12]) If the sequence {X n } is exponentially tight and if the limit Λ(h) := lim log IE{exp[−nh(X n )]} n→∞
exists for all h bounded continuous functions, then {X n } satisfies the large deviation principle on X with rate function
I(x) := − inf {h(x) + Λ(h)} h∈Cb (X )
and Λ(h) = − inf x∈X {h(x) + I(x)}. If a sequence of random variables satisfies the Laplace principle (or equivalently the LDP) with some rate function, then the rate function is unique. Theorem 2.22 ([11]) Assume that {X n } satisfies the Laplace principle on X with rate function I and with rate function J. Then I(ξ) = J(ξ) for all ξ ∈ X .
31 The continuous image of a sequence of random variables satisfying the Laplace principle (or LDP) also satisfies the Laplace principle (or LDP). This property is illustrated in the next theorem. Theorem 2.23 (Contraction Principle [11]) Let X and Y be Polish spaces, f : X → Y a continuous function and I a rate function on X . (a) For each y ∈ Y J(y) := inf{I(x) : x ∈ X , y = f (x)}. is a rate function on Y, where the infimum over the empty set is taken as ∞. (b) If {X n } satisfies the Laplace principle on X with rate function I, then {f (X n )} satisfies the Laplace principle on Y with rate function J. Another important property of the Laplace principle is that it is preserved under superexponential approximation. Theorem 2.24 ([13]) Let X n and Y n be random variables defined on the same probability space (Ω, F, IP) and taking values in X . Assume that {X n } satisfies the Laplace principle on X with the rate function I and that {Y n } is superexponentially closed (exponentially equivalent) to {X n } in the following sense: for each δ > 0
lim sup n→∞
1 log IP{d(Y n , X n ) > δ} = −∞. n
Then {Y n } satisfies the Laplace principle on X with the same rate function I.
2.3.0.1
Examples
The role of large deviations in evaluating the difference between the timeaveraged value and its expectations when this difference is significant is best understood through the following example. Suppose that X1 , X2 , ..., Xn , ... is a sequence
32 of i.i.d. random variables, for instance, normally distributed with mean zero and variance 1. Then IE[eθ(X1 +···Xn ) ] = en(θ
2 /2)
.
On the other hand, IE[eθ(X1 +···Xn ) ] = IE[enθ(Sn /n) ] and since
Sn n
converges to zero almost
sure (the law of large numbers), it is further equal to IE[eo(n) ] which is different of eo(n) . Indeed, assuming that θ > 0, for any a > 0,
IE[eθSn ] ≥ enθa IP{
Sn 2 2 ≥ a} = enθa e−(na /2)+o(n) = en(aθ−a /2)+o(n) . n
Since a > 0 is arbitrary chosen, 2 /2)+o(n)
IE[eθSn ] ≥ en supa>0 (θa−a
= enθ
2 /2+o(n)
.
The simplest example for which one can calculate probabilities of large deviations is coin tossing ([14]). 1. Large deviations for sum of i.i.d. random variables Consider a sequence of independent tosses of a fair coin. The probability of k heads in n tosses is P (n, k) = Cnk 2−n '
√
√ 2πe−n nn+1/2 2√−n 2πe−(n−k) (n−k)n−k+1/2 2πe−k kk+1/2
by
Stirling’s approximation. Therefore, log P (n, k) ' − 21 log(2π) − 21 log n − (n − k + 12 ) log(1 − nk ) − (k + 12 ) log nk − n log 2. If k ' nx, then log P (n, k) ' −n[log 2 + x log x + (1 − x) log(1 − x)] + o(n) = −nH(x) + o(n)
33 where H(x) is the Kullback-Leibler information or relative entropy of Binomial(x, 1 − x) with respect to Binomial( 12 , 21 ). Also, if fi are the observed frequencies in n trials of a multinomial with probabilities {pi } for the individual cells, then
P (n, p1 , ..., pk ; f1 , ..., fk ) =
n! pf11 · · · pfkk . f1 ! · · · fk !
A similar calculation using Stirling’s approximation yields, assuming fi ≈ nxi ,
log P (n, p1 , · · · , pk ; f1 , · · · , fk ) = −nH(x1 , ..., xk ; p1 , ..., pk ) + o(n)
where H(x, p) is again the Kullback-Leibler information number
H(x) =
k X
xi log
i=1
xi . pi
Any probability distribution can be approximated by one that is concentrated on a finite set and the empirical distribution from a sample of size n will have a multinomial distribution. One therefore expects that the probability P (n, µ, ν) P that the empirical distribution n1 ni=1 δXi of n independent observations from a distribution µ is close to ν should satisfy
log P (n, µ, ν) = −nH(nu|µ) + o(n)
where H(ν|µ) is again the Kullback-Leibler information number Z
dν log dν = dµ
Z
dν dν log dµ. dµ dµ
(2.3.2)
34 The quantity H(ν|µ) is also called the relative entropy of ν with respect to µ. This is rigorous proven by Sanov. Theorem 2.25 (Sanov [15]) Let µ be a probability measure on the Polish space P ˜n ∈ X . Let for X = (X1 , ..., Xn ) ∈ X n define Ln (X) := n1 nm=1 δXm . Let µ M1 (M1 (X )) be the distribution under µn of the function Ln . Then the family {˜ µn : n ≥ 1} satisfies the large deviation principle with rate function I(·) = H(·|µ) where
H(ν|µ) =
R f log f dµ ∞
if ν µ and f =
dν dµ
(2.3.3)
otherwise
is the relative entropy of ν with respect to µ. A classical example for which the rate function is evaluated is Cram´er theorem. Let X1 , X2 , ... be i.i.d. d-dimensional random vectors with X1 distributed according to the probability law µ ∈ M1 (IRd ). Consider the empirical means P Sˆn := n1 ni=1 Xi . The cumulant generating function associated with the law µ is defined as Λ(λ) := log IE[ehλ,X1 i ] where hλ, xi :=
Pd
j=1
(2.3.4)
λj xj is the usual scalar product in IRd .
Let µn be the law of Sˆn and x¯ := IE[X1 ]. When x¯ exists and is finite and IE[|X1 − x¯|2 ] < ∞, then Sˆn converges in probability to x¯. Therefore µn (F ) → 0 as n → ∞ for any closed set F such that x¯ ∈ / F. Theorem 2.26 (Cram´er’s theorem [12]) Assume that Λ(λ) < ∞ for all |λ| small enough. Then the sequence of measures {µn } satisfies the LDP on IRd , that is 1 log µn (A) = − inf Λ? (x). n→∞ n x∈A lim
35 where the rate function
Λ? (x) := sup {hλ, xi − Λ(λ)} λ∈IRd
is the Fenchel-Legendre transform of Λ(λ). 2. Sample path large deviations for random walk and Brownian motion Let X1 , X2 , ... be a sequence of i.i.d. random variables taking values in IRd with the cummulant generating function Λ(λ) defined in (2.3.4) being finite for all λ ∈ IRd . Let [nt]
1X Zn (t) = Xi , n i=1
0 ≤ t ≤ 1,
and let µn be the law of Zn (·). Then Theorem 2.27 (Mogulskii’s theorem [16]) The measures µn satisfy the large deviation principle with rate function
I(φ) =
R 1 Λ? (φ0 (t)) dt, if φ ∈ A0 [0, 1] 0 ∞
otherwise
where Λ? is the Fenchel-Legendre transform of Λ(·). From this, another important result is derived, the large deviation principle for Brownian motion. Consider a standard Brownian motion (B(t), t ∈ IR+ , and Xn (t) =
B(t) √ , n
where n is a large parameter. Since Xn (t) − Xn (s) are Gaussian
with mean zero and variance with rate function
x2 . 2(t−s)
t−s , n
the sequence of random variables verify LDP
Let us now consider a vector (Xn (ti ), i = 1, 2, ...k),
where t1 < t2 ... < tk . By the property of independence of increments, the kdimensional vector (Xn (ti ) − Xn (ti−1 ), i = 1, 2, ..., k), t0 = 0, verify the LDP P x2 with rate function 12 ki=1 ti −tii−1 . Therefore, by the contraction principle, the
36 vectors (Xn (ti ), i = 1, 2, ..., k) verify the LDP with rate
1 2
(xi −xi−1 )2 . ti −ti−1
Pk
i=1
This
means that for small enough ε > 0 and large n k
1 1X log IP{|Xn (ti ) − xi | ≤ ε, i = 1, 2, ..., k} ≈ − n 2 i=1
xi − xi−1 ti − ti−1
2 (ti − ti−1 ).
Therefore, if (x(t), t ∈ [0, 1]) is an absolutely continuous function, then by considering finer and finer subdivisions of the [0, 1] one should expect that 1 1 log IP{|Xn (t) − x(t)| ≤ ε}| ≈ − n 2 so that the rate function should be
1 2
R1 0
Z
1
x0 (t)2 dt,
0
x0 (t)2 dt for absolutely continuous func-
tions starting at zero. The right-hand side tends to −∞ if x is either not absolutely continuous or if it does not start at zero, rendering the action functional equal infinity in these cases. Theorem 2.28 (Schilder’s theorem [12]) Let A0 ([0, 1]) denote of all absolutely continuous functions φ satisfying φ(0) = 0. Then Xn (t) :=
√1 B(t) n
satisfies the
large deviation principle on C[0, 1]) with rate function
I(φ) =
1 2
R1 0
φ0 (t)2 dt, if x ∈ A0 ([0, 1])
∞
if x ∈ C([0, 1]) A0 ([0, 1])
This statement can be generalized to Itˆo diffusions by applying various contraction principles. Let B be the standard Brownian motion and Xtε be the diffusion process, that is the unique solution of the stochastic differential equation
dXtε = b(Xtε )dt +
√
εdBt
0 ≤ t ≤ 1,
X0ε = 0,
37 where b : IR → IR is a uniformly continuous function. Theorem 2.29 (Freidlin-Wentzell theorem [12]) The family {Xtε } satisfies the LDP in C0 ([0, 1]) with rate function
I(φ) =
1 2
R1
∞
0
|φ0 (t) − b(φ(t))|2 dt, if f ∈ A0 ([0, 1]) otherwise
CHAPTER 3 LARGE DEVIATION PRINCIPLE FOR AVERAGE LIMIT THEOREMS The main mathematical object of this chapter is a family of coupled Markov process (ξ(t), x(t)), t ≥ 0 called the switched and switching processes, respectively. The switched process describes the evolution of the system and it is a stochastic functional of the process η(t; x), t ≥ 0, x ∈ E with locally independent increments [5]. In order to reduce the complexity of the phase space, the switching processes that describe the random changes in the evolution of the system, are jump Markov 0 processes considered in a split space E = ∪N k=1 Ek , Ek ∩ Ek0 = ∅, k 6= k with non-
communicating components, and having the ergodic property on each class Ek . By introducing the parameter > 0 one defines a jump Markov process on the split phase space with small transition probabilities between the states of the system and further merges the classes Ek , k = 1, 2, · · · , N into distinct states k, 1 ≤ k ≤ N . The average limit theorem of the stochastic additive functional with fast timescaling switching process is obtained by using the martingale characterization [8] and a solution of the singular perturbation problem for reducible-invertible operators [4]. We are interested in finding the large deviation principle for this sequence of stochastic additive functionals. Using the weak convergence approach of Dupuis and Ellis [13], a large deviation principle is derived for a sequence of random walks constructed such that they have the same distribution as the linear interpolation sequence of samples of stochastic additive functionals.
38
39 3.1
Average approximation
3.1.1
Stochastic additive functionals
Definition 3.1 An IRd × E-valued coupled stochastic process (ξ(t), x(t)), t ≥ 0 is called a Markov additive process (see [17], [18] for details) if (a) the coupled process (ξ(t), x(t)), t ≥ 0 is a Markov process; (b) on {ξ(t) = u} we have a.s. the following relation
IP(ξ(t + s) ∈ A, x(t + s) ∈ B|Ft ) = IP(ξ(t + s) − ξ(t) ∈ A − u, x(t + s) ∈ B|x(t)))
Example 3.2 Sums of i.i.d. random vectors in IRd . Example 3.3 A Markov renewal process ξn = τn , xn , n ≥ 0 is a Markov additive process with additive component ξn . Definition 3.4 The Markov additive process (ξ(t), x(t)), t ≥ 0 in IRd ×E with Markov switching x(t), t ≥ 0, defined by the relation Z
t
η(ds; x(s)), t ≥ 0,
ξ(t) = ξ0 +
(3.1.1)
0
is called a stochastic additive functional. The family of cadlag Markov processes η(t; x), t ≥ 0, x ∈ E, parameterized by x, are such that η(t; x(t)) is measurable, are of locally independent increment processes determined by their infinitesimal generators
0
Z
IΓ(x)ϕ(u) = a(u; x)ϕ (u) + IRd
[ϕ(u + v) − ϕ(u) − vϕ0 (u)]Γ(u, dv; x)
(3.1.2)
40 where the positive kernels Γ(u, dv; x), x ∈ E, satisfy the following conditions: the functions
Λ(u; x) := Γ(u, IRd ; x) Z b(u; x) := vΓ(u, dv; x) d ZIR C(u; v) := vv ? Γ(u, dv; x) IRd
are continuous and bounded on u ∈ IRd , and uniformly continuous and bounded on x ∈ E. The product aϕ0 stands for the inner product ha, ∇ϕi in IRd . The switching jump Markov process x(t), t ≥ 0 is defined by its infinitesimal generator Q as Z P (x, dy)[ϕ(y) − ϕ(x)],
Qϕ(x) = q(x)
(3.1.3)
E
where the kernel P (x, B) is the transition kernel of the embedded Markov chain, and q(x), x ∈ E is the intensity of jumps function. Lemma 3.5 The Markov additive process (ξ(t), x(t)), t ≥ 0 is determined by the infinitesimal generator
ILϕ(u, x) = Qϕ(u, x) + IΓ(x)ϕ(u, x).
(3.1.4)
41 Proof: We have:
IE[ϕ(u(t + ∆t), x(t + ∆t))|ξ(t) = u, x(t) = x] = IE[ϕ(u + ∆ξ(t), x(t + ∆t))|ξ(t) = u, x(t) = x] = IE[ϕ(u + ∆ξ(t), x(t + ∆t))|x(t) = x] = IE[ϕ(u + ∆ξ(t), x(t + ∆t))1I{θx >∆t} |x(t) = x] + IE[ϕ(u + ∆ξ(t), x(t + ∆t))1I{θx 0 with the standard phase space (E, E), on the split phase space
E=
N [ k=1
Ek ,
Ek
\
Ek0 = ∅,
k 6= k 0
42 given by the infinitesimal generator Z
Q ϕ(x) = q(x)
P (x, dy)[ϕ(y) − ϕ(x)].
(3.1.5)
E
The phase merging algorithm is considered under the following assumptions: A1. The stochastic kernel in (3.1.5) is represented in the following form
P (x, B) = P (x, B) + P1 (x, B)
where the stochastic kernel P (x, B) is coordinated with the splitting as follows:
P (x, Ek ) = 1Ik (x) :=
1, x ∈ Ek , 0, x 6∈ Ek
A2. The Markov supporting process x(t), t ≥ 0 on the state space (E, E), determined by the generator Q given in (3.1.3) is supposed to be uniformly ergodic in every class Ek , 1 ≤ k ≤ N , with the stationary distribution πk (dx), 1 ≤ k ≤ N , satisfying the following relations Z πk (dx)q(x) = qk ρk (dx),
qk =
πk (dx)q(x), Ek
Z ρk (B) =
ρk (dx)P (x, B),
ρk (Ek ) = 1.
Ek
The perturbing operator P1 (x, B) is a signed kernel which satisfies the conservative condition P1 (x, E) = 0.
43 A3. The average exit probabilities satisfy the following condition Z ρk (dx)P1 (x, E\Ek ) > 0,
pˆk :=
1 ≤ k ≤ N.
Ek
Introduce the merging function m(x) = k, x ∈ Ek , 1 ≤ k ≤ N, and the merged process xˆ (t) := m(x (t/)),
t≥0
(3.1.6)
on the merged phase space Eˆ = {1, ..., N }. The phase merging principle establishes the weak convergence of the above process to the limit Markov process xˆ(t) (a variant of Theorem 4.1 in [5]). Theorem 3.6 (Ergodic phase merging principle) Under the assumptions A1 − A3, the following weak convergence holds
xˆ (t) ⇒ xˆ(t)
as
→ 0.
The limit merged Markov process xˆ(t), t ≥ 0, on the merged state space Eˆ is determined by the generator matrix
ˆ = (ˆ Q qkr ; 1 ≤ k, r ≤ N ),
with entries Z qˆkr = qˆk pkr ,
pkr =
ρk (dx)P1 (x, Er ),
1 ≤ k, r ≤ N
(3.1.7)
Ek
where ρk is the stationary distribution of the corresponding embedded Markov chain.
44 Proof: Since x (t) is a jump Markov process with infinitesimal generator Q+Q1 , then x ( t ) has the generator Q := −1 Q + Q1 . Using the martingale characterization of x ( t ) we have that µt
:= ϕ
Z t s t x( ) − Q ϕ x ( ) ds 0
with ϕ ∈ D(Q ), is a martingale. Let us consider the test functions of the form ϕ = ϕ +ϕ1 . Q is a reducible-invertible operator, so using Proposition 2.11 the asymptotic representation
(−1 Q + Q1 )(ϕ + ϕ1 ) = ψ + θ
is realized by
ˆ 1 ϕˆ = ψˆ Q ϕ1 = R0 (ψ − Q1 ϕ) θ = Q1 R0 (ψ − Q1 ϕ) R0 = (Q + Π)−1 − Π ˆ 1Π ΠQ1 Π = Q Let us define ϕˆ : Eˆ → IR by ϕ(m(x)) ˆ = ϕ(x). Then ϕ = ϕ(m(x)) ˆ + ϕ1 (x) and ˆ 1 ϕ(m(x)) (−1 Q + Q1 )(ϕ + ϕ1 ) = Q ˆ + θ .
45 This implies that
µt
Z t Z t s s t t ˆ Q1 ϕˆ m(x ( )) ds − θ x ( ) ds := ϕˆ m(x ( )) + ϕ1 x ( ) − 0 0
is a martingale. Therefore Z t t s ˆ ϕˆ m(x ( )) − Q1 ϕˆ m(x ( )) ds = µt + ψt , 0 where Z
t
s θ x ( ) ds − ϕ1 (xt ). = 0 Applying Theorem 2.15 we get that m x ( t ) ⇒ xˆ(t) and xˆ(t) is the solution of the ψt
martingale problem t
Z
ˆ 1 ϕ(ˆ Q ˆ x(s)) ds
µ ˆt = ϕ(ˆ ˆ x(t)) − 0
ˆ 1 determining the merged Markov process xˆ(t) on the with infinitesimal generator Q ˆ merged phase space E. ˆ 1 is defined by Q ˆ 1 Π = ΠQ1 Π. Therefore, The contracted operator Q
ΠQ1 Πϕ(x) = ΠQ1
N X
ϕˆk 1Ik (x) =
k=1
=
N X
R Er
Z
ϕˆk ΠQ1 (x, Ek ) =
k=1
Ek
ϕˆk
N X r=1
πr (dx)Q1 (x, Ek ), 1 ≤ r, k ≤ N . Thus
ΠQ1 Πϕ(x) =
N X r=1
1Ir (x)
N X k=1
Q1 (x, dy)
ϕˆk Π
k=1 N X
k=1
where qˆrk :=
N X
qˆrk ϕˆk
qˆrk 1Ir (x)
46 ˆ 1 is determined by the matrix and the contracted operator Q ˆ 1 := (ˆ Q qrk ; 1 ≤ r, k ≤ N ) N X ˆ ˆ ˆ Q1 ϕˆ = ψ, ψr := qˆrk ϕˆk ,
1 ≤ r ≤ N.
k=1
Let us consider the stochastic additive functionals ξ ε (t), t ≥ 0 in the following scheme: ε
t
Z
ε
η
ξ (t) = ξ (0) +
ε
0
s ds; x ( ) , ε ε
t ≥ 0, ε > 0.
(3.1.8)
where the process η ε (t; x), t ≥ 0, ε > 0, x ∈ E is given by the infinitesimal generators
ε
0
ε
IΓ (x)ϕ(u) = a (u; x)ϕ (u) + ε
−1
Z
[ϕ(u + εv) − ϕ(u) − εvϕ0 (u)]Γε (u, dv; x). (3.1.9)
IRd
The Markov additive process (ξ ε (t), xε (t)), t ≥ 0, ε > 0 is determined by the infinitesimal generator
ILε ϕ(u, x) = Qε ϕ(u, x) + IΓε (x)ϕ(u, x).
(3.1.10)
We want to prove the weak convergence of the stochastic system with Markov switching in the series scheme, (ξ ε (t), xε (t)), t ≥ 0, ε > 0. For this we use the singular perturbation problem for reducible-invertible operators and the martingale characterization approach of Stroock and Varadhan to find the relative compactness of the process. The coupled Markov process is characterized by the martingale
ε
ε
ε
Z
µ (t) = ϕ(ξ (t), x (t)) − 0
t
ILε ϕ(ξ ε (s), xε (s)) ds
47 where the generators ILε , ε > 0 have the common domain of definition D(IL) supposed to be dense in C02 (IRd × E). Lemma 3.7 Let the generators ILε , ε > 0 have the following property
|ILε ϕ(u)| ≤ cϕ
(3.1.11)
for any real-valued nonnegative function ϕ ∈ C02 (IRd ), where the constant cϕ depend only on the norm of ϕ. Suppose that the compact containment condition holds
lim sup IP( sup |ξ ε (t)| > l) = 0
l→∞ ε>0
(3.1.12)
0≤t≤T
Then the family of stochastic processes ξ ε (t), t ≥ 0, ε > 0 is relatively compact. Proof:
Let us consider the process η ε (t) := ϕ(ξ ε (t)) + cϕ t, t ≥ 0, where ϕ is
understood to be composed with the projection on the first component. We want to prove that η ε (t) is an Ftε -submartingale. For 0 ≤ s ≤ t we have
IE[η
ε
(t)|Fsε ]
Z
s ε
IL ϕ(ξ
(u)) du|Fsε
+ µεs + cϕ s + Z0 t ε ε ε (IL ϕ(ξ (u)) + cϕ ) du|Fs IE s Z t ε ε ε ε = η (s) + IE (IL ϕ(ξ (u)) + cϕ ) du|Fs
= IE
ε
s
where the last term is nonnegative due to the condition (3.1.11). So the conditions of Theorem 2.14 are fulfilled, and therefore ξ ε (t), t ≥ 0, ε > 0 is relatively compact.
48 Lemma 3.8 Let the generators ILε , ε > 0 have the following estimation for ϕ0 (u) = √ 1 + u2 , ILε ϕ0 (u) ≤ cl0 ϕ0 (u),
|u| ≤ l0
where the constant cl0 depends on the function ϕ0 , but not on ε > 0, and IE[|ξ ε (0)|] ≤ b < ∞. Then the compact containment condition (3.1.12) holds. Proof:
ϕ00 (u) =
√ u , 1+u2
ϕ000 (u) =
1√ . (1+u2 ) 1+u2
Then ϕ0 (u) ≤ 1 + |u|, |ϕ00 (u)| ≤
1 ≤ ϕ0 (u), |ϕ000 (u)| ≤ 1 ≤ ϕ0 (u). It is known (Lemma 3.2 [9]) that if x(t), t ≥ 0 is a Markov process defined by the generator IL and if ϕ ∈ D(IL) then
−λt
e
t
Z
e−λs [λϕ(x(s)) − ILϕ(x(s))] ds
ϕ(x(t)) − 0
is a martingale. Let us define the stopping time τlε by
τlε :=
inf{t ∈ [0, T ] : |ξ ε (t)| ≥ l} if |ξ ε (t)| ≤ l for all t ∈ [0, T ]
T Then
e
−cl0 t∧τlε
ε
ϕ0 (ξ (t ∧
τlε ))
Z +
t∧τlε
e−cl0 s [cl0 ϕ0 (ξ ε (s)) − ILε ϕ0 (ξ ε (s))] ds
0
is a martingale. We get, for s ≤ t ∧ τlε , cl0 ϕ0 (ξ ε (s)) − ILε ϕ0 (ξ ε (s)) ≥ 0, so ε IE e−cl0 t∧τl ϕ0 (ξ ε (t ∧ τlε )) ≤ IEµεt = IEµε0 = IEϕ0 (ξ0ε ).
49 Then
IPεl : = IPε ( sup |ξ ε (t)| > l) ≤ IPε (ϕ0 (ξ ε (τlε )) ≥ ϕ0 (l)) 0≤t≤T IE[ϕ0 (ξ ε (τlε ))]
IE[ϕ0 (ξ ε (0))] ϕ0 (l) ϕ0 (l) ε (1 + b) 1 + IE|ξ (0)| ≤ ecl0 T ≤ ecl0 T ϕ0 (l) ϕ0 (l) ≤
≤ ecl0 T
which converges to 0 as l → ∞. Corollary 3.9 Let the generators ILε , ε > 0 have the following estimation: |ILε ϕ(u)| ≤ cϕ for any real-valued nonnegative function ϕ ∈ C02 (IRd ), where the con√ stant ϕ depends only on the norm of ϕ, and for ϕ0 (u) = 1 + u2 , ILε ϕ0 (u) ≤ cl0 ϕ0 (u), |u| ≤ l0 , with the constant cl0 depending only on the function ϕ0 , but not on ε > 0. Then, the family of processes ξ ε (t), t ≥ 0, ε > 0 is relatively compact. Let the switching Markov process xε (t), t ≥ 0 satisfies the phase merging condition of Theorem 3.6. Let the following conditions be valid C1. the drift velocity a(u; x) belongs to the Banach space B 1 (IRd ), with
aε (u; x) = a(u; x) + θε (u; x)
where θε (u; x) → 0 as ε → 0 uniformly on (u; x) and Γε (u, dv; x) ≡ Γ(u, dv; x) independent of ε. C2. the operator γ ε (x)ϕ(u) = ε−1
R
IRd
[ϕ(u + εv) − ϕ(u) − εvϕ0 (u)]Γ(u, dv; x) is neg-
ligible on B 1 (IRd ):
sup ϕ∈C 1 (IRd )
||γε (x)ϕ|| → 0
as
→0
50 C3. the convergence in probability of the initial values of (ξ ε (t), m(xε ( εt )), t ≥ 0 holds, that is (ξ ε (0), m(xε (0)) → (ξ(0), xˆ(0)) and there exists a constant c ∈ IR+ such that supε>0 IE|ξ ε (0)| ≤ c < ∞. As in [19], under Markov switching, we get the following averaging theorem. Theorem 3.10 (Average approximation) The stochastic evolutionary system ξ ε (t), ˆ t ≥ 0 defined by (3.1.8) converges weakly to the averaged stochastic system ξ(t),
ˆ ξ ε (t) ⇒ ξ(t)
as
ε → 0.
ˆ The limit process ξ(t), t ≥ 0 is defined by a solution of the evolutionary equation dˆ ˆ xˆ(t)), ξ(t) = a ˆ(ξ(t); dt
ˆ = ξ(0), ξ(0)
(3.1.13)
where the averaged velocity is determined by Z a ˆ(u; k) =
πk (dx)a(u; x)
1 ≤ k ≤ N.
Ek
ˆ is a random dynamical system evolving determinNote 3.11 The limit process ξ(t) N (T )
istically on random time intervals [τi , τi+1 ), where {τi }i=1 are the transition times of the stationary merged process xˆ(t) and N (T ) the number of transitions on [0, T ].
Proof: First we prove that the family ξ^ε(t), t ≥ 0, ε > 0, is tight. To this end, consider the test function ϕ_0 : IR → [1, ∞), ϕ_0(u) = √(1 + u²). We have |ϕ_0'(u)| ≤ 1 ≤ ϕ_0(u) and |ϕ_0''(u)| ≤ 1 ≤ ϕ_0(u), u ∈ IR. Then

IL^ε ϕ_0(u) = ε^{-1} Q ϕ_0(u) + Q_1 ϕ_0(u) + IΓ(x) ϕ_0(u) = a^ε(u; x) ϕ_0'(u) ≤ |a^ε(u; x)| ϕ_0(u) ≤ c_a ϕ_0(u).

By Lemma 3.8 the compact containment condition follows. For ϕ ∈ C_0^∞(IR) we write

|IL^ε ϕ(u)| ≤ q ϕ(u) + |a^ε(u; x)| |ϕ'(u)| ≤ c_ϕ,

so by Lemma 3.7 the family ξ^ε(t), t ≥ 0, ε > 0, is relatively compact and therefore tight in D([0, T]) for any T > 0.

The generator of the coupled Markov process (ξ^ε(t), x^ε(t/ε)), t ≥ 0, is

IL^ε = ε^{-1} Q + Q_1 + IΓ(x) + γ^ε(x) + Σ^ε(x),

where IΓ(x)ϕ(u) = a(u; x)ϕ'(u) and Σ^ε(x)ϕ(u) = θ^ε(u; x)ϕ'(u) are bounded operators, and the operator γ^ε(x) + Σ^ε(x) is negligible on B^1(IR^d). Let R_0 be the potential of the operator Q, R_0 = [Q + Π]^{-1} − Π, and let C̄_0^2(IR^d × Ê) be the space of measurable, bounded functions ϕ(u, v) with compact support that are twice continuously differentiable in the first argument. We need to establish the uniform convergence of generators, i.e.,

lim_{ε→0} IL^ε ϕ^ε(u, x) = IL ϕ(u, m(x)).
Using the reducible-invertible operator method we have the asymptotic representation

[ε^{-1} Q + Q_1 + IΓ(x)] ϕ^ε(u, x) = IL̂ ϕ̂ + ε Θ^ε(x),

with ϕ^ε(u, x) = ϕ(u, m(x)) + ε ϕ_1(u, x) and ϕ̂ = ϕ̂(u, v) ∈ C̄_0^2(IR^d × Ê), where IL̂ is realized by IL̂ = Q̂_1 + IΓ̂(x), the contracted operators Q̂_1 and IΓ̂(x) being defined by

Π Q_1 Π = Q̂_1 Π,    Π IΓ(x) Π = IΓ̂(x) Π,
ϕ_1 = R_0 (IL̂ − Q_1) ϕ,    Θ^ε = (Q_1 + IΓ(x)) ϕ_1(u, x);

hence Θ^ε(x) is a bounded operator independent of ε, so ε Θ^ε → 0. Indeed,

[ε^{-1} Q + Q_1 + IΓ(x)][ϕ(u, m(x)) + ε ϕ_1(u, x)]
   = ε^{-1} Q ϕ(u, m(x)) + Q ϕ_1(u, x) + Q_1 ϕ(u, m(x)) + IΓ(x) ϕ(u, m(x)) + ε [Q_1 ϕ_1(u, x) + IΓ(x) ϕ_1(u, x)].

From the condition Q ϕ(u, m(x)) = 0 we get that ϕ ∈ N_Q, so ϕ = Π ϕ. The equation

Q ϕ_1(u, x) = IL̂ ϕ̂ − Q_1 ϕ(u, m(x)) − IΓ(x) ϕ(u, m(x)),

under the solvability condition

Π( IL̂ ϕ̂ − Q_1 ϕ − IΓ(x) ϕ ) = 0,   which means   IL̂ ϕ̂ = (Q̂_1 + IΓ̂(x)) ϕ̂,

has the solution

ϕ_1(u, x) = R_0 (IL̂ − Q_1) ϕ(u, m(x)).

Therefore lim_{ε→0} IL^ε ϕ^ε(u, x) = IL ϕ(u, m(x)) uniformly, where

IL ϕ(u, x) = Q̂_1 ϕ̂(·, x) + IΓ̂(x) ϕ̂(u, ·).

It remains to calculate IΓ̂(x). Since IΓ(x) ϕ̂(u, ·) = a(u; x) ϕ̂'(u, ·), we get

IΓ̂(x) ϕ̂(u, ·) = Π IΓ(x) Π ϕ̂(u, ·) = Π a(u; x) ϕ̂'(u, ·) = â(u; x) ϕ̂'(u, ·),

where

â(u; k) = ∫_{E_k} π_k(dx) a(u; x).

Thus the limit process ξ̂(t), t ≥ 0, is defined by a solution of the evolutionary equation

(d/dt) ξ̂(t) = â(ξ̂(t); x̂(t)),    ξ̂(0) = ξ(0),

which ends the proof.
3.2  Large deviation principle for ergodic Markov processes

Let x(t), t ∈ IR_+, be a time-homogeneous Markov process on a compact metric space X, let B_X be the Borel σ-algebra on X, and let M(X) be the space of probability measures on B_X. Let us introduce a random measure on B_X by

ν_t(B) = (1/t) ∫_0^t 1_{{x(s) ∈ B}} ds,    B ∈ B_X.
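The following Python sketch estimates this occupation measure for a simulated ergodic chain and compares it with the invariant measure, the unique zero of the rate function in Theorem 3.12 below. The three-state generator is an arbitrary illustrative choice, not an object from the text.

import numpy as np

# Sketch of the occupation measure nu_t: nu_t(B) = (1/t) * time spent in B up to t.
# The generator Q is illustrative; the stationary law pi is computed from it.

rng = np.random.default_rng(1)
Q = np.array([[-2.0,  1.0,  1.0],
              [ 1.0, -3.0,  2.0],
              [ 0.5,  0.5, -1.0]])

# stationary distribution: pi Q = 0, sum(pi) = 1
A = np.vstack([Q.T, np.ones(3)])
pi = np.linalg.lstsq(A, np.array([0.0, 0.0, 0.0, 1.0]), rcond=None)[0]

def occupation_measure(T=500.0, x0=0):
    occ = np.zeros(3)
    t, x = 0.0, x0
    while t < T:
        hold = rng.exponential(1.0 / -Q[x, x])                 # exponential holding time
        occ[x] += min(hold, T - t)
        t += hold
        jump_probs = np.where(np.arange(3) == x, 0.0, Q[x]) / -Q[x, x]
        x = rng.choice(3, p=jump_probs)                        # embedded jump chain
    return occ / T

print("nu_T :", np.round(occupation_measure(), 3))
print("pi   :", np.round(pi, 3))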
Theorem 3.12 ([20] and [21]) Assume that the process x(t), t ∈ IR_+, is an ergodic Markov process. Then the following large deviation result holds:

− inf_{m ∈ Γ°} I(m) ≤ lim inf_{t→∞} (1/t) log IP{ν_t ∈ Γ} ≤ lim sup_{t→∞} (1/t) log IP{ν_t ∈ Γ} ≤ − inf_{m ∈ Γ̄} I(m),

where the rate function I : M(X) → [0, +∞] is defined by

I(m) = − inf{ ∫ (φ(x))^{-1} Qφ(x) m(dx) : φ ∈ D(Q), φ > 0 },

and Γ ∈ B(M(X)), the Borel σ-algebra on M(X). Typically Γ = {ν ∈ M(X) : d(ν, m) > δ, I(m) = 0}, where d is some metric on M(X). The rate function I(m) has the following properties:

(i) I(m) ≥ 0 for all m ∈ M(X), and I(m) = 0 if and only if m is the invariant measure of the ergodic Markov process;

(ii) I(m) is a convex function, i.e.,

I(s m_1 + (1 − s) m_2) ≤ s I(m_1) + (1 − s) I(m_2),    m_i ∈ M(X), i = 1, 2, 0 < s < 1;

(iii) I(m) is lower semi-continuous, i.e., lim inf_{m_n → m} I(m_n) ≥ I(m);

(iv) for any b > 0 the set C_b(I) = {m : I(m) ≤ b} is compact, and I(m) is continuous on this compact set.

We illustrate the concept of the split phase space in the following example.

Example 3.13 Let us consider a four-state Markov process x(t), t ∈ IR_+, on the split phase space E = {1, 2, 3, 4} = E_1 ∪ E_2, E_1 = {1, 2}, E_2 = {3, 4}, generated by
Q = [ −λ_1    λ_1     0      0
       μ_1   −μ_1     0      0
        0      0    −λ_2    λ_2
        0      0     μ_2   −μ_2 ]

One checks that the Markov process x(t) is ergodic on both E_1 and E_2, with stationary distributions

π_1 = ( μ_1/(λ_1+μ_1),  λ_1/(λ_1+μ_1) )   and   π_2 = ( μ_2/(λ_2+μ_2),  λ_2/(λ_2+μ_2) ).

Now we analyze singularly perturbed Markov processes by introducing a small parameter ε > 0, which leads to a singularly perturbed system involving two time scales, the actual time t and the stretched time t/ε. Since the process x(t) is ergodic on E_1 and E_2, the system can be decomposed and the states of the Markov process can be aggregated.
Let x^ε(t) be a Markov chain on E generated by Q + Q_1, with Q defined above and Q_1 given by

Q_1 = [ −λ_1    0     λ_1     0
          0   −μ_1     0     μ_1
         λ_2    0    −λ_2     0
          0    μ_2     0    −μ_2 ]

and let x^ε(t/ε) be a time-invariant Markov process with generator Q^ε = (1/ε) Q + Q_1. Note that for small ε the Markov process x^ε(t/ε) jumps more frequently within each block and less frequently from one block to another. To further understand the underlying process, we consider the merged process x̂^ε(t) := m(x^ε(t/ε)), obtained by aggregating the states in the k-th block into a single state k, and we study its asymptotic behavior (for many asymptotic results see [22]).

Theorem 3.6 states that the limit process is a Markov process on the merged space Ê = {1, 2} determined by the generator matrix Q̂ = (q̂_{kr}, 1 ≤ k, r ≤ 2) with q̂_{kr} verifying (3.1.7). For this example,

q_1 = 2λ_1μ_1/(λ_1+μ_1),   q_2 = 2λ_2μ_2/(λ_2+μ_2),   p_11 = −1,  p_12 = 1,  p_21 = 1,  p_22 = −1.

Thus

Q̂ = [ −2λ_1μ_1/(λ_1+μ_1)     2λ_1μ_1/(λ_1+μ_1)
        2λ_2μ_2/(λ_2+μ_2)    −2λ_2μ_2/(λ_2+μ_2) ]  =:  [ −λ    λ
                                                           μ   −μ ]
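A short Python sketch of this merging computation is given below, using illustrative numerical values for λ_1, μ_1, λ_2, μ_2; it only reproduces the algebra above (stationary laws π_k, averaged exit rates q_k, merged generator Q̂).

from fractions import Fraction as F

# Sketch of the merging computation in Example 3.13.  The numeric rates are
# illustrative placeholders; everything else follows the formulas in the text.

lam1, mu1 = F(2), F(3)      # within-block rates for E1 = {1, 2}
lam2, mu2 = F(1), F(4)      # within-block rates for E2 = {3, 4}

pi1 = (mu1 / (lam1 + mu1), lam1 / (lam1 + mu1))
pi2 = (mu2 / (lam2 + mu2), lam2 / (lam2 + mu2))

# Q1 exits block k at rate lambda_k from its first state and mu_k from its second,
# so the averaged exit rate is q_k = pi_k(1)*lambda_k + pi_k(2)*mu_k
q1 = pi1[0] * lam1 + pi1[1] * mu1
q2 = pi2[0] * lam2 + pi2[1] * mu2
assert q1 == 2 * lam1 * mu1 / (lam1 + mu1)
assert q2 == 2 * lam2 * mu2 / (lam2 + mu2)

Q_hat = [[-q1,  q1],
         [ q2, -q2]]
print("pi1 =", pi1, " pi2 =", pi2)
print("Q_hat =", Q_hat)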
Note 3.14 The merged process x̂^ε(t), unlike its limit x̂(t), is not time-homogeneous.

Let us consider now the occupational time of x̂(t) defined by

ν_t(B) = (1/t) ∫_0^t 1I_{{x̂(s) ∈ B}} ds

for any B ∈ B_Ê. By the ergodic theorem, the measure ν_t converges to the ergodic distribution ρ as t goes to ∞. As an example, let M be the set of all probability
measures on {0, 1}, identified with {(p, 1 − p), 0 ≤ p ≤ 1}, and let d(x, y) = |x_1 − y_1| + |x_2 − y_2|, x, y ∈ IR². Then Theorem 3.12 implies that, for ρ = (p_0, 1 − p_0) and Γ = {(p, 1 − p) : d((p, 1 − p), (p_0, 1 − p_0)) > δ},

IP(ν_t ∈ Γ) ∼ exp(−t I(p_0 + δ, 1 − p_0 − δ))   for large t,

with

I(m) = − inf{ ∫_Ê (Q̂φ)(y)/φ(y) m(dy) : φ ∈ D(Q̂), φ(y) > 0, ∀y ∈ {0, 1} }
     = − inf{ λp(φ(2)/φ(1) − 1) + μ(1 − p)(φ(1)/φ(2) − 1) : φ(1), φ(2) > 0 }

for m = (p, 1 − p). The infimum is attained at φ(2)/φ(1) = √((μ/λ)(1/p − 1)), and

I(m) = λp + μ(1 − p) − 2√(λμ p(1 − p)).
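The closed form above can be checked numerically. The Python sketch below minimizes the variational expression over r = φ(2)/φ(1) on a grid and compares the result with λp + μ(1 − p) − 2√(λμp(1 − p)); the rates and the point p are arbitrary illustrative values.

import numpy as np

# Numerical check of the explicit rate function: minimize
#   f(r) = lam*p*(r - 1) + mu*(1 - p)*(1/r - 1),   r = phi(2)/phi(1) > 0,
# and compare -min f with the closed form.

lam, mu, p = 1.3, 0.7, 0.4          # illustrative values

r = np.linspace(1e-3, 20.0, 200000)
f = lam * p * (r - 1.0) + mu * (1.0 - p) * (1.0 / r - 1.0)

I_numeric = -f.min()
I_closed = lam * p + mu * (1.0 - p) - 2.0 * np.sqrt(lam * mu * p * (1.0 - p))
r_star = np.sqrt((mu / lam) * (1.0 / p - 1.0))   # closed-form minimizer

print("argmin r     :", r[f.argmin()], " vs closed form", r_star)
print("I(m) numeric :", I_numeric)
print("I(m) closed  :", I_closed)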
3.3  Large deviations for stochastic additive functionals

Let us consider the family of stochastic additive functionals ξ^ε(t), t ≥ 0, represented by

ξ^ε(t) = ξ^ε(0) + ∫_0^t η^ε( ds; x^ε(s/ε) ),    t ≥ 0, ε > 0.
The family of coupled Markov processes (ξ ε (t), xε ( εt )), t ≥ 0, ε > 0 on IRd × E has infinitesimal generator ILε given by ILε = 1ε Q + Q1 + IΓε (x) with the domain D(ILε ) ˆ xˆ(t)), t ≥ 0 is a Markov process on dense in C(IRd × E) and the limit process (ξ(t), ˆ IRd × E. Our goal is to show the large deviation principle for this family of stochastic additive functionals with the rate function I stated as
− inf◦ I ≤ lim inf ε log IP{ξ ε ∈ Γ} ≤ lim sup ε log IP{ξ ε ∈ Γ} ≤ − inf I Γ
ε→0
¯ Γ
(3.3.1)
where Γ° and Γ̄ represent, respectively, the interior and the closure of the set Γ. In the particular case in which we take Γ = {ξ(t) : ||ξ(t) − ξ̂(t)|| > δ}, one gets the asymptotic behavior of IP(sup_{t∈[0,T]} ||ξ^ε(t) − ξ̂(t)|| > δ).

Proposition 3.15 ([13]) If the sequence ξ^ε satisfies the large deviation principle on D([0, T], IR^d) with rate function I_u(ϕ), then for all bounded continuous functions h : D([0, T], IR^d) → IR,

lim_{ε→0} ε log IE{ exp[ −(1/ε) h(ξ^ε) ] } = − inf_{ϕ ∈ D([0,T], IR^d)} { h(ϕ) + I_u(ϕ) }
(3.3.2)
The Laplace principle implies the large deviation principle with the same rate function (Theorem 1.2.3).

Proposition 3.16 If I_u is a rate function on D([0, T], IR^d) and the limit (3.3.2) is valid for all bounded continuous functions h, then the sequence ξ^ε satisfies the large deviation principle on D([0, T], IR^d) with rate function I_u.

Lemma 3.17 Suppose that for each fixed k ∈ Ê the family ξ_t^ε := ξ_t^ε(u; k), t ≥ 0, ε > 0, satisfies the large deviation principle with the rate function I_{u,k}(·). If x̂_t is a stationary process on Ê, then ξ_t^ε(u; x̂(t)) satisfies the large deviation principle with the rate function I_u(ϕ) = min{I_{u,k}(ϕ) : 1 ≤ k ≤ N}.

Proof:
ˆ the family ξ ε (u; k), t ≥ 0, ε > 0 satisfy the Since for each fixed k ∈ E, t
large deviation principle with the rate function Iu,k , we have
ε log IP(ξ ε ∈ Γ|ˆ xε = k) ∼ − inf Iu,k . Γ
Let us denote b_k^ε := IP(ξ^ε ∈ Γ | x̂^ε = k), b_k := inf_Γ I_{u,k}, and p_k = IP(x̂^ε = k). Thus ε log b_k^ε ∼ −b_k, and therefore b_k^ε = exp(−b_k/ε + c_k^ε) with c_k^ε = o(1/ε). We want to prove that ε log IP(ξ^ε ∈ Γ) ∼ −min{b_1, · · · , b_N}. We may assume that
b_1 ≤ b_2 ≤ · · · ≤ b_N and 0 < p_i < 1, 1 ≤ i ≤ N, without loss of generality. Since IP(ξ^ε ∈ Γ) = Σ_{k=1}^N IP(ξ^ε ∈ Γ | x̂^ε = k) IP(x̂^ε = k), it is enough to prove that
bεi bε1
1 bε1 p1 +···+bεN pN
∼
1 . bε1 p1
This is true
= exp(− 1ε (bi − b1 + ε(cεi − cε1 )) goes to 0 as ε goes to 0.
Theorem 3.18 For absolutely continuous functions ϕ from D([0, T ], IRd ), with T > 0 ˆ define arbitrary fixed, satisfying ϕ(0) = u, and for each fixed k ∈ E, Z
T
L(ϕ(t), ϕ(t); ˙ k)dt,
Iu,k (ϕ) :=
(3.3.3)
0
where L is subsequently defined by (3.3.9). For all other functions in D([0, T ], IRd ), Iu,k (ϕ) := ∞. Then the family ξ ε (t), ε > 0 satisfies the large deviation principle with rate function Iu (ϕ) = min{Iu,k (ϕ) : 1 ≤ k ≤ N }
(3.3.4)
Proof: The proof is carried out in several steps. For the sake of clarity, we state a number of known results, reformulated and adapted to our situation.

Step 1. Consider the martingale problem for the generator IL^ε and its relationship with the exponential martingale problem [8] by taking the transformation H^ε defined as
1
H ε f := εe− ε f ILε e ε f
(3.3.5)
An important step is to prove the convergence of H^ε, for an appropriate collection of sequences f^ε, to an operator H, in the sense that if f^ε converges to f as ε → 0, then H^ε f^ε converges to Hf [23].
60 Let us consider the test functions f ε (u, x) = f (u)+ε log ϕε (u, x) with ϕε (u, x) = ϕ(u, m(x)) + εϕ1 (u, x), where f, ϕε (u, x) are bounded, measurable, continuous differentiable functions on u ∈ IRd , with bounded first derivative, and uniformly continuous on E, convergent to the function f (u) Then, H ε f ε converges to Hf ,
0
Z
Hf (u; x) := a(u; x)f (u) +
0
(evf (u) − 1 − vf 0 (u))Γ(u, dv; x).
(3.3.6)
IRd
ˆ defined by Applying the stationary projector Π : B(E) → E, Z ρ(dx)ϕ(y)1I(x)
Πϕ(x) := E
(where 1I(x) = 1 for all x ∈ E), we obtain
ˆ (u; k) = a Hf ˆ(u; k)f 0 (u) +
Z
0 ˆ dv; k) (evf (u) − 1 − vf 0 (u))Γ(u,
(3.3.7)
IRd
where Z a ˆ(u; k) =
ˆ dv; k) = πk (dx)a(u; x) and Γ(u,
Z πk (dx)Γ(u, dv; k). Ek
Ek
A key role is played by the function in u and p in IRd defined by Z H(u, p; k) := a ˆ(u; k)p +
ˆ dv; k) (evp − 1 − vp)Γ(u,
IRd
having the following properties: ˆ sup d H(u, p; k) < ∞; (Ia) for each p ∈ IRd and each k ∈ E, u∈IR ˆ H(u, p; k) is a continuous function of (u, p) ∈ IRd × IRd . (Ib) for each k ∈ E,
(3.3.8)
61 For u and q in IRd we define the Fenchel-Legendre transform
L(u, q; k) := sup {pq − H(u, p; k)}
(3.3.9)
p∈IRd
step 2. As in Lemma 6.2.3. [13] the following properties of the FenchelLegendre function can be proved.
Lemma 3.19 The functions H(u, p; k) and L(u, q; k) defined by (3.3.8) and (3.3.9) respectively, have the following properties ˆ H(u, p; k) is a finite convex function of p ∈ IRd which (a) For each u ∈ IRd , k ∈ E, is differentiable for all p. In addition, H(u, p; k) is a continuous function of (u, p) ∈ IRd × IRd ˆ L(u, q; k) is a convex function of q ∈ IRd . In addition, (b) For each u ∈ IRd , k ∈ E, L(u, q; k) is a nonnegative, lower semi-continuous function of (u, q) ∈ IRd × IRd
(c) L(u, q; k) is uniformly superlinear in the sense:
lim inf
inf
N →∞ u∈IRd q∈IRd :||q||=N
1 L(u, q; k) = ∞ ||q||
ˆ the relative interior ri(domL(u, ·; k)) = ri(convSµ(·|u,k) ); (d) For each u ∈ IRd , k ∈ E, in particular L(u, q; k) equals ∞ for u ∈ IRd and q ∈ (cl(convSµ(·|u,k) ))c . For any q ∈ ri(domL(u, ·; k)) there exists v = v(u, q; k) ∈ IRd such that ∇v H(u, v(u, q; k); k) = q. In addition,
L(u, q; k) = v(u, q; k)q − H(u, v(u, q; k); k)
62 (e) Suppose in addition that for a given u ∈ IRd , convSµ(·/u) has nonempty interior. Then H(u, v; k) is a strictly convex function of v ∈ IRd , int(domL(u, ·; k)) is nonempty, for each q ∈ int(domL(u, ·; k)) there exists a unique value of v such that ∇v H(u, v(u, q; k); k) = q, and L(u, ·; k) is differentiable on int(domL(u, ·; k)). ˆ (f ) For each u and q in IRd , k ∈ E,
d
Z
L(u, q; k) = inf{R(ν(·)||µ(·|u, k) : ν ∈ P(IR ),
vν(dv) = q} IRd
and the infimum is always attained. If L(u, q; k) < ∞, then the infimum is atR tained uniquely. R(·||·) is the relative entropy defined by R(ν||θ) := (log dν )dν dθ whenever ν is absolutely continuous with respect to θ. Otherwise R(ν||θ) := ∞.
(g) There is a stochastic kernel ν(dv|u, k) on IRd given IRd × Eˆ satisfying for u and q in IRd , Z R(ν(·|u, k)||µ(·|u, k)) = L(u, q; k) and
vν(dv|u, k) = q IRd
(h) If ν ∈ P(IRd ) satisfies R(ν(·)||µ(·|u, k)) < ∞ for u ∈ IRd , k ∈ Eˆ then R ||v||ν(dv) < ∞ and IRd Z R(ν(·|u, k)||µ(·|u, k)) ≥ L(u,
vν(dv); k). IRd
step 3. To prove Laplace principle for the sequence ξ ε it is sufficient to prove it for a sequence of random walks X n constructed below.
63 Let h be any bounded continuous function mapping D([0, T ], IRd ) into IR. We prove the Laplace limit (3.3.2) when ε → 0 along any sequence {εn , n ∈ IN} converging to 0. Let’s fix such a sequence. By sampling the process ξ εn at a sequence of times depending on εn , we define a sequence of piecewise linear processes {ζ n , n ∈ IN} for which we prove Laplace principle. Then we show that the sequence is superexponentially closed to {ξ εn , n ∈ IN}. Fix T > 0. For each n ∈ IN, let cn := [ εTn ] ( where [x] represents the integer part of x). Consider the sampled sequence ξ εn ( Tcnj ), j = 0, 1, · · · , cn − 1. Define ζ n := {ζ n (t), t ∈ [0, T ]} by Tj Tj εn T (j + 1) εn T j ζ (t) = ξ ( ) + cn (t − ) ξ ( )−ξ ( ) cn cn cn cn n
for t ∈
h
T j T (j+1) , cn cn
εn
i , which is the linear interpolation of the sampled sequence
ξ εn ( Tcnj ), j = 0, 1 · · · , cn − 1. ˆ let {vjn (u; k), u ∈ IRd , j ∈ IN0 } be an i.i.d sequence of For each fixed k ∈ E, random vector fields having the common distribution
n
µ (dv|u, k) := IPu
cn T
εn T ξ ( ) − u ∈ dv cn
(3.3.10)
ˆ We construct the random walks which is a stochastic kernel on IRd given IRd × E. corresponding to the sequence of stochastic kernels µn (dv|u, k) as follows: for each u ∈ ˆ n ∈ IN, consider the sequence of random variables {Xjn , j = 0, 1, · · · , cn − IRd , k ∈ E, 1} taking values in IRd with
n Xj+1 := Xjn +
T n n v (X ; k), cn j j
X0n = u.
64 Suppose that the the sequence of random vectors Xjn is interpolated into a piecewise linear continuous-time process X n := {X n (t), t ∈ [0, T ]} by
n
X (t) =
Xjn
Tj + t− vjn (Xjn ; k), cn
T j T (j + 1) t∈ , , j = 0, 1, · · · , cn − 1 cn cn
Then the distribution of ζ n is the same as the distribution of X n . For each n ∈ IN ˆ define and u, p ∈ IRd , k ∈ E, Z
n
H (u, p; k) := log
evp µn (dv|u, k)
(3.3.11)
IRd
step 4. We will show that the function H(u, p; k) defined in (3.3.8) can be written as the moment generating function of a stochastic kernel µ(dv|u, k).
Lemma 3.20 For ε > 0, p ∈ IRd , k ∈ E, s ∈ [0, T ], t ∈ [s, T ], n ∈ IN, u ∈ IRd and δ > 0, define:
Npε (s; t)
εn Np,u (t)
Z t 1 ε ε ε := exp p(ξ (t) − ξ (s)) − H(ξ (v), p; k)dv ε s
cn := exp T
Z t εn εn p(ξ (t) − u) − H(ξ (v), p; k)dv 0
and n τu,δ
εn c := inf t ∈ [0, T ] : inf d(ξ (v), (B(u, δ)) ) = 0 v∈[0,t]
The following conclusions hold: (a) Npε (s; t) is an Ftε -martingale for t ∈ [s, T ] and therefore IEµ {Npε (s; t)} = IEµ {Npε (s; s)} = 1
65 n ) is an F εn -martingale for t ∈ [0, T ], hence (b) Npεn (0; t ∧ τu,δ n IEu {Npεn (0; t ∧ τu,δ )} = IEu {Npεn (0; 0)} = 1
(c) For any p ∈ IRd , ε ∼ n T sup sup IEu N p,u ≤1 cn n∈IN u∈IRd and
∼ εn
sup sup IEu N p,u
n∈IN u∈IRd
T n ∧ τu,δ cn
≤1
(d) There exists γ1 ∈ (0, ∞) such that for any u, p ∈ IRd
H(u, p) ≥ pˆ a(u; k) ≥ −γ1 ||p||
(e) For any p ∈ IRd , there exists γ2 ∈ IR such that ( sup sup IEu n∈IN
u∈IRd
∼ εn
N p,u
T cn
2 )
≤ eγ2
and ( 2 ) ∼ εn T n ∧ τu,δ ≤ eγ2 sup sup IEu N p,u c d n∈IN u∈IR n (f ) For any p ∈ IRd and compact subset K ⊆ IRd , there exists M ∈ IN such that
∼ εn
inf inf IEu N p,u
n∈IN u∈K
T n ∧ τu,δ cn
≥ e−γ3 (δ) ,
where γ3 (δ) → 0 as δ → 0. Proof: (a) Part (a) follows from Th 4.2.1 [8] n (b) Since τu,δ is a stopping time, using the optional stopping theorem, part (b) fol-
lows: Proposition 2.1.5, Theorem 2.2.13 [9]. Characterization and convergence.
T cn εn
(c) Define pn :=
66 p n ε ∼ n = Npεn (0; t). Using H¨ older inequal∈ [1, ∞), then N p,u (t)
ities we get ε ∼ n 1 sup sup IEu N p,u (t) ≤ sup sup IEu Npεn (0; t) pn = 1
n∈IN u∈IRd
n∈IN u∈IRd
ε ∼ n 1 T n n sup sup IEu N p,u ) pn = 1 ∧ τu,δ ≤ sup sup IEu Npεn (0; t ∧ τu,δ cn n∈IN u∈IRd n∈IN u∈IRd (d) Since ez − 1 − z ≥ 0 ∀z ∈ IR, H(u, p; k) ≥ pˆ a(u; k) ∀u, p ∈ IRd . Since a ˆ(u; k) is bounded it follows that there exists γ1 ∈ (0, ∞) such that pˆ a(u; k) ≥ −γ1 ||p||. ε 2 ∼ n = (e) In order to prove the first inequality, we write N p,u cTn ∼ εn
N 2p,u ∼ εn
≤ N 2p,u
T cn
T cn
"
cn exp T
Z
T cn
cn H(ξ εn (v), 2p; k)dv − 2 T
0
Z
T cn
# H(ξ εn (v), 2p; k)dv
0
exp(γ2 ) where
γ2 := sup H(u, 2p; k) − 2 inf H(u, p; k) u∈IRd
u∈IRd
which is finite because of the condition (Ia) and part (d) of this lemma. Similarly for the second inequality. εn (f) For n ∈ IN, define p¯n := cn T −ε ≥ 1. n
Γnu,δ
cn := T
Z
T n ∧τu,δ cn
H(ξ εn (v), p; k)dv −
0
cn T − εn
Z 0
T n ∧τu,δ cn
H(ξ εn (v), T (T − εn )p; k)dv
67 For any u ∈ IRd , part(b) and H¨ older inequality imply: 1=
p¯n p¯n T T εn εn n n IEu NT (T −εn )p (0; ∧ τu,δ ) ≤ IEu NT (T −εn )p (0; ∧ τu,δ ) cn cn ε ∼ n T n n = IEu N p,u ∧ τu,δ exp(Γu,δ ) . cn
Since εn → 0, there exists M ∈ IN such that εn ≤
1 2
∧ δ for any n ≥ M . Hence
for any compact subset K ⊂ IRd ,
sup sup Γnu,δ ≤ sup sup
n≥M u∈K
sup
n≥M u∈K {v:||v−u||≤δ}
|H(v, p; k) −
1 H(v, T (T − n )p; k)| T − n
≤ γ3 (δ) where γ3 (δ) equals
3 sup sup{|H(y, q; k)−H(u, p; k)| : ||y−u|| ≤ δ, ||q−p|| ≤ δ}+2δ sup |H(u, p; k)| u∈K
u∈K
Thus ε ∼ n T n ∧ τu,δ inf inf IEu N p,u ≥ e−γ3 (δ) n∈IN u∈K cn Condition (Ia) and part (d) of this lemma imply that supu∈K |H(u, p; k)| < ∞ and the uniform continuity of H on compact subsets of IRd ×IRd gives us γ3 (δ) → 0 as δ → 0.
Since the conditions of the Proposition 10.3.2 in [13] are fulfilled the next result follow. ˆ the following conclusions hold: Proposition 3.21 For each k ∈ E,
68 (a) there exists a superlinear function f : (0, ∞) → IR ∪ {∞} such that for any > 0, δ > 0, s ∈ [0, T ], t ∈ (s, T ] t−s δ sup IPu,k { sup ||ξ (σ) − ξ (s)|| ≥ δ} ≤ 2d exp − f(√ ) ε s≤σ≤t d(t − s) u∈IRd ε
ε
(b) for each p ∈ IRd , supn∈IN supu∈IRd H n (u, p; k) < ∞ (c) for each p ∈ IRd and each compact subset K ⊂ IRd ,
lim sup |H n (u, p; k) − H(u, p; k)| = 0
(3.3.12)
n→∞ u∈K
(d) for each u ∈ IRd , the sequence of probability measures µn (dv|u, k), n ∈ IN converges weakly to a probability measure µ(dv|u, k) on IRd and for each p ∈ IRd , Z H(u, p; k) = log
epv µ(dv|u, k).
IRd
The family µ(dv|u, k), u ∈ IRd , k ∈ Eˆ defines a stochastic kernel on IRd given ˆ In addition, the function mapping u ∈ IRd 7→ µ(·|u, k) ∈ P(IRd ) is IRd × E. continuous in the topology of weak convergence on P(IRd ). Definition 3.22 A measurable function f : (0, ∞) → IR ∪ {∞} is called superlinear if limc→∞
f (c) c
= ∞.
Proof: (a) For i ∈ {1, ..., d} let ui be the unit vector in the ith coordinate direction of IRd and let r > 0 be a real number. Then for any > 0, δ > 0, s ∈ [0, T ], t ∈ (s, T ], we have:
δ sup < ±u , ξ (σ) − ξ (s) >≥ √ s≤σ≤t d
sup IPu u∈IRd
sup IPu u∈IRd
i
ε
ε
rδ sup < ±ru , ξ (σ) − ξ (s) >≥ √ s≤σ≤t d i
ε
ε
= ≤
sup IPu { sup exp u∈IRd
s≤σ≤t
1 < ±rui , ξ ε (σ) − ξ ε (s) > − ε
Z
σ
H(ξ ε (σ), ±rui ; k))dσ
69
s
1 rδ ¯ k))) ≥ exp( ( √ − (t − s)H(r; ε d ¯ k) := max± maxi=1,...,d sup d H(u, ±rui ; k). The property (Ib) where H(r; u∈IR ¯ is finite for each r > 0. implies that H ¯ For c > 0 define f (c) := supr>0 {rc − H(r)}. Then for M > 0 and all sufficiently ¯ )≥ large c, 1c f (c) ≥ M − 1c H(M
M . 2
Since M is arbitrary chosen it implies that
ε f is superlinear. We rewrite the above inequality in terms of N±ru i (s; σ) and
then use the submartingale inequality. We get
δ sup IP sup < ±u , ξ (σ) − ξ (s) >≥ √ s≤σ≤t d u∈IRd i
ε
ε
≤
1 rδ ¯ √ − (t − s)H(r) sup IPu sup ≤ ≥ exp ε s≤σ≤t d u∈IRd 1 rδ ε ¯ √ − (t − s)H(r) sup IEu {N±ru exp − i (s; t)} = ε d d u∈IR 1 rδ rδ t−s ¯ ¯ √ − (t − s)H(r) exp − sup √ ≤ exp − − H(r) ε ε r>0 d d(t − s) δ t−s f √ = exp − . ε d(t − s)
ε N±ru i (s; σ)
Combining these bounds for the 2d choices of vectors {±ui , i = 1, ..., d}, we get sup IPu u∈IRd
ε
ε
sup ||ξ (σ) − ξ (s)|| ≥ δ s≤σ≤t
t−s ≤ 2d exp − f ε
δ √ d(t − s)
70 (b)
cn εn T exp{H (u, p; k)} = exp log IEu exp p ξ ( )−u T cn cn T = IEu exp p ξ εn ( ) − u T c " nZ T #) ( ∼ εn T cn cn H(ξ εn (v), p; k)dv = IEu N u,p ( ) exp cn T 0 n
Therefore,
sup sup exp{H n (u, p; k)} ≤ sup sup IEu n∈IN u∈IRd
n∈IN u∈IRd
≤ exp
T N u,p ( ) exp cn !
!
∼ εn
sup H(v, p; k)
sup H(v, p; k) v∈IRd
0, there exists δ > 0 such that sup
|H(v, p; k) − H(u, p; k)| < θ
sup
u∈K {v:||v−u||≤δ}
For n ∈ IN, define: A(εn , u, δ) :=
n o sup0≤s≤ T ||ξ εn (s) − u|| < δ . Then using cn
(a) we have T δcn sup IPu {A(εn , u, δ) } ≤ 2d exp − f(√ ) cn ε n dT u∈IRd δcn ≤ 2d exp −f ( √ ) dT c
71 For u ∈ K, p ∈ IRd ,
exp[H n (u, p; k) − H(u, p; k)] cn n T = IEu exp p ξ ( )−u − H(u, p; k) T cn ( " Z T #) ∼ εn T cn c n = IEu N u,p ( ) exp (H(ξ εn (t), p; k) − H(u, p; k))dt . cn T 0 n and part (f) of lemma it follows that there exists From the definition of τu,δ
M ∈ IN such that (
" Z T n #) ∧τu,δ cn T c n n N u,p ( ∧ τu,δ ) exp (H(ξ εn (t), p; k) − H(u, p; k))dt cn T 0
inf inf IEu
n≥M u∈K
∼ εn
≥ exp(−γ3 (δ) − θ) with γ3 (δ) → 0 as δ → 0, and, #) " Z T n ∧τu,δ cn T c n n (H(ξ εn (t), p; k) − H(u, p; k))dt N u,p ( ∧ τu,δ ) exp cn T 0
( sup sup IEu n≥M u∈K
∼ εn
n ≤ exp(θ) (from part (c) of lemma and the definition of τu,δ ). n Let γ4 = supv∈IRd H(v, p; k) − inf u∈K H(u, p; k) < ∞ and A(εn , u, δ) = {τu,δ > T }. cn
Then ( sup |IEu u∈K
(
∼ εn
∼ εn
N p,u (
T exp cn
cn T n IEu N u,p ( ∧ τu,δ )exp cn T
cn T Z
!) [H(ξ εn , p; k) − H(u, p; k)]dt
−
0
T n ∧τu,δ cn
!)
[H(ξ εn (t), p; k) − H(u, p; k)]dt | ≤
0 ∼ εn
exp(γ4 )(sup IEu {1I(A(εn ,u,δ))c N p,u ( u∈K
T cn
Z
∼ εn T T n ) + sup IEu {1I(A(εn ,u,δ))c N p,u ( ∧ τu,δ )} cn cn u∈K
72 ∼ εn
≤ exp(γ4 )(sup(IEu (1I(A(εn ,u,δ))c ))1/2 (IEu {N u,p ( u∈K
T 2 1/2 )} ) cn
∼ εn T + sup(IEu (1I(A(εn ,u,δ))c ))1/2 (IEu {N u,p ( )}2 )1/2 ) ≤ cn u∈K √ δcn γ2 2 2d exp γ4 − f ( √ ) exp( ) 2 T d
which goes to 0 as n → ∞. Therefore lim inf inf [H n (u, p; k) − H(u, p; k] ≥ −γ3 (δ) − θ n→∞ u∈K
lim sup sup[H n (u, p; k) − H(u, p; k)] ≤ θ n→∞
u∈K
so lim sup |H n (u, p; k) − H(u, p; k)| = 0.
n→∞ u∈K
(d) Condition (Ib) and lemma (c) implies that for each u ∈ IRd and k ∈ E, the moment generating function of {µn (·/u, k), n ∈ IN} (which is the sequence {exp(H n (u, p; k)} ), converges for each p ∈ IRd to exp(H(u, p; k). Since the latter is a continuous function, we conclude using the continuity of moment generating functions that µn (·/u, k) ⇒ µ(·/u, k) where µ(·/u, k) is a probability measure on IRd satisfying for each p ∈ IRd Z H(u, p; k) = log
exp(pv)µ(dv/u, k). IRd
Thus the family {µ(·/u, k), u ∈ IRd } is a stochastic kernel on IRd (it is a limit of a stochastic kernel). The continuity theorem for moment generating function and (Ib) implies the continuity of the map u ∈ IRd → µ(·/u, k).
73 step 5. In order to study the Laplace principle for the process X n , we need to verify the asymptotic behavior of
W n (u) := −
1 log IEu (exp(−cn h(X n ))), cn
(3.3.13)
where IEu denotes the expectation with respect to IPu and h is any bounded continuous function mapping C([0, T ], IRd ) into IR. We will show that this is equal to the minimal cost of function of an associated stochastic control problem. We now specify the stochastic control problem whose minimal cost function gives a representation for the function W n (u). The controlled process is a discrete-time ¯ n , j = 0, 1, · · · , cn − 1, and at each time t there will be a control ν n process X j j giving the distributions of the controlled random variable that replaces this noise due to the increments. νjn is a stochastic kernels on (IRd )j+1 , denoted by νjn (dv) = n ¯ 0n , · · · , X ¯ jn ). A sequence of controls {ν1,j νjn (dv|X , j = 0, 1, · · · , cn − 1} is called an
admissible control sequence. Then, as in [13] (Theorem 4.3.1) we get the variational representation of Wun as (c −1 ) n X 1 n n n ¯ , k) + h(X ¯ ¯u R(νj (·)||µ(·|X IE W (u) = inf j νjn c n j=0 n
(3.3.14)
where the infimum is taken over all admissible control sequences {νjn }. For n ∈ IN and t ∈ [0, T ], define the stochastic kernel
n
ν (dv|t) :=
ν n (dv), j
νcn −1 (dv), n
t ∈ [ Tcnj , T (j+1) ), cn t ∈ [ T (ccnn−1) , T ]
j = 0, 1, · · · , cn − 2
74 The following representation holds (similar as in [13] (Corollary 5.2.1))
¯u W (u) = inf IE n n
νj
Z
T
˜ n (t)) R(ν1n (·|t)||µ(·|X
n ¯ + h(X )
(3.3.15)
0
˜ n = {X ˜ n (t), t ∈ [0, T ]} is the piecewise constant interpolation of the conwhere X ¯ jn , j = 0, 1, · · · , cn − 1}. trolled random variables {X
step 6. Laplace principle upper bound
Let Iu,k (ϕ) :=
RT 0
L(ϕ(t), ϕ(t), ˙ k)dt where L is the Legendre-Fenchel transform
defined in (3.3.9). Then Iu,k is a rate function and
lim sup n→∞
1 log IEu (exp(−cn h(X n ))) ≤ − inf (Iu,k (ϕ) + h(ϕ)) cn ϕ∈C([0,T ],IRd )
(3.3.16)
Indeed, first it can be shown that Iu,k has compact level sets in C([0, T ], IRd ) by using parts (b) and (c) of the Proposition 3.21, which implies that Iu,k is a rate function. Then using part (h) of Proposition 3.19 we will get
lim inf W n (u) ≥ n→∞
inf
(Iu,k (ϕ) + h(ϕ)).
ϕ∈C([0,T ],IRd )
step 7. Laplace principle lower bound. In order to prove the Laplace principle lower bound we need to characterize the relative interior of the effective domain of L(u, ·; k) in terms of the stochastic kernel µ(dv|u, k). This is done in part (d) of Proposition 3.19. For A, B subsets of IRd define
A + B := {u ∈ IRd : u = a + b, a ∈ A, b ∈ B}.
75 A subset C of IRd is called a convex cone if it has the property that for c ∈ C, λc ∈ C ∀λ ∈ [0, ∞). Denote conC for the convex cone of C. We can rewrite H(u, p; k) as
H(u, p; k) = ˆb(u; k)p +
Z
ˆ dv; k) (evp − 1)Γ(u,
IRd
where ˆb(u; k) := a ˆ(u; k) −
Z
ˆ dv; k) v Γ(u,
IRd
ˆ (u,k) and define T(u,k) := {ˆb(u; k)} + conS ˆ Let SΓˆ (u,k) be the support of Γ Γ(u,k) . The relative interior ri(domL(u, ·; k)) = ri(T(u,k) ) and the following properties hold: (a) The sets intT(u,k) are independent of (u, k) ∈ IRd × Eˆ (b) 0 ∈ intT(u,k) With similar arguments as in Theorem 6.5.1 [13] it can be proved that
lim sup W n (u, k) ≤ n→∞
inf
ϕ∈C([0,T ],IRd )
(Iuk (ϕ) + h(ϕ)).
This gives the Laplace principle lower bound for X n .
lim inf n→∞
1 log IEu (exp(−cn h(X n ))) ≥ − inf (Iuk (ϕ) + h(ϕ)). cn ϕ∈C([0,T ],IRd )
(3.3.17)
Thus the Laplace principle is proved for the random walk X n and therefore for the process ζ n .
76 step 8. Laplace principle holds for the sequence ξ εn because ξ εn , ζ n are superexponentially closed, i.e.
lim sup sup εn log IPu (ρ(ξ εn , ζ n ) > δ) = −∞, n→∞
(3.3.18)
u∈IRd
where ρ is Skorokhod metric on D([0, T ], IRd ).
Thus, by Proposition 3.16 we obtain the large deviation principle for the sequence of random variables ξtε (u; k) with the rate function Z Iu,k (ϕ) =
T
L(ϕ(t), ϕ(t); ˙ k)dt. 0
Using Lemma 3.17 we get the large deviation principle for the sequence of stochastic additive functionals ξ ε with rate function Iu (ϕ) = min{Iu,k (ϕ) : 1 ≤ k ≤ N }. This completes the proof of the theorem.
This principle has many applications, for example computing the probability of exit from a stable domain of the process. In some cases the infimum can be found explicitly by the calculus of variations. The class of absolutely continuous functions on [0, T] can be identified with the Sobolev space H^{1,1}[0, T], and since the Fenchel-Legendre function L(u, q; k) satisfies the conditions of Tonelli's existence theorem (Theorem 3.7 [24]), the existence of a minimizer follows. If ϕ ∈ AC[0, T] is a local minimizer of the functional L(ϕ, ϕ'), then ϕ satisfies the Euler-Lagrange equation, which further simplifies to the Beltrami equation L(ϕ, ϕ') − ϕ' L_{ϕ'}(ϕ, ϕ') = C, where C is a constant.

Example 3.23 Compound Poisson process
77 Consider the compound Poisson process ξ ε (t), t ≥ 0 switched by the jump Markov process x(t), t ≥ 0 defined in Example 3.13, of the form ν(t/ε;x(t/ε)
X
ε
ξ (t) =
k=1
t ak (x( )) ε
with the infinitesimal generator given by Λ(x) IΓ (x)φ(u) = ε ε
Z [φ(u + εv) − φ(u)]F (dv; x). IRd
Here ν(t; x), t ≥ 0, x ∈ E = {1, 2, 3, 4} is a homogeneous Poisson process, with intensity Λ(x) and ak (x), k ≥ 1, x ∈ E is a sequence of i.i.d. random variables, independent of ν(t), t ≥ 0, with common distribution F (dv; x). R Using notation a ˆ(k) = Ek πk (dx)a(x), this process converges weakly, ε
Z
ξ (t) ⇒
t
as ε → 0.
a ˆ(ˆ x(s))ds, 0
Applying the operator H ε f ε as in equation (3.3.5) we get the limiting operator Hf as follows Z
0
[evf (u) − 1]F (dv, x).
Hf (u, x) = Λ(x) IRd
For tractability purposes, let’s suppose that F (dv; x) is independent of x. Then ˆ is the projected operator Hf
ˆ (u, k) = Λ(k) ˆ Hf
Z
0
[evf (u) − 1]F (dv)
IRd
ˆ where Λ(k) =
R Ek
ˆ πk (dx)Λ(x). Hence, Λ(1) =
2λ1 µ1 λ1 +µ1
ˆ and Λ(2) =
2λ2 µ2 . λ2 +µ2
Assume that
the random variables ak (x) are distributed exponential with the parameter λ. Then
78 the function H(p; k), p ∈ IR, k ∈ Eˆ = {1, 2} defined in the relation 3.3.8 is
ˆ H(p; k) = Λ(k)
p , λ−p
λ>p
The Legendre-Fenchel transform L(q; k) = supp∈IR {pq − H(p; k)} becomes q ˆ ˆ L(q; k) = λq − 2 λq Λ(k) + Λ(k),
the supremum being attained for p = λ−
q
ˆ λΛ(k) . q
Therefore, for T > 0 arbitrary fixed,
and for absolutely continuous functions ϕ ∈ D([0, T], IR) with ϕ(0) = 0, the process ξ^ε satisfies the large deviation principle. Its rate function is I(ϕ) = min_{k=1,2} I_k(ϕ), where I_k(ϕ) = ∫_0^T L(ϕ'(t); k) dt and L(ϕ'(t); k) = λϕ'(t) − 2√(λϕ'(t)Λ̂(k)) + Λ̂(k).
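As a numerical check of this example, the Python sketch below evaluates the Legendre-Fenchel transform of H(p; k) = Λ̂(k) p/(λ − p) on a grid and compares it with the closed form λq − 2√(λqΛ̂(k)) + Λ̂(k); the parameter values are illustrative only.

import numpy as np

# Sketch for the compound Poisson example: exponential(lambda) jumps with merged
# intensity Lambda_k give H(p) = Lambda_k * p / (lam - p), p < lam.  Its Legendre
# transform should match lam*q - 2*sqrt(lam*q*Lambda_k) + Lambda_k.

lam = 2.0          # parameter of the exponential jump sizes (illustrative)
Lambda_k = 1.5     # merged intensity Lambda_hat(k) (illustrative)

def H(p):
    return Lambda_k * p / (lam - p)

q = 0.9
p_grid = np.linspace(-50.0, lam - 1e-4, 400000)
L_numeric = np.max(p_grid * q - H(p_grid))
L_closed = lam * q - 2.0 * np.sqrt(lam * q * Lambda_k) + Lambda_k
p_star = lam - np.sqrt(lam * Lambda_k / q)       # closed-form maximizer

print("L(q) numeric :", L_numeric)
print("L(q) closed  :", L_closed)
print("maximizer p* :", p_star)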
CHAPTER 4
LARGE DEVIATION PRINCIPLE FOR FUNCTIONAL CENTRAL LIMIT THEOREMS

We study a class of empirical measures on C[0, ∞) associated with S_n(g) = Σ_{k=0}^{n−1}
g(Xk ), where (Xk ) is a Markov chain with an invariant measure m and g ∈
L2 (m), via an invariance principle corresponding to interpolation of Sn . The necessary background is introduced and auxiliary results are provided for subsequent analysis. Then we provide criteria which allow us to establish a martingale decomposition representation for Sn (g), the key element in the proof of the large deviation result presented in the last section of this chapter. 4.1
Functional a.e. central limit theorems for additive functionals
Let X1 , X2 , ... be a sequence of i.i.d. random variables on (Ω, F, IP) such that P IE(Xi ) = 0 , IE(Xi2 ) = 1 and Sn = ni=1 Xi . Define the interpolation processes, with ψn nk = Sk for 1 ≤ k ≤ n, 1 ψn (t) := √ S[nt] + (nt − [nt])(S[nt]+1 − S[nt] ) , 0 ≤ t ≤ 1 n
(4.1.1)
and empirical processes Wn : C[0, 1] → M1 (C[0, 1]), n
1 X1 Wn (·) := δ{ψ ∈·} L(n) k=1 k k with L(n) =
Pn
1 k=1 k .
(4.1.2)
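A minimal Python sketch of the objects in (4.1.1)-(4.1.2) follows: it builds the interpolated processes ψ_k from simulated partial sums and evaluates the logarithmically weighted empirical average for one bounded functional f (the choice of f here is illustrative).

import numpy as np

# Sketch of (4.1.1)-(4.1.2): interpolated partial sums psi_k and the
# logarithmically weighted empirical measure W_n, tested on the functional
# f(omega) = sup_t |omega(t)| (an illustrative choice of f).

rng = np.random.default_rng(2)

def psi(S, k, t):
    """psi_k(t) = (S_[kt] + (kt - [kt])*(S_[kt]+1 - S_[kt])) / sqrt(k), 0 <= t <= 1."""
    kt = k * t
    j = int(np.floor(kt))
    return (S[j] + (kt - j) * (S[j + 1] - S[j])) / np.sqrt(k)

n = 2000
X = rng.standard_normal(n + 2)                   # i.i.d. mean 0, variance 1
S = np.concatenate([[0.0], np.cumsum(X)])        # S_0 = 0, S_m = X_1 + ... + X_m

tgrid = np.linspace(0.0, 1.0, 101)
f_vals = np.array([max(abs(psi(S, k, t)) for t in tgrid) for k in range(1, n + 1)])

weights = 1.0 / np.arange(1, n + 1)
L_n = weights.sum()                              # L(n) = sum_{k<=n} 1/k
W_n_f = (weights * f_vals).sum() / L_n           # integral of f against W_n
print("W_n(f) ~ %.3f" % W_n_f)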
The following functional almost everywhere central limit theo-
rem is due to Brosamler [25].
Theorem 4.1 The random process W_n converges weakly to the Wiener measure W on C[0, 1], IP-a.e.

Let (X_n) be an ergodic Markov chain with stationary measure m, and let S_n(g) = Σ_{k=0}^{n−1}
g(Xk ) be such that IEm (S12 ) = σ 2 ∈ (0, ∞) and for any initial distribution µ
and any k ∈ IN, IEµ (Sk2 ) < ∞. Define the interpolation processes 1 Ψn (t) := √ S[nt] + (nt − [nt])(S[nt]+1 − S[nt] ) , 0 ≤ t < ∞ σ n
(4.1.3)
and the empirical processes n
1 X1 Wn (·) := δ{Ψ ∈·} L(n) k=1 k k
(4.1.4)
Theorem 4.2 Wn converges weakly to the Wiener measure W on C[0, ∞), IPµ -a.e. Proof: By using our martingale decomposition (see section 3, Theorem 4.11) we can write Sn = Mn +Rn , where (Mn ) is a mean zero martingale for which sup1≤k≤n
|Rk | √ σ n
converges in probability to 0. Define the empirical measures n
WnM (·)
1 X1 δ M = L(n) k=1 k {Ψk ∈·}
(4.1.5)
where ΨM n is the interpolation process 1 ΨM n (t) = √ {M[nt] + (nt − [nt])(M[nt]+1 − M[nt] ) σ n
(4.1.6)
81 corresponding to the martingale M = (Mn ). By [26], WnM converges weakly to the Wiener measure W on C[0, 1], i.e., for any bounded continuous function f : C[0, 1] → IR , W - a.e., n
1 X1 lim δf ◦ΨM = φ(W ) k n→∞ L(n) k k=1 To show that Wn converges weakly to the Wiener measure W on C[0, 1], we check that n
n
1 X1 1 X1 lim δf ◦Ψk = lim δf ◦ΨM k n→∞ L(n) n→∞ L(n) k k k=1 k=1 for which it is sufficient to have limn→∞ ||Ψn − ΨM n ||C[0,1] = 0. Indeed, M ||Ψn − ΨM n ||C[0,1] = sup |Ψn (t) − Ψn (t)| = t∈[0,1] 1 sup √ {R[nt] + (nt − [nt])(R[nt]+1 − R[nt] )} ≤ t∈[0,1] σ n 1 √ |R[nt] + (nt − [nt])(R[nt]+1 − R[nt] )| ≤ sup sup 1≤k≤n t∈[ k−1 , k ] σ n n
n
1 √ |Rk | 1≤k≤n σ n sup
which by (4.5.2) converges to 0 in probability. By replacing 1 with N in the above proof the result holds for C[0, N ], for every N > 0. The extension to C[0, ∞) is standard: νn converges weakly to ν on C[0, ∞) if for every N > 0, πN νn converges weakly to πN ν on C[0, N ], where πN of a measure on C[0, ∞) denotes the measure it induces on C[0, N ]. In our case this means that Wn converges weakly to the Wiener measure W on C[0, ∞).
82 4.2
Large deviation principle for Ornstein-Uhlenbeck processes Define the canonical shifts σt : C(IR) → C(IR) by
σt ω(s) := ω(t + s)
and empirical processes RT : C(IR) → M1 (C(IR)) for any T > 0 by 1 RT := 2T
Z
T
δσt ω dt. −T
Assume that σ is an ergodic transformation, and let IP be a σ-invariant measure i.e. IP is the probability measure on the space of trajectories corresponding to the Ornstein-Uhlenbeck processes starting with stationary distribution at time t = 0. Let Ω = C(IR), then for any bounded continuous function we have: Z T Z Z T 1 1 f (ω)dRT (ω) = f (ω)dδσt ω dt = f (σt ω)dt 2T −T 2T −T Ω Ω Z f (ω)dIP →
Z
Ω
a.s. as T → ∞ by ergodic theorem. This implies the weak convergence RT ⇒ IP as T → ∞. The exponential decay for the deviations of RT from IP was given by Donsker and Varadhan in [27] through the rate function H(Q), Q ∈ M1 (C(IR)) defined as
H(Q) :=
limT →∞ 1 h(=|[−T,T ] (Q)/=|[−T,T ] (IP)) if Q is σ invariant T ∞
otherwise
83 where =|I (µ) denote the image of a measure µ under the restriction map on I and h(µ/ν) is the relative entropy of µ with respect to ν,
h(µ/ν) :=
R log( dµ ) dν ∞
if µ ν
(4.2.1)
otherwise
Theorem 4.3 For every bounded interval I ⊂ IR,
HI (·) :=
inf
Q∈=|−1 I (·)
H(Q)
is a rate function on M1 (C(I)) and for any Borel set A ⊆ M1 (C(I)), 1 log IP{=|I (RT ) ∈ A} T 1 ≤ lim sup log IP{=|I (RT ) ∈ A} ≤ − inf HI ¯ A T →∞ T
− inf◦ HI ≤ lim inf T →∞
A
An extension to paths on [0, ∞) by using a finer topology than the topology of uniform convergence on compact sets was obtained by Heck in [28]. Theorem 4.4 Let Φ : IR → [0, ∞) be a continuous function satisfying Φ(t) Φ(t) lim √ = lim p = ∞ t→−∞ t |t|
t→∞
and CΦ := {ω ∈ C(IR) : supt∈IR
|ω(t)| Φ(t)
(4.2.2)
< ∞}, M1 (CΦ ) = {Q ∈ M1 (C(IR)) : Q(CΦ ) = 1}.
Then H|M1 (CΦ ) is a rate function on M1 (CΦ ), and for every Borel set A ⊆ M1 (CΦ ), 1 log IP{RT ∈ A} T →∞ T 1 ≤ lim sup log IP{RT ∈ A} ≤ − inf H ¯ A T →∞ T
− inf◦ H ≤ lim inf A
84 Remark 4.5 Under condition (4.2.2), IP ∈ M1 (CΦ ) and IP-a.e. RT ∈ M1 (CΦ ). 4.3
Large deviation principle for Brownian motion Let φ : IR+ → IR+ be a continuous function such that φ(t) φ(t) lim p = lim p = ∞, t→0 t| log t| t→∞ t| log t|
(4.3.1)
and the set Cφ defined as
Cφ := {ω ∈ C[0, ∞) : sup t∈IR+
|ω(t)| < ∞} φ(t)
Consider the isomorphism F : C(IR) → C[0, ∞), F (ω)(t) :=
√
t ω(log t). Then
√ . F |CΦ : CΦ → Cφ is a bijective isometry with respect to | · |Φ and | · |φ if Φ(·) = φ(exp(·)) exp(·)
Let us denote W := =F (IP), so W is Wiener measure on C[0, ∞). Theorem 4.4 can be written in terms of Brownian motion as follows: For any t ∈ IR define θt : C[0, ∞) → C[0, ∞) by θt ω(s) := e−t/2 ω(et s) and for any T > 0 empirical processes ST : C[0, ∞) → M1 (C[0, ∞)) by 1 ST (ω) := 2T
Z
T
δθt ω dt −T
Theorem 4.6 Define for any Q in M1 (C[0, ∞)) the function
I(Q) :=
limT →∞ 1 h(=|[e−T ,eT ] (Q)/=|[e−T ,eT ] (W )) if Q is θ-invariant T ∞
otherwise
(4.3.2)
85 Then I|M1 (Cφ ) is a rate function and for any Borel set A ⊆ M1 (Cφ ), 1 log W {ST ∈ A} T →∞ T 1 ≤ lim sup log W {ST ∈ A} ≤ − inf I ¯ A T →∞ T
− inf◦ I ≤ lim inf A
where M1 (Cφ ) := {Q ∈ M1 (C[0, ∞)) : Q(Cφ ) = 1} Let’s define for any a > 0, θa : C[0, ∞) → C[0, ∞) by 1 θa ω(t) = √ ω(at) a and empirical processes RT : C[0, ∞) → M1 (C[0, ∞)), √
1 RT := log T
Z
T
δθt ω
√1 T
dt t
Then (Rn ) satisfies LDP with constants (log n) and rate function I, where
I(Q) :=
lima→∞
1 h 2 log a
Q ◦ |−1 /W ◦ |−1 [ 1 ,a] [ 1 ,a]
∞
a
if Q is θ-invariant
a
(4.3.3)
otherwise
which means 1 log W {Rn ∈ A} n→∞ log n 1 ≤ lim sup log W {Rn ∈ A} ≤ − inf I ¯ A n→∞ log n
− inf◦ I ≤ lim inf A
Since
1 log n
Rn 1
δθt dtt has under W the same distribution as
the following corollary:
1 log n
R √n √1 n
δθt dtt we have
86 ∼
Corollary 4.7 Let Rn =
1 log n
Rn 1
∼
δθt dtt . Then (Rn ) satisfies LDP with constants
(log n) and rate function I defined in (4.3.3). 4.4
Large deviation principle for sequence of i.i.d. random variables
Let’s consider X1 , X2 , ... i.i.d. random variables on (Ω, F, IP) such that IE(Xi ) = P 0 , IE(Xi2 ) = 1 and Sn = ni=1 Xi . We want to prove the LDP for the functional almost everywhere central limit theorem given in Theorem 4.1. Lemma 4.8 Let Yn and Zn be random variables with values in a metric space (E, d) such that for all > 0, 1 log IP{d(Yn , Zn ) > } = −∞ n→∞ log n
(4.4.1)
lim
Then Yn and Zn are equivalent with respect to LDP (or exponentially equivalent), which means that if (Yn ) satisfies LDP with constants (log n) and rate function I, then (Zn ) also satisfies LDP with the same constants and rate function. To use this lemma for sequences of random variables in M1 (Cφ ) it is necessary to define a metric, as done in [28]. Let (Cφ , | · |φ ) be a metric space, with |ω|φ = supt∈IR+
|ω(t)| . φ(t)
On M1 (Cφ ) define
Z Z 1 dφ (µ, ν) := sup f dµ − f dν , f ∈ C(Cψ , IR), kf kL ≤ 2 where kf kL := supω∈Cφ |f (ω)|+supω,ω0 ∈Cφ ,ω6=ω0
|f (ω)−f (ω 0 )| . |ω−ω 0 |φ
a metric space and the following properties hold: (a) dφ ≤ 1
(4.4.2)
Then (M1 (Cφ ), dφ ) becomes
87 ∼
∼
∼
∼
(b) dφ (αµ + (1 − α)ν, αµ + (1 − α)ν) ≤ αdφ (µ, µ) + (1 − α)dφ (ν, ν), for α ∈ [0, 1] ∼
∼
and µ, µ, ν, ν ∈ M1 (Cφ ) (c) dφ (δω , δω0 ) ≤ |ω − ω 0 |ψ for ω, ω 0 ∈ Cφ . Then, as shown in [29], we have Lemma 4.9 Let Yt ,Zt , t ∈ [1, ∞) be random variables with values in the separable metric space (Cφ , | · |φ ) such that t → Yt , t → Zt are continuous from the right or left and for all ε > 0,
lim
l→∞
Then
1 log n
Rn 1
δYs ds s
Let Wn = tion φ(t) > t
1+ε 2
1 log IP log l
1 L(n)
and Pn
1 log n
1 k=1 k δψk
(
) sup |Ys − Zs |φ > ε
= −∞
(4.4.3)
s∈[l,l+1)
Rn 1
δZs ds s
are equivalent with respect to LDP.
be as defined in (4.1.2) and φ above satisfies in addi-
, for t ∈ [0, 1].
Theorem 4.10 (Wn ) satisfies LDP with constants (log n) and rate function I defined by (4.3.3). Proof: ∼
Let W n :=
1 log n
Rn 1
δψ[s]
ds . s
∼
First we show that W n and Wn are exponentially
equivalent. By Lemma 4.8 it suffices to verify ∼ 1 log IP{dφ (Wn , W n ) > ε} = −∞ n→∞ log n
lim
88 Given f ∈ C(Cφ , IR), kf kL ≤
1 2
we have
n 1 Z n dt 1 X1 f (ψ[t] ) − | f dWn − f dW n | = f (ψk ) log n 1 t L(n) k=1 k n−1 n 1 X 1 X1 1 = f (ψk ) log(1 + )δψk − log n k L(n) k k=1 k=1 n−1 1 1 X 1 1 1 ≤ log(1 + f (ψn ) + log(1 + ) − f (ψk ) L(n) L(n) n k k k=1 n−1 1 X 1 1 + − log(1 + f (ψk ) L(n) log n k=1 k ! n−1 1 log n 1 C X 1 1 − ≤ + + 2 nL(n) L(n) k=1 k 2 L(n) Z
Z
∼
which converges to 0 uniformly as n → ∞. Let X1 , X2 , .... be i.i.d. random variP ables such that IE(Xi ) = 0, IE(Xi2 ) = 1 and the partial sum Sn = ni=1 Xi . If one ∼
shows that (W n ) satisfies the LDP with constants (log n) and rate function I in the case in which Xi are i.i.d. N (0, 1)-distributed, then the general case follows, see [29], i.e., by Skorokhod’s representation theorem there exists a probability space (Ω, F, IP) with random variables (Yn ) and (Zn ) such that Yn , n ∈ IN are i.i.d. with the same ∼
distribution as Xn , Zn are i.i.d N (0, 1)-distributed and (W n ) corresponding to (Yn ) and respectively (Zn ) are shown to be equivalent with respect to LDP. Therefore, it remains to consider the case for Xi which are i.i.d. N (0, 1)-distributed. Let (Ω = C0 [0, ∞), F, IP = W ) with coordinate map Xt (ω) = ω(t) and Xi (ω) := ∼
ω(i) − ω(i − 1) ∼ i.i.d. N (0, 1)-distributed. We will show that (W n ) satisfies the ∼
∼
LDP by checking that W n and Rn are equivalent with respect to LDP.
89 By Lemma 4.9, this reduces to checking that for any > 0 1 lim log W n→∞ log n
(
) sup
|θs − ψn |φ > ε
= −∞
(4.4.4)
s∈[n,n+1)
The probability W {sups∈[n,n+1) |θs − ψn |φ > ε} is dominated by the sum of three probabilities, W {sups∈[n,n+1) |θn − ψn |φ > 2ε }, W {sups∈[n,n+1) supt∈IR+
|ω(st)−ω(nt)| √1 φ(t) s
W {sups∈[n,n+1) supt∈IR+ | √1s −
> 4ε },
|ω(nt)| √1 | n φ(t)
> 4 }. Given ε > 0, based on arguments of
Lemma 4 in [28], the estimates below hold for large n and utilize the properties of function φ given in Theorem 4.6 along with the properties of Brownian motion, ( W
)
ω ∈ C0 [0, ∞) : sup |ω(t) − ω(a)| ≥ c
≤ 2 exp −
t∈[a,b]
for all 0 ≤ a < b and c > 0, and the inequalities: Log(e|k| ) ≥ √1 ) u
c2 2(b − a)
|k|+1 , 2
sups∈[n,n+1) ( √1n −
≤ 12 n−3/2 . In what follows the positive constant cij represent the j-th constant
in the i-th estimate. Regarding the first estimate we have
(
) ε ≤ W sup sup |θs − ψn |φ > 2 s∈[n,n+1) t∈IR+ |θn (t) − ψn (t)| ε√ W sup sup > n ≤ k≥1 t∈[ k , k+1 ) φ(t) 2 n n ( ) k ε√ W sup sup |ω(t) − ω(k)| > nφ 2 n k≥1 t∈[k,k+1)
90 Then for sufficiently large n (
) ε√ k W sup sup |ω(t) − ω(k)| > nφ ≤ 2 n 1≤k≤n t∈[k,k+1) ( r ε ) n X ε√ k k ≤ W sup |ω(t) − ω(k)| > n 2 n n t∈[k,k+1) k=1 ( ) 1 n X ε k 2 +ε W sup |ω(t) − ω(k)| > ≤ 2 nε t∈[k,k+1) k=1 2 n X ε 1 ε2 k 1+2ε ≤ 2n exp − n 2 exp − 2ε 2 4 n 8 k=1 and
s ) ε√ k k W sup sup |ω(t) − ω(k)| > n Log ≤ 2 n n k≥n+1 t∈[k,k+1) ( ) ∞ X ε√ W sup |ω(t) − ω(k)| > 3k ≤ 2 t∈[k,k+1) k=n+1 ∞ X 1 ε2 2 exp − 9k ≤ c11 e−c12 n . 2 4 k=n+1 (
Therefore, ε2 W { sup |θs − ψn |φ > ε} ≤ 2n exp − n + c11 e−c12 n . 8 s∈[n,n+1)
For the second estimate one obtains
91
(
) |ω(st) − ω(nt)| ε √ W sup sup > ≤ 4 sφ(t) s∈[n,n+1) t∈IR+ ) ( X √ |ω(st) − ω(nt)| ε > n ≤ W sup sup φ(t) 4 s∈[n,n+1) t∈[el ,el+1 ) l∈ZZ ) ( X |ω((s + n)t) − ω(nt)| ε√ > n ≤ W sup sup φ(t) 4 s∈[0,1) t∈[el ,el+1 ) l∈ZZ ( ) X ε √ φ(el ) n√ W sup sup |ω((s + n)t) − ω(nt)| > ≤ 4 el s∈[0,1) t∈[1,e) l∈ZZ ( ) p X ε nLog(e|l| )ζ(e|l| ) W sup sup |ω((s + n)t) − ω(nt)| > ≤ 4 s∈[0,1) t∈[1,e) l∈ZZ X c21 n exp(−c22 ε2 n(|l| + 1)) ≤ c23 n exp(−c24 ε2 n), l∈ZZ
where ζ : [1, ∞) → [1, ∞), ζ(t) := inf s∈IR+ \( 1 ,t) t
φ(s)2 , sLog(s)
limt→∞ ζ(t) = ∞.
Finally, (
) 1 1 |ω(nt)| ε W sup sup √ − √ > ≤ 4 s n φ(t) s∈[n,n+1) t∈IR+ ) X ( ε εn |ω(nt)| |ω(t)| > ≤ W sup > ≤ W sup √ 4 4 t∈IR+ n nφ(t) t∈[el ,el+1 ) φ(t) l∈ZZ ( ) q X n W sup |w(t)| > ε Log(e|l| )ζ(e|l| ) ≤ 4 t∈[1,e) l∈ZZ X c31 exp(−ε2 n2 c32 (|l| + 1)) ≤ c32 exp(−c34 ε2 n2 ). l∈ZZ
which establishes (4.4.4) and concludes the proof.
92 4.5
Martingale decomposition of Markov functionals In this section we show that certain functionals of Markov chains can be writ-
ten as martingales perturbed by random variables whose maxima (scaled by factor √1 ) n
converge in probability to zero at a prescribed rate. The main interest in such
representation stems from the fact that when proving central limit theorem or large deviation principle, the methods in Markovian versus martingale case are essentially different (except when the two cases overlap and either method can be employed), while martingale approach is often preferable. For instance, martingale maximal inequalities offer a useful tool in handling the tail estimates which in turn could be applied to Markovian cases whenever possible. In what follows we provide criteria for a martingale decomposition and describe the classes of Markov chain functionals that allow such representation. Let (Ω, F, IP) be a probability space, (E, BE ) a complete separable metric space, and let Xn , n ≥ 0 be a Markov chain defined on Ω with values in E. Given a probability measure µ on (E, BE ), one defines the probability measures IPµ by Z IPµ (B) = µIP(B) =
µ(dx)p(x, B),
x ∈ E, B ∈ F
E
where p(x, B), x ∈ E, B ∈ BE is the Markov transition function of Xn , n ≥ 0. We denote by IEµ and IEx the expectations corresponding to IPµ and IPx respectively. Inductively one defines the n-step transition probability by pn (x, B) = IP(Xn ∈ B|X0 = x) = IP(Xn+m ∈ B|Xm = x). Let P be the transition probability operator defined as Z P ϕ(x) := IE[ϕ(xn+1 )|xn = x] =
p(x, dy)ϕ(y) E
93 and denote by P n the n-step transition operator corresponding to the n-step transition probability pn (x, B). Theorem 4.11 Let Xn , n ≥ 0 be an ergodic E-valued Markov chain with initial distribution µ and unique invariant probability measure m. R If g ∈ L2 (m) := {g : E → IR : E g 2 dm < ∞} satisfies the properties: R (i) E g dm = 0 (ii) kP k gkL2 (m) ≤ ρk kgkL2 (m) for some 0 < ρ < 1, k ∈ IN R k (iii) dµP ≤ D < ∞, k ∈ IN and g 2 (x)m(dx) ≤ exp(−ϕ(n)) for n large, dm {x: g 2 (x)>n} with ϕ : IR+ → IR+ such that limx→∞
ϕ(x) log(x)
→∞
(iv) |P k g(x)| ≤ Cn, whenever |g(x)| ≤ n, for some 1 < C < ∞, x ∈ E, k ∈ IN, and large n then Sn (g) :=
n−1 X
g(Xk ) = Mn + Rn
(4.5.1)
k=0
where Mn is a mean zero martingale relative to (Ω, Fn , IP), with the natural filtration Fn = σ(X0 , ..., Xn ) and 1 lim log IP n→∞ log n
sup1≤k≤n Rk2 > ε = −∞. n
(4.5.2)
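Before the proof, here is a small numerical sketch of the decomposition (4.5.1) in the finite-state setting of Theorem 4.12(a) below: u solves the Poisson equation (I − P)u = g for a centered g, and S_n = M_n + R_n is verified along a simulated path. The transition matrix and the function g are placeholder choices.

import numpy as np

# Sketch of S_n(g) = M_n + R_n for a finite-state ergodic chain: solve the
# Poisson equation (I - P)u = g with g centered under the invariant measure m,
# then M_n = sum_k (u(X_k) - P u(X_{k-1})) and R_n = u(X_0) - u(X_n).

rng = np.random.default_rng(3)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# invariant measure m: m P = m, sum(m) = 1
w, v = np.linalg.eig(P.T)
m = np.real(v[:, np.argmin(abs(w - 1.0))])
m = m / m.sum()

g = np.array([1.0, -0.5, 0.3])
g = g - m @ g                                    # center so that integral of g dm = 0

# Poisson equation (I - P)u = g (least-squares fixes the additive constant)
u = np.linalg.lstsq(np.eye(3) - P, g, rcond=None)[0]

# simulate the chain and form S_n, M_n, R_n
n = 10000
X = np.empty(n + 1, dtype=int)
X[0] = 0
for k in range(n):
    X[k + 1] = rng.choice(3, p=P[X[k]])

S = np.cumsum(g[X[:-1]])                         # S_j = g(X_0) + ... + g(X_{j-1})
Pu = P @ u
M = np.cumsum(u[X[1:]] - Pu[X[:-1]])             # martingale part M_j
R = u[X[0]] - u[X[1:]]                           # remainder R_j = u(X_0) - u(X_j)

print("max |S - (M + R)| =", np.max(np.abs(S - (M + R))))   # ~ 0 up to rounding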
Proof: We show that under our hypothesis the Poisson equation (I − P )u = g has a unique solution. Notice that I − P is not invertible, while ((1 + ε)I − P )uε = g, ε > 0 has unique solution ∞
1 X P k−1 g uε = ((1 + ε)I − P ) g = 1 + ε k=1 (1 + ε)k−1 −1
(4.5.3)
94 thanks to 1 + ε being in the resolvent of L2 (m)-contractive P , whose spectral radius is 1. Condition (ii) implies that the series in (4.5.3) converges in L2 (m) and we have
u(x) := lim uε (x) = ε→0
Therefore, Sn (g) =
Pn−1
k=0 g(Xk ) =
Pn
k=1 (u(Xk )
∞ X
P k g.
(4.5.4)
i=0
− P u(Xk−1 )) + u(X0 ) − u(Xn ) and
by Markov property Mn :=
n X
(u(Xk ) − P u(Xk−1 ))
k=1
is a mean zero martingale in L2 (m) with respect to the natural filtration Fn = σ(X0 , ..., Xn ). Setting Rn := u(X0 ) − u(Xn )
(4.5.5)
establishes the martingale decomposition (4.5.1). To simplify notation, summation index below and whenever else integer is needed, is understood as the integer part of the number. For each k ∈ IN we have √ √ 4n X i Cn P g(Xk ) > IP(u2 (Xk ) > Cn) ≤ IP 2 i=0 √ X ∞ Cn + IP P i g(Xk ) > 2 i= √ 4 n+1
95 The second term with n ≥ 4 satisfies ∞ √ ∞ X X Cn i i = ≤ IE IP P g(Xk ) > P g(X ) k √ 2 i= √ 4 n+1 i= 4 n+1
∞
∞
X
X
i i
P g(Xk ) ≤ P g(Xk ) ≤
√
i= 4 n+1
1
i= √
2 4 n+1 L (Ω)
∞ X √ i= 4 n+1
L (Ω)
kP i g(Xk )kL2 (Ω)
Using conditions (iii) and (ii) we get
i
g(Xk )k2L2 (Ω)
i
2
Z
i
2
g(y)p (Xk , dy) = kP = IE(P g(Xk )) = IE 2 Z Z i g(y)p (x, dy) µP k (dx) ≤ DkP i g(x)k2L2 (m)≤ Dρ2i kg(x)k2L2 (m)
Consequently, for n sufficiently large, √ √ X 4 n+1 √ ∞ √ Cn ρ i IP ≤ D kgkL2 (m)≤ A exp(−B 4 n) P g(Xk ) > 2 1−ρ i= √ 4 n+1 for some positive constants A and B. Turning to the first term, we have √ √ 4n 4n √ X X i Cn Cn i 2 IP P g(Xk ) > ≤ IP |P g(Xk )| > , 2 4 i=0 i=0 and for Ω = {g 2 (Xk ) > n4 } ∪ {g 2 (Xk ) ≤ n4 } gives Cn n i 2 IP |P g(Xk )| > ≤ IP(g 2 (Xk ) > ) 4 4 n Cn 2 i 2 + IP {g (Xk ) ≤ } ∩ {|P g(Xk )| > } 4 4
96 and (iv) makes the second term disappear while the first term, thanks to (ii), satisfies n Cn Cn k 2 2 IP g (Xk ) > = µIP g > ≤ exp −ϕ . 4 4 4 Combining the above yields √ √ √ εCn ) IP( max |Rk | > εCn) ≤ nIP(|Rk | > εCn) ≤ 2nIP(|u(Xk )| > 1≤k≤n 2 h i √ √ n ≤ 2n A exp −B 4 n) + 4 n exp(−ϕ 4 and (4.5.2) follows. Theorem 4.12 The following classes of processes satisfy the conclusions (4.5.1) and (4.5.2) of theorem (4.11): (a) finite state irreducible, aperiodic Markov chains (b) uniformly ergodic Markov chains with bounded functions g (c) Markov chains obtained by quantization P n of continuous time Markov processes P t , that are symmetric on L2 (m) (i.e. for which m is a reversible measure). Proof: (a) Since there is a unique invariant measure m = (m1 , ..., mN ), the assumption P (i) for g = (g1 , ..., gN ) can be written as N i=1 gi mi = 0. Then by the Fredholm Alternative, there is a solution of the Poisson equation (I − P )u ≡ Au = g if and only if g is orthogonal to w where A∗ w ≡ AT w = 0. The later is true because mA = 0 or AT mT = 0. We do not need to verify (ii) since that was only used to prove the existence of solution to the Poisson equation. Also, since g on {1, ..., N } is bounded, (iii) and(iv) clearly hold. (b) If the Markov chain is uniformly ergodic then the operator Q := I − P is normally solvable and condition (i) enables the uniqueness of the solution of
97 the Poisson equation. That is u = R0 g where R0 is the potential operator P n defined by R0 := ∞ n=0 [P − Π] and the projection operator in B(E) is defined R by Πg(x) := E m(dx)g(y)1I(x). Since R0 is a bounded operator conclusions (4.5.1) and (4.5.2) follows directly. Moreover, the conditions (iii) and (iv) holds because g is bounded. (c) For the sake of the discussion, we consider a class of continuous time Markov processes in (IRd , | · |) such that P t g = etA g for smooth functions g, where the infinitesimal generator A has a discrete spectrum {−λn , n ∈ IN} with λ0 = 0 < λ1 < λ2 · · · and its corresponding orthonormal in L2 (m) set of eigenfunctions {en , n ∈ IN} with e0 = 1. Then for any g in the orthogonal complement 1⊥ , the condition (i) is satisfied and P k en = e−λn k en , n ≥ 1. Consequently, for P∞ P −λn k k en , whence kP k g(x)kL2 (m) ≤ α e , P g(x) = g ∈ 1⊥ , g = ∞ n n n=1 αn e n=1 ρk kgkL2 (m) with 0 < ρ = e−λ1 < 1 and this gives (ii). Furthermore, the invariant measure m often has a density with respect to Lebesgue measure and condition R dµP k ≤ D < ∞ is satisfied. Finally, given m, the tail condition g 2 dm ≤ dm {g 2 >n} exp(−ϕ(n)) is readily verifiable for a large class of g ∈ 1⊥ in addition to which g is chosen to satisfy (iv). We remark that for any finite linear combination of eigenfunctions from 1⊥ , the condition (iv) is redundant and the tail condition R P reduces to {e2 >a} e2i dm ≤ e−ϕ(a) for large a. Namely, given g = N i=1 αi ei , i
|P i g(x)|2 ≤ kgk2L2 (m) (e21 (x) + · · · + e2N (x))
and therefore
IP(|P i g(Xk )|2 > n) ≤ IP
N X i=1
! e2i (Xk ) > n
≤
98 N X
IP
e2i (Xk )
i=1
n > ≤ N D max 1≤i≤N N
Z {e2i >n/N }
e2i dm
as claimed.
Hypercontractivity Example
Consider a 1-dimensional Ornstein-Uhlenbeck process Xt satisfying the equation dXt = − 12 Xt dt + dBt , X0 = x, where Bt is the standard Brownian motion, with 2
d 1 d infinitesimal generator A = 21 dx 2 − 2 x dx and the invariant measure dm = R Then P t g(x) := g(y)pt (x, y)dy, where
1
t
−
p (x, y) = p e 2π(1 − e−t )
−t (xe 2 −y)2 2(1−e−t )
2
x √1 e− 2 2π
dx.
,
is a m-symmetric hypercontractive Hermite semigroup with other examples, including Laguerre semigroup, studied in [30] and [31]. x2
Here, for Hermite polynomials Hn = (−1)n e 2 x, H2 = x2 − 1, H3 = x3 − 3x, ..., en =
Hn √ n!
2
dn − xn e dxn
, we have H0 = 1, H1 =
form an orthonormal basis for L2 (m)
and AHn = − n2 Hn gives the corresponding eigenvalues λn = − n2 , n ∈ IN. Letting t = 0, 1, 2, ... we obtain the embedded Markov chain Xn . Since
k
dµP = p
1 2π(1 − e−k )
e
−
−k (xe 2 −y)2 2(1−e−k )
”
r
dy
then, for example, taking µ = δx = δ0 we have 2 dµP k 1 −y =√ e 2 dm 1 − e−k
“
e−k 1−e−k
≤
e ≡D e−1
99 It is easy to see that for polynomial domination, |g(x)| ≤ B|x|l , l ∈ IN, and for large n Z
g 2 dm ≤ Cn2l−1 e−
n2 2
≤ e−
n2 4
{g 2 >n}
whence one may choose ϕ(x) =
x2 4
to satisfy the tail estimate.
For example, taking g(x) = x one gets
i
2
|P g(x)|
Z =
2 |g(y)p (x, y) dy| ≤ DkgkL2 (m) = D2 (1 − e−i + x2 ) i
≤ 4D max(1, x2 )
and condition (iv) holds true with C = 2D. For general g, even growing exponentially fast to infinity, it requires some work through estimates and usually g determines ϕ. 4.6
Large deviation principle for additive functionals
Let Xn , n ≥ 0 be an ergodic E-valued Markov chain with stationary measure P 2 2 m, as in Theorem 4.11. Let Sn = n−1 k=0 g(Xk ) and IEm (S1 ) = σ ∈ (0, ∞). According to Theorem 4.11, Sn = Mn + Rn , where Mn is a mean zero martingale and Rn satisfies (4.5.2). Let Wn and WnM be the empirical measures defined in (4.1.4) and (4.1.5) corresponding to the interpolation processes Ψn and ΨM n defined by (4.1.3) and (4.1.6) respectively. By Theorem 4.2, Wn converges weakly to the Wiener measure W on C[0, ∞). To conclude our analysis we invoke a martingale LDP [32], applied here to WnM , and show that WnM and Wn are LDP equivalent. Lemma 4.13 If lim
n→∞
1 log IP{|Ψn − ΨM n |φ > ε} = −∞ log n
(4.6.1)
100 1 then (Wn ) ≡ ( L(n)
Pn
1 k=1 k δΨk ∈· )
1 and (WnM ) ≡ ( L(n)
Pn
1 ) k=1 k δΨM k ∈·
are equivalent with
respect to LDP. Proof: By the above Lemma, we need to verify 1 log IP{dφ (Wn , WnM ) > ε} = −∞ n→∞ log n lim
where dφ is defined in (4.4.2). For f ∈ C(Cφ , IR), kf kL ≤
1 2
(4.6.2)
we have
Z Z n X 1 f dWn − f dWnM ≤ 1 |f (Ψk ) − f (ΨM k )| = L(n) k k=1 ε/2
[n ] 1 X 1 1 |f (Ψk ) − f (ΨM k )| + L(n) k=1 k L(n)
|Ψk − ΨM k |φ ≤ ≤
n X k=[nε/2 ]+1
1 |f (Ψk ) − f (ΨM k )| M k |Ψk − Ψk |φ
L([nε/2 ] L(n) − L([nε/2 ]) + sup |Ψk − ΨM k |φ L(n) 2L(n) ε/2 1+[n ]≤k≤n
ε 1 + sup |Ψk − ΨM k |φ 2 2 1+[nε/2 ]≤k≤n
whence IP{dφ (Wn , WnM ) > ε} ≤ IP{
|Ψk − ΨM k |φ > ε}.
sup 1+[nε/2 ]≤k≤n
Condition (4.6.1) implies that for every ε > 0 and N > 0, there exists k0 such that − for any k ≥ k0 , |Ψk − ΨM k |φ < k
2N ε
k −2 . Therefore,
IP{dφ (Wn , WnM ) > ε} ≤ n X k=[nε/2 ]+1
k−
2N ε
n X
k=[nε/2 ]+1 ∞ X −N −2
k −2 ≤ n
k
k=1
where c is a positive constant and (4.6.2) holds.
IP{|Ψk − ΨM k |φ > ε} ≤ = cn−N .
101 Assume in addition to the conditions of Theorem 4.11 that P∞
k=n
exp(−ϕ(k))
0). Theorem 4.14 The sequence (Wn ) satisfies the large deviation principle with constants (log n) and rate function I|M1 (Cφ ) defined in (4.3.3), that is, for any Borel set A ⊆ M1 (Cφ ), 1 log IP{Wn ∈ A} log n 1 ≤ lim sup log IP{Wn ∈ A} ≤ − inf I ¯ A n→∞ log n
− inf◦ I ≤ lim inf A
n→∞
where M1 (Cφ ) := {Q ∈ M1 (C[0, ∞)) : Q(Cφ ) = 1}, and φ is defined in (4.3.1). Proof: By Lemma 4.13 it suffices to check that (4.6.1) holds. We have
IP{|Ψk − (
ΨM k |φ
|Ψk (t) − ΨM k (t)| >ε = > ε} = IP sup φ(t) t∈IR+
) |R − (nt − [nt])(R − R | 1 [nt] [nt]+1 [nt] √ IP sup sup >ε φ(t) 1≤k εσ nφ( ) ≤ IP |Rk | > εσ n n n 1≤k εσ n Log( ) ≤ nIP{|Rk | > εσ n} + n n k=n+1 ∞ X
√ √ IP{|Rk | > εσ 3k} ≤ c41 n exp(−c42 4 n)+c43 n exp(−ϕ(n))+ n−γ
k=n+1
which ends the proof.
CHAPTER 5 FURTHER RESEARCH DIRECTIONS Large deviation principle for additive functionals of Markov chains was determined in Chapter 4. A similar result can be found for additive functionals of Markov processes. Rt
g(Ys ) ds where {Yt } is a Markov process with invariant measure m R nt R and g ∈ L2 (m) such that g dm = 0. To Xn (t) = σ√1 n 0 g(Ys ) ds we associate the Let Xt =
0
empirical processes Wn : C[0, ∞) → M1 (C[0, ∞)), n
1 X1 δ{Xk ∈·} Wn (·) := L(n) k=1 k with L(n) =
Pn
1 k=1 k .
Theorem 5.1 (Bhattacharya) (Functional central limit theorem) Let {Yt }t≥0 be an ergodic stationary Markov process on a state space S. If A is its infinitesimal generator on L2 (S, dm) where m is the invariant probability measure, then for all functions R nt g in the range of A, Xn (t) = σ√1 n 0 g(Ys ) ds, t ≥ 0, converges to W , the Wiener measure with zero drift and variance parameter σ 2 = −2hg, f i = 2hAf, f i where f is some element in the domain of A such that the Poisson equation g = −Af holds. This theorem can be easily extended to [0, ∞). In order to establish the functional almost everywhere convergence (FAECLT) of Wn to a Wiener measure on C[0, ∞) we need to find a martingale decomposition for the additive functional Xn (t). Using Poisson equation and the martingale problem associated to the Markov process Yt , one gets 102
103
Z nt 1 Xn (t) = g(Ys ) ds = − √ Af (Ys ) ds σ n 0 σ n 0 Z nt 1 = f (Yt ) − f (Y0 ) − √ Af (Ys ) ds + (f (Y0 ) − f (Yt )) σ n 0 1 √
Z
nt
= Mn (t) + Rn (t)
where Mn (t) is a martingale. Next it should be shown that the remaining part Rn (t) goes to 0 in probability and the FAECLT follows. For the large deviation result we have to prove that Rn (t) goes to 0 to a certain rate, and we need to find criteria for which this is fulfilled. Another future direction is to find a large deviation result for the stochastic additive functional ε
Another future direction is to find a large deviation result for the stochastic additive functional
$$\xi^{\varepsilon}(t)=\xi^{\varepsilon}(0)+\int_0^{t}\eta^{\varepsilon}\Big(ds;\,x^{\varepsilon}\Big(\frac{s}{\varepsilon^{2}}\Big)\Big),$$
where the supporting Markov process $x(t)$, $t\ge 0$, is uniformly ergodic on $E$ with stationary distribution $\pi(dx)$, and $\eta^{\varepsilon}(t;x)$, $t\ge 0$, $x\in E$, is a process with locally independent increments defined by the generators
$$\mathrm{I}\Gamma^{\varepsilon}(x)\varphi(u)=a^{\varepsilon}(u;x)\,\varphi'(u)
+\varepsilon^{-1}\int_{\mathbb{R}^{d}}\big[\varphi(u+\varepsilon v)-\varphi(u)-\varepsilon v\,\varphi'(u)\big]\,\Gamma^{\varepsilon}(u,dv;x).$$
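For orientation (a formal remark added here, under the extra assumptions that $\varphi$ is twice continuously differentiable with bounded second derivative and that $\Gamma^{\varepsilon}(u,\cdot\,;x)$ has finite second moments), a Taylor expansion shows that the integral part of the generator is of order $\varepsilon$:
$$\big|\varphi(u+\varepsilon v)-\varphi(u)-\varepsilon v\,\varphi'(u)\big|\le\frac{\varepsilon^{2}|v|^{2}}{2}\,\|\varphi''\|_{\infty},
\qquad\text{hence}\qquad
\Big|\,\varepsilon^{-1}\!\int_{\mathbb{R}^{d}}\big[\cdots\big]\,\Gamma^{\varepsilon}(u,dv;x)\Big|\le\frac{\varepsilon}{2}\,\|\varphi''\|_{\infty}\int_{\mathbb{R}^{d}}|v|^{2}\,\Gamma^{\varepsilon}(u,dv;x),$$
so that $\mathrm{I}\Gamma^{\varepsilon}(x)\varphi(u)$ is formally a small perturbation of the drift term $a^{\varepsilon}(u;x)\varphi'(u)$, consistent with the averaging-type limit theorems used earlier in the dissertation.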
To get a martingale decomposition for this stochastic additive functional one can consider the Poisson equation associated with it. It could be shown that the existence of a solution to the Poisson equation for the process with locally independent increments is equivalent to the existence of a solution to the Poisson equation for the embedded Markov chain associated with that process. We note that, because of the random jumps of the process, the Poisson equation for continuous-time processes cannot be applied directly.
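To indicate what such an equivalence looks like in the simplest setting (a formal illustration added here, with the notation $q(x)$, $P(x,dy)$ introduced only for this example), consider a regular jump Markov process with generator
$$Qf(x)=q(x)\int_{E}P(x,dy)\,\big[f(y)-f(x)\big],$$
where $q(x)>0$ is the jump intensity and $P$ is the transition kernel of the embedded chain. The continuous-time Poisson equation $Qf=-g$ then reads
$$q(x)\big(Pf(x)-f(x)\big)=-g(x)\quad\Longleftrightarrow\quad(I-P)f(x)=\frac{g(x)}{q(x)},$$
which is a Poisson equation for the embedded chain with the modified right-hand side $g/q$; an analogous reduction is what one would seek for the process with locally independent increments switched by a jump Markov process.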
Using a similar approach as in Chapter 4, one can obtain the large deviation result for the stochastic additive functional $\xi^{\varepsilon}(t)$. Another future research goal is to extend this dissertation to limit theorems and large deviations for stochastic additive functionals switched by semi-Markov processes. Exit time applications and stability of perturbed dynamical systems may also be considered.
BIOGRAPHICAL INFORMATION
Adina Oprisan was born in Campulung, Romania, in 1974. She graduated from Bucharest University, Romania, in 1997, after which she received an M.S. in Mathematics from the same university in 1998. From 1998 to 2003 she was with the University "Politehnica" of Bucharest as an instructor in the Department of Mathematics II. She received an M.S. degree in Statistics from Michigan State University in 2006 and her Ph.D. in Mathematics from The University of Texas at Arlington in 2009. Her current research interests lie in the area of stochastic processes, with a focus on randomly perturbed dynamical systems, almost everywhere central limit theorems, and large deviation principles. She is a member of AMS and SIAM.