Applications of Large Deviations to Hidden Markov Chains Estimation

Fabiola Del Greco M.

Abstract Consider a hidden Markov model in which the observations are generated by an underlying Markov chain plus a perturbation; the perturbation and the Markov process may depend on each other. We apply a large deviations result to obtain an approximate confidence interval for the stationary distribution of the underlying Markov chain.

1 Introduction

Hidden Markov models (HMMs) describe the relationship between two stochastic processes: an observed one and an underlying hidden (unobserved) one. These models are used for two purposes. The first is to make inferences or predictions about the unobserved process based on the observed one. The second is to explain variation in the observed process by variation in a postulated hidden one. For these reasons, HMMs have become quite popular and have many applications (e.g. biology, speech recognition, finance). For a general reference on these models, see Cappé et al. (2005).

More precisely, suppose a phenomenon is driven by a discrete-time Markov chain $X$ with finite state space. Due to measurement errors we can observe only perturbed values. Suppose we have a set of observations $Y := \{Y_i,\ i \in \mathbb{N}\}$, which take values in $\mathbb{R}^d$, and the following relationship holds:
$$Y_i = X_i + \varepsilon_i, \qquad \forall i \ge 1,$$

F. Del Greco M. () Institute of Genetic Medicine, EURAC research, Viale Druso 1, 39100 Bolzano, Italy e-mail: [email protected] A. Di Ciaccio et al. (eds.), Advanced Statistical Methods for the Analysis of Large Data-Sets, Studies in Theoretical and Applied Statistics, DOI 10.1007/978-3-642-21037-2 25, © Springer-Verlag Berlin Heidelberg 2012


where the processes $X := \{X_j,\ j \in \mathbb{N}\}$ and $\varepsilon := \{\varepsilon_j,\ j \in \mathbb{N}\}$ satisfy the following assumptions. The process $X$ is a homogeneous Markov chain with a finite, known state space $\Omega := \{x_1, \ldots, x_m\}$, with $x_i \in \mathbb{R}^d$. Let $P^*$ be the true transition matrix of $X$, whose entries are
$$P^*_{i,j} := \mathbb{P}(X_1 = x_j \mid X_0 = x_i).$$
Of course $P^*$ is unknown, and we assume it is irreducible and aperiodic. This assumption implies the existence and uniqueness of the stationary distribution, which we denote by $\pi^* = (\pi^*_1, \pi^*_2, \ldots, \pi^*_m)$. Moreover, we assume that, given the value of the Markov process at time $j$, the error $\varepsilon_j$ is independent of the past. The errors are allowed to be continuous random variables.

In the literature we found three different methods for computing confidence intervals (CIs) for the parameters of hidden Markov chains, namely likelihood profiling, bootstrapping, and CIs based on finite-difference approximations of the Hessian. The problem of interval estimation for the parameters of an HMM has been studied in Visser et al. (2000). Their conclusion is that likelihood profiling and bootstrapping provide similar results, whereas the finite-difference intervals are mostly too small. There is no detailed analysis of the true coverage probability of CIs in the context of HMMs.

We propose a CI for the stationary distribution of the underlying Markov chain using a large deviations approach. Roughly, we estimate the rate at which the frequencies with which the observations fall in certain sets converge to their limit, which is a linear transform of the unknown stationary distribution. Nowadays the theory of large deviations is expanding rapidly and is applied in many areas, such as statistics, engineering, and physics; the reader can find a few references in Varadhan (2008). It has been applied to a wide range of problems in which detailed information on rare events is required. Of course, one could be interested not only in the probability of a rare event but also in the characteristic behavior of the system as the rare event occurs.

This paper is organized as follows. In Sect. 2 we define the framework of the study; in Sect. 3 we construct the confidence interval and explore its properties.
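As a concrete illustration of the observation model $Y_i = X_i + \varepsilon_i$, the following sketch simulates a hypothetical two-state chain on $\mathbb{R}$ with i.i.d. Gaussian perturbations; the states, transition matrix, and noise level are illustrative choices, not taken from the paper:

```python
import numpy as np

# Hypothetical two-state example (states, P*, and noise level are
# illustrative choices, not taken from the paper).
rng = np.random.default_rng(0)

states = np.array([0.0, 1.0])      # state space {x_1, x_2} in R^1
P = np.array([[0.9, 0.1],          # transition matrix P*
              [0.2, 0.8]])         # stationary distribution pi* = (2/3, 1/3)

def simulate(n, sigma=0.3):
    """Simulate n steps of the hidden chain X and observations Y = X + eps."""
    idx = np.zeros(n, dtype=int)
    for i in range(1, n):
        idx[i] = rng.choice(2, p=P[idx[i - 1]])
    X = states[idx]
    eps = rng.normal(0.0, sigma, size=n)  # perturbation, here i.i.d. Gaussian
    return X, X + eps

X, Y = simulate(5000)
print(np.mean(X == 1.0))   # close to pi*_2 = 1/3 by the ergodic theorem
```

Only the perturbed sequence `Y` would be available to the statistician; the hidden path `X` is kept here just to check the simulation against the stationary distribution.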

2 Study Framework

The Markov chain $X := \{X_i\}_{i \ge 1}$ is observed through a perturbation sequence $\varepsilon := \{\varepsilon_j\}_{j \ge 1}$ of random variables. Assume that, given $\{X_n = x_j\}$, $\varepsilon_n$ has distribution $Q_j$, with $j \in \{1, 2, \ldots, m\}$, and is independent of $\{\varepsilon_l, X_l,\ l \in \{1, 2, \ldots, n-1\}\}$. For any subset $C \subset \mathbb{R}^d$ and $x \in \mathbb{R}^d$ let $C - x := \{y \in \mathbb{R}^d : y + x \in C\}$. Let $\mathcal{U}$ be the collection of partitions $U := (U_1, U_2, \ldots, U_m)$, with $U_i \subset \mathbb{R}^d$, satisfying the following properties. The sets $U_1, U_2, \ldots, U_m$ are disjoint, with non-empty interior (i.e. the largest open set contained in $U_i$ is not the empty set), $\bigcup_{j=1}^m U_j = \mathbb{R}^d$, and the matrix

$$Q_U = \begin{bmatrix} q^{(U)}_{1,1} & q^{(U)}_{1,2} & \cdots & q^{(U)}_{1,m} \\ \vdots & \vdots & \ddots & \vdots \\ q^{(U)}_{m,1} & q^{(U)}_{m,2} & \cdots & q^{(U)}_{m,m} \end{bmatrix}$$

has full rank, where $q^{(U)}_{i,j} := Q_j(U_i - x_j)$. We also suppose that each entry of the matrix is strictly positive. Denote by $Q_U^{-1}$ the inverse matrix of $Q_U$. We assume that the measures $Q_j$, with $j \in \{1, 2, \ldots, m\}$, make $\mathcal{U}$ non-empty. For a vector $x \in \mathbb{R}^d$ we use the notation $x \succ 0$ to indicate that each coordinate of $x$ is positive. For any vector $u$ set $\mathrm{diag}(u)$ to be the diagonal matrix whose $(i,i)$ element is $u_i$, and let $I_m$ be the $m \times m$ identity matrix. Define

$$H_U := \left\{ x : \det\left( P^* \,\mathrm{diag}(x Q_U) - I_m \right) = 0 \right\},$$

$$J_U(\nu) := \sup_{x \in H_U :\, x \succ 0} \; \sum_{k=1}^m \nu_k \log x_k,$$

and $B_r(x) := \{ y : \|y - x\| \le r \}$, where $\|\cdot\|$ stands for the Euclidean norm in $\mathbb{R}^d$. Set $\overline{B}_r = B_r(Q_U \pi^*)$. We assume that there exists a partition $U$ such that $\inf_{\nu \in \overline{B}_r^c} J_U(\nu) - m + 1$ has a known lower bound $H_r$ which is strictly positive for $r > 0$.

Denote $\hat{d}^{(n)}_i := \frac{1}{n} \sum_{j=1}^n \mathbb{1}\{Y_j \in U_i\}$ and let $\hat{d}^{(n)}(U) := (\hat{d}^{(n)}_1, \hat{d}^{(n)}_2, \ldots, \hat{d}^{(n)}_m)$, where $\mathbb{1}_A$ is the indicator function of the event $A$.

Theorem 1 (Confidence Interval). Fix $\alpha > 0$ and choose the smallest $r$ such that $e^{-n H_r} \le \alpha$. Then the set $A_r = Q_U^{-1} B_r(\hat{d}^{(n)}(U))$ is an approximate $(1-\alpha)$-confidence interval.

Our approach relies on the fact that $\hat{d}^{(n)}(U)$ converges to $Q_U \pi^*$, and does so quite fast. The large deviations principle makes this statement rigorous. Hence we use $H_r$ to lower-bound the rate function.
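To make the construction of Theorem 1 concrete, here is a numerical sketch under illustrative assumptions not taken from the paper: two states $x_1 = 0$, $x_2 = 1$ in $\mathbb{R}$, Gaussian noise, a hypothetical transition matrix, and the partition $U_1 = (-\infty, 0.5)$, $U_2 = [0.5, \infty)$. The centre $Q_U^{-1}\hat{d}^{(n)}(U)$ of the set $A_r$ should approach $\pi^*$:

```python
import math
import numpy as np

# Illustrative assumptions (not from the paper): two states x_1 = 0, x_2 = 1
# in R, Gaussian noise, partition U_1 = (-inf, 0.5), U_2 = [0.5, inf).
rng = np.random.default_rng(1)
sigma = 0.3
states = np.array([0.0, 1.0])
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])               # hypothetical P*, pi* = (2/3, 1/3)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# q_{i,j} = Q_j(U_i - x_j) = P(x_j + eps in U_i)
p1 = Phi(0.5 / sigma)                     # P(x_1 + eps in U_1)
p2 = Phi(-0.5 / sigma)                    # P(x_2 + eps in U_1)
Q_U = np.array([[p1, p2],
                [1 - p1, 1 - p2]])

# Simulate the chain and the observations Y_i = X_i + eps_i
n = 20000
idx = np.zeros(n, dtype=int)
for i in range(1, n):
    idx[i] = rng.choice(2, p=P[idx[i - 1]])
Y = states[idx] + rng.normal(0.0, sigma, size=n)

d_hat = np.array([np.mean(Y < 0.5), np.mean(Y >= 0.5)])  # \hat d^(n)(U)
pi_hat = np.linalg.solve(Q_U, d_hat)     # centre of A_r = Q_U^{-1} B_r(d_hat)
print(pi_hat)                            # close to pi* = (2/3, 1/3)
```

The set $A_r$ itself is the image under $Q_U^{-1}$ of the ball of radius $r$ around `d_hat`; the sketch only computes its centre, which is the point estimate of $\pi^*$.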

3 Construction of the Confidence Interval

Definition 1 (Large deviations principle). A rate function is a function which is non-negative and lower semicontinuous. A sequence of random variables $Z_n$, $n \in \mathbb{N}$, satisfies a large deviations principle with rate function $I(\cdot)$ if we have
$$\limsup_{n \to \infty} \frac{1}{n} \log \mathbb{P}(Z_n \in F) \le -\inf_{x \in F} I(x),$$
for any closed set $F$, and
$$\liminf_{n \to \infty} \frac{1}{n} \log \mathbb{P}(Z_n \in A) \ge -\inf_{x \in A} I(x),$$
for any open set $A$.
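The definition can be illustrated with a standard example that is not from the paper: for empirical means $Z_n = S_n/n$ of fair coin flips, Cramér's theorem gives the rate function explicitly, $I(a) = a \log(2a) + (1-a)\log(2(1-a))$, and the exact binomial tail shows $-\frac{1}{n}\log \mathbb{P}(Z_n \ge a)$ approaching $I(a)$:

```python
import math

def log_tail(n, a):
    """log P(S_n >= a*n) for S_n ~ Binomial(n, 1/2), by exact summation."""
    total = 0
    for k in range(math.ceil(a * n), n + 1):
        total += math.comb(n, k)
    return math.log(total) - n * math.log(2.0)

a = 0.7
# Cramer rate function I(a) for a fair coin, evaluated at a = 0.7
rate = a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))
vals = [-log_tail(n, a) / n for n in (50, 200, 800)]
print(rate, vals)   # vals decrease towards rate (about 0.0823)
```

The $O(\log n / n)$ gap between the finite-$n$ values and the limiting rate is typical: the large deviations principle pins down only the leading exponential order.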


Proposition 1. Define
$$\Lambda^{(U)}_j(\lambda) := \mathbb{E}\left[\exp\left\{\sum_{k=1}^m \lambda_k \mathbb{1}\{\varepsilon_1 \in U_k - x_j\}\right\}\right].$$
For any Borel set $U \subset \mathbb{R}^d$, $\hat{d}^{(n)}(U)$ satisfies a large deviations principle with rate function
$$I_U(z) := \sup_{\lambda \in \mathbb{R}^m} \left\{ \langle \lambda, z \rangle - \log \rho(P^*_\lambda) \right\},$$
where $\langle \cdot, \cdot \rangle$ is the usual inner product, $P^*_\lambda$ is the matrix whose $(i,j)$ entry is $P^*_{i,j} \Lambda^{(U)}_j(\lambda)$, and for any irreducible matrix $A$ the scalar $\rho(A)$ is the so-called Perron–Frobenius eigenvalue, with the properties $\rho(A) \in (0, \infty)$ and $|\mu| \le \rho(A)$ for any eigenvalue $\mu$ of $A$.

Proof. Define
$$f(X_i, \varepsilon_i) = \left( \sum_{k=1}^m \mathbb{1}\{X_i = x_k\}\,\mathbb{1}\{\varepsilon_i \in U_1 - x_k\},\; \sum_{k=1}^m \mathbb{1}\{X_i = x_k\}\,\mathbb{1}\{\varepsilon_i \in U_2 - x_k\},\; \ldots,\; \sum_{k=1}^m \mathbb{1}\{X_i = x_k\}\,\mathbb{1}\{\varepsilon_i \in U_m - x_k\} \right).$$
The sequence $\{f(X_i, \varepsilon_i),\ 1 \le i \le n\}$, given a realization $\{X_i = \iota_i,\ 1 \le i \le n\}$, with $\iota_i \in \Omega$ for each $i$, consists of independent random variables. This random function meets the hypotheses of Exercise 3.1.4 of Dembo and Zeitouni (1998), which yields this proposition. □

Proposition 2. $\hat{d}^{(n)}(U)$ converges a.s. to $Q_U \pi^*$.

Proof. We can write $\mathbb{1}\{Y_j \in U_i\} = \sum_{s=1}^m \mathbb{1}\{X_j = x_s\}\,\mathbb{1}\{\varepsilon_j \in U_i - x_s\}$. Hence
$$\hat{d}^{(n)}_i = \sum_{s=1}^m \left( \frac{1}{n} \sum_{j=1}^n \mathbb{1}\{X_j = x_s\}\,\mathbb{1}\{\varepsilon_j \in U_i - x_s\} \right).$$

Consider the Markov chain $\left(X_n, \sum_{i=1}^m i\,\mathbb{1}\{\varepsilon_n \in U_i - X_n\}\right)$, taking values in the set $\Omega \times \{1, 2, \ldots, m\}$. This chain is irreducible and aperiodic, and our result follows by the ergodic theorem. □

Denote by $N(Q_U)$ the set of linear combinations, with non-negative coefficients, of the rows of $Q_U$. More precisely,
$$N(Q_U) := \left\{ x Q_U : x \succeq 0 \right\}.$$
For any $t := (t_1, t_2, \ldots, t_m)$, with $t \succeq 0$, define

$$L_k(t) := \frac{t_k}{\sum_{i=1}^m t_i P^*_{i,k}}, \qquad \text{for } k \in \{1, \ldots, m\},$$

and $L(t) := \left(L_1(t), L_2(t), \ldots, L_m(t)\right)$. Notice that for any positive scalar $a$ we have $L(at) = L(t)$, i.e. $L$ is homogeneous of degree 0. Denote by $M_1$ the set of probability measures on $\Omega$. We consider $M_1$ as a subset of $\mathbb{R}^m$, that is, the $m$-dimensional simplex. Let $q^{(-1)}_U(i,j)$ be the $(i,j)$-th entry of the matrix $Q_U^{-1}$. Denote by $K$ the image $L(M_1) \subset \mathbb{R}^m$. Define the map $G_U : M_1 \to \mathbb{R}$,
$$G_U(\nu) := \sup_{z \in K \cap N(Q_U)} \; \sum_{k=1}^m \nu_k \log \sum_{i=1}^m z_i \, q^{(-1)}_U(i,k).$$
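Two properties of the map $L$ are easy to check numerically with a hypothetical two-state transition matrix (an illustrative choice, not from the paper): $L$ is homogeneous of degree 0, and since $\pi^* P^* = \pi^*$, plugging in $t = \pi^*$ gives $L(\pi^*) = (1, \ldots, 1)$:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])       # hypothetical irreducible, aperiodic P*
pi = np.array([2 / 3, 1 / 3])    # its stationary distribution (pi P = pi)

def L(t, P):
    """The map L(t) with entries L_k(t) = t_k / sum_i t_i P_{i,k}."""
    return t / (t @ P)

t = np.array([0.5, 1.7])
print(L(t, P), L(3.0 * t, P))    # equal: L is homogeneous of degree 0
print(L(pi, P))                  # (1, 1): stationarity gives L(pi*) = 1
```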

Proposition 3. For $\nu \notin M_1$ we have $I(\nu) = \infty$, and if $\nu \in M_1$ then
$$I_U(\nu) \ge G_U(\nu) - (m - 1).$$

Proof. Notice that
$$\Lambda^{(U)}_j(\lambda) = \mathbb{E}\left[\exp\left\{\sum_{k=1}^m \lambda_k \mathbb{1}\{\varepsilon_1 \in U_k - x_j\}\right\}\right] = \sum_{k=1}^m e^{\lambda_k + m - 1}\, \mathbb{P}(\varepsilon_1 \in U_k - x_j) = e^{m-1} \sum_{k=1}^m e^{\lambda_k} q^{(U)}_{k,j}.$$

Fix $t$ such that $L(t) \in N(Q_U)$. For a vector $x \succ 0$ denote by $\log(x)$ the vector whose $j$-th entry is $\log x_j$. The vector $\lambda^* := \log\left(e^{1-m} L(t) Q_U^{-1}\right)$ satisfies
$$\Lambda_j(\lambda^*) = \frac{t_j}{\sum_{i=1}^m t_i P^*_{i,j}}.$$

Now the argument is standard. In fact $t$ satisfies $t P^*_{\lambda^*} = t$, which, together with the fact that $t \succ 0$, implies $\rho(P^*_{\lambda^*}) = 1$. Hence
$$I(\nu) = \sup_{\lambda \in \mathbb{R}^m} \left\{ \langle \lambda, \nu \rangle - \log \rho(P^*_\lambda) \right\} \ge \langle \lambda^*, \nu \rangle - \log \rho(P^*_{\lambda^*}) = \sum_{k=1}^m \nu_k \log \sum_{i=1}^m e^{1-m} L_i(t)\, q^{(-1)}_U(i,k).$$

It remains to take the supremum over the set of $t$ satisfying $L(t) \in N(Q_U)$ to get our result. □

Proposition 4. $G_U = J_U$.

Proof. Notice that $x Q_U \in K$ if and only if
$$\sum_{i=1}^m x_i q^{(U)}_{i,k} = \frac{t_k}{\sum_{j=1}^m t_j P^*_{j,k}},$$
which can be reformulated as
$$t \, P^* \mathrm{diag}(x Q_U) = t.$$
This is verified if and only if the matrix $P^* \mathrm{diag}(x Q_U)$ has eigenvalue 1, which is equivalent to
$$\det\left( P^* \mathrm{diag}(x Q_U) - I_m \right) = 0. \qquad \square$$

Hence
$$\mathbb{P}\left( \hat{d}^{(n)}(U) \in \overline{B}_r^c \right) \le e^{-n \left( \inf_{\nu \in \overline{B}_r^c} J_U(\nu) - (m-1) + o(1) \right)}.$$
We get
$$\mathbb{P}(\pi^* \notin A_r) = \mathbb{P}\left( \hat{d}^{(n)}(U) \in \overline{B}_r^c \right) \le e^{-n \left( \sup_{U \in \mathcal{U}} \inf_{\nu \in \overline{B}_r^c} J_U(\nu) - (m-1) + o(1) \right)} \le e^{-n (H_r + o(1))},$$
which proves Theorem 1.
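The determinant characterization used in the proof of Proposition 4 can be checked numerically: for any $t \succ 0$, setting $v_k = L_k(t)$ makes $t$ a left eigenvector of $P^* \mathrm{diag}(v)$ with eigenvalue 1, so the determinant below vanishes. The two-state $P^*$ and the vector $t$ are illustrative choices, not from the paper:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])        # hypothetical irreducible P*

t = np.array([0.4, 2.3])          # any vector with strictly positive entries
v = t / (t @ P)                   # v_k = L_k(t), hence t P* diag(v) = t
M = P @ np.diag(v) - np.eye(2)    # P* diag(v) has 1 as an eigenvalue ...
print(np.linalg.det(M))           # ... so this determinant vanishes
```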

4 Conclusions

We end this paper by considering a few properties of the confidence interval. The coverage of this set is quite good: in fact, for any $p > 0$ there exists a constant $c(p)$ such that
$$\mathbb{P}(\pi^* \notin A_r) \le \alpha + \frac{c(p)}{n^p}.$$
The advantage of the large deviations approach lies in the fact that, for fixed $r$, the probability that $A_r$ does not contain $\pi^*$ decreases exponentially fast as $n$ increases. On the other hand, the measure of the confidence interval is less than $\|Q_U^{-1}\| \, r$, where $\|\cdot\|$ here denotes the usual matrix norm, and $r$ is the radius of $B_r$ defined above.

Acknowledgements The author is grateful to the anonymous referees for their valuable comments, which led to improvements in this paper.


References

Cappé O, Moulines E, Rydén T (2005) Inference in hidden Markov models. Springer
Dembo A, Zeitouni O (1998) Large deviations techniques and applications. Springer
Varadhan SRS (2008) Large deviations. Ann Probab 36:397–419
Visser I, Raijmakers M, Molenaar P (2000) Confidence intervals for hidden Markov model parameters. Brit J Math Stat Psychol 53:317–327