More On Recurrence and Waiting Times by

Abraham J. Wyner, Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA, USA

Abstract

Let $X = \{X_n : n = 1, 2, \ldots\}$ be a discrete-valued stationary ergodic process distributed according to probability $P$. Let $Z_1^n = \{Z_1, Z_2, \ldots, Z_n\}$ be an independent realization of an $n$-block drawn with the same probability law as $X$. We consider the waiting time $W_n$, defined as the first time the $n$-block $Z_1^n$ appears in $X$. There are many recent results concerning this waiting time that demonstrate asymptotic properties of this random variable. In this paper, we prove that for all $n$ the random variable $W_n P(Z_1^n)$ is approximately distributed as an exponential random variable with mean 1. We use a Poisson heuristic to provide a very simple intuition for this result, which is then formalized using the Chen-Stein method. We then rederive, with remarkable brevity, most of the known asymptotic results concerning $W_n$ and prove others as well. We further establish the surprising fact that for many sources $W_n P(Z_1^n)$ is approximately $\exp(1)$ even if the probability law for $Z$ is not the same as that of $X$. We also consider the $d$-dimensional analog of the waiting time and prove a similar result in that setting. Nearly identical results are then derived for the recurrence time $R_n$, defined as the first time the initial $n$-block $X_1^n$ reappears in $X$.

We conclude by developing applications of these results to provide concise solutions to problems that stem from the analysis of the Lempel-Ziv data compression algorithm. We also consider possible applications to DNA sequence analysis.

Research supported by the National Science Foundation grant number DMS-9508933.

KEYWORDS: Recurrence times, entropy, pattern matching, Chen-Stein method, data compression, Lempel-Ziv algorithm. AMS subject classifications: 60G07.


I. Introduction

Let $X = \{X_1, X_2, \ldots\}$ be a finite-alphabet stationary ergodic process on the space of infinite sequences endowed with probability $P$. Let $X_i \in A$, and denote any finite contiguous substring of $X$ by $X_i^j = \{X_i, \ldots, X_j\}$. Let the sequence $Z_1^n$ be an independent realization of the first $n$ symbols of the process. Thus, for any $n$-block sequence of symbols $z_1^n$ with $z_i \in A$, we define

$$P(z_1^n) = \Pr\{Z_1^n = z_1^n\}.$$

Since $X$ and $Z$ have the same distribution, the random variables $P(X_1^n)$ and $P(Z_1^n)$ are identically distributed as well as independent. Our main interests are the waiting time $W_n$, defined to be the first time the sequence $Z_1^n$ appears in $X$, and the recurrence time $R_n$, defined to be the first time the sequence $X_1^n$ reappears in $X$:

$$W_n = \inf\{k \ge 1 : X_k^{k+n-1} = Z_1^n\}$$

$$R_n = \inf\{k \ge 1 : X_{k+1}^{k+n} = X_1^n\}.$$
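To make these definitions concrete, the following minimal Python sketch (the function names are ours, not from the paper) computes $W_n$ and $R_n$ by direct scanning. In practice one observes only a finite stretch of $X$, so both functions return None when no occurrence is found inside the sample.

```python
def waiting_time(z, x):
    """W_n: smallest k >= 1 with X_k^{k+n-1} = z, scanning a finite sample x."""
    n = len(z)
    for k in range(1, len(x) - n + 2):
        if x[k - 1:k - 1 + n] == z:
            return k
    return None  # no occurrence inside the observed sample

def recurrence_time(x, n):
    """R_n: smallest k >= 1 with X_{k+1}^{k+n} = X_1^n, scanning a finite sample x."""
    block = x[:n]
    for k in range(1, len(x) - n + 1):
        if x[k:k + n] == block:
            return k
    return None

x = '0110100110010110' * 4   # a short binary string standing in for a sample of X
print(waiting_time('1001', x), recurrence_time(x, 3))
```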

These random variables have been the subject of a growing body of work that crosses over from probability theory to computer science and information theory. Originally, A.D. Wyner and J. Ziv, motivated by interest in the Ziv-Lempel data compression algorithm, discovered (in [10]) that for all stationary, finite-alphabet, ergodic processes:

$$\lim_{n\to\infty} \frac{\log R_n}{n} = H \quad \text{in probability,} \qquad (1)$$

where $H$ is the process entropy. They found the same result to be true of $W_n$ if the process was also assumed to be Markov of arbitrary order. They conjectured that the same limit held almost surely (a fact that was later established in [4]). Subsequently, A.J. Wyner discovered in [11] that for Markov sources, $W_n$ (and $R_n$) satisfied a second-order limit law:

$$\frac{\log W_n - nH}{\sigma\sqrt{n}} \Rightarrow N(0,1), \qquad (2)$$

where $\sigma^2 = \lim_{n\to\infty} n^{-1}\mathrm{Var}(-\log P(X_1^n))$. Interesting work on the "Wyner-Ziv" problem was presented by Shields, who discovered (in [6]) that the limit theorem of (1) is not true of all stationary, ergodic processes, especially if the process memory vanishes sufficiently slowly. In 1994, Ornstein and Weiss showed that $W_n$ and $R_n$ have simple analogs for stationary, ergodic, $d$-dimensional random fields and that a limit law similar to (1) also holds in that setting [5].

In this paper, we consider the quantities $W_n P(Z_1^n)$ and $R_n P(X_1^n)$. We establish that each of these products is approximately distributed exponentially with mean 1 (for all $n$), providing an explicit error term. Since $P(Z_1^n)$ is a very familiar object (it is easily transformed into a random walk by taking logarithms), this result transforms $W_n$ and $R_n$ into equally familiar objects. We then use this non-asymptotic result to prove several new asymptotic results and reprise some older ones.

The accuracy of the approximation is contingent upon two quantities that characterize the length of the process memory. For $-\infty \le i \le j \le \infty$ let $\mathcal{B}_i^j$ denote the $\sigma$-field generated by $X_i^j$. Define, for $k \ge 1$,

$$\psi(k) = \sup_{A \in \mathcal{B}_{-\infty}^0,\; B \in \mathcal{B}_k^{\infty}} \frac{|P(A \cap B) - P(A)P(B)|}{P(A)P(B)},$$

where $0/0$ is defined to be 0. We say $X$ is $\psi$-mixing if $\psi(k) \to 0$ as $k \to \infty$. Furthermore, we define the quantity

$$\gamma(k) = \max_{x_{-k}, \ldots, x_{2k}} \big[\, P(x_0^{k-1}),\; P(x_0^{k-1} \mid x_k^{2k}),\; P(x_0^{k-1} \mid x_{-k}^{-1})\, \big].$$

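For intuition about $\gamma(k)$, here is a small sketch that computes its unconditional part, $\max_{x_0^{k-1}} P(x_0^{k-1})$, for a two-state Markov chain by max-product dynamic programming. The chain and its stationary law are assumptions chosen for illustration; the printed values decay geometrically, consistent with the bound $\gamma(n) \le p^n$ used below.

```python
# A two-state Markov chain chosen for illustration (an assumption, not from the paper).
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]   # stationary law: pi P = pi

def gamma_unconditional(k):
    """The unconditional part of gamma(k): max over blocks x_0^{k-1} of P(x_0^{k-1}),
    computed by max-product dynamic programming over the chain's states."""
    v = pi[:]   # v[j] = max probability of a length-1 block ending in state j
    for _ in range(k - 1):
        v = [max(v[i] * P[i][j] for i in range(2)) for j in range(2)]
    return max(v)

for k in (1, 5, 10, 20):
    print(k, gamma_unconditional(k))   # decays geometrically, consistent with gamma(n) <= p^n
```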
Since $|A|$ is finite, there must exist a $k$-block whose probability or conditional probability is maximal; therefore, $\gamma(k)$ always exists. The relationship between waiting times and probability functions has been pursued before. It is well known (to information theorists who study data compression) that $-\log P(X_1^n)$ represents (in some sense) an ideal codeword length. Furthermore, a Lempel-Ziv codeword is very closely related to a quantity $L_m$, defined to be the largest $k$ such that a copy of $Z_1^k$ can be found as a contiguous substring of $X_1^m$:

$$L_m = \max\{k \ge 1 : Z_1^k = X_j^{j+k-1} \text{ for some } j \in [1, m-k+1]\}. \qquad (3)$$

(The quantity $L_m$ is exactly the length of the first phrase in the Fixed-Database version of the LZ(77) algorithm; see [12].)

The relationship of $W_n$ and $L_m$ is through the observation that

$$\{L_m < n\} = \{W_n > m\}.$$

Thus $W_n$ and $L_m$ are "duals" of one another. In [12], it was discovered that if

$$T_m = \inf\{k : -\log P(Z_1^k) > \log m\}, \qquad (4)$$

then $-\log P(Z_1^{T_m})$ and $L_m$ are asymptotically equivalent. This was proved only for processes assumed to be Markov. Finally, no history of the waiting time problem, however brief, would be complete without a reference to the groundbreaking work of Szpankowski (see [9]), who has considered $L_m$ and related match lengths in quite literally dozens of papers with almost as many collaborators.
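The duality is easy to verify numerically. A minimal sketch, assuming i.i.d. fair bits and the boundary convention that a copy of $Z_1^k$ must start within the first $m$ positions of $X$ (under this convention the event identity is exact):

```python
import random

def match_length(z, x, m):
    """L_m: largest k with z[:k] occurring in x at a start position <= m."""
    k = 0
    while k < len(z) and x.find(z[:k + 1], 0, m + k) >= 0:
        k += 1
    return k

def waiting_time(z, x):
    """W_n: 1-based start of the first occurrence of z in x (len(x)+1 if none)."""
    p = x.find(z)
    return p + 1 if p >= 0 else len(x) + 1

random.seed(1)
m, n = 64, 8
for trial in range(1000):
    x = ''.join(random.choice('01') for _ in range(m + n))
    z = ''.join(random.choice('01') for _ in range(n))
    assert (match_length(z, x, m) < n) == (waiting_time(z, x) > m)
print('duality {L_m < n} = {W_n > m} holds on 1000 random instances')
```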

The first explicit connection between $\log W_n$ and $-\log P(Z_1^n)$ (as well as $\log R_n$ and $-\log P(X_1^n)$) was made by Kontoyiannis in [2]. He showed, with methods similar to those found in [10], that $-\log P(Z_1^n)$ provides a strong asymptotic approximation to $\log W_n$ for mixing processes, and that $\log R_n$ is approximated by $-\log P(X_1^n)$ for all stationary ergodic processes.

II. Waiting Times

THEOREM A: Let $X$ be a stationary, finite-valued, ergodic process. For $t > 0$, we have

$$|\Pr\{W_n P(Z_1^n) \le t\} - \exp(-t)| \le (1 \wedge t)[\,8n\,\gamma(n) + \psi(n)\,].$$

Thus, for processes with memory that vanishes sufficiently fast, the quantity $W_n P(Z_1^n)$ has an approximately exponential distribution with mean 1.

Corollary A1: If $X$ is an ergodic Markov chain then there exists a positive constant $\rho < 1$ such that

$$|\Pr\{W_n P(Z_1^n) > t\} - \exp(-t)| \le (1 \wedge t)\,9n\,\rho^n.$$

Corollary A2: If $n\gamma(n) \to 0$ and $\psi(n) \to 0$ then

$$\Pr\{\log[W_n P(Z_1^n)] > \log\log n\} \to 0.$$

Corollary A3: If $\psi(n) = o(n)$ then for any $\delta > 0$:

$$\Pr\{\log[W_n P(Z_1^n)] < -(1+\delta)\log n\} \to 0.$$

Corollary A4: Suppose $X$ is i.i.d. uniform over $A$ with $s = |A|$. Then for any $a \in \mathbb{R}$ and all integers $n > 0$:

$$|\Pr\{\log_s W_n > a + n\} - \exp(-s^a)| \le (1 \wedge s^a)\,8n\,s^{-n}.$$

Corollary A5: If $X$ is an ergodic Markov chain then for any $\delta > 0$:

$$-(1+\delta)\log n \le \log[W_n P(Z_1^n)] \le \log\log n \quad \text{eventually, almost surely.}$$
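Corollary A4 invites a direct simulation check. A minimal sketch, assuming an i.i.d. uniform source (so $P(Z_1^n) = s^{-n}$ and $\psi(n) = 0$), compares the empirical distribution function of $W_n P(Z_1^n)$ with that of an Exp(1) random variable:

```python
import random, math

s, n, trials = 2, 8, 2000
prob = s ** -n                  # P(Z_1^n) = s**-n for an i.i.d. uniform source
horizon = 30 * s ** n           # Pr{W_n * prob > 30} ~ e**-30, negligible
vals = []
for _ in range(trials):
    z = ''.join(str(random.randrange(s)) for _ in range(n))
    x = ''.join(str(random.randrange(s)) for _ in range(horizon + n))
    pos = x.find(z)             # W_n = pos + 1
    vals.append((pos + 1) * prob if pos >= 0 else 30.0)

# Compare the empirical law of W_n P(Z_1^n) with Exp(1):
for t in (0.25, 1.0, 2.0):
    print(t, sum(v <= t for v in vals) / trials, 1 - math.exp(-t))
```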

Proofs of Corollaries A1-A5: It is easily shown that for Markov chains there exists $p < 1$ such that for all sequences $x_{-n}^{2n}$: $P(x_0^{n-1}) \le p^n$, $P(x_0^{n-1} \mid x_{-n}^{-1}) \le p^n$, and $P(x_0^{n-1} \mid x_n^{2n}) \le p^n$. Typically $p$ might be the largest transition probability of the forward or backward chain, if less than 1. If not, it is possible, without loss of generality, to extend the alphabet so that the largest transition probability is less than 1. This implies that $\gamma(n) \le p^n$. Now recall that for Markov chains there exists a positive constant $q < 1$ such that $\psi(n) \le q^n$. Corollary A1 follows from Theorem A with $\rho = p \vee q$. Corollary A2 follows from Theorem A with $t = \log n$. To prove Corollary A3, observe that with $t = n^{-(1+\delta)}$ we have

$$\Pr\{W_n P(Z_1^n) < n^{-(1+\delta)}\} \le 1 - \exp(-n^{-(1+\delta)}) + (1 \wedge n^{-(1+\delta)})[\,8n\,\gamma(n) + \psi(n)\,]$$

$$\Pr\{\log[W_n P(Z_1^n)] < -(1+\delta)\log n\} \le n^{-(1+\delta)} + 8\,\gamma(n)\,n^{-\delta} + \psi(n)\,n^{-(1+\delta)} = o(1).$$

Lastly, Corollary A4 is proved by noting that for $X$ i.i.d. uniform, $P(X_1^n) = s^{-n}$. This implies that $\gamma(n) = s^{-n}$, and it is obvious that $\psi(n) = 0$. Thus Corollary A4 follows from Theorem A with $t = s^a$. We point out the following: a stronger version of Corollary A3 can be found in [2]; Corollary A2 is a stronger version, in a less general setting, of another theorem in [2]; and Corollaries A1 and A4 are essentially new versions of theorems that have appeared only in [11]. Corollary A5 is proved by observing that for $X$ ergodic Markov, $\sum_{n=1}^{\infty} n\,\gamma(n) < \infty$ and $\sum_{n=1}^{\infty} \psi(n) < \infty$, which implies, by Borel-Cantelli, that Corollaries A2 and A3 hold almost surely.

We will need some new notation to provide a framework for Poisson approximation. To that end, we fix $z_1^n \in A^n$. Let $I = \{1, 2, \ldots, m = \lceil t/P(z_1^n)\rceil\}$. Consider the doubly infinite extension of $X$ to $X_{-\infty}^{\infty}$ and define

$$Y_i = \begin{cases} 1 & \text{if } X_i^{i+n-1} = z_1^n \\ 0 & \text{otherwise.} \end{cases}$$

Thus $Y_i = 1$ if the $n$-block at position $i$ in $X$ is equal to $z_1^n$. We count the number of occurrences of $z_1^n$ over the index set $I$ with the random variable

$$W(z_1^n) = \sum_{i \in I} Y_i.$$

For each $i \in I$ we denote the $2n$-neighborhood of $i$ by $B_i = \{i-2n, i-2n+1, \ldots, i+2n\}$. We now define three expectations over $X$:

$$B_1(z_1^n) = \sum_{i \in I}\sum_{k \in B_i} EY_i\, EY_k$$

$$B_2(z_1^n) = \sum_{i \in I}\sum_{k \in B_i,\, k \ne i} E\,Y_i Y_k$$

$$B_3(z_1^n) = \sum_{i \in I} s_i, \quad \text{where } s_i = |E\{Y_i - EY_i \mid (Y_k : k \in I - B_i)\}|.$$

Finally, we let

$$\lambda(z_1^n) \triangleq E\,W(z_1^n).$$

Heuristically, if $P(z_1^n)$ is small and matches of $z_1^n$ are not likely to appear in clumps (this will be true if $z_1^n$ exhibits little self-symmetry), then the waiting time until an occurrence of $z_1^n$ (call this conditional waiting time $W_n(z_1^n)$) should be approximately geometric with mean $\lambda = 1/P(z_1^n)$. Thus, if we close our eyes to the dependence, one might believe that

$$\Pr\{W_n(z_1^n)\,P(z_1^n) > t\} = \Pr\{W_n(z_1^n) > \lceil \lambda t\rceil\} \approx \Big(1 - \frac{1}{\lambda}\Big)^{\lceil \lambda t\rceil} \approx \exp(-t).$$

This statement we can make rigorous using the Chen-Stein method (see Barbour et al. [1]), which shows that

$$|\Pr\{W(z_1^n) = 0\} - \exp(-\lambda(z_1^n))| \le (1 \wedge \lambda^{-1}(z_1^n)) \sum_{i=1}^{3} B_i(z_1^n). \qquad (5)$$
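The clumping caveat in the heuristic is visible in simulation: a self-overlapping word such as 000 produces clustered matches, so its waiting time deviates more from the exponential approximation than a word such as 001 with no self-overlap. A minimal sketch under i.i.d. fair bits (the survival probability at $t$ should be near $e^{-t}$):

```python
import random, math

def survival(z, t, trials=4000):
    """Estimate Pr{W_n(z) P(z) > t} for i.i.d. fair bits, where P(z) = 2**-len(z)."""
    n = len(z)
    prob = 2.0 ** -n
    horizon = int(t / prob) + n        # long enough to decide the event
    hits = 0
    for _ in range(trials):
        bits = ''.join(random.choice('01') for _ in range(horizon))
        pos = bits.find(z)             # W_n(z) = pos + 1, or no match
        if pos < 0 or (pos + 1) * prob > t:
            hits += 1
    return hits / trials

t = 1.0
print('exp(-t)                  :', math.exp(-t))
print('z = 001 (no self-overlap):', survival('001', t))
print('z = 000 (self-overlap)   :', survival('000', t))
```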

By the stationarity of $X$ and the linearity of expectation we have that

$$\lambda(z_1^n) = m\,P(z_1^n) = t.$$

Furthermore, it is easy to see that

$$\Pr\{W_n P(z_1^n) > t \mid Z_1^n = z_1^n\} = \Pr\{W(z_1^n) = 0\}.$$

Thus we have

$$|\Pr\{W_n P(z_1^n) > t \mid Z_1^n = z_1^n\} - \exp(-t)| \le (1 \wedge t^{-1}) \sum_{i=1}^{3} B_i(z_1^n). \qquad (6)$$

We now state our key technical lemma, leaving the proof for the appendix.

LEMMA A: Let $\bar{B}_i = E_{Z_1^n} B_i(Z_1^n)$. Then $\bar{B}_1 \le 4nt\,\gamma(n)$, $\bar{B}_2 \le 4nt\,\gamma(n)$, and $\bar{B}_3 \le t\,\psi(2n)$.

Proof of THEOREM A: Taking expectations with respect to $Z_1^n$ in (6),

$$\Pr\{W_n P(Z_1^n) > t\} \le \exp(-t) + (1 \wedge t^{-1}) \sum_{i=1}^{3} E_{Z_1^n} B_i(Z_1^n).$$

Applying Lemma A, we have

$$\Pr\{W_n P(Z_1^n) > t\} \le \exp(-t) + (1 \wedge t)[\,8n\,\gamma(n) + \psi(2n)\,].$$

In the same way, we can prove the reverse inequality, and thus

$$|\Pr\{W_n P(Z_1^n) > t\} - \exp(-t)| \le (1 \wedge t)[\,8n\,\gamma(n) + \psi(2n)\,],$$

which proves Theorem A. $\square$

III. Recurrence Times

THEOREM B: Let $X$ be a stationary, finite-valued, ergodic process. Let $t > 0$. Then

$$|\Pr\{R_n P(X_1^n) \le t\} - \exp(-t)| \le (1 \wedge t)[\,8n\,\gamma(n)[1 + \psi(n)] + \psi(n)\,] + 8n\,\gamma(n).$$

Corollary B1: If $X$ is an ergodic Markov chain then there exists a positive constant $\rho < 1$ such that

$$|\Pr\{R_n P(X_1^n) > t\} - \exp(-t)| \le (1 \wedge t)\,16n\,\rho^n.$$

Corollary B2: If $n\gamma(n) \to 0$ and $\psi(n) \to 0$ then

$$\Pr\{\log[R_n P(X_1^n)] > \log\log n\} \to 0.$$

Corollary B3: If $\psi(n) = o(n)$ and $n\gamma(n) = o(1)$ then for any $\delta > 0$:

$$\Pr\{\log[R_n P(X_1^n)] < -(1+\delta)\log n\} \to 0.$$

Corollary B4: Suppose $X$ is i.i.d. uniform over $A$ with $s = |A|$. Then for any $a$ and all integers $n > 0$:

$$|\Pr\{\log_s R_n > a + n\} - \exp(-s^a)| \le (1 \wedge s^a)\,16n\,s^{-n}.$$

Corollary B5: If $X$ is an ergodic Markov chain then for any $\delta > 0$:

$$-(1+\delta)\log n \le \log[R_n P(X_1^n)] \le \log\log n \quad \text{eventually, almost surely.}$$

Proof: We need only make a simple modification in the definitions. First we fix $x_1^n \in A^n$ and let $I = \{4n+1, 4n+2, \ldots, m = \lceil t/P(x_1^n)\rceil\}$. For $i \in I$, define

$$Y_i = \begin{cases} 1 & \text{if } X_{i+1}^{i+n} = x_1^n \\ 0 & \text{otherwise.} \end{cases}$$

The definitions of $B_i$, $B_1(x_1^n)$, $B_2(x_1^n)$, $B_3(x_1^n)$, and $W(x_1^n)$ are the same as in Section II, replacing every occurrence of $z_1^n$ with $x_1^n$.

In the recurrence time setting, there are two kinds of occurrences of the initial pattern to consider: short-range occurrences, where $R_n < 4n$, and long-range occurrences, where $R_n > 4n$. We argue that for processes with vanishing memory it is very likely that $R_n > 4n$ (short-range matches occur with small probability). We formalize this by proving (in the appendix):

LEMMA B: $\Pr\{R_n < 4n\} \le 4n\,\gamma(n)$.

Observe that in the recurrence setting the quantities $EY_i$ and $EY_iY_k$ are conditional expectations, conditioned on the outcome of the initial $n$-block. It is easy to see that $B_1(x_1^n) \le B_1(z_1^n)[1 + \psi(n)]$ follows from the $\psi$-mixing property (since we are conditioning on the initial $X_1^n$ block). Similarly, $B_2(x_1^n) \le B_2(z_1^n)[1 + \psi(n)]$. Finally, it follows that

$$\Pr\{W(x_1^n) = 0\} \le \Pr\{R_n P(x_1^n) > t \mid X_1^n = x_1^n\} \le \Pr\{W(x_1^n) = 0\} + \Pr\{R_n < 4n \mid X_1^n = x_1^n\}.$$

To apply the Chen-Stein equation to approximate $\Pr\{W(x_1^n) = 0\}$ we need to find $\lambda(x_1^n) = E\,W(x_1^n)$. This is no more difficult than in the waiting time case, but slightly more complicated:

$$\lambda(x_1^n) = \sum_{i \in I} EY_i = \Big[\frac{t}{P(x_1^n)} - 4n\Big] P(x_1^n) = t - 4n\,P(x_1^n).$$

Since, for every $x_1^n \in A^n$, $P(x_1^n) \le \gamma(n)$, it must be that

$$t - 4n\,\gamma(n) \le \lambda(x_1^n) \le t.$$

Thus it follows from (5) (and the fact that $e^{4n\gamma(n)} = 1 + 4n\gamma(n) + o(4n\gamma(n))$) that

$$|\Pr\{R_n P(x_1^n) > t \mid X_1^n = x_1^n\} - \exp(-t)| \le (1 \wedge \lambda(x_1^n)^{-1})\sum_{i=1}^{3} B_i(x_1^n) + \Pr\{R_n < 4n \mid X_1^n = x_1^n\} + 4n\,\gamma(n).$$

Following the steps of the proof of Theorem A and using Lemma A and Lemma B proves Theorem B. $\square$
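As with Theorem A, the exponential law for $R_n P(X_1^n)$ can be probed by simulation. A minimal sketch for an i.i.d. uniform source, where $P(X_1^n) = s^{-n}$ for every block:

```python
import random, math

s, n, trials = 2, 8, 2000
prob = s ** -n          # P(X_1^n) = s**-n for every block of an i.i.d. uniform source
horizon = 30 * s ** n
vals = []
for _ in range(trials):
    x = ''.join(str(random.randrange(s)) for _ in range(horizon + n))
    k = x.find(x[:n], 1)                 # first reappearance: R_n = k
    vals.append(k * prob if k >= 1 else 30.0)

# Exp(1) has mean 1 and median log 2:
vals.sort()
print(sum(vals) / trials, vals[trials // 2], math.log(2))
```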

IV. Generalization to Random Fields

A remarkable feature of the Poisson approximation approach is the ability to find extensions, with only minor modification of the proof used in the one-dimensional case, to random fields on the integer lattice $\mathbb{Z}^d$. We consider a family of random variables $\{X_u : u \in \mathbb{Z}^d\}$ indexed by $d$-dimensional vectors $u = (u_1, \ldots, u_d)$. We assume that for all vectors $u$ the random variable $X_u$ takes values in a finite set $A$. The distribution of the process is assumed to be a stationary and ergodic probability $P$ on the product space $\Omega = \prod\{A_u : u \in \mathbb{Z}^d\}$ (with each $A_u$ a copy of $A$). Suppose $x$ is a realization of the process (and thus $x \in \Omega$); then $X_u = x_u$ is the coordinate at position $u$. For any subset $U \subseteq \mathbb{Z}^d$ we let $X_U(x) = \{x_u\}_{u \in U}$. For any vector $v \in \mathbb{Z}^d$, let $T_v x$ denote the realization with coordinates $X_u(T_v x) = X_{u+v}(x) = x_{u+v}$. Thus, for any subset $U \subseteq \mathbb{Z}^d$, we say that the $d$-dimensional pattern $X_U$ occurs at position $v$ if $X_U = X_{v+U}$.

The $d$-dimensional analog of a sequence of length $n$ in one dimension is the $d$-dimensional cube. For $u, w \in \mathbb{Z}^d$ let $[u, w) = \{v \in \mathbb{Z}^d : u_j \le v_j < w_j \text{ for all } j\}$. We define the $d$-dimensional cube with side $k > 0$ to be $[0, k)^d$:

$$C(k) = \{u \in \mathbb{Z}^d : 0 \le u_j < k \text{ for all } j\}.$$

We define the recurrence time $R_n$ as the smallest value of $k$ such that the initial $n$-cube pattern $X_{C(n)}$ occurs at some coordinate inside $C(k)$ other than at coordinate 0:

$$R_n = \inf\{k \ge 1 : X_{C(n)} = X_{u + C(n)} \text{ for some } u \in C(k),\; u \ne 0\}.$$

The waiting time analog of the recurrence time $R_n$ considers an independent process $\{Z_u\}$, also distributed with probability measure $P$. We define the waiting time $W_n$ as the smallest value of $k$ such that the $n$-cube pattern $Z_{C(n)}$ occurs in the process $X$ at some coordinate inside $C(k)$:

$$W_n = \inf\{k \ge 1 : Z_{C(n)} = X_{u + C(n)} \text{ for some } u \in C(k)\}.$$
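For $d = 2$ this definition can be computed by brute force. A minimal sketch, assuming an i.i.d. uniform field for concreteness; the scan simply reports the smallest cube $C(k)$ containing an offset at which the $n \times n$ block matches:

```python
import random

def field(rows, cols, s):
    """A finite window of an i.i.d. uniform realization of the random field."""
    return [[random.randrange(s) for _ in range(cols)] for _ in range(rows)]

def occurs_at(x, z, u0, u1):
    n = len(z)
    return all(x[u0 + i][u1 + j] == z[i][j] for i in range(n) for j in range(n))

def waiting_time_2d(z, x):
    """W_n: smallest k >= 1 such that z occurs at some offset u in C(k)."""
    n = len(z)
    limit = min(len(x), len(x[0])) - n + 1     # offsets that fit in the window
    best = None
    for u0 in range(limit):
        for u1 in range(limit):
            if occurs_at(x, z, u0, u1):
                k = max(u0, u1) + 1            # smallest cube C(k) containing u
                best = k if best is None else min(best, k)
    return best   # None if no match inside the observed window

s, n = 2, 2
z = field(n, n, s)
x = field(40, 40, s)   # large enough that a 2x2 binary block appears w.h.p.
print(waiting_time_2d(z, x))
```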

Before we state any theorems we need to introduce $d$-dimensional analogs of $\gamma(n)$ and $\psi(n)$. To that end, for any coordinate $u \in I$, define the $2n$-neighborhood of $u$ by $B_u = \{v : |u_j - v_j| \le 2n\}$. Thus, for any $v \in C(n) + u$ and any $w \in \mathbb{Z}^d - B_u$, it must be that $|v_j - w_j| > n$ for some $j$. This means that every coordinate outside of $B_u$ is at sup-norm distance greater than $n$ from every coordinate inside the $n$-cube at $u$. We are now able to define

$$\gamma_1(n) = \max_{x} P(x_{C(n)} \mid x_v : v \in B_0 - C(n)), \qquad \gamma_0(n) = \max_{x} P(x_{C(n)}).$$

Thus $\gamma_0(n)$ is the probability of the $n$-cube with the highest probability with respect to $P$, and $\gamma_1(n)$ is the conditional probability of the $n$-cube with the highest conditional probability given all its neighbors that are within a distance $n$ of any coordinate. Since we are assuming that $A$ is a finite alphabet, the $n$-cube with the largest probability or conditional probability must exist, and therefore

$$\gamma(n) = \gamma_0(n) \vee \gamma_1(n)$$

is well defined. Now let $\sigma(n)$ denote the $\sigma$-field generated by $\{X_u : u \in C(n)\}$ and let $\sigma(-n)$ denote the $\sigma$-field generated by $\{X_u : u \in \mathbb{Z}^d - B_0\}$. Let

$$\psi(n) = \sup_{A \in \sigma(n),\; B \in \sigma(-n)} \frac{|P(A \cap B) - P(A)P(B)|}{P(A)P(B)}.$$

THEOREM C: Let $X$ be a stationary, ergodic, $d$-dimensional random field (defined above). Let $t > 0$:

$$|\Pr\{W_n^d\, P(Z_{C(n)}) \le t\} - \exp(-t)| \le (1 \wedge t)[\,(4n)^d\,\gamma(n) + \psi(n)\,].$$

Thus, for many processes with vanishing memory, the quantity $W_n^d\, P(Z_{C(n)})$ has an approximately exponential distribution with mean 1.

Corollary C1: If $X$ is an i.i.d. random field then

$$\frac{d \log W_n - n^d H}{n^{d/2}} \Rightarrow N(0, \sigma^2),$$

where

$$H = \lim_{n\to\infty} \frac{E[-\log P(Z_{C(n)})]}{n^d} \quad \text{and} \quad \sigma^2 = \lim_{n\to\infty} \frac{\mathrm{Var}(-\log P(Z_{C(n)}))}{n^d}.$$

Proof of Corollary C1: It follows from Theorem C that $n^{-d/2}\, d\log W_n$ and $-n^{-d/2} \log P(Z_{C(n)})$ are almost surely equal in the limit as $n \to \infty$. By the independence assumption,

$$-\log P(Z_{C(n)}) = -\sum_{u \in C(n)} \log P(Z_u).$$

Since there are exactly $n^d$ vectors in $C(n)$, the Corollary follows from the Central Limit Theorem. We remark that Corollary C1 will also be true if $Z$ is a Markov random field.

Proof of Theorem C: We use the Chen-Stein method in the $d$-dimensional setting. Given $Z_{C(n)} = z_{C(n)}$, we let $m = \lceil (t/P(z_{C(n)}))^{1/d}\rceil$ and $I = \{v \in C(m)\}$. Then define

$$Y_u = \begin{cases} 1 & \text{if } X_{u + C(n)} = z_{C(n)} \\ 0 & \text{otherwise.} \end{cases}$$

Thus, for any coordinate $u$, $Y_u = 1$ if the $n$-cube at $u$ is equal to the $n$-cube $z_{C(n)}$. The total number of matches of $z_{C(n)}$ in $C(m)$ is given by

$$W(z_{C(n)}) = \sum_{u \in I} Y_u.$$

The expected number of matches is easily derived from the stationarity and is given by

$$\lambda(z_{C(n)}) \triangleq E\,W(z_{C(n)}) = m^d\, P(z_{C(n)}) = t.$$

The $d$-dimensional analogs $B_1$, $B_2$ and $B_3$ of Section II are

$$B_1(z_{C(n)}) = \sum_{u \in I}\sum_{v \in B_u} EY_u\, EY_v$$

$$B_2(z_{C(n)}) = \sum_{u \in I}\sum_{v \in B_u,\, v \ne u} E\,Y_u Y_v$$

$$B_3(z_{C(n)}) = \sum_{u \in I} s_u, \quad \text{where } s_u = |E\{Y_u - EY_u \mid (Y_v : v \in I - B_u)\}|.$$

In the appendix we prove the following:

LEMMA C: The following bounds hold for all $n$: $E\,B_1(Z_{C(n)}) \le (4n)^d\, t\,\gamma(n)$, $E\,B_2(Z_{C(n)}) \le (4n)^d\, t\,\gamma(n)$, and $E\,B_3(Z_{C(n)}) \le t\,\psi(n)$.

From the Chen-Stein equation we have that

$$|\Pr\{W(z_{C(n)}) = 0\} - \exp(-\lambda(z_{C(n)}))| \le (1 \wedge \lambda^{-1}(z_{C(n)})) \sum_{i=1}^{3} B_i(z_{C(n)}).$$

The theorem follows by taking expectations and applying Lemma C. $\square$

V. Extensions

Suppose $Z$ does not have the same distribution as $X$. Let $Z$ be distributed according to the stationary distribution $Q$ and define

$$Q(z_1^n) = \Pr\{Z_1^n = z_1^n\}.$$

We will need to define

$$\gamma_Q(n) = \max_{z_1^n} Q(z_1^n), \qquad \gamma_P(n) = \max_{x_1^n} P(x_1^n).$$

We will also need to bound the ratio of the conditional probability of an $n$-vector to the unconditional probability:

$$\beta(n) = \max_{x, y \in A^n} \frac{\Pr\{X_0^{n-1} = x \mid X_{-n}^{-1} = y\}}{\Pr\{X_0^{n-1} = x\}}.$$

Let $\beta = \limsup_{n\to\infty} \beta(n)$. If we let

$$\delta(n) = \max_{x \in A^{2n}} \sum_{i=1}^{n} \big[\log P(X_0 = x_0 \mid X_{-n-i}^{-1} = x_{-n-i}^{-1}) - \log P(X_0 = x_0 \mid X_{-i}^{-1} = x_{-i}^{-1})\big],$$

then $\beta < \infty$ if $\delta(n) < \infty$, which is akin to a vanishing-memory condition. Clearly, $\beta$ is finite if $X$ is Markov.

THEOREM D (Mismatch Theorem): Let $\psi(n)$ be the $\psi$-mixing coefficients of $X$. Then for any $t > 0$ and any $n$,

$$|\Pr\{W_n P(Z_1^n) > t\} - \exp(-t)| \le (1 \wedge t)[\,4n\,\gamma_P(n) + 4n\,\beta(n)\,\gamma_Q(n) + \psi(n)\,].$$

Corollary D1: If $P$ and $Q$ are ergodic Markov chains then there exists a constant $\rho < 1$ such that

$$|\Pr\{W_n P(Z_1^n) > t\} - \exp(-t)| \le (1 \wedge t)(5n + 4n\beta)\,\rho^n.$$

Corollary D2: If $Z$ is Markov and $X$ is ergodic and mixing, and $\beta$ is finite and $n\,\gamma_P(n) = o(1)$, then

$$\lim_{n\to\infty} \frac{\log W_n}{n} = H(Q) + D(Q\|P) \quad \text{in probability.}$$

Proofs of the Corollaries: If $X$ is Markov then $\beta$ is finite, since $\sum_{n=1}^{\infty} \delta(n) < \infty$. Corollary D1 follows using the same steps used to prove Corollary A1. Corollary D2 follows by evaluating $\lim_{n\to\infty} n^{-1}\log[W_n P(Z_1^n)]$: the Markovity of $Z$ and the ergodic theorem imply that

$$\lim_{n\to\infty} \frac{-\log P(Z_1^n)}{n} = \lim_{n\to\infty} \frac{-E\log P(Z_1^n)}{n} \triangleq H(Q) + D(Q\|P).$$

We remark that if the conditions of Corollary D1 are satisfied then the convergence is almost sure, by Borel-Cantelli.

Proof of Theorem D: The proof rests on a modification of Lemma A (proved in the appendix).

LEMMA D: $\bar{B}_2 = E_{Z_1^n} B_2(Z_1^n) \le 4nt\,\beta(n)\,\gamma_Q(n)$.

The proof of Theorem D follows the steps of the proof of Theorem A. The bounds on $\bar{B}_1$ and $\bar{B}_3$ of Lemma A apply to the mismatch case unchanged. If Lemma D is applied to bound $\bar{B}_2$, then the theorem is proved. $\square$

To conclude, we remark that the mismatch case was first considered in [11], where it is remarked that the proof used to prove (2) may be carried over to the mismatch case with only slight changes. There are mismatch extensions of the main theorems in [2] as well. Furthermore, there are perhaps uncounted corollaries and extensions of Theorem A; we have remarked on only a few. It is, however, this author's belief that Theorem A provides a thorough settlement of the waiting time problem.
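Corollary D2 can be illustrated numerically: draw $Z$ i.i.d. from $Q$, search for it in an $X$ drawn i.i.d. from $P$, and compare $(\log W_n)/n$ with $H(Q) + D(Q\|P)$. A minimal sketch for i.i.d. binary sources; the parameters are assumptions chosen for illustration, and the agreement is only up to the $O(1/n)$ terms the theory allows:

```python
import random, math

p, q, n = 0.7, 0.2, 8   # X ~ i.i.d. Bernoulli(p), Z ~ i.i.d. Bernoulli(q)

# For i.i.d. sources, H(Q) + D(Q||P) = -E_Q log P(Z_1) = -(q log p + (1-q) log(1-p)).
target = -(q * math.log2(p) + (1 - q) * math.log2(1 - p))

def bern_string(r, length):
    return ''.join('1' if random.random() < r else '0' for _ in range(length))

x = bern_string(p, 10 ** 6)      # one long realization of X
vals = []
for _ in range(300):
    z = bern_string(q, n)        # an independent n-block with the mismatched law Q
    pos = x.find(z)
    if pos >= 0:
        vals.append(math.log2(pos + 1) / n)   # (log W_n) / n

print(sum(vals) / len(vals), target)
```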

VI. Applications

It is clear that pattern matching theory, as developed in the preceding sections, has a natural connection to the Lempel-Ziv data compression algorithm (see Section I). The relevant quantity is the match length $L_m$ defined in (3). Since the algorithm's asymptotic performance is determined by this match length, if the distribution and moments of the match length can be found, then performance properties can be derived. Our main theorem here provides a precise bound on the difference between the match length $L_m$ and the random variable $T_m$ defined in (4). Before stating and proving that theorem, we establish an extension of Theorem A.

Let $N$ be a positive, integer-valued random variable that is independent of $X$ (but not necessarily independent of $Z$). Consider the random variable $W_N P(Z_1^N)$. The following theorem establishes a criterion for when we can expect this random variable to also be distributed $\exp(1)$.

THEOREM E: Assume that there exist constants $\eta_1$ and $\eta_2$ so that $\eta_1 \le N \le \eta_2$. Then for $t > 0$,

$$|\Pr\{W_N P(Z_1^N) \le t\} - \exp(-t)| \le (1 \wedge t)[\,8\eta_2\,\gamma(\eta_1) + \psi(\eta_1)\,].$$

Proof of Theorem E: The proof follows closely the steps of the proof of Theorem A. Since we are assuming bounds on $N$, the proof is basically identical to the non-random case; we will only sketch it. A more general theorem, for unbounded $N$, surely holds, but for our applications we only need to establish the inequality of Theorem E for bounded $N$. The first step is to condition on $N = n$ and $Z_1^n = z_1^n$. Since $N$ is independent of $X$ (as is $Z$), the inequality of (6) holds unchanged. We cannot apply Lemma A directly, however, because of the randomness of $N$. Instead we apply the following simple extension:

LEMMA E: Let $\bar{B}_i = E_{Z_1^N} B_i(Z_1^N)$. Then $\bar{B}_1 \le 4\eta_2 t\,\gamma(\eta_1)$, $\bar{B}_2 \le 4\eta_2 t\,\gamma(\eta_1)$, and $\bar{B}_3 \le t\,\psi(2\eta_1)$.

Applying Lemma E completes the proof of the theorem. $\square$

The most useful application of this theorem is with $N = T_m(Z)$. In this case, the quantity $P(Z_1^N)$ is nearly $1/m$, so that $\Pr\{W_N > m\} \approx \exp(-1)$. It is then easy to prove the following extension of Theorem 3 of [12]:

THEOREM F: Let $\Delta = L_m - T_m(Z)$. If $Z$ satisfies the conditions of Theorem E, then there exists a positive constant $\lambda < 1$ such that

$$\Pr\{\Delta \ge j\} \le \lambda^j + O\Big(\frac{\log m + j}{m}\Big), \qquad \Pr\{\Delta \le -j\} \le \exp(-\lambda^{-j}) + O\Big(\frac{(\log m)^2}{m}\Big).$$

A very useful application of Theorem F makes use of the elementary fact that $E\,T_m(Z) = \frac{\log m}{H} + O(1)$ for $Z$ both ergodic and Markov (see the appendix of [12] for a short proof). Since Theorem F implies that the tails of $\Delta$ are at least exponential, we have

Corollary F: If $Z$ is Markov and ergodic then

$$E\,\Delta = O(1) \qquad \text{and} \qquad E\,L_m = \frac{\log m}{H} + O(1).$$

In conclusion, it follows easily from Corollary F and Theorem F that the Lempel-Ziv parsing algorithm is simply a partition of a sequence into equally probable segments of random length, with an approximately Normal distribution (since $T_m(Z)$ is Normal) and mean equal to $\log m / H$. From this fact it is easy to establish convergence rates (see [12], [11] and [10] for details). Another application of Theorem F is in entropy estimation. Most approaches to entropy estimation proceed by first estimating a model and then forming a plug-in estimate by computing the entropy of the estimated model. From Corollary F it follows that consistent estimates of the entropy can be constructed using match lengths, as in the sketch below. Algorithms that are based upon match lengths have found useful application in linguistics and DNA sequence analysis (see [13], [14] and [3]).
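A minimal sketch of such a match-length entropy estimator, under an assumed i.i.d. Bernoulli source: by Corollary F, $E\,L_m \approx (\log m)/H$, so inverting the averaged match length gives a consistent estimate $\hat{H} = (\log m)/\bar{L}_m$.

```python
import random, math

def match_length(z, db):
    """L_m: largest k such that z[:k] is a contiguous substring of the database db."""
    k = 0
    while k < len(z) and db.find(z[:k + 1]) >= 0:
        k += 1
    return k

p = 0.3   # assumed Bernoulli source; true entropy in bits:
H_true = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

m, trials, lengths = 2 ** 14, 50, []
for _ in range(trials):
    db = ''.join('1' if random.random() < p else '0' for _ in range(m))
    z = ''.join('1' if random.random() < p else '0' for _ in range(64))
    lengths.append(match_length(z, db))

# Corollary F: E L_m = (log m)/H + O(1), so invert the averaged match length.
H_hat = math.log2(m) / (sum(lengths) / trials)
print(H_hat, H_true)
```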

VII. Appendix

Proof of Lemma A: The proof of the technical lemma is straightforward; we need only evaluate $\bar{B}_1$, $\bar{B}_2$, and $\bar{B}_3$. Recall that $Y_i$ is the indicator of the event $\{X_i^{i+n-1} = Z_1^n\}$, the index set is $I = \{1, 2, \ldots, m = \lceil t/P(Z_1^n)\rceil\}$, and $B_i = \{i-2n, i-2n+1, \ldots, i+2n\}$.

$$\bar{B}_1 = E_{Z_1^n} \sum_{i \in I}\sum_{k \in B_i} E_X[Y_i \mid Z_1^n]\, E_X[Y_k \mid Z_1^n] \stackrel{(a)}{=} E_{Z_1^n} \frac{t}{P(Z_1^n)} \sum_{k \in B_0} E_X[Y_0 \mid Z_1^n]\, E_X[Y_k \mid Z_1^n] \stackrel{(b)}{=} E_{Z_1^n} \frac{t}{P(Z_1^n)}\, 4n\, P(Z_1^n)^2 \stackrel{(c)}{=} 4nt\, E_{Z_1^n} P(Z_1^n) \stackrel{(d)}{\le} 4nt\,\gamma(n),$$

where (a) follows from the definition of $\bar{B}_1$, using the stationarity of $X$ and the fact that there are $t/P(Z_1^n)$ summands; (b) follows since $Y_k$ is the indicator of the event $\{X_k^{k+n-1} = Z_1^n\}$, whose expectation with respect to $X$ is $P(Z_1^n)$; and (d) follows from the definition of $\gamma(n)$. To bound $\bar{B}_2$, start with the definition:

$$\bar{B}_2 = E_{Z_1^n} \sum_{i \in I}\sum_{k \in B_i} E_X[Y_i Y_k \mid Z_1^n] \stackrel{(a)}{=} E_{Z_1^n} \frac{t}{P(Z_1^n)} \sum_{0 \ne k \in B_0} E_X[Y_0 Y_k \mid Z_1^n]$$

$$\stackrel{(b)}{=} E_{Z_1^n} \frac{t}{P(Z_1^n)} \sum_{k=-2n,\,k\ne 0}^{2n} \Pr\{X_k^{k+n-1} = Z_1^n \mid X_0^{n-1} = Z_1^n\}\,\Pr\{X_0^{n-1} = Z_1^n\}$$

$$\stackrel{(c)}{=} t\, E_{Z_1^n} \sum_{k=-2n,\,k\ne 0}^{2n} E_X\, 1\{X_k^{k+n-1} = X_0^{n-1} \mid X_0^{n-1} = Z_1^n\} \stackrel{(d)}{=} \sum_{k=-2n,\,k\ne 0}^{2n} t\,\Pr\{X_k^{k+n-1} = X_0^{n-1}\}.$$

Step (a) follows from stationarity, the definition of $\bar{B}_2$, and the fact that there are $t/P(Z_1^n)$ summands. Step (b) follows by conditioning on the event $Y_0 = 1$, which is identical to the event $\{X_0^{n-1} = Z_1^n\}$. Step (c) follows by canceling the $P(Z_1^n)$ term in the denominator with the $\Pr\{X_0^{n-1} = Z_1^n\}$ term. Step (d) follows from the definition of conditional probability and the fact that $Z_1^n$ has the same distribution as $X_0^{n-1}$. To complete the bound on $\bar{B}_2$ we must evaluate $\Pr\{X_k^{k+n-1} = X_0^{n-1}\}$ for integers $k$ with $0 < |k| \le 2n$. First consider $0 < k \le n$:

$$\Pr\{X_k^{k+n-1} = X_0^{n-1}\} = E_{X_n^{2n}}\, E_{X_0^{n-1}}[\,1\{X_0^{n-1} = X_k^{k+n-1}\} \mid X_n^{2n}\,].$$

We will show that for every outcome of the "future" sequence $x_n^{2n}$ there exists a unique sequence $U_k(x_n^{2n})$ such that

$$\Pr\{X_0^{n-1} = X_k^{k+n-1} \mid X_n^{2n} = x_n^{2n}\} = \Pr\{X_0^{n-1} = U_k(x_n^{2n}) \mid X_n^{2n} = x_n^{2n}\}.$$

In other words, if we condition first on the outcome of the second $n$-block from time $n$ to $2n$, label it $x$, there exists only a single string $U_k(x)$ which will admit a match at position $k$. We prove this by construction: write $U_k(x) = \{u_0, \ldots, u_{n-1}\}$. Clearly, the last $k$ positions of $U_k(x)$ must be the first $k$ positions of $x$: $u_{n-j} = x_{n+k-j}$ for $j = 1, 2, \ldots, k$. The rest of the sequence is defined recursively so that the matching condition $X_0^{n-1} = X_k^{k+n-1}$ is satisfied; that is, we let $u_{n-j} = u_{n-j+k}$ for $j = k+1, \ldots, n$. Therefore,

$$\Pr\{X_k^{k+n-1} = X_0^{n-1}\} = E_{X_n^{2n}}\,\Pr\{X_0^{n-1} = U_k(X_n^{2n}) \mid X_n^{2n}\} \le \gamma(n).$$

The same proof applies for $0 < -k \le n$, and for $|k| > n$ we have, by the definition of $\gamma(n)$, that $\Pr\{X_k^{k+n-1} = X_0^{n-1}\} \le \gamma(n)$. We now finish the bound on $\bar{B}_2$ by observing that

$$\bar{B}_2 \le \sum_{k=-2n,\,k\ne 0}^{2n} t\,\Pr\{X_k^{k+n-1} = X_0^{n-1}\} \le 4nt\,\gamma(n).$$

The final task is to evaluate $\bar{B}_3$. Recall that $s_i = |E\{Y_i - EY_i \mid (Y_k : k \in I - B_i)\}|$ and that

$$\bar{B}_3 = E_{Z_1^n} \sum_{i \in I} [\,s_i \mid Z_1^n\,].$$

Given any set $A \in \sigma(X_k : k \notin B_0)$ with $P(A) > 0$, it follows from the $\psi$-mixing property of $X$ that

$$P(\{Y_0 = 1\} \cap A) - P(A)\,P(Y_0 = 1) \le \psi(2n)\,P(A)\,P(Y_0 = 1).$$

Thus

$$E[Y_0 \mid A] \le [1 + \psi(2n)]\,P(Z_1^n).$$

This implies that for any $i \in I$,

$$s_i \le P(Z_1^n)[1 + \psi(2n)] - P(Z_1^n).$$

Hence,

$$\bar{B}_3 = E_{Z_1^n}\sum_{i\in I}[\,s_i \mid Z_1^n\,] \le E_{Z_1^n}\frac{t}{P(Z_1^n)}\big[P(Z_1^n)[1+\psi(2n)] - P(Z_1^n)\big] = \psi(2n)\,t. \;\square$$

Proof of Lemma B: Using the same procedure used to bound $\bar{B}_2$, it follows that

$$\Pr\{R_n < 4n\} \le \sum_{k=1}^{4n-1} \Pr\{X_{k+1}^{k+n} = X_1^n\} \le 4n\,\gamma(n). \;\square$$

Proof of Lemma C: This is the $d$-dimensional version of Lemma A, and the proof carries over with only changes in the index set. We leave the details to the reader.

Proof of Lemma D: We start with the definition of $\bar{B}_2$:

$$\bar{B}_2 = E_{Z_1^n}\sum_{i\in I}\sum_{k\in B_i} E_X[Y_i Y_k \mid Z_1^n] = E_{Z_1^n}\frac{t}{P(Z_1^n)}\sum_{0\ne k\in B_0} E_X[Y_0 Y_k \mid Z_1^n]$$

$$= E_{Z_1^n}\frac{t}{P(Z_1^n)}\sum_{k=-2n,\,k\ne0}^{2n}\Pr\{X_k^{k+n-1} = Z_1^n \mid X_0^{n-1} = Z_1^n\}\,\Pr\{X_0^{n-1} = Z_1^n\}$$

$$= \sum_{k=-2n,\,k\ne0}^{2n} t\,\Pr\{X_k^{k+n-1} = X_0^{n-1} \mid Z_1^n = X_0^{n-1}\}.$$

As in the proof of Lemma A, we will need a bound, for all $0 < |k| \le 2n$, on $\Pr\{X_k^{k+n-1} = X_0^{n-1} \mid Z_1^n = X_0^{n-1}\}$. Define a sequence of random variables $\hat{X}_i$ for $i = n, \ldots, 2n$ so that the unconditional distribution of $\hat{X}_i$ is the conditional distribution of $X_i$ given that $X_0^{n-1} = Z_1^n$. As before (in the proof of Lemma A), if $\hat{X}_n^{2n} = x$ then there exists a unique sequence $U_k(x)$ that will admit a match of $X_0^{n-1}$ at position $k$. Thus

$$\Pr\{X_k^{k+n-1} = X_0^{n-1} \mid Z_1^n = X_0^{n-1}\} = E_{Z_1^n} E_{\hat{X}_n^{2n}}\,1\{Z_1^n = U_k(\hat{X}_n^{2n})\} = E_{\hat{X}_n^{2n}} E_{Z_1^n}\,1\{Z_1^n = U_k(\hat{X}_n^{2n})\}$$

$$= \sum_{x} \Pr\{Z_1^n = U_k(x) \mid \hat{X}_n^{2n} = x\}\,\Pr\{\hat{X}_n^{2n} = x\} \le \max_{x,y} \Pr\{Z_1^n = y \mid \hat{X}_n^{2n} = x\}$$

$$= \max_{x,y} \frac{\Pr\{\hat{X}_n^{2n} = x \mid Z_1^n = y\}\,\Pr\{Z_1^n = y\}}{\Pr\{\hat{X}_n^{2n} = x\}} \le \max_{x,y} \frac{\Pr\{\hat{X}_n^{2n} = x \mid X_0^{n-1} = y\}}{\Pr\{X_n^{2n} = x\}}\;\gamma_Q(n) \le \beta(n)\,\gamma_Q(n).$$

Since this holds for every $k$, it must be that

$$\bar{B}_2 \le 4nt\,\beta(n)\,\gamma_Q(n). \;\square$$

Proof of Lemma E: By conditioning first on $N$ and applying the bounds, as well as the independence of $N$ and $X$, we can follow the steps of the proof of Lemma A. The details are left to the reader. A detailed description of random-length patterns can be found in [11].

References

[1] Barbour, A., Holst, L., and Janson, S., Poisson Approximation, Oxford University Press, 1992.

[2] Kontoyiannis, I., "Asymptotic Recurrence and Waiting Times for Stationary Processes," Technical Report No. 92, Stanford University, July 1996.

[3] Farach, M., Noordewier, M., Savari, S., Shepp, L., Wyner, A.J., and Ziv, J., "The Entropy of DNA: Algorithms and Measurements Based on Memory and Rapid Convergence," SODA 1995, January 1995.

[4] Ornstein, D.S., and Weiss, B., "Entropy and data compression schemes," IEEE Trans. on Information Theory, 39, pp. 78-83, 1993.

[5] Ornstein, D.S., and Weiss, B., "Entropy and Recurrence Rates for Stationary Random Fields," presented at the IEEE Int. Symp. on Information Theory, Trondheim, Norway, June 27-July 1, 1994.

[6] Shields, P.C., "Waiting times: positive and negative results on the Wyner-Ziv problem," Journal of Theoretical Probability, Vol. 6, No. 3, pp. 499-519, 1993.

[7] Shields, P.C., "Entropy and Prefixes," Annals of Probability, Vol. 20, pp. 403-409, 1992.

[8] Shields, P.C., The Ergodic Theory of Discrete Sample Paths, American Mathematical Society, Providence, 1996.

[9] Jacquet, P., and Szpankowski, W., "Autocorrelation on Words and Its Applications: Analysis of Suffix Trees by String-Ruler Approach," Journal of Combinatorial Theory, Vol. 66, No. 2, May 1994.

[10] Wyner, A.D., and Ziv, J., "Some asymptotic properties of the entropy of a stationary ergodic data source with applications to data compression," IEEE Trans. on Information Theory, 35, pp. 1250-1258, Nov. 1989.

[11] Wyner, A.J., String Matching Theorems and Applications to Data Compression and Statistics, Ph.D. Thesis, Stanford University, 1993.

[12] Wyner, A.J., "The redundancy and distribution of phrase lengths of the FDLZ algorithm," IEEE Trans. on Information Theory, September 1997.

[13] Wyner, A.J., "Entropy Estimation and Patterns," Proceedings of the 1996 ISIT Workshop, Haifa, Israel, 1996.

[14] Wyner, A.D., Ziv, J., and Wyner, A.J., "The Role of Pattern Matching in Information Theory," IEEE Trans. on Information Theory, 50th Anniversary Edition, October 1998.

Permanent Address: Department of Statistics, The Wharton School, University of Pennsylvania, 3000 Steinberg-Dietrich Hall, Philadelphia, PA 19104-6302. Email: [email protected]
