SUBMITTED TO IEEE TRANS. INFORMATION THEORY


Error exponents for Neyman-Pearson detection of a continuous-time Gaussian Markov process from regular or irregular samples

W. Hachem, E. Moulines and F. Roueff

Abstract—This paper addresses the detection of a stochastic process in noise from a finite sample under various sampling schemes. We consider two hypotheses. The noise-only hypothesis models the observations as a sample of i.i.d. Gaussian random variables (noise only). The signal-plus-noise hypothesis models the observations as the samples of a continuous-time stationary Gaussian process (the signal) taken at known but random time instants and corrupted with additive noise. Two binary tests are considered, depending on which assumption is retained as the null hypothesis. Assuming that the signal is a linear combination of the solution of a multidimensional stochastic differential equation (SDE), it is shown that the minimum Type II error probability decreases exponentially in the number of samples when the False Alarm probability is fixed. This behavior is described by error exponents that are completely characterized. It turns out that they are related to the asymptotic behavior of the Kalman filter in a random stationary environment, which is studied in this paper. Finally, numerical illustrations of our claims are provided in the context of sensor networks.

Index Terms Error exponents, Kalman filter, Gaussian Markov processes, Neyman-Pearson detection, Stein’s Lemma, Stochastic Differential Equations.

The authors are with Institut Telecom / Telecom ParisTech / CNRS LTCI, France. e-mails: walid.hachem, eric.moulines, [email protected]


I. INTRODUCTION

The detection of stochastic processes in noise has received a lot of attention in the past decades; see for instance the tutorial paper [1] and the references therein. More recently, in the context of sensor networks, there has been a rising interest in the analysis of detection performance when the stochastic process is sampled irregularly. An interesting approach in this direction has been initiated in [2], using error exponents for assessing the performance of the optimal detection procedure. In this paper, we follow this approach in the following general setting. Given two integers p and q, and a positive stable square matrix A, we consider the q-dimensional stochastic process defined as the stationary solution of the stochastic differential equation (SDE)

dX(t) = −A X(t) dt + B dW(t),  t ≥ 0,  (1)

where (W (t), t ≥ 0) is a p-dimensional Brownian motion and B is a q ×p matrix. The SDE (1) is widely used to describe continuous time signals (see [3], [4] and the references therein). We are interested in the detection of the signal (X(t), t ≥ 0) from a finite sample obtained from a general spacing model. Let C be a d × q matrix, (Tn , n ≥ 1) be a renewal sampling process and (Vn , n ≥ 1) be a noise sequence.

Based on the observed samples Y_{1:N} = (Y_1, ..., Y_N) and T_{1:N} = (T_1, ..., T_N), our goal is to decide whether, for n = 1, ..., N, Y_n = V_n or Y_n = C X(T_n) + V_n. The first situation will be referred to as the noise hypothesis and the second as the signal-plus-noise hypothesis.

The renewal hypothesis on (T_n, n ≥ 1) means that T_n = Σ_{k=1}^n I_k, where the (I_k, k ≥ 1) are nonnegative i.i.d. random variables called holding times, with common distribution denoted by τ. This is a standard model for possibly irregular sampling; see [2], [5], [6]. The most usual examples are:
• The Poisson point process. In this case, the (I_k, k ≥ 1) are i.i.d. with exponential distribution τ(dx) = λ exp(−λx) dx. This model has been considered in [2], [5], [6] to model the situation where the signal is measured in time by N identical asynchronous sensors.
• The Bernoulli process. This is the discrete-time counterpart of the Poisson process. In this case, the (I_k, k ≥ 1) are i.i.d. and have a geometric distribution up to a multiplicative time constant S > 0, i.e. τ({Sℓ}) = τ({S})(1 − τ({S}))^{ℓ−1} for ℓ = 1, 2, ... In practice, this model corresponds to a regular sampling with period S for which observations are missing at random, with failure probability 1 − τ({S}). The regular sampling process corresponds to τ({S}) = 1.
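As a quick illustration, both sampling schemes above can be simulated by drawing the holding times and accumulating them. This is a minimal sketch; the rate λ, the period S and the keep-probability used below are arbitrary choices, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_sampling_times(n, lam):
    """Renewal times T_n = I_1 + ... + I_n with Exp(lam) holding times."""
    return np.cumsum(rng.exponential(1.0 / lam, size=n))

def bernoulli_sampling_times(n, S, keep_prob):
    """Holding times S*G with G geometric(keep_prob): a regular grid of
    period S where each slot is observed with probability keep_prob."""
    return np.cumsum(S * rng.geometric(keep_prob, size=n))

T_pois = poisson_sampling_times(10_000, lam=2.0)                 # mean spacing 1/2
T_bern = bernoulli_sampling_times(10_000, S=0.1, keep_prob=0.5)  # mean spacing 0.2
```

The empirical mean spacings recover E[I_1] = 1/λ and S/τ({S}) respectively.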

Two binary tests are considered in this work: either H0 is the noise hypothesis and H1 is the signal-plus-noise hypothesis, or the opposite. Constraining the False Alarm probability (the probability of deciding H1 under H0) to lie below a given ε ∈ (0, 1), it is well known that the minimum Type II error probability


is achieved by the Neyman-Pearson test. It will be shown in this paper that this minimum Type II error probability β_N(ε) satisfies β_N(ε) = exp(−N(ξ + o(1))) as N → ∞, where the error exponent ξ does not depend on ε. The error exponent ξ is an indicator of the performance of the detection test. Its value will be shown to depend on the distribution of the signal (given by A, B and C) and on the distribution of the sampling process (given by τ). An important goal in sensor network design is to optimize the sampling process. Characterizing the error exponents offers useful guidelines in this direction. For instance, [2], [7]–[10] provide useful insights on such concrete problems as the choice of the optimum mean sensor spacing, possibly subject to a cost or a power constraint. Other application examples are considered in [11], [12]. In these contributions, error exponents are used to propose optimum routing strategies for conveying the sensors' data to the fusion center.

In the context of Neyman-Pearson detection, these error exponents are given by the limits of the likelihood ratios, provided that these limits exist. Let Z_{1:N} = (Z_1, ..., Z_N) be a sequence of N observed random vectors. Assume a binary test is performed on this sequence, and assume that under hypothesis H0 the distribution of Z_{1:N} has the density f_{0,N}, while under H1 this distribution has the density f_{1,N}. Fix ε ∈ (0, 1) and let β_N(ε) be the minimum over all tests of the Type II error probability when the False Alarm probability α is constrained to satisfy α ≤ ε. Let

L_N(Z_{1:N}) = (1/N) log [ f_{0,N}(Z_{1:N}) / f_{1,N}(Z_{1:N}) ]

be the normalized Log Likelihood Ratio (LLR) associated with the received Z_{1:N}. Then the asymptotic behavior of β_N(ε) can be obtained from the following theorem, found in [13].

Theorem 1: Assume there is a real number ξ such that the random variable L_N(Z_{1:N}) satisfies

L_N(Z_{1:N}) → ξ  in probability under H0, as N → ∞.  (2)

Then for every ε ∈ (0, 1),

−(1/N) log β_N(ε) → ξ  as N → ∞.
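In the i.i.d. Gaussian case, the limit ξ in Theorem 1 is a Kullback-Leibler divergence (Stein's lemma). As a sanity-check sketch, with two arbitrary variances standing in for the H0 and H1 distributions, a Monte Carlo estimate of the normalized LLR under H0 concentrates around D(f_0 ‖ f_1):

```python
import numpy as np

rng = np.random.default_rng(3)
s0, s1 = 1.0, 2.0                          # variances under H0 and H1 (arbitrary)
N = 500_000
z = rng.normal(0.0, np.sqrt(s0), size=N)   # sample drawn under H0

# normalized LLR (1/N) log(f0/f1) for N(0,s0) versus N(0,s1)
llr = np.mean(0.5 * np.log(s1 / s0) + 0.5 * z**2 * (1.0 / s1 - 1.0 / s0))
# Kullback-Leibler divergence D(N(0,s0) || N(0,s1))
kl = 0.5 * (np.log(s1 / s0) + s0 / s1 - 1.0)
```

Here `llr` and `kl` agree to within Monte Carlo error, illustrating convergence (2).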

In the case where the Z_i's are i.i.d. under both hypotheses, the analogue of Theorem 1 appeared in [14] and is known as Stein's lemma. The generalization to Theorem 1 can be found in [13], [15]. In our case, the observed process is Z_{1:N} = (Z_1, ..., Z_N) with Z_n = (Y_n, T_n); in other words, the measurements consist of the sampled received signal and the sampling moments. Let us consider that Y_{1:N} = V_{1:N} under H0 and Y_{1:N} = (C X(T_n) + V_n)_{1≤n≤N} under H1. Recall that the probability distribution of T_{1:N}


does not depend on the hypothesis to be tested. In these conditions, the LLR is given by

L_N(Z_{1:N}) = (1/N) log [ f_{0,N}(Y_{1:N} | T_{1:N}) / f_{1,N}(Y_{1:N} | T_{1:N}) ]  (3)

where f_{0,N}(· | T_{1:N}) and f_{1,N}(· | T_{1:N}) are the densities of Y_{1:N} conditionally on T_{1:N} under H0 and H1 respectively. It is clear that f_{0,N}(· | T_{1:N}) = N(0, 1_{Nd}). Being a solution of the SDE (1), the process (X(t), t ≥ 0) is a Gaussian process. In consequence, f_{1,N}(· | T_{1:N}) = N(0, R(T_{1:N})), where R(T_{1:N}) is a covariance matrix that depends on T_{1:N}. In the light of Theorem 1, we need to establish

the convergence in probability of the Right-Hand Side (RHS) of Eq. (3) towards a constant ξ, and to characterize this constant, under the assumption Y_{1:N} = V_{1:N}. Alternatively, if we consider that H0 is the signal-plus-noise hypothesis Y_{1:N} = (C X(T_n) + V_n)_{1≤n≤N}, then we study the convergence of −L_N under this assumption.
Theorem 1 has been used for detection performance analysis in [2], [7], [16], [17]. In the closely related Bayesian framework, error exponents have been obtained in [8]–[10], [18]. The closest contributions to this paper are [2], [7], [17], which consider different covariance structures for the process and different sensor location models. In [7], Sung, Tong and Poor consider the scalar version of the SDE (1) and a regular sampling. In [17], the authors essentially generalize the results of [7] to situations where the sensor locations follow some deterministic periodic patterns. In [2], the sampling process (sensor locations) is a renewal process as in our paper, and the detector discriminates between two scalar diffusion processes described by Eq. (1); moreover, the observations are noiseless. Here, due to the presence of additive noise, our technique for establishing the existence of the error exponents and for characterizing them differs substantially from [2]. We establish the convergence of the LLR L_N(Z_{1:N}) by studying the stability (and ergodicity) of the Kalman filter, using Markov chain techniques.
The paper is organized as follows. In Section II, the main assumptions and notations are introduced and the main results of the paper are stated. Proofs of these results are presented in Section III. Some particular cases of interest are presented in Section IV. Section V is devoted to numerical illustrations and to a discussion about the impact of the sampling scheme on the detection performance. The proofs in Section III rely heavily on a theorem for Markov chain stability shown in Appendix B.
The other appendices contain technical results needed in the proofs.


II. THE ERROR EXPONENTS

We consider the following hypothesis test based on observations (Y_n, T_n), n = 1, ..., N, which we shall call the "H0-Noise" test:

H0 : Y_n = V_n  for n = 1, ..., N
H1 : Y_n = C X(T_n) + V_n  for n = 1, ..., N  (4)

where C is the d × q observation matrix and the involved processes satisfy the following set of conditions.

Assumption 1: The following assertions hold.
(i) The process (X(t), t ≥ 0) is a stationary solution of the stochastic differential equation (1), where (W(t), t ≥ 0) is a p-dimensional Brownian motion.
(ii) (T_n, n ≥ 1) is a renewal process, that is, T_n = Σ_{k=1}^n I_k, where (I_n, n ≥ 1) is a sequence of i.i.d. non-negative r.v.'s with distribution τ and τ({0}) < 1.

(iii) (V_n, n ≥ 1) is a sequence of i.i.d. r.v.'s with V_1 ∼ N(0, 1_d).
(iv) The processes (X(t), t ≥ 0), (T_n, n ≥ 1) and (V_n, n ≥ 1) are independent.
Here 1_d denotes the d × d identity matrix. In order to be able to apply Theorem 1, we now develop the expression of the LLR given by (3). To that end, we derive the expressions of the likelihood functions f_{0,N}(Y_{1:N} | T_{1:N}) and f_{1,N}(Y_{1:N} | T_{1:N}). The density f_{0,N}(· | T_{1:N}) is simply the density N(0, 1_{Nd}) of (V_1, ..., V_N), therefore

f_{0,N}(Y_{1:N} | T_{1:N}) = (2π)^{−Nd/2} exp( −(1/2) Σ_{n=1}^N Y_n^T Y_n ).  (5)

We now develop f_{1,N}(· | T_{1:N}) by mimicking the approach developed in [19] and in [7]. Let Q(x) be the q × q symmetric nonnegative matrix defined by

Q(x) = ∫_0^x e^{−uA} B B^T e^{−uA^T} du.  (6)

As A is positive stable, the covariance matrix Q(∞) exists (by Lemma 3) and is the unique solution of the so-called Lyapunov equation Q A^T + A Q = B B^T; see [20, Chap. 2]. Solving Eq. (1) between T_n and T_{n+1} (see [3, Chap. 5]), the conditional distribution of the process (X_n, n ≥ 1) given (I_n, n ≥ 1) is characterized by the recursion equation

X_{n+1} = e^{−I_{n+1} A} X_n + U_{n+1},  n ≥ 0,  (7)

where the conditional distribution of the sequence (X_0, U_n, n ≥ 1) is that of a sequence of independent r.v.'s, X_0 ∼ N(0, Q(∞)) and U_n ∼ N(0, Q_n) with Q_n = Q(I_n) defined by (6).
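The recursion (7) is straightforward to simulate. In the scalar case (A = a, B = b), Eq. (6) gives Q(x) = b²(1 − e^{−2ax})/(2a) in closed form, and the simulated chain should preserve the stationary variance Q(∞) = b²/(2a). A minimal sketch with arbitrary parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 1.0, 1.0
Qinf = b**2 / (2 * a)                     # stationary variance Q(inf) = b^2/(2a)

def Q(x):
    """Scalar version of (6): Q(x) = b^2 (1 - e^{-2 a x}) / (2a)."""
    return b**2 * (1.0 - np.exp(-2 * a * x)) / (2 * a)

n = 200_000
I = rng.exponential(1.0, size=n)          # holding times, tau = Exp(1)
X = np.empty(n + 1)
X[0] = rng.normal(0.0, np.sqrt(Qinf))     # X_0 ~ N(0, Q(inf))
for k in range(n):
    # recursion (7): X_{k+1} = e^{-I_{k+1} a} X_k + U_{k+1}, U_{k+1} ~ N(0, Q(I_{k+1}))
    X[k + 1] = np.exp(-a * I[k]) * X[k] + rng.normal(0.0, np.sqrt(Q(I[k])))
```

The empirical variance of the simulated samples stays close to Q(∞), as stationarity requires.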


Now we write

f_{1,N}(Y_{1:N} | T_{1:N}) = ∏_{n=1}^N f_{1,n,N}(Y_n | Y_{1:n−1}, T_{1:N})  (8)

where f_{1,n,N}(· | Y_{1:n−1}, T_{1:N}) is the density of Y_n conditionally on (Y_{1:n−1}, T_{1:N}). In view of Eq. (7), Y_n = C X_n + V_n and the assumptions on (V_n), these conditional densities are Gaussian; in other words,

f_{1,n,N}(Y_n | Y_{1:n−1}, T_{1:N}) = det(2π ∆_n)^{−1/2} exp( −(1/2) (Y_n − Ŷ_n)^T ∆_n^{−1} (Y_n − Ŷ_n) ),  (9)

where Ŷ_n = E[Y_n | Y_{1:n−1}, T_{1:N}] and ∆_n = Cov(Y_n − Ŷ_n | T_{1:N}) are respectively the conditional expectation of the current observation Y_n given the past observations and the so-called innovation covariance matrix under H1. From Equations (5), (8) and (9), the LLR L_N writes

L_N(Y_{1:N}, T_{1:N}) = (1/N) log f_{0,N}(Y_{1:N} | T_{1:N}) − (1/N) log f_{1,N}(Y_{1:N} | T_{1:N})
  = (1/2N) Σ_{n=1}^N log det ∆_n + (1/2N) Σ_{n=1}^N (Y_n − Ŷ_n)^T ∆_n^{−1} (Y_n − Ŷ_n) − (1/2N) Σ_{n=1}^N Y_n^T Y_n.  (10)

As (Y_n) is described under H1 by the state equations

H1 : X_{n+1} = e^{−I_{n+1} A} X_n + U_{n+1},  Y_n = C X_n + V_n,  for n = 1, ..., N,  (11)

it is well known that Ŷ_n and ∆_n can be computed using the Kalman filter recursive equations. Define the q × 1 vector X̂_n and the q × q matrix P_n as

X̂_n = E[X_n | Y_{1:n−1}, T_{1:N}]  and  P_n = Cov(X_n − X̂_n | T_{1:N}).

The Kalman recursions which provide these quantities are [21, Prop. 12.2.2]:

X̂_{n+1} = e^{−I_{n+1} A} [ 1_q − P_n C^T (C P_n C^T + 1_d)^{−1} C ] X̂_n + e^{−I_{n+1} A} P_n C^T (C P_n C^T + 1_d)^{−1} Y_n  (12)
P_{n+1} = e^{−I_{n+1} A} [ 1_q − P_n C^T (C P_n C^T + 1_d)^{−1} C ] P_n e^{−I_{n+1} A^T} + Q_{n+1}.  (13)

The recursion is started with the initial conditions X̂_1 = 0 and P_1 = Q(∞). With these quantities at hand, Ŷ_n and ∆_n are given by

Ŷ_n = C X̂_n  and  ∆_n = C P_n C^T + 1_d.  (14)
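As a concrete sketch, the recursions (12)–(14) in the scalar case (q = d = 1, with the arbitrary values a = b = c = 1 and exponential holding times) read as follows; the run also checks numerically the stability property established below in this section, namely that P_n remains in [0, Q(∞)]:

```python
import numpy as np

rng = np.random.default_rng(2)
a, b, c = 1.0, 1.0, 1.0                   # scalar A, B, C (assumed values)
Qinf = b**2 / (2 * a)                     # Q(inf) = b^2/(2a)

def Q(x):
    """Scalar Q(x) from (6)."""
    return b**2 * (1.0 - np.exp(-2 * a * x)) / (2 * a)

def kalman_step(xhat, P, I, y):
    """One step of the recursions (12)-(13), scalar case."""
    phi = np.exp(-a * I)                  # e^{-I A}
    K = P * c / (c * P * c + 1.0)         # P C^T (C P C^T + 1)^{-1}
    xhat_next = phi * ((1.0 - K * c) * xhat + K * y)
    P_next = phi * (1.0 - K * c) * P * phi + Q(I)
    return xhat_next, P_next

xhat, P = 0.0, Qinf                       # initial conditions of (12)-(13)
P_path = []
for _ in range(2_000):
    # white-noise observations, i.e. hypothesis H0
    xhat, P = kalman_step(xhat, P, rng.exponential(1.0), rng.normal())
    P_path.append(P)
delta = c * P * c + 1.0                   # innovation variance, Eq. (14)
```

The innovation variance ∆_n is always at least 1, since C P_n C^T ≥ 0.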

With these expressions at hand, our purpose is to study the asymptotic behavior of L_N given by Eq. (10) assuming that (Y_n) is i.i.d. with Y_1 ∼ N(0, 1_d) (under H0). In our analysis, we shall require Model (1) to be controllable, i.e., (A, B) satisfies

Σ_{ℓ=0}^{q−1} A^ℓ B B^T (A^T)^ℓ > 0.

Recall that (A, B) is controllable if and only if the matrix Q(x) defined by Equation (6) is nonsingular for any x > 0 (see [22, Chap. 6] for a proof).


Since under H0 the η_n = (I_{n+1}, Y_n), n ≥ 1, are i.i.d. r.v.'s, the Kalman equations (12)-(13) can be written as a random iteration scheme. To this end we introduce some notation. Let P_q denote the cone of q × q nonnegative symmetric matrices. For any η = (I, Y) ∈ [0, ∞) × R^d, we denote by F_η the (R^q × P_q)-valued function defined for all w = (x, p) ∈ R^q × P_q by

F_η(w) := ( e^{−IA} [ 1_q − p C^T (C p C^T + 1_d)^{−1} C ] x + e^{−IA} p C^T (C p C^T + 1_d)^{−1} Y ,
            e^{−IA} [ 1_q − p C^T (C p C^T + 1_d)^{−1} C ] p e^{−IA^T} + Q(I) ).  (15)

Using this notation, the Kalman equations read

W_n = F_{η_n}(W_{n−1}),  n ≥ 1,

where W_n = (X̂_{n+1}, P_{n+1}). Under H0, since (η_n) is a sequence of i.i.d. random variables, (W_n)_{n≥0} is a Markov chain starting at W_0 = (0, Q(∞)). Observe also that, since the second component of F_η, denoted by F̃_I(p) in the following, does not depend on x, (P_n)_{n≥1} is also a Markov chain, starting at P_1 = Q(∞); since F̃_I does not depend on Y either, this holds under H1 as well. Let [0, Q(∞)] denote the subset of all matrices p ∈ P_q such that p ≤ Q(∞). It is easy to see that, for any I ≥ 0, [0, Q(∞)] is a stable set for F̃_I. Indeed, suppose that p ∈ [0, Q(∞)]; then

F̃_I(p) = e^{−IA} [ 1_q − p C^T (C p C^T + 1_d)^{−1} C ] p e^{−IA^T} + Q(I)
        ≤ e^{−IA} p e^{−IA^T} + Q(I)
        ≤ e^{−IA} Q(∞) e^{−IA^T} + Q(I) = Q(∞),

by definition of Q in (6). Hence, in the following, we consider (W_n) and (P_n) as chains valued in R^q × [0, Q(∞)] and [0, Q(∞)], respectively. We will denote by Π and Π̃ the transition kernels associated with the chains (W_n) (under H0) and (P_n), respectively; that is, for test functions f and f̃ defined on R^q × [0, Q(∞)] and [0, Q(∞)],

Π f(w) = E[ f(F_η(w)) ],  w ∈ R^q × [0, Q(∞)],
Π̃ f̃(p) = E[ f̃(F̃_I(p)) ],  p ∈ [0, Q(∞)],

where η = (I, Y) is such that I ∼ τ, Y ∼ N(0, 1_d), and I and Y are independent. In the following we will simply use the notation η ∼ τ ⊗ N(0, 1_d). We now state our main results. First we determine the limit of the LLR under the signal hypothesis H1.

Theorem 2: Suppose that Assumption 1 holds with a state realization (A, B, C) such that A is positive stable and (A, B) is controllable. Then the transition kernel Π̃ has a unique invariant distribution µ.


Moreover, if Y_{1:N} is defined as in H1 in (4), then as N → ∞, we have

−L_N(Y_{1:N}, T_{1:N}) → ξ_{H0:Signal}  almost surely (a.s.),

where L_N(Y_{1:N}, T_{1:N}) is defined in (10) and

ξ_{H0:Signal} = (1/2) ∫ [ tr(C Q(∞) C^T) − log det(C p C^T + 1_d) ] dµ(p)  (16)

is positive and finite.

Remark 1 This paper deals with the detection of a stationary signal. That is why the matrix A is assumed to be positive stable. An interesting problem, however, consists in searching for minimum conditions on τ that guarantee the existence of µ when A is not positive stable. This problem could also be refined by

studying the existence of some moments of µ. Recently, this study has been undertaken in [23], [24] and [25] in the case where the sampling process is a Bernoulli process. The approach of [23] and [24] is based on the so-called random dynamical systems theory.
Let us now provide the limit of the LLR under the noise hypothesis H0.

Theorem 3: Suppose that Assumption 1 holds with a state realization (A, B, C) such that A is positive stable and (A, B) is controllable. Then the transition kernel Π has a unique invariant distribution ν. Moreover, if Y_{1:N} is defined as in H0 in (4), then as N → ∞, we have

L_N(Y_{1:N}, T_{1:N}) → ξ_{H0:Noise}  a.s.,

where L_N(Y_{1:N}, T_{1:N}) is defined in (10) and

ξ_{H0:Noise} = (1/2) ∫ { log det(C p C^T + 1_d) + tr[ C (x x^T − p) C^T (C p C^T + 1_d)^{−1} ] } dν(x, p)  (17)

is positive and finite.

Remark 2: From Theorems 2 and 3, and by the definitions of Π and Π̃, we immediately see that µ is the second marginal distribution of ν, µ(·) = ν(R^q × ·).
By Theorems 1, 2 and 3, the error exponents associated with the hypothesis testing problem (4) are given by ξ_{H0:Noise} and ξ_{H0:Signal}. More precisely, we have the following result.

Corollary 1: Consider, under Assumption 1, the hypothesis test (4). For N ≥ 1 and ε ∈ (0, 1), let β_N(ε) be the minimum of error probabilities over all tests for which the false alarm probability is at most ε. Then, as N → ∞, −N^{−1} log β_N(ε) → ξ_{H0:Noise}, as defined in (17).


Now, if we interchange the roles of H0 and H1 in (4) (call this test the "H0-Signal" test), we obtain −N^{−1} log β_N(ε) → ξ_{H0:Signal}, as defined in (16).

III. PROOFS OF MAIN RESULTS

This section is devoted to the proofs of Theorems 2 and 3. These results follow from an analysis of the Markov chains induced by the transition kernels Π and Π̃ or, equivalently, by the random iteration functions F_η and F̃_I defined in (15). We provide a fairly general result in the appendix, Theorem 4, to deal with this general framework. Based on moment contraction conditions, this latter result establishes the existence and uniqueness of the invariant distribution and a law of large numbers for functions with precise polynomial growth conditions at infinity. In this section we establish some useful preliminary results related to moment contraction conditions for the random iteration functions F_η and F̃_I, and then prove Theorems 2 and 3 by applying Theorem 4.

A. Preliminary results

We start with a series of preliminary results, for which we recall the following notations and assumptions: τ is a distribution on [0, ∞) such that τ({0}) < 1, (I_n)_{n≥1} is a sequence of i.i.d. r.v.'s distributed according to τ, and (η_n)_{n≥1} is a sequence of i.i.d. r.v.'s distributed according to τ ⊗ N(0, 1_d). We further denote by I and η two generic r.v.'s having the same distributions as I_1 and η_1, respectively.
For any x ∈ R^q and p ∈ [0, Q(∞)], we define two Markov chains, induced by Π and Π̃ and starting at w = (x, p) and p, respectively, by

Z_0^w = w  and  Z̃_0^p = p,
Z_k^w = F_{η_k}(Z_{k−1}^w)  and  Z̃_k^p = F̃_{I_k}(Z̃_{k−1}^p),  k ≥ 1.

As noticed earlier, Z̃_k^p corresponds to the second component of Z_k^w for each k and is valued in [0, Q(∞)]. We introduce the following notation for the Kalman gain matrix,

G(p) = p C^T (1_d + C p C^T)^{−1},

and the short-hand notation for G(Z̃_k^p) (the Kalman gain matrix at time k):

G_k^p = Z̃_k^p C^T (1_d + C Z̃_k^p C^T)^{−1},  k ≥ 0.  (18)

As for the Kalman transition matrix, we set

Θ(I, p) = e^{−IA} (1_q − G(p) C),


and the short-hand notation for Θ(I_k, Z̃_{k−1}^p) (the Kalman transition matrix at time k):

Θ_k^p = e^{−I_k A} (1_q − G_{k−1}^p C),  k ≥ 1.  (19)

Using this notation and Q_k = Q(I_k), the Kalman covariance update equation Z̃_k^p = F̃_{I_k}(Z̃_{k−1}^p) can be expressed for all k ≥ 1 as

Z̃_k^p = Θ_k^p Z̃_{k−1}^p e^{−I_k A^T} + Q_k  (20)
      = Θ_k^p Z̃_{k−1}^p Θ_k^{pT} + e^{−I_k A} (1_q − G_{k−1}^p C) Z̃_{k−1}^p C^T G_{k−1}^{pT} e^{−I_k A^T} + Q_k
      = Θ_k^p Z̃_{k−1}^p Θ_k^{pT} + Q̄_k,  (21)

where

Q̄_k = e^{−I_k A} G_{k−1}^p G_{k−1}^{pT} e^{−I_k A^T} + Q_k,  k ≥ 1.  (22)

We also denote a product of successive Kalman transition matrices by

Θ_{n,m}^p = Θ_n^p Θ_{n−1}^p ⋯ Θ_{m+1}^p,  0 ≤ m < n,

with the convention Θ_{n,n}^p = 1_q.

Lemma 2: For any r > 0, there exist K > 0 and ρ ∈ (0, 1) such that

E[ sup_{p ∈ [0,Q(∞)]} ‖Θ_{n,m}^p‖^{2r} ] ≤ K ρ^{n−m},  0 ≤ m < n.

Proof: Fix ε > 0, to be chosen arbitrarily small later, and denote T = inf{k ≥ m | I_k ≥ ε}. Then we have, by (21) and (22),

λ_min(Z̃_T^p) ≥ λ_min(Q_T) ≥ λ_min(Q(ε)) > 0,  (26)

by Lemma 4. We now write, denoting by 1_A the indicator function of the event A,

‖Θ_{n,m}^p‖^{2r} = Σ_{k=m}^{n−1} ‖Θ_{n,k}^p Θ_{k,m}^p‖^{2r} 1_{T=k} + ‖Θ_{n,m}^p‖^{2r} 1_{T≥n}
               ≤ Σ_{k=m}^{n−1} ‖Θ_{n,k}^p‖^{2r} (Θ*)^{2r(k−m)} 1_{T=k} + (Θ*)^{2r(n−m)} 1_{T≥n}.

For any k < n, applying Lemma 1 and the bound (26), we have, on the event T = k,

‖Θ_{n,k}^p‖^{2r} ≤ ( Z̃* λ_min(Q(ε))^{−1} )^{2r} ∏_{j=T+1}^{n} ( 1 − λ_min(Q_j)/Z̃* )^{2r}.

Observe that, for any k ≥ m, {T = k} = {I_m < ε, ..., I_{k−1} < ε, I_k ≥ ε}. Hence T − m + 1 is a geometric r.v. with parameter τ_ε = τ([ε, ∞)), and (Q_{T+i})_{i≥1} is i.i.d., independent of T, and follows the same distribution as Q(I). Moreover, by Lemma 4, λ_min(Q(I)) > 0 for I > 0, and, since τ({0}) < 1, we have γ = E[(1 − λ_min(Q(I))/Z̃*)^{2r}] < 1. Thus we get

E[ sup_{p ∈ [0,Q(∞)]} ‖Θ_{n,m}^p‖^{2r} ] ≤ ( Z̃* λ_min(Q(ε))^{−1} )^{2r} Σ_{k=m}^{n−1} γ^{n−k} (Θ*)^{2r(k−m)} τ_ε (1−τ_ε)^{k−m} + (Θ*)^{2r(n−m)} Σ_{k≥n} τ_ε (1−τ_ε)^{k−m}
≤ { ( Z̃* λ_min(Q(ε))^{−1} )^{2r} τ_ε (n−m) + 1 } ρ̃^{n−m},

where we chose ε > 0 small enough so that (Θ*)^{2r} (1−τ_ε) < 1 and set ρ̃ = γ ∨ {(Θ*)^{2r}(1−τ_ε)} < 1. This gives the result for any ρ ∈ (ρ̃, 1) by conveniently choosing K.

We conclude this series of preliminary results with useful Lipschitz bounds for the mappings p ↦ Z̃_n^p, p ↦ Θ_{n,m}^p and p ↦ G_n^p.

Proposition 1: We have, for any nonnegative symmetric matrices p and q,

Z̃_n^p − Z̃_n^q = Θ_{n,0}^p (p − q) Θ_{n,0}^{qT},  n ≥ 1.  (27)

Moreover, there exists a constant C > 0 such that, for all p, q ∈ [0, Q(∞)],

‖G_n^p − G_n^q‖ ≤ C ‖p − q‖ ‖Θ_{n,0}^p‖ ‖Θ_{n,0}^q‖,  n ≥ 1,  (28)
‖Θ_{n,m}^p − Θ_{n,m}^q‖ ≤ C ‖p − q‖ Σ_{j=m+1}^{n} ‖Θ_{n,j}^q‖ ‖Θ_{j−1,m}^p‖ ‖Θ_{j−1,0}^q‖ ‖Θ_{j−1,0}^p‖,  0 ≤ m < n.  (29)

Proof: Identity (27) follows by induction on n. The mapping G defined above is Lipschitz on the compact set [0, Q(∞)]; thus, since G_n^p = G(Z̃_n^p), the bound (28) follows from (27). Finally we prove (29). We have, for all 0 ≤ m < n (recall the convention Θ_{n,n}^p = Θ_{m,m}^p = 1_q),

Θ_{n,m}^p − Θ_{n,m}^q = Σ_{j=m+1}^{n} Θ_{n,j}^q (Θ_j^p − Θ_j^q) Θ_{j−1,m}^p.

On the other hand, Θ_j^p − Θ_j^q = e^{−I_j A} (G_{j−1}^q − G_{j−1}^p) C, and (29) thus follows from (28).


B. Proof of Theorem 2

Using (27) in Proposition 1, Lemma 2 and the Hölder inequality, we obtain that, for any q > 0, there exist C > 0 and α ∈ (0, 1) such that

E[ ‖Z̃_n^p − Z̃_n^q‖^q ] ≤ C α^n,  p, q ∈ [0, Q(∞)], n ≥ 1.  (32)

This corresponds to Condition (i) in Theorem 4. Condition (ii) is trivially satisfied for any s and r = 1 since here X = [0, Q(∞)] is a compact state space. Hence, by Theorem 4(a) we obtain the existence

and uniqueness of µ.
Next, we show that −L_N(Y_{1:N}, T_{1:N}) defined in (10) converges to ξ_{H0:Signal} when Y_n = C X_n + V_n for all n ≥ 1. Since, for all n ≥ 1, P_n = Z̃_{n−1}^{Q(∞)} and log det ∆_n is a Lipschitz function of P_n, we have by Theorem 4(b) that

(1/N) Σ_{n=1}^N log det ∆_n → ∫ log det(C p C^T + 1_d) µ(dp)  a.s. as N → ∞.  (33)

This is true independently of the definition of (Y_n) and hence will also be used in the proof of Theorem 3. In contrast, the specific definition of (Y_n) here implies that Ŷ_n = E[Y_n | Y_{1:n−1}, T_{1:N}] and ∆_n = Cov(Y_n − Ŷ_n | T_{1:N}). Hence (∆_n^{−1/2}(Y_n − Ŷ_n))_{n≥1} is a sequence of i.i.d. N(0, 1_d) r.v.'s, which yields

(1/N) Σ_{n=1}^N (Y_n − Ŷ_n)^T ∆_n^{−1} (Y_n − Ŷ_n) → d  a.s.

On the other hand, in (10) this limit cancels with

(1/N) Σ_{n=1}^N V_n^T V_n → d  a.s.,  (34)

which appears in the last term of (10) when developing Y_n^T Y_n = V_n^T V_n + X_n^T C^T C X_n + 2 X_n^T C^T V_n. Hence it only remains to show that

(1/N) Σ_{n=1}^N X_n^T C^T C X_n → tr(C Q(∞) C^T)  a.s.,  (35)
(1/N) Σ_{n=1}^N X_n^T C^T V_n → 0  a.s.  (36)

To this end, recall that (Xn ) is a Markov chain, whose distribution is defined by the recurrence equation (7) and the initial condition X0 ∼ N (0, Q(∞)). We shall establish the ergodicity of this Markov chain by

again applying Theorem 4. For any x ∈ Rq , we denote by (Xnx ) the Markov chain defined with the same


recurrence equation but with initial condition X_0 = x. Then we have, by iterating,

X_n^x = e^{−(Σ_{j=1}^n I_j) A} x + Σ_{k=1}^n e^{−(Σ_{j=k+1}^n I_j) A} U_k,  n ≥ 1,

with the convention Σ_{j=n+1}^n I_j = 0. Recall that, given I_n, the conditional distribution of U_n is N(0, Q_n)

and Q_n = Q(I_n) ∈ [0, Q(∞)]. Hence E[|U_n|^s | I_n] is a bounded r.v. for any s > 0. By Lemma 3, we have E[ ‖e^{−(Σ_{j=k+1}^n I_j) A}‖^s ] ≤ K (E[e^{−a s I_1}])^{n−k} for some K, a > 0. Hence we obtain, for any s > 0, some constants C > 0 and α ∈ (0, 1), and all x, y ∈ R^q,

E[ |X_n^x − X_n^y|^s ] ≤ C α^n (1 + |x|^s + |y|^s),   E[ |X_n^x|^s ] ≤ C (1 + |x|^s).

These are conditions (i) and (ii) of Theorem 4 with r = p = 1. Moreover, (X_n) has a constant marginal distribution, namely N(0, Q(∞)), so that the invariant distribution µ of Theorem 4(a) is necessarily µ = N(0, Q(∞)). Now, applying Theorem 4(b) and Theorem 4(c), we get (35) and (36), with a = 2 and a = 1 respectively. To complete the proof of Theorem 2, it remains to prove that ξ_{H0:Signal} > 0. This results from log det(C p C^T + 1_d) < tr(C Q(∞) C^T) for every p ∈ [0, Q(∞)].

C. Proof of Theorem 3

Let w = (x, p) ∈ R^q × [0, Q(∞)]. We denote the first component of Z_k^w by Z̄_k^w, so that Z_k^w = (Z̄_k^w, Z̃_k^p). Using the notation introduced above, we have, for all k ≥ 1, Z̄_k^w = Θ_k^p Z̄_{k−1}^w + e^{−I_k A} G_{k−1}^p Y_{k−1}, and, by iterating,

Z̄_n^w = Θ_{n,0}^p x + Σ_{k=1}^n Θ_{n,k}^p e^{−I_k A} G_{k−1}^p Y_{k−1},  n ≥ 1.

By continuity of G, it is bounded on the compact set [0, Q(∞)]; hence sup_{p,k} ‖G_k^p‖ < ∞. Also, by Lemma 3, sup_k ‖e^{−I_k A}‖ < ∞. Applying these bounds, Lemma 2, the Minkowski inequality and the Hölder inequality in the previous display, we obtain, for any s > 0 and some constant C > 0,

E[ |Z̄_n^w|^s ] ≤ C (1 + |x|^s),  w = (x, p) ∈ R^q × [0, Q(∞)], n ≥ 1.  (37)

Since the second component of Z_k^w stays in the compact set [0, Q(∞)], this implies Condition (ii) in Theorem 4 with r = 1 for the complete chain (Z_k^w)_{k≥0}. Let now v = (y, q) ∈ R^q × [0, Q(∞)]. We have

Z̄_n^w − Z̄_n^v = Θ_{n,0}^p x − Θ_{n,0}^q y + Σ_{k=1}^n (Θ_{n,k}^p − Θ_{n,k}^q) e^{−I_k A} G_{k−1}^p Y_{k−1} + Σ_{k=1}^n Θ_{n,k}^q e^{−I_k A} (G_{k−1}^p − G_{k−1}^q) Y_{k−1}.


Note that, using Lemma 2, the bounds (28) and (29) in Proposition 1, the Hölder inequality and the Minkowski inequality, we obtain, for any r > 0 and some constants C > 0 and ρ ∈ (0, 1) not depending on p, q,

E[ ‖G_n^p − G_n^q‖^r ] ≤ C ρ^n  and  E[ ‖Θ_{n,m}^p − Θ_{n,m}^q‖^r ] ≤ C ρ^n,  0 ≤ m < n.

From these bounds we obtain, for any q > 0 and some constants C > 0 and α ∈ (0, 1) not depending on w, v,

E[ |Z̄_n^w − Z̄_n^v|^q ] ≤ C α^n (1 + |x|^q + |y|^q),  n ≥ 1.

This, with (32), implies Condition (i) in Theorem 4 with p = 1 for the chain (Z_k^w)_{k≥0}. Hence Theorem 4(a) applies, which yields the existence and uniqueness of the measure ν. Moreover, we get that ∫ |w|^r dν(w) < ∞ for any r > 0.

Let us now show that L_N(Y_{1:N}, T_{1:N}) converges to ξ_{H0:Noise} when Y_n = V_n for all n ≥ 1. Some of the terms appearing in (10) are identical to the case where Y_n = C X_n + V_n for all n ≥ 1, investigated in the proof of Theorem 2. Writing

(Y_n − Ŷ_n)^T ∆_n^{−1} (Y_n − Ŷ_n) = V_n^T ∆_n^{−1} V_n − 2 V_n^T ∆_n^{−1} C X̂_n + X̂_n^T C^T ∆_n^{−1} C X̂_n,

and using µ(·) = ν(R^q × ·), (33), (34) and some algebra, it is in fact sufficient to prove that

(1/N) Σ_{n=1}^N X̂_n^T C^T ∆_n^{−1} C X̂_n → ∫ x^T C^T (C p C^T + 1_d)^{−1} C x dν(x, p)  a.s.,  (38)
(1/N) Σ_{n=1}^N V_n^T ∆_n^{−1} V_n → ∫ tr[(C p C^T + 1_d)^{−1}] dµ(p)  a.s.,  (39)
(1/N) Σ_{n=1}^N V_n^T ∆_n^{−1} C X̂_n → 0  a.s.  (40)

Now, these limits hold by observing that (X̂_n, P_n) = Z_{n−1}^{(0, Q(∞))} and by applying Theorem 4(b) with a = 1 for (38), Theorem 4(c) with a = 1 for (40) and Theorem 4(c) with a = 2 for (39).
It remains to prove that ξ_{H0:Noise} > 0. From Equation (17), ξ_{H0:Noise} ≥ ∫ f(p) dµ(p), where f(p) = 0.5 { log det(C p C^T + 1_d) − tr[ C p C^T (C p C^T + 1_d)^{−1} ] }. This function satisfies f(p) ≥ 0, and f(p) = 0 if

and only if C p C^T = 0. Let Z̃ ∈ [0, Q(∞)] be a random variable with the invariant distribution µ, and assume that C Z̃ C^T = 0 with probability one. From Equation (15) we have, with probability one,

0 = C F̃_I(Z̃) C^T = C e^{−IA} Z̃ e^{−IA^T} C^T − C e^{−IA} Z̃ C^T (C Z̃ C^T + 1_d)^{−1} C Z̃ e^{−IA^T} C^T + C Q(I) C^T
  = C e^{−IA} Z̃ e^{−IA^T} C^T + C Q(I) C^T ≥ C Q(I) C^T,

hence C Q(I) C^T = 0 with probability one.


Due to the controllability of (A, B) and the fact that τ({0}) < 1, this is a contradiction. Therefore f(Z̃) > 0 with probability one, and hence ξ_{H0:Noise} > 0, which completes the proof of Theorem 3.

IV. SOME PARTICULAR CASES

Different particular cases and limit situations will be considered in this section. We begin with the case where the sampling is regular, i.e., I_1 is equal to a constant that we take equal to one without loss of generality. In this case, we obtain compact expressions for the error exponents. We then consider the case where the holding times are large with high probability, i.e., the sensors tend to be far apart. Finally, we consider the case where the SDE (1) is a scalar equation. In the scalar case, we will be able to analyze the impact of E[I_1], the Signal to Noise Ratio, and the distribution of I_1 on ξ_{H0:Signal}. All proofs are deferred to Appendix C.

Regular Sampling

When the sampling is regular, the model for (Y_n) under H1 (see Eqs. (11)) is a general model for stable Gaussian multidimensional ARMA processes corrupted with a Gaussian white noise. In this case we denote by Φ = e^{−I_{n+1} A} = exp(−A) the state transition matrix and by Q = Q(1) = ∫_0^1 exp(−uA) B B^T exp(−u A^T) du the excitation covariance matrix.

Proposition 2 (Regular sampling): Assume that I_1 = 1 and define ξ_{H0:Signal} and ξ_{H0:Noise} by (16) and (17), respectively. Then we have

ξ_{H0:Signal} = (1/2) [ tr(C Q(∞) C^T) − log det(C P_R C^T + 1_d) ],  (41)
ξ_{H0:Noise} = (1/2) [ log det(C P_R C^T + 1_d) − tr( C P_R C^T (C P_R C^T + 1_d)^{−1} ) + tr( C Σ C^T (C P_R C^T + 1_d)^{−1} ) ],  (42)

where P_R is the unique solution of the matrix equation

P = Φ P Φ^T − Φ P C^T (C P C^T + 1_d)^{−1} C P Φ^T + Q  (43)

and where the q × q symmetric matrix Σ is the unique solution of the matrix linear equation

Σ − Φ (1_q − G C) Σ (1_q − G C)^T Φ^T = Φ G G^T Φ^T  (44)

with G = P_R C^T (C P_R C^T + 1_d)^{−1}.
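For a scalar instance (Φ = e^{−a}, Q = Q(1), C = 1; the values a = b = 1 below are arbitrary choices for illustration), the Riccati equation (43) can be solved by simple fixed-point iteration, after which Σ from (44) is a one-line solve and (41)-(42) give the exponents. A minimal sketch:

```python
import numpy as np

a, b, c = 1.0, 1.0, 1.0                       # assumed scalar parameters
Phi = np.exp(-a)                              # state transition e^{-a}
Qinf = b**2 / (2 * a)
Q1 = Qinf * (1.0 - np.exp(-2 * a))            # Q(1), scalar version of (6)

# Fixed-point iteration of the discrete algebraic Riccati equation (43)
P = Qinf
for _ in range(200):
    P = Phi * P * Phi - Phi * P * c * c * P * Phi / (c * P * c + 1.0) + Q1

G = P * c / (c * P * c + 1.0)                 # steady-state gain
# Scalar solve of (44): Sigma * (1 - (Phi (1 - G c))^2) = (Phi G)^2
Sigma = (Phi * G) ** 2 / (1.0 - (Phi * (1.0 - G * c)) ** 2)

xi_signal = 0.5 * (c * Qinf * c - np.log(c * P * c + 1.0))          # Eq. (41)
xi_noise = 0.5 * (np.log(c * P * c + 1.0)
                  - c * P * c / (c * P * c + 1.0)
                  + c * Sigma * c / (c * P * c + 1.0))              # Eq. (42)
```

In the multidimensional case one would instead use a dedicated Riccati solver (e.g. `scipy.linalg.solve_discrete_are`), but the fixed-point view mirrors the Kalman recursion (13) with constant holding time.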


Equation (43) is the celebrated discrete algebraic Riccati equation. Its solution P_R is the asymptotic (steady-state) error covariance matrix when the sampling is regular. The matrix G = P_R C^T (C P_R C^T + 1_d)^{−1} is the Kalman filter steady-state gain matrix [28, Chap. 4].

Large Holding Times

We now study the behavior of the error exponents when the holding times are large with high probability. We shall say that a family (τ_s) of probability distributions on [0, ∞) "escapes to infinity" if, for all K > 0, τ_s([0, K]) → 0 as s → ∞.

In order to study the large holding time behavior of the error exponents, we index the distribution of the holding times by s and assume that τ_s escapes to infinity. A typical particular case that illustrates this situation is when the In are equal in distribution to s Ī, where Ī is some nonnegative random variable, and we study the behavior of the error exponents for large values of s.

Proposition 3 (Large holding times) Assume that (τ_s) escapes to infinity and define ξH0:Signal and ξH0:Noise by (16) and (17), respectively. Then, as s → ∞,

ξH0:Noise → (1/2) [ log det(C Q(∞) C^T + 1_d) − tr( C Q(∞) C^T (C Q(∞) C^T + 1_d)^{−1} ) ],   (45)

ξH0:Signal → (1/2) [ tr(C Q(∞) C^T) − log det(C Q(∞) C^T + 1_d) ].   (46)

Given an R^d-valued i.i.d. sequence (Yn) such that Y1 ∼ N(0, 1_d) under H0 and Y1 ∼ N(0, C Q(∞) C^T + 1_d) under H1, it is well known that, under H0, the LLR converges to the Kullback-Leibler divergence D( N(0, 1_d) ‖ N(0, C Q(∞) C^T + 1_d) ), which equals the RHS of (45), while under H1, the negated LLR converges to D( N(0, C Q(∞) C^T + 1_d) ‖ N(0, 1_d) ), which equals the RHS of (46). This can be explained as follows. When τ_s escapes to infinity, two consecutive samples X(Tn) and X(T_{n+1}) are asymptotically uncorrelated. Hence, under the signal hypothesis and as τ_s escapes to infinity, the process (Yn) can be seen as a centered Gaussian i.i.d. sequence with covariance matrix C Q(∞) C^T + 1_d.

The Scalar Case

In the scalar case, the SDE (1) becomes

dX(t) = −a X(t) dt + b dW(t),   t ≥ 0,   (47)


where W(t) is a scalar Brownian motion and (a, b) are known nonzero real constants. The SDE (47) defines a so-called Ornstein-Uhlenbeck (O-U) process. Under the stationarity assumption, a > 0 and the initial value X(0) is independent of W(t) and follows the law N(0, Q(∞)), where the variance Q(∞) is given by Q(∞) = b²/(2a). We observe (Yn, Tn)_{1≤n≤N}, where (Yn) is a scalar process, and we write

the H0-Noise test as

H0 : Yn = Vn,   n = 1, …, N,   (48)
H1 : Yn = X(Tn) + Vn,   n = 1, …, N,   (49)

where the observation noise process (Vn) is i.i.d. with V1 ∼ N(0, 1). Solving Equation (47) between Tn and T_{n+1}, we obtain that Xn = X(Tn) satisfies

X_{n+1} = e^{−a I_{n+1}} Xn + U_{n+1},   n ∈ N,   (50)

where, given (In), (Un) is a sequence of independent variables such that Un ∼ N(0, Qn) with Qn = Q(∞)(1 − e^{−2aIn}). The distribution of the process (Yn) under H1 is completely described by the scalars a and Q(∞) and by the distribution τ of I1, and so are the error exponents. Recall that we assume here that I1 is integrable. In this case, we can set E[I1] = 1 by absorbing the mean holding time into a. The parameter Q(∞) determines the marginal distribution of X, since X(t) ∼ N(0, Q(∞)) for every t ≥ 0, and can thus be interpreted as the Signal to Noise Ratio SNR = E[Xn²]/E[Vn²].
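Because the transition (50) is available in closed form, the sampled O-U process can be simulated exactly, with no SDE discretization error. A minimal sketch under assumed parameter values (a = 1, Q(∞) = 2, Poisson sampling):

```python
import math
import random

def sample_ou_path(a, q_inf, holding_times, rng):
    """Exact simulation of X_n = X(T_n) via (50): X_{n+1} = e^{-a I_{n+1}} X_n + U_{n+1},
    where U_{n+1} ~ N(0, Q(inf)(1 - e^{-2 a I_{n+1}})) given the holding times."""
    x = rng.gauss(0.0, math.sqrt(q_inf))  # stationary start: X(0) ~ N(0, Q(inf))
    path = [x]
    for hold in holding_times:
        phi = math.exp(-a * hold)
        x = phi * x + rng.gauss(0.0, math.sqrt(q_inf * (1.0 - phi * phi)))
        path.append(x)
    return path

rng = random.Random(0)
a, q_inf, n = 1.0, 2.0, 200_000
holds = [rng.expovariate(1.0) for _ in range(n)]  # Poisson sampling, E[I_1] = 1
xs = sample_ou_path(a, q_inf, holds, rng)
var_hat = sum(x * x for x in xs) / len(xs)  # empirical estimate of Q(inf)
```

Since the chain starts in its stationary law, the empirical variance of the samples estimates Q(∞) regardless of the holding-time distribution.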

We now provide the error exponents when the sampling is regular. In the scalar case, it is easy to solve Equations (43) and (44) in the statement of Proposition 2 and to obtain the error exponents in closed form.

Corollary 2 (Regular sampling in the scalar case) In the scalar case, defining ξH0:Signal and ξH0:Noise as in (41) and (42), respectively, we have

ξH0:Signal = (1/2) ( SNR − log(1 + PR) ),   (51)

ξH0:Noise = (1/2) [ log(1 + PR) + (PR/(PR + 1)) ( Φ² PR / (PR² + 2 PR + 1 − Φ²) − 1 ) ],   (52)

where Φ = exp(−a) and

PR = [ (SNR − 1)(1 − Φ²) + sqrt( (SNR − 1)²(1 − Φ²)² + 4 SNR (1 − Φ²) ) ] / 2.

We note that (52) was first proved in [7, Theorem 1]. The proof of (51) is straightforward and thus omitted.
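The closed forms (51)-(52) are straightforward to evaluate numerically; the sketch below implements them directly (the function name and test values are ours):

```python
import math

def scalar_regular_exponents(a, snr):
    """Evaluate (51)-(52) of Corollary 2 for regular sampling with I_1 = 1."""
    phi2 = math.exp(-2.0 * a)  # Phi^2 with Phi = exp(-a)
    # Closed-form positive root P_R of the scalar Riccati equation (43)
    pr = ((snr - 1.0) * (1.0 - phi2)
          + math.sqrt((snr - 1.0) ** 2 * (1.0 - phi2) ** 2
                      + 4.0 * snr * (1.0 - phi2))) / 2.0
    xi_signal = 0.5 * (snr - math.log(1.0 + pr))
    xi_noise = 0.5 * (math.log(1.0 + pr)
                      + (pr / (pr + 1.0))
                      * (phi2 * pr / (pr * pr + 2.0 * pr + 1.0 - phi2) - 1.0))
    return pr, xi_signal, xi_noise
```

As a sanity check, for large a one recovers PR ≈ SNR and ξH0:Noise approaches the scalar version of the large-holding-time limit (45).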


We now consider a general distribution for the holding times and study the behavior of ξH0:Signal with respect to a, the Signal to Noise Ratio SNR = Q(∞), and the distribution of I1.

Proposition 4 Define ξH0:Signal and ξH0:Noise by (16) and (17), respectively. In the scalar case, the following properties hold true.
(i) The error exponent ξH0:Signal decreases as a increases. Moreover, ξH0:Signal → Q(∞)/2 as a → 0 and ξH0:Signal → (Q(∞) − log(Q(∞) + 1))/2 as a → ∞.
(ii) ξH0:Signal increases as Q(∞) increases.
(iii) The distribution τ of I1 that minimizes ξH0:Signal under the constraint E[I1] = 1 is τ = δ1.

The proof of Proposition 4 can be found in Appendix C. Some practical design guidelines can be inferred from this proposition: from the standpoint of error exponent theory, when H0 refers to the presence of a noisy O-U signal, it is advantageous to choose closely spaced sensors in order to reduce the Type II error probability. This probability is reduced by exploiting the correlations between the Xn. As regards the sampling strategy, the worst sampling from the error exponent standpoint is regular sampling. We note that the problem of determining the best distribution τ, that is, the one that maximizes ξH0:Signal for a given mean, is an open question.

In the setting of Theorem 3, the behavior of ξH0:Noise with respect to a has been analyzed only in the regular sampling case (Corollary 2) in [7]. The authors of [7] proved that when SNR ≥ 0 dB, ξH0:Noise is an increasing function of a, while when SNR < 0 dB, ξH0:Noise admits a maximum with respect to a. By a numerical estimation of ξH0:Noise (see below), we observe a similar behavior in the case of Poisson sampling. However, a more formal characterization of the shape of ξH0:Noise for a general distribution τ seems difficult. A more detailed discussion on the behavior of ξH0:Noise is provided in the next section.

V. NUMERICAL ILLUSTRATION AND INTERPRETATION OF THE RESULTS

Let us first describe the simulation procedure. Let (X̂_∞, P_∞) be a random element of R^q × [0, Q(∞)] distributed according to the invariant distribution ν of the Markov process (X̂_n, P_n). Then the error exponent defined by (17) can be written as

ξH0:Noise = (1/2) E[ log det(C P_∞ C^T + 1_d) + tr( C (X̂_∞ X̂_∞^T − P_∞) C^T (C P_∞ C^T + 1_d)^{−1} ) ]

and the error exponent defined by (16) as

ξH0:Signal = (1/2) ( tr(C Q(∞) C^T) − E[ log det(C P_∞ C^T + 1_d) ] ).


Fig. 1. Scalar case: ξH0:Signal vs a for SNR = −3, 0 and 3 dB (Poisson and regular sampling).

By the stability of the Markov chain (X̂_n, P_n) shown in Section III, we estimate the error exponents by simulating the Kalman Equations (12)-(13) with (Yn) i.i.d. N(0, 1_d), and by estimating the expectations in the equations above by empirical means taken on (X̂_n, P_n)_{n=1,…,N} for N large enough. A scalar case and a vector case are considered.
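In the scalar case, this procedure reduces to a few lines: run the scalar specialization of (12)-(13) driven by i.i.d. N(0,1) observations and average. A minimal sketch (the parameter values, function name, sample size and closed-form cross-check are ours):

```python
import math
import random

def estimate_exponents_scalar(a, q, holding, n_samples, seed=0):
    """Monte Carlo estimates of the scalar error exponents: simulate the Kalman
    recursion with Y_n i.i.d. N(0,1) and average log(P+1) and (xhat^2 - P)/(P+1)."""
    rng = random.Random(seed)
    xhat, P = 0.0, q  # start the covariance at Q(inf) = q
    acc_noise = acc_logdet = 0.0
    for _ in range(n_samples):
        phi = math.exp(-a * holding(rng))
        y = rng.gauss(0.0, 1.0)
        xhat = phi * (xhat + P * y) / (P + 1.0)                # state estimate update
        P = phi * phi * P / (P + 1.0) + q * (1.0 - phi * phi)  # covariance update
        acc_logdet += math.log(P + 1.0)
        acc_noise += math.log(P + 1.0) + (xhat * xhat - P) / (P + 1.0)
    return 0.5 * (q - acc_logdet / n_samples), 0.5 * acc_noise / n_samples

# Regular sampling, a = 1, SNR = 0 dB (q = 1): comparable to Corollary 2.
xi_s_mc, xi_n_mc = estimate_exponents_scalar(1.0, 1.0, lambda r: 1.0, 200_000)
```

With regular sampling, the estimates can be checked against the closed forms (51)-(52) of Corollary 2.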

A. The scalar case

Figures 1 and 2 describe the behavior of the error exponents in the scalar case. Poisson sampling with E[I1] = 1 and regular sampling with I1 = 1 are both displayed in the figures. In Fig. 1, the error exponent ξH0:Signal is plotted as a function of a for SNR (= Q(∞)) = −3, 0 and 3 dB. We note that the empirical results match the theoretical findings stated in Proposition 4.

In Fig. 2, ξH0:Noise is plotted vs a for SNR = −3, 0 and 3 dB. We notice that ξH0:Noise → 0 as a → 0. Moreover, this error exponent is an increasing function of a for SNR = 0 and 3 dB, while it has a maximum with respect to a for SNR = −3 dB. As mentioned in Section IV, this behavior has been established in [7] in the case of regular sampling. We also notice that Poisson sampling is worse than regular sampling for SNR = 3 dB and better than regular sampling for SNR = −3 dB. We will further discuss these findings in Section V-C below.


Fig. 2. Scalar case: ξH0:Noise vs a for SNR = −3, 0 and 3 dB (Poisson and regular sampling).

B. The vector case

We now consider a vector case and investigate whether the qualitative findings of the scalar case are again observed. The following 2-dimensional process is considered:

dX(t) = − [ 0  −1 ; 1  1 ] X(t) dt + [ 0 ; 1 ] dW(t),

where W(t) is a scalar Brownian motion. We take C = 1_2 in (4). Figures 3 and 4 concern the behavior of ξH0:Signal and ξH0:Noise in the vector case. Both Poisson and regular sampling are considered. In the Poisson sampling case, we assume that the In are equal in distribution to sI, where I is an exponential random variable with mean one, and we plot the error exponents in terms of the mean holding time s. In the regular sampling case, s is simply the sensor spacing. The last parameter is the SNR, given by

SNR = tr(C Q(∞) C^T)/d = E[|C Xn|²]/E[|Vn|²].

In this experimental setting, a behavior comparable to the scalar case is observed for both tests. In the case of the H0-Signal test, we also observe that ξH0:Signal decreases as s increases, and Poisson sampling enjoys a higher error exponent than regular sampling for the three considered SNRs. In the case of the H0-Noise test, the error exponent increases with s at high SNR, while it has a maximum with respect to s at low SNR, and Poisson sampling is worse than regular sampling at high SNR.


Fig. 3. Vector case: ξH0:Signal vs s for SNR = −3, 0 and 3 dB (Poisson and regular sampling).

C. Discussion on the error exponents behavior

Small sampling spacing: Figures 1 and 2 show that ξH0:Signal increases to Q(∞)/2 as a ↓ 0 (as predicted by Proposition 4), while ξH0:Noise decreases to zero as a ↓ 0. This behavior has the following heuristic interpretation. At a = 0, Equation (50) boils down to X_N = ⋯ = X_n = ⋯ = X_0 ∼ N(0, Q(∞)). Under H1, it is easy to show that the corresponding negated LLR converges to X_0²/2, which has expectation Q(∞)/2, the limit of ξH0:Signal as a ↓ 0. In contrast, as already noticed in [7], a direct derivation shows that the error exponent ξH0:Noise of the limit model is zero, since the Neyman-Pearson Type II error probability decreases as O(1/√N), that is, much more slowly than the usual exponential decrease. This is in accordance with the behavior observed in simulations, namely ξH0:Noise → 0 as a ↓ 0.

Small versus large SNR: Here we denote the SNR by q = Q(∞) and let ϕ_τ(a) = E[e^{−2aI1}] for I1 with distribution τ. In Appendix D we provide a heuristic calculation that yields the following approximations. As q → 0,

ξH0:Noise ∼ q² (1 + ϕ_τ(a)) / (4(1 − ϕ_τ(a)))   (53)

and, as q → ∞,

ξH0:Noise − (1/2) log(q) → (1/2) ( E[ log(1 − e^{−2aI1}) ] − 1 ).   (54)


Fig. 4. Vector case: ξH0:Noise vs s for SNR = −3, 0 and 3 dB (Poisson and regular sampling).

These results show that for small SNR, regular sampling has the lowest error exponent, while for high SNR regular sampling enjoys the highest error exponent in the set of distributions τ for which E[I1] = 1. Indeed, as exp(−x) is convex and (1 + x)/(1 − x) is increasing on [0, 1), the q² term on the right hand side of (53) is minimized when I1 = 1 with probability one. At low SNR, the error exponent loss L due to the use of regular sampling (that is, the SNR to pay to achieve an equivalent error exponent) is

L = 5 log_{10} [ (1 + ϕ_τ(a))(1 − exp(−2a)) / ((1 − ϕ_τ(a))(1 + exp(−2a))) ] dB.   (55)

At large SNR, as log(1 − exp(−x)) is concave, the right hand side of (54) is maximized when I1 = 1 with probability one. Thus, at high SNR, the error exponent gain G due to the use of regular sampling is

G = 10 ( log_{10}(1 − exp(−2a)) − E[ log_{10}(1 − e^{−2aI1}) ] ) dB.   (56)
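For Poisson sampling with E[I1] = 1, one has ϕ_τ(a) = 1/(1 + 2a), so (55) and (56) can be evaluated directly; the sketch below recovers the figures quoted next (0.91 dB and 2.03 dB at a = 1). The quadrature scheme for E[log10(1 − e^{−2aI1})] is our own choice:

```python
import math

def phi_tau_poisson(a):
    """phi_tau(a) = E[exp(-2 a I)] for I ~ Exp(1), i.e. Poisson sampling with E[I] = 1."""
    return 1.0 / (1.0 + 2.0 * a)

def low_snr_loss_db(a):
    """Loss L of (55): extra SNR needed by regular sampling at low SNR."""
    phi = phi_tau_poisson(a)
    e2a = math.exp(-2.0 * a)
    return 5.0 * math.log10((1.0 + phi) * (1.0 - e2a) / ((1.0 - phi) * (1.0 + e2a)))

def high_snr_gain_db(a, n_grid=200_000):
    """Gain G of (56) at high SNR. E[log10(1 - e^{-2aI})] for I ~ Exp(1) equals the
    integral of log10(1 - u^{2a}) over u in (0,1) (substitute u = e^{-t}); it is
    computed here by midpoint quadrature."""
    acc = sum(math.log10(1.0 - ((k + 0.5) / n_grid) ** (2.0 * a)) for k in range(n_grid))
    return 10.0 * (math.log10(1.0 - math.exp(-2.0 * a)) - acc / n_grid)
```

For a = 1 this yields L ≈ 0.91 dB and G ≈ 2.03 dB, matching the values discussed below.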

These results are illustrated in Figure 5 for the low SNR regime, and in Figure 6 for the high SNR regime, where ξH0:Noise is plotted as a function of the SNR q for a = 1. In these figures, the curves termed “Asymp. Poisson” and “Asymp. Regular” represent the asymptotic values of ξH0:Noise provided by Equations (53) for Fig. 5 and (54) for Fig. 6. At low SNR, the asymptotic loss incurred by the regular sampling with respect to a Poisson sampling as predicted by (55) is L = 0.91 dB, and this is confirmed numerically in Figure 5. At high SNR, (56) predicts for the regular sampling the asymptotic gain G = 2.03 dB with respect to the Poisson sampling. This is confirmed numerically by Figure 6.


Fig. 5. ξH0:Noise (log scale) vs SNR in dB in the low SNR regime, a = 1 (asymptotic and true values, Poisson and regular sampling).

APPENDIX

A. Technical lemmas

In this section we provide some useful technical lemmas.

Lemma 3 Assume that A is positive stable. Then there exist constants a > 0 and K > 0 such that ‖e^{−xA}‖ ≤ K exp(−ax) for x ≥ 0.

Proof: Let a > 0 be smaller than the real parts of all the eigenvalues of A, and let C be a rectangle in the complex half plane {z : ℜ(z) ≥ a} whose interior contains all these eigenvalues (here ℜ(z) is the real part of z). Applying Theorem 6.2.28 of [20], we have

e^{−xA} = (1/(2πi)) ∮_C e^{−xλ} (λI − A)^{−1} dλ.

Hence, we have

‖e^{−xA}‖ ≤ e^{−ax} (1/(2π)) ∮_C ‖(λI − A)^{−1}‖ |dλ|.

By continuity of λ ↦ (λI − A)^{−1} on C, the previous integral is finite, which gives the result.

Lemma 4 Assume that the matrix A is positive stable and that the pair (A, B) is controllable. Then the matrix function Q(x) defined by (6) is strictly increasing in the positive semidefinite ordering from 0 to Q(∞) as x increases from 0 to ∞.


Fig. 6. ξH0:Noise vs SNR in dB in the high SNR regime, a = 1 (asymptotic and true values, Poisson and regular sampling).

Proof: Since (A, B) is controllable, Q(x) > 0 for any x > 0. Assume that x < y. We have

Q(y) − Q(x) = ∫_x^y exp(−uA) B B^T exp(−uA^T) du = exp(−xA) Q(y − x) exp(−xA^T) > 0,

which proves the lemma.

B. A stability result on Markov chains

Here we present our Swiss-army-knife result on Markov chains. We follow the approach in [29] for obtaining the geometric ergodicity of Markov chains using simple moment conditions, although we use a more direct proof inspired by [30]. For the a.s. convergence of the empirical mean, we will rely on the following standard result for martingales [31].

Lemma 5 Let (M_n)_{n≥0} be a martingale sequence and X_n = M_n − M_{n−1} be its increments. If there exists p ∈ [1, 2] such that Σ_{k≥1} k^{−p} E[|X_k|^p] < ∞, then M_n/n → 0 a.s. as n → ∞.

We adopt the following setting for our generic Markov chain. Let (η_k, k ≥ 1) be an i.i.d. sequence of random variables valued in E and let X be a closed subset of R^d. We further denote by η a random variable having the same distribution as the η_k's and independent of them. Let F_y(x) be defined for all y ∈ E and x ∈ X with values in X and such that (x, y) ↦ F_y(x) is a measurable X × E → X function.


This allows us to define a Markov chain (Z_k^x, k ≥ 0) by

Z_0^x = x,   Z_k^x = F_{η_k}(Z_{k−1}^x),   k ≥ 1.   (57)

This Markov chain is valued in X and starts at time 0 with the value x. We denote by P the corresponding kernel, defined on any bounded continuous function f : X → R by

P f(x) = E[f(Z_1^x)] = E[f ∘ F_{η_1}(x)],   x ∈ X,

where ∘ denotes the composition operator. Observe that (57) implies, for all n ≥ 1,

Z_n^x = F_{η_n} ∘ ⋯ ∘ F_{η_1}(x).

Recalling that |·| is the Euclidean norm on R^d, we denote, for any p ≥ 1 and f : X → R,

‖f‖_{Lip_p} = sup_{x,x′ ∈ X} |f(x) − f(x′)| / ( |x − x′| (1 + |x|^{p−1} + |x′|^{p−1}) ),

which is the Lipschitz norm for p = 1. We now state the main result of this appendix.

Theorem 4 Define (Z_k^x, k ≥ 0) as in (57). Assume that F_η is a.s. continuous and that, for some C > 0, α ∈ (0, 1), p, r ≥ 0, q ≥ 1 and s ≥ p,

(i) For all (x, x′) ∈ X² and n ≥ 1,

E[ |F_{η_n} ∘ ⋯ ∘ F_{η_1}(x) − F_{η_n} ∘ ⋯ ∘ F_{η_1}(x′)|^q ] ≤ C α^n (1 + |x|^{pq} + |x′|^{pq}).

(ii) For all x ∈ X and n ≥ 1,

E[ |F_{η_n} ∘ ⋯ ∘ F_{η_1}(x)|^s ] ≤ C (1 + |x|^{rs}).   (58)

Then the following conclusions hold.

(a) There exists a unique probability measure µ on X such that

ξ ∼ µ and ξ independent of η ⇒ F_η(ξ) ∼ µ.   (59)

Moreover, such a measure µ has a finite s-th moment.

(b) Let a ∈ [1, s ∧ {1 + s(q − 1)/q}] and f : X → R be such that ‖f‖_{Lip_a} < ∞. Suppose in addition that s > b = p + r(a − 1). Then, for all x ∈ X,

(1/n) Σ_{k=1}^n f(Z_k^x) → µ(f) = ∫ f dµ   a.s. as n → ∞.

(c) Let (U_n)_{n≥1} be a sequence of i.i.d. real-valued random variables such that E[|U_1|^{1+ε}] < ∞ for some ε > 0 and, for all n ≥ 1, U_n is independent of η_1, …, η_n. Then, under the same assumptions as in (b), if moreover s > a, then, for all x ∈ X,

(1/n) Σ_{k=1}^n U_k f(Z_k^x) → m µ(f)   a.s. as n → ∞,

where m = E[U_1].

where m = E[U1 ].

Proof: Let us introduce the backward recurrence process starting at x defined by Y0 = x and Yn = Fη1 ◦ · · · ◦ Fηn (x),

n≥1.

d

Note that for any n, Yn = Znx , that is, the processes (Yn ) and (Znx ) has the same marginal distributions. Moreover, using (i) and the Jensen Inequality, we have   X X |Yn+1 − Yn | ≤ C 1/q αn/q (1 + |x|p + E [|Fηn (x)|p ]) . E n≥0

By (ii), since s ≥ p, E [|Fηn

n≥0

(x)|p ]

< ∞ and thus

P

n≥0 |Yn+1

− Yn | < ∞ a.s. By completeness of the

state space X , Yn converges in X a.s. We denote the limit by ξ and its probability distribution by µ. By a.s. continuity of Fη , we have Fη (Yn )→Fη (ξ) a.s. On the other hand Fη (Yn ) ∼ Yn+1→ξ a.s. Hence µ satisfies (59), that is, µ is an invariant distribution of the induced Markov chain. Moreover by (ii), we have supn E[|Yn |s ] < ∞ which by Fatou’s Lemma implies that E[|ξ|s ] < ∞. Let us show that µ is the a.s.

unique invariant distribution. By (i), for any x, y ∈ X , Znx − Zny −−−→ 0. Now draw x and y according n→∞

to two invariant distributions, respectively, so that (Znx )n≥0 and (Zny )n≥0 are two sequences with constant marginal distributions. Then necessarily these two distributions are the same and thus µ is the unique invariant distribution, which achieves the proof of (a). We now prove (b). First observe that f is continuous and f (x) = O(|x|a ) as |x| → ∞. Hence by (a),
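The backward-iteration device used in this proof is the classical iterated-random-functions argument of [30]. It can be illustrated on a toy chain of random affine maps F_{(a,b)}(x) = ax + b with contraction factors a ∈ [0, 1/2) (the maps and constants here are ours, not the chain of the paper): the backward compositions form an a.s. convergent (Cauchy) sequence, whereas the forward compositions only converge in distribution.

```python
import random

def compose(etas, x0, backward):
    """Apply the affine maps F_(a,b)(x) = a*x + b in forward order
    (F_{eta_n} o ... o F_{eta_1}) or backward order (F_{eta_1} o ... o F_{eta_n})."""
    x = x0
    for a, b in (reversed(etas) if backward else etas):
        x = a * x + b
    return x

def draw_etas(n, seed=1):
    """Draw i.i.d. pairs eta_k = (a_k, b_k) with a_k in [0, 1/2) and b_k Gaussian."""
    rng = random.Random(seed)
    return [(0.5 * rng.random(), rng.gauss(0.0, 1.0)) for _ in range(n)]

etas = draw_etas(60)
y50 = compose(etas[:50], 0.0, backward=True)   # Y_50 = F_{eta_1} o ... o F_{eta_50}(0)
y60 = compose(etas, 0.0, backward=True)        # Y_60: same outer 50 maps, 10 more inside
z50 = compose(etas[:50], 0.0, backward=False)  # forward iterate Z_50 (same law as Y_50)
# |y60 - y50| <= (prod of the first 50 contraction factors) * O(1) <= 2^{-50} * O(1),
# so the backward iterates are numerically indistinguishable once n is moderate.
```

Appending extra maps on the inside barely moves the backward iterate, which is exactly why (Y_n) is a.s. Cauchy while (Z_n^x) keeps fluctuating.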

since a ≤ s, f is integrable with respect to µ. Also, by (ii), E[|f (Zkx )|] < ∞ for all k ≥ 1 and x ∈ X . We use the classical Poisson equation for decomposing the empirical mean of the Markov chain as the empirical mean of martingale increments plus a negligible remainder. Using that kf kLipa < ∞, the H¨older inequality and (i), we have, for any x, y ∈ X , X X E |f (Zkx ) − f (Zky )| ≤ ) (1 + |x|p + |y|p ) , + kZky ka−1 C 1/q αk/q (1 + kZkx ka−1 q ′ (a−1) q ′ (a−1) k≥1

k≥1

where we used the notation k · kp = (E[| · |p ])1/p and q ′ = q/(q − 1). Since q ′ (a − 1) ≤ s, we can apply the Jensen Inequality and (ii) to bound kZkx kq′ (a−1) and kZky kq′ (a−1) . We obtain, for some constant c > 0 X E |f (Zkx ) − f (Zky )| ≤ c (1 + |x|b + |y|b ) . k≥1


with b = p + r(a − 1). Since we assumed s > b, using (a), the right-hand side of the previous display is integrable in y with respect to µ and we get

Σ_{k≥1} |E[f(Z_k^x)] − µ(f)| ≤ Σ_{k≥1} ∫ E|f(Z_k^x) − f(Z_k^y)| µ(dy) ≤ c′ (1 + |x|^b).

Hence we may define the real-valued function

f̂(x) = Σ_{k≥1} { E[f(Z_k^x)] − µ(f) },

which is the solution of the Poisson equation f(x) − µ(f) = f̂(x) − P f̂(x) and satisfies

sup_{x∈X} (1 + |x|^b)^{−1} |f̂(x)| < ∞.   (60)

Hence the decomposition

(1/n) Σ_{k=1}^n { f(Z_k^x) − µ(f) } = (1/n) Σ_{k=1}^n { f̂(Z_k^x) − P f̂(Z_k^x) } = (1/n) Σ_{k=1}^n X_k + (1/n) { P f̂(x) − P f̂(Z_n^x) },

where X_k = f̂(Z_k^x) − P f̂(Z_{k−1}^x), k ≥ 1. Observe that (X_k)_{k≥1} is a sequence of martingale increments. By the Jensen inequality, we have E[|P f̂(Z_n^x)|^{s/b}] ≤ E[|f̂(Z_{n+1}^x)|^{s/b}] and, by (60) and (ii), sup_{n≥1} E[|f̂(Z_{n+1}^x)|^{s/b}] < ∞. Since s/b > 1, by the Markov inequality and the Borel-Cantelli lemma, this implies that P f̂(Z_n^x)/n → 0 a.s. We also get that sup_{k≥1} E[|X_k|^{s/b}] < ∞ and, by Lemma 5, Σ_{k=1}^n X_k/n → 0 a.s. This proves (b).

We conclude with the proof of (c). Using (b), we may replace U_k by U_k − m, that is, we assume m = 0 without loss of generality. Then (U_k f(Z_k^x))_{k≥1} is a sequence of martingale increments. Let u = (1 + ε) ∧ (s/a) > 1. We have sup_{k≥1} E[|U_k f(Z_k^x)|^u] = E[|U_1|^u] sup_{k≥1} E[|f(Z_k^x)|^u] < ∞ by (ii), and the result follows from Lemma 5.

p of covariance matrices Z˜kp = F˜1 (Z˜k−1 ) where F˜1 is the second component of (15) with I = 1 is a

p q deterministic sequence. From Lemma 1 and Proposition 1–Eq. (27), one get that kZ˜K − Z˜K k ≤ αkp− qk

for α ∈ (0, 1) and K large enough. Hence, by the fixed point theorem, Z˜kp converges to a limit PR defined

as the unique solution in [0, Q(∞)] of the equation P = F˜1 (P ) which is the discrete algebraic Riccati equation (43). This amounts to say that the invariant distribution µ defined in Theorem 2 coincides with δPR . It remains to show that Equation (43) has no solutions outside [0, Q(∞)]. Indeed, assume that p is


a solution of (43). Consider the state equations (11) where it is assumed that X(0) ∼ N(0, p). By the very nature of the Kalman filter, the covariance matrix P_k satisfies

P_k ≤ E[X_k X_k^T] = e^{−T_k A} p e^{−T_k A^T} + Q(T_k) < e^{−T_k A} p e^{−T_k A^T} + Q(∞).

As P_k = p for any k, we have p ≤ Q(∞) by taking the limit as k → ∞.

We now consider the invariant distribution ν defined in Theorem 3. This distribution writes ν = ν_X ⊗ δ_{PR}, and we shall show that ν_X = N(0, Σ), where Σ is the unique solution of Equation (44). To that end, we begin by showing that the steady state Kalman filter transition matrix Θ = Φ(1_q − GC) with G = PR C^T (C PR C^T + 1_d)^{−1} has all its eigenvalues {λ_i} in the open unit disk. Indeed, taking the limit in (20), we get PR = Θ PR Θ^T + Φ G G^T Φ^T + Q. Letting t_i be an eigenvector of Θ^T with eigenvalue λ_i, we obtain from this last equation that (1 − |λ_i|²) t_i^* PR t_i = t_i^* Φ G G^T Φ^T t_i + t_i^* Q t_i > 0 due to Q = Q(1) > 0, hence |λ_i| < 1. Consequently, the matrix equation (44) has a unique solution Σ = Σ_{n=0}^∞ Θ^n Φ G G^T Φ^T (Θ^T)^n [28, Chap. 4.2]. When Z_k = (Z̄_k, Z̃_k) ∈ R^q × [0, Q(∞)] follows the distribution ν, we have (see (15)) Z̄_k = Θ Z̄_{k−1} + Φ G Y_k. Recall that Y_k ∼ N(0, 1_d) and is independent of Z̄_{k−1}. Under these conditions, it is clear that Z̄_k ∼ N(0, Σ) when Z̄_{k−1} ∼ N(0, Σ). Therefore, ν = N(0, Σ) ⊗ δ_{PR} is invariant, and it is the unique invariant distribution. Replacing ν and µ with their values in the right hand sides of (17) and (16), we obtain (42) and (41), respectively. Proposition 2 is proven.

2) Proof of Proposition 3: We assume that the holding times In are equal in distribution to I^s (distributed as τ_s) to point out the dependence on s. We also denote by µ_s the invariant distribution of the Markov chain (Z̃_k) defined in Section III. We begin by proving that µ_s converges weakly to δ_{Q(∞)} as s → ∞ (we will use the notation µ_s ⇒ δ_{Q(∞)}). By Lemma 3, we have E[‖exp(−I^s A)‖²] ≤ K E[exp(−2aI^s)] = K ∫ exp(−2ax) τ_s(dx) with a > 0. Given K′ > 0, we have ∫ exp(−2ax) τ_s(dx) = ∫_0^{K′} exp(−2ax) τ_s(dx) + ∫_{K′}^∞ exp(−2ax) τ_s(dx) ≤ τ_s([0, K′]) + exp(−2aK′). Since τ_s escapes to infinity,

E[‖e^{−I^s A}‖²] → 0 as s → ∞, which implies that e^{−I^s A} → 0 in probability as s → ∞. Moreover, we have

‖Q(I^s) − Q(∞)‖ = ‖ ∫_{I^s}^∞ exp(−uA) B B^T exp(−uA^T) du ‖ ≤ ‖B‖² ∫_{I^s}^∞ ‖exp(−uA)‖² du ≤ K ∫_{I^s}^∞ exp(−2ua) du = (K/2a) exp(−2aI^s),

hence Q(I^s) → Q(∞) in probability as s → ∞. Now, assume that the random variable Z̃ ∈ [0, Q(∞)] is distributed as µ_s. Recalling that F̃ is the random iteration function defined as the second component of Equation (15), we have ‖F̃_{I^s}(Z̃) − Q(∞)‖ ≤ K ‖e^{−I^s A}‖² + ‖Q(I^s) − Q(∞)‖, hence F̃_{I^s}(Z̃) → Q(∞) in probability as s → ∞. As F̃_{I^s}(Z̃) ∼ µ_s, µ_s ⇒ δ_{Q(∞)}. Due to the continuity of log det on the compact set [0, Q(∞)], we have, as s → ∞, ∫ log det(C p C^T + 1_d) dµ_s(p) → log det(C Q(∞) C^T + 1_d), and (46) results from (16).

Now assume that Z = (Z̄, Z̃) ∈ R^q × [0, Q(∞)] follows the invariant distribution ν, and let (Z̄_1, Z̃_1) = F_{(I^s, V)}(Z), where F_{(I^s, V)} is defined by Equation (15). In particular, we have Z̄_1 = Θ(I^s, Z̃) Z̄ + e^{−I^s A} G(Z̃) V. As E[‖e^{−I^s A}‖²] → 0 and Z̃ ≤ Q(∞), we have E[‖Θ(I^s, Z̃)‖²] = E[‖e^{−I^s A}(I − G(Z̃)C)‖²] → 0 as s → ∞ and E[|e^{−I^s A} G(Z̃) V|²] → 0, hence E[|Z̄_1|²] → 0. The third term in the RHS of the expression (17) of ξH0:Noise satisfies

∫ x^T C^T (C p C^T + 1_d)^{−1} C x dν(x, p) ≤ ‖C‖² ∫ |x|² dν(x, p) = ‖C‖² E[|Z̄_1|²] → 0

as s → ∞. As µ_s ⇒ δ_{Q(∞)}, the second term in the RHS of (17) converges to −tr[C Q(∞) C^T (C Q(∞) C^T + 1_d)^{−1}], which completes the proof of Proposition 3.

3) Proof of Proposition 4: We begin with (i). In the scalar case, the covariance update equation (13) writes

P_{k+1} = F̃^a_{I_{k+1}}(P_k) = e^{−2a I_{k+1}} ( P_k/(P_k + 1) − Q(∞) ) + Q(∞).   (61)

Given a sequence of holding times (I_k)_{k≥1} and two positive numbers a_1 ≥ a_2, consider the two Markov chains Z̃^p_{a_i,k} = F̃^{a_i}_{I_k}(Z̃^p_{a_i,k−1}) for i = 1, 2, both starting at the same value p = Q(∞). Let f(p) = p/(p + 1) − Q(∞). As f(Q(∞)) < 0 and 0 < exp(−2a_1 I_1) ≤ exp(−2a_2 I_1), it is clear that Z̃^p_{a_1,1} ≥ Z̃^p_{a_2,1}. Assume that Z̃^p_{a_1,k−1} ≥ Z̃^p_{a_2,k−1}. As f(p) is negative and increasing for p ∈ [0, Q(∞)] and 0 < exp(−2a_1 I_k) ≤ exp(−2a_2 I_k), we have Z̃^p_{a_1,k} = exp(−2a_1 I_k) f(Z̃^p_{a_1,k−1}) + Q(∞) ≥ exp(−2a_2 I_k) f(Z̃^p_{a_2,k−1}) + Q(∞) = Z̃^p_{a_2,k}.

From Theorem 2, the chains Z̃^p_{a_1,k} and Z̃^p_{a_2,k} have unique invariant distributions µ_1 and µ_2 respectively, and, by repeating the arguments of the proof of Theorem 2,

(1/N) Σ_{k=0}^{N−1} log(1 + Z̃^p_{a_i,k}) → ∫ log(1 + p) dµ_i(p) for i = 1, 2, a.s. as N → ∞.

As Z̃^p_{a_1,k} ≥ Z̃^p_{a_2,k} for all k, by passing to the limit we have ∫ log(1 + p) dµ_1(p) ≥ ∫ log(1 + p) dµ_2(p). As ξH0:Signal = (1/2)( Q(∞) − ∫ log(1 + p) dµ(p) ) in the scalar case (see Expression (16)), this error exponent decreases with a.

We now show that lim_{a→0} ξH0:Signal = Q(∞)/2. Assume that Z̃ ∈ [0, Q(∞)] has the invariant distribution, which we denote µ_a. From Eq. (61), we have E[Z̃] = E[F̃^a_I(Z̃)] = E[e^{−2aI}] E[ Z̃/(Z̃ + 1) − Q(∞) ] + Q(∞), which yields

E[ ( Z̃² + (1 − E[e^{−2aI}]) Z̃ ) / (Z̃ + 1) ] = Q(∞)(1 − E[e^{−2aI}]).

As Z̃ ≤ Q(∞), we have E[ Z̃²/(Q(∞) + 1) ] ≤ E[ Z̃²/(Z̃ + 1) ] ≤ Q(∞)(1 − E[e^{−2aI}]). By the dominated convergence theorem, E[exp(−2aI)] → 1 as a → 0, therefore E[Z̃²] → 0 as a → 0. It follows that µ_a converges weakly to δ_0 as a → 0, therefore ∫ log(1 + p) dµ_a(p) → 0. Hence lim_{a→0} ξH0:Signal = Q(∞)/2. The limit as a → ∞ is obtained by applying Proposition 3.

To show (ii), the argument is similar to the one used above to show that ξH0:Signal decreases as a increases. We now prove (iii). Consider the Markov chain Z̃_k = F̃^a_{I_k}(Z̃_{k−1}), where F̃^a_{I_k} is given by (61). Assuming that Z̃_k has the invariant distribution µ, we have

E[Z̃_k] = E[e^{−2aI_k}] ( E[ Z̃_k/(Z̃_k + 1) ] − Q(∞) ) + Q(∞)
       ≤ e^{−2a} ( E[ Z̃_k/(Z̃_k + 1) ] − Q(∞) ) + Q(∞)
       ≤ e^{−2a} E[Z̃_k]/(E[Z̃_k] + 1) + Q(∞)(1 − e^{−2a}) = h(E[Z̃_k]),

where the first inequality is due to the convexity of e^{−2ax} (so that E[e^{−2aI}] ≥ e^{−2a} when E[I] = 1) in conjunction with Z̃_k ≤ Q(∞) with probability one, and the second inequality is due to the concavity of x/(1 + x). If we choose τ = δ_1, then the corresponding invariant distribution is δ_{PR}, where PR is the unique solution of the equation h(p) = p (see Proposition 2). As h(p) − p is decreasing, ∫ p dµ(p) = E[Z̃_k] ≤ PR. As log is an increasing concave function, the error exponent satisfies

ξH0:Signal = (1/2) ( Q(∞) − ∫ log(p + 1) dµ(p) ) ≥ (1/2) ( Q(∞) − log( ∫ p dµ(p) + 1 ) ) ≥ (1/2) ( Q(∞) − log(PR + 1) ),

which completes the proof of Proposition 4.

D. Heuristic calculations for the small versus large SNR discussion

In the scalar case, the Kalman recursions (12)-(13) write

X̂_{n+1} = e^{−aI_{n+1}} X̂_n/(P_n + 1) + e^{−aI_{n+1}} P_n Y_n/(P_n + 1),   (62)

P_{n+1} = e^{−2aI_{n+1}} P_n/(P_n + 1) + q(1 − e^{−2aI_{n+1}}).   (63)


We assume that the vector (X̂_n, P_n) has the invariant distribution ν. In order to obtain (53) and (54), we study the asymptotic behavior, for small and large values of q, of

ξH0:Noise = (1/2) E[ log(P_n + 1) + (X̂_n² − P_n)/(P_n + 1) ].   (64)

We start with q → 0. Let us expand the RHS of (63) to the order O(q). Recalling that P_n ≤ q and taking expectations, we obtain E[P_n] = E[P_{n+1}] = ϕ_τ(a) E[P_n] + q(1 − ϕ_τ(a)) + o(q). Hence E[P_n] = q + o(q). As P_n ≤ q, we also have P_n = q + o(q). Inserting this back into (63), we obtain P_n = q + q² B_n + o(q²) with

q + q² B_{n+1} = e^{−2aI_{n+1}} (q + q² B_n)(1 − q) + q(1 − e^{−2aI_{n+1}}) + o(q²).

By identifying the coefficients of q² on the two sides, we get B_{n+1} = e^{−2aI_{n+1}}(B_n − 1). Taking expectations and recalling that we are under an invariant distribution, we obtain

E[P_n] = q + q² E[B_n] + o(q²) = q − q² ϕ_τ(a)/(1 − ϕ_τ(a)) + o(q²).

Turning to Equation (62) and expanding as above, we have

E[X̂²_{n+1}] = E[X̂²_n] = ϕ_τ(a) E[ (X̂_n/(P_n + 1))² ] + ϕ_τ(a) E[ (P_n/(P_n + 1))² ] = ϕ_τ(a) E[X̂²_n] + ϕ_τ(a) q² + o(q²),

hence

E[ X̂²_n/(P_n + 1) ] = E[X̂²_n] + o(q²) = q² ϕ_τ(a)/(1 − ϕ_τ(a)) + o(q²).

Similarly, we have

E[ P_n/(P_n + 1) ] = E[ (q + q² B_n)(1 − q) ] + o(q²) = q − q²/(1 − ϕ_τ(a)) + o(q²).

Plugging these expressions into (64) and recalling that log(1 + x) = x − x²/2 + o(x²), we obtain (53).

Next we consider q → ∞. It is easily seen from (63) that

(P_{n+1} + 1)/q ≥ (1 − e^{−2aI_{n+1}}) (q + 1)/q ↓ 1 − e^{−2aI_{n+1}} as q → ∞;

therefore, by the monotone convergence theorem, E[log(P_n + 1)] − log q → E[log(1 − e^{−2aI_1})]. Moreover, we readily have E[(P_n + 1)/P_n] → 1. Using (62), as P_n ∼ q(1 − e^{−2aI_n}), we have E[X̂_n²] → ϕ_τ(a) and E[X̂_n²/(P_n + 1)] → 0. Replacing into (64), we obtain (54).
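The second-order conclusion of this expansion, E[P_n] = q − q² ϕ_τ(a)/(1 − ϕ_τ(a)) + o(q²), can be checked by simulating the covariance recursion (63). A minimal sketch with Poisson sampling and illustrative values a = 1, q = 0.01 (for which ϕ_τ(a) = 1/3):

```python
import math
import random

def mean_P_poisson(a, q, n_steps=200_000, burn=1_000, seed=2):
    """Average the stationary covariance P_n of recursion (63) under Poisson
    sampling (holding times Exp(1)), discarding a burn-in period."""
    rng = random.Random(seed)
    P, acc = q, 0.0
    for k in range(n_steps + burn):
        phi2 = math.exp(-2.0 * a * rng.expovariate(1.0))
        P = phi2 * (P / (P + 1.0) - q) + q  # Eq. (61), equivalent to (63)
        if k >= burn:
            acc += P
    return acc / n_steps

a, q = 1.0, 0.01
phi = 1.0 / (1.0 + 2.0 * a)                 # phi_tau(a) = E[e^{-2aI}] for I ~ Exp(1)
prediction = q - q * q * phi / (1.0 - phi)  # second-order expansion of E[P_n]
mp = mean_P_poisson(a, q)
```

For small q the empirical mean of P_n agrees with the prediction to well within a fraction of the q² correction term.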


ACKNOWLEDGMENTS

The authors thank the associate editor and the reviewers for their remarks, which substantially improved the paper.

REFERENCES

[1] T. Kailath and H.V. Poor, "Detection of stochastic processes," IEEE Trans. on Information Theory, vol. 44, no. 6, pp. 2230–2259, Oct. 1998.
[2] S. Misra and L. Tong, "Error exponents for the detection of Gauss-Markov signals using randomly spaced sensors," IEEE Trans. on Signal Processing, vol. 56, no. 8, pp. 3385–3396, Aug. 2008.
[3] B. Øksendal, Stochastic Differential Equations: An Introduction With Applications, Springer Verlag, 6th edition, 2003.
[4] E. Allen, Modeling with Itô Stochastic Differential Equations, vol. 22 of Mathematical Modelling: Theory and Applications, Springer, 2007.
[5] M. Micheli, "Random Sampling of a Continuous-Time Stochastic Dynamical System: Analysis, State Estimation, and Applications," M.S. thesis, UC Berkeley, 2001.
[6] M. Micheli and M.I. Jordan, "Random sampling of a continuous-time stochastic dynamical system," in Proc. of the International Symposium on the Mathematical Theory of Networks and Systems, Univ. of Notre Dame, South Bend, Indiana, USA, Aug. 2002.
[7] Y. Sung, L. Tong, and H.V. Poor, "Neyman-Pearson detection of Gauss-Markov signals in noise: Closed-form error exponents and properties," IEEE Trans. on Information Theory, vol. 52, no. 4, pp. 1354–1365, Apr. 2006.
[8] J.-F. Chamberland and V.V. Veeravalli, "Decentralized detection in sensor networks," IEEE Trans. on Signal Processing, vol. 51, no. 2, pp. 407–416, Feb. 2003.
[9] J.-F. Chamberland and V.V. Veeravalli, "How dense should a sensor network be for detection with correlated observations?," IEEE Trans. on Information Theory, vol. 52, no. 11, pp. 5099–5106, Nov. 2006.
[10] S.K. Jayaweera, "Bayesian fusion performance and system optimization for distributed stochastic Gaussian signal detection under communication constraints," IEEE Trans. on Signal Processing, vol. 55, no. 4, pp. 1238–1250, Apr. 2007.
[11] Y. Sung, S. Misra, L. Tong, and A. Ephremides, "Signal processing for application-specific ad hoc networks," IEEE Signal Processing Magazine, vol. 23, no. 5, pp. 74–83, Sept. 2006.
[12] Y. Sung, S. Misra, L. Tong, and A. Ephremides, "Cooperative routing for distributed detection in large sensor networks," IEEE Journal on Selected Areas in Communications, vol. 25, no. 2, pp. 471–483, Feb. 2007.
[13] Po-Ning Chen, "General formulas for the Neyman-Pearson type-II error exponent subject to fixed and exponential type-I error bounds," IEEE Trans. on Information Theory, vol. 42, no. 1, pp. 316–323, Jan. 1996.
[14] H. Chernoff, "A measure for asymptotic efficiency for tests of a hypothesis based on a sum of observations," Ann. Math. Statist., vol. 23, no. 4, pp. 493–507, 1952.
[15] H. Luschgy, A. Rukhin, and I. Vajda, "Adaptive tests for stochastic processes in the ergodic case," Stoch. Process. Appl., vol. 45, no. 1, pp. 45–59, 1993.
[16] A. Anandkumar, L. Tong, and A. Swami, "Detection of Gauss-Markov random field on nearest-neighbor graph," in IEEE International Conference on Acoustics, Speech, and Signal Processing, Hawaii, USA, 2007.
[17] Y. Sung, X. Zhang, L. Tong, and H.V. Poor, "Sensor configuration and activation for field detection in large sensor arrays," IEEE Trans. on Signal Processing, vol. 56, no. 2, pp. 447–463, Feb. 2008.

SUBMITTED TO IEEE TRANS. INFORMATION THEORY

34

[18] R.K. Bahr, “Asymptotic analysis of error probabilities for the nonzero-mean Gaussian hypothesis testing problem,” IEEE Trans. on Information Theory, vol. 36, no. 3, pp. 597–607, May 1990. [19] F. Schweppe, “Evaluation of likelihood functions for Gaussian signals,” IEEE Trans. on Information Theory, vol. 11, no. 1, pp. 61–70, Jan. 1965. [20] R. Horn and C. Johnson, Topics in Matrix Analysis, Cambridge Univ. Press, 1991. [21] P. Brockwell and R. Davis, Time Series: Theory and Methods, Springer Series in Statistics, 1991. [22] H.H. Rosenbrock, State Space and Multivariable Theory, Nelson, 1970. [23] S. Kar, B. Sinopoli, and J. Moura, “Kalman Filtering with Intermittent Observations: Weak Convergence to a Stationary Distribution,” submitted to IEEE Trans. Automatic Control, [online] arXiv:0903.2890v1, Mar. 2009. [24] S. Kar and J. Moura, “Kalman Filtering with Intermittent Observations: Weak Convergence and Moderate Deviations,” [online] arXiv:0910.4686, Oct. 2009. [25] B. Sinopoli, L. Schenato, M. Franceschetti, K. Poolla, M.I. Jordan, and S.S. Sastry, “Kalman filtering with intermittent observations,” IEEE Transactions on Automatic Control, vol. 49, no. 9, pp. 1453–1464, 2004. [26] L. Guo, “Stability of recursive stochastic tracking algorithms,” SIAM J. on Control and Optimization, vol. 32, pp. 1195– 1125, 1994. [27] B.D.O. Anderson and J.B. Moore, “Detectability and stabilizability of time-varying discrete-time linear systems,” SIAM J. Control and Optimization, vol. 19, no. 1, pp. 20–32, 1981. [28] B.D.O. Anderson and J.B. Moore, Optimal Filtering, Prentice-Hall, 1979. [29] A. Benveniste, M. M´etivier, and P. Priouret, Adaptive algorithms and stochastic approximations, vol. 22 of Applications of Mathematics (New York), Springer-Verlag, Berlin, 1990, Translated from the French by Stephen S. Wilson. [30] P. Diaconis and D. Freedman, “Iterated random functions,” SIAM Review, vol. 41, no. 1, pp. 45–76, Mar. 1999. [31] Y.S. 
Chow, “Local convergence of martingales and the law of large numbers,” Ann. Math. Statist., vol. 36, no. 2, pp. 552–558, 1965.
