
Estimation of Toeplitz Covariance Matrices in Large Dimensional Regime with Application to Source Detection

arXiv:1403.1243v1 [cs.IT] 5 Mar 2014

Julia Vinogradova, Romain Couillet, and Walid Hachem

Abstract—In this article, we derive concentration inequalities for the spectral norm of two classical sample estimators of large dimensional Toeplitz covariance matrices, demonstrating in particular their asymptotic almost sure consistency. The consistency is then extended to the case where the aggregated matrix of time samples is corrupted by a rank-one (or, more generally, low-rank) matrix. As an application of the latter, the problem of source detection in the context of large dimensional sensor networks within a temporally correlated noise environment is studied. As opposed to standard procedures, this application is performed online, i.e., without the need for a learning set of pure noise samples.

Index Terms Covariance matrix, concentration inequalities, correlated noise, source detection.

I. INTRODUCTION

Let (v_t)_{t∈Z} be a complex circularly symmetric Gaussian stationary process with zero mean and covariance function (r_k)_{k∈Z}, where r_k = E[v_{t+k} v_t^*] and r_k → 0 as k → ∞. We observe N independent copies of (v_t)_{t∈Z} over the time window t ∈ {0, …, T−1} and stack the observations in a matrix V_T = [v_{n,t}]_{n,t=0}^{N−1,T−1}. This matrix can be written as V_T = W_T R_T^{1/2}, where W_T ∈ C^{N×T} has independent CN(0,1) (standard circularly symmetric complex Gaussian) entries and R_T^{1/2} is any square root of the Hermitian nonnegative definite T × T Toeplitz matrix

    R_T ≜ [r_{i−j}]_{0≤i,j≤T−1} = \begin{bmatrix} r_0 & r_1 & \cdots & r_{T-1} \\ r_{-1} & r_0 & \ddots & \vdots \\ \vdots & \ddots & \ddots & r_1 \\ r_{1-T} & \cdots & r_{-1} & r_0 \end{bmatrix}.

A classical problem in signal processing is to estimate R_T from the observation of V_T. With the growing importance of multi-antenna array processing, there has recently been renewed interest in this estimation problem in the regime of large system dimensions, i.e., for both N and T large.

The first and third authors are with CNRS LTCI, Telecom ParisTech, France (e-mail: [email protected]; [email protected]). The second author is with Supelec, France (e-mail: [email protected]). This work is supported in part by the French Ile-de-France region (DIM LSC fund, Digiteo project DESIR) and in part by the ANR-12-MONU-0003 project DIONISOS.

March 6, 2014

DRAFT


At the core of the various estimation methods for R_T are the biased and unbiased estimates \hat r^b_{k,T} and \hat r^u_{k,T} of r_k, respectively, defined by

    \hat r^b_{k,T} = \frac{1}{NT} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t+k} v_{n,t}^* \mathbf{1}_{0 \le t+k \le T-1}

    \hat r^u_{k,T} = \frac{1}{N(T-|k|)} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t+k} v_{n,t}^* \mathbf{1}_{0 \le t+k \le T-1}

where \mathbf{1}_A is the indicator function of the set A. Depending on the relative rate of growth of N and T, the matrices \hat R^b_T = [\hat r^b_{i-j,T}]_{0≤i,j≤T−1} and \hat R^u_T = [\hat r^u_{i-j,T}]_{0≤i,j≤T−1} may not satisfy ‖R_T − \hat R^b_T‖ → 0 or ‖R_T − \hat R^u_T‖ → 0 almost surely. An important drawback of the biased entry-wise estimate lies in its inducing a general asymptotic bias in \hat R^b_T; as for the unbiased entry-wise estimate, it may induce too much inaccuracy in the top-right and bottom-left entries of \hat R^u_T. The estimation paradigm followed in the recent literature consists instead in building banded or tapered versions of \hat R^b_T or \hat R^u_T (i.e., in weighting down or discarding a certain number of entries away from the diagonal), exploiting the rate of decrease of r_k as k → ∞ [1], [2], [3], [4], [5], [6]. Such estimates use the fact that ‖R_T − R_{γ(T),T}‖ → 0, with R_{γ,T} = [[R_T]_{i,j} \mathbf{1}_{|i−j|≤γ}], for well-chosen functions γ(T) (usually satisfying γ(T) → ∞ and γ(T)/T → 0), and restrict the study to the consistent estimation of R_{γ(T),T}. The aforementioned articles concentrate in particular on choices of γ(T) that ensure optimal rates of convergence of ‖R_T − \hat R_{γ(T),T}‖ for the banded or tapered estimate \hat R_{γ(T),T}.

These procedures, although theoretically optimal, suffer from several practical limitations. First, they assume a priori knowledge of the rate of decrease of r_k (and restrict these rates to specific classes). Second, even if this rate were known in practice, the results, being asymptotic in nature, do not provide explicit rules for selecting γ(T) for practical finite values of N and T. Finally, the operations of banding and tapering do not guarantee the positive definiteness of the resulting covariance estimate.

In the present article, we consider instead that the only constraint on r_k is \sum_{k=-\infty}^{\infty} |r_k| < ∞, and we estimate R_T with the standard (non-banded and non-tapered) estimates \hat R^b_T and \hat R^u_T. The consistency of these estimates, invalid in general, is enforced here by the choice N, T → ∞ with N/T → c ∈ (0, ∞). This setting is more practical in applications, as long as the finite values N and T are both sufficiently large and of similar order of magnitude. Another context where a non-banded Toeplitz rectification of the estimated covariance matrix leads to a consistent estimate in spectral norm is studied in [7]. Our specific contribution lies in the establishment of concentration inequalities for the random variables ‖R_T − \hat R^b_T‖ and ‖R_T − \hat R^u_T‖. It is shown specifically that, for all x > 0, −log P[‖R_T − \hat R^b_T‖ > x] = O(T) and −log P[‖R_T − \hat R^u_T‖ > x] = O(T/log T). Aside from the consistency in norm, this implies as a corollary that, as long as lim sup_T ‖R_T^{−1}‖ < ∞, for T large enough, \hat R^u_T is positive definite with overwhelming probability (\hat R^b_T is nonnegative definite by construction).
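The banding operation R_{γ,T} mentioned above is easy to illustrate. The sketch below is our own Python illustration (not from the paper); it uses an AR(1)-type covariance r_k = a^{|k|} as an arbitrary summable example and checks the triangle-inequality bound ‖R_T − R_{γ,T}‖ ≤ 2 Σ_{k>γ} |r_k| on which banded estimators rely:

```python
import numpy as np

# Illustrative parameters (our own choice): T x T Toeplitz covariance with
# r_k = a^|k|, banded at bandwidth gamma.
T, a, gamma = 200, 0.6, 20

idx = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))  # |i - j|
r = a ** np.arange(T)                  # covariance function r_k = a^|k|
R = r[idx]                             # full Toeplitz covariance R_T
R_banded = R * (idx <= gamma)          # banded version R_{gamma,T}

# For a Hermitian matrix, the spectral norm is bounded by the maximum
# absolute row sum, here 2 * sum_{k > gamma} |r_k|.
err = np.linalg.norm(R - R_banded, 2)
bound = 2 * r[gamma + 1:].sum()
print(err, bound)
```

With γ(T) growing slowly with T, this bound vanishes, which is precisely the property exploited in [1]–[6].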

For application purposes, the results are then extended to the case where V_T is changed into V_T + P_T for a rank-one matrix P_T. Under some conditions on the right-eigenspace of P_T, we show that the concentration inequalities hold identically. The application is that of single-source detection (modeled through P_T) by an array of N sensors embedded in temporally correlated noise (modeled by V_T). To proceed to detection, R_T is estimated from V_T + P_T as \hat R^b_T or \hat R^u_T, which is used as a whitening matrix, before applying a generalized

likelihood ratio test (GLRT) procedure on the whitened observation. Simulations corroborate the theoretical consistency of the test.

The remainder of the article is organized as follows. The concentration inequalities for both the biased and unbiased estimates are exposed in Section II. The generalization to the rank-one perturbation model is presented in Section III and applied to the practical context of source detection in Section IV.

Notations: The superscript (·)^H denotes the Hermitian transpose, ‖X‖ stands for the spectral norm of a matrix and the Euclidean norm of a vector, and ‖·‖_∞ is the sup norm of a function. The notations N(a, σ²) and CN(a, σ²) represent the real and the complex circular Gaussian distributions with mean a and variance σ². For x ∈ C^m, D_x = diag(x) = diag(x_0, …, x_{m−1}) is the diagonal matrix having on its diagonal the elements of the vector x. For x = [x_{−(m−1)}, …, x_{m−1}]^T ∈ C^{2m−1}, the matrix T(x) ∈ C^{m×m} is the Toeplitz matrix built from x, with entries [T(x)]_{i,j} = x_{j−i}. The notations ℜ(·) and ℑ(·) stand for the real and the imaginary parts, respectively.

II. PERFORMANCE OF THE COVARIANCE MATRIX ESTIMATORS

A. Model, assumptions, and results

Let (r_k)_{k∈Z} be a doubly infinite sequence of covariance coefficients. For any T ∈ N, let R_T = T(r_{−(T−1)}, …, r_{T−1}), a Hermitian nonnegative definite matrix. Given N = N(T) > 0, consider the matrix model

    V_T = [v_{n,t}]_{n,t=0}^{N−1,T−1} = W_T R_T^{1/2}    (1)

where W_T = [w_{n,t}]_{n,t=0}^{N−1,T−1} has independent CN(0,1) entries. It is clear that r_k = E[v_{n,t+k} v_{n,t}^*] for any t, k, and n ∈ {0, …, N−1}. In the following, we shall make the two assumptions below.

Assumption 1. The covariance coefficients r_k are absolutely summable and r_0 ≠ 0.

With this assumption, the spectral density

    Υ(λ) ≜ \sum_{k=-\infty}^{\infty} r_k e^{-ıkλ},  λ ∈ [0, 2π)

is continuous on the interval [0, 2π]. Since ‖R_T‖ ≤ ‖Υ‖_∞ (see, e.g., [8, Lemma 4.1]), Assumption 1 implies that sup_T ‖R_T‖ < ∞. We assume the following asymptotic regime, which will simply be denoted "T → ∞":

Assumption 2. T → ∞ and N/T → c > 0.

Our objective is to study the performance of two estimators of the covariance function frequently considered in the literature. These estimators are defined as

    \hat r^b_{k,T} = \frac{1}{NT} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t+k} v_{n,t}^* \mathbf{1}_{0 \le t+k \le T-1}    (2)

    \hat r^u_{k,T} = \frac{1}{N(T-|k|)} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t+k} v_{n,t}^* \mathbf{1}_{0 \le t+k \le T-1}.    (3)
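As a concrete illustration, the estimates (2) and (3) can be computed directly from the data matrix V_T of model (1). The sketch below is our own Python (variable names are ours); it uses the AR(1)-type covariance r_k = a^{|k|} from the paper's simulation section as an example and exhibits the Hermitian symmetry \hat r_{−k} = \hat r_k^* shared by both estimators:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, a = 50, 100, 0.6

# Model (1): V_T = W_T R_T^{1/2}, with R_T Toeplitz built from r_k = a^|k|.
idx = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
R = (a ** idx).astype(complex)
L = np.linalg.cholesky(R)                      # one valid square root of R_T
W = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
V = W @ L.conj().T                             # rows are i.i.d. CN(0, R_T)

def r_hat(V, k, biased=True):
    """Entry-wise estimates (2) (biased) and (3) (unbiased) of r_k."""
    N_, T_ = V.shape
    if k >= 0:
        s = np.sum(V[:, k:] * np.conj(V[:, :T_ - k]))
    else:
        s = np.sum(V[:, :T_ + k] * np.conj(V[:, -k:]))
    return s / (N_ * T_) if biased else s / (N_ * (T_ - abs(k)))

rb = np.array([r_hat(V, k, True) for k in range(-(T - 1), T)])
ru = np.array([r_hat(V, k, False) for k in range(-(T - 1), T)])
print(rb[T - 1], ru[T - 1])   # both estimate r_0 = 1 and coincide at k = 0
```

Note that the two estimators agree at k = 0 and differ by the factor T/(T−|k|) elsewhere, which is the source of the unbiased estimator's inaccuracy at lags close to T−1.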

Since E\hat r^b_{k,T} = (1 − |k|/T) r_k and E\hat r^u_{k,T} = r_k, the estimate \hat r^b_{k,T} is biased while \hat r^u_{k,T} is unbiased. Let also

    \hat R^b_T ≜ T( \hat r^b_{−(T−1),T}, …, \hat r^b_{(T−1),T} )    (4)

    \hat R^u_T ≜ T( \hat r^u_{−(T−1),T}, …, \hat r^u_{(T−1),T} ).    (5)

A well-known advantage of \hat R^b_T over \hat R^u_T as an estimate of R_T is its structural nonnegative definiteness. In this section, results on the spectral behavior of these matrices are provided in the form of concentration inequalities on ‖\hat R^b_T − R_T‖ and ‖\hat R^u_T − R_T‖:

Theorem 1. Let Assumptions 1 and 2 hold true and let \hat R^b_T be defined as in (4). Then, for any x > 0,

    P[ ‖\hat R^b_T − R_T‖ > x ] ≤ exp( −cT( x/‖Υ‖_∞ − log(1 + x/‖Υ‖_∞) + o(1) ) )

where o(1) is with respect to T and depends on x.

Theorem 2. Let Assumptions 1 and 2 hold true and let \hat R^u_T be defined as in (5). Then, for any x > 0,

    P[ ‖\hat R^u_T − R_T‖ > x ] ≤ exp( − \frac{cTx²}{4‖Υ‖²_∞ \log T} (1 + o(1)) )

where o(1) is with respect to T and depends on x.

A consequence of these theorems, obtained by the Borel–Cantelli lemma, is that ‖\hat R^b_T − R_T‖ → 0 and ‖\hat R^u_T − R_T‖ → 0 almost surely as T → ∞.

The slower rate T/log T in the exponent for the unbiased estimator may be attributed to the increased inaccuracy of the estimates of r_k for values of k close to T−1. We now turn to the proofs of Theorems 1 and 2, starting with some basic mathematical facts that will be needed throughout.

B. Some basic mathematical facts

Lemma 1. For x, y ∈ C^m and A ∈ C^{m×m},

    | x^H A x − y^H A y | ≤ ‖A‖ (‖x‖ + ‖y‖) ‖x − y‖.

Proof:

    | x^H A x − y^H A y | = | x^H A x − y^H A x + y^H A x − y^H A y | ≤ | (x−y)^H A x | + | y^H A (x−y) | ≤ ‖A‖ (‖x‖ + ‖y‖) ‖x − y‖.

Lemma 2. Let X_0, …, X_{M−1} be independent CN(0,1) random variables. Then, for any x > 0,

    P[ \frac{1}{M} \sum_{m=0}^{M-1} (|X_m|² − 1) > x ] ≤ exp( −M( x − log(1+x) ) ).

Proof: This is a classical Chernoff bound. Indeed, given ξ ∈ (0,1), we have by Markov's inequality

    P[ M^{-1} \sum_{m=0}^{M-1} (|X_m|² − 1) > x ] = P[ \exp( ξ \sum_{m=0}^{M-1} |X_m|² ) > \exp( ξM(x+1) ) ]
        ≤ \exp(−ξM(x+1)) E[ \exp( ξ \sum_{m=0}^{M-1} |X_m|² ) ]
        = \exp( −M( ξ(x+1) + \log(1−ξ) ) )

since E[exp(ξ|X_m|²)] = 1/(1−ξ). The result follows upon minimizing this expression with respect to ξ.

C. Biased estimator: proof of Theorem 1

Define

    \hat Υ^b_T(λ) ≜ \sum_{k=-(T-1)}^{T-1} \hat r^b_{k,T} e^{ıkλ},    Υ_T(λ) ≜ \sum_{k=-(T-1)}^{T-1} r_k e^{ıkλ}.

Since \hat R^b_T − R_T is a Toeplitz matrix, from [8, Lemma 4.1],

    ‖\hat R^b_T − R_T‖ ≤ \sup_{λ∈[0,2π)} |\hat Υ^b_T(λ) − Υ_T(λ)| ≤ \sup_{λ∈[0,2π)} |\hat Υ^b_T(λ) − E\hat Υ^b_T(λ)| + \sup_{λ∈[0,2π)} |E\hat Υ^b_T(λ) − Υ_T(λ)|.

By Kronecker's lemma ([9, Lemma 3.21]), the rightmost term on the right-hand side satisfies

    |E\hat Υ^b_T(λ) − Υ_T(λ)| ≤ \sum_{k=-(T-1)}^{T-1} \frac{|k r_k|}{T} \xrightarrow[T→∞]{} 0.    (6)

In order to deal with the term sup_{λ∈[0,2π)} |\hat Υ^b_T(λ) − E\hat Υ^b_T(λ)|, two ingredients will be used. The first one is the following lemma (proven in Appendix A1):

Lemma 3. The following facts hold:

    \hat Υ^b_T(λ) = d_T(λ)^H \frac{V_T^H V_T}{N} d_T(λ),    E\hat Υ^b_T(λ) = d_T(λ)^H R_T d_T(λ)

where d_T(λ) = (1/√T) [1, e^{−ıλ}, …, e^{−ı(T−1)λ}]^T.

The second ingredient is a Lipschitz property of the map λ ↦ d_T(λ). From the inequality |e^{−ıtλ} − e^{−ıtλ'}| ≤ t|λ − λ'|, we indeed have

    ‖d_T(λ) − d_T(λ')‖ = \sqrt{ \frac{1}{T} \sum_{t=0}^{T-1} |e^{−ıtλ} − e^{−ıtλ'}|² } ≤ \frac{T|λ − λ'|}{\sqrt{3}}.    (7)

Now, denoting by ⌊·⌋ the floor function and choosing β > 2, define I = {0, …, ⌊T^β⌋ − 1} and let λ_i = 2πi/⌊T^β⌋, i ∈ I, be a regular discretization of the interval [0, 2π].

We write

    \sup_{λ∈[0,2π)} |\hat Υ^b_T(λ) − E\hat Υ^b_T(λ)|
        ≤ \max_{i∈I} \sup_{λ∈[λ_i,λ_{i+1}]} ( |\hat Υ^b_T(λ) − \hat Υ^b_T(λ_i)| + |\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i)| + |E\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ)| )
        ≤ \max_{i∈I} \sup_{λ∈[λ_i,λ_{i+1}]} |\hat Υ^b_T(λ) − \hat Υ^b_T(λ_i)| + \max_{i∈I} |\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i)| + \max_{i∈I} \sup_{λ∈[λ_i,λ_{i+1}]} |E\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ)|
        ≜ χ_1 + χ_2 + χ_3.

With the help of Lemma 3 and (7), we shall provide concentration inequalities on the random terms χ_1 and χ_2 and a bound on the deterministic term χ_3. This is the purpose of the three following lemmas. Here and in the remainder, C denotes a positive constant independent of T; this constant can change from one expression to another.

Lemma 4. There exists a constant C > 0 such that for any x > 0 and any T large enough,

    P[χ_1 > x] ≤ \exp( −cT²( \frac{xT^{β−2}}{C‖Υ‖_∞} − \log \frac{xT^{β−2}}{C‖Υ‖_∞} − 1 ) ).

Proof: Using Lemmas 3 and 1 along with (7), we have

    |\hat Υ^b_T(λ) − \hat Υ^b_T(λ_i)| = | d_T(λ)^H \frac{V_T^H V_T}{N} d_T(λ) − d_T(λ_i)^H \frac{V_T^H V_T}{N} d_T(λ_i) |
        ≤ 2N^{−1} ‖d_T(λ) − d_T(λ_i)‖ ‖R_T‖ ‖W_T^H W_T‖
        ≤ C |λ − λ_i| ‖Υ‖_∞ ‖W_T^H W_T‖.

From ‖W_T^H W_T‖ ≤ Tr(W_T^H W_T) and Lemma 2, assuming T large enough that f(x,T) ≜ xT^{β−1}/(CN‖Υ‖_∞) satisfies f(x,T) ≥ 1, we then obtain

    P[χ_1 > x] ≤ P[ C‖Υ‖_∞ T^{−β} \sum_{t=0}^{T-1} \sum_{n=0}^{N-1} |w_{n,t}|² > x ]
        = P[ \frac{1}{NT} \sum_{n,t} (|w_{n,t}|² − 1) > f(x,T) − 1 ]
        ≤ \exp( −NT( f(x,T) − \log f(x,T) − 1 ) ).

Lemma 5. The following inequality holds:

    P[χ_2 > x] ≤ 2T^β \exp( −cT( \frac{x}{‖Υ‖_∞} − \log( 1 + \frac{x}{‖Υ‖_∞} ) ) ).

Proof: From the union bound we obtain

    P[χ_2 > x] ≤ \sum_{i=0}^{⌊T^β⌋-1} P[ |\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i)| > x ].

We shall bound each term of the sum separately. Since

    P[ |\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i)| > x ] = P[ \hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i) > x ] + P[ −(\hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i)) > x ]

it will be enough to deal with the first right-hand side term, the second one being treated similarly. Let η_T(λ_i) ≜ W_T q_T(λ_i) = [η_{0,T}(λ_i), …, η_{N−1,T}(λ_i)]^T, where q_T(λ_i) ≜ R_T^{1/2} d_T(λ_i). Observe that η_T(λ_i) ∼ CN(0, ‖q_T(λ_i)‖² I_N). We know from Lemma 3 that

    \hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i) = \frac{1}{N}( ‖η_T(λ_i)‖² − E‖η_T(λ_i)‖² ).    (8)

From (8) and Lemma 2, we therefore get

    P[ \hat Υ^b_T(λ_i) − E\hat Υ^b_T(λ_i) > x ] ≤ \exp( −N( \frac{x}{‖q_T(λ_i)‖²} − \log( 1 + \frac{x}{‖q_T(λ_i)‖²} ) ) ).

Noticing that ‖q_T(λ_i)‖² ≤ ‖Υ‖_∞ and that the function f(x) = x − log(1+x) is increasing for x > 0, we get the result.

Finally, the bound on the deterministic term χ_3 is provided by the following lemma.

Lemma 6. χ_3 ≤ C‖Υ‖_∞ T^{−β+1}.

Proof: From Lemmas 3 and 1 along with (7), we obtain

    |E\hat Υ^b_T(λ) − E\hat Υ^b_T(λ_i)| = | d_T(λ)^H R_T d_T(λ) − d_T(λ_i)^H R_T d_T(λ_i) | ≤ 2 ‖R_T‖ ‖d_T(λ) − d_T(λ_i)‖ ≤ C‖Υ‖_∞ |λ − λ_i| T.

From max_{i∈I} sup_{λ∈[λ_i,λ_{i+1}]} |λ − λ_i| = λ_{i+1} − λ_i ≤ CT^{−β}, we get the result.

We now complete the proof of Theorem 1. From (6) and Lemma 6, we get

    P[ ‖\hat R^b_T − R_T‖ > x ] ≤ P[ χ_1 + χ_2 > x + o(1) ].

Given a parameter ε_T ∈ [0,1], we can write (with a slight notation abuse)

    P[ χ_1 + χ_2 > x + o(1) ] ≤ P[χ_1 > xε_T] + P[χ_2 > x(1−ε_T) + o(1)].

With the results of Lemmas 4 and 5, setting ε_T = 1/T, we get

    P[ χ_1 + χ_2 > x + o(1) ] ≤ P[χ_1 > x/T] + P[χ_2 > x(1 − 1/T) + o(1)]
        ≤ \exp( −cT²( \frac{xT^{β−3}}{C‖Υ‖_∞} − \log \frac{xT^{β−3}}{C‖Υ‖_∞} − 1 ) ) + \exp( −cT( \frac{x(1−1/T)}{‖Υ‖_∞} − \log( 1 + \frac{x(1−1/T)}{‖Υ‖_∞} ) + o(1) ) )
        = \exp( −cT( \frac{x}{‖Υ‖_∞} − \log( 1 + \frac{x}{‖Υ‖_∞} ) + o(1) ) )

since β > 2; the first term and the polynomial prefactor 2T^β of Lemma 5 are absorbed into the o(1) term of the exponent.

D. Unbiased estimator: proof of Theorem 2

The proof follows basically the same steps as for Theorem 1, with an additional difficulty due to the scaling factors 1/(T−|k|). Defining the function

    \hat Υ^u_T(λ) ≜ \sum_{k=-(T-1)}^{T-1} \hat r^u_{k,T} e^{ıkλ}

we have

    ‖\hat R^u_T − R_T‖ ≤ \sup_{λ∈[0,2π)} |\hat Υ^u_T(λ) − Υ_T(λ)| = \sup_{λ∈[0,2π)} |\hat Υ^u_T(λ) − E\hat Υ^u_T(λ)|

since Υ_T(λ) = E\hat Υ^u_T(λ), the estimates \hat r^u_{k,T} being unbiased.

In order to deal with the right-hand side of this expression, we need the following analogue of Lemma 3, borrowed from [7] and proven here in Appendix B1.

Lemma 7. The following fact holds:

    \hat Υ^u_T(λ) = d_T(λ)^H ( \frac{V_T^H V_T}{N} ⊙ B_T ) d_T(λ)

where ⊙ is the Hadamard product of matrices and where

    B_T ≜ [ \frac{T}{T − |i−j|} ]_{0≤i,j≤T−1}.

In order to make \hat Υ^u_T(λ) more tractable, we rely on the following lemma, which can be proven by direct calculation.

Lemma 8. Let x, y ∈ C^m and A, B ∈ C^{m×m}. Then x^H (A ⊙ B) y = Tr(D_x^H A D_y B^T), where we recall D_x = diag(x) and D_y = diag(y).

Denoting

    D_T(λ) ≜ diag(d_T(λ)) = \frac{1}{\sqrt{T}} diag(1, e^{ıλ}, …, e^{ı(T−1)λ})

    Q_T(λ) ≜ R_T^{1/2} D_T(λ) B_T D_T(λ)^H (R_T^{1/2})^H

we get from Lemmas 7 and 8

    \hat Υ^u_T(λ) = \frac{1}{N} Tr( D_T(λ)^H (R_T^{1/2})^H W_T^H W_T R_T^{1/2} D_T(λ) B_T )
        = \frac{1}{N} Tr( W_T Q_T(λ) W_T^H )
        = \frac{1}{N} \sum_{n=0}^{N-1} w_n^H Q_T(λ) w_n    (9)

where w_n^H denotes the n-th row of W_T, i.e., W_T = [w_0, …, w_{N−1}]^H.
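The algebraic identity of Lemma 8, on which (9) rests, can be checked directly on random matrices; a minimal sketch (our own code):

```python
import numpy as np

rng = np.random.default_rng(2)
m = 7
x = rng.standard_normal(m) + 1j * rng.standard_normal(m)
y = rng.standard_normal(m) + 1j * rng.standard_normal(m)
A = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
B = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))

# Lemma 8: x^H (A ⊙ B) y = Tr(D_x^H A D_y B^T), with D_x = diag(x).
lhs = x.conj() @ (A * B) @ y                      # Hadamard product is "*"
rhs = np.trace(np.diag(x.conj()) @ A @ np.diag(y) @ B.T)
print(lhs, rhs)
```

Both sides reduce to Σ_{i,j} x_i^* A_{ij} B_{ij} y_j, which is why the Hadamard structure of Lemma 7 turns into the trace form (9).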

Compared to the biased case, the main difficulty lies here in the fact that the matrices B_T/T and Q_T(λ) have unbounded spectral norms as T → ∞. The following lemma, proven in Appendix B2, provides some information on the spectral behavior of these matrices that will be used subsequently.

Lemma 9. The matrix B_T satisfies

    ‖B_T‖ ≤ \sqrt{2} T ( \sqrt{\log T} + C ).    (10)

For any λ ∈ [0, 2π), the eigenvalues σ_0, …, σ_{T−1} of the matrix Q_T(λ) satisfy the following inequalities:

    \sum_{t=0}^{T-1} σ_t² ≤ 2‖Υ‖²_∞ ( \log T + C )    (11)

    \max_t |σ_t| ≤ \sqrt{2} ‖Υ‖_∞ (\log T)^{1/2} + C    (12)

    \sum_{t=0}^{T-1} |σ_t|³ ≤ C( (\log T)^{3/2} + 1 )    (13)

where the constant C is independent of λ. We shall also need the following, easily shown, Lipschitz property of the map λ ↦ D_T(λ):

    ‖D_T(λ) − D_T(λ')‖ ≤ \sqrt{T} |λ − λ'|.    (14)

We now enter the core of the proof of Theorem 2. Choosing β > 2, let λ_i = 2πi/⌊T^β⌋, i ∈ I, be a regular discretization of the interval [0, 2π], with I = {0, …, ⌊T^β⌋ − 1}. We write

    \sup_{λ∈[0,2π)} |\hat Υ^u_T(λ) − E\hat Υ^u_T(λ)|
        ≤ \max_{i∈I} \sup_{λ∈[λ_i,λ_{i+1}]} |\hat Υ^u_T(λ) − \hat Υ^u_T(λ_i)| + \max_{i∈I} |\hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i)| + \max_{i∈I} \sup_{λ∈[λ_i,λ_{i+1}]} |E\hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ)|
        ≜ χ_1 + χ_2 + χ_3.

Our task is now to provide concentration inequalities on the random terms χ_1 and χ_2 and a bound on the deterministic term χ_3.

Lemma 10. There exists a constant C > 0 such that, if T is large enough, the following inequality holds:

    P[χ_1 > x] ≤ \exp( −cT²( \frac{xT^{β−2}}{C\sqrt{\log T}} − \log \frac{xT^{β−2}}{C\sqrt{\log T}} − 1 ) ).

Proof: From Equation (9), we have

    |\hat Υ^u_T(λ) − \hat Υ^u_T(λ_i)| = | \frac{1}{N} \sum_{n=0}^{N-1} w_n^H ( Q_T(λ) − Q_T(λ_i) ) w_n | ≤ \frac{1}{N} \sum_{n=0}^{N-1} | w_n^H ( Q_T(λ) − Q_T(λ_i) ) w_n | ≤ \frac{1}{N} ‖Q_T(λ) − Q_T(λ_i)‖ \sum_{n=0}^{N-1} ‖w_n‖².

The norm above further develops as

    ‖Q_T(λ) − Q_T(λ_i)‖ ≤ ‖R_T‖ ‖ D_T(λ) B_T D_T(λ)^H − D_T(λ_i) B_T D_T(λ)^H + D_T(λ_i) B_T D_T(λ)^H − D_T(λ_i) B_T D_T(λ_i)^H ‖
        ≤ 2 ‖D_T(λ)‖ ‖R_T‖ ‖B_T‖ ‖D_T(λ) − D_T(λ_i)‖
        ≤ CT( \sqrt{\log T} + 1 ) |λ − λ_i|

where we used (10), (14), and ‖D_T(λ)‖ = 1/√T. Up to a change in C, we can finally write ‖Q_T(λ) − Q_T(λ_i)‖ ≤ CT^{1−β} \sqrt{\log T}. Assume that f(x,T) ≜ xT^{β−2}/(C\sqrt{\log T}) satisfies f(x,T) > 1 (always possible for every fixed x by taking T large). Then we get by Lemma 2

    P[χ_1 > x] ≤ P[ CN^{−1} T^{1−β} \sqrt{\log T} \sum_{n,t} |w_{n,t}|² > x ]
        = P[ \frac{1}{NT} \sum_{n,t} (|w_{n,t}|² − 1) > f(x,T) − 1 ]
        ≤ \exp( −NT( f(x,T) − \log f(x,T) − 1 ) ).

The most technical part of the proof is to control the term χ_2, which we handle hereafter.

Lemma 11. The following inequality holds:

    P[χ_2 > x] ≤ \exp( − \frac{cx²T}{4‖Υ‖²_∞ \log T} ) (1 + o(1)).

Proof: From the union bound we obtain

    P[χ_2 > x] ≤ \sum_{i=0}^{⌊T^β⌋-1} P[ |\hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i)| > x ].    (15)

Each term of the sum can be written

    P[ |\hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i)| > x ] = P[ \hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i) > x ] + P[ −( \hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i) ) > x ].

We will deal with the term ψ_i ≜ P[ \hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i) > x ], the term P[ −( \hat Υ^u_T(λ_i) − E\hat Υ^u_T(λ_i) ) > x ] being treated similarly. Let Q_T(λ_i) = U_T Σ_T U_T^H be a spectral factorization of the Hermitian matrix Q_T(λ_i), with Σ_T = diag(σ_0, …, σ_{T−1}). Since U_T is unitary and W_T has independent CN(0,1) entries, we get from Equation (9)

    \hat Υ^u_T(λ_i) = \frac{1}{N} \sum_{n=0}^{N-1} w_n^H Σ_T w_n \overset{L}{=} \frac{1}{N} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} |w_{n,t}|² σ_t    (16)

where =_L denotes equality in law. Since E[e^{a|X|²}] = 1/(1−a) when X ∼ CN(0,1) and 0 < a < 1, we have, by Markov's inequality and the independence of the variables |w_{n,t}|²,

    ψ_i = P[ \frac{1}{N} \sum_{n,t} |w_{n,t}|² σ_t − Tr Q_T(λ_i) > x ]
        ≤ \exp( −τ( x + \sum_{t} σ_t ) ) E[ \exp( \frac{τ}{N} \sum_{n,t} |w_{n,t}|² σ_t ) ]
        = \exp( −τ( x + \sum_{t} σ_t ) ) \prod_{t=0}^{T-1} ( 1 − \frac{σ_t τ}{N} )^{−N}
        = \exp( −τ x − τ \sum_{t} σ_t − N \sum_{t=0}^{T-1} \log( 1 − \frac{σ_t τ}{N} ) )    (17)

for any τ such that 0 ≤ τ < N/max_t σ_t. Optimizing (17) over τ, with the help of the eigenvalue bounds (11)–(13) of Lemma 9, yields the bound of the lemma statement, the polynomial factor ⌊T^β⌋ from (15) being absorbed into the (1 + o(1)) term.

To conclude the proof of Theorem 2, one then checks that the term P[χ_2 > x] dominates the term P[χ_1 > x] and that the term χ_3 is vanishing. Mimicking the end of the proof of Theorem 1, we obtain Theorem 2.

We conclude this section with an empirical evaluation, by Monte Carlo simulations, of P[ ‖\hat R_T − R_T‖ > x ] with \hat R_T ∈ {\hat R^b_T, \hat R^u_T}, T = 2N, and x = 2. This is shown in Figure 1 (curves labeled Biased and Unbiased) against the theoretical exponential bounds of Theorems 1 and 2 (curves labeled Biased theory and Unbiased theory). We observe that the rates obtained in Theorems 1 and 2 are asymptotically close to optimal.
i  h

b T −1 log P R T − RT > x

0

−0.05

−0.1

−0.15

Biased theory Biased Unbiased theory Unbiased

−0.2 10

15

20

25

30

35

40

N Figure 1.

Error probability of the spectral norm for x = 2, c = 0.5, [RT ]k,l = a|k−l| with a = 0.6.

III. C OVARIANCE “S IGNAL

MATRIX ESTIMATORS FOR THE PLUS

N OISE ”

MODEL

A. Model, assumptions, and results Consider now the following model: YT = [yn,t ]0≤n≤N −1 = PT + VT

(19)

0≤t≤T −1

where the N × T matrix VT is defined in (1) and where PT satisfies the following assumption: 1/2

Assumption 3. PT , hT sH T ΓT

where hT ∈ CN is a deterministic vector such that supT khT k < ∞, the

vector sT = (s0 , . . . , sT −1 )T ∈ CT is a random vector independent of WT with the distribution CN (0, IT ),

−1 and ΓT = [γij ]Ti,j=0 is Hermitian nonnegative such that supT kΓT k < ∞.

We have here a model for a rank-one signal corrupted with a Gaussian spatially white and temporally correlated noise with stationary temporal correlations. Observe that the signal can also be temporally correlated. March 6, 2014

DRAFT

13

Our purpose is still to estimate the noise correlation matrix RT . To that end, we use one of the estimators (2) or (3) with the difference that the samples vn,t are simply replaced with the samples yn,t . It turns out that these estimators are still consistent in spectral norm. Intuitively, PT does not break the consistence of these estimators as it can be seen as a rank-one perturbation of the noise term VT in which the subspace spanned by (Γ1/2 )H sT is “delocalized” enough so as not to perturb much the estimators of RT . In fact, we even have the following strong result. Theorem 3. Let YT be defined as in (19) and let Assumptions 1–3 hold. Define the estimates bp rˆk,T =

up rˆk,T =

N −1 T −1 1 XX ∗ 10≤t+k≤T −1 yn,t+k yn,t N T n=0 t=0

N −1 T −1 X X 1 ∗ 10≤t+k≤T −1 yn,t+k yn,t N (T − |k|) n=0 t=0

and let bp bp b bp = T (ˆ R r−(T ˆ(T T −1),T , . . . , r −1),T )

Then for any x > 0,

and

up up b up = T (ˆ R r−(T ˆ(T T −1),T , . . . , r −1),T ).

i   h

bbp > x ≤ exp −cT P R − R

T T

  x  x + o(1) − log 1 + kΥk∞ kΥk∞

h i 

bup P R T − RT > x ≤ exp −

cT x2 2 4 kΥk∞

log T

 (1 + o(1)) .

Before proving this theorem, some remarks are in order. Remark 1. Theorem 3 generalizes without difficulty to the case where PT has a fixed rank K > 1. This captures the situation of K ≪ min(N, T ) sources. Remark 2. Similar to the proofs of Theorems 1 and 2, the proof of Theorem 3 uses concentration inequalities for functionals of Gaussian random variables based on the moment generating function and the Chernoff bound. Exploiting instead McDiarmid’s concentration inequality [11], it is possible to adapt Theorem 3 to sT with bounded (instead of Gaussian) entries. This adaptation may account for discrete sources met in digital communication signals. B. Main elements of the proof of Theorem 3 bup . Defining We restrict the proof to the more technical part that concerns R T b up (λ) , Υ T

March 6, 2014

T −1 X

up ikλ rˆk,T e

k=−(T −1)

DRAFT

14

PT −1 ikλ , we need to establish a concentration inequality on and recalling that ΥT (λ) = k=−(T −1) rk e h i up b b up (λ) can be written as (see Lemma 7) P supλ∈[0,2π) |ΥT (λ) − ΥT (λ)| > x . For any λ ∈ [0, 2π), the term Υ T b up (λ) = dT (λ)H Υ T

= dT (λ)H







YTH YT ⊙ BT N

VTH VT ⊙ BT N





dT (λ)

dT (λ)

PTH VT + VTH PT + dT (λ) ⊙ BT N   H PT PT ⊙ BT dT (λ) + dT (λ)H N H



dT (λ)

b u (λ) + Υ b cross (λ) + Υ b sig (λ) ,Υ T T T

where BT is the matrix defined in the statement of Lemma 7. We know from the proof of Theorem 2 that " # ! cT x2 u b P sup |ΥT (λ) − ΥT (λ)| > x ≤ exp − (1 + o(1)) . (20) 2 4 kΥk∞ log T λ∈[0,2π) b cross (λ) and Υ b sig (λ). We then need only handle the terms Υ T T We start with a simple lemma.

Lemma 13. Let X and Y be two independent N (0, 1) random variables. Then for any τ ∈ (−1, 1), E[exp(τ XY )] = (1 − τ 2 )−1/2 . Proof: Z 2 2 1 eτ xy e−x /2 e−y /2 dx dy 2π R2 Z 2 2 2 1 = e−(x−τ y) /2 e−(1−τ )y /2 dx dy 2π R2

E[exp(τ XY )] =

= (1 − τ 2 )−1/2 .

With this result, we now have Lemma 14. There exists a constant a > 0 such that " #   b cross (λ)| > x ≤ exp − √axT (1 + o(1)) . P sup |Υ T log T λ∈[0,2π)

Proof: We only sketch the proof of this lemma. We show that for any λ ∈ [0, 2π],   axT b cross √ + C P[|Υ (λ)| > x] ≤ exp − T log T

where C does not depend on λ ∈ [0, 2π]. The lemma is then proven by a discretization argument of the interval b cross (λ) > x], the term [0, 2π] analogous to what was done in the proofs of Section II. We shall bound P[Υ T

March 6, 2014

DRAFT

15

b cross (λ) < −x] being bounded similarly. From Lemma 8, we get P[Υ T

  H H b cross (λ) = Tr DT (λ)H PT VT + VT PT DT (λ)BT Υ T N 1/2 H 1/2 H DT (λ) (ΓT ) sT hH T WT RT DT (λ)BT = Tr N 1/2 1/2 H H DT (λ) (RT ) WTH hT sH T ΓT DT (λ)BT + Tr N 2 H = ℜ(hT WT GT (λ)sT ) N

1/2 1/2 e H be a singular value decomposition where GT (λ) = RT DT (λ)BT DT (λ)H (ΓT )H . Let GT (λ) = UT ΩT U T

of GT (λ) where Ω = diag(ω0 , . . . , ωT −1 ). Observe that the vector xT , WTH hT = (x0 , . . . , xT −1 )T has the

distribution CN (0, khT k2 IT ). We can then write T −1  2 X L 2 H cross b ωt (ℜxt ℜst + ℑxt ℑst ). ΥT (λ) = ℜ xT ΩT sT = N N t=0

−1 Notice that {ℜxt , ℑxt , ℜst , ℑst }Tt=0 are independent with ℜxt , ℑxt ∼ N (0, khT k2 /2) and ℜst , ℑst ∼

N (0, 1/2). Letting 0 < τ < (supT khT k)−1 (supλ kGT (λ)k)−1 and using Markov’s inequality and Lemma 13, we get i i h P h i h b cross b cross P Υ (λ) > x = P eN τ ΥT (λ) > eN τ x ≤ e−N τ xE e2τ t ωt (ℜxt ℜst +ℑxt ℑst ) T =e

−N τ x

TY −1 t=0

1−τ

2

−1 ωt2 khT k2

= exp −N τ x −

T −1 X t=0

log(1 − τ

2

!

ωt2 khT k2 )

.

√ P 2 Mimicking the proof of Lemma 9, we can establish that t ωt = O(log T ) and maxt ωt = O( log T ) √ uniformly in λ ∈ [0, 2π]. Set τ = b/ log T where b > 0 is small enough so that supT,λ (τ khT k kGT (λ)k) < 1. Observing that log(1 − x) = O(x) for x small enough, we get p  b cross P[Υ (λ) > x] ≤ exp −N bx/ log T + E(λ, T ) T

where |E(λ, T )| ≤ (C/ log T )

P

t

ωt2 ≤ C. This establishes Lemma 14.

Lemma 15. There exists a constant a > 0 such that " #   sig b (λ)| > x ≤ exp − √axT (1 + o(1)) . P sup |Υ T log T λ∈[0,2π) Proof: By Lemma 8,

b sig (λ) = N −1 Tr(DH P H PT DT BT ) Υ T T T =

1/2

khT k2 H sT GT (λ)sT N

1/2

where GT (λ) = ΓT DT (λ)BT DT (λ)H (ΓT )H . By the spectral factorization GT (λ) = UT ΣT UTH with ΣT = diag(σ0 , . . . , σT −1 ), we get

March 6, 2014

−1 2 T X L khT k b sig (λ) = Υ σt |st |2 T N t=0

DRAFT

16

and i h P b sig (λ) > x] ≤ e−N τ x E eτ khT k2 t σt |st |2 P[Υ T

T −1   X = exp −N τ x − log(1 − σt τ khT k2 ) t=0

for any τ ∈ (0, 1/(khT k2 supλ kGT (λ)k)). Let us show that r log T + 1 . | Tr GT (λ)| ≤ C T Indeed, we have 1 | Tr GT (λ)| = N −1 | Tr DT BT DTH ΓT | = N

T −1 −ı(k−ℓ)λ X e γ ℓ,k k,ℓ=0 T − |k − ℓ|

−1 −1  1 TX 1/2 1/2  1 TX 1 |γk,ℓ |2 2 N N (T − |k − ℓ|) k,ℓ=0 k,ℓ=0 r  Tr Γ ΓH 1/2  2 1/2 log T + 1 T T = (log T + C) . ≤C N N T √ P Moreover, similar to the proof of Lemma 9, we can show that t σt2 = O(log T ) and maxt |σt | = O( log T ) √ uniformly in λ. Taking τ = b/ log T for b > 0 small enough, and recalling that log(1 − x) = 1 − x + O(x2 )



for x small enough, we get that   2 Tk b sig (λ) > x] ≤ exp − √N bx + bkh √ Tr G (λ) + E(T, λ) P[Υ T T log T log T P where |E(T, λ)| ≤ (C/ log T ) t σt2 ≤ C. We therefore get   b sig (λ) > x] ≤ exp − √N bx + C P[Υ T log T

where C is independent of λ. Lemma 15 is then obtained by the discretization argument of the interval [0, 2π].

Gathering Inequality (20) with Lemmas 14 and 15, we get the second inequality of the statement of Theorem 3. IV. A PPLICATION TO

SOURCE DETECTION

Consider a sensor network composed of N sensors impinged by zero (hypothesis H0 ) or one (hypothesis H1 ) source signal. The stacked signal matrix YT = [y0 , . . . , yT −1 ] ∈ CN ×T from time t = 0 to t = T − 1 is modeled as   V T YT =  h sH + V T T T

, H0

(21)

, H1

∗ ∗ where sH T = [s0 , . . . , sT −1 ] are (hypothetical) independent CN (0, 1) signals transmitted through the constant 1/2

channel hT ∈ CN , and VT = WT RT

∈ CN ×T models a stationary noise matrix as in (1).

As opposed to standard procedures where preliminary pure noise data are available , we shall proceed here to an online signal detection test solely based on YT , by exploiting the consistence established in Theorem 3.

March 6, 2014

DRAFT

17

b T ∈ {R b bp , R bup }, which is then used as a whitening The approach consists precisely in estimating RT by R T T

matrix for YT . The binary hypothesis (21) can then be equivalently written   W R R b−1/2 , H0 T T T b−1/2 = YT R T −1/2 −1/2  h sH R b b ,H . +W R R T T

T

T

T

T

(22)

1

b−1 − IT k → 0 almost surely (by Theorem 3 as long as inf λ∈[0,2π) Υ(λ) > 0), for T large, Since kRT R T

the decision on the hypotheses (22) can be handled by the generalized likelihood ratio test (GLRT) [12] by b−1/2 as a purely white noise. We then have the following result. approximating WT RT R T

bT be any of R bbp or R bup strictly defined in Theorem 3 for YT now following model (21). Theorem 4. Let R T T Further assume inf λ∈[0,2π) Υ(λ) > 0 and define the test

b −1 H N YT R T YT H0  ≶ γ  α= b−1 Y H H1 Tr YT R T T where γ ∈ R+ satisfies γ > (1 +

√ 2 c) . Then, as T → ∞,   0 P [α ≥ γ] →  1

Recall from [12] that the decision threshold (1 + eigenvalue of

1 T

(23)

, H0 , H1 .

√ 2 c) corresponds to the almost sure limiting largest

WT WTH , that is the right-edge of the support of the Marˇcenko–Pastur law.

Simulations are performed hereafter to assess the performance of the test (23) under several system settings. p We take here hT to be the following steering vector hT = p/T [1, . . . , e2iπθ(T −1) ] with θ = 10◦ and p a power parameter. The matrix RT models an autoregressive process of order 1 with parameter a, i.e. [RT ]k,l = a|k−l| . In Figure 2, the detection error 1 − P[α ≥ γ|H1 ] of the test (23) for a false alarm rate (FAR) P[α ≥ γ|H0 ] =

bT = R bup (Unbiased) or R bT = R bbp (Biased) is compared against the estimator that assumes RT 0.05 under R T T

bT = RT in (23), and against the GLRT test that wrongly assumes perfectly known (Oracle), i.e. that sets R

bT = IT in (23). The source signal power is set to p = 1, that is temporally white noise (White), i.e. that sets R

a signal-to-noise ratio (SNR) of 0 dB, N is varied from 10 to 50 and T = N/c for c = 0.5 fixed. In the same setting as Figure 2, the number of sensors is now fixed to N = 20, T = N/c = 40 and the SNR (hence p) is varied from −10 dB to 4 dB. The powers of the various tests are displayed in Figure 3 and compared to the detection methods which estimate RT from a pure noise sequence called Biased PN (pure noise) and Unbiased PN. The results of the proposed online method are close to that of Biase/Unbiased PN, this last presenting the disadvantage to have at its disposal a pure noise sequence at the receiver. Both figures suggest a close match in performance between Oracle and Biased, while Unbiased shows weaker performance. The gap evidenced between Biased and Unbiased confirms the theoretical conclusions.

March 6, 2014

DRAFT

[Figure 2: detection-error curves for Biased, Unbiased, White, and Oracle, plotted on a logarithmic scale from $10^0$ down to $10^{-4}$.]

Figure 2. Detection error $1 - \mathbb P[\alpha > \gamma \mid H_1]$ versus $N$ with FAR $= 0.05$, $p = 1$, SNR $= 0$ dB, $c = 0.5$, and $a = 0.6$.

[Figure 3: power curves for Biased, Unbiased, Biased PN, Unbiased PN, and Oracle.]

Figure 3. Power $\mathbb P[\alpha > \gamma \mid H_1]$ of the detection tests versus SNR (dB) with FAR $= 0.05$, $N = 20$, $c = 0.5$, and $a = 0.6$.

APPENDIX

A. Proofs for Theorem 1


1) Proof of Lemma 3: Developing the quadratic forms given in the statement of the lemma, we get

$$
\begin{aligned}
d_T(\lambda)^H \frac{V_T^H V_T}{N} d_T(\lambda)
&= \frac{1}{NT} \sum_{l,l'=0}^{T-1} e^{-\imath(l'-l)\lambda} [V_T^H V_T]_{l,l'} \\
&= \frac{1}{NT} \sum_{l,l'=0}^{T-1} e^{-\imath(l'-l)\lambda} \sum_{n=0}^{N-1} v_{n,l}^* v_{n,l'} \\
&= \sum_{k=-(T-1)}^{T-1} e^{-\imath k\lambda} \frac{1}{NT} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t}^* v_{n,t+k} \mathbb 1_{0 \leq t+k \leq T-1} \\
&= \sum_{k=-(T-1)}^{T-1} \hat r_k^b e^{-\imath k\lambda} = \hat\Upsilon_T^b(\lambda),
\end{aligned}
$$

and, since $V_T = W_T R_T^{1/2}$ with $\mathbb E[W_T^H W_T] = N I_T$,

$$
\mathbb E\left[ d_T(\lambda)^H \frac{V_T^H V_T}{N} d_T(\lambda) \right]
= d_T(\lambda)^H (R_T^{1/2})^H \frac{\mathbb E[W_T^H W_T]}{N} R_T^{1/2} d_T(\lambda)
= d_T(\lambda)^H R_T d_T(\lambda).
$$

B. Proofs for Theorem 2

1) Proof of Lemma 7: We have

$$
\begin{aligned}
d_T(\lambda)^H \left( \frac{V_T^H V_T}{N} \odot B_T \right) d_T(\lambda)
&= \frac{1}{NT} \sum_{l,l'=0}^{T-1} e^{-\imath(l'-l)\lambda} [V_T^H V_T]_{l,l'} \frac{T}{T-|l-l'|} \\
&= \sum_{k=-(T-1)}^{T-1} e^{-\imath k\lambda} \frac{1}{N(T-|k|)} \sum_{n=0}^{N-1} \sum_{t=0}^{T-1} v_{n,t}^* v_{n,t+k} \mathbb 1_{0 \leq t+k \leq T-1} \\
&= \sum_{k=-(T-1)}^{T-1} \hat r_k^u e^{-\imath k\lambda} = \hat\Upsilon_T^u(\lambda).
\end{aligned}
$$
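The two quadratic-form identities above can be checked numerically. The sketch below assumes the conventions consistent with the exponents of the proofs, namely $d_T(\lambda) = T^{-1/2}(e^{-\imath l \lambda})_{l=0}^{T-1}$ and $[B_T]_{l,l'} = T/(T-|l-l'|)$; these definitions appear earlier in the paper and are restated here as assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, lam = 4, 16, 0.7
V = (rng.standard_normal((N, T)) + 1j * rng.standard_normal((N, T))) / np.sqrt(2)
d = np.exp(-1j * lam * np.arange(T)) / np.sqrt(T)   # normalized Fourier vector
idx = np.arange(T)
B = T / (T - np.abs(idx[:, None] - idx[None, :]))   # [B_T]_{l,l'} = T/(T-|l-l'|)

# Biased and unbiased sample autocovariances, k = -(T-1), ..., T-1.
ks = np.arange(-(T - 1), T)
r_b = np.empty(2 * T - 1, dtype=complex)
r_u = np.empty(2 * T - 1, dtype=complex)
for i, k in enumerate(ks):
    s = sum(np.vdot(V[:, t], V[:, t + k]) for t in range(T) if 0 <= t + k < T)
    r_b[i] = s / (N * T)            # biased: normalize by T
    r_u[i] = s / (N * (T - abs(k))) # unbiased: normalize by T - |k|

M = V.conj().T @ V / N                   # V^H V / N
lhs_b = d.conj() @ M @ d                 # Upsilon-hat^b_T(lambda)
lhs_u = d.conj() @ (M * B) @ d           # Hadamard product with B_T
assert np.isclose(lhs_b, np.sum(r_b * np.exp(-1j * ks * lam)))
assert np.isclose(lhs_u, np.sum(r_u * np.exp(-1j * ks * lam)))
```

Both quadratic forms thus coincide with the corresponding Fourier series of the biased and unbiased autocovariance estimates, exactly as derived.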

2) Proof of Lemma 9: We start by observing that

$$
\begin{aligned}
\operatorname{Tr} B_T^2
&= \sum_{i,j=0}^{T-1} [B_T]_{i,j}^2
= \sum_{i,j=0}^{T-1} \left( \frac{T}{T-|i-j|} \right)^2
= 2 \sum_{i>j} \left( \frac{T}{T-(i-j)} \right)^2 + T \\
&= 2 \sum_{k=1}^{T-1} \left( \frac{T}{T-k} \right)^2 (T-k) + T
= 2T^2 \sum_{k=1}^{T-1} \frac{1}{T-k} + T
\leq 2T^2 (\log T + C).
\end{aligned}
$$

Inequality (10) is then obtained upon noticing that $\|B_T\| \leq \sqrt{\operatorname{Tr} B_T^2}$.
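The combinatorial step above (counting $T-k$ pairs on each off-diagonal, twice by symmetry, plus $T$ diagonal terms) can be verified numerically; a small sketch, with $[B_T]_{i,j} = T/(T-|i-j|)$ as used throughout:

```python
import numpy as np

T = 200
idx = np.arange(T)
B = T / (T - np.abs(idx[:, None] - idx[None, :]))  # [B_T]_{i,j} = T/(T-|i-j|)

trace_B2 = np.trace(B @ B)
k = np.arange(1, T)
closed_form = 2 * T**2 * np.sum(1.0 / (T - k)) + T  # = 2 T^2 H_{T-1} + T
assert np.isclose(trace_B2, closed_form)
# Logarithmic bound of the last step, here with C = 1:
assert trace_B2 <= 2 * T**2 * (np.log(T) + 1)
```

The inner sum is the harmonic number $H_{T-1} = \log T + O(1)$, which is where the $\log T$ in (10) originates.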

We now show (11). Using twice the inequality $\operatorname{Tr}(FG) \leq \|F\| \operatorname{Tr}(G)$ when $F, G \in \mathbb C^{m \times m}$ and $G$ is nonnegative definite [10], we get

$$
\begin{aligned}
\sum_{t=0}^{T-1} \sigma_t^2(\lambda_i)
= \operatorname{Tr} Q_T(\lambda_i)^2
&= \operatorname{Tr}\left( R_T D_T(\lambda_i) B_T D_T(\lambda_i)^H R_T D_T(\lambda_i) B_T D_T(\lambda_i)^H \right) \\
&\leq \|R_T\| \operatorname{Tr}\left( R_T \big( D_T(\lambda_i) B_T D_T(\lambda_i)^H \big)^2 \right) \\
&\leq T^{-2} \|R_T\|^2 \operatorname{Tr}(B_T^2)
\leq 2 \|\Upsilon\|_\infty^2 (\log T + C).
\end{aligned}
$$

Inequality (12) is immediate since $\|Q_T\|^2 \leq \operatorname{Tr} Q_T^2$.


As regards (13), by the Cauchy–Schwarz inequality,

$$
\begin{aligned}
\sum_{t=0}^{T-1} |\sigma_t^3(\lambda_i)|
= \sum_{t=0}^{T-1} \sigma_t^2(\lambda_i) |\sigma_t(\lambda_i)|
&\leq \sqrt{ \sum_{t=0}^{T-1} \sigma_t^4(\lambda_i) } \sqrt{ \sum_{t=0}^{T-1} \sigma_t^2(\lambda_i) } \\
&\leq \sqrt{ \left( \sum_{t=0}^{T-1} \sigma_t^2(\lambda_i) \right)^2 } \sqrt{ \sum_{t=0}^{T-1} \sigma_t^2(\lambda_i) }
= \left( \sum_{t=0}^{T-1} \sigma_t^2(\lambda_i) \right)^{3/2}
\leq C \left( (\log T)^{3/2} + 1 \right),
\end{aligned}
$$

the last inequality following from (11).

REFERENCES

[1] W. B. Wu and M. Pourahmadi, "Banding sample covariance matrices of stationary processes," Statist. Sinica, vol. 19, pp. 1755–1768, 2009.
[2] P. J. Bickel and E. Levina, "Regularized estimation of large covariance matrices," Ann. Statist., vol. 36, pp. 199–227, 2008.
[3] C. Lam and J. Fan, "Sparsistency and rates of convergence in large covariance matrix estimation," Ann. Statist., vol. 37, pp. 4254–4278, 2009.
[4] H. Xiao and W. B. Wu, "Covariance matrix estimation for stationary time series," Ann. Statist., vol. 40, pp. 466–493, 2012.
[5] T. T. Cai, C.-H. Zhang, and H. H. Zhou, "Optimal rates of convergence for covariance matrix estimation," Ann. Statist., vol. 38, pp. 2118–2144, 2010.
[6] T. T. Cai, Z. Ren, and H. H. Zhou, "Optimal rates of convergence for estimating Toeplitz covariance matrices," Probab. Theory Related Fields, vol. 156, pp. 101–143, 2013.
[7] P. Vallet and P. Loubaton, "Toeplitz rectification and DoA estimation with MUSIC," submitted to IEEE ICASSP'14, Florence, Italy, 2014.
[8] R. M. Gray, Toeplitz and Circulant Matrices: A Review. Now Publishers, 2006.
[9] O. Kallenberg, Foundations of Modern Probability, ser. Probability and its Applications. New York: Springer-Verlag, 1997.
[10] R. A. Horn and C. R. Johnson, Topics in Matrix Analysis. Cambridge: Cambridge University Press, 1991.
[11] M. Ledoux, The Concentration of Measure Phenomenon, ser. Mathematical Surveys and Monographs, vol. 89. Providence, RI: American Mathematical Society, 2001.
[12] P. Bianchi, M. Debbah, M. Maida, and J. Najim, "Performance of statistical tests for single-source detection using random matrix theory," IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2400–2419, 2011.
