Robust Model Order Selection in Large Dimensional Elliptically Symmetric Noise

Eugénie Terreaux∗, Jean-Philippe Ovarlez∗†, and Frédéric Pascal‡

∗ CentraleSupélec-SONDRA, 3 rue Joliot-Curie, 91190 Gif-sur-Yvette, France (e-mail: [email protected])
† ONERA, DEMR/TSI, Chemin de la Hunière, 91120 Palaiseau, France
‡ L2S/CentraleSupélec-Université Paris-Sud, 3 rue Joliot-Curie, 91190 Gif-sur-Yvette, France

arXiv:1710.06735v1 [stat.ME] 18 Oct 2017
Abstract—This paper deals with model order selection in the context of correlated noise. More precisely, one considers sources embedded in an additive Complex Elliptically Symmetric (CES) noise with unknown parameters. The main difficulty in estimating the model order lies in the noise correlation, namely the scatter matrix of the corresponding CES distribution. To tackle that problem, this work adopts a two-step approach: first, we develop two different methods based on a Toeplitz-structured model for estimating this unknown scatter matrix and whitening the correlated noise. Then, we apply Maronna's M-estimators on the whitened signal to estimate the covariance matrix of the "decorrelated" signal, in order to estimate the model order. The proposed methodology relies on both robust estimation theory and large Random Matrix Theory, and original results are derived, proving the efficiency of this methodology. Indeed, the main theoretical contribution is to derive consistent robust estimators of the covariance matrix of the signal-plus-correlated-noise in a large dimensional regime, and to propose an efficient methodology to estimate the rank of the signal subspace. Finally, the analysis shows a great improvement compared to the state of the art, on both simulated and real hyperspectral images.

Index Terms—Model order selection, RMT, correlated noise, CES distribution, robust estimation.
I. INTRODUCTION

Model order selection is a challenging issue in signal processing, for example in wireless communication [1], array processing [2], or other related problems [3], [4]. Classically, for a white noise, statistical methods such as those based on information theoretic criteria for model order selection make it possible to estimate the model order from the eigenvalues and eigenvectors of the covariance matrix of the signal. This is the case for the Akaike Information Criterion (AIC) [5] and the Minimum Description Length (MDL) [6], [7]. Other examples are the problem of source localization [8], where the estimation of the signal subspace is done through the eigenvalues of the covariance matrix, channel identification [9], waveform estimation [10] and many other parametric estimation problems. However, all these methods are no longer relevant for large dimensional and correlated data. Even if particular cases have been studied for correlated signals, as in [11] or [12], these methods cannot be generalized to all kinds of signals, and a whitening step, when possible, cannot be systematically set up [13]. Moreover, the commonly used statistical model for this problem does not exhibit the same matrix properties when the data are large dimensional and when they are not: the covariance matrix is not correctly apprehended and the methods fail to estimate the model order, for example in [14], [15] or [16].

In the field of model order selection for the large dimensional regime, that is, when the number of snapshots N and the dimension of the signal m tend to infinity with a constant positive ratio, and for white or whitened noise, the Random Matrix Theory (RMT) provides methods to estimate the model order relying on the study of the distribution of the largest eigenvalues of the covariance matrix [17]. The RMT introduces new methodologies which correctly handle the statistical properties of large matrices through a statistical and probabilistic approach: see [18] for a review of this theory, [19] for a general detection algorithm, [20] for an adapted MUSIC detection algorithm, [21] for applications to radar detection and [22] for an application to hyperspectral imaging. When the noise is spatially correlated, it is still possible to estimate the model order, for example by evaluating the distance between the eigenvalues of the covariance matrix [23]. Nevertheless, these methods require a threshold that has no explicit expression and can be tedious to obtain [24].

In addition to the problems of large dimension and correlation, another recurrent problem in signal processing is the non-Gaussianity of the noise. To be less dependent on the noise statistics, that is, so that the model order selection is not degraded when the noise differs slightly from the targeted one, robust methods for model order selection have been developed [25], notably in hyperspectral imaging [26]. Nevertheless, these methods depend on unknown parameters [1] or are not adapted to large data. Recent results in RMT enable the covariance matrix to be correctly estimated for textured signals [27], but the correlation matrix is assumed to be known and the signal is whitened before being processed.

In this work, one considers a Complex Elliptically Symmetric (CES) noise. CES distributions are often exploited in signal processing because of their flexibility, that is, their ability to model a wide range of random signals. The signal can be split into two parts: a texture and a speckle. CES models are often used in various fields, as in [28] for hyperspectral imaging or [29] for radar clutter echo modelling. This article deals with large dimensional non-Gaussian data and proposes a robust method to estimate the model order. The robustness of our method comes from the robust estimation of the covariance matrix, with a Maronna M-estimator [30], which assigns different weights according to the Mahalanobis
distance between the signals received by the different sensors. It is a generalization of [27] and [31] to the case of left-hand-side correlation (with an unknown covariance matrix). Moreover, this article proposes a new algorithm to estimate the model order. In a first part, an estimator of the correlation matrix is presented: the Toeplitzified Sample Covariance Matrix (SCM), that is, the SCM enforced to be of Toeplitz form [32]. Indeed, as the covariance matrix is supposed to be Toeplitz, the SCM is Toeplitzified as in [33] to enhance the estimation. The data are then whitened with this Toeplitz matrix, and a robust Maronna M-estimator of the covariance matrix is used after the whitening. This robust estimation is studied and a threshold on its eigenvalues can be derived to select the model order. A second part presents the same procedure for the Toeplitzified Fixed-Point (FP) estimator [34], [35]. The third part presents some simulations on both simulated and real hyperspectral images. Proofs of the main results are postponed to the appendices.

Notations: Matrices are in bold capital letters, vectors in bold. Let $X$ be a square matrix of size $s \times s$; $\lambda_i(X)$, $i \in J1, sK$, are the eigenvalues of $X$. $\mathrm{Tr}(X)$ is the trace of the matrix $X$ and $\|X\|$ stands for the spectral norm. For a matrix $A$, $A^T$ is the transpose of $A$ and $A^H$ its Hermitian transpose. $I_n$ is the $n \times n$ identity matrix. For any $m$-vector $x$, $L : x \mapsto L(x)$ is the $m \times m$ matrix defined by the Toeplitz operator: $[L(x)]_{i,j} = x_{j-i}$ for $i \le j$ and $[L(x)]_{i,j} = x^\star_{i-j}$ for $i > j$. For any $m \times m$ matrix $A$, $\mathcal{T}(A)$ represents the matrix $L(\check{a})$, where $\check{a}$ is the vector whose component $\check{a}_i$ is the average of the entries along the $i$-th diagonal of $A$. The function $u$ of the M-estimators is assumed to satisfy Maronna's conditions: i) $u$ is nonnegative, nonincreasing, and $\phi : x \mapsto x\, u(x)$ is increasing and bounded with $\lim_{x\to\infty} \phi(x) = \phi_\infty > 1$; ii) $\lim_{N\to\infty} c_N < \phi_\infty^{-1}$.
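To make the notation concrete, here is a minimal NumPy sketch of the operators L(·) and T(·) defined above; the function names toeplitz_L and toeplitz_rectify are ours, and the diagonal-averaging convention follows the description in the text.

import numpy as np

def toeplitz_L(x):
    # L(x): Hermitian Toeplitz matrix with [L(x)]_{i,j} = x_{j-i} for i <= j
    # and conj(x_{i-j}) for i > j, as in the Notations paragraph.
    m = len(x)
    out = np.empty((m, m), dtype=complex)
    for i in range(m):
        for j in range(m):
            out[i, j] = x[j - i] if i <= j else np.conj(x[i - j])
    return out

def toeplitz_rectify(A):
    # T(A): replace each diagonal of A by its average ("Toeplitzification").
    m = A.shape[0]
    a_check = np.array([np.diagonal(A, offset=k).mean() for k in range(m)])
    return toeplitz_L(a_check)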
The signal whitened by $\check{C}_{FP}$ can be written as:

\[
\check{Y}_{wFP} = \check{C}_{FP}^{-1/2}\, M\, \delta^H\, \Gamma^{1/2} + \check{C}_{FP}^{-1/2}\, C^{1/2}\, X\, T^{1/2} .
\]

The parameter $E[\tau]$ can in practice be evaluated with the empirical estimator of the mean or, as in the previous section, be considered as equal to one. The quantity $E[v(\tau\gamma)\,\tau]$ can also be evaluated through an estimate $\hat{\gamma}$ of $\gamma$, as explained in the Results section.

B. Robust estimation of the covariance matrix and model order selection

The robust estimation of the covariance matrix and the model order selection are done as previously. The robust estimator of the scatter matrix of the whitened signal $\check{Y}_{wFP}$ is a fixed-point estimator, denoted by $\check{\Sigma}_{FP}$ and defined as the unique solution of the equation:

\[
\check{\Sigma}_{FP} = \frac{1}{N} \sum_{i=0}^{N-1} u\!\left(\frac{1}{m}\, \check{y}_{wFP,i}^H\, \check{\Sigma}_{FP}^{-1}\, \check{y}_{wFP,i}\right) \check{y}_{wFP,i}\, \check{y}_{wFP,i}^H . \tag{15}
\]
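Equation (15) has no closed form, but the solution can be approached by plain fixed-point iterations. Below is a minimal NumPy sketch; the helper name maronna_fixed_point and the stopping rule are our choices, and the weight u is a placeholder for the Maronna weight actually used (a Student-t-type weight such as u(x) = (1 + α)/(α + x) is one common choice, stated here as an assumption).

import numpy as np

def maronna_fixed_point(Y, u, n_iter=100, tol=1e-6):
    # Iterates Sigma <- (1/N) sum_i u((1/m) y_i^H Sigma^{-1} y_i) y_i y_i^H
    # (equation (15)) starting from the identity; u must accept an array.
    m, N = Y.shape
    sigma = np.eye(m, dtype=complex)
    for _ in range(n_iter):
        inv_sigma = np.linalg.inv(sigma)
        # quadratic forms (1/m) y_i^H Sigma^{-1} y_i, for all columns at once
        d = np.einsum('ij,ik,kj->j', Y.conj(), inv_sigma, Y).real / m
        new_sigma = (Y * u(d)) @ Y.conj().T / N
        if np.linalg.norm(new_sigma - sigma) <= tol * np.linalg.norm(sigma):
            return new_sigma
        sigma = new_sigma
    return sigma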
Thus, $\check{\Sigma}_{FP}$ is a robust estimator of the covariance matrix of the whitened signal. Equation (11) still holds when replacing $\check{\Sigma}_{SCM}$ by $\check{\Sigma}_{FP}$. Indeed, Theorem 2 can be adapted as follows:

Theorem 4. The following convergence holds:

\[
\big\| \check{\Sigma}_{FP} - \hat{S} \big\| \overset{a.s.}{\longrightarrow} 0 . \tag{16}
\]
Proof: The proof is the same as that of Theorem 2 and is provided in Appendix B.

The same threshold $t$ given by equation (13) can be used on the eigenvalues of $\check{\Sigma}_{FP}$ to estimate $p$. The final corresponding algorithm is presented below.
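A compact sketch of the procedure just described (whitening by the Toeplitzified fixed-point estimate, robust re-estimation, then eigenvalue thresholding); it reuses toeplitz_rectify and maronna_fixed_point from the sketches above and assumes the threshold t of equation (13) is supplied by the caller.

import numpy as np

def estimate_model_order(Y, u, t):
    # 1) Toeplitzified fixed-point estimate of the scatter matrix of Y (m x N)
    C_fp = toeplitz_rectify(maronna_fixed_point(Y, u))
    # 2) whitening: Y_w = C_fp^{-1/2} Y, via the Hermitian eigendecomposition
    w, V = np.linalg.eigh(C_fp)
    Yw = (V * w ** -0.5) @ V.conj().T @ Y
    # 3) robust M-estimation on the whitened samples
    sigma = maronna_fixed_point(Yw, u)
    # 4) the model order is the number of eigenvalues above the threshold t
    return int((np.linalg.eigvalsh(sigma) > t).sum())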
Fig. 4. Eigenvalue distributions of the covariance matrices $\hat{S}$ and $\check{\Sigma}_{FP}$ when the signal is whitened through $\check{C}_{FP}$, and the corresponding threshold $t$ ($\rho = 0.7$, $m = 900$, $N = 2000$, $\tau \sim$ inverse gamma).

Fig. 5. Eigenvalue distributions of the covariance matrices $\hat{S}$ and $\check{\Sigma}_{FP}$ when the signal is whitened through $\check{C}_{FP}$, and the corresponding threshold ($\rho = 0.7$, $m = 900$, $N = 2000$, $\tau = t^2$, $t \sim$ Student-t).

C. Results

As in the previous section, it is interesting to analyse the eigenvalue distributions of $\hat{S}$ and $\check{\Sigma}_{FP}$. For the next simulations, source-free samples are considered and the parameters are set to $c = 0.45$, $m = 900$ and $N = 2000$. The function $u$ chosen for the FP and Maronna M-estimators is the same as before, with $\alpha = 0.1$. Figure 4 presents the eigenvalue distributions of the covariance matrices $\hat{S}$ and $\check{\Sigma}_{FP}$ when the signal has been whitened by $\check{C}_{FP}$. One can notice that the results are the same as in Figure 1: for $N$ large, the distribution of the eigenvalues is almost the same as that of $\hat{S}$. However, as the rate of convergence in (14) is faster than in (5), it is more interesting to consider the robust method. Moreover, if a robust estimator is not used after the whitening process, the eigenvalue distribution does not follow that of $\hat{S}$ and exceeds the threshold $t$.

For the robustness analysis (the texture distribution assumed for the $u$ function differs from that of the observed samples), Figure 5 shows quite good results when $T$ is a diagonal matrix containing the $\{\tau_i\}_{i\in J0,N-1K}$ on its diagonal, where the $\{\tau_i\}_{i\in J0,N-1K}$ are i.i.d. and follow the distribution of $t^2$, with $t$ a Student-t random variable with parameter 100 and $\alpha = 0.1$.

Figure 6 presents the same histograms as Figure 4 when a single source with SNR equal to 10 dB is present in the samples. One can observe that only a single eigenvalue exceeds the threshold and that the noise eigenvalue distribution of $\check{\Sigma}_{FP}$ fits well that of $\hat{S}$.

Fig. 6. Eigenvalue distributions of the covariance matrices $\hat{S}$ and $\check{\Sigma}_{FP}$ for a single source with SNR = 10 dB present in the samples, and the calculated threshold ($\rho = 0.7$, $m = 900$, $N = 2000$, $\tau \sim$ inverse gamma).
The results are better for this robust method than in the previous section (e.g., Figure 3). Indeed, the robust method provides robustness with respect to the distribution of $\tau$: if the distribution of the texture differs from the one for which the function $u$ has been computed, the method is still reliable. This can be explained by the robustness of the covariance matrix estimation. As in the non-whitened case, the eigenvalues would otherwise exceed the threshold and no conclusion on the model order could be drawn. These results thus extend [31] to the case of noise correlated on the left-hand side. The L2-norm of the difference between the estimated covariance matrix and the SCM tends to zero when $N$ and $m$ tend to infinity with a constant ratio $c$. As many estimation methods for the rank of the signal subspace are based on the eigenvalues of the covariance matrix, this new estimator improves the consistency of the resolution of this problem.

V. RESULTS AND COMPARISONS
In this section, some results of model order selection are presented, on both simulated and real hyperspectral images. The simulations are based on $\check{\Sigma}_{SCM}$ and $\check{\Sigma}_{FP}$.

A. Estimation of the model order

In order to test the proposed method, we simulate hyperspectral images before dealing with real ones. As a reminder, we first whiten the received signal thanks to a Toeplitz matrix built from the SCM or from a fixed-point estimator. Then, an M-estimator is used to estimate the scatter matrix of the whitened signal. The distribution of its eigenvalues is then studied: a threshold is applied to count how many eigenvalues are larger than this threshold, providing the estimated model order $\hat{p}$.
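As an illustration, correlated CES samples matching the noise part of the model, $C^{1/2} X T^{1/2}$, can be generated as follows; the Toeplitz scatter matrix with entries $\rho^{|i-j|}$ and the shape/scale convention for the inverse-gamma texture are our assumptions for this sketch.

import numpy as np

def simulate_ces_samples(m=400, N=2000, rho=0.7, nu=0.1, seed=0):
    # Toeplitz scatter matrix C with entries rho^|i-j|
    C = rho ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
    rng = np.random.default_rng(seed)
    # complex Gaussian speckle X with unit-variance entries
    X = (rng.standard_normal((m, N)) + 1j * rng.standard_normal((m, N))) / np.sqrt(2)
    # i.i.d. textures tau_i, here inverse-gamma (parameter convention assumed)
    tau = 1.0 / rng.gamma(shape=nu, scale=1.0 / nu, size=N)
    # noise C^{1/2} X T^{1/2}: each column i is scaled by sqrt(tau_i)
    return np.linalg.cholesky(C) @ X * np.sqrt(tau)

A rank-$p$ source component $M\,\delta^H\,\Gamma^{1/2}$ can then be added to these samples to reproduce the full observation model.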
Fig. 7. Estimation of the number $\hat{p}$ of sources (4 trials) embedded in CES correlated noise ($m = 400$, $c = 0.2$, $p = 4$ sources, $\rho = 0.7$) versus SNR. Curves: $\hat{p}$ with $\check{\Sigma}_{UW}$, $\hat{p}$ with $\check{\Sigma}_{FP}$, $\hat{p}$ with $\check{\Sigma}_{SCM}$, $\hat{p}$ with the AIC method, and the true $p$.
Fig. 8. Estimation of the number $\hat{p}$ of sources (4 trials) embedded in CES correlated noise ($m = 400$, $c = 0.2$, $p = 4$ sources, $\rho = 0.7$) versus SNR. Curves: same as in Figure 7.

For simulated and correlated ($\rho = 0.7$) CES noise, the $\{\tau_i\}_{i\in J0,N-1K}$ are inverse-gamma distributed with parameter $\nu = 0.1$. In Figure 7 ($m = 400$ and $N = 2000$), $p = 4$ sources are added to the observations, with an SNR varying from $-15$ to $20$ dB. For this figure, the number of sources $\hat{p}$ (averaged over 4 trials) is estimated through the AIC method, the non-whitened signal, and the two proposed methods, in which the signal is whitened with the Toeplitz version of the SCM or of the FP estimator. The proposed methods start to find sources from an SNR of $-5$ dB. The FP method seems to better evaluate the number of sources: for a greater SNR, whereas it systematically gives the correct number of sources, the other methods overestimate it. In Figure 8, the same simulation is done for $p = 4$ but with the $\{\tau_i\}_i$ following the distribution of $t^2$, where $t$ is a Student-t random variable, as before. One notices that the proposed estimators still present better performance than the others, and allow sources to be found for SNR greater than 0 dB.

Now, we compare the results obtained with three different methods and the classical AIC on several real, publicly available hyperspectral images: Indian Pines and SalinasA from the AVIRIS database and PaviaU from the ROSIS database [41]. Let M1 be the proposed method with whitening based on the SCM estimator, M2 the proposed method with whitening based on a fixed-point estimator, and M3 the method consisting in thresholding the eigenvalues of the fixed-point estimator without the whitening step; the usual AIC method is also applied. For the function $u(\cdot)$ corresponding to the Student-t distribution, we choose $\nu = 0.1$ for the whitening process when it is done by a fixed-point estimator, and zero for the estimation process. As we do not have access to the true distribution of the noise, an empirical estimator of $\gamma$ is used (see the sketch after Table I):

\[
\hat{\gamma} = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{m}\, y_i^H\, \check{\Sigma}_{(i)}^{-1}\, y_i ,
\qquad \text{where} \quad
\check{\Sigma}_{(i)} = \check{\Sigma} - \frac{1}{N}\, u\!\left(\frac{1}{m}\, y_i^H\, \check{\Sigma}^{-1}\, y_i\right) y_i\, y_i^H .
\]

Then [27] shows that $\gamma - \hat{\gamma} \overset{a.s.}{\longrightarrow} 0$. Moreover, as the distribution of $\tau$ is unknown, we choose to consider that $E[\tau]$ and $E[v(\tau\gamma)\,\tau]$ are equal to 1. Further work can be carried out to estimate these unknown quantities correctly. However, we can reasonably assume that $E[v(\tau\gamma)\,\tau]$ and $E[\tau]$ are not too large and that the estimation error will not impact the results much. The results are summarized in Table I: on each image, the proposed methods tend to perform better than the classical ones.

TABLE I
ESTIMATED p FOR DIFFERENT HYPERSPECTRAL IMAGES.

Images         p    p̂ (M1)   p̂ (M2)   p̂ (M3)   p̂ (AIC)
Indian Pines   16   11        12        220       219
SalinasA       9    9         9         204       203
PaviaU         9    1         1         103       102
Cars           6    3         3         1         143
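The empirical estimator $\hat{\gamma}$ above translates directly into code. In this naive sketch the loop recomputes $\check{\Sigma}_{(i)}$ by the rank-one downdate exactly as in the formula; a Sherman-Morrison update would avoid the inner inversions, but the direct version stays closest to the text.

import numpy as np

def estimate_gamma(Y, sigma, u):
    # gamma_hat = (1/N) sum_i (1/m) y_i^H Sigma_(i)^{-1} y_i, with Sigma_(i)
    # obtained from Sigma by removing the contribution of the i-th sample.
    m, N = Y.shape
    inv_sigma = np.linalg.inv(sigma)
    total = 0.0
    for i in range(N):
        y = Y[:, i:i + 1]
        d = (y.conj().T @ inv_sigma @ y).real.item() / m
        sigma_i = sigma - u(d) * (y @ y.conj().T) / N
        total += (y.conj().T @ np.linalg.inv(sigma_i) @ y).real.item() / m
    return total / N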
VI. CONCLUSION

The model order selection problem for large dimensional data and for sources embedded in correlated CES noise is tackled in this article. Two Toeplitz-based covariance matrix estimators are first introduced and their consistency is proved. The CES texture is handled by any M-estimator, which can then be used to estimate the correct structure of the scatter matrix built on the whitened observations. The Random Matrix Theory provides tools to correctly estimate the model order. Results obtained on real and simulated hyperspectral images are promising. Moreover, the proposed method can be applied to many other kinds of model order selection problems, such as radar clutter rank estimation, source localization, or other hyperspectral problems such as anomaly detection and linear or non-linear unmixing techniques.

APPENDIX A
PROOFS OF THEOREM 1 AND THEOREM 3

The proofs of Theorem 1 and Theorem 3 are inspired by [31]. For these theorems, we will use Lemma 4.1 in [42]: for $T = L\big([t_0, \ldots, t_{m-1}]^T\big)$ a Toeplitz Hermitian $m \times m$ matrix with $\{t_k\}_{k\in J0,m-1K}$ absolutely summable ($t_{-k} = t_k^*$), we can define the function $f(\cdot)$ such that, for any $\lambda \in [0, 2\pi]$, $f(\lambda) = \sum_{k=1-m}^{m-1} t_k\, e^{i\lambda k}$, and $M_f$ characterizes its essential supremum:

\[
\|T\| \le M_f = \sup_{\lambda\in[0,2\pi)} \left| \sum_{k=1-m}^{m-1} t_k\, e^{i\lambda k} \right| . \tag{17}
\]
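The bound (17) is easy to verify numerically. A small sketch with geometric coefficients t_k = ρ^k (our choice, made only because they are absolutely summable), comparing the spectral norm of T with the supremum of |f| on a fine grid:

import numpy as np
from scipy.linalg import toeplitz

m, rho = 200, 0.6
t = rho ** np.arange(m)          # absolutely summable coefficients t_k
T = toeplitz(t)                  # Hermitian (here real symmetric) Toeplitz matrix

# f(lambda) = sum_{k=1-m}^{m-1} t_k e^{i lambda k}, with t_{-k} = conj(t_k)
lam = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
k = np.arange(1 - m, m)
coeffs = np.concatenate([np.conj(t[m - 1:0:-1]), t])
sup_f = np.abs(np.exp(1j * np.outer(lam, k)) @ coeffs).max()

print(np.linalg.norm(T, 2), "<=", sup_f)   # spectral norm bounded by sup |f|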
A. Proof of Theorem 1

As in the main body of this article, let $Y = M\,\delta^H\,\Gamma^{1/2} + C^{1/2}\,X\,T^{1/2}$, and let $\mathcal{T}$ be the Toeplitz operator defined in the introduction: for any $m$-vector $x$, $[L(x)]_{i,j} = x_{j-i}$ for $i \le j$ and $[L(x)]_{i,j} = x^\star_{i-j}$ for $i > j$, with $L(x)$ of size $m \times m$. Under Assumption 1, Assumption 2 and Assumption 3, and as $\mathcal{T}\big(\frac{1}{N} Y Y^H\big)$ and $E[\tau]\,C$ are Toeplitz matrices, one can write, thanks to (17):

\[
\Big\|\mathcal{T}\Big(\frac{1}{N}\,Y Y^H\Big) - E[\tau]\,C\Big\| \le \sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m(\lambda) - E[\tau]\,\gamma_m(\lambda)\big| , \tag{18}
\]

where $\gamma_m(\lambda) = \sum_{k=1-m}^{m-1} c_{k,m}\, e^{ik\lambda}$ with $c_{-k} = c_k^\star$, and $\hat{\gamma}_m(\lambda) = \sum_{k=1-m}^{m-1} \check{c}_{k,m}\, e^{ik\lambda}$ with $\check{c}_{-k} = \check{c}_k^\star$.

The following lemma is essential for the development of the proof:

Lemma 5. The quantity $\hat{\gamma}_m(\lambda)$ can be rewritten as:

\[
\hat{\gamma}_m(\lambda) = d_m^H(\lambda)\, \frac{Y Y^H}{N}\, d_m(\lambda) , \tag{19}
\]

with $d_m(\lambda) = \frac{1}{\sqrt{m}}\,\big[1, e^{-i\lambda}, \ldots, e^{-i(m-1)\lambda}\big]^T$.

Proof: The proof draws its inspiration from that of Appendix A1 in [31]. Equation (19) can be rewritten as:

\[
d_m^H(\lambda)\, \frac{Y Y^H}{N}\, d_m(\lambda)
= \frac{1}{mN} \sum_{l,l'=0}^{m-1} e^{-i(l'-l)\lambda}\, \big(Y Y^H\big)_{l,l'}
= \sum_{k=1-m}^{m-1} e^{-ik\lambda}\, \frac{1}{mN} \sum_{i=0}^{m-1}\sum_{j=0}^{N-1} y_{i,j}\, y^\star_{i+k,j}\, \mathbb{1}_{0\le i+k\le m}
= \sum_{k=1-m}^{m-1} \check{c}_k\, e^{-ik\lambda} .
\]

Thereby, we have:

\[
d_m^H(\lambda)\, \frac{Y Y^H}{N}\, d_m(\lambda)
= d_m^H(\lambda)\, \frac{M\,\delta^H\,\Gamma\,\delta\,M^H}{N}\, d_m(\lambda)
+ d_m^H(\lambda)\, \frac{C^{1/2} X\, T^{1/2}\,\Gamma^{1/2}\,\delta\,M^H}{N}\, d_m(\lambda)
+ d_m^H(\lambda)\, \frac{M\,\delta^H\,\Gamma^{1/2}\, T^{1/2} X^H C^{1/2}}{N}\, d_m(\lambda)
+ d_m^H(\lambda)\, \frac{C^{1/2} X\, T\, X^H C^{1/2}}{N}\, d_m(\lambda) . \tag{20}
\]

And we note:

\[
\hat{\gamma}_m^{sign}(\lambda) = d_m^H(\lambda)\, \frac{M\,\delta^H\,\Gamma\,\delta\,M^H}{N}\, d_m(\lambda), \quad
\hat{\gamma}_m^{cross}(\lambda) = d_m^H(\lambda)\, \frac{C^{1/2} X\, T^{1/2}\,\Gamma^{1/2}\,\delta\,M^H + M\,\delta^H\,\Gamma^{1/2}\, T^{1/2} X^H C^{1/2}}{N}\, d_m(\lambda), \quad
\hat{\gamma}_m^{noise}(\lambda) = d_m^H(\lambda)\, \frac{C^{1/2} X\, T\, X^H C^{1/2}}{N}\, d_m(\lambda) .
\]

And the equation (18) becomes:

\[
\Big\|\mathcal{T}\Big(\frac{1}{N} Y Y^H\Big) - E[\tau]\,C\Big\| \le \sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{noise}(\lambda) + \hat{\gamma}_m^{cross}(\lambda) + \hat{\gamma}_m^{sign}(\lambda) - E[\tau]\,\gamma_m(\lambda)\big| . \tag{21}
\]

This leads to:

\[
\Big\|\mathcal{T}\Big(\frac{1}{N} Y Y^H\Big) - E[\tau]\,C\Big\|
\le \sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{noise}(\lambda) - E\big[\hat{\gamma}_m^{noise}(\lambda)\big]\big|
+ \sup_{\lambda\in[0,2\pi)} \big|E\big[\hat{\gamma}_m^{noise}(\lambda)\big] - E(\tau)\,\gamma_m(\lambda)\big|
+ \sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{sign}(\lambda)\big|
+ \sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{cross}(\lambda)\big| . \tag{22}
\]

We will now analyse each term of (22).

1) Analysis of $\sup_{\lambda\in[0,2\pi)} \big|E[\hat{\gamma}_m^{noise}(\lambda)] - E(\tau)\,\gamma_m(\lambda)\big|$: We first need the following lemma:

Lemma 6.
\[
E\big[\hat{\gamma}_m^{noise}(\lambda)\big] = E[\tau]\, d_m^H(\lambda)\, C\, d_m(\lambda) = E[\tau]\,\gamma_m(\lambda) . \tag{23}
\]

Proof: Equation (20) gives $E\big[\hat{\gamma}_m^{noise}(\lambda)\big] = d_m^H(\lambda)\, E\big[\frac{C^{1/2} X T X^H C^{1/2}}{N}\big]\, d_m(\lambda)$. Let $V = C^{1/2} X$ and $(V)_{i,j} = v_{i,j}$. We obtain $E\big[\hat{\gamma}_m^{noise}(\lambda)\big] = d_m^H(\lambda)\, E\big[\frac{V T V^H}{N}\big]\, d_m(\lambda)$. As $E\big[(V T V^H)_{ij}\big] = \sum_{k=1}^{N} E[\tau]\, E\big[v_{j,k}\, v^*_{i,k}\big]$ and $c_{k'} = E\big[v_{p,n}\, v^*_{p+k',n}\big]$, we have $E\big[(V T V^H)_{ij}\big] = \sum_{k=1}^{N} E[\tau]\, c_{j-i} = N\, E[\tau]\, c_{j-i}$. This leads to:

\[
E\big[\hat{\gamma}_m^{noise}(\lambda)\big] = \frac{N\,E(\tau)}{N}\, d_m^H(\lambda)\, C\, d_m(\lambda) = E[\tau]\,\gamma_m(\lambda) .
\]

Thereby, the second term of (22) leads to:

\[
\sup_{\lambda\in[0,2\pi)} \big|E\big[\hat{\gamma}_m^{noise}(\lambda)\big] - E(\tau)\,\gamma_m(\lambda)\big| = \sup_{\lambda\in[0,2\pi)} \big|E(\tau)\,\gamma_m(\lambda) - E(\tau)\,\gamma_m(\lambda)\big| = 0 .
\]

The second term is thus equal to zero.

2) Analysis of $\sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{noise}(\lambda) - E[\hat{\gamma}_m^{noise}(\lambda)]\big|$: As in [31], the method consists in proving, for any $\lambda_i \in [0, 2\pi)$ and any real $x > 0$, that $P\big[\big|\hat{\gamma}_m^{noise}(\lambda_i) - E[\hat{\gamma}_m^{noise}(\lambda_i)]\big| > x\big] \to 0$. After that, it remains to prove that $P\big[\sup_{\lambda\in[\lambda_i,\lambda_{i+1})} \big|\hat{\gamma}_m^{noise}(\lambda) - \hat{\gamma}_m^{noise}(\lambda_i)\big| > x\big] \to 0$ and that $P\big[\sup_{\lambda\in[\lambda_i,\lambda_{i+1})} \big|E\hat{\gamma}_m^{noise}(\lambda_i) - E\hat{\gamma}_m^{noise}(\lambda)\big| > x\big] \to 0$.
Let $\lfloor\cdot\rfloor$ be the floor function. Choosing $\beta > 2$, $I = \{0, \ldots, \lfloor N^\beta\rfloor - 1\}$ and $\lambda_i = 2\pi\, i / \lfloor N^\beta \rfloor$, $i \in I$:

\[
\sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{noise}(\lambda) - E\hat{\gamma}_m^{noise}(\lambda)\big|
\le \max_{i\in I} \sup_{\lambda\in[\lambda_i,\lambda_{i+1}]} \big|\hat{\gamma}_m^{noise}(\lambda) - \hat{\gamma}_m^{noise}(\lambda_i)\big|
+ \max_{i\in I} \big|\hat{\gamma}_m^{noise}(\lambda_i) - E\hat{\gamma}_m^{noise}(\lambda_i)\big|
+ \max_{i\in I} \sup_{\lambda\in[\lambda_i,\lambda_{i+1}]} \big|E\hat{\gamma}_m^{noise}(\lambda_i) - E\hat{\gamma}_m^{noise}(\lambda)\big|
\overset{\triangle}{=} \chi_1 + \chi_2 + \chi_3 . \tag{24}
\]

The idea of the proof in [31] is then to provide concentration inequalities for the terms $\chi_1$ and $\chi_2$ (random terms) and a bound on $\chi_3$. The only difference with [31] is the presence of the matrix $T$ in $\hat{\gamma}_m^{noise}(\lambda)$ and the left-side correlation of the noise. Let $\|\gamma\|_\infty$ denote the sup norm of the function $\gamma : \lambda \mapsto \sum_{k=-\infty}^{\infty} c_k\, e^{-ik\lambda}$ for $\lambda \in [0, 2\pi)$. The convergence of the first term $\chi_1$ is given in the following lemma.

Lemma 7. A constant $A > 0$ can be found such that, for any $x > 0$ and $N$ large enough,
\[
P[\chi_1 > x] \le \exp\left(-\frac{c\,N^2}{\|T\|_\infty}\left(\frac{x\,N^{\beta-2}}{A\,\|\gamma\|_\infty} - \log\frac{x\,N^{\beta-2}}{A\,\|\gamma\|_\infty} - 1\right)\right) .
\]

Proof: As already mentioned, the proof is the same as in [31] except for two points: the presence of the matrix $T$, and the left-side correlation of the noise instead of the right-side one in [31]. The inequality $\big\|\frac{V T V^H}{N}\big\| \le \|T\|_\infty\, \big\|\frac{V V^H}{N}\big\| \le \|T\|_\infty\, \|C\|\, \big\|\frac{X X^H}{N}\big\|$ enables us to write:

\[
\big|\hat{\gamma}_m^{noise}(\lambda) - \hat{\gamma}_m^{noise}(\lambda_i)\big|
= \left| d_m^H(\lambda)\, \frac{V T V^H}{N}\, d_m(\lambda) - d_m^H(\lambda_i)\, \frac{V T V^H}{N}\, d_m(\lambda_i) \right|
\le 2\, |d_m(\lambda) - d_m(\lambda_i)|\; \|C\|\, \|T\|_\infty \left\|\frac{X X^H}{N}\right\| .
\]

And then the end of the proof is exactly the same as that of Lemma 4 in [31], replacing $c$ by $\frac{c}{\|T\|_\infty}$ in the exponential. The left correlation is without consequence on the proof.

The convergence of the second term $\chi_2$ is given in the following lemma.

Lemma 8.
\[
P[\chi_2 > x] \le 2\, N^\beta \exp\left(-\frac{c\,N}{\|T\|_\infty}\left(\frac{x}{\|\gamma\|_\infty} - \log\left(\frac{x}{\|\gamma\|_\infty} + 1\right)\right)\right) .
\]

Proof: The proof is the same as that of Lemma 5 in [31], with the $\|T\|_\infty$ on the denominator of $c$.

The convergence of the third term $\chi_3$ is given in the following lemma.

Lemma 9.
\[
\chi_3 \le A\, \|\gamma\|_\infty\, N^{-\beta+1} .
\]

Proof: The proof is the same as that of Lemma 6 of [31], still with the $\|T\|_\infty$ on the denominator of $c$.

These inequalities prove that $P\big[\sup_{\lambda\in[0,2\pi)} \big|\hat{\gamma}_m^{noise}(\lambda) - E\hat{\gamma}_m^{noise}(\lambda)\big| > x\big] \longrightarrow 0$ for any positive real $x$, with an $e^{-N^2}$ rate of decrease, so the supremum tends to zero almost surely.

3) Analysis of $\sup_{\lambda\in[0,2\pi)} |\hat{\gamma}_m^{cross}(\lambda)|$: To prove the convergence of the last term of (22), let us recall that:

\[
\hat{\gamma}_m^{cross}(\lambda) = d_m^H(\lambda)\, \frac{C^{1/2} X\, T^{1/2}\,\Gamma^{1/2}\,\delta\,M^H}{N}\, d_m(\lambda) + d_m^H(\lambda)\, \frac{M\,\delta^H\,\Gamma^{1/2}\, T^{1/2}\, X^H C^{1/2}}{N}\, d_m(\lambda) .
\]

Let $I_m$ be the $m \times m$ matrix containing $1$ everywhere and $D_m(\lambda)$ the diagonal matrix containing the elements of $d_m(\lambda)$ on its diagonal. It can be easily verified that, for any matrix $A$, $d_m^H(\lambda)\, A\, d_m(\lambda) = \mathrm{Tr}\big(D_m^H(\lambda)\, A\, D_m(\lambda)\, I_m\big)$. We obtain:

\[
\hat{\gamma}_m^{cross}(\lambda) = \frac{2}{N}\, \mathrm{Re}\Big\{ \mathrm{Tr}\Big( X\, T^{1/2}\,\Gamma^{1/2}\,\delta\, M^H D_m(\lambda)\, I_m\, D_m^H(\lambda)\, C^{1/2} \Big) \Big\} .
\]

For readability, let $E(\lambda) = D_m(\lambda)\, I_m\, D_m^H(\lambda)$, defined as:

\[
E(\lambda) = \begin{pmatrix} 1 & e^{i\lambda} & \cdots & e^{i(m-1)\lambda} \\ e^{-i\lambda} & 1 & \cdots & e^{i(m-2)\lambda} \\ \vdots & & \ddots & \vdots \\ e^{-i(m-1)\lambda} & \cdots & \cdots & 1 \end{pmatrix},
\]

and let $G(\lambda) = M^H E(\lambda)\, C^{1/2} X$ and $J = T^{1/2}\,\Gamma^{1/2}\,\delta$, two matrices respectively of size $p \times N$ and $N \times p$. Moreover, let $g(\lambda) = [g_1(\lambda), \ldots, g_{Np}(\lambda)]^T = \mathrm{vec}(G(\lambda))$ and $j = [j_1, \ldots, j_{Np}]^T = \mathrm{vec}(J)$. We obtain:

\[
\hat{\gamma}_m^{cross}(\lambda) = \frac{2}{N}\, \mathrm{Re}\big( \mathrm{vec}^T(G(\lambda))\, \mathrm{vec}(J) \big)
= \frac{2}{N} \sum_{k=1}^{Np} \mathrm{Re}(g_k(\lambda))\, \mathrm{Re}(j_k) - \mathrm{Im}(g_k(\lambda))\, \mathrm{Im}(j_k) .
\]

This expression can be transformed by introducing $A = \big(M^H E\, C^{1/2}\big) \otimes I_N$, $B = \big(T^{1/2}\,\Gamma^{1/2}\big) \otimes I_p$, $\tilde{g}(\lambda) = A^{-1} g(\lambda)$, $\tilde{j} = B^{-1} j$, $a_k = \sum_{l=1}^{Np} \big(A^T\big)_{l,k}$ and $b_k = \sum_{s=1}^{Np} (B)_{s,k}$:

\[
\hat{\gamma}_m^{cross}(\lambda) = \frac{2}{N}\, \mathrm{Re}\Big( \big(A^{-1} g(\lambda)\big)^T A^T\, B\, B^{-1} j \Big)
= \frac{2}{N} \sum_{k=1}^{Np} \mathrm{Re}\big(a_k\, \tilde{g}_k(\lambda)\big)\, \mathrm{Re}\big(b_k\, \tilde{j}_k\big) - \mathrm{Im}\big(a_k\, \tilde{g}_k(\lambda)\big)\, \mathrm{Im}\big(b_k\, \tilde{j}_k\big) .
\]

The variables $a_k\, \tilde{g}_k(\lambda)$ and $b_k\, \tilde{j}_k$ are two independent complex Gaussian variables with variances respectively equal to $|\tilde{a}_k(\lambda)|^2$ and $|b_k|^2$. We can apply the following lemma:
Lemma 10. Let $x$ and $y$ be two independent Gaussian $\mathcal{N}(0,1)$ scalar random variables. Then, for any $\tau \in (-1, 1)$, $E[\exp(\tau\, x\, y)] = (1 - \tau^2)^{-1/2}$.

Proof: The proof is derived in [31] through Lemma 13.
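Lemma 10 can be sanity-checked by a quick Monte Carlo experiment (sample size and the value of τ are arbitrary choices):

import numpy as np

rng = np.random.default_rng(0)
tau = 0.4                                   # any tau in (-1, 1)
x, y = rng.standard_normal((2, 10**6))
print(np.exp(tau * x * y).mean())           # Monte Carlo estimate of E[exp(tau x y)]
print((1.0 - tau**2) ** -0.5)               # Lemma 10 value, about 1.0911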
Let $\nu > 0$ be a real such that $\nu^{-1} > \sup_{k\in J1,NpK,\, \lambda\in[0,2\pi)} |\tilde{a}_k(\lambda)|^2\, |b_k|^2$. Then, for a fixed $\lambda \in [0, 2\pi)$, from Lemma 10 and from the Markov inequality:

\[
P\big[\hat{\gamma}_m^{cross}(\lambda) > x \,\big|\, T\big]
= P\big[\exp\big(N\nu\, \hat{\gamma}_m^{cross}(\lambda)\big) > \exp(N\nu x) \,\big|\, T\big]
\le \exp(-N\nu x)\; E\left[\exp\left(2\nu \sum_{k=1}^{Np} \mathrm{Re}\big(a_k \tilde{g}_k(\lambda)\big)\, \mathrm{Re}\big(b_k \tilde{j}_k\big) - \mathrm{Im}\big(a_k \tilde{g}_k(\lambda)\big)\, \mathrm{Im}\big(b_k \tilde{j}_k\big)\right)\right]
\]
\[
\le \exp(-N\nu x) \prod_{k=1}^{Np} \left(1 - 4\nu^2\, \frac{|\tilde{a}_k(\lambda)|^2}{2}\, \frac{|b_k|^2}{2}\right)^{-1/2} \left(1 - 4\nu^2\, \frac{|\tilde{a}_k(\lambda)|^2}{2}\, \frac{|b_k|^2}{2}\right)^{-1/2}
\le \exp\left(-N\left(\nu x - \sum_{k=1}^{Np} \log\big(1 - \nu^2\, |\tilde{a}_k(\lambda)|^2\, |b_k|^2\big)^{-1}\right)\right) .
\]

Moreover, since the $\Gamma_{i,j}$ are absolutely summable (Assumption 3), there exists a constant $K$ such that:

\[
|b_k|^2 = \Big| \sum_{l=1}^{Np} \tau_l^{1/2}\, \Gamma_{l,k} \Big|^2 \le K \sum_{l=1}^{Np} \tau_l .
\]

Furthermore, since $\frac{1}{N}\sum_{l=1}^{Np} \tau_l \underset{N\to\infty}{\longrightarrow} E(\tau_i) = 1$, we obtain $|b_k|^2 \le N K$. To deal with $|\tilde{a}_k(\lambda)|$, let $K_1$ and $K_2$ be some constants and recall that, for a fixed $j$, the $\{c_{i,j}\}_i$ and the $\{M_{i,j}\}_i$ are absolutely summable:

\[
|a_k(\lambda)| = \left| \frac{1}{m} \sum_{s=1}^{p} \sum_{l,j=1}^{m} c^\star_{l,k}\, M_{j,s}\, e^{i(l-j)\lambda} \right|
\le \frac{1}{m} \sum_{l=1}^{m} |c_{l,k}| \sum_{s,j=1}^{p,m} |M^\star_{j,s}|
\le p\, K_2 \max_s \sum_{j=1}^{m} |M^\star_{j,s}| = p\, K_1 .
\]

We obtain $\nu^2\, |\tilde{a}_k(\lambda)|^2\, |b_k|^2 \le \nu^2\, N^2\, p^2\, K\, K_1^2$ with $p \ll N$. Let $q$ and $\epsilon$ be two positive reals small enough and such that:

\[
\nu^2 = \frac{q}{N} < \frac{2\, K\, K_1^2}{N^{1/2+\epsilon}} .
\]

Then $\lim_{N\to\infty} \nu^2\, |a_k(\lambda)|^2\, |b_k|^2 = 0$ and $\log\big(1 - \nu^2\, |\tilde{a}_k(\lambda)|^2\, |b_k|^2\big)^{-1} \sim \nu^2\, |\tilde{a}_k(\lambda)|^2\, |b_k|^2$. Thereby, with $A$ a constant, it can be obtained:

\[
P\big[\hat{\gamma}_m^{cross}(\lambda) > x \,\big|\, T\big] \le \exp\big(-N^{1/2-\epsilon}\, (q\, x - A)\big) .
\]

Then, integrating with respect to any density $p_T(\cdot)$ of $T$ leads to:

\[
P\big[\hat{\gamma}_m^{cross}(\lambda) > x\big] = \int P\big[\hat{\gamma}_m^{cross}(\lambda) > x \,\big|\, T\big]\, p_T(T)\, dT \le \exp\big(-N^{1/2-\epsilon}\, (q\, x - A)\big) .
\]
This proves that, for any $\lambda_i$, $P\big[\hat{\gamma}_m^{cross}(\lambda_i) > x\big] \underset{N\to\infty}{\longrightarrow} 0$. It remains now to prove that $\max_{i\in I} \sup_{\lambda\in[\lambda_i,\lambda_{i+1}]} \big|\hat{\gamma}_m^{cross}(\lambda) - \hat{\gamma}_m^{cross}(\lambda_i)\big| \overset{a.s.}{\longrightarrow} 0$. This is left to the reader, as it follows the same proof as for $\chi_1$ of (24). We have thus shown that $P\big[\sup_{\lambda\in[0,2\pi)} \hat{\gamma}_m^{cross}(\lambda) > x\big] \underset{N\to\infty}{\longrightarrow} 0$.

4) Analysis of $\sup_{\lambda\in[0,2\pi)} \hat{\gamma}_m^{sign}(\lambda)$: The proof of convergence of this quantity follows the same principles. We have:

\[
\hat{\gamma}_m^{sign}(\lambda) = d_m^H(\lambda)\, \frac{M\,\delta^H\,\Gamma\,\delta\, M^H}{N}\, d_m(\lambda) .
\]

As previously, let $I_m$ be the $m \times m$ matrix containing $1$ everywhere and let $E(\lambda) = D_m(\lambda)\, I_m\, D_m^H(\lambda)$. Then:

\[
\hat{\gamma}_m^{sign}(\lambda) = \frac{2}{N}\, \mathrm{Re}\Big\{ \mathrm{Tr}\big( M\,\delta^H\,\Gamma\,\delta\, M^H E(\lambda) \big) \Big\} .
\]

Let $A(\lambda) = M^H E\, M\,\delta$ and $B = \Gamma\,\delta^H$ be two matrices respectively of size $p \times N$ and $N \times p$. Defining $a(\lambda) = \mathrm{vec}(A(\lambda))$ and $b = \mathrm{vec}(B)$, we have:

\[
\hat{\gamma}_m^{sign}(\lambda) = \frac{2}{N}\, \mathrm{Re}\big( \mathrm{vec}^T(A(\lambda))\, \mathrm{vec}(B) \big)
= \frac{2}{N}\, \mathrm{Re}\Big( a^T(\lambda)\, \big(M^H E M \otimes I_N\big)^{-T} \big(M^H E M \otimes I_N\big)^{T}\, \big(\Gamma \otimes I_p\big) \big(\Gamma \otimes I_p\big)^{-1}\, b \Big)
= \frac{2}{N} \sum_{k=1}^{Np} \mathrm{Re}(a_k(\lambda))\, \mathrm{Re}(b_k) - \mathrm{Im}(a_k(\lambda))\, \mathrm{Im}(b_k) .
\]

Let us define $C(\lambda) = M^H E M \otimes I_N$, $D = \Gamma \otimes I_p$, $\tilde{a}(\lambda) = C^{-1}(\lambda)\, a(\lambda)$, $\tilde{b} = D^{-1} b$, $c_k = \sum_{l=1}^{Np} \big(C^T(\lambda)\big)_{l,k}$ and $d_k = \sum_{s=1}^{Np} (D)_{s,k}$. Using Lemma 10 and the Markov inequality, it can be shown that, for any fixed $\lambda \in [0, 2\pi)$ and a constant $\mu$ such that $0 < \mu < \big(\sup_k |c_k(\lambda)|^2\, |d_k|^2\big)^{-1}$:

\[
P\big[\hat{\gamma}_m^{sign}(\lambda) > x\big] \le \exp\left(-N\left(\mu\, x - \sum_{k=1}^{Np} \log\big(1 - \mu^2\, |c_k(\lambda)|^2\, |d_k|^2\big)^{-1}\right)\right) .
\]

As the matrix $\Gamma$ is absolutely summable, for all $k$, $|d_k|^2 \le K$, where $K$ is a constant. Now, for all $k$, we have:

\[
|c_k(\lambda)| = \left| \sum_{s=1}^{p} \sum_{l=1}^{m} \sum_{j=1}^{m} M_{j,k}\, e^{i(j-l)\lambda}\, M_{l,s} \right| \le \sum_{s=1}^{p} \sum_{l=1}^{m} |M_{l,s}| \sum_{j=1}^{m} |M_{j,k}| .
\]
The columns of $M$ are absolutely summable. As $p$ is fixed and $p \ll N$, with $K$ a constant, we have $|c_k(\lambda)| \le K$. The coefficients of the matrix $\Gamma$ being absolutely summable, for all $k$ we can find a constant $K_1$ such that $|d_k| \le K_1$. By defining $w$ as a constant small enough and $\mu = \frac{w}{\sqrt{N}}$ such that $\mu^2\, |c_k|^2\, |d_k|^2 \underset{N\to\infty}{\longrightarrow} 0$, then, for all $x > 0$ and $A$ a constant, we have the following inequality:

\[
P\big[\hat{\gamma}_m^{sign}(\lambda) > x\big] \le \exp\big(-N^{1/2}\, (w\, x - A)\big) .
\]

As for $\hat{\gamma}_m^{cross}(\lambda)$, it remains to prove that $\max_{i\in I} \sup_{\lambda\in[\lambda_i,\lambda_{i+1}]} \big|\hat{\gamma}_m^{sign}(\lambda) - \hat{\gamma}_m^{sign}(\lambda_i)\big| \overset{a.s.}{\longrightarrow} 0$; this is left to the reader as it is the same as the proof for $\chi_1$. We have proven that $P\big[\sup_{\lambda\in[0,2\pi)} \hat{\gamma}_m^{sign}(\lambda) > x\big] \underset{N\to\infty}{\longrightarrow} 0$. As the right term of (21) tends to zero when $N$ tends to infinity, the proof of Theorem 1 is completed.

B. Proof of Theorem 3

The proof follows the same idea. With the notation $\check{C}_{FP} = \mathcal{T}(\hat{C}_{FP})$, where $\mathcal{T}$ is the Toeplitz operator defined in the introduction, the equation to prove becomes:

\[
\big\| \mathcal{T}(\hat{C}_{FP}) - E[v(\tau\gamma)\,\tau]\, C \big\| \overset{a.s.}{\longrightarrow} 0 . \tag{25}
\]

Let us consider the following notations:

• $\hat{S}$, the matrix such that $\|\check{\Sigma} - \hat{S}\| \overset{a.s.}{\longrightarrow} 0$, as stated in Theorem 2. As a reminder, $\hat{S}$ is the matrix defined by:
\[
\hat{S} = \frac{1}{N} \sum_{i=1}^{N} v(\tau_i\, \gamma)\, y_{wi}\, y_{wi}^H ,
\]
where $\gamma$ is the unique solution (if defined) of:
\[
1 = \frac{1}{N} \sum_{i=1}^{N} \frac{\psi(\tau_i\,\gamma)}{1 + c\, \psi(\tau_i\,\gamma)} ,
\]
where $g : x \mapsto \frac{x}{1 - c\,\phi(x)}$, $v : x \mapsto u \circ g^{-1}(x)$ and $\psi : x \mapsto x\, v(x)$.

• If $A = \mathcal{T}\big([a_0, \ldots, a_{m-1}]^T\big)$ is a Toeplitz matrix ($a_{-k} = a_k^*$), we can define its spectral density as:
\[
\gamma^{A}(\lambda) \overset{\Delta}{=} \sum_{k=1-m}^{m-1} a_k\, e^{i k \lambda} .
\]
Finally, we denote by $\hat{\gamma}^{A}(\lambda)$ the estimated spectral density of the Toeplitz matrix $A$.

This equation can be split as:

\[
\big\| \mathcal{T}(\hat{C}_{FP}) - E[v(\tau\gamma)\,\tau]\, C \big\|
\le \big\| \mathcal{T}(\hat{C}_{FP} - \hat{S}) \big\| + \big\| \mathcal{T}(\hat{S}) - E[v(\tau\gamma)\,\tau]\, C \big\| . \tag{26}
\]

To prove the consistency, we will decompose, as for Theorem 1, the equation (26) in two parts. As the matrices $\mathcal{T}(\hat{C}_{FP})$ and $C$ are Toeplitz, it follows through (17):

\[
\big\| \mathcal{T}(\hat{C}_{FP}) - E[v(\tau\gamma)\,\tau]\, C \big\|
\le \sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big| + \big\| \mathcal{T}(\hat{C}_{FP} - \hat{S}) \big\|
\le \chi_1 + \chi_2 ,
\]

where $\chi_1 = \sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big|$ and $\chi_2 = \big\| \mathcal{T}(\hat{C}_{FP} - \hat{S}) \big\|$.

1) Part 1: convergence of $\sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big|$: We will split $\chi_1$ into two sub-terms:

\[
\chi_1 = \sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big|
\le \sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - E\big[\hat{\gamma}^{\hat{S}}(\lambda)\big] \big|
+ \sup_{\lambda\in[0,2\pi)} \big| E\big[\hat{\gamma}^{\hat{S}}(\lambda)\big] - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big|
\le \chi_{11} + \chi_{12} ,
\]

where $\chi_{11} = \sup_{\lambda\in[0,2\pi)} \big| \hat{\gamma}^{\hat{S}}(\lambda) - E[\hat{\gamma}^{\hat{S}}(\lambda)] \big|$ and $\chi_{12} = \sup_{\lambda\in[0,2\pi)} \big| E[\hat{\gamma}^{\hat{S}}(\lambda)] - \gamma^{E[v(\tau\gamma)\tau]\, C}(\lambda) \big|$.

Part 1.1: convergence of $\chi_{11}$: We will need the following lemma:

Lemma 11.
\[
\hat{\gamma}^{\hat{S}}(\lambda) = d_m^H(\lambda)\, \hat{S}\, d_m(\lambda) , \tag{27}
\]
and:
\[
E\big[\hat{\gamma}^{\hat{S}}(\lambda)\big] = E[v(\tau\gamma)\,\tau]\, d_m^H(\lambda)\, C\, d_m(\lambda) , \tag{28}
\]
where $d_m(\lambda) = \frac{1}{\sqrt{m}}\, \big[1, e^{-i\lambda}, \ldots, e^{-i(m-1)\lambda}\big]^T$.

Proof: This is the same idea as for Lemma 5. First, we can write:

\[
\hat{\gamma}^{\hat{S}}(\lambda) = \sum_{k=1-m}^{m-1} \check{s}_k\, e^{i k \lambda}
= \frac{1}{mN} \sum_{l,l'=0}^{m-1} e^{-i(l'-l)\lambda} \sum_{n} \hat{s}_{l,n}\, \hat{s}^\star_{l',n}
= \frac{1}{mN} \sum_{k=1-m}^{m-1} e^{i k \lambda} \sum_{j,n} \hat{s}_{j,n}\, \hat{s}^\star_{j+k,n}\, \mathbb{1}_{0 \le j+k \le m} .
\]