Apr 30, 2011 - here of MarËcenko and Pastur [6] in 1967. Section III is dedicated to an introduction of random matrix theory, most specifically of the Stieltjes ...
1
Signal Processing in Large Systems: a New Paradigm
arXiv:1105.0060v1 [cs.IT] 30 Apr 2011
1
Romain Couillet1 and M´erouane Debbah2 Centrale-Sup´elec-EDF Chair on System Sciences and the Energy Challenge, Gif sur Yvette, France. 2 Alcatel Lucent Chair on Flexible Radio, Gif sur Yvette, Sup´ elec, France.
Abstract—For a long time, signal processing applications, and most particularly detection and parameter estimation methods, have relied on the limiting behaviour of test statistics and estimators, as the number n of observations of a population grows large comparatively to the population size N , i.e. n/N → ∞. Modern technological and societal advances now demand the study of sometimes extremely large populations, while simultaneously requiring fast signal processing due to accelerated system dynamics; this results in not-so-large practical ratios n/N , sometimes even smaller than one. A disruptive change in classical signal processing methods has therefore been initiated in the past ten years, mostly spurred by the field of large dimensional random matrix theory. The early literature in random matrix theory for signal processing applications is however scarce and highly technical. This tutorial proposes an accessible methodological introduction to the modern tools of random matrix theory and to the signal processing methods derived from them, with emphasis on simple illustrative examples.
I. I NTRODUCTION Most signal processing methods, e.g. statistical tests or parameter estimators, rely on some sort of asymptotic statistics of n observed random signal vectors [1]. This is because deterministic behavior often arises which simplifies the problem (or at least deterministic equivalents can be found) when n → ∞, as exemplified by the celebrated law of large numbers and the central limit theorem. With the increasing growth of system dimensions, understood generically by a system size N , and the need for even faster dynamics, the scenario n ≫ N is less and less met in practice so that large n properties are no longer valid, as will be demonstrated in Section II. This is the case for instance in array processing where a large number of antennas is used to detect incoming signals with short time stationarity properties (e.g. radar detection of fast moving objects) [2]. Other examples are found for instance in mathematical finance where numerous stocks show price index correlation over short observable time periods [3] or in evolutionary biology where the joint presence of multiple genes in the genotype of a given species is analyzed from scarce DNA samples [4]. This tutorial introduces recent tools, among which a new statistical inference framework, to cope with the n ≃ N limitation of classical approaches. These tools are based on the study of large dimensional random matrices, which originates from the works of Wigner [5] in 1955 and more importantly here of Mar˘cenko and Pastur [6] in 1967. Section III is dedicated to an introduction of random matrix theory, most specifically of the Stieltjes transform method, as well as to
statistical inference methods deriving from random matrix results. The aforementioned new statistical inference framework uses random matrix theory jointly with complex integration methods [7] and is often known as G-estimation, as it is based on the G-estimators of Girko [8]. Since the tools presented in Section III are expected to be rather new to the nonexpert reader, we elaborate mainly on toy examples rather than on a list of theoretical results to introduce the methods. More realistic application examples, taken from the still very scattered literature, are then presented in Section IV in order to further familiarize the reader with the introduced concepts. These examples span from signal detection in array processing to failure localisation in large dimensional networks. Finally, Section V draws the main conclusions and perspectives of this new emerging field. In the appendix, we also give a helpful reminder of the mathematical prerequisites needed for better understanding. These prerequisites are mainly on matrix calculus, see e.g. [9], [10], notions of convergence in probability theory, see e.g. [11], and on basic elements of complex analysis, see e.g. [7]. Notations: In the following, boldface lowercase (uppercase) characters stand for vectors (matrices). The matrix IN is the identity matrix of size N × N . The notations (·)T and (·)H denote transpose and Hermitian transpose, respectively. a.s. The notations −→ and ⇒ denote almost sure and weak convergence of random variables, respectively. The norm k · k will be understood as the Euclidean norm for vectors and the spectral norm for Hermitian matrices (kXk = maxi |λi | for X with eigenvalues λ1 , . . . , λN ). II. L IMITATIONS OF
CLASSICAL SIGNAL PROCESSING
A. The Mar˘cenko-Pastur law Consider x ∈ CN a complex random variable with zero mean and covariance matrix R. If x1 , . . . , xn are n independent realizations of x, then, from the law of large numbers, a.s.
kRn − Rk −→ 0 as n → ∞ and
(1)
n
Rn ,
1X xi xH i . n i=1
Here Rn is often called the sample covariance matrix, obtained from the observations x1 , . . . , xn of the random variable x with population covariance matrix E[xxH ] = R. The convergence (1) suggests that for n sufficiently large, Rn is
2
k=1
Empirical eigenvalues Mar˘cenko-Pastur density
0.8
0.6 Density
an accurate estimate of R. The natural reason why R can be asymptotically well approximated is because Rn originates from a total of N n ≫ N 2 observations, while the parameter R to be estimated is of size N 2 . For further use, we will denote X = [x1 , . . . , xn ] with (i, j) entry Xij , and realize that with this notation, Rn = n1 XXH . Suppose now for simplicity that R = IN . If the relation N n ≫ N 2 is not met in practice, e.g. if n cannot be taken very large compared to the system size N , then a peculiar behaviour of Rn arises, as both N, n → ∞ while N/n does not tend to zero. Observe that the entry (i, j) of Rn is given by n 1X ∗ Xik Xkj [Rn ]ij = n
0.4
0.2
0 0
0.5
1
∗ E[Xik Xkj ]
with Xij the entry (i, j) of X. Since R = IN , = δij . Therefore, the law of large numbers ensures that [Rn ]ij → 0 if i 6= j, while [Rn ]ii → 1, for all (i, j). Since N grows along with n, we might then say that Rn converges point-wise to an “infinitely large identity matrix”. This stands, irrelevant of N , as long as n → ∞, so in particular if n = N/2 → ∞. However, under this hypothesis, Rn is of maximum rank N/2 so is rank-deficient.1 Altogether, we then have that Rn converges point-wise to an “infinite-size identity matrix” which has the property that half of its eigenvalues equal zero. The convergence (1) therefore clearly does not hold here as the spectral norm of Rn − IN is at least greater than 1 for all N, n. This convergence paradox (convergence of Rn to an identity matrix with zero eigenvalues) stands obviously from an incorrect reasoning arising when taking the limits N, n → ∞. Indeed, the convergence of matrices with increasing sizes only makes sense if a proper measure, here the spectral norm, in the space of such matrices is considered. The outcome of our previous observation is then that the random eigenvalue distribution F Rn of Rn , defined as F Rn (x) =
N 1 X 1x≤λi N i=1
with λ1 , . . . , λN the eigenvalues of Rn , does not converge (weakly) to a Dirac mass measure in 1, and this is all what can be said to this point. This remark has fundamental consequences in basic signal processing detection and estimation procedures, some of which are discussed hereafter and precisely studied in Section IV. In fact, although the eigenvalue distribution of Rn does not converge to a Dirac in 1, it turns out that in many situations, it does converge towards a limiting distribution, as N, n → ∞, N/n → c (in the sense of weak convergence of probability measures; see Appendix B or [12]). In the particular case where R = IN , the limiting distribution is known as the Mar˘cenko-Pastur law, initially exhibited by Mar˘cenko and Pastur in [6]. This distribution is depicted in Figure 1 and compared to the empirical simulation of a matrix Rn of large dimensions. The shape of the Mar˘cenko-Pastur law for different limiting ratios c is then depicted in Figure 2. From Figure 1, we observe that the empirical distribution 1 Indeed,
Rn =
2 N
PN/2 i=1
xi xH i , which is the sum of N/2 rank-1 matrices.
1.5
2
2.5
3
Eigenvalues of Rn
Fig. 1. Histogram of the eigenvalues of Rn = CN(0, IN ), for n = 2000, N = 500.
1 n
Pn
k=1
xk xH k , xk ∼
of the eigenvalues is a good match to the limiting distribution for N, n large. It is also observed that no eigenvalue seems to leave the support of the limiting distribution even for finite N, n; this fundamental property can be proved to hold for any matrix X with independent and identically distributed (i.i.d.) zero mean and unit variance entries under some mild moment assumption. We can also see in Figure 2 that, as c → 0, the support of the Mar˘cenko-Pastur law tends to concentrate into a single mass in 1, as expected by the discussion above for finite N , while it tends to spread and reach 0 when c is large, which is compliant with the existence of a mass of eigenvalues at 0 when N < n, i.e. c > 1. The methodology employed to derive the Mar˘cenko-Pastur law and its generalization to more advanced random matrix models is described in Section III as an example of use of one of the most fundamental tools of random matrix theory: the Stieltjes transform. The main implication of the above observations on signal processing methods is that, in general, the tools developed in view of application for small N and large n are not adequate if N is taken much larger in the first place, or if n cannot be afforded to be too large. In the remainder of this section, we introduce a practical signal processing example which assumes the setup n ≫ N and which can be proved asymptotically biased if N/n does not converge to zero instead. B. Multi-source power inference This example deals with the power inference of multiple signal sources embedded in white noise. We model the problem by assuming that, at time t, a sensor array of dimension N receives signals from K sources equipped withPn1 , . . . , nK K transmit antennas, respectively. We denote n = i=1 ni and assume n ≤ N . The receive data vector at time t is denoted y(t) ∈ CN and reads y(t) =
K X √ ˜ i xi (t) + σw(t) P iU i=1
3
1.2 c = 0.1 c = 0.2 c = 0.5
0.8 Density
Density fc (x)
1
Empirical eigenvalue distribution Limit law (from Theorem ??)
0.6
0.6 0.4
0.4
0.2
0.2 0
0 0
0.5
1
1.5
2
2.5
3
x
Fig. 2.
Mar˘cenko-Pastur density fc for different limit ratios c = lim N/n.
where xi (t) ∈ Cni is the signal sent by source i, assumed such that xi (t) ∼ CN(0, Ini ), Pi is the power of source i according ˜ = [U ˜ 1, . . . , U ˜ K] ∈ ˜ i ∈ CN ×ni such that U to the sensors, U N ×n C is a mixing matrix formed by n columns of an N × N unitary matrix U,2 and σw(t) ∼ CN(σ 2 , IN ) is an additive noise. It is clear here that E[y(t)y(t)H ] = UPUH + σ 2 IN where P = diag(P1 , . . . , P1 , . . . , PK , . . . , PK , 0, . . . , 0) ∈ CN ×N , with each Pi of multiplicity ni and zero of multiplicity N − n. Similar to above, we then consider M successive independent observations y(1), . . . , y(M ) of the process y(t) and we denote Y = [y(1), . . . , y(M )]. We then have that, as M → ∞,
1
H a.s. H 2
(2)
M YY − σ IN − UPU −→ 0
where the convergence is in spectral norm, and therefore, since the eigenvalues of UPUH are the same as those of P, an M consistent estimator Pˆk∞ for Pk reads 1 X (λi − σ 2 ) (3) Pˆk∞ = nk i∈Nk
1 YYH where λ1 < . . . < λN are the ordered eigenvalues of M PK PK and Nk = {N − i=k ni + 1, N − i=k+1 nk }. Indeed, the 1 eigenvalues of M YYH indexed by Nk can be directly mapped to those with same indexes in P. For N and M of comparable sizes however, the convergence (2) no longer holds. In this scenario, the spectrum of 1 H converges weakly to the union of a maximum of M YY K + 1 compact clusters of eigenvalues concentrated somewhat around σ 2 , P1 + σ 2 , . . . , PK + σ 2 , but whose masses are not centered at these points. This is depicted in Figure 3 for σ = 0, where it is clearly seen that eigenvalues gather around 2 This assumption is made here for simplicity of exposition and approxi˜ with Gaussian i.i.d. entries mates to some extent the scenario of a matrix U if n ≪ N .
1
3
7 Eigenvalues
1 Fig. 3. Histogram of the eigenvalues of M YYH for N = n = 300, M = 3000, with P diagonal composed of three evenly weighted masses in 1, 3 and 7, σ = 0.
the empirical values of P1 , P2 , P3 in clusters. It turns out that, as N, M → ∞, the estimator Pˆk∞ is not consistent. For this reason, more advanced analysis on the spectrum 1 YYH must be made, from which (N, M )-consistent of M alternatives to (3) can be derived. This is the objective of this article, which develops a methodological approach to (N, M )consistent estimators, call G-estimators, in particular through the generalization of (3). III. L ARGE DIMENSIONAL RANDOM MATRIX THEORY Random matrix theory studies the properties of matrices whose entries follow a joint probability distribution. As such, this field groups mainly the study of small size matrices with joint Gaussian entries [13], [14], [15], the study of small and large random matrices with invariance properties (free probability theory [16], [17], combinatorics [18], [19], Gaussian methods [20], [21]) and the study of large random matrices with independent entries [6], [22], [23]. In the sequel, we will mostly concentrate on the latter, most popular, approach, although some results we present originate from other techniques. The main tool of interest in the spectral study of large dimensional random matrices with independent entries is the Stieltjes transform, which we introduce hereafter. A. The Stieltjes transform Similar to the much celebrated Fourier transform in classical probability theory and signal processing which allows one to perform simpler analysis in the Fourier (or frequency) domain than in the initial (or time) domain, spectral analysis of large dimensional random matrices is conveniently analyzed through the Stieltjes transform: Definition 1 (Stieltjes transform): Let F be a real probability distribution function and z ∈ C taken outside the support of F . Then the (Cauchy-)Stieltjes transform mF (z) of F at point z is defined as Z 1 dF (t). mF (z) , t−z
4
Note importantly that, for X ∈ CN ×N Hermitian with eigenvalues λ1 , . . . , λN , the Stieltjes transform mX of (the eigenvalue distribution F X of) X is mX (z) =
Z
N dF X (t) 1 X 1 1 = = tr(X − zIN )−1 t−z N λk − z N k=1
where the last equality is valid since there exists U ∈ CN ×N unitary such that UXUH = diag(λ1 , . . . , λN ) and clearly tr UXUH = tr X. This remark is essential when it comes to studying the properties of the eigenvalues of X, as it is relatively easy to manipulate matrices of the type (X−zIN )−1 thanks to inversion lemmas. The Stieltjes transform, in the same way as the Fourier transform, has an inverse formula, given as follows F (b) − F (a) = lim
y→0
Z
We will now apply the lemmas above to derive a sketch of the proof of the Mar˘cenko-Pastur law in a very condensed way. We remind that we wish to determine the form of the limiting distribution, given in Figure 2, of the eigenvalue distribution of XXH , as N, n → ∞, N/n → c, where X = [x1 , . . . , xn ] ∈ CN ×n has i.i.d. entries of zero mean and variance 1/n.3 Since we want to use the Stieltjes transform method, we want here to evaluate N1 tr(XXH − zIN )−1 . The idea is to look at the value of each individual diagonal entry (i, i) of the matrix (XXH − zIN )−1 . Let us start with the entry (1, 1). Denoting XH = [y YH ] ∈ Cn×N , the block-matrix inversion lemma and some matrix manipulations, see Appendix A or [9], give " −1 # yH y − z yH Y H H −1 (XX − zIN ) 11 = Yy YYH − zIN −1
11
b
mF (x + iy)dx
(4)
1 = −z − zyH (YH Y − zIn )−1 y
a
although this formula is rarely used in practice. What this says for random matrices is that, if the Stieltjes transform of a matrix X is known, then one can retrieve the eigenvalue distribution of X, which is often more difficult to derive directly. Since we will constantly deal with matrices of the form (X − zIN )−1 , we need in the following fundamental random matrix inverse identities, used almost exclusively in the core of the major theorem proofs, and that can be found in e.g. [24], [25]. Lemma 1 (Trace lemma): Let A ∈ CN ×N be a nonnegative definite Hermitian matrix with uniformly bounded spectral norm over N and x ∈ CN be a vector independent of A with i.i.d. entries of zero mean, variance 1/N and eighth order moment of order O(1/N 4 ). Then, as N → ∞, xH Ax −
B. Spectrum of large matrices
1 a.s. tr A −→ 0. N
To convince oneself, observe that the result clearly holds in expectation and that the law of large numbers suggests that nondiagonal entries of A won’t play a role asymptotically. This lemma will be used when inversion lemmas will allow us to extract a column x of a given random matrix X = [x Y] with independent columns, the remaining matrix Y being left indea.s. pendent of x, and in particular xH YYH x − N1 tr YYH −→ 0. Lemma 2 (Rank-one perturbation lemma): Let X ∈ CN ×N be a nonnegative Hermitian matrix, v ∈ CN a vector and z ∈ C \ R+ (where the inverse matrices are well defined). Then 1 1 tr(X + vvH − zIN )−1 − tr(X − zIN )−1 → 0 N N as N → ∞. This lemma will be used in conjunction with Lemma 1 to show that, with the example X = [x Y] above, N1 tr(YYH − zIN ) − N1 tr(XXH − zIN ) → 0. In the following, the derivation of the spectrum of large matrices is performed using the Stieltjes transform method.
(5)
From there, we want to recover from yH (YH Y − zIn )−1 y an expression of the Stieltjes transform mXXH (z) of XXH . This is done in two steps. It is clear that y is independent of Y, and therefore, using Lemmas 1-2 and the associated discussions, as N, n → ∞, yH (YH Y − zIn )−1 y −
1 a.s. tr(XH X − zIn )−1 −→ 0 n
(6)
which is not exactly what we wanted, since we obtained the Stieltjes transform of XH X instead of that of XXH . However, noticing that the eigenvalue distribution of both only differ by a mass of |n − N | eigenvalues at 0, it is immediate that 1 N 1 N −n1 tr(XH X − zIn )−1 = tr(XXH − zIN )−1 + . n nN n z (7) This value is independent of the particular choice of (1, 1) as the studied diagonal entry of (XXH − zIN )−1 , and then the same result stands for any diagonal entry. In the limit, we can now replace yH (YH Y − zIn )−1 y in (5) by n1 tr(XH X − zIn )−1 , according to (6), itself being expressed as a function of N1 tr(XXH − zIn )−1 , according to (7). This gives in the end 1 (XXH − zIN )−1 11 − 1 N 1 − n − z − z n tr(XXH − zIn )−1 a.s.
−→ 0.
Averaging the above relation for each diagonal entry of (XXH −zIN )−1 and taking N, n → ∞, N/n → c, we deduce that the Stieltjes transform mXXH (z) , N1 tr(XXH − zIn )−1 has a limit which is solution to the following equation in m m−
1 = 0. 1 − c − z − zcm
(8)
Quite often, this type of form is as far as the Stieltjes transform method will lead us and we need to use fixed-point algorithms 3 We should add for completion here that the entries of X have eighth order moment of order 1/N 4 , but this can be relaxed [22].
5
to solve (8). Here, it turns out that this is a quadratic form which can be solved explicitly in m. Precisely, we obtain p 1 mF (z) = 1 − c − z + (1 − c − z)2 − 4zc . (9) 2zc
Using the inverse Stieltjes transform formula (4), we finally obtain the desired distribution with density 1 p (x − a)(b − x) fc (x) = (1 − c−1 )+ δ(x) + 2πcx √ √ with a = (1 − c)2 and b = (1 + c)2 , the left and right edges of the support. The same technique can be applied to a wide variety of similar models, and in particular to the sample covariance matrix model. This is given by the following result [22], 1 1 Theorem 1: Consider the matrix BN = T 2 XXH T 2 ∈ N ×N N ×n C , where X ∈ C has i.i.d. entries with zero mean and variance 1/n, and T ∈ CN ×N is nonnegative Hermitian whose eigenvalue distribution F T converges weakly and almost surely to F T as N → ∞.4 Then, the eigenvalue distribution of BN converges weakly and almost surely to F such that, for z ∈ C+ , mF (z) = cmF (z) + (c − 1) 1z , where mF (z) is the unique solution with positive imaginary part of the equation in m −1 Z t T m=− z−c dF (t) 1 + tm −1 1 c 1 − mT (−1/m) (10) =− z− m m with mT (z) the Stieltjes transform of T .5 From T given, mF (z) can be explicitly estimated in z = x + iy, with y > 0 small and for all x > 0. Using the inverse Stieltjes transform formula (4), we then obtain a good approximation of F , which is the technique we used to produce Figure 3. It is therefore possible to derive the limiting eigenvalue distribution of a wide range of random matrix models or, if a limit does not exist, to derive deterministic equivalents for the large dimensional matrices. For the latter, the eigenvalue distribution F BN of BN can be approximated for each N by a deterministic distribution FN such that F BN − FN ⇒ 0 almost surely. These distribution functions FN usually have a very practical form for analysis; see e.g. [27], [26], [28] for examples of random matrix models that do not admit limits but only deterministic equivalents. The knowledge of the large dimensional distribution of such matrices is not the specific object of interest to signal processing, but it is a required prior step before performing statistical inference and detection methods. It must be noted instead that, for wireless communication problems, such as the capacity C = log det(IN + ρHHH ) = Rthat of evaluatingHH H log(1 + ρt)dF (t) of a multi-antenna channel H, these 4 See
Appendix B for notions of convergence in probability. is easy to realize that mF (z) is in fact the limiting Stieltjes transform of the matrix BN = XH TX. We mention also that in [22], the exact model is BN = A + XH TX, with T diagonal. For A = 0, T is not required to be diagonal, see e.g. the extension [26]. 5 It
tools alone are fundamental, see e.g. [28], [26], [29], [30], [31]. In the following, we pursue our study of spectral properties of large dimensional random matrices to applications in statistical inference. C. G-estimation The G-estimation method, originally due to Girko [8], intends to provide (N, n)-consistent estimators based on large N × n random matrix observations. The general technique consists in a three-step approach, along the following lines. First, we write the parameter to be estimated, call it θ, under the form of a functional of the Stieltjes transform of a deterministic hidden matrix in the model, e.g. θ = f (mT ) for a given deterministic T matrix with Stieltjes transform mT . Then, mT is connected to the Stieltjes transform mBN of an observed matrix BN through a formula similar to (10), e.g. mT = limN g(mBN ) for a certain function g. Finally, connecting the pieces together, we have a link between the parameter of interest and the observations, this link being correct only asymptotically. For finite N, n dimensions, this produces naturally an (N, n)-consistent estimator by approximating limN g(mBN ) by g(mBN ). The Stieltjes transform is fundamental here as it can connect observations to deterministic values in the system. The other important tool, that will connect the parameter θ to the Stieltjes transform mT , is complex analysis. We give a concrete example here in the context of the estimation of the source powers discussed in Section II-B. We recall that we observe a matrix of the type Y = 1 P 2 UX+σW, where U ∈ CN ×N is unitary, X, W ∈ CN ×M are filled with i.i.d. entries with zero mean and unit variance and P is diagonal with n1 entries equal to P1 , . . . , nK entries equal to PK and N − n entries equal to zero. All Pi ’s are supposed distinct and non zero. Our objective is to infer P1 , . . . , PK from the observation Y. For this, we make the following first observation. From Cauchy’s complex integration formula (see Appendix C or [7]), I 1 z Pk + σ 2 = − dz 2πi Ck (Pk + σ 2 ) − z for a complex positively oriented contour Ck circling once around Pk +σ 2 . If Ck does not enclose any of the other Pi +σ 2 , i 6= k, then it is also true, again from Cauchy’s integration formula, that I K z 1 X 1 N 2 ni dz Pk + σ = − 2πi nk Ck N i=0 (Pi + σ 2 ) − z I 1 N zmP +σ2 (z)dz (11) =− 2πi nk Ck where we denoted mP +σ2 (z) = mP+σ2 IN (z). So we managed to write Pk as a function of the Stieltjes transform of the nonobservable matrix P + σ 2 IN . The next step is now to link 1 YYH (or in fact of mP +σ2 to the Stieltjes transform of M 1 H N Y Y). For this, we use Theorem 1 (the columns of Y being Gaussian with zero mean and covariance matrix T = UH PU + σ 2 IN , unitarily equivalent to T′ = P + σ 2 IN ),
6
which states that, as N, M → ∞ with N/M → c, assuming ni /N = ci constant (to maintain mP+σ2 IN constant for all N ), the spectrum of N1 YH Y converges weakly and almost surely to F with Stieltjes transform mF solution of the equation in m 1 c 1 1 − mP +σ2 (−1/m) . = −z + m m m To connect Pk to mF (z) (and hence approximately to Y), we proceed to the variable change z = −1/mF (u) in (11), dz = m′F (u)/mF (u)2 du 1 1 mP +σ2 (−1/mF (u)) = mF (u) 1 − − umF (u) . c c Replacing in (11), we then obtain # ′ I " mF (u) m′F (u) 1 1 1 2 Pk = −σ + du + −1 u c 2πi C′k mF (u) c mF (u)2 for a given contour C′k to be discussed later. Since m′F (u)/mF (u)2 is the derivative of −1/mF (u), it integrates to zero on closed contours. Therefore, we end up with I m′F (u) 1 1 2 Pk = −σ + du. u c 2πi C′k mF (u) It can be proved, see [32], that, upon separation of the cluster corresponding to Pk in the spectrum of F from the neighboring clusters, a condition referred to as cluster separability (this is the case in Figure 3), if C′k is chosen to circle around this cluster exactly, then Ck (its preimage by −1/mF + σ 2 ) circles exactly around Pk (the converse is not true). The separability condition is essential here. If not met, the estimator cannot be derived with this technique. The link between Pk and Y can now be established from the fact that, denoting m(z) ˆ the Stieltjes transform of N1 YH Y, a.s. m(u) ˆ −→ mF (u) (from Theorem 1). As one expects, it can then be proved (by dominated convergence arguments [7]) that I 1 1 m ˆ ′ (u) a.s. du −→ Pk + σ 2 . u c 2πi C′k m(u) ˆ But then, m(u) ˆ and m ˆ ′ (u) are rational functions, so that the complex integral can be solved by residue calculus (see Appendix C or [7]). Precisely, we see that the only poles of 1 YYH and the roots the integrand are the eigenvalues λi of M of m, ˆ call them the µi ’s, lying in the interior of C′k . This is because, for z ∼ λi , the integrand behaves like z/(λi − z) and for z ∼ µi , the integrand clearly grows infinite. Also, it can be shown that λi < µi+1 < λi+1 for each i so that each λi in the contour C′k comes along with a µi in the contour. After precise determination of the poles asymptotically found in C′k (which is a consequence of an exact separation result [33] to be introduced in Section III-D), we obtain [23], [32] 1 X a.s. (λm − µm ) −→ Pk + σ 2 ck m∈Nk
with Nk defined as previously and µ1 < . . . < µN are √ √ T 1 λ λ , where the ordered eigenvalues of diag(λ) − M
λ = (λ1 , . . . , λN )T . We therefore obtain a natural (N, M )consistent estimator Pˆk for Pk by 1 X Pˆk = −σ 2 + (λm − µm ). (12) ck m∈Nk
Note that, although the summation is over indexes in the clus1 YYH , ter corresponding to Pk in the ordered spectrum of M the variables µi depend in a non trivial manner on all λi . Multiple estimators can be derived similarly for different types of models and problems, a large number of which have been investigated by Girko and named, after him, the G-k estimators, where k spans from one to more than fifty [34]. In Section IV, some examples relying on the method developed above will be introduced. We now turn to a finer analysis of random matrices which is concerned, no longer with the weak limit of eigenvalue distributions, but with the weak limit of fluctuations of individual eigenvalues (in particular the extreme eigenvalues). D. Extreme eigenvalues It is important to note at this point that Theorem 1 (whose corollary for T = 1x≤1 is the Mar˘cenko-Pastur law) and all theorems deriving from the Stieltjes transform method briefly mentioned in Section II, only provide weak convergence of the empirical eigenvalue measure to a limit F but do not say anything about the individual eigenvalue behaviour. In particular, the fact that the support of F is compact does not mean that the empirical eigenvalues asymptotically all lie in this compact support. Indeed, consider for instance the distribution FN = NN−1 1x≤1 + N1 1x≤2 . Then the largest value taken by a random variable distributed as FN equals 2 for all N although it is clear that FN converges weakly to F = 1x≤1 (and then the support of F is the singleton {1}). In the context of Theorem 1, this reasoning indicates that, although the weak limit of F BN is compactly supported, the largest eigenvalue of F BN may always be found outside the support. As √ a matter of fact, it is proved in [35] that, if the entries of nX in Theorem 1 have infinite fourth order moment, then almost surely the largest eigenvalue of XXH tends to infinity. For non-degenerate scenarios though, i.e. when the fourth moment of the scaled entries is finite, we however have the following fundamental result [36], [37]. Theorem 2 (No eigenvalue outside the support): Let BN be defined as in Theorem 1 with X having entries with fourth order moment of order O(1/N 2 ) and with kTN k, TN = T, uniformly bounded over N . We remind that F BN ⇒ F for F defined in Theorem 1. Denote F TN the eigenvalue distribution of TN (formed of N Dirac masses) and call FN the weak limit of F BN if the series {F T1 , F T2 , . . .} were to converge to F TN . For any ε > 0, take now [a − ε, b + ε], a, b ∈ (0, ∞] a fixed interval outside the union SN0 of the supports of FN for all N ≥ N0 . Then, almost surely, there is no eigenvalue of BN found in [a − ε, b + ε] for all large N . This result says in particular that, for TN = IN , and for all large N , there is no eigenvalue of BN found any ε away from the support of the Mar˘cenko-Pastur law. This further generalizes to the scenario where TN is made of a
7
few masses, in which case F is formed of multiple compact clusters. In this context, it is proved in [33] that the number of eigenvalues asymptotically found in each cluster matches exactly the multiplicity of the corresponding mass in TN (assuming cluster separability). This is often referred to as exact separation. Note that this point fully justifies the last step in the derivation of the estimator (12). These results, as will be seen in Section IV, allow us to derive new hypothesis tests for the detection of signals embedded in white noise. That is, tests that decide between the hypotheses TN = IN and TN 6= IN . Based on the earlier reasoning on the weak limit interpretations, we insist that, while Theorem 1 does state that F TN ⇒ 1x≤1 implies that F is the Mar˘cenko-Pastur law, it does not state that F TN ⇒ 1x≤1 , kTN k bounded, implies that no eigenvalue is asymptotically found away from the support of the Mar˘cenko-Pastur law. Indeed, if TN = diag(1, . . . , 1, a), with a > 1, then the theorem only states that no eigenvalues are found away from the union of the support of the Mar˘cenkoPastur law and any compact containing the largest eigenvalue of BN (found in all spectra FN ). This remark is fundamental and has triggered a recent interest for this particular model for TN , called the spike model, which we study presently. E. Spike models Under the generic denomination of “spike models” are gathered all random matrix models showing small rank perturbations of some classical random matrix models. For instance, for X ∈ CN ×n with i.i.d. entries with zero mean and unit variance, the matrices BN = (X + P)(X + P)H , P ∈ CN ×n such that PPH is of small rank K, or BN = 1 1 (IN + P) 2 XXH (IN + P) 2 , P ∈ CN ×N of small rank K, fall into the generic scheme of spike models. For all these models, Theorem 1 and similar results, see e.g. [38], claim that F BN converges weakly to the Mar˘cenko-Pastur law. The interest here though is on the study of the extreme eigenvalues of BN . The first remark concerning these spike models is that, to each eigenvalue (or singular value) of P corresponds an eigenvalue of BN asymptotically found either inside or outside the support of the Mar˘cenko-Pastur law. The condition for this eigenvalue to be found inside or outside the support of the limiting law depends on the amplitude of the eigenvalue of P under consideration and on the limiting ratio c. There therefore exists an asymptotic phase transition effect which rules the fact that remote eigenvalues of BN are found or not away from the base support, for all large N . This has important consequences related to decidability in large dimensional hypothesis tests. To carry on on the same line of study, we concentrate here on BN modeled as in Theorem 1, with TN a small rank perturbation of the identity matrix. Similar properties can be found for other models in e.g. [39], [40]. Our first result deals with first order limits of eigenvalues and eigenvector projections of the spike model [41], [39], [42]. Theorem 3: Let BN be defined as in Theorem 1, X have i.i.d. complex Gaussian entries, and TN = T = IN +
PK
ω i ui uH i for K fixed, ω1 > · · · > ωK > 1. Denote λ1 ≥ . . . ≥ λN the ordered eigenvalues of BN and uN,i ∈ CN the eigenvector with eigenvalue λi . Then, for 1 ≤ i ≤ K, √ √ , if ωi ≤ c (1 + c)2 a.s. √ λi −→ i ρi , 1 + ωi + c 1+ω , if ωi > c. ωi i=1
and a.s. |uH i uN,i | −→
(
0 ξi ,
1−cωi−2 1+cωi−1
√ c √ , if ωi > c.
, if ωi ≤
Although this was not exactly the approach followed initially in [41], to prove this result, the same Stieltjes transform and complex integration framework can be used. Let us con√ sider for simplicity the case where P = ωuuH and ω > c, and let us study the extreme eigenvalues and eigenvector projections of BN . 1) Extreme eigenvalues: By definition of an eigenvalue, det(BN − zIN ) = 0 for z an eigenvalue of BN . Some algebraic manipulation based on determinant product formulas and the Woodbury’s identity (see Appendix A) then leads to det(BN − zIN ) = fN (z) det(IN + P) det(XXH − zIN ) ω for fN (z) = 1 + z 1+ω uH (XXH − zIN )−1 u. For X i.i.d. Gaussian (and therefore invariant by left- or right-product with unitary matrices), the trace lemma, Lemma 1, suggests that ω a.s. fN (z) −→ f (z) , 1 + z mF (z) 1+ω for mF the Stieltjes transform of the Mar˘cenko-Pastur law. Now, from Theorem 2, for all N large, almost √ surely, det(XXH√−zIN ) = 0 has no solution for z > (1+ c)2 +ε or z < (1− c)2 −ε for any given ε > 0. The eigenvalues of BN are also the solutions to fN (z) = 0, therefore asymptotically to f (z) = 0. Substituting mF (z) by its exact formulation (9), we then√ obtain that f (z) = √ 0 is equivalent to z = 1+ω+c 1+ω ω if ω > c, and z > (1 + c)2 in this case, which gives the expected result. 2) Eigenvector projections: For the result of eigenvector projection, residue calculus states that, for any deterministic vectors a, b ∈ CN and for all large N , I 1 H H a uN,1 uN,1 b = − aH (BN − zIN )−1 b (13) 2πi C
where C is a positively oriented contour circling around ω and not containing 1 (the only zero of BN − zIN inside C is in ω, with residue uN,1 uH N,1 ). Using the same matrix manipulations as for the eigenvalue approach, we further obtain I 1 1 −1 aH (IN + P)− 2 Q(z)(IN + P)− 2 dz aH uN,1 uH b = N,1 2πi C I 1 ω g(z)dz + zfN (z)−1 2πi C 1+ω
where Q(z) = (XXH − zIN )−1 and 1
1
g(z) = aH (IN + P)− 2 Q(z)uuH Q(z)(IN + P)− 2 b. The √ first right-hand side term has again no residue for z > (1+ c)2 +ε for any ε > 0, so no pole in C (almost surely and for all large N ). The second term has a pole where fN (z) = 0,
8
so again asymptotically for z = ρ , 1 + ω + c 1+ω ω . Explicit residue calculus finally gives a.s.
aH uN,1 uH N,1 b −→
ρmF (ρ)2 1 aH uuH b 1 + ω mF (ρ) + ρm′F (ρ)
which, after explicit replacement of mF and m′F , taking a = −2 b = u, gives 1−ω 1+ω −1 , as required. The generalization of both results to P of rank K ≥ 1 is just a matter of further algebraic calculus. Details and precise convergence proofs are available in [42]. We see from Theorem 3 that, in a signal plus noise model where the signal is represented by a matrix P = ωuuH , if the observation window n is such that N/n > ω, then asymptotically no eigenvalue is found outside the support of the Mar˘cenko-Pastur law and therefore no decision can be based on the observation of λ1 . If, on the opposite, ω > N/n, then an eigenvalue will be found away from the support and the presence of a signal can be detected with high probability. However, signal detection tests in finite N horizons require more than first order limits on the decision variables. Recent works have provided further statistics for the results of Theorem 3, as well as for the fluctuations of the estimator (12), which we introduce presently. F. Fluctuations We first mention that the fluctuations of functionals of F BN − F in Theorem 1 have been derived in [43]. Precisely, for any well-behaved function f and under some mild technical assumptions, Z N f (t)d[F BN − F ](t) has a Gaussian limit with zero mean and a known variance. This in particular applies to f (t) = (t − z)−1 for z ∈ C \ R+ , which gives the asymptotic fluctuations of the Stieltjes transform of BN . From there, classical asymptotic statistics tools such as the delta-method [1] allow us to transfer the Gaussian fluctuations of the Stieltjes transform mBN to the fluctuations of any function of it, e.g. estimators based on mBN . The following result on the fluctuations of the estimator (12) therefore comes with no surprise [44]. Theorem 4: Consider the power inference scenario leading to the estimator Pˆk in (12). Assuming the entries of X Gaussian and that the cluster separability condition for Pk is met, N (Pˆk − Pk ) ⇒ N 0, σ 2 where σ 2 is given by
1 σ2 = − 2 2 2 4π c ck I I m′ (z1 )m′ (z2 ) 1 dz1 dz2 − 2 2 m(z )m(z ) ′ ′ (m(z ) − m(z )) (z − z ) 1 2 1 2 1 2 Ck Ck with the contour C′k enclosing the cluster corresponding to Pk in the limiting spectrum F of N1 YH Y, and m(z) = mF (z). For practical applications, Theorem 4 allows for an evaluation of the performance of the estimator Pˆk of (12).
In the case of the spike model, it is of interest to evaluate the fluctuations of the extreme eigenvalues and eigenvector projection, whose limits were given in Theorem 3. Surprisingly, the fluctuations of these variables are not always Gaussian, [45], [46], [42] Theorem 5: Let B√ N be defined as in Theorem 3. Then, for 1 ≤ k ≤ K, if ωk < c, √ 2 2 λk − (1 + c) 3 N √ 4 √ ⇒ T2 3 (1 + c) c where T2 is known as the complex √ Tracy-Widom distribution [47]. If, on the opposite, ωk > c, then H √ |uk uN,k |2 − ξk ⇒ N (0, Σk ) N λk − ρk where Σk =
2 c2 (1+ωk )2 c (1+ωkk ))2 2 (c+ωk )2 (ωk −c) 3(c+ω (1+ωk ) c2 (ωk +c)2 ωk
+1
(1+ωk )3 c2 (ωk +c)2 ωk . 2 c(1+ωk )2 (ωk −c) 2 ωk
√ Moreover, for k 6= k ′ such that ωk , ωk′ > c, the crossvariations of eigenvalues and eigenvector projections are independent. The second result of Theorem 5 can be obtained using the same Stieltjes transform and complex integration framework described as a sketch of proof of Theorem 3, see [42] for details. Again in that case, these are mainly the fluctuations of the Stieltjes transform of XXH and an application of the delta method which lead to the result. As for the first result of Theorem 5, it is proved using very different methods which we do not introduce here (since the extreme eigenvalues are inside the base support, complex contour integration approaches cannot be performed). The Tracy-Widom distribution is depicted in Figure 4. It is interesting to see that the Tracy-Widom law is centered on a negative value and that the probability for positive values is low. This √ suggests that eigenvalues of BN found greater than (1 + c)2 for N large strongly favor the spike model assumption against a pure noise model (a similar remark can be done for the smallest eigenvalue of BN which has a mirrored Tracy-Widom fluctuation [48], when c < 1). Theorem 5 allows for asymptotically accurate test statistics for the decision of signal plus noise against pure noise models, thanks to the result on the extreme eigenvalues. Moreover, the results on the eigenvector projections provide even more information to the observer. In the specific context of failure detection [42], introduced briefly in Section IV, this can be used for the identification of the (asymptotically) most likely assumption from a set of M failure models of the type (IN + 1 1 Pk ) 2 XXH (IN + Pk ) 2 . G. Practical limitations For all techniques derived above, it is fundamental to keep in mind that the results are only valid under the large dimensional assumptions. As it turns out from convergence speed considerations, estimators based on weak convergence properties of the eigenvalue distribution are accurate even for very small system sizes (N = 8 is often sufficient, if not less), while second order statistics of functionals of
9
0.5 Empirical Eigenvalues Tracy-Widom law T2
0.4
already mentioned that this model, with unitary channel matrix U, is unrealistic and is better replaced by one with a channel H with random i.i.d. (in general assumed Gaussian) entries. In this scenario, i.e. y(t) =
Density
0.3
K X √ P i Hi xi (t) + σw(t) i=1
0.2
0.1
0 −4
−2
0
2
Centered-scaled largest eigenvalue of
4 XXH
2 4 1 √ √ Fig. 4. Distribution of N 3 c− 2 (1 + c)− 3 λ1 − (1 + c)2 against the Tracy-Widom law for N = 500, n = 1500, c = 1/3, for the covariance matrix model XXH , X ∈ CN×n with Gaussian i.i.d. entries. Empirical distribution taken over 10, 000 Monte-Carlo simulations.
the eigenvalue distribution require N of order 64. When it comes to asymptotic properties of the extreme eigenvalues, the convergence speed is slower so that N = 16 is often a minimum for first order convergence, while N of order 128 is required for second order statistics to emerge. Note also that these empirical values assume “non-degenerated” conditions; for instance, for N = 128, the second order statistics of the largest √ eigenvalue in a spike model with population eigenvalue ω = c + ε for a small ε > 0 are often far from Gaussian so that, depending on how small ε is chosen, much larger N are needed for an accurate Gaussian approximation to arise. This suggests that all the above methods have to be manipulated with care depending on the application at hand. The results introduced so far are only a small subset of the important results of more than ten years of applied random matrix theory. A large exposition of sketches of proofs, main strategies to address applied random matrix problems, as well as a large amount of applications are analyzed in the book [25] in the joint context of wireless communications and signal processing. An extension of some notions introduced in the present tutorial can also be found in [49]. Deeper random matrix consideration about the Stieltjes transform approach can be found in [24] and references therein, while discussions on extreme eigenvalues and the tools needed to derive the Tracy-Widom fluctuations can be found in e.g. [50], [51], [52] and references therein. In the next section, we discuss four examples, already partly introduced above, which apply Theorems 1-5 in various contexts of hypothesis testing and statistical inference. IV. C ASE
STUDIES
A. Multi-source power inference in i.i.d. channels Our first application is closely related to the explanatory example of power inference of multiple signal sources. We
which is the same as detailed in Section II-B but for the random nature of H = [H1 , . . . , HK ], the observer gathers M realizations of y(t) in the matrix Y = [y(1), . . . , y(M )] and tries to infer P1 , . . . , PK . We see that the sample covariance matrix approach is no longer valid as the “population matrix” HHH + σ 2 IN is now random. Denoting H′ the extension of H into an i.i.d. N × N matrix, Y can be written under the form X 1 ′ Y = H P 2 σIN W where X = [x(1), . . . , x(M )] ∈ Cn×M , x(t) = [x1 (t)T , . . . , xK (t)]T ∈ Cn , W = [w(1), . . . , w(M )] ∈ CN ×M and P ∈ CN ×N as in Section II-B. Following the general methodology unfolded in Section III-C, assuming asymptotic cluster separation property for a certain Pk , we first need to determine a connection between Pk and mP (z), the Stieltjes transform of P. This connection is similar as previously; see (11). The innovation from the study in Section III-C is that the link between mP and mF , the limiting Stieltjes transform of N1 YH Y is more involved. Precisely, we have in this case that mF (z), z ∈ C+ , is the unique solution in C+ of the equation in m [32] 1 1 = −σ 2 + mP (−1/f (z)) m f (z) with f (z) = (c − 1)m − czm2 . Two variable changes are demanded at this point to control the image of the integration contour around Pk . This procedure is the main technical part of the proof. Similar to the sample covariance matrix model, the resulting integration contour circles around the expected cluster. After variable change, residue calculus and an update of Theorem 2 to include random TN , we finally obtain the following (N, M )-consistent estimator [42], X NM (ηi − µi ) Pˆk = nk (M − N ) i∈Nk
PK i=k+1 ni }, η1 < √ √ T . . . < ηN the eigenvalues of diag(λ) − N1 λ λ and µ1 < √ √ T 1 λ λ , where . . . < µN the eigenvalues of diag(λ) − M 1 YYH . λ1 < . . . < λN are the ordered eigenvalues of M In Figure 5, the performance comparison between the classical small N approach and the Stieltjes transform approach for power estimation is depicted, based on 10, 000 Monte Carlo simulations. Similar to the estimator (3), the classical approach of Pk , denoted Pˆk∞ , consists in assuming M ≫ N and N ≫ ni for each i. The parameters used are given in the caption. A clear performance gain is observed for high signalto-noise ratios (SNR), defined as σ −2 , although a saturation level is observed, which translates the approximation error due with Nk = {N −
PK
i=k
ni + 1, . . . , N −
10
with respective eigenvectors uN,1 , . . . , uN,N . Assuming N ≥ K, the last N − K eigenvalues of R equal σ 2 and we can represent R under the form H σ 2 IN −K UW 0 R = UW US 0 ΩS UH S
Normalized mean square error [dB]
0 Pˆ3 Pˆ ∞ 3
−5
with ΩS = diag(ωN −K+1 , . . . , ωN ), US = [uN −K+1 , . . . , uN ] the so-called signal space and UW = [u1 , . . . , uN −K ] the so-called noise space. The basic idea of the MUSIC method is to observe that any vector lying in the signal space is orthogonal to the noise space. This leads in particular to
−10
−15
UH W s(θk ) = 0 −20 −5
0
5
10
15
20
25
30
SNR [dB] (σ −2 )
Fig. 5. Normalized mean square error of largest estimated power P3 for P1 = 1/16, P2 = 1/4, P3 = 1, n1 = n2 = n3 = 4 ,N = 24, M = 128. Comparison between conventional and Stieltjes transform approaches.
for all k ∈ {1, . . . , K}, which is equivalent to η(θk ) , s(θk )UW UH W s(θk ) = 0. A natural estimator of θk is θˆk in the neighborhood of θk which minimizes ηˆ(θ) , s(θ)H UN,W UH N,W s(θ)
to asymptotic analysis. For low SNR, it is also observed that the classical method tends to become better, which is here due to the inexactness of the Stieltjes transform approach when clusters tend to merge (here the cluster associated to the value σ 2 grows large for σ 2 growing large and covers the clusters associated to P1 , then P2 and P3 ). Our next example, due to Mestre [2], is the contribution which initiated a series of statistical inference methods exploiting complex integration, among which the source detection methods described above. B. G-MUSIC The MUSIC method, originally due to Schmidt and later extended by McCloud and Scharf [53], [54], is a directional of arrival detection technique based on a subspace approach. The model is that of an array of N sensors (e.g. a radar) receiving in light-of-sight the signal originating from K sources at angles θ1 , . . . , θK with respect to the array. The objective is to estimate those angles. The data vector y(t) received at time t by the array can be written y(t) =
K X
s(θk )xk (t) + σw(t)
k=1
where s(θ) ∈ CN is a deterministic vector-valued function of the angle θ ∈ [0, 2π) assumed of unit Euclidean norm, xk (t) ∈ C is the standard Gaussian i.i.d. data sent by source k at time t and w(t) ∈ CN is the additive Gaussian noise. The vector y(t) is Gaussian with covariance H
2
R = S(Θ)S(Θ) + σ IN where S(Θ) = [s(θ1 ), . . . , s(θK )] ∈ CN ×K . We denote as usual Y = [y(1), . . . , y(M )] ∈ CN ×M . We call ω1 ≤ . . . ≤ ωN the eigenvalues of R and u1 , . . . , uN their associated eigenvectors. Similarly, we will 1 denote λ1 ≤ . . . ≤ λN the eigenvalues of RN , M YYH ,
where UN,W = [uN,1 , . . . , uN,N −K ] is the eigenvector space corresponding to the smallest N − K eigenvalues of RN . This estimator is however proven inconsistent for N, M growing large simultaneously [55]. To produce an (N, M )consistent estimator, we remind the following remark established in (13). Similar to the derivation of the eigenvector projection in Section III-D, we can write I 1 s(θ) = s(θ)H UW UH s(θ)H (R − zIN )−1 s(θ) W 2πi Ck for a contour Ck circling around σ 2 only. Indeed, residue calculus leads to a pole in (R − zIN )−1 at σ 2 with residue 2 s(θ)H UW UH W s(θ) (since the eigenvalue σ has associated 1 eigenspace UW ). Connecting R to M YYH from Theorem 1 and performing residue calculus, we finally obtain a good approximation of η(θ), and then an (N, M )-consistent estimator for the direction of arrival. This estimator consists precisely in determining the K deepest minima (zeros may not exist) of the function [2] ! N X H H φ(n)uN,n uN,n s(θ) s(θ) n=1
with φ(n) defined as i h P µk λk 1+ N , n≤N −K − k=N −K+1 λn −λk iλn −µk h φ(n) = P N −K µ λ k k − k=1 λn −λk − λn −µk , n > N − K
and with µ1 ≤ . . . ≤ µN the eigenvalues of diag(λ) − √ √ T 1 T M λ λ , where we denoted λ = (λ1 , . . . , λN ) , λ1 < 1 H . . . < λN the ordered eigenvalues of M YY . This result is generalized to the scenario where xk (t) is not necessarily known but has some unknown distribution. In this scenario, the technique involves different reference results than the sample covariance matrix theorems required here but the final estimator is obtained similarly [56].
11
0.7
−18
0.6
−20
0.5
Correct detection rate
Cost function [dB]
−16
−22 −24 −26
0.4 0.3 0.2
N-P, Jeffreys N-P, uniform condition number GLRT
−28 −30
MUSIC G-MUSIC
0.1 35
37 angle [deg]
Fig. 6. MUSIC against G-MUSIC for DoA detection of K = 3 signal sources, N = 20 sensors, M = 150 samples, SNR of 10 dB. Angles of arrival of 10◦ , 35◦ and 37◦ .
In Figure 6, the performance comparison between the traditional MUSIC and the G-MUSIC algorithms is depicted, when two sources have close angles which the traditional MUSIC algorithm tends not to be able to isolate. It is clear on this oneshot realization that the G-MUSIC algorithm shows deeper minima and allows for a better source resolution. This completes the set of examples using results based on weak limits of large dimensional random matrices and Gestimation. We now turn to examples involving the results for the spike models, starting with an obvious signal detection procedure. C. Detection Consider the hypothesis test which consists in deciding whether a received signal y(t) ∈ CN is made of pure noise, y(t) = σw(t), w(t) ∼ CN(0, IN ), for some unknown σ (hypothesis H0 ), or made of a signal plus noise, y(t) = hx(t) + σw(t), for an unknown time-independent channel h ∈ CN and a scalar signal x(t) ∼ CN(0, 1) (hypothesis H1 ). As usual, we gather M received data in Y = [y(1), . . . , y(M )]. Since h is unknown, we cannot compare directly H0 and H1 . Instead, we will accept or reject H0 based on how Y fits the pure noise model. It can be proved that the generalized likelihood ratio test (GLRT) for the decision between H0 and H1 boils down to the test λ′1 ,
( N1M
H0 λ1 ≶ f (ε) H tr YY ) H1
1 YYH , for f a given with λ1 the largest eigenvalue of M monotonic function and ε the maximally acceptable false alarm rate. Evaluating the statistical properties of λ′1 for finite N is however rather involved and leads to impractical tests, see e.g. [15]. Instead, we consider here a very elementary statistical test based on Theorem 5. a.s. It is clear that N1M tr YYH −→ σ 2 under both H0 or H1 . Therefore, an application of Slutsky’s lemma [1] ensures
0 1 · 10−3
5 · 10−3
1 · 10−2
2 · 10−2
False alarm rate
Fig. 7. ROC curve for a priori unknown σ2 of the Neyman-Pearson test (N-P), condition number method and GLRT, N = 4, M = 8, SNR = 0 dB, h ∼ CN(0, IN ). For the Neyman-Pearson test, both uniform and Jeffreys prior, with exponent β = 1, are provided.
′ that λp 1 asymptotically has a Tracy-Widom fluctuation around (1 + N/M )2 . The GLRT method therefore leads asymptotically to test the Tracy-Widom statistics for the appropriately centered and scaled version of λ′1 . Precisely, for a false alarm rate ε, this is √ 2 ′ 2 λ − (1 + c) H0 −1 1 3 N √ 4 √ ≶ T2 (1 − ε) (1 + c) 3 c H1
with c = N/M and T2 the Tracy-Widom distribution. Further properties of the above statistical test, and in particular theoretical expression of false alarm rates and test powers are derived in [57] using the theory of large deviations. In Figure 7, the performance of the GLRT detector given by the receiver operating curve for different false alarm rate levels is depicted, for small system sizes. We give the comparison between the GLRT approach, the empirical technique [58] 1 which consists in using the conditioning number of M YYH (ratio largest and the smallest eigenvalues) as a signal detector and the optimal Neyman-Pearson detector using Bayesian marginalization (with maximum entropy priors on h and either uniform of β-Jeffreys prior on σ 2 ) derived in [59]. It turns out that the suboptimal asymptotic GLRT approach is quite close in performance to the finite dimensional exact optimum, the latter being however quite complex to implement. Our last application uses the second order statistics of eigenvalues and eigenvectors for a spike model in the context of failure localization in large dimensional networks. D. Failure detection in large networks The method presented here allows for fast detection and localization of local failures (few links or few nodes) in a large sensor network. Assume for instance the linear scenario y(t) = Hθ(t) + σw(t) for θ(t) = [θ1 (t), . . . , θp (t)]T ∈ Cp and w(t) ∈ CN two standard Gaussian signals with entries of zero mean and
Our last application uses the second order statistics of the eigenvalues and eigenvectors of a spiked model in the context of failure localization in large dimensional networks.

D. Failure detection in large networks

The method presented here allows for fast detection and localization of local failures (a few links or a few nodes) in a large sensor network. Assume for instance the linear scenario y(t) = Hθ(t) + σw(t), with θ(t) = [θ_1(t), . . . , θ_p(t)]^T ∈ C^p and w(t) ∈ C^N two standard Gaussian signals with entries of zero mean and unit variance. The objective here is to detect a small rank perturbation of the covariance matrix E[y(t)y(t)^H] ≜ R = HH^H + σ²I_N, originating from a local failure. In particular, consider the scenario where the entry k of θ(t) shows a sudden change in variance. In this case, we have instead a model of the form

y'(t) = H(I_p + α_k e_k e_k^H) θ(t) + σ w(t)

for a given α_k ≥ −1 and with e_k ∈ C^p the vector of all zeros but for e_k(k) = 1. Taking for instance α_k = −1 turns the entry k of θ(t) into zero, corresponding to a complete collapse of this parameter. Denoting s(t) = R^{−1/2} y(t) and s'(t) = R^{−1/2} y'(t), we have E[s(t)s(t)^H] = I_N, while

E[s'(t)s'(t)^H] = I_N + [(1 + α_k)² − 1] R^{−1/2} H e_k e_k^H H^H R^{−1/2},

which is clearly a rank-1 perturbation of I_N. A simple failure detection test therefore consists, as in Section IV-C, in deciding whether the largest eigenvalue of (1/M) SS^H, with S = [s(1), . . . , s(M)], has a Tracy-Widom distribution. However, we now wish to go further and to be able to decide, upon failure detection, which entry θ_k of θ was altered. For this, we need a statistical test for the hypothesis H_k = "parameter θ_k is in failure". A mere maximum likelihood test, which tests the Gaussian distribution of S assuming α_k known for each k, is however costly for large N and becomes impractical if α_k is unknown. Instead, one can perform a statistical test on the extreme eigenvalues and eigenvector projections of (1/M) SS^H, which mainly costs the computational time of an eigenvalue decomposition (already performed in the detection test). For this, we need to assume that the number M of observations is sufficiently large for a failure of any parameter of θ to be detectable. As an immediate application of Theorem 5, focused on the largest eigenvalue of the perturbation, and denoting λ the largest eigenvalue of (1/M) SS^H and u its corresponding eigenvector, the estimator k̂ of k is then given by

k̂ = argmax_{1 ≤ i ≤ p} { −N [ |u_i^H u|² − ξ_i , λ − ρ_i ] Σ_i^{−1} [ |u_i^H u|² − ξ_i , λ − ρ_i ]^T − log det Σ_i }

with ω_i u_i u_i^H = [(1 + α_i)² − 1] R^{−1/2} H e_i e_i^H H^H R^{−1/2} and ξ_i, ρ_i, Σ_i given by Theorem 5 (the index i refers here to a change in parameter θ_i and not to the i-th largest eigenvalue of the small rank perturbation matrix). For further discussion of this method, see [42]. The method still assumes that α_i is known for each i. If not, an estimator of α_i can be derived and Theorem 5 adapted accordingly to account for the fluctuations of this estimator.

Fig. 8. Correct detection (CDR) and localization (CLR) rates for different levels of false alarm rates (FAR) and different values of M, for a drop of variance of θ_1. The minimal theoretical M for observability is M = 8. (Plot: rates versus M, from M = 10 to M = 100, for FAR = 10^{−4}, 10^{−3} and 10^{−2}.)

In Figure 8, we depict the detection and localization performance for a sudden drop to zero of the parameter θ_1(t) in a network of p = N = 10 nodes and parameter entries. The minimum asymptotic ratio M/N equals 0.8, which we take as the reference minimum (hence M = 8) for detection and localization. We see in Figure 8 that good performance for both detection and localization is only achieved for much larger M than theoretically predicted. This is an aftermath of the small dimensions considered. Also, the localization performance is rather poor for small M but reaches the same level as the detection performance for larger M. This is explained by the inaccuracy of the Gaussian approximation for ω_i close to √c.
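As an illustration of the detection step described above, the following sketch (Python/NumPy, with hypothetical helper names) simulates the perturbed model y'(t), whitens the observations by R^{−1/2}, and checks whether the largest eigenvalue of (1/M)SS^H exits the Marčenko-Pastur support; the localization step, which further requires the quantities ξ_i, ρ_i and Σ_i of Theorem 5, is only indicated in a comment. A variance increase is simulated so that the largest eigenvalue is the one that separates; for a variance drop (α_k close to −1) the perturbation is negative and it is the smallest eigenvalue that exits below the left edge instead.

    import numpy as np

    rng = np.random.default_rng(1)
    N, p, M, sigma = 10, 10, 40, 1.0
    H = rng.standard_normal((N, p)) / np.sqrt(p)
    R = H @ H.T + sigma ** 2 * np.eye(N)           # R = H H^H + sigma^2 I_N (real-valued example)
    d, V = np.linalg.eigh(R)
    R_isqrt = V @ np.diag(d ** -0.5) @ V.T         # R^{-1/2}

    def observations(alpha, k):
        # M samples of y'(t) = H (I_p + alpha e_k e_k^H) theta(t) + sigma w(t)
        theta = rng.standard_normal((p, M))
        theta[k, :] *= 1 + alpha                   # variance change on entry k
        w = rng.standard_normal((N, M))
        return H @ theta + sigma * w

    Y = observations(alpha=2.0, k=0)               # sudden variance increase on theta_1
    S = R_isqrt @ Y                                # whitened samples s'(t)
    lam = np.linalg.eigvalsh(S @ S.T / M)
    c = N / M
    edge = (1 + np.sqrt(c)) ** 2                   # right edge of the Marcenko-Pastur support
    print("largest eigenvalue:", lam[-1], "MP edge:", edge)
    # A failure is declared when lam[-1] lies significantly beyond the edge (Tracy-Widom test).
    # Localization (not implemented here): for each candidate i, compare the pair
    # (|u_i^H u|^2, lam[-1]) to its predicted mean (xi_i, rho_i) and covariance Sigma_i from
    # Theorem 5, and select the best-fitting i, as in the estimator k_hat above.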
V. PERSPECTIVES AND CONCLUSIONS
In this short introductory tutorial on statistical inference based on large dimensional random matrix theory, we have argued that most classical signal processing methods are limited by their inconsistency when both the population and the sample dimensions are large. We then introduced notions of random matrix theory providing results on the spectrum of large random matrices, which were used to correct some of these inconsistent signal processing methods. To this end, we presented a recent approach based (i) on the Stieltjes transform, to derive weak convergence properties of the spectrum of large matrices, and (ii) on complex integration, to derive estimators. This somewhat parallels the Fourier transform and M-estimator framework usually met in classical asymptotic signal processing, see e.g. [1].

Nonetheless, while classical signal processing tools are very mature, statistical inference based on random matrix analysis still lacks many fundamental mathematical results that only experts can provide to this day (such as asymptotic eigenvalue properties and central limit theorems for models more involved than the sample covariance matrix). The main consequence is the slow emergence of systematic methods to derive estimators for more complicated system models. The aforementioned Stieltjes transform and complex integration framework is a rare exception to this rule and, we believe, is sufficiently generic to answer many questions involving the spectral properties of large systems.
APPENDIX A
MATRIX CALCULUS

In the following, we recall some simple matrix identities needed in the developments of the proofs in this article. For A ∈ C^{N×N} and B ∈ C^{N×N} two matrices, we have det(AB) = det(A) det(B).

We now recall some matrix inversion lemmas. For X ∈ C^{N×n} and P ∈ C^{n×n}, we have Woodbury's identity

(I_N + XPX^H)^{−1} = I_N − X(P^{−1} + X^H X)^{−1} X^H.

This is of particular interest when P = ω u u^H has rank one: applying the identity with X replaced by the vector Xu and P by the scalar ω, the inner matrix inverse on the right-hand side becomes a scalar. We also have, for A ∈ C^{N×N} invertible and B ∈ C^{N×n},

(A + BB^H)^{−1} B = A^{−1} B (I_n + B^H A^{−1} B)^{−1}.

We finally mention the block matrix inversion lemma for square invertible matrices A and D:

[ A  B ]^{−1}   [ (A − B D^{−1} C)^{−1}             (B D^{−1} C − A)^{−1} B D^{−1} ]
[ C  D ]      = [ (C A^{−1} B − D)^{−1} C A^{−1}    (D − C A^{−1} B)^{−1}          ].

In the particular case where A = a is a scalar, C = b ∈ C^{N−1} and B = b^H, note that the entry (1,1) of the inverse matrix is exactly

[ (a  b^H ; b  D)^{−1} ]_{11} = 1 / (a − b^H D^{−1} b).

This last relation is used as a first step in the proof of the Marčenko-Pastur law.
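These identities are easily checked numerically; the short sketch below (Python/NumPy, for illustration only) verifies Woodbury's identity, the second inversion lemma and the (1,1)-entry formula on randomly generated matrices.

    import numpy as np

    rng = np.random.default_rng(0)
    N, n = 6, 3
    X = rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))
    P = rng.standard_normal((n, n)); P = P @ P.T + np.eye(n)     # invertible (positive definite) P
    I_N, I_n = np.eye(N), np.eye(n)

    # Woodbury: (I_N + X P X^H)^{-1} = I_N - X (P^{-1} + X^H X)^{-1} X^H
    lhs = np.linalg.inv(I_N + X @ P @ X.conj().T)
    rhs = I_N - X @ np.linalg.inv(np.linalg.inv(P) + X.conj().T @ X) @ X.conj().T
    print(np.allclose(lhs, rhs))

    # (A + B B^H)^{-1} B = A^{-1} B (I_n + B^H A^{-1} B)^{-1}
    A = rng.standard_normal((N, N)) + N * np.eye(N)              # well-conditioned invertible A
    B = rng.standard_normal((N, n))
    lhs = np.linalg.inv(A + B @ B.conj().T) @ B
    rhs = np.linalg.inv(A) @ B @ np.linalg.inv(I_n + B.conj().T @ np.linalg.inv(A) @ B)
    print(np.allclose(lhs, rhs))

    # (1,1) entry of [[a, b^H], [b, D]]^{-1} equals 1 / (a - b^H D^{-1} b)
    a = 2.0
    b = rng.standard_normal(N - 1)
    D = rng.standard_normal((N - 1, N - 1)); D = D @ D.T + np.eye(N - 1)
    M_block = np.block([[np.array([[a]]), b[None, :]], [b[:, None], D]])
    print(np.allclose(np.linalg.inv(M_block)[0, 0], 1 / (a - b @ np.linalg.inv(D) @ b)))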
APPENDIX B
NOTIONS OF CONVERGENCE IN PROBABILITY

In the course of this tutorial, we often need notions of convergence from probability theory, the basics of which we recall here. Given a probability space (Ω, F, P), we say that a random variable X_n on Ω (that is, a measurable map X_n(ω), ω ∈ Ω, onto some measurable space (G, G)) converges almost surely to the random variable X (in this tutorial, often a constant) if there exists A ⊂ Ω with P(A) = 1 such that ω ∈ A implies X_n(ω) → X(ω) as n → ∞. The notion of weak convergence of X_n to X, denoted X_n ⇒ X, is looser, as it does not require point-wise convergence of X_n(ω) to X(ω) for almost every ω ∈ Ω, but only that P(X_n ∈ G) → P(X ∈ G) as n → ∞, for each G ∈ G whose boundary has probability zero under the law of X. If X_n is a real random variable, we define the distribution function F_n of X_n as F_n(x) = P(X_n ∈ (−∞, x]). Taking G to be such an interval, we see that the weak convergence of X_n to X implies (and is in fact equivalent to) F_n(x) → F(x) for every real x at which F is continuous. The distribution function F(x) = 1_{x ≥ a} is the distribution function of a single mass at a.

In the context of random matrix theory, we will use the fact that the eigenvalue distribution function F_N of an N × N Hermitian random matrix converges weakly to a given distribution function F. We will in particular mention, e.g. in Theorem 1, that this weak convergence is almost sure. This means that F_N(x) = F_N(x; ω) can itself be viewed as a random variable (since the underlying matrices are random) and that F_N(x) − F(x) → 0 almost surely. When we state that an event E_N on (Ω, F, P) is true for all large N, almost surely, we mean that, for each ω ∈ A ⊂ Ω with P(A) = 1, there exists N_0 = N_0(ω) such that N ≥ N_0 implies that E_N = E_N(ω) is true. It is important to note that N_0 depends on ω, or else we would say instead that (E_N)_{N ≥ N_0} is true almost surely for all N. In the context of Theorem 2, this means that we know B_N has no eigenvalue in [a − ε, b + ε] beyond a certain N_0, but this N_0 depends on the random sequence B_1, B_2, . . .
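To visualize this almost sure weak convergence, the sketch below (Python/NumPy) compares the empirical eigenvalue distribution function F_N of a sample covariance matrix (1/n)XX^T with i.i.d. zero mean, unit variance entries to its Marčenko-Pastur limit F, using the standard density expression for this model with ratio c = N/n < 1 (the parameter values are illustrative).

    import numpy as np

    rng = np.random.default_rng(0)
    N, n = 400, 1600                          # population and sample sizes, c = N/n = 0.25
    c = N / n
    X = rng.standard_normal((N, n))
    eig = np.linalg.eigvalsh(X @ X.T / n)     # eigenvalues of the sample covariance matrix

    a, b = (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2     # edges of the Marcenko-Pastur support

    def mp_cdf(x, grid=20000):
        # Marcenko-Pastur distribution function at x (case c < 1), by numerical integration.
        if x <= a:
            return 0.0
        t = np.linspace(a, min(x, b), grid)
        dens = np.sqrt(np.maximum((b - t) * (t - a), 0.0)) / (2 * np.pi * c * t)
        return float(np.sum(dens) * (t[1] - t[0]))

    # empirical F_N(x) against the limiting F(x) at a few points
    for x in [0.5, 1.0, 1.5, 2.0]:
        print(x, float(np.mean(eig <= x)), mp_cdf(x))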
APPENDIX C
COMPLEX INTEGRATION

Complex integration extends integration theory on the real axis to integration along paths in the complex plane. Given a (continuously differentiable) path γ : [a, b] → C (oriented from a to b), we define the integral along γ of a complex function f as

∫_γ f(z) dz ≜ ∫_a^b f(γ(t)) γ'(t) dt.

In particular, for f continuously differentiable on an open set including a closed path γ (i.e., γ(a) = γ(b)),

∮_γ f'(z) dz = f(γ(b)) − f(γ(a)) = 0,

where the circle on the integral sign is a reminder that the path is closed. Of particular interest is the Cauchy complex integration formula, which states that, if γ is a simple positively oriented closed path (or contour) and z_0 ∈ C lies in the interior of γ, then for f continuously differentiable on an open set containing γ and its interior,

f(z_0) = (1/(2πi)) ∮_γ f(z)/(z − z_0) dz,

which is easily proved when γ is a circle centered at z_0 by the change of variable z = z_0 + r e^{iθ}. If z_0 does not lie inside the contour, the right-hand side equals zero instead.

Another important notion is residue calculus. For a function f continuously differentiable on an open set containing the closed contour γ, except at a singularity z_0 inside γ where |f(z)| → ∞,

(1/(2πi)) ∮_γ f(z) dz

equals the residue of f at z_0. That is, calling k the smallest integer such that lim_{z→z_0} (d^{k−1}/dz^{k−1}) [(z − z_0)^k f(z)] exists, we say that f has a pole of order k at z_0, and

(1/(2πi)) ∮_γ f(z) dz = (1/(k − 1)!) lim_{z→z_0} (d^{k−1}/dz^{k−1}) [(z − z_0)^k f(z)].

The intimate link between the kernel 1/(t − z) and the Stieltjes transform ∫ 1/(t − z) dF(t) of a distribution function F makes complex integration a good candidate for random matrix analysis. Also, since Stieltjes transforms of empirical eigenvalue distributions, of the form (1/N) Σ_i 1/(λ_i − z), are rational functions of z, the residue calculus machinery makes it easy to evaluate, and thus eliminate, the complex integrals appearing at the end of most derivations.
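The following sketch (Python/NumPy) illustrates this link: it numerically evaluates the contour integral of the empirical Stieltjes transform m_N(z) = (1/N) Σ_i 1/(λ_i − z) along a circle. Since each eigenvalue enclosed by the contour contributes a residue of −1/N, the integral (1/(2πi)) ∮ m_N(z) dz returns minus the fraction of enclosed eigenvalues (the eigenvalues and contour below are arbitrary).

    import numpy as np

    lam = np.array([0.5, 0.9, 1.1, 2.5, 3.0])     # eigenvalues of some Hermitian matrix
    N = len(lam)

    def m_N(z):
        # empirical Stieltjes transform (1/N) sum_i 1/(lambda_i - z)
        return np.mean(1.0 / (lam - z))

    # positively oriented circle of center 1.0 and radius 0.3 (encloses 0.9 and 1.1 only)
    theta = np.linspace(0.0, 2.0 * np.pi, 4000, endpoint=False)
    z = 1.0 + 0.3 * np.exp(1j * theta)
    dz = 0.3 * 1j * np.exp(1j * theta) * (theta[1] - theta[0])   # gamma'(t) dt

    integral = np.sum(np.array([m_N(zz) for zz in z]) * dz) / (2j * np.pi)
    print(integral.real, "should be close to -(enclosed eigenvalues)/N =", -2 / N)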
REFERENCES

[1] A. W. Van der Vaart, Asymptotic Statistics. New York: Cambridge University Press, 2000.
[2] X. Mestre and M. Lagunas, "Modified Subspace Algorithms for DoA Estimation With Large Arrays," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 598–614, Feb. 2008.
[3] L. Laloux, P. Cizeau, M. Potters, and J. P. Bouchaud, "Random matrix theory and financial correlations," International Journal of Theoretical and Applied Finance, vol. 3, no. 3, pp. 391–397, Jul. 2000.
[4] N. Hansen and A. Ostermeier, "Adapting arbitrary normal mutation distributions in evolution strategies: The covariance matrix adaptation," in Proceedings of the IEEE International Conference on Evolutionary Computation. IEEE, 1996, pp. 312–317.
[5] E. Wigner, "Characteristic vectors of bordered matrices with infinite dimensions," The Annals of Mathematics, vol. 62, no. 3, pp. 548–564, Nov. 1955.
[6] V. A. Marčenko and L. A. Pastur, "Distributions of eigenvalues for some sets of random matrices," Math. USSR-Sbornik, vol. 1, no. 4, pp. 457–483, Apr. 1967.
[7] W. Rudin, Real and Complex Analysis, 3rd ed. McGraw-Hill Series in Higher Mathematics, May 1986.
[8] V. L. Girko, Theory of Random Determinants. Dordrecht, The Netherlands: Kluwer Academic Publishers, 1990.
[9] R. A. Horn and C. R. Johnson, Matrix Analysis. Cambridge University Press, 1985.
[10] ——, Topics in Matrix Analysis. Cambridge University Press, 1991.
[11] P. Billingsley, Probability and Measure, 3rd ed. Hoboken, NJ: John Wiley & Sons, Inc., 1995.
[12] ——, Convergence of Probability Measures. Hoboken, NJ: John Wiley & Sons, Inc., 1968.
[13] J. Wishart, "The generalized product moment distribution in samples from a normal multivariate population," Biometrika, vol. 20, no. 1-2, pp. 32–52, Dec. 1928.
[14] A. T. James, "Distributions of matrix variates and latent roots derived from normal samples," The Annals of Mathematical Statistics, vol. 35, no. 2, pp. 475–501, 1964.
[15] T. Ratnarajah, R. Vaillancourt, and M. Alvo, "Eigenvalues and condition numbers of complex random matrices," SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 2, pp. 441–456, 2005.
[16] F. Hiai and D. Petz, The Semicircle Law, Free Random Variables and Entropy, Mathematical Surveys and Monographs No. 77. Providence, RI, USA: American Mathematical Society, 2006.
[17] P. Biane, "Free probability for probabilists," Quantum Probability Communications, vol. 11, pp. 55–71, 2003.
[18] A. Masucci, Ø. Ryan, S. Yang, and M. Debbah, "Finite Dimensional Statistical Inference," IEEE Trans. Inf. Theory, 2009, submitted for publication. [Online]. Available: http://arxiv.org/abs/0911.5515
[19] N. R. Rao and A. Edelman, "The polynomial method for random matrices," Foundations of Computational Mathematics, vol. 8, no. 6, pp. 649–702, Dec. 2008.
[20] L. Pastur and V. Vasilchuk, "On the law of addition of random matrices," Communications in Mathematical Physics, vol. 214, no. 2, pp. 249–286, 2000.
[21] W. Hachem, O. Khorunzhy, P. Loubaton, J. Najim, and L. A. Pastur, "A new approach for capacity analysis of large dimensional multi-antenna channels," IEEE Trans. Inf. Theory, vol. 54, no. 9, 2008.
[22] J. W. Silverstein and Z. D. Bai, "On the empirical distribution of eigenvalues of a class of large dimensional random matrices," Journal of Multivariate Analysis, vol. 54, no. 2, pp. 175–192, 1995.
[23] X. Mestre, "Improved estimation of eigenvalues of covariance matrices and their associated subspaces using their sample estimates," IEEE Trans. Inf. Theory, vol. 54, no. 11, pp. 5113–5129, Nov. 2008.
[24] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices. Springer Series in Statistics, 2009.
[25] R. Couillet and M. Debbah, Random Matrix Methods for Wireless Communications, 1st ed. New York, NY, USA: Cambridge University Press, 2011, to appear.
[26] R. Couillet, M. Debbah, and J. W. Silverstein, "A deterministic equivalent for the analysis of MIMO multiple access channels," IEEE Trans. Inf. Theory, 2010, to appear. [Online]. Available: http://arxiv.org/abs/0906.3667v3
[27] F. Dupuy and P. Loubaton, "Mutual information of frequency selective MIMO systems: an asymptotic approach," 2009. [Online]. Available: http://www-syscom.univ-mlv.fr/~fdupuy/publications.php
[28] W. Hachem, P. Loubaton, and J. Najim, "Deterministic Equivalents for Certain Functionals of Large Random Matrices," Annals of Applied Probability, vol. 17, no. 3, pp. 875–930, 2007.
[29] R. Couillet, J. Hoydis, and M. Debbah, "Deterministic equivalents for the analysis of unitary precoded systems," IEEE Trans. Inf. Theory, 2011, submitted for publication. [Online]. Available: http://arxiv.org/abs/1011.3717
[30] J. Hoydis, R. Couillet, and M. Debbah, "Random Isotropic Beamforming over Correlated Fading Channels," IEEE Trans. Inf. Theory, 2011, submitted for publication.
[31] F. Dupuy and P. Loubaton, "On the capacity achieving covariance matrix for frequency selective MIMO channels using the asymptotic approach," IEEE Trans. Inf. Theory, 2010, submitted for publication. [Online]. Available: http://arxiv.org/abs/1001.3102
[32] R. Couillet, J. W. Silverstein, Z. Bai, and M. Debbah, "Eigen-Inference for Energy Estimation of Multiple Sources," IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2420–2439, 2011.
[33] Z. D. Bai and J. W. Silverstein, "Exact Separation of Eigenvalues of Large Dimensional Sample Covariance Matrices," The Annals of Probability, vol. 27, no. 3, pp. 1536–1555, 1999.
[34] V. L. Girko, "Ten years of general statistical analysis," unpublished. [Online]. Available: www.general-statistical-analysis.girko.freewebspace.com/chapter14.pdf
[35] J. W. Silverstein, Z. Bai, and Y. Yin, "A note on the largest eigenvalue of a large dimensional sample covariance matrix," Journal of Multivariate Analysis, vol. 26, no. 2, pp. 166–168, 1988.
[36] Y. Q. Yin, Z. D. Bai, and P. R. Krishnaiah, "On the limit of the largest eigenvalue of the large dimensional sample covariance matrix," Probability Theory and Related Fields, vol. 78, no. 4, pp. 509–521, 1988.
[37] Z. D. Bai and J. W. Silverstein, "No Eigenvalues Outside the Support of the Limiting Spectral Distribution of Large Dimensional Sample Covariance Matrices," Annals of Probability, vol. 26, no. 1, pp. 316–345, Jan. 1998.
[38] R. B. Dozier and J. W. Silverstein, "On the empirical distribution of eigenvalues of large dimensional information plus noise-type matrices," Journal of Multivariate Analysis, vol. 98, no. 4, pp. 678–694, 2007.
[39] F. Benaych-Georges and R. Rao, "The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices," arXiv preprint, arXiv:0910.2120, 2009.
[40] Z. Bai and J. F. Yao, "Limit theorems for sample eigenvalues in a generalized spiked population model," 2008. [Online]. Available: http://arxiv.org/abs/0806.1141
[41] J. Baik and J. W. Silverstein, "Eigenvalues of large sample covariance matrices of spiked population models," Journal of Multivariate Analysis, vol. 97, no. 6, pp. 1382–1408, 2006.
[42] R. Couillet and W. Hachem, "Local failure detection and identification in large sensor networks," IEEE Trans. Inf. Theory, 2011, submitted for publication.
[43] Z. D. Bai and J. W. Silverstein, "CLT of linear spectral statistics of large dimensional sample covariance matrices," Annals of Probability, vol. 32, no. 1A, pp. 553–605, 2004.
[44] J. Yao, R. Couillet, J. Najim, E. Moulines, and M. Debbah, "CLT for eigen-inference methods in cognitive radios," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'11), Prague, Czech Republic, 2011, submitted for publication.
[45] I. M. Johnstone, "On the distribution of the largest eigenvalue in principal components analysis," Annals of Statistics, vol. 29, no. 2, pp. 295–327, 2001.
[46] J. Baik, G. Ben Arous, and S. Péché, "Phase transition of the largest eigenvalue for non-null complex sample covariance matrices," The Annals of Probability, vol. 33, no. 5, pp. 1643–1697, 2005.
[47] C. A. Tracy and H. Widom, "On orthogonal and symplectic matrix ensembles," Communications in Mathematical Physics, vol. 177, no. 3, pp. 727–754, 1996.
[48] O. N. Feldheim and S. Sodin, "A universality result for the smallest eigenvalues of certain sample covariance matrices," Geometric and Functional Analysis, vol. 20, no. 1, pp. 88–123, 2010.
[49] T. Chen, D. Rajan, and E. Serpedin, Mathematical Foundations for Signal Processing, Communications and Networking. Cambridge University Press, 2011.
[50] A. Guionnet, "Large Random Matrices: Lectures on Macroscopic Asymptotics," École d'Été de Probabilités de Saint-Flour XXXVI-2006. [Online]. Available: www.umpa.ens-lyon.fr/~aguionne/cours.pdf
[51] J. Faraut, "Random Matrices and Orthogonal Polynomials," Lecture Notes, CIMPA School of Merida, 2006. [Online]. Available: www.math.jussieu.fr/~faraut/Merida.Notes.pdf
[52] Y. V. Fyodorov, "Introduction to the random matrix theory: Gaussian unitary ensemble and beyond," Lecture Notes. [Online]. Available: http://arxiv.org/abs/math-ph/0412017
[53] R. Schmidt, "Multiple emitter location and signal parameter estimation," IEEE Transactions on Antennas and Propagation, vol. 34, no. 3, pp. 276–280, 1986.
[54] M. L. McCloud and L. L. Scharf, "A new subspace identification algorithm for high-resolution DOA estimation," IEEE Transactions on Antennas and Propagation, vol. 50, no. 10, pp. 1382–1390, 2002.
[55] X. Mestre, "On the asymptotic behavior of the sample estimates of eigenvalues and eigenvectors of covariance matrices," IEEE Trans. Signal Process., vol. 56, no. 11, pp. 5353–5368, Nov. 2008.
[56] P. Vallet, P. Loubaton, and X. Mestre, "Improved subspace estimation for multivariate observations of high dimension: the deterministic signals case," IEEE Trans. Inf. Theory, 2010, submitted for publication. [Online]. Available: http://arxiv.org/abs/1002.3234
[57] P. Bianchi, J. Najim, M. Maida, and M. Debbah, "Performance of Some Eigen-based Hypothesis Tests for Collaborative Sensing," IEEE Trans. Inf. Theory, vol. 57, no. 4, pp. 2400–2419, 2011.
[58] L. S. Cardoso, M. Debbah, P. Bianchi, and J. Najim, "Cooperative spectrum sensing using random matrix theory," in IEEE Pervasive Comput. (ISWPC'08), Santorini, Greece, May 2008, pp. 334–338.
[59] R. Couillet and M. Debbah, "A Bayesian framework for collaborative multi-source signal detection," IEEE Trans. Signal Process., vol. 58, no. 10, pp. 5186–5195, Oct. 2010.