VOL. X, NO. Y, MONTH 2015
On the probability that all eigenvalues of Gaussian and Wishart random matrices lie within an interval

Marco Chiani, Fellow, IEEE

arXiv:1502.04189v1 [math.ST] 14 Feb 2015
Abstract—We derive the probability ψ(a, b) = Pr(a ≤ λmin(M), λmax(M) ≤ b) that all eigenvalues of a random matrix M lie within an arbitrary interval [a, b], when M is a real or complex finite dimensional Wishart, double Wishart, or Gaussian symmetric/Hermitian matrix. We give efficient recursive formulas allowing, for instance, the exact evaluation of ψ(a, b) for Wishart matrices with 500 variates and 1000 degrees of freedom. We also prove that the probability that all eigenvalues are within the limiting spectral support (given by the Marchenko–Pastur or the semicircle laws) tends for large dimensions to the universal values 0.6921 and 0.9397 for the real and complex cases, respectively. Applications include improved bounds for the probability that a Gaussian measurement matrix has a given restricted isometry constant in compressed sensing.
Index Terms—Random Matrix Theory, Principal Component Analysis, Compressed Sensing, eigenvalue distributions, Tracy–Widom distribution, Wishart matrices, Gaussian Orthogonal Ensemble, MANOVA.
I. INTRODUCTION

Many mathematical models in information theory and physics are formulated using matrices with random elements. In particular, the distribution of the eigenvalues of Wishart and Gaussian random matrices plays a key role in multivariate analysis, including principal component analysis,

[Footnote: Marco Chiani is with DEI, University of Bologna, V.le Risorgimento 2, 40136 Bologna, Italy (e-mail: [email protected]). This work was partially supported by the Italian Ministry of Education, Universities and Research (MIUR) under the PRIN Research Project "GRETA". Submitted 2015.]
DRAFT
analysis of large data sets, information theory, signal processing and mathematical physics [1]–[11]. Most of the problems concern the distribution of the smallest and/or of the largest eigenvalue, or of a randomly picked (unordered) eigenvalue. For example, in compressed sensing the probability that a randomly generated measurement matrix has a given restricted isometry constant is related to the probability Pr(a ≤ λmin(M), λmax(M) ≤ b), where λmin(M), λmax(M) denote the minimum and maximum eigenvalues of the matrix M = X†X, and X is the measurement matrix [9], [12], [13]. To mention another example, several stability problems in physics, in complex networks and in complex ecosystems are related to the probability that all eigenvalues of a random symmetric matrix (for instance with Gaussian entries) are negative [14]–[17]. This probability is also important in mathematics, as it is related to the expected number of minima in random polynomials [18].

Owing to the difficulties in computing the exact marginal distributions of eigenvalues, asymptotic formulas for matrices with large dimensions are often used as approximations. One important example is the Wigner semicircular law, giving the asymptotic distribution of a randomly picked eigenvalue of a symmetric/Hermitian random matrix M having zero-mean, independent, identically distributed (i.i.d.) entries above the diagonal. The law applies to a wide range of distributions for the entries [19], and in particular when M = X + X† and the elements of X are zero-mean (real or complex) i.i.d. Gaussian. In this situation, the symmetric matrix M = X + X† belongs to the Gaussian Orthogonal Ensemble (GOE) or to the Gaussian Unitary Ensemble (GUE), in the real and complex cases, respectively.

Another key result is the Marchenko–Pastur law, giving the asymptotic distribution of one randomly picked eigenvalue of the random matrix M = X†X.
This is related to the sample covariance matrix, and therefore of primary importance in statistics and signal processing. The limiting Marchenko–Pastur law applies to a wide class of distributions for the entries of X, including the case when X has i.i.d. Gaussian entries, and thus M = X†X is a white Wishart matrix [19].

The limiting value and the limiting distribution of the extreme eigenvalues (largest or smallest) have also been studied intensively [19], [20]. The Tracy–Widom laws give the asymptotic distribution of the extreme eigenvalues around the limiting values [7], [20]–[27]. Large deviation methods are used in [16], [28], [29] to derive the asymptotic behavior of the distribution of the largest eigenvalue far from its mean value.
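As a quick numerical illustration of the semicircle law (a sketch added here, not part of the paper's derivations): sampling a single GOE matrix already places essentially all eigenvalues inside the limiting support. The normalization M = (X + Xᵀ)/2 used below is one common GOE convention (an assumption; with this scaling the support edge is √(2n)).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
X = rng.standard_normal((n, n))
M = (X + X.T) / 2            # one common GOE normalization (assumed here)
eig = np.linalg.eigvalsh(M)  # eigenvalues in ascending order

edge = np.sqrt(2 * n)        # semicircle support is [-sqrt(2n), sqrt(2n)] for this scaling
inside = np.mean(np.abs(eig) <= 1.05 * edge)  # small slack for finite-n edge fluctuations
print(inside)                # essentially 1.0
```

The 5% slack accounts for the O(n^(-2/3)) Tracy–Widom fluctuations of the extreme eigenvalues around the edge at finite n.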
For non-asymptotic analysis, where the random matrices have finite dimensions, deriving the distribution of the eigenvalues is generally difficult, especially for the real Wishart and GOE cases. Recently, the exact distribution of the largest eigenvalue, together with efficient recursive methods for its numerical computation, has been found for real white Wishart, multivariate Beta (also known as double Wishart), and GOE matrices [30], [31].

In this paper we give new expressions and efficient recursive methods for the evaluation of the probability ψ(a, b) = Pr(a ≤ λmin(M), λmax(M) ≤ b) that all eigenvalues lie within an arbitrary interval [a, b], when M is a real finite dimensional white Wishart, double Wishart, or Gaussian symmetric matrix. For completeness we also provide the results for complex Wishart (with arbitrary covariance), complex double Wishart, and GUE matrices, by specializing [32, Th. 7]. The marginal cumulative distributions of the smallest and of the largest eigenvalue can be seen as the particular cases 1 − ψ(a, ∞) and ψ(−∞, b), respectively. We then derive simple and accurate approximations to ψ(a, b), based on incomplete gamma functions and valid for large matrices, and prove that the probability that all eigenvalues are within the limiting spectral support (given by the Marchenko–Pastur and the semicircle laws) tends for large dimensions to the universal values 0.6921 and 0.9397 for the real and complex cases, respectively.

Throughout the paper we denote by $\Gamma(\cdot)$ the gamma function; by $\gamma(a; x, y) = \int_x^y t^{a-1} e^{-t}\,dt$ the generalized incomplete gamma function; by $P(a, x) = \frac{1}{\Gamma(a)}\,\gamma(a; 0, x)$ the regularized lower incomplete gamma function; by $P(a; x, y) = \frac{1}{\Gamma(a)} \int_x^y t^{a-1} e^{-t}\,dt = P(a, y) - P(a, x)$ the generalized regularized incomplete gamma function; by $B(a, b)$ the beta function; by $B(x, y; a, b) = \int_x^y t^{a-1} (1-t)^{b-1}\,dt$ the incomplete beta function [33, Ch. 6]; by $(\cdot)^\dagger$ transposition and complex conjugation; and by $|\cdot|$ or $\det(\cdot)$ the determinant. When possible we use capital letters for random variables, and bold for vectors and matrices. We say that a random variable Z has a standard complex Gaussian distribution (denoted CN(0, 1)) if Z = Z₁ + iZ₂, where Z₁ and Z₂ are i.i.d. real Gaussian N(0, 1/2). A complex random vector X is Gaussian circularly symmetric if its probability density function (p.d.f.) has the form $f(\mathbf{x}) \propto \exp(-\mathbf{x}^\dagger \boldsymbol{\Sigma}^{-1} \mathbf{x})$, where Σ is the covariance matrix. Note that this implies that X is zero mean. When Σ = I the entries of X are i.i.d. CN(0, 1).
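For numerical work, these special functions are available in standard libraries; below is a minimal sketch of the generalized regularized incomplete gamma $P(a; x, y)$ in Python, assuming SciPy's `gammainc` (which implements the regularized lower incomplete gamma $P(a, x)$).

```python
from scipy.special import gammainc  # regularized lower incomplete gamma P(a, x)

def P_gen(a, x, y):
    """Generalized regularized incomplete gamma P(a; x, y) = P(a, y) - P(a, x)."""
    return gammainc(a, y) - gammainc(a, x)

# Sanity checks: P(a; 0, infinity) = 1, and additivity over adjacent intervals.
print(P_gen(2.5, 0.0, 1e3))  # essentially 1.0
print(P_gen(2.5, 0.0, 1.0) + P_gen(2.5, 1.0, 4.0) - P_gen(2.5, 0.0, 4.0))  # ~0
```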
II. REAL WISHART AND GAUSSIAN SYMMETRIC MATRICES

A. Wishart real matrices

Assume a Gaussian real n × N matrix X with i.i.d. columns, each with zero mean and covariance Σ. The real matrix M = XXᵀ is called Wishart, and its distribution is indicated as W_n(N, Σ). When Σ ∝ I the matrix is called white or uncorrelated Wishart. Denoting $\Gamma_m(a) = \pi^{m(m-1)/4} \prod_{i=1}^{m} \Gamma(a - (i-1)/2)$, the joint p.d.f. of the (real) ordered eigenvalues λ₁ ≥ λ₂ ≥ ... ≥ λ_n ≥ 0 of the real Wishart matrix M ∼ W_n(N, I) is given by [1], [34]

$f(x_1, \ldots, x_n) = K \prod_{i=1}^{n} e^{-x_i/2}\, x_i^{\alpha} \prod_{i<j} (x_i - x_j)$   (1)
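The central quantity ψ(a, b) can be checked against direct simulation (a Monte Carlo sketch, far less efficient than the recursive methods of the paper): sample M = XXᵀ repeatedly and count how often all eigenvalues fall in [a, b]. Here a, b are taken as the Marchenko–Pastur edges for n = 10, N = 20, a case where Table IV reports ψ ≈ 0.7645.

```python
import numpy as np

def psi_mc(n, N, a, b, trials=4000, seed=1):
    """Monte Carlo estimate of Pr(a <= lambda_min, lambda_max <= b) for W_n(N, I)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(trials):
        X = rng.standard_normal((n, N))
        eig = np.linalg.eigvalsh(X @ X.T)   # ascending eigenvalues of the Wishart matrix
        if eig[0] >= a and eig[-1] <= b:
            hits += 1
    return hits / trials

n, N = 10, 20
a = (np.sqrt(N) - np.sqrt(n)) ** 2   # Marchenko-Pastur edges (unscaled)
b = (np.sqrt(N) + np.sqrt(n)) ** 2
print(psi_mc(n, N, a, b))            # around 0.76 (cf. Table IV, p = 10, p/m = 1/2)
```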
When m, p → ∞ and m/p → γ ∈ (1, ∞), the probability that all eigenvalues are within the Marchenko–Pastur support is

$\psi\left((\sqrt{m} - \sqrt{p})^2,\, (\sqrt{m} + \sqrt{p})^2\right) \to F_2^2(0) \simeq 0.9397\,.$   (66)
3) Let M be a (p × p) real symmetric GOE matrix. When p → ∞, the probability that all eigenvalues are within the semicircle support is

$\psi\left(-\sqrt{2p},\, \sqrt{2p}\right) \to F_1^2(0) \simeq 0.6921\,.$   (67)
4) Let M be a (p × p) complex Hermitian GUE matrix. When p → ∞, the probability that all eigenvalues are within the semicircle support is

$\psi\left(-\sqrt{2p},\, \sqrt{2p}\right) \to F_2^2(0) \simeq 0.9397\,.$   (68)
Proof: Consider first the Wishart case. The statistical dependence between the largest and the smallest eigenvalue passes through the intermediate p − 2 eigenvalues, and in the limit p → ∞ the largest and smallest eigenvalues are therefore independent. Then, using (50) and observing that the Marchenko–Pastur edges are the constants $\mu_{mp}$ and $\mu_{mp}^-$ appearing in (51) and (53), we get the result. For the GOE/GUE the limiting distribution of the extreme eigenvalues is still a shifted (by an amount equal to the semicircle law edges) and scaled $TW_\beta$, so the same reasoning leads to (67) and (68).

This theorem is valid not only for matrices derived from Gaussian measurements, but for the much wider class of matrices for which (51), (53) apply [24], [26], [43]. Note that the gamma approximation (59) for the Tracy–Widom law gives $F_1^2(0) \approx P^2(k, \alpha/\theta) = 0.8312^2 \simeq 0.691$ and $F_2^2(0) \approx P^2(k, \alpha/\theta) = 0.96945^2 \simeq 0.9398$. Exact values of ψ(ã, b̃) for finite dimensional real Wishart matrices, as obtained by Algorithm 1, are reported in Table IV, together with the asymptotic value.

Finally, since for Wishart matrices the smallest and largest eigenvalues have different variances, asymptotically we can leave the same probability on the left and right sides by moving $t\,\sigma_{ms}^-$ to the left and $t\,\sigma_{ms}$ to the right of the limiting Marchenko–Pastur support. In fact, we have
$\Pr\left(\lambda_{\max}(\mathbf{M}) > \mu_{ms} + t\,\sigma_{ms}\right) \to 1 - F_\beta(t)$   (69)

$\Pr\left(\lambda_{\min}(\mathbf{M}) < \mu_{ms}^- - t\,\sigma_{ms}^-\right) \to 1 - F_\beta(t)$   (70)

and thus

$\psi\left(\mu_{ms}^- - t\,\sigma_{ms}^-,\; \mu_{ms} + t\,\sigma_{ms}\right) \to F_\beta^2(t)$   (71)
with, e.g., F₁(0) ≃ 83%, F₁(1) ≃ 95%, F₁(2) ≃ 99%, F₁(3) ≃ 99.8%.

TABLE IV
PROBABILITY ψ(ã, b̃) THAT ALL EIGENVALUES OF A REAL WISHART MATRIX ARE WITHIN THE MARCHENKO–PASTUR EDGES, FOR p/m = 2/3, 1/2, 1/5, 1/10. NUMERICAL VALUES FOR FINITE p OBTAINED BY ALGORITHM 1, AND FOR p = ∞ BY THEOREM 6.

   p   | p/m = 2/3 | p/m = 1/2 | p/m = 1/5 | p/m = 1/10
  -----+-----------+-----------+-----------+-----------
   10  |  0.7678   |  0.7645   |  0.7625   |  0.7624
   20  |  0.7499   |  0.7483   |  0.7476   |  0.7477
   50  |  0.7332   |  0.7326   |  0.7327   |  0.7329
  100  |  0.7239   |  0.7238   |  0.7242   |  0.7244
  200  |  0.7169   |  0.7171   |  0.7175   |  0.7177
  500  |  0.7101   |  0.7103   |  0.7108   |  0.7109
   ∞   |  0.6921   |  0.6921   |  0.6921   |  0.6921
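The limiting value F₁²(0) ≈ 0.6921 in the last row can be reproduced from the shifted-gamma approximation to the Tracy–Widom law mentioned above. The parameter values (k, θ, α) below are the TW₁ fitting constants reported in [30], quoted here from memory; treat them as illustrative assumptions rather than exact.

```python
from scipy.special import gammainc

# Shifted-gamma approximation to Tracy-Widom TW1.
# Parameters (k, theta, alpha) as reported in [30] -- assumed values, for illustration.
k, theta, alpha = 46.446, 0.186054, 9.84801

F1_0 = gammainc(k, alpha / theta)   # approximates F_1(0)
print(F1_0, F1_0 ** 2)              # approx 0.8312 and 0.691
```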
VI. APPLICATION TO COMPRESSED SENSING

Assume that we want to solve the system

$\mathbf{y} = \mathbf{A}\mathbf{x}$   (72)

where y ∈ Rᵐ and A ∈ Rᵐˣⁿ are known, the number of equations is m < n, and x ∈ Rⁿ is unknown. Since m < n we can think of y as a compressed version of x. Without other constraints the system is underdetermined, and there are infinitely many distinct solutions x satisfying (72). If we assume that at most s elements of x are non-zero (i.e., the vector is s-sparse), with s < m, then there is only one solution (the right one) to (72), provided that all possible submatrices
consisting of 2s columns of A have full rank (2s). However, even when this condition is satisfied, finding the solution of (72) subject to ||x||₀ ≤ s, where the ℓ₀-norm ||·||₀ counts the number of non-zero elements, is computationally prohibitive. A computationally much easier problem is to find the ℓ₁-norm minimization solution $\mathbf{x} = \arg\min_{\tilde{\mathbf{x}}} \{\|\tilde{\mathbf{x}}\|_1 : \mathbf{y} = \mathbf{A}\tilde{\mathbf{x}}\}$. In [9] it is proved that, under stricter conditions on A, the solution provided by ℓ₁-norm minimization coincides with that of the ℓ₀-norm minimization. More precisely, for integer s define the isometry constant of a matrix A as the smallest number δₛ = δₛ(A) such that [9]

$(1 - \delta_s)\|\mathbf{x}\|_2^2 \le \|\mathbf{A}\mathbf{x}\|_2^2 \le (1 + \delta_s)\|\mathbf{x}\|_2^2$   (73)
holds for all s-sparse vectors x. The possibility of using ℓ₁ minimization instead of the impractical ℓ₀ minimization is, for a given matrix A, related to the restricted isometry constant [9]. For example, in [45] it is shown that the ℓ₀ and ℓ₁ solutions coincide for s-sparse vectors x if δₛ < 0.307. The next question is then how to design a matrix A with a prescribed isometry constant. One possible way to design A consists simply in randomly generating its entries according to some statistical distribution. In this case, for given m, n, s and δₛ, the inequalities (73) are satisfied with some probability. The target here is to find a way to generate A such that, for given m, n, s, the probability Pr(δₛ(A) < 0.307) is close to one. When the measurement matrix A has entries randomly generated according to a N(0, 1/m) distribution, this probability can be bounded starting from the probability Pr(a ≤ λmin(AₛᵀAₛ), λmax(AₛᵀAₛ) ≤ b), where Aₛ is an m × s Gaussian random matrix with N(0, 1/m) i.i.d. entries [9, Sec. III]. In [9], deviation bounds for the largest and smallest eigenvalues of AₛᵀAₛ are obtained, using the concentration inequality, as

$\Pr\left(\lambda_{\max}(\mathbf{A}_s^T\mathbf{A}_s) > \left(1 + \sqrt{s/m} + o(1) + t\right)^2\right) \le e^{-mt^2/2}$

$\Pr\left(\lambda_{\min}(\mathbf{A}_s^T\mathbf{A}_s) < \left(1 - \sqrt{s/m} + o(1) - t\right)^2\right) \le e^{-mt^2/2}$

where t > 0 and o(1) is a small term tending to zero as m increases. In our notation, and neglecting the o(1) terms, the previous bounds can be rewritten as

$\Pr\left(\lambda_{\max}(\mathbf{M}) > (\sqrt{m} + \sqrt{s} + t\sqrt{m})^2\right) \le e^{-mt^2/2}$   (74)

$\Pr\left(\lambda_{\min}(\mathbf{M}) < (\sqrt{m} - \sqrt{s} - t\sqrt{m})^2\right) \le e^{-mt^2/2}$   (75)
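The looseness of a bound such as (74) can be checked directly by simulation; the sketch below compares the empirical tail of λmax(M), M = mAₛᵀAₛ ∼ Wₛ(m, I), with the bound e^(−mt²/2), using a smaller m than in the figures so that the comparison is observable with few trials.

```python
import numpy as np

rng = np.random.default_rng(2)
m, s, t, trials = 50, 5, 0.2, 3000
thr = (np.sqrt(m) + np.sqrt(s) + t * np.sqrt(m)) ** 2   # threshold in (74)

lmax = np.empty(trials)
for i in range(trials):
    X = rng.standard_normal((s, m))
    lmax[i] = np.linalg.eigvalsh(X @ X.T)[-1]   # lambda_max of a W_s(m, I) sample

emp = np.mean(lmax > thr)           # empirical tail probability
bound = np.exp(-m * t**2 / 2)       # concentration bound, e^{-1} here
print(emp, bound)                   # emp is far below the bound
```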
where M = mAₛᵀAₛ ∼ Wₛ(m, I) and λmax(M) = m λmax(AₛᵀAₛ). In Fig. 1 and Fig. 2 these bounds are compared with the exact results given by Algorithm 1 and with the simple gamma approximations (60), (61), for some values of s, m. It can be noted that the concentration inequality bounds (74), (75) are quite loose.

VII. CONCLUSIONS

Iterative algorithms have been found to evaluate in a few seconds the exact value of the probability that all eigenvalues lie within an arbitrary interval [a, b], for quite large (e.g. 500 × 500) real white Wishart, complex Wishart with arbitrary correlation, double Wishart, and Gaussian symmetric/Hermitian matrices. For large matrices, simple approximations based on shifted incomplete gamma functions have been proposed, and it is proved that the probability that all eigenvalues are within the limiting support is 0.6921 for real white Wishart and GOE matrices, and 0.9397 for complex white Wishart and GUE matrices. As examples, we analyzed the probability that all eigenvalues are negative for the GOE, of interest in complex ecosystems and physics, and compared the new expressions with the concentration inequality based bounds in the context of compressed sensing.

REFERENCES

[1] T. W. Anderson, An Introduction to Multivariate Statistical Analysis. New York: Wiley, 2003.
[2] R. J. Muirhead, Aspects of Multivariate Statistical Theory. New York: Wiley, 1982.
[3] M. L. Mehta, Random Matrices, 2nd ed. Boston, MA: Academic, 1991.
[4] J. H. Winters, "On the capacity of radio communication systems with diversity in a Rayleigh fading environment," IEEE J. Sel. Areas Commun., vol. 5, no. 5, pp. 871–878, Jun. 1987.
[5] A. Edelman, "Eigenvalues and condition numbers of random matrices," SIAM J. Matrix Anal. Appl., vol. 9, no. 4, pp. 543–560, 1988.
[6] İ. E. Telatar, "Capacity of multi-antenna Gaussian channels," European Trans. Telecommun., vol. 10, no. 6, pp. 585–595, Nov./Dec. 1999.
[7] I. M. Johnstone, "On the distribution of the largest eigenvalue in principal components analysis," The Annals of Statistics, vol. 29, no. 2, pp. 295–327, 2001.
[8] M. Chiani, M. Z. Win, and A. Zanella, "On the capacity of spatially correlated MIMO Rayleigh fading channels," IEEE Trans. Inf. Theory, vol. 49, no. 10, pp. 2363–2371, Oct. 2003.
[9] E. J. Candès and T. Tao, "Decoding by linear programming," IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
[10] F. Penna, R. Garello, and M. A. Spirito, "Cooperative spectrum sensing based on the limiting eigenvalue ratio distribution in Wishart matrices," IEEE Commun. Lett., vol. 13, no. 7, pp. 507–509, 2009.
[11] Y. Chen and M. McKay, "Coulomb fluid, Painlevé transcendents, and the information theory of MIMO systems," IEEE Trans. Inf. Theory, vol. 58, no. 7, pp. 4594–4634, Jul. 2012.
[12] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, 2006.
[13] E. J. Candès and M. B. Wakin, "An introduction to compressive sampling," IEEE Signal Process. Mag., vol. 25, no. 2, pp. 21–30, 2008.
[14] R. M. May, "Will a large complex system be stable?" Nature, vol. 238, pp. 413–414, Aug. 1972.
[15] A. Aazami and R. Easther, "Cosmology from random multifield potentials," Journal of Cosmology and Astroparticle Physics, vol. 0603, p. 013, 2006.
[16] D. S. Dean and S. N. Majumdar, "Extreme value statistics of eigenvalues of Gaussian random matrices," Physical Review E, vol. 77, no. 4, p. 041108, Apr. 2008.
[17] M. C. D. Marsh, L. McAllister, E. Pajer, and T. Wrase, "Charting an inflationary landscape with random matrix theory," Journal of Cosmology and Astroparticle Physics, vol. 11, p. 40, Nov. 2013.
[18] J.-P. Dedieu and G. Malajovich, "On the number of minima of a random polynomial," arXiv Mathematics e-prints, Feb. 2007.
[19] Z. Bai and J. W. Silverstein, Spectral Analysis of Large Dimensional Random Matrices. Science Press, 2006.
[20] C. Tracy and H. Widom, "The distributions of random matrix theory and their applications," New Trends in Mathematical Physics, pp. 753–765, 2009.
[21] ——, "Level-spacing distributions and the Airy kernel," Communications in Mathematical Physics, vol. 159, no. 1, pp. 151–174, 1994.
[22] ——, "On orthogonal and symplectic matrix ensembles," Communications in Mathematical Physics, vol. 177, pp. 727–754, 1996.
[23] K. Johansson, "Shape fluctuations and random matrices," Communications in Mathematical Physics, vol. 209, pp. 437–476, 2000.
[24] A. Soshnikov, "A note on universality of the distribution of the largest eigenvalues in certain sample covariance matrices," Journal of Statistical Physics, vol. 108, pp. 1033–1056, 2002.
[25] I. M. Johnstone, "Approximate null distribution of the largest root in multivariate analysis," The Annals of Applied Statistics, vol. 3, no. 4, pp. 1616–1633, 2009.
[26] O. N. Feldheim and S. Sodin, "A universality result for the smallest eigenvalues of certain sample covariance matrices," Geometric and Functional Analysis, vol. 20, no. 1, pp. 88–123, 2010.
[27] Z. Ma, "Accuracy of the Tracy–Widom limits for the extreme eigenvalues in white Wishart matrices," Bernoulli, vol. 18, no. 1, pp. 322–359, 2012.
[28] D. S. Dean and S. N. Majumdar, "Large deviations of extreme eigenvalues of random matrices," Phys. Rev. Lett., vol. 97, p. 160201, Oct. 2006. [Online]. Available: http://link.aps.org/doi/10.1103/PhysRevLett.97.160201
[29] C. Nadal and S. N. Majumdar, "A simple derivation of the Tracy-Widom distribution of the maximal eigenvalue of a Gaussian unitary random matrix," Journal of Statistical Mechanics: Theory and Experiment, vol. 2011, no. 04, p. P04001, 2011.
[30] M. Chiani, "Distribution of the largest eigenvalue for real Wishart and Gaussian random matrices and a simple approximation for the Tracy–Widom distribution," Journal of Multivariate Analysis, vol. 129, pp. 69–81, 2014.
[31] ——, "Distribution of the largest root of a matrix for Roy's test in multivariate analysis of variance," arXiv preprint arXiv:1401.3987, 2014.
[32] M. Chiani and A. Zanella, "Joint distribution of an arbitrary subset of the ordered eigenvalues of Wishart matrices," in Proc. IEEE Int. Symp. on Personal, Indoor and Mobile Radio Commun., Cannes, France, Sep. 2008, pp. 1–6, invited paper.
[33] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Washington, D.C.: United States Department of Commerce, 1970.
[34] A. T. James, "Distributions of matrix variates and latent roots derived from normal samples," Ann. Math. Stat., vol. 35, pp. 475–501, 1964.
[35] N. De Bruijn, "On some multiple integrals involving determinants," J. Indian Math. Soc., vol. 19, pp. 133–151, 1955.
[36] P. Vivo, S. N. Majumdar, and O. Bohigas, "Large deviations of the maximum eigenvalue in Wishart random matrices," Journal of Physics A: Mathematical and Theoretical, vol. 40, no. 16, p. 4317, 2007.
[37] K. S. Pillai, "Upper percentage points of the largest root of a matrix in multivariate analysis," Biometrika, vol. 54, no. 1-2, pp. 189–194, 1967.
[38] C. G. Khatri, "Distribution of the largest or the smallest characteristic root under null hypothesis concerning complex multivariate normal populations," Ann. Math. Stat., vol. 35, pp. 1807–1810, Dec. 1964.
[39] M. Chiani, M. Z. Win, and H. Shin, "MIMO networks: the effects of interference," IEEE Trans. Inf. Theory, vol. 56, no. 1, pp. 336–349, Jan. 2010.
[40] A. Zanella, M. Chiani, and M. Z. Win, "On the marginal distribution of the eigenvalues of Wishart matrices," IEEE Trans. Commun., vol. 57, no. 4, pp. 1050–1060, Apr. 2009.
[41] B. Nadler, "Finite sample approximation results for principal component analysis: a matrix perturbation approach," The Annals of Statistics, vol. 36, no. 6, pp. 2791–2817, 2008.
[42] N. El Karoui, "On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n → ∞," arXiv preprint math/0309355, 2003.
[43] S. Péché, "Universality results for the largest eigenvalues of some sample covariance matrix ensembles," Probability Theory and Related Fields, vol. 143, no. 3, pp. 481–516, 2009.
[44] V. A. Marčenko and L. A. Pastur, "Distribution of eigenvalues for some sets of random matrices," Math. USSR Sbornik, vol. 1, pp. 457–483, 1967.
[45] T. Cai, L. Wang, and G. Xu, "New bounds for restricted isometry constants," IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4388–4394, Sep. 2010.
[Figure omitted: log-scale plot of Pr(λmin(M) < (√m − √s − t√m)²) versus t.]

Fig. 1. Distribution of the smallest eigenvalue for real Wishart matrices Wₛ(m, I), m = 400, s = 10. Comparison between the exact (dashed line), the gamma approximation (61) (dotted), and the concentration inequality bound (75) (solid).
[Figure omitted: log-scale plot of Pr(λmax(M) > (√m + √s + t√m)²) versus t.]

Fig. 2. Distribution of the largest eigenvalue for real Wishart matrices Wₛ(m, I), m = 400, s = 10. Comparison between the exact (dashed line), the gamma approximation (60) (dotted), and the concentration inequality bound (74) (solid).