arXiv:1412.1438v1 [math.PR] 3 Dec 2014
RANDOM MATRICES HAVE SIMPLE SPECTRUM

TERENCE TAO AND VAN VU
Abstract. Let Mn = (ξij)_{1≤i,j≤n} be a real symmetric random matrix in which the upper-triangular entries ξij, i < j, and diagonal entries ξii are independent. We show that with probability tending to 1, Mn has no repeated eigenvalues. As a corollary, we deduce that the Erdős–Rényi random graph has simple spectrum asymptotically almost surely, answering a question of Babai.
1. Introduction

Let n be an asymptotic parameter going to infinity; we allow all mathematical objects in the discussion below to depend on n unless explicitly declared to be fixed. Asymptotic notation such as o(1), O(), or ≪ will always be understood to be with respect to the asymptotic limit n → ∞; for instance, X ≪ Y denotes the claim that X ≤ CY for all sufficiently large n and some fixed C independent of n. In this paper, we study the spectrum of the following general random matrix model.

Definition 1.1 (A general model). We consider real symmetric random matrices Mn of the form Mn = (ξij)_{1≤i,j≤n}, where the entries ξij for i ≤ j are jointly independent with ξji = ξij, and the upper-triangular entries ξij, i < j, have distribution ξ for some real random variable ξ (which may depend on n). The diagonal entries ξii, 1 ≤ i ≤ n, can have an arbitrary real distribution (and can be correlated with each other), but are required to be independent of the upper-triangular entries ξij, 1 ≤ i < j ≤ n.
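As a purely illustrative aside (not part of the paper's argument), the following Python sketch, assuming NumPy is available, samples a matrix from this model with ξ a uniform random sign and zero diagonal, and inspects the smallest gap between adjacent eigenvalues; the function name and parameter choices here are ours, introduced only for the example.

    import numpy as np

    def sample_model_matrix(n, rng):
        # Definition 1.1 with xi = uniform +-1 sign (non-trivial with mu = 1/2)
        # and zero diagonal, as permitted for adjacency-type matrices.
        M = np.zeros((n, n))
        upper = np.triu_indices(n, k=1)
        M[upper] = rng.choice([-1.0, 1.0], size=len(upper[0]))
        return M + M.T  # enforce xi_ji = xi_ij

    rng = np.random.default_rng(0)
    eigs = np.linalg.eigvalsh(sample_model_matrix(500, rng))  # real, sorted spectrum
    print("smallest adjacent-eigenvalue gap:", np.diff(eigs).min())
    # The main result (Theorem 1.3) predicts a repeated eigenvalue (a zero gap,
    # up to floating-point error) occurs with probability at most n^{-A}
    # for any fixed A.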
Important classes of matrices covered by this model include real symmetric Wigner matrix ensembles (e.g. random symmetric sign matrices) and the adjacency matrix of an Erdős–Rényi random graph G(n, p) (note that we permit the diagonal entries to be identically zero). Notice that we do not require the distribution ξ to have zero mean, or even to be absolutely integrable. As a matter of fact, our proofs do not require the entries to be iid either; see Section 5 for details. However, for the sake of simplicity, we make the iid assumption in the main sections of this paper.

T. Tao is supported by a Simons Investigator grant, the James and Carol Collins Chair, the Mathematical Analysis & Application Research Fund Endowment, and by NSF grant DMS-1266164. V. Vu is supported by NSF grant DMS 1307797 and AFOSR grant FA9550-12-1-0083.
The spectrum of a symmetric matrix is real, and we say that it is simple if all eigenvalues have multiplicity one. This paper deals with the following basic question:

Is it true that the spectrum of a random matrix is simple with high probability?

It is easy to see that if the distribution ξ is continuous, then the spectrum is simple with probability 1. On the other hand, the discrete case is far from trivial. In particular, the following conjecture of Babai has been open since the 1980s [2].

Conjecture 1.2. With probability 1 − o(1), G(n, 1/2) has a simple spectrum.

In [1], Babai, Grigoriev and Mount showed that the notorious graph isomorphism problem is in P within the class of graphs with simple spectrum. Conjecture 1.2, if it holds, implies that most graphs belong to this class. From universality results [3] on the gaps between adjacent eigenvalues, one can show that with probability 1 − o(1), most (i.e. (1 − o(1))n) of the eigenvalues of G(n, 1/2) are simple; however, the error terms in these universality results do not appear to be strong enough to resolve Babai's conjecture completely.

Our main result provides a positive answer to the question above. We say that a real-valued random variable ξ is non-trivial if there is a fixed µ > 0 (independent of n) such that

(1)    P(ξ = x) ≤ 1 − µ

for all x ∈ R. In particular, any random variable independent of n is non-trivial if it is not deterministic (i.e. it does not take a single value almost surely). The distribution ξ for the adjacency matrix of G(n, p) will be non-trivial if p stays bounded away from both 0 and 1 (and in particular if 0 < p < 1 is a fixed value such as 1/2).

Theorem 1.3 (Simple spectrum). Let Mn be a random matrix of the form in Definition 1.1 whose upper-triangular entries have a non-trivial distribution ξ for some fixed µ > 0. Then for every fixed A > 0 and n sufficiently large (depending on A, µ), the spectrum of Mn is simple with probability at least 1 − n^{−A}.

In the case when Mn is the adjacency matrix of G(n, 1/2), we may take µ = 1/2, and Theorem 1.3 implies

Corollary 1.4. Conjecture 1.2 holds.

The rest of the paper is devoted to the proof of Theorem 1.3. One can easily extend Theorem 1.3 (with the same proof) to more general models where the entries are independent but not iid, and also to random Hermitian matrices; see Section 5.

2. From multiple eigenvalues to structured eigenvectors

The first step in our proof is to reduce the non-simple spectrum problem to a problem about the structure of eigenvectors.
For a symmetric matrix Mn (either deterministic or random) of size n, write

(2)    M_n = \begin{pmatrix} M_{n-1} & X \\ X^* & \xi_{nn} \end{pmatrix}
where X = (x1, . . . , x_{n−1}) ∈ R^{n−1} is the corresponding column vector. We need the following (deterministic) lemma:

Lemma 2.1. Let Mn be a real symmetric matrix of the form (2). Assume that the spectrum of Mn is not simple. Then X is orthogonal to an eigenvector of M_{n−1}.

Proof. We can change basis in R^{n−1} so that the standard basis e1, . . . , e_{n−1} is an orthonormal eigenbasis of M_{n−1}; M_{n−1} is now a diagonal matrix with entries λ1, . . . , λ_{n−1}. If Mn has a multiple eigenvalue, then one of the λi is an eigenvalue of Mn (indeed, by the Cauchy interlacing inequalities, if λ_j(Mn) = λ_{j+1}(Mn), then λ_j(M_{n−1}) is trapped between these two equal eigenvalues and must equal their common value). Assume (without loss of generality) that it is λ1, and let v = (v1, . . . , vn) be a corresponding eigenvector. The first row of the equation Mn v = λ1 v implies that x1 vn = 0. If x1 = 0, then X is orthogonal to e1. If vn = 0, then v′ := (v1, . . . , v_{n−1}) is an eigenvector of M_{n−1}, and the last row of the equation Mn v = λ1 v implies that v′ · X = λ1 vn = 0, proving the lemma.

In view of this lemma, Theorem 1.3 clearly follows from the following statement.

Proposition 2.2. Let the notation and hypotheses be as in Theorem 1.3, and expand Mn as in (2). Let E1 be the event that X is orthogonal to a non-trivial eigenvector of M_{n−1}. Then P(E1) ≪ n^{−A}.

It remains to prove this proposition. The crucial point here is that X and M_{n−1} are independent of each other. Fix a constant A > 0, and call a vector v ∈ R^n rich if we have

sup_{x∈R} P(X · v = x) ≥ n^{−A},

where X ∈ R^n is a random vector whose entries are iid copies of ξ. Let E_{2,n−1} be the event that an eigenvector of M_{n−1} is rich.
We have

P(E1) = P(E1 | Ē_{2,n−1}) P(Ē_{2,n−1}) + P(E1 | E_{2,n−1}) P(E_{2,n−1}) ≤ n^{−A} + P(E_{2,n−1})

by the definition of E1 and E_{2,n−1}. To prove Proposition 2.2, it therefore suffices to prove

Proposition 2.3 (Rich eigenvectors are rare). Let the notation and hypotheses be as above. Then P(E_{2,n−1}) ≪ n^{−A}.

In fact we will prove the stronger claim

(3)    P(E_{2,n}) ≪ exp(−cn)

for some fixed c > 0 (depending on µ), where E_{2,n} is the event that an eigenvector of Mn is rich; Proposition 2.3 then follows by replacing n with n − 1.

The key ingredient of our proof of Proposition 2.3 is the so-called inverse Littlewood–Offord theory, introduced in [8] (see [5] for a survey), which established almost
completely the structure of rich vectors. On the other hand, one expects the eigenvectors of a random matrix to look random, and thus not to possess any rigid structure. This explains the intuition behind Proposition 2.3. The actual proof, however, requires some novel ideas and delicate arguments, and will be the subject of the next two sections.
3. Inverse Littlewood–Offord theory

We recall the definition of a (symmetric) generalized arithmetic progression (GAP):

Definition 3.1. A set P ⊂ R is a symmetric GAP of rank r if it can be expressed in the form

P = {m1 g1 + · · · + mr gr : −Mi ≤ mi ≤ Mi, mi ∈ Z for all 1 ≤ i ≤ r}

for some r ≥ 0, some g1, . . . , gr ∈ R, and some real numbers M1, . . . , Mr. It is convenient to think of P as the image of an integer box B := {(m1, . . . , mr) ∈ Z^r : −Mi ≤ mi ≤ Mi} under the linear map Φ : (m1, . . . , mr) ↦ m1 g1 + · · · + mr gr. The numbers gi are the generators of P, and the numbers Mi are the dimensions of P. We refer to r as the rank of P. We say that P is proper if this map is one to one. We define the volume of P to be ∏_{i=1}^{r} (2⌊Mi⌋ + 1). For more discussion of GAPs (including non-symmetric GAPs, which we will not use here), see [7].

For a vector V = (v1, . . . , vn), define the concentration probability

p_ξ(V) := sup_{x∈R} P( ∑_{i=1}^{n} ξi vi = x ),

where the ξi are iid copies of ξ. In the notation of the previous section, a vector V is then rich precisely when p_ξ(V) ≥ n^{−A}. Abusing notation slightly, we also think of V as a (multi-)set, as the ordering of the coordinates plays no role.

The next theorem determines the structure of rich vectors/sets, asserting that such vectors mostly lie inside a GAP of bounded rank.

Theorem 3.2 (Structure theorem for rich vectors). Let δ < 1 and A be positive constants. There are constants d0 = d0(δ, A) ≥ 1 and C0 = C0(δ, A) such that the following holds. Assume that p_ξ(V) ≥ n^{−A}. Then for any n^δ ≤ m ≤ n, there exists a proper symmetric GAP Q of rank d ≤ d0 with volume at most C0 p_ξ(v1, . . . , vn)^{−1} m^{−d/2} such that Q contains all but at most m elements of V (counting multiplicities).

Proof. See [6, Theorem 2.1].

This theorem extended earlier results in [8, 9]; see [5] for a survey.
For our purpose, we are going to need the following refinement of Theorem 3.2, in which the GAP P not only contains most of the rich vector V, but also contains a large subset of V that does not concentrate too strongly.

Theorem 3.3 (A finer structure theorem for rich vectors). Let 0 < ε < 1/4 and A > 0 be fixed, and let V = (v1, . . . , vn) be a rich (multi-)set. Set d0 = d0(1/2, A) and C0 = C0(1/2, A) from Theorem 3.2. Then there are (multi-)sets W′ ⊂ W ⊂ V with |W| ≥ n − n^{1−ε/4} and |W′| ≤ εn, a parameter p ≥ n^{−A}, and a GAP P of rank d ≤ d0 and volume at most 2C0 p^{−1} n^{−d/2} such that the following holds:
• W ⊂ P;
• p_ξ(W′) ≤ n^{d0 ε} p.

We now prove this theorem. We first establish the following proposition:

Proposition 3.4. Let 0 < ε < 1/4 and A > 0 be fixed, and set d0 := d0(1/2, A) and C0 := C0(1/2, A). Let P be a proper GAP of rank d ≤ d0 and volume at most 2C0 p^{−1} n^{−d/2} for some p ≥ n^{−A}. Let v1, . . . , vn ∈ P (allowing repetitions). Assume n is sufficiently large depending on A. Then one of the following statements holds:

(i) (Stability) There are indices 1 ≤ i1 < . . . < ik ≤ n with k ≤ εn such that the (multi-)set V′ := {v_{i1}, . . . , v_{ik}} satisfies p_ξ(V′) ≤ n^{d0 ε} p.

(ii) (Concentration) There exists a GAP P′ of some rank d′ ≤ d0 and volume at most (n^{ε/2} p)^{−1} n^{−d′/2} which contains at least n − n^{1−ε/3} elements of {v1, . . . , vn}.

Proof. Assume that the stability conclusion (i) fails. Let I := {i1, . . . , ik} be a subset of {1, . . . , n} of cardinality k := ⌊εn⌋ to be chosen later, and set V_I := {v_{i1}, . . . , v_{ik}}. Then p_ξ(V_I) > n^{d0 ε} p ≥ n^{−A}. Applying Theorem 3.2 with m := n^{1−ε/2} and δ = 1/2 (and n replaced by k), we obtain a proper symmetric GAP P_I of some rank d_I ≤ d0 and volume at most

C0 (n^{d0 ε} p)^{−1} (n^{1−ε/2})^{−d_I/2} ≤ (n^{ε/2} p)^{−1} n^{−d_I/2},

which contains at least k − n^{1−ε/2} elements of V_I. At present, P_I is unrelated to P, but we can make P_I "commensurate" with P as follows. Write

P_I = {n1 w1 + . . . + n_{d_I} w_{d_I} : |ni| ≤ Ni for all 1 ≤ i ≤ d_I}

and let Σ ⊂ Z^{d_I} be the set of d_I-tuples (n1, . . . , n_{d_I}) ∈ Z^{d_I} with |ni| ≤ Ni for all 1 ≤ i ≤ d_I and n1 w1 + . . . + n_{d_I} w_{d_I} ∈ P. We say that P_I has full rank in P if Σ spans R^{d_I} as a real vector space. We claim that we may assume without loss of generality that P_I is of full rank in P. Indeed, if this is not the case, then Σ is
contained in a hyperplane, which by symmetry we may take to be given by an equation of the form x_{d_I} = a1 x1 + . . . + a_{d_I−1} x_{d_I−1}. But then every element n1 w1 + . . . + n_{d_I} w_{d_I} of P_I ∩ P can be rewritten as ∑_{j=1}^{d_I−1} nj (wj + aj w_{d_I}), and so one may replace P_I with a rank d_I − 1 GAP P_I′ of volume at most that of P_I, such that P_I ∩ P = P_I′ ∩ P. By the principle of infinite descent, we may iterate this procedure until we have replaced P_I with a functionally equivalent GAP which is of full rank in P.

The purpose of making this full rank reduction is that it cuts down on the number of possible P_I. Indeed, to specify P_I, one needs to specify the rank d_I, the dimensions N1, . . . , N_{d_I}, and the generators w1, . . . , w_{d_I}. As P_I has rank at most d0 and volume at most n^{O(1)}, we have O(n^{O(1)}) choices for d_I, N1, . . . , N_{d_I}. To specify the generators, it suffices to choose d_I linearly independent elements (n1, . . . , n_{d_I}) of Σ, together with their representatives n1 w1 + . . . + n_{d_I} w_{d_I} in P. As P has volume O(n^{O(1)}), we see that the total number of choices here is also O(n^{O(1)}). Thus we see that there are at most O(n^{O(1)}) choices for P_I in all.

Applying the pigeonhole principle, we conclude that there exists a fixed GAP P′ of rank d′ ≤ d0 and volume at most (n^{ε/2} p)^{−1} n^{−d′/2} such that, when I is chosen uniformly at random from all subsets of {1, . . . , n} of size k := ⌊εn⌋, then with probability ≫ n^{−O(1)}, at least k − n^{1−ε/2} of the elements of V_I lie in P′. A routine application of the Chernoff inequality then shows that at least n − n^{1−ε/3} of the v1, . . . , vn lie in P′. This gives the desired concentration conclusion (ii).
To prove Theorem 3.3, we apply Proposition 3.4 iteratively, as follows.

(i) Let V be a rich vector, and set p1 := p_ξ(V); thus p1 ≥ n^{−A} by hypothesis. By Theorem 3.2 (with m = n^{1−ε/2}), we may find a GAP P1 of rank d1 ≤ d0 and volume at most C0 p1^{−1} n^{−d1/2} which contains all but at most n^{1−ε/2} elements of V. Set V1 := V ∩ P1 and n1 := |V1|; thus n1 ≥ n − n^{1−ε/2}. Initialize i = 1, so that Vi, Pi, ni, pi have all been defined.

(ii) If there exists a subset Vi′ of Vi of size ki := ⌊εni⌋ such that p_ξ(Vi′) ≤ n^{d0 ε} pi, then we set W′, W, p, P equal to Vi′, Vi, pi, Pi respectively, and STOP. Otherwise, if no such subset Vi′ exists, we move on to step (iii).

(iii) If pi < n^{−A}, then STOP. Otherwise, by Proposition 3.4 with n replaced by ni, we may find a set V_{i+1} ⊂ Vi of size n_{i+1} := |V_{i+1}| ≥ ni − ni^{1−ε/3}, contained in a GAP P_{i+1} of rank d_{i+1} at most d0 and volume at most p_{i+1}^{−1} n_{i+1}^{−d_{i+1}/2}, where p_{i+1} := ni^{ε/2} pi. Thus V_{i+1}, P_{i+1}, n_{i+1}, p_{i+1} have all been defined.

(iv) Increment i to i + 1 and return to step (ii).

Note that after each successfully completed loop, the probability pi increases by a multiplicative factor of ni^{ε/2}, while ni only decreases by an additive factor of ni^{1−ε/3}; since p1 is initially at least n^{−A}, we see that after O(1) steps pi will exceed 1, at which point we must terminate at step (ii). Thus the above algorithm can only run for at most O(1) steps. The same analysis also shows inductively that pi ≥ ni^{−A} for all i = O(1), so one never terminates at step (iii), and so must instead terminate at step (ii).
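To spell out the termination count (a routine verification under the parameters above, recorded here for convenience): as long as only O(1) loops have been completed, each ni is n − O(n^{1−ε/3}) ≥ n/2, so
\[
p_{j+1} \;=\; \Big(\prod_{i=1}^{j} n_i^{\varepsilon/2}\Big)\, p_1 \;\geq\; (n/2)^{j\varepsilon/2}\, n^{-A},
\]
which exceeds $1$ as soon as $j > \frac{2A}{\varepsilon}(1+o(1))$; hence the loop executes at most $\lceil 2A/\varepsilon\rceil + O(1) = O(1)$ times.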
It is then routine to check that W′, W, p, P obey all the properties required for Theorem 3.3.

4. Proof of Proposition 2.3

We can now conclude the proof of Proposition 2.3, and more precisely of the stronger bound (3). One can show that for any given rich vector v, the probability that v is an eigenvector is very small. Ideally, we would like to conclude by bounding the number of rich vectors using the inverse theorems, and then applying the union bound (this explains the entropy estimates below). However, this strategy does not work straightforwardly, and we will need to introduce an additional twist to see it through.

In this section, all implied constants in the O() notation may depend on the fixed quantity A (but not on the quantity ε to be introduced shortly).

Suppose that we are in the event E_{2,n}; thus Mn has an eigenvector V = (v1, . . . , vn) which is rich. Let ε > 0 be a small fixed quantity (depending on A) to be chosen later. Applying Theorem 3.3, we can find (multi-)sets W′ ⊂ W ⊂ V with |W| ≥ n − n^{1−ε/4} and |W′| ≤ εn, as well as a parameter p ≫ n^{−A} and a GAP P of rank d = O(1) and volume at most O(p^{−1} n^{−d/2}), such that W ⊂ P and p_ξ(W′) ≪ n^{d0 ε} p. By rounding p to the nearest multiple of n^{−A}, we may assume that p is an integer multiple of n^{−A}, which must then be of size at most O(n^{−d/2}), since otherwise P would have volume less than 1.

Write W = {v_{i1}, . . . , v_{i_{n′}}} and W′ = {v_{j1}, . . . , v_{jk}}, where k ≤ εn and n − n^{1−ε/4} ≤ n′ ≤ n. From Stirling's formula we observe that the total number of possibilities for p, d, n′, k, i1, . . . , i_{n′} and j1, . . . , jk is at most exp(O(nε log(1/ε))). Thus, if we let E_{2,n,p,d,n′,k,i1,...,i_{n′},j1,...,jk} denote the event that Mn has a rich eigenvector obeying the above assertions, then by the union bound we will obtain (3) if we can show that

P(E_{2,n,p,d,n′,k,i1,...,i_{n′},j1,...,jk}) ≪ exp(−cn)

for some fixed c > 0 independent of ε, and for sufficiently small ε. Let us now work with a single choice of n, p, d, n′, k, i1, . . . , i_{n′}, j1, . . . , jk, and abbreviate E_{2,n,p,d,n′,k,i1,...,i_{n′},j1,...,jk} as E3; thus our task is to show that

(4)    P(E3) ≪ exp(−cn).
By symmetry we may assume that i_l = l for l = 1, . . . , n′, and that j_l = l for l = 1, . . . , k. Thus on the event E3, we now have

(5)    v1, . . . , v_{n′} ∈ P

and

(6)    p_ξ(v1, . . . , vk) ≪ n^{O(ε)} p.
We cover E3 by the events E3′ and E3′′ , where E3′ is the event that we can take d = 0, and E3′′ is the event that we can take d > 0.
Case 1: d = 0. In this case, P is trivial and so v1 = . . . = v_{n′} = 0. We now use a conditioning argument of Komlós [4]. Split

(7)    M_n = \begin{pmatrix} M_{n'} & B \\ B^* & C \end{pmatrix}
where M_{n′} is the top left n′ × n′ minor of Mn, B is the n′ × (n − n′) top right minor, B∗ is the adjoint of B, and C is the bottom right (n − n′) × (n − n′) minor. By hypothesis, Mn has an eigenvector V whose first n′ coefficients vanish, which implies from the eigenvector equation Mn V = λV and (7) that the matrix B does not have full rank. Thus there exist n − n′ rows of B which span a proper subspace H of R^{n−n′} in which the remaining n′ − (n − n′) rows necessarily lie. The entropy cost of picking these n − n′ rows is \binom{n′}{n−n′}. Now suppose we fix the positions of these rows, as well as the precise values that the random matrix attains on these rows, so that H is now deterministic. The entries of B on the remaining rows remain independent with distribution ξ. Embedding H in a hyperplane, which can be written as a graph expressing one of the n − n′ coordinates of R^{n−n′} as a linear combination of the other n − n′ − 1 coordinates, and then using (1), we conclude that each of these remaining rows has an independent probability of at most 1 − µ of lying in H. Putting all this together, we conclude that

P(E3′) ≤ \binom{n′}{n−n′} × (1 − µ)^{n′−(n−n′)},

and hence from Stirling's formula and the size n′ = n − O(n^{1−ε/4}) of n′ that

(8)    P(E3′) ≪ exp(−cn)

for some fixed c > 0 independent of ε (but depending on µ).
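Spelled out (again routine, recorded for convenience): writing m := n − n′ = O(n^{1−ε/4}), we have
\[
\binom{n'}{m} \;\le\; n^{m} \;=\; \exp\big(O(n^{1-\varepsilon/4}\log n)\big) \;=\; e^{o(n)},
\qquad
(1-\mu)^{\,n'-m} \;\le\; \exp\big(-\mu\,(n - O(n^{1-\varepsilon/4}))\big),
\]
so that $P(E_3') \le e^{-\mu n + o(n)}$, which gives (8) with any fixed $0 < c < \mu$.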
Case 2: d ≥ 1. As in Case 1, we split Mn using (7). Write V′ := (v1, . . . , v_{n′}) (viewed as a column vector); from the eigenvalue equation Mn V = λV and (7), we see that M_{n′} V′ lies in the space spanned by V′ and the n − n′ columns of B. We expand the GAP P as

P = {n1 w1 + · · · + nd wd : |ni| ≤ Ni}

for some w1, . . . , wd ∈ R and N1, . . . , Nd ≥ 1. By (5), we have

(9)    V′ = w1 V′_{(1)} + . . . + wd V′_{(d)}

where each V′_{(i)} ∈ R^{n′} is a vector whose entries all lie in [−Ni, Ni] ∩ Z. In particular, if we let H be the subspace of R^{n′} spanned by V′_{(1)}, . . . , V′_{(d)} and the columns of B, then V′ lies in H, and H has dimension at most d + (n − n′) = O(n^{1−ε/4}).

The total number of possibilities for each vector V′_{(i)} is at most (2Ni + 1)^{n′}. By Theorem 3.3, P has volume at most O(p^{−1} n^{−d/2}), so the total number of possibilities for V′_{(1)}, . . . , V′_{(d)} is (very crudely) at most O(p^{−1} n^{−d/2})^{n}. Thus, by paying this as an entropy cost, we may assume that V′_{(1)}, . . . , V′_{(d)} are fixed. For the rest of the argument, we condition on the minors B, C in (7), so that the subspace H defined previously is now deterministic, while the matrix M_{n′} remains random (and is of the form in Definition 1.1, with n replaced by n′). The real numbers w1, . . . , wd are also random and may potentially depend on M_{n′}.
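The crude count used a few lines above can be spelled out as follows (our elaboration of "very crudely"): since P is proper, its volume is ∏_{i=1}^{d}(2⌊Ni⌋ + 1), and 2Ni + 1 ≤ 3(2⌊Ni⌋ + 1), so
\[
\prod_{i=1}^{d} (2N_i+1)^{n'} \;\le\; \big(3^{d}\,\mathrm{vol}(P)\big)^{n'} \;=\; O\big(p^{-1} n^{-d/2}\big)^{n'} \;\le\; O\big(p^{-1} n^{-d/2}\big)^{n},
\]
where the last step uses $n' \le n$ together with the fact that the base may be taken to be at least $\mathrm{vol}(P) \ge 1$.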
We now split M_{n′} further as

(10)    M_{n'} = \begin{pmatrix} M_k & D \\ D^* & E \end{pmatrix}

where Mk is the top left k × k minor of M_{n′} (or of Mn), D is a k × (n′ − k) matrix, and E is an (n′ − k) × (n′ − k) matrix. We aim to bound the probability

(11)    P(M_{n′} V′ ∈ H)
(conditioning on B, C as mentioned above); any bound we obtain on this probability, multiplied by the previously mentioned entropy cost of O(p^{−1} n^{−d/2})^{n}, will provide a bound on P(E3′′) by Fubini's theorem and the union bound.

To illustrate our ideas, let us first consider the toy case when H = {0} and the generators wi, 1 ≤ i ≤ d, are fixed, which would make V′ deterministic. We split

(12)    V' = \begin{pmatrix} V'' \\ V''' \end{pmatrix}

where V′′ is the column vector (v1, . . . , vk) and V′′′ is the column vector (v_{k+1}, . . . , v_{n′}). Expanding the condition M_{n′} V′ ∈ H using (10) and (12) and extracting the lower n′ − k entries of M_{n′} V′, we see that

D^* V′′ = w,

where w ∈ R^{n′−k} is the vector w := −E V′′′. If we condition Mk and E to be deterministic, then w becomes deterministic also, while each entry of D∗ remains independent with distribution ξ, and by (6) each entry of D∗ V′′ will match its corresponding entry of w with an independent probability of O(n^{O(ε)} p). Multiplying these probabilities and integrating out the conditioning, we obtain the bound

P(M_{n′} V′ ∈ H) ≪ O(n^{O(ε)} p)^{n′−k}

for (11), which as mentioned earlier would give an upper bound for P(E3′′) of the form

O(p^{−1} n^{−d/2})^{n} × O(n^{O(ε)} p)^{n′−k},

which simplifies to

n^{−(d/2) n + O(εn)} (1/p)^{k+(n−n′)};

since k + (n − n′) = O(εn) and p ≥ n^{−A}, this can be bounded by exp(−cn) for some fixed c > 0 independent of ε, with plenty of room to spare, if ε is chosen small enough.

There are two problems with this toy argument. Firstly, in general H is not {0}. However, we will be able to (morally) reduce to the H = {0} case by a linear projection argument that incurs a tolerable extra entropy loss (using the fact that the dimension of H is only O(n^{1−ε/4})). The second problem is that the number of choices for the generators wi is potentially infinite, or even uncountable, so the entropy loss here is unacceptable. We avoid this problem by counting not the wi themselves, but rather a finite set of representatives which is linear-algebraically equivalent to them.
We turn to the details. We split

V'_{(i)} = \begin{pmatrix} V''_{(i)} \\ V'''_{(i)} \end{pmatrix}

for i = 1, . . . , d, where V′′_{(i)} ∈ R^{k} and V′′′_{(i)} ∈ R^{n′−k}. Expanding the condition M_{n′} V′ ∈ H using (10) and (9) and extracting the bottom n′ − k coefficients, we conclude that

(13)    w1 (D^* V′′_{(1)} + E V′′′_{(1)}) + . . . + wd (D^* V′′_{(d)} + E V′′′_{(d)}) ∈ H1,

where H1 ⊂ R^{n′−k} is the projection of H to R^{n′−k}. With our current conditioning, H1 is a deterministic subspace of R^{n′−k} of some dimension d1 = O(n^{1−ε/4}). Meanwhile, from (6) and (9) we have

(14)    p_ξ(w1 V′′_{(1)} + . . . + wd V′′_{(d)}) ≤ n^{Cε} p
for some fixed constant C (independent of ε).

We next reduce the space H1 to the trivial space {0}. Recall that d1 = O(n^{1−ε/4}) is the dimension of H1. By permuting the indices if necessary, we may assume that H1 is a graph over the last d1 coordinates of R^{n′−k} (this does not incur any entropy cost, as H1 was already deterministic under our current conditioning). In other words, we may express H1 as

H1 = {(L(Y), Y) : Y ∈ R^{d1}}

for some (deterministic) linear map L : R^{d1} → R^{n′−k−d1}. Equivalently, we have

H1 = {(X, Y) ∈ R^{n′−k−d1} × R^{d1} : L̃(X, Y) = 0},

where L̃ : R^{n′−k−d1} × R^{d1} → R^{n′−k−d1} is the map L̃(X, Y) := X − L(Y). If we identify R^{n′−k−d1} with R^{n′−k−d1} × {0}, then L̃ is the identity map on R^{n′−k−d1} and has H1 as its kernel. Applying L̃ to (13), we obtain

(15)    w1 L̃(D^* V′′_{(1)} + E V′′′_{(1)}) + . . . + wd L̃(D^* V′′_{(d)} + E V′′′_{(d)}) = 0.
We now condition Mk, E to be fixed; the only remaining random variables are the entries of the k × (n′ − k) matrix D, which are iid with distribution ξ. Let E4 denote the event that (15) holds for a given choice of deterministic data (Mk, E, B, C, V′_{(i)}, d, H, d1, H1); we suppress the dependence of E4 on this data. If we can obtain an upper bound on the conditional probability that E4 occurs, then multiplying this bound by the previous entropy cost of O(p^{−1} n^{−d/2})^{n} will give an upper bound on P(E3′′).

We still need to control P(E4). Now that H has been eliminated, the most significant remaining difficulty is the lack of control on the quantities w1, . . . , wd, which at present are arbitrary real numbers and can thus take an uncountable number of possible values. To resolve this difficulty we again use the conditioning argument of Komlós [4]. Given any m × d matrix M for any m, we say that M has good kernel if the kernel ker(M) := {w ∈ R^d : Mw = 0} contains a tuple (w1, . . . , wd) obeying both (15) and (14). If we form the (n′ − k − d1) × d random matrix U with
the vectors L̃(D^* V′′_{(1)} + E V′′′_{(1)}), . . . , L̃(D^* V′′_{(d)} + E V′′′_{(d)}) as columns, then clearly E4 is contained in the event that U has good kernel.

Trivially, U has rank at most d. As a consequence, one can select d rows from U whose row span is the same as that of U, or equivalently such that the corresponding d × d minor of U has the same kernel as U. The number of possible ways to select these rows is \binom{n′−k−d1}{d}, which we crudely bound by n^d. By paying an entropy cost of n^d for the purposes of bounding P(E4), we may thus assume that these row positions are deterministic; thus there are deterministic indices 1 ≤ l1 < · · · < ld ≤ n′ − k − d1, and we need to bound the probability of the event that the d × d minor Ud formed by the l1, . . . , ld rows of U has a good kernel.

We now condition on the l1, . . . , ld rows of the (n′ − k) × k matrix D∗, as well as the last d1 rows of the same matrix D∗, thus leaving at least n′ − k − d1 − d of the first n′ − k − d1 rows of D∗ random (with entries independently distributed with law ξ). As L̃ is the identity on R^{n′−k−d1}, we see that the l1, . . . , ld entries of L̃(D^* V′′_{(1)} + E V′′′_{(1)}), . . . , L̃(D^* V′′_{(d)} + E V′′′_{(d)}) are now deterministic (they do not depend on the remaining random rows of D∗). In other words, the minor Ud is now deterministic. If Ud does not have a good kernel, its contribution to P(E4) is zero. If instead Ud has a good kernel, then we may find a deterministic choice of w1, . . . , wd in the kernel of Ud that obeys both (15) and (14).

The rest of the calculation is similar to the toy case. Consider the ith component of the vector equation (15) for this deterministic choice of w1, . . . , wd, where 1 ≤ i ≤ n′ − k − d1 is not equal to any of the l1, . . . , ld. We can rewrite this component as

(16)    (w1 V′′_{(1)} + . . . + wd V′′_{(d)}) · Ri = xi
where Ri ∈ R^k is the ith row of D∗, and xi ∈ R is a deterministic quantity that does not depend on the remaining random rows of D∗. By (14), for each such i, the equation (16) holds with an independent probability of at most n^{Cε} p, and so the probability that (15) holds in full is at most O(n^{Cε} p)^{n′−k−d1−d}. Taking into account the entropy cost of n^d mentioned earlier, we thus have

P(E4) ≤ n^d × O(n^{Cε} p)^{n′−k−d1−d} ≤ n^{O(εn)} p^{n′−k−d1−d}.

Paying the previously mentioned entropy cost of O(p^{−1} n^{−d/2})^{n}, we then have

P(E3′′) ≤ O(p^{−1} n^{−d/2})^{n} × n^{O(εn)} p^{n′−k−d1−d} ≤ n^{−(d/2) n + O(εn)} (1/p)^{n−n′+k+d1+d}.

Since p ≥ n^{−A}, d ≥ 1, and n − n′ + k + d1 + d = O(εn), we conclude that P(E3′′) ≪ exp(−cn) for some fixed c > 0 independent of ε (with plenty of room to spare), if ε is small enough. Combining this with (8), we obtain (4). This concludes the proof of (3), and Theorem 1.3 follows.
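For the record, the final numerical estimate goes exactly as in the toy case: since p ≥ n^{−A}, d ≥ 1 and n − n′ + k + d1 + d = O(εn),
\[
n^{-\frac{d}{2}n + O(\varepsilon n)}\,(1/p)^{\,n-n'+k+d_1+d} \;\le\; n^{-\frac{n}{2} + O(\varepsilon n) + A\cdot O(\varepsilon n)} \;\le\; n^{-n/4} \;\le\; e^{-cn}
\]
once $\varepsilon$ is chosen sufficiently small depending on $A$ (and $n$ is large).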
5. Concluding remarks

The assumption that the upper-triangular entries ξij, 1 ≤ i < j ≤ n, are iid is not essential; an inspection of the argument reveals that the proof continues to work if we assume only that the ξij are independent and that there is a constant µ > 0 such that P(ξij = x) ≤ 1 − µ for any 1 ≤ i < j ≤ n and x ∈ R. Our main tool, Theorem 3.2, holds under this assumption; see [6]. The argument also easily extends to Hermitian random matrix models, in which the coefficients ξij for i < j are allowed to be complex, and one imposes the condition ξji = ξ̄ij. In other words, the above arguments extend to show the following result:

Theorem 5.1. For any fixed A, µ > 0 and sufficiently large n the following holds. Let ξij, 1 ≤ i < j ≤ n, be independent (complex or real) random variables such that P(ξij = x) ≤ 1 − µ for any 1 ≤ i < j ≤ n and x ∈ R. Let ξii, 1 ≤ i ≤ n, be real random variables that are independent of the ξij, 1 ≤ i < j ≤ n. Set ξji = ξ̄ij for 1 ≤ i < j ≤ n. Then the spectrum of the matrix (ξij)_{1≤i,j≤n} is simple with probability at least 1 − n^{−A}.