NEW METHODS FOR HANDLING SINGULAR SAMPLE COVARIANCE MATRICES
arXiv:1111.0235v1 [math.PR] 1 Nov 2011
GABRIEL H. TUCCI AND KE WANG
Abstract. The estimation of a covariance matrix from an insufficient amount of data is one of the most common problems in fields as diverse as multivariate statistics, wireless communications, signal processing, biology, learning theory and finance. In [13], a new approach to handle singular covariance matrices was suggested. The main idea was to use dimensionality reduction in conjunction with an average over the unitary matrices. In this paper we build on this idea and consider new approaches that show considerable improvement over traditional methods such as diagonal loading. One of these methods, the Ewens estimator, randomizes the sample covariance matrix over all permutation matrices with respect to the Ewens measure. The techniques used to attack this problem are broad and range from random matrix theory to combinatorics.
1. Introduction

The estimation of a covariance matrix from an insufficient amount of data is one of the most common problems in fields as diverse as multivariate statistics, wireless communications, signal processing, biology, learning theory and finance. For instance, the covariation between asset returns plays a crucial role in modern finance. The covariance matrix and its inverse are the key statistics in portfolio optimization and risk management. Many recent financial innovations involve complex derivatives, like exotic options written on the minimum, maximum or difference of two assets, or structured financial products such as CDOs. All of these innovations are built upon, or designed to exploit, the correlation structure of two or more assets. In the field of wireless communications, covariance estimates allow us to compute the direction of arrival (DOA), a critical task in smart antenna systems since it enables accurate mobile location. Another application, in the field of biology, involves the interactions between proteins or genes in an organism and the joint time evolution of their interactions.

Typically the covariance matrix of a multivariate random variable is not known but has to be estimated from the data. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of samples from the multivariate distribution. Simple cases, where the number of observations is much greater than the number of variables, can be dealt with by using the sample covariance matrix. In this case, the sample covariance matrix is an unbiased and efficient estimator of the true covariance matrix. However, in many practical situations we would like to estimate the covariance matrix of a set of variables from an insufficient amount of data. In this case the sample covariance matrix is singular (non-invertible) and therefore a fundamentally bad estimate.
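This failure mode is easy to reproduce numerically. The sketch below (ours, not from the paper) draws n < m centered Gaussian samples, forms the sample covariance K = (1/n)MM*, and checks that K is rank deficient; the dimension m = 8, the sample size n = 5 and the particular covariance are arbitrary illustrative choices.

```python
import numpy as np

# With n < m samples, the sample covariance K = (1/n) M M* has rank at
# most n, hence it is singular and cannot be inverted.
rng = np.random.default_rng(0)
m, n = 8, 5                      # dimension m larger than sample size n

# hypothetical true covariance: a random positive definite matrix
G = rng.standard_normal((m, m))
Sigma = G @ G.T + m * np.eye(m)

# n independent samples x_k ~ N(0, Sigma), stacked as columns of M
M = np.linalg.cholesky(Sigma) @ rng.standard_normal((m, n))
K = (M @ M.T) / n                # sample covariance matrix

print(np.linalg.matrix_rank(K))  # at most n = 5 < m = 8
print(abs(np.linalg.det(K)))     # numerically zero: K is not invertible
```

Any estimator built directly on K inverse therefore fails, which motivates the operators introduced below.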
More specifically, let X be a random vector X = (X_1, …, X_m)^T ∈ C^{m×1} and assume for simplicity that X is centered. Then the true covariance matrix is given by

Σ = E(XX*) = (cov(X_i, X_j))_{1≤i,j≤m}.  (1.1)

Consider n independent samples or realizations x_1, …, x_n ∈ C^{m×1} and form the m × n data matrix M = (x_1, …, x_n). Then the sample covariance matrix is the m × m non-negative definite matrix

K = (1/n) M M*.  (1.2)

If n → +∞ while the dimension m stays fixed, then the sample covariance matrix K converges (entrywise) to Σ almost surely. However, as we mentioned before, in many empirical problems the number of measurements is less than the dimension (n < m), and thus the sample covariance matrix K is almost always singular. Our target in this paper is to recover the true covariance matrix Σ from K under the condition n < m.

The conventional treatment of covariance singularity artificially converts the singular sample covariance matrix into an invertible (positive definite) one by the simple expedient of adding a positive diagonal matrix, or more generally, by taking a linear combination of the sample covariance and the identity matrix. This procedure is variously called "diagonal loading" or "ridge regression" [17, 5]. The method is straightforward: consider αK + βI_m as an estimate of Σ, where α and β are called the loading parameters. The resulting matrix is positive definite (hence invertible) and preserves the eigenvectors of the sample covariance; the eigenvalues of αK + βI_m are a uniform scaling and shift of the eigenvalues of K. There are many methods for choosing the optimal loading parameters; see [11], [14] and [15].

In [13], Marzetta, Tucci and Simon suggested a new approach to handle singular covariance matrices. Let p ≤ n be a parameter, to be estimated later, and consider the set of all p × m one-sided unitary matrices

Ω_{p,m} = {Φ ∈ C^{p×m} : ΦΦ* = I_p}.  (1.3)

Endow Ω_{p,m} with the Haar measure, that is, the uniform distribution on the set Ω_{p,m}. We define the operators

cov_p(K) = E(Φ*(ΦKΦ*)Φ)  (1.4)

and

invcov_p(K) = E(Φ*(ΦKΦ*)^{-1}Φ)
(1.5)
where the expectation is taken with respect to the Haar measure. Surprisingly, they found that

cov_p(K) = p[(mp − 1)K + (m − p)Tr(K)I_m] / (m(m² − 1)),

which is the same as diagonal loading. Moreover, they investigated the properties of invcov_p(K). If K is decomposed as K = UDU*, with D = diag(d_1, …, d_n, 0, …, 0), then invcov_p(K) = U invcov_p(D) U*, and invcov_p(D) = diag(λ_1, …, λ_n, µ, …, µ). In other words, invcov_p(K) preserves the eigenvectors of K, and transforms all the zero eigenvalues into a nonzero constant µ. They also provided formulas to compute the λ_i's and µ, and studied the asymptotic behavior of invcov_p(D) using techniques from free probability.

In this paper, we investigate new methods to estimate singular covariance matrices. In Section 2, we present some preliminaries on Schur polynomials that are used later in this work. In Section 3, we continue to work on the operator invcov_p suggested in [13]. We also show that invcov_p(K) actually has a very simple algebraic structure, i.e. it is a polynomial in K. A formula for computing E(Φ(Φ*D_nΦ)^l Φ*) is given to help further study Equation (65) and Theorem 1, Section VI in [13]. In Section 4, we consider a new approach, called the Ewens estimator, to estimate Σ. In this estimator, the average is taken over the set of all m × m permutation matrices with respect to the Ewens measure. The explicit formula for the Ewens estimator is computed by a combinatorial argument. In Section 5, we combine the ideas of the first two methods. We extend the definition of permutation matrices to get p × m unitary matrices and define two new operators,

K_{θ,m,p} := E(V_σ^T (V_σ K V_σ^T) V_σ)  and  K̃_{θ,m,p} := E(V_σ^T (V_σ K V_σ^T)^+ V_σ),

to estimate Σ and Σ^{-1}, respectively.
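The closed form for cov_p can be checked by simulation. The sketch below (ours, not from [13]) draws Φ uniformly from Ω_{p,m} using a phase-corrected QR factorization of a complex Gaussian matrix; m = 5, p = 2, the test matrix K and the number of samples are arbitrary choices.

```python
import numpy as np

# Monte Carlo check: averaging Phi*(Phi K Phi*)Phi over Haar-distributed
# one-sided unitaries should approach
#   cov_p(K) = p[(mp-1)K + (m-p)Tr(K)I] / (m(m^2-1)).
rng = np.random.default_rng(1)
m, p = 5, 2

def haar_one_sided(p, m, rng):
    """First p rows of a Haar-distributed m x m unitary matrix."""
    G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    Q, R = np.linalg.qr(G)
    d = np.diag(R)
    Q = Q * (d / np.abs(d))      # phase correction makes Q Haar distributed
    return Q[:p, :]

# a fixed symmetric test matrix K
A = rng.standard_normal((m, m))
K = A @ A.T / m

N = 4000
est = np.zeros((m, m), dtype=complex)
for _ in range(N):
    Phi = haar_one_sided(p, m, rng)
    est += Phi.conj().T @ (Phi @ K @ Phi.conj().T) @ Phi
est /= N

formula = p * ((m * p - 1) * K + (m - p) * np.trace(K) * np.eye(m)) / (m * (m**2 - 1))
print(np.max(np.abs(est - formula)))   # small Monte Carlo error
```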
Figure 1. Young tableau representation of the partition (5, 4, 1).

We provide the explicit formula for K_{θ,m,p} and an inductive formula to compute K̃_{θ,m,p}. In Section 6, it is assumed that Σ has some special form, namely a tridiagonal Toeplitz matrix or a power Toeplitz matrix, and we study its asymptotic behavior under the Ewens estimator. In this Section, we also present some simulations under the different methods to test the effect of the parameters.

Notation: Throughout this paper, 1_S is the indicator function of a set S. We sometimes use [n] to denote the set {1, 2, …, n}, and Tr(A) is the trace of a matrix A. For a vector v = (v_1, …, v_m) we use the Euclidean norm ‖v‖_2 = (Σ_{i=1}^m |v_i|²)^{1/2}, and for an m × m matrix A we use the Frobenius norm ‖A‖ = (Tr(AA*))^{1/2}. We use the notation µ ⊢ n to indicate that µ is a partition of the positive integer n.

2. Schur Polynomials Preliminaries

A symmetric polynomial is a polynomial P(x_1, x_2, …, x_n) in n variables such that if any of the variables are interchanged one obtains the same polynomial. Formally, P is a symmetric polynomial if for any permutation σ of the set {1, 2, …, n} one has P(x_{σ(1)}, x_{σ(2)}, …, x_{σ(n)}) = P(x_1, x_2, …, x_n). Symmetric polynomials arise naturally in the study of the relation between the roots of a polynomial in one variable and its coefficients, since the coefficients can be given by symmetric polynomial expressions in the roots. Symmetric polynomials also form an interesting structure by themselves. The resulting structures, and in particular the ring of symmetric functions, are of great importance in combinatorics and in representation theory (see for instance [7, 16, 12, 18] for more details on this topic). The Schur polynomials are certain symmetric polynomials in n variables.
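As a tiny illustration of the invariance property (ours, not from the paper), one can evaluate the elementary symmetric polynomial e_2(x_1, x_2, x_3) = x_1x_2 + x_1x_3 + x_2x_3 at every permutation of a fixed point and check that a single value results:

```python
import itertools

# e_2 is symmetric: permuting its arguments never changes its value.
def e2(x):
    return x[0] * x[1] + x[0] * x[2] + x[1] * x[2]

x = (1.5, -2.0, 0.25)
values = {round(e2([x[i] for i in perm]), 12)
          for perm in itertools.permutations(range(3))}
print(values)   # a single value: the polynomial is symmetric
```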
This class of polynomials is very important in representation theory since they are the characters of the irreducible representations of the general linear groups. The Schur polynomials are indexed by partitions. A partition of a positive integer n, also called an integer partition, is a way of writing n as a sum of positive integers. Two partitions that differ only in the order of their summands are considered to be the same partition. Therefore, we can always represent a partition λ of a positive integer n as a sequence of n non-increasing and non-negative integers d_i such that

Σ_{i=1}^n d_i = n  with  d_1 ≥ d_2 ≥ d_3 ≥ … ≥ d_n ≥ 0.

Notice that some of the d_i could be zero. Integer partitions are usually represented by the so-called Young tableaux (also known as Ferrers diagrams). A Young tableau is a finite collection of boxes, or cells, arranged in left-justified rows, with the row lengths weakly decreasing (each row has the same or shorter length than its predecessor). Listing the number of boxes in each row gives a partition λ of a non-negative integer n, the total number of boxes of the diagram. The Young diagram is said to be of shape λ, and it carries the same information as that partition. For instance, in Figure 1 we can see the Young tableau corresponding to the partition (5, 4, 1) of the number 10. Given a partition λ of n,

n = d_1 + d_2 + ⋯ + d_n,  d_1 ≥ d_2 ≥ ⋯ ≥ d_n ≥ 0,
Figure 2. Young tableau representation of the partition (5, 4, 1) with its corresponding hook lengths.

the following functions are alternating polynomials (in other words, they change sign under any transposition of the variables):

a_{(d_1,…,d_n)}(x_1, …, x_n) = det( x_j^{d_i} )_{1≤i,j≤n} = Σ_{σ∈S_n} ε(σ) x_{σ(1)}^{d_1} ⋯ x_{σ(n)}^{d_n},

where S_n is the permutation group of the set {1, 2, …, n} and ε(σ) is the sign of σ. Since they are alternating, they are all divisible by the Vandermonde determinant

Δ(x_1, …, x_n) = Π_{1≤j<k≤n} (x_j − x_k).

4. The Ewens Estimator

The Ewens measure with parameter θ assigns to each permutation σ in the symmetric group S_m the probability

p_{θ,m}(σ) = θ^{K(σ)} / (θ(θ+1)⋯(θ+m−1)),

where θ > 0 and K(σ) is the number of cycles in σ. The case θ = 1 corresponds to the uniform measure. This measure has recently appeared in mathematical physics models (see e.g. [2] and [6]) and one has only recently started to gain insight into the cycle structure of such random permutations.

Let σ be a permutation in S_m. The corresponding permutation matrix M_σ is the m × m matrix whose i-th row is e_{σ(i)}, where e_i denotes the 1 × m vector whose i-th entry is 1 and all other entries are zero:

M_σ = (e_{σ(1)}; e_{σ(2)}; …; e_{σ(m)}),

which is, of course, a unitary matrix. Given the sample covariance matrix K we define the new estimator for Σ as

K_θ := E(M_σ K M_σ*),  (4.1)
where the expectation is taken with respect to the Ewens measure.

Theorem 4.1. Let K = (a_{ij}) be an m × m matrix in C^{m×m}. Then K_θ = E(M_σ K M_σ*) is an m × m matrix such that the diagonal terms satisfy

(K_θ)_{ii} = ((θ−1) a_{ii} + Tr(K)) / (θ+m−1),  (4.2)

and the non-diagonal terms (i ≠ j) satisfy

(K_θ)_{ij} = [ θ² a_{ij} + (θ−1) a_{ji} + θ Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠i, k≠j, k≠l} a_{lk} ] / ((θ+m−2)(θ+m−1))  (4.3)
           = [ (θ²−1) a_{ij} + (θ−1) a_{ji} + (θ−1) Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠k} a_{lk} ] / ((θ+m−2)(θ+m−1)).

Remark 4.2. If θ = 1,

K_1 = α (ee^T/m) + β (I_m − ee^T/m),  where  α = eKe^T/m = Σ_{i,j} a_{ij} / m  and  β = (Tr(K) − α)/(m−1).

This has already been computed in [20], Proposition 2.2. If K = D = diag(d_1, …, d_m), then

K_θ = ((θ−1) D + Tr(D) I_m) / (θ+m−1),

which corresponds to diagonal loading.
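For small m the expectation defining K_θ is a finite sum over S_m, so Theorem 4.1 can be checked by brute force. The sketch below (ours) enumerates S_4 with Ewens weights and compares the average of M_σ K M_σ* against the closed forms (4.2) and (4.3); θ = 2.3 and the test matrix are arbitrary.

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    """Number of cycles of a permutation given as a 0-indexed tuple."""
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m = 2.3, 4
rng = np.random.default_rng(2)
K = rng.standard_normal((m, m))
denom = math.prod(theta + j for j in range(m))   # theta(theta+1)...(theta+m-1)

# direct expectation: (K_theta)_{ij} = sum_sigma p(sigma) K[sigma(i), sigma(j)]
Kt = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom           # Ewens weight p_{theta,m}
    Kt += w * K[np.ix_(s, s)]

# closed form from Theorem 4.1
F = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i == j:
            F[i, i] = ((theta - 1) * K[i, i] + np.trace(K)) / (theta + m - 1)
        else:
            s1 = sum(K[i, k] + K[k, j] for k in range(m) if k not in (i, j))
            s2 = sum(K[l, k] for l in range(m) for k in range(m)
                     if l != i and k != j and k != l)
            F[i, j] = (theta**2 * K[i, j] + (theta - 1) * K[j, i]
                       + theta * s1 + s2) / ((theta + m - 2) * (theta + m - 1))
print(np.max(np.abs(Kt - F)))   # agreement to machine precision
```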
Proof. First,

M_σ K M_σ* = ( e_{σ(i)} K e_{σ(j)}* )_{1≤i,j≤m} = ( Σ_{k=1}^m Σ_{l=1}^m a_{kl} e^k_{σ(i)} e^l_{σ(j)} )_{1≤i,j≤m} = ( a_{σ(i)σ(j)} )_{1≤i,j≤m}.

For the diagonal terms,

(K_θ)_{ii} = Σ_{σ∈S_m} p_{θ,m}(σ) a_{σ(i)σ(i)} = a_{ii} Σ_{σ∈S_m, σ(i)=i} p_{θ,m}(σ) + Σ_{l≠i} a_{ll} Σ_{σ∈S_m, σ(i)=l} p_{θ,m}(σ)
           = a_{ii} · θ/(θ+m−1) Σ_{σ̃∈S_{m−1}} p_{θ,m−1}(σ̃) + Σ_{l≠i} a_{ll}/(θ+m−1) Σ_{σ̂(l)} p_{θ,m−1}(σ̂(l))
           = θ/(θ+m−1) a_{ii} + 1/(θ+m−1) Σ_{l≠i} a_{ll}
           = ((θ−1) a_{ii} + Tr(K)) / (θ+m−1).
Now we compute the off-diagonal terms (K_θ)_{ij} (i ≠ j). For σ ∈ S_m, if σ(i) = i and σ(j) = j, then σ = (i)(j)σ_1 with σ_1 ∈ S_{m−2} and K(σ) = K(σ_1) + 2, so

p_{θ,m}(σ) = θ² p_{θ,m−2}(σ_1) / ((θ+m−2)(θ+m−1)).

If σ(i) = j and σ(j) = i, we erase i and j from σ to obtain σ_2 ∈ S_{m−2}, and

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_2) / ((θ+m−2)(θ+m−1)).

If σ(i) = i and σ(j) = k ≠ i, j, then σ = (i)σ̂ with σ̂ ∈ S_{m−1} and K(σ) = K(σ̂) + 1. Furthermore, we can erase j from σ̂ to get a new permutation σ_3(k) ∈ S_{m−2} such that K(σ_3(k)) = K(σ̂), and finally

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_3(k)) / ((θ+m−2)(θ+m−1)).

Notice that Σ_{σ_3(k)} p_{θ,m−2}(σ_3(k)) = 1.

If σ(i) = l ≠ i, j and σ(j) = j, then as above we obtain σ_4(l) ∈ S_{m−2} such that

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_4(l)) / ((θ+m−2)(θ+m−1))  and  Σ_{σ_4(l)} p_{θ,m−2}(σ_4(l)) = 1.

If σ(i) = l ≠ i and σ(j) = k ≠ j (with k ≠ l), excluding the case σ(i) = j, σ(j) = i, we erase i and j from σ to obtain σ_5(l, k) ∈ S_{m−2}; thus

p_{θ,m}(σ) = p_{θ,m−2}(σ_5(l, k)) / ((θ+m−2)(θ+m−1))  and  Σ_{σ_5(l,k)} p_{θ,m−2}(σ_5(l, k)) = 1.
Therefore, for i ≠ j,

(K_θ)_{ij} = Σ_{σ∈S_m} p_{θ,m}(σ) a_{σ(i)σ(j)}
  = a_{ij} θ²/((θ+m−2)(θ+m−1)) Σ_{σ_1∈S_{m−2}} p_{θ,m−2}(σ_1)
  + a_{ji} θ/((θ+m−2)(θ+m−1)) Σ_{σ_2∈S_{m−2}} p_{θ,m−2}(σ_2)
  + θ/((θ+m−2)(θ+m−1)) Σ_{k≠i,j} a_{ik} Σ_{σ_3(k)∈S_{m−2}} p_{θ,m−2}(σ_3(k))
  + θ/((θ+m−2)(θ+m−1)) Σ_{l≠i,j} a_{lj} Σ_{σ_4(l)∈S_{m−2}} p_{θ,m−2}(σ_4(l))
  + 1/((θ+m−2)(θ+m−1)) Σ_{l≠i, k≠j, k≠l, (l,k)≠(j,i)} a_{lk} Σ_{σ_5(l,k)∈S_{m−2}} p_{θ,m−2}(σ_5(l, k))
  = [ θ² a_{ij} + (θ−1) a_{ji} + θ Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠i, k≠j, k≠l} a_{lk} ] / ((θ+m−2)(θ+m−1)). □
5. Hybrid Method

In this Section, we combine the ideas of the first two methods to create a third, hybrid method. First, we extend the definition of a permutation. For an integer p ≤ m, let

S_{p,m} := { σ : σ an injection from {1, 2, …, p} to {1, 2, …, m} }.

The size of the set S_{p,m} is m!/(m−p)! and, in the case p = m, S_{m,m} is the set of all permutations on {1, 2, …, m}. For σ ∈ S_{p,m}, the associated p × m matrix is

V_σ := (e_{σ(1)}; e_{σ(2)}; …; e_{σ(p)}),

where e_{σ(i)} = (e^1_{σ(i)}, e^2_{σ(i)}, …, e^m_{σ(i)}) is a 1 × m row vector with the σ(i)-th entry equal to 1 and all others equal to zero. Notice that

V_σ V_σ^T = I_p  (5.1)

and

P_σ := V_σ^T V_σ = diag(p_1, …, p_m),  (5.2)

where

p_i = Σ_{l=1}^p (e^i_{σ(l)})² = 1 if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.
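The objects V_σ and P_σ are easy to realize in code. The sketch below (ours; indices are 0-based) builds V_σ for a sample injection and verifies the identities (5.1) and (5.2):

```python
import numpy as np

def V(sigma, m):
    """p x m matrix whose i-th row is e_{sigma(i)} (0-indexed injection)."""
    p = len(sigma)
    out = np.zeros((p, m))
    out[np.arange(p), sigma] = 1.0
    return out

m = 5
sigma = [2, 0, 4]            # an injection from a 3-element set into 5 slots
Vs = V(sigma, m)
print(Vs @ Vs.T)             # identity I_p, equation (5.1)
P = Vs.T @ Vs                # diagonal projector P_sigma, equation (5.2)
print(np.diag(P))            # 1 exactly at the positions hit by sigma
```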
Next, we use the Ewens measure on the permutation sets to define a probability measure on S_{p,m}. For each σ ∈ S_{p,m}, consider the set

Ω_σ := { σ̃ ∈ S_m : σ̃|_{{1,…,p}} = σ }.
In other words, Ω_σ is the set of all permutations in S_m whose restriction to the set {1, 2, …, p} is equal to σ. Recall that p_{θ,m} is the Ewens measure on S_m with parameter θ. Define the probability measure µ_{θ,m,p} on S_{p,m} by

µ_{θ,m,p}(σ) := p_{θ,m}(Ω_σ) = Σ_{σ̃∈Ω_σ} p_{θ,m}(σ̃).  (5.3)
Now we are ready to introduce two new operators:

K_{θ,m,p} := E( V_σ^T (V_σ K V_σ^T) V_σ ),  (5.4)

K̃_{θ,m,p} := E( V_σ^T (V_σ K V_σ^T)^+ V_σ ),  (5.5)

where (V_σ K V_σ^T)^+ is the Moore–Penrose pseudoinverse of the matrix V_σ K V_σ^T. We use K_{θ,m,p} as an estimate for Σ and K̃_{θ,m,p} as an estimate for Σ^{-1}. Now we show a few results on these new estimators.

Theorem 5.1. Let A = (a_{ij}) be an m × m matrix in C^{m×m}. Then K_{θ,m,p} is an m × m matrix whose diagonal entries are

(K_{θ,m,p})_{ii} = (θ+p−1)/(θ+m−1) a_{ii}  if 1 ≤ i ≤ p,   and   p/(θ+m−1) a_{ii}  if p+1 ≤ i ≤ m,

and whose non-diagonal entries, assuming i < j (if j < i, exchange i and j in the following expressions), are

(K_{θ,m,p})_{ij} = (θ+p−1)(θ+p−2)/((θ+m−1)(θ+m−2)) a_{ij}  if 1 ≤ i < j ≤ p,
                 = (p−1)(θ+p−1)/((θ+m−1)(θ+m−2)) a_{ij}    if 1 ≤ i ≤ p < j ≤ m,
                 = p(p−1)/((θ+m−1)(θ+m−2)) a_{ij}          if p < i < j ≤ m.

Remark 5.2. In the particular case that A is a diagonal matrix D = diag(d_1, …, d_m), then

K_{θ,m,p} = p/(θ+m−1) D + (θ−1)/(θ+m−1) diag(d_1, …, d_p, 0, …, 0).

For instance, if p = 1 and m = 3 then

K_{θ,3,1} = 1/(θ+2) diag(θd_1, d_2, d_3).

Remark 5.3. In the general case with p = 2 and m = 3,

K_{θ,3,2} = 1/(θ+2) [ (θ+1)a_{11}  θa_{12}  a_{13} ; θa_{21}  (θ+1)a_{22}  a_{23} ; a_{31}  a_{32}  2a_{33} ].
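Since µ_{θ,m,p} is the pushforward of the Ewens measure under restriction to {1,…,p}, Remark 5.3 can be verified exactly by summing over all of S_3. A sketch (ours; θ = 1.9 and the test matrix are arbitrary):

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m, p = 1.9, 3, 2
rng = np.random.default_rng(3)
A = rng.standard_normal((m, m))
denom = math.prod(theta + j for j in range(m))

# sum over all of S_3, restricting each permutation to its first p values
est = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom
    Vs = np.zeros((p, m))
    Vs[np.arange(p), s[:p]] = 1.0
    P = Vs.T @ Vs
    est += w * (P @ A @ P)       # V^T (V A V^T) V = P A P

expected = np.array(
    [[(theta + 1) * A[0, 0], theta * A[0, 1], A[0, 2]],
     [theta * A[1, 0], (theta + 1) * A[1, 1], A[1, 2]],
     [A[2, 0], A[2, 1], 2 * A[2, 2]]]) / (theta + 2)
print(np.max(np.abs(est - expected)))   # zero up to rounding
```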
Proof. Recall from Equation (5.2) that P_σ = V_σ^T V_σ = diag(p_1^σ, …, p_m^σ); thus

V_σ^T (V_σ A V_σ^T) V_σ = ( p_i^σ p_j^σ a_{ij} )_{1≤i,j≤m},

where p_i^σ = 1 if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.
For the diagonal entries, if 1 ≤ i ≤ p,

(K_{θ,m,p})_{ii} = Σ_{σ∈S_{p,m}} µ_{θ,m,p}(σ)(p_i^σ)² a_{ii} = a_{ii} Σ_{l=1}^p Σ_{σ∈S_{p,m}, σ(l)=i} µ_{θ,m,p}(σ)
  = a_{ii} ( Σ_{σ: σ(i)=i} µ_{θ,m,p}(σ) + Σ_{l≠i} Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ) )
  = a_{ii} ( θ/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ') + (p−1)/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ') )
  = (θ+p−1)/(θ+m−1) a_{ii}.

If p+1 ≤ i ≤ m,

(K_{θ,m,p})_{ii} = Σ_{σ∈S_{p,m}} µ_{θ,m,p}(σ)(p_i^σ)² a_{ii} = a_{ii} Σ_{l=1}^p Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ)
  = a_{ii} · p/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ')
  = p/(θ+m−1) a_{ii}.
For the non-diagonal entries, if 1 ≤ i < j ≤ p (which turns out to be the most complicated case), p_i^σ p_j^σ a_{ij} is nonzero only if i, j ∈ {σ(1), …, σ(p)}. Thus

(K_{θ,m,p})_{ij} = a_{ij} Σ_{s,t∈[p], s≠t} Σ_{σ∈S_{p,m}, σ(s)=i, σ(t)=j} µ_{θ,m,p}(σ).

We divide the previous sum into five parts:
(1) σ(i) = i, σ(j) = j: if we "erase" i, j from the sets [p] and [m], we get a new injection σ_1 from [p]\{i,j} to [m]\{i,j}, and K(σ) = K(σ_1) + 2;
(2) σ(s) = i for some s ∈ [p]\{i,j}, and σ(j) = j: if we "erase" j from the sets [p] and [m] and consider s, i as one number s̃, we get a new injection σ_2 : ([p]∪{s̃})\{i,j,s} → ([m]∪{s̃})\{i,j,s}, and K(σ) = K(σ_2) + 1;
(3) σ(t) = j for some t ∈ [p]\{i,j}, and σ(i) = i: similarly to case (2), exchanging the roles of i and j, we get a new injection σ_3 with K(σ) = K(σ_3) + 1;
(4) σ(s) = i, σ(t) = j, s ≠ t, for some s ∈ [p]\{i} and t ∈ [p]\{j}: if we consider s, i as a new number s̃ and t, j as a new number t̃, we get a new injection σ_4 : ([p]∪{s̃,t̃})\{i,j,s,t} → ([m]∪{s̃,t̃})\{i,j,s,t}, and K(σ) = K(σ_4);
(5) σ(i) = j, σ(j) = i: if we "erase" i, j, we get a new injection σ_5 : [p]\{i,j} → [m]\{i,j} and K(σ) = K(σ_5) + 1.
Therefore,

(K_{θ,m,p})_{ij} = a_{ij} [ θ²/((θ+m−1)(θ+m−2)) Σ_{σ_1∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_1)
  + θ(p−2)/((θ+m−1)(θ+m−2)) Σ_{σ_2∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_2)
  + θ(p−2)/((θ+m−1)(θ+m−2)) Σ_{σ_3∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_3)
  + ((p−2)² + (p−2))/((θ+m−1)(θ+m−2)) Σ_{σ_4∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_4)
  + θ/((θ+m−1)(θ+m−2)) Σ_{σ_5∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_5) ]
  = (θ+p−1)(θ+p−2)/((θ+m−1)(θ+m−2)) a_{ij}.
For 1 ≤ i ≤ p < j ≤ m, we only need to consider two cases, s = i and s ≠ i:

(K_{θ,m,p})_{ij} = a_{ij} [ θ(p−1)/((θ+m−1)(θ+m−2)) Σ_{σ_1∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_1)
  + (p−1)²/((θ+m−1)(θ+m−2)) Σ_{σ_2∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_2) ]
  = (p−1)(θ+p−1)/((θ+m−1)(θ+m−2)) a_{ij}.

For p < i < j ≤ m,

(K_{θ,m,p})_{ij} = p(p−1)/((θ+m−1)(θ+m−2)) a_{ij}. □
Now we consider the estimate K̃_{θ,m,p} defined in Equation (5.5). First we analyze the case when K is diagonal.

Theorem 5.4. Let D = diag(d_1, …, d_n, 0, …, 0) with p ≤ n ≤ m. Then

K̃_{θ,m,p} = E( V_σ^T (V_σ D V_σ^T)^+ V_σ ) = p/(θ+m−1) D^+ + (θ−1)/(θ+m−1) diag(d_1^{-1}, …, d_p^{-1}, 0, …, 0).

Proof. First we notice that W_σ := V_σ D V_σ^T = ( Σ_{l=1}^n d_l e^l_{σ(i)} e^l_{σ(j)} )_{1≤i,j≤p} is a diagonal matrix. For 1 ≤ i ≤ p,

(W_σ)_{ii} = Σ_{l=1}^n d_l (e^l_{σ(i)})² = d_{σ(i)} if σ(i) ∈ [n], and 0 otherwise.

Thus W_σ = diag( d_{σ(1)} 1_{σ(1)∈[n]}, …, d_{σ(p)} 1_{σ(p)∈[n]} ) and

W_σ^+ = diag( (d_{σ(1)} 1_{σ(1)∈[n]})^+, …, (d_{σ(p)} 1_{σ(p)∈[n]})^+ ).

Next, V_σ^T W_σ^+ V_σ is still a diagonal matrix, where for 1 ≤ i ≤ m

(V_σ^T W_σ^+ V_σ)_{ii} = (d_i 1_{i∈[n]})^+ if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.

Therefore K̃_{θ,m,p} is also diagonal and

(K̃_{θ,m,p})_{ii} = Σ_{l=1}^p Σ_{σ∈S_{p,m}, σ(l)=i} µ_{θ,m,p}(σ) (d_i 1_{i∈[n]})^+.

For 1 ≤ i ≤ n, the same counting as in the proof of Theorem 5.1 gives

(K̃_{θ,m,p})_{ii} = d_i^{-1} Σ_{l=1}^p Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ) = (θ+p−1)/(θ+m−1) d_i^{-1} if 1 ≤ i ≤ p, and p/(θ+m−1) d_i^{-1} if p+1 ≤ i ≤ n.

For n+1 ≤ i ≤ m, (K̃_{θ,m,p})_{ii} = 0. □
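Theorem 5.4 can likewise be checked by exhaustive summation over S_m for small parameters. The sketch below (ours) takes m = 4, p = 2, n = 3 and compares the enumerated expectation with the stated closed form:

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m, p, n = 2.2, 4, 2, 3
d = np.array([1.5, 0.7, 2.0, 0.0])          # D = diag(d1, d2, d3, 0)
D = np.diag(d)
denom = math.prod(theta + j for j in range(m))

est = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom
    Vs = np.zeros((p, m))
    Vs[np.arange(p), s[:p]] = 1.0
    W = Vs @ D @ Vs.T
    est += w * (Vs.T @ np.linalg.pinv(W) @ Vs)

Dplus = np.diag([1 / x if x != 0 else 0.0 for x in d])
head = np.diag([1 / d[i] if i < p else 0.0 for i in range(m)])
formula = (p * Dplus + (theta - 1) * head) / (theta + m - 1)
print(np.max(np.abs(est - formula)))        # zero up to rounding
```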
Obtaining a closed-form expression for Equation (5.5) in the general case seems to be much more challenging. However, we are able to give an inductive formula for a non-negative definite matrix K, with the help of a result of Kurmayya and Sivakumar [10].

Theorem 5.5 (Theorem 3.2, [10]). Let M = [A a] ∈ C^{m×n} be a block matrix, with A ∈ C^{m×(n−1)} and a ∈ C^m a column vector. Let B = M*M and s = ‖a‖² − a*AA^+a. Then

B^+ = [ (A*A)^+ + s^{-1}(A^+a)(A^+a)*   −s^{-1}(A^+a) ; −s^{-1}(A^+a)*   s^{-1} ]

if s ≠ 0, and

B^+ = [ (A*A)^+ + ‖b‖²(A^+a)(A^+a)* − (A^+a)(A^+b)* − (A^+b)(A^+a)*   −‖b‖²A^+a + A^+b ; −‖b‖²(A^+a)* + (A^+b)*   ‖b‖² ]

if s = 0, with b = (A*)^+(I + A^+a(A^+a)*)^{-1}A^+a.

For a non-negative definite matrix K, one can decompose K = UDU*, where U is unitary with rows u_1, …, u_m and D = diag(d_1, …, d_m). Then

W_σ = V_σ K V_σ^T = ( ũ_{σ(i)} ũ*_{σ(j)} )_{1≤i,j≤p} = M*M,

where ũ_i = (√d_1 u_i^1, …, √d_m u_i^m) and M = [ ũ*_{σ(1)} ⋯ ũ*_{σ(p)} ] ∈ C^{m×p}. Write M = [M_1 a] with M_1 = [ ũ*_{σ(1)} ⋯ ũ*_{σ(p−1)} ] and a = ũ*_{σ(p)}, and let s = ‖a‖² − a*M_1M_1^+a and

b = (M_1*)^+(I + M_1^+a(M_1^+a)*)^{-1}M_1^+a.

By Theorem 5.5,

(M*M)^+ = [ (M_1*M_1)^+  0 ; 0  0 ] + E_σ,

where the matrix E_σ equals

[ s^{-1}(M_1^+a)(M_1^+a)*   −s^{-1}(M_1^+a) ; −s^{-1}(M_1^+a)*   s^{-1} ]  if s ≠ 0,  (5.6)

and

[ ‖b‖²(M_1^+a)(M_1^+a)* − (M_1^+a)(M_1^+b)* − (M_1^+b)(M_1^+a)*   −‖b‖²M_1^+a + M_1^+b ; −‖b‖²(M_1^+a)* + (M_1^+b)*   ‖b‖² ]  if s = 0.

Therefore,

K̃_{θ,m,p} = E( V_σ^T [ (M_1*M_1)^+  0 ; 0  0 ] V_σ ) + E( V_σ^T E_σ V_σ ) = K̃_{θ,m,p−1} + E( V_σ^T E_σ V_σ ).  (5.7)
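The block formula of Theorem 5.5 in the generic case s ≠ 0 can be verified numerically. The sketch below (ours) builds a random full-column-rank M = [A a], for which B = M*M is invertible and the pseudoinverse reduces to the ordinary inverse:

```python
import numpy as np

# Compare numpy's pseudoinverse of B = M^T M with the block formula
# of Theorem 5.5 (real case, s != 0 holds generically).
rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n - 1))
a = rng.standard_normal((m, 1))
M = np.hstack([A, a])

B = M.T @ M
Ap = np.linalg.pinv(A)
s = (a.T @ a - a.T @ A @ Ap @ a).item()      # Schur-complement-like scalar
Apa = Ap @ a

top_left = np.linalg.pinv(A.T @ A) + (Apa @ Apa.T) / s
block = np.block([[top_left, -Apa / s],
                  [-Apa.T / s, np.array([[1.0 / s]])]])
print(np.max(np.abs(block - np.linalg.pinv(B))))   # agreement
```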
6. Performance and Simulations

In this Section, we study the performance of our estimators and we compare it with traditional methods. We focus on the case where the true covariance matrix has a Toeplitz structure. More specifically, we focus on the following two types of Toeplitz matrices.

6.1. Tridiagonal Toeplitz Matrix. Consider the m × m symmetric tridiagonal Toeplitz matrix B with 1 on the main diagonal, b on the sub- and superdiagonals, and 0 elsewhere.

Proposition 6.1.1 ([3]). The eigenvalues and corresponding eigenvectors of B are given by

λ_j = 1 + 2b cos( πj/(m+1) )

and

v_j = ( sin(πj/(m+1)), sin(2πj/(m+1)), …, sin(mπj/(m+1)) )^T,  where j = 1, 2, …, m.

We are interested in the case when B is non-negative definite and the entries of B are non-negative. Therefore, it is not hard to see that b should belong to the interval [0, 1/(2cos(π/(m+1)))] for this to hold.

6.2. Power Toeplitz Matrix. An m × m power Toeplitz matrix is given by

A_α = ( α^{|i−j|} )_{1≤i,j≤m}.

Proposition 6.2.1. Let A_α be as before. Then
(1) A_α ≥ 0 if and only if |α| ≤ 1.
(2) det(A_α) = (1 − α²)^{m−1}.
(3) For |α| ≠ 1, A_α^{-1} equals 1/(1−α²) times the tridiagonal matrix with 1 in the two corner diagonal entries, 1+α² in the interior diagonal entries, and −α on the sub- and superdiagonals.

In particular, when m → ∞, the asymptotic behavior of the eigenvalues of A_α^{-1} is essentially the same as that of a tridiagonal Toeplitz matrix.

Proof. For (1), use induction. (2) follows directly from (1). To prove (3), use the matrix inverse formula and (1). □

For our practical purposes, we consider the case when α ∈ [0, 1).

6.3. Preliminaries on the asymptotic behavior of large Toeplitz matrices. We first collect some basic definitions and theorems regarding large Toeplitz matrices from Albrecht Böttcher and Bernd Silbermann's book [4]. For an infinite Toeplitz matrix A = (a_{j−k})_{j,k=0}^∞, define the symbol of A to be

a(e^{iθ}) = Σ_{n=−∞}^{+∞} a_n e^{iθn},  0 ≤ θ ≤ 2π.

Let A_m be the m × m principal minor of the matrix A. Given a Borel subset E ⊂ C we define the measures

µ_m(E) = (1/m) Σ_{j=1}^m χ_E(λ_j^{(m)})  (6.1)

and

µ(E) = (1/(2π)) ∫_0^{2π} χ_E(a(e^{iθ})) dθ,  (6.2)

where χ_E is the characteristic function of the set E and {λ_j^{(m)}}_{j=1}^m are the eigenvalues of A_m. The following classical result holds.

Theorem 6.1 (Corollary 5.12 in [4]). If a ∈ L^∞ is real-valued, then the measures µ_m given by (6.1) converge weakly to the measure µ defined by (6.2).

6.4. Asymptotic Behavior of Toeplitz Matrices under the Ewens Estimator. For the symmetric tridiagonal Toeplitz matrix B, its symbol is

a(e^{iθ}) = 1 + be^{iθ} + be^{−iθ} = 1 + 2b cos θ,

where θ ∈ [0, 2π]. By Theorem 1.2 in [4], the spectrum of B as m tends to infinity is supported on the interval [1 − 2b, 1 + 2b]. On the other hand, by Theorem 4.1 we have that

B_θ := E(M_σ B M_σ*)
     = I_m + (θ²+θ−2)/((θ+m−2)(θ+m−1)) L_m + b(θ−1)/((θ+m−2)(θ+m−1)) T_m + 2b(m−1)/((θ+m−2)(θ+m−1)) (ee^T − I_m),  (6.3)

where L_m = B − I_m is the tridiagonal matrix with zero diagonal and b on the sub- and superdiagonals, and T_m is the symmetric matrix with zero diagonal whose off-diagonal entries are

(T_m)_{ij} = 4 − 1_{i∈{1,m}} − 1_{j∈{1,m}} − 2 · 1_{|i−j|=1},  i ≠ j.
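Both Proposition 6.1.1 and the decomposition (6.3) can be verified numerically; note that the entrywise expression for T_m used below is our reading of the matrix displayed above, so it should be treated as an assumption of this sketch. The parameters m = 8, b = 0.3 and θ = 2.5 are arbitrary.

```python
import numpy as np

theta, m, b = 2.5, 8, 0.3
B = np.eye(m) + b * (np.eye(m, k=1) + np.eye(m, k=-1))

# Proposition 6.1.1: eigenvalues 1 + 2b cos(pi j/(m+1))
lam = np.sort(1 + 2 * b * np.cos(np.pi * np.arange(1, m + 1) / (m + 1)))
print(np.max(np.abs(lam - np.linalg.eigvalsh(B))))       # ~ machine precision

# Theorem 4.1 applied entrywise to B
Dn = (theta + m - 2) * (theta + m - 1)
Bt = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i == j:
            Bt[i, i] = ((theta - 1) * B[i, i] + np.trace(B)) / (theta + m - 1)
        else:
            s1 = sum(B[i, k] + B[k, j] for k in range(m) if k not in (i, j))
            s2 = sum(B[l, k] for l in range(m) for k in range(m)
                     if l != i and k != j and k != l)
            Bt[i, j] = (theta**2 * B[i, j] + (theta - 1) * B[j, i]
                        + theta * s1 + s2) / Dn

# decomposition (6.3): I + c1 L_m + c2 T_m + c3 (ee^T - I)
L = B - np.eye(m)
T = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            T[i, j] = (4 - (i in (0, m - 1)) - (j in (0, m - 1))
                       - 2 * (abs(i - j) == 1))
e = np.ones((m, 1))
rhs = (np.eye(m) + (theta**2 + theta - 2) / Dn * L
       + b * (theta - 1) / Dn * T + 2 * b * (m - 1) / Dn * (e @ e.T - np.eye(m)))
print(np.max(np.abs(Bt - rhs)))                          # zero up to rounding
```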
If θ is a fixed constant greater than 1, then as m → ∞,

b(θ−1)/((θ+m−2)(θ+m−1)) ‖T_m‖ ≤ 4b(θ−1)m/m² → 0  (6.4)

and

(θ²+θ−2)/((θ+m−2)(θ+m−1)) ‖L_m‖ → 0.  (6.5)

Therefore, B_θ and (1 − 2b/m)I_m + (2b/m)ee^T are asymptotically equivalent sequences (see Chapter 2, [9]), and by Theorem 2.6 in [9],

lim_{m→∞} µ_m^{B_θ} = lim_{m→∞} µ_m^{(1−2b/m)I_m + (2b/m)ee^T},

the latter matrix being a rank-one perturbation of the identity matrix. Therefore,

lim_{m→∞} µ_m^{B_θ} = δ_1,

where δ_t is the Dirac measure at the point t.

A more interesting situation happens when θ = βm for a fixed constant β. In this case, up to terms that vanish in the limit,

B_θ = I_m + β²/(β+1)² L_m + bβ/(β+1)² (1/m) T_m + 2b/(β+1)² (1/m) (ee^T − I_m).

Moreover,

(1/m) Tr( bβ/(β+1)² (1/m) T_m )² ≤ 16b²β² m²/m³ → 0

and

(1/m) Tr( 2b/(β+1)² (1/m) (ee^T − I_m) )² ≤ 4b² m²/m³ → 0

as m → ∞. By Lemma 2.3 in [1], the Levy metric of the empirical spectral distributions of two m × m Hermitian matrices A, B satisfies

L(µ_m^A, µ_m^B)³ ≤ (1/m) Tr[(A − B)(A − B)*].

It is known (see Theorem 6, Section 4.3, [8]) that the distribution functions µ_m converge weakly to µ if and only if the Levy metric L(µ_m, µ) → 0. Therefore,

lim_{m→∞} µ_m^{B_θ} = lim_{m→∞} µ_m^{I_m + (β/(β+1))² L_m}.

The matrix

I_m + β²/(β+1)² L_m = β²/(β+1)² B + (1 − β²/(β+1)²) I_m

is still a tridiagonal Toeplitz matrix, with symbol

a(e^{iθ}) = 1 + 2b β²/(β+1)² cos θ.

Hence the limiting eigenvalue distribution is supported on the interval [1 − 2b β²/(β+1)², 1 + 2b β²/(β+1)²]. The Figure below shows the estimated density function for the spectrum as β changes.
Figure 3. This Figure shows the density functions of the empirical spectral distribution of a 300 × 300 tridiagonal Toeplitz matrix B with b = 0.3 and those of E(M_σ B M_σ*) for different θ's.

For the power Toeplitz matrix A_α, the symbol is

a(e^{iθ}) = 1 + α/(e^{iθ} − α) + α/(e^{−iθ} − α) = 1 + 2α(cos θ − α) / ((cos θ − α)² + sin²θ).

Thus the spectrum of A_α as m tends to infinity is supported on [(1−α)/(1+α), (1+α)/(1−α)].

By Theorem 4.1, one can get

A_θ := E(M_σ A_α M_σ*)
     = I_m + (θ²+θ−1)/((θ+m−2)(θ+m−1)) (A_α − I_m)
       + α(α^m − mα + m − 1 − (θ−1)(α−1)) / ((1−α)²(θ+m−2)(θ+m−1)) (ee^T − I_m)
       − (θ−1)/((1−α)(θ+m−2)(θ+m−1)) J_m,  (6.6)
Figure 4. This Figure shows the estimated density functions of the empirical spectral distribution of a 300 × 300 power Toeplitz matrix A_{0.5} and those of E(M_σ A_{0.5} M_σ*) for different θ's.

Here J_m = (l_{ij}) with diagonal entries l_{ii} = 0 and non-diagonal entries l_{ij} = α^i + α^j + α^{m+1−i} + α^{m+1−j}. In the case θ = βm,

(1/m) Tr( (θ−1)/((1−α)(θ+m−2)(θ+m−1)) J_m )² ≤ (1/m³)(1/(1−α)²) Σ_{i,j=1}^m (α^i + α^j + α^{m+1−i} + α^{m+1−j})² ≤ 16/((1−α)² m) = o(1).  (6.7)

Similarly, we can show that

lim_{m→∞} µ_m^{A_θ} = lim_{m→∞} µ_m^{I_m + β²/(β+1)² (A_α − I_m)}.

For the matrix

I_m + β²/(β+1)² (A_α − I_m) = β²/(β+1)² A_α + (1 − β²/(β+1)²) I_m,

one has

a(e^{iθ}) = 1 + β²/(β+1)² ( α/(e^{iθ} − α) + α/(e^{−iθ} − α) ) = 1 + β²/(β+1)² · 2α(cos θ − α)/((cos θ − α)² + sin²θ).

Thus the limiting spectrum is supported on the interval

[ 1 − 2α β²/((β+1)²(1+α)),  1 + 2α β²/((β+1)²(1−α)) ].
6.5. Simulations. In this Section, we present some simulations to test the performance of our estimators. Let A_α be an m × m Toeplitz covariance matrix with entries a_{i,j} = α^{|i−j|}. Assume that we take n measurements and we want to recover Σ = A_α to the best of our ability. After performing the measurements we construct the sample covariance matrix K and proceed to recover A_α in terms of the operators invcov_p(K) and E(M_σ K M_σ*). First we look at the eigenvalue distributions under the invcov_p and Ewens estimators. In Figure 5, we can observe a realization of this experiment with α = 0.5, m = 200 and n = 150. We see that the eigenvalues of A_α range roughly from 1/3 to 3. For the sample covariance matrix K, 50 eigenvalues are precisely zero. Both the inverse of invcov_p and the Ewens estimator give non-zero eigenvalues. The eigenvalues
[Figure 5 panels: true covariance matrix with m = 200 and α = 0.5; sample covariance with n = 150; inverse of invcov with optimum p (p = 45), MSE = 0.7420; Ewens method with optimum θ (θ = 261), MSE = 0.6607.]

Figure 5. Comparison of the eigenvalue distributions of the true covariance matrix, the sample covariance matrix, and the invcov estimator vs. the Ewens estimator.
under the inverse of invcov_p (p = 45) range from 0.4 to 2, and those under the Ewens estimate (with θ = 261) from 0.6 to 2.7. Similar results were observed for other parameter values. In Figures 6 and 7, we show the performance of the estimators for different values of p and θ. It was observed in [13] that the estimator invcov_p outperforms the more standard and classical estimator of diagonal loading with optimal loading parameters as in Ledoit and Wolf [11]; this was done by computing the Frobenius norm (MSE) ‖A_α − (p/m) invcov_p(K)^{-1}‖² for the different values of p and then computing ‖A_α − K_{LW}‖². The same type of experiments were performed on a variety of different scenarios as well. Let A_α, m, p, n, K and θ be as before and define the functions

f(m, n, α, p) = ‖A_α − (p/m) invcov_p(K)^{-1}‖²,
g(m, n, α, p) = ‖A_α^{-1} − (m/p) invcov_p(K)‖²,
F(m, n, α, θ) = ‖A_α − E(M_σ K M_σ*)‖².
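The experiment can be sketched as follows (our reconstruction; the specific MSE values quoted in the text come from the authors' runs, and the sizes below are scaled down for speed). The function ewens_estimate implements the closed form of Theorem 4.1, so F(m, n, α, θ) can be evaluated without sampling permutations.

```python
import numpy as np

def ewens_estimate(K, theta):
    """Closed form of Theorem 4.1 for E(M_sigma K M_sigma^*)."""
    m = K.shape[0]
    Dn = (theta + m - 2) * (theta + m - 1)
    row = K.sum(axis=1) - np.diag(K)
    col = K.sum(axis=0) - np.diag(K)
    tot = K.sum() - np.trace(K)
    out = np.empty_like(K)
    for i in range(m):
        for j in range(m):
            if i == j:
                out[i, i] = ((theta - 1) * K[i, i] + np.trace(K)) / (theta + m - 1)
            else:
                s1 = row[i] - K[i, j] + col[j] - K[i, j]
                s2 = tot - row[i] - col[j] + K[i, j]
                out[i, j] = (theta**2 * K[i, j] + (theta - 1) * K[j, i]
                             + theta * s1 + s2) / Dn
    return out

rng = np.random.default_rng(5)
m, n, alpha = 60, 45, 0.5
A = alpha ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
M = np.linalg.cholesky(A) @ rng.standard_normal((m, n))
K = M @ M.T / n                          # singular: n < m

for theta in (1.0, 20.0, 60.0):          # F(m, n, alpha, theta) for a few thetas
    print(theta, np.linalg.norm(A - ewens_estimate(K, theta)) ** 2)
```

For θ = 1 the output agrees with the closed form of Remark 4.2, which provides an exact correctness check of the implementation.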
Figure 6. The functions f and g for m = 200, n = 150 and α = 0.5 as functions of p.
Figure 7. The function F for m = 200, n = 150 and α = 0.5 as a function of θ.

We can observe how the Ewens estimator outperforms the invcov_p estimator for the optimum values of p and θ. The next Figures show the behavior of the previous functions for different parameter values α, m, n, p and θ.

References

[1] Z.D. Bai. Methodologies in spectral analysis of large-dimensional random matrices, a review. Statist. Sinica, 9(3):611–677, 1999.
[2] V. Betz, D. Ueltschi and Y. Velenik. Random permutations with cycle weights. Ann. Appl. Probab., 21(1):312–331, 2011.
[3] A. Böttcher and S.M. Grudsky. Spectral Properties of Banded Toeplitz Matrices. Society for Industrial and Applied Mathematics, 2005.
[4] A. Böttcher and B. Silbermann. Introduction to Large Truncated Toeplitz Matrices. Springer, 1999.
[5] N.R. Draper and H. Smith. Applied Regression Analysis (Wiley Series in Probability and Statistics). Wiley-Interscience, 1998.
[6] N. Ercolani and D. Ueltschi. Cycle structure of random permutations with cycle weights, 2011.
[7] W. Fulton and J. Harris. Representation Theory. Springer, 1991.
Figure 8. f(m, n, α, p) = ‖A_α − (p/m) invcov_p(K)^{-1}‖²
Figure 9. g(m, n, α, p) = ‖A_α^{-1} − (m/p) invcov_p(K)‖²
Figure 10. F (m, n, α, θ) = kAα − E(Mσ KMσ∗ )k2
[8] J. Galambos. Advanced Probability Theory, volume 10. CRC Press, 1995.
[9] R.M. Gray. Toeplitz and Circulant Matrices: A Review. Information Systems Laboratory, Stanford University, 1971.
[10] T. Kurmayya and K.C. Sivakumar. Moore–Penrose inverse of a Gram matrix and its nonnegativity. Journal of Optimization Theory and Applications, 139(1):201–207, 2008.
[11] O. Ledoit and M. Wolf. Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Annals of Statistics, pages 1081–1102, 2002.
[12] I. Macdonald. Symmetric Functions and Hall Polynomials. Clarendon Press, Oxford University Press, New York, 1995.
[13] T. Marzetta, G. Tucci and S. Simon. A random matrix-theoretic approach to handling singular covariance estimates. IEEE Transactions on Information Theory, 57(9):6256–6271, 2011.
[14] X. Mestre. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 54(11):5113–5129, 2008.
[15] X. Mestre and M.A. Lagunas. Diagonal loading for finite sample size beamforming: an asymptotic approach. In Robust Adaptive Beamforming, pages 201–257, 2006.
[16] R. Muirhead. Aspects of Multivariate Statistical Theory. John Wiley & Sons, New York, 1982.
[17] C.D. Richmond, R. Rao Nadakuditi and A. Edelman. Asymptotic mean squared error performance of diagonally loaded Capon-MVDR processors. In Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, pages 1711–1716, 2005.
[18] B. Sagan. The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions. Springer, 2nd edition, 2010.
[19] R.P. Stanley. Enumerative Combinatorics, Volume 2. Cambridge University Press, Cambridge, 1999.
[20] M.A.G. Viana. The covariance structure of random permutation matrices. In Algebraic Methods in Statistics and Probability: AMS Special Session on Algebraic Methods and Statistics, April 8–9, 2000, University of Notre Dame, Notre Dame, Indiana, 287:303, 2001.

Gabriel H. Tucci is with Bell Labs, Alcatel-Lucent, 600 Mountain Ave, Murray Hill, NJ 07974.
E-mail address: [email protected]

Ke Wang is with the Math Department at Rutgers University, Busch Campus, Piscataway, NJ.
E-mail address: [email protected]