NEW METHODS FOR HANDLING SINGULAR SAMPLE COVARIANCE MATRICES
arXiv:1111.0235v1 [math.PR] 1 Nov 2011
GABRIEL H. TUCCI AND KE WANG
Abstract. The estimation of a covariance matrix from an insufficient amount of data is one of the most common problems in fields as diverse as multivariate statistics, wireless communications, signal processing, biology, learning theory and finance. In [13], a new approach to handle singular covariance matrices was suggested. The main idea was to use dimensionality reduction in conjunction with an average over the unitary matrices. In this paper we build on this idea and consider new approaches that show considerable improvement over traditional methods such as diagonal loading. One of these methods, the Ewens estimator, randomizes the sample covariance matrix over all permutation matrices with respect to the Ewens measure. The techniques used to attack this problem are broad and range from random matrix theory to combinatorics.
1. Introduction

The estimation of a covariance matrix from an insufficient amount of data is one of the most common problems in fields as diverse as multivariate statistics, wireless communications, signal processing, biology, learning theory and finance. For instance, the covariation between asset returns plays a crucial role in modern finance. The covariance matrix and its inverse are the key statistics in portfolio optimization and risk management. Many recent financial innovations involve complex derivatives, like exotic options written on the minimum, maximum or difference of two assets, or structured financial products such as CDOs. All of these innovations are built upon, or designed to exploit, the correlation structure of two or more assets. In the field of wireless communications, covariance estimates allow us to compute the direction of arrival (DOA), a critical task in smart antenna systems since it enables accurate mobile location. Another application, in the field of biology, involves the interactions between proteins or genes in an organism and the joint time evolution of their interactions.

Typically the covariance matrix of a multivariate random variable is not known but has to be estimated from the data. Estimation of covariance matrices then deals with the question of how to approximate the actual covariance matrix on the basis of samples from the multivariate distribution. Simple cases, where the number of observations is much greater than the number of variables, can be dealt with by using the sample covariance matrix. In this case, the sample covariance matrix is an unbiased and efficient estimator of the true covariance matrix. However, in many practical situations we would like to estimate the covariance matrix of a set of variables from an insufficient amount of data. In this case the sample covariance matrix is singular (non-invertible) and therefore a fundamentally bad estimate.
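This failure mode is easy to reproduce numerically. The sketch below (ours, not from the paper) draws n < m centered Gaussian samples, forms the sample covariance K = (1/n)MM*, and checks that K is rank deficient; the dimension m = 8, the sample size n = 5 and the particular covariance are arbitrary illustrative choices.

```python
import numpy as np

# With n < m samples, the sample covariance K = (1/n) M M* has rank at
# most n, hence it is singular and cannot be inverted.
rng = np.random.default_rng(0)
m, n = 8, 5                      # dimension m larger than sample size n

# hypothetical true covariance: a random positive definite matrix
G = rng.standard_normal((m, m))
Sigma = G @ G.T + m * np.eye(m)

# n independent samples x_k ~ N(0, Sigma), stacked as columns of M
M = np.linalg.cholesky(Sigma) @ rng.standard_normal((m, n))
K = (M @ M.T) / n                # sample covariance matrix

print(np.linalg.matrix_rank(K))  # at most n = 5 < m = 8
print(abs(np.linalg.det(K)))     # numerically zero: K is not invertible
```

Any estimator built directly on K inverse therefore fails, which motivates the operators introduced below.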
More specifically, let X be a random vector X = (X_1, …, X_m)^T ∈ C^{m×1} and assume for simplicity that X is centered. Then the true covariance matrix is given by

Σ = E(XX*) = (cov(X_i, X_j))_{1≤i,j≤m}.  (1.1)

Consider n independent samples or realizations x_1, …, x_n ∈ C^{m×1} and form the m × n data matrix M = (x_1, …, x_n). Then the sample covariance matrix is the m × m non-negative definite matrix

K = (1/n) M M*.  (1.2)

If n → +∞ while the dimension m stays fixed, then the sample covariance matrix K converges (entrywise) to Σ almost surely. However, as we mentioned before, in many empirical problems the number of measurements is less than the dimension (n < m), and thus the sample covariance matrix K is almost always singular. Our target in this paper is to recover the true covariance matrix Σ from K under the condition n < m.

The conventional treatment of covariance singularity artificially converts the singular sample covariance matrix into an invertible (positive definite) one by the simple expedient of adding a positive diagonal matrix, or more generally, by taking a linear combination of the sample covariance and the identity matrix. This procedure is variously called "diagonal loading" or "ridge regression" [17, 5]. The method is straightforward: consider αK + βI_m as an estimate of Σ, where α and β are called the loading parameters. The resulting matrix is positive definite (hence invertible) and preserves the eigenvectors of the sample covariance; the eigenvalues of αK + βI_m are a uniform scaling and shift of the eigenvalues of K. There are many methods for choosing the optimal loading parameters; see [11], [14] and [15].

In [13], Marzetta, Tucci and Simon suggested a new approach to handle singular covariance matrices. Let p ≤ n be a parameter, to be estimated later, and consider the set of all p × m one-sided unitary matrices

Ω_{p,m} = {Φ ∈ C^{p×m} : ΦΦ* = I_p}.  (1.3)

Endow Ω_{p,m} with the Haar measure, that is, the uniform distribution on the set Ω_{p,m}. We define the operators

cov_p(K) = E(Φ*(ΦKΦ*)Φ)  (1.4)

and

invcov_p(K) = E(Φ*(ΦKΦ*)^{-1}Φ)
(1.5)
where the expectation is taken with respect to the Haar measure. Surprisingly, they found that

cov_p(K) = p[(mp − 1)K + (m − p)Tr(K)I_m] / (m(m² − 1)),

which is the same as diagonal loading. Moreover, they investigated the properties of invcov_p(K). If K is decomposed as K = UDU*, with D = diag(d_1, …, d_n, 0, …, 0), then invcov_p(K) = U invcov_p(D) U*, and invcov_p(D) = diag(λ_1, …, λ_n, µ, …, µ). In other words, invcov_p(K) preserves the eigenvectors of K, and transforms all the zero eigenvalues into a nonzero constant µ. They also provided formulas to compute the λ_i's and µ, and studied the asymptotic behavior of invcov_p(D) using techniques from free probability.

In this paper, we investigate new methods to estimate singular covariance matrices. In Section 2, we present some preliminaries on Schur polynomials that are used later in this work. In Section 3, we continue to work on the operator invcov_p suggested in [13]. We also show that invcov_p(K) actually has a very simple algebraic structure, i.e. it is a polynomial in K. A formula for computing E(Φ(Φ*D_nΦ)^l Φ*) is given to help further study Equation (65) and Theorem 1, Section VI in [13]. In Section 4, we consider a new approach, called the Ewens estimator, to estimate Σ. In this estimator, the average is taken over the set of all m × m permutation matrices with respect to the Ewens measure. The explicit formula for the Ewens estimator is computed by a combinatorial argument. In Section 5, we combine the ideas of the first two methods. We extend the definition of permutation matrices to get p × m unitary matrices and define two new operators,

K_{θ,m,p} := E(V_σ^T (V_σ K V_σ^T) V_σ)  and  K̃_{θ,m,p} := E(V_σ^T (V_σ K V_σ^T)^+ V_σ),

to estimate Σ and Σ^{-1}, respectively.
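The closed form for cov_p can be checked by simulation. The sketch below (ours, not from [13]) draws Φ uniformly from Ω_{p,m} using a phase-corrected QR factorization of a complex Gaussian matrix; m = 5, p = 2, the test matrix K and the number of samples are arbitrary choices.

```python
import numpy as np

# Monte Carlo check: averaging Phi*(Phi K Phi*)Phi over Haar-distributed
# one-sided unitaries should approach
#   cov_p(K) = p[(mp-1)K + (m-p)Tr(K)I] / (m(m^2-1)).
rng = np.random.default_rng(1)
m, p = 5, 2

def haar_one_sided(p, m, rng):
    """First p rows of a Haar-distributed m x m unitary matrix."""
    G = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
    Q, R = np.linalg.qr(G)
    d = np.diag(R)
    Q = Q * (d / np.abs(d))      # phase correction makes Q Haar distributed
    return Q[:p, :]

# a fixed symmetric test matrix K
A = rng.standard_normal((m, m))
K = A @ A.T / m

N = 4000
est = np.zeros((m, m), dtype=complex)
for _ in range(N):
    Phi = haar_one_sided(p, m, rng)
    est += Phi.conj().T @ (Phi @ K @ Phi.conj().T) @ Phi
est /= N

formula = p * ((m * p - 1) * K + (m - p) * np.trace(K) * np.eye(m)) / (m * (m**2 - 1))
print(np.max(np.abs(est - formula)))   # small Monte Carlo error
```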
Figure 1. Young tableau representation of the partition (5, 4, 1).

We provide the explicit formula for K_{θ,m,p} and an inductive formula to compute K̃_{θ,m,p}. In Section 6, it is assumed that Σ has some special form, namely a tridiagonal Toeplitz matrix or a power Toeplitz matrix, and we study its asymptotic behavior under the Ewens estimator. In this Section, we also present some simulations under the different methods to test the effect of the parameters.

Notation: Throughout this paper, 1_S is the indicator function of a set S. We sometimes use [n] to denote the set {1, 2, …, n}, and Tr(A) is the trace of a matrix A. For a vector v = (v_1, …, v_m) we use the Euclidean norm ‖v‖_2 = (Σ_{i=1}^m |v_i|²)^{1/2}, and for an m × m matrix A we use the Frobenius norm ‖A‖ = (Tr(AA*))^{1/2}. We use the notation µ ⊢ n to indicate that µ is a partition of the positive integer n.

2. Schur Polynomials Preliminaries

A symmetric polynomial is a polynomial P(x_1, x_2, …, x_n) in n variables such that if any of the variables are interchanged one obtains the same polynomial. Formally, P is a symmetric polynomial if for any permutation σ of the set {1, 2, …, n} one has P(x_{σ(1)}, x_{σ(2)}, …, x_{σ(n)}) = P(x_1, x_2, …, x_n). Symmetric polynomials arise naturally in the study of the relation between the roots of a polynomial in one variable and its coefficients, since the coefficients can be given by symmetric polynomial expressions in the roots. Symmetric polynomials also form an interesting structure by themselves. The resulting structures, and in particular the ring of symmetric functions, are of great importance in combinatorics and in representation theory (see for instance [7, 16, 12, 18] for more details on this topic). The Schur polynomials are certain symmetric polynomials in n variables.
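As a tiny illustration of the invariance property (ours, not from the paper), one can evaluate the elementary symmetric polynomial e_2(x_1, x_2, x_3) = x_1x_2 + x_1x_3 + x_2x_3 at every permutation of a fixed point and check that a single value results:

```python
import itertools

# e_2 is symmetric: permuting its arguments never changes its value.
def e2(x):
    return x[0] * x[1] + x[0] * x[2] + x[1] * x[2]

x = (1.5, -2.0, 0.25)
values = {round(e2([x[i] for i in perm]), 12)
          for perm in itertools.permutations(range(3))}
print(values)   # a single value: the polynomial is symmetric
```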
This class of polynomials is very important in representation theory since they are the characters of the irreducible representations of the general linear groups. The Schur polynomials are indexed by partitions. A partition of a positive integer n, also called an integer partition, is a way of writing n as a sum of positive integers. Two partitions that differ only in the order of their summands are considered to be the same partition. Therefore, we can always represent a partition λ of a positive integer n as a sequence of n non-increasing and non-negative integers d_i such that

Σ_{i=1}^n d_i = n  with  d_1 ≥ d_2 ≥ d_3 ≥ … ≥ d_n ≥ 0.

Notice that some of the d_i could be zero. Integer partitions are usually represented by the so-called Young tableaux (also known as Ferrers diagrams). A Young tableau is a finite collection of boxes, or cells, arranged in left-justified rows, with the row lengths weakly decreasing (each row has the same or shorter length than its predecessor). Listing the number of boxes in each row gives a partition λ of a non-negative integer n, the total number of boxes of the diagram. The Young diagram is said to be of shape λ, and it carries the same information as that partition. For instance, in Figure 1 we can see the Young tableau corresponding to the partition (5, 4, 1) of the number 10. Given a partition λ of n,

n = d_1 + d_2 + ⋯ + d_n,  d_1 ≥ d_2 ≥ ⋯ ≥ d_n ≥ 0,
Figure 2. Young tableau representation of the partition (5, 4, 1) with its corresponding hook lengths.

the following functions are alternating polynomials (in other words, they change sign under any transposition of the variables):

a_{(d_1,…,d_n)}(x_1, …, x_n) = det( x_j^{d_i} )_{1≤i,j≤n} = Σ_{σ∈S_n} ε(σ) x_{σ(1)}^{d_1} ⋯ x_{σ(n)}^{d_n},

where S_n is the permutation group of the set {1, 2, …, n} and ε(σ) is the sign of σ. Since they are alternating, they are all divisible by the Vandermonde determinant

Δ(x_1, …, x_n) = Π_{1≤j<k≤n} (x_j − x_k).

4. The Ewens Estimator

The Ewens measure with parameter θ assigns to each permutation σ in the symmetric group S_m the probability

p_{θ,m}(σ) = θ^{K(σ)} / (θ(θ+1)⋯(θ+m−1)),

where θ > 0 and K(σ) is the number of cycles in σ. The case θ = 1 corresponds to the uniform measure. This measure has recently appeared in mathematical physics models (see e.g. [2] and [6]) and one has only recently started to gain insight into the cycle structure of such random permutations.

Let σ be a permutation in S_m. The corresponding permutation matrix M_σ is the m × m matrix whose i-th row is e_{σ(i)}, where e_i denotes the 1 × m vector whose i-th entry is 1 and all other entries are zero:

M_σ = (e_{σ(1)}; e_{σ(2)}; …; e_{σ(m)}),

which is, of course, a unitary matrix. Given the sample covariance matrix K we define the new estimator for Σ as

K_θ := E(M_σ K M_σ*),  (4.1)
where the expectation is taken with respect to the Ewens measure.

Theorem 4.1. Let K = (a_{ij}) be an m × m matrix in C^{m×m}. Then K_θ = E(M_σ K M_σ*) is an m × m matrix such that the diagonal terms satisfy

(K_θ)_{ii} = ((θ−1) a_{ii} + Tr(K)) / (θ+m−1),  (4.2)

and the non-diagonal terms (i ≠ j) satisfy

(K_θ)_{ij} = [ θ² a_{ij} + (θ−1) a_{ji} + θ Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠i, k≠j, k≠l} a_{lk} ] / ((θ+m−2)(θ+m−1))  (4.3)
           = [ (θ²−1) a_{ij} + (θ−1) a_{ji} + (θ−1) Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠k} a_{lk} ] / ((θ+m−2)(θ+m−1)).

Remark 4.2. If θ = 1,

K_1 = α (ee^T/m) + β (I_m − ee^T/m),  where  α = eKe^T/m = Σ_{i,j} a_{ij} / m  and  β = (Tr(K) − α)/(m−1).

This has already been computed in [20], Proposition 2.2. If K = D = diag(d_1, …, d_m), then

K_θ = ((θ−1) D + Tr(D) I_m) / (θ+m−1),

which corresponds to diagonal loading.
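For small m the expectation defining K_θ is a finite sum over S_m, so Theorem 4.1 can be checked by brute force. The sketch below (ours) enumerates S_4 with Ewens weights and compares the average of M_σ K M_σ* against the closed forms (4.2) and (4.3); θ = 2.3 and the test matrix are arbitrary.

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    """Number of cycles of a permutation given as a 0-indexed tuple."""
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m = 2.3, 4
rng = np.random.default_rng(2)
K = rng.standard_normal((m, m))
denom = math.prod(theta + j for j in range(m))   # theta(theta+1)...(theta+m-1)

# direct expectation: (K_theta)_{ij} = sum_sigma p(sigma) K[sigma(i), sigma(j)]
Kt = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom           # Ewens weight p_{theta,m}
    Kt += w * K[np.ix_(s, s)]

# closed form from Theorem 4.1
F = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i == j:
            F[i, i] = ((theta - 1) * K[i, i] + np.trace(K)) / (theta + m - 1)
        else:
            s1 = sum(K[i, k] + K[k, j] for k in range(m) if k not in (i, j))
            s2 = sum(K[l, k] for l in range(m) for k in range(m)
                     if l != i and k != j and k != l)
            F[i, j] = (theta**2 * K[i, j] + (theta - 1) * K[j, i]
                       + theta * s1 + s2) / ((theta + m - 2) * (theta + m - 1))
print(np.max(np.abs(Kt - F)))   # agreement to machine precision
```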
Proof. First,

M_σ K M_σ* = ( e_{σ(i)} K e_{σ(j)}* )_{1≤i,j≤m} = ( Σ_{k=1}^m Σ_{l=1}^m a_{kl} e^k_{σ(i)} e^l_{σ(j)} )_{1≤i,j≤m} = ( a_{σ(i)σ(j)} )_{1≤i,j≤m}.

For the diagonal terms,

(K_θ)_{ii} = Σ_{σ∈S_m} p_{θ,m}(σ) a_{σ(i)σ(i)} = a_{ii} Σ_{σ∈S_m, σ(i)=i} p_{θ,m}(σ) + Σ_{l≠i} a_{ll} Σ_{σ∈S_m, σ(i)=l} p_{θ,m}(σ)
           = a_{ii} · θ/(θ+m−1) Σ_{σ̃∈S_{m−1}} p_{θ,m−1}(σ̃) + Σ_{l≠i} a_{ll}/(θ+m−1) Σ_{σ̂(l)} p_{θ,m−1}(σ̂(l))
           = θ/(θ+m−1) a_{ii} + 1/(θ+m−1) Σ_{l≠i} a_{ll}
           = ((θ−1) a_{ii} + Tr(K)) / (θ+m−1).
Now we compute the off-diagonal terms (K_θ)_{ij} (i ≠ j). For σ ∈ S_m, if σ(i) = i and σ(j) = j, then σ = (i)(j)σ_1 with σ_1 ∈ S_{m−2} and K(σ) = K(σ_1) + 2, so

p_{θ,m}(σ) = θ² p_{θ,m−2}(σ_1) / ((θ+m−2)(θ+m−1)).

If σ(i) = j and σ(j) = i, we erase i and j from σ to obtain σ_2 ∈ S_{m−2}, and

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_2) / ((θ+m−2)(θ+m−1)).

If σ(i) = i and σ(j) = k ≠ i, j, then σ = (i)σ̂ with σ̂ ∈ S_{m−1} and K(σ) = K(σ̂) + 1. Furthermore, we can erase j from σ̂ to get a new permutation σ_3(k) ∈ S_{m−2} such that K(σ_3(k)) = K(σ̂), and finally

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_3(k)) / ((θ+m−2)(θ+m−1)).

Notice that Σ_{σ_3(k)} p_{θ,m−2}(σ_3(k)) = 1.

If σ(i) = l ≠ i, j and σ(j) = j, then as above we obtain σ_4(l) ∈ S_{m−2} such that

p_{θ,m}(σ) = θ p_{θ,m−2}(σ_4(l)) / ((θ+m−2)(θ+m−1))  and  Σ_{σ_4(l)} p_{θ,m−2}(σ_4(l)) = 1.

If σ(i) = l ≠ i and σ(j) = k ≠ j (with k ≠ l), excluding the case σ(i) = j, σ(j) = i, we erase i and j from σ to obtain σ_5(l, k) ∈ S_{m−2}; thus

p_{θ,m}(σ) = p_{θ,m−2}(σ_5(l, k)) / ((θ+m−2)(θ+m−1))  and  Σ_{σ_5(l,k)} p_{θ,m−2}(σ_5(l, k)) = 1.
Therefore, for i ≠ j,

(K_θ)_{ij} = Σ_{σ∈S_m} p_{θ,m}(σ) a_{σ(i)σ(j)}
  = a_{ij} θ²/((θ+m−2)(θ+m−1)) Σ_{σ_1∈S_{m−2}} p_{θ,m−2}(σ_1)
  + a_{ji} θ/((θ+m−2)(θ+m−1)) Σ_{σ_2∈S_{m−2}} p_{θ,m−2}(σ_2)
  + θ/((θ+m−2)(θ+m−1)) Σ_{k≠i,j} a_{ik} Σ_{σ_3(k)∈S_{m−2}} p_{θ,m−2}(σ_3(k))
  + θ/((θ+m−2)(θ+m−1)) Σ_{l≠i,j} a_{lj} Σ_{σ_4(l)∈S_{m−2}} p_{θ,m−2}(σ_4(l))
  + 1/((θ+m−2)(θ+m−1)) Σ_{l≠i, k≠j, k≠l, (l,k)≠(j,i)} a_{lk} Σ_{σ_5(l,k)∈S_{m−2}} p_{θ,m−2}(σ_5(l, k))
  = [ θ² a_{ij} + (θ−1) a_{ji} + θ Σ_{k≠i,j} (a_{ik} + a_{kj}) + Σ_{l≠i, k≠j, k≠l} a_{lk} ] / ((θ+m−2)(θ+m−1)). □
5. Hybrid Method

In this Section, we combine the ideas of the first two methods to create a third, hybrid method. First, we extend the definition of a permutation. For an integer p ≤ m, let

S_{p,m} := { σ : σ an injection from {1, 2, …, p} to {1, 2, …, m} }.

The size of the set S_{p,m} is m!/(m−p)! and, in the case p = m, S_{m,m} is the set of all permutations on {1, 2, …, m}. For σ ∈ S_{p,m}, the associated p × m matrix is

V_σ := (e_{σ(1)}; e_{σ(2)}; …; e_{σ(p)}),

where e_{σ(i)} = (e^1_{σ(i)}, e^2_{σ(i)}, …, e^m_{σ(i)}) is a 1 × m row vector with the σ(i)-th entry equal to 1 and all others equal to zero. Notice that

V_σ V_σ^T = I_p  (5.1)

and

P_σ := V_σ^T V_σ = diag(p_1, …, p_m),  (5.2)

where

p_i = Σ_{l=1}^p (e^i_{σ(l)})² = 1 if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.
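The objects V_σ and P_σ are easy to realize in code. The sketch below (ours; indices are 0-based) builds V_σ for a sample injection and verifies the identities (5.1) and (5.2):

```python
import numpy as np

def V(sigma, m):
    """p x m matrix whose i-th row is e_{sigma(i)} (0-indexed injection)."""
    p = len(sigma)
    out = np.zeros((p, m))
    out[np.arange(p), sigma] = 1.0
    return out

m = 5
sigma = [2, 0, 4]            # an injection from a 3-element set into 5 slots
Vs = V(sigma, m)
print(Vs @ Vs.T)             # identity I_p, equation (5.1)
P = Vs.T @ Vs                # diagonal projector P_sigma, equation (5.2)
print(np.diag(P))            # 1 exactly at the positions hit by sigma
```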
Next, we use the Ewens measure on the permutation sets to define a probability measure on S_{p,m}. For each σ ∈ S_{p,m}, consider the set

Ω_σ := { σ̃ ∈ S_m : σ̃|_{{1,…,p}} = σ }.
In other words, Ω_σ is the set of all permutations in S_m whose restriction to the set {1, 2, …, p} is equal to σ. Recall that p_{θ,m} is the Ewens measure on S_m with parameter θ. Define the probability measure µ_{θ,m,p} on S_{p,m} by

µ_{θ,m,p}(σ) := p_{θ,m}(Ω_σ) = Σ_{σ̃∈Ω_σ} p_{θ,m}(σ̃).  (5.3)
Now we are ready to introduce two new operators:

K_{θ,m,p} := E( V_σ^T (V_σ K V_σ^T) V_σ ),  (5.4)

K̃_{θ,m,p} := E( V_σ^T (V_σ K V_σ^T)^+ V_σ ),  (5.5)

where (V_σ K V_σ^T)^+ is the Moore–Penrose pseudoinverse of the matrix V_σ K V_σ^T. We use K_{θ,m,p} as an estimate for Σ and K̃_{θ,m,p} as an estimate for Σ^{-1}. Now we show a few results on these new estimators.

Theorem 5.1. Let A = (a_{ij}) be an m × m matrix in C^{m×m}. Then K_{θ,m,p} is an m × m matrix whose diagonal entries are

(K_{θ,m,p})_{ii} = (θ+p−1)/(θ+m−1) a_{ii}  if 1 ≤ i ≤ p,   and   p/(θ+m−1) a_{ii}  if p+1 ≤ i ≤ m,

and whose non-diagonal entries, assuming i < j (if j < i, exchange i and j in the following expressions), are

(K_{θ,m,p})_{ij} = (θ+p−1)(θ+p−2)/((θ+m−1)(θ+m−2)) a_{ij}  if 1 ≤ i < j ≤ p,
                 = (p−1)(θ+p−1)/((θ+m−1)(θ+m−2)) a_{ij}    if 1 ≤ i ≤ p < j ≤ m,
                 = p(p−1)/((θ+m−1)(θ+m−2)) a_{ij}          if p < i < j ≤ m.

Remark 5.2. In the particular case that A is a diagonal matrix D = diag(d_1, …, d_m), then

K_{θ,m,p} = p/(θ+m−1) D + (θ−1)/(θ+m−1) diag(d_1, …, d_p, 0, …, 0).

For instance, if p = 1 and m = 3 then

K_{θ,3,1} = 1/(θ+2) diag(θd_1, d_2, d_3).

Remark 5.3. In the general case with p = 2 and m = 3,

K_{θ,3,2} = 1/(θ+2) [ (θ+1)a_{11}  θa_{12}  a_{13} ; θa_{21}  (θ+1)a_{22}  a_{23} ; a_{31}  a_{32}  2a_{33} ].
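Since µ_{θ,m,p} is the pushforward of the Ewens measure under restriction to {1,…,p}, Remark 5.3 can be verified exactly by summing over all of S_3. A sketch (ours; θ = 1.9 and the test matrix are arbitrary):

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m, p = 1.9, 3, 2
rng = np.random.default_rng(3)
A = rng.standard_normal((m, m))
denom = math.prod(theta + j for j in range(m))

# sum over all of S_3, restricting each permutation to its first p values
est = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom
    Vs = np.zeros((p, m))
    Vs[np.arange(p), s[:p]] = 1.0
    P = Vs.T @ Vs
    est += w * (P @ A @ P)       # V^T (V A V^T) V = P A P

expected = np.array(
    [[(theta + 1) * A[0, 0], theta * A[0, 1], A[0, 2]],
     [theta * A[1, 0], (theta + 1) * A[1, 1], A[1, 2]],
     [A[2, 0], A[2, 1], 2 * A[2, 2]]]) / (theta + 2)
print(np.max(np.abs(est - expected)))   # zero up to rounding
```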
Proof. Recall from Equation (5.2) that P_σ = V_σ^T V_σ = diag(p_1^σ, …, p_m^σ); thus

V_σ^T (V_σ A V_σ^T) V_σ = ( p_i^σ p_j^σ a_{ij} )_{1≤i,j≤m},

where p_i^σ = 1 if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.
For the diagonal entries, if 1 ≤ i ≤ p,

(K_{θ,m,p})_{ii} = Σ_{σ∈S_{p,m}} µ_{θ,m,p}(σ)(p_i^σ)² a_{ii} = a_{ii} Σ_{l=1}^p Σ_{σ∈S_{p,m}, σ(l)=i} µ_{θ,m,p}(σ)
  = a_{ii} ( Σ_{σ: σ(i)=i} µ_{θ,m,p}(σ) + Σ_{l≠i} Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ) )
  = a_{ii} ( θ/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ') + (p−1)/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ') )
  = (θ+p−1)/(θ+m−1) a_{ii}.

If p+1 ≤ i ≤ m,

(K_{θ,m,p})_{ii} = Σ_{σ∈S_{p,m}} µ_{θ,m,p}(σ)(p_i^σ)² a_{ii} = a_{ii} Σ_{l=1}^p Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ)
  = a_{ii} · p/(θ+m−1) Σ_{σ'∈S_{p−1,m−1}} µ_{θ,m−1,p−1}(σ')
  = p/(θ+m−1) a_{ii}.
For the non-diagonal entries, if 1 ≤ i < j ≤ p (which turns out to be the most complicated case), p_i^σ p_j^σ a_{ij} is nonzero only if i, j ∈ {σ(1), …, σ(p)}. Thus

(K_{θ,m,p})_{ij} = a_{ij} Σ_{s,t∈[p], s≠t} Σ_{σ∈S_{p,m}, σ(s)=i, σ(t)=j} µ_{θ,m,p}(σ).

We divide the previous sum into five parts:
(1) σ(i) = i, σ(j) = j: if we "erase" i, j from the sets [p] and [m], we get a new injection σ_1 from [p]\{i,j} to [m]\{i,j}, and K(σ) = K(σ_1) + 2;
(2) σ(s) = i for some s ∈ [p]\{i,j}, and σ(j) = j: if we "erase" j from the sets [p] and [m] and consider s, i as one number s̃, we get a new injection σ_2 : ([p]∪{s̃})\{i,j,s} → ([m]∪{s̃})\{i,j,s}, and K(σ) = K(σ_2) + 1;
(3) σ(t) = j for some t ∈ [p]\{i,j}, and σ(i) = i: similarly to case (2), exchanging the roles of i and j, we get a new injection σ_3 with K(σ) = K(σ_3) + 1;
(4) σ(s) = i, σ(t) = j, s ≠ t, for some s ∈ [p]\{i} and t ∈ [p]\{j}: if we consider s, i as a new number s̃ and t, j as a new number t̃, we get a new injection σ_4 : ([p]∪{s̃,t̃})\{i,j,s,t} → ([m]∪{s̃,t̃})\{i,j,s,t}, and K(σ) = K(σ_4);
(5) σ(i) = j, σ(j) = i: if we "erase" i, j, we get a new injection σ_5 : [p]\{i,j} → [m]\{i,j} and K(σ) = K(σ_5) + 1.
Therefore,

(K_{θ,m,p})_{ij} = a_{ij} [ θ²/((θ+m−1)(θ+m−2)) Σ_{σ_1∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_1)
  + θ(p−2)/((θ+m−1)(θ+m−2)) Σ_{σ_2∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_2)
  + θ(p−2)/((θ+m−1)(θ+m−2)) Σ_{σ_3∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_3)
  + ((p−2)² + (p−2))/((θ+m−1)(θ+m−2)) Σ_{σ_4∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_4)
  + θ/((θ+m−1)(θ+m−2)) Σ_{σ_5∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_5) ]
  = (θ+p−1)(θ+p−2)/((θ+m−1)(θ+m−2)) a_{ij}.
For 1 ≤ i ≤ p < j ≤ m, we only need to consider two cases, s = i and s ≠ i:

(K_{θ,m,p})_{ij} = a_{ij} [ θ(p−1)/((θ+m−1)(θ+m−2)) Σ_{σ_1∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_1)
  + (p−1)²/((θ+m−1)(θ+m−2)) Σ_{σ_2∈S_{p−2,m−2}} µ_{θ,m−2,p−2}(σ_2) ]
  = (p−1)(θ+p−1)/((θ+m−1)(θ+m−2)) a_{ij}.

For p < i < j ≤ m,

(K_{θ,m,p})_{ij} = p(p−1)/((θ+m−1)(θ+m−2)) a_{ij}. □
Now we consider the estimate K̃_{θ,m,p} defined in Equation (5.5). First we analyze the case when K is diagonal.

Theorem 5.4. Let D = diag(d_1, …, d_n, 0, …, 0) with p ≤ n ≤ m. Then

K̃_{θ,m,p} = E( V_σ^T (V_σ D V_σ^T)^+ V_σ ) = p/(θ+m−1) D^+ + (θ−1)/(θ+m−1) diag(d_1^{-1}, …, d_p^{-1}, 0, …, 0).

Proof. First we notice that W_σ := V_σ D V_σ^T = ( Σ_{l=1}^n d_l e^l_{σ(i)} e^l_{σ(j)} )_{1≤i,j≤p} is a diagonal matrix. For 1 ≤ i ≤ p,

(W_σ)_{ii} = Σ_{l=1}^n d_l (e^l_{σ(i)})² = d_{σ(i)} if σ(i) ∈ [n], and 0 otherwise.

Thus W_σ = diag( d_{σ(1)} 1_{σ(1)∈[n]}, …, d_{σ(p)} 1_{σ(p)∈[n]} ) and

W_σ^+ = diag( (d_{σ(1)} 1_{σ(1)∈[n]})^+, …, (d_{σ(p)} 1_{σ(p)∈[n]})^+ ).

Next, V_σ^T W_σ^+ V_σ is still a diagonal matrix, where for 1 ≤ i ≤ m

(V_σ^T W_σ^+ V_σ)_{ii} = (d_i 1_{i∈[n]})^+ if i ∈ {σ(1), …, σ(p)}, and 0 otherwise.

Therefore K̃_{θ,m,p} is also diagonal and

(K̃_{θ,m,p})_{ii} = Σ_{l=1}^p Σ_{σ∈S_{p,m}, σ(l)=i} µ_{θ,m,p}(σ) (d_i 1_{i∈[n]})^+.

For 1 ≤ i ≤ n, the same counting as in the proof of Theorem 5.1 gives

(K̃_{θ,m,p})_{ii} = d_i^{-1} Σ_{l=1}^p Σ_{σ: σ(l)=i} µ_{θ,m,p}(σ) = (θ+p−1)/(θ+m−1) d_i^{-1} if 1 ≤ i ≤ p, and p/(θ+m−1) d_i^{-1} if p+1 ≤ i ≤ n.

For n+1 ≤ i ≤ m, (K̃_{θ,m,p})_{ii} = 0. □
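Theorem 5.4 can likewise be checked by exhaustive summation over S_m for small parameters. The sketch below (ours) takes m = 4, p = 2, n = 3 and compares the enumerated expectation with the stated closed form:

```python
import itertools, math
import numpy as np

def num_cycles(perm):
    seen, c = set(), 0
    for i in range(len(perm)):
        if i not in seen:
            c += 1
            j = i
            while j not in seen:
                seen.add(j)
                j = perm[j]
    return c

theta, m, p, n = 2.2, 4, 2, 3
d = np.array([1.5, 0.7, 2.0, 0.0])          # D = diag(d1, d2, d3, 0)
D = np.diag(d)
denom = math.prod(theta + j for j in range(m))

est = np.zeros((m, m))
for s in itertools.permutations(range(m)):
    w = theta ** num_cycles(s) / denom
    Vs = np.zeros((p, m))
    Vs[np.arange(p), s[:p]] = 1.0
    W = Vs @ D @ Vs.T
    est += w * (Vs.T @ np.linalg.pinv(W) @ Vs)

Dplus = np.diag([1 / x if x != 0 else 0.0 for x in d])
head = np.diag([1 / d[i] if i < p else 0.0 for i in range(m)])
formula = (p * Dplus + (theta - 1) * head) / (theta + m - 1)
print(np.max(np.abs(est - formula)))        # zero up to rounding
```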
Obtaining a closed-form expression for Equation (5.5) in the general case seems to be much more challenging. However, we are able to give an inductive formula for a non-negative definite matrix K, with the help of a result of Kurmayya and Sivakumar [10].

Theorem 5.5 (Theorem 3.2, [10]). Let M = [A a] ∈ C^{m×n} be a block matrix, with A ∈ C^{m×(n−1)} and a ∈ C^m a column vector. Let B = M*M and s = ‖a‖² − a*AA^+a. Then

B^+ = [ (A*A)^+ + s^{-1}(A^+a)(A^+a)*   −s^{-1}(A^+a) ; −s^{-1}(A^+a)*   s^{-1} ]

if s ≠ 0, and

B^+ = [ (A*A)^+ + ‖b‖²(A^+a)(A^+a)* − (A^+a)(A^+b)* − (A^+b)(A^+a)*   −‖b‖²A^+a + A^+b ; −‖b‖²(A^+a)* + (A^+b)*   ‖b‖² ]

if s = 0, with b = (A*)^+(I + A^+a(A^+a)*)^{-1}A^+a.

For a non-negative definite matrix K, one can decompose K = UDU*, where U is unitary with rows u_1, …, u_m and D = diag(d_1, …, d_m). Then

W_σ = V_σ K V_σ^T = ( ũ_{σ(i)} ũ*_{σ(j)} )_{1≤i,j≤p} = M*M,

where ũ_i = (√d_1 u_i^1, …, √d_m u_i^m) and M = [ ũ*_{σ(1)} ⋯ ũ*_{σ(p)} ] ∈ C^{m×p}. Write M = [M_1 a] with M_1 = [ ũ*_{σ(1)} ⋯ ũ*_{σ(p−1)} ] and a = ũ*_{σ(p)}, and let s = ‖a‖² − a*M_1M_1^+a and

b = (M_1*)^+(I + M_1^+a(M_1^+a)*)^{-1}M_1^+a.

By Theorem 5.5,

(M*M)^+ = [ (M_1*M_1)^+  0 ; 0  0 ] + E_σ,

where the matrix E_σ equals

[ s^{-1}(M_1^+a)(M_1^+a)*   −s^{-1}(M_1^+a) ; −s^{-1}(M_1^+a)*   s^{-1} ]  if s ≠ 0,  (5.6)

and

[ ‖b‖²(M_1^+a)(M_1^+a)* − (M_1^+a)(M_1^+b)* − (M_1^+b)(M_1^+a)*   −‖b‖²M_1^+a + M_1^+b ; −‖b‖²(M_1^+a)* + (M_1^+b)*   ‖b‖² ]  if s = 0.

Therefore,

K̃_{θ,m,p} = E( V_σ^T [ (M_1*M_1)^+  0 ; 0  0 ] V_σ ) + E( V_σ^T E_σ V_σ ) = K̃_{θ,m,p−1} + E( V_σ^T E_σ V_σ ).  (5.7)
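The block formula of Theorem 5.5 in the generic case s ≠ 0 can be verified numerically. The sketch below (ours) builds a random full-column-rank M = [A a], for which B = M*M is invertible and the pseudoinverse reduces to the ordinary inverse:

```python
import numpy as np

# Compare numpy's pseudoinverse of B = M^T M with the block formula
# of Theorem 5.5 (real case, s != 0 holds generically).
rng = np.random.default_rng(4)
m, n = 6, 4
A = rng.standard_normal((m, n - 1))
a = rng.standard_normal((m, 1))
M = np.hstack([A, a])

B = M.T @ M
Ap = np.linalg.pinv(A)
s = (a.T @ a - a.T @ A @ Ap @ a).item()      # Schur-complement-like scalar
Apa = Ap @ a

top_left = np.linalg.pinv(A.T @ A) + (Apa @ Apa.T) / s
block = np.block([[top_left, -Apa / s],
                  [-Apa.T / s, np.array([[1.0 / s]])]])
print(np.max(np.abs(block - np.linalg.pinv(B))))   # agreement
```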
6. Performance and Simulations

In this Section, we study the performance of our estimators and we compare it with traditional methods. We focus on the case where the true covariance matrix has a Toeplitz structure. More specifically, we focus on the following two types of Toeplitz matrices.

6.1. Tridiagonal Toeplitz Matrix. Consider the m × m symmetric tridiagonal Toeplitz matrix B with 1 on the main diagonal, b on the sub- and superdiagonals, and 0 elsewhere.

Proposition 6.1.1 ([3]). The eigenvalues and corresponding eigenvectors of B are given by

λ_j = 1 + 2b cos( πj/(m+1) )

and

v_j = ( sin(πj/(m+1)), sin(2πj/(m+1)), …, sin(mπj/(m+1)) )^T,  where j = 1, 2, …, m.

We are interested in the case when B is non-negative definite and the entries of B are non-negative. Therefore, it is not hard to see that b should belong to the interval [0, 1/(2cos(π/(m+1)))] for this to hold.

6.2. Power Toeplitz Matrix. An m × m power Toeplitz matrix is given by

A_α = ( α^{|i−j|} )_{1≤i,j≤m}.

Proposition 6.2.1. Let A_α be as before. Then
(1) A_α ≥ 0 if and only if |α| ≤ 1.
(2) det(A_α) = (1 − α²)^{m−1}.
(3) For |α| ≠ 1, A_α^{-1} equals 1/(1−α²) times the tridiagonal matrix with 1 in the two corner diagonal entries, 1+α² in the interior diagonal entries, and −α on the sub- and superdiagonals.

In particular, when m → ∞, the asymptotic behavior of the eigenvalues of A_α^{-1} is essentially the same as that of a tridiagonal Toeplitz matrix.

Proof. For (1), use induction. (2) follows directly from (1). To prove (3), use the matrix inverse formula and (1). □

For our practical purposes, we consider the case when α ∈ [0, 1).

6.3. Preliminaries on the asymptotic behavior of large Toeplitz matrices. We first collect some basic definitions and theorems regarding large Toeplitz matrices from Albrecht Böttcher and Bernd Silbermann's book [4]. For an infinite Toeplitz matrix A = (a_{j−k})_{j,k=0}^∞, define the symbol of A to be

a(e^{iθ}) = Σ_{n=−∞}^{+∞} a_n e^{iθn},  0 ≤ θ ≤ 2π.

Let A_m be the m × m principal minor of the matrix A. Given a Borel subset E ⊂ C we define the measures

µ_m(E) = (1/m) Σ_{j=1}^m χ_E(λ_j^{(m)})  (6.1)

and

µ(E) = (1/(2π)) ∫_0^{2π} χ_E(a(e^{iθ})) dθ,  (6.2)

where χ_E is the characteristic function of the set E and {λ_j^{(m)}}_{j=1}^m are the eigenvalues of A_m. The following classical result holds.

Theorem 6.1 (Corollary 5.12 in [4]). If a ∈ L^∞ is real-valued, then the measures µ_m given by (6.1) converge weakly to the measure µ defined by (6.2).

6.4. Asymptotic Behavior of Toeplitz Matrices under the Ewens Estimator. For the symmetric tridiagonal Toeplitz matrix B, its symbol is

a(e^{iθ}) = 1 + be^{iθ} + be^{−iθ} = 1 + 2b cos θ,

where θ ∈ [0, 2π]. By Theorem 1.2 in [4], the spectrum of B as m tends to infinity is supported on the interval [1 − 2b, 1 + 2b]. On the other hand, by Theorem 4.1 we have that

B_θ := E(M_σ B M_σ*)
     = I_m + (θ²+θ−2)/((θ+m−2)(θ+m−1)) L_m + b(θ−1)/((θ+m−2)(θ+m−1)) T_m + 2b(m−1)/((θ+m−2)(θ+m−1)) (ee^T − I_m),  (6.3)

where L_m = B − I_m is the tridiagonal matrix with zero diagonal and b on the sub- and superdiagonals, and T_m is the symmetric matrix with zero diagonal whose off-diagonal entries are

(T_m)_{ij} = 4 − 1_{i∈{1,m}} − 1_{j∈{1,m}} − 2 · 1_{|i−j|=1},  i ≠ j.
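Both Proposition 6.1.1 and the decomposition (6.3) can be verified numerically; note that the entrywise expression for T_m used below is our reading of the matrix displayed above, so it should be treated as an assumption of this sketch. The parameters m = 8, b = 0.3 and θ = 2.5 are arbitrary.

```python
import numpy as np

theta, m, b = 2.5, 8, 0.3
B = np.eye(m) + b * (np.eye(m, k=1) + np.eye(m, k=-1))

# Proposition 6.1.1: eigenvalues 1 + 2b cos(pi j/(m+1))
lam = np.sort(1 + 2 * b * np.cos(np.pi * np.arange(1, m + 1) / (m + 1)))
print(np.max(np.abs(lam - np.linalg.eigvalsh(B))))       # ~ machine precision

# Theorem 4.1 applied entrywise to B
Dn = (theta + m - 2) * (theta + m - 1)
Bt = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i == j:
            Bt[i, i] = ((theta - 1) * B[i, i] + np.trace(B)) / (theta + m - 1)
        else:
            s1 = sum(B[i, k] + B[k, j] for k in range(m) if k not in (i, j))
            s2 = sum(B[l, k] for l in range(m) for k in range(m)
                     if l != i and k != j and k != l)
            Bt[i, j] = (theta**2 * B[i, j] + (theta - 1) * B[j, i]
                        + theta * s1 + s2) / Dn

# decomposition (6.3): I + c1 L_m + c2 T_m + c3 (ee^T - I)
L = B - np.eye(m)
T = np.zeros((m, m))
for i in range(m):
    for j in range(m):
        if i != j:
            T[i, j] = (4 - (i in (0, m - 1)) - (j in (0, m - 1))
                       - 2 * (abs(i - j) == 1))
e = np.ones((m, 1))
rhs = (np.eye(m) + (theta**2 + theta - 2) / Dn * L
       + b * (theta - 1) / Dn * T + 2 * b * (m - 1) / Dn * (e @ e.T - np.eye(m)))
print(np.max(np.abs(Bt - rhs)))                          # zero up to rounding
```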
If θ is a fixed constant greater than 1, then as m → ∞,

b(θ−1)/((θ+m−2)(θ+m−1)) ‖T_m‖ ≤ 4b(θ−1)m/m² → 0  (6.4)

and

(θ²+θ−2)/((θ+m−2)(θ+m−1)) ‖L_m‖ → 0.  (6.5)

Therefore, B_θ and (1 − 2b/m)I_m + (2b/m)ee^T are asymptotically equivalent sequences (see Chapter 2, [9]), and by Theorem 2.6 in [9],

lim_{m→∞} µ_m^{B_θ} = lim_{m→∞} µ_m^{(1−2b/m)I_m + (2b/m)ee^T},

the latter matrix being a rank-one perturbation of the identity matrix. Therefore,

lim_{m→∞} µ_m^{B_θ} = δ_1,

where δ_t is the Dirac measure at the point t.

A more interesting situation happens when θ = βm for a fixed constant β. In this case, up to terms that vanish in the limit,

B_θ = I_m + β²/(β+1)² L_m + bβ/(β+1)² (1/m) T_m + 2b/(β+1)² (1/m) (ee^T − I_m).

Moreover,

(1/m) Tr( bβ/(β+1)² (1/m) T_m )² ≤ 16b²β² m²/m³ → 0

and

(1/m) Tr( 2b/(β+1)² (1/m) (ee^T − I_m) )² ≤ 4b² m²/m³ → 0

as m → ∞. By Lemma 2.3 in [1], the Levy metric of the empirical spectral distributions of two m × m Hermitian matrices A, B satisfies

L(µ_m^A, µ_m^B)³ ≤ (1/m) Tr[(A − B)(A − B)*].

It is known (see Theorem 6, Section 4.3, [8]) that the distribution functions µ_m converge weakly to µ if and only if the Levy metric L(µ_m, µ) → 0. Therefore,

lim_{m→∞} µ_m^{B_θ} = lim_{m→∞} µ_m^{I_m + (β/(β+1))² L_m}.

The matrix

I_m + β²/(β+1)² L_m = β²/(β+1)² B + (1 − β²/(β+1)²) I_m

is still a tridiagonal Toeplitz matrix, with symbol

a(e^{iθ}) = 1 + 2b β²/(β+1)² cos θ.

Hence the limiting eigenvalue distribution is supported on the interval [1 − 2b β²/(β+1)², 1 + 2b β²/(β+1)²]. The Figure below shows the estimated density function for the spectrum as β changes.
Figure 3. This Figure shows the density functions of the empirical spectral distribution of a 300 × 300 tridiagonal Toeplitz matrix B with b = 0.3 and those of E(M_σ B M_σ*) for different θ's.

For the power Toeplitz matrix A_α, the symbol is

a(e^{iθ}) = 1 + α/(e^{iθ} − α) + α/(e^{−iθ} − α) = 1 + 2α(cos θ − α) / ((cos θ − α)² + sin²θ).

Thus the spectrum of A_α as m tends to infinity is supported on [(1−α)/(1+α), (1+α)/(1−α)].

By Theorem 4.1, one can get

A_θ := E(M_σ A_α M_σ*)
     = I_m + (θ²+θ−1)/((θ+m−2)(θ+m−1)) (A_α − I_m)
       + α(α^m − mα + m − 1 − (θ−1)(α−1)) / ((1−α)²(θ+m−2)(θ+m−1)) (ee^T − I_m)
       − (θ−1)/((1−α)(θ+m−2)(θ+m−1)) J_m,  (6.6)
Figure 4. This Figure shows the estimated density functions of the empirical spectral distribution of a 300 × 300 power Toeplitz matrix A_{0.5} and those of E(M_σ A_{0.5} M_σ*) for different θ's.

Here J_m = (l_{ij}) with diagonal entries l_{ii} = 0 and non-diagonal entries l_{ij} = α^i + α^j + α^{m+1−i} + α^{m+1−j}. In the case θ = βm,

(1/m) Tr( (θ−1)/((1−α)(θ+m−2)(θ+m−1)) J_m )² ≤ (1/m³)(1/(1−α)²) Σ_{i,j=1}^m (α^i + α^j + α^{m+1−i} + α^{m+1−j})² ≤ 16/((1−α)² m) = o(1).  (6.7)

Similarly, we can show that

lim_{m→∞} µ_m^{A_θ} = lim_{m→∞} µ_m^{I_m + β²/(β+1)² (A_α − I_m)}.

For the matrix

I_m + β²/(β+1)² (A_α − I_m) = β²/(β+1)² A_α + (1 − β²/(β+1)²) I_m,

one has

a(e^{iθ}) = 1 + β²/(β+1)² ( α/(e^{iθ} − α) + α/(e^{−iθ} − α) ) = 1 + β²/(β+1)² · 2α(cos θ − α)/((cos θ − α)² + sin²θ).

Thus the limiting spectrum is supported on the interval

[ 1 − 2α β²/((β+1)²(1+α)),  1 + 2α β²/((β+1)²(1−α)) ].
6.5. Simulations. In this Section, we present some simulations to test the performance of our estimators. Let A_α be an m × m Toeplitz covariance matrix with entries a_{i,j} = α^{|i−j|}. Assume that we take n measurements and we want to recover Σ = A_α to the best of our ability. After performing the measurements we construct the sample covariance matrix K and proceed to recover A_α in terms of the operators invcov_p(K) and E(M_σ K M_σ*). First we look at the eigenvalue distributions under the invcov_p and Ewens estimators. In Figure 5, we can observe a realization of this experiment with α = 0.5, m = 200 and n = 150. We see that the eigenvalues of A_α range roughly from 1/3 to 3. For the sample covariance matrix K, 50 eigenvalues are precisely zero. Both the inverse of invcov_p and the Ewens estimator give non-zero eigenvalues. The eigenvalues
[Figure 5 panels: true covariance matrix with m = 200 and α = 0.5; sample covariance with n = 150; inverse of invcov with optimum p (p = 45), MSE = 0.7420; Ewens method with optimum θ (θ = 261), MSE = 0.6607.]

Figure 5. Comparison of the eigenvalue distributions of the true covariance matrix, the sample covariance matrix, and the invcov estimator vs. the Ewens estimator.
under the inverse of invcov_p (p = 45) range from 0.4 to 2, and those under the Ewens estimate (with θ = 261) from 0.6 to 2.7. Similar results were observed for other parameter values. In Figures 6 and 7, we show the performance of the estimators for different values of p and θ. It was observed in [13] that the estimator invcov_p outperforms the more standard and classical estimator of diagonal loading with optimal loading parameters as in Ledoit and Wolf [11]; this was done by computing the Frobenius norm (MSE) ‖A_α − (p/m) invcov_p(K)^{-1}‖² for the different values of p and then computing ‖A_α − K_{LW}‖². The same type of experiments were performed on a variety of different scenarios as well. Let A_α, m, p, n, K and θ be as before and define the functions

f(m, n, α, p) = ‖A_α − (p/m) invcov_p(K)^{-1}‖²,
g(m, n, α, p) = ‖A_α^{-1} − (m/p) invcov_p(K)‖²,
F(m, n, α, θ) = ‖A_α − E(M_σ K M_σ*)‖².
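The experiment can be sketched as follows (our reconstruction; the specific MSE values quoted in the text come from the authors' runs, and the sizes below are scaled down for speed). The function ewens_estimate implements the closed form of Theorem 4.1, so F(m, n, α, θ) can be evaluated without sampling permutations.

```python
import numpy as np

def ewens_estimate(K, theta):
    """Closed form of Theorem 4.1 for E(M_sigma K M_sigma^*)."""
    m = K.shape[0]
    Dn = (theta + m - 2) * (theta + m - 1)
    row = K.sum(axis=1) - np.diag(K)
    col = K.sum(axis=0) - np.diag(K)
    tot = K.sum() - np.trace(K)
    out = np.empty_like(K)
    for i in range(m):
        for j in range(m):
            if i == j:
                out[i, i] = ((theta - 1) * K[i, i] + np.trace(K)) / (theta + m - 1)
            else:
                s1 = row[i] - K[i, j] + col[j] - K[i, j]
                s2 = tot - row[i] - col[j] + K[i, j]
                out[i, j] = (theta**2 * K[i, j] + (theta - 1) * K[j, i]
                             + theta * s1 + s2) / Dn
    return out

rng = np.random.default_rng(5)
m, n, alpha = 60, 45, 0.5
A = alpha ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
M = np.linalg.cholesky(A) @ rng.standard_normal((m, n))
K = M @ M.T / n                          # singular: n < m

for theta in (1.0, 20.0, 60.0):          # F(m, n, alpha, theta) for a few thetas
    print(theta, np.linalg.norm(A - ewens_estimate(K, theta)) ** 2)
```

For θ = 1 the output agrees with the closed form of Remark 4.2, which provides an exact correctness check of the implementation.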
Figure 6. The functions f and g for m = 200, n = 150 and α = 0.5 as functions of p.
Figure 7. The function F for m = 200, n = 150 and α = 0.5 as a function of θ.

We can observe how the Ewens estimator outperforms the invcov_p estimator for the optimum values of p and θ. The next Figures show the behavior of the previous functions for different parameter values α, m, n, p and θ.

References

[1] Z.D. Bai. Methodologies in spectral analysis of large-dimensional random matrices, a review. Statist. Sinica, 9(3):611–677, 1999.
[2] V. Betz, D. Ueltschi and Y. Velenik. Random permutations with cycle weights. Ann. Appl. Probab., 21(1):312–331, 2011.
[3] A. Böttcher and S.M. Grudsky. Spectral Properties of Banded Toeplitz Matrices. Society for Industrial and Applied Mathematics, 2005.
[4] A. Böttcher and B. Silbermann. Introduction to Large Truncated Toeplitz Matrices. Springer, 1999.
[5] N.R. Draper and H. Smith. Applied Regression Analysis (Wiley Series in Probability and Statistics). Wiley-Interscience, 1998.
[6] N. Ercolani and D. Ueltschi. Cycle structure of random permutations with cycle weights, 2011.
[7] W. Fulton and J. Harris. Representation Theory. Springer, 1991.
Figure 8. f(m, n, α, p) = ‖A_α − (p/m) invcov_p(K)^{-1}‖²
Figure 9. g(m, n, α, p) = ‖A_α^{-1} − (m/p) invcov_p(K)‖²
Figure 10. F (m, n, α, θ) = kAα − E(Mσ KMσ∗ )k2
[8] J. Galambos. Advanced Probability Theory, volume 10. CRC Press, 1995.
[9] R.M. Gray. Toeplitz and Circulant Matrices: A Review. Information Systems Laboratory, Stanford University, 1971.
[10] T. Kurmayya and K.C. Sivakumar. Moore–Penrose inverse of a Gram matrix and its nonnegativity. Journal of Optimization Theory and Applications, 139(1):201–207, 2008.
[11] O. Ledoit and M. Wolf. Some hypothesis tests for the covariance matrix when the dimension is large compared to the sample size. Annals of Statistics, pages 1081–1102, 2002.
[12] I. Macdonald. Symmetric Functions and Hall Polynomials. Clarendon Press, Oxford University Press, New York, 1995.
[13] T. Marzetta, G. Tucci and S. Simon. A random matrix-theoretic approach to handling singular covariance estimates. IEEE Transactions on Information Theory, 57(9):6256–6271, 2011.
[14] X. Mestre. Improved estimation of eigenvalues and eigenvectors of covariance matrices using their sample estimates. IEEE Transactions on Information Theory, 54(11):5113–5129, 2008.
[15] X. Mestre and M.A. Lagunas. Diagonal loading for finite sample size beamforming: an asymptotic approach. In Robust Adaptive Beamforming, pages 201–257, 2006.
[16] R. Muirhead. Aspects of Multivariate Statistical Theory. John Wiley & Sons, New York, 1982.
[17] C.D. Richmond, R. Rao Nadakuditi and A. Edelman. Asymptotic mean squared error performance of diagonally loaded Capon-MVDR processors. In Conference Record of the Thirty-Ninth Asilomar Conference on Signals, Systems and Computers, pages 1711–1716, 2005.
[18] B. Sagan. The Symmetric Group: Representations, Combinatorial Algorithms, and Symmetric Functions. Springer, 2nd edition, 2010.
[19] R.P. Stanley. Enumerative Combinatorics, Volume 2. Cambridge University Press, Cambridge, 1999.
[20] M.A.G. Viana. The covariance structure of random permutation matrices. In Algebraic Methods in Statistics and Probability: AMS Special Session on Algebraic Methods and Statistics, April 8–9, 2000, University of Notre Dame, Notre Dame, Indiana, 287:303, 2001.

Gabriel H. Tucci is with Bell Labs, Alcatel-Lucent, 600 Mountain Ave, Murray Hill, NJ 07974.
E-mail address: [email protected]

Ke Wang is with the Math Department at Rutgers University, Busch Campus, Piscataway, NJ.
E-mail address: [email protected]