TECHNICAL REPORT

Approximate Distributions of Quadratic Forms in High-Dimensional Repeated-Measures Designs

Edgar Brunner, Benjamin Becker, and Carola Werner
Department Medizinische Statistik, Georg-August-Universität Göttingen

1 Introduction

Repeated measures are data which are observed repeatedly on the same subjects. Such data appear in many medical or biological trials and are called high-dimensional if the dimension d (i.e., the number of repeated measures per subject) is larger, or even much larger, than the number n of independent subjects on which they are observed. Most of the classical procedures require that n > d; if this is not the case, they break down or cannot even be computed. For example, Wald-type statistics can no longer be computed since they require an inverse of the sample covariance matrix, which becomes singular if n < d. The ANOVA-type statistic (ATS), first mentioned by Box (1954) and further developed by Geisser and Greenhouse (1958) and Greenhouse and Geisser (1959), can still be computed. It may be noted, however, that a test based on this statistic using a plug-in estimator of the sample covariance matrix becomes conservative if n < d.

Meanwhile a plethora of papers consider procedures for high-dimensional data. The ideas underlying these procedures may be classified as

(1) approximations for fixed n and fixed (but arbitrarily large) d, assuming a multivariate normal distribution but admitting any type of covariance matrix, so-called unstructured covariance matrices;

(2) procedures assuming n, d → ∞ while d/n → κ ∈ (0, 1), with or without the assumption of a multivariate normal distribution;

(3) procedures assuming n fixed while d → ∞, with different assumptions on the structure of the covariance matrix and on the underlying distribution of the repeated measures.

In this technical report we focus on the first idea, i.e., we assume a multivariate normal distribution and provide approximations based on the ANOVA-type statistic considered by Box (1954). We note that it is our intention to provide approximations to the distribution of the ATS whose quality is uniform with respect to the dimension d.

First draft (German): October 2004, update (English): November 2009


2 The ANOVA-Type Statistic

2.1 Box-Approximation

We consider the simple repeated measures model and assume that

\[
X_k = (X_{k1}, \ldots, X_{kd})' \sim N(\mu, V), \quad k = 1, \ldots, n, \tag{2.1}
\]

are independent vectors representing the subjects. Let H denote an appropriate contrast matrix for testing the hypothesis H_0: Hμ = 0, and let T = H'(HH')⁻H denote the unique projection matrix derived from H. We allow for any suitable factorial structure of the repeated measures by partitioning the index s = 1, ..., d into sub-indices, for example s_1 = 1, ..., b and s_2 = 1, ..., t such that d = bt. One special example may be that the d repeated measures are obtained at t time points under b consecutive treatments or conditions.

We consider the projection Y_k = TX_k ~ N(Tμ, S), where S = TVT and Tμ = 0 under the hypothesis. To test the hypothesis H_0: Hμ = 0 ⟺ Tμ = 0, we consider the statistic

\[
Q_n = n\,\bar{Y}_{\cdot}'\,\bar{Y}_{\cdot} = n\,\bar{X}_{\cdot}'\,T\,\bar{X}_{\cdot}\,,
\qquad
\bar{Y}_{\cdot} = \frac{1}{n}\sum_{k=1}^{n} Y_k, \quad
\bar{X}_{\cdot} = \frac{1}{n}\sum_{k=1}^{n} X_k ,
\]

where \(\bar{Y}_{\cdot}\) and \(\bar{X}_{\cdot}\) denote the means. Note that \(\sqrt{n}\,\bar{Y}_{\cdot} \sim N(0, S)\) under H_0. Then, using the representation theorem of quadratic forms (see Corollary 3.3 below), it follows under H_0: Tμ = 0 that

\[
Q_n = n\,\bar{Y}_{\cdot}'\,\bar{Y}_{\cdot} = n\,\bar{X}_{\cdot}'\,T\,\bar{X}_{\cdot} = \sum_{i=1}^{d} \lambda_i C_i , \tag{2.2}
\]

where the C_i ~ χ²_1 are independent and the λ_i are the eigenvalues of S. Based on an idea of Patnaik (1949), Box (1954) suggested approximating the distribution of Σ_{i=1}^d λ_i C_i by a scaled χ²-distribution g · χ²_f such that the first two moments of Σ_{i=1}^d λ_i C_i and g · χ²_f coincide:

\[
g f = E\big(g\,\chi^2_f\big) = E\Big(\sum_{i=1}^{d} \lambda_i C_i\Big) = \sum_{i=1}^{d} \lambda_i = \operatorname{tr}(S),
\qquad
2 g^2 f = \operatorname{Var}\big(g\,\chi^2_f\big) = \operatorname{Var}\Big(\sum_{i=1}^{d} \lambda_i C_i\Big) = 2\sum_{i=1}^{d} \lambda_i^2 = 2\operatorname{tr}(S^2).
\]

Straightforward computation leads to

\[
\frac{Q_n}{\operatorname{tr}(S)} = \frac{n\,\bar{X}_{\cdot}'\,T\,\bar{X}_{\cdot}}{\operatorname{tr}(TV)}
\;\overset{\cdot}{\sim}\; \chi^2_f / f = F(f, \infty), \tag{2.3}
\]

where f = [tr(S)]² / tr(S²). In practice, however, S is unknown and must be estimated from the data. It is important to note that two quantities are involved in the estimation of S, namely the sample size n and the number d of repeated measures. This will be considered in detail in the next subsection.
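For illustration, the following sketch (not part of the original report; the centering contrast, the AR(1) covariance, and all numerical values are our own assumptions) computes Box's f from a known covariance matrix and evaluates the approximation (2.3) for one simulated data set.

```python
# A numerical sketch of the Box approximation (2.3). The centering contrast,
# the AR(1) covariance, and all parameter values are illustrative assumptions.
import numpy as np
from scipy import stats

d, n = 40, 10
T = np.eye(d) - np.ones((d, d)) / d          # projection for H0: mu_1 = ... = mu_d
V = 0.6 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))  # AR(1) covariance
S = T @ V @ T

f = np.trace(S) ** 2 / np.trace(S @ S)       # Box's f = [tr(S)]^2 / tr(S^2)

rng = np.random.default_rng(0)
X = rng.multivariate_normal(np.zeros(d), V, size=n)   # data under H0
Xbar = X.mean(axis=0)
Qn = n * Xbar @ T @ Xbar
Fstat = Qn / np.trace(S)                     # tr(S) known here; estimated in Section 3
p_value = stats.chi2.sf(f * Fstat, df=f)     # Qn/tr(S) is approx. chi2_f / f under H0
print(f"f = {f:.2f}, Qn/tr(S) = {Fstat:.3f}, p = {p_value:.3f}")
```

Note that d = 40 > n = 10 causes no difficulty here, since no inverse of a sample covariance matrix is needed.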

2.2 Consistency of Estimators

We consider an array of estimators θ̂_{n,d} of a quantity θ_{n,d} which becomes larger with increasing d. Thus we consider the ratio θ̂_{n,d}/θ_{n,d}. Then it follows from Chebyshev's inequality that

\[
P\left( \left| \widehat{\theta}_{n,d}/\theta_{n,d} - 1 \right| > \varepsilon \right)
\;\le\; \frac{1}{\varepsilon^2}\, E\left( \widehat{\theta}_{n,d}/\theta_{n,d} - 1 \right)^2
\;=\; \frac{1}{\varepsilon^2}\, E\left( \widehat{\theta}_{n,d}/\theta_{n,d} - E\big(\widehat{\theta}_{n,d}/\theta_{n,d}\big) + E\big(\widehat{\theta}_{n,d}/\theta_{n,d}\big) - 1 \right)^2
\;=\; \frac{1}{\varepsilon^2} \left[ \operatorname{Var}\big(\widehat{\theta}_{n,d}/\theta_{n,d}\big) + \left( E\big(\widehat{\theta}_{n,d}/\theta_{n,d}\big) - 1 \right)^2 \right].
\]

We note that the right-hand side is uniformly bounded in d if E(θ̂_{n,d}) = θ_{n,d} and Var(θ̂_{n,d}/θ_{n,d}) ≤ K_n, where K_n denotes a sequence converging to 0 which is uniformly bounded with respect to d. This leads to the following definition.

DEFINITION 2.1 An array of estimators θ̂_{n,d} of a quantity θ_{n,d} is called dimensionally stable if

1. |E(θ̂_{n,d}/θ_{n,d}) − 1| ≤ D_n,
2. Var(θ̂_{n,d}/θ_{n,d}) ≤ K_n,

where D_n → 0 and K_n → 0 are uniformly bounded with respect to d.

As an example, consider the sample covariance matrix

\[
\widehat{S}_n = \frac{1}{n-1} \sum_{k=1}^{n} (Y_k - \bar{Y}_{\cdot})(Y_k - \bar{Y}_{\cdot})' ,
\]

where \(\bar{Y}_{\cdot} = \frac{1}{n}\sum_{k=1}^{n} Y_k\) denotes the mean vector of the observations Y_k, k = 1, ..., n. Then it is easily seen that the plug-in estimator tr(Ŝ_n²) of tr(S²) is not dimensionally stable, since it is biased and the bias increases with d. Therefore it is our aim to derive unbiased estimators of the traces involved in (2.3) and then show that their variances fulfill requirement (2) of Definition 2.1.

We note that Bai and Saranadasa (1996) have derived an unbiased estimator of tr(S²) using the sample covariance matrix Ŝ_n and have shown its ratio consistency if n, d → ∞ while d/n → κ ∈ (0, 1). As we intend, however, to derive an approximation for fixed n and fixed d, we will derive other unbiased estimators of the quantities tr(S), [tr(S)]², and tr(S²) which are dimensionally stable. This will be worked out in the next section.
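A small simulation sketch illustrates the lack of dimensional stability of the plug-in estimator; the choices V = I, T = I, and n = 10 are our own illustrative assumptions. For V = I one can show that E[tr(Ŝ_n²)] = d + (d + d²)/(n − 1), so the relative bias grows linearly in d for fixed n.

```python
# Simulation sketch (illustrative assumptions: V = I, T = I, n = 10) showing
# that tr(Sn_hat^2) overestimates tr(S^2) more and more strongly as d grows.
import numpy as np

rng = np.random.default_rng(1)
n = 10
for d in (5, 50, 500):
    reps = 100
    ratios = []
    for _ in range(reps):
        Y = rng.standard_normal((n, d))      # N(0, I_d), so tr(S^2) = d
        Sn = np.cov(Y, rowvar=False)         # sample covariance, divisor n - 1
        ratios.append((Sn * Sn).sum() / d)   # tr(Sn^2)/tr(S^2); Sn is symmetric
    print(f"d = {d:3d}: mean tr(Sn^2)/tr(S^2) = {np.mean(ratios):6.2f}")
```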

3 Unbiased Estimators

3.1 Derivation of the estimators

In the sequel we derive unbiased estimators of [tr(TV)]² and tr[(TV)²]. Numerator and denominator of f = [tr(S)]² / tr(S²) are estimated separately and consistently, and the ratio of the two estimators is then taken as an estimator of f. The bias of this ratio is obtained from a Taylor expansion.

For model (2.1) we provide procedures for testing the linear hypothesis H_0: Hμ = 0, where H denotes a suitable hypothesis matrix. As we need the hypothesis matrix to be symmetric and idempotent, we equivalently work with the unique projection matrix T = H'(HH')⁻H and note that Hμ = 0 ⟺ Tμ = 0. For convenience, let Y_k = TX_k denote the projection of the observation vectors and let S_k = Y_k Y_k'. Then, under H_0: Tμ = 0, it follows that E(Y_k) = 0 and it is easily seen that

\[
\widetilde{S}_n = \frac{1}{n} \sum_{k=1}^{n} S_k \tag{3.4}
\]

is an unbiased estimator of S = TVT under H_0. Using the invariance of the trace under cyclic permutations, a natural estimator of tr(TV) = tr(S) is given by

\[
B_0 = \frac{1}{n} \sum_{k=1}^{n} A_k = \bar{A}_{\cdot} , \tag{3.5}
\]

where A_k = Y_k' Y_k. The estimator B_0², however, is a biased estimator of [tr(TV)]², and the bias τ² follows from

\[
E\left( [\operatorname{tr}(\widetilde{S}_n)]^2 \right)
= \operatorname{Var}\big(\operatorname{tr}(\widetilde{S}_n)\big) + \left[ E\big(\operatorname{tr}(\widetilde{S}_n)\big) \right]^2
= \underbrace{\operatorname{Var}\big(\operatorname{tr}(\widetilde{S}_n)\big)}_{=: \ \tau^2} + [\operatorname{tr}(S)]^2 . \tag{3.6}
\]

Since, by independence, \(\tau^2 = \operatorname{Var}\big(\operatorname{tr}(\widetilde{S}_n)\big) = \operatorname{Var}\big(\frac{1}{n}\sum_{k=1}^{n} A_k\big) = \frac{1}{n}\operatorname{Var}(A_1)\), it follows that

\[
\widehat{\tau}^2 = \frac{1}{n(n-1)} \sum_{k=1}^{n} (A_k - \bar{A}_{\cdot})^2
\]

is an unbiased estimator of τ². Combining this result with (3.6), one obtains an unbiased estimator B_1 of [tr(TV)]², namely

\[
B_1 = [\operatorname{tr}(\widetilde{S}_n)]^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} (A_k - \bar{A}_{\cdot})^2 . \tag{3.7}
\]

To derive an unbiased estimator B_2 of tr[(TV)²], a simple result from matrix algebra is used. Let A, B ∈ R^{d×d}; then tr(AB') = 1_d'(A#B)1_d, where A#B denotes the Hadamard product of A and B. Using this result and (3.4), the bias of tr(S̃_n²) can be obtained from

\[
E[\operatorname{tr}(\widetilde{S}_n^2)]
= \mathbf{1}_d'\, E[\widetilde{S}_n \# \widetilde{S}_n]\, \mathbf{1}_d
= \mathbf{1}_d' \left[ \frac{1}{n^2} \sum_{k=1}^{n}\sum_{k'=1}^{n} E(S_k \# S_{k'}) \right] \mathbf{1}_d
= \frac{n-1}{n}\, \mathbf{1}_d' (S \# S)\, \mathbf{1}_d
+ \underbrace{\frac{1}{n^2} \sum_{k=1}^{n} \mathbf{1}_d'\, E(S_k \# S_k)\, \mathbf{1}_d}_{=: \ \pi^2} .
\]

To estimate the unknown quantity π², the above-quoted result from matrix algebra is used again, and one obtains 1_d'(S_k # S_k)1_d = tr(S_k²). Finally, this leads to the desired unbiased estimator B_2 of tr[(TV)²], namely

\[
B_2 = \operatorname{tr}\left( \frac{n}{n-1}\, \widetilde{S}_n^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} S_k^2 \right). \tag{3.8}
\]

To reduce memory space and computation time (which is important for simulations), the traces of the d×d matrices S_k, S_k², and S̃_n² are computed from simple bilinear and quadratic forms. This is summarized in the following lemma.

LEMMA 3.1 Let X_k = (X_{k1}, ..., X_{kd})', k = 1, ..., n, be i.i.d. random vectors, let T denote a projection matrix, and let Y_k = TX_k. Further, let A_{kl} = X_k' T X_l denote a symmetric bilinear form in X_k and X_l, and for k = l let A_k := A_{kk} denote a quadratic form. Then

\[
1. \quad B_1 = [\operatorname{tr}(\widetilde{S}_n)]^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} (A_k - \bar{A}_{\cdot})^2
= \frac{1}{n(n-1)} \underbrace{\sum_{k=1}^{n} \sum_{l=1}^{n}}_{k \ne l} A_k A_l ,
\]
\[
2. \quad B_2 = \operatorname{tr}\left( \frac{n}{n-1}\, \widetilde{S}_n^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} S_k^2 \right)
= \frac{1}{n(n-1)} \underbrace{\sum_{k=1}^{n} \sum_{l=1}^{n}}_{k \ne l} A_{kl}^2 .
\]

PROOF:

1. First note that Y_k = TX_k and S_k = Y_k Y_k'. Thus, using the invariance of the trace under cyclic permutations,

\[
\operatorname{tr}(\widetilde{S}_n)
= \operatorname{tr}\Big( \frac{1}{n} \sum_{k=1}^{n} S_k \Big)
= \frac{1}{n} \sum_{k=1}^{n} \operatorname{tr}(Y_k Y_k')
= \frac{1}{n} \sum_{k=1}^{n} Y_k' Y_k
= \frac{1}{n} \sum_{k=1}^{n} \underbrace{X_k' T T X_k}_{= A_k}
= \bar{A}_{\cdot} .
\]

Replacing tr(S̃_n) with \(\bar{A}_{\cdot}\) leads to

\[
B_1 = \bar{A}_{\cdot}^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} (A_k - \bar{A}_{\cdot})^2
= \frac{n}{n-1}\, \bar{A}_{\cdot}^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} A_k^2
= \frac{1}{n(n-1)} \Bigg[ \Big( \sum_{k=1}^{n} A_k \Big)^2 - \sum_{k=1}^{n} A_k^2 \Bigg]
= \frac{1}{n(n-1)} \underbrace{\sum_{k=1}^{n} \sum_{l=1}^{n}}_{k \ne l} A_k A_l .
\]

2. Furthermore,

\[
\frac{n}{n-1} \Big( \frac{1}{n} \sum_{k=1}^{n} S_k \Big)^2 - \frac{1}{n(n-1)} \sum_{k=1}^{n} S_k^2
= \frac{1}{n(n-1)} \Bigg[ \sum_{k=1}^{n} \sum_{l=1}^{n} S_k S_l - \sum_{k=1}^{n} S_k^2 \Bigg]
= \frac{1}{n(n-1)} \sum_{k \ne l} S_k S_l .
\]

Then, taking traces and observing the invariance under cyclic permutations, one obtains for k ≠ l

\[
\operatorname{tr}(S_k S_l) = \operatorname{tr}(Y_k Y_k' Y_l Y_l') = Y_l' Y_k Y_k' Y_l = A_{lk} A_{kl} = A_{kl}^2
\]

since A_{kl} = A_{lk}. This leads to the unbiased estimator B_2 of tr(S²),

\[
B_2 = \frac{1}{n(n-1)} \underbrace{\sum_{k=1}^{n} \sum_{l=1}^{n}}_{k \ne l} A_{kl}^2 . \qquad \Box
\]

Combining all the results above, one obtains the new estimator f̃ of f = [tr(S)]² / tr(S²),

\[
\widetilde{f} = \frac{B_1}{B_2} = \frac{\displaystyle\sum_{k \ne l} A_k A_l}{\displaystyle\sum_{k \ne l} A_{kl}^2} . \tag{3.9}
\]
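The identities of Lemma 3.1 translate directly into code. The following sketch (function and variable names are our own; the exchangeable covariance in the usage example is an illustrative assumption) computes B_0, B_1, B_2, and f̃ from (3.9) using only the n×n matrix of bilinear forms, so no d×d sample covariance matrix ever has to be formed.

```python
# Sketch of the estimators B0, B1, B2 and f~ via Lemma 3.1 and (3.9).
# All names and the example setup are our own illustrative choices.
import numpy as np

def trace_estimators(X, T):
    """X: (n, d) data matrix, T: (d, d) projection matrix."""
    n = X.shape[0]
    A = X @ T @ X.T                      # A[k, l] = X_k' T X_l, all forms at once
    Ak = np.diag(A)                      # quadratic forms A_k
    off = ~np.eye(n, dtype=bool)         # mask selecting pairs k != l
    B0 = Ak.mean()                       # unbiased for tr(TV) under H0, see (3.5)
    B1 = np.outer(Ak, Ak)[off].sum() / (n * (n - 1))   # unbiased for [tr(TV)]^2
    B2 = (A[off] ** 2).sum() / (n * (n - 1))           # unbiased for tr[(TV)^2]
    return B0, B1, B2, B1 / B2

# Usage under H0 (mu = 0), with an exchangeable V chosen for illustration:
rng = np.random.default_rng(2)
n, d = 10, 100
V = 0.5 * np.eye(d) + 0.5 * np.ones((d, d))
T = np.eye(d) - np.ones((d, d)) / d
X = rng.multivariate_normal(np.zeros(d), V, size=n)
B0, B1, B2, f_tilde = trace_estimators(X, T)
S = T @ V @ T
print(f"B0 = {B0:.1f} (tr(S) = {np.trace(S):.1f}), "
      f"f~ = {f_tilde:.1f} (f = {np.trace(S)**2 / np.trace(S @ S):.1f})")
```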

3.2 Quadratic and Bilinear Forms

In order to derive the properties of the estimators B_0 in (3.5), B_1 in (3.7), and B_2 in (3.8), some results on bilinear and quadratic forms are needed. Let T be a symmetric matrix. Then the quantity A_{kl} = X_k' T X_l is called a bilinear form if k ≠ l, and if k = l the quantity A_k = X_k' T X_k is called a quadratic form. First we consider the representation of a quadratic form in a random vector X.

LEMMA 3.2 (REPRESENTATION OF A QUADRATIC FORM) Let X = (X_1, ..., X_d)' denote a random vector with E(X) = μ and Cov(X) = V > 0, and let T be a symmetric matrix. Then,

\[
A = X' T X =
\begin{cases}
\displaystyle \sum_{i=1}^{d} \lambda_i (U_i + b_i)^2 & \text{if } E(X) = \mu \ne 0 , \\[2ex]
\displaystyle \sum_{i=1}^{d} \lambda_i U_i^2 & \text{if } E(X) = 0 ,
\end{cases}
\]

where the λ_i are the eigenvalues of V^{1/2} T V^{1/2}. Moreover, U = (U_1, ..., U_d)' is a random vector with E(U) = 0 and Cov(U) = I, and (b_1, ..., b_d)' = P V^{-1/2} μ, where P, with PP' = I, denotes the orthogonal matrix diagonalizing V^{1/2} T V^{1/2}.

PROOF: see MATHAI and PROVOST (1992), p. 28f.

If X has a multivariate normal distribution, then the distribution of a quadratic form is given in the following corollary.

COROLLARY 3.3 (DISTRIBUTION OF A QUADRATIC FORM) Let X = (X_1, ..., X_d)' ~ N(0, V) with |V| ≠ 0 and V = V', and let T denote a symmetric matrix. Then,

\[
A = X' T X = \sum_{i=1}^{d} \lambda_i C_i ,
\]

where the C_i ~ χ²_1 are i.i.d. random variables and the λ_i are the eigenvalues of TV, i = 1, ..., d.

PROOF: Note that |V| ≠ 0 and V = V' imply that V^{1/2} exists and that V^{-1/2} is symmetric and regular with V^{1/2} V^{-1/2} = I. Hence

\[
X' T X = X' V^{-1/2}\, V^{1/2} T V^{1/2}\, V^{-1/2} X
= X' V^{-1/2} P'\, P V^{1/2} T V^{1/2} P'\, P V^{-1/2} X .
\]

Since V^{1/2} and T are symmetric, V^{1/2} T V^{1/2} is also symmetric. Thus, by the spectral decomposition theorem, there exists an orthogonal matrix P such that P V^{1/2} T V^{1/2} P' = diag{λ'_1, ..., λ'_d} = Δ, where the λ'_i are the eigenvalues of V^{1/2} T V^{1/2}. Thus

\[
X' T X = (P V^{-1/2} X)'\, \underbrace{P V^{1/2} T V^{1/2} P'}_{= \Delta}\, (P V^{-1/2} X)
= Z' \Delta Z = \sum_{i=1}^{d} \lambda'_i Z_i^2 ,
\]

where Z = P V^{-1/2} X ~ N(0, P V^{-1/2} V V^{-1/2} P') = N(0, I_d). Thus it remains to show that λ_i = λ'_i. To this end, let λ denote an arbitrary eigenvalue of V^{1/2} T V^{1/2}. Then,

\[
0 = |V^{1/2} T V^{1/2} - \lambda I|
= |V^{-1/2}|\, |V^{1/2} T V^{1/2} - \lambda I|\, |V^{1/2}|
= |V^{-1/2}\, V^{1/2} T V^{1/2}\, V^{1/2} - \lambda V^{-1/2} I V^{1/2}|
= |T V - \lambda I| .
\]

Thus, it follows that λ is also an eigenvalue of TV. □
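The last step of the proof, that TV and V^{1/2}TV^{1/2} share the same spectrum, can be checked numerically; the random V and the centering projection below are illustrative only.

```python
# Numerical check that lambda(V^{1/2} T V^{1/2}) = lambda(TV); illustrative setup.
import numpy as np

rng = np.random.default_rng(3)
d = 6
A0 = rng.standard_normal((d, d))
V = A0 @ A0.T + d * np.eye(d)            # symmetric positive definite V
T = np.eye(d) - np.ones((d, d)) / d      # a projection matrix
w, P = np.linalg.eigh(V)
Vh = P @ np.diag(np.sqrt(w)) @ P.T       # symmetric square root V^{1/2}
lam_sym = np.sort(np.linalg.eigvalsh(Vh @ T @ Vh))
lam_tv = np.sort(np.linalg.eigvals(T @ V).real)
print(np.allclose(lam_sym, lam_tv))      # True: TV has the same spectrum
```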

This corollary shows that a quadratic form is distributed as a sum of uncorrelated (in the case of a normal distribution, independent) χ²_1-distributed random variables. With this in mind, we derive the central moments of quadratic forms.

LEMMA 3.4 (MOMENTS OF A QUADRATIC FORM) Let X_k = (X_{k1}, ..., X_{kd})' ~ N(μ, V), k = 1, ..., n, be i.i.d. random vectors and let T denote a projection matrix. Then, for the quadratic form A_k = X_k' T X_k, it holds in general that

1. E(A_k) = tr(TV) + μ'Tμ,
2. Var(A_k) = 2 tr[(TV)²] + 4 μ'TVTμ.

Moreover, if Tμ = 0, then

3. E(A_k) = tr(TV) =: ν,
4. Var(A_k) = 2 tr[(TV)²] =: τ²,
5. for the pairs A_k A_l and A_r A_s (where k ≠ l and r ≠ s) it holds that

\[
\operatorname{Cov}(A_k A_l, A_r A_s) =
\begin{cases}
\tau^4 + 2\,\tau^2 \nu^2 & \text{if } (k,l) = (r,s) \text{ or } (k,l) = (s,r), \\
\tau^2 \nu^2 & \text{if } k = r,\ l \ne s \ \text{ or } \ k \ne r,\ l = s, \\
0 & \text{otherwise.}
\end{cases}
\]

PROOF:

1. This is the well-known Lancaster theorem.
2. See MATHAI and PROVOST (1992), p. 53.
3. Follows immediately from (1).
4. Follows immediately from (2).
5. Using the independence and identical distribution of the A_k,

\[
\operatorname{Var}(A_k A_l) = E(A_k A_l A_k A_l) - [E(A_k A_l)]^2
= [E(A_k^2)]^2 - [E(A_k)]^4
= (\tau^2 + \nu^2)^2 - \nu^4
= \tau^4 + 2\,\tau^2 \nu^2 ,
\]
\[
\operatorname{Cov}(A_k A_l, A_k A_r) = E(A_k A_l A_k A_r) - E(A_k A_l)\, E(A_k A_r)
= (\tau^2 + \nu^2)\,\nu^2 - \nu^4 = \tau^2 \nu^2 \quad (l \ne r),
\]
\[
\operatorname{Cov}(A_k A_l, A_r A_s) = \nu^4 - \nu^4 = 0 \quad \text{for disjoint pairs.} \qquad \Box
\]
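A quick Monte Carlo sketch of statements (3) and (4) of Lemma 3.4 (with μ = 0, so Tμ = 0 holds trivially; the covariance and all parameters are illustrative assumptions):

```python
# Monte Carlo check of Lemma 3.4 (3) and (4); illustrative setup, mu = 0.
import numpy as np

rng = np.random.default_rng(4)
d, reps = 8, 200_000
A0 = rng.standard_normal((d, d))
V = A0 @ A0.T + np.eye(d)                # symmetric positive definite V
T = np.eye(d) - np.ones((d, d)) / d      # projection matrix
X = rng.multivariate_normal(np.zeros(d), V, size=reps)
Ak = np.einsum('ki,ij,kj->k', X, T, X)   # quadratic forms A_k = X_k' T X_k
TV = T @ V
print(Ak.mean(), np.trace(TV))           # statement (3): E(A_k)   = tr(TV)
print(Ak.var(), 2 * np.trace(TV @ TV))   # statement (4): Var(A_k) = 2 tr[(TV)^2]
```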

Next we consider the representation, distribution, and moments of bilinear forms.

LEMMA 3.5 (REPRESENTATION OF A BILINEAR FORM) Let X = (X_1, ..., X_d)' and Y = (Y_1, ..., Y_d)' be independent, identically distributed random vectors satisfying E(X) = E(Y) = μ and Cov(X) = Cov(Y) = V > 0, and let T denote a symmetric matrix. Then,

\[
A = X' T Y =
\begin{cases}
\displaystyle \sum_{i=1}^{d} \lambda_i (U_i + b_i)(W_i + b_i) & \text{if } E(X) = E(Y) = \mu \ne 0 , \\[2ex]
\displaystyle \sum_{i=1}^{d} \lambda_i U_i W_i & \text{if } E(X) = E(Y) = 0 ,
\end{cases}
\]

where the λ_i are the eigenvalues of V^{1/2} T V^{1/2}. Moreover, U = (U_1, ..., U_d)' and W = (W_1, ..., W_d)' are random vectors with E(U) = E(W) = 0, Cov(U, W) = 0, and Cov(U) = Cov(W) = I, and (b_1, ..., b_d)' = P V^{-1/2} μ, where P, with PP' = I, denotes an orthogonal matrix.

PROOF: Let Z_X = V^{-1/2} X − V^{-1/2} μ and Z_Y = V^{-1/2} Y − V^{-1/2} μ. Then, by the spectral decomposition theorem, there exists an orthogonal matrix P, PP' = I, such that P V^{1/2} T V^{1/2} P' = diag{λ'_1, ..., λ'_d} = Δ. Set U = P Z_X, W = P Z_Y, and b = P V^{-1/2} μ. Then

\[
E(U) = E(W) = 0, \qquad \operatorname{Cov}(U) = \operatorname{Cov}(W) = I ,
\]

and it follows for the bilinear form A = X'TY that

\[
X' T Y = X' V^{-1/2}\, V^{1/2} T V^{1/2}\, V^{-1/2} Y
= (Z_X + V^{-1/2}\mu)'\, P'\, \underbrace{P V^{1/2} T V^{1/2} P'}_{= \Delta}\, P\, (Z_Y + V^{-1/2}\mu)
= (U + b)'\, \Delta\, (W + b) . \qquad \Box
\]

If X and Y have multivariate normal distributions, then the distribution of a bilinear form is given in the following corollary.

COROLLARY 3.6 (DISTRIBUTION OF A BILINEAR FORM) Let X = (X_1, ..., X_d)' and Y = (Y_1, ..., Y_d)' ~ N(0, V) be independent random vectors with |V| ≠ 0 and V = V', and let T denote a symmetric matrix. Then,

\[
A = X' T Y \sim \sum_{i=1}^{d} \lambda_i C_i D_i ,
\]

where the C_i, D_i ~ N(0, 1) are independent and the λ_i are the eigenvalues of TV.

PROOF: From |V| ≠ 0 and V = V' it follows that V^{1/2} and V^{-1/2} exist, are symmetric and regular, and satisfy V^{1/2} V^{-1/2} = I. Hence

\[
X' T Y = X' V^{-1/2}\, V^{1/2} T V^{1/2}\, V^{-1/2} Y
= X' V^{-1/2} P'\, P V^{1/2} T V^{1/2} P'\, P V^{-1/2} Y .
\]

Since V^{1/2} and T are symmetric, V^{1/2} T V^{1/2} is also symmetric. Thus, by the spectral decomposition theorem, there exists an orthogonal matrix P such that P V^{1/2} T V^{1/2} P' = diag{λ'_1, ..., λ'_d} = Δ, where the λ'_i are the eigenvalues of V^{1/2} T V^{1/2}. Thus

\[
X' T Y = (P V^{-1/2} X)'\, \underbrace{P V^{1/2} T V^{1/2} P'}_{= \Delta}\, (P V^{-1/2} Y)
= C' \Delta D = \sum_{i=1}^{d} \lambda'_i C_i D_i ,
\]

where

\[
C = P V^{-1/2} X \sim N(0, P V^{-1/2} V V^{-1/2} P') = N(0, I_d), \qquad
D = P V^{-1/2} Y \sim N(0, P V^{-1/2} V V^{-1/2} P') = N(0, I_d).
\]

Thus it remains to show that λ_i = λ'_i. This has already been shown in the proof of Corollary 3.3. □

By this corollary it is possible to derive the central moments of bilinear forms.

LEMMA 3.7 (MOMENTS OF A BILINEAR FORM) Let X_k = (X_{k1}, ..., X_{kd})' and X_l = (X_{l1}, ..., X_{ld})' ~ N(0, V) be i.i.d. for k ≠ l ∈ {1, ..., n}, and let T denote a projection matrix. Then,

1. E(A_{kl}) = 0,
2. Var(A_{kl}) = E(A_{kl}²) = tr[(TV)²],
3. E(A_{kl}⁴) = 6 tr[(TV)⁴] + 3 (tr[(TV)²])²,
4. E(A_{kl}² A_{kr}²) = 2 tr[(TV)⁴] + (tr[(TV)²])² for l ≠ r.

PROOF:

1. Using the independence of the C_i and D_i,

\[
E(A_{kl}) = E\Big( \sum_{i=1}^{d} \lambda_i C_i D_i \Big)
= \sum_{i=1}^{d} \lambda_i \underbrace{E(C_i)\, E(D_i)}_{= 0} = 0 .
\]

2. Similarly,

\[
\operatorname{Var}(A_{kl}) = E(A_{kl}^2)
= E\Big( \sum_{i=1}^{d} \lambda_i C_i D_i \sum_{j=1}^{d} \lambda_j C_j D_j \Big)
= \sum_{i=1}^{d} \lambda_i^2\, \underbrace{E(C_i^2)}_{=1}\, \underbrace{E(D_i^2)}_{=1}
= \sum_{i=1}^{d} \lambda_i^2 = \operatorname{tr}[(TV)^2] .
\]

3. Expanding the fourth power,

\[
E(A_{kl}^4) = E\Bigg[ \Big( \sum_{i=1}^{d} \lambda_i C_i D_i \Big)^4 \Bigg]
= \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{r=1}^{d} \sum_{s=1}^{d} \lambda_i \lambda_j \lambda_r \lambda_s\, E(C_i D_i C_j D_j C_r D_r C_s D_s)
\]
\[
= \sum_{i=1}^{d} \lambda_i^4\, \underbrace{E(C_i^4 D_i^4)}_{= 9}
+ 3 \sum_{i \ne j} \lambda_i^2 \lambda_j^2\, \underbrace{E(C_i^2 D_i^2 C_j^2 D_j^2)}_{= 1}
= 6 \sum_{i=1}^{d} \lambda_i^4 + 3 \sum_{i,j} \lambda_i^2 \lambda_j^2
= 6 \operatorname{tr}[(TV)^4] + 3 \left( \operatorname{tr}[(TV)^2] \right)^2 .
\]

4. Writing A_{kr} = Σ_i λ_i C_i E_i, where the E_i ~ N(0, 1) are independent of the D_i,

\[
E(A_{kl}^2 A_{kr}^2)
= E\Bigg[ \Big( \sum_{i=1}^{d} \lambda_i C_i D_i \Big)^2 \Big( \sum_{i=1}^{d} \lambda_i C_i E_i \Big)^2 \Bigg]
= \sum_{i=1}^{d} \sum_{j=1}^{d} \sum_{r=1}^{d} \sum_{s=1}^{d} \lambda_i \lambda_j \lambda_r \lambda_s\, E(C_i D_i C_j D_j C_r E_r C_s E_s)
\]
\[
= \sum_{i=1}^{d} \lambda_i^4\, \underbrace{E(C_i^4 D_i^2 E_i^2)}_{= 3}
+ \sum_{i \ne j} \lambda_i^2 \lambda_j^2\, \underbrace{E(C_i^2 D_i^2 C_j^2 E_j^2)}_{= 1}
= 2 \sum_{i=1}^{d} \lambda_i^4 + \sum_{i=1}^{d} \sum_{j=1}^{d} \lambda_i^2 \lambda_j^2
= 2 \operatorname{tr}[(TV)^4] + \left( \operatorname{tr}[(TV)^2] \right)^2 . \qquad \Box
\]
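Statements (2) and (3) of Lemma 3.7 can likewise be checked by simulation (all parameters below are illustrative assumptions):

```python
# Monte Carlo check of Lemma 3.7 (2) and (3); illustrative setup.
import numpy as np

rng = np.random.default_rng(5)
d, reps = 6, 400_000
A0 = rng.standard_normal((d, d))
V = A0 @ A0.T + np.eye(d)
T = np.eye(d) - np.ones((d, d)) / d
Xk = rng.multivariate_normal(np.zeros(d), V, size=reps)
Xl = rng.multivariate_normal(np.zeros(d), V, size=reps)
Akl = np.einsum('ki,ij,kj->k', Xk, T, Xl)     # bilinear forms A_kl = X_k' T X_l
TV = T @ V
t2 = np.trace(TV @ TV)
t4 = np.trace(np.linalg.matrix_power(TV, 4))
print((Akl ** 2).mean(), t2)                  # statement (2): E(A_kl^2)
print((Akl ** 4).mean(), 6 * t4 + 3 * t2**2)  # statement (3): E(A_kl^4)
```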

3.3 Consistency and dimensional stability of the estimators

In the following lemma, a regularity condition is given from which it follows that the new estimators are consistent and dimensionally stable.

LEMMA 3.8 Let λ_i (λ_i ≥ λ_0 > 0) denote the eigenvalues of TVT and let m := max_i λ_i < ∞. Then,

\[
1. \quad \lim_{d \to \infty} \frac{m}{\sum_{i=1}^{d} \lambda_i} = 0
\;\Longrightarrow\;
\lim_{d \to \infty} \frac{\operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2} = 0 ,
\]
\[
2. \quad \lim_{d \to \infty} \frac{m^2}{\sum_{i=1}^{d} \lambda_i^2} = 0
\;\Longrightarrow\;
\lim_{d \to \infty} \frac{\operatorname{tr}[(TV)^4]}{\left( \operatorname{tr}[(TV)^2] \right)^2} = 0 .
\]

PROOF:

\[
1. \quad \frac{\operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2}
= \frac{\sum_{i=1}^{d} \lambda_i^2}{\big( \sum_{i=1}^{d} \lambda_i \big)^2}
\le \frac{m \sum_{i=1}^{d} \lambda_i}{\big( \sum_{i=1}^{d} \lambda_i \big)^2}
= \frac{m}{\sum_{i=1}^{d} \lambda_i}
\;\xrightarrow{d \to \infty}\; 0 ,
\]
\[
2. \quad \frac{\operatorname{tr}[(TV)^4]}{\left( \operatorname{tr}[(TV)^2] \right)^2}
= \frac{\sum_{i=1}^{d} \lambda_i^4}{\big( \sum_{i=1}^{d} \lambda_i^2 \big)^2}
\le \frac{m^2 \sum_{i=1}^{d} \lambda_i^2}{\big( \sum_{i=1}^{d} \lambda_i^2 \big)^2}
= \frac{m^2}{\sum_{i=1}^{d} \lambda_i^2}
\;\xrightarrow{d \to \infty}\; 0 . \qquad \Box
\]

REMARK 3.1 The first condition follows from the second condition, since λ_i ≥ 0 and

\[
\frac{m^2}{\big( \sum_{i=1}^{d} \lambda_i \big)^2} \le \frac{m^2}{\sum_{i=1}^{d} \lambda_i^2} ,
\]

which follows from \(\big( \sum_{i=1}^{d} \lambda_i \big)^2 \ge \sum_{i=1}^{d} \lambda_i^2\).
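The conditions of Lemma 3.8 essentially rule out covariance structures in which a single eigenvalue dominates the trace. The following sketch contrasts a bounded spectrum with a spiked one (both eigenvalue sequences are illustrative assumptions, not taken from the report):

```python
# Sketch of the regularity condition of Lemma 3.8: a bounded spectrum
# satisfies m/sum(lambda) -> 0, a spiked spectrum does not. Illustrative only.
import numpy as np

for d in (10, 100, 1000):
    lam_flat = np.full(d - 1, 0.5)                      # bounded eigenvalues
    lam_spike = np.r_[np.full(d - 1, 0.5), 0.5 * d]     # one eigenvalue grows with d
    for name, lam in (("bounded", lam_flat), ("spiked ", lam_spike)):
        print(f"d={d:5d} {name}: m/sum(lam) = {lam.max() / lam.sum():.3f}, "
              f"tr[(TV)^2]/[tr(TV)]^2 = {(lam**2).sum() / lam.sum()**2:.3f}")
```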

In the sequel it will be shown by means of Lemmas 3.4 and 3.7 that the unbiased estimators B_1 and B_2 are consistent and dimensionally stable, provided that the conditions of Lemma 3.8 are fulfilled. To this end it has to be shown that the variance of the ratio of each estimator and the quantity to be estimated is bounded by a quantity which converges to 0 for n → ∞ and does not depend on d. The following elementary properties of the variance of a sum of random variables will be used in the sequel:

\[
\operatorname{Var}\Big( \sum_{i=1}^{n} X_i \Big) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i \ne j} \operatorname{Cov}(X_i, X_j) ,
\qquad
\operatorname{Var}\Big( \sum_{i=1}^{n} X_i \Big) = E\Big( \sum_{i,j} X_i X_j \Big) - \Big[ E\Big( \sum_{i=1}^{n} X_i \Big) \Big]^2 .
\]

Thus, one obtains for V_1 = Var(B_1)

\[
n^2 (n-1)^2 V_1 = \operatorname{Var}\Big( \sum_{k \ne l} A_k A_l \Big)
= \sum_{k \ne l} \operatorname{Var}(A_k A_l)
+ \underbrace{\sum_{k \ne l} \sum_{r \ne s}}_{(k,l) \ne (r,s)} \operatorname{Cov}(A_k A_l, A_r A_s) .
\]

Moreover, by Lemma 3.4 (writing ν = tr(TV) and τ² = 2 tr[(TV)²] as there),

\[
n^2 (n-1)^2 V_1
= n(n-1)\, \operatorname{Var}(A_k A_l)
+ n(n-1)(n-2)(n-3)\, \operatorname{Cov}(A_k A_l, A_r A_s)
+ 4n(n-1)(n-2)\, \operatorname{Cov}(A_k A_l, A_k A_r)
+ n(n-1)\, \underbrace{\operatorname{Cov}(A_k A_l, A_l A_k)}_{= \operatorname{Var}(A_k A_l)}
\]
\[
= 2n(n-1)\,(\tau^4 + 2\,\tau^2 \nu^2) + 4n(n-1)(n-2)\,\tau^2 \nu^2
= 2n(n-1)\,\tau^4 + (4n^3 - 8n^2 + 4n)\,\tau^2 \nu^2 .
\]

Note that ν⁴ = [tr(TV)]⁴ no longer appears in the variance. Thus, it can be shown by Lemma 3.8 that V_1 / [tr(TV)]⁴ → 0 if d → ∞. Inserting τ⁴ = 4 (tr[(TV)²])² and τ²ν² = 2 tr[(TV)²] [tr(TV)]² yields

\[
\frac{V_1}{[\operatorname{tr}(TV)]^4}
= \frac{8}{n(n-1)} \underbrace{\frac{\left( \operatorname{tr}[(TV)^2] \right)^2}{[\operatorname{tr}(TV)]^4}}_{\to 0}
+ \frac{8}{n} \underbrace{\frac{\operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2}}_{\to 0} . \tag{3.10}
\]

Further note that this ratio is bounded for all d and all n > 1:

\[
\frac{V_1}{[\operatorname{tr}(TV)]^4}
= \frac{8}{n(n-1)} \underbrace{\frac{\left( \operatorname{tr}[(TV)^2] \right)^2}{[\operatorname{tr}(TV)]^4}}_{\le 1}
+ \frac{8}{n} \underbrace{\frac{\operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2}}_{\le 1}
\;\le\; \frac{8}{n(n-1)} + \frac{8}{n} = \frac{8}{n-1} .
\]

For the variance V_2 = Var(B_2) of the second estimator it follows that

\[
n^2 (n-1)^2 V_2 = \operatorname{Var}\Big( \sum_{k \ne l} A_{kl}^2 \Big)
= E\Big( \sum_{k \ne l} \sum_{r \ne s} A_{kl}^2 A_{rs}^2 \Big)
- \Big[ E\Big( \sum_{k \ne l} A_{kl}^2 \Big) \Big]^2 . \tag{3.11}
\]

Moreover, by Lemma 3.7,

\[
n^2 (n-1)^2 V_2
= 2n(n-1)\, E(A_{kl}^4) + 4n(n-1)(n-2)\, E(A_{kl}^2 A_{kr}^2)
+ n(n-1)(n-2)(n-3)\, \left[ E(A_{kl}^2) \right]^2 - n^2 (n-1)^2 \left[ \operatorname{Var}(A_{kl}) \right]^2
\]
\[
= 2n(n-1) \left( 6 \operatorname{tr}[(TV)^4] + 3 (\operatorname{tr}[(TV)^2])^2 \right)
+ 4n(n-1)(n-2) \left( 2 \operatorname{tr}[(TV)^4] + (\operatorname{tr}[(TV)^2])^2 \right)
\]
\[
\phantom{=}\; + n(n-1)(n-2)(n-3)\, (\operatorname{tr}[(TV)^2])^2
- n^2 (n-1)^2\, (\operatorname{tr}[(TV)^2])^2
\]
\[
= n(n-1) \left[ (8n-4)\, \operatorname{tr}[(TV)^4] + 4\, (\operatorname{tr}[(TV)^2])^2 \right] .
\]

Unfortunately, the term V_2 / (tr[(TV)²])² does not vanish for d → ∞ and fixed n. One obtains

\[
\frac{V_2}{\left( \operatorname{tr}[(TV)^2] \right)^2}
= \frac{8n-4}{n(n-1)} \underbrace{\frac{\operatorname{tr}[(TV)^4]}{\left( \operatorname{tr}[(TV)^2] \right)^2}}_{\to 0}
+ \frac{4}{n(n-1)} . \tag{3.12}
\]

This ratio is bounded, however, for all d and all n > 1 by 8/(n−1):

\[
\frac{V_2}{\left( \operatorname{tr}[(TV)^2] \right)^2}
= \frac{8n-4}{n(n-1)} \underbrace{\frac{\operatorname{tr}[(TV)^4]}{\left( \operatorname{tr}[(TV)^2] \right)^2}}_{\le 1}
+ \frac{4}{n(n-1)}
\;\le\; \frac{8n - 4 + 4}{n(n-1)} = \frac{8}{n-1} .
\]

It has thus been shown that, for fixed n and d → ∞, the quantity B_1 / [tr(TV)]² is L_2-consistent and therefore dimensionally stable. Moreover, B_1 / [tr(TV)]² is also L_2-consistent for n → ∞. The quantity B_2 / tr[(TV)²], however, is only L_2-consistent for n → ∞ and arbitrary fixed d. For fixed n and d → ∞, its variance is at least uniformly bounded by 8/(n−1), and thus it is also dimensionally stable.

3.4 Bias of the Ratio [tr(TV)]² / tr[(TV)²]

It has been shown that B_1 and B_2 are unbiased estimators of [tr(TV)]² and tr[(TV)²], respectively. The ratio B_1/B_2, however, may be biased. This shall be investigated in the sequel using a Taylor expansion for the ratio of two random variables,

\[
E\left( \frac{B_1}{B_2} \right)
\approx \frac{E(B_1)}{E(B_2)} \left( 1 + \frac{\operatorname{Var}(B_2)}{[E(B_2)]^2} - \frac{\operatorname{Cov}(B_1, B_2)}{E(B_1)\, E(B_2)} \right) , \tag{3.13}
\]

where the sign ≈ means "approximately equal". Using the derivation in the previous section, one obtains

\[
E\left( \frac{B_1}{B_2} \right)
\approx \frac{[\operatorname{tr}(TV)]^2}{\operatorname{tr}[(TV)^2]} \left( 1 + \frac{4}{n(n-1)} - \frac{\operatorname{Cov}(B_1, B_2)}{[\operatorname{tr}(TV)]^2\, \operatorname{tr}[(TV)^2]} \right) . \tag{3.14}
\]

It is easily seen that the last term in (3.14) can be neglected for large d or for large n: by the Cauchy-Schwarz inequality and by (3.10) and (3.12), it follows under the assumptions of Lemma 3.8 that

\[
\frac{\operatorname{Cov}(B_1, B_2)}{[\operatorname{tr}(TV)]^2\, \operatorname{tr}[(TV)^2]} = O\left( \frac{1}{n \sqrt{d}} \right) .
\]

Thus, the ratio Cov(B_1, B_2) / ([tr(TV)]² tr[(TV)²]) vanishes for n fixed and d → ∞ as well as for d fixed and n → ∞. The bias of f̃ = B_1/B_2 can then be approximately determined by plugging this result into (3.13):

\[
E(\widetilde{f}) \approx \frac{[\operatorname{tr}(TV)]^2}{\operatorname{tr}[(TV)^2]} \left( 1 + \frac{4}{n(n-1)} \right) . \tag{3.15}
\]

Obviously, the new estimator is slightly biased, but the bias disappears rapidly with increasing n. Furthermore, numerator and denominator of f̃ are dimensionally stable, and f̃ is asymptotically unbiased for n → ∞ and arbitrary d. Simulations show that the bias of f̃ is much smaller than the bias of the simple plug-in estimator f̂ = [tr(Ŝ_n)]² / tr(Ŝ_n²) in the ANOVA-type statistic.
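The following simulation sketch reproduces this comparison in a small illustrative setting (the AR(1) covariance, the sample sizes, and the number of replications are our own assumptions, not the report's original simulation design):

```python
# Simulation sketch: bias of f~ = B1/B2 versus the plug-in f^; illustrative setup.
import numpy as np

rng = np.random.default_rng(6)
n, d, reps = 10, 100, 500
T = np.eye(d) - np.ones((d, d)) / d
V = 0.6 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))  # AR(1)
S = T @ V @ T
f_true = np.trace(S) ** 2 / np.trace(S @ S)

f_tilde, f_plug = [], []
for _ in range(reps):
    X = rng.multivariate_normal(np.zeros(d), V, size=n)
    A = X @ T @ X.T                          # bilinear forms A_kl
    Ak = np.diag(A)
    off = ~np.eye(n, dtype=bool)
    B1 = np.outer(Ak, Ak)[off].sum() / (n * (n - 1))
    B2 = (A[off] ** 2).sum() / (n * (n - 1))
    f_tilde.append(B1 / B2)
    Sn = np.cov(X @ T, rowvar=False)         # plug-in via the sample covariance
    f_plug.append(np.trace(Sn) ** 2 / (Sn * Sn).sum())
print(f"f = {f_true:.2f}, mean f~ = {np.mean(f_tilde):.2f}, "
      f"mean f^ = {np.mean(f_plug):.2f}")
```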

3.5 F-Approximation (continued)

Now all relevant estimators have been derived and their properties have been investigated. Thus, the F-approximation started in Section 2.1 can be resumed, and the distribution of the statistic Q_n/B_0 shall be approximated by an F-distribution such that the first two moments coincide:

\[
F_n = \frac{Q_n}{B_0} \;\overset{\cdot}{\sim}\; F(f_1, f_2) .
\]

We need the following moments.

LEMMA 3.9 Let B_0 = (1/n) Σ_{k=1}^n A_k and Q_n = (1/n) Σ_{k=1}^n Σ_{l=1}^n A_{kl}, where A_{kl} = X_k' T X_l are symmetric bilinear forms. Then, under H_0,

1. E(B_0) = tr(TV),
2. Var(B_0) = (2/n) tr[(TV)²],
3. Cov(Q_n, B_0) = Var(B_0).

PROOF: The statements follow easily using the results from Lemma 3.4 on the moments of quadratic forms and from Lemma 3.7 on the moments of bilinear forms.

\[
1. \quad E\Big( \frac{1}{n} \sum_{k=1}^{n} A_k \Big) = \frac{1}{n} \sum_{k=1}^{n} E(A_k) = \operatorname{tr}(TV) ,
\]
\[
2. \quad \operatorname{Var}\Big( \frac{1}{n} \sum_{k=1}^{n} A_k \Big) = \frac{1}{n^2} \sum_{k=1}^{n} \operatorname{Var}(A_k) = \frac{2}{n} \operatorname{tr}[(TV)^2] ,
\]
\[
3. \quad \operatorname{Cov}(Q_n, B_0) = E(Q_n B_0) - E(Q_n)\, E(B_0)
= E\Big( \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} A_{kl} \sum_{m=1}^{n} A_m \Big)
- E\Big( \frac{1}{n} \sum_{k=1}^{n} \sum_{l=1}^{n} A_{kl} \Big) \operatorname{tr}(TV)
\]
\[
= \frac{1}{n^2} \sum_{k=1}^{n} \sum_{l=1}^{n} E(A_k A_l) - [\operatorname{tr}(TV)]^2
= \frac{1}{n}\, E(A_k^2) + \frac{n-1}{n}\, [E(A_k)]^2 - [\operatorname{tr}(TV)]^2
= \frac{2}{n} \operatorname{tr}[(TV)^2] ,
\]

since under H_0 all terms containing a bilinear form A_{kl} with k ≠ l have expectation 0. Thus, B_0 = (1/n) Σ_{k=1}^n A_k is an unbiased, consistent, and dimensionally stable estimator of tr(TV). □

The first two central moments of F_n are approximately determined by a Taylor expansion for the ratio of two random variables X and Y with Var(X) < ∞ and Var(Y) < ∞. The approximation for the expectation E(X/Y) is given in (3.13), replacing B_1 with X and B_2 with Y. For the variance Var(X/Y) we have

\[
\operatorname{Var}\left( \frac{X}{Y} \right)
\;\overset{\cdot}{=}\; \frac{[E(X)]^2}{[E(Y)]^2} \left( \frac{\operatorname{Var}(X)}{[E(X)]^2} + \frac{\operatorname{Var}(Y)}{[E(Y)]^2} - 2\, \frac{\operatorname{Cov}(X, Y)}{E(X)\, E(Y)} \right) .
\]

By noting that E(Q_n) = tr(TV), Var(Q_n) = 2 tr[(TV)²], and Cov(Q_n, B_0) = Var(B_0), one obtains

\[
E(F_n) \;\overset{\cdot}{=}\; \frac{E(Q_n)}{E(B_0)} \left( 1 + \frac{\operatorname{Var}(B_0)}{[E(B_0)]^2} - \frac{\operatorname{Cov}(Q_n, B_0)}{E(Q_n)\, E(B_0)} \right)
= \frac{\operatorname{tr}(TV)}{\operatorname{tr}(TV)} \left( 1 + \frac{\operatorname{Var}(B_0)}{[\operatorname{tr}(TV)]^2} - \frac{\operatorname{Var}(B_0)}{[\operatorname{tr}(TV)]^2} \right) = 1 ,
\]
\[
\operatorname{Var}(F_n) \;\overset{\cdot}{=}\; \frac{[E(Q_n)]^2}{[E(B_0)]^2} \left( \frac{\operatorname{Var}(Q_n)}{[E(Q_n)]^2} + \frac{\operatorname{Var}(B_0)}{[E(B_0)]^2} - 2\, \frac{\operatorname{Cov}(Q_n, B_0)}{E(Q_n)\, E(B_0)} \right)
= \frac{\operatorname{Var}(Q_n) - \operatorname{Var}(B_0)}{[\operatorname{tr}(TV)]^2}
= \frac{\left( 2 - \tfrac{2}{n} \right) \operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2} .
\]

Now, E(F_n) and Var(F_n) are equated with the first two moments of the F(f_1, f_2)-distribution,

\[
E[F(f_1, f_2)] = \frac{f_2}{f_2 - 2} \;\overset{!}{=}\; 1 ,
\qquad
\operatorname{Var}[F(f_1, f_2)] = \frac{2 f_2^2 (f_1 + f_2 - 2)}{f_1 (f_2 - 2)^2 (f_2 - 4)}
\;\overset{!}{=}\; \frac{\left( 2 - \tfrac{2}{n} \right) \operatorname{tr}[(TV)^2]}{[\operatorname{tr}(TV)]^2} ,
\]

and one obtains

\[
f_1 = \frac{[\operatorname{tr}(TV)]^2}{\left( 1 - \tfrac{1}{n} \right) \operatorname{tr}[(TV)^2]} , \qquad f_2 = \infty .
\]

The distribution of F_n can thus be approximated by a χ²_{f̃}/f̃-distribution, where f_1 is estimated by

\[
\widetilde{f} = \frac{n}{n-1} \cdot \frac{B_1}{B_2} = \frac{B_1}{\left( 1 - \tfrac{1}{n} \right) B_2} .
\]

Obviously, the ratio of this corrected estimator to the uncorrected estimator B_1/B_2 from (3.9) tends to 1 for n → ∞, and the bias in (3.15) is reduced by the factor n/(n−1).
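Putting everything together, a minimal end-to-end sketch of the resulting test reads as follows (function names and the data-generating setup are our own illustrative choices):

```python
# End-to-end sketch: Fn = Qn/B0 compared with the chi^2_{f~}/f~ approximation,
# where f~ = n/(n-1) * B1/B2. All inputs below are illustrative assumptions.
import numpy as np
from scipy import stats

def ats_test(X, T):
    n = X.shape[0]
    A = X @ T @ X.T                          # bilinear forms A_kl = X_k' T X_l
    Ak = np.diag(A)
    off = ~np.eye(n, dtype=bool)
    B0 = Ak.mean()
    B1 = np.outer(Ak, Ak)[off].sum() / (n * (n - 1))
    B2 = (A[off] ** 2).sum() / (n * (n - 1))
    f = n / (n - 1) * B1 / B2                # corrected degrees of freedom
    Fn = A.sum() / n / B0                    # Qn = n * Xbar' T Xbar = sum(A)/n
    return Fn, f, stats.chi2.sf(f * Fn, df=f)

rng = np.random.default_rng(7)
n, d = 10, 200                               # high-dimensional: d >> n
V = 0.6 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
T = np.eye(d) - np.ones((d, d)) / d          # tests H0: mu_1 = ... = mu_d
X = rng.multivariate_normal(np.zeros(d), V, size=n)   # H0 true here
Fn, f, p = ats_test(X, T)
print(f"Fn = {Fn:.3f}, f~ = {f:.1f}, p = {p:.3f}")
```

Since d ≫ n in this example, a Wald-type statistic could not even be computed, whereas F_n and f̃ require only the n×n matrix of bilinear forms.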

References

[1] BAI, Z. and SARANADASA, H. (1996). Effect of High Dimension: by an Example of a Two Sample Problem. Statistica Sinica 6, 311-329.

[2] BOX, G.E.P. (1954). Some Theorems on Quadratic Forms Applied in the Study of Analysis of Variance Problems, II. Effects of Inequality of Variance and of Correlation Between Errors in the Two-Way Classification. The Annals of Mathematical Statistics 25, 484-498.

[3] GEISSER, S. and GREENHOUSE, S. W. (1958). An Extension of Box's Result on the Use of the F Distribution in Multivariate Analysis. Annals of Mathematical Statistics 29, 885-891.

[4] GREENHOUSE, S. W. and GEISSER, S. (1959). On Methods in the Analysis of Profile Data. Psychometrika 24, 95-112.

[5] MATHAI, A.M. and PROVOST, S.B. (1992). Quadratic Forms in Random Variables, Theory and Applications. Marcel Dekker, Inc.

[6] PATNAIK, P.B. (1949). The Non-Central χ²- and F-Distributions and their Applications. Biometrika 36, 202-232.
