Ann. Inst. Statist. Math. Vol. 47, No. 4, 767-783 (1995)
APPROXIMATING BY THE WISHART DISTRIBUTION

TÕNU KOLLO¹ AND DIETRICH VON ROSEN²

¹Department of Mathematical Statistics, University of Tartu, J. Liivi Street 2, EE-2900 Tartu, Estonia
²Department of Mathematics, Uppsala University, Box 2180, S-751 06 Uppsala, Sweden
(Received January 24, 1994; revised February 6, 1995)
Abstract. Approximations of density functions are considered in the multivariate case. The results are presented with the help of matrix derivatives, powers of Kronecker products and Taylor expansions of functions with matrix argument. In particular, an approximation by the Wishart distribution is discussed. It is shown that in many situations the distributions should be centred. The results are applied to the approximation of the distribution of the sample covariance matrix and of the non-central Wishart distribution.
Key words and phrases: Density approximation, Edgeworth expansion, multivariate cumulants, Wishart distribution, non-central Wishart distribution, sample covariance matrix.
1. Introduction
In statistical approximation theory the most common tool for approximating the density or the distribution function of a statistic of interest is the Edgeworth expansion or related expansions like the tilted Edgeworth (e.g. see Barndorff-Nielsen and Cox (1989)). Then a distribution is approximated by the standard normal distribution using derivatives of its density function. However, for approximating a skewed random variable it is natural to use some skewed distribution. This idea was elegantly used by Hall (1983) for approximating a sum of independent random variables with the chi-square distribution. The same ideas are also valid in the multivariate case. For different multivariate statistics Edgeworth expansions have been derived on the basis of the multivariate normal distribution, N_p(0, Σ) (e.g. see Traat (1986), Skovgaard (1986), McCullagh (1987), Barndorff-Nielsen and Cox (1989)), but it seems more natural in many cases to use multivariate approximations via the Wishart distribution. Most of the test-statistics in multivariate analysis are based on functions of quadratic forms. Therefore, it is reasonable to believe, at least when the statistics are based on normal samples, that we could expect good approximations for these statistics using the Wishart density.
In this paper we are going to obtain the Wishart approximation for the density function of a symmetric random matrix. In Section 2 basic notions and formulas for the probabilistic characterization of a random matrix will be given. Section 3 includes a general relation between two different density functions. The obtained relation will be utilized in Section 4 in the case when one of the two densities is the Wishart density. In particular, the first terms of the expansion will be written out. In Section 5 we present two applications of our results and consider the distribution of the sample covariance matrix as well as the non-central Wishart distribution.

2. Moments and cumulants of a random matrix
In the paper we are, systematically, going to use matrix notations. Most of the results will be presented using notions like vec-operator, Kronecker product, commutation matrix and matrix derivative. Readers not very familiar with these concepts are referred to the book by Magnus and Neudecker (1988), for example. Now we present those definitions of matrix derivatives which will be used in the sequel. For a p × q-matrix X and a m × n-matrix Y = Y(X) the matrix derivative dY/dX is a mn × pq-matrix:
dY/dX = d/dX ⊗ vec Y,

where

d/dX = (∂/∂x_{11}, ..., ∂/∂x_{p1}, ∂/∂x_{12}, ..., ∂/∂x_{p2}, ..., ∂/∂x_{1q}, ..., ∂/∂x_{pq}),

i.e.

(2.1)    dY/dX = d/(d vec' X) ⊗ vec Y.
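Definition (2.1) says that the (i, j)-th entry of dY/dX is ∂(vec Y)_i/∂(vec X)_j, with vec stacking columns. This can be checked numerically; the finite-difference helper `mat_deriv` and the example Y = AXB (for which vec(AXB) = (B' ⊗ A) vec X, so dY/dX = B' ⊗ A) are our illustration, not part of the paper:

```python
import numpy as np

def vec(A):
    # the vec-operator: column-major stacking
    return A.reshape(-1, order="F")

def mat_deriv(f, X, h=1e-6):
    """Numerical matrix derivative as in (2.1): an (mn x pq) matrix
    with entry (i, j) = d(vec Y)_i / d(vec X)_j."""
    Y0 = f(X)
    m, n = Y0.shape
    p, q = X.shape
    D = np.zeros((m * n, p * q))
    for j in range(p * q):
        E = np.zeros(p * q)
        E[j] = h
        D[:, j] = (vec(f(X + E.reshape(p, q, order="F"))) - vec(Y0)) / h
    return D

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 2))
B = rng.normal(size=(4, 5))
X = rng.normal(size=(2, 4))

# For the linear map Y = AXB the derivative is exactly B' (Kronecker) A
D = mat_deriv(lambda X: A @ X @ B, X)
assert np.allclose(D, np.kron(B.T, A), atol=1e-4)
```

Since Y = AXB is linear in X, the finite-difference Jacobian agrees with B' ⊗ A up to rounding error.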
Higher order derivatives are defined recursively:

(2.2)    d^k Y/dX^k = d/dX (d^{k-1}Y/dX^{k-1}).
When differentiating with respect to a symmetric matrix X we will instead of (2.1) use

(2.3)    dY/dΔX = d/(d vec' ΔX) ⊗ vec Y,

where ΔX denotes the upper triangular part of X and vec ΔX = (x_{11}, x_{12}, x_{22}, ..., x_{1p}, ..., x_{pp})'. There exists a (p^2 × (1/2)p(p + 1))-matrix G which is defined by the relation

G' vec T = vec ΔT.
Explicitly the block-diagonal matrix G is given by p blocks G_ii:

(2.4)    G_ii = (e_1, e_2, ..., e_i),    i = 1, 2, ..., p,

where e_i is the i-th unit vector, i.e. e_i is the i-th column of I_p. An important special case of (2.3) is when Y = X. Replacing vec ΔX by G' vec X we get by definition (2.1) the following equality

dX/dΔX = (I_{p^2} + K_{p,p} - (K_{p,p})_d)G,
where (K_{p,p})_d stands for the commutation matrix K_{p,p} where the off-diagonal elements have been put to 0. To shorten the expressions we shall use the notation

(2.5)    H_{p,p} = I_{p^2} + K_{p,p} - (K_{p,p})_d,
where the indices may be omitted if dimensions can be understood from the text. Hence our derivative equals the product

dX/dΔX = HG.
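The matrices G of (2.4), K_{p,p} and H of (2.5) are easy to construct explicitly. The following sketch (the helper functions are ours, assuming the vec ΔX ordering given above) verifies both G' vec T = vec ΔT and that dX/dΔX = HG acts as the duplication matrix, reproducing vec X from vec ΔX:

```python
import numpy as np

def vec(A):
    return A.reshape(-1, order="F")

def G_matrix(p):
    """Block-diagonal G of (2.4): the i-th block is G_ii = (e_1, ..., e_i)."""
    cols = []
    for i in range(p):            # block i corresponds to column i of the matrix
        for j in range(i + 1):    # rows 1..i of the upper triangle
            g = np.zeros(p * p)
            g[i * p + j] = 1.0
            cols.append(g)
    return np.column_stack(cols)

def K_matrix(p):
    """Commutation matrix K_{p,p}: K vec(A) = vec(A')."""
    K = np.zeros((p * p, p * p))
    for a in range(p):
        for b in range(p):
            K[b + a * p, a + b * p] = 1.0
    return K

p = 4
G, K = G_matrix(p), K_matrix(p)
H = np.eye(p * p) + K - np.diag(np.diag(K))   # H_{p,p} of (2.5)

T = np.arange(1.0, p * p + 1).reshape(p, p)
X = T + T.T                                   # a symmetric test matrix

# G' vec X = vec ΔX: picks x_{11}, x_{12}, x_{22}, ..., x_{pp}
vecDelta = np.array([X[j, i] for i in range(p) for j in range(i + 1)])
assert np.allclose(G.T @ vec(X), vecDelta)

# dX/dΔX = HG behaves as the duplication matrix: HG vec ΔX = vec X
assert np.allclose(H @ G @ vecDelta, vec(X))
```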
Note that the use of HG is equivalent to the use of the duplication matrix (see Magnus and Neudecker (1988)). For a random p × q-matrix X the characteristic function is defined as φ_X(T) = E[exp(i tr(T'X))], where T is a p × q-matrix. The characteristic function can also be presented through the vec-operator:

(2.6)    φ_X(T) = E[exp(i vec' T vec X)],
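Relation (2.6) rests on the identity tr(T'X) = vec' T vec X, which a short numerical check confirms (the random T and X here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 4))
X = rng.normal(size=(3, 4))

# tr(T'X) equals vec'(T) vec(X): the identity behind (2.6)
lhs = np.trace(T.T @ X)
rhs = T.reshape(-1, order="F") @ X.reshape(-1, order="F")
assert np.isclose(lhs, rhs)
```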
which is a useful relation for differentiating. In the case of a symmetric p × p-matrix X the nondiagonal elements of X appear twice in the exponent and so definition (2.6) gives us the characteristic function for X_{11}, ..., X_{pp}, 2X_{12}, ..., 2X_{p,p-1}. However, it is more natural to present the characteristic function of a symmetric matrix for X_{ij}, i ≤ j.
Premultiplying this equality with a' gives the statement of the corollary. The symmetric case is treated similarly. □
Now we are able to present the main result of the section.
THEOREM 3.1. Let f_X^{(k)}(X) denote the k-th derivative d^k f_X(X)/dX^k. If Y and X are two random p × q-matrices, the density f_Y(X) can be presented through the density f_X(X) by the following formal equality:

f_Y(X) = f_X(X) - (E[Y] - E[X])' vec f_X^{(1)}(X)
    + (1/2) vec'{D[Y] - D[X] + (E[Y] - E[X])(E[Y] - E[X])'} vec f_X^{(2)}(X)
    - (1/6){vec'(c_3[Y] - c_3[X]) + 3 vec'(D[Y] - D[X]) ⊗ (E[Y] - E[X])'
    + ((E[Y] - E[X])')^{⊗3}} vec f_X^{(3)}(X) + ⋯
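In the univariate case p = q = 1 the formal expansion of Theorem 3.1 reduces to the classical Edgeworth-type correction. A small sketch with X standard normal and Y a standardized Gamma variable, so that the mean and variance differences vanish and only the third-cumulant term survives (the shape parameter and the grid are arbitrary choices of ours):

```python
import numpy as np
from math import gamma, exp, sqrt, pi

a = 20.0                 # Gamma(a, 1) shape; Y = (W - a)/sqrt(a) is skewed
c3 = 2.0 / sqrt(a)       # third cumulant of Y (c1[Y] = 0, c2[Y] = 1)

def f_Y(x):
    # exact density of the standardized Gamma variable
    w = a + x * sqrt(a)
    return sqrt(a) * w ** (a - 1) * exp(-w) / gamma(a) if w > 0 else 0.0

def phi(x):
    # standard normal density
    return exp(-x * x / 2) / sqrt(2 * pi)

def f_approx(x):
    # Theorem 3.1 with p = q = 1: f_Y(x) ~ phi(x) - (c3/6) phi'''(x),
    # and phi'''(x) = (3x - x**3) phi(x)
    return phi(x) - (c3 / 6) * (3 * x - x ** 3) * phi(x)

xs = np.linspace(-2, 2, 9)
err_normal = max(abs(f_Y(x) - phi(x)) for x in xs)
err_edge = max(abs(f_Y(x) - f_approx(x)) for x in xs)
# the third-order correction improves on the plain normal density
assert err_edge < err_normal
```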
PROOF.
we have

L*_2(X, Σ) = -(1/2) vec'{(X/n + Σ)^{-1} - Σ^{-1}}HG,

and using the matrix equality (e.g. see Srivastava and Khatri (1979), p. 7)
(A + BEB')^{-1} = A^{-1} - A^{-1}B(B'A^{-1}B + E^{-1})^{-1}B'A^{-1},

where A and E are positive definite matrices of orders p and q, respectively, and B is a p × q-matrix, we get that
L*_2(X, Σ) = (1/2) vec'{Σ^{-1}X^{1/2}(X^{1/2}Σ^{-1}X^{1/2} + nI_p)^{-1}X^{1/2}Σ^{-1}}HG,    X > 0,
where G and H are defined by (2.4) and (2.5), respectively.

PROOF. To obtain (5.1) we have to insert the expressions of L*_k(X, Σ) and the cumulants c_k[ΔS*] and c_k[ΔV] in (4.15) and examine the result. At first let us
note that, by Lemma 4.2, L*_k(X, Σ), k ≥ 4, is of order n^{-k+1}. Furthermore, from (4.7) and properties of sample cumulants of k-statistics it follows that the differences c_k[ΔS*] - c_k[ΔV], k ≥ 2, are of order n. Then from the construction of the formal expansion (3.3) we have that for k = 2p, p = 2, 3, ..., the term including L*_k(X, Σ) is of order n^p × n^{-2p+1} = n^{-p+1}, where the main term of the cumulant differences is the term where the second order cumulants have been multiplied p times. Hence, the L*_4(X, Σ)-term, i.e. the
expression including L*_4(X, Σ) and the product of D[ΔS*] - D[ΔV] with itself, is O(n^{-1}), the L*_6(X, Σ)-term is O(n^{-2}), etc. For k = 2p + 1, p = 2, 3, ..., the order of the L*_k(X, Σ)-term is determined by the product of L*_k(X, Σ) and the (p - 1) products of the differences of the second order cumulants and a difference of the third order cumulants. So the order of the L*_k(X, Σ)-term (k = 2p + 1) is n^{-2p} × n^{p-1} × n = n^{-p}. Thus, the L*_5(X, Σ)-term is O(n^{-2}), the L*_7(X, Σ)-term is O(n^{-3}) and so on. The presented arguments complete the proof. □

Our second application is about the non-central Wishart distribution. It turns out that Theorem 4.1 gives a very convenient way to describe the non-central Wishart density. Previously the approximation of the non-central Wishart distribution by the Wishart distribution has, among others, been considered by Steyn and Roux (1972) and Tan (1979). Both Steyn and Roux (1972) and Tan (1979) perturbed the covariance matrix in the Wishart distribution so that moments of the Wishart distribution and the non-central Wishart distribution should be close to each other. Moreover, Tan (1979) based his approximation on Finney's (1963) approach but never explicitly calculated the derivatives of the density. It was not considered that the density is dependent on n. Although our approach is a matrix version of Finney's, there is a fundamental difference from the approach in Steyn and Roux (1972) and Tan (1979). Instead of perturbing the covariance matrix we use the idea of centring the non-central Wishart distribution. Indeed, as shown below, this will also simplify the calculations because we are now able to describe the difference between the cumulants in a convenient way, instead of treating the cumulants of the Wishart distribution and the non-central Wishart distribution separately.
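The matrix inversion equality of Srivastava and Khatri (1979) quoted in the proof above is easy to verify numerically; a quick sketch with randomly generated positive definite A and E and a rectangular B (all choices arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
p, q = 5, 3
# positive definite A (p x p) and E (q x q), arbitrary p x q matrix B
A = rng.normal(size=(p, p)); A = A @ A.T + p * np.eye(p)
E = rng.normal(size=(q, q)); E = E @ E.T + q * np.eye(q)
B = rng.normal(size=(p, q))

Ai = np.linalg.inv(A)
lhs = np.linalg.inv(A + B @ E @ B.T)
rhs = Ai - Ai @ B @ np.linalg.inv(B.T @ Ai @ B + np.linalg.inv(E)) @ B.T @ Ai
assert np.allclose(lhs, rhs)
```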
Let Y ~ W_p(Σ, n, μ), i.e. the non-central Wishart distribution with a non-centrality parameter Σ^{-1}μμ'. If Σ > 0 the matrix Y has the characteristic function (see Muirhead (1982))

(5.3)    φ_Y(T) = φ_W(T) e^{-tr(Σ^{-1}μμ')/2} e^{tr(Σ^{-1}μμ'(I_p - iM(T)Σ)^{-1})/2},

where M(T) and φ_W(T), the characteristic function of W ~ W_p(Σ, n), are given by (4.2). We shall consider centred versions of Y and W again, where W ~ W_p(Σ, n). Let Z = Y - nΣ - μμ' and V = W - nΣ. Since we are interested in the differences c_k[Z] - c_k[V], k = 1, 2, 3, ..., we can, by the similarity of the characteristic functions φ_Z(T) and φ_V(T), obtain by (5.3) the difference of the cumulant functions

ψ_Z(T) - ψ_V(T) = -(1/2) tr(Σ^{-1}μμ') - (i/2) tr{M(T)μμ'} + (1/2) tr{Σ^{-1}μμ'(I_p - iM(T)Σ)^{-1}}.

After expanding the matrix (I_p - iM(T)Σ)^{-1} we have

(5.4)    ψ_Z(T) - ψ_V(T) = (1/2) ∑_{j=2}^{∞} i^j tr{Σ^{-1}μμ'(M(T)Σ)^j}.
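The centring used here can be illustrated by simulation: with Z = Y - nΣ - μμ' and V = W - nΣ, both centred matrices have zero mean. A Monte Carlo sketch (Σ, μ, n and the number of replications are arbitrary choices of ours; Y is generated as (X + M)'(X + M) with a single non-zero mean row, which yields non-centrality Σ^{-1}μμ'):

```python
import numpy as np

rng = np.random.default_rng(3)
p, n, reps = 3, 10, 20000
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
mu = np.array([1.0, -0.5, 0.8])
L = np.linalg.cholesky(Sigma)

Zs = np.zeros((p, p))
Vs = np.zeros((p, p))
for _ in range(reps):
    Xc = rng.normal(size=(n, p)) @ L.T      # rows ~ N(0, Sigma)
    M = np.zeros((n, p)); M[0] = mu         # mean matrix with one non-zero row
    Y = (Xc + M).T @ (Xc + M)               # ~ W_p(Sigma, n, mu), E[Y] = n Sigma + mu mu'
    W = Xc.T @ Xc                           # ~ W_p(Sigma, n),     E[W] = n Sigma
    Zs += Y - n * Sigma - np.outer(mu, mu)  # centred non-central Wishart
    Vs += W - n * Sigma                     # centred central Wishart

# both centred sample means should be near the zero matrix
assert np.abs(Zs / reps).max() < 0.3
assert np.abs(Vs / reps).max() < 0.3
```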
From (5.4) it follows that c_1[Z] - c_1[V] = 0, which, of course, must be true because E[Z] = E[V] = 0. In order to obtain the difference of the second order cumulants we have to differentiate (5.4) and obtain

(5.5)    c_2[Z] - c_2[V] = (1/2) d^2 tr(μμ'M(T)ΣM(T))/dΔT^2 = G'(Σ ⊗ μμ' + μμ' ⊗ Σ)(I_{p^2} + K_{p,p})G.
Moreover,

(5.6)    c_3[Z] - c_3[V] = (G'(I_{p^2} + K_{p,p}) ⊗ G')(I_p ⊗ K_{p,p} ⊗ I_p)(I_{p^4} + K_{p,p} ⊗ K_{p,p})
        × {Σ ⊗ Σ ⊗ vec μμ' + (Σ ⊗ μμ' + μμ' ⊗ Σ) ⊗ vec Σ}
        × (I_{p^2} + K_{p,p})G.

Hence the next theorem will easily follow.

THEOREM 5.2. Let Z = Y - nΣ - μμ', where Y ~ W_p(Σ, n, μ), and V = W - nΣ, where W ~ W_p(Σ, n). Then

(5.7)
f_Z(X) = f_V(X){1 + (1/2) vec'{(μμ' ⊗ Σ + Σ ⊗ μμ')(I_{p^2} + K_{p,p})}(G ⊗ G) vec L*_2(X, Σ)
    - (1/6) vec'(c_3[ΔZ] - c_3[ΔV]) vec L*_3(X, Σ) + ⋯},    X > 0,
where L*_k(X, Σ), k = 2, 3, are given by (4.14), (4.6) and (4.12), (c_3[ΔZ] - c_3[ΔV]) is determined by (5.6) and G is defined by (2.4).

PROOF. The proof follows from (4.15) if we replace the difference of the second order cumulants by (5.5) and take into account that by Lemma 4.2 L*_k(X, Σ) is of order n^{-k+1}, k ≥ 3, and that the differences of cumulants c_k[ΔZ] - c_k[ΔV] do not depend on n. □

For an approximation of order n^{-1} we get from Theorem 5.2 the following
COROLLARY 5.1.

f_Z(X) = f_V(X){1 + (1/(4n)) vec'{(I_p ⊗ μμ'Σ + I_p ⊗ Σμμ')(I_{p^2} + K_{p,p})}
    × (GG'H ⊗ GG'H)
    × vec((X/n + Σ)^{-1} ⊗ (X/n + Σ)^{-1}) + o(n^{-1})},    X > 0.
PROOF. The statement follows from (5.7) if we omit the L*_3(X, Σ)-term, which is of order n^{-2}, and use the n^{-1} term from (4.14) for L*_2(X, Σ). □
Acknowledgements

The authors are extremely thankful to the referees whose remarks have avoided several misprints and mistakes which occurred in the first version of the paper. Tõnu Kollo is indebted to the Swedish Institute for covering a stay at Uppsala University and to the Estonian Research Foundation for their support. Dietrich von Rosen gratefully acknowledges the support from the Swedish Natural Research Council.

REFERENCES

Barndorff-Nielsen, O. E. and Cox, D. R. (1989). Asymptotic Techniques for Use in Statistics, Chapman and Hall, London.
Cornish, E. A. and Fisher, R. A. (1937). Moments and cumulants in the specification of distributions, Internat. Statist. Rev., 5, 307-322.
Finney, D. J. (1963). Some properties of a distribution specified by its cumulants, Technometrics, 5, 63-69.
Hall, P. (1983). Chi-squared approximations to the distribution of a sum of independent random variables, Ann. Probab., 11, 1028-1036.
Kollo, T. (1991). Matrix Derivative in Multivariate Statistics, Tartu University Press, Tartu (in Russian).
Kollo, T. and Neudecker, H. (1993). Asymptotics of eigenvalues and unit-length eigenvectors of sample variance and correlation matrices, J. Multivariate Anal., 47, 283-300.
Magnus, J. R. and Neudecker, H. (1988). Matrix Differential Calculus with Applications in Statistics and Econometrics, Wiley, Chichester.
McCullagh, P. (1987). Tensor Methods in Statistics, Chapman and Hall, London.
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory, Wiley, New York.
Skovgaard, I. (1986). On the multivariate Edgeworth expansions, Internat. Statist. Rev., 54, 169-186.
Srivastava, M. S. and Khatri, C. G. (1979). An Introduction to Multivariate Statistics, North Holland, New York.
Steyn, H. S. and Roux, J. J. J. (1972). Approximations for the noncentral Wishart distribution, South African Journal of Statistics, 6, 165-173.
Tan, W. Y. (1979). On the approximation of noncentral Wishart distribution by Wishart distribution, Metron, 37, 50-58.
Tan, W. Y. (1980). On approximating multivariate distributions, Multivariate Statistical Analysis, Proceedings of the Research Seminar at Dalhousie University, 237-249, North Holland, Amsterdam.
Traat, I. (1984). Moments of the sample covariance matrix, Proceedings of the Computer Centre of Tartu State University, 51, 108-126 (in Russian).
Traat, I. (1986). Matrix calculus for multivariate distributions, Acta et Commentationes Universitatis Tartuensis, 733, 64-84.