Moments and cumulants of the multivariate real and complex Gaussian distributions

Kostas Triantafyllopoulos
Department of Mathematics, University of Bristol, University Walk, Bristol BS8 1TW, United Kingdom
E-mail: [email protected]

Version: 12 March 2002

This paper considers the problem of higher order moments and cumulants of the multivariate normal distribution. An earlier solution of this problem is criticized with regard to its practical use. We give an analytical form and a much simpler proof for the central moments, and we provide a sequential updating calculation for the general moments. We also consider the central higher order moments of the complex-valued multivariate normal distribution.
Key Words: Normal distribution; high order moments; characteristic function.
1. INTRODUCTION
Let X = (X_1, ..., X_n)' be a Gaussian real vector with some known mean vector ξ and covariance matrix C. Then the k-order moments of X are defined by
$$\mu_{r_1,\ldots,r_m}(X) = E\left(\prod_{j=1}^{m} X_j^{r_j}\right), \qquad (1)$$
where r_1 + ··· + r_m = k and r_j ≥ 1. The problem of the moments consists of calculating µ_{r_1,...,r_m}(X) in terms of ξ and C. Until 1988 there was no general formula for a moment of arbitrary order k and arbitrary parameters ξ, C and n. Considering the standardized bivariate normal distribution, Pearson and Young (1918) gave tables up to the 10-order moments, for correlations 0, 0.05, ..., 1 between the two variables. For the same distribution, Kendall and Stuart (1963) gave a recurrence relationship for the 2-order moments. An analytical exact formula for the same problem is given in Johnson et al. (2000, page 261).
Similar results may be obtained for the trivariate standardized normal distribution. A general solution was given only in 1988, when Holmquist (1988) proved a general form of µ_{r_1,...,r_m}(X). Holmquist (1996) generalized this result to include the derivation of higher order moments of quadratic forms of normal distributions. The problem of moments and cumulants of normal random matrices is considered in Ghazal and Neudecker (2000); using the Kronecker product, they derive simple formulae for the special case of second and fourth order moments of random matrices.

This paper proposes an alternative method for the calculation of higher order moments of the Gaussian distribution. The central moments are derived in a faster and more elegant way, and an algorithm for noncentral moments is developed. We also provide a sequential algorithm for the central moments of the complex-valued normal distribution. Holmquist (1988) plays a leading role in all of the above mentioned papers, because it develops the higher order moments of the Gaussian distribution in a neat mathematical formulation. Section 2 discusses this result briefly and addresses the need for an alternative method of deriving the moments. Our main results are given in Section 3. First we give a theorem that provides a simpler and more transparent formula for the central moments; this gives an easier and more practical way of calculating them. Then we derive the general moments from the central moments, providing a formula for the k-order moments that works recurrently on the (k−1)-order moments, as well as a formula for the complex-valued Gaussian moments. The proofs are given in Appendix A.

This research is based on some unpublished notes on the moments and cumulants of the normal distribution that Professor P.J. Harrison wrote back in 1978. Those notes, together with motivating discussions with him, inspired me to develop the full version of the k-order moments of the Gaussian distribution.

2. BACKGROUND
Here we briefly discuss the method of Holmquist (1988) and address the associated problems in application. Let X = (X_1, ..., X_n)' ∼ N(ξ, C) for some known parameters ξ, C. Let "⊗" denote the Kronecker product and write X^{(k)} = X ⊗ ··· ⊗ X (k factors), so that the elements of X^{(k)} are exactly all the k-order products of the elements of X. Also, write v = (c_{11}, ..., c_{nn})' = vec C for the n²-vector of the covariances c_{ij} stacked column by column, so that c_{ii} is the variance of X_i (i = 1, ..., n). If E_{i,j} is the n^k × n^k matrix whose (i, j)th element, its only non-zero element, equals 1, and
$$q(l_k; n1_k) = 1 + \sum_{j=1}^{k} (i_j - 1)\, n^{j-1},$$
where l_k = (i_1, ..., i_k)' and 1_k = (1, ..., 1)' (k times), then the direct product permuting matrix with k degrees n is defined by
$$Q_{n1_k}(\pi) = \sum_{l_k \in \{1,\ldots,n\}^k} E_{q(\pi l_k;\, n1_k),\, q(l_k;\, n1_k)},$$
with πl_k = (i_{π^{-1}(1)}, ..., i_{π^{-1}(k)})' (π being an element of the symmetric group of order k), and the symmetrizer S_{n1_k} is defined by
$$S_{n1_k} = \sum_{\pi} Q_{n1_k}(\pi)/k!,$$
where the summation extends over all k! permutations π in the symmetric group of order k. With this notation, Holmquist (1988) proved that the non-central moments are given by
$$E(X^{(k)}) = k!\, S_{n1_k} \sum_{i=0}^{[k/2]} \frac{\xi^{(k-2i)} \otimes v^{(i)}}{i!\,(k-2i)!\,2^{i}} \qquad (2)$$
and the central moments by
$$E(X - \xi)^{(k)} = (k-1)!!\, S_{n1_k}\, v^{(k/2)} \qquad (3)$$
if k is even, and E(X − ξ)^{(k)} = 0 if k is odd, where [k/2] denotes the integer part of k/2 and (k − 1)!! = (k − 1)(k − 3) ··· 1. For more details and the proofs the reader is referred to Holmquist (1988).
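For concreteness, these definitions can be transcribed directly into code. The following is a minimal sketch (ours, not code from any of the cited papers), written in Python with NumPy, of q, Q_{n1_k}(π) and S_{n1_k}; for the symmetrizer the distinction between π and π^{−1} is immaterial, since the sum runs over the whole symmetric group. The final line checks that, for k = 2, the symmetrizer averages x ⊗ y and y ⊗ x.

import numpy as np
from itertools import product, permutations

def q(lk, n):
    # q(l_k; n1_k) = 1 + sum_j (i_j - 1) n^(j-1), as defined in the text
    return 1 + sum((i - 1) * n**j for j, i in enumerate(lk))

def Q(pi, n, k):
    # direct product permuting matrix: the sum of E_{q(pi l_k), q(l_k)}
    M = np.zeros((n**k, n**k))
    for lk in product(range(1, n + 1), repeat=k):
        plk = tuple(lk[pi[j]] for j in range(k))   # permuted index tuple
        M[q(plk, n) - 1, q(lk, n) - 1] = 1.0
    return M

def S(n, k):
    # symmetrizer S_{n1_k}: average of Q over all k! permutations
    perms = list(permutations(range(k)))
    return sum(Q(pi, n, k) for pi in perms) / len(perms)

# for k = 2 the symmetrizer averages the two orderings of a Kronecker product:
rng = np.random.default_rng(0)
x, y = rng.normal(size=3), rng.normal(size=3)
assert np.allclose(S(3, 2) @ np.kron(x, y), (np.kron(x, y) + np.kron(y, x)) / 2)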
Although these results are very neat and the vector formulation is often advantageous, there are problems in application. For example, suppose that X = (X_1, ..., X_{10})' is normally distributed with known mean ξ and variance C and that we want to calculate E(X_4^2 X_8 X_9). The product X_4^2 X_8 X_9 appears in X^{(4)}, but in order to find its position in X^{(4)} we need to perform two steps of calculation, namely X^{(3)} = X^{(2)} ⊗ X and X^{(4)} = X^{(3)} ⊗ X. Of course, in this special case we can obtain X^{(4)} from X^{(2)} in one step, but this may not be possible for higher orders. The calculation of the Kronecker product ξ^{(4−2i)} ⊗ v^{(i)} in equation (2) then requires the elements of ξ^{(4−2i)} and v^{(i)}, which, for each i = 0, 1, 2, take the same route as X^{(4)}. It follows that even for the 4-order moments a computer program is necessary, and in general the symmetrizer S_{n1_k} is not easy to calculate for large k. Even in the special, but important, case of the central k-order moments the statistician does not obtain a practical idea of how to get these moments without the aid of a computer. It is probable that this method will not appeal to practitioners, who may be put off by the technical complications. In the next section we propose a new, simpler and easy to apply method of calculating central moments and provide a sequential algorithm for non-central higher order moments. We also provide an algorithm for the moments of the complex-valued Gaussian distribution.
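To see the indexing effort that this example entails, here is a small sketch (ours; Python with NumPy, whose np.kron varies the last index fastest) that locates the product X_4^2 X_8 X_9 inside X^{(4)} for n = 10.

import numpy as np

def kron_index(indices, n):
    # 0-based position of X_{i_1}*...*X_{i_k} in X^(k) built by repeated
    # np.kron; indices are 1-based, the last index varies fastest
    pos = 0
    for i in indices:
        pos = pos * n + (i - 1)
    return pos

rng = np.random.default_rng(1)
x = rng.normal(size=10)                       # n = 10, as in the example
x4 = np.kron(np.kron(np.kron(x, x), x), x)    # X^(4), built in k - 1 steps
pos = kron_index([4, 4, 8, 9], n=10)
assert np.isclose(x4[pos], x[3]**2 * x[7] * x[8])   # the product X_4^2 X_8 X_9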
3. MAIN RESULTS
First we consider the central moments of order k. Without loss of generality let X = (X_1, ..., X_k)' follow a normal distribution with known mean ξ and variance C = {c_{ij}}, i, j = 1, ..., k. We simplify equation (1) by writing µ_{r_1,...,r_m}(X) = µ_{1,...,k}(X), where the variables X_j are not necessarily distinct. Then the following result applies.

Theorem 3.1. The central k-order moments of the variable X are given as follows. (a) If k is odd, µ_{1,...,k}(X − ξ) = 0. (b) If k is even with k = 2λ (λ ≥ 1), then
$$\mu_{1,\ldots,2\lambda}(X - \xi) = \sum (c_{ij}\, c_{kl} \cdots c_{xz}),$$
where the sum is taken over all pairings of {1, ..., 2λ}, giving (2λ − 1)!/(2^{λ−1}(λ − 1)!) terms, each being the product of λ covariances.

For example, the central 4-order moments of X (taking ξ = 0 for brevity) are
E(X_i^4) = 3c_{ii}^2,
E(X_i^3 X_j) = 3c_{ii} c_{ij},
E(X_i^2 X_j^2) = c_{ii} c_{jj} + 2c_{ij}^2,
E(X_i^2 X_j X_p) = c_{ii} c_{jp} + 2c_{ij} c_{ip},
E(X_i X_j X_p X_q) = c_{ij} c_{pq} + c_{ip} c_{jq} + c_{iq} c_{jp}.
These equations show clearly the mechanism of the theorem: no matter how high an order of central moments is required, they are notably easy to calculate, as long as the covariances c_{ij} are given.
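The pairing mechanism of the theorem takes only a few lines of code. The following minimal sketch (ours; plain Python) enumerates all pairings of the requested indices and sums the corresponding products of covariances, exactly as in Theorem 3.1.

from math import prod

def pairings(idx):
    # yield all pairings (perfect matchings) of the index list idx
    if not idx:
        yield []
        return
    first, rest = idx[0], idx[1:]
    for k, partner in enumerate(rest):
        for sub in pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + sub

def central_moment(C, idx):
    # E[(X_{i_1}-xi_{i_1})...(X_{i_k}-xi_{i_k})] for X ~ N(xi, C); 0-based idx
    if len(idx) % 2:
        return 0.0                      # part (a): odd central moments vanish
    return sum(prod(C[i][j] for i, j in p) for p in pairings(list(idx)))

# check E[(X_i - xi_i)^2 (X_j - xi_j)^2] = c_ii c_jj + 2 c_ij^2 with i=0, j=1:
C = [[2.0, 0.5], [0.5, 1.0]]
assert central_moment(C, [0, 0, 1, 1]) == C[0][0] * C[1][1] + 2 * C[0][1] ** 2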
Next we consider the general moments of X, µ_{1,...,k}(X). But first we need the following lemma.

Lemma 1. Let x_i, a_i (i = 1, ..., n; n ≥ 2) be any real numbers. Then
$$\prod_{i=1}^{n} (x_i + a_i) = \prod_{i=1}^{n} x_i + \sum_{m=1}^{n-1} \sum_{j_k \neq i} \prod_{k=1}^{m} a_{j_k} \prod_{i \in J_{n-m}} x_i + \prod_{i=1}^{n} a_i,$$
where the J_{n−m} are the n!/(m!(n − m)!) subsets of {1, ..., n} obtained by excluding m elements. Note that when x_i = x and a_i = a for all i = 1, ..., n, this lemma reduces to the well known binomial theorem.
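As a quick symbolic check of the lemma for n = 3 (a sketch assuming SymPy is available):

import sympy as sp
from itertools import combinations

n = 3
x = sp.symbols('x1:4')
a = sp.symbols('a1:4')

lhs = sp.expand(sp.Mul(*[xi + ai for xi, ai in zip(x, a)]))

rhs = sp.Mul(*x)                    # leading term: the product of all x_i
for m in range(1, n):               # the middle double sum of the lemma
    for J in combinations(range(n), m):      # the m indices carrying a's
        rest = [i for i in range(n) if i not in J]
        rhs += sp.Mul(*[a[i] for i in J]) * sp.Mul(*[x[i] for i in rest])
rhs += sp.Mul(*a)                   # trailing term: the product of all a_i

assert sp.expand(lhs - rhs) == 0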
Let now ξ = (ξ_1, ..., ξ_k)'. Then the k-order moments of X can be calculated as
$$\mu_{1,\ldots,k}(X) = \mu_{1,\ldots,k}(X - \xi) - \sum_{m=1}^{k-1} (-1)^{m} \sum_{j_l} \prod_{l=1}^{m} \xi_{j_l}\, \mu_{1,\ldots,k-m}^{(-j_1,\ldots,-j_m)}(X) - (-1)^{k} \prod_{i=1}^{k} \xi_i, \qquad (4)$$
where µ_{1,...,k−m}^{(−j_1,...,−j_m)}(X) is the (k − m)-order moment not including the variables X_{j_1}, ..., X_{j_m}. This result is immediate from Lemma 1: substituting x_i = X_i, a_i = −ξ_i and n = k, taking expectations, applying Theorem 3.1 to the left hand side and solving for µ_{1,...,k}(X). For example, for k = 3 we obtain
$$E(X_1 X_2 X_3) = \xi_1 c_{23} + \xi_2 c_{13} + \xi_3 c_{12} + \xi_1 \xi_2 \xi_3,$$
where ξ = (ξ_1, ξ_2, ξ_3)' and c_{ij} = cov(X_i, X_j), i, j = 1, 2, 3. Similarly we can calculate E(X_1^3), E(X_2^3), E(X_3^3), E(X_1^2 X_2), E(X_1^2 X_3), E(X_1 X_2^2), E(X_1 X_3^2), E(X_2^2 X_3), E(X_2 X_3^2), so that we can proceed with k = 4. Equation (4) thus provides a sequential algorithm that is an alternative to equation (2) of Holmquist (1988).
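The recursion of equation (4) is straightforward to implement. The following sketch (ours; plain Python, reusing central_moment and pairings from the sketch after Theorem 3.1) computes any non-central moment and checks the k = 3 example above.

from itertools import combinations
from math import prod

def moment(C, xi, idx):
    # E[X_{i_1}...X_{i_k}] for X ~ N(xi, C); idx is a 0-based list of indices
    k = len(idx)
    if k == 0:
        return 1.0
    total = central_moment(C, idx)           # mu_{1,...,k}(X - xi), Theorem 3.1
    for m in range(1, k):                    # the double sum in equation (4)
        for drop in combinations(range(k), m):   # positions of j_1 < ... < j_m
            kept = [idx[p] for p in range(k) if p not in drop]
            total -= (-1)**m * prod(xi[idx[p]] for p in drop) * moment(C, xi, kept)
    total -= (-1)**k * prod(xi[i] for i in idx)
    return total

# k = 3 check against the closed form above:
C = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.3], [0.1, 0.3, 1.0]]
xi = [0.5, -1.0, 2.0]
lhs = moment(C, xi, [0, 1, 2])
rhs = xi[0]*C[1][2] + xi[1]*C[0][2] + xi[2]*C[0][1] + xi[0]*xi[1]*xi[2]
assert abs(lhs - rhs) < 1e-12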
Now we consider the multivariate complex normal distribution. The general case of the multivariate complex normal distribution is obtained when the complex random vector Z = (Z_1, ..., Z_k)', where Z_j = X_j + iY_j (j = 1, ..., k; i = √−1), is such that both random vectors X = (X_1, ..., X_k)' and Y = (Y_1, ..., Y_k)' follow k-variate normal distributions. In general, the distribution of Z depends on X, Y in such a way that the normal density is not expressible in terms of Z alone. Wooding (1956) introduced a special case, restricting the covariance matrix of X, Y, that generalizes all the basic statistical theory related to the real multivariate normal distribution. For more details on these topics see the above reference as well as Goodman (1963) and Khatri (1965). This distribution is also described in Johnson et al. (2000, page 222). In brief, if ξ, η are the means of X, Y, and V = E((X + iY)(X − iY)'), the density of Z can be written as
$$p(z) = \pi^{-k} |V|^{-1} \exp\{-(\bar{z} - \bar{\zeta})' V^{-1} (z - \zeta)\},$$
where z = x + iy, ζ = ξ + iη and the bar denotes the conjugate of a complex quantity (e.g. z̄ = x − iy); ζ and V are called the mean and variance of Z. Motivated by the multivariate higher order moments for the real case, we define the k-order moments for Z.

Definition 1. Let Z = (Z_1, ..., Z_k)' be a complex random k-vector and let the k-vectors X = (X_1, ..., X_k)', Y = (Y_1, ..., Y_k)' be such that Z_j = X_j + iY_j, j = 1, ..., k, and i = √−1. Then the k-order moments of Z
are defined to be
$$\mu_{1,\ldots,k}(Z) = E\left(\prod_{j=1}^{k} Z_j\right),$$
where not all the Z_j are necessarily distinct. Considering the central moments of order k and using Theorem 3.1 and Lemma 1, we have
$$\mu_{1,\ldots,k}(Z - \zeta) = \mu_{1,\ldots,k}(X - \xi) + i^{k} \mu_{1,\ldots,k}(Y - \eta) + \sum_{m=1}^{k-1} i^{m} \sum_{j_p \neq s} E\left(\prod_{p=1}^{m} (Y_{j_p} - \eta_{j_p}) \prod_{s \in J_{k-m}} (X_s - \xi_s)\right), \qquad (5)$$
where ξ = (ξ1 , . . . , ξk )0 and η = (η1 , . . . , ηk )0 . Now if k is odd, µ1,...,k (Z − ζ) = 0, since from (a) of Theorem 3.1 the expectation of any product of k normal variables is 0. If k is even, writing k = 2λ, each of the expectations of the right hand side of (5) is a sum of (2λ − 1)!/(2λ−1 (λ − 1)!) terms, as calculated from (b) of Theorem 3.1. Similarly as in the real case, we can calculate the non-central moments of Z, using again Lemma 1. These results give insight into the matrix moments calculation. As usual the random matrix (either real or complex) can be vectorized so that the above results are directly applied, see Ghazal and Neudecker (2000). APPENDIX A: PROOFS Proof of Theorem 3.1. The characteristic function of X is φ(t) = exp{−t0 Ct/2} with the expansion φ(t) =
APPENDIX A: PROOFS

Proof of Theorem 3.1. The characteristic function of X − ξ is φ(t) = exp{−t'Ct/2}, with the expansion
$$\phi(t) = \sum_{\nu=0}^{\infty} (-t'Ct/2)^{\nu}/\nu!. \qquad (6)$$
(a) is obvious from the symmetry and the quadratic form of φ(t). The proof of (b) is by induction. For λ = 1 it is clearly true, giving E((X_1 − ξ_1)(X_2 − ξ_2)) = c_{12}. Now for any λ ≥ 2 it is
$$\mu_{1,\ldots,2\lambda}(X - \xi) = \frac{1}{2^{\lambda} \lambda!} \left. \frac{\partial^{2\lambda} (t'Ct)^{\lambda}}{\partial t_1 \cdots \partial t_{2\lambda}} \right|_{t=0}, \qquad (7)$$
since the partial derivatives of order ν ≠ λ in equation (6) vanish at t = 0. From t'Ct = \sum_{i=1}^{2\lambda} \sum_{j=1}^{2\lambda} c_{ij} t_i t_j we have
$$\frac{1}{2^{\lambda} \lambda!} \frac{\partial (t'Ct)^{\lambda}}{\partial t_i} = \frac{2\lambda}{2^{\lambda} \lambda!}\, (t'Ct)^{\lambda-1} \sum_{j=1}^{2\lambda} c_{ij} t_j = \frac{(t'Ct)^{\lambda-1}}{2^{\lambda-1} (\lambda-1)!} \sum_{j=1}^{2\lambda} c_{ij} t_j.$$
Taking again the partial derivative of the above expression with respect to t_j (j ≠ i) we have
$$\frac{\partial}{\partial t_j} \left( \frac{(t'Ct)^{\lambda-1}}{2^{\lambda-1} (\lambda-1)!} \sum_{l=1}^{2\lambda} c_{il} t_l \right) = \frac{(t'Ct)^{\lambda-2}}{2^{\lambda-2} (\lambda-2)!} \left( \sum_{l=1}^{2\lambda} c_{il} t_l \right) \left( \sum_{m=1}^{2\lambda} c_{jm} t_m \right) + \frac{(t'Ct)^{\lambda-1}}{2^{\lambda-1} (\lambda-1)!}\, c_{ij}.$$
Without loss of generality assume i < j. By taking the (2(λ − 1))-order partial derivative of the last expression over t_1, ..., t_{i−1}, t_{i+1}, ..., t_{j−1}, t_{j+1}, ..., t_{2λ} around t = 0, we get
$$\mu_{1,\ldots,2\lambda}(X - \xi) = \sum_{j} c_{ij}\, \mu_{1,\ldots,2(\lambda-1)}^{(-i,-j)}(X - \xi),$$
where µ_{1,...,2(λ−1)}^{(−i,−j)}(X − ξ) is the (2(λ − 1))-order moment not involving one i and one j term. Hence the result follows by induction.

Proof of Lemma 1. The proof is by induction. For n = 2 it is (x_1 + a_1)(x_2 + a_2) = x_1 x_2 + a_2 x_1 + a_1 x_2 + a_1 a_2, which provides a direct validation. Now let the lemma be true for some n ≥ 2. It is
$$\prod_{i=1}^{n+1} (x_i + a_i) = (x_{n+1} + a_{n+1}) \left( \prod_{i=1}^{n} x_i + \sum_{m=1}^{n-1} \sum_{j_k \neq i} \prod_{k=1}^{m} a_{j_k} \prod_{i \in J_{n-m}} x_i + \prod_{i=1}^{n} a_i \right)$$
$$= \prod_{i=1}^{n+1} x_i + \sum_{m=1}^{n} \sum_{j_k \neq i} \prod_{k=1}^{m} a_{j_k} \prod_{i \in J_{n+1-m}} x_i + \prod_{i=1}^{n+1} a_i,$$
upon observing that
$$x_{n+1} \prod_{i \in J_{n-m}} x_i = \prod_{i \in J_{n+1-m}} x_i,$$
where m on the left hand side takes values 1, ..., n − 1, while m on the right hand side takes values 1, ..., n. Then the result follows by induction.

ACKNOWLEDGMENTS

I am grateful to Professor P.J. Harrison, who gave me some of his unpublished notes on the cumulants of the normal distribution. This research would not have been possible without those notes, together with his inspirational discussions.
REFERENCES

[1] Ghazal, G.A., and Neudecker, H., On second-order and fourth-order moments of jointly distributed random matrices: a survey, Linear Algebra and its Applications 321 (2000), 61-93.
[2] Goodman, N.R., Statistical analysis based on a certain multivariate complex Gaussian distribution (an introduction), Annals of Mathematical Statistics 34 (1963), 152-177.
[3] Holmquist, B., Moments and cumulants of the multivariate normal distribution, Stochastic Analysis and Applications 6 (1988), 273-278.
[4] Holmquist, B., Expectations of products of quadratic forms in normal variables, Stochastic Analysis and Applications 14 (1996), 149-164.
[5] Kendall, M.G., and Stuart, A., "The Advanced Theory of Statistics," Vol. 1, Griffin, London, 1963.
[6] Khatri, C.G., Classical statistical analysis based on a certain multivariate complex Gaussian distribution, Annals of Mathematical Statistics 36 (1965), 98-114.
[7] Johnson, N.L., Kotz, S., and Balakrishnan, N., "Continuous Multivariate Distributions," Vol. 1, 2nd ed., Wiley, New York, 2000.
[8] Pearson, K., and Young, A.W., On the product-moments of various orders of the normal correlation surface of two variates, Biometrika 12 (1918), 86-92.
[9] Wooding, R.A., The multivariate distribution of complex normal variables, Biometrika 43 (1956), 212-215.