The Annals of Statistics 1996, Vol. 24, No. 1, 255–286

REML ESTIMATION: ASYMPTOTIC BEHAVIOR AND RELATED TOPICS

BY JIMING JIANG

University of California, Berkeley

The restricted maximum likelihood (REML) estimates of dispersion parameters (variance components) in a general (non-normal) mixed model are defined as solutions of the REML equations. In this paper, we show that the REML estimates are consistent if the model is asymptotically identifiable and infinitely informative under the (location) invariant class, and are asymptotically normal (A.N.) if, in addition, the model is asymptotically nondegenerate. The result requires neither normality nor boundedness of the rank p of the design matrix of the fixed effects. Moreover, we give a necessary and sufficient condition for asymptotic normality of the Gaussian maximum likelihood estimates (MLE) in non-normal cases. As an application, we show that for all unconfounded balanced mixed models of the analysis of variance, the REML (ANOVA) estimates are consistent, and are also A.N. provided the models are nondegenerate; the MLE are consistent (A.N.) if and only if certain constraints on p are satisfied.

Received May 1994; revised May 1995.
AMS 1991 subject classification. 62F12.
Key words and phrases. Mixed models, restricted maximum likelihood, MLE, ANOVA, consistency, asymptotic normality.

1. Introduction. The restricted or residual maximum likelihood (REML) method was proposed by Thompson (1962) as a way of estimating dispersion parameters associated with linear models. Several authors have surveyed REML; their surveys are cited below. Although the REML method has been used and studied over the past 30 years, questions remain about how REML compares with other estimation methods. Some of these questions concern the asymptotic behavior of the REML estimates, especially when the rank $p$ of the design matrix of the fixed effects tends to infinity. In such cases it is well known, from the Neyman–Scott example [Neyman and Scott (1948)], that the maximum likelihood estimates (MLE) can be inconsistent. What can we say about the REML estimates? Under what conditions will the MLE be consistent (asymptotically normal)? Furthermore, can the REML estimates obtained under normality still perform well asymptotically in nonnormal cases? In particular, is it true that for balanced data the ANOVA estimates, which agree with solutions of the REML equations under normality, are always consistent even if normality does not hold and $p \to \infty$? These questions, along with others, will be investigated in this paper.

The REML method was put on a broad basis for unbalanced data by Patterson and Thompson (1971). Surveys of REML can be found in articles by Harville (1977), Khuri and Sahai (1985) and Robinson (1987), and in a recent book by Searle, Casella and McCulloch (1992). Different derivations of REML show that it may also be regarded as a method of marginal likelihood [Harville (1974) and Verbyla (1990)] or of modified profile likelihood [Barndorff-Nielsen (1983)]. Other areas in which REML has been used include the following: estimating smoothing parameters in penalized estimation [Wahba (1990); see Speed (1991) for discussion]; the estimation of parameters in ARMA processes and other time series in the presence of fixed effects [Cooper and Thompson (1977) and Azzalini (1984)]; REML estimation in spatial models [Green (1985) and Gleeson and Cullis (1987)]; the analysis of longitudinal data [Laird and Ware (1982)]; and REML estimation in empirical Bayes smoothing of the census undercount [Cressie (1992)].

Consider a general mixed model

$$y = X\beta + Z_1\alpha_1 + \cdots + Z_s\alpha_s + \epsilon, \qquad (1)$$

where $y$ is an $N \times 1$ vector of observations; $X$ is an $N \times p$ known matrix of full rank $p$; $\beta$ is a $p \times 1$ vector of unknown constants (the fixed effects); $Z_i$ is an $N \times m_i$ known matrix; $\alpha_i$ is an $m_i \times 1$ vector of i.i.d. random variables with mean 0 and variance $\sigma_i^2$ (the random effects), $i = 1, \dots, s$; and $\epsilon$ is an $N \times 1$ vector of i.i.d. random variables with mean 0 and variance $\sigma_0^2$ (the errors).

Asymptotic results for the mixed model (1) are few, with or without normality assumptions. Assuming normality, and assuming that the model has a standard ANOVA structure with $p = \mathrm{rank}(X)$ fixed, Miller (1977) considered the MLE of both the fixed effects and the variance components $\sigma_0^2, \sigma_1^2, \dots, \sigma_s^2$. He formulated a set of conditions under which consistency and asymptotic normality of a sequence of solutions of the likelihood equations were proved. He also noted that normalizing sequences of different orders of magnitude might be required for estimates of different parameters. Under conditions slightly stronger than Miller's (in particular, with normality and $p$ fixed), Das (1979) obtained a similar result for the REML estimates and found that, in his setting, the REML estimates and the MLE are in some sense equivalent. In a quite different direction, Speed (1986) proved that in the balanced case with $p = 1$ the usual ANOVA estimates of variance components are consistent without assuming normality. Also without normality, Westfall (1986) obtained asymptotic normality of the ANOVA estimates of variance components for unbalanced mixed models with a nested structure, and Brown (1976) proved asymptotic normality of C. R. Rao's MINQUE and of the so-called I-MINQUE [e.g., Rao and Kleffe (1988), Section 9.1] under a replicated error structure [e.g., Anderson (1973)]. Recently, the asymptotic behavior of the REML estimates was discussed by Cressie and Lahiri (1993) and by Richardson and Welsh (1994).
Normality was assumed in the first paper but not in the second, although the second was restricted to hierarchical (nested) models. However, $p$ was held fixed in both studies. It should be pointed out that when $p$ is fixed or bounded, the REML estimates and the MLE of the variance components are equivalent in the sense that their suitably normalized difference converges to zero in probability (so there is no essential asymptotic difference between the two estimates). It follows that the boundedness of $p$ is a serious restriction, and an important and interesting question regarding the (possible) superiority of REML over straight ML in estimating the variance components is how the REML estimates behave asymptotically when $p \to \infty$.

Our main goal is to establish a general theorem on the asymptotic behavior of the REML estimates of the dispersion parameters (variance components) in model (1) without any assumption on the structure of the model (such as balancedness, nestedness or ANOVA design), on the boundedness of $p$, or on normality. This may sound confusing, because the REML estimates are usually, if not always, defined under normality. One obvious approach is to use the true likelihood of $A'y$, where $A$ is an $N \times (N-p)$ matrix of full rank with $A'X = 0$. However, such a likelihood may not have a simple closed form. Furthermore, the estimates thus obtained may depend on $A$, which does not happen with the normal likelihood. An alternative is to treat the REML estimates as a kind of M-estimate [e.g., Huber (1981)], that is, as solutions of the REML equations, taking the nonnegativity constraints into account. We adopt the second definition, and treat the Gaussian MLE in nonnormal cases similarly.

Before giving the general result, we take a look at a simple case of (1), the balanced case. For balanced data the REML solutions are identical to the ANOVA estimates, whether or not normality is assumed [e.g., Searle, Casella and McCulloch (1992), page 253]. It is well known that for balanced data the ANOVA estimates are best quadratic unbiased estimates (best unbiased estimates under normality). However, it is not clear whether in all balanced cases the ANOVA estimates are consistent (asymptotically normal)
even without normality and possibly with $p \to \infty$, although one would expect such a result. Results covering special cases are available [see Speed (1986) and Westfall (1986)]. It is well known that, with $p \to \infty$, the MLE of variance components can be inconsistent, but it is also not clear under exactly what conditions the MLE will be consistent (asymptotically normal). In particular, is it true that, under normality and with $p$ fixed, the MLE are asymptotically normal and efficient, in the sense of attaining the Cramér–Rao lower bound, for all balanced mixed ANOVA models? Such results are, of course, to be expected, and simple examples have been discussed in, for example, Hartley and Rao (1967) and Miller (1977). We give complete answers to these questions. For all balanced mixed models of the analysis of variance, the REML (ANOVA) estimates are consistent, provided the models are not confounded and the variance components are positive; they are also asymptotically normal if, furthermore, the models are nondegenerate. The MLE are consistent (asymptotically normal) if and only if certain constraints on $p$ are satisfied. In particular, the answer to the last question, on efficiency, is positive.

The general result for model (1) turns out to be the following. The REML estimates are consistent if the model is asymptotically identifiable and infinitely informative under the invariant class (AI⁴); they are also asymptotically normal if, furthermore, the model is asymptotically nondegenerate (AND). We will make precise what AI⁴ and AND mean. A necessary and sufficient condition for asymptotic normality of the MLE is also given. The proof of the above result is based on a central limit theorem for quadratic forms of random variables, which seems to be of independent interest in limit theory.

In Section 2 we define our estimates. Notation is introduced in Section 3. Main results are given and explained in Section 4. Section 5 develops some central limit theorems for quadratic forms. Comments and remarks are made in Section 6, and proofs are given in Section 7.

2. Definitions of the estimates. Following Hartley and Rao (1967) (but using different notation), we consider the following parameters of variance components: $\lambda = \sigma_0^2$, $\mu_i = \sigma_i^2/\sigma_0^2$, $i = 1, \dots, s$. There is a one-to-one correspondence between the two sets of parameters $\lambda, \mu_i$, $1 \le i \le s$, and $\sigma_i^2$, $0 \le i \le s$, and all results we obtain in this paper for the first set of parameters have analogues for the second set. Therefore we focus on the first set of parameters. The parameter space is

$$\Theta = \{\theta : \lambda > 0,\ \mu_i \ge 0,\ i = 1, \dots, s\}, \qquad (2)$$

where $\theta = (\lambda, \mu_1, \dots, \mu_s)'$. Basic results, such as the derivation of the REML and the maximum likelihood (ML) equations, can be found in Searle, Casella and McCulloch (1992), Section 6. The REML equations under normality are equivalent to

$$z'V(A,\mu)^{-1}z = \lambda(N - p), \qquad (3)$$
$$z'V(A,\mu)^{-1}A'Z_iZ_i'A\,V(A,\mu)^{-1}z = \lambda\,\mathrm{tr}\bigl(Z_i'A\,V(A,\mu)^{-1}A'Z_i\bigr), \quad 1 \le i \le s, \qquad (4)$$

where $z = A'y$, $A$ is an $N \times (N-p)$ matrix with

$$\mathrm{rank}(A) = N - p, \qquad A'X = 0, \qquad (5)$$

and

$$V(A,\mu) = A'A + \sum_{i=1}^s \mu_i A'Z_iZ_i'A.$$

Note that (3) and (4) do not depend on the choice of $A$ as long as (5) is satisfied. In general (without assuming normality), the REML estimates of $\lambda, \mu_i$, $i = 1, \dots, s$, are defined as solutions of (3) and (4) that belong to $\Theta$, whenever such solutions exist.

REMARK 2.1. Although the REML equations (3) and (4) are derived by assuming $y \sim N(X\beta, \lambda V_\mu)$, where

$$V_\mu = I_N + \sum_{i=1}^s \mu_i Z_iZ_i',$$

the normal likelihood is not the only one that can lead to the REML equations. For example, suppose in (1) that $y$ has a multivariate-t distribution with $k$ degrees of freedom, $y \sim t_N(X\beta, \lambda V_\mu, k)$, where $Y \sim t_n(\mu, \Sigma, k)$ if $Y$ has density

$$p(y) = \frac{\Gamma\bigl((n+k)/2\bigr)}{(\pi k)^{n/2}\,\Gamma(k/2)}\,\det(\Sigma)^{-1/2}\Bigl[1 + \frac{1}{k}(y - \mu)'\Sigma^{-1}(y - \mu)\Bigr]^{-(n+k)/2}.$$
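The density above can be evaluated directly. The following is a minimal pure-Python sketch, not code from the paper: the function names `cholesky` and `mvt_logpdf` are ours, the location vector is passed as `loc`, and $\Sigma$ is assumed symmetric positive definite so that a Cholesky factor exists. Taking $\Sigma = \lambda V_\mu$ and location $X\beta$ would give the $t_N(X\beta, \lambda V_\mu, k)$ model of Remark 2.1.

```python
import math


def cholesky(S):
    """Lower-triangular L with S = L L' (S assumed symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("Sigma is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def mvt_logpdf(y, loc, S, k):
    """log p(y) for Y ~ t_n(loc, S, k) with k degrees of freedom."""
    n = len(y)
    L = cholesky(S)
    r = [yi - li for yi, li in zip(y, loc)]
    # Forward substitution L w = r, so that w'w = r' S^{-1} r.
    w = []
    for i in range(n):
        w.append((r[i] - sum(L[i][j] * w[j] for j in range(i))) / L[i][i])
    quad = sum(wi * wi for wi in w)
    logdet = 2.0 * sum(math.log(L[i][i]) for i in range(n))  # log det(S)
    return (math.lgamma((n + k) / 2) - math.lgamma(k / 2)
            - (n / 2) * math.log(math.pi * k)
            - 0.5 * logdet
            - ((n + k) / 2) * math.log1p(quad / k))


# With n = 1, S = [[1]] and k = 1 the density is the standard Cauchy density.
print(math.exp(mvt_logpdf([0.0], [0.0], [[1.0]], 1)))  # approximately 0.3183, i.e. 1/pi
```

Working on the log scale with `lgamma` and `log1p` avoids overflow of the gamma functions for large $n + k$.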

Such distributions have been used in multiple linear regression [e.g., Zellner (1976)]. It can be shown that under the multivariate-t distribution the likelihood of $A'y$ again leads to (3) and (4). Similarly, the MLE of both the fixed effects and $\lambda, \mu_i$, $i = 1, \dots, s$, are defined as solutions of the ML equations under normality, with the same constraints on $\lambda$ and the $\mu_i$'s. The ML equations are equivalent to

$$(X'V_\mu^{-1}X)\beta = X'V_\mu^{-1}y, \qquad (6)$$
$$z'V(A,\mu)^{-1}z = \lambda N, \qquad (7)$$
$$z'V(A,\mu)^{-1}A'Z_iZ_i'A\,V(A,\mu)^{-1}z = \lambda\,\mathrm{tr}\bigl(Z_i'V_\mu^{-1}Z_i\bigr), \quad 1 \le i \le s. \qquad (8)$$
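With no random effects ($s = 0$) we have $V(A,\mu) = A'A$, and $z'(A'A)^{-1}z = y'(I_N - P_X)y$ is the residual sum of squares RSS, because $A(A'A)^{-1}A'$ projects onto the orthogonal complement of the column space of $X$. Hence (3) gives $\hat\lambda = \mathrm{RSS}/(N-p)$ while (7) gives $\hat\lambda = \mathrm{RSS}/N$. The simulation sketch below applies this to the Neyman–Scott layout $y_{ij} = \mu_i + \epsilon_{ij}$, $j = 1, 2$, where $N = 2n$ and $p = n$; the seed, the range of the $\mu_i$, the sample size and the unit error variance are arbitrary choices of ours.

```python
import random


def neyman_scott_estimates(n, sigma2=1.0, seed=0):
    """Simulate y_ij = mu_i + e_ij (j = 1, 2); return the (ML, REML) estimates of sigma^2."""
    rng = random.Random(seed)
    rss = 0.0
    for _ in range(n):
        mu_i = rng.uniform(-10.0, 10.0)  # arbitrary fixed effect of pair i
        y1 = mu_i + rng.gauss(0.0, sigma2 ** 0.5)
        y2 = mu_i + rng.gauss(0.0, sigma2 ** 0.5)
        ybar = (y1 + y2) / 2
        rss += (y1 - ybar) ** 2 + (y2 - ybar) ** 2
    N, p = 2 * n, n
    return rss / N, rss / (N - p)  # roots of (7) and (3) when s = 0


ml, reml = neyman_scott_estimates(20000)
print(f"ML: {ml:.3f}  REML: {reml:.3f}")  # ML near 0.5, REML near 1.0
```

Since $p/N = 1/2$ here, the ML root settles near $\sigma^2/2$ however large $n$ is, while the REML root settles near $\sigma^2$.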

In this paper our interest is in the MLE of the variance-component parameters, namely $\lambda$ and the $\mu_i$'s, so in what follows the term MLE refers to the MLE of $\lambda$ and the $\mu_i$'s. It is seen from (6)–(8) that the MLE belong to the class of (location) invariant estimates (the invariant class)

$$\mathcal{I} = \{\text{estimates which are functions of } A'y \text{ with } A \text{ satisfying (5)}\} \qquad (9)$$

[see Rao and Kleffe (1988), Section 4.4]. Other estimates that belong to $\mathcal{I}$ include the ANOVA estimates of variance components in a mixed model [Henderson's method III; e.g., Searle, Casella and McCulloch (1992), Section 5] and some of the MINQE's [e.g., Rao and Kleffe (1988), Sections 5 and 9.1]. Note that the REML estimates are the MLE based on $A'y$. From this point of view, the REML method seems to lose no information in estimating the variance-component parameters, and there is reason to expect REML to behave well asymptotically, as will be seen next.

3. Notation. Let $A, B, A_1, \dots, A_s$ be matrices, and let $a_1, \dots, a_s$ be numbers. Define

$$\|A\| = \lambda_{\max}^{1/2}(A'A), \qquad \langle A, B\rangle_R = \mathrm{tr}(A'B), \qquad \|A\|_R = \mathrm{tr}^{1/2}(A'A),$$
$$\mathrm{cor}(A, B) = \frac{\langle A, B\rangle_R}{\|A\|_R\,\|B\|_R}, \quad A, B \ne 0;$$

$\mathrm{Cor}(A_1, \dots, A_s) = \bigl(\mathrm{cor}(A_i, A_j)\bigr)$ if $A_1, \dots, A_s \ne 0$, and is 0 otherwise; $A_{ll}$ is the $l$th diagonal element of $A$; $\mathrm{diag}(a_i)$ is the diagonal matrix with diagonal elements $a_i$, $i = 1, \dots, s$; $I_n$ and $1_n$ are the $n$-dimensional identity matrix and vector of all 1's, respectively. Let $\theta_0 = (\lambda_0, \mu_0')'$ be the true parameter vector, and let $p_i(N)$, $i = 0, \dots, s$, be sequences of positive numbers; write

$$b(\mu) = \bigl(I_N\ \ \sqrt{\mu_1}\,Z_1\ \cdots\ \sqrt{\mu_s}\,Z_s\bigr)',$$

$$V(\mu) = A\,V(A,\mu)^{-1}A', \qquad V_0(\mu) = b(\mu)\,V(\mu)\,b(\mu)', \qquad V_i(\mu) = b(\mu)\,V(\mu)\,Z_iZ_i'\,V(\mu)\,b(\mu)', \quad i = 1, \dots, s,$$
$$U_0 = \frac{I_N}{\lambda_0}, \qquad V_0 = \frac{I_{N-p}}{\lambda_0}, \qquad U_i = V_{\mu_0}^{-1/2}Z_iZ_i'V_{\mu_0}^{-1/2}, \qquad V_i = V(A,\mu_0)^{-1/2}A'Z_iZ_i'A\,V(A,\mu_0)^{-1/2}, \quad i = 1, \dots, s,$$
$$I_{ij}(N) = \frac{\mathrm{tr}(V_iV_j)}{p_i(N)\,p_j(N)}, \qquad I_{ij}^*(N) = \frac{\mathrm{tr}(U_iU_j)}{p_i(N)\,p_j(N)},$$
$$K_{ij}(N) = \frac{1}{p_i(N)\,p_j(N)}\sum_{l=1}^{N+m}\bigl(EW_{Nl}^4 - 3\bigr)\frac{V_i(\mu_0)_{ll}\,V_j(\mu_0)_{ll}}{\lambda_0^{1(i=0)+1(j=0)}}, \qquad i, j = 0, 1, \dots, s,$$

where $m = m_1 + \cdots + m_s$ and

$$W_{Nl} = \begin{cases} \epsilon_l/\sqrt{\lambda_0}, & 1 \le l \le N,\\[2pt] \alpha_{i,\,l-N-\sum_{k<i}m_k}\big/\sqrt{\lambda_0\mu_{0i}}, & N + \sum_{k<i}m_k + 1 \le l \le N + \sum_{k\le i}m_k,\ 1 \le i \le s. \end{cases}$$

Define $I_N(\theta_0) = (I_{ij}(N))$, $I_N^*(\theta_0) = (I_{ij}^*(N))$, $K_N(\theta_0) = (K_{ij}(N))$ and $J_N(\theta_0) = 2I_N(\theta_0) + K_N(\theta_0)$. The abbreviation w.p.→1 refers to "with probability tending to 1"; the abbreviation v.c., to "variance components."
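The trace-based quantities defined above are straightforward to compute. Below is a minimal pure-Python sketch of $\langle A, B\rangle_R$, $\|A\|_R$, $\mathrm{cor}$ and $\mathrm{Cor}$, with matrices stored as lists of rows; the function names are ours. It uses the identity $\mathrm{tr}(A'B) = \sum_{k,l} a_{kl}b_{kl}$, so no matrix product is formed.

```python
import math


def inner_R(A, B):
    """<A, B>_R = tr(A'B) = sum of elementwise products."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def norm_R(A):
    """||A||_R = tr^{1/2}(A'A), i.e. the Frobenius norm."""
    return math.sqrt(inner_R(A, A))


def cor(A, B):
    """cor(A, B) = <A, B>_R / (||A||_R ||B||_R), defined for nonzero A, B."""
    return inner_R(A, B) / (norm_R(A) * norm_R(B))


def Cor(*mats):
    """Cor(A_1, ..., A_s): matrix of pairwise cor values, or 0 if some A_i is zero."""
    if any(norm_R(M) == 0.0 for M in mats):
        return 0
    return [[cor(Ai, Aj) for Aj in mats] for Ai in mats]


I2 = [[1.0, 0.0], [0.0, 1.0]]
E11 = [[1.0, 0.0], [0.0, 0.0]]
print(Cor(I2, E11))  # unit diagonal (up to rounding), off-diagonal entries 1/sqrt(2)
```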

4. Main results. First we note that, in considering consistency and asymptotic normality of our estimates as $N \to \infty$, each $m_i$ can, without loss of generality, be regarded as a function of $N$. This is because such results hold if and only if they hold along every sequence with $N$ increasing strictly monotonically, and along such a sequence the $m_i$'s can readily be regarded as functions of $N$. (Note that $y$, $X$, the $Z_i$'s, $A$, etc., also depend on $N$.) The following assumptions A1 and A2 are made for model (1). Let $\alpha_0 = \epsilon$ and $m_0 = N$.

A1. For each $N$, $\alpha_0, \alpha_1, \dots, \alpha_s$ are mutually independent.

A2. For $0 \le i \le s$, the common distribution of $\alpha_{i1}, \dots, \alpha_{im_i}$ may depend on $N$. However, it is required that

$$\lim_{x\to\infty}\ \sup_N\ \max_{0\le i\le s} E\,\alpha_{i1}^4\,1(|\alpha_{i1}| > x) = 0. \qquad (10)$$

NOTE. If the common distribution of $\alpha_{i1}, \dots, \alpha_{im_i}$ is assumed not to depend on $N$, $0 \le i \le s$, as is usually the case, then (10) is equivalent to the existence of the fourth moments of $\alpha_{i1}$, $0 \le i \le s$. In general, (10) implies $\sup_N \max_{0\le i\le s} E\alpha_{i1}^4 < \infty$, but not conversely.

DEFINITION 4.1. We say model (1) has positive v.c. (p.v.c.) if the true parameter vector of v.c. is an interior point of $\Theta$, and that it is nondegenerate (ND) if

$$\inf_N\ \min_{0\le i\le s}\ \mathrm{var}(\alpha_{i1}^2) > 0. \qquad (11)$$

A sequence of estimates $\{(\hat\lambda_N, \hat\mu_{N1}, \dots, \hat\mu_{Ns})'\}$ is called asymptotically normal (AN) if there are sequences of positive numbers $p_i(N) \to \infty$, $0 \le i \le s$, and a sequence of matrices $\{M_N(\theta_0)\}$ such that $\limsup\bigl(\|M_N^{-1}(\theta_0)\| \vee \|M_N(\theta_0)\|\bigr) < \infty$ and

$$M_N(\theta_0)\bigl(p_0(N)(\hat\lambda_N - \lambda_0),\ p_1(N)(\hat\mu_{N1} - \mu_{01}),\ \dots,\ p_s(N)(\hat\mu_{Ns} - \mu_{0s})\bigr)' \to_L N(0, I_{s+1}). \qquad (12)$$

Two sequences $\{p_N\}$ and $\{q_N\}$ are called equivalent, denoted by $p_N \sim q_N$, if $0 < \liminf(p_N/q_N) \le \limsup(p_N/q_N) < \infty$.

4.1. The balanced case. A general balanced $r$-factor mixed model (of the analysis of variance) can be expressed (after possible reparametrization) in the form (1), where $X$ and the $Z_i$'s are Kronecker products [e.g., Searle, Casella and McCulloch (1992), Section 4.6; Rao and Kleffe (1988), pages 172–173]. By introducing indexes in $S_{r+1} = \{0, 1\}^{r+1}$, this can be written as

$$y = X\beta + \sum_{i\in S} Z_i\alpha_i + \epsilon, \qquad (13)$$

where $X = \bigotimes_{q=1}^{r+1} 1_{n_q}^{d_q}$ with $d = (d_1, \dots, d_{r+1}) \in S_{r+1}$, $Z_i = \bigotimes_{q=1}^{r+1} 1_{n_q}^{i_q}$ with $i = (i_1, \dots, i_{r+1}) \in S \subset S_{r+1}$, $1_n^0 = I_n$ and $1_n^1 = 1_n$. Hence $N = \prod_{q=1}^{r+1} n_q$, $p = \prod_{d_q = 0} n_q$ and $m_i = \prod_{i_q = 0} n_q$, $i \in S$ ($\prod_\emptyset \equiv 1$).

EXAMPLE 4.1. $y_{ijk} = \mu + a_i + b_j + c_{ij} + \epsilon_{ijk}$, $1 \le i \le I$, $1 \le j \le J$, $1 \le k \le K$, where $a$, $b$ and $c$ are random effects, with $c$ corresponding to the interaction (between the factors associated with $a$ and $b$). The model can be written as $y = (1_I \otimes 1_J \otimes 1_K)\mu + (I_I \otimes 1_J \otimes 1_K)a + (1_I \otimes I_J \otimes 1_K)b + (I_I \otimes I_J \otimes 1_K)c + \epsilon$. Thus $r = 2$, $n_1 = I$, $n_2 = J$, $n_3 = K$; $N = IJK$; $d = (1\,1\,1)$, $p = 1$; $S = \{(0\,1\,1), (1\,0\,1), (0\,0\,1)\}$, $m_{(0\,1\,1)} = I$, $m_{(1\,0\,1)} = J$, $m_{(0\,0\,1)} = IJ$. This model was discussed by Miller (1977), who showed that under normality and $I, J \to \infty$ (which implies $N \to \infty$ and $m_i \to \infty$, $i \in S$) the MLE are AN.
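The index bookkeeping behind (13) is mechanical. The sketch below (plain Python; the helper names and the encoding of indexes as 0/1 tuples are our assumptions) computes $N$, $p = m_d$ and the $m_i$ from the level counts $n_q$, and also the ratios that appear in condition (14) of Theorem 4.2, reading $u \vee v$ as the componentwise maximum and $m_{u,S} = \min_{v\in S_u} m_v$ (or 1 if $S_u = \emptyset$). For Example 4.1 it reproduces $p = 1$, $m_{(0\,1\,1)} = I$, $m_{(1\,0\,1)} = J$ and $m_{(0\,0\,1)} = IJ$.

```python
from math import prod


def m_of(u, n):
    """m_u = product of n_q over the coordinates with u_q = 0 (empty product = 1)."""
    return prod(nq for uq, nq in zip(u, n) if uq == 0)


def balanced_dims(n, d, S):
    """N, p and {i: m_i} for model (13) with level counts n, fixed index d, random indexes S."""
    return prod(n), m_of(d, n), {i: m_of(i, n) for i in S}


def condition_14_ratios(n, d, S):
    """p/N and, for each i in S, m_{i v d} m_{i v d, S} / m_i^2."""
    N, p, m = balanced_dims(n, d, S)
    ratios = {}
    for i in S:
        u = tuple(max(a, b) for a, b in zip(i, d))  # u = i v d, componentwise max
        S_u = [v for v in S if all(vq <= uq for vq, uq in zip(v, u))]
        m_uS = min(m_of(v, n) for v in S_u) if S_u else 1
        ratios[i] = m_of(u, n) * m_uS / m[i] ** 2
    return p / N, ratios


# Example 4.1 with I = J = 10 and K = 2.
print(balanced_dims((10, 10, 2), (1, 1, 1), [(0, 1, 1), (1, 0, 1), (0, 0, 1)]))
# (200, 1, {(0, 1, 1): 10, (1, 0, 1): 10, (0, 0, 1): 100})
```

As a further illustration of ours, a random effect nested within a fixed factor, $y_{ijk} = \beta_i + \alpha_{ij} + \epsilon_{ijk}$, is encoded as $d = (0\,1\,1)$, $S = \{(0\,0\,1)\}$; the ratio for $i = (0\,0\,1)$ then equals $1/n_2$, so under this reading (14) fails whenever the number of nested levels stays bounded. For the Neyman–Scott layout of Example 4.3 ($n = (n, 2)$, $d = (0\,1)$, $S = \emptyset$), $p/N = 1/2$ for every $n$.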


EXAMPLE 4.2. $y_{ijkl} = \beta_k^{(1)} + \beta_l^{(2)} + \alpha_i^{(1)} + \alpha_{ij}^{(2)} + \alpha_{ik}^{(3)} + \epsilon_{ijkl}$, $1 \le i \le a$, $1 \le j \le b$, $1 \le k \le c$, $1 \le l \le d$, where $\beta^{(1)}$ and $\beta^{(2)}$ are fixed main effects, and $\alpha^{(1)}$, $\alpha^{(2)}$ and $\alpha^{(3)}$ are random effects corresponding to a random main effect, a nested random factor and a fixed-by-random interaction, respectively. After reparametrization, namely, letting $\beta_{kl} = \beta_k^{(1)} + \beta_l^{(2)}$ and $\beta = (\beta_{kl})$, the model can be written as (13).

EXAMPLE 4.3 (Neyman–Scott problem). $y_{ij} = \mu_i + \epsilon_{ij}$, $1 \le i \le n$, $j = 1, 2$. This corresponds to (13) with $S = \emptyset$, $X = I_n \otimes 1_2$ and $p = n$. Neyman and Scott (1948) showed that, as $p \to \infty$, the MLE of $\sigma_\epsilon^2$ is inconsistent. However, the REML estimates are known to be AN [Hammerstrom (1978)].

EXAMPLE 4.4 (Random model). When $p = 1$, (13) is called a balanced random (effects) model. Speed (1986) proved the consistency of the ANOVA estimates in such a model without assuming normality.

EXAMPLE 4.5 (Nested design). A balanced nested or hierarchical model is (13) with $\{d\} \cup S$ being a completely ordered subset of $S_{r+1}$ (where $u \le v$ iff $u_q \le v_q$, $1 \le q \le r+1$, defines a partial order on $S_{r+1}$). Westfall (1986) showed that under certain conditions the ANOVA estimates are AN. The result did not require normality or balancedness, although $p$ was assumed to satisfy $p/N \to 0$ (so it did not cover Example 4.3).

The above examples are special cases of two general theorems, which we state next.

DEFINITION 4.2. A general mixed ANOVA model (not necessarily balanced) is called unconfounded if (i) the fixed effects are not confounded with the random effects and errors [i.e., $\mathrm{rank}(X, Z_i) > p$ for all $i$, and $X \ne I_N$], and (ii) the random effects and errors are not confounded [i.e., the matrices $I_N$ and $Z_iZ_i'$, $i \in S$, are linearly independent; e.g., Miller (1977)].

THEOREM 4.1. Let the balanced model (13) be unconfounded and have p.v.c. As $N \to \infty$ and $m_i \to \infty$, $i \in S$, the following hold:

(i) There exist w.p.→1 REML estimates $\hat\lambda_N$ and $\hat\mu_{Ni}$, $i \in S$, which are consistent, and the sequence $\bigl\{\bigl(\sqrt{N-p}\,(\hat\lambda_N - \lambda_0),\ (\sqrt{m_i}\,(\hat\mu_{Ni} - \mu_{0i}))_{i\in S}'\bigr)'\bigr\}$ is bounded in probability.

(ii) If, moreover, the model is ND, then the REML estimates in (i) are AN with $p_0(N) = \sqrt{N-p}$, $p_i(N) = \sqrt{m_i}$, $i \in S$, and $M_N(\theta_0) = J_N^{-1/2}(\theta_0)\,I_N(\theta_0)$.

REMARK 4.1. The conclusions also hold for the ANOVA estimates [e.g., Searle, Casella and McCulloch (1992), page 253].
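For the simplest balanced random model, $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$, $1 \le i \le a$, $1 \le j \le K$ (Example 4.4 with a single random factor), the equations (3)–(4) can be solved in closed form. Writing them in terms of the between- and within-group sums of squares yields $\hat\lambda = \mathrm{MSE}$ and $\hat\mu_1 = (\mathrm{MSA}/\mathrm{MSE} - 1)/K$, so that $\hat\lambda\hat\mu_1 = (\mathrm{MSA} - \mathrm{MSE})/K$ is the ANOVA estimate of $\sigma_1^2$. The sketch below is ours, not the paper's. It assumes balanced data and $\mathrm{MSA} \ge \mathrm{MSE}$; otherwise the root has $\hat\mu_1 < 0$, lies outside $\Theta$, and is therefore not a REML estimate in the sense of Section 2.

```python
def oneway_reml(groups):
    """Closed-form root of (3)-(4) for a balanced one-way random model.

    groups: a lists, each holding the K observations of one level.
    Returns (lam, mu1), with lam = sigma_0^2 and mu1 = sigma_1^2 / sigma_0^2.
    """
    a, K = len(groups), len(groups[0])
    if a < 2 or K < 2 or any(len(g) != K for g in groups):
        raise ValueError("need a balanced layout with a >= 2 levels and K >= 2 replicates")
    means = [sum(g) / K for g in groups]
    grand = sum(means) / a
    ssa = K * sum((m - grand) ** 2 for m in means)  # between-group sum of squares
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)  # within-group
    msa, mse = ssa / (a - 1), sse / (a * (K - 1))
    if mse == 0.0:
        raise ValueError("no within-group variation")
    lam = mse
    mu1 = (msa / mse - 1.0) / K
    if mu1 < 0.0:
        raise ValueError("root has mu1 < 0, outside the parameter space Theta")
    return lam, mu1


lam, mu1 = oneway_reml([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(lam, mu1, lam * mu1)  # lam * mu1 is the ANOVA estimate (MSA - MSE) / K of sigma_1^2
```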


REMARK 4.2. There is no restriction on $p$ in Theorem 4.1. For example, in Example 4.3, $N \to \infty$ iff $p \to \infty$.

Let $u, v \in S_{r+1}$; define $u \vee v = (u_1 \vee v_1, \dots, u_{r+1} \vee v_{r+1})$, $S_u = \{v \in S : v \le u\}$, $m_u = \prod_{u_q = 0} n_q$, and $m_{u,S} = \min_{v\in S_u} m_v$ if $S_u \ne \emptyset$, and $m_{u,S} = 1$ if $S_u = \emptyset$.

THEOREM 4.2. Let the balanced model (13) be unconfounded and have p.v.c. As $N \to \infty$ and $m_i \to \infty$, $i \in S$, the following hold:

(i) There exist w.p.→1 MLE which are consistent if and only if

$$\frac{p}{N} \to 0, \qquad \frac{m_{i\vee d}\,m_{i\vee d,S}}{m_i^2} \to 0, \quad i \in S. \qquad (14)$$

(ii) If, moreover, the model is ND, then there exist w.p.→1 MLE which are AN if and only if

$$p_0(N) \sim \sqrt{N-p}, \qquad p_i(N) \sim \sqrt{m_i}, \quad i \in S, \qquad (15)$$

and

$$\frac{p}{\sqrt N} \to 0, \qquad \frac{m_{i\vee d}\,m_{i\vee d,S}}{m_i^{3/2}} \to 0, \quad i \in S. \qquad (16)$$

When (16) is satisfied, the MLE are AN with the same $p_i(N)$, $i \in \{0\} \cup S$, and $M_N(\theta_0)$ as the REML estimates.

4.2. The general case. The assumption that a mixed ANOVA model is unconfounded is a natural requirement for the v.c. to be "identifiable." More generally, we have the following.

DEFINITION 4.3. A v.c. model

$$Y = (Y_1, \dots, Y_N)' = X\beta + \epsilon, \qquad (17)$$

where $E\epsilon = 0$ and $\mathrm{Var}(\epsilon) = \Sigma(\theta) = \theta_1\Sigma_1 + \cdots + \theta_r\Sigma_r$, is said to be identifiable of its v.c. (ID) if the matrices $\Sigma_1, \dots, \Sigma_r$ are linearly independent.

Note that our definition of identifiability is equivalent to requiring that every parameter $\theta_i$, $1 \le i \le r$, be identifiable in the sense of Rao and Kleffe [(1988), Section 4.2]. Let $A$ be a matrix. Then

$$A'Y = A'X\beta + A'\epsilon \qquad (18)$$

is again a v.c. model like (17).

DEFINITION 4.4. Model (17) is said to be identifiable of its v.c. under the invariant class (IDI) if model (18) is ID for some $N \times (N-p)$ matrix $A$ [$p = \mathrm{rank}(X)$] satisfying (5).

It is clear that model (17) is IDI iff (18) is ID for every $A$ satisfying (5), iff $A'\Sigma_1A, \dots, A'\Sigma_rA$ are linearly independent for every (or some) $A$ satisfying (5). Now consider the general mixed model (1).

LEMMA 4.1. Model (1) is IDI iff $\lambda_{\min}\bigl(\mathrm{Cor}(V_0, V_1, \dots, V_s)\bigr) > 0$.
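For nonzero matrices, $\mathrm{Cor}$ is a normalized Gram matrix, so $\lambda_{\min}(\mathrm{Cor}) > 0$ exactly when the matrices are linearly independent. With two matrices, $\mathrm{Cor} = \begin{pmatrix}1 & c\\ c & 1\end{pmatrix}$ and $\lambda_{\min} = 1 - |c|$. The sketch below is our illustration, not the paper's. It applies this criterion to the unconfoundedness condition (ii) of Definition 4.2 for a one-way layout with $a$ levels and $K$ replicates, checking $I_N$ against $Z_1Z_1'$ with $Z_1 = I_a \otimes 1_K$. Lemma 4.1 itself applies the same computation to $V_0, \dots, V_s$, which we do not form here. A short calculation gives $\mathrm{cor}(I_N, Z_1Z_1') = 1/\sqrt K$, so the check fails only when $K = 1$, that is, when $Z_1Z_1' = I_N$.

```python
import math


def frob(A, B):
    """<A, B>_R = tr(A'B)."""
    return sum(x * y for ra, rb in zip(A, B) for x, y in zip(ra, rb))


def lambda_min_cor2(A, B):
    """Smallest eigenvalue of Cor(A, B) = [[1, c], [c, 1]], namely 1 - |c|."""
    c = frob(A, B) / math.sqrt(frob(A, A) * frob(B, B))
    return 1.0 - abs(c)


def oneway_pair(a, K):
    """I_N and Z Z' for Z = I_a (x) 1_K, with N = a K."""
    N = a * K
    eye = [[1.0 if r == s else 0.0 for s in range(N)] for r in range(N)]
    zz = [[1.0 if r // K == s // K else 0.0 for s in range(N)] for r in range(N)]  # same group
    return eye, zz


for K in (1, 2, 4):
    print(K, lambda_min_cor2(*oneway_pair(3, K)))  # equals 1 - 1/sqrt(K)
```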

Note that $\mathrm{Cor}(V_0, V_1, \dots, V_s)$ does not depend on the choice of $A$ as long as (5) holds. In considering the asymptotic behavior of our estimates, we need model (1) to be IDI in an asymptotic sense. Lemma 4.1 suggests the following definition.

DEFINITION 4.5. We say model (1) is asymptotically identifiable (of its v.c.) under the invariant class, abbreviated AI², at $\theta_0$ if $\liminf \lambda_{\min}\bigl(\mathrm{Cor}(V_0, V_1, \dots, V_s)\bigr) > 0$.

We now take another look at the property AI² by returning to (12). A feature of this definition is that different normalizing sequences (NS) are used for estimates of different parameters. The necessity of this was noted by Miller (1977). Harville (1977) described Miller's NS as "the effective number of levels for the ith random factor (i = 1, ..., c)." Searle, Casella and McCulloch [(1992), page 240] questioned how, in general, the NS should be chosen and asked "what is meant by 'sample size tending to infinity'." We have seen that in the balanced case there is virtually no other choice of NS (see Theorem 4.2). Now we consider the problem from another point of view. Let $\hat\theta_N \in \mathcal{I}$ in (9) satisfy (12). The asymptotic covariance matrix of $\hat\theta_N$ is

$$V_{\hat\theta_N} = \mathrm{diag}\Bigl(\frac{1}{p_i(N)}\Bigr)\bigl(M_N(\theta_0)'M_N(\theta_0)\bigr)^{-1}\mathrm{diag}\Bigl(\frac{1}{p_i(N)}\Bigr).$$

If we want our estimates to be efficient in some sense, we would like $V_{\hat\theta_N}$ not to be too far from the Cramér–Rao lower bound $I^{(N)}(\theta_0)^{-1}$, where

$$I^{(N)}(\theta_0) = \Bigl(-E_{\theta_0}\Bigl[\frac{\partial^2 L_N}{\partial\theta_i\,\partial\theta_j}\Bigr|_{\theta_0}\Bigr]\Bigr)$$

($L_N$ is the log-likelihood of $A'y$); that is, there exist bounds $\delta, M > 0$ such that $\delta I^{(N)}(\theta_0) \le V_{\hat\theta_N}^{-1} \le M I^{(N)}(\theta_0)$, which, under normality, leads to the following requirement on the NS $p_i(N)$'s:

$$0 < \liminf \lambda_{\min}(I_N(\theta_0)) \le \limsup \lambda_{\max}(I_N(\theta_0)) < \infty, \qquad (19)$$

where $I_N(\theta_0)$ is as in Section 3 [see Miller (1977), Assumption 3.5]. That (19) is closely related to AI² is seen in the following lemma.


LEMMA 4.2. The following are equivalent:
(i) There are sequences of positive numbers $p_i(N) \to \infty$, $0 \le i \le s$, such that (19) holds;
(ii) $\|V_i\|_R \to \infty$, $0 \le i \le s$, and the model is AI² at $\theta_0$.
In fact, whenever (i) holds, we must have $p_i(N) \sim \|V_i\|_R$, $0 \le i \le s$.

The quantities $\|V_i\|_R$ can be interpreted intuitively. Under normality,

$$\frac12\|V_i\|_R^2 = -E_{\theta_0}\Bigl(\frac{\partial^2 L_N}{\partial\theta_i^2}\Bigr|_{\theta_0}\Bigr),$$

which is the information that $A'y$ contains about the true parameter $\theta_{0i}$, $0 \le i \le s$. This leads to the following definition.

DEFINITION 4.6. Model (1) is called infinitely informative (about its v.c.) under the invariant class at $\theta_0$ if $\lim \|V_i\|_R = \infty$, $0 \le i \le s$.

The main theorem is now stated as follows.

THEOREM 4.3. Consider a general mixed model (1) having p.v.c.

(i) If the model is asymptotically identifiable and infinitely informative under the invariant class at $\theta_0$, then there exist w.p.→1 REML estimates $\hat\lambda_N$ and $\hat\mu_{Ni}$, $1 \le i \le s$, which are consistent, and the sequence

$$\Bigl\{\bigl(\sqrt{N-p}\,(\hat\lambda_N - \lambda_0),\ \|V_1\|_R(\hat\mu_{N1} - \mu_{01}),\ \dots,\ \|V_s\|_R(\hat\mu_{Ns} - \mu_{0s})\bigr)'\Bigr\}$$

is bounded in probability.

(ii) If, moreover, the model is ND, then the REML estimates in (i) are AN with $p_0(N) = \sqrt{N-p}$, $p_i(N)$ being any sequence $\sim \|V_i\|_R$, $1 \le i \le s$, and $M_N(\theta_0) = J_N^{-1/2}(\theta_0)\,I_N(\theta_0)$.
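By (12), the normalized error vector $\bigl(p_0(N)(\hat\lambda_N - \lambda_0), \dots, p_s(N)(\hat\mu_{Ns} - \mu_{0s})\bigr)'$ has approximate covariance $(M_N'M_N)^{-1}$. With $M_N = J_N^{-1/2}I_N$ as in Theorem 4.3(ii) and $J_N = 2I_N + K_N$ from Section 3, this equals the sandwich $I_N^{-1}J_NI_N^{-1}$. The plain-Python sketch below (our illustration, with a small Gauss–Jordan inverse) forms it from given numerical values of $I_N(\theta_0)$ and $K_N(\theta_0)$. How those values are obtained is outside the sketch, and the $2 \times 2$ inputs are made up. Under normality $EW_{Nl}^4 = 3$, so $K_N = 0$ and the result reduces to $2I_N^{-1}$.

```python
def mat_inv(M):
    """Inverse of a small nonsingular square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))  # partial pivoting
        if aug[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        aug[col], aug[piv] = aug[piv], aug[col]
        scale = aug[col][col]
        aug[col] = [x / scale for x in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def sandwich_cov(I, K):
    """I^{-1} J I^{-1} with J = 2 I + K: covariance of the p_i(N)-normalized errors."""
    J = [[2 * a + b for a, b in zip(ri, rk)] for ri, rk in zip(I, K)]
    I_inv = mat_inv(I)
    return mat_mul(mat_mul(I_inv, J), I_inv)


I_N = [[2.0, 1.0], [1.0, 3.0]]  # made-up values for illustration
print(sandwich_cov(I_N, [[0.0, 0.0], [0.0, 0.0]]))  # equals 2 * I_N^{-1} when K_N = 0
```

Approximate standard errors for the unnormalized estimates then follow by dividing the square roots of the diagonal entries by the corresponding $p_i(N)$, as in the expression for $V_{\hat\theta_N}$ given earlier in this section.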

ABBREVIATION. We use AI⁴ for "asymptotically identifiable and infinitely informative under the invariant class."

NOTE. A necessary and sufficient condition for AI⁴ is given by Lemma 4.2(i). In particular, all balanced mixed models (13) are AI⁴, provided the models are unconfounded and have p.v.c., and $N \to \infty$, $m_i \to \infty$, $i \in S$ (see the proof of Theorem 4.1).

THEOREM 4.4. Consider a general mixed model (1) having p.v.c.

(i) For the MLE to exist w.p.→1 and be consistent, it is necessary that

$$\frac{p}{N} \to 0, \qquad \frac{\mathrm{tr}(C_i(\mu_0))}{m^*} \to 0, \quad 1 \le i \le s, \qquad (20)$$

where $C_i(\mu_0) = Z_i'\bigl(V_{\mu_0}^{-1} - V(\mu_0)\bigr)Z_i$ and $m^* = \max_{1\le i\le s} m_i$.

(ii) If, moreover, the model is ND, then the following are equivalent:

(a) There exist w.p.→1 MLE $\hat\lambda_N^*$ and $\hat\mu_{Ni}^*$, $1 \le i \le s$, which are AN with $p_i(N)$, $0 \le i \le s$, satisfying

$$0 < \liminf \lambda_{\min}(I_N^*(\theta_0)) \le \limsup \lambda_{\max}(I_N^*(\theta_0)) < \infty; \qquad (21)$$

(b) The model is AI⁴ at $\theta_0$ and

$$\frac{p}{\sqrt N} \to 0, \qquad \frac{\mathrm{tr}(C_i(\mu_0))}{\|V_i\|_R} \to 0, \quad 1 \le i \le s. \qquad (22)$$

In either case, the MLE and the REML estimates in Theorem 4.3 are equivalent in the sense that they are AN with the same $p_i(N)$, $0 \le i \le s$, and $M_N(\theta_0)$ as in Theorem 4.3(ii), and

$$\bigl(\sqrt{N-p}\,(\hat\lambda_N^* - \hat\lambda_N),\ p_1(N)(\hat\mu_{N1}^* - \hat\mu_{N1}),\ \dots,\ p_s(N)(\hat\mu_{Ns}^* - \hat\mu_{Ns})\bigr)' \to 0 \qquad (23)$$

in probability.

Condition (21) is implied, for example, by Miller [(1977), Assumption 3.5], which also shows the dependence of Miller's NS on $\theta_0$ and the relation between the two sets of parameters. See also Das (1979). The assumption ND in Theorems 4.3(ii) and 4.4(ii) can be weakened to the following condition (24), called asymptotic nondegeneracy (AND):

$$\liminf \lambda_{\min}(J_N(\theta_0)) > 0, \qquad (24)$$

where $J_N(\theta_0)$ is given in Section 3 with $p_i(N) = \|V_i\|_R$, $0 \le i \le s$. It can also be shown that, under a condition weaker than (22), the MLE exist w.p.→1 and are consistent.

5. Some central limit theorems for quadratic forms. The proof of our main theorem is based on a central limit theorem for quadratic forms of random variables (r.v.'s). For each $n$, let $X_{n1}, \dots, X_{nk_n}$ be independent with mean 0, and let $A_n = (a_{nij})_{1\le i,j\le k_n}$ be symmetric. There have been studies of central (noncentral) limit theorems for the quadratic form $X_n'A_nX_n$, where $X_n = (X_{n1}, \dots, X_{nk_n})'$. Some of the results are for special kinds of r.v.'s [e.g., Guttorp and Lockhart (1988)], for $A_n$ with a special structure [e.g., Fox and Taqqu (1985)], or under the assumption that $a_{nii} = 0$, $1 \le i \le k_n$ [e.g., de Jong (1987)]. A general theorem was given in Schmidt and Thrum (1981) and was extended by Rao and Kleffe [(1988), Theorem 2.5.2]. However, as was pointed out by Rao and Kleffe [(1988), page 51], "the application of (the theorem) might be limited as it is essentially based on the assumption that the off diagonal blocks of $A_n$ tend to zero." Such results could be used for models with replicated error structure [e.g., Anderson (1973) and Brown (1976)], but not for the general model (1).

We will state two theorems. The first removes the unpleasant restriction noted by Rao and Kleffe; the second extends the first. The results can be extended to the vector case considered by Schmidt and Thrum (1981) and Rao and Kleffe (1988). Extension to the case where the $X_{ni}$ are martingale differences is also possible. We begin with some simple examples.

EXAMPLE 5.1. If $X_{n1}, \dots, X_{nk_n}$ are $N(0, 1)$ distributed, then a necessary and sufficient condition for

$$\frac{X_n'A_nX_n - EX_n'A_nX_n}{\mathrm{var}(X_n'A_nX_n)^{1/2}} \to_L N(0, 1) \qquad (25)$$

is that

$$\frac{\lambda_{\max}(A_n^2)}{\mathrm{tr}(A_n^2)} \to 0. \qquad (26)$$
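Condition (26) compares the largest squared eigenvalue of $A_n$ with the sum of all squared eigenvalues. The sketch below is our illustration. It estimates $\lambda_{\max}(A_n^2)$ by power iteration on the positive semidefinite matrix $A_n^2$, a generic numerical technique adequate for small examples; the iteration count and start vector are arbitrary choices. For $A_n = I_n$ the ratio is $1/n \to 0$. For the all-ones matrix $A_n = 1_n1_n'$ it equals 1 for every $n$. This matches the Gaussian picture: $X_n'1_n1_n'X_n = (\sum_i X_{ni})^2$ is $n$ times a $\chi^2_1$ variable, which is not asymptotically normal.

```python
def lambda_max_sq(A, iters=200):
    """Estimate lambda_max(A^2) for symmetric A by power iteration on A^2."""
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic start vector
    for _ in range(iters):
        w = [sum(A2[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Av = [sum(A2[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, Av))  # Rayleigh quotient (v has unit length)


def ratio_26(A):
    """lambda_max(A^2) / tr(A^2); for symmetric A, tr(A^2) is the sum of squared entries."""
    tr_A2 = sum(x * x for row in A for x in row)
    return lambda_max_sq(A) / tr_A2


for n in (5, 50):
    eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    ones = [[1.0] * n for _ in range(n)]
    print(n, ratio_26(eye), ratio_26(ones))  # about 1/n and about 1
```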

EXAMPLE 5.2. Let $A_n = I_n$, $P(X_{ni} = -1) = P(X_{ni} = 1) = 1/2 - 1/n$, $P(X_{ni} = -\sqrt2) = P(X_{ni} = \sqrt2) = 1/(2n)$ and $P(X_{ni} = 0) = 1/n$, $1 \le i \le n$. Using the Lindeberg–Feller theorem, it is easy to show that (25) does not hold, although (26) is satisfied.

The situation in Example 5.2 is extreme because the random variables are "asymptotically degenerate." Such cases must be excluded if one attempts to generalize the result of Example 5.1. Let $A_n^0 = A_n - \mathrm{diag}(a_{nii})$ and $\mathcal{A}_n = \{1 \le i \le k_n : a_{nii} \ne 0\}$.

THEOREM 5.1. Suppose

$$\inf_n\Bigl(\min_{1\le i\le k_n}\mathrm{var}(X_{ni}) \wedge \min_{i\in\mathcal{A}_n}\mathrm{var}(X_{ni}^2)\Bigr) > 0, \qquad (27)$$
$$\sup_n\Bigl(\max_{1\le i\le k_n}EX_{ni}^2\,1(|X_{ni}| > x) \vee \max_{i\in\mathcal{A}_n}EX_{ni}^4\,1(|X_{ni}| > x)\Bigr) \to 0 \qquad (28)$$

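as $x \to \infty$. Then (26) implies (25).

Let $\{L_{ni}, 1 \le i \le k_n, n \ge 1\}$ be numbers; define

$$\gamma_{ni}^{(1)} = EX_{ni}^4\,1(|X_{ni}| \le L_{ni}), \qquad \gamma_{ni}^{(2)} = E(X_{ni}^2 - 1)^4\,1(|X_{ni}| \le L_{ni}),$$
$$\delta_{ni}^{(1)} = EX_{ni}^2\,1(|X_{ni}| > L_{ni}), \qquad \delta_{ni}^{(2)} = E(X_{ni}^2 - 1)^2\,1(|X_{ni}| > L_{ni});$$
$$\gamma_{nij} = \begin{cases}\gamma_{ni}^{(1)}\gamma_{nj}^{(1)}, & \text{if } i \ne j,\\ \gamma_{ni}^{(2)}, & \text{if } i = j,\end{cases} \qquad \delta_{nij} = \begin{cases}\tfrac12\bigl(\delta_{ni}^{(1)} + \delta_{nj}^{(1)}\bigr), & \text{if } i \ne j,\\ \delta_{ni}^{(2)}, & \text{if } i = j \in \mathcal{A}_n,\\ 0, & \text{otherwise.}\end{cases}$$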
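To make the truncated moments concrete, the sketch below evaluates $\gamma_{ni}^{(1)}$ and $\delta_{ni}^{(1)}$, together with $EX_{ni}^2$ and $\mathrm{var}(X_{ni}^2)$, exactly with Python's `fractions` module for the law of Example 5.2. That law puts mass only on $X^2 \in \{0, 1, 2\}$. Encoding it by the distribution of $X^2$, and passing $L_{ni}^2$ instead of $L_{ni}$, are our own choices; so is the truncation level $L_{ni}^2 = 3/2$. The output shows $EX_{ni}^2 = 1$ but $\mathrm{var}(X_{ni}^2) = 2/n \to 0$, which is the degeneracy that condition (27) rules out.

```python
from fractions import Fraction


def example_52_law_of_square(n):
    """Law of X^2 under Example 5.2, as {value of X^2: probability}."""
    n = Fraction(n)
    return {Fraction(0): 1 / n, Fraction(1): 1 - 2 / n, Fraction(2): 1 / n}


def mean_and_var_of_square(law_sq):
    """E X^2 and var(X^2) computed from the law of X^2."""
    m2 = sum(p * x2 for x2, p in law_sq.items())
    m4 = sum(p * x2 * x2 for x2, p in law_sq.items())
    return m2, m4 - m2 ** 2


def truncated_moments(law_sq, L_sq):
    """gamma^(1) = E X^4 1(|X| <= L) and delta^(1) = E X^2 1(|X| > L), given L^2."""
    gamma1 = sum(p * x2 * x2 for x2, p in law_sq.items() if x2 <= L_sq)
    delta1 = sum(p * x2 for x2, p in law_sq.items() if x2 > L_sq)
    return gamma1, delta1


for n in (10, 100, 1000):
    law = example_52_law_of_square(n)
    print(n, *mean_and_var_of_square(law), *truncated_moments(law, Fraction(3, 2)))
```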


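THEOREM 5.2. Suppose $EX_{ni}^2 = 1$, $1 \le i \le k_n$, and there are numbers $\{L_{ni}, 1 \le i \le k_n, n \ge 1\}$ such that

$$\frac{1}{s_n^4}\sum_{i,j=1}^{k_n}a_{nij}^4\gamma_{nij} + \frac{1}{s_n^2}\sum_{i,j=1}^{k_n}a_{nij}^2\delta_{nij} \to 0, \qquad (29)$$
$$\frac{1}{s_n^4}\sum_{i=1}^{k_n}\Bigl(\sum_{j\ne i}a_{nij}^2\Bigr)^2\gamma_{ni}^{(1)} \to 0, \qquad (30)$$

where $s_n^2 = \mathrm{var}(X_n'A_nX_n)$. Then (25) is true provided

$$\frac{\lambda_{\max}\bigl((A_n^0)^2\bigr)}{s_n^2} \to 0. \qquad (31)$$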

In particular, with $a_{nii} = 0$, $1 \le i \le k_n$, we get Theorem 5.2 of de Jong (1987) under a slightly weaker assumption. The following lemma plays a crucial role in the proof of Theorem 5.2, and hence of Theorem 5.1.

LEMMA 5.1 (A lemma of linear algebra). Let $B = (b_{ij}1(i > j))$ be a lower triangular matrix. Then

$$\mathrm{tr}\bigl((B'B)^2\bigr) \le 2\,\lambda_{\max}\bigl((B' + B)^2\bigr)\,\mathrm{tr}(B'B). \qquad (32)$$
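Inequality (32) is easy to probe numerically. The plain-Python sketch below is our illustration and no substitute for the proof. It evaluates both sides for a given strictly lower triangular $B$, estimating $\lambda_{\max}((B' + B)^2)$ by power iteration on that positive semidefinite matrix. A Rayleigh quotient never exceeds $\lambda_{\max}$, so the reported right-hand side can only be too small, which makes a passing comparison conservative. The random test matrices, seed and sizes are arbitrary.

```python
import random


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def lambda_max_psd(C, iters=500):
    """Power-iteration (Rayleigh quotient) estimate of the top eigenvalue of a PSD matrix."""
    n = len(C)
    v = [1.0 + 0.1 * i for i in range(n)]
    for _ in range(iters):
        w = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    Cv = [sum(C[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(x * y for x, y in zip(v, Cv))


def lemma_51_sides(B):
    """Return (tr((B'B)^2), 2 lambda_max((B'+B)^2) tr(B'B))."""
    Bt = transpose(B)
    BtB = matmul(Bt, B)
    S = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(Bt, B)]  # B' + B
    lhs = trace(matmul(BtB, BtB))
    rhs = 2.0 * lambda_max_psd(matmul(S, S)) * trace(BtB)
    return lhs, rhs


rng = random.Random(1)
for n in (2, 4, 8):
    B = [[rng.gauss(0.0, 1.0) if i > j else 0.0 for j in range(n)] for i in range(n)]
    lhs, rhs = lemma_51_sides(B)
    print(n, round(lhs, 3), round(rhs, 3), lhs <= rhs)
```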

6. Discussion. 6.1. As in many maximum-likelihood-related problems the solution or root of the REML ŽML. equations sometimes presents difficulties. Theorems 4.1 and 4.3 ensure the existence of a consistent sequence of roots ŽCSR. of the REML equations and asymptotic normality of any sequence of roots such that ˆN y l0 ., p1Ž N .Ž m Ž p 0 Ž N .Ž l ˆ N 1 y m 01 ., . . . , ps Ž N .Ž m ˆ N s y m 0 s ..X 4 is bounded in probability Žsee the proofs of the theorems.. However, the theorems do not provide a way of identifying such a sequence when the roots are not unique. In other words, the established theorems are results of Cramer ´ type w e.g., Miller Ž1977.x . Some methods were proposed in the literature to overcome this difficulty w e.g., Lehmann Ž1983.x . These methods basically require that some sequence of consistent Žbut not necessarily AN. estimates be available. A candidate of such estimates in our cases is Rao’s MINQE, asymptotic properties of which are discussed in Rao and Kleffe wŽ 1988., Section 10x . In some cases, the uniqueness of the roots can be ensured. For example, under certain conditions the ANOVA estimates are uniquely defined w e.g., Westfall Ž1986.x . Since in the balanced case solutions of the REML equations are identical to the ANOVA estimates, these conditions also guarantee the
uniqueness of the REML estimates in the balanced case. Necessary and sufficient conditions for the existence of a (unique) explicit solution of the ML equations in a balanced mixed model of the analysis of variance are given by Szatrowski and Miller (1980). General sufficient conditions for the uniqueness of solutions of the ML equations can be found in Mäkeläinen, Schmidt and Styan (1981). The identification of a CSR is a problem of both theoretical and practical interest, not only in mixed model analysis but also in the wide range of areas where M-estimates [Huber (1981)] are involved. One may always ask, for example, whether the solutions that maximize the Gaussian likelihood form a CSR, even though the Gaussian likelihood is not necessarily the true likelihood. Note that all the REML estimates proved to exist in this paper are at least local maxima of the Gaussian likelihood of $A'y$. Finally, Theorems 4.2 and 4.4 give necessary and sufficient conditions for the existence of a CSR of the ML equations (and for the asymptotic normality of such a sequence). When these conditions are violated, no sequence of roots of the ML equations can be consistent (AN).

6.2. From Theorems 4.3 and 4.4 we see that the asymptotic covariance matrix of both $\hat\theta_N = (\hat\lambda_N, \hat\mu_{N1}, \ldots, \hat\mu_{Ns})'$ and $\hat\theta_N^* = (\hat\lambda_N^*, \hat\mu_{N1}^*, \ldots, \hat\mu_{Ns}^*)'$ is $\tilde V(\theta_0) = \tilde I_N(\theta_0)^{-1}\tilde J_N(\theta_0)\tilde I_N(\theta_0)^{-1}$, where $\tilde I_N(\theta_0) = (\mathrm{tr}(V_iV_j))$ and $\tilde J_N(\theta_0) = 2\tilde I_N(\theta_0) + \tilde K_N(\theta_0)$, with

$$\tilde K_N(\theta_0) = \left(\sum_{l=1}^{N+m}(EW_{Nl}^4 - 3)\,V_i(\mu_0)_{ll}\,V_j(\mu_0)_{ll}\big/\lambda_0^{1(i=0)+1(j=0)}\right).$$

Thus one can construct approximate confidence intervals for the variance components. Under normality [in which case $\tilde V(\theta_0) = 2\tilde I_N(\theta_0)^{-1}$, the inverse of the restricted information matrix] and the conditions $p/N \to 0$ and $\mathrm{tr}(C_i(\mu_0))/\|V_i\|_R^2 \to 0$, $1 \le i \le s$ [which imply $2\tilde I_N(\theta_0)^{-1} \sim 2\tilde I_N^*(\theta_0)^{-1}$, the inverse of the (unrestricted) information matrix; see, e.g., Searle, Casella and McCulloch (1992), Section 6], the REML estimates are efficient in the sense of asymptotically attaining the Cramér–Rao lower bound [e.g., Miller (1977)]. By Theorem 4.4(ii) and a discussion similar to that for (19), the MLE are efficient in the same sense if and only if (22) holds. In particular, for all balanced mixed models of the analysis of variance with $p$ fixed, both the REML estimates and the MLE are efficient. However, efficiency in the non-i.i.d. case, especially in the presence of a large number of nuisance parameters, ought to be defined in a stricter sense [see Bickel (1993) and Pfanzagl (1993)]. Further work is needed before one can conclude whether the REML estimates are asymptotically best.

6.3. In all the theorems of this paper we assume that the model has p.v.c. [e.g., Miller (1977)]. It can be shown that even without this assumption, but with the assumption $\sup_N \max_{1 \le i \le s} \lambda_{\max}(Z_i' V(\mu_0) Z_i) < \infty$, a sequence of solutions to the REML equations can still be consistent and AN. However, these solutions are not guaranteed to fall into $\Theta$ asymptotically, and are therefore not REML estimates by our definition.
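Section 6.1 uses the fact that, for balanced data, the ANOVA estimates solve the REML equations. A minimal check for the balanced one-way random effects model $y_{ij} = \mu + \alpha_i + \epsilon_{ij}$ ($m$ groups of size $n$) is sketched below. It assumes the standard form of the restricted log-likelihood for this model, $\ell_R = -\frac12[(m-1)\log\tau + m(n-1)\log\sigma_e^2 + SSA/\tau + SSE/\sigma_e^2]$ with $\tau = \sigma_e^2 + n\sigma_a^2$, evaluates its gradient at the ANOVA estimates, and uses simulated data; the model, parameter values and function names are illustrative, not taken from the paper.

```python
import numpy as np


def anova_estimates(y):
    """ANOVA estimates for a balanced one-way layout y[i, j] (group i, replicate j).

    Returns (sigma_e^2, sigma_a^2, SSA, SSE).
    """
    y = np.asarray(y, dtype=float)
    m, n = y.shape
    group_means = y.mean(axis=1)
    ssa = n * np.sum((group_means - y.mean()) ** 2)  # between-group sum of squares
    sse = np.sum((y - group_means[:, None]) ** 2)    # within-group sum of squares
    msa, mse = ssa / (m - 1), sse / (m * (n - 1))
    return mse, (msa - mse) / n, ssa, sse


def reml_score(s2e, s2a, ssa, sse, m, n):
    """Gradient (d/d sigma_e^2, d/d sigma_a^2) of the assumed restricted log-likelihood
    l_R = -0.5 * [(m-1) log tau + m(n-1) log s2e + ssa/tau + sse/s2e], tau = s2e + n*s2a.
    """
    tau = s2e + n * s2a
    d_tau = -0.5 * ((m - 1) / tau - ssa / tau ** 2)
    d_s2e = -0.5 * (m * (n - 1) / s2e - sse / s2e ** 2)
    # Chain rule: d tau / d s2e = 1 and d tau / d s2a = n.
    return d_tau + d_s2e, n * d_tau


# Simulated balanced data (illustrative values: mu = 2, sigma_a = 1.5, sigma_e = 1).
rng = np.random.default_rng(1)
m, n = 8, 5
y = 2.0 + rng.normal(0.0, 1.5, size=(m, 1)) + rng.normal(0.0, 1.0, size=(m, n))
s2e, s2a, ssa, sse = anova_estimates(y)
print("ANOVA estimates:", s2e, s2a)
print("REML score there:", reml_score(s2e, s2a, ssa, sse, m, n))
```

Setting both gradient components to zero gives $\tau = SSA/(m-1)$ and $\sigma_e^2 = SSE/\{m(n-1)\}$, which are exactly the ANOVA estimates, so the printed score is zero up to rounding. Note that the resulting estimate of $\sigma_a^2$ can be negative, so a root of the REML equations need not lie in the parameter space.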

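The sandwich form of the asymptotic covariance in 6.2 can be assembled mechanically once its ingredients are available. The sketch below takes the matrices $V_0, \ldots, V_s$, the fourth moments $EW_{Nl}^4$ and $\lambda_0$ as inputs (their definitions come from earlier sections and are not reproduced here), uses the same $V_i$ both in $\tilde I_N$ and for the diagonals $V_i(\mu_0)_{ll}$ in $\tilde K_N$, and applies the factor $\lambda_0^{-[1(i=0)+1(j=0)]}$ as displayed; these readings, the argument names and the random stand-in matrices are assumptions for illustration. With $EW_{Nl}^4 = 3$ the matrix $\tilde K_N$ vanishes and the result reduces to $2\tilde I_N(\theta_0)^{-1}$.

```python
import numpy as np


def reml_sandwich_cov(V, kurt, lam0):
    """I^{-1} J I^{-1} with I = (tr(V_i V_j)) and J = 2 I + K, as displayed in Section 6.2.

    V    : list of symmetric (N+m) x (N+m) arrays V_0, ..., V_s (index 0 goes with lambda)
    kurt : length-(N+m) array of fourth moments E W_{Nl}^4
    lam0 : the true lambda; entry (i, j) of K is divided by lam0 ** (1(i=0) + 1(j=0))
    """
    q = len(V)
    I = np.array([[np.trace(V[i] @ V[j]) for j in range(q)] for i in range(q)])
    diags = np.array([np.diag(Vi) for Vi in V])  # row i holds V_i(mu_0)_{ll}, l = 1..N+m
    excess = np.asarray(kurt, dtype=float) - 3.0  # zero for Gaussian W
    K = (diags * excess) @ diags.T                # sum_l (E W^4 - 3) V_i,ll V_j,ll
    scale = np.ones(q)
    scale[0] = 1.0 / lam0
    K = K * np.outer(scale, scale)                # divide by lam0^(1(i=0) + 1(j=0))
    J = 2.0 * I + K
    I_inv = np.linalg.inv(I)
    return I_inv @ J @ I_inv


# Usage with arbitrary symmetric stand-ins for the V_i (hypothetical data).
rng = np.random.default_rng(2)
L, q = 30, 3
V = []
for _ in range(q):
    M = rng.standard_normal((L, L))
    V.append(M @ M.T / L)
gauss = reml_sandwich_cov(V, np.full(L, 3.0), lam0=1.5)
heavy = reml_sandwich_cov(V, np.full(L, 9.0), lam0=1.5)
print(np.sqrt(np.diag(gauss)))  # approximate standard errors under normality
print(np.sqrt(np.diag(heavy)))  # no smaller when every E W^4 >= 3
```

Because $\tilde K_N$ is then a weighted Gram matrix, it is positive semidefinite when every $EW_{Nl}^4 \ge 3$, so heavier tails can only inflate the asymptotic variances; in practice $\theta_0$ would be replaced by an estimate before taking square roots of the diagonal.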

7. Proofs.

PROOF OF LEMMA 5.1. For any $1 \le i \le n$, let

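$$A = B' + B = \begin{pmatrix}
0 & b_{21} & \cdots & b_{i1} & b_{i+1,1} & \cdots & b_{n1}\\
b_{21} & 0 & \cdots & b_{i2} & b_{i+1,2} & \cdots & b_{n2}\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
b_{i1} & b_{i2} & \cdots & 0 & b_{i+1,i} & \cdots & b_{ni}\\
b_{i+1,1} & b_{i+1,2} & \cdots & b_{i+1,i} & 0 & \cdots & b_{n,i+1}\\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots\\
b_{n1} & b_{n2} & \cdots & b_{ni} & b_{n,i+1} & \cdots & 0
\end{pmatrix}
= \begin{pmatrix} A_i & \tilde A_i' \\ \tilde A_i & A_i^* \end{pmatrix},$$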

$a = (0', \xi')'$ and $b = (\tilde\xi', 0')'$, where $\xi = (b_{i+1,i}, \ldots, b_{ni})'$ and $\tilde\xi = \left(\sum_{k>i} b_{ki}b_{k1}, \ldots, \sum_{k>i} b_{ki}b_{ki}\right)'$. Then it is easy to check that

) ˜ s