The Annals of Statistics 1998, Vol. 26, No. 2, 525–572
SYMMETRY AND LATTICE CONDITIONAL INDEPENDENCE IN A MULTIVARIATE NORMAL DISTRIBUTION

BY STEEN ANDERSSON¹ AND JESPER MADSEN¹,²

Indiana University and University of Copenhagen

A class of multivariate normal models with symmetry restrictions given by a finite group and conditional independence restrictions given by a finite distributive lattice is defined and studied. The statistical properties of these models, including maximum likelihood inference, invariance and hypothesis testing, are discussed.
1. Introduction. Three of the most important concepts used in defining a statistical model are independence, conditional distribution and symmetry. (The assumption most often used in statistics is that of i.i.d. observations, that is, independent and identically distributed observations, which means independence between observations and symmetry under any permutation of the observations.) Statistical models given by a combination of two of these concepts, conditional distribution and independence, the so-called conditional independence (CI) models, have received increasing attention in recent years. The models are defined in terms of directed graphs, undirected graphs, or the combination of the two, the so-called chain graphs. See Whittaker (1990) or Lauritzen (1996) for an introduction to models of this type. The special connections between statistical models and graphs have been the subject of many of the contributions to this area; see, for example, Andersson and Perlman (1995b), Cox and Wermuth (1993), Lauritzen (1989, 1996), Lauritzen and Wermuth (1989) or Frydenberg (1990). The class of CI models in which all distributions are assumed to be multivariate normal is of special interest. Under this assumption, Andersson and Perlman (1993, 1995a) [hereafter abbreviated AP (1993) and AP (1995a), respectively] introduced the so-called lattice conditional independence (LCI) models and presented a complete solution to their estimation and testing problems. The relations between LCI models (without the assumption of normality) and other CI models are studied in Andersson, Madigan, Perlman and Triggs (1995a, b).
Received March 1996; revised January 1997.
¹ Supported in part by U.S. National Security Agency Grant MDA 904-92-H-3083 and by NSF Grant DMS-94-02714.
² Research carried out in part at the Department of Statistics, University of Washington.
AMS 1991 subject classifications. Primary 62H12, 62H15; secondary 62H10, 62H20, 62A05.
Key words and phrases. Group symmetry, invariance, orthogonal group representation, quotient space, conditional independence, distributive lattice, join-irreducible elements, maximum likelihood estimator, likelihood ratio test, multivariate normal distribution.
As part of a general development of the theory of the normal distribution, Brøns (1969) presented a general definition of group symmetry (GS) models to S. Andersson and S. T. Jensen. In the years 1972–1985, Andersson, Brøns and Jensen together developed an algebraic theory for these models containing a complete solution to the likelihood inference problem. This basic theory, detailed in numerous Danish manuscripts [e.g., Andersson, Brøns and Jensen (1975), Andersson (1975a, 1976), Brøns (1969) and Jensen (1973, 1974, 1977, 1983)], has not yet been published. Several manuscripts in English summarize the theory [e.g., Andersson (1978, 1992)]. In Andersson (1975b) the structure of the models was explained and a solution to the estimation problem was given in a canonical form. Perlman (1987) reviews a small part of the theory. In Andersson, Brøns and Jensen (1983), the ten fundamental irreducible testing problems within this theory are discussed. Andersson and Perlman (1984) and Bertelsen (1989) treat the noncentral distributions connected to two of these ten testing problems. Since the present paper uses most of the basic theory of GS models, a summary is presented in Appendix A.

The present paper combines the lattice conditional independence restrictions with the group symmetry restrictions to obtain the group symmetry lattice conditional independence (GS-LCI) models. The GS models and the LCI models then become special cases of the GS-LCI models. In this paper we give necessary and sufficient conditions for the existence and uniqueness of the maximum likelihood (ML) estimator for an arbitrary observation, necessary and sufficient conditions for the existence and uniqueness of the ML estimator with probability 1, an explicit expression for the ML estimator, an explicit expression for the likelihood ratio statistic $Q$ for testing one GS-LCI model against another, and the central distribution of $Q$ in terms of the moments $E(Q^\alpha)$, $\alpha > 0$.
Andersen, Højbjerre, Sørensen and Eriksen (1995) combine the symmetry given by the complex numbers, that is, the GS condition given by the group $\{\pm 1, \pm i\}$, with CI restrictions given by an undirected graph. In Hylleberg, Jensen and Ørnbøl (1993) a subgroup of the symmetric group is combined with CI restrictions given by an undirected graph. In both cases there is a nontrivial overlap with the models in the present paper. These overlaps arise from the overlap between LCI models and CI models given by undirected graphs, as explained in Andersson, Madigan, Perlman and Triggs (1995a, b). However, in the case of Hylleberg, Jensen and Ørnbøl (1993) the nontrivial overlap also occurs because the restriction on the interplay between the special group of permutations and the CI conditions is relaxed compared to the restriction between the general GS and LCI conditions in the present paper. Madsen (1996) discusses ML estimation in a class of models which extends both the GS-LCI models and those of Andersen, Højbjerre, Sørensen and Eriksen (1995) and Hylleberg, Jensen and Ørnbøl (1993). We introduce the GS-LCI models by means of the following four simple examples.
EXAMPLE 1.1. Let $x_a = (x_{a1}, x_{a2})$, $x_b = (x_{b1}, x_{b2})$ and $x_c = (x_{c1}, x_{c2})$ be three pairs of random observations with a joint normal distribution with mean zero and covariance matrix $\Sigma = (\sigma_{l\nu, k\mu} \mid l, k = a, b, c;\ \nu, \mu = 1, 2)$; that is, $\sigma_{l\nu, k\mu}$ is the covariance between the two observations $x_{l\nu}$ and $x_{k\mu}$. For example, $(x_a, x_b, x_c)$ could be measurements of three different variables $a$, $b$ and $c$ on two symmetric objects, for example, two plants within the same plot. Since the joint distribution should not depend on the (probably irrelevant) numbering of the two plants, it should remain invariant under the simple linear transformation that corresponds to permutation of plant indices. This implies that $\Sigma$ has the restriction
$$H_{GS}:\quad \sigma_{l\nu, k\mu} = \begin{cases} \gamma_{lk}, & \nu = \mu,\\ \omega_{lk}, & \nu \neq \mu, \end{cases}$$
where $\gamma_{lk} = \gamma_{kl}$ and $\omega_{lk} = \omega_{kl}$ are real numbers, $l, k = a, b, c$. Thus, under $H_{GS}$, the six-dimensional variable $x = (x_{a1}, x_{b1}, x_{c1}, x_{a2}, x_{b2}, x_{c2})'$ has the $2 \times 2$ block covariance matrix
$$(1.1)\qquad \Sigma = \begin{pmatrix} \Gamma & \Omega \\ \Omega & \Gamma \end{pmatrix},$$
where $\Gamma = \Gamma' = (\gamma_{lk} \mid l, k = a, b, c)$ and $\Omega = \Omega' = (\omega_{lk} \mid l, k = a, b, c)$. This covariance structure is a special case of multivariate complete symmetry; compare Section A.6.

Next, consider the assumption that $x_a$ and $x_c$ are conditionally independent given $x_b$, which we express in the familiar notation $H_{LCI}: x_a \perp x_c \mid x_b$. This restriction could occur if the three measured variables correspond to three "sites" on the plant, where $a$ is a neighbor to $b$, and $b$ is a neighbor to $c$, in which case the dependence between the observations from site $a$ and site $c$ is indirect, due only to their mutual dependence on the observations from site $b$. The lattice (ring) $\mathcal{K}$ of subsets of the index set $I = \{1a, 2a, 1b, 2b, 1c, 2c\}$, which defines this CI restriction, is given by $\mathcal{K} = \{\varnothing, \{1b, 2b\}, \{1b, 2b, 1a, 2a\}, \{1b, 2b, 1c, 2c\}, I\}$; compare AP (1993), Example 2.5. The restriction imposed on $\Sigma$ by both $H_{GS}$ and $H_{LCI}$ can then be expressed as (1.1) together with the additional restriction
$$H_{GS\text{-}LCI}:\quad \begin{pmatrix} \gamma_{ac} & \omega_{ac} \\ \omega_{ac} & \gamma_{ac} \end{pmatrix} = \begin{pmatrix} \gamma_{ab} & \omega_{ab} \\ \omega_{ab} & \gamma_{ab} \end{pmatrix} \begin{pmatrix} \gamma_{bb} & \omega_{bb} \\ \omega_{bb} & \gamma_{bb} \end{pmatrix}^{-1} \begin{pmatrix} \gamma_{bc} & \omega_{bc} \\ \omega_{bc} & \gamma_{bc} \end{pmatrix}.$$
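The displayed restriction can be made concrete with a small numerical sketch (the parameter values below are hypothetical, chosen only for illustration). It checks two facts: products of $2 \times 2$ blocks of the patterned form $\bigl(\begin{smallmatrix}\gamma & \omega\\ \omega & \gamma\end{smallmatrix}\bigr)$ again have that form, so the LCI restriction is compatible with the GS structure, and imposing the identity makes the conditional cross-covariance of $x_a$ and $x_c$ given $x_b$ vanish, which for a joint normal distribution is exactly $x_a \perp x_c \mid x_b$.

```python
import numpy as np

# Hypothetical numeric values for the parameters gamma_lk (G) and
# omega_lk (W), with the index order a, b, c = 0, 1, 2.
G = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 0.8],
              [0.0, 0.8, 2.0]])
W = np.array([[0.9, 0.3, 0.0],
              [0.3, 0.7, 0.2],
              [0.0, 0.2, 0.5]])

def blk(l, k):
    """2x2 block Cov((x_l1, x_l2), (x_k1, x_k2)) under H_GS."""
    return np.array([[G[l, k], W[l, k]],
                     [W[l, k], G[l, k]]])

a, b, c = 0, 1, 2
# Impose H_GS-LCI: the (a, c) block is determined by the remaining blocks.
Sac = blk(a, b) @ np.linalg.inv(blk(b, b)) @ blk(b, c)

# The product again has the gamma/omega pattern, so the LCI restriction
# is compatible with the GS structure:
print(np.isclose(Sac[0, 0], Sac[1, 1]), np.isclose(Sac[0, 1], Sac[1, 0]))

# Assemble the full 6x6 covariance in the order (x_a, x_b, x_c) and check
# that the conditional cross-covariance of x_a and x_c given x_b vanishes:
Sigma = np.block([[blk(a, a), blk(a, b), Sac],
                  [blk(a, b), blk(b, b), blk(b, c)],
                  [Sac.T,     blk(b, c), blk(c, c)]])
cond = Sigma[0:2, 4:6] \
    - Sigma[0:2, 2:4] @ np.linalg.inv(Sigma[2:4, 2:4]) @ Sigma[2:4, 4:6]
print(np.allclose(cond, 0))
```

The pattern-preservation observed here reflects the interplay condition between the group representation and the lattice discussed in Section 2.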
We thus have four hypotheses for the covariance matrix $\Sigma$, namely the unconstrained $H$, the two subhypotheses $H_{GS}$ and $H_{LCI}$, and their intersection $H_{GS\text{-}LCI}$. Now consider $N$ i.i.d. observations $x_1, \ldots, x_N$ of the six-dimensional random observation $x$. It is well known that under $H$ the ML estimator exists
and is unique with probability 1 if and only if $N \geq 6$. Moreover, it is well known from classical multivariate analysis that in the models $H_{GS}$ and $H_{LCI}$, the required conditions are $N \geq 3$ and $N \geq 4$, respectively. In all three cases, an explicit expression for the ML estimator is easily obtained. In the case of the model $H_{GS\text{-}LCI}$, the results in the present paper applied to this simple case show that the condition for existence and uniqueness of the ML estimator with probability 1 is $N \geq 2$. The ML estimator can be found using a combination of the techniques applied for GS models and LCI models. First, one determines the ML estimator
$$\hat\Sigma_{GS} = \begin{pmatrix} \hat\Gamma & \hat\Omega \\ \hat\Omega & \hat\Gamma \end{pmatrix}$$
for $\Sigma$ under $H_{GS}$, where $\hat\Gamma = (\hat\gamma_{lk} \mid l, k = a, b, c)$ and $\hat\Omega = (\hat\omega_{lk} \mid l, k = a, b, c)$. Under $H_{LCI}$, the likelihood function (LF) factorizes into the product of the conditional LF of $x_a$ given $x_b$, the conditional LF of $x_c$ given $x_b$ and the marginal LF of $x_b$. These factors then contain two $2 \times 2$ regression parameters $R_a$ and $R_c$, two $2 \times 2$ conditional covariance matrices $\Lambda_a$ and $\Lambda_c$ and one $2 \times 2$ marginal covariance matrix $\Lambda_b$. The ML estimator $\hat\Sigma_{GS\text{-}LCI}$ for $\Sigma$ under $H_{GS\text{-}LCI}$ is then determined by
$$\hat R_l = \begin{pmatrix} \hat\gamma_{lb} & \hat\omega_{lb} \\ \hat\omega_{lb} & \hat\gamma_{lb} \end{pmatrix} \begin{pmatrix} \hat\gamma_{bb} & \hat\omega_{bb} \\ \hat\omega_{bb} & \hat\gamma_{bb} \end{pmatrix}^{-1},$$
$$\hat\Lambda_l = \begin{pmatrix} \hat\gamma_{ll} & \hat\omega_{ll} \\ \hat\omega_{ll} & \hat\gamma_{ll} \end{pmatrix} - \begin{pmatrix} \hat\gamma_{lb} & \hat\omega_{lb} \\ \hat\omega_{lb} & \hat\gamma_{lb} \end{pmatrix} \begin{pmatrix} \hat\gamma_{bb} & \hat\omega_{bb} \\ \hat\omega_{bb} & \hat\gamma_{bb} \end{pmatrix}^{-1} \begin{pmatrix} \hat\gamma_{bl} & \hat\omega_{bl} \\ \hat\omega_{bl} & \hat\gamma_{bl} \end{pmatrix}$$
for $l = a, c$, respectively, and
$$\hat\Lambda_b = \begin{pmatrix} \hat\gamma_{bb} & \hat\omega_{bb} \\ \hat\omega_{bb} & \hat\gamma_{bb} \end{pmatrix}.$$
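A minimal computational sketch of this two-step procedure follows, on synthetic data. It assumes (as the GS theory summarized in Appendix A provides for this permutation symmetry) that the $H_{GS}$ estimate is obtained by averaging the sample covariance matrix over the two-element symmetry group; all numeric inputs are artificial.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10
# Synthetic data: N observations of x = (x_a1, x_b1, x_c1, x_a2, x_b2, x_c2).
X = rng.standard_normal((N, 6))
S = X.T @ X / N                       # unrestricted ML estimate of Sigma

# Step 1: ML estimate under H_GS, averaging S over {identity, plant swap}.
P = np.block([[np.zeros((3, 3)), np.eye(3)],
              [np.eye(3), np.zeros((3, 3))]])
S_GS = (S + P @ S @ P.T) / 2
Gam, Om = S_GS[:3, :3], S_GS[:3, 3:]  # Gamma-hat and Omega-hat

def two(l, k):
    """2x2 block built from the Gamma-hat / Omega-hat entries for (l, k)."""
    return np.array([[Gam[l, k], Om[l, k]],
                     [Om[l, k], Gam[l, k]]])

# Step 2: form the regression and covariance pieces displayed above.
a, b, c = 0, 1, 2
Binv = np.linalg.inv(two(b, b))
R = {l: two(l, b) @ Binv for l in (a, c)}                        # R-hat_l
Lam = {l: two(l, l) - two(l, b) @ Binv @ two(b, l) for l in (a, c)}
Lam[b] = two(b, b)                                               # Lambda-hat_b
print(Lam[a].shape, np.allclose(Lam[a], Lam[a].T))
```

Note that the group-averaged `Gam` and `Om` are automatically symmetric, so every constructed block has the required patterned form.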
Of the five possible testing problems within the design of the models given by $H$, $H_{GS}$, $H_{LCI}$ and $H_{GS\text{-}LCI}$, the three involving $H_{GS\text{-}LCI}$ seem to be new. The likelihood ratio statistic and its central distribution for these tests can easily be obtained from the general theory presented in this paper.

EXAMPLE 1.2. In Example 1.1, instead of $H_{LCI}$, consider the assumption $H'_{LCI}: x_a \perp x_c$; that is, $x_a$ and $x_c$ are marginally independent. The interpretation of this restriction is that the actual measurements at site $a$ do not contain any information about the measurements at site $c$, and vice versa. The lattice (ring) $\mathcal{K}'$ of subsets of the index set $I = \{1a, 2a, 1b, 2b, 1c, 2c\}$ which defines this CI restriction is given by $\mathcal{K}' = \{\varnothing, \{1a, 2a\}, \{1c, 2c\}, \{1a, 2a, 1c, 2c\}, I\}$;
compare AP (1993), Example 2.4. The restriction imposed on $\Sigma$ by both $H_{GS}$ and $H'_{LCI}$ can then be expressed as (1.1) together with the additional restriction $H'_{GS\text{-}LCI}: \gamma_{ac} = \omega_{ac} = 0$. Note that $H_{LCI}$ and $H'_{LCI}$ are nonnested and have a nontrivial intersection. We thus again consider four hypotheses for the covariance matrix $\Sigma$, namely the unconstrained $H$, the two subhypotheses $H_{GS}$ and $H'_{LCI}$ and their intersection $H'_{GS\text{-}LCI}$. Consider $N$ i.i.d. observations $x_1, \ldots, x_N$ of the six-dimensional variable $x$. From Example 2.4 in AP (1993) it follows that under $H'_{LCI}$, the ML estimator exists and is unique with probability 1 if and only if $N \geq 6$. The results in the present paper show that under $H'_{GS\text{-}LCI}$, the required condition is $N \geq 3$. In this case, the ML estimator can be determined in the same way as in Example 1.1. Under $H'_{LCI}$, the likelihood function (LF) factorizes into the conditional LF of $x_b$ given $(x_a, x_c)$ and the marginal LFs of $x_a$ and $x_c$, respectively. These factors then contain one $2 \times 4$ regression parameter $R_b$, one $2 \times 2$ conditional covariance matrix $\Lambda_b$ and two $2 \times 2$ marginal covariance matrices $\Lambda_a$, $\Lambda_c$. The ML estimator $\hat\Sigma_{GS\text{-}LCI}$ for $\Sigma$ under $H'_{GS\text{-}LCI}$ is then determined by
$$\hat R_b = \begin{pmatrix} \hat\gamma_{ab} & \hat\gamma_{bc} & \hat\omega_{ab} & \hat\omega_{bc} \\ \hat\omega_{ab} & \hat\omega_{bc} & \hat\gamma_{ab} & \hat\gamma_{bc} \end{pmatrix} \begin{pmatrix} \hat\gamma_{aa} & \hat\gamma_{ac} & \hat\omega_{aa} & \hat\omega_{ac} \\ \hat\gamma_{ac} & \hat\gamma_{cc} & \hat\omega_{ac} & \hat\omega_{cc} \\ \hat\omega_{aa} & \hat\omega_{ac} & \hat\gamma_{aa} & \hat\gamma_{ac} \\ \hat\omega_{ac} & \hat\omega_{cc} & \hat\gamma_{ac} & \hat\gamma_{cc} \end{pmatrix}^{-1},$$
$$\hat\Lambda_b = \begin{pmatrix} \hat\gamma_{bb} & \hat\omega_{bb} \\ \hat\omega_{bb} & \hat\gamma_{bb} \end{pmatrix} - \hat R_b \begin{pmatrix} \hat\gamma_{ab} & \hat\omega_{ab} \\ \hat\gamma_{bc} & \hat\omega_{bc} \\ \hat\omega_{ab} & \hat\gamma_{ab} \\ \hat\omega_{bc} & \hat\gamma_{bc} \end{pmatrix}$$
and
$$\hat\Lambda_l = \begin{pmatrix} \hat\gamma_{ll} & \hat\omega_{ll} \\ \hat\omega_{ll} & \hat\gamma_{ll} \end{pmatrix}$$
for $l = a, c$, respectively. As in Example 1.1, the three testing problems involving $H'_{GS\text{-}LCI}$, of the possible five within the design of the models given by $H$, $H_{GS}$, $H'_{LCI}$ and $H'_{GS\text{-}LCI}$, seem to be new. The likelihood ratio statistic and its central distribution for these tests can be obtained from the general theory presented in this paper.

EXAMPLE 1.3. Let $x_a = (x_{a1}, x_{a2}, \ldots, x_{an_a})$, $x_b = (x_{b1}, x_{b2}, \ldots, x_{bn_b})$ and $x_c = (x_{c1}, x_{c2}, \ldots, x_{cn_c})$ be three families of $n_a$, $n_b$ and $n_c$ multivariate random observations, respectively. The dimensions of the multivariate observations within the three families are $p_a$, $p_b$ and $p_c$, respectively. The
simultaneous distribution of these $n_a p_a + n_b p_b + n_c p_c$ real observations is assumed to be normal with mean vector zero and $(n_a + n_b + n_c) \times (n_a + n_b + n_c)$ block covariance matrix $\Sigma = (\Sigma_{l\nu, k\mu} \mid l, k = a, b, c;\ \nu = 1, \ldots, n_l;\ \mu = 1, \ldots, n_k)$; that is, $\Sigma_{l\nu, k\mu}$ is the $p_l \times p_k$ covariance matrix between the two multivariate observations $x_{l\nu}$ and $x_{k\mu}$. For example, $x_a$, $x_b$ and $x_c$ could be multivariate measurements on plants from three different varieties $a$, $b$ and $c$, respectively. Since the joint distribution should not depend on the numbering of plants within a variety, it must remain invariant under any linear transformation of the sample space that corresponds to a renumbering of plants within varieties. This implies that the covariance matrix $\Sigma$ has the restrictions given by
$$H_{GS}:\quad \Sigma_{l\nu, k\mu} = \begin{cases} \Gamma_l, & l = k,\ \nu = \mu,\\ \Omega_l, & l = k,\ \nu \neq \mu,\\ \Delta_{lk}, & l \neq k, \end{cases}$$
where $\Gamma_l = \Gamma'_l$ is a $p_l \times p_l$ matrix, $\Omega_l = \Omega'_l$ is a $p_l \times p_l$ matrix, and $\Delta_{lk} = \Delta'_{kl}$ is a $p_l \times p_k$ matrix, $l, k = a, b, c$; $l \neq k$. Thus, under $H_{GS}$ the random vector $x = (x'_{a1}, x'_{a2}, \ldots, x'_{an_a}, x'_{b1}, x'_{b2}, \ldots, x'_{bn_b}, x'_{c1}, x'_{c2}, \ldots, x'_{cn_c})'$ of real dimension $n_a p_a + n_b p_b + n_c p_c$ has the block covariance matrix
$$(1.2)\qquad \Sigma = \begin{pmatrix} \Sigma_{aa} & \Sigma_{ab} & \Sigma_{ac} \\ \Sigma_{ba} & \Sigma_{bb} & \Sigma_{bc} \\ \Sigma_{ca} & \Sigma_{cb} & \Sigma_{cc} \end{pmatrix},$$
where
$$(1.3)\qquad \Sigma_{ll} = \begin{pmatrix} \Gamma_l & \Omega_l & \cdots & \Omega_l \\ \Omega_l & \Gamma_l & \ddots & \vdots \\ \vdots & \ddots & \ddots & \Omega_l \\ \Omega_l & \cdots & \Omega_l & \Gamma_l \end{pmatrix}$$
and
$$(1.4)\qquad \Sigma_{lk} = \begin{pmatrix} \Delta_{lk} & \cdots & \Delta_{lk} \\ \vdots & \ddots & \vdots \\ \Delta_{lk} & \cdots & \Delta_{lk} \end{pmatrix}$$
for $l, k = a, b, c$; $l \neq k$. This is an example of what we could call multivariate compound symmetry, first considered by Votaw (1948) in the univariate case, that is, $p_a = p_b = p_c = 1$ (see Section A.6). Next consider the assumption that the families $x_a$ and $x_c$ are conditionally independent given the family $x_b$, which we express in the familiar notation $H_{LCI}: x_a \perp x_c \mid x_b$.
This restriction could occur if the three families of variables correspond to three plots $a$, $b$ and $c$, where $a$ is a neighbor to $b$, and $b$ is a neighbor to $c$, in which case the dependence between the observations from plot $a$ and plot $c$ is due only to the observations from plot $b$. The lattice (ring) $\mathcal{K}$ of subsets of the index set $I = \{a1, a2, \ldots, an_a, b1, b2, \ldots, bn_b, c1, c2, \ldots, cn_c\}$ which defines this conditional independence is given by
$$\mathcal{K} = \{\varnothing,\ I_b,\ I_a \mathbin{\dot\cup} I_b,\ I_b \mathbin{\dot\cup} I_c,\ I\},$$
where $I_l = \{l1, l2, \ldots, ln_l\}$, $l = a, b, c$; compare AP (1993), Example 2.5. The restriction imposed on $\Sigma$ by both $H_{GS}$ and $H_{LCI}$ can then be expressed as (1.2), (1.3) and (1.4) together with the additional restriction
$$H_{GS\text{-}LCI}:\quad \begin{pmatrix} \Delta_{ac} & \cdots & \Delta_{ac} \\ \vdots & \ddots & \vdots \\ \Delta_{ac} & \cdots & \Delta_{ac} \end{pmatrix} = \begin{pmatrix} \Delta_{ab} & \cdots & \Delta_{ab} \\ \vdots & \ddots & \vdots \\ \Delta_{ab} & \cdots & \Delta_{ab} \end{pmatrix} \begin{pmatrix} \Gamma_b & \Omega_b & \cdots & \Omega_b \\ \Omega_b & \Gamma_b & \ddots & \vdots \\ \vdots & \ddots & \ddots & \Omega_b \\ \Omega_b & \cdots & \Omega_b & \Gamma_b \end{pmatrix}^{-1} \begin{pmatrix} \Delta_{bc} & \cdots & \Delta_{bc} \\ \vdots & \ddots & \vdots \\ \Delta_{bc} & \cdots & \Delta_{bc} \end{pmatrix},$$
or equivalently,
$$H_{GS\text{-}LCI}:\quad \Delta_{ac} = n_b \Delta_{ab} \bigl(\Gamma_b + (n_b - 1)\Omega_b\bigr)^{-1} \Delta_{bc}.$$
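The equivalence of the two displayed forms can be checked numerically in a small case with hypothetical sizes and blocks, using the fact that the compound-symmetric matrix $\Sigma_{bb}$ maps constant block columns through the factor $\Gamma_b + (n_b - 1)\Omega_b$; the blocks (1.3) and (1.4) are built here with Kronecker products.

```python
import numpy as np

# Hypothetical sizes and blocks: n_a, n_b, n_c = 2, 3, 2, all p_l = 2.
na, nb, nc, p = 2, 3, 2, 2
rng = np.random.default_rng(4)
Gb = np.array([[2.0, 0.3], [0.3, 1.5]])
Ob = np.array([[0.5, 0.1], [0.1, 0.4]])
Dab = rng.standard_normal((p, p)) * 0.2
Dbc = rng.standard_normal((p, p)) * 0.2

J = lambda m, n: np.ones((m, n))
Sigma_bb = np.kron(np.eye(nb), Gb - Ob) + np.kron(J(nb, nb), Ob)  # form (1.3)
Sigma_ab = np.kron(J(na, nb), Dab)                                # form (1.4)
Sigma_bc = np.kron(J(nb, nc), Dbc)

# Left-hand side of the first displayed form of H_GS-LCI:
lhs = Sigma_ab @ np.linalg.inv(Sigma_bb) @ Sigma_bc
# Right-hand side of the equivalent condensed form:
Dac = nb * Dab @ np.linalg.inv(Gb + (nb - 1) * Ob) @ Dbc
print(np.allclose(lhs, np.kron(J(na, nc), Dac)))
```

The agreement confirms that the large block identity collapses to a single $p_a \times p_c$ equation between the $\Delta$ parameters.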
We thus again have four hypotheses for the covariance matrix $\Sigma$, namely the unconstrained $H$, the two subhypotheses $H_{GS}$ and $H_{LCI}$, and the intersection $H_{GS\text{-}LCI}$. Let $x_1, x_2, \ldots, x_N$ be $N$ i.i.d. observations of the $(n_a p_a + n_b p_b + n_c p_c)$-dimensional random vector $x$. It is well known that under $H$ the ML estimator exists and is unique with probability 1 if and only if $N \geq n_a p_a + n_b p_b + n_c p_c$. The model given by $H_{GS}$ is well known when $p_a = p_b = p_c = 1$; compare Votaw (1948). It follows from the theory of GS models presented in Appendix A that in the general case, the ML estimator for $\Sigma$ exists and is unique with probability 1 if and only if $N \geq p_a + p_b + p_c$, $N(n_a - 1) \geq p_a$, $N(n_b - 1) \geq p_b$ and $N(n_c - 1) \geq p_c$; see Section A.4. In the familiar model given by $H_{LCI}$, the condition is $N \geq \max\{n_a p_a + n_b p_b, n_c p_c + n_b p_b\}$; compare AP (1993), Example 2.5. For both models, the ML estimator is easily obtained. In the case of the model $H_{GS\text{-}LCI}$, the theory presented in the present paper shows that the conditions for existence and uniqueness of the ML estimator with probability 1 become $N \geq p_a + p_b$, $N \geq p_c + p_b$, $N(n_a - 1) \geq p_a$, $N(n_b - 1) \geq p_b$ and $N(n_c - 1) \geq p_c$. The ML estimator can be found using a combination of the techniques from GS models and LCI models. First
one finds the ML estimator $(\hat\Gamma_a, \hat\Gamma_b, \hat\Gamma_c, \hat\Omega_a, \hat\Omega_b, \hat\Omega_c, \hat\Delta_{ab}, \hat\Delta_{ac}, \hat\Delta_{bc})$ under $H_{GS}$. Let $y = (x_1, x_2, \ldots, x_N)$ be the $I \times N$ observation matrix and let $y_{l\nu}$ be the $p_l \times N$ submatrix $((x_1)_{l\nu}, (x_2)_{l\nu}, \ldots, (x_N)_{l\nu})$ of $y$, $\nu = 1, \ldots, n_l$, $l = a, b, c$. We then obtain that
$$\hat\Gamma_l = \frac{1}{n_l} \sum \bigl( y_{l\nu} y'_{l\nu} \mid \nu = 1, \ldots, n_l \bigr),$$
$$\hat\Omega_l = \frac{1}{n_l (n_l - 1)} \sum \bigl( y_{l\nu} y'_{l\mu} \mid \nu, \mu = 1, \ldots, n_l,\ \nu \neq \mu \bigr),$$
$$\hat\Delta_{lk} = \frac{1}{n_l n_k} \sum \bigl( y_{l\nu} y'_{k\mu} \mid \nu = 1, \ldots, n_l,\ \mu = 1, \ldots, n_k \bigr),$$
where $l, k = a, b, c$, $l \neq k$. Under $H_{LCI}$ the likelihood function (LF) can be factorized into the conditional LF of $x_a$ given $x_b$, the conditional LF of $x_c$ given $x_b$ and the marginal LF of $x_b$. These factors then contain two multivariate regression parameters $R_{ab}$ and $R_{cb}$, of dimensions $n_a p_a \times n_b p_b$ and $n_c p_c \times n_b p_b$, respectively; two multivariate conditional covariance matrices $\Lambda_a$ and $\Lambda_c$ of dimensions $n_a p_a \times n_a p_a$ and $n_c p_c \times n_c p_c$, respectively; and one marginal covariance matrix $\Lambda_b$ of dimension $n_b p_b \times n_b p_b$. Under $H_{GS\text{-}LCI}$, the regression parameters $R_{lb}$, $l = a, c$, and the variance parameters $\Lambda_l$, $l = a, b, c$, have the forms (1.4) and (1.3), respectively. Thus,
$$R_{lb} = \begin{pmatrix} T_{lb} & \cdots & T_{lb} \\ \vdots & \ddots & \vdots \\ T_{lb} & \cdots & T_{lb} \end{pmatrix} \quad\text{and}\quad \Lambda_l = \begin{pmatrix} F_l & \Phi_l & \cdots & \Phi_l \\ \Phi_l & F_l & \ddots & \vdots \\ \vdots & \ddots & \ddots & \Phi_l \\ \Phi_l & \cdots & \Phi_l & F_l \end{pmatrix},$$
where $T_{lb}$ is a $p_l \times p_b$ matrix, $l = a, c$, and $F_l = F'_l$, $\Phi_l = \Phi'_l$ are $p_l \times p_l$ matrices, $l = a, b, c$. The ML estimator $\hat\Sigma$ for $\Sigma$ under $H_{GS\text{-}LCI}$ is then determined by setting $F_b = \hat\Gamma_b$, $\Phi_b = \hat\Omega_b$ and
$$T_{lb} = \hat\Delta_{lb} \bigl(\hat\Gamma_b + (n_b - 1)\hat\Omega_b\bigr)^{-1},$$
$$F_l = \hat\Gamma_l - n_b \hat\Delta_{lb} \bigl(\hat\Gamma_b + (n_b - 1)\hat\Omega_b\bigr)^{-1} \hat\Delta_{bl},$$
$$\Phi_l = \hat\Omega_l - n_b \hat\Delta_{lb} \bigl(\hat\Gamma_b + (n_b - 1)\hat\Omega_b\bigr)^{-1} \hat\Delta_{bl},$$
for $l = a, c$, respectively. Of the five possible testing problems within the design of the models given by $H$, $H_{GS}$, $H_{LCI}$ and $H_{GS\text{-}LCI}$, the problem of testing $H_{LCI}$ versus $H$ is well known from the literature, compare AP (1995a), and the problem of testing $H_{GS}$ versus $H$ follows from the theory covered in Appendix A. The three tests
involving the hypothesis $H_{GS\text{-}LCI}$ seem to be new. The likelihood ratio test statistics and a representation of the corresponding central distributions can easily be obtained from the general theory in the present paper.

EXAMPLE 1.4. Analogously to the construction of Example 1.2, consider the assumption $H'_{LCI}: x_a \perp x_c$ instead of $H_{LCI}$ in Example 1.3. This restriction could occur if the plants on plot $a$ are assumed not to influence the plants on plot $c$, and vice versa. The lattice (ring) $\mathcal{K}'$ of subsets of the index set $I$ which defines this CI restriction is given by $\mathcal{K}' = \{\varnothing, I_a, I_c, I_a \mathbin{\dot\cup} I_c, I\}$; compare AP (1993), Example 2.4. The restriction imposed on $\Sigma$ by both $H_{GS}$ and $H'_{LCI}$ can then be expressed as (1.2), (1.3) and (1.4) together with the additional restriction $H'_{GS\text{-}LCI}: \Delta_{ac} = 0$.

As in Example 1.3, consider $N$ i.i.d. observations $x_1, \ldots, x_N$ of the $(n_a p_a + n_b p_b + n_c p_c)$-dimensional variable $x$. From Example 2.4 in AP (1993) it follows that under $H'_{LCI}$, the ML estimator exists and is unique with probability 1 if and only if $N \geq n_a p_a + n_b p_b + n_c p_c$, that is, the same as in the unconstrained case. The results in the present paper show that under $H'_{GS\text{-}LCI}$, the required conditions are $N \geq p_a + p_b + p_c$, $N(n_a - 1) \geq p_a$, $N(n_b - 1) \geq p_b$ and $N(n_c - 1) \geq p_c$, that is, the same as in the case of $H_{GS}$. In this case, the ML estimator can easily be determined in the same way as in the previous examples. Similarly, the three testing problems involving $H'_{GS\text{-}LCI}$, of the possible five within the design of the models given by the unconstrained $H$, $H_{GS}$, $H'_{LCI}$ and $H'_{GS\text{-}LCI}$, seem to be new, and the likelihood ratio statistic and its central distribution for these tests can be obtained from the general theory presented in this paper.

In general the observation space is $\mathbb{R}^I$, where $I$ is a finite index set.
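Before turning to the general formulation, the averaged cross-product estimators of Example 1.3 can be illustrated by a small simulation for a single family (all sizes hypothetical). The products $y_{l\nu} y'_{l\mu}$ below are additionally divided by the sample size $N$ so that the resulting averages converge to $\Gamma_l$ and $\Omega_l$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, N = 3, 2, 50_000           # hypothetical n_l, p_l and sample size
Gam = np.array([[2.0, 0.5], [0.5, 1.5]])
Om = np.array([[0.4, 0.1], [0.1, 0.3]])
# Compound-symmetry covariance of form (1.3) for one family:
Sigma = np.kron(np.eye(n), Gam - Om) + np.kron(np.ones((n, n)), Om)

y = rng.multivariate_normal(np.zeros(n * p), Sigma, size=N).T  # (n p) x N
ysub = [y[i * p:(i + 1) * p] for i in range(n)]                # the y_{l nu}

# Average the cross-products over replicates, dividing also by N:
Gam_hat = sum(ysub[i] @ ysub[i].T for i in range(n)) / (N * n)
Om_hat = sum(ysub[i] @ ysub[j].T
             for i in range(n) for j in range(n) if i != j) / (N * n * (n - 1))
print(np.round(Gam_hat, 2), np.round(Om_hat, 2), sep="\n")
```

At this sample size the averages reproduce the generating `Gam` and `Om` to roughly two decimal places, illustrating the consistency of the replicate-averaged estimators.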
The general definition of a GS-LCI model is stated in terms of an orthogonal group representation $\rho$ of a finite group $G$ on $\mathbb{R}^I$ together with a ring (lattice) $\mathcal{K}$ of subsets of the index set $I$. The GS-LCI model is then defined by imposing symmetry conditions given by $\rho$ and conditional independence conditions given by $\mathcal{K}$. A condition on the interplay between the group representation and the ring is required to ensure the complete solution of the GS-LCI model. In Section 2 the GS-LCI models are defined (Section 2.4), the fundamental factorization of the parameter space $\mathrm{P}_{G, \mathcal{K}}(I)$ of all $I \times I$ covariance matrices determined by the GS and LCI restrictions is obtained (Theorem 2.1), and the fundamental invariance group $\mathrm{GL}_{G, \mathcal{K}}(I)$ is defined together with its transitive action on $\mathrm{P}_{G, \mathcal{K}}(I)$ (Theorem 2.2). The distribution results for the likelihood ratio statistics are greatly facilitated by this transitive action. The derivations of these distributions, which generalize and improve the corresponding derivations for LCI models, are presented in Appendix B; compare
the Appendix of AP (1995a). In Section 3 a necessary and sufficient condition for the existence of the ML estimator and a necessary and sufficient condition for the uniqueness of the ML estimator for a fixed observation $x \in \mathbb{R}^I$ are obtained, together with an almost explicit expression for the ML estimator (Theorem 3.1). In Proposition 3.2 the necessary and sufficient algebraic condition for the existence and uniqueness of the ML estimator with probability 1 is obtained. The structure constants for a GS-LCI model are then introduced. In terms of these, another, very useful, necessary and sufficient condition for the existence and uniqueness of the ML estimator with probability 1 is obtained (Proposition 3.3). Section 4 presents the general testing problem, and the likelihood ratio test statistic $Q$ is derived. The central distribution of $Q$, in terms of the moments $E(Q^\alpha)$, $\alpha > 0$, is given as a function of the structure constants. In Section 5, it is established that independent repetitions (i.i.d.) of a GS-LCI model form again a GS-LCI model, except for a trivial reparametrization. Furthermore, it is shown how estimators and structure constants for the i.i.d. GS-LCI model are obtained in terms of the original GS-LCI model (Section 5.1). In Section 5.2 it is demonstrated how to construct new examples ad libitum based on well-known examples of GS models (cf. Section A.6) and the examples of LCI models in AP (1993). Finally, in Section 6, we indicate how the GS-LCI models can be extended in various ways.

2. Mathematical formulation. In this section we explain the mathematical setup for the combined GS-LCI models to be investigated. Furthermore, we present some fundamental theorems describing the structure of the set $\mathrm{P}_{G, \mathcal{K}}(I)$ of covariance matrices that satisfy the GS-LCI restrictions. We have tried as much as possible to use the same type of notation as in AP (1993) and AP (1995a).
In the following, let $I$ and $J$ denote finite index sets and let $|I|$ denote the number of elements in $I$.

2.1. Notation. Let $\mathbb{R}^I$ be the vector space of all families $x \equiv (x_i \mid i \in I)$ of real numbers indexed by $I$. For $K \subseteq I$, let $p_K: \mathbb{R}^I \to \mathbb{R}^K$ be the canonical projection and $u_K: \mathbb{R}^K \to \mathbb{R}^I$ the canonical imbedding; that is, $p_K((x_i \mid i \in I)) = (x_i \mid i \in K)$ and $u_K((x_i \mid i \in K)) = (x'_i \mid i \in I)$, where $x'_i = x_i$ for $i \in K$ and $x'_i = 0$ otherwise. For $x \in \mathbb{R}^I$, let $x_K$ denote $p_K(x) \in \mathbb{R}^K$. Note that $\mathbb{R}^{\varnothing} = \{0\}$. Let $M(I \times J) \equiv \mathbb{R}^{I \times J}$ denote the vector space of all $I \times J$ matrices. The algebra $M(I \times I)$ is denoted by $M(I)$. For $A \in M(I \times J)$ let $A' \in M(J \times I)$ denote the transposed matrix. The group of all nonsingular $I \times I$ matrices, the group of all orthogonal $I \times I$ matrices, the cone of all positive semidefinite $I \times I$ matrices and the cone of all positive definite $I \times I$ matrices are denoted by $\mathrm{GL}(I)$, $\mathrm{O}(I)$, $\mathrm{PS}(I)$ and $\mathrm{P}(I)$, respectively. The action of the group $\mathrm{GL}(I)$ on $\mathrm{P}(I)$ given by
$$(2.1)\qquad \mathrm{GL}(I) \times \mathrm{P}(I) \to \mathrm{P}(I),\qquad (A, \Sigma) \mapsto A \Sigma A',$$
is well known to be transitive and proper. The $I \times I$ identity matrix is denoted by $1_I$. For $A = (a_{ii'} \mid (i, i') \in I \times I) \in M(I)$ and $K \subseteq I$, let $A_K$ denote the $K \times K$ submatrix of $A$; that is, $A_K = (a_{ii'} \mid (i, i') \in K \times K) \in M(K)$. If $A_K$ is nonsingular, then $A_K^{-1}$ denotes the inverse matrix $(A_K)^{-1}$. For any subspace $U \subseteq \mathbb{R}^I$, let $U^{\perp}$ denote the orthogonal complement of $U$ with respect to the usual inner product in $\mathbb{R}^I$; that is, $U^{\perp} = \{x \in \mathbb{R}^I \mid \forall z \in U: x'z = 0\}$, and denote by $P_U \in M(I)$ the corresponding orthogonal projection matrix. For $\mu \in \mathbb{R}^I$ and $\Sigma \in \mathrm{P}(I)$ let $N(\mu, \Sigma)$ denote the normal distribution on $\mathbb{R}^I$ with expectation $\mu$ and covariance matrix $\Sigma$. Let $N(\Sigma)$ denote $N(0, \Sigma)$. The overall normal model $(N(\Sigma) \mid \Sigma \in \mathrm{P}(I))$ is invariant under the action of $\mathrm{GL}(I)$ on the observation space $\mathbb{R}^I$ given by
$$(2.2)\qquad \mathrm{GL}(I) \times \mathbb{R}^I \to \mathbb{R}^I,\qquad (A, x) \mapsto Ax,$$
and the transitive action of $\mathrm{GL}(I)$ on the parameter space $\mathrm{P}(I)$ given by (2.1).

2.2. The lattice conditional independence model. Let $\mathcal{K}$ be a subring of the ring $\mathcal{D}(I)$ of all subsets of $I$; that is, $\mathcal{K}$ is closed under union and intersection. Since $\mathcal{K}$ is a distributive lattice with respect to these operations, we usually refer to $\mathcal{K}$ as a lattice of subsets of $I$. Without loss of generality we assume that $I, \varnothing \in \mathcal{K}$. A matrix $A \in M(I)$ is called $\mathcal{K}$-preserving if for every $K \in \mathcal{K}$ and $x \in \mathbb{R}^I$, $(Ax)_K = A_K x_K$, or equivalently, if $A u_K(\mathbb{R}^K) \subseteq u_K(\mathbb{R}^K)$. Let $M_{\mathcal{K}}(I)$ be the algebra of all $\mathcal{K}$-preserving matrices [in AP (1993), $M_{\mathcal{K}}(I)$ was denoted $M(\mathcal{K})$], and let $\mathrm{GL}_{\mathcal{K}}(I)$ be the group of all nonsingular $\mathcal{K}$-preserving matrices [in AP (1993), $\mathrm{GL}_{\mathcal{K}}(I)$ was denoted $\mathrm{GL}(\mathcal{K})$]. Define the subset $\mathrm{P}_{\mathcal{K}}(I) \subseteq \mathrm{P}(I)$ as follows: $\Sigma \in \mathrm{P}_{\mathcal{K}}(I)$ if and only if $x_L$ and $x_M$ are conditionally independent given $x_{L \cap M}$ for every $L, M \in \mathcal{K}$ whenever $x \in \mathbb{R}^I$ follows $N(\Sigma)$ [in AP (1993), $\mathrm{P}_{\mathcal{K}}(I)$ was denoted $\mathrm{P}(\mathcal{K})$]. The statistical model
$$(2.3)\qquad \bigl( N(\Sigma) \mid \Sigma \in \mathrm{P}_{\mathcal{K}}(I) \bigr)$$
with observation space $\mathbb{R}^I$ and parameter space $\mathrm{P}_{\mathcal{K}}(I)$ is called the lattice conditional independence (LCI) model determined by $\mathcal{K}$. For $K \in \mathcal{K}$, define $\langle K \rangle = \bigcup (K' \in \mathcal{K} \mid K' \subset K)$ and $[K] = K \setminus \langle K \rangle$, so that
$$(2.4)\qquad K = \langle K \rangle \mathbin{\dot\cup} [K],$$
where $\dot\cup$ indicates that the union is disjoint. Let $J(\mathcal{K})$ denote the set of join-irreducible elements of $\mathcal{K}$; that is, $K \in J(\mathcal{K})$ if and only if $\langle K \rangle \subset K$, or equivalently, if $[K] \neq \varnothing$. The subsets $[K]$ of $I$, $K \in J(\mathcal{K})$, are all disjoint, and
$$K = \dot{\bigcup} \bigl( [K'] \mid K' \in J(\mathcal{K}),\ K' \subseteq K \bigr),$$
$K \in \mathcal{K}$. In particular,
$$(2.5)\qquad I = \dot{\bigcup} \bigl( [K] \mid K \in J(\mathcal{K}) \bigr)$$
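For a concrete lattice these objects are easy to compute; the following sketch does so for the lattice of Example 1.1, encoding subsets of $I$ as frozensets.

```python
# The lattice of Example 1.1, with index set I = {1a, 2a, 1b, 2b, 1c, 2c}.
I = frozenset({"1a", "2a", "1b", "2b", "1c", "2c"})
K_lat = [frozenset(), frozenset({"1b", "2b"}),
         frozenset({"1a", "2a", "1b", "2b"}),
         frozenset({"1b", "2b", "1c", "2c"}), I]

def angle(K):
    """<K>: the union of the lattice members strictly contained in K."""
    return frozenset().union(*(L for L in K_lat if L < K))

join_irr = [K for K in K_lat if angle(K) < K]   # the join-irreducible elements
brackets = [K - angle(K) for K in join_irr]     # the sets [K]

# The sets [K] are disjoint and partition I, as in (2.5):
print(len(join_irr), frozenset().union(*brackets) == I,
      sum(len(b) for b in brackets) == len(I))  # 3 True True
```

Here the three join-irreducible elements yield the blocks $\{1b, 2b\}$, $\{1a, 2a\}$ and $\{1c, 2c\}$; this disjoint partition underlies the factorizations used below.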
[see AP (1993), Proposition 2.1]. For every $K \in J(\mathcal{K})$ and $A \in M(I)$, partition $A_K$ according to the decomposition (2.4) as follows:
$$A_K = \begin{pmatrix} A_{\langle K \rangle} & A_{\langle K]} \\ A_{[K \rangle} & A_{[K]} \end{pmatrix},$$
so $A_{\langle K \rangle} \in M(\langle K \rangle)$, $A_{\langle K]} \in M(\langle K \rangle \times [K])$, $A_{[K \rangle} \in M([K] \times \langle K \rangle)$ and $A_{[K]} \in M([K])$. For $\Sigma \in \mathrm{P}(I)$ and $K \in J(\mathcal{K})$, define $\Sigma_{[K]\cdot} = \Sigma_{[K]} - \Sigma_{[K \rangle} \Sigma_{\langle K \rangle}^{-1} \Sigma_{\langle K]}$. The following five results are the main tools in solving the estimation and testing problems for models of the type (2.3).

1. The mapping
$$\mathrm{P}_{\mathcal{K}}(I) \to \times \bigl( M([K] \times \langle K \rangle) \times \mathrm{P}([K]) \mid K \in J(\mathcal{K}) \bigr),$$
$$\Sigma \mapsto \bigl( (\Sigma_{[K \rangle} \Sigma_{\langle K \rangle}^{-1},\ \Sigma_{[K]\cdot}) \mid K \in J(\mathcal{K}) \bigr),$$
is bijective [AP (1993), Theorem 2.2];
2. The covariance matrix $\Sigma \in \mathrm{P}_{\mathcal{K}}(I)$ if and only if
$$(2.6)\qquad \operatorname{tr}\bigl( \Sigma^{-1} xx' \bigr) = \sum \Bigl( \operatorname{tr}\bigl( \Sigma_{[K]\cdot}^{-1} \bigl( x_{[K]} - \Sigma_{[K \rangle} \Sigma_{\langle K \rangle}^{-1} x_{\langle K \rangle} \bigr)(\cdots)' \bigr) \Bigm| K \in J(\mathcal{K}) \Bigr)$$
for all $x \in \mathbb{R}^I$ [AP (1993), Theorem 2.1];
3. For $\Sigma \in \mathrm{P}_{\mathcal{K}}(I)$ and $L \in \mathcal{K}$,
$$(2.7)\qquad \det(\Sigma_L) = \prod \bigl( \det(\Sigma_{[K]\cdot}) \mid K \in J(\mathcal{K}),\ K \subseteq L \bigr)$$
[AP (1993), Lemma 2.5]. In particular,
$$(2.8)\qquad \det(\Sigma) = \prod \bigl( \det(\Sigma_{[K]\cdot}) \mid K \in J(\mathcal{K}) \bigr);$$
4. The action of the group $\mathrm{GL}_{\mathcal{K}}(I)$ on $\mathrm{P}_{\mathcal{K}}(I)$ given by restriction of (2.1) is well defined, transitive and proper;
5. The model (2.3) is invariant under the action of $\mathrm{GL}_{\mathcal{K}}(I)$ on the observation space $\mathbb{R}^I$ given by the restriction of the action (2.2) and the transitive action of $\mathrm{GL}_{\mathcal{K}}(I)$ on the parameter space $\mathrm{P}_{\mathcal{K}}(I)$.

2.3. The group symmetry model. Let $G$ be a finite group and $\rho: G \to \mathrm{O}(I)$ an orthogonal group representation of $G$ on $\mathbb{R}^I$; that is, $\rho(1) = 1_I$ and $\rho(g_1 g_2) = \rho(g_1) \rho(g_2)$ for all $g_1, g_2 \in G$. Let $M_G(I)$ denote the subalgebra of all matrices $A \in M(I)$ that commute with $\rho(G)$, that is, $A \rho(g) = \rho(g) A$ for all $g \in G$. The group of all nonsingular matrices and the cone of all positive definite matrices in $M_G(I)$ are denoted by $\mathrm{GL}_G(I)$ and $\mathrm{P}_G(I)$, respectively. Note that $\Sigma \in \mathrm{P}_G(I)$ if and only if $\Sigma \in \mathrm{P}(I)$ and $\Sigma$ is $G$-invariant, that is, $\rho(g) \Sigma \rho(g)' = \Sigma$. Thus if $x \in \mathbb{R}^I$ follows the distribution $N(\Sigma)$, where $\Sigma \in \mathrm{P}_G(I)$, then $\rho(g) x$ follows the same distribution for all $g \in G$. The statistical
model
$$(2.9)\qquad \bigl( N(\Sigma) \mid \Sigma \in \mathrm{P}_G(I) \bigr)$$
with observation space $\mathbb{R}^I$ and parameter space $\mathrm{P}_G(I)$ is thus called the group symmetry (GS) model given by $G$. A summary of the basic theory of these models is presented in Appendix A; see the Introduction. The smoothing ($\equiv$ averaging) mapping
$$(2.10)\qquad I^G: \mathrm{PS}(I) \to \mathrm{PS}_G(I),\qquad S \mapsto \frac{1}{|G|} \sum \bigl( \rho(g) S \rho(g)' \mid g \in G \bigr)