Chapter 1

NON-HIERARCHICAL MULTILEVEL MODELS

Jon Rasbash and William Browne

1. INTRODUCTION

In the models discussed in this book so far we have assumed that the structures of the populations from which the data have been drawn are hierarchical. This assumption is sometimes not justified. In this chapter two main types of non-hierarchical model are considered. Firstly, we consider cross-classified models; the notion of cross-classification is probably reasonably familiar to most readers. Secondly, we consider multiple membership models, where lower level units are influenced by more than one higher level unit from the same classification. For example, some pupils may attend more than one school. We also consider situations that contain a mixture of hierarchical, crossed and multiple membership relationships.

2. CROSS-CLASSIFIED MODELS

This section is divided into three parts. In the first part we look at situations that can give rise to a two way cross-classification, introduce some diagrams to describe the population structure, and discuss notation for constructing a statistical model. In the second part we discuss some of the possible methods for estimating cross-classified models and give an example analysis of an educational data set. In the third part we describe some more complex cross-classified structures and give an example analysis of a medical data set.

2.1 TWO WAY CROSS-CLASSIFICATIONS: A BASIC MODEL

Suppose we have data on a large number of patients attending many hospitals, that we also know the neighbourhood in which each patient lives, and that we regard patient, neighbourhood and hospital all as important sources of variation for the patient level outcome measure we wish to study. Typically, hospitals will draw patients from many different neighbourhoods and the inhabitants of a neighbourhood will go to many hospitals. No pure hierarchy can be found and patients are said to be contained within a cross-classification of hospitals by neighbourhoods. This can be represented schematically, for the case of twelve patients contained within a cross-classification of three neighbourhoods by four hospitals, as in table 1.1. In this example we have patients at level 1, and neighbourhood and hospital are cross-classified at level 2. The characteristic pattern of a cross-classification is shown: some rows contain multiple entries and some columns contain multiple entries. In a nested relationship, if the row classification is nested within the column classification then all the entries across a row will fall under a single column, and vice versa if the column classification is nested within the row classification. For example, if hospitals are nested within neighbourhoods we might observe the pattern in table 1.2.

Table 1.1  Patients cross-classified by hospital and neighbourhood.

              Neighbourhood 1   Neighbourhood 2   Neighbourhood 3
Hospital 1    XX                X
Hospital 2    X                 X
Hospital 3                      XX                X
Hospital 4                      X                 XXX

Table 1.2  Patients nested within hospitals within neighbourhoods. The columns are Neighbourhood 1, Neighbourhood 2 and Neighbourhood 3, and all the entries of a hospital row fall under a single neighbourhood column.

Hospital 1    XXX
Hospital 2    XX
Hospital 3    XXX
Hospital 4    XXXX
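The distinction between a crossed and a nested pattern can be checked mechanically from unit memberships. The following is a minimal Python sketch, not part of the original analysis. The function name `is_nested` and the list layout are illustrative assumptions, and the membership lists are read off table 1.1 (the order of patients within a hospital is arbitrary).

```python
from collections import defaultdict

def is_nested(lower, upper):
    """Return True if every unit of `lower` maps to exactly one unit of `upper`."""
    parents = defaultdict(set)
    for lo, up in zip(lower, upper):
        parents[lo].add(up)
    return all(len(p) == 1 for p in parents.values())

# One entry per patient: the hospital attended and the neighbourhood of residence.
hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
nbhd = [1, 2, 1, 2, 1, 2, 2, 3, 3, 2, 3, 3]

# Two classifications are crossed when neither is nested within the other.
crossed = not is_nested(hosp, nbhd) and not is_nested(nbhd, hosp)
print("hospitals nested within neighbourhoods:", is_nested(hosp, nbhd))
print("crossed:", crossed)
```

With these memberships neither classification is nested within the other, which is the situation the text describes as a cross-classification.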

Many studies follow this simple two-way crossed structure. Here are a few examples:

Education: students cross-classified by primary school and secondary school.
Health: patients cross-classified by general practice and hospital.
Survey data: individuals cross-classified by interviewer and area of residence.

2.1.1 Diagrams for representing the relationship between classifications. We find two types of diagrams useful in expressing the nature of relationships between classifications. Firstly, unit diagrams, where we draw every unit (patient, hospital and neighbourhood, in the case of our first example) and connect each lowest level unit (patient) to its parent units (hospital, neighbourhood). Such a representation of the data in table 1.1 is shown in figure 1.1.

Figure 1.1  Diagrams for crossed structure given in table 1.1 (node rows: Hospital H1 to H4, Patient P1 to P12, Neighbourhood N1 to N3).

Note that we have two hierarchies present, patients within hospitals and patients within neighbourhoods. We have organised the topology of the diagram such that patients are nested within hospitals. However, when we come to add neighbourhoods to the diagram we see that the connecting lines cross, indicating we have a cross-classification. Drawing the hierarchical structure shown in table 1.2 gives the representation shown in figure 1.2.

Figure 1.2  Diagrams for completely nested structure given in table 1.2 (node rows: Neighbourhood N1 to N3, Hospital H1 to H4, Patient P1 to P12).

Clearly, drawing such diagrams with every unit included is not practical for large data sets, as there will be far too many nodes to fit into a reasonable area. The diagrams can, however, be used in schematic form to convey the structure of the relationship between classifications. When we have four or five random classifications present (as commonly occurs with social data), even schematic forms of these diagrams can become hard to read. There is a more minimal diagram, the classification diagram, which has one node for each classification. Nodes connected by an arrow indicate a nested relationship, nodes connected by a double arrow indicate a multiple-membership relationship (examples are given later) and unconnected nodes indicate a crossed relationship. Thus the crossed structure in figure 1.1 and the completely nested structure of figure 1.2 are drawn as in figure 1.3.

Figure 1.3  Classification diagrams for nesting and crossing: (i) crossed structure (nodes: Hospital, Neighbourhood, Patient); (ii) nested structure (nodes: Neighbourhood, Hospital, Patient).

2.1.2 Some notation for constructing a statistical model.

The matrix notation used in this book for describing hierarchical models, that is,

$$y_j = X_j\beta + Z_j u_j + e_j,$$

does not readily extend to the case of cross-classifications. This is because this notation assumes a unique hierarchy where we write down the generic equation for the $j$th level two unit. In a simple cross-classification we have two sets of level two units, for example, hospitals and neighbourhoods, so which classification is $j$ indexing? We can extend the basic scalar notation to handle cross-classified structures. Assume we have patients nested within a cross-classification of neighbourhoods by hospital, that is, the case illustrated in figure 1.3(i). Suppose we want to estimate a simple variance components model giving estimates of the mean response and patient, hospital and neighbourhood level variation. In this case we can write the model in scalar notation as

$$y_{i(j_1,j_2)} = \beta_0 + u_{j_1} + u_{j_2} + e_{i(j_1,j_2)}$$

where $\beta_0$ estimates the mean response, $j_1$ indexes the neighbourhood classification, $j_2$ indexes the hospital classification, $u_{j_1}$ is the random effect for neighbourhood $j_1$, $u_{j_2}$ is the random effect for hospital $j_2$, $y_{i(j_1,j_2)}$ is the response for the $i$th patient from the cell in the cross-classification defined by neighbourhood $j_1$ and hospital $j_2$, and finally $e_{i(j_1,j_2)}$ is the patient level residual for the $i$th patient from the cell in the cross-classification defined by neighbourhood $j_1$ and hospital $j_2$.

Table 1.3  Indexing table for neighbourhoods and hospitals for patients given in figure 1.1

i          1   2   3   4   5   6   7   8   9   10  11  12
nbhd(i)    1   2   1   2   1   2   2   3   3   2   3   3
hosp(i)    1   1   1   2   2   3   3   3   4   4   4   4

Details of how this notation extends to represent more complex models and patterns of cross-classifications are given in Rasbash and Browne, 2001. One problem with this notation is that as we fit models with more classifications and more complex patterns of crossing, the subscript notation that describes the patterns becomes very cumbersome and difficult to read. We therefore prefer an alternative notation introduced in Browne et al., 2000. We can write the same model as

$$y_i = \beta_0 + u^{(2)}_{\mathrm{nbhd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + e_i$$

where $i$ indexes the observation level, in this case patients, and $\mathrm{nbhd}(i)$ and $\mathrm{hosp}(i)$ are functions that return the unit number of the neighbourhood and hospital, respectively, that patient $i$ belongs to. Thus for the data structure drawn in figure 1.1 the values of $\mathrm{nbhd}(i)$ and $\mathrm{hosp}(i)$ are given in table 1.3. Therefore the model for patient 3 would be

$$y_3 = \beta_0 + u^{(2)}_1 + u^{(3)}_1 + e_3$$

and for patient 5 would be

$$y_5 = \beta_0 + u^{(2)}_1 + u^{(3)}_2 + e_5$$

We number the classifications from 2 upwards as we use classification number 1 to represent the identity classification that applies to the observation level (like level 1 in a hierarchical model). This classification simply returns the row numbers in the data matrix. As can be seen, random effects require bracketed superscripting with their classification number to avoid ambiguity. This simplified notation has the advantage that the subscripting notation does not increase in complexity as we add more classifications. This simplification is achieved because the notation makes no attempt to describe the patterns of crossing and nesting present. These patterns are useful information, and we therefore advocate the use of this notation in conjunction with the classification diagrams, as shown in figure 1.3, which display these patterns explicitly.
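The role of the indexing functions can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the value of `beta0`, the simulated random effects and the helper `response` are hypothetical, and only the membership lists come from table 1.3.

```python
import random

# Table 1.3: neighbourhood and hospital unit numbers for patients 1..12.
nbhd = [1, 2, 1, 2, 1, 2, 2, 3, 3, 2, 3, 3]
hosp = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4]

rng = random.Random(0)
beta0 = 10.0                                           # hypothetical mean response
u2 = {k: rng.gauss(0, 1) for k in sorted(set(nbhd))}   # u^(2): one effect per neighbourhood
u3 = {k: rng.gauss(0, 1) for k in sorted(set(hosp))}   # u^(3): one effect per hospital
e = [rng.gauss(0, 1) for _ in nbhd]                    # patient level residuals

def response(i):
    """y_i for patient i (1-based): beta0 + u2[nbhd(i)] + u3[hosp(i)] + e_i."""
    return beta0 + u2[nbhd[i - 1]] + u3[hosp[i - 1]] + e[i - 1]

# Patients 3 and 5 share neighbourhood 1 but attend hospitals 1 and 2.
y3, y5 = response(3), response(5)
print(y3, y5)
```

The lookup through `nbhd` and `hosp` is all the notation requires; nothing in the code records whether the two classifications are crossed or nested, which is why the classification diagram is still needed alongside it.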

2.2 ESTIMATION ALGORITHMS

We will describe three estimation algorithms for fitting cross-classified models in detail and mention other alternatives. Each of these three methods has advantages and disadvantages in terms of speed, memory usage and bias, and these will be discussed later. All three methods have been implemented in versions of the MLwiN software package (Rasbash et al., 2000) and all results in this paper are produced by this package.

2.2.1 An IGLS algorithm for estimating cross-classified models. The iterative generalized least squares estimates for a multilevel model are those estimates which simultaneously satisfy both of the following equations:

$$\hat{\beta} = (X^T V^{-1} X)^{-1} X^T V^{-1} y$$

$$\hat{\theta} = (Z^{*T} (V^*)^{-1} Z^*)^{-1} Z^{*T} (V^*)^{-1} y^*$$

where $\hat{\beta}$ are the estimated fixed coefficients and $\hat{\theta}$ is a vector containing the estimated variances and covariances of the sets of random effects in the model. $V = \mathrm{Cov}(y \mid X, \beta)$ and an estimate of $V$ is constructed from the elements of $\hat{\theta}$. $y^*$ is the vector of elements of $(y - X\beta)(y - X\beta)^T$ and therefore has length $n^2$ ($n$ is the length of the data set). $V^*$ is the covariance matrix of $y^*$ and $Z^*$ is the design matrix linking $y^*$ to $V$ in the regression of $y^*$ on $Z^*$. $V^*$ has the form $V^* = V \otimes V$. See Goldstein, 1986 for more details.

Some of these matrices are massive; for example, $(V^*)^{-1}$ is dimensioned $n^2$ by $n^2$, making a direct software implementation of these estimating equations extremely resource intensive both in terms of CPU time and memory consumed. However, in hierarchical models $V$ and $V^*$ have a block diagonal structure which can be exploited by customised algorithms (see Goldstein and Rasbash, 1996) which allow efficient computation. The problem presented by cross-classified models is that $V$ (and therefore $V^*$) no longer has the block diagonal structure which the efficient algorithm requires.
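To give a feel for why a direct implementation is impractical, the short Python calculation below counts the storage an $n^2$ by $n^2$ matrix would need if it were held as a dense array. This is a back-of-envelope sketch: the 8 bytes per element (double precision) and the function name `dense_vstar_bytes` are assumptions for illustration, and real software would not store the matrix this way.

```python
def dense_vstar_bytes(n, bytes_per_element=8):
    """Bytes needed to hold an n^2 by n^2 matrix as a dense array."""
    return (n * n) ** 2 * bytes_per_element

# A few illustrative data set sizes.
for n in (100, 1000, 3435):
    print(n, "observations:", dense_vstar_bytes(n) / 1e9, "GB")
```

Even a data set of 100 observations would need close to a gigabyte under these assumptions, which is why algorithms that exploit the structure of $V$ matter.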

2.2.2 Structure of V for cross-classified models. Let us take a look at the structure of $V$, the covariance matrix of $y$, for cross-classified models and see how we can adapt the standard IGLS algorithm to handle cross-classifications. The basic two level cross-classified model (with hospitals + neighbourhoods) can be written as:

$$y_i = X_i\beta + u^{(2)}_{\mathrm{hosp}(i)} + u^{(3)}_{\mathrm{nhbd}(i)} + e_i$$

$$u^{(2)}_{\mathrm{hosp}(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{nhbd}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e)$$

The variance of our response is now

$$\mathrm{var}(y_i) = \mathrm{var}(u^{(2)}_{\mathrm{hosp}(i)} + u^{(3)}_{\mathrm{nhbd}(i)} + e_i) = \sigma^2_{u(2)} + \sigma^2_{u(3)} + \sigma^2_e.$$

The covariance between individuals $a$ and $b$ is

$$\mathrm{cov}(y_a, y_b) = \mathrm{cov}(u^{(2)}_{\mathrm{hosp}(a)} + u^{(3)}_{\mathrm{nhbd}(a)} + e_a,\; u^{(2)}_{\mathrm{hosp}(b)} + u^{(3)}_{\mathrm{nhbd}(b)} + e_b)$$

which simplifies to $\sigma^2_{u(2)}$ for two individuals from the same hospital but different neighbourhoods, $\sigma^2_{u(3)}$ for two individuals from the same neighbourhood but different hospitals, $\sigma^2_{u(2)} + \sigma^2_{u(3)}$ for two individuals from the same neighbourhood and the same hospital, and zero for two individuals who are from different neighbourhoods and different hospitals. Consider a toy example of five patients in two hospitals, with a cross-classification by two neighbourhoods, as shown in table 1.4. This generates a 5 by 5 covariance matrix for the responses of the five patients with the following structure:

Table 1.4  Indexing table for hospitals and neighbourhoods for 5 patients

i          1   2   3   4   5
hosp(i)    1   1   1   2   2
nhbd(i)    1   2   1   2   1

$$V = \begin{pmatrix}
h+n+p & h & h+n & 0 & n \\
h & h+n+p & h & n & 0 \\
h+n & h & h+n+p & 0 & n \\
0 & n & 0 & h+n+p & h \\
n & 0 & n & h & h+n+p
\end{pmatrix}$$

where $h = \sigma^2_{u(2)}$ (hospitals), $n = \sigma^2_{u(3)}$ (neighbourhoods) and $p = \sigma^2_e$.

Here the data are sorted by patient within hospital, which allows us to split the covariance matrix into two components: a component for patients within hospitals which has a block diagonal structure ($P$) and a component for neighbourhoods which is not block diagonal ($Q$):

$$V = P + Q$$

where

$$P = \begin{pmatrix}
h+p & h & h & 0 & 0 \\
h & h+p & h & 0 & 0 \\
h & h & h+p & 0 & 0 \\
0 & 0 & 0 & h+p & h \\
0 & 0 & 0 & h & h+p
\end{pmatrix}
\quad \text{and} \quad
Q = \begin{pmatrix}
n & 0 & n & 0 & n \\
0 & n & 0 & n & 0 \\
n & 0 & n & 0 & n \\
0 & n & 0 & n & 0 \\
n & 0 & n & 0 & n
\end{pmatrix}$$

Splitting the structure of $V$ into a hierarchical, block diagonal part that the IGLS algorithm can handle in an efficient way and a non-hierarchical, non-block diagonal part forms the basis of a relatively efficient algorithm for handling cross-classified models. If we take the dummy variable indicator matrix of neighbourhoods ($Z$), then we have $Q = ZZ^T n$:

$$Z = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 0 \\ 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad
ZZ^T n = \begin{pmatrix}
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1 \\
0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 1
\end{pmatrix} n$$

We can define a `pseudo-unit' that spans the entire data set (in our toy example, all five points) and declare this pseudo-unit to be level three in the model, removing the neighbourhood level from the model. We can now form the three level hierarchical model

$$y_i = \beta_0 + u^{(2)}_{\mathrm{hosp}(i)} + u^{(3)}_{\mathrm{punit}(i),1} Z_{1i} + u^{(3)}_{\mathrm{punit}(i),2} Z_{2i} + e_i$$

$$\begin{bmatrix} u^{(3)}_{\mathrm{punit}(i),1} \\ u^{(3)}_{\mathrm{punit}(i),2} \end{bmatrix} \sim N(0, \Omega^{(3)}_u), \quad
\Omega^{(3)}_u = \begin{bmatrix} \sigma^2_{u(3),1} & 0 \\ 0 & \sigma^2_{u(3),2} \end{bmatrix}$$

$$u^{(2)}_{\mathrm{hosp}(i)} \sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)$$

Here the level structure is patients within hospitals within the pseudo-unit level. $Z_1$ and $Z_2$ are columns 1 and 2 of $Z$. $\sigma^2_{u(3),1}$ and $\sigma^2_{u(3),2}$ are both estimates of the between neighbourhood variation, therefore we constrain them to be equal. Thus we can use the standard IGLS hierarchical algorithm to define and estimate the correct covariance structure for a cross-classified model. If we had 200 hospitals and 100 neighbourhoods, we would have to form 100 dummy variables for the neighbourhoods, allow them all to have variances at level 3 and constrain the variances to be equal. Details of this algorithm are given in Rasbash and Goldstein, 1994 and Bull et al., 1999, and it will be referred to as the RG algorithm in later sections.
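The decomposition of the toy covariance matrix into a block diagonal hospital part and a neighbourhood part built from dummy variables can be reproduced in a few lines of Python. This is a sketch for checking the structure only: the numeric values given to the hospital, neighbourhood and patient level variances (`h`, `n`, `p`) are arbitrary assumptions, and the memberships are those of table 1.4.

```python
hosp = [1, 1, 1, 2, 2]      # hospital of each of the five patients (table 1.4)
nbhd = [1, 2, 1, 2, 1]      # neighbourhood of each patient
h, n, p = 2.0, 3.0, 5.0     # hypothetical hospital, neighbourhood, patient variances
N = len(hosp)

# P: block diagonal part, patients within hospitals (data sorted by hospital).
P = [[(h if hosp[a] == hosp[b] else 0.0) + (p if a == b else 0.0)
      for b in range(N)] for a in range(N)]

# Z: dummy variable indicator matrix of neighbourhoods; Q = Z Z^T n.
Z = [[1.0 if nbhd[a] == k else 0.0 for k in (1, 2)] for a in range(N)]
Q = [[n * sum(Z[a][k] * Z[b][k] for k in range(2)) for b in range(N)]
     for a in range(N)]

# V = P + Q is the covariance matrix of the five responses.
V = [[P[a][b] + Q[a][b] for b in range(N)] for a in range(N)]
for row in V:
    print(row)
```

Each printed row can be compared entry by entry with the symbolic matrix: the diagonal holds the sum of the three variances, pairs sharing only a hospital hold `h`, and pairs sharing only a neighbourhood hold `n`.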

2.2.3 MCMC. The MCMC estimation methods (see Chapter 3 of this book for a fuller description) aim to generate samples from the joint posterior distribution of all unknown parameters. They then use these samples to calculate point and interval estimates for each individual parameter. The Gibbs sampler algorithm produces samples from the joint posterior by generating in turn from the conditional posterior distributions of groups of unknown parameters. In chapter 3 the Gibbs sampling algorithm for a Normally distributed response hierarchical model is given. As we have seen in the notation section, we can describe our model as a set of additive terms, one for the fixed part of the model and one for each of the random classifications. The MCMC algorithm works on each of these terms separately and consequently the algorithm for a cross-classified model is no more complicated than for a hierarchical model. For illustration we present the steps for the following cross-classified model, based on the variance components hospitals by neighbourhoods model, and refer the interested reader to Browne et al., 2000 for more general algorithms. Note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two level cross-classified model (with hospitals + neighbourhoods) can be written as:

$$y_i = X_i\beta + u^{(2)}_{\mathrm{hosp}(i)} + u^{(3)}_{\mathrm{nhbd}(i)} + e_i$$

$$u^{(2)}_{\mathrm{hosp}(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{nhbd}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e)$$

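We can split our unknown parameters into 6 distinct sets: the fixed effects, $\beta$; the hospital random effects, $u^{(2)}_{\mathrm{hosp}(i)}$; the neighbourhood random effects, $u^{(3)}_{\mathrm{nhbd}(i)}$; the hospital variance, $\sigma^2_{u(2)}$; the neighbourhood variance, $\sigma^2_{u(3)}$; and the residual variance, $\sigma^2_e$. Then we need to generate random draws from the conditional distribution of each of these six groups of unknowns. MCMC algorithms are generally used in a Bayesian context and consequently we need to define prior distributions for our unknown parameters. For generality we will use a multivariate Normal prior for the fixed effects, $\beta \sim N_{p_f}(\mu_p, S_p)$, and scaled inverse $\chi^2$ ($SI\chi^2$) priors for the three variances: for the hospital variance $\sigma^2_{u(2)} \sim SI\chi^2(\nu_2, s^2_2)$, for the neighbourhood variance $\sigma^2_{u(3)} \sim SI\chi^2(\nu_3, s^2_3)$ and for the residual variance $\sigma^2_e \sim SI\chi^2(\nu_e, s^2_e)$. The steps are then as follows:

In step 1 of the algorithm the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector $\beta$ is multivariate Normal with dimension $p_f$ (the number of fixed effects):

$$p(\beta \mid y, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}),$$

where

$$\hat{D} = \left[ \frac{\sum_{i=1}^{N} X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}
\quad \text{and} \quad
\hat{\beta} = \hat{D} \left[ \frac{\sum_{i} X_i^T d_i}{\sigma^2_e} + S_p^{-1} \mu_p \right],$$

where $d_i = y_i - u^{(2)}_{\mathrm{hosp}(i)} - u^{(3)}_{\mathrm{nhbd}(i)}$.

In step 2 we update the hospital residuals, $u^{(2)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution:

$$p(u^{(2)}_k \mid y, \beta, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k),$$

where

$$\hat{D}^{(2)}_k = \left[ \frac{n^{(2)}_k}{\sigma^2_e} + \frac{1}{\sigma^2_{u(2)}} \right]^{-1}
\quad \text{and} \quad
\hat{u}^{(2)}_k = \hat{D}^{(2)}_k \left[ \frac{\sum_{i,\,\mathrm{hosp}(i)=k} d^{(2)}_i}{\sigma^2_e} \right],$$

where $d^{(2)}_i = y_i - X_i\beta - u^{(3)}_{\mathrm{nhbd}(i)}$.

In step 3 we update the neighbourhood residuals, $u^{(3)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution:

$$p(u^{(3)}_k \mid y, \beta, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}, \sigma^2_e) \sim N(\hat{u}^{(3)}_k, \hat{D}^{(3)}_k),$$

where

$$\hat{D}^{(3)}_k = \left[ \frac{n^{(3)}_k}{\sigma^2_e} + \frac{1}{\sigma^2_{u(3)}} \right]^{-1}
\quad \text{and} \quad
\hat{u}^{(3)}_k = \hat{D}^{(3)}_k \left[ \frac{\sum_{i,\,\mathrm{nhbd}(i)=k} d^{(3)}_i}{\sigma^2_e} \right],$$

where $d^{(3)}_i = y_i - X_i\beta - u^{(2)}_{\mathrm{hosp}(i)}$.

Note that in the above two steps $n^{(c)}_k$ refers to the number of individuals in the $k$th unit of classification $c$.

In step 4 we update the hospital variance $\sigma^2_{u(2)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(2)}$:

$$p(1/\sigma^2_{u(2)} \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(3)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_2 + \nu_2}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2 \right) \right],$$

where $n_2$ is the number of hospitals.

In step 5 we update the neighbourhood variance $\sigma^2_{u(3)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(3)}$:

$$p(1/\sigma^2_{u(3)} \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_3 + \nu_3}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_3} (u^{(3)}_j)^2 + \nu_3 s^2_3 \right) \right],$$

where $n_3$ is the number of neighbourhoods.

In step 6 we update the observation level variance $\sigma^2_e$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_e$:

$$p(1/\sigma^2_e \mid y, \beta, u^{(2)}, u^{(3)}, \sigma^2_{u(2)}, \sigma^2_{u(3)}) \sim \mathrm{Gamma}\left[ \frac{N + \nu_e}{2},\; \frac{1}{2}\left( \sum_{i} e_i^2 + \nu_e s^2_e \right) \right].$$

The above 6 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates from which point and interval estimates can be created as in chapter 3.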
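The six Gibbs steps can be sketched in plain Python for the simplest case. This is an illustrative sketch, not the MLwiN implementation. It assumes an intercept-only fixed part with a flat prior (so the prior precision term drops out of step 1), the same scaled inverse chi-squared prior parameters `nu` and `s2` for all three variances, 0-based unit identifiers, and simulated data. The function name `gibbs_cross_classified` and all default values are hypothetical.

```python
import random

def gibbs_cross_classified(y, hosp, nbhd, n_iter=600, burn=100,
                           nu=0.001, s2=1.0, seed=1):
    """Gibbs sampler for y_i = b0 + u2[hosp(i)] + u3[nbhd(i)] + e_i.

    Assumptions (illustrative): intercept-only fixed part with a flat prior,
    identical SI-chi2(nu, s2) priors on all variances, 0-based unit ids.
    """
    rng = random.Random(seed)
    N = len(y)
    rows2 = [[i for i in range(N) if hosp[i] == k] for k in range(max(hosp) + 1)]
    rows3 = [[i for i in range(N) if nbhd[i] == k] for k in range(max(nbhd) + 1)]
    u2, u3 = [0.0] * len(rows2), [0.0] * len(rows3)
    b0, v2, v3, ve = sum(y) / N, 1.0, 1.0, 1.0
    chain = []
    for it in range(n_iter):
        # Step 1: intercept. With X_i = 1 and a flat prior, D = ve / N.
        d = [y[i] - u2[hosp[i]] - u3[nbhd[i]] for i in range(N)]
        b0 = rng.gauss(sum(d) / N, (ve / N) ** 0.5)
        # Step 2: hospital effects, one univariate Normal draw per hospital.
        for k, rows in enumerate(rows2):
            D = 1.0 / (len(rows) / ve + 1.0 / v2)
            total = sum(y[i] - b0 - u3[nbhd[i]] for i in rows)
            u2[k] = rng.gauss(D * total / ve, D ** 0.5)
        # Step 3: neighbourhood effects.
        for k, rows in enumerate(rows3):
            D = 1.0 / (len(rows) / ve + 1.0 / v3)
            total = sum(y[i] - b0 - u2[hosp[i]] for i in rows)
            u3[k] = rng.gauss(D * total / ve, D ** 0.5)
        # Steps 4 to 6: precisions are Gamma(shape, rate); gammavariate takes
        # a scale parameter, so 1 / rate is passed.
        ss2 = sum(u * u for u in u2)
        ss3 = sum(u * u for u in u3)
        sse = sum((y[i] - b0 - u2[hosp[i]] - u3[nbhd[i]]) ** 2 for i in range(N))
        v2 = 1.0 / rng.gammavariate((len(u2) + nu) / 2, 2.0 / (ss2 + nu * s2))
        v3 = 1.0 / rng.gammavariate((len(u3) + nu) / 2, 2.0 / (ss3 + nu * s2))
        ve = 1.0 / rng.gammavariate((N + nu) / 2, 2.0 / (sse + nu * s2))
        if it >= burn:
            chain.append((b0, v2, v3, ve))
    return chain

# Simulated data: 300 patients, 10 hospitals crossed with 8 neighbourhoods.
sim = random.Random(0)
hosp = [sim.randrange(10) for _ in range(300)]
nbhd = [sim.randrange(8) for _ in range(300)]
a = [sim.gauss(0, 1) for _ in range(10)]
b = [sim.gauss(0, 1) for _ in range(8)]
y = [5.0 + a[hosp[i]] + b[nbhd[i]] + sim.gauss(0, 1) for i in range(300)]

chain = gibbs_cross_classified(y, hosp, nbhd)
post_mean = [sum(draw[j] for draw in chain) / len(chain) for j in range(4)]
print("posterior means (b0, var_hosp, var_nbhd, var_e):", post_mean)
```

Note that the update for each classification only needs the current values of the other terms, so adding a further crossed classification would add one more pair of blocks of the same form.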

2.2.4 AIP method. The Alternating Imputation Posterior (AIP) method is a data augmentation algorithm for estimating cross-classified models with large numbers of random effects. Comprehensive details of this algorithm are given in Clayton and Rasbash, 1999. We now give an overview. Data augmentation has been reviewed by Schafer, 1997. Tanner and Wong, 1987 introduced the idea of data augmentation as a stochastic version of the EM algorithm for maximum likelihood estimation in problems involving missing data. Corresponding to the E and M steps, the algorithm of Tanner and Wong has two steps:

I(mputation) step: impute missing data by sampling the distribution of the missing data conditional upon the observed data and current values of the model parameters.

P(osterior) step: sample parameter values from the complete data posterior distribution; these will be used for the next I-step.

In the context of random effect models, the random effects play the role of missing data. If the observed data are denoted by $y$, the random effects by $u$ and the model parameters by $\theta$, and if we denote the probability distribution of $y$ conditional on $X$ as $p(y \mid X)$, then the algorithm is specified (at step $t$) by

I step: draw a sample $u^{(t)}$ from $p(u \mid y, \theta = \theta^{(t-1)})$

P step: draw a sample $\theta^{(t)}$ from $p(\theta \mid y, u = u^{(t)})$

Repeated application of these two steps delivers a stochastic chain with equilibrium distribution $p(\theta, u \mid y)$, in a similar way to the MCMC algorithm. Now let us look at how we can adapt this method to fit a crossed random effects model when the only estimating engine we have at our disposal is one optimized for fitting nested random effects. An n-way cross-classified model can be broken down into n sub-models, each of which is a 2 level hierarchical model. For example, patients nested within a cross-classification of neighbourhood by hospital can be broken down into a patient within hospital sub-model and a patient within neighbourhood sub-model. Take the simple model

$$y_i = X_i\beta + u^{(2)}_{\mathrm{nbd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + e_i$$

where neighbourhood and hospital are cross-classified. This cross-classified model can be partitioned into two hierarchical sub-models: patients within neighbourhoods (model N) and patients within hospitals (model H). An informal description of the AIP algorithm is:

1. Start by fitting model N using an estimation procedure for 2 level models.

2. Sample the model parameters from an approximation to their joint posterior distribution. That is, sample the fixed effects, the neighbourhood level variance and the patient level variance; denote these samples by $\beta_{[0,2]}$, $\sigma^2_{u[0,2]}$ and $\sigma^2_{e[0,2]}$ respectively. Here [0,2] labels a term as belonging to AIP iteration 0, for classification number 2, that is neighbourhood. This is the P-step for the neighbourhood classification.

3. Next sample a set of neighbourhood level random effects ($o_{[0,2]}$) from $p(u_{[0,2]} \mid y, \beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]})$. This is the I-step for the neighbourhood classification.

4. Offset $o_{[0,2]}$ from $y$, that is, form $y^* = y - o_{[0,2]}$, re-sort the data according to hospitals and fit model H using the new offset response $y^*$.

5. Next sample $\beta_{[0,3]}$, $\sigma^2_{u[0,3]}$ and $\sigma^2_{e[0,3]}$ from this second model, H. This is the P-step for the hospital classification.

6. Sample a set of hospital level random effects ($o_{[0,3]}$) from $p(u_{[0,3]} \mid y^*, \beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]})$. This is the I-step for the hospital classification.

This completes one iteration of the AIP algorithm, an Imputation Posterior algorithm that Alternates between the neighbourhood and hospital classifications. We proceed by forming $y^* = y - o_{[0,3]}$, that is, offsetting the sampled hospital residuals from $y$, and using that as a response in step 1. After $T$ iterations the procedure delivers the following two chains, which can be used for inference:

$$\{\beta_{[0,2]}, \sigma^2_{u[0,2]}, \sigma^2_{e[0,2]}\},\; \{\beta_{[1,2]}, \sigma^2_{u[1,2]}, \sigma^2_{e[1,2]}\},\; \ldots,\; \{\beta_{[T,2]}, \sigma^2_{u[T,2]}, \sigma^2_{e[T,2]}\}$$

$$\{\beta_{[0,3]}, \sigma^2_{u[0,3]}, \sigma^2_{e[0,3]}\},\; \{\beta_{[1,3]}, \sigma^2_{u[1,3]}, \sigma^2_{e[1,3]}\},\; \ldots,\; \{\beta_{[T,3]}, \sigma^2_{u[T,3]}, \sigma^2_{e[T,3]}\}$$

Note that we get two sets of estimates for both the fixed effects and the level 1 variance with the AIP algorithm, and these should be approximately equal.

2.2.5 Other Methods. Raudenbush, 1993 considers an empirical Bayes approach to fitting cross-classified models based on the EM algorithm. He considers the specific case of two classifications where one of the classifications has many units whilst the other has far fewer, and shows two educational examples to illustrate the method. Two other recent approaches that can be used for fitting cross-classified models, in particular with non-Normal responses, are Gauss-Hermite quadrature within PQL estimation (Pan and Thompson, 2000) and the HGLM model framework as described in Lee and Nelder, 2000. Neither of these approaches has been designed with speed of estimation in mind, and so they are currently not feasible for the size of some of the problems that are considered in practice.

2.2.6 Comparison of estimation methods. The RG method, when it works, is generally fairly quick to converge where all, or all but one, of the crossed classifications have small numbers of units. When there are multiple crossed classifications with large numbers of units, the speed of the RG algorithm deteriorates and memory usage is greatly increased, often exhausting the available memory. The AIP method does not have these memory problems but will be slower for structures that are almost hierarchical. Although the AIP method works reasonably well, if the response is a binary variable and quasi-likelihood methods need to be used, then it is, like the RG method, still affected by the bias that is inherent in quasi-likelihood methods for binary response multilevel models (see Goldstein and Rasbash, 1996). The MCMC methods have no bias problems, although there are still issues concerning which prior distributions to use for the variance parameters. Like the AIP method, they do not have any memory problems. They are, however, generally a lot slower computationally, as they are estimating the whole distribution and not simply the mode, although as the structure of the data becomes more complex the ratio of speed difference is reduced.

2.2.7 An example analysis of a two way cross-classification: primary schools crossed with secondary schools. Here we consider applying the RG method using the IGLS algorithm, the MCMC method based on Gibbs sampling (Browne et al., 2000) and the AIP method to an educational example from Fife in Scotland. The response is the exam results of 3,435 children at age 16. We know for each child both the primary school and secondary school that they attended, and we are interested in partitioning the variance between these two sources and individual pupil level variation. The classification diagram is shown in figure 1.4. There are 148 primary schools that feed into 19 secondary schools in the dataset. Of the 148 primary schools, 59 are nested within a single secondary school, whilst another 62 have at most 3 pupils that do not go to the main secondary school, so we have an almost nested structure. This structure is particularly suited to the RG algorithm. We will fit the following model to the dataset:

$$y_i = \beta_0 + u^{(2)}_{\mathrm{SEC}(i)} + u^{(3)}_{\mathrm{PRIM}(i)} + e_i$$

$$u^{(2)}_{\mathrm{SEC}(i)} \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{PRIM}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).$$

Figure 1.4  Classification diagram for the Fife educational example (nodes: Primary School, Secondary School, Pupil).

Table 1.5  Point estimates for the Fife educational dataset (standard errors in brackets).

Parameter                                         IGLS          MCMC           AIP
Mean achievement ($\beta_0$)                      5.50 (0.17)   5.50 (0.18)    5.51 (0.19)
Secondary school variance ($\sigma^2_{u(2)}$)     0.35 (0.16)   0.41 (0.21)    0.34 (0.15)
Primary school variance ($\sigma^2_{u(3)}$)       1.12 (0.20)   1.15 (0.213)   1.11 (0.20)
Individual level variance ($\sigma^2_e$)          8.10 (0.20)   8.12 (0.20)    8.11 (0.20)

The results are shown in table 1.5. From table 1.5 we can see that in this example there is more variation between primary schools than between secondary schools. The MCMC estimates replicate the IGLS estimates, with slightly greater higher level variances (mean versus mode estimates) due to the skewness of the posterior distribution. The AIP method gives very similar results to the IGLS method. A further discussion of these results is given in Goldstein, 1995.
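To make the variance partition concrete, the Python lines below turn the IGLS variance estimates from table 1.5 into shares of the total variance. The calculation is our own illustration rather than part of the original analysis, and the variable names are arbitrary.

```python
# IGLS variance estimates from table 1.5.
variances = {"secondary": 0.35, "primary": 1.12, "pupil": 8.10}

total = sum(variances.values())
shares = {name: v / total for name, v in variances.items()}

for name, share in shares.items():
    print(f"{name}: {100 * share:.1f}% of total variance")
```

On these figures the primary school share is roughly three times the secondary school share, while most of the variation remains at the pupil level.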

2.3 MODELS FOR MORE COMPLEX POPULATION STRUCTURES

In this section we will consider expanding the simple two way cross-classified structure to accommodate more classifications and more complex structures.

2.3.1 Example scenarios. Let us take the situation described in the classification diagram drawn in figure 1.3(i), where patients lie within a cross-classification of hospitals by neighbourhoods. We may have information on the doctor that treated each patient, and doctors may be nested within hospitals. The classification diagram for this structure is shown in figure 1.5.

Figure 1.5  Classification diagram for two crossed hierarchies (patients within doctors within hospitals)*(patients within neighbourhoods) (nodes: Hospital, Doctor, Neighbourhood, Patient).

A variance components model for this structure is written as

$$y_i = \beta_0 + u^{(2)}_{\mathrm{nhbd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + u^{(4)}_{\mathrm{doct}(i)} + e_i$$

If doctors work across hospitals and are therefore not nested within hospital, we then have a three way cross-classification, which is drawn in figure 1.6.

Figure 1.6  Classification diagram for three crossed hierarchies (patients within hospitals)*(patients within doctors)*(patients within neighbourhoods) (nodes: Hospital, Neighbourhood, Doctor, Patient).

Note that the variance components model for the structure in figure 1.6 is also described by the same equation. This is a reflection of the fact that the model notation for describing the random effects simply lists the classifications that are sources of variation for the response we are modelling. In the variance components model we only have an intercept term, which varies across all four classifications present. Suppose we had another explanatory variable, $x_1$, and we wished to allow its coefficient to vary across the doctor classification; we would write this model as

$$y_i = \beta_0 + u^{(2)}_{\mathrm{nhbd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + u^{(4)}_{\mathrm{doct}(i),0} + \beta_1 x_{1i} + u^{(4)}_{\mathrm{doct}(i),1} x_{1i} + e_i$$

or alternatively we can express the model as:

$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i$$

$$\beta_{0i} = \beta_0 + u^{(2)}_{\mathrm{nhbd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + u^{(4)}_{\mathrm{doct}(i),0}$$

$$\beta_{1i} = \beta_1 + u^{(4)}_{\mathrm{doct}(i),1}$$

It may be that the scenario described in figure 1.6 is further complicated because hospitals, doctors and neighbourhoods are all nested within regions. In this case the classification diagram becomes as in figure 1.7.

Figure 1.7  Classification diagram for three crossed hierarchies nested within a higher level classification (nodes: Region, Hospital, Neighbourhood, Doctor, Patient).

Extending the last model to incorporate a simple random effect for the region classification we have

$$y_i = \beta_{0i} + \beta_{1i} x_{1i} + e_i$$

$$\beta_{0i} = \beta_0 + u^{(2)}_{\mathrm{nhbd}(i)} + u^{(3)}_{\mathrm{hosp}(i)} + u^{(4)}_{\mathrm{doct}(i),0} + u^{(5)}_{\mathrm{reg}(i)}$$

$$\beta_{1i} = \beta_1 + u^{(4)}_{\mathrm{doct}(i),1}$$

These few example scenarios indicate how the classification diagrams and simplified notation can extend to describe patterns of crossings of arbitrary complexity.

2.3.2 An example analysis of a complex cross-classi ed structure : Arti cial Insemination data. We consider a data set con-

20 cerning arti cial insemination by donor. Detailed description of this data set and the substantive research questions addressed by modelling it within a cross-classi ed frame work are given in Ecochard and Clayton, 1998. The data was re-analysed in Clayton and Rasbash, 1999 as an example case study demonstrating the properties of the AIP algorithm for estimating cross-classi ed models. The data consists of 1901 women who were inseminated by sperm donations from 279 donors. Each donor made multiple donations, there were 1328 donations in all. A single donation is used for multiple inseminations. Each woman receives a series of monthly inseminations, 1 insemination per ovulatory cycle. The data contain 12100 cycles within the 1901 women. There are two crossed hierarchies, a hierarchy for donors and a hierarchy for women. Level 1 corresponds to measures made at each ovulatory cycle. The response we analyse is the binary variable indicating if conception occurs in a given cycle. The hierarchy for women is cycles within women. The hierarchy for donors is cycles within donations within donors. Within a series of cycles a women may receive sperm from multiple donors/donations. The classi cation diagram for this structure is given in gure 1.8. The model tted to the data is Donor

Donation

Woman

Cycle

Figure 1.8

Classi cation Diagram for the arti cial insemination example model.

21 Table 1.6

Results for the Arti cial insemination example.

Parameter MCMC AIP intercept ( 0 ) -3.92 (0.21) -3.90 (0.21) azoospermia ( 1 ) 0.21 (0.09) 0.22 (0.10) semen quality ( 2 ) 0.18 (0.03) 0.18 (0.03) womens age > 35 ( 3 ) -0.29 (0.12) -0.27 (0.12) Sperm count ( 4 ) 0.002 (0.001) 0.002 (0.001) Sperm motility ( 5 ) 0.0002 (0.0001) 0.0002 (0.0001) Insemination too early ( 6 ) -0.69 (0.17) -0.67 (0.17) Insemination too late ( 7 ) -0.27 (0.09) -0.25 (0.09) Donor variance (2(4) ) 0.11 (0.06) 0.10 (0.06) 2 Donation variance ((3) ) 0.36 (0.074) 0.34 (0.065) Women variance (2(2) ) 1.02 (0.15) 1.01 (0.11)

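$$
\begin{aligned}
y_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{azoo}_i\,\beta_1 + \mathrm{semenq}_i\,\beta_2 + \mathrm{age{>}35}_i\,\beta_3 + \mathrm{spermcount}_i\,\beta_4 + \mathrm{spermmot}_i\,\beta_5 \\
&\quad + \mathrm{iearly}_i\,\beta_6 + \mathrm{ilate}_i\,\beta_7 + u^{(2)}_{\mathrm{woman}(i)} + u^{(3)}_{\mathrm{donation}(i)} + u^{(4)}_{\mathrm{donor}(i)} \\
u^{(2)}_{\mathrm{woman}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{donation}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_{\mathrm{donor}(i)} \sim N(0, \sigma^2_{u(4)})
\end{aligned}
$$

(1.1)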

Note that azoospermia (azoo) is a dichotomous variable indicating whether the fecundability of the woman is impaired (0 impaired, 1 not impaired). The results of fitting this model using the MCMC and AIP estimation procedures are given in table 1.6. This model could not be fitted using the RG algorithm. This is because if the data are sorted according to women then we need to fit 279 dummy variables for donors and 1328 dummy variables for donations. Alternatively, if we sort the data according to donations within donors we have to fit 1901 dummy variables for women. Either way, the size of these data matrices causes problems of insufficient memory. Even if these memory problems can be worked around, the numerical instability of the constraining procedure, which attempts to constrain over a thousand separately estimated variances to be equal, causes the adapted IGLS algorithm to fail to converge.

After inclusion of covariates there is considerably more variation in the probability of a successful insemination attributable to the women hierarchy than to the donor hierarchy. Both the AIP and MCMC methods give similar estimates for all parameters. The fixed effect estimates show that the probability of conception is increased with azoospermia and increased sperm quality, count and motility but decreased with the age of the woman and with inseminations that are too early or late.
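To give a feel for the scale of the problem, the following rough sketch counts the dummy variables needed under each sort order and the memory a dense matrix of that shape would occupy. The dense 8-byte storage and the helper name `dense_megabytes` are assumptions made for illustration only; the actual IGLS implementation may store these quantities differently.

```python
# Rough size of the dummy-variable blocks the RG approach would need.
# Assumption: dense storage in 8-byte doubles (illustrative only).
n_cycles = 12100
n_women, n_donors, n_donations = 1901, 279, 1328


def dense_megabytes(n_rows, n_cols, bytes_per_value=8):
    """Memory for a dense n_rows x n_cols matrix, in megabytes."""
    return n_rows * n_cols * bytes_per_value / 1e6


# Option 1: sort by women, absorb donors and donations into dummies.
dummies_sorted_by_women = n_donors + n_donations
# Option 2: sort by donations within donors, absorb women into dummies.
dummies_sorted_by_donor = n_women

size_option_1 = dense_megabytes(n_cycles, dummies_sorted_by_women)
size_option_2 = dense_megabytes(n_cycles, dummies_sorted_by_donor)

print(dummies_sorted_by_women, round(size_option_1, 1))
print(dummies_sorted_by_donor, round(size_option_2, 1))
```

Under either ordering more than a thousand variances also have to be constrained to be equal, which is the source of the numerical instability described above.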

3. MULTIPLE MEMBERSHIP MODELS

As we have seen from the previous section, allowing classifications to be crossed gives rise to a large family of additional model structures that can be estimated. The other main restriction of the basic multilevel model is the need for observations to belong to a unique classification unit, i.e. every pupil belongs to a particular class, every patient is treated at a particular hospital. Often, however, over time a patient may be treated at several hospitals and, depending on the response of interest, all of these hospitals may have influence. In this section we will firstly introduce the idea of multiple membership and give some example scenarios where it may occur. We will then discuss the possible estimation procedures that can be used to fit multiple membership models and finish the chapter with a simulated example from the field of education.

3.1 A BASIC STRUCTURE FOR TWO LEVEL MULTIPLE MEMBERSHIPS

Suppose we have data on a large number of patients that attend their local hospital, that during the course of their hospital stay they are treated by several nurses, and that we regard the nurses as an important factor in the patients' outcome of interest. Typically each patient will be seen by more than one nurse during their stay (although some will only see one), but there are many nurses and so we will treat nurses as a random classification rather than as fixed effects. To illustrate this, table 1.7 shows the nurses seen by the first 4 patients. We can consider this structure in a unit diagram as shown in figure 1.9. Here each line in the diagram corresponds to a tick mark in the table. Again, as our dataset gets larger such unit diagrams become impractical as there will be too many nodes and so we will resort to using the classification diagrams introduced earlier for cross-classified models. If we wish to include multiple membership classifications in such diagrams we use the convention of a double arrow to represent multiple membership. This leads to the classification diagram shown in figure 1.10 for the above patients and nurses example.

Table 1.7 Table of patients that are seen by multiple nurses.

             Nurse 1   Nurse 2   Nurse 3
Patient 1       √                   √
Patient 2       √
Patient 3                 √         √
Patient 4       √         √

Figure 1.9 Unit Diagram for multiple membership patients within nurses example. [Diagram: nurse nodes N1, N2, N3 and patient nodes P1, P2, P3, P4, with one line for each tick mark in table 1.7.]

3.1.1 Example scenarios. Many studies have a multiple membership structure; here are a few examples:

Education: pupils change school/class over the course of their education and each school/class has an effect on their education.

Health: patients are seen by several doctors and nurses during the course of their treatment.

Survey data: over their lifetime individuals move household and each household has a bearing on their lifestyle, health, salary etc.

3.1.2 Constructing a statistical model. Returning to our example of patients being seen by multiple nurses, we have patient 1's response being affected by nurse 1 and nurse 3 while patient 2 is only affected by nurse 1. As we are treating nurse as a random classification we would like each patient's response to have equal effect on the nurse classification variance, so we generally weight the random effects to sum to 1. For example, let's assume patient 1 has been treated by nurse 1 for 2 days and nurse 3 for 1 day. Then we may give nurse 1 a weight of 2/3 and nurse 3 a weight of 1/3. Often we do not have information on the amount of time patients are seen by each nurse and so we commonly allocate equal weights (in this case 1/2) to each nurse. We can then write down a general two level multiple-membership model as

$$
y_i = X_i\beta + \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i
$$

$$
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
$$

where $\mathrm{nurse}(i)$ is the set of nurses seen by patient $i$ and $w^{(2)}_{i,j}$ is the weight given to nurse $j$ for patient $i$. We assume that

$$
\sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j} = 1 \quad \forall i.
$$

Figure 1.10 Classification Diagram for multiple membership patients within nurses example. [Diagram nodes: Nurse, Patient, joined by a double arrow.]

If we wish to write out this model for the first four patients from the example we get

$$
\begin{aligned}
y_1 &= X_1\beta + \tfrac{1}{2} u^{(2)}_1 + \tfrac{1}{2} u^{(2)}_3 + e_1 \\
y_2 &= X_2\beta + u^{(2)}_1 + e_2 \\
y_3 &= X_3\beta + \tfrac{1}{2} u^{(2)}_2 + \tfrac{1}{2} u^{(2)}_3 + e_3 \\
y_4 &= X_4\beta + \tfrac{1}{2} u^{(2)}_1 + \tfrac{1}{2} u^{(2)}_2 + e_4
\end{aligned}
$$
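The weights for the four patients can be collected into a patients by nurses matrix whose rows sum to 1. The following is a minimal sketch, assuming NumPy is available; the function name `equal_weight_matrix` and the nurse effect values in `u` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Nurses seen by each patient, as in table 1.7 (0-based indices).
nurses_seen = {0: [0, 2], 1: [0], 2: [1, 2], 3: [0, 1]}
n_patients, n_nurses = 4, 3


def equal_weight_matrix(memberships, n_units, n_groups):
    """Build a weight matrix giving equal weight to each unit's groups."""
    W = np.zeros((n_units, n_groups))
    for unit, groups in memberships.items():
        W[unit, groups] = 1.0 / len(groups)
    return W


W = equal_weight_matrix(nurses_seen, n_patients, n_nurses)

# Time-based alternative for patient 1: 2 days with nurse 1, 1 day with nurse 3.
days = np.array([2.0, 0.0, 1.0])
w_patient1 = days / days.sum()

# Contribution of the nurse effects to each patient's response,
# using hypothetical effect values.
u = np.array([0.5, -0.2, 0.1])
random_part = W @ u

print(W)
print(w_patient1)
print(random_part)
```

Each row of `W` reproduces the coefficients of the nurse effects in the corresponding equation above, and `W @ u` evaluates the multiple membership sum for all patients at once.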

3.2 ESTIMATION ALGORITHMS

3.2.1 An IGLS algorithm for multiple membership models. There are two main algorithms for multiple membership models, an adaptation of the Rasbash and Goldstein, 1994 algorithm described earlier and the MCMC method. The AIP method has not been extended to cater for multiple membership models. Earlier we described how to fit a cross-classified model by absorbing one of the cross-classifications into a set of dummy variables (the RG method). A slight modification is required to allow this technique to be used to fit multiple membership models. First let's consider a two level hierarchical model for patients within nurses:

$$
y_i = \beta_0 + u^{(2)}_{\mathrm{nurse}(i)} + e_i
$$

$$
u^{(2)}_{\mathrm{nurse}(i)} \sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e).
$$

We can reparameterise this simple two level model as

$$
y_i = \beta_0 + z_{i,1}\, u^{(2)}_{\mathrm{nurse}(i),1} + z_{i,2}\, u^{(2)}_{\mathrm{nurse}(i),2} + z_{i,3}\, u^{(2)}_{\mathrm{nurse}(i),3} + \ldots + z_{i,J}\, u^{(2)}_{\mathrm{nurse}(i),J} + e_i
$$

$$
\begin{bmatrix} u^{(2)}_{\mathrm{nurse}(i),1} \\ u^{(2)}_{\mathrm{nurse}(i),2} \\ u^{(2)}_{\mathrm{nurse}(i),3} \\ \vdots \\ u^{(2)}_{\mathrm{nurse}(i),J} \end{bmatrix} \sim N(0, \Omega_u), \quad
\Omega_u = \begin{bmatrix}
\sigma^2_{u(2),1} & 0 & 0 & \cdots & 0 \\
0 & \sigma^2_{u(2),2} & 0 & \cdots & 0 \\
0 & 0 & \sigma^2_{u(2),3} & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2_{u(2),J}
\end{bmatrix}, \quad
e_i \sim N(0, \sigma^2_e)
$$

where $z_{i,j}$ is a dummy variable which is 1 if patient $i$ is seen by nurse $j$ and 0 otherwise, and $J$ is the total number of nurses. We also add the constraint $\sigma^2_{u(2),1} = \sigma^2_{u(2),2} = \ldots = \sigma^2_{u(2),J}$. These two models will deliver the same estimates; however, the second formulation will take much longer to compute. The advantage of the second model formulation is that it is straightforward to extend it to the multiple membership case. Suppose patients are not nested within a single nurse but are multiple members of nurses with membership probabilities, $\pi_{i,j}$. We can simply replace $z_{i,j}$ with $\pi_{i,j}$ in the second formulation and estimation can proceed in an identical fashion but will now deliver estimates for the multiple membership model.
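The effect of swapping the dummy variables for membership probabilities can be seen in the covariance matrix of the responses that each formulation implies. With independent nurse effects sharing one variance, a design matrix $A$ (dummies or weights) gives $\mathrm{cov}(y) = \sigma^2_u A A^T + \sigma^2_e I$. The sketch below assumes NumPy; the variance values and the nested allocation in `Z` are hypothetical, and the function name `marginal_covariance` is introduced here for illustration.

```python
import numpy as np

sigma2_u, sigma2_e = 1.0, 2.0  # hypothetical variance values

# Hierarchical case: each patient sees exactly one nurse (dummy variables z).
Z = np.array([[1, 0, 0],
              [1, 0, 0],
              [0, 1, 0],
              [0, 0, 1]], dtype=float)

# Multiple membership case: membership weights replace the dummies.
W = np.array([[0.5, 0.0, 0.5],
              [1.0, 0.0, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0]])


def marginal_covariance(design, var_u, var_e):
    """cov(y) implied by y = fixed part + design @ u + e."""
    n = design.shape[0]
    return var_u * (design @ design.T) + var_e * np.eye(n)


V_nested = marginal_covariance(Z, sigma2_u, sigma2_e)
V_mm = marginal_covariance(W, sigma2_u, sigma2_e)

print(V_nested)
print(V_mm)
```

In the nested case patients are either correlated through a shared nurse or not at all, whereas with weights any two patients who share at least one nurse have a covariance proportional to the product of their weights on that nurse.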

3.2.2 MCMC. Once again we will use a Gibbs sampling algorithm that relies on updating groups of parameters in turn from their conditional posterior distributions. For illustration we present the steps for the following simple multiple membership model based on the variance components model for patients within nurses described earlier. We once again refer the interested reader to Browne et al., 2000 for more general algorithms and note that if the response is dichotomous or a count then, as in chapter 3, we can use the Metropolis-Gibbs hybrid method discussed there. The basic two level multiple membership model (patients within nurses) can be written as:

$$
y_i = X_i\beta + \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i
$$

$$
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
$$

We can split our unknown parameters into 4 distinct sets: the fixed effects, $\beta$, the nurse random effects, $u^{(2)}_j$, the nurse level variance, $\sigma^2_{u(2)}$, and the patient level residual variance, $\sigma^2_e$. We then need to generate random draws from the conditional distribution of each of these four groups of unknowns. We will define prior distributions for our unknown parameters as follows: for generality we will use a multivariate Normal prior for the fixed effects, $\beta \sim N_{p_f}(\mu_p, S_p)$, and scaled inverse $\chi^2$ priors for the two variances. For the nurse level variance $\sigma^2_{u(2)} \sim SI\chi^2(\nu_2, s^2_2)$, and for the patient level variance $\sigma^2_e \sim SI\chi^2(\nu_e, s^2_e)$. The steps are then as follows:

In step 1 of the algorithm the conditional posterior distribution in the Gibbs update for the fixed effects parameter vector is multivariate Normal with dimension $p_f$ (the number of fixed effects):

$$
p(\beta \mid y, u^{(2)}, \sigma^2_{u(2)}, \sigma^2_e) \sim N_{p_f}(\hat{\beta}, \hat{D}),
$$

where

$$
\hat{D} = \left[ \frac{\sum_{i=1}^{N} X_i^T X_i}{\sigma^2_e} + S_p^{-1} \right]^{-1}
\quad \text{and} \quad
\hat{\beta} = \hat{D} \left[ \frac{\sum_{i} X_i^T d_i}{\sigma^2_e} + S_p^{-1}\mu_p \right],
$$

where

$$
d_i = y_i - \sum_{j \in \mathrm{nurse}(i)} w^{(2)}_{i,j}\, u^{(2)}_j .
$$

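In step 2 we update the nurse residuals, $u^{(2)}_k$, using Gibbs sampling with a univariate Normal full conditional distribution:

$$
p(u^{(2)}_k \mid y, \beta, \sigma^2_{u(2)}, \sigma^2_e) \sim N(\hat{u}^{(2)}_k, \hat{D}^{(2)}_k),
$$

where

$$
\hat{D}^{(2)}_k = \left[ \frac{\sum_{i:\, k \in \mathrm{nurse}(i)} (w^{(2)}_{i,k})^2}{\sigma^2_e} + \frac{1}{\sigma^2_{u(2)}} \right]^{-1}
\quad \text{and} \quad
\hat{u}^{(2)}_k = \hat{D}^{(2)}_k \left[ \frac{\sum_{i:\, k \in \mathrm{nurse}(i)} w^{(2)}_{i,k}\, d^{(2)}_{i,k}}{\sigma^2_e} \right],
$$

where

$$
d^{(2)}_{i,k} = y_i - X_i\beta - \sum_{j \in \mathrm{nurse}(i),\, j \neq k} w^{(2)}_{i,j}\, u^{(2)}_j .
$$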

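In step 3 we update the nurse level variance $\sigma^2_{u(2)}$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_{u(2)}$:

$$
p(1/\sigma^2_{u(2)} \mid y, \beta, u^{(2)}, \sigma^2_e) \sim \mathrm{Gamma}\left[ \frac{n_2 + \nu_2}{2},\; \frac{1}{2}\left( \sum_{j=1}^{n_2} (u^{(2)}_j)^2 + \nu_2 s^2_2 \right) \right],
$$

where $n_2$ is the number of nurses.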

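In step 4 we update the patient level variance $\sigma^2_e$ using Gibbs sampling and a Gamma full conditional distribution for $1/\sigma^2_e$:

$$
p(1/\sigma^2_e \mid y, \beta, u^{(2)}, \sigma^2_{u(2)}) \sim \mathrm{Gamma}\left[ \frac{N + \nu_e}{2},\; \frac{1}{2}\left( \sum_{i} e_i^2 + \nu_e s^2_e \right) \right].
$$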

The above 4 steps are repeatedly sampled from in sequence to produce correlated chains of parameter estimates from which point and interval estimates can be created as in chapter 3.
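The four steps can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the implementation used for the analyses in this chapter: the function name `gibbs_multiple_membership`, the very diffuse prior settings, the starting values and the simulated data set are all assumptions made here. The weight matrix `W` holds $w^{(2)}_{i,j}$ with zeros for nurses a patient did not see, so sums over $k \in \mathrm{nurse}(i)$ become ordinary dot products.

```python
import numpy as np


def gibbs_multiple_membership(y, X, W, n_iter=1000, seed=0,
                              nu_u=0.001, s2_u=1.0, nu_e=0.001, s2_e=1.0):
    """Gibbs sampler sketch for y = X beta + W u + e (steps 1 to 4)."""
    rng = np.random.default_rng(seed)
    N, p = X.shape
    J = W.shape[1]
    mu_p = np.zeros(p)              # assumed prior mean for beta
    S_p_inv = np.eye(p) * 1e-6      # assumed, very diffuse prior precision
    u = np.zeros(J)
    sig2_u, sig2_e = 1.0, 1.0       # arbitrary starting values
    XtX = X.T @ X
    out = {"beta": np.empty((n_iter, p)),
           "sigma2_u": np.empty(n_iter), "sigma2_e": np.empty(n_iter)}
    for t in range(n_iter):
        # Step 1: fixed effects from their multivariate Normal conditional.
        d = y - W @ u
        D = np.linalg.inv(XtX / sig2_e + S_p_inv)
        beta_hat = D @ (X.T @ d / sig2_e + S_p_inv @ mu_p)
        beta = rng.multivariate_normal(beta_hat, D)
        # Step 2: each higher level effect from a univariate Normal.
        resid = y - X @ beta - W @ u
        for k in range(J):
            w_k = W[:, k]
            d_k = resid + w_k * u[k]            # add back unit k's own term
            D_k = 1.0 / (w_k @ w_k / sig2_e + 1.0 / sig2_u)
            u_hat = D_k * (w_k @ d_k) / sig2_e
            u[k] = rng.normal(u_hat, np.sqrt(D_k))
            resid = d_k - w_k * u[k]
        # Step 3: Gamma draw for the inverse of the higher level variance.
        rate_u = 0.5 * (u @ u + nu_u * s2_u)
        sig2_u = 1.0 / rng.gamma((J + nu_u) / 2.0, 1.0 / rate_u)
        # Step 4: Gamma draw for the inverse of the level 1 variance.
        rate_e = 0.5 * (resid @ resid + nu_e * s2_e)
        sig2_e = 1.0 / rng.gamma((N + nu_e) / 2.0, 1.0 / rate_e)
        out["beta"][t] = beta
        out["sigma2_u"][t] = sig2_u
        out["sigma2_e"][t] = sig2_e
    return out


# Simulated check: 600 patients, 30 nurses, one or two nurses per patient.
rng = np.random.default_rng(42)
N, J = 600, 30
W = np.zeros((N, J))
for i in range(N):
    k = rng.integers(1, 3)
    W[i, rng.choice(J, size=k, replace=False)] = 1.0 / k
X = np.column_stack([np.ones(N), rng.normal(size=N)])
u_true = rng.normal(0.0, 1.0, J)
y = X @ np.array([1.0, 0.5]) + W @ u_true + rng.normal(0.0, np.sqrt(0.5), N)

draws = gibbs_multiple_membership(y, X, W, n_iter=600)
post_mean_slope = draws["beta"][200:, 1].mean()
post_mean_sigma2_e = draws["sigma2_e"][200:].mean()
print(post_mean_slope, post_mean_sigma2_e)
```

NumPy's `gamma` is parameterised by shape and scale, so the rate parameters of the full conditionals in steps 3 and 4 are inverted before sampling. Posterior summaries are taken after discarding the first 200 draws as a burn-in, a length chosen arbitrarily for this small example.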

3.2.3 Comparison of estimation methods. As in the comparison for cross-classified models there are benefits for both methods. The RG method is fairly quick but the number of level 2 units determines the size of some of the matrices involved and the number of constraints that the method has to apply. These dependencies lead to numerical instability or memory exhaustion in situations with more than a few hundred level 2 units. The MCMC methods, although again computationally slower, do not suffer from these memory problems.

3.2.4 An example analysis of a two level multiple membership model: Children moving school. We consider a simulated data example based on the problem in education of adjusting for the fact that pupils move school during the course of their studies. We will consider a study with 4059 students from 65 schools taken from Rasbash et al., 2000. The actual data in the study have each child belonging to 1 school but we will assume that over their education 10% of children moved school, so we will choose at random for 10% of the children a second school. We will assume that information about when the move occurred is unavailable and so for these children we will allocate equal weights of 0.5 to each school. Browne et al., 2000 considered this as the basis for a simulation experiment by generating 1000 datasets with this structure to show the bias and coverage properties of the MCMC method. We will instead consider the true response on our modified structure. We have as a response the pupil's total (normalised) exam score in all GCSE exams taken at age 16 and as a predictor the pupil's (standardised) score in a reading test taken at age 11. As we are interested in progress from age 11-16 it makes sense to consider the effect of all schools attended in this period. We will consider the following model

$$
\mathrm{normexam}_i = \beta_0 + \beta_1\, \mathrm{standlrt}_i + \sum_{j \in \mathrm{school}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + e_i
$$

$$
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad e_i \sim N(0, \sigma^2_e)
$$

We fit this model using both the RG and MCMC methods and the results can be seen in table 1.8. From the table we can see that both methods give similar results. If we compare the results here with the results in Rasbash et al., 2000 we see only slight changes to the estimates, with the level 2 variance slightly decreased and the level 1 variance slightly increased. However, in cases where there are greater amounts of multiple membership the variance estimates can be altered if this multiple membership is ignored; for example, if we randomly assigned every pupil to a second school the variances change to 0.088 at level 2 (schools) and 0.609 at level 1 (pupils).

Table 1.8 Results for the multiple membership schools example.

Parameter                                RG RIGLS estimates   MCMC estimates
intercept ($\beta_0$)                    0.002 (0.040)        0.003 (0.040)
LRT effect ($\beta_1$)                   0.565 (0.012)        0.565 (0.013)
School variance ($\sigma^2_{u(2)}$)      0.093 (0.018)        0.096 (0.020)
Pupil variance ($\sigma^2_e$)            0.570 (0.013)        0.571 (0.013)
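The random allocation of a second school to 10% of the pupils can be sketched as follows. The real data set is not reproduced here, so `first_school` is a randomly generated stand-in for the observed school identifiers, and forcing the second school to differ from the first is an assumption of this sketch rather than something stated in the text.

```python
import numpy as np

rng = np.random.default_rng(1)
n_pupils, n_schools = 4059, 65

# Hypothetical stand-in for the observed school of each pupil.
first_school = rng.integers(0, n_schools, size=n_pupils)

# Start from the purely hierarchical structure: weight 1 on the one school.
W = np.zeros((n_pupils, n_schools))
W[np.arange(n_pupils), first_school] = 1.0

# Choose 10% of pupils at random and give each a second school.
n_movers = round(0.10 * n_pupils)
movers = rng.choice(n_pupils, size=n_movers, replace=False)
for i in movers:
    # Assumption: the second school must differ from the first.
    others = np.delete(np.arange(n_schools), first_school[i])
    second = rng.choice(others)
    W[i, first_school[i]] = 0.5
    W[i, second] = 0.5

print(n_movers, W.sum(axis=1).min(), W.sum(axis=1).max())
```

The resulting `W` plays the role of the weights $w^{(2)}_{i,j}$ in the model above: every row sums to 1, with a single weight of 1 for pupils who stayed in one school and two weights of 0.5 for those who moved.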

4. COMBINING MULTIPLE MEMBERSHIP AND CROSS-CLASSIFIED STRUCTURES IN A SINGLE MODEL

Consider two of our earlier examples in the field of education, firstly pupils in a crossing of primary schools and secondary schools and secondly pupils who are moving from school to school. We could assume that these two structures occur simultaneously and we will then end up with a model structure that contains both a multiple membership classification (secondary schools) and a second classification (primary schools) that is crossed with the first. This scenario can be represented by a classification diagram as in figure 1.11. Browne et al., 2000 refer to models that contain both multiple memberships and cross classifications as multiple membership multiple classification (MMMC) models.

Figure 1.11 Classification Diagram for the neighbours/schools multiple membership model. [Diagram nodes: P. School, S. School, Pupil.]

4.1 EXAMPLE SCENARIOS

Many studies have both cross-classified and multiple membership classifications in their structure; a few examples are the following:

Education: pupils can be affected by the crossing of the neighbourhood they live in and the school they attend. They could also change class over their period of education and so this multiple membership class classification will be crossed with the neighbourhood classification.

Health: patients are seen by several doctors during their treatment and may visit several hospitals. Doctors who are specialists may move from hospital to hospital and so are crossed with the hospitals.

Survey data: individuals will belong to many households over the course of their lives and will reside in several properties. An entire household may move to a new property so households can be crossed with properties and all the households/properties can have an effect on the individual. See Goldstein et al., 2000 for more details.

Spatial data: individuals will belong to a particular area but will also be affected by multiple neighbouring areas.

4.2 CONSTRUCTING A STATISTICAL MODEL

If we return to our example of pupils attending multiple secondary schools but coming from one primary school we need to combine the multiple membership and cross-classified model structures into one model. As we are treating the secondary schools as a random classification we would like each pupil to have an equal effect on the secondary school classification, so we will use weights that add to 1 when a pupil attends more than one secondary school. We will let $\mathrm{second}(i)$ be the list of secondary schools that child $i$ has attended. We can then write down a general two classification MMMC model as

$$
y_i = X_i\beta + \sum_{j \in \mathrm{second}(i)} w^{(2)}_{i,j}\, u^{(2)}_j + u^{(3)}_{\mathrm{prim}(i)} + e_i
$$

$$
u^{(2)}_j \sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{prim}(i)} \sim N(0, \sigma^2_{u(3)}), \quad e_i \sim N(0, \sigma^2_e).
$$

Here $w^{(2)}_{i,j}$ is the weight given to secondary school $j$ for pupil $i$, and we assume that $\sum_{j \in \mathrm{second}(i)} w^{(2)}_{i,j} = 1 \;\forall i$. Both the RG algorithm and the MCMC method can be used to fit these models that combine multiple membership and cross-classification.
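To make the structure of this model concrete, the sketch below simulates responses from a two classification MMMC model. All sizes, variances and coefficients are hypothetical values chosen for illustration, and every pupil is assumed to have attended either one or two secondary schools with equal weights.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pupils, n_prim, n_sec = 500, 20, 10
sigma2_u2, sigma2_u3, sigma2_e = 0.3, 0.2, 1.0   # hypothetical variances
beta = np.array([0.0, 0.5])                      # hypothetical fixed effects

# Crossed classification: one primary school per pupil.
prim = rng.integers(0, n_prim, size=n_pupils)

# Multiple membership classification: weights over secondary schools.
W = np.zeros((n_pupils, n_sec))
for i in range(n_pupils):
    k = rng.integers(1, 3)                       # one or two schools
    W[i, rng.choice(n_sec, size=k, replace=False)] = 1.0 / k

u2 = rng.normal(0.0, np.sqrt(sigma2_u2), n_sec)    # secondary school effects
u3 = rng.normal(0.0, np.sqrt(sigma2_u3), n_prim)   # primary school effects
X = np.column_stack([np.ones(n_pupils), rng.normal(size=n_pupils)])
e = rng.normal(0.0, np.sqrt(sigma2_e), n_pupils)

# y_i = X_i beta + sum_j w_ij u2_j + u3_prim(i) + e_i
y = X @ beta + W @ u2 + u3[prim] + e

# Variance of the random part for each pupil implied by the model.
var_random = sigma2_u2 * (W ** 2).sum(axis=1) + sigma2_u3 + sigma2_e
print(y[:5], np.unique(np.round(var_random, 2)))
```

The multiple membership term enters through the weighted sum `W @ u2`, while the crossed primary school term is a simple lookup `u3[prim]`. The last line shows that, under equal weights, pupils who attended two secondary schools have a smaller total random variance than pupils who attended one.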

4.3 AN EXAMPLE ANALYSIS: DANISH POULTRY FARMING

Rasbash and Browne, 2001 consider an example from veterinary epidemiology concerning the outbreaks of salmonella typhimurium in flocks of chickens in poultry farms in Denmark between 1995 and 1997. The response of interest is whether salmonella typhimurium is present in a flock and in the data collected 6.3% of flocks had the disease. At the observation level, each observation represents a flock of chickens. For each flock the response variable is whether or not there was an instance of salmonella in that flock. The basic data have a simple hierarchical structure as each flock is kept in a house on a farm until slaughter. As flocks live for a short time before they are slaughtered, several flocks will stay in the same house each year. The hierarchy is as follows: 10,127 child flocks within 725 houses on 304 farms. Each flock is created from a mixture of parent flocks (up to 6), of which there are 200 in Denmark, and so we have a crossing between the child flock hierarchy and the multiple membership parent flock classification. The classification diagram can be seen in figure 1.12. We also know the exact makeup of each child flock (in terms of parent flocks) and so can use these as weights for each of the parent flocks. We are interested in assessing how much of the variability in salmonella incidence can be attributed to houses, farms and parent flocks. There are also 4 hatcheries in which all the eggs from the parent flocks are hatched. We will therefore fit a variance components model that allows for different average rates of salmonella for each year, with hatchery included in the fixed part, as follows:

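$$
\begin{aligned}
\mathrm{salmonella}_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{Y96}\,\beta_1 + \mathrm{Y97}\,\beta_2 + \mathrm{hatch2}\,\beta_3 + \mathrm{hatch3}\,\beta_4 + \mathrm{hatch4}\,\beta_5 \\
&\quad + u^{(2)}_{\mathrm{House}(i)} + u^{(3)}_{\mathrm{Farm}(i)} + \sum_{j \in \mathrm{P.flock}(i)} w^{(4)}_{i,j}\, u^{(4)}_j \\
u^{(2)}_{\mathrm{House}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{Farm}(i)} \sim N(0, \sigma^2_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})
\end{aligned}
$$

(1.2)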

Figure 1.12 Classification diagram for the Danish poultry model. [Diagram nodes: Farm, House, Parent Flock, Child Flock.]

The results of fitting model 1.2 using both the Rasbash and Goldstein method with 1st order MQL estimation and the MCMC method can be seen in table 1.9. The quasi-likelihood methods are numerically rather unstable and we could not get either 2nd order MQL or PQL to fit this model. We can see here that there are large effects for the year the chickens were born, suggesting that salmonella was more prevalent in 1995 than in the other years. The hatchery effects were also large, suggesting that chickens produced in hatcheries 1 and 3 had a larger incidence of salmonella. There is large variability for the parent flock effects and for the farm effects, which are of similar magnitude. There is less variability between houses within farms.

4.3.1 Method comparison. The MCMC sampler was run for 50,000 iterations after a burn-in of 20,000 (this took just under 2 hours on a 733MHz PC); the long burn-in was needed because we used arbitrary starting values and so the chain took a while to converge. From table 1.9 we can see reasonable agreement between the two methods, although the fixed effects in MQL are mostly smaller in magnitude (the hatchery 3 effect is the exception), as is the farm level variance. This behaviour was shown in simulations on a nested 3 level binary response data structure in Rodriguez and Goldman, 1995, with the improvements of the MCMC method shown in Browne and Draper, 2000, and so this suggests that the MCMC results should be more accurate.

Table 1.9 Results for the Danish poultry example.

Parameter                                     1st MQL            MCMC estimates
intercept ($\beta_0$)                         -1.862 (0.184)     -2.322 (0.213)
1996 effect ($\beta_1$)                       -1.004 (0.138)     -1.239 (0.162)
1997 effect ($\beta_2$)                       -0.852 (0.159)     -1.165 (0.187)
Hatchery 2 effect ($\beta_3$)                 -1.458 (0.222)     -1.733 (0.255)
Hatchery 3 effect ($\beta_4$)                 -0.250 (0.209)     -0.211 (0.252)
Hatchery 4 effect ($\beta_5$)                 -1.007 (0.353)     -1.062 (0.388)
Parent flock variance ($\sigma^2_{u(4)}$)      0.892 (0.184)      0.895 (0.179)
Farm variance ($\sigma^2_{u(3)}$)              0.639 (0.121)      0.927 (0.197)
House variance ($\sigma^2_{u(2)}$)             0.206 (0.096)      0.208 (0.108)

4.4 COMPLEX RANDOM EFFECTS

Model 1.2 is essentially another variance components model but we could fit a model that has complex variation at one of the higher classifications. To illustrate this we will modify the farm level variance to account for different variability between years at the farm level, that is, we replace the simple farm level random effects, $u^{(3)}_{\mathrm{Farm}(i)}$, with 3 sets of effects, one for each year. Our expanded model is then as follows:

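$$
\begin{aligned}
\mathrm{salmonella}_i &\sim \mathrm{Bernoulli}(\pi_i) \\
\mathrm{logit}(\pi_i) &= \beta_0 + \mathrm{Y96}\,\beta_1 + \mathrm{Y97}\,\beta_2 + \mathrm{hatch2}\,\beta_3 + \mathrm{hatch3}\,\beta_4 + \mathrm{hatch4}\,\beta_5 + u^{(2)}_{\mathrm{House}(i)} \\
&\quad + \mathrm{Y95}\, u^{(3)}_{\mathrm{Farm}(i),1} + \mathrm{Y96}\, u^{(3)}_{\mathrm{Farm}(i),2} + \mathrm{Y97}\, u^{(3)}_{\mathrm{Farm}(i),3} + \sum_{j \in \mathrm{P.flock}(i)} w^{(4)}_{i,j}\, u^{(4)}_j \\
u^{(2)}_{\mathrm{House}(i)} &\sim N(0, \sigma^2_{u(2)}), \quad u^{(3)}_{\mathrm{Farm}(i)} \sim N_3(0, \Omega_{u(3)}), \quad u^{(4)}_j \sim N(0, \sigma^2_{u(4)})
\end{aligned}
$$

(1.3)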

Table 1.10 Estimates for the parameters in model 1.3.

Parameter                                          MCMC estimates
intercept ($\beta_0$)                              -2.544 (0.240)
1996 effect ($\beta_1$)                            -1.149 (0.256)
1997 effect ($\beta_2$)                            -1.003 (0.293)
Hatchery 2 effect ($\beta_3$)                      -1.788 (0.265)
Hatchery 3 effect ($\beta_4$)                      -0.143 (0.252)
Hatchery 4 effect ($\beta_5$)                      -1.065 (0.383)
Parent flock variance ($\sigma^2_{u(4)}$)           0.878 (0.180)
Farm year95 variance ($\Omega_{u(3)}[1,1]$)         1.416 (0.341)
Farm 95/96 covariance ($\Omega_{u(3)}[1,2]$)        0.514 (0.262)
Farm 95/97 covariance ($\Omega_{u(3)}[1,3]$)        0.415 (0.226)
Farm year96 variance ($\Omega_{u(3)}[2,2]$)         1.239 (0.463)
Farm 96/97 covariance ($\Omega_{u(3)}[2,3]$)        0.750 (0.321)
Farm year97 variance ($\Omega_{u(3)}[3,3]$)         1.017 (0.482)
House variance ($\sigma^2_{u(2)}$)                  0.271 (0.119)

The parameter estimates for this extended model are given in table 1.10. We see that the fixed effects estimates are fairly similar to those of model 1.2. It is interesting to see that all the covariances in the farm level variance matrix are positive. This suggests that, after adjusting for other factors, if a farm has an incidence of salmonella in 1995 then it is more likely to have an incidence again in 1996 and in 1997. In fact the corresponding correlation estimates (for 1995/96, 1995/97 and 1996/97) are 0.39, 0.35 and 0.67 respectively, showing that in particular there is a strong correlation between salmonella infection in farms in 1996 and 1997. The numerical instabilities of the quasi-likelihood methods mean that comparative estimates could not be calculated for this model.
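The quoted correlations can be recovered from the farm level variance and covariance estimates in table 1.10 by dividing each covariance by the product of the two standard deviations. The short sketch below does this with NumPy; it works from the tabulated point estimates, which is an approximation, since a correlation summarised directly from the MCMC chains need not equal the correlation of the posterior means exactly.

```python
import numpy as np

# Farm level covariance matrix from table 1.10 (years 1995, 1996, 1997).
omega = np.array([[1.416, 0.514, 0.415],
                  [0.514, 1.239, 0.750],
                  [0.415, 0.750, 1.017]])

# Correlation = covariance / (sd_row * sd_col).
sd = np.sqrt(np.diag(omega))
corr = omega / np.outer(sd, sd)

corr_95_96 = round(float(corr[0, 1]), 2)
corr_95_97 = round(float(corr[0, 2]), 2)
corr_96_97 = round(float(corr[1, 2]), 2)
print(corr_95_96, corr_95_97, corr_96_97)  # 0.39 0.35 0.67
```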

5. CONSEQUENCES OF IGNORING NON-HIERARCHICAL STRUCTURES

Analysing only hierarchical components of populations which have additional non-nested structures has two potentially negative consequences. Firstly, the model is under-specified because there are sources of variation that have not been included in the model. This under-specification can lead to an underestimation of the standard errors of the parameters and therefore to incorrect inferences. Secondly, the variance components obtained from the simple hierarchical model, or sets of separate hierarchical models, cannot be trusted. They may change substantially if the additional non-nested structures are included in a single model. For example, we may wish to know about the relative importance of general practices and hospitals on the variation in some patient level outcome. If patients are cross-classified by hospital and general practice, we need to fit the full cross-classified model including patients, general practices and hospitals in order to address this question. Looking at two separate hierarchical analyses, one of patients within hospitals, the other of patients within general practices, is not sufficient.

A numerical example of this is shown in table 1.11, which shows results for three models fitted using the RG method to the educational attainment data from Fife in Scotland, where pupils are contained within a cross-classification of primary schools by secondary schools. Model I fits pupils within primary schools and ignores secondary school, model II fits pupils within secondary schools and ignores primary school and model III fits the cross-classification. The response is an attainment score at age 16; the explanatory variable vrq is a verbal reasoning measure taken at age 11. When one side of the cross-classification is ignored, the released variance is split between the classification left in the model and the pupil level variance, inflating both estimates. This has the most drastic effect when the primary school hierarchy is ignored: in this case (model II) the inflated estimate of the between secondary school variance is 2.5 times its standard error as opposed to 0.5 times its standard error in the full model.

Table 1.11 Effects of ignoring a cross-classified structure.

Parameter                     Model I         Model II        Model III
intercept                     5.97 (0.07)     6.02 (0.07)     5.98 (0.07)
VRQ effect                    0.16 (0.003)    0.16 (0.003)    0.16 (0.003)
primary school variance       0.28 (0.06)     -               0.27 (0.06)
secondary school variance     -               0.05 (0.02)     0.01 (0.02)
pupil variance                4.25 (0.10)     4.48 (0.11)     4.25 (0.10)
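The ratios quoted for the secondary school variance follow directly from the entries in table 1.11. The following short sketch reproduces them, together with the increase in the pupil level variance when primary school is ignored; the dictionary layout is simply a convenient way of holding the tabulated values.

```python
# Secondary school variance: (estimate, standard error) from table 1.11.
secondary_variance = {"Model II": (0.05, 0.02), "Model III": (0.01, 0.02)}

# Ratio of each estimate to its standard error.
ratio_to_se = {model: est / se for model, (est, se) in secondary_variance.items()}

# Pupil level variance with and without the primary school classification.
pupil_variance = {"Model II": 4.48, "Model III": 4.25}
pupil_inflation = pupil_variance["Model II"] - pupil_variance["Model III"]

print(ratio_to_se, round(pupil_inflation, 2))
```

Judged against its standard error, the secondary school variance looks clearly non-zero in model II but not in the full cross-classified model III, which is the change in inference that the section warns about.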

References

Browne, W. J. and Draper, D. (2000). A comparison of Bayesian and likelihood methods for fitting multilevel models. Submitted.

Browne, W. J., Goldstein, H., and Rasbash, J. (2000). Fitting complex model structures to large datasets: a Monte Carlo Markov chain (MCMC) algorithm to fit multiple membership multiple classification models. Submitted.

Bull, J. M., Riley, G. D., Rasbash, J., and Goldstein, H. (1999). Parallel Implementation of a Multilevel Modelling Package. Computational Statistics and Data Analysis, 31:457-474.

Clayton, D. G. and Rasbash, J. (1999). Estimation in large crossed random-effects models by data augmentation. Journal of the Royal Statistical Society, Series A, 162:425-436.

Ecochard, R. and Clayton, D. G. (1998). Multilevel modelling of conception in artificial insemination by donor. Statistics in Medicine, 17:1137-1156.

Goldstein, H. (1986). Multilevel mixed linear model analysis using iterative generalised least squares. Biometrika, 73:43-56.

Goldstein, H. (1995). Multilevel Statistical Models. Edward Arnold, London, 2nd edition.

Goldstein, H. and Rasbash, J. (1996). Improved Approximations for Multilevel Models with Binary Responses. Journal of the Royal Statistical Society, Series A, 159:505-513.

Goldstein, H., Rasbash, J., Browne, W. J., Woodhouse, G., and Poulain, M. (2000). Multilevel modelling in the study of dynamic household structures. European Journal of Population, pages {.

Lee, Y. and Nelder, J. (2000). Hierarchical Generalized linear models: a synthesis of generalized linear models, random effects models, and structured dispersion. Technical report, Department of Mathematics, Imperial College, London.

Pan, J. X. and Thompson, R. (2000). Generalized linear mixed models: An improved estimating procedure. In Bethlehem, J. G. and van der Heijden, P. G. M., editors, COMPSTAT: Proceedings in Computational Statistics, 2000, pages 373-378. Physica-Verlag.

Rasbash, J. and Browne, W. J. (2001). Non-hierarchical multilevel models. In Leyland, A. and Goldstein, H., editors, Multilevel modelling of health statistics. Wiley.

Rasbash, J., Browne, W. J., Goldstein, H., Yang, M., Plewis, I., Healy, M., Woodhouse, G., Draper, D., Langford, I., and Lewis, T. (2000). A User's Guide to MLwiN. Institute of Education, London, 2.1 edition.

Rasbash, J. and Goldstein, H. (1994). Efficient analysis of mixed hierarchical and crossed random structures using a multilevel model. Journal of Behavioural Statistics, 19:337-350.

Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with applications in cross-sectional and longitudinal research. Journal of Educational Statistics, 18:321-350.

Rodriguez, G. and Goldman, N. (1995). An Assessment of Estimation Procedures for Multilevel Models with Binary Responses. Journal of the Royal Statistical Society, Series A, 158:73-89.

Schafer, J. (1997). Analysis of Incomplete Multivariate Data. Chapman & Hall, London.

Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82:528-550.