Developing multilevel models for analysing contextuality, heterogeneity and change using MLwiN 2.2 Volume 1
Kelvyn Jones SV Subramanian June 2014
© Jones and Subramanian
Preface to Volume 1
The purpose of these volumes is to provide a thorough account of how to implement multilevel models from a social science perspective in general and a geographic perspective in particular. They specifically serve as a handbook for the multilevel modelling courses run by Kelvyn Jones and SV Subramanian. We use the MLwiN software throughout. Volume 1 introduces the software and then provides an extended account of the Normal-theory two-level model estimated by maximum likelihood. Particular emphasis is placed on modelling variance functions, which are considered at more than one level. This is then followed by an account of how to apply Bayesian MCMC models. A consideration of a three-level Normal-theory model, which analyses repeated cross-sectional data on changing school performance, concludes this volume. This is written in the form of an extended research project so that readers can get a feel for how to combine models and the choices that have to be made in developing a coherent research strategy. Volume 2 extends this volume by additionally considering discrete outcomes and longitudinal and spatial models.
Table of contents

Chapter 1   Introducing MLwiN 2.1
            Brief history of different versions of the software
            Some resources
            The MLwiN worksheet
            The command interface
            An example of a macro
Chapter 2   Getting started with MLwiN
            - copying from an Excel spreadsheet
            - data manipulation
            - data description
            - graphical display
            - Command window
Chapter 3   Fitting a two-level model
            - Model 1: null random intercepts
            - Model 2: with additional predictor
            - Model 3: fully random at level 2
            - Model 4: District 34 as an outlier
Chapter 4   Confidence intervals, residuals and diagnostics
Chapter 5   Modelling variance functions: continuous variables
            - Model 5: linear variance function at level 2
            - Model 6: linear variance function at level 1
            - Model 7: quadratic variance function at level 2
Chapter 6   Comparing models and significance testing
Chapter 7   Modelling with categorical predictor variables
            - Model 8: categorical predictors as main effects and interactions
            - Model 9: quadratic function at level 2 for categorical predictors
            - Model 10: level 1 heterogeneity with categorical predictors (contrast)
            - Model 10: level 1 heterogeneity with categorical predictors (separate)
            - Model 11: size included as a random term at level 1 and level 2
Chapter 8   Modelling higher-level variables
            - Model 12: cross-level interactions between Envq and Type
            - Model 13: a second order polynomial for Envq
            - Model 14: cross-level interactions between Size and Envq
Chapter 9   Random versus fixed effects models: a comparison
Chapter 10  MCMC estimation for normal-theory models
Chapter 11  Three-level model: Changing school performance in London schools, 1985-7: the ethnic dimension
1. Introducing MLwiN
MLwiN is a complete statistical package that has been specifically written for multilevel modelling. The letters ML stand for MultiLevel; wi stands for Windows; and N stands for the number of levels that can be specified (as can be seen, there is no restriction on the number of levels that can be specified and calibrated). It is a highly flexible program: it can fit a very wide range of models as well as prepare the data before analysis and display the results after estimation. This chapter covers the development of the program, the resources that are available, and the basic structure of the program. Various substantial developments are appearing 'around' the software (eg power calculations and multiple imputation of missing data) and these are also described.
Brief history of different versions of the software
The origins of the software go back to the mid-1980s and the current program reflects the legacy of its history. Prior to the development of the ML software, Michael Healy had written a PC-based statistical package called NANOSTAT which bore a close resemblance to the command-driven statistical package MINITAB (see http://www.minitab.com). Jon Rasbash, the primary developer of the multilevel software, built the initial product, ML2, a DOS-based multilevel modelling program allowing the specification of two levels, on the basis of the NANOSTAT program and the Iterative Generalized Least Squares algorithm (Goldstein, 1986). Subsequently the program was developed to incorporate three levels (ML3) and then n levels (MLn). In 1998 a Windows version, MLwiN, was released which built upon MLn. Subsequently Bill Browne was primarily responsible for adding a large range of new model types (eg cross-classified, multiple membership and spatial models) that can be estimated by Markov Chain Monte Carlo procedures. The present release, MLwiN 2.1, with Chris Charlton joining the team, is a substantial upgrade with more flexible procedures for specifying and interpreting models. The MLwiN program is composed of essentially four parts:
- a front-end Graphical User Interface (GUI) in which a set of 'windows' is used to manipulate data, specify models, present the results of estimation and display graphs;
- a processing unit in which the data are actually manipulated and models are estimated;
- a back-end, essentially the MLn DOS command-based program with added functionality, plus an interfacing module that allows it to receive requests for actions from the front-end and to pass the results back; you can use the commands to write macros;
- a worksheet (like a spreadsheet) in which data and estimates are stored.
The version that is presently available, and the one that has been used for developing this training module, is MLwiN version 2.1, released in December 2009.
Some resources
Manuals
New developments, documentation, listings of bugs and maintenance upgrades, along with details of how to obtain the program, are available from http://www.cmm.bristol.ac.uk/. There is a set of manuals, all freely available for download from the website, http://www.cmm.bristol.ac.uk/MLwiN/download/manuals.shtml

Rasbash, J., Steele, F., Browne, W.J. and Goldstein, H. (2009) A User's Guide to MLwiN, v2.1. Centre for Multilevel Modelling, University of Bristol. This provides a detailed, step-by-step guide to fitting and interpreting a wide range of models. This is the place to start. After a short introduction to the underlying concepts of multilevel models, the guide provides detailed exemplification of two-level models with a continuous response in Part 1. Part 2 considers more complex models, including models where the response is binary, a proportion, a count, a repeated measure, and multivariate. This part also considers model diagnostics. Part 3 considers simulation-based methods for model estimation, including MCMC and bootstrap procedures. There is also an exemplification of cross-classified and multiple membership models at the end of the guide, but these are better covered in the MCMC manual (see below).
Rasbash, J., Charlton, C., Jones, K. and Pillinger, R. (2009) Manual Supplement to MLwiN v2.1. Centre for Multilevel Modelling, University of Bristol. This demonstrates the new functionality in version 2.1; that is the above User’s Guide has not been fully updated to take account of new functionality, but this Training Manual has.
Browne, W.J. (2009) MCMC Estimation in MLwiN, v2.1. Centre for Multilevel Modelling, University of Bristol. This gives a thorough coverage of the Bayesian capacity of the software; it is essential for handling data with a non-hierarchical structure, but you are advised to have worked through the User's Guide first.
Rasbash, J., Browne, W. and Goldstein, H. (2003) The MLwiN 2.0 Command Manual. Centre for Multilevel Modelling, University of Bristol. This is a very terse specification of each and every command. It also considers how to obtain access to the underlying IGLS algorithm, details how macros are written, describes how to undertake simulations, and provides exemplification of the estimation of cross-classified and multiple membership models. It has not yet been updated to the functionality of version 2.1.

Yang, M., Rasbash, J., Goldstein, H. and Barbosa, M. (2001) MLwiN macros for advanced multilevel modelling, Version 2.0. Centre for Multilevel Modelling, University of Bristol. As the title suggests, these are procedures for handling models that have not as yet been implemented in the GUI; indeed they are based on Version 1.10 of MLwiN. They include survival and event-duration models and models with temporal auto-correlation at level 1. The style of the writing of this manual requires a fair familiarity with the program; this is not the place to start. Models for repeated measurements with 'extra' temporal auto-correlation at level 1 have been included in the Manual Supplement.

There is a fairly comprehensive help system that comes with the software (it has not been fully updated to take account of the new functionality of v2.1); it not only provides technical assistance on the commands and procedures but also provides discussion of wider issues. This help system uses the standard Windows 95/98/NT help conventions. You can use the index to search for a topic or click on the 'find' tab to search on keywords for a topic. Ensure that the help system is customized for maximized search capabilities: a window will come up when you invoke the find utility for the first time asking about different search capabilities; click on 'maximize search capabilities'. It may take a while but it is worth it. Furthermore, if you have an internet connection running, you can connect directly to the MLwiN web site from the help system. It is very useful to know that when searching for a particular command using the index tab in the help menu you should preface the command name with the word "command". If you have questions, there is a substantial and growing resource at http://www.cmm.bristol.ac.uk/MLwiN/tech-support/support-faqs/index.shtml which reflects the questions that the developers have been asked.
LEMMA: Multilevel Modelling online course
This online course is an outstanding resource for learning about multiple regression and multilevel models; it uses a special version of MLwiN 2.1 to estimate models from the example datasets. The course can usefully be followed alongside this training manual; it is especially recommended if you need refreshing on categorical variables and interactions. It is freely available from http://www.cmm.bristol.ac.uk/learning-training/course.shtml

Related software
There are four other software products that are worth mentioning.

REALCOM
http://www.cmm.bristol.ac.uk/research/Realcom/index.shtml
This is standalone software (it uses MATLAB) which was the result of an ESRC project to estimate "models with responses at several levels of a data hierarchy, multilevel structural equation models, and measurement error modelling". This software uses Markov Chain Monte Carlo (MCMC) estimation.

REALCOM-IMPUTE
http://www.cmm.bristol.ac.uk/research/Realcom/Imputation.pdf
This is designed to make multiple imputations when there are missing data. The software will deal with categorical as well as Normal data and also with multilevel structures. The model of interest is first set up in MLwiN in the usual way; the variables are then exported to REALCOM-IMPUTE, and the imputed datasets are returned to MLwiN where they will be fitted and combined automatically for the specified model of interest. For much useful background on missing data there are a number of resources at www.missingdata.org.uk

MLPowSim
http://seis.bris.ac.uk/~frwjb/esrc.html
This is a package, still under development, that will allow power calculations for multilevel models (it uses either MLwiN or R).
BUGS http://www.mrc-bsu.cam.ac.uk/bugs/ This is a freely available software environment for specifying models and estimating them in MCMC; the range of models is much wider than MLwiN can estimate but usually the estimation is noticeably slower. The MCMC Manual covers how a model can be specified in MLwiN, and the BUGS syntax generated so that the model can be estimated (and modified) in the latter software.
The MLwiN worksheet
Before you start computing and modelling it is very helpful to understand how MLwiN stores its data and estimates. In its simplest form the worksheet can be portrayed graphically as

      C1    C2    C3    ...
1
2
3
4
...

where
- the columns are the variables (predictors and responses) and indicators of structure;
- the rows are generally the level-1 units;
- the term cell is used to describe a particular row in a particular column.
Besides the columns, there are also ‘boxes’ or constants, numbered from B1 to B400, and groups of columns, G1, G2 etc., which are effectively matrices. MLwiN is not case-sensitive, so you can use C1 or c1, B1 or b1, but it is common practice to name the columns in the worksheet, so that it looks like:
      District   HouseID   Price    Type   Size
1     1          1         75000    1      6
2     2          1         42350    1      4
3     2          3         66700    2      8
4     1          2         152000   3      4
5     1          3         142000   1      6
6     2          2         112600   1      8
On entry to MLwiN the worksheet is empty.1 Data can be read in from files or 'pasted' in. Results from calculations and estimations can be stored in the worksheet, and previous worksheets can be restored. The worksheet is an excellent design feature because not only can you look at and use the data you have input and manipulated, but you can also use the results that have been generated by the program (such as estimated coefficients and residuals), which are stored as columns and therefore readily accessible for further use and processing. The worksheet also stores the settings and specifications for the multilevel model to be estimated: stored in the worksheet are indicators of which variables are in the fixed part of the model, which are in the random part, and what the levels are. When you leave the program, the worksheet is emptied of both data and model structure, so you should save a worksheet if you want to use it subsequently. It is also recommended that you save your worksheet frequently and that when major changes are made you save under a new name; this is because it is possible to overwrite columns and thereby destroy your data. Crashes are also possible. The data in the worksheet are held in main memory (RAM) for speed, and consequently the size of the model that can be fitted by the software is governed by the RAM on your computer. Multilevel models can be very data hungry (cross-classified and multivariate models in particular, because of the way MLwiN estimates these models) and can require substantial memory. The default size for the worksheet is 400 columns, 150 explanatory variables, 5 levels of nesting, 400 boxes, and 20 group labels. The overall default size of a worksheet is 1,000,000 cells. The worksheet dimensions, the number of parameters and the number of levels can be allocated dynamically (with the exception of the number of boxes), either by the INITialise command (see later) or by the Worksheet item on the Options menu in the GUI. The ultimate control is, of course, RAM. A number of columns are reserved by MLwiN for storing the attributes of models.2 You will not run into any problems if you initially limit yourself to columns 1 to 89. To see precisely which columns are used by different models, look up 'reserved columns and groups' in the help system.

1 Boxes on entry contain the system's UNKNOWN or missing value; this is a very large negative number.
2 C1096-C1099 are used to store the latest estimates of the parameters of the current model, and their covariance matrices, as follows: C1096: random parameter estimates; C1097: covariance matrix of random parameter estimates; C1098: fixed parameter estimates; C1099: covariance matrix of fixed parameter estimates.
The command interface
This section is written for those who want to know a little more about what is going on in the background. We have found it helpful to have some understanding of the command interface, and it becomes absolutely essential for some models. It is also very useful if something goes 'wrong' with the model you are estimating, or when the model is being estimated by macros, that is sets of commands that are stored in a file. Here is a typical command line as entered into the command interface:

AVERAGE the data in column C3 and display the result

AVERAGE is one of the many commands that are available, all of which will be obeyed instantly in this interactive environment; C3 is short for column 3 of the worksheet. The example command line is therefore telling the program, in relatively plain English, to calculate the average, the arithmetic mean, of the data stored in column C3. The other words on the line are ignored; they are merely there to make it clear to the user what is going on. A useful command is NAME, which names the columns of the worksheet; thus

NAME C1 'District' C2 'HouseID' C3 'Price' C4 'Type' C5 'Size'

Only the first four letters of a command are required, and it is possible to use the column names, so that our original command can become:

AVER 'Price'

The quotes around a named variable are mandatory. There are often several variants of a command. Thus,

AVER 'Price' B1 B2 B3 B4

will take the data in the column headed 'Price' and store the number of observations in B1, the mean (average) in B2, the standard deviation in B3, and the standard error of the mean in B4. The B's are known as boxes and represent a single value, a scalar. To take another example,

LINK 'Price' 'Size' as G1
AVER 2 G1 C6 C7

where the LINK command enables the two columns to be treated as a single entity or group; the means of the two variables are then stored in C6 and the standard deviations in C7. Groups of columns are equivalent to matrices and there are commands that will perform matrix algebra on groups.
Commands have the following general structure:

COMMAND = COMMAND NAME + ARGUMENTS

where the command name tells MLwiN to do something, and the arguments are (in the main) the various elements of the worksheet. To take another example,

SORT the values in C1 put the results in C2

will result in the following actions:
- the first four letters are examined for a valid command; SORT is in MLwiN's internal dictionary, so processing can proceed;
- other words which are not enclosed in quotes and are not arguments such as C1 or C2 are stripped out and ignored; in this case 'the values in' and 'put the results in' are discarded; they help immensely, however, in remembering the syntax of the command;
- the program checks whether there is an appropriate number of arguments; there need to be at least two columns in the case of SORT;
- the appropriate column, C1, is then checked to see if it contains data;
- if all the above conditions are satisfied the program proceeds: the values in C1 are sorted and the results placed in C2, overwriting any values already there; no result is automatically printed because there may be several thousand rows in the column; and
- the program then waits for the next command.
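Putting these pieces together, a short interactive session for the house-price worksheet might look like the following minimal sketch. It uses only commands documented in this chapter and in the command appendix at its end (PRINt displays the value stored in a box); the extra words on each line are ignored by the program, exactly as described above.

NAME C3 'Price'
AVER 'Price' B1 B2 B3 B4   count then mean then sd then se of the mean
PRIN B2                    display the stored mean
SORT 'Price' C6            sorted copy of the prices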
There are a few rules of syntax that are worth knowing:
- each command line must start with a command name, which may be abbreviated to the first four letters, although some commands need only three; for example, MAX gives the maximum of a column but MAXI sets the maximum number of IGLS iterations;
- you can use upper and lower case characters in any combination;
- only one command per line will be acted on, although a command can be used any number of times;
- columns should not be given the same name as a file; names may be from one to eight characters in length; no two columns can have the same name; to remove the name of a column, enter a blank name with NAME C1 ''
Each command has a generic form or syntax, which is used, for example, in the Command Manual and in the online help (in the index you have to look up each command preceded by the word COMMAND). These look horrendous at first sight and it does not get much better with familiarity! The key is to understand the generic form of what we have been calling arguments but what the manual refers to as parameter descriptors. These have to be replaced by specific columns or boxes etc. The full set, in somewhat simplified fashion, is as follows:

<value>           replace with a numeral (eg 23) or a box label (eg B12)
<box>             replace with a box label of the form B9; a sequence of consecutive boxes is indicated by a hyphen (B1-B4 implies B1 B2 B3 B4)
<column>          replace with a column label of the form C88
<group>           replace with a group label such as G12
<string>          replace with a string variable such as S1 (the maximum allowed is S20)
<name>            a sequence of between 1 and 8 alphabetic, numeric or underline characters in single quotes (eg 'Price' 'P1' 'P_1')
<column or name>  replace with a column label (eg C3) or a column name (eg 'Price')
<columns>         replace with a list of column labels (eg C1 C2 C3) or a sequence of consecutively labelled or named columns (eg C1-C5 and 'District'-'Size' are equivalent in our example)
<pathname>        replace with a standard DOS pathname (eg C:\mydata)
<filename>        replace with a standard DOS filename (eg C:\mydata\model1.ws)
<>                angle brackets denote arguments or parameter descriptors
|                 a vertical bar means a choice must be made between two forms of the arguments; only one can be used
{}                braces enclose arguments that may be omitted altogether
....              an ellipsis indicates that further arguments in the form within the braces may be included on the same command line
To take some examples of increasing complexity, the generic form

SET the value in <box> to <value>

can be turned into the specific example

SET B1 10

which will set the contents of B1 to 10, while

SET B1 B12

will set B1 to the contents of B12. The generic form

RANK the values in <column> output to <column>

could be applied in specific form as

RANK 'Price' c6

which will put the ranks of the house prices into column c6, with the house with the lowest price given rank 1; note that the syntax shows that RANK works on a column but not on a column group.3 The generic form

LOGTen logarithm base 10 of values in <columns> to <columns>

could be applied in the specific form

LOGT 'Price' 'Size' to c7 c8

which puts the logarithm base 10 of the values of 'Price' into c7 and the log of 'Size' into c8. The generic form

SORT on {<value>} key(s) in <columns> {carrying <columns>} results to <columns> {with carried data to <columns>}

could take the specific form

SORT 2 'District' 'HouseID' carry 'Price'-c8 results to 'District'-c8

which physically re-orders the data on two keys, the first being the district number and the second the house identifier; the data in 'Price', 'Type', 'Size', c6, c7 and c8 are carried along in the process and all the results are output back to their original columns. As a result of these commands our worksheet would look like:
      District   HouseID   Price    Type   Size   C6   C7       C8
1     1          1         75000    1      6      3    4.8751   0.77815
2     1          2         152000   3      4      6    5.1818   0.60206
3     1          3         142000   1      6      5    5.1523   0.77815
4     2          1         42350    1      4      1    4.6269   0.60206
5     2          2         112600   1      8      4    5.0515   0.90309
6     2          3         66700    2      8      2    4.8241   0.90309

3 If you give the command RANK 'Price' 'Size'c6 c7 (no blank between 'Size' and c6), the program will, without warning, overwrite the column Size with the rank of Price, and c6 and c7 are simply ignored as extra text. Make sure you save your worksheet frequently, checking that the data are still what you think they are. You have been warned!
There are a number of useful commands for manipulating the worksheet, such as:

NAME C1 'District' C2 'HouseID' — gives names or headings to specific columns; NAME on its own displays a summary of the names and contents of the columns of the worksheet.
ERASE C1-C4 — removes columns C1 C2 C3 and C4 from the worksheet together with their names; the dash implies consecutive integers.
MARK 1 C1-C4 — marks columns C1 to C4 to prevent them from being written over without warning by other commands; at the outset of an MLwiN session, the columns c1096-c1099 are write-protected because this is where the program stores its estimated coefficients and standard errors.
WIPE — clears all the data from the worksheet and clears any specified model.
TIDY — makes the existing worksheet more efficient in terms of storage.
SAVE C:\myresults\newmodel1 — saves the worksheet and existing model settings to a specified filename with the default extension .ws; the worksheet will be saved as a binary file and nothing can be done with this file except retrieve it back into MLwiN; it CANNOT be printed or usefully read by other software.
RETRIEVE C:\myresults\oldmodel — retrieves a file containing an existing worksheet called oldmodel; the extension .ws is presumed.

You will remember that stored in the worksheet are indicators of which variables are in the fixed part of the model, which are in the random part, and what the levels are. There are also commands that can make use of this knowledge; thus

NLEVEL B1

will output the number of levels to the box B1, so that it is possible to write macros to estimate models to your own requirements. The allowable number of levels, cells, variables and observations in the worksheet is controlled by the INITialise command, or through the Worksheet item on the Options menu in the GUI, and ultimately by how much RAM is available.
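As a minimal sketch of this housekeeping in practice (the filename is hypothetical), the following sequence names some columns, protects them, saves the worksheet and then queries the model structure:

NAME C1 'District' C2 'HouseID' C3 'Price'
MARK 1 C1-C3
SAVE C:\myresults\backup1
NLEVEL B1
PRIN B1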
An example of a macro
Advanced users may wish to produce a set of commands to form a macro which can then be obeyed. The Command Manual has much detail, and the Manual Supplement has a section on this too (8.4) and gives some examples (eg page 65). The following is a relatively basic macro (it contains no loops) to generate data that mimic a group-level intervention study, then fit and display a two-level Normal model. Each command is followed, in square brackets, by a comment on what is being done. It is meant to give some idea of what can be done!

Note macro to generate data for intervention study   [Note means that the rest of the line is ignored]
Note erase previous variables in c1-c100
ERASE C1-C100                                        [erases contents and names of columns]
Note b1 = no of classes; b2 = no of students; b3 is total
calc b1 = 10                                         [B1, B2 etc are boxes which hold a single value, ie a scalar; calc here assigns a value to a box]
calc b2 = 10
Note b4 is the number of classes with the intervention
calc b4 = 5
Note b3 is total number
calc b3 = b1 * b2                                    [calc here makes a calculation]
Name c1 'pupil' c2 'class'                           [C's are column vectors; Name assigns a name]
Generate 1 b3 1 'Pupil'                              [GENErate the numbers 1 to b3 and store them in the column]
Code b2 b1 1 'Class'                                 [CODE the numbers 1 to b2, each repeated b1 times in a block, repeat the set of blocks once, and store in the column]
Note b5 is the number of classes without intervention
calc b5 = b1-b4
Note create the intervention dummy
Name c3 'int'
code 1 b1 b4 c3                                      [codes a set of 1's in c3 for b4 classes]
code 1 b1 b5 c4                                      [codes a set of 1's in c4 for b5 classes]
calc c3 = c3-1                                       [the c3 values become a set of zeroes]
join c3 c4 c3                                        [joins c4 to the bottom of c3 and puts the joined column back into c3]
erase c4
Note generate prior ability score with mean of zero and var of 1
Name c4 'Pre'
Nrandom b3 c4                                        [b3 random numbers to c4 from the standard Normal distribution]
Note create Boy dummy as a binomial with 0.5 probability
Name c5 'Boy'
Bran b3 c5 0.5 1                                     [b3 random numbers to c5 from the Binomial distribution with probability 0.5 and a trial of 1]
Note calculate interactions
Name c6 'Boy*Pre' c7 'Boy*Int' c8 'Int*pre' c9 'Boy*Pre*Int'   [the rest of the predictor variables]
calc c6 = c5*c4
calc c7 = c5*c3
calc c8 = c3*c4
calc c9 = c5*c4*c3
Note generate the constant
Name c10 'cons'
Put b3 1 c10                                         [puts b3 1's into c10]
Note calculate the fixed part of the post-test as a function of the variables and postulated coefficients
Name c11 'Fixed'
Note no effect for intervention
calc 'Fixed' = 0.0 + 1.0*'Pre' + 0.3*'Boy' - 1.5*'int' + 2.0*'Boy*Int' + 0.0*'Int*pre' + 1.0*'Boy*Pre*Int'
Note create group codes
Name c12 'grpcode'
calc c12 = 'boy' + 2*('int')                         [c12 gets the numeric values for the 4 types of pupil]
Catn 1 c12 0 '\Girl NoInt' 1 '\Boy NoInt' 2 '\Girl Int' 3 '\Boy Int'   [assigns category names: mode 1 assigns names (mode 0 clears them); the \ instructs MLwiN to treat the string as text rather than as a column name]
Note graph the known fixed results
GIND 1 1                                             [GINDex: graph set 1, data set 1; subsequent graphics commands refer to this data set until a new GIND command is issued]
GYCO 'Fixed'                                         [declares 'Fixed' as the vertical (Y) column to be plotted]
GXCO 'Pre'                                           [declares 'Pre' as the horizontal (X) column to be plotted]
Glab 2                                               [mode 2: create text from group category names in grouped plots]
GGRO 'grpcode'                                       [defines a grouping variable for grouped line plots]
GTyp 1                                               [sets the graph type for the current data set: 0 line, 1 point, 2 line+point]
GLTH 3                                               [sets the line thickness in pixels for the current data set]
GCLR 16                                              [sets the colour of lines; 16 is the rainbow option]
Gtext 3                                              [GTEXt: text labels; mode 3 places the labels on the graph]
Note work on the stochastic element; calculate the sd of the dependent variable
aver 'Fixed' b6 b7 b8                                [stores the descriptives of 'Fixed': the number of values to b6, the mean to b7 and the standard deviation to b8]
Note simulate normal with mean of zero and SD of 1
Name c13 'Lev1Noise' c14 'Lev2Noise' c15 'Post' c16 'classId'
Nran b3 'lev1noise'                                  [NRAN gives a mean of zero and an SD of 1; the level 1 residuals, b3 long]
Note as much level 1 noise as signal
calc 'lev1noise' = 'lev1noise' * (b8 * 1)            [the multiplier 1 gives the same amount of noise as signal from the fixed part]
Note level 2 noise 0.25 of signal
Nran b1 'lev2noise'                                  [the class residuals, b1 long]
calc 'lev2noise' = 'lev2noise' * (b8 * 0.25)         [0.25 gives a quarter as much noise as signal]
Note generate index for class and then replicate the class noise
Generate 1 b1 1 'classid'                            [generates 1 to b1 in steps of 1: a short vector of class identifiers]
Merge 'classid' 'lev2noise' 'class' 'Lev2noise'      [replicates: using the distinct codes in 'classid', carrying data from 'lev2noise', finds the corresponding codes in 'class' and generates the corresponding data, stored in the now-long vector]
calc 'post' = 'Fixed' + 'Lev1Noise' + 'Lev2noise'    [creates the response variable]
GIND 1 2                                             [graph set 1, data set 2]
Glab 0                                               [hides the label for the current data set]
GYCO 'post'
Gxco 'Pre'
GSSZ 10                                              [sets the symbol size for the current data set, measured in thousandths of the overall size of the graph window]
gcoo 0 0                                             [attaches the current data set to a sub-graph in the current table of sub-graphs; the top-left sub-graph has co-ordinate (0,0)]
Note model
Clear                                                [clears any previous model]
Resp 'post'                                          [declares the response variable]
Iden 1 'pupil' 2 'class'                             [identifies the variables defining the levels]
EXpl 1 'cons' 'pre' 'boy' 'int' 'boy*pre' 'boy*int' 'int*pre' 'boy*pre*int'   [declares the explanatory variables and automatically puts them in the fixed part]
setv 1 'cons'                                        [sets the variance term associated with the constant at level 1]
setv 2 'cons'                                        [sets the variance term associated with the constant at level 2]
Batch 1                                              [turns batch mode on: iterates to convergence or until the maximum number of iterations]
Maxit 20                                             [sets a maximum of 20 iterations]
Start                                                [starts estimation]
Note make predictions of the fixed part
Name c17 'pred'
Pred 'pred'                                          [prediction based on all fixed part terms]
Gind 1 3                                             [graph set 1, data set 3]
Glab 2                                               [create text from group category names in grouped plots]
Gyco 'pred'                                          [Y axis]
Gxco 'pre'                                           [X axis]
Gtyp 1                                               [draw as lines]
GGro c12                                             [group codes]
GCLR 16                                              [rainbow colours]
GLTH 2                                               [line thickness]
gcoo 0 1                                             [attaches the current data set at position (0,1), ie one across]
gtab 1 2                                             [divides the current graph set into a 1 by 2 table of sub-graphs]
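To run a macro such as this, save the commands in a plain text file and use the OBEY command (listed in the appendix below), which performs the commands in a specified macro; the path and filename here are hypothetical:

OBEY C:\macros\intervention.txt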
The macro commands are classified in the following help 'books':
- General macro commands
- Macro commands for controlling graphics
- Macro commands for updating the MLwiN front end
- Macro commands for controlling MCMC estimation
- Miscellaneous new macro commands
Note that any windows which are open will not be updated while macros are running, but they will register changed parameter values etc after macro execution is complete. Windows will also be updated whenever a macro is paused. MLwiN incorporates a macro editor which is accessed from the File menu. Commands can be written directly or pasted from a word processor, and they can also be executed from this editor. To help 'learn' the commands it is useful to work in the GUI and then view the Command interface window with the User box un-ticked; this will show the commands that you have generated. Very unfortunately these cannot simply be cut and pasted (as you would do with SPSS syntax) to form a macro, as they contain a lot of extraneous commands that have been generated in the background as general housekeeping. This facility is however useful for seeing what is going on.
Finally, in terms of macros, if you are running a model with a Normal response be careful to include commands in your macro which turn off pre and post file processing:

PREFile 0
POSTfile 0

This is because MLwiN itself uses macros to fit non-Normal models.
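A minimal skeleton for such a macro might therefore look as follows; the middle line stands for whatever model specification you have written, using the commands shown in the example macro above:

PREFile 0
POSTfile 0
Note ... model specification here (RESP, IDEN, EXPL, SETV etc) ...
BATCh 1
STARt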
An appendix on commands
There now follows a short appendix on commands which has been written by Rebecca Pillinger of the Centre for Multilevel Modelling (University of Bristol). In the first part of this appendix some of the more useful commands are listed alphabetically, along with where to find the best documentation:
C = best documentation in the Command Manual, http://www.cmm.bristol.ac.uk/MLwiN/download/manuals.shtml
S = best documentation in the Manual Supplement, http://www.cmm.bristol.ac.uk/MLwiN/download/manuals.shtml
F = best documentation in the FAQs, http://www.cmm.bristol.ac.uk/MLwiN/tech-support/support-faqs/index.shtml
These are then grouped in a set of functional groups related to model specification, data manipulation etc. The final page is an aide memoire for using commands.
ADDT      add an explanatory variable                                        S
BATCh     iterate till convergence or stop after 1 iteration                 C
CALC      calculate                                                          C
CENT      specify centring used or not for explanatory variables             S
CLEAr     erase model set up in Equations window                             C
CODE      create a repeating sequence                                        C
DOFFs     specify variable to use for denominator or offsets                 S
ECHO      display output from commands in macros or not                      C
EDIT      change value in individual cell(s) of column                       C
ENDLoop   marks end of loop in a macro                                       C
ERASe     delete contents of column                                          C
ESTM      like Estimates button in Equations window                          S
EXPA      like + and - buttons in Equations window                           S
FPATh     generated by MLwiN at start of session                             C
GALLfilt  generated by MLwiN at start of session                             C
GENErate  create a sequence                                                  C
GIND      specify graph display number and dataset number                    C
GTAB      specify graph trellis (also called by MLwiN when redrawing graph)  C
GTYP      specify plot type                                                  C
GXCO      specify x variable for graph                                       C
GYCO      specify y variable for graph                                       C
IDEN      specify level identifier variable                                  C
INDE      show single or multiple subscripts in Equations window             S
JOIN      join columns, boxes or values into one column                      C
LFUN      specify link function for discrete response model                  S
LIKE      calculate likelihood (during model fitting)                        C
LINEa     like Nonlinear button in Equations window                          S
LOOP      mark start of a loop in a macro                                    C
MAXIt     specify maximum number of iterations when running model            C
MCMC      run model using MCMC                                               C
MCOMp     display model comparison table                                     S
MONI      issued during estimation                                           C
MSTOre    store model in model comparison table                              S
MWIPe     erase all models from model comparison table                       S
NAME      rename a column                                                    C
NEXT      like More button at top of MLwiN                                   S
NMVA      like Name button in Equations window                               S
NOTAtion  simple or general notation in Equations window                     S
NRANdom   generate Normal random variable                                    C
OBEY      perform the commands in a specified macro                          C
OFFSets   issued during estimation                                           C
PAUSe     pause macro & return control to user                               C
PICK      capture value from particular cell of column                       C
POSTfile  generated by MLwiN at start of session                             C
PREFile   generated by MLwiN at start of session                             C
PRINt     display value stored in a box                                      C
PRIOr     generated by MLwiN at start of session                             C
PUT       create a constant vector                                           C
RDISt     specify distribution of response                                   S
RESP      specify the response variable                                      C
RSPSs     open SPSS worksheet                                                S
RSTAta    open Stata worksheet                                               S
SETV      add a random effect                                                C
SORT      sort (some) columns according to values in others                  C
STARt     run model using (R)IGLS                                            S
SWITch    conditional statement in macro                                     C
TRACk     generated by MLwiN at start of session                             S
WSET      refresh windows                                                    S
ZRET      open .wsz worksheet                                                S
ZSAV      save as .wsz worksheet                                             S
2. Getting started with MLwiN

Introduction
This chapter aims to provide you with some practice with MLwiN commands before you begin to fit multilevel models. It may be helpful if you have already read the previous chapter and appreciated some of the design concepts, particularly the worksheet, which are covered there. We are going to cover some of the basic procedures that are needed before you begin modelling. In particular, we will show you how to:
- input data to the program;
- save and retrieve worksheets;
- sort data by higher-level units;
- transform data and calculate new columns;
- describe and cross-tabulate data;
- produce histograms and scatterplots;
- handle categorical variables in tables and plots.
Throughout we will be using a single running example of house-price data. This chapter is best followed by working hands-on at the computer, using the GUI, as we proceed. Use the Help option at any time. We assume that you have a working knowledge of Microsoft Windows, as the program shares many features common to other applications such as word processors and some statistical packages. Thus file opening and saving are standard, as are copying and pasting.
The opening screen
On entry to the program you will see a rather unappealing grey screen. At the top, highlighted in blue to show that it is the active window, is the MLwiN icon, with the usual minimize, re-size and close icons. Below this is a row of headings running from File to Help; this is the main menu. Below this are some buttons, START, MORE, STOP, which control the estimation process. In this session we want to concentrate on getting to know what can be achieved by the features listed on the main menu. In particular, if you place your cursor over the following headings, you will get some appreciation of what can be achieved. Three headings give standard Windows operations:

EDIT                provides the usual facilities for Cut, Copy and Paste
WINDOW              allows windows to cascade, and the closure of all windows
HELP                gives access to the help system (described briefly in Chapter 1)

We are going to use some of the facilities under the following headings:

FILE                importing and exporting data; saving and retrieving files
DATA MANIPULATION   calculation, sorting of data
BASIC STATISTICS    averages, cross-tabulations, correlations
GRAPHS              customized graphs for histograms and scatterplots; trellis graphs when a categorical variable is involved
OPTIONS             controls the size of the worksheet, the number of decimal places to be displayed, and the location of directories for acquiring macros

There are two headings we are not going to look at in this chapter:

MODEL               facilitates the specification of multilevel models; the next chapter will focus on these commands
ESTIMATION          controls which estimation process is used
Data input: copying data from an Excel spreadsheet
In an Excel spreadsheet enter the following 20 values with their headings:

House   District   Price   Size   Type
134     7          93      6      Det
8       1          62      4      Terr
2       1          63      5      Semi
161     8          55      3      Terr
153     8          157     8      Det
102     5          86      4      Semi
61      3          52      5      Terr
139     7          106     7      Semi
164     8          78      6      Terr
80      4          78      6      Terr
155     8          58      5      Terr
7       1          80      6      Terr
24      2          90      6      Semi
76      4          55      7      Semi
66      4          45      2      Terr
194     9          90      9      Semi
137     7          121     8      Semi
5       1          67      4      Semi
150     7          118     9      Terr
114     6          99      7      Semi
These values have been taken from a larger file, which we will be using later. As is common, the data are a mixture of categorical (ie alphabetical) and numerical values.
Highlight all five columns and all values and copy them (CTRL-C should work).4
Switch across into MLwiN and, from the Edit item on the Main Menu, choose Paste; this should bring up the dialog box that follows:
4 For this procedure to work properly, the data should be organized in rows and columns of equal length, with each column separated or delimited by white space or tabs.
Tick the box for 'Use first row as names'
Tick the box for 'Free columns'
This simply means that the values 134, 8, 2 etc are ready to be pasted into a previously free column c1, and we are going to name this variable 'House'. The values 7, 1, 1 etc are going to be put into column 2, and this variable is going to be called 'District', and so on. You will see at the top of the screen that you can set a particular value (eg -9.9) to be missing. If you have missing data it is a good idea to use the same missing value code for all variables.5 If everything is correct, click on the Paste button at the bottom of the screen. The Paste View window should close, to be replaced by the Names window.
5 By default, a value of System Unknown (a very large negative number, -9.999E+29) is in force. There is much more on importing data into MLwiN at http://www.cmm.bristol.ac.uk/MLwiN/tech-support/support-faqs/data-n/index.shtml
If you are working on a networked computer you may hit a problem: during the pasting process MLwiN needs to write a small file, and the current directory may be a 'read only' one. If you encounter this problem, proceed as follows:
Options on Main menu
Directories
Browse to set 'My documents'6 as the current directory, as follows
Let us take another look at the Names window (it is a good idea to keep this window open at all times as it gives the structure of the worksheet). The bottom of the main window gives you a tab for each open window; clicking on the relevant tab will make that window active. If you have closed the Names window, then:
Data manipulation on main menu
Click on Names
to give again the following summary of the worksheet
6 Or some other directory to which you have permission to write.
There are 5 columns, each with 20 observations. Notice that for each named column you are provided with a minimum and maximum and a count of the total number of missing values, and that the text variable 'Type' is automatically given a categorical coding, indicated by the status 'True'. Across the top of the Names window you will see the following functionality, which all works by clicking on a column or group of columns:

Name — edits the name of a column; at the outset, columns are named c1, c2 etc. Consequently it is good practice to edit these and replace the name with something memorable, as you soon develop a lot of columns.
Description — you can add a long distinctive description to each column to fully document your worksheet.
Toggle Categorical — a toggle is a command that is like an on-off switch: if the highlighted column is categorical (ie True) it changes to numeric (ie False); if it is already numeric (ie False), the column becomes categorical (True).
View — allows you to 'see' the values in the column; we will do this soon.
Copy, Paste, Delete — not surprisingly, these commands allow you to Copy from a column(s), Paste to a column(s), and Delete a column(s).7
Categories — allows you to set the labels of a categorical variable.
Regenerate — drops the labels for any codes which do not appear in the data, while the labels of any codes which do appear in the data are preserved.

There are two other aspects of functionality:

Help — gives help on the Names window; it is currently out of date and refers to an old version of the software.
Used columns tickbox — ticking this box results in the display only of columns with values in them; this can be useful if you have stored data in 'high' columns (eg the estimates are automatically stored in c1096-c1099; see Chapter 1).
To view the data:
In the Names window
Highlight House to Type until all 5 variables turn dark blue
Click on View in the Names menu
This will bring up the following data8

7 If anything goes wrong and incorrect data have been pasted, highlight columns c1-c5, click Delete in the Names window, and then re-paste having copied the correct data from Excel.
This shows the actual values for all 20 observations. As the Show value labels box is ticked, you will see text for the categories. You can use the slider to move up and down the observations. It is clear from this display that the data are not sorted by district and by house within district. Having got the data into MLwiN, it is a good idea to save the worksheet. Appropriate sorting must be completed before modelling.
Saving and retrieving the worksheet
To save the worksheet:
File on the Main Menu
Save worksheet as
C:\Training manual\house.wsz
8 You could also have used:
Data Manipulation on Main Menu
View or Edit the data
Click on View and highlight the columns you wish to see
Drag the right-hand window to see all five columns.
The *.wsz extension denotes an MLwiN 2.1 compressed worksheet.9 Of course, the exact path and filename depend on your preferences, but make a note of where you have stored it and the name you have used. To retrieve a previously saved worksheet:10
File on the Main Menu
Open worksheet
C:\Training manual\house.wsz

9 Using this extension means that the worksheet is 'zipped' as it is stored, making a huge reduction in storage, typically over 95% on a non-compressed file. The worksheet can also be stored in Minitab, SPSS and Stata format; see the Manual Supplement, section 6.
10 It is also possible to read data files in Minitab, SPSS and Stata format, but we strongly urge you to reduce the number of columns to those that you are going to use in the modelling before importing the data.
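The command appendix at the end of Chapter 1 lists ZSAV (save as a .wsz worksheet) and ZRET (open a .wsz worksheet), so the command-interface equivalents would plausibly be as sketched below; treat the exact argument form as an assumption and check the Command interface window after saving via the GUI:

ZSAV C:\Training manual\house.wsz
ZRET C:\Training manual\house.wsz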
Data manipulation
The Data manipulation window under the main menu offers a range of options and we are going to explore some of them in this chapter. The data must be sorted appropriately before they can be modelled correctly: houses within the same district must be next to each other in the worksheet. The sequence for sorting is as follows:
Data Manipulation on Main Menu
Sort
Increase number of keys to 2 [because houses are nested in districts]
Choose District as the highest key [slowest changing]
Choose House as the lowest key [fastest changing]
Highlight House to Type (ie all 5 columns)
Click on Same as Input
Click on Add to Action List
Click on Execute
The graphic below shows the Sort window just before the Execute button is pressed. It is worth spending a little time on this window as it has a number of design features that are common to several data manipulation windows. The window is divided into two panes: in the left pane, actions are specified and columns chosen, while in the right pane there are buttons for actually carrying out the actions. The common buttons are as follows:

Free columns — selects the first set of free or empty columns for the output; they will not necessarily be next to each other.
Same as input — the input columns are effectively overwritten by the output from the action.
Add to action list — adds the chosen action to the list in the right-hand pane of the window; the actions are not yet executed, merely selected.
Execute — executes all the actions selected in the right-hand pane; a * will appear in the 'action executed' column at the top of the right-hand pane.
Remove all — removes all actions not yet executed.
Remove — removes any highlighted action from the Action list, but only if it has not been executed.
Undo — undoes all the executed actions, that is those on the action list with a *.
Viewing the revised data (use Names or Data manipulation, then View or Edit Data, or click on the Data tab if the window is still open), it is clear that the data have been sorted so that all the houses in District 1 are adjacent to each other in the worksheet, then the houses in District 2, and so on. This is a fundamental requirement that has to be met before modelling; if it is not, the results obtained are very unlikely to be correct. When you are satisfied that the data are as required and that nothing has been overwritten, you can save the worksheet in the usual way. Note that closing the window means that the action, in this case sorting, can no longer be undone.
Recoding variables
A useful procedure is to be able to recode a variable. Here we will recode the numerical Size variable into two groups: houses with five or fewer rooms and houses with six or more.11 The sequence is:
Data Manipulation on Main Menu
Click on Recode
Choose by Range (as it is a numerical variable)
Values in range 1 to 5 to New Value 0
Highlight input column Size
Free column [highlights c6]
MUST Add to Action List
Values in range 6 to 10 to New Value 1
Add to Action List [as Size & C6 already highlighted]
Execute

11 If the letter 'm' is entered in the 'values in range' box this is expanded to 'missing' and allows you to operate on the current missing value code as a condition.
We can now use the Names window to rename column 6 to 'largep' (with the Name function), change it from numeric to categorical (with the Toggle Categorical function, setting it to True), and give more descriptive textual labels to the categories than the defaults (with the Categories function). As you can see, we have used the label 'Small' for houses coded 0 and 'Large' for those coded 1. Here are the data after recoding, renaming and re-labelling:
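The re-labelling can also be sketched in the command interface using the CATN command that appears in the example macro of Chapter 1 (mode 1 assigns names; the \ marks the string as text rather than a column name); this assumes the recoded variable is still in c6:

NAME C6 'largep'
CATN 1 'largep' 0 '\Small' 1 '\Large'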
Calculations and transformations
The next type of data manipulation that we will consider is calculations involving columns of data. A straightforward calculation is to calculate the price per room for every house. The sequence is:
Data Manipulation on Main Menu
Calculate
Highlight c7 in the left-hand pane and click the across arrow
Click = or type it on the keyboard
Highlight Price and click the across arrow
Click or type / [that is, divide]
Highlight Size and click the across arrow
Click Calculate

The price of each house has been divided by its number of rooms and the result stored in a new column 7. The Calculate facility can also be used to perform transformations. For example, we may wish to model the logarithm (base 10) of price. The sequence is now:

Clear the previous sequence by deleting it
Highlight c8 in the left-hand pane and click the across arrow
Click = or type it on the keyboard
Highlight LOGTen in the list of functions and click the up arrow
Complete the equation so that it reads C8 = LOGTen('Price')

After naming the new columns Cost and LogPrice, view the data to check the results.
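For reference, the same two operations can be sketched in the command interface using CALC and the LOGTen command introduced in Chapter 1; the single-column form of LOGT is an assumption based on the generic syntax given there:

CALC c7 = 'Price'/'Size'
NAME c7 'Cost'
LOGT 'Price' c8
NAME c8 'LogPrice'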
Again save your updated worksheet. The arithmetic expression may be entered in three different ways:
- using the keyboard to enter numbers, column names (which must be enclosed in quotes ''), column numbers, and the arithmetic operators +, -, * (multiplication), / (division), ^ (exponentiation, that is raised to a power), or - (unary minus, which has the effect of multiplying the following term by -1);
- using the visual keypad in the lower right-hand side of the window to enter numbers or arithmetic operators;
- using the drop-down menu box listing arithmetic and matrix transformations, and the upwards-pointing arrow to place one at the cursor position; columns which are to be operated upon must be enclosed within round brackets.
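As an illustration of bracketing a compound expression, Price could be standardized using its mean and standard deviation for these 20 houses (82.650 and 27.993, as the AVERage output below shows); the choice of c9 is arbitrary:

CALC c9 = ('Price' - 82.65)/27.993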
The 'missing value' button allows the missing value code to be included in operations. The calculator follows standard mathematical practice, but it is a good idea to separate the elements of a complicated expression with round brackets. The calculate window also allows logical expressions to be evaluated, returning the value 1 if true, 0 if false. The following relational operators are allowed: < >= ==
is greater than is greater than or equal to is equal to
|
or
as are the following Logical operators & !
and not
As an example, we could have recoded the size variable into 0 for five rooms or fewer and 1 for more than five in a single command: C10 = ('Size' >= 6). To produce a code in C10 that identifies properties with more than 5 rooms and costing over £70 thousand, we could use the following: C10 = ('Size' >= 6 & 'Price' > 70). Calculate can also operate on matrices.
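The same calculations can be typed directly in the Command interface rather than built up in the Calculate window. A minimal sketch, using the calc command that appears later in these notes and assuming the LOGTen function can be typed exactly as it appears in the function list:

->calc c7 = 'Price'/'Size'        [price per room, as above]
->calc c8 = LOGTen('Price')       [log (base 10) of price]
->calc c10 = ('Size' >= 6)        [the recode of the previous section]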
Data Description

The next stage is to undertake some statistical description. We begin with averages and correlations, which can be found under Basic Statistics on the Main Menu. For example, we can describe the two numeric variables Price and Size.
Results go to an Output window, which pops up automatically when you click Calculate; this also shows the commands used behind the scenes to derive the results.

AVERage 2 "price" "size"

            price     size
 N          20        20
 Missing    0         0
 Mean       82.650    5.8500
 s.d.       27.993    1.8994
The output in the Output window can be cut and pasted into a word-processor. It is also possible to calculate a weighted average. For example, if the relatively rare detached properties had been over-sampled by a factor of 2 to get a more reliable estimate, we could attach weights of 0.5 for detached properties and 1.0 otherwise. [Bug: this does not appear to work currently.] To select more than one column hold down the control key while highlighting. Correlations between variables are derived in a similar fashion. Note that the means and standard deviations will also be produced automatically.

CORRelate 3 "price" "size" "logprice"
20 observations

 Means
            price     size      logprice
            82.650    5.8500    1.8951

 S.D.'s
            price     size      logprice
            27.993    1.8994    0.14111

 Correlations
            price     size      logprice
 price      1.0000
 size       0.7324    1.0000
 logprice   0.9826    0.7587    1.0000
We next proceed to deal with categorical data. The simplest procedure is to produce a table or a tally for a single variable using Basic Statistics and Tabulate.
Here we have chosen a single variable, Type, for the columns of the table; we have also checked Counts, but not Means. All the other column names in the table (the two instances of house) are not highlighted. We have also asked for the percentage of row totals. The results from the Output window are:

TABUlate 1 'Type'

       Det    Terr   Semi   TOTALS
 N     2      9      9      20
 %     10.0   45.0   45.0   100.0
There are three types of house in our sample of 20; 10% of houses are detached. The next graphic shows the choices made to cross-tabulate Type on the columns with Largep on the rows (row box checked), with Percentage of the Grand Total checked.
Columns are levels of Type
Rows are levels of Largep

                   Det    Terr   Semi   TOTALS
 Small   N         0      5      3      8
         TOTAL %   0.0    25.0   15.0   40.0
 Large   N         2      4      6      12
         TOTAL %   10.0   20.0   30.0   60.0
 TOTALS  N         2      9      9      20
         TOTAL %   10.0   45.0   45.0   100.0
The revised output window shows that 5 of the 9 terraces are small properties, whereas both detached properties are large. The final example shows the results when we cross-tabulate type and property size again, but this time we request descriptive statistics (via the Means option) for the continuous variable Price. The distinctive feature here is that the Variate column contains the continuous variable. The results show that mean price changes with both property size and type: large detached houses have the highest mean price of 125k, while small terrace properties average only around 54k.

Variable tabulated is Price
Columns are levels of Type
Rows are levels of Largep

                 Det    Terr   Semi   TOTALS
 0       N       0      5      3      8
         MEANS   *      54.4   72.0   61.0
         SD'S    *      6.43   12.3   8.82
 1       N       2      4      6      12
         MEANS   125.   88.5   93.5   97.1
         SD'S    45.3   19.7   22.1   25.1
 TOTALS  N       2      9      9      20
         MEANS   125.   69.6   86.3   82.7
         SD'S    45.3   13.8   19.8   20.2
As can be seen from the windows there are several features we have not used:
- there is an option to filter on other variables, so that results could be derived for houses in district 1 only;
- you may also, alternatively or additionally, request percentages of the row, column or grand total, and a chi-squared statistic for testing row and column independence; the latter also outputs the contribution of each cell of the table to the overall chi-squared;
- the cells of a tabulation can also be stored in a column by checking the Store box.
Graphical display

This final section of this chapter uses the graphical procedures to produce some histograms and scatterplots. There is a lot of fine tuning available in MLwiN; here we limit discussion to the basics. There are three levels of structure in the graphs:
- there are ten graph display sets, D1-D10, which are accessed using the top-left drop-down menu; a display is what can be shown on a computer screen at one time;
- every one of these 10 displays can be subdivided into up to 25 sub-graphs using the Layout button, each of which is displayed in its own panel with its own X and Y axes;
- on each of the 25 sub-graphs we can have up to 49 datasets plotted (indicated by ds#), each one consisting of a set of X and Y coordinates held in worksheet columns; datasets can be drawn on the same or on different sub-graphs.
This sounds horribly complicated but an example should help. A single graphic display (D1 say) could consist of 3 subgraphs: one for a histogram of Price; one for a scatterplot of Price against Size; and one for Price against Size together with Cost against Size on the same subgraph. The first two subgraphs would consist of a single dataset (Price for the histogram; Price and Size for the simple scatterplot), but the third graph would have two datasets (one for Price and Size; the other for Cost and Size). Let us start by sticking with graph display D1 and drawing a histogram for the Price data.
Graphs on main menu
Customised graphs
In the Customised graphics window, highlight Price for the y axis and choose Histogram for the plot type. The completed window should then look like this. To draw the plot, click on the Apply button, which will open a new window in which the histogram is displayed. If you click inside the Graph Display window, a Graph Options window will pop up; complete the Titles tab with a suitable title, and use the Scale tab to choose 'nice' values for the vertical and horizontal axes. Clicking on the Apply button brings up the graph with the required titles and scales. Excluding the border gives the following graph, ready for cutting and pasting.
Our next aim is to plot a scatterplot of Price against Size for all houses. Click on Del Data set in the Customised graph window to remove the histogram. Choose Point in the Plot Type box on the Plot what? tab, and choose Price for the Y axis and Size for the X axis as shown below.
Clicking on the Apply button will produce the scatterplot, but to get the graph below you will have to alter the Titles and choose Autoscale for both axes.
Click inside the graphical display as close as possible to the most expensive property, and choose Identify point on the Graph Display tab: the point is an 8-roomed house costing £157k, in the 16th row of the worksheet. Having got some feel for plotting, let us look at the Customised Graph Display window in a little more detail. On the far left there is a table grid which lists the datasets included in the current graph; there is a further column (accessed by scrolling to the right of this grid) which displays the order in which the graphs are to be plotted. This can be changed from the Other tab.
Clicking on a row in the table grid will result in the right-hand side of the window displaying, for modification, the details of the particular dataset and associated graph. After a dataset has been modified or specified you must click the Apply button for these changes to take effect. The modifications to a graph are based on a set of tabs. The Plot what? tab allows you, via drop-down menus, to specify the following:
Y and X variables: both for a scatter plot; Y for a histogram;
Grouping variable: a group is a higher-level identifier such as district; we will see later how to produce a varying relations plot with one line per district using this command;
Filtering variable: as the name suggests, this is used to select only those points for plotting where the value in an associated filter column is equal to 1;
Plot type: we have already seen histogram and point; other options are Line, which draws lines between adjacent points in the order they appear in the column; Line + Point, which draws the points and joins adjacent points in the order they appear in the column with straight lines; and Bar, which is a bar-chart;
Row and column codes: these allow you to plot what are called trellis graphics, whereby a separate graph is plotted for each category of a categorical variable; for example, if we choose Type for col codes
We get three graphs; a scatterplot of Price and Size for each type of property. You will have to re-size the graphs to get all the labels on each graph.
Row codes result in the graphs being placed one above the other; column codes result in the graphs being placed alongside each other. These category settings can be used in combination with each other to see the effect of more than one categorical variable. The Plot style tab allows you to specify the following:
Symbol type and size: there are 14 different symbol types available, with type 1 (triangle) as the default; any integer symbol size can be specified, with size 25 (points) as the default;
Line type and thickness: there are 6 line types (default is 1 for a continuous line) and 5 line thicknesses (default is 1 for the thinnest); if you are plotting a Bar, line type controls the type of shading used to colour the bars;
Colour: there are 16 colours available for symbols and lines, the default being blue; option 16 allows the colours to be rotated to show different groups.
The Position tab allows you to position the current graph within a 5 x 5 matrix of subgraphs; if each subgraph is placed in the same position the sub-graphs will be superimposed on each other. The Error bars tab allows the plotting of confidence limits. The Other tab allows you to produce a key to the plot with labels for subsets of points or lines in the graph; to construct labels you have a choice of using either the variable used to group the data or any other variable (click on the check box and enter the name of the variable). This tab also allows you to change the plot order for the currently selected graph so that particular symbols or lines can appear superimposed on others. Its final feature concerns highlights; this is particularly useful when there are several related subgraphs on a display: you may highlight an outlier in one graph with a distinctive colour, and this point will then show up as a highlight on the other graphs. The Labels button deals with how the labels and legends are placed on each graph. The final item on the display is the Autosort check box, with a tick (the default) signifying that it is on; this ensures that the data are plotted in ascending order of the horizontal variable, which is important if a grouping variable is specified. Playing around with the options we can produce the following: two graphs on the same display (via Layout), with Log of Price and Price plotted against number of rooms; we used Plot style to change the nature and size of the symbols.
Two scatter plots on the same graph; one showing price by number of rooms; the other showing cost per room by number of rooms.
Starting with version 2.26, MLwiN has an updated facility for drawing graphs which utilizes the Gigasoft ProEssentials scientific graphics software, Version 6 (http://www.gigasoft.com/). While this change is largely transparent to the user, it has one big advantage in the form of a higher-quality, editable graphics image that can be exported to word-processor software. This is accessed by right-clicking in a Graph display. You can change the font size or the numerical precision of the values plotted on the axes; here we want to choose the export dialog, which brings up another window.
The default is to send the image to the clipboard with a size of 152 by 101 millimetres in the form of an EMF. The latter stands for Enhanced Meta File: the image is exported not as a pixelated bitmap but as a vector. The advantage of this is that vector graphics use geometric primitives to constitute an image, which makes for a much clearer picture when enlarging, as these files are not resolution dependent; re-scaling will not result in the loss of any quality. Pasting the image from the clipboard into Word, right-clicking on the resultant image and choosing to Edit the picture allows it to become a Microsoft drawing object. The graph then becomes editable, so that you can change the headings in terms of font type and size. The downside arises when you are making multiple graphs on the same screen (that is, using differing positions for graphs in one overall graphic): you can only export one graph at a time, not the overall graphic. You can either use the export dialog to export each image one at a time and then paste them into a suitably sized table, or you can continue to use the bitmap procedure: a simple copy of the screen is facilitated by Edit on the main menu and Copy screen shot, which will put the image on the clipboard.

The command window and the output window

The command interface is where old-style commands can be given directly. In the command screen, if the user box is unchecked then, in addition to user-specified commands, all commands issued by the MLwiN front end (that is, system-generated commands) will also be displayed. Clicking on Output will open a window where all the commands and associated output and results will be displayed. This window will contain all user-generated commands and associated output since the start of the MLwiN session. A separate check box at the bottom of the output window specifies whether front-end-generated commands and associated output will be displayed; these are displayed only from the checking of the box onwards, not from the start of the session. This is useful, for example, if the user wishes to monitor error diagnostics: if an iteration is proceeding very slowly, stop after the iteration, open up the command interface output window, check this box and restart the iteration, leaving the window open; you will then see any error diagnostics being displayed.

Creating a log of activities

It is often useful to keep a log of activities and to print out parameter estimates and values for variables. A 'log' can be turned on in the following manner:
Data manipulation on main menu
Command window
Logo c:\templog.txt       turns the log on
Print 'price' 'size'      prints out the values of variables
Fixe                      lists fixed parameter estimates
Rand                      lists random parameter estimates
Logo                      turns the log off
The file c:\templog.txt can now be read into a favoured word-processor; we recommend a non-proportional font (for example Courier) and a small font size (such as 8 or 9 point) so as to preserve column alignment.
Copying text

When only small amounts of output need to be transferred to a word-processor, you can select text and copy it to the clipboard for input to other software, from where it may be printed etc. Use the Copy item in the Edit menu. To select one individual line in these screens, simply click on it. To select multiple individual lines, hold down the keyboard control key and click on each line you wish to select. To select a range of lines, click on the first one, hold down the shift key and click on the last one of the range.
3. Fitting a two-level model

Introduction

This chapter aims to provide a straightforward example of fitting a two-level model with a continuous response and a continuous predictor. It is concerned with the practicalities of model specification and estimation. In essence it is a very short guide to the GUI of MLwiN. As always with this program there are several ways of doing the same thing and we will try to guide you through a convenient route. We will consider the following models:
1 a random intercepts null model with Price as the response, no predictor variables (apart from the Constant) and with the levels defined as houses in districts: the so-called empty or null RI model;
2 a model which additionally includes the Size of house;
3 a model in which the parameter associated with Size is allowed to vary over District, that is random slopes as well as intercepts;
4 a model in which a particular district is treated as an outlier.
For any multilevel model, there is a basic sequence of procedures which we will follow:
- data input, sorting and creating the constant term;
- model specification: response, predictors, levels, terms for the fixed and random part;
- estimation: the fitting of the specified model by a chosen procedure;
- examining the estimates and values such as standard errors;
- estimating the residuals at each level for diagnosis of model ills and sometimes to make substantive interpretations;
- graphing the results, both to look at estimated residuals and predictions from the estimated model;
- model re-specification, and the cycle begins over again.
We presume that you have worked through Chapter 2 but give in abbreviated form the commands for reading the data and manipulating the input into its required form for modelling.
Data input and manipulation (Version 2.1 reads SPSS, Minitab and Stata files)

Here is a recommended sequence to read an ASCII file:
Data input
File on Main Menu
ASCII text file input
Columns: c1-c5
File: c:\kjtemp\house.dat (change to all files to see this one)
OK
Name columns
The Names window will open automatically; highlight each column in turn and click on Edit names to give the following names:
C1: House    enter
C2: District enter
C3: Price    enter
C4: Size     enter
C5: Type     enter

Naming categories
Highlight Type and Toggle Categorical, which will change the categorical heading from False to True. Keeping Type highlighted, click on Categories, which will bring up the Set categories dialog box; highlight each name in turn, click Edit and give the categories as shown:
1: Terr
2: Semi
3: Det
OK to complete

The completed Names window should be as follows.
Save the worksheet
File on Main Menu
Save worksheet as c:\kjtemp\house2.wsz
Remember to write down the complete filename you have used. Saving the worksheet will save the data, the names, the categories, the equations that form the model specification, the current estimates and the commands to re-draw any graphs.
Sorting the data: houses within districts
The program requires that hierarchical data are sorted so that all lower-level units are grouped by higher-level units; this is achieved by sorting. It is very important that all other relevant data are 'carried' in this sort; otherwise the data will get out of order and incorrect results will arise.
Data Manipulation on Main Menu
Sort
Increase number of keys to 2
Choose District as the highest key [slowest changing]
Choose House as the lowest key [fastest changing]
Highlight House to Type
Same as Input
Add to Action List
Execute
Check data and save sorted worksheet
In the Names window (you can use the tabs at the bottom of the main MLwiN window to navigate between currently opened windows), highlight the column names House to Type inclusive and click on View to bring up the data extract.
If it looks correct, save the revised data (it is good practice to do this as you go along):
File on Main Menu
Save [as house2.wsz]
Yes to overwrite
There is a final variable we have to create before beginning modelling: the constant, that is a set of 1's. There are many ways of doing this but you must ensure that there is a 1 for each and every house. The simplest way to achieve this is:
Data Manipulation on the Main Menu
Generate Vector
Constant Vector
Output Column: 6
Number of copies: 1126
Value: 1
Generate
Close window
The Generate vector window, just before Generate is clicked, should look like:
Edit the name c6 so that it is called ‘cons’. After saving the revised worksheet, you are ready for modelling; close the View data windows.
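The constant can also be created from the Command interface. This is a sketch assuming the MLn-style PUT and NAME commands behave as in the command manual (PUT n x C places n copies of the value x in column C); check the command Help if your version differs:

->put 1126 1 c6     [a column of 1126 ones]
->name c6 'cons'    [name the column]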
Model 1: two-level null random intercepts

Specifying the model
Go to Model on the main menu. Clicking on Equations will bring up the following screen, which is the heart of the program. Here models are specified and estimates displayed. (It is also possible to specify models in the command window and to see the equations displayed there.) Ignoring the bottom tool bar for the moment, there are two equations:
y is the response; N indicates a Normal distribution with a fixed part Xβ and a random part Ω; β0 is the first fixed-part estimate to be specified, and x0 is the first predictor variable to be specified. Red (or probably a paler grey in these notes!) is important as it indicates that the variable and the parameter associated with it have not yet been specified.
To specify the response, click on either of the y's and complete the pop-up menu as follows:
y: price [replaces none]
N levels: ij [that is, 2 levels]
Level 2 (j): District [j is the higher-level unit]
Level 1 (i): House [i is the lower-level unit]
Done
To specify the predictor to be a constant in the null random intercepts model, click on either β0 or x0 and complete the pop-up menu as follows:
x: cons [replaces none]
Tick fixed part [includes β0]
Tick j district [allows the β0 parameter to vary at level 2]
Tick i house [allows the β0 parameter to vary at level 1]
Done
This completes the specification and the revised screen shows the variables and parameters have changed from red to black indicating that specification is complete.
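The parenthetical remark above about the command window can be made concrete. A sketch of the same Model 1 specification as typed commands, assuming the MLn-style model commands documented in the MLwiN command manual (check the command interface Help if the syntax differs in your version):

->resp 'price'                  [declare the response]
->iden 1 'house' 2 'district'   [declare the unit identifiers at each level]
->expl 1 'cons'                 [put cons in the fixed part]
->setv 2 'cons'                 [let its coefficient vary over districts]
->setv 1 'cons'                 [let it vary over houses]
->star                          [start IGLS estimation]
->fixe                          [list the fixed estimates]
->rand                          [list the random estimates]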
Pressing the + button on the bottom toolbar increases the detail; pressing + again will bring even more detail. You should now see the full algebraic specification of the model. Pressing - reduces the detail; clicking on the Zoom button allows the font size to be varied. You can copy this specification and paste it as a graphic into a word-processor.
Before proceeding to estimation it is always a good idea to check the hierarchy with the following sequence:
Model on Main Menu
Hierarchy viewer
This gives a summary of the number of houses in each and every higher-level district. Close the windows when you have examined the structure and checked that it is as given here. Any problems are
likely to be a result of incorrect sorting. Here there are 50 districts numbered from 1 to 50, and there is a maximum of 25 houses in a single district.

Estimating the model

Before estimation begins, click on Estimates in the lower tool bar twice. The blue values are to be ignored as they are not the converged values. To start estimation click the START button at the top of the screen; watch the bottom of the screen as the fixed and random parameters are estimated district by district and the 'gauge' tanks are filled as the iteration counter increases. As the parameters converge on stable values, the coefficients in the Equations window will turn green. The letters IGLS next to STOP inform you that the default estimation procedure is being used: iterative generalized least squares. When all the estimates are green, the overall model has converged, and these are the estimates you want. (Unlike single-level models estimated by ordinary least squares, the multilevel model does not have a simple analytical exact solution; rather, the IGLS algorithm performs an iterative sequence of fixed-random-fixed estimation until a stable solution is reached.) For model 1, the following estimates are derived:
The terms in the Equations window represent parameter estimates with their estimated standard errors in brackets. We will discuss the log-likelihood later; 1126 out of 1126 cases in use means that there are no missing values in our data.
Some questions: 3.1
What does 80.98 represent?
And 170.3; is it significantly different from zero?
And 629.7?
Does it appear that house prices vary between districts?
Answers at the end of the chapter.
We can usefully get some feel for the size of the district effects by using the assumption that these effects come from a Normal distribution with a mean of 80.98 and a variance of 170.314 to derive 95% coverage bounds.
Data Manipulation on main menu
Command interface
and then type the following commands to get the upper and lower coverage bounds:
->calc b1 = 80.98 + 1.96*(170.314^0.5)
  106.56
->calc b1 = 80.98 - 1.96*(170.314^0.5)
  55.401
[^0.5 means raised to the power 0.5, that is take the square root]
On average across all of London the house price is some £81k; in the dearest 2.5% of districts it is £107k, and in the cheapest 2.5% it is £55k. Using coverage bounds in this way gives us a very good feel for what the model is telling us in the original metric of the data (£k's).
Estimating residuals

The next stage is to examine the residuals, which are the 'neighbourhood effects': the latent variable at level 2. One useful procedure is to estimate the level-2 residuals and their ranks, and produce a 'caterpillar' plot to see which are significantly different from the overall average. The sequence is:
Model on Main Menu
Residuals
Change 1.0 to 1.96 SD (comparative) of residual
Level 2: district [replaces 1 house]
Click Set Columns
Calculate
The completed screen should look like:
giving the columns where the requested values are to be stored; e.g. the residuals are in c300 and their ranks in c305. To view the values you can either use the View data window or use the command interface to print them out. Return to the Residuals window and select the Plots tab; on the single pane at the top of the screen, select the 'residual +/- 1.96 SD x rank' button and then Apply. (Notice that D10 is the default graph display for this plot; i.e. the commands to execute the graph will be stored in Display 10.)
This gives a caterpillar plot, which plots each residual with its 95% confidence band against rank. By clicking on the graph we can identify the cheapest and dearest districts.
Some questions: 3.2
What does the Cons variable represent?
What does the dotted line at 0 represent?
What are the triangles?
What are the whiskers around the triangles? Why are they all roughly equal here?
Click in the graph and use the Identify points tab to answer:
What is the dearest district; what does a house on average cost there?
What is the cheapest district; what does a house on average cost there?
Making predictions and drawing varying relations plots

The next task is to make predictions of house prices in each district and then to plot them in a customised graph.
Model on Main Menu
Predictions
The top screen needs to be completed by choosing items from the middle screen; the bottom buttons control the form of the results and where they are going to be stored. Below is the completed screen to derive the predicted mean prices for each district; the level-1 residuals remain 'greyed out', and the results are stored in column 7, which is currently unused. Clicking on an item toggles it in and out of the equation. Calculate needs to be pressed to make the calculations. Nothing appears to happen, but if you View the data you will see that a set of predictions has been made.
Some questions: 3.3
If you request just β0, what do you get in c7?
If you request β0, μ0j and e0ij?
If you request μ0j?
If you request β0 and μ0j? [This is the one you really want!]
Next bring up the Customised graphics window:
Graphs on Main Menu
Customised graphs
Currently the D10 graphic display is in operation, as this was used to produce the caterpillar residual plot. Change this to D1.
Choose y is c7
x is size [this is not yet in the model]
Group is district [to get a line of predictions for each district]
Plot type is line and point
Apply
The completed window is
The resultant graph after titles have been added and without the surrounding box is
Click the points to identify the two most expensive districts as districts 34 and 43.
Some questions: 3.4
The model does not include the size of the house, but the predicted values have been plotted against size here for convenience and visibility.
What do you think will happen to the effect for district 34 when we do take account of the size of houses in that district?
What do you think will happen to the effect for district 43 when we do take account of the size of houses in that district?
That completes the first model; save the worksheet, model equations, graphs and estimates to a file called model1.wsz, after giving the name Yhat1 to column 7. Close all windows except the Equations and Names windows.

Model 2: 2-level random intercepts with a predictor centred on a specific value

Specifying and estimating the model
To include the new variable in the fixed part of the model, click on Add Term on the bottom toolbar of the Equations window. In the Specify term pop-up window:
Leave order at 0 (this can be used to create 1st, 2nd order etc interactions)
Specify variable to be Size
Because it is not a categorical variable you will be asked what should be done about centring:
Choose centring around the value 5, which is the median house size; this will give an interpretable intercept
Done
The initial estimate is zero and the model has to be estimated by clicking on More in the top toolbar, estimation will progress from the current estimates; START restarts the estimation from the beginning. After some iterations the model will converge when all the estimates turn green.
Some questions: 3.5
What do the estimates represent?
75.7
10.7
94.4
359
How have the values in the random part changed from Model 1?
Calculating and graphing level-2 residuals
Model on Main Menu
Residuals
Start Output at c310 [not to overwrite existing values from Model 1]
Change 1.0 to 1.96 standard errors [to get 95% confidence intervals]
Tick all types of residuals
Level 2: district
Set Columns [to get all output columns]
Calc [to estimate]
Return to Residuals window
Plot Tab
Click residuals +/- 1.96 SD x rank [on single plots pane]
Apply [plot in D9; compare with D10]
To get a caterpillar plot of the revised level-2 district residuals
Comparing the plot with last time, there has been quite a lot of change, with one district now clearly differentiated from the rest. Use Identify points to verify that the outlying district is number 34.
Some questions: 3.6
Why is District 34 found to be so outlying (expensive) once size is taken into account?
We can compare the differential district effects before and after taking account of the size of the houses in each district. First we will obtain the District number and store it in column c299; 'District' is a long column, c299 will be short, that is of length 50 with the district number in the 50 cells.
Data manipulation on main menu
Unreplicate (this takes the first entry from a variable defined by higher-level blocks)
Blocks defined by District
Input column: highlight District
Output column: c299
Add to Action List
Execute
Now we can look at the level-2 differentials before and after taking account of house size, get the rank of the change, and print out the results. In the Command interface give the following commands:

CALCulate c298 = c300-c310
RANK c298 c297
PRINt c299 c297 c298 c300 c310

  N    C299     C297     C298       C300       C310
       50       50       50         50         50
  1    1.0000   14.000   -5.3864    -5.9741    -0.58769
  2    2.0000   5.0000   -9.2851    -10.407    -1.1222
  3    3.0000   12.000   -6.1852    -7.6054    -1.4202
  4    4.0000   7.0000   -8.7127    -9.4058    -0.69314
  5    5.0000   17.000   -4.0600     1.8152     5.8752
  6    6.0000   46.000    10.103     20.035     9.9321
  7    7.0000   48.000    19.593     22.840     3.2471
  8    8.0000   33.000    0.40610   -1.7899    -2.1960
  9    9.0000   36.000    2.5147     0.099253  -2.4155
 10    10.000   16.000   -4.2565    -4.3282    -0.071628
 11    11.000   15.000   -4.5115    -2.9927     1.5188
 12    12.000   38.000    5.8391     6.7331     0.89399
 13    13.000   29.000   -0.48079    13.039     13.519
 14    14.000   26.000   -1.3743    -12.270    -10.895
 15    15.000   4.0000   -11.952    -15.424    -3.4721
 16    16.000   32.000    0.019660  -2.2798    -2.2994
 17    17.000   8.0000   -8.5475     0.37350    8.9210
 18    18.000   11.000   -6.2162    -6.9176    -0.70141
 19    19.000   23.000   -2.2189    -0.089050   2.1299
 20    20.000   9.0000   -8.2806    -5.0770     3.2036
 21    21.000   6.0000   -9.1743    -20.997    -11.822
 22    22.000   40.000    6.2205     2.2811    -3.9394
 23    23.000   41.000    6.5983    -3.1120    -9.7103
 24    24.000   42.000    6.8156    -2.3677    -9.1834
 25    25.000   34.000    1.2376     8.2349     6.9973
 26    26.000   43.000    9.0172     3.3456    -5.6716
 27    27.000   44.000    9.2692     2.8885    -6.3808
 28    28.000   39.000    6.1717     4.9160    -1.2557
 29    29.000   19.000   -2.6985    -9.1740    -6.4755
 30    30.000   37.000    4.5773    -2.1991    -6.7764
 31    31.000   10.000   -6.7588    -10.883    -4.1243
 32    32.000   22.000   -2.5041    -11.203    -8.6985
 33    33.000   31.000   -0.15105    5.1737     5.3248
 34    34.000   13.000   -5.6012     39.555     45.157
 35    35.000   1.0000   -15.792    -19.904    -4.1124
 36    36.000   24.000   -2.0675    -5.7458    -3.6783
 37    37.000   21.000   -2.5533     2.9719     5.5252
 38    38.000   35.000    2.4555     3.8540     1.3984
 39    39.000   30.000   -0.15651    7.3810     7.5375
 40    40.000   28.000   -0.72402    4.4584     5.1824
 41    41.000   18.000   -3.4193     3.6026     7.0219
 42    42.000   49.000    23.280     11.295    -11.985
 43    43.000   50.000    26.325     34.102     7.7774
 44    44.000   45.000    9.6230     15.694     6.0711
 45    45.000   47.000    17.533     10.654    -6.8782
 46    46.000   3.0000   -14.594    -15.438    -0.84418
 47    47.000   2.0000   -14.627    -10.734     3.8924
 48    48.000   25.000   -1.5340    -6.3371    -4.8030
 49    49.000   27.000   -1.1854    -15.858    -14.672
 50    50.000   20.000   -2.5918    -6.8305    -4.2388
Some questions: 3.7
What has happened in District 43 and why?
What has happened in District 42 and why?
What has happened in District 47 and why?
What has happened in District 49 and why?
Predictions and varying relations plots
Model on Main Menu
Predictions
Complete the window as follows, putting the revised district estimates in c9. The residuals at level 1 must remain greyed out if you want to see the plot for districts.
Graphics on Main Menu
Customized graphics
Switch to D1 [display set D1]
Click on right side to ds#2 [subgraph 2, not to overwrite ds#1]
Y: c9 [revised predictions; note that Size-5 has been stored in the worksheet at col 8]
X: size
Group: district [to plot district lines]
Position tab: choose col 1 and row 2 [original plot in col 1 row 1]
Apply
The Plot what screen should show that there are two subgraphs in display D1. The parallel lines assumption of the RI plot is clear.
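In the notation of the Equations window, the district lines just plotted are (our rendering of the prediction function):

predicted price = β0 + β1(Size-5) + μ0j

Every district shares the single fixed slope β1 and differs only through its intercept residual μ0j, which is why the lines must be parallel.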
We will come back to deal with the outlying district later.

Model 3: fully random model at level 2

Specifying and estimating a random-intercepts, random-slopes model
Return to the Equations window
Click on Size-5 [to get the X variable pop-up menu]
Tick District as well as Fixed [to allow the associated slope parameter to vary over district]
Click Done [to close the window]
Click More [to continue estimation, blue to green]
Save the revised model as model3.wsz
Some questions: 3.8
What do the estimated values 75.4, 10.97, 89.45, 18.34, 9.95 and 333.685 represent?
Do you anticipate fanning in or out? What does this mean for the district geography of house prices?
Calculating and graphing residuals
Model on Main Menu
Residuals
Start Output at c320 [not to overwrite existing values]
Change 1.0 to 1.96 standard errors [to get 95% confidence bands]
Tick all types of residuals
Level 2: district
Set Columns [to get all output columns]
Calc [to estimate]
Return to Residuals window
Plot Tab
Click residuals +/- 1.96 SD x rank [on single plots pane]
Apply [to get two plots in D10]
Two plots are produced automatically.
Click in top graph
Titles tab
Margin (top): type Residuals from Model 3
Margin (left): type Random Intercepts
Tick box to show margins
Apply
Click in bottom graph
Titles tab
Margin (left): type Random Slopes
Tick box to show margins
Apply
Use Identify points to verify that the outlying district in terms of the random intercept is number 34, and that it is also the place with the steepest slope.
Some questions: 3.9
What determines the width of the 95% CIs in the random slopes plot? When will they be at their widest?
Return to Residuals window
Plots tab
Tick Residuals on pairwise pane [to get covariance plot]
Click Apply
Click in graph
Graph title: Model 3: covariance plot
The positive covariance is very clear, as is the outlying nature of district 34.
Now we can look at the level-2 differentials for both the slopes and the intercepts. In the Command interface give the following command to print the district number and the random differential intercepts and slopes:
print c299 c320 c321
  N    C299     C320        C321
       50       50          50
  1    1.0000   -0.19398    0.26047
  2    2.0000   -2.3724    -2.0271
  3    3.0000   -2.9278    -3.5767
  4    4.0000   -1.8981    -2.0765
  5    5.0000    6.0119     1.1431
  6    6.0000    6.3271     3.0318
  7    7.0000    1.8723     0.44089
  8    8.0000   -1.8411    -1.7399
  9    9.0000   -2.1286    -0.32526
 10    10.000    0.58211    1.3920
 11    11.000    1.7127     0.28910
 12    12.000    0.33241    0.85776
 13    13.000    12.679     6.2983
 14    14.000   -10.636    -5.0009
 15    15.000   -6.4260    -3.5854
 16    16.000   -1.9112    -4.0985
 17    17.000    9.6443     1.5116
 18    18.000    0.0034718  0.75936
 19    19.000    2.6922     2.5756
 20    20.000    4.1637     1.2828
 21    21.000   -14.658    -5.1722
 22    22.000   -3.5564    -0.26637
 23    23.000   -7.8404    -2.0646
 24    24.000   -7.6445    -1.5662
 25    25.000    6.2728     4.4549
 26    26.000   -3.3423    -2.5659
 27    27.000   -4.8370    -1.4591
 28    28.000   -2.1416     1.5812
 29    29.000   -5.8893    -0.22170
 30    30.000   -6.1453    -0.31275
 31    31.000   -4.1574    -1.0244
 32    32.000   -8.4238    -2.5767
 33    33.000    5.0159    -0.49149
 34    34.000    45.592     7.4688
 35    35.000   -4.7512    -1.0532
 36    36.000   -3.3733    -0.68327
 37    37.000    5.6252     1.7440
 38    38.000    0.78230    2.1432
 39    39.000    7.1120     2.3910
 40    40.000    5.0326     3.3929
 41    41.000    7.3397     2.4399
 42    42.000   -7.5262    -2.0009
 43    43.000    5.8392     0.35901
 44    44.000    4.3925     1.2918
 45    45.000   -4.7613    -1.2811
 46    46.000   -1.2574    -0.74382
 47    47.000    2.8532    -0.97565
 48    48.000   -4.2773     0.44639
 49    49.000   -13.686    -3.0067
 50    50.000   -3.2753     2.3404
Some questions: 3.10
What does a 6 room house cost in district 34?

Predictions and varying relations plots
Model on Main Menu
Predictions
Click on Cons [to get all terms associated with Constant included]
Click on Size-5 [to get all terms associated with Size-5 included]
Click on Level-1 residuals associated with Cons to exclude them
Output to c10 [free column]
Calc
Name c10 as 'Yhat3' and save the revised worksheet. To get the varying relations graph:
Graphs on Main Menu
Customized graphics
D1 [for graph display]
ds#3 [for third subgraph on display]
y: yhat3 [predicted values for each district]
x: Size
Plot type: Line+point
Group: District [to draw a line for each district]
Position tab: choose Column 2 and Row 1
Apply
The fanning out associated with model 3 is clearly seen: there are bigger differences in price between districts for larger properties.
Model 4: Treating district 34 as an outlier in the fixed part of the model

We now want to deal with district 34 as the marked outlier. We want to do this because it breaks the assumption that the district residuals (the joint distribution of the random intercepts and slopes) follow a multivariate Normal distribution. We do so by including separate terms (both intercept and slope) for district 34 in the fixed part of the model; it will automatically be removed from the level-2 random part.

Specifying and estimating the model
Click on the line for District 34 in the top right-hand graph of the varying relations plot
Identify point: in Multilevel Filtering, highlight Level 2 district, idcode = 34
In model pane highlight Absorb into dummy
Apply
In the 'Absorb outliers into dummy variables' pop-up menu:
Tick interaction with Cons [to get a dummy with 1 for District 34]
Tick interaction with Size-5 [to get an interaction between the dummy and Size-5]
Done
This will create two new variables and include them in the model.
Return to the Equations window
More iterations
To get the estimated model as
Some questions: 3.11
What do the estimates represent and how have they changed since model 3?
74.3
54.37
10.88
2.96 (significant?)
25.21
9.588
13.494
333.63
Some questions: 3.12
What does a 5 room house cost in district 34 when included in the random part?
What does a 5 room house cost in district 34 when included in the fixed part?
Why is there this difference?
You should now be able to plot the residuals from this model and draw the varying relations plot.
There are now only 49 residual estimates, as District 34 has been 'dummied out'. There are also now no distinct outliers in the level-2 residuals.
The distinctive nature of district 34 is seen, as is the fanning out, so that the biggest differences in Price between districts are for larger properties. The properties with the least geographical differences would appear to be four-roomed ones.
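A little algebra shows why the four-roomed properties are the least variable between districts. With random intercepts and slopes, the between-district variance is a quadratic function of size:

var(μ0j + μ1j(Size-5)) = σ²μ0 + 2σμ01(Size-5) + σ²μ1(Size-5)²

which is at its minimum where (Size-5) = -σμ01/σ²μ1. Using the Model 4 estimates quoted in the answer to question 3.11, that is -13.497/9.948, or roughly -1.36: about 3.6 rooms, close to the four-roomed properties identified above.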
Some Answers

Some questions: 3.1
What does 80.98 represent?
80.98 is the mean house price across all the districts and all the houses.
And 170.3; is it significantly different from zero? Does it appear that house prices vary between districts?
170.3 is the between-district variance; as it is more than twice its standard error, we can informally say that there is 'significant' between-district variance. We need a multilevel model to model these data adequately, as house prices vary between districts.
And 629.7?
629.7 is the within-district, between-house variation.
Some questions: 3.2
What does the Cons variable represent?
The Cons represents the level-2 district residual, that is the term associated with the variable Cons at that level.
What does the dotted line at 0 represent?
The dotted line is really 80.98, the mean house price across London.
What are the triangles?
The triangles are the point estimates of the district effects.
What are the whiskers around the triangles?
The whiskers are the 95% confidence intervals.
Why are they all roughly equal here?
They are all roughly equal here because there are about 25 houses in each district.
Click in the graph and use the Identify points tab to answer: what are the dearest and cheapest districts, and what does a house on average cost there?
The dearest district is district 34, where houses cost some 39k more than generally across the city; the cheapest district is 21, where houses cost some 21k less than the all-London average.
Some questions: 3.3
If you request just β0, what do you get in c7?
The grand-mean intercept estimate for all the values.
If you request β0, μ0j and e0ij?
The raw data for the response, Price.
If you request μ0j?
The district differential.
If you request β0 and μ0j? [It's this one you want!]
The district mean price.
Some questions: 3.4
The model does not include the size of the house, but the plotted values have been predicted here for convenience.
What do you think will happen to the effect for district 34 when we do take account of the size of houses in that district?
As the houses are relatively small in this district, the size of the district effect should increase.
What do you think will happen to the effect for district 43 when we do take account of the size of houses in that district?
As the houses are relatively large in this district, the size of the district effect should decrease.
Some questions: 3.5
What do the estimates represent?
75.667 is the grand mean house price for a 5-roomed house across all districts.
10.692 is the grand mean slope, the cost of an additional room across all districts.
94.436 is the between-district variance, which although still significant has been substantially reduced; that must mean that the size of houses varies between areas.
359.093 is the within-district, between-house variation; this has also decreased: there is a lot less unexplained variation between houses when account is taken of their size.
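As with Model 1, these Model 2 estimates can be turned into 95% coverage bounds, now for the price of a 5-roomed house across districts (our arithmetic, using the same command-interface recipe as before):

->calc b1 = 75.667 + 1.96*(94.436^0.5)
  94.714
->calc b1 = 75.667 - 1.96*(94.436^0.5)
  56.620

So for a 5-roomed house the dearest 2.5% of districts average around £95k and the cheapest around £57k.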
Some questions: 3.6
Why is District 34 found to be so outlying (expensive) once size is taken into account?
District 34 has relatively small houses; when this is taken into account, the district effect is much bigger.

Some questions: 3.7
What has happened in District 43 and why?
District differential without size +34k; with size +8k; therefore large houses.
What has happened in District 42 and why?
District differential without size +11k; with size about -12k; therefore large houses.
What has happened in District 47 and why?
District differential without size -11k; with size +4k; therefore small houses.
What has happened in District 49 and why?
Little change; therefore not distinctive in terms of size of house.

Some questions: 3.8
What do the estimated values 75.4, 10.97, 89.45, 18.34, 9.95 and 333.685 represent?
75.4 is the grand mean intercept, the average cost of a 5 room house across all 50 districts.
10.97 is the grand mean slope, the cost of an extra room 'averaged' across all 50 districts.
There are now three terms at level 2 representing the variance-covariance for districts:
89.454: there is significant between-district variance for a 5-roomed house; the cost of a 5-room house varies from place to place;
9.948: the variance for the slopes is also significant; while generally the cost of an extra room is 10.976, this varies from place to place;
18.341: the covariance between the random intercepts and slopes is positive and significant; this means that districts which are expensive for a 5 room house will also have a steeper marginal relationship between price and size.
333.685 is the within-district, between-house variation for all types of houses: the homoscedasticity assumption.
Do you anticipate fanning in or out? What does this mean for the district geography of house prices?
Fanning out; there are greater between-neighbourhood differences for large properties.

Some questions: 3.9
What determines the width of the 95% CIs in the random slopes plot? When will they be at their widest?
They will be wide when they are imprecisely estimated; i.e. when there are few houses sampled within a district and when the size of houses within a district does not vary very much (e.g. all 5-roomed houses).

Some questions: 3.10
What does a 6 room house cost in district 34?
£75k for a 5 room house generally, plus the £45.6k premium for a 5-roomed house in district 34, plus £11k for an extra room generally, plus an extra £7.5k for the extra room in district 34; that is, 75.4 + 45.6 + 11.0 + 7.5, or roughly £139.5k.

Some questions: 3.11
What do the estimates represent and how have they changed since model 3?
74.3 is the grand mean house price for a 5-roomed house across all districts, except in district 34, where a 5 room house is 54.37 dearer (this difference is highly significant);
10.88 is the grand mean slope, the cost of an additional room across all districts, except in district 34, where an additional room is an additional 2.96 dearer (this difference is not significantly different from zero);
25.25 is the between-district variance for a 5-roomed house, which although still significant has been substantially reduced now that district 34 is not treated as part of the London distribution;
9.948: the variance for the slopes is also significant; this has not changed a great deal, as the residual plot showed that district 34, while having the steepest slope, was not an outlying value (i.e. outlying in differential intercept but not differential slope);
13.497: the covariance between the random intercepts and slopes is positive and significantly different from zero; this means that districts which are expensive for a 5 room house will also have a steeper relationship between price and size;
333.63 is the within-district, between-house variation; this has hardly changed.

Some questions: 3.12
What does a 5 room house cost in district 34 when included in the random part? What does a 5 room house cost in district 34 when included in the fixed part?
With the random-part specification: £75k for a 5 room house generally plus a £45.6k premium for district 34. With the fixed-part specification: £74k for a 5 room house generally plus a £54.37k premium for district 34.
Why is there this difference?
Shrinkage. With a random-part specification the estimate for District 34 is shrunk back towards the grand mean across all districts; with the fixed-part specification it is not assumed to be part of the London distribution and is estimated as an island unto itself, taking on its own value unsupported by what is found generally across all of London.
4. Confidence intervals, residuals and diagnostics

Introduction

This chapter does not fit any further models but tries to give you a deeper perspective on the models we have fitted. In particular it concentrates on the capabilities of two particular MLwiN windows, Predictions and Residuals, whilst using the customised graphics to show the results. We will consider the following procedures:
- 95 percent confidence intervals for the fixed part of the model;
- use of 1.4 SD comparative residuals to compare district differences;
- a catch-all plot for detecting outliers, heterogeneity and non-linearity;
- Normal score plots for assessing whether the level-1 and level-2 residuals conform to a Gaussian distribution;
- a range of multilevel diagnostic techniques for identifying unusual or influential observations.
Plotting 95 percent confidence intervals for the fixed part of the model
File on Main Menu
Open model4.wsz
Model on Main Menu
Equations
Use the facilities on the bottom toolbar to see the detailed estimates of the model and the variables involved. It should look something like this.
We are going to plot the fixed effects and their 95 percent confidence intervals.
Model on Main Menu
Predictions
Click Names [to see what variables we are dealing with]
Click on all fixed-effect coefficients [to include in predictions]
Leave level-2 terms greyed out [to exclude from predictions]
Output from predictions to c18
Change 1.0 to 1.96 SE [to set 95% limits]
Select fixed part in box
Output of SE to c19
Calc
The completed window should look like:
Now we have to plot these values.
Graphs on Main Menu
Customized graphs
D3 [a new graphical display]
ds#1 [a new subgraph]
Complete the Plot what? tab as follows
Notice that the group is now d-district34.cons as there should be two lines: group 1 for district 34; group 0 for the rest of the city. Next we specify the error bars tab as follows:
Notice that we are plotting the error bars as lines, and that the values we calculated from the predictions window are offsets from the values that we used to produce the lines.
Then Apply to get the graph. To turn off any highlights left over from the previous models:
Click in the graph
Identify point tab
In graphs scroll-down menu: Reset all
Apply
Using the Titles tab we can get the following completed graph
Notice the comparative width of the error bands on the two lines, reflecting the much smaller number of houses in district 34. The differential length of the lines reflects that district 34 does not have large or particularly small houses. The separation of the lines suggests that District 34 is a very different place from the rest of the city. The curves in the standard errors, most clearly seen for District 34, correctly show where we have the least information, that is away from where size is at its average value. The next graph shows the 95 percent confidence intervals overlaid on the
original data. It was achieved by this sequence. Staying with ds#1:
On plot style tab: choose red for the color and 2 for line thickness
On the other tab: choose order 2 [so that it is plotted second]
Highlight ds#2
On plot what? tab: y: price; x: size; group: none; plot type: point
On position tab: choose column 1, row 1 [same position as ds#1 to overplot]
On plot style tab: reduce the symbol size to 10
On plot other tab: choose order 1 [so that it is plotted first]
Apply
Comparing pairs of districts We have so far used residual-by-rank plots to give a graphical representation of whether a particular district is significantly different at the 0.05 level from the overall mean. To do this we used 1.96 standard deviations of the comparative residuals. Goldstein and Healy (1995, Journal of the Royal Statistical Society, Series A, 158, 175-177) showed that if we want to compare any pair of higher-level units (schools in their example), then we can judge the significance, again at the 0.05 level, by whether the 1.4 * standard deviation intervals for the comparative residuals overlap. The residual screens are as follows (note the level is 2: district).
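As a sketch of the rule in Python (the residual values and comparative SDs below are invented for illustration):

# Goldstein-Healy style comparison of a pair of district residuals:
# judge a difference at roughly the 5% level by whether the
# +/- 1.4 * SD comparative intervals overlap.
def overlap(est1, sd1, est2, sd2, k=1.4):
    # True if the two k*SD intervals share any common ground
    return max(est1 - k * sd1, est2 - k * sd2) <= min(est1 + k * sd1, est2 + k * sd2)

# two hypothetical district residuals and their comparative SDs
print(overlap(12.0, 4.0, -3.0, 5.0))   # False: a significant difference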
No overlap of the intervals suggests that the two districts differ significantly at the 0.05 level.
Normal score plots Multilevel models assume that the residuals at each level come from a multivariate Gaussian distribution. A simple technique for graphically assessing the univariate element of this assumption for each residual at each level is a normal score plot. For a normal distribution the desired plot is an approximately straight line. Non-normality can be due to outliers, model mis-specification, or the intrinsic nature of the data; we will tackle this latter problem with non-linear, discrete-response multilevel models (see later). Here we show the results for level 2. In the Plots tab of the residuals window click on Standardised residual x normal score.
Both of these look acceptable; the distributional assumptions would appear to be fulfilled once District 34 is dummied out. A similar plot can be made at level 1 for houses.
This too shows approximate Normality.
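Outside MLwiN, the same check can be sketched in Python; the random draws below merely stand in for the standardized residuals you would export from the residuals window.

# Normal score (Q-Q) plot of standardized residuals, a sketch.
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
resid = rng.standard_normal(50)    # placeholder for exported level-2 residuals

stats.probplot(resid, dist="norm", plot=plt)   # near-straight line suggests Normality
plt.show()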
Catch-all plots A catch-all plot shows the standardized level-1 residuals against the fitted values based on the fixed terms. It is an exceedingly useful plot for detecting outliers, heterogeneity and non-linearity. If there were no problems the plot should be essentially structure-less, with the majority of points within plus or minus 2 to 3 standard deviations of a mean of zero. The horizontal axis is the fixed-part predictions. For Model 4 the plot is as follows.
Some questions: 4.1 Is there a problem here?
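Such a plot is easy to reproduce outside MLwiN once the fixed-part predictions (c18 above) and the standardized level-1 residuals have been exported; a sketch in which the file names are assumptions:

# Catch-all plot: standardized level-1 residuals against fixed-part predictions.
import numpy as np
import matplotlib.pyplot as plt

fitted = np.loadtxt("fitted.txt")        # assumed export of fixed-part predictions
std_resid = np.loadtxt("stdresid.txt")   # assumed export of standardized level-1 residuals

plt.scatter(fitted, std_resid, s=8)
for y in (-2, 0, 2):                     # most points should lie within about +/-2 SD of zero
    plt.axhline(y, linestyle="--", linewidth=0.8)
plt.xlabel("Fixed-part prediction")
plt.ylabel("Standardized level-1 residual")
plt.show()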
Diagnosing unusual or influential observations MLwiN has a range of multilevel diagnostic techniques for identifying unusual or influential observations. These are based on the work of Langford and Lewis.12 The plots for Model 4 at level 2 are as follows; we will focus on the diagnostics at level 2 associated with the Cons only.
12 Langford, I and Lewis, T (1998) Outliers in multilevel data, Journal of the Royal Statistical Society, A, 161: 121-160.
There are districts with a high leverage but there is no evidence of a strong influence; there are no striking outliers. There are three new types of diagnostics here:

Leverage values (unusual in x): these are the standardized potential leverage values as defined by Langford and Lewis. Leverage values are calculated using the projection, or hat, matrix of the fitted values of the model, and a high leverage value for a particular district indicates that any change in the intercept for that district will tend to move the regression surface appreciably. An approximate cut-off point for looking at unusually high leverage values is 2p/n, where p is the number of random variables at a particular level and n is the number of units at that level. Here we have 2 variables and 50 districts, so unusually high leverage values may be above 4/50, that is, 0.08. On both plots some districts exceed this value; these districts have a high potential for affecting the results.

Deletion residuals (dropping one district at a time) show in this case the deviation between the intercept for each particular district and the mean intercept for all districts, when the model is fitted to the data excluding that district. Consequently, they will show up any outlying districts very clearly. However, when the number of units at a particular level is large, these will be very similar to the standardized residuals. It is the deletion residuals that are used in the calculation of influence values.

Influence values are a combination of extreme residuals and leverage values: DFITS as defined in Langford and Lewis. The influence values combine the deletion residuals and leverage values to measure the impact of each district on the estimate for, in this case, the level-2 residual. The scatter-plot in the lower right-hand diagram shows leverage values against standardized residuals and can be clicked on to identify particular outlier districts.

The following general advice is from the MLwiN manual (p. 221): "We may be interested in observations which are outlying with respect to any of the random parameters in a model and hence the first issue is where to start data exploration in multilevel model. Rather than looking at the individual data points we have found it most useful to begin at the level of highest aggregation, which will often be simply the highest level in the model. There are two reasons for this. Researchers are often most interested in the highest level of aggregation and will naturally concentrate their initial efforts here. However, if discrepancies can be found in higher level structures, these are more likely to be indicative of serious problems than a few outlying points in lower level units. After the highest level has been analyzed, lower levels should be examined in turn, with analysis and initial treatment of outliers at the lowest level of the model. The highest level should then be re-examined after a revised model has been fitted to the data. The objective is to determine whether an outlying unit at a higher level is entirely outlying or outlying due to the effects of one or two aberrant lower level units it contains. Similarly examination of lower level units may show that one or two lower level units are aberrant within a particular higher level unit which does not appear unusual and that the higher level unit would be aberrant without these lower level units. Hence, care must be taken with the
analysis not simply to focus on interesting higher level units but to explore fully lower level units as well". As all the diagnostics are stored, it is possible to develop your own graphical displays. In the final graph we have plotted Leverage against Deletion residuals and labelled each point with its District number. We have done this for Model 3, that is without district 34 being dummied out.
The outlying nature of District 34 is very apparent. Further discussion on assessing the assumptions of multilevel models is to be found in the Multilevel Handbook.13
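The 2p/n rule of thumb is easily scripted once the leverage values are exported; the three values below are invented for illustration.

# Flagging high-leverage districts with the 2p/n rule of thumb from the text.
p, n = 2, 50                  # p random variables at level 2, n districts
cutoff = 2 * p / n            # = 0.08, as in the text

# 'leverage' stands in for values exported from MLwiN; invented here
leverage = {33: 0.05, 34: 0.15, 35: 0.07}
flagged = [district for district, h in leverage.items() if h > cutoff]
print(cutoff, flagged)        # 0.08 [34]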
Some answers Question 4.1 Is there a problem here? The easiest way to interpret a catch-all plot is mentally to take a number of vertical slices of the horizontal axis and picture the mean and variance of the standardized residuals in each slice. If you do this for the catch-all plot of Model 4, there is no change in the mean, that is there is no evidence of non-linearity. However, as we move from left to right the variance at first increases and then possibly decreases, though the latter is difficult to tell as the number of observations declines at large sizes. On balance there is strong evidence of heteroscedasticity, which will need to be addressed.
13 Snijders, T.A.B. and Berkhof, J (2008) Diagnostic checks for multilevel models. In de Leeuw, J and Meijer, E (eds), Handbook of Multilevel Analysis, Springer, Chapter 3, 141-175. http://stat.gamma.rug.nl/handbook_ml_ch3.pdf
5. Modelling variance functions: continuous variables

Introduction
This chapter is primarily concerned with demonstrating that multilevel modelling is fundamentally about estimating a variance function at each level. We shall consider the following procedures:
the calculation of the variance partitioning function (at first for simple models), also known as the Intra-Class Correlation;
the display of a constant, quadratic and linear variance function at level 2;
the display of a constant, quadratic and linear variance function at level 1.
This chapter will only deal with continuous predictor variables; categorical predictors will be considered later.
The Variance Partitioning Function The variance partitioning function can be characterized as the higher-level variance of the random part divided by total variance. This is equivalent to how much variation remains to be accounted for at the higher level. Retrieve the worksheet for the null model (that is model1.wsz).
CALC b1 = 170.3/(170.3 + 629.7)
PRINt b1
0.21
Question 5.1: what does this value represent?
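The same calculation as a sketch in Python, for readers checking outside MLwiN:

# Variance partitioning for the null model: the share of the total
# residual variance that lies between districts.
sigma2_u = 170.3    # between-district variance
sigma2_e = 629.7    # within-district, between-house variance
vpc = sigma2_u / (sigma2_u + sigma2_e)
print(round(vpc, 2))   # 0.21, matching the CALC result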
Include Size-5 in the model (to get Model 2) and click More to iterate to convergence.
The revised VPC function is now
CALC b1 = 94.4/(94.4 + 359.09)
PRINt b1
0.20
Question 5.2: what does this value mean?
Displaying heterogeneity in a random-intercepts model Staying with model2 we can build a set of graphs that show the nature of heterogeneity very clearly. We strongly recommend that you follow this sequence because in more complex models, this approach is very helpful in characterizing the nature of the model and the estimates. First, name a set of columns to hold the results.
These will hold
the level-2 variance function;
the level-1 variance function;
the general line or 'fit' across houses and districts;
the level-2 95% confidence bounds, that is how districts vary around the general line;
the level-1 95% confidence bounds, that is how houses within a district vary around the general line;
the variance partitioning function.
We will begin by calculating the variance function at level 2, that is the variance between districts.
Model on Main menu
Variance function
Click on name
Level 2: district
Variance output to lev2var
Click Calc
The COPY button will only copy the values in the results table, here the level-2 variance of 94.436.
At any particular level (here level 2, denoted by 'district' in the dialogue box at the bottom left) this screen displays and stores the variance contributed by that level as a function of the predictor variables for the random effects. In the present case, the level-2 model specifies only an intercept variance, that is a constant function; the level-2 variance is simply 94.436. [Notice you can also store the SE of the variance.] The middle left-hand part of the screen allows you to specify values for variables and display the associated value of the variance function. Here we have set the value of Cons to be 1 (the only possible value) and get the resultant estimate of 94.436.
Repeat the procedure to get the level-1 variance function.
We now estimate the general line using the Predictions subwindow, make predictions for the fixed part only and store the results in the column named fit. This is the general line across all of London.
Use the calculate subwindow to calculate the VPC function as:
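The Calculate window expression divides the level-2 variance function by the sum of the two variance functions; a Python sketch of the same arithmetic, using the Model 2 values:

# VPC as a function: level-2 variance share at each observation.
import numpy as np

lev2var = np.full(1126, 94.436)    # level-2 variance function (constant here)
lev1var = np.full(1126, 359.093)   # level-1 variance function (constant here)
vpc = lev2var / (lev2var + lev1var)
print(round(float(vpc[0]), 3))     # ~0.208 at every observation in this model

In later models lev2var and lev1var become functions of size, so the VPC will also vary with size.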
Do not close this window as we will want to repeat this calculation for subsequent models. We can now calculate the 95 percent confidence bounds at each level in the Command window as follows (^ means raise to a power; so ^0.5 means taking the square root, which in this case gives the standard deviation).
calc 'lev295cb' = 1.96 * ('lev2var'^0.5)
calc 'lev195cb' = 1.96 * ('lev1var'^0.5)
Using the Names sub-window to check the data, you should have
Question 5.3: Which of the new variables are constant and which are not; why is this?
To plot the results using customized graphics: this looks a lot of work but, once done in this form, it requires no change for the other models in our sequence.
D2 [to get a new main window]
ds#1 plot what? y: Lev2Var; x: size; plot type: line
  plot style: color 9 blue; line thickness 3
  position: col 1, row 1
  other: construct key labels from, tick box; type in Level 2 (do not tick group code; and do not use text with the same name as a variable)
ds#2 plot what? y: Lev1Var; x: size; plot type: line
  plot style: color 4 red; line thickness 3
  position: col 1, row 1 [so that both level-1 and level-2 are on the same graph]
  other: construct key labels from, tick box; type in Level 1
ds#3 plot what? y: fit; x: size; plot type: line
  plot style: line thickness 3
  position: col 2, row 1
  error bars: y+ errors: Lev295CB; y- errors: Lev295CB; plot as offsets; y error type: line
ds#4 plot what? y: fit; x: size; plot type: line
  plot style: line thickness 3
  position: col 1, row 2
  error bars: y+ errors: Lev195CB; y- errors: Lev195CB; plot as offsets; y error type: line
ds#5 plot what? y: VPC; x: size; plot type: line
  plot style: line thickness 3
  position: col 2, row 2
Labels: click on graph; Done; Apply; which, after using the titles tab, gives
Questions 5.4: What do the variance function graphs show? What do the level-2 'tramlines' show? What do the level-1 'tramlines' show? What does the VPC show?
Repeat the procedure for Model 3, which allows Size-5 to vary in the random part at level 2.
Repeat the process outlined above: calculate the level-2 variance, the level-1 variance, then the fit, then the 95 percent confidence bounds at each level, and the VPC function, to get the revised graphs. It is also worth pausing and looking at the level-2 variance function for some characteristic values.
In the top pane, the algebraic variance function is detailed. In the middle pane, you can see that a 5 roomed house (Size-5 = 0) has a between-district variance of 89.45, that is what we have been calling the random-intercepts variance. Smaller properties with 3 rooms have a smaller variance at 55.88; larger properties with 7 rooms have a larger variance at 202.61. Clearly, the between-district variance is a positive quadratic function of size. Here are the completed graphs.
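These characteristic values can be recovered directly from the Model 3 estimates, since the level-2 variance function is σ²u0 + 2σu01(size-5) + σ²u1(size-5)²; a minimal sketch:

# Level-2 variance as a quadratic function of (size - 5), Model 3 estimates.
s2_u0, s_u01, s2_u1 = 89.454, 18.341, 9.948

def lev2_variance(x):              # x = size - 5
    return s2_u0 + 2 * s_u01 * x + s2_u1 * x ** 2

for rooms in (3, 5, 7):
    print(rooms, round(lev2_variance(rooms - 5), 2))
# prints 55.88, 89.45 and 202.61, matching the middle-pane values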
Questions 5.5: What do the variance function graphs show? What do the level-2 tramlines show? What do the level-1 tramlines show? What does the VPC show?
Repeat the process with the dummies for district 34 in the model.
Repeat the process outlined above (leaving out Dist34 in fixed predictions)
Questions 5.6: What do the level-2 tramlines show?
Model 5: Linear function at level-2
The models so far have had a constant variance function at level 1, and a constant or quadratic variance function at level 2. We now try to fit a linear function at level 2. Click on the Size-5/Size-5 variance term to remove it.
Click on More; let it run for say 10 iterations and then Stop it; you will notice that the estimates are not converging. You can see what is happening in the Trajectories window.
Model on Main menu
Trajectories
Click on Select
Highlight the two district-level random terms
Change structured graph layout to 3 graphs per row
Done
Start the model running
You will see that there is a very pronounced saw-tooth pattern: the level-2 estimates oscillate between 0 and 25 for the constant part of the level-2 variance function, and between 0 and 13 for the linear part. The model fails to converge however long you run it; it is an inappropriate one, as the full quadratic is needed (the quadratic term is substantial and significant). Close the Trajectories window; it has high overheads as it is updated continuously while models are estimated.
Model 6: Complex heterogeneity at level-1: linear function of size
Amend the model to get the full quadratic at level 2 and the linear function at level 1.
Click on Size-5
Tick var j district and i house [this gives the full quadratic at level 2]
Click on the variance term for Size-5/Size-5 at level 1 and delete it from the model.
The model will converge, somewhat slowly, to
Question 5.7 Do we need the new term? What has happened to the other terms in the model?
It is worth looking specifically at the level-1 variance function.
In the top pane, the linear variance function is detailed. In the middle pane, you can see that a 5 roomed house (Size-5 = 0) has a within-district, between-house variance of 290.95. Smaller properties with 3 rooms have a smaller variance at 175.757; larger properties with 7 rooms have a larger variance at 406. Clearly, the between-house variance is not constant; the assumption of homoscedasticity was not a good one. Repeating the process outlined above (omitting Dist34 in the predictions) results in the following graphs.
Question 5.8: What has changed on the graphs?
Model 7: Complex heterogeneity at level-1: quadratic function of size
We come to the final model, in which there is a quadratic variance function at level 1. Click on the Size-5/Size-5 variance at level 1 to include the term in the model, and More to convergence.
Question 5.9: do we need the new term?
Repeat the process outlined above (leave out Dist34 in predictions). The level-1 variance is worth looking at
In the top pane, the quadratic variance function is detailed. In the middle pane, you can see that a 5 roomed house (Size-5 = 0) has a within-district, between-house variance of 211. Smaller properties with 3 rooms have a smaller variance at 186; larger properties with 7 rooms have a much larger variance at 450. Very clearly, the between-house variance is not constant; the initial assumption of homoscedasticity was not a good one.
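Again the middle-pane values follow from the quoted Model 7 estimates, with the level-1 variance σ²e0 + 2σe01(size-5) + σ²e1(size-5)²; a sketch (small rounding differences aside):

# Level-1 variance as a quadratic function of (size - 5), Model 7 estimates.
s2_e0, s_e01, s2_e1 = 211.796, 32.958, 26.510

def lev1_variance(x):              # x = size - 5
    return s2_e0 + 2 * s_e01 * x + s2_e1 * x ** 2

for rooms in (3, 5, 7):
    print(rooms, round(lev1_variance(rooms - 5)))
# prints 186, 212 and 450, close to the values quoted above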
The final set of graphs:
Questions 5.10: How have the results changed?
Estimation procedures The Start button begins the MLwiN estimation process. Iteration 0 uses a single-level OLS model; each subsequent iteration then involves the estimation of the Fixed and Random parts, until the model converges. Convergence means that there is no substantial change in the estimates. The degree of tolerance for detecting change is set in the Estimation control box in the main window.
Here the default value of 10^-2 means that each and every model parameter has to change by less than this relative amount to achieve convergence.

The More button instructs MLwiN to continue iterations from the current model parameter values; that is, it does not go back to iteration 0 and start estimation from scratch. When should you use More, and when should you use Start? It makes little difference if your model is rather simple and can be rapidly estimated due to a small number of observations. However, if you are adding a single term to a large and complex model, then you may wish to use More: it may take fewer iterations to convergence. You may even be experiencing convergence difficulties due to a single parameter, which you can discern by examining the estimates in the Trajectories window. You can then temporarily remove this parameter, More to convergence, add the parameter back in, and More to convergence again. The success of this tactic depends on whether the omitted parameter leads to a substantial change in the other parameters; if it does not, the approach is much more likely to work.

The Stop button instructs MLwiN to end whatever part of the estimation process it is currently engaged in. The current estimation method is displayed; here it is IGLS.
Chapter 5: some answers
Question 5.1: what does this value represent? There are two alternative but mutually compatible answers. First, 21% of the total variation lies at the higher, district level. Second, two randomly chosen houses within a district can be anticipated to have a correlation in their prices of 0.21.
Question 5.2: what does this value mean? The VPC after taking account of house size; here it has not changed a great deal.
Question 5.3: Which of the new variables are constant and which are not; why is this? All but the Fit are single values because we are dealing with a homoscedastic random-intercepts model, which has a single variance at each level; the Fit has different values because it gives the citywide average price for properties of different sizes.
Question 5.4: What do the variance function graphs show? No change with size; that is a random-intercepts model; the between-district variance is larger than the within-district, between-house variance.
What do the level-2 tramlines show? The 95% confidence bounds for the 2.5% highest and lowest districts; parallel because of the random-intercepts assumption.
What do the level-1 tramlines show? The 95% confidence bounds for the 2.5% highest and lowest costing houses within a district; parallel because of the homoscedasticity assumption.
What does the VPC show? The percentage of the total unexplained variance that is at the district level.

Questions 5.5: What do the variance function graphs show? The level-2 variance increases quadratically with size (due to the positive 'covariance'); the level-1 variance is constant with size (the homoscedasticity assumption).
What do the level-2 tramlines show? The biggest differences between districts around the general trend are for the largest properties.
What do the level-1 tramlines show? The variance is presumed to be unchanging with size.
What does the VPC show?
The greatest difference between districts is for the largest properties; that is, the greatest similarity within a district is for large properties. Imagine a 10 roomed property in district X which is £1 million; then other 10 room properties in that district are also likely to be worth £1 million. If in district Y one 10 roomed property is worth only £0.2 million, then other 10 roomed properties in that district will be similarly priced. Big differences between; great similarity within.

Questions 5.6: What do the level-2 tramlines show? It is only for larger than average properties that any substantial differences between districts are found. For smaller than average properties, the district in which the property is located does not matter a great deal.
Question 5.7 Do we need the new term? Yes, the new linear term at level 1 is significant; the assumption of level-1 homoscedasticity can be rejected. What has happened to the other terms in the model? The level-1 variance estimate of 290.95 is now for 5 roomed properties; it used to be 333.6 for all properties. At level 2, most noticeably, the linear term (the 'covariance') has reduced substantially from 13.49 to 9.97; there is some confounding of variance across levels. The standard errors have not changed a great deal.

Question 5.8: What has changed on the graphs? The level-1 variance function is markedly increasing as a positive linear function; larger properties are more variable in price. The 95% confidence bounds show a fanning out reflecting this heterogeneity. The VPC looks quite different. For the 4 roomed house (the standard 2 up, 2 down?) there is very little differential geography; it does not matter much where the property is. Larger properties are more variable between districts, and of the unexplained variation over a third is at the district level; equivalently, large properties are more similar within districts. But now smaller properties are also more variable between districts, with about a third of the unexplained variation at the district level; equivalently, small properties are more similar within districts.

Question 5.9: do we need the new term? The new term is highly significant; we really do need a quadratic variance function.
Questions 5.10: How have the results changed? The major change is the clear increasing positive quadratic function with size at level 1: larger properties are very variable in price. There is a marked fanning out in the within-district, between-house confidence bounds. While the shape of the VPC has not changed a great deal, the actual values have: with the linear variance function at level 1 the VPC for 2 rooms was about 0.3, but it is only 0.15 when a quadratic function is used. A good example of why it is important to model heteroscedasticity explicitly and to look at the VPC as a graphical display.
6. Comparing models and significance testing

Introduction
In this chapter no new models are fitted but we discuss how to:
Display estimates in tabular form;
Produce a table comparing the estimates of a set of models;
Use likelihood ratio tests to assess whether a particular model is a 'significant' improvement over another, using a procedure based on the change in deviance (Normal-theory models only, and only for nested models);
Undertake a univariate Wald test to assess whether an estimate associated with a particular variable is significantly different from zero;
Undertake a multivariate Wald test using a contrast matrix to assess whether a set of estimates is significantly different from a specified value;
Provide some conclusions about what tests to use in model comparisons.
Estimate tables
While Estimate Tables in MLwiN would appear to be just another way of displaying the results, they also offer two additional aspects of functionality:
Some aspects of the estimates are only available there;
With complex specification and re-specification of the model, the software can become 'confused' about which variables are and are not in the model. This happens rarely, but it can happen that a variable does not appear in the Equations window while the software 'thinks' that it is included in the model. This goes back to a concept that was a feature of MLn: the candidate list. You first had to declare a variable to be in the candidate list for the model and then subsequently declare that it was in the fixed or random part or both. The Estimate Tables give the candidate list and allow rapid inclusion and omission of variables from the model. If all else fails, you can use the Clear button at the bottom of the Equations window to remove all terms and start all over again.
To see this functionality:
Retrieve model7.ws
Model on main menu
Estimate tables
In the fixed part you should get
Clicking on Estimate table in the Window menu brings up the current estimates for the parameters; by default the fixed parameters appear. A series of letters with ticks appears at the top right of the screen. Reading from left to right, in the fixed part we have:
S: the symbol of the parameter
E: the estimate
S: the standard error of the estimate
P: the value of the estimate on the previous iteration
The latter allows you to discern which parameters have not yet converged. Clicking on the + button creates a second box with initially the same estimates; this can be used to contain further estimates for the random variables. Clicking the cursor in this box and then selecting, say, level 2 from the dialogue box where 'fixed part' is initially displayed will give the following; note that this section of the screen is now surrounded by a thick line:
The additional letters at the top right of the screen represent:
C: the correlation associated with each parameter (random parameters only)
N: the number of times convergence has been achieved for each parameter (random parameters only)
Here we find that the correlation (the standardized covariance) between the two latent variables is 0.657. Clicking on the ± button brings up a dialogue box with a list of all the variables. If the fixed part of the model has been selected then those variables currently in the fixed part are marked with a tick.
Clicking on these or other variables toggles them out of or into the current model. Likewise, if the random parameters at any level are selected, this dialogue box indicates which variables at that level are in the model and can be removed or further ones inserted. You may also add or remove individual components of a covariance matrix by double clicking in the relevant box. This is useful for specifying complex variance functions, especially at level 1. If you have removed an element you may reinsert it in this way only if one of the coefficients associated with it is 'present' in the covariance matrix through another parameter. Remember that when variables are removed or inserted all relevant screens are updated. It is a good idea to review all components of the model before clicking OK. Thus the Estimate tables window allows you to insert and delete variables and parameters associated with the fixed and random parts of the model. This is particularly useful if you wish to enter or remove several variables at a time, which can be done easily in the Estimate tables window but only one at a time in the Equations window.

Storing and comparing model estimates - performing a LRT
MLwiN has a very useful facility for comparing a set of models. To show this we will start with the simplest possible model, in which there is only an overall mean and a single variance term; that is, the model is a single-level model.
In the bottom toolbar of the equation window, click on Store, and call these estimates Model 0, followed by OK
This can only be done for the model specified in the currently active Equations window. Note that this only stores the estimates; you need to save the worksheet if you want all the attributes and the specification of the model. To view the estimates:
Model on main menu
Compare stored models
This will bring up the following summary of the estimates
Note that in addition to the estimates and their standard errors, the table has the correlations between the random terms (not needed in this model), the deviance (-2*loglikelihood) and the structure of the model. The latter can be particularly useful for monitoring what happens when predictors with missing values are included in the model, as the entire row of observations will be deleted; that is, listwise deletion is the default procedure. Note that the DIC and pD are reserved for models fitted by Bayesian MCMC procedures and not likelihood (IGLS/RIGLS) ones. We can then fit a more complex model, the two-level random-intercepts model
which we can now Store as 1: Null; comparing the models gives us the following comparison table
To get a well-presented table for a report, click on the Copy button, which will put the table as tab-delimited text onto the clipboard. Then in Word14, Paste, highlight all the values from the bottom up, and Table, Insert table. After removing redundant material you get the following (we suggest you centre and bold the headings and right-justify the numeric values) and replace variance-covariance terms (as here) with something more meaningful than Cons/Cons.
Response            Model 0: Price   S.E.     1: Null: Price   S.E.
Fixed Part
Cons                81.070           0.844    80.980           1.992
Random Part
Level: District
Cons/Cons           -                -        170.314          39.695
Level: House
Cons/Cons           801.411          33.775   629.706          27.149
Deviance            10724.307                 10550.647
Units: District     50                        50
Units: House        1126                      1126

14 It is sometimes easier to paste this to Excel for further manipulation before pasting to Word.
We can now test with a likelihood ratio test whether the more complex multilevel model is an improvement over the simpler single-level model; that is, we want to test the null hypothesis that the between-district variance σ²u0 is zero. First we have to determine the change in the number of parameters. We can do this by inspection of the tables: there is one more estimate in Model 1, with the value of 170.314, that is the between-district variance (σ²u0). We also need to calculate the change in the deviance by subtracting the deviance of the more complex model from the deviance of the simpler model; we can do this in the Command window.
Change in deviance = deviance of simpler model - deviance of more complex model
Calc b1 = 10724.307 - 10550.647
which gives a value of 173.66. This is a very large reduction for one degree of freedom, and we can get an associated p value by typing in the Command window (CPRO stands for Chi-square PRObability value)
CPRO b1 1
1.1744e-039
a very small value. It is very unlikely that we could have got such a reduction by chance, and we would conclude that we need a multilevel model. We can now fit a third and more complicated model in which Size-5, the size of the house, is additionally included.
Storing this as 2 +Size, we can compare all three models so far fitted .
MLwiN offers fine control over which models are compared, via
Model on Main menu
Manage stored models
which brings up the following window
Thus, highlighting Model 0 and the 2+Size model, followed by Compare, we can make that comparison.
We can then readily test whether the more complex model is a significant improvement; there are two extra parameters, so there are two degrees of freedom in the test.
Calc b1 = 10724.307 - 9917.003
807.30
Cpro b1 2
0.00000
So the inclusion of the two terms (the fixed coefficient for Size-5 and the between-district variance σ²u0) leads to a highly significant reduction in the deviance, or equivalently a highly significant improvement in fit. As a final example, we will additionally include random slopes for Size-5.
Store the results as 3: RS and then compare the last two models (you will have to close Manage stored models and then open it again to refresh it); note the additional information of the positive correlation of 0.615 between random slopes and intercepts
We now want to test the null hypothesis that the slope variance σ²u1 and the intercept-slope covariance σu01 are both simultaneously zero.
Calc b1 = 9917.003 - 9873.35
43.653
cpro b1 2
3.3180e-010
So there is very strong evidence (a very small p value) that the more complex model is needed.
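The same test can be reproduced outside MLwiN; a sketch with scipy:

# Likelihood ratio test for the random-slopes comparison:
# change in deviance of 43.653 on 2 degrees of freedom.
from scipy.stats import chi2

dev_simple, dev_complex, df = 9917.003, 9873.350, 2
lr = dev_simple - dev_complex      # 43.653, the change in deviance
p = chi2.sf(lr, df)                # upper-tail chi-square probability, as CPRO
print(round(lr, 3), p)             # 43.653 and about 3.3e-10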
In the last two chapters we have fitted 7 models (remember Model 5 did not converge); the tables below show the estimates, including the deviance. The first model, Model 0, is simply a single-level model with a mean and a variance.
Response            Model 0: Price  SE      1: Null: Price  SE      2+Size: Price  SE      3: RS: Price  SE
Fixed Part
Cons                81.070          0.844   80.980          1.992   75.667         1.497   75.411        1.471
(Size-5)            -               -       -               -       10.692         0.367   10.976        0.594
Random Part
Level: District
Cons/Cons           -               -       170.314         39.695  94.436         22.099  89.454        21.619
(Size-5)/Cons       -               -       -               -       -              -       18.341        6.482
(Size-5)/(Size-5)   -               -       -               -       -              -       9.948         3.389
Level: House
Cons/Cons           801.411         33.775  629.706         27.149  359.093        15.482  333.685       14.665
Deviance            10724.307               10550.647               9917.003               9873.350
Units: District     50                      50                      50                     50
Units: House        1126                    1126                    1126                   1126

Response                  4+Dist34: Price  SE      6: Lev1 Lin: Price  SE      7: Lev1 Quad: Price  SE
Fixed Part
Cons                      74.326           0.934   74.299              0.935   74.084               0.909
(Size-5)                  10.884           0.578   10.627              0.570   10.536               0.559
D_District(34).Cons       54.374           6.399   54.446              6.448   54.704               6.184
D_District(34).(Size-5)   2.961            5.477   3.560               5.115   3.204                4.770
Random Part
Level: District
Cons/Cons                 25.205           8.587   27.456              8.547   26.994               8.091
(Size-5)/Cons             13.494           4.023   9.976               3.921   10.043               3.802
(Size-5)/(Size-5)         9.587            3.174   9.973               3.115   8.660                3.026
Level: House
Cons/Cons                 333.633          14.630  290.949             12.793  211.796              13.692
(Size-5)/Cons             -                -       28.836              3.488   32.958               4.435
(Size-5)/(Size-5)         -                -       -                   -       26.510               4.702
Deviance                  9824.448                 9725.134                    9677.462
Units: District           50                       50                          50
Units: House              1126                     1126                        1126
The Likelihood Ratio Test: the change in the Deviance
We have so far used the likelihood ratio test; we now consider it more formally. The LRT is a procedure to test whether one model is a significantly better fit to the data than another, simpler model.15 It is based on the change in the deviance from one model to another. The deviance of a model (-2*loglikelihood) is a badness-of-fit statistic which is a function of the estimates and the number of observations. On its own it is not very useful, but we can use it to compare models, to see whether one model is a significantly better fit to the data than another. This deviance procedure can only be used in MLwiN with Normal-theory models and when one model is nested within, that is a simplification of, another. Two models are nested if both contain the same terms and one has at least one additional term.16 There is much discussion about the correct p value associated with a likelihood ratio test when the additional terms are variance parameters. The point is that because a variance cannot be less than zero, the null hypothesis is at the boundary of the parameter space. It is therefore suggested that you should halve the p values; that is, a p < 0.02 should be halved to p < 0.01.
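As a sketch of this boundary correction (the change in deviance below is invented for illustration):

# Halving the chi-square p value when the extra parameter is a variance,
# which is constrained to be non-negative (a boundary null hypothesis).
from scipy.stats import chi2

lr, df = 3.2, 1                    # assumed change in deviance for one extra variance
p_naive = chi2.sf(lr, df)          # treats the test as an ordinary chi-square
p_halved = p_naive / 2             # boundary-corrected p value
print(round(p_naive, 3), round(p_halved, 3))   # about 0.074 and 0.037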