HLM software has been one of the leading statistical packages for hierarchi- cal linear ..... (dependent) variable, and level 2 representing agencies, a random ..... pp. 263â267; Raudenbush et al., 2011, pp. 274â278). For instance, within HLM 7.
3
Introductory Guide to HLM With HLM 7 Software G. David Garson
H
LM software has been one of the leading statistical packages for hierarchical linear modeling due to the pioneering work of Stephen Raudenbush and Anthony Bryk, who created the software and authored the leading text on hierarchical linear and nonlinear modeling (Bryk & Raudenbush, 1992; Raudenbush & Bryk, 2002). Though differences among software packages’ capabilities have diminished over time, HLM 7 offers a number of appealing advantages and capabilities. Among these are what many consider to be a more intuitive model specification environment, greater ease in creating three- and four-level models, its wide choice of estimation options, integrated likelihood ratio hypothesis testing, graphics options, and the ability easily to handle heterogeneous hierarchical linear models (where the dependent is thought to have different error variances for different levels of some grouping variable such as Agency).
HLM SOFTWARE Scientific Software International (SSI) distributes HLM 7. A free student edition of HLM 7 is available.1 The student edition is full-featured, including examples, but is limited in the size and complexity of models (though it will work with all example files provided with the software). HLM 7 software operates through several modules, each designed for a different type of HLM model, only some of which can be illustrated here due to space constraints: HLM2. For two-level linear and nonlinear models with one dependent variable. HLM3 and HLM4. For three-level and four-level models with one dependent variable. HGLM. For generalized linear models for distributions other than normal and link functions other than identity, handling binary, count, multinomial, and ordinal outcome variables in Bernoulli, binomial, Poisson, multinomial, and ordinal models.
55
56
PA RT I . G U I D E
HMLM. For multivariate normal models with more than one outcome variable, including when the level 1 covariance structure is homogenous, heterogeneous, loglinear, or AR(1) (first-order autoregressive). HMLM2. For two-level HMLM models where level 1 is nested within level 2. HCM2. For models where level 1 units are cross-classified by two level 2 units. HCM3. For three-level cross-classified models. HLMHCM. For two- and three-level hierarchical linear models with cross-classified random effects (ex., repeated test scores nested within students who are cross-classified by schools and neighborhoods).
In summary, HLM 7 is a versatile and full-featured environment for many linear and generalized linear mixed models.
ENTERING DATA INTO HLM 7 HLM software stores data in its own multivariate data matrix (MDM) format, which may be created from raw data or from data files imported from SPSS, SAS, Stata, SYSTAT, or other packages. MDM format files come in flavors keyed to the several types of HLM modules noted above. File creation options are accessed from the HLM File menu, illustrated in Figure 3.1 below. The example below illustrates data entry from an SPSS .sav file for models of type HLM2, but similar procedures are followed for other model types. “Stat package input,” depicted above, is the most common method of creating .mdm data files. Further, not only are data commonly prepared using statistical or data packages outside HLM 7, but as an additional preprocessing step, the researcher also should rule out multicollinearity among the level 2 (or higher) predictors. Having done this, there are two methods of importing files into HLM 7 from other statistical packages.
Input Method 1: Separate Files for Each Level This method results in faster processing but requires more time to set up the data. It requires that separate files be created outside of HLM 7 for each level of HLM analysis. For SPSS, these are .sav files. For SAS, these are SAS 5 transport files. Separate SYSTAT and Stata files are also acceptable. For instance, HLM 7 software comes with example files from the Singer (1998) “High School and Beyond” study. The SPSS files for this example include HSB1.SAV, which contains the level 2 link field (ID is school ID) and any student-level variables. There are multiple rows per school, one row per student. It is critical that the level 1 file is sorted such that all students for a given school ID are adjacent.
CHAPTER 3 . I ntroductory Guide to H L M With H L M 7 S o ftware
Figure 3.1
HLM 7 file menu
Likewise, the school-level (level 2) file, HSB2.SAV, contains the same level 2 link field and any school-level variables.
Input Method 2: Using a Single Statistics Program Data File This method2 is easier in terms of data management and is the one illustrated in this chapter. The same statistics package file formats as for Method 1 may be used. For the example, the single data file must be sorted such that all students for a given school ID are adjacent.
Making the MDM File The next step is to create the .mdm file, which is HLM software’s native data format. After it is created, the input data files are not needed. After creating the
57
58
PA RT I . G U I D E
Figure 3.2
HLM 7 select MDM type window
input data file in SPSS, SAS, or another package, HLM 7 is run and “Stat package input” is selected. This causes the “Select MDM type” window illustrated above to appear. The researcher chooses the HLM model type wanted. For instance, for a simple two-level hierarchical linear model, the selection would be HLM2. After selecting HLM2, the “Make MDM - HLM2” dialog box appears, illustrated in Figure 3.3. Here, the following steps are necessary: Set the “Input File Type” to “SPSS/Windows” (or another statistical package format). In the level 1 specification area, click the “Browse” button and browse to the input file for level 1. Then, as illustrated at the top of Figure 3.4, click the “Choose variables” button, click the checkbox indicating the level 2 link variable (id in the example), and click the checkboxes of any other level 1 variables in the analysis. In the level 2 specification area, click the “Browse” button and browse to the input file for level 2. This may be the same file as for level 1 (following Method 2 above). Again click the “Choose variables” button, click the checkbox indicating the level 2 link variable (agency), and click the checkboxes of any other level 2 variables in the analysis, as indicated in the lower half of Figure 3.4.
CHAPTER 3 . I ntroductory Guide to H L M With H L M 7 S o ftware
Figure 3.3
Make MDM - HLM 2 window in HLM 7
Save the MDM template file by clicking the “Save mdmt file” button, making sure the file location window points to the desired folder and giving a filename (add the .mdmt extension), then clicking the “Save” button. To complete the process, the researcher clicks the “Make MDM” button, giving a filename (here, M3_L2.mdm, standing for mixed linear model Chapter 3, 2-level). The .mdm file is created, and the descriptive statistics module runs. Alternatively, one may click the “Check Stats” button. This output, shown in Figure 3.5, should be examined to verify the results. For instance, it is prudent to examine the reported sample size, which, if low, flags that the researcher has not sorted the Level 1 file to assure that individual rows for the same level 2 ID (AGENCY in this example) are adjacent.
59
60
PA RT I . G U I D E
Figure 3.4
Choose variables windows in HLM 7
CHAPTER 3 . I ntroductory Guide to H L M With H L M 7 S o ftware
Figure 3.5
61
“Check Stats” button output in HLM 7
Click the “Done” button to exit to the WHLM model construction screen discussed below. At this point, the researcher will have saved three files to the disk: the newly created HLM-compatible data file, H3_L2.mdm in this example; the default template creatmdm.mdmt (the researcher may override the default name); and the output file above, HLM2MDM.STS (if desired, use File, Save As, to save output under a different name, as this default file may get reused with new content if there are multiple runs).
THE NULL MODEL IN HLM 7 After data are entered, the next step is to create the model. Typically, the first model created is the null model. The null model serves two purposes: (1) It is the basis for calculating the intraclass correlation coefficient (ICC), which is the usual test of whether multilevel modeling is needed; and (2) it outputs the deviance statistic (-2LL) and other coefficients used as a baseline for comparing later, more complex models. For the current example, the null model addresses the question, “Is there a (level 2) agency effect on the (level 1) intercept of performance score, which represents the mean score?” If there is an agency effect, then ordinary regression methods will suffer from correlated error, and some form of linear mixed modeling is required. The null model, like all two-level hierarchical models in HLM 7, is created in the WHLM modeling dialog, illustrated in Figure 3.6. This dialog is reached either on clicking “Done” in the “MAKE MDM” dialog or, if the MDM file was
62
Figure 3.6
PA RT I . G U I D E
WHLM modeling window in HLM 7: Null model
previously saved, from the HLM menu by selecting File, “Create a new model using an existing MDM file,” and then opening the appropriate .mdm file. In the WHLM modeling dialog illustrated in Figure 3.6, the employee level (level 1) dependent variable performance score (SCORE0) is designated as the outcome variable. No other predictors are added. HLM 7 already knows “Agency” is the level 2 grouping variable and automatically assumes it is a predictor of the level 1 intercept of SCORE0. Table 3.1 Summary of the Null Model When SCORE0 is designated as the outcome variable, HLM 7 constructs and displays the model, in this case the null model (also called the intercept-only model Level 1 Model or the one-way ANOVA model with random effects). SCORE0ij = β0j + rij The null model is shown in Table 3.1 below. Clicking the “Mixed” button at the bottom of the WHLM dialog Level 2 Model creates the combined HLM equation shown at the botβ0j = γ00 + u0j tom of the figure: The two separate equations shown in Mixed Model the upper main window are mathematically equivalent to the single combined mixed model equation. For SCORE0ij = γ00 + u0j+ rij learning purposes, it is easier to examine the equations
CHAPTER 3 . I ntroductory Guide to H L M With H L M 7 S o ftware
at each level. At level 1, SCORE0 is predicted by an intercept term and a random term. The symbol for the intercept term varies depending on the distribution specified for the outcome variable (this is done in the “Basic Settings” window, described below) and is expressed equivalently but differently in output. The level 1 intercept term, expressed as β0j in output, is a function of a random intercept term at level 2 (γ00) and a level 1 residual error term (rij). The level 1 intercept, in turn, is a function of the grand mean (γ00) across level 2 units, which are agencies in this example, plus a random error term (u0j), signifying the intercept is modeled as a random effect. Substituting the right-hand side of the level 2 equation into the level 1 equation gives the mixed model equation for the null random intercept model. HLM 7 will create one level 1 regression for each agency, and then will utilize the variance in these intercepts when estimating parameters and standard errors at level 1. This is what makes the process different from ordinary regression, where a single overall intercept is estimated. Before calculating estimates, the researcher may specify the distribution of the outcome variable by selecting “Basic Settings” from the main menu bar, yielding the window shown in Figure 3.7. The normal distribution, used in this example, is the default. Other available specifications support Bernoulli, Poisson, multinomial, and ordinal distributions. Selecting “Bernoulli” for a binary outcome variable applies a logistic link function, and in the ensuing multilevel logistic regression, interpretations are in terms of the log odds of the outcome rather than in terms of the raw outcome itself. Selecting “Multinomial” creates a multilevel multinomial regression using a logit link. “Ordinal” supports multilevel ordinal regression models. Multilevel Poisson regression models employ a Poisson log link and require an exposure variable (time, for example). In this window, one may also specify the name and location of the output statistics file and the output graphics file. It is also possible to modify model estimation settings prior to running the model by selecting “Other Settings” from the main menu bar, then “Estimation Settings,” as illustrated in Figure 3.8. Estimation settings were discussed in Chapter 2. For the null model, we use the default setting, restricted maximum likelihood estimation. There is also an “Iterations Settings” window, also from the “Other Settings” menu. Though not illustrated here, it provides options discussed in Chapter 2 with regard to estimation settings. One may also select “Other Settings” from the main menu bar, then “Output Settings” to obtain the window shown in Figure 3.9. For this model, one may choose to print out variance–covariance matrices or to restrict output to the main results. The default is restricted output and no matrices. To run the null model, the researcher simply selects “Run Analysis” from the main menu bar. Output is sent to the file location and a name is specified in the “Basic Model Specifications” window (Figure 3.7). To view the output, select File, View Output, from the main menu bar. For this example, the critical output of the null model looks as shown in Table 3.2. The phrase “Number of estimated parameters = 2” refers to the fact that in a null model, estimates are made for the level 1 intercept and the level 2 intercept. In the
63
64
PA RT I . G U I D E
Figure 3.7
Basic Model Specifications window from Basic Settings menu in HLM 7
final variance components table, the fact that the component for the intercept (161.94, which HLM labels tau, τ) is significant means that the intercept of the outcome variable, SCORE0, is significantly affected by its predictors, which in this example is the level 2 effect of agency. A non-significant intercept in the variance components table term (not the case here) would mean that after other variables in the model are controlled, there would be no residual between-groups variance in the level 1 dependent variable (Score0). The agency effect is smaller than the residual variance component (212.69, which HLM also labels sigmasquared, σ2), indicating that there is still considerable residual variation in Score0 yet to be explained and that a model with additional predictors may be needed. The fact that the intercept component is significant means that the intraclass correlation coefficient, ICC, is also significant, indicating that a multilevel model is appropriate and needed. ICC varies from +1.0 when group means differ but within any group there is no variation, to - 1/(n - 1) when group means are all the same but within-group variation is very large. At the extreme, when ICC
65
CHAPTER 3 . I ntroductory Guide to H L M With H L M 7 S o ftware
Figure 3.8
Estimation Settings window from Other Settings menu in HLM 7
Figure 3.9
Output Settings window for HLM2
Table 3.2
Final Estimation of Variance Components for the Two-Level Null Model
Random effect
Standard deviation
Variance component
d.f.
χ2
p-value
INTRCPT1, u0
12.72558
161.94029
131
2391.61810