
The Impact of Measurement Error in Predictor Variables in Multilevel Models: An Empirical Investigation of Statistical Bias and Sampling Error

Jeffrey D. Kromrey¹, James T. Coraggio, Ha T. Phan, Jeanine L. Romano, Melinda R. Hess, Reginald S. Lee, Constance V. Hines, Stephen L. Luther

¹ Jeanine Romano is affiliated with the University of Tampa; the other authors are affiliated with the University of South Florida.

Paper presented at the annual meeting of the Florida Educational Research Association, November 15 – 17, 2006, Jacksonville.


The Impact of Measurement Error in Predictor Variables in Multilevel Models: An Empirical Investigation of Statistical Bias and Sampling Error

Multilevel models are commonly used in education, psychology, sociology, and medicine. Studies involving nested data structures, such as students in classrooms, classrooms in schools, or schools in districts, frequently use multilevel models to estimate and account for the correlation among nested observations. Similarly, in longitudinal research, repeated observations may be considered as nested within individuals. The frequency with which multilevel data structures are encountered has led to a useful set of statistical methods, referred to in the literature by such names as hierarchical linear modeling, multilevel modeling, mixed linear modeling, or growth curve modeling (for a general introduction to multilevel modeling, see Raudenbush & Bryk, 2002).

Multilevel Models

Multilevel models provide an analysis of the relationship between one or more predictors and an outcome variable. They are distinguished from traditional single-level models (such as ANOVA or multiple regression models) in that the predictors are conceptualized as occurring at more than one level of a nested structure. For example, a study of classroom achievement may include student-level predictors (e.g., previous achievement, gender, attendance) and classroom-level predictors (e.g., type of curriculum, teachers' years of experience, classroom mean prior achievement). A level-1 model may describe the achievement outcome as a function of the student-level predictors:

$$ y_{ij} = \beta_{0j} + \beta_{1j} X_{1ij} + \beta_{2j} X_{2ij} + \cdots + \beta_{pj} X_{pij} + r_{ij} $$


where yij is the achievement outcome score for the ith child in the jth classroom, β0j is the intercept of the regression equation predicting the outcome in the jth classroom, the βhj are regression coefficients indexing the relationship between each predictor (Xhij) and the outcome scores in the jth classroom, and rij is the residual error, which is assumed to be normally distributed with covariance Σ. A level-2 model may then be fit to the data to examine the relationship between the classroom-level variables and the parameters of the level-1 model:

$$\begin{aligned}
\beta_{0j} &= \gamma_{00} + \gamma_{01} W_{1j} + \gamma_{02} W_{2j} + \cdots + \gamma_{0q} W_{qj} + u_{0j} \\
\beta_{1j} &= \gamma_{10} + \gamma_{11} W_{1j} + \gamma_{12} W_{2j} + \cdots + \gamma_{1q} W_{qj} + u_{1j} \\
&\;\;\vdots \\
\beta_{pj} &= \gamma_{p0} + \gamma_{p1} W_{1j} + \gamma_{p2} W_{2j} + \cdots + \gamma_{pq} W_{qj} + u_{pj},
\end{aligned}$$

where the Wqj are level-2 predictors and the uhj are random errors, which are assumed to be normally distributed with covariance Γ. Statistical inferences in multilevel models may include hypothesis testing and confidence interval estimation for both the regression weights (i.e., the fixed effects) and the error variances in the model (the random effects).
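Substituting the level-2 equations into the level-1 model gives the combined (mixed) form of the model, which makes clear where the cross-level interaction effects examined below arise. This is the standard derivation (cf. Raudenbush & Bryk, 2002), not an equation given explicitly in the paper:

$$ y_{ij} = \gamma_{00} + \sum_{k=1}^{q} \gamma_{0k} W_{kj} + \sum_{h=1}^{p} \gamma_{h0} X_{hij} + \sum_{h=1}^{p} \sum_{k=1}^{q} \gamma_{hk} W_{kj} X_{hij} + u_{0j} + \sum_{h=1}^{p} u_{hj} X_{hij} + r_{ij} . $$

The products WkjXhij carry the cross-level interactions, so measurement error in either a level-1 or a level-2 regressor propagates into these product terms.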

Purpose of the Study

A variety of technical issues accompany the valid estimation of parameters in multilevel models, including assumptions of normality and homogeneous variances, appropriate specification of error structures, and accurate specification of predictors and their functional form. A fundamental technical issue that is often overlooked in the analysis of multilevel data is the impact of measurement error in predictors. For example, in a recent review of 98 published reports of research conducted using multilevel models, Ferron et al. (2006) found only 16 studies that suggested any consideration of the impact of measurement error on the analysis. These results parallel the disregard for measurement error in many reports of multiple regression analysis. Although research on the effects of random measurement errors in regression analysis has a fairly long history (see Pedhazur, 1997, for a brief review) and the effects of measurement errors on the validity of regression analysis can be severe (Cochran, 1968), Jencks et al. (1972) suggested that "the most frequent approach to measurement error is indifference" (p. 330). The apparent neglect of this issue in applied research may be related to a lack of its treatment in the technical literature in education. In contrast to the educational literature, which yields few publications on the impact of measurement error in multilevel models, the statistical literature in medical research provides a plethora of papers highlighting both the impact of measurement error on parameter estimates and methods for reducing such impact (see, for example, Kim & Zeleniuch-Jacquotte, 1997; Palta & Lin, 1999; Ko & Davidian, 2000). The majority of the medical literature, however, addresses generalized multilevel models in which the outcomes are dichotomous or multinomial variables. The purpose of this study was to investigate the impact of measurement error in predictor variables used in multilevel models with continuous outcome variables.

Method

This research was a Monte Carlo study in which random samples were generated from multivariate, multilevel populations under known and controlled conditions, and each sample was analyzed using a multilevel model. The design included four factors: (a) the number of level-2 units in the sample (N2 = 15, 30, 60, and 120); (b) the number of level-1 units per level-2 unit (treated as a random variable, with 15 ≤ n1 ≤ 30, 20 ≤ n1 ≤ 50, or 20 ≤ n1 ≤ 100); (c) the regressor intercorrelation (ρ12 = .00, .30, and .60, at both level-1 and level-2); and (d) the regressor reliabilities (ρxx = .60, .80, .90, and 1.00). All models were specified with three level-1 predictors and two level-2 predictors.

Measurement error was simulated following procedures used by Maxwell, Delaney, and Dill (1984); Jaccard and Wan (1995); and Hess and Kromrey (2002). Two normally distributed random variables were generated for each regressor, one representing the 'true score' on the regressor and the other representing measurement error. Fallible, observed scores on the regressors were calculated as the sum of the true and error components, consistent with classical measurement theory. The reliabilities of the regressors were controlled by adjusting the error variance relative to the true score variance:

$$ \rho_{xx} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} $$

where σT² and σE² are the true score and error variances, respectively, and ρxx is the reliability.

The simulations were conducted using SAS/IML version 9.1, running under Windows XP. Normally distributed random variables were generated using the RANNOR random number generator in SAS, with a different seed value used in each execution of the program. The program code was verified by hand-checking results from benchmark datasets. For each condition investigated, 1,000 samples were generated; this number of estimates provides adequate precision for investigating the sampling behavior of point and interval estimates of the multilevel model coefficients.
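The paper's simulations were written in SAS/IML; purely as an illustration of the fallible-regressor construction just described (a minimal Python sketch, not the authors' code), the error variance can be derived from the target reliability as follows:

```python
import numpy as np

def fallible_scores(n, rho_xx, sigma_t=1.0, rng=None):
    """Return (true, observed) scores with target reliability rho_xx.

    Classical true-score model: X = T + E with
    rho_xx = var(T) / (var(T) + var(E)), which implies
    var(E) = var(T) * (1 - rho_xx) / rho_xx.
    """
    rng = rng if rng is not None else np.random.default_rng()
    true = rng.normal(0.0, sigma_t, size=n)
    if rho_xx >= 1.0:                      # perfectly reliable regressor
        return true, true.copy()
    sigma_e = sigma_t * np.sqrt((1.0 - rho_xx) / rho_xx)
    observed = true + rng.normal(0.0, sigma_e, size=n)
    return true, observed

# Sanity check: the squared true-observed correlation estimates rho_xx.
t, x = fallible_scores(100_000, rho_xx=0.60, rng=np.random.default_rng(1))
print(np.corrcoef(t, x)[0, 1] ** 2)        # approximately 0.60
```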


For example, 1,000 samples yields a maximum 95% confidence interval half-width around an observed proportion of ± .03 (Robey & Barcikowski, 1992); for a proportion near .50, the half-width is 1.96√(.5 × .5/1000) ≈ .031.

The outcomes of interest in this simulation study included both point estimates (the bias and sampling error of the fixed effects and random effects) and interval estimates (confidence interval coverage and width for both sets of effects). In addition, conditions were simulated in which one of the coefficients was null in the population, allowing estimation of Type I error control in the presence of measurement error. A final set of conditions was structured such that a single predictor was measured without any measurement error; this set of conditions allowed examination of the extent to which fallible regressors in a model affect estimates associated with a regressor measured without error.

To guide the interpretation of the simulation results, each of the outcomes described above was treated as a dependent variable and a series of factorial ANOVAs was conducted. The independent variables in each ANOVA were the four Monte Carlo design factors: (a) the number of level-2 units, (b) the number of level-1 units, (c) the regressor intercorrelation, and (d) the regressor reliabilities. In addition to main effects, the first-order interactions of these factors were examined. For each of these analyses, eta-squared (η²) was calculated to estimate the proportion of variance associated with each effect (Maxwell & Delaney, 1990).
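As a sketch of how these outcomes can be computed from the replicate estimates (hypothetical helper functions; the paper reports only the definitions), bias, RMSE, interval coverage, and η² reduce to a few lines:

```python
import numpy as np

def bias_and_rmse(estimates, parameter):
    """Bias and RMSE of replicate estimates of a single parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - parameter
    rmse = np.sqrt(np.mean((est - parameter) ** 2))
    return bias, rmse

def coverage_and_width(lowers, uppers, parameter):
    """Proportion of replicate intervals covering the parameter, and mean width."""
    lo, hi = np.asarray(lowers), np.asarray(uppers)
    return np.mean((lo <= parameter) & (parameter <= hi)), np.mean(hi - lo)

def eta_squared(ss_effect, ss_total):
    """Proportion of variance associated with a design factor in the meta-ANOVA."""
    return ss_effect / ss_total
```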

Results

The results of this research were initially examined in terms of estimated statistical bias and RMSE of the point estimates of the parameters (both the fixed effects and the random effects). Subsequently, the impact of measurement error on inferences from the models was evaluated (Type I error control and statistical power of hypothesis tests, and confidence interval coverage and width). Prior to examining these results, however, an analysis of the nonconvergence rates among the multilevel models is presented.

Non-convergence of Models

The models were estimated using the default convergence criteria in SAS PROC MIXED. For samples in which the model failed to converge under these criteria, a more liberal set of criteria was applied: a maximum of 500 evaluations of the likelihood (rather than the default of 150), the use of Fisher scoring for the first five iterations, and a convergence criterion of 0.002.

Using the default criteria, convergence proved difficult in samples with fewer than 30 level-2 units, although the convergence rate increased rapidly as the number of level-2 units increased (Table 1). Convergence was also substantially affected by the reliability of the regressors (ρxx): even with perfectly reliable regressors, over 12% of the models failed to converge when the number of level-2 units was below 30, and over 35% failed to converge when the regressor reliability was .60.

Table 1
Proportion of Nonconverged Samples by Number of Level-2 Units and Regressor Reliability

             Number of Level-2 Units
ρxx       15     30     60    120
.60      .36    .04    .00    .00
.80      .22    .02    .00    .00
.90      .17    .01    .00    .00
1.00     .12    .00    .00    .00
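The models themselves were fit with SAS PROC MIXED; as a loose analogue only (statsmodels' optimizer and convergence diagnostics differ from PROC MIXED's, and the data-generating values below are invented for illustration), a random-intercept-and-slope fit with a nonconvergence trap might look like:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n2, n1 = 30, 20                                  # level-2 units, level-1 per unit
g = np.repeat(np.arange(n2), n1)                 # group labels
w = np.repeat(rng.normal(size=n2), n1)           # one level-2 predictor
x = rng.normal(size=n2 * n1)                     # one level-1 predictor
u0 = np.repeat(rng.normal(0, 0.5, size=n2), n1)  # random intercepts
u1 = np.repeat(rng.normal(0, 0.3, size=n2), n1)  # random slopes on x
y = 1 + 0.3 * x + 0.3 * w + 0.2 * x * w + u0 + u1 * x + rng.normal(size=n2 * n1)
df = pd.DataFrame({"y": y, "x": x, "w": w, "g": g})

# Random intercept plus random slope for x; x:w is the cross-level interaction.
model = smf.mixedlm("y ~ x + w + x:w", df, groups=df["g"], re_formula="~x")
result = model.fit()
if not result.converged:
    # Retry with a larger optimizer budget -- loosely analogous to the paper's
    # 'liberal' PROC MIXED criteria (more likelihood evaluations allowed).
    result = model.fit(maxiter=500)
print(result.summary())
```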

Model convergence was also affected by the level of intercorrelation among the regressors: nonconvergence climbed as ρ12 increased, although this effect was muted somewhat by higher regressor reliability (Table 2). With the fewest level-2 units, perfect reliability, and uncorrelated predictors, nearly 9% of the models failed to converge, and the combined effect of intercorrelated and less reliable predictors was quite pronounced. For example, with 15 level-2 units, ρ12 = .60, and ρxx = .60, over 44% of the simulated samples failed to converge; when the number of level-2 units increased to 30 in this scenario, 7% of the samples still did not converge to a solution.

Table 2
Proportion of Nonconverged Samples Under Default Convergence Criteria by Number of Level-2 Units, Regressor Reliability, and Regressor Intercorrelation

                       ρ12
N2     ρxx      .00    .30    .60
15     .60      .25    .32    .44
       .80      .15    .19    .30
       .90      .11    .14    .23
       1.00     .09    .11    .17
30     .60      .01    .03    .07
       .80      .00    .01    .03
       .90      .00    .00    .01
       1.00     .00    .00    .01
60     .60      .00    .00    .00
       .80      .00    .00    .00
       .90      .00    .00    .00
       1.00     .00    .00    .00
120    .60      .00    .00    .00
       .80      .00    .00    .00
       .90      .00    .00    .00
       1.00     .00    .00    .00

Applying the more liberal convergence criteria to the nonconverging samples resulted in far fewer instances of nonconvergence (no more than 0.3% with N2 = 30; see Table 3). However, for the smallest number of level-2 units, nonconvergence rates continued to exceed 10% in most conditions. These nonconvergence results highlight the importance of designing studies with adequate numbers of level-2 units.

Table 3
Proportion of Nonconverged Samples Under Liberal Convergence Criteria by Number of Level-2 Units, Regressor Reliability, and Regressor Intercorrelation

                       ρ12
N2     ρxx      .00    .30    .60
15     .60      .11    .12    .14
       .80      .09    .10    .12
       .90      .08    .10    .11
       1.00     .09    .09    .11
30     .60      .00    .00    .00
       .80      .00    .00    .00
       .90      .00    .00    .00
       1.00     .00    .00    .00
60     .60      .00    .00    .00
       .80      .00    .00    .00
       .90      .00    .00    .00
       1.00     .00    .00    .00
120    .60      .00    .00    .00
       .80      .00    .00    .00
       .90      .00    .00    .00
       1.00     .00    .00    .00

Statistical Bias

Variability in levels of statistical bias in the parameter estimates was associated with the reliability of predictors for all parameters except the model intercept. The magnitude of the relationship between bias and reliability ranged from η² = .18 for the random effect parameters to η² = .96 for the level-2 fixed effect parameters.


A summary of the estimated mean statistical bias by levels of the design factors is displayed in Table 4. Under conditions of perfect regressor reliability (ρxx = 1.00), no statistical bias was detected in any of the fixed effects (i.e., level-1, level-2, and cross-level interaction coefficients) or random effects. With regressor reliability values less than 1.00, however, the estimated mean statistical bias became substantial. The direction of bias was positive for random effects (i.e., the average sample values were larger than the parameters) and negative for fixed effects (the average sample values underestimated the parameters). As evident in Table 4, statistical bias tended to be most severe for estimates of the level-2 fixed effects.

Table 4
Estimated Mean Statistical Bias of Parameter Estimates by Regressor Reliability

ρxx     Level-1   Level-2   Cross-level Interaction   Random Effect
.60      -.12      -.34              -.26                  .27
.80      -.06      -.17              -.14                  .13
.90      -.03      -.08              -.07                  .07
1.00      .00       .00               .00                  .00
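The direction and relative size of these biases are consistent with classical attenuation results for regression with fallible predictors (Cochran, 1968). In the single-predictor case (a textbook result, not an equation from this paper), the estimated slope converges to the true slope shrunk by the reliability ratio:

$$ \operatorname{plim}\,\hat{\beta} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}\,\beta = \rho_{xx}\,\beta , $$

so fixed effects are pulled toward zero, and variance not captured by the attenuated fixed part tends to surface in the error and random-effect estimates. With several correlated fallible regressors the bias is no longer a simple proportional shrinkage, which may help explain why the level-2 and cross-level coefficients fared worst here.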

Root Mean Square Error (RMSE)

Variability in RMSE was examined as a function of the four design factors. For the level-1 fixed effects, a single factor, regressor reliability (ρxx), was associated with substantial differences in average RMSE (η² = .61). Similarly, for the level-2 fixed effects, the cross-level interaction effects, and the random effects, regressor reliability was the only design factor associated with substantial differences in RMSE (η² = .87, .80, and .19, respectively). For model intercepts, however, two design factors and their interaction were associated with substantial differences in RMSE: the number of level-2 units (η² = .47), regressor reliability (η² = .32), and the interaction between the number of level-2 units and regressor reliability (η² = .20).

The estimated RMSE means by levels of these design factors are reported in Table 5, and a graph of the interaction effect is shown in Figure 1. Examination of Table 5 reveals that as the number of level-2 units increased, estimated RMSE means decreased; this pattern was consistent across all four levels of regressor reliability. As anticipated, average RMSE decreased as regressor reliability increased. The estimated RMSE was greatest when the number of level-2 units was small and the regressor reliability was low.

Table 5
Mean RMSE of Model Intercepts by Number of Level-2 Units and Regressor Reliability

         Regressor Reliability (ρxx)
N2       .60    .80    .90   1.00
15       .08    .05    .03    .01
30       .04    .02    .01    .01
60       .02    .01    .01    .00
120      .01    .01    .00    .00

Type I Error Control

The analysis of the variability in estimated Type I error rates suggested that three design factors were associated with substantial differences in these rates: the number of level-2 units (η² = .14), the extent of intercorrelation between the level-1 regressors (η² = .19), and the regressor reliability (η² = .28). The estimated mean Type I error rates by levels of these design factors are presented in Table 6.


Figure 1. RMSE of Model Intercepts by Number of Level-2 Units and Regressor Reliability. [Line graph: RMSE (vertical axis, 0 to 0.1) against N2 (15, 30, 60, 120), with separate lines for ρxx = .60, .80, .90, and 1.00.]

Under conditions of perfect regressor reliability (ρxx = 1.00), the estimated Type I error rate converged toward the nominal alpha level (.05) as the number of level-2 units increased, regardless of the level of intercorrelation among the regressors. With regressor reliability values less than 1.00, however, the estimated Type I error rate increased substantially with larger numbers of level-2 units, larger values of regressor intercorrelation, and lower values of regressor reliability. At the smallest level of N2 (N2 = 15), the mean estimated Type I error rate reached as high as .282 (with ρxx = .60 and ρ12 = .60). As the number of level-2 units increased to N2 = 120, the mean estimated Type I error rate in these conditions increased to .934 (i.e., the null hypothesis was incorrectly rejected in nearly every sample tested). Especially disconcerting was the lack of Type I error control at otherwise 'respectable' levels of regressor reliability. For example, with ρxx = .90, the mean estimated Type I error rate reached as high as .120 with N2 = 15 and as high as .450 with N2 = 120.
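One heuristic way to see why the inflation worsens with more level-2 units: with correlated fallible regressors, measurement error induces a bias of roughly constant magnitude in the estimate of the truly null coefficient, while its standard error shrinks at the usual rate, so the test statistic drifts away from zero as the sample grows:

$$ t = \frac{\hat{\gamma}}{\widehat{SE}(\hat{\gamma})} \approx \frac{\text{bias} + O_p\!\left(N_2^{-1/2}\right)}{O\!\left(N_2^{-1/2}\right)} \longrightarrow \infty \quad \text{as } N_2 \to \infty , $$

and the rejection rate for the null coefficient approaches 1, as in the N2 = 120 rows of Table 6.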

Table 6
Mean Estimated Type I Error Rate by Number of Level-2 Units, Regressor Reliability, and Regressor Intercorrelation

            Regressor Intercorrelation (ρ12)
N2     ρxx      .00    .30    .60
15     .60      .10    .17    .28
       .80      .09    .12    .20
       .90      .08    .09    .12
       1.00     .08    .08    .08
30     .60      .12    .26    .48
       .80      .09    .17    .33
       .90      .07    .10    .16
       1.00     .06    .06    .06
60     .60      .16    .44    .75
       .80      .11    .26    .56
       .90      .08    .13    .26
       1.00     .06    .06    .06
120    .60      .25    .67    .93
       .80      .16    .44    .81
       .90      .09    .21    .45
       1.00     .05    .06    .05

Statistical Power

The analysis of the variability in estimated statistical power suggested that two design factors were associated with substantial differences in statistical power: the number of level-1 units and the number of level-2 units. The number of level-1 units was substantially related to power for the level-1 main effects (η² > .10), the cross-level interaction effects (η² = .13), and, through its interaction with the number of level-2 units, the random effects (η² = .25). The number of level-2 units was substantially related to power for the model intercepts (η² = .94), the level-1 main effects (η² = .76), the level-2 main effects (η² = .89), the cross-level interaction effects (η² = .67), and the random effects (η² = .45).

The estimated mean power by number of level-1 units and number of level-2 units is shown in Tables 7 through 9. For the level-1 fixed effects and the cross-level interaction effects (Tables 7 and 8), statistical power increased as the number of units increased, and the increase was more substantial as the number of level-2 units increased. For the level-1 fixed effects, the power for the largest level-1 sample size (n1 = 60) ranged from a low of .08 (N2 = 15) to a high of .39 (N2 = 120). For the cross-level interaction effects, the power for n1 = 60 ranged from .10 (N2 = 15) to .38 (N2 = 120). For the random effects (Table 9), power was initially higher for the smaller level-1 and level-2 sample sizes; this is believed to be an anomaly stemming from the low convergence rate in the small-sample conditions.

Table 7
Estimated Mean Power of Tests for Level-1 Fixed Effects by Number of Level-1 Units and Number of Level-2 Units

         Number of Level-2 Units
n1       15     30     60    120
22.5    .05    .09    .13    .22
35      .06    .10    .17    .29
60      .08    .13    .22    .39

Note. The n1 values are the midpoints of the simulated level-1 sample-size ranges.


Table 8
Estimated Mean Power of Tests for Cross-Level Interaction Fixed Effects by Number of Level-1 Units and Number of Level-2 Units

         Number of Level-2 Units
n1       15     30     60    120
22.5    .07    .09    .12    .19
35      .08    .10    .15    .24
60      .10    .13    .19    .38

Table 9
Estimated Mean Power of Tests for Random Effects by Number of Level-1 Units and Number of Level-2 Units

         Number of Level-2 Units
n1       15     30     60    120
22.5    .52    .33    .37    .52
35      .37    .31    .49    .73
60      .26    .42    .74    .93

Confidence Interval Coverage

The design factors with substantial impact on variability in confidence interval coverage were the number of level-2 units and the regressor reliability. The number of level-2 units had a substantial impact on the variability in coverage for the level-1 coefficients (η² = .13) and the level-2 coefficients (η² = .16); in addition, the interaction between the number of level-2 units and the regressor reliability was substantial (η² = .11) for the coverage of the level-2 coefficients. Regressor reliability was associated with the confidence interval coverage for the cross-level interaction effects (η² = .67), the random effects (η² = .23), the level-1 coefficients (η² = .54), and the level-2 coefficients (η² = .63).

Table 10 displays the average estimated confidence interval coverage by regressor reliability for the cross-level interaction effects and the random effects. Not surprisingly, when ρxx = 1.00 the coverage was near the nominal 95% level for both types of effects. As regressor reliability decreased, coverage decreased for both types of parameters, with the impact on the cross-level interaction intervals much more severe than the impact on the random effects intervals: coverage for the cross-level interactions ranged from .13 to .94, while coverage for the random effects ranged only from .52 to .95.

Table 10
Estimated Average Confidence Interval Coverage by Regressor Reliability

ρxx     Cross-level Interaction   Random Effects
.60              .13                   .52
.80              .40                   .54
.90              .72                   .64
1.00             .94                   .95

Table 11 presents mean confidence interval coverage for the level-1 fixed effect coefficients as a function of the number of level-2 units and regressor reliability. With perfect reliability, coverage was at the nominal level (.95) regardless of the number of units. When N2 = 15, coverage decreased as reliability decreased but was never smaller than .71. The impact of measurement error on coverage was much more dramatic with larger sample sizes: at the most extreme, with N2 = 120 and ρxx = .60, the estimated coverage was only .13.

Table 11
Estimated Average Confidence Interval Coverage for Level-1 Coefficients by Regressor Reliability and Number of Level-2 Units

         Number of Level-2 Units
ρxx       15     30     60    120
1.00     .95    .95    .95    .95
.90      .93    .91    .88    .80
.80      .89    .82    .68    .46
.60      .71    .47    .25    .13

The mean confidence interval coverage for the level-2 coefficients by number of level-2 units and regressor reliability is presented in Table 12. As with the other coefficients, coverage remained near the nominal level when regressors were measured without error and decreased as regressor reliability decreased. Further, the impact was most pronounced with larger numbers of level-2 units, with coverage falling as low as .02 with the largest samples (N2 = 120) and the lowest level of reliability examined (ρxx = .60).

Table 12
Estimated Average Confidence Interval Coverage for Level-2 Coefficients by Regressor Reliability and Number of Level-2 Units

         Number of Level-2 Units
ρxx       15     30     60    120
1.00     .93    .94    .94    .95
.90      .89    .87    .80    .66
.80      .83    .74    .56    .31
.60      .64    .40    .15    .02


Confidence Interval Width

The design factors with substantial impact on variability in confidence interval width were the number of level-2 units and the regressor reliability. The number of level-2 units had a substantial impact on the variability in interval width for the model intercepts (η² = .57), level-1 effects (η² = .86), level-2 effects (η² = .18), cross-level interaction effects (η² = .84), and random effects (η² = .45). In addition, the regressor reliability had a substantial impact on the variability in interval width for the intercepts (η² = .36) and for the level-2 coefficients (η² = .23).

Table 13 provides the mean confidence interval width by number of level-2 units. As expected, larger numbers of level-2 units produced tighter confidence intervals for all three types of parameters.

Table 13
Estimated Average Confidence Interval Widths by Number of Level-2 Units

N2      Level-1   Cross-level Interaction   Random Effects
15        .50              .46                 1481.78
30        .32              .30                  584.74
60        .22              .21                   62.17
120       .15              .15                     .15

Tables 14 and 15 display the impact of the number of level-2 units and the regressor reliability on the confidence intervals for the intercepts and the level-2 coefficients, respectively. These data demonstrate that the average confidence interval width increased as reliability decreased, regardless of sample size (confidence intervals with ρxx = .60 were approximately three times the width of those with ρxx = 1.00).


Table 14
Estimated Average Confidence Interval Widths for Model Intercepts by Regressor Reliability and Number of Level-2 Units

         Number of Level-2 Units
ρxx       15     30     60    120
1.00     .44    .28    .19    .13
.90      .70    .45    .31    .21
.80      .90    .58    .39    .27
.60     1.22    .79    .53    .37

A similar pattern was seen for the confidence intervals for the level-2 coefficients (Table 15). Lower levels of regressor reliability yielded wider confidence intervals across the numbers of level-2 units examined. For these intervals, the proportional increase in width was smaller than that seen for the intervals around the intercepts (confidence intervals with ρxx = .60 were approximately twice the width of those with perfectly reliable regressors).

Table 15
Estimated Average Confidence Interval Widths for Level-2 Coefficients by Regressor Reliability and Number of Level-2 Units

         Number of Level-2 Units
ρxx       15     30     60    120
1.00     .46    .30    .21    .15
.90      .68    .45    .31    .22
.80      .81    .54    .37    .26
.60      .92    .62    .42    .30


Discussion and Conclusions

The major conclusion from this research is that small level-2 sample sizes and imperfect regressor reliability, together with high intercorrelation among regressors, can have a profound impact on the quality of results in multilevel modeling. From low convergence rates (under default criteria) to high RMSE, biased parameter estimates, and inflated Type I error rates, small level-2 sample sizes and imperfect regressor reliability can produce deleterious results. Researchers should exercise caution when using regressors of unknown reliability, because these simulation results suggest that even regressors with 'acceptable' reliability can have an adverse impact on model estimates. For example, regressors with a reliability of .80 in this simulation produced RMSE values over four times those of perfectly reliable regressors at level-2 sample sizes of 15 and 30. A regressor reliability of .60, which is not improbable in many areas of research, resulted in RMSE values almost eight times those of regressors with a reliability of 1.00.

Interestingly, while RMSE decreased with larger level-2 sample sizes, Type I error rates increased, and this effect was amplified when regressors were intercorrelated. For instance, at a level-2 sample size of 120, the Type I error rates using regressors with a reliability of .80 were three times those of regressors with perfect reliability when the regressors were uncorrelated; with a regressor intercorrelation of .30, they were eight times as large; and with an intercorrelation of .60, fifteen times as large. While this may at first seem counter-intuitive (under the assumption that larger samples are 'better'), the effect results from the narrower confidence intervals associated with larger sample sizes. Not all of the simulation results were unexpected: as one might anticipate, statistical power increased as both the number of level-1 and level-2 units increased, with substantially greater impact seen for increases in the number of level-2 units.

With the increasing popularity of multilevel models, there is a great need to examine the statistical behavior of such models under less-than-ideal conditions. This research, while limited to a finite set of conditions and to two-level models, provides important evidence about the impact of measurement error on the inferences drawn from such models. Additional studies should investigate factors such as correlated measurement errors, dichotomous outcome variables, larger numbers of regressors, growth curve models, and the impact of missing data in the context of fallible regressors.


References

Cochran, W. G. (1968). Errors of measurement in statistics. Technometrics, 10, 637-666.

Ferron, J., Hess, M. R., Hogarty, K. Y., Dedrick, R. F., Kromrey, J. D., Lang, T. R., Niles, J., & Lee, R. (2006, April). Multilevel modeling: A review of methodological issues and applications. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Hess, M. R., & Kromrey, J. D. (2002, April). Interval estimates of R2: An empirical investigation into the influence of fallible regressors. Paper presented at the annual meeting of the American Educational Research Association, New Orleans.

Jaccard, J., & Wan, C. K. (1995). Measurement error in the analysis of interaction effects between continuous predictors using multiple regression: Multiple indicator and structural equation approaches. Psychological Bulletin, 117, 348-357.

Jencks, C., et al. (1972). Inequality: A reassessment of the effect of family and schooling in America. New York: Basic Books.

Kim, M. Y., & Zeleniuch-Jacquotte, A. (1997). Correcting for measurement error in the analysis of case-control data with repeated measurements of exposure. American Journal of Epidemiology, 145, 1003-1010.

Ko, H., & Davidian, M. (2000). Correcting for measurement error in individual-level covariates in nonlinear mixed effects models. Biometrics, 56, 368-375.

Maxwell, S. E., & Delaney, H. D. (1990). Designing experiments and analyzing data. Belmont, CA: Wadsworth.

Maxwell, S. E., Delaney, H. D., & Dill, C. A. (1984). Another look at ANCOVA versus blocking. Psychological Bulletin, 95, 136-147.

Palta, M., & Lin, C. (1999). Latent variables, measurement error and methods for analyzing longitudinal binary and ordinal data. Statistics in Medicine, 18, 385-396.

Pedhazur, E. J. (1997). Multiple regression in behavioral research (3rd ed.). Fort Worth: Harcourt Brace.

Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data analysis methods. Newbury Park: Sage Publications.

Robey, R. R., & Barcikowski, R. S. (1992). Type I error and the number of iterations in Monte Carlo studies of robustness. British Journal of Mathematical and Statistical Psychology, 45, 283-288.
