If you ignore manipulation errors in your experiments you might be p-hacking without knowing

Working Paper (November 13, 2018)
Wagner A. Kamakura, Rice University

Abstract

This research brief is a cautionary note on the dangers of using p-values as empirical evidence for crossover effects in consumer research. If you rely on an experimental manipulation to produce between-subjects variation on a moderator construct, and then test for interaction effects on the manipulation (rather than on the moderator construct itself), your evidence of a crossover effect might rest on spurious statistical significance.

Introduction

C.K. is a (fictional) doctoral student who is ready for a successful career as a consumer researcher, with one JCR article already in print and another on a second revision. He feels confident about his research paradigm, which is based on detecting factors that moderate well-established direct effects. His most recent experiment is a good example. Its main purpose was to demonstrate the moderating role of mood on how paying a higher-than-expected price (captured by the differential X between the paid and lowest available price, controlled by the lab monitor under C.K.'s protocol) leads to post-purchase regret (measured by the dependent variable Y). Mood (captured by the dummy variable K) was manipulated by having subjects watch either an uplifting (K = 1) or a depressing (K = 0) video clip. C.K. ran the experiment with 50 subjects in each group (K = 0 and K = 1), but none of the effects of X, K, or the interaction XK on Y reached statistical significance at the p < 0.05 level.

[…]

The naïve model finds spurious crossover effects (p(β₃ < 0) < 0.05 and [p(b₁ > 0) < 0.05 and p(b₃ < 0) < 0.05] = False) in 4% to 7% of the replications when none exists according to the "true" model. The reader must be careful not to interpret these percentages as Type I errors, because they are based on a comparison of two sets of estimates (from the "true" and naïve regressions) rather than on a comparison between estimates and the true hypothesized value. In the cases reported in Table 1, the "true" model would not show statistically significant main effects, thereby rejecting a crossover effect even when the interaction is statistically significant (p(b₃ < 0) < 0.05). Because the main effects are heavily attenuated, the naïve model is more likely to detect a crossover effect when the interaction is significant (p(β₃ < 0) < 0.05 and [p(b₁ > 0) < 0.05 and p(b₃ < 0) < 0.05] = False).
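The mechanism can be sketched with a small Monte Carlo simulation. The sketch below assumes a data-generating process consistent with the parameter labels in the table captions (Nsamp per group, main effect b₁, moderator effect b₂, interaction b₃, group-specific moderator spreads σ_η0 and σ_η1, mean shift θ, sampling-error σ_ν); the paper's exact specification and decision rules may differ. The "true" model regresses Y on the moderator construct M, while the naïve model substitutes the manipulation dummy K:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def ols_one_sided(y, cols, signs):
    """OLS fit; one-sided p-value per column of `cols`,
    testing the sign in `signs` (+1: coef > 0, -1: coef < 0)."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    df = len(y) - X.shape[1]
    s2 = (y - X @ beta) @ (y - X @ beta) / df
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    t = beta / se
    # skip index 0 (intercept); sf gives the one-sided tail probability
    return [stats.t.sf(sign * t[i + 1], df) for i, sign in enumerate(signs)]

def one_replication(nsamp=100, b1=0.3, b2=0.0, b3=-0.3,
                    theta=1.0, sd_eta0=1.0, sd_eta1=3.0, sd_nu=2.0):
    K = np.repeat([0.0, 1.0], nsamp)             # manipulation dummy
    X = rng.normal(size=2 * nsamp)               # price differential
    # latent moderator: group means shifted by theta, group-specific spread
    M = rng.normal(theta * K, np.where(K > 0, sd_eta1, sd_eta0))
    Y = b1 * X + b2 * M + b3 * X * M + rng.normal(0, sd_nu, 2 * nsamp)

    def crossover(Z):
        # crossover = significant positive main effect AND negative interaction
        p_main, _, p_int = ols_one_sided(Y, [X, Z, X * Z], [+1, +1, -1])
        return p_main < 0.05 and p_int < 0.05

    return crossover(M), crossover(K)            # "true" vs naive model

reps = [one_replication() for _ in range(300)]
spurious = np.mean([(not true) and naive for true, naive in reps])
print(f"naive crossover without true-model support: {spurious:.0%} of replications")
```

Sweeping `theta` and `sd_nu` over a grid with this kind of loop is what produces the cells of Tables 2 and 3.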
Table 2 – Results from the second simulation with negative interaction and fixed Nsamp (Nsamp = 100, b₁ = 0.3, b₂ = 0.0, b₃ = −0.3, σ_η0 = 1.0, σ_η1 = 3.0)

Percentage of replications where the naïve model finds spurious crossover effects when the "true" model doesn't. Rows: shift in means between groups (θ); columns: std. deviation of sampling error (σ_ν).

| θ \ σ_ν | 1.0 | 1.2 | 1.4 | 1.6 | 1.8 | 2.0 | 2.2 | 2.4 | 2.6 | 2.8 | 3.0 | Average |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|---------|
| 0.5     | 21% | 19% | 15% | 15% | 13% | 13% | 11% | 12% | 10% | 10% | 12% | 14% |
| 0.6     | 21% | 20% | 19% | 15% | 15% | 16% | 15% | 14% | 12% | 11% | 10% | 15% |
| 0.7     | 22% | 21% | 20% | 18% | 18% | 17% | 13% | 13% | 12% | 11% | 12% | 16% |
| 0.8     | 24% | 21% | 21% | 20% | 18% | 16% | 17% | 16% | 13% | 13% | 12% | 17% |
| 0.9     | 23% | 24% | 21% | 22% | 21% | 20% | 17% | 16% | 13% | 13% | 14% | 19% |
| 1.0     | 26% | 26% | 25% | 24% | 22% | 18% | 16% | 16% | 16% | 17% | 14% | 20% |
| 1.1     | 28% | 27% | 24% | 23% | 23% | 22% | 20% | 18% | 17% | 16% | 14% | 21% |
| 1.2     | 23% | 24% | 25% | 25% | 24% | 22% | 22% | 21% | 18% | 17% | 16% | 22% |
| 1.3     | 23% | 26% | 25% | 27% | 27% | 23% | 23% | 20% | 22% | 20% | 16% | 23% |
| 1.4     | 21% | 24% | 25% | 27% | 25% | 24% | 23% | 21% | 20% | 18% | 19% | 22% |
| 1.5     | 21% | 23% | 25% | 27% | 27% | 25% | 25% | 27% | 20% | 22% | 20% | 24% |
| Average | 23% | 23% | 22% | 22% | 21% | 20% | 18% | 18% | 16% | 15% | 15% | 19% |
Table 3 – Results from the third simulation with negative interaction and fixed Nsamp (Nsamp = 100, b₁ = 0.1, b₂ = 0.0, b₃ = −0.3, σ_η0 = 1.0, σ_η1 = 3.0)

Percentage of replications where the naïve model finds spurious crossover effects when the "true" model doesn't. Rows: shift in means between groups (θ); columns: std. deviation of sampling error (σ_ν).

| θ \ σ_ν | 1.0 | 1.2 | 1.4 | 1.6 | 1.8 | 2.0 | 2.2 | 2.4 | 2.6 | 2.8 | 3.0 | Average |
|---------|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|-----|---------|
| 0.5     | 12% |  9% |  7% |  7% |  6% |  6% |  5% |  3% |  4% |  3% |  5% |  6% |
| 0.6     | 15% | 11% | 12% |  7% |  6% |  7% |  6% |  7% |  4% |  4% |  3% |  8% |
| 0.7     | 16% | 14% |  8% |  8% |  8% | 12% |  7% |  8% |  4% |  3% |  4% |  8% |
| 0.8     | 19% | 17% | 13% |  9% | 12% |  8% |  6% |  7% |  6% |  4% |  4% |  9% |
| 0.9     | 24% | 17% | 16% | 10% |  8% | 11% | 10% |  9% |  7% |  3% |  6% | 11% |
| 1.0     | 25% | 21% | 17% | 14% | 13% |  8% |  9% |  6% |  7% |  3% |  5% | 12% |
| 1.1     | 29% | 23% | 19% | 15% | 17% | 10% |  9% | 10% |  8% |  6% |  5% | 14% |
| 1.2     | 36% | 30% | 24% | 19% | 14% | 11% |  9% | 11% |  7% |  5% |  6% | 16% |
| 1.3     | 36% | 29% | 26% | 20% | 17% | 11% | 10% | 11% | 11% | 10% |  6% | 17% |
| 1.4     | 40% | 34% | 29% | 22% | 20% | 18% | 16% | 10% |  9% | 12% |  8% | 20% |
| 1.5     | 41% | 37% | 29% | 26% | 21% | 21% | 12% | 12% | 11% | 10% |  7% | 21% |
| Average | 27% | 22% | 18% | 14% | 13% | 11% |  9% |  8% |  7% |  6% |  5% | 13% |