Editorial

Increasing Replicability

Christian Unkelbach
Department of Psychology, Universität zu Köln, Germany
Over the last four years (Unkelbach, 2013–2015), I have used these editorials to keep readers updated on proceedings behind the scenes, to inform about important changes, and to provide authors with information about impact and turnaround rates. From this informational perspective, this will be an atypical editorial, as I will almost exclusively address how Social Psychology as a journal might contribute to a better social psychological science by increasing the replicability of the results published within our pages. This is a somewhat personal view, but one informed by recent publications on the topic and by my experiences as Editor-in-Chief of Social Psychology.
Replicable Social Psychological Science

In October 2015, Brian Nosek and 270 contributing authors published an article in Science (Open Science Collaboration, 2015) that attempted to replicate 100 psychological experiments from high-profile outlets. This article had its predecessor in Social Psychology with our Registered Reports Special Issue on replications (Nosek & Lakens, 2014); in this special issue, authors preregistered their replication attempts of classic and contemporary findings. Some classic findings replicated (e.g., Wesselmann et al., 2014), while others did not (Nauts, Langner, Huijsmans, Vonk, & Wigboldus, 2014), and there was a lively debate, captured in the comments on this issue and the respective rejoinders (e.g., Monin & Oppenheimer, 2014; Schwarz & Strack, 2014). The recent Science article on 100 experiments used only experiments published in 2008 and arrived at a rate of 36% significant results in the direction of the original findings. The debate about the implications of these specific numbers and about the reasons for successes and failures of these attempts has taken place elsewhere (see, e.g., Maxwell, Lau, & Howard, 2015; Stroebe & Strack, 2014). Yet, a trend is visible: published psychological experiments are apparently not easy to replicate. In the following, I want to discuss how we might increase the replicability of research results published in Social Psychology.
What Have We Done, How Are We Doing, and What Will Be Done

In 2013, we announced and started "Replications" as a submission category, based on the belief that "our discipline should acknowledge the value of replications and honor researchers' time and investments in replication attempts by providing space to publish these attempts" (Unkelbach, 2013, p. 2). And as stated above, in 2014, we published the Registered Reports Replication Special Issue. This early focus on issues of replicability has strongly influenced the editorial team and its decisions over the last years. This influence is visible in Ulrich Schimmack's blog on replicability (Replication-Index, 2015). The blog provides a ranking of research journals in psychology based on the post hoc computed replicability of the published results. The resulting index of replicability is based on the observed median power and the percentage of significant results; it is assumed to provide a probability of replicating a given significant result with the same procedures, materials, and sample size. Given this index, Social Psychology is ranked no. 9 out of 54 journals based on articles published in 2015, which makes it the highest-ranked social psychological journal on the list. Please note that there are many explanations for how the index comes about, and I am very hesitant to interpret such rankings; however, it indicates that articles published in Social Psychology have adequate (albeit not good) power, and there is an upward trend (i.e., the average index has increased from 0.62 for papers published within 2010 to 2014, to 0.71 for papers published in 2015, with a 95% confidence interval from 0.62 to 0.78).
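To make this description more concrete, here is a rough sketch, in Python, of how such an index might be computed from the test statistics reported in a journal volume. The z-values are invented for illustration, and the exact formula behind the published ranking may differ; the sketch merely combines the two ingredients named above, median observed power and the rate of significant results.

```python
# Illustrative sketch of a replicability index: estimate each study's
# "observed power" from its reported z-value, then combine median observed
# power with the success rate. The exact formula behind the published
# ranking may differ from this R-Index-style combination.
import numpy as np
from scipy.stats import norm

def observed_power(z, alpha=0.05):
    """Post hoc power of a two-sided z-test, given the observed |z|."""
    crit = norm.ppf(1 - alpha / 2)                 # 1.96 for alpha = .05
    return norm.cdf(abs(z) - crit) + norm.cdf(-abs(z) - crit)

# Hypothetical z-values extracted from one journal volume (invented numbers)
z_values = np.array([2.1, 2.5, 1.9, 3.2, 2.0, 2.8])

powers = observed_power(z_values)
median_power = np.median(powers)
success_rate = np.mean(np.abs(z_values) > norm.ppf(0.975))

# "Inflation" is the gap between how often results are significant and how
# much power the studies appear to have; subtracting it once more yields an
# R-Index-style estimate of replicability.
inflation = success_rate - median_power
replicability_index = median_power - inflation
print(round(median_power, 2), round(replicability_index, 2))
```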
The question is how to enhance the replicability of our results further. In the following sections, I will address three basic points that researchers, reviewers, and editors who contribute to Social Psychology should keep in mind (and I apologize if I am repeating the obvious here). Of course, I can only scratch the surface, but I hope to convey the gist of how I believe one might increase replicability.
Increase N

The call for increasing sample sizes is very prominent, and there are many sources detailing the necessity of larger participant samples in psychological research. Among other advantages, larger samples increase the probability of detecting a hypothesized effect if it exists, they provide more precise estimates of an effect, and they provide more confidence in the nonexistence of an effect given a null result. Thus, a result based on a large n is less likely to be a false positive and should therefore be more likely to replicate; this is particularly true if one investigates not only experimental but also correlational results (see Schönbrodt & Perugini, 2013). However, large samples are also among the most costly endeavors. But there are other ways to increase replicability.
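As an illustration of this trade-off, the following short sketch uses the statsmodels power routines to show how many participants a simple two-group design needs for 80% power, and how little power a small sample provides. The effect size of d = 0.40 is an assumed value for illustration, not taken from any particular study.

```python
# Minimal power-analysis sketch for a two-group comparison (assumed values).
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# How many participants per group to detect d = 0.40 with 80% power at alpha = .05?
n_per_group = analysis.solve_power(effect_size=0.40, alpha=0.05, power=0.80)
print(f"Required n per group: {n_per_group:.0f}")            # roughly 100 per group

# The same routine, turned around: what does a small sample buy?
achieved_power = analysis.solve_power(effect_size=0.40, alpha=0.05, nobs1=20)
print(f"Power with n = 20 per group: {achieved_power:.2f}")  # well below .80
```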
Sample Stimuli

Wells and Windschitl (1999) discussed a pervasive problem in social psychological research: most experiments sample participants but do not sample stimuli. Their illustrative example involves a hypothetical hitchhiking experiment in which a woman or a man holds out her/his thumb a hundred times, and the dependent variable is how often she/he gets a ride. The hypothetical result is that the woman gets significantly more rides than the man, leading to the false conclusion that women get more rides than men as hitchhikers. However, the design only allows the conclusion that this specific woman gets more rides than this specific man. To make claims about women and men, one needs to sample stimuli from these categories the same way one samples participants from a population. Judd, Westfall, and Kenny (2012; see also Westfall, Kenny, & Judd, 2014) explored another aspect of this problem: lack of stimulus sampling inflates the Type I error probability for an effect, because variation due to the stimuli is ascribed to the experimental variations by standard ANOVAs. The authors provide all the necessary syntax to avoid this inflated error probability by analyzing stimuli and participants simultaneously in mixed-model analyses. However, for the present argument, the two most important aspects are that (a) not sampling stimuli does not allow generalization and thereby hinders successful replication beyond the stimuli used in the to-be-replicated study, and (b) not sampling stimuli stunts increases in power when n is increased (Westfall, Judd, & Kenny, 2015, p. 394). In fact, if stimuli account for 30% of the variance in a given dependent variable, effective power levels out around n = 32 and does not increase with more participants (see Westfall et al., 2015, Figure 1); in other words, increasing the sample of participants does not further increase power if the sample of stimuli is not increased as well. Thus, stimulus sampling allows generalizations and increases power, which increases replicability.
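As a minimal sketch of what such a simultaneous analysis can look like, the following simulates a design in which every participant responds to every stimulus and fits a mixed model with crossed random intercepts for participants and stimuli. The data and all parameter values are invented; Judd, Westfall, and Kenny (2012) provide their own, more complete syntax, and this sketch only illustrates the general idea.

```python
# Sketch: treat participants and stimuli as crossed random factors.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_participants, n_stimuli = 40, 20

# Simulate: every participant rates every stimulus; stimuli are split
# between two conditions; both participants and stimuli contribute variance.
participant_effects = rng.normal(0, 1.0, n_participants)
stimulus_effects = rng.normal(0, 1.0, n_stimuli)
rows = []
for p in range(n_participants):
    for s in range(n_stimuli):
        condition = s % 2                     # half the stimuli per condition
        rating = (0.5 * condition             # hypothetical fixed effect
                  + participant_effects[p]
                  + stimulus_effects[s]
                  + rng.normal(0, 1.0))
        rows.append({"participant": p, "stimulus": s,
                     "condition": condition, "rating": rating})
df = pd.DataFrame(rows)

# Crossed random intercepts via variance components within a single
# all-encompassing group (a common statsmodels workaround for crossed designs).
df["one_group"] = 1
vc = {"participant": "0 + C(participant)", "stimulus": "0 + C(stimulus)"}
model = smf.mixedlm("rating ~ condition", df, groups="one_group",
                    vc_formula=vc, re_formula="0")
result = model.fit()
print(result.summary())  # the condition effect is now evaluated against
                         # both participant and stimulus variability
```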
Derive Clear Predictions

Reading a scientific journal is in many respects like reading an ordinary newspaper or magazine; journals are expected to provide exciting and surprising facts and news. However, exciting and surprising results are almost by definition unlikely; that is, they are most likely false a priori (see Ioannidis, 2005). Importantly, a significant result (i.e., p < .05) does not increase the chances much that an effect is real if it is highly unlikely a priori (Nuzzo, 2014). Strong claims (i.e., unlikely findings) need strong support. In other words, replicating an unlikely result is also an attempt that is unlikely to succeed a priori. The likelihood that an effect exists is unfortunately difficult to assess directly in advance, but theoretical clarity makes for a good proxy: if a theory is based on well-grounded assumptions, a likely outcome is one that is clearly predicted by the theory. Thus, to increase replicability, effects need to be clearly theoretically derived, although this might make for less exciting reading at times. However, the true beauty of a hypothesis might emerge when plausible assumptions are combined to yield surprising, but nevertheless clear, predictions.1
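The base-rate argument can be illustrated with a small back-of-the-envelope calculation: the probability that an effect is real, given a significant result, depends strongly on how likely the hypothesis was a priori. The prior probabilities, power, and alpha level below are assumed values for illustration only.

```python
# Positive predictive value of a significant result under assumed priors,
# power, and alpha (in the spirit of Ioannidis, 2005, and Nuzzo, 2014).
def prob_effect_is_real(prior, power=0.80, alpha=0.05):
    """P(effect is real | p < .05) for a given prior probability."""
    true_positives = power * prior
    false_positives = alpha * (1 - prior)
    return true_positives / (true_positives + false_positives)

for prior in (0.05, 0.25, 0.50):
    print(f"prior = {prior:.2f} -> P(real | p < .05) = "
          f"{prob_effect_is_real(prior):.2f}")
# prior = 0.05 -> P(real | p < .05) = 0.46
# prior = 0.25 -> P(real | p < .05) = 0.84
# prior = 0.50 -> P(real | p < .05) = 0.94
```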
Concrete Measures

These three basic points are rather abstract, but as a general guideline for authors, reviewers, and editors, it is clear that we want to publish studies whose predictions are clearly derived from theories and are tested with large samples of participants and samples of stimuli. On the more concrete level, we will implement the by now standard measures that might increase replicability by urging authors to (a) report sample size and power considerations, (b) report effect sizes and confidence intervals for these effect sizes, (c) share their data, (d) share their materials, and (e) preregister experiments. While (a) and (b) are easily implemented and will be required, (c)–(e) will be by choice and based on an incentive system; that is, as already done in our Registered Reports Special Issue, we will start awarding badges for articles for which data and materials are openly accessible and, in the best case, for which the underlying research was preregistered. I hope that these abstract and concrete points further help Social Psychology as a journal to contribute, as I already stated in 2013, to a more solid foundation of social psychological research.
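To illustrate point (b), the following minimal sketch computes Cohen's d with an approximate 95% confidence interval for a simulated two-group data set; the normal-approximation standard error used here is one common, simple choice, not the only option.

```python
# Effect size (Cohen's d) with an approximate 95% CI for two groups
# of simulated data (all values assumed for illustration).
import numpy as np

rng = np.random.default_rng(7)
group_a = rng.normal(0.4, 1.0, 80)   # hypothetical treatment group
group_b = rng.normal(0.0, 1.0, 80)   # hypothetical control group

n1, n2 = len(group_a), len(group_b)
pooled_sd = np.sqrt(((n1 - 1) * group_a.var(ddof=1) +
                     (n2 - 1) * group_b.var(ddof=1)) / (n1 + n2 - 2))
d = (group_a.mean() - group_b.mean()) / pooled_sd

# Approximate standard error of d (a standard large-sample formula)
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d
print(f"d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```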
1 Please note that this point is the opposite of hypothesizing after the effects are known (i.e., HARKing; Kerr, 1998). Such a practice actively reduces replicability, because possible chance findings (i.e., false positives) are disguised as theoretically derived.
Closing Remarks

These thoughts on replicability come at the end of my 4-year term as Editor-in-Chief. Given this end of my tenure, I sincerely want to thank the associate editors who worked for the journal during the last 4 years for their contribution to Social Psychology. Without the input and effort of Julia Becker, Juliane Degner, Roland Deutsch, Gerald Echterhoff, Hans-Peter Erb, Malte Friese, Michael Häfner, Hans IJzerman, Eva Jonas, Markus Kemmelmeier, Ulrich Kühnen, Alison Ledgerwood, Ruth Mayo, Margaret Shih, Nicole Tausch, and Michaela Wänke, the journal would not be in the good shape it is in now. And most importantly, I want to thank Juliane Burghardt for her invaluable help and her management of the editorial office of Social Psychology: Merci, Juliane! The last bit of information is that the new incoming Editor-in-Chief, as of April 1, 2016, will be Kai Epstude from the University of Groningen. Welcome, Kai, and all the best for your term as editor and for the journal!
References

Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2, e124.
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103, 54–69.
Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2, 196–217.
Maxwell, S. E., Lau, M. Y., & Howard, G. S. (2015). Is psychology suffering from a replication crisis? What does "failure to replicate" really mean? American Psychologist, 70, 487–498.
Monin, B., & Oppenheimer, D. (2014). Commentaries and rejoinder on Klein et al. (2014): The limits of direct replications and the virtues of stimulus sampling. Social Psychology, 45, 299–301. doi: 10.1027/1864-9335/a000202
Nauts, S., Langner, O., Huijsmans, I., Vonk, R., & Wigboldus, D. J. (2014). Forming impressions of personality: A replication and review of Asch's (1946) evidence for a primacy-of-warmth effect in impression formation. Social Psychology, 45, 153–163. doi: 10.1027/1864-9335/a000179
Nosek, B. A., & Lakens, D. (2014). Registered reports: A method to increase the credibility of published results. Social Psychology, 45, 137–141. doi: 10.1027/1864-9335/a000192
Nuzzo, R. (2014). Statistical errors. Nature, 506, 151–153.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349, 943.
Replication-Index. (2015). Replicability ranking of 54 psychology journals. Retrieved from https://replicationindex.wordpress.com/2015/10/27/2015-replicability-ranking-of-54-psychology-journals/
Schönbrodt, F. D., & Perugini, M. (2013). At what sample size do correlations stabilize? Journal of Research in Personality, 47, 609–612.
Schwarz, N., & Strack, F. (2014). Does merely going through the same moves make for a "direct" replication? Concepts, contexts, and operationalizations. Social Psychology, 45, 305–306. doi: 10.1027/1864-9335/a000202
Stroebe, W., & Strack, F. (2014). The alleged crisis and the illusion of exact replication. Perspectives on Psychological Science, 9, 59–71.
Unkelbach, C. (2013). Social Psychology – change and consistency. Social Psychology, 44, 1–3. doi: 10.1027/1864-9335/a000135
Unkelbach, C. (2014). The best of times, the worst of times. Social Psychology, 45, 71–73. doi: 10.1027/1864-9335/a000194
Unkelbach, C. (2015). Looking back and looking forward. Social Psychology, 46, 1–3. doi: 10.1027/1864-9335/a000237
Wells, G. L., & Windschitl, P. D. (1999). Stimulus sampling and social psychological experimentation. Personality and Social Psychology Bulletin, 25, 1115–1125.
Wesselmann, E. D., Williams, K. D., Pryor, J. B., Eichler, F. A., Gill, D. M., & Hogue, J. D. (2014). Revisiting Schachter's research on rejection, deviance, and communication (1951). Social Psychology, 45, 164–169. doi: 10.1027/1864-9335/a000180
Westfall, J., Judd, C. M., & Kenny, D. A. (2015). Replicating studies in which samples of participants respond to samples of stimuli. Perspectives on Psychological Science, 10, 390–399.
Westfall, J., Kenny, D. A., & Judd, C. M. (2014). Statistical power and optimal design in experiments in which samples of participants respond to samples of stimuli. Journal of Experimental Psychology: General, 143, 2020–2045.
Christian Unkelbach
Department of Psychology
Universität zu Köln
Richard-Strauss-Str. 2
50931 Köln
Germany
Tel. +49 221 470-2001
E-mail [email protected]