Received: 16 July 2018
Accepted: 13 August 2018
DOI: 10.1002/hbm.24371
COMMENT

Reporting matters: Brain mapping with transcranial magnetic stimulation

Martin E. Héroux 1,2

1 Neuroscience Research Australia, Randwick, New South Wales, Australia
2 University of New South Wales, Randwick, New South Wales, Australia

Correspondence: Martin Héroux, Neuroscience Research Australia, Randwick, New South Wales 2031, Australia. Email: [email protected]
Transcranial magnetic stimulation (TMS) allows researchers to noninvasively probe the human brain. In a recent issue of Human Brain Mapping, Massé-Alarie, Bergin, Schneider, Schabrun, and Hodges (2017) used this technique to investigate the task-specific organization of the primary motor cortex for control of human forearm muscles. Specifically, TMS was used to create cortical topographical maps of four forearm muscles at rest, and, in one of these muscles, during isometric wrist extension and isometric grip. The authors were interested in how these maps differ between muscles, how they overlap, and how they change with different motor tasks. Key to their approach was the use of indwelling fine-wire electrodes to record motor evoked potentials elicited by magnetic stimulation, which revealed that the size of cortical maps is grossly overestimated when evoked potentials are recorded from electrodes placed on the skin surface.

In their paper, Massé-Alarie et al. (2017) set (statistical) significance at p < 0.05. Yet the authors interpret several p values above this threshold as statistical trends. Post hoc analyses were even performed for main effects that were not statistically significant:

"Although narrowly missing significance, there was a tendency for a main effect of Pairs of tasks for percentage of MEP peak overlap (F(1, 13) = 3.30; p = 0.053), which was explained by a tendency towards a greater percentage peak overlap in Rest-Ext (32.1 ± 9.6%) than Rest-Grip (14.3 ± 8.2%; p = 0.02; Fig. 4B), and by a tendency toward a greater percent peak overlap for Grip-Ext (26.8 ± 9.7%) compared to Rest-Grip (p = 0.09)." (p. 6126)

"Finally, although nonsignificant, there was a tendency toward a main effect for muscle pairs for percentage of MEP peak overlap (F(2, 18) = 3.23; p = 0.06). This was explained by a tendency toward a larger overlap between ECRBfw-surf (43.5 ± 8.7%) than ECRBfw-EDC (25.2 ± 7.5%; post hoc p = 0.03; Fig. 6C)." (pp. 6126–6127)

The authors later discuss these effects as if they were statistically significant:

"The findings that peak overlap was larger between ECRBsurf and ECRBfw than for ECRBfw and EDC implies that […]." (p. 6130)

Implicit in this type of interpretation is the assumption that statistical trends reflect real effects. However, additional data are more likely than not to turn a trend into a nonsignificant result (Wood, Freemantle, King, & Nazareth, 2014). Is this a problem? In this instance the spin is blatant and an astute reader can draw their own conclusions. However, spin in its various forms is so common (Bero, 2018; Chiu, Grundy, & Bero, 2017; Héroux, 2016) that readers and reviewers may be wooed by such biased interpretations. Why do I say biased? The authors are lenient in only one direction. In their paper, Massé-Alarie et al. (2017) report 11 p values that fall between 0.02 and 0.05. Why are these not reported as tending toward nonsignificance? Regardless of whether the p values were just above or just below the threshold of p = 0.05, an exact replication (i.e., same sample size and methods) of this study has only a 50% chance of reproducing these statistically significant (or near significant) effects (Button et al., 2013; Forstmeier, Wagenmakers, & Parker, 2017). As recently pointed out, p values are fickle (Cumming, 2014; Halsey, Curran-Everett, Vowler, & Drummond, 2015), especially when sample size is small (Button et al., 2013; Higginson & Munafò, 2016). Thus, how confident should we be about the results of Massé-Alarie et al. (2017)? Looking back, how confident should we be about our own work? This type of nuanced view is not common. But we need more of it, especially in noninvasive brain stimulation, where many published effects are simply not reproducible (Héroux, Taylor, & Gandevia, 2015; Héroux, Loo, Taylor, & Gandevia, 2017).

Another reporting matter in the paper by Massé-Alarie et al. (2017) is the all-too-common use of the standard error of the mean (SEM) to summarize data variability (Héroux, 2016; Héroux et al., 2017; Weissgerber, Milic, Winham, & Garovic, 2015). This is not what the SEM quantifies. But does it actually matter what measure is reported? Experts think so (Curran-Everett & Benos, 2004), as do I. Here are some of the above results reported with standard deviations:

"[…] which was explained by a tendency towards a greater percentage peak overlap in Rest-Ext (32.1 ± 35.9%) than Rest-Grip (14.3 ± 31.1%; p = 0.02; Fig. 4B), and by a tendency toward a greater percent peak overlap for Grip-Ext (26.8 ± 36.3%) compared to Rest-Grip (p = 0.09)."

Given that percentages are bounded between 0 and 100, what does 14.3 ± 31.1% actually mean? What do the underlying data look like? Reporting results with standard deviations or other appropriate measures of variability does not affect statistical tests (a significant result will remain a significant result), so let us not be afraid of them. It also provides the reader with a better sense of the underlying data, which is important to appropriately interpret study results and figures (Belia, Fidler, Williams, & Cumming, 2005; Curran-Everett & Benos, 2004; Drummond & Vowler, 2011).

Exploratory research is important to identify new avenues of research and test new hypotheses, and the paper by Massé-Alarie et al. (2017) raises many interesting questions about how the human primary motor cortex is organized. Nevertheless, I encourage the authors and others in the field to be mindful when reporting and interpreting study results, especially when sample sizes are relatively small. Let us heed the advice of experts and, as a field, strive toward publishing research that is less biased, and more reproducible and transparent.

© 2018 Wiley Periodicals, Inc.

ORCID

Martin E. Héroux
http://orcid.org/0000-0002-3354-7104
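A note on the standard-deviation values quoted above: a standard deviation can be recovered from a reported SEM as SD = SEM × √n. The short Python sketch below illustrates the conversion; it assumes n = 14, as implied by the F(1, 13) degrees of freedom reported by Massé-Alarie et al. (2017), and the condition labels are taken from their quoted results.

```python
import math

# Convert a reported SEM back to a standard deviation: SD = SEM * sqrt(n).
# n = 14 is an assumption inferred from the F(1, 13) degrees of freedom.
n = 14

reported_sems = {"Rest-Ext": 9.6, "Rest-Grip": 8.2, "Grip-Ext": 9.7}
for condition, sem in reported_sems.items():
    sd = sem * math.sqrt(n)
    print(f"{condition}: SEM {sem}% -> SD {sd:.1f}%")
```

This recovers the 35.9% and 36.3% quoted above; the Rest-Grip value comes out at 30.7% rather than 31.1%, presumably because the published SEM of 8.2 is itself rounded.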
REFERENCES

Belia, S., Fidler, F., Williams, J., & Cumming, G. (2005). Researchers misunderstand confidence intervals and standard error bars. Psychological Methods, 10, 389–396.

Bero, L. (2018). Meta-research matters: Meta-spin cycles, the blindness of bias, and rebuilding trust. PLoS Biology, 16, e2005972.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.

Chiu, K., Grundy, Q., & Bero, L. (2017). 'Spin' in published biomedical literature: A methodological systematic review. PLoS Biology, 15, e2002173.

Cumming, G. (2014). Understanding the new statistics: Effect sizes, confidence intervals, and meta-analysis. New York, NY: Routledge.

Curran-Everett, D., & Benos, D. J. (2004). Guidelines for reporting statistics in journals published by the American Physiological Society. Journal of Applied Physiology, 97, 457–459.

Drummond, G. B., & Vowler, S. L. (2011). Show the data, don't conceal them. Journal of Physiology, 589, 1861–1863.

Forstmeier, W., Wagenmakers, E. J., & Parker, T. H. (2017). Detecting and avoiding likely false-positive findings: A practical guide. Biological Reviews of the Cambridge Philosophical Society, 92, 1941–1968.

Halsey, L. G., Curran-Everett, D., Vowler, S. L., & Drummond, G. B. (2015). The fickle P value generates irreproducible results. Nature Methods, 12, 179–185.

Héroux, M. E. (2016). Inadequate reporting of statistical results. Journal of Neurophysiology, 116, 1536–1537.

Héroux, M. E., Loo, C. K., Taylor, J. L., & Gandevia, S. C. (2017). Questionable science and reproducibility in electrical brain stimulation research. PLoS One, 12, e0175635.

Héroux, M. E., Taylor, J. L., & Gandevia, S. C. (2015). The use and abuse of transcranial magnetic stimulation to modulate corticospinal excitability in humans. PLoS One, 10, e0144151.

Higginson, A. D., & Munafò, M. R. (2016). Current incentives for scientists lead to underpowered studies with erroneous conclusions. PLoS Biology, 14, e2000995.

Massé-Alarie, H., Bergin, M. J. G., Schneider, C., Schabrun, S., & Hodges, P. W. (2017). "Discrete peaks" of excitability and map overlap reveal task-specific organization of primary motor cortex for control of human forearm muscles. Human Brain Mapping, 38, 6118–6132.

Weissgerber, T. L., Milic, N. M., Winham, S. J., & Garovic, V. D. (2015). Beyond bar and line graphs: Time for a new data presentation paradigm. PLoS Biology, 13, e1002128.

Wood, J., Freemantle, N., King, M., & Nazareth, I. (2014). Trap of trends to statistical significance: Likelihood of near significant P value becoming more significant with extra data. British Medical Journal, 348, g2215.
How to cite this article: Héroux ME. Reporting matters: Brain mapping with transcranial magnetic stimulation. Hum Brain Mapp. 2018;1–2. https://doi.org/10.1002/hbm.24371
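The claim that an exact replication has roughly a 50% chance of reproducing a just-significant effect can be checked with a short simulation. The sketch below is illustrative only, not the authors' analysis: it assumes a one-sample t-test with n = 14 (matching the F(1, 13) design) and a true effect placed exactly at the significance threshold.

```python
import math
import random
import statistics

random.seed(1)

n = 14                # sample size (an assumption, matching df = 13)
t_crit = 2.1604       # two-tailed critical t for alpha = 0.05 with df = 13
true_mean = t_crit / math.sqrt(n)   # true effect sitting exactly at threshold

n_replications = 20_000
significant = 0
for _ in range(n_replications):
    sample = [random.gauss(true_mean, 1.0) for _ in range(n)]
    t = statistics.mean(sample) / (statistics.stdev(sample) / math.sqrt(n))
    if abs(t) > t_crit:
        significant += 1

print(f"Exact replications reaching p < 0.05: {significant / n_replications:.2f}")
```

The proportion hovers around one half: under these assumptions, an exact replication of a threshold-level effect is about as likely to miss significance as to reach it.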