Language Learning
ISSN 0023-8333
Guidelines for Reporting Quantitative Methods and Results in Primary Research

John M. Norris (Georgetown University), Luke Plonsky (Northern Arizona University), Steven J. Ross (University of Maryland), and Rob Schoonen (University of Amsterdam)
Adequate reporting of quantitative research about language learning involves careful consideration of the logic, rationale, and actions underlying both study designs and the ways in which data are analyzed. These guidelines, commissioned and vetted by the board of directors of Language Learning, outline the basic expectations for reporting of quantitative primary research with a specific focus on Method and Results sections. The guidelines are based on issues raised in: Norris, J. M., Ross, S., & Schoonen, R. (Eds.). (2015). Improving and extending quantitative reasoning in second language research. Currents in Language Learning, volume 2. Oxford, UK.

Keywords: research reporting; research design; data analysis; statistics; statistical significance
Introduction

These guidelines are intended to provide authors and reviewers with basic recommendations for the reporting of designs, data, and analyses typically associated with experimental, quasi-experimental, survey/questionnaire, relational, descriptive, structural, and other primary research approaches that rely on the quantification of observations. The guidelines are divided into two main sections—Method and Results—and multiple subsections corresponding to typical components of a publishable research report (as described in Sections 2.06 and 2.07 of the Publication Manual of the American Psychological Association, 6th edition). It is not the intent of these guidelines that reports be
We appreciate the encouragement to produce these guidelines, and the feedback provided on them by the Board of Directors of Language Learning. Correspondence concerning this article should be directed to John M. Norris, Georgetown University, Department of Linguistics, 1421 37th St. NW, Box 571051, Washington, DC 20057–1051. E-mail: [email protected]
Language Learning 65:2, June 2015, pp. 470–476 C 2015 Language Learning Research Club, University of Michigan
DOI: 10.1111/lang.12104
structured exactly according to the section and subsection headers provided; depending on the particular approach to quantitative research, distinct categorization, ordering, and summarization or expansion of detail will be required. In order to answer research questions, support conclusions, make studies amenable to critical evaluation, and contribute otherwise to the accumulation of credible knowledge, the reporting of Method and Results sections requires careful attention to particular details, as outlined in the following.

Reporting Recommendations for Method Sections

Provide information sufficient to enable: (a) accurate and complete interpretation of the study setting, procedures, and analyses; (b) comparison, contrast, and generalization with other studies and settings (e.g., in meta-analyses); and (c) replication. Use the past tense to describe the methods used and data collected (i.e., because they occurred in the past).

Population and Participants

(a) Define the human or other population(s) of interest for the study, including key demographic characteristics or other parameters that limit generalizability and help operationalize who/what is under investigation, such as age, gender, language (first, second, other), proficiency (using standardized or otherwise generalizable measures), and language learning/use setting.

(b) Report precise frequencies of participants reflecting each key demographic characteristic or case feature, including how they were determined (e.g., the language proficiency measures utilized, along with range (minimum and maximum scores), central tendency (e.g., mean), and dispersion (e.g., standard deviation) estimates for participant scores).
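The summary values called for in (b) above are straightforward to compute. As a minimal sketch (in Python, using hypothetical proficiency scores for one participant group), the following returns the quantities a report would present:

```python
from statistics import mean, stdev

def summarize(scores):
    """Return the descriptive values recommended for reporting:
    n, mean (M), standard deviation (SD), minimum, and maximum."""
    return {
        "n": len(scores),
        "M": round(mean(scores), 2),
        "SD": round(stdev(scores), 2),
        "min": min(scores),
        "max": max(scores),
    }

# Hypothetical proficiency scores for one group of participants
scores = [54, 61, 58, 47, 66, 59, 52, 63]
print(summarize(scores))
```

The same summary would be produced per subgroup and per measure, so that every cell of the design is characterized.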
Sampling, Assignment, and Power

(a) Describe precisely how individuals/cases were recruited or otherwise identified for participation or inclusion, including the sampling approach (census, convenience, random selection, self-selection, etc.), constraints on the opportunity to participate (e.g., specific location, timing), communication with possible participants, and incentives provided.

(b) Report response rates, approached/accepted rates, attrition, and other estimates of the likelihood of achieving robust population representation (i.e., the extent to which the study participants or cases can be presumed to reflect a defined population).

(c) Describe how participants were assigned to study conditions or groups (random, stratified, counterbalanced, intact groups, etc.).

(d) Estimate the number of participants or cases
needed to arrive at trustworthy interpretations given the research questions and complexity of planned analyses; statistical power analysis is advisable when a particular effect size is known or anticipated in advance of the study; otherwise, consideration should be given to the minimal number of observations beyond which the study can begin to answer research questions or research hypotheses (e.g., in light of the planned number and type of inferential analyses, indicating whether minimum expectations were met).
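The sample-size estimation described in (d) can be sketched as follows. This is the standard normal-approximation shortcut for a two-tailed, two-group comparison of means; an exact t-based power analysis (e.g., in dedicated software such as G*Power) yields slightly larger numbers, so treat this as a lower bound rather than a definitive answer:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size needed to detect a
    standardized effect d in a two-tailed, two-sample comparison
    of means (normal approximation to the power calculation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # e.g., 1.96 for alpha = .05
    z_beta = z(power)           # e.g., 0.84 for power = .80
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A medium anticipated effect (d = 0.5) under conventional
# settings implies roughly 63 participants per group.
print(n_per_group(0.5))
```

When no effect size can be anticipated in advance, this kind of calculation can still be run over a plausible range of d values to gauge whether the available sample is adequate.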
Measurement

(a) Operationalize the main phenomena in the study (e.g., variables, constructs), including any that required observation or control, such as independent, moderating, and dependent variables.

(b) Describe exactly the instruments and procedures used to measure values on observed phenomena, including methods for collecting as well as extracting, coding, and/or scoring data (appending instruments whenever feasible, or referring to sources where those are already published or publicly available).

(c) Provide evidence regarding the consistency of the measurement instruments and associated techniques (e.g., coding, scoring), including in particular coder or rater reliability, overall instrument score reliability (e.g., internal consistency), and the reliability of subscores reflecting distinct constructs in the research (reliability estimates from other studies should be used for comparison purposes only).

(d) Provide validity evidence (either directly in the study itself, e.g., via pilot testing, or indirectly on the basis of previous research) supporting the use of the given measurement instruments for the intended construct interpretations being made in the actual study with the actual population sample.
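The internal consistency called for in (c) is commonly estimated with Cronbach's alpha. A minimal sketch of the computation (in Python, with hypothetical item-level scores) might look like this:

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list of participant scores per item,
    aligned by participant. Returns Cronbach's alpha,
    an estimate of internal consistency across items:
    alpha = k/(k-1) * (1 - sum(item variances) / total variance)."""
    k = len(item_scores)
    totals = [sum(person) for person in zip(*item_scores)]
    sum_item_vars = sum(variance(scores) for scores in item_scores)
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Hypothetical scores on three items from four participants
items = [[4, 5, 3, 4],
         [3, 5, 2, 4],
         [4, 4, 3, 5]]
print(round(cronbach_alpha(items), 2))
```

In a real report the estimate would be computed on the study's own data (not borrowed from other studies) and reported separately for each subscale that reflects a distinct construct.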
Design and Procedures

(a) Specify the research design adopted for answering research questions or addressing hypotheses (e.g., descriptive, relational, predictive, quasi-experimental, true experimental, modeling).

(b) Describe the study setting, paying particular attention to any factors that might have influenced observed phenomena (i.e., internal validity) or limited generalizability (i.e., external validity).

(c) Explain how key variables (independent or predictor, dependent or criterion, moderating) were manipulated or controlled, including, in particular, detailed accounts of treatments or other interventions.

(d) Document precisely the procedures followed in collecting data, including the instructions and any other input provided to participants at all stages of data collection.
Analysis

(a) Describe how patterns in the data were analyzed in light of the research questions or hypotheses, methodological features of the study, types of measurement, and so forth; provide a justification for the specific analyses chosen.

(b) Where inferential statistics are employed (i.e., statistical significance testing), explain why particular models or techniques were adopted in light of the study design and the qualities of the data collected (see Assumptions in the Results section below).

(c) Before engaging in statistical inferences, establish a level of expected statistical probability (e.g., p < .05, p < .01) on the basis of reasonable knowledge of the phenomena under investigation and the caution necessary for interpreting comparisons.

(d) This initial alpha level sets the overall level of statistical probability required in a given study; however, multiple repeated inferential testing of the same data set threatens the intended alpha level by compounding error with each new analysis. The alpha level for specific inferential analyses should therefore be adjusted for the overall number of such analyses undertaken (e.g., using a Bonferroni correction or an alternative omnibus analysis).

(e) When Bayesian hypothesis testing is used, specify the basis of the chosen prior probabilities and outline the interpretation of posterior probabilities.

(f) For models with fixed and random effects, provide a justification for selecting random intercepts and slopes, and provide evidence that nested models have been adequately tested.

(g) When structural equation or confirmatory factor models are used, specify a priori what the structural hypotheses are, and avoid building interpretive arguments based on post hoc models revised according to modification indices.
(h) For factor analysis, describe the specific modeling approach selected (e.g., confirmatory versus exploratory, principal components versus factor analysis), the rotation strategy and type adopted, and the criteria for selection of factors.

(i) When robust statistics are used, specify the precise criteria by which specific statistics were selected and justify their preference over other traditional analyses.
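The alpha adjustment described in item (d) of the Analysis recommendations can be made in several standard ways. A minimal sketch of the plain Bonferroni correction and of Holm's slightly more powerful step-down variant (both standard procedures, shown here in Python) follows:

```python
def bonferroni_alpha(overall_alpha, n_tests):
    """Per-test alpha that holds the familywise error rate
    at the intended overall level."""
    return overall_alpha / n_tests

def holm_rejections(p_values, alpha=0.05):
    """Holm's step-down procedure: test p values in ascending
    order against alpha / (m - rank), stopping at the first
    failure. Returns a reject/retain flag per original test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            rejected[i] = True
        else:
            break
    return rejected

# Four planned comparisons at an overall alpha of .05:
print(bonferroni_alpha(0.05, 4))  # each test is evaluated at .0125
print(holm_rejections([0.01, 0.04, 0.03, 0.20]))
```

Whichever correction is adopted, the report should state it explicitly along with the number of tests over which the adjustment was made.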
Reporting Recommendations for Results Sections

Provide information sufficient (a) to describe and otherwise characterize the properties of all raw data collected, (b) to warrant the use of particular analytic techniques and adjustments, (c) to display adequately the output of all analyses computed, (d) to facilitate accurate and meaningful interpretation of findings, and (e) to enable meta-analytic and other secondary analyses of data from the study.
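Several of these reporting elements (group descriptives, a standardized effect size, and a confidence interval for a two-group comparison) can be illustrated with a short sketch. The scores are hypothetical, and the interval uses the normal critical value 1.96, so for small samples a t-based interval would be somewhat wider:

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(a, b):
    """Standardized mean difference (pooled-SD version of d)."""
    na, nb = len(a), len(b)
    pooled_sd = sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                     / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled_sd

def mean_diff_ci(a, b, crit=1.96):
    """Approximate 95% CI for the raw mean difference,
    using the normal critical value."""
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    diff = mean(a) - mean(b)
    return (diff - crit * se, diff + crit * se)

# Hypothetical scores for two groups
treatment = [5, 6, 7, 8, 9]
control = [3, 4, 5, 6, 7]
print(round(cohens_d(treatment, control), 2))
print(mean_diff_ci(treatment, control))
```

Reporting d (or r) together with the interval around the raw difference gives readers both a standardized magnitude for meta-analysis and an error estimate for interpretation, as the recommendations below spell out.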
Descriptive Statistics and Graphical Analyses

(a) For all groups, subgroups, and cells on all measures, always report and interpret N (sample size and, where appropriate, also n for subsample sizes), M (means or other central tendency estimates), SD (standard deviations or other dispersion estimates), and related ways of describing group data (e.g., minimum and maximum scores).

(b) For frequency observations, report the N or n of participants or cases observed and the exact tallies observed for all categories and cells.

(c) For relational analyses, report N or n and exact estimates of correlations for all variables compared.

(d) For means comparisons, frequency tallies, and correlations, consider whether graphic techniques (bar charts, line graphs, histograms, scatterplots, etc.) are useful to provide interpretable summaries of main patterns in the results (e.g., means and 95% confidence intervals) as well as individual variability (e.g., dispersion of cases around means, regression lines, and similar). When graphics are utilized, care should be taken to avoid redundancy (e.g., repetition of information in tables and graphs that already appeared in the main text), to reflect the scale of observations accurately, and to represent maximally useful/interpretable data.

Effect Sizes and Confidence Intervals

(a) For all comparisons (i.e., between groups, frequencies, or correlations), report standardized estimates of the effect or relationship observed (e.g., d, r) and an estimate of the error associated with the comparison or relationship (e.g., 95% confidence intervals or the standard error).

(b) Interpret point estimates (effect sizes) and their associated confidence intervals with respect to one or more of the following: field- and domain-specific guidelines, previous research, predictions of theory, practical constraints, and the type(s) of instrumentation involved (in the Results as well as the Discussion section).
(c) Provide effect size estimates (R²) for multiple regression models, and report variance explained for factor analysis.

(d) Include graphical representation of credible intervals for Bayesian parameter estimates.

Assumptions

(a) For each inferential analysis utilized, the specific assumptions about data quality should be tested (descriptively, graphically) and the findings reported (e.g., normality of distributions, absence of outliers, homogeneity of variance, factorability of data sets). Where assumptions are not met, the corresponding inferential statistical analyses are not warranted, and alternative (e.g., nonparametric) analyses, adjusted interpretations, or descriptive approaches should be adopted instead (see Analysis in the Method section; assumptions are typically discussed
there, while corresponding data may be best reported along with other statistics in the Results section).

(b) Probability-based tests of certain assumptions (e.g., Levene's test for equality of error variances) are notoriously susceptible to differences in sample sizes (e.g., the higher the N, the easier it becomes to find statistically significant inequality of error variances); such tests should be used with caution and backed up with additional (descriptive) inspection of the data.

(c) Making simple mathematical adjustments to data (e.g., transforming scores toward a normal distribution, eliminating or reining in severe outliers) is a legitimate technique for enhancing inferential analyses; however, care should be taken to avoid altering the reality of the observed phenomena, and any adjustments made should be reported in full, along with a justification for doing so.

Statistical Tests

(a) The precise inferential test utilized should be reported (e.g., a 2 × 3 factorial analysis of variance [ANOVA]), including the nature of the significance test (e.g., one-tailed, two-tailed).

(b) The full output of inferential tests should be reported, including at minimum the magnitude and direction of the statistic itself (e.g., the t or F value), the degrees of freedom used in the calculation, and the exact p value (all reported regardless of "significance").

(c) Where follow-up or post hoc analyses are employed, the specific name of the analysis should be reported (e.g., Tukey's HSD test) along with the exact test statistic and exact p values for each comparison.

(d) If confirmatory ANOVA models are used, provide a clear rationale for the hierarchy of plausible rival hypotheses and, wherever feasible, avoid a "straw man" null hypothesis.

(e) For factor analysis, report the number of variables, the number of factors, eigenvalues for factors, and factor loadings and communalities for variables.
(f) Effect sizes associated with inferential comparisons (e.g., eta-squared) should be reported precisely (i.e., partial, squared, etc.) and interpreted as the amount of variance accounted for by the particular factor or variable and/or combination of factors or variables. Note that these effect sizes do not provide the same information as standardized effect sizes that focus on the comparison or contrast of pairs of variables, such as d or r; hence, both should be reported.

Closing Remarks

The guidelines above address basic expectations for two main sections of quantitative primary research articles: Method and Results. However, good reporting in all other sections of an article is equally important. Preliminary sections of a quantitative research report should serve to situate the study
adequately within a particular domain of research; introduce key theories, concepts, and constructs of interest; review methods and findings from similar previous studies; state clearly a problem or gap addressed by the study; spell out specific research questions or hypotheses; and otherwise motivate the need for and approach to the study. Similarly, the Discussion section should summarize and synthesize findings presented in the Results, answer research questions and adjudicate hypotheses, address methodological limitations, relate findings to those of previous studies, and suggest implications for adjusting theories or informing applications of new knowledge. In sum, we hope these guidelines are useful in supporting the quality of reporting in Method and Results sections of empirical articles. Much of what is covered here also implies careful consideration of what is reported in additional sections of research articles, including the introduction and literature review, as well as the discussion, limitations, conclusion, and referencing.

Final revised version accepted 10 February 2015

Suggested Reading

The following selected resources provide important points of comparison and expansion in relation to the guidelines presented here. Readers are encouraged also to consult the chapters in the second volume of Currents in Language Learning, published as Supplement s1 of Volume 65, 2015.

American Educational Research Association. (2006). Standards for reporting on empirical social science research in AERA publications. Educational Researcher, 35, 33–40.

Cumming, G. (2014). The new statistics: Why and how. Psychological Science, 25, 7–29.

Hancock, G. R., & Mueller, R. O. (Eds.). (2010). The reviewer's guide to quantitative methods in the social sciences. New York: Routledge.

Journal Article Reporting Standards Working Group. (2008). Reporting standards for research in psychology: Why do we need them? What might they be? American Psychologist, 63, 839–851.
Wilkinson, L., & Task Force on Statistical Inference. (1999). Statistical methods in psychology journals: Guidelines and explanations. American Psychologist, 54, 594–604.