Sports Biomechanics March 2009; 8(1): 96–104


Significant and meaningful effects in sports biomechanics research

DUANE KNUDSON
Department of Kinesiology, California State University, Chico, California, USA
(Received 28 February 2008; revised 7 August 2008; accepted 8 August 2008)

Abstract
Errors in statistical analysis of multiple dependent variables and in documenting the size of effects are common in the scientific and biomechanical literature. In this paper, I review these errors and several solutions that can improve the validity of sports biomechanics research reports. Studies examining multiple dependent variables should either control for the inflation of Type I errors (e.g. Holm’s procedure) during multiple comparisons or use multivariate analysis of variance to focus on the structure and interaction of the dependent variables. When statistically significant differences are observed, research reports should provide confidence limits or effect sizes to document the size of the effects. Authors of sports biomechanics research reports are encouraged to analyse and present their data accounting for the experiment-wise Type I error rate, as well as reporting data documenting the size or practical significance of effects reaching their standard of statistical significance.

Keywords: Alpha, effect size, error, multiple comparisons, validity

Correspondence: D. Knudson, Department of Kinesiology, California State University, Chico, 400 W. First St., Chico, CA 95929-0330, USA. E-mail: [email protected]
ISSN 1476-3141 print/ISSN 1752-6116 online © 2009 Taylor & Francis
DOI: 10.1080/14763140802629966

Introduction

Scientific research reports follow a fairly standard format of justifying the need for a study, describing the methodology used, reporting the observations, and discussing their meaning. The journal peer review process is designed to improve the quality of these reports and to make qualitative judgements that provide several assurances. Reviewers and editors typically affirm that the study makes some contribution to the field, that appropriate controls and methods were used, and that the observations and conclusions drawn are appropriate. The publication of a research report is some assurance that the statistically significant results were also meaningful and, consequently, can be integrated with other studies to contribute to the theoretical or applied knowledge base. Publication in Sports Biomechanics also implies the added assurance of potential practical application of the results to sports performance or injury prevention.

Unfortunately, there are numerous threats to the internal and external validity of a research report. Internal validity relates to the accuracy of the measurements and the design of the experiment to control for extraneous variables. If internal validity can be established, then external validity can be considered. External validity concerns the generalization or application of results to the whole population and, therefore, is related to the sampling, assumptions, and results of the inferential statistical tests. These threats to research validity
are especially great in a discipline like sports biomechanics, where applied knowledge is required of a complex system in a wide variety of environments. Some important issues related to the internal and external validity of biomechanical studies are the design of the study, the sampling, the biofidelity of the model and variables examined, the reliability of the dependent variables (Y), and the statistical analyses performed. The logic, critical integration of previous research, and design of the study are arguably some of the most important factors in study internal validity. For example, in biomechanics there has been a long-standing debate about whether within-subject or between-subjects designs are more appropriate for certain research questions (Bates, 1989; Mullineaux et al., 2001). Prospective authors should consult authoritative sources on study design (e.g. Kirk, 1995) and on levels of evidence for weighing or applying research in practice (Ebell et al., 2004; Hadorn et al., 1996; Knudson, 2005b). Once the appropriate design for the research question has been selected, there still remain several challenges to study validity in sports biomechanics. The current paper reviews two of these challenges that are common errors in the sports biomechanics and sports science research literature: errors in the statistical analysis of multiple dependent variables and errors in interpreting the size of effects. Several statistical and analysis solutions that can be used to address these problems and improve sports biomechanics knowledge are presented. Similar papers outlining statistical recommendations for submission to journals in other disciplines are also available (American Psychological Association, 2001; Bond et al., 1995; Curran-Everett and Benos, 2004; International Committee of Medical Journal Editors, 1997; Kirk, 2001; Thompson, 1996).

Errors in statistical analysis of multiple dependent variables

Advancements in the discipline of statistics are often not quickly adopted by researchers in other fields. Efforts to “reform” the use of statistical methods in many disciplines have been difficult and have met with varying degrees of success (Altman, 1998; Fidler and Cumming, 2007; Kirk, 2001). Over the last few decades, several scholars have also noted that many biomechanics research reports have used inappropriate statistical analyses (James and Bates, 1997; Knudson, 2005a; Morris, 1981; Mullineaux et al., 2001). These problems compromise the validity of the results of these studies and represent a threat to the knowledge base in sports biomechanics. These problems persist because the self-correcting nature of science is often not effective when poor scholarship and peer review follow these errors (Ioannidis, 2005). For example, one study of psychiatric journals reported no difference in nine-year citation counts between articles with and without statistical reporting and analysis errors (Nieminen et al., 2006). Research from several disciplines has reported a large proportion (10–70%) of papers with incorrect citation of results and references (Lok et al., 2001). Sports biomechanics likely has similar problems of inappropriate use of statistical analyses, with these incorrect results cited and perpetuated over many years.

One of the most common errors is the use of multiple univariate statistical tests [e.g. multiple t-tests, analysis of variance (ANOVA), repeated-measures ANOVA] on numerous dependent variables in a study, which inflates the Type I error rate (Atkinson, 2002; Bender and Lange, 2001; Hsueh et al., 2003; Lang and Secic, 2006; Ludbrook, 1998; Morris, 1981; Morrow and Frankiewicz, 1979; Sankoh et al., 1997; Thompson, 1998). This family-wise or experiment-wise error rate (EER) grows with the number of comparisons (n). If a study tests n = 5 dependent variables and these variables have no correlation with each other, the true Type I error rate for the study is P < 0.23 (EER = 1 − (1 − α)^n) for the traditional alpha level of 0.05. If some dependent variables are correlated with each other,
the true Type I error rate (EER) will still be larger than the observed P-value in a statistical printout, although not as large as in the independent case. Unfortunately, once an investigator chooses to test multiple dependent variables, the Type I error rate for the study is the EER, and it cannot be calculated exactly for most situations of mixed dependence among the dependent variables. The inflation of the EER is especially important to control in clinical trials or genetic studies where hundreds of statistical tests can be made. Knudson (2005a) reported that 73% of papers published in the Journal of Applied Biomechanics and the Proceedings of the International Society of Biomechanics in Sports did not control for inflation of the EER, even though many simple techniques to correct for this error have been proposed since the 1950s (Ludbrook, 1998). A review of the 2007 issue of Sports Biomechanics using Knudson’s (2005a) methodology found that all the experimental reports in this issue had multiple dependent variables and 81% of these reports did not account for the increased Type I error rate.

This inflation of Type I errors in sports biomechanics research represents a threat to the knowledge base. There are many observations in sports biomechanics experimental studies that are spurious “significant” effects, primarily because the authors did not take into account the large number of dependent variables tested. For example, two recent studies have shown that meaningless variables (physiological or biomechanical) can be identified as “statistically significant” by uncorrected multiple statistical comparisons (Austin et al., 2006; Knudson, 2007). The combination of these inflated Type I errors, high levels of non-replication, and poor integration of previous research has contributed to likely high levels of false research findings in scientific papers (Ioannidis, 2005).

It is not hard to understand why this data analysis error is so common in biomechanics, beyond the slow adoption of new statistical techniques in a non-statistics discipline. Biomechanical data are time-varying functions; the numerous variables, many with vector components, allow researchers to collect literally hundreds to thousands of dependent variables. Improvements in computing speed and automated data collection increase the potential for the analysis of numerous variables. Fortunately, if a sports biomechanics researcher needs to look at numerous dependent variables individually, there are effective univariate statistical techniques that can adjust for inflation of the Type I error rate. Multivariate statistical techniques are more complex and harder to interpret, but can identify combinations of dependent variables related to the independent variables of interest.

Dealing with multiple statistical comparisons

Some experimental studies (exploratory, rare subjects, etc.) in sports biomechanics need to examine the interaction of many dependent variables. Multivariate analysis of variance (MANOVA) is designed for exploring the structure, order, and selection of multiple dependent variables. MANOVA, however, has several drawbacks, including difficulty of interpretation (when combinations of variables or factors are not of interest), the intercorrelation of variables, and a wide variety of test statistics and step-down procedures (Morrison, 2005; Rencher, 1998; Tabachnick and Fidell, 2001).
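For illustration only, here is a minimal sketch of a one-way MANOVA in Python using the statsmodels package; the grouping factor, the three joint-moment variable names, and the synthetic data are hypothetical and not taken from any study cited here.

```python
# Minimal sketch: one-way MANOVA on three hypothetical dependent variables,
# testing the vector of dependent variables in a single omnibus test instead
# of three separate ANOVAs that would inflate the experiment-wise error rate.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
n = 10  # trials per group (hypothetical)

data = pd.DataFrame({
    "group": ["novice"] * n + ["elite"] * n,
    "hip_moment": np.r_[rng.normal(2.0, 0.3, n), rng.normal(2.3, 0.3, n)],
    "knee_moment": np.r_[rng.normal(1.5, 0.2, n), rng.normal(1.6, 0.2, n)],
    "ankle_moment": np.r_[rng.normal(1.2, 0.2, n), rng.normal(1.2, 0.2, n)],
})

maov = MANOVA.from_formula("hip_moment + knee_moment + ankle_moment ~ group", data=data)
print(maov.mv_test())  # Wilks' lambda, Pillai's trace, Hotelling-Lawley trace, Roy's root
```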
Sometimes a MANOVA is performed on the combination of all dependent variables, or on groups of dependent variables, with significant effects followed up by univariate ANOVAs. This approach, however, does not protect against the inflation of Type I error (Tabachnick and Fidell, 2001) and has been challenged as inappropriate (Huberty and Morris, 1989; Share, 1984). Another approach is to examine each dependent variable individually using several univariate statistical tests, adjusting the critical alpha level to control the overall (family-wise) EER or the probability of spurious rejections (the false discovery rate). Many readers may be familiar
with Bonferroni adjustments (critical alpha = overall alpha divided by the number of comparisons) to limit the inflation of the EER. Unfortunately, this approach is not normally recommended because it is very conservative (it overcorrects for Type I errors) and, consequently, increases the likelihood of Type II errors and decreases statistical power. One common error with this adjustment is to report each significant test at the critical alpha level, not the overall alpha level (Bland and Altman, 1995). A better method that controls for both Type I and Type II errors, and retains the advantages of ease of calculation and interpretation, is Holm’s correction (Ludbrook, 1998). Holm’s procedure is a progressive adjustment of the critical alpha level based on the number of comparisons (n) and the overall desired EER (α). The hypotheses tested are arranged from the smallest to the largest observed P-value. For comparisons (i) from 1 to n, each observed P-value is compared with the adjusted critical P-value according to the formula: Pc = α/(n − i + 1). Holm’s adjustment is just one of several easily implemented alpha-adjustment procedures in the statistics literature, but it is recommended here because of its strong combination of preserving statistical power and ease of use.

There are many other advanced methods for controlling the increase in Type I errors when there are multiple comparisons of dependent variables, methods that take into account the association between the dependent variables. These other alternatives to limit the increase in the EER are based on setting a low false discovery rate (Benjamini and Hochberg, 1995; Hsueh et al., 2003; Storey, 2002). These methods are fairly new and computationally demanding, and are therefore not easily adopted by many researchers. Authors of experimental papers submitted to Sports Biomechanics with multiple dependent variables are encouraged either to use multivariate statistical techniques or to show how they account for inflation of the EER or for falsely rejected hypotheses. Care must be taken to account for all omnibus tests of null hypotheses of multiple dependent variables in a study. Any post hoc tests that follow up a significant main effect within each ANOVA, however, do not have to be included in the count of comparisons. Post hoc tests should be held at the alpha level of the overall test; failing to do so is another common error in research reports.
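To make these procedures concrete, here is a minimal sketch in Python (not from the original paper) that computes the experiment-wise error rate for n independent tests and applies Holm’s step-down adjustment; the five dependent variables and their P-values are made up for illustration.

```python
# Sketch: experiment-wise error rate (EER) for n independent tests at alpha,
# followed by Holm's step-down adjustment as described in the text.
alpha = 0.05
n = 5
eer = 1 - (1 - alpha) ** n
print(f"EER for {n} independent tests at alpha = {alpha}: {eer:.3f}")  # about 0.226

# Holm's procedure: sort observed P-values ascending and compare each, in turn,
# with alpha / (n - i + 1); stop at the first comparison that is not significant.
p_values = {"peak_force": 0.004, "contact_time": 0.011,
            "impulse": 0.020, "peak_power": 0.030, "stiffness": 0.041}  # hypothetical

significant = []
for i, (name, p) in enumerate(sorted(p_values.items(), key=lambda kv: kv[1]), start=1):
    p_critical = alpha / (n - i + 1)
    if p <= p_critical:
        significant.append(name)
    else:
        break  # this and all larger P-values are declared non-significant
print("Significant after Holm's adjustment:", significant)
```

With these hypothetical P-values only the two smallest survive the adjustment, illustrating how the procedure preserves more power than a flat Bonferroni correction while still holding the EER at the chosen alpha.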
Errors in interpreting meaningful effects

Another common error that has fuelled the efforts to reform statistical analyses in scientific journals is the confusion of “statistical significance” in traditional null hypothesis significance testing with the magnitude or practical significance of the results of a study. “Statistical significance” only answers the question of how probable it is that an observed difference or association is due to chance variation alone. In other words, it is the strength of the evidence against the null hypothesis. It is important to note that the error rates for these “yes/no” decisions are influenced by the assumptions of these tests (Chen and Zhu, 2001); those assumptions must therefore be verified in the analysis and reported. When a researcher uses relevant and rigorous statistical tests that reach some level of statistical significance, he or she must then interpret the size and/or meaningfulness of the effect.

Several scholars and societies have recommended that research reports focus on this issue of the magnitude or meaningfulness of effects observed in a study (American Psychological Association, 2001; Hopkins, 2002; International Committee of Medical Journal Editors, 1997; Kirk, 1996; Sterne and Smith, 2001; Thomas et al., 1991). The meaningfulness of the results or differences observed should be a part of every research paper in Sports Biomechanics.

A related problem in biomechanical reports is failing to interpret meaningful differences in the context of experimental errors. All measurements are made with both
systematic and random errors, so biomechanical studies should report estimates of these two errors. Typically, the methodology or early results sections can justify and report data on the calibration and experimental errors that define the validity of the model and measurements. The paper should then also address the uncertainty or reliability of the dependent variables examined. In the past, Pearson correlation coefficients of multiple measurements have been incorrectly used as a relative measure of reliability or random error. There are a variety of statistics that provide better measures of reliability and that are easily applied in interpreting the meaningfulness or size of differences and associations (Hopkins, 2000; Lexell and Downham, 2005). These statistics (e.g. mean change, standard deviation of the difference, standard error of measurement, smallest real difference) are in original units and, unlike Pearson correlations, are sensitive to systematic differences between measurements.

Other errors of omission and commission related to the meaningfulness of biomechanical data typically occur in the discussion section of the report. A common error of omission is not discussing the size of differences using effect sizes (Cohen’s d, Glass’ delta, η², or adjusted R²) or confidence intervals. It makes little sense to discuss a “statistically significant” 1% difference in a biomechanical variable when the reliability or biological variation for that variable is ten times larger. Another common error of commission is the over-generalization of “significant” results. Authors often explicitly state or imply that the results can be applied to other athletes, without taking into account the size and representativeness of the sample, as well as other important variables such as age, gender, skill level, and other subject characteristics that influence biomechanical variables.

Reporting meaningful biomechanical differences

There are several ways to document the meaningfulness or practical significance of the results of sports biomechanics studies. Traditionally, this has been based on measures of reliability, experimental error, or biologically meaningful values. More recently, several statistical variables that document the size of effects have become more common. This section will summarize these approaches so that potential authors can select the most appropriate statistic.

Several decades of biomechanics research seem to have created complacency in reporting experimental error estimates in biomechanics reports. One way to document the credibility of the results in a study is to present or cite evidence that the measurement system was accurate enough to measure the specific differences of interest. The precision of measurements (the number of decimal places in an A/D board or in calculations) is not the same as accuracy (Winter, 2008). Precision is the numerical resolution of the measurement system, while accuracy refers to the measurement’s validity in estimating the true value of the measurand (systematic error). All measurement systems used in sports biomechanics should be calibrated against known values, and measures of error should then be made and reported. Measurement reliability is concerned with the random variation in repeated measurements. Variation is all but assured in repeated measurements of some variable, so what is of interest are the size and source of that variation: tester, within-subject, between-subjects, or independent variables.
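As a minimal sketch of the absolute reliability indices recommended in the following paragraphs (mean change, standard error of measurement, coefficient of variation, and smallest real difference), the Python snippet below computes them from hypothetical test–retest jump-height data; the values and the test–retest formulas used (typical error = SD of the differences divided by √2) are assumptions for illustration only.

```python
# Sketch: absolute reliability indices from hypothetical test-retest data.
import numpy as np

# Hypothetical vertical jump heights (m) for 8 athletes measured in two sessions.
trial1 = np.array([0.42, 0.38, 0.45, 0.51, 0.36, 0.48, 0.40, 0.44])
trial2 = np.array([0.44, 0.37, 0.47, 0.50, 0.38, 0.47, 0.41, 0.46])

diff = trial2 - trial1
mean_change = diff.mean()                                  # systematic bias between sessions
sem = diff.std(ddof=1) / np.sqrt(2)                        # typical (random) error in original units
cv = 100 * sem / np.concatenate([trial1, trial2]).mean()   # typical error as a percentage
srd = 1.96 * np.sqrt(2) * sem                              # smallest change exceeding measurement noise

print(f"mean change = {mean_change:.3f} m, SEM = {sem:.3f} m, "
      f"CV = {cv:.1f}%, SRD = {srd:.3f} m")
```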
There are quite advanced statistical procedures (generalizability studies) to identify these sources of variation (Roebroeck et al., 1993); however, many recommendations focus on the easier-to-interpret absolute indices of reliability (Bland and Altman, 1986; Hopkins, 2000; Lexell and Downham, 2005). While reporting a relative index like the intra-class correlation coefficient for the conditions of multiple measurements is acceptable, absolute measures that use original measurement units
and give a sense of the size of meaningful differences are preferred. Reliability indices that should be reported in sports biomechanics papers include the mean change, standard error of measurement, coefficient of variation, and smallest real difference (Lexell and Downham, 2005). Data on the accuracy and reliability of measurements are often presented in the methods section of research reports.

The results and discussion sections of the report tend to document the size, meaning, or clinical importance of the experimental effects. There are over 40 different kinds of effect sizes, which fall into three classes: standardized, variance-accounted-for, and corrected (Vacha-Haase and Thompson, 2004). An effect size is a statistic that quantifies the degree to which sample results differ from expectations (Vacha-Haase and Thompson, 2004). This paper cannot review all of these techniques, but will highlight the statistics that are most useful in sports biomechanics research.

Standardized effect sizes, like Cohen’s d or Glass’ delta, are useful because they can be compared across studies using different dependent variables by exploiting properties of the normal curve. Both are like Z-scores that express an effect in standard deviation units: d = (MExp − MCont)/SDpooled and Δ = (MExp − MCont)/SDControl. Most recommendations for interpreting effect sizes include visual presentation of the results, comparison with previous studies, and specialized calculation of confidence intervals for these effect sizes (Thompson, 2002; Vacha-Haase and Thompson, 2004). There are some general recommendations for defining small, medium, and large effect sizes (Table I). These can be used, but experts caution against applying these effect size rules rigidly (Hopkins, 2002; Vacha-Haase and Thompson, 2004), in the way the P-value of 0.05 has been abused (Thomas et al., 1991). For example, the very strong effect that strength training has on muscular performance has prompted a proposal for adjusting how effect sizes are interpreted based on training level in strength and conditioning research (Rhea, 2004).

There are times when variance-accounted-for effect sizes (eta squared, omega squared, R²) are appropriate for documenting the size of effects in sports biomechanics. Correlation, regression, and mean comparison studies are all based on the general linear model and define the variance in the dependent variable accounted for by the independent variable(s). R² reports the strength of the linear association between the independent variable (X) and dependent variable (Y), while eta squared (η²) is sensitive to any shape of association between X and Y. Experts recommend that, most often, the adjusted versions of these statistics (ω² and adjusted R²) should be reported, because sampling errors mean the unadjusted values do not accurately model effects in the population (Vacha-Haase and Thompson, 2004).

Statistics and medicine have a long history of advocacy for the use of confidence intervals (CI) to focus on the magnitude and likely population differences, rather than on the P-value of the hypothesis test (Sterne and Smith, 2001). By focusing on the range of likely values, rather than the false precision of the mean, the magnitude of differences between groups can be more easily interpreted. The confidence intervals from several studies can be combined in figures to summarize concisely the size of likely effects (Batterham and Hopkins, 2005; Cumming and Finch, 2005).

Table I. Interpretation of d in mean comparison studies.

Size          Cohen (1988)    Hopkins (2002)
Trivial       <0.20           <0.10
Small         <0.50           <0.20
Medium        <0.80           <0.60
Large                         <1.20
Very large                    <2.00
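As an illustration of the standardized effect sizes and confidence intervals discussed above, here is a minimal sketch in Python; the two groups, sample sizes, and take-off velocity values are synthetic and purely hypothetical.

```python
# Sketch: Cohen's d, Glass' delta, and a 95% confidence interval for a mean
# difference between two hypothetical groups, reported in original units.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(2.60, 0.12, 12)        # hypothetical take-off velocities (m/s)
experimental = rng.normal(2.72, 0.12, 12)

mean_diff = experimental.mean() - control.mean()
sd_pooled = np.sqrt((experimental.var(ddof=1) + control.var(ddof=1)) / 2)  # equal n

cohens_d = mean_diff / sd_pooled                 # d = (MExp - MCont) / SDpooled
glass_delta = mean_diff / control.std(ddof=1)    # delta = (MExp - MCont) / SDControl

# 95% CI for the mean difference (independent samples, equal variances assumed).
se = sd_pooled * np.sqrt(1 / len(experimental) + 1 / len(control))
t_crit = stats.t.ppf(0.975, df=len(experimental) + len(control) - 2)
ci = (mean_diff - t_crit * se, mean_diff + t_crit * se)

print(f"d = {cohens_d:.2f}, delta = {glass_delta:.2f}, "
      f"95% CI for difference = {ci[0]:.3f} to {ci[1]:.3f} m/s")
```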


The reporting of confidence intervals and sample standard deviations also allows quantitative reviews (meta-analyses) to combine the results of several studies, improving the integration of the literature needed to justify evidence-based practice. For example, since within-subject variation in vertical take-off velocity in jumping is typically less than 0.06 m/s (CV = 2.5%), meaningful differences in jumping performance could be considered to occur with mean differences or confidence intervals of around 3% (Knudson and Corn, 1999). Some experts have advocated the use of 90% confidence intervals to decrease the rate of Type II errors and to decrease the confusion between the magnitude of differences and P-values (Batterham and Hopkins, 2005; Sterne and Smith, 2001), as well as to focus on defining clinically or physiologically useful ratios of likely benefit to harm (Hopkins, 2007).

Practical application and summary

Based on the foregoing review of common errors in the reporting of research and the recent recommendations on the reporting of statistical analyses, the following suggestions for improving sports biomechanics research reports appear prudent. Research reports submitted to scientific journals should have an experimental design appropriate for the problem identified by the authors and address the following characteristics to ensure a high degree of internal and external validity:

• Justify the study and specific research questions through logic and critical review of previous research.
• Provide detailed methodology, including all data processing and statistical tests, noting standards for testing and comparison.
• Cite or report the experimental error (validity and reliability) in the measurement systems used.
• Justify the sampling for statistical testing and adequate statistical power.
• Normally focus statistical testing on a few important dependent variables, but when multiple dependent variables are of interest, use multivariate statistics or report the procedures used to control the experiment-wise error rate or false discovery rate.
• Explicitly report the results of the statistical tests used (test statistics, degrees of freedom, and observed P-values) and the status of the relevant assumptions for those tests.
• Report the size or meaningfulness of observed differences reaching statistical significance using confidence intervals or effect sizes.
• Suggest applications to sport based on the likely external validity of the study, critical synthesis of the results with previous research, and the levels of evidence of the research.

References

Altman, D. G. (1998). Statistical reviewing for medical journals. Statistics in Medicine, 17, 2661–2674.
American Psychological Association (2001). Publication manual of the American Psychological Association (5th edn.). Washington, DC: APA.
Atkinson, G. (2002). Analysis of repeated measurements in physical therapy research: Multiple comparisons amongst level means and multi-factorial designs. Physical Therapy in Sport, 3, 191–203.
Austin, P. C., Mamdani, M. M., Juurlink, D. N., and Hux, J. E. (2006). Testing multiple statistical hypotheses resulted in spurious associations: A study of astrological signs and health. Journal of Clinical Epidemiology, 59, 964–969.
Bates, B. T. (1989). Comment on “The influence of running velocity and midsole hardness on external impact forces in heel–toe running”. Journal of Biomechanics, 22, 963–965.
Batterham, A. M., and Hopkins, W. G. (2005). Making meaningful inferences about magnitudes.
Sportscience (available online at: http://www.sportsci.org/jour/05/ambwgh.htm), 9, 6–13.


Bender, R., and Lange, S. (2001). Adjusting for multiple testing – when and how? Journal of Clinical Epidemiology, 54, 343–349.
Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society B, 57, 289–300.
Bland, J. M., and Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 327 (8476), 307–310.
Bland, J. M., and Altman, D. G. (1995). Multiple significance tests: The Bonferroni method. British Medical Journal, 310, 170.
Bond, G. R., Mintz, J., and McHugo, G. J. (1995). Statistical guidelines for the Archives of PM&R. Archives of Physical Medicine and Rehabilitation, 76, 784–787.
Chen, A., and Zhu, W. (2001). Revisiting the assumptions for inferential statistical analyses: A conceptual guide. Quest, 53, 418–439.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd edn.). Mahwah, NJ: Lawrence Erlbaum Associates.
Cumming, G., and Finch, S. (2005). Inference by eye: Confidence intervals and how to read pictures of data. American Psychologist, 60, 170–180.
Curran-Everett, D., and Benos, D. J. (2004). Guidelines for reporting statistics in journals published by the American Physiological Society. American Journal of Physiology: Heart and Circulatory Physiology, 287, H447–H449.
Ebell, M. H., Siwek, J., Weiss, B. D., Woolf, S. H., Susman, J., Ewigman, B., et al. (2004). Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in the medical literature. American Family Physician, 69, 548–556.
Fidler, F., and Cumming, G. (2007). Lessons learned from statistical reform efforts in other disciplines. Psychology in the Schools, 44, 441–449.
Hadorn, D. C., Baker, D., Hodges, J. S., and Hicks, N. (1996). Rating the quality of evidence for clinical practice guidelines. Journal of Clinical Epidemiology, 49, 749–754.
Hopkins, W. G. (2000). Measures of reliability in sports medicine and science. Sports Medicine, 30, 1–15.
Hopkins, W. G. (2002). A scale of magnitudes for effect statistics. A new view of statistics (available online at: http://sportsci.org/resource/stats/effectmag.html).
Hopkins, W. G. (2007). A spreadsheet for deriving a confidence interval, mechanistic inference and clinical inference from a P value. Sportscience (available online at: http://sportsci.org/2007/wghinf.htm), 11, 16–20.
Hsueh, H., Chen, J. J., and Kodell, R. L. (2003). Comparison of methods for estimating the number of true null hypotheses in multiple testing. Journal of Biopharmaceutical Statistics, 13, 675–689.
Huberty, C. J., and Morris, J. D. (1989). Multivariate analysis versus multiple univariate analyses. Psychological Bulletin, 105, 302–308.
International Committee of Medical Journal Editors (1997). Uniform requirements for manuscripts submitted to biomedical journals. Journal of the American Medical Association, 277, 927–934.
Ioannidis, J. P. A. (2005). Why most published research findings are false. PLoS Medicine, 2, 696–701.
James, C. R., and Bates, B. T. (1997). Experimental and statistical design issues in human movement research. Measurement in Physical Education and Exercise Science, 1, 55–69.
Kirk, R. E. (1995). Experimental design: Procedures for the behavioral sciences (3rd edn.). Belmont, CA: Brooks/Cole.
Kirk, R. E. (1996). Practical significance: A concept whose time has come. Educational and Psychological Measurement, 56, 746–759.
Kirk, R. E. (2001). Promoting good statistical practices: Some suggestions. Educational and Psychological Measurement, 61, 213–218.
Knudson, D. (2005a). Statistical and reporting errors in applied biomechanics research. In Q. Wang (Ed.), Proceedings of XXIII International Symposium on Biomechanics in Sports, Vol. 2 (pp. 811–814). Beijing: China Institute of Sport Science.
Knudson, D. (2005b). Evidence-based practice in kinesiology: The theory to practice gap revisited. The Physical Educator, 62, 212–221.
Knudson, D. (2007). Analysis of multiple comparison errors using ground reaction force data. In H.-J. Menzel, and M. H. Chagas (Eds.), Proceedings of the XXVth International Symposium on Biomechanics in Sports (pp. 176–179). Belo Horizonte: Federal University of Minas Gerais.
Knudson, D., and Corn, R. (1999). Intrasubject variability of the kinematics of the vertical jump. In R. H. Sanders, and B. Gibson (Eds.), Scientific Proceedings of the XVII International Symposium on Biomechanics in Sports (pp. 381–384). Perth, WA: Edith Cowan University.
Lang, T., and Secic, M. (2006). How to report statistics in medicine: Annotated guidelines for authors, editors, and reviewers (2nd edn.). Philadelphia, PA: American College of Physicians.


Lexell, J. E., and Downham, D. Y. (2005). How to assess reliability of measurements in rehabilitation. American Journal of Physical Medicine and Rehabilitation, 84, 719–723.
Lok, C. K. W., Chan, M. T. V., and Martinson, I. M. (2001). Risk factors for citation errors in peer-reviewed nursing journals. Journal of Advanced Nursing, 43, 223–229.
Ludbrook, J. (1998). Multiple comparison procedures updated. Clinical and Experimental Pharmacology and Physiology, 25, 1032–1037.
Morris, H. H. (1981). Statistics and biomechanics: Selected considerations. In J. M. Cooper, and B. Haven (Eds.), Proceedings of the CIC Symposium: Biomechanics (pp. 216–225). Bloomington, IN: Indiana State Board of Health.
Morrison, D. F. (2005). Multivariate statistical methods (4th edn.). Belmont, CA: Thomson.
Morrow, J. R., and Frankiewicz, R. G. (1979). Strategies for the analysis of repeated and multiple measures designs. Research Quarterly, 50, 297–305.
Mullineaux, D. R., Bartlett, R. M., and Bennett, S. (2001). Research design and statistics in biomechanics and motor control. Journal of Sports Sciences, 19, 739–760.
Nieminen, P., Carpenter, J., Rucker, G., and Schumacher, M. (2006). The relationship between quality of research and citation frequency. BMC Medical Research Methodology, 6, 42–49.
Rencher, A. C. (1998). Multivariate statistical inference and applications. New York: Wiley.
Rhea, M. R. (2004). Determining the magnitude of treatment effects in strength training research through the use of the effect size. Journal of Strength and Conditioning Research, 18, 918–920.
Roebroeck, M. E., Harlaar, J., and Lankhorst, G. J. (1993). The application of generalizability theory to reliability assessment: An illustration using isometric force measurements. Physical Therapy, 73, 386–401.
Sankoh, A. J., Huque, M. F., and Dubey, S. D. (1997). Some comments on frequently used multiple endpoint adjustment methods in clinical trials. Statistics in Medicine, 16, 2529–2542.
Share, D. L. (1984). Interpreting the output of multivariate analyses: A discussion on current approaches. British Journal of Psychology, 75, 349–362.
Sterne, J. A. C., and Smith, G. D. (2001). Sifting the evidence – what’s wrong with significance tests? British Medical Journal, 322, 226–231.
Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society B, 64, 479–498.
Tabachnick, B. G., and Fidell, L. S. (2001). Using multivariate statistics (4th edn.). Boston, MA: Allyn & Bacon.
Thomas, J. R., Salazar, W., and Landers, D. M. (1991). What is missing in p < .05? Effect size. Research Quarterly for Exercise and Sport, 62, 344–348.
Thompson, B. (1996). AERA editorial policies regarding statistical significance testing: Three suggested reforms. Educational Researcher, 25, 26–30.
Thompson, B. (2002). What future quantitative social science research could look like: Confidence intervals for effect sizes. Educational Researcher, 31, 25–32.
Thompson, J. R. (1998). Invited commentary: Re: Multiple comparisons and related issues in the interpretation of epidemiologic data. American Journal of Epidemiology, 147, 801–806.
Vacha-Haase, T., and Thompson, B. (2004). How to estimate and interpret various effect sizes. Journal of Counseling Psychology, 51, 473–481.
Winter, E. (2008). Use and misuse of the term “significant”. Journal of Sports Sciences, 26, 429–430.