IJD® MODULE ON BIOSTATISTICS AND RESEARCH METHODOLOGY FOR THE DERMATOLOGIST MODULE EDITOR: SAUMYA PANDA
Biostatistics Series Module 3: Comparing Groups: Numerical Variables Avijit Hazra, Nithya Gogtay1
Abstract
Numerical data that are normally distributed can be analyzed with parametric tests, that is, tests which are based on the parameters that define a normal distribution curve. If the distribution is uncertain, the data can be plotted as a normal probability plot and visually inspected, or tested for normality using one of a number of goodness of fit tests, such as the Kolmogorov–Smirnov test. The widely used Student’s t‑test has three variants. The one‑sample t‑test is used to assess if a sample mean (as an estimate of the population mean) differs significantly from a given population mean. The means of two independent samples may be compared for a statistically significant difference by the unpaired or independent samples t‑test. If the data sets are related in some way, their means may be compared by the paired or dependent samples t‑test. The t‑test should not be used to compare the means of more than two groups. Although it is possible to compare groups in pairs when there are more than two groups, this will increase the probability of a Type I error. The one‑way analysis of variance (ANOVA) is employed to compare the means of three or more independent data sets that are normally distributed. Multiple measurements from the same set of subjects cannot be treated as separate, unrelated data sets. Comparison of means in such a situation requires repeated measures ANOVA. It is to be noted that while a multiple group comparison test such as ANOVA can point to a significant difference, it does not identify exactly between which two groups the difference lies. To do this, multiple group comparison needs to be followed up by an appropriate post hoc test. An example is Tukey’s honestly significant difference test following ANOVA. If the assumptions for parametric tests are not met, there are nonparametric alternatives for comparing data sets. These include the Mann–Whitney U‑test as the nonparametric counterpart of the unpaired Student’s t‑test, the Wilcoxon signed‑rank test as the counterpart of the paired Student’s t‑test, the Kruskal–Wallis test as the nonparametric equivalent of ANOVA and Friedman’s test as the counterpart of repeated measures ANOVA.
From the Department of Pharmacology, Institute of Postgraduate Medical Education and Research, Kolkata, West Bengal, 1Department of Clinical Pharmacology, Seth GS Medical College and KEM Hospital, Parel, Mumbai, Maharashtra, India Address for correspondence: Dr. Avijit Hazra, Department of Pharmacology, Institute of Postgraduate Medical Education and Research, 244B Acharya J. C. Bose Road, Kolkata ‑ 700 020, West Bengal, India. E‑mail:
[email protected]
Key Words: Analysis of variance, Friedman’s test, Kolmogorov–Smirnov test, Kruskal–Wallis test, Mann–Whitney U‑test, normal probability plot, t‑test, Tukey’s test, Wilcoxon’s test
Introduction
We have discussed earlier that numerical data can be recorded on an interval scale or a ratio scale – the latter scale is distinguished by a true zero and enables differences to be judged in the form of ratios. This distinction, however, usually does not influence the choice of statistical test for comparing such data; the distribution of the data, on the other hand, does influence this choice. Numerical data that are normally distributed can be analyzed with parametric tests, that is, tests which are based on the parameters that define a normal distribution curve. Parametric tests assume that:
• Data are numerical
• The distribution in the underlying population is normal
• Observations within a group are independent of one another
• The samples have been drawn randomly from the population
• The samples have the same variance (“homogeneity of variances”).
If it is uncertain whether the data are normally distributed, they can be plotted as a normal probability plot and visually inspected. In making such a plot, the data are first sorted, and the sorted data are plotted along one axis against theoretical values plotted along the other axis. These latter values are selected to make the resulting plot appear as a straight line if the data are approximately normally distributed. Deviations from a straight line suggest departures from normality. The normal probability plot is a special case of the quantile–quantile (Q–Q) probability plot used to test for a normal distribution.
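As a minimal sketch of how such a plot might be drawn in software (assuming Python with NumPy, SciPy and Matplotlib; the variable name and sample values are illustrative, not from the text):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
skin_fold_mm = rng.normal(loc=12.0, scale=2.5, size=40)  # illustrative sample data

# probplot sorts the data and plots them against theoretical normal quantiles;
# points lying close to the straight line suggest approximate normality
fig, ax = plt.subplots()
stats.probplot(skin_fold_mm, dist="norm", plot=ax)
ax.set_title("Normal probability (Q-Q) plot")
plt.show()
```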
Figure 1 depicts two instances of the normal probability plot. The dots closely approximate the straight line in the left panel, suggesting that the data are approximately normally distributed. The dots sag below the expected straight line in the right panel, suggesting that the data are positively skewed. An S‑shaped pattern about the straight line would suggest multimodal data. If we do not wish to assess normality by eyeballing, we can opt for one of the “goodness of fit” tests that test the goodness of the fit of the sample distribution to an expected normal distribution. The Kolmogorov–Smirnov test (after Andrei Nikolaevich Kolmogorov, 1933 and Nikolai Vasilevich Smirnov, 1939) for normality is frequently used. It compares the sample data with a normal distribution having the same mean and variance and derives a P value; if P > 0.05, the null hypothesis cannot be rejected (i.e., the sample data are not different from the normal distribution) and the data are considered to be normally distributed. Another widely used test of normality based on the null hypothesis principle is the Shapiro–Wilk test (after Samuel S. Shapiro and Martin Bradbury Wilk, 1965). The Lilliefors test (after Hubert Lilliefors, 1967) is another normality test derived from the Kolmogorov–Smirnov test. There are still other tests to determine whether the sample has been derived from a normally distributed population, but there is no satisfactory answer to the question of which is the best test in a given situation. In general, the Kolmogorov–Smirnov test is the oldest of this family of tests; it is widely used and tolerates more deviation from strict normality.
Nonnormal, or skewed, data can be transformed so that they approximate a normal distribution. The commonest method is a log transformation, whereby the natural logarithms of the raw data are analyzed. If the transformed data are shown to approximate a normal distribution, they can then be analyzed with parametric tests. Large samples (say n > 100) approximate a normal distribution and can nearly always be analyzed with parametric tests; this assumption often holds even when the sample is not so large but is, say, over 30. However, with the increasing implementation of nonparametric tests in statistical software, the need for normality assumptions and data transformations seldom arises nowadays.
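A sketch of how these normality checks and the log transformation might be run in software (assuming Python with SciPy; the data array and significance threshold are illustrative, not from the text):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
raw = rng.lognormal(mean=2.0, sigma=0.6, size=60)  # positively skewed data

# Shapiro-Wilk test of normality: P > 0.05 means normality is not rejected
w_stat, p_raw = stats.shapiro(raw)
print(f"Shapiro-Wilk on raw data:             P = {p_raw:.4f}")

# Kolmogorov-Smirnov test against a normal distribution with the sample's own
# mean and standard deviation (the Lilliefors variant additionally corrects the
# P value for estimating these parameters; plain kstest does not)
ks_stat, p_ks = stats.kstest(raw, "norm", args=(raw.mean(), raw.std(ddof=1)))
print(f"Kolmogorov-Smirnov on raw data:       P = {p_ks:.4f}")

# A log transformation often makes positively skewed data approximately normal
log_data = np.log(raw)
w_stat, p_log = stats.shapiro(log_data)
print(f"Shapiro-Wilk on log-transformed data: P = {p_log:.4f}")
```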
The requirement for observations within a group to be independent means that multiple measurements from the same set of subjects cannot be treated as separate, unrelated sets of observations. Such a situation requires specific repeated measures analyses. The requirement for samples to be drawn randomly from a population is not always met, but the results of hypothesis tests have proved to be reliable even if this assumption is not fully met. Before we take up individual tests, let us recapitulate, through Figure 2, the tests that are available to compare groups or sets of numerical data for significant difference.
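The choices summarized in Figure 2 can also be written out as a small helper; this is only a sketch of the figure’s logic, and the function name and arguments are ours, not from the text:

```python
def choose_test(paired: bool, parametric: bool, n_groups: int) -> str:
    """Return the comparison test suggested by Figure 2 for numerical data."""
    if n_groups < 2:
        raise ValueError("At least two groups are needed for a comparison.")
    if not paired:
        if parametric:
            return "Student's unpaired t-test" if n_groups == 2 else "One-way ANOVA (F test)"
        return "Mann-Whitney U-test" if n_groups == 2 else "Kruskal-Wallis H test"
    if parametric:
        return "Student's paired t-test" if n_groups == 2 else "Repeated measures ANOVA"
    return "Wilcoxon signed-rank test" if n_groups == 2 else "Friedman's ANOVA"


print(choose_test(paired=False, parametric=True, n_groups=3))  # One-way ANOVA (F test)
```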
Student’s T‑test
The Student’s t‑test is used to test the null hypothesis that there is no difference between two means. There are three variants:
Figure 1: Normal probability plots for normally distributed (left panel) and skewed (right panel) data
Figure 2: Statistical tests to compare numerical data for difference. For unpaired data that satisfy parametric assumptions, use Student’s unpaired t‑test (2 groups) or analysis of variance (ANOVA, the F test) (>2 groups); otherwise, use the Mann–Whitney U‑test (2 groups) or the Kruskal–Wallis H test, also called Kruskal–Wallis ANOVA (>2 groups). For paired data that satisfy parametric assumptions, use Student’s paired t‑test (2 groups) or repeated measures ANOVA (>2 groups); otherwise, use Wilcoxon’s signed‑rank test (2 groups) or Friedman’s ANOVA (>2 groups).
Figure 3: A t distribution (for n = 10) compared with a normal distribution. A t distribution is broader and flatter, such that 95% of observations lie within the range mean ± t × standard deviation (t = 2.23 for n = 10), compared with mean ± 1.96 standard deviations for the normal distribution.
• One‑sample t‑test: To test if a sample mean (as an estimate of a population mean) differs significantly from the population mean. In other words, it is used to determine whether a sample comes from a population with a specific mean. The population mean is not always known, but may be hypothesized
• Unpaired or independent samples t‑test: To test if the means estimated for two independent samples differ significantly
• Paired or related samples t‑test: To test if the means of two dependent samples, or in other words two related data sets, differ significantly.
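These three variants correspond to separate routines in statistical software; a minimal sketch, assuming Python with SciPy and purely illustrative data (the scores and the hypothesized population mean are made up, not from the text):

```python
import numpy as np
from scipy import stats

# Hypothetical severity scores; the numbers are purely illustrative
before = np.array([18.2, 21.5, 16.8, 24.1, 19.7, 22.3, 17.9, 20.4])
after = np.array([12.4, 15.8, 11.2, 18.6, 13.9, 16.1, 12.7, 14.5])
control = np.array([17.6, 20.9, 18.4, 23.2, 19.1, 21.8, 16.5, 22.7])

# One-sample t-test: does the mean of 'before' differ from a hypothesized
# population mean of 20?
t1, p1 = stats.ttest_1samp(before, popmean=20.0)

# Unpaired (independent samples) t-test: do two independent groups differ?
t2, p2 = stats.ttest_ind(before, control)

# Paired (dependent samples) t-test: before vs. after in the same subjects
t3, p3 = stats.ttest_rel(before, after)

print(f"One-sample: t = {t1:.2f}, P = {p1:.4f}")
print(f"Unpaired:   t = {t2:.2f}, P = {p2:.4f}")
print(f"Paired:     t = {t3:.2f}, P = {p3:.4f}")
```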
The formulae for the t‑test (which are relatively simple) give a value of t. This is then referred to a t distribution table to obtain a P value. However, statistical software will directly return a P value from the calculated t value. To recapitulate, the P value quantifies the probability of obtaining a difference or change similar to the one observed, or one even more extreme, assuming the null hypothesis to be true. The null hypothesis of no difference can be rejected if the P value is less than the chosen value of the probability of Type I error (α), and it can be concluded that the means of the data sets are significantly different.
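For instance, the table look-up can be replaced by the cumulative t distribution; a sketch assuming Python with SciPy, where the t value and degrees of freedom are arbitrary numbers, not taken from the text:

```python
from scipy import stats

t_value = 2.45   # calculated t statistic (illustrative)
df = 14          # degrees of freedom (illustrative)

# Two-sided P value: probability of a t statistic at least this extreme
# in either direction, assuming the null hypothesis is true
p_two_sided = 2 * stats.t.sf(abs(t_value), df)
print(f"P = {p_two_sided:.4f}")  # reject the null hypothesis if P < alpha (e.g., 0.05)
```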
The test is named after “Student,” the pseudonym of William Sealy Gosset, who published his work in 1908 while an employee of the Guinness brewery in Dublin; company policy prevented him from using his real name. The t‑test is used when the underlying assumptions of parametric tests are satisfied. However, it is robust enough to tolerate some deviation from these assumptions, which can occur when small samples (say n
The calculated value of t is interpreted with reference to the t distribution for the given degree of freedom. Degree of freedom is equal to one less than the sample size and denotes the number of independent observations available. As the degree of freedom increases, the t distribution approaches the normal distribution. A table of the t distribution shows that, as the degree of freedom increases, the value of t approaches 1.96 at a P value of 0.05. This is analogous to a normal distribution, where 5% of values lie outside 1.96 standard deviations from the mean.
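This convergence of t toward 1.96 can be seen numerically; a short sketch assuming Python with SciPy (the chosen degrees of freedom are arbitrary):

```python
from scipy import stats

# Two-sided critical t values at alpha = 0.05 for increasing degrees of freedom;
# they shrink toward 1.96, the corresponding value for the normal distribution
for df in (5, 10, 30, 100, 1000):
    print(f"df = {df:5d}: critical t = {stats.t.ppf(0.975, df):.3f}")

print(f"normal:     critical z = {stats.norm.ppf(0.975):.3f}")  # 1.960
```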