Qual Quant (2011) 45:319–328 DOI 10.1007/s11135-009-9298-8
Monte Carlo theoretical trials of methods for assessing statistical significance for differences between adjusted odds ratios
Daniel R. Thompson · Mary Beth Zeni
Published online: 19 December 2009 © Springer Science+Business Media B.V. 2009
D. R. Thompson (B) · M. B. Zeni, Florida State University College of Nursing, P.O. Box 3064310, Tallahassee, FL 32306-4310, USA
Abstract  Odds are generally defined as the number of successes divided by the number of failures in a given number of trials. An odds ratio is the ratio of one odds divided by another. Odds ratios can be adjusted so that they reflect the association with the outcome independently of the influence of associations with other variables; these are adjusted odds ratios. There are several well known methods for comparing odds ratios and testing for statistically significant differences between them. Analogous methods for adjusted odds ratios are not well known or well documented. One method for comparing adjusted odds ratios is explained by Hosmer and Lemeshow (Applied logistic regression, 2000). This method is used for the odds ratios of two variables from the same data set. The purpose of this analysis was to apply the method to a different situation: comparing odds ratios for the same variable from two different data sets. Monte Carlo trials were used to assess the performance of the method, and these indicated that the method performed well.

Keywords  Odds ratios · Logistic regression · Adjusted odds ratios · Differences in adjusted odds ratios · Monte Carlo trials
1 Introduction

In general, an odds is the number of successes divided by the number of failures in a given number of trials. An odds ratio is the ratio of one odds divided by another. In public health, this might be the odds of low weight births for women who smoke divided by the odds for women who do not smoke. If this ratio is 1.7, for example, it would mean the odds of delivering a low weight infant are 70% higher for women who smoke compared to women who do not smoke. In some instances it is informative to compare odds ratios. For example, the odds ratio for low birth weight associated with smoking might be 1.7 in one population and 2.5 in another population.
It would be important to know whether this difference could be the result of random variation, or whether it reflects a true difference in the risk associated with smoking in the two populations. Comparing odds ratios could also be relevant when comparing research results from two different studies. For example, the odds ratio of 1.7 referred to above might be from one study and the odds ratio of 2.5 might be from a separate study. In comparing these two hypothetical studies, it would be useful to know whether the two odds ratios are statistically significantly different. If not, then the results from the two studies were essentially the same; but if the two odds ratios are statistically significantly different, then the results differed between the two studies.

Adjusted odds ratios are odds ratios that have been adjusted so they are independent of the influence of associations with other factors. Techniques such as logistic regression are often used to compute adjusted odds ratios. In the smoking example above, the unadjusted odds ratio for smoking may be influenced by the association between smoking and alcohol use if women who smoke are also more likely to use alcohol and if alcohol use is also associated with higher odds of low birth weight. The odds ratio associated with smoking could be adjusted using logistic regression so that an association between smoking and alcohol use would not influence the adjusted odds ratio for smoking and low birth weight.

Several methods have been developed to compare unadjusted odds ratios; the Breslow and Day (1993), Tarone (1985), and Woolf (1955) methods are three of these. However, techniques for comparing adjusted odds ratios are not easily found in the research literature. In the literature review for this paper, the only source found that directly addressed this subject was Hosmer and Lemeshow (2000), who provide a formula for comparing two adjusted odds ratios when both are from the same data set. This approach may be beneficial when analyzing population-based health data sets. For example, one could examine whether two adjusted odds ratios from the same data set, such as the National Survey of Children's Health or the National Health Interview Survey, are significantly different. The purpose of this analysis is to use Monte Carlo simulation techniques to evaluate the performance of the formula when the two adjusted odds ratios are from different data sets.
2 Methods

The general form of the logistic regression equation is:

log(odds) = b1x1 + b2x2 + b3x3 + ... + bixi + k    (1)
From this equation it follows that the adjusted odds ratio for variable i is e^bi. For example, if the odds of interest is the odds of an infant being born weighing less than 2,500 grams (also known as low birth weight or LBW) and x1 is a variable that indicates whether the mother smoked, then b1 in Eq. 1 above would be the increase in the log(odds) associated with maternal smoking. The x1 variable would be 0 if the mother did not smoke and 1 if the mother smoked. For infants born to mothers who smoked, the log(odds) would be b1·1 + b2x2 + b3x3 + ... + bixi + k, and for infants born to mothers who did not smoke the log(odds) would be b1·0 + b2x2 + b3x3 + ... + bixi + k = b2x2 + b3x3 + ... + bixi + k. Note that since b1·0 = 0 it can be dropped from the expression. Exponentiating the log(odds) produces the odds, and exponentiating the right side of the equation also produces the odds. Therefore, the odds of LBW for infants born to mothers who smoked is given by:
e^(b1·1 + b2x2 + b3x3 + ... + bixi + k), and for infants born to mothers who did not smoke the odds would be e^(b2x2 + b3x3 + ... + bixi + k). The odds ratio associated with smoking is the ratio of these two odds, which is:

e^(b1·1 + b2x2 + b3x3 + ... + bixi + k) / e^(b2x2 + b3x3 + ... + bixi + k) = e^b1

This is a very brief and incomplete explanation of logistic regression; for more information, the book Applied Logistic Regression by Hosmer and Lemeshow (2000) is an excellent source.

As discussed in the introduction, it is sometimes relevant to test for a significant difference between two odds ratios. In terms of the logistic regression equation, this is accomplished by testing for a significant difference between two b coefficients. Hosmer and Lemeshow (2000) give the following formula for computing the variance of the difference between two b coefficients:

var(b1 − b2) = var(b1) + var(b2) − 2 cov(b1, b2)    (2)
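As a concrete illustration, the sketch below (in R, the software used for the analyses in this paper) fits a logistic regression with the glm function and applies Eq. 2 to two coefficients from the same fitted model. The data frame births and the variables lbw, smoke, and alcohol are hypothetical placeholders, not data from this paper.

```r
# Minimal sketch: comparing two adjusted odds ratios estimated from the SAME
# data set, using Eq. 2. The data frame `births` and the variables lbw, smoke,
# and alcohol are hypothetical placeholders.
fit <- glm(lbw ~ smoke + alcohol, family = binomial, data = births)

b <- coef(fit)   # b coefficients; exp(b) gives the adjusted odds ratios
V <- vcov(fit)   # variance-covariance matrix of the b coefficients

# Eq. 2: var(b1 - b2) = var(b1) + var(b2) - 2*cov(b1, b2)
var_diff <- V["smoke", "smoke"] + V["alcohol", "alcohol"] - 2 * V["smoke", "alcohol"]

# z score for the difference between the two b coefficients
# (this is the test statistic formalized as Eq. 4 below)
z <- (b["smoke"] - b["alcohol"]) / sqrt(var_diff)
p <- 2 * pnorm(-abs(unname(z)))   # two-sided p value
```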
It should be noted that the b coefficients, their variances, and their covariances are nearly impossible to calculate without a computer and appropriate statistical software. Almost all statistical software packages provide this information in their standard output, which can then be used in Eq. 2. Equation 2 applies when the two b coefficients are estimated from the same data set and consequently there is a covariance for the two variables. In situations where the b coefficients are from different data sets there is no covariance and Eq. 2 becomes:

var(b1 − b2) = var(b1) + var(b2)    (3)
The variance of the difference between b1 and b2 can be used to compute a Z score for the difference using the formula:

Z = (b1 − b2) / sqrt(var(b1 − b2))    (4)

In this analysis, R software was used to simulate two separate data sets of 1,000 records each (n = 1,000). These data sets included simulated outcomes and risk factors and were randomly drawn from a population with known values for the prevalences of the risk factors and outcomes. The glm (generalized linear model) function in R was used to compute the b coefficients and associated variances, and these were used with formulas (3) and (4) to compute the Z score for the difference between the b coefficients for the two data sets. Since the two data sets were randomly drawn from the same population, any difference observed between them is due to random variation. At alpha level 0.05 there should be a 0.05 probability that the two b coefficients are found to be significantly different due to chance alone.

This process was repeated 1,000 times, so there were 1,000 pairs of b coefficients to compare. Formulas (3) and (4) were used to compute the Z score for each of the 1,000 pairs of b coefficients. A frequency table of the Z scores was constructed and compared to the standard normal Z scores using the Chi-square goodness-of-fit method. This was repeated using populations with adjusted odds ratios of 2.53, 1.00, 0.53, and 0.31. In the end, there were four tables of 1,000 Z scores which were compared to the distribution of the standard normal Z scores, using Chi-square, to assess goodness-of-fit.
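The following R sketch condenses the simulation just described. The population prevalences, the baseline odds, and other data-generating details shown here are illustrative assumptions; the paper does not reproduce the authors' exact simulation code.

```r
# Condensed sketch of the Monte Carlo trials: draw two independent data sets
# from the same population, fit logistic regressions, and test the difference
# between the two b coefficients with Eqs. 3 and 4. Prevalences and the
# baseline odds are illustrative assumptions, not the paper's exact values.
set.seed(1)
n      <- 1000          # records per simulated data set
n_sims <- 1000          # number of simulated pairs of data sets
b_true <- log(2.53)     # population adjusted odds ratio of 2.53
k      <- qlogis(0.10)  # assumed baseline outcome prevalence of 10%

simulate_b <- function() {
  x   <- rbinom(n, 1, 0.25)                    # assumed risk-factor prevalence of 25%
  y   <- rbinom(n, 1, plogis(k + b_true * x))  # outcome drawn from the logistic model
  fit <- glm(y ~ x, family = binomial)
  c(b = unname(coef(fit)["x"]), var = vcov(fit)["x", "x"])
}

z_scores <- replicate(n_sims, {
  s1 <- simulate_b()   # data set 1
  s2 <- simulate_b()   # data set 2, drawn from the same population
  (s1["b"] - s2["b"]) / sqrt(s1["var"] + s2["var"])   # Eqs. 3 and 4
})

mean(abs(z_scores) > 1.96)   # should be close to the nominal alpha of 0.05
```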
3 Results

The results are shown in Figs. 1, 2, 3, 4 and Tables 1, 2, 3, 4. In the figures, the standard Normal distribution is plotted as a line and the number of trials is plotted as bars. The distribution of the Z scores for the differences between the b coefficients (bars) follows the standard Normal distribution (line).
[Figure: bars show the number of simulations in each Z score bin; the line shows the standard Normal curve. Chi-square goodness-of-fit p value = 0.84]
Fig. 1 Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 2.53)
[Figure: bars show the number of simulations in each Z score bin; the line shows the standard Normal curve. Chi-square goodness-of-fit p value = 0.85]
Fig. 2 Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 1.00)
The Chi-square goodness-of-fit statistics indicate no statistically significant difference between the distributions of the Z scores and the standard Normal distribution.
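A goodness-of-fit check along these lines can be reproduced with a few lines of R, continuing from the simulation sketch above; the 0.2-wide Z score bins used here mirror the tables in this paper, but the authors' exact binning is an assumption.

```r
# Sketch of the goodness-of-fit comparison: bin the simulated Z scores
# (z_scores from the simulation sketch above) and compare the observed bin
# counts with the proportions expected under the standard Normal distribution.
# The 0.2-wide bins are an assumption based on the tables in this paper.
breaks   <- seq(-3.7, 3.7, by = 0.2)                    # bins centred on -3.6, -3.4, ..., 3.6
observed <- table(cut(pmin(pmax(z_scores, -3.6), 3.6), breaks))
expected <- diff(pnorm(breaks))                         # standard Normal bin probabilities

# Chi-square goodness-of-fit test of observed counts against Normal proportions
chisq.test(as.vector(observed), p = expected / sum(expected))
```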
4 Discussion

This analysis indicates that formulas (3) and (4) are valid tests for statistically significant differences between two adjusted odds ratios when the odds ratios are from separate data sets. This could be useful in comparing the results of different studies.

Researchers could compare two adjusted odds ratios from two infant outcome studies that used data from birth certificates in two different states; the birth certificate data represent separate data sets.
[Figure: bars show the number of simulations in each Z score bin; the line shows the standard Normal curve. Chi-square goodness-of-fit p value = 0.65]
Fig. 3 Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 0.53)
[Figure: bars show the number of simulations in each Z score bin; the line shows the standard Normal curve. Chi-square goodness-of-fit p value = 0.94]
Fig. 4 Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 0.31)
For example, Thompson et al. (2008) used Florida birth records to investigate maternal obesity and infant death, and found that adjusted odds ratios for infant death were statistically significant in two categories of maternal body mass index: obese and morbidly obese. Perhaps a study in another state, using that state's birth records, also found similar results. Researchers could investigate whether the difference between the two states' findings is statistically significant by employing formulas (3) and (4).

Researchers could also compare odds ratios between two states or regions of the country from a national population-based data set, as long as the data set included representative sampling at the state level. For example, two states or regions of the country may report adjusted odds ratios for Hispanic ethnicity related to discontinuous health insurance coverage. Formulas (3) and (4) could be used to compare the adjusted odds ratios between the two states or regions and ascertain statistical significance.
Table 1  Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 2.53)
Chi-square goodness-of-fit p value = 0.84

Z score    Actual number of trials    Expected number of trials
−3.6       0                          0
−3.4       1                          0
−3.2       0                          0
−3.0       1                          1
−2.8       1                          2
−2.6       5                          3
−2.4       5                          5
−2.2       9                          7
−2.0       13                         11
−1.8       11                         16
−1.6       20                         22
−1.4       30                         30
−1.2       46                         39
−1.0       48                         48
−0.8       51                         58
−0.6       58                         67
−0.4       74                         74
−0.2       78                         78
0.0        70                         80
0.2        67                         78
0.4        85                         74
0.6        75                         67
0.8        49                         58
1.0        51                         48
1.2        52                         39
1.4        25                         30
1.6        23                         22
1.8        20                         16
2.0        15                         11
2.2        7                          7
2.4        3                          5
2.6        3                          3
2.8        2                          2
3.0        1                          1
3.2        1                          0
3.4        0                          0
3.6        0                          0
Total      1000                       1000
Most research articles give confidence intervals when reporting adjusted odds ratios, and from these the logistic regression b coefficients and their standard errors can be calculated. These two quantities can then be used with formulas (3) and (4) to compare adjusted odds ratios from two different studies.
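A sketch of that calculation in R is shown below; the odds ratios and confidence limits are made-up numbers used only to illustrate the arithmetic, not results from any cited study.

```r
# Sketch: recovering b and its standard error from a published adjusted odds
# ratio and 95% confidence interval, then comparing two studies with Eqs. 3
# and 4. The numbers are hypothetical.
or_to_b <- function(or, lower, upper) {
  b  <- log(or)                                  # b coefficient on the log-odds scale
  se <- (log(upper) - log(lower)) / (2 * 1.96)   # width of the 95% CI on the log scale
  c(b = b, se = se)
}

study1 <- or_to_b(1.7, 1.2, 2.4)   # hypothetical study 1
study2 <- or_to_b(2.5, 1.6, 3.9)   # hypothetical study 2

var_diff <- study1["se"]^2 + study2["se"]^2          # Eq. 3
z <- (study1["b"] - study2["b"]) / sqrt(var_diff)    # Eq. 4
p <- 2 * pnorm(-abs(unname(z)))                      # two-sided p value
```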
Table 2  Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 1.00)
Chi-square goodness-of-fit p value = 0.85

Z score    Actual number of trials    Expected number of trials
−3.6       0                          0
−3.4       0                          0
−3.2       0                          0
−3.0       0                          1
−2.8       1                          2
−2.6       2                          3
−2.4       6                          5
−2.2       8                          7
−2.0       13                         11
−1.8       11                         16
−1.6       15                         22
−1.4       26                         30
−1.2       32                         39
−1.0       40                         48
−0.8       59                         58
−0.6       71                         67
−0.4       79                         74
−0.2       74                         78
0.0        88                         80
0.2        86                         78
0.4        76                         74
0.6        79                         67
0.8        69                         58
1.0        41                         48
1.2        41                         39
1.4        32                         30
1.6        16                         22
1.8        15                         16
2.0        7                          11
2.2        4                          7
2.4        5                          5
2.6        1                          3
2.8        3                          2
3.0        0                          1
3.2        0                          0
3.4        0                          0
3.6        0                          0
Total      1000                       1000
The main limitation of this analysis is the limited number of odds ratios in the trials and the sample size of 1,000 in each trial. The results could be different with different data set sample sizes and/or with different odds ratios. Further Monte Carlo trials should be performed with higher, and especially lower, sample sizes and with different odds ratios to determine whether these results remain consistent across a wider range of sample sizes and odds ratios.
Table 3  Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 0.53)
Chi-square goodness-of-fit p value = 0.65

Z score    Actual number of trials    Expected number of trials
−3.6       0                          0
−3.4       0                          0
−3.2       1                          0
−3.0       1                          1
−2.8       0                          2
−2.6       1                          3
−2.4       5                          5
−2.2       7                          7
−2.0       9                          11
−1.8       18                         16
−1.6       24                         22
−1.4       29                         30
−1.2       39                         39
−1.0       49                         48
−0.8       58                         58
−0.6       75                         67
−0.4       67                         74
−0.2       85                         78
0.0        87                         80
0.2        85                         78
0.4        58                         74
0.6        69                         67
0.8        60                         58
1.0        46                         48
1.2        45                         39
1.4        33                         30
1.6        12                         22
1.8        19                         16
2.0        9                          11
2.2        4                          7
2.4        1                          5
2.6        2                          3
2.8        0                          2
3.0        0                          1
3.2        1                          0
3.4        0                          0
3.6        1                          0
Total      1000                       1000
In summary, the theoretical construct and equations are worth exploring to determine their relevance and application in public health research. The method discussed in this article may serve as a link between the analytic approaches used in the social sciences and other disciplines and those used in public health research.
Table 4  Z scores for 1,000 logistic regression b coefficient differences (population adjusted odds ratio = 0.31)
Chi-square goodness-of-fit p value = 0.94

Z score    Actual number of trials    Expected number of trials
−3.6       0                          0
−3.4       0                          0
−3.2       0                          0
−3.0       1                          1
−2.8       1                          2
−2.6       2                          3
−2.4       2                          5
−2.2       9                          7
−2.0       17                         11
−1.8       19                         16
−1.6       20                         22
−1.4       29                         30
−1.2       34                         39
−1.0       61                         48
−0.8       58                         58
−0.6       64                         67
−0.4       72                         74
−0.2       90                         78
0.0        83                         80
0.2        68                         78
0.4        64                         74
0.6        65                         67
0.8        54                         58
1.0        50                         48
1.2        34                         39
1.4        34                         30
1.6        27                         22
1.8        16                         16
2.0        8                          11
2.2        10                         7
2.4        2                          5
2.6        3                          3
2.8        2                          2
3.0        0                          1
3.2        1                          0
3.4        0                          0
3.6        0                          0
Total      1000                       1000
References

Breslow, N.E., Day, N.E.: Statistical Methods in Cancer Research, Volume 1: The Analysis of Case–Control Studies. IARC Scientific Publication No. 32, paperback reprint edition. Oxford University Press, London (1993)
Hosmer, D.W., Lemeshow, S.: Applied Logistic Regression, 2nd edn. Wiley, New York (2000)
Tarone, R.E.: On heterogeneity tests based on efficient scores. Biometrika 72, 91–95 (1985)
Thompson, D.R., Clark, C.L., Wood, B., Zeni, M.B.: Maternal obesity and risk of infant death based on Florida birth records for 2004. Public Health Reports, July–August 2008, pp. 487–493 (2008)
Woolf, B.: On estimating the relation between blood group and disease. Ann. Hum. Genet. 19, 251–253 (1955)