International Journal of Epidemiology © International Epidemiological Association 1997

Vol. 26, No. 1 Printed in Great Britain

Relationship between Prevalence Rate Ratios and Odds Ratios in Cross-Sectional Studies

CARLO ZOCCHETTI,* DARIO CONSONNI* AND PIER A BERTAZZI**

Zocchetti C (Institute of Occupational Health, Faculty of Medicine, University of Milan, Milan, Italy), Consonni D and Bertazzi PA. Relationship between prevalence rate ratios and odds ratios in cross-sectional studies. International Journal of Epidemiology 1997; 26: 220–223.

Background. Cross-sectional data are frequently encountered in epidemiology and published results are predominantly presented in terms of prevalence odds ratios (POR). A recent debate suggested a switch from POR, which is easily obtained via logistic regression analysis available in many statistical packages, to prevalence rate ratios (PRR). We thought it useful to explore the mathematical relationship between PRR and POR and to evaluate the degree of divergence of the two measures as a function of the prevalence of disease and exposure.

Methods. With the use of some algebra and the common definitions of prevalence of the disease (Pr(D)), prevalence of the exposure (Pr(E)), PRR, and POR in a 2 × 2 table, we have identified a useful formula that represents the mathematical relationship between these four quantities. Plots of POR versus PRR for selected values of Pr(D) and Pr(E) are reported.

Results. Mathematically speaking the general relationship takes the form of a second order curve which can change curvature and/or rotate around the point POR = PRR = 1 according to the values of Pr(D) and Pr(E), with POR being always further from the null value than is PRR. The discrepancies are much more influenced by variations in Pr(D) than in Pr(E).

Conclusions. We think that the choice between POR or PRR in a cross-sectional study ought to be based on epidemiological grounds and not on the availability of software tools. The paper offers a formula and some examples for a better understanding of the relationship between PRR and POR as a function of the prevalence of the disease and the prevalence of the exposure.

Keywords: prevalence rate ratio, prevalence odds ratio, cross-sectional study, log-linear model

A recent debate in this1–3 and other journals4–9 has highlighted the interest surrounding cross-sectional studies and the epidemiological measures used to convey their results. Some authors (see for example1,2,7,8) have indicated their preference for prevalence rate ratios (PRR) over the more frequently encountered prevalence odds ratios (POR), and others3 have claimed that both are useful, depending on the arguments and circumstances. It is likely that the wide availability of computer programs which produce OR as a standard result of fitting logistic regression models (see for example10–12) has increased the use of POR as effect measures in cross-sectional studies, while the absence of simple computer programs for the estimation of PRR has probably reduced their application. This gap is likely to narrow,13,14 and the selection of appropriate epidemiological measures should not be based on the available tools but on epidemiological grounds. To add further substance to the discussion, and to expand on some suggestions made by other authors in this Journal,3 we thought it useful to clarify the mathematical relationships between POR and PRR for a better understanding of their performance in different contexts, i.e. for varying values of the prevalences of the disease and exposure.

METHODS If we consider a 2 × 2 table deriving from a cross-sectional study, a number of useful quantities can be easily defined, including: prevalence of the disease (Pr(D)), prevalence of the exposure (Pr(E)), prevalence odds ratio (POR), and prevalence rate ratio (PRR) (Table 1). With the help of some algebra different

* Clinica del Lavoro «Luigi Devoto», Istituti Clinici di Perfezionamento, Milan, Italy. ** Institute of Occupational Health, Faculty of Medicine, University of Milan, Milan, Italy. Reprint requests to: Pier A Bertazzi, Institute of Occupational Health, Faculty of Medicine, University of Milan, Via San Barnaba 8, 20122 Milan, Italy.


PREVALENCE RATE RATIOS VERSUS ODDS RATIOS

TABLE 1 Notation and definitions used in the text

                     Exposure
Disease        Yes      No       Total
Yes            a        b        D1
No             c        d        D0
Total          N1       N0       T

Pr(D) = D1 / T
Pr(E) = N1 / T
POR = (a d) / (b c)
PRR = (a / N1) / (b / N0)
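The definitions in Table 1 can be turned into a few lines of code. The following is a minimal sketch, assuming the cell layout reconstructed above (a and c are the exposed subjects, b and d the unexposed); the function name `table_measures` and the counts are hypothetical and serve only as an illustration.

```python
def table_measures(a, b, c, d):
    """Return Pr(D), Pr(E), POR and PRR for a 2 x 2 table laid out as in Table 1.

    a: exposed, diseased        b: unexposed, diseased
    c: exposed, not diseased    d: unexposed, not diseased
    """
    n1 = a + c   # total exposed (N1)
    n0 = b + d   # total unexposed (N0)
    d1 = a + b   # total diseased (D1)
    t = n1 + n0  # grand total (T)
    return {
        "Pr(D)": d1 / t,
        "Pr(E)": n1 / t,
        "POR": (a * d) / (b * c),
        "PRR": (a / n1) / (b / n0),
    }


# Hypothetical counts, chosen only for illustration (not data from the paper).
measures = table_measures(a=40, b=20, c=60, d=80)
print(measures)
```

With these illustrative counts Pr(D) = 0.3 and Pr(E) = 0.5, the PRR is 2 and the POR is about 2.67, so the POR already lies further from the null value than the PRR.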

relationships between PRR and POR can be established,3 and a useful resulting formula is the following:

\[
\mathrm{POR} = \mathrm{PRR} \times \frac{1 - \Pr(E) + \mathrm{PRR}\,\Pr(E) - \Pr(D)}{1 - \Pr(E) + \mathrm{PRR}\,\Pr(E) - \mathrm{PRR}\,\Pr(D)} \qquad (1)
\]
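One way to arrive at formula (1) is sketched below. The symbols p_1 = a/N1 and p_0 = b/N0 (prevalence of the disease among exposed and unexposed subjects) and the shorthand K are introduced here for illustration only and are not part of the original notation.

\[
\begin{aligned}
\mathrm{PRR} &= \frac{p_1}{p_0}, \qquad \Pr(D) = \Pr(E)\,p_1 + \bigl(1 - \Pr(E)\bigr)\,p_0 = p_0\,K, \qquad K = 1 - \Pr(E) + \mathrm{PRR}\,\Pr(E), \\
\mathrm{POR} &= \frac{p_1/(1 - p_1)}{p_0/(1 - p_0)} = \mathrm{PRR}\,\frac{1 - p_0}{1 - p_1} = \mathrm{PRR}\,\frac{1 - \Pr(D)/K}{1 - \mathrm{PRR}\,\Pr(D)/K} = \mathrm{PRR}\,\frac{K - \Pr(D)}{K - \mathrm{PRR}\,\Pr(D)}.
\end{aligned}
\]

In this form the numerator and the denominator of the fraction differ by Pr(D)(PRR – 1), so, whenever the denominator is positive, the fraction exceeds one exactly when PRR exceeds one.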

The selected formulation enables us to explore the relationship between POR and PRR as a function of both the prevalence of the disease and the prevalence of the exposure. Implicitly, this choice corresponds to evaluating how the POR departs from the PRR, which is conceptually taken as the reference measure: if PRR as a function of POR is of interest, a reverse approach can be easily developed. Pr(D) and Pr(E) can range from 0 to 1, while POR and PRR can extend from 0 to infinity; however, because of the relationships between these measures, some combinations of values are not permitted because they give rise to, for example, negative or undefined quantities.
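Formula (1) and the restriction on permitted combinations can be illustrated with a short function. This is a sketch under stated assumptions: the name `por_from_prr` is hypothetical, and the admissibility check (both group-specific prevalences must lie strictly between 0 and 1) is one plausible reading of "not permitted", not a rule stated in the text.

```python
def por_from_prr(prr, pr_d, pr_e):
    """Evaluate formula (1); return None for combinations that are not permitted."""
    if prr <= 0 or not (0 < pr_d < 1) or not (0 < pr_e < 1):
        return None
    k = 1 - pr_e + prr * pr_e  # shorthand for the part shared by numerator and denominator
    p0 = pr_d / k              # implied prevalence of disease among the unexposed
    p1 = prr * p0              # implied prevalence of disease among the exposed
    if not (0 < p0 < 1 and 0 < p1 < 1):
        return None            # would give a negative or undefined POR
    return prr * (k - pr_d) / (k - prr * pr_d)


print(por_from_prr(2.0, 0.5, 0.5))  # 4.0
print(por_from_prr(2.0, 0.8, 0.5))  # None: implied prevalence among the exposed exceeds 1
print(por_from_prr(1.0, 0.3, 0.7))  # 1.0
```

The second call shows a combination that cannot occur: with Pr(D) = 0.8 and Pr(E) = 0.5, a PRR of 2 would require more than 100% of exposed subjects to be diseased.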

RESULTS A simple look at formula (1), with a little algebra, shows that when PRR is equal to one POR will coincide exactly with PRR irrespective of the values of Pr(D) and Pr(E), while for all other conditions POR and PRR will differ. In addition, when PRR is greater than one POR will be greater than PRR, while when PRR is less than one POR will be less than PRR: both these departures from equality greatly depend on the values of Pr(D) and Pr(E). Figure 1 shows the relationship between PRR and POR for selected values of the prevalence of the disease and of the exposure, in a restricted range of PRR values. The figure highlights that the relationship is not linear (it is a quadratic curve) and that according to the values of Pr(D) and Pr(E) the curve rotates around the value of

FIGURE 1 Graphical relationship between prevalence rate ratio (PRR) and prevalence odds ratio (POR) for selected values of the prevalence of the disease (Pr(D)) and the prevalence of the exposure (Pr(E))


PRR equal to one and/or changes curvature. In addition, for the special case of Pr(D) = Pr(E) = 0.5 the POR value is exactly the square of the corresponding PRR.

If we take as a baseline, for the sake of comparison, the curve which corresponds to Pr(D) = Pr(E) = 0.5 (black triangles), Figure 1 shows that a decrease in the prevalence of the exposure (e.g. Pr(E) = 0.2, black rectangles) will cause an increase in the curvature of the relationship, giving rise to values of POR which are always greater than the baseline: in particular, they will depart much more from the PRR values when PRR is greater than one and will approach the PRR values when PRR is less than one. If an increase in the prevalence of the exposure is considered (e.g. Pr(E) = 0.8, white circles), a decrease in the curvature of the relationship will result, with POR departing more from the PRR values when PRR is < 1 and approaching them when PRR is > 1.

A different result will be obtained if a change in the prevalence of the disease (instead of a change in the prevalence of the exposure) is considered: in this situation a rotation of the curve around the point POR = PRR = 1 will take place. For example, again with respect to the curve with Pr(D) = Pr(E) = 0.5, if we consider a decrease of Pr(D) (e.g. Pr(D) = 0.2, white rectangles) the POR will always be closer to the corresponding PRR value, while an increase in the prevalence of the disease (e.g. Pr(D) = 0.8, black circles) will cause a further departure of the POR from PRR: neither change depends on whether PRR is greater or less than one. In addition, a change in the prevalence of the disease will affect the POR value much more than a change in the prevalence of the exposure, as can be seen from comparison of the white and black rectangle curves in Figure 1 (or the white and black circle curves).

When the disease is rare (Pr(D) < 0.10) no major discrepancies emerge between POR and PRR, irrespective of the prevalence of the exposure. For example, with Pr(D) = 0.05 and PRR = 2.5 the values of POR range from 2.71 when Pr(E) = 0.01 to 2.58 when Pr(E) = 0.99, which are not very different from 2.5, and the differences tend to diminish as PRR approaches one.
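The numerical statements above can be reproduced directly from formula (1). The sketch below assumes nothing beyond that formula; the function name `por_formula1` and the variable names are hypothetical, and the values at PRR = 2 are computed here for illustration rather than read off Figure 1.

```python
def por_formula1(prr, pr_d, pr_e):
    """Formula (1): POR as a function of PRR, Pr(D) and Pr(E)."""
    k = 1 - pr_e + prr * pr_e
    return prr * (k - pr_d) / (k - prr * pr_d)


# Rare-disease example from the text: Pr(D) = 0.05, PRR = 2.5.
rare_low_exposure = round(por_formula1(2.5, 0.05, 0.01), 2)   # 2.71
rare_high_exposure = round(por_formula1(2.5, 0.05, 0.99), 2)  # 2.58

# Special case Pr(D) = Pr(E) = 0.5: POR equals the square of PRR.
squares = [(prr, por_formula1(prr, 0.5, 0.5)) for prr in (0.5, 2.0, 3.0)]

# POR at PRR = 2 when one prevalence moves away from the 0.5 baseline.
lower_exposure = por_formula1(2.0, 0.5, 0.2)   # about 7.0 (baseline: 4.0)
higher_exposure = por_formula1(2.0, 0.5, 0.8)  # about 3.25
lower_disease = por_formula1(2.0, 0.2, 0.5)    # about 2.36

print(rare_low_exposure, rare_high_exposure)
print(squares)
print(lower_exposure, higher_exposure, lower_disease)
```

At PRR = 2, lowering Pr(D) from 0.5 to 0.2 brings the POR from 4 down to about 2.36, close to the PRR itself, which is in line with the statement that the prevalence of the disease is the more influential quantity.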

DISCUSSION Many epidemiological measures can usefully describe the results of a study, and the selection of the most appropriate is never obvious.15 In particular, some authors have pointed out that the odds ratio is a particularly misunderstood measure,9,15–17 and that the cross-sectional study requires extensive methodological discussion.1–8

Cross-sectional data can serve many purposes,18,19 and the wide range of applications could suggest the use of different epidemiological measures in different contexts. For example, cross-sectional data can be used to estimate incidence density ratios (IDR).18,19 In this situation it has been shown20 that under some restrictive assumptions (regarding, for example, the distribution over time of exposures, covariates, and incidence; migration among diseased; duration of disease) POR approximates IDR better than PRR does, which means that when we are dealing with chronic diseases (i.e. long latency diseases with different follow-up periods for the subjects under observation) the use of POR may be completely justified. On the other hand, if risk ratios (RR) are the parameters of interest (considering acute diseases, with follow-up periods similar among subjects) then PRR should be the measure of choice.19 In other applications we are interested in outcomes which are not strictly ‘diseases’: consider, for example, a study with the aim of describing how the proportion of subjects with a particular condition (e.g. an adduct level above the median) varies according to some covariate(s). In these situations the use of the prevalence rate (and hence of PRR) as a descriptor of a ‘state’ seems a more natural and intelligible measure.7 In addition, it has been noted that the usual assumption of similar duration of disease (or prognosis) between exposed and unexposed subjects may not be satisfied: a case in point is musculoskeletal disorders, for which the duration of symptoms is likely to vary between exposure groups.5,7 In this situation again the use of PRR seems warranted.
A further argument against the use of POR is that it can introduce confounding even when there is none in terms of prevalence rates.7 In other instances (the ‘sex ratio’, for example) the odds ratio is a natural epidemiological measure.3 As a further step in the discussion we have explored the relationship between POR and PRR in order to understand the most important discrepancies between them, using a general formula for this relationship as a function of the prevalence of the disease (Pr(D)) and the prevalence of the exposure (Pr(E)). The results indicate that the POR is always further away from the null value than the PRR and that the discrepancies between POR and PRR strongly depend on both Pr(D) and Pr(E), with the former being more important from a quantitative point of view. With respect to the prevalence of the disease we have considered values around 0.5 because they are very common in emerging areas like musculoskeletal disorders5 and molecular epidemiology. In the latter, for


example, the outcome may be represented by a categorization of a continuous variable, e.g. adducts level or other biological markers, with the median value being the cutpoint.21,22 In other areas the most frequent prevalence value of the disease and of the exposure can greatly vary from the presented examples and consequently the discrepancies between POR and PRR could vary as well. In particular, it is well known that when Pr(D) is low (less than, say, 0.10) POR and PRR will be very similar and there will be no practical reasons to distinguish between them. We have chosen to describe the relationship between POR and PRR in terms of the prevalence of the disease instead of, for example, the prevalence of the disease among non-exposed subjects3 because in many cross-sectional studies the exposure status is not a criterion for selecting subjects into the study.19 We have only addressed point estimates of POR and of PRR. Statistical aspects, like test of hypothesis or confidence interval estimation, can easily be included in the discussion recalling that both measures are defined in the frame of binomial variability. From the point of view of hypothesis testing (i.e. accepting/rejecting the hypothesis that POR or PRR is equal to one) the two measures give in practice the same answer, whereas in terms of confidence intervals POR is characterized by wider intervals which, in other words, means less precision in the estimates.9 The issue of the discrepancy in point estimation requires a further comment. When POR and PRR are considered in their own domain, the two measures do not need to be compared, but when they are interpreted as estimators of some ‘risk’ (ratio of risks) a new problem arises. Suppose, for example, that PRR is equal to two and POR is equal to four (which is the case when Pr(D) = Pr(E) = 0.5). 
Irrespective of their statistical significance (and confidence intervals), the two values indicate, from a quantitative point of view, very different interpretations (a twofold versus a fourfold risk) which depend only on the choice of a specific epidemiological measure and not on the scientific question at issue. We should be aware of this bias, particularly when comparing results of different studies from the literature. In summary, irrespective of the preferences of different authors or the habits induced by the availability of software tools, we think that POR and PRR each have their specific role with cross-sectional data and that the choice between them should rest on epidemiological grounds only. The arguments made above would suggest the use of PRR instead of POR in most situations. Despite these considerations, POR has played a major role in the description of the results of cross-sectional studies due


mainly to mathematical convenience17 and the easy availability of advanced statistical tools (mainly logistic regression1,7), which were not so easy to use for the estimation of PRR. This situation is expected to change rapidly thanks to new software programs (e.g. SAS GENMOD12).
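When results from different studies are compared, it may help to translate a reported POR back into the PRR implied by formula (1). The following is a sketch of the "reverse approach" mentioned in the Methods, obtained here by rearranging formula (1) into a quadratic equation in PRR; the rearrangement and the function name `prr_from_por` are assumptions of this example and do not appear in the paper.

```python
import math


def prr_from_por(por, pr_d, pr_e):
    """Solve formula (1) for PRR, given POR > 0 and 0 < Pr(D), Pr(E) < 1.

    Rearranging formula (1) gives the quadratic a*PRR**2 + b*PRR + c = 0 with
    the coefficients below. Because a > 0 and c < 0, exactly one root is positive.
    """
    a = pr_e
    b = 1 - pr_e - pr_d - por * (pr_e - pr_d)
    c = -por * (1 - pr_e)
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


print(prr_from_por(4.0, 0.5, 0.5))  # close to 2: the twofold versus fourfold example
print(prr_from_por(7.0, 0.5, 0.2))  # close to 2
```

Only point estimates are handled here; confidence limits would need separate treatment.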

REFERENCES

1 Lee J. Odds ratio or relative risk for cross-sectional data? Int J Epidemiol 1994; 23: 201–03.
2 Hughes K. Odds ratios in cross-sectional studies. Int J Epidemiol 1995; 24: 463–64.
3 Osborn J, Cattaruzza M S. Odds ratio and relative risk for cross-sectional data. Int J Epidemiol 1995; 24: 464–65.
4 Lee J, Chia K S. Estimation of prevalence rate ratios for cross sectional data: an example in occupational epidemiology. Br J Ind Med 1993; 50: 861–62.
5 Axelson O. Some recent developments in occupational epidemiology. Scand J Work Environ Health 1994; 20: 9–18.
6 Strömberg U. Prevalence odds ratio v prevalence ratio. Occup Environ Med 1994; 51: 143–44.
7 Axelson O, Fredriksson M, Ekberg K. Use of the prevalence ratio v the prevalence odds ratio as a measure of risk in cross sectional studies. Occup Environ Med 1994; 51: 574.
8 Lee J, Chia K S. Use of the prevalence ratio v the prevalence odds ratio as a measure of risk in cross sectional studies. Occup Environ Med 1994; 51: 841.
9 Nurminen M. To use or not to use the odds ratio in epidemiologic analyses? Eur J Epidemiol 1995; 11: 365–71.
10 SERC. EGRET. Seattle: Society for Epidemiologic Research and Cytel Software Corporations, 1991.
11 Preston D L, Lubin J H, Pierce D H. EPICURE. Seattle: HiroSoft International Corp, 1990.
12 SAS Institute. The GENMOD procedure. SAS Technical Report P-243. Cary, NC: SAS Institute, 1993.
13 Zocchetti C, Consonni D, Bertazzi P A. Estimation of prevalence rate ratios from cross-sectional data. Int J Epidemiol 1995; 24: 1064–65.
14 Lee J. Estimation of prevalence rate ratios from cross-sectional data: a reply. Int J Epidemiol 1995; 24: 1066–67.
15 Greenland S. Interpretation and choice of effect measures in epidemiologic analyses. Am J Epidemiol 1987; 125: 761–68.
16 Pearce N. What does the odds ratio estimate in a case-control study? Int J Epidemiol 1993; 22: 1189–92.
17 Miettinen O S. Theoretical Epidemiology. New York: John Wiley, 1985.
18 Rothman K J. Modern Epidemiology. Boston: Little, Brown, 1986.
19 Kleinbaum D G, Kupper L L, Morgenstern H. Epidemiologic Research. Principles and Quantitative Methods. New York: Van Nostrand Reinhold, 1982.
20 Alho J M. On prevalence, incidence, and duration in general stable populations. Biometrics 1992; 48: 587–92.
21 Vineis P, Bartsch H, Caporaso N et al. Genetically based N-acetyltransferase metabolic polymorphism and low-level environmental exposure to carcinogens. Nature 1994; 369: 154–56.
22 Halperin W, Kalow W, Sweeney M H, Tang B K, Fingerhut M, Timpkins B, Wille K. Induction of P-450 in workers exposed to dioxin. Occup Environ Med 1995; 52: 86–91.

(Revised version received July 1996)