Stratification for exploring heterogeneity in systematic ...

EBM notebook

Jottings …

One of the advantages of being the editor of Evidence-Based Medicine is that I get to see the cream of the current research. The role has helped me keep up to date for years. But alas, it is also exhausting; so I will be ending my editorship with this issue. Also, as of January 2010, the Health Information Research Unit at McMaster University will no longer be producing the contents for Evidence-Based Medicine. Evidence-Based Medicine will continue, but for a number of reasons, the BMJ Group has decided to use a new provider. Having been editor for 10 years, I decided that this was a good opportunity to stand down. My co-editor, Brian Haynes, and his staff created the publication in 1995 at the request of the BMJ Group, and Brian has edited or co-edited it to date. Brian will be stepping down as co-editor as well. Our time as editors has been hard work but extremely rewarding, and we have been honoured to be involved with such a wonderful service to clinicians. We will now cheer from the sidelines.

We would like to thank you all for your help over the years and trust that you will continue to support the journal. The BMJ Group is searching for a new editor, and we hope some of you will be interested in applying. Special thanks to the wonderful staff we’ve all worked with over the years, currently led by Angela Eady and Susan Marks, and including research associates (currently Joanne Gunby, Jean Mackay, Rovena Tey, and Lorraine Weise-Kelly); MORE rating system and production led by Dawn Jedras; Olive Goddard, the Oxford coordinator; and editorial assistants, Laurie Gunderman and Nancy Bordignon at McMaster and Mary Hodgkinson in the UK. With the change of team in 2010, I expect readers will see some innovations. We wish the new team well and hope that readers of Evidence-Based Medicine will continue to enjoy the “less is more” format for keeping up to date.

Paul Glasziou, MBBS, PhD
Editor, Evidence-Based Medicine

Stratification for exploring heterogeneity in systematic reviews

In quantitative systematic reviews, the results of individual studies are combined into an overall pooled estimate by calculating a weighted average.1 It is natural to expect some differences among the results of the studies included; indeed, it would be astonishing if the results were identical. However, such variation raises the key question of whether combining the studies is tenable or highly suspect. If this variation (or heterogeneity) is consistent with the “play of chance,” then obtaining a pooled estimate is appropriate. If it is not, the cause of this variation should be investigated.

STRATIFICATION AS A METHOD TO INVESTIGATE SOURCES OF HETEROGENEITY

Heterogeneity may generally be thought of as either quantitative or qualitative. Quantitative (or statistical) heterogeneity describes the variation in study results that is due to between-study differences in methodological characteristics (eg, aspects of study design), whereas qualitative (or clinical) heterogeneity describes variation due to between-study differences in patient characteristics. A simple method of exploring quantitative and qualitative heterogeneity is to separate studies into subgroups that share either particular methodological characteristics (eg, double-blinding) or particular clinical characteristics (eg, patients with or without prior disease), and to compare estimates of the pooled weighted average in each group.
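As a rough sketch of the subgroup comparison described above, the following Python snippet pools study results within each stratum using inverse-variance weights and then compares the two pooled estimates with a simple z test. The study values, the function names, and the choice of fixed-effect weighting and a z test are all assumptions made for illustration; none of them is taken from the PRT review discussed in this article.

```python
import math

def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted average and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    estimate = sum(w * y for w, y in zip(weights, effects)) / total
    return estimate, math.sqrt(1.0 / total)

def pool_by_stratum(studies):
    """Group studies by a yes/no characteristic and pool each group separately."""
    strata = {}
    for effect, se, flag in studies:
        strata.setdefault(flag, []).append((effect, se))
    return {flag: pool_fixed_effect(*zip(*rows)) for flag, rows in strata.items()}

def compare_strata(a, b):
    """z statistic and two-sided p value for the difference of two pooled estimates."""
    difference = a[0] - b[0]
    se = math.sqrt(a[1] ** 2 + b[1] ** 2)
    z = difference / se
    return z, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: (effect estimate, standard error, double-blind?)
studies = [
    (0.30, 0.10, True),
    (0.25, 0.15, True),
    (0.35, 0.12, True),
    (0.80, 0.20, False),
    (0.70, 0.25, False),
]

pooled = pool_by_stratum(studies)
z, p_value = compare_strata(pooled[True], pooled[False])

for flag, (estimate, se) in pooled.items():
    print(f"double-blind={flag}: pooled estimate {estimate:.3f} (SE {se:.3f})")
print(f"difference between strata: z = {z:.2f}, p = {p_value:.4f}")
```

In this made-up example the studies that were not double-blind give a visibly larger pooled estimate, which is the kind of pattern stratification is meant to expose.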

WAYS TO IDENTIFY HETEROGENEITY

Heterogeneity can often be identified through a simple visual inspection of a forest plot of all individual study results.1 If the results of individual studies broadly appear to “line up” with each other, then there is probably little heterogeneity. However, it is not always easy to see whether the results of different studies are consistent with one another. In such cases, formal statistical approaches can be used to quantify the amount of heterogeneity, and whether it is consistent with the play of chance. One common approach is to calculate the Q statistic, which tests the null hypothesis of no true heterogeneity.2 A related approach is to calculate the I² statistic, which quantifies the relative consistency between studies on a scale of 0–100%, with higher numbers indicating more inconsistency.3 If study results are deemed to be broadly consistent (eg, I² <50%), the systematic reviewer might be satisfied that the results can be considered together in a single pooled weighted estimate (although this does not necessarily mean that the pooled estimate is correct, that is, unbiased).4 However, if substantial inconsistency between studies is observed (eg, I² >50%), the systematic reviewer may wish to be more cautious about combining the individual results until the potential reasons for heterogeneity are explored.5

CASE STUDY ON HOW TO INVESTIGATE HETEROGENEITY USING STRATIFICATION

We use a subset of studies6 from a systematic review conducted by Latham et al7,8 to illustrate this approach. This systematic review investigated the extent to which progressive resistance training (PRT; strength training programs that increase the stimulus as strength improves) increases overall muscle strength. This may be particularly important (eg, for older people) because if PRT increases overall muscle strength, then this could reduce risk of fractures. Figure 1 shows the overall pooled effect and corresponding Q statistic of 95.8. The corresponding p value of <0.001 can be found by referring to tables of the chi-square distribution with 36 degrees of freedom (df). The large Q statistic and highly significant p value show that there is substantially more heterogeneity between the 37 studies than would be expected by chance alone, a conclusion that is supported by the high I² statistic of 62%.

In the PRT review, several potential sources of methodological heterogeneity were defined a priori: use of intention-to-treat analysis; blinding of participants; blinding of outcome assessors; and clear description of randomisation procedures. Figure 1 shows the results of the 37 studies stratified by description of randomisation procedures. A clear difference is seen between the pooled effect size estimates for studies that gave a clear description of randomisation procedures compared with studies that did not (the 2 black diamonds shown in the figure). Moreover, within-subgroup heterogeneity is much lower (and also not significant) among studies that gave a clear description of randomisation procedures (Q = 9.85, p = 0.28, I² = 19%) than it is among studies that did not (Q = 52.2, p = 0.003, I² = 48%). Stratification of studies into these 2 subgroups shows that methodological differences between studies can affect heterogeneity. Figure 2 summarises the information for all of the proposed methodological criteria. In addition to the marked difference in the summary estimates depending on whether a clear description of randomisation procedures was given, a small, but significant, difference was found depending on participant blinding (p = 0.028, not shown in figure).

In the PRT review, the following potential sources of clinical heterogeneity were defined a priori: functional limitations of patients; populations with specific health problems; intensity of treatment (high v low); and duration of PRT (≤12 v >12 wks). Figure 3 shows the forest plot of the 37 studies stratified into subgroups by each of these criteria. No significant differences were found in the summary estimates of any of these subgroups, and so (unlike the subgroups defined by clear description of randomisation procedures), the subgroup-specific estimates shown in figure 3 are consistent with each other and also with the overall pooled estimate for all 37 studies combined.

Figure 1 Forest plot for 37 studies included in the PRT meta-analysis stratified by “clear description of randomisation.” The black squares represent the standardised mean differences (with area proportional to the amount of information available in each study; ie, larger studies have bigger squares), and horizontal lines indicate 95% CIs. The black diamond indicates the subgroup pooled estimate. The open diamond is the overall pooled estimate. The vertical line through zero represents no effect.

EBM December 2009 Vol 14 No 6
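The heterogeneity statistics quoted in the case study can be checked with a few lines of Python. The sketch below derives I² from Q and its degrees of freedom as (Q − df)/Q, and obtains p values from the chi-square upper tail using a finite series that is valid for whole-number df. Two things here are assumptions rather than facts from the text: the subgroup degrees of freedom (8 and 27, ie, 9 and 28 studies) are not reported and are inferred from the quoted Q and I² values, and the between-subgroup Q is obtained by subtraction, which holds exactly only if fixed-effect inverse-variance weights were used.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite series in chi = sqrt(x)
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total

def i_squared(q, df):
    """I² as a percentage: the share of Q in excess of its degrees of freedom."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# Q values are as reported; the subgroup df are ASSUMED (inferred, not reported).
reported = {
    "all 37 studies": (95.8, 36),
    "clear description of randomisation": (9.85, 8),
    "no clear description": (52.2, 27),
}

for name, (q, df) in reported.items():
    print(f"{name}: Q = {q}, df = {df}, "
          f"p = {chi2_sf(q, df):.3g}, I2 = {i_squared(q, df):.0f}%")

# Between-subgroup heterogeneity by subtraction (assumes fixed-effect weights).
q_between = reported["all 37 studies"][0] - (9.85 + 52.2)
p_between = chi2_sf(q_between, 1)
print(f"between subgroups: Q = {q_between:.2f}, df = 1, p = {p_between:.2g}")
```

Under these assumptions the I² values come out at 62%, 19%, and 48%, matching the figures quoted above, and most of the overall Q is attributable to the difference between the two subgroups.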

OTHER METHODS

Several other graphical methods can be used to explore heterogeneity in specific situations. These include investigating how sensitive the combined result is to any 1 study (ie, “leave 1 out” plots), investigating heterogeneity when the outcome is continuous (Galbraith plots),9 and investigating heterogeneity for other outcome measures such as risk differences, risk ratios, and odds ratios (L’Abbé plots).10 Furthermore, special regression methods can be used to investigate heterogeneity due to several factors simultaneously, although the results of such approaches should be interpreted with caution.11

In summary, in the presence of heterogeneity, the systematic reviewer should attempt to explain the potential sources of heterogeneity as an important aid to interpretation.
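The “leave 1 out” idea mentioned above can be sketched in Python as follows: the pooled estimate is recomputed with each study omitted in turn, and the study whose removal shifts the result most is flagged. The data, the function names, and the use of a fixed-effect inverse-variance average are assumptions for illustration only; in practice the omitted-study estimates would usually be plotted rather than printed.

```python
def pool_fixed_effect(effects, std_errors):
    """Inverse-variance weighted average of the study effects."""
    weights = [1.0 / se ** 2 for se in std_errors]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)

def leave_one_out(effects, std_errors):
    """Pooled estimate recomputed with each study removed in turn."""
    results = []
    for i in range(len(effects)):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_errors = std_errors[:i] + std_errors[i + 1:]
        results.append(pool_fixed_effect(kept_effects, kept_errors))
    return results

# Hypothetical data: the fourth study is an outlier.
effects = [0.20, 0.30, 0.25, 1.20]
std_errors = [0.10, 0.10, 0.10, 0.20]

overall = pool_fixed_effect(effects, std_errors)
omitted = leave_one_out(effects, std_errors)
shifts = [abs(estimate - overall) for estimate in omitted]
most_influential = shifts.index(max(shifts))

print(f"overall pooled estimate: {overall:.3f}")
for i, estimate in enumerate(omitted):
    print(f"omitting study {i + 1}: {estimate:.3f}")
print(f"most influential study: {most_influential + 1}")
```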


Figure 2 Potential methodological sources of heterogeneity

Figure 3 Potential clinical sources of heterogeneity

Derrick A Bennett PhD, Jonathan R Emberson PhD
University of Oxford; Oxford, UK

1. Perera R, Heneghan C. Interpreting meta-analysis in systematic reviews. Evid Based Med 2008;13:67–9.
2. Higgins J, Thompson S, Deeks J, et al. Statistical heterogeneity in systematic reviews of clinical trials: a critical appraisal of guidelines and practice. J Health Serv Res Policy 2002;7:51–61.
3. Higgins JP, Thompson SG, Deeks JJ, et al. Measuring inconsistency in meta-analyses. BMJ 2003;327:557–60.
4. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol 2000;53:1119–29.
5. Higgins JP, Thompson SG. Quantifying heterogeneity in a meta-analysis. Stat Med 2002;21:1539–58.
6. Bennett DA, Latham NK, Stretton C, et al. Capture-recapture is a potentially useful method for assessing publication bias. J Clin Epidemiol 2004;57:349–57.
7. Latham NK, Bennett DA, Stretton CM, et al. Systematic review of progressive resistance strength training in older adults. J Gerontol A Biol Sci Med Sci 2004;59:48–61.
8. Latham N, Anderson C, Bennett D, et al. Progressive resistance strength training for physical disability in older people. Cochrane Database Syst Rev 2003(2):CD002759.
9. Galbraith RF. A note on graphical presentation of estimated odds ratios from several clinical trials. Stat Med 1988;7:889–94.
10. L’Abbé KA, Detsky AS, O’Rourke K. Meta-analysis in clinical research. Ann Intern Med 1987;107:224–33.
11. Thompson SG, Higgins JP. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559–73.
