Special Issue Paper

Received 23 January 2013; accepted 3 February 2013. Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jrsm.1077

Checklists of methodological issues for review authors to consider when including non-randomized studies in systematic reviews

George A Wells,a,b*† Beverley Shea,c Julian PT Higgins,d,e Jonathan Sterne,f Peter Tugwell,b,g and Barnaby C Reevesh

Background: There is increasing interest from review authors about including non-randomized studies (NRS) in their systematic reviews of health care interventions. This series from the Ottawa Non-Randomized Studies Workshop consists of six papers identifying methodological issues when doing this.

Aim: To format the guidance from the preceding papers on study design and bias, confounding and meta-analysis, selective reporting, and applicability/directness into checklists of issues for review authors to consider when including NRS in a systematic review.

Checklists: Checklists were devised providing frameworks to describe/assess: (1) study designs, based on study design features; (2) risk of residual confounding, and when to consider meta-analysing data from NRS; (3) risk of selective reporting, based on the Cochrane framework for detecting selective outcome reporting in trials but extended to selective reporting of analyses; and (4) directness of evidence contributed by a study, to aid integration of NRS findings into summary of findings tables.

Summary: The checklists described will allow review groups to operationalize the inclusion of NRS in systematic reviews in a more consistent way. The next major step is extending the existing Cochrane Risk of Bias tool so that it can assess the risk of bias in NRS included in a review. Copyright © 2013 John Wiley & Sons, Ltd.

Keywords: non-randomized studies; systematic review; methodological issues; checklists

1. Introduction

The aim of this paper is to format the guidance from the preceding papers of the Ottawa Non-Randomized Studies Workshop [Higgins et al., 2013; Valentine and Thompson, 2013; Norris et al., 2013; Schünemann et al., 2013] into checklists of issues for review authors to consider when including primary non-randomized studies (NRS) in a systematic review.

a Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Canada
b Department of Medicine, University of Ottawa, Ottawa, Canada
c Community Information and Epidemiological Technologies, Institute of Population Health, Ottawa, Canada
d MRC Biostatistics Unit, Cambridge, UK
e Centre for Reviews and Dissemination, University of York, York, UK
f School of Social and Community Medicine, University of Bristol, UK
g Centre for Global Health, Institute of Population Health, Ottawa, Canada
h Clinical Trials and Evaluation Unit, School of Clinical Sciences, University of Bristol, UK

*Correspondence to: George Wells, Department of Epidemiology and Community Medicine, University of Ottawa, 451 Smyth Road, Ottawa, ON K1H 8M5, Canada.
†E-mail: [email protected]

Res. Syn. Meth. 2013, 4 63–77


When including NRS, it is even more important to plan the review protocol carefully in advance of doing the systematic review than when including only randomized controlled trials (RCTs) [Reeves et al., 2013]. Hence, when including NRS in a systematic review, the additional tasks that review authors need to carry out start at the protocol stage. Then, when appraising individual primary studies, there is a need to carry out additional assessments using the checklists described, often in relation to information specified in the review protocol.

At the protocol stage, two key tasks need to be specified. First, detailed definitions of each component of the population, intervention, comparator and outcome (PICO) question should be provided. In principle, this task is no different for a systematic review that includes NRS from one that includes only RCTs. However, since one of the main advantages of including NRS is argued to be the improved directness of the NRS evidence, it is particularly important to be able to assess the directness of primary studies in detail in relation to the review question, perhaps by considering the ideal RCT that would answer the review question (see Section 5).

Second, careful consideration is needed of which types of NRS should be eligible for the review. This assessment is more challenging than determining eligibility for a systematic review of RCTs only. As described by Higgins and colleagues [Higgins et al., 2013], workshop delegates did not agree whether eligibility criteria should be set to ensure a particular level of quality of evidence or to avoid an ‘empty’ review (for example, to consider including certain types of studies only if no ‘better’ types of studies are identified).
Therefore, review authors should be explicit about their position with respect to this issue and consider carefully, based on their background knowledge of the literature, what kinds of NRS will best allow the systematic review to address the PICO question specified in the review protocol [Reeves et al., 2013]. This consideration typically involves a trade-off between minimizing the risk of bias and maximizing the directness of the evidence to the research question. The task will require consideration of a number of factors, such as: the availability of evidence; the urgency of having an answer to the question; the intrinsic risk of bias to evaluations of the question (given the nature of the question, i.e. unintended/harmful versus intended/beneficial outcome); and the likely directness of evidence from different kinds of study. We recommend that eligibility be specified by design features because of the ambiguity of study design labels [Higgins et al., 2013].

The following four tasks are also important to inform the choice of eligible NRS and the ways in which they might be tackled within the review:
• Identify prognostic factors, available before administering an intervention, that predict which intervention is likely to be administered (Rubin’s ‘assignment mechanism’ [Rubin, 1991]).
• Identify the kinds of ‘typical’ analysis that, as review authors, one would expect to see for each outcome of interest, in relation to the ‘inferential logic’ for different study designs [Valentine and Thompson, 2013].
• Consider whether the primary studies are likely to have pre-specified a formal hypothesis as in an RCT (confirmatory objective) or not (exploratory objective).
• Consider whether the review question is likely to have been addressed as the primary aim/objective of primary studies, or opportunistically as a subsidiary objective.
The planning of the review methodology has to take place at three levels:
• At the level of the review protocol (following standard considerations, plus the key review-level tasks described above).
• At the level of assessing each primary study (particularly in appraising risk of bias).
• At the level of each specific outcome of interest to the review (this will often take place within the assessment of the primary studies, as already required in an existing Cochrane Risk of Bias appraisal) [Higgins et al., 2008; Higgins et al., 2011].


In the sections below, we separate considerations into these three levels. The methodological areas covered by the workshop were never intended to be comprehensive, but to cover issues that were considered to be of highest priority. Other methodological aspects relating to the inclusion of NRS in systematic reviews need to be considered, including how the risk of detection, performance and attrition bias should be assessed for different types of study. The workshop also did not consider in detail methodological requirements relating to specific study designs (defined by common sets of study design features). In relation to study design, studies in which groups for comparison are defined by participants’ outcomes (often called case–control studies) require special consideration.

The structure of this paper is the same as that of the introductory paper, in that the four areas discussed at the workshop are reviewed in turn. For each area, we propose a checklist of issues that review authors should consider when including primary NRS in a systematic review. With the exception of the checklist about study design features, we have framed questions about the issues so that review authors can assess them using the response options adopted for the QUADAS-2 tool (i.e. ‘yes’, ‘probably yes’, ‘probably no’, and ‘no’) [Whiting et al., 2011]. However, unlike QUADAS-2, we have not constrained the direction of the responses to be consistent (i.e. ‘yes’ does not always imply good quality), in order to avoid contrived phrasing of the questions.
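Review teams that record these assessments electronically may find it useful to capture the response scale, together with the per-item direction of a favourable answer, in a small data type. The sketch below is purely illustrative: the names `Response` and `ChecklistItem` are hypothetical and are not part of QUADAS-2 or any Cochrane tooling.

```python
from dataclasses import dataclass
from enum import Enum

class Response(Enum):
    """QUADAS-2-style response options used throughout the checklists."""
    YES = "yes"
    PROBABLY_YES = "probably yes"
    PROBABLY_NO = "probably no"
    NO = "no"

@dataclass
class ChecklistItem:
    question: str
    response: Response
    # Unlike QUADAS-2, 'yes' does not always imply good quality, so each
    # item records the direction in which 'yes' is the favourable answer.
    yes_is_favourable: bool

item = ChecklistItem(
    question="Was there a relevant comparison?",
    response=Response.PROBABLY_YES,
    yes_is_favourable=True,
)
```

Recording the direction explicitly, rather than forcing consistent phrasing, mirrors the design decision described above.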


2. Study design

Study design and analysis features are important considerations when judging the inclusion of NRS in a systematic review. Review authors need to be able to discriminate between the various NRS designs as determined by study design and analysis, because the type and extent of bias is likely to depend on these design features.

2.1. Tasks at the level of the review protocol
Establish a consensus among the review authors to:
• Identify issues relating to study design and risk of bias when including NRS to address the review question.
• Consider carefully which types of NRS should be eligible for the review, in terms of study design features that are not typically specified by design labels (e.g. key steps in primary studies that must have been carried out after the study was designed; confounders that must be considered).
• Plan how citations will be triaged for eligibility, noting that eligibility criteria may be difficult to apply on the basis of abstracts alone; review authors will likely need to read a larger number of full-text papers to decide on eligibility.

2.2. Tasks at the level of assessing a primary study
Apply the list of study design features, at participant-level allocation (i.e. studies formed by classifying individuals by intervention and comparator) or cluster-level allocation (i.e. studies formed by classifying clusters by intervention and comparator) as appropriate, to each primary study (Tables 1 and 2) [Higgins et al., 2013]. The checklists are based on four key questions. (1) Was there a relevant comparison? (2) How were the comparison groups formed? (3) Were key research steps relating to identification of participants, baseline assessment, allocation or choice of intervention, and assessment of outcomes carried out after the study was designed? (4) How comparable were the groups?

Table 1. Checklist of study design features for studies formed by classifying individuals by intervention and comparator.

Table 2. Checklist of study design features for studies formed by classifying clusters by intervention and comparator.
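Where review teams track these answers electronically, the four key questions can be captured per study in a simple record. The sketch below is illustrative only: the type, its field names, and the example study are hypothetical and are not taken from Tables 1 and 2.

```python
from dataclasses import dataclass

@dataclass
class StudyDesignFeatures:
    """One primary study's answers to the four key questions (hypothetical structure)."""
    study_id: str
    cluster_allocated: bool    # False: participant-level (Table 1); True: cluster-level (Table 2)
    relevant_comparison: str   # (1) Was there a relevant comparison?
    groups_formed_how: str     # (2) How were the comparison groups formed?
    steps_after_design: str    # (3) Were key research steps carried out after the study was designed?
    groups_comparable: str     # (4) How comparable were the groups?

# Invented example study, for illustration only.
example = StudyDesignFeatures(
    study_id="Smith-2009",
    cluster_allocated=False,
    relevant_comparison="yes",
    groups_formed_how="by clinician's choice of intervention",
    steps_after_design="probably yes",
    groups_comparable="probably no",
)
```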

3. Confounding

In the framework of the current Cochrane Risk of Bias tool [Higgins and Altman, 2008; Higgins et al., 2011], confounding would arise if flaws in the randomization process allowed participants’ prognosis to influence their allocated intervention. Assessment of the risk of bias due to unmeasured or residual confounding is a fundamental component of the appraisal of an NRS being considered for inclusion in a systematic review, and of the nature of any meta-analysis. ‘A meta-analysis [of NRS] may give a precise estimate of average bias, rather than an estimate of the intervention’s effect’ [Valentine and Thompson, 2013]: such an estimate has the potential to misinform future clinical or policy decisions. The risk of unmeasured or residual confounding is likely to vary across studies because researchers will have measured (or have access to) different potential confounding variables, will have made varying decisions about the variables included in analyses, and will have used different statistical methods to control confounding.

3.1. Tasks at the level of the review protocol
Establish a panel with expert content knowledge of the topic to:


• Identify the likely domains of confounding for each outcome. (The term domain is used to recognize that different variables may have been used by primary researchers to measure one relevant confounding domain; e.g. socioeconomic status might be characterized in a dataset by total household income, small-area deprivation index, occupation or educational level attained.) The panel will need to consider likely confounding domains separately from consideration of the likely availability within primary studies of data that will allow authors to control for the confounding [Rubin, 1991]. If the review question is about an organizational evaluation and is likely to have been evaluated using cluster-allocated designs, panel members should be alert to the possibility of residual confounding arising at the cluster level. The panel may also specify the probable nature of the association between the confounder and the outcome (e.g. linear, U-shaped); if propensity score methods are used for analysis, such considerations would apply to the relationship between confounder and intervention. The panel may also anticipate the likely direction of confounding bias.

The investment in specifying domains of confounding at the stage of the review protocol will make the subsequent assessment of unmeasured and residual confounding by review authors easier. Once this information is available in the protocol, the review authors can prepare data extraction sheets to collect their assessments of the risk of unmeasured and residual confounding for primary studies (Table 3). There should be a data collection sheet for each outcome specified in the review, listing the identified confounders (ideally, in order of importance) in rows.

Table 3. Issues related to unmeasured and residual confounding to be assessed for each outcome specified in the review protocol.

3.2. Tasks at the level of assessing a primary study
At the level of primary studies, tasks mainly relate to assessing the risk of residual confounding at the level of outcomes within studies (see below). However, if a primary study has used propensity scores to adjust for potential confounding, then an assessment may be done for all outcomes at study level; this assessment would relate to the adequacy of the adjustment using propensity scores. For consistency, we propose that the assessment nevertheless be recorded at the level of each outcome, as described below, although the responses would be the same for each outcome.

3.3. Tasks at the level of assessing an outcome within a primary study
As described by Valentine and Thompson (2013), ‘an assessment of the risk of confounding should consider whether each confounder was measured, how precisely each was measured (e.g. binary, ordered categories, continuously scaled), and how each was taken into account in the analysis – both the design and analysis strategy (e.g. matching, multiple regression, propensity scores), and underlying assumptions about the association between the confounder and the outcome or intervention (e.g. linearity of the functional form).’ Therefore, review authors need to consider each of the following issues for each confounder, for each outcome in turn (see Table 3):

A. Is there evidence that a pre-specified confounding domain did not give rise to confounding? To answer this question, the review author should identify whether researchers did any of the following for one or more confounders relevant to the domain: (1) found that adjustment for the confounder(s) made no difference to the effect estimate; (2) showed that the confounder(s) were not related to the outcome; (3) showed that the confounder(s) were not related to the intervention. In judging whether any of these circumstances arose, review authors will need to decide the adequacy with which the measured confounders characterized the pre-specified confounding domain.

B. Did the analysis adjust for the confounding domain with adequate care? To answer this question, the review author should consider: (1) the adequacy with which measured confounders characterized the pre-specified confounding domain; (2) the resolution with which the confounder was measured (e.g. age measured in years, not just young versus old or by decades); (3) the resolution with which the confounder was adjusted for in the analysis; (4) the appropriateness of the way in which the confounder was fitted in a multivariable analysis, given the probable nature of the association between the confounder and the outcome specified in the review protocol.

These considerations are necessarily hierarchical: if a particular confounding domain did not give rise to confounding for a particular outcome in a study (A = ‘yes’), then item B is not applicable. Uncertainty about the adequacy of adjustment (‘probably yes’ or ‘probably no’) may arise if, for example, a continuously scaled confounder is described in categories in the table of participants’ baseline characteristics, rather than by a measure of central tendency and dispersion, or is fitted in multivariable regression in categories. As also stated by Valentine and Thompson (2013), ‘For an optimal assessment [of the risk of residual confounding] in a systematic review [including NRS], this information needs to be reported consistently for all included studies’ and ‘If information is not available for some primary studies, a uniform judgment about the risk of confounding across included studies cannot be made.’
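The effect of measurement resolution on residual confounding can be demonstrated with a small simulation. The sketch below is illustrative only (simulated data with arbitrary parameters, not an analysis from any study in this series): a confounder (age) drives both treatment choice and outcome, the true treatment effect is zero, and adjusting for age only as ‘young versus old’ leaves residual confounding that adjustment for age in years essentially removes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical data: age confounds treatment and outcome; true effect is zero.
age = rng.uniform(30.0, 80.0, n)
p_treat = 1.0 / (1.0 + np.exp(-(age - 55.0) / 5.0))  # older patients more often treated
treat = (rng.random(n) < p_treat).astype(float)
outcome = 0.05 * age + rng.normal(0.0, 1.0, n)       # no true treatment effect

def treatment_coef(covariates):
    """OLS coefficient for treatment after adjusting for the given covariates."""
    X = np.column_stack([np.ones(n), treat] + covariates)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]

crude = treatment_coef([])                              # unadjusted: fully confounded
coarse = treatment_coef([(age >= 55.0).astype(float)])  # 'young vs old' only: residual confounding
fine = treatment_coef([age])                            # age in years: bias essentially removed

print(crude, coarse, fine)
```

The coarse adjustment shrinks, but does not remove, the spurious treatment effect, which is exactly the situation item B(2) and B(3) ask review authors to detect.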

4. Selective reporting


Evidence is increasing about the seriousness of the threat to validity from selective reporting within studies. As Norris et al. describe, most of the evidence currently comes from methodological reviews of unbiased samples of RCTs, for which the review authors have compared outcomes described in trial protocols with the outcomes reported in subsequent publications [Norris et al., 2013]. There is no evidence about the proportion of NRS that have protocols of similar detail (i.e. defining eligibility criteria, the intervention and comparator, and outcomes) to those required for RCTs (e.g. for ethics review board assessment before starting a study). However, there was recognition among workshop delegates that the proportion of NRS with a detailed protocol is smaller than the proportion of RCTs. Possible reasons include: no need for a funding application to support the study (e.g. because the data are already available and the time of the analyst is funded from elsewhere); no need for an ethics review board assessment (e.g. because available data are completely anonymized or are under the guardianship of the principal investigator); and evolution of a study question into a larger enterprise following an initial ‘peek’ to see if available data support a ‘hunch’.

There are serious potential consequences of primary NRS not having a detailed protocol, with respect both to selective reporting and the risk of reporting biases and, more importantly in the current context, to the review author’s ability to identify the risk of reporting biases. Paper 4 described two kinds of within-study selective reporting: selective outcome reporting and selective analysis reporting. The former is now very well documented for RCTs, and Kirkham et al. (2010) have described a framework for assessing the risk of bias from selective reporting. Similar to publication bias in relation to whole studies [Dwan et al., 2008], selective outcome reporting may cause outcome reporting bias (an exaggerated effect estimate at the review level) by virtue of the absence of data from a primary study for inclusion in the review (i.e. the exclusion of data likely to shift the pooled effect estimate towards the null). Selective analysis reporting, and analysis reporting bias, has not been well documented, although most analysts accept that it is a source of bias in both RCTs and NRS. In contrast to selective outcome reporting, bias arising from selective analysis reporting is hypothesized to lead to the inclusion in the review of exaggerated primary study effect estimates, in the same way as other biases.

4.1. Tasks at the level of the review protocol
Delegates at the workshop suspected that a study addressing the review question in an exploratory manner, or in which an effect estimate relevant to the review arose opportunistically, is particularly at risk of selective reporting. In both situations, delegates were concerned that key variables required to estimate the treatment effect are less likely to be predefined. The suspicion was that such findings often arise incidentally when analysts carry out very many different analyses; findings thought to be interesting or found to be statistically significant are then reported on the basis of an uncorrected inference test, without describing the background of multiple testing. Therefore, we recommend that, when writing the review protocol, the review authors discuss whether available primary studies are likely to have been designed to test one or more well-defined hypotheses relevant to the review questions or have arisen ‘incidentally’. This consideration at the protocol stage of a review translates into a consideration at the primary study level when appraising included studies.

4.2. Tasks at the level of assessing a primary study
Review authors should consider two tasks at the level of primary studies relating to selective reporting:
• Was a primary study designed to test/confirm one or more well-defined hypotheses relevant to the review? Answering ‘no’ to this question should not necessarily cause review authors to consider the study at high risk of bias from selective reporting. For example, some important studies about suspected harms may be designed in an exploratory fashion, rather than as confirmatory studies, precisely because the research question is not well specified or the outcome unintentional. It is not known whether the risk of bias from selective reporting varies for questions about harms versus questions about benefits.
• Was estimation of the effect relevant to the review in some way secondary to the primary objective for which the study data were collected? This situation could arise in many different ways, including: secondary analysis of a primary study dataset (e.g. endovascular vein harvesting for CABG); or an ‘intervention’ delivered without the intention of influencing the outcome (e.g. circumcision influencing HIV transmission, but studied in cohorts of men circumcised for religious reasons).

4.3. Tasks at the level of assessing an outcome within a primary study
Because of the empirical basis underpinning selective outcome reporting, we have separated outcome-level considerations about selective outcome reporting from outcome-level considerations about selective analysis reporting. Ideally, review authors should compare the findings that were reported, and the analyses that gave rise to the findings, with the outcomes and analyses specified in the study protocol. However, it is widely recognized that, currently, a protocol will be available for only a small minority of NRS. With this recognition comes the need to rely on content expertise, for example about outcomes that one would expect researchers to measure and analyse, in relation both to partially or unreported outcomes and to analyses carried out in an unexpected manner.

4.3.1. Selective outcome reporting. We recommend that the framework set out by Kirkham et al. (2010) for assessing selective outcome reporting and the risk of outcome reporting bias be applied to each outcome of interest to the review protocol. We believe that review authors can categorize these assessments as described by Kirkham et al. (2010) by answering the following three questions for each outcome in turn (see Table 4):

A. Was the outcome [specified in the review protocol] measured?
B. Was the outcome [specified in the review protocol] analysed?
C. Was the outcome [specified in the review protocol] partially/not reported because of the statistical significance/magnitude of the effect of the intervention?

Table 4. Issues relating to selective outcome reporting to be considered for each outcome specified in the review protocol.

These considerations are hierarchical. That is, if a particular outcome was not measured (A = ‘no’), then items B and C must inevitably be answered ‘no’, and the risk of outcome reporting bias should be judged to be low. Similarly, if an outcome is assessed as not having been analysed (i.e. A = ‘yes’ and B = ‘no’), then the answer to C must be ‘no’. Uncertainty (‘probably yes’ or ‘probably no’) may arise when judgement needs to be applied; for example, in order to classify the risk of outcome reporting bias in some situations, Kirkham et al. (2010) recognized the role of judgement: ‘Judgment says [the outcome was] likely to have been analysed but not reported because of non-significant results.’

4.3.2. Selective analysis reporting. An analogous set of questions could be asked about selective analysis reporting. That is:
• Was the analysis of the outcome [specified in the review protocol] adequately pre-specified in the study analysis plan?
• Was the analysis of the outcome [specified in the review protocol] carried out as pre-specified in the study analysis plan?
• Was the analysis selectively reported because the effect estimate was statistically significant?
These questions would be very difficult to apply because NRS are even less likely to have detailed statistical analysis plans than they are to have detailed protocols. For this reason, and in view of the lack of evidence to support such questions, we propose a different approach, namely to consider specific aspects of analyses that the workshop delegates believed posed a substantial risk of selective analysis reporting (Table 5) [Norris et al., 2013].

Table 5. Issues relating to selective analysis reporting to be considered for each outcome specified in the review protocol.

Using this approach, review authors are directed to answer whether they suspect selective analysis reporting for each outcome in relation to:

A. Selection of the cohort for analysis from the ‘incipient’ cohort/entire cohort for which data are available. Failure to include a flow chart describing how the cohort for analysis was assembled (cf. the CONSORT (Consolidated Standards of Reporting Trials) flow diagram [http://www.consort-statement.org]) may disguise how the number and selection of participants included in the analysis differ from the pre-specified analysis. Consider whether all participants are accounted for and whether the cohort analysed may have been selected after carrying out initial analyses.

B. Selection of the final statistical model to report. Look for information suggesting that multiple adjusted analyses were carried out but only one (or a subset, or none) was reported, and check whether there was a failure to adjust for important confounders that were measured and shown not to be balanced across groups. Check also for other opportunities for multiple models to have been used, such as different hierarchical modelling assumptions and different effect measures (e.g. odds ratio and risk ratio). Various methods of inference might have been tried, such as parametric and non-parametric methods, different estimation methods, and different approaches to inference (e.g. Bayesian and frequentist).

C. Selection of subgroups to report. Check whether findings for study population subgroups were reported based on unusual definitions (e.g. unusual classification of subgroups by dose or dose frequency), and whether statistically significant results were reported for the unusual subgroups. (This issue is distinct from A or B because it relates to findings for subgroups of the main analysis cohort.)

D. Handling of missing data. Consider whether a strategy for handling missing data was reported in the Methods section of the study report, whether the strategy applied matched the Methods or, if no strategy
was described, whether a reputable method of imputation was used, and whether analyses with imputation were carried out but not reported.

E. Choice of thresholds in the conversion of continuous to categorical data (either for outcomes or confounders). Compare publications related to the same study, checking whether different cut-points were used to create categorical outcomes; similarly, compare the cut-points used with those used across a range of other studies, and check for cut-points that are extreme or that do not make clinical sense.

Answers to these questions, collected in the conduct of reviews, will inform future methodological research about the importance of each of these areas in relation to selective analysis reporting.

Table 6. Issues related to directness of evidence contributed by a primary study in relation to: (a) population, intervention, and comparator and (b) each outcome specified in the review protocol.
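Looking back at Section 4.3.1, the hierarchical constraints among questions A, B and C lend themselves to direct encoding in review software. The sketch below is a minimal illustration: the function name and the 'risk' labels are our own, and only the forced answers follow the published framework.

```python
def enforce_hierarchy(measured, analysed, suppressed):
    """Apply the hierarchical rules for questions A-C in Section 4.3.1.

    measured   -- A: was the outcome measured?
    analysed   -- B: was the outcome analysed?
    suppressed -- C: was the outcome partially/not reported because of the
                  statistical significance/magnitude of the effect?
    Answers use the 'yes'/'probably yes'/'probably no'/'no' scale.
    """
    if measured == "no":
        # A = 'no': B and C must inevitably be 'no', and the risk of
        # outcome reporting bias is judged low.
        return {"A": "no", "B": "no", "C": "no", "risk": "low"}
    if analysed == "no":
        # A = 'yes', B = 'no': the answer to C must be 'no'; classifying
        # the risk then requires judgement (Kirkham et al., 2010).
        return {"A": measured, "B": "no", "C": "no", "risk": "judgement required"}
    return {"A": measured, "B": analysed, "C": suppressed, "risk": "judgement required"}
```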

5. Directness

When review authors have appraised all of the studies included in a review, it is important for them to judge the degree of confidence they have about whether or not the available evidence directly addresses the review question. This involves considering whether the collective evidence from the included primary studies ‘matches’ the PICO elements defined in the review protocol. Current guidance from the Cochrane Handbook requires that this task be carried out when preparing the Summary of Findings table [Schünemann et al., 2008]. Making this judgement at the level of the review will be facilitated if the same considerations have been applied at the time of appraising each included study. Evidence from NRS can play an important role in addressing potential limitations of RCT evidence in which directness may be compromised because of design aspects of RCTs (e.g. a restrictive trial population, the outcomes studied, and specialist delivery of the intervention). NRS may therefore inform judgements of directness in the context of systematic reviews.

5.1. Tasks at the level of the review protocol

Establish a consensus among the review authors to:
• Identify the purpose of the NRS in the systematic review. As described by Schünemann et al. (2013), NRS can be used as a complement to, in sequence with, or as a replacement for RCTs.
• Judge whether the collective evidence from the included primary studies ‘matches’ the PICO elements (population, interventions, and comparators) defined in the review protocol.

5.2. Tasks at the level of assessing a primary study

Apply the checklist of PICO elements to each primary study (Table 6). In judging directness for the population, the disease and co-morbidities, modifiable and non-modifiable person or population characteristics, and environmental and geographic characteristics should be considered. Considerations for interventions and comparators include whether the intervention is a planned or a naturally occurring intervention and whether the comparator is an active or non-active comparator.

5.3. Tasks at the level of assessing an outcome within a primary study

For outcomes, the type of outcome (e.g. health, economic, systems) and the comparability of the methods of assessment need to be considered. The items on the checklist are scored using a simple ‘yes’, ‘probably yes’, ‘probably no’, ‘no’, and ‘N/A’ scoring key.
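As a purely illustrative aside (not part of the paper's checklist), the scoring key described above could be recorded and tallied programmatically when a review team manages many studies. The item names and data below are hypothetical assumptions; the actual checklist items are those defined in Table 6.

```python
# Minimal sketch (illustrative only): recording the directness checklist
# judgements for one primary study using the paper's five-level scoring key.
# The item names and example answers are hypothetical, not from Table 6.

from collections import Counter

ANSWERS = {"yes", "probably yes", "probably no", "no", "N/A"}

def summarize_directness(judgements: dict) -> Counter:
    """Validate each answer against the scoring key and tally the responses."""
    for item, answer in judgements.items():
        if answer not in ANSWERS:
            raise ValueError(f"{item}: {answer!r} is not a valid answer")
    return Counter(judgements.values())

# Hypothetical PICO directness judgements for a single primary study.
study = {
    "population matches review PICO": "yes",
    "intervention matches review PICO": "probably yes",
    "comparator matches review PICO": "probably no",
    "outcome assessment comparable": "N/A",
}
print(summarize_directness(study))
```

Keeping the raw five-level answers (rather than collapsing them to a score) preserves the information needed for the review-level judgement described in Section 5.1.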

6. Discussion


Res. Syn. Meth. 2013, 4 63–77

Formatted as checklists, the guidance summarized in this paper aims to provide insight into the methodological issues to consider when including NRS in a systematic review. The checklists were based on the discussions at the Ottawa Non-Randomized Studies Workshop, as described in the previous papers of this series, as well as on existing tools formulated by the Non-randomized Study Methods Group (NRSMG) of the Cochrane Collaboration and the Outcome Reporting Bias In Trials Group. In particular, checklists have been devised providing guidance to describe/assess: study designs based on study design features; risk of residual confounding and circumstances when review authors might consider meta-analysing data from NRS; risk of selective outcome or analysis reporting, based on the existing framework for detecting selective outcome reporting in trials but extended to selective reporting of analyses; and directness of the evidence contributed by a study with respect to the review question. These checklists will aid in assessing the risk of bias and in discriminating between NRS of differing quality and directness.

Prior to the Ottawa Workshop, general quality assessment instruments had been proposed for assessing NRS at the primary study level. In a review of these instruments [Bai et al., 2012], the SIGN50 checklists for cohort and case–control studies were highly rated [SIGN50, http://www.sign.ac.uk/methodology/checklists.html]. However, the focus of these instruments was on general quality rather than on risk of bias per se, and the methodological issues arising when including NRS in a systematic review were not considered. Chapter 13 of the Cochrane Handbook [Reeves et al., 2008] provided the first concerted effort to provide insight and guidance on the methodological issues associated with systematic reviews of NRS. The results of the Ottawa Workshop and this summary paper are the next step in this evolution, providing a more detailed and thought-out treatment of four critical issues, namely: study design and risk of bias, confounding, selective reporting, and directness. These checklists represent substantive progress on these issues. The considerations described by the checklists are generic in nature and should not need modification for different review questions. In the longer term, we expect specific assessment tools to be developed, building on these concepts and insights.

There are several limitations to this work, both in the proposed checklists and in aspects of NRS that have not been considered so far. In particular, case–control studies, although not often used for evaluating interventions, have not been investigated as part of this endeavor. The discussion and guidance developed for selective analysis reporting is just a first step in addressing this important topic, and work on a number of other important topics related to performance, attrition, and detection biases is in progress. Finally, there are limits to communication because of differences in the terminology used by various research fields.

These checklists will allow review groups to operationalize the inclusion of NRS in systematic reviews in a more consistent way. However, this research area is rapidly developing. The next major step is to extend the existing Cochrane Risk of Bias tool so that it can assess the risk of bias in NRS included in a review.
This initiative involves detailed consideration of how the assessment of each bias domain may need to be customized to different study designs. This research topic was recently funded by the Methods Innovation Fund of The Cochrane Collaboration, and results are expected to be reported in 2014, aiming for integration of NRS into the Risk of Bias tool in a subsequent version of the Collaboration’s Review Manager software. The structure of the extended tool is anticipated to be similar to QUADAS-2 [Whiting et al., 2011], with review authors answering a number of signalling questions nested within the usual bias domains. The risk of bias in each of these domains will then be judged, informed by the responses to the signalling questions.

Fundamental research into a number of particular topics for systematic reviews of NRS is critical. Topics such as the assessment of publication bias, and the need for and justification of comprehensive searching, require empirical evaluation. Empirical research on selective analysis reporting in RCTs is currently underway.

7. Funding

The Workshop was supported financially by the Agency for Healthcare Research and Quality (through its Ottawa Collaborating Centre) and by a grant from the Cochrane Collaboration Discretionary Fund. BCR is supported in part by the UK National Institute for Health Research Bristol Cardiovascular Biomedical Research Unit. JPTH was supported by MRC Grant U105285807. The Health Services Research Unit is funded by the Scottish Government Executive Health Department. The views expressed in this article are those of the authors and not necessarily those of the funding bodies, The Cochrane Collaboration or its registered entities, committees or working groups, or The Campbell Collaboration.

8. Acknowledgements

We are grateful to the workshop participants, all of whom contributed to the discussions that provided the foundation for this paper, and to members of the Cochrane Non-Randomized Studies Methods Group, whose experiences of conducting or reviewing NRS have informed the Group’s guidance. We are also grateful to Miguel Hernán and other members of the group working on the confounding and selection bias domain for the Cochrane Collaboration’s project to develop a Risk of Bias assessment tool for non-randomized studies, for developing the idea of distinguishing between confounding domains and the variables used to measure them.

References

Bai A, Shukla VK, Bak G, Wells GA. 2012. Quality Assessment Tools Project Report. Ottawa: Canadian Agency for Drugs and Technologies in Health.

Consolidated Standards of Reporting Trials (CONSORT). http://www.consort-statement.org/

Dwan K, Altman DG, Arnaiz JA, Bloom J, Chan AW, Cronin E, Decullier E, Easterbrook PJ, Von Elm E, Gamble C, Ghersi D, Ioannidis JP, Simes J, Williamson PR. 2008. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS One 3(8): e3081.

Higgins JPT, Altman DG. 2008. Assessing risk of bias in included studies (Chapter 8). In Higgins JPT, Green S (eds). Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons: Chichester (UK), 187–241.

Higgins JPT, Altman DG, Gøtzsche PC, Jüni P, Moher D, Oxman AD, Savović J, Schulz KF, Weeks L, Sterne JAC, on behalf of the Cochrane Bias Methods Group and the Cochrane Statistical Methods Group. 2011. The Cochrane Collaboration’s tool for assessing risk of bias in randomised trials. BMJ 343: d5928. doi: 10.1136/bmj.d5928

Higgins JPT, Ramsay C, Reeves BC, Deeks JJ, Shea B, Valentine JC, Tugwell P, Wells GA. 2013. Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods, XXX-YYY.

Kirkham JJ, Dwan KM, Altman DG, Gamble C, Dodd S, Smyth R, Williamson PR. 2010. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340: c365.

Norris SL, Moher D, Reeves BC, Loke Y, Garner S, Anderson L, Tugwell P, Wells GA. 2013. Issues relating to selective reporting when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods, XXX-YYY.

Reeves BC, Deeks JJ, Higgins JPT, Wells GA. 2008. Including non-randomized studies (Chapter 13). In Higgins JPT, Green S (eds). Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons: Chichester (UK), 391–432.

Reeves BC, Higgins JPT, Ramsay C, Shea B, Tugwell P, Wells GA. 2013. An introduction to methodological issues when including non-randomised studies in systematic reviews on the effects of interventions. Research Synthesis Methods, XXX-YYY.

Rubin DB. 1991. Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism. Biometrics 47: 1213–1234.

Schünemann HJ, Oxman AD, Higgins JPT, Vist GE, Glasziou P, Guyatt GH, on behalf of the Cochrane Applicability and Recommendations Methods Group and the Cochrane Statistical Methods Group. 2008. Presenting results and ‘Summary of findings’ tables (Chapter 11). In Higgins JPT, Green S (eds). Cochrane Handbook for Systematic Reviews of Interventions. John Wiley & Sons: Chichester (UK), 335–358.

Schünemann H, Tugwell P, Reeves BC, Akl E, Santesso N, Shea B, Helfand M. 2013. Nonrandomized studies as replacement, supplementary or sequential evidence for randomized controlled trials in systematic reviews on the effects of interventions. Research Synthesis Methods, XXX-YYY.

SIGN50. Quality assessment instruments for case–control and cohort studies. http://www.sign.ac.uk/methodology/checklists.html

Valentine JC, Thompson SG. 2013. Issues relating to confounding and meta-analysis when including nonrandomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods, XXX-YYY.

Whiting PF, Rutjes AWS, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MMG, Sterne JAC, Bossuyt PMM, and the QUADAS-2 Group. 2011. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med 155: 529–536.