Special Issue Paper

Received 18 January 2013, Accepted 6 February 2013

Published online in Wiley Online Library (wileyonlinelibrary.com) DOI: 10.1002/jrsm.1078

Non-randomized studies as a source of complementary, sequential or replacement evidence for randomized controlled trials in systematic reviews on the effects of interventions

Holger J. Schünemann,a,b* † Peter Tugwell,c Barnaby C. Reeves,d Elie A. Akl,a,e Nancy Santesso,a Frederick A. Spencer,b Beverley Shea,c George Wellsc and Mark Helfandf

The terms applicability, generalizability, external validity and transferability are related, sometimes used interchangeably and have in common that they lack a clear and consistent definition in the classic epidemiological literature. However, all of these terms generally describe one overarching theme: whether or not available research evidence can be directly utilized to answer the healthcare questions at hand, ideally supported by a judgment about the degree of confidence for this utilization. This concept has been called directness. The objectives of this paper were to delineate how non-randomized studies (NRS) inform judgments in relation to directness and the concepts that it encompasses in the context of systematic reviews. We will briefly review what is known and describe the theoretical and practical issues as well as offer guidance to those tackling the challenges of judging directness and using research evidence to answer healthcare questions with evidence from NRS. In particular, we suggest a framework in which authors can use NRS as a complement, sequence or replacement for randomized controlled trials (RCTs) by focusing on judgments about the population, intervention, comparison and outcomes. Authors of systematic reviews will use NRS to complement judgments about the inconsistencies, the rationale and credibility of subgroup analysis, the baseline risk estimates for the determination of absolute benefits and downsides, and the directness of surrogate outcomes. This evidence includes contextual or supplementary evidence. Authors of systematic reviews and other summaries of the evidence use NRS as sequential evidence when insufficient evidence is available for an outcome from RCTs, but NRS evidence is available (e.g., long-term harms). Use of evidence from NRS may also serve to replace RCT evidence when NRS provide equivalent (or potentially higher) confidence in the evidence (i.e. quality) compared to indirect evidence from RCTs. These judgments will be made in the context of other domains that influence the overall quality of the body of evidence, including the risk of bias (i.e. limitations in the detailed study design and execution), publication bias, inconsistency, imprecision and factors that increase our confidence in effects. This article will support systematic reviewers in their interaction with decision makers, that is, those who use the systematic review to develop guidelines, address health policy makers, and make clinical decisions, by making these judgments transparent.

Copyright © 2013 John Wiley & Sons, Ltd.

Supporting information may be found in the online version of this article.

Keywords: GRADE; observational studies; non-randomized studies; decision making; systematic reviews

a Department of Clinical Epidemiology and Biostatistics, McMaster University, Hamilton, Canada
b Department of Medicine, McMaster University, Hamilton, Canada
c Clinical Epidemiology Unit, Ottawa Hospital Research Institute, Ottawa Hospital, Ottawa, Ontario, Canada
d Bristol Heart Institute, University of Bristol, Bristol Royal Infirmary, Bristol, UK
e Department of Internal Medicine, American University of Beirut, Beirut, Lebanon
f Portland VA Medical Center and Department of Medicine, Oregon Health and Science University, Portland, Oregon, USA
*Correspondence to: Holger J. Schünemann, Department of Clinical Epidemiology and Biostatistics, McMaster University Health Sciences Centre, Room 2C10B, 1280 Main Street West, Hamilton, ON L8S 4K1, Canada.
† E-mail: [email protected]

Res. Syn. Meth. 2013, 4 49–62


1. Introduction

1.1. Applicability, generalizability, external validity and transferability are considered in directness

The terms applicability, generalizability, external validity and transferability are related, sometimes used interchangeably and have in common that they lack a clear and consistent definition in the classic epidemiological literature (Pigeot and Ahrens, 2006; Rothman and Greenland, 1998; Wikipedia, 2011). However, all of the terms describe one overarching theme: whether or not available research evidence can be directly utilized to answer the health and healthcare question at hand, ideally supported by a judgment about the degree of confidence in this utilization. One could interpret the term applicable as whether or not the research evidence answers the healthcare question asked by a clinician or public health practitioner. Generalizability may refer to whether or not the evidence can be generalized from the population from which the actual research evidence is obtained to the population for which a healthcare answer is required. External validity frequently refers to whether or not the evidence from research studies is valid (i.e., whether or not it measures what it is supposed to measure) in order to answer real life healthcare questions outside of the context of research studies. Finally, transferability frequently refers to whether research evidence can be transferred from one setting to another. Despite the temptation to provide different interpretations, these terms are used interchangeably and describe overlapping concepts. Although each one of them might encompass a broad enough area to express to some extent the degree of confidence about using research evidence to answer a specific healthcare question, each is probably incomplete in its definition and description.
For example, the use of biomarkers as a healthcare outcome may provide applicable evidence but the direct consequences for people may be uncertain and limit one’s confidence in the evidence with respect to informing decisions about outcomes that are important to patients.

1.2. Judging indirectness in GRADE

The GRADE Working Group, an international interdisciplinary working group, has suggested using the term ‘directness’ to describe all of these aspects including the concept of direct versus indirect comparisons of different interventions (Schunemann et al., 2003; Atkins et al., 2004; Guyatt et al., 2008b). The Cochrane Collaboration and the GRADE Working Group suggest using a very structured framework to address directness (Schünemann et al., 2008; Guyatt et al., 2011b). This framework is based on asking and answering clinical questions according to the underlying population, the underlying intervention(s), the underlying comparator and the underlying outcomes (PICO) (Guyatt et al., 2011a; Oxman et al., 1993). The concepts of directness and its opposite, indirectness, in the context of PICO and GRADE are described elsewhere (Atkins et al., 2004; Guyatt et al., 2008b; Guyatt et al., 2011b). For the purpose of this article, they should be seen as whether or not research evidence is sufficiently direct to answer the PICO healthcare question at hand. Indirect comparisons in this conceptualization of directness refer to a comparison of one or more interventions that have not been compared directly in the same research studies. In the context of indirect comparisons, NRS that make direct comparisons will be of particular importance when randomized controlled trials (RCTs) that make direct comparisons are unavailable, particularly in the evaluation of harms. When making judgments about directness, GRADE asks whether or not the degree of indirectness warrants lowering of the confidence in an effect (i.e., the quality of evidence).
By focusing on all four elements of the PICO question rather than solely the population (Atkins et al., 2011), directness is all-encompassing and requires review authors to consider whether the investigated intervention, comparator and outcomes are similar enough to the ones specified in the PICO, for example, in the protocol of a systematic review. It also becomes clear that answering the healthcare questions posed by a review protocol requires review authors to make judgments about the degree of directness. Judgments about whether or not research evidence is direct enough can be made at various stages of evidence assessment, including by a clinician because of the idiosyncrasies of the decision-making situation, for example, the specific patient. We will explain these judgments in subsequent sections. We generally do not distinguish between the judgments about directness for benefits and harms although different types of research may be available or attainable for the evaluation of benefits and harms.

1.3. Reasons for the recent emphasis on directness in systematic reviews

Many evidence hierarchies consider RCTs to provide higher quality evidence than NRS. Consequently, systematic reviews of RCTs have been considered to represent the source of best evidence in health care for interventions. This emphasis on systematic reviews of RCTs rather than NRS has been criticized. Systematic reviews of RCTs have typically focused on the assessment of the risk of bias (or internal validity) of included studies while not placing adequate emphasis on whether or not the included studies provide answers that are relevant to the population, intervention, comparison and outcomes defined in the review protocol. Thus, if one is willing to accept a compromise in regard to issues such as bias and confounding resulting from observational designs, one could argue that evidence should be first direct (or applicable). In fact, there may be good reasons for such thinking. Both the risk of bias that can result from NRS designs and directness of the evidence can affect the confidence in the evidence for a specific PICO question. Consequently, direct evidence from NRS can provide equivalent
(or potentially higher) confidence in the evidence (i.e., quality) compared with indirect evidence from RCTs. However, a judgment of overall higher quality evidence from NRS does require judging all factors that influence the confidence in the evidence, including inconsistency, imprecision and publication bias. Concern about these issues may decrease one’s confidence in treatment effects. Conversely, large effects, dose-effect relations and absence of opposing confounders or biases may increase the confidence in an effect (Guyatt et al., 2008a). A comparison of whether or not a body of evidence from NRS provides overall higher quality evidence than a body of evidence from RCTs will also depend on an evaluation of all of these factors. One could also argue that there are many areas of health care for which valid but indirect evidence from RCTs may leave us with lower confidence in intervention effects compared with direct evidence from NRS. For example, contrast a body of evidence from RCTs of antibiotics that have been shown to be effective for a given health problem, with evidence from NRS of antibiotics that indicate resistance has developed in a certain population (in a geographic location or setting) and demonstrate that the antibiotic is not exerting the effect. This evidence from NRS would likely trump the evidence from the RCTs for the specific population. Furthermore, NRS evidence demonstrating resistance alone (without evaluating effects) may also indicate that the RCT evidence is not applicable to a specific population and thereby inform the interpretation of the RCTs. Another reason may be that evidence from RCTs may not be available for one or more outcomes of importance. The latter is frequently the case when rare or long-term benefits, harms or other downsides of interventions are critical for decision making and require evaluation (Higgins et al., 2013; Reeves et al., 2013).
Thus, guideline panels developing recommendations, agencies and organizations making coverage decisions, and other organizations in a health system are increasingly recognizing the importance of evidence from NRS in their decision-making processes, and systematic reviews need to reflect their requirements. The objectives of this paper, therefore, were as follows:

• to define issues relating to directness and the terms that it encompasses in the context of systematic reviews;
• to describe the theoretical and practical issues;
• to offer guidance to review authors tackling the challenge of judging the directness of evidence about review questions assembled in a systematic review; and
• to present a brief research agenda.

This penultimate article in the series will provide a guide for transparent judgments to support systematic review authors in their interaction with decision makers, that is, those who use the systematic reviews to develop guidelines, address health policy makers, and make clinical decisions. We will also explain why NRS contribute important information when making judgments about the indirectness of a body of evidence, such as in relation to inconsistencies, the rationale and credibility of subgroup analysis, and the provision of baseline risk estimates for the determination of absolute benefits and downsides.

2. Key methodological issues

2.1. Domains of directness

Table 1 presents a list of items that should be considered when addressing directness, that is, whether or not the research evidence in the PICO sub-domains and items is sufficiently direct to answer the healthcare question. A PICO should be sufficiently broad to be helpful in practice. Many bodies of evidence from RCTs will therefore not provide sufficient confidence in the estimates to answer broad healthcare questions. However, these considerations require judgments, and these judgments may depend on whether systematic reviews include RCTs and NRS or NRS only. Thus, we include judgments about the contextual information that NRS provide in the presence of randomized controlled trials. For example, a judgment may need to be made about whether or not research evidence from RCTs conducted in men is direct enough to inform if the intervention exerts the same effect in women. This judgment can be informed by evidence from complementary NRS, for example, cohort studies that demonstrated that the intervention exerted the same effect in men as in women. If NRS were to provide evidence that the intervention may be working differentially in men and women, that is, there is an interaction between gender and the intervention, then the judgment about directness would be considered less secure, and a review author might consider downgrading the quality of evidence if the question specified in the systematic review protocol focused on women as well as men. The review author needs to consider each item or sub-domain in Table 1 in a similar way. Another way in which NRS can inform a review is by providing information about the baseline risk or the control event rate in situations where RCTs do not provide sufficient information.
For example, if the evidence from an RCT describes the baseline risk (or control event rate) of head injury in children less than 5 years who were riding bicycles, but the interest is in children who are between 12 and 15 years, then NRS may be able to provide the estimates of the baseline risk in this other age group. A separate judgment then needs to be made if the relative estimates of effect in the younger age group can be applied to derive absolute effects. These judgments will include assessing the risk of bias (i.e., limitations in the design and execution of the NRS), inconsistency, imprecision and publication bias. In addition, dose response relations may make one more confident about extrapolating risk estimates from studies that do report on more than one level of baseline risk. Thus, NRS may either provide evidence supporting an interaction between two domains in the PICO or provide information about a differential baseline risk in the population category.

Table 1. List of items that should be considered when addressing directness.

Domain | Subdomain | Item(s)
Population | Disease and comorbidities | Primary condition of interest; secondary conditions of interest (comorbidities)
Population | Non-modifiable person or population characteristics | Age; gender; genetics; ethnicity
Population | Modifiable person or population characteristics | Anthropometric (weight)
Population | Environmental and geographic characteristics | Type of community or organization; urban or non-urban; exposure to toxins (may be a population defining factor that can be removed through an intervention)
Population | Setting | Healthcare system and provision (tertiary care, secondary, primary care); regulatory environment
Intervention (may separate planned from naturally occurring intervention or exposure) | Type of intervention | Drugs/medication; behavior; policy change (removal of toxins); non intended effects of law-making
Intervention | Components of the intervention | What are the components? Who is administering or implementing the intervention? What is the intensity and duration?
Intervention | Naturally occurring intervention | Type of exposure
Comparator | No active comparison | Drugs: placebo; usual care; current policy continues to be used
Comparator | Active comparison | Same as intervention
Outcomes | Health outcomes (beneficial and non-beneficial including burden) | How the outcome is measured (valid?); when is the outcome measured?; surrogate or patient/population important
Outcomes | Economic outcomes (resource use) | Resource units consumed
Outcomes | System outcomes |

Perhaps, the most challenging domain related to judgments about directness in the PICO framework is the intervention domain. Some of the reasons for the challenges are listed in the individual items and sub-domains of the I (Intervention) domain in Table 1. Addressing directness seems of particular importance for health services interventions that have multiple components and interactions among components (‘complex interventions’). For example, in a treatment program that includes many components (e.g., type of clinic and type of probe used), a change in the administration of cryotherapy for uterine neoplasia from physician to nurse poses the question of whether or not observed effects can be considered similar if different professional groups apply the intervention. However, it needs to be emphasized that any intervention can become a complex intervention. What is considered a straightforward clinical intervention often becomes complex when applied in clinical practice. For example, the simple intervention of taking a medication is influenced by many factors, such as how a clinician informs the patient about taking the medication, general adherence, a patient’s ability to pay for the medication and forgetfulness, to name a few (Table 2). Furthermore, clinical interventions can be
multi-component interventions. Respiratory rehabilitation may be perceived as a simple clinical intervention but, in real life, may involve a range of co-interventions causing concern about directness when applying evidence. A review author must decide and understand which type of exercises (e.g., anaerobic or aerobic) were included as part of the rehabilitation program, if rehabilitation takes place as in-patient or out-patient, if the exercise program is scheduled continuously or periodically, or if coaching and encouragement as psychological support were provided during exercise by rehabilitation specialists. It is important to note that these considerations apply not only to the intervention domain but also to the comparator (C) domain. Thus, the review author must make judgments about the similarity of the comparator and intervention in the primary studies to those used in healthcare practice.

Table 2. The fallacy of simple interventions: from complex to very complex.

Intervention | Examples of what makes it complex | Examples of information that can be supplemented through NRS about the intervention and its uptake
Prescribing a medication (such as Tylenol) | Patients not filling the prescription | Pharmacy databases
Prescribing a medication (such as Tylenol) | Clinician providing appropriate information why to take the medication | Qualitative research
Prescribing two medications at one time | Patients taking too many medications and deciding to take only one | NRS showing if adjustment for confounders alters the effect
Lifestyle changes | Barriers due to social environment and will power | Can the target population truly realize the change (e.g. success of implementing dietary interventions depends on dietary change for entire families or populations)?
Respiratory rehabilitation | Many facets to respiratory rehabilitation | NRS; are the effects similar across different severity of the disease?
Reducing indoor air pollution through cleaner stoves in low and middle income countries | Resources may not be available | NRS whether or not cleaner stoves have been installed
Health system changes (universal health coverage) | Difficult to implement and measure follow-through | NRS from other settings and jurisdictions

NRS, non-randomized studies.

After considering all factors that influence the confidence in the estimate of an effect (i.e., the quality of evidence), the review authors must decide whether NRS provide (i) higher quality evidence about the review question because the NRS are more direct than RCTs, or (ii) contextual evidence that may alter the magnitude of the expected relative treatment effect.

Addressing the directness of outcomes has been described in detail in a recent series of articles by the GRADE Working Group (Guyatt et al., 2011b). Assessing directness of the outcome largely relates to whether or not the outcome for which research evidence exists is direct enough for decision making. In other words, does it provide the information that is critically required to answer the review questions? NRS may help the author of a systematic review judge whether or not surrogate outcomes are sufficiently closely related to patient important outcomes.
For example, NRS suggest that culture results 6 months after starting treatment may be a sufficiently good surrogate marker for the patient important outcome ‘survival’ or ‘cure from TB’ as culture results explain up to 60% of the variation in relapse from multidrug resistant TB (O’Donnell et al., 2013). In this and other areas of health care, NRS can provide supplementary information, or sometimes the only information, about the relationship between surrogate and patient-important outcomes. When review authors describe the results of their systematic review, they should describe how closely the settings of the primary studies match the review question, with respect to the individual sub-domains and items in the four domains of the PICO. They may also discuss how the results could be used in other settings or populations such as a community or organization, justifying their conclusions.

2.2. The role of theory

Many scientists and clinicians ask what the role of theory is in the judgments about directness. While theories can be constructed around any possible mechanism or healthcare question, a theory is only as good and informative as the data that support it (and from which it may be derived). Therefore, the reviewer should judge if the evidence, e.g. from NRS, supports judgments about directness (and theory), but not if the theory supports these judgments. Thus, judgments about directness should be informed by data. NRS evidence may provide the contextual link of whether or not a theory that was tested in RCTs is confirmed. For example, qualitative research and focus groups provide such information by informing our understanding of the effect of complex behavioral interventions. Suppose an RCT (randomizing entire communities to intervention vs a control) demonstrated a reduction in skin cancers in a multimodal community-based program to reduce sun exposure by providing counseling, education, free sunscreen, hats at beaches and other components (Dietrich et al., 1998). Whether or not the RCT supports the underpinning theory depends on what actually happened. For example, if the theory was Prochaska’s transtheoretical (‘stages of change’) model, focus groups and qualitative research could help assess whether subjects who took action progressed through ‘stages’ (from precontemplative to contemplative) or whether, for example, a change in behavior by a group of ‘popular’ kids led to a sudden change in many other kids, whatever their stage at the time. Another example in the context of systematic reviews is whether or not users of information of systematic reviews prefer the information to be presented in terms of relative or absolute risk. A randomized trial addressing understanding and usability may provide the answer of what the best presentation format is. A non-randomized qualitative study may provide the reasons why users would prefer one format over the other or why they prefer seeing all effect measures presented.

2.3. What type of information can NRS provide?

Following from the discussion earlier, NRS can be considered a ‘complement’, ‘sequence’ or ‘replacement’ (Table 3).
Complementary NRS provide contextual information about whether or not an intervention works or exerts the same harm in different populations, about possible interaction effects or about different estimates of baseline risk (Figure 1). Sequential NRS provide additional information about items or hypotheses tested in RCTs where the information from RCTs is incomplete or too narrow, so that NRS expand our knowledge about the interventions that were evaluated in RCTs. For example, NRS may provide long-term outcomes for patients participating in short-term RCTs or information on whether surrogate outcomes in RCTs are truly related to patient important outcomes. Replacement NRS are studies that provide, as a body of evidence, higher quality evidence than a body of evidence from RCTs after complete assessment of all the factors influencing the confidence in estimates of the effect (using the GRADE criteria for grading the quality of evidence). For example, despite being hampered by the lack of randomization, they could provide a more direct body of evidence that leaves us with greater overall confidence in estimates of effect for a given specific healthcare question (see succeeding text). Under those circumstances, decision makers should focus on the evidence from NRS as the best available evidence. For example, single-arm NRS may help to interpret RCT evidence when they provide information about the rate of a response to an intervention in a specific community. NRS may suggest that there are gaps in regard to the efficacy of an intervention that has been evaluated in an RCT setting. For example, RCTs may be conducted in highly selected individuals or involve ‘encouragement’ (a co-intervention), in contrast to what happens in usual practice where an intervention is likely to be applied without additional support or in a less selected population.

2.4. NRS and quality of evidence

To judge if NRS provide higher quality evidence overall than RCTs, review authors need to judge whether or not directness can balance or outweigh concerns about risks of bias in NRS. Both risk of bias and directness should be considered on a continuum: RCTs can suffer from substantial risk of bias and NRS with low risk of bias could possibly be upgraded for certain features such as large magnitudes of effect or dose response relations. Thus, NRS may provide a higher quality of the body of evidence because they address the research question more directly. Even RCTs at a very low risk of bias may suffer from important limitations about directness, and NRS that are well conducted but more direct could, after careful evaluation, provide higher quality evidence.

Table 3. Role of NRS in systematic reviews when judging directness.

Role | Explanation and example
Complementary (could also be considered contextual or supplementary) | Additional information on: whether or not an intervention works in different populations; possible interaction effects; different estimates of baseline risk
Sequential | Information not (yet) obtained or available from RCTs on: long-term outcomes; correlation between surrogate outcomes and patient important outcomes
Replacement | Information used instead of RCT evidence (for decision making): NRS provide higher quality evidence than RCTs (more direct evidence that leaves us with greater overall confidence in estimates of effect or quality of evidence); NRS provide the best available evidence in the absence of RCTs

NRS, non-randomized studies; RCTs, randomized controlled trials.

Figure 1. Population indirectness relates to how confident one can be that an effect is correct for a certain population. It is influenced by certainty in the relative effect (“Is there reason to believe that the relative estimate of effect is not correct for this population, i.e. does it differ from other populations?”) and the baseline risk (or burden of a disease) and the certainty in that determined baseline risk.
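
The two inputs named in Figure 1, the relative effect and the baseline risk, can be combined with simple arithmetic to obtain an absolute effect. The following minimal sketch assumes a relative risk taken from RCTs and baseline risks supplied by NRS for two age groups, loosely following the bicycle head injury example in section 2.1. All numbers, the function name `absolute_effect` and the dictionary of baseline risks are hypothetical and chosen only for illustration; they are not taken from any study.

```python
def absolute_effect(baseline_risk: float, relative_risk: float) -> dict:
    """Combine a baseline risk (proportion) with a relative risk.

    Assumes the relative risk is constant across baseline risks, which is
    itself one of the directness judgments discussed in the text.
    """
    if not 0.0 <= baseline_risk <= 1.0:
        raise ValueError("baseline_risk must be a proportion between 0 and 1")
    if relative_risk < 0.0:
        raise ValueError("relative_risk must be non-negative")
    # Risk in the intervention group, capped at 1 because it is a proportion.
    risk_with_intervention = min(baseline_risk * relative_risk, 1.0)
    risk_difference = risk_with_intervention - baseline_risk
    return {
        "risk_with_intervention": risk_with_intervention,
        "risk_difference": risk_difference,
        # Negative values mean fewer events per 1000 people with the intervention.
        "per_1000": round(risk_difference * 1000),
    }


# Hypothetical inputs: one relative risk (from RCTs in young children) and
# two baseline risks (the second one supplied by NRS in older children).
RELATIVE_RISK_FROM_RCTS = 0.5
BASELINE_RISKS = {"children under 5 years": 0.010, "children 12 to 15 years": 0.030}

for group, risk in BASELINE_RISKS.items():
    effect = absolute_effect(risk, RELATIVE_RISK_FROM_RCTS)
    print(f"{group}: {effect['per_1000']} events per 1000")
```

The same relative effect yields a different absolute effect in each group, which is why an uncertain baseline risk translates into uncertainty about the absolute benefit even when the relative estimate is trusted.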

2.5. What else is known about directness?

The PICO framework offers the advantage of a clear conceptual approach to judging directness. Elements of this framework have been described by various authors but in a limited fashion and without reference to the PICO approach (Malmivaara et al., 2006; Rothwell, 2005). Rothwell (2005) described a comprehensive list of issues that may affect external validity including issues relating to reporting of study results (e.g., completeness of reporting of relevant adverse effects). However, it may be preferable to categorize such issues individually, for example, under risk of bias or publication bias to separate judgments about directness from other judgments about the confidence in estimates, in particular using GRADE’s clear separation of the domains. Malmivaara et al. also present a comprehensive list of issues focusing on directness that is possibly confused with study reporting (e.g., reporting of inclusion and exclusion criteria of the patients) rather than focusing on the actual included studies. For example, the criteria for inclusion may differ from the actual included patients when the included patient populations are not as broad as the intended populations (Malmivaara et al., 2006). That is, the intended inclusion criteria may be broad (e.g., all age groups), but the identified studies are very narrow in the populations that were studied (e.g., only young patients). In a recent article, including participants at the Ottawa workshop that led to the current article, Atkins et al. (2011) remain close to the PICO and GRADE approach and lay out helpful steps in addressing directness, although the authors prefer the term applicability and focus on the population to which the evidence is applied. They also do not include the question related to indirect comparisons that we have described here.

3. Practical considerations for review authors

3.1. Considerations for review methods

Copyright © 2013 John Wiley & Sons, Ltd.

Res. Syn. Meth. 2013, 4 49–62


Review authors should consider directness at several stages during the planning and conduct of a systematic review. This begins with the formulation of their PICO, when they should consider whether the question should be broad across all four PICO elements. For example, should they focus on the general population as opposed to a specific population or a population with certain comorbidities? Defining the PICO elements at the outset of a review (e.g., male and female, urban and rural, less and more developed country settings) provides an explicit framework within which to ‘consider NRS’ and ensures patient centeredness (Reeves et al., 2013). These considerations will affect how authors interpret the evidence assembled in the review and, from the point of view of the review methods, may affect which data are extracted from primary studies. The included RCT evidence may be so indirect (e.g., no age restriction was imposed but all studies included a relatively young population) that, without considering NRS, the review authors have to express considerable concern about the generalizability or the directness of the evidence to the population defined by the review PICO. As already described, NRS can enhance the certainty of the interpretation of the findings, for example, by providing greater confidence in either

the directness or reducing the indirectness of the RCT evidence (e.g., when they demonstrate that effects in NRS did not differ between young and older patients). Thus, there are several steps that authors should follow (Figure 2). The first relates to whether the RCTs provide sufficiently high quality evidence to answer the PICO question defined by the review. When a body of RCT evidence covers all the PICO elements defined in the review protocol, and this body of evidence is graded as high or moderate quality for all elements of the PICO (in particular if the PICO is broad), then NRS are unlikely to add complementary, sequential or replacement information (Table 3). Overall high quality evidence from RCTs is rare, however, which underlines the value of NRS. When there is low quality evidence from RCTs, it is appropriate for review authors to consider whether the NRS literature can complement the RCT evidence. If RCT and NRS evidence is consistent in terms of the direction and magnitude of the effect, then the NRS evidence provides support that an effect possibly exists. At other times, NRS can provide supplementary or sequential information supporting long-term outcomes, confirming the appropriateness of surrogates or effects in subgroups. However, if RCT and NRS evidence is inconsistent, so that the NRS do not support the findings from RCTs, review authors should be cautious in their interpretation. They may need to make a judgment about which of the two bodies of evidence is more direct, and they may conclude that the interpretation is uncertain. At other times, after a full assessment of the quality of evidence, NRS may provide higher quality evidence and replace the RCT evidence. 
For example, a guideline panel considering circumcision as a way to reduce HIV incidence among men who have sex with men considered two bodies of evidence: RCTs conducted in heterosexual men and NRS in men who have sex with men (World Health Organization, 2011). The former body of evidence was deemed too indirect and thus judged to be of lower quality than the latter one. A key question for review authors is whether, at the protocol stage of a review, it is possible to specify whether or not to consider NRS. The best possible answer to this question is to specify a conditional/sequential rule in the protocol that depends on the overall quality of evidence from RCTs after consideration of risk of bias, inconsistency, indirectness, publication bias and imprecision. If the body of evidence for a sufficiently broad question is rated as high quality, then further evaluation of NRS may not be necessary because the confidence in the estimate of effect after assessment of directness would be high. However, if a body of evidence from RCTs is downgraded to moderate or low quality in any of these domains, complementary or sequential evidence from NRS may prevent such a downgrade in the quality of evidence or might even replace the evidence from RCTs. The absence of RCTs, for example, in many systematic reviews addressing public health or health services questions, represents a special ‘replacement’ situation. In these situations, one would ideally focus on evidence from RCTs that do not suffer from indirectness (e.g., addressing long-term outcomes for different exposures), but such RCTs may be unavailable or deemed infeasible. NRS might then act as a replacement for RCT evidence. However, replacing a body of RCT evidence with NRS evidence does not imply that the latter is elevated in quality. It simply indicates that the best available evidence is from NRS and that decision makers must accept greater


Figure 2. Steps that systematic review authors might follow when considering NRS evidence. Steps 1 to 5 should generally be considered consecutively. Steps 1 to 3 first establish whether evaluation of evidence from NRS is required. * This means that the question for the systematic review of RCTs is broad enough to fulfill all requirements for applicability and other elements of directness. It is unlikely to be a common situation. # In few situations will the evidence from NRS be informative if the quality of the evidence from RCTs is truly moderate, as this requires upgrading of a body of evidence from NRS for any of the GRADE criteria.


uncertainty (except in the rare circumstances of very high quality NRS evidence, that is, evidence that is upgraded because of special strength). In other words, the fact that RCTs are of lower quality, not available or not feasible does not suggest that one should have greater confidence in the estimates of effects from the NRS.

3.2. Considerations for training of review authors

Making these judgments will enhance the transparency of systematic reviews and improve their usability by laying out the judgments about indirectness more clearly. While the task of judging the directness of evidence from RCTs is not without challenges, it is substantially more challenging to make judgments about the quality of evidence from NRS. This is because judgments about the specific criteria (i.e., risk of bias, imprecision, publication bias and other quality criteria) are more challenging to make when appraising NRS than RCTs. Therefore, it will be important to train review authors to make these judgments and to provide examples to support them.

3.3. What is the impact on the review author?

Figure 2 describes the potential impact on review authors. It may become necessary to evaluate a body of evidence from NRS and assess whether or not it provides information that can enhance the review. By considering the PICO, authors have to assess the different domains and levels of directness that they may encounter, and this should start at the protocol stage because of the impact on searching for evidence, study inclusion and exclusion, and data abstraction.
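The conditional/sequential rule described in Section 3.1 could be written down at the protocol stage in a form such as the following minimal Python sketch. It is an illustration only, under stated assumptions: the function name, the returned labels and the handling of moderate quality evidence are hypothetical and simply mirror the text and the notes to Figure 2. The input is taken to be the overall GRADE rating of the RCT evidence after all domains, including indirectness, have been assessed.

```python
from enum import Enum
from typing import Optional


class Quality(Enum):
    """GRADE ratings for the quality of a body of evidence."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    VERY_LOW = "very low"


def planned_use_of_nrs(rct_quality: Optional[Quality]) -> str:
    """Hypothetical protocol rule for whether, and how, to consider NRS.

    rct_quality is the overall rating of the RCT evidence for the
    review question, or None when no RCTs were identified.
    The returned labels are illustrative, not a validated algorithm.
    """
    if rct_quality is None:
        # Absence of RCTs: the special 'replacement' situation.
        return "replacement"
    if rct_quality is Quality.HIGH:
        # High quality evidence for a sufficiently broad question.
        return "not needed"
    if rct_quality is Quality.MODERATE:
        # Per the note to Figure 2, NRS are seldom informative here.
        return "complementary or sequential (rarely informative)"
    # Low or very low quality RCT evidence.
    return "complementary, sequential or replacement"


if __name__ == "__main__":
    for rating in (None, *Quality):
        label = "no RCTs" if rating is None else rating.value
        print(f"{label}: {planned_use_of_nrs(rating)}")
```

Even when the rule returns "replacement", the NRS evidence is not thereby elevated in quality; it still has to be assessed against all GRADE criteria.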

4. Guidance for review authors

On the basis of the literature reviewed here, discussions at the workshop, and logical arguments, we would like to offer the following non-hierarchical guidance for systematic review authors:

First, review authors should specify the PICO healthcare question that they are interested in addressing, defining the elements of the question in sufficient detail to facilitate judgments about directness. They can use the items in the subdomains and domains of Table 1 to specify their question as narrowly as necessary and as broadly as acceptable.

Second, review authors should judge the directness of the evidence that they obtain on the basis of the factors in Table 1. This step is applicable both to individual studies in a review and across studies in a systematic review. Often, review authors do not make specific judgments about the directness but, instead, lay out these factors so that users of the systematic review (e.g., for the development of guidance) can make the necessary judgments about the directness of the evidence for the specific PICO question that they are considering. This is an appropriate modus operandi because users of the systematic review can then evaluate whether or not the evidence is direct enough for their healthcare question (which may differ slightly or substantially). To assess overall directness, they can use the tool in Table 4. Table 5(a,b) provides two examples of the use of this tool. In Table 5(a), the body of evidence from RCTs assessing the effect of heparin on mortality in patients with cancer is found to be sufficiently direct that use of

Table 4. Judgments about indirectness by outcome in the confidence in estimates of effects (related to question at hand is assessed) (please see online appendix for download of this table). Outcome: . . .

| Domain (original question asked) | Description (evidence found and included, including evidence from other studies) – consider the domains of study design and study execution, inconsistency, imprecision, and publication bias | Judgment – Is the evidence sufficiently direct? |
| --- | --- | --- |
| Population: | | □ Yes □ Probably yes □ Probably no □ No |
| Intervention: | | □ Yes □ Probably yes □ Probably no □ No |
| Comparator: | | □ Yes □ Probably yes □ Probably no □ No |
| Direct comparison: | | □ Yes □ Probably yes □ Probably no □ No |
| Outcome: | | □ Yes □ Probably yes □ Probably no □ No |
| Final judgment about indirectness across domains: | | □ No indirectness □ Serious indirectness □ Very serious indirectness |




Table 5a. Example of presentation for judgments about indirectness (not to be used for decision making). In people with cancer, does treatment with heparins compared with no treatment reduce mortality (other outcomes would be increased risk of bleeding, venous thromboembolism and others (Akl et al., 2011))? Outcome: mortality

| Domain (original question asked) | Description (evidence found and included, including evidence from other studies) – consider the domains of study design and study execution, inconsistency, imprecision, and publication bias | Judgment – Is the evidence sufficiently direct? |
| --- | --- | --- |
| Population: All patients with advanced cancer | A total of eight randomized trials included patients with various types of cancer, two trials included only patients with small cell lung cancer, others included predominantly breast cancer. The studies were well executed and enrolled patients that were similar to those seen in practice. There was some degree of inconsistency in the baseline risk and related imprecision. Publication bias was not of concern. | □ Yes ☒ Probably yes □ Probably no □ No |
| Intervention: heparins | Trials included both low molecular weight heparin and unfractionated heparin. The observational studies do not suggest differential effects for the heparins. | □ Yes ☒ Probably yes □ Probably no □ No |
| Comparator: no anticoagulation | Trials used placebo injections | ☒ Yes □ Probably yes □ Probably no □ No |
| Direct comparison | Studies directly compared the intervention against the comparator of interest (default) | ☒ Yes □ Probably yes □ Probably no □ No |
| Outcome: mortality | Mortality was determined through follow-up of patients in the trial (e.g., telephone) | ☒ Yes □ Probably yes □ Probably no □ No |
| Final judgment about indirectness across domains for the outcome mortality: | The identified evidence is directly relevant to the question. NRS will not provide strong complementary data for the effects of the intervention. NRS suggest that the baseline risk for the population is similar in the trials compared with the population not included in trials. | ☒ No indirectness □ Serious indirectness □ Very serious indirectness |

Footnote: The degree of indirectness does not lower our confidence that the estimates of effect would be similar for healthcare decision making. It is not useful to look for NRS evidence.

NRS, non-randomized studies.


Table 5b. Example of presentation for judgments about indirectness (not to be used for decision making). In people with atrial fibrillation, how much does treatment with warfarin compared with aspirin increase non-fatal extracranial bleeding (other outcomes would be decreased risk of non-fatal stroke and mortality)? Outcome: non-fatal extracranial bleeding

| Domain (original question asked) | Description (evidence found and included, including evidence from other studies) – consider the domains of study design and study execution, inconsistency, imprecision and publication bias | Judgment – Is the evidence sufficiently direct? |
| --- | --- | --- |
| Population: all patients with atrial fibrillation | A total of 11 randomized trials included patients with atrial fibrillation. The quality of evidence was rated down for imprecision [pooled risk ratio for bleeding 1.42 (95% CI 0.89–2.29)] but not for inconsistency in the baseline risk and related imprecision. There were no issues of bias, inconsistency, or publication bias. In general, patients enrolled were younger and healthier than those seen in clinical practice. | □ Yes □ Probably yes ☒ Probably no □ No |
| Intervention: warfarin | Trials included only adjusted dose of warfarin | ☒ Yes □ Probably yes □ Probably no □ No |
| Comparator: aspirin | Trials used aspirin 75–325 mg | ☒ Yes □ Probably yes □ Probably no □ No |
| Direct comparison | Studies directly compared the intervention against the comparator of interest (default) | ☒ Yes □ Probably yes □ Probably no □ No |
| Outcome: non-fatal major extracranial bleed | Bleeding was determined through follow-up of patients in the trial (e.g., in-person and telephone) | ☒ Yes □ Probably yes □ Probably no □ No |
| Final judgment about indirectness across domains for the outcome non-fatal extracranial bleeding: | The identified evidence is indirectly relevant to the question. NRS can provide strong complementary data for the effects of the intervention. NRS suggest that the baseline risk of bleeding as well as the relative risk is lower in the trials than in the population not included in trials. In addition, baseline risk of bleeding can be stratified by CHADS2 scoring using NRS data. | □ No indirectness ☒ Serious indirectness □ Very serious indirectness |

Footnote: The degree of indirectness lowers our confidence that the estimates of effect would be similar for healthcare decision making. It is useful to look for NRS evidence.

NRS, non-randomized studies.
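To illustrate how a baseline risk taken from NRS might be combined with the pooled relative effect from the trials in Table 5b, the following minimal Python sketch computes absolute effects per 1000 patients as baseline risk × (risk ratio − 1). The pooled risk ratio and its confidence interval are those reported in Table 5b. The baseline risks, the stratum labels and the function name are hypothetical and chosen for illustration only; they are not data from any study.

```python
from typing import Tuple


def absolute_effect_per_1000(
    baseline_risk: float, rr: float, ci_low: float, ci_high: float
) -> Tuple[int, int, int]:
    """Absolute risk difference per 1000 patients for a given baseline risk.

    Uses risk difference = baseline risk * (risk ratio - 1), applied to the
    point estimate and to both limits of the confidence interval.
    """
    def per_1000(ratio: float) -> int:
        return round(1000 * baseline_risk * (ratio - 1))

    return per_1000(rr), per_1000(ci_low), per_1000(ci_high)


# Pooled risk ratio for bleeding from the trials (Table 5b).
RR, CI_LOW, CI_HIGH = 1.42, 0.89, 2.29

# Hypothetical baseline risks of bleeding, e.g. for lower- and
# higher-risk strata identified in NRS (illustrative values only).
hypothetical_baseline_risks = {"lower-risk stratum": 0.01, "higher-risk stratum": 0.03}

for stratum, risk in hypothetical_baseline_risks.items():
    estimate, low, high = absolute_effect_per_1000(risk, RR, CI_LOW, CI_HIGH)
    print(f"{stratum}: {estimate} more per 1000 (from {low} to {high})")
```

The same relative effect thus translates into different absolute effects depending on the baseline risk, which is one reason why baseline risk estimates from NRS can complement the RCT evidence.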


Indirectness

Imprecision

Other

2227/ 14387 (15.5%)

12/161 (7.5%)

Cryotherapy

2.4%3

319/ 7454 (4.3%)

4/168 (2.4%)

LEEP

No of patients

OR 2.66 (1.89 to 3.75)

OR 3.3 (1.04 to 10.46)

Relative (95% CI)

Spontaneous abortion (assessed with severe preterm delivery