VALUE IN HEALTH 15 (2012) 639–649

Deriving Input Parameters for Cost-Effectiveness Modeling: Taxonomy of Data Types and Approaches to Their Statistical Synthesis

Pedro Saramago, MSc1,*, Andrea Manca, PhD1, Alex J. Sutton, PhD2

1Centre for Health Economics, University of York, York, UK; 2Department of Health Sciences, University of Leicester, Leicester, UK

ABSTRACT

Background: The evidence base informing economic evaluation models is rarely derived from a single source. Researchers are typically expected to identify and combine available data to inform the estimation of model parameters for a particular decision problem. The absence of clear guidelines on what data can be used and how to effectively synthesize this evidence base under different scenarios inevitably leads to different approaches being used by different modelers.

Objectives: The aim of this article is to produce a taxonomy that can help modelers identify the most appropriate methods to use when synthesizing the available data for a given model parameter.

Methods: This article developed a taxonomy based on possible scenarios faced by the analyst when dealing with the available evidence. While mainly focusing on clinical effectiveness parameters, this article also discusses strategies relevant to other key input parameters in any economic model (i.e., disease natural history, resource use/costs, and preferences).

Results: The taxonomy categorizes the evidence base for health economic modeling according to whether 1) single or multiple data sources are available, 2) individual or aggregate data are available (or both), or 3) single or multiple decision model parameters are to be estimated from the data. References to examples of the key methodological developments for each entry in the taxonomy, together with citations to where such methods have been used in practice, are provided throughout.

Conclusions: The taxonomy developed in this article aims to improve the quality of the synthesis of evidence informing decision models by bringing recent methodological developments in this field to the attention of health economics modelers.

Keywords: aggregate data, decision analytic models, economic evaluation, evidence synthesis, individual patient data, meta-analysis.

Copyright © 2012, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. Open access under CC BY license.

Introduction

Economic evaluations assess the costs and health consequences of competing health care interventions, programs, or services. Their aim is to inform policy decisions regarding resource provision within health care systems operating under a limited or fixed budget. The information required to carry out an economic evaluation rarely comes from a single study [1]. More commonly, the evidence base informing the model parameters is represented by one or more data sources, including individual patient-level data sets (e.g., randomized controlled trials [RCTs] and observational studies), expert opinions, and secondary data analyses (e.g., meta-analysis).

Decision analytic models represent an ideal vehicle to structure the decision problem, combine all available data, and characterize the various sources of uncertainty associated with the decision problem [2]. As with any modeling framework, the results of the analysis depend on the suitability of the model structure, the quality of the data inputs, and the methods used to derive these [3]. The National Institute for Health and Clinical Excellence (NICE, or the Institute) for England and Wales is one of the many national agencies worldwide that recognize the value of decision models to inform the assessment of whether or not technologies represent value for money. The Institute's guideline for methods of technology appraisal [4] recommends that after defining ". . . explicit criteria by which studies are included and excluded . . ." (page 14), ". . . all relevant evidence . . ." should be ". . . identified, quality assessed and, when appropriate, pooled using explicit criteria . . ." by means of ". . . justifiable and reproducible methods" (page 27).

One issue typically faced by health economics modelers is how to proceed when multiple sources of evidence are available to inform the same model input (e.g., relative effectiveness). In the last decade, there has been a shift toward recognizing the need for more systematic identification and utilization of statistical evidence synthesis in decision models [3], at least for effectiveness parameters, with approaches such as meta-analysis or mixed treatment comparisons (MTCs) being increasingly used in cost-effectiveness analysis [5]. Parameters used in decision analytic models, for instance, are increasingly being estimated from aggregate data available in the published literature. There are, however, several examples where the model's parameters have been derived almost exclusively from a single individual patient-level trial data set [6–10]. Advantages of the latter approach, compared with using aggregate data only, include more accurate modeling of the disease's natural history and the possibility of exploring heterogeneity in baseline risk (and/or relative treatment effect) across patient groups. In this case, the challenge is how to integrate individual patient data with any other component of the evidence base that may be available in aggregate or summary measures format. Methods for combining multiple individual-level [11] or individual- and aggregate-level data are rapidly developing [12–16], although many applied health economics modelers are currently unaware of these.

This article develops a taxonomy based on possible scenarios typically faced by the analyst when dealing with the evidence base. Because most of the methods development took place in the area of statistical synthesis of clinical effectiveness measures from RCTs, the proposed taxonomy is structured around examples concerning such parameters (see next section). Statistical approaches available to synthesize the evidence base under different scenarios are briefly reported and discussed, together with key references to full explanations of the methodologies and examples of where such methods have been used in practice. We make no claim to be exhaustive with respect to reviewing the various applications because this was not the objective of the article. Instead, our aim was to use these examples to illustrate and to provide recommendations regarding which techniques are most appropriate for synthesizing available information depending on its format, the number of data sources, and the number of parameters to be derived. In addition to applying this taxonomy to clinical effectiveness parameters, we consider its application to other key input parameters of an economic model (in the "Use of Evidence for Other Model Input Parameters" section), including disease natural history, resource use/costs, and preferences, with a view to discussing issues with the application of the taxonomy to these other parameters. In doing so, we hope to encourage a fuller application of this taxonomy to non-effectiveness parameters in future modeling. Finally, the last section summarizes the main points of the article and includes suggestions for future research.

* Address correspondence to: Pedro Saramago, Centre for Health Economics, University of York, Alcuin A, York YO10 5DD, UK. E-mail: [email protected].
1098-3015 Copyright © 2012, International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. Open access under CC BY license. doi:10.1016/j.jval.2012.02.009

Table 1 – A gallery of scenarios arising when using clinical evidence in cost-effectiveness models.

                                                  Single source of evidence     Multiple sources of evidence
Format of data available                          Single        Multiple        Single        Multiple
                                                  parameter     parameters      parameter     parameters
Aggregate data                                    Scenario A1   Scenario B1     Scenario A2   Scenario B2
Individual patient data                           Scenario C1   Scenario D1     Scenario C2   Scenario D2
Mixture of aggregate and individual-level data    –             –               Scenario E2   Scenario F2

A Taxonomy for the Use of Evidence in Cost-Effectiveness Models: Application to Clinical Effectiveness Evidence

Good practice in health economic evaluation suggests that decision models should be structured in a way that appropriately reflects the decision problem at hand [17]. The evidence used to inform the model inputs often comes from different sources, with potentially heterogeneous designs (e.g., RCTs, observational studies, and expert panels). International health technology assessment standards in systematic reviews and meta-analysis [18] indicate that good-quality RCT evidence is the preferred data source for estimating the main clinical effect(s) of interest. There are, of course, many features of the evidence base (e.g., characteristics of the target population and use of intermediate outcomes rather than final ones) that may complicate its use for informing a particular economic analysis. These are, however, not specific to randomized data alone. Country-specific health technology assessment guidance documents provide more heterogeneous indications as to whether or not it is acceptable to use non-randomized evidence to inform the main clinical effectiveness part of the model in the absence of evidence from good-quality randomized studies [19]. In the case of NICE, for instance, its methods guidance states that any limitations of the methods used, potential biases in obtained parameter estimates, caveats about the interpretation of results, and appropriate reflection of parameter uncertainty should be extensively reported in the analyses submitted for consideration by the Institute [4].

For simplicity, this article deals with situations in which the main body of evidence for effectiveness comes from randomized studies (the added level of complication deriving from the inclusion of non-randomized data is discussed when relevant to the argument). It is argued here that the selection of an appropriate method for the analysis and synthesis of clinical effectiveness data for use in a decision model depends on three dimensions of the evidence base (listed below), the combination of which gives rise to a taxonomy of possible scenarios the analyst may face, as illustrated in Table 1. These are briefly introduced here and discussed more fully in the following sections.

Number of available sources of evidence

Depending on the research question, there may be multiple sources of evidence from randomized trials from which to derive an estimate of clinical effectiveness, although there are examples where a single RCT provides the only evidence available.

Formats in which data are available

The above evidence may be available 1) in aggregate form only (sometimes referred to as summary data), 2) at the individual level, or 3) in a combination of aggregate- and individual-level data. Care must be taken when classifying these data formats, because in some contexts individual-level data may not contain any extra information beyond what is conveyed by available summary statistics. For example, basic individual data can be reconstructed from a summary 2 × 2 table recording the numbers of individuals at risk and those who experienced a binary outcome in a two-arm trial. Either approach will give the same estimate of the odds ratio (OR) of effect [20]. In this case, the OR is a sufficient statistic, in the sense that ". . . no other statistic which can be calculated from the same sample provides any additional information as to the value of the parameter" [21].
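As an illustration of this sufficient-statistic point, the sketch below (all counts hypothetical) computes the OR directly from a 2 × 2 summary table and then from "reconstructed" patient-level records, yielding identical estimates:

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical 2 x 2 summary from a two-arm trial: events out of those at risk.
events = np.array([30, 20])    # [treatment, control]
at_risk = np.array([100, 100])

# OR straight from the aggregate table.
or_aggregate = (events[0] / (at_risk[0] - events[0])) / \
               (events[1] / (at_risk[1] - events[1]))

# "Reconstructed" individual-level data: one row per patient.
y = np.concatenate([np.ones(events[0]), np.zeros(at_risk[0] - events[0]),
                    np.ones(events[1]), np.zeros(at_risk[1] - events[1])])
treat = np.concatenate([np.ones(at_risk[0]), np.zeros(at_risk[1])])
res = sm.Logit(y, sm.add_constant(treat)).fit(disp=0)

print(or_aggregate, np.exp(res.params[1]))  # identical estimates
```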

Number of (effectiveness) parameters to be derived

It is important to distinguish between the need to synthesize the evidence to inform a single parameter and the need to estimate multiple parameters for use in the decision model. One example of the latter is the synthesis of the sensitivity and specificity of a diagnostic test; another is the need to synthesize one or more clinical outcomes reported at different time points. These most often require different analytical and evidence synthesis strategies.

Single source of evidence

Let us start with the simplest scenario of all, that is, where there is only one single source of evidence from which to derive the parameter(s) of interest. In this case, the problem is not how to synthesize the available evidence but how to make the best of this single source to inform parameter estimate(s) for use in the economic model.

Aggregate data to inform the estimation of a single parameter (A1)

If all the available evidence is in the form of published (summary) results of a single study, the simplest option is to use these data in the model "as they are" to inform the derivation of the relevant parameter estimate in the decision model. For a probabilistic representation of these parameters, the analyst will need access to multiple statistics from the source of evidence (e.g., mean and standard error). Also, plausibility and sample characteristics (e.g., skewness) may be used to define an appropriate distribution. Clearly, exploration of any statistical heterogeneity (i.e., more variability between study effect sizes than would be expected by chance alone) relating to a parameter is unfeasible in this circumstance, and usually no further appraisals of the evidence are possible other than a simple sensitivity or threshold analysis. At this stage, and in the absence of other sources of evidence, the analyst may want to explore whether attempting to acquire further evidence through other techniques (e.g., expert elicitation) is worth the effort.
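As a minimal sketch of this scenario, the fragment below (all summary values hypothetical) turns two published statistics into probabilistic draws for a decision model: a normal distribution on the log-OR scale for a relative effect and a beta distribution for a baseline probability:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical published summaries for two model inputs.
log_or, se_log_or = np.log(0.75), 0.12  # relative effect, log scale
events, n = 45, 230                     # baseline event count / sample size

# Draws for probabilistic sensitivity analysis.
or_draws = np.exp(rng.normal(log_or, se_log_or, 10_000))  # normal on log scale
p_draws = rng.beta(events, n - events, 10_000)            # beta for a probability

print(np.quantile(or_draws, [0.025, 0.5, 0.975]))
print(np.quantile(p_draws, [0.025, 0.5, 0.975]))
```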

Aggregate data to inform the estimation of multiple parameters (B1)

It is possible for a single published study to provide several inputs that may be used to derive model parameters. For instance, a single (three-arm) trial may provide effectiveness data for a decision model evaluating the same three alternative treatments. Because the resulting measures of relative effectiveness (such as ORs or log-ORs) between these three arms are inherently correlated, it has been recommended that correlation between parameters be explicitly modeled where possible. Failure to do so would produce not only an incorrect assessment of the uncertainty in the model but also an incorrect estimation of each treatment's expected costs and benefits [22–24]. In such instances, the propagation of correlations is automatic if parameter estimation is conducted in the same program as the decision model (sometimes called one-step comprehensive decision modeling [25]) or can be achieved by specifying the full multivariate distributions for the correlated parameters [22]. Methods such as indirect comparison or MTC models can be used successfully to address the above problem (see section B2 and references therein) [4].

Another situation in which it is possible to derive multiple parameters from a single study occurs when the interest of the modeler lies in estimating multiple outcomes or multiple time points on the same treatment comparison. The range of possible analytical options here may be limited by the lack of information on the correlation between outcomes. In some cases, approximate or ad hoc methods may be available to take into account the correlated nature of the outcomes (e.g., the phi coefficient, Yule's Q, or Yule's Y; see Epstein and Sutton [24] for further details), and this will usually be preferable to assuming that the outcomes are independent (but less desirable than obtaining the individual data and estimating the correlations directly).
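The multivariate-distribution route can be sketched as follows, with hypothetical log-ORs from a three-arm trial; the correlation value is assumed for illustration and would in practice be derived from the variance contribution of the shared control arm:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical log-ORs (B vs. A, C vs. A) from one three-arm trial; the
# correlation arises from the shared arm A and is assumed here for illustration.
mean = np.array([-0.30, -0.45])
se = np.array([0.15, 0.18])
rho = 0.5
cov = np.array([[se[0]**2,            rho * se[0] * se[1]],
                [rho * se[0] * se[1], se[1]**2           ]])

# Joint draws keep the correlation intact when propagated through the model.
draws = rng.multivariate_normal(mean, cov, size=10_000)
print(np.corrcoef(draws[:, 0], draws[:, 1])[0, 1])  # ~0.5 by construction
```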

Individual patient data to inform the estimation of a single parameter (C1)

Access to individual-level data, especially when there is only one relevant study forming the evidence base, is particularly advantageous because it allows reanalysis of the data (e.g., inclusion of further explanatory covariates and conduct of more in-depth analyses than is possible from summary evidence extracted from published reports [26]) aimed at deriving appropriate model input parameters. Indeed, compared with the use of summary data, the analysis of individual data may be considered the most flexible way to explore and answer clinical and economic research questions. In this case, the data can be analyzed by using the range of statistical models developed to analyze trial data to estimate the decision model parameter of interest [27,28]. Given that appropriate methods in this context are extensively documented elsewhere, we keep this section brief.

Individual patient data to inform the estimation of multiple parameters (D1)

The availability of individual-level data often enables the estimation of multiple model parameters. The economic model constructed around the third Randomised Intervention Trial of unstable Angina [10] is an example where the trial data were used to derive estimated rates of cardiovascular death and myocardial infarction (as well as costs and health-related quality of life) through regression models applied to a single individual patient-level data set. As noted in section C1, the possibility of estimating correlations between correlated input model parameters based on summary measures is often limited by the data being reported. This is no longer an issue when one has access to the original study's individual-level data. Another important area where access to individual-level data facilitates the estimation of multiple parameters relates to the analysis of time-to-event data. Because trial follow-up is typically short in duration [1], to produce long-term estimates of cost-effectiveness most models need to extrapolate the observed trial results (e.g., fatal and nonfatal events) beyond the trial follow-up. This can be achieved by employing parametric distributions to model the outcome of interest. These are typically governed by a combination of two or more correlated ancillary parameters. Popular examples of parametric distributions for the analysis of time-to-event outcomes include the Weibull, the log-logistic, and the generalized gamma [29].
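As a sketch of this approach, the fragment below fits a Weibull model to simulated right-censored data by maximum likelihood and extrapolates survival beyond the observed follow-up; all data and parameter values are made up:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import weibull_min

# Hypothetical IPD: true event times, administratively censored at 2 years.
t_true = weibull_min.rvs(c=1.3, scale=4.0, size=300, random_state=3)
follow_up = 2.0
time = np.minimum(t_true, follow_up)
event = (t_true <= follow_up).astype(float)  # 1 = event observed, 0 = censored

def neg_log_lik(params):
    """Censored Weibull log-likelihood: events contribute log h(t) + log S(t);
    censored observations contribute log S(t) only."""
    shape, scale = np.exp(params)  # optimize on the log scale for positivity
    log_h = np.log(shape / scale) + (shape - 1) * np.log(time / scale)
    log_s = -(time / scale) ** shape
    return -np.sum(event * log_h + log_s)

fit = minimize(neg_log_lik, x0=np.log([1.0, 1.0]), method="BFGS")
shape_hat, scale_hat = np.exp(fit.x)

# fit.hess_inv approximates the covariance of (log shape, log scale): the
# correlated ancillary parameters referred to in the text, usable for PSA.
print(shape_hat, scale_hat)
print("S(10y) extrapolated:", np.exp(-(10.0 / scale_hat) ** shape_hat))
```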

Multiple sources of evidence

There are situations in which the evidence base is represented by multiple studies. Depending on the format in which they are available and the number of parameters we need to estimate, these give rise to six possible scenarios (Table 1).

Multiple aggregate data to inform the estimation of a single parameter (A2)

A typical case occurs when there are several studies reporting results on the same parameter of interest, and the researcher needs to combine these into a single quantitative estimate. The statistical methods most commonly used to achieve such a synthesis fall within the meta-analytic family [30]. In standard meta-analysis of clinical trials, the parameter of interest is usually some measure of comparative effectiveness between treatment arms. A fixed-effect meta-analysis is carried out under the assumption that a single common (or "fixed") effect underlies every study in the meta-analysis (i.e., every study is assumed to estimate the same OR, and therefore only within-study variation is taken to influence the uncertainty in the results) [18]. It is common, though, to observe between-study variation in treatment effect estimates (heterogeneous treatment effects). In such a case, it is customary to use a random-effects model, which assumes that while individual studies are estimating different treatment effects, these come from a common distribution with some measure of central tendency and some measure of dispersion [18]. For an up-to-date comprehensive review of recent developments in meta-analysis, the reader can refer to the article by Sutton and Higgins [31].

Some authors have argued that one of the prime weaknesses of meta-analysis is a possible failure to control for sources of bias and that a good meta-analysis of badly designed studies will still result in a biased combined statistic. In addition to the inclusion of evidence of suboptimal quality, publication and other related biases may be present [32]. Study-level features, such as participants' characteristics, which may lead to between-study heterogeneity, can be investigated by adopting a meta-regression approach in which study-level covariates are included in the analysis. Some researchers would prefer to include "weaker"/"low-quality" studies in the meta-analysis and add a study-level covariate reflecting the methodological quality of the trials to assess the impact of trial quality on the effect size. Unfortunately, meta-regression methods also have a number of weaknesses [33]. The analyst should be aware that the use of such mean study-level covariate values has low power (compared with individual patient data methods) and, more importantly, carries the risk of "ecological fallacy" (i.e., situations in which relationships observed at the aggregate variable level are incorrectly inferred to exist also at the individual level) [34] if these average patient-level characteristics are used [20,35]. In this sense, as we shall see in section C2, access to individual data can be used to disentangle the relationship between the parameter of interest and baseline covariates [36]. Ades et al. [37] provide an extensive discussion of how between-study heterogeneity can be incorporated into the parameters of a decision analytic model (note that if the heterogeneity parameter is used to derive parameters for a decision analytic model, the analysis is technically estimating multiple parameters [i.e., a random-effects estimate of the treatment effect, its variability, and a measure of heterogeneity] and thus belongs to the B2 category).

In cost-effectiveness analysis, the use of estimates of relative treatment effects derived from a meta-analysis is common [38]. An example of its use within an economic model is the prevention and treatment of influenza A and B [39], where a separate meta-analysis was conducted to evaluate time to symptoms alleviated and time to return to normal activities for different baseline risk groups. In another study, McKenna et al. [40] carried out a systematic review and economic evaluation of the clinical effectiveness and cost-effectiveness of aldosterone antagonists for post-myocardial infarction heart failure. The authors estimated the effectiveness parameter to inform their cost-effectiveness model by using a Bayesian meta-regression model.
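A minimal illustration of the two standard pooling models, using hypothetical log-ORs from five trials and the DerSimonian-Laird moment estimator of the between-study variance:

```python
import numpy as np

# Hypothetical log-ORs and standard errors from five trials.
y = np.array([-0.36, -0.10, -0.51, -0.22, -0.45])
se = np.array([0.20, 0.25, 0.30, 0.18, 0.22])
w = 1 / se**2

# Fixed-effect (inverse-variance) pooled estimate.
fe = np.sum(w * y) / np.sum(w)

# DerSimonian-Laird moment estimator of the between-study variance tau^2.
q = np.sum(w * (y - fe) ** 2)
tau2 = max(0.0, (q - (len(y) - 1)) / (np.sum(w) - np.sum(w**2) / np.sum(w)))

# Random-effects pooled estimate widens the weights by tau^2.
w_re = 1 / (se**2 + tau2)
re = np.sum(w_re * y) / np.sum(w_re)
print("FE OR:", np.exp(fe), "RE OR:", np.exp(re), "tau^2:", tau2)
```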

Multiple aggregate data to inform the estimation of multiple parameters (B2)

Meta-analysis can also be used to achieve more complex forms of evidence synthesis, to address issues related to multiple (indirect and mixed treatment) comparisons and combinations of evidence on multiple or surrogate/intermediate end points [41]. Much of the published work on these complex methods of synthesis has been undertaken within a Bayesian framework, mainly for computational reasons but also because of its coherent link to decision making [5]. The term "multiparameter evidence synthesis" (MPES), adapted from Hasselblad and McCrory [42], has been coined to designate these extended methods of synthesis.

When multiple outcomes are of interest, a multivariate meta-analysis model facilitates the joint estimation of these end points, thus estimating possible correlation between them. Often, the advantage of a multivariate random-effects meta-analysis lies in its ability to use the within-study and between-study correlation of the multiple end points of interest. For example, Reitsma et al. [43] have suggested applying a bivariate random-effects meta-analysis to jointly synthesize logit-sensitivity and logit-specificity values from diagnostic accuracy studies.

More generally, a common feature of the evidence base used to inform health care funding decisions is the absence of head-to-head trials comparing all relevant treatment strategies. When more than two treatments are to be compared and the evidence base contains different randomized pairwise or multiarm comparisons, the appropriate techniques to use in the decision-making context are indirect treatment comparisons and network meta-analysis (or MTCs), which are simple extensions of the pairwise meta-analysis method [44,45]. MTCs can be recognized as an example of MPES, in which parameters are related to one another by a definable structure [46]. In an MTC the modeler may choose between a fixed-effects and a random-effects analysis depending on the assumptions made about any between-trial heterogeneity, as discussed in A2 [46,47]. MTC relies on exactly the same assumptions as standard pairwise meta-analysis (i.e., choice and quality of the studies), although these now apply to the full set of interlinked trials. Therefore, the similarity between trials included in the network will also be a determinant of the internal validity of the analyses; dissimilar trials carry a risk of confounding bias [49]. Where direct and indirect evidence are combined for a particular comparison, it is also vital that there is no disagreement between the direct and indirect comparisons. For instance, in an MTC model comparing three treatments (e.g., A, B, and C), consistency is achieved when, for each pairwise comparison, no discrepancies can be found between the direct and indirect estimates of the parameter of interest (e.g., OR) derived from the model. The issue lies in defining how big a difference constitutes a discrepancy; although this threshold is arbitrary, statistical tests (potentially with reduced power) for detecting discrepancies exist [50–52]. As for standard meta-analysis, in network meta-analysis it is important to allow for between-study heterogeneity [53]. An extension of this family of techniques, allowing for the incorporation of study-level covariates to explain between-study heterogeneity and reduce synthesis model inconsistency, is also available [53–55]. Details on the use of indirect comparisons and MTC for health technology assessment can be found elsewhere [48].

A good example illustrating the use of the MTC framework when multiple follow-up times are available is the article by Lu et al. [56]. For an application of the MPES approach, the reader is referred to the work by Welton et al. [57], which was developed by using data from the earlier economic appraisal of antiviral treatment by Turner et al. [39] referred to in A2. Another example where MTC was used in an economic analysis can be found in Woolacott et al. [58]. The authors synthesized clinical effectiveness data from several published trials in epilepsy to estimate the transition probabilities needed to populate a state transition model developed to assess the cost-effectiveness of alternative medications for epilepsy.
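The simplest member of this family, an adjusted indirect comparison via a common comparator (the Bucher method), can be sketched in a few lines; the pairwise summaries below are hypothetical:

```python
import numpy as np

# Hypothetical pairwise meta-analytic summaries (log-OR, SE).
d_ab, se_ab = -0.40, 0.15  # B vs. A, from A-B trials
d_ac, se_ac = -0.65, 0.20  # C vs. A, from A-C trials

# Indirect comparison of C vs. B via the common comparator A.
d_bc = d_ac - d_ab
se_bc = np.sqrt(se_ab**2 + se_ac**2)  # variances add: indirect evidence is less precise

ci = d_bc + np.array([-1.96, 1.96]) * se_bc
print("indirect OR (C vs. B):", np.exp(d_bc), "95% CI:", np.exp(ci))
```

Where a direct C-versus-B estimate also exists, comparing it with this indirect estimate is the basis of the consistency checks mentioned above.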

Multiple individual patient data sets to inform the estimation of a single parameter (C2)

Meta-analysis of individual participant-level data, or "mega-analysis," where raw data from each study are obtained and synthesized to inform the estimation of a single parameter of interest, is considered the "gold standard" in evidence synthesis [11,59]. This approach has a series of advantages, which are summarized by Stewart and Tierney [60], Simmonds et al. [61], and Stewart and Parmar [62]. Access to multiple individual-level data sets avoids the risk of bias associated with published aggregate data, allows one to obtain information possibly not available from published reports (or not available in the format required for the meta-analysis and cost-effectiveness model), and facilitates consistent inclusion/exclusion criteria across studies [62,63]. An increase in statistical power to detect true patient-treatment relationships is gained when compared with meta-regression of summary data [20,64], which only assesses treatment in relation to group-level summary data [3]. It should be highlighted, however, that in most situations access to individual patient-level trial data may be difficult because of issues such as confidentiality and sponsors' or investigators' reluctance to release these data.

Surprisingly, there is a paucity of published literature concerning methodologies for meta-analysis of individual data. Simmonds et al. [61] recently published a review of meta-analysis using trial-based individual-level data, suggesting that most methods used in practice are straightforward. The review shows that the majority of applications use a "two-stage" process in which each data set is first analyzed separately and summary data are drawn for each study (stage one); these are subsequently combined by using a "standard" meta-analytic model for aggregate evidence (stage two). This approach may be considered a simplification of the techniques discussed in scenario A2. Alternative and more robust approaches, based on random-effects generalized hierarchical models, exist for dealing with binary [65], ordinal [66], continuous [11,67], and longitudinal outcomes [68,69]. Unfortunately, these approaches appear to be rarely used in practice. It should be noted that if the outcome of interest is binary and the analyst does not need to control for covariates (or if the covariates are themselves binary), the summary data will report the sufficient statistics; that is, no additional benefit is obtained from access to individual participant-level data (as in scenario A2).
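A sketch of the common two-stage approach, on simulated IPD from three trials: each study is analyzed with its own covariate-adjusted logistic regression, and the resulting log-ORs are then pooled as in scenario A2 (a fixed-effect second stage is shown for brevity):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Stage one: analyze each (simulated, hypothetical) trial separately with a
# covariate-adjusted logistic regression; keep the treatment log-OR and its SE.
stage_one = []
for n, true_lor in [(200, -0.3), (150, -0.5), (300, -0.4)]:
    treat = rng.integers(0, 2, n)
    age = rng.normal(60, 10, n)
    lp = -1.0 + true_lor * treat + 0.02 * (age - 60)
    y = rng.binomial(1, 1 / (1 + np.exp(-lp)))
    X = sm.add_constant(np.column_stack([treat, age]))
    res = sm.Logit(y, X).fit(disp=0)
    stage_one.append((res.params[1], res.bse[1]))

# Stage two: pool the study-specific estimates as in scenario A2.
est = np.array([e for e, _ in stage_one])
se = np.array([s for _, s in stage_one])
w = 1 / se**2
pooled = np.sum(w * est) / np.sum(w)
print("pooled OR:", np.exp(pooled), "SE(log-OR):", np.sqrt(1 / np.sum(w)))
```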

Multiple individual patient data to inform the estimation of multiple parameters (D2)

As outlined in section B2, there are clear theoretical and practical benefits (besides an obvious policy rationale) that justify carrying out an MTC (and MPES models in general) when deriving parameters for use in cost-effectiveness analysis. Many of these benefits also apply when the purpose is to analyze multiple data sets from which to derive multiple parameters for use in decision modeling. Nevertheless, some authors believe that this need is exacerbated by the fact that an MTC is essentially an observational study comparing several treatment strategies. For instance, Salanti et al. [54] point out that while each individual trial may have high internal validity, studies included in an MTC will almost inevitably display between-study variability in study-level characteristics that can affect the relative effectiveness of the strategies being compared. One example of this is the definition of "placebo" or "standard care" in many MTCs [54] and cost-effectiveness models [70], which has been found to vary enormously between studies. Another example is the work by Nixon et al. [71] and two subsequent cost-effectiveness models [72,73], in which the authors conducted a covariate-adjusted aggregate data MTC of trial evidence on drugs for rheumatoid arthritis. While this approach is better than an unadjusted MTC, it still suffers from the same limitations as standard meta-regression. It is therefore essential to carry out a series of tests to assess the consistency of the evidence in the network.

Mixture of individual- and aggregate-level data to inform a single parameter (E2)

When the analyst opts to acquire individual participant data from each relevant study forming the evidence base, the most frequent scenario he or she encounters is that these data will be made available for only a subset of the evidence base. In such situations, analysts have traditionally taken two alternative routes: 1) include only those studies for which individual data were available or 2) collapse the available individual data to summary evidence and use only the latter. Neither solution makes optimal use of the available data. The first option throws away important information, and the second ignores all the advantages that individual-level data may bring toward an improved estimation of the effect size. A better approach would be to jointly model the individual- and aggregate-level data [12]. As yet, this issue has not received attention in the health economics literature, although a series of meta-analysis models have recently been developed [13–16,74,75] in the statistical literature specifically for this purpose. These models are valuable in reducing between-study heterogeneity and in identifying patient subgroups with differential treatment effects.
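A minimal sketch of such a joint model, assuming a treatment effect shared across studies and study-specific baselines: patient-level records contribute Bernoulli likelihood terms and aggregate arms contribute binomial terms, maximized together (all data hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(11)

# One hypothetical IPD study: one row per patient.
n_ipd = 200
treat = rng.integers(0, 2, n_ipd)
y = rng.binomial(1, expit(-0.8 - 0.4 * treat))

# Two hypothetical AD studies: events r out of n, arms = [control, treatment].
r = np.array([[40, 28], [55, 41]])
n = np.array([[150, 150], [200, 200]])

def neg_log_lik(params):
    delta = params[0]   # treatment log-OR, shared across all studies
    mu = params[1:]     # study-specific baseline log-odds: [ipd, ad1, ad2]
    # IPD contributes patient-level Bernoulli terms...
    p = expit(mu[0] + delta * treat)
    ll = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    # ...and each AD arm contributes a binomial term (constants dropped).
    for i in range(2):
        q = expit(mu[i + 1] + delta * np.array([0.0, 1.0]))
        ll += np.sum(r[i] * np.log(q) + (n[i] - r[i]) * np.log(1 - q))
    return -ll

fit = minimize(neg_log_lik, x0=np.zeros(4), method="BFGS")
print("joint log-OR estimate:", fit.x[0])
```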

Mixture of individual- and aggregate-level data to inform multiple parameters (F2)

Access to individual-level data (alongside existing aggregate data) is particularly important when the objective of the evidence synthesis model is the estimation of multiple input parameters to populate a cost-effectiveness model. This is clearly the most technically challenging scenario the analyst may face, and we are not aware of the existence of such models yet, despite the scenario discussed in this section being quite common. Further methodological research aimed at developing models appropriate to these situations would therefore be welcome. An interesting application does exist for diagnostic test evaluation (Riley et al. [13]), with a bivariate meta-analysis model being used to model outcomes of diagnostic studies, although this was not used to inform a cost-effectiveness model. By reconstructing individual patient-level data from available aggregate data, the authors manage to accommodate the mixture of individual- and aggregate-level data, which allows the full evidence set to be considered simultaneously in estimating the parameters of interest (i.e., the sensitivity and specificity of the test). Both individual patient-level and aggregate data studies contribute to the estimation of the impact of study-level covariates and the across-study effects.

For guidance, a summary of the scenarios with corresponding recommended methods and related literature is shown in Table 2.

Table 2 – Scenarios and corresponding current methods literature for when using clinical evidence to inform cost-effectiveness modeling.*

Scenario A1 (single source; aggregate data [AD]; single parameter): direct inclusion of the reported estimate (or a transformation of it) in the model. References: –.
Scenario B1 (single source; AD; multiple parameters): direct inclusion of estimates (or transformations of them) in the model; correlation should be included if reported. References: [22–24].
Scenario C1 (single source; individual patient-level data [IPD]; single parameter): standard analysis procedures relevant for the primary study. References: [26,28].
Scenario D1 (single source; IPD; multiple parameters): standard multivariate estimation procedures, including correlations. References: [10,60].
Scenario A2 (multiple sources; AD; single parameter): meta-analysis; meta-regression. References: [20,30–33,35,37,39,62].
Scenario B2 (multiple sources; AD; multiple parameters): multivariate meta-analysis (e.g., bivariate random-effects meta-analysis); mixed treatment comparison. References: [42–46,48,50–58].
Scenario C2 (multiple sources; IPD; single parameter): "mega-analysis" (meta-analysis using IPD). References: [61,64–66,68,69].
Scenario D2 (multiple sources; IPD; multiple parameters): multivariate meta-analysis using IPD (e.g., bivariate random-effects meta-analysis). References: [71].
Scenario E2 (multiple sources; mixture of AD and IPD; single parameter): two-stage (reduce IPD to AD, or reconstruct IPD from AD) or one-stage (hierarchical modeling, e.g., meta-analysis using AD and IPD). References: [12–16,74,75].
Scenario F2 (multiple sources; mixture of AD and IPD; multiple parameters): extensions of the previous synthesis models to the hierarchical framework (e.g., bivariate random-effects meta-analysis of IPD). References: [13].

* Where there is a need to define uncertain parameters for inclusion in a decision model and evidence is available as in scenarios A1 and B1, multiple statistics may need to be reported. It may also be of interest to have some information on the characteristics of the distribution best representing the parameter. All subsequent scenarios allow for expressions of uncertainty.

Use of Evidence for Other Model Input Parameters

While quantitative evidence synthesis methods typically focus on clinical effectiveness (including adverse events), cost-effectiveness models require information on many other input parameters, the most important being disease natural progression, cost/resource use, and (health state) utility data. There are some examples in the literature where evidence synthesis techniques have been applied to estimate these parameters. In this section, we describe the specific characteristics of these parameters and highlight the implications for methods of synthesis, aiming to illustrate the taxonomy's applicability.

While the clinical effectiveness data used to populate the model typically come from randomized studies, the evidence base used to estimate disease natural progression, resource use/cost, and utility data parameters is often derived from observational evidence (e.g., registries and administrative claims data) [76]. There are various reasons for this, beyond the fact that RCT evidence may not be available to populate these model parameters. First, because of their intrinsic design, randomized data on resource use/cost are often considered to have low external validity. This may be because the trial evidence does not reflect true clinical practice [77] or because the evidence may not be relevant to the decision maker for whom the model is being developed [78]. In this case, observational evidence may provide an opportunity to calibrate the model parameters and assess the extent to which trial evidence reflects real-world situations. Methods developed in the generalized evidence synthesis framework, which facilitate the synthesis of both randomized and nonrandomized data while accounting for the different study designs [79,80], and methods developed for cross-design synthesis [81] may be useful here. We are not aware of any application of these methods outside the analysis of clinical effectiveness, and further research in this direction, focusing on parameters such as disease natural history, could be of great interest. Second, resource use/cost (and to some extent utility) data are country specific [82], and, in many cases, it is almost impossible to find jurisdiction-specific RCT evidence on these parameters [83,84]. Third, most RCTs have a short follow-up duration. To model disease natural history as well as long-term costs and utilities, (large) long-term observational studies are often the only solution. Other specific issues for each type of parameter will be discussed in turn in the next subsections.

Disease natural progression data

One of the initial and most important phases in building any decision model is to explicitly define its structure. This entails, among other things, giving an appropriate representation of the key health states that the population of interest may experience over time and reflecting what is known about the natural history of the particular health condition being modeled, as well as the impact of alternative treatment options on the disease process. These procedures should be performed in collaboration with both clinical and nonclinical experts from the field(s) of interest. Evidence about the natural history of a disease is crucial for a good understanding of possible clinically defined states and, in view of the complexity of the task, long-term individual data are ideal for this (detailed individual-level natural history data are particularly important for modeling the impact that baseline characteristics may have on parameters in the model that capture the occurrence of clinical events beyond follow-up and associated resource use and health-related quality of life). Furthermore, given the concerns about the external validity of trial-based evidence, a favored source for baseline risk data is often case series or high-quality individual-level administrative or epidemiological data sets [85]. Even after relaxing the evidence base inclusion criteria to model disease natural history, individual-level data are still very often unavailable, leaving published summary evidence as the only feasible option to inform the model parameters.

With respect to the synthesis of data informing the baseline history part of the model, the publication of a recent useful document from the NICE Decision Support Unit is worth highlighting [86]. This report reviews evidence synthesis issues that may arise when dealing with baseline natural history modeling. The authors discuss the source of evidence to use for baseline outcomes, the simultaneous versus separate modeling of baseline and treatment effects, and the inclusion of covariates in baseline models. Although not presented within an evidence synthesis framework, Isaman and colleagues [87–89] proposed an approach that allows the use of published regression data to populate a multistate model describing disease natural history, even when the published study may have ignored intermediary states in the multistate model (taxonomy section B1). The authors applied their proposed methodology to model several chronic conditions, including heart disease and diabetes. Welton and Ades [90] use evidence synthesis methods applied to summary data to estimate transition probabilities from transition rates, with the objective of using these to model disease progression. The authors illustrate how to statistically combine data from multiple sources, including partially observed data at several follow-up times, to inform an epidemiologically realistic model (taxonomy section B2). Chao and Chen [91] used a similar approach and developed a multistate Markov model to predict the progression of age-related hearing loss, synthesizing partially observed aggregate data from four studies from which they derived progression rates (taxonomy section B2).

Modeling disease natural history becomes a lot easier when the analyst has access to individual-level data, and there are many examples in the literature that show how one can proceed in this case. Marshall and Jones [92], for instance, developed a multistate model to describe disease progression in diabetic patients with retinopathy and used patient-level covariates in the model to capture the natural course of the disease and identify the factors associated with progression and regression between disease stages (taxonomy section D1). We are not aware of any applications in the medical field that, in the presence of multiple patient-level data sets, carried out a synthesis of these for the purpose of modeling the natural history of a disease.
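A small worked example of the standard relationship underlying the kind of rate-to-probability conversion discussed above: given a generator matrix Q of constant transition rates, the transition probability matrix for a Markov model cycle of length t is the matrix exponential exp(Qt). The rates below are made up:

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical constant annual transition rates for three states
# (well -> ill -> dead); rows sum to zero and "dead" is absorbing.
Q = np.array([[-0.15,  0.12, 0.03],
              [ 0.00, -0.20, 0.20],
              [ 0.00,  0.00, 0.00]])

# Transition probability matrix for a one-year Markov cycle: P = exp(Qt).
P = expm(Q * 1.0)
print(P)

# Special case for a single rate r over cycle length t: p = 1 - exp(-r * t).
print(1 - np.exp(-0.20 * 1.0))
```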

Cost/resource use data

The quantification of the cost of each alternative strategy being compared is essential for any economic assessment. The best study design for quantifying health care resource utilization involves prospective data collection within a long-term naturalistic trial setting. In the absence of this, retrospective analysis of existing data sets, complemented with examination of administrative databases, can be an alternative solution. In fact, it is not uncommon for model parameters associated with health care resource use to be estimated by reviewing routine data (e.g., hospital records) [93,94]. Elicitation of expert opinion [95,96] may also play a role in informing the estimates of some resource use model parameters, although this source of evidence is considered the least preferred because it usually carries considerable levels of subjectivity.

An article by Bower et al. [95] is one of the few examples in which meta-analytic techniques were employed to synthesize cost data. Using individual participant-level data on costs from trials of counseling in primary care, the authors attempted to overcome sample size limitations in their economic analysis by pooling short- and long-term resource use data from four different studies using a fixed-effects meta-analysis (taxonomy sections A2 and B2). As the authors pointed out, this approach has a number of limitations (e.g., the significant variation between trials in the SDs of costs and the difficulty in identifying comparable data and consequently in standardizing cost means), reinforcing the fact that under no circumstances will the performed analysis approach the precision of primary data collection.

We agree with these conclusions and would argue that while it is possible to carry out quantitative evidence synthesis of health care resource use data, there are some real concerns that limit its validity, over and above the issues mentioned in this section. The first of these is technical: because resource use and costs are non-normally distributed, their statistical synthesis is particularly challenging from the analytical point of view, a problem that is exacerbated when the information is available only at the aggregate study level. Second, there is typically large methodological heterogeneity affecting costing studies, which is often impossible to characterize statistically. Study-level features that may be responsible for it include differences in data collection strategies, methods for measuring resource use and costs, follow-up duration, and methods of analysis and reporting. A third issue relates to time: technological innovation, relative price changes, and many other factors that may affect resource use and costs are difficult to capture in a synthesis of secondary data.
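Given this right skew, one commonly used device for representing a cost parameter probabilistically from published summaries is a method-of-moments gamma distribution; a minimal sketch with hypothetical figures:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical published summary for an annual cost parameter.
mean_cost, se_cost = 2400.0, 310.0

# Method-of-moments gamma: mean = shape * scale, variance = shape * scale^2.
shape = (mean_cost / se_cost) ** 2
scale = se_cost**2 / mean_cost

draws = rng.gamma(shape, scale, size=10_000)  # right-skewed, non-negative
print(draws.mean(), np.quantile(draws, [0.025, 0.975]))
```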

Health-state utility data

Quality-adjusted life-years are used extensively as a measure of health benefit in cost-effectiveness analysis for policy decisions [97]. Their advantage stems from the fact that they combine morbidity and mortality into a single numeraire. Morbidity, for the purpose of quality-adjusted life-year calculation, is measured in terms of its impact on a preference-based generic measure of health-related quality of life. Instruments that can be used to estimate preference-based generic health-related quality of life include the EuroQol five-dimensional (EQ-5D) questionnaire [98], the health utilities index [99], the quality of well-being scale [100], and the six-dimensional health state short form (SF-6D, derived from the short form 36 health survey) [101]. Several country-specific preference weights exist for the EQ-5D questionnaire and the SF-6D. The abundance of alternative instruments means that researchers and policymakers are often unclear as to which of these should be used and accepted in a given country or jurisdiction. Differences in the descriptive systems used by each of these instruments generate a comparability problem [102,103], which is compounded when the analyst intends to synthesize the available evidence to produce one parameter estimate of utility for use in the model. In addition, disagreement in the literature as to whose preferences should be used to value health states [104] makes the synthesis even more complex. Despite this ongoing debate, the EQ-5D questionnaire has become the most widely used preference-based generic measure of health-related quality of life in recent years. The increasing interest in the use of this instrument, at least in the UK context, coincided with the publication in 2008 of the third edition of the NICE methods guidance for technology assessment [4], which indicated the EQ-5D questionnaire as the instrument of choice for the reference case analyses submitted to the Institute for consideration.

Publicly available repositories of health state utility values for a variety of health conditions are potentially a very useful data source. Tengs and Wallace [105] were the first to publish a national repository of 1000 utility values gathered from 154 published reports. More recently, Sullivan and Ghushchyan [106] and Sullivan et al. [107] used US Medical Expenditure Panel Survey data to develop a prediction tool for preference-based EQ-5D index scores for chronic conditions in the United States and the United Kingdom based on responders' International Classification of Diseases, Ninth Revision codes. These repositories are particularly useful in the absence of (primary or secondary) preference-based generic health-related quality-of-life data. When such data are available, then depending on the data's format and the number of parameters to inform, we have one of the scenarios described in Table 1.

To date, little work has been undertaken on methods for the statistical synthesis of preference-based health-related quality-of-life data. This may be because its synthesis is not stated as a requirement by national bodies such as NICE [4]. Recently, the NICE Decision Support Unit released a technical support document giving guidance on the identification, review, and synthesis of health state utility values [108]. Among a series of recommendations, this document emphasizes the importance of selecting a main set of relevant utility values or, in the presence of multiple relevant values, pooling these to improve the precision of both mean and variance estimates. Meta-regression is one of the synthesis methods proposed to account for variability and to provide support for the choice of values used.

Perhaps one of the first articles to apply evidence synthesis methods to health-related quality-of-life data was published by Kinney et al. [109], who conducted a meta-analysis of 84 studies reporting summary quality-of-life data in a cardiac patient population (taxonomy section A2). Tengs and Lin [110,111] published meta-regressions in two different clinical areas (i.e., HIV/AIDS and cardiovascular) with the objective of estimating utility values associated with specific health states while controlling for specific study- and instrument-related features (taxonomy section A2).
Using the same methodology, Sturza [112] published a meta-regression of utility values in lung cancer, finding a great deal of heterogeneity in the data even after applying strict inclusion criteria, and concluded that analysts should avoid direct comparisons of lung cancer utility values elicited with dissimilar methods (taxonomy section A2). Donnan et al. [113] and Cheng and Niparko [114] found similar problems with respect to combining utility values from a variety of assessment methods. Cheng and Niparko [114] found it problematic to do so, noting that ". . . to some extent, this heterogeneity limits the meaningfulness of statistical pooling . . ." (page 1217). Other examples of quantitative evidence synthesis of utility estimates can be found in the literature [115–120] (all examples lie within categories A2 and B2 of the taxonomy).

These findings suggest that quantitative synthesis of aggregate preference-based values is limited by 1) between-study heterogeneity in the instruments used, 2) the value set used to quantify utilities, and 3) the models used to approximate scores for health states, over and above the typical issues related to standard meta-regression of summary binary outcome data. It has therefore been argued that, particularly in this context, the use of individual patient-level data would be essential [121]. Further work is required in this area, with regard to both methods of quantitative synthesis of heterogeneous preference-based outcomes when these are available at the aggregate study level and the need to control for between-study heterogeneity induced by the use of different preference-based instruments.
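As a sketch of the meta-regression idea suggested in the NICE Decision Support Unit document, the fragment below pools hypothetical study-level mean utilities with an instrument indicator as covariate, using inverse-variance weights (a fixed-effect formulation; a random-effects version would add a between-study variance term):

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical mean utilities for one health state from eight studies,
# with an indicator for the instrument used (0 = EQ-5D, 1 = SF-6D).
u = np.array([0.71, 0.68, 0.75, 0.70, 0.62, 0.60, 0.65, 0.63])
se = np.array([0.03, 0.04, 0.05, 0.03, 0.04, 0.03, 0.05, 0.04])
sf6d = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# Inverse-variance weighted (fixed-effect) meta-regression on the instrument.
X = sm.add_constant(sf6d)
fit = sm.WLS(u, X, weights=1 / se**2).fit()
print(fit.params)  # [pooled EQ-5D mean, SF-6D shift]
```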

Discussion The information required to carry out economic evaluation studies for policy decisions often comes from several different data sources, which often provide multiple estimates of the parameter(s) of interest. Statistical evidence synthesis techniques and decision analytic models represent an ideal vehicle to structure the decision problem, combine all available data, and characterize the various sources of uncertainty associated with the decision problem. By using the synthesis of clinical effectiveness data as a conceptual framework, we developed a taxonomy of possible scenarios that the analyst may face (and discuss appropriate methodologies to use) based on a combination of three factors: 1) the number of data sources, 2) their format(s), and 3) whether the analyst wishes to derive single or multiple parameters from the synthesis. Recommendations concerning appropriate methods to use under different scenarios were provided throughout. This article also reviewed the way in which evidence has been used to inform decision model parameters related to the disease natural history, costs, and utilities. Areas where further methodological research may be needed are also identified. The proposed taxonomy is designed to be used by health economics modelers as an instrument to support the development of their analysis plan, help them to fulfill methodological requirements, and adequately address the research question at hand. The three dimensions on which the taxonomy is based provide a simple method of characterizing and categorizing the evidence base available (i.e., in terms of its quantity and format) linking this to the (type and number of) decision model parameter(s) to be derived. Following this “checklist,” the analyst can easily identify within the relevant taxonomy cell (or cells) methods that are available and those that are recommended. This list of approaches and methods has been (wherever possible) supported by references to key methods literature and case studies where these have been put into practice. The references can then be consulted by analysts in search of further methodological and/or practical details of the subject. In this sense, the taxonomy helps to ensure consistency and completeness when carrying out the task of using evidence to inform decision models, standardizing approaches, and the adequate use of methods to analyze/synthesize evidence. In addition, it provides a useful reference on the more recent methodological developments in the context of evidence synthesis for health care cost-effectiveness analysis. The current taxonomy foundations are not, however, without limitations. First, the evidence synthesis methodologies and applied studies described throughout the article are not the result of a comprehensive systematic review (i.e., not exhaustive). We believe, however, that these are representative of the methodologies found in the methods and applied literature in this area of research. Second, despite our efforts to make this taxonomy easy to

generalize, the three dimensions (number and format of data sources and number of parameters to inform) may still not capture all possible scenarios. For instance, the taxonomy could be extended to include extra dimensions—for example, extrapolation of model estimates— or detailed to cover other aspects—for example, role of covariates within each taxonomy section. It was felt, however, that such an extension would unnecessarily increase its complexity without adding substantial benefits. Finally, the taxonomy is applied to clinical effectiveness but not to other key economic model parameters (i.e., disease natural history, resource use/costs, and preferences). Nonetheless, issues relating to the application of the taxonomy to these other parameters are discussed and its fuller application encouraged in future research. Methodological and applied literature is scarce regarding the quantitative synthesis of evidence to inform these—we believe that further research is required despite recent relevant contributions in this area [86,108]. Moreover, the specific characteristics of these parameters and of the evidence used to inform them may pose further challenges. Some of these evidence characteristics are highlighted and discussed next. In practice, studies included in a certain synthesis may vary in their degree of rigor and possibly in their relevance toward the research question. This is particularly important when “relevant” available evidence comes from observational data. Flaws in the design or conduct of a study can result in bias, and in some cases this can have as much influence on observed effects as that of the treatments. Important intervention effects, or lack thereof, can be obscured by bias. Assessment of study quality gives an indication of the strength of evidence provided by the pooled result and ultimately, quality assessment helps to answer the question of whether included studies are sufficiently robust to guide treatment, prevention, diagnostic, or policy decisions. Most of the bias adjustment proposals published so far are reweighting schemes, usually attributing lower weight to evidence with a high risk of bias. More information on this topic can be found in Spiegelhalter and Best [80], Turner et al. [122], and Welton et al. [123]. Common to all evidence identified, to potentially inform decision model parameters, is the case of partial reporting of information. If, for instance, mean differences without a measure of variance are reported, difficulties may arise when attempting to parameterize data for probabilistic modeling and strong assumptions may have to be imposed. A variety of methods for imputing variances have been proposed—see Abrams et al. [124], Wiebe et al. [125], and Furukawa et al. [126] for further details. Another example occurs when different studies report different (multiple) outcome measures, at different time points and possibly on different scales. All these issues raise important obstacles for the synthesis of evidence. A number of authors have recently published articles relating to the synthesis of cost-effectiveness model outputs [127–133]. One of the key questions here is whether or not it is appropriate to do so. 
A problem common to all types of evidence identified to inform decision model parameters is the partial reporting of information. If, for instance, mean differences are reported without a measure of variance, difficulties arise when attempting to parameterize the data for probabilistic modeling, and strong assumptions may have to be imposed. A variety of methods for imputing variances have been proposed; see Abrams et al. [124], Wiebe et al. [125], and Furukawa et al. [126] for further details. Another example occurs when different studies report different (multiple) outcome measures, at different time points and possibly on different scales. All of these issues raise important obstacles to the synthesis of evidence.
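As a minimal sketch of one such strategy (in the spirit of, but not reproducing, the cited proposals; the data and the equal-arms assumption are ours), a missing standard deviation can be borrowed from the studies that do report one:

```python
import math

# Sketch: imputing a missing standard deviation by borrowing from studies
# that report one (in the spirit of Furukawa et al. [126]). Assumes all
# studies measure the outcome on the same scale; the values are invented.

reported = [
    # (mean difference, standard deviation, per-arm sample size)
    (1.2, 4.0, 120),
    (0.8, 5.5, 80),
]
missing = [
    # (mean difference, per-arm sample size) - SD not reported
    (1.0, 100),
]

# Pool the reported SDs (simple average of variances, for illustration).
imputed_sd = math.sqrt(sum(sd**2 for _, sd, _ in reported) / len(reported))

# Each study's standard error can then be approximated, allowing, e.g.,
# a normal distribution to be assigned in a probabilistic model.
for md, n in missing:
    se = imputed_sd * math.sqrt(2.0 / n)  # crude SE assuming two equal arms of size n
    print(f"mean difference {md}: imputed SD {imputed_sd:.2f}, SE {se:.2f}")
```

Validated imputation approaches, and their implications for the uncertainty propagated through the model, are discussed in the references cited above.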

A number of authors have recently published articles relating to the synthesis of cost-effectiveness model outputs [127–133]. One of the key questions here is whether or not it is appropriate to do so. In our view, there is no apparent rationale for assuming that the costs (or health utilities) associated with a particular health care intervention, estimated in different studies carried out in different countries and health care settings, probably using different measurement and assessment instruments, should converge toward a common value to be estimated by using evidence synthesis techniques [131].

Conclusions

This article brings recent developments in quantitative evidence synthesis to the attention of the health economics modeling community, encouraging a broader and more explicit consideration of these methods in the future. Several of the techniques presented here fall within the spheres of epidemiology, statistics, and operational research and, in some cases, are not directly accessible to health economics modelers (because of lack of exposure to, and the complexity of, the methods). The taxonomy should therefore be viewed by readers/analysts as a supplement to the guidelines on methods for technology assessment published by NICE [4], and should increase users' confidence in the validity of decision model inputs and the subsequent outputs.

Acknowledgments

We are grateful to Mark Jit, Honorary Senior Lecturer in Mathematical Modeling at the University of Birmingham, and to participants at the 2009 HESG summer conference in Sheffield (UK) for their very helpful comments on an earlier draft of this work. We also thank Professor Mark Sculpher, Dr. Cynthia Iglesias, and Mr. Ronan Mahon for their useful comments and suggestions throughout the writing of this article.

Source of financial support: Pedro Saramago was funded through a Medical Research Council Capacity Building Grant (grant number G0800139) and by the Portuguese Fundação para a Ciência e a Tecnologia (grant number SFRH/BD/61448/2009). Andrea Manca's contribution was made under the terms of a Career Development research training fellowship issued by the NIHR (grant number CDF-2009-02-21). The views expressed in this publication are those of the author(s) and not necessarily those of the Medical Research Council, the Portuguese Fundação para a Ciência e a Tecnologia, the National Institute for Health Research, or the UK Department of Health.

REFERENCES

[1] Sculpher MJ, Claxton K, Drummond M, McCabe C. Whither trial-based economic evaluation for health care decision making? Health Econ 2006;15:677–87.
[2] Sculpher M, Claxton K. Establishing the cost-effectiveness of new pharmaceuticals under conditions of uncertainty—when is there sufficient evidence? Value Health 2005;8:433–46.
[3] Cooper NJ, Sutton AJ, Ades AE, et al. Use of evidence in economic decision models: practical issues and methodological challenges. Health Econ 2007;16:1277–86.
[4] National Institute for Health and Clinical Excellence. Guide to the Methods of Technology Appraisal. London: National Institute for Health and Clinical Excellence, Institute's Decision Support Unit, 2008.
[5] Ades AE, Sculpher M, Sutton A, et al. Bayesian methods for evidence synthesis in cost-effectiveness analysis. Pharmacoeconomics 2006;24:1–19.
[6] Mihaylova B, Briggs A, Armitage J, et al. Cost-effectiveness of simvastatin in people at different levels of vascular disease risk: economic analysis of a randomised trial in 20,536 individuals. Lancet 2005;365:1779–85.
[7] Mihaylova B, Briggs A, Armitage J, et al. Lifetime cost effectiveness of simvastatin in a range of risk groups and age groups derived from a randomised trial of 20,536 people. BMJ 2006;333:1145.
[8] Briggs A, Mihaylova B, Sculpher M, et al. Cost effectiveness of perindopril in reducing cardiovascular events in patients with stable coronary artery disease using data from the EUROPA study. Heart 2007;93:1081–6.
[9] Epstein DM, Sculpher MJ, Manca A, et al. Modelling the long-term cost-effectiveness of endovascular or open repair for abdominal aortic aneurysm. Br J Surg 2008;95:183–90.
[10] Henriksson M, Epstein DM, Palmer SJ, et al. The cost-effectiveness of an early interventional strategy in non-ST-elevation acute coronary syndrome based on the RITA 3 trial. Heart 2008;94:717–23.
[11] Higgins JP, Whitehead A, Turner RM, et al. Meta-analysis of continuous outcome data from individual patients. Stat Med 2001;20:2219–41.
[12] Riley RD, Simmonds MC, Look MP. Evidence synthesis combining individual patient data and aggregate data: a systematic review identified current practice and possible methods. J Clin Epidemiol 2007;60:431–9.
[13] Riley RD, Dodd SR, Craig JV, et al. Meta-analysis of diagnostic test studies using individual patient data and aggregate data. Stat Med 2008;27:6111–36.
[14] Riley RD, Lambert PC, Staessen JA, et al. Meta-analysis of continuous outcomes combining individual patient data and aggregate data. Stat Med 2008;27:1870–93.
[15] Riley RD, Steyerberg EW. Meta-analysis of a binary outcome using individual participant data and aggregate data. Res Synthesis Meth 2010;1:2–19.
[16] Sutton AJ, Kendrick D, Coupland CA. Meta-analysis of individual- and aggregate-level data. Stat Med 2008;27:651–69.
[17] Weinstein MC, O'Brien B, Hornberger J, et al. Principles of good practice for decision analytic modeling in health-care evaluation: report of the ISPOR Task Force on Good Research Practices—Modeling Studies. Value Health 2003;6:9–17.
[18] Higgins JP, Green S. Cochrane Handbook for Systematic Reviews of Interventions 2009, v5.0.2. Available from: www.cochrane-handbook.org. [Accessed February 11, 2011].
[19] McGhan WF, Al M, Doshi JA, et al. The ISPOR Good Practices for Quality Improvement of Cost-Effectiveness Research Task Force report. Value Health 2009;12:1086–99. doi:10.1111/j.1524-4733.2009.00605.x.
[20] Lambert PC, Sutton AJ, Abrams KR, Jones DR. A comparison of summary patient-level covariates in meta-regression with individual patient data meta-analysis. J Clin Epidemiol 2002;55:86–94.
[21] Fisher RA. On the mathematical foundations of theoretical statistics. Philos Trans Royal Soc Lond 1922;Series A:309–68.
[22] Ades AE, Lu G. Correlations between parameters in risk models: estimation and propagation of uncertainty by Markov Chain Monte Carlo. Risk Anal 2003;23:1165–72.
[23] Ades AE, Claxton K, Sculpher M. Evidence synthesis, parameter correlation and probabilistic sensitivity analysis. Health Econ 2006;15:373–81.
[24] Epstein D, Sutton A. Modelling correlated clinical outcomes in health technology appraisal. Value Health 2011;14:793–9.
[25] Cooper NJ, Sutton AJ, Abrams KR, et al. Comprehensive decision analytical modelling in economic evaluation: a Bayesian approach. Health Econ 2004;13:203–26.
[26] Stewart LA, Clarke MJ. Practical methodology of meta-analyses (overviews) using updated individual patient data. Cochrane Working Group. Stat Med 1995;14:2057–79.
[27] Bland M. An Introduction to Medical Statistics (3rd ed.). Oxford: Oxford University Press, 2000.
[28] Glick H. Economic Evaluation in Clinical Trials. Oxford: Oxford University Press, 2007.
[29] Collett D. Modelling Survival Data in Medical Research (2nd ed.). Boca Raton, FL: Chapman & Hall/CRC, 2003.
[30] Whitehead A. Meta-Analysis of Controlled Clinical Trials. Chichester: John Wiley & Sons, 2002.
[31] Sutton AJ, Higgins JP. Recent developments in meta-analysis. Stat Med 2008;27:625–50.
[32] Sutton AJ, Song F, Gilbody SM, Abrams KR. Modelling publication bias in meta-analysis: a review. Stat Meth Med Res 2000;9:421–45.
[33] Thompson SG, Higgins JPT. How should meta-regression analyses be undertaken and interpreted? Stat Med 2002;21:1559–73.
[34] Piantadosi S, Byar DP, Green SB. The ecological fallacy. Am J Epidemiol 1988;127:893–904.
[35] Berlin JA, Santanna J, Schmid CH, et al. Individual patient- versus group-level data meta-regressions for the investigation of treatment effect modifiers: ecological bias rears its ugly head. Stat Med 2002;21:371–87.
[36] Wakefield J. Ecologic studies revisited. Ann Rev Pub Health 2008;29:75–90.
[37] Ades AE, Lu G, Higgins JP. The interpretation of random-effects meta-analysis in decision models. Med Decis Making 2005;25:646–54.
[38] Gold MR. Cost-Effectiveness in Health and Medicine. New York: Oxford University Press, 1996.
[39] Turner D, Wailoo A, Nicholson K, et al. Systematic review and economic decision modelling for the prevention and treatment of influenza A and B. Health Technol Assess 2003;7:iii–iv, xi–xiii, 1–170.
[40] McKenna C, Burch J, Suekarran S, et al. A systematic review and economic evaluation of the clinical effectiveness and cost-effectiveness of aldosterone antagonists for postmyocardial infarction heart failure. Health Technol Assess 2010;14:1–162.
[41] Baker SG. A simple meta-analytic approach for using a binary surrogate endpoint to predict the effect of intervention on true endpoint. Biostatistics 2006;7:58–70.
[42] Hasselblad V, McCrory DC. Meta-analytic tools for medical decision making: a practical guide. Med Decis Making 1995;15:81–96.
[43] Reitsma JB, Glas AS, Rutjes AW, et al. Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 2005;58:982–90.
[44] Lumley T. Network meta-analysis for indirect treatment comparisons. Stat Med 2002;21:2313–24.
[45] Lu G, Ades AE. Combination of direct and indirect evidence in mixed treatment comparisons. Stat Med 2004;23:3105–24.
[46] Ades AE. A chain of evidence with mixed comparisons: models for multi-parameter synthesis and consistency of evidence. Stat Med 2003;22:2995–3016.
[47] Higgins JP, Whitehead A. Borrowing strength from external trials in a meta-analysis. Stat Med 1996;15:2733–49.
[48] Sutton A, Ades AE, Cooper N, Abrams K. Use of indirect and mixed treatment comparisons for technology assessment. Pharmacoeconomics 2008;26:753–67.
[49] Dias S, Welton NJ, Ades AE. Study designs to detect sponsorship and other biases in systematic reviews. J Clin Epidemiol 2010;63:587–8.
[50] Dias S, Welton NJ, Caldwell DM, Ades AE. Checking consistency in mixed treatment comparison meta-analysis. Stat Med 2010;29:932–44.
[51] Lu GB, Ades AE. Assessing evidence inconsistency in mixed treatment comparisons. J Am Stat Assoc 2006;101:447–59.
[52] Salanti G, Higgins JPT, Ades AE, Ioannidis JPA. Evaluation of networks of randomized trials. Stat Meth Med Res 2008;17:279–301.
[53] Cooper NJ, Sutton AJ, Morris D, et al. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: application to stroke prevention treatments in individuals with nonrheumatic atrial fibrillation. Stat Med 2009;28:1861–81.
[54] Salanti G, Marinho V, Higgins JP. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. J Clin Epidemiol 2009;62:857–64.
[55] Salanti G, Dias S, Welton NJ, et al. Evaluating novel agent effects in multiple-treatments meta-regression. Stat Med 2010;29:2369–83.
[56] Lu G, Ades AE, Sutton AJ, et al. Meta-analysis of mixed treatment comparisons at multiple follow-up times. Stat Med 2007;26:3681–99.
[57] Welton NJ, Cooper NJ, Ades AE, et al. Mixed treatment comparison with multiple outcomes reported inconsistently across trials: evaluation of antivirals for treatment of influenza A and B. Stat Med 2008;27:5620–39.
[58] Woolacott N, Hawkins N, Mason A, et al. Etanercept and efalizumab for the treatment of psoriasis: a systematic review. Health Technol Assess 2006;10:1–233.
[59] Sutton AJ, Abrams KR, Jones DR, et al. Methods for Meta-Analysis in Medical Research. London: Wiley, 2000.
[60] Stewart LA, Tierney JF. To IPD or not to IPD? Advantages and disadvantages of systematic reviews using individual patient data. Eval Health Prof 2002;25:76–97.
[61] Simmonds MC, Higgins JP, Stewart LA, et al. Meta-analysis of individual patient data from randomized trials: a review of methods used in practice. Clin Trials 2005;2:209–17.
[62] Stewart LA, Parmar MK. Meta-analysis of the literature or of individual patient data: is there a difference? Lancet 1993;341:418–22.
[63] Jeng GT, Scott JR, Burmeister LF. A comparison of meta-analytic results using literature vs individual patient data: paternal cell immunization for recurrent miscarriage. JAMA 1995;274:830–6.
[64] Smith CT, Williamson PR, Marson AG. An overview of methods and empirical comparison of aggregate data and individual patient data results for investigating heterogeneity in meta-analysis of time-to-event outcomes. J Eval Clin Pract 2005;11:468–78.
[65] Turner RM, Omar RZ, Yang M, et al. A multilevel model framework for meta-analysis of clinical trials with binary outcomes. Stat Med 2000;19:3417–32.
[66] Whitehead A, Omar RZ, Higgins JP, et al. Meta-analysis of ordinal outcomes using individual patient data. Stat Med 2001;20:2243–60.
[67] Goldstein H, Yang M, Omar R, et al. Meta-analysis using multilevel models with an application to the study of class size effects. J Roy Stat Soc C-App 2000;49:399–412.
[68] Farlow MR, Small GW, Quarg P, Krause A. Efficacy of rivastigmine in Alzheimer's disease patients with rapid disease progression: results of a meta-analysis. Dement Geriatr Cogn Disord 2005;20:192–7.
[69] Jones AP, Riley RD, Williamson PR, Whitehead A. Meta-analysis of individual patient data versus aggregate data from longitudinal clinical trials. Clin Trials 2009;6:16–27.
[70] Hawkins N, Scott DA. Cost-effectiveness analysis: discount the placebo at your peril. Med Decis Making 2010;30:536–43.
[71] Nixon RM, Bansback N, Brennan A. Using mixed treatment comparisons and meta-regression to perform indirect comparisons to estimate the efficacy of biologic treatments in rheumatoid arthritis. Stat Med 2007;26:1237–54.
[72] Brennan A, Bansback N, Nixon R, et al. Modelling the cost effectiveness of TNF-alpha antagonists in the management of rheumatoid arthritis: results from the British Society for Rheumatology Biologics Registry. Rheumatology (Oxford) 2007;46:1345–54.
[73] Wailoo AJ, Bansback N, Brennan A, et al. Biologic drugs for rheumatoid arthritis in the Medicare program: a cost-effectiveness analysis. Arthritis Rheum 2008;58:939–46.
[74] Jackson C, Best N, Richardson S. Improving ecological inference using individual-level data. Stat Med 2005;25:2136–59.
[75] Jackson C, Best N, Richardson S. Hierarchical related regression for combining aggregate and individual data in studies of socioeconomic disease risk factors. J Roy Stat Soc A 2008;171:159–78.
[76] Briggs AH, Claxton K, Sculpher MJ. Decision Modelling for Health Economic Evaluation. Oxford: Oxford University Press, 2006.
[77] Drummond MF, Sculpher MJ, Torrance GW, et al. Methods for the Economic Evaluation of Health Care Programmes (3rd ed.). Oxford: Oxford University Press, 2005.
[78] Urdahl H, Manca A, Sculpher MJ. Assessing generalisability in model-based economic evaluation studies: a structured review in osteoporosis. Pharmacoeconomics 2006;24:1181–97.
[79] Prevost TC, Abrams KR, Jones DR. Hierarchical models in generalized synthesis of evidence: an example based on studies of breast cancer screening. Stat Med 2000;19:3359–76.
[80] Spiegelhalter DJ, Best NG. Bayesian approaches to multiple sources of evidence and uncertainty in complex cost-effectiveness modelling. Stat Med 2003;22:3687–709.
[81] Ades AE, Sutton AJ. Multiparameter evidence synthesis in epidemiology and medical decision-making: current approaches. J Roy Stat Soc A 2006;169:5–35.
[82] Manca A, Willan AR. 'Lost in translation': accounting for between-country differences in the analysis of multinational cost-effectiveness data. Pharmacoeconomics 2006;24:1101–19.
[83] Schulman K, Burke J, Drummond M, et al. Resource costing for multinational neurologic clinical trials: methods and results. Health Econ 1998;7:629–38.
[84] Augustovski F, Iglesias C, Manca A, et al. Barriers to generalizability of health economic evaluations in Latin America and the Caribbean region. Pharmacoeconomics 2009;27:919–29.
[85] Cooper N, Coyle D, Abrams K, et al. Use of evidence in decision models: an appraisal of health technology assessments in the UK since 1997. J Health Serv Res Policy 2005;10:245–50.
[86] Dias S, Welton NJ, Sutton AJ, Ades AE. NICE DSU Technical Support Document 5: evidence synthesis in the baseline natural history model. August 2011. Available from: http://www.nicedsu.org.uk/TSD%205%20Baseline_final%20report.pdf. [Accessed December 19, 2011].
[87] Isaman DJ, Herman WH, Brown MB. A discrete-state discrete-time model using indirect observation. Stat Med 2006;25:1035–49.
[88] Isaman DJM, Barhak J, Ye W. Indirect estimation of a discrete-state discrete-time model using secondary data analysis of regression data. Stat Med 2009;28:2095–115.
[89] Ye W, Isaman DJM, Barhak J. Use of secondary data to estimate instantaneous model parameters of diabetic heart disease: lemonade method. Inform Fus 2012;13:137–45.
[90] Welton NJ, Ades AE. Estimation of Markov chain transition probabilities and rates from fully and partially observed data: uncertainty propagation, evidence synthesis, and model calibration. Med Decis Making 2005;25:633–45.
[91] Chao TK, Chen TH. Predictive model for progression of hearing loss: meta-analysis of multi-state outcome. J Eval Clin Pract 2009;15:32–40.
[92] Marshall G, Jones RH. Multi-state models and diabetic retinopathy. Stat Med 1995;14:1975–83.
[93] Cookson R, Flood C, Koo B, et al. Short-term cost effectiveness and long-term cost analysis comparing laparoscopic Nissen fundoplication with proton-pump inhibitor maintenance for gastro-oesophageal reflux disease. Br J Surg 2005;92:700–6.
[94] Tumeh JW, Moore SG, Shapiro R, Flowers CR. Practical approach for using Medicare data to estimate costs for cost-effectiveness analysis. Expert Rev Pharmacoecon Outcomes Res 2005;5:153–62.
[95] Connock M, Frew E, Evans BW, et al. The clinical effectiveness and cost-effectiveness of newer drugs for children with epilepsy: a systematic review. Health Technol Assess 2006;10:iii, ix–118.
[96] Wu O, Robertson L, Twaddle S, et al. Screening for thrombophilia in high-risk situations: systematic review and cost-effectiveness analysis. The Thrombosis: Risk and Economic Assessment of Thrombophilia Screening (TREATS) study. Health Technol Assess 2006;10:1–110.
[97] Bower P, Byford S, Barber J, et al. Meta-analysis of data on costs from trials of counselling in primary care: using individual patient data to overcome sample size limitations in economic analyses. BMJ 2003;326:1247–50.
[98] Brooks R. EuroQol: the current state of play. Health Policy 1996;37:53–72.
[99] Feeny D, Wu L, Eng K. Comparing short form 6D, standard gamble, and Health Utilities Index Mark 2 and Mark 3 utility scores: results from total hip arthroplasty patients. Qual Life Res 2004;13:1659–70.
[100] Kaplan RM, Ganiats TG, Sieber WJ, Anderson JP. The Quality of Well-Being Scale: critical similarities and differences with SF-36. Int J Qual Health Care 1998;10:509–20.
[101] Brazier J, Roberts J, Deverill M. The estimation of a preference-based measure of health from the SF-36. J Health Econ 2002;21:271–92.
[102] Conner-Spady B, Suarez-Almazor ME. Variation in the estimation of quality-adjusted life-years by different preference-based instruments. Med Care 2003;41:791–801.
[103] Wee HL, Machin D, Loke WC, et al. Assessing differences in utility scores: a comparison of four widely used preference-based instruments. Value Health 2007;10:256–65.
[104] Gandjour A. Theoretical foundation of patient v. population preferences in calculating QALYs. Med Decis Making 2010;30:E57–63.
[105] Tengs TO, Wallace A. One thousand health-related quality-of-life estimates. Med Care 2000;38:583–637.
[106] Sullivan PW, Ghushchyan V. Preference-based EQ-5D index scores for chronic conditions in the United States. Med Decis Making 2006;26:410–20.
[107] Sullivan PW, Sculpher MJ, Ghushchyan VH, Slejko JF. Catalogue of EQ-5D scores for the UK. Value Health 2009;12:A398. doi:10.1016/S1098-3015(10)74963-9.
[108] Papaioannou D, Brazier JE, Paisley S. NICE DSU Technical Support Document 9: the identification, review and synthesis of health state utility values from the literature. October 2011. Available from: http://www.nicedsu.org.uk/TSD9%20HSUV%20values_FINAL.pdf. [Accessed December 19, 2011].
[109] Kinney MR, Burfitt SN, Stullenbarger E, et al. Quality of life in cardiac patient research: a meta-analysis. Nurs Res 1996;45:173–80.
[110] Tengs TO, Lin TH. A meta-analysis of utility estimates for HIV/AIDS. Med Decis Making 2002;22:475–81.
[111] Tengs TO, Lin TH. A meta-analysis of quality-of-life estimates for stroke. Pharmacoeconomics 2003;21:191–200.
[112] Sturza J. A review and meta-analysis of utility values for lung cancer. Med Decis Making 2010;30:685–93.
[113] Donnan PT, McLernon D, Dillon JF, et al. Development of a decision support tool for primary care management of patients with abnormal liver function tests without clinically apparent liver disease: a record-linkage population cohort study and decision analysis (ALFIE). Health Technol Assess 2009;13:ii–iv, ix–xi, 1–134.
[114] Cheng AK, Niparko JK. Cost-utility of the cochlear implant in adults: a meta-analysis. Arch Otolaryngol Head Neck Surg 1999;125:1214–8.
[115] Dijkers M. Quality of life after spinal cord injury: a meta-analysis of the effects of disablement components. Spinal Cord 1997;35:829–40.
[116] Post PN, Stiggelbout AM, Wakker PP. The utility of health states after stroke: a systematic review of the literature. Stroke 2001;32:1425–9.
[117] Bremner KE, Chong CA, Tomlinson G, et al. A review and meta-analysis of prostate cancer utilities. Med Decis Making 2007;27:288–98.
[118] McLernon DJ, Dillon J, Donnan PT. Health-state utilities in liver disease: a systematic review. Med Decis Making 2008;28:582–92.
[119] Peasgood T, Herrmann K, Kanis JA, Brazier JE. An updated systematic review of health state utility values for osteoporosis related conditions. Osteoporos Int 2009;20:853–68.
[120] Peasgood T, Ward S, Brazier J. Health-state utility values in breast cancer. Expert Rev Pharmacoecon Outcomes Res 2010;10:553–66.
[121] Ara R, Brazier JE. Populating an economic model with health state utility values: moving toward better practice. Value Health 2010;13:509–18.
[122] Turner RM, Spiegelhalter DJ, Smith GC, Thompson SG. Bias modelling in evidence synthesis. J Roy Stat Soc A 2009;172:21–47.
[123] Welton NJ, Ades AE, Carlin JB, et al. Models for potentially biased evidence in meta-analysis using empirically based priors. J Roy Stat Soc A 2009;172:119–36.
[124] Abrams K, Gillies C, Lambert P. Meta-analysis of heterogeneously reported trials assessing change from baseline. Stat Med 2005;24:3823–44.
[125] Wiebe N, Vandermeer B, Platt R, et al. A systematic review identifies a lack of standardization in methods for handling missing variance data. J Clin Epidemiol 2006;59:342–53.
[126] Furukawa T, Barbui C, Cipriani A, et al. Imputing missing standard deviations in meta-analyses can provide accurate results. J Clin Epidemiol 2006;59:7–10.
[127] Nixon J, Khan K, Kleijnen J. Summarising economic evaluations in systematic reviews: a new approach. BMJ 2001;322:1596–8.
[128] Birch S, Gafni A. Economics and the evaluation of health care programmes: generalisability of methods and implications for generalisability of results. Health Policy 2003;64:207–19.
[129] Sculpher MJ, Pang FS, Manca A, et al. Generalisability in economic evaluation studies in healthcare: a review and case studies. Health Technol Assess 2004;8:iii–iv, 1–192.
[130] Welte R, Feenstra T, Jager H, Leidl R. A decision chart for assessing and improving the transferability of economic evaluation results between countries. Pharmacoeconomics 2004;22:857–76.
[131] Pignone M, Saha S, Hoerger T, et al. Challenges in systematic reviews of economic analyses. Ann Intern Med 2005;142:1073–9.
[132] Sculpher MJ, Drummond MF. Analysis sans frontieres: can we ever make economic evaluations generalisable across jurisdictions? Pharmacoeconomics 2006;24:1087–99.
[133] Anderson R. Systematic reviews of economic evaluations: utility or futility? Health Econ 2010;19:350–64.