Proceedings of the Annual Meeting of the American Statistical Association, August 5-9, 2001
Active Control Trials - Hypotheses and Issues*
By Gang Chen, George Chi, Mark Rothmann, Ning Li
Division of Biometrics I, CDER, FDA, 5600 Fishers Ln, Rockville, MD 20857
Key Words: Active Control, Non-inferiority, Margin, Control effect

1. Introduction
For trials with mortality or serious morbidity outcomes, the use of a placebo is considered unethical when an effective treatment is available. In such a trial, because there is no concurrent placebo control, one cannot assert the existence of the active control effect. The efficacy of the new (experimental) treatment is usually established by demonstrating that it is superior to the active control [Temple (1983) and Temple and Ellenberg (2000)]. For example, in oncology, for cancers that have a standard therapy, a new treatment must be compared with and shown superior to that standard therapy in two randomized controlled clinical trials. Unless the new treatment represents a new advance in the treatment of the disease, it is generally difficult to show that it is better than an effective control in such a trial. Thus, it is not surprising that when the new treatment fails to show superiority to the control, the sponsor or investigator concludes that the new treatment is no different from the control. But it is well known that failing to reject the null hypothesis of equality between the new treatment and the control does not imply that the null hypothesis is true [Fleming (1990), (2000) and Temple (1996)]. As White (1998) aptly puts it, “the lack of evidence of difference” is not the same as “evidence of a lack of difference.”

It is arguable whether, in an active control mortality or serious morbidity trial, one should always insist on a demonstration of superiority of the new treatment over the active control. If the new treatment offers a better toxicity profile or ease of administration, it may be beneficial even when some “clinically acceptable” amount of efficacy is lost compared to the active control or standard treatment.
In such cases, the objective of an active control trial should be to demonstrate that the new treatment is effective and at the same time has a better toxicity and/or safety profile. The effectiveness of the new treatment may be demonstrated by showing that the trial can rule out that the new treatment is worse than the active control by some “clinically important” amount. The issue then comes down to the proper choice of “how much worse is clinically important.”

It has been suggested by various authors (e.g., Blackwelder (1982) and Huque, Dubey and Fredd (1989)) that if the intention is to show that the new intervention is equivalent, or non-inferior, to the active control, then one needs to define the null hypothesis and the corresponding alternative hypothesis appropriately. More specifically, the alternative hypothesis should reflect the hypothesis of interest, namely that of equivalence, or “non-inferiority”. To do that, it has been suggested that a certain equivalence or “non-inferiority” margin should be pre-specified in the hypothesis [cf. Blackwelder, Nevius, and Huque].

When the active control is highly effective, it may be clinically unacceptable to allow a new treatment to lose even a small fraction of the active control effect. In this case, if the variability of the control effect is small, one may specify a fixed margin such that the new treatment should fall within this fixed margin of the control. This margin can be set very tight so that the new treatment cannot be worse than the control by more than this amount (provided that the active control is as effective in the current trial as it purports to be).

Various authors [Hauck and Anderson (1999), Holmgren (1999), Simon (1999), Koch and Tangen (1999), and Hasselblad and Kong (2001)] have proposed not to pre-specify a threshold directly, but rather to define the “margin” in terms of retention of a certain clinically acceptable percent of the active control effect. Effectiveness of the new treatment is demonstrated if it can be shown that the new treatment retains a “clinically acceptable” percent of the effect of the active control.

We will discuss in this paper several active control non-inferiority hypotheses, the related issues in testing the hypotheses, the key underlying assumptions, and the proper interpretation of such active control trials.

* There is active discussion underway at FDA concerning the circumstances, methodologies, and regulatory questions that are to be addressed in this area. Perspectives are evolving and no formal policy should be assumed at this time.
2. Hypotheses and issues in active control non-inferiority trials

A new drug effect can be demonstrated in an active control trial through the following three tests:
1) to establish efficacy through a superiority test: Test Drug > Active Control,
2) to establish equivalence within a margin δ-m: Control + δ-m > Test Drug > Control - δ-m,
3) to establish efficacy through testing a proportion retention of the control effect: Test Drug > % × control effect.

In mortality trials, due to ethical considerations, an active control non-inferiority trial is often conducted. In this section several active control non-inferiority hypotheses are formulated and the concerns with each hypothesis are discussed.

2.1. Active control non-inferiority hypotheses

In recent designs of active control non-inferiority trials, three different hypotheses have been used: the hypothesis with an arbitrary fixed margin, the hypothesis with a margin based on an estimate of the control effect, and the hypothesis with a percent retention of the control effect. To formulate these hypotheses, we will use T, C and P to refer to, respectively, the treatment, the active control and a reference “placebo”, and δ0 to refer to a proportion (retention). We will use HR for hazard ratio. The arithmetic definition of the proportion of control effect retained is given by:
$$\delta = \frac{HR(P/C) - HR(T/C)}{HR(P/C) - 1},$$

where HR(P/C) > 1. For example, for HR(P/C) = 1.3 and HR(T/C) = 1.2, we have δ = 1/3. A non-inferiority margin is defined as the amount of the effect that the new treatment may give up. The three hypotheses can be formulated as follows.

H0: HR(T/C) > 1+M vs. Ha: HR(T/C) < 1+M, where M > 0 is an arbitrary constant, (1)

H0: HR(T/C) > 1+M* vs. Ha: HR(T/C) < 1+M*, where M* > 0 is a fixed constant determined based on the estimate of the control effect, (2)

H0: HR(T/C) > δ0 + (1-δ0) HR(P/C) vs. Ha: HR(T/C) < δ0 + (1-δ0) HR(P/C). (3)
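The arithmetic definition of δ and the threshold in hypothesis (3) can be checked numerically. The following is a minimal sketch; the function names `proportion_retained` and `retention_threshold` are illustrative and do not come from the paper.

```python
def proportion_retained(hr_pc: float, hr_tc: float) -> float:
    """Proportion of the control effect retained, on the arithmetic HR scale.

    hr_pc is HR(P/C), which must exceed 1 (the control beats placebo);
    hr_tc is HR(T/C).
    """
    if hr_pc <= 1:
        raise ValueError("HR(P/C) must be greater than 1")
    return (hr_pc - hr_tc) / (hr_pc - 1)


def retention_threshold(delta0: float, hr_pc: float) -> float:
    """Right-hand side of hypothesis (3): delta0 + (1 - delta0) * HR(P/C)."""
    return delta0 + (1 - delta0) * hr_pc


# The paper's example: HR(P/C) = 1.3 and HR(T/C) = 1.2 give about 1/3.
print(proportion_retained(1.3, 1.2))

# A treatment sitting exactly on the threshold retains exactly delta0.
boundary = retention_threshold(0.5, 1.3)
print(boundary, proportion_retained(1.3, boundary))
```

For a given HR(P/C), the threshold is the value of HR(T/C) at which the retained proportion equals δ0, so the two functions are inverses of each other in that sense.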
The margin M is chosen arbitrarily in hypothesis (1). For example, M can be chosen as 0.25 in a mortality trial (a threshold of 1.25 for the hazard ratio), with the interpretation that the treatment is allowed to have up to a 25% higher risk of death than the control. If the 95% confidence interval for the treatment effect relative to the active control lies entirely below this threshold, then non-inferiority is concluded. The major concern with this fixed, arbitrary margin is that a treatment can be inferior to the control if the control reduces the risk by less than the fixed margin.

The selection of a non-inferiority margin has been addressed in the ICH E10 Guideline. According to ICH E10, “the margin generally is identified based on past experience in placebo-controlled trials of adequate design under conditions similar to those planned for the new trial.” The hypotheses in (2) are more appropriate than the hypotheses in (1) in the sense that the margin is defined based on the control effect. If the true control effect is known, hypothesis (2) provides a correct formulation for testing whether the new treatment is non-inferior to the control. The control effect, however, is usually an unknown parameter, so it is difficult to predetermine an appropriate margin for the hypothesis. To guard against the potential inferiority of a new treatment to the active control, ICH E10 takes a “conservative” approach to defining a margin: “the margin chosen for a noninferiority trial cannot be greater than the smallest effect size that the active drug would be reliably expected to have compared with placebo in the setting of the planned trial.” For example, the FDA Center for Biologics Evaluation and Research (CBER), in the thrombolytic trial, used the lower limit of the 90% CI in determining the margin in hypothesis (2) to test whether the thrombolytic is effective (CBER Memo, 1998).
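The confidence-interval rule for a fixed margin can be sketched as follows. This is a minimal illustration, assuming the trial reports an estimated log hazard ratio for T vs. C with a standard error and that a normal approximation on the log scale is adequate; the function name and all numbers are hypothetical and are not taken from any trial.

```python
from math import exp, log
from statistics import NormalDist


def fixed_margin_noninferior(log_hr_tc, se, margin, conf_level=0.95):
    """Compare the upper confidence limit of HR(T/C) with 1 + margin.

    Returns (upper_limit, passed). Non-inferiority is concluded only when
    the whole two-sided interval lies below the threshold 1 + margin.
    """
    z = NormalDist().inv_cdf(0.5 + conf_level / 2)  # about 1.96 for 95%
    upper = exp(log_hr_tc + z * se)
    return upper, upper < 1 + margin


# Hypothetical estimate HR(T/C) = 1.05 against a threshold of 1.25.
print(fixed_margin_noninferior(log(1.05), se=0.08, margin=0.25))
# Same point estimate from a less precise trial: the interval crosses 1.25.
print(fixed_margin_noninferior(log(1.05), se=0.10, margin=0.25))
```

Note that the rule says nothing about how the treatment compares with placebo; that depends on whether the control effect actually exceeds the margin, which is the concern raised above.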
There are two major concerns with this approach: 1) there is no appropriate justification (e.g., type I error) for the “conservativeness” of the non-inferiority test; 2) the interpretation of the test results is associated only with a fixed margin. To illustrate the first concern, we use the thrombolytic trial as an example. In that trial, one may argue that any value less than the point estimate can be used as the non-inferiority margin. Why is the lower limit of the 90% CI “conservative” enough? Why are the lower limit of the 99% CI or the lower limit of the 70% CI not chosen as the margin, on the grounds that they are too conservative or too liberal?

Even if one believes that the margin in hypothesis (2) is chosen appropriately, rejecting the null hypothesis in (2), as in hypothesis (1), leads only to the conclusion that the treatment can have at most a fixed higher risk relative to the control. For example, if the margin calculated based on the control effect is 10%, rejecting the null hypothesis in (2) can be interpreted to mean that the treatment can have at most a 10% higher risk of death than the control group. Whether a 10% higher risk relative to the control is acceptable requires a further indirect comparison with the control effect relative to standard care or placebo. In this indirect comparison, there is no given criterion, e.g., type I error, to support the claim that the treatment is effective relative to placebo.

In contrast to hypotheses (1) and (2), hypothesis (3) tests the percent of the control effect one wishes to retain. Non-inferiority of the treatment is demonstrated if it is shown that the treatment retains at least the desired proportion δ0 of the control effect. The interpretation of the test result is more reasonable. For example, using δ0 = 0.5, rejecting the null hypothesis in (3) can be interpreted to mean that the treatment preserves at least 50% of the control effect. Hypothesis (3) leads to a claim of the effectiveness of the treatment, or to a claim of non-inferiority of the treatment relative to the control through a predetermined proportion δ0. The advantage of testing a percent retention is that one does not need to predetermine the margin, and the type I error rate is associated with the effectiveness of the treatment and can be controlled at a required level under certain assumptions. In many oncology drug trials and in the thrombolytic trial, a δ0 of 0.5 is used.
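The link between hypothesis (3) and a retention claim follows directly from the arithmetic definition of δ. As a short derivation, using only the assumption HR(P/C) > 1 stated with that definition:

```latex
\begin{align*}
\delta \ge \delta_0
  &\iff \frac{HR(P/C) - HR(T/C)}{HR(P/C) - 1} \ge \delta_0 \\
  &\iff HR(P/C) - HR(T/C) \ge \delta_0\,[HR(P/C) - 1] \\
  &\iff HR(T/C) \le \delta_0 + (1 - \delta_0)\,HR(P/C).
\end{align*}
```

With a strict inequality, the last line is the alternative hypothesis in (3), so rejecting the null hypothesis there supports the statement that more than a proportion δ0 of the control effect is retained.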
Using δ0 = 0.5 is arbitrary. The selection of δ0 should be based on the distribution of the control effect and on other clinical conditions, e.g., whether the aim is to demonstrate efficacy of the new treatment or non-inferiority to the control, and whether the new treatment has other benefits over the active control such as a better toxicity profile or ease of administration. It needs to be done on a case-by-case basis. When the new treatment does not offer any additional benefit over the active control, δ0 needs to be selected higher than 0.5, e.g., 0.85. When δ0 = 1, i.e., a 100% retention of the active control effect, we have a superiority trial. When δ0 = 0, i.e., a 0% retention of the active control survival effect, and there is no concurrent placebo, we have an indirect superiority comparison between the treatment and the placebo.

The major concern with hypothesis (3) is that, in a two-arm trial, there is no concurrent placebo that can be used for assessing placebo P, and the control effect needs to be assessed using historical data. Under the assumption HR(P2/C2) = HR(P1/C1), the hypotheses in (3) actually are:

H0: HR(T/C2) > δ0 + (1-δ0) HR(P1/C1) vs. Ha: HR(T/C2) < δ0 + (1-δ0) HR(P1/C1), (3’)
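The paper does not give a test statistic for (3’). Purely as an illustration, one possible Wald-type sketch is shown below. It is not the authors' method. It assumes (a) an estimated log hazard ratio and standard error for T vs. C2 from the current trial, (b) an independent estimate and standard error for P1 vs. C1 from historical data, and (c) a delta-method approximation Var(HR) ≈ HR² · SE(log HR)². All names and numbers are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def retention_wald_test(log_hr_tc, se_tc, log_hr_pc, se_pc,
                        delta0=0.5, alpha=0.025):
    """One-sided Wald-type test of H0 in (3') on the arithmetic HR scale.

    Returns (z, p_value, reject). H0 is rejected for large negative z.
    """
    hr_tc, hr_pc = exp(log_hr_tc), exp(log_hr_pc)
    # Estimated distance of HR(T/C2) from the threshold in (3').
    g = hr_tc - delta0 - (1 - delta0) * hr_pc
    # Delta-method variance, assuming the two estimates are independent.
    var = (hr_tc * se_tc) ** 2 + ((1 - delta0) * hr_pc * se_pc) ** 2
    z = g / sqrt(var)
    p_value = NormalDist().cdf(z)
    return z, p_value, p_value < alpha


# Hypothetical inputs: HR(T/C2) = 1.0, HR(P1/C1) = 1.5, delta0 = 0.5.
print(retention_wald_test(log(1.0), 0.05, log(1.5), 0.05))
# The same point estimates with larger standard errors.
print(retention_wald_test(log(1.0), 0.20, log(1.5), 0.20))
```

The variance term carries the uncertainty of the historical estimate into the test, which is why the assumptions about the historical control effect matter for the type I error rate.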
where C2 and C1 are the current control and the historical control respectively, and P1 is the historical placebo. Since there is no concurrent placebo in the trial to assess the active control effect, valid and proper statistical inference from the test of hypothesis (3) involves the following assumptions:
1) The active control effect exists in the current trial.
2) The active control effect and its associated variability are correctly assessed.
3) The active control effect is either maintained in the current trial (constancy assumption), or is reduced by some fraction.

The first assumption is the most fundamental and is usually unverifiable. When this assumption does not hold, a harmful (worse than placebo) drug may be approved based on the test. If there are no reliable historical studies for the control, an active control non-inferiority trial cannot be designed. When there are historical well-controlled studies on the basis of which the distribution of the active control effect can be estimated, the control effect can be assessed if the study populations are reasonably the same from past trials to current trials. If there are currently effective therapies where there were none during those past trials, then the true current active-control effect may be modeled as a fraction of the historical active-control effect.

Publication and selection biases are a major concern in the determination of the control effect. If only favorable studies are included for the control effect, then the historical active-control effect will be over-estimated, which will inflate the type I error probability for the “non-inferiority” analysis. To avoid such biases, some analysts suggest using all trials (both published and unpublished). However, this approach may not be feasible in practice.

2.2 Adjusting the active control effect and standard error

The active control effect is either maintained in the current trial (constancy assumption), or is reduced by some fraction.
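A reduction "by some fraction" can be expressed with a factor θ > 0 on either the log-hazard-ratio scale or the arithmetic scale, the two forms this section considers. The sketch below shows both; the function names are illustrative, and θ = 1 corresponds to the constancy assumption.

```python
from math import exp, log


def discount_log_scale(hr_p1c1: float, theta: float) -> float:
    """HR(P2/C2) implied by log HR(P2/C2) = theta * log HR(P1/C1)."""
    return exp(theta * log(hr_p1c1))


def discount_arithmetic_scale(hr_p1c1: float, theta: float) -> float:
    """HR(P2/C2) implied by HR(P2/C2) - 1 = theta * (HR(P1/C1) - 1)."""
    return 1 + theta * (hr_p1c1 - 1)


# Hypothetical historical effect HR(P1/C1) = 1.3, discounted by half.
print(discount_log_scale(1.3, 0.5))
print(discount_arithmetic_scale(1.3, 0.5))
```

The two scales give similar but not identical adjusted values, so the scale used for θ should be fixed in advance along with θ itself.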
If there is reason to believe that the true log-hazard ratio of P vs. C for the current study is different from the true mean log-hazard ratio of P vs. C from the non-concurrent trials, then the estimator of the true mean log-hazard ratio of P vs. C from those non-concurrent trials should be adjusted by some appropriate factor, θ > 0. Let log HR(P2/C2) denote the true (theoretical) log-hazard ratio for the reference placebo vs. the active control, had the two arms in the current trial been the reference placebo and the active control, and let log HR(P1/C1) denote the true log-hazard ratio for the reference placebo vs. the active control in the non-concurrent trials. It may be reasonable to assume log HR(P2/C2) = θ log HR(P1/C1) for some fixed θ > 0, or HR(P2/C2)-1 = θ [HR(P1/C1)-1] for some fixed θ > 0.

The assessment of the variation associated with the estimate of the control effect is based on both within-trial and between-trial variability. When there are only one or two non-concurrent trials that compare the reference placebo vs. the active control, the between-trial variability cannot be assessed. If it is believed that the true reference placebo vs. active-control hazard ratio varies from trial to trial, not taking between-trial variability into consideration may be problematic and the efficacy of the active control may be in question. There may be other reasons for wanting to magnify the standard error from a meta-analysis. For example, the standard error may be magnified if there is uncertainty about whether the active-control effect for the current trial is similar to that of the non-concurrent trials.

An ideal scenario is that there are many placebo-controlled trials with little heterogeneity, conducted for demonstration of the active-control effect; the active-control effect size is not small and has not changed over time; and these trials have patient populations similar to the current active-control trial. If the control effect is small, that is, HR(P/C) ≈ 1, testing hypothesis (3) is close to testing the following superiority hypothesis: H0: HR(T/C) > 1 vs. Ha: HR(T/C) < 1. In this case the design of a non-inferiority trial is not feasible.

3. Discussion

Active-control “non-inferiority” trials should be considered on a case-by-case basis. We need to have assurance that the active-control effect exists in the current patient population. We need to assess whether the current effect size, if it exists, has diminished. We should have multiple historical randomized placebo-controlled studies that show relatively consistent results. An active-control “non-inferiority” trial may be prone to bias towards “no difference.” Careful attention should be paid to the conduct and analysis of such a study.

In the design of a non-inferiority trial, we believe that the hypothesis with a percent retention of the control effect is the more appropriate formulation. Hypotheses with any type of fixed margin, whether an arbitrary margin or a margin calculated based on the control effect, are in general problematic. For the sample size calculation in an active control non-inferiority design, a few assumptions are needed, including the type I and type II error rates (or power), the estimate of the treatment effect (relative to the control), the estimate of the control effect (relative to placebo) and the associated standard errors, and a pre-specified percent retention. In other words, the estimate of a “margin” based on the estimate of the control effect is needed only for the purpose of the sample size calculation. The hypothesis with a fixed “percent retention” should be formulated in the study design. The randomized active control trial should be well monitored and conducted. After the study is done and all efficacy data are collected, the conclusion as to whether the new treatment is effective should be based on testing the protocol-specified hypothesis at an appropriate significance level.

In some oncologic diseases, response rate may be used as a surrogate to assess the efficacy of the new treatment. In this case, a hypothesis with a fixed margin may be used in the design of an active control trial, since one may believe that the tumor response can be attributed to the treatment only. The hypothesis with a predetermined margin defined as a certain percent retention of the control response rate can be used in the study design.
References:
Blackwelder, W. C. (1982), “Proving the Null Hypothesis in Clinical Trials,” Controlled Clinical Trials 3 345-353.
CBER memo (1998), “EXCERPTS from a CBER Memorandum Discussing Aspects of Active Comparator Trials of Thrombolytics in AMI.”
Fleming, T. R. (1990), “Evaluation of Active Control Trials in AIDS,” J AIDS 3 (Supplement 2) S82-S87.
Fleming, T. R. (2000), “Design and Interpretation of Equivalence Trials,” Amer Heart J 139 s171-s176.
Hasselblad, V. and Kong, D. F. (2001), “Statistical Methods for Comparison to Placebo in Active-control Trials,” Drug Inform J 35 435-449.
Hauck, W. W. and Anderson, S. (1999), “Some Issues in the Design and Analysis of Equivalence Trials,” Drug Inform J 33 109-118.
Holmgren, E. B. (1999), “Establishing Equivalence by Showing That a Specified Percentage of the Effect of the Active Control over Placebo is Maintained,” J Biopharm Statist 9 (4) 651-659.
Huque, M., Dubey, S., and Fredd, S. (1989), “Establishing Therapeutic Equivalence with Clinical Endpoints,” Proc Biopharm Sec 46-52.
International Conference on Harmonisation (2000), “Choice of Control Group and Related Design Issues in Clinical Trials (ICH E10),” Food and Drug Administration, DHHS.
Koch, G. and Tangen, C. (1999), “Nonparametric Analysis of Covariance and Its Role in Noninferiority Clinical Trials,” Drug Inform J 33 1145-1159.
Simon, R. (1999), “Bayesian Design and Analysis of Active Control Clinical Trials,” Biometrics 55 484-487.
Temple, R. (1983), “Difficulties in Evaluating Positive Control Trials,” Proc Amer Statist Assoc (Biopharmaceutical Section) 1-7.
Temple, R. (1996), “Problems in Interpreting Active Control Equivalence Trials,” Accountability in Research 4 267-275.
Temple, R. and Ellenberg, S. S. (2000), “Placebo-Controlled Trials and Active Control Trials in the Evaluation of New Treatments, Part I: Ethical and Scientific Issues,” Ann Int Med 133 (6) 455-463.
White, H. D. (1998), “Thrombolytic therapy and equivalence trials,” J Am Coll Cardiol 31 494-496.