Personalized Medicine – Pharmacogenetic Testing in Drug Development and Clinical Practice
Frederieke van der Baan
Personalized Medicine – Pharmacogenetic Testing in Drug Development and Clinical Practice
Thesis, Utrecht University, with a summary in Dutch
© 2012 Frederieke van der Baan
ISBN: 978-94-6108-306-7
Author: Frederieke van der Baan
Layout: Karin Baars-Baak
Cover design: Jeroen Baars
Photos cover: www.shutterstock.com, Warren Goldswain
Printed by: Gildeprint Drukkerijen, The Netherlands
Geneesmiddelen op maat: Farmacogenetica tijdens de ontwikkeling van geneesmiddelen en in de klinische praktijk (with a summary in Dutch)
Thesis
to obtain the degree of Doctor at Utrecht University, on the authority of the Rector Magnificus, Prof.dr. G.J. van der Zwaan, in accordance with the decision of the Board for the Conferral of Doctoral Degrees, to be defended in public on Thursday 28 June 2012 at 12.45 p.m.
by
Frederieke Hannelies van der Baan, born on 6 April 1980 in Bunnik
Supervisors (promotoren):
Prof.dr. D.E. Grobbee
Prof.dr. A.C.G. Egberts
Co-supervisors (co-promotoren):
Dr. M.J. Knol
Dr. O.H. Klungel
The studies presented in this thesis were performed in the context of the Escher project (T6-202), a project of the Dutch Top Institute Pharma. Financial support by the Dutch Heart Foundation for the publication of this thesis is gratefully acknowledged. Additional financial support was provided by J.E. Jurriaanse Stichting, Rotterdam, The Netherlands, and Stichting KNMP-Fondsen.
Contents
1 General introduction
2 Trial design
2.1 Pharmacogenetics in randomized controlled trials: considerations for trial design
2.2 Potential of adaptive clinical trial designs in pharmacogenetic research
2.3 Optimizing trial design in pharmacogenetic research: comparing a fixed parallel group, group sequential and adaptive selection design on sample size requirements
3 Prediction and prognosis
3.1 Added value of pharmacogenetic testing in predicting statin response: results from the REGRESS trial
3.2 Early identification of statin responders
4 Practice and society
4.1 Impact of CYP2D6 and CYP2C19 polymorphisms on consumption of health care in psychiatric practice
4.2 Consent in psychiatric biobanks for pharmacogenetic research
5 General discussion: Barriers hindering the implementation of pharmacogenetic testing: safety versus efficacy
6 Summary, Samenvatting (summary in Dutch), Dankwoord (acknowledgements), Curriculum Vitae
1 General introduction
The Escher project: science-driven drug regulation and innovative research throughout phased drug development
The pharmaceutical sector is highly regulated, with the aim of securing safe, effective and high-quality drugs. Drug regulatory agencies are under increasing pressure to balance the desire for rapid market access to new drugs with the need to limit uncertainty about the benefits and risks of those drugs [1]. The regulations that result from these pressures drive up the costs of drug development and thereby threaten drug innovation. A productivity decline in the pharmaceutical industry has been observed for more than a decade: increased pharmaceutical research and development (R&D) investments have not resulted in an increased output of new active substances (Figure 1). The costs of drug development are so high that new medicines are in danger of becoming unaffordable for manufacturers to develop or for insurance companies and consumers to pay for [2]. Moreover, many of the drugs that reach the market have only limited added value over existing treatments, even though there are still diseases for which pharmaceutical treatments either do not exist or are inadequate. This indicates that industry prefers to avoid risks and chooses to invest in existing molecules with proven efficacy. The risk of failure is indeed considerable: success rates in Phase III are estimated at 50-70%, with failures occurring mainly because superiority to placebo cannot be proven [3;4], and 40% of the new active substances that survived Phase III received a negative opinion from the European Medicines Agency (EMA) Committee for Medicinal Products for Human Use (CHMP) in 2009 or were withdrawn (shortly) before an opinion was delivered [5].
Figure 1. The drug development productivity gap. Number of original New Drug Applications (NDAs) received by the U.S. Food and Drug Administration (FDA) per year and yearly R&D investments by Pharmaceutical Research and Manufacturers of America (PhRMA) Member Companies. Sources: FDA Prescription Drug User Fee Act (PDUFA) Performance reports and PhRMA Annual Membership Survey 2010.
Top Institute Pharma's Escher project brings together university and pharmaceutical partners with the aim of energizing pharmaceutical R&D by identifying, evaluating and removing regulatory barriers and by improving the scientific quality and efficiency of late-phase drug research, in order to bring efficacious and safe medicines to patients in an efficient and timely fashion [6].
Pharmacogenetics and optimizing benefits and safety of (new) drugs
When a drug has proven to be efficacious on a population level, it may still fail to work in individual patients or may cause serious side effects. Physicians and patients, but also the pharmaceutical industry, would benefit greatly from the possibility of identifying, before the start of treatment, those patients likely to be non-responders to the standard dose as well as patients with an increased risk of adverse drug reactions. These patients can be treated with an adapted dose or an alternative drug to optimize drug outcomes. The basis of interindividual variability in response to treatment, however, is in most cases complex and multifactorial. Potential causes of differences in drug response include age, BMI, gender, co-medication, co-morbidities, and genetic polymorphisms. Pharmacogenetic research studies the contribution of genetic factors to between-patient variability in both the efficacy and the safety of a drug.
One of the first examples of pharmacogenetics was described more than 60 years ago. During World War II, it was observed that the antimalarial drug primaquine was associated with acute hemolytic crises mainly in African-American soldiers, and rarely in Caucasian soldiers. Later, it was shown that this sensitivity was caused by a genetically determined deficiency of glucose 6-phosphate dehydrogenase (G6PD), which altered erythrocyte metabolism [7;8]. With the deciphering of the human genome, expectations of individualized treatment were high. The interest in pharmacogenetics and the growth in technological possibilities are reflected in a sharp rise in publications in the field of pharmacogenetics since 1994 [9]. The first pharmacogenetic research was hypothesis-driven and focused on single gene mutations in candidate gene association studies. Nowadays, hypothesis-generating research is also frequently performed in genome-wide association studies. A clear distinction between the terms 'pharmacogenetics' and 'pharmacogenomics' has disappeared and the two terms are now used interchangeably.
The possibility of using genetic information to predict a patient's drug response contributes, on an individual level, to the benefit-risk profile of a drug. This is advantageous for physicians and patients if therapeutic recommendations on drug and dose can help prevent side effects or non-response in clinical practice [10]. The example of abacavir, a drug used in the treatment of HIV, is one of the success stories of pharmacogenetic testing. Serious and sometimes fatal hypersensitivity reactions were reported in approximately 8% of the patients in nine clinical trials. The occurrence of these reactions has been strongly associated with carrying the HLA-B*5701 allele. Screening for this allele is recommended prior to initiating therapy, and for HLA-B*5701-positive patients, treatment with an abacavir-containing regimen is not advised [11;12]. The use of this pharmacogenetic test in clinical practice to identify patients at risk for a hypersensitivity reaction has been shown to be a cost-effective use of healthcare resources [13].
Identifying patients who are genetically predisposed to benefit from a certain drug can also be beneficial for the pharmaceutical industry in the drug discovery and development phases. Early knowledge of (genetic) biomarkers for efficacy and safety increases the chance of successful registration and therefore lowers the risk of (late-phase) failure in drug development. In addition, pharmacogenetic research may increase knowledge of disease susceptibility genes, potentially leading to new drug targets. However, if a gene-drug interaction exists, treatment outcomes will vary significantly between subpopulations carrying different genotypes, which has implications for study design. For example, including different subpopulations in a standard parallel-group trial requires large sample sizes to ensure sufficient power for a subgroup analysis. Alternatively, limiting the study population to one selected subgroup restricts the amount of information obtained on the drug.
To stimulate the incorporation of pharmacogenetics into drug development and clinical practice, it is important to investigate which study designs for pharmacogenetic research yield the largest gain in evidence and are as efficient as possible in terms of time, money and sample size [14]. According to the Boston Consulting Group (2001), pharmacogenetics offers the possibility of streamlining the drug development process by designing the 'standard' clinical trials in a more targeted way, requiring smaller sample sizes and less time [15].
Objectives and outline of thesis
In the context of the Escher project, the objective of this thesis was to study the use of pharmacogenetics throughout different phases of drug research as well as in day-to-day use in the post-marketing phase. In the first part of the thesis, different randomized controlled trial (RCT) designs suitable for pharmacogenetic research are investigated and compared. The second part focuses on the added value of using pharmacogenetics in clinical practice and on ethical considerations of using genetic information in drug research.
Chapter 2 provides three articles on trial design. Chapter 2.1 describes the two main research questions addressed in pharmacogenetic research and evaluates possible RCT designs, differing in timing of randomization and genotyping. Chapter 2.2 focuses on adaptive trial designs and how they may be beneficial in pharmacogenetic research. With a simulation the potential benefit of an adaptive enrichment design is illustrated. In Chapter 2.3, simulation is used to compare an adaptive selection design and a group sequential design to a fixed parallel design, to study which trial design is most efficient in required sample size. Chapter 3 provides two studies on prediction and prognosis. In Chapter 3.1, it is investigated whether pharmacogenetic variables, both single genetic variants and gene-gene interactions, have an added value over demographic, clinical and lifestyle determinants in predicting statin response. In Chapter 3.2 a prognostic model is developed, composed of both genetic and non-genetic predictors, to predict a patient’s statin response. Chapter 4 provides two studies on pharmacogenetics in relation to clinical practice and society. In Chapter 4.1, the influence of predicted CYP2D6 and CYP2C19 phenotypes on the consumption of care in psychiatric inpatients is studied. Chapter 4.2 discusses the ethical aspects concerning informed consent in the establishment of a psychiatric biobank with pharmacogenetic research aims. Finally, in Chapter 5, different barriers hindering the implementation of pharmacogenetic testing in clinical practice are discussed, with a focus on the differences between pharmacogenetics intended for safety (risk reduction) and for treatment efficacy.
Reference List
1. Eichler HG, Pignatti F, Flamion B et al. Balancing early market access to new drugs with the need for benefit/risk data: a mounting dilemma. Nat Rev Drug Discov 2008; 7(10): 818-26.
2. Rawlins MD. Cutting the cost of drug development? Nat Rev Drug Discov 2004; 3(4): 360-4.
3. Gordian M, Singh N, Zemmel R, Elias T. Why Products Fail in Phase III. In Vivo 2006; 24(4).
4. Douglas FL, Mitchell L. Assessing Risk and Return: Personalized Medicine Development & New Innovation Paradigm. Ewing Marion Kauffman Foundation 2008; http://www.kauffman.org/uploadedFiles/HHS_White_Paper_1008.pdf.
5. Eichler HG, Aronsson B, Abadie E, Salmonson T. New drug approval success rate in Europe in 2009. Nat Rev Drug Discov 2010; 9(5): 355-6.
6. The Escher Project: Science driven drug regulation and innovative research throughout phased drug development. Top Institute Pharma, start date: July 2007; http://www.tipharma.com/projects/efficiency-analysis-drug-discovery-process/the-escherproject.html.
7. Clayman CB, Arnold J, Hockwald RS et al. Toxicity of primaquine in Caucasians. J Am Med Assoc 1952; 149(17): 1563-8.
8. Alving AS, Carson PE, Flanagan CL, Ickes CE. Enzymatic deficiency in primaquine-sensitive erythrocytes. Science 1956; 124(3220): 484-5.
9. Holmes MV, Shah T, Vickery C et al. Fulfilling the promise of personalized medicine? Systematic review and field synopsis of pharmacogenetic studies. PLoS One 2009; 4(12): e7960.
10. Swen JJ, Nijenhuis M, de Boer A et al. Pharmacogenetics: from bench to byte--an update of guidelines. Clin Pharmacol Ther 2011; 89(5): 662-73.
11. Ziagen EPAR - Product Information. European Medicines Agency, last updated April 2011; http://www.ema.europa.eu/docs/en_GB/document_library/EPAR__Product_Information/human/000252/WC500050343.pdf.
12. Ziagen label. US FDA, last updated Nov 2011; http://www.accessdata.fda.gov/drugsatfda_docs/label/2011/020977s023,020978s027lbl.pdf.
13. Hughes DA, Vilar FJ, Ward CC et al. Cost-effectiveness analysis of HLA B*5701 genotyping in preventing abacavir hypersensitivity. Pharmacogenetics 2004; 14(6): 335-42.
14. Stingl Kirchheiner JC, Brockmoller J. Why, when, and how should pharmacogenetics be applied in clinical studies?: current and future approaches to study designs. Clin Pharmacol Ther 2011; 89(2): 198-209.
15. Tollman P, Guy P, Altshuler J et al. A revolution in R&D: how genomics and genetics are transforming the biopharmaceutical industry. Boston Consulting Group Report, Nov 2001.
2 Trial design
2.1 Pharmacogenetics in randomized controlled trials: considerations for trial design
FH van der Baan, OH Klungel, ACG Egberts, HG Leufkens, DE Grobbee, KCB Roes, MJ Knol
Pharmacogenomics 2011; 12(10): 1485-92.
Abstract
Pharmacogenetic analyses of randomized controlled trials aim either to detect whether a subgroup of patients identified by genetic characteristics responds differently to the treatment or to verify whether a proposed genotype-guided treatment is beneficial over standard care. This article describes three different trial designs, differing in the timing of randomization and genotyping. Each design has its own advantages, and the objectives and conditions under which each one is most suited are discussed.
Pharmacogenetic research
Pharmacogenetics (PGx) investigates the contribution of genetic variation to interindividual variability in response to drug treatment, with the ultimate aim of promoting an optimal drug response for the individual patient. Identifying patients with an increased risk of adverse drug reactions or inefficacy beforehand enables a physician to, for example, adjust the dose of the drug or treat the patient with an alternative drug. Genotype-guided treatment could thus contribute, on an individual patient level, to the optimal benefit-risk ratio of a drug, in contrast with the drug information on an overall population level that is the typical outcome of randomized controlled trials (RCTs). Only a small number of PGx tests are currently used in clinical practice [1;2], in spite of the large number of articles published in the field of PGx. A possible reason for this low uptake is that clinical evidence to support the accuracy of a test and the benefit to a patient is perceived as uncertain in the absence of convincing data from RCTs [3;4]. The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Working Group has developed a systematic process for evidence-based assessment, focused on genetic tests and other applications of genomic technology [5]. With this initiative, scientific evidence is reviewed and assessed systematically and subsequently translated into clinical practice recommendations (e.g., [6]).
It is not always straightforward which study design is appropriate for a specific PGx research question. Relevant study designs for PGx research have been discussed before [7-13]; in this article, we specifically focus on different design options in RCTs. There is still no agreement about the place of the RCT in this field of research [14]. We describe the two main research questions addressed in PGx research and evaluate the possible RCT designs, differing in timing of randomization and genotyping. Without aiming to be complete, we present a few examples for each trial design to highlight its main features. To conclude, we touch upon a current debate on the merits of RCTs in the field of PGx. Exploratory studies, such as genome-wide association studies, are not within the scope of this paper, since these have a different objective, approach and analysis, and they typically provide leads but no data directly applicable in clinical care.
Randomized controlled trials
An RCT is an experiment in which the allocation to the exposure (e.g., drug treatment) is not based on clinical indication but is a random process, to assure comparability of potential prognostic factors between the treatment arms. PGx trials can be designed (or analyzed post hoc) with the intention of studying whether a subgroup of patients, defined by certain genetic characteristics, responds differently to the treatment. Alternatively, a trial can aim to verify whether genotype-guided treatment is beneficial over standard care. In the following sections, the two corresponding research questions are addressed and the RCT designs to be considered are discussed.
Genotype as effect measure modifier
Research question
In the early stages of a PGx research trajectory, a hypothesis is generated about specific genes that could be involved in variation in drug response, based on, for example, biological knowledge or evidence from similar drugs. It is then relevant to gain knowledge that either confirms or disproves this hypothesis. The research question is: 'Does the treatment effect of drug A vary between subjects with different genotypes?', which essentially asks whether the gene is an effect measure modifier of the treatment effect of the drug.
Post-hoc subgroup analysis
The choice of an optimal study design, given this research question, depends to a large extent on the type of data (readily) available for analysis. If the drug in question is in late-stage development or already on the market, it has already been tested in clinical trials. If the drug response in question, either in terms of efficacy or safety, was measured in the previous trial(s), it makes sense to use these data for a subgroup analysis.
Design and analysis
An RCT with a pharmacogenetic subgroup analysis is characterized by randomization into different treatment arms and subsequent use of genotype results on all participants for the subgroup analysis (Figure 1A). In this design, treatment with hypothetical drug A is compared with either an active comparator or placebo, and subjects are dichotomized on the basis of whether they possess at least one allele different from the most common genotype (genetic variant and wild-type, respectively). The comparative treatment effect is estimated for both genotypes. If the estimates differ significantly, interaction in the statistical sense is present between the treatment effect and the genotype, and the genotype is called an effect measure modifier. As an example, Wegman et al. studied the genotypes of CYP2D6 and SULT1A1 in breast cancer patients participating in a trial, to determine the relation between these genes and the anticipated benefit from tamoxifen therapy [15]. Sherva et al. also performed a pharmacogenetic subgroup analysis and concluded that different antihypertensive treatments seem to be more efficacious for patients homozygous for the MMP3 5A allele than for carriers of a 6A allele [16]. Khambata-Ford et al. studied whether K-RAS and EGFR mutations are predictors of benefit from cetuximab in tumors of patients with non-small-cell lung cancer [17].
When data come from a finalized trial in which DNA was already gathered at baseline and stored, as is increasingly standard practice in Phase III trials, a PGx subgroup analysis is relatively simple, fast and inexpensive. Post-hoc collection of biomaterial, however, is time-consuming, so it can be efficient to approach only a selection of the trial participants. If the outcome event is independent of the gene under study, which means the gene is not a prognostic factor for the outcome, it is valid to use an exposure-only design. Only data from treated patients are then taken into account, which is justified since the risk of the outcome is identical for all untreated patients regardless of their genotype. To illustrate, Mega et al. performed an exposure-only subgroup analysis to study the association between genetic variants in cytochrome P450 (CYP) genes and cardiovascular outcomes in subjects with acute coronary syndromes treated with clopidogrel [18]. The choice of this design implies that there is no direct association between the CYP alleles and the cardiovascular outcomes. This assumption may, however, not be justified, as an association has been demonstrated between CYP2C9 variant alleles and an increased risk of acute myocardial infarction [19].
Figure 1. Randomized controlled trial designs. (A) Genotyping is used for post-hoc subgroup analysis. (B) Genotyping is used to enrich the study population. (C) Genotype-guided treatment is tested versus standard care.
Statistical aspects
In general, the main advantage of an RCT is that the different types of bias are effectively handled by randomization and blinding. However, analyzing many subgroups can greatly increase Type I error rates, and when the results of a trial are analyzed in different strata, confounding bias can be introduced if the subgrouping variable is associated with another variable that also modifies the treatment effect [20;21]. In the case of (pharmaco)genetic subgroups, the chance of introducing bias is small, since inheritance of a specific genotype is in general independent of inheritance of other traits. This principle is similar to Mendelian randomization, a method that uses polymorphisms that have an effect equivalent to that produced by a modifiable exposure to classify patients in observational data. The underlying assumption of this method is that associations between genetic variants and outcome are not generally confounded by behavioural or (short-term) environmental exposures, since genotypes are assigned randomly [22]. However, differences in allele frequencies between (sub)populations (population stratification) may lead to misleading conclusions in a subgroup analysis if the distribution of these populations is unequal between the strata. Confounding may also arise through linkage disequilibrium (one genetic variant lying in the proximity of another functional variant, so that the two are transmitted together) or through genetic variants with multiple effects [22]. Selective consenting to genotyping by participants is another possible cause of bias. At the start of the study, this phenomenon is not likely to introduce bias, since consenting is not related to the study outcome.
If, however, biomaterial was collected (e.g., tumor tissue) that can be reused for post-hoc genotyping, the availability of the material might be selective owing to more frequent use of material from exceptional patients [7]. If no DNA was collected at baseline and it needs to be collected post hoc, bias might be introduced by a selective loss of patients or by selective consenting by participants, as the study's outcomes can influence the decision to consent. The sample size of the subgroups is also a critical factor in achieving balance in baseline characteristics. Even if selective loss of patients is not an issue and DNA is available for all trial participants, a small sample size can lead to imbalances in prognostic factors [23]. A major disadvantage of choosing a post-hoc subgroup analysis is that one is dependent on the genotype distribution in the original study population. If the trial was not designed a priori to test a possible gene-drug response interaction, it may lack statistical power or be subject to bias as a result of post-hoc selective analysis.
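As an illustration of the subgroup analysis described above, the following minimal sketch estimates the treatment effect as an odds ratio in each genotype stratum and tests whether the two estimates differ (a Wald test on the difference in log odds ratios). This is one common way to test for statistical interaction, not necessarily the method used in the cited trials. The event counts are hypothetical, and `familywise_error` assumes independent tests, which is a simplification.

```python
import math


def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log odds ratio, standard error) for one genotype stratum."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    if min(a, b, c, d) == 0:
        # Add 0.5 to every cell so that sparse tables do not divide by zero.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se


def interaction_test(variant, wild_type):
    """Wald test for a difference in treatment effect between two strata.

    Each argument is a tuple (events_trt, n_trt, events_ctl, n_ctl).
    """
    lor_v, se_v = log_odds_ratio(*variant)
    lor_w, se_w = log_odds_ratio(*wild_type)
    diff = lor_v - lor_w
    z = diff / math.sqrt(se_v ** 2 + se_w ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return {"ratio_of_odds_ratios": math.exp(diff), "z": z, "p_value": p_value}


def familywise_error(alpha, n_tests):
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Hypothetical counts: no treatment effect in variant carriers,
# a protective effect in wild-type subjects.
result = interaction_test(variant=(30, 100, 30, 100),
                          wild_type=(15, 200, 30, 200))
print(result)
print(round(familywise_error(0.05, 10), 3))  # ten subgroup tests at alpha = 0.05
```

A small p-value for the interaction suggests that the genotype modifies the treatment effect. The last line shows why analyzing many subgroups inflates the Type I error rate: with ten independent tests at a 5% level, the chance of at least one false-positive finding is about 40%.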
Genotyping before enrollment
If no (suitable) trial data are readily available for a subgroup analysis to answer the question of whether a specific gene is an effect measure modifier of the drug's treatment effect, different study designs can be considered. Besides the possibility of an observational study, an RCT in which genotype information is known before randomization is a good alternative.
Design and analysis
Genotyping before randomization offers the possibility of using the genotype information to enrich the study population, as shown in Figure 1B. In design and analysis, this RCT is similar to the one described previously (Figure 1A), though the subgroup analysis is planned a priori. If the prevalence of the selected genetic variants is unequal, it is possible to selectively include participants to create balanced numbers in the genotype strata. This is efficient in terms of the number of participants and costs, since an excess of the more prevalent genotype (wild-type in the example of Figure 1B) is prevented. To illustrate, Kemmeren et al. studied the risk of venous thrombosis in oral contraceptive users, and the influence of factor V Leiden [24]. The prevalence of the factor V Leiden mutation is relatively low; 5.5% of the women screened were heterozygous carriers of this mutation. All these women and only a selection of the factor V Leiden negative women were approached for the trial. Another example is the study by Johnson et al., in which alcoholics were randomized by 5'-HTT genotype to either ondansetron or placebo, in order to have balanced numbers of the genotypes in the treatment arms [25].
With genotype information available before randomization, it is also possible to select patients with the genotypes between which the largest difference in treatment effect is expected, instead of including all genotypes. Preskorn et al. studied the impact of largely decreased CYP2D6 metabolism on plasma concentrations of desvenlafaxine, compared to venlafaxine [26]. Hence, they compared poor metabolizers, having a reduced CYP2D6 activity, with extensive metabolizers, and excluded patients with a different CYP2D6 activity (intermediate and ultrarapid metabolizers). An excess of extensive metabolizers was excluded so as to create balanced group sizes.
In a 'classical' enrichment design, only patients expected to show the greatest benefit from the intervention are enrolled. PGx can also be used in the selection, by excluding genotypes assumed to be associated with an increased risk of side effects or non-response. This strategy reduces the sample size and can increase subject safety, and thus may eliminate the need for other safety measures, for example close monitoring of drug plasma concentrations during the trial [27]. The design is similar to the one in Figure 1B, but note that the research question is different, namely: 'What is the benefit of drug A over placebo or an alternative treatment for patients with a particular genotype?'. The question of whether the gene is an effect measure modifier of the treatment effect of the drug should already have been answered before deciding to exclude patients carrying a specific genotype. As an example, in the Phase III trial evaluating the efficacy and safety of trastuzumab, a monoclonal antibody directed specifically against human epidermal growth factor receptor 2 (HER2), only women with breast cancer that overexpressed HER2 were included [28].
Statistical aspects
The main advantage of this design compared to the first one is that the sample size of the study is determined taking the planned subgroup analysis into account, which decreases the risk of insufficient statistical power at the end. Bias introduced by selective consenting is also avoided by genotyping at the beginning of the study. Moreover, the possibility of randomization by genotype, creating balanced numbers of the genotypes per treatment arm, further increases the chances of a significant result and minimizes the possibility of confounding introduced by unequal genotype distribution between the subgroups. Selective inclusion of particular genotypes to create similar numbers of subjects in the genotype strata may prolong recruitment time if a prespecified number of patients carrying a rare allele is to be included. If time is a limiting factor, it might be a better option to settle for fewer patients with the rare genetic variant (a genotype ratio of 2:1 instead of 1:1, for example), resulting in only a minimal loss of power. Helpful in this consideration is the number needed to genotype in order to find one eligible patient, calculated by dividing one by the prevalence of the genotype(s) under study [29]. When the existing evidence is not strong enough to exclude a subgroup of patients based on genotype, an adaptive trial design provides an alternative.
In an adaptive design, accumulating data is used to modify aspects of the study as it continues without undermining the validity and integrity of the trial [30]. An adaptive PGx trial could, for example, start with enrolling patients regardless of their genotype, and depending on interim data (or data from concurrent trials) it can be decided either to continue the study unchanged and establish efficacy and safety for the whole population, or to modify the design (i.e., drop the genotype for which treatment is not beneficial) to confirm efficacy and safety in the remaining genotype(s).
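The recruitment arithmetic mentioned under 'Statistical aspects' above can be sketched as follows. The number needed to genotype is taken directly from the text (one divided by the genotype prevalence) and applied to the 5.5% prevalence reported for factor V Leiden. The `relative_efficiency` function is an added simplification rather than a formula from the cited literature: it assumes a fixed total sample size and equal outcome variance in both genotype strata, so that the variance of the between-stratum contrast is proportional to 1/n1 + 1/n2.

```python
import math


def number_needed_to_genotype(prevalence):
    """Expected number of patients genotyped to find one eligible carrier."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return 1 / prevalence


def expected_screened(n_carriers, prevalence):
    """Expected number of patients to screen to enroll n_carriers carriers."""
    return math.ceil(n_carriers * number_needed_to_genotype(prevalence))


def relative_efficiency(ratio):
    """Efficiency of a ratio:1 genotype split relative to a 1:1 split.

    Assumes a fixed total sample size and equal variance per stratum.
    """
    return 4 * ratio / (1 + ratio) ** 2


# Factor V Leiden prevalence of 5.5%, as reported in the text.
print(round(number_needed_to_genotype(0.055), 1))  # about 18.2
print(expected_screened(100, 0.055))               # screening needed for 100 carriers
print(round(relative_efficiency(2), 3))            # 2:1 instead of 1:1
```

Under these assumptions, a 2:1 ratio retains roughly 89% of the efficiency of a balanced design, which is consistent with the statement that settling for fewer carriers of the rare variant costs only a little power.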
Added value of genotyping in clinical practice
Research question
At a later stage in the PGx research trajectory, after it has been determined that a certain gene is indeed involved in the interindividual differences in drug response, the corresponding treatment decisions (e.g., [31]) need to be tested and validated. For the application and interpretation of a (genetic) biomarker, its qualification is desirable, as recently described in a US Food and Drug Administration (FDA) draft guidance [32]. The benefits of implementing genotype-guided treatment in daily clinical practice are not self-evident, especially when the drug is already on the market and other mechanisms might already be used to detect the specific safety issue or non-response (e.g., monitoring plasma concentrations). It is therefore essential to assess the added value of the PGx-based treatment over the current use of the drug, and the corresponding costs. The research question is: 'What is the benefit-cost ratio of genotype-guided treatment over standard care?'.
Design and analysis
An RCT in this setting is characterized by randomization to either genotype-guided treatment or standard care (Figure 1C). Genotype-guided treatment can include an alternative treatment (or exclusion) for subjects with predetermined genotypes. Alternatively, an algorithm that includes a subject's genotype (and possibly other parameters as well, such as age and weight) can be used to determine the initial or maintenance dose in this treatment arm. Analysis of this RCT is similar to that of a 'regular' RCT, in which two interventions are compared. The advantage of this RCT design is that the results address some of the main barriers to clinical uptake: scepticism towards clinical evidence and a perception that alternative ways of handling the risks of adverse drug reactions suffice [4]. Also, methodological or logistic challenges of implementing genotyping in daily practice may surface during the study that would not have been considered outside the trial setting. In 2008, one of the first RCTs of this type was published. Patients infected with HIV type 1 were randomly assigned to either prospective HLA-B*5701 screening or standard-care use of abacavir [33].
The results of this trial, combined with results from previous observational studies, were reason to change abacavir’s drug label and recommend (FDA) or require (European Medicines Agency) screening for the HLA-B*5701 allele prior to initiating therapy with abacavir. Another example of an RCT according to this design was recently published by Ito et al. [34]. They used a set of five genes to predict a patient’s response to paclitaxel in primary chemotherapy. In the trial, patients with breast cancer were randomly assigned either to receive paclitaxel without genetic testing (arm A) or to have their sensitivity to paclitaxel assessed with the genetic test (arm B). Patients in arm B who were classified as insensitive to paclitaxel received an alternative chemotherapy, and the paclitaxel-sensitive patients received paclitaxel as in arm A. For anticoagulant therapy, PGx-based treatment strategies have also been developed (e.g., [35]). For these drugs, however, translation of a PGx test result into treatment adjustment is more complex than in the abacavir example, which might be an extra barrier
to clinical uptake. Besides, several different algorithms for warfarin dosing have been proposed [36], and therapy is currently monitored and tailored on the basis of individual INR values. Nevertheless, trials following the design in Figure 1C are currently running to assess the safety, clinical utility and cost-effectiveness of the proposed dosing algorithms [37;38].
Statistical aspects
Besides evaluating the added clinical value of the genotype-guided treatment over standard care, this design offers the possibility to calculate the cost-effectiveness of genotyping all patients and adjusting treatment to the patient’s genotype. Recently, modeling approaches have been described to estimate the cost-effectiveness of PGx testing in psychiatry [39;40]. However, randomization to either standard care or PGx-based treatment enables a direct estimation of the benefit that genotyping will provide to the prevention of adverse drug reactions or non-response, which is necessary to assess the cost savings of genotyping [41]. At the same time, a ‘break-even point’ of the benefit can be calculated, illustrating the number of patients who need to be genotyped, and treated accordingly, for the testing to be (cost-)effective. This is valuable in the discussion about implementation of the test, since in practice it is likely that not all physicians will use the PGx-based treatment guidance or act accordingly. However, since two active treatments are compared in this RCT design, which may result in small expected effects, it is important to realize that sample sizes may need to be large in order to have sufficient statistical power. If the effect of genotyping is only expected in a small subgroup, sample sizes increase even further if a statistically significant result is to be obtained.
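The sample size remark above can be made concrete with a small calculation. The Python sketch below uses the usual normal-approximation formula for comparing two proportions and assumes that genotype-guided treatment changes the event risk only in carriers of the variant. The carrier frequencies and event risks are hypothetical, and the function names (`n_per_arm`, `arm_risks`) are illustrative; none of the numbers are taken from the trials cited in this chapter.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control: float, p_guided: float,
              alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided comparison of two proportions
    (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p_control + p_guided) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_guided * (1 - p_guided))) ** 2
    return ceil(numerator / (p_control - p_guided) ** 2)


def arm_risks(carrier_freq: float, risk_carrier: float,
              risk_noncarrier: float, risk_carrier_guided: float):
    """Overall event risk in the standard-care and genotype-guided arms.

    Only carriers are treated differently in the guided arm, so only their
    risk changes; non-carriers dilute the contrast between the arms.
    """
    standard = carrier_freq * risk_carrier + (1 - carrier_freq) * risk_noncarrier
    guided = carrier_freq * risk_carrier_guided + (1 - carrier_freq) * risk_noncarrier
    return standard, guided


# Hypothetical inputs: carriers have a 30% event risk under standard care,
# which genotype-guided treatment lowers to the 10% risk of non-carriers.
for freq in (0.50, 0.20, 0.05):
    standard, guided = arm_risks(freq, 0.30, 0.10, 0.10)
    print(f"carrier frequency {freq:.0%}: risk {standard:.3f} vs {guided:.3f}, "
          f"n per arm = {n_per_arm(standard, guided)}")
```

Under these assumptions the required number of patients per arm rises steeply as the carrier frequency falls, because the same within-carrier benefit translates into an ever smaller difference between the two randomized arms.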
Concluding remarks and future perspective
Among all genetic variants that have been associated with a modified drug response, only a small proportion has been integrated into the drug label by the regulatory authorities, and even fewer PGx tests are required or recommended before prescribing a drug [2]. Examples are abacavir and azathioprine; for some oncology drugs, testing of tumor status is also required [42;43]. For many drugs, however, it is questioned whether retrospective analyses suffice to require testing. From a regulator’s point of view, the results of RCTs randomizing to either genotype-guided treatment or standard care are especially relevant, since they estimate the population-level benefits and costs of implementing the PGx test. The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Initiative also grades the evidence from RCTs as most convincing for clinical utility questions [5]. On the other hand, the generalizability of trial results may be limited since trials often do not reflect the patient population treated in actual practice nor the way the drug is being
monitored in daily practice [44;45]. Furthermore, from a practical and ethical point of view it can be debated whether a genotype-guided RCT is always necessary to generate evidence of clinical utility before implementing PGx information in the label [46]. In the case of abacavir, the RCT was criticized for not adding much information to the already existing evidence, while being costly and time-consuming. It is expected that if genotype-guided RCTs are required for all PGx tests, only a few will be implemented in daily practice in the coming years. The main considerations are the type and amount of data and evidence that is already available, and possible alternatives to obtain evidence. It seems reasonable that for genes such as those encoding drug-metabolizing enzymes, dose adjustments can be based on pharmacokinetic and pharmacodynamic properties of the drug instead of clinical endpoints, similar to, for example, dose adjustments for patients with renal impairment, as also stated by Woodcock and Lesko [47]. For other genes, notably genes involved in type B adverse effects, which are typically unexpected [48], the influence of the gene is not as well known as for the genes that change enzyme activity, which makes the identification of surrogate endpoints more challenging. Observational data may very well be used to investigate type B adverse effects of a drug, since these effects are unanticipated at the time of prescribing, so confounding by indication will not be an issue [49]. To conclude, RCT results can be valuable to obtain evidence for the rational implementation of PGx testing. However, the added value needs to be assessed for each situation independently. Before choosing a design, the question to be answered must be carefully formulated, in view of the scientific evidence that is already available.
Reference List
1. Sheffield LJ, Phillimore HE. Clinical use of pharmacogenomic tests in 2009. Clin Biochem Rev 2009; 30(2): 55-65.
2. Becquemont L. Pharmacogenomics of adverse drug reactions: practical applications and perspectives. Pharmacogenomics 2009; 10(6): 961-9.
3. Ingelman-Sundberg M. Pharmacogenomic biomarkers for prediction of severe adverse drug reactions. N Engl J Med 2008; 358(6): 637-9.
4. Corkindale D, Ward H, McKinnon R. Low adoption of pharmacogenetic testing: an exploration and explanation of the reasons in Australia. Personalized Medicine 2007; 4(2): 191-9.
5. Teutsch SM, Bradley LA, Palomaki GE et al. The Evaluation of Genomic Applications in Practice and Prevention (EGAPP) Initiative: methods of the EGAPP Working Group. Genet Med 2009; 11(1): 3-14.
6. Recommendations from the EGAPP Working Group: testing for cytochrome P450 polymorphisms in adults with nonpsychotic depression treated with selective serotonin reuptake inhibitors. Genet Med 2007; 9(12): 819-25.
7. Smits KM, Schouten JS, Smits LJ et al. A review on the design and reporting of studies on drug-gene interaction. J Clin Epidemiol 2005; 58(7): 651-4.
8. Little J, Sharp L, Khoury MJ et al. The epidemiologic approach to pharmacogenomics. Am J Pharmacogenomics 2005; 5(1): 1-20.
9. Kirchheiner J, Fuhr U, Brockmoller J. Pharmacogenetics-based therapeutic recommendations--ready for clinical practice? Nat Rev Drug Discov 2005; 4(8): 639-47.
10. Serretti A, Kato M, Kennedy JL. Pharmacogenetic studies in depression: a proposal for methodologic guidelines. Pharmacogenomics J 2008; 8(2): 90-100.
11. Bromley CM, Close S, Cohen N et al. Designing pharmacogenetic projects in industry: practical design perspectives from the Industry Pharmacogenomics Working Group. Pharmacogenomics J 2009; 9(1): 14-22.
12. Baker SG, Sargent DJ. Designing a randomized clinical trial to evaluate personalized medicine: a new approach based on risk prediction. J Natl Cancer Inst 2010; 102(23): 1756-9.
13. Stingl Kirchheiner JC, Brockmoller J. Why, when, and how should pharmacogenetics be applied in clinical studies?: current and future approaches to study designs. Clin Pharmacol Ther 2011; 89(2): 198-209.
14. Frueh FW. Back to the future: why randomized controlled trials cannot be the answer to pharmacogenomics and personalized medicine. Pharmacogenomics 2009; 10(7): 1077-81.
15. Wegman P, Vainikka L, Stal O et al. Genotype of metabolic enzymes and the benefit of tamoxifen in postmenopausal breast cancer patients. Breast Cancer Res 2005; 7(3): R284-R290.
16. Sherva R, Ford CE, Eckfeldt JH et al. Pharmacogenetic effect of the stromelysin (MMP3) polymorphism on stroke risk in relation to antihypertensive treatment: the genetics of hypertension associated treatment study. Stroke 2011; 42(2): 330-5.
17. Khambata-Ford S, Harbison CT, Hart LL et al. Analysis of potential predictive markers of cetuximab benefit in BMS099, a phase III study of cetuximab and first-line taxane/carboplatin in advanced non-small-cell lung cancer. J Clin Oncol 2010; 28(6): 918-27.
18. Mega JL, Close SL, Wiviott SD et al. Cytochrome p-450 polymorphisms and response to clopidogrel. N Engl J Med 2009; 360(4): 354-62.
19. Yasar U, Bennet AM, Eliasson E et al. Allelic variants of cytochromes P450 2C modify the risk for acute myocardial infarction. Pharmacogenetics 2003; 13(12): 715-20.
20. Groenwold RH, Donders AR, van der Heijden GJ et al. Confounding of subgroup analyses in randomized data. Arch Intern Med 2009; 169(16): 1532-4.
21. Vanderweele TJ, Knol MJ. Interpretation of subgroup analyses in randomized trials: heterogeneity versus secondary interventions. Ann Intern Med 2011; 154(10): 680-3.
22. Davey Smith G, Ebrahim S. What can mendelian randomisation tell us about modifiable behavioural and environmental exposures? BMJ 2005; 330(7499): 1076-9.
23. Wang SJ, O'Neill RT, Hung HJ. Statistical considerations in evaluating pharmacogenomics-based clinical effect for confirmatory trials. Clin Trials 2010; 7(5): 525-36.
24. Kemmeren JM, Algra A, Meijers JC et al. Effect of second- and third-generation oral contraceptives on the protein C system in the absence or presence of the factor V Leiden mutation: a randomized trial. Blood 2004; 103(3): 927-33.
25. Johnson BA, Ait-Daoud N, Seneviratne C et al. Pharmacogenetic approach at the serotonin transporter gene as a method of reducing the severity of alcohol drinking. Am J Psychiatry 2011; 168(3): 265-75.
26. Preskorn S, Patroneva A, Silman H et al. Comparison of the pharmacokinetics of venlafaxine extended release and desvenlafaxine in extensive and poor cytochrome P450 2D6 metabolizers. J Clin Psychopharmacol 2009; 29(1): 39-43.
27. Murphy MP, Beaman ME, Clark LS et al. Prospective CYP2D6 genotyping as an exclusion criterion for enrollment of a phase III clinical trial. Pharmacogenetics 2000; 10(7): 583-90.
28. Slamon DJ, Leyland-Jones B, Shak S et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N Engl J Med 2001; 344(11): 783-92.
29. Mulder H, Heerdink ER, van Iersel EE et al. Prevalence of patients using drugs metabolized by cytochrome P450 2D6 in different populations: a cross-sectional study. Ann Pharmacother 2007; 41(3): 408-13.
30. Gallo P, Chuang-Stein C, Dragalin V et al. Adaptive designs in clinical drug development--an Executive Summary of the PhRMA Working Group. J Biopharm Stat 2006; 16(3): 275-83.
31. Swen JJ, Wilting I, de Goede AL et al. Pharmacogenetics: from bench to byte. Clin Pharmacol Ther 2008; 83(5): 781-7.
32. FDA (2010) Guidance for industry, Qualification Process for Drug Development Tools. http://www.fda.gov/downloads/Drugs/GuidanceComplianceRegulatoryInformation/Guidances/UCM230597.pdf.
33. Mallal S, Phillips E, Carosi G et al. HLA-B*5701 screening for hypersensitivity to abacavir. N Engl J Med 2008; 358(6): 568-79.
34. Ito Y, Nagasaki K, Miki Y et al. Prospective randomized phase II study determines the clinical usefulness of genetic biomarkers for sensitivity to primary chemotherapy with paclitaxel in breast cancer. Cancer Sci 2011; 102(1): 130-6.
35. Warfarin Dosing. www.warfarindosing.org.
36. Schelleman H, Chen J, Chen Z et al. Dosing algorithms to predict warfarin maintenance dose in Caucasians and African Americans. Clin Pharmacol Ther 2008; 84(3): 332-9.
37. Anderson JL, Horne BD, Stevens SM et al. Randomized trial of genotype-guided versus standard warfarin dosing in patients initiating oral anticoagulation. Circulation 2007; 116(22): 2563-70.
38. van Schie RM, Wadelius MI, Kamali F et al. Genotype-guided dosing of coumarin derivatives: the European pharmacogenetics of anticoagulant therapy (EU-PACT) trial design. Pharmacogenomics 2009; 10(10): 1687-95.
39. Perlis RH, Patrick A, Smoller JW, Wang PS. When is pharmacogenetic testing for antidepressant response ready for the clinic? A cost-effectiveness analysis based on data from the STAR*D study. Neuropsychopharmacology 2009; 34(10): 2227-36.
40. Serretti A, Olgiati P, Bajo E et al. A model to incorporate genetic testing (5-HTTLPR) in pharmacological treatment of major depressive disorders. World J Biol Psychiatry 2011.
41. Rodriguez-Antona C, Gurwitz D, de Leon J et al. CYP2D6 genotyping for psychiatric patients treated with risperidone: considerations for cost-effectiveness studies. Pharmacogenomics 2009; 10(4): 685-99.
42. FDA, Table of Pharmacogenomic Biomarkers in Drug Labels, last updated 05/23/2011. http://www.fda.gov/Drugs/ScienceResearch/ResearchAreas/Pharmacogenetics/ucm083378.htm.
43. European Medicines Agency, European Public Assessment Reports (EPARs). http://www.ema.europa.eu/ema/index.jsp?curl=pages/medicines/landing/epar_search.jsp&murl=menus/medicines/medicines.jsp&mid=WC0b01ac058001d125.
44. Farahani P, Levine M, Gaebel K, Thabane L. Clinical data gap between phase III clinical trials (pre-marketing) and phase IV (post-marketing) studies: evaluation of etanercept in rheumatoid arthritis. Can J Clin Pharmacol 2005; 12(3): e254-e263.
45. Zimmerman M, Mattia JI, Posternak MA. Are subjects in pharmacological treatment trials of depression representative of patients in routine clinical practice? Am J Psychiatry 2002; 159(3): 469-73.
46. Lesko LJ, Zineh I, Huang SM. What is clinical utility and why should we care? Clin Pharmacol Ther 2010; 88(6): 729-33.
47. Woodcock J, Lesko LJ. Pharmacogenetics--tailoring treatment for the outliers. N Engl J Med 2009; 360(8): 811-3.
48. Meyboom RH, Lindquist M, Egberts AC. An ABC of drug-related problems. Drug Saf 2000; 22(6): 415-23.
49. Vandenbroucke JP, Psaty BM. Benefits and risks of drug treatments: how to combine the best evidence on benefits with the best data about adverse effects. JAMA 2008; 300(20): 2417-9.
Potential of adaptive clinical trial designs in pharmacogenetic research
2.2
FH van der Baan, MJ Knol, OH Klungel, ACG Egberts, DE Grobbee, KCB Roes Pharmacogenomics 2012; 13 (5): 571 - 8.
Abstract
Adaptive trial designs can be beneficial in pharmacogenetic research when prior uncertainty exists regarding the exact role and clinical relevance of genetic variability in drug response. This type of design enables us to learn about the effect of the genetic variability on drug response and to immediately use this information for the remainder of the study. For different types of adaptive trial designs, we discuss when and how the designs are suitable for pharmacogenetic research: adaptation of randomization, adaptation of patient enrollment and adaptive enrichment. To illustrate the potential benefits of an adaptive design over a fixed design, we simulated an adaptive trial based on the results of the IPASS trial. With a simple model we show that for this example an adaptive enrichment design would have led to a smaller trial, with fewer EGF receptor mutation-negative patients unnecessarily exposed to the drug, without compromising the α level or reducing power.
Introduction
Randomized controlled trials (RCTs) have an important role in determining the efficacy and safety of interventions. However, RCTs are typically time-consuming and costly in comparison to (retrospective) observational studies, and often large sample sizes are needed to ensure sufficient statistical power. In order to make a trial as efficient as possible in terms of time, money and/or sample size, it is possible to opt for an adaptive trial design, which allows prospectively planned modifications in design after patients have been enrolled in the study. Such a design uses accumulating data to decide how to modify aspects of the study during its progress, without undermining the validity and integrity of the trial [1;2]. An additional benefit is that the expected number of patients exposed to an inferior/harmful treatment can be reduced. Pharmacogenetic (PGx) research investigates the role of genetic variability as (causal) explanation of interindividual variability in response to treatment. Identifying patients who are probable non-responders to the standard dosing regimen and/or patients with an increased risk of adverse drug reactions contributes on an individual patient level to the benefit-risk ratio of a drug. In a trial with a fixed design, the patient population is defined beforehand, as either an unselected or an enriched population [3]. In the case of PGx research, evidence may not be strong enough upfront to determine that a specific gene is indeed an effect measure modifier of the treatment effect. This complicates the choice whether or not to give patients with different genotypes the same treatment or dose, and whether or not to include all genotypes in the trial. In an adaptive PGx trial it is possible to learn about the effect of the genetic variability on the drug response and to immediately use this information for the remainder of the study.
In this paper, we will discuss different types of adaptive designs that have been applied in clinical trials and address when and how the adaptive study designs are suitable for PGx research. In theory, all elements in a trial design can be adapted and multiple adaptations can even be combined in one trial. In practice, however, the most commonly used adaptations can be broadly divided into the following types: adaptation of the trial’s randomization and adaptation of patient enrollment. We will give a description of these adaptations as they are used in trials, illustrated by an example, and subsequently we will describe how the adaptation could be applied in PGx clinical trials. Adaptive population enrichment, a special adaptation of patient enrollment, is described in more detail since this type of design is considered particularly beneficial for PGx, but also complex to conduct. An adaptive enrichment trial was simulated based on data of the IPASS trial, a Phase III trial in non-small-cell lung cancer (NSCLC) patients. This trial aimed to compare the efficacy and safety of gefitinib versus carboplatin-paclitaxel and to examine the role of an epidermal growth factor receptor (EGFR) mutation as a predictor of treatment efficacy [4].
Adaptation of randomization
Different adaptations in relation to the randomization of the trial are possible, based on interim analyses that were planned in advance of the trial, for example modifications of the dose or treatment allocation ratio. One option is the so-called ‘drop-the-loser’ design, in which one or more inferior treatment arms are dropped based on the results of interim data. The remaining treatment arms will continue in the second part of the trial, which can be either different doses of one drug (versus placebo), or different treatment regimens. It is even possible to introduce extra doses after the interim analysis, if the interim data suggest that the selected series of doses is not sufficient to obtain the intended treatment response. Besides an efficient design, it is attractive from an ethical point of view to subject a minimal number of patients to an inferior (or more harmful) treatment. Instead of dropping the inferior options at once, it is possible to gradually process the gained information and continuously update the randomization probabilities for each newly enrolled patient using Bayesian approaches.
Example of a trial with interim analysis and adaptive dose selection
Ho et al. performed a clinical trial with a two-stage adaptive design to select the optimal dose and to minimize patient exposure to nonefficacious doses of a novel drug (MK-0974) for the acute treatment of migraine [5]. In the first stage of the trial, patients were randomly allocated to one of seven MK-0974 dose levels, to rizatriptan or to placebo. Once 192 patients were randomized, an interim efficacy analysis was executed in order to select the MK-0974 doses to be continued in stage 2 of the trial. The lowest dose with more than 70% conditional probability of being nominally significant at the end of the trial was identified, based on a comparison of each MK-0974 dose with the placebo group.
As a result, the four doses lower than this cut-off point were dropped after stage 1 of the trial due to insufficient efficacy.
Example of a trial with adaptive allocation
A second example is a trial comparing three treatment options for patients with acute myeloid leukemia [6]. Patients were randomized to one of the three treatment arms in an adaptive Bayesian fashion. Initially, the randomization was balanced, with a probability of assignment to each arm equal to 0.33. The probability of random assignment to the control arm remained at 0.33 as long as all three arms remained in the trial, but randomization probabilities to the two investigational treatment arms were shifted in favor of the arm that performed better. After enrollment of 24 patients, one of the treatment arms was dropped based on insufficient success rates. After 34 patients in total, the probability of random assignment to one of the two remaining treatment options also became zero, due to minimal benefit, and the trial ended.
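The allocation scheme in the leukemia example can be illustrated with a short Python sketch. The cited trial's actual updating rule is not described here, so the rule below is an assumption made for illustration only: success rates get independent Beta(1, 1) priors, the control arm keeps a fixed share of one third, and the remaining two thirds are split between the investigational arms in proportion to the posterior probability that each has the higher success rate. The function names and interim counts are hypothetical.

```python
import random


def prob_a_beats_b(succ_a, fail_a, succ_b, fail_b, draws=20000, seed=1):
    """Monte Carlo estimate of P(rate_A > rate_B) under Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        rate_a = rng.betavariate(1 + succ_a, 1 + fail_a)
        rate_b = rng.betavariate(1 + succ_b, 1 + fail_b)
        wins += rate_a > rate_b
    return wins / draws


def allocation_probabilities(arm_a, arm_b, control_share=1 / 3):
    """Randomization probabilities for the next patient.

    arm_a and arm_b are (successes, failures) tuples observed so far in the
    two investigational arms; the control share stays fixed.
    """
    p_a_better = prob_a_beats_b(*arm_a, *arm_b)
    investigational_share = 1 - control_share
    return {
        "control": control_share,
        "A": investigational_share * p_a_better,
        "B": investigational_share * (1 - p_a_better),
    }


# Hypothetical interim data: 6 of 8 successes in arm A, 2 of 8 in arm B.
print(allocation_probabilities(arm_a=(6, 2), arm_b=(2, 6)))
```

With equal interim results the two investigational arms each receive about one third of new patients; as the evidence favors one arm, its share approaches two thirds and the other arm's share approaches zero, at which point that arm is effectively dropped.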
Application in PGx clinical trials
Adaptation of randomization in PGx trials enables researchers to select the best treatment options for subpopulations with a specific genetic profile. For example, if a drug is (partly) metabolized by the enzyme cytochrome P450 (CYP) 2D6, for which large genetic variability has been described, carriers of multiple functional alleles (ultrarapid metabolizers) may need a different dose than carriers of deficient alleles [7]. Using Bayesian statistics, the information gained from preceding patients with the same genotype (and possibly other patient characteristics) can be used to estimate the randomization probabilities for each new patient enrolled. Ultrarapid metabolizers might thus end up with higher doses than the other genotypes. The results of such a trial can ultimately simplify the development of PGx-based therapeutic recommendations and implementation of the PGx test in daily practice. A potential complication does arise in the inference. The trial will be able to demonstrate preference for specific doses for specific genotypes through its resulting allocation ratios. However, sample sizes (per dose and genotype) may not be adequate to actually demonstrate differences between genotypes in dose response in the classical significance testing sense. As an example, the efficacy (and safety) of lower doses may not be comparable between different CYP2D6 genotypes, if insufficient numbers of ultrarapid metabolizers are assigned to this treatment arm.
Adaptation of patient enrollment
Adaptations based on interim analyses in relation to patient enrollment include early termination of a trial and sample size re-estimation. If the interim data show that there is no or minimal effect of the treatment over the comparison, deciding to terminate the trial due to futility saves time, money, and patient inclusion. On the other hand, the drug may already show its efficacy at the time of the interim analysis, which may lead to termination because of demonstrated efficacy. Another reason not to proceed with the trial is safety; an unwanted treatment response may be observed in the interim analysis. These decisions are usually guided by recommendations of an independent Data Monitoring Committee. Bauer and Köhne have proposed a flexible methodology for interim testing, using critical limits for the p-value in the first part of the trial and corresponding stopping rules that allow adaptations in the trial in the second phase [8]. It can also accommodate stopping early for futility. Alternatively, conditional power (CP) can be used at interim, which estimates the probability that the final study result will be statistically significant, given the data observed thus far [9-11]. Preset upper and lower thresholds for the CP can be used to guide decisions to terminate or continue; however, several arbitrary choices are necessary for the calculation of the CP [12]. A group sequential or a continuous sequential test procedure can also be used to allow interim stopping due to efficacy or futility [12]. A crucial part of all methods is the control of the Type I error rate. They differ
in the extent to which they can accommodate adaptations to the trial, with the approach of Bauer and Köhne arguably the most flexible. If, based on the interim analysis, the decision is made to continue the trial, it is possible to recalculate the sample size needed in the second phase of the trial. When the primary endpoint is a clinical event, the event rate for the trial may be difficult to predict in advance, and the overall event rate at the moment of the interim analysis may be used to estimate the additional sample size needed. In the case of a continuous outcome measure, the overall distribution at interim analysis is useful for sample size re-estimation. In both cases it is discouraged to consider the treatment-specific effects for the sample size re-estimation, as bias may be introduced. Nevertheless, the advantage remains that use of interim results for sample size re-estimation can reduce the risk of ending up with an underpowered study, but it is important not to use interim results based on small numbers since these results may be too unreliable.
Example of an adaptive trial with interim analysis and sample size re-estimation
Lecrubier et al. performed a placebo-controlled adaptive trial to evaluate the efficacy of St. John’s wort extract in patients suffering from a mild to moderate major depressive episode [13]. The design included an interim analysis after 169 patients in total, to avoid the ethically questionable exposure of an unnecessarily large number of depressed patients to placebo. The α levels as boundaries for early rejection or acceptance of the null hypothesis were set beforehand at 0.0153 and 0.20 respectively, following Bauer and Köhne [8].
In the interim analysis, a p-value of 0.037 was found for the null hypothesis relating to the difference between the treatment groups in the decrease of depression, which resulted in the decision to continue the trial and recruit another 206 patients (details of the sample size re-estimation not provided by Lecrubier et al.). At completion, the null hypothesis was rejected and St. John’s wort was considered effective in the treatment of mild to moderate depression.
Application in PGx clinical trials
In PGx clinical trials it may be suspected, but not known precisely, that treatment effects are differential across certain genotypes. Adaptive trials with sample size re-estimation may be applied here to ensure that these differential effects can be estimated with sufficient precision. At the planned interim analysis, the following evaluations could take place. If no overall effect is expected, the trial may be stopped due to futility. If the trial continues, one could look at differential effects across genotypes. If there is no indication of differential effects, the sample size of the second stage would only depend on the overall population. If there is an indication of differential effects, sample size re-estimation could take into account the precision with which these differential effects should be estimated at trial completion. It will not be straightforward to set the criteria for the interim decisions concerning the required sample sizes. If differential effects are to be
assessed, large sample sizes may be required. However, in such a design this will only happen in cases where there is a clear indication of a genotype effect. The potential advantage over an enrichment design, where recruitment for one of the genotypes may be discontinued, is that the effects will be known much more accurately for all genotypes. Early termination after interim analysis in PGx clinical trials can be applied not only to the study population as a whole, but also to the subgroups, which means the trial could be continued with only patients with a selected genotype. This design is considered particularly beneficial for PGx when one or a few genetic markers are predictive of treatment response, but it also leads to considerable changes to the trial design and the original hypotheses; hence we describe this design more extensively in the next section.
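Two of the interim rules discussed in this section can be checked numerically. The Python sketch below first applies a two-stage rule of the Bauer and Köhne type, based on Fisher's product of the stage-wise p-values, to the boundaries quoted for the St. John's wort trial (0.0153 and 0.20, interim p-value 0.037). It then computes a conditional power under the assumption that the Z-statistic after n patients has expectation Δ√n; that scaling of Δ, the function names, and the example inputs to `conditional_power` are assumptions made for illustration, not details taken from the cited papers.

```python
from math import exp, log, sqrt
from statistics import NormalDist

NORMAL = NormalDist()


def fisher_critical_value(alpha):
    """c_alpha = exp(-q / 2), with q the (1 - alpha) chi-square quantile, 4 df.

    For 4 degrees of freedom the chi-square survival function is
    exp(-x / 2) * (1 + x / 2), so the quantile is found by bisection.
    """
    lo, hi = 0.0, 100.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if exp(-mid / 2) * (1 + mid / 2) > alpha:
            lo = mid
        else:
            hi = mid
    return exp(-lo / 2)


def overall_alpha(alpha1, alpha0, c_alpha):
    """Type I error of the two-stage rule: reject at interim if p1 <= alpha1,
    accept if p1 >= alpha0, otherwise reject at the end if p1 * p2 <= c_alpha."""
    return alpha1 + c_alpha * (log(alpha0) - log(alpha1))


def interim_decision(p1, alpha1, alpha0):
    if p1 <= alpha1:
        return "stop: reject null hypothesis"
    if p1 >= alpha0:
        return "stop: accept null hypothesis"
    return "continue to stage 2"


def conditional_power(z_interim, n_interim, n_total, delta, c=1.96):
    """P(final Z > c | interim Z), assuming E[Z] = delta * sqrt(n) after n patients."""
    t = n_interim / n_total
    remaining_drift = delta * (n_total - n_interim) / sqrt(n_total)
    return 1 - NORMAL.cdf((c - sqrt(t) * z_interim - remaining_drift) / sqrt(1 - t))


c_alpha = fisher_critical_value(0.025)
print(round(c_alpha, 4), round(overall_alpha(0.0153, 0.20, c_alpha), 3))
print(interim_decision(0.037, 0.0153, 0.20))
# Hypothetical interim result for the conditional power calculation.
print(round(conditional_power(z_interim=1.5, n_interim=200, n_total=400, delta=0.1), 3))
```

The first check shows that boundaries of 0.0153 and 0.20 correspond to an overall one-sided level of about 0.025, and that an interim p-value of 0.037 falls between them, so the trial continues. Under this rule the second-stage p-value would then have to be at most roughly 0.10 (c_alpha divided by 0.037) for the null hypothesis to be rejected at the end.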
Adaptive enrichment
In a trial with an adaptive enrichment design, interim results per subset may guide decisions to continue or drop patients from subsets for the remainder of the trial. When the treatment under study only has its effect in a particular subgroup, the chance of ending up with a significant treatment effect in the overall population is decreased, so enrichment of the study population contributes to the chance of a successful trial. Besides an increase of the power of the study, this design can also ensure that those patients are selected for the trial that may have the largest benefit (or least side-effects). Genetic, physiologic, or other baseline characteristics may distinguish patient subsets that have differing responsiveness to treatment. Such characteristics are usually known before the adaptive trial starts and hence pre-defined. To illustrate the potential benefits of an adaptive design over a fixed design, we simulated an adaptive trial based on the actual results of a published trial with a prospectively planned PGx subgroup analysis on gefitinib, the IPASS trial [4]. Gefitinib is a specific inhibitor of EGFR tyrosine kinase, licensed by the EMA since June 2009 for use in adult patients with locally advanced or metastatic NSCLC with activating mutations of EGFR tyrosine kinase. In the beginning of the clinical development program, clinical trials did not select patients based on EGFR expression (e.g., INTEREST, ISEL, INTACT and IDEAL), but in 2004 several studies showed that mutations in the EGFR gene were associated with the sensitivity of tumors to gefitinib [14-16]. Subsequent post-hoc subgroup analyses of the trials indicated that treatment benefit of gefitinib for patients with EGFR mutations was higher than for patients without EGFR mutations [17;18].
The IPASS trial, in which the evaluation of efficacy outcomes according to EGFR status was a preplanned objective, showed that the treatment response to gefitinib was different for patients with and without a mutation of the EGFR gene [4].
Eligible patients in the IPASS trial had stage IIIB or IV NSCLC, with histologic features of adenocarcinoma, were never-smokers or former light smokers and had received no previous chemotherapy or biologic or immunologic therapy. Patients were randomly assigned to receive either gefitinib or paclitaxel and carboplatin (control). Figure 1 is a schematic presentation of the trial and one of its outcomes: proportion of patients with complete or partial tumor response.
Figure 1. IPASS trial. P1: Proportion of patients with partial or complete tumor response treated with gefitinib; P2: Proportion of patients with partial or complete tumor response treated with carboplatin-paclitaxel (control); g-: Mutation-negative subgroup; g+: Mutation-positive subgroup.
Methods of the simulation study
Based on these results of the IPASS trial, we simulated an adaptive trial, using the methodology as described by Wang et al. [19]. In this simulated adaptive trial we assumed that 200 patients were included before the interim analysis and, for simplicity, we assumed that if the trial continued after the interim analysis, another 200 patients were included (i.e., no sample size re-estimation). Patients were included in a 1:1 ratio with respect to their EGFR mutation status for efficiency reasons, which also seemed reasonable given the IPASS data. They were randomly assigned to either gefitinib or control, stratified according to genotype, and the primary outcome was tumor response. The tumor responses of the first 200 patients were used for the interim analysis. The effects between the treatment groups, both overall and within genotype, were transformed to standardized Z-scores. Subsequently, CP for the treatment effect in the whole population (CP0) was calculated based on the Z-score of the treatment effect and the predefined effect size, using the following equation:
[Equation not recoverable from the source: CPi expressed in terms of c, N0, N, Zi1 and Δ]
with c=1.96 for one-sided testing at a level of 2.5%; N0 is the sample size at the moment of interim analysis, N is the total sample size, Zi1 is the test statistic at the moment of interim analysis for the overall population (i=0) or the subgroup (i=1), and Δ is the planned effect size. Likewise, the CP for the mutation-positive population (CP1) was determined. The CPs were used to decide whether to continue the trial with the whole population in phase 2, to continue only with the mutation-positive patients, or to stop the trial due to futility. The decision scheme, adapted from Wang et al. [19], is presented in Figure 2. The lower threshold (L) was set at 0.7 and the upper threshold (U) at 0.8, following the settings investigated by Wang et al. [19].

Two null hypotheses were formulated, one assuming no effect in the whole population (G0) and one assuming no effect in the mutation-positive population (G1). All testing was done at an α level of 2.5% one-sided and, following Wang et al., a basic assumption was that if there is no effect in G0 there is also no effect in G1. In case the trial continued with the whole population, the two hypotheses were tested hierarchically to control the overall α level. The null hypothesis in the mutation-positive subset was only tested if the hypothesis for the whole population was rejected. In case the trial continued in the subset G1, only the subset hypothesis was tested. If the trial was discontinued due to futility, no hypothesis testing was carried out.

We simulated 10,000 trials for three different scenarios, using SAS 9.2 software. In the first scenario the α level was tested under the null hypothesis, assuming an equal response to gefitinib and control for both subpopulations (40% response rate). This means there is no benefit of gefitinib in the overall population or in the subpopulation, and neither null hypothesis should be rejected. Incorrectly rejecting these hypotheses in this scenario constitutes a Type I error. Thus this scenario verifies whether the α level is maintained at 2.5%. It should be noted that all cases where the treatment effect is 0, but where response rates do differ between subpopulations, also constitute valid null hypotheses, so we investigated a broader range of such scenarios as well. The second scenario was based on the results of the IPASS trial, assuming a 1:1 ratio with respect to EGFR mutation status. In the IPASS trial, the absence of the EGFR mutation resulted in a treatment response to gefitinib that was worse than to the control, which shows that the basic assumption that if there is no effect in G0 there is also no effect in G1 does not always hold in practice. Still, it is expected that in most situations the effect of the genetic variability is not this extreme, so a third scenario was simulated, assuming for the mutation-negative subpopulation an equal response to gefitinib and control (response rate 20%) and for the mutation-positive subgroup the same treatment effects as in the IPASS trial.

Figure 2. Decision scheme. First, the probability of a statistically significant final study result is assessed, for the situation that the trial is continued with the overall population (i.e., CP0). If this probability is considered too low (CP0 < L), the probability of a statistically significant final study result of an enriched trial is assessed (i.e., CP1). If this probability is considered too low as well (CP1 < U), the trial is stopped at interim due to futility. CP: Conditional power; L: Lower threshold for CP; U: Upper threshold for CP. Adapted from [19].

Results of the simulation study
Under the null hypothesis, 86.3% of the trials were stopped due to futility, only 0.06% continued with the whole population and 13.6% continued with the mutation-positive subgroup. The null hypothesis for the whole population was (incorrectly) rejected in 0.04% of the trials. The null hypothesis for the mutation-positive subgroup was rejected in 2.31% of the trials, which indicates that the overall α level across the two hypotheses was maintained. The same was true in case no benefit of the treatment over control was assumed, but responses to treatment and control differed between the mutation-negative and mutation-positive subgroups. In the second scenario, the tumor responses per treatment arm and genotype were set to the results of the IPASS trial.
Under these conditions, only 0.01% of the trials continued the second phase with both mutation-positives and -negatives, 91.4% were continued with only mutation-positive patients in phase 2 and 8.6% of the trials were stopped due to futility (Table 1). In 99.7% of the trials that continued in phase 2, the null hypothesis for the mutation-positive subpopulation was ultimately rejected. The decision regarding the further course of the trial at the moment of interim analysis depends on the treatment effect observed so far. As a result, biases may occur in the estimation of the treatment effect. When the trial continued in the whole population, the effect of gefitinib (proportion of patients with partial or complete tumor response, treated with gefitinib) was overestimated and that of the control was underestimated (Table 1). This is due to the condition that the response difference at the moment of interim analysis must be relatively large in order for CP0 to be larger than L. In case the trial continued with the mutation-positive subset, the smallest bias in the estimates was observed, showing only a small overestimation for the gefitinib arm and a small underestimation for the controls, both for the population as a whole and for the mutation-positive patients. Only when the trial was stopped due to futility was the treatment effect of gefitinib underestimated. A method to correct for this bias is not available at this moment. Changing the values of U and L in the decision scheme had only limited effect on these results: increasing L to 0.8 resulted in 0.0% of the trials continuing with the whole population. Decreasing U to 0.7 resulted in 96.2% continuing with the mutation-positive subset, at the expense of trials stopped for futility.

In scenario 3, 60.2% of the trials continued with the whole population, 31.7% continued with the mutation-positive subset, and 8.2% of the trials stopped due to futility (Table 2). This illustrates that if the treatment effects of gefitinib and of carboplatin/paclitaxel were assumed equal for the mutation-negative patients, the probability of continuing the trial with both mutation-positives and -negatives increased. In contrast to the results of scenario 2, continuation with the mutation-positive subgroup led to an underestimation of the true effect of gefitinib in the whole population. This is caused by the fact that the treatment effect in the mutation-positives is no longer nullified by the mutation-negative patients; therefore, if a large effect of gefitinib in the mutation-positive population was observed in the interim data, the trial continued with the whole population. Only interim data with a smaller gefitinib treatment effect resulted in a trial continued with the subgroup. In only 10.7% of the trials continued in phase 2 with the whole population, the null hypothesis for this population was rejected. On the other hand, in 98.5% of all trials continued in phase 2, the null hypothesis for the mutation-positive subgroup was rejected, still providing substantial overall power.
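The interim decision rule described in the methods can be sketched in a few lines of Python. This is a minimal illustration and not the SAS code used for the simulations. The conditional power function uses the textbook form 1 − Φ((c − Z√t − θ(1 − t))/√(1 − t)) with t = N0/N, which is assumed here and may be parametrized differently from the equation of Wang et al. [19]. The `drift` argument θ (the expected final Z-score under the planned effect), the pooled two-proportion Z-score, and all function names are likewise assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def z_two_proportions(resp_trt, n_trt, resp_ctl, n_ctl):
    """Pooled Z-score for the difference in response proportions.

    Assumption: a pooled-variance two-proportion test; the source only says
    that effects were transformed to standardized Z-scores.
    Undefined (division by zero) if nobody or everybody responds.
    """
    p_trt = resp_trt / n_trt
    p_ctl = resp_ctl / n_ctl
    pooled = (resp_trt + resp_ctl) / (n_trt + n_ctl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_trt + 1 / n_ctl))
    return (p_trt - p_ctl) / se


def conditional_power(z_interim, n_interim, n_total, drift, c=1.96):
    """Textbook conditional power given the interim Z-score.

    drift (hypothetical parameter): expected value of the final Z-score
    if the planned effect size is true.
    """
    t = n_interim / n_total  # information fraction N0 / N
    numerator = c - z_interim * sqrt(t) - drift * (1 - t)
    return 1 - _STD_NORMAL.cdf(numerator / sqrt(1 - t))


def interim_decision(cp0, cp1, lower=0.7, upper=0.8):
    """Decision scheme of Figure 2 with thresholds L (lower) and U (upper)."""
    if cp0 >= lower:
        return "continue_overall"
    if cp1 >= upper:
        return "continue_subgroup"
    return "stop_futility"


if __name__ == "__main__":
    # Illustrative interim Z-scores only; these are not IPASS results.
    cp0 = conditional_power(z_interim=0.3, n_interim=200, n_total=400, drift=2.8)
    cp1 = conditional_power(z_interim=2.4, n_interim=100, n_total=200, drift=2.8)
    print(interim_decision(cp0, cp1))
```

Keeping the decision rule separate from the conditional power calculation makes it easy to vary L and U, as was done in the sensitivity analysis of the thresholds.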
Table 1. Results of the simulated adaptive trials, assuming treatment effects as in the IPASS trial (scenario 2). Values are tumor response, %. C/P: carboplatin/paclitaxel.

                                                                                      Overall population     EGFR mutation-positive     EGFR mutation-negative
                                                                                      Gefitinib    C/P       Gefitinib    C/P           Gefitinib    C/P
Based on IPASS trial                                                                  36.2         35.4      71.2         47.3          1.1          23.5
Average of simulated trials, continued with the whole population (0.01%)              42.5         31.0      81.0         39.0          4.0          23.0
Average of simulated trials, continued with the mutation-positive subgroup (91.4%)    36.5         34.9      71.8         46.8          1.1          23.5
Average of simulated trials, stopped due to futility (8.6%)                           32.1         40.0      63.1         56.5          1.2          23.4

Table 2. Results of the simulated adaptive trials, assuming no difference in treatment effect in mutation-negative patients (scenario 3). Values are tumor response, %. C/P: carboplatin/paclitaxel.

                                                                                      Overall population     EGFR mutation-positive     EGFR mutation-negative
                                                                                      Gefitinib    C/P       Gefitinib    C/P           Gefitinib    C/P
Assumption                                                                            45.6         33.6      71.2         47.3          20.0         20.0
Average of simulated trials, continued with the whole population (60.2%)              46.6         32.6      72.3         46.0          20.8         19.2
Average of simulated trials, continued with the mutation-positive subgroup (31.7%)    43.1         36.3      70.3         48.4          16.9         23.0
Average of simulated trials, stopped due to futility (8.2%)                           41.4         38.4      63.1         56.5          19.8         20.3

Conclusion
As traditional RCTs are time consuming, costly and may not yield a significant result in the overall population if the drug benefits only a subset of this population, innovative strategies in RCT design are needed to increase flexibility and efficiency. Adaptive trial designs can be beneficial in PGx research when prior uncertainty exists regarding the exact role and clinical relevance of genetic variability in drug response. Different adaptive study designs were discussed for PGx research. First, a trial with adaptive randomization, offering the possibility to drop inferior treatment arms per genetic subgroup, or to allocate patients with a specific genotype to the treatment arm with the highest success rate. Second, a trial with adaptive patient enrollment, allowing sample size re-estimation per subgroup if there is an indication of differential effects between genotypes at interim analysis. Third, an adaptive enrichment design, which provides the possibility to start the study with a whole patient population and, if indicated by differential treatment responses, to drop genetic subgroups during the trial, ending up with the patients showing the best treatment response.

With a simple model of an adaptive enrichment trial design we illustrated that in the case of gefitinib, an adaptive enrichment design would have led to a smaller trial, with fewer EGFR mutation-negative patients unnecessarily exposed to the drug, without compromising the α level. Also for a less extreme scenario, the simulation showed that the probability of continuation with a genetic subgroup is still reasonably high. This adaptive design is especially attractive in situations with evidence that certain subpopulations may benefit more from a certain treatment. Currently, most PGx examples stem from oncology, though Kirsch et al. presented evidence that certain commonly prescribed antidepressants may only be effective for those with severe, rather than moderate, depression at baseline [20]. In that case, an interim analysis with the possibility to enrich to a subpopulation defined by certain baseline factors could be considered [21].

In addition, we illustrated that biases will occur in estimating the treatment effect at trial completion. These biases depend on the actual treatment effects, the settings in the adaptive design and of course the actual scenario the adaptive trial will result in. It has been indicated that adaptive trials are more complex to conduct and to analyze (e.g., [1]). In addition, the consequence of population enrichment is that there is limited information on the excluded subgroup(s), which limits the inference on efficacy and safety that can be made for these groups. Early stopping for efficacy results in less knowledge of the treatment compared to a fixed design, especially on safety. Drawbacks of early stopping for benefit have been repeatedly addressed, e.g. recently by Bassler et al. [22].

In conclusion, we have shown there is substantial potential to improve clinical trial efficiency with adaptive designs in PGx research. Critical statistical issues in estimation, controlling α levels, power and CP criteria need to be addressed. Further research is warranted to generalize the findings and to include additional considerations on costs in the choice of the proper design.
Reference List
1. Gallo P, Chuang-Stein C, Dragalin V et al. Adaptive designs in clinical drug development--an Executive Summary of the PhRMA Working Group. J Biopharm Stat 2006; 16(3): 275-83.
2. Hung HM, O'Neill RT, Wang SJ, Lawrence J. A regulatory view on adaptive/flexible clinical trial design. Biom J 2006; 48(4): 565-73.
3. van der Baan FH, Klungel OH, Egberts AC et al. Pharmacogenetics in randomized controlled trials: considerations for trial design. Pharmacogenomics 2011; 12(10): 1485-92.
4. Mok TS, Wu YL, Thongprasert S et al. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma. N Engl J Med 2009; 361(10): 947-57.
5. Ho TW, Mannix LK, Fan X et al. Randomized controlled trial of an oral CGRP receptor antagonist, MK-0974, in acute treatment of migraine. Neurology 2008; 70(16): 1304-12.
6. Giles FJ, Kantarjian HM, Cortes JE et al. Adaptive randomized study of idarubicin and cytarabine versus troxacitabine and cytarabine versus troxacitabine and idarubicin in untreated patients 50 years or older with adverse karyotype acute myeloid leukemia. J Clin Oncol 2003; 21(9): 1722-7.
7. Kirchheiner J, Nickchen K, Bauer M et al. Pharmacogenetics of antidepressants and antipsychotics: the contribution of allelic variations to the phenotype of drug response. Mol Psychiatry 2004; 9(5): 442-73.
8. Bauer P, Kohne K. Evaluation of experiments with adaptive interim analyses. Biometrics 1994; 50(4): 1029-41.
9. Halperin M, Lan KK, Ware JH et al. An aid to data monitoring in long-term clinical trials. Control Clin Trials 1982; 3(4): 311-23.
10. Proschan MA, Lan KKG, Wittes JT. Statistical monitoring of clinical trials. Springer, NY, USA, 2006.
11. Lachin JM. A review of methods for futility stopping based on conditional power. Stat Med 2005; 24(18): 2747-64.
12. van der Tweel I, van Noord PA. Early stopping in clinical trials and epidemiologic studies for "futility": conditional power versus sequential analysis. J Clin Epidemiol 2003; 56(7): 610-7.
13. Lecrubier Y, Clerc G, Didi R, Kieser M. Efficacy of St. John's wort extract WS 5570 in major depression: a double-blind, placebo-controlled trial. Am J Psychiatry 2002; 159(8): 1361-6.
14. Lynch TJ, Bell DW, Sordella R et al. Activating mutations in the epidermal growth factor receptor underlying responsiveness of non-small-cell lung cancer to gefitinib. N Engl J Med 2004; 350(21): 2129-39.
15. Haber DA, Bell DW, Sordella R et al. Molecular targeted therapy of lung cancer: EGFR mutations and response to EGFR inhibitors. Cold Spring Harb Symp Quant Biol 2005; 70: 419-26.
16. Pao W, Miller V, Zakowski M et al. EGF receptor gene mutations are common in lung cancers from "never smokers" and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci U S A 2004; 101(36): 13306-11.
17. Hirsch FR, Varella-Garcia M, Bunn PA, Jr. et al. Molecular predictors of outcome with gefitinib in a phase III placebo-controlled study in advanced non-small-cell lung cancer. J Clin Oncol 2006; 24(31): 5034-42.
18. Douillard JY, Shepherd FA, Hirsh V et al. Molecular predictors of outcome with gefitinib and docetaxel in previously treated non-small-cell lung cancer: data from the randomized phase III INTEREST trial. J Clin Oncol 2010; 28(5): 744-52.
19. Wang SJ, Hung HM, O'Neill RT. Adaptive patient enrichment designs in therapeutic trials. Biom J 2009; 51(2): 358-74.
20. Kirsch I, Deacon BJ, Huedo-Medina TB et al. Initial severity and antidepressant benefits: a meta-analysis of data submitted to the Food and Drug Administration. PLoS Med 2008; 5(2): e45.
21. Rosenblum M, van der Laan MJ. Optimizing randomized trial designs to distinguish which subpopulations benefit from treatment. Biometrika 2011; 98(4): 845-60.
22. Bassler D, Briel M, Montori VM et al. Stopping randomized trials early for benefit and estimation of treatment effects: systematic review and meta-regression analysis. JAMA 2010; 303(12): 1180-7.
Optimizing trial design in pharmacogenetic research
2.3
Comparing a fixed parallel group, group sequential and adaptive selection design on sample size requirements
FH van der Baan*, R Boessen*, RHH Groenwold, ACG Egberts, OH Klungel, DE Grobbee, MJ Knol, KCB Roes
* both authors contributed equally
Submitted
Chapter 2.3
Abstract
A two-stage clinical trial design may be efficient in pharmacogenetic research when there is some, but inconclusive, evidence of effect modification by a genomic marker. Two-stage designs allow early stopping for efficacy or futility (i.e. group sequential), and can offer the additional opportunity to enrich the study population to a specific patient subgroup (i.e. adaptive selection). This study compared sample size requirements for a fixed parallel group, a group sequential and an adaptive selection design with equal overall power and control of the familywise Type I error rate. The designs were evaluated across scenarios that defined the effect size in the marker-positive and marker-negative subgroups, and the genotype distribution in the overall study population. Effect sizes were chosen to reflect realistic planning scenarios, where at least some effect was present in the marker-negative subgroup. In addition, scenarios were considered in which the actual subgroup effect differed from that assumed at the planning stage. As expected, both two-stage designs were generally more efficient than a fixed parallel group design, although the expected advantage was limited unless the difference in assumed subgroup effects was large. The adaptive selection design added little gain in sample size as compared to the group sequential design when the actual effect sizes were equal to those assumed at the planning stage. However, when the actual effect sizes differed strongly in favor of enrichment, the comparative advantage of the adaptive selection design increased, which precisely reflects the adaptive nature of the design.
Introduction
The costs of bringing a new drug to the market are estimated at between 500 million and 2 billion US dollars [1;2]. Late-stage failures and rising costs of Phase II and III clinical trials are considered to be key contributors to this sum. In order to make trials as efficient as possible in terms of time, money and sample size requirements, adaptive trial designs are being explored. Such designs make it possible to modify design aspects of an ongoing trial based on accumulating data without undermining the validity and integrity of the trial [3-5]. Possible adaptations include early stopping for futility or efficacy and restricting patient enrolment after interim analyses to the most promising patient subpopulation [3;5;6]. Adaptive trial designs offer potential for major improvement; they can increase the likelihood of a successful trial and lower the number of patients exposed to an inferior or harmful treatment [5].

Pharmacogenetic (PGx) research investigates the role of genetic variability as an explanation of interindividual differences in response to treatment. In a trial with a fixed parallel group design, the study population is defined beforehand, and may be unselected or selected (e.g., patients with a specific genomic marker). If there is a priori evidence that the genomic marker is a treatment effect modifier (based on, for example, biological knowledge or evidence from similar drugs), but the evidence is inconclusive, the choice whether to include an unselected or selected study population is complicated. In this situation, an adaptive design would allow researchers to start with an unselected population and decide after interim analysis either to continue with the entire population or to enrich to the marker-positive subgroup [7;8].

An adaptive selection trial is most appropriate when there is evidence that the marker-positive subgroup benefits from treatment, while its complement (i.e. the marker-negative subgroup) does not, but the evidence is not yet strong enough to justify a clinical trial exclusively focused on the marker-positive subgroup. Realistic and ethically acceptable assumptions for trial design and sample size planning in this situation would include a clearly positive effect (compared to the control treatment) in the marker-positive population. The assumed effect in the marker-negative population would be positive, but small or even close to absent (potentially not clinically relevant). Establishing the design and sample size on an assumption that the effect in the marker-negative subgroup is absent or even harmful would call into question the ethics of the design. A clinical trial restricted to the marker-positive population would then be clearly indicated.

Previous studies have indicated that adaptive trial designs are often more powerful than fixed parallel group trials when both include an equal sample size [9-11]. This study took a different approach that started from the trial's planning stage and focused on the number of patients needed to achieve a certain power. More specifically, we compared a fixed parallel group design, a group sequential design and an adaptive selection design on sample size requirements across scenarios that defined differential effect sizes for the marker-positive and marker-negative subgroups, and the prevalence of the relevant subgroup in the total study population, while the overall power and Type I error rate of the designs were kept constant. The designs addressed the same family of hypotheses, i.e. whether there is an effect in the overall population, the marker-positive subgroup or both. Moreover, the three designs were also evaluated across scenarios where the actual treatment effects in the patient subgroups were different from those assumed at the planning stage.
Methods

Comparison of study designs
This study compared the sample size requirements of three different study designs with equal overall power, and control of the familywise Type I error rate. The first was a conventional parallel group design in which patients from an unselected patient population were randomized to treatment or control, and the difference between groups (i.e., the treatment effect) was assessed after follow-up. The second was a group sequential design that consisted of two subsequent stages and allowed for early stopping when efficacy or futility was established after stage one. The third was an adaptive selection design that also allowed for early stopping, but offered the additional opportunity to enrich the study population to a pre-specified patient subpopulation in the second stage. Adaptive selection was opted for when the effect at interim was mainly driven by patients from the subpopulation. A graphical representation of the evaluated designs is presented in Figure 1.

Study population
The unselected study population was denoted by G0; G1 was the prespecified patient subpopulation (e.g. marker-positive), and G2 its complement (e.g. marker-negative). ∆i referred to the true average treatment effect in patient cohort i (i=0,1,2). The test statistic of each hypothesis was assumed to (approximately) follow a normal distribution, and was expressed as a standardized Z-score Zi. Finally, f denoted the fraction of G1 subjects in G0. We considered a 1:1 randomization to treatment or control in both G1 and G2. Also, we assumed ∆1 > ∆2, as this is the situation in which an adaptive design is most appropriate. The true effect in the unselected patient population was calculated by:

$$\Delta_0 = f \cdot \Delta_1 + (1 - f) \cdot \Delta_2 \qquad (1)$$
The null hypotheses H00: Δ0=0 versus H10: Δ0>0 and H01: Δ1=0 versus H11: Δ1>0 were tested. Both null hypotheses were considered of equal importance. A list of abbreviations is presented in Appendix 1.

Parallel group design
For the parallel group design (i.e. a single-stage trial with patients from G0), we estimated the sample size NPG required to achieve 80% statistical power, using an iterative optimization method. Power was defined as the probability of rejecting H00, H01 or both. Hypotheses were tested one-sided with the Hochberg multiple testing procedure to control the Type I error [12]. This means that both H00: Δ0=0 and H01: Δ1=0 were rejected when the corresponding p-values were both smaller than the nominal significance level. Alternatively, if only the smallest of these p-values was less than half the nominal significance level, only the corresponding null hypothesis was rejected.

Group sequential design
The two stages of the group sequential design were separated by an interim analysis at time t, where t represented the proportion of the total sample size from which outcomes were available at interim analysis. The standardized test statistic for patient cohort i (i=0,1,2) in stage j (j=1,2) was denoted by zij. The overall test statistic for patient cohort i was a weighted average of zi1 and zi2, calculated as:

$$Z_i = \sqrt{t} \cdot z_{i1} + \sqrt{1 - t} \cdot z_{i2} \qquad (2)$$
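Equations (1) and (2) and the two-hypothesis Hochberg rule translate directly into code. The sketch below is only an illustration of these definitions; the function names are hypothetical, and the significance level `alpha` is left as an explicit argument because its value is not stated in this part of the text.

```python
from math import sqrt


def overall_effect(delta_pos, delta_neg, f):
    """Equation (1): effect in G0 as a mixture of the G1 and G2 effects."""
    return f * delta_pos + (1 - f) * delta_neg


def combine_stages(z_stage1, z_stage2, t):
    """Equation (2): overall Z-score from the two stage-wise Z-scores.

    t is the proportion of the total sample size available at interim.
    """
    return sqrt(t) * z_stage1 + sqrt(1 - t) * z_stage2


def hochberg_two(p_overall, p_subgroup, alpha):
    """Hochberg procedure for the two one-sided null hypotheses.

    Returns (reject_H00, reject_H01).
    """
    # Step 1: if the larger p-value is below alpha, reject both hypotheses.
    if max(p_overall, p_subgroup) <= alpha:
        return True, True
    # Step 2: otherwise a hypothesis is rejected only if its p-value is
    # below alpha / 2 (at most one of them can satisfy this here).
    return p_overall <= alpha / 2, p_subgroup <= alpha / 2


if __name__ == "__main__":
    # Illustrative values only; not taken from the scenarios in this study.
    print(overall_effect(delta_pos=0.4, delta_neg=0.1, f=0.5))
    print(combine_stages(z_stage1=1.8, z_stage2=1.5, t=0.5))
    print(hochberg_two(p_overall=0.030, p_subgroup=0.008, alpha=0.025))
```

Because the weights in Equation (2) are fixed by t, the combined statistic remains standard normal under the null hypothesis when the stage-wise statistics are independent and standard normal.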
Figure 1. Schematic presentation of the three study designs. G0: Unselected study population; G1: Selected study population (marker-positive); PG: Parallel group design; GS: Group sequential design that allows for early stopping at interim for efficacy or futility; AS: Adaptive selection design that allows for both early stopping and population enrichment at interim.
The decision to stop the trial after interim analysis was based on z11 and z01. If both values were below a predefined lower threshold, the trial was stopped for futility. We chose this lower threshold at z11 […]

Added value of pharmacogenetic testing in predicting statin response

[…] 30 kg m⁻² as obese.

Genetic factors
Genomic DNA was isolated from white cells collected at baseline via standard procedures, dissolved in 10 mmol l⁻¹ Tris, 1 mmol l⁻¹ EDTA, pH 8.0, and stored at 4°C. Genotypes of the REGRESS participants for whom DNA was available were ascertained by restriction fragment length polymorphism assays using oligonucleotides and endonucleases as described previously [15-17]. Five common polymorphisms in four genes were selected based on the criterion that they were reported and replicated in the literature to influence responsiveness to statin therapy.

ε2, ε3 and ε4 alleles in ApoE (rs429358 and rs7412)
ApoE has been frequently studied in relation to statin response, especially the three common ApoE alleles ε2, ε3 and ε4, as recently reviewed [6]. The review showed, despite some inconsistencies in findings, that carrying the ε4 allele is associated with poorer response to statin treatment and carrying the ε2 allele is associated with the greatest cholesterol-lowering effect, as found in, amongst others, the STRENGTH study [18], the PROVE IT-TIMI study [19] and the TNT cohort [20]. In our study, three different genotype groups were distinguished: ApoE3 (Apo ε3ε3), ApoE2+ (Apo ε2ε2 or Apo ε2ε3) and ApoE4+ (Apo ε3ε4 or Apo ε4ε4). The rare Apo ε2ε4 (n=4) was excluded from the analysis.

Insertion/deletion (I/D) polymorphism in ACE (rs4646994)
Carrying the deletion allele of the I/D polymorphism in the ACE gene has been associated with a beneficial statin response on different outcome measures in the LCAS population and the PREFACE study [21;22]. Even though the association with statin response is not consistent for the I/D polymorphism [23;24], it was decided to include it since the association has been replicated. Three genotypes were distinguished, based on the deletion allele: ACE II (ACE I/I), ACE ID (ACE I/D) and ACE DD (ACE D/D).

-514C>T polymorphism in HL (rs1800588)
Findings on the -514C>T polymorphism in relation to the response of HDL-c to statin treatment are ambiguous: Zambon et al. described the CC genotype as showing the highest response to statins on HDL [25], whereas the RAP study concluded that carriers of the T allele have the highest HDL-c increase [26]. However, it was decided to include this polymorphism in our analysis, since it has been described more than once to modify the response to statin treatment. As homozygous carriers of the variant allele are rare in the REGRESS data, two HL genotype groups were created: HL-C (HL C/C) and HL-T (HL C/T or HL T/T).

Asp299Gly polymorphism in TLR4 (rs4986790)
In the REGRESS study, TLR4 299Gly carriers had significantly more benefit from pravastatin treatment in preventing cardiovascular events [27].
Subsequently, in a meta-analysis including the REGRESS data, a consistent trend was shown towards reduced frequency of carriers of the variant allele with myocardial infarction in three independent studies [28]. Two genotype groups were made for TLR4, as homozygous carriers of the variant allele are rare: TLR4Asp (TLR4 Asp/Asp) and TLR4Gly (TLR4 Asp/Gly or TLR4 Gly/Gly).

Other polymorphisms for which the association with response to statin treatment has been replicated, for example Trp719Arg in KIF6 (rs20455) [9;10], could not be analyzed since they were not available for the REGRESS participants. The TaqIB polymorphism in the CETP gene has been associated with a different pravastatin response in delaying the progression of coronary atherosclerosis [7], though a meta-analysis of 13,677 subjects concluded that this CETP variant does not influence the response to pravastatin therapy [29], so this polymorphism was not taken into account in this study.
Chapter 3.1
Data analysis
Three linear regression models were built and compared to estimate the added value of SNPs and gene-gene interactions in the explanation of statin response: a model with all non-genetic variables (model 1), a model with all non-genetic variables and the single SNPs (model 2), and a model with all non-genetic variables, the single SNPs and the gene-gene interactions (model 3). In total, thirteen gene-gene interaction terms can be formed with the six genotype variables, which were all added in model 3. These three models were made for the patients randomized to pravastatin and for the patients randomized to placebo. Subsequently, the same three models were built for the overall population, including the same determinants plus the interaction of each determinant with statin treatment. This model shows which factors are effect modifiers of statin treatment. The R², which is the percentage of the total variance of the outcome measure explained by the variables in the model, was used to indicate the performance of the models. With an F test the R² of models 2 and 3 were compared to model 1 to investigate whether the addition of the genetic variables significantly improved the model. In addition, the most notable genetic variables and gene-gene interactions associated with statin response are described. For this exploratory analysis, the variables with a p-value […]

Table 1. […]

                                                   Treatment       Placebo
                                                   (N=361)         (N=346)
Non-genetic
Body mass index, kg m⁻², N (%)
  […] 30)                                          29 (8.0)        32 (9.2)
Familial heart disease, N (%)                      180 (50)        168 (49)
Hypertension by history, N (%)                     99 (27)         101 (29)
Smoking, N (%)
  History                                          320 (89)        300 (87)
  At randomization                                 101 (28)        92 (27)
  During the study                                 107 (30)        89 (26)
History of myocardial infarction, N (%)            181 (50)        161 (47)
CAD, N (%)
  1                                                145 (40)        145 (42)
  2                                                118 (33)        125 (36)
  3                                                98 (27)         76 (22)
LDL-c at baseline, mmol l⁻¹, mean (SD)             4.30 (0.81)     4.33 (0.79)
HDL-c at baseline, mmol l⁻¹, mean (SD)             0.92 (0.22)     0.92 (0.22)
Total-c at baseline, mmol l⁻¹, mean (SD)           6.02 (0.89)     6.05 (0.86)
Triglycerides at baseline, mmol l⁻¹, mean (SD)     1.77 (0.75)     1.80 (0.77)
Table 1. (Continued).

                                    Treatment      Placebo
                                    (N=361)        (N=346)
Genetic
ACE (rs4646994), N (%)
  I/I                               119 (33)       90 (26)
  I/D                               159 (44)       166 (48)
  D/D                               83 (23)        90 (26)
ApoE (rs429358, rs7412), N (%)
  ε2ε2                              4 (1)          0 (0)
  ε2ε3                              38 (10)        36 (10)
  ε3ε3                              212 (59)       211 (61)
  ε3ε4                              100 (28)       91 (26)
  ε4ε4                              7 (2)          8 (2)
HL (rs1800588), N (%)
  C/C                               240 (67)       203 (59)
  C/T                               103 (29)       126 (36)
  T/T                               18 (5)         17 (5)
TLR4 (rs4986790), N (%)
  Asp/Asp                           307 (85)       319 (92)
  Asp/Gly                           52 (14)        26 (8)
  Gly/Gly                           2 (1)          1 (0)
CAD: Presence of coronary artery disease (>50% stenosis) in 1, 2 or 3 vessels; LDL-c: Low-density lipoprotein cholesterol; HDL-c: High-density lipoprotein cholesterol; Total-c: Total cholesterol; ACE: Angiotensin-converting enzyme; ApoE: Apolipoprotein E; HL: Hepatic lipase; TLR4: Toll-like receptor 4. Missing values before imputation in the treatment arm, N (%): Systolic BP 1 (0.3), Diastolic BP 1 (0.3), BMI 17 (5), LDL-c at baseline 3 (1), HDL-c at baseline 3 (1), Familial heart disease 2 (0.6) and CAD 1 (0.3). Missing values before imputation in the placebo arm, N (%): Systolic BP 1 (0.3), Diastolic BP 1 (0.3), BMI 19 (5), LDL-c at baseline 4 (1), HDL-c at baseline 3 (1), Triglycerides at baseline 1 (0.3), Familial heart disease 1 (0.3) and CAD 2 (0.6).
variables in predicting change in LDL-c, although this was not a significant increase (p=0.104). In the placebo group, the genetic variables did have a significant predictive value over the non-genetic variables in predicting LDL-c change without treatment (p=0.016), showing that the genetic variables are associated with the course of LDL-c in patients with CAD. A model with only the genetic variables explained 3.8% of the variance in change in LDL-c in the treatment group and 6.7% in the placebo group.
Added value of pharmacogenetic testing in predicting statin response
Model 1 comprises the non-genetic factors age, BP, BMI, smoking, total-c at baseline, HDL-c at baseline, and triglycerides at baseline. Model 2 comprises the same non-genetic factors as model 1 plus the genotypes of ACE, ApoE, HL and TLR4. Model 3 comprises the same non-genetic factors and genotypes of ACE, ApoE, HL and TLR4 as model 2 plus the 13 gene-gene interaction terms. R² is the total percentage of variance explained and p-values are presented for the comparisons of models 2 and 3 with model 1 (reference).
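The count of 13 gene-gene interaction terms can be checked with a short sketch. It assumes that the six genotype variables are two indicator variables each for ACE and ApoE and one each for HL and TLR4, and that pairs of variables from the same gene are not combined. The variable names are hypothetical labels and the exclusion rule is an assumption; neither is stated explicitly in the text.

```python
from itertools import combinations

# Hypothetical labels for the six genotype variables, mapped to their gene.
# The exact coding used in the study is an assumption here.
genotype_vars = {
    "ACE_ID": "ACE",
    "ACE_DD": "ACE",
    "ApoE2+": "ApoE",
    "ApoE4": "ApoE",
    "HL_variant": "HL",
    "TLR4Gly": "TLR4",
}

# Form all pairs of variables, skipping pairs that belong to the same gene
# (a patient cannot carry two genotypes of one polymorphism at once).
interaction_terms = [
    (a, b)
    for a, b in combinations(genotype_vars, 2)
    if genotype_vars[a] != genotype_vars[b]
]

n_terms = len(interaction_terms)
print(n_terms)
```

Under these assumptions, 15 pairs can be formed from six variables, and removing the two within-gene pairs leaves 13.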
In the third column of Table 2 the R² of the prognostic models for the overall population are presented. The variable ‘treatment’ alone explained 50.0% of the variance; adding the non-genetic factors (model 1) raised the R² to 67.3%. Adding the selected polymorphisms and the gene-gene interaction terms (model 3) resulted in a significant added value (p=0.009) in predicting the change in LDL-c in the overall population over the non-genetic variables. The added value of the genetic factors is presented in Figure 1, subdivided into the contribution of the single loci and the contribution of the gene-gene interactions. The gene-gene interactions appear to be responsible for the largest part of the variance explained by the genetic factors.

Table 3 shows the complete models for patients receiving statins and placebo combined. In model 2, the effect of the selected polymorphisms without gene-gene interactions is presented. The genotypes ACE ID, ApoE2+ and TLR4Gly seem to have an interaction with statin treatment (β=-0.193, p=0.042; β=-0.239, p=0.072; and β=-0.252, p=0.058, respectively). Carrying one of these polymorphisms predicted a good statin response (decrease in LDL-c), which compensated for the predicted increase of LDL-c associated with the polymorphism without statin treatment (β=0.149 for ACE ID, β=0.034 for ApoE2+, β=0.154 for TLR4Gly). Furthermore, for patients on placebo, carrying ACE ID was a predictor of significantly worse LDL-c levels than ACE II (β=0.149, p=0.032), and patients with ApoE2+ had on average 0.205 mmol l⁻¹ more decrease in LDL-c after 2 years of statin treatment than patients with the ApoE3 genotype (p=0.025, data not shown).

When the gene-gene interaction terms were added (model 3), the beneficial effect of statins in combination with the ApoE2+ genotype seemed to have disappeared (β=0.051, p=0.849). However, the positive effect on LDL-c change was still observed in patients with the ApoE2+ genotype in combination with ACE ID (β=-0.562, p=0.068). Carrying the combination of ApoE2+ and TLR4Gly also seemed to result in an interaction with statin treatment (β=-0.796, p=0.079). In the example below, the interpretation of Table 3 is illustrated for a hypothetical patient carrying this combination of ApoE2+ and TLR4Gly. The predicted difference for this patient in LDL-c levels with and without statin treatment was as much as 2.44 mmol l⁻¹. For patients having the ApoE4 genotype in combination with TLR4Gly we also found an interaction with statin (β=-0.604, p=0.040), compensating for the negative effect of this combination of polymorphisms on LDL-c outcome without treatment (β=0.671, p=0.004). The opposite effect was observed for the combination of ACE ID and TLR4Gly: the positive interaction term with statin, indicating an increase in LDL-c (β=0.859, p=0.011), was counterbalanced by a predicted decrease in LDL-c for patients carrying this combination without treatment (β=-0.697, p=0.013).

Table 2. Model performance.

            Treatment            Placebo              Treatment and placebo
            (N=361)              (N=346)              (N=707)
            R²       P-value     R²       P-value     R²       P-value
Model 1     0.430    REF         0.244    REF         0.673    REF
Model 2     0.447    0.126       0.264    0.194       0.682    0.098
Model 3     0.474    0.104       0.321    0.016       0.702    0.009

Model 1 comprises the non-genetic factors; model 2 comprises the same non-genetic factors as model 1 plus the genotypes of ACE, ApoE, HL and TLR4; and model 3 comprises the non-genetic factors, the genotypes, and the gene-gene interaction terms. The models are developed for the overall study population, including interaction terms of each determinant with statin treatment.

Figure 1. Added value (R²) of the genetic factors in explaining pravastatin response. R²: percentage of the total variance of change in LDL-c explained by the genetic predictors. Ref (reference) is the percentage explained by the non-genetic factors.
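The comparison of models 2 and 3 with model 1 reported in Table 2 rests on an F test for nested linear models. The sketch below shows the standard form of that statistic computed from two R² values. The numbers of predictors passed in (11 and 17) are assumptions made for illustration, since the text does not list the exact number of parameters per model, so the result is not expected to reproduce the reported p-values.

```python
def nested_f_statistic(r2_reduced, r2_full, n_obs, k_reduced, k_full):
    """F statistic for comparing two nested linear models from their R² values.

    k_reduced and k_full are the numbers of predictors (excluding the
    intercept) in the smaller and the larger model, respectively.
    """
    df_num = k_full - k_reduced      # number of added predictors
    df_den = n_obs - k_full - 1      # residual degrees of freedom, full model
    f_value = ((r2_full - r2_reduced) / df_num) / ((1.0 - r2_full) / df_den)
    return f_value, df_num, df_den


# Treatment-arm R² values from Table 2 (model 1 vs model 2), N=361.
# The predictor counts 11 and 17 are hypothetical.
f_value, df_num, df_den = nested_f_statistic(0.430, 0.447, 361, 11, 17)
print(round(f_value, 3), df_num, df_den)
```

The p-value is the upper-tail probability of the F distribution with `df_num` and `df_den` degrees of freedom; with SciPy installed it could be obtained as `scipy.stats.f.sf(f_value, df_num, df_den)`.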
Example of the interpretation of Table 3
The beta coefficients are the estimates of the expected change in LDL-c per one-unit increase of the predictor variable. The beta coefficient of a predictor (without interaction) is the impact of this predictor on LDL-c change without statin therapy. The impact of a predictor on LDL-c change under statin therapy is the sum of the beta coefficient of the predictor variable and the beta coefficient of the interaction term of the predictor with the statin intervention. As an example, for a 53-year-old patient with a BMI of 28 and BP 140/90, a former smoker, with total-c 6.26 mmol l⁻¹, HDL-c 1.00 mmol l⁻¹ and triglycerides 2.10 mmol l⁻¹, carrying ACE I/I, Apo ε2ε3, HL C/C and TLR4 Gly/Gly, and treated without a statin, the predicted change in LDL-c is:
2.020 - 53*0.008 - 140*0.003 + 90*0.004 - 0.129 - 6.26*0.359 + 1.00*0.351 + 2.10*0.176 + 0.081 + 0.207 + 0.795 = 0.96 mmol l⁻¹.
For the same patient the predicted change in LDL-c with 2-year statin treatment is:
2.020 - 53*0.008 - 53*0.006 - 140*0.003 + 140*0.006 + 90*0.004 - 90*0.007 - 0.129 + 0.047 - 6.26*0.359 - 6.26*0.186 + 1.00*0.351 + 1.00*0.472 + 2.10*0.176 + 2.10*0.157 + 0.081 + 0.051 + 0.207 - 0.506 + 0.795 - 0.796 - 0.772 = -1.48 mmol l⁻¹.
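The arithmetic of this worked example can be retraced with a short script. It is only a sketch: the coefficient values are those quoted in the example, but the assignment of 0.081, 0.207 and 0.795 to ApoE2+, TLR4Gly and their combination, of -0.506 to the TLR4Gly interaction with statin, and of -0.772 to the statin main effect is inferred from the order of the terms and is an assumption. All variable names are hypothetical.

```python
INTERCEPT = 2.020
STATIN_MAIN_EFFECT = -0.772  # assumed to be the statin main effect

# predictor: (beta without statin, beta of the interaction with statin)
COEFFICIENTS = {
    "age": (-0.008, -0.006),
    "bp_systolic": (-0.003, 0.006),
    "bp_diastolic": (0.004, -0.007),
    "smoking_history": (-0.129, 0.047),
    "total_c": (-0.359, -0.186),
    "hdl_c": (0.351, 0.472),
    "triglycerides": (0.176, 0.157),
    "apoe2": (0.081, 0.051),             # label assumed
    "tlr4gly": (0.207, -0.506),          # label assumed
    "apoe2_x_tlr4gly": (0.795, -0.796),  # label assumed
}

# The example patient; BMI 28, ACE I/I and HL C/C add no terms in the example.
patient = {
    "age": 53, "bp_systolic": 140, "bp_diastolic": 90, "smoking_history": 1,
    "total_c": 6.26, "hdl_c": 1.00, "triglycerides": 2.10,
    "apoe2": 1, "tlr4gly": 1, "apoe2_x_tlr4gly": 1,
}


def predicted_ldl_change(values, on_statin):
    """Sum the main effects and, under statin, the interaction terms."""
    total = INTERCEPT + (STATIN_MAIN_EFFECT if on_statin else 0.0)
    for name, value in values.items():
        main, with_statin = COEFFICIENTS[name]
        total += value * main
        if on_statin:
            total += value * with_statin
    return total


without_statin = predicted_ldl_change(patient, on_statin=False)
with_statin = predicted_ldl_change(patient, on_statin=True)
print(round(without_statin, 2), round(with_statin, 2))
```

The two rounded results match the 0.96 and -1.48 mmol l⁻¹ given in the example, and their difference corresponds to the 2.44 mmol l⁻¹ mentioned in the text.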
Discussion
Our study demonstrates that the five selected polymorphisms and the thirteen corresponding gene-gene interactions have a small added value over the non-genetic predictors in predicting LDL-c change as response to statins, and also in predicting LDL-c in non-treated patients. Moreover, the gene-gene interactions appear to be responsible for the largest part of the variance explained by the genetic factors. Our combined model for statin and placebo users showed three notable interactions between single polymorphisms and statin treatment (model 2). The interaction between statin and carrying one deletion allele of the I/D polymorphism in ACE on LDL-c change has been described before [21]; however, we did not find an interaction with the D/D genotype. We identified an interaction between Apo ε2 and statins and found Apo ε2 to be a predictor of better statin response in terms of LDL-c change than Apo ε3, in concordance with earlier findings [6]. The interaction we described between statin and carrying a TLR4 299Gly allele on LDL-c change was also found for a lower risk of cardiovascular events as outcome for statin treatment [28]. An association between HL and statin response [25;26] could not be replicated in our data. Removing this polymorphism from our analysis resulted in similar added value for the remaining genetic factors in terms of R² for both placebo and statin models and in lower p-values, as fewer variables were added to the models (data not shown).
Table 3. Multivariate linear regression models for change in LDL-c in all patients (N=707).

                            Model 1                    Model 2                    Model 3
                            β      St error  P value   β      St error  P value   β      St error  P value
Age                         -.006  .004      .111      -.007  .004      .082      -.008  .004      .048
Age * statin                -.007  .005      .174      -.006  .005      .247      -.006  .005      .270
BMI normal                  .049   .063      .435      .055   .063      .384      .049   .063      .438
BMI normal * statin         .016   .086      .857      -.008  .087      .924      .017   .087      .846
BMI overweight              REF                        REF                        REF
BMI obese                   -.020  .102      .848      -.021  .102      .839      -.056  .102      .585
BMI obese * statin          .220   .147      .137      .194   .148      .190      .276   .149      .064
BP systolic                 -.002  .002      .242      -.002  .002      .339      -.003  .002      .159
BP systolic * statin        .005   .003      .081      .004   .003      .132      .006   .003      .046
BP diastolic                .004   .004      .231      .003   .004      .360      .004   .004      .294
BP diastolic * statin       -.007  .005      .169      -.006  .005      .265      -.007  .005      .182
Smoking history             -.108  .087      .211      -.107  .087      .220      -.129  .088      .143
Smoking history * statin    .046   .126      .715      .008   .126      .952      .047   .128      .711
Smoking baseline            .047   .094      .616      .062   .095      .514      .086   .096      .370
Smoking baseline * statin   .059   .125      .638      .045   .126      .723      .028   .129      .830
Smoking in study            .187   .096      .051      .173   .096      .073      .175   .096      .070
Smoking in study * statin   .032   .125      .801      .058   .126      .645      .068   .126      .590
Total-c baseline
-.351
.037