Pharmacogenomics - Pitt Public Health

Research Article For reprint orders, please contact: [email protected]

Pharmacogenomics

Personalized pharmacogenomics profiling using whole-genome sequencing

Aim: Pharmacogenomics holds promise to rationalize drug use by minimizing drug toxicity while increasing drug efficacy. There are currently several assays to screen for known pharmacogenomic biomarkers for the most commonly prescribed drugs. However, these genetic screening assays cannot account for other known or novel pharmacogenomic markers.

Materials & methods: We analyzed whole-genome sequences of 482 unrelated individuals of various ethnic backgrounds to obtain their personalized pharmacogenomics profiles.

Results: Bioinformatics analysis revealed 408,964 variants in 231 pharmacogenes, of which 26,807 resided in exons and proximal regulatory sequences and 16,487 were novel. In silico analyses indicated that 1012 novel pharmacogene-related variants possibly abolish protein function. We also performed whole-genome sequencing analysis in a seven-member family of Greek origin in an effort to explain the variable response to acenocoumarol treatment in two family members.

Conclusion: Overall, our data demonstrate that whole-genome sequencing, unlike conventional genetic screening methods, is necessary to determine an individual's pharmacogenomics profile in a more comprehensive manner, which, combined with gradually decreasing whole-genome sequencing costs, would expedite bringing personalized medicine closer to reality.

Original submitted 25 November 2013; Revision submitted 30 June 2014

Keywords: drug metabolism • gene variants • personalized pharmacogenomics profile • pharmacogenomics • whole-genome sequencing

10.2217/PGS.14.102 © 2014 Future Medicine Ltd

Background

Pharmacogenomics aims to correlate genomic variation and gene expression with drug efficacy and/or toxicity [1]. To date, several genes, also referred to as pharmacogenes, have been associated with the absorption, distribution, metabolism, excretion and toxicity (ADMET) of several drugs [1–3]. A number of genotyping methods, namely PCR or microarray-based assays, are currently available to screen for known pharmacogenomic markers in well-documented pharmacogenes. The most commonly used microarray-based platforms are the AmpliChip CYP450 Test™ (Roche Diagnostics, Basel, Switzerland), a CE in vitro diagnostic certified assay that analyzes 29 known CYP2D6 gene variants, including gene duplication and deletion, as well as two frequent CYP2C19 gene variants [4], and the DMET+ assay (Affymetrix, CA, USA), which allows simultaneous analysis of 1936 pharmacogenomic biomarkers in 231 pharmacogenes [5,6]. In addition, a high-throughput genotyping assay is also available [7]. However, these platforms, like every other microarray-based genetic screening approach, carry the inherent risk of missing novel, unique or rare variants in ADMET-related genes, which may affect either ADMET gene expression or enzyme structure and may hence lead to variable response to, or increased toxicity of, commonly prescribed drugs.

Pharmacogenomics (2014) 15(9), 1223–1234

Clint Mizzi1, Brock Peters2, Christina Mitropoulou3, Konstantinos Mitropoulos4, Theodora Katsila5, Misha R Agarwal2, Ron HN van Schaik3, Radoje Drmanac2, Joseph Borg‡,6,7 & George P Patrinos*,‡,5
1 Laboratory of Molecular Genetics, Department of Physiology & Biochemistry, University of Malta, Msida, Malta
2 Complete Genomics, Inc., Mountain View, CA, USA
3 Erasmus MC, Faculty of Medicine & Health Sciences, Department of Clinical Chemistry, Rotterdam, The Netherlands
4 The Golden Helix Foundation, London, UK
5 University of Patras, School of Health Sciences, Department of Pharmacy, University Campus, Rion, GR-26504, Patras, Greece
6 Department of Applied Biomedical Science, Faculty of Health Sciences, University of Malta, Msida, Malta
7 Erasmus MC, Faculty of Medicine & Health Sciences, Department of Cell Biology, Rotterdam, The Netherlands
*Author for correspondence: Tel.: +30 2610 9692339; gpatrinos@upatras.gr
‡ Authors contributed equally

ISSN 1462-2416

The advent of next-generation sequencing has created unprecedented opportunities to analyze whole genomes, which, unlike conventional medium- or even high-throughput genetic screening approaches, allows us to obtain a full picture of people's variomes. Whole-exome and/or whole-genome sequencing can now be easily performed on several commercially available or proprietary platforms to comprehensively analyze genome variation with a high degree of accuracy and at a reasonable cost compared with the not so distant past.

Although whole-exome sequencing is currently more cost effective than whole-genome sequencing, it has a number of limitations [8], particularly in pharmacogenomics [9,10]: our knowledge of all truly protein-coding exons in the genome is still incomplete, so current capture probes can only target known exons; there is a degree of variability between the various commercial target enrichment kits; regulatory and untranslated regions are not sequenced; and there is a significant bias in the target enrichment step, since the efficiency of capture probes varies considerably and some sequences fail to be targeted by capture probe design altogether. As such, not all templates are sequenced with equal efficiency and not all sequences can be aligned to the reference genome so as to allow base calling; therefore, a significant proportion of variants may go undetected. The alignment problem may be particularly troublesome for paralogous genes, as in the case of CYP2D6 gene resequencing [9,10] using certain next-generation sequencing platforms.

Here, we exploited whole-genome sequencing to identify novel and putatively causative genomic variants affecting the structure and function of 231 pharmacogenes in a large number of human genomes from various ethnic backgrounds, aiming to demonstrate the advantages of this method over the commonly used conventional genetic screening approaches. We also used whole-genome sequencing to determine the personalized pharmacogenomics profiles of a seven-member family of Greek origin and to relate these profiles to anticoagulation treatment response in two of these family members.

Materials & methods

Case selection

We analyzed whole-genome sequences from 482 nondiseased individuals of various ethnicities, as previously described [11,12]. Analysis was performed in two stages: a pilot study of 69 publicly available genomes [13] from various ethnic backgrounds (Supplementary Table 1; see online at: www.futuremedicine.com/doi/suppl/10.2217/pgs.14.102); and a follow-up study including 413 Caucasian genomes from a collection of healthy octogenarians analyzed on the same sequencing platform (Wellderly Study). In addition, a seven-member family of Greek origin was independently analyzed in an effort to relate acenocoumarol treatment efficacy in two family members to their underlying pharmacogenomic profile (Supplementary Figure 1). Informed consent was obtained from all individuals who took part in this study and the study was approved by the ethics committees of the partnering institutions.

Prothrombin time & international normalized ratio measurement

A total of 5 ml of peripheral blood was obtained by venous puncture from two family members (family members 4 and 7) receiving oral anticoagulant treatment and collected into Vacutainer tubes containing 0.105 M trisodium citrate. Prothrombin times were subsequently determined as previously described [14].

DNA isolation & whole-genome sequencing

Genomic DNA was isolated from saliva using the Oragene collection kit (DNA Genotek, Ontario, Canada). Whole-genome sequencing was performed using Complete Genomics' (CA, USA) DNA nanoarray platform [11]. DNA sequencing coverage was 110×.

Bioinformatics & in silico analyses

Genomes were aligned to the hg19 reference genome. A list of nonredundant variants in 231 ADMET-related genes [5] was generated from all 482 human genomes. The list was annotated with Annovar, which was also used to assess the novelty of the variants against dbSNP 137. The selected variants were compared against the 1936 known pharmacogenomics variants documented in these 231 genes [5]. The combined initial analysis of the whole-genome sequenced DNA resulted in a list of ADMET gene-related variants found in exons, introns, gene promoters and proximal regulatory regions, taking the gene definitions for hg19 found on the Complete Genomics public server [15] and extending them by 200 bases upstream and 200 bases downstream. Functional variants (exonic and regulatory) are defined as variants found in coding sequences, splicing sites, upstream and downstream regions, and 5´- and 3´-UTRs. The variants were filtered according to the analysis required using custom scripts and Complete Genomics Analysis Tools (CGA™ Tools, available from [16]). To obtain predictions for a list of possible functional variants, we employed the SIFT (available from [17]) and PROVEAN (available from [18]) algorithms, which predict whether a given sequence variant is likely to affect protein function. PROVEAN is able to make predictions for any type of protein sequence alteration, including single or multiple amino acid substitutions, deletions and insertions [19].

Results

Whole-genome sequencing reveals a large number of genomic variants in the 231 pharmacogenes
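Before turning to the counts, the selection step described in the methods (gene intervals extended by 200 bases on each side, with novelty judged by the absence of a dbSNP annotation) can be pictured with a small sketch. This is only an illustration under stated assumptions, not the custom scripts or CGA Tools workflow used in the study: the `Gene` and `Variant` classes, the function names, the toy coordinates and the placeholder identifier `rs0000001` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

PAD = 200  # bases added on each side of a gene, as stated in the methods


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    start: int  # first base of the gene definition
    end: int    # last base of the gene definition


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    rsid: Optional[str] = None  # None means "no dbSNP annotation" (assumption)


def in_padded_gene(v: Variant, g: Gene, pad: int = PAD) -> bool:
    """True if the variant lies within the gene interval extended by `pad` bases."""
    return v.chrom == g.chrom and g.start - pad <= v.pos <= g.end + pad


def select_and_classify(
    variants: Iterable[Variant], genes: List[Gene], pad: int = PAD
) -> Tuple[List[Variant], List[Variant]]:
    """Keep variants inside any padded gene and split them into (known, novel)."""
    known: List[Variant] = []
    novel: List[Variant] = []
    for v in variants:
        if any(in_padded_gene(v, g, pad) for g in genes):
            (known if v.rsid else novel).append(v)
    return known, novel


# Hypothetical toy data: one gene and five variants.
genes = [Gene("GENE_A", "chr10", 1_000, 2_000)]
variants = [
    Variant("chr10", 900, "C", "T", "rs0000001"),  # upstream padding, annotated
    Variant("chr10", 750, "G", "A"),               # outside the padded interval
    Variant("chr10", 1_500, "A", "G"),             # inside the gene, not annotated
    Variant("chr10", 2_200, "T", "C"),             # last padded base, not annotated
    Variant("chr11", 1_500, "A", "G"),             # different chromosome
]
known, novel = select_and_classify(variants, genes)
print(len(known), len(novel))
```

Treating the padded interval as inclusive at both ends is a design choice of this sketch; the paper does not state how boundary positions were handled.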

We first analyzed 69 publicly available genomes to estimate the extent of genome variation in the 231 ADMET-related genes. In a second round of analysis, we included 413 additional human genomes from the Wellderly Study to verify the findings of the pilot study. In the total of 482 genomes analyzed, we identified 408,964 variants in ADMET-related genes, of which 239,194 (58.5%) were found only once (allele frequency = 0.2%) and 38,636 reached frequencies of over 20%. Also, 26,807 variants were located in exons and proximal regulatory regions. The quality of the variant calls was taken into account: from the available public genomes we selected the variants flagged as high quality in the data files, as judged from the ti/tv, homo/het and dN/dS ratios and from missingness analysis (Supplementary Table 2). On average, 17,733 variants were found per individual in these 231 ADMET-related genes. By comparison with the latest version of dbSNP (Release 137), we identified a total of 16,487 variants (4% of all variants) in exons and regulatory regions that were not annotated in dbSNP and are likely to have functional significance (Figure 1); the majority of these (72.99%), namely 12,034, were found only once, on one allele. These data are summarized in Table 1. It must be noted that analysis with the DMET+ chip would reveal, on average, a mere 249.52 variants per individual in the 231 pharmacogenes.

[Figure 1 image: karyotype-like ideogram of chr1–chr22 and chrX; legend: known variant, novel variant; labeled genes: UGT1A1, CYP2C19, CYP2C9, VKORC1, CYP2D6, TPMT.]

Figure 1. Depiction of the known and novel genomic variants in absorption, distribution, metabolism, excretion and toxicity related genes. Pharmacogenes are found throughout the human genome and, using next-generation sequencing technology at the whole-genome level, we identified common and rare variants across the whole genome. To depict these findings, we prepared a karyotype-like graphical representation of the variants in coding and regulatory regions (e.g., 200 bp up- and down-stream of transcription start and polyadenylation sites, respectively) of the absorption, distribution, metabolism, excretion and toxicity related genes of all 482 individuals whose genomes were analyzed. Variants are displayed at their exact physical location in the genome; a dark blue line indicates an area containing known variants and sites of novel variants are marked in red. The locations of the most important pharmacogenes are depicted with arrows. Chr: Chromosome.

Table 1. Overview of the number of genomic variants that have been identified in the CG69 and Wellderly genome collections, by whole-genome sequencing analysis.

Variants | Pilot (CG69), n | Pilot (CG69), % | Follow-up (413 Wellderly), n | Follow-up (413 Wellderly), %
Total | 87,027 | 100 | 321,924 | 100
Functional variants | 4958 | 5.6 | 21,849 | 6.78
Unique variants | 31,637 | 36.35 | 207,557 | 50.75
Total, average in each individual | 15,791 | n/a | 18,058 | n/a
Novel, potentially functional | 682 | 0.78 | 15,803 | 4.9
Novel variants in exome | | | 4480 |
Novel, potentially functional found only on one allele | 562 | 0.65 | 11,472 | 3.56
Novel, potentially functional with frequency >1% | 123 | 0.14 | 738 | 0.23
Tandem substitutions | 2631 | 3.02 | 39,806 | 12.36
Insertions | 5096 | 5.85 | 39,497 | 12.27
Deletions | 6426 | 7.38 | 38,482 | 11.95
Number of variants found in DMET+ microarray (range) | 285–366 | | 307–397 |
DMET+ coverage, %† (range) | 33.05–41.00 | | 26.39–33.64 |

†Please see text for details. n/a: Not applicable.

To further demonstrate the applicability of this approach, we focused on the most important pharmacogenes, for which genetic tests are integrated in several drug labels by different regulatory bodies. We chose the CYP2D6, CYP2C9, VKORC1, UGT1A1 and TPMT genes, since these are not only the most well-documented pharmacogenes but have also been shown to be involved in the metabolism of a variety (over 250) of the most commonly prescribed drugs, such as antiepileptic drugs, antidepressants and proton-pump inhibitors, while variants in these genes are well-established pharmacogenomic markers that are used in clinical applications and incorporated in drug labels. Our analysis revealed 2521 novel variants in these five pharmacogenes, of which 202 reside in exons and proximal regulatory regions. Also, 11 of these variants reach frequencies of over 1% and, as such, might constitute important pharmacogenomic markers for the most commonly prescribed drugs, since they might affect the overall enzyme activity. An overview of these findings is displayed in Table 2 & Figure 2.

Table 2. Overview of the number of the novel and total genomic variants identified in the CYP2D6, CYP2C9, VKORC1, UGT1A1 and TPMT genes.
Columns: total (T) and novel (N) counts for each gene in the CG69 and Wellderly collections. Rows: total; total, average per individual; functional variants; novel variants in exome; novel, potentially functional found only on one allele; novel, potentially functional with frequency >1%; tandem substitutions; insertions; deletions; DMET+ coverage; DMET/total ratio (%).
See also Figure 2. N: Novel; n/a: Not applicable; T: Total.

Figure 2. Novel variants identified in the most important pharmacogenes. The total number of variants (depicted in blue above each gene) is compared with the number of novel variants (not annotated in dbSNP 137; depicted in red above each gene) identified in the most important pharmacogenes, namely CYP2D6, CYP2C9, VKORC1, UGT1A1 and TPMT. Green dots depict variants with a mutant allele frequency of >20%, while black dots depict variants found both on the DMET+ Array and in the genomes. See also Table 2.

Subsequently, we sought to determine the rate of novel versus known genomic variants present in these 231 ADMET-related genes, and in particular the location (e.g., exons, introns and regulatory regions) and nature (e.g., substitutions and indels) of these variants. Our data showed that the ratio of ADMET gene-related variants found on the DMET+ microarray (Affymetrix; this array includes variants in intronic regions) to functional variants (namely variants found in exons, 5´-UTRs and 3´-UTRs, and variants that overlap the 1-kb regions up- and down-stream of transcription start sites) was between 26.39 and 40% in our study sample (Figure 3A). Across all genomes, 1709 variants that were either novel or known and annotated in dbSNP, but not present on the DMET+ array, were found with frequencies >20%. Taken together with the previous data, these findings underline the potential of whole-genome sequencing to capture several novel and potentially important ADMET-related variants in individual patients.

Furthermore, we used the data available from the pilot study sample to analyze the occurrence of novel ADMET gene-related variants and calculated the novel versus known variant ratio among different ethnicities. Our analysis indicated that novel ADMET gene-related variations occur significantly more frequently in certain ethnic groups than in others (Figure 4A & B). In particular, in Gujarati Indians in Houston, Texas (GIH), Japanese in Tokyo, Japan (JPT) and Maasai in Kinyawa, Kenya (MKK), the ratio of novel to known ADMET gene-related variants is almost double that of other ethnicities. This finding further highlights the importance of determining the incidence of pharmacogenomic markers in distinct ethnic groups and populations, hence moving from personalized to 'populationalized' medicine [20].

In silico analysis

To determine whether a fraction of the novel exome variants has functional significance, for example, by diminishing gene expression or by exerting a damaging effect on the resulting protein product, we performed in silico analysis using the PROVEAN and SIFT algorithms [19,21]. In this analysis, we paid particular attention to the five very well-known pharmacogenes, namely CYP2D6, CYP2C9, UGT1A1, TPMT and VKORC1. The analysis showed that a number of missense mutations in these genes are likely to have a damaging effect on their respective protein products. This in turn may directly impact drug metabolism, which may lead to reduced or diminished drug efficacy and/or increased drug toxicity. Analysis of the novel variants found in the CYP2D6 gene revealed three missense mutations, one of which affects two amino acids in the protein (p.Leu91Met and p.His94Arg) since it is a substitution DNA mutation, and six frameshift mutations. In TPMT, analysis revealed three missense, one nonsense and one frameshift mutation. The frameshift and nonsense mutations, as evidenced by SIFT and Polyphen analysis, putatively have a deleterious and damaging effect on the enzymes, while missense mutations can impact the activity of the enzyme.

A number of these missense mutations were mapped onto 3D protein structures to highlight the location of the mutated amino acid (Supplementary Figure 1). Data files and coordinates were obtained from the Protein Data Bank (PDB) of the Research Collaboratory for Structural Bioinformatics (RCSB) for the TPMT (pdb 2h11), CYP2C9 (pdb 1OG2) and CYP2D6 (pdb 2F9Q) proteins and analyzed with a desktop version of PyMOL v1.6 [22,23]. PyMOL was used to highlight the significant amino acid changes between the wild-type (normal) and the mutant (abnormal) protein sequences. The 3D structures depict a single polypeptide chain out of a possible tetramer, as in the case of CYP2D6, or duplex, as in the case of CYP2C9 and TPMT. An interesting nonsense mutation in TPMT, which ablates the last nine amino acids at the C-terminal end of the protein, was also highlighted in the structure (Supplementary Figure 1).

Delineating personalized pharmacogenomics profiles with anticoagulation treatment efficacy

Finally, in order to demonstrate the applicability of this approach in a real-life clinical scenario, we have performed whole-genome sequencing in seven members (three generations) of a family of Greek origin, to identify novel and putatively functional pharmacogenomic markers in the ADMET-related genes and attempted to correlate these with response to acenocoumarol treatment. In particular, family members 4 and 7, who are unrelated (Supplementary Figure 2), are both diagnosed with atrial fibrillation and undergo acenocou-

1228

Pharmacogenomics (2014) 15(9)

marol treatment. However, although family member 7 responds well to acenocoumarol treatment, assessed by stable drug dosing (6 mg/week for the last 36 months) and the stable international normalized ratio (INR) measurement (within 2–3; Supplementary Figure 3), family member 4 does not respond well to acenocoumarol treatment, deducted by the fluctuating INR values and underlying drug dosing scheme and the hospitalization for severe bleeding. Our data showed that around one-third of the variants where different between the two patients, where 33.44% of functional variants found in family member 4 are not found in family member 7, and 34.29% of functional variants found in family member 7 are not found in family member 4. Similarly, our analysis revealed 38 novel putatively functional variants in family member 4 that are not found in family member 7, and 40 novel putatively functional variants found in family member 7 that are not found in family member 4 (Supplementary Table 3) . We also studied novel variants found in the trio (father,

future science group

Personalized pharmacogenomics profiling using whole-genome sequencing 

Research Article

45 40

Percentage (%)

35 30 25 20 15 10 5 0 DMET coverage

Novel functional variants

Substitutions

Insertions

Deletions

Variant category Figure 3. Distribution of absorption, distribution, metabolism, excretion and toxicity gene-related variants in the 482 human genomes. Percentage of absorption, distribution, metabolism, excretion and toxicity gene-related variants in each genome, classified into five categories: DMET coverage [5] , novel functional variants, tandem substitutions, insertions and deletions (shown on the x-axis).

mother and child) with data found in the other datasets. Around 5000 high-quality novel variants were identified from the trio. We filtered variants found in CG69 and Wellderly datasets resulting in 1700 variants, Polyphen annotations were added to this list and only those that were annotated were kept resulting in seven variants (Supplementary Table 4) . Looking closer to the genes that are responsible for acenocoumarol treatment, we found that family member 4 is hetero­zygous for the CYP2C9 rs1799853 variation (CYP2C9*2) and for the rs9332242 variant in the 3´-UTR of the CYP2C9 genes. On the contrary, family member 7 had no variations in the CYP2C9 gene (Table 3). Our approach identified CYP2C9 and CYP2C19 gene variants not found on the DMET+ chip and the DMET coverage was an average of 36.18%, similar to those found in the 482 genomes. However, although we failed to identify novel functional variants in the CYP2C9 and VKORC1 genes, related to acenocoumarol treatment, we identified several novel variants found in other pharmacogenes, some of them in genes involved in the clopidogrel pathway (Supplementary Figure 4) [24] . The lack of any sequence variants in genes involved in alternative anticoagulation therapies, such as clopidogrel, may suggest that family member 4 could consider altering her anti­coagulation treatment modality from acenocoumarol to clopidogrel. Similarly, the plethora of known and novel genomic variants not only in the

future science group

CYP2C19 gene (namely rs17885098 and rs3758581, both belonging to haplotypes linked to increased toxicity risk), but also in other genes, such as ABCB1, CYP2B6 and CYP3A5, all of which are involved in the clopidogrel pathway (Supplementary Figure 3), strongly suggest that family member 7 should, under no circumstances, modify the anticoagulation treatment modality. At this point, it should be stressed that although acenocoumarol and clopidogrel act via a different mechanism (acenocoumarol is an anticoagulant, while clopidogrel is an antiplatelet agent), they are both frequently used to treat patients with atrial fibrillation, but should not generally be considered as therapeutic substitutes. Discussion The postgenomic revolution, characterized by the advent of massively parallel whole-genome and -exome sequencing, has led to the correlation of specific genomic variants with disease predisposition and other clinical features, including response to some of the most commonly prescribed drugs. There are currently very few studies to demonstrate the usefulness of whole-genome sequencing in pharmacogenomics. Initially, Ashley and coworkers showed that variants in a person’s genome could suggest probable clopidogrel resistance, positive response to lipid-lowering therapy and indicated the need for a low initial dose for warfa-

www.futuremedicine.com

1229

LWK

CHB

0

2

4

6

8

12 10

14

16

18

CG69 collection ethnic groups Known/novel

0

1

2

3

4

CG69 collection ethnic groups

Novel variants

ASW

MXL

CHB

CG69 collection ethnic groups

MKK

GIH

TSI

CEPH/UTAH

YRI

CEPH/UTAH

CEU

PUR

Percentage (%)

Figure 4. Differences observed in the rates of novel and known variants among different ethnicities. (A) Percentages of known absorption, distribution, metabolism, excretion and toxicity (ADMET) gene-related variants, present in the DMET+ platform, distributed over the 12 different ethnic groups of the pilot CG69 human genomes analysis. Please also see the Supplementary Material). (B) Percentages of novel ADMET gene variants in the 69 human genomes analyzed, distributed over the different ethnic groups (please also see the Supplementary Material). (C) Known/novel ADMET gene-related variants in the various ethnic groups analyzed. The 12 ethnic groups comprising the CG69 collection is shown at the x-axis. ASW: African ancestry in Southwest USA; CEPH/UTAH: Utah residents with ancestry from northern and western Europe; CEU: Utah residents with northern and western European ancestry from the CEPH collection; CHB: Han Chinese in Beijing, China; GIH: Gujarati Indians in Houston, Texas; JPT: Japanese in Tokyo, Japan; LWK: Luhya in Webuye, Kenya; MKK: Maasai in Kinyawa, Kenya; MXL: Mexican ancestry in Los Angeles, California; PUR: Puerto Ricans in Puerto Rico; TSI: Toscani in Italia; YRI: Yoruba in Ibadan, Nigeria.

30

JPT Percentage (%)

32

MXL PUR

34

TSI CEPH/UTAH

36

GIH

YRI

ASW

CEU

Percentage (%)

JPT

38

MKK

CHB

5

LWK

40

PUR

MXL

42

CEU ASW

JPT

6

YRI GIH

Pharmacogenomics (2014) 15(9)

TSI

LWK

1230 MKK

DMET coverage

Research Article  Mizzi, Peters, Mitropoulou et al.

future science group

future science group

96602622 chr10

7.4%

NA

The family tree and corresponding international normalized ratio values are provided in Supplementary Figures 1 & 2, respectively. Chr: Chromosome; NA: Not available; SNV: Single nucleotide variation.

Nonsynonymous SNV CYP2C19 Exonic rs3758581 A G 96602623

CYP2C19 Exonic rs17885098 C T 96522561

Research Article

Yes

 

Synonymous SNV

No 96522560

No

No NA

NA CYP2C9

CYP2C9 3’-UTR rs9332242

rs186217659 Downstream G

G C

T 96749327

96748893

chr10

96749326 chr10

Found in family member 7 but not in family member 4

96748892 chr10

Yes Nonsynonymous SNV CYP2C9 Exonic rs1799853 T C 96702047

NA

16% 96702046 chr10

NA

  Found in family member 4 but not in family member 7

Frequency in Greece Found in the DMET+ Array Exonic effect Gene Location dbSNP 137 annotation Variant sequence End Reference nucleotide  sequence Start nucleotide Chromosome

rin [25] . In another study, Drögemöller and coworkers attempted to address for the first time the application of whole-genome resequencing in pharmaco­genomics applications [26] . Indeed a year later, Nelson and coworkers identified a large number of rare functional variants in 202 drug target genes carried out in just over 14,000 individuals [27] . This study took a very large population genetics approach, and focused on a different set of genes (drug targets) than the ones described in this paper. Nelson and co­workers successfully showed that mutations in such genes can very well reduce or completely abolish efficacy of corresponding drugs, highlighting the importance and further support the use of whole-genome sequencing for personalized healthcare and clinical diagnostics [27] . In our study, we sought to identify the number of variants (both common and rare), not only in many more pharmacogenes (n = 231), but most importantly those that are directly involved in drug metabolism, while we have also taken a more personalized medicine approach by linking novel variants with drug response. Genetic disorders have already begun to greatly benefit from whole-genome sequencing applications and genomic data from selected patients with well-defined phenotypes are accummulating at a very rapid pace [28] . Studies making use of whole-genome sequencing technology in recent years, have led to an exponential increase in the generation of human genome variation data from research centers and diagnostic laboratories, thereby enriching our knowledge of the genetic hetero­ geneity underlying both monogenic disorders and complex clinical traits. 
As previously discussed, it is obvious that analyzing only the exome fraction of our genome, though inexpensive, has several disadvantages over whole-genome sequencing (e.g., poor coverage of some coding regions as a result of the target enrichment step and zero coverage of potentially important and clinically relevant genomic variants lying in genomic regions other than exons). In this study, we attempted to exploit the ultimate genetic analysis, namely the whole-genome nucleotide sequence analysis of individual genomes to determine personalized pharmacogenomics profiles. We have opted to use the proprietary DNA nanoball resequencing technology of Complete Genomics, instead of the other available next-generation sequencing technologies since the mate pair and unique mapping-based sequence assembly of Complete Genomics does not call any variant in a sequence if there is a conflicting repeated region, something that is particularly important for the CYP2D6 gene [28] . In other words, due to mate pairs, most of the genes (or most of their exons) can be properly sequenced in spite of having closely related pseudogenes [11] .

Table 3. Genomic variants found in the CYP2C9 and CYP2C19 genes, related to anticoagulation treatment (acenocoumarol and clopidogrel, respectively) identified as being heterozygous in family members 4 and 7.


Although the exact nature of intronic variants and of variants residing in intergenic regions has not yet been completely elucidated, failing to determine a comprehensive personalized pharmacogenomic profile and to diagnose rare, if not unique, but clinically significant pharmacogenomic markers may have serious implications for individual patients. This may, in turn, increase the chances of poor drug response and of adverse reactions, which will ultimately result in increased treatment costs.

Our data showed that the number of ADMET-related variants identified by whole-genome sequencing is significantly higher than the number that would have been identified if DMET+, the most comprehensive genotyping platform for pharmacogenomic biomarkers available to date, or any other currently available genetic screening platform were used. Although this is to be expected, it clearly shows that, as with every other genetic screening method, a seemingly negative result from a pharmacogenomic screening platform does not necessarily mean that a person does not carry other novel and putatively deleterious variants affecting ADMET enzymes that would lead to reduced drug efficacy or increased drug toxicity.

Interestingly, our independent whole-genome sequencing analysis in the Greek seven-member family led to two findings. First, the analysis managed to explain, at least in part, the variation in acenocoumarol treatment response between the two family members, as a result of the presence of the CYP2C9*2 allele in heterozygosity in family member 4 (Table 3). Of course, the involvement of other novel genomic variants in genes yet to be shown to relate to acenocoumarol metabolism cannot be excluded.
Second, the analysis also highlighted that family member 4 may consider altering her anticoagulation treatment modality from acenocoumarol to clopidogrel, since in her case this alternative treatment may yield better clinical results, whereas family member 7 should, under no circumstances, modify the anticoagulation treatment modality, since in her case she may face adverse reactions.

Our study has a number of limitations. First of all, due to the large number of novel variants that were identified, we were unable to perform functional studies to confirm whether certain genomic variants in those 231 pharmacogenes are indeed deleterious to protein function and, as such, can be exploited as pharmacogenomic biomarkers. Although the SIFT-based in silico analysis provides an attractive alternative for predicting the pathogenicity of the novel variants, it cannot fully replace functional assays. Using these datasets, we could not create a set of cases and controls to study the response to acenocoumarol treatment. Furthermore, because the (rare) variants of interest have not been validated using an independent genotyping technology, such as Sanger sequencing, these findings should be treated with caution. Also, the analysis of just two members from a single family does not allow us to draw solid conclusions but only to highlight the importance and potential usefulness of whole-genome sequencing in pharmacogenomics. Finally, our study takes into consideration only those 231 pharmacogenes that have been previously shown to be implicated in predicting drug efficacy and/or toxicity and does not take into account genomic variation in other genes whose role as pharmacogenomic biomarkers is yet to be determined. Obviously, the fact that a large amount of human genome data is freely available allows this exercise to be replicated once evidence for novel pharmacogenes is provided. Moreover, the benefit of such an analysis conducted on the Wellderly and public genomes available from Complete Genomics would be much greater if critical and well-defined clinical phenotypes were available. Novel genes that are predicted to act as pharmacogenes could then be identified in this way, which could lead to the discovery of new functions and possibly new treatment options.

Conclusion & future perspective

As personalized drug treatment and genomic medicine move closer to becoming a reality, the use of medium- or even high-throughput microarray-based genetic screening tests that fit all ethnicities warrants reconsideration and tweaking to accommodate today’s patient needs. Also, as next-generation sequencing technology improves in both read length and accuracy, this approach will facilitate more accurate genotyping of pharmacogenes in future studies and aid in the implementation of next-generation sequencing-based pharmacogenomic assays in the clinic.
This study clearly demonstrates not only that whole-genome sequencing can reveal a relatively large number of unique (or rare) pharmacogenomic markers that would otherwise go undetected by conventional genetic screening methods, such as PCR or microarray-based methods, but also that significant variation exists among different ethnic groups as far as novel ADMET gene-related variants are concerned. At this point, one should bear in mind that before such an approach and the resulting data are used in a clinical setting, verification needs to be performed in a quality-accredited (International Organization for Standardization [ISO] or Clinical Laboratory Improvement Amendments [CLIA] certified) laboratory.

An important aspect of next-generation sequencing technology that would be critical for its early adoption in the clinic is cost-effectiveness. In other words, a comprehensive personalized pharmacogenomic profile obtained using whole-genome sequencing (which currently costs US$3000 and is dropping), which will include almost all of the germline and de novo genomic variants needed to manage all current and future treatment modalities, is cost effective when compared against the costs of testing for a single marker or several markers in a few pharmacogenes (from US$300 up to US$1500, respectively). At present, setting up a (centralized) whole-genome sequencing facility and training clinicians to be able to translate pharmacogenomic data are two of the most important hurdles that need to be overcome [29], but sample outsourcing for data analysis and interpretation, using an economy-of-scale model, might be the answer to this obstacle. Ultimately, once the costs of pharmacogenomic testing using whole-genome sequencing and its cost-effectiveness are well documented, it will only be a matter of time until reimbursement of testing costs is adopted by national insurance bodies.

Overall, our findings highlight the value of whole-genome sequencing for determining patients’ unique personalized pharmacogenomics profiles. Considering that whole-genome sequencing costs continue to decline rapidly and that, at the same time, genome-sequencing services are becoming more accurate in delivering clinical-grade genome sequences, it is expected that whole-genome sequencing will gradually become an integral part of genomic medicine in the not so distant future.

Financial & competing interests disclosure

This study was supported in part by ALTF 71-2011 grant to J Borg and the Golden Helix Foundation (P3; RD-201201) and European Commission grants (RD-CONNECT; FP7–305444) to GP Patrinos. This project was encouraged by the Genomic Medicine Alliance [30]. The authors declare the following conflict of interests: B Peters, MR Agarwal and R Drmanac are employees of Complete Genomics, Inc. The authors have no other relevant affiliations or financial involvement with any organization or entity with a financial interest in or financial conflict with the subject matter or materials discussed in the manuscript apart from those disclosed. No writing assistance was utilized in the production of this manuscript.

Ethical conduct of research

The authors state that they have obtained appropriate institutional review board approval or have followed the principles outlined in the Declaration of Helsinki for all human or animal experimental investigations. In addition, for investigations involving human subjects, informed consent has been obtained from the participants involved.

Executive summary

• Pharmacogenomics holds promise to rationalize drug use by minimizing drug toxicity and at the same time increase drug efficacy.
• There are currently several genetic screening assays to screen for known pharmacogenomic biomarkers for the most commonly prescribed drugs; however, these cannot account for other known or novel pharmacogenomic markers.
• We analyzed whole-genome sequences of 482 unrelated individuals of various ethnic backgrounds to obtain their personalized pharmacogenomics profiles.
• Bioinformatics analysis revealed 408,964 variants in 231 pharmacogenes, from which 26,807 were residing on exons and proximal regulatory sequences, whereas 16,487 were novel, out of which 1012 variants possibly abolish protein function, as indicated by in silico analysis.
• We have also performed whole-genome sequencing analysis in a seven member family of Greek origin in an effort to explain the variable response rate to acenocoumarol treatment in two family members.
• Overall, our data demonstrate that whole-genome sequencing, unlike conventional genetic screening methods, is necessary to determine an individual’s pharmacogenomic profile in a more comprehensive manner, which, combined with the gradually decreasing whole-genome sequencing costs, would expedite bringing personalized medicine closer to reality.

References

1. Evans WE, Relling MV. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 286(5439), 487–491 (1999).
2. de Denus S, Letarte N, Hurlimann T et al. An evaluation of pharmacists’ expectations towards pharmacogenomics. Pharmacogenomics 14(2), 165–175 (2013).
3. Affymetrix DMET+ microarray. www.affymetrix.com/estore/browse/products.jsp?productId=131412#1_1
4. Roche AmpliChip 450 test. http://molecular.roche.com/assays/Pages/AmpliChipCYP450Test.aspx
5. Burmester JK, Sedova M, Shapero MH, Mansfield E. DMET microarray technology for pharmacogenomics-based personalized medicine. Methods Mol. Biol. 632, 99–124 (2010).
6. PharmGKB. www.pharmgkb.org

7. Johnson JA, Burkley BM, Langaee TY et al. Implementing personalized medicine: development of a cost-effective customized pharmacogenetic genotyping array. Clin. Pharmacol. Ther. 92(4), 437–439 (2012).
8. Bamshad MJ, Ng SB, Bigham AW et al. Exome sequencing as a tool for Mendelian disease gene discovery. Nat. Rev. Genet. 12(11), 745–755 (2011).
9. Gamazon ER, Skol AD, Perera MA. The limits of genomewide methods for pharmacogenomics testing. Pharmacogenet. Genomics 22(4), 261–272 (2012).
10. Altman RB, Whirl-Carrillo M, Klein TE. Challenges in the pharmacogenomic annotation of whole genomes. Clin. Pharmacol. Ther. 94(2), 211–213 (2013).
11. Drmanac R, Sparks AB, Callow MJ et al. Human genome sequencing using unchained base reads on self-assembling DNA nanoarrays. Science 327(5961), 78–81 (2010).
12. Roach JC. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
13. 69 Genomes Data. www.completegenomics.com/public-data/69-Genomes
14. Fritsma GA. Evaluation of Hemostasis. In: Hematology: Clinical Principles and Applications. Rodak B (Ed). WB Saunders Company, PA, USA, 719–753 (2002).
15. Complete Genomics public server. ftp://ftp2.completegenomics.com
16. CGA™ Tools. www.completegenomics.com/analysis-tools/cgatools
17. SIFT tool. http://sift.jcvi.org
18. PROVEAN tool webpage. http://provean.jcvi.org/about.php#about_3
19. Choi Y, Sims GE, Murphy S, Miller JR, Chan AP. Predicting the functional effect of amino acid substitutions and indels. PLoS ONE 7(10), e46688 (2012).
20. Mette L, Mitropoulos K, Vozikis A, Patrinos GP. Pharmacogenomics and public health: implementing ‘populationalized’ medicine. Pharmacogenomics 13(7), 803–813 (2012).
21. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nature Protocols 4(7), 1073–1081 (2009).
22. RCSB Protein Data Bank. www.rcsb.org/pdb/home/home.do
23. PyMOL. www.pymol.org
24. Sangkuhl K, Klein TE, Altman RB. Clopidogrel pathway. Pharmacogenet. Genomics 20(7), 463–465 (2010).
25. Ashley EA, Butte AJ, Wheeler MT. Clinical assessment incorporating a personal genome. Lancet 375(9725), 1525–1535 (2010).
26. Drögemöller BI, Wright GEB, Niehaus DJH, Emsley RA, Warnich L. Whole-genome resequencing in pharmacogenomics: moving away from past disparities to globally representative applications. Pharmacogenomics 12(12), 1717–1728 (2011).
27. Nelson MR, Wegmann D, Ehm MG et al. An abundance of rare functional variants in 202 drug target genes sequenced in 14,002 people. Science 337(6090), 100–104 (2012).
28. Drögemöller BI, Wright GE, Niehaus DJ et al. Next-generation sequencing of pharmacogenes: a critical analysis focusing on schizophrenia treatment. Pharmacogenet. Genomics 23(12), 666–674 (2013).
29. Kampourakis K, Vayena E, Mitropoulou C et al. Key challenges for next-generation pharmacogenomics. EMBO Rep. 15(5), 472–476 (2014).
30. Genomic Medicine Alliance. www.genomicmedicinealliance.org

Pharmacogenomics (2014) 15(9)
