Clinical Chemistry

Clinical Chemistry Volume 55, Number 4, Pages 601–844

APRIL 2009

EDITOR-IN-CHIEF Nader Rifai Children’s Hospital Boston Harvard Medical School Boston, MA

DEPUTY EDITORS Thomas M. Annesley University of Michigan Ann Arbor, MI

James C. Boyd University of Virginia Charlottesville, VA

ASSOCIATE AND SECTION EDITORS Fred S. Apple Hennepin County Medical Center University of Minnesota Minneapolis, MN

Robert Dufour Clinical Case Studies Co-Editor George Washington University Washington, DC

W. Greg Miller Virginia Commonwealth University Richmond, VA

David B. Sacks Brigham and Women’s Hospital Harvard Medical School Boston, MA

Michael J. Bennett Reviews Editor University of Pennsylvania Philadelphia, PA

Ann Gronowski Clinical Case Studies Co-Editor Washington University School of Medicine St. Louis, MO

Karel G.M. Moons University Medical Center Utrecht The Netherlands

David E. Bruns

Michael Oellerich

Mitchell G. Scott Washington University School of Medicine St. Louis, MO

Georg-August-University Göttingen, Germany

Carl T. Wittwer

Alan Remaley

University of Utah Salt Lake City, UT

Glen L. Hortin

Perspectives Editor University of Virginia Charlottesville, VA

National Institutes of Health Bethesda, MD

Eleftherios P. Diamandis Citation Classics Editor Mount Sinai Hospital University of Toronto Canada

Y.M. Dennis Lo The Chinese University of Hong Kong China

Douglas G. Altman Centre for Statistics in Medicine Oxford, United Kingdom

W. Edward Highsmith, Jr. Mayo Clinic Rochester, MN

G. Mike Makrigiorgos Dana Farber Cancer Institute Harvard Medical School Boston, MA

Tatsuya Sawamura National Cardiovascular Center Osaka, Japan

Mark E. Meyerhoff

Dr. Margarete Fischer-Bosch Institute of Clinical Pharmacology Stuttgart, Germany

Book Reviews Editor National Institutes of Health Bethesda, MD

Ian Young Queen’s University Belfast, Northern Ireland

BOARD OF EDITORS

David Altshuler Massachusetts General Hospital Harvard Medical School Boston, MA

Marilyn A. Huestis National Institute on Drug Abuse Baltimore, MD

Patrick M.M. Bossuyt Academic Medical Center University of Amsterdam The Netherlands

David R. Jacobs, Jr. University of Minnesota Minneapolis, MN

Steven A. Carr

Mayo Clinic Rochester, MN

Broad Institute Harvard University/MIT Cambridge, MA

Nancy Cook Brigham and Women’s Hospital Harvard Medical School Boston, MA

Charles S. Eby Washington University School of Medicine St. Louis, MO

José Manuel Fernández-Real Hospital of Girona “Dr. Josep Trueta” Spain

University of Michigan Ann Arbor, MI

Chad A. Mirkin Northwestern University Evanston, IL

Allan Jaffe

Klaus Jung

David A. Morrow Brigham and Women’s Hospital Harvard Medical School Boston, MA

University Hospital Charité Berlin, Germany

Børge G. Nordestgaard

Wolfgang Koenig

Copenhagen University Hospital Denmark

University of Ulm Medical Center Germany

Robert Rej

Larry J. Kricka University of Pennsylvania Philadelphia, PA

Kristian Linnet University of Copenhagen Denmark

PUBLISHER

MANAGING EDITOR

Mac Fancher AACC Washington, DC

Sheehan Misko AACC Washington, DC

© 2009 American Association for Clinical Chemistry

Wadsworth Center for Laboratories and Research Albany, NY

Matthias Schwab

Meir Stampfer Harvard School of Public Health Boston, MA

Ulf-Håkan Stenman Helsinki University Central Hospital Finland

Philippa Talmud University College London London, UK

Linda M. Thienpont Ghent University Belgium

Per Magne Ueland Haukeland Hospital Bergen, Norway

William L. Roberts

Teun van Gelder

University of Utah Salt Lake City, UT

Erasmus Medical Center Rotterdam, The Netherlands

EDITORIAL COORDINATORS Sarah J. Walker AACC Washington, DC

Rachelle Detweiler AACC Washington, DC

www.clinchem.org

Clinical Chemistry

Contents

INTRODUCTION
Molecular Diagnostics: At the Cutting Edge of Translational Research Y.M.D. Lo and C.T. Wittwer 601

EDITORIALS
From Transrenal DNA to Stem Cell Differentiation: An Unexpected Twist S.R. Umansky (see article on page 715) 602
On Targeting Cell-Free DNA in Urine: A Protocol for Optimized DNA Analysis M. Bauer and B. Pertl (see article on page 723) 605
Next-Generation Sequencing of Plasma/Serum DNA: An Emerging Research and Molecular Diagnostic Tool Y.M.D. Lo and R.W.K. Chiu (see article on page 730) 607

PERSPECTIVE
A New Tool for Oligonucleotide Import into Cells D.C. Leslie and J.P. Landers 609

SPECIAL REPORT
The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments S.A. Bustin, V. Benes, J.A. Garson, J. Hellemans, J. Huggett, M. Kubista, R. Mueller, T. Nolan, M.W. Pfaffl, G.L. Shipley, J. Vandesompele, and C.T. Wittwer 611

MINI-REVIEWS
MicroRNAs: Novel Biomarkers for Human Cancer C.L. Bartels and G.J. Tsongalis 623
PCR-Based Methods for the Enrichment of Minority Alleles and Mutations C.A. Milbury, J. Li, and G.M. Makrigiorgos 632

REVIEWS
Next-Generation Sequencing: From Basic Research to Diagnostics K.V. Voelkerding, S.A. Dames, and J.D. Durtschi 641
Microarray-Based Genomic DNA Profiling Technologies in Clinical Molecular Diagnostics Y. Shen and B.-L. Wu 659
Analytical Ancestry: “Firsts” in Fluorescent Labeling of Nucleosides, Nucleotides, and Nucleic Acids L.J. Kricka and P. Fortina 670
Utilizing the Molecular Gateway: The Path to Personalized Cancer Management J.B. Overdevest, D. Theodorescu, and J.K. Lee 684
Management of Gene Promoter Mutations in Molecular Diagnostics K.M.K. de Vooght, R. van Wijk, and W.W. van Solinge 698

POINT/COUNTERPOINT
Point: Use of Pharmacogenetics in Guiding Treatment with Warfarin M. Wadelius 709
Counterpoint: Pharmacogenetic-Based Initial Dosing of Warfarin: Not Ready for Prime Time C.S. Eby 712

ARTICLES

MOLECULAR DIAGNOSTICS AND GENETICS
Presence of Donor-Derived DNA and Cells in the Urine of Sex-Mismatched Hematopoietic Stem Cell Transplant Recipients: Implication for the Transrenal Hypothesis E.C.W. Hung, T.K.F. Shing, S.S.C. Chim, P.C. Yeung, R.W.Y. Chan, K.W. Chik, V. Lee, N.B.Y. Tsui, C.-K. Li, C.S.C. Wong, R.W.K. Chiu, and Y.M.D. Lo (see editorial on page 602) 715
Optimization of Transrenal DNA Analysis: Detection of Fetal DNA in Maternal Urine E.M. Shekhtman, K. Anne, H.S. Melkonyan, D.J. Robbins, S.L. Warsof, and S.R. Umansky (see editorial on page 605) 723
Profile of the Circulating DNA in Apparently Healthy Individuals J. Beck, H.B. Urnovitz, J. Riggert, M. Clerici, and E. Schütz (see editorial on page 607) 730
European External Quality Control Study on the Competence of Laboratories to Recognize Rare Sequence Variants Resulting in Unusual Genotyping Results J. Márki-Zay, C.L. Klein, D. Gancberg, H.G. Schimmel, and L. Dux 739
Coamplification at Lower Denaturation Temperature–PCR Increases Mutation-Detection Selectivity of TaqMan-Based Real-Time PCR J. Li, L. Wang, P.A. Jänne, and G.M. Makrigiorgos 748
mRNA Expression and BRAF Mutation in Circulating Melanoma Cells Isolated from Peripheral Blood with High Molecular Weight Melanoma-Associated Antigen–Specific Monoclonal Antibody Beads M. Kitago, K. Koyanagi, T. Nakamura, Y. Goto, M. Faries, S.J. O’Day, D.L. Morton, S. Ferrone, and D.S.B. Hoon 757
Circulating Prostate Tumor Cells Detected by Reverse Transcription–PCR in Men with Localized or Castration-Refractory Prostate Cancer: Concordance with CellSearch Assay and Association with Bone Metastases and with Survival P. Helo, A.M. Cronin, D.C. Danila, S. Wenske, R. Gonzalez-Espinoza, A. Anand, M. Koscuiszka, R.-M. Väänänen, K. Pettersson, F.K.-H. Chun, T. Steuber, H. Huland, B.D. Guillonneau, J.A. Eastham, P.T. Scardino, M. Fleisher, H.I. Scher, and H. Lilja 765
Overinterpretation of Clinical Applicability in Molecular Diagnostic Research B. Lumbreras, L.A. Parker, M. Porta, M. Pollán, J.P.A. Ioannidis, and I. Hernández-Aguado 774
Interindividual and Interethnic Variation in Genomewide Gene Expression: Insights into the Biological Variation of Gene Expression and Clinical Implications H.P.Y. Fan, C.D. Liao, B.Y. Fu, L.C.W. Lam, and N.L.S. Tang 786

EVIDENCE-BASED MEDICINE AND TEST UTILIZATION
Familial and Sporadic Porphyria Cutanea Tarda: Characterization and Diagnostic Strategies A.K. Aarsand, H. Boman, and S. Sandberg 795

POINT-OF-CARE TESTING
Rapid Single-Nucleotide Polymorphism Detection of Cytochrome P450 (CYP2C9) and Vitamin K Epoxide Reductase (VKORC1) Genes for the Warfarin Dose Adjustment by the SMart-Amplification Process Version 2 T. Aomori, K. Yamamoto, A. Oguchi-Katayama, Y. Kawai, T. Ishidao, Y. Mitani, Y. Kogo, A. Lezhava, Y. Fujita, K. Obayashi, K. Nakamura, H. Kohnke, M. Wadelius, L. Ekström, C. Skogastierna, A. Rane, M. Kurabayashi, M. Murakami, P.E. Cizdziel, Y. Hayashizaki, and R. Horiuchi 804

INFECTIOUS DISEASE
Generating Aptamers for Recognition of Virus-Infected Cells Z. Tang, P. Parekh, P. Turner, R.W. Moyer, and W. Tan 813

BRIEF COMMUNICATION
A Multiplex Assay for Detecting Genetic Variations in CYP2C9, VKORC1, and GGCX Involved in Warfarin Metabolism A.J. Rai, N. Udar, R. Saad, and M. Fleisher 823

CLINICAL CASE STUDY
Genetic Testing for Developmental Delay: Keep Searching for an Answer D.T. Miller, Y. Shen, D.J. Harris, B.-L. Wu, and M.M. Sobeih 827

COMMENTARIES
S.W. Cheung 830
N.L.S. Tang 831

CITATION CLASSIC
The Beginnings of Real-Time PCR P.M. Williams 833

INTERVIEW
A Conversation with Elizabeth Blackburn M. Landau 835

LETTER TO THE EDITOR
Validity of Maternal Genotypes in DNA from Archival Pregnancy Serum Samples M.I.L. Ivarsson, J. Dillner, and J. Carlson 842

CLINICAL CHEMIST
Lily Robinson and the Mount Vernon Affair: Recollections 844

Clinical Chemistry (ISSN 0009-9147) is published monthly by the American Association for Clinical Chemistry, 1850 K Street, NW, Suite 625, Washington, DC 20006. New subscriptions, renewals, changes of address, back issues, and all customer service questions should be addressed to: AACC, Subscription Department, 1850 K Street, NW, Suite 625, Washington, DC 20006. Telephone (202) 857-0717 or 1 (800) 892-1400; fax (202) 887-5093; e-mail [email protected]. Subscription rates: Institutional subscription USA $992, elsewhere $1,144. Individual subscription USA $299, elsewhere $464. Airmail delivery outside USA is an additional $270. Individual subscriptions are for personal use and not to be used in a library. Periodicals postage paid at Washington, DC and at additional mailing offices. Postmaster: Send address changes to Clinical Chemistry, 1850 K Street, NW, Suite 625, Washington, DC 20006. Copyright © 2009 The American Association for Clinical Chemistry.

ACCENT®: CONTINUING EDUCATION CREDIT FOR READERS OF CLINICAL CHEMISTRY For more information go to http://apps.aacc.org.ccj/accent

ON THE COVER Elizabeth Blackburn, PhD, of the University of California, San Francisco, examines a chromosome (blue) and its telomeres (red). Dr. Blackburn co-discovered the ribonucleoprotein enzyme telomerase, which keeps the ends of chromosomes intact. As the “leading lady of telomerase biology,” she shares her views on science, women, and politics in the Interview beginning on page 835. ©Reproduced with permission. Photographer, Micheline Pelletier. Photo taken on the occasion of Dr. Blackburn receiving the 2008 L’OREAL-UNESCO Award for Women in Science as the laureate for North America. Color figures for Reviews sponsored by Department of Laboratory Medicine, Children’s Hospital Boston.


Introduction

Clinical Chemistry 55:4 601 (2009)

Molecular Diagnostics: At the Cutting Edge of Translational Research Y. M. Dennis Lo¹ and Carl T. Wittwer²*

The goal of translational research is to advance basic research and new technology toward clinical utility. Molecular diagnostics focuses primarily on nucleic acids. Rapid advances in molecular diagnostics both enable basic research and yield practical diagnostic tests. This central position of molecular diagnostics in translational research has led us to bring you this special edition of Clinical Chemistry, Molecular Diagnostics: At the Cutting Edge of Translational Research. We start the issue with an interview with Dr. Elizabeth Blackburn, a pioneer in telomere biology, as a reminder that basic research is the foundation for advances in molecular diagnostics. Telomere research holds promise for improving our understanding of aging, stem cell biology, and diseases such as cancer, and it has stimulated the development of new molecular diagnostic tests. Basic research has also provided us with new tools for molecular diagnostics, including high-density microarrays, massively parallel sequencing, extremely sensitive real-time PCR, and methods for rapid discrimination of very small differences between molecules. “Whole-genome” arrays are now available for mRNA expression, single-nucleotide polymorphism genotyping, and copy-number variants. Of course, clinical utility does not scale directly with the number of tests performed or the number of spots on an array: a 100K chip does not offer 100K times the clinical value of a single test. Nevertheless, expression arrays are well suited to identifying important transcripts that may be included in reduced analysis sets for tumor subclassification. Genotyping arrays provide extraordinary power for association studies that map complex traits. Copy-number arrays have direct clinical applications at a resolution impossible with conventional cytogenetics. Next-generation sequencing leverages the power of parallel processing, using emulsion or solid-phase PCR and even single-molecule sequencing.
Hybrid applications have begun to appear: next-generation sequencing of mRNA can be used for expression analysis, and oligonucleotide arrays can be used for both single-nucleotide polymorphisms and copy-number variants. Closed-tube amplification methods such as real-time PCR have enabled much of current molecular diagnostics. Very low concentrations of nucleic acids can be measured, promising early cancer detection, noninvasive prenatal diagnosis, minimal residual-disease monitoring, and personalized therapy. Nucleic acids turn up in unexpected places such as urine and saliva, and fetal nucleic acids appear in maternal plasma; all have diagnostic promise. New classes of nucleic acids, such as microRNAs, are being discovered; these are highly specific regulators with strong potential as biomarkers. Very small differences between nucleic acids can be detected rapidly by mass spectrometry or high-resolution melting. Clinical chemistry today encompasses much more than measuring blood glucose. Nevertheless, as with blood glucose, the best molecular correlates of disease are usually related to disease etiology. Molecular diagnostics will continue to evolve. Not all methods and markers will survive the test of time, but those that do will complement the rapidly expanding menu of molecular diagnostic tests focused on improving medical care and our quality of life. As guest coeditors, we would like to dedicate this edition to our mentor and friend, David Bruns, former Editor-in-Chief of Clinical Chemistry.

¹ Department of Chemical Pathology, Prince of Wales Hospital, The Chinese University of Hong Kong, Shatin, New Territories, Hong Kong, People’s Republic of China; ² Department of Pathology, University of Utah Medical School, Salt Lake City, UT. * Address correspondence to this author at: Department of Pathology, University of Utah Medical School, 50 N. Medical Drive, Salt Lake City, UT, 84132. Fax 801-581-4517; e-mail [email protected]. Received January 13, 2009; accepted January 16, 2009. Previously published online at DOI: 10.1373/clinchem.2008.119586

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: C.T. Wittwer, Idaho Technology. Consultant or Advisory Role: Y.M.D. Lo, Sequenom. Stock Ownership: Y.M.D. Lo, Sequenom; C.T. Wittwer, Idaho Technology. Honoraria: None declared. Research Funding: Y.M.D. Lo, Sequenom; C.T. Wittwer, Idaho Technology, ARUP. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript.


Editorials

Clinical Chemistry 55:4 602–604 (2009)

From Transrenal DNA to Stem Cell Differentiation: An Unexpected Twist Samuil R. Umansky¹*

In this issue of Clinical Chemistry, Hung et al. describe the results of a study originally devoted to investigating the transrenal DNA (Tr-DNA)² phenomenon. These authors report on the presence of donor-derived, cell-free DNA and cells in the urine of sex-mismatched hematopoietic stem cell transplant (HSCT) recipients (1). The term Tr-DNA defines DNA molecules that appear in the urine from sources located outside of the urinary system. The investigators who first found such molecules in urine suggested that DNA fragments from cells dying throughout the body appear in the bloodstream as so-called cell-free circulating DNA (cfcDNA) and then cross the kidney barrier into the urine (2). Otherwise, it is difficult to explain how fetal DNA appears in the urine of pregnant women or how tumor-specific DNA markers appear in the urine of patients with tumors located outside of the urinary system. This original observation has been reproduced in many laboratories (2–7). At the same time, several groups could not detect fetal DNA in the urine of pregnant women (8, 9), raising doubts about the Tr-DNA concept. These negative results can be explained by the fact that Tr-DNA fragments are shorter than cfcDNA fragments, so shorter amplicons must be used to detect them (4, 7, 10), especially in prenatal models, in which concentrations of fetal DNA in maternal urine are low. It is also important to remember that urinary DNA from other sources complicates analysis of Tr-DNA. These other sources include epithelial cells shed from the organs of the urinary system; lymphocytes and other white blood cells, which are often present in urine; and cell-free DNA released into the urine after chromatin degradation in cells dying in the kidney and bladder. Bacterial infection is another potential source of urinary DNA; for this reason, urine is widely used for diagnosing sexually transmitted diseases (11, 12).
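The amplicon-length argument can be illustrated with a deliberately simple model. This is an assumption of the sketch, not an analysis from the editorial or the cited studies: if every DNA fragment has the same length L and fragment breakpoints fall at a uniformly random position relative to a target of length a, the target survives intact on one fragment with probability (L − a + 1)/L when a ≤ L, and never otherwise. The function name `intact_fraction` and the 100-bp fragment length in the usage lines are hypothetical.

```python
def intact_fraction(fragment_bp: int, amplicon_bp: int) -> float:
    """Expected fraction of target copies that survive intact on one fragment.

    Simplified model (an illustrative assumption): all fragments are exactly
    fragment_bp long and breakpoints fall at a uniformly random phase
    relative to the target. A target of amplicon_bp bases has
    amplicon_bp - 1 internal junctions; of the fragment_bp equally likely
    breakpoint phases, that many cut the target and destroy the template.
    """
    if fragment_bp <= 0 or amplicon_bp <= 0:
        raise ValueError("lengths must be positive")
    if amplicon_bp > fragment_bp:
        return 0.0  # a target longer than every fragment can never be intact
    return (fragment_bp - amplicon_bp + 1) / fragment_bp


if __name__ == "__main__":
    # Hypothetical fragment length; amplicon sizes echo those discussed in this issue.
    for amplicon in (25, 63, 88):
        print(f"{amplicon:>3} bp amplicon: {intact_fraction(100, amplicon):.2f}")
```

With 100-bp fragments, the model gives 0.76 for a 25-bp amplicon but only 0.13 for an 88-bp amplicon, which reproduces the qualitative trend described in the text. Real cfcDNA and Tr-DNA have a distribution of fragment sizes, so actual detection rates will differ.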

¹ Xenomics, Monmouth Junction, NJ. * Address correspondence to the author at: Xenomics, 1 Deer Park Drive, Suite F, Monmouth Junction, NJ 08852. Fax 732-438-8299; e-mail sumansky@xenomics.com. Received October 31, 2008; accepted December 3, 2008. Previously published online at DOI: 10.1373/clinchem.2008.119552

² Nonstandard abbreviations: Tr-DNA, transrenal DNA; cfcDNA, cell-free circulating DNA; CK, cytokeratin.


As a model for investigating the Tr-DNA phenomenon, Hung and colleagues studied HSCT recipients, whose plasma contains a significant amount of donor-specific cfcDNA. The primary goal of the study was to determine whether the concentrations and sizes of cfcDNA in the plasma correlate with those of Tr-DNA in the urine. To simplify detection of donor DNA in plasma and urine, the authors selected sex-mismatched HSCT recipients. Twenty-one of 22 patients had more than 99% donor lymphohematopoietic cells in the peripheral blood. To differentiate male and female DNA, a zinc finger protein gene assay was designed based on the nucleotide difference between the zinc finger protein genes located on chromosomes X and Y (ZFX and ZFY). In addition, Y chromosome–specific SRY sequences were used for male donor/female recipient patients. The contribution of donor-derived cfcDNA in plasma was in the range of 72.8%–79.3%, supporting the earlier conclusion that cfcDNA in plasma is predominantly of hematopoietic origin (13). These data indicate that only about 25% (20.7%–27.2%) of cfcDNA is delivered to the bloodstream by other tissues. However, one should keep in mind that despite all the precautions taken by the authors, it is very difficult to design experiments that unambiguously exclude damage to a small number of white blood cells, with subsequent release of DNA into plasma, during blood collection and centrifugation. Assuming that blood contains about 5 million white blood cells per mL, one can calculate that the death of 1 in 1000 of these cells during blood collection and plasma preparation can release 5000 DNA genome equivalents, or >30 ng DNA/mL (at roughly 6.6 pg DNA per diploid genome). In particular, this source of DNA can explain the very broad range of plasma cfcDNA concentrations reported by different authors. After analyzing cfcDNA in plasma, Hung et al. investigated urinary DNA.
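Two quantitative steps in this paragraph can be made explicit. The sketch below assumes, as is usual for ZFX/ZFY-type assays but not stated in the editorial, that the male DNA fraction is computed as 2·ZFY/(ZFX + ZFY), and that one human diploid genome contains roughly 6.6 pg DNA. The function names and the ZFX/ZFY copy numbers in the usage lines are hypothetical; this is not the authors' analysis pipeline.

```python
# Illustrative sketch; the constant and the 2*ZFY/(ZFX+ZFY) relation are
# assumptions of this example, not values reported in the editorial.
PG_DNA_PER_DIPLOID_GENOME = 6.6  # approximate DNA mass of one human diploid cell, pg


def male_dna_fraction(zfy_copies: float, zfx_copies: float) -> float:
    """Fractional concentration of male DNA from ZFY and ZFX copy numbers.

    A male genome carries one ZFX and one ZFY copy; a female genome carries
    two ZFX copies. Total genomes = (ZFX + ZFY) / 2 and male genomes = ZFY,
    so the male fraction is 2 * ZFY / (ZFX + ZFY).
    """
    total = zfx_copies + zfy_copies
    if total <= 0:
        raise ValueError("no ZFX/ZFY copies detected")
    return 2 * zfy_copies / total


def lysis_dna_ng_per_ml(wbc_per_ml: float, lysed_fraction: float) -> tuple[float, float]:
    """Genome equivalents and DNA (ng/mL) released if some white cells lyse ex vivo."""
    genome_equivalents = wbc_per_ml * lysed_fraction  # one diploid genome per lysed cell
    ng_per_ml = genome_equivalents * PG_DNA_PER_DIPLOID_GENOME / 1000.0  # pg -> ng
    return genome_equivalents, ng_per_ml


if __name__ == "__main__":
    ge, ng = lysis_dna_ng_per_ml(5e6, 1 / 1000)  # figures from the paragraph above
    print(f"{ge:.0f} genome equivalents, about {ng:.0f} ng DNA/mL")
    # Hypothetical copy numbers, for illustration only.
    print(f"male fraction at 30 ZFY / 70 ZFX copies: {male_dna_fraction(30, 70):.2f}")
```

With the editorial's inputs (5 million white cells per mL, 1 in 1000 lysed), the sketch returns 5000 genome equivalents and about 33 ng/mL, consistent with the ">30 ng DNA/mL" figure.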
The mean fractional concentration of male DNA was 92.3% in the urine of male sex-mismatched HSCT recipients, a result that is not surprising because most of this DNA came from the recipients’ own cells shed from the kidney and bladder or dying in situ. More interestingly, urine supernatants from female sex-mismatched HSCT recipients also contained significant amounts (26%–88.1%) of male, and thus donor-derived, DNA. When the fractional concentrations of cell-free donor-derived DNA in urine supernatants and plasma were compared, no correlation was found. This result was expected because the percentage of donor DNA in the urine supernatant depends on several widely variable factors: (a) the amount of cell-free recipient DNA from urinary cells dying in situ or in the urine; (b) the number of white blood cells, which are mainly donor-derived, present in the urine; and (c) the amount of plasma cell-free DNA, which is a source of Tr-DNA. In the next set of experiments, Hung et al. used primers for SRY amplicons ranging from 63 to 377 bp to compare the lengths of donor-derived cell-free DNA in the urine and plasma of 5 female sex-mismatched HSCT recipients. Large DNA fragments were found in cell-free urinary DNA but not in plasma cfcDNA. Again, because donor-derived white blood cells are present in the urine, where at least some of them are dying, one should expect large fragments of donor DNA in urine supernatants. Thus, the data obtained neither support nor contradict the Tr-DNA concept. Systems in which the cells bearing specific DNA markers reside in tissues outside the urinary tract and cannot themselves appear in the urine would be more informative in this respect. One such system, in patients with nasopharyngeal carcinoma, was investigated in the same laboratory. The authors found not only tumor-specific Epstein-Barr virus sequences in the urine of these patients but also a correlation between the concentrations of transrenal Epstein-Barr virus DNA and Epstein-Barr virus cfcDNA in plasma (7). Because Hung et al. (1) did not find any correlation between the concentrations and lengths of donor-derived DNA in plasma and urine supernatants, they performed more detailed analyses of cells from the urine sediments. Using FISH with probes for the X and Y chromosomes, Hung et al. readily detected donor cells in all urine samples, a finding that explained the results discussed above.
In addition, however, as a reward for their carefully performed study, the authors found something else: a small proportion of donor cells in some samples were morphologically similar to urinary epithelial cells. The investigators therefore analyzed the urinary sediments more carefully, combining FISH with immunofluorescent detection of cytokeratin (CK) as an epithelial marker. All 10 sex-mismatched HSCT recipients had recipient-derived CK-positive epithelial cells and CK-negative donor-derived white blood cells. Surprisingly, in the urine sediments of 3 patients, a small proportion (0.4%–1.3%) of donor-derived CK-positive epithelial cells was found. At the time of urine collection, these patients were 2.3, 3.7, and 14.2 years posttransplantation. More studies are necessary to explain this phenomenon, but currently the authors’ suggestion that bone marrow–derived stem cells differentiate into epithelial tubular cells looks reasonable. The plasticity of hematopoietic stem cells (14, 15), as well as their involvement in renal repair (16, 17), has already been well documented and intensively investigated. However, because no approaches have been available for investigating this phenomenon in patients, all of these data were obtained in animal models. If confirmed in larger clinical studies, the observations of Hung et al. may have important biological and medical consequences, providing a model for better understanding of in vivo stem cell differentiation and potential therapeutic applications of this phenomenon.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: S.R. Umansky, Xenomics. Consultant or Advisory Role: None declared. Stock Ownership: S.R. Umansky, Xenomics. Honoraria: None declared. Research Funding: None declared. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript.

References
1. Hung ECW, Shing TKF, Chim SSC, Yeung PC, Chan RWY, Chik KW, et al. Presence of donor-derived DNA and cells in the urine of sex-mismatched hematopoietic stem cell transplant recipients: implication for the transrenal hypothesis. Clin Chem 2009;55:715–22.
2. Botezatu I, Serdyuk O, Potapova G, Alechina R, Arsenin S, Melkonyan H, et al. Genetic analysis of DNA excreted in urine: a new approach for detecting specific genomic DNA sequences from cells dying in an organism. Clin Chem 2000;46:1078–84.
3. Al-Yatama MK, Mustafa AS, Ali S, Abraham S, Khan Z, Khaja N. Detection of Y chromosome-specific DNA in the plasma and urine of pregnant women using nested polymerase chain reaction. Prenat Diagn 2001;21:399–402.
4. Koide K, Sekizawa A, Iwasaki M, Matsuoka R, Honma S, Farina A, et al. Fragmentation of cell-free fetal DNA in plasma and urine of pregnant women. Prenat Diagn 2005;25:604–7.
5. Su YH, Wang M, Aiamkitsumrit B, Brenner DE, Block TM. Detection of a K-ras mutation in urine of patients with colorectal cancer. Cancer Biomarkers 2005;1:177–82.
6. Umansky SR, Tomei LD. Transrenal DNA testing: progress and perspectives. Expert Rev Mol Diagn 2006;6:153–63.
7. Chan KC, Leung SF, Yeung SW, Chan AT, Lo YM. Quantitative analysis of the transrenal excretion of circulating EBV DNA in nasopharyngeal carcinoma patients. Clin Cancer Res 2008;14:4809–13.
8. Li Y, Zhong XY, Kang A, Troeger C, Holzgreve W, Hahn S. Inability to detect cell free fetal DNA in the urine of normal pregnant women nor in those affected by preeclampsia associated HELLP syndrome. J Soc Gynecol Investig 2003;10:503–8.
9. Illanes S, Denbow ML, Smith RP, Overton TG, Soothill PW, Finning K. Detection of cell-free fetal DNA in maternal urine. Prenat Diagn 2006;26:1216–8.
10. Melkonyan HS, Feaver WJ, Meyer E, Scheinker V, Shekhtman EM, Xin Z, Umansky SR. Transrenal nucleic acids from proof-of-principle to clinical tests: problems and solutions. Ann NY Acad Sci 2008;1137:73–81.
11. Takahashi S, Takeyama K, Miyamoto S, Ichihara K, Maeda T, Kunishima Y, et al. Detection of Mycoplasma genitalium, Mycoplasma hominis, Ureaplasma urealyticum, and Ureaplasma parvum DNAs in urine from asymptomatic healthy young Japanese men. J Infect Chemother 2006;12:269–71.
12. Belenko S, Dembo R, Weiland D, Rollie M, Salvatore C, Hanlon A, Childs K. Recently arrested adolescents are at high risk for sexually transmitted diseases. Sex Transm Dis 2008;35:758–63.
13. Lui YY, Chik KW, Chiu RW, Ho CY, Lam CW, Lo YM. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin Chem 2002;48:421–7.
14. Herzog EL, Chai L, Krause DS. Plasticity of marrow-derived stem cells. Blood 2003;102:3483–93.
15. Rovó A, Gratwohl A. Plasticity after allogeneic hematopoietic stem cell transplantation. Biol Chem 2008;389:825–36.
16. Lin F. Renal repair: role of bone marrow stem cells. Pediatr Nephrol 2008;23:851–61.
17. Roufosse C, Cook HT. Stem cells and renal regeneration. Nephron Exp Nephrol 2008;109:e39–45.


Clinical Chemistry 55:4 605–606 (2009)

On Targeting Cell-Free DNA in Urine: A Protocol for Optimized DNA Analysis Margit Bauer* and Barbara Pertl

This is not the first editorial in Clinical Chemistry dedicated to testing urine for cell-free DNA. In 2000, Lo (1) highlighted the findings of Botezatu et al. (2), who were the first to describe the transfer of DNA across the kidney barrier into urine. The sources of the transrenal DNA sequences they detected were carcinomas, blood transfusions, and, in pregnant women, the fetus. Following this first report, some pioneering work was done in transplant medicine. Urinary microchimerism was found in women who had received kidney transplants from male donors (3, 4). A quantitative analysis showed that transplant-derived DNA increased during graft rejection, suggesting a potential marker for transplant monitoring (4). Later, attempts were made to detect cell-free fetal DNA in maternal urine. However, maternal urine analysis faced technical difficulties, and it was suggested that the concentration of transrenal DNA might be too low for standard PCR protocols. Depending on the DNA isolation and amplification techniques used, some authors reported positive results (2, 5–7), whereas others did not (4, 8, 9). The differing outcomes were attributed to renal function (in particular, glomerular permeability), the small size of the DNA fragments, and the presence of urinary nucleases. A comprehensive review covering many aspects of transrenal DNA was published by Umansky and Tomei (10); it reported that transrenal DNA is present as fragments of 150–200 bp. Nearly a decade after the first report (2), another transformative report on this topic appears in this issue of Clinical Chemistry (11). Interestingly, Melkonyan and Umansky are among the authors of both papers. The current study, reported by Shekhtman et al. (11), aimed to optimize the molecular technique of transrenal DNA analysis in urine.
Department of Obstetrics and Gynecology, Medical University Graz, Graz, Austria. * Address correspondence to this author at: Department of Obstetrics and Gynecology, Medical University Graz, Auenbruggerplatz 14, A-8036, Graz, Austria. E-mail [email protected]. Received February 3, 2009; accepted February 4, 2009. Previously published online at DOI: 10.1373/clinchem.2008.121855

In a first set of experiments, the investigators used a novel DNA isolation and purification technique based on the adsorption of nucleic acids to Q-Sepharose resin, which allows isolation of short (150–200 bp) and very short (50–150 bp) DNA fragments. A comparison with a silica-based method showed that the Q-Sepharose resin was superior for DNA detection, with a 100% detection rate vs 70% for the silica method. Shekhtman et al. (11) designed another experiment to characterize fetal transrenal DNA. They performed real-time PCR with 4 primer sets amplifying sequences of 25–88 bp within the sex-determining region Y (SRY) gene. These experiments showed that the diagnostic sensitivity of the assay was inversely correlated with the length of the PCR target. A short amplicon of 25 bp led to a tremendous increase in the detection of fetal DNA in maternal urine, whereas fetal DNA was completely undetectable with the 88-bp amplicon. The experimental validation was followed by a clinical study of pregnant women. Testing of 173 urine samples from pregnant women yielded a positive predictive value of 87.6% and a negative predictive value of 95.2%. A drawback of this study, as the authors acknowledge, was that the tested urine samples were not freshly collected but had been stored for 2 years, a condition that might have decreased the sensitivity. For a small number of urine samples (n = 15) collected in the first trimester, the test showed 100% sensitivity and 100% specificity. A potential problem with urine analysis is the possibility of contamination with male DNA originating from sexual intercourse, leading to false-positive results. Some authors have introduced an additional centrifugation step before DNA isolation to avoid contamination by sperm (7). The study reported by Shekhtman et al. (11) provides important insights into the molecular characteristics of transrenal DNA. With this new approach to DNA isolation and amplification, urine has become, for the first time, a powerful body fluid for noninvasive DNA analysis.
The optimized protocol for transrenal DNA analysis might pave the way for interesting future studies in prenatal diagnosis, oncology, and transplant medicine.
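Predictive values such as those reported for the 173 urine samples follow from a 2 × 2 table of test results against the true fetal sex. The sketch below shows the standard definitions; the class name `TwoByTwo` and the counts in the usage example are hypothetical and are not the study's data. Unlike sensitivity and specificity, predictive values depend on the proportion of male fetuses in the tested population, so they should be read together with that proportion.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    # Refuse to report a metric whose denominator is empty.
    if den == 0:
        raise ValueError("denominator is zero; metric undefined for these counts")
    return num / den


@dataclass(frozen=True)
class TwoByTwo:
    """Counts for a binary test; 'positive' means SRY detected in urine
    (predicted male fetus) and 'condition present' means a male fetus."""
    tp: int  # SRY detected, male fetus
    fp: int  # SRY detected, female fetus
    fn: int  # SRY not detected, male fetus
    tn: int  # SRY not detected, female fetus

    def sensitivity(self) -> float:
        return _ratio(self.tp, self.tp + self.fn)

    def specificity(self) -> float:
        return _ratio(self.tn, self.tn + self.fp)

    def ppv(self) -> float:
        return _ratio(self.tp, self.tp + self.fp)

    def npv(self) -> float:
        return _ratio(self.tn, self.tn + self.fn)


if __name__ == "__main__":
    # Hypothetical counts for illustration only (not from the study).
    table = TwoByTwo(tp=40, fp=5, fn=10, tn=45)
    print(f"sensitivity={table.sensitivity():.3f} specificity={table.specificity():.3f}")
    print(f"PPV={table.ppv():.3f} NPV={table.npv():.3f}")
```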

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript. Acknowledgments: We thank the European SAFE Network of Excellence for supporting our work.

References
1. Lo YMD. Molecular testing of urine: catching DNA on the way out. Clin Chem 2000;46:1039–40.
2. Botezatu I, Serdyuk O, Potapova G, Shelepov V, Alechina R, Molyaka Y, et al. Genetic analysis of DNA excreted in urine: a new approach for detecting specific genomic DNA sequences from cells dying in an organism. Clin Chem 2000;46:1078–84.
3. Zhang J, Tong KL, Li PK, Chan AY, Yeung CK, Pang CC, et al. Presence of donor- and recipient-derived DNA in cell-free urine samples of renal transplantation recipients: urinary DNA chimerism. Clin Chem 1999;45:1741–6.

606 Clinical Chemistry 55:4 (2009)

4. Zhong XY, Hahn D, Troeger C, Klemm A, Stein G, Thomson P, et al. Cell-free DNA in urine: a marker for kidney graft rejection, but not for prenatal diagnosis? Ann N Y Acad Sci 2001;945:250–7.
5. Al-Yatama MK, Mustafa AS, Ali S, Abraham S, Khan Z, Khaja N. Detection of Y chromosome-specific DNA in the plasma and urine of pregnant women using nested polymerase chain reaction. Prenat Diagn 2001;21:399–402.
6. Koide K, Sekizawa A, Iwasaki M, Matsuoka R, Honma S, Farina A, et al. Fragmentation of cell-free fetal DNA in plasma and urine of pregnant woman. Prenat Diagn 2005;25:604–7.
7. Majer S, Bauer M, Magnet E, Strele A, Giegerl E, Eder M, et al. Maternal urine for prenatal diagnosis: an analysis of cell-free fetal DNA in maternal urine and plasma in the third trimester. Prenat Diagn 2007;27:1219–23.
8. Illanes S, Denbow ML, Smith RP, Overton TG, Soothill PW, Finning K. Detection of cell-free fetal DNA in maternal urine. Prenat Diagn 2006;26:1216–8.
9. Li Y, Zhong XY, Kang A, Troeger C, Holzgreve W, Hahn S. Inability to detect cell free fetal DNA in the urine of normal pregnant women nor in those affected by preeclampsia associated HELLP Syndrome. J Soc Gynecol Investig 2003;10:503–8.
10. Umansky SR, Tomei LD. Transrenal DNA testing: progress and perspectives. Expert Rev Mol Diagn 2006;6:153–63.
11. Shekhtman EM, Anne K, Melkonyan HS, Robbins DJ, Warsof SL, Umansky SR. Optimization of transrenal DNA analysis: detection of fetal DNA in maternal urine. Clin Chem 2009;55:723–9.

Editorials

Clinical Chemistry 55:4 607–608 (2009)

Next-Generation Sequencing of Plasma/Serum DNA: An Emerging Research and Molecular Diagnostic Tool Y.M. Dennis Lo* and Rossa W.K. Chiu

Centre for Research into Circulating Fetal Nucleic Acids, Li Ka Shing Institute of Health Sciences and Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China. * Address correspondence to this author at: Department of Chemical Pathology, The Chinese University of Hong Kong, Prince of Wales Hospital, 30-32 Ngan Shing St., Shatin, New Territories, Hong Kong SAR, China. Fax +852 2636 5090; e-mail [email protected]. Received January 25, 2009; accepted January 28, 2009. Previously published online at DOI: 10.1373/clinchem.2009.123661

Over the past decade, it has been increasingly recognized that cell-free DNA and RNA molecules in plasma and serum are valuable molecular diagnostic tools (1, 2). For example, tumor-derived (3) and fetal-derived (4) nucleic acids have been found in cancer patients and pregnant women, respectively, opening up clinical uses in oncology and prenatal diagnosis. A number of these applications have already been incorporated into clinical practice, such as prenatal determination of fetal RhD status (5) and detection and monitoring of nasopharyngeal carcinoma by measurement of plasma Epstein-Barr virus DNA (6). In contrast to these clinical applications, the characterization of circulating nucleic acids has received less attention. DNA sequencing is a powerful method to address this imbalance. A number of groups have used conventional cloning and DNA sequencing techniques to detect and study circulating nucleic acids (7–9). However, such methods are labor-intensive and can generate sequence information for only a small number of molecules from a few genomic loci. The recent advent of next-generation, massively parallel sequencing technologies (10) has provided an alternative approach for the detection, measurement, and characterization of plasma nucleic acids. Three groups have reported the use of massively parallel sequencing technologies in the analysis of plasma/serum DNA (11–13). These approaches fall into 2 main types: the first involves random sequencing of DNA molecules in plasma/serum (11, 12), and the second involves deep sequencing of a selected (i.e., nonrandom) subset of circulating DNA molecules (13). As an illustration of the power of the random sequencing approach, Chiu et al. (11) and Fan et al. (12) used the Illumina/Solexa platform to randomly sequence a short (25–36 bp) tag on millions of DNA molecules obtained from the plasma of pregnant women. The short tags allowed these molecules to be mapped back to the reference human genome, so that the relative representation of each chromosome in plasma could be calculated. These groups showed that in women carrying fetuses with a chromosomal aneuploidy (e.g., trisomy 21), the proportional representation of the affected chromosome (e.g., chromosome 21) is increased in maternal plasma. This approach thus provides a basis for developing noninvasive prenatal tests for fetal chromosomal aneuploidies. Korshunova et al. (13), on the other hand, illustrated deep sequencing of selected, nonrandom genomic loci by performing massively parallel bisulfite sequencing of 4 loci on the Roche/454 platform in sera from cancer patients and control subjects. They found DNA molecules with every conceivable cytosine-methylation pattern in the sera of both the cancer patients and the cancer-free controls, highlighting the challenge of developing highly specific serum DNA methylation markers for cancer. In this issue of Clinical Chemistry, Beck et al. (14) provide another illustration of the random sequencing approach, applying the Roche/454 platform to massively parallel sequencing of serum DNA from a cohort of apparently healthy individuals. Beck et al. obtained a total of 450 000 sequences from 50 apparently healthy individuals. Although not explicitly stated, it appears that the authors pooled the samples before sequencing, obtaining an average of 9100 sequence reads per sample. The authors found that most classes of sequences they analyzed (e.g., genes and RNA and DNA coding sequences) did not appear to differ between serum DNA and genomic DNA. On the other hand, they found evidence of overrepresentation of Alu sequences in serum DNA.
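The proportional-representation idea used by Chiu et al. (11) and Fan et al. (12) can be sketched as follows: count uniquely mapped reads per chromosome, express chromosome 21 as a fraction of all mapped reads, and compare that fraction with the spread observed in euploid reference pregnancies, for example as a z-score. This is a simplified illustration, not either group's published pipeline; the function names, the reference fractions, the read counts, and the decision cutoff (a z-score above 3, used here purely as an example) are all hypothetical.

```python
from statistics import mean, stdev


def chromosome_fraction(read_counts: dict[str, int], chrom: str) -> float:
    """Fraction of uniquely mapped reads that fall on `chrom`."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads")
    return read_counts.get(chrom, 0) / total


def z_score(test_fraction: float, reference_fractions: list[float]) -> float:
    """Standardize a test sample's fraction against euploid reference samples."""
    if len(reference_fractions) < 2:
        raise ValueError("need at least 2 reference samples to estimate spread")
    mu = mean(reference_fractions)
    sd = stdev(reference_fractions)  # sample standard deviation
    return (test_fraction - mu) / sd


if __name__ == "__main__":
    # Hypothetical chr21 fractions from euploid pregnancies (illustrative values).
    reference = [0.01300, 0.01305, 0.01296, 0.01302, 0.01298]
    # Hypothetical mapped-read counts for one test plasma sample.
    counts = {"chr21": 13_300, "other": 986_700}
    frac = chromosome_fraction(counts, "chr21")
    z = z_score(frac, reference)
    print(f"chr21 fraction = {frac:.5f}, z = {z:.1f}, flagged = {z > 3}")
```

The design choice worth noting is that the comparison is relative: because the fetal contribution to maternal plasma DNA is small, the expected shift in the chromosome 21 fraction is small, which is why millions of counted molecules and a tight reference distribution matter more than read length.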
It is well established that serum contains a higher concentration of DNA than plasma, possibly because DNA is released from blood cells during clotting (15). One could therefore argue that studying plasma rather than serum DNA by massively parallel sequencing would have better reflected the in vivo situation in the circulation. It is particularly interesting that the authors found hepatitis B virus DNA sequences in one serum sample, demonstrating that the approach might have future applications in infectious disease research. The authors did not show whether the hepatitis B virus sequences obtained by massively parallel sequencing agreed with those obtained by conventional cloning and sequencing of the same case. It is also intriguing that the authors reported that some 0.16% of the sequences might be of bacterial origin. A more detailed analysis of the types of bacteria that might have contributed to this phenomenon, and of its potential clinical significance, would be of interest. An increasing number of platforms are available for massively parallel sequencing (10). For quantitative random sequencing of plasma/serum DNA, such as that discussed above for noninvasive prenatal diagnosis of fetal chromosomal aneuploidies (11, 12), one could argue that platforms providing short reads but higher throughput, in terms of the number of molecules analyzed, might be more efficient and cost-effective than platforms providing longer reads but lower throughput. In such applications, as long as the read length is sufficient for mapping back to the genome (e.g., 25–36 bp) or to another reference set of sequences, the task would be adequately performed. On the other hand, for applications focused on the characterization of plasma/serum DNA, e.g., identification of novel pathogens or mutation detection, platforms with longer reads might have advantages over those with shorter reads. With rapid increases in the read lengths and throughput of the various platforms, the gaps between these systems for plasma/serum nucleic acid sequencing might narrow in the near future. It will probably still be a number of years, however, before diagnostic applications of massively parallel sequencing become commonplace.
Currently, the equipment and reagents are still relatively expensive. Furthermore, the bioinformatic support that is needed to analyze the data is immense and out of reach for most diagnostic laboratories at the present time. Nonetheless, with additional technical advances and cost reduction in the coming years, it is highly likely that massively parallel sequencing approaches will eventually become a routine tool in laboratory medicine.
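The point about read length being "sufficient for mapping back to the genome" can be made concrete with a back-of-the-envelope calculation. In an idealized genome of random sequence, one specific read of length L is expected to occur by chance about 2G/4^L times across both strands of a genome of G bases; the sketch finds the shortest L for which this expected count falls below a chosen tolerance. The genome size of about 3.1 × 10^9 bases and the function names are assumptions for illustration. Real genomes are far from random (repeats such as Alu elements are abundant), so in practice reads need to be somewhat longer, and a proportion of reads still cannot be mapped uniquely.

```python
def expected_chance_matches(read_len: int, genome_size: float = 3.1e9) -> float:
    """Expected number of chance occurrences of one specific read of length
    `read_len` in a random-sequence genome of `genome_size` bases (both strands)."""
    return 2 * genome_size / 4 ** read_len


def shortest_unique_read(tolerance: float, genome_size: float = 3.1e9) -> int:
    """Smallest read length whose expected chance matches fall below `tolerance`."""
    if tolerance <= 0:
        raise ValueError("tolerance must be positive")
    length = 1
    while expected_chance_matches(length, genome_size) >= tolerance:
        length += 1
    return length


if __name__ == "__main__":
    for tol in (1.0, 0.01):
        print(f"expected chance matches < {tol}: read length >= {shortest_unique_read(tol)} bp")
```

Under these idealized assumptions the threshold lands at roughly 17 to 20 bp, below the 25–36-bp tags used by Chiu et al. (11) and Fan et al. (12), which leaves some margin for the repetitive nature of the real genome.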

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: Y.M.D. Lo, Sequenom. Stock Ownership: Y.M.D. Lo, Sequenom. Honoraria: None declared. Research Funding: Y.M.D. Lo, Sequenom. Expert Testimony: None declared. Other: Y.M.D. Lo holds patents or patent applications on aspects of the use of plasma DNA/RNA for molecular diagnosis. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Gahan PB, Swaminathan R. Circulating nucleic acids in plasma and serum: recent developments. Ann N Y Acad Sci 2008;1137:1–6.
2. Lo YMD, Chiu RWK. Prenatal diagnosis: progress through plasma nucleic acids. Nat Rev Genet 2007;8:71–7.
3. Diehl F, Schmidt K, Choti MA, Romans K, Goodman S, Li M, et al. Circulating mutant DNA to assess tumor dynamics. Nat Med 2008;14:985–90.
4. Lo YMD, Tsui NB, Chiu RWK, Lau TK, Leung TN, Heung MM, et al. Plasma placental RNA allelic ratio permits noninvasive prenatal chromosomal aneuploidy detection. Nat Med 2007;13:218–23.
5. Finning K, Martin P, Summers J, Massey E, Poole G, Daniels G. Effect of high throughput RHD typing of fetal DNA in maternal plasma on use of anti-RhD immunoglobulin in RhD negative pregnant women: prospective feasibility study. BMJ 2008;336:816–8.
6. Lo YMD, Chan LY, Lo KW, Leung SF, Zhang J, Chan AT, et al. Quantitative analysis of cell-free Epstein-Barr virus DNA in plasma of patients with nasopharyngeal carcinoma. Cancer Res 1999;59:1188–91.
7. Poon LLM, Leung TN, Lau TK, Chow KC, Lo YMD. Differential DNA methylation between fetus and mother as a strategy for detecting fetal DNA in maternal plasma. Clin Chem 2002;48:35–41.
8. Suzuki N, Kamataki A, Yamaki J, Homma Y. Characterization of circulating DNA in healthy human plasma. Clin Chim Acta 2008;387:55–8.
9. van der Vaart M, Pretorius PJ. A method for characterization of total circulating DNA. Ann N Y Acad Sci 2008;1137:92–7.
10. Schuster SC. Next-generation sequencing transforms today’s biology. Nat Methods 2008;5:16–8.
11. Chiu RWK, Chan KCA, Gao Y, Lau VYM, Zheng W, Leung TY, et al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci U S A 2008;105:20458–63.
12. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A 2008;105:16266–71.
13. Korshunova Y, Maloney RK, Lakey N, Citek RW, Bacher B, Budiman A, et al. Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA. Genome Res 2008;18:19–29.
14. Beck J, Urnovitz HB, Riggert J, Clerici M, Schütz E. Profile of the circulating DNA in apparently healthy individuals. Clin Chem 2009;55:730–8.
15. Lo YMD, Tein MS, Lau TK, Haines CJ, Leung TN, Poon PM, et al. Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. Am J Hum Genet 1998;62:768–75.

Perspective

Clinical Chemistry 55:4 609–610 (2009)

A New Tool for Oligonucleotide Import into Cells Daniel C. Leslie1 and James P. Landers1,2,3*

The field of gene therapy potentially offers physicians an entirely new set of armaments with which to battle disease. The approach, radical in concept but powerful in its potential effect, generally aims to correct defective genes responsible for disease development, either by inserting the normal gene into a nonspecific location in the genome or by swapping out the abnormal gene through homologous recombination. Perhaps more tangible near-term approaches are those that aim either to repair the abnormal gene via selective reverse mutation or to "knock down" expression of the mutated gene. Delivery of oligonucleotides, whether as entire genes or as shorter antisense strands, remains one of the most significant challenges in the field. The linchpin of success is the ability to insert oligonucleotides efficiently and selectively into the desired cells. Current efforts to apply oligonucleotide delivery in this and other arenas are restricted by the relatively low efficiency of the methods used for oligonucleotide transfer. In addition, potential hazards exist with any method that introduces foreign biological material into the body: toxicity, as well as the potential for an immune response against a specific gene-delivery vehicle, complicates the challenge. It is against this backdrop that a series of papers from the Mirkin group at Northwestern University is significant (1–4). Few dispute the power of oligonucleotide probes as detectors of nucleic acids and regulators of gene expression in living cells. However, the utility of such probes is hampered by poor cellular uptake and by degradation once inside the cell. Coupling these probes to gold nanoparticles is one approach that may be of measurable utility. The work of Mirkin and colleagues with biomolecule-modified nanoparticles suggests that these entities may provide a versatile new set of tools for intracellular reporting and gene expression control.
These tools will facilitate research in living, functioning cells, possibly leading to novel therapeutic agents and perhaps laying the groundwork for a new generation of molecular vehicles for oligonucleotide delivery.

Departments of 1 Chemistry; 2 Mechanical Engineering; and 3 Pathology, University of Virginia, Charlottesville, VA. * Address correspondence to this author at: Department of Chemistry, University of Virginia, McCormick Road, P.O. Box 400319, Charlottesville, VA 22904. Fax 434-243-8852; e-mail [email protected]. Received January 13, 2009; accepted January 29, 2009. Previously published online at DOI: 10.1373/clinchem.2008.113019

Cellular Uptake
Gold nanoparticle–oligonucleotide complexes can regulate protein expression in cells (1). Their advantages over other methods are that the complexes do not require a separate transfection agent to enter cells in substantial quantities and, once inside, are less susceptible to DNase digestion. More than 60% of the oligonucleotides remain bound to the gold nanoparticles after 48 h inside C166 cells, a mouse endothelial cell line. Further experiments showed that bound oligonucleotides were less susceptible to enzymatic degradation, opening the possibility of longer-lasting knockdown than with oligonucleotides alone. The nanoparticles outperformed commercially available transfection agents (lipoplexes) in percentage knockdown of enhanced green fluorescent protein expression, in total antisense oligonucleotide delivery, and in observed lack of toxicity. Expression was decreased by 6%–8% with the commercial kits but by up to 20% with the nanoparticles (1). Further studies confirmed that uptake of the modified nanoparticles is high across multiple cell lines (2), more than 3 orders of magnitude higher than that of unmodified nanoparticles. The modified nanoparticles changed markedly in both diameter and surface potential on exposure to cell culture media. This effect is attributed to the binding of positively charged serum proteins to the nanoparticles, a process that is key to cellular uptake. The nontoxic cellular uptake of large amounts of stable oligonucleotides opens the possibility of delivering reporters and controllers of gene activity in unprecedented ways.

Nano-flares
Probes to detect intracellular levels of RNA are often difficult to transfect and susceptible to degradation. Recently, Seferos et al. showed that oligonucleotide-modified nanoparticles could be used as intracellular reporters of gene expression in living cells (3).
Nano-flares combine transfection with reporting of RNA concentrations in living cells. Each nano-flare carries nanoparticle-bound oligonucleotides hybridized to fluorescently labeled complementary reporter strands; when hybridized, the fluorophore is quenched by the gold nanoparticle. The bound oligonucleotides are complementary to the target mRNA sequence, and in the presence of the target mRNA, the fluorescent reporter is released from the nanoparticle as the mRNA binds to the bound oligonucleotide. Fluorescence data from a population of nano-flare–treated cells showed a 2.5-fold increase in fluorescence with nano-flares carrying the survivin recognition sequence compared with a population treated with noncomplementary nano-flares. In cells of the SKBR3 human breast-cancer cell line, background fluorescence with noncomplementary molecular beacons was almost twice that with noncomplementary nano-flares. Ultimately, the nano-flares compared favorably with RT-PCR quantification of survivin expression in cells exposed to small interfering RNA.

Peptide Antisense Nanoparticles
Patel et al. demonstrated the synthesis of gold nanoparticles modified with both synthetic peptides and oligonucleotides designed to regulate gene expression, a technique that may lead to novel therapies (4). Building on their previous work with proteins that facilitate intracellular trafficking, the peptides bound to the nanoparticles were linked to increased cellular uptake and/or altered intracellular localization. The peptide–oligonucleotide-modified nanoparticles decreased expression of glyceraldehyde 3-phosphate dehydrogenase by 75%, whereas oligonucleotide-modified nanoparticles decreased it by only 50% (4). Interestingly, the increased knockdown was not associated with greater numbers of nanoparticles per cell; instead, the nanoparticles were reproducibly located closer to the cell nucleus.

Synopsis
A convincing body of work increasingly suggests that gold nanoparticles could see wide application in intracellular reporting and gene expression control. The reports surveyed here point to the evolution of nanoparticle-based therapeutic approaches that offer more efficient knockdown of gene expression without the need for transfection agents, which often adversely affect cellular function. As with other emerging technologies, and particularly with nanotechnology-based tools, short- and long-term toxicity must be studied, as must the effects on cellular processes in vivo and on the immune system. As recently noted by Austin and Lim, the perils of nanotechnology in medicine must be carefully weighed (5). Although gold is chemically inert and generally regarded as safe in biological organisms, its properties in nanoparticle form are less well understood, and its pharmacokinetics and pharmacodynamics in any nanoparticle-based approach will have to be evaluated. However, that should not taint our view of the potential advances in oligonucleotide delivery described in these reports.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Rosi NL, Giljohann DA, Thaxton CS, Lytton-Jean AKR, Han MS, Mirkin CA. Oligonucleotide-modified gold nanoparticles for intracellular gene regulation. Science (Wash DC) 2006;312:1027–30.
2. Giljohann DA, Seferos DS, Patel PC, Millstone JE, Rosi NL, Mirkin CA. Oligonucleotide loading determines cellular uptake of DNA-modified gold nanoparticles. Nano Lett 2007;7:3818–21.
3. Seferos DS, Giljohann DA, Hill HD, Prigodich AE, Mirkin CA. Nano-flares: probes for transfection and mRNA detection in living cells. J Am Chem Soc 2007;129:15477–9.
4. Patel PC, Giljohann DA, Seferos DS, Mirkin CA. Peptide antisense nanoparticles. Proc Natl Acad Sci U S A 2008;105:17222–6.
5. Austin RH, Lim S-F. The Sackler Colloquium on promises and perils in nanotechnology for medicine. Proc Natl Acad Sci U S A 2008;105:17217–21.

Special Report

Clinical Chemistry 55:4 611–622 (2009)

The MIQE Guidelines: Minimum Information for Publication of Quantitative Real-Time PCR Experiments Stephen A. Bustin,1* Vladimir Benes,2 Jeremy A. Garson,3,4 Jan Hellemans,5 Jim Huggett,6 Mikael Kubista,7,8 Reinhold Mueller,9 Tania Nolan,10 Michael W. Pfaffl,11 Gregory L. Shipley,12 Jo Vandesompele,5 and Carl T. Wittwer13,14

BACKGROUND: Currently, a lack of consensus exists on how best to perform and interpret quantitative real-time PCR (qPCR) experiments. The problem is exacerbated by a lack of sufficient experimental detail in many publications, which impedes a reader’s ability to critically evaluate the quality of the results presented or to repeat the experiments.

CONTENT:

The Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines target the reliability of results to help ensure the integrity of the scientific literature, promote consistency between laboratories, and increase experimental transparency. MIQE is a set of guidelines that describe the minimum information necessary for evaluating qPCR experiments. Included is a checklist to accompany the initial submission of a manuscript to the publisher. When authors provide all relevant experimental conditions and assay characteristics, reviewers can assess the validity of the protocols used. Full disclosure of all reagents, sequences, and analysis methods is necessary to enable other investigators to reproduce results. MIQE details should be published either in abbreviated form or as an online supplement.

1 Centre for Academic Surgery, Institute of Cell and Molecular Science, Barts and the London School of Medicine and Dentistry, London, UK; 2 Genomics Core Facility, EMBL Heidelberg, Heidelberg, Germany; 3 Centre for Virology, Department of Infection, University College London, London, UK; 4 Department of Virology, UCL Hospitals NHS Foundation Trust, London, UK; 5 Center for Medical Genetics, Ghent University Hospital, Ghent, Belgium; 6 Centre for Infectious Diseases, University College London, London, UK; 7 TATAA Biocenter, Göteborg, Sweden; 8 Institute of Biotechnology AS CR, Prague, Czech Republic; 9 Sequenom, San Diego, California, USA; 10 Sigma–Aldrich, Haverhill, UK; 11 Physiology Weihenstephan, Technical University Munich, Freising, Germany; 12 Quantitative Genomics Core Laboratory, Department of Integrative Biology and Pharmacology, University of Texas Health Science Center, Houston, Texas, USA; 13 Department of Pathology, University of Utah, Salt Lake City, Utah, USA; 14 ARUP Institute for Clinical and Experimental Pathology, Salt Lake City, Utah, USA. * Address correspondence to this author at: 3rd Floor Alexandra Wing, The Royal London Hospital, London E1 1BB, UK. Fax +44-(0)20-7377 7283; e-mail [email protected]. Received October 20, 2008; accepted January 27, 2009. Previously published online at DOI: 10.1373/clinchem.2008.112797

SUMMARY:

Following these guidelines will encourage better experimental practice, allowing more reliable and unequivocal interpretation of qPCR results.

© 2009 American Association for Clinical Chemistry

Fluorescence-based quantitative real-time PCR (qPCR)¹⁵ (1–3), with its capacity to detect and measure minute amounts of nucleic acids in a wide range of samples from numerous sources, is the enabling technology par excellence of molecular diagnostics, life sciences, agriculture, and medicine (4, 5). Its conceptual and practical simplicity, together with its combination of speed, sensitivity, and specificity in a homogeneous assay, has made it the touchstone for nucleic acid quantification. In addition to its use as a research tool, many diagnostic applications have been developed, including microbial quantification, gene dosage determination, identification of transgenes in genetically modified foods, risk assessment of cancer recurrence, and forensic applications (6–11). This popularity is reflected in the prodigious number of publications reporting qPCR data, which invariably use diverse reagents, protocols, analysis methods, and reporting formats. This remarkable lack of consensus on how best to perform qPCR experiments has the adverse consequence of perpetuating a string of serious shortcomings that encumber its status as an independent yardstick (12). Technical deficiencies that affect assay performance include the following: (a) inadequate sample storage, preparation, and nucleic acid quality, yielding highly variable results; (b) poor choice of reverse-transcription primers and of primers and probes for the PCR, leading to inefficient and less-than-robust assay performance;

15 Nonstandard abbreviations: qPCR, quantitative real-time PCR; MIQE, Minimum Information for Publication of Quantitative Real-Time PCR Experiments; RT-qPCR, reverse transcription–qPCR; FRET, fluorescence resonance energy transfer; Cq, quantification cycle, previously known as the threshold cycle (Ct), crossing point (Cp), or take-off point (TOP); RDML, Real-Time PCR Data Markup Language; LOD, limit of detection; NTC, no-template control.


and (c) inappropriate data and statistical analyses, generating results that can be highly misleading. Consequently, there is a real danger of the scientific literature being corrupted with a multitude of publications reporting inadequate and conflicting results (13). The publication (14) and retraction (15) of a Science “Breakthrough of the Year 2005” report provides a disquieting warning. The problem is exacerbated by the lack of information that characterizes most reports of studies that have used this technology, with many publications not providing sufficient experimental detail to permit the reader to critically evaluate the quality of the results presented or to repeat the experiments. Specifically, information about sample acquisition and handling, RNA quality and integrity, reverse-transcription details, PCR efficiencies, and analysis parameters is frequently omitted, whereas sample normalization is habitually carried out against single reference genes without adequate justification. The aim of this document is to provide authors, reviewers, and editors with specifications for the minimum information, set out in Table 1, that must be reported for a qPCR experiment to ensure its relevance, accuracy, correct interpretation, and repeatability. MIQE (Minimum Information for Publication of Quantitative Real-Time PCR Experiments, pronounced mykee) is modeled on similar guidelines drawn up for DNA microarray analysis (16), proteomics experiments (17), genome sequence specification (18), and those under discussion for RNA interference work (19, 20) and metabolomics (21), all of which are initiatives coordinated under the umbrella of MIBBI (Minimum Information for Biological and Biomedical Investigations, http://www.mibbi.org) (22). Compulsory inclusion of a common reporting language to allow data sharing is not proposed, although it is envisaged that a future update of these guidelines could include such a recommendation.
Rather, these guidelines target the reliability of results to help ensure the integrity of the scientific literature, promote consistency between laboratories, and increase experimental transparency. They should be read in conjunction with recent publications that deal in depth with the issue of qPCR standardization (23–26).

1. Nomenclature

A few terms require standardization to ensure clarity:

1.1 We propose that the abbreviation qPCR be used for quantitative real-time PCR and that RT-qPCR be used for reverse transcription–qPCR. Applying the abbreviation RT-PCR to qPCR causes confusion and is inconsistent with its use for conventional (legacy) reverse transcription–PCR.

1.2 Genes used for normalization should be referred to as reference genes, not as housekeeping genes.

1.3 TaqMan probes should be referred to as hydrolysis probes.

1.4 The term FRET probe (fluorescence resonance energy transfer probe) refers to a generic mechanism in which emission/quenching relies on the interaction between the electron-excitation states of 2 fluorescent dye molecules. LightCycler-type probes should be referred to as dual hybridization probes.

1.5 The Oxford English Dictionary lists only quantification, not quantitation; therefore, the former is the proper word.

1.6 The nomenclature describing the fractional PCR cycle used for quantification is inconsistent, with threshold cycle (Ct), crossing point (Cp), and take-off point (TOP) currently used in the literature. These terms all refer to the same value from the real-time instrument and were coined by competing manufacturers of real-time instruments for reasons of product differentiation, not scientific accuracy or clarity. We propose the use of quantification cycle (Cq), according to the RDML (Real-Time PCR Data Markup Language) data standard (http://www.rdml.org) (27).

Table 1. MIQE checklist for authors, reviewers, and editors.[a]

Item to check (Importance)

Experimental design
Definition of experimental and control groups (E)
Number within each group (E)
Assay carried out by the core or investigator's laboratory? (D)
Acknowledgment of authors' contributions (D)

Sample
Description (E)
Volume/mass of sample processed (D)
Microdissection or macrodissection (E)
Processing procedure (E)
If frozen, how and how quickly? (E)
If fixed, with what and how quickly? (E)
Sample storage conditions and duration (especially for FFPE[b] samples) (E)

Nucleic acid extraction
Procedure and/or instrumentation (E)
Name of kit and details of any modifications (E)
Source of additional reagents used (D)
Details of DNase or RNase treatment (E)
Contamination assessment (DNA or RNA) (E)
Nucleic acid quantification (E)
Instrument and method (E)
Purity (A260/A280) (D)
Yield (D)
RNA integrity: method/instrument (E)
RIN/RQI or Cq of 3′ and 5′ transcripts (E)
Electrophoresis traces (D)
Inhibition testing (Cq dilutions, spike, or other) (E)

Reverse transcription
Complete reaction conditions (E)
Amount of RNA and reaction volume (E)
Priming oligonucleotide (if using GSP) and concentration (E)
Reverse transcriptase and concentration (E)
Temperature and time (E)
Manufacturer of reagents and catalogue numbers (D)
Cqs with and without reverse transcription (D)[c]
Storage conditions of cDNA (D)

qPCR target information
Gene symbol (E)
Sequence accession number (E)
Location of amplicon (D)
Amplicon length (E)
In silico specificity screen (BLAST, and so on) (E)
Pseudogenes, retropseudogenes, or other homologs? (D)
Sequence alignment (D)
Secondary structure analysis of amplicon (D)
Location of each primer by exon or intron (if applicable) (E)
What splice variants are targeted? (E)

qPCR oligonucleotides
Primer sequences (E)
RTPrimerDB identification number (D)
Probe sequences (D)[d]
Location and identity of any modifications (E)
Manufacturer of oligonucleotides (D)
Purification method (D)

qPCR protocol
Complete reaction conditions (E)
Reaction volume and amount of cDNA/DNA (E)
Primer, (probe), Mg2+, and dNTP concentrations (E)
Polymerase identity and concentration (E)
Buffer/kit identity and manufacturer (E)
Exact chemical composition of the buffer (D)
Additives (SYBR Green I, DMSO, and so forth) (E)
Manufacturer of plates/tubes and catalog number (D)
Complete thermocycling parameters (E)
Reaction setup (manual/robotic) (D)
Manufacturer of qPCR instrument (E)

qPCR validation
Evidence of optimization (from gradients) (D)
Specificity (gel, sequence, melt, or digest) (E)
For SYBR Green I, Cq of the NTC (E)
Calibration curves with slope and y intercept (E)
PCR efficiency calculated from slope (E)
CIs for PCR efficiency or SE (D)
r² of calibration curve (E)
Linear dynamic range (E)
Cq variation at LOD (E)
CIs throughout range (D)
Evidence for LOD (E)
If multiplex, efficiency and LOD of each assay (E)

Data analysis
qPCR analysis program (source, version) (E)
Method of Cq determination (E)
Outlier identification and disposition (E)
Results for NTCs (E)
Justification of number and choice of reference genes (E)
Description of normalization method (E)
Number and concordance of biological replicates (D)
Number and stage (reverse transcription or qPCR) of technical replicates (E)
Repeatability (intraassay variation) (E)
Reproducibility (interassay variation, CV) (D)
Power analysis (D)
Statistical methods for results significance (E)
Software (source, version) (E)
Cq or raw data submission with RDML (D)

[a] All essential information (E) must be submitted with the manuscript. Desirable information (D) should be submitted if available. If primers are from RTPrimerDB, information on qPCR target, oligonucleotides, protocols, and validation is available from that source.
[b] FFPE, formalin-fixed, paraffin-embedded; RIN, RNA integrity number; RQI, RNA quality indicator; GSP, gene-specific priming; dNTP, deoxynucleoside triphosphate.
[c] Assessing the absence of DNA with a no–reverse transcription assay is essential when first extracting RNA. Once the sample has been validated as DNA free, inclusion of a no–reverse transcription control is desirable but no longer essential.
[d] Disclosure of the probe sequence is highly desirable and strongly encouraged; however, because not all vendors of commercial predesigned assays provide this information, it cannot be an essential requirement. Use of such assays is discouraged.

2. Conceptual Considerations

To explain and justify the guidelines, we find it useful to review a number of key issues surrounding qPCR experiments:

2.1 Analytical sensitivity refers to the minimum number of copies in a sample that can be measured accurately with an assay, whereas clinical sensitivity is the percentage of individuals with a given disorder whom the assay identifies as positive for that condition. Typically, analytical sensitivity is expressed as the limit of detection (LOD), which is the concentration that can be detected with reasonable certainty (95% probability is commonly used) with a given analytical procedure. The most sensitive LOD theoretically possible is 3 copies per PCR (28), assuming a Poisson distribution, a 95% chance of including at least 1 copy in the PCR, and single-copy detection. Experimental procedures typically include sample-processing steps (i.e., extraction) and, when required, reverse transcription. If the volume changes and the efficiencies of these steps are accounted for, the most sensitive LOD theoretically possible can be expressed in units relevant to the experiment, such as copies per nanogram of tissue. Experimental results less than the theoretically possible LOD should never be reported. It also follows that results of "0" are meaningless and misleading. LOD estimates in qPCR analyses are complicated by the logarithmic nature of Cq, because Cq is undefined when the template concentration is zero. Appropriate determination and modeling of the LOD in qPCR is the focus of continued research (26).

2.2 Analytical specificity refers to the qPCR assay detecting the appropriate target sequence rather than other, nonspecific targets also present in a sample. Diagnostic specificity is the percentage of individuals without a given condition whom the assay identifies as negative for that condition.

2.3 Accuracy refers to the difference between experimentally measured and actual concentrations, presented as fold changes or copy number estimates.

2.4 Repeatability (short-term precision or intraassay variance) refers to the precision and robustness of the assay with the same samples repeatedly analyzed in the same assay. It may be expressed as the SD of the Cq values. Alternatively, the SD or the CV of copy numbers or concentrations may be used. CVs should not be used with Cqs, however (29).

2.5 Reproducibility (long-term precision or interassay variance) refers to the variation in results between runs or between different laboratories and is typically expressed as the SD or CV of copy numbers or concentrations. Cq values generated from different runs are subject to inherent interrun variation (30); hence, reporting interrun Cq variation is not appropriate.
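The 3-copy floor in 2.1 follows from Poisson sampling alone. The Python sketch below is illustrative only: it assumes perfect single-copy detection (a reaction is positive exactly when it contains at least one template molecule), and the function names are hypothetical. The same relation, inverted, underlies the limiting-dilution and digital PCR estimates mentioned under LOD in section 7.4.3.

```python
import math


def prob_at_least_one_copy(mean_copies: float) -> float:
    """Poisson probability that a reaction receives at least one template copy.

    Assumes copies are distributed randomly across reactions, so the count in
    one reaction is Poisson(mean_copies) and P(0 copies) = exp(-mean_copies).
    """
    if mean_copies < 0:
        raise ValueError("mean_copies must be non-negative")
    return 1.0 - math.exp(-mean_copies)


def min_mean_copies_for_detection(probability: float = 0.95) -> float:
    """Smallest mean copies per reaction giving `probability` of >= 1 copy.

    Solves 1 - exp(-lam) = p for lam, i.e. lam = -ln(1 - p).
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    return -math.log(1.0 - probability)


def mean_copies_from_positive_fraction(n_positive: int, n_total: int) -> float:
    """Limiting-dilution (digital PCR style) estimate of mean copies per reaction.

    The fraction of negative reactions estimates exp(-lam), so
    lam = -ln(fraction negative). Needs at least one negative reaction.
    """
    if not 0 <= n_positive < n_total:
        raise ValueError("need at least one negative reaction to estimate lam")
    return -math.log((n_total - n_positive) / n_total)


if __name__ == "__main__":
    lam95 = min_mean_copies_for_detection(0.95)
    print(f"mean copies for 95% detection: {lam95:.3f}")  # 2.996, i.e. ~3 copies
    for mean in (0.5, 1.0, 2.0, 3.0, 5.0):
        print(f"{mean:>4} copies/PCR -> P(detect) = {prob_at_least_one_copy(mean):.3f}")
    # Hypothetical limiting-dilution run: 40 of 96 replicate reactions positive.
    print(f"estimated copies/PCR: {mean_copies_from_positive_fraction(40, 96):.3f}")
```

Solving 1 − e^(−λ) = 0.95 gives λ = ln 20 ≈ 3.0 copies per reaction, which is where the theoretical minimum LOD comes from; losses during extraction, reverse transcription, and amplification in a real assay can only raise this floor.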

Publications describing mRNA concentrations for target genes should make clear precisely what the targets are. The transcripts of most human genes and many genes in other multicellular organisms are alternatively spliced (31, 32), and these splicing variants specify alternative protein isoforms, with variation in splicing patterns reported in different tissues or at different developmental stages. Consequently, single exon–based RT-qPCR assays may detect a number of splice variants, whereas intron-spanning primers may be more selective but may miss some splice variants altogether. Most recently, autosomal nonimprinted genes that display allelic imbalance in their expression have been described (33). Taken together, these findings imply that use of an RT-qPCR assay that simply targets one or at most 2 exons of an mRNA is no longer sufficient to describe the expression level of a particular gene. Consequently, sequence information for primers must be provided together with an assessment of their specificity with respect to known splice variants and single-nucleotide polymorphism positions documented in transcript and single-nucleotide polymorphism databases. For primer sets selected from the RTPrimerDB database (34, 35), this is easily done by consulting the RTPrimerDB Web site (http://www.rtprimerdb.org), which contains all the relevant information. For commercial assays, lot information and the providers' experimental validation criteria are required. The reporting of results for nonvalidated commercial assays and assays that have been validated only in silico is strongly discouraged.

It must be remembered that detection of the presence of an mRNA provides no information on whether that mRNA will be translated into a protein or, indeed, whether a functional protein is translated at all. Immunohistochemistry, western blotting, or other protein-quantification methods are not always able to corroborate quantitative cellular mRNA data. It is now well established that there is frequently a lack of concordance between mRNA- and protein-concentration data (36), which is particularly true for mRNAs that specify proteins that are part of multifunction protein complexes (37). Finally, it has become clear that knowledge of the presence and function of specific microRNAs is as important to understanding gene expression as being able to quantify the mRNA species (38).

It is also necessary to be aware that most quantitative RNA data are not absolute, but relative. Thus, the reference genes or materials used for standardization are critical, and any assessment of the validity of an RT-qPCR experiment must also consider the appropriateness of the relative-quantification reference. Therefore, the development of universal reference DNA and RNA calibration materials, although very helpful (39, 40), will not be a universal panacea (41, 42).
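To make the dependence on the reference concrete, the sketch below shows one generalized efficiency-corrected model of the kind cited in section 7.4.1: each gene's quantity relative to a calibrator sample is (1 + E) raised to the Cq difference, and the target is divided by the geometric mean of several reference genes. It assumes per-gene efficiencies E (on the 0–1 scale used in the text) are already known from calibration curves; with E = 1 and a single reference gene it reduces to the familiar 2^(−ΔΔCq). `GeneCq` and the other names are hypothetical, not taken from any particular software package.

```python
from statistics import geometric_mean
from typing import NamedTuple, Sequence


class GeneCq(NamedTuple):
    """Cq values for one gene in the test sample and in the calibrator sample."""
    cq_sample: float
    cq_calibrator: float
    efficiency: float = 1.0  # 0-1 scale; 1.0 means the product doubles every cycle


def relative_quantity(gene: GeneCq) -> float:
    """Quantity of this gene in the sample relative to the calibrator sample.

    A lower Cq in the sample means more template, so the exponent is
    (calibrator Cq - sample Cq) and the base is the per-cycle factor (1 + E).
    """
    return (1.0 + gene.efficiency) ** (gene.cq_calibrator - gene.cq_sample)


def normalized_relative_quantity(target: GeneCq, references: Sequence[GeneCq]) -> float:
    """Target quantity divided by the geometric mean of reference-gene quantities."""
    if not references:
        raise ValueError("at least one validated reference gene is required")
    return relative_quantity(target) / geometric_mean(
        [relative_quantity(ref) for ref in references]
    )


if __name__ == "__main__":
    # Hypothetical Cq values, for illustration only.
    target = GeneCq(cq_sample=24.0, cq_calibrator=26.0, efficiency=1.0)
    refs = [GeneCq(20.0, 20.0, 1.0), GeneCq(18.0, 19.0, 1.0)]
    print(f"NRQ, two reference genes: {normalized_relative_quantity(target, refs):.3f}")
    # One reference, E = 1: ddCq = (24 - 18) - (26 - 19) = -1, so 2 ** 1 = 2.
    print(f"NRQ, one reference gene: {normalized_relative_quantity(target, refs[1:]):.3f}")
```

Averaging over several validated reference genes dampens the effect of any single unstable reference, but it does not replace the experimental validation of reference-gene stability that section 8.1 requires.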
Much of the variance in reported expression values produced in RT-qPCR experiments is not simply due to variation in experimental protocols but is caused by corrections applied by various data-processing algorithms, each of which makes its own assumptions about the data. Consequently, although qPCR has frequently been proclaimed a touchstone or a gold standard, in practice this "standard" is a variable one, and the reporting of results requires considerable sophistication of analysis and interpretation (43).

3. Research vs Diagnostic Applications

Applications of qPCR technology can be broadly divided into research and diagnostic applications. Research applications usually analyze a wide range of targets with a fairly low throughput and many different sample types. The main parameters that need to be addressed relate to assay analytical sensitivity and specificity, which in this context refer to how many target copies the assay can detect and whether the no-template controls (NTCs) are reliably negative, respectively. In contrast, diagnostic applications usually analyze a limited number of targets but require high-throughput protocols that are targeted at only a few sample types. Although all of the considerations that apply to research applications also apply to diagnostic assays, clinical-diagnostic assays have a number of additional requirements. These requirements include information on analytical sensitivity and specificity, which in this context refer to how often the assay returns a positive result when a target is present and how often it is negative in the absence of the target. Furthermore, the accuracy and precision within and between laboratories are often monitored by external QC programs. Additional clinical laboratory requirements include criteria for generating reportable results, whether repeated measurements are made on samples, data on the resolution of false-positive and false-negative results, and the similarity of results from multiple laboratories that use the same and different technologies. Thus far, only a couple of interlaboratory comparisons have been performed, and both of these studies emphasized the need for standardization of qPCR diagnostic assays (44, 45). Another interlaboratory exercise is planned within the European Framework 7 project SPIDIA (Standardisation and Improvement of Generic Pre-analytical Tools and Procedures for In-Vitro Diagnostics; http://www.spidia.eu).

4. Sample Acquisition, Handling, and Preparation

Sample acquisition constitutes the first potential source of experimental variability, especially for experiments targeting RNA, because mRNA profiles are easily perturbed by sample-collection and -processing methods.
Although there is some suggestion that fresh tissue can be stored on ice without major effects on RNA quality and concentration (46), this may be true only for some mRNAs and tissues and may not be universally applicable; it is better to be cautious. It is therefore important to report in detail where the tissue sample was obtained and whether it was processed immediately. If the sample was not processed immediately, it is necessary to report how it was preserved and how long and under what conditions it was stored. A brief description of the sample is also essential. For example, microscopic examination of a tumor biopsy will reveal what percentage of the biopsy is made up of tumor cells, and this information should be reported.

Nucleic acid extraction is a second critical step. Extraction efficiency depends on adequate homogenization, the type of sample (e.g., in situ tissue vs log-phase cultured cells), target density, physiological status (e.g., healthy, cancerous, or necrotic), genetic complexity, and the amount of biomass processed. Therefore, it is necessary that details of the nucleic acid–extraction method be provided and that the methods used for measuring nucleic acid concentration and assessing its quality be described. Such details are particularly crucial for RNA extracted from fresh frozen laser-microdissected biopsy samples, because variations in tissue-preparation procedures have a substantial effect on both RNA yield and quality (47).

5. QC of Nucleic Acids

5.1. RNA SAMPLES

Quantification of RNA in the extracted samples is important, because it is advisable that approximately the same amounts of RNA be used when comparing different samples. Several quantification procedures are in common use, including spectrophotometry (NanoDrop; Thermo Scientific), microfluidic analysis (Agilent Technologies' Bioanalyzer, Bio-Rad Laboratories' Experion), capillary gel electrophoresis (Qiagen's QIAxcel), and fluorescent dye detection (Ambion/Applied Biosystems' RiboGreen). These methods produce different results, however, making it unwise to compare data obtained with different methods (48). The preferred method for quantifying RNA uses fluorescent RNA-binding dyes (e.g., RiboGreen), which are best for detecting low target concentrations. In any case, it is advisable to measure all samples with a single method only and to report this information. It is also important to test for and report the extent of genomic-DNA contamination and to record the threshold cutoff criteria for the amounts of such contamination that are tolerable. It is essential to report whether the RNA sample has been treated with DNase (including the type of DNase used and the reaction conditions) and to report the results from a comparison of Cqs obtained with positive and no–reverse transcription controls for each nucleic acid target.

It is also essential to document the quality assessment of RNA templates. The only situation in which this requirement does not apply is when the quantity of total RNA extracted is too low to permit quality assessment. This situation arises when RNA is extracted from single cells, plasma, other cell-free body fluids, some laser-captured samples, or clarified tissue culture medium. It also applies in cases in which extraction and RT-qPCR steps are performed as a continuous, single-tube experiment. Key information to report includes data on RNA quantity, integrity, and the absence of reverse transcription or PCR inhibitors. It is worth remembering that RNA degrades markedly in vivo, owing to the natural regulation of mRNAs in response to environmental stimuli (49). This source of RNA degradation is beyond the control of the researcher; one of its manifestations is that even high-quality RNA samples can show differential degradation of individual mRNAs.

The A260/A280 ratio, which must be measured in a buffer at neutral pH, does provide an indication of RNA purity, because the presence of DNA or residual phenol alters the ratio. This measurement is not sufficient, however, if the nucleic acid is to be used for quantitative analysis, especially when the aim is to measure minor differences (<10-fold) in cellular mRNA concentrations. Instead, one should provide gel electrophoresis evidence at the least or, better yet, results from a microfluidics-based rRNA analysis (50) or a reference gene/target gene 3′:5′ integrity assay (51). The advantage of using a Bioanalyzer/Experion system to calculate an RNA integrity number or an RNA quality indicator number is that these measures provide quantitative information about the general state of the RNA sample. It is important to bear in mind, however, that these numbers relate to rRNA quality and cannot be expected to be an absolute measure of quality. Use of a 3′:5′ assay requires that the PCR efficiencies of both assays be virtually identical (51) and not be subject to differential inhibition. This assay also necessitates the establishment of a threshold criterion that delineates the RNA quality sufficient to yield reliable results. Ideally, the assay should target a panel of "integrity reference genes," probably without introns, with a 3′:5′ threshold ratio of approximately 0.2–5. Clearly, further work is required to generate a universally applicable, cost-effective, and simple protocol for assessing RNA integrity.
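The 3′:5′ integrity assay reduces to a short calculation. A minimal sketch, assuming both amplicons (one near the 3′ end and one near the 5′ end of the same transcript) are amplified with the same efficiency E, as the text requires, and treating the approximate 0.2–5 range as the acceptance window; how the threshold is applied, the sample Cqs, and the function names are all hypothetical.

```python
def three_to_five_prime_ratio(cq_3prime: float, cq_5prime: float,
                              efficiency: float = 1.0) -> float:
    """Relative abundance of the 3' amplicon over the 5' amplicon.

    Assumes equal efficiencies for both assays. A later 5' Cq means fewer
    intact 5' ends relative to 3' ends, which pushes the ratio above 1.
    """
    return (1.0 + efficiency) ** (cq_5prime - cq_3prime)


def within_integrity_window(ratio: float, low: float = 0.2, high: float = 5.0) -> bool:
    """True if the 3':5' ratio lies inside the assumed acceptance window."""
    return low <= ratio <= high


if __name__ == "__main__":
    # Hypothetical (3' Cq, 5' Cq) pairs for one integrity reference gene.
    samples = {"sample_A": (22.0, 22.6), "sample_B": (22.1, 23.9), "sample_C": (21.8, 25.0)}
    for name, (cq3, cq5) in samples.items():
        ratio = three_to_five_prime_ratio(cq3, cq5)
        verdict = "ok" if within_integrity_window(ratio) else "flag as degraded"
        print(f"{name}: 3':5' = {ratio:.2f} -> {verdict}")
```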
Inhibition of reverse-transcription activity or PCR should be checked by dilution of the sample (preferably) or by use of a universal inhibition assay such as SPUD (52, 53). If the RNA sample is shown to be partially degraded, it is essential that this information be reported, because the assay's sensitivity for detecting a low-level transcript may be reduced and relative differences in the degradation of transcripts may produce incorrect target ratios.

5.2. DNA SAMPLES

In general, degradation is much less of an issue with DNA; however, it is important to be able to assess the extent of DNA degradation for forensic applications, i.e., in cases in which harsh environmental conditions at crime scenes, mass disasters, or sites involving missing persons may have degraded the chemical structure of DNA. The small amplicon size of qPCR assays helps to minimize assay-related problems, but methods have been developed that provide a quantitative measurement of DNA quality (54) and should be considered for such specialized purposes. The potential for inhibition is a more generally applicable variable that must be addressed in a publication, and it is important to ensure that no inhibitors copurified with the DNA distort results, e.g., in pathogen detection and quantification (55). Although approaches such as spiking samples with positive controls (52) can be used to detect inhibition, different PCR reactions may not be equally susceptible to inhibition by substances copurified in nucleic acid extracts (56, 57). Consequently, it is better to routinely use dilutions of nucleic acids to demonstrate that the observed shifts in Cqs or copy numbers are consistent with the anticipated result and to report these data.

6. Reverse Transcription

The reverse-transcription step introduces substantial variation into an RT-qPCR assay (58, 59). Hence, it is essential that a detailed description of the protocol and reagents used to convert RNA into cDNA be provided. This documentation must include the amount of RNA reverse-transcribed, priming strategy, enzyme type, volume, temperature, and duration of the reverse-transcription step. It is recommended that the reverse-transcription step be carried out in duplicate or triplicate and that the total RNA concentration be the same in every sample (58).

7. qPCR

The following information must be provided for qPCR assays: database accession numbers of each target and reference gene, the exon locations of each primer and any probe, and the sequences and concentrations of each oligonucleotide, including the identities, positions, and linkages of any dyes and/or modified bases.
Also required are the concentration and identity of the polymerase, the amount of template (DNA or cDNA) in each reaction, the Mg2+ concentration, the exact chemical composition of the buffer (salts, pH, additives), and the reaction volume. The investigators must also identify the instrument they used and document all of the PCR cycling conditions. Because the consumables used affect thermal cycling, it is necessary to identify the use of single tubes, strips, or plates, and their manufacturers. The type of plasticware used (e.g., white or clear) is also important, because different plastics exhibit substantial differences in fluorescence reflection and sensitivity (60). When plates are used, the method of sealing (heat bonding vs adhesives) can affect the evaporation of samples at the plate perimeter and should therefore be documented. Because PCR efficiency is highly dependent on the primers used, their sequences must be published. This requirement is perfectly feasible even with commercial primers, because there is a precedent for companies to make their primer and probe sequences available (http://www.primerdesign.co.uk/research_with_integrity.asp). In addition, submission to public databases such as RTPrimerDB is strongly encouraged; over time, these databases could become universal clearinghouses.

7.1. SECONDARY STRUCTURE

The structure of the nucleic acid target (e.g., stem-and-loop secondary RNA structure) has a substantial impact on the efficiency of reverse transcription and the PCR. Therefore, the positions of primers, probes, and PCR amplicons must take the folding of RNA templates into consideration. Sequences should be checked with nucleic acid–folding software, e.g., mfold for DNA (http://mfold.bioinfo.rpi.edu/cgi-bin/dna-form1.cgi) or RNA (http://frontend.bioinfo.rpi.edu/applications/mfold/cgi-bin/rna-form1-2.3.cgi). Ideally, the folding structures should be made available to reviewers.

7.2. SPECIFICITY

In silico tools such as BLAST or equivalent specificity searches are useful for assay design. Any appreciable homology to pseudogenes or other unexpected targets should be documented and provided as aligned sequences for review; however, specificity must be validated empirically with direct experimental evidence (electrophoresis gel, melting profile, DNA sequencing, amplicon size, and/or restriction enzyme digestion). Algorithms for predicting an oligonucleotide's melting temperature (Tm) are useful for initial design, but the practical optimum annealing temperature must be determined experimentally. Although primer optimization has become unfashionable, it is clear that poor annealing optimization has a large effect on assay quality (51). A marked presence of primer dimers produces a lower PCR efficiency in probe-based assays and may generate false positives in assays based on SYBR Green I. Some evidence for primer optimization should be provided to reviewers, ideally in the form of annealing temperature or Mg2+ gradients, and be presented as Cq values, plots of fluorescence vs cycle number, and/or melting curves (61).

7.3. CONTROLS AND QUANTIFICATION CALIBRATORS

In addition to the no–reverse transcription control in RT-qPCR assays mentioned above, additional controls and/or quantification calibrators are required for all qPCR reactions. NTCs detect PCR contamination when probes are used and can also distinguish unintended amplification products (e.g., primer dimers) from the intended PCR products in SYBR Green I reactions. NTCs should be included on each plate or batch of samples, and conditions for data rejection should be established. For example, NTCs with Cqs ≥40 could be ignored if the Cq for the lowest-concentration unknown is 35. Positive controls in the form of nucleic acids extracted from experimental samples are useful for monitoring assay variation over time and are essential when calibration curves are not performed in each run. Quantification calibrators may be purified target molecules, such as synthetic RNA or DNA oligonucleotides spanning the complete PCR amplicon, plasmid DNA constructs, cDNA cloned into plasmids, RNA transcribed in vitro, reference RNA pools, RNA or DNA from specific biological samples, or internationally recognized biological standards (as they become available). Dilutions should be carried out into defined concentrations of carrier tRNA (yeast or Escherichia coli tRNA at 10–100 ng/µL). For detection of human pathogens, calibrators can be diluted into negative control sample RNA or DNA, or they can be diluted into healthy human plasma, after which lysis may be carried out in the presence of carrier tRNA. Serial dilutions of a particular template can be prepared as stock solutions that resist several freeze–thaw cycles; a fresh batch should be prepared when a Cq shift of 0.5–1.0 is detected. Alternatively, solutions for calibration curves can be stored for a week at 4 °C and then discarded. For diagnostic assays, the qPCR should include an independently verified calibrator, if available, that lies within the linear interval of the assay. Positive and negative extraction controls are also recommended.

7.4. ASSAY PERFORMANCE

The following assay performance characteristics must be determined: PCR efficiency, linear dynamic range, LOD, and precision.

7.4.1. PCR efficiency. Robust and precise qPCR assays are usually correlated with high PCR efficiency. PCR efficiency is particularly important when reporting mRNA concentrations for target genes relative to those of reference genes. The ΔΔCq method is one of the most popular means of determining differences in concentrations between samples and is based on normalization with a single reference gene. The difference in Cq values (ΔCq) between the target gene and the reference gene is calculated, and the ΔCqs of the different samples are compared directly. The 2 genes must be amplified with comparable efficiencies for this comparison to be accurate. The most popular method is not necessarily the most appropriate, however, and alternative, more generalized quantitative models have been developed to correct for differences in amplification efficiency (62) and to allow the use of multiple reference genes (30).

PCR amplification efficiency must be established by means of calibration curves, because such calibration provides a simple, rapid, and reproducible indication of the mean PCR efficiency, the analytical sensitivity, and the robustness of the assay. Amplification efficiency should be determined from the slope of the log-linear portion of the calibration curve. Specifically, PCR efficiency $= 10^{-1/\mathrm{slope}} - 1$ when the logarithm of the initial template concentration (the independent variable) is plotted on the x axis and Cq (the dependent variable) is plotted on the y axis. The theoretical maximum of 1.00 (or 100%) indicates that the amount of product doubles with each cycle, corresponding to a slope of approximately −3.32. Ideally, the CIs or SEs of the mean estimated PCR efficiencies from replicate calibration curves are reported. Calibration curves for each quantified target must be included with the submitted manuscript so that this information can be made available to the reviewers; slopes and y intercepts derived from these calibration curves must be included with the publication.

Differences in PCR efficiency will produce calibration curves with different slopes. As a consequence, differences between the Cq values of the targets and the references will not remain constant as template amounts are varied, and calculations of relative concentrations will be inaccurate, yielding misleading results. Cq values >40 are suspect because of the implied low efficiency and generally should not be reported; however, the use of such arbitrary Cq cutoffs is not ideal, because they may be either too low (eliminating valid results) or too high (increasing false-positive results) (26).

7.4.2. Linear dynamic range. The dynamic range over which a reaction is linear (the highest to the lowest quantifiable copy number established by means of a calibration curve) must be described. Depending on the template used for generating calibration curves, the dynamic range should cover at least 3 orders of magnitude and ideally should extend to 5 or 6 log10 concentrations. The calibration curve's linear interval must include the interval for the target nucleic acids being quantified. Because lower limits of quantification are usually poorly defined, the variation at the lowest concentration claimed to be within the linear interval should be determined. The squared correlation coefficients (r² values) must be reported, and, ideally, CIs should be provided throughout the entire linear dynamic range.
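The calibration-curve quantities required in 7.4.1 and 7.4.2 (slope, y intercept, r², and efficiency) can be obtained from an ordinary least-squares fit of Cq against log10 starting quantity. A minimal Python sketch using only the standard library, assuming a single 10-fold dilution series; the dilution data and function names are hypothetical.

```python
import math
from typing import NamedTuple, Sequence


class CalibrationFit(NamedTuple):
    slope: float
    intercept: float
    r_squared: float
    efficiency: float  # 0-1 scale; 1.0 corresponds to 100%


def fit_calibration_curve(log10_quantity: Sequence[float],
                          cq: Sequence[float]) -> CalibrationFit:
    """Least-squares fit of Cq (y) against log10 starting quantity (x).

    Efficiency follows the formula in the text: E = 10 ** (-1 / slope) - 1.
    """
    n = len(log10_quantity)
    if n != len(cq) or n < 3:
        raise ValueError("need at least 3 paired (log10 quantity, Cq) points")
    mean_x = sum(log10_quantity) / n
    mean_y = sum(cq) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantity)
    syy = sum((y - mean_y) ** 2 for y in cq)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log10_quantity, cq))
    if sxx == 0 or syy == 0:
        raise ValueError("quantities and Cq values must both vary")
    slope = sxy / sxx
    if slope >= 0:
        raise ValueError("Cq should fall as template quantity rises (negative slope)")
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return CalibrationFit(slope, intercept, r_squared, efficiency)


def expected_cq_shift(efficiency: float, dilution_factor: float = 10.0) -> float:
    """Expected Cq increase when template is diluted by `dilution_factor`."""
    return math.log(dilution_factor) / math.log(1.0 + efficiency)


if __name__ == "__main__":
    # Hypothetical 10-fold series spanning 6 log10 units (10 to 1e6 copies).
    log_copies = [1, 2, 3, 4, 5, 6]
    cqs = [34.9, 31.5, 28.2, 24.8, 21.5, 18.1]
    fit = fit_calibration_curve(log_copies, cqs)
    print(f"slope={fit.slope:.3f} intercept={fit.intercept:.2f} "
          f"r2={fit.r_squared:.4f} E={fit.efficiency:.1%}")
    print(f"expected Cq shift per 10-fold dilution: {expected_cq_shift(fit.efficiency):.2f}")
```

Because 1 + E = 10^(−1/slope), the expected Cq shift per 10-fold dilution is simply −slope (about 3.32 cycles at 100% efficiency). Comparing the observed shift between dilutions of a sample extract with this expectation is one way to carry out the dilution-based inhibition check recommended in section 5; a clearly smaller shift than expected may suggest inhibition in the less dilute reaction.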

7.4.3. LOD. The LOD is defined as the lowest concentration at which 95% of the positive samples are detected. In other words, within a group of replicates containing the target at concentrations at the LOD, no more than 5% failed reactions should occur. Low-copy PCRs are stochastically limited, and LODs of <3 copies per PCR are not possible. If multiple reactions are performed, however, accurate quantification of lower concentrations can be obtained via digital PCR (29, 63, 64). Indeed, concentration calibrators can be checked by limiting dilution to show that the percentage of failed and successful reactions follows a Poisson distribution.

7.4.4. Precision. There are many explanations for variation in qPCR results, including temperature differences affecting the completion of annealing and/or denaturation, concentration differences introduced by pipetting errors, and stochastic variation. Precision in qPCR typically varies with concentration, decreasing with the copy number. Ideally, intraassay variation (repeatability) should be displayed in figures as SD error bars or as CIs on calibration curves with replicate samples. CVs should not be used with Cqs (29) but can be used to express the variance in copy numbers or concentrations. This technical variation should be distinguished from biological variation. Biological replicates can directly address the statistical significance of differences in qPCR results between groups or treatments. For diagnostic assays, it may also be necessary to report interassay precision (reproducibility) between sites and different operators.

7.5. MULTIPLEX qPCR

The ability to multiplex greatly expands the power of qPCR analysis (65, 66), particularly when applied to the simultaneous detection of point mutations or polymorphisms (67). Multiplexing requires evidence that accurate quantification of multiple targets in a single tube is not impaired, i.e., that assay efficiency and the LOD are the same as when the assays are run in uniplex fashion. This concern is particularly important when targets of appreciably lower abundance are coamplified with highly abundant targets.

8. Data Analysis

Data analysis includes an examination of the raw data, an evaluation of their quality and reliability, and the generation of reportable results. Various data-collection and -processing strategies have been described, and a systematic evaluation has revealed that qPCR data-analysis methods differ substantially in their performance (68).
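Why CVs should not be computed on Cq values (section 7.4.4) can be illustrated numerically. The minimal Python sketch below converts hypothetical replicate Cqs into relative copy numbers, assuming 100% amplification efficiency (one doubling per cycle), and computes the CV on both scales. Because Cq is a logarithmic measure of template amount, the same replicates give a far smaller CV on the Cq scale. The replicate values and function names are illustrative assumptions.

```python
import statistics


def cv(values):
    """Coefficient of variation: sample SD divided by the mean (a fraction)."""
    return statistics.stdev(values) / statistics.mean(values)


def relative_copies(cq, efficiency=1.0):
    """Template amount implied by a Cq, relative to an arbitrary reference.

    Assumes a constant efficiency (1.0 = 100%, one doubling per cycle);
    the unknown absolute scale cancels out of the CV.
    """
    return (1.0 + efficiency) ** (-cq)


# Hypothetical technical replicates of a single sample (illustrative only).
replicate_cqs = [25.1, 25.4, 24.9, 25.6]
copies = [relative_copies(cq) for cq in replicate_cqs]

print(f"SD of Cq:             {statistics.stdev(replicate_cqs):.2f} cycles")
print(f"CV on the Cq scale:   {cv(replicate_cqs):.1%}")
print(f"CV on the copy scale: {cv(copies):.1%}")
```

Reporting the SD of the Cqs (in cycles) or CIs, as recommended for repeatability, avoids this distortion.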

Special Report

MIQE Guidelines for qPCR

Detailed information on the methods of data analysis and confidence estimation is necessary, together with specification of the software used. The methods of identifying outliers and the disposition of such data must be specified. Documenting assay precision requires identification of the statistical methods used to evaluate variances (e.g., 95% CIs) and presentation of the corresponding concentrations or Cq values. Such information should include both repeatability and reproducibility data, if available. As discussed above, reporting of CVs for Cqs is inappropriate (29): because Cq is a logarithmic measure of template amount, CVs calculated from Cqs will always be lower (and therefore potentially misleading) than CVs calculated from copy numbers. Information must be provided on the methods used for assessing accuracy, including the statistical significance of reported differences between groups.

8.1. NORMALIZATION

Normalization is an essential component of a reliable qPCR assay because this process controls for variations in extraction yield, reverse-transcription yield, and efficiency of amplification, thus enabling comparisons of mRNA concentrations across different samples. The use of reference genes as internal controls is the most common method for normalizing cellular mRNA data and is commonly accepted as the most appropriate normalization strategy (69); however, the utility of reference genes must be experimentally validated for particular tissues or cell types and specific experimental designs. Unfortunately, although awareness of the importance of systematic validation has increased and the potentially highly misleading effects of inappropriate reference genes are widely known, these considerations are still widely disregarded (70). Consequently, many molecular analyses still contain poorly normalized qPCR data. Normalization involves reporting the ratios of the mRNA concentrations of the genes of interest to those of the reference genes. Reference gene mRNAs should be stably expressed, and their abundances should show strong correlation with the total amounts of mRNA present in the samples. Normalization against a single reference gene is not acceptable unless the investigators present clear evidence that confirms its invariant expression under the experimental conditions described. The optimal number and choice of reference genes must be experimentally determined and the method reported (71–73).

8.2. VARIABILITY

The inherent variability of biological systems may rival or exceed experimental differences between groups. This variation often becomes apparent when many biological replicates are used to increase the statistical significance of the experiment. Although differences between biological replicates may be large, sufficient numbers may allow smaller experimental differences to be discerned. A recent publication provides a textbook example of how to handle such data and salvage biologically meaningful results from assays subject to high biological variation (74). Many factors contribute to experimental variation and influence the number of biological replicates necessary to achieve a given statistical power. Consequently, power analysis is useful for determining the number of samples necessary for valid conclusions.

8.3. QUALITATIVE ANALYSIS

The use of PCR to detect merely the presence of a nucleic acid template, rather than to quantify it accurately, is referred to as qualitative PCR, which is widely used in pathogen diagnostics. The problem with qualitative/quantitative stratification of PCR methods is that an accurate yes/no answer requires information about the low-end sensitivity of the PCR assay. Consequently, even a qualitative assay should provide information about the assay's performance characteristics, especially with respect to the points discussed in sections 7.4.2. and 7.4.3.

Conclusions

The necessity for quality-assurance measures for qPCR and RT-qPCR assays is well recognized (25, 44, 75–86). The main difference between qPCR and conventional (legacy) PCR assays is the expectation that the former can quantify target nucleic acids accurately. This difference must be clearly recognized, and one cannot assume that legacy PCR assays can translate directly into the qPCR format. Table 1 provides a checklist for authors preparing a report of a qPCR study. Items deemed essential (E) are required to allow reviewers to evaluate the work and other investigators to reproduce it. Items considered desirable (D) are also important and should be included if possible, but they may not be available in all cases. Certainly, it is important to apply common sense: Compliance with all items on the checklist is not necessary for initial screening of expression signatures targeting hundreds of targets. Once a more limited set of targets (fewer than 20) has been identified, however, assay performance should be described as detailed by the checklist, which is hosted at http://www.rdml.org/miqe/.

In summary, the purpose of these guidelines is 3-fold:

1. To enable authors to design and report qPCR experiments that have greater inherent value.
2. To allow reviewers and editors to measure the technical quality of submitted manuscripts against an established yardstick.
3. To facilitate easier replication of experiments described in published studies that follow these guidelines.

As a consequence, investigations that use this widely applied technology will produce data that are more uniform, more comparable, and, ultimately, more reliable.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: J. Hellemans, Biogazelle; J. Vandesompele, Biogazelle; C.T. Wittwer, Idaho Technology. Consultant or Advisory Role: R. Mueller, DermTech International. Stock Ownership: R. Mueller, Sequenom; C.T. Wittwer, Idaho Technology. Honoraria: None declared. Research Funding: S.A. Bustin, Bowel and Cancer Research, registered charity number 1119105; J. Hellemans, Fund for Scientific Research Flanders; M. Kubista, grant agency of the Academy of Sciences, Czech Republic (grants IAA500520809 and AV0250520701); C.T. Wittwer, ARUP Institute for Clinical and Experimental Pathology and Idaho Technology. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References

1. Higuchi R, Dollinger G, Walsh PS, Griffith R. Simultaneous amplification and detection of specific DNA sequences. Biotechnology (N Y) 1992;10:413–7.
2. Higuchi R, Fockler C, Dollinger G, Watson R. Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Biotechnology (N Y) 1993;11:1026–30.
3. Wittwer CT, Herrmann MG, Moss AA, Rasmussen RP. Continuous fluorescence monitoring of rapid cycle DNA amplification. Biotechniques 1997;22:130–8.
4. Bustin SA. Absolute quantification of mRNA using real-time reverse transcription polymerase chain reaction assays. J Mol Endocrinol 2000;25:169–93.
5. Kubista M, Andrade JM, Bengtsson M, Forootan A, Jonak J, Lind K, et al. The real-time polymerase chain reaction. Mol Aspects Med 2006;27:95–125.
6. Bernard PS, Wittwer CT. Real-time PCR technology for cancer diagnostics. Clin Chem 2002;48:1178–85.
7. Mackay IM, Arden KE, Nitsche A. Real-time PCR in virology. Nucleic Acids Res 2002;30:1292–305.
8. Mackay IM. Real-time PCR in the microbiology laboratory. Clin Microbiol Infect 2004;10:190–212.
9. Bustin SA, Mueller R. Real-time reverse transcription PCR (qRT-PCR) and its potential use in clinical diagnosis. Clin Sci (Lond) 2005;109:365–79.
10. Bustin SA, Mueller R. Real-time reverse transcription PCR and the detection of occult disease in colorectal cancer. Mol Aspects Med 2006;27:192–223.
11. van den Berg RJ, Vaessen N, Endtz HP, Schulin T, van der Vorm ER, Kuijper EJ. Evaluation of real-time PCR and conventional diagnostic methods for the detection of Clostridium difficile-associated diarrhoea in a prospective multicentre study. J Med Microbiol 2007;56:36–42.
12. Bustin SA, Nolan T. Pitfalls of quantitative real-time reverse-transcription polymerase chain reaction. J Biomol Tech 2004;15:155–66.


13. Garson JA, Huggett JF, Bustin SA, Pfaffl MW, Benes V, Vandesompele J, Shipley GL. Unreliable real-time PCR analysis of human endogenous retrovirus-W (HERV-W) RNA expression and DNA copy number in multiple sclerosis. AIDS Res Hum Retroviruses. Forthcoming 2009.
14. Huang T, Bohlenius H, Eriksson S, Parcy F, Nilsson O. The mRNA of the Arabidopsis gene FT moves from leaf to shoot apex and induces flowering. Science 2005;309:1694–6.
15. Bohlenius H, Eriksson S, Parcy F, Nilsson O. Retraction. Science 2007;316:367.
16. Brazma A, Hingamp P, Quackenbush J, Sherlock G, Spellman P, Stoeckert C, et al. Minimum information about a microarray experiment (MIAME)—toward standards for microarray data. Nat Genet 2001;29:365–71.
17. Taylor CF, Paton NW, Lilley KS, Binz PA, Julian RK Jr, Jones AR, et al. The minimum information about a proteomics experiment (MIAPE). Nat Biotechnol 2007;25:887–93.
18. Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, et al. The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 2008;26:541–7.
19. Echeverri CJ, Beachy PA, Baum B, Boutros M, Buchholz F, Chanda SK, et al. Minimizing the risk of reporting false positives in large-scale RNAi screens. Nat Methods 2006;3:777–9.
20. Haney SA. Increasing the robustness and validity of RNAi screens. Pharmacogenomics 2007;8:1037–49.
21. Sansone SA, Fan T, Goodacre R, Griffin JL, Hardy NW, Kaddurah-Daouk R, et al. The metabolomics standards initiative. Nat Biotechnol 2007;25:846–8.
22. Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, et al. Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol 2008;26:889–96.
23. Burns MJ, Valdivia H, Harris N. Analysis and interpretation of data from real-time PCR trace detection methods using quantitation of GM soya as a model system. Anal Bioanal Chem 2004;378:1616–23.

24. Burns MJ, Nixon GJ, Foy CA, Harris N. Standardisation of data from real-time quantitative PCR methods – evaluation of outliers and comparison of calibration curves. BMC Biotechnol 2005;5:31.
25. Ellison SL, English CA, Burns MJ, Keer JT. Routes to improving the reliability of low level DNA analysis using real-time PCR. BMC Biotechnol 2006;6:33.
26. Burns MJ, Valdivia H. Modelling the limit of detection in real-time quantitative PCR. Eur Food Res Technol 2008;226:1513–24.
27. Lefever S, Hellemans J, Pattyn F, Przybylski DR, Taylor C, Geurts R, et al. RDML: structured language and reporting guidelines for real-time quantitative PCR data. Nucleic Acids Res [Epub ahead of print 2009 Feb 17]. Available at: http://nar.oxfordjournals.org/cgi/content/abstract/gkp056.
28. Wittwer CT, Kusakawa N. Real-time PCR. In: Persing DH, Tenover FC, Versalovic J, Tang JW, Unger ER, Relman DA, White TJ, eds. Molecular microbiology: diagnostic principles and practice. Washington: ASM Press; 2004. p 71–84.
29. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative CT method. Nat Protoc 2008;3:1101–8.
30. Hellemans J, Mortier G, De Paepe A, Speleman F, Vandesompele J. qBase relative quantification framework and software for management and automated analysis of real-time quantitative PCR data. Genome Biol 2007;8:R19.
31. de la Grange P, Dutertre M, Correa M, Auboeuf D. A new advance in alternative splicing databases: from catalogue to detailed analysis of regulation of expression and function of human alternative splicing variants. BMC Bioinformatics 2007;8:180.
32. Ben-Dov C, Hartmann B, Lundgren J, Valcarcel J. Genome-wide analysis of alternative pre-mRNA splicing. J Biol Chem 2008;283:1229–33.
33. Bjornsson HT, Albert TJ, Ladd-Acosta CM, Green RD, Rongione MA, Middle CM, et al. SNP-specific array-based allele-specific expression analysis. Genome Res 2008;18:771–9.
34. Pattyn F, Robbrecht P, De Paepe A, Speleman F, Vandesompele J. RTPrimerDB: the real-time PCR primer and probe database, major update 2006. Nucleic Acids Res 2006;34:D684–8.
35. Pattyn F, Speleman F, De Paepe A, Vandesompele J. RTPrimerDB: the real-time PCR primer and probe database. Nucleic Acids Res 2003;31:122–3.
36. Gygi SP, Rochon Y, Franza BR, Aebersold R. Correlation between protein and mRNA abundance in yeast. Mol Cell Biol 1999;19:1720–30.
37. Schmidt MW, Houseman A, Ivanov AR, Wolf DA. Comparative proteomic and transcriptomic profiling of the fission yeast Schizosaccharomyces pombe. Mol Syst Biol 2007;3:79.
38. Landi D, Gemignani F, Naccarati A, Pardini B, Vodicka P, Vodickova L, et al. Polymorphisms within micro-RNA-binding sites and risk of sporadic colorectal cancer. Carcinogenesis 2008;29:579–84.
39. Cronin M, Ghosh K, Sistare F, Quackenbush J, Vilker V, O'Connell C. Universal RNA reference materials for gene expression. Clin Chem 2004;50:1464–71.
40. Novoradovskaya N, Whitfield ML, Basehore LS, Novoradovsky A, Pesich R, Usary J, et al. Universal Reference RNA as a standard for microarray experiments. BMC Genomics 2004;5:20.
41. Gingeras TR. RNA reference materials for gene expression studies. Difficult first steps. Clin Chem 2004;50:1289–90.
42. Joseph LJ. RNA reference materials for gene expression studies. RNA metrology: forecast calls for partial clearing. Clin Chem 2004;50:1290–2.
43. Bustin SA, Benes V, Nolan T, Pfaffl MW. Quantitative real-time RT-PCR—a perspective. J Mol Endocrinol 2005;34:597–601.
44. Ramsden SC, Daly S, Geilenkeuser WJ, Duncan G, Hermitte F, Marubini E, et al. EQUAL-quant: an international external quality assessment scheme for real-time PCR. Clin Chem 2006;52:1584–91.
45. Damond F, Benard A, Ruelle J, Alabi A, Kupfer B, Gomes P, et al. Quality control assessment of human immunodeficiency virus type 2 (HIV-2) viral load quantification assays: results from an international collaboration on HIV-2 infection in 2006. J Clin Microbiol 2008;46:2088–91.
46. Micke P, Ohshima M, Tahmasebpoor S, Ren ZP, Ostman A, Ponten F, Botling J. Biobanking of fresh frozen tissue: RNA is stable in nonfixed surgical specimens. Lab Invest 2006;86:202–11.
47. Morrogh M, Olvera N, Bogomolniy F, Borgen PI, King TA. Tissue preparation for laser capture microdissection and RNA extraction from fresh frozen breast tissue. Biotechniques 2007;43:41–2, 4, 6 passim.
48. Bustin SA. Real-time, fluorescence-based quantitative PCR: a snapshot of current procedures and preferences. Expert Rev Mol Diagn 2005;5:493–8.
49. Doma MK, Parker R. RNA quality control in eukaryotes. Cell 2007;131:660–8.
50. Fleige S, Pfaffl MW. RNA integrity and the effect on the real-time qRT-PCR performance. Mol Aspects Med 2006;27:126–39.
51. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nat Protoc 2006;1:1559–82.
52. Nolan T, Hands RE, Ogunkolade BW, Bustin SA. SPUD: a qPCR assay for the detection of inhibitors in nucleic acid preparations. Anal Biochem 2006;351:308–10.

53. Ferns RB, Garson JA. Development and evaluation of a real-time RT-PCR assay for quantification of cell-free human immunodeficiency virus type 2 using a brome mosaic virus internal control. J Virol Methods 2006;135:102–8.
54. Swango KL, Hudlow WR, Timken MD, Buoncristiani MR. Developmental validation of a multiplex qPCR assay for assessing the quantity and quality of nuclear DNA in forensic samples. Forensic Sci Int 2007;170:35–45.
55. Garson JA, Grant PR, Ayliffe U, Ferns RB, Tedder RS. Real-time PCR quantitation of hepatitis B virus DNA using automated sample preparation and murine cytomegalovirus internal control. J Virol Methods 2005;126:207–13.
56. Huggett JF, Novak T, Garson JA, Green C, Morris-Jones SD, Miller RF, Zumla A. Differential susceptibility of PCR reactions to inhibitors: an important and unrecognised phenomenon. BMC Res Notes 2008;1:70.
57. Ståhlberg A, Aman P, Ridell B, Mostad P, Kubista M. Quantitative real-time PCR method for detection of B-lymphocyte monoclonality by comparison of κ and λ immunoglobulin light chain expression. Clin Chem 2003;49:51–9.
58. Ståhlberg A, Hakansson J, Xian X, Semb H, Kubista M. Properties of the reverse transcription reaction in mRNA quantification. Clin Chem 2004;50:509–15.
59. Ståhlberg A, Kubista M, Pfaffl M. Comparison of reverse transcriptases in gene expression analysis. Clin Chem 2004;50:1678–80.
60. Reiter M, Pfaffl M. Effects of plate position, plate type and sealing systems on real-time PCR results. Biotechnol Biotechnol Equip 2008;22:824–8.
61. Ririe KM, Rasmussen RP, Wittwer CT. Product differentiation by analysis of DNA melting curves during the polymerase chain reaction. Anal Biochem 1997;245:154–60.
62. Pfaffl MW. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res 2001;29:E45.
63. Dube S, Qin J, Ramakrishnan R. Mathematical analysis of copy number variation in a DNA sample using digital PCR on a nanofluidic device. PLoS ONE 2008;3:e2876.
64. Vogelstein B, Kinzler KW. Digital PCR. Proc Natl Acad Sci U S A 1999;96:9236–41.
65. Elnifro EM, Ashshi AM, Cooper RJ, Klapper PE. Multiplex PCR: optimization and application in diagnostic virology. Clin Microbiol Rev 2000;13:559–70.
66. Wittwer CT, Herrmann MG, Gundry CN, Elenitoba-Johnson KS. Real-time multiplex PCR assays. Methods 2001;25:430–42.
67. Elenitoba-Johnson KS, Bohling SD, Wittwer CT, King TC. Multiplex PCR by multicolor fluorimetry and fluorescence melting curve analysis. Nat Med 2001;7:249–53.
68. Karlen Y, McNair A, Perseguers S, Mazza C, Mermod N. Statistical significance of quantitative PCR. BMC Bioinformatics 2007;8:131.
69. Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes Immun 2005;6:279–84.
70. Gutierrez L, Mauriat M, Guenin S, Pelloux J, Lefebvre JF, Louvet R, et al. The lack of a systematic validation of reference genes: a serious pitfall undervalued in reverse transcription-polymerase chain reaction (RT-PCR) analysis in plants. Plant Biotechnol J 2008;6:609–18.
71. Vandesompele J, De Preter K, Pattyn F, Poppe B, Van Roy N, De Paepe A, Speleman F. Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes. Genome Biol 2002;3:RESEARCH0034.
72. Pfaffl MW, Horgan GW, Dempfle L. Relative expression software tool (REST) for group-wise comparison and statistical analysis of relative expression results in real-time PCR. Nucleic Acids Res 2002;30:e36.
73. Andersen CL, Jensen JL, Orntoft TF. Normalization of real-time quantitative reverse transcription-PCR data: a model-based variance estimation approach to identify genes suited for normalization, applied to bladder and colon cancer data sets. Cancer Res 2004;64:5245–50.
74. Willems E, Leyns L, Vandesompele J. Standardization of real-time PCR gene expression data from independent biological replicates. Anal Biochem 2008;379:127–9.
75. Apfalter P, Reischl U, Hammerschlag MR. In-house nucleic acid amplification assays in research: How much quality control is needed before one can rely upon the results? J Clin Microbiol 2005;43:5835–41.
76. Ciabatti I, Froiio A, Gatto F, Amaddeo D, Marchesi U. In-house validation and quality control of real-time PCR methods for GMO detection: a practical approach. Dev Biol (Basel) 2006;126:79–86; discussion 324–5.
77. de Cremoux P, Bieche I, Tran-Perennou C, Vignaud S, Boudou E, Asselain B, et al. Inter-laboratory quality control for hormone-dependent gene expression in human breast tumors using real-time reverse transcription-polymerase chain reaction. Endocrine Relat Cancer 2004;11:489–95.
78. de Vries TJ, Fourkour A, Punt CJ, van de Locht LT, Wobbes T, van den Bosch S, et al. Reproducibility of detection of tyrosinase and MART-1 transcripts in the peripheral blood of melanoma patients: a quality control study using real-time quantitative RT-PCR. Br J Cancer 1999;80:883–91.
79. Gabert J, Beillard E, van der Velden VH, Bi W, Grimwade D, Pallisgaard N, et al. Standardization and quality control studies of 'real-time' quantitative reverse transcriptase polymerase chain reaction of fusion gene transcripts for residual disease detection in leukemia – a Europe Against Cancer program. Leukemia 2003;17:2318–57.
80. Lemmer K, Donoso Mantke O, Bae HG, Groen J, Drosten C, Niedrig M. External quality control assessment in PCR diagnostics of dengue virus infections. J Clin Virol 2004;30:291–6.
81. Marubini E, Verderio P, Raggi CC, Pazzagli M, Orlando C. Statistical diagnostics emerging from external quality control of real-time PCR. Int J Biol Markers 2004;19:141–6.
82. Raggi CC, Verderio P, Pazzagli M, Marubini E, Simi L, Pinzani P, et al. An Italian program of external quality control for quantitative assays based on real-time PCR with TaqMan probes. Clin Chem Lab Med 2005;43:542–8.
83. Sjoholm MI, Dillner J, Carlson J. Assessing quality and functionality of DNA from fresh and archival dried blood spots and recommendations for quality control guidelines. Clin Chem 2007;53:1401–7.

84. Wang X, Jia S, Meyer L, Xiang B, Chen LY, Jiang N, et al. Comprehensive quality control utilizing the prehybridization third-dye image leads to accurate gene expression measurements by cDNA microarrays. BMC Bioinformatics 2006;7:378.
85. Winters MA, Tan LB, Katzenstein DA, Merigan TC. Biological variation and quality control of plasma human immunodeficiency virus type 1 RNA quantitation by reverse transcriptase polymerase chain reaction. J Clin Microbiol 1993;31:2960–6.
86. van der Velden VH, Panzer-Grumayer ER, Cazzaniga G, Flohr T, Sutton R, Schrauder A, et al. Optimization of PCR-based minimal residual disease diagnostics for childhood acute lymphoblastic leukemia in a multi-center setting. Leukemia 2007;21:706–13.

Mini-Reviews

Clinical Chemistry 55:4 623–631 (2009)

MicroRNAs: Novel Biomarkers for Human Cancer

Claudine L. Bartels¹ and Gregory J. Tsongalis¹*

BACKGROUND: MicroRNAs (miRNAs), small RNA molecules of approximately 22 nucleotides, have been shown to be up- or downregulated in specific cell types and disease states. These molecules have become recognized as one of the major regulatory gatekeepers of coding genes in the human genome.

CONTENT:

We review the structure, nomenclature, mechanism of action, technologies used for miRNA detection, and associations of miRNAs with human cancer. miRNAs are produced in a tissue-specific manner, and changes in miRNA within a tissue type can be correlated with disease status. miRNAs appear to regulate mRNA translation and degradation via mechanisms that are dependent on the degree of complementarity between the miRNA and mRNA molecules. miRNAs can be detected via several methods, such as microarrays, bead-based arrays, and quantitative real-time PCR. The tissue concentrations of specific miRNAs have been associated with tumor invasiveness, metastatic potential, and other clinical characteristics for several types of cancers, including chronic lymphocytic leukemia, and breast, colorectal, hepatic, lung, pancreatic, and prostate cancers.

SUMMARY:

By targeting and controlling the expression of mRNA, miRNAs can control highly complex signal-transduction pathways and other biological pathways. The biologic roles of miRNAs in cancer suggest a correlation with prognosis and therapeutic outcome. Further investigation of these roles may lead to new approaches for the categorization, diagnosis, and treatment of human cancers.

© 2009 American Association for Clinical Chemistry

Although cancer was once thought to be an acute process leading to imminent death, current knowledge of tumor cell biology suggests that cancer is a chronic condition and that the ongoing development of targeted cancer therapies will continue to increase survival prospects. Human cancers involve both genetic (inherited and acquired) and epigenetic alterations. Many tumor suppressor genes and oncogenes have been described, and the discovery of new tumor markers continues at a rapid pace (1). Recently, a novel group of biomarkers, microRNAs (miRNAs),² has been discovered. Unlike most biomarkers currently available, these molecules appear to be cell-type and disease specific. miRNAs promise to have an impact on laboratory medicine as new diagnostic and prognostic markers, as indicators of therapeutic response, and as targets of novel therapies. This review highlights some of what is known about miRNAs and human cancer.

miRNA

MicroRNAs are a family of endogenous, small (approximately 22 nucleotides in length), noncoding, functional RNAs. Bioinformatics approaches for identifying miRNAs rely on evolutionarily conserved sequences (2). It is estimated that there may be 1000 miRNA genes in the human genome (http://www.sanger.ac.uk/Software/Rfam/mirna/). miRNAs are expressed in a tissue-specific manner, and changes in miRNA expression within a tissue type can be correlated with disease status (3, 4). miRNAs are transcribed by RNA polymerase II or III as longer primary-miRNA molecules, which are subsequently processed in the nucleus by the RNase III endonuclease Drosha and DGCR8 (the “microprocessor complex”) to form intermediate stem–loop structures approximately 70 nucleotides long called “precursor miRNAs” (pre-miRNAs) (5–8) (Fig. 1). These pre-miRNAs fold to form imperfect stem–loop structures that are transported with the help of exportin-5 from the nucleus to the cytoplasm, where they undergo further processing by another RNase III endonuclease, Dicer (9, 10). Dicer removes the loop of the pre-miRNA to produce an imperfect duplex made up of the mature miRNA sequence and a fragment of similar size (miRNA*), which is derived from the opposing arm of the pre-miRNA. The miRNA strand of the duplex is loaded onto the RNA-induced silencing complex (RISC); the miRNA* separates from the duplex and is degraded (11).

¹ Department of Pathology, Dartmouth Medical School, Dartmouth Hitchcock Medical Center and Norris Cotton Cancer Center, Lebanon, NH, USA.
* Address correspondence to this author at: Department of Pathology, Dartmouth Hitchcock Medical Center, One Medical Center Drive, Lebanon, NH 03756, USA. Fax 603-650-8485; e-mail [email protected].
Received October 10, 2008; accepted January 30, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.112805
² Nonstandard abbreviations: miRNA, microRNA; pre-miRNA, precursor miRNA; miRNA*, an RNA fragment similar in size to the mature miRNA sequence, with which the mature miRNA forms an imperfect duplex after Dicer has removed the pre-miRNA loop; Tm, melting temperature; LNA, locked nucleic acid; PDCD4, programmed cell death 4; TPM1, tropomyosin 1; LOH, loss of heterozygosity; AIB1, amplified in breast cancer 1; CLL, chronic lymphocytic leukemia; HCC, hepatocellular carcinoma; NSCLC, non–small cell lung cancer; EGFR, epidermal growth factor receptor; PDAC, pancreatic ductal adenocarcinoma; PanIN, pancreatic intraepithelial neoplasia; CaP, prostate cancer.

Fig. 1. Schematic showing biogenesis of miRNA molecules. Drosha, RNase III endonuclease; DGCR8, DiGeorge syndrome critical region 8; Dicer, RNase III endonuclease.

miRNAs regulate gene expression by regulating mRNA translation and degradation. The mechanism depends on the degree of complementarity between the miRNA and the mRNA molecule. Perfect (or near-perfect) complementarity is thought to target the mRNA for degradation by the RISC, whereas imperfect complementarity is thought to block translation of the mRNA by the ribosome (12). Identification of miRNA targets has been difficult because only the seed sequence (about 6–8 bases) of the approximately 22 nucleotides aligns perfectly with the target mRNA's 3′ untranslated region (2, 13). The remainder of the miRNA may bind perfectly to the target mRNA, but more often it does not. Bioinformatics approaches can identify putative targets for particular miRNAs through analysis of the miRNA seed sequences (2); however, these predicted interactions need to be tested in vitro or in vivo to determine whether the miRNA truly affects the proposed mRNA.

Once a sequence has been determined to be a unique miRNA, the miRBase Registry assigns a name according to existing guidelines (14, 15). In the database, a sequence of 3 or 4 letters designates the species (e.g., “hsa” for Homo sapiens); however, this prefix is usually dropped in the literature. The core of the miRNA name is the designation “miR” (denoting a mature sequence) followed by a sequentially assigned unique identifying number. Lettered suffixes are added to miRs that differ by only 1 or 2 bases (e.g., miR-10b), and numbered suffixes are assigned to miRs that have the same sequence but are derived from different primary transcripts. A suffix of -5p or -3p is given when mature miRNAs are derived from the 5′ arm or the 3′ arm, respectively, of the precursor miRNA. For this rapidly growing field, the miRBase registry maintains the latest nomenclature designations.

miRNA and Cancer

miRNA-Detection Technologies

There are several methods for detecting miRNAs and/or determining miRNA profiles of particular cell types, such as microarrays, bead-based arrays, and quantitative real-time PCR. The principle of miRNA microarrays is based on the Watson–Crick base pairing of nucleic acids. Microarrays permit the simultaneous detection of hundreds of miRNAs. A set of oligonucleotide capture probes is spotted on glass slides, and a sample of extracted RNA enriched for small-molecule RNAs is allowed to hybridize with the capture probes. Because miRNAs are short, it can be difficult to normalize the melting temperatures (Tm) of the probes across an array without compromising sensitivity or specificity. This problem has been overcome by the use of locked nucleic acids (LNAs). LNAs contain at least one LNA monomer, a nucleic acid analog in which the ribose moiety is “locked” by a methylene bridge connecting the 2′ oxygen atom and the 4′ carbon atom (16). Each incorporated LNA monomer increases the Tm of the nucleic acid duplex by 2–10 °C (17). Therefore, by adjusting the number of LNA monomers incorporated in a capture probe, all the probes across an array can be Tm-normalized despite the short length of the miRNA. Many disease associations have been discovered with microarrays.

Bead-based arrays, such as the Luminex FlexmiR™ arrays, also permit simultaneous quantification of hundreds of miRNAs (18). Locked Nucleic Acid (LNA™; Exiqon) probes are coupled to carboxylated polystyrene microspheres that incorporate variable mixtures of 2 fluorescent dyes, which allow a flow cytometer to identify each of up to 100 microsphere types by its unique color. Each microsphere is coupled with an LNA molecule that is specific for a particular miRNA; these probes permit discrimination between closely related members of an miRNA family.
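The idea of Tm-normalizing short capture probes by adding LNA monomers can be sketched in Python as follows. This is a deliberately simplified illustration, not a probe-design method used by any vendor: it assumes the crude Wallace rule (2 °C per A/T and 4 °C per G/C) for the unmodified probe and a fixed increase of 5 °C per LNA monomer, a value chosen from within the 2–10 °C range cited above. The probe sequences, the 70 °C target, and the function names are hypothetical; real LNA effects depend on sequence context and monomer position.

```python
import math


def wallace_tm(seq):
    """Rough Tm (°C) of a short oligonucleotide duplex by the Wallace rule:
    2 °C per A/T(U) and 4 °C per G/C. A crude approximation only."""
    seq = seq.upper()
    at = sum(seq.count(base) for base in "ATU")
    gc = sum(seq.count(base) for base in "GC")
    return 2.0 * at + 4.0 * gc


def lna_monomers_needed(seq, target_tm, delta_tm_per_lna=5.0):
    """Smallest number of LNA substitutions that raises the estimated Tm to
    at least target_tm, assuming a fixed Tm increment per LNA monomer
    (a simplifying assumption). Capped at the probe length."""
    deficit = target_tm - wallace_tm(seq)
    if deficit <= 0:
        return 0
    return min(len(seq), math.ceil(deficit / delta_tm_per_lna))


# Hypothetical 22-nt capture probes with very different base compositions.
probes = {
    "probe_at_rich": "ATTATTAATTTAGATATTAAAT",
    "probe_gc_rich": "GCGGCCGCTAGCGGCCGCAGCC",
}
TARGET_TM = 70.0  # hypothetical array-wide target Tm (°C)

for name, seq in probes.items():
    n_lna = lna_monomers_needed(seq, TARGET_TM)
    print(f"{name}: base Tm {wallace_tm(seq):.0f} °C, LNA monomers needed: {n_lna}")
```

Under these assumptions the AT-rich probe receives several LNA monomers while the GC-rich probe needs none, which is the sense in which probes of very different composition can be brought to a common Tm.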
Total RNA is extracted from the sample, biotinylated, and then hybridized with the microspheres. The microspheres are washed, incubated with streptavidin–phycoerythrin, and analyzed on a Luminex analyzer. The analyzer both identifies each microsphere by its fluorescent signature and measures the intensity of the streptavidin–phycoerythrin fluorescence, showing the user which miRNAs are present in the sample. The assay can also provide quantitative results if a calibration curve is produced with appropriate calibration materials, such as synthetic oligonucleotides. Quantitative real-time PCR can also be used to detect miRNAs. Quantification of mature miRNAs usually requires reverse transcription of the miRNA with a stem–loop primer (19) (Fig. 2). The cDNA is then used in the real-time PCR reaction. A mixture of forward and reverse primers and a dual-labeled probe (TaqMan®) are used to amplify and detect the cDNA target. The probe has a reporter dye on the 5′ end and a quencher on the 3′ end. If the target sequence is present during the PCR, the probe binds to it. During the extension stage of each PCR cycle, the reporter dye is released by the 5′ exonuclease activity of Taq polymerase; because the reporter and quencher have been separated, the fluorescence from the reporter dye is detected. Primary miRNAs and/or precursor miRNAs (pre-miRNAs) can also be quantified with the same methods; however, such assays require adjusted primer and probe designs (19).

miRNA in Breast Cancer

Breast cancer, the second-leading cause of cancer-related deaths in women, is expected to account for 26% of new cancer diagnoses in women in 2008 (20). Several miRNAs are associated with breast cancer. For example, miR-155 is upregulated in breast cancer, suggesting that it may act as an oncogene (21, 22). Upregulation of miR-373 and miR-520c promotes metastasis by inhibiting CD44 expression, so CD44 expression varies inversely with the concentrations of these 2 miRNAs; increased expression of the CD44 isoform CD44s is associated with increased overall survival in breast cancer patients (23, 24). Ma et al. (22) showed increased expression of the gene encoding miR-10b, which is upregulated by the transcription factor Twist1. Overproduction of miR-10b can promote tumor invasion in vivo (22). miR-21 has also been found to be upregulated in breast cancer, and this upregulation causes downregulation of 2 important targets: programmed cell death 4 (PDCD4) and tropomyosin 1 (TPM1) (25–27). PDCD4 is a tumor suppressor that represses translation by inhibiting eukaryotic initiation factor 4A (28).
TPM1 is a member of the tropomyosin family of proteins, which associate with actin and stabilize microfilaments (29). miR-17-5p, also known as miR-91, has been found to be downregulated in breast cancer (30). miR-17-5p is encoded by a gene located on chromosome 13q31, a region that undergoes loss of heterozygosity (LOH) in breast cancer (31). This miRNA normally represses translation of the AIB1 (amplified in breast cancer 1) mRNA (30). AIB1 is a coactivator of the cell cycle regulator E2F1, and it also enhances estrogen receptor–dependent transcription (32, 33). Along with miR-20a, miR-17-5p also negatively regulates the CCND1³ (cyclin D1) gene (34). Patients with low miR-335 and miR-126 concentrations have been observed to have shorter median times to metastatic relapse (35). Downregulation of miR-335 promotes metastasis by upregulating the extracellular matrix protein tenascin C and the transcription factor SOX4 (35). miR-126 appears to act as a tumor suppressor, and a decreased concentration of this miRNA promotes cell proliferation (35). Differential expression of genes encoding some miRNAs seems to be associated with particular pathologic features of breast cancer. For example, expression of the gene encoding miR-30 seems to correlate with estrogen receptor and progesterone receptor status; downregulation of this miRNA is found in estrogen receptor– and progesterone receptor–negative tumors (21). miR-213 and miR-203 appear to correlate with tumor stage; increased expression of the genes encoding these miRNAs is found in higher-stage tumors (21). miR-206 has been found to target the 3′ untranslated region of the estrogen receptor α mRNA, leading to an inverse correlation between miR-206 concentration and estrogen receptor α status (36, 37).

³ Human genes: CCND1, cyclin D1; BCL2, B-cell CLL/lymphoma 2; TCL1, T-cell leukemia/lymphoma 1; MIRLET7A1, microRNA let-7a-1; PTEN, phosphatase and tensin homolog; TPM1, tropomyosin 1 (alpha); APC, adenomatous polyposis coli; MIRLET7G, microRNA let-7g.

Fig. 2. Amplification of miRNAs. (A), Ligation of hairpin loop primer to mature miRNA. (B), Reverse transcription to cDNA product. (C), Real-time PCR amplification of cDNA with forward and reverse primers and use of a TaqMan probe for detection.

miRNA in Chronic Lymphocytic Leukemia

Chronic lymphocytic leukemia (CLL) is the most common leukemia in the Western world, and its clinical outcome is highly variable. There is great interest in developing better prognostic tests for this disease. Some studies have tried to correlate the concentrations of miRNAs in CLL cells with other biomarkers, such as ZAP-70 status, IgVH mutation status, or 13q14 deletion status; however, no consensus has yet been reached on this issue (38, 39). miR-150 is upregulated in CLL blood samples, along with miR-155, whose gene maps to the last exon of the B-cell integration cluster (BIC) and which is upregulated in CLL cells compared with nonpathologic B cells (38, 40, 41). Many miRNAs have been shown to be downregulated in CLL. Reduced expression of the genes encoding miR-15a and miR-16-1 correlates with a good prognosis; these genes are located at chromosome 13q14.3, which is deleted in 68% of CLL patients. Some uncertainty exists, however, as to the target(s) of these miRNAs (39, 40, 42, 43). Although transfection of a megakaryocytic cell line with these 2 miRNA genes has shown that the miRNAs may target the antiapoptotic protein BCL-2, changes in the expression of these 2 genes have also been observed not to correlate significantly with changes in BCL2 (B-cell CLL/lymphoma 2) gene expression in CLL patients (40, 42). Several other targets of miR-15a and miR-16-1 are currently under investigation (43). The genes encoding the miR-92 miRNAs, which are members of the miR-17-92 gene cluster that maps to 13q31–32, are downregulated in CLL (40). The genes encoding the miR-181 and miR-29 miRNAs, which target the TCL1A (T-cell leukemia/lymphoma 1A) gene, are downregulated in CLL (39, 44). Reduced expression of the genes encoding miR-181a-2 [and MIRLET7A1 (microRNA let-7a-1), which is also downregulated in CLL] may be due to impaired processing of the precursor molecule (45). Low concentrations of miR-29 miRNAs correlate with poor prognosis in CLL (39). Reduced expression of the genes encoding miR-143 and miR-145 has also been observed (46).
miRNA in Colorectal Cancer

Colorectal cancer is the third-leading cause of cancer-related deaths, and an estimated 150 000 new cases will be reported this year (20). As with other human cancers, several miRNAs are up- or downregulated in this tumor type (47). miR-31, miR-96, miR-135b, and miR-183 have been found to be upregulated in colorectal neoplasms; the transcription factor CHES1 (which is involved in repressing apoptosis) is a potential target of miR-96. High expression of the gene encoding miR-21 correlates with reduced expression of the gene encoding the tumor suppressor protein PDCD4; miR-21 may also target the same genes as in breast neoplasms [PTEN, phosphatase and tensin homolog; TPM1, tropomyosin 1 (alpha)]. miR-135a and miR-135b are upregulated, and this upregulation correlates with reduced expression of the APC (adenomatous polyposis coli) gene (48). miR-143 and miR-145 are both downregulated in colorectal cancer, as in CLL. The genes encoding these miRNAs are both located on 5q23, and these miRNAs possibly originate from the same primary miRNA (49, 50). miR-126 is also frequently lost in colon cancers; it suppresses the growth of neoplastic cells by targeting phosphatidylinositol 3-kinase signaling (51). miR-133b is also downregulated, and one of its putative targets is KRAS (52). KRAS is a member of the Ras family of proteins, which regulate signaling pathways involved in cellular proliferation, differentiation, and survival. In addition, Lanza et al. are investigating whether a combined miRNA/mRNA panel is able to distinguish between microsatellite-stable and microsatellite-unstable colorectal cancers (53).

miRNA in Hepatocellular Cancer

Hepatocellular carcinoma (HCC) is the fifth-leading cause of cancer-related death in men in the US, with an estimated 21 000 new cases (men and women) diagnosed annually (20). The major etiologies of HCC include viral infection, metabolic abnormalities, and immune-related disorders. Five-year survival rates approach 50%–70%, with local recurrence rates of >70% at 5 years (54, 55). Neoplasms of the liver are clinically heterogeneous and have various associated risk factors and genetic alterations. The genes for specific miRNAs have been shown to be aberrantly expressed in various liver tissues, including HCC. Murakami et al. were the first to use microarray technologies to profile miRNA gene expression in HCC (56). They analyzed miRNA gene expression profiles in 25 pairs of HCC and adjacent nontumor tissues, as well as in 9 chronic hepatitis samples.
In this study, the genes for 3 miRNAs (miR-224, miR-18, and pre–miR-P18) exhibited higher expression in the HCC samples than in nontumor tissues, and the genes for 5 other miRNAs (miR-199a, miR-199a*, miR-200a, miR-125a, miR-195) showed lower expression in the HCC samples than in the adjacent nontumor tissue samples. With these 8 miRNAs, Murakami et al. achieved an overall prediction accuracy of 97.8% (56). In a more recent study, Ladeiro et al. demonstrated the utility of miRNA profiling for distinguishing benign from malignant hepatocellular tumors (57). This study also characterized miRNAs in several subgroups of tumors defined by oncogene and tumor suppressor gene mutations and by specific risk factors. Both benign and malignant hepatocellular tumors showed increased expression of the gene encoding miR-224 and decreased expression of those encoding miR-122a and miR-422b. HCCs had increased concentrations of miR-21, miR-10b, and miR-222, whereas benign tumors showed decreased expression of the genes encoding miR-200c and miR-203. Among the HCC cases, those associated with alcohol consumption showed decreased miR-126 concentrations, whereas those associated with hepatitis C virus infection showed upregulation of miR-96. These findings have important implications for our understanding of liver pathophysiology and could shed light on novel therapeutic approaches. Li et al. identified a profile of 69 miRNAs that differentiated noncancerous from cancerous liver tissues (58). Eight of these miRNAs were chosen for further validation in distinguishing benign from malignant liver tumors and in differentiating both from healthy liver tissue. miR-125b was shown to be downregulated in HCC, and higher expression of the gene for miR-125b was associated with good survival in HCC patients. Li et al. postulated that miR-125b inhibits cell proliferation by suppressing the phosphorylation, and thus the activation, of Akt (58), a key downstream signaling mediator of phosphatidylinositol 3-kinase. In HCC patients, Jiang et al. identified genes for 19 miRNAs whose expression was associated with either poor survival (low expression) or good survival (high expression) (59). In addition, Fornari et al. established a potential oncogenic function of miR-221, which is upregulated in HCC (60). miR-221 targets the cyclin-dependent kinase inhibitors CDKN1B/p27 and CDKN1C/p57; upregulation of miR-221 therefore downregulates these inhibitors and promotes loss of cell cycle control.

miRNA in Lung Cancer

In 2008, 215 000 new cases of lung cancer will have been diagnosed and 162 000 individuals will have succumbed to the disease (20). Non–small cell lung cancer (NSCLC) is the most common cause of lung cancer–related deaths worldwide. Many molecular studies have identified gene mutation spectra and gene expression profiles associated with biological processes that are altered in lung carcinogenesis (61, 62). Given that miRNAs play key roles in carcinogenesis by regulating the translation and degradation of specific mRNAs that control cellular processes, miRNAs also have the potential to be valuable biomarkers of lung cancer. The lethal-7 (let-7) gene was first identified as playing a critical role in the development of Caenorhabditis elegans and was later shown to have homologs in the human genome (63). In C. elegans, the let-7 miRNA family negatively regulates let-60/RAS. Johnson et al. have shown that the 3′ untranslated regions of the human RAS genes contain many let-7–complementary sites and that, in lung cancer, the expression of let-7–related genes is 50% lower than in healthy tissue (64). The concentration of RAS protein, on the other hand, is significantly higher, suggesting a mechanism by which loss of let-7 homologs contributes to human lung cancer. Kumar et al. showed that NSCLCs expressing MIRLET7G (microRNA let-7g) also had reduced concentrations of the RAS and HMGA2 proteins (65). Loss of miRNA control of RAS expression could thus lead to RAS overproduction and contribute to the formation of a human cancer. LOH of chromosome 3p is one of the most frequent genetic events in lung carcinogenesis. Weiss et al. showed that loss of the gene encoding miR-128b (located on chromosome 3p) correlated with the response to targeted inhibition of the epidermal growth factor receptor (EGFR) (66). LOH of miR-128b can be considered equivalent to losing a tumor suppressor gene, because it permits increased production of EGFR. Weiss et al. first established in NSCLC cell lines that miR-128b directly regulates EGFR (66). miR-128b LOH was frequent in tumor samples and was significantly correlated with the clinical response to, and survival after, gefitinib therapy. Yu et al. identified a signature of 5 miRNAs (let-7a, miR-221, miR-137, miR-372, miR-182) that predicted outcome in NSCLC patients (67). Patients with a high risk score for these 5 miRNAs had an increased relapse rate and shorter survival times.

miRNA in Pancreatic Cancer

Pancreatic cancer is the fourth-leading cause of cancer-related deaths in the US, with a 5-year survival rate of <5%. Approximately 38 000 new cases and 34 000 pancreatic cancer–related deaths will have occurred in the US in 2008 (20). Eighty-five percent of pancreatic tumors originate from the epithelial lining of the pancreatic duct [pancreatic ductal adenocarcinoma (PDAC)] (68). The high mortality and morbidity of this disease reflect its overall poor prognosis, which is due to its late clinical presentation, its aggressive invasive and metastatic potential, and its resistance to chemotherapy and radiation therapy. Despite advances in the clinical management of pancreatic cancer, effective biomarker-based strategies are currently lacking both for early detection and for differentiating PDAC from benign disease, such as chronic pancreatitis. To identify more accurate and sensitive biomarkers, researchers have begun to describe miRNA profiles that might enable earlier diagnosis.

In several studies, miR-216 was identified as being specific for the pancreas (69–71). Other studies have identified abnormal production of as few as 2 miRNAs, miR-196a and miR-217, that can distinguish PDAC samples from samples of healthy pancreas and chronic pancreatitis (72). In a later study, increased production of miR-196a was found to predict poor survival in patients with PDAC (73). Although early-stage pancreatic carcinoma can be treated surgically, most cases present at an advanced stage, when surgical resection is not possible because of vascular dissemination of the tumor and its spread to regional lymph nodes. Szafranska et al. therefore evaluated whether miRNA gene expression profiles in fine-needle aspirate samples could reliably identify the disease status of pancreatic tissue and distinguish benign from malignant pancreatic tissues (74). Differences between the raw threshold cycle (Ct) values of miR-196a and miR-217 obtained in real-time PCR assays clearly separated malignant from benign tissues, in both frozen samples and fine-needle aspirates. The gene encoding miR-217 was highly expressed only in healthy pancreatic tissues, and miR-196a was detected above background only in PDAC samples. Because healthy pancreas consists of about 90% acinar cells, these observations suggest that miR-217 is produced primarily in acinar cells. miR-196a was found to be produced only in ductal adenocarcinoma cells, and not in healthy acinar or ductal cells. In addition, Szafranska et al. showed that miR-196a is absent from pancreatic intraepithelial neoplasia 1b (PanIN-1b) lesions in microdissected pancreatic tissues; such lesions are likely to have a low propensity to progress to PDAC (74). Sixty percent of the more aggressive and advanced PanIN-3 lesions were positive for miR-196a, whereas in the intermediate PanIN-2 lesions the relative level of miR-196a expression dropped to 25%.
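Because both miRNAs are measured in the same sample, the difference between their threshold cycles is largely independent of how much RNA went into the reaction, which makes a two-miRNA difference attractive for small specimens such as fine-needle aspirates. The Python sketch below computes ΔCt = Ct(miR-196a) − Ct(miR-217) and an approximate 2^−ΔCt abundance ratio. The decision cutoff, the Ct of 40 assigned to undetected targets, the assumption of ~100% amplification efficiency, and the sample Ct values are all hypothetical placeholders, not parameters reported by Szafranska et al.

```python
# Sketch of a two-miRNA delta-Ct readout (miR-196a vs miR-217) from real-time PCR.
# Hypothetical assumptions, not values from the cited study: undetected targets
# are assigned MAX_CT, the decision cutoff is a placeholder, and amplification
# efficiency is taken as ~100% when converting delta-Ct to a ratio.
from typing import Optional

MAX_CT = 40.0          # assumed last cycle of the run
DELTA_CT_CUTOFF = 0.0  # placeholder; would have to be set on characterized specimens


def capped(ct: Optional[float]) -> float:
    """Treat an undetected target (None) or a Ct beyond the run as MAX_CT."""
    if ct is None or ct > MAX_CT:
        return MAX_CT
    return ct


def delta_ct(ct_196a: Optional[float], ct_217: Optional[float]) -> float:
    """Ct(miR-196a) - Ct(miR-217); lower values mean relatively more miR-196a."""
    return capped(ct_196a) - capped(ct_217)


def ratio_196a_to_217(ct_196a: Optional[float], ct_217: Optional[float]) -> float:
    """Approximate miR-196a/miR-217 abundance ratio, assuming doubling per cycle."""
    return 2.0 ** (-delta_ct(ct_196a, ct_217))


def classify(ct_196a: Optional[float], ct_217: Optional[float],
             cutoff: float = DELTA_CT_CUTOFF) -> str:
    """Label a sample by comparing its delta-Ct with the (hypothetical) cutoff."""
    return "PDAC-like" if delta_ct(ct_196a, ct_217) < cutoff else "benign-like"


if __name__ == "__main__":
    # Invented Ct values for illustration only.
    samples = {
        "healthy-like": (None, 22.0),  # miR-196a undetected, miR-217 abundant
        "tumor-like": (28.0, 31.0),    # miR-196a present, miR-217 low
    }
    for name, (a, b) in samples.items():
        print(name, delta_ct(a, b), classify(a, b))
```

A lower (more negative) ΔCt points toward relatively more miR-196a, matching the PDAC pattern described above; a high ΔCt, as when miR-196a is undetected and miR-217 is abundant, matches healthy tissue rich in acinar cells. Any real cutoff would have to be established and validated on characterized specimens.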
miRNA in Prostate Cancer

Prostate cancer (CaP) is the most frequently diagnosed malignant tumor and the second-leading cause of cancer deaths in American men. It is estimated that in 2008 there will have been more than 186 000 newly diagnosed CaP cases and more than 28 000 attributed deaths (20). The mechanisms that underlie the occurrence and progression of CaP remain largely unknown. Several aberrantly produced miRNAs have been discovered in CaP cell lines, xenografts, and clinical samples (75), and these miRNAs may play critical roles in the pathogenesis of CaP. Porkka et al. profiled the expression of 319 miRNA genes and identified 51 individual miRNAs that were either up- or downregulated in CaP compared with healthy tissue (76). Twenty-two of these miRNAs were downregulated in all CaPs examined, and 15 were downregulated only in the hormone-refractory cancers. Similarly, these investigators showed 8 miRNAs to be upregulated in all CaPs and 6 to be upregulated only in the hormone-refractory cancers. Ozen et al. also evaluated miRNA expression in CaP and demonstrated widespread deregulation of miRNA gene expression (77). Several of the downregulated miRNAs in this study have known target mRNAs, including those for RAS, E2F3, BCL-2, and MCL-1.

Conclusion

Despite many years of effort to identify biomarkers for human cancers, no biomarker has generated the excitement that has accompanied interest in the potential of miRNAs. The biologic roles of miRNAs suggest that their expression may correlate with prognosis and therapeutic outcome. Inhibition of miRNAs with “antagomirs” is an attractive direction for therapy. Numerous outstanding questions could lead to further applications of these biomarkers. For example, what are the expression levels of miRNA genes in primary, second primary, or metastatic tumors in any given patient? If similar miRNAs are up- or downregulated in different tumor types, what are the downstream targets of these miRNAs, and are these targets similar or different for different tumor types? Answers to these questions may help shape the way human cancers are categorized, diagnosed, and treated in the future.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References

1. Kulasingam V, Diamandis EP. Strategies for discovering novel cancer biomarkers through utilization of emerging technologies. Nat Clin Pract Oncol 2008;5:588–99.
2. Bentwich I. Prediction and validation of microRNAs and their targets. FEBS Lett 2005;579:5904–10.
3. Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, et al. MicroRNA expression profiles classify human cancers [see comment]. Nature 2005;435:834–8.
4. Rosenfeld N, Aharonov R, Meiri E, Rosenwald S, Spector Y, Zepeniuk M, et al. MicroRNAs accurately identify cancer tissue origin. Nat Biotechnol 2008;26:462–9.
5. Lee Y, Kim M, Han J, Yeom KH, Lee S, Baek SH, Kim VN. MicroRNA genes are transcribed by RNA polymerase II. EMBO J 2004;23:4051–60.
6. Borchert GM, Lanier W, Davidson BL. RNA polymerase III transcribes human microRNAs. Nat Struct Mol Biol 2006;13:1097–101.
7. Lee Y, Han J, Yeom KH, Jin H, Kim VN. Drosha in primary microRNA processing. Cold Spring Harb Symp Quant Biol 2006;71:51–7.
8. Gregory RI, Yan KP, Amuthan G, Chendrimada T, Doratotaj B, Cooch N, Shiekhattar R. The Microprocessor complex mediates the genesis of microRNAs. Nature 2004;432:235–40.
9. Lund E, Guttinger S, Calado A, Dahlberg JE, Kutay U. Nuclear export of microRNA precursors. Science 2004;303:95–8.
10. Lee Y, Ahn C, Han J, Choi H, Kim J, Yim J, et al. The nuclear RNase III Drosha initiates microRNA processing. Nature 2003;425:415–9.
11. Lee Y, Jeon K, Lee JT, Kim S, Kim NV. MicroRNA maturation: stepwise processing and subcellular localization. EMBO J 2002;21:4663–70.
12. Hutvagner G, Zamore PD. A microRNA in a multiple-turnover RNAi enzyme complex. Science 2002;297:2056–60.
13. Brennecke J, Stark A, Russell RB, Cohen SB. Principles of microRNA-target recognition. PLoS Biol 2005;3:e85.
14. Griffiths-Jones S, Grocock RJ, van Dongen S, Bateman A, Enright AJ. miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res 2006;34:D140–4.
15. Ambros V, Bartel B, Bartel DP, Burge CB, Carrington JC, Chen X, et al. A uniform system for microRNA annotation. RNA 2003;9:277–9.
16. Petersen M, Nielsen CB, Nielsen KE, Jensen GA, Bondensgaard K, Singh SK, et al. The conformations of locked nucleic acids (LNA). J Mol Recognit 2000;13:44–53.
17. Valoczi A, Hornyik C, Varga N, Burgyan J, Kauppinen S, Havelda Z. Sensitive and specific detection of microRNAs by northern blot analysis using LNA-modified oligonucleotide probes. Nucleic Acids Res 2004;32:e175.
18. Luminex. FlexmiR™ MicroRNA Human Panel instructions. http://www.luminexcorp.com/ (Accessed September 2008).
19. Schmittgen TD, Lee EJ, Jiang J, Sarkar A, Yang L, Elton TC, et al. Real-time PCR quantification of precursor and mature microRNA. Methods 2008;44:31–8.
20. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, et al. Cancer statistics. CA Cancer J Clin 2008;58:71–96.
21. Iorio MV, Ferracin M, Liu CG, Veronese A, Spizzo R, Sabbioni S, et al. MicroRNA gene expression deregulation in human breast cancer. Cancer Res 2005;65:7065–70.
22. Ma L, Teruya-Feldstein J, Weinberg RA. Tumour invasion and metastasis initiated by microRNA-10b in breast cancer. Nature 2007;449:682–8.
23. Huang Q, Gumireddy K, Schrier M, le Sage C, Nagel R, Nair S, et al. The microRNAs miR-373 and miR-520c promote tumour invasion and metastasis. Nat Cell Biol 2008;10:202–10.
24. Diaz LK, Zhou X, Wright ET, Cristofanilli M, Smith T, Yang Y, et al. CD44 expression is associated with increased survival in node-negative invasive breast carcinoma. Clin Cancer Res 2005;11:3309–14.
25. Frankel LB, Christoffersen NR, Jacobsen A, Lindow M, Krogh A, Lund AH. Programmed cell death 4 (PDCD4) is an important functional target of the microRNA miR-21 in breast cancer cells. J Biol Chem 2008;283:1026–33.
26. Zhu S, Si ML, Wu H, Mo YY. MicroRNA-21 targets the tumor suppressor gene tropomyosin 1 (TPM1). J Biol Chem 2007;282:14328–36.
27. Si ML, Zhu S, Wu H, Lu Z, Wu F, Mo YY. miR-21-mediated tumor growth. Oncogene 2007;26:2799–803.
28. Schmid T, Jansen AP, Baker AR, Hegamyer G, Hagan JP, Colburn NH. Translation inhibitor Pdcd4 is targeted for degradation during tumor promotion. Cancer Res 2008;68:1254–60.
29. Perry SV. Vertebrate tropomyosin: distribution, properties and function. J Muscle Res Cell Motil 2001;22:5–49.
30. Hossain A, Kuo MT, Saunders GF. Mir-17-5p regulates breast cancer cell proliferation by inhibiting translation of AIB1 mRNA. Mol Cell Biol 2006;26:8191–201.
31. Eiriksdottir G, Johannesdottir G, Ingvarsson S, Bjornsdottir IB, Jonasson JG, Agnarsson BA, et al. Mapping loss of heterozygosity at chromosome 13q: loss at 13q12–q13 is associated with breast tumour progression and poor prognosis. Eur J Cancer 1998;34:2076–81.
32. Louie MC, Zou JX, Rabinovich A, Chen HW. ACTR/AIB1 functions as an E2F1 coactivator to promote breast cancer cell proliferation and antiestrogen resistance. Mol Cell Biol 2004;24:5157–71.
33. Anzick SL, Kononen J, Walker RL, Azorsa DO, Tanner MM, Guan XY, et al. AIB1, a steroid receptor coactivator amplified in breast and ovarian cancer. Science 1997;277:965–8.
34. Yu Z, Wang C, Wang M, Li Z, Casimiro MC, Liu M, et al. A cyclin D1/microRNA 17/20 regulatory feedback loop in control of breast cancer cell proliferation. J Cell Biol 2008;182:509–17.
35. Tavazoie SF, Alarcon C, Oskarsson T, Padua D, Wang Q, Bos PD, et al. Endogenous human microRNAs that suppress breast cancer metastasis. Nature 2008;451:147–52.
36. Adams BD, Furneaux H, White BA. The microribonucleic acid (miRNA) miR-206 targets the human estrogen receptor-α (ERα) and represses ERα messenger RNA and protein expression in breast cancer cell lines. J Mol Endocrinol 2007;21:1132–47.
37. Kondo N, Toyama T, Sugiura H, Fujii Y, Yamashita H. miR-206 expression is down-regulated in estrogen receptor α-positive human breast cancer. Cancer Res 2008;68:5004–8.
38. Wang M, Tan LP, Dijkstra MK, van Lom K, Robertus JL, Harms G, et al. miRNA analysis in B-cell chronic lymphocytic leukaemia: proliferation centres characterized by low miR-150 and high BIC/miR-155 expression. J Pathol 2008;215:13–20.
39. Calin GA, Ferracin M, Cimmino A, Di Leva G, Shimizu M, Wojcik SE, et al. A microRNA signature associated with prognosis and progression in chronic lymphocytic leukemia. N Engl J Med 2005;353:1793–801; erratum 2006;355:533.
40. Fulci V, Chiaretti S, Goldoni M, Azzalin G, Carucci N, Tavolaro S, et al. Quantitative technologies establish a novel microRNA profile of chronic lymphocytic leukemia. Blood 2007;109:4944–51.
41. Eis PS, Tam W, Sun L, Chadburn A, Li Z, Gomez MF, et al. Accumulation of miR-155 and BIC RNA in human B cell lymphomas. Proc Natl Acad Sci U S A 2005;102:3627–32.
42. Cimmino A, Calin GA, Fabbri M, Iorio MV, Ferracin M, Shimizu M, et al. miR-15 and miR-16 induce apoptosis by targeting BCL2. Proc Natl Acad Sci U S A 2005;102:13944–9; erratum 2006;103:2464.
43. Calin GA, Cimmino A, Fabbri M, Ferracin M, Wojcik SE, Shimizu M, et al. miR-15a and miR-16-1 cluster functions in human leukemia. Proc Natl Acad Sci U S A 2008;105:5166–71.
44. Pekarsky Y, Santanam U, Cimmino A, Palamarchuk A, Efanov A, Maximov V, et al. Tcl1 expression in chronic lymphocytic leukemia is regulated by miR-29 and miR-181. Cancer Res 2006;66:11590–3.
45. Marton S, Garcia MR, Robello C, Persson H, Trajtenberg F, Pritsch O, et al. Small RNAs analysis in CLL reveals a deregulation of miRNA expression and novel miRNA candidates of putative relevance in CLL pathogenesis. Leukemia 2008;22:330–8.
46. Akao Y, Nakagawa Y, Kitade Y, Kinoshita T, Naoe T. Downregulation of microRNAs-143 and -145 in B-cell malignancies. Cancer Sci 2007;98:1914–20.
47. Cummins JM, He Y, Leary RJ, Pagliarini R, Diaz LA Jr, Sjoblom T, et al. The colorectal microRNAome. Proc Natl Acad Sci U S A 2006;103:3687–92.
48. Nagel R, le Sage C, Diosdado B, van der Waal M, Oude Vrielink JA, Bolijn A, et al. Regulation of the adenomatous polyposis coli gene by the miR-135 family in colorectal cancer. Cancer Res 2008;68:5795–802.
49. Michael MZ, O’Connor SM, van Holst Pellekaan NG, Young GP, James RJ. Reduced accumulation of specific microRNAs in colorectal neoplasia. Mol Cancer Res 2003;1:882–91.
50. Akao Y, Nakagawa Y, Naoe T. MicroRNA-143 and -145 in colon cancer. DNA Cell Biol 2007;26:311–20.
51. Guo C, Sah JF, Beard L, Willson JK, Markowitz SD, Guda K. The noncoding RNA, miR-126, suppresses the growth of neoplastic cells by targeting phosphatidylinositol 3-kinase signaling and is frequently lost in colon cancers. Genes Chromosomes Cancer 2008;47:939–46.
52. Bandrés E, Cubedo E, Agirre X, Malumbres R, Zárate R, Ramirez N, et al. Identification by real-time PCR of 13 mature microRNAs differentially expressed in colorectal cancer and non-tumoral tissues. Mol Cancer 2006;5:29.
53. Lanza G, Ferracin M, Gafà R, Veronese A, Spizzo R, Pichiorri F, et al. mRNA/microRNA gene expression profile in microsatellite unstable colorectal cancer. Mol Cancer 2007;6:54.
54. Bruix J, Sherman M. American Association for the Study of Liver Diseases practice guideline. Management of hepatocellular carcinoma. Hepatology 2008;42:1208–36.
55. Lencioni R, Cioni D, Della Pina C, Crocetti L, Bartolozzi C. Imaging diagnosis. Semin Liver Dis 2005;25:162–70.
56. Murakami Y, Yasuda T, Saigo K, Urashima T, Toyoda H, Okanoue T, et al. Comprehensive analysis of microRNA expression patterns in hepatocellular carcinoma and non-tumorous tissues. Oncogene 2006;25:2537–45.
57. Ladeiro Y, Couchy G, Balabaud C, Bioulac-Sage P, Pelletier L, Rebouissou S, et al. MicroRNA profiling in hepatocellular tumors is associated with clinical features and oncogene/tumor suppressor gene mutations. Hepatology 2008;47:1955–63.
58. Li W, Xie L, He X, Li J, Tu K, Wei L, et al. Diagnostic and prognostic implications of microRNAs in human hepatocellular carcinoma. Int J Cancer 2008;123:1616–22.
59. Jiang J, Gusev Y, Aderca I, Mettler TA, Nagorney DM, Brackett DJ, et al. Association of microRNA expression in hepatocellular carcinomas with hepatitis infection, cirrhosis, and patient survival. Clin Cancer Res 2008;14:419–27.
60. Fornari F, Gramantieri L, Ferracin M, Veronese A, Sabbioni S, Calin GA, et al. miR-221 controls CDKN1C/p57 and CDKN1B/p27 expression in human hepatocellular carcinoma. Oncogene 2008;27:5651–61.
61. Meyerson M, Carbone D. Genomic and proteomic profiling of lung cancers: lung cancer classification in the age of targeted therapy. J Clin Oncol 2005;23:3219–26.
62. Granville CA, Dennis PA. An overview of lung cancer genomics and proteomics. Am J Respir Cell Mol Biol 2005;32:169–76.
63. Roush S, Slack FJ. The let-7 family of microRNAs. Trends Cell Biol 2008;18:505–16.
64. Johnson SM, Grosshans H, Shingara J, Byrom M, Jarvis A, Cheng A, et al. RAS is regulated by the let-7 microRNA family. Cell 2005;120:635–47.
65. Kumar MS, Erkeland SJ, Pester RE, Chen CY, Ebert MS, Sharp PA, et al. Suppression of non-small cell lung tumor development by the let-7 microRNA family. Proc Natl Acad Sci U S A 2008;105:3903–8.
66. Weiss GJ, Bemis LT, Nakajima E, Sugita M, Birks DK, Robinson WA, et al. EGFR regulation by microRNA in lung cancer: correlation with clinical response and survival to gefitinib and EGFR expression in cell lines. Ann Oncol 2008;19:1053–9.
67. Yu SL, Chen HY, Chang GC, Chen CY, Chen HW, Singh S, et al. MicroRNA signature predicts survival and relapse in lung cancer. Cancer Cell 2008;13:48–57.
68. Garcea G, Neal CP, Pattenden CJ, Steward WP, Berry DP. Molecular prognostic markers in pancreatic cancer: a systematic review. Eur J Cancer 2005;41:2213–36.
69. Baskerville S, Bartel DP. Microarray profiling of microRNAs reveals frequent coexpression with neighboring miRNAs and host genes. RNA 2005;11:241–7.
70. Shingara J, Keiger K, Shelton J, Laosinchai-Wolf W, Powers P, Conrad D, et al. An optimized isolation and labeling platform for accurate microRNA expression profiling. RNA 2005;11:1461–70.
71. Sood P, Krek A, Zavolan M, Macino G, Rajewsky N. Cell-type-specific signatures of microRNAs on target mRNA expression. Proc Natl Acad Sci U S A 2006;103:2746–51.
72. Szafranska AE, Davison TS, John J, Cannon T, Sipos B, Maghnouj A, et al. MicroRNA expression alterations are linked to tumorigenesis and nonneoplastic processes in pancreatic ductal adenocarcinoma. Oncogene 2007;26:4442–52.
73. Bloomston M, Frankel WL, Petrocca F, Volinia S, Alder H, Hagan JP, et al. MicroRNA expression patterns to differentiate pancreatic adenocarcinoma from normal pancreas and chronic pancreatitis. JAMA 2007;297:1901–8.
74. Szafranska AE, Doleshal M, Edmunds HS, Gordon S, Luttges J, Munding JB, et al. Analysis of microRNAs in pancreatic fine-needle aspirates can classify benign and malignant tissues. Clin Chem 2008;54:1716–24.
75. Shi XB, Tepper CG, White RW. MicroRNAs and prostate cancer. J Cell Mol Med 2008;12:1456–65.
76. Porkka KP, Pfeiffer MJ, Waltering KK, Vessella RL, Tammela TLJ, Visakorpi T. MicroRNA expression profiling in prostate cancer. Cancer Res 2007;67:6130–5.
77. Ozen M, Creighton CJ, Ozdemir M, Ittmann M. Widespread deregulation of microRNA expression in human prostate cancer. Oncogene 2008;27:1788–93.


Mini-Reviews

Clinical Chemistry 55:4 632–640 (2009)

PCR-Based Methods for the Enrichment of Minority Alleles and Mutations

Coren A. Milbury,¹ Jin Li,¹ and G. Mike Makrigiorgos¹*

BACKGROUND: The ability to identify low-level somatic DNA mutations and minority alleles within an excess of wild-type DNA is becoming essential for characterizing early and posttreatment tumor status in cancer patients. Over the past 2 decades, much research has focused on improving the selectivity of PCR-based technologies to enhance the detection of minority (mutant) alleles in clinical samples. Routine use in clinical and diagnostic settings requires that these techniques be accurate and cost-effective and require little effort to optimize, perform, and analyze.

CONTENT: Enrichment methods can generally be grouped by their ability to enrich for, and detect, either known or unknown mutations. Although there are several robust approaches for detecting known mutations within a high background of wild-type DNA, few techniques can enrich and detect low-level unknown mutations. One promising development is COLD-PCR (coamplification at lower denaturation temperature PCR), which enriches PCR amplicons containing unknown mutations at any position so that they can subsequently be sequenced to identify the exact nucleotide change.

SUMMARY: This review summarizes technologies available for detecting minority DNA mutations, with an emphasis on newer methods that enrich unknown low-level DNA variants so that the mutation can subsequently be sequenced. The enrichment of minority alleles is imperative in clinical and diagnostic applications, especially those related to cancer detection, and continued technology development is warranted.

© 2009 American Association for Clinical Chemistry

¹ Department of Radiation Oncology, Division of Medical Physics and Biophysics, and Division of DNA Repair and Genome Stability, Dana Farber/Brigham and Women’s Cancer Center, Harvard Medical School, Boston, MA.
* Address correspondence to this author at: Dana Farber/Brigham and Women’s Cancer Center, Brigham and Women’s Hospital, Level L2, Radiation Therapy, 75 Francis St., Boston, MA 02115. E-mail [email protected].
Received September 6, 2008; accepted January 21, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.113035


A prominent concern in clinical and diagnostic applications is the ability to detect clinically significant low-level mutations and minority alleles. Discerning such mutations is important in many respects, but especially for (a) early cancer detection from tissue biopsies and bodily fluids such as plasma or serum; (b) assessment of residual disease after surgery or radiochemotherapy; (c) disease staging and molecular profiling for prognosis or for tailoring therapy to individual patients; and (d) monitoring of therapy outcome and cancer remission or relapse. Efficient detection of cancer-relevant somatic mutations depends largely on the selectivity of the techniques employed. Detection and identification of oncogene and tumor-suppressor gene mutations primarily require analysis of precancerous or cancerous tissue, sputum, urine, stool, or circulating extracellular DNA released into blood. Such samples typically contain both wild-type and mutant DNA, and the quantity of wild-type DNA often exceeds, in many cases vastly, the mutant DNA contribution, making it difficult to detect and identify minority alleles present at extremely low concentrations. Enrichment methods are often beneficial or necessary to raise the mutant concentration to a level at which accurate and precise analysis is feasible, and the selectivity of the enrichment and detection methods must be considered carefully to maintain accuracy. To detect low-level early mutations in tumors or the emergence of resistance mutations (e.g., mutant-to-wild-type ratios of 10⁻³ to 10⁻⁶), both high selectivity and enrichment of minority alleles are required. Furthermore, for a particular approach to be used as a routine diagnostic tool, it must balance high selectivity and enrichment with accuracy, convenience, and low cost.
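At ratios of 10⁻³ to 10⁻⁶, selectivity is not the only constraint: the reaction must also contain at least one mutant template. The sketch below estimates this under two stated assumptions, roughly 3.3 pg of DNA per haploid human genome (about 300 genome copies per nanogram) and Poisson sampling of template molecules; at such low levels the mutant fraction and the mutant-to-wild-type ratio are practically equal. The function names are illustrative, not taken from the review.

```python
import math

# Assumption: ~3.3 pg of DNA per haploid human genome (about 300 copies per ng).
PG_PER_HAPLOID_GENOME = 3.3


def genome_copies(input_ng: float) -> float:
    """Approximate number of haploid genome copies in input_ng of human DNA."""
    return input_ng * 1000.0 / PG_PER_HAPLOID_GENOME


def expected_mutant_copies(input_ng: float, mutant_fraction: float) -> float:
    """Mean number of mutant templates expected in the reaction."""
    return genome_copies(input_ng) * mutant_fraction


def prob_at_least_one_mutant(input_ng: float, mutant_fraction: float) -> float:
    """Poisson probability that the reaction receives at least one mutant template."""
    return 1.0 - math.exp(-expected_mutant_copies(input_ng, mutant_fraction))


if __name__ == "__main__":
    for fraction in (1e-3, 1e-4, 1e-5, 1e-6):
        lam = expected_mutant_copies(10.0, fraction)
        p = prob_at_least_one_mutant(10.0, fraction)
        print(f"10 ng, mutant fraction {fraction:.0e}: "
              f"{lam:.4f} mutant copies expected, P(at least 1) = {p:.4f}")
```

With 10 ng of input (roughly 3,000 genome copies), a mutant present at 10⁻⁴ is expected in only about 0.3 copies, so at the lower end of the range the amount of input DNA limits detection as much as the selectivity of the assay does.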
With these guidelines in mind, we reviewed a selection of PCR-based methods developed to preferentially enrich known or unknown mutations present at low concentrations among wild-type DNA. Parsons and Heflich (1) and Gocke et al. (2) have previously reviewed enrichment methods (also known as “genotypic selection methods”); however, those reviews focused primarily on early methods designed specifically to enrich known point mutations (mutations at predefined DNA positions).

Table 1. Minority allele enrichment methods.ᵃ

Method | Selectivity | Reference
Moderate to high selectivity and enrichment of known mutations
ARMS | 10⁻¹ to 10⁻³ | Newton et al. (3)
ASPCR | 10⁻¹ to 10⁻³ | Wu et al. (4)
ASA | 10⁻¹ to 10⁻³ | Okayama et al. (5)
PASA and PAMSA | 10⁻¹ to 10⁻³ | Sommer et al. (6), Dutton and Sommer (7)
COP | 10⁻¹ to 10⁻³ | Gibbs et al. (8)
E-PCR | 10⁻¹ to 10⁻⁴ | Kahn et al. (9)
MAMA | 10⁻¹ to 10⁻⁵ | Cha et al. (10)
MASA | 10⁻¹ to 10⁻³ | Takeda et al. (11)
PNA-mediated PCR | 10⁻³ to 10⁻⁵ | Nielsen et al. (17), Dabritz et al. (20)
LNA-mediated WTB-PCR | 10⁻¹ to 10⁻⁵ | Dominguez and Kolodney (18), Oldenburg et al. (19)
TaqMan RSM | 5 × 10⁻⁴ | Wolff and Gemmell (16)
TaqMAMA | 5 × 10⁻⁵ | Easterday et al. (12)
FLAG-PCR | 10⁻¹ to 10⁻³ | Amicarelli et al. (21)
AIRS-RFLP | 10⁻³ to 10⁻⁴ | Haliassos et al. (15)
Very high selectivity and enrichment of known mutations
RSM-PCR | 10⁻³ to 10⁻⁸ | Parsons and Heflich (1), Jenkins et al. (23)
APRIL-ATM | 10⁻³ to 10⁻⁶ | Kaur et al. (24)
Digital PCR and RMC-PCR | 10⁻³ to 10⁻⁸ | Vogelstein et al. (27), Bielas and Loeb (28)
PAP-ASA and bi-PAP-ASA | 10⁻⁴ to 10⁻⁹ | Liu and Sommer (25), Shi et al. (26)
Enrichment and detection of unknown mutations
Electrophoresis (HET, SSCP, DGGE, dHPLC, CDCE) | 10⁻¹ to 10⁻² | Lichten and Fox (29), Orita et al. (30), Cariello et al. (31), Li-Sucholeiki and Thilly (32), Underhill et al. (33), Emmerson et al. (34)
Endo V-ligase PCR | 10⁻¹ to 10⁻² | Pincas et al. (37)
MutY-LM-PCR | 10⁻¹ to 10⁻² | Zhang et al. (36)
sRT-MELT | 10⁻¹ to 10⁻² | Li et al. (38)
iFLP | 10⁻³ to 10⁻⁵ | Liu et al. (39)
COLD-PCRᵇ | 10⁻¹ to 10⁻⁴ | Li et al. (40)

ᵃ Selectivity is presented as a range representing the commonly achieved and the maximum selectivity of mutant detection of the approach. LM, ligation-mediated; sRT-MELT, Surveyor-mediated real-time melting.

Enrichment methods can generally be grouped by their ability to enrich either known or unknown mutations. Developing mutation enrichment methods is much easier for known mutations than for unknown mutations, because sequence data can be used, specific nucleotides can be targeted, and a wider range of applications is available. As a result, more methods have been developed and modified to preferentially enrich known mutations than unknown mutations. For many cancer-relevant genes, however, unknown somatic mutations can be very important, and mutation selectivity is a strong consideration in choosing the appropriate method for routine testing and identification. The selectivity of a mutation detection method refers to its ability to select mutation-containing alleles among an excess of wild-type alleles; enrichment refers to a process that increases the mutant allele concentration relative to wild-type alleles so that subsequent mutation detection is facilitated. For purposes of this review, minority allele enrichment methods are discussed according to their ability to enrich known vs unknown mutations and their degree of selectivity.
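Enrichment figures are easy to misread because a mutant "level" may be quoted either as a fraction of all alleles or as a mutant-to-wild-type ratio. As a small worked example, the sketch below takes the dHPLC fraction-collection figures cited later in this review (about 5% mutant before and about 50% after) and shows that a 10-fold rise in mutant fraction corresponds to a 19-fold rise in the mutant-to-wild-type ratio. The helper names are illustrative only.

```python
def ratio_from_fraction(fraction: float) -> float:
    """Convert a mutant fraction (mutant / all alleles) to a mutant:wild-type ratio."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must be in [0, 1)")
    return fraction / (1.0 - fraction)


def fraction_from_ratio(ratio: float) -> float:
    """Convert a mutant:wild-type ratio back to a mutant fraction."""
    return ratio / (1.0 + ratio)


def fold_enrichment(before: float, after: float) -> tuple[float, float]:
    """Return (fold change in mutant fraction, fold change in mutant:wild-type ratio)."""
    return after / before, ratio_from_fraction(after) / ratio_from_fraction(before)


if __name__ == "__main__":
    by_fraction, by_ratio = fold_enrichment(0.05, 0.50)
    print(f"fraction: {by_fraction:.1f}-fold, ratio: {by_ratio:.1f}-fold")
```

The two measures agree at very low mutant levels and diverge as the mutant approaches a majority, so it helps to state which one an enrichment factor refers to.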

MODERATE- TO HIGH-SELECTIVITY METHODS AND ENRICHMENT OF KNOWN MUTATIONS

Many moderate- to high-selectivity PCR methods for known mutations have been developed over the past 2 decades (Table 1). One of the most widely used approaches manipulates the 3′-terminal nucleotide of a primer to favor allele-specific amplification of a particular nucleotide variant (i.e., the mutant or minority allele). Methods such as the amplification refractory mutation system (ARMS)² (3), allele-specific enzymatic amplification (ASPCR) (4), allele-specific amplification (ASA) (5), PCR amplification of specific alleles (PASA) (6), and PCR amplification of multiple specific alleles (PAMSA) (7) can enrich minority alleles present among wild-type DNA at concentrations as low as 0.1% to 1%. These approaches are generally easy to use and tend to produce highly accurate results, although their selectivity typically remains low to moderate. Selectivity can be increased further by introducing additional nucleotide mismatches near the 3′ end, but this requires extensive experimentation and optimization. Derivatives of this approach include competitive oligonucleotide priming (COP) (8), enriched PCR (E-PCR) or mutant-enriched PCR (ME-PCR) (9), the mismatch amplification mutation assay (MAMA) (10), and mutant allele–specific amplification (MASA) (11). These methods have shown similar or higher selectivity, enriching a single minority allele present among 10² (COP and ME-PCR) to 10⁵ (MAMA) wild-type alleles. Various combinations of allele-specific PCR with real-time PCR have also been shown to enrich minority alleles effectively with moderate to high selectivity. For example, TaqMAMA (12) combines the real-time scoring attributes of TaqMan® probes with the selectivity of the MAMA approach to preferentially enrich, and simultaneously identify, a known nucleotide variant. TaqMAMA can enrich and detect an alternate allele present among an excess of approximately 2 × 10³ wild-type alleles, although it is questionable whether this degree of selectivity can be achieved for all mutation screening targets. Another real-time PCR-based approach, antiprimer quenching-based real-time PCR (aQRT-PCR) (13), uses an allele-specific primer for mutant enrichment, real-time genotyping, and real-time product quantification in a single-step, closed-tube format.

Thermostable restriction enzymes that selectively destroy wild-type sequences during PCR, thereby enriching the mutation frequency, have also been used for genotypic selection in restriction endonuclease–mediated selective PCR (REMS-PCR) (14). When no appropriate restriction endonuclease is available, artificial introduction of a restriction site (AIRS) RFLP (15) can generate an endonuclease recognition sequence by modifying 1 or more nucleotides within the priming region of the wild-type DNA. These approaches were reported to detect 1 mutated cell among 2.5 × 10³ wild-type alleles (15). In a real-time format, combining TaqMan assays with allele-specific restriction by an endonuclease achieves a selectivity of 1 mutant allele in 2.0 × 10³ wild-type alleles (16).

² Nonstandard abbreviations: ARMS, amplification refractory mutation system; ASPCR, allele-specific enzymatic amplification; ASA, allele-specific amplification; PASA, PCR amplification of specific alleles; PAMSA, PCR amplification of multiple specific alleles; COP, competitive oligonucleotide priming; E-PCR, enriched PCR; ME-PCR, mutant-enriched PCR; MAMA, mismatch amplification mutation assay; MASA, mutant allele–specific amplification; aQRT-PCR, antiprimer quenching-based real-time PCR; REMS-PCR, restriction endonuclease–mediated selective PCR; AIRS, artificial introduction of a restriction site; PNA, peptide nucleic acid; LNA, locked nucleic acid; WTB-PCR, wild-type blocking PCR; FLAG, fluorescent amplicon generation; RSM-PCR, restriction site mutation PCR; APRIL-ATM, amplification via primer ligation, at the mutation; PAP, pyrophosphorolysis-activated polymerization; RMC, random mutation capture; CCM, chemical cleavage of mismatches; endo, endonuclease; TDG, thymine DNA glycosylase; CEL I, celery extract I; HRM, high-resolution melting; HET, heteroduplex analysis; SSCP, single-strand conformation polymorphism; DGGE, denaturing gradient gel electrophoresis; CDCE, constant denaturing capillary electrophoresis; dHPLC, denaturing HPLC; iFLP, inverse PCR-based amplified RFLP; COLD-PCR, coamplification at lower denaturation temperature PCR; Tc, critical denaturation temperature; dsDNA, double-stranded DNA; Tm, melting temperature.

Other moderate- to high-selectivity techniques use chemically modified nucleic acid analogs to enhance allele specificity. For example, peptide nucleic acids (PNAs) (17) and locked nucleic acids (LNAs) (18) both bind complementary DNA with higher affinity than conventional DNA oligonucleotides. In a PCR clamping approach, primers can be replaced by LNA or PNA hybridization probes specific for the wild-type sequence; both PNAs and LNAs can suppress amplification of wild-type DNA, allowing increased amplification of the mutant allele.
The use of LNAs in wild-type blocking PCR (WTB-PCR) and of PNAs in K-ras amplification has been shown to identify mutations among 10⁵ wild-type alleles (19, 20), although routine application of these approaches often detects mutants only at frequencies of 10⁻² to 10⁻³. Last, the FLAG (fluorescent amplicon generation) assay (21) combines REMS-PCR with an exceptionally thermostable endonuclease (PspGI) and PNA probes, and can be performed in a real-time, high-throughput, closed-tube format. The FLAG assay has demonstrated the ability to detect 1 mutant in 10³ wild-type alleles. Generally, PNA/LNA-based approaches are attractive, although the time and cost required for optimization may hinder their widespread use.
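The 3′-terminal mismatch principle behind ARMS, ASA, and MAMA can be made concrete with a small primer-building helper. This is a sketch under stated assumptions, not a protocol from the cited papers: the forward primer is copied from the reference strand and ends exactly on the variant base, and, MAMA-style, an optional deliberate mismatch is placed at the penultimate 3′ position. The rule used for that extra mismatch (swap the base for its complement), the function names, and the example sequence are all hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def allele_specific_primer(reference: str, variant_pos: int, variant_base: str,
                           length: int = 20, penultimate_mismatch: bool = False) -> str:
    """Return a forward primer (5'->3') whose 3'-terminal base is variant_base.

    reference is written 5'->3' on the primer's strand; variant_pos is 0-based.
    With penultimate_mismatch=True, the base next to the 3' end is replaced by its
    complement, so it mismatches both templates (a MAMA-style destabilizing mismatch).
    """
    reference, variant_base = reference.upper(), variant_base.upper()
    if variant_base not in COMPLEMENT:
        raise ValueError("variant_base must be one of A, C, G, T")
    start = variant_pos - length + 1
    if start < 0:
        raise ValueError("not enough upstream sequence for the requested primer length")
    primer = list(reference[start:variant_pos] + variant_base)
    if penultimate_mismatch:
        primer[-2] = COMPLEMENT[primer[-2]]
    return "".join(primer)


def mismatch_offsets(primer: str, target: str, end: int) -> list[int]:
    """Offsets from the primer's 3' end (0 = terminal base) where it differs from target.

    target is written on the primer's strand; end is the 0-based target index
    aligned with the primer's 3'-terminal base.
    """
    window = target[end - len(primer) + 1:end + 1].upper()
    return sorted(len(primer) - 1 - i
                  for i, (p, t) in enumerate(zip(primer, window)) if p != t)


if __name__ == "__main__":
    wild_type = "AAAAACCCCCGGGGGTTTTTACGTA"       # made-up sequence; variant at index 20
    mutant = wild_type[:20] + "G" + wild_type[21:]  # hypothetical A>G substitution
    for mama in (False, True):
        p = allele_specific_primer(wild_type, 20, "G", penultimate_mismatch=mama)
        print(p, "wild type:", mismatch_offsets(p, wild_type, 20),
              "mutant:", mismatch_offsets(p, mutant, 20))
```

With the extra mismatch, the mutant template is left with a single penultimate mismatch while the wild type carries two adjacent 3′ mismatches; that asymmetry is what the text describes as additional mismatches raising selectivity, at the cost of empirical optimization.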

VERY-HIGH-SELECTIVITY METHODS AND ENRICHMENT OF KNOWN MUTATIONS

Some enrichment methods offer very high selectivity and can preferentially enrich and identify known mutations at extremely low levels. RFLP-PCR-based approaches have proven simple and inexpensive to apply as well as highly selective for the enrichment of known mutations. The use of thermostable restriction endonucleases, either before or during PCR amplification, has been taken further by performing more than 1 round of enrichment, with the aim of increasing digestion and suppression of the wild-type allele and thus preferentially amplifying the mutant. Several derivatives of this approach have been developed; one of the first, the restriction site mutation assay (RSM-PCR) (22), has been shown to enrich mutants present at 1 mutant per 10⁸ wild-type genes (1, 23). The APRIL-ATM method (amplification via primer ligation, at the mutation) (24) uses an inverse approach, mutant-specific RFLP, to digest mutant PCR products rather than wild-type products. Oligonucleotides are then ligated to the digested fragments at the site of the mutation, and the ligation products are subjected to a second PCR, thus preferentially enriching the mutant DNA. APRIL-ATM has shown high selectivity, detecting mutant alleles at a frequency of 1.6 × 10⁻⁶ among wild-type DNA. Although these RFLP-PCR-based approaches are often advantageous because they are simple and inexpensive, in some cases their selectivity is only on the order of 10⁻³ to 10⁻⁴ mutants per wild-type DNA (25). One method that reports very high selectivity and enrichment combines pyrophosphorolysis-activated polymerization (PAP) with ASA (25). PAP-ASA employs an allele-specific oligonucleotide (P*) that is activated by pyrophosphorolysis and then extended by DNA polymerization during PCR.
A 3′-terminal dideoxynucleotide is removed in the presence of pyrophosphate, and the activated P* can then be extended by the DNA polymerase. The authors report that this approach can detect 1 mutant allele in 10⁶ to 10⁹ wild-type alleles (25). A bidirectional modification of PAP-ASA (bi-PAP-ASA) (26) uses 2 opposing allele-specific oligonucleotides (P*), each with a 3′-terminal blocker, to increase selectivity and amplify low-level somatic mutants present among 10⁷ to 10⁹ wild-type alleles (Fig. 1A). Not all reports, however, achieve such high selectivity using pyrophosphorolysis. One recent study used a PAP-based method to detect low-level B-RAF mutations in uveal melanomas. The authors detected low-level mutations that were not detectable by Sanger sequencing; however, their evaluation of the technique showed a detection limit of 1 mutant among 10⁴ wild-type alleles.
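The RFLP-based enrichment described at the start of this section rests on a simple sequence test: a mutation that destroys a restriction site lets the mutant amplicon escape digestion while the wild type is cut. The sketch below illustrates this with the TaqI site (TCGA, which is its own reverse complement, so scanning one strand suffices) and adds a deliberately idealized model of repeated digest-and-amplify rounds. The model's assumptions are labeled in the code: the mutant is never cut, undigested wild type survives each round at a fixed rate, and PCR errors, which limit real assays, are ignored. Sequences and names are hypothetical.

```python
# Assumption: TaqI recognition sequence TCGA (palindromic, so one strand suffices).
TAQI_SITE = "TCGA"


def site_positions(seq: str, site: str = TAQI_SITE) -> list[int]:
    """0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def resists_digestion(seq: str, site: str = TAQI_SITE) -> bool:
    """True if the amplicon carries no recognition site and so escapes cleavage."""
    return not site_positions(seq, site)


def ratio_after_rounds(initial_ratio: float, digestion_efficiency: float, rounds: int) -> float:
    """Idealized mutant:wild-type ratio after repeated digest-and-amplify rounds.

    Assumes the mutant is never cut, the wild type survives each round with
    probability (1 - digestion_efficiency), both amplify equally afterwards,
    and PCR errors are ignored.
    """
    surviving_wt = 1.0 - digestion_efficiency
    if not 0.0 < surviving_wt <= 1.0:
        raise ValueError("digestion_efficiency must be in [0, 1)")
    return initial_ratio / surviving_wt ** rounds


if __name__ == "__main__":
    wild_type = "GGATCCTCGAGTTACG"  # made-up amplicon containing one TaqI site
    mutant = "GGATCCTTGAGTTACG"     # hypothetical C>T change that destroys the site
    print(site_positions(wild_type), resists_digestion(mutant))
    for n in range(4):
        print(n, f"{ratio_after_rounds(1e-6, 0.99, n):.1e}")
```

In this toy model each round at 99% digestion multiplies the ratio by 100; in practice incomplete digestion and polymerase errors cap the gain, which is consistent with the text's note that some RFLP-PCR assays reach only 10⁻³ to 10⁻⁴.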

PCR of single DNA molecules can also be considered a form of high mutant enrichment. Digital PCR (27), for example, relies on amplification of individual DNA molecules (Fig. 1B). In the original report (27), template DNA was diluted to approximately 1 molecule per reaction, allowing detection of approximately 1 mutant in 10³ alleles across numerous parallel PCRs. In principle, the more reactions performed, the higher the selectivity; in practice, selectivity is limited by PCR errors. Digital PCR is currently difficult to apply routinely with conventional thermocyclers, because detecting mutants at very low frequencies relative to wild-type DNA requires analyzing a very large number of reactions. This may eventually change with the advent of nanofluidics. Bielas and Loeb (28) developed random mutation capture (RMC), a mutant enrichment method based on a combination of RSM and digital PCR that can identify 1 mutant base among 10⁸ wild-type nucleotides. In this interesting but complex approach, mutant DNA is first enriched with biotin-labeled probes and magnetic bead separation. The collected fraction is then cleaved with TaqI to digest the remaining wild-type DNA; the final product is diluted to isolate single molecules, and quantitative PCR is performed to amplify the mutant molecules. Overall, several currently available approaches report high selectivity and enrichment of known mutations and are also suitable for routine application. Many of these assays are simple in methodology and application (ASA- and RFLP-based approaches, for example); however, their selectivity is often insufficient for enriching extremely low-level known mutations.
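The digital PCR argument above (more reactions, higher selectivity) comes down to counting. At a mutant fraction f and about λ templates per well, roughly 1/(fλ) wells are needed per mutant molecule. The sketch below shows that estimate together with the standard Poisson correction for inferring λ from the fraction of positive wells, λ = −ln(1 − positive/total). Function names and example counts are illustrative, and PCR errors, which the text identifies as the practical limit, are not modeled.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Poisson estimate of mean template copies per partition (lambda).

    A partition is negative only if it received zero templates, so
    P(negative) = exp(-lambda) and lambda = -ln(1 - positive/total).
    """
    if total <= 0 or not 0 <= positive < total:
        raise ValueError("need total > 0 and 0 <= positive < total")
    return -math.log(1.0 - positive / total)


def expected_mutant_molecules(partitions: int, copies_per_well: float,
                              mutant_fraction: float) -> float:
    """Mean number of mutant molecules distributed across all partitions."""
    return partitions * copies_per_well * mutant_fraction


def partitions_for_mutants(mutant_fraction: float, copies_per_well: float,
                           mutants_wanted: float = 1.0) -> float:
    """Partitions needed to capture mutants_wanted mutant molecules on average."""
    return mutants_wanted / (mutant_fraction * copies_per_well)


if __name__ == "__main__":
    print(f"lambda from 632/1000 positive wells: {copies_per_partition(632, 1000):.3f}")
    for fraction in (1e-3, 1e-5, 1e-8):
        wells = partitions_for_mutants(fraction, copies_per_well=1.0, mutants_wanted=10)
        print(f"mutant fraction {fraction:.0e}: ~{wells:.0e} wells for ~10 mutant molecules")
```

At about 1 copy per well, sampling 10 mutant molecules at 10⁻⁸ would take on the order of 10⁹ wells, which illustrates why the text ties routine use of digital PCR at such frequencies to nanofluidic formats.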
Highly selective approaches for detecting known mutations, on the other hand, can be time-consuming and difficult to perform and therefore may not be appropriate for most routine clinical and diagnostic applications. Accordingly, the choice of technique depends very much on the intended application.

ENRICHMENT OF UNKNOWN MUTATIONS FOLLOWED BY MUTATION SEQUENCING

Traditionally, the identification of unknown mutations has relied on Sanger sequencing; however, sequencing reliably detects mutant alleles only when they make up more than approximately 20% of the DNA (27). This sensitivity is inadequate for detecting low-level somatic mutations in several situations, such as in premalignant tissue or early cancer development, in posttreatment tissue, or in apoptotic and necrotic circulating DNA, and for detecting the emergence of resistance mutations in radiochemotherapy-treated tumors.

Fig. 1. Highly selective PCR-based methods for known and unknown mutation enhancement and identification. (A), Bidirectional pyrophosphorolysis-activated polymerization allele-specific amplification (bi-PAP-ASA) for high selectivity of known mutations (26). P* is a specifically designed oligonucleotide with a 3′-terminal blocker that is activated, but not extended, by pyrophosphorolysis. Downstream and upstream P* contain dideoxy C and G at the 3′ termini that are specific to the mutant but not the wild type. Efficient amplification of the mutant occurs after pyrophosphorolysis (to remove the 3′-terminal ddCMP) and polymerization. Inefficient amplification is denoted by the gray arrows. Nonspecific type I error amplification is rare; type II error is caused by serial mismatch phosphorolysis and misincorporation, which results in exponential amplification of the mutated product and reduces selectivity. (B), Digital PCR for high selectivity of both known and unknown mutations (27). Genomic DNA is diluted to approximately 1–2 copies per well. The number (N) of required wells varies widely and depends on putative mutant and wild-type concentrations. PCR is performed on each sample well individually. PCR amplicons can be used in many downstream applications, such as direct sequencing, pyrosequencing, TaqMan assays, and molecular beacons.

The development of techniques that can enrich DNA containing unknown mutations present at low concentrations relative to wild-type DNA, followed by sequencing to identify the exact nucleotide change, is thus of high interest. Although there are techniques that perform mutation scanning with higher selectivity than sequencing [e.g., chemical cleavage of mismatches (CCM); cleavage of mismatches using endonuclease V (endo V), T4 endo VII, MutY, thymine DNA glycosylase (TDG), or celery extract I (CEL I) mismatch detection enzymes; high-resolution melting (HRM); and others], the following paragraphs focus specifically on methods
that enable nondestructive selection and enrichment of DNA containing unknown mutations, so that the enriched DNA can then be sequenced to identify the position and exact nucleotide change.

Several established approaches enrich DNA containing unknown mutations by electrophoretic (or, for dHPLC, chromatographic) separation of PCR products. Among these are heteroduplex analysis (HET) (29), single-strand conformation polymorphism (SSCP) (30), denaturing gradient gel electrophoresis (DGGE) (31), constant denaturing capillary electrophoresis (CDCE) (32), and denaturing HPLC (dHPLC) (33). In the commonly used dHPLC method, for example, mutant and wild-type PCR products can be physically separated by their different retention times on polycarbonate columns and collected with a fraction collector. The mutant DNA can thus be preferentially separated, PCR-amplified, and used in downstream applications. The disadvantages of dHPLC with fraction collection are the extra steps required in the overall procedure, the limited separation between mutant and wild-type alleles for certain mutations, and the cost of the equipment. When performed carefully, however, this approach can enrich the mutant fraction 10-fold, from approximately 5% to approximately 50% (34).

Fig. 2. COLD-PCR protocol (40). Two forms of COLD-PCR have been developed, Full COLD-PCR (A) and Fast COLD-PCR (B). An example protocol for a 167-bp TP53 amplicon is shown. (A), Full COLD-PCR has the potential to enrich all possible mutations. Several preliminary rounds of conventional PCR allow an initial buildup of 1 or more target amplicons, after which the cycling switches to COLD-PCR. After denaturation at 94 °C, the PCR amplicons are incubated (e.g., 70 °C for 2–8 min) for reannealing and cross-hybridization. Cross-hybridization of mutant and wild-type alleles forms a mismatch-containing structure (heteroduplex) that has a lower melting temperature than a fully matched structure (homoduplex). The temperature is next raised to the critical denaturation temperature (Tc) (e.g., 86.5 °C) to preferentially denature the heteroduplexed amplicons. The temperature is then reduced for primer annealing (e.g., 55 °C) and raised to 72 °C to extend the amplicons and preferentially amplify the mutation-containing alleles. (B), Fast COLD-PCR is a simpler cycling program that can be used to enrich mutations that lower the melting temperature of the wild-type amplicon. Using the mutant Tc, rather than the standard 94 °C denaturation temperature, preferentially denatures the lower-Tm allele. Fast COLD-PCR omits the 70 °C incubation step. In Fast COLD-PCR, amplification and enrichment begin earlier in the cycling than in Full COLD-PCR, resulting in higher enrichment.

Enzymatic approaches using mismatch-detecting enzymes such as immobilized MutS have also been applied to enrich PCR sequences containing unknown
mutations (35). In addition, the glycosylases MutY and TDG, combined with ligation-mediated PCR, have been reported to selectively enrich mutation-containing sequences (36). Unfortunately, the selectivity of MutS is only modest, whereas MutY- and TDG-based approaches detect only a fraction of all possible mutations.

Fig. 3. COLD-PCR improves mutation detection in downstream assays (40). (A), Sanger sequencing detects low-level mutations after COLD-PCR. DNA from the HCC2218 cell line (TP53 exon 8; 14516 C>T) was diluted in wild-type DNA. Sanger sequencing was performed on products amplified by both conventional (upper panel) and COLD (lower panel; Tc 86.5 °C) PCR. Sequencing after COLD-PCR shows enrichment of the mutated allele. (B), COLD-PCR improves detection by pyrosequencing. DNA from cell line A549 was diluted 33-fold into wild-type DNA; a 98-bp K-ras exon 2 segment was amplified by both COLD-PCR (lower panel; Tc 80 °C) and conventional PCR (upper panel), followed by pyrosequencing. The G>A mutation of the A549 cell line was visible only when COLD-PCR was applied. (C), COLD-PCR improves the sensitivity of MALDI-TOF genotyping. Fast COLD-PCR (Tc 83.5 °C) was used to amplify an 87-bp fragment from circulating plasma DNA (hotspot mutation TP53 exon 8, codon 273). Amplicons were genotyped by MALDI-TOF. The G>A mutation was detectable in COLD-PCR amplicons (lower panel) but not in conventional PCR amplicons (upper panel). (D), COLD-PCR improves the sensitivity of TaqMan genotyping. Serial dilutions of the H1975 cell line (containing the T790M mutation of EGFR exon 20) in wild-type DNA were screened for the T790M mutation with conventional and COLD-PCR TaqMan genotyping. Upper panel: conventional PCR TaqMan genotyping; lower panel: COLD-PCR TaqMan genotyping.

In another approach, DNA ligation is combined with endo V (37). Endo V recognizes heteroduplex DNA and cleaves it 1 base downstream of the mutation. AK16D DNA ligase is then used to seal background nicks, increasing assay sensitivity. The second step of this approach uses internally labeled primers to eliminate the endo V cleavage at the 5′ terminus and selectively amplify cleaved fragments, allowing specific mutant detection and amplification. One recent report described the
use of a highly selective enzyme, CEL I (Surveyor), together with ligation of a primer to the 3′-OH end of CEL I–digested fragments, enabling enrichment PCR of mutation-containing DNA followed by sequencing (38). This approach detects all mutations and enables sequencing of unknown mutations at mutant-to-wild-type ratios of 1–5 × 10⁻². The sensitivity of approaches employing mismatch-detecting enzymes is ultimately limited by the selectivity and efficiency of the enzymes used, and every mismatch-detecting enzyme currently available is much less selective than restriction endonucleases.

One approach that employs a restriction endonuclease to perform highly selective mutation scanning is iFLP (inverse PCR-based amplified RFLP) (39), which combines inverse PCR, RFLP, and dHPLC. DNA is circularized and then digested with the TaqI restriction enzyme. The approach targets sequences that do not normally contain TaqI recognition sites: any sequence that has acquired a TaqI site anywhere along its length because of a mutation is cut by the enzyme and converted into double-stranded linear DNA fragments, which can be ligated to TaqI-specific adaptors and PCR-amplified. This method can detect 1 unknown mutant in 10⁵ wild-type sequences; however, the technique is time-intensive and can detect only a fraction of all possible mutations.

Despite the progress in enriching unknown minority alleles with methods based on post-PCR capillary electrophoresis or on enzymatic recognition followed by a second PCR and sequencing, these methods generally require multistep protocols that can be time-consuming. A new technique, however, has recently been developed that removes many of the difficulties associated with enriching unknown mutations. Coamplification at lower denaturation temperature PCR (COLD-PCR) (40) is a single-step method that enriches both known and unknown minority alleles during PCR, irrespective of mutation type and position. The approach is based on the observation that each DNA sequence has a critical denaturation temperature (Tc) near which the fraction of denatured molecules depends strongly on the exact sequence, such that even point mutations make a substantial difference. COLD-PCR exploits this principle by inducing heteroduplex formation at mutation positions during PCR. When a lower denaturation temperature is used, double-stranded DNA (dsDNA) containing mismatches (heteroduplexes) denatures first. True homoduplexes have a higher melting temperature (Tm) and denature less than heteroduplexes at Tc, so their amplification is relatively suppressed (Fig. 2). For mutations that lower the Tm (such as G:C>T:A or G:C>A:T), which make up approximately 70% of the mutations encountered, COLD-PCR enriches mutant alleles even without heteroduplex formation (Fig. 2).
As a general rule, substantial enrichment can be obtained in most COLD-PCR reactions by using a Tc approximately 1 °C below the amplicon Tm; for certain sequences, however, fine-tuning can be beneficial, and the optimal Tc can lie 0.5 °C to 1.5 °C below the Tm. The Tm can be determined experimentally on most real-time thermocyclers by running a melting curve after PCR. The COLD-PCR principle enables direct PCR from genomic DNA to amplify mutation-containing alleles with a selectivity of up to 100-fold over wild-type alleles. Advantages of COLD-PCR include its simplicity; preferential amplification of mismatch-containing dsDNA carrying unknown mutations without lengthy protocols or allele-specific primers, probes, or enzymes; and the ability to sequence the amplified product directly. Additionally, COLD-PCR can replace conventional PCR and be combined with most existing assays at essentially no additional cost, time, or labor. COLD-PCR has been used in place of PCR to identify both known and unknown mutations with a variety of techniques, such as Sanger sequencing (e.g., Fig. 3A), pyrosequencing (Fig. 3B), MALDI-TOF (Fig. 3C), and TaqMan probe analyses (Fig. 3D) (40). Disadvantages of COLD-PCR include the requirement for precise control of the denaturation temperature during PCR (to within ±0.3 °C), restriction to amplicons shorter than approximately 200 bp, vulnerability to polymerase errors, and variability in overall mutation enrichment depending on DNA position and nucleotide substitution. COLD-PCR selectivity for point mutations can be increased further by performing subsequent rounds of PCR, as has already been shown for unknown deletions (40). As with deep-sequencing approaches that use single-molecule sequencing (41), COLD-PCR enrichment of mutations is ultimately limited by polymerase-introduced errors. As the fidelity of newer polymerases continues to improve, however, so do the ultimate enrichment capabilities of approaches such as COLD-PCR. Ultradeep sequencing following several rounds of COLD-PCR could reveal clinically important aspects of cancer biology (e.g., the origins of resistance to therapy).

In summary, technical developments in mutation detection are accumulating rapidly on many fronts. As knowledge of the biological and clinical impact of low-level mutations in cancer increases, the need for further methods capable of enriching low-level minority alleles will continue to grow.
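The Tc rule of thumb for COLD-PCR can be turned into a simple calculation. A common way to read Tm from a post-PCR melting curve is to take the temperature at which the negative derivative of fluorescence, −dF/dT, peaks. The sketch below assumes the melting data are available as paired lists of temperatures (°C) and fluorescence readings. The function names, the 0.25 °C step, and the synthetic curve in the example are illustrative assumptions, not part of any instrument's software.

```python
import math


def estimate_tm(temps, fluorescence):
    """Return the temperature at which -dF/dT is largest (melting peak).

    Uses central differences, so the first and last points are skipped.
    Assumes temps are strictly increasing and fluorescence falls as dsDNA melts.
    """
    if len(temps) != len(fluorescence) or len(temps) < 3:
        raise ValueError("need at least 3 paired temperature/fluorescence points")
    best_temp, best_rate = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = (fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if -slope > best_rate:
            best_rate, best_temp = -slope, temps[i]
    return best_temp


def candidate_tcs(tm, min_offset=0.5, max_offset=1.5, step=0.25):
    """Candidate denaturation temperatures from Tm - max_offset to Tm - min_offset."""
    n_steps = int(round((max_offset - min_offset) / step))
    return [round(tm - max_offset + i * step, 2) for i in range(n_steps + 1)]


if __name__ == "__main__":
    # Synthetic, noise-free melting curve centred on 85.2 °C (illustrative only).
    temps = [80.0 + 0.1 * i for i in range(101)]
    fluor = [1.0 / (1.0 + math.exp((t - 85.2) / 0.4)) for t in temps]
    tm = estimate_tm(temps, fluor)
    print(f"estimated Tm: {tm:.1f} °C")
    print("default Tc (Tm - 1 °C):", round(tm - 1.0, 2))
    print("Tc values to test:", candidate_tcs(tm))
```

Because the denaturation temperature must be held to within about ±0.3 °C, a Tc grid much finer than the thermocycler's real accuracy would not be meaningful; the 0.25 °C step is only a placeholder.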

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: NIH grants CA-115439 and CA-111994 and NIH training grant 5 T32 CA09078 (the training grant was to C.A. Milbury and J. Li). Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

Clinical Chemistry 55:4 (2009) 639

References
1. Parsons BL, Heflich RH. Genotypic selection methods for the direct analysis of point mutations. Mutat Res 1997;387:97–121.
2. Gocke CD, Benko FA, Kopreski MS, Evans DB. Enrichment methods for mutation detection. Ann N Y Acad Sci 2000;906:31–8.
3. Newton CR, Graham A, Heptinstall LE, Powell SJ, Summers C, Kalsheker N, et al. Analysis of any point mutation in DNA: the amplification refractory mutation system (ARMS). Nucleic Acids Res 1989;17:2503–16.
4. Wu DY, Ugozzoli L, Pal BK, Wallace RB. Allele-specific enzymatic amplification of beta-globin genomic DNA for diagnosis of sickle-cell anemia. Proc Natl Acad Sci U S A 1989;86:2757–60.
5. Okayama H, Curiel DT, Brantly ML, Holmes MD, Crystal RG. Rapid, nonradioactive detection of mutations in the human genome by allele-specific amplification. J Lab Clin Med 1989;114:105–13.
6. Sommer SS, Cassady JD, Sobell JL, Bottema CDK. A novel method for detecting point mutations or polymorphisms and its application to population screening for carriers of phenylketonuria. Mayo Clin Proc 1989;64:1361–72.
7. Dutton C, Sommer SS. Simultaneous detection of multiple single-base alleles at a polymorphic site. Biotechniques 1991;11:700–2.
8. Gibbs RA, Nguyen PN, Caskey CT. Detection of single DNA-base differences by competitive oligonucleotide priming. Nucleic Acids Res 1989;17:2437–48.
9. Kahn SM, Jiang W, Culbertson TA, Weinstein IB, Williams GM, Tomita N, Ronai Z. Rapid and sensitive nonradioactive detection of mutant K-ras genes via enriched PCR amplification. Oncogene 1991;6:1079–83.
10. Cha RS, Zarbl H, Keohavong P, Thilly WG. Mismatch amplification mutation assay (MAMA): application to the c-H-ras gene. PCR Meth Appl 1992;2:14–20.
11. Takeda S, Ichii S, Nakamura Y. Detection of K-ras mutation in sputum by mutant-allele-specific amplification (MASA). Hum Mutat 1993;2:112–7.
12. Easterday WR, Van Ert MN, Zanecki S, Keim P. Specific detection of Bacillus anthracis using a TaqMan mismatch amplification mutation assay. Biotechniques 2005;38:731–5.
13. Li J, Wang FF, Mamon H, Kulke MH, Harris L, Maher E, et al. Antiprimer quenching-based real-time PCR and its application to the analysis of clinical cancer samples. Clin Chem 2006;52:624–33.
14. Ward R, Hawkins N, O’Grady R, Sheehan C, O’Connor T, Impey H, et al. Restriction endonuclease-mediated selective polymerase chain reaction: a novel assay for the detection of K-ras mutations in clinical samples. Am J Pathol 1998;153:373–9.
15. Haliassos A, Chomel JC, Grandjouan S, Kruh J, Kaplan JC, Kitzis A. Detection of minority point mutations by modified PCR technique: a new approach for a sensitive diagnosis of tumor-progression markers. Nucleic Acids Res 1989;17:8093–9.
16. Wolff JN, Gemmell NJ. Combining allele-specific fluorescent probes and restriction assay in real-time PCR to achieve SNP scoring beyond allele ratios of 1:1000. Biotechniques 2008;44:193–4, 6, 9.
17. Nielsen PE, Egholm M, Berg RH, Buchardt O. Sequence-selective recognition of DNA by strand displacement with a thymine-substituted polyamide. Science (Wash DC) 1991;254:1497–500.
18. Dominguez PL, Kolodney MS. Wild-type blocking polymerase chain reaction for detection of single nucleotide minority mutations from clinical specimens. Oncogene 2005;24:6830–4.
19. Oldenburg RP, Liu MS, Kolodney MS. Selective amplification of rare mutations using locked nucleic acid oligonucleotides that competitively inhibit primer binding to wild-type DNA. J Invest Dermatol 2008;128:398–402.
20. Dabritz J, Hanfler J, Preston R, Stieler J, Oettle H. Detection of Ki-ras mutations in tissue and plasma samples of patients with pancreatic cancer using PNA-mediated PCR clamping and hybridisation probes. Br J Cancer 2005;92:405–12.
21. Amicarelli G, Shehi E, Makrigiorgos GM, Adlerstein D. FLAG assay as a novel method for real-time signal generation during PCR: application to detection and genotyping of KRAS codon 12 mutations. Nucleic Acids Res 2007;35:e131.
22. Parry JM, Shamsher M, Skibinski DOF. Restriction site mutation analysis, a proposed methodology for the detection and study of DNA base changes following mutagen exposure. Mutagenesis 1990;5:209–12.
23. Jenkins GJS, Hashemzadeh Chaleshtori M, Song H, Parry JM. Mutation analysis using the restriction site mutation (RSM) assay. Mutat Res 1998;405:209–20.
24. Kaur M, Zhang Y, Liu W-H, Tetradis S, Price BD, Makrigiorgos GM. Ligation of a primer at a mutation: a method to detect low level mutations in DNA. Mutagenesis 2002;17:365–74.
25. Liu Q, Sommer SS. Pyrophosphorolysis-activated polymerization (PAP): application to allele-specific amplification. Biotechniques 2000;29:1072–83.
26. Shi J, Liu Q, Sommer SS. Detection of ultrarare somatic mutation in the human TP53 gene by bidirectional pyrophosphorolysis-activated polymerization allele-specific amplification. Hum Mutat 2007;28:131–6.
27. Vogelstein B, Kinzler KW. Digital PCR. Proc Natl Acad Sci U S A 1999;96:9236–41.
28. Bielas JH, Loeb LA. Quantification of random genomic mutations. Nat Methods 2005;2:285–90.
29. Lichten MJ, Fox MS. Detection of non-homology-containing heteroduplex molecules. Nucleic Acids Res 1983;11:3959–71.
30. Orita M, Iwahana H, Kanazawa H, Hayashi K, Sekiya T. Detection of polymorphisms of human DNA by gel electrophoresis as single-strand conformation polymorphisms. Proc Natl Acad Sci U S A 1989;86:2766–70.
31. Cariello NF, Scott JK, Kat AG, Thilly WG, Keohavong P. Resolution of a missense mutant in human genomic DNA by denaturing gradient gel electrophoresis and direct sequencing using in vitro DNA amplification: HPRT Munich. Am J Hum Genet 1988;42:726–34.
32. Li-Sucholeiki XC, Thilly WG. A sensitive scanning technology for low frequency nuclear point mutations in human genomic DNA. Nucleic Acids Res 2000;28:E44.
33. Underhill PA, Jin L, Zemans R, Oefner PJ, Cavalli-Sforza LL. A pre-Columbian Y chromosome-specific transition and its implications for human evolutionary history. Proc Natl Acad Sci U S A 1996;93:196–200.
34. Emmerson P, Maynard J, Jones S, Butler R, Sampson JR, Cheadle JP. Characterizing mutations in samples with low-level mosaicism by collection and analysis of DHPLC fractionated heteroduplexes. Hum Mutat 2003;21:112–5.
35. Wagner R, Debbie P, Radman M. Mutation detection using immobilized mismatch binding protein (MutS). Nucleic Acids Res 1995;23:3944–8.
36. Zhang Y, Kaur M, Price BD, Tetradis S, Makrigiorgos GM. An amplification and ligation-based method to scan for unknown mutations in DNA. Hum Mutat 2002;20:139–47.
37. Pincas H, Pingle MR, Huang J, Lao K, Paty PB, Friedman AM, Barany F. High sensitivity EndoV mutation scanning through real-time ligase proofreading. Nucleic Acids Res 2004;32:e148.
38. Li J, Berbeco R, Distel RJ, Janne PA, Wang LL, Makrigiorgos GM. s-RT-MELT for rapid mutation scanning using enzymatic selection and real time DNA-melting: new potential for multiplex genetic analysis. Nucleic Acids Res 2007;35:e84.
39. Liu WH, Kaur M, Wang G, Zhu P, Zhang YZ, Makrigiorgos GM. Inverse PCR-based RFLP scanning identifies low-level mutation signatures in colon cells and tumors. Cancer Res 2004;64:2544–51.
40. Li J, Wang L, Mamon H, Kulke MH, Berbeco R, Makrigiorgos GM. Replacing PCR with COLD-PCR enriches variant DNA sequences and redefines the sensitivity of genetic testing. Nat Med 2008;14:579–84.
41. Thomas RK, Nickerson E, Simons JF, Janne PA, Tengs T, Yuza Y, et al. Sensitive mutation detection in heterogeneous cancer specimens by massively parallel picoliter reactor sequencing. Nat Med 2006;12:852–5.

Reviews

Clinical Chemistry 55:4 641–658 (2009)

Next-Generation Sequencing: From Basic Research to Diagnostics
Karl V. Voelkerding,1,2* Shale A. Dames,1† and Jacob D. Durtschi1†

BACKGROUND: For the past 30 years, the Sanger method has been the dominant approach and gold standard for DNA sequencing. The commercial launch of the first massively parallel pyrosequencing platform in 2005 ushered in the new era of high-throughput genomic analysis now referred to as next-generation sequencing (NGS).

CONTENT: This review describes fundamental principles of commercially available NGS platforms. Although the platforms differ in their engineering configurations and sequencing chemistries, they share a technical paradigm in that sequencing of spatially separated, clonally amplified DNA templates or single DNA molecules is performed in a flow cell in a massively parallel manner. Through iterative cycles of polymerase-mediated nucleotide extensions or, in one approach, through successive oligonucleotide ligations, sequence outputs in the range of hundreds of megabases to gigabases are now obtained routinely. Highlighted in this review are the impact of NGS on basic research, bioinformatics considerations, and translation of this technology into clinical diagnostics. Also presented is a view into future technologies, including real-time single-molecule DNA sequencing and nanopore-based sequencing.

SUMMARY: In the relatively short time frame since 2005, NGS has fundamentally altered genomics research and allowed investigators to conduct experiments that were previously not technically feasible or affordable. The various technologies that constitute this new paradigm continue to evolve, and further improvements in technology robustness and process streamlining will pave the path for translation into clinical diagnostics.

© 2009 American Association for Clinical Chemistry

1 ARUP Institute for Experimental and Clinical Pathology, Salt Lake City, Utah; 2 Department of Pathology, University of Utah, Salt Lake City, Utah.
* Address correspondence to this author at: ARUP Laboratories, 500 Chipeta Way, Salt Lake City, Utah 84108. Fax (801) 584-5207; e-mail voelkek@aruplab.com.
† S.A. Dames and J.D. Durtschi contributed equally to the review.
Received October 7, 2008; accepted January 29, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.112789

In 1977, 2 landmark articles describing methods for DNA sequencing were published. Allan Maxam and Walter Gilbert reported an approach in which terminally labeled DNA fragments were subjected to base-specific chemical cleavage and the reaction products were separated by gel electrophoresis (1). In an alternative approach, Frederick Sanger and colleagues described the use of chain-terminating dideoxynucleotide analogs that caused base-specific termination of primed DNA synthesis (2). Refinement and commercialization of the latter method led to its broad dissemination throughout the research community and, ultimately, into clinical diagnostics. In an industrial, high-throughput configuration, Sanger technology was used to sequence the first human genome, which was completed in 2003 through the Human Genome Project, a 13-year effort with an estimated cost of $2.7 billion. By comparison, in 2008 a human genome was sequenced over a 5-month period for approximately $1.5 million (3). The latter accomplishment highlights the capabilities of the rapidly evolving field of “next-generation” sequencing (NGS)³ technologies that have emerged during the past 5 years. Currently, 5 NGS platforms are commercially available, with additional platforms on the horizon. Adding to this pace, in August 2008 the US National Human Genome Research Institute (NHGRI) announced funding for a series of projects under its Revolutionary Genome Sequencing Technologies program, whose goal is to sequence a human genome for $1000 or less (http://www.genome.gov/27527585). This review describes NGS technologies, reviews their impact on basic research, and explores their potential to substantially affect molecular diagnostics.

Fundamentals of NGS Platforms

NGS platforms share a common technological feature: massively parallel sequencing of clonally amplified or single DNA molecules that are spatially separated in a flow cell. This design is a paradigm shift from that of Sanger sequencing, which is based on the electrophoretic separation of chain-termination products produced in individual sequencing reactions. In NGS, sequencing is performed by repeated cycles of polymerase-mediated nucleotide extension or, in one format, by iterative cycles of oligonucleotide ligation. As a massively parallel process, NGS generates hundreds of megabases to gigabases of nucleotide-sequence output in a single instrument run, depending on the platform. These platforms are reviewed next.

³ Nonstandard abbreviations: NGS, next-generation sequencing; NHGRI, National Human Genome Research Institute; dNTP, deoxynucleoside triphosphate; Mb, million base pairs; Gb, billion base pairs; miRNA, microRNA.

ROCHE/454 LIFE SCIENCES

The 454 technology (http://www.454.com) derives from the convergence of pyrosequencing and emulsion PCR. In 1993, Nyren et al. described a sequencing approach based on chemiluminescent detection of pyrophosphate released during polymerase-mediated deoxynucleoside triphosphate (dNTP) incorporation (4). Refinement by Ronaghi et al. served as the foundation for the commercial development of pyrosequencing (5, 6). Separately, Tawfik and Griffiths described single-molecule PCR in microcompartments consisting of water-in-oil emulsions (7). In 2000, Jonathan Rothberg founded 454 Life Sciences, which developed the first commercially available NGS platform, the GS 20, launched in 2005. Combining single-molecule emulsion PCR with pyrosequencing, Margulies and colleagues at 454 Life Sciences performed shotgun sequencing of the entire 580 069-bp genome of Mycoplasma genitalium at 96% coverage and 99.96% accuracy in a single GS 20 run (8). In 2007, Roche Applied Science acquired 454 Life Sciences and introduced the second version of the 454 instrument, the GS FLX. The GS FLX shares the core technology of the GS 20; its flow cell, referred to as a “picotiter well” plate, is made from a fused fiber-optic bundle. In its newest configuration, approximately 3.4 × 10^6 picoliter-scale sequencing-reaction wells are etched into the plate surface, and the well walls have a metal coating to improve signal-to-noise discrimination. For sequencing (Fig. 1), a library of template DNA is prepared by fragmentation via nebulization or sonication. Fragments several hundred base pairs in length are end-repaired and ligated to adapter oligonucleotides. The library is then diluted to single-molecule concentration, denatured, and hybridized to individual beads carrying sequences complementary to the adapter oligonucleotides.
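Dilution “to single-molecule concentration” is usually reasoned about with Poisson statistics: if templates are distributed randomly with a mean of λ molecules per bead (or per emulsion droplet), the fraction receiving exactly k molecules is λ^k e^(−λ)/k!. The review does not give loading ratios, so the λ values below are hypothetical, as are the function names. The sketch only shows the trade-off that limiting dilution manages between empty beads and beads seeded by more than one template, which would not give a clonal read.

```python
import math


def bead_loading(mean_templates: float) -> dict:
    """Poisson fractions of beads with 0, exactly 1, or >1 template molecules."""
    if mean_templates < 0:
        raise ValueError("mean_templates must be non-negative")
    empty = math.exp(-mean_templates)
    clonal = mean_templates * math.exp(-mean_templates)
    mixed = 1.0 - empty - clonal
    return {"empty": empty, "clonal": clonal, "mixed": mixed}


def clonal_purity(mean_templates: float) -> float:
    """Fraction of template-bearing beads that carry exactly one template."""
    f = bead_loading(mean_templates)
    occupied = 1.0 - f["empty"]
    return f["clonal"] / occupied if occupied > 0 else float("nan")


if __name__ == "__main__":
    # Hypothetical mean loadings, chosen only to show the trend.
    for lam in (0.1, 0.3, 1.0, 2.0):
        f = bead_loading(lam)
        print(f"lambda={lam}: clonal={f['clonal']:.3f}, mixed={f['mixed']:.3f}, "
              f"purity={clonal_purity(lam):.3f}")
```

Lowering λ raises the purity of occupied beads but leaves most beads empty; the single-template fraction itself peaks at λ = 1 (about 37%). The bead-enrichment step after emulsion PCR is what discards the empty beads.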
The beads are compartmentalized into water-in-oil microvesicles, where clonal expansion of the single DNA molecules bound to the beads occurs during emulsion PCR. After amplification, the emulsion is disrupted, and the beads carrying clonally amplified template DNA are enriched. The beads are again separated by limiting dilution, deposited into individual picotiter-plate wells, and combined with sequencing enzymes. Loaded into the GS FLX, the picotiter plate functions as a flow cell in which iterative pyrosequencing is performed by successive flow addition of the 4 dNTPs. A nucleotide-incorporation event in a well containing clonally amplified template releases pyrophosphate and produces luminescence localized to that well, which is transmitted through the fiber-optic plate and recorded by a charge-coupled device camera. With the flow of each dNTP reagent, the wells are imaged and analyzed for signal-to-noise ratio; the signals are filtered according to quality criteria and then algorithmically translated into a linear sequence output. With the newest chemistry, termed “Titanium,” a single GS FLX run generates approximately 1 × 10^6 sequence reads with read lengths of ≥400 bases, yielding up to 500 million base pairs (Mb) of sequence. A recognized strength of the 454 technology is its longer read length, which facilitates de novo assembly of genomes (9). An outstanding concern has been the accurate determination of homopolymers >3–4 bases in length. In principle, a 6-base homopolymer should yield twice the luminescence of a 3-base homopolymer. In practice, this luminescence yield varies, and estimates of homopolymer length become less accurate as length increases (8, 10). 454 has reported that the metal coating of the picotiter-well walls mentioned above improves the accuracy of homopolymer determination. Sequence coverage depth and accuracy for the 454 technology are discussed below in the NGS Data Analysis section.

ILLUMINA/SOLEXA

In 1997, British chemists Shankar Balasubramanian and David Klenerman conceptualized an approach for sequencing single DNA molecules attached to microspheres. They founded Solexa in 1998; their early goal of sequencing single DNA molecules was not achieved, which required a shift toward sequencing clonally amplified templates. By 2006, the Solexa Genome Analyzer, the first “short read” sequencing platform, was commercially launched. Acquired by Illumina (http://www.Illumina.com) in 2006, the Genome Analyzer uses a flow cell consisting of an optically transparent slide with 8 individual lanes, on whose surfaces oligonucleotide anchors are bound (Fig. 2). Template DNA is fragmented into lengths of several hundred base pairs and end-repaired to generate 5′-phosphorylated blunt ends. The polymerase activity of Klenow fragment is used to add a single A base to the 3′ end of the blunt, phosphorylated DNA fragments. This addition prepares the fragments for ligation to oligonucleotide adapters, which have a single-base T overhang at their 3′ end to increase ligation efficiency. The adapter oligonucleotides are complementary to the flow-cell anchors. Under limiting-dilution conditions, adapter-modified, single-stranded template DNA is added to the flow cell and immobilized by hybridization to the anchors. In contrast to emulsion PCR, DNA templates are amplified in the flow cell by “bridge” amplification, in which captured DNA strands “arch” over and hybridize to an adjacent anchor oligonucleotide. Multiple amplification cycles convert each single-molecule DNA template into a clonally amplified arching “cluster,” with each cluster containing approximately 1000 clonal molecules. Approximately 50 × 10^6 separate clusters can be generated per flow cell. For sequencing, the clusters are denatured, and a subsequent chemical cleavage reaction and wash leave only forward strands for single-end sequencing. Sequencing of the forward strands is initiated by hybridizing a primer complementary to the adapter sequences, followed by addition of polymerase and a mixture of 4 differently colored fluorescent reversible dye terminators. The terminators are incorporated according to sequence complementarity in each strand of a clonal cluster. After incorporation, excess reagents are washed away, the clusters are optically interrogated, and the fluorescence is recorded. In successive chemical steps, the reversible dye terminators are unblocked, the fluorescent labels are cleaved and washed away, and the next sequencing cycle is performed. This iterative sequencing-by-synthesis process requires approximately 2.5 days to generate read lengths of 36 bases. With 50 × 10^6 clusters per flow cell, the overall sequence output is >1 billion base pairs (Gb) per analytical run (11). The newest platform, the Genome Analyzer II, has optical modifications that enable analysis of higher cluster densities. Coupled with ongoing improvements in sequencing chemistry and projected read lengths of 50-plus bases, further increases in output should be realized.

Fig. 1. Roche 454 GS FLX sequencing. Template DNA is fragmented, end-repaired, ligated to adapters, and clonally amplified by emulsion PCR. After amplification, the beads are deposited into picotiter-plate wells with sequencing enzymes. The picotiter plate functions as a flow cell where iterative pyrosequencing is performed. A nucleotide-incorporation event results in pyrophosphate (PPi) release and well-localized luminescence. APS, adenosine 5′-phosphosulfate.

Fig. 2. Illumina Genome Analyzer sequencing. Adapter-modified, single-stranded DNA is added to the flow cell and immobilized by hybridization. Bridge amplification generates clonally amplified clusters. Clusters are denatured and cleaved; sequencing is initiated with addition of primer, polymerase (POL), and 4 reversible dye terminators. Postincorporation fluorescence is recorded. The fluor and block are removed before the next synthesis cycle.
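The Genome Analyzer output figures follow from simple arithmetic: bases per run are roughly the number of clusters multiplied by the read length, and by 2 when both ends of each template are read. The sketch below gives an upper-bound estimate under the assumption that every cluster yields a full-length, usable read; real runs lose some clusters and bases to quality filtering, which it does not model. The function name is hypothetical.

```python
def run_output_bases(clusters: int, read_length: int, reads_per_cluster: int = 1) -> int:
    """Upper-bound bases per run: clusters x read length x reads per cluster.

    reads_per_cluster is 1 for single-end and 2 for paired-end reads
    (both reads assumed to have the same length).
    """
    if clusters < 0 or read_length < 0:
        raise ValueError("clusters and read_length must be non-negative")
    if reads_per_cluster not in (1, 2):
        raise ValueError("reads_per_cluster must be 1 (single-end) or 2 (paired-end)")
    return clusters * read_length * reads_per_cluster


if __name__ == "__main__":
    single = run_output_bases(50_000_000, 36)
    paired = run_output_bases(50_000_000, 36, reads_per_cluster=2)
    print(f"single-end: {single / 1e9:.1f} Gb")  # 1.8 Gb
    print(f"paired-end: {paired / 1e9:.1f} Gb")  # 3.6 Gb
```

For 50 × 10^6 clusters and 36-base reads this gives 1.8 Gb, consistent with the “>1 Gb” figure quoted for a single-end run.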
Illumina and other NGS technologies have devised strategies to sequence both ends of template molecules. Such “paired-end” sequencing provides positional information that facilitates alignment and assembly, especially for short reads (12, 13). A technical concern with Illumina sequencing is that base-call accuracy decreases with increasing read length (14). This phenomenon is primarily due to “dephasing noise.” During a given sequencing cycle, nucleotides can be under- or overincorporated, or block removal can fail. Over successive cycles, these aberrations accumulate to produce a heterogeneous population of strands of varying lengths within a cluster. This heterogeneity decreases signal purity and reduces base-calling precision, especially at the 3′ ends of reads. Modifications in sequencing chemistry and in algorithms for image analysis and data interpretation are being pursued to mitigate dephasing (15). Investigators at the Wellcome Trust Sanger Institute, who have extensive experience with the Illumina platform, have published a series of technical improvements for library preparation, including increasing the reproducibility of fragmentation by adaptive focused acoustic wave sonication, enhancing the efficiency of adapter ligation by using an alternative ligase, and reducing the G+C bias observed in Illumina reads with a modified gel-extraction protocol (16).

APPLIED BIOSYSTEMS/SOLiD

The SOLiD (Supported Oligonucleotide Ligation and Detection) System 2.0 platform, distributed by Applied Biosystems (http://www.solid.appliedbiosystems.com), is a short-read sequencing technology based on ligation. This approach was developed in the laboratory of George Church and reported in 2005 along with the resequencing of the Escherichia coli genome (17). Applied Biosystems refined the technology and released the SOLiD instrumentation in 2007. Sample preparation resembles that of the 454 technology in that DNA fragments are ligated to oligonucleotide adapters, attached to beads, and clonally amplified by emulsion PCR. Beads carrying clonally amplified template are immobilized on a derivatized-glass flow-cell surface, and sequencing begins by annealing a primer oligonucleotide complementary to the adapter at the adapter–template junction (Fig. 3). Instead of providing a 3′ hydroxyl group for polymerase-mediated extension, the primer is oriented to provide a 5′ phosphate group for ligation to interrogation probes during the first “ligation sequencing” step. Each interrogation probe is an octamer consisting of (in the 3′-to-5′ direction) 2 probe-specific bases followed by 6 degenerate bases, with one of 4 fluorescent labels linked to the 5′ end. The 2 probe-specific bases are one of the 16 possible 2-base combinations (for example, TT, GT, and so forth). In the first ligation-sequencing step, thermostable ligase and interrogation probes representing all 16 possible 2-base combinations are present. The probes compete for annealing to the template sequence immediately adjacent to the primer. After annealing, a ligation step is performed, followed by a wash to remove unbound probe. Fluorescence signals are optically collected before the ligated probes are cleaved, and a wash is performed to remove the fluor and regenerate the 5′ phosphate group.
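Because the 16 two-base combinations are reported with only 4 fluorescent labels, each color identifies a group of 4 dinucleotides rather than a single base, and a read is decoded by walking forward from one known base (on SOLiD, supplied by the sequenced adapter bases). The sketch below uses the encoding commonly documented for SOLiD, in which A, C, G, and T are numbered 0 to 3 and the color of each overlapping base pair is the bitwise XOR of their numbers (so AA, CC, GG, and TT share one color; AC, CA, GT, and TG another; and so on). Treat this mapping, the 0 to 3 color labels, and the function names as assumptions for illustration. The sketch models only color-to-base deconvolution, not the probe chemistry or the offset-primer rounds.

```python
from typing import List

BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def encode_colors(seq: str) -> List[int]:
    """Convert a base sequence into its 2-base color sequence (len(seq) - 1 colors)."""
    codes = [BASE_TO_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode_colors(first_base: str, colors: List[int]) -> str:
    """Rebuild bases from one known leading base plus a color sequence."""
    code = BASE_TO_CODE[first_base.upper()]
    bases = [CODE_TO_BASE[code]]
    for color in colors:
        if color not in (0, 1, 2, 3):
            raise ValueError(f"invalid color: {color}")
        # color = current XOR next, so next = current XOR color
        code ^= color
        bases.append(CODE_TO_BASE[code])
    return "".join(bases)


if __name__ == "__main__":
    colors = encode_colors("ATGGC")
    print(colors)                            # [3, 1, 0, 3]
    print(decode_colors("A", colors))        # ATGGC
    # One miscalled color corrupts every downstream base when decoding naively.
    print(decode_colors("A", [3, 2, 0, 3]))  # ATCCG
```

The last line shows why a single color error changes all downstream bases when reads are translated base by base. This is one commonly cited reason that color-space reads are often aligned in color space rather than converted to bases first.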
In subsequent sequencing steps, interrogation probes are ligated to the 5′ phosphate group of the preceding pentamer. Seven cycles of ligation, referred to as a “round,” are performed to extend the first primer. The synthesized strand is then denatured, and a new sequencing primer offset by 1 base in the adapter sequence (n − 1) is annealed. Five rounds in total are performed, each with a new primer with a successive offset (n − 2, n − 3, and so on). By this approach, each template nucleotide is sequenced twice. A 6-day instrument run generates sequence reads 35 bases in length. Sequence is inferred by interpreting the ligation results for the 16 possible 2-base-combination interrogation probes. With the use of offset primers, several bases of the adapter are sequenced. This information provides a known starting point that is used in conjunction with the color-space coding scheme to algorithmically deconvolute the downstream template sequence (see Fig. 1 in the Data Supplement that accompanies the online version of this review at http://www.clinchem.org/content/vol55/issue4). Placing 2 flow-cell slides in the instrument per analytical run produces a combined output of 4 Gb of sequence or more. Unextended strands are capped before the ligation to mitigate signal deterioration due to dephasing. Capping, coupled with high-fidelity ligation chemistry and interrogation of each nucleotide twice in independent ligation cycles, yields a company-reported sequence consensus accuracy of 99.9% for a known target at 15-fold sequence coverage over sequence reads of 25 nucleotides. On an independent track, the Church laboratory has collaborated with Danaher Motion and Dover Systems to develop and introduce an alternative sequencing-by-ligation platform, the Polonator G.007 (http://www.polonator.org). Table 1 summarizes GS FLX, Genome Analyzer, and SOLiD platform features.

Fig. 3. Applied Biosystems SOLiD sequencing by ligation. Top: SOLiD color-space coding. Each interrogation probe is an octamer, which consists of (3′-to-5′ direction) 2 probe-specific bases followed by 6 degenerate bases (nnnzzz) with one of 4 fluorescent labels linked to the 5′ end. The 2 probe-specific bases consist of one of 16 possible 2-base combinations. Bottom: (A), The P1 adapter and template with annealed primer (n) is interrogated by probes representing the 16 possible 2-base combinations. In this example, the 2 specific bases complementary to the template are AT. (B), After annealing and ligation of the probe, fluorescence is recorded before cleavage of the last 3 degenerate probe bases. The 5′ end of the cleaved probe is phosphorylated (not shown) before the second sequencing step. (C), Annealing and ligation of the next probe. (D), Complete extension of primer (n) through the first round consisting of 7 cycles of ligation. (E), The product extended from primer (n) is denatured from the adapter/template, and the second round of sequencing is performed with primer (n − 1). With the use of progressively offset primers, in this example (n − 1), adapter bases are sequenced, and this known sequence is used in conjunction with the color-space coding for determining the template sequence by deconvolution (see Fig. 1 in the online Data Supplement). In this technology, template bases are interrogated twice.

Table 1. Comparison of NGS platforms.
Feature | Roche 454 GS FLX | Illumina Genome Analyzer | Applied Biosystems SOLiD | Sanger
Sequencing method | Pyrosequencing | Reversible dye terminators | Sequencing by ligation | Dye terminators
Read lengths | 400 bases | 36 bases | 35 bases | 800 bp
Sequencing run time | 10 h | 2.5 days | 6 days | 3 h
Total bases per run | 500 Mb | 1.5 Gb | 4 Gb | 800 bp

HELICOS BIOSCIENCES AND SINGLE-MOLECULE SEQUENCING

The first single-molecule sequencing platform, the HeliScope, is now available from Helicos BioSciences (http://www.helicosbio.com), with a company-reported sequence output of 1 Gb/day. This technology stems from the work of Braslavsky et al., published in 2003 (18). Because clonal amplification of template is not required, the method involves fragmenting sample DNA and polyadenylating it at the 3′ end, with the final adenosine fluorescently labeled. Denatured polyadenylated strands are hybridized to poly(dT) oligonucleotides immobilized on a flow-cell surface at a capture density of up to 100 × 10^6 template strands per square centimeter. After the positional coordinates of the captured strands are recorded by a charge-coupled device camera, the label is cleaved and washed away before sequencing. For sequencing, polymerase and one of 4 Cy5-labeled dNTPs are added to the flow cell, which is imaged to determine incorporation into individual strands. After label cleavage and washing, the process is repeated with the next Cy5-labeled dNTP. Each sequencing cycle, which consists of the successive addition of polymerase and each of the 4 labeled dNTPs, is termed a “quad.” Approximately 25–30 sequencing quads are performed, and read lengths of up to 45–50 bases have been achieved. The Helicos platform was used to sequence the 6407-base genome of bacteriophage M13 (19). This study demonstrated both the potential of the approach and important technical issues that may be relevant to all single-molecule sequencing methods based on sequencing by synthesis. First, sequencing accuracy was appreciably improved when template molecules were sequenced twice (“2-pass” sequencing). Second, the accuracy of homopolymer sequencing was compromised because the polymerase could add several bases of the same identity within a homopolymeric stretch during a single dNTP addition. Helicos has since developed proprietary labeled dNTPs, termed “virtual terminators,” which the company reports reduce polymerase processivity so that only single bases are added, improving the accuracy of homopolymer sequencing. Interestingly, the percentage of strands in which longer read lengths (e.g., 50 nucleotides) can be achieved is substantially lower than for shorter (e.g., 25 nucleotides) read lengths, possibly reflecting secondary structures (e.g., hairpins) formed by the template molecules.

The Impact of NGS on Basic Research

In the short 4 years since the first commercial platform became available, NGS has markedly accelerated multiple areas of genomics research, enabling experiments that previously were not technically feasible or affordable.
We describe major applications of NGS and then review the analysis of NGS data. Clinical Chemistry 55:4 (2009) 647

GENOMIC ANALYSIS

The high-throughput capacity of NGS has been leveraged to sequence entire genomes, from microbes to humans (3, 8, 9, 11, 20–24), including the recent sequencing of the genome of cytogenetically normal acute myeloid leukemia cells, which identified novel, tumor-specific gene mutations (25). The longer read lengths of the 454 technology, compared with the Illumina and SOLiD short-read technologies, facilitate the assembly of genomes in the absence of a reference genome (i.e., de novo assembly). For resequencing, both long- and short-read technologies have been used successfully. In one comparative study, the 454, Illumina, and SOLiD technologies all accurately detected single-nucleotide variations when coverage depth was ≥15-fold per allele (20) (the critical issue of coverage depth is discussed further in the NGS Data Analysis section). Because 454 reads span several hundred base pairs, they provide haplotype information over that range and are predicted to be better suited for detecting larger insertions and deletions and for producing alignments in areas containing repetitive sequences. Further studies are needed to compare technology performance for detecting insertions and deletions. Each platform offers an optional strategy for sequencing both ends of DNA library fragments (paired-end sequencing). In addition to effectively doubling sequence output, knowing that paired reads originate from the same fragment aids alignment and assembly, especially for short reads. Paired-end sequencing has been used to map genomic structural variation, including deletions, insertions, and rearrangements (12, 13, 26, 27). The ability to sequence complete human genomes at a substantially reduced cost with NGS has energized an international effort to sequence thousands of human genomes over the next decade (http://www.1000genomes.org), which will lead to the characterization and cataloging of human genetic variation at an unprecedented level.

TARGETED GENOMIC RESEQUENCING

Sequencing of genomic subregions and gene sets is being used to identify polymorphisms and mutations in genes implicated in cancer and in regions of the human genome that linkage and whole-genome association studies have implicated in disease (28, 29). Especially in the latter setting, regions of interest can range from hundreds of kilobases to several megabases in size. To make the best use of NGS for sequencing such candidate regions, several genomic-enrichment steps, both traditional and novel, are being incorporated into overall experimental designs. Overlapping long-range PCR amplicons (approximately 5–10 kb) can be used for regions of up to several hundred kilobases, but this approach is not practical for larger genomic regions. More recently, enrichment has been achieved by hybridizing fragmented, denatured human genomic DNA to oligonucleotide capture probes complementary to the region of interest and subsequently eluting the enriched DNA (30–33). Capture probes can be immobilized on a solid surface (Roche NimbleGen, http://www.nimblegen.com; Agilent Technologies, http://www.agilent.com; and Febit, http://www.febit.com) or used in solution (Agilent). Current NimbleGen arrays contain 350 000 oligonucleotides of 60–90 bp in length that are typically spaced 5–20 nucleotides apart; oligonucleotides complementary to repetitive regions are excluded. For enrichment, 5–20 µg of genomic DNA is fragmented and ligated to oligonucleotide linkers containing universal PCR priming sites. This material is denatured, hybridized to an array for 3 days, and eluted, and the enriched DNA is amplified by the PCR before NGS library preparation. In reported studies, up to 5 Mb of sequence has been captured on the 350K array, with 60%–75% of sequencing reads mapping to targeted regions; the remaining reads, which map to nontargeted regions, reflect nonspecific capture. NimbleGen is developing an array of 2.1 × 10⁶ features for capturing larger genomic regions. Agilent’s solution-based technology uses oligonucleotides up to 170 bases in length, with each end containing sequences for universal PCR priming and with primer sites containing a restriction endonuclease–recognition sequence. The oligonucleotide library is amplified by the PCR, digested with restriction enzymes, and ligated to adapters containing the T7 polymerase promoter site. In vitro transcription is performed with biotinylated UTP to generate single-stranded biotinylated cRNA capture sequences. For capture, 3 µg of fragmented, denatured genomic DNA is hybridized with cRNA sequences for 24 h in solution.
After hybridization, duplexes consisting of single-stranded DNA and cRNA are bound to streptavidin-coated magnetic beads; the cRNA is then enzymatically digested, leaving enriched single-stranded DNA that is subsequently processed for NGS. An alternative enrichment approach developed by RainDance Technologies (http://www.raindancetechnologies.com) uses a novel microfluidics technology in which individual pairs of PCR primers for the genomic regions of interest are segregated into aqueous emulsion droplets, which are then pooled to create a “primer library.” Separately, emulsion droplets containing genomic DNA and PCR reagents are prepared. Two separate droplet streams are created, one with primer-library droplets and the other with droplets containing genomic DNA/PCR reagents. The 2 streams are merged, and primer-library droplets and genomic DNA/PCR reagent droplets are paired in a 1:1 ratio. As paired droplets proceed through the microfluidic channel, they pass an electrical impulse that causes them to physically coalesce. The coalesced droplets, containing individual primer pairs and genomic DNA/PCR reagents, are deposited in a 96-well plate and amplified by the PCR. After amplification, the emulsions are disrupted, and the amplicons are pooled and processed for NGS.

METAGENOMICS

NGS has had a tremendous impact on the study of microbial diversity in environmental and clinical samples. Operationally, genomic DNA is extracted from the sample of interest, converted to an NGS library, and sequenced. The sequence output is aligned to known reference sequences for microorganisms that are predicted to be present in the sample. Closely related species can be discerned, and more distantly related species can be inferred. In addition, de novo assembly of the data set can yield information to support the presence of known and potentially new species. Qualitative genomic information is obtained, and analysis of the relative abundance of the sequence reads can be used to derive quantitative information on individual microbial species. To date, most NGS-based metagenomic analyses have used the 454 technology and its associated longer read lengths to facilitate alignment to microbial reference genomes and for de novo assembly of previously uncharacterized microbial genomes. Examples of metagenomic studies include the analysis of microbial populations in the ocean (34, 35) and soil (36), the identification of a novel arenavirus in transplantation patients (37), and the characterization of microflora present in the human oral cavity (38) and the guts of obese and lean twins (39).

TRANSCRIPTOME SEQUENCING

NGS has provided a powerful new approach, termed “RNA-Seq,” for mapping and quantifying transcripts in biological samples. Total, ribosomal RNA–depleted, or poly(A)+ RNA is isolated and converted to cDNA. A typical protocol would involve the generation of first-strand cDNA via random hexamer–primed reverse transcription and subsequent generation of second-strand cDNA with RNase H and DNA polymerase. The cDNA is then fragmented and ligated to NGS adapters. For small RNAs such as microRNAs (miRNAs) and short interfering RNAs, preferential isolation via a small RNA–enrichment method, size selection on an electrophoresis gel, or a combination of these approaches is commonly used. RNA ligase is used to join adapter sequences to the RNA; this step is often followed by PCR amplification before NGS processing. After sequencing, reads are aligned to a reference genome, compared with known transcript sequences, or assembled de novo to construct a genome-scale transcription map. Although RNA-Seq is in its early stages as a technology, it has already shown some advantages over gene expression arrays (40). First, arrays depend on tiling existing genomic sequences, whereas RNA-Seq is not constrained by this limitation, allowing transcription to be characterized without prior knowledge of the genomic sites where it originates. RNA-Seq is capable of single-base resolution and, compared with arrays, demonstrates a greater ability to distinguish RNA isoforms, determine allelic expression, and reveal sequence variants. Expression levels are deduced from the total number of reads that map to the exons of a gene, normalized by the length of exons that can be uniquely mapped. Results obtained with this approach have shown close correlation with those of quantitative PCR and RNA-spiking experiments. The dynamic range of RNA-Seq for determining expression levels is 3–4 orders of magnitude, compared with 2 orders of magnitude for expression arrays. Accordingly, RNA-Seq has shown improved performance for the quantitative detection of both highly expressed transcripts and transcripts expressed at low levels. RNA-Seq is being used to confirm and revise gene annotation, including 5′ and 3′ ends and exon/intron boundaries; the latter are identified by mapping reads to exon junctions defined by GT-AG splicing consensus sites. Both qualitative and quantitative information regarding splicing diversity can be deduced. RNA-Seq has been applied to a variety of organisms, including Saccharomyces cerevisiae, Arabidopsis thaliana, mice, and human cells (40–51).

MAPPING OF DNA-BINDING PROTEINS AND CHROMATIN ANALYSIS

The delineation of regulatory proteins associated with genomes was substantially accelerated by the introduction of chromatin immunoprecipitation and microarray hybridization (ChIP-on-chip) technology (52). In this approach, proteins in contact with genomic DNA are chemically cross-linked (typically with mild formaldehyde treatment) to their binding sites, and the DNA is fragmented by sonication or digestion with micrococcal nuclease. The proteins cross-linked with DNA are immunoprecipitated with antibodies specific for the proteins of interest. The DNA in the immunoprecipitate is purified and hybridized to an oligonucleotide array consisting of sequences from the genome, allowing identification of the protein-binding sites. This approach has been successfully used to identify binding sites for transcription factors and histone proteins. ChIP-on-chip technology is now being supplanted in a variety of experimental settings by ChIP-Seq, in which the DNA harvested from the immunoprecipitate is converted into a library for NGS. The obtained reads are mapped to the reference genome of interest to generate a genome-wide protein-binding map (53–55). Studies to date that have examined the genomic-binding sites of the human NRSF (neuron restrictive silencer factor) and STAT1 (signal transducer and activator of transcription 1) proteins indicate that ChIP-Seq has greater resolution than ChIP-on-chip, as evidenced by confirmation of previously identified binding sites and identification of novel binding sites (56, 57). Analogous to RNA-Seq, ChIP-Seq has the important advantage of not requiring prior knowledge of genomic locations of protein binding. In addition to the study of transcription factors, NGS is being used to map genomic methylation. One approach involves traditional bisulfite conversion of DNA followed by NGS, which has been applied to the study of entire genomes or genomic subregions (58, 59). Ongoing studies are attempting to develop a variant of ChIP-Seq in which genomic methylation is assayed by immunoprecipitation with a monoclonal antibody directed against methylated cytosine, followed by NGS (60).

NGS Data Analysis

NGS experiments generate unprecedented volumes of data, which present challenges and opportunities for data management, storage, and, most importantly, analysis (61). NGS data begin as large sets of tiled fluorescence or luminescence images of the flow-cell surface recorded after each iterative sequencing step (Fig. 4). This volume of data requires a resource-intensive data-pipeline system for data storage, management, and processing. Data volumes generated during single runs of the 454 GS FLX, Illumina, and SOLiD instruments are approximately 15 GB, 1 TB, and 15 TB, respectively. The main processing feature of the data pipeline is the computationally intensive conversion of image data into sequence reads, known as base calling. First, individual beads or clusters are identified and localized in an image series.
Image parameters such as intensity, background, and noise are then used in a platform-dependent algorithm to generate read sequences and error probability–related quality scores for each base. Although many researchers use the base calls generated by the platform-specific data-pipeline software, alternative base-calling programs that use more advanced software and statistical techniques have been developed. Features of these alternative programs include the incorporation of ambiguous bases into reads, improved removal of poor-quality bases from read ends (62), and the use of data sets for software training (15). Incorporating these features has been shown to reduce read errors and improve alignment, especially as platforms are pushed to generate longer reads. These advantages, however, must be weighed against the substantial computer resources required to process the large volumes of image data.
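As a rough illustration of how error probability–related quality scores support end trimming, the following is a minimal Python sketch, not the algorithm of any base-calling program cited above. It assumes phred-scaled per-base qualities (q = −10·log10 e, the convention explained in the next paragraph) and a hypothetical rule that removes bases from the 3′ end until the last retained base reaches a chosen threshold; the threshold of 20 and all function names are illustrative assumptions.

```python
import math


def phred_from_error(e: float) -> float:
    """Phred quality q = -10 * log10(e) for an estimated miscall probability e."""
    if not 0.0 < e <= 1.0:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(e)


def error_from_phred(q: float) -> float:
    """Inverse relation: e = 10 ** (-q / 10)."""
    return 10.0 ** (-q / 10.0)


def trim_3prime(seq: str, quals: list[int], min_q: int = 20) -> tuple[str, list[int]]:
    """Remove bases from the 3' end until the last kept base has quality >= min_q.

    The threshold and the simple stop-at-first-good-base rule are illustrative
    assumptions; other trimmers use sliding windows or running sums instead.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


if __name__ == "__main__":
    for e in (0.1, 0.01, 0.001):
        print(e, round(phred_from_error(e)))  # prints 0.1 10, then 0.01 20, then 0.001 30
    kept, kept_quals = trim_3prime("ACGTACGTAA", [35, 34, 33, 30, 28, 25, 18, 12, 8, 5])
    print(kept)  # prints ACGTAC
```

A stop-at-first-good-base rule is easy to reason about but will keep an isolated poor base that sits just upstream of a good one; a sliding-window criterion trades an extra parameter for more robust behavior.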

The quality values calculated during NGS base calling provide important information for alignment, assembly, and variant analysis. Although the calculation of quality varies between platforms, the calculations are all related to the historically relevant phred score, introduced in 1998 for Sanger sequence data (63, 64). The phred quality value, q, converts the estimated probability of an incorrect base call, e, to a logarithmic scale: $q = -10 \cdot \log_{10}(e)$. Miscall probabilities of 0.1 (10%), 0.01 (1%), and 0.001 (0.1%) thus yield phred scores of 10, 20, and 30, respectively. The NGS error rates estimated by quality values depend on several factors, including signal-to-noise levels, cross talk from nearby beads or clusters, and dephasing. Substantial effort has been made to understand and improve the accuracy of quality scores and the underlying error sources (10, 14), including inaccuracies in homopolymer run lengths on the 454 platform and base-substitution error biases with the Illumina format. Study of these error traits has led to software that requires no additional base calling but improves quality-score accuracy and thus sequencing accuracy (65, 66). Quality values are an important tool for rejecting low-quality reads, trimming low-quality bases, improving alignment accuracy, and determining consensus-sequence and variant calls (67).

Alignment and assembly are substantially more difficult for NGS data than for Sanger data because of the shorter read lengths of the former. One limitation of short-read alignment and assembly is that a large portion of a read set cannot be uniquely aligned when the read length becomes too short. Similarly, the number of uniquely aligned reads is reduced when aligning to larger, more complex genomes or reference sequences, because these are more likely to contain repetitive sequences. A case in point is a modeling study that indicated that 97% of the E.
coli genome can be uniquely aligned with 18-bp reads, but that only 90% of the human genome can be uniquely aligned with 30-bp reads (68, 69). Unique alignment or assembly is reduced not only by the presence of repeat sequences but also by shared homologies within closely related gene families and pseudogenes. Aligners handle nonunique reads either by distributing them among the multiple candidate alignment positions or by leaving alignment gaps. De novo assembly will reject these reads, leading to shorter and more numerous assembled contigs. These factors are relevant when choosing an appropriate sequencing platform with its associated read length, particularly for de novo assembly (9).

Error rates for individual NGS reads are higher than for Sanger sequencing. The higher accuracy of Sanger sequencing reflects not only the maturity of the chemistry but also the fact that a Sanger trace peak represents the combined signal of many redundant terminated extension reactions. Accuracy in NGS is achieved by sequencing a given region multiple times, which the massively parallel process makes possible, with each sequence contributing to “coverage” depth. Through this process, a “consensus” sequence is derived.

Fig. 4. Pseudocolor image from the Illumina flow cell. Each fluorescence signal originates from a clonally amplified template cluster. Top panel illustrates 4 emission wavelengths of fluorescent labels depicted in red, green, blue, and yellow. Images are processed to identify individual clusters and to remove noise or interference. The lower panel is a composite image of the 4 fluorescence channels.

Assembling, aligning, and analyzing NGS data requires an adequate number of overlapping reads, or coverage. In practice, coverage across a sequenced region is variable; besides the Poisson-like randomness of library preparation, factors that may contribute to this variability include differential ligation of adapters to template sequences and differential amplification during clonal template generation (11, 70). Beyond sequence errors, inadequate coverage can cause failure to detect actual nucleotide variation, leading to false-negative results for heterozygotes (3, 11). Studies have shown that coverage of less than 20- to 30-fold begins to reduce the accuracy of single-nucleotide polymorphism calls with data from the 454 platform (65). For the Illumina system, higher coverage depths (50- to 60-fold) have been used in an effort to improve short-read alignment, assembly, and accuracy, although coverage in the 20- to 30-fold range may be sufficient for certain resequencing applications (14). As noted above, one comparative study of a yeast genome showed that the 454, Illumina, and SOLiD technologies all accurately detected single-nucleotide variations when the coverage depth was ≥15-fold per allele (20). Coverage gaps can occur when reads fail to align because they differ substantially from the reference. Alignment of repetitive sequences in repeat regions of a target sequence can also affect the apparent coverage. Reads that align equally well at multiple sites can be randomly distributed to the sites or, in some cases, discarded, depending on the alignment software. In de novo–assembly software, reads with ambiguous alignments are typically discarded, yielding multiple aligned read groups, or contigs, with no information regarding their relative order.

A large variety of software programs for alignment and assembly have been developed and made available to the research community (see Table 1 in the online Data Supplement). Most run on the Linux operating system, and a few are available for Windows. Many require a 64-bit operating system and can use ≥16 GB of RAM and multiple central-processing unit cores. The range of data volumes, hardware, software packages, and settings leads to processing times from a few minutes to multiple hours, emphasizing the need for sufficient computational power. Although a growing set of alignment and assembly algorithms is available, the trade-off between speed and accuracy remains: many but not all possible alignments are evaluated, and a balance must be struck between ideal alignment and computational efficiency.
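The link between coverage depth and missed heterozygotes discussed above can be made concrete with a minimal Python sketch. It assumes an idealized model in which per-site depth is Poisson distributed around the mean coverage (real coverage is more variable, for the reasons given above) and each read at a heterozygous site samples either allele with probability 0.5; calling the site is assumed, hypothetically, to require at least 3 reads supporting each allele. These assumptions are illustrative and are not the models used in the studies cited.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """Poisson probability of depth n when the mean depth is lam (computed in log space)."""
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


def p_both_alleles_seen(depth: int, k: int) -> float:
    """P(each allele is seen >= k times) among `depth` reads, each allele drawn with p = 0.5."""
    if depth < 2 * k:
        return 0.0
    favorable = sum(math.comb(depth, j) for j in range(k, depth - k + 1))
    return favorable / 2 ** depth


def p_het_detected(mean_depth: float, k: int = 3) -> float:
    """Expected detection probability for a heterozygous site under Poisson-distributed depth."""
    if mean_depth <= 0:
        raise ValueError("mean depth must be positive")
    # Truncate the Poisson sum far into the upper tail; the neglected mass is negligible.
    upper = int(mean_depth + 10 * math.sqrt(mean_depth) + 20)
    return sum(poisson_pmf(n, mean_depth) * p_both_alleles_seen(n, k) for n in range(upper + 1))


def fraction_uncovered(mean_depth: float) -> float:
    """Expected fraction of positions with zero reads under the same Poisson model."""
    return math.exp(-mean_depth)


if __name__ == "__main__":
    for depth in (5, 10, 15, 20, 30):
        print(f"{depth:>3}x  P(het detected) = {p_het_detected(depth):.4f}  "
              f"P(uncovered) = {fraction_uncovered(depth):.2e}")
```

Under these assumptions the detection probability rises steeply at low mean depths and approaches 1 as depth increases. Because real coverage is more variable than this Poisson model, the depth needed in practice is expected to be higher than such idealized figures suggest.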
NGS software features vary with the application and in general may include alignment, de novo assembly, alignment viewing, and variant-discovery programs. In addition, some NGS statistical data-analysis tools are being developed (such as JMP Genomics; SAS Institute). Software packages available for alignment and assembly to a reference sequence include Zoom (71), MAQ (67), Mosaik (72), SOAP (73), and SHRiMP (http://compbio.cs.toronto.edu/shrimp/), which supports SOLiD color-space analysis. Software for de novo assembly includes Edena (70), EULER-SR (74), SHARCGS (75), SSAKE (69), Velvet (76), and SOAPdenovo (http://soap.genomics.org.cn/). Recently released commercial software for alignment and de novo assembly includes packages from DNAStar (www.dnastar.com), SoftGenetics (www.softgenetics.com), and CLC bio (www.clcbio.com) that feature data viewers allowing the user to see read alignments, coverage depth, genome annotations, and variant analysis.
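The read-length effect described earlier, in which a smaller fraction of a genome can be aligned uniquely as reads get shorter or genomes get more repetitive, can be illustrated with a minimal Python sketch. It counts the fraction of read start positions whose length-k substring occurs exactly once in the sequence. This is a deliberate simplification (exact matches only, forward strand only, no sequencing errors, toy sequence) and is not the method of the modeling study cited above.

```python
from collections import Counter


def unique_fraction(genome: str, k: int) -> float:
    """Fraction of read start positions whose k-mer occurs exactly once.

    Simplifications (assumptions): exact matching only, forward strand only,
    no sequencing errors; a real mappability analysis would relax all three.
    """
    n_positions = len(genome) - k + 1
    if k <= 0 or n_positions <= 0:
        raise ValueError("k must be between 1 and the genome length")
    kmers = [genome[i:i + k] for i in range(n_positions)]
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] == 1) / n_positions


if __name__ == "__main__":
    # Toy sequence containing a short repeated motif ("ACGT" appears twice).
    toy = "ACGTACGTGGATCC"
    for k in (3, 4, 5):
        print(k, round(unique_fraction(toy, k), 3))  # prints 3 0.667, then 4 0.818, then 5 1.0
```

Once the read length exceeds the length of the repeated motif, every position becomes unique in this toy case; in a real genome, long repeats, gene families, and pseudogenes keep some positions nonunique even at longer read lengths.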

Fig. 5 presents some examples of NGS data viewed in 2 different software systems.

RNA-Seq data analysis poses unique challenges and requires sequence alignment across spliced regions of transcripts as well as poly(A) tails. Current software has made strong inroads, however, by incorporating motif recognition at splice junctions and identifying intron–exon borders through regions of low alignment coverage (41). Deciphering multiple transcript isoforms involves mapping reads to known and putative splice junctions and, in one approach, requires that each isoform be supported by multiple independent splice-junction reads with independent start sites (51). ERANGE software has been used in the analysis of the mouse transcriptome (43). ERANGE maps unique reads to their genomic site of origin and maps reads that match several sites, or multireads, to a most likely site of origin. Reads that do not map to a known exon are grouped together by homology into candidate exons or parts of exons. The near-proportional nature of NGS transcriptome data allows RNA production to be quantified from the coverage of the assembled or aligned data. ERANGE uses normalized counts of unique reads, spliced reads, and multireads to quantify transcripts. Additional analytical considerations are needed for miRNA studies, including RNA secondary-structure analysis for hairpins, alignment to known miRNA databases, and searches within the NGS data set for complementary miRNA strands, as described in studies of developing rice grains (77) and chicken embryos (78).

Research with ChIP-Seq has led to analysis methods and software that exploit its advantages over ChIP-on-chip, namely a larger, more information-rich data set. The single-base resolution of the data allows improved estimation of binding-site positions in the programs QuEST (79) and MACS (80).
Aligned data at protein-binding regions typically show 2 characteristic offset peaks, one populated only by forward-strand reads and the other only by reverse-strand reads. These peaks are hallmarks of the short immunoprecipitated ChIP-Seq DNA fragments, which have a binding site near their center, and the software uses them to estimate the binding-site location near the mean peak position. Additional program features include advances in statistical analysis to minimize miscalled binding sites, error probability estimation, and motif analysis (see Table 1 in the online Data Supplement).

Fig. 5. Examples of NGS data viewed in 2 different software systems. (A), Roche Amplicon Variant Analyzer software displaying GS FLX data from the CFTR gene [cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7)]. Lower pane shows reference sequence (green) above 18 of 68 aligned reads. Column highlighted in yellow and blue shows a heterozygous single-nucleotide polymorphism (SNP). Single T/A insertions (red) may represent errors. Upper pane shows percent variation from reference (vertical bars) and coverage (pale blue line). (B), DNAStar SeqMan Pro software displaying Illumina data from Mycobacterium massiliense. Lower pane shows reference sequence above aligned reads. Green and red arrows show direction of sequencing; base calls at variance with reference are indicated (red). Three columns in agreement (red) indicate presumptive SNPs. Other bases in red may be errors. Upper pane shows read coverage, with relative alignment positions above the graph. See the Acknowledgments for disclosure information on the CFTR-gene analysis performed on residual, deidentified DNA.

A Clinical Future for NGS

From the impact that NGS has made at the basic-research level, we can anticipate its translation into molecular diagnostics. Key issues to be addressed in this transition include the complexity of technical procedures, robustness, accuracy, and cost. By all these measures, NGS platforms will benefit from continued process streamlining, automation, chemistry refinements, cost reductions, and improved data handling. The cost of NGS is currently substantial in terms of both the investment in capital equipment (from approximately $600 000 for the Roche/454 Life Sciences, Illumina, and Applied Biosystems SOLiD platforms to $1.35 million for the HeliScope platform) and the cost of sequencing reagents (from approximately $3500–$4500 for the Illumina, Applied Biosystems, and Roche/454 platforms to $18 000 for the HeliScope platform). Nonetheless, the cost per base is substantially lower than for Sanger sequencing, and given the tremendous output, it is easy to see why genome centers, core facilities, and commercial contract-sequencing enterprises have readily adopted this new technology. Workflow considerations include the fact that preparation of a sample library requires multiple molecular biology steps and 2–4 days to complete, depending on the platform. In addition to the required molecular biology expertise, data analysis requires expertise in bioinformatics, facilitated by familiarity with the Linux operating system. The high-throughput capacity of NGS platforms can be leveraged by analyzing multiple samples in separate flow-cell lanes or compartments. In addition, unique identifier sequences, or “bar codes,” can be ligated to individual samples, which can subsequently be pooled and sequenced. After sequencing, the sequences of individual samples are derived by data deconvolution (81–83). The transition of NGS into clinical diagnostics is in the early stages of development in large reference laboratories and is being leveraged for applications that require large amounts of sequence information, relative quantification, and high-sensitivity detection.
Examples that meet these criteria include the aforementioned detection of mutations in tumor cells from biopsies or in the circulation. In the area of mitochondrial disorders, NGS can be used to sequence the entire 16.5-kb mitochondrial genome, determine the percentage of mutation heteroplasmy, and analyze nuclear genes whose protein products affect mitochondrial metabolism, all in a single analytical run. In the authors’ laboratory, sequencing of mycobacterial genomes is ongoing as an approach to refine organism identification and support clinical epidemiologic investigations. HIV quasi-species detection and relative quantification have been demonstrated and can be used to monitor emerging drug resistance (84). For human genetics, there is an increasing need to analyze multiple genes that, when mutated, lead to overlapping physical findings and clinical phenotypes. For example, 16 different genes are implicated in the pathogenesis of hypertrophic cardiomyopathy (85, 86). For a comprehensive diagnostic evaluation in such settings, it will be necessary to sequence upwards of 100 000 to 200 000 bp. The coupling of NGS with the genomic-enrichment techniques described above offers a promising approach to this technical challenge.

Recently, investigative groups led by Y.M. Dennis Lo and Stephen Quake have applied NGS to the detection of fetal chromosomal aneuploidy (87, 88). Prior work had demonstrated that cell-free fetal nucleic acids (DNA and RNA) are present in maternal blood during pregnancy, along with maternally derived cell-free nucleic acids. Several analytical approaches that use cell-free fetal nucleic acids have been developed to determine fetal aneuploidy, including the analysis of placental mRNA derived from the chromosomes of interest (e.g., chromosome 21) and the determination of relative chromosomal dosage via digital PCR analysis of a large number of target chromosomal loci compared with reference chromosomal loci (89–91). Building on the concept of relative chromosome dosage, the Lo and Quake groups have independently shown the feasibility of converting cell-free DNA from maternal blood into an Illumina library, followed by sequencing and mapping the reads to the reference human genome. Counting the number of reads that map to each chromosome allows the relative dosage of each chromosome to be ascertained. If fetal aneuploidy is present, the number of sequence reads mapping to the affected chromosome would be expected to be statistically overrepresented in the data set. This expectation was confirmed in trisomy 21 pregnancies, with additional supporting evidence obtained for trisomy 18 and trisomy 13 pregnancies. These studies open a new avenue for assessing fetal aneuploidy and provide a foundation for NGS-based analysis of cell-free DNA in both nonpathologic and pathophysiological states.
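The chromosome-counting idea can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the statistical procedure of the Lo or Quake groups: reads are assumed to be already mapped and tallied per chromosome, a set of euploid reference samples is assumed to be available, and a sample is flagged when the fraction of reads on the chromosome of interest exceeds the reference mean by more than a hypothetical cutoff of 3 SDs (a z-score). All counts below are invented for illustration.

```python
from statistics import mean, stdev


def chrom_fraction(counts: dict[str, int], chrom: str) -> float:
    """Fraction of all mapped reads that fall on `chrom`."""
    total = sum(counts.values())
    return counts[chrom] / total


def dosage_z(sample: dict[str, int], references: list[dict[str, int]], chrom: str) -> float:
    """z-score of the sample's chromosome fraction relative to euploid reference samples."""
    ref_fracs = [chrom_fraction(r, chrom) for r in references]
    return (chrom_fraction(sample, chrom) - mean(ref_fracs)) / stdev(ref_fracs)


def flag_overrepresented(sample: dict[str, int], references: list[dict[str, int]],
                         chrom: str, cutoff: float = 3.0) -> bool:
    """True when the chromosome is overrepresented beyond the (hypothetical) z cutoff."""
    return dosage_z(sample, references, chrom) > cutoff


if __name__ == "__main__":
    # Hypothetical per-chromosome tallies; each sample has 1,000,000 mapped reads.
    refs = [{"chr21": c, "other": 1_000_000 - c} for c in (13_000, 13_100, 12_900, 13_050)]
    euploid_like = {"chr21": 13_020, "other": 986_980}
    trisomy_like = {"chr21": 13_700, "other": 986_300}
    for name, sample in (("euploid-like", euploid_like), ("trisomy-like", trisomy_like)):
        z = dosage_z(sample, refs, "chr21")
        print(name, round(z, 2), flag_overrepresented(sample, refs, "chr21"))
```

Using the fraction of reads rather than raw counts normalizes for differences in total sequencing output between samples. The spread of the reference fractions sets how small an overrepresentation can be distinguished, and that spread shrinks as more reads are counted per sample.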
Technologies on the Horizon

New single-molecule sequencing technologies in development may decrease sequencing time, reduce costs, and streamline sample preparation. Real-time sequencing by synthesis is being developed by VisiGen (http://www.visigenbio.com) and Pacific Biosciences (http://www.pacificbiosciences.com). VisiGen’s approach uses DNA polymerase modified with a fluorescent donor molecule. Attached to a glass slide surface, the polymerase directs strand extension from primed DNA templates. Nucleotides are modified with fluorescent acceptor molecules at the γ-phosphate position, which is cleaved away during incorporation, and light energy is used during incorporation to induce fluorescence resonance energy transfer between the polymerase and nucleotide fluorescent moieties. The company envisions that its platform will consist of a massively parallel array of tethered DNA polymerases that will generate 1 × 10⁶ bp of sequence per second.

Pacific Biosciences performs single-molecule real-time sequencing with phospholinked fluorescently labeled dNTPs. DNA sequencing is performed in thousands of reaction wells 50–100 nm in diameter, fabricated with a thin metal cladding film deposited on an optical waveguide consisting of a solid, transparent silicon dioxide substrate. Each reaction well is a nanophotonic chamber in which only the bottom third is visualized, producing a detection volume of approximately 20 zL (20 × 10⁻²¹ L). DNA polymerase/template complexes are immobilized at the well bottoms, and 4 differently labeled dNTPs are added. As the DNA polymerase incorporates complementary nucleotides, each base is held within the detection volume for tens of milliseconds, orders of magnitude longer than the time it takes for a nucleotide to diffuse in and out of the detection volume. Laser excitation enables the incorporation events in individual wells to be captured through the optical waveguide, with the color of the detected fluorescence reflecting the identity of the incorporated dNTP. For sequencing, Pacific Biosciences uses a modified phi29 DNA polymerase that has enhanced kinetic properties for incorporating the system’s phospholinked fluorescently labeled dNTPs. In addition, phi29 DNA polymerase is highly processive and has strand-displacement activity. By taking advantage of these properties, Pacific Biosciences has demonstrated sequencing reads exceeding 4000 bases when a circularized single-stranded DNA molecule is used as template. In this configuration, the phi29 DNA polymerase carries out multiple laps of strand-displacement DNA synthesis around the circular template. The mean DNA-synthesis rate was determined to be approximately 4 bases/s.
The observed errors, including deletions, insertions, and mismatches, can be addressed by building a consensus sequence from the multiple rounds of template sequencing. Further refinement of the chemistries and platform instrumentation is ongoing, with a 2010 target date for commercial launch (92–94).

Farther out on the horizon is sequencing based on monitoring the passage of DNA molecules through nanopores 2–5 nm or more in diameter. Nanopores are being fabricated in inorganic membranes (solid-state nanopores), assembled from protein channels in lipid membranes, or configured in polymer-based nanofluidic channels. In some configurations, a voltage applied across the nanopore membrane drives the translocation of negatively charged DNA molecules through the pores while changes in the ionic current, measured in the picoampere range, are monitored. NABsys (http://www.nabsys.com) is pursuing a combination of nanopores with sequencing by hybridization, in which single-stranded DNA molecules are hybridized with a library of hexamers of known sequence. The hybridized DNA is passed through a nanopore, and the current changes differ in regions where hexamers have hybridized. The patterns of hybridization are used to map annealing regions and determine sequence. Oxford Nanopore Technologies (http://www.nanoporetech.com) is developing nanopore-based sequencing that uses an α-hemolysin protein channel in reconstituted lipid bilayers. The nanopores are situated in individual array wells, and single DNA molecules introduced into the wells are progressively digested by exonuclease. Each released mononucleotide enters the nanopore and alters the electrical current, producing a current change characteristic of that base (95, 96). Although the technology seems futuristic, considerable NHGRI funding is being directed toward a variety of nanopore technologies under development as part of the goal of achieving the $1000 genome. For further descriptions of nanopore technologies, the reader is referred to recent reviews (97, 98).

Conclusions

The past few years have witnessed the emergence of NGS technologies that share a common basis: massively parallel sequencing of clonally amplified DNA molecules. In 2008, the first NGS platform based on single-molecule DNA sequencing was launched. On the horizon are real-time single-molecule DNA-sequencing technologies and approaches based on nanopores. NGS has had a substantial impact on basic genomics research in terms of scale and feasibility. Over the next several years, NGS is anticipated to transition into clinical-diagnostics use.
A successful transition will require streamlined processes, especially sample preparation, together with more robust technology and validation studies that characterize accuracy. The large volume of sequence data generated will pose a bioinformatics challenge for the clinical laboratory. Beyond data processing, interpreting sequencing results will require further characterization of the genomic variation present in the regions analyzed. Although considerable work lies ahead to implement NGS in clinical diagnostics, the potential applications are numerous and exciting.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Clinical Chemistry 55:4 (2009) 655

Authors' Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest.

Role of Sponsor: The funding organizations played a direct role in the preparation of the manuscript and in the final approval of the manuscript.

Acknowledgments: The analysis of the CFTR gene illustrated in Fig. 5 was performed on residual, deidentified DNA with the approval of the University of Utah Institutional Review Board, human subjects protocol number 7275.

References

1. Maxam AM, Gilbert W. A new method for sequencing DNA. Proc Natl Acad Sci U S A 1977;74:560–4.
2. Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci U S A 1977;74:5463–7.
3. Wheeler DA, Srinivasan M, Egholm M, Shen Y, Chen L, McGuire A, et al. The complete genome of an individual by massively parallel DNA sequencing. Nature 2008;452:872–6.
4. Nyren P, Pettersson B, Uhlen M. Solid phase DNA minisequencing by an enzymatic luminometric inorganic pyrophosphate detection assay. Anal Biochem 1993;208:171–5.
5. Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P. Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem 1996;242:84–9.
6. Ronaghi M, Uhlen M, Nyren P. A sequencing method based on real-time pyrophosphate. Science 1998;281:363–5.
7. Tawfik DS, Griffiths AD. Man-made cell-like compartments for molecular evolution. Nat Biotechnol 1998;16:652–6.
8. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005;437:376–80.
9. Pearson BM, Gaskin DJ, Segers RP, Wells JM, Nuijten PJ, van Vliet AH. The complete genome sequence of Campylobacter jejuni strain 81116 (NCTC11828). J Bacteriol 2007;189:8402–3.
10. Huse SM, Huber JA, Morrison HG, Sogin ML, Welch DM. Accuracy and quality of massively parallel DNA pyrosequencing. Genome Biol 2007;8:R143.
11. Bentley DR, Balasubramanian S, Swerdlow HP, Smith GP, Milton J, Brown CG, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 2008;456:53–9.
12. Korbel JO, Urban AE, Affourtit JP, Godwin B, Grubert F, Simons JF, et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 2007;318:420–6.
13. Campbell PJ, Stephens PJ, Pleasance ED, O’Meara S, Li H, Santarius T, et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat Genet 2008;40:722–9.
14. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-throughput DNA sequencing. Nucleic Acids Res 2008;36:e105.
15. Erlich Y, Mitra PP, delaBastide M, McCombie WR, Hannon GJ. Alta-Cyclic: a self-optimizing base caller for next-generation sequencing. Nat Methods 2008;5:679–82.


16. Quail MA, Kozarewa I, Smith F, Scally A, Stephens PJ, Durbin R, et al. A large genome center’s improvements to the Illumina sequencing system. Nat Methods 2008;5:1005–10.
17. Shendure J, Porreca GJ, Reppas NB, Lin X, McCutcheon JP, Rosenbaum AM, et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science 2005;309:1728–32.
18. Braslavsky I, Hebert B, Kartalov E, Quake SR. Sequence information can be obtained from single DNA molecules. Proc Natl Acad Sci U S A 2003;100:3960–4.
19. Harris TD, Buzby PR, Babcock H, Beer E, Bowers J, Braslavsky I, et al. Single-molecule DNA sequencing of a viral genome. Science 2008;320:106–9.
20. Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, et al. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res 2008;18:1638–42.
21. Quinn NL, Levenkova N, Chow W, Bouffard P, Boroevich KA, Knight JR, et al. Assessing the feasibility of GS FLX Pyrosequencing for sequencing the Atlantic salmon genome. BMC Genomics 2008;9:404.
22. Satkoski JA, Malhi R, Kanthaswamy S, Tito R, Malladi V, Smith D. Pyrosequencing as a method for SNP identification in the rhesus macaque (Macaca mulatta). BMC Genomics 2008;9:256.
23. Borneman AR, Forgan AH, Pretorius IS, Chambers PJ. Comparative genome analysis of a Saccharomyces cerevisiae wine strain. FEMS Yeast Res 2008;8:1185–95.
24. Wang J, Wang W, Li R, Li Y, Tian G, Goodman L, et al. The diploid genome sequence of an Asian individual. Nature 2008;456:60–5.
25. Ley TJ, Mardis ER, Ding L, Fulton B, McLellan MD, Chen K, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature 2008;456:66–72.
26. Kim PM, Lam HY, Urban AE, Korbel JO, Affourtit J, Grubert F, et al. Analysis of copy number variants and segmental duplications in the human genome: evidence for a change in the process of formation in recent evolutionary history. Genome Res 2008;18:1865–74.
27. Chen J, Kim YC, Jung YC, Xuan Z, Dworkin G, Zhang Y, et al. Scanning the human genome at kilobase resolution. Genome Res 2008;18:751–62.
28. Yeager M, Xiao N, Hayes RB, Bouffard P, Desany B, Burdett L, et al. Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Hum Genet 2008;124:161–70.
29. Ding L, Getz G, Wheeler DA, Mardis ER, McLellan MD, Cibulskis K, et al. Somatic mutations affect key pathways in lung adenocarcinoma. Nature 2008;455:1069–75.
30. Albert TJ, Molla MN, Muzny DM, Nazareth L, Wheeler D, Song X, et al. Direct selection of human genomic loci by microarray hybridization. Nat Methods 2007;4:903–5.
31. Hodges E, Xuan Z, Balija V, Kramer M, Molla MN, Smith SW, et al. Genome-wide in situ exon capture for selective resequencing. Nat Genet 2007;39:1522–7.
32. Okou DT, Steinberg KM, Middle C, Cutler DJ, Albert TJ, Zwick ME. Microarray-based genomic selection for high-throughput resequencing. Nat Methods 2007;4:907–9.
33. Porreca GJ, Zhang K, Li JB, Xie B, Austin D, Vassallo SL, et al. Multiplex amplification of large sets of human exons. Nat Methods 2007;4:931–6.
34. Huber JA, Mark Welch DB, Morrison HG, Huse SM, Neal PR, Butterfield DA, Sogin ML. Microbial population structures in the deep marine biosphere. Science 2007;318:97–100.
35. Sogin ML, Morrison HG, Huber JA, Mark Welch D, Huse SM, Neal PR, et al. Microbial diversity in the deep sea and the underexplored “rare biosphere.” Proc Natl Acad Sci U S A 2006;103:12115–20.
36. Urich T, Lanzen A, Qi J, Huson DH, Schleper C, Schuster SC. Simultaneous assessment of soil microbial community structure and function through analysis of the meta-transcriptome. PLoS ONE 2008;3:e2527.
37. Palacios G, Druce J, Du L, Tran T, Birch C, Briese T, et al. A new arenavirus in a cluster of fatal transplant-associated diseases. N Engl J Med 2008;358:991–8.
38. Keijser BJ, Zaura E, Huse SM, van der Vossen JM, Schuren FH, Montijn RC, et al. Pyrosequencing analysis of the oral microflora of healthy adults. J Dent Res 2008;87:1016–20.
39. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, Ley RE, et al. A core gut microbiome in obese and lean twins. Nature 2008;457:480–4.
40. Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet 2009;10:57–63.
41. Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 2008;320:1344–9.
42. Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature 2008;453:1239–43.


43. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods 2008;5:621–8.
44. Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR. Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 2008;133:523–36.
45. Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods 2008;5:613–9.
46. Marioni JC, Mason CE, Mane SM, Stephens M, Gilad Y. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res 2008;18:1509–17.
47. Morin R, Bainbridge M, Fejes A, Hirst M, Krzywinski M, Pugh T, et al. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques 2008;45:81–94.
48. Morin RD, O’Connor MD, Griffith M, Kuchenbauer F, Delaney A, Prabhu AL, et al. Application of massively parallel sequencing to microRNA profiling and discovery in human embryonic stem cells. Genome Res 2008;18:610–21.
49. Emrich SJ, Barbazuk WB, Li L, Schnable PS. Gene discovery and annotation using LCM-454 transcriptome sequencing. Genome Res 2007;17:69–73.
50. Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet 2008;40:1413–5.
51. Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature 2008;456:470–6.
52. Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, et al. Genome-wide location and function of DNA binding proteins. Science 2000;290:2306–9.
53. Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell 2007;129:823–37.
54. Schones DE, Zhao K. Genome-wide approaches to studying chromatin modifications. Nat Rev Genet 2008;9:179–91.
55. Valouev A, Ichikawa J, Tonthat T, Stuart J, Ranade S, Peckham H, et al. A high-resolution, nucleosome position map of C. elegans reveals a lack of universal sequence-dictated positioning. Genome Res 2008;18:1051–63.
56. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science 2007;316:1497–502.
57. Robertson G, Hirst M, Bainbridge M, Bilenky M, Zhao Y, Zeng T, et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods 2007;4:651–7.
58. Korshunova Y, Maloney RK, Lakey N, Citek RW, Bacher B, Budiman A, et al. Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA. Genome Res 2008;18:19–29.

59. Cokus SJ, Feng S, Zhang X, Chen Z, Merriman B, Haudenschild CD, et al. Shotgun bisulphite sequencing of the Arabidopsis genome reveals DNA methylation patterning. Nature 2008;452:215–9.
60. Marguerat S, Wilhelm BT, Bahler J. Next-generation sequencing: applications beyond genomes. Biochem Soc Trans 2008;36:1091–6.
61. Pop M, Salzberg SL. Bioinformatics challenges of new sequencing technology. Trends Genet 2008;24:142–9.
62. Rougemont J, Amzallag A, Iseli C, Farinelli L, Xenarios I, Naef F. Probabilistic base calling of Solexa sequencing data. BMC Bioinformatics 2008;9:431.
63. Ewing B, Green P. Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res 1998;8:186–94.
64. Ewing B, Hillier L, Wendl MC, Green P. Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res 1998;8:175–85.
65. Brockman W, Alvarez P, Young S, Garber M, Giannoukos G, Lee WL, et al. Quality scores and SNP detection in sequencing-by-synthesis systems. Genome Res 2008;18:763–70.
66. Smith AD, Xuan Z, Zhang MQ. Using quality scores and longer reads improves accuracy of Solexa read mapping. BMC Bioinformatics 2008;9:128.
67. Li H, Ruan J, Durbin R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 2008;18:1851–8.
68. Whiteford N, Haslam N, Weber G, Prugel-Bennett A, Essex JW, Roach PL, et al. An analysis of the feasibility of short read sequencing. Nucleic Acids Res 2005;33:e171.
69. Warren RL, Sutton GG, Jones SJ, Holt RA. Assembling millions of short DNA sequences using SSAKE. Bioinformatics 2007;23:500–1.
70. Hernandez D, Francois P, Farinelli L, Osteras M, Schrenzel J. De novo bacterial genome sequencing: millions of very short reads assembled on a desktop computer. Genome Res 2008;18:802–9.
71. Lin H, Zhang Z, Zhang MQ, Ma B, Li M. ZOOM! Zillions Of Oligos Mapped. Bioinformatics 2008;24:2431–7.
72. Smith DR, Quinlan AR, Peckham HE, Makowsky K, Tao W, Woolf B, et al. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res 2008;18:1638–42.
73. Li R, Li Y, Kristiansen K, Wang J. SOAP: short oligonucleotide alignment program. Bioinformatics 2008;24:713–4.
74. Chaisson MJ, Pevzner PA. Short read fragment assembly of bacterial genomes. Genome Res 2008;18:324–30.
75. Dohm JC, Lottaz C, Borodina T, Himmelbauer H. SHARCGS, a fast and highly accurate short-read assembly algorithm for de novo genomic sequencing. Genome Res 2007;17:1697–706.
76. Zerbino DR, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res 2008;18:821–9.
77. Zhu QH, Spriggs A, Matthew L, Fan L, Kennedy G, Gubler F, Helliwell C. A diverse set of microRNAs and microRNA-like small RNAs in developing rice grains. Genome Res 2008;18:1456–65.
78. Glazov EA, Cottee PA, Barris WC, Moore RJ, Dalrymple BP, Tizard ML. A microRNA catalog of the developing chicken embryo identified by a deep sequencing approach. Genome Res 2008;18:957–64.
79. Valouev A, Johnson DS, Sundquist A, Medina C, Anton E, Batzoglou S, et al. Genome-wide analysis of transcription factor binding sites based on ChIP-Seq data. Nat Methods 2008;5:829–34.
80. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol 2008;9:R137.
81. Binladen J, Gilbert MT, Bollback JP, Panitz F, Bendixen C, Nielsen R, Willerslev E. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS ONE 2007;2:e197.
82. Meyer M, Stenzel U, Hofreiter M. Parallel tagged sequencing on the 454 platform. Nat Protoc 2008;3:267–78.
83. Meyer M, Stenzel U, Myles S, Prufer K, Hofreiter M. Targeted high-throughput sequencing of tagged nucleic acid samples. Nucleic Acids Res 2007;35:e97.
84. Wang C, Mitsuya Y, Gharizadeh B, Ronaghi M, Shafer RW. Characterization of mutation spectra with ultra-deep pyrosequencing: application to HIV-1 drug resistance. Genome Res 2007;17:1195–201.
85. Fokstuen S, Lyle R, Munoz A, Gehrig C, Lerch R, Perrot A, et al. A DNA resequencing array for pathogenic mutation detection in hypertrophic cardiomyopathy. Hum Mutat 2008;29:879–85.
86. Morita H, Rehm HL, Menesses A, McDonough B, Roberts AE, Kucherlapati R, et al. Shared genetic causes of cardiac hypertrophy in children and adults. N Engl J Med 2008;358:1899–908.
87. Chiu RW, Chan KC, Gao Y, Lau VY, Zheng W, Leung TY, et al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci U S A 2008;105:20458–63.
88. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A 2008;105:16266–71.
89. Fan HC, Quake SR. Detection of aneuploidy with digital polymerase chain reaction. Anal Chem 2007;79:7576–9.
90. Lo YM, Lun FM, Chan KC, Tsui NB, Chong KC, Lau TK, et al. Digital PCR for the molecular detection of fetal chromosomal aneuploidy. Proc Natl Acad Sci U S A 2007;104:13116–21.
91. Dennis Lo YM, Chiu RW. Prenatal diagnosis: progress through plasma nucleic acids. Nat Rev Genet 2007;8:71–7.
92. Korlach J, Marks PJ, Cicero RL, Gray JJ, Murphy DL, Roitman DB, et al. Selective aluminum passivation for targeted immobilization of single DNA polymerase molecules in zero-mode waveguide nanostructures. Proc Natl Acad Sci U S A 2008;105:1176–81.


93. Levene MJ, Korlach J, Turner SW, Foquet M, Craighead HG, Webb WW. Zero-mode waveguides for single-molecule analysis at high concentrations. Science 2003;299:682–6.
94. Eid J, Fehr A, Gray J, Luong K, Lyle J, Otto G, et al. Real-time DNA sequencing from single polymerase molecules. Science 2008;323:133–8.


95. Astier Y, Braha O, Bayley H. Toward single molecule DNA sequencing: direct identification of ribonucleoside and deoxyribonucleoside 5′-monophosphates by using an engineered protein nanopore equipped with a molecular adapter. J Am Chem Soc 2006;128:1705–10.
96. Wu HC, Astier Y, Maglia G, Mikhailova E, Bayley H. Protein nanopores with covalently attached molecular adapters. J Am Chem Soc 2007;129:16142–8.
97. Gupta PK. Single-molecule DNA sequencing technologies for future genomics research. Trends Biotechnol 2008;26:602–11.
98. Rhee M, Burns MA. Nanopore sequencing technology: research trends and applications. Trends Biotechnol 2006;24:580–6.

Reviews

Clinical Chemistry 55:4 659–669 (2009)

Microarray-Based Genomic DNA Profiling Technologies in Clinical Molecular Diagnostics

Yiping Shen¹,²,³ and Bai-Lin Wu¹,³,⁴*

BACKGROUND: Microarray-based genomic DNA profiling (MGDP) technologies are rapidly moving from translational research to clinical diagnostics and have revolutionized medical practice. These technologies have shown great advantages in detecting genomic imbalances associated with genomic disorders and single-gene diseases.

CONTENT: We discuss the development and applications of the major array platforms used in both academic and commercial laboratories. Although no standardized platform is expected to emerge soon, comprehensive oligonucleotide microarray platforms (both comparative genomic hybridization arrays and genotyping hybrid arrays) are rapidly becoming the methods of choice because of their demonstrated analytical validity in detecting genomic imbalances, their flexibility in incorporating customized designs and updates, and their ease of manufacture. Copy number variants (CNVs), the genomic deletions and duplications detected through MGDP, are a common etiology of a variety of clinical phenotypes. The widespread distribution of CNVs poses great challenges for interpretation. A broad survey of CNVs in the healthy population, combined with the data accumulated from patient populations in clinical laboratories, will provide a better understanding of the nature of CNVs and enhance the power to identify genetic risk factors for medical conditions.

SUMMARY: MGDP technologies for molecular diagnostics are still at an early stage but are rapidly evolving. We are in the process of extensive clinical validation and utility evaluation of different array designs and technical platforms. CNVs of currently unknown importance will be a rich source of novel discoveries.

© 2009 American Association for Clinical Chemistry

¹ Children’s Hospital Boston, Boston, MA; ² Massachusetts General Hospital, Boston, MA; ³ Harvard Medical School, Boston, MA; ⁴ Fudan University, Shanghai, China.
* Address correspondence to this author at: Departments of Lab Medicine and Pathology, Children’s Hospital and Harvard Medical School, 300 Longwood Ave., Boston, MA 02115. Fax 617 730 0338; e-mail bai-lin.wu@childrens.harvard.edu.
Received October 24, 2008; accepted January 26, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.112821

Human DNA mutations range from single-nucleotide changes to whole-chromosome alterations. At the small end of the mutation spectrum, de novo single-base-pair changes occur at a rate of about 1.7 × 10⁻⁸ per base pair per generation (1). An estimated 2 nonsilent point mutations occur in each newborn (2). At the large end of the spectrum, alterations in chromosome number causing aneuploidy affect about 0.3% of live births (3). In between are microscopic and submicroscopic genomic rearrangements involving segments of chromosomes that may affect many genes or several exons of a single gene. The de novo mutation rate for copy number variants (CNVs)⁵ has been estimated at 1 in 8 newborns for deletions and 1 in 50 newborns for duplications (2), which falls between the rates for de novo point mutations and chromosomal aneuploidy. Recent studies have shown that the human genome contains many CNVs (4–7), which result from either a distant (inherited) or a current (de novo) loss or gain of genomic sequence. A portion of these alterations change the human phenotype and, in extreme cases, cause abnormal development and medical conditions. The purpose of clinical genetic diagnostics is to detect such mutations effectively and to correlate them with the corresponding medical conditions. Different methods have different resolutions and therefore reveal different sizes of genomic imbalance (Fig. 1). Point mutations can be detected effectively by Sanger sequencing after PCR amplification of the target regions. The detection of chromosomal aneuploidy and large rearrangements is within the repertoire of traditional cytogenetics.
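The rates quoted above can be placed side by side to check the ordering the paragraph describes. The sketch below is a back-of-the-envelope calculation: every rate comes from the text, except the haploid human genome length of about 3.1 × 10⁹ bp, which is an outside assumption added here only to turn the per-base-pair rate into a per-newborn count.

```python
# Rates quoted in the text; HAPLOID_GENOME_BP is an outside assumption (~3.1 Gb).
POINT_RATE_PER_BP = 1.7e-8   # de novo point mutations per bp per generation (ref 1)
HAPLOID_GENOME_BP = 3.1e9    # assumption, not stated in the text
NONSILENT_PER_NEWBORN = 2.0  # nonsilent point mutations per newborn (ref 2)
CNV_DELETION = 1 / 8         # de novo deletion CNVs per newborn (ref 2)
CNV_DUPLICATION = 1 / 50     # de novo duplication CNVs per newborn (ref 2)
ANEUPLOIDY = 0.003           # fraction of live births with aneuploidy (ref 3)


def summarize() -> dict[str, float]:
    """Expected de novo events per newborn, from the figures above."""
    return {
        # A newborn carries 2 haploid genome copies, hence the factor of 2.
        "all point mutations": POINT_RATE_PER_BP * 2 * HAPLOID_GENOME_BP,
        "nonsilent point mutations": NONSILENT_PER_NEWBORN,
        "CNVs (deletions + duplications)": CNV_DELETION + CNV_DUPLICATION,
        "aneuploidy": ANEUPLOIDY,
    }


for label, value in summarize().items():
    print(f"{label:32s} {value:8.3f}")
```

Under these assumptions a newborn carries roughly 105 new point mutations in total, of which about 2 are nonsilent; the combined CNV rate is 0.145 and the aneuploidy rate 0.003, so CNVs indeed fall between the two ends of the spectrum.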

⁵ Nonstandard abbreviations: CNV, copy number variant; FISH, fluorescence in situ hybridization; MLPA, multiplex ligation-dependent probe amplification; MGDP, microarray-based genomic DNA profiling; CGH, comparative genomic hybridization; SNP, single-nucleotide polymorphism; BAC, bacterial artificial chromosome; UPD, uniparental disomy; ASD, autism spectrum disorder; pCNV, pathogenic CNV; bCNV, benign CNV; uCNV, CNV of unknown importance.


[Fig. 1 diagram; methods shown: oligo microarrays, BAC array CGH, FISH, MLPA, RT-PCR, Sanger sequencing, karyotyping]

Fig. 1. Different methods, with different resolutions, detect different sizes of genomic change. Point mutations (changes in single or multiple base pairs, up to kilobase pairs) can be detected by Sanger DNA sequencing. Karyotyping detects large chromosomal rearrangements at low resolution (down to 5 Mb). FISH provides much higher resolution (down to 100 kb) for detecting submicroscopic genomic imbalances, but it interrogates only selected loci at one time. MLPA and real-time PCR (RT-PCR) complement FISH and DNA sequencing, but all of these methods suffer from low information content, limited throughput, and the need to choose candidate targets before testing. Microarrays [BAC and oligonucleotide (oligo)] can detect both microscopic and submicroscopic copy number changes across the whole genome in a single assay. The high-resolution microarrays (oligo CGH arrays and SNP hybrid arrays) provide the most efficient means of identifying genomic imbalances genome-wide (down to kilobase pairs and approaching base-pair resolution).
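The resolution limits in the Fig. 1 caption can be encoded as a small lookup table to ask which methods could, in principle, see a change of a given size. This is an illustrative sketch only: the 5 Mb and 100 kb limits come from the caption, the 1-kb bounds for Sanger sequencing and oligo arrays are rough readings of "up to kilobase pairs" and "down to kilobase pairs" (assumptions), and MLPA, real-time PCR, and BAC arrays are left out because the caption gives no figures for them.

```python
import math
from dataclasses import dataclass

KB = 1_000
MB = 1_000_000


@dataclass(frozen=True)
class Method:
    name: str
    min_bp: float      # smallest change resolved, per the Fig. 1 caption
    max_bp: float      # largest change the method suits (math.inf = no practical limit)
    genome_wide: bool  # True if a single assay surveys the whole genome


# The 1-kb bounds are assumptions; MLPA, real-time PCR, and BAC arrays are omitted.
METHODS = [
    Method("Sanger sequencing", 1, 1 * KB, False),
    Method("FISH", 100 * KB, math.inf, False),
    Method("Karyotyping", 5 * MB, math.inf, True),
    Method("High-resolution oligo arrays", 1 * KB, math.inf, True),
]


def methods_for(size_bp: float, genome_wide_only: bool = False) -> list[str]:
    """Names of methods whose resolution range covers a change of size_bp."""
    return [
        m.name
        for m in METHODS
        if m.min_bp <= size_bp <= m.max_bp and (m.genome_wide or not genome_wide_only)
    ]


print(methods_for(200 * KB))                        # → ['FISH', 'High-resolution oligo arrays']
print(methods_for(10 * MB, genome_wide_only=True))  # → ['Karyotyping', 'High-resolution oligo arrays']
```

The `genome_wide` flag captures the caption's other point: FISH can resolve a 200-kb imbalance only if a probe was chosen for that locus in advance, whereas arrays survey the whole genome in one assay.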

Fluorescence in situ hybridization (FISH) merges cytogenetic techniques with molecular technology. This merger and its clinical applications were driven by the discovery of a set of submicroscopic genomic imbalances associated with specific clinical manifestations (collectively known as microdeletion and microduplication syndromes), but FISH interrogates only selected loci at a time. Other techniques designed to detect copy number changes, such as multiplex ligation-dependent probe amplification (MLPA) and real-time PCR, complement FISH and DNA sequencing, but all of these methods suffer from low information content, limited throughput, and the need to choose candidate targets before testing. Microarray-based genomic DNA profiling (MGDP) technologies can detect both microscopic and submicroscopic copy number changes across the whole genome in a single assay. They provide unprecedented sensitivity and cost-effectiveness for a large group of mutations that have evaded conventional approaches, and they are changing clinical practice. MGDP may be used as a first-tier tool in clinical genetics for many conditions previously evaluated by conventional cytogenetic approaches. We review the current status of MGDP technologies and some considerations regarding their clinical applications.

Development of Genomic DNA Profiling Microarrays

Microarrays have been widely used for gene expression analysis in the past decade; however, DNA microarrays were initially designed for interrogating large numbers of genomic sequence polymorphisms and variants, and the intention to use them for diagnostic purposes was present from the very beginning. There are 2 major microarray platforms for genomic DNA profiling, comparative genomic hybridization (CGH) arrays and genotyping arrays, which were developed in parallel, each influencing the other. Despite technical differences in chip manufacturing and probe types, both platforms share the same principle: they rely on the specific hybridization of target and probe sequences. CGH arrays use a 2-color scheme and infer copy number changes in a test sample by comparing it with a reference sample. Genotyping arrays, on the other hand, do not use a control sample; instead, they use the intensity of the hybridization signal to indicate the relative DNA copy number. In addition, genotyping arrays provide information on single-nucleotide polymorphism (SNP) genotypes.

TRANSFORMATION OF CGH MICROARRAYS FROM LOW-RESOLUTION TARGETED BAC ARRAYS TOWARD WHOLE-GENOME HIGH-DENSITY OLIGONUCLEOTIDE ARRAYS

CGH was first reported by Kallioniemi et al. in 1992 to interrogate cancer genomic DNA with metaphase chromosomes as probes (8). The basic strategy is to label DNA from cancer cells and DNA from healthy reference cells with different fluorochromes and to cohybridize the labeled samples to a metaphase spread from a healthy reference cell. The ratio of the intensities of the 2 fluorochromes reflects the copy number differences between cancer cells and healthy cells. Because this technique uses metaphase chromosomes as probes for hybridization, it is now also called "chromosome CGH." In essence, chromosome CGH is similar to FISH painting with labeled DNA from the entire genome as the probe set. It permits a genome-wide survey and identifies genomic imbalances at different regions of specific chromosomes. Although chromosome CGH has proven effective in detecting larger genomic imbalances in the cancer genome, its detection power is ultimately limited by the resolution of a metaphase chromosome. The major breakthrough in CGH came from microarray technology about a decade ago. Instead of hybridizing labeled genomic targets to metaphase chromosomes, the new scheme uses cloned genomic DNA as probes in a microarray format. Because these probes contain sequence information that permits their specific localization in the human genome, regions with a genomic imbalance can be delineated by data-visualization software that displays all of the probes along the genome. The resolution of microarray-based CGH, which is determined by the density and size of the probes, is a substantial improvement over that of chromosome CGH. The first CGH microarray, termed matrix CGH, was developed in 1997 by Solinas-Toldo et al. with cosmid and plasmid artificial chromosome clones as probes (9). Subsequently, bacterial artificial chromosome (BAC) clones (10) and cDNA clones (11) were used to construct CGH microarrays.
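The 2-fluorochrome ratio described above is usually examined on a log2 scale, where an unchanged probe sits at 0 and, in a diploid sample, a single-copy loss is expected near log2(1/2) = −1 and a single-copy gain near log2(3/2) ≈ 0.58. The sketch below computes per-probe log2 ratios and then, because probes are ordered along the genome, flags runs of consecutive probes that deviate in the same direction, the idea behind multiprobe confirmation of a single event. The threshold of 0.3, the 3-probe minimum, and the intensities are hypothetical; clinical software uses dedicated segmentation methods (for example, circular binary segmentation) rather than this simple run rule.

```python
import math


def log2_ratios(test: list[float], reference: list[float]) -> list[float]:
    """Per-probe log2(test/reference) from the two fluorochrome intensities."""
    if len(test) != len(reference):
        raise ValueError("need one test and one reference intensity per probe")
    return [math.log2(t / r) for t, r in zip(test, reference)]


def call_runs(ratios: list[float], threshold: float = 0.3, min_probes: int = 3):
    """Report (first_probe, last_probe, 'loss' | 'gain') for runs of at least
    min_probes consecutive probes beyond +/- threshold (threshold must be > 0)."""
    calls = []
    start, state = 0, None
    for i, r in enumerate(list(ratios) + [0.0]):  # trailing 0.0 closes any open run
        s = "gain" if r >= threshold else "loss" if r <= -threshold else None
        if s != state:
            if state is not None and i - start >= min_probes:
                calls.append((start, i - 1, state))
            start, state = i, s
    return calls


# Hypothetical intensities for 12 probes ordered along a chromosome.
reference = [1000.0] * 12
test = [1000, 1000, 500, 500, 500, 1000, 1500, 1000, 1500, 1500, 1500, 1500]
ratios = log2_ratios(test, reference)
print([round(x, 2) for x in ratios])
print(call_runs(ratios))  # → [(2, 4, 'loss'), (8, 11, 'gain')]
```

The isolated high probe at position 6 is not reported because it lacks support from neighboring probes, which is why higher probe density improves both sensitivity and confidence.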
Subsequent improvements focused on increasing probe density and array coverage and on improving the signal-to-noise ratio. The most recent advance in CGH microarrays has been the use of oligonucleotide sequences as probes (12). Compared with BAC arrays, oligonucleotide arrays have several advantages, including:

1. Reproducibility. In BAC arrays, the probe content (i.e., the PCR product of a BAC clone) varies from batch to batch, whereas the probe sequences in oligonucleotide arrays are uniformly defined and devoid of highly repetitive sequences. Consequently, oligonucleotide arrays are more reproducible.
2. Sensitivity and specificity. The smaller interprobe spacing of oligonucleotide arrays offers a much higher probe density for better detection of smaller genomic imbalances and more accurate breakpoint mapping, and therefore much improved sensitivity over BAC arrays. Because oligonucleotide probes are selected from the reference human genome sequence, users can choose any sequence of interest as a potential target, providing a specificity that is impossible with BAC arrays, whose clones can be selected only from existing libraries and must be validated for their physical location.
3. Customization. Oligonucleotide probes are synthesized in situ on the arrays, allowing easy customization of content. In addition, many commercial manufacturers offer large numbers of preselected array CGH oligonucleotide probes and computer interfaces, making custom design and updating of CGH arrays feasible and fast for clinical laboratories, whereas BAC arrays are cumbersome to update and time-consuming to print.
4. Robustness and reliability. The substantially increased capacity of oligonucleotide arrays enables multiprobe confirmation of a single event (detection sensitivity and specificity are a function of the number of consecutive probes), and the higher signal-to-noise ratios provide greater confidence in CNV diagnosis (13–15).

IMPROVEMENT OF GENOTYPING ARRAYS FROM LOW-RESOLUTION SNP ARRAYS TO HIGH-RESOLUTION HYBRID ARRAYS THAT INTEGRATE SNP AND CNV PROBES

Whereas CGH arrays combine microarray technology with CGH, genotyping arrays for CNVs arose from high-density short-oligonucleotide arrays originally designed for sequencing (16 ), genotyping (17 ), and gene expression (18 ). These short-oligonucleotide arrays were developed by Affymetrix with its proprietary photolithographic technology (19 ). Since then, the density of SNP arrays has grown steadily, and chips designed to genotype 10 000, 50 000, 100 000, 500 000, and even more SNPs have progressively improved detection resolution. At about the same time that Agilent Technologies developed long-oligonucleotide-based whole-genome CGH arrays with its ink-jet technology (20 ), Affymetrix SNP arrays demonstrated their ability to detect changes in genomic copy number in addition to genotyping (21–23 ). Similarly, Illumina used a

Clinical Chemistry 55:4 (2009) 661

Table 1. Major commercial oligonucleotide array platforms and their current products.

Company | Array platform | Resolution (median probe spacing) | Probe number | Oligonucleotide probe type | Detection
Agilent Technologies, Santa Clara, CA | 4x44K CGH array | 43 kb | 43 000+ | 60-mer | CNV
 | 8x60K CGH array | 41.4 kb | 55 000+ | 60-mer | CNV
 | 2x105K CGH array | 21.7 kb | 99 000+ | 60-mer | CNV
 | 4x180K CGH array | 13 kb | 170 000+ | 60-mer | CNV
 | 244K CGH array | 8.9 kb | 236 000+ | 60-mer | CNV
 | 2x400K CGH array | 5.3 kb | 411 000+ | 60-mer | CNV
 | 1 Million CGH array | 2.1 kb | 963 000+ | 60-mer | CNV
Affymetrix, Santa Clara, CA | Genome-Wide Human SNP Array 6.0 | 0.7 kb | 906 600 SNP probes and 946 000 CNV probes | 25-mer | CNV, genotype, and LOHᵃ
Illumina, San Diego, CA | HumanCNV370-Quad DNA analysis BeadChip | 4.9 kb | 320 000 SNP probes and 60 000 non-SNP probes for CNVs | 50-mer | CNV, genotype, and LOH
 | Human610-Quad DNA analysis BeadChip | 2.7 kb | 550 000 SNP probes and 60 000 non-SNP probes for CNVs | 50-mer | CNV, genotype, and LOH
 | Human1M-Duo BeadChip | 1.5 kb | 1.1 × 10⁶ SNP and CNV probes targeting exons | 50-mer | CNV, genotype, and LOH
NimbleGen, Madison, WI | HG18 CGH 4x72K WG Tiling v2.0 | 40 kb | 72 000 | 50- to 75-mer | CNV
 | 385K WG Tiling, single array | 6.27 kb | 385 000/array | 50- to 75-mer | CNV
 | 385K WG Tiling, 4-set array | 1.57 kb | 385 000/array | 50- to 75-mer | CNV
 | 385K WG Tiling, 8-set array | 713 bp | 385 000/array | 50- to 75-mer | CNV

ᵃ LOH, loss of heterozygosity.

different manufacturing approach in commercializing BeadChips for genotyping and copy number analysis (24 ). Table 1 lists the currently available oligonucleotide-array platforms for genomic DNA profiling. Reduced manufacturing costs, unprecedented detection power, and the feasibility of custom design and updating have made these arrays attractive in basic and translational research. The major advance in genotyping arrays for genomic profiling is the development of hybrid genotyping arrays, i.e., the Affymetrix Genome-Wide Human SNP Array 5.0 and the latest version, 6.0, which combine SNP probes for genotyping with CNV probes for detecting changes in copy number (25 ). The SNP Array 5.0, the prototype of the hybrid array, contains 500 000 SNP probes for genotyping and 420 000 nonpolymorphic probes for CNV analysis. Of the nonpolymorphic probes, 320 000 were chosen to provide even spacing across the genome, concentrating on areas not represented by SNPs, and the remaining 100 000 covered 2000 known CNVs (50 probes per CNV). The SNP Array 6.0 has 906 600 SNP probes, 744 000 copy number probes evenly spaced along the genome, and another 202 000 probes that target 5700 previously reported CNV regions. Thus, a single 6.0 array offers a total of more than 1.8 million probes for simultaneous SNP genotyping, CNV analysis, and loss-of-heterozygosity detection. Although the Affymetrix SNP Array 6.0 is mainly a research tool at present, many diagnostic laboratories are actively validating this platform for clinical applications. Long-oligonucleotide CGH arrays have fewer probes than short-oligonucleotide genotyping arrays but generally offer a better signal-to-background dynamic range; the hybrid arrays, however, add rich genotyping information to their CNV data, which gives the platform additional value. For instance, the genotyping data simultaneously provide information regarding

Microarrays in Clinical Molecular Diagnostics

the parent of origin of a detected de novo CNV if parental samples have also been examined. The most important advantage of the hybrid arrays is their ability to detect copy number–neutral genomic changes, such as uniparental disomy (UPD), through loss-of-heterozygosity analysis. In the long run, the genotyping information can also be used for SNP-association analysis.

Translational Research and Applications of MGDP Technologies for Patient Samples

MGDP technologies have been used extensively in clinical and translational research to interrogate patient samples. These studies provided crucial information about clinical utility before the technologies were applied in diagnostics. A recent review by Stankiewicz and Beaudet (26 ) summarized the applications of microarrays for CGH, primarily low-resolution or targeted BAC arrays, in the evaluation of patients with dysmorphic features, developmental delay, and/or idiopathic mental retardation. Data published with different array platforms and patient cohorts indicate an overall detection rate of pathogenic genomic imbalance of 12%–18% in patients with multiple congenital anomalies and/or developmental delay/mental retardation. G-banded karyotyping detects the imbalance in only 3%–5% of such patients, and microarray-based tests detect it in an additional 9%–13% (26 ). As expected, the detection rate is higher with whole-genome tiling arrays than with targeted arrays. This series of studies consistently confirmed the superior sensitivity and detection power of array CGH compared with karyotyping and FISH analysis. As mentioned above, the most important advance in array CGH has been the replacement of BACs with oligonucleotides. Short-oligonucleotide arrays have likewise been validated against, and found superior to, BAC arrays.

INTERROGATING SAMPLES FROM PATIENTS AND HEALTHY INDIVIDUALS WITH OLIGONUCLEOTIDE ARRAYS

Oligonucleotide CGH arrays became commercially available in 2004, and oligonucleotide platforms for genomic profiling are now offered by Agilent Technologies (20 ), NimbleGen (27 ), Affymetrix (28 ), and Illumina (24 ) (Table 1). In-house spotted-oligonucleotide arrays (29 ) and custom-designed oligonucleotide CGH arrays (13, 14, 30–32 ) have also shown high sensitivity and reproducible detection of genomic imbalance in samples from patients and healthy individuals. Consequently, many laboratories have shifted from BAC arrays to oligonucleotide arrays (15 ), and some diagnostic laboratories that provide large-scale services now use oligonucleotide arrays for clinical diagnostics.

The initial implementation of oligonucleotide CGH arrays for clinical applications focused on targeted pathogenic genomic regions. Shen et al. used Agilent custom arrays to develop targeted oligonucleotide arrays that interrogate 179 clinically relevant genomic loci (14 ). The results of these multiplex arrays were 100% concordant with BAC-array findings, indicating very high sensitivity, and the oligonucleotide-array system also reliably detected smaller genomic imbalances that BAC arrays could not detect (14 ). Similarly, Ou et al. emulated BAC arrays with oligonucleotide probes, creating an array of 44 000 oligonucleotides that showed an enhanced ability to detect copy number changes compared with BAC arrays (13 ). Beyond showing the superior analytical sensitivity of oligonucleotide arrays, these studies also demonstrated how convenient oligonucleotide arrays are to design and manufacture. Aradhya and Cherry used samples from 20 patients with dysmorphic features, developmental delay, or mental retardation to directly compare the performance of whole-genome BAC arrays of 1-Mb resolution (Spectral Chip 2600; PerkinElmer) with that of Agilent whole-genome oligonucleotide CGH arrays of 35-kb resolution (33 ). All of these patients had a normal karyotype. The oligonucleotide arrays detected 10 clinically important genomic-imbalance events, whereas the BAC arrays detected only 6, demonstrating the superior detection power of the oligonucleotide arrays. Because BAC clones are large and CNVs are widespread in the genome, BAC arrays offer very low specificity for detecting CNVs. Oligonucleotide arrays, by contrast, avoid repeat sequences because their probes are selected to exclude nonspecific regions.
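The repeat-avoidance principle can be illustrated with a minimal probe-selection sketch. It assumes a soft-masked reference sequence (repeat-annotated bases in lowercase, the convention of RepeatMasker-style FASTA files), uses the 60-mer length listed for Agilent arrays in Table 1, and applies a GC-content window and tiling step that are illustrative assumptions, not any manufacturer's actual design rules; `select_probes` and its thresholds are hypothetical.

```python
from typing import List, Tuple

PROBE_LEN = 60                # 60-mer, the Agilent probe length listed in Table 1
GC_MIN, GC_MAX = 0.30, 0.70   # illustrative GC-content window (assumption)


def gc_fraction(seq: str) -> float:
    """Fraction of G or C bases in an uppercase DNA sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def select_probes(soft_masked: str, step: int = 20) -> List[Tuple[int, str]]:
    """Tile candidate probes along a soft-masked sequence (repeats in
    lowercase) and keep only candidates that avoid nonspecific regions.

    A candidate is rejected if it overlaps a repeat-masked base, spans an
    assembly gap (N), or has GC content outside the chosen window.
    Returns (start offset, probe sequence) pairs.
    """
    selected = []
    for start in range(0, len(soft_masked) - PROBE_LEN + 1, step):
        window = soft_masked[start:start + PROBE_LEN]
        if any(base.islower() for base in window):
            continue  # overlaps an annotated repeat: likely to cross-hybridize
        if "N" in window:
            continue  # unresolved reference sequence
        if not GC_MIN <= gc_fraction(window) <= GC_MAX:
            continue  # extreme GC content tends to hybridize poorly
        selected.append((start, window))
    return selected


if __name__ == "__main__":
    # Toy sequence: unique sequence flanking a 40-base repeat-masked block.
    toy = "ACGT" * 20 + "acgt" * 10 + "ACGT" * 20
    for start, probe in select_probes(toy):
        print(start, probe[:12] + "...")
```

On the toy sequence, only windows lying entirely in the uppercase flanks are kept, and any candidate touching the masked block is discarded. Real probe design also screens candidates for uniqueness against the whole genome and for predicted hybridization behavior, which this sketch omits.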
With the aim of better capturing the frequent subtelomeric imbalances associated with developmental delay/mental retardation, custom-designed oligonucleotide CGH arrays were developed with improved subtelomeric coverage. Adopting a design concept used earlier in BAC arrays (34 ), Toruner et al. (32 ) randomly removed one third of the probes from an off-the-shelf Agilent whole-genome CGH array of 44 000 probes and replaced them with 14 000 subtelomeric probes. The resulting arrays provided 5-kb resolution in subtelomeres and 125-kb resolution in the remaining genome. The oligonucleotide CGH array-based molecular ruler (31 ), another design that emerged at nearly the same time, provided 50-kb resolution for subtelomeric regions and 75-kb resolution for the rest of the genome. In clinical samples, this array detected subtelomeric genomic imbalances in 10.9% of cases and pathologic imbalances elsewhere in the genome in 4.7% (30 ). These custom designs provide

Table 2. Exon-level DNA-profiling arrays.

Reference | CGH array platform | Target gene(s)ᵃ | Specific probe no. | Resolution
Rouleau et al. (39 ) | Agilent 11K array | BRCA1 | 1679 | NA
del Gaudio et al. (40 ) | Agilent 44K array | DMD | 8769 | NA
Staaf et al. (41 ) | Agilent 4x44K array | BRCA1, BRCA2, MSH2, MLH1, PTEN, and CDKN2A | 9612 | <500 bp, partial exon
Saillour et al. (42 ) | NimbleGen tiling array | CFTR, SGCA, SGCB, SGCD, SGCE, SGCZ, and DMD | 72 500 | Single exon and mosaic
Hegde et al. (43 ) | NimbleGen tiling array | DMD | 385 747 (mean spacing, 5 bp) | Single exon and mosaic
Wong et al. (44 ) | Agilent 44K array | 130 nuclear genes involved in metabolic and mitochondrial disorders | ≈44 000 (mean spacing, 250–300 bp) | Single exon and mosaic

ᵃ BRCA1, breast cancer 1, early onset; BRCA2, breast cancer 2, early onset; MSH2, mutS homolog 2, colon cancer, nonpolyposis type 1 (E. coli); MLH1, mutL homolog 1, colon cancer, nonpolyposis type 2 (E. coli); PTEN, phosphatase and tensin homolog; CDKN2A, cyclin-dependent kinase inhibitor 2A (melanoma, p16, inhibits CDK4); CFTR, cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7); SGCA, sarcoglycan, alpha (50kDa dystrophin-associated glycoprotein); SGCB, sarcoglycan, beta (43kDa dystrophin-associated glycoprotein); SGCD, sarcoglycan, delta (35kDa dystrophin-associated glycoprotein); SGCE, sarcoglycan, epsilon; SGCZ, sarcoglycan, zeta.
NA, data not available.

enhanced coverage of clinically relevant regions, together with adequate coverage of the rest of the genome, allowing better measurement of the sizes of imbalance events.

Fewer clinical applications have been studied with genotyping arrays than with oligonucleotide CGH arrays. Friedman et al. used Affymetrix 100K SNP arrays to investigate genomic imbalance (28 ). They studied 8 control trios, each consisting of unaffected parents and an affected child with a previously recognized chromosomal abnormality or UPD. The Affymetrix array detected the known abnormalities in all control cases, including 4 cases of UPD. It also detected de novo deletions as small as 178 kb among 100 patients with idiopathic mental retardation and a normal karyotype. Ming et al. used the 50K Xba chip from the Affymetrix 100K array set in a similar study of patients with multiple congenital anomalies (35 ). Although both studies illustrated the sensitivity and detection power of Affymetrix arrays for investigating genomic imbalance, the analytical sensitivity and cutoff for detecting copy number–neutral UPD have yet to be firmly established. Flexible probe selection makes it easy to design oligonucleotide arrays that detect exon-level deletions and duplications in specific genes (Table 2). Although PCR products for NF1⁶ (neurofibromin 1) (36 ), NF2 [neurofibromin 2 (merlin)] (37 ), and other large, clinically important genes (38 ) have been used to

6 Human genes: NF1, neurofibromin 1; NF2, neurofibromin 2 (merlin); PARK2, Parkinson disease (autosomal recessive, juvenile) 2, parkin; DMD, dystrophin.

develop exon-level microarrays, current oligonucleotide-array manufacturing technology has simplified the process and made it accessible to ordinary laboratories. These custom-designed, target gene–specific arrays detect duplications as effectively as deletions at the single-exon level, with more precise breakpoint mapping (39–44 ). Targeting panels of genes associated with a single disease at the exon level is therefore attractive for molecular diagnostics.

GENOMIC IMBALANCES IN AUTISM SPECTRUM DISORDERS

Autism spectrum disorders (ASDs) are complex, pervasive developmental disorders characterized by impairments in communication, social interaction, and behavior. Twin studies have suggested a strong genetic component to ASD. Thus far, few contributory genes have been identified, but cytogenetic changes have been among the most consistently identifiable causes of autism. Changes detectable by high-resolution G-banding have been reported in 3%–5% of patients with autism (45 ), involving chromosomal regions 2q37, 5p15, 11q25, 16q22.3, 17p11.2, 18q21.1, 18q23, 22q11.2, 22q13.3, and Xp22.2p22.3 (46 ). MGDP is a powerful new tool for exploring the complex genetics of ASD. Jacquemont et al. (47 ) used whole-genome 1-Mb BAC CGH arrays to interrogate 29 patients with syndromic autism and normal karyotypes and found that more than a quarter of the patients carried de novo pathogenic copy number changes. The imbalances ranged in size from 2 Mb to 17.3 Mb. This study highlighted the effectiveness of array CGH in the genetic analysis of patients with autism. It

Table 3. Detection rate of clinically relevant CNVs in ASDs by array technologies.

Reference | Array platforms | Actual resolution | Patient cohort | De novo CNV detection rate
Jacquemont et al. (47 ) | BAC CGH array | 1 Mb | 29 syndromic autism | 27.5% (8 of 29)
Sebat et al. (48 ) | Oligonucleotide CGH array | 35 kb | 118 sporadic autism | 10.2% (12 of 118)
 | | | 77 multiplexᵃ familial autism | 3% (2 of 77)
Szatmari et al. (49 ) | Affymetrix 10K SNP array | 476 kb | 173 families, simplex | 5.8% (10 of 173)
Weiss et al. (50 ) | Affymetrix 5.0 array | 30 kb | 751 AGRE families | 6.7% (50 of 751)
Christian et al. (51 ) | BAC CGH array | Tiling 19K set | 397 AGRE samples | 2.3% (9 of 397)
Marshall et al. (52 ) | Affymetrix 500K SNP array | 75 kb | 427 ASD families | 6.3% mean (27 of 427); 7.1% (4 of 56) in simplex families; 2% (1 of 49) in multiplex families
Morrow et al. (56 ) | Affymetrix 500K SNP array and targeted BAC CGH array | 75 kb | 42 consanguineous samples, multiplex | 0
 | | | 52 consanguineous samples, simplex | 1.9% (1 of 52)

ᵃ Multiplex, family with ≥2 ASD patients; simplex, family with 1 ASD patient; AGRE, Autism Genetic Resource Exchange.

also demonstrated that extensive genomic imbalances are an important underlying cause of syndromic ASD. In another study, which used much higher-resolution oligonucleotide arrays, Sebat et al. (48 ) revealed the importance of de novo CNVs as genetic risk factors for sporadic ASD. Subsequent studies have provided further evidence that copy number changes are strongly associated with idiopathic ASD (49–55 ). Array technology has detected pathogenic CNVs in 5.8%–10.2% of patients with sporadic ASD (Table 3), whereas the rate is consistently low in patients with familial ASD (2%–3%). This finding suggests 2 different genetic mechanisms in ASD: sporadic ASDs are more likely to be caused by de novo deletions or duplications, whereas familial ASDs are more likely to be due to other types of inherited mutations. Array studies of consanguineous families further demonstrated the existence of recessive mutations, most of which are not genomic deletions or duplications (56 ). Collectively, these studies demonstrate the clinical value of genomic profiling in the evaluation of autism, and the detection yield of MGDP is clearly higher and more consistent than that of traditional cytogenetic techniques (57 ).

Clinical Validation and Diagnostic Applications of MGDP Technologies

CLINICAL VALIDATION—SENSITIVITY AND SPECIFICITY

As for any new technology, systematic validation is required before MGDP can be used for routine clinical applications (14, 58, 59 ). The validation process should test all aspects of the new technology, most importantly its sensitivity and specificity. These are usually assessed with a cohort of positive samples carrying known mutations that have been identified with a well-accepted technology; for MGDP, results from karyotyping and FISH studies are often used. Typically, sensitivity testing examines the false-negative rate, and specificity testing examines the false-positive rate; however, several aspects of MGDP validation are unique. Microarray technology is analog in nature: a change in genomic copy number is reflected as a change in signal intensity or a color shift (red vs green) of the microarray features (probe spots). The first critical step is to define the analytical resolution, i.e., the size of CNV that can be detected reliably. Even assuming that the feature-analysis software is well established for a given array platform (not necessarily the case), CNV calling depends on the algorithm used. Because array platforms and probe densities differ, the analytical resolution must be defined specifically for each array. Sensitivity and specificity also depend on the number of consecutive probes required to call an event: calls supported by more consecutive probes are more reliable, but requiring more probes raises the minimum size of CNV that can be detected. In addition, the signal-to-background ratio for a deletion or duplication varies with the array platform, the labeling chemistry, and the hybridization protocol, so an appropriate threshold must be defined for each clinical-array design. For example,


Table 4. Clinical utility of diagnostic genomic-profiling microarrays.

a

Reference

Array platform

Shaffer et al. (65 )

Signature Genomic targeted BAC CGH array

Lu et al. (66 )

Baylor CMA V4 targeted BAC CGH array

Sample size, n

GIᵃ/CNV size

pCNV

bCNV

uCNV

≥200 kb

6.90%

1.20%

3.90%

775

≥1 BAC clone

7.60%

Baylor CMA V5 targeted BAC CGH array

1738

≥1 BAC clone

8.90%

Pickering et al. (67 )

Spectral Genomics Spectral Chip 2600 and Constitutional Chip

1176

≥1 BAC clone

9.86%

Aston et al. (68 )

Spectral Genomics Spectral Chip 2600 and Constitutional Chip

669

≥2 BAC clones

Baldwin et al. (30 )

Targeted plus whole-genome oligo CGH array

211

>500 kb

Shen et al. (14 )

Agilent 2x11K focused oligo CGH array

211

Fan et al. (69 )

Agilent 44K whole-genome oligo CGH array

100

Xiang et al. (70 )

Agilent 44K whole-genome oligo CGH array

50

8789

10%

0.60%

8.1%

1.70%

3.99%

10.8%

NA

15.64% (10.90%, targeted; 4.74%, backbone)

8.53% (6.16%, 1.90% complex; 2.36% familial)

≥3 consecutive probes, smallest 23 kb

11.9%

4.26%

≥3 consecutive probes, smallest 17 kb

15%

>500 kb

6.00%

0.72/case 12.00%

GI, genomic imbalance; CMA, chromosomal microarray analysis; NA, data not available; oligo, oligonucleotide.

chromosomal mosaicism is not rare in patients with birth defects and mental retardation. In BAC-array studies, 8% of abnormal findings were mosaic, but array platforms differ in their sensitivity for detecting mosaicism (60–62 ). Because microarray technology has much higher resolution than karyotyping and FISH, its analytical resolution cannot be assessed with karyotyping or FISH, and in many cases arrays have identified additional CNVs that these methods cannot detect. Therefore, technologies that cover smaller-scale mutations, such as MLPA, real-time PCR, or even PCR/sequencing, should be used for validation (14 ).

CLINICAL VALIDATION—PLATFORM PITFALLS

It is also necessary to check for clinically important regions that the array fails to cover. For example, some commercial arrays do not cover the pseudoautosomal regions of chromosomes X and Y. Conversely, poorly performing probes (mismapped or poorly hybridizing probes) should be removed from the array (59 ). Removal is particularly necessary in BAC arrays, because a single mismapped or poorly hybridizing clone will appreciably affect detection sensitivity and specificity. Most of these poorly hybridizing BAC clones likely contain low-copy repeats, so low-copy repeats are more of a problem for BAC probes than for oligonucleotide probes; poorly performing oligonucleotide probes, by contrast, may contain secondary structure. Identifying and removing such bad probes improves array performance considerably (14 ). Although such probes cannot be removed from commercial chips, posthybridization data analysis (data filtering) offers an alternative solution. It has been suggested that a minimum of 30 abnormal samples be used for clinical validation (58 ); however, the regions and sizes of the genomic imbalances covered by the abnormal cases are more important than their actual number. Ideally, every probe should be evaluated with abnormal samples. For this purpose, aneuploid cell lines and samples with well-defined genomic imbalances of different sizes are very valuable.

UNDERSTANDING CNVs

The most important issue in interpreting arrays for clinical genetic testing is defining the nature of the identified CNVs (63 ). In general, CNVs can be classified into 3 major groups: CNVs with established pathogenicity (pCNVs), i.e., deletions/duplications that overlap regions associated with defined genomic disorders or that disrupt known disease genes; CNVs of a benign nature (bCNVs), i.e., deletions/duplications repeatedly observed in nonpathologic individuals with no evidence of an association with any particular clinical phenotype; and CNVs of unknown importance (uCNVs), i.e., deletions/duplications without an established association with any genomic disorder and involving genes of unclear clinical importance [(14, 30, 65–70 ), Table 4]. Even when bCNVs and uCNVs are intentionally avoided during array design, a large percentage of the identified imbalances are often bCNVs or uCNVs, because a substantial number of novel CNVs are still being discovered (Table 4). We anticipate that many CNVs of currently unknown importance (uCNVs) will eventually be confirmed as bCNVs as more precise knowledge about their locations, distributions, and frequencies in healthy populations becomes available. Meanwhile, accumulating evidence from clinical (phenotyping), genetic (inheritance and segregation), and functional (gene content and expression) studies may reveal some of these uCNVs to be pathogenic. We believe that uCNVs are a rich source for identifying novel genetic risk factors for a variety of medical conditions. Our understanding of CNVs will certainly improve with systematic CNV-discovery efforts (64 ), but continued studies of genotype–phenotype association are also essential. Public databases such as DECIPHER (https://decipher.sanger.ac.uk/) represent such an effort. Many novel CNVs are also being discovered in clinical samples with high-resolution whole-genome arrays, and databases derived from clinical samples are a rich source for evaluating genotype–phenotype correlations. Eventually, this knowledge will make it possible to design arrays that minimize the detection of bCNVs and maximize coverage of clinically important regions.

CLINICAL-DIAGNOSTIC UTILITY

The first-generation arrays designed for clinical use were targeted BAC CGH arrays developed in house at the Baylor College of Medicine (34 ) and Signature Genomic Laboratories (71 ). Both arrays were designed to cover regions of known genomic disorders, as well as all subtelomeric regions. Array-CGH and FISH results showed 100% concordance, and array CGH also detected changes missed by prior karyotyping or FISH. A similar design, the Constitutional Chip by Spectral Genomics (72 ), was also available commercially; it additionally included backbone clones covering the rest of the genome at intervals of about 1 Mb. Since then, more BAC clones (from 589 to >4600) have been selected to cover additional clinically relevant regions (approximately 40–150 recognized genomic disorders/syndromes), improving detection rates from 5.6% in the first-generation BAC arrays to 10.8% in later arrays (68, 73 ).

The clinical utility of oligonucleotide arrays has not yet been evaluated in large clinical studies; however, current data indicate that their detection rate is higher than that of BAC arrays (Table 4). High-resolution whole-genome oligonucleotide arrays have identified small novel microdeletions and microduplications in 16p11.2 that are associated with mental retardation/autism (50, 53 ), have helped to circumscribe critical regions such as 17q21.31 (74 ), and have identified intragenic deletions/duplications in large genes such as PARK2 [Parkinson disease (autosomal recessive, juvenile) 2, parkin] and DMD (dystrophin) (Y. Shen et al., unpublished data). Common clinical indications for MGDP-based diagnostic testing include developmental delay, multiple congenital anomalies, dysmorphic features, unexplained mental retardation, seizure disorders, ASDs, and other neurologic/psychiatric disorders such as schizophrenia, as well as prenatal diagnosis, including spontaneous abortion or fetal demise.

Conclusion and Future Development

Microarray-based clinical molecular diagnostic technologies such as MGDP have moved beyond the stage of “coordinated and concurrent FISH analysis” (75 ) and now extend into loss-of-heterozygosity testing and possibly methylation profiling. Current MGDP technologies can detect genomic imbalances over a wide range of sizes, down to a limit set by probe density, but they do not reveal positional or orientational information; balanced rearrangements such as reciprocal translocations and inversions, which leave copy number unchanged, therefore escape detection. Positional information has largely been provided by FISH analysis, which is also instrumental in testing parental samples for translocations when a CNV identified in a proband appears to be de novo. MGDP, however, has much better resolution than FISH. The field would benefit immensely from a tool that combines the localization capability of FISH with the resolution of whole-genome oligonucleotide arrays.
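Why copy-number data carry no positional information can be shown with a toy model, offered as an illustrative sketch rather than a model of any real genome: each chromosome is an ordered list of labeled segments, and a copy-number array is treated as something that only counts segments. The genome, segment labels, and function names below are hypothetical.

```python
from collections import Counter
from typing import Dict, List

Genome = Dict[str, List[str]]  # chromosome name -> ordered segment labels


def copy_numbers(genome: Genome) -> Counter:
    """What a copy-number array measures in this model: how many copies of
    each segment exist, regardless of which chromosome carries them or in
    what order."""
    return Counter(seg for segments in genome.values() for seg in segments)


def _copy(genome: Genome) -> Genome:
    return {name: list(segments) for name, segments in genome.items()}


def reciprocal_translocation(genome: Genome, a: str, b: str,
                             cut_a: int, cut_b: int) -> Genome:
    """Balanced exchange of the distal parts of chromosomes a and b."""
    new = _copy(genome)
    new[a] = genome[a][:cut_a] + genome[b][cut_b:]
    new[b] = genome[b][:cut_b] + genome[a][cut_a:]
    return new


def inversion(genome: Genome, chrom: str, start: int, end: int) -> Genome:
    """Reverse the order of segments start..end-1 on one chromosome."""
    new = _copy(genome)
    new[chrom][start:end] = reversed(new[chrom][start:end])
    return new


def deletion(genome: Genome, chrom: str, start: int, end: int) -> Genome:
    """Remove segments start..end-1 from one chromosome (an imbalance)."""
    new = _copy(genome)
    del new[chrom][start:end]
    return new


if __name__ == "__main__":
    # Hypothetical diploid toy genome: two homologs each of chromosomes A and B.
    normal = {
        "A1": ["a1", "a2", "a3", "a4"], "A2": ["a1", "a2", "a3", "a4"],
        "B1": ["b1", "b2", "b3"], "B2": ["b1", "b2", "b3"],
    }
    events = {
        "balanced translocation": reciprocal_translocation(normal, "A1", "B1", 2, 1),
        "inversion": inversion(normal, "A1", 1, 3),
        "deletion": deletion(normal, "A1", 1, 2),
    }
    baseline = copy_numbers(normal)
    for label, genome in events.items():
        cn = copy_numbers(genome)
        changed = {s: cn[s] for s in baseline if cn[s] != baseline[s]}
        print(f"{label}: copy-number changes = {changed or 'none'}")
```

Running the script reports no copy-number change for the balanced translocation or the inversion, and a single remaining copy of segment a2 for the deletion, which mirrors why arrays detect imbalances but miss balanced rearrangements.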
Advanced next-generation sequencing platforms may eventually provide digital profiling that detects copy number and positional alterations simultaneously. DNA diagnostic profiling may therefore soon extend beyond microarray-based technology, from analog to digital analysis.
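A minimal sketch of read-depth copy-number estimation shows what such digital profiling involves: reads are counted in fixed genomic bins, each sample is normalized by its total read count, and the ratio to a diploid reference is scaled to a copy number. The bin counts are invented toy values, the function names are hypothetical, and real read-depth methods also correct for GC content and mappability, which this sketch does not.

```python
import math
from typing import List, Optional


def estimate_copy_number(sample_counts: List[int],
                         reference_counts: List[int],
                         ploidy: int = 2) -> List[float]:
    """Per-bin copy-number estimate from read depth.

    Each bin count is divided by the library's total read count so that
    samples sequenced to different depths are comparable; the
    sample/reference ratio is then scaled by the reference ploidy.
    """
    if len(sample_counts) != len(reference_counts):
        raise ValueError("sample and reference must use the same bins")
    sample_total = sum(sample_counts)
    reference_total = sum(reference_counts)
    if sample_total == 0 or reference_total == 0:
        raise ValueError("both libraries need at least one read")
    estimates = []
    for s, r in zip(sample_counts, reference_counts):
        if r == 0:
            estimates.append(math.nan)  # no reference reads: bin cannot be assessed
            continue
        ratio = (s / sample_total) / (r / reference_total)
        estimates.append(ploidy * ratio)
    return estimates


def call_integer_copy_number(estimates: List[float]) -> List[Optional[int]]:
    """Naive caller: round each estimate to the nearest integer copy number."""
    return [None if math.isnan(e) else round(e) for e in estimates]


if __name__ == "__main__":
    # Toy bins: the sample was sequenced to twice the reference depth and
    # carries a one-copy deletion in bin 3 and a one-copy gain in bin 4.
    reference = [100, 100, 100, 100]
    sample = [200, 200, 100, 300]
    est = estimate_copy_number(sample, reference)
    print(est)                            # [2.0, 2.0, 1.0, 3.0]
    print(call_integer_copy_number(est))  # [2, 2, 1, 3]
```

Normalizing by the total read count assumes that most of the genome is copy-neutral; a large imbalance in the sample shifts the total and slightly biases every bin, which is why practical methods often normalize to the median bin instead.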

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: Y. Shen is the recipient of a Young Investigator Award from the Children’s Tumor Foundation, and B.-L. Wu is the recipient of an NIH-CDC CETT Program grant and a Fudan Scholar Research Award.
Expert Testimony: None declared.

Role of Sponsor: The funding organizations played a direct role, in part, in the design of the study, the interpretation of the data, and the preparation of the manuscript.

Acknowledgments: We thank Orah S. Platt of Children’s Hospital Boston, James F. Gusella of Massachusetts General Hospital, and Li-Jun Ma of the Broad Institute of MIT & Harvard for their critical reading of the manuscript and for thoughtful discussion.

References

1. Kondrashov AS. Direct estimates of human per nucleotide mutation rates at 20 loci causing Mendelian diseases. Hum Mutat 2003;21:12–27.
2. van Ommen GJ. Frequency of new copy number variation in humans. Nat Genet 2005;37:333–4.
3. Hassold T, Hunt P. To err (meiotically) is human: the genesis of human aneuploidy. Nat Rev Genet 2001;2:280–91.
4. Feuk L, Carson AR, Scherer SW. Structural variation in the human genome. Nat Rev Genet 2006;7:85–97.
5. Iafrate AJ, Feuk L, Rivera MN, Listewnik ML, Donahoe PK, Qi Y, et al. Detection of large-scale variation in the human genome. Nat Genet 2004;36:949–51.
6. Redon R, Ishikawa S, Fitch KR, Feuk L, Perry GH, Andrews TD, et al. Global variation in copy number in the human genome. Nature 2006;444:444–54.
7. Sebat J, Lakshmi B, Troge J, Alexander J, Young J, Lundin P, et al. Large-scale copy number polymorphism in the human genome. Science 2004;305:525–8.
8. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science 1992;258:818–21.
9. Solinas-Toldo S, Lampel S, Stilgenbauer S, Nickolenko J, Benner A, Dohner H, et al. Matrix-based comparative genomic hybridization: biochips to screen for genomic imbalances. Genes Chromosomes Cancer 1997;20:399–407.
10. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat Genet 1998;20:207–11.
11. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, et al. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat Genet 1999;23:41–6.
12. Lucito R, Healy J, Alexander J, Reiner A, Esposito D, Chi M, et al. Representational oligonucleotide microarray analysis: a high-resolution method to detect genome copy number variation. Genome Res 2003;13:2291–305.
13. Ou Z, Kang SH, Shaw CA, Carmack CE, White LD, Patel A, et al. Bacterial artificial chromosome-emulation oligonucleotide arrays for targeted clinical array-comparative genomic hybridization analyses. Genet Med 2008;10:278–89.
14. Shen Y, Irons M, Miller DT, Cheung SW, Lip V, Sheng X, et al. Development of a focused oligonucleotide-array comparative genomic hybridization chip for clinical diagnosis of genomic imbalance. Clin Chem 2007;53:2051–9.
15. Ylstra B, van den Ijssel P, Carvalho B, Brakenhoff RH, Meijer GA. BAC to the future! or oligonucleotides: a perspective for micro array comparative genomic hybridization (array CGH). Nucleic Acids Res 2006;34:445–50.
16. Pease AC, Solas D, Sullivan EJ, Cronin MT, Holmes CP, Fodor SP. Light-generated oligonucleotide arrays for rapid DNA sequence analysis. Proc Natl Acad Sci U S A 1994;91:5022–6.
17. Lipshutz RJ, Morris D, Chee M, Hubbell E, Kozal MJ, Shah N, et al. Using oligonucleotide probe arrays to access genetic diversity. Biotechniques 1995;19:442–7.
18. Lockhart DJ, Dong H, Byrne MC, Follettie MT, Gallo MV, Chee MS, et al. Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat Biotechnol 1996;14:1675–80.
19. Fodor SP, Read JL, Pirrung MC, Stryer L, Lu AT, Solas D. Light-directed, spatially addressable parallel chemical synthesis. Science 1991;251:767–73.
20. Barrett MT, Scheffer A, Ben-Dor A, Sampas N, Lipson D, Kincaid R, et al. Comparative genomic hybridization using oligonucleotide microarrays and total genomic DNA. Proc Natl Acad Sci U S A 2004;101:17765–70.
21. Bignell GR, Huang J, Greshock J, Watt S, Butler A, West S, et al. High-resolution analysis of DNA copy number using oligonucleotide microarrays. Genome Res 2004;14:287–95.
22. Huang J, Wei W, Zhang J, Liu G, Bignell GR, Stratton MR, et al. Whole genome DNA copy number changes identified by high density oligonucleotide arrays. Hum Genomics 2004;1:287–99.
23. Zhao X, Li C, Paez JG, Chin K, Janne PA, Chen TH, et al. An integrated view of copy number and allelic alterations in the cancer genome using single nucleotide polymorphism arrays. Cancer Res 2004;64:3060–71.
24. Peiffer DA, Le JM, Steemers FJ, Chang W, Jenniges T, Garcia F, et al. High-resolution genomic profiling of chromosomal aberrations using Infinium whole-genome genotyping. Genome Res 2006;16:1136–48.
25. McCarroll SA, Kuruvilla FG, Korn JM, Cawley S, Nemesh J, Wysoker A, et al. Integrated detection and population-genetic analysis of SNPs and copy number variation. Nat Genet 2008;40:1166–74.
26. Stankiewicz P, Beaudet AL. Use of array CGH in the evaluation of dysmorphology, malformations, developmental delay, and idiopathic mental retardation. Curr Opin Genet Dev 2007;17:182–92.
27. Selzer RR, Richmond TA, Pofahl NJ, Green RD, Eis PS, Nair P, et al. Analysis of chromosome breakpoints in neuroblastoma at sub-kilobase resolution using fine-tiling oligonucleotide array CGH. Genes Chromosomes Cancer 2005;44:305–19.
28. Friedman JM, Baross A, Delaney AD, Ally A, Arbour L, Armstrong L, et al. Oligonucleotide microarray analysis of genomic imbalance in children with mental retardation. Am J Hum Genet 2006;79:500–13.
29. Carvalho B, Ouwerkerk E, Meijer GA, Ylstra B. High resolution microarray comparative genomic hybridisation analysis using spotted oligonucleotides. J Clin Pathol 2004;57:644–6.
30. Baldwin EL, Lee JY, Blake DM, Bunke BP, Alexander CR, Kogan AL, et al. Enhanced detection of clinically relevant genomic imbalances using a targeted plus whole genome oligonucleotide microarray. Genet Med 2008;10:415–29.
31. Martin CL, Nawaz Z, Baldwin EL, Wallace EJ, Justice AN, Ledbetter DH. The evolution of molecular ruler analysis for characterizing telomere imbalances: from fluorescence in situ hybridization to array comparative genomic hybridization. Genet Med 2007;9:566–73.
32. Toruner GA, Streck DL, Schwalb MN, Dermody JJ. An oligonucleotide based array-CGH system for detection of genome wide copy number changes including subtelomeric regions for genetic evaluation of mental retardation. Am J Med Genet A 2007;143A:824–9.
33. Aradhya S, Cherry AM. Array-based comparative genomic hybridization: clinical contexts for targeted and whole-genome designs. Genet Med 2007;9:553–9.
34. Cheung SW, Shaw CA, Yu W, Li J, Ou Z, Patel A, et al. Development and validation of a CGH microarray for clinical cytogenetic diagnosis. Genet Med 2005;7:422–32.
35. Ming JE, Geiger E, James AC, Ciprero KL, Nimmakayalu M, Zhang Y, et al. Rapid detection of submicroscopic chromosomal rearrangements in children with multiple congenital anomalies using high density oligonucleotide arrays. Hum Mutat 2006;27:467–73.
36.
37.
Mantripragada KK, Thuresson AC, Piotrowski A, Diaz de Stahl T, Menzel U, Grigelionis G, et al. Identification of novel deletion breakpoints bordered by segmental duplications in the NF1 locus using high resolution array-CGH. J Med Genet 2006;43:28 –38. Mantripragada KK, Buckley PG, Jarbo C, Menzel

Reviews

Microarrays in Clinical Molecular Diagnostics

38.

39.

40.

41.

42.

43.

44.

45.

46.

47.

48.

U, Dumanski JP. Development of NF2 gene specific, strictly sequence defined diagnostic microarray for deletion detection. J Mol Med 2003;81: 443–51. Dhami P, Coffey AJ, Abbs S, Vermeesch JR, Dumanski JP, Woodward KJ, et al. Exon array CGH: detection of copy-number changes at the resolution of individual exons in the human genome. Am J Hum Genet 2005;76:750 – 62. Rouleau E, Lefol C, Tozlu S, Andrieu C, Guy C, Copigny F, et al. High-resolution oligonucleotide array-CGH applied to the detection and characterization of large rearrangements in the hereditary breast cancer gene BRCA1. Clin Genet 2007; 72:199 –207. del Gaudio D, Yang Y, Boggs BA, Schmitt ES, Lee JA, Sahoo T, et al. Molecular diagnosis of Duchenne/Becker muscular dystrophy: enhanced detection of dystrophin gene rearrangements by oligonucleotide array-comparative genomic hybridization. Hum Mutat 2008;29:1100 –7. Staaf J, Torngren T, Rambech E, Johansson U, Persson C, Sellberg G, et al. Detection and precise mapping of germline rearrangements in BRCA1, BRCA2, MSH2, and MLH1 using zoom-in array comparative genomic hybridization (aCGH). Hum Mutat 2008;29:555– 64. Saillour Y, Cossee M, Leturcq F, Vasson A, Beugnet C, Poirier K, et al. Detection of exonic copy-number changes using a highly efficient oligonucleotide-based comparative genomic hybridization-array method. Hum Mutat 2008; 29:1083–90. Hegde MR, Chin EL, Mulle JG, Okou DT, Warren ST, Zwick ME. Microarray-based mutation detection in the dystrophin gene. Hum Mutat 2008;29: 1091–9. Wong LJ, Dimmock D, Geraghty MT, Quan R, Lichter-Konecki U, Wang J, et al. Utility of oligonucleotide array-based comparative genomic hybridization for detection of target gene deletions. Clin Chem 2008;54:1141– 8. Reddy KS. Cytogenetic abnormalities and fragile-X syndrome in autism spectrum disorder. BMC Med Genet 2005;6:3. Vorstman JA, Staal WG, van Daalen E, van Engeland H, Hochstenbach PF, Franke L. 
Identification of novel autism candidate regions through analysis of reported cytogenetic abnormalities associated with autism. Mol Psychiatry 2006;11:1, 18 –28. Jacquemont ML, Sanlaville D, Redon R, Raoul O, Cormier-Daire V, Lyonnet S, et al. Array-based comparative genomic hybridisation identifies high frequency of cryptic chromosomal rearrangements in patients with syndromic autism spectrum disorders. J Med Genet 2006;43:843–9. Sebat J, Lakshmi B, Malhotra D, Troge J, LeseMartin C, Walsh T, et al. Strong association of de novo copy number mutations with autism. Science 2007;316:445–9.

49. Szatmari P, Paterson AD, Zwaigenbaum L, Roberts W, Brian J, Liu XQ, et al. Mapping autism risk loci using genetic linkage and chromosomal rearrangements. Nat Genet 2007;39:319 –28. 50. Weiss LA, Shen Y, Korn JM, Arking DE, Miller DT, Fossdal R, et al. Association between microdeletion and microduplication at 16p11.2 and autism. N Engl J Med 2008;358:667–75. 51. Christian SL, Brune CW, Sudi J, Kumar RA, Liu S, Karamohamed S, et al. Novel submicroscopic chromosomal abnormalities detected in autism spectrum disorder. Biol Psychiatry 2008;63: 1111–7. 52. Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, Skaug J, et al. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet 2008;82:477– 88. 53. Kumar RA, KaraMohamed S, Sudi J, Conrad DF, Brune C, Badner JA, et al. Recurrent 16p11.2 microdeletions in autism. Hum Mol Genet 2008; 17:628 –38. 54. Miller DT, Shen Y, Weiss LA, Korn J, Anselm I, Bridgemohan C, et al. Microdeletion/duplication at 15q13.2q13.3 among individuals with features of autism and other neuropsychiatric disorders. J Med Genet [Epub ahead of print 2008 Nov 26]. 55. Mefford HC, Sharp AJ, Baker C, Itsara A, Jiang Z, Buysse K, et al. Recurrent rearrangements of chromosome 1q21.1 and variable pediatric phenotypes. N Engl J Med 2008;359:1685–99. 56. Morrow EM, Yoo SY, Flavell SW, Kim TK, Lin Y, Hill RS, et al. Identifying autism loci and genes by tracing recent shared ancestry. Science 2008;321: 218 –23. 57. Schaefer GB, Mendelsohn NJ. Genetics evaluation for the etiologic diagnosis of autism spectrum disorders. Genet Med 2008;10:4 –12. 58. Shaffer LG, Beaudet AL, Brothman AR, Hirsch B, Levy B, Martin CL, et al. Microarray analysis for constitutional cytogenetic abnormalities. Genet Med 2007;9:654 – 62. 59. Thorland EC, Gonzales PR, Gliem TJ, Wiktor AE, Ketterling RP. Comprehensive validation of array comparative genomic hybridization platforms: How much is enough? Genet Med 2007;9:632– 41. 60. 
Ballif BC, Rorem EA, Sundin K, Lincicum M, Gaskin S, Coppinger J, et al. Detection of lowlevel mosaicism by array CGH in routine diagnostic specimens. Am J Med Genet A 2006;140: 2757– 67. 61. Cheung SW, Shaw CA, Scott DA, Patel A, Sahoo T, Bacino CA, et al. Microarray-based CGH detects chromosomal mosaicism not revealed by conventional cytogenetics. Am J Med Genet A 2007;143A:1679 – 86. 62. Cross J, Peters G, Wu Z, Brohede J, Hannan GN. Resolution of trisomic mosaicism in prenatal diagnosis: estimated performance of a 50K SNP microarray. Prenat Diagn 2007;27:1197–204. 63. Lee C, Iafrate AJ, Brothman AR. Copy number vari-

64.

65.

66.

67.

68.

69.

70.

71.

72.

73.

74.

75.

ations and clinical cytogenetic diagnosis of constitutional disorders. Nat Genet 2007;39:S48 –54. Itsara A, Cooper GM, Baker C, Girirajan S, Li J, Absher D, et al. Population analysis of large copy number variants and hotspots of human genetic disease. Am J Hum Genet 2009;84:148 – 61. Shaffer LG, Bejjani BA, Torchia B, Kirkpatrick S, Coppinger J, Ballif BC. The identification of microdeletion syndromes and other chromosome abnormalities: cytogenetic methods of the past, new technologies for the future. Am J Med Genet C Semin Med Genet 2007;145C:335– 45. Lu X, Shaw CA, Patel A, Li J, Cooper ML, Wells WR, et al. Clinical implementation of chromosomal microarray analysis: summary of 2513 postnatal cases. PLoS ONE 2007;2:e327. Pickering DL, Eudy JD, Olney AH, Dave BJ, Golden D, Stevens J, Sanger WG. Array-based comparative genomic hybridization analysis of 1176 consecutive clinical genetics investigations. Genet Med 2008;10:262– 6. Aston E, Whitby H, Maxwell T, Glaus N, Cowley B, Lowry D, et al. Comparison of targeted and whole genome analysis of postnatal specimens using a commercially available array based comparative genomic hybridisation (aCGH) microarray platform. J Med Genet 2008;45:268 –74. Fan YS, Jayakar P, Zhu H, Barbouth D, Sacharow S, Morales A, et al. Detection of pathogenic gene copy number variations in patients with mental retardation by genomewide oligonucleotide array comparative genomic hybridization. Hum Mutat 2007;28:1124 –32. Xiang B, Li A, Valentin D, Nowak NJ, Zhao H, Li P. Analytical and clinical validity of wholegenome oligonucleotide array comparative genomic hybridization for pediatric patients with mental retardation and developmental delay. Am J Med Genet A 2008;146A:1942–54. Bejjani BA, Saleki R, Ballif BC, Rorem EA, Sundin K, Theisen A, et al. Use of targeted array-based CGH for the clinical diagnosis of chromosomal imbalance: Is less more? Am J Med Genet A 2005;134:259 – 67. Shearer BM, Thorland EC, Gonzales PR, Ketterling RP. 
Evaluation of a commercially available focused aCGH platform for the detection of constitutional chromosome anomalies. Am J Med Genet A 2007;143A:2357–70. Shaffer LG, Kashork CD, Saleki R, Rorem E, Sundin K, Ballif BC, Bejjani BA. Targeted genomic microarray analysis for identification of chromosome abnormalities in 1500 consecutive clinical cases. J Pediatr 2006;149:98 –102. Koolen DA, Sharp AJ, Hurst JA, Firth HV, Knight SJ, Goldenberg A, et al. Clinical and molecular delineation of the 17q21.31 microdeletion syndrome. J Med Genet 2008;45:710 –20. Bejjani BA, Shaffer LG. Application of array-based comparative genomic hybridization to clinical diagnostics. J Mol Diagn 2006;8:528 –33.

Clinical Chemistry 55:4 (2009) 669

Reviews

Clinical Chemistry 55:4 670–683 (2009)

Analytical Ancestry: “Firsts” in Fluorescent Labeling of Nucleosides, Nucleotides, and Nucleic Acids

Larry J. Kricka¹* and Paolo Fortina²,³

BACKGROUND: The inherent fluorescent properties of nucleosides, nucleotides, and nucleic acids are limited, so fluorescent labeling of these molecules is needed for a variety of analytical applications.

CONTENT: This review traces the analytical ancestry of fluorescent labeling of nucleosides, nucleotides, and nucleic acids, with an emphasis on who was first to publish or patent. The scope of labeling includes (a) direct labeling, either by covalent attachment of a fluorescent label to a nucleic acid or by noncovalent binding or intercalation of a fluorescent dye, and (b) indirect labeling, by covalent attachment of a secondary label to a nucleic acid followed by binding of a fluorescently labeled ligand binder. An alternative indirect strategy binds the nucleic acid to a fluorophore-labeled nucleic acid binder molecule (e.g., antibody, antibiotic, histone, nuclease). Fluorescent labels for nucleic acids include organic fluorescent dyes, metal chelates, carbon nanotubes, quantum dots, gold particles, and fluorescent minerals.

SUMMARY: Fluorescently labeled nucleosides, nucleotides, and nucleic acids are important reagents for biological assay methods and underpin current methods of chromosome analysis, gel staining, DNA sequencing, and quantitative PCR. Although these methods predominantly use organic fluorophores, new types of particulate fluorophores in the form of nanoparticles, nanorods, and nanotubes may provide the basis for a new generation of fluorescent labels and nucleic acid detection methods.

© 2009 American Association for Clinical Chemistry

¹ Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA; ² Department of Cancer Biology, Kimmel Cancer Center, Thomas Jefferson University, Jefferson Medical College, Philadelphia, PA; ³ Dipartimento di Medicina Sperimentale, Universita’ “La Sapienza,” School of Medicine, Rome, Italy.
* Address correspondence to this author at: Department of Pathology and Laboratory Medicine, University of Pennsylvania School of Medicine, Philadelphia, PA 19104. Fax: 215 662 7529; e-mail [email protected].
Received August 12, 2008; accepted January 15, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.116152


In science, being the first to invent or describe a method or a composition of matter, or to expound a valid theory, carries significant prestige. Tangible rewards for being first may include a limited-term monopoly in the form of a patent, international recognition, and major scientific prizes (1). Determining who was first is not always straightforward, however. History is replete with corrections and conflicts on this highly charged and often commercially sensitive topic, as the controversies surrounding such familiar concepts as calculus and such inventions as the slide rule, laser, and telephone show (2–5). In this review, we trace the origins of fluorescent labeling of nucleic acids, tracking the evolution of ideas and the emergence of techniques and examining the associated intellectual property via issued patents. Our focus is on who was first to describe or discover a particular type of compound, technique, or property of matter. This information is particularly significant for inquiries into the validity of patents through anticipation and obviousness analysis (6). As others have noted, however, earliest dates “have a way of becoming unfixed as the history of the subject is further studied” and “there is no way of knowing what future students will unearth” (7).

The scope of the article is limited to labels that produce fluorescence upon irradiation with excitation energy of the appropriate wavelength. We do not consider phosphorescent labels, labels that can be converted to a fluorophore (e.g., fluorescein diacetate nanocrystal labels) (8), or labels that act on other substances to produce fluorescent products (e.g., an alkaline phosphatase label with a fluorogenic substrate). The scientific literature is now enormous.
PubMed includes more than 17 million citations dating back to the 1950s (9), the CAplus database contains more than 27 million patent and journal records (10), and more than 7 million US patents have been issued (11). We have searched extensively in this massive collection of abstracts, papers, reviews, and books; however, there is an ever-present danger that we have overlooked an obscure publication, or a public disclosure at a scientific meeting captured in an abstract book that never made its way into a library or a public database. Likewise, we have inevitable linguistic biases, and publications in some languages may have escaped our scrutiny.

Fig. 1. Landmarks in fluorescence and nucleic acid chemistry. P, patent priority.

FLUORESCENCE IN NUCLEIC ACID ANALYSIS

Fluorescence was observed in antiquity, but the science of fluorescence dates back to the work of Sir George Stokes, who coined the term “fluorescence” in 1852 (Fig. 1) (12) and whose law of fluorescence states that the wavelength of fluorescence emission is longer than that of the exciting radiation. The term “fluorophore,” describing a chemical group associated with fluorescence, was coined by Richard Meyer in 1897 (13). Other important landmarks were the synthesis of the fluorescent dye fluorescein by Adolph von Baeyer in 1871 (14), the development of the fluorescence microscope in 1911 by Heimstadt and Lehman (15), and the development of the epifluorescence microscope in 1929 by Ellinger and Hirt (16). The range of fluorescence emission spans the electromagnetic spectrum from the x-ray (<10 nm) through the visible (380–750 nm) to the infrared (IR)⁴ (630 nm to 3000 μm) regions. Nucleic acids have been labeled with fluorophores that emit in the x-ray (17), visible (18, 19), and IR (20, 21) regions of the spectrum.

Because fluorescence was discovered before nucleic acids, the starting point of our inquiry was the history of nucleic acids. DNA was first isolated in 1869 by Friedrich Miescher (22). After it had been separated into a protein and an acid component, Miescher's pupil Richard Altmann named it “nucleic acid” in 1889 (23). The structural components (the 4 bases, the sugar, and the phosphate chain) were identified in 1929 by Phoebus Levene, who showed that the components of DNA were linked in the order phosphate-sugar-base (24). He called each of these units a nucleotide and proposed that the DNA molecule consisted of a string of nucleotides linked together via the phosphate “backbone” of the molecule. In 1953, Watson and Crick solved the 3-dimensional structure of the DNA molecule and showed it to be a double helix (25).

The study of the fluorescence of nucleic acids and the development of fluorescently labeled nucleic acids are set against a historical background of fluorescent methods for bioanalysis (26). Fluorescence was already established by the late 1940s for both in vitro (27) and in vivo (28) applications and had been in use in other areas of analysis since at least 1922, when Hadding used x-ray fluorescence to analyze minerals (29).
By the 1930s, patents had been granted covering fluorescence-detecting apparatus useful in the diagnosis of disease (30), and during the 1960s and 1970s, numerous patents were issued on the use of fluorescence for analysis of cells (31, 32) and virus particles (33). Filing of patents for fluorescent nucleic acid probes began in the early 1980s and included directly labeled probes (34–37) and indirectly labeled probes (34, 35, 38, 39). Patent filings for the application of fluorescent labels in sequencing also began to appear in the 1980s, for both direct (40, 41) and indirect (42, 43) labeling.

⁴ Nonstandard abbreviations: IR, infrared; yW, wybutosine; AAF, 2-acetylaminofluorene; NIR, near infrared; CNT, carbon nanotube; tRNA, transfer RNA; sRNA, soluble RNA; 5BrU, 5-bromouracil; TdT, terminal deoxynucleotidyl transferase; TMR, tetramethylrhodamine; YOYO, oxazole yellow homodimer; TO, thiazole orange; TOTO, thiazole orange homodimer; dsDNA, double-stranded DNA; ssDNA, single-stranded DNA; TMV, tobacco mosaic virus.

Study of the fluorescence of bases, nucleosides, nucleotides, and their polymers has a long history dating back to the early 20th century (44–46). However, the intrinsic fluorescence of nucleic acids is weak and has not proved particularly useful analytically, except for nucleic acids containing certain modified bases that are naturally fluorescent. The first of these to be described was a base designated “Y” [wybutosine (yW)] (47) in L-phenylalanyl-tRNA^Phe, which has been chemically converted to another fluorescent form of the base by treatment with ammonium carbonate (48). Other fluorescent modified bases (e.g., pseudouridine, 4-thiouridine, dihydrouridine, N4-acetylcytidine, 7-methylguanine, 7-methylinosine) were subsequently discovered (49–53). Given the limited fluorescence of nucleic acids, the application of fluorescence in nucleic acid analysis has followed several pathways (Fig. 2). Direct labeling can be achieved by covalent attachment of a fluorescent label to a nucleic acid or by noncovalent binding (staining or intercalation) of a fluorescent dye to a nucleic acid (Fig. 2A, B). Indirect labeling can be achieved by first covalently attaching a secondary label to a nucleic acid and then binding this to a fluorescently labeled ligand binder (Fig. 2C). Alternatively, a nucleic acid can be bound to a fluorophore-labeled nucleic acid binder molecule (e.g., antibody, antibiotic, histone, nuclease) (Fig. 2D).

MOTIVATION FOR FLUORESCENCE LABELING

By and large, the motivation for combining fluorescence and nucleic acids has been to provide a nonisotopic label (tag, marker, or reporter group) with a detectable signal for studying nucleic acid sequence, structure, structural dynamics, protein and ligand interactions, or hybridization with other nucleic acids (probing) (34). (Fluorescent labeling of the broad class of receptors, including nucleic acids specifically, has also been described (54).) Fluorescent nucleoside and nucleotide analogs have been synthesized for photoaffinity labeling, for use as coenzyme analogs, to improve detectability in chromatographic analysis, and to render DNA fragments detectable in polyacrylamide gel electrophoresis as part of dideoxy DNA sequencing protocols or in quantitative PCR. Not every fluorescently labeled nucleic acid, nucleotide, or nucleoside, however, arose from a specific desire to develop fluorescently labeled materials. For example, nucleic acids fluorescently labeled at C-8 of guanine bases have been isolated or synthesized in studies of the mechanism of interaction between DNA and carcinogens such as 2-acetylaminofluorene (AAF) (55).


Fig. 2. Covalent (A) and noncovalent (B–D) fluorescent labeling strategies. (A), Direct covalent labeling. (B), Staining or intercalation. (C), Indirect labeling via binding of a fluorescently labeled ligand binder to a ligand covalently attached to a nucleic acid. (D), Indirect labeling via binding of a fluorescently labeled nucleic acid binder directly to a nucleic acid.

SCOPE AND SELECTION OF FLUORESCENT LABELS

The vast majority of fluorescent nucleic acid labeling studies have used organic fluorescent dye molecules (e.g., fluorescein, rhodamine); however, fluorescent metal chelates with long-lived, time-resolvable signals (e.g., europium chelates) and various organic (e.g., carbon nanotubes) and inorganic (e.g., quantum dots, gold particles, fluorescent minerals) particles have also been used. In most cases, the fluorescent labels used for nucleic acids had previously been used as labels in immunoassays. Considerations in the choice of fluorophore include fluorescence quantum yield, Stokes shift, fluorescence emission spectrum (including time-resolvability), the ability to use several fluorophore labels simultaneously, susceptibility to photobleaching, and reduction of background interference, as in the case of time-resolved fluorescent labels based on lanthanide chelates (37, 56) and IR labels based on cyanine dyes (57) or metal chelates (20, 21). At a pragmatic level, the choice of fluorophore has been guided by availability and ease of attachment, and in this regard various activated fluorescein molecules (e.g., fluorescein isothiocyanate) have enjoyed considerable popularity. The signal from a fluorescent label is determined in part by the fluorescence quantum yield of the fluorophore, and the near-unity quantum yield of fluorescein underlies the popularity of this label.

The fluorescent signal emitted from a fluorophore depends directly on the intensity of the excitation light. As the excitation intensity increases, however, organic molecules tend to decompose, which leads to photobleaching and loss of fluorescent signal. Inorganic fluorophore labels such as quantum dots are much less prone to photobleaching, and this has spurred their application in nucleic acid analysis. The Stokes shift is another important characteristic of a fluorophore label. A large Stokes shift is advantageous because it minimizes interference from the excitation light in the measurement of the fluorescence emission. Stokes shifts of up to 200 nm are possible with seminaphthofluorone-type dyes, compared with approximately 20 nm for fluorescein (58). An emission wavelength in the near infrared (NIR) can be advantageous for a fluorophore label because it minimizes interfering fluorescence from biological samples, reduces scattering, increases tissue penetration of the signal, and allows the use of low-cost laser diode excitation sources (37, 56). The overall structure of a fluorophore label can also influence the ability of a labeled nucleic acid to serve as an enzyme substrate. This has been a critical factor in the development of labeled nucleotides for polymerase-based sequencing: the linker that attaches a fluorophore label to a nucleotide has a pronounced effect on how efficiently a DNA polymerase incorporates the fluorescent nucleotide (59).

An interesting trend has been the impact of micro- and nanotechnology on direct and indirect fluorescent labeling of nucleic acids in the form of micro- or nanosized organic or inorganic particles. The scope of this strategy includes quantum dots (60–62), metal nanoparticles and nanorods (63), carbon nanotubes (CNTs) (64, 65), dye-doped core-shell particles (66), dyed latex particles (67, 68), and liposomes or polymer shells filled with fluorescent particles (e.g., quantum dots, dyed polymer beads, and naturally occurring minerals such as eucryptite) (69).

Fluorescent metal chelates. Chelate-based fluorescent labeling highlights an interesting aspect of determining who was first to describe a particular type of fluorescent labeling. A patent published in 1977 describes labeling of a “target substance” with rare-earth complexes (70), and another patent with a 1979 priority date describes labeling of antigens with fluorescent metal chelates (71). Because a nucleic acid such as DNA is an example of a target substance and can be an antigen, these generic claims could be viewed as a disclosure of fluorescent metal chelate labeling of a nucleic acid. Likewise, another patent (priority 1981) describes fluorescent lanthanide chelate labeling of a “biologically active substance,” which of course would include a nucleic acid (72). Subsequently, lanthanide chelates were specifically described in a patent (priority 1981) as labels for DNA in a sandwich hybridization assay (37).

Quantum dot. A quantum dot is a nanocrystal composed of materials from periodic table groups II–VI (e.g., CdSe, CdS), III–V (e.g., InP, InAs), or IV–VI (e.g., PbTe, PbS). Its diameter can be as small as 2–10 nm, or about 10–50 atoms across (73–75). Advantages of quantum dots compared with conventional organic fluorescent dyes include high quantum yield (bright signal), lower susceptibility to photobleaching, and a fluorescence emission wavelength that is directly related to the diameter of the quantum dot, larger dots emitting at longer wavelengths. A quantum dot is usually coated with a shell to improve quantum efficiency and stability.
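The size dependence of quantum dot emission can be made concrete with a simple physical model. The sketch below is not from the review: it uses the Brus effective-mass approximation, in which confining the electron and hole in a sphere raises the transition energy roughly as 1/r² while their Coulomb attraction lowers it as 1/r. The CdSe parameters (bulk band gap 1.74 eV, effective masses 0.13 and 0.45 of the electron mass, relative permittivity 10.6) are assumed textbook-style values, the band-edge transition energy is treated as the emission energy (ignoring the small Stokes shift), and the model is known to overestimate the gap for the smallest dots, so the numbers are qualitative only.

```python
import math

# Physical constants (SI units, CODATA values)
HBAR = 1.054571817e-34        # reduced Planck constant, J*s
M0 = 9.1093837015e-31         # electron rest mass, kg
E_CHARGE = 1.602176634e-19    # elementary charge, C (also joules per eV)
EPS0 = 8.8541878128e-12       # vacuum permittivity, F/m
HC_EV_NM = 1239.84198         # h*c in eV*nm: wavelength_nm = HC_EV_NM / energy_eV

# Assumed CdSe parameters, for illustration only (not taken from the review)
CDSE = {"eg_bulk_ev": 1.74, "me_rel": 0.13, "mh_rel": 0.45, "eps_r": 10.6}


def brus_gap_ev(diameter_nm, eg_bulk_ev, me_rel, mh_rel, eps_r):
    """Lowest excitonic transition energy (eV) from the Brus effective-mass model."""
    r = diameter_nm * 1e-9 / 2.0  # radius in metres
    # Particle-in-a-sphere confinement of electron + hole; grows as 1/r^2
    confinement_j = (HBAR ** 2 * math.pi ** 2) / (2.0 * r ** 2) * (
        1.0 / (me_rel * M0) + 1.0 / (mh_rel * M0)
    )
    # Screened electron-hole Coulomb attraction; lowers the energy, scales as 1/r
    coulomb_j = 1.8 * E_CHARGE ** 2 / (4.0 * math.pi * EPS0 * eps_r * r)
    return eg_bulk_ev + (confinement_j - coulomb_j) / E_CHARGE


def emission_nm(diameter_nm, params=CDSE):
    """Approximate emission wavelength (nm) for a dot of the given diameter."""
    return HC_EV_NM / brus_gap_ev(diameter_nm, **params)


if __name__ == "__main__":
    for d in (3.0, 4.0, 5.0, 6.0, 8.0):
        print(f"{d:.1f} nm dot -> ~{emission_nm(d):.0f} nm (model estimate)")
```

Running it prints wavelengths that rise with diameter and approach the bulk limit (about 713 nm for the assumed 1.74 eV gap) for large dots, which is the trend the text describes; the absolute values should not be read as measured emission peaks.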
The shell surface of a “core/shell” quantum dot can be functionalized by treatment with organic molecules (e.g., silanes) that provide points of attachment for nucleosides or nucleotides (60, 61) or DNA (62). Early suggestions for the use of quantum dots are found in articles published in 1998 (76, 77). Interestingly, quantum dots had been attached to nucleic acids such as transfer RNA (tRNA) as part of a quantum dot synthesis scheme in which the tRNA, dispersed in a gel matrix, acted as an ion-exchange/nucleation site for formation of the quantum dots (e.g., AgO, CdS) (78).

Metal nanoparticle or nanorod. A starting point for gold as a fluorescent label was Mooradian's description of the fluorescence of bulk gold in 1969 (79). The fluorescence of gold nanoparticle clusters and nanorods (80, 81) was subsequently demonstrated. In hindsight, all of the early work using colloidal gold–labeled nucleic acids (82) could be considered fluorescent labeling, as could the studies of DNA-directed self-assembly of gold nanoparticles (83), although in neither case was the gold detected by fluorescence. Recently, however, fluorescent gold nanorod labels (17 nm in diameter, 230 nm long; emissions at 743 and 793 nm) have been used to label DNA; the nanorod label was attached to thiolated DNA via gold–thiol (Au-SH) binding (80).

Carbon nanotube. A CNT is a cylindrical carbon molecule with a nanometer-scale diameter, and, as discovered in 2002, it fluoresces in the near-infrared part of the spectrum (84). A CNT can be derivatized to contain carboxyl groups, which serve as points of attachment when the CNT is used as a fluorescent label for nucleic acids (64, 65).

DIRECT COVALENT LABELING OF NUCLEIC ACIDS WITH A FLUORESCENT LABEL

There are several sites on a nucleic acid molecule at which covalent attachment is possible, including sites on the sugar, the phosphate, and the different purine and pyrimidine bases (Fig. 3) (85). Most labeling reactions are designed to target a specific location on a nucleic acid, but indiscriminate covalent labeling is possible using nitrenes, which react at random with other molecular species by radical insertion reactions involving, for example, carbon–hydrogen, oxygen–hydrogen, and nitrogen–hydrogen bonds. This highly reactive functional group has been used to attach an ethidium fluorophore to DNA by reacting the DNA with ethidium azide (86). This strategy is analogous to other photochemically generated free radical–based labeling reactions developed for proteins in the 1960s (87).

Fig. 3. Sites of attachment for fluorescent labels on a nucleic acid.

Labeling the sugar. The first example of covalent labeling of a nucleic acid at any position appears to have involved the sugar and can be attributed to Feulgen in 1924 (18, 88). Acid hydrolysis of DNA causes depurination, and the liberated aldehyde group of the deoxyribose sugar is then available to react with the amine group of a pararosaniline (fuchsin) Schiff reagent. A specifically fluorescent Feulgen method was published by Ornstein et al. in 1957 (89); it used acriflavine as the Schiff-type aldehyde reagent and produced green fluorescent staining of nuclei in tissue sections. The scope of this reaction was subsequently expanded to other fluorogenic Schiff reagents by Kasten et al. (90). The first example of covalent labeling of a nucleic acid in solution, as opposed to a tissue section, was described in 1958 by Kissane and Robins (19), who were developing a fluorometric assay for DNA in brain tissue. Their method entailed depurination of the DNA followed by reaction of the aldehyde group in the deoxyribose with 3,5-diaminobenzoic acid to produce a fluorescent Schiff base product.
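A fluorometric DNA assay such as the one Kissane and Robins developed converts DNA content into a fluorescence reading, and such readings are commonly interpreted against a standard curve. The sketch below shows that generic step only; it is not a reconstruction of the 1958 procedure. It assumes a linear response over the working range, and the standard amounts and readings are hypothetical numbers chosen for illustration.

```python
from statistics import fmean


def fit_standard_curve(amounts, signals):
    """Ordinary least-squares line: signal = slope * amount + intercept."""
    if len(amounts) != len(signals) or len(amounts) < 2:
        raise ValueError("need at least two paired standards")
    mean_x, mean_y = fmean(amounts), fmean(signals)
    sxx = sum((x - mean_x) ** 2 for x in amounts)
    if sxx == 0:
        raise ValueError("standards must cover more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amounts, signals))
    slope = sxy / sxx
    # The intercept is the fitted signal at zero DNA (the reagent blank)
    return slope, mean_y - slope * mean_x


def amount_from_signal(signal, slope, intercept):
    """Invert the standard curve to estimate the DNA amount in an unknown."""
    return (signal - intercept) / slope


if __name__ == "__main__":
    # Hypothetical standards (not data from Kissane and Robins):
    # DNA amount in micrograms vs fluorescence in arbitrary units
    standards_ug = [0.0, 1.0, 2.0, 4.0, 8.0]
    readings = [2.0, 12.0, 22.0, 42.0, 82.0]
    slope, intercept = fit_standard_curve(standards_ug, readings)
    estimate = amount_from_signal(52.0, slope, intercept)
    print(f"unknown reading 52.0 -> {estimate:.2f} ug DNA")
```

Fitting the intercept rather than forcing the line through zero lets the blank signal from the reagent be absorbed by the fit instead of biasing every estimate.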

An alternative, nondepurinating labeling method was described by Churchich in 1963 (91). Periodate-oxidative opening of the ribose ring of soluble RNA (sRNA) and subsequent reaction of the resulting aldehydes with the amino groups of acriflavine (3,6-diamino-10-methylacridine) gave fluorescently labeled sRNA, which was used to determine the relaxation time of the sRNA by a fluorescence polarization method. Fluorescent labeling of an intact sugar at the 3′ or 5′ position dates back to 1973 (92): thymidine blocked at the 3′ or 5′ position was reacted with α-naphthyl isocyanate to produce thymidine 3′- or 5′-naphthylcarbamate. The range of fluorophores attached at these sugar ring positions was subsequently expanded to include other well-known fluorophores such as anthracene (93) and dansyl (94). Alternative strategies use 5′-N-protected 5′-amino phosphoramidites; after deprotection, the amino group is reacted with an activated fluorescent dye (e.g., fluorescein isothiocyanate) (95). Labeling via a reaction that bridges the 2′ and 3′ positions was also developed in 1973 (96): ATP was trinitrophenylated by simultaneous reaction at the 2′- and 3′-hydroxyl groups of the ribose to give an ATP derivative that fluoresced in ethanol–water solutions.

Labeling the phosphate. The first fluorescent labeling of the phosphate group at the 5′ position dates back to 1973 (97). This was achieved by first synthesizing dansyl or anthraniloyl phosphoromorpholidate derivatives and then reacting these with the 5′-phosphate of tRNA. Labeling of the 3′-phosphate was reported by Gohlke et al. (98) as part of studies to make fluorogenic substrates for a ribonuclease assay (e.g., (2′,5′-bis-tert-butyldimethylsilyl 3′-uridine-4-methylumbelliferone-7-yl)phosphate). In 1989, the scope of phosphate labeling was expanded by labeling an internucleotide phosphate (99): oligonucleotides synthesized to contain reactive phosphorothioate diesters at specific locations were dansylated to form fluorescent phosphorothioate triesters.

Labeling the base.
Direct fluorescent labeling of a base can be traced back to work on the photoreaction between skin-photosensitizing furocoumarins and flavin mononucleotide (100 ) and the expansion of the reaction to pyrimidine bases of nucleic acids, such as thymine (101–106 ). Usually, labeling of thymine is problematic because the 5-methyl substituent blocks the reactive 5-position. However, fluorescent labeling that involves reaction at the 5- and 6-positions of a thymine base or other pyrimidine bases is possible via a photochemically induced cyclo-addition reaction with various furocoumarins (e.g., 5-methoxypsoralen) (103 ). Thymine has also been rendered fluorescent by an alkylation reaction to produce a 1-(2,3-dioxobutyl) thymine derivative (107 ). 676 Clinical Chemistry 55:4 (2009)

Subsequently, guanosine was labeled at the 8-position with N-acetoxy-N-2-fluorenylacetamide as part of studies on the reactions of this carcinogen with guanosine (55). Ring amine groups of guanine can also be labeled via reaction with a diazotized fluorescein derivative (108). Labeling of AMP or dAMP, or of A in poly(A) [single stranded or complexed with poly(U)], was achieved in 1974 by reaction with 9-bromomethylanthracene (109); reaction occurred at the amino group at the 6-position and, in the case of the mononucleotide, also at the 1-position. Adenosine and cytidine can be fluorescently labeled via a cyclization reaction with chloroacetaldehyde that bridges the 1- and 6-positions of adenine and the 3- and 4-positions of cytosine to produce etheno compounds. This reaction was first described by Kochetov et al. (85) and developed for fluorescent labeling by others (110, 111). The synthesis of a fluorescent 1-(2,3-dioxobutyl) uracil derivative was reported in 1978 (107). Later, Saito et al. (112) adapted a photochemical alkylation reaction to make strongly fluorescent 5-pyrenyl uridine.

Labeling of aminoacylated nucleic acids. Some nucleic acids, such as tRNA, are modified by addition of an amino acid, and reactive chemical groups on the amino acid provide convenient sites for covalent attachment of a label. An example is naphthoxyacetylation of the amino group of the amino acid moiety of tRNA with the N-hydroxysuccinimide ester of 2-naphthoxyacetic acid. This synthetic procedure was described in 1968 for a series of tRNAs (e.g., tRNA(Asp)) (113) and applied as a fluorescent labeling strategy several years later to tRNA(Ile) (114).

Labeling of modified bases and nucleic acids. Several of the modified bases that occur in the nucleic acids of some organisms have chemically reactive side chains that are suitable for attachment of fluorescent labels (115).
The first examples of this type of labeling were reported in 1971: the thiol substituent in 4-thiouridine and 4-thiouracil was reacted with a fluorescent coumarin derivative to give the fluorescent sulfide (116). A contemporaneous route to a fluorescent tRNA derivative involved photodimerization of 4-thiouridine with cytidine, followed by sodium borohydride reduction of the photodimer (117). Table 1 lists other modified bases that have been fluorescently labeled (see the review by Favre and Thomas (118)).

Labeling via enzymatic incorporation of a fluorescent analog or fluorescent base or modified base. Among the earliest examples is the work of Zeitz and Lee (1963) (17). As part of their studies on the radiosensitivity of DNA, they replaced thymine with 5-bromouracil (5BrU) in DNA by growing E. coli B 15T– in a medium containing 5BrU, and then detected the bromine atom in the incorporated 5BrU by irradiating the sample with x-rays and measuring the fluorescence emission of bromine at approximately 0.1040 nm. Subsequently, the availability of various enzymes facilitated development of in vitro protocols for labeling nucleic acids. These were originally developed for nonfluorescent labeling and later extended to fluorescent labeling. In vitro fluorescent labeling was initially achieved with an E. coli RNA polymerase–catalyzed incorporation reaction using a d(A-T) template and fluorescent ATP analogs based on formycin, 2-aminopurine, or 2,6-diaminopurine. The same analogs were also attached to the terminus of a tRNA molecule using tRNA-CCA pyrophosphorylase (38, 39). Other enzymes used for in vitro labeling include terminal deoxynucleotidyl transferase (TdT), which catalyzed the incorporation of fluorescent nucleotides such as 3-O-acyl (fluorescein or rhodamine) UTP (119). A variant of this procedure used TdT to incorporate 4-thiouridine at the 3′ end of DNA; the thiol group of the incorporated 4-thiouridine was then labeled with fluorescein, eosin, or aminonaphthalene-1-sulfonic acid derivatives (120). T4 RNA ligase is also useful for fluorescent labeling: with fluorescein and tetramethylrhodamine (TMR) derivatives of P1-(6-aminohex-1-yl)-P2-(5′-adenosine), a fluorescein or TMR label could be introduced onto the 3′-hydroxyl group of RNA in good yield (121).

Table 1. Fluorescence and fluorescent labeling of nucleic acids. Each entry gives the discovery (first use, publication, or patent) with its reference number.

Natural fluorescence
  Free bases: Heyroth and Loofbrow (44)
  Nucleosides and nucleotides: Udenfriend and Zaltzman (26)
  Oligonucleotides: Eisinger et al. (168), Gueron et al. (169)
  Polynucleotides: Konev (45)
  "Modified" or "odd" base
    Y: RajBhandary et al. (47)
    4-Thio U: Lipsett (170)
    N4-acetylcytidine: Pochon et al. (53)
Covalent fluorescent labeling
  Sugar
    Depurinated sugar: Feulgen and Rossenbeck (18)
    Ring-opened sugar: Churchich (91)
    Intact sugar, 2′ position: Hiratsuka and Uchida (96)
    Intact sugar, 3′ position: Bienvenüe and Tournon (92)
    Intact sugar, 5′ position: Bienvenüe and Tournon (92)
    Intact sugar, bridging 2′ and 3′ positions: Hiratsuka and Uchida (96)
  Phosphate
    5′: Yang and Söll (97)
    3′: Gohlke et al. (98)
  Base
    A, position not known: Pochon et al. (53)
    A, 1-position: Pochon and Perrin (109)
    A, 6-position: Pochon and Perrin (109)
    A, 1- and 6-position: Barrio et al. (110)
    G, 2-position: Jeffrey et al. (171), Koreeda et al. (172)
    G, 7-position: Casperson et al. (173)
    G, 8-position: Kriek et al. (55)
    C, 4-position: Draper and Gold (174)
    C, 6-position: Pochon et al. (52), Leng et al. (175)
    C, 3- and 4-position: Barrio et al. (110), Secrist et al. (111)
    T, 3-position: Yoshida et al. (176)
    T, 5- and 6-position: Musajo et al. (103), Musajo et al. (104)
    U, 1-position: Lee et al. (107)
    U, 3-position: Yoshida et al. (176)
  Amino acid group (tRNA(Asp)): Gillam et al. (113)
  "Modified" or "odd" base
    4-Thiouridine: Favre et al. (51)
    Modified Y (yW): Yoshikami and Keller (48)
    9-Methyladenine: Kochetov et al. (85)
    1-Methylcytidine: Kochetov et al. (85)
    Pseudouridine (Ψ): Yang and Söll (50)
    2-Thio-5-(N-methylaminomethyl)uridine: Yang and Söll (50)
    Q (queuosine): Yang and Söll (49), Pingoud et al. (177)
    X [3-(3-amino-3-carboxypropyl)U]: Schiller and Schechter (178)
    X-47: Faulhammer et al. (179)
    pre-Q1: Kasai et al. (180)
    Under-modified Y: Kuchino et al. (181)
    Xanthosine: Macklin et al. (182)
Enzymatic incorporation of fluorescent nucleotides or analogs
  TdT-catalyzed: Rozovskaia et al. (119)
  T4 RNA ligase–catalyzed: Richardson and Gumport (121)
  RNA polymerase–catalyzed: Ward et al. (183), Ward et al. (184)
  Polynucleotide phosphorylase–catalyzed: Zhenodarova and Klyagina (185)
  DNA polymerase–catalyzed: Ried et al. (186)
  Avian myeloblastosis virus (AMV) reverse transcriptase: Prober et al. (187)
Noncovalent labeling
  Staining, eosin: von Provazek (126)
  Nucleic acid binder, antibiotic: Crissman et al. (153)
  Nucleic acid binder, histone: Lewis (154)
  Nucleic acid binder, antibody: Beiser et al. (155)
  Nucleic acid binder, nuclease: Benjaminson et al. (156)
  Nucleic acid binder, restriction endonuclease: Taylor et al. (68)
  Nucleic acid binder, poly(U): Cheung et al. (157)

NONCOVALENT LABELING OF NUCLEIC ACIDS WITH A FLUORESCENT LABEL

The 2 principal methods of noncovalent labeling are direct methods, in which a fluorescent dye or particle binds to a single- or double-stranded nucleic acid (staining) (Fig. 2B), and indirect methods, in which a fluorescently labeled binding agent (e.g., avidin or an antibody) binds to a secondary label (e.g., biotin, iminobiotin) covalently attached to the nucleic acid (Fig. 2C) or to a specific structure, e.g., an RNA:DNA hybrid (Fig. 2D).

Direct noncovalent binding of fluorescent dyes to nucleic acids (staining). Dye-binding detection methods encompass dyes that bind to nucleotides (122) and to double- and single-stranded nucleic acids; dyes that are selective for double- vs single-stranded nucleic acid or for DNA vs RNA (123); and dyes that bind to the minor groove (e.g., Hoechst 33258) (124) or the major groove (e.g., methyl green) (125) of DNA. Study of the interactions between fluorescent dyes and nucleic acids traces back to work on vital fluorochroming with eosin and erythrosin at the turn of the 20th century (126), which in turn has its origins in the colorimetric histochemical staining reactions pioneered by Raspail in the early 1800s (127). Binding of the 10-methyl homolog of ethidium bromide to DNA was suggested in 1953 (128), and the intercalative binding of acridine, proflavine, acridine orange, and ethidium bromide to nucleic acids was demonstrated over the next decade (129–132). The ensuing years saw the introduction of improved variants of ethidium bromide, e.g., ethidium homodimer (133); intercalating dyes based on oxazoles [e.g., oxazole yellow homodimer (YOYO)] and thiazoles [e.g., thiazole orange (TO) and thiazole orange homodimer (TOTO)] that show greater fluorescence enhancement when bound to double-stranded DNA (dsDNA) (134, 135); and dyes such as PicoGreen that are more selective for dsDNA vs RNA or single-stranded DNA (ssDNA) (136). Sensitive quantitative fluorescent detection of DNA in solution with ethidium bromide was described in 1964 (131, 137, 138). However, priority for applying intercalating dyes to solid-phase DNA detection, e.g., in agarose gels, is disputed (139). Aaij and Borst described this method in 1972 (140), inspired by the bright orange bands observed when DNA was separated in preparative CsCl–ethidium gradients, but an article published the following year has been cited more often (141).

Single-stranded nucleic acids can also be stained. For example, acridine orange staining of single-stranded viral RNA [tobacco mosaic virus (TMV)] was described in 1961 (142). Dyes have since been developed that stain ssRNA, e.g., Cuprolinic blue–magnesium chloride (143), and ssDNA, e.g., TOTO and YOYO (144), as well as dyes such as Hoechst 33258 that are selective for dsDNA in the presence of RNA or ssDNA (145).
Triple-stranded nucleic acids also bind ethidium bromide (137, 138, 146). Intercalating dyes generally show no sequence selectivity, but there are exceptions: ethidium bromide binds to A-T base pair–rich regions (147), and TOTO binds preferentially to 5′-pyrimidine-pyrimidine-purine-purine-3′ motifs in dsDNA, with 5′-CTAG-3′ the preferred binding site (148).

Indirect noncovalent labeling via binding of fluorescent binding agents to secondary-labeled nucleic acids. Secondary labeling of a nucleic acid with an antigen or hapten provides a general route to indirect DNA labeling through subsequent binding of the secondary label to a fluorescently labeled binding agent.


Biotin and iminobiotin are secondary labels that provide a route to indirect fluorescent labeling of nucleic acids. For example, biotin covalently linked to a nucleic acid can be bound by a fluorophore-labeled antibiotin antibody (34), by a streptavidin-coated particle such as a fluorophore-containing latex bead (149), or by a silica particle. Alternatively, the biotin secondary label can be bound by a rabbit antibiotin antibody, which is in turn immunocomplexed with a fluorophore-labeled goat antirabbit IgG (150). Because fluorescein is antigenic, a nucleic acid can also be labeled indirectly by first labeling it with fluorescein and then reacting the fluorescein hapten with fluorophore-labeled (e.g., fluorescein-labeled) antifluorescein antibodies. This strategy was applied to detect DNA with a fluorescein-labeled RNA probe: further indirect labeling of the RNA with fluorescein-labeled antifluorescein antibodies gave a 5- to 10-fold amplification compared with direct detection of the fluorescein-labeled probe (151). A similar strategy has been developed for biotin and iminobiotin ligands, which bind fluorophore-labeled avidin or streptavidin (38, 39). Somewhat analogous is the use of a dipalmitoylphosphatidyl secondary label, which can be incorporated into the wall of a liposome that encapsulates a fluorescent dye; a liposome encapsulating sulforhodamine B and containing dipalmitoylphosphatidyl-labeled DNA exemplifies this strategy (152).

Indirect noncovalent labeling via binding of fluorescent binding agents to nucleic acid hybrids. A range of macromolecules show binding affinity toward nucleic acid sequences, and labeling these macromolecules with a fluorophore provides indirect labeling of the nucleic acid.
These macromolecules include antibiotics (e.g., olivomycins) (153), histones (154), antibodies (155), nucleases (e.g., deoxyribonuclease) (156), inactive restriction endonucleases (e.g., EcoRI) (68), and, of course, nucleic acids. For example, polyadenylated RNA can be labeled by hybridization to poly(U) (50–1000 bases) covalently attached to a dansylated 300-angstrom-diameter latex microsphere (157, 158). Fluorescently labeled monoclonal antibodies that distinguish DNA-RNA hybrids from single-stranded DNA and RNA and from double-stranded DNA and RNA can be used to label a DNA-RNA hybrid fluorescently (159, 160). This forms the basis of the monoclonal anti-DNA:RNA hybrid capture assay strategy (161).
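The sequence preference reported for TOTO (148), 5′-pyrimidine-pyrimidine-purine-purine-3′ (YYRR) tetranucleotides with 5′-CTAG-3′ preferred, can be made concrete as a simple scan of a DNA sequence. The following is only a minimal sketch: the function name find_yyrr_sites and the demonstration sequence are hypothetical, and the scan reports candidate motif positions, not binding affinities.

```python
# Hypothetical helper: scan one DNA strand, written 5'->3', for YYRR motifs,
# i.e. pyrimidine-pyrimidine-purine-purine tetranucleotides such as CTAG.
PYRIMIDINES = frozenset("CT")
PURINES = frozenset("AG")


def find_yyrr_sites(seq: str) -> list[tuple[int, str]]:
    """Return (0-based start, motif) for every overlapping YYRR window."""
    seq = seq.upper()
    sites = []
    for i in range(len(seq) - 3):
        window = seq[i:i + 4]
        if (window[0] in PYRIMIDINES and window[1] in PYRIMIDINES
                and window[2] in PURINES and window[3] in PURINES):
            sites.append((i, window))
    return sites


if __name__ == "__main__":
    for start, motif in find_yyrr_sites("GATCTAGACCGG"):
        # Flag the reported preferred site 5'-CTAG-3'
        label = "preferred" if motif == "CTAG" else "YYRR"
        print(start, motif, label)
```

For the demonstration sequence, the scan reports CTAG at position 3 and CCGG at position 8. Scanning a single strand is sufficient because the reverse complement of any YYRR tetranucleotide is itself YYRR, so each site in the duplex is found once.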

DEGREE OF LABELING AND LOCATION OF LABELS

Labeling methods have been developed that control the location and number of fluorescent labels attached to a nucleic acid. The ability to attach a fluorescent label at a specific location is important for probes used in energy transfer assays. Double labeling has been achieved with 2 natural fluorescent bases (e.g., pseudoU + dihydroU) (50), a natural fluorescent base plus a labeled base (e.g., Y + 3′-acriflavine) (162), or 2 labeled bases (e.g., 3′-acriflavine + 5′-anthraniloyl) (50). More recently, double labeling of the same oligonucleotide has become particularly important for energy transfer probes, in which the donor and acceptor labels are placed close enough together that the acceptor quenches the donor's fluorescence. Such probes are now widely used as TaqMan probes for quantitative PCR assays (163), energy transfer primers for DNA sequencing (164, 165), and molecular beacons (166, 167).

Conclusions

Fluorescently labeled nucleosides, nucleotides, and nucleic acids continue to be important reagents for biological assay methods and underpin current methods of chromosome analysis, gel staining, DNA sequencing, and quantitative PCR. These methods use predominantly organic fluorophores, but nanotechnology is now offering new types of particulate fluorophores in the form of nanoparticles, nanorods, and nanotubes that may provide the basis of a new generation of fluorescent labels and nucleic acid detection methods.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: None declared. Expert Testimony: L.J. Kricka, Applied Biosystems. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript. Acknowledgments: The authors thank Sonny Mark for graphic design work and Judith Currano for assistance with literature searches.


References

1. Nobel Foundation. Nobelprize.org. http://nobelprize.org/ (Accessed February 2009).
2. Encyclopedia Britannica. Sir Isaac Newton. http://www.britannica.com/EBchecked/topic/413189/Sir-Isaac-Newton (Accessed February 2009). See information on the Leibniz and Newton calculus controversy.
3. Encyclopedia Britannica. The patent dispute regarding the invention of the laser. http://www.britannica.com/EBchecked/topic/435352/William-Oughtred (Accessed February 2009).
4. Wikipedia. Slide rule. http://simple.wikipedia.org/wiki/Slide_rule (Accessed February 2009).
5. The Great Idea Finder. http://www.ideafinder.com/history/inventions/telephone.htm (Accessed February 2009).
6. United States Patent and Trademark Office. Appendix L Patent Laws. http://www.uspto.gov/web/offices/pac/mpep/consolidated_laws.pdf (Accessed February 2009).
7. Conn HJ. The history of staining. Geneva, NY: Biological Stain Commission, WF Humphries Press, 1933:141 pp.
8. Chan CP, Tzang LC, Sin KK, Ji SL, Cheung KY, Tam TK, et al. Biofunctional organic nanocrystals for quantitative detection of pathogen deoxyribonucleic acid. Anal Chim Acta 2007;584:7–11.
9. U.S. National Library of Medicine, PubMed. http://www.ncbi.nlm.nih.gov/pubmed/ (Accessed February 2009).
10. Chemical Abstracts Service. A Division of the American Chemical Society. http://www.cas.org (Accessed February 2009).
11. United States Patent and Trademark Office. http://www.uspto.gov (Accessed February 2009).
12. Stokes GG. Mathematical and physical papers. Vol. 1–5. http://tinyurl.com/b63592 (Accessed February 2009).
13. Meyer R. Zeitschrift fur Physikalische Chemie, Stochiometrie und Verwandtschaftslehre. Zeit Phys Chem 1897;24:468. [German]
14. von Baeyer A. Ueber eine neue klasse von farbstoffen. Ber Dtch Chem Ges 1871;4:555–8. [German]
15. Kasten FH. The origins of modern fluorescence microscopy and fluorescence probes. In: Kohen E, Hirschberg JG, eds. Cell structure and function by microspectrophotometry. San Diego: Academic Press; 1989.
16. Ellinger P, Hirt A. Mikroskopische untersuchungen an lebenden organen. I. Mitteilung: methodik: intravitalmikroskopie. Zeitschrift Anatomie Entwicklungs-Geschichte 1929;90:791–802. [German]
17. Zeitz L, Lee R. Bromine analysis in 5-bromouracil-labeled DNA by X-ray fluorescence. Science (Wash DC) 1963;142:1670–3.
18. Feulgen R, Rossenbeck H. Mikroskopisch-chemischer nachweis einer nucleinsaure vom typus der thymonucleinsaure und die darauf beruhende elektive Farbung von zellkernen in mikroskopischen praparaten. Z Phys Chem 1924;135:203–48. [German]
19. Kissane JM, Robins E. The fluorometric measurement of deoxyribonucleic acid in animal tissues with special reference to the central nervous system. J Biol Chem 1958;233:184–8.
20. Middendorf LR, Patonay G, inventors; Li-Cor, Inc. (Lincoln, NE), assignee. Sequencing near infrared and infrared fluorescence labeled DNA for detecting using laser diodes. US Patent 5,230,781. 1993 Jul 27.
21. Seiji T, Mitsuo K, Watanabe H, inventors; Hitachi Chemical Co Ltd., assignee. Pigment for fluorescence labeling, organism-derived substance labeled with pigment for fluorescence labeling and reagent containing them. Japan Patent 50401221. 1993 Feb 19.
22. Wolf G. Friedrich Miescher, the man who discovered DNA. http://www.bizgraphic.ch/miescheriana/html/the_man_who_dicovered_dna.html (Accessed February 2009).
23. Altmann R. Ueber nucleinsäuren. Archiv Anat Physiol. Physiologische Abteilung. 1889:524–536. [German]
24. Choudhuri S. Some major landmarks in the path from nuclein to human genome. Toxicol Mechan Methods 2006;16:137–59.
25. Watson JD, Crick FHC. Molecular structure of nucleic acids: a structure for deoxyribose nucleic acid. Nature (Lond) 1953;171:737–8.
26. Udenfriend S, Zaltzman P. Fluorescence characteristics of purines, pyrimidines, and their derivatives: measurement of guanine in nucleic acid hydrolyzates. Anal Biochem 1962;3:49–59.
27. Rabinowitz HM. A correlation of fluorescence of human urine with benign and malignant growth. Cancer Res 1949;9:672–6.
28. Schiller AA. Quantitative measurement of cutaneous fluorescein fluorescence as indicator of the capillary circulation. Proc Soc Exp Biol Med 1949;72:594–8.
29. Hadding AR. Mineralienanalyse nach röntgenspektroskopischer methode. Zeitschrift Anorganisch Allgemeine Chemie 1922;122:195–200. [German]
30. Boerstler EW, inventor. Fluorescence detecting apparatus. US Patent 2,139,797. 1938 Dec 13.
31. Wheeless LL Jr, Patten SF, inventors; Bausch & Lomb, Inc., assignee. Computerized slit-scan cytofluorometer for automated cell recognition. US Patent 3,657,537. 1972 Apr 18.
32. Adams LR, Kamentsky LA, inventors; Bio/Physics Systems, Inc., assignee. Method for analysis of blood by optical analysis of living cells. US Patent 3,684,377. 1972 Aug 15.
33. Hirschfeld T, inventor; Block Engineering, assignee. Method and apparatus for detecting and classifying nucleic acid particles. US Patent 3,887,312. 1975 Jun 3.
34. Falkow S, Moseley SL, inventors; Univ. Washington, assignee. Specific DNA probes in diagnostic microbiology. US Patent 4,358,535. 1982 Nov 9.
35. Tchen P, Kourilsky P, Leng M, Cami AB, inventors; Institut Pasteur, assignee. Probe containing a modified nucleic acid recognizable by specific antibodies and use of this probe to detect and characterize a homologous DNA sequence. France Patent 2518775. 1983 Jun 24. US Patent 5,098,825. 1992 Mar 24.
36. Heller MJ, Morrison LE, Prevatt WD, Akin C, inventors; Std. Oil Co., assignee. Light-emitting polynucleotide hybridization diagnostic method. European Patent 0070687. 1983 Jan 26.
37. Ranki TM, Soderlund HE, inventors; Orion Corp. Ltd., assignee. Detection of microbial nucleic acids by a one-step sandwich hybridization test. US Patent 4,563,419. 1986 Jan 7.
38. Ward DC, Langer PR, Waldrop A III, inventors; Yale Univ., assignee. Modified nucleotides and methods of preparing and using same. US Patent 4,711,955. 1987 Dec 8.
39. Ward DC, Leary JJ, Brigati DJ, inventors; Yale Univ., assignee. Visualization polymers and their application to diagnostic medicine. US Patent 4,687,732. 1987 Aug 18.
40. Smith LM, inventor; Cal. Inst. Technology, assignee. Nucleoside phosphoramidites and oligonucleotides produced therefrom. Great Britain Patent 2153356. 1985 Aug 21.
41. Englert DF, Wheeler RJ, inventors; Pharmacia, Inc., assignee. Electrophoresis method and apparatus having continuous detection means. US Patent 4,707,235. 1987 Nov 17.
42. Middendorf LR, Brumbaugh JA, inventors; Board of Regents of the Univ. Nebraska, assignee. DNA sequencing. US Patent 4,729,947. 1988 Mar 8.
43. Middendorf LR, Bruce JC, Bruce RC, Eckles RD, Grone DL, Roemer SC, Sloniker GD, et al. Continuous, on-line DNA sequencing using a versatile infrared laser scanner/electrophoresis apparatus. Electrophoresis 1992;13:487–94.
44. Heyroth FF, Loofbrow JR. Changes in the ultraviolet absorption spectrum of uracil and related compounds under the influence of radiations. In: Medicinal chemistry. Indianapolis: American Chemical Society; 1931. p 3441–553.
45. Konev SV. Fluorescence spectra and spectra of action of fluorescence in some proteins. Dokl Akad Nauk USSR 1957;116:594–7.
46. Konev SV. Fluorescence and phosphorescence of proteins and nucleic acids. New York: Plenum Press; 1967. p 141–6.
47. RajBhandary UL, Chang SH, Stuart A, Faulkner RD, Hoskinson RM, Khorana HG. Studies on polynucleotides, LXVIII. The primary structure of yeast phenylalanine transfer RNA. Proc Natl Acad Sci U S A 1967;57:751–8.
48. Yoshikami D, Keller EB. Chemical modification of the fluorescent base in phenylalanine transfer ribonucleic acid. Biochemistry 1971;10:2969–76.
49. Yang C, Söll D. Covalent attachment of fluorescent groups to transfer ribonucleic acid: reactions with 4-bromomethyl-7-methoxy-2-oxo-2H-benzopyran. Biochemistry 1974;13:3615–21.
50. Yang CH, Söll D. Studies of transfer RNA tertiary structure by singlet-singlet energy transfer. Proc Natl Acad Sci U S A 1974;71:2838–42.
51. Favre A, Yaniv M, Michelson AM. The photochemistry of 4-thiouridine in Escherichia coli t-RNA Val1. Biochem Biophys Res Com 1969;37:266–71.
52. Pochon F, Leng M, Michelson AM. Photochemistry of polynucleotides. III. Study of the fluorescence of polynucleotides at ordinary temperature. Biochim Biophys Acta 1968;169:350–62.
53. Pochon F, Balny C, Scheit KH, Michelson AM. The photochemistry of polynucleotides. V. Studies on 4-thiouridine-containing polymers. Biochim Biophys Acta 1971;228:49–56.
54. Ullman EF, Schwarzberg M, Rubenstein KE. Fluorescent excitation transfer immunoassay: a general method for determination of antigens. J Biol Chem 1976;251:4172–8.
55. Kriek E, Miller JA, Juhl U, Miller EC. 8-(N-2-fluorenylacetamido)guanosine, an arylamidation reaction product of guanosine and the carcinogen N-acetoxy-N-2-fluorenylacetamide in neutral solution. Biochemistry 1967;6:177–82.
56. Oser A, Valet G. Improved detection by time-resolved fluorometry of specific DNA immobilized in microtiter wells with europium/metal-chelator labelled DNA probes. Nucleic Acids Res 1988;16:8178.
57. Patonay G, Narayanan N, Strekowski L, Middendorf LR, Lipowska M, inventors; Licor Inc., assignee. A method for identifying strands of DNA using infrared fluorescence labels. European Patent 0670374. 1995 Sep 12.
58. Yang Y, Lowry M, Xu X, Escobedo JO, Sibrian-Vazquez M, Wong L, et al. Seminaphthofluorones are a family of water-soluble, low molecular weight, NIR-emitting fluorophores. Proc Natl Acad Sci U S A 2008;105:8829–34.
59. Lacenere CJ, Garg NK, Stoltz BM, Quake SR. Effects of a modified dye-labeled nucleotide spacer arm on incorporation by thermophilic DNA polymerases. Nucleosides Nucleotides Nucleic Acids 2006;25:9–15.
60. Castro SL, Barbera-Guillem E, inventors; BioCrystal Ltd., assignee. Functionalized nanocrystals and their use in detection systems. US Patent 6,114,038. 2000 Sep 5.
61. Barbera-Guillem E, Nelson M, Castro SL, inventors; Bio-Pixels Ltd., assignee. Functionalized nanocrystals and their use in labeling for strand synthesis or sequence determination. US Patent 6,221,602. 2001 Apr 24.
62. Weiss S, Bruchez M Jr, Alivisatos P, inventors; Regents of the Univ. of California, assignee. Organo luminescent semiconductor nanocrystal probes for biological applications and process for making and using such probes. US Patent 5,990,479. 1999 Nov 23.
63. Li W-LR, Zhou JS, inventors; Monsanto Technology LLC, assignee. Fluorescent oligonucleotides and uses thereof. US Patent 6,838,244. 2005 Jan 4.
64. Hannah EC, inventor; Intel Corp., assignee. Carbon nanotube molecular labels. US Patent 6,821,730. 2004 Nov 23.
65. Jeng ES, Moll AE, Roy AC, Gastala JB, Strano MS. Detection of DNA hybridization using the near-infrared band gap fluorescence of single-walled carbon nanotubes. Nano Lett 2006;6:371–5.
66. Zhou X, Zhou J. Improving the signal sensitivity and photostability of DNA hybridizations on microarrays by using dye-doped core-shell silica nanoparticles. Anal Chem 2004;76:5302–12.
67. Li Z-P, Kambara H. Single nucleotide polymorphism analysis based on minisequencing coupled with a fluorescence microsphere technology. J Nanosci Nanotech 2005;5:1256–60.
68. Taylor JR, Fang MM, Nie S. Probing specific sequences on single DNA molecules with bioconjugated fluorescent nanoparticles. Anal Chem 2000;72:1979–86.

69. Chandler DJ, inventor; Luminex Corp., assignee. Encapsulation of discrete quanta fluorescent particles. US Patent 6,528,165. 2003 Mar 3.
70. Wieder I, inventor; Analytical Radiation, Inc., assignee. Method and apparatus for improved analytical fluorescent spectroscopy. US Patent 4,058,732. 1977 Nov 15.
71. Soini E, Hemmila I, inventors; Wallac Oy (Finland), assignee. Fluorescence spectroscopy assay means with fluorescent chelate of a lanthanide. US Patent 4,374,120. 1983 Feb 15.
72. Hemmila I, Dakubu S, inventors; Wallac Oy (Finland), assignee. Method for fluorescence spectroscopic determination of a biologically active substance. US Patent 4,565,790. 1986 Jan 21.
73. Ekimov AI, Onushchenko AA. Quantum size effect in three-dimensional microscopic semiconductor crystals. JETP Lett 1981;34:345–9.
74. Rossetti R, Nakahara S, Brus LE. Quantum size effects in the redox potentials, resonance Raman spectra, and electronic spectra of CdS crystallites in aqueous solution. J Chem Phys 1983;79:1086–8.
75. Evident Technologies. http://www.evidenttech.com (Accessed February 2009).
76. Bruchez M Jr, Moronne M, Gin P, Weiss S, Alivisatos AP. Semiconductor nanocrystals as fluorescent biological labels. Science (Wash DC) 1998;281:2013–6.
77. Chan WC, Nie S. Quantum dot bioconjugates for ultrasensitive nonisotopic detection. Science (Wash DC) 1998;281:2016–8.
78. Lawton C, Conroy S, inventors; Univ. of Massachusetts Lowell, assignee. Biomolecular synthesis of quantum dot composites. US Patent 5,985,353. 1999 Nov 16.
79. Mooradian A. Photoluminescence of metals. Phys Rev Lett 1969;22:185–7.
80. Li C-Z, Male KB, Hrapovic S, Luong JHT. Fluorescence properties of gold nanorods and their application for DNA biosensing. Chem Comm 2005;31:3924–6.
81. Wilcoxon JP, Martin JE, Parsapour F, Wiedenman B, Kelley DF. Photoluminescence from nanosize gold clusters. J Chem Phys 1998;108:9137–43.
82. Wu M, Davidson N. Transmission electron microscopic method for gene mapping on polytene chromosomes by in situ hybridization. Proc Natl Acad Sci U S A 1981;78:7059–63.
83. Mirkin CA, Letsinger RL, Mucic RC, Storhoff JJ. A DNA-based method for rationally assembling nanoparticles into macroscopic materials. Nature (Lond) 1996;382:607–9.
84. O’Connell MJ, Bachilo SM, Huffman CB, Moore VC, Strano MS, Haroz EH, Rialon KL, et al. Band gap fluorescence from individual single-walled carbon nanotubes. Science (Wash DC) 2002;297:593–6.
85. Kochetov NK, Shibaev VN, Kost AA. New reaction of adenine and cytosine derivatives potentially useful for nucleic acid modifications. Tetrahedron Lett 1971;1933–6.
86. Bolton PH, Kearns DR. Spectroscopic properties of ethidium monoazide: a fluorescent photoaffinity label for nucleic acids. Nucleic Acids Res 1978;5:4891–903.
87. Tso PO, Lu P. Interaction of nucleic acids, II. Chemical linkage of the carcinogen 3,4-benzpyrene to DNA induced by photoradiation. Proc Natl Acad Sci U S A 1964;51:272–80.
88. Chieco P, Derenzini M. The Feulgen reaction 75 years on. Histochem Cell Biol 1999;111:345–58.
89. Ornstein L, Mautner W, Davis BJ, Tamura R. New horizons in fluorescence microscopy. J Mt Sinai Hosp N Y 1957;24:1066–78.
90. Kasten FH, Burton V, Glover P. Fluorescent Schiff-type reagents for cytochemical detection of polyaldehyde moieties in sections and smears. Nature (Lond) 1959;184:1797–8.
91. Churchich JE. Fluorescence studies on soluble ribonucleic acid labeled with acriflavine. Biochim Biophys Acta 1963;75:274–6.
92. Bienvenüe A, Tournon J. Specific labelling in the 3′ and 5′ OH position of a nucleoside with a fluorescent dye. Biochimie 1973;55:1167–9.
93. Tournon J. Fluorescence probing of nucleic acids: I. Singly and doubly labeled dithymidine phosphate: fluorescence and energy transfer studies. Nucleic Acids Res 1975;2:1261–73.
94. Menzel HM. On the Phe-tRNA induced binding of fluorescent oligonucleotides to the ribosomal decoding site. Nucleic Acids Res 1977;4:2881–92.
95. Smith LM, Fung S, Hunkapiller MW, Hunkapiller TJ, Hood LE. The synthesis of oligonucleotides containing an aliphatic amino group at the 5′ terminus: synthesis of fluorescent DNA primers for use in DNA sequence analysis. Nucleic Acids Res 1985;13:2399–412.
96. Hiratsuka T, Uchida K. Preparation and properties of 2′(or 3′)-O-(2,4,6-trinitrophenyl) adenosine 5′-triphosphate, an analog of adenosine triphosphate. Biochim Biophys Acta 1973;320:635–47.
97. Yang CH, Söll D. Covalent attachment of fluorescent groups to the 5′-end of transfer RNA. Arch Biochem Biophys 1973;155:70–81.
98. Gohlke JR, Hedaya E, Kang J, Mier JD, inventors; Baker Inst. Corp., assignee. Novel chromogenic and/or fluorogenic substrates for monitoring catalytic or enzymatic activity. US Patent 4,378,458. 1983 Mar 28.
99. Fidanza JA, McLaughlin LW. Introduction of reporter group at specific sites in DNA containing phosphorothioate diesters. J Am Chem Soc 1989;111:9117–9.
100. Musajo L, Rodighiero G. A new photo-reaction between some furocoumarins and flavin mononucleotide. Nature (Lond) 1961;190:1109–10.
101. Dall’Acqua F. Studies on the photoreaction (365 nm) between psoralen and thymine. Scienze Chemiche 1968;38:1094–9.
102. Musajo L, Rodighiero G. The skin-photosensitizing furocoumarins. Experientia 1962;18:153–61.
103. Musajo L, Rodighiero G, Colombo G, Torlone V, Dall’Acqua F. Photosensitizing furocoumarins: interaction with DNA and photo-inactivation of DNA containing viruses. Experientia 1965;21:22–4.
104. Musajo L, Rodighiero G, Dall’Acqua F. Evidences of a photoreaction of the photosensitizing furocoumarins with DNA and with pyrimidine nucleosides and nucleotides. Experientia 1965;21:24–6.
105. Musajo L, Rodighiero G. The mechanism of action of the skin photosensitizing furocoumarins. Acta Derm Venereol 1967;47:298–303.
106. Musajo L, Visentini P, Bacchinetti F, Razzi MA. Photoinactivation of Ehrlich ascites tumor cells in vitro obtained with skin-photosensitizing furocoumarins. Experientia 1967;23:335–6.
107. Lee YJ, Summers WA, Burr JG. Fluorescent and phosphorescent pyrimidine labels: α-diketone derivative of uracil and thymine. Tetrahedron 1978;34:2861–8.
108. Sato T, Okahata Y, inventors; Tokyo Institute of Technology (Japan), assignee. Fluorescence-labeled probe for DNA and a fluorescence-labeled plasmid. US Patent 6,608,213. 2003 Aug 19.
109. Pochon F, Perrin M. Fluorescent labelling of polynucleotides by 9-bromomethylanthracene. Eur J Biochem 1974;43:107–13.
110. Barrio JR, Secrist JA 3rd, Leonard NJ. A fluorescent analog of nicotinamide adenine dinucleotide. Proc Natl Acad Sci U S A 1972;69:2039–42.
111. Secrist JA 3rd, Barrio JR, Leonard NJ. A fluorescent modification of adenosine triphosphate with activity in enzyme systems: 1,N6-ethenoadenosine triphosphate. Science (Wash DC) 1972;175:646–7.
112. Saito I, Ito S, Shinmura T, Metsuura T. A simple synthesis of fluorescent uridines by photochemical method. Tetrahedron Lett 1980;21:2813–6.
113. Gillam I, Blew D, Warrington RC, von Tigerstrom M, Tener GM. A general procedure for the isolation of specific transfer ribonucleic acids. Biochemistry 1968;7:3459–68.
114. Lynch DC, Schimmel PR. Cooperative binding of magnesium to transfer ribonucleic acid studied by a fluorescent probe. Biochemistry 1974;13:1841–52.
115. Limbach PA, Crain PF, McCloskey JA. Summary: the modified nucleosides of RNA. Nucleic Acids Res 1994;22:2183–96.
116. Secrist JA 3rd, Barrio JR, Leonard NJ. Attachment of a fluorescent label to 4-thiouracil and 4-thiouridine. Biochem Biophys Res Commun 1971;45:1262–70.
117. Favre A, Yaniv M. Introduction of an intramolecular fluorescent probe in E. coli tRNA(Val)(1). FEBS Lett 1971;17:236–40.
118. Favre A, Thomas G. Transfer RNA: from photophysics to photobiology. Annu Rev Biophys Bioeng 1981;10:175–95.
119. Rozovskaia TA, Bibilashvili PSh, Tarusova NB, Gurskiĭ GV, Strel’tsov SA.
120.
121.
122.
123.
124.
Addition of the fluorescent label to the 3⬘-OH end of DNA and the 3⬘-OH end of nascent RNA. Mol Biol (Mosk) 1977;11:598 – 610. [Russian] Eshaghpour H, So¨ll D, Crothers DM. Specific chemical labeling of DNA fragments. Nucleic Acids Res 1979;7:1485–95. Richardson RW, Gumport RI. Biotin and fluorescent labeling of RNA using T4 RNA ligase. Nucleic Acids Res 1983;11:6167– 84. Dunn DA, Lin VH, Kochevar IE. The role of ground state complexation in the electron transfer quenching of methylene blue fluorescence by purine nucleotides. Photochem Photobiol 1991; 51:47–56. Armstrong JA, Niven JS. Fluorescence microscopy in the study of nucleic acids: histochemical observations on cellular and virus nucleic acids. Nature (Lond) 1957;180:1335– 6. Mikhailov MV, Zasedatelev AS, Krylov AS, Gur-

682 Clinical Chemistry 55:4 (2009)

125.

126. 127. 128.

129.

130.

131.

132.

133.

134.

135.

136.

137.

138.

139.

140. 141.

142.

skiı˘ GV. Mechanism of AT base pairs recognition by molecules of dye “Hoechst 33258.” Mol Biol (Mosk) 1981;15:690 –705. Norde´n B, Tjerneld F, Palm E. Linear dichroism studies of binding site structures in solution: complexes between DNA and basic arylmethane dyes. Biophys Chem 1978;8:1–15. von Provazek S. Uber fluoreszenz der zellen. Kleinwelt 1914;6:30. [German] Clark G, Kasten FH. History of staining, 3rd ed. Baltimore: Williams and Wilkins; 1983. 304 p. Seaman A, Woodbine M. The antibacterial activity of phenanthridine compounds. Br J Pharmacol Chemother 1954;9:265–70. Lerman LS. Structural considerations in the interaction of DNA and acridines. J Mol Bio 1961; 3:18 –30. Elliott WH. The effects of antimicrobial agents on deoxyribonucleic acid polymerase. Biochem J 1963;86:562–7. Le Pecq JB, Yot P, Paoletti C. Interaction du bromohydrate d’ethidium (BET) avec les acides nucleiques. Etude spectroflurometrique. CR Acad So Paris 1964;259:1786 –9. [French] Waring MJ. Complex formation between ethidium bromide and nucleic acids. J Mol Biol 1965;13:269 – 82. Gaugain B, Barbet J, Oberlin R, Roques BP, Le Pecq JB. DNA bifunctional intercalators. I. Synthesis and conformational properties of an ethidium homodimer and of an acridine ethidium heterodimer. Biochemistry 1978;17: 5071– 8. Lee LG, Chen CH, Chiu LA. Thiazole orange: a new dye for reticulocyte analysis. Cytometry 1986;7:508 –17. Glazer AN, Rye HS. Stable dye-DNA intercalation complexes as reagents for high-sensitivity fluorescence detection. Nature (Lond) 1992;359: 859 – 61. Singer VL, Jones LJ, Yue ST, Haugland RP. Characterization of PicoGreen reagent and development of a fluorescence-based solution assay for double-stranded DNA quantitation. Anal Biochem 1997;249:228 –38. Le Pecq JB, Paoletti C. Interaction of ethidium hydrobromate (EH) with polyribonucleotides. Applications to the study of hybridization reactions. 
Comptes Rendus Hebdomadaires des Seances de l’Academie des Sciences–D: Sciences Naturelles 1965;260:7033– 6. Le Pecq JB, Paoletti C. Study of displacement reactions between polyribonucleotides by use of ethidium hydrobromate (ETB): demonstration of displacement of the poly (A-2 I) by poly U. Comptes Rendus Hebdomadaires des Seances de l’Academie des Sciences–D: Sciences Naturelles 1965;261:838 – 41. Borst P. Ethidium bromide agarose gel electrophoresis: how it started. IUBMB Life 1005; 57:745–7. Aaij C, Borst P. The gel electrophoresis of DNA. Biochim Biophys Acta 1972;269:192–200. Sharp PA, Sugden B, Sambrook J. Detection of two restriction endonuclease activities in Haemophilus parainfluenzae using analytical agarose-ethidium bromide electrophoresis. Biochemistry 1973;12:3055– 63. Mayor HD, Diwan AR. Studies on the acridine orange staining of two purified RNA viruses:

143.

144.

145.

146.

147.

148.

149.

150.

151.

152.

153.

154.

155.

156.

157.

158.

159.

poliovirus and tobacco mosaic virus. Virology 1961;14:74 – 82. Tas J, Mendelson D, Noorden CJ. Cuprolinic blue: a specific dye for single-stranded RNA in the presence of magnesium chloride. I. Fundamental aspects. Histochem J 1983;15:801–14. Rye HS, Dabora JM, Quesada MA, Mathies RA. Fluorometric assay using dimeric dyes for doubleand single-stranded DNA and RNA with pictogram sensitivity. Anal Biochem 1993;208:144 –50. Labarca C, Paigen K. A simple, rapid, and sensitive DNA assay procedure. Anal Biochem 1980;102:344 –52. Mergny J-L, Collier D, Rougee M, MontenatGarestier T, Helene C. Intercalation of ethidium bromide into triple-stranded oligonucleotide. Nucleic Acids Res 1991;19:1521– 6. Latt SA, Wohlleb JC. Optical studies of the interaction of 33258 Hoechst with DNA, chromatin, and metaphase chromosomes. Chromosoma 1975;52:297–316. Bunkenborg J, Stidsen MM, Jacobsen JP. On the sequence selective bis-intercalation of a homodimeric thiazole orange dye in DNA. Bioconjugate Chem 1999;10:824 –31. Vener TI, Turchinsky MF, Knorre VD, Lukin YV, Shcherbo SN, Zubov VP, Sverdlov ED. A novel approach to non-radioactive hybridization assay of nucleic acids using stained latex particles. Anal Biochem 1991;198:308 –11. Langer-Safer PR, Levine M, Ward DC. Immunological method for mapping genes on Drosophila polytene chromosomes. Proc Natl Acad Sci U S A 1982;79:4381–5. Bauman JG, Wiegant J, van Duijn P. Cytochemical hybridisation with fluorochrome-labelled RNA. III. Increased sensitivity by the use of anti-fluorescein antibodies. Histochemistry 1981; 73:181–93. Rule GS, Montagna RA, Durst RA. Rapid method for visual identification of specific DNA sequences based on DNA-tagged liposomes. Clin Chem 1996;42:1206 –9. Crissman HA, Stevenson AP, Orlicky DJ, Kissane RJ. Detailed studies on the application of three fluorescent antibiotics for DNA staining in flow cytometry. Stain Technol 1978;53:321–30. Lewis PN. 
Fluorescently labelled histones as probes of nucleosome structure: preparation and general properties of methionine-labelled histone H4. Eur J Biochem 1979;99:315–22. Beiser SM, Andres GA, Christian CL, Hsu KC, Seegal BC. Immunological studies of lupus-like nephritis in NZB/NZW F1 mice [Abstract]. Federation Proc 1968;27:621, A2282. Benjaminson MA, Hunter DB, Katz IJ. Fluorochrome-labelled deoxyribonuclease: specific stain for cell nuclei. Science (New York, NY) 1968;160:1359 – 60. Cheung SW, Tishler PV, Atkins L, Sengupta SK, Modest EJ, Forget BC. Gene mapping by fluorescent in situ hybridization [Abstract]. J Cell Biol 1976;70:221A. Cheung SW, Tishler PV, Atkins L, Sengupta SK, Modest EJ, Forget BG. Gene mapping by fluorescent in situ hybridization. Cell Biol Intl Rep 1977;1:255– 62. Stollar BD. Double-helical polynucleotides: immunochemical recognition of differing conformations. Science (Wash DC) 1970;169:609 –11.

Reviews

Analytical Ancestry

160. Rudkin GT, Stollar BD. High resolution detection of DNA-RNA hybrids in situ by indirect immunofluorescence. Nature (Lond) 1977;265: 472–3. 161. Stuart WD, Frank MB, inventors; Univ. of Hawaii, assignee. Monoclonal antibodies for DNARNA hybrid complexes and their uses. US Patent 4,732,847. 1988 Mar 22. 162. Beardsley K, Cantor CR. Studies of transfer RNA tertiary structure by singlet-singlet energy transfer. Proc Natl Acad Sci U S A 1970; 65:39 – 46. 163. Gelfand D, Holland P, Saiki R, Watson R. inventors; Hoffman-La Roche Inc., assignee. Homogeneous assay system using the nuclease activity of a nucleic acid polymerase. US Patent 5,210,015. 1990 Aug 6. 164. Ju J, inventor; Incyte Pharmaceuticals, Inc., assignee. Sets of labeled energy transfer fluorescent primers and their use in multi component analysis. US Patent 5,804,386. 1998 Sep 8. 165. Soderlund HE, Weckman AM, inventors; OrionYahtyma Oy, assignee. Method for assays of nucleic acid: a regent combination and kit therefor. US Patent 5,476,769. 1995 Dec 19. 166. Tyagi S, Kramer FR, Lizardi PM, inventors; Public Health Research Institute of the City of New York, Inc., assignee. Detectably labeled dual conformation oligonucleotide probes, assays and kits. US Patent 5,925,517. 1999 Jul 20. 167. Tyagi S, Kramer FR. Molecular beacons: probes that fluoresce upon hybridization. Nat Biotechnol 1996;14:303– 8. 168. Eisinger J, Gueron M, Shulman RG, Yamane T. Excimer fluorescence of dinucleotides, polynucleotides, and DNA. Proc Natl Acad Sci U S A 1966;55:1015–20. 169. Gueron M, Shulman RG, Eisinger J. Energy transfer in dinucleotides. Proc Natl Acad Sci U S A 1966;55:814 – 8. 170. Lipsett MN. The behavior of 4-thiouridine in the E. coli s-RNA molecule. Biochem Biophys Res Comm 1965;20:244 –9. 171. Jeffrey AM, Jennette KW, Blobstein SH, Weinstein

172.

173.

174.

175.

176.

177.

178.

179.

IB, Beland FA, Harvey RG, et al. Benzo[a]pyrenenucleic acid derivative found in vivo: structure of a benzo[a]pyrenetetrahydrodiol epoxide-guanosine adduct. J Am Chem Soc 1976;98:5714 –5. Koreeda M, Moore PD, Yagi H, Yeh HJ, Jerina DM. Alkylation of polyguanylic acid at the 2-amino group and phosphate by the potent mutagen (⫹/⫺)-7beta , 8alpha-dihydroxy-9beta , 10betaepoxy-7,8,9,10-tetrahydrobenzo[a]pyrene. J Am Chem Soc 1976;98:6720 –2. Casperson T, Farber S, Foley GE, Kudynowski J, Modest EJ, Simonsson E, et al. Chem differentiation along metaphase chromosomes. Exp Cell Res 1968;49:219 –22. Draper DE, Gold L. A method for linking fluorescent labels to polynucleotides: application to studies of ribosome-ribonucleic acid interactions. Biochemistry 1980;19:1774 – 81. Leng M, Pochon F, Michelson AM. [Photochimie des polynucleotides, II Etude de la luminescence a temperature ordinaire de mononucleotides et dinucleotides]. Biochem Biophys Acta 1968;169: 338 – 49. Yoshida S, Hirose S, Iwamoto M. Use of 4-bromomethyl-7-methoxycoumarin for derivatization of pyrimidine compounds in serum analysed by high-performance liquid chromatography with fluorometric detection. J Chromatogr 1986;383:61– 8. Pingoud A, Boehme D, Riesner D, Kownatzki R, Maass G. Anti-cooperative binding of two tRNATyr molecules to tyrosyl-tRNA synthetase from Escherichia coli. Eur J Biochem 1975;56: 617–22. Schiller PW, Schechter AN. Covalent attachment of fluorescent probes to the X-base of Escherichia coli phenylalanine transfer ribonucleic acid. Nucleic Acids Res 1977;4:2161–7. Faulhammer HG, Sprinzl M, Cramer F. Fluorescamine modification of E. coli and yeast transfer RNAs and their use in the study of protein biosynthesis [Abstract]. In: Miriam Balaban, ed. Molecular mechanisms of biological recognition: proceedings of the sixth Aharon Katzir-

180.

181.

182.

183.

184.

185.

186.

187.

Katchalsky Conference, in conjunction with the Minerva Symposium in Biology, Go¨ttingen and Braunlage/Harz, September 24 –30, 1978. Amsterdam; New York: Elsevier/North-Holland Biomedical Press; 1979. Eigen M and Cramer F, organizers. Kasai H, Shindo-Okada N, Noguchi S, Nishimura S. Specific fluorescent labeling of 7-(aminomethyl)-7deazaguanosine located in the anticodon of tRNATyr isolated from E. coli mutant. Nucleic Acids Res 1979;7:231– 8. Kuchino Y, Kasai H, Yamaizumi Z, Nishimura S, Borek E. Under-modified Y base in a tRHAPhe isoacceptor observed in tumor cells. Biochim Biophys Acta 1979;565:215– 8. Macklin JJ, Trautman JK, Harris TD, inventors; Seq Ltd., assignee. Method to make fluorescent nucleotide photoproducts for DNA sequencing and analysis. World Intellectual Property Organization (WO) Patent 013110. 1999 Mar 18. Ward DC, Cerami A, Reich E, Acs G, Altwerger L. Biochemical studies of the nucleoside analogue, formycin. J Biol Chem 1969;244:3243–50. Ward DC, Reich E, Stryer L. Fluorescence studies of nucleotides and polynucleotides. I. Formycin, 2-aminopurine riboside, 2,6-diaminopurine riboside, and their derivatives. J Biol Chem 1969; 244:1228 –37. Zhenodarova SM, Klyagina VP. Step-wise oligonucleotide synthesis. XXV. Synthesis of trinucleoside diphosphates containing a fluorescent label. Bioorg Khim 1977;3:1623–5. Ried T, Baldini A, Rand TC, Ward DC. Simultaneous visualization of seven different DNA probes by in situ hybridization using combinatorial fluorescence and digital imaging microscopy. Proc Natl Acad Sci U S A 1992;89: 1388 –92. Prober JM, Trainor GL, Dam RJ, Hobbs FW, Robertson CW, Zagursky RJ, et al. A system for rapid DNA sequencing with fluorescent chainterminating dideoxynucleotides. Science (Wash DC) 1987;238:336 – 41.


Clinical Chemistry 55:4 684–697 (2009)

Utilizing the Molecular Gateway: The Path to Personalized Cancer Management

Jonathan B. Overdevest,1 Dan Theodorescu,1 and Jae K. Lee2*

BACKGROUND: Personalized medicine is the provision of focused prevention, detection, prognostic, and therapeutic efforts according to an individual’s genetic composition. Realizing personalized medicine will require combining a patient’s conventional clinical data with bioinformatics-based molecular-assessment profiles. This synergistic approach offers tangible benefits, such as greater specificity in the molecular classification of cancer subtypes, improved prognostic accuracy, targeted development of new therapies, novel applications for old therapies, and tailored selection and delivery of chemotherapeutics.

CONTENT: Our ability to personalize cancer management is rapidly expanding through biotechnological advances in the postgenomic era. The platforms of genomics, proteomics, single-nucleotide polymorphism profiling and haplotype mapping, high-throughput genomic sequencing, and pharmacogenomics constitute the mechanisms for the molecular assessment of a patient’s tumor. The complementary data derived from these assessments are processed through bioinformatics analysis to offer unique insights for linking expression profiles to disease detection, tumor response to chemotherapy, and patient survival. Together, these approaches improve physicians’ capacity to assess risk, target therapies, and tailor a course of chemotherapy.

SUMMARY: Personalized medicine is poised for rapid growth as the insights provided by new bioinformatics models are integrated with current procedures for assessing and treating cancer patients. Integrating these biological platforms will require refinement of tissue-processing and analysis techniques, particularly in clinical pathology, to overcome the obstacles to tailoring cancer treatment to the individual patient.

© 2009 American Association for Clinical Chemistry

Departments of 1 Molecular Physiology and Biological Physics; and 2 Public Health Sciences, University of Virginia Health Sciences Center, Charlottesville, VA. * Address correspondence to this author at: Department of Public Health Sciences, Box 800717, University of Virginia Health Sciences Center, Charlottesville, VA 22908, USA. E-mail [email protected]. Received October 1, 2008; accepted January 28, 2009. Previously published online at DOI: 10.1373/clinchem.2008.118554


Throughout the present “era of postgenomics,” technological breakthroughs and bioinformatics-based analysis have rapidly expanded our ability to unravel the molecular composition and function of disease. These advances in molecular assessment have heightened anticipation of their application to improving patient care through personalized medicine (1 ). Cancer, despite sharing common aberrant physiological alterations, is a diverse constellation of disease processes (2 ). Its various manifestations are immensely heterogeneous with respect to metastatic potential and resistance to treatment (3 ). Both of these factors contribute to the failure of modern cancer therapies to durably prevent recurrence, as evidenced by the stagnant mortality rates of the past 3 decades (4 ). Thus, the heterogeneous nature of cancer and the shortcomings of currently available therapeutics suggest that a personalized approach could play a central role in cancer management. Indeed, the paradigm-shifting strategy of targeting a dysregulated kinase in chronic myelogenous leukemia has refocused therapeutic development on understanding molecular mechanisms (5 ).

Current standards of cancer management are ripe for personalization, because opportunities for the molecular assessment of mutational dysregulation exist throughout the clinical course of disease progression (Fig. 1). Even before malignant transformation occurs, the interplay between genetic composition and environmental factors shapes an individual’s predisposition to cancer. Screening for these genetic factors can give clinicians the insight necessary to recommend changes in behavior, lifestyle, and diet while monitoring for disease onset. Once malignant transformation has occurred, the management approach shifts from modifying risk to preventing progression.
The bioinformatics-based analyses discussed in this review aim to integrate data from a patient’s clinical presentation and the molecular-expression profile of the patient’s tumor with molecular information from external databases. This multisource, integrative approach to cancer management promises an unprecedented ability to assess risk, target therapies, and tailor treatments throughout the disease course. We describe existing molecular-profiling platforms and review applications that may contribute, now or in the future, to personalized cancer management. We consider the complementary nature of the data generated through the various modalities, including transcriptomics, proteomics, single-nucleotide polymorphism (SNP)3 profiling and haplotype mapping, high-throughput genomic sequencing, and pharmacogenomics. We then detail the role of bioinformatics in combining complex, multivariate molecular data to refine the relationships among the molecular classifications of morphologically distinct neoplasms, to identify and validate biomarkers, and to aid in characterizing patients’ risk, prognosis, and therapeutic response. Finally, we consider the broadened role of clinical pathology in the delivery of personalized cancer management.

Fig. 1. Pathways in personalized cancer management. Cancer management is personalized by integrating complementary molecular-assessment methods with conventional clinical practice. In a prediagnostic setting, combining current knowledge of environmental factors with available screens for heritable cancer-associated mutations allows physicians to modify prevention and monitoring guidelines. The onset of malignant transformation signals a transition from passive to active treatment to curtail an increase in disease burden. After diagnosis, the patient’s clinical data are combined with the tumor’s molecular-expression profile, and the data are analyzed with bioinformatics methods that draw on archival expression data from external databases. This integrated effort tailors cancer treatment by improving the accuracy of risk assessment and the selection of therapies targeted to specific molecular dysregulations. Most importantly, the iterative nature of this approach offers physicians a continuous opportunity to reassess risk and select additional therapies as a patient’s molecular profile changes.

3 Nonstandard abbreviations: SNP, single-nucleotide polymorphism; miRNA, microRNA; FDA, Food and Drug Administration; COSMIC, Catalogue of Somatic Mutations in Cancer; FFPE, formalin-fixed, paraffin-embedded; qPCR, quantitative real-time PCR.

Molecular Avenues toward Personalized Medicine

Genomic-assessment methods spawned by the sequencing of the human genome have catalyzed an increasingly molecular approach to studying the foundations of human disease. Indeed, profiling platforms that provide insight into the roles of genetic sequences, RNA and protein concentrations, and metabolic-enzyme activities in the development of cancer are becoming prominent fixtures in its clinical management. These molecular-profiling techniques promise to complement current practices of clinical evaluation to permit more comprehensive staging and assessment of tumor progression. Continuing along this complementary path should pave the way toward more personalized diagnoses, prognoses, and predictions of treatment response. Current platforms for molecular assessment include transcriptomics, proteomics, SNP profiling and haplotype mapping, high-throughput genetic sequencing, and pharmacogenomics. Each of these approaches relies on a different type of biological input; accordingly, each has unique advantages and limitations with respect to personalized medicine. A comprehensive understanding of the strengths and weaknesses of these routes will allow us to combine the information they provide more effectively (Table 1).

Table 1. Molecular platforms for personalized medicine.

Transcriptomics (genomic expression)
Strengths: High-throughput target identification; accurate expression quantification; cost-effective large-scale screening; well-established protocols for RNA extraction and hybridization; emerging insight into epigenetics and miRNA regulation
Weaknesses: Unable to detect posttranslational modifications and interactions; sample heterogeneity; limited access to high-quality preserved samples
Applications: Classification of tumor subtypes [Weber (7 ), Ramaswamy et al. (55 )]; prediction of therapeutic response [Potti et al. (39 ), Lee et al. (42 )]; epigenetic assessment in cancer management [Esteller (14 )]; miRNA expression signatures for diagnosis and tumor classification [Calin and Croce (9 )]

Proteomics (proteomic expression)
Strengths: Direct functional interactions with drugs and molecular targets
Weaknesses: Difficulty in large-scale target identification; inaccurate and inefficient expression quantification; sample heterogeneity; limited access to high-quality preserved samples
Applications: Testing and patient stratification for drug-sensitivity gene networks [Araujo et al. (56 )]

SNP/HapMap
Strengths: Cost-effective large-scale genetic-variation screening in patients; low error rate; well-established analysis tools
Weaknesses: Limited biological implications; difficulty finding direct gene targets; unable to detect target functions
Applications: Many disease applications in the HapMap Consortium (21 )

High-throughput gene sequencing
Strengths: Comprehensive sequence information
Weaknesses: High experimental cost per sample; overwhelming quantity of data that is challenging to analyze; no direct detection of target functions
Applications: Tumor classification, CGAP,a HCGP [Ley et al. (24 )]

Pharmacogenomics
Strengths: Directly targets patient subpopulations with specific molecular characteristics; limits toxicity and untoward side effects
Weaknesses: Limited knowledge of global gene-network interactions; limited pharmaceutical alternatives for individuals with polymorphic variants that lead to adverse reactions
Applications: Targeted therapeutics [Evans and Relling (26 ), van Schaik (27 )]

a CGAP, Cancer Genome Anatomy Project; HCGP, Human Cancer Genome Project.

TRANSCRIPTOMICS: GENOMIC-EXPRESSION PROFILING

Because many neoplastic processes are caused by mutations in the genetic code and subsequent errors in transcription, there have been concerted efforts to identify alterations in gene-expression levels that are associated with tumorigenesis. These transcriptomic studies rely on DNA-microarray technology to provide information on aberrant gene expression in human cancer. The 2 forms of microarray, cDNA arrays and oligonucleotide arrays, are available on a variety of commercial and noncommercial platforms. The principal detection method, however, is the same: mRNAs isolated from a patient’s tumor biopsy or from surgically resected tissue are used to create fluorescently labeled cDNA, which hybridizes with complementary sequences bound to the microarray to produce a fluorescent signal. Within the linear range of the assay, the observed intensity of each gene’s signal is proportional to the concentration of the original transcript in the tissue sample. The mechanics of microarrays and their utility in analyzing gene expression in cancer are thoroughly reviewed elsewhere (6 ).

Growing from a seminal study that classified acute leukemias on the basis of genomic expression (7 ), the widespread integration of microarray technology into biomedical and clinical oncologic research has characterized the transcriptional status of many different tumors. Subsequent studies have identified gene-expression signatures that serve as biomarkers, aid in prognostic prediction, and guide chemotherapeutic selection. Minna et al. have comprehensively summarized recent studies that have used expression profiling to derive biomarkers that predict a tumor’s response to chemotherapy (8 ). Additionally, the large quantities of data produced by these studies have prompted the development of Internet-based resources, such as the Oncomine Research Platform (http://www.oncomine.org/), that catalogue transcriptome profiles from published gene-expression analyses in a rapidly accessible database.

Two other elements of transcriptomic assessment in cancer deserve specific mention: the emerging field of microRNA (miRNA) and the study of epigenetic regulation. miRNAs are short noncoding RNAs of 20–22 nucleotides that exert their regulatory effects by binding to complementary sequences in the 3′ untranslated region of target mRNAs. Studies continue to identify miRNA mutations that disrupt the homeostatic regulation of gene expression, with some miRNAs acting as oncogenes (MIR155,4 microRNA 155) and others as tumor suppressors (MIR15A, microRNA 15a; MIR16-1, microRNA 16-1) (9 ). Moreover, the microarray expression-profiling methods used for mRNAs have been adapted for the analysis of noncoding miRNAs (10 ). Preliminary analyses with this technology have screened both healthy and tumor tissues to identify signatures capable of stratifying patients into prognostic cohorts and treatment subgroups (11, 12 ). Recent work has suggested an additional route of miRNA regulation, in which reduced miRNA concentrations in tumors are frequently due to epigenetic modifications, particularly hypermethylation (13 ). More broadly, dysfunctional epigenetic regulation is thought to play a role in tumor development, most commonly through alterations in DNA methylation (14 ).
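The complementary pairing through which miRNAs bind the 3′ untranslated region of their target mRNAs, mentioned above, can be illustrated with a short computation. The Python sketch below is a minimal illustration and not a method from this review: it assumes the common simplification that a canonical target site is an exact Watson-Crick match to miRNA nucleotides 2–8 (the “seed”), and it ignores G:U wobble pairs, site context, and conservation, which dedicated target-prediction tools also weigh. The function names and both sequences are hypothetical.

```python
# Minimal sketch (hypothetical helper names and toy sequences): find canonical
# 7-mer "seed" matches of a miRNA in a 3' UTR. Assumes exact Watson-Crick
# pairing of miRNA nucleotides 2-8 only; wobble pairs, site context, and
# conservation scoring are deliberately ignored.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence, read 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def seed_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> list[int]:
    """Return 0-based positions in `utr` where a seed-complementary site begins.

    `start` and `end` are 1-based, inclusive miRNA positions; the default
    2..8 selects the 7-nt seed region.
    """
    seed = mirna.upper().replace("T", "U")[start - 1:end]
    site = reverse_complement(seed)             # sequence the UTR must contain
    target = utr.upper().replace("T", "U")      # accept DNA or RNA input
    hits = []
    pos = target.find(site)
    while pos != -1:
        hits.append(pos)
        pos = target.find(site, pos + 1)        # keep overlapping matches
    return hits


if __name__ == "__main__":
    toy_mirna = "UCCAGUACGAUGCAAGCUUAGC"        # hypothetical 22-nt miRNA
    toy_utr = "AAGUACUGGAAUUCGUACUGGCA"         # hypothetical 3' UTR fragment
    print(seed_sites(toy_mirna, toy_utr))       # -> [2, 14]
```

Scanning for the reverse complement of the seed, rather than aligning the whole miRNA, reflects the simplifying assumption that seed pairing dominates target recognition; a fuller model would also score partial and supplementary pairing outside the seed.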
The classification of aberrant methylation patterns and the identification of hypermethylation markers in cancer promise to improve the clinical identification and management of individual cancers, as well as to inform the personalized application of epigenetics-based treatment regimens, such as 5-azacytidine (Vidaza) and 5-aza-2′-deoxycytidine (decitabine) (14 ). Although inappropriate methylation remains the most comprehensively studied epigenetic modification associated with tumorigenesis, histone alterations, such as lysine acetylation and serine phosphorylation, may ultimately become therapeutic targets for reprogramming a malignant cell’s epigenetic code (14 ).

4 Human genes: MIR155, microRNA 155; MIR15A, microRNA 15a; MIR16-1, microRNA 16-1; CYP2D6, cytochrome P450, family 2, subfamily D, polypeptide 6; CYP2C19, cytochrome P450, family 2, subfamily C, polypeptide 19; APC, adenomatous polyposis coli; BRCA1, breast cancer 1, early onset; BRCA2, breast cancer 2, early onset; ERBB2 (HER-2), v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian); HOXB13, homeobox B13; IL17RB, interleukin 17 receptor B.

The capability for multistudy, cross-institutional corroboration and the reproducibility of genomic-expression data currently make transcriptomics one of the most broad-scale, inclusive, and accurate means of personalizing oncologic care. These strengths account for the increasing reliance on expression profiling in making chemotherapeutic decisions, despite the often-cited potential disconnect between transcript concentrations and protein translational and functional status (15 ). Another limitation of transcriptomics, shared with other molecular-assessment platforms, is limited access to patient biosamples, which can lead to the use of samples of heterogeneous quality and type. The lack of a unified, multi-institutional “tumor tissue library” with well-annotated clinical information continues to dampen the impact of transcriptomics on the individualization of cancer management. Finally, proper transcriptomic analysis requires the pathologic discrimination of relevant tumor cells from a heterogeneous background. This hurdle, however, can often be overcome with techniques such as laser-capture microdissection and fluorescence-activated cell sorting, which help isolate the cells pertinent to a tumor’s etiology.

PROTEOMICS

Proteomics, although still in an early stage of exponential growth, already offers extensive capabilities for system-wide analyses aimed at elucidating functional protein interactions and discovering novel biomarkers for cancer therapeutics. Because activated protein-signaling cascades represent the final stage of genomic expression, most modern therapies have been designed to target and disrupt dysfunctional cellular signaling. Thus, the insight that proteomics can provide into the functional status of known neoplastic signaling networks should prove of great benefit for tumor classification and for the subsequent selection of treatment based on pharmacoproteomics (15 ). The protein microarray, a technology comparable to the widely used DNA microarray, stands as the cornerstone of proteomics research because it can efficiently analyze multiple proteins and their interactions with nucleic acids, lipids, and other small molecules (16 ). This style of targeted proteomics aims to characterize the abundance, modification, activity, localization, and interactions of protein-signaling cascades that are widely dysregulated in cancer (16 ). The format of a proteomic microarray is tailored to the intended focus of the investigation, e.g., forward-phase arrays with antibodies immobilized on a surface or reverse-phase arrays with immobilized analytes (such as each patient’s protein samples). Details of protein microarray formats can be found elsewhere (16, 17 ). The potential of proteomic analysis has yet to be fully realized because of technological limitations and restricted applicability. Because the exact number of polypeptides produced in humans is uncertain, we are left with estimates that range from hundreds of thousands to millions if one also counts splice variants and the posttranslational modifications that can occur for a given protein species (18 ). Proteins that have not yet been identified cannot be represented on current protein arrays, so results may be inconclusive or underpowered, or may contain numerous false negatives. Further complicating array analysis is the fact that the relative abundances of polypeptides vary by orders of magnitude (18 ). Some studies have suggested that approximately 90% of proteins exist at moderate to low concentrations (19 ), potentially below thresholds for separation and detection. Thus, correlating polypeptide quantities obtained with current proteomics techniques to actual protein concentrations, while accounting for the interactions of unknown protein species, is both a biological and a computational hurdle. Additional insight into the existence and interactions of polypeptides is sorely needed to reduce the complexity of the nearly intractable algorithms required to account for these shortcomings.

SNP PROFILING AND THE HapMap PROJECT

SNP mapping arose out of the need to perform large-scale, genomewide detection of genetic variants among patients. Coinciding with this need for efficient genomic analysis have been substantial technological advances that enable the discovery of disease associations among the approximately 10 million SNPs in the human genome. For example, several commercial SNP arrays can already accurately survey >500 000 SNPs in the human genome (20 ). Combinations of alleles at neighboring SNPs on a single chromosome are compiled into haplotypes. Screening for such haplotypes then provides an efficient means of ascertaining patterns of DNA sequence variation potentially linked with disease (21 ). The task of constructing a haplotype map of the human genome has been undertaken by the International HapMap Consortium, which aims to make linkage analyses of SNP variants publicly available (21 ). Selective profiling of genes through haplotype mapping remains relatively inexpensive, a substantial benefit given the cost of genomewide sequencing. SNP profiling can also be applied as a rapid and accurate means of surveying populations for biomarkers. Nevertheless, SNP profiling and haplotype mapping suffer from several limitations. Although understanding genetic variation helps to anticipate a predisposition to tumors, reliance on this approach alone generally provides no information on the functional mechanisms of disease or on associated therapeutic targets. Additionally, inferring an association from categorical SNP genotype data often requires a large number of patients, e.g., a few thousand, to detect the genetic effects of relevant SNP biomarkers. SNP profiling also cannot account for the potential effects of posttranscriptional modifications or epigenetic factors (22 ). Such modifications can alter a protein’s isoform, its function in downstream signaling, its interactions with other intracellular molecules, and its localization within a cell. These alterations are not predictable from analyses of genomic sequences alone, and other approaches must be used to gain a more complete view of a cell’s neoplastic potential.

HIGH-THROUGHPUT GENE SEQUENCING

Breakthroughs in high-throughput sequencing methods have rejuvenated interest in scouring the entire genomes of patients for mutations associated with a predisposition to neoplastic processes. Future applications of the improved techniques can be envisioned for both direct sequencing and heteroduplex-detection methods (22 ). Such efforts as the Cancer Genome Anatomy Project, the Human Cancer Genome Project, the Cancer Genome Project, and the Cancer Genome Atlas promise to showcase the benefits of genomewide sequencing by uncovering novel disease-associated mutations (23 ). More tangibly, a pilot study that used high-throughput, sequence-based mutational profiling in primary human acute myelogenous leukemia cells demonstrated the potential of this technology to detect the mutations that lead to tumorigenesis (24 ). The validity of these investigators’ high-throughput screen was supported by their identification of 6 previously described sequences and 7 novel sequences associated with acute myelogenous leukemia tumorigenesis (24 ). By allaying fears that chance mutations would obscure the identification of pathologically relevant mutations, this study established a foundation for future genomewide screens of primary tumors. The genomic-sequencing approach, however, suffers from limitations similar to those of SNP profiling. Its lack of insight into epigenetic and gene-product modifications supports the argument that genomewide screening techniques should be paired with functional assessments of a specific neoplastic process. High cost has persisted as a principal hurdle to broad-scale application of high-throughput sequencing. Although next-generation sequencing platforms will decrease expenses substantially (25 ), the sheer quantity of information delivered by a fully sequenced genome offers little guidance on how best to approach its analysis and interpretation. Distinguishing benign polymorphisms from tumorigenic mutations is still a formidable task, as is predicting which detected mutations are actually transcribed and therefore biologically relevant (22 ).

PHARMACOGENOMICS

Pharmacogenomics is based on assessing a patient’s genetic profile for known polymorphisms in specific networks of essential genes that encode either metabolic pathway effectors (enzymes and transporters), which govern the pharmacokinetics of drug metabolism and distribution, or drug targets (receptors), which govern drug response (26 ). One such genetic screen designed for metabolic analysis is the AmpliChip (Roche Molecular Systems), which has been approved by the US Food and Drug Administration (FDA). This screening system identifies a patient’s genotype for the cytochrome P450 genes CYP2D6 (cytochrome P450, family 2, subfamily D, polypeptide 6) and CYP2C19 (cytochrome P450, family 2, subfamily C, polypeptide 19). This development demonstrates the utility of pharmacogenomics by putting to practical use the genomic knowledge of polymorphism-based differences in isoenzymes that affect the metabolism of many drugs, including the common chemotherapeutics tamoxifen and cyclophosphamide (27 ). Pharmacogenomics thus offers a powerful method for optimizing the dosing of existing therapeutics. This approach, however, is limited by its narrow scope: given our incomplete knowledge of polymorphisms in metabolic pathways, predictions of the pharmacokinetics of a given therapeutic agent are unlikely to capture the full complexity of drug metabolism. Nevertheless, an increasing capacity to perform high-throughput genomewide scans for polymorphisms in genes involved in metabolism will lead to a coincident increase in the utility of pharmacogenomics.

Current Genomic Applications in Personalized Cancer Management

The genomics revolution has laid a foundation for deriving genetic signatures for prediagnostic genetic screening, tumor classification, prognostic evaluation, estimation of the risk of recurrence, and prediction of therapeutic response (Table 2).
Moreover, recent advances in the bioinformatic extrapolation of in vitro drug-sensitivity data promise to enable in silico prediction of therapeutic response.

PREDIAGNOSTIC GENETIC SCREENING

Even before malignant transformation, the interplay between genetic composition and environmental factors shapes an individual’s predisposition to cancer. Knowledge of this genetic predisposition would facilitate the design of preventive management aimed at modifying the risk of neoplastic progression. Such knowledge is available through genetic screening, which is most applicable to families with a history of oncogenic gene mutations, such as in APC (adenomatous polyposis coli), BRCA1 (breast cancer 1, early onset), and BRCA2 (breast cancer 2, early onset), in which mutational status will guide a clinical decision (28 ). The creation of databases with comprehensive lists of genetic mutations known to promote tumorigenesis in humans, such as the Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/genetics/CGP/cosmic/), will undoubtedly broaden the utility of genetic screening in the prediagnosis of cancer (29 ).

EARLY DETECTION

Genetic instability, abnormal gene transcription or translation, and altered protein production and modification lead to early transformational events that produce signaling changes in a cell’s molecular phenotype (30 ). Genomic and proteomic analyses can identify these alterations, and after validation, the deviations are incorporated into biomarker measurements for detecting cancer in its nascent stages. Prostate-specific antigen has become a mainstay for detecting and monitoring prostate cancer, and the Early Detection Research Network (http://edrn.nci.nih.gov/) has accelerated biomarker discovery by creating cross-institutional collaborative alliances for identifying and verifying potential clinical biomarkers that are “. . . easy to detect, measurable across populations, and amenable to use in one or more of the following settings: detection at an early stage; identification of high-risk individuals; early detection of recurrence; or as intermediate endpoints in chemoprevention” (31 ). With the promising leads produced by proteomic (32 ), miRNA (33 ), and epigenetic (34 ) platforms, this holistic approach to biomarker identification is a model for the integrative initiative necessary for successfully personalizing cancer management.

TUMOR CLASSIFICATION

The use of microarray assessment of tumors for class prediction and class discovery originated with the previously mentioned study of acute leukemias (7 ). With a newly developed method of “neighborhood analysis,” the investigators identified genes whose expression matched an “idealized pattern of expression,” i.e., high in one class and low in the comparison class (7 ). Through further bioinformatics processing, they derived a 50-gene predictor set for correctly assigning independent leukemia samples to acute myelogenous leukemia, acute lymphoblastic leukemia, or uncertain groups (7 ). Moreover, this study demonstrated the potential of assessing differential gene expression in cancers for identifying novel subclasses previously overlooked by conventional classification methods (7 ). Other investigators have used similar microarray platforms to apply selective assessment of gene expression to a variety of cancers, including diffuse large B-cell lymphoma (35 ), breast cancer (36 ), cancers with an unknown tissue of origin (37 ), and others (block 1 in Table 2).

Table 2. Expression signatures and commercial products for personalizing cancer management.

| Studies/products | Platform | Sample | Cancer type |
|---|---|---|---|
| • Classification (human patient–based modeling) | | | |
| Golub et al. (7 ) | Affymetrix Hu6800 microarray, 50-gene predictor signature | Fresh | AML/ALL |
| Lymphochip [Alizadeh et al. (35 )] | cDNA array, 3186 genes | Fresh/frozen tissue | DLBCL |
| Perou et al. (36 ) | cDNA array, 8102 genes | Fresh/frozen tissue | Breast |
| Breast Bioclassifier™ [University Genomics (57 )] | 55-gene RT-PCR | Fresh/FFPE tissue | Breast |
| MapQuant Dx™ [Ipsogen (58 )] | Affymetrix GCS3000Dx2 microarray | Fresh | Breast |
| Pulmotype™ [Applied Genomics (59 )] | IHC | FFPE | NSCLC |
| CancerTYPE ID® [bioTheranostics (60 )] | 92-gene RT-PCR | FFPE | 39 tumor types |
| Pathwork® Tissue of Origin Test [Dumur et al. (37 )] | cDNA array, 1550 genes | Frozen | Assigns tumor to 1 of 15 tissues of origin |
| • Prognosis (human patient–based modeling) | | | |
| Yeoh et al. (61 ) | Affymetrix U95Av2 microarray | BM aspirate | Pediatric ALL (classification and PVAD failure) |
| Rosenwald et al. (62 ) | 17-gene signature, cDNA microarray | FFPE | DLBCL |
| van ’t Veer et al. (38 ) | Oligonucleotide microarray | Frozen tissue | Breast |
| Paik et al. (63 ) | 21-gene RT-PCR | FFPE | Breast (recurrence after tamoxifen therapy) |
| Rotterdam Signature [Ross et al. (64 )] | 76-gene signature from Affymetrix U133A microarray | Fresh/frozen | Breast |
| MammaPrint™ (based on van ’t Veer study) [Agendia (65 )] | 70-gene oligonucleotide microarray | Fresh/RNARetain® tissue | Breast |
| eXagenBC™ [eXagen (66 )] | FISH | FFPE | Breast |
| Mammostrat® [Applied Genomics (67 )] | IHC | FFPE | Breast |
| PathVysion® [Abbott (68 )] | FISH | FFPE | Breast (HER-2 status) |
| HerScan™ [Combimatrix Molecular Diagnostics (69 )] | DNA microarray | Fresh/frozen DNA | Breast (HER-2 status) |
| Pulmostrat™ [Applied Genomics (70 )] | IHC | FFPE | NSCLC |
| Prostate Px [Aureon Laboratories (71 )] | IHC and automated pattern analysis | FFPE | Prostate |
| • Response to therapy (human patient–based modeling) | | | |
| Cario et al. (72 ) | 54-gene signature, cDNA microarray | BM aspirate | Pediatric ALL (multidrug chemotherapy) |
| Okutsu et al. (73 ) | 28-gene signature, cDNA microarray | Mononuclear cells | AML (multidrug chemotherapy) |
| Takata et al. (74 ) | 14-gene signature from cDNA microarray of 27 648 genes | Frozen biopsy | Bladder (M-VAC response) |
| Frank et al. (75 ) | 128-gene signature, Affymetrix U133A microarray | BM aspirate | CML (imatinib mesylate resistance) |
| Dressman et al. (76 ) | 38-gene signature, Affymetrix U133 Plus 2.0 microarray | Frozen core biopsy | Breast (doxorubicin/paclitaxel) |
| Dressman et al. (40 ) | 1727-gene signature, Affymetrix U133 Plus 2.0 microarray | Fresh/frozen tissue | Ovarian (platinum resistance) |
| Oncotype DX® [Genomic Health (77 )] | RT-PCR | FFPE | Breast (early stage, ER+) |
| NuvoSelect™ [Ayers et al. (78 )] | 30-gene signature, cDNA microarray | Fresh/frozen | Breast (TFAC chemotherapy efficacy) |
| HERmark™ assay [Monogram Biosciences (79 )] | Protein expression quantification | FFPE | Breast (HER-2 status) |
| PharmacoDiagnostic® tests (EGFR, HER-2, cKit) [Dako (80 )] | IHC, FISH | FFPE | Colorectal (cetuximab and panitumumab efficacy); breast (trastuzumab efficacy); GIST (imatinib mesylate efficacy) |
| TheraScreen K-Ras [Diagnostic Innovations (81 )] | RT-PCR | Fresh/frozen/FFPE tissue | Colorectal (cetuximab and panitumumab efficacy) |
| Leumeta™ tests [Quest Diagnostics (82 )] | DNA, RNA, protein analysis | Blood (plasma) | Leukemias (CLL, CML, ALL) |
| PGxPredict™ [PGx Health (83 )] | SNP analysis in FCGR3A (b) | Blood | NHL (rituximab efficacy) |
| AmpliChip® CYP450 Test [Roche Diagnostics (84 )] | Microarray for allelic variations in CYP2D6 and CYP2C19 | Blood | Predicts metabolizer phenotype relevant to chemotherapy |
| • Response to therapy (in vitro panel–based modeling) | | | |
| Potti et al. (41 ) | In vitro drug sensitivity + Affymetrix microarray | In vitro cell lines | Cancers with common cytotoxic agent treatments |
| Lee et al. (42 ) | In vitro drug sensitivity + Affymetrix microarray | In vitro cell lines | All cancers |
| • Targeted therapy (human patient–based modeling) | | | |
| Imatinib mesylate (Gleevec®) | Small-molecule tyrosine kinase inhibitor | | CML |
| Lapatinib ditosylate (Tykerb®) | Small-molecule tyrosine kinase inhibitor of HER-2/neu and EGFR | | Breast and lung cancers |
| Gefitinib (Iressa®) | Small-molecule EGFR inhibitor | | NSCLC |
| Erlotinib (Tarceva®) | Small-molecule EGFR inhibitor | | NSCLC |
| Trastuzumab (Herceptin®) | Humanized monoclonal antibody | | HER-2–overexpressing breast cancer |
| Cetuximab (Erbitux®) | Chimeric monoclonal antibody | | Metastatic colorectal cancer |
| Panitumumab (Vectibix®) | Fully human monoclonal antibody | | Metastatic colorectal cancer |
| Bevacizumab (Avastin®) | Humanized monoclonal antibody | | Colorectal, lung, and breast cancer |

a AML, acute myelogenous leukemia; ALL, acute lymphoblastic leukemia; DLBCL, diffuse large B-cell lymphoma; RT-PCR, real-time PCR; IHC, immunohistochemistry; NSCLC, non–small cell lung carcinoma; BM, bone marrow; PVAD, prednisone, vincristine, asparaginase, and daunorubicin; FISH, fluorescence in situ hybridization; M-VAC, methotrexate, vinblastine, doxorubicin (Adriamycin™), and cisplatin; CML, chronic myelogenous leukemia; ER, estrogen receptor; TFAC, paclitaxel (Taxol®), 5-fluorouracil, doxorubicin (Adriamycin), and cyclophosphamide; EGFR, epidermal growth factor receptor; GIST, gastrointestinal stromal tumor; HER-2, ERBB2 gene; CLL, chronic lymphocytic leukemia; NHL, non-Hodgkin lymphoma.
b Human genes: FCGR3A, Fc fragment of IgG, low affinity IIIa, receptor (CD16a).

PROGNOSIS AND PREDICTION OF PATIENT RESPONSE

Gene expression–based risk assessment for prognosis and prediction of disease recurrence is currently available for breast tumors through Genomic Health’s Oncotype DX® test and Agendia’s MammaPrint™ assay. By screening breast tumor biopsies against the Oncotype DX real-time PCR–based expression panel of 21 gene biomarkers, physicians can obtain a patient’s prognostic score to guide clinical decisions (1 ). More than a dozen recent studies and commercially available products have demonstrated the utility of genomic biomarkers for predicting a patient’s prognosis and risk of recurrence (blocks 2 and 3 in Table 2). In an early study in this field, van ’t Veer et al. used expression-microarray analysis to evaluate primary-tumor biopsies from patients with no signs of lymph node or distant organ metastases, with the aim of developing a gene expression signature predictive of early metastasis and therefore of a poor prognosis (38 ). Unsupervised clustering of the tumors according to their expression of approximately 5000 significantly regulated genes separated “good prognosis” from “poor prognosis” tumors (38 ). The number of “informative genes” was subsequently pared down to a 70-gene signature that correctly predicted disease outcome in 65 (83%) of 78 patients (38 ). Similar approaches to outcome prediction have been used in studies of non–small cell lung cancer and ovarian cancer (39, 40 ). Advance knowledge of a patient’s likely outcome should help direct adjuvant therapies to patients who require aggressive care, spare nonprogressing patients unnecessary toxicities and side effects, and ease the financial toll of excessive treatment on the healthcare system.

IN SILICO PREDICTION OF THERAPEUTIC RESPONSE

Two innovative drug-sensitivity studies have called into question the need for in vivo models to predict chemotherapeutic response. Both studies developed bioinformatics-centric prediction models that combine gene expression patterns of cell lines derived from common tumors, the in vitro activities of therapeutic compounds in those cell lines, and retrospective data from matched clinical trials to predict and validate chemosensitivity for a variety of cancers (block 4 in Table 2) (41, 42 ). Potti et al. developed a genomic “predictor signature” consisting of genes whose expression correlated with single-drug sensitivity and resistance in the NCI-60 set of cancer cell lines (41 ). By applying this signature to previously published clinical-response data sets, these investigators demonstrated that patients’ responses to both single-drug and combination therapies could be predicted by comparing gene expression in patient tumors with expression-based signatures of in vitro chemosensitivity (41 ). Concurrently, Lee et al. applied their COXEN (COeXpression ExtrapolatioN) algorithm, which integrates drug-sensitivity and gene expression data from the NCI-60 panel with gene expression data from patients’ tumors, to identify a COXEN biomarker panel (42 ). This panel consists of genes with strongly positive or negative correlations to in vitro chemosensitivity. Comparing the expression of these panel genes with the gene expression profiles of patient tumors should facilitate the projection of therapeutic sensitivity into the clinic (42 ). The COXEN algorithm can also predict chemosensitivity in cancer types not included in the NCI-60 panel, a particular strength because many tumor types are not represented in that panel (42 ). In addition to its therapeutic-prediction capabilities, COXEN can facilitate computational drug screening, thus expanding our capacity to identify agents that are likely to be effective in patients.
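To make the two-step logic described for COXEN more concrete, the following Python sketch uses only the standard library. It is a simplified illustration under stated assumptions, not the published algorithm: expression data are dictionaries mapping hypothetical gene names to one value per sample, `gi50` holds one drug-sensitivity value per cell line in the same order, and the cutoffs `min_abs_r` and `min_concordance` are arbitrary choices for illustration. Step 1 keeps genes whose cell-line expression correlates strongly (positively or negatively) with drug sensitivity; step 2 keeps only those genes whose correlations with the other candidates look similar in cell lines and in patient tumors.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical data layout: gene name -> expression value per sample.
Expression = Dict[str, Sequence[float]]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 if a profile is too short or constant."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sensitivity_genes(cell_expr: Expression, gi50: Sequence[float],
                      min_abs_r: float) -> List[str]:
    """Step 1: genes whose expression across cell lines tracks drug sensitivity."""
    return [g for g, values in cell_expr.items()
            if abs(pearson(values, gi50)) >= min_abs_r]


def coexpression_concordance(genes: List[str], cell_expr: Expression,
                             tumor_expr: Expression) -> Dict[str, float]:
    """Step 2: correlate each gene's coexpression pattern in cell lines with
    its coexpression pattern in tumors (tumor_expr must contain every gene)."""
    scores = {}
    for g in genes:
        others = [h for h in genes if h != g]
        cell_row = [pearson(cell_expr[g], cell_expr[h]) for h in others]
        tumor_row = [pearson(tumor_expr[g], tumor_expr[h]) for h in others]
        scores[g] = pearson(cell_row, tumor_row)
    return scores


def coxen_like_panel(cell_expr: Expression, gi50: Sequence[float],
                     tumor_expr: Expression, min_abs_r: float = 0.5,
                     min_concordance: float = 0.3) -> List[str]:
    """Hypothetical helper combining both steps; cutoffs are illustrative only."""
    candidates = sensitivity_genes(cell_expr, gi50, min_abs_r)
    concordance = coexpression_concordance(candidates, cell_expr, tumor_expr)
    return sorted(g for g, score in concordance.items()
                  if score >= min_concordance)
```

Because step 2 compares patterns of correlation rather than raw expression values, cell-line and tumor data measured on different scales can still be compared. A real analysis would also need to control for multiple comparisons and to be validated on independent data.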
A trial of such screening identified a potential new agent for treating bladder cancer, NSC637993 {6H-imidazo[4,5,1-de]acridin-6-one, 5-[2-(diethylamino)ethylamino]-8-methoxy-1-methyl-, dihydrochloride}, whose predicted and observed chemosensitivities were similar (42 ). The major advantage of in vitro–derived genomic predictions (41, 42 ) is the ability to include prospective drugs and drug combinations in chemosensitivity predictions. This capability is particularly important given that the approximately 70 FDA-approved antineoplastic agents can be paired into more than 2400 possible doublet combinations. The FDA has proposed guidelines for clinical-trial validation of these combinatorial therapies and of future multidrug regimens that combine current and novel therapeutics (43 ). This document acknowledges that in vitro drug validations, using archival tissue samples and relevant clinical information from patients, would expedite current drug-development practices (43 ). The potential of these in silico analyses to bridge in vitro chemosensitivity directly to predictions of clinical efficacy is highly attractive because of their efficient use of resources. Deriving an approach to clinical treatment from in vitro results promises to shorten the time and decrease the resources needed to match patients with a large number of treatment options. Eliminating the need for animal intermediates could also shorten the time devoted to the research and development of novel therapeutics. Moreover, currently approved pharmaceuticals may find new applications through cross-comparisons of in vitro chemosensitivities and tumor expression profiles.

TARGETED THERAPY

Targeted therapy involves both identifying distinct molecular mechanisms for therapeutic intervention and selecting the patient populations in which a particular treatment will be most efficacious. The success of molecularly targeted inhibitors of tumorigenic pathways is readily apparent in the treatment of hematologic cancers. In these cancers, such as chronic myelogenous leukemia, the discovery of independent molecular drivers of neoplasia has aided the development of targeted therapies (44 ). In the case of chronic myelogenous leukemia, linking the genetic lesion, a t(9;22) translocation, to the dysregulation of the ABL kinase (in the form of the BCR-ABL fusion protein) led to the development of a selective kinase inhibitor, imatinib mesylate (Gleevec®; Novartis). The identification of aberrant molecular-signaling pathways is ongoing for other cancers, and novel small-molecule inhibitors and monoclonal antibodies continue to see moderate clinical use. From the use of gefitinib (Iressa®; AstraZeneca) and erlotinib (Tarceva®; Genentech) as therapies for lung cancer to the use of trastuzumab (Herceptin®; Genentech) and lapatinib (Tykerb®; GlaxoSmithKline) as therapies for breast cancer, the potential for targeting individual signaling disruptions as a method of chemotherapeutic treatment is evident (block 5 in Table 2). The integration of screening results provided by biomarker panels and pharmacodiagnostic tests will substantiate treatment decisions based on data from clinical trials (45 ). In the application of trastuzumab for breast cancer patients, determining whether the HER-2 protein is overproduced or whether the ERBB2 [v-erb-b2 erythroblastic leukemia viral oncogene homolog 2, neuro/glioblastoma derived oncogene homolog (avian)] (HER-2) gene is amplified can predict the efficacy of treatment (45 ).
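Deciding whether ERBB2 is amplified by FISH usually comes down to a signal ratio: HER-2 probe signals divided by chromosome 17 centromere (CEP17) signals, counted over a set of tumor nuclei. The Python sketch below shows only that arithmetic. The names are hypothetical, and the defaults (at least 20 scored nuclei; a ratio below 1.8 called not amplified, above 2.2 called amplified, and equivocal in between) are illustrative assumptions; actual cutoffs come from the applicable guideline or kit labeling and have changed over time.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class NucleusCount:
    """Hypothetical record of FISH signals scored in one tumor nucleus."""
    her2: int   # ERBB2 (HER-2) probe signals
    cep17: int  # chromosome 17 centromere (CEP17) probe signals


def her2_cep17_ratio(nuclei: Sequence[NucleusCount], min_nuclei: int = 20) -> float:
    """Mean HER-2 signals per nucleus divided by mean CEP17 signals per nucleus."""
    if len(nuclei) < min_nuclei:
        raise ValueError(f"need at least {min_nuclei} scored nuclei, got {len(nuclei)}")
    total_cep17 = sum(n.cep17 for n in nuclei)
    if total_cep17 == 0:
        raise ValueError("no CEP17 signals scored")
    # Numerator and denominator cover the same nuclei, so the ratio of the
    # totals equals the ratio of the per-nucleus means.
    return sum(n.her2 for n in nuclei) / total_cep17


def classify_her2_ratio(ratio: float, lower: float = 1.8, upper: float = 2.2) -> str:
    """Three-way call; the default cutoffs are assumptions, not kit-specific rules."""
    if ratio > upper:
        return "amplified"
    if ratio < lower:
        return "not amplified"
    return "equivocal"
```

Keeping the cutoffs as parameters lets the same signal counts be re-scored if a laboratory adopts different criteria.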
Screening tests such as HercepTest™ (Dako) for assessing HER-2 overproduction in tumor biopsies and fluorescence in situ hybridization analysis for ERBB2 amplification increase the likelihood of treatment success. Moreover, screening patients for pertinent molecular dysregulations will bolster our knowledge of mutational frequency in a given pathway and thus expand our understanding of that pathway’s relevance within disease populations.

Bioinformatics: Techniques and Challenges

Bioinformatics translates raw molecular data extracted from patient samples into interpretable, accessible, and distributable information. This step in the personalization process is necessary for interpreting most high-throughput microarray-based biomedical applications and typically consists of the following steps: (a) data preprocessing, (b) normalization, (c) biomarker discovery, (d) statistical modeling and validation of prediction, and (e) follow-up clinical confirmation. Some implementation details may vary slightly between molecular platforms, such as cDNA arrays, oligonucleotide arrays, or mass spectrometry (46 ). Nevertheless, these approaches have been used across most platforms and therefore provide a systematic way to address the challenges encountered during the analysis of most high-throughput biotechnology data.

DATA PREPROCESSING

High-throughput molecular data contain rich digital information for each molecular target. For example, a single scan of a microarray chip produces hundreds of image pixels for each of >20 000 transcript probes. The initial preprocessing of such high-density data is often tightly integrated, in proprietary form, into the manufacturer’s instrumentation. This step, however, is one of the most critical in the analysis for maximizing the molecular information obtainable from such massive amounts of biological data. In fact, preprocessing algorithms developed by third parties have sometimes shown substantially higher reproducibility and sensitivity with certain commercial microarray platforms than those supplied by the platform manufacturer (47 ). Additionally, identifying and quantifying specific protein species against a background of fragmented-peptide mass-spectrometry spectra is recognized as a challenging statistical and computational problem. The direct involvement of computational researchers in developing these initial data-preprocessing steps will prove greatly beneficial.

NORMALIZATION

Normalization is required to standardize multiple data sets produced in independent experiments before they can be combined for analysis. This step is essential for making data from different institutions broadly comparable. Before normalization, data are often log-transformed to make their distributions more appropriate for subsequent analyses. Data are then normalized, often by correction with a simple constant factor. More sophisticated normalization methods, such as nonparametric regression, can be used if different data sets have nonlinear relationships (47 ).

BIOMARKER DISCOVERY AND PREDICTION MODELING

The identification and selection of clinically relevant biomarkers is a fundamental step that determines the success of downstream applications that advance personalized medicine. This kind of discovery can be performed in various ways: (a) simple comparison of contrasting groups (e.g., disease-free survivors vs relapsed patients, or responders vs nonresponders to a therapeutic compound), (b) analysis of a statistical association between patients’ outcomes and phenotypes, and (c) analysis of correlations with continuous drug-sensitivity data, such as GI50 (the concentration inhibiting growth by 50%) in a cell line panel. A key element of biomarker discovery is subsequent validation (46 ). Once a set of biomarkers has been identified, many different statistical approaches can be used for prediction modeling. These approaches include gene voting, discriminant analysis, Bayesian regression or classification, random forests, Cox regression, and support vector machines; their technical details are beyond the scope of this review and can be found elsewhere (7, 41, 42 ). These prediction-modeling techniques have often been found to provide similar predictive power when each is finely tuned. An integral aspect of modeling is extremely tight control of the error and bias introduced by overtraining and multiple comparisons.

VALIDATION AND FOLLOW-UP CLINICAL CONFIRMATION

To avoid “selection bias,” one must validate a trained prediction model with patient data sets that are completely independent of the original discovery and training data set (48 ). Follow-up validation should preferably use data from different sites and clinical settings, because a patient cohort from a particular location and clinical setting may produce specific molecular results that do not occur in other patient populations. This step is particularly important for ensuring the performance and accuracy of the molecular assay in clinical practice.

Considerations for Clinical Pathology

BIOSAMPLE QUALITY

The privileged access of the clinical pathology laboratory to patient biopsies and samples places it in a unique position to ensure sample integrity through the development and implementation of standardized guidelines for procuring and storing biosamples. The coordination of such efforts is currently facilitated by the US National Cancer Institute’s Office of Biorepositories and Biospecimen Research (http://biospecimens.cancer.gov/index.asp) (49 ). The establishment of this office in 2005 reflects the priority placed on preserving reliable biosamples to expedite the development of molecular-based diagnostics and therapeutics. The Biospecimen Research Network, the research branch of the Office of Biorepositories and Biospecimen Research, seeks to conduct and collaborate on projects that help establish best practices for sample storage and tracking, identify high-quality samples in existing repositories, and determine the impact of specific variables during sample handling (49 ). Whereas the latter 2 aims will provide a short-term solution for the immediate use and interpretation of results produced from currently preserved samples, a best-practices policy for biosample handling will have a long-lasting impact on protocols for the procurement and storage of pathology samples. Variables such as the time from sample excision to fixation, optimal fixation solutions and conditions (e.g., ethanol vs formalin fixation), and sample storage, cataloging, and retrieval each require specific attention (49 ).

EXTRACTION OF MOLECULAR INFORMATION

Sustaining the recent momentum in molecular cancer research requires a coalescence of well-annotated clinical information and molecular data derived from patient biosamples. A rich and abundant source of patient molecular material is the archive of formalin-fixed, paraffin-embedded (FFPE) samples produced in myriad clinical trials, for which patient information has been well characterized and outcomes are known. Retrieval of the full molecular information in archived FFPE samples, however, is hindered by changes in molecular structure that occur during fixation. The challenge of extracting useful molecular information despite formalin-induced protein cross-links and the addition of monomethylol groups to nucleosides has been partially overcome by commercially available kits, such as the RNeasy FFPE Kit (Qiagen), the Paraffin Block Isolation and the RecoverAll Total Nucleic Acid Isolation Kits (Ambion), and the Paradise Plus Reagent System (Arcturus). These kits predominantly rely on proteinase K digestion to release mRNA from cross-linked proteins (50 ). Although successful extraction of intact mRNA is tightly tied to the quality of sample fixation, isolation of contiguous miRNA is less dependent on fixation methods, as evidenced by the lower cross-sample variation found in a recent study (51 ). This advantage for extracting miRNA from such samples is primarily attributable to the short length of miRNAs (51 ). Such distinctions underscore the need for skilled clinical pathologists to determine which biological platform to implement for accurate tumor assessment, given the quality of the available samples.

QUANTITATIVE REAL-TIME PCR

The ability of quantitative real-time PCR (qPCR) to obtain molecular-expression profiles from patient tumor biopsies should be harnessed. Although microarray technology remains the preeminent mode of biomarker discovery, qPCR analysis is a useful adjunct for validating microarray results. It also serves as a targeted method for measuring specific, previously verified biomarkers. As an assessment platform, qPCR offers rapid, single-step amplification and quantification of molecular targets that are useful for clarifying diagnosis, predicting recurrence, and guiding treatment (52 ). For example, a qPCR-based ratio of HOXB13 (homeobox B13) expression to IL17RB (interleukin 17 receptor B) expression has been used to predict tumor recurrence in patients receiving adjuvant tamoxifen monotherapy (53 ). Conventional qPCR methods are effective for assessing expression signatures of small batches of molecular targets. Multiplex qPCR goes further and allows multiple targets to be quantified in a single reaction (52 ). The savings in resources and time offered by multiplexed qPCR may provide the speed and cost efficiency that individualized molecular assessment needs to gain widespread clinical integration (25 ).

FOCUS ON EDUCATION

Efforts to emphasize the therapeutic potential of integrating molecular profiling with clinical assessment are necessary to accelerate the realization of personalized care. The clinical pathology laboratory is well equipped to provide physicians with the knowledge to promote the routine incorporation of molecular tumor analysis into standard workups. However, the overall success of such efforts will depend on striking a cautious balance: personalized medicine must be promoted without overselling a concept that still lacks proper verification and implementation standards. This sentiment resonates with groups such as the Coriell Personalized Medicine Collaborative (54 ). A Coriell study aims to clarify the impact of genomic assessment on clinical outcomes while shaping the social and legal ramifications associated with openly accessible genomic profiling (54 ).

Conclusions

Advances in molecular-profiling techniques have provided unprecedented insight into the genetic etiologies and basic molecular dysfunctions that lead to tumorigenesis. Moreover, bioinformatics analysis of gene expression data has produced expression signatures that are useful for personalizing many aspects of cancer management, including genetic screening, early detection, tumor classification, and prediction of prognosis and therapeutic response. Successful incorporation of bioinformatics-based assessments alongside current methods of tissue and clinical evaluation will leverage the complementary nature of these platforms. Orchestrating this collaboration is one of the steepest challenges facing clinical pathologists in their quest to integrate bioinformatics with frontline clinical care. The likely payoff for such efforts appears enormous. By combining conventional and molecular-assessment methods, the personalization of cancer management stands poised for success.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: J.K. Lee and D. Theodorescu are cofounders of Key Genomics. Consultant or Advisory Role: None declared. Stock Ownership: D. Theodorescu, Key Genomics; J.K. Lee, Key Genomics. Honoraria: None declared. Research Funding: J.B. Overdevest, NIH Cancer Research Training in Molecular Biology (T32A009109); D. Theodorescu, AstraZeneca and NIH grant R01CA075115; J.K. Lee, NIH grant R01HL081690. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Allison M. Is personalized medicine finally arriving? Nat Biotechnol 2008;26:509–17.
2. Hanahan D, Weinberg RA. The hallmarks of cancer. Cell 2000;100:57–70.
3. Fidler IJ. Tumor heterogeneity and the biology of cancer invasion and metastasis. Cancer Res 1978;38:2651–60.
4. Jemal A, Siegel R, Ward E, Hao Y, Xu J, Murray T, Thun MJ. Cancer statistics, 2008. CA Cancer J Clin 2008;58:71–96.
5. Sherbenou DW, Druker BJ. Applying the discovery of the Philadelphia chromosome. J Clin Invest 2007;117:2067–74.
6. Ramaswamy S, Golub TR. DNA microarrays in clinical oncology. J Clin Oncol 2002;20:1932–41.
7. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, et al. Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 1999;286:531–7.
8. Minna JD, Girard L, Xie Y. Tumor mRNA expression profiles predict responses to chemotherapy. J Clin Oncol 2007;25:4329–36.
9. Calin GA, Croce CM. MicroRNA signatures in human cancers. Nat Rev Cancer 2006;6:857–66.
10. Liu CG, Calin GA, Meloon B, Gamliel N, Sevignani C, Ferracin M, et al. An oligonucleotide microchip for genome-wide microRNA profiling in human and mouse tissues. Proc Natl Acad Sci U S A 2004;101:9740–4.
11. Yu SL, Chen HY, Chang GC, Chen CY, Chen HW, Singh S, et al. MicroRNA signature predicts survival and relapse in lung cancer. Cancer Cell 2008;13:48–57.
12. Merritt WM, Lin YG, Han LY, Kamat AA, Spannuth WA, Schmandt R, et al. Dicer, Drosha, and outcomes in patients with ovarian cancer. N Engl J Med 2008;359:2641–50.
13. Lujambio A, Ropero S, Ballestar E, Fraga MF, Cerrato C, Setien F, et al. Genetic unmasking of an epigenetically silenced microRNA in human cancer cells. Cancer Res 2007;67:1424–9.
14. Esteller M. Epigenetics in cancer. N Engl J Med 2008;358:1148–59.
15. Wulfkuhle J, Edmiston K, Liotta L, Petricoin E. Technology insight: pharmacoproteomics for cancer – promises of patient-tailored medicine using protein microarrays. Nat Clin Pract Oncol 2006;3:256–68.
16. MacBeath G. Protein microarrays and proteomics. Nat Genet 2002;32:526–32.
17. Liotta LA, Espina V, Mehta AI, Calvert V, Rosenblatt K, Geho D, et al. Protein microarrays: meeting analytical challenges for clinical applications. Cancer Cell 2003;3:317–25.
18. Aebersold R, Mann M. Mass spectrometry-based proteomics. Nature 2003;422:198–207.
19. Marko-Varga G, Ogiwara A, Nishimura T, Kawamura T, Fujii K, Kawakami T, et al. Personalized medicine and proteomics: lessons from non-small cell lung cancer. J Proteome Res 2007;6:2925–35.
20. Shi MM. Enabling large-scale pharmacogenetic studies by high-throughput mutation detection and genotyping technologies. Clin Chem 2001;47:164–72.
21. The International HapMap Consortium. The International HapMap Project. Nature 2003;426:789–96.
22. Weber BL. Cancer genomics. Cancer Cell 2002;1:37–47.
23. Strausberg RL, Simpson AJG, Wooster R. Sequence-based cancer genomics: progress, lessons and opportunities. Nat Rev Genet 2003;4:409–18.
24. Ley TJ, Minx PJ, Walter MJ, Ries RE, Sun H, McLellan M, et al. A pilot study of high-throughput, sequence-based mutational profiling of primary human acute myeloid leukemia cell genomes. Proc Natl Acad Sci U S A 2003;100:14275–80.

25. Shendure J, Ji H. Next-generation DNA sequencing. Nat Biotechnol 2008;26:1135–45.
26. Evans WE, Relling MV. Pharmacogenomics: translating functional genomics into rational therapeutics. Science 1999;286:487–91.
27. van Schaik RHN. CYP450 pharmacogenetics for personalizing cancer therapy. Drug Resist Updat 2008;11:77–98.
28. Ponder B. Genetic testing for cancer risk. Science 1997;278:1050–4.
29. Bamford S, Dawson E, Forbes S, Clements J, Pettett R, Dogan A, et al. The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br J Cancer 2004;91:355–8.
30. Sidransky D. Emerging molecular markers of cancer. Nat Rev Cancer 2002;2:210–9.
31. Srinivas PR, Kramer BS, Srivastava S. Trends in biomarker research for cancer detection. Lancet Oncol 2001;2:698–704.
32. Wulfkuhle JD, Liotta LA, Petricoin EF. Proteomic applications for the early detection of cancer. Nat Rev Cancer 2003;3:267–75.
33. Mitchell PS, Parkin RK, Kroh EM, Fritz BR, Wyman SK, Pogosova-Agadjanyan EL, et al. Circulating microRNAs as stable blood-based markers for cancer detection. Proc Natl Acad Sci U S A 2008;105:10513–8.
34. Verma M, Srivastava S. Epigenetics in cancer: implications for early detection and prevention. Lancet Oncol 2002;3:755–63.
35. Alizadeh AA, Eisen MB, Davis RE, Ma C, Lossos IS, Rosenwald A, et al. Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 2000;403:503–11.
36. Perou CM, Sorlie T, Eisen MB, van de Rijn M, Jeffrey SS, Rees CA, et al. Molecular portraits of human breast tumours. Nature 2000;406:747–52.
37. Dumur CI, Lyons-Weiler M, Sciulli C, Garrett CT, Schrijver I, Holley TK, et al. Interlaboratory performance of a microarray-based gene expression test to determine tissue of origin in poorly differentiated and undifferentiated cancers. J Mol Diagn 2008;10:67–77.
38. van ’t Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AAM, Mao M, et al. Gene expression profiling predicts clinical outcome of breast cancer. Nature 2002;415:530–6.
39. Potti A, Mukherjee S, Petersen R, Dressman HK, Bild A, Koontz J, et al. A genomic strategy to refine prognosis in early-stage non-small-cell lung cancer. N Engl J Med 2006;355:570–80.
40. Dressman HK, Berchuck A, Chan G, Zhai J, Bild A, Sayer R, et al. An integrated genomic-based approach to individualized treatment of patients with advanced-stage ovarian cancer. J Clin Oncol 2007;25:517–25.
41. Potti A, Dressman HK, Bild A, Riedel RF, Chan G, Sayer R, et al. Genomic signatures to guide the use of chemotherapeutics. Nat Med 2006;12:1294–300.
42. Lee JK, Havaleshko DM, Cho H, Weinstein JN, Kaldjian EP, Karpovich J, et al. A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery. Proc Natl Acad Sci U S A 2007;104:13086–91.
43. US Food and Drug Administration. Draft guidance – in vitro diagnostic multivariate index assays. http://www.fda.gov/cdrh/oivd/guidance/1610.html (Accessed October 2008).

44. Green MR. Targeting targeted therapy. N Engl J Med 2004;350:2191–3.
45. Jorgensen JT, Nielsen KV, Ejlertsen B. Pharmacodiagnostics and targeted therapies – a rational approach for individualizing medical anticancer therapy in breast cancer. Oncologist 2007;12:397–405.
46. Macgregor PF, Squire JA. Application of microarrays to the analysis of gene expression in cancer. Clin Chem 2002;48:1170–7.
47. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst 2003;95:14–8.
48. Ambroise C, McLachlan GJ. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A 2002;99:6562–6.
49. Office of Biorepositories and Biospecimen Research. http://biospecimens.cancer.gov/index.asp (Accessed December 2008).
50. Lewis F, Maughan N, Smith V, Hillan K, Quirke P. Unlocking the archive – gene expression in paraffin-embedded tissue. J Pathol 2001;195:66–71.
51. Doleshal M, Magotra AA, Choudhury B, Cannon BD, Labourier E, Szafranska AE. Evaluation and validation of total RNA extraction methods for microRNA expression analyses in formalin-fixed, paraffin-embedded tissues. J Mol Diagn 2008;10:203–11.
52. Bernard PS, Wittwer CT. Real-time PCR technology for cancer diagnostics. Clin Chem 2002;48:1178–85.
53. Ma XJ, Wang Z, Ryan PD, Isakoff SJ, Barmettler A, Fuller A, et al. A two-gene expression ratio predicts clinical outcome in breast cancer patients treated with tamoxifen. Cancer Cell 2004;5:607–16.
54. Coriell Personalized Medicine Collaborative. http://www.coriell.org/index.php/content/view/92/167/ (Accessed December 2008).
55. Ramaswamy S, Ross KN, Lander ES, Golub TR. A molecular signature of metastasis in primary solid tumors. Nat Genet 2003;33:49–54.
56. Araujo RP, Liotta LA, Petricoin EF. Proteins, drug targets and the mechanisms they control: the simple truth about complex networks. Nat Rev Drug Discov 2007;6:871–80.
57. University Genomics, Inc. Breast Bioclassifier. http://www.bioclassifier.com/ (Accessed October 2008).
58. Ipsogen. MapQuant DX. http://www.ipsogen.com/index.php?id=64 (Accessed October 2008).
59. Applied Genomics, Inc. PulmoType. http://www.applied-genomics.com/pulmotype.html (Accessed October 2008).
60. bioTheranostics. Theros CancerTYPE ID. http://www.aviaradx.com/cTYPE/cType_overview.html (Accessed October 2008).
61. Yeoh EJ, Ross ME, Shurtleff SA, Williams WK, Patel D, Mahfouz R, et al. Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 2002;1:133–43.
62. Rosenwald A, Wright G, Chan WC, Connors JM, Campo E, Fisher RI, et al. The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 2002;346:1937–47.


63. Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med 2004;351:2817–26.
64. Ross JS, Hatzis C, Symmans WF, Pusztai L, Hortobagyi GN. Commercialized multigene predictors of clinical outcome for breast cancer. Oncologist 2008;13:477–93.
65. Agendia. MammaPrint. http://usa.agendia.com/en/mammaprint.html (Accessed October 2008).
66. eXagen. eXagenBC. http://www.exagen.com/our products/breastcancer.aspx (Accessed October 2008).
67. Applied Genomics, Inc. MammoStrat. http://www.applied-genomics.com/mammostrat.html (Accessed October 2008).
68. Abbott. PathVysion. http://www.pathvysion.com/ (Accessed October 2008).
69. Combimatrix Molecular Diagnostics. HerScan. http://www.cmdiagnostics.com/testmenu.htm (Accessed January 2009).
70. Applied Genomics, Inc. PulmoStrat. http://www.applied-genomics.com/pulmostrat.html (Accessed October 2008).
71. Aureon Laboratories. Prostate Px. http://www.aureon.com/prognostic-tests-prostate-px.htm (Accessed October 2008).
72. Cario G, Stanulla M, Fine BM, Teuffel O, Neuhoff NV, Schrauder A, et al. Distinct gene expression profiles determine molecular treatment response in childhood acute lymphoblastic leukemia. Blood 2005;105:821–6.
73. Okutsu J, Tsunoda T, Kaneta Y, Katagiri T, Kitahara O, Zembutsu H, et al. Prediction of chemosensitivity for patients with acute myeloid leukemia, according to expression levels of 28 genes selected by genome-wide complementary DNA microarray analysis. Mol Cancer Ther 2002;1:1035–42.
74. Takata R, Katagiri T, Kanehira M, Tsunoda T, Shuin T, Miki T, et al. Predicting response to methotrexate, vinblastine, doxorubicin, and cisplatin neoadjuvant chemotherapy for bladder cancers through genome-wide gene expression profiling. Clin Cancer Res 2005;11:2625–36.
75. Frank O, Brors B, Fabarius A, Li L, Haak M, Merk S, et al. Gene expression signature of primary imatinib-resistant chronic myeloid leukemia patients. Leukemia 2006;20:1400–7.
76. Dressman HK, Hans C, Bild A, Olson JA, Rosen E, Marcom PK, et al. Gene expression profiles of multiple breast cancer phenotypes and response to neoadjuvant chemotherapy. Clin Cancer Res 2006;12:819–26.
77. Genomic Health. Oncotype DX. http://www.genomichealth.com/OncotypeDX/Index.aspx?Sid=33 (Accessed October 2008).
78. Ayers M, Symmans WF, Stec J, Damokosh AI, Clark E, Hess K, et al. Gene expression profiles predict complete pathologic response to neoadjuvant paclitaxel and fluorouracil, doxorubicin, and cyclophosphamide chemotherapy in breast cancer. J Clin Oncol 2004;22:2284–93.
79. Monogram Biosciences, Inc. HERmark breast cancer assay. http://www.hermarkassay.com/ (Accessed October 2008).
80. Dako. pharmacoDiagnostic Solution tests. http://www.dakousa.com/index/prod_search/prod_groups.htm?productareaid=39 (Accessed January 2009).
81. Diagnostic Innovations. TheraScreen. http://www.dxsdiagnostics.com/Content/TheraScreenKRAS.aspx (Accessed October 2008).
82. Quest Diagnostics. Leumeta. http://www.questdiagnostics.com/hcp/topics/hem_onc/leumeta.html (Accessed October 2008).
83. PGx Health. PGxPredict:Rituximab. http://www.pgxhealth.com/genetictests/rituximab/ (Accessed January 2009).
84. Roche Diagnostics. AmpliChip CYP450 test. http://www.amplichip.us/ (Accessed October 2008).

Reviews

Clinical Chemistry 55:4 698–708 (2009)

Management of Gene Promoter Mutations in Molecular Diagnostics

Karen M. K. de Vooght,¹* Richard van Wijk,¹ and Wouter W. van Solinge¹

BACKGROUND: Although promoter mutations are known to have functionally important consequences for gene expression, promoter analysis is not a regular part of DNA diagnostics.

CONTENT: This review covers important aspects of promoter mutation analysis and includes a proposed model procedure for studying promoter mutations. Characterizing a promoter sequence variation involves a comprehensive study of the literature and of databases of human mutations and transcription factors. Phylogenetic footprinting is also used to evaluate the putative importance of the promoter region of interest. This in silico analysis is generally followed by in vitro functional assays, of which transient and stable transfection assays are considered the gold-standard methods. Electrophoretic mobility shift and supershift assays are used to identify trans-acting proteins that putatively interact with the promoter region of interest. Finally, chromatin immunoprecipitation assays are essential to confirm in vivo binding of these proteins to the promoter.

SUMMARY: Although promoter mutation analysis is complex, often laborious, and difficult to perform, it is an essential part of the diagnosis of disease-causing promoter mutations and improves our understanding of the role of transcriptional regulation in human disease. We recommend that routine laboratories and research groups specialized in gene promoter research cooperate to expand general knowledge and diagnosis of gene-promoter defects.

© 2009 American Association for Clinical Chemistry

Gene expression is regulated at many levels, including chromatin packing, histone modification, transcription initiation, RNA polyadenylation, pre-mRNA splicing, mRNA stability, and translation initiation. An important part of regulation, however, is believed to occur at the level of transcription initiation (1 ). During the past few years, much progress has been made in understanding the basis of transcriptional regulation. Transcription factors (TFs)² and chromatin-modifying enzymes unite to activate genes and are recruited to promoters in a precise order. The timing of transcriptional activation and the ordered recruitment of factors to promoters are the engines that, at the right moment and for the right duration, drive the transcriptional regulation of each gene throughout the cell’s lifespan (2 ). Failure in the timing or recruitment of TFs may affect the transcriptional regulation of a gene, putatively leading to disease. In this review we focus on sequence variations in the promoter region as a putative cause of disturbed transcriptional regulation leading to disease.

Not every promoter sequence variation affects transcriptional regulation. Depending on the location and nature of the genetic defect, a mutation in the promoter region of a gene may disrupt the normal processes of gene activation by disturbing the ordered recruitment of TFs at the promoter. As a result, a promoter mutation can decrease or increase the amount of mRNA and thus of protein. The effect of promoter mutations can be very subtle. In addition, promoter mutation analysis is complex, and the assays needed to investigate the functional relationship between the mutation and disease are laborious and difficult to perform. Therefore, thorough studies of promoter mutations are scarce and often confined to research laboratories.

¹ Department of Clinical Chemistry and Haematology, Laboratory for Red Blood Cell Research, University Medical Center Utrecht, Utrecht, the Netherlands.
* Address correspondence to this author at: Department of Clinical Chemistry and Haematology, Laboratory for Red Blood Cell Research, University Medical Center Utrecht, Postbus 85500, 3508 GA, Utrecht, the Netherlands. Fax +31 88 7555418; e-mail [email protected].
Received November 19, 2008; accepted January 28, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.120931
² Nonstandard abbreviations: TF, transcription factor; TSS, transcriptional start site; TFBS, transcription factor binding site; HGMD, human gene mutation database; SNP, single nucleotide polymorphism; EMSA, electrophoretic mobility shift assay.

Fig. 1. Schematic overview of the different elements of a general promoter. The core promoter directs low-level transcription and contains binding sites for general transcription factors and RNA polymerase II. The proximal promoter contains multiple binding sites for transcription factors, which cooperatively stimulate transcriptional activity. Transcription factors are indicated by different geometric shapes. TIC, transcription initiation complex; RNA pol, RNA polymerase.

The Promoter of a Gene

The promoter (Fig. 1), a regulatory region of DNA located upstream of a gene, plays an important role in transcriptional regulation. The core promoter is a loosely defined region, approximately between nucleotides −40 and +50 relative to the transcriptional start site [TSS], and it directs low-level transcription. The core promoter region contains binding sites for general TFs and RNA polymerase II. These general TFs, such as TFIID, TFIIA, and TFIIB, assemble on the core promoter in an ordered fashion to form a transcription-initiation complex, which directs RNA polymerase II to the TSS (3 ). The core promoter may also contain other elements, such as the TATA box, which is the binding site for a subunit of TFIID. The TATA box has the consensus sequence 5′-TATAAA-3′ and is characteristic of tissue-specific genes, whose expression is restricted to a limited number of cells (4 ). Housekeeping genes, which are expressed ubiquitously, usually lack TATA boxes and instead contain GC-rich sequences. The assembly of general TFs on the core promoter is sufficient to direct low levels of transcription, a process generally referred to as basal transcription. Transcriptional activity is greatly stimulated by a second class of TFs, termed activators. In general, activators are sequence-specific DNA-binding proteins whose recognition sites are present in the proximal promoter. The proximal promoter is the region immediately upstream of the core promoter, extending up to a few hundred base pairs, and typically contains multiple binding sites for TFs (1 ). In contrast to the core and proximal promoter, enhancers are regulatory DNA sequences that may be located 5′ or 3′ to a gene or within one of its exons or introns. Enhancer function is, by definition, independent of position and orientation. Enhancers are thought to act via DNA looping, whereby the enhancer and core promoter are brought into close proximity by “looping out” the intervening DNA (1 ). Whereas core and proximal promoters share common motifs, enhancers do not contain many distinctive sequence motifs.
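The TATA box consensus (5′-TATAAA-3′) and the approximate core-promoter window (−40 to +50 relative to the TSS) lend themselves to a simple in silico check. The Python sketch below scans a promoter sequence for consensus matches, optionally allowing mismatches, inside that window. It is an illustration under stated assumptions, not a validated tool. The input format, the function names (`find_tata_boxes`, `to_position`, `mismatches`), the mismatch threshold, and the toy sequence are all hypothetical. Dedicated programs such as those listed in Table 1 score binding sites with position-weight matrices rather than simple mismatch counts.

```python
# Minimal sketch: scan a promoter sequence for the TATA-box consensus.
# Assumptions (not from the article): the input is a plain 5'->3' string on the
# coding strand, the 0-based index of the TSS is known, and a "hit" is an exact
# or near-exact consensus match rather than a position-weight-matrix score.

TATA_CONSENSUS = "TATAAA"  # consensus 5'-TATAAA-3' given in the text


def to_position(index: int, tss_index: int) -> int:
    """Convert a 0-based string index to a promoter coordinate.

    The TSS is +1 and the base immediately upstream is -1 (there is no 0).
    """
    offset = index - tss_index
    return offset + 1 if offset >= 0 else offset


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_tata_boxes(seq, tss_index, window=(-40, 50), max_mismatches=0):
    """Return (start position, site) for each consensus hit inside the window.

    The default window of -40 to +50 mirrors the approximate core-promoter
    boundaries; a site must lie entirely inside the window to be reported.
    """
    seq = seq.upper()
    k = len(TATA_CONSENSUS)
    lo, hi = window
    hits = []
    for i in range(len(seq) - k + 1):
        start = to_position(i, tss_index)
        end = to_position(i + k - 1, tss_index)
        if start < lo or end > hi:
            continue  # site falls (partly) outside the scanned region
        site = seq[i:i + k]
        if mismatches(site, TATA_CONSENSUS) <= max_mismatches:
            hits.append((start, site))
    return hits


if __name__ == "__main__":
    # Hypothetical toy promoter: 34 nt upstream of the TSS, then 20 nt downstream.
    upstream = "CCCC" + "TATAAA" + "G" * 24
    downstream = "ACGT" * 5
    promoter = upstream + downstream
    print(find_tata_boxes(promoter, tss_index=len(upstream)))  # prints [(-30, 'TATAAA')]
```

Raising `max_mismatches` to 1 would also report near-consensus variants such as TATATA. A hit only flags a candidate element; whether a sequence variation at that site matters functionally still has to be established experimentally.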

Enhancers therefore cannot easily be identified on the basis of their DNA sequence alone. Sequence-specific elements that confer a negative (i.e., silencing or repressing) effect on the transcription of a target gene are called silencers; they generally have the same features as enhancers. In addition, the locus control region is a group of regulatory elements involved in regulating an entire locus or gene cluster. Locus control regions direct tissue-specific, physiological expression of a linked gene in a manner that is position independent and copy-number dependent, and they are composed of multiple cis-acting elements, including enhancers and silencers (1, 3 ).

Many classes of TFs, which can be distinguished from each other by their DNA-binding domains, have been described. Examples of activator families include those containing a cysteine-rich zinc finger, homeobox, helix-loop-helix, basic leucine zipper, forkhead, or ETS DNA-binding domain (3 ). TF-binding sites (TFBS) are generally small, in the range of 6–12 bp, although binding specificity is usually dictated by no more than 4–6 positions within the site (1 ). The TFBS for a specific activator is therefore typically described by a consensus sequence in which certain positions are relatively constrained whereas others are more variable.

In September 2003, the National Human Genome Research Institute launched the ENCODE (Encyclopedia of DNA Elements) project to identify all functional elements in the human genome by using a mix of experimental and computational approaches. During its pilot phase, the project focused on approximately 1% of the human genome sequence (ENCODE Project Consortium 2004: http://www.genome.gov/10005107). One of the most surprising findings was that more than half of the genes use a tissue-specific and often unannotated set of exons outside the current boundaries of the annotated genes (5 ). In some genes, the promoters of neighboring genes are used in specific cells and/or developmental stages. In addition, 5′ untranslated regions have been shown to contain critical regulatory elements (6, 7 ). These findings support the view that transcriptional regulation is complex and therefore difficult to study.

Location of the Promoter

To understand the mechanisms of transcriptional regulation of a given gene and to relate promoter mutations to disease, the exact location of the promoter(s) and of possible enhancers and silencers must be known. This information is also required to design correct promoter reporter vectors for transfection assays, the gold-standard assay for investigating the functional importance of promoter mutations (see below). Identifying the promoter of a specific gene poses a challenge, because core promoters are often located far upstream of the first coding exon. Furthermore, at least half of mammalian genes are regulated by more than one promoter to enable tissue-specific regulation (8 ). Fortunately, the promoters of many genes have recently been identified, and some of the most important TFBSs have been characterized (1 ). If this information is not available, promoter prediction programs (e.g., PromoterInspector, FirstEF) may be used to identify and locate the promoter (Table 1). These programs are frequently updated to make them more accurate and efficient.

Table 1. Frequently used Web-based resources for in silico promoter analyses (32 ).ᵃ
Category | Resource name | URL | Information outcome
Genome browsers | Ensembl | http://www.ensembl.org | Gene sequence, polymorphic variations, exon information, phylogenetic footprinting
NCBI | OMIM (Online Mendelian Inheritance in Man) | http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM | Gene information, promoter information, mutations
NCBI | National Center for Biotechnology Information (NCBI) Entrez Nucleotide | http://www.ncbi.nlm.nih.gov/sites/entrez?db=Nucleotide | Gene sequence, promoter sequence
Mutation database | HGMD (Human Gene Mutation Database) | http://www.hgmd.cf.ac.uk/ac/index.php | Mutations
Promoter predictions | PromoterInspector | http://www.genomatix.de/online_help/help_gems/PromoterInspector_help.html | Promoter prediction
Promoter predictions | FirstEF | http://rulai.cshl.edu/tools/FirstEF | Promoter prediction
Promoter predictions | DBTSS (Database of Transcriptional Start Sites) | http://dbtss.hgc.jp/index.html | Transcriptional start site
TF-binding profile database | TRANSFAC® | http://www.gene-regulation.com/pub/databases.html#transfac | TF-binding site (matrix), TFs
TF-binding site prediction | TESS (Transcription Element Search System) | http://www.cbil.upenn.edu/cgi-bin/tess/tess | TF-binding site prediction
TF-binding site prediction | Match™ | http://www.gene-regulation.com/cgi-bin/pub/programs/match/bin/match.cgi | TF-binding site prediction
ᵃ Please note that this list is not intended to be comprehensive. For a more extensive list, see Wasserman et al. (32 ).

It may be challenging to determine which region of the promoter should be screened for regulatory mutations. Recently, 5′ untranslated regions have been shown to contain several critical regulatory elements (6 ). Therefore, it seems appropriate to start screening immediately upstream of the translation initiation site (the ATG start codon). Evaluation of entries in the Human Gene Mutation Database (HGMD) (9 ) reveals that most registered regulatory mutations are located between nucleotides +50 and −500 relative to the TSS of a gene. Rockman et al. (10 ) analyzed the distribution of functional single-nucleotide polymorphisms (SNPs) (see also below) in the human promoter region and showed that the first 500 nucleotides upstream of the TSS indeed contained most of the functional SNPs (59%). However, a substantial fraction was found further away: 13% were more than 1 kb upstream, and another 13% were located 3′ to the TSS. The authors even reported 2 SNPs (1.4%) located more than 10 kb upstream of their TSS. Sequence variations affecting transcriptional regulation are therefore spread over a wide region (10 ), although there is a bias toward the immediate 5′ flanking sequence. These findings indicate that when a regulatory mutation is suspected of causing disease but no alterations are found in the proximal or core promoter region, the region further upstream is likely to be a good target for further analysis. This assumption is supported by reports that mutations in upstream regulatory regions, such as enhancers, silencers, and locus control regions, are associated with disease (11 ).

Table 2. Examples of mutations in transcriptional regulatory elements associated with human diseases.
Disease | Affected gene | Mutation (disrupted regulatory element) | Reference
β-Thalassemia | HBB | Numerous (TATA box, CACCC box, EKLF) | Hardison et al. (15 )
Bernard-Soulier syndrome | GP1BB | −133A→G (GATA-1) | Ludlow et al. (33 )
Pyruvate kinase deficiency | PKLR | −72A→G (GATA-1); −83G→C (PKR-RE1) | Manco et al. (34 ), Van Wijk et al. (35 )
Familial hypercholesterolemia | LDLR | Numerous (Sp-1, SRE repeat, FP1, FP2) | www.ucl.ac.uk/ldlr
Hemophilia B | F9 | −20T→A (HNF-4); −6G→A and −6G→C (C/EBP) | Crossley and Brownlee (36 ), Reijnen et al. (37 )

Significance of Promoter Mutations in Human Disease

PROMOTER MUTATIONS IN DISEASE

Some 1% of single base-pair substitutions causing human genetic disease occur within gene promoter regions. There they disrupt the normal processes of gene activation and transcription initiation, usually decreasing or increasing the amount of mRNA and thus of protein (12). Promoter mutations can alter or abolish the binding capacity of cis-acting DNA sequence motifs for the trans-acting protein factors that normally interact with them (12). Examples of diseases caused by promoter mutations include β-thalassemia, Bernard-Soulier syndrome, pyruvate kinase deficiency, familial hypercholesterolemia, and hemophilia B (Table 2) (1). The contribution of promoter mutations to the total number of disease-causing mutations is unclear, however, partly because promoter mutations are harder to detect than coding mutations. Most missense mutations cause a qualitative defect that is fairly easy to identify, and mutant alleles sometimes even act as dominant alleles, because the affected protein may antagonize the remaining normal protein. In contrast, promoter mutations may cause small quantitative defects, which may be hard to detect. Even if a mutation completely inactivates the promoter of one allele of an autosomal gene, the other allele still produces about half of the normal amount of protein, which is often enough to prevent severe disease.

Because there are few reports about the incidence of promoter mutations, we studied the HGMD (9, 13). At the time of our analysis (assembly date September 2007), this database contained a total of 73 411 registered mutations, of which 1.5% were regulatory. An example of a thoroughly studied gene, which has served as a model for studying mechanisms of transcriptional regulation, is the hemoglobin, beta (HBB)³ gene. The HGMD contains a total of 490 entries for HBB, of which 234 (48%) are missense/nonsense mutations, 28 (6%) promoter mutations, and 9 (2%) other (3′) regulatory mutations. The first regulatory mutation entry was a single base change (−28A→C) in the TATA box of the HBB gene, reported in 1982 as the cause of β-thalassemia in a Kurdish Jewish individual (14). This was the first TATA box alteration ever found in association with a genetic disorder. Approximately 10 of the 28 registered HBB promoter mutations have been studied by use of functional transfection assays (15).

³ Genes: HBB, hemoglobin, beta; CFTR, cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7); GP1BB, glycoprotein 1b (platelet), beta polypeptide; PKLR, pyruvate kinase, liver and RBC; LDLR, low density lipoprotein receptor; F9, coagulation factor IX; PPOX, protoporphyrinogen oxidase; luc+, firefly luciferase gene.

An example of a gene in which regulatory mutations have only recently been identified is the cystic fibrosis transmembrane conductance regulator (ATP-binding cassette sub-family C, member 7) (CFTR) gene. This gene was identified in 1989, and its mutation catalogue now lists more than 1564 mutations (www.genet.sickkids.on.ca/cftr) but only 8 promoter mutations (approximately 0.5%). The first DNA defect, the well-known ΔF508 deletion causing cystic fibrosis, was reported in 1989 (16), whereas Bienvenu et al. (17) reported the first regulatory mutation (−741T→G) almost 6 years later. In contrast to HBB, only 1 of the 8 catalogued CFTR promoter mutations has been characterized by use of functional transfection assays (18). Although the relevance of promoter mutations in cystic fibrosis is unknown, these observations suggest that the number of CFTR promoter mutations is underestimated. Promoter mutations may likewise have been overlooked in other genes. As a result, it is difficult to assess the general incidence of disease-causing promoter mutations.
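The percentages quoted above can be rechecked directly from the counts given in the text. The short Python sketch below does only that arithmetic. The helper name `percent` is ours, and the regulatory-entry estimate is approximate because the 1.5% figure is itself rounded.

```python
def percent(part: int, total: int) -> float:
    """Return part as a percentage of total, rounded to two decimals."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * part / total, 2)

# HGMD counts for HBB quoted in the text (assembly date September 2007).
HBB_TOTAL = 490
hbb_counts = {
    "missense/nonsense": 234,
    "promoter": 28,
    "other (3') regulatory": 9,
}

# CFTR mutation database: 8 promoter mutations among 1564 catalogued mutations.
CFTR_PROMOTER, CFTR_TOTAL = 8, 1564

# 1.5% of the 73 411 HGMD entries are regulatory; because 1.5% is a rounded
# figure, this count is only an approximation.
approx_regulatory_entries = round(73_411 * 0.015)

for label, n in hbb_counts.items():
    print(f"HBB {label}: {n}/{HBB_TOTAL} = {percent(n, HBB_TOTAL)}%")
print(f"CFTR promoter: {CFTR_PROMOTER}/{CFTR_TOTAL} = "
      f"{percent(CFTR_PROMOTER, CFTR_TOTAL)}%")
print(f"Approximate number of regulatory HGMD entries: {approx_regulatory_entries}")
```

The output reproduces the rounded HBB figures (about 48%, 6%, and 2%). It also shows that 8 of 1564 corresponds to roughly 0.51%, or about 1100 regulatory entries in the whole database.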

POLYMORPHIC PROMOTER SEQUENCE VARIATIONS IN DISEASE

In general, polymorphic sequence variations are considered to be rather harmless, especially if located in noncoding parts of a gene. The role of polymorphisms in determining susceptibility to disease traits is the subject of much research effort, but it often remains unclear whether the polymorphisms are themselves functionally relevant or merely linked to another, causative mutation (19). A polymorphism has been defined as a “Mendelian trait that exists in the population in at least two phenotypes, neither of which occurs at a frequency of <1%” (12). Polymorphisms are not rare; they occur throughout the human genome at a frequency of 1 in 200 to 1 in 1000 bp (20). Polymorphisms in the promoter may affect gene expression and may thus be of phenotypic or even pathological significance (12). An increasing number of promoter polymorphisms have been characterized by functional studies. Some may well be pathologically important, e.g., those in the genes coding for plasminogen activator inhibitor type 1, tumor necrosis factor α, apolipoprotein AI, lipoprotein lipase, and interleukin 6 (12). Current epidemiological investigations, in which large numbers of SNPs are studied in relation to disease, are revealing considerable numbers of putative functional promoter SNPs. However, a causal link between these promoter polymorphisms and disease is often not established, because these studies generally lack functional promoter assays. Without such assays, it cannot be concluded that a given promoter sequence variation causes disease in vivo. Another regulatory mutation linked to the identified polymorphism may be the one that affects promoter activity and thereby causes disease.

Techniques for Promoter Analysis

IN SILICO ANALYSIS OF PROMOTER MUTATIONS

A first step is to consult the literature and databases such as HGMD, National Center for Biotechnology Information (NCBI), Ensembl, and Online Mendelian Inheritance in Man (OMIM). These sources show whether an identified promoter sequence variation is known, is associated with disease, and has previously been functionally characterized (Tables 1 and 3). DNA polymorphisms are often not catalogued in these databases unless they exhibit a sufficiently strong phenotypic association (21). The next step is in silico analysis to investigate whether the sequence variation disrupts or creates a putative TFBS (Table 3). Experimental data on the specific binding sites of most well-characterized TFs have been compiled in databases such as TRANSFAC (Table 1) (22). In these databases, experimentally determined TFBSs are used to calculate a probability score for each nucleotide at each position of a consensus TFBS (site matrix). Programs such as TESS (transcription element search software) (Table 1) (23) compare an input genomic sequence with all matrices in TRANSFAC. They then report a list of potential TFBSs based on a statistical match between a region in the sequence and a site matrix. This analysis is, however, often encumbered by the prediction of a large number of putative TFBSs, a significant fraction of which are not involved in transcriptional regulation of the gene. One cause may be the quality of the data used to build the TFBS matrices (24). Other discrepancies arise from in vivo absence or inactivity of a TF or cofactor, or from condensed local chromatin (25). In addition to these false-positive problems, the comprehensiveness of the databases is also an issue: not all DNA-binding TFs have been identified, and even for some known factors, binding specificity has not yet been fully characterized (1). Phylogenetic footprinting (Table 3), the comparison of the sequence of interest with the homologous region in other species, is used to investigate the putative functional relevance of a promoter sequence variation.

The rationale behind this approach is that nucleotides within binding sites are more likely to be conserved by natural selection. There is abundant evidence that conserved regions do, indeed, often contain functional regulatory motifs. However, this correlation does not always hold, because not all TFBSs are conserved among species. Moreover, some of the most important transcriptional regulatory elements relevant to normal human development and disease may not be highly conserved; instead, they may be restricted to humans or their primate relatives (1).

FUNCTIONAL PROMOTER ASSAYS

A promoter mutation that putatively causes disease must be characterized to assess the relevance of the DNA sequence variation to the disease. Proper analysis requires proof that the mutation significantly alters promoter activity in vitro (functional assays, Table 3). One of the more versatile functional tests is the reporter gene assay. In this assay, the DNA region to be tested for regulatory activity is cloned into a plasmid upstream of an easily assayed reporter gene, such as the genes coding for chloramphenicol acetyltransferase, β-galactosidase, green fluorescent protein, or luciferase (26). The resulting wild-type and mutant constructs are then transfected (either transiently or stably) into cultured cells. Reporter gene activity is measured to determine whether the promoter mutation alters reporter gene expression (for an example, see Fig. 2). Cotransfection of a control reporter plasmid is used to correct for differences in transfection efficiency within or between transfection experiments. More sophisticated testing of upstream regulatory elements is performed by constructing transgenic organisms and monitoring reporter gene expression throughout the development of the organism (27). Compared with constructing transgenic organisms, transient transfection assays are much easier to perform, less time-consuming, and more feasible in laboratories with only limited cell-culturing facilities.

Reporter gene assays also have limitations. First, regulatory elements can be widely dispersed and difficult to capture in a single reporter construct. Second, the plasmid DNA is placed in an artificial environment, which may lead to inactivity or dysregulation of regulatory elements (19). Third, the activity of a reporter gene may fail to reproduce the in vivo expression pattern of its endogenous counterpart owing to differences in chromatin context. Finally, a given upstream regulatory element may, in practice, be active only under restricted conditions, such as in certain tissues or developmental stages. The cell culture system used to assay reporter gene activity may not match the physiological conditions under which the regulatory element is normally active. In that case, differences in promoter activity between wild-type and mutant constructs may not be detected (1). Despite these limitations, reporter gene assays remain the most accurate means available to investigate the functional consequences of a promoter mutation.

Table 3. Properties of techniques involved in promoter analysis.

| Method | Aim | Description | Main advantages | Main disadvantages | Feasible in general laboratory | References |
|---|---|---|---|---|---|---|
| In silico analysis: literature and database search | To investigate prevalence, association with disease, and previous functional characterization | Literature and databases are used to investigate the relevance of the promoter mutation | Fast, easy, low cost | Publication bias; absence of information in case of unknown mutations; polymorphisms often not catalogued | Yes | Antonarakis et al. (21) |
| In silico analysis: TF database search | To investigate whether the sequence variation disrupts or creates a putative TFBS | Programs compare genomic sequence input with known binding sequences and report a list of potential TFBSs | Fast, easy, low cost; assessment of putative relevance of mutation possible | In silico vs in vivo discrepancies; not all DNA-binding TFs have been identified or fully characterized | Yes | Maston et al. (1), Wray et al. (25) |
| In silico analysis: phylogenetic footprinting | To investigate the putative relevance of the promoter sequence variation | Comparison of the sequence with the homologous region in other species | Fast, easy, low cost; assessment of putative relevance of mutation possible | Correlation between conservation and functional activity does not always exist; not all TFBSs are conserved among species | Yes | Maston et al. (1) |
| Functional promoter assays: transient transfection assays | To investigate whether the promoter mutation alters promoter activity in vitro | Promoter constructs are cloned into a plasmid upstream of a reporter gene and transiently transfected into cultured cells; reporter activity correlates with promoter activity | Prediction of in vivo promoter activity; relatively easy to perform and less time-consuming than other functional promoter assays | Difficulties capturing all regulatory elements in a single reporter construct; in vitro–in vivo discrepancies due to differences in chromatin context and to tissue- or developmental-stage–specific expression; plasmid DNA exists in an artificial configuration, which may lead to inactivity of regulatory elements | No, cloning and cell-culturing facilities must be available | Maston et al. (1), Knight (19) |
| Functional promoter assays: stable transfection assays | To investigate whether the promoter mutation alters promoter activity in vitro | Promoter constructs are cloned into a plasmid upstream of a reporter gene and stably transfected into cultured cells; reporter activity correlates with promoter activity | Prediction of in vivo promoter activity; selection of transfected cells based on a drug-resistance gene; natural chromatin environment and copy number; more sensitive and robust than transient transfection assays | Difficulties capturing all regulatory elements in a single reporter construct; in vitro–in vivo discrepancies due to differences in chromatin context and to tissue- or developmental-stage–specific expression; more technically demanding than transient transfection assays | No, cloning and cell-culturing facilities must be available | Maston et al. (1), Knight (19) |
| Functional promoter assays: transgenic expression assays | To investigate whether the promoter mutation alters promoter activity in vivo | Promoter constructs are cloned into a reporter assay system and injected into fertilized oocytes; in vivo gene expression correlates with reporter gene expression in the transgenic organism | Better prediction of in vivo promoter activity than other functional assays; monitoring of reporter gene expression throughout the development of the organism | Laborious and time-consuming | No, cloning and cell-culturing facilities must be available | Nobrega et al. (27) |
| DNA-TF–binding assays: EMSA | To investigate protein binding to the promoter in vitro | Proteins recognizing a given promoter sequence are identified by incubating a labeled DNA probe with nuclear extract; upon electrophoresis, free labeled probe separates from protein-bound probe | Clear-cut, sensitive method; detection and characterization of specific protein–DNA interactions; identification of TFs by supershift assays; commercial assays available; location of TFBS determined by mutagenesis; detection of protein–DNA interactions on a small fragment (20–30 bp) | Protein–DNA interaction must be maintained during gel electrophoresis; probe length is critical; titration of nonspecific competitor DNA difficult; DNA region of interest must be known; only short probes can be tested | Yes, when using commercially available nuclear extracts and nonradiolabeled probes | Knight (19), Carey and Smale (28) |
| DNA-TF–binding assays: DNase I–footprinting assay | To investigate protein binding to the promoter in vitro | Partial digestion of a ³²P-labeled fragment by DNase I is followed by denaturing acrylamide gel electrophoresis; nucleotides bound by protein are protected from cleavage, producing a “footprint” in the ladder of labeled DNA fragments and allowing specific localization of the site of protein–DNA interaction | Useful for scanning large DNA fragments (50–200 bp) for protein–DNA interactions | Only broad indication of binding site; DNA backbone affects cleavage efficiency; effective primarily at high protein concentrations; MgCl2 and CaCl2 (needed for DNase I activity) may disrupt specific interactions; less useful for investigating a specific promoter mutation | Yes, when using commercially available nuclear extracts | Knight (19), Carey and Smale (28), Galas and Schmitz (38) |
| DNA-TF–binding assays: chromatin immunoprecipitation (ChIP) assay | To investigate protein binding to the promoter in vivo | DNA-binding proteins are crosslinked to their target sites in growing cells; cells are lysed and the DNA is cleaved; protein–DNA complexes are purified by immunoprecipitation with antibodies directed against the DNA-binding protein; the immunoprecipitate is analyzed for the presence of the regulatory element | Direct “visualizing” of the in vivo interaction between a specific protein and a regulatory element; commercial ChIP assay kits (shortening optimization procedures) available | Technically challenging; TF must be known; high-quality antibodies needed; optimizing shearing conditions difficult; careful titration of DNase I necessary | No, ³²P facilities must be available | Carey and Smale (28) |

Fig. 2. Results obtained by use of transient transfection assays. The −324T→A mutation in the erythroid-specific promoter of the pyruvate kinase, liver and RBC (PKLR) gene does not affect promoter activity compared with the wild-type construct. In contrast, the −83G→C mutation strongly reduces in vitro promoter activity. We concluded that the −83G→C mutation in the erythroid-specific promoter of PKLR strongly downregulates promoter activity in vitro (35). Reprinted with permission from Van Wijk et al. (35). *Statistically significant (P < .05). LUC, the pGL3-Basic vector (Promega Corporation) with the luc+ gene (firefly luciferase).

DNA-TF–BINDING ASSAYS

In addition to functional promoter assays, it is essential to demonstrate that the promoter mutation affects the interaction of a putative TF with the DNA sequence of interest (DNA-TF–binding assays, Table 3) (28). Several methods have been used for in vitro detection and characterization of protein–DNA interactions, including the electrophoretic mobility shift assay (EMSA) and DNase I–footprinting assays (Table 3). EMSA is by far the most commonly used, mainly because it is a relatively simple, rapid, and extremely sensitive technique for the detection and characterization of specific protein–DNA interactions (29). EMSA is based on the principle that a protein–DNA complex migrates more slowly through a native gel than the corresponding free DNA. Proteins in a nuclear extract that specifically recognize a given promoter sequence can be identified by incubating a small radiolabeled DNA probe with the extract, which allows protein–DNA complexes to form. Electrophoresis of the mixture on a native polyacrylamide gel then separates the free radiolabeled probe molecules from the protein-bound molecules (28). Free DNA and DNA–protein complexes are detected by autoradiography or phosphorimager analysis (Fig. 3). Differences in binding pattern between the wild-type and mutant radiolabeled probes indicate that the mutation affects the binding of TFs to the DNA sequence of interest.

Fig. 3. Results obtained by use of EMSA and supershift assay. Incubation of a wild-type (WT) protoporphyrinogen oxidase (PPOX) probe with K562 nuclear extract results in a specific DNA–protein complex. This complex is absent in the case of a GATA-1–mutated (mut) PPOX competitor (comp) probe. The DNA–protein complex supershifts after addition of anti–GATA-1. We concluded that GATA-1 binds 1 of the GATA-binding motifs in exon 1 of PPOX in vitro (6). Reprinted with permission from de Vooght et al. (6).

Competition studies with unlabeled wild-type and mutant competitor probes are used to test the specificity of DNA–protein interactions. Commercial kits for performing EMSA without radiolabeled probes have recently become available. The putative TF can be further identified by use of an antibody directed against this protein. When the antibody is added to the binding reaction, its binding to the DNA–protein complex causes the complex to migrate more slowly or to disappear upon electrophoresis (supershift assay, Fig. 3). If the protein interacting with the DNA sequence of interest is unknown, protein purification experiments must be performed first. One advantage of EMSA is that it is analytically sensitive and can reveal a specific protein–DNA complex even when the protein is present at low concentrations. A disadvantage is that the protein–DNA interaction has to be maintained during gel electrophoresis, and some protein–DNA complexes are not stable enough to survive the typical 2- to 4-h electrophoresis run. In addition, probes must be long enough to support the formation of stable protein–DNA interactions. Relatively large concentrations of nonspecific competitor DNA, such as poly(dI:dC), are also often needed to increase specificity (28).
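When phosphorimager data are available, the wild-type and mutant probes are often compared by the fraction of probe shifted into the complex. The Python sketch below assumes background-corrected intensities for the free-probe and complex bands of each lane. The `Lane` class, its field names, and all intensity values are hypothetical and are not taken from Fig. 3.

```python
from dataclasses import dataclass


@dataclass
class Lane:
    """Background-corrected band intensities (arbitrary units) for one EMSA lane."""
    name: str
    free_probe: float
    complex_band: float

    def fraction_bound(self) -> float:
        """Fraction of the total probe signal found in the shifted complex."""
        total = self.free_probe + self.complex_band
        if total <= 0:
            raise ValueError(f"lane {self.name!r} has no signal")
        return self.complex_band / total


def relative_binding(test: Lane, reference: Lane) -> float:
    """Fraction bound in `test` relative to `reference` (1.0 means unchanged)."""
    return test.fraction_bound() / reference.fraction_bound()


# Hypothetical intensities, for illustration only.
wild_type = Lane("WT probe", free_probe=600.0, complex_band=400.0)
mutant = Lane("mutant probe", free_probe=920.0, complex_band=80.0)
wt_competed = Lane("WT probe + unlabeled WT competitor",
                   free_probe=900.0, complex_band=100.0)

for lane in (wild_type, mutant, wt_competed):
    print(f"{lane.name}: {lane.fraction_bound():.2f} of probe bound")
print(f"mutant relative to WT: {relative_binding(mutant, wild_type):.2f}")
```

A mutant-to-wild-type ratio well below 1 would fit a mutation that disrupts a specific binding site. This reading requires that the complex also disappears after addition of unlabeled wild-type competitor. Such a summary does not replace inspection of the gel or replicate experiments.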

Chromatin immunoprecipitation (ChIP) assays demonstrate the in vivo relevance of TF binding. In brief, growing cells are treated with formaldehyde to crosslink DNA-binding proteins to their target sites. The cells are then lysed, and the DNA is cleaved into fragments by digestion with a restriction enzyme or by ultrasonic shearing. Protein–DNA complexes are purified by immunoprecipitation with antibodies directed against the DNA-binding protein of interest. The next step determines whether the protein was crosslinked to the putative TFBS. The immunoprecipitated complexes are released from the antibody and the crosslinks are reversed. The proteins are then digested by proteinase K treatment, and the DNA is analyzed by PCR for the presence of a DNA fragment encompassing the regulatory element (30). The principal strength of the in vivo crosslinking assay is that it is the only method currently available for directly “visualizing” an in vivo interaction between a specific protein and a regulatory element (28). Its limitations are that it is technically challenging and that the putatively interacting TF must be known. The method requires high-quality antibodies capable of recognizing the fixed, target-bound TF, and optimization of chromatin-shearing conditions can be difficult. Fortunately, commercial ChIP assay kits have recently become available. These kits shorten optimization procedures, making the assay accessible to less experienced laboratories.
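The text states only that the immunoprecipitated DNA is analyzed by PCR. If real-time PCR is used, one common way to express the result is as a percentage of input chromatin. This is our assumption, not the method of the cited protocol. The input Ct is first adjusted for the fraction of chromatin saved as input. Enrichment is then 100 × 2^(Ct_input,adj − Ct_IP), assuming roughly 100% amplification efficiency. The function name and Ct values below are hypothetical, and the "control IgG" sample stands for a nonspecific-antibody control.

```python
import math


def percent_input(ct_ip: float, ct_input: float, input_fraction: float) -> float:
    """Percent-input enrichment for one qPCR amplicon.

    Assumes ~100% PCR efficiency (one doubling per cycle) and that the input
    sample represents `input_fraction` of the chromatin used per IP
    (e.g. 0.01 for a 1% input).
    """
    if not 0 < input_fraction <= 1:
        raise ValueError("input_fraction must be in (0, 1]")
    # Rescale the input Ct to what 100% of the chromatin would have given.
    ct_input_adjusted = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adjusted - ct_ip)


# Hypothetical Ct values for an amplicon spanning the putative TFBS.
specific = percent_input(ct_ip=26.0, ct_input=25.0, input_fraction=0.01)
control_igg = percent_input(ct_ip=31.0, ct_input=25.0, input_fraction=0.01)

print(f"anti-TF IP: {specific:.3f}% of input")
print(f"control IgG: {control_igg:.4f}% of input")
print(f"fold enrichment over control: {specific / control_igg:.1f}")
```

With these made-up values, the specific antibody recovers 0.5% of input and the control recovers 32-fold less. A real experiment would also include a negative-control genomic region and replicate immunoprecipitations.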


Fig. 4. Flow chart of the proposed standardized procedure for investigating promoter mutations and polymorphisms. The different aspects of promoter analysis are arranged in a decision tree for characterization of promoter mutations and polymorphic promoter sequence variations. ChIP, chromatin immunoprecipitation.
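A decision tree like the one in Fig. 4 could be encoded as follows. This Python sketch orders the steps discussed in the text: database search, in silico TFBS and conservation analysis, functional reporter assays, in vitro binding assays, and ChIP. The data structure, field names, and stopping rules are our own simplification and do not reproduce the actual branches of Fig. 4. In this sketch a negative in silico prediction does not end the workup, because such predictions are unreliable.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VariantEvidence:
    """Results collected so far for one promoter variant.

    Hypothetical fields; None means the corresponding step has not been done yet.
    """
    known_in_databases: Optional[bool] = None          # literature/database search
    alters_predicted_tfbs: Optional[bool] = None       # TF database + conservation
    alters_reporter_activity: Optional[bool] = None    # transfection reporter assay
    alters_tf_binding_in_vitro: Optional[bool] = None  # EMSA (with supershift)
    tf_bound_in_vivo: Optional[bool] = None            # ChIP


def next_step(ev: VariantEvidence) -> str:
    """Suggest the next step in a simplified, strictly sequential workup."""
    if ev.known_in_databases is None:
        return "Search literature and databases (HGMD, NCBI, Ensembl, OMIM)"
    if ev.alters_predicted_tfbs is None:
        return "In silico analysis: TF database search and phylogenetic footprinting"
    if ev.alters_reporter_activity is None:
        return "Transient transfection reporter assay, wild-type vs mutant construct"
    if not ev.alters_reporter_activity:
        # A negative result may reflect a cell system that does not match the
        # physiological conditions under which the element is active.
        return "No effect detected: consider other cell types or regulatory regions"
    if ev.alters_tf_binding_in_vitro is None:
        return "EMSA with competition and supershift assays"
    if ev.alters_tf_binding_in_vitro and ev.tf_bound_in_vivo is None:
        return "ChIP assay for the candidate TF"
    return "Workup complete: interpret the combined evidence"


# Example: database and in silico steps done, functional testing still pending.
evidence = VariantEvidence(known_in_databases=False, alters_predicted_tfbs=True)
print(next_step(evidence))
```

Keeping each result as an optional value separates "not yet tested" from "tested and negative." A real laboratory workflow would need richer states, such as inconclusive results or multiple cell types.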


PROCEDURE FOR ANALYZING PROMOTER MUTATIONS AND POLYMORPHISMS

In general, a detected promoter variation can be characterized according to the proposed standardized procedure shown in Fig. 4. This flowchart is compatible with most published promoter mutation studies. In silico analysis is relatively quick and easy and can also be performed by less experienced laboratories. Functional promoter assays and TF-binding assays are more difficult and laborious to perform.

Concluding Remarks

Several studies have identified disease-causing cis-regulatory mutations (31). Promoter analysis is not performed on a regular basis for four main reasons: (a) promoter mutations can be properly analyzed only by laborious functional or biochemical tests, (b) the location of the promoter is frequently not well defined, (c) the significance of promoter mutations is difficult to interpret, and (d) the effect of promoter mutations is considered too mild to cause disease. As a result, interpretation of promoter mutations is difficult and often does not yield strong, conclusive results regarding the clinical effect of the identified mutation. Analysis of promoter mutations is nevertheless important. It improves the diagnosis of disease caused by promoter mutations and expands our understanding of the role of transcriptional regulation in human disease.

Diagnosis of disease caused by mutations in the promoter region of a gene must be improved, and procedures in basic promoter-research laboratories must be accelerated. Both goals require better prediction programs, dedicated easy-to-perform functional promoter assays, and TF-binding assays for analyzing promoter mutations. Pending these advances, we believe that clinical laboratories should team up with research groups specializing in gene-promoter research. This is a 2-way street. Routine laboratories can translate results obtained by research laboratories into diagnostic tools. In turn, research groups specializing in gene-promoter research depend on the identification of regulatory mutations in patients. These findings improve knowledge of the transcriptional regulation of the gene of interest and of the role of transcriptional regulation in disease.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References

1. Maston GA, Evans SK, Green MR. Transcriptional regulatory elements in the human genome. Annu Rev Genomics Hum Genet 2006;23:23.
2. Cosma MP. Ordered recruitment: gene-specific mechanism of transcription activation. Mol Cell 2002;10:227–36.
3. Latchman DS. Eukaryotic transcription factors. 3rd ed. London: Academic Press; 1998. 360 p.
4. Breathnach R, Chambon P. Organization and expression of eucaryotic split genes coding for proteins. Annu Rev Biochem 1981;50:349–83.
5. Denoeud F, Kapranov P, Ucla C, Frankish A, Castelo R, Drenkow J, et al. Prominent use of distal 5′ transcription start sites and discovery of a large number of additional exons in ENCODE regions. Genome Res 2007;17:746–59.
6. de Vooght KM, van Wijk R, van Solinge WW. GATA-1 binding sites in exon 1 direct erythroid-specific transcription of PPOX. Gene 2008;409:83–91.
7. Zimmermann N, Colyer JL, Koch LE, Rothenberg ME. Analysis of the CCR3 promoter reveals a regulatory region in exon 1 that binds GATA-1. BMC Immunol 2005;6:7.
8. Kimura K, Wakamatsu A, Suzuki Y, Ota T, Nishikawa T, Yamashita R, et al. Diversification of transcriptional modulation: large-scale identification and characterization of putative alternative promoters of human genes. Genome Res 2006;16:55–65.
9. Stenson PD, Ball EV, Mort M, Phillips AD, Shiel JA, Thomas NS, et al. Human Gene Mutation Database (HGMD): 2003 update. Hum Mutat 2003;21:577–81.
10. Rockman MV, Wray GA. Abundant raw material for cis-regulatory evolution in humans. Mol Biol Evol 2002;19:1991–2004.
11. Ladenvall P, Johansson L, Jansson JH, Jern S, Nilsson TK, Tjarnlund A, et al. Tissue-type plasminogen activator −7,351C/T enhancer polymorphism is associated with a first myocardial infarction. Thromb Haemost 2002;87:105–9.
12. Cooper DN. Human gene mutation in pathology and evolution. J Inherit Metab Dis 2002;25:157–82.
13. Finishing the euchromatic sequence of the human genome. Nature (Lond) 2004;431:931–45.
14. Poncz M, Ballantine M, Solowiejczyk D, Barak I, Schwartz E, Surrey S. beta-Thalassemia in a Kurdish Jew: single base changes in the T-A-T-A box. J Biol Chem 1982;257:5994–6.
15. Hardison RC, Chui DHK, Giardine B, Riemer C, Patrinos GP, Anagnou N, et al. HbVar: a relational database of human hemoglobin variants and thalassemia mutations at the globin gene server. Hum Mutat 2002;19:225–33.
16. Riordan JR, Rommens JM, Kerem B, Alon N, Rozmahel R, Grzelczak Z, et al. Identification of the cystic fibrosis gene: cloning and characterization of complementary DNA. Science (Wash DC) 1989;245:1066–73.
17. Bienvenu T, Lacronique V, Raymondjean M, Cazeneuve C, Hubert D, Kaplan JC, Beldjord C. Three novel sequence variations in the 5′ upstream region of the cystic fibrosis transmembrane conductance regulator (CFTR) gene: two polymorphisms and one putative molecular defect. Hum Genet 1995;95:698–702.
18. McCarthy VA, Harris A. The CFTR gene and regulation of its expression. Pediatr Pulmonol 2005;40:1–8.
19. Knight JC. Functional implications of genetic variation in non-coding DNA for disease susceptibility and gene regulation. Clin Sci (Lond) 2003;104:493–501.
20. Wang DG, Fan JB, Siao CJ, Berno A, Young P, Sapolsky R, et al. Large-scale identification, mapping, and genotyping of single-nucleotide polymorphisms in the human genome. Science (Wash DC) 1998;280:1077–82.
21. Antonarakis SE, Krawczak M, Cooper DN. Disease-causing mutations in the human genome. Eur J Pediatr 2000;159:S173–8.
22. Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, et al. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res 2000;28:316–9.
23. Schug J, Overton GC. TESS: Transcription element search software on the WWW. Pennsylvania: Computational Biology and Informatics Laboratory, School of Medicine, University of Pennsylvania; 1998. 10 p. Available from: http://www.cbil.upenn.edu/tess/techreports/1997/CBIL-TR-1997-1001-v0.0.pdf.
24. Fogel GB, Weekes DG, Varga G, Dow ER, Craven AM, Harlow HB, et al. A statistical analysis of the TRANSFAC database. Biosystems 2005;81:137–54.
25. Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA. The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 2003;20:1377–419.
26. Alam J, Cook JL. Reporter genes: application to the study of mammalian gene transcription. Anal Biochem 1990;188:245–54.
27. Nobrega MA, Ovcharenko I, Afzal V, Rubin EM. Scanning human gene deserts for long-range enhancers. Science (Wash DC) 2003;302:413.
28. Carey M, Smale ST. Transcriptional regulation in eukaryotes: concepts, strategies, and techniques. New York: Cold Spring Harbor Laboratory Press; 2000. 640 p.
29. Garner MM, Revzin A. A gel electrophoresis method for quantifying the binding of proteins to specific DNA regions: application to components of the Escherichia coli lactose operon regulatory system. Nucleic Acids Res 1981;9:3047–60.
30. Orlando V, Strutt H, Paro R. Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods 1997;11:205–14.
31. Wray GA. The evolutionary significance of cis-regulatory mutations. Nat Rev Genet 2007;8:206–16.
32. Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004;5:276–87.
33. Ludlow LB, Schick BP, Budarf ML, Driscoll DA, Zackai EH, Cohen A, Konkle BA. Identification of a mutation in a GATA binding site of the platelet glycoprotein Ibβ promoter resulting in the Bernard-Soulier syndrome. J Biol Chem 1996;271:22076–80.
34. Manco L, Ribeiro ML, Máximo V, Almeida H, Costa A, Freitas O, et al. A new PKLR gene mutation in the R-type promoter region affects the gene transcription causing pyruvate kinase deficiency. Br J Haematol 2000;110:993–7.
35. Van Wijk R, Van Solinge WW, Nerlov C, Beutler E, Gelbart T, Rijksen G, Nielsen FC. Disruption of a novel regulatory element in the erythroid-specific promoter of the human PKLR gene causes severe pyruvate kinase deficiency. Blood 2003;101:1596–602.
36. Crossley M, Brownlee GG. Disruption of a C/EBP binding site in the factor IX promoter is associated with haemophilia B. Nature (Lond) 1990;345:444–6.
37. Reijnen MJ, Sladek FM, Bertina RM, Reitsma PH. Disruption of a binding site for hepatocyte nuclear factor 4 results in hemophilia B Leyden. Proc Natl Acad Sci U S A 1992;89:6300–3.
38. Galas DJ, Schmitz A. DNAse footprinting: a simple method for the detection of protein-DNA binding specificity. Nucleic Acids Res 1978;5:3157–70.

Point/Counterpoint

Clinical Chemistry 55:4 709–711 (2009)

– POINT –

Use of Pharmacogenetics in Guiding Treatment with Warfarin

Mia Wadelius1*

1 Department of Medical Sciences, Clinical Pharmacology, Uppsala University Hospital, Uppsala, Sweden. * Address correspondence to the author at: Clinical Pharmacology, Uppsala University Hospital, Entrance 61, 3rd Floor, Uppsala, NA, Sweden SE-75185. Fax +46-18-6113703; e-mail [email protected]. Received August 18, 2008; accepted December 17, 2008. Previously published online at DOI: 10.1373/clinchem.2008.115964

Warfarin is the most widely used oral anticoagulant for the treatment of thromboembolic disorders and for stroke prophylaxis. It is a problematic drug because it exhibits large interindividual variation in the required therapeutic dose, has a narrow therapeutic range, and shows multiple food and drug interactions. Its anticoagulant effect is monitored by measuring the international normalized ratio (INR), which is a function of the time a patient's blood takes to coagulate relative to that of a reference blood sample. Although warfarin has been used in humans for more than 50 years, its main side effect, bleeding, is a leading cause of hospital admission and drug-related death (1, 2 ). This problem has made patients and clinicians yearn for a new, efficient, and safe oral anticoagulant that does not require frequent monitoring. In Europe, a new oral anticoagulant (dabigatran) claimed to have these qualities has been licensed for short-term primary prevention of venous thromboembolic events, but its effectiveness in long-term secondary thromboprophylaxis remains to be shown. Furthermore, the daily cost of dabigatran is 5 times that of warfarin therapy, including INR tests. Switching all warfarin patients (currently 1% of the population in many Western countries) to dabigatran would boost national costs in countries with subsidized drug programs; therefore, national authorities will probably encourage the continued use of warfarin, even when oral thrombin inhibitors become available for long-term thromboprophylaxis.

Given that warfarin is likely to remain the most widely used oral anticoagulant for the foreseeable future, it is crucial to improve its safety. The risk of over-anticoagulation and bleeding is especially high before stable anticoagulation has been established. One way to minimize this risk would be to shorten the time to stable anticoagulation by tailoring the initial dose for each patient. The required warfarin dose, which can vary 20-fold among individuals, can be roughly estimated from clinical and demographic factors such as age, body weight, concurrent disease, and drug and food interactions (3 ). A number of dosage algorithms that use clinical and demographic factors have been tested and are able to reduce the time to therapeutic anticoagulation (4 ).

More recent discoveries have shown that variation in the genes that encode the main enzyme responsible for S-warfarin metabolism (CYP2C9,2 cytochrome P450, family 2, subfamily C, polypeptide 9) and the target of warfarin (VKORC1, vitamin K epoxide reductase complex, subunit 1) influences dose requirements by affecting pharmacokinetics and pharmacodynamics (5 ). Polymorphisms in these genes are also associated with the risk of over-anticoagulation during initiation of warfarin therapy (6 ). A large prospective study on warfarin pharmacogenetics provided probabilities of over-anticoagulation (INR >4) in patients with different CYP2C9 and VKORC1 alleles (Fig. 1) (7 ). During the first month of treatment, CYP2C9*3/*3 individuals had a 22-fold increased risk of an INR >4 and a tendency toward more episodes of serious bleeding compared with CYP2C9*1/*1 individuals. Patients homozygous for VKORC1 variants had a 4.5-fold increased risk of an INR >4 within 5 weeks (7 ). Genotyping the CYP2C9 and VKORC1 genes could avert overdosing in patients who are warfarin sensitive because of these polymorphisms. Several pharmacogenetic algorithms that predict warfarin maintenance doses have been developed by combining genetic, clinical, and demographic factors with warfarin-dosing data and INR measurements (3 ). If genetic testing were integrated into routine warfarin therapy, American warfarin users would, by one estimate, avoid 4500 to 22 000 serious bleeding events annually (8 ).
In 2007, the US Food and Drug Administration updated the warfarin label to encourage lower initiation doses in patients with CYP2C9 and VKORC1 variant alleles.
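The INR referred to above is conventionally calculated as the ratio of the patient's prothrombin time (PT) to the mean normal PT (MNPT), raised to the power of the thromboplastin reagent's international sensitivity index (ISI). The Python sketch below assumes that standard definition, with the MNPT taken as the geometric mean of reference PTs; the function names and the PT values in the example are hypothetical and do not come from this article.

```python
import math


def mean_normal_pt(normal_pts_s):
    """Geometric mean of prothrombin times (seconds) from healthy reference donors."""
    if not normal_pts_s or any(t <= 0 for t in normal_pts_s):
        raise ValueError("need at least one positive prothrombin time")
    return math.exp(sum(math.log(t) for t in normal_pts_s) / len(normal_pts_s))


def inr(patient_pt_s, mnpt_s, isi):
    """INR = (patient PT / mean normal PT) ** ISI, where ISI is reagent specific."""
    if patient_pt_s <= 0 or mnpt_s <= 0 or isi <= 0:
        raise ValueError("PT values and ISI must be positive")
    return (patient_pt_s / mnpt_s) ** isi


# Hypothetical values: three reference PTs and a reagent ISI of 1.0.
mnpt = mean_normal_pt([11.5, 12.0, 12.5])
print(round(inr(30.0, mnpt, 1.0), 1))  # 2.5
```

With an ISI of 1.0 the INR reduces to a plain PT ratio; a reagent with a higher ISI turns the same PT prolongation into a larger INR, which is why the exponent is needed to make results comparable across laboratories.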

2 Human genes: CYP2C9, cytochrome P450, family 2, subfamily C, polypeptide 9; VKORC1, vitamin K epoxide reductase complex, subunit 1.



Fig. 1. Kaplan–Meier curve showing time to first INR peak >4 related to variant alleles of VKORC1 (rs9923231, −1639G>A) and CYP2C9 (*3). For most patients, the target INR range is 2–3. An INR >4 indicates an increased risk of bleeding. Figure from Wadelius et al. (7 ). Used with permission.
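Fig. 1 is a Kaplan–Meier curve. As a minimal sketch of how such a curve is estimated, the Python function below implements the standard product-limit estimator for time to a first event (here, an INR >4), treating patients in whom the event was not observed as censored. The function name and the toy follow-up data are hypothetical; this is not the analysis code used by Wadelius et al. (7 ).

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of S(t), the probability of remaining event free.

    times  -- follow-up time per patient (e.g., days from warfarin start)
    events -- 1 if the event (e.g., first INR > 4) occurred at that time,
              0 if the patient was censored (event not observed)
    Returns a list of (time, S(t)) pairs at each distinct event time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    event_counts = Counter(t for t, e in zip(times, events) if e)
    exits = Counter(times)  # patients leaving the risk set at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = event_counts.get(t, 0)
        if d:
            # Patients censored at time t are still counted as at risk at t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical days to first INR > 4 (event = 1) or end of follow-up (event = 0).
for day, s in kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]):
    print(day, round(s, 2))  # prints 1 0.8, then 2 0.6, then 3 0.3
```

Each factor 1 − d/n is the conditional probability of passing an event time without the event, so the estimate steps down only at event times, while censored patients simply shrink the risk set.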

Warfarin thus has the potential to become one of the first drugs for which pharmacogenetic dosing is introduced into routine therapy. It is difficult to predict how routine genotyping would affect US health care costs; predictions range from annual savings of $357 million to annual cost increases of $445 million. Pharmacogenetic dosing algorithms currently predict approximately 50% of the dose variance in Caucasians but perform less well in Asians and African Americans and need to be adapted by including additional factors. Furthermore, existing pharmacogenetic algorithms fit poorly at very high doses because they rely on genetic polymorphisms that increase sensitivity to warfarin. This situation could be improved by incorporating rarer mutations that cause warfarin resistance into the dose models. Another issue is how to predict loading doses from maintenance-dose models. Finally, the clinical utility of pharmacogenetic warfarin dosing must be tested before it is implemented on a broad scale.

Two prospective clinical trials of predicted warfarin dosing have shown promising results (9, 10 ). In the Israeli trial, 95 patients who began warfarin therapy according to their CYP2C9 genotype were compared with 96 patients who received doses according to a clinical algorithm (9 ). In the American trial, 101 patients who received their initial warfarin treatment according to CYP2C9 and VKORC1 genotypes were compared with 99 patients randomized to standard therapy (10 ). Despite small sample sizes, both studies claimed that pharmacogenetics increased the efficiency of warfarin initiation. Producing irrefutable results, however, requires adequately powered randomized clinical trials of pharmacogenetic warfarin dosing. Two large clinical trials of pharmacogenetic vs conventional warfarin initiation, one American and one European, are starting in 2009.

If the results from these trials are encouraging (and previous studies on warfarin pharmacogenetics suggest that they will be), then pharmacogenetic dosing will be ready to be introduced into clinical practice. It is hoped that the implementation of pharmacogenetics will improve the safety and cost-effectiveness of oral anticoagulant treatment. Warfarin's long era as a leading cause of serious hospital admission and drug-related death could thereby be brought to an end.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: The Swedish Heart and Lung Foundation and Society of Medicine, the Söderberg and Selander Foundations, and the Clinical Research Support (ALF) at Uppsala University are gratefully acknowledged. Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References 1. Pirmohamed M, James S, Meakin S, Green C, Scott AK, Walley TJ, et al. Adverse drug reactions as cause of admission to hospital: prospective analysis of 18 820 patients. BMJ 2004;329:15–9. 2. Wester K, Jonsson AK, Spigset O, Druid H, Hägg S. Incidence of fatal adverse drug reactions: a population based study. Br J Clin Pharmacol 2008;65:573–9. 3. Wu AH. Use of genetic and nongenetic factors in warfarin dosing algorithms. Pharmacogenomics 2007;8:851–61. 4. Crowther MA. Oral anticoagulant initiation: rationale for the use of warfarin dosing nomograms. Semin Vasc Med 2003;3:255–60. 5. Wadelius M, Pirmohamed M. Pharmacogenetics of warfarin: current status and future challenges. Pharmacogenomics J 2007;7:99–111. 6. Gage BF, Lesko LJ. Pharmacogenetics of warfarin: regulatory, scientific, and clinical issues. J Thromb Thrombolysis 2008;25:45–51. 7. Wadelius M, Chen LY, Lindh JD, Eriksson N, Ghori MJ, Bumpstead S, et al. The largest prospective warfarin-treated cohort supports genetic forecasting. Blood 2009;113:784–92. 8. McWilliams A, Lutter R, Nardinelli C. Healthcare impact of personalized medicine using genetic testing: an exploratory analysis for warfarin. Per Med 2008;5:279–84. 9. Caraco Y, Blotnick S, Muszkat M. CYP2C9 genotype-guided warfarin prescribing enhances the efficacy and safety of anticoagulation: a prospective randomized controlled study. Clin Pharmacol Ther 2008;83:460–70. 10. Anderson JL, Horne BD, Stevens SM, Grove AS, Barton S, Nicholas ZP, et al. Randomized trial of genotype-guided versus standard warfarin dosing in patients initiating oral anticoagulation. Circulation 2007;116:2563–70.


Point/Counterpoint

Clinical Chemistry 55:4 712–714 (2009)

– COUNTERPOINT –

Pharmacogenetic-Based Initial Dosing of Warfarin: Not Ready for Prime Time

Charles S. Eby1*

Availability of the human genome sequence offers the promise of personalized medicine through pharmacogenomics. Warfarin, a member of the coumarin family of oral anticoagulants used to prevent and treat thromboembolic disorders and one of the top 20 prescribed medications in the US, is an ideal drug for applying the principles of pharmacogenetics. Warfarin inhibits the reduction of vitamin K epoxide by the enzyme vitamin K epoxide reductase complex, subunit 1 (VKORC1),2 causing hypogammacarboxylation of vitamin K–dependent coagulation factors and an acquired coagulopathy. Warfarin therapy is monitored with the international normalized ratio (INR), which is derived from the prothrombin time. The INR therapeutic range is narrow, and the maintenance warfarin dose required to produce a therapeutic INR in an individual is both unpredictable and widely variable, leading to bleeding complications, especially during the initiation period, when doses are adjusted by trial and error (1 ). During the past 12 years, discoveries regarding the molecular basis of warfarin pharmacokinetics and pharmacodynamics have been combined with clinical and demographic information from stably anticoagulated patients to generate many dosing algorithms. Up to 54% of the interpatient variation in therapeutic warfarin dose can be accounted for by combining patient age, body size, target INR, and amiodarone use with the genotypes for 2 single-nucleotide polymorphisms (SNPs) in cytochrome P450 2C9 (CYP2C9) that reduce warfarin metabolism and 1 SNP from a group of SNPs in the VKORC1 gene3 that are in high linkage disequilibrium and associated with increased sensitivity to warfarin (2 ).
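The "up to 54% of the interpatient variation" figure is a proportion of variance explained (R²) by a dosing model. A minimal Python sketch of that calculation, comparing observed therapeutic doses with model predictions, is shown below; the function name and dose values are hypothetical, and whether a given algorithm is fitted on the raw or a transformed dose scale is not specified here.

```python
def variance_explained(observed, predicted):
    """Coefficient of determination: R^2 = 1 - SS_residual / SS_total."""
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have no variance")
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_residual / ss_total


# Hypothetical therapeutic doses (mg/day) and an algorithm's predictions.
observed_mg = [2.0, 4.0, 6.0, 8.0]
predicted_mg = [3.0, 4.0, 5.0, 8.0]
print(round(variance_explained(observed_mg, predicted_mg), 2))  # 0.9
```

An R² of 0.54 would correspond to the 54% quoted above; the remaining variance reflects factors, genetic or otherwise, that the model does not capture.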

1 Department of Pathology and Immunology, Division of Laboratory and Genomic Medicine, Washington University School of Medicine, St. Louis, MO. * Address correspondence to this author at: Department of Pathology and Immunology, Division of Laboratory and Genomic Medicine, Washington University School of Medicine, 660 S. Euclid Avenue, St. Louis, MO 63110. Fax 314-362-1461; e-mail [email protected]. Received January 9, 2009; accepted January 16, 2009. Previously published online at DOI: 10.1373/clinchem.2008.115972

2 Nonstandard abbreviations: VKORC1, vitamin K epoxide reductase complex, subunit 1; INR, international normalized ratio; SNP, single-nucleotide polymorphism; FDA, US Food and Drug Administration.

3 Human genes: VKORC1, vitamin K epoxide reductase complex, subunit 1; CYP2C9, cytochrome P450, family 2, subfamily C, polypeptide 9.


Few pharmacogenetic algorithms have been validated, however, and all are less accurate when used to predict therapeutic warfarin doses in African Americans (3 ), most likely because of currently unknown genetic mechanisms that affect warfarin sensitivity. Ongoing molecular and translational research has identified additional genetic variants affecting warfarin dosing (4 ), but to date these variants are rare or have a modest impact on therapeutic dose prediction. In August 2007, the US Food and Drug Administration (FDA) added information to the warfarin and Coumadin® package inserts about lower therapeutic warfarin doses in patients carrying the CYP2C9*2 or *3 alleles of the cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9) gene or the VKORC1 SNP, but it did not recommend or require genotyping before initiation of warfarin therapy. In response, the molecular diagnostics industry has developed reagents for real-time PCR instruments, fluorescent plate readers, and reagent-instrument platforms to detect the pharmacogenetically relevant warfarin SNPs. To date, the FDA has licensed 5 tests, and others are under evaluation. In 2007, the College of American Pathologists added the CYP2C9*2/*3 and VKORC1 SNPs to its pharmacogenetics proficiency testing panel. There thus appears to be considerable momentum from some stakeholders to adopt pharmacogenetic-based initial dosing of warfarin into routine clinical practice. It would be reasonable to expect that more accurate, pharmacogenetic-based initial dosing of warfarin would reduce the risk of serious bleeding and thrombotic complications. A few retrospective cohort studies support an association between bleeding complications and CYP2C9*2/*3 (5–7 ), but not with the VKORC1 SNPs associated with increased warfarin sensitivity (7 ).
Only 3 small prospective randomized controlled trials have compared pharmacogenetic-based initiation of warfarin with empiric dosing (8–10 ) (Table 1), and the findings are not convincing. The studies used different algorithms and nomograms in their genotype-guided and control dosing arms and measured different primary outcomes, including feasibility of rapid genotyping, time to first therapeutic INR, or percentage of out-of-range INRs. Only 1 trial identified a significant improvement in time within the therapeutic INR range (8 ) (Table 1). None was powered to detect a difference in serious bleeding complications.

Organizations representing clinical geneticists as well as cardiovascular physicians have recently provided position statements and recommendations regarding pharmacogenetic-based warfarin dosing. Based on a commissioned rapid ACCE (analytic validity; clinical validity; clinical utility; and ethical, legal, and social implications) review of this topic, a working group of the American College of Medical Genetics concluded that there is insufficient evidence to recommend for or against routine CYP2C9 and VKORC1 genotyping (11 ). The American College of Chest Physicians Antithrombotic and Thrombolytic Therapy Guideline, eighth edition, concluded that there was insufficient evidence to support pharmacogenetic-based determination of initial warfarin dosing (12 ). The Centers for Medicare & Medicaid Services sought public input about pharmacogenetic testing by adding the issue to its 2008 national coverage determination dialog topics, using warfarin pharmacogenetics as an example of a potential application that lacks high-quality clinical-trial evidence of clinical utility.

Table 1. Randomized controlled trials of pharmacogenetic-based initial warfarin dosing: impact on percentage time within the therapeutic INR range.

Hillman (10 ). Patients: 38; White: 100%; genotyped SNPs: CYP2C9*2/*3.
Initial warfarin dosing, genotype arm: d.1:a PGx algorithm; d.≥2: INR-based nomogram. Control arm: 5 mg.
Percentage time within therapeutic INR range: genotype arm, 41.7; control arm, 41.5.

Caraco (8 ). Patients: 191; White: 100%; genotyped SNPs: CYP2C9*2/*3.
Initial warfarin dosing, genotype arm: d.1: *1/*1, 1.25 × control arm dose; variants dosed at a percentage of the *1/*1 dose (*1/*2, 80%; *1/*3, 60%; *2/*2, 50%; *2/*3, 40%; *3/*3, 15%); d.2–8: INR-based nomogram. Control arm: 5 mg.
Percentage time within therapeutic INR range: genotype arm, 80.4; control arm, 63.4.b

Anderson (9 ). Patients: 206; White: 94%; genotyped SNPs: CYP2C9*2/*3, VKORC1 1173c.
Initial warfarin dosing, genotype arm: d.1–2: 2 × PGx algorithm; d.3–4: PGx algorithm; d.>5: INR-based nomogram. Control arm: 10 mg (d.1–2); 5 mg (d.3–4).
Percentage time within therapeutic INR range: genotype arm, 69.7; control arm, 68.6.

a d, day; PGx, pharmacogenetic-based.
b P <0.001.
c In high linkage disequilibrium with the VKORC1 promoter SNP at position −1639.
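The genotype-specific day-1 rule listed for the Caraco (8 ) arm of Table 1 can be sketched in Python as follows. The sketch encodes only the multipliers given in the table (the *1/*1 dose is 1.25 × the control-arm dose, and the other genotypes receive the stated percentage of the *1/*1 dose); the function and constant names are hypothetical, the INR-based nomogram used on days 2–8 is not modeled, and the example is an illustration of the table, not a dosing tool.

```python
# Day-1 rule as listed for the Caraco arm in Table 1.
WILD_TYPE_FACTOR = 1.25  # *1/*1 dose relative to the control-arm dose
FRACTION_OF_WILD_TYPE = {
    "*1/*1": 1.00,
    "*1/*2": 0.80,
    "*1/*3": 0.60,
    "*2/*2": 0.50,
    "*2/*3": 0.40,
    "*3/*3": 0.15,
}


def day1_dose_mg(cyp2c9_genotype, control_dose_mg):
    """Day-1 warfarin dose (mg) for a CYP2C9 genotype such as '*1/*3'."""
    alleles = sorted(a.strip() for a in cyp2c9_genotype.split("/"))
    key = "/".join(alleles)  # '*3/*1' and '*1/*3' map to the same entry
    if key not in FRACTION_OF_WILD_TYPE:
        raise ValueError(f"genotype not covered by the table: {cyp2c9_genotype}")
    return control_dose_mg * WILD_TYPE_FACTOR * FRACTION_OF_WILD_TYPE[key]


# Using the 5-mg control-arm dose shown for this trial in Table 1.
for genotype in ("*1/*1", "*3/*1", "*3/*3"):
    print(genotype, day1_dose_mg(genotype, 5.0))
```

Expressing every variant relative to the *1/*1 dose, as the table does, keeps the rule to one scaling constant plus a lookup; for example, a *3/*3 patient receives 15% of 6.25 mg.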

Consistent calls for additional trials to evaluate the clinical utility of pharmacogenetic-based warfarin dosing are being addressed by the National Heart, Lung, and Blood Institute's funding of a prospective, multicenter, randomized controlled study. The COAG (Clarification of Optimal Anticoagulation through Genetics) trial, scheduled to begin in early 2009, will randomize patients beginning warfarin therapy to initial dosing determined with algorithms that include either genotype and clinical data or clinical data only. Patients and physicians will be blinded to which algorithm is being used, and dose adjustments based on subsequent INR results will be protocol driven. The goal of the COAG trial is to assess the incremental effect of using genetic information on anticoagulation control. The decision to use a laboratory outcome as the primary end point is based on evidence that improved therapeutic INR control is associated with fewer bleeding and thrombotic complications and lower medical costs (13 ). Although the study is not powered to detect differences in bleeding and thrombotic complications between the 2 initial dosing algorithms, these adverse events will be assessed and reported. Other investigators are planning similar prospective studies in the US and abroad, and pooling their results will likely enable an accurate assessment of the impact of pharmacogenetic-based initial dosing of warfarin on serious bleeding and thrombotic events.

Given that the clinical efficacy of pharmacogenetic-based dosing of warfarin lacks compelling evidence from well-designed and adequately powered prospective trials, it would be a leap of faith to assume that pharmacogenetic-based dosing is more effective (i.e., safer) than empiric, clinically based initial dosing of warfarin as currently practiced, or to estimate the costs vs benefits of routine genotyping when starting a patient on warfarin. It will likely be several years until enough information is available to reach a consensus. Until then, premature encouragement or requirement of CYP2C9 and VKORC1 genotyping by regulatory or reimbursement entities could tarnish the future of evidence-based personalized genomic medicine by preventing recruitment for important studies such as the COAG trial.
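Table 1 reports outcomes as percentage time within the therapeutic INR range, and INR control is the kind of laboratory end point discussed above. One widely used way to estimate time in range from intermittent INR tests is Rosendaal linear interpolation, which assumes the INR changes linearly between consecutive measurements; the trials summarized here may have used other methods, which this text does not specify. The Python sketch below uses hypothetical function names and measurements and a default target range of 2–3.

```python
def _fraction_in_range(y0, y1, low, high):
    """Fraction of a straight INR segment from y0 to y1 lying within [low, high]."""
    if y0 == y1:
        return 1.0 if low <= y0 <= high else 0.0
    seg_low, seg_high = min(y0, y1), max(y0, y1)
    overlap = max(0.0, min(seg_high, high) - max(seg_low, low))
    # With linear change, the share of time in range equals the share of the INR span in range.
    return overlap / (seg_high - seg_low)


def percent_time_in_range(days, inrs, low=2.0, high=3.0):
    """Rosendaal-style percentage time in therapeutic range by linear interpolation."""
    if len(days) != len(inrs) or len(days) < 2:
        raise ValueError("need at least two paired (day, INR) measurements")
    total_days = 0.0
    days_in_range = 0.0
    for d0, d1, y0, y1 in zip(days, days[1:], inrs, inrs[1:]):
        interval = d1 - d0
        if interval <= 0:
            raise ValueError("measurement days must be strictly increasing")
        total_days += interval
        days_in_range += interval * _fraction_in_range(y0, y1, low, high)
    return 100.0 * days_in_range / total_days


# Hypothetical INR series: 1.0 on day 0, 3.0 on day 10, 3.0 on day 20.
print(percent_time_in_range([0, 10, 20], [1.0, 3.0, 3.0]))  # 75.0
```

In the example, the INR is assumed to pass from 1.0 to 3.0 over the first 10 days, so half of that interval (5 days) counts as in range, and all 10 days of the second interval do, giving 15 of 20 days.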

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: C.S. Eby, International Society of Laboratory Hematology. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: C.S. Eby, Osmetech. Expert Testimony: None declared.


Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References 1. Gage BF, Eby CS. Pharmacogenetics and anticoagulant therapy. J Thromb Thrombolysis 2003;16:73–8. 2. Gage B, Eby C, Johnson JA, Rieder M, Ridker P, Rettie A, et al. Use of pharmacogenetics and clinical factors to predict the therapeutic dose of warfarin. Clin Pharmacol Ther 2008;84:326–31. 3. Schelleman H, Chen J, Christie C, Newsomb C, Brensinger C, Price M, et al. Dosing algorithms to predict warfarin maintenance dose in Caucasians and African Americans. Clin Pharmacol Ther 2008;84:332–9. 4. Caldwell M, Tarif A, Johnson J, Gage B, Falkowski M, Gardina P, et al. CYP4F2 genetic variant alters required warfarin dose. Blood 2008;111:4106–12. 5. Margaglione M, Brancaccio V, Giuliani N, D'Andrea G, Cappucci G, Iannaccone L, et al. Increased risk for venous thrombosis in carriers of the prothrombin G→A20210 gene variant. Ann Intern Med 1998;129:89–93. 6. Higashi MK, Veenstra DL, Kondo LM, Wittkowsky AK, Srinouanprachanh SL, Farin FM, Rettie AE. Association between CYP2C9 genetic variants and anticoagulation-related outcomes during warfarin therapy. JAMA 2002;287:1690–8. 7. Limdi NA, McGwin G, Goldstein JA, Beasley TM, Arnett DK, Adler BK, et al. Influence of CYP2C9 and VKORC1 1173C/T genotype on the risk of hemorrhagic complications in African-American and European-American patients on warfarin. Clin Pharmacol Ther 2008;83:212–21. 8. Caraco Y, Blotnick S, Muszkat M. CYP2C9 genotype-guided warfarin prescribing enhances the efficacy and safety of anticoagulation: a prospective randomized controlled study. Clin Pharmacol Ther 2008;83:460–70. 9. Anderson JL, Horne BD, Stevens SM, Grove AS, Barton S, Nicholas ZP, et al. Randomized trial of genotype-guided versus standard warfarin dosing in patients initiating oral anticoagulation. Circulation 2007;116:2563–70. 10. Hillman MA, Wilke RA, Yale SH, Vidaillet HJ, Caldwell MD, Glurich I, et al. A prospective, randomized pilot trial of model-based warfarin dose initiation using CYP2C9 genotype and clinical data. Clin Med Res 2005;3:137–45. 11. Flockhart DA, O'Kane D, Williams MS, Watson MS, Flockhart DA, Gage B, et al. Pharmacogenetic testing of CYP2C9 and VKORC1 alleles for warfarin. Genet Med 2008;10:139–50. 12. Ansell J, Hirsh J, Hylek E, Jacobson A, Crowther M, Palareti G. The pharmacology and management of the vitamin K antagonists: the Seventh ACCP Conference on Antithrombotic and Thrombolytic Therapy. Chest 2008;133:160S–98S. 13. Chiquette E, Amato MG, Bussey HI. Comparison of an anticoagulation clinic with usual medical care: anticoagulation control, patient outcomes, and health care cost. Arch Intern Med 1998;158:1641–7.

Clinical Chemistry 55:4 715–722 (2009)

Molecular Diagnostics and Genetics

Presence of Donor-Derived DNA and Cells in the Urine of Sex-Mismatched Hematopoietic Stem Cell Transplant Recipients: Implication for the Transrenal Hypothesis

Emily C.W. Hung,1,2 Tristan K.F. Shing,2 Stephen S.C. Chim,3 Philip C. Yeung,2 Rebecca W.Y. Chan,4 Ki W. Chik,5 Vincent Lee,5 Nancy B.Y. Tsui,1,2 Chi-Kong Li,5 Cesar S.C. Wong,6 Rossa W.K. Chiu,1,2 and Y.M. Dennis Lo1,2*

BACKGROUND: The term “transrenal DNA” was coined in 2000 to signify that DNA in urine may come from the passage of plasma DNA through the kidney barrier. Although DNA in the urine has the potential to provide a completely noninvasive source of nucleic acids for molecular diagnosis, its existence remains controversial.

METHODS: We obtained blood and urine samples from 22 hematopoietic stem cell transplant (HSCT) recipients and used fluorescence in situ hybridization, PCR for short tandem repeats, mass spectrometry, quantitative PCR, and immunofluorescence detection to study donor-derived DNA in the urine.

RESULTS: All HSCT recipients exhibited high amounts of donor-derived DNA in buffy coat and plasma samples. Male donor–derived DNA was detected in urine supernatants from all 5 female sex-mismatched HSCT recipients. Surprisingly, the amount of DNA in urine supernatants did not correlate with the corresponding plasma value. Moreover, cell-free urine supernatants contained DNA fragments >350 bp that were absent in plasma. Donor-derived polymorphs were detected in urine by fluorescence in situ hybridization. Incidentally, donor-derived cytokeratin-producing epithelial cells were discovered in urine samples from 3 of 10 sex-mismatched HSCT recipients as long as 14.2 years after transplantation.

CONCLUSIONS: This report is the first to demonstrate the presence of donor-derived DNA in the urine of HSCT recipients; however, we show that much of this DNA originates from donor-derived cells rather than from the transrenal passage of cell-free plasma DNA. Our discovery of donor-derived cytokeratin-producing epithelial cells has interesting biological and therapeutic implications, e.g., the capacity of marrow stem cells to serve as an extrarenal source for renal tubule regeneration. © 2009 American Association for Clinical Chemistry

1 Li Ka Shing Institute of Health Sciences and Departments of 2 Chemical Pathology, 3 Obstetrics & Gynaecology, 4 Accident & Emergency, 5 Paediatrics, and 6 Clinical Oncology, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China. * Address correspondence to this author at: Department of Chemical Pathology, Rm. 38023, 1/F, Clinical Sciences Bldg., Prince of Wales Hospital, 30-32 Ngan Shing St., Shatin, Hong Kong SAR, China. Fax +852 2636 5090; e-mail [email protected]. Some of these data were presented at the 3rd Trainee Presentation Session of the Hong Kong College of Pathologists, Hong Kong SAR, China, November 24, 2007.

In 2000, Botezatu et al. proposed the “transrenal hypothesis,” that plasma DNA passes through the kidney barrier and enters the urinary tract to produce so-called transrenal DNA (Tr-DNA)7 (1 ). These investigators targeted a Y-chromosomal locus with nested PCR and detected male donor–derived DNA in urine samples from 5 of 9 women who had received blood transfusions from male donors. In addition, tumor-associated DNA markers such as KRAS8 [v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS1)] and RASSF1A [Ras association (RalGDS/AF-6) domain family member 1A (RASSF1)] have been detected in urine samples from patients with pancreatic and colon cancers (1 ) and from breast cancer patients (2 ), respectively. Other researchers have detected fetal-derived DNA in the urine of pregnant women in the first (1, 3 ) and third (3, 4 ) trimesters. Umansky and Tomei (5 ) have reviewed the evidence for the transrenal hypothesis; however, other investigators have challenged it. Su et al. reported finding the KRAS mutation in the urine of colorectal cancer patients but not in plasma (6 ). In addition, 3 independent groups were unable to detect fetal DNA in maternal urine, even in pregnant women whose kidney barrier function was compromised by HELLP (hemolysis, increased liver enzymes, and low platelet count) syndrome (7–9 ).

One possible reason for these negative findings is the low concentration of the target DNA. Fetal DNA is known to constitute only 2.3%–11.4% (mean, 6.2%) of the total circulating DNA in maternal plasma, even in late pregnancy (37–43 weeks of gestation) (10 ). The amount of fetal DNA in urine, if any, is therefore likely to be low and may be difficult to detect. In contrast, we have shown that a median of 59.5% of the DNA in the plasma of hematopoietic stem cell transplant (HSCT) recipients is of donor origin (11 ); thus, the proportion of donor-derived DNA in the plasma of HSCT recipients is about 5- to 25-fold higher than that of fetal DNA in maternal plasma. Given the distinct advantages of urine as a source for nucleic acid testing and the controversies surrounding the transrenal hypothesis, we followed up our previous observation and investigated the occurrence of nonhost DNA in the urine of HSCT patients. In phase 1 of the study, we used a mass spectrometry–based platform to detect, quantify, and compare the amounts of donor-derived DNA in urine and blood samples. In phase 2, we characterized the urinary DNA with respect to its size distribution and cellular origin.

Received July 3, 2008; accepted October 8, 2008. Previously published online at DOI: 10.1373/clinchem.2008.113530

7 Nonstandard abbreviations: Tr-DNA, transrenal DNA; HELLP, hemolysis, increased liver enzymes, and low platelet count; HSCT, hematopoietic stem cell transplant; FISH, fluorescence in situ hybridization; STR, short tandem repeat; CK, cytokeratin; BMSC, bone marrow–derived stem cell.

8 Human genes: KRAS, v-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog; RASSF1A, Ras association (RalGDS/AF-6) domain family member 1A; ZFX, zinc finger protein, X-linked; ZFY, zinc finger protein, Y-linked; LEP, leptin; SRY, sex-determining region Y.

Materials and Methods

PHASE 1

Patient recruitment. We recruited 22 HSCT patients from the Bone Marrow Transplant Clinic of the Department of Paediatrics, Prince of Wales Hospital, The Chinese University of Hong Kong, Shatin, Hong Kong SAR. All patients were in remission with respect to their primary conditions. Informed consent was obtained from patients or their parents. The study was approved by the Clinical Research Ethics Committee of The Chinese University of Hong Kong and was performed in accordance with the Helsinki Declaration.

Sample collection and preparation. Peripheral blood was collected into EDTA-containing blood tubes. We harvested plasma by centrifuging blood samples twice to minimize contamination by blood cells, as previously described (12 ). Fresh urine samples were collected into sterile plain bottles. A urine aliquot was tested with a urinalysis reagent strip (Multistix; Bayer) and assayed for creatinine on a Roche MODULAR analyzer. To inhibit possible nuclease activity, we mixed the sample with 0.5 mol/L EDTA, pH 8.0 (Invitrogen), to a final concentration of 10 mmol/L (1, 13 ). To separate the cell-free and cellular urine components, we centrifuged urine samples at 3000g at 4 °C for 10 min and filtered the supernatant through a 0.45-µm filter (Millex-GV; Millipore) to remove any remaining cells or cell debris. The cell-free urine supernatant was stored at −80 °C until DNA extraction. Pellets of urinary cells obtained after centrifugation were washed twice with 1× PBS (144 mg/L KH2PO4, 9 g/L NaCl, and 795 mg/L Na2HPO4·7H2O, pH 7.4; Invitrogen) and stored at −20 °C until DNA extraction.

Fluorescence in situ hybridization and DNA short tandem repeat analyses for peripheral blood chimerism. Peripheral blood chimerism status was confirmed by fluorescence in situ hybridization (FISH) analysis for sex-mismatched HSCT recipients and by DNA short tandem repeat (STR) analysis for sex-matched HSCT recipients, as recommended by the American Society of Blood and Marrow Transplantation (14 ). Further details are in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4.

DNA isolation. We used the QIAamp Blood Kit (Qiagen) according to the manufacturer's spin protocol for blood and bodily fluids to extract DNA from 800 µL of plasma, 300 µL of buffy coat, and the urinary-cell pellet [resuspended in 200 µL of 1× PBS (Invitrogen)] (10 ). For each 400 µL of fluid sample, we added 40 µL of Qiagen Protease and 400 µL of Qiagen Buffer AL, incubated the mixture at 56 °C for 10 min, and then added 400 µL of cold absolute ethanol. We then transferred the mixture to a QIAamp Spin Column, centrifuged the column at 16 000g for 1 min, and washed the column twice by centrifugation at 16 000g, first with Qiagen Buffer AW1 for 1 min and then with Buffer AW2 for 3 min. The DNA was eluted in 50 µL of deionized water and stored at −20 °C until analysis.
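Two pieces of arithmetic in the sample-processing steps above can be made explicit. The Python sketch below computes (a) how much 0.5 mol/L EDTA stock must be added to a urine sample to reach a final concentration of 10 mmol/L, allowing for the fact that the added stock also increases the volume, and (b) reagent volumes scaled linearly from the per-400-µL QIAamp quantities stated in the text. The function names, dictionary keys, and the 10-mL urine volume in the example are hypothetical; the text does not state the urine volume treated with EDTA.

```python
def stock_volume_to_add(sample_ml, stock_mmol_l, final_mmol_l):
    """Volume of stock (mL) that brings sample_ml to final_mmol_l.

    Solves stock * v / (sample + v) = final for v, so the stock's own
    contribution to the final volume is taken into account.
    """
    if not 0 < final_mmol_l < stock_mmol_l:
        raise ValueError("final concentration must be positive and below the stock")
    return sample_ml * final_mmol_l / (stock_mmol_l - final_mmol_l)


def qiaamp_volumes_ul(sample_ul):
    """Reagent volumes scaled from the stated per-400-uL sample quantities."""
    factor = sample_ul / 400.0
    return {
        "protease": 40.0 * factor,
        "buffer_al": 400.0 * factor,
        "ethanol": 400.0 * factor,
    }


# 0.5 mol/L = 500 mmol/L EDTA stock, 10 mmol/L final, hypothetical 10 mL of urine.
edta_ml = stock_volume_to_add(10.0, 500.0, 10.0)
print(round(edta_ml * 1000))  # 204 (uL of stock)
print(qiaamp_volumes_ul(800))  # volumes for the 800-uL plasma input
```

The simpler C1V1 = C2V2 shortcut (10 mL × 10/500 = 200 µL) slightly underdoses because it ignores the added volume; at a 50-fold dilution the difference is only about 2%.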
To extract DNA from the filtered cell-free urine supernatant, we mixed 15 mL of 6 mol/L guanidine thiocyanate (Sigma–Aldrich) and 1 mL of resin (Wizard Plus Minipreps DNA Purification System; Promega) with 10 mL of the processed urine (15) and incubated the mixture with gentle mixing at ambient temperature for 2 h. The resin–DNA complex was then isolated and washed on minicolumns with the wash buffer provided in the Wizard Plus Minipreps DNA Purification System. The urine DNA was then eluted in 100 µL of deionized water and stored at −20 °C until analysis.

Donor DNA and Cells in HSCT Recipients' Urine

Homogeneous MassEXTEND assay for zinc finger protein homologs. The assay for zinc finger protein genes was designed to quantify the fractional concentration of male DNA in a mixture of male and female DNA. The assay used identical PCR primers to coamplify the genes for zinc finger protein homologs located on the X chromosome (ZFX, zinc finger protein, X-linked) and the Y chromosome (ZFY, zinc finger protein, Y-linked) in a single PCR (amplicon size, 120 bp). The amplicons were then differentiated with the homogeneous MassEXTEND assay (Sequenom) with an extension primer (5′-TCATCTGGGACTGTGCA-3′) designed to anneal at the position adjacent to a single-nucleotide site that differentiates the 2 genes. With a selective combination of terminator nucleotides (i.e., dideoxynucleotides), the extension primer was extended by 1 base for ZFX and by 2 bases for ZFY. The extended products (5′-TCATCTGGGACTGTGCAA-3′ for ZFX and 5′-TCATCTGGGACTGTGCAGT-3′ for ZFY) had distinct masses of 5498.6 and 5818.8 Da, respectively, and were readily resolved by MALDI-TOF mass spectrometry. The ZFX signal represented total DNA, and the ZFY signal represented the male donor-derived DNA in female recipients. The fractional concentration of male DNA in the total-DNA preparation was calculated from the height of the ZFY peak relative to that of the ZFX peak in the mass spectrum. For further details of the PCR, see the online Data Supplement. To calculate the assay CV, we obtained genomic DNA from a healthy male volunteer and prepared artificial mixtures of 0%–100% male DNA by mixing a known concentration of male DNA with female genomic DNA. Measured and expected percentages of male DNA were compared; the mean and SD were calculated for 20 replicates.

PHASE 2

Real-time quantitative PCR for size analysis with LEP and SRY sequences. Real-time quantitative PCR assays were used to quantify the LEP (leptin) and SRY (sex-determining region Y) genes in a 50-µL reaction mixture, as previously described (13). In female patients who received male stem cells, the SRY signal reflected donor-derived DNA, and the LEP signal reflected the total DNA from both the donor and recipient. For each locus, a common forward primer and different reverse primers (Integrated DNA Technologies) were used to generate 3 amplicons of different sizes. To control for urine concentration, we expressed the absolute DNA concentration in urine in genome equivalents per millimole of creatinine. For further details, see the online Data Supplement.

Combined FISH and immunofluorescence detection for the X and Y chromosomes and cytokeratin in urine samples. We analyzed fresh unfractionated urine samples by microscopy according to the Rant–Shepherd method, as previously described (16). Urine samples with 0–15 white blood cells or red blood cells per microscopy field with a ×20 objective were reported as negative for such cells. Only urine samples with negative microscopy results were used for further analysis. See the online Data Supplement for details of the combined FISH and immunofluorescence detection protocol. In brief, Cytospin slides were prepared from urinary-cell pellets, fixed, and counterstained. A CEP X/Y DNA probe (Vysis/Abbott Molecular) was applied to each Cytospin slide according to the manufacturer's instructions. We mounted the slides with 4′,6-diamidino-2-phenylindole supplied in the CEP X/Y DNA Probe Kit and used a fluorescence microscope (Nikon Eclipse E600) to examine the slides for positive signals (orange for the centromere of chromosome X; green for the Yq12 region of chromosome Y and for cytokeratin [CK]). Two independent observers evaluated the stained slides (all areas under ×400 magnification); at least 500 cells were counted for each sample. The numbers of polymorphs and epithelial cells from the recipient and the donor were recorded. As external controls in each run, Cytospin slides were prepared from male and female urine samples with moderate numbers of polymorphs and epithelial cells (cases with inflammation).

STATISTICAL ANALYSES

Statistical tests were performed with SigmaStat 3.0.1A software (SPSS).

Results

HIGH PERCENTAGE OF DONOR DNA IN BUFFY COAT SAMPLES FROM HSCT RECIPIENTS

The characteristics of the HSCT recipients are shown in Table S1 in the online Data Supplement. The 10 sex-matched HSCT recipients and the 12 sex-mismatched HSCT recipients were in complete remission with respect to their hematologic conditions. Analysis of chimerism status (i.e., the presence of lymphohematopoietic cells of nonhost origin [14]) with FISH (17, 18) and DNA STR analyses, as described above, revealed that 21 of the 22 patients had >99% donor lymphohematopoietic cells in the peripheral blood, fulfilling the criterion for full chimerism with complete lymphohematopoietic replacement (14). The percentages of donor DNA in buffy coat samples obtained with the mass spectrometry–based assay for the zinc finger protein genes were completely concordant with the FISH and STR results (see Table S1 in the online Data Supplement). In this mass spectrometry–based assay, the mean ratio of the ZFY signal to the ZFX signal for 20 replicates of male DNA at 50 ng per reaction was 0.46 (SD, 0.02; CV, 4.3%). Evaluation of the proportion of recovered DNA with a series of artificial mixtures containing 0%–100% male DNA showed a high correlation between the measured and expected fractional concentrations of male DNA (r = 0.975; see Fig. S1 in the online Data Supplement).

SEX-MISMATCHED HSCT RECIPIENTS HAD VERY HIGH PERCENTAGES OF DONOR-DERIVED DNA IN PLASMA

We then used the mass spectrometry–based zinc finger protein assay to examine the percentages of male DNA in plasma samples (see Table S2 in the online Data Supplement). In female sex-mismatched HSCT recipients (n = 5), the percentage of male DNA represented the amount of donor-derived DNA, and the mean fractional concentration of male donor-derived DNA in these patients was 79.3%. In male sex-mismatched HSCT recipients (n = 7), the mean proportion of male (i.e., recipient-derived) DNA was only 27.2%; therefore, the mean fractional concentration of donor-derived DNA was 72.8%. All 12 sex-mismatched HSCT recipients had very high contributions of donor-derived DNA (mean fractional concentration, 76.1%).

HIGH PERCENTAGES OF DONOR-DERIVED DNA IN URINE SUPERNATANTS

We then examined urine supernatants for the presence of donor-derived DNA. Sex-matched HSCT recipients were recruited as controls (see Table S2 in the online Data Supplement). Typical mass spectrometry tracings are shown in Fig. S2 in the online Data Supplement. Male sex-mismatched HSCT recipients had a mean fractional male-DNA concentration of 92.3% in urine supernatants (Fig. 1B). Such a high percentage is expected from the lysis of the host's (male) urinary epithelial cells shed into the urine. Interestingly, all 5 female sex-mismatched HSCT recipients had male donor-derived DNA in urine supernatants, with a mean fractional concentration of 38.3% (range, 26%–88.1%; Fig. 1A).

CORRELATIONS BETWEEN THE AMOUNTS OF MALE DNA IN URINE SUPERNATANTS, URINARY-CELL PELLETS, AND PLASMA

We next examined whether the amount of donor-derived male DNA in urine supernatants was correlated with the plasma value. Surprisingly, we found no significant correlation for the 5 female sex-mismatched HSCT recipients (Spearman rank order correlation, r = −0.3; P = 0.683) or for the entire group of sex-mismatched HSCT recipients (r = −0.427; P = 0.178) (Fig. 2A). The fractional concentration of donor-derived male DNA in urine supernatants, however, was significantly correlated with that in urinary-cell pellets (r = 0.682; P = 0.0186; Fig. 2B).
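The correlation analysis above can be sketched in a few lines: convert each measured male-DNA percentage to a donor-derived percentage (equal to the male percentage in female recipients and to 100% minus it in male recipients, as stated for plasma), then compute Spearman's rho as the Pearson correlation of the ranks. This is an illustrative sketch, not the SigmaStat procedure used in the study; the patient values are hypothetical placeholders, and pooling both recipient sexes on the donor-derived scale is an assumption. The sketch returns only r; a library routine such as `scipy.stats.spearmanr` also reports a P value.

```python
"""Spearman rank-order correlation of donor-derived DNA in plasma and urine.

Sketch only: the patient values below are hypothetical placeholders. Male-DNA
percentages are first converted to donor-derived percentages (female
recipient: male DNA is donor-derived; male recipient: donor-derived = 100%
minus male DNA); pooling both sexes this way is an assumption.
"""


def donor_percentage(male_pct: float, recipient_sex: str) -> float:
    """Convert a measured male-DNA percentage into a donor-derived percentage."""
    if recipient_sex == "F":
        return male_pct
    if recipient_sex == "M":
        return 100.0 - male_pct
    raise ValueError("recipient_sex must be 'F' or 'M'")


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of ties starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical records: (recipient sex, male % in plasma, male % in urine).
    patients = [("F", 80.0, 30.0), ("F", 75.0, 88.0), ("M", 20.0, 95.0),
                ("M", 35.0, 90.0), ("F", 82.0, 26.0)]
    plasma = [donor_percentage(p, sex) for sex, p, _ in patients]
    urine = [donor_percentage(u, sex) for sex, _, u in patients]
    print(f"Spearman r = {spearman_r(plasma, urine):.3f}")
```

Because Spearman's method works on ranks, it tolerates the skewed, bounded percentages seen here (for example, the 26%–88.1% urine range in female recipients) better than a Pearson correlation on the raw values would.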

Fig. 1. Fractional concentrations of male DNA in samples of plasma and cell-free urine supernatant from 5 female (A) and 7 male (B) sex-mismatched HSCT recipients.

URINE SUPERNATANTS OF SEX-MISMATCHED HSCT RECIPIENTS CONTAINED DONOR-DERIVED DNA FRAGMENTS >350 bp THAT WERE ABSENT IN PLASMA

In view of the unexpected lack of correlation between plasma and urine supernatant DNA values, we proceeded to phase 2 of the study. Two gene loci were chosen to study the size of DNA fragments in plasma and urine supernatants. SRY signals represent the amount of male donor-derived DNA in female sex-mismatched HSCT recipients, and LEP signals represent both donor- and recipient-derived DNA. Because urine DNA had previously been shown to contain fragments of 150–200 bp (5), we designed primers to produce amplicons shorter than 150 bp to maximize the yield of Tr-DNA (19). We also designed primers that produced amplicons of >200 bp to study the contribution of non–Tr-DNA. We used amplicons of comparable fragment lengths for the 2 loci (63, 107, and 377 bp for SRY and 63, 105, and 356 bp for LEP). Consistent with the results of the assay for the zinc finger protein genes (see Table S2 in the online Data Supplement), all 5 female sex-mismatched HSCT recipients had male donor-derived DNA (SRY) in both the plasma and the urine supernatants (see Table S3 in the online Data Supplement). Interestingly, 4 of these 5 patients had SRY and/or LEP fragments in urine supernatants that were larger than 350 bp. These DNA fragments were absent in the corresponding plasma samples (see Tables S3 and S4 in the online Data Supplement).

FISH EVIDENCE FOR DONOR-DERIVED CELLS IN URINE

To investigate the origin of these longer DNA fragments, we conducted cellular analyses of fresh urine samples obtained from 10 of the 12 sex-mismatched HSCT recipients. We prepared urinary-cell pellets and counted a minimum of 500 cells for each patient (mean, 929 cells; range, 510–1981 cells). Microscopic analysis identified recipient-derived epithelial cells and polymorphs from both donors and recipients in all 10 patients. Intriguingly, we noted that a small proportion of the donor cells in some samples had a rounded nucleus and had assumed the morphology of urinary epithelial cells.

Fig. 2. Relationships of fractional concentrations of male DNA in urine and plasma. (A), Cell-free urine supernatant and plasma. (B), Cell-free urine supernatant and urinary-cell pellets. No correlation was found in (A), but a positive correlation (r = 0.682, Spearman rank order correlation; P = 0.0186) was demonstrated in (B).

DONOR-DERIVED CELLS IN URINE CARRIED EPITHELIAL SIGNATURES

To characterize the phenotype of these epithelial-like cells, we carried out combined FISH and immunofluorescence detection with CK as an epithelial marker. Male and female urine samples were included as controls to establish probe efficiency. Seventy-five percent of the male cells positive for 4′,6-diamidino-2-phenylindole in these control samples had positive signals for chromosomes X and Y, and 80% of the female cells had 2 chromosome X signals. No chromosome Y signal was detected in the female urine samples. For immunofluorescence detection of CK in control samples, we compared the number of CK-positive cells with that obtained by Papanicolaou staining, a technique commonly used in conventional cytology. Seventy-two percent of the Papanicolaou-stained epithelial cells were CK positive with this protocol. Our probe efficiencies (75% for the X and Y signals in male cells, 80% for the 2 X signals in female cells, and 72% for CK) compared favorably with those reported in studies that used similar techniques (20, 21). All 10 sex-mismatched HSCT recipients had recipient-derived CK-positive epithelial cells and donor-derived CK-negative polymorphs in their urine (Table 1). Remarkably, 3 patients had donor-derived CK-positive epithelial cells in their urine (Fig. 3). These donor-derived cells constituted 1.3% (patient 5), 0.4% (patient 9), and 0.4% (patient 16) of all the epithelial cells in these patients (Table 1). Patients 5 and 16 had received bone marrow transplants, and patient 9 had received a peripheral blood stem cell transplant. At the time of urine collection, patients 5, 16, and 9 had received their transplants 3.7, 14.2, and 2.3 years before, respectively.

Table 1. Combined FISH and immunofluorescence detection of fresh urine samples from sex-mismatched HSCT recipients (n = 10).

| Patient No. | Polymorphs from donor, % | Epithelial cells from donor, % | Time from HSCT to urine analysis, years |
|---|---|---|---|
| 5 | 79.1 | 1.3 | 3.7 |
| 6 | 91.0 | 0.0 | 14.7 |
| 7 | 9.1 | 0.0 | 16.2 |
| 8 | 77.8 | 0.0 | 5.0 |
| 9 | 9.8 | 0.4 | 2.3 |
| 10 | 34.6 | 0.0 | 5.1 |
| 11 | 53.2 | 0.0 | 2.1 |
| 12 | 60.0 | 0.0 | 7.3 |
| 15 | 36.2 | 0.0 | 4.1 |
| 16 | 60.2 | 0.4 | 14.2 |
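The percentages in Table 1 come from classifying each scored cell twice: by origin, from its sex-chromosome signals in a sex-mismatched graft, and by lineage, from CK staining. The sketch below tallies hypothetical per-cell records that way. Three points are assumptions not stated in the text: nuclei showing neither a clean XY nor a clean XX pattern are skipped (probe efficiency was 75%–80%), every CK-negative cell is treated as a polymorph, and the names `Cell`, `cell_sex`, and `donor_percentages` are illustrative only.

```python
"""Tally donor- and recipient-derived cells from combined FISH/CK scoring.

Sketch only: per-cell records are hypothetical. Origin comes from the sex
chromosomes (XY = male, XX = female) in a sex-mismatched transplant, and
lineage from CK staining (CK-positive = epithelial). Skipping unscorable
nuclei and treating all CK-negative cells as polymorphs are assumptions.
"""
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class Cell(NamedTuple):
    x_count: int       # orange CEP X signals in the nucleus
    y_count: int       # green Yq12 signals in the nucleus
    ck_positive: bool  # cytokeratin immunofluorescence


def cell_sex(cell: Cell) -> Optional[str]:
    """Classify a nucleus as male (XY) or female (XX); None if not scorable."""
    if (cell.x_count, cell.y_count) == (1, 1):
        return "M"
    if (cell.x_count, cell.y_count) == (2, 0):
        return "F"
    return None  # assumption: incomplete or ambiguous signal patterns are skipped


def donor_percentages(cells: Iterable[Cell], recipient_sex: str) -> dict:
    """Percent donor-derived cells among polymorphs and among epithelial cells."""
    counts = Counter()
    for cell in cells:
        sex = cell_sex(cell)
        if sex is None:
            continue
        origin = "recipient" if sex == recipient_sex else "donor"
        lineage = "epithelial" if cell.ck_positive else "polymorph"
        counts[(lineage, origin)] += 1
    result = {}
    for lineage in ("polymorph", "epithelial"):
        total = counts[(lineage, "donor")] + counts[(lineage, "recipient")]
        result[lineage] = 100.0 * counts[(lineage, "donor")] / total if total else 0.0
    return result


if __name__ == "__main__":
    # Hypothetical male recipient of a female graft (the situation of patient 5).
    cells = ([Cell(1, 1, True)] * 99 + [Cell(2, 0, True)]
             + [Cell(2, 0, False)] * 3 + [Cell(1, 1, False)] + [Cell(1, 0, True)])
    print(donor_percentages(cells, "M"))  # {'polymorph': 75.0, 'epithelial': 1.0}
```

With one donor-derived CK-positive cell among 100 scorable epithelial cells, the epithelial column reads 1.0%. This shows how small the absolute counts behind the 0.4% and 1.3% values in Table 1 are likely to be, given that at least 500 cells were counted per sample.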

Fig. 3. Combined FISH and immunofluorescence detection of CK and chromosomes X and Y in a fresh urine sample from patient 5, a male patient who had received an HSCT from a female donor. A donor-derived CK-positive cell with an epithelial morphology (ED) was seen. Indicated are staining of the nucleus by 4′,6-diamidino-2-phenylindole (blue), the X chromosome (orange), and the Y chromosome and CK (green). Original magnification, ×400. PR, recipient-derived polymorph; PD, donor-derived polymorph.

Discussion

In this study, we have demonstrated the presence of donor-derived DNA in urine samples from sex-mismatched HSCT recipients. Our data highlight the important contribution of donor-derived cells to the DNA in urine supernatants. Further characterization of these cells led to the discovery of an unexpected population of donor-derived cells carrying epithelial signatures.

Recent reports that favor the transrenal hypothesis have generally based their conclusions on the urinary detection of a target previously known to be present in plasma. By that criterion, the detection of donor-derived DNA in the urine of the HSCT recipients we studied would have lent support to the transrenal hypothesis. In the present investigation, however, we applied more stringent criteria and hypothesized that if the transrenal hypothesis were correct, then (a) the amount of donor-derived DNA in the urine supernatants would correlate with that in the plasma and (b) the sizes of donor-derived DNA fragments in the urine would be limited by the physical properties of the kidney barrier and would not be larger than their hypothetical sources in the plasma. Our data show that neither of these inferences holds. First, our MALDI-TOF mass spectrometry data show that the amounts of donor-derived DNA in urine supernatants do not correlate with those in plasma. Second, our size analysis shows that urine supernatants contain long DNA fragments that are absent in the plasma. Furthermore, we detected donor-derived cells in the urine by in situ hybridization.

In this HSCT model, the amount of donor-derived DNA in urine supernatants could be affected by several factors: (a) the number of donor-derived cells present in the urine; (b) the amount of cell-free recipient DNA from urinary cells dying in situ or in the urine; and (c) the amount of donor-derived DNA in plasma, the hypothetical source of Tr-DNA. The presence of variable percentages of donor-derived white blood cells in the urine of these HSCT recipients could explain not only the lack of correlation between donor-derived cell-free DNA in urine and plasma but also the existence of long donor-derived DNA fragments in urine supernatants. Although the results of the present study neither support nor contradict the Tr-DNA concept, they do suggest that the positive evidence for Tr-DNA in the current literature may be confounded by the presence of nonhost cells in the urine. Without a detailed analysis of the cellular contribution to urine DNA, previous reports on Tr-DNA should be interpreted with caution. It is hoped that future model systems will allow the phenomena of cellular and cell-free DNA transfer into the urine to be explored separately.

During the search for donor-derived urinary DNA, we discovered donor-derived cytokeratin-positive epithelial cells in the urine of HSCT recipients. To our knowledge, this in vivo finding has not been described in the current literature. The observation of these cells in 3 of 10 sex-mismatched HSCT recipients suggests that this line of differentiation was not a sporadic occurrence but possibly a common developmental path taken by bone marrow–derived stem cells (BMSCs). The results of studies with animal models and in vitro experiments have suggested the potential of BMSCs in kidney regeneration (22–27). BMSCs tagged with green fluorescent protein were recruited into the glomerular mesangium in both physiological (28) and pathologic (23) states. In addition to mesangial cells, Kale et al. showed that murine BMSCs could differentiate into proximal tubular cells in ischemic kidneys (25). Human mesenchymal stem cells have been shown to generate metanephroi and to express podocyte- and tubular epithelial cell–specific genes in rodent whole-embryo culture (29). Human in vivo data regarding the role of extrarenal stem cells in kidney regeneration are largely limited to the detection of recipient-derived, Y chromosome–positive renal tubular cells in female allografts transplanted into male recipients (22, 30, 31). Cytokeratin was used as a tubular cell marker by Gupta et al. (30). These reports provide indirect evidence that BMSCs may be a source of extrarenal stem cells in humans. Our study provides the first direct evidence that cytokeratin-producing epithelial cells can be derived extrarenally from marrow or peripheral blood stem cells. Although donor-derived cytokeratin-producing epithelial cells were detected at a modest contribution of 0.4%–1.3% in 3 of 10 patients, our data show unequivocally that this line of differentiation is possible.
Moreover, the detection of these cells in HSCT recipients as long as 14.2 years after transplantation suggests that the contribution of kidney epithelial cells from hematopoietic stem cells is stable, durable, and continual. This phenomenon has interesting biological and therapeutic implications, particularly for patients with chronic kidney diseases, for whom renal-replacement therapy is far from a cure. A larger cohort of sex-mismatched HSCT patients will be needed to confirm this observation. Examination of renal biopsies from sex-mismatched HSCT patients will help elucidate the actual contribution of extrarenal stem cells to the maintenance of renal tubular architecture. Further understanding of the mechanisms involved in the generation of these epithelial cells from extrarenal sources may advance our knowledge in regenerative medicine.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: Y.M.D. Lo, Sequenom. Stock Ownership: Y.M.D. Lo, Sequenom. Honoraria: None declared. Research Funding: Earmarked Research Grant (CUHK4436/06M) from the Hong Kong Research Grants Council. Y.M.D. Lo, Sequenom. Expert Testimony: None declared. Other: Y.M.D. Lo, S.S.C. Chim, N.B.Y. Tsui, and R.W.K. Chiu hold patents or have filed patent applications on aspects of circulating nucleic acid– based diagnostics. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript. Acknowledgments: The authors thank E.S. Lo for her kind assistance in the slide examination.

References

1. Botezatu I, Serdyuk O, Potapova G, Shelepov V, Alechina R, Molyaka Y, et al. Genetic analysis of DNA excreted in urine: a new approach for detecting specific genomic DNA sequences from cells dying in an organism. Clin Chem 2000;46:1078–84.
2. Bryzgunova OE, Skvortsova TE, Kolesnikova EV, Starikov AV, Rykova EY, Vlassov VV, Laktionov PP. Isolation and comparative study of cell-free nucleic acids from human urine. Ann N Y Acad Sci 2006;1075:334–40.
3. Al-Yatama MK, Mustafa AS, Ali S, Abraham S, Khan Z, Khaja N. Detection of Y chromosome-specific DNA in the plasma and urine of pregnant women using nested polymerase chain reaction. Prenat Diagn 2001;21:399–402.
4. Majer S, Bauer M, Magnet E, Strele A, Giegerl E, Eder M, et al. Maternal urine for prenatal diagnosis--an analysis of cell-free fetal DNA in maternal urine and plasma in the third trimester. Prenat Diagn 2007;27:1219–23.
5. Umansky SR, Tomei LD. Transrenal DNA testing: progress and perspectives. Expert Rev Mol Diagn 2006;6:153–63.
6. Su YH, Wang M, Aiamkitsumrit B, Brenner DE, Block TM. Detection of a K-ras mutation in urine of patients with colorectal cancer. Cancer Biomark 2005;1:177–82.
7. Zhong XY, Hahn D, Troeger C, Klemm A, Stein G, Thomson P, et al. Cell-free DNA in urine: a marker for kidney graft rejection, but not for prenatal diagnosis? Ann N Y Acad Sci 2001;945:250–7.

8. Li Y, Zhong XY, Kang A, Troeger C, Holzgreve W, Hahn S. Inability to detect cell free fetal DNA in the urine of normal pregnant women nor in those affected by preeclampsia associated HELLP syndrome. J Soc Gynecol Investig 2003;10:503–8.
9. Illanes S, Denbow ML, Smith RP, Overton TG, Soothill PW, Finning K. Detection of cell-free fetal DNA in maternal urine. Prenat Diagn 2006;26:1216–8.
10. Lo YMD, Tein MSC, Lau TK, Haines CJ, Leung TN, Poon PMK, et al. Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. Am J Hum Genet 1998;62:768–75.
11. Lui YYN, Chik KW, Chiu RWK, Ho CY, Lam CWK, Lo YMD. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin Chem 2002;48:421–7.
12. Chiu RWK, Poon LLM, Lau TK, Leung TN, Wong EMC, Lo YMD. Effects of blood-processing protocols on fetal and total DNA quantification in maternal plasma. Clin Chem 2001;47:1607–13.
13. Milde A, Haas-Rochholz H, Kaatsch HJ. Improved DNA typing of human urine by adding EDTA. Int J Legal Med 1999;112:209–10.
14. Antin JH, Childs R, Filipovich AH, Giralt S, Mackinnon S, Spitzer T, Weisdorf D. Establishment of complete and mixed donor chimerism after allogeneic lymphohematopoietic transplantation: recommendations from a workshop at the 2001 Tandem Meetings of the International Bone Marrow Transplant Registry and the American Society of Blood and Marrow Transplantation. Biol Blood Marrow Transplant 2001;7:473–85.
15. Su YH, Wang M, Brenner DE, Ng A, Melkonyan H, Umansky S, et al. Human urine contains small, 150 to 250 nucleotide-sized, soluble DNA derived from the circulation and may be useful in the detection of colorectal cancer. J Mol Diagn 2004;6:101–7.
16. Shepherd M. A revision of the microtitre tray method for urine microscopy. PHLS Microbiol Dig 1997;14:236–7.
17. Tsang KS, Li CK, Chik KW, Wong AP, Lau TT, Li K, et al. Up-regulation of cell growth associated with an extra Y chromosome in a child with beta-thalassemia major having undergone hematopoietic stem cell transplant. J Pediatr Hematol Oncol 2000;22:133–6.
18. Lee V, Cheng PS, Chik KW, Wong GW, Shing MM, Li CK. Autoimmune hypothyroidism after unrelated haematopoietic stem cell transplantation in children. J Pediatr Hematol Oncol 2006;28:293–5.
19. Chan KCA, Zhang J, Hui ABY, Wong N, Lau TK, Leung TN, et al. Size distributions of maternal and fetal DNA in maternal plasma. Clin Chem 2004;50:88–92.
20. Ng IO, Chan KL, Shek WH, Lee JM, Fong DY, Lo CM, Fan ST. High frequency of chimerism in transplanted livers. Hepatology 2003;38:989–98.
21. Meignin V, Soulier J, Brau F, Lemann M, Gluckman E, Janin A, Socie G. Little evidence of donor-derived epithelial cells in early digestive acute graft-versus-host disease. Blood 2004;103:360–2.
22. Poulsom R, Forbes SJ, Hodivala-Dilke K, Ryan E, Wyles S, Navaratnarasah S, et al. Bone marrow contributes to renal parenchymal turnover and regeneration. J Pathol 2001;195:229–35.
23. Ito T, Suzuki A, Imai E, Okabe M, Hori M. Bone marrow is a reservoir of repopulating mesangial cells during glomerular remodeling. J Am Soc Nephrol 2001;12:2625–35.
24. Cornacchia F, Fornoni A, Plati AR, Thomas A, Wang Y, Inverardi L, et al. Glomerulosclerosis is transmitted by bone marrow-derived mesangial cell progenitors. J Clin Invest 2001;108:1649–56.

25. Kale S, Karihaloo A, Clark PR, Kashgarian M, Krause DS, Cantley LG. Bone marrow stem cells contribute to repair of the ischemically injured renal tubule. J Clin Invest 2003;112:42–9.
26. Masuya M, Drake CJ, Fleming PA, Reilly CM, Zeng H, Hill WD, et al. Hematopoietic origin of glomerular mesangial cells. Blood 2003;101:2215–8.
27. Guo JK, Schedl A, Krause DS. Bone marrow transplantation can attenuate the progression of mesangial sclerosis. Stem Cells 2006;24:406–15.
28. Imasawa T, Utsunomiya Y, Kawamura T, Zhong Y, Nagasawa R, Okabe M, et al. The potential of bone marrow-derived cells to differentiate to glomerular mesangial cells. J Am Soc Nephrol 2001;12:1401–9.
29. Yokoo T, Ohashi T, Shen JS, Sakurai K, Miyazaki Y, Utsunomiya Y, et al. Human mesenchymal stem cells in rodent whole-embryo culture are reprogrammed to contribute to kidney tissues. Proc Natl Acad Sci U S A 2005;102:3296–300.
30. Gupta S, Verfaillie C, Chmielewski D, Kim Y, Rosenberg ME. A role for extrarenal cells in the regeneration following acute renal failure. Kidney Int 2002;62:1285–90.
31. Mengel M, Jonigk D, Marwedel M, Kleeberger W, Bredt M, Bock O, et al. Tubular chimerism occurs regularly in renal allografts and is not correlated to outcome. J Am Soc Nephrol 2004;15:978–86.

Clinical Chemistry 55:4 723–729 (2009)

Molecular Diagnostics and Genetics

Optimization of Transrenal DNA Analysis: Detection of Fetal DNA in Maternal Urine

Eugene M. Shekhtman,1 Kalpana Anne,1 Hovsep S. Melkonyan,1 David J. Robbins,1 Steven L. Warsof,2 and Samuil R. Umansky1*

BACKGROUND: Fragments of DNA from cells dying throughout the body are detectable in urine (transrenal DNA, or Tr-DNA). Our goal was to optimize Tr-DNA isolation and detection techniques, using as a model the analysis of fetal DNA in maternal urine.

METHODS: We isolated urinary DNA using a traditional silica-based method and a new technique based on adsorption of cell-free nucleic acids onto Q-Sepharose resin. Y chromosome–specific SRY (sex-determining region Y) sequences in the urine of pregnant women were detected by conventional and real-time PCR using primer/probe sets designed for 25-, 39-, 65-, and 88-bp PCR targets.

RESULTS: The method of DNA isolation and the PCR target size affected fetal Tr-DNA detection. Assay diagnostic sensitivity increased as the PCR target was shortened. Shorter DNA fragments (50–150 bp) could be isolated by the Q-resin–based technique, which also facilitated fetal Tr-DNA analysis. Using DNA isolated by the Q-resin–based method and an “ultrashort” DNA target, we detected SRY sequences in 78 of 82 urine samples from women pregnant with male fetuses (positive predictive value, 87.6%). Eleven of 91 urine samples from women pregnant with female fetuses produced false-positive SRY results (negative predictive value, 95.2%).
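The predictive values quoted in the abstract follow from a 2 × 2 table that can be reconstructed from the stated counts: 78 true positives and 4 false negatives among 82 male-fetus samples, and 11 false positives and 80 true negatives among 91 female-fetus samples. The sketch below recomputes them, treating male fetal sex as the condition being detected; the class name `TwoByTwo` is illustrative, and sensitivity and specificity are derived here from the same counts rather than quoted from the abstract.

```python
"""Diagnostic metrics for the SRY urine test from the counts in the abstract.

Sketch only: the 2x2 table is reconstructed from the abstract (78 of 82
male-fetus samples positive; 11 of 91 female-fetus samples falsely positive),
with male fetal sex treated as the condition being detected.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # male fetus, SRY detected
    fn: int  # male fetus, SRY not detected
    fp: int  # female fetus, SRY detected
    tn: int  # female fetus, SRY not detected

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def ppv(self) -> float:
        return self.tp / (self.tp + self.fp)

    @property
    def npv(self) -> float:
        return self.tn / (self.tn + self.fn)


if __name__ == "__main__":
    table = TwoByTwo(tp=78, fn=82 - 78, fp=11, tn=91 - 11)
    for name in ("sensitivity", "specificity", "ppv", "npv"):
        print(f"{name}: {100 * getattr(table, name):.1f}%")
```

Unlike sensitivity and specificity, PPV and NPV depend on the proportion of male-fetus pregnancies in the cohort (82 of 173 here). They would therefore shift in a population with a different sex ratio or a different mix of samples.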

CONCLUSIONS: Single-copy fetal DNA sequences can be successfully detected in the urine of pregnant women when adequate methods for DNA isolation and analysis are applied. Strong precautions against sample contamination with male cells and DNA are necessary to avoid false-positive results.

© 2008 American Association for Clinical Chemistry

1 Xenomics Inc., Monmouth Junction, NJ; 2 Eastern Virginia Medical School, Norfolk, VA.
* Address correspondence to this author at: Xenomics Inc., 1 Deer Park Dr., Suite F, Monmouth Junction, NJ 08852. Fax 732-438-8299; e-mail sumansky@xenomics.com.
Received June 19, 2008; accepted October 14, 2008.
Previously published online at DOI: 10.1373/clinchem.2008.113050
3 Nonstandard abbreviations: ccfDNA, circulating cell-free DNA; Tr-DNA, transrenal DNA; GuSCN, guanidine isothiocyanate; UNG, uracil DNA N-glycosylase; iso-dC, 5′-methylisocytosine; iso-dGTP, 2′-deoxy-isoG triphosphate; dNTP, deoxynucleoside triphosphate; dATP, deoxyadenosine triphosphate; dCTP, deoxycytidine triphosphate; dGTP, deoxyguanosine triphosphate; dUTP, deoxyuridine triphosphate; MGB, minor groove binder; PPV, positive predictive value; NPV, negative predictive value.
4 Human genes: DYZ1, human Y-chromosome-specific repeated DNA family; SRY, sex-determining region, Y-linked; TSPY, testis-specific protein, Y-linked.

The discovery (1–3) of cell-free DNA in the bloodstream (so-called circulating cell-free DNA, or ccfDNA) has led to intensive studies of its potential diagnostic application in tumor detection and monitoring (4–6), prenatal diagnostics (7–9), and monitoring of trauma (10) and stroke (11). The half-life of ccfDNA is approximately 15 min (12), and a portion of these circulating DNA fragments has been found to cross the kidney barrier and appear in the urine (transrenal DNA, or Tr-DNA) (13, 14). The existence of Tr-DNA was proven by detection of Y chromosome–specific DNA sequences in the urine of women with male fetuses (13), mutant K-ras in the urine of patients with colorectal (13, 15, 16) or pancreatic (13) cancers bearing the same mutation, sequences of Mycobacterium tuberculosis in the urine of patients with pulmonary tuberculosis (17), and Y chromosome–specific sequences in the urine of female recipients of blood transfusions from male donors (13). In all of these studies, DNA was isolated from urine by silica-based methods. The isolated DNA can be resolved into 2 fractions by gel electrophoresis (13, 15). The high-molecular-weight fraction represents DNA from shed cells, whereas low-molecular-weight DNA fragments (150–200 bp) contain most of the Tr-DNA (15). The amount of specific Tr-DNA sequences isolated from urine may be low, owing to kidney filtering, nuclease activity, loss during purification, and other factors. In the studies cited above, this problem was overcome by the use of highly sensitive methods, such as nested PCR. Additionally, the assays of fetal DNA in maternal urine used the highly repeated DYZ1 (human Y-chromosome-specific repeated DNA family) sequences to facilitate detection. Later efforts by several groups to reproduce the original findings using single-copy SRY (sex-determining region, Y-linked) or TSPY (testis-specific protein, Y-linked; about 10 copies/cell) sequences and conventional or real-time PCR resulted in low detection rates or failed entirely (18–21).

We applied 2 approaches to improve the diagnostic sensitivity of Tr-DNA analysis, create a reliable test for prenatal determination of fetal sex, and further develop the basic principles of Tr-DNA technology in general. First, we used a new technique for the purification of Tr-DNA based on the binding of cell-free urinary nucleic acids or nucleoproteins to a Q-Sepharose anion-exchange resin, followed by elution with LiCl. This procedure effectively concentrates and partially purifies DNA, disrupting DNA–protein bonds during elution. Electrophoresis of nucleic acids isolated by this method revealed the presence in urine, in addition to the fractions described above, of even shorter DNA fragments ranging from 50 to 150 bp (22). This finding indicated that focusing detection efforts on very short DNA sequences could increase the diagnostic sensitivity of Tr-DNA assays. Second, we developed several “ultrashort” target PCR techniques and used them for Tr-DNA analysis. Here we describe the results obtained using a combination of these 2 approaches for fetal sex determination by PCR amplification of Y chromosome–specific SRY sequences from maternal urinary DNA.

Materials and Methods

URINE SAMPLE COLLECTION

We collected 173 urine samples from pregnant women at Eastern Virginia Medical School in an institutional review board–approved study conducted according to federally regulated guidelines for research involving human subjects. These women were not instructed in the precautions needed during sample collection to avoid potential contamination with male cells or DNA. However, before sample collection a brief survey was administered to exclude women who had had intercourse during the preceding 24 h. Fetal sex was determined by ultrasound examination and subsequently verified after delivery; 91 and 82 women were confirmed to have been pregnant with female and male fetuses, respectively. At the time of urine sample collection, 15 women were in the first trimester (8 male, 7 female fetuses), 124 in the second trimester (61 male, 63 female fetuses), and 34 in the third trimester (13 male, 21 female fetuses). A limited number of urine samples were also collected from local pregnant volunteers who had been given special instructions on avoiding male contact. All study participants signed informed consent documents. The urine samples (routinely 50–100 mL) were supplemented with EDTA-Na2 to a final concentration of 10 mmol/L, divided into aliquots, and stored at −80 °C before being shipped on dry ice for analysis. Received samples were archived at −80 °C for up to 2 years.

DNA ISOLATION

Urine samples were thawed to room temperature and mixed by gentle inversion. Two protocols were used for urinary DNA purification.

Q-resin–based method. We placed 10 mL of urine in a 50-mL conical centrifuge tube and diluted it with 10 mL of nuclease-free water (Ambion). We added 200 µL of Q-Sepharose resin slurry (GE Healthcare) to each sample, and the tubes were rotated at room temperature for 30 min on a rolling drum. The Sepharose resin was pelleted by centrifugation at 1800g for 5 min at ambient temperature, and the supernatant was removed by vacuum aspiration. We resuspended the pelleted resin in 1 mL of 0.3 mol/L LiCl/10 mmol/L sodium acetate (pH 5.5) and transferred it to a Bio-Rad microspin column. The resuspension/washing buffer was removed by centrifugation at 800g for 1 min at ambient temperature. The resin was washed with an additional 2.0 mL of 0.3 mol/L LiCl/10 mmol/L sodium acetate (pH 5.5) by centrifugation at 800g for 2 min at ambient temperature. Tr-DNA was eluted from the Q-Sepharose resin with 670 µL of 2 mol/L LiCl/10 mmol/L sodium acetate (pH 5.5) by centrifugation at 800g for 3 min at ambient temperature. The eluate was added to 2.0 mL of 95% ethanol and gently mixed. This mixture was applied onto a Qiagen QIAquick column by centrifugation at 800g for 1 min. The column was washed with 1 mL of 2 mol/L LiCl in 70% ethanol (800g, 1 min) and then with 1 mL of 75 mmol/L potassium acetate (pH 5.0) in 80% ethanol (800g, 1 min). The column was dried by centrifugation in a microfuge (approximately 20 000g) for 3 min. DNA was eluted with 106 µL of elution buffer (EB; Qiagen) by centrifugation for 2 min in a microfuge (approximately 20 000g) and stored at −20 °C.

Silica-based method. We added 10 mL of 6 mol/L guanidine isothiocyanate (GuSCN) (Ambion) to 10 mL of urine in a 50-mL conical centrifuge tube.
We then added 200 µL of Wizard resin slurry (Promega) to each sample, and the tubes were rotated at room temperature for 1 h on a rolling drum. We attached the required number of Wizard minicolumns with adapters to a vacuum manifold, each fitted with a 20-mL syringe barrel with the plunger removed. We pipetted the urine/resin mixture into the 20-mL syringe barrel. The barrel/column was washed with 10 mL of 3 mol/L GuSCN followed by 10 mL of 80% isopropanol (Sigma-Aldrich). The syringe barrel was removed and discarded. The minicolumn was washed with 200 µL of 80% isopropanol and then with 280 µL of 95% ethanol. The minicolumn was removed from the vacuum manifold and dried by centrifugation for 3 min in a microfuge (approximately 20 000g). Tr-DNA was eluted with 106 µL of hot (preheated to 60 °C) nuclease-free water (Ambion) by centrifugation for 2 min in a microfuge (approximately 20 000g) and stored at −20 °C.

Optimization of Fetal Transrenal DNA Analysis

Table 1. Sets of primers and probes.a

Primer/probe | Function | Sequence

PCR target size 30 bp, PCR product size 54 bp
PD-SRY30-F | Forward primer | 5′-ATCGAGCAGCTCCGAAAGCCACACACTC-3′
PD-SRY30-R | Reverse primer | 5′-ATCGCAGACGCTGGTGCTCCATTCTTG-3′

PCR target size 25 bp, PCR product size 61 bp
ES0575 | First-stage forward primer | 5′-CCTCAGCCATC[iso-dC]GTCTGTCCACCGTGAGCTTGATGGCTGAGGTCGTGTGGTC-3′
ES0576 | Reverse primer | 5′-ACGAGACTCGCCTCTGATCG-3′
ES0538 | Second-stage forward primer | 5′-CTGTCCACCGTGAGCTTGATG-3′
SRY-65MGBFAM | TaqMan MGB probe | FAM-5′-TCGTGTGGTCTCGCG-3′-MGB-BHQ1

PCR target size 39 bp, PCR product size 67 bp
ES0581 | First-stage forward primer | 5′-GGTCAGCCATC[iso-dC]GTCTGTCCACCGTGAGCTTGATGGCTGACCCATTCATCGTGTGGTCT-3′
ES0536 | Reverse primer | 5′-CCATCTTGCGCCTCTGAT-3′
ES0538 | Second-stage forward primer | 5′-CTGTCCACCGTGAGCTTGATG-3′
SRY-65MGBFAM | TaqMan MGB probe | FAM-5′-TCGTGTGGTCTCGCG-3′-MGB-BHQ1

PCR target and PCR product size 65 bp
SRY 583U23 | Forward primer | 5′-ATAGAGTGAAGCGACCCATGAAC-3′
SRY 630L18 | Reverse primer | 5′-AGCCATCTTGCGCCTCTG-3′
SRY-65MGBFAM | TaqMan MGB probe | FAM-5′-TCGTGTGGTCTCGCG-3′-MGB-BHQ1

PCR target and PCR product size 88 bp
SRY-111f | Forward primer | 5′-TTTGGATAGTAAAATAAGTTTCGAAC-3′
SRYn-111r | Reverse primer | 5′-CAGAAGCATATGATTGCATTGTC-3′
SRY-111fam | TaqMan probe | FAM-5′-CTGGCACCTTTCAATTTTGTCGCACT-3′-BHQ1

a Some of the primers have 5′-end extension sequences (tails) that are not complementary to the targets, so the PCR products are longer than the respective targets.

CONVENTIONAL PCR

We used primers PD-SRY30-F and PD-SRY30-R (Table 1) in conventional PCR assays. After a 10-min uracil DNA N-glycosylase (UNG) treatment, the reactions were carried out for 40 cycles in 25-µL volumes in the presence of 300 nmol/L each primer, 3 mmol/L MgCl2, 0.05 U/µL JumpStart Taq DNA Polymerase (Sigma), 10 mmol/L Tris-HCl (pH 8.3), 50 mmol/L KCl, and 200 µmol/L of each dNTP. Each cycle consisted of a 30-s denaturation at 95 °C, 10-s annealing at 60 °C, and 10-s extension at 72 °C. Reaction products were subjected to electrophoresis in 10% polyacrylamide gel and stained with SYBR Gold (Invitrogen) according to the manufacturer's instructions. In addition to the SRY band, these primers also generate a 100-bp band with both male and female DNA. Sequencing of this band revealed that the 100-bp product originates from human chromosome 2 (GenBank HS2 22329). Because it was produced consistently for all samples, including those lacking male genomic DNA, we used this band as an internal control for DNA isolation and the PCR assay. Samples were classified according to the following rules:

1. Samples that generated a 50-bp product (30-bp SRY target) were flagged as male fetuses.
2. Samples lacking the 50-bp product but generating the 100-bp product were flagged as female fetuses.
3. Samples lacking both the 50-bp and 100-bp products were flagged as invalid, owing to PCR inhibitors or failure of the DNA isolation or amplification steps.

Fig. 1. Two-stage real-time PCR assay. First-stage forward primer P1 included a low melting temperature (Tm) target-recognition sequence, a high-Tm stem-forming part, an elongation blocker (iso-dC, denoted by V), and a sequence identical to the second-stage forward primer P3 (thick arrow). T, original target DNA fragment; P2, reverse primer; IP, intermediate product, which is produced in stage 1 and serves as template in stage 2; Pr, TaqMan probe, dual-labeled with fluorophore F and quencher Q.

REAL-TIME PCR

We designed and optimized real-time PCR assays for 4 targets of differing lengths within the SRY gene (Table 1). Of these, the assays for the longer 65- and 88-bp targets were standard TaqMan dual-labeled probe real-time PCR assays. For the shorter 25- and 39-bp targets, which are too short to be used directly in a standard TaqMan assay, we designed a novel 2-stage, single-tube, real-time PCR method. In the first stage of this assay, the forward primer carries a special 5′ tail (Fig. 1), resulting in an intermediate product that is significantly longer than the original target sequence. The intermediate product is long enough to serve as the template for TaqMan real-time PCR in the second stage. The special 5′ tail of the first-stage forward primer includes (a) a sequence homologous to the second-stage forward primer, (b) a stem-forming sequence that reduces the primer's affinity for the intermediate product in favor of the second-stage forward primer during the second stage, and (c) a 5-methylisocytosine (iso-dC) base, which, in the absence of 2′-deoxy-isoG triphosphate (iso-dGTP), blocks nascent-strand elongation, so that the intermediate product lacks the stem structure. The switch to stage 2 was triggered by raising the annealing/extension temperature after a preset number of amplification cycles, which effectively excludes the first-stage primer from the reaction.

Each real-time PCR assay was carried out in 25 µL of PCR buffer [10 mmol/L Tris-HCl (pH 8.3), 50 mmol/L KCl] containing 0.05 U/µL JumpStart Taq DNA Polymerase (Sigma), 200 µmol/L each of deoxyadenosine triphosphate (dATP), deoxycytidine triphosphate (dCTP), and deoxyguanosine triphosphate (dGTP), and 400 µmol/L deoxyuridine triphosphate (dUTP), following a 10-min treatment with 0.01 U/µL UNG. Other reaction conditions were optimized for each set of oligonucleotides and therefore differ between sets. The reactions for the 25- and 39-bp targets contained 1.5 mmol/L MgCl2, 200 nmol/L first-stage forward primer, 700 nmol/L second-stage forward primer, 700 nmol/L reverse primer, and 100 nmol/L (25-bp target) or 200 nmol/L (39-bp target) TaqMan minor groove binder (MGB) probe. The 65-bp assay contained 3 mmol/L MgCl2, 300 nmol/L each of forward and reverse primer, and 250 nmol/L TaqMan MGB probe. The 88-bp assay contained 2 mmol/L MgCl2, 500 nmol/L each of forward and reverse primer, and 200 nmol/L TaqMan probe. TaqMan MGB probes were obtained from Applied Biosystems, and all other oligonucleotides from Integrated DNA Technologies Inc. The sequences of all oligonucleotides are listed in Table 1.
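The three tail elements listed above can be traced in the Table 1 sequences. The sketch below is our reading of those sequences, not a procedure from the study: it assumes that the segment 5′ of the iso-dC blocker is one arm of the stem, looks for its reverse complement further downstream, and treats whatever lies 3′ of that partner as the target-recognition segment. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an unmodified DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def dissect_first_stage_primer(primer: str, second_stage: str) -> dict:
    """Split a first-stage forward primer into the parts sketched in Fig. 1.

    Assumption (not stated in the paper): the bases 5' of the iso-dC blocker
    pair with their reverse complement downstream to form the stem, and the
    bases 3' of that partner recognize the SRY target.
    """
    five_arm, rest = primer.split("[iso-dC]")
    p3_start = rest.find(second_stage)          # copy of the second-stage primer
    if p3_start < 0:
        raise ValueError("second-stage primer sequence not found")
    partner = revcomp(five_arm)                 # downstream stem arm
    s = rest.find(partner)
    if s < 0:
        raise ValueError("no reverse-complement stem partner found")
    return {
        "stem_5prime_arm": five_arm,
        "p3_copy_start": p3_start,
        "stem_3prime_arm": partner,
        "target_recognition": rest[s + len(partner):],
    }


ES0538 = "CTGTCCACCGTGAGCTTGATG"
ES0575 = "CCTCAGCCATC[iso-dC]GTCTGTCCACCGTGAGCTTGATGGCTGAGGTCGTGTGGTC"
ES0581 = "GGTCAGCCATC[iso-dC]GTCTGTCCACCGTGAGCTTGATGGCTGACCCATTCATCGTGTGGTCT"

for name, seq in [("ES0575", ES0575), ("ES0581", ES0581)]:
    parts = dissect_first_stage_primer(seq, ES0538)
    print(name, parts["stem_5prime_arm"], parts["stem_3prime_arm"],
          parts["target_recognition"])
```

On this reading, both primers carry an 11-nt stem arm upstream of iso-dC, a copy of ES0538 starting 2 nt after the blocker, and 3′ recognition segments of 10 nt (ES0575, 25-bp assay) and 17 nt (ES0581, 39-bp assay). The downstream stem arm overlaps the last 4 nt (GATG) of the ES0538 copy.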
We used 1, 10, 100, and 1000 genome equivalents of human male DNA (Promega) to construct the calibration curves. The PCRs for 25- and 39-bp targets were carried out for an initial 10 cycles (stage 1), followed by 40 cycles of stage 2. Stage 1 cycles consisted of 30-s denaturation at 95 °C and 60-s annealing/extension at 47 °C (for 25-bp target) or 53 °C (for 39-bp target). Stage 2 cycles consisted of 30-s denaturation at 95 °C and 60-s annealing/extension at 57 °C (for 25-bp target) or 63 °C (for 39-bp target). Fluorescence was measured during stage 2 at the end of each cycle. The 65- and 88-bp assays were run for 40 cycles, consisting of 30-s denaturation at 95 °C and 60-s annealing/extension at 66 °C (for 65-bp target) or 60 °C (for 88-bp target). Fluorescence was measured at the end of each cycle.
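The cycling conditions above can be written out as an explicit per-cycle program, which makes the stage switch visible. This is a minimal sketch, not an instrument file: the `Cycle` record and `cycling_program` function are hypothetical, and the 10-min UNG pretreatment is left out because its temperature is not given in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cycle:
    stage: int
    denature_c: int          # denaturation temperature, °C
    denature_s: int          # denaturation time, s
    anneal_ext_c: int        # combined annealing/extension temperature, °C
    anneal_ext_s: int        # annealing/extension time, s
    read_fluorescence: bool  # fluorescence read at the end of the cycle


# (stage 1, stage 2) annealing/extension temperatures of the 2-stage assays
TWO_STAGE_C = {25: (47, 57), 39: (53, 63)}
# annealing/extension temperatures of the standard TaqMan assays
ONE_STAGE_C = {65: 66, 88: 60}


def cycling_program(target_bp: int) -> list:
    """Expand the Methods cycling conditions into one Cycle per PCR cycle."""
    if target_bp in TWO_STAGE_C:
        t1, t2 = TWO_STAGE_C[target_bp]
        stage1 = [Cycle(1, 95, 30, t1, 60, False)] * 10  # no reads in stage 1
        stage2 = [Cycle(2, 95, 30, t2, 60, True)] * 40   # read every cycle
        return stage1 + stage2
    if target_bp in ONE_STAGE_C:
        return [Cycle(1, 95, 30, ONE_STAGE_C[target_bp], 60, True)] * 40
    raise ValueError(f"no assay defined for a {target_bp}-bp target")


prog = cycling_program(25)
print(len(prog), sum(c.read_fluorescence for c in prog))  # 50 40
```

In both 2-stage assays the switch raises the annealing/extension temperature by 10 °C (47 to 57 °C and 53 to 63 °C), which is the mechanism the text describes for excluding the low-Tm first-stage primer.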


Fig. 2. Detection of fetal SRY sequences by conventional PCR in DNA isolated from the urine of pregnant women by 2 methods. Women 1 through 10 were pregnant with male fetuses, women 11 through 15 with female fetuses. ♂, human male genomic DNA at 1000, 100, 10, and 1 genome equivalents per reaction (ge/rxn) (Promega); ♀, 1000 ge/rxn human female genomic DNA (Promega); NTC, no-template control; M, low-molecular-weight DNA ladder (New England Biolabs). (A) and (B), DNA isolated using the Q-resin–based method; (C), DNA isolated using the silica-based method. The arrows point to the band corresponding to the SRY PCR product.

STATISTICAL ANALYSIS

Statistical analyses were performed using the SYSTAT 12 (Systat Software Inc.) and Excel 2003 (Microsoft Corp.) software packages for Windows. To characterize the utility of our method, we calculated its diagnostic sensitivity and specificity, as well as its positive and negative predictive values (PPV and NPV) (23). We estimated the overall accuracy of the method by fitting a binomial distribution to the data and comparing the P values with the chosen significance level.

Results

In the first set of experiments, we compared the detection of fetal DNA sequences in DNA isolated from maternal urine by the silica-based technique and by the new Q-resin–based technique. Primers PD-SRY30-F and PD-SRY30-R, designed to amplify a 30-bp target in the SRY gene, were used to detect this Y chromosome–specific sequence in the urine of women pregnant with male (10 samples) or female (5 samples) fetuses. Fig. 2 shows that male fetal DNA was detected in all 10 specimens processed by the Q-resin method but in only 7 of the 10 DNA preparations purified by the silica method. All urine samples from pregnancies with female fetuses yielded negative results.

Using 4 sets of primers and probes that amplified 25-, 39-, 65-, and 88-bp sequences within the SRY gene, we performed real-time PCR on DNA isolated by the 2 techniques, based on silica or Q-resin adsorption, from the same urine specimens of women pregnant with male fetuses. Fig. 3 demonstrates that both the DNA isolation method and the PCR target size are important. First, the detection rate decreases as the PCR target length increases: increasing the target size from 25 to 39 bp decreased the detection rate, and fetal DNA was undetectable in all samples with the 88-bp PCR target. Second, isolating DNA fragments shorter than 150 bp with the Q-resin–based technique increased the diagnostic sensitivity (the number of correctly detected male fetuses) when the 25- and 39-bp sequences were analyzed. In addition, the number of SRY copies detected with primers for the 25-bp target was higher in DNA isolated from the same samples by the Q-resin–based method than by the silica-based method: 196.5 (range, 17.1–451.8) vs 26.3 (range, 0–46.9) genome equivalents/mL urine. The 2 DNA isolation methods gave similar results with the 65-bp PCR target. The real-time PCR experiments with primers and probes for each target length were performed twice, with 100% concordance between runs.
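Copy numbers such as those above are read against the calibration curves built from 1, 10, 100, and 1000 genome equivalents of male DNA. The sketch below shows one conventional way to do this (a least-squares fit of Ct against log10 input, then inversion); the paper does not state how its software computed copy numbers. The Ct values are synthetic, and the per-mL conversion assumes that the full 106-µL eluate from 10 mL of urine is recovered and that a hypothetical `template_ul` of eluate goes into each reaction, a volume the text does not give.

```python
import math


def fit_standard_curve(copies, ct):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x = [math.log10(c) for c in copies]
    n = len(x)
    mx, my = sum(x) / n, sum(ct) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, ct))
    slope = sxy / sxx
    return slope, my - slope * mx


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E = 1 (100%)."""
    return 10 ** (-1 / slope) - 1


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate genome equivalents per reaction."""
    return 10 ** ((ct - intercept) / slope)


def genome_eq_per_ml(copies_per_rxn, template_ul, eluate_ul=106.0, urine_ml=10.0):
    """Scale per-reaction copies to the whole eluate, then to 1 mL of urine.

    Assumes complete eluate recovery; template_ul is a hypothetical input.
    """
    return copies_per_rxn * (eluate_ul / template_ul) / urine_ml


# Synthetic, ideal calibration data (not from the study): doubling every cycle.
standards = [1, 10, 100, 1000]
ideal_slope = -1 / math.log10(2)          # about -3.32 cycles per tenfold dilution
cts = [38.0 + ideal_slope * math.log10(c) for c in standards]
slope, intercept = fit_standard_curve(standards, cts)
print(round(amplification_efficiency(slope), 3))  # 1.0
```

With real data the fitted slope and efficiency indicate whether the short-target assays amplify efficiently enough for the copy-number comparison between isolation methods to be meaningful.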

Fig. 3. Effect of urinary DNA isolation technique and of PCR target size on the efficiency of fetal DNA detection in the urine of 10 pregnant women carrying male fetuses. ▲, Q-resin–based DNA isolation; ○, silica-based DNA isolation.

Finally, to evaluate the diagnostic sensitivity and specificity of fetal sex determination by analysis of Tr-DNA in maternal urine, we analyzed urine samples obtained from 173 pregnant women. The DNA was isolated using the Q-resin–based method and analyzed by conventional PCR for the presence of the 30-bp SRY target. As Table 2 shows, SRY sequences were detected in Tr-DNA from 78 of 82 women pregnant with male fetuses (diagnostic sensitivity 95.1%), whereas 11 of 91 urine samples from women pregnant with female fetuses gave false-positive results (specificity 87.9%); the PPV and NPV were 87.6% and 95.2%, respectively. The false positives were most likely the result of contamination of the urine samples with male DNA (see below). To evaluate the reproducibility of these results, we repeated the purification and conventional PCR analysis for a randomly selected set of 15 of the 173 urine samples; the results were identical to those of the initial experiments. Thus, we correctly determined fetal sex in 158 of 173 cases, an overall success rate of 91.3%. After further statistical analysis of this result, we estimate at the 99% confidence level that the test accuracy is 85%–96%. Surprisingly, for the 15 women in the first trimester (7–12 weeks of pregnancy; 8 male and 7 female fetuses), the test was 100% sensitive and 100% specific. Because of the limited number of first-trimester samples, however, the difference between the first and the other trimesters is not statistically significant.

Table 2. Fetal sex detection in the urine of pregnant women.a

Detected sex | Actual sex: male | Actual sex: female
Male | 78 | 11
Female | 4 | 80

a Y chromosome–specific SRY sequences were detected by conventional PCR. PPV, 87.6%; NPV, 95.2%; sensitivity, 95.1%; specificity, 87.9%.

Discussion


The data presented here provide new information on the properties of Tr-DNA, in particular the characteristics of fetal Tr-DNA in maternal urine. First, the higher detection rate of the fetal sequence in DNA purified with Q-resin than in DNA isolated by the silica method demonstrates that DNA fragments 50 to 150 bp long contain fetal Tr-DNA. This difference is seen only with the 25- and 39-bp PCR targets. Thus, larger fragments of fetal Tr-DNA, detectable by real-time PCR directed at the 65-bp target, belong to the DNA fractions isolated by both methods, most likely to the 150- to 200-bp DNA fragments. Second, although the shortest DNA fragments isolated by the silica method are about 150 bp long, the ability to detect fetal sequences in them depends on PCR target length over the range of 25–88 bp. Recently, Chan et al. (24) reached the same conclusion by comparing median Epstein-Barr virus DNA concentrations, measured with 59- and 76-bp amplicons, in the urine of nasopharyngeal carcinoma patients. The most plausible explanation for these results is the presence of single-strand breaks in the 150- to 200-bp fragments of Tr-DNA, which make the amplifiable targets substantially shorter. The frequency of such single-strand breaks likely depends on the activity of urinary nucleases and on the time between DNA secretion into urine and urine collection, but our data are insufficient to draw conclusions. It is also unknown whether ccfDNA in the bloodstream contains such single-strand breaks and very short (50- to <150-bp) DNA fragments, or whether these are formed after its secretion into the urine. The PPV of fetal sex determination by urinary SRY detection was 87.6%. It should be taken into account that testing was performed about 2 years after sample collection; a higher PPV might be achieved with freshly collected specimens.
Analysis of urine samples from 10 local pregnant volunteers gave a PPV of 100% (data not shown). Every primer/probe set for the SRY gene used in this study was 100% specific when tested on purified male or female human genomic DNA. However, SRY sequences were detected in 12.1% (11 of 91) of the urine samples obtained from pregnant women carrying female fetuses. Male contact with the study participants was not rigorously controlled, which may explain the presence of these SRY sequences in the samples. Our data demonstrate that fetal DNA can be detected in the urine of pregnant women if adequate methods for DNA isolation and analysis are applied. The diagnostic sensitivity of this sex determination test is similar to that of plasma DNA–based tests (25–27). However, urinary DNA is more susceptible to contamination than plasma DNA, so effective precautions must be taken.
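The performance figures quoted in the Results and in this Discussion (sensitivity, specificity, PPV, NPV, the 91.3% overall success rate, and the 12.1% false-positive fraction) all follow from the counts in Table 2. The minimal sketch below reproduces them. `call_sample` is a hypothetical helper that encodes the three gel-calling rules from the Conventional PCR section, assuming band presence has already been scored as true/false. For the interval it uses the exact Clopper–Pearson method, which is one common choice; the paper does not say which method produced its 85%–96% estimate, so the bounds from this sketch may differ slightly.

```python
from math import comb


def call_sample(sry_band_50bp: bool, control_band_100bp: bool) -> str:
    """Three calling rules for the conventional PCR gel."""
    if sry_band_50bp:
        return "male"      # rule 1: SRY product present
    if control_band_100bp:
        return "female"    # rule 2: only the chromosome 2 control band
    return "invalid"       # rule 3: no product at all


def diagnostic_metrics(tp, fp, fn, tn):
    """'Positive' means SRY detected, i.e., a male call."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, target, increasing, iters=60):
    """Find p in [0, 1] where the monotone function f crosses target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(k, n, conf=0.99):
    """Exact two-sided binomial confidence interval for k successes in n."""
    a = (1 - conf) / 2
    lower = 0.0 if k == 0 else _bisect(lambda p: 1 - binom_cdf(k - 1, n, p), a, True)
    upper = 1.0 if k == n else _bisect(lambda p: binom_cdf(k, n, p), a, False)
    return lower, upper


# Table 2 counts: tp = male called male, fp = female called male, and so on.
m = diagnostic_metrics(tp=78, fp=11, fn=4, tn=80)
print({k: round(100 * v, 1) for k, v in m.items()})
# {'sensitivity': 95.1, 'specificity': 87.9, 'ppv': 87.6, 'npv': 95.2, 'accuracy': 91.3}
print(clopper_pearson(158, 173, conf=0.99))
```

The false-positive fraction discussed above is simply 1 − specificity (11/91, or 12.1%), which is why contamination of female-fetus samples lowers the PPV without affecting sensitivity.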

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:

Employment or Leadership: E.M. Shekhtman, Xenomics Inc.; K. Anne, Xenomics Inc.; H.S. Melkonyan, Xenomics Inc.; D.J. Robbins, Xenomics Inc.; S.R. Umansky, Xenomics Inc.
Consultant or Advisory Role: S.L. Warsof, Xenomics Inc.
Stock Ownership: E.M. Shekhtman, Xenomics Inc.; H.S. Melkonyan, Xenomics Inc.; S.R. Umansky, Xenomics Inc.
Honoraria: None declared.
Research Funding: None declared.
Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript.

References

1. Chen XQ, Stroun M, Magnenat J-L, Nicod LP, Kurt A-M, Lyautey J, et al. Microsatellite alterations in plasma DNA of small cell lung cancer patients. Nat Med 1996;2:1033–5.
2. Nawroz H, Koch W, Anker P, Stroun M, Sidransky D. Microsatellite alterations in serum DNA of head and neck cancer patients. Nat Med 1996;2:1035–7.
3. Lo YM, Corbetta N, Chamberlain PF, Rai V, Sargent IL, Redman CW, Wainscoat JS. Presence of fetal DNA in maternal plasma and serum. Lancet 1997;350:485–7.
4. Wong IH, Lo YM, Zhang J, Liew CT, Ng MH, Wong N, et al. Detection of aberrant p16 methylation in the plasma and serum of liver cancer patients. Cancer Res 1999;59:71–3.
5. Anker P, Mulcahy H, Chen XQ, Stroun M. Detection of circulating tumour DNA in the blood (plasma/serum) of cancer patients. Cancer Metastasis Rev 1999;18:65–73.
6. O'Driscoll L. Extracellular nucleic acids and their potential as diagnostic, prognostic and predictive biomarkers. Anticancer Res 2007;27:1257–65.
7. Chiu RW, Lo YM. The biology and diagnostic applications of fetal DNA and RNA in maternal plasma. Curr Top Dev Biol 2004;61:81–111.
8. Lo YM, Chiu RW. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidies by maternal plasma nucleic acid analysis. Clin Chem 2008;54:461–6.
9. Bianchi DW. Circulating fetal DNA: its origin and diagnostic potential: a review. Placenta 2004;25(Suppl A):S93–101.
10. Campello YV, Ikuta N, Brondani RA, Lunge VR, Fett SR, Kazantzi FAS, et al. Role of plasma DNA as a predictive marker of fatal outcome following severe head injury in males. J Neurotrauma 2007;24:1172–81.

11. Rainer TH, Wong LK, Lam W, Yuen E, Lam NY, Metreweli C, Lo YM. Prognostic use of circulating plasma nucleic acid concentrations in patients with acute stroke. Clin Chem 2003;49:562–9.
12. Lo YM, Zhang J, Leung TN, Lau TK, Chang AM, Hjelm NM. Rapid clearance of fetal DNA from maternal plasma. Am J Hum Genet 1999;64:218–24.
13. Botezatu I, Serdyuk O, Potapova G, Alechina R, Arsenin S, Melkonyan H, et al. Genetic analysis of DNA excreted in urine: a new approach for detecting specific genomic DNA sequences from cells dying in an organism. Clin Chem 2000;46:1078–84.
14. Umansky SR, Tomei LD. Transrenal DNA testing: progress and perspectives. Expert Rev Mol Diagn 2006;6:153–63.
15. Su Y-H, Wang M, Brenner DE, Ng A, Melkonyan H, Umansky SR, Syngal S, Block TM. Human urine contains small, 150–250 nucleotide sized, soluble DNA derived from the circulation and may be useful in the detection of colorectal cancer. J Mol Diagn 2004;6:101–7.
16. Su YH, Wang M, Aiamkitsumrit B, Brenner DE, Block TM. Detection of a K-ras mutation in urine of patients with colorectal cancer. Cancer Biomarkers 2005;1:177–82.
17. Cannas A, Goletti D, Girardi E, Chiacchio T, Calvo L, Cuzzi G, et al. Mycobacterium tuberculosis DNA detection in soluble fraction of urine from pulmonary tuberculosis patients. Int J Tuberc Lung Dis 2008;12:146–51.
18. Al-Yatama MK, Mustafa AS, Ali S, Abraham S, Khan Z, Khaja N. Detection of Y chromosome-specific DNA in the plasma and urine of pregnant women using nested polymerase chain reaction. Prenat Diagn 2001;21:399–402.
19. Li Y, Zhong XY, Kang A, Troeger C, Holzgreve W, Hahn S. Inability to detect cell free fetal DNA in the urine of normal pregnant women nor in those affected by preeclampsia associated HELLP syndrome. J Soc Gynecol Investig 2003;10:503–8.
20. Koide K, Sekizawa A, Iwasaki M, Matsuoka R, Honma S, Farina A, et al. Fragmentation of cell-free fetal DNA in plasma and urine of pregnant women. Prenat Diagn 2005;25:604–7.
21. Majer S, Bauer M, Magnet E, Strele A, Giegerl E, Eder M, et al. Maternal urine for prenatal diagnosis: an analysis of cell-free fetal DNA in maternal urine and plasma in the third trimester. Prenat Diagn 2007;27:1219–23.
22. Melkonyan HS, Feaver WJ, Meyer E, Scheinker V, Shekhtman EM, Xin Z, Umansky SR. Transrenal nucleic acids from proof-of-principle to clinical tests: problems and solutions. Ann N Y Acad Sci 2008;1137:73–81.
23. Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ 1994;309:102.
24. Chan KC, Leung SF, Yeung SW, Chan AT, Lo YM. Quantitative analysis of the transrenal excretion of circulating EBV DNA in nasopharyngeal carcinoma patients. Clin Cancer Res 2008;14:4809–13.
25. Johnson KL, Dukes KA, Vidaver J, LeShane ES, Ramirez I, Weber WD, et al. Interlaboratory comparison of fetal male DNA detection from common maternal plasma samples by real-time PCR. Clin Chem 2004;50:516–21.
26. Wagner J, Dzijan S, Pavan-Jukić D, Wagner J, Lauc G. Analysis of multiple loci can increase reliability of detection of fetal Y-chromosome DNA in maternal plasma. Prenat Diagn 2008;28:412–6.
27. Picchiassi E, Coata G, Fanetti A, Centra M, Pennacchi L, Di Renzo GC. The best approach for early prediction of fetal gender by using free fetal DNA from maternal plasma. Prenat Diagn 2008;28:525–30.


Clinical Chemistry 55:4 730–738 (2009)

Molecular Diagnostics and Genetics

Profile of the Circulating DNA in Apparently Healthy Individuals

Julia Beck,1 Howard B. Urnovitz,1 Joachim Riggert,2 Mario Clerici,3,4 and Ekkehard Schütz1*

BACKGROUND: Circulating nucleic acids (CNAs) have been shown to have diagnostic utility in human diseases. The aim of this study was to sequence and organize CNAs to document typical profiles of circulating DNA in apparently healthy individuals.

METHODS: Serum DNA from 51 apparently healthy humans was extracted, amplified, sequenced via pyrosequencing (454 Life Sciences/Roche Diagnostics), and categorized by (a) origin (human vs xenogeneic), (b) functionality (repeats, genes, coding or noncoding), and (c) chromosomal localization. CNA results were compared with genomic DNA controls (n = 4) that were subjected to the identical procedure.

RESULTS: We obtained 4.5 × 10⁵ sequences (7.5 × 10⁷ nucleotides), of which 87% were attributable to known database sequences. Of these sequences, 97% were genomic and 3% were xenogeneic. CNAs and genomic DNA did not differ with respect to sequences attributable to repeats, genes, RNA, and protein-coding DNA sequences. CNAs tended to have a higher proportion of short interspersed nuclear element sequences (P = 0.1), among which the excess of Alu sequences was significant (P < 0.01). CNAs had a significantly lower proportion of L1 and L2 long interspersed nuclear element sequences (P < 0.01). In addition, hepatitis B virus (HBV) genotype F sequences were found in an individual who had been inadvertently included as a healthy control.

CONCLUSIONS: Comparison of CNAs with genomic DNA suggests that nonspecific DNA release is not the sole origin of CNAs. The CNA profiling of healthy individuals described here, together with the detailed biometric analysis, provides the basis for future studies of patients with specific diseases. Furthermore, the detection of a previously unknown HBV infection suggests that this method can uncover occult infections.

© 2009 American Association for Clinical Chemistry

1 Chronix Biomedical GmbH, Goettingen, Germany; 2 Department of Transfusion Medicine, University of Goettingen, Goettingen, Germany; 3 Laboratory of Molecular Medicine and Biotechnology, Don C. Gnocchi ONLUS Foundation IRCCS, Milan, Italy; 4 Department of Biomedical Sciences and Technologies, University of Milan, Milan, Italy.
* Address correspondence to this author at: Chronix Biomedical, Goetheallee 8, 37073 Goettingen, Germany. Fax +49 551 37075726; e-mail esc@chronixbiomedical.de.
Received July 10, 2008; accepted January 7, 2009. Previously published online at DOI: 10.1373/clinchem.2008.113597
5 Nonstandard abbreviations: CNA, circulating nucleic acid; HBV, hepatitis B virus; WGA, whole-genome amplification; BLAST, Basic Local Alignment Search Tool; NCBI, National Center for Biotechnology Information; UTR, untranslated region; CDS, protein-coding DNA sequence.

Nucleic acids have been detected in the plasma, serum, and urine of healthy and diseased humans and animals (1). Both DNA and RNA can be isolated from serum and plasma and are commonly referred to as circulating nucleic acids (CNAs)⁵. Early work concentrated on detecting quantitative differences in circulating DNA between samples from patients with disease and samples from healthy individuals (2–4). The general diagnostic value of simple quantitative measures of circulating DNA is controversial (5–8). Further work on CNAs as diagnostic markers for neoplasia included the detection of qualitative rather than quantitative differences, such as specific oncogene mutations (9, 10), loss of heterozygosity (11–13), specific Alu amplicons (14), and methylation patterns (15) found in plasma or serum, and matching them with DNA characteristics of the primary tumors (16). Although most of the data in the literature on possible diagnostic uses of CNAs were derived from studies of cancer patients, increases in circulating DNA have also been reported for other conditions, including trauma (17), stroke (18), Gulf War–related illnesses (19), autoimmune diseases such as systemic lupus erythematosus (20), and diabetes mellitus (4). In addition, fetal CNAs extracted from maternal plasma have served as markers in prenatal diagnostics (21), and fetal-DNA abnormalities have been linked to pregnancy-associated disorders (22, 23).

The cellular origin of the circulating DNA found in healthy individuals and the precise mechanisms by which DNA enters the bloodstream are unknown. An early report found a correlation between plasma DNA concentrations and known markers of cell death in lung cancer patients, suggesting that at least a portion of the DNA in serum and plasma originates from apoptotic cells (24). This hypothesis is supported by data indicating that most of the circulating DNA in the plasma of sex-mismatched bone marrow transplant patients is of hematopoietic origin (25). Alternatively, active cellular release of newly synthesized DNA has been suggested (26–28). A complete analysis of the genomic sequences in circulating DNA, especially from healthy individuals, is not currently available; recently, analyses were reported for 556 clones of plasma DNA obtained from healthy individuals (29). Massively parallel sequencing technologies, such as the 454 Life Sciences/Roche Diagnostics GS FLX system, can generate 100 megabases of sequence information in a single experiment. For the first time, we have applied this high-throughput sequencing technology to generate an unbiased profile of circulating DNA in healthy individuals, based on an unprecedented amount of sequence information.

Materials and Methods

STUDY PARTICIPANTS

We obtained serum samples from 51 apparently healthy individuals (27 female and 24 male) between 18 and 64 years of age from the Department of Transfusion Medicine of the Georg-August University of Göttingen (n = 37) and from the Don Gnocchi Foundation IRCCS repositories (n = 14). Donor samples were excess serum from blood drawn for required serologic diagnostics in accordance with European Union Directive 98/79/EC. IRCCS samples were from apparently healthy volunteers. All donors provided written informed consent, and all samples were anonymized. A previously undiagnosed HBV infection was found in one of the male volunteers (IRCCS); sequences obtained from this sample were therefore excluded from the subsequent analysis of the 50 apparently healthy individuals.

SAMPLING

Serum samples were collected and stored at −80 °C until further processing. Frozen serum was thawed at 4 °C, and cell debris was removed by brief centrifugation at 4000g for 20 min. Total nucleic acids were extracted from 200 µL of the supernatant with the High Pure Viral Nucleic Acid Kit (Roche Applied Science) according to the manufacturer's instructions. We also collected EDTA-anticoagulated samples of whole blood from a subgroup of the volunteers (2 females, 2 males) and extracted genomic DNA with standard protocols.

GENERATION OF RANDOM DNA LIBRARIES

We used the GenomePlex® Single Cell Whole Genome Amplification Kit (Sigma–Aldrich) according to the manufacturer's instructions to amplify DNA from 1 µL of the nucleic acid solution extracted from serum. We amplified comparable amounts (0.1 ng) of genomic DNA with the same procedure. Figs. 1 and 2 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4 present the size distribution and the effect of whole-genome amplification (WGA). The amplified DNA preparations were sequenced directly with a GS FLX genome sequencer (454 Life Sciences/Roche Diagnostics) according to the manufacturer's instructions. Adapter and primer sequences were trimmed from the raw reads.

SEQUENCE-ANALYSIS PIPELINE

We conducted local-alignment analyses with the BLAST program (Basic Local Alignment Search Tool) using highly stringent parameters to investigate the origins of circulating DNA molecules (30). To detect and mask repetitive elements, we used a local installation of the RepeatMasker software package (Institute for Systems Biology) (31), which uses Repbase (version 12.09; Genetic Information Research Institute) (32). After masking repetitive elements and regions of low sequence complexity, we conducted sequential BLAST analyses for each sequence by querying databases of bacterial, viral, and fungal genomes, as well as the human genome (reference genome build 36.2). Bacterial, viral, fungal, and human genomes were obtained from the National Center for Biotechnology Information (NCBI) (ftp://ftp.ncbi.nih.gov). After each of the sequential database searches, we masked all parts of a queried sequence that produced significant hits (E value < 0.0001) and used the masked sequence to query the next database. To quantify unidentified nucleotides, we subtracted the count of masked nucleotides from the total nucleotide count. For each query fragment and each database search, we recorded in an SQL database the highest-scoring BLAST hit whose length exceeded 50% of the query sequence. The highest-scoring hit was defined as the hit with the maximum product of hit length and percent identity, i.e., the longest hit with the highest identity. For each sequence, we recorded the start and stop positions for query and hit, together with the description of the matching subject. Repeat annotations and their lengths were recorded from the output of the RepeatMasker software.

Clinical Chemistry 55:4 (2009) 731
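The sequential search-and-mask logic described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Hit` record and the `SearchFn` callables (hypothetical stand-ins for BLAST queries against each database) are invented for the example, and reads are assumed to arrive with RepeatMasker-masked bases already replaced by `N`. The E-value cutoff, the >50% length filter, and the length × identity score are taken from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Tuple

E_VALUE_CUTOFF = 1e-4  # significance threshold stated in the text


@dataclass(frozen=True)
class Hit:
    """One local alignment of a query read against a database subject (hypothetical record)."""
    subject: str      # description of the matching subject sequence
    q_start: int      # 0-based start on the query (inclusive)
    q_end: int        # 0-based end on the query (exclusive)
    identity: float   # fraction of identical positions, 0..1
    evalue: float

    @property
    def length(self) -> int:
        return self.q_end - self.q_start


# Hypothetical stand-in for a BLAST query: takes a (partially masked) read, returns alignments.
SearchFn = Callable[[str], List[Hit]]


def best_hit(hits: Sequence[Hit], query_len: int) -> Optional[Hit]:
    """Highest-scoring significant hit covering > 50% of the query,
    scored as hit length x identity."""
    eligible = [h for h in hits
                if h.evalue < E_VALUE_CUTOFF and h.length > 0.5 * query_len]
    return max(eligible, key=lambda h: h.length * h.identity, default=None)


def mask(seq: str, hits: Sequence[Hit]) -> str:
    """Replace every query position covered by a significant hit with 'N'."""
    chars = list(seq)
    for h in hits:
        if h.evalue < E_VALUE_CUTOFF:
            for i in range(h.q_start, h.q_end):
                chars[i] = "N"
    return "".join(chars)


def sequential_search(read: str, databases: Sequence[Tuple[str, SearchFn]]
                      ) -> Tuple[Dict[str, Optional[Hit]], int]:
    """Query databases in order, masking significant hits between rounds.
    Returns the best hit per database and the number of unidentified bases
    (total length minus masked positions, including any pre-masked repeats)."""
    best: Dict[str, Optional[Hit]] = {}
    current = read
    for name, search in databases:
        hits = search(current)
        best[name] = best_hit(hits, len(read))
        current = mask(current, hits)
    unidentified = len(read) - current.count("N")
    return best, unidentified
```

Database order matters in this design: because significant hits are masked before the next query, a segment claimed by an earlier database is not offered to later ones, and a significant hit that is too short to be recorded as a best hit is still masked.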

Table 1. Detailed definition of the analyzed genomic features.

Functional genomic feature | Detailed description
Gene | Sequence annotated as a gene in the seq_gene file obtained from NCBI
Pseudogene | Sequence annotated as a pseudogene in the seq_gene file obtained from NCBI
RNA | All parts of a gene transcribed into RNA (according to NCBI annotation)
CDS | All parts of a gene/RNA translated into protein (according to NCBI annotation)
UTR | Parts of a gene that are transcribed into mRNA but not translated into protein (according to NCBI annotation)
Intergenic sequence | All sequences not annotated as gene or pseudogene
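Under the Table 1 definitions, assigning hit nucleotides to features reduces to interval overlap against the annotation. A minimal sketch, assuming 0-based half-open coordinates on a single contig and an annotation already parsed into a dictionary of feature name to intervals; parsing seq_gene.md is not shown, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open [start, end)


def merge(intervals: Iterable[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals so no base is counted twice."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(hit: Interval, intervals: List[Interval]) -> int:
    """Number of bases of `hit` that fall inside a merged interval list."""
    s, e = hit
    return sum(max(0, min(e, b) - max(s, a)) for a, b in intervals)


def feature_counts(hits: Iterable[Interval],
                   annotation: Dict[str, List[Interval]]) -> Dict[str, int]:
    """Nucleotides of all hits per feature; 'intergenic' is everything
    not covered by a gene or pseudogene, as defined in Table 1."""
    merged = {name: merge(ivs) for name, ivs in annotation.items()}
    genic = merge(merged.get("gene", []) + merged.get("pseudogene", []))
    counts: Dict[str, int] = defaultdict(int)
    for hit in hits:
        for name, ivs in merged.items():
            counts[name] += covered(hit, ivs)
        counts["intergenic"] += (hit[1] - hit[0]) - covered(hit, genic)
    return dict(counts)
```

Because the features are nested (a CDS lies within an RNA, which lies within a gene), the per-feature counts are not additive; each feature is tallied and normalized on its own.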

The genome-annotation file for known and predicted genes (seq_gene.md) was obtained from the NCBI (download date, September 15, 2008). Only entries referring to the reference genome assembly were extracted from this file. By evaluating the positions of hits within the genomic contigs against this annotation file, we obtained hit counts and corresponding hit lengths within annotated genes and pseudogenes. We subdivided annotated gene sequences further into transcribed sequences [RNAs and untranslated regions (UTRs)] and protein-coding DNA sequences (CDSs) (Table 1). We normalized the observed nucleotide counts for each sample and each feature by the sample's total genomic hit length, defined as the sum of nucleotides matching the human genome and repetitive-element databases. To illustrate the binning of genomic elements and repeats, we used the repetitive elements from the annotation file of the masked human genome available at the RepeatMasker Web site (31) in a joint query with the seq_gene.md database. Fig. 3 in the online Data Supplement presents a Venn diagram illustrating the relationships and binnings.

STATISTICAL ANALYSIS

The primary null hypothesis was equal representation of each evaluated element in circulating DNA (observed values) and in genomic DNA processed and analyzed by the same methods (expected values). All serum nucleic acid data are presented as ratios to the corresponding values for genomic DNA subjected to the same experimental and biometrical procedures. The Kolmogorov–Smirnov test was used to test all data sets for goodness of fit to a normal distribution. To compare observed and expected values, we used the dispersion of the observed/expected ratio to generate Z statistics and derived the corresponding P values from the cumulative gaussian distribution function. Where applicable, we used the Bonferroni approach to correct P values for multiple testing. For parameters requiring correction for repetitive elements, statistical significance was set at P < 0.01 to account for gaussian error propagation.

Results

THE CIRCULATING GENOME

We generated 4.5 × 10⁵ high-quality sequence reads (7.5 × 10⁷ nucleotides in total) from serum samples of 50 apparently healthy blood donors. The mean (SD) number of sequence reads per sample was 9100 (2620), of which 87% (5%) produced significant hits in one of the databases. Of these hits, 97% (4%) were of genomic origin. The mean read length per sample was 169 (14) bp.

REPRESENTATION OF GENES, RNAs, AND CDSs

Only hits with a length >50% of the query length were considered for subsequent detailed allocations. The mean relative amounts of nucleotides matching genes, pseudogenes, transcribed regions (annotated as RNAs and UTRs), and CDSs were calculated (observed) and compared with the mean amounts found in genomic DNA samples (expected). Because annotated genes, RNAs, and unprocessed pseudogenes contain introns and therefore repetitive elements, we corrected the expected amounts of these features for the amounts of repetitive elements they contained. This correction was necessary because not all repetitive elements in the analyzed sequences could be allocated to a unique genomic region; they therefore had to be masked before the sequences were used to query the human genome database. Overall, a ratio of approximately 1 was found for all genomic features, indicating essentially no difference between the circulating DNA pool and the genome in the representation of these features in healthy individuals (Fig. 1). The highest variation was observed in the representation of CDSs, UTRs, and pseudogenes in serum DNA samples (as well as in the genomic samples). In contrast, the representation of genes and RNA sequences in the circulating DNA pool was more consistent among the samples from healthy individuals. Fig. 2, A–C, shows the correlation of serum values with genomic values in the 50 healthy individuals.

Profile of Circulating DNA in Healthy Individuals

[Figure 1: observed/expected ratio (CNA/gDNA, y axis 0 to 1.6) for the categories All known, Human, Nonrepeat, Repeats, Gene, Intergenic, RNA, CDS, UTR, and Pseudo.]

Fig. 1. Representation of sequences from different origins in the circulating DNA pool of healthy individuals, expressed as observed/expected ratios. All hits with an E value of <0.001 were evaluated for general assignment of nucleotides to unidentified, genomic, nonrepetitive, and repetitive sequence classes. For detailed allocation to the different genomic features, only hits >40 bp were evaluated. Whiskers represent 1.96 × SD. gDNA, genomic DNA; Pseudo, pseudogene.

REPRESENTATION OF SINGLE GENES

The representation of serum sequences matching annotated genes depended strongly on gene length. We compared the observed serum representation of sequences matching the 4000 largest human genes with the expected values and found gene length to be correlated with the gene's representation in serum {r = 0.91; y = [1.11 (0.02)]x − [0.06 (0.02)]} (95% CIs for slope and y intercept are indicated in parentheses). The observed/expected ratio was >5 in 4 genes and <0.2 in 40 genes.

CHROMOSOMAL DISTRIBUTION

Sequences obtained from serum DNA and genomic DNA were further evaluated by the chromosomal positions of their highest-scoring hits. We calculated the number of nucleotides matching each human chromosome and compared that number with chromosome length. The number of nucleotides derived from a chromosome was correlated with chromosome length for both sample types (r² = 0.96 for serum DNA samples; r² = 0.93 for genomic samples). The ratio of observed (serum) to expected (genomic) hit counts was approximately 1 for all chromosomes except chromosome 19, for which the observed hits accounted for only 81% of the expected value (P > 0.05; Fig. 2D). Chromosome 19 has the highest gene density and GC content of any chromosome. We therefore tested whether the representation of the different chromosomes in serum is correlated with chromosomal gene density or GC content. Neither Pearson correlation (r² = 0.22) nor Spearman rank correlation (r² = 0.03) revealed a significant correlation between gene density and the representation of the different chromosomes in serum. The correlation between GC content and chromosomal representation was also weak (r² = 0.19).

REPRESENTATION OF REPETITIVE ELEMENTS

Of the CNA fragments of human origin, 51.7% were repetitive elements. For comparison, 50.2% of the premasked human genome consists of repetitive elements, and we detected 51.6% repetitive elements in the sequenced genomic samples. We identified repetitive elements within the circulating DNA sequences with the RepeatMasker software and compared their amounts with those calculated for the genomic DNA samples. No significant differences were detected between the genomic and CNA samples for the different classes of interspersed repeats (short interspersed nuclear elements, long interspersed nuclear elements, long terminal repeats, and DNA transposons) (Fig. 3A). More detailed analyses of the families and elements belonging to the most abundant repeat classes revealed an overrepresentation of Alu elements (P < 0.01; Fig. 3B). Whereas the long interspersed nuclear elements L1 and L2 (Fig. 3C) were significantly underrepresented in the circulating DNA compared with the genome (P < 0.01), L3 elements were represented to equivalent extents in genomic DNA and CNA samples.

PROCEDURE CONTROL

We examined whether the experimental procedure or the computerized sequence analysis was responsible for any bias in the representation of repetitive elements. To estimate this bias, we ran 2 types of additional controls. For the first control, we partitioned the human genome as obtained from NCBI into 175-bp segments. For the second control, we sheared the genomic DNA by ultrasonication before the WGA reaction. The amounts of repetitive elements calculated from the premasked human genome, downloaded from the RepeatMasker Web site, served as the reference against which we compared the RepeatMasker results for the genomic DNA and sheared genomic DNA samples, as well as the results for the partitioned FASTA file.

Fig. 2. Correlation plots of the representation of different genomic features and human autosomes within the normal circulating DNA pool vs the expected distributions. Dashed lines indicate upper and lower 95% confidence limits. Hs19, human autosome 19.

Fig. 4 presents the ratios obtained in this analysis. The deviation of the partitioned genomic FASTA sample from the nucleotide amounts of the unpartitioned genomic sequence reveals the bias introduced by querying short sequences alone. Deviations from 1 that are seen only in the experimentally amplified genomic samples indicate a bias introduced by the amplification or sequencing reactions. The close agreement between the genomic and sheared genomic series indicates that shearing the DNA before the WGA procedure introduces little additional bias. Shortening the query sequence had no effect on nonrepetitive elements but did hinder the detection of repeats, particularly L1 elements, as seen in the partitioned FASTA sample. Both genomic DNA (whether sheared or of high molecular weight) and CNAs, however, showed an overrepresentation of L1 elements, suggesting that L1 elements are favored in the amplification reaction or in the sequencing procedure. In contrast, the underestimation of L2 and L3 elements is introduced by the bioinformatics approach. Because the repetitive elements found in the DNA sequenced from serum samples were compared with the amounts in genomic DNA samples subjected to the same experimental procedures, experimental bias is unlikely to explain the differences between them (Fig. 3). We checked the accuracy of sequence annotation in the query pipeline by using as input several representative parts of the human genome partitioned into 175-bp fragments (total, 1.1 × 10⁸ bp). We compared the resulting annotation with the corresponding annotation in the seq_gene.md database and calculated an accuracy of >96% for genes, RNAs, CDSs, and UTRs. The data in Fig. 4 reveal that the proportions of RNAs and CDSs in genomic DNA were lower than expected, a finding that appears to be due mostly to the WGA or sequencing procedure.
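The in-silico control, cutting a reference sequence into 175-bp pieces and checking how often the pipeline reproduces the reference annotation, might look like the sketch below. The text does not state whether a trailing fragment shorter than 175 bp was kept, how FASTA headers were named, or exactly how accuracy was computed; the `keep_remainder` flag, the header format, and per-fragment label agreement are assumptions, and the function names are hypothetical.

```python
from typing import Iterator, Sequence, Tuple


def partition(seq: str, size: int = 175,
              keep_remainder: bool = True) -> Iterator[Tuple[int, str]]:
    """Yield (0-based offset, fragment) for consecutive non-overlapping pieces."""
    for start in range(0, len(seq), size):
        frag = seq[start:start + size]
        if len(frag) < size and not keep_remainder:
            break  # drop the trailing short piece
        yield start, frag


def to_fasta(name: str, seq: str, size: int = 175, width: int = 60) -> str:
    """Format the fragments as FASTA records labeled with 1-based inclusive coordinates."""
    lines = []
    for start, frag in partition(seq, size):
        lines.append(f">{name}:{start + 1}-{start + len(frag)}")
        lines.extend(frag[i:i + width] for i in range(0, len(frag), width))
    return "\n".join(lines) + "\n"


def agreement(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of fragments whose pipeline label matches the reference label."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)
```

Encoding the original coordinates in each header lets the pipeline's annotation of a fragment be matched back to the reference annotation for the same region.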

[Figure 3: observed/expected ratios (CNA/gDNA, left y axis) and expected amounts (% of genome, right y axis) for the repeat classes SINE, LINE, transposon, and LTR (A, repetitive elements); the Alu family, AluJ, AluS, and AluY (B); and the LINE families L1, L1M, L1P, L2, and L3 (C).]

Fig. 3. Representation of repetitive elements expressed as observed/expected ratios (black dots). Error bars represent 1.96 × SD. Gray columns indicate the expected amounts as a percentage of the human genome covered by the respective repeat class, family, or element (right y axis scale): repeat classes (A); Alu family belonging to the short interspersed nuclear element (SINE) class (B); families of the long interspersed nuclear element (LINE) class (C). gDNA, genomic DNA; LTR, long terminal repeat.
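The observed/expected ratios, 1.96 × SD error bars, and Z-statistic test described under Statistical Analysis can be sketched for a single feature as follows. The text does not give the exact Z formula; this sketch assumes Z = (mean ratio − 1)/SD across samples, which is consistent with 1.96 × SD whiskers (a whisker interval that excludes 1 corresponds to two-sided P < 0.05). The Kolmogorov–Smirnov normality check is omitted, and the function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List, Sequence


def ratios(observed: Sequence[float], expected: float) -> List[float]:
    """Per-sample observed/expected ratios for one feature.
    observed: feature nucleotides / total genomic-hit nucleotides, per serum sample.
    expected: the same fraction averaged over the genomic DNA samples."""
    return [o / expected for o in observed]


def z_test_vs_one(values: Sequence[float]) -> Dict[str, float]:
    """Test whether the mean ratio differs from 1 (equal representation)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)         # sample SD, needs >= 2 values
    z = (mean - 1.0) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return {"mean": mean, "sd": sd, "z": z, "p": p,
            "low": mean - 1.96 * sd, "high": mean + 1.96 * sd}


def bonferroni(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P value, capped at 1."""
    return min(1.0, p * n_tests)
```

As a point example, the Alu fractions quoted in the Discussion (11.4% of genomic hits in CNA samples vs 8.5% in genomic samples) give `ratios([0.114], 0.085)` ≈ 1.34.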

SEQUENCES MATCHING TO BACTERIAL AND VIRAL GENOMES

Of the total significant hits, 0.16% originated from the bacterial genome database, and 0.02% and 0.01% were of viral and fungal origin, respectively. One of the volunteers had an undiagnosed HBV infection at the time of sampling; 15.5% of the total sequence data from this individual were HBV sequences. The complete HBV genome could be assembled from the sequence reads derived from this individual, and comparison of the consensus sequence with known sequences of different HBV strains revealed the closest match to HBV genotype F.

Discussion

We report sequence profiles for the circulating DNA pool in healthy individuals. The combination of random amplification of whole serum DNA isolates and high-throughput sequencing provides the first description and analysis of a large amount of unbiased sequence data. The sequential BLAST-analysis pipeline allowed every fragment to be compared not only with the endogenous genome but also with bacterial and viral genomes. An interesting finding was the detection of HBV infection in one of the volunteers, who was later determined to be an HBV carrier. This result suggests that mass sequencing of serum nucleic acids can provide a powerful diagnostic approach for detecting not only disease-related endogenous CNA profiles but also blood-borne infectious agents.

Profiling of the circulating DNA present in the blood of healthy individuals provides valuable information for elucidating the origin of serum CNAs. Two sources of endogenous circulating DNA have been discussed in the literature: dying cells, whether necrotic or apoptotic, and DNA actively secreted by viable cells (27, 28, 33). Internucleosomal fragmentation of nuclear DNA occurs during the last stages of the apoptotic cascade (34, 35), and a small portion of apoptotic genomic DNA has been shown experimentally to escape final cleavage to mono- or oligonucleotides and to appear in the bloodstream or the urine (33, 36, 37). In a recently published analysis of 556 independent clones obtained from the circulating DNA of healthy humans, the authors concluded that circulating DNA in plasma is derived from apoptotic cells rather than necrotic cells (29). They also reported that the number of clones derived from each chromosome was correlated with chromosome size. Our findings confirm that the representation of serum sequences is generally correlated with chromosome size, although we found a slight underrepresentation of chromosome 19. Chromosome 19 has the highest gene density, the highest density of Alu elements, and the highest GC content of any chromosome (38). We found no correlation of gene density or GC content with the chromosomal distribution of the sequenced fragments in either serum DNA or genomic DNA. It is conceivable, however, that the underrepresentation of chromosome 19 in serum DNA is related to the overrepresentation of Alu sequences in the CNA pool of healthy individuals.

[Figure 4: ratio to database (DB) annotation (y axis, 0.0 to 2.0) for Gene, Intergenic, RNA, CDS, UTR, Pseudo, SINE, LINE, LTR, DNA, Alu, AluJ, AluS, AluY, L1, L1M, L1P, L2, and L3; series: Genomic, Sheared genomic, Partitioned FASTA.]

Fig. 4. Mean normalized nucleotide amounts for 4 samples of sheared genomic DNA (green triangles) or high molecular weight genomic DNA (blue diamonds), calculated from pipeline results and divided by the normalized nucleotide amounts calculated from the premasked human genome. Error bars represent 1.96 × SD. The human genomic sequence as downloaded from NCBI was split into 175-bp pieces and run through the repeat-masking procedure (red circles, right side). In addition, 1.1 × 10⁸ bp were randomly selected, run through the pipeline, and directly compared with the corresponding genome annotation (red circles, left side). Shown are the normalized values calculated as above (red circles). Pseudo, pseudogene; SINE, short interspersed nuclear element; LINE, long interspersed nuclear element; LTR, long terminal repeat.

High-throughput sequencing data on the genomic distribution of cell-free DNA isolated from the plasma of pregnant women have recently been published (39). That study used the Solexa/Illumina platform, and the isolated plasma DNA was sequenced without prior amplification. The authors reported a strong bias toward GC-rich sequences in both the plasma and the genomic DNA samples: first, the mean density of sequences matching particular chromosomes correlated strongly with chromosomal GC content, and second, the GC content of the sequenced fragments was, on average, approximately 10% higher than that of the sequenced human genome. The authors speculated that this bias is generated during the sequencing process (39). The sequences obtained with our approach do not appear to be biased toward GC-rich regions: the GC content of 42.1% (0.2%) that we obtained is close to that of the sequenced genome (41%) (38), and we detected no correlation between chromosomal representation and GC content.

Suzuki and colleagues (29) reported that plasma samples from healthy individuals contained primarily DNA fragments of approximately 180 bp, with fragments >500 bp observed to a much lesser extent. Native CNA preparations extracted from 4 mL of serum (pooled from 3 different blood donors) and analyzed on an Agilent Technologies 2100 Bioanalyzer displayed a comparable size distribution, which was not significantly altered by the WGA procedure.

The proportion of Alu repeat sequences relative to β-globin gene sequences has been reported to be greater in serum DNA than in lymphocyte DNA, in both healthy individuals and cancer patients (40). Our results are consistent with this finding: we found an overrepresentation of Alu elements in the CNA pool of 50 healthy individuals compared with human genomic samples, and we obtained the same result when we compared the 4 CNA samples with the 4 genomic samples obtained from the same individuals. Sequences matching Alu elements accounted for 11.4% (0.4%) of the total genomic hits in the CNA samples and 8.5% (0.8%) in the genomic samples.

L1 elements were found in higher proportions in genomic DNA and CNA samples than expected from the published genomic sequence, in which L1 retrotransposons account for approximately 17% of the genome (38). L1Hs is the youngest branch of the L1 family, and some of its members are still capable of retrotransposition. An estimated 100 L1 copies in the human genome remain retrotransposition-competent, and approximately 10% of these active elements are classified as "hot," that is, highly active in an artificial culture system (41). Although L1-activity potential has been shown to vary substantially between individuals (42), the small number of active L1 sequences has little quantitative effect on L1 counts. Our data indicated an overrepresentation of L1 elements in both the mean of 4 genomic samples (22.8%) and the 50 serum DNA samples (19%), compared with the L1 content calculated from the premasked human genome (17.8%). Given the small SDs for L1 in the sample groups (0.1% for the genomic samples and 0.8% for the serum DNA samples), we conclude that interindividual variation in L1 cannot account for the detected differences.

Taken together, the data we have presented suggest that apoptotic genomic DNA is the major, but not the sole, source of CNAs in apparently healthy individuals. A circulating DNA pool consisting purely of nonspecific apoptotic or necrotic nuclear DNA would have shown an even distribution over the entire genome, or possibly some overrepresentation of highly histone-protected regions. Such a distribution is not seen in our data. Whether further subtle differences exist can be neither proved nor excluded; deeper (high-coverage) sequencing would be needed to address such questions.
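The GC-content comparison above (42.1% in serum-derived reads vs 41% for the genome) and the Pearson and Spearman correlations used to relate chromosomal representation to GC content or gene density can be computed with a few helpers. A minimal sketch, assuming GC content is pooled over all unambiguous bases of a sample's reads (ambiguous symbols such as N are ignored) and that tied values receive average ranks; the function names are hypothetical.

```python
import math
import statistics
from typing import Iterable, List, Sequence


def gc_content(reads: Iterable[str]) -> float:
    """Pooled G+C fraction over the unambiguous bases (A, C, G, T) of all reads."""
    gc = at = 0
    for read in reads:
        s = read.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    if gc + at == 0:
        raise ValueError("no unambiguous bases")
    return gc / (gc + at)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))
```

Squaring the returned coefficients gives r² values of the kind reported in the Results for the chromosome-level comparisons.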

The profile of serum DNA from healthy individuals that we have presented provides baseline information, which is especially important because CNAs are increasingly recognized as valuable diagnostic biomarkers. The use of mass sequencing and bioinformatics provides the basis for new diagnostic approaches that use CNAs as biomarkers for both malignant and nonmalignant diseases.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:

Employment or Leadership: J. Beck, Chronix Biomedical; H.B. Urnovitz, Chronix Biomedical; E. Schütz, Chronix Biomedical.
Consultant or Advisory Role: M. Clerici, Chronix Biomedical.
Stock Ownership: J. Beck, Chronix Biomedical; H.B. Urnovitz, Chronix Biomedical; E. Schütz, Chronix Biomedical.
Honoraria: None declared.
Research Funding: H.B. Urnovitz, Chronix Biomedical.
Expert Testimony: None declared.

Role of Sponsor: The funding organizations played a direct role in the design of the study, review and interpretation of data, preparation of the manuscript, and final approval of the manuscript.

Acknowledgments: We thank Sara Hennecke, Stefan Balzer, and Carsten Müller for their skillful technical assistance, and Sascha Glinka and Birgit Ottenwälder at Eurofins Medigenomix GmbH for performing the GS FLX/454 sequencing. We also thank Prof. Michael Oellerich, University of Göttingen, Göttingen, Germany, and Prof. William M. Mitchell, Vanderbilt University, Nashville, Tennessee, for critical reading of the manuscript and for their valuable comments.

References

1. Fleischhacker M, Schmidt B. Circulating nucleic acids (CNAs) and cancer—a survey. Biochim Biophys Acta 2007;1775:181–232.
2. Johnson PJ, Lo YM. Plasma nucleic acids in the diagnosis and management of malignant disease. Clin Chem 2002;48:1186–93.
3. Leon SA, Shapiro B, Sklaroff DM, Yaros MJ. Free DNA in the serum of cancer patients and the effect of therapy. Cancer Res 1977;37:646–50.
4. Swaminathan R, Butt AN. Circulating nucleic acids in plasma and serum: recent developments. Ann N Y Acad Sci 2006;1075:1–9.
5. Sozzi G, Conte D, Leon M, Ciricione R, Roz L, Ratcliffe C, et al. Quantification of free circulating DNA as a diagnostic marker in lung cancer. J Clin Oncol 2003;21:3902–8.
6. Wu TL, Zhang D, Chia JH, Tsao KH, Sun CF, Wu JT. Cell-free DNA: measurement in various carcinomas and establishment of normal reference range. Clin Chim Acta 2002;321:77–87.
7. Boddy JL, Gal S, Malone PR, Harris AL, Wainscoat JS. Prospective study of quantitation of plasma DNA levels in the diagnosis of malignant versus benign prostate disease. Clin Cancer Res 2005;11:1394–9.
8. Boddy JL, Gal S, Malone PR, Shaida N, Wainscoat JS, Harris AL. The role of cell-free DNA size distribution in the management of prostate cancer. Oncol Res 2006;16:35–41.
9. Mayall F, Jacobson G, Wilkins R, Chang B. Mutations of p53 gene can be detected in the plasma of patients with large bowel carcinoma. J Clin Pathol 1998;51:611–3.
10. Sorenson GD, Pribish DM, Valone FH, Memoli VA, Bzik DJ, Yao SL. Soluble normal and mutated DNA sequences from single-copy genes in human blood. Cancer Epidemiol Biomarkers Prev 1994;3:67–71.
11. Fujiwara Y, Chi DD, Wang H, Keleman P, Morton DL, Turner R, Hoon DS. Plasma DNA microsatellites as tumor-specific markers and indicators of tumor progression in melanoma patients. Cancer Res 1999;59:1567–71.
12. Nawroz H, Koch W, Anker P, Stroun M, Sidransky D. Microsatellite alterations in serum DNA of head and neck cancer patients. Nat Med 1996;2:1035–7.
13. Silva JM, Dominguez G, Garcia JM, Gonzalez R, Villanueva MJ, Navarro F, et al. Presence of tumor DNA in plasma of breast cancer patients: clinicopathological correlations. Cancer Res 1999;59:3251–6.
14. Durie BG, Urnovitz HB, Murphy WH. RT-PCR amplicons in the plasma of multiple myeloma patients—clinical relevance and molecular pathology. Acta Oncol 2000;39:789–96.
15. Korshunova Y, Maloney RK, Lakey N, Citek RW, Bacher B, Budiman A, et al. Massively parallel bisulphite pyrosequencing reveals the molecular complexity of breast cancer-associated cytosine-methylation patterns obtained from tissue and serum DNA. Genome Res 2008;18:19–29.
16. Ziegler A, Zangemeister-Wittke U, Stahel RA. Circulating DNA: a new diagnostic gold mine? Cancer Treat Rev 2002;28:255–71.
17. Lo YM, Rainer TH, Chan LY, Hjelm NM, Cocks RA. Plasma DNA as a prognostic marker in trauma patients. Clin Chem 2000;46:319–23.
18. Rainer TH, Wong LK, Lam W, Yuen E, Lam NY, Metreweli C, Lo YM. Prognostic use of circulating plasma nucleic acid concentrations in patients with acute stroke. Clin Chem 2003;49:562–9.
19. Urnovitz HB, Tuite JJ, Higashida JM, Murphy WH. RNAs in the sera of Persian Gulf War veterans have segments homologous to chromosome 22q11.2. Clin Diagn Lab Immunol 1999;6:330–5.
20. Li JZ, Steinman CR. Plasma DNA in systemic lupus erythematosus. Characterization of cloned base sequences. Arthritis Rheum 1989;32:726–33.
21. Chim SS, Jin S, Lee TY, Lun FM, Lee WS, Chan LY, et al. Systematic search for placental DNA-methylation markers on chromosome 21: toward a maternal plasma-based epigenetic test for fetal trisomy 21. Clin Chem 2008;54:500–11.
22. Lo YM. Fetal DNA in maternal plasma: biology and diagnostic applications. Clin Chem 2000;46:1903–6.
23. Tsui DW, Chan KC, Chim SS, Chan LW, Leung TY, Lau TK, et al. Quantitative aberrations of hypermethylated RASSF1A gene sequences in maternal plasma in pre-eclampsia. Prenat Diagn 2007;27:1212–8.
24. Fournie GJ, Courtin JP, Laval F, Chale JJ, Pourrat JP, Pujazon MC, et al. Plasma DNA as a marker of cancerous cell death. Investigations in patients suffering from lung cancer and in nude mice bearing human tumours. Cancer Lett 1995;91:221–7.
25. Lui YY, Chik KW, Chiu RW, Ho CY, Lam CW, Lo YM. Predominant hematopoietic origin of cell-free DNA in plasma and serum after sex-mismatched bone marrow transplantation. Clin Chem 2002;48:421–7.
26. Anker P, Stroun M, Maurice PA. Spontaneous extracellular synthesis of DNA released by human blood lymphocytes. Cancer Res 1976;36:2832–9.
27. Anker P, Mulcahy H, Chen XQ, Stroun M. Detection of circulating tumour DNA in the blood (plasma/serum) of cancer patients. Cancer Metastasis Rev 1999;18:65–73.
28. Stroun M, Maurice P, Vasioukhin V, Lyautey J, Lederrey C, Lefort F, et al. The origin and mechanism of circulating DNA. Ann N Y Acad Sci 2000;906:161–8.
29. Suzuki N, Kamataki A, Yamaki J, Homma Y. Characterization of circulating DNA in healthy human plasma. Clin Chim Acta 2008;387:55–8.
30. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997;25:3389–402.
31. Smit AFA, Hubley R, Green P. RepeatMasker Open-3.0. http://www.repeatmasker.org (Accessed April 2008).
32. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet Genome Res 2005;110:462–7.
33. Giacona MB, Ruben GC, Iczkowski KA, Roos TB, Porter DM, Sorenson GD. Cell-free DNA in human blood plasma: length measurements in patients with pancreatic cancer and healthy controls. Pancreas 1998;17:89–97.
34. Bicknell GR, Cohen GM. Cleavage of DNA to large kilobase pair fragments occurs in some forms of necrosis as well as apoptosis. Biochem Biophys Res Commun 1995;207:40–7.
35. Wyllie AH. Glucocorticoid-induced thymocyte apoptosis is associated with endogenous endonuclease activation. Nature 1980;284:555–6.
36. Botezatu I, Serdyuk O, Potapova G, Shelepov V, Alechina R, Molyaka Y, et al. Genetic analysis of DNA excreted in urine: a new approach for detecting specific genomic DNA sequences from cells dying in an organism. Clin Chem 2000;46:1078–84.
37. Lichtenstein AV, Melkonyan HS, Tomei LD, Umansky SR. Circulating nucleic acids and apoptosis. Ann N Y Acad Sci 2001;945:239–49.
38. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, et al. Initial sequencing and analysis of the human genome. Nature 2001;409:860–921.
39. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl Acad Sci U S A 2008;105:16266–71.
40. Stroun M, Lyautey J, Lederrey C, Mulcahy HE, Anker P. Alu repeat sequences are present in increased proportions compared to a unique gene in plasma/serum DNA: evidence for a preferential release from viable cells? Ann N Y Acad Sci 2001;945:258–64.
41. Muotri AR, Marchetto MC, Coufal NG, Gage FH. The necessary junk: new functions for transposable elements. Hum Mol Genet 2007;16(Spec No 2):R159–67.
42. Seleme MC, Vetter MR, Cordaux R, Bastone L, Batzer MA, Kazazian HH Jr. Extensive individual variation in L1 retrotransposition capability contributes to human genetic diversity. Proc Natl Acad Sci U S A 2006;103:6611–6.

Clinical Chemistry 55:4 739–747 (2009)

Molecular Diagnostics and Genetics

European External Quality Control Study on the Competence of Laboratories to Recognize Rare Sequence Variants Resulting in Unusual Genotyping Results

János Márki-Zay,1,2 Christoph L. Klein,1,3 David Gancberg,1,4 Heinz G. Schimmel,1* and László Dux2,5

1 European Commission, Joint Research Centre, Institute for Reference Materials and Measurements, Geel, Belgium; 2 Department of Biochemistry, Faculty of Medicine, University of Szeged, Szeged, Hungary; 3 European Commission, Joint Research Centre, Institute for Health and Consumer Protection, Ispra, Italy; 4 European Commission, Directorate-General for Research, Health Biotechnology, Brussels, Belgium; 5 QualiCont Kht., Szeged, Hungary.

* Address correspondence to this author at: European Commission, Joint Research Centre, Institute for Reference Materials and Measurements, Retieseweg 111, B-2440 Geel, Belgium. Fax +32 14 571548; e-mail Heinz.Schimmel@ec.europa.eu.

Received July 8, 2008; accepted January 27, 2009. Previously published online at DOI: 10.1373/clinchem.2008.112102

BACKGROUND: Depending on the method used, rare sequence variants adjacent to the single nucleotide polymorphism (SNP) of interest may cause unusual or erroneous genotyping results. Because such rare variants are known for many genes commonly tested in diagnostic laboratories, we organized a proficiency study to assess their influence on the accuracy of reported laboratory results.

METHODS: Four external quality control materials were processed and sent to 283 laboratories through 3 EQA organizers for analysis of the prothrombin 20210G>A mutation. Two of these quality control materials contained sequence variants arising from site-directed mutagenesis.

RESULTS: One hundred eighty-nine laboratories participated in the study. For samples that gave a usual result with the method applied, the error rate was 5.1%. Detailed analysis showed that 70% of the failures were reported by only 9 laboratories. Allele-specific amplification-based PCR had a much higher error rate than other methods (18.3% vs 2.9%). The variants 20209C>T and [20175T>G; 20179_20180delAC] resulted in unusual genotyping results in 85 and 67 laboratories, respectively. Eighty-three (54.6%) of these unusual results were not recognized, 32 (21.1%) were attributed to technical issues, and only 37 (24.3%) were recognized as another sequence variant.

CONCLUSIONS: Our findings revealed that some of the participating laboratories were not able to recognize and correctly interpret unusual genotyping results caused by rare SNPs. Our study indicates that the majority of the failures could be avoided by improved training and careful selection and validation of the methods applied.

© 2009 American Association for Clinical Chemistry

Common functional single nucleotide polymorphisms (SNPs)6 are associated with predisposition to high-incidence multifactorial disorders or with altered drug metabolism, and they play an essential role in diagnosis, risk stratification, and treatment selection. These genetic alterations are usually tested with DNA-based genotyping techniques designed for accurate and rapid detection of the SNP of interest, not for mutation screening. Although these assays are highly specific, rare variants adjacent to the nucleotide of interest may cause unusual or erroneous genotyping results (1–5). Such rare variants are known, for example, in the genes encoding factor V (6–8), the hemochromatosis protein HFE (9, 10), factor II (prothrombin) (5, 11–14), and cholesteryl ester transfer protein (CETP) (1). However, their influence on the accuracy of reported laboratory results has not yet been investigated. The Institute for Reference Materials and Measurements (IRMM) and the Department of Biochemistry of the University of Szeged agreed on a proficiency testing (PT) study, in cooperation with 3 European external quality assessment (EQA) organizations, to assess how clinical laboratories interpret unusual genotyping results and how many of these laboratories recognize interfering mutations and report them correctly. Additional aims of the study were to identify weaknesses of genetic testing services, to develop QC materials for rare variants, and to gather more information about the requirements for DNA-based candidate reference materials (RMs).

6 Nonstandard abbreviations: SNP, single nucleotide polymorphism; CETP, cholesteryl ester transfer protein; IRMM, Institute for Reference Materials and Measurements; PT, proficiency testing; EQA, external quality assessment; RM, reference material; CRM, certified reference material; LDT, laboratory-developed test; FRET, fluorescence resonance energy transfer; AD, allelic discrimination; dHPLC, denaturing HPLC.

Recently, in a collaborative effort, the Scientific Committee of Molecular Biology Techniques in Clinical Chemistry of the IFCC and the IRMM developed a set of 3 plasmid-type certified reference materials (CRMs) for the analysis of the 20210G>A (G20210A) mutation in F2, the coagulation factor II (prothrombin) gene (15–19). These plasmids contain a wild-type or G20210A mutant prothrombin gene fragment that spans all primer annealing sites published to date. The G20210A mutation in the 3′ untranslated region of the prothrombin gene is associated with higher mean plasma prothrombin concentrations and an increased risk of thrombotic disease (20). The clinical significance of this SNP, the indications for testing, and the therapeutic consequences of the results have been reviewed (21). Because no specific functional (e.g., clotting) test exists, analysis of this SNP during the initial evaluation of suspected inherited thrombophilia is one of the most frequently performed genetic tests in clinical laboratories; all the EQA organizations contacted offer surveys for this test in their molecular diagnostic schemes. Other variants of the prothrombin gene, such as 20209C>T (C20209T) (22, 23), 20209C>A (24), 20207A>C (25), 20218A>G (14), and 20221C>T (12), are relatively rare (detected in 1 in 1660 samples). The C20209T mutation accounts for nearly 85% of these variants. These and similar mutations can produce unusual genotyping results, and the correct reporting and interpretation of such cases could be an important indicator of the competence of testing laboratories.

Materials and Methods

PRODUCTION OF THE EXTERNAL QC SAMPLES

Development and processing of the wild-type and G20210A mutant plasmid CRMs have been reported (15). These plasmids include a well-characterized fragment of the human prothrombin gene (GenBank M17262; nucleotide positions 26 302–26 910) comprising the SNP of interest and spanning all published primer annealing sites. We introduced the C20209T mutation into the wild-type sequence using the QuikChange II Site-Directed Mutagenesis Kit (Stratagene) according to the manufacturer's recommendations, with 2 complementary oligonucleotides as mutagenesis primers. The sense oligonucleotide was 5′-CCCAATAAAAGTGACTCTCAGTGAGCCTCAATGCTCCCAGTGC-3′ (Proligo Biochemie GmbH); the mutagenic nucleotide is the T at position 22. The plasmids were transformed into the XL1-Blue supercompetent cells provided with the kit, and the bacteria were cultured and harvested. We purified plasmid preparations with the Qiagen QIAfilter Plasmid Maxi Kit (Qiagen Benelux B.V.) and screened them for mutations resulting in unusual melting curves on the LightCycler® with the Prothrombin (G20210A) Mutation Detection Kit (Roche Diagnostics), according to the kit insert. Two plasmids containing newly introduced mutations were sequenced with suitable primers (Profw, 5′-GCACAGACGGCTGTTCTCTT-3′, and Prorev, 5′-CCCGAGTGCTCGGACTACCA-3′; synthesized by VBC Genomics, Vienna, Austria; HPLC-purified) and the CEQ Dye Terminator Cycle Sequencing Kit on a Beckman CEQ8000 sequencer (Analis) according to the manufacturer's instructions. The purity of the plasmid preparations was confirmed by the OD260/OD280 ratio, and DNA concentration was determined with an Eppendorf Biophotometer (VWR International). We mixed plasmids containing the newly introduced mutations with equal amounts of the G20210A mutant or wild-type plasmids. These plasmid preparations were diluted in Tris-EDTA buffer (10 mmol/L Tris, 1 mmol/L EDTA, pH 8.0) to a final DNA concentration of approximately 10 pg/µL. Before aliquoting, we tested these diluted stocks again on the LightCycler system as described above.

DESIGN OF THE RING TRIAL

Four external quality control materials were aliquoted (30 µL/vial) into sterile, self-standing, high-recovery 1.5-mL polypropylene screw-cap vials and distributed to each of the 283 laboratories participating in 3 European EQA schemes (DGKL Referenzinstitut für Bioanalytik, Bonn, Germany; Instand e.V., Düsseldorf, Germany; and QualiCont Kht., Szeged, Hungary). The actual number of laboratories invited was probably lower, because some laboratories participate in more than 1 of these schemes. Two of the 4 materials contained mutations that give unusual genotyping results with some methods. The samples were: sample A, 20210A/[20175T>G; 20179_20180delAC], heterozygous for the G20210A mutation; sample B, G20210/C20209T, homozygous wild-type for the G20210A mutation; sample C, 20210A/20210A, homozygous for the G20210A mutation; and sample D, G20210/G20210, homozygous wild-type for the G20210A mutation. Participants were informed that the external QC samples contained plasmid-type materials but were given no information on the expected results. Using a standard form, testing laboratories were asked to give information on the type of center, the spectrum of tests offered, the accreditation status of the laboratory, the number of genetic tests performed yearly (for factor II G20210A and in general), the dates of arrival and testing, storage conditions, and incidental comments or observations on the samples. The laboratories were asked to describe in detail the methods applied (including the sample processing protocol

Proficiency Testing Study on Rare Sequence Variants

Fig. 1. Melting curves for samples A and B obtained with the LightCycler and the factor II (prothrombin) G20210A mutation detection kit. Curves shown: sample A (20210A/[20175T>G; 20179_20180delAC]), solid line; sample B (C20209T/wild-type), line with markers; heterozygous control, dashed line (- - -); and a no-template control. Vertical lines indicate melting temperatures.

and the genotype and origin of the controls used) and to submit their genotyping results, including the raw data (e.g., copies of gels, melting curves). No standardized reporting terms were defined. Further comments on the results were welcomed. Statistical analysis of the study results was performed with the χ2 test, and the significance limit was set at P < 0.05.

Results

PRODUCTION OF THE NEW QC MATERIALS

Screening of plasmid preparations detected 2 new variants displaying unusual melting peaks at approximately 54 °C and 58 °C. Sequencing of these plasmids identified the newly introduced mutations as C20209T and [20175T>G; 20179_20180delAC], respectively. Of note, the second mutation was not introduced by site-directed mutagenesis but occurred accidentally. This complex variation lowers the melting temperature of the anchor probe, resulting in a relatively sharp melting peak. Melting curves obtained with the LightCycler for ring trial samples A (20210A/[20175T>G; 20179_20180delAC]) and B (G20210/20209T) are shown in Fig. 1.
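Identifying a substitution such as C20209T from sequencing data amounts to comparing the read against the expected sequence, and the second QuikChange primer is the reverse complement of the sense oligonucleotide given in Materials and Methods. The Python sketch below illustrates both steps under stated assumptions: the wild-type string is not taken from GenBank M17262 but is built by reverting the mutagenic T at position 22 of the sense oligonucleotide to C, and the comparison handles only single-base substitutions, not deletions such as 20179_20180delAC.

```python
# Translation table mapping each DNA base to its Watson-Crick complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def substitutions(reference: str, read: str) -> list:
    """List (1-based position, reference base, read base) for each mismatch.

    Sequences must be the same length; insertions and deletions would need
    an alignment step, which this sketch deliberately omits.
    """
    if len(reference) != len(read):
        raise ValueError("equal-length sequences required (indels not handled)")
    return [(i + 1, r, q) for i, (r, q) in enumerate(zip(reference, read)) if r != q]


# Sense mutagenesis oligonucleotide from Materials and Methods, 5'->3'.
SENSE_OLIGO = "CCCAATAAAAGTGACTCTCAGTGAGCCTCAATGCTCCCAGTGC"
# Assumption: wild-type counterpart = same 43-mer with the mutagenic T
# (position 22) reverted to C; not copied from GenBank M17262.
WILD_TYPE = SENSE_OLIGO[:21] + "C" + SENSE_OLIGO[22:]

antisense = reverse_complement(SENSE_OLIGO)  # complementary mutagenesis primer
print(antisense)
print(substitutions(WILD_TYPE, SENSE_OLIGO))  # [(22, 'C', 'T')]
```

A routine sequencing workflow would align reads to the full reference fragment rather than compare fixed-length strings, which is what would be needed to describe the complex variant in sample A.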

PROFICIENCY TESTING STUDY

After removal of duplicates resulting from laboratories participating in more than 1 EQA scheme, 189 laboratories from 21 countries participated in the survey. Some features of the participants are shown in Table 1.

These laboratories applied a large variety of genotyping techniques, representing 50 different analytical procedures. All of the assays amplified the target sequence from the plasmids, demonstrating that the samples were suitable for the interlaboratory comparison. Genotyping results are summarized in Table 2 and can be divided into usual and unusual results. Usual results have typical genotyping outcomes (e.g., in terms of sequence) or test system responses (e.g., melting curve or bands on the electropherogram) that correspond to one of the expected genotypes, namely homozygous wild-type (G20210/G20210), heterozygous (G20210/20210A), or homozygous G20210A mutant (20210A/20210A), with the surrounding sequence conserved as in the vast majority of individuals. Results are called unusual when the sequence or the test system response does not correspond to expectations for the majority of individuals, for example abnormal melting curves, unexpected nucleotide sequences, or a band of unanticipated size in the electropherogram. More detailed analysis showed that the performance of individual laboratories did not depend on the type of laboratory, the spectrum of tests offered, or the accreditation status (data not shown). Table 3 displays the use of different genotype controls in the participating laboratories.

Table 1. Details on participating laboratories.
Characteristic | n
Type of center activity (188 answering)
  University | 48
  Research | 3
  Hospital | 67
  Independent laboratory | 67
  Manufacturer | 3
Spectrum of clinical tests offered (180 answering)
  General | 129
  Only genetic tests | 36
  Other (e.g., hemostasis) | 15
Conditions tested (115 answering)
  Only risk factors | 61
  Monogenic diseases and risk factors | 54
Accreditation status (179 answering)
  ISO 17025 | 26
  ISO 15189 | 43
  Other (e.g., ISO 9001) | 41
  Accredited | 104
  Not accredited | 75
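The distinction between usual and unusual results can be made concrete for a melting-curve assay. The Python sketch below is a hypothetical illustration, not any participant's or manufacturer's algorithm: it assigns each observed melting peak to the nearest expected allele peak and flags the result as unusual when a peak falls outside a matching window or the peak pattern fits no expected genotype. The reference temperatures are the approximate values reported in this study for 3 published LightCycler assays (about 60 °C for wild-type and 53 °C for 20210A, with the C20209T variant near 50.5 °C); the ±1.5 °C window and the example inputs are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

# Approximate melting peaks (°C) reported for 3 published LightCycler assays:
# ~60 °C for the wild-type allele and ~53 °C for 20210A (used as references).
EXPECTED_PEAKS = {"G20210": 60.0, "20210A": 53.0}
TOLERANCE_C = 1.5  # hypothetical matching window, not taken from any kit insert


@dataclass
class Call:
    genotype: Optional[str]  # None when the result is unusual
    unusual: bool
    note: str


def call_from_peaks(peaks: List[float]) -> Call:
    """Report a usual genotype, or flag the result as unusual."""
    alleles = []
    for t in peaks:
        # Nearest expected allele peak to the observed temperature.
        allele, ref = min(EXPECTED_PEAKS.items(), key=lambda kv: abs(kv[1] - t))
        if abs(ref - t) > TOLERANCE_C:
            return Call(None, True, f"peak at {t:.1f} °C matches no expected allele")
        alleles.append(allele)
    if len(alleles) == 1:
        return Call(f"{alleles[0]}/{alleles[0]}", False, "single expected peak")
    if len(alleles) == 2 and set(alleles) == set(EXPECTED_PEAKS):
        return Call("G20210/20210A", False, "both expected peaks")
    return Call(None, True, "unexpected peak pattern")


print(call_from_peaks([60.2]))        # usual: homozygous wild-type
print(call_from_peaks([59.8, 53.1]))  # usual: heterozygous
print(call_from_peaks([60.1, 50.5]))  # unusual: peak near 50.5 °C
```

A real evaluation would also consider peak height and shape; the study notes, for example, that the C20209T peak was smaller than usual, which this sketch ignores.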

Table 2. Overview of genotyping results and error rates.
Result | Sample A(a) | Sample B | Sample C | Sample D | Total
Usual results, n(b) | 112 | 104 | 189 | 189 | 594
  Correct, n | 102 | 100 | 177 | 180 | 559
  False, n | 7 | 3 | 12 | 8 | 30
  Not reported, n | 3 | 1 | 0 | 1 | 5
  Error rate, % | 6.4 | 2.9 | 6.3 | 4.3 | 5.1
Unusual results, n(c) | 67 | 85 | NA(d) | NA | 152
  Not described (false), n (%)(e) | 50 (74.6) | 33 (38.8) | NA | NA | 83 (54.6)
  Technical issue, n (%)(f) | 10 (14.9) | 22 (25.9) | NA | NA | 32 (21.1)
  Recognized variant, n (%) | 7 (10.5) | 30 (35.3) | NA | NA | 37 (24.3)

a Without allelic discrimination.
b Results where the test system response (e.g., melting curve or electropherogram) was as expected and corresponded to a homozygous wild-type (G20210/G20210), heterozygous (G20210/20210A), or homozygous G20210A mutant (20210A/20210A) genotype. These results were not influenced by the sequence variants in samples A and B.
c In 67 and 85 laboratories for samples A and B, respectively, the sequence variants affected the methods, and unusual genotyping results (e.g., abnormal melting curves or electropherograms, unexpected nucleotide changes in sequencing data) were obtained that should have been detected and reported accordingly. Some of these unusual results were detected and reported either as "technical issues" or as "recognized variants"; however, many participants did not describe these atypical observations and reported the results as one of the expected genotypes without additional remarks.
d NA, not applicable.
e Reported as one of the expected genotypes, although the sequence variant produced an atypical genotyping pattern with the technique applied.
f Genotypes not reported, but observations described as unusual results of presumed technical origin.
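The error rates in Table 2 match the ratio of false results to the usual results that were actually reported (usual results minus not reported). This denominator is inferred from the published numbers rather than stated in the text; the Python sketch below reproduces the per-sample rates of 6.4%, 2.9%, 6.3%, and 4.3% and the pooled 5.1% under that assumption.

```python
# Counts from Table 2: (usual results, false, not reported) for each sample.
TABLE2 = {
    "A": (112, 7, 3),
    "B": (104, 3, 1),
    "C": (189, 12, 0),
    "D": (189, 8, 1),
}


def error_rate_percent(usual: int, false: int, not_reported: int) -> float:
    """False results as a percentage of usual results that were actually reported."""
    return 100.0 * false / (usual - not_reported)


for sample, counts in TABLE2.items():
    print(sample, round(error_rate_percent(*counts), 1))

# Pool the columns over all 4 samples.
usual, false, not_reported = (sum(col) for col in zip(*TABLE2.values()))
print("Total", usual, round(error_rate_percent(usual, false, not_reported), 1))
```

Summing the per-sample columns gives 594 usual results, 30 false results, and 5 unreported results.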


Table 3. Genotype controls used in the participating laboratories.
Positive control | n
Kit control | 106
G20210/20210A patient DNA | 35
G20210/G20210 and G20210/20210A patient DNA | 18
G20210/G20210 and 20210A/20210A patient DNA | 1
G20210/G20210, G20210/20210A, and 20210A/20210A patient DNA | 14
G20210/20210A patient DNA and 5 plasmids for rare SNPs | 1
Ring trial samples | 3
NIBSC-WHO reference panel(a) | 1
No data | 10

a NIBSC, National Institute for Biological Standards and Control.

Table 4. Methods used and error rates.
Method group | n | Error rate
Commercial/laboratory-developed tests
  Commercial | 111 | 16/320 (5.0%)
  LDT | 78 | 14/270 (5.2%)
Method principle
  Sequencing | 7 | 0/19
  Denaturing HPLC | 1 | 0/2
  Molecular beacon | 1 | 0/4
  PCR-RFLP | 34 | 2/134 (1.5%)(a)
  LightCycler | 78 | 7/155 (4.5%)
  Allele-specific PCR | 15 | 11/60 (18.3%)
  Reverse hybridization | 43 | 5/169 (3.0%)
  Allelic discrimination | 10 | 1/30 (3.3%)(b)

a Without the 4 false results reported incorrectly due to inadequate nomenclature.
b Only the results on sample B.
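The comparisons discussed below (allele-specific PCR vs the other methods, 18.3% vs 2.9%; commercial kits vs LDTs; samples with vs without the G20210A mutation) can be rechecked from the table counts. The study states only that a χ2 test with P < 0.05 was used and does not give the exact contingency tables, so the sketch assumes (i) allele-specific PCR at 11/60 against the pooled remaining method groups of Table 4 at 15/513, which reproduces the quoted 2.9%, and (ii) reported usual results from Table 2 as denominators. Because the expected number of errors in the allele-specific PCR group is below 5, a Fisher exact test is added as a cross-check; it is not a test reported in the study.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square without continuity correction for [[a, b], [c, d]].

    Returns (statistic, p-value) for 1 degree of freedom, using
    P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return stat, math.erfc(math.sqrt(stat / 2))


def fisher_exact_2x2(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value: sum of tables no more likely than observed."""
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    probs = [prob(x) for x in range(max(0, c1 - r2), min(r1, c1) + 1)]
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# Each row is [false, correct]; counts assumed from Tables 2 and 4 (see above).
COMPARISONS = {
    "allele-specific PCR vs other methods": (11, 49, 15, 498),
    "commercial kits vs LDTs": (16, 304, 14, 256),
    "samples A+C vs B+D": (19, 279, 11, 280),
}

for label, table in COMPARISONS.items():
    stat, p_chi = chi_square_2x2(*table)
    p_fisher = fisher_exact_2x2(*table)
    print(f"{label}: chi2 = {stat:.2f}, P = {p_chi:.3g}, Fisher P = {p_fisher:.3g}")
```

Under these assumptions only the allele-specific PCR comparison falls below P = 0.05, in line with the text, which describes the other 2 differences as not statistically significant.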

USUAL RESULTS

Genotyping assays for the G20210A mutation in the prothrombin gene usually yield 1 of 3 genotypes: homozygous wild-type (G20210/G20210), heterozygous (G20210/20210A), or homozygous G20210A mutant (20210A/20210A). All methods provided such results on samples C and D, and the majority of techniques also did so on samples A and B. Among these results, the error rate was 5.1%, which corresponds to previous observations in the literature (26–29). The error rates were higher on samples A and C, which contain the G20210A mutation (6.4% and 6.3%, respectively), than on samples B and D, which do not (2.9% and 4.3%, respectively); however, this difference was not statistically significant.

Commercial kits as well as several laboratory-developed tests (LDTs) are currently used to identify the prothrombin G20210A mutation. Table 4 presents the error rates on usual results for the different techniques. The most frequently applied methods were LightCycler [fluorescence resonance energy transfer (FRET)], reverse hybridization, and PCR-RFLP. Reverse hybridization assays were carried out with commercial kits. Sixty of the 78 LightCycler users applied the Roche Factor II (Prothrombin) G20210A mutation detection kit. Among PCR-RFLP methods, most laboratories applied the original method published by Poort et al. (20) or slightly modified versions of it, whereas only 4 laboratories used an improved method (30) that contains an additional restriction site as a digestion control. Although more laboratories used commercial kits than LDTs for PCR-RFLP and allele-specific PCR, approximately 63% of the prothrombin G20210A tests were performed in laboratories using LDTs. It should be emphasized that the difference in error rates between commercial and laboratory-developed tests was not statistically significant.

Allele-specific amplification PCR showed a much higher error rate than the other methods (18.3% vs 2.9%), and 6 of 15 laboratories (40%) using this technique reported false results. Within this group, no statistically significant differences in performance were seen between commercial kits and LDTs. Although these findings cannot be explained solely by shortcomings of the technique, the method appears to be less reliable.

Altogether, 30 incorrect genotypes were reported by 19 participants for results with a usual genotyping pattern (Table 4). Further analysis showed that 21 (70%) of these failures came from only 9 laboratories. Two of these laboratories reported results with inappropriate nomenclature that indicated only the presence or absence of the G20210A mutation and did not distinguish heterozygous from homozygous G20210A genotypes. Such results were not considered adequately and correctly reported (31), because the different risk levels associated with the heterozygous and homozygous genotypes may influence clinical decisions. Careful comparison of the available raw data with the reported genotypes revealed that 2 participants mixed up findings postanalytically. In 2 other cases, the genotypes were assigned incorrectly although the raw data showed technically correct results. One laboratory using an allele-specific PCR kit submitted results despite obtaining incorrect findings on its control samples. One participant using an inappropriately designed LightCycler assay mistyped 3 samples.

UNUSUAL RESULTS

Detailed analysis of the results for sample A revealed that the [20175T>G; 20179_20180delAC] mutation interfered with the assay applied in 67 laboratories, leading to unusual genotyping results. In addition, most results from the 10 laboratories using TaqMan allelic discrimination (AD) assays indicated impaired amplification of the wild-type sequence due to the [20175T>G; 20179_20180delAC] mutation. Sixty laboratories used the LightCycler instrument with the LightCycler Prothrombin G20210A Mutation Detection Kit (Roche), and 1 more participant used a laboratory-developed method similar to that of this kit. With these methods, the [20175T>G; 20179_20180delAC] mutation lowered the melting temperature of the anchor probe, producing a melting peak approximately 1.7 °C lower than expected for the wild-type allele (Fig. 1). Although this peak was within the range of 59 ± 2.5 °C indicated by the manufacturer, the difference between the wild-type and mutant melting peaks did not fall within the range specified in the kit instructions (10 ± 1.5 °C). Probably because of this small deviation from the expected values, only 23% of the laboratories reported unusual melting peaks for this sample, and approximately half of these laboratories noted that such peaks could indicate a variant in the probe region. Three laboratories used sequencing as their routine procedure. In addition, one LightCycler user who observed the unusual melting curves sequenced the region to identify the variant. All 4 laboratories identified the mutation, although in 1 case it was not correctly described. One laboratory applied a denaturing HPLC (dHPLC) assay; this participant detected the presence of the variant but mistyped the 20210A mutant allele as wild-type in sample A.
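The 2 kit criteria quoted above can be written as a small check. The Python sketch reads the ranges as 59 ± 2.5 °C for the wild-type peak and 10 ± 1.5 °C for the wild-type-to-mutant separation. The peak temperatures in the example are hypothetical values chosen to show how a peak about 1.7 °C below the usual wild-type position can pass the first criterion and still fail the second; they are not measurements from the study, and the function is not Roche software.

```python
from typing import NamedTuple


class MeltCheck(NamedTuple):
    wild_type_in_range: bool
    separation_in_range: bool


WT_TARGET_C, WT_TOL_C = 59.0, 2.5    # wild-type peak: 59 ± 2.5 °C
SEP_TARGET_C, SEP_TOL_C = 10.0, 1.5  # wild-type minus mutant peak: 10 ± 1.5 °C


def check_peaks(wild_type_c: float, mutant_c: float) -> MeltCheck:
    """Apply both melting-peak criteria quoted for the kit."""
    wt_ok = abs(wild_type_c - WT_TARGET_C) <= WT_TOL_C
    sep_ok = abs((wild_type_c - mutant_c) - SEP_TARGET_C) <= SEP_TOL_C
    return MeltCheck(wt_ok, sep_ok)


print(check_peaks(59.0, 49.0))  # hypothetical typical heterozygote
print(check_peaks(57.3, 49.0))  # hypothetical peak shifted ~1.7 °C lower
```

In the second, hypothetical case the shifted peak still lies inside the manufacturer's wild-type window, which illustrates why a small shift can go unnoticed if only the first criterion is checked.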
Two laboratories using PCR-RFLP methods with high-resolution electrophoresis (PAGE or the Agilent Lab-on-a-Chip) found a double band instead of the expected single band for the wild-type allele. The second band probably corresponded to undigested heteroduplex of the 2 alleles, whose electrophoretic mobility is reduced by the loop formed at the mismatched positions. Both laboratories noted this unusual finding. Ten of the participating laboratories used different TaqMan AD assays. Seven of these laboratories reported a homozygous mutant (20210A/20210A) genotype for sample A, 1 participant genotyped the sample as heterozygous (G20210/20210A), and 2 laboratories, upon finding unusual AD plots, genotyped the sample by PCR-RFLP. The few available raw data (for 2 of the 10 laboratories) revealed that the sequence alteration in sample A shifted the AD plot, depending on the primer sequences. Where AD plots were affected, amplification of the wild-type sequence was probably inefficient because the [20175T>G; 20179_20180delAC] mutation impaired primer binding. The lack of raw data did not allow further evaluation of the AD results for sample A.

The C20209T mutation in sample B gave unusual genotyping results in 85 laboratories. Seventy-eight of these laboratories applied the LightCycler technique with 6 different methods. Because of the C20209T mutation, the melting peak was located at approximately 54 °C with the Roche Factor II (Prothrombin) G20210A mutation detection kit. In 3 published LightCycler-based assays, the melting peak is at approximately 60 °C for the wild-type allele and close to 53 °C for the G20210A mutation. With these methods, the C20209T variant appeared as a peak at approximately 50.5 °C; however, this peak was smaller than the usual ones, which may make these results slightly more difficult to evaluate. Seven participants tested the samples by sequencing as their standard procedure. Four of these assays were minisequencing techniques (e.g., pyrosequencing) that determine only a few bases adjacent to the mutation of interest, and one of these pyrosequencing methods was not able to detect the C20209T mutation. Of the 6 other laboratories performing sequencing, only 1 reported the C20209T mutation, although it could clearly be identified from the raw data provided. Two participants sequenced this sample because of the unusual melting curves obtained with the LightCycler; the returned sequences identified the C20209T mutation. One laboratory used a PCR-dHPLC technique both with and without the addition of wild-type sequences. This participant reported sample B as heterozygous for the G20210A mutation (wild-type/20210A) because the sample gave a chromatographic pattern that could not be distinguished from that of heterozygous G20210A samples.

Although most laboratories did not report the unusual results they obtained on samples A and B, other participants recognized the presence of the sequence variants or attributed the unusual behavior of the assay to technical issues, including poor quality or low concentration of the DNA. The other techniques, used for genotyping in 93 of 179 laboratories (52.0%), were not influenced by the additional mutations in samples A and B.

Discussion

Both method validation and internal/external quality control require samples of known genotypes. However, access to samples carrying rare genotypes (such as the homozygous 20210A mutation in the prothrombin gene) or harboring rare SNPs (such as the 20209T allele) is limited (32). This restricted availability may contribute to the relatively high number of failures in the identification of G20210A mutant genotypes (both heterozygous and homozygous). Therefore, the availability of CRMs or QC samples is of prime importance. Three plasmid-based CRMs whose suitability has been carefully assessed are now available from IRMM (http://www.irmm.jrc.be) for testing of the G20210A mutation (16–18). The plasmids are designed so that they can be used with all methods published up to 2006. Starting from these plasmid-type RMs, the desired mutations can be introduced by site-directed mutagenesis, allowing the design of test samples with rare genotypes or mutations.

Quality issues in human genetic testing are particularly important because these tests are in principle carried out only once in a lifetime, and false results can lead to inadequate treatment or preventive measures. EQA is the key mechanism for assessing the performance of diagnostic medical laboratories and the efficiency of their testing methods. Ring trials also serve as educational and training tools that help participants sustain improvements in the quality of their services. Despite these efforts, relatively high error rates for clinical genetic testing of thrombophilic mutations have been reported, persisting at approximately 5% without notable improvement in recent years (26–29). Overall, this interlaboratory comparison showed the expected error rates on the wild-type and G20210A mutation samples without additional mutations. Such an error rate in SNP detection may result in thousands of misclassifications yearly in the participant laboratories, which emphasizes the need for closer scrutiny to identify sources of error and to eliminate weaknesses.
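The estimate of thousands of misclassifications per year depends on annual test volume, which the participants were asked to report but which is not given here. Working backwards makes the reasoning explicit: at a constant error rate of about 5%, 1 in 20 genotypes is wrong, so 1000 misclassifications correspond to 20 000 tests per year, or about 106 tests per laboratory across the 189 participants. The Python sketch below does this arithmetic; treating the proficiency-testing error rate as if it applied unchanged to routine patient testing is an assumption.

```python
import math
from fractions import Fraction

ERROR_RATE = Fraction(5, 100)  # ~5%, assumed to apply unchanged to routine tests
N_LABS = 189                   # laboratories that participated in this study


def expected_errors(tests_per_year: int, rate: Fraction = ERROR_RATE) -> Fraction:
    """Expected number of misclassified genotypes per year."""
    return tests_per_year * rate


def volume_for(target_errors: int, rate: Fraction = ERROR_RATE) -> int:
    """Smallest annual test volume at which expected errors reach target_errors."""
    return math.ceil(target_errors / rate)


total_tests = volume_for(1000)
print(total_tests, round(total_tests / N_LABS))
```

Exact fractions are used so that the rounding of 0.05 in binary floating point cannot affect the ceiling.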
In this study, the vast majority of false results arose from inadvertent errors by laboratory personnel rather than from technical failure of the methods as such. The allele-specific PCR assays did, however, prove less robust than other techniques, and the increased error rate of certain LDTs indicates that they, as a group, need more careful validation. Although the incidence of rare sequence variants that interfere with some genotyping techniques is relatively low, results on samples carrying such variants can easily be misinterpreted, leading to an even higher error rate. Most of these variants were discovered because of the unusual melting temperatures observed with hybridization probes on the LightCycler instrument, which is widely used in clinical and research laboratories. Furthermore, all PCR-based methods can be compromised when mutations occur in the primer annealing region, resulting in insufficient amplification, unusual observations, and eventually inconclusive results.

In previous studies, proficiency testing was carried out on the most common SNPs, and those studies did not challenge the competence of testing laboratories to recognize and correctly report sequence variants adjacent to the mutation of interest. The present exercise revealed that, where additional sequence variants produced unusual and therefore in principle detectable genotyping data, only a fraction of the laboratories recognized and adequately reported the unexpected SNPs. Although the G20210A mutation is a well-characterized risk factor for venous thromboembolism, the effect of these adjacent variants on analysis results has rarely been investigated. Such rare polymorphisms should be reported as variants of unknown clinical significance, clearly distinct from the wild-type or G20210A mutation alleles (33). In a proficiency test, most participants who reported the unusual results as technical issues probably did not apply another technique to check for rare sequence variants, as they usually would for real patient samples. Interestingly, most of the participants applying sequencing as the routine procedure for detecting the G20210A mutation did not report the C20209T variant, although it could be detected from their raw data. Other laboratories sequenced the fragment to understand the reason for the unusual melting curves and were able to identify the additional mutation. Although the number of participants using sequencing techniques was relatively low in this ring trial, the results agree with the findings of the recent EQUALseq studies (methodologic European external quality assurance for DNA sequencing) (34), in which single-base changes also often went unnoticed.

Moreover, participants who reported screening for previously unknown mutations in monogenic disorders did not recognize the rare variants at a higher rate than laboratories testing predominantly for well-defined SNPs. These observations underline the pivotal importance of laboratory personnel skills and suggest that training may be an efficient tool to improve the quality of genotyping services (35).

Conclusion

This study revealed that DNA analysis for the detection of the prothrombin G20210A mutation is reliable in a majority of laboratories. However, a fraction of the participants were not able to recognize and adequately report the unusual genotyping results caused by rare SNPs. The majority of failures could be avoided by improved training and careful selection and validation of the methods applied.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest.

Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript.

Acknowledgments: We thank Prof. Hans Reinauer and Ralf Flörke (Instand e.V., Düsseldorf, Germany), Prof. Michael Neumaier and Rolf Kruse (DGKL–Referenzinstitut für Bioanalytik, Bonn, Germany), and Erika Sárkány (QualiCont Kht., Szeged, Hungary) for their support and contribution to the ring trial. We gratefully acknowledge the participation of all of the laboratories.

References

1. Teupser D, Rupprecht W, Lohse P, Thiery J. Fluorescence-based detection of the CETP TaqIB polymorphism: false positives with the TaqMan-based exonuclease assay attributable to a previously unknown gene variant. Clin Chem 2001;47:852–7.
2. Lay MJ, Wittwer CT. Real-time fluorescence genotyping of factor V Leiden during rapid-cycle PCR. Clin Chem 1997;43:2262–7.
3. Jeffrey GP, Chakrabarti S, Hegele RA, Adams PC. Polymorphism in intron 4 of HFE may cause overestimation of C282Y homozygote prevalence in haemochromatosis. Nat Genet 1999;22:325–6.
4. Lyon E. Discovering rare variants by use of melting temperature shifts seen in melting curve analysis [Editorial]. Clin Chem 2005;51:1331–2.
5. Graham R, Liew M, Meadows C, Lyon E, Wittwer CT. Distinguishing different DNA heterozygotes by high-resolution melting. Clin Chem 2005;51:1295–8.
6. Mahadevan MS, Benson PV. Factor V null mutation affecting the Roche LightCycler factor V Leiden assay. Clin Chem 2005;51:1533–5.
7. Liew M, Pryor R, Palais R, Meadows C, Erali M, Lyon E, et al. Genotyping of single nucleotide polymorphisms by high-resolution melting of small amplicons. Clin Chem 2004;50:1156–64.
8. Lyon E, Millson A, Phan T, Wittwer CT. Detection and identification of base alterations within the region of factor V Leiden by fluorescent melting curves. Mol Diagn 1998;3:203–10.
9. Phillips M, Meadows CA, Huang MY, Millson A, Lyon E. Simultaneous detection of C282Y and H63D hemochromatosis mutations by dual-color probes. Mol Diagn 2000;5:107–16.
10. Tag CG, Gressner AM, Weiskirchen R. An unusual melting curve profile in LightCycler multiplex genotyping of the hemochromatosis H63D/C282Y gene mutations. Clin Biochem 2001;34:511–5.
11. Warshawsky I, Hren C, Sercia L, Shadrach B, Deitcher SR, Newton E, et al. Detection of a novel point mutation of the prothrombin gene at position 20209. Diagn Mol Pathol 2002;11:152–6.
12. Wylenzek M, Geisen C, Stapenhorst L, Wielckens K, Klingler KR. A novel point mutation in the 3′ region of the prothrombin gene at position 20221 in a Lebanese/Syrian family [Letter]. Thromb Haemost 2001;85:943–4.
13. Schrijver I, Lenzi TJ, Jones CD, Lay MJ, Druzin ML, Zehnder JL. Prothrombin gene variants in non-Caucasians with fetal loss and intrauterine growth retardation. J Mol Diagn 2003;5:250–3.
14. Tag CG, Schifflers M-C, Mohnen M, Gressner AM, Weiskirchen R. Atypical melting curve resulting from genetic variation in the 3′-untranslated region at position 20218 in the prothrombin gene analyzed with the LightCycler factor II (prothrombin) G20210A assay [Letter]. Clin Chem 2005;51:1560–1.
15. Klein CL, Márki-Zay J, Corbisier P, Gancberg D, Cooper S, Gemmati D, et al. Reference materials (RMs) for analysis of the human factor II (prothrombin) gene G20210A mutation. Clin Chem Lab Med 2005;43:862–8.
16. Gancberg D, Márki-Zay J, Corbisier P, Klein C, Schimmel H, Emons H. Certification of a reference material consisting of purified plasmid DNA containing a fragment from the human prothrombin gene (wild-type): certified reference material IRMM/IFCC-490. Geel (Belgium): European Commission, Directorate-General Joint Research Centre, Institute for Reference Materials and Measurements; 2006. 35 p. Report EUR 22169 EN.
17. Gancberg D, Márki-Zay J, Corbisier P, Klein C, Schimmel H, Emons H. Certification of a reference material consisting of purified plasmid DNA containing a fragment from the human prothrombin gene (G20210A mutant): certified reference material IRMM/IFCC-491. Geel (Belgium): European Commission, Directorate-General Joint Research Centre, Institute for Reference Materials and Measurements; 2006. 35 p. Report EUR 22170 EN.
18. Gancberg D, Márki-Zay J, Corbisier P, Klein C, Schimmel H, Emons H. Certification of a reference material consisting of purified plasmid DNA containing a fragment from the human prothrombin gene (heterozygous G20210 wild-type/G20210A mutant): certified reference material IRMM/IFCC-492. Geel (Belgium): European Commission, Directorate-General Joint Research Centre, Institute for Reference Materials and Measurements; 2006. 33 p. Report EUR 22167 EN.
19. Gancberg D, Corbisier P, Meeus N, Marki-Zay J, Mannhalter C, Schimmel H. Certification of reference materials for detection of the human prothrombin gene G20210A sequence variant. Clin Chem Lab Med 2008;46:463–9.
20. Poort SR, Rosendaal FR, Reitsma PH, Bertina RM. A common genetic variation in the 3′-untranslated region of the prothrombin gene is associated with elevated plasma prothrombin levels and an increase in venous thrombosis. Blood 1996;88:3698–703.
21. McGlennen RC, Key NS. Clinical and laboratory management of the prothrombin G20210A mutation. Arch Pathol Lab Med 2002;126:1319–25.
22. Wylenzek C, Trubenbach J, Gohl P, Wildhardt G, Alkins S, Fausett MB, et al. Mutation screening for the prothrombin variant G20210A by melting point analysis with the LightCycler system: atypical results, detection of the variant C20209T and possible clinical implications. Clin Lab Haematol 2005;27:343–6.
23. Clench T, Standen GR, Ryan E, Chilcott JL, Mumford AD. Rapid detection of the prothrombin C20209T transition by LightCycler analysis. Thromb Haemost 2005;94:1114–5.
24. van der Putten HH, Spaargaren-van Riel CC, Bertina RM, Vos HL. Functional analysis of two prothrombin 3′-untranslated region variants: the C20209T variant, mainly found among African-Americans, and the C20209A variant. J Thromb Haemost 2006;4:2285–7.
25. Meadows CA, Warner D, Page S, Lyon E. Detection of novel mutation using fluorescent hybridization probes and melting temperature analysis [Abstract]. J Mol Diagn 2001;3:195.
26. Jennings I, Kitchen S, Woods TA, Preston FE. Multilaboratory testing in thrombophilia through the United Kingdom National External Quality Assessment Scheme (Blood Coagulation) Quality Assurance Program. Semin Thromb Hemost 2005;31:66–72.
27. Tripodi A, Chantarangkul V, Menegatti M, Tagliabue L, Peyvandi F. Performance of clinical laboratories for DNA analyses to detect thrombophilia mutations. Clin Chem 2005;51:1310–1.
28. Neumaier M, Braun A, Gessner R, Funke H. Experiences with external quality assessment (EQA) in molecular diagnostics in clinical laboratories in Germany. Working Group of the German Societies for Clinical Chemistry (DGKC) and Laboratory Medicine. Clin Chem Lab Med 2000;38:161–3.
29. Tripodi A, Peyvandi F, Chantarangkul V, Menegatti M, Mannucci PM. Relatively poor performance of clinical laboratories for DNA analyses in the detection of two thrombophilic mutations: a cause for concern. Thromb Haemost 2002;88:690–1.
30. Danneberg J, Abbes AP, Bruggeman BJ, Engel H, Gerrits J, Martens A. Reliable genotyping of the G-20210-A mutation of coagulation factor II (prothrombin). Clin Chem 1998;44:349–51.
31. Spector EB, Grody WW, Matteson CJ, Palomaki GE, Bellissimo DB, Wolff DJ, et al. Technical standards and guidelines: venous thromboembolism (factor V Leiden and prothrombin 20210G>A testing): a disease-specific supplement to the standards and guidelines for clinical genetics laboratories. Genet Med 2005;7:444–53.
32. Chen B, O'Connell CD, Boone DJ, Amos JA, Beck JC, Chan MM, et al. Developing a sustainable process to provide quality control materials for genetic testing. Genet Med 2005;7:534–49.
33. Richards CS, Bale S, Bellissimo DB, Das S, Grody WW, Hegde MR, et al., Molecular Subcommittee of the ACMG Laboratory Quality Assurance Committee. ACMG recommendations for standards for interpretation and reporting of sequence variations: revisions 2007. Genet Med 2008;10:294–300.
34. Ahmad-Nejad P, Dorn-Beineke A, Pfeiffer U, Brade J, Geilenkeuser WJ, Ramsden S, et al. Methodologic European external quality assurance for DNA sequencing: the EQUALseq program. Clin Chem 2006;52:716–27.
35. Dorn-Beineke A, Ahmad-Nejad P, Pfeiffer U, Ramsden S, Pazzagli M, Neumaier M. Improvement of technical and analytical performance in DNA sequencing by external quality assessment-based molecular training. Clin Chem 2006;52:2072–8.


Clinical Chemistry 55:4 748–756 (2009)

Molecular Diagnostics and Genetics

Coamplification at Lower Denaturation Temperature–PCR Increases Mutation-Detection Selectivity of TaqMan-Based Real-Time PCR

Jin Li,1 Lilin Wang,1 Pasi A. Jänne,2 and G. Mike Makrigiorgos1*

BACKGROUND: DNA genotyping with mutation-specific TaqMan® probes (Applied Biosystems) is broadly used in detection of single-nucleotide polymorphisms but less so for somatic mutations because of its limited selectivity for low-level mutations. We recently described coamplification at lower denaturation temperature–PCR (COLD-PCR), a method that amplifies minority alleles selectively from mixtures of wild-type and mutation-containing sequences during the PCR. We demonstrate that combining COLD-PCR with TaqMan technology provides TaqMan genotyping with the selectivity needed to detect low-level somatic mutations.

METHODS: Minor-groove binder–based or common TaqMan probes were designed to contain a nucleotide that matches the desired mutation approximately in the middle of the probe. The critical denaturation temperature (Tc) of each amplicon was then experimentally determined. COLD-PCR/TaqMan genotyping was performed in 2 steps: denaturation at the Tc, followed by annealing and extension at a single temperature (fast COLD-PCR). The threshold cycle was used to identify mutations on the basis of serial dilutions of mutant DNA into wild-type DNA and to identify TP53 (tumor protein p53) and EGFR [epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)] mutations in tumors.

RESULTS: COLD-PCR/TaqMan genotyping identified G>A mutations within TP53 exon 8 (codon 273 mutation hot spot) and C>T mutations within the EGFR gene (drug-resistance mutation T790M) with a selectivity improvement of 15- to 30-fold over regular PCR/TaqMan genotyping. A second round of COLD-PCR/TaqMan genotyping improved the selectivity by another 15- to 30-fold and enabled detection of 1 mutant in 2000 wild-type alleles. Use of COLD-PCR/TaqMan genotyping allowed quantitative identification of low-level TP53 and T790M mutations in colon tumor samples and in non–small-cell lung cancer cell lines treated with kinase inhibitors.

CONCLUSIONS: The major improvement in selectivity provided by COLD-PCR enables the popular TaqMan genotyping method to become a powerful tool for detecting low-level mutations in clinical samples.

1 Department of Radiation Oncology, Divisions of Genomic Stability and DNA Repair, and Medical Physics, and 2 Department of Medical Oncology, Lowe Center for Thoracic Oncology, Dana Farber Cancer Institute, Harvard Medical School, Boston, MA.
* Address correspondence to this author at: Brigham and Women's Hospital, Level L2, Radiation Therapy, 75 Francis St., Boston, MA 02115, USA. Fax (617) 587-6037; e-mail [email protected].

© 2008 American Association for Clinical Chemistry

Mutation detection plays a key role in the diagnosis, treatment, and prognosis assessment of cancer patients (1). Methods used for mutation detection include sequencing (2, 3), RFLP analysis (4), MALDI-TOF analysis (5), denaturing HPLC/Surveyor™ (6, 7), ligation-mediated PCR (8), high-resolution melting (9, 10), peptide nucleic acid (PNA)–locked nucleic acids (11), antiprimer quenching real-time PCR (12, 13), Scorpion primers (14), molecular beacons, and methods based on TaqMan® probes (Applied Biosystems) (15, 16). Because of its simplicity and speed, TaqMan genotyping is frequently used as an end-point approach (17). The reaction consists of 2 primers and 2 probes that match either the wild-type or the mutant allele. The polymorphic nucleotide is usually designed to be in the middle third of the probe, which is labeled with a reporter molecule at the 5′ end and with a nonfluorescent quencher at the 3′ end. Modifications of the TaqMan probe with minor-groove binders (MGBs) (18) or locked nucleic acids (19) increase the probe's Tm (temperature at which 50% of the probe is denatured from the template), allowing the design of shorter probes and better discrimination between mutant and wild-type alleles. The selectivity limit of TaqMan genotyping is the detection of mutant alleles present at an abundance of approximately 10%–20% of that of the wild-type allele (17, 20).

Received June 24, 2008; accepted October 14, 2008.
Previously published online at DOI: 10.1373/clinchem.2008.113381
3 Nonstandard abbreviations: PNA, peptide nucleic acid; MGB, minor-groove binder; Tm, temperature at which 50% of the probe is denatured from the complementary strand; COLD-PCR, coamplification at lower denaturation temperature–PCR; Tc, critical denaturation temperature; NSCLC, non–small-cell lung cancer; TaqMAMA, allele-specific PCR-based TaqMan genotyping.

Real-Time COLD-PCR

Because the frequencies of somatic mutations can often be lower (6, 21), this limit poses problems for the use of TaqMan genotyping in screening for somatic mutations in surgical tumor samples or bodily fluids, which are often contaminated with wild-type alleles. We recently described a new form of PCR, coamplification at lower denaturation temperature–PCR (COLD-PCR), which preferentially enriches "minority alleles" from mixtures of wild-type and mutation-containing sequences, irrespective of where a mutation lies in the sequence (22). COLD-PCR is based on the observations that (a) for each DNA sequence there is a critical denaturation temperature (Tc) that is lower than the Tm of the target sequence and below which PCR efficiency drops abruptly, and (b) Tc is dependent on the DNA sequence. DNA amplicons differing by a single nucleotide have substantially different and reproducible amplification efficiencies when the PCR denaturation temperature is set to the Tc. These features are exploited during PCR amplification to selectively enrich minority alleles that differ by one or more nucleotides at any position in a given sequence. Consequently, COLD-PCR amplification of genomic DNA yields PCR products that contain high percentages of variant alleles, thus permitting their detection. We have demonstrated that COLD-PCR improves the selectivity of RFLP analysis, denaturing HPLC/Surveyor, Sanger sequencing, pyrosequencing, and MALDI-TOF–based mutation detection by one to two orders of magnitude (22). Here we demonstrate that combining COLD-PCR with the TaqMan genotyping method provides a major improvement in the latter's ability to quantitatively detect low-level somatic mutations in tumor samples in a real-time format.

Materials and Methods

SOURCE OF GENOMIC DNA

Reference human male genomic DNA was purchased from Promega and used as wild-type DNA in dilution experiments with mutation-containing DNA. Genomic DNA from SW480 and 4 lung adenocarcinoma cell lines (H1975, H820, PC9GR, and H3255GR) was purchased from the ATCC. The H3255GR cell line was developed by exposing H3255 cells to serially increasing concentrations of gefitinib for 6 months until the cells were able to proliferate in 100 nmol/L gefitinib with growth kinetics similar to those of untreated cells (23). Similarly, the PC9GR cell line was derived by gefitinib treatment of PC9 cells (24). Snap-frozen colon tumor samples were obtained from the Massachusetts General Hospital Tumor Bank following Internal Review Board approval. DNA was extracted from cell lines and tumor samples with the DNeasy Blood & Tissue Kit (Qiagen). Primers were synthesized by Integrated DNA Technologies.

SINGLE-ROUND COLD-PCR/TaqMan GENOTYPING

COLD-PCR/TaqMan real-time genotyping for the T790M mutation encoded by EGFR exon 20. See (25, 26) for further details. Real-time PCR reactions were performed directly with 70 ng genomic DNA in the presence of 0.2 µmol/L regular TaqMan probe (5′–6-FAM-CAT GAG CTG CAT GAT GAG CTG-BHQ1–3′) or 0.1 µmol/L MGB TaqMan probe (5′–6-FAM-TGA GCT GCA TGA TGA GC-MGBNFQ–3′) that fully matches the mutation-containing sequence on DNA from H1975 cells that encodes the T790M mutation (mutation is underlined). The final concentrations of the other reagents were as follows: 1× GoTaq Flexi Buffer (Promega), 1× GoTaq Flexi DNA Polymerase (Promega), 0.2 mmol/L of each deoxynucleoside triphosphate, 0.2 µmol/L forward primer (5′–TGATGGCCAGCGTGGAC–3′), 0.2 µmol/L reverse primer (5′–CAGGAGGCAGCCGAAGG–3′), and 2.5 mmol/L MgCl2. The PCR amplicon is 104 bp. Fast COLD-PCR cycling was performed on a Cepheid SmartCycler™ as follows: 95 °C for 120 s; 20 cycles of 95 °C for 15 s and 60 °C (fluorescence reading on) for 30 s; and 30 cycles of 88 °C for 15 s and 60 °C (fluorescence reading on) for 30 s. The 88 °C Tc for this amplicon was determined experimentally, as described previously (22). In brief, a set of PCR reactions was performed at gradually decreasing denaturation temperatures (0.3 °C steps starting from the Tm), and the lowest denaturation temperature that reproducibly yielded a PCR product was chosen.

Quantification of T790M mutations in lung adenocarcinoma cell lines with COLD-PCR/TaqMan genotyping. We first used regular PCR with an intercalating dye on a 104-bp EGFR [epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian)] amplicon to quantify the copy number of EGFR exon 20; this PCR amplifies exon 20 regardless of whether a mutation is present. We used 0.1× LCGreen dye (Idaho Technology) in this reaction without a TaqMan probe. The PCR cycling conditions were 95 °C for 120 s, and 40 cycles of 95 °C for 15 s and 60 °C (fluorescence reading on) for 60 s. We also used serial dilutions of known concentrations of reference DNA as a calibration reference to quantify the copy numbers of DNA from non–small-cell lung cancer (NSCLC) cell lines.
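The Tc selection rule described above (step the denaturation temperature down from the Tm in 0.3 °C increments and keep the lowest temperature that still reproducibly yields product) can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure: the data layout (a mapping from each tested temperature to its replicate outcomes), the reading of "reproducibly" as "every replicate positive", and the example Tm of 89.5 °C are hypothetical assumptions.

```python
from typing import Dict, List, Optional


def gradient_temperatures(tm: float, step: float = 0.3, n_steps: int = 10) -> List[float]:
    """Denaturation temperatures stepping down from Tm (°C), rounded to 0.1 °C."""
    return [round(tm - i * step, 1) for i in range(n_steps)]


def select_tc(outcomes: Dict[float, List[bool]]) -> Optional[float]:
    """Return the lowest temperature at which every replicate gave a product.

    Temperatures are scanned from high to low; the scan stops at the first
    temperature where any replicate failed, so an isolated success further
    down the gradient is not accepted as "reproducible".
    """
    tc: Optional[float] = None
    for temp in sorted(outcomes, reverse=True):
        replicates = outcomes[temp]
        if replicates and all(replicates):
            tc = temp
        else:
            break
    return tc


if __name__ == "__main__":
    # Hypothetical gradient: Tm of 89.5 °C, three replicates per temperature.
    temps = gradient_temperatures(89.5, step=0.3, n_steps=7)
    results = [[True] * 3] * 5 + [[True, False, True], [False] * 3]
    outcomes = dict(zip(temps, results))
    print(temps)
    print("Tc =", select_tc(outcomes))
```

Stopping at the first failing temperature, rather than taking the global minimum of all positive temperatures, is one way to encode "reproducibly"; a laboratory could equally require a minimum number of positive replicates instead.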

4 Human genes: EGFR, epidermal growth factor receptor (erythroblastic leukemia viral (v-erb-b) oncogene homolog, avian); TP53, tumor protein p53.


After determining EGFR allele copy numbers, we tested cell line DNA containing equivalent numbers of EGFR exon 20 copies with COLD-PCR/TaqMan genotyping to quantify the relative amounts of T790M mutations. Our assessment of the amount of mutant T790M allele as a percentage of the wild-type allele with COLD-PCR/TaqMan genotyping was based on a calibration curve of serial dilutions of known amounts of mutation-containing DNA added to wild-type DNA.

COLD-PCR/TaqMan real-time genotyping of the G>A mutation in codon 273 of the TP53 exon 8 fragment. See (27) for further details. Real-time PCR reactions were performed directly from 20 ng genomic DNA in the presence of 0.2 µmol/L of a TaqMan probe (5′–6-FAM-TTT GAG GTG CAT GTT TGT GCC-BHQ-1–3′) that fully matches the mutation-containing sequence in DNA from SW480 cells (mutation is underlined). The final concentrations of the other reagents were as follows: 1× GoTaq Flexi Buffer, 1× GoTaq Flexi DNA Polymerase, 0.2 mmol/L of each deoxynucleoside triphosphate, 0.2 µmol/L forward primer (5′–TGG TAA TCT ACT GGG ACG–3′), 0.2 µmol/L reverse primer (5′–CGG AGA TTC TCT TCC TCT–3′), and 3 mmol/L MgCl2. The size of the COLD-PCR amplicon was 87 bp, with a Tc of 83.5 °C defined experimentally as described above. The fast COLD-PCR cycling conditions were as follows: 95 °C for 120 s; 25 cycles of 95 °C for 15 s and 58 °C (fluorescence reading on) for 60 s; and 25 cycles of 83.5 °C for 15 s and 58 °C (fluorescence reading on) for 60 s. Experiments were repeated at least 5 times in independent experiments.

TWO-ROUND COLD-PCR/TaqMan GENOTYPING FOR T790M MUTATIONS IN EGFR EXON 20

To examine whether a second round of COLD-PCR increases the overall selectivity of the method, we performed a nested second round of COLD-PCR/TaqMan genotyping after completing the first round. After performing the first PCR round with a 104-bp PCR amplicon as described above, we carried out a nested PCR with a 10 000-fold dilution of the first PCR product, a nested forward primer (5′–CTGGGCATCTGCCTCA–3′; amplicon size, 67 bp; Tc, 83.5 °C, defined experimentally as described above), and the same reverse primer used in the first PCR. The fast COLD-PCR cycling conditions were as follows: 95 °C for 120 s; 5 cycles of 95 °C for 15 s and 60 °C (fluorescence reading on) for 30 s; and 35 cycles of 83.5 °C for 15 s and 60 °C (fluorescence reading on) for 30 s. COLD-PCR/TaqMan genotyping experiments were repeated independently at least 5 times.

Fig. 1. Outline of COLD-PCR/TaqMan genotyping. COLD-PCR/TaqMan genotyping is a 2-step amplification that uses denaturation at a Tc and annealing/extension at a single temperature. A TaqMan probe with a centrally located nucleotide matching the mutation is used for real-time detection. WT, wild type; MUT, mutant.

Results

PRINCIPLE OF COLD-PCR/TaqMan GENOTYPING

COLD-PCR can be carried out in 2 formats, full COLD-PCR and fast COLD-PCR, depending on whether it is necessary to detect all mutations comprehensively or to detect specific Tm-reducing mutations in a rapid and highly selective fashion (22). The combination of full COLD-PCR with TaqMan genotyping can be applied for Tm-increasing mutations such as A:T>G:C or T:A>G:C or for Tm-decreasing mutations; however, the Tm of a DNA sequence is reduced for the great majority of mutations encountered in cancer samples (28), including the T790M mutation (i.e., C>T, EGFR exon 20) and the codon 273 mutation (G>A, TP53 exon 8) examined in this investigation. In view of the simplicity, speed, and high mutation enrichment achieved via fast COLD-PCR, we focused on developing the combination of fast COLD-PCR with TaqMan genotyping to detect Tm-reducing mutations. Because the present application is aimed at detecting low-level mutant alleles, the COLD-PCR/TaqMan reaction uses a single TaqMan probe specific for the mutant allele, with the mutation placed approximately in the middle of the probe (i.e., the second TaqMan probe used in conventional TaqMan genotyping to detect the wild-type allele is not needed). The cycling program includes approximately 20–25 regular PCR cycles to build up the PCR product, followed by a switch to a 2-step PCR consisting of denaturation at the Tc and then lowering to a single temperature for both primer annealing and extension (Fig. 1). At the Tc, the majority of the wild-type amplicons remain double-stranded; however, mutant amplicons are largely denatured at the Tc and function as template for primer and

Fig. 2 (plot; fluorescence vs PCR cycles): (A) Regular PCR/TaqMan genotyping (dilution of mutant into wild type), curves for 50% and 12% mutant and WT; (B) COLD-PCR/TaqMan genotyping (dilution of mutant into wild type), curves for 12%, 6%, 3%, 0.8%, and 0.1% mutant and WT.

Fig. 2. Comparison of regular PCR/TaqMan genotyping with COLD-PCR/TaqMan genotyping for the T790M C>T mutation in EGFR exon 20. Serial dilutions into wild-type DNA of DNA from the H1975 cell line containing the T790M mutation encoded by EGFR exon 20 were screened for the T790M mutation with regular PCR/TaqMan genotyping (A) and with COLD-PCR/TaqMan genotyping (MGB probe) (B).

probe binding. Lowering the temperature from the Tc to the annealing/extension temperature allows the probe to bind to the complementary mutant strand. Accordingly, COLD-PCR not only enriches the mutant but also, by keeping the wild type double-stranded, reduces the chance that the probe will bind to the mismatched wild-type strand. During the annealing and extension step, the 5′→3′ exonuclease activity of Taq polymerase digests the probe to release the reporter from the quencher, allowing the fluorescence signal to be read at this step (29). The presence and quantity of mutations are detected by recording the threshold cycle of the real-time reaction relative to that of reference samples containing known amounts of the same mutation.

IMPROVEMENT OF TaqMan GENOTYPING VIA REAL-TIME QUANTITATIVE COLD-PCR

Validation of COLD-PCR/TaqMan genotyping was done by means of serial dilutions of DNA from tumor-derived cell lines containing the gefitinib-resistance mutation [C>T at codon 790 of EGFR exon 20 (cell line H1975)] (30) or the TP53 hot-spot mutation G>A at codon 273 of TP53 exon 8 (cell line SW480) (31). These Tm-reducing mutants are suitable for enrichment via fast COLD-PCR. Fig. 2 depicts representative results comparing the selectivities of regular PCR/TaqMan genotyping and COLD-PCR/TaqMan genotyping for a 104-bp amplicon from EGFR exon 20. The selectivity limit of regular PCR/TaqMan genotyping is about 12% mutant allele (Fig. 2A); in contrast, COLD-PCR improves the selectivity to 0.8% (Fig. 2B).

Next, we tested whether COLD-PCR/TaqMan genotyping of EGFR exon 20 encoding the T790M variant can quantify the low population of T790M mutations in NSCLC cell lines. Because the EGFR gene is frequently amplified in NSCLC cells, the potential variation in copy number for EGFR exon 20 needs to be considered before T790M quantification. We applied regular PCR in the presence of the LCGreen dye to quantify the copy numbers for the 3 NSCLC lines (H820, H3255GR, and PC9GR; Fig. 3, A and B). H3255GR exhibits 16-fold amplification, and H820 and PC9GR exhibit 4-fold amplification. On the basis of this quantification, we diluted the genomic DNA from these 3 cell lines to obtain equal copy numbers of exon 20 and tested them for T790M mutants alongside known dilutions of T790M mutant DNA added to wild-type DNA (Fig. 3, C and D). The percentage of T790M was calculated by interpolation to be 5% for H820, 2.85% for H3255GR, and 0.4% for PC9GR.

Fig. 4, A and B, presents the results of applying real-time COLD-PCR to TaqMan genotyping of the hot-spot mutations in codon 273 of TP53 exon 8. Whereas the limit of selectivity for regular PCR/TaqMan genotyping is about 10% mutant allele, COLD-PCR/TaqMan genotyping can detect as little as 0.33% mutant alleles among wild-type alleles, an improvement of approximately 30-fold. We also examined 4 cancer samples, one of which (CT20) is known to contain a codon 273 G>A mutation at a low level (approximately 5%) (6, 7): COLD-PCR/TaqMan genotyping clearly identified the mutation-containing sample, but regular PCR/TaqMan genotyping did not (Fig. 4, C and D). Thus, our data demonstrate that COLD-PCR improves TaqMan-genotyping selectivity by 15- to 30-fold.

Fig. 3. Quantification of the T790M mutation in NSCLC cell lines. (A), Primary amplification curve for EGFR exon 20 with the LCGreen dye. DNA from NSCLC cell lines (PC9GR, H820, and H3255GR) was serially diluted into human male DNA and amplified. (B), Plot of the log concentration of input genomic DNA vs threshold cycle to quantify the copy number of EGFR exon 20. (C), Primary amplification curve for the T790M mutation from EGFR exon 20 with COLD-PCR/TaqMan genotyping (MGB probe). Serial dilutions of T790M mutant DNA and DNA from NSCLC cell lines were tested. (D), Plot of the log concentration (percent) of T790M mutant DNA vs threshold cycle to quantify the percentage of T790M in NSCLC cell lines. WT, wild type.
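Both the EGFR copy-number estimates (Fig. 3B) and the percent-T790M values (Fig. 3D) come from interpolating threshold cycles against serial-dilution calibrators. A minimal Python sketch of that interpolation follows, assuming a straight-line relationship between threshold cycle (Ct) and log10 of the calibrator amount over the calibrated range. The function names and the example calibrator values are hypothetical and are not data from this study; the efficiency helper is only meaningful for a regular-PCR copy-number curve, not for a COLD-PCR enrichment curve.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(amounts: Sequence[float], cts: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Ct = slope * log10(amount) + intercept."""
    if len(amounts) != len(cts) or len(amounts) < 2:
        raise ValueError("need at least two paired calibrators")
    xs = [math.log10(a) for a in amounts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("calibrators must span more than one amount")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cts))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_amount(ct: float, slope: float, intercept: float) -> float:
    """Invert the fitted line: amount = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """Per-cycle efficiency implied by a regular-PCR slope (1.0 = perfect doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


if __name__ == "__main__":
    # Hypothetical calibrators: percent mutant in wild type vs observed Ct.
    percents = [12.0, 3.0, 0.8]
    cts = [24.1, 26.2, 28.0]
    slope, intercept = fit_standard_curve(percents, cts)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}")
    print(f"unknown at Ct 27.0 -> {interpolate_amount(27.0, slope, intercept):.2f}% mutant")
```

Interpolation is only trustworthy between the highest and lowest calibrators; an unknown whose Ct falls outside that range would be an extrapolation.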

To understand further the improvement in mutation selectivity produced by COLD-PCR, we subjected the PCR product used in TaqMan genotyping of the T790M EGFR exon 20 mutation to an RFLP assay with a restriction enzyme, NlaIII, that selectively recognizes mutation-containing DNA. The digested products were then examined via denaturing HPLC, as reported previously (22). For comparison to

Fig. 4 (plot; fluorescence vs cycles; codon 273, TP53): (A) Regular PCR/TaqMan genotyping of mutant dilutions (100%, 33%, 10%, and 3.3% mutant, and wt); (B) COLD-PCR/TaqMan genotyping of mutant dilutions (100%, 33%, 10%, 3.3%, 1%, and 0.33% mutant, and wt); (C) Regular PCR/TaqMan genotyping of tumor samples CT20, TL6, TL8, and TL18; (D) COLD-PCR/TaqMan genotyping of the same tumor samples, with CT20 labeled separately from TL6, TL8, and TL18.

Fig. 4. Comparison of regular PCR/TaqMan genotyping with COLD-PCR/TaqMan genotyping for the G>A mutation in codon 273 of TP53 exon 8. Serial dilutions into wild-type DNA of SW480 cell line DNA containing the G>A mutation in codon 273 of TP53 exon 8 were evaluated for the codon 273 mutation with regular PCR/TaqMan genotyping (A) and COLD-PCR/TaqMan genotyping (B). Clinical tumor samples from 3 NSCLC patients (TL6, TL8, and TL18) and a colon cancer sample (CT20) were evaluated for the codon 273 mutation with regular PCR/TaqMan genotyping (C) and COLD-PCR/TaqMan genotyping (D).

COLD-PCR/TaqMan genotyping, we conducted identical experiments after regular PCR/TaqMan genotyping. Fig. 5 demonstrates that the product of regular PCR/TaqMan genotyping digested with NlaIII barely shows the mutant peak at 12% mutant relative to wild type, in agreement with the real-time PCR results (Fig. 2). In contrast, NlaIII-digested products of COLD-PCR/TaqMan genotyping show mutant peaks down to 0.8% mutant alleles. The data in Fig. 5 provide additional verification that the improved real-time PCR quantification of T790M mutations indeed reflects the anticipated mutation-specific products and not false-positive signals.

FURTHER IMPROVEMENT OF TaqMan GENOTYPING VIA 2 ROUNDS OF COLD-PCR AMPLIFICATION

Given that a single round of COLD-PCR/TaqMan genotyping can detect as little as 0.8% mutant alleles, we tested whether nested COLD-PCR/TaqMan genotyping can further improve the selectivity of T790M mutant detection. The nested PCR generates a 67-bp product from EGFR exon 20. When applied directly to genomic DNA (i.e., not in a nested format), COLD-PCR/TaqMan genotyping of the 67-bp region had a selectivity of approximately 0.8% T790M mutant alleles (see Fig. 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4). When 2 COLD-PCR/TaqMan reactions are applied in series (the first COLD-PCR for a 104-bp amplicon and the second a nested PCR for the 67-bp amplicon), the combined selectivity for T790M detection is far superior to that of a single reaction. Fig. 6A shows that a single round of COLD-PCR/TaqMan genotyping fails to detect 0.1% T790M mutant alleles. In contrast, 2 rounds of COLD-PCR/TaqMan genotyping improved the selectivity to better than 0.05% mutant alleles, whereas 11 replicates of the wild-type DNA remained at the baseline (Fig. 6B). Thus, 2 rounds of COLD-PCR combined with TaqMan genotyping improve the mutation detection over that obtained with a single round.

Fig. 5 (plot; fluorescence vs dHPLC retention time, 4–10 min): NlaIII digests after regular PCR and after COLD-PCR for WT and for 0.1%, 0.8%, 3%, and 12% T790M, with the wild-type peak and mutant peaks labeled.

Fig. 5. RFLP confirmation of enrichment of the T790M mutation by COLD-PCR/TaqMan genotyping. DNA from the H1975 cell line, serially diluted into wild-type (WT) DNA, was screened for the T790M mutation with regular PCR/TaqMan genotyping and with COLD-PCR/TaqMan genotyping. The PCR product was digested with NlaIII, and the digestion products were separated by denaturing HPLC to discriminate the mutant peak from the wild-type peak.

Discussion

We have described COLD-PCR/TaqMan genotyping, a real-time mutation-detection methodology that combines COLD-PCR and TaqMan genotyping for detecting the EGFR-encoded T790M mutant and TP53 codon 273 mutations in serial dilutions of mutant DNA, in cell lines, and in biological samples. The clinical relevance of these mutations is well established. T790M, an acquired mutation in the EGFR protein that renders NSCLC patients resistant to gefitinib or erlotinib, is found in approximately 50% of tumors from patients who have acquired resistance to these kinase inhibitors (32). The presence of a hot-spot mutation at codon 273 of TP53 is a poor-prognosis factor in NSCLC patients (27). The new method is based on the ability of fast COLD-PCR to enrich Tm-reducing mutations and the ability of the TaqMan probe to detect mutations in a real-time, quantitative format. Consequently, a single round of COLD-PCR/TaqMan genotyping quantitatively detects as little as 0.8% mutant alleles with a 15- to 30-fold better selectivity than regular PCR/TaqMan genotyping. The addition of a second round of COLD-PCR/TaqMan genotyping further improves the selectivity and reproducibly identifies 1 mutant allele among 2000 wild-type alleles.
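Why a per-cycle efficiency difference at the Tc produces large enrichment can be illustrated with a deliberately simplified exponential model, sketched below in Python. It is not the authors' model: it assumes constant, independent per-cycle efficiencies for mutant and wild-type amplicons (hypothetical values in the example) and ignores plateau effects and heteroduplex formation. Under these assumptions the mutant:wild-type ratio is multiplied by (1 + e_mut)/(1 + e_wt) every cycle, so even a modest difference compounds quickly from a starting level of 1 mutant in 2000 wild-type alleles.

```python
import math


def mutant_fraction(f0: float, e_mut: float, e_wt: float, cycles: int) -> float:
    """Mutant fraction after `cycles` Tc cycles in a simple exponential model."""
    if not 0.0 < f0 < 1.0:
        raise ValueError("f0 must be strictly between 0 and 1")
    odds = f0 / (1.0 - f0)                             # mutant : wild-type at start
    odds *= ((1.0 + e_mut) / (1.0 + e_wt)) ** cycles   # relative gain per cycle
    return odds / (1.0 + odds)


def cycles_to_reach(f0: float, target: float, e_mut: float, e_wt: float) -> int:
    """Smallest whole number of Tc cycles for the mutant fraction to reach `target`."""
    if not (0.0 < f0 < 1.0 and 0.0 < target < 1.0):
        raise ValueError("fractions must be strictly between 0 and 1")
    if target <= f0:
        return 0
    if e_mut <= e_wt:
        raise ValueError("no enrichment unless e_mut > e_wt")
    gain = math.log((1.0 + e_mut) / (1.0 + e_wt))
    start_odds = f0 / (1.0 - f0)
    target_odds = target / (1.0 - target)
    return math.ceil(math.log(target_odds / start_odds) / gain)


if __name__ == "__main__":
    # Hypothetical efficiencies at Tc: mutant 0.9, wild type 0.3 per cycle.
    f0 = 1 / 2001  # 1 mutant per 2000 wild-type alleles
    for n in (0, 10, 20, 30):
        print(n, f"{mutant_fraction(f0, 0.9, 0.3, n):.4f}")
    print("cycles to 50%:", cycles_to_reach(f0, 0.5, 0.9, 0.3))
```

Working in odds rather than fractions keeps the per-cycle update a single multiplication, which is what makes the compounding effect easy to see.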

[Fig. 6: fluorescence versus PCR cycle number. (A) First-round COLD-PCR (104-bp amplicon): 0.05% and 0.1% mutant and 11 repeats of WT. (B) Second-round (nested) COLD-PCR (67-bp amplicon): 0.1% and 0.05% mutant and 11 repeats of WT.]

Fig. 6. Two-round COLD-PCR/TaqMan genotyping of the T790M mutation. DNA from the H1975 cell line was serially diluted into human male DNA. Duplicate samples containing 0.1% and 0.05% T790M mutant and 11 replicates of reference wild-type (WT) DNA were tested with COLD-PCR/TaqMan genotyping. (A), Single-round COLD-PCR/TaqMan genotyping of the T790M mutation (104-bp amplicon). (B), Second-round (nested) COLD-PCR/TaqMan genotyping of the T790M mutation (67-bp amplicon).

Alternative TaqMan-based approaches for detecting low amounts of mutant alleles have been described. Allele-specific PCR-based TaqMan genotyping (TaqMAMA) uses a mutant-matched nucleotide at the 3′ end of a primer and a penultimate 3′ mismatch to achieve specific allele discrimination in the PCR (33); however, optimizing TaqMAMA conditions can be tedious (33). PNA-based TaqMan genotyping uses a PNA to inhibit amplification of the wild type and a mutant-specific TaqMan probe to detect mutations (34). Assay development is complicated by the need to define experimental conditions such as probe concentration. These conditions must keep the PNA and TaqMan probes compatible and also preserve the ability of the PNA to inhibit the wild type. Scorpion assays (35) provide a good alternative to TaqMan genotyping because the probe and primer are combined on a single oligonucleotide. DxS Ltd. offers a commercially available combination of Scorpion and ARMS® (amplification refractory mutation system) technologies. It can detect low-level mutations such as EGFR T790M with a sensitivity similar to that of the single-round COLD-PCR/TaqMan assay. However, the Scorpion assay is relatively more complex, more expensive, and slower (1 h for the COLD-PCR/TaqMan assay vs 2–3 h for the Scorpion assay) (35). COLD-PCR achieves real-time mutation detection without tedious optimization or costly PNA probes or Scorpion primers, because COLD-PCR/TaqMan genotyping uses only temperature to inhibit amplification of the wild type. Another potential advantage of the COLD-PCR/TaqMan approach lies in multiplex mutation detection. Multiplexing would be more difficult to achieve with combinations of PNA and TaqMan probes because of the number of oligonucleotides required in the reaction. In summary, COLD-PCR/TaqMan genotyping does not rely on special probes or reagents. It is therefore simple, fast, easy to use, and inexpensive compared with other TaqMan-based mutation-detection methods. The major improvement in selectivity obtained with COLD-PCR makes the widely used TaqMan genotyping method a powerful tool for detecting low-level mutations in clinical samples.
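The selectivity figures quoted in this Discussion can be related with simple arithmetic. A mutant fraction of 0.05% corresponds to about 1 mutant allele per 2000 alleles. The sketch below also shows why rounds applied in series can compound, under an idealized assumption that is ours rather than the authors'. If each round multiplies the mutant:wild-type ratio by a constant factor, 2 rounds multiply it by the square of that factor. The per-round factor of 20 is hypothetical and was not measured in this study.

```python
def alleles_per_mutant(mutant_fraction: float) -> float:
    """Total alleles per mutant allele: 0.0005 (0.05%) -> 2000."""
    return 1.0 / mutant_fraction


def ratio(mutant_fraction: float) -> float:
    """Mutant:wild-type ratio for a given mutant fraction."""
    return mutant_fraction / (1.0 - mutant_fraction)


def enrich(mutant_fraction: float, factor: float) -> float:
    """Mutant fraction after one idealized round that multiplies the
    mutant:wild-type ratio by `factor` (a modeling assumption)."""
    r = ratio(mutant_fraction) * factor
    return r / (1.0 + r)


start = 0.0005      # 0.05% mutant alleles, the lowest level tested in Fig. 6
per_round = 20.0    # HYPOTHETICAL per-round enrichment factor, not measured here

after_one = enrich(start, per_round)
after_two = enrich(after_one, per_round)

print(f"0.05% mutant ~ 1 mutant per {alleles_per_mutant(start):.0f} alleles")
print(f"after 1 round: {after_one:.2%}; after 2 rounds: {after_two:.2%}")
print(f"overall ratio gain: {ratio(after_two) / ratio(start):.0f}x")  # per_round ** 2
```

Real rounds need not enrich by the same factor. The nested second round in this study also used a different (67-bp) amplicon, so the squared factor is only a simplifying illustration.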

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: P.A. Jänne, AstraZeneca, Roche, Boehringer Ingelheim, and AVEO Pharmaceuticals. Stock Ownership: None declared. Honoraria: None declared. Research Funding: P.A. Jänne, Pfizer; G.M. Makrigiorgos, NIH grants CA-115439 and CA-111994; J. Li, NIH training grant 5 T32 CA09078. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.


References

1. Croce CM. Oncogenes and cancer. N Engl J Med 2008;358:502–11.
2. Bayley H. Sequencing single molecules of DNA. Curr Opin Chem Biol 2006;10:628–37.
3. Marsh S. Pyrosequencing applications. Methods Mol Biol 2007;373:15–24.
4. Jenkins GJ, Chaleshtori MH, Song H, Parry JM. Mutation analysis using the restriction site mutation (RSM) assay. Mutat Res 1998;405:209–20.
5. Ragoussis J, Elvidge GP, Kaur K, Colella S. Matrix-assisted laser desorption/ionisation, time-of-flight mass spectrometry in genomics research. PLoS Genet 2006;2:e100.
6. Li J, Berbeco R, Distel RJ, Jänne PA, Wang L, Makrigiorgos GM. s-RT-MELT for rapid mutation scanning using enzymatic selection and real time DNA-melting: new potential for multiplex genetic analysis. Nucleic Acids Res 2007;35:e84.
7. Yeung AT, Hattangadi D, Blakesley L, Nicolas E. Enzymatic mutation detection technologies. Biotechniques 2005;38:749–58.
8. Shi C, Eshleman SH, Jones D, Fukushima N, Hua L, Parker AR, et al. LigAmp for sensitive detection of single-nucleotide differences. Nat Methods 2004;1:141–7.
9. Lipsky RH, Mazzanti CM, Rudolph JG, Xu K, Vyas G, Bozak D, et al. DNA melting analysis for detection of single nucleotide polymorphisms. Clin Chem 2001;47:635–44.
10. Liew M, Pryor R, Palais R, Meadows C, Erali M, Lyon E, Wittwer C. Genotyping of single-nucleotide polymorphisms by high-resolution melting of small amplicons. Clin Chem 2004;50:1156–64.
11. Orum H. PCR clamping. Curr Issues Mol Biol 2000;2:27–30.
12. Li J, Makrigiorgos GM. Anti-primer quenching-based real-time PCR for simplex or multiplex DNA quantification and single-nucleotide polymorphism genotyping. Nat Protoc 2007;2:50–8.
13. Li J, Wang F, Mamon H, Kulke MH, Harris L, Maher E, et al. Antiprimer quenching-based real-time PCR and its application to the analysis of clinical cancer samples. Clin Chem 2006;52:624–33.
14. Whitcombe D, Theaker J, Guy SP, Brown T, Little S. Detection of PCR products using self-probing amplicons and fluorescence. Nat Biotechnol 1999;17:804–7.
15. Bernard PS, Wittwer CT. Real-time PCR technology for cancer diagnostics. Clin Chem 2002;48:1178–85.
16. Makrigiorgos GM. PCR-based detection of minority point mutations. Hum Mutat 2004;23:406–12.
17. De la Vega FM, Lazaruk KD, Rhodes MD, Wenz MH. Assessment of two flexible and compatible SNP genotyping platforms: TaqMan SNP Genotyping Assays and the SNPlex Genotyping System. Mutat Res 2005;573:111–35.
18. Kutyavin IV, Afonina IA, Mills A, Gorn VV, Lukhtanov EA, Belousov ES, et al. 3′-minor groove binder-DNA probes increase sequence specificity at PCR extension temperatures. Nucleic Acids Res 2000;28:655–61.
19. Moore P. Simplifying the probe set. Nature 2005;435:238.
20. Wilkening S, Hemminki K, Thirumaran RK, Bermejo JL, Bonn S, Forsti A, Kumar R. Determination of allele frequency in pooled DNA: comparison of three PCR-based methods. Biotechniques 2005;39:853–8.
21. Jänne PA, Borras AM, Kuang Y, Rogers AM, Joshi VA, Liyanage H, et al. A rapid and sensitive enzymatic method for epidermal growth factor receptor mutation screening. Clin Cancer Res 2006;12:751–8.
22. Li J, Wang L, Mamon H, Kulke MH, Berbeco R, Makrigiorgos GM. Replacing PCR with COLD-PCR enriches variant DNA sequences and redefines the sensitivity of genetic testing. Nat Med 2008;14:579–84.
23. Engelman JA, Mukohara T, Zejnullahu K, Lifshits E, Borras AM, Gale CM, et al. Allelic dilution obscures detection of a biologically significant resistance mutation in EGFR-amplified lung cancer. J Clin Invest 2006;116:2695–706.
24. Ogino A, Kitao H, Hirano S, Uchida A, Ishiai M, Kozuki T, et al. Emergence of epidermal growth factor receptor T790M mutation during chronic exposure to gefitinib in a non small cell lung cancer cell line. Cancer Res 2007;67:7807–14.
25. Kwak EL, Sordella R, Bell DW, Godin-Heymann N, Okimoto RA, Brannigan BW, et al. Irreversible inhibitors of the EGF receptor may circumvent acquired resistance to gefitinib. Proc Natl Acad Sci U S A 2005;102:7665–70.
26. Pao W, Miller VA, Politi KA, Riely GJ, Somwar R, Zakowski MF, et al. Acquired resistance of lung adenocarcinomas to gefitinib or erlotinib is associated with a second mutation in the EGFR kinase domain. PLoS Med 2005;2:e73.
27. Huang C, Taki T, Adachi M, Konishi T, Higashiyama M, Miyake M. Mutations in exon 7 and 8 of p53 as poor prognostic factors in patients with non-small cell lung cancer. Oncogene 1998;16:2469–77.
28. Greenman C, Stephens P, Smith R, Dalgliesh GL, Hunter C, Bignell G, et al. Patterns of somatic mutation in human cancer genomes. Nature 2007;446:153–8.
29. Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res 1996;6:986–94.
30. de La Motte Rouge T, Valent A, Ambrosetti D, Vielh P, Lacroix L. [Clinical and molecular predictors of response to EGFR tyrosine kinase inhibitors in non-small cell lung cancer]. Ann Pathol 2007;27:353–63. [French]
31. Olivier M, Eeles R, Hollstein M, Khan MA, Harris CC, Hainaut P. The IARC TP53 database: new online mutation analysis and recommendations to users. Hum Mutat 2002;19:607–14.
32. Engelman JA, Jänne PA. Mechanisms of acquired resistance to epidermal growth factor receptor tyrosine kinase inhibitors in non-small cell lung cancer. Clin Cancer Res 2008;14:2895–9.
33. Glaab WE, Skopek TR. A novel assay for allelic discrimination that combines the fluorogenic 5′ nuclease polymerase chain reaction (TaqMan) and mismatch amplification mutation assay. Mutat Res 1999;430:1–12.
34. Nagai Y, Miyazawa H, Huqun, Tanaka T, Udagawa K, Kato M, et al. Genetic heterogeneity of the epidermal growth factor receptor in non-small cell lung cancer cell lines revealed by a rapid and sensitive detection system, the peptide nucleic acid-locked nucleic acid PCR clamp. Cancer Res 2005;65:7276–82.
35. Maheswaran S, Sequist LV, Nagrath S, Ulkus L, Brannigan B, Collura CV, et al. Detection of mutations in EGFR in circulating lung-cancer cells. N Engl J Med 2008;359:366–77.

Clinical Chemistry 55:4 757–764 (2009)

Molecular Diagnostics and Genetics

mRNA Expression and BRAF Mutation in Circulating Melanoma Cells Isolated from Peripheral Blood with High Molecular Weight Melanoma-Associated Antigen–Specific Monoclonal Antibody Beads

Minoru Kitago,¹ Kazuo Koyanagi,¹ Takeshi Nakamura,¹ Yasufumi Goto,¹ Mark Faries,² Steven J. O’Day,³ Donald L. Morton,² Soldano Ferrone,⁴ and Dave S.B. Hoon¹*

BACKGROUND: The detection of circulating tumor cells (CTCs) in the peripheral blood of melanoma patients by quantitative real-time reverse-transcription PCR (qRT-PCR) analysis correlates with a poor prognosis. The assessment of CTCs from blood has been difficult because of the lack of a suitable monoclonal antibody (mAb) directed against cell surface antigens to capture melanoma cells.

METHODS: Blood was collected prospectively from 57 melanoma patients (43 test and 14 test-development cases) and 5 healthy donors. High molecular weight melanoma-associated antigen (HMW-MAA)–specific mAbs bound to immunomagnetic beads were used to isolate CTCs. mRNA and/or DNA were extracted from the CTCs. We tested the isolated cells by qRT-PCR for expression of a melanoma-associated gene panel (MLANA, MAGEA3, and MITF). We also tested them for the presence of BRAFmt (a BRAF gene variant encoding the V600E mutant protein). These tests verified that the bead-isolated CTCs were melanoma cells. A peptide nucleic acid–clamping PCR assay was used for BRAFmt analysis.

RESULTS: Spiking of peripheral blood cells (PBCs) with melanoma cells showed that the bead-based detection assay can detect approximately 1 melanoma cell in 5 × 10⁶ PBCs. qRT-PCR analysis detected MLANA, MAGEA3, and MITF expression in 19 (44%), 29 (67%), and 19 (44%) of the 43 patients, respectively. At least one biomarker of the panel was positive in 40 (93%) of the 43 melanoma patients. BRAFmt was detected in 17 (81%) of the 21 assessed stage IV melanoma patients.

CONCLUSION: Bead capture coupled with PCR is useful for assessing CTCs in melanoma patients. The captured cells can then be characterized at both the genomic and the transcriptomic level.

© 2009 American Association for Clinical Chemistry

¹Department of Molecular Oncology and ²Division of Surgical Oncology, John Wayne Cancer Institute at Saint John’s Health Center, Santa Monica, CA; ³The Angeles Clinic and Research Institute, Santa Monica, CA; ⁴Departments of Surgery, Immunology, and Pathology, University of Pittsburgh Cancer Institute, Pittsburgh, PA.
* Address correspondence to this author at: Department of Molecular Oncology, John Wayne Cancer Institute, 2200 Santa Monica Blvd., Santa Monica, CA 90404. Fax 310-449-5282; e-mail [email protected].
Received September 18, 2008; accepted January 12, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.116467
⁵Nonstandard abbreviations: CTC, circulating tumor cell; qRT-PCR, quantitative real-time reverse-transcription PCR; BRAFmt, BRAF gene variant encoding the V600E mutant protein; MAPK, mitogen-activated protein kinase; mAb, monoclonal antibody; HMW-MAA, high molecular weight melanoma-associated antigen; PBC, peripheral blood cell; AJCC, American Joint Committee on Cancer; FRET, fluorescence resonance energy transfer; PNA, peptide nucleic acid; LNA, locked nucleic acid; qPCR, quantitative PCR; PBLs, peripheral blood lymphocytes.
⁶Human genes: MLANA, melan-A (also known as MART-1, melanoma antigen recognized by T cells 1); MAGEA3, melanoma antigen family A, 3; MITF, microphthalmia-associated transcription factor; BRAF, v-raf murine sarcoma viral oncogene homolog B1; GAPDH, glyceraldehyde-3-phosphate dehydrogenase.

The metastasis of melanoma cells to distant sites often portends a poor prognosis (1, 2), and detection of melanoma cells in the circulation is useful for assessing disease progression (3). Previous studies have shown that detecting melanoma cells in blood by direct isolation of mRNA, without isolation of tumor cells, is associated with a poor disease outcome (3–9) and predicts subclinical disease. Furthermore, we and others have reported that serial monitoring of circulating tumor cells (CTCs)⁵ may be useful for predicting disease outcome in melanoma patients receiving adjuvant therapy (8–10). To date, investigators have identified a limited number of biomarkers that are expressed by melanoma cells but are not detectable in healthy cells. We have developed a multimarker quantitative real-time reverse-transcription PCR (qRT-PCR) assay that uses specific prognostic biomarkers to detect melanoma cells in the blood and to identify lymph node metastasis (9, 11, 12). This panel of genes includes MLANA⁶

(melan-A; also known as MART-1, melanoma antigen recognized by T cells 1), MAGEA3 (melanoma antigen family A, 3) (12, 13), and MITF (microphthalmia-associated transcription factor) (9). MLANA is a melanocyte-differentiation antigen that is frequently synthesized by melanoma cells (12, 14). MAGEA3 is commonly synthesized by malignant cells of different embryologic origins and is not detectable in healthy tissues except male germline cells and placenta (12, 13). MITF plays an important role in melanocyte development and melanoma growth (9, 15). These qRT-PCR markers can detect both occult metastatic melanoma cells in sentinel lymph nodes (12) and CTCs in the blood, and they have demonstrated prognostic utility for melanoma patients (3, 9, 10). A mutant form of the BRAF (v-raf murine sarcoma viral oncogene homolog B1) gene (BRAFmt) encoding the V600E variant protein occurs at high frequency in melanomas (16). BRAF encodes a serine/threonine kinase downstream of RAS in the RAS-MAPK pathway that transduces regulatory signals from RAS through MAPKs (mitogen-activated protein kinases) (17, 18). BRAFmt has been detected in 50%–80% of metastatic lesions (16, 19). In addition, we have described the presence of circulating BRAFmt DNA in the serum of melanoma patients and demonstrated its clinical utility (19). Direct RT-PCR analysis of blood permits detection of the CTC transcriptome. To date, however, genotypic abnormalities and mRNA expression in CTCs have not been well studied in melanoma patients. This gap is due in part to the difficulty of isolating CTCs with monoclonal antibodies (mAbs) that recognize cell surface markers. Characterizing the genotypic and phenotypic defects in CTCs would help researchers understand the mechanisms that CTCs use to escape immune recognition and may suggest strategies for counteracting them.
Therefore, in the present study we used high molecular weight melanoma-associated antigen (HMW-MAA) as a marker to develop a method for isolating CTCs from peripheral blood cells (PBCs). This cell surface antigen is expressed at high levels on melanoma cells in at least 85% of melanoma patients and is not detectable on healthy PBCs. We obtained high-affinity mAbs that recognize distinct and spatially distant antigenic determinants of HMW-MAA and used a mixture of these HMW-MAA mAbs to capture CTCs.

Materials and Methods
MELANOMA CELL LINES

The human melanoma cell lines MA, MB, MC, MD, ME, MF, MG, MH, MI, MJ, MK, ML, and MM were established and characterized at the John Wayne Cancer Institute and were used for in vitro studies. Cells were grown in T75 (75-cm²) culture flasks (Corning Incorporated) in RPMI 1640 medium containing 100 mL/L heat-inactivated fetal calf serum, 1000 U/L of penicillin, and 1 mg/L of streptomycin (all from Gibco/Invitrogen). Cells were harvested for mRNA analysis when they reached 70%–80% confluence, as previously described (20).

PATIENTS

All melanoma patients enrolled in this study signed consent forms for the use of their blood samples and their physical and medical histories. The study was carried out according to the guidelines of the Saint John’s Health Center/John Wayne Cancer Institute Institutional Review Board. Disease stage was determined according to the American Joint Committee on Cancer (AJCC) criteria and recorded at the time of blood draw. Blood was obtained from 14 stage IV melanoma patients for pilot experiments to optimize the assays, and blood samples from 43 stage IV melanoma patients were used to validate the assays. The study was conducted in a double-blind fashion. The investigators who performed the PCR assays did not know the patients’ disease status.

MONOCLONAL AND POLYCLONAL ANTIBODIES

Mouse mAbs 225.28, 763.74, TP61.5, VF1-TP41.2, and VF4-TP108 recognize distinct and spatially distant determinants of HMW-MAA [as described previously (21, 22)]. They were purified from ascitic fluid by sequential precipitation with ammonium sulfate and caprylic acid (23). The purity of the mAb preparations was assessed by SDS-PAGE analysis. Their activity was monitored in a binding assay with HMW-MAA–bearing melanoma cells. R-phycoerythrin–labeled F(ab′)₂ fragments of goat anti-mouse IgG antibodies were purchased from Santa Cruz Biotechnology.

FLOW CYTOMETRIC ANALYSIS

Cells were incubated at 4 °C for 1 h with each HMW-MAA–specific mAb (1 µg) or with an isotype-matched control mAb. The cells were then processed in 5 steps:
(a) they were washed twice with PBS (144 mg/L KH2PO4, 9 g/L NaCl, 795 mg/L Na2HPO4·7H2O, pH 7.2) containing 5 g/L BSA;
(b) they were incubated at 4 °C for an additional 30 min with an optimal amount of R-phycoerythrin–labeled F(ab′)₂ fragments of goat anti-mouse IgG antibodies;
(c) they were washed twice again;
(d) they were fixed in 40 g/L paraformaldehyde;
(e) they were analyzed with a flow cytometer (FACSCalibur; BD Biosciences).
For each sample, 10⁴ melanoma cells were acquired. Debris, cell clusters, and dead cells were gated out by light-scatter assessment before single-parameter histograms were drawn. Data were analyzed with CellQuest software (BD Biosciences).

ISOLATION OF MELANOMA CELLS FROM PBCs WITH IMMUNOMAGNETIC BEADS

Blood samples (1 mL) were collected into sodium citrate–containing tubes, and the first several milliliters drawn were discarded (10). Total blood cells were collected after the red blood cells were lysed with Purescript RBC lysis solution (Gentra Systems) according to the manufacturer’s instructions. CTCs were isolated from the total cells with immunomagnetic beads (CELLection™ Pan Mouse IgG Kit; Invitrogen) according to the manufacturer’s recommendations, with minor modifications. In brief, cells purified from PBCs were resuspended in 4.5 mL PBS containing 1 g/L BSA. HMW-MAA–specific mouse mAbs 225.28, 763.74, TP61.5, VF1-TP41.2, and VF4-TP108 (3 µg each) were added to these cells separately or in combination. Samples were incubated at 4 °C overnight with rotation on an RKDynal rotor (Dynal Biotech/Invitrogen). After 2 washes with this PBS/BSA solution, immunomagnetic beads (anti-mouse IgG–coated) were added to the samples. Incubation was continued at 4 °C for an additional 40 min with rotation on the RKDynal rotor. Samples were then placed in a Dynal MPC-6 magnetic device (Invitrogen) at 4 °C for 2 min. The fluid was removed by careful pipetting, and the beads were resuspended in the PBS/BSA solution. This wash was performed twice, and the beads were then resuspended in RPMI 1640 medium containing 10 mL/L fetal bovine serum. Cells were released from the beads by adding Releasing Buffer (Invitrogen) and incubating for 20 min on a mixing device. The mixture was then vigorously pipetted and placed in the Dynal MPC-6 magnet for 2 min. The medium containing the released CTCs was collected with a pipet.

RNA ASSAYS

Total RNA was extracted with Tri Reagent (Molecular Research Center) as described previously (10). RNA was quantified and assessed for purity by ultraviolet spectrophotometry and the Quant-iT™ RiboGreen® RNA Assay Kit (Molecular Probes/Invitrogen). Blood processing, RNA extraction, qRT-PCR assay setup, and post-PCR product analysis were carried out in separate designated rooms to prevent cross-contamination. Reverse-transcription reactions were performed with Moloney murine leukemia virus reverse transcriptase (Promega) and an oligo(dT) primer. qRT-PCR assays were performed with the ABI 7900HT instrument (Applied Biosystems). Primer and probe sequences designed for qRT-PCR analyses have been described previously (3, 9, 11, 12). The fluorescence resonance energy transfer (FRET) probe sequences were as follows:
MLANA: 5′–FAM-CAGAACGTCACCACCACCTTATT-BHQ-1–3′
MAGEA3: 5′–FAM-AGCTCCTGCCCACACTCCCGCCTGT-BHQ-1–3′
MITF: 5′–FAM-AGAGCACTGGCCAAAGAGAGGCA-BHQ-1–3′
GAPDH (glyceraldehyde-3-phosphate dehydrogenase): 5′–FAM-CAGCAATGCCTCCTGCACCACCAA-BHQ-1–3′
Five microliters of cDNA synthesized from 250 ng total RNA were transferred to a well of a 96-well PCR plate (Fisher Scientific). Each primer, the probe, and custom iTaq Supermix with ROX (Bio-Rad Laboratories) were added to the same well (9, 11, 12). After a precycling hold at 95 °C for 10 min, samples were amplified in 45 PCR cycles. Each cycle consisted of denaturation at 95 °C for 1 min; annealing for 1 min at 55 °C for GAPDH, at 58 °C for MAGEA3 and MITF, and at 59 °C for MLANA; and extension at 72 °C for 1 min. We generated a calibration curve from the threshold cycle values of 9 serial dilutions of plasmid templates (10⁰ to 10⁸ copies) (3). The threshold cycle of each sample was interpolated from the calibration curve. The number of mRNA copies was calculated by the iCycler iQ Real-Time PCR Detection System software (Bio-Rad). Thirteen melanoma lines and blood samples from 49 healthy donors were used to optimize the assay, as described previously (3, 9, 10). We performed each assay at least twice. Each assay included positive controls (melanoma lines), negative mRNA controls (PBCs), and PCR reagent controls without template. GAPDH, a housekeeping gene, was used as an internal control to verify RNA integrity and reverse-transcription efficiency. Any sample with an inadequate quantity of GAPDH mRNA was excluded from the study. The mean mRNA copy number was used for analysis (10).

DNA EXTRACTION AND BRAF DNA MUTATION

We collected blood samples for the BRAFmt study from a subset of 21 melanoma patients (AJCC stage IV). Their samples had been assessed by qRT-PCR, and their CTCs had been isolated as described above. Genomic DNA was extracted from CTCs (24), and BRAFmt was assessed as previously described (19, 25). In brief, primers were designed to amplify exon 15 of the BRAF gene, which includes the mutation hot spot that encodes the V600E variant. The peptide nucleic acid (PNA) (Applied Biosystems) was designed to clamp the hot spot on the wild-type template and thereby block amplification of the wild-type template in the PCR (19, 25). A FRET dual-labeled locked nucleic acid (LNA) probe (Proligo/Sigma–Aldrich) was designed to hybridize specifically to the T-to-A mutation in the sequence encoding V600E. This mutation was chosen because it is the most frequent one at this BRAF hot spot. A second FRET DNA probe (BioSource/Invitrogen) was designed from sequences adjacent to the LNA probe, avoiding the hot spot. This probe was used to detect and quantify the total number of BRAF templates, both wild-type BRAF and BRAFmt, in the PCR. Quantitative PCR (qPCR) with both the PNA clamp and the FRET LNA probe was used for mutation detection. BRAF qPCR used the following primers, LNA probe, and PNA clamp: forward, 5′–CCTCACAGTAAAAATAGGTG–3′; reverse, 5′–ATAGCCTCAATTCTTACCA–3′; LNA, 5′–CTACAGAGAAATCTCGAT-BHQ-1–3′; and PNA, 5′–CTACAGTGAAATCTCG–3′. The PCR assay was run on the iCycler iQ Real-Time PCR Detection System (Bio-Rad). CTC genomic DNA (20 ng) was amplified by qPCR in a 20-µL reaction volume. The reaction contained 250 nmol/L each PCR primer, 250 nmol/L LNA, 500 nmol/L PNA, 800 µmol/L deoxynucleoside triphosphates, MgCl2, PCR buffer, and 1 U AmpliTaq Gold Polymerase (Applied Biosystems). The PCR conditions were 50 cycles of 94 °C for 1 min, 72 °C for 50 s, 53 °C for 50 s, and 72 °C for 1 min. Each sample was assayed in triplicate. Control reactions included PCRs with templates derived from the appropriate positive and negative cell lines and a PCR reagent control without template. DNA from the MA cell line (heterozygous for BRAFmt) was established as the reference for measuring units of BRAFmt target DNA. The amount of target mutant DNA in 1 µg/mL MA genomic DNA was arbitrarily defined as 1 U of BRAFmt. qPCR results for samples were normalized to this reference to quantify the relative units of BRAFmt in all samples. All PCRs for mutant-sequence analysis were done in triplicate, and the median was used for data analysis. Representative BRAFmt and wild-type BRAF genes from V600E melanoma tumors were sequenced to confirm the accuracy of the PCR assay, as previously described (19, 25). The BRAF sequencing PCR used primers 5′–TGTTTTCCTTTACTTACTACACCTCA–3′ (forward) and 5′–AGCATCTCAGGGCCAAAAAT–3′ (reverse). The PCR products were purified with the QIAquick PCR Purification Kit (Qiagen) and then sequenced directly at 58 °C with the GenomeLab DTCS Quick Start Kit (Beckman Coulter) according to the manufacturer’s instructions. Products of the dye-terminator reactions were analyzed by capillary array electrophoresis on a CEQ 8000XL Genetic Analysis System (Beckman Coulter).
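Two quantification steps in the Methods rest on the same idea: threshold cycles are interpolated from a standard curve. For the qRT-PCR markers, the curve comes from 9 plasmid dilutions spanning 10⁰ to 10⁸ copies. For BRAFmt, it comes from the MA reference DNA, with the mutant DNA in 1 µg/mL MA DNA defined as 1 U. Replicate results are then summarized by their mean (mRNA copies) or median (BRAFmt triplicates). The authors performed these calculations in the iCycler iQ software. The Python below is only a generic sketch of standard-curve interpolation. Its slope, intercept, Ct values, reference dilution scheme, and GAPDH cutoff are all hypothetical.

```python
from statistics import mean, median


def fit_standard_curve(log10_quantities, ct_values):
    """Ordinary least-squares fit of Ct = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, ct_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate_quantity(ct, slope, intercept):
    """Invert the standard curve: quantity = 10 ** ((Ct - intercept) / slope)."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means 100%, i.e. doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


MIN_GAPDH_COPIES = 1000  # HYPOTHETICAL exclusion cutoff; the study does not state one


def passes_gapdh_qc(gapdh_copies, min_copies=MIN_GAPDH_COPIES):
    """Keep a sample only if its internal-control (GAPDH) signal is adequate."""
    return gapdh_copies >= min_copies


# HYPOTHETICAL plasmid standards: 10^0 ... 10^8 copies (9 dilutions), with Ct
# values generated from an assumed slope of -3.32 and intercept of 38.0.
log_copies = list(range(9))
standard_ct = [38.0 - 3.32 * x for x in log_copies]
slope, intercept = fit_standard_curve(log_copies, standard_ct)

# HYPOTHETICAL duplicate assays of one sample: interpolate each run, then average copies.
replicate_ct = [30.1, 30.4]
copies = mean([interpolate_quantity(ct, slope, intercept) for ct in replicate_ct])
print(f"slope={slope:.2f}, efficiency={amplification_efficiency(slope):.0%}")
print(f"sample: ~{copies:.0f} mRNA copies")

# Relative BRAFmt units: the standards are dilutions of MA reference DNA in units.
# HYPOTHETICAL reference series of 1, 0.1, 0.01, and 0.001 U with assumed Ct values.
ref_log_units = [0, -1, -2, -3]
ref_ct = [27.0, 30.4, 33.8, 37.2]
ref_slope, ref_intercept = fit_standard_curve(ref_log_units, ref_ct)
triplicate_ct = [32.0, 32.3, 31.9]  # HYPOTHETICAL triplicate; the median is interpolated
units = interpolate_quantity(median(triplicate_ct), ref_slope, ref_intercept)
print(f"relative BRAFmt units ~ {units:.3f}")
```

A slope near −3.32 corresponds to roughly 100% amplification efficiency. Checking the fitted slope is therefore a quick sanity test before relying on interpolated values.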

Results
HMW-MAA PROTEIN EXPRESSION ON MELANOMA CELLS

The expression of HMW-MAA protein on cells of melanoma lines was initially screened by flow cytometric analysis with HMW-MAA–specific mAbs 225.28, 763.74, TP61.5, VF1-TP41.2, and VF4-TP108 to verify the sensitivity and specificity of each mAb (Fig. 1). HMW-MAA expression was detected on all melanoma lines with all HMW-MAA–specific mAbs tested. The individual mAbs recognize different epitopes on HMW-MAA. We subsequently used a mixture of all 5 HMW-MAA–specific mAbs for this assay.

Fig. 1. HMW-MAA protein expression on melanoma cell lines by flow cytometry analysis with each of the HMW-MAA–specific mAbs (225.28, 763.74, TP61.5, VF1-TP41.2, and VF4-TP108) and an isotype-matched control Ab.

ISOLATION OF MELANOMA CELLS MIXED WITH PBCs

We conducted pilot experiments to optimize conditions for isolating CTCs from peripheral blood with HMW-MAA mAbs and immunomagnetic beads (Fig. 2). To determine the optimal assay, we compared an indirect technique and a direct technique for capturing CTCs with the immunomagnetic beads. In the indirect technique, melanoma cells are first incubated with HMW-MAA mAbs, and the mAb-labeled melanoma cells are then bound and captured by the immunomagnetic beads. In the direct technique, HMW-MAA mAbs are first incubated with the immunomagnetic beads, and melanoma cells are then bound and captured by the mAb-coated beads. Melanoma cells were added to healthy donor PBCs at ratios of 1–10³ per 5 × 10⁶ PBCs. With the indirect technique, qRT-PCR analysis detected 1–5 melanoma cells in 5 × 10⁶ healthy donor PBCs. With the direct technique, it required at least 100 melanoma cells in 5 × 10⁶ PBCs. Thus, the indirect technique was better than the direct technique for separating melanoma cells from healthy donor PBCs. A mixture of HMW-MAA–specific mAbs (225.28, 763.74, TP61.5, VF1-TP41.2, and VF4-TP108) was then tested to confirm the improvement in tumor cell capture. In quadruplicate experiments with the mAb mixture, qRT-PCR analysis always detected approximately 1 melanoma cell mixed with 5 × 10⁶ PBCs. Detection was not always successful when a single HMW-MAA mAb was used. Thus, in vitro, the mixture of HMW-MAA mAbs was better than a single HMW-MAA mAb for isolating melanoma cells from mixtures with healthy donor PBCs. A pilot study with samples from 14 melanoma patients established 2 further conditions. The optimal amount was 3 µg of each mAb, and the optimal incubation time of the cells with the primary antibodies was overnight.

Fig. 2. HMW-MAA–positive cells were isolated from suspensions of healthy donor PBCs and melanoma cells by means of an HMW-MAA mAb mixture and immunomagnetic beads.

qRT-PCR DETECTION LIMIT FOR MELANOMA CELLS MIXED WITH PBCs

We assessed the limit of detection for melanoma cells in blood with the optimized immunomagnetic bead method. This method used the indirect technique and overnight incubation with a mixture of all 5 HMW-MAA–specific mAbs. Melanoma cells were serially diluted with healthy donor PBCs, separated with the immunomagnetic beads and the HMW-MAA mAb mixture, and analyzed by qRT-PCR. In these assays, we added serial dilutions of melanoma cells (10⁴, 10³, 10², 10, 1, and 0 cells) expressing MAGEA3, MLANA, and MITF to 5 × 10⁶ donor-derived PBCs. We then assessed the ability of the qRT-PCR assay to detect each marker. This in vitro model system was used to mimic the detection of CTCs in blood. The assay was performed both with and without the immunomagnetic bead technique, but always with the HMW-MAA–specific mAbs added. It was repeated multiple times to validate the reproducibility and robustness of the assay system. Use of the beads permitted the detection of MAGEA3, MLANA, and MITF mRNAs from approximately 1 melanoma cell mixed with 5 × 10⁶ PBCs (Table 1). The number of mRNA copies detected decreased progressively with serial dilution of the melanoma cells (Fig. 3). As expected, the number of mRNA copies for individual markers varied with the cell line. Similar heterogeneity in mRNA production is expected in blood samples from patients. All biomarkers were detected with 10 melanoma cells mixed with 5 × 10⁶ PBCs in all 4 experiments (Table 1). Detection frequencies for MAGEA3, MLANA, and MITF transcripts from approximately 1 melanoma cell mixed with 5 × 10⁶ PBCs were 100%, 100%, and 75%, respectively. We performed the same experiments without immunomagnetic beads to assess the detection of melanoma cells in blood via direct isolation of mRNA from blood (Fig. 3). The numbers of mRNA copies obtained for each marker in samples processed with immunomagnetic beads were higher in all serial-dilution experiments than in samples processed without beads. The numbers of copies of MAGEA3 and MLANA transcripts obtained for bead-processed samples were the same as those obtained for samples without the bead-capture method when approximately 100 melanoma cells were mixed with 5 × 10⁶ PBCs.

Table 1. qRT-PCR detection of melanoma markers in serially diluted melanoma cell lines (n = 4).ᵃ

                   Cell number
Marker     10⁴    10³    10²    10¹    10⁰    0
GAPDH       4      4      4      4      4     0
MLANA       4      4      4      4      4     0
MAGEA3      4      4      4      4      4     0
MITF        4      4      4      4      3     0

ᵃ Data represent the number of times the indicated marker was detected (of 4 experiments) at the indicated numbers of melanoma cells mixed with 5 × 10⁶ PBCs. An HMW-MAA mAb mixture was used in an indirect assay with immunomagnetic beads. GAPDH is a housekeeping gene used as an internal control.

[Fig. 3: mRNA copy number (log scale) versus melanoma cells / 5 × 10⁶ PBLs (10⁴, 10³, 10², 10¹, 10⁰, and 0); panels (A) MLANA, (B) MAGEA3, and (C).]

ASSESSMENT OF CTCs IN PBCs

We performed immunomagnetic bead–based qRT-PCR assays with samples from healthy blood donors and obtained no positive signals for any of the 3 markers. These results demonstrated the specificity of the assay. In blood samples from 43 melanoma patients, the assay with optimized CTC isolation detected MAGEA3, MLANA, and MITF mRNAs in 67%, 44%, and 44% of the patients, respectively (Table 2). When the panel of all 3 biomarkers was assessed, at least one marker was detected in 40 (93%) of the 43 patients investigated. This detection rate was higher than that obtained for any single biomarker. This analysis demonstrated the sensitivity and efficiency of the assay. The combination of CTC capture with HMW-MAA mAbs and immunomagnetic beads followed by qRT-PCR analysis constitutes a unique approach that integrates 2 different techniques to assess CTCs.

MITF

103

102

101

100 104

103

102

Melanoma cells / 5 x

101

10 6

100

0

PBLs

Fig. 3. qRT-PCR detection limit for melanoma cells (104, 103, 102, 10, 1, and 0 cells) mixed with 5 ⴛ 106 PBLs from healthy donors. Assay for MLANA (A), MAGEA3 (B), and MITF (C) with (filled columns) and without (open columns) the use of immunomagnetic beads with bound anti–HMW-MAA mAbs. Data are presented as the mean (SD).
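The 93% panel detection rate relies on counting a patient as positive when any of the 3 markers is detected. The Python sketch below illustrates that "any marker" rule under the assumption that each marker call has already been reduced to detected/not detected; the function names and the four patient records are hypothetical and are not data from this study. Because every sample that is positive for the best single marker is also panel-positive, the panel rate can never fall below the highest single-marker rate.

```python
from typing import Dict, List, Tuple

MARKERS = ("MAGEA3", "MLANA", "MITF")

def panel_positive(calls: Dict[str, bool]) -> bool:
    """Count a sample as CTC-positive if at least one panel marker is detected."""
    return any(calls[m] for m in MARKERS)

def detection_rates(samples: List[Dict[str, bool]]) -> Tuple[Dict[str, float], float]:
    """Per-marker detection rates and the panel ('any marker') detection rate."""
    n = len(samples)
    per_marker = {m: sum(s[m] for s in samples) / n for m in MARKERS}
    panel = sum(panel_positive(s) for s in samples) / n
    return per_marker, panel

# Four hypothetical patients (invented calls, NOT data from this study): each marker
# is detected in a different patient, and one patient is negative for all three.
samples = [
    {"MAGEA3": True,  "MLANA": False, "MITF": False},
    {"MAGEA3": False, "MLANA": True,  "MITF": False},
    {"MAGEA3": False, "MLANA": False, "MITF": True},
    {"MAGEA3": False, "MLANA": False, "MITF": False},
]
per_marker, panel = detection_rates(samples)
print(per_marker)  # {'MAGEA3': 0.25, 'MLANA': 0.25, 'MITF': 0.25}
print(panel)       # 0.75
```

In this toy case no single marker exceeds 25%, yet the panel reaches 75%, which is the same complementarity the study exploits when markers are expressed heterogeneously.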

ASSESSMENT OF BRAF MUTATION IN CTCs

In an in vitro model, the assay detected BRAFmt from 1–5 MA cells in mixtures with 5 × 10⁶ healthy donor-derived PBCs. BRAFmt was assessed in isolated CTCs from melanoma patients. Genomic DNA was extracted from 21 melanoma patients (AJCC stage IV), and BRAFmt was detected in 17 (81%) of these patients.


Table 2. Multimarker mRNA expression in blood samples from melanoma patients (n = 43).

| | Positive expression, n (%) |
|---|---|
| Melanoma marker | |
| MLANA | 19 (44) |
| MAGEA3 | 29 (67) |
| MITF | 19 (44) |
| No. of markers detected | |
| 0 | 3 (7) |
| 1 | 19 (44) |
| 2 | 15 (35) |
| 3 | 6 (14) |
| 4 | 0 (0) |
| ≥1 | 40 (93) |
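Table 2 can be checked for internal consistency with simple arithmetic. The following Python sketch transcribes the table's counts (the variable names are illustrative) and verifies that the "No. of markers detected" rows account for all 43 patients and that their weighted sum equals the total number of per-marker positive calls; it then reproduces the rounded percentages.

```python
# Counts transcribed from Table 2 (n = 43 patients); names are illustrative.
N_PATIENTS = 43
POSITIVE_BY_MARKER = {"MLANA": 19, "MAGEA3": 29, "MITF": 19}
PATIENTS_BY_NUMBER_DETECTED = {0: 3, 1: 19, 2: 15, 3: 6, 4: 0}

def pct(count: int, total: int = N_PATIENTS) -> int:
    """Whole-number percentage, as reported in Table 2."""
    return round(100 * count / total)

# 1) The "No. of markers detected" rows must account for every patient.
assert sum(PATIENTS_BY_NUMBER_DETECTED.values()) == N_PATIENTS

# 2) A patient with j detected markers contributes j positive calls, so the
#    weighted sum must equal the sum of the per-marker positives.
calls_from_distribution = sum(j * count for j, count in PATIENTS_BY_NUMBER_DETECTED.items())
assert calls_from_distribution == sum(POSITIVE_BY_MARKER.values())  # 67 == 19 + 29 + 19

# 3) "At least one marker" is everyone except the zero-marker row.
at_least_one = N_PATIENTS - PATIENTS_BY_NUMBER_DETECTED[0]
print(at_least_one, pct(at_least_one))                      # 40 93
print({m: pct(k) for m, k in POSITIVE_BY_MARKER.items()})  # {'MLANA': 44, 'MAGEA3': 67, 'MITF': 44}
```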

These results demonstrated that HMW-MAA–positive melanoma cells frequently carry BRAFmt and that BRAFmt is frequently detectable in CTCs with this assay.

Discussion

To characterize CTCs in PBCs, we combined immunomagnetic-bead isolation with DNA and RNA PCR assays. We previously reported that qRT-PCR–based methods can detect CTCs and that positive signals for markers are associated with metastatic disease and poor prognosis in melanoma patients (3, 9, 10). The specificity of these qRT-PCR assays with respect to the cell type being assessed is controversial, however, primarily because it is not clear whether all melanoma-associated biomarkers detected in the blood of melanoma patients actually derive from CTCs. The use of immunomagnetic-bead capture of melanoma cells in our approach exploits the selective expression of HMW-MAA on melanoma cells (22). HMW-MAA has been shown to be heterogeneous in the expression of antigenic determinants recognized by mAbs. To minimize false-negative results caused by differential expression of HMW-MAA determinants on melanoma cells and to increase the detection capability of our isolation procedure, we used a mixture of mAbs that recognize distinct and spatially distant determinants of HMW-MAA to isolate CTCs from the PBCs of melanoma patients. The use of an HMW-MAA mAb mixture enabled us to maximize the exclusion of healthy blood cells and other types of circulating cells and to maximize the isolation of CTCs (i.e., HMW-MAA–positive melanoma cells). The immunomagnetic bead–based assay is capable of capturing intact HMW-MAA–positive melanoma cells. The identity of these CTCs as melanoma cells was established by subsequent analysis of 3 recognized melanoma-associated mRNA biomarkers frequently found in melanomas. The genes encoding these mRNA biomarkers do not have directly correlated functions (3, 9, 10). Our in vitro model demonstrated that our assay can detect approximately 1 melanoma cell diluted into 5 × 10⁶ peripheral blood lymphocytes (PBLs) from healthy donors. This assay provides a new opportunity not only to improve melanoma diagnosis but also to identify patients who have spreading systemic disease. Coupling the capture of CTCs with PCR permits improved characterization of the CTCs in blood, both for the expression of tumor biomarkers and for genomic aberrations.

Our previous studies found that metastatic melanoma tumors were heterogeneous for the expression of melanoma-associated markers (3, 10–12). For clinical application, we selected 3 mRNA biomarkers for detecting CTCs in the blood of melanoma patients. Using a combination of melanoma-associated markers in the qRT-PCR assay can circumvent the problem of heterogeneous mRNA expression of individual markers. We previously demonstrated that melanoma-associated genes are not expressed in all melanoma cells (3). The MAGEA3 transcript had the highest overall detection rate in PBCs from melanoma patients, followed by the MLANA and MITF transcripts (each detected in 44% of patients). Differences between results obtained for cell lines and blood samples may be related to the physiology of cells in the circulation or to the phenotype of the clone shed into the blood (26). In paired patient blood samples, detection rates for some individual biomarkers were slightly higher when qRT-PCR followed immunomagnetic bead–based capture than when qRT-PCR was performed without bead capture.
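The in vitro detection-limit data in Table 1 can be summarized programmatically. This Python sketch transcribes the Table 1 counts and reports, for each marker, the detection frequency at a given spike-in and the smallest non-zero spike-in detected in all 4 experiments. The "detected in every run" criterion is a hypothetical, deliberately strict convention chosen for illustration; the article reports detection from approximately 1 cell without stating such a rule, and the function names are illustrative.

```python
from typing import Optional

# Counts transcribed from Table 1: positive results out of 4 experiments at each
# number of melanoma cells spiked into 5 x 10^6 PBCs (bead-based assay).
N_EXPERIMENTS = 4
CELL_NUMBERS = (10_000, 1_000, 100, 10, 1, 0)
TABLE1 = {
    "GAPDH": (4, 4, 4, 4, 4, 0),
    "MLANA": (4, 4, 4, 4, 4, 0),
    "MAGEA3": (4, 4, 4, 4, 4, 0),
    "MITF": (4, 4, 4, 4, 3, 0),
}

def detection_frequency(marker: str, cells: int) -> float:
    """Fraction of experiments in which `marker` was detected at `cells` spiked cells."""
    return TABLE1[marker][CELL_NUMBERS.index(cells)] / N_EXPERIMENTS

def lowest_cells_detected_in_all_runs(marker: str) -> Optional[int]:
    """Smallest non-zero spike-in detected in every experiment (hypothetical strict criterion)."""
    hits = [c for c, k in zip(CELL_NUMBERS, TABLE1[marker]) if c > 0 and k == N_EXPERIMENTS]
    return min(hits) if hits else None

for marker in ("MAGEA3", "MLANA", "MITF"):
    print(marker, f"{detection_frequency(marker, 1):.0%}", lowest_cells_detected_in_all_runs(marker))
# MAGEA3 100% 1
# MLANA 100% 1
# MITF 75% 10
```

The 0-cell column doubles as a negative control: every marker has a detection frequency of 0 there, consistent with the absence of signals in healthy-donor samples.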
We previously reported that the presence of a BRAF mutation in circulating DNA in serum may have clinical utility in predicting tumor response and disease outcome (19); however, whether the BRAF mutation in serum DNA actually derives from metastatic melanoma tumors or from CTCs remains to be determined. Our results demonstrated a high frequency of BRAFmt in cells captured with HMW-MAA mAbs. BRAFmt analysis in the present assay may be useful as a melanoma-associated biomarker. This is the first report of such an assay for a surrogate biomarker of CTCs in the blood of melanoma patients. The BRAF mutation encoding the V600E variant is specific to tumor cells, particularly in melanoma, although nevus cells have been reported to carry the V600E variant (27). This mutation can be found in up to 80% of patients with metastatic cutaneous melanoma. A recent study captured circulating non–small cell lung cancer cells and found epidermal growth factor receptor (EGFR) mutations in the captured cells in 11 of 12 patients (28). Currently, melanoma prognosis continues to be based on tumor characteristics and on demographic and prognostic factors of the host, despite the importance of ongoing dynamic factors of tumor metastasis. Our findings demonstrate the potential clinical usefulness of an immunomagnetic bead–based assay for detecting CTCs in the blood of melanoma patients. Future studies involving serial blood analysis and long-term follow-up of patients may be needed for a more detailed assessment of the clinical utility of this assay.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:
Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: D.S.B. Hoon, NIH.
Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References

1. Balch CM, Buzaid AC, Soong SJ, Atkins MB, Cascinelli N, Coit DG, et al. Final version of the American Joint Committee on Cancer staging system for cutaneous melanoma. J Clin Oncol 2001;19:3635–48.
2. Greene FL, Page DL, Fleming ID, Fritz A, Balch CM, Haller DG, Morrow M, eds. AJCC cancer staging manual. 6th ed. New York: Springer-Verlag; 2002. 421 p.
3. Koyanagi K, Kuo C, Nakagawa T, Mori T, Ueno H, Lorico AR Jr, et al. Multimarker quantitative real-time PCR detection of circulating melanoma cells in peripheral blood: relation to disease stage in melanoma patients. Clin Chem 2005;51:981–8.
4. Mellado B, Gutierrez L, Castel T, Colomer D, Fontanillas M, Castro J, Estape J. Prognostic significance of the detection of circulating malignant cells by reverse transcriptase-polymerase chain reaction in long-term clinically disease-free melanoma patients. Clin Cancer Res 1999;5:1843–8.
5. Palmieri G, Strazzullo M, Ascierto PA, Satriano SM, Daponte A, Castello G. Polymerase chain reaction-based detection of circulating melanoma cells as an effective marker of tumor progression. Melanoma Cooperative Group. J Clin Oncol 1999;17:304–11.
6. Pantel K, Cote RJ, Fodstad O. Detection and clinical importance of micrometastatic disease. J Natl Cancer Inst 1999;91:1113–24.
7. Taback B, Morton DL, O’Day SJ, Nguyen DH, Nakayama T, Hoon DS. The clinical utility of multimarker RT-PCR in the detection of occult metastasis in patients with melanoma. Recent Results Cancer Res 2001;158:78–92.
8. Wascher RA, Morton DL, Kuo C, Elashoff RM, Wang HJ, Gerami M, Hoon DS. Molecular tumor markers in the blood: early prediction of disease outcome in melanoma patients treated with a melanoma vaccine. J Clin Oncol 2003;21:2558–63.
9. Koyanagi K, O’Day SJ, Gonzalez R, Lewis K, Robinson WA, Amatruda TT, et al. Microphthalmia transcription factor as a molecular marker for circulating tumor cell detection in blood of melanoma patients. Clin Cancer Res 2006;12:1137–43.


10. Koyanagi K, O’Day SJ, Gonzalez R, Lewis K, Robinson WA, Amatruda TT, et al. Serial monitoring of circulating melanoma cells during neoadjuvant biochemotherapy for stage III melanoma: outcome prediction in a multicenter trial. J Clin Oncol 2005;23:8057–64.
11. Takeuchi H, Kuo C, Morton DL, Wang HJ, Hoon DS. Expression of differentiation melanoma-associated antigen genes is associated with favorable disease outcome in advanced-stage melanomas. Cancer Res 2003;63:441–8.
12. Takeuchi H, Morton DL, Kuo C, Turner RR, Elashoff D, Elashoff R, et al. Prognostic significance of molecular upstaging of paraffin-embedded sentinel lymph nodes in melanoma patients. J Clin Oncol 2004;22:2671–80.
13. Miyashiro I, Kuo C, Huynh K, Iida A, Morton D, Bilchik A, et al. Molecular strategy for detecting metastatic cancers with use of multiple tumor-specific MAGE-A genes. Clin Chem 2001;47:505–12.
14. Kawakami Y, Eliyahu S, Delgado CH, Robbins PF, Rivoltini L, Topalian SL, et al. Cloning of the gene coding for a shared human melanoma antigen recognized by autologous T cells infiltrating into tumor. Proc Natl Acad Sci U S A 1994;91:3515–9.
15. Davis IJ, Kim JJ, Ozsolak F, Widlund HR, Rozenblatt-Rosen O, Granter SR, et al. Oncogenic MITF dysregulation in clear cell sarcoma: defining the MiT family of human cancers. Cancer Cell 2006;9:473–84.
16. Shinozaki M, Fujimoto A, Morton DL, Hoon DS. Incidence of BRAF oncogene mutation and clinical relevance for primary cutaneous melanomas. Clin Cancer Res 2004;10:1753–7.
17. Davies H, Bignell GR, Cox C, Stephens P, Edkins S, Clegg S, et al. Mutations of the BRAF gene in human cancer. Nature 2002;417:949–54.
18. Wan PT, Garnett MJ, Roe SM, Lee S, Niculescu-Duvaz D, Good VM, et al. Mechanism of activation of the RAF-ERK signaling pathway by oncogenic mutations of B-RAF. Cell 2004;116:855–67.
19. Shinozaki M, O’Day SJ, Kitago M, Amersi F, Kuo C, Kim J, et al. Utility of circulating B-RAF DNA mutation in serum for monitoring melanoma patients receiving biochemotherapy. Clin Cancer Res 2007;13:2068–74.
20. Takeuchi H, Fujimoto A, Tanaka M, Yamano T, Hsueh E, Hoon DS. CCL21 chemokine regulates chemokine receptor CCR7 bearing malignant melanoma cells. Clin Cancer Res 2004;10:2351–8.
21. Goto Y, Ferrone S, Arigami T, Kitago M, Tanemura A, Sunami E, et al. Human high molecular weight-melanoma-associated antigen: utility for detection of metastatic melanoma in sentinel lymph nodes. Clin Cancer Res 2008;14:3401–8.
22. Campoli MR, Chang CC, Kageshita T, Wang X, McCarthy JB, Ferrone S. Human high molecular weight-melanoma-associated antigen (HMW-MAA): a melanoma cell surface chondroitin sulfate proteoglycan (MSCP) with biological and clinical significance. Crit Rev Immunol 2004;24:267–96.
23. Temponi M, Kageshita T, Perosa F, Ono R, Okada H, Ferrone S. Purification of murine IgG monoclonal antibodies by precipitation with caprylic acid: comparison with other methods of purification. Hybridoma 1989;8:85–95.
24. Spugnardi M, Tommasi S, Dammann R, Pfeifer GP, Hoon DS. Epigenetic inactivation of RAS association domain family protein 1 (RASSF1A) in malignant cutaneous melanoma. Cancer Res 2003;63:1639–43.
25. Kim J, Giuliano AE, Turner RR, Gaffney RE, Umetani N, Kitago M, et al. Lymphatic mapping establishes the role of BRAF gene mutation in papillary thyroid carcinoma. Ann Surg 2006;244:799–804.
26. Hoon DS, Wang Y, Dale PS, Conrad AJ, Schmid P, Garrison D, et al. Detection of occult melanoma cells in blood with a multiple-marker polymerase chain reaction assay. J Clin Oncol 1995;13:2109–16.
27. Michaloglou C, Vredeveld LC, Soengas MS, Denoyelle C, Kuilman T, van der Horst CM, et al. BRAFE600-associated senescence-like cell cycle arrest of human naevi. Nature 2005;436:720–4.
28. Maheswaran S, Sequist LV, Nagrath S, Ulkus L, Brannigan B, Collura CV, et al. Detection of mutations in EGFR in circulating lung-cancer cells. N Engl J Med 2008;359:366–77.

Clinical Chemistry 55:4 765–773 (2009)

Molecular Diagnostics and Genetics

Circulating Prostate Tumor Cells Detected by Reverse Transcription–PCR in Men with Localized or Castration-Refractory Prostate Cancer: Concordance with CellSearch Assay and Association with Bone Metastases and with Survival

Pauliina Helo,¹ Angel M. Cronin,¹† Daniel C. Danila,²† Sven Wenske,¹ Rita Gonzalez-Espinoza,³ Aseem Anand,³ Michael Koscuiszka,² Riina-Minna Väänänen,⁴ Kim Pettersson,⁴ Felix K.-H. Chun,⁵ Thomas Steuber,⁵ Hartwig Huland,⁵ Bertrand D. Guillonneau,¹ James A. Eastham,¹ Peter T. Scardino,¹ Martin Fleisher,³ Howard I. Scher,² and Hans Lilja¹,²,³*

BACKGROUND: Reverse transcription–PCR (RT-PCR) assays have been used for analysis of circulating tumor cells (CTCs), but their clinical value has yet to be established. We assessed men with localized prostate cancer or castration-refractory prostate cancer (CRPC) for CTCs via real-time RT-PCR assays for KLK3 [kallikrein-related peptidase 3; i.e., prostate-specific antigen (PSA)] and KLK2 mRNAs. We also assessed the association of CTCs with disease characteristics and survival.

METHODS: KLK3, KLK2, and PSCA (prostate stem cell antigen) mRNAs were measured by standardized, quantitative real-time RT-PCR assays in blood samples from 180 localized-disease patients, 76 metastatic CRPC patients, and 19 healthy volunteers. CRPC samples were also tested for CTCs by an immunomagnetic separation system approved for clinical use (CellSearch™; Veridex).

RESULTS: All healthy volunteers were negative for KLK mRNAs. Results of tests for KLK3 or KLK2 mRNA were positive (≥80 mRNA copies/mL blood) in 37 patients (49%) with CRPC but in only 15 patients (8%) with localized cancer. RT-PCR and CellSearch CTC results were strongly concordant (80%–85%) and correlated (Kendall τ, 0.60–0.68). Among CRPC patients, KLK mRNAs and CellSearch CTCs were closely associated with clinical evidence of bone metastases and with survival but were only modestly correlated with serum PSA concentrations. PSCA mRNA was detected in only 7 CRPC patients (10%) and was associated with a positive KLK mRNA status.

CONCLUSIONS: Real-time RT-PCR assays of KLK mRNAs are highly concordant with CellSearch CTC results in patients with CRPC. KLK2/3-expressing CTCs are common in men with CRPC and bone metastases but are rare in patients with metastases diagnosed only in soft tissues and in patients with localized cancer.

© 2009 American Association for Clinical Chemistry

¹ Departments of Surgery (Urology Service), ² Medicine (Genitourinary Oncology Service), and ³ Clinical Laboratories, Memorial Sloan-Kettering Cancer Center, New York, NY; ⁴ Department of Biotechnology, University of Turku, Turku, Finland; ⁵ Department of Urology, University Hospital of Hamburg, Hamburg, Germany.
* Address correspondence to this author at: Departments of Clinical Laboratories, Surgery and Medicine, Memorial Sloan-Kettering Cancer Center, 1275 York Ave., Box 213, New York, NY 10065. E-mail [email protected].
† A.M. Cronin and D.C. Danila made equivalent contributions.
Received September 19, 2008; accepted January 26, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.117952
⁶ Nonstandard abbreviations: PSA, prostate-specific antigen; CTC, circulating tumor cell; RT-PCR, reverse transcription–PCR; CRPC, castration-refractory prostate cancer; MSKCC, Memorial Sloan-Kettering Cancer Center; RP, radical prostatectomy; UKE, University Hospital of Hamburg; CPE, concordance probability estimate.

For patients with prostate cancer, there is a need for improved predictive markers to facilitate treatment selection and to monitor the effects of treatment. This need is particularly acute for patients with metastatic and/or castration-refractory disease. In these patients, the measured concentration of prostate-specific antigen (PSA)⁶ is only loosely associated with survival, bone scans offer only limited information on changes in the disease, and biopsy has poor sensitivity and is not clinically practical on a repeated basis. In addition, there is a need for improved preoperative staging modalities for patients with localized prostate cancer. These problems have created interest in circulating tumor cells (CTCs) as a potential marker. The numerous techniques that have been used to detect CTCs can be categorized as techniques for detecting tissue- or disease-specific gene expression, such as reverse transcription–PCR (RT-PCR), and techniques for detecting CTCs as intact cells, such as flow cytometry and immunomagnetic capture. The only assay approved for clinical use in the US is the CellSearch semiautomated immunomagnetic capture and detection system (Veridex) (1). It has been hypothesized that real-time RT-PCR of tissue-specific transcripts may be a more sensitive method for CTC detection. End-point RT-PCR for molecular staging of prostate cancer was investigated with high expectations in the early 1990s (2, 3), in the hope that CTC detection might provide a more accurate way to predict pathologic stage and the risk of disease recurrence preoperatively. After initial promise, however, CTC-detection methods have often produced discrepant results, and most reports have failed to show clinical value (4–11). Furthermore, the proportions of positive samples in different disease stages remain unclear: frequencies of blood samples positive for prostate-specific RNAs have ranged from 0% to 81% in clinically localized disease and from 31% to 100% in metastatic disease (12, 13). These widespread inconsistencies may stem largely from differences in preanalytical sample processing, differences in analytical methods, poor standardization, and qualitative or semiquantitative detection of end-point RT-PCR products. We have developed sensitive, highly reproducible, and fully standardized real-time quantitative RT-PCR assays for KLK3⁷ (kallikrein-related peptidase 3; i.e., PSA) and KLK2 (human kallikrein 2) (14–16). In this study, we compared this method with CellSearch in patients with castration-refractory prostate cancer (CRPC) and investigated the association of CTCs with disease characteristics and survival. We also investigated the frequency of KLK3- or KLK2-expressing CTCs in men with clinically localized prostate cancer. Finally, we describe a new real-time quantitative RT-PCR assay for the detection of PSCA (prostate stem cell antigen) mRNA. The PSCA gene is overexpressed in prostate cancer metastases (17), and our objective was to assess the frequency of PSCA-expressing CTCs in patients with CRPC.

Materials and Methods

PARTICIPANTS, SAMPLE COLLECTION, AND CellSearch ANALYSIS

The study enrolled 80 patients treated at Memorial Sloan-Kettering Cancer Center (MSKCC) for metastatic CRPC with castrate concentrations of testosterone (<1.74 nmol/L). Radionuclide bone scans were reviewed for the presence or absence of metastatic bone disease, and computed tomography and/or MRI scans were evaluated for lymph node, liver, or lung soft tissue disease, or for epidural and prostatic/pelvic masses. Four patients without evidence of metastatic disease or castrate concentrations of testosterone were excluded. As controls, the study enrolled 19 healthy volunteers: 12 men younger than 40 years without prostate cancer and 7 women. We enrolled 3 groups of patients with localized disease. The first consisted of 42 patients who had undergone radical prostatectomy (RP) at MSKCC at least 6 months before sample collection (median, 24 months; interquartile range, 6–80 months). Eleven patients in this group (26%) had pathologic stage pT3a, 4 patients (10%) had stage pT3b disease, and 27 patients had pT2 cancer with positive surgical margins (3 patients, 7%), capsular invasion (15 patients, 36%), or no adverse pathologic features (9 patients, 21%). The second group included 87 patients who were scheduled to undergo RP for clinically localized prostate cancer and whose RP specimen subsequently showed at least one unfavorable feature, defined as seminal vesicle invasion, extracapsular extension or capsular invasion, positive surgical margin, or lymph node involvement. Thirty-two patients in the second group (37%) had pathologic stage pT3a, 9 patients (10%) had pT3b disease, and 2 patients (2%) were in stage pT4. The remaining patients had pT2 cancer with lymph node involvement (1 patient, 1%), positive surgical margin (7 patients, 8%), or capsular invasion (36 patients, 41%). The third group consisted of 51 patients with prostate cancer diagnosed at University Hospital of Hamburg (UKE), Germany, who were scheduled to undergo either RP for clinically localized cancer or radiation therapy of the prostate. Six (23%) of the 26 pre-RP patients had a pathologic stage of pT3a, and 6 (23%) had pT3b disease.

⁷ Human genes: KLK3, kallikrein-related peptidase 3; KLK2, kallikrein-related peptidase 2; PSCA, prostate stem cell antigen.
Fourteen of these 26 patients had pT2 cancer, 2 (8%) with positive surgical margins and 12 (46%) with no adverse pathologic features. Biochemical recurrence was defined as having at least one serum PSA value >0.4 ng/mL. Peripheral blood (2.5 mL) was collected in PAXgene Blood RNA tubes (PreAnalytiX). Samples were incubated at room temperature for 24 h and stored at −20 °C and −80 °C until RNA isolation. For CRPC patients, a second sample of 7.5 mL was collected at the same visit in a CellSave tube (Veridex) and processed for CTC counts by CellSearch immunomagnetic selection as previously described (18). All samples were collected under institutional review board–approved protocols with informed consent.

Circulating Tumor Cells in Localized and Metastatic Prostate Cancer

RNA ISOLATION AND cDNA SYNTHESIS

RNA was isolated with the PAXgene Blood RNA kit (PreAnalytiX) according to the manufacturer’s instructions, including the optional DNase digestion. An internal standard RNA, m3PSA (see the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4), was added in predetermined amounts to each patient sample at the beginning of the RNA isolation (step 5 in the PAXgene RNA-isolation protocol) to produce 10 000 molecules/µL of the reverse-transcription reaction. The internal standard reflects any variation arising from the different steps of the RT-PCR protocol, from RNA isolation to signal detection. RNA was quantified with the RiboGreen RNA-quantitation reagent (Invitrogen), immediately divided into 2 aliquots, and reverse-transcribed with the High-Capacity cDNA Archive Kit (Applied Biosystems). RNA priming and reverse transcription are detailed in the online Data Supplement. In brief, one aliquot of RNA was reverse-transcribed with a mixture of sequence-specific RNA primers and anchored oligo(dT)12 primers to detect KLK mRNA and internal standard RNA, and the second aliquot was reverse-transcribed with a mixture of random primers and anchored oligo(dT)12 primers for the detection of PSCA expression.

CALIBRATION CURVE

As an external calibration curve, we used calibrators containing varying amounts of KLK3 and KLK2 mRNA (2.5 to 10⁴ mRNA copies/µL of the reverse-transcription reaction, corresponding to 160 to 6.4 × 10⁵ copies/mL of blood) and a fixed amount of m3PSA RNA (10 000 RNA molecules/µL of reverse-transcription reaction). Calibrator RNAs were diluted into 100 ng/µL tRNA (Escherichia coli MRE 600 tRNA; Roche Applied Science) in sterile water. The external calibrators were analyzed along with all patient samples. In vitro production of RNA calibrators and data analysis have been described (19–21).

REAL-TIME RT-PCR METHODOLOGY

The real-time quantitative RT-PCR methodology for KLK3 and KLK2 mRNA has been described in detail (14–16, 19). In this study, 2.5 µL cDNA, representing 39 µL of blood, was used as template in 10-µL amplification reactions. All samples were run in duplicate. The mean within-assay variation (maximum difference between duplicate reactions) was <0.5 threshold cycles for the internal standard mRNA and <1 threshold cycle for KLK3 and KLK2 mRNAs.

RT-PCR ASSAY FOR PSCA mRNA

An internally standardized real-time quantitative RT-PCR assay was developed for PSCA mRNA. The primer and probe sequences were designed to span the exon splice junctions; oligonucleotide sequences are shown in Table 1 in the online Data Supplement. The reporter probe (Thermo Fisher Scientific) had a 5′ aminolinker and a 3′ phosphate; it was terbium-labeled at the 5′ end and purified as described (16). The quencher probe was modified with a 3′ Dabcyl quencher moiety (Thermo Fisher Scientific). PCR reactions were run in duplicate with 5 µL of cDNA as template in 25-µL reactions, with reaction compositions and thermal cycling as described for KLK3 mRNA (15). Data were analyzed as previously described (22). The PSCA mRNA assay was performed on 71 of the 76 samples from CRPC patients.

STATISTICAL METHODS

A positive RT-PCR result was defined as ≥80 copies of target RNA per milliliter of blood in both PCR replicates, with at least 20% recovery of the internal standard. A positive CellSearch result was defined as ≥5 CTCs per 7.5 mL of blood. We analyzed the relationship between KLK mRNA and CellSearch CTC results in 2 ways. First, we calculated the concordance: the proportion of patients categorized as either both positive or both negative in the KLK mRNA and CellSearch assays. Second, we treated the KLK mRNA and CTC results as continuous variables and calculated the Kendall τ to estimate correlation. We tested the association of a positive RT-PCR result with clinical characteristics of the CRPC patients: presence or absence of biochemical progression at the time of research blood draw, location of metastasis, and overall survival. The probability of survival after the time of blood draw was estimated with Kaplan–Meier methods. Univariate associations of PSA, log CTC count, log KLK2 mRNA, and log KLK3 mRNA with survival were evaluated with Cox proportional hazards regression models. Predictive accuracy is given by the concordance probability estimate (CPE). The CPE measures the level of concordance between the survival time and the prognostic index as determined by the Cox model. The CPE ranges between 0.5 and 1.0, with 1.0 representing perfect concordance between the prognostic index and survival time, and 0.5 representing absence of a relationship between prognostic index and survival time [for further information on the CPE, see (23)]. Owing to the limited follow-up, we were unable to compare actual recurrence outcomes for the patients with localized disease. We instead used the postoperative nomogram probability as a surrogate for recurrence outcomes. The postoperative nomogram probability (24) was computed for all patients with locally advanced disease who underwent RP (42 patients with postoperative blood samples and 87 patients with preoperative blood samples from MSKCC; 26 patients with preoperative blood samples from UKE), and the Mann–Whitney U-test was used to compare probabilities for patients positive for KLK3 or KLK2 mRNA with those negative for both KLK3 and KLK2 mRNAs.
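The positivity rule and the relation between calibrator units and blood units can be made concrete. The Python sketch below assumes that a concentration expressed per microliter of reverse-transcription reaction converts to copies per milliliter of blood through the 2.5 µL of cDNA (representing 39 µL of blood) used per PCR; this inferred factor of about 64 reproduces the stated calibrator range (2.5 to 10⁴ copies/µL ≈ 160 to 6.4 × 10⁵ copies/mL). The function names are hypothetical, and because the text does not say how samples with <20% internal-standard recovery were reported, the sketch simply does not call them positive.

```python
# Values taken from the Methods text; the conversion itself is an inference (see lead-in).
UL_CDNA_PER_PCR = 2.5         # µL of cDNA (reverse-transcription reaction) used as PCR template
UL_BLOOD_PER_PCR = 39.0       # µL of blood represented by that template
CUTOFF_COPIES_PER_ML = 80.0   # positivity threshold, copies per mL of blood
MIN_RECOVERY = 0.20           # minimum internal-standard (m3PSA) recovery

def copies_per_ml_blood(copies_per_ul_rt: float) -> float:
    """Convert copies/µL of RT reaction into copies/mL of blood."""
    copies_in_pcr = copies_per_ul_rt * UL_CDNA_PER_PCR
    return copies_in_pcr / (UL_BLOOD_PER_PCR / 1000.0)

def is_rt_pcr_positive(replicate_copies_per_ml, internal_std_recovery: float) -> bool:
    """Positive only if every PCR replicate reaches the cutoff and recovery is >= 20%."""
    if internal_std_recovery < MIN_RECOVERY:
        return False  # assumption: low-recovery samples are simply not called positive
    return all(c >= CUTOFF_COPIES_PER_ML for c in replicate_copies_per_ml)

print(round(copies_per_ml_blood(2.5)))        # 160 (lowest calibrator)
print(f"{copies_per_ml_blood(1e4):.2g}")      # 6.4e+05 (highest calibrator)
print(is_rt_pcr_positive([95.0, 130.0], 0.45))  # True
print(is_rt_pcr_positive([95.0, 60.0], 0.45))   # False: one replicate below 80 copies/mL
```

Requiring both duplicates to pass trades a little sensitivity for protection against single-well artifacts near the 80 copies/mL cutoff.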

Results

CHARACTERISTICS OF THE CRPC PATIENTS

The clinical characteristics of the 76 CRPC patients included in the study are detailed in Table 2 in the online Data Supplement. At the time of research blood draw, 60 (79%) of the patients had increasing PSA concentrations under androgen-depletion therapy. Metastases were limited to soft tissue in 9 patients (12%), limited to bone in 26 patients (34%), and present in both soft tissue and bone in the remaining 41 patients (54%). The median serum PSA concentration at the time of research blood draw was 111 ng/mL (interquartile range, 31–433 ng/mL).

CONCORDANCE AND CORRELATION OF CellSearch CTC COUNTS WITH KLK AND PSCA mRNAs

The KLK3 mRNA status was positive in 27 (36%) of the 76 patients with CRPC, and the KLK2 mRNA status was positive in 32 patients (42%). Results for one or both KLK mRNAs were positive in 37 patients (49%). In contrast, results for both KLKs were negative in all 19 healthy volunteers. The CellSearch CTC status was positive in 31 (48%) of the 65 CRPC patients with evaluable results. The median CTC count was 4 cells per 7.5 mL of blood (interquartile range, 1–31 cells). Thirty-one patients (48%) had ≥5 CTCs, and 24 (37%) had ≥10 CTCs. A positive status for PSCA mRNA occurred at a much lower frequency (10%, 7 of 71 patients tested). Six of the 7 patients positive for PSCA mRNA were also positive for one or both KLK mRNAs.

Table 1 summarizes the concordance between the KLK mRNA and CellSearch CTC data. The status for either KLK3 or KLK2 mRNA was concordant with CellSearch CTC counts in 82% of the CRPC patients. Of the 34 patients with a negative CTC status (<5 CTCs per 7.5 mL), 7 (21%) had detectable KLK3 or KLK2 mRNA. To assess the level of confidence that KLK mRNA is a marker for CTCs, we computed the 95% CI around the estimate that a CTC-positive patient was also positive for KLK mRNA (i.e., the sensitivity of KLK mRNA for detecting CTCs). The sensitivity was 74% for KLK3 mRNA and 74% for KLK2 mRNA (each 23 of 31 CTC-positive patients; 95% CI, 55%–88%), and 84% (95% CI, 66%–95%) for either KLK3 or KLK2 mRNA. Because CTC counts are reported per 7.5 mL of blood, and 5 CTCs therefore correspond to only 0.67 cells/mL, we also evaluated the level of confidence that KLK mRNA was a marker of 2 CTCs/mL (i.e., ≥15 CTCs per 7.5 mL blood). The sensitivities were 82% (18 of 22 patients; 95% CI, 60%–95%) for KLK3 mRNA, 91% (20 of 22 patients; 95% CI, 71%–99%) for KLK2 mRNA, and 95% (21 of 22 patients; 95% CI, 77%–100%) when either KLK3 or KLK2 mRNA was considered. We note that this sensitivity was achieved with a much smaller volume of blood than is used in the CellSearch assay (39 µL of blood per PCR reaction vs 7.5 mL). These results are also consistent with prior data showing that the KLK assays are sufficiently sensitive to detect <2 LNCaP cells per milliliter of blood in a background of 10⁷ nucleated cells (or 160 mRNA copies/mL of blood) (15). The Kendall τ coefficient of correlation between CTC counts and mRNA copy numbers was 0.68 for KLK3 and 0.60 for KLK2 (Table 1). Scatter plots of CTC counts vs KLK3 mRNA and vs KLK2 mRNA copy numbers are shown in Fig. 1. A comparison of KLK2 and KLK3 data also revealed high levels of concordance (80%, 52 of 65 patients) and correlation (Kendall τ, 0.66) (Table 1).

Table 1. Concordance and correlation between CellSearch CTC status and KLK3 or KLK2 mRNA status in CRPC patients.

Patients, n, by CellSearch CTC status:

| mRNA status | CTC negative | CTC positive | Concordance | Kendall τ |
|---|---|---|---|---|
| KLK3 RNA negative | 32 | 8 | 85% | 0.68 |
| KLK3 RNA positive | 2 | 23 | | |
| KLK2 RNA negative | 29 | 8 | 80% | 0.60 |
| KLK2 RNA positive | 5 | 23 | | |
| KLK3 and KLK2 RNA negative | 27 | 5 | 82% | NTᵃ |
| Either KLK RNA positive | 7 | 26 | | |

Patients, n, by KLK2 RNA status:

| KLK3 RNA status | KLK2 RNA negative | KLK2 RNA positive | Concordance | Kendall τ |
|---|---|---|---|---|
| KLK3 RNA negative | 32 | 8 | 80% | 0.66 |
| KLK3 RNA positive | 5 | 20 | | |

ᵃ NT, not tested.

ASSOCIATION WITH DISEASE CHARACTERISTICS

A positive status for KLK3 and KLK2 mRNAs was found only in patients who had clinical evidence of bone metastases; one or both KLK mRNAs were positive in 13 patients (50%) with bone metastases alone, in 24 patients (59%) with bone and soft tissue metastases, and in none of the patients with soft tissue metastases alone (Fig. 2; see Table 3 in the online Data Supplement). Similarly, a positive CellSearch result was more frequent in patients with bone metastases. PSA concentration in serum (measured within 3 days of collecting blood for CTC analyses) was modestly correlated with KLK mRNA copy number and with CellSearch CTC count. Kendall τ coefficients of correlation were 0.43 for serum PSA and KLK3 mRNA, 0.35 for serum PSA and KLK2 mRNA, and 0.42 for serum PSA and CTC counts (all P < 0.001). A positive status for KLK mRNAs and for CTC count also varied with the number of systemic therapies administered to the patients. The frequency of patients positive for either KLK mRNA was higher in patients after failure of treatment with multiple chemotherapeutic regimens (61%, 11 of 18 patients) than in patients receiving first-line chemotherapy (45%, 9 of 20 patients) or only hormonal therapy (42%, 16 of 38 patients). This effect was even more pronounced for CellSearch CTC results, which were positive for 10 (71%) of 14 patients with failure of multiple regimens, 10 (53%) of 19 patients who received first-line chemotherapy, and 11 (34%) of 32 patients who received only hormonal therapy.

Fig. 1. Scatter plots showing the relationship of KLK3 mRNA and KLK2 mRNA copy numbers with CTC counts as measured by CellSearch. Each point represents 1 patient with CRPC. Indicated are patients experiencing disease progression at the time of analysis (blue) and nonprogressing patients (red). Log-transformed values were used to construct the plots.
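The Kendall τ coefficients above are rank correlations. The text does not state which variant was used; the sketch below assumes the tie-corrected τ-b, which matters here because CTC counts contain many ties (several patients have 0 or 1 CTC). Because τ depends only on the ordering of values, the log transformation used for the plots in Fig. 1 does not change it. The function name and the brute-force pair loop are illustrative choices, not the authors' software, and the example values are hypothetical.

```python
import math
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Tie-corrected Kendall tau-b by counting all pairs (O(n^2), fine for ~100 patients)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: drops out of numerator and denominator
            if dx == 0:
                ties_x_only += 1
            elif dy == 0:
                ties_y_only += 1
            elif (dx > 0) == (dy > 0):
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + ties_x_only)
                      * (concordant + discordant + ties_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when either variable is constant")
    return (concordant - discordant) / denom


if __name__ == "__main__":
    # Hypothetical CTC counts (per 7.5 mL) and KLK3 mRNA copy numbers.
    ctc = [0, 1, 1, 4, 12, 40]
    klk3 = [0.0, 0.0, 3.5, 2.1, 50.0, 800.0]
    tau_raw = kendall_tau_b(ctc, klk3)
    tau_log = kendall_tau_b([math.log10(c + 1) for c in ctc], klk3)
    print(tau_raw, tau_log)  # identical, because tau depends only on ranks
```

The O(n²) loop is chosen for transparency; for cohorts of this size it is fast enough, and a library routine would normally be used in practice.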

Fig. 2. CTC count, KLK mRNA, and PSCA mRNA status (percent positive) among CRPC patients with metastases in bone (n = 26), bone and soft tissue (n = 41), and soft tissue only (n = 9). Five patients were not tested for PSCA mRNA, and CellSearch data were not available for 11 patients.


Fig. 3. Kaplan–Meier survival probability according to KLK mRNA status for the 76 patients with CRPC.
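Fig. 3 shows Kaplan–Meier estimates. The text does not describe the software used; as a hedged illustration, the product-limit estimator below multiplies, at each distinct death time t, the factor (1 − deaths at t / patients at risk just before t). Patients censored at a death time are counted as still at risk at that time, which is the usual convention. The function name and the follow-up data in the usage lines are hypothetical.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate of survival.

    times  -- follow-up time per patient (e.g., months from blood draw)
    events -- 1 if the patient died at that time, 0 if censored (alive at last contact)
    Returns (time, estimated survival probability) at each distinct death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    records = sorted(zip(times, events))
    n_at_risk = len(records)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = leaving = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(records) and records[i][0] == t:
            deaths += 1 if records[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at t still count as at risk for the deaths at t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


if __name__ == "__main__":
    # Hypothetical follow-up in months for six patients (1 = died, 0 = censored).
    months = [3, 8, 8, 14, 17, 22]
    died = [1, 1, 0, 0, 1, 0]
    for t, s in kaplan_meier(months, died):
        print(f"month {t}: S = {s:.3f}")
```

Comparing KLK-positive and KLK-negative groups, as in Fig. 3, amounts to running the estimator separately on each group.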

ASSOCIATION WITH SURVIVAL

Of the 76 CRPC patients, 34 died during follow-up, with a median time to death of 17 months. The median follow-up for survivors was 14 months. Kaplan–Meier survival estimates show that survival times were shorter for patients who tested positive for KLK mRNA (Fig. 3). The corresponding plot constructed for the 60 patients with evidence of disease progression at the time of blood draw was essentially identical (data not shown). A univariate analysis indicated that KLK2 mRNA, KLK3 mRNA, CTC count, and serum PSA concentration were all highly associated with survival (all P < 0.001; Table 2). We constructed several multivariable models with these variables (Table 2) and included only patients with available PSA measurements and CTC counts (n = 60) to facilitate comparisons between models. The predictive accuracy of serum PSA alone, as assessed by the CPE, was 0.728. The inclusion of KLK3 mRNA produced similar results (0.726), but the CPE increased to 0.749 and 0.765 with the inclusion of KLK2 mRNA or CellSearch CTC counts, respectively. The full model (PSA + CTCs + KLK2 + KLK3) had a CPE of 0.759, which was similar to that of the model that included PSA and CTC counts only.

KLK mRNAs IN LOCALIZED DISEASE

Either or both KLK3 and KLK2 mRNAs were detected in only a small proportion of the patients with localized disease. Tables 4 and 5 in the online Data Supplement summarize patient characteristics according to KLK mRNA status. Of the patients treated at MSKCC, one or both KLK mRNAs were detected in 6 (14%) of the 42 patients with samples collected after RP and in 6 (7%) of the 87 patients with samples collected before RP. Results were similar for the 51 pretreatment samples from UKE; only 3 patients (6%) were positive for KLK mRNA. KLK mRNA status showed no apparent association with unfavorable features of localized disease (see Tables 4 and 5 in the online Data Supplement). There was no significant difference in the postoperative nomogram probability of local recurrence between patients positive for KLK3 or KLK2 mRNA and patients negative for both mRNAs (P = 0.3 for MSKCC preoperative; P = 0.6 for MSKCC postoperative; and P = 0.6 for UKE preoperative). Furthermore, there was no significant association between a positive status for KLK mRNAs and the time from the last biopsy or other prostatic manipulation (such as transurethral prostatectomy) to research blood draw (data not shown).

Table 2. Univariate analysis of associations with survival by Cox proportional hazards regression and predictive accuracy of univariate and multivariable models predicting survival.

Univariate association
Variable          Patients, n   Hazard ratio   95% CI       P
Log PSA               69            1.67       1.30–2.13    <0.0005
Log CTC               65            1.29       1.16–1.45    <0.0005
Log KLK2 mRNA         76            1.16       1.09–1.25    0.0007
Log KLK3 mRNA         76            1.18       1.10–1.26    <0.0005

Predictive accuracy
Model                                   CPE (SE)ᵃ
PSA                                     0.728 (0.042)
KLK2 mRNA                               0.670 (0.032)
KLK3 mRNA                               0.688 (0.028)
PSA + KLK2 mRNA                         0.749 (0.036)
PSA + KLK3 mRNA                         0.726 (0.041)
PSA + KLK2 mRNA + KLK3 mRNA             0.741 (0.034)
CTC                                     0.718 (0.029)
PSA + CTC                               0.765 (0.037)
PSA + CTC + KLK2 mRNA + KLK3 mRNA       0.759 (0.036)

ᵃ Calculated for the subset of 60 CRPC patients with PSA and CTC data to facilitate comparisons between models.

Discussion

RT-PCR has been used extensively as a means of detecting circulating prostate tumor cells, but it has yet to become accepted, in part because of discrepant results. Our approach has been to use internally and externally standardized quantitative real-time RT-PCR, extensively optimized assay conditions, and sample-collection procedures that help to preserve the true RNA profile. With this method, we have shown high correlation and concordance between results obtained with our real-time RT-PCR assays and those obtained with an independent assay for CTCs (CellSearch) in patients with metastatic CRPC. Real-time RT-PCR assays were able to detect KLK3 or KLK2 mRNAs in ≥95% of samples that had 15 or more CTCs per 7.5 mL of blood according to the CellSearch assay. CellSearch has been approved by the US Food and Drug Administration for predicting progression-free and overall survival in metastatic breast cancer (1, 25–27) and has recently been approved for use in advanced prostate cancer. 
Cells isolated via CellSearch technology from patients with progressive CRPC have molecular features of malignant prostate epithelial cells (18). The high concordance between KLK mRNA and CellSearch CTC results suggests that both methods target the same cell population and that real-time RT-PCR assays targeting KLK3 and KLK2 mRNAs reliably detect CTCs in the majority of men with metastatic prostate cancer.
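The sensitivities behind this conclusion rest on small denominators (31 CTC-positive patients, and 22 patients with ≥15 CTCs per 7.5 mL), so their 95% CIs are wide. The Results do not name the interval method; the sketch below assumes an exact (Clopper–Pearson) binomial interval, computed with the standard library only by bisecting on binomial tail probabilities. The counts come from the Results and Table 1; the function names are illustrative, not from the study's software. Under this assumption, 21 of 22 gives roughly 77%–100%, in line with the reported interval.

```python
import math
from typing import Callable, Tuple


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(f: Callable[[float], float], lo: float = 0.0, hi: float = 1.0,
            iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Exact two-sided (1 - alpha) CI for a binomial proportion x/n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    half = alpha / 2
    # Lower bound: P(X >= x | p) = alpha/2; this upper tail grows with p.
    lower = 0.0 if x == 0 else _bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - half)
    # Upper bound: P(X <= x | p) = alpha/2; this lower tail shrinks with p, so flip the sign.
    upper = 1.0 if x == n else _bisect(lambda p: half - binom_cdf(x, n, p))
    return lower, upper


if __name__ == "__main__":
    # (patients with detectable KLK mRNA, CTC-positive patients) from the Results.
    counts = {
        "KLK3 mRNA, >=5 CTCs": (23, 31),
        "Either KLK mRNA, >=5 CTCs": (26, 31),
        "Either KLK mRNA, >=15 CTCs": (21, 22),
    }
    for label, (x, n) in counts.items():
        lo, hi = clopper_pearson(x, n)
        print(f"{label}: {x / n:.0%} (95% CI, {lo:.0%}-{hi:.0%})")
```

Exact intervals are conservative for small samples, which is appropriate when a single additional missed patient would move the point estimate by several percentage points.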

The proportion of patients with metastatic CRPC positive for KLK3 or KLK2 mRNA was approximately 50%. In contrast, few patients were positive for another androgen receptor–responsive gene, PSCA, although a PSCA-positive status was highly associated with a positive status for KLK mRNAs. Although the overall correlation between KLK and CTC results is high, the scatter plots suggest the possibility of a distinct subpopulation of patients who shed CTCs with high KLK mRNA copy numbers that escape detection by the CellSearch assay. Approximately 20% of patients with <5 CTCs per 7.5 mL of blood (7 of 34) had detectable KLK3 or KLK2 mRNA: 2 of these patients had 0 CTCs, 2 had 1 CTC, and 3 had 2–4 CTCs per 7.5 mL of blood. No healthy volunteers had a KLK mRNA signal, implying that these transcripts are limited to CTCs and that the detection of such prostate-specific transcripts in blood is indicative of tumor cell dissemination. In addition, very few patients with localized prostate cancer were positive for KLK3 or KLK2 mRNA. Hence, samples that are positive for KLK mRNA but negative by the CellSearch assay cannot be explained by compromised specificity; instead, we hypothesize that these CTCs were too few or lacked the cell surface markers required for detection by the CellSearch assay. Conversely, some CRPC patients shed CTCs with very little or no detectable KLK mRNA. This finding may reflect therapeutic repression of androgen receptor function. From the scatter plots it is evident that the numbers of KLK2 and KLK3 mRNAs per CTC vary among CRPC patients. These numbers, which are derived from combining RT-PCR data with CellSearch CTC data, might prove informative with respect to androgen receptor function, which in turn may relate to critical disease characteristics, treatment options, and outcome. Determining the clinical value of this information will require further studies of KLK mRNA status in CTCs in a large cohort of patients with advanced cancer. 
CTCs were more frequent in patients who had undergone 2 or more chemotherapy regimens than in those with fewer systemic treatments. This effect was stronger for CellSearch CTCs than for KLK mRNAs, which may be attributable to the tendency of androgen receptor function, and therefore KLK2 and KLK3 expression as well, to diminish in more advanced disease. In patients with only hormonal therapy, KLK mRNA analysis appears to be the more sensitive assay. Higher CTC numbers have previously been reported in patients receiving second-line therapy (28). CTCs, whether detected by KLK mRNAs or by CellSearch, were very strongly associated with diagnosis of bone metastasis in the CRPC patients. Metastatic disease was confirmed by bone scans and soft tissue imaging, which, although not perfectly accurate, are the standard methods for diagnosing prostate cancer metastases. Notably, none of the patients who had soft tissue metastasis alone had a detectable KLK mRNA signal, although the number of patients was small. A similar trend was observed in a study that used CellSearch technology with a larger patient population (28). This association with the specific site of metastases, along with the only modest correlation of KLK RT-PCR and CellSearch results with the serum PSA concentration, indicates that these assays can provide distinct information that is not related simply to increased tumor burden.

KLK mRNA and CellSearch CTC results were both strongly associated with survival. The shorter survival time for KLK-positive patients also held among the patients whose disease was progressing, suggesting that analysis of KLK mRNAs will have utility in this subset of patients. The accuracy of serum PSA concentration for predicting survival was enhanced by the addition of KLK2 mRNA or CellSearch results (CPE, 0.749 and 0.765, respectively, vs 0.728 for PSA alone), and the full model with all these variables had a predictive accuracy of 0.759. This result was similar to that of a model that included only PSA and CellSearch results (0.765) and was similar to a published model that includes PSA, CellSearch results, and albumin (28). A limitation of the survival analysis is that, because of the cohort size, we were unable to test whether KLK mRNA and CellSearch results were associated with survival independent of potential confounding factors. One such factor could be patients' disease state as reflected by the history of chemotherapy. CTCs were more frequently found in patients who had experienced treatment failure with chemotherapy, a group that might be expected to have shorter survival times than chemotherapy-naive patients. 
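The predictive accuracies compared here are concordance probability estimates (CPEs) from Cox models (Table 2), the measure described by Gönen and Heller (23). A minimal sketch of the unsmoothed estimator, as we read that reference, follows. It assumes the Cox model has already been fitted elsewhere, so that each patient's linear predictor β′x (log relative hazard) is available. Each pair of patients with different scores contributes 1/(1 + exp(−|β′xᵢ − β′xⱼ|)), so the estimate depends only on the fitted risk scores and not on which patient was observed to die first; this is why it is described as robust to censoring. The standard errors in Table 2 need further machinery not shown here, and the function name and input values are hypothetical.

```python
import math
from itertools import combinations
from typing import Sequence


def gonen_heller_cpe(linear_predictors: Sequence[float]) -> float:
    """Unsmoothed concordance probability estimate for a fitted Cox model.

    linear_predictors -- beta'x for each patient (fitted log relative hazards).
    Each pair with different scores contributes 1 / (1 + exp(-|difference|));
    pairs with identical scores contribute 0 in this form.
    """
    n = len(linear_predictors)
    if n < 2:
        raise ValueError("need at least two patients")
    total = 0.0
    for eta_i, eta_j in combinations(linear_predictors, 2):
        gap = abs(eta_i - eta_j)
        if gap > 0:
            total += 1.0 / (1.0 + math.exp(-gap))
    # Average over the n(n - 1)/2 pairs.
    return 2.0 * total / (n * (n - 1))


if __name__ == "__main__":
    # Hypothetical linear predictors from a model such as PSA + CTC.
    eta = [0.2, -0.4, 1.1, 0.9, -1.3, 0.0]
    print(round(gonen_heller_cpe(eta), 3))
```

Because each pair contributes more than 0.5 whenever the scores differ, a model whose risk scores are widely spread across patients tends toward a higher CPE, which is consistent with log-scale PSA, CTC, and KLK predictors adding accuracy when they separate patients further.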
In another study, however, CellSearch CTC results were associated with survival independent of other prognostic factors, including number of prior chemotherapies (29). Testing whether this is also the case for KLK mRNA results will require a larger study. Our RT-PCR results imply a very low frequency of CTCs in patients with clinically localized disease, even among those with adverse pathologic features. Davis et al. used CellSearch and also found a low CTC frequency in men with early prostate cancer (3 of 97 patients with ≥3 CTCs per 22.5 mL of blood) (30). In that study, however, a similar frequency of CTCs was detected in men without prostate cancer, whereas no CTCs were detected in healthy individuals with our RT-PCR methodology. The very low frequency of CTCs in patients with localized disease appears consistent with the close association of CTC counts with skeletal metastases revealed in our RT-PCR assays. This low frequency of CTCs also suggests that large studies with extended follow-up will be required to reliably assess whether the detection of CTCs before treatment is associated with systemic disease (e.g., bone metastases) or more generally with a worse outcome. Because of the limited follow-up time in this study, we were unable to analyze the actual recurrence outcomes of the patients with localized disease.

The Gleason score still provides the gold standard for assessing prostate cancer aggressiveness at diagnosis. The Gleason score, however, cannot be easily assessed repeatedly over the course of the disease, and because prostate cancer has a highly variable natural history, current information on the patient's disease is required for optimal targeting of therapies. Hence, there is a need for predictive markers that can be easily assessed repeatedly. This study has shown that KLK mRNA assays and CellSearch CTC assays provide prognostic information and enhance the predictive accuracy of serum PSA alone in patients with CRPC. The sample material for CTC assays is readily obtainable by standard venipuncture, and real-time quantitative RT-PCR is currently one of the most sensitive methods for detecting CTCs. The concordance between RT-PCR and CellSearch methods, albeit in a small patient population, provides proof of concept that these approaches may be equally valid for detecting disseminated prostate tumor cells. Moreover, the approaches are complementary: CellSearch enables intact CTCs to be counted and characterized by fluorescence in situ hybridization and immunohistochemistry, and KLK mRNA assays provide sensitive and quantitative detection of CTC-specific gene expression. The pathobiologic mechanisms that contribute to the shedding of these cells remain to be defined, however, and so investigation continues to define patient groups in which CTC detection has the most clinical potential.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: H.I. Scher, Veridex. Stock Ownership: None declared. Honoraria: J.A. Eastham, Sanofi-Aventis Speaker's Bureau; H.I. Scher, Veridex. Research Funding: H.I. Scher, Veridex; H. Lilja, National Cancer Institute. The work was supported by NIH Prostate SPORE Grant (P50 CA92629 Pilot Project 7 and 14), the European Union 6th Framework contract LSHC-CT-2004-503011 (P-Mark), the Sidney Kimmel Center for Prostate and Urologic Cancers, the Prostate Cancer Foundation, William H. Goodwin and Alice Goodwin and the Commonwealth Foundation for Cancer Research, and the Experimental Therapeutics Center of Memorial Sloan-Kettering Cancer Center. R.M. Väänänen and K. Pettersson were supported by the Academy of Finland.

Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript. Acknowledgments: We thank Janet Novak of Helix Editing for substantive editing of the manuscript; this work was paid for by MSKCC.

References
1. Cristofanilli M, Budd GT, Ellis MJ, Stopeck A, Matera J, Miller MC, et al. Circulating tumor cells, disease progression, and survival in metastatic breast cancer. N Engl J Med 2004;351:781–91.
2. Katz AE, Olsson CA, Raffo AJ, Cama C, Perlman H, Seaman E, et al. Molecular staging of prostate cancer with the use of an enhanced reverse transcriptase-PCR assay. Urology 1994;43:765–75.
3. Moreno JG, Croce CM, Fischer R, Monne M, Vihko P, Mulholland SG, Gomella LG. Detection of hematogenous micrometastasis in patients with prostate cancer. Cancer Res 1992;52:6110–2.
4. Ellis WJ, Vessella RL, Corey E, Arfman EW, Oswin MM, Melchior S, Lange PH. The value of a reverse transcriptase polymerase chain reaction assay in preoperative staging and followup of patients with prostate cancer. J Urol 1998;159:1134–8.
5. Gao CL, Maheshwari S, Dean RC, Tatum L, Mooneyhan R, Connelly RR, et al. Blinded evaluation of reverse transcriptase-polymerase chain reaction prostate-specific antigen peripheral blood assay for molecular staging of prostate cancer. Urology 1999;53:714–21.
6. Henke W, Jung M, Jung K, Lein M, Schlechte H, Berndt C, et al. Increased analytical sensitivity of RT-PCR of PSA mRNA decreases diagnostic specificity of detection of prostatic cells in blood. Int J Cancer 1997;70:52–6.
7. Ignatoff JM, Oefelein MG, Watkin W, Chmiel JS, Kaul KL. Prostate specific antigen reverse transcriptase-polymerase chain reaction assay in preoperative staging of prostate cancer. J Urol 1997;158:1870–4; discussion 1874–5.
8. Oefelein MG, Ignatoff JM, Clemens JQ, Watkin W, Kaul KL. Clinical and molecular followup after radical retropubic prostatectomy. J Urol 1999;162:307–10; discussion 310–1.
9. Shariat SF, Gottenger E, Nguyen C, Song W, Kattan MW, Andenoro J, et al. Preoperative blood reverse transcriptase-PCR assays for prostate-specific antigen and human glandular kallikrein for prediction of prostate cancer progression after radical prostatectomy. Cancer Res 2002;62:5974–9.
10. Sokoloff MH, Tso CL, Kaboo R, Nelson S, Ko J, Dorey F, et al. Quantitative polymerase chain reaction does not improve preoperative prostate cancer staging: a clinicopathological molecular analysis of 121 patients. J Urol 1996;156:1560–6.
11. Thomas J, Gupta M, Grasso Y, Reddy CA, Heston WD, Zippe C, et al. Preoperative combined nested reverse transcriptase polymerase chain reaction for prostate-specific antigen and prostate-specific membrane antigen does not correlate with pathologic stage or biochemical failure in patients with localized prostate cancer undergoing radical prostatectomy. J Clin Oncol 2002;20:3213–8.
12. de la Taille A, Olsson CA, Katz AE. Molecular staging of prostate cancer: dream or reality? Oncology (Williston Park) 1999;13:187–94; discussion 194–8, 204–5 passim.
13. Schamhart DH, Maiazza R, Kurth KH. Identification of circulating prostate cancer cells: a challenge to the clinical implementation of molecular biology [review]. Int J Oncol 2005;26:565–77.
14. Nurmi J, Wikman T, Karp M, Lovgren T. High-performance real-time quantitative RT-PCR using lanthanide probes and a dual-temperature hybridization assay. Anal Chem 2002;74:3525–32.
15. Rissanen M, Helo P, Väänänen RM, Wahlroos V, Lilja H, Nurmi M, et al. Novel homogenous time-resolved fluorometric RT-PCR assays for quantification of PSA and hK2 mRNAs in blood. Clin Biochem 2007;40:111–8.
16. Nurmi J, Ylikoski A, Soukka T, Karp M, Lovgren T. A new label technology for the detection of specific polymerase chain reaction products in a closed tube. Nucleic Acids Res 2000;28:e28.
17. Reiter RE, Gu Z, Watabe T, Thomas G, Szigeti K, Davis E, et al. Prostate stem cell antigen: a cell surface marker overexpressed in prostate cancer. Proc Natl Acad Sci U S A 1998;95:1735–40.
18. Shaffer DR, Leversha MA, Danila DC, Lin O, Gonzalez-Espinoza R, Gu B, et al. Circulating tumor cell analysis in patients with progressive castration-resistant prostate cancer. Clin Cancer Res 2007;13:2023–9.
19. Nurmi J, Lilja H, Ylikoski A. Time-resolved fluorometry in end-point and real-time PCR quantification of nucleic acids. Luminescence 2000;15:381–8.
20. Ylikoski A, Karp M, Pettersson K, Lilja H, Lovgren T. Simultaneous quantification of human glandular kallikrein 2 and prostate-specific antigen mRNAs in peripheral blood from prostate cancer patients. J Mol Diagn 2001;3:111–22.
21. Ylikoski A, Sjoroos M, Lundwall A, Karp M, Lovgren T, Lilja H, Iitia A. Quantitative reverse transcription-PCR assay with an internal standard for the detection of prostate-specific antigen mRNA. Clin Chem 1999;45:1397–407.
22. Väänänen RM, Rissanen M, Kauko O, Junnila S, Väisänen V, Nurmi J, et al. Quantitative real-time RT-PCR assay for PCA3. Clin Biochem 2008;41:103–8.
23. Gonen M, Heller G. Concordance probability and discriminatory power in proportional hazards regression. Collection of Biostatistics Research Archive (COBRA) Web site. http://www.bepress.com/mskccbiostat/paper2/ (Accessed February 2009).
24. Stephenson AJ, Scardino PT, Eastham JA, Bianco FJ Jr, Dotan ZA, DiBlasio CJ, et al. Postoperative nomogram predicting the 10-year probability of prostate cancer recurrence after radical prostatectomy. J Clin Oncol 2005;23:7005–12.
25. Budd GT, Cristofanilli M, Ellis MJ, Stopeck A, Borden E, Miller MC, et al. Circulating tumor cells versus imaging: predicting overall survival in metastatic breast cancer. Clin Cancer Res 2006;12:6403–9.
26. Cristofanilli M, Hayes DF, Budd GT, Ellis MJ, Stopeck A, Reuben JM, et al. Circulating tumor cells: a novel prognostic factor for newly diagnosed metastatic breast cancer. J Clin Oncol 2005;23:1420–30.
27. Riethdorf S, Fritsche H, Muller V, Rau T, Schindlbeck C, Rack B, et al. Detection of circulating tumor cells in peripheral blood of patients with metastatic breast cancer: a validation study of the CellSearch system. Clin Cancer Res 2007;13:920–8.
28. Danila DC, Heller G, Gignac GA, Gonzalez-Espinoza R, Anand A, Tanaka E, et al. Circulating tumor cell number and prognosis in progressive castration-resistant prostate cancer. Clin Cancer Res 2007;13:7053–8.
29. de Bono JS, Scher HI, Montgomery RB, Parker C, Miller MC, Tissing H, et al. Circulating tumor cells predict survival benefit from treatment in metastatic castration-resistant prostate cancer. Clin Cancer Res 2008;14:6302–9.
30. Davis JW, Nakanishi H, Kumar VS, Bhadkamkar VA, McCormack R, Fritsche HA, et al. Circulating tumor cells in peripheral blood samples from patients with increased serum prostate specific antigen: initial results in early prostate cancer. J Urol 2008;179:2187–91; discussion 2191.


Clinical Chemistry 55:4 774–785 (2009)

Molecular Diagnostics and Genetics

Interindividual and Interethnic Variation in Genomewide Gene Expression: Insights into the Biological Variation of Gene Expression and Clinical Implications

Harris P.Y. Fan,1† Chen Di Liao,1† Brenda Yan Fu,1,2 Linda C.W. Lam,2 and Nelson L.S. Tang1,3*

BACKGROUND: Analysis of gene expression in peripheral blood samples is increasingly being applied in biomarker studies of disease diagnosis and prognosis. Although knowledge of interindividual and interethnic variation in gene expression is required to set ethnicity-specific reference intervals and to select reference genes and preferred markers from a list of candidate genes, few studies have attempted to characterize such biological variation on a genomewide scale.

METHODS: The genomewide expression profiles of 11 355 transcripts expressed among 210 multiethnic individuals of the HapMap project were obtained and analyzed; 4 replicates were included for each sample. The total biological CV in gene expression (CVb) was partitioned into interindividual (CVg), inter–ethnic group (CVe), and residual components by random-effects mixed models.

RESULTS: CVg was the major component of CVb, and the differences among transcripts were large (up to 38%). Distinct groups of genes were characterized by CV values and expression levels. Of the genes with the lowest biological variation (CVb <1.5%), 35 genes were highly expressed, whereas 32 had intermediate or low expression. Although CVg was almost always greater than CVe, we identified 10 genes in which ethnic variation predominated (range, 8%–18%). On the other hand, 17 annotated genes were highly variable, with CVg values ranging between 15% and 38%.

CONCLUSIONS: Genomewide analysis of gene expression variation demonstrated biological differences among transcripts. Transcripts with the least biological variation are better candidates for reference genes, whereas those with low interindividual variation may be good disease markers. The presence of interethnic variation suggests that ethnicity-specific reference intervals may be necessary.

© 2009 American Association for Clinical Chemistry

Departments of 1 Chemical Pathology and 2 Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China; 3 Laboratory for Genetics of Disease Susceptibility, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China. * Address correspondence to this author at: Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Prince of Wales Hospital, Shatin, New Territories, Hong Kong SAR, China. Fax +852 26365090. H.P.Y. Fan and C.D. Liao contributed equally to the study. Received October 21, 2008; accepted January 30, 2009. Previously published online at DOI: 10.1373/clinchem.2008.119107

The abundance of transcripts, or gene expression, in whole blood and its various cellular components is increasingly being investigated in the search for clinical diagnostic or prognostic markers. Changes in gene expression after in vitro exposure to a chemotherapy agent have also been explored in pharmacogenetic applications to try to associate such changes with the subsequent clinical response (1, 2). Analyses of gene expression in various blood-derived samples have been explored for use in the diagnosis of septic shock (3, 4), diagnosis of hypercoagulable states (5), posttraumatic injury prognostics (6), and diagnosis of infection caused by different pathogens (7, 8). After completion of the human genome project, clinical applications of the transcriptome have been investigated by means of exploratory research with microarrays (9). Such studies can quantify >30 000 transcripts in a single microarray and identify hundreds of potential markers that are differentially expressed between study groups and controls. In clinical practice, however, such molecular tests will be performed by real-time PCR with reduced sets of transcripts. Only a handful of gene markers can be analyzed, and the expression of marker genes will be quantified against a reference (i.e., housekeeping) gene by comparative threshold-cycle methods (10). Translation from exploratory research to clinical applications therefore requires that a more limited set of marker genes be defined. Furthermore, QC methods, reference intervals, and the degrees of biological and analytical variation have to be established.
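The comparative threshold-cycle calculation mentioned above can be sketched as follows. This is the widely used 2^−ΔΔCt form, assuming equal amplification efficiency for the marker and reference assays (by default 100%, i.e., doubling per cycle); the cited method (10) may differ in detail, and the function name and Ct values are hypothetical. The sketch also shows why reference-gene stability matters: any biological variation in the reference gene's Ct passes directly into ΔΔCt and hence into the reported fold change.

```python
def relative_expression_ddct(ct_target_sample: float, ct_ref_sample: float,
                             ct_target_calibrator: float, ct_ref_calibrator: float,
                             amplification: float = 2.0) -> float:
    """Fold change of a marker in a sample vs. a calibrator, normalized to a reference gene.

    amplification -- product increase per PCR cycle, assumed equal for both assays
                     (2.0 corresponds to 100% efficiency).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample  # marker vs. reference gene
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return amplification ** (-delta_delta_ct)


if __name__ == "__main__":
    # Hypothetical Ct values: relative to the reference gene, the marker crosses the
    # threshold 2 cycles earlier in the patient sample than in the calibrator.
    print(relative_expression_ddct(25.0, 20.0, 27.0, 20.0))
    # A 0.5-cycle shift in the reference gene alone changes the result by a factor of 2**0.5.
    print(relative_expression_ddct(25.0, 20.5, 27.0, 20.0))
```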




Similar to the assessments of routinely measured clinical analytes, several aspects of QC have recently been examined for gene expression, including sample processing (6, 11, 12), preanalytical optimization and variation (13–15), and effects of long-term storage (16). Further characterization of the potential utility of gene expression in blood samples requires data on the biological variation in the marker genes. Such data are useful in QC and for selecting the most informative marker genes from a large group of differentially expressed candidate genes identified in exploratory studies. Gene expression analysis is not clinically useful unless such variation data are available (17, 18). Recently, Peters et al. examined interindividual and intraindividual variation in gene expression in blood samples (19). They used standardized reverse transcription–PCR analysis to measure the expression of 19 marker genes. Interindividual variation accounted for almost half of the total variance and was the major component of biological variation; however, the magnitude of the interindividual variation in these 19 marker genes varied over a 4-fold range, with CVs ranging from 10% to 40%. Markers with lower interindividual variation were more informative than those with high variation and had a higher individuality index (17, 19). Although differences in gene expression across the entire transcriptome and mechanisms of regulation are important topics in the field of genetics, few studies have examined CVs of gene expression from the perspective of clinical chemistry with the objective of assay development. Moreover, the available reports have examined only a limited number of genes. Early studies suggested that expression differences were partly due to inherited genetic polymorphisms (20), and several genomewide studies have been performed to associate differences in gene expression with sequence variation in the genome (21, 22). 
Although these studies were designed to address the genetic mechanism leading to interindividual variation, they also provided high-quality data sets regarding genomewide expression for a large sample of individuals (20, 22). We have analyzed one such set (22) of whole-genome gene expression data covering >30 000 gene transcripts. Our study of the gene expression profiles of 270 lymphoblastoid cell lines derived from volunteer donors yielded expression data for >10 000 transcripts. Our results provide insight into the interindividual variation of genes expressed in the hematolymphoid lineage and are most readily extrapolated to gene expression in blood samples.
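The individuality index mentioned above relates within-person variation to between-person variation. The text does not give its formula; the sketch below assumes the commonly used form II = √(CVa² + CVi²)/CVg, where CVa is the analytical, CVi the intraindividual, and CVg the interindividual CV. It makes concrete why markers with lower interindividual variation have a higher index. The function name and CV values are hypothetical.

```python
import math


def individuality_index(cv_i: float, cv_g: float, cv_a: float = 0.0) -> float:
    """Index of individuality: sqrt(CVa^2 + CVi^2) / CVg, all CVs in the same units (%)."""
    if cv_g <= 0:
        raise ValueError("interindividual CV must be positive")
    return math.hypot(cv_a, cv_i) / cv_g


if __name__ == "__main__":
    # Two hypothetical markers with the same within-person and analytical CVs (%),
    # but CVg at the two ends of the 10%-40% range reported by Peters et al.
    for cv_g in (10.0, 40.0):
        print(cv_g, round(individuality_index(cv_i=8.0, cv_g=cv_g, cv_a=6.0), 2))
```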

Materials and Methods

The Wellcome Trust Sanger Institute (http://www.sanger.ac.uk/) provided genomewide expression profiles for 270 HapMap lymphoblastoid cell lines derived from individuals of 4 ethnic groups; of these, we analyzed data from the 210 unrelated individuals (60 Africans, 60 Caucasians, 45 Chinese, and 45 Japanese). The data set contained 4 replicates of each cell line, analyzed on the Sentrix® Human-6 Expression BeadChip (Illumina), an array that quantifies the expression of approximately 47 000 different transcripts. Expression intensity values were normalized on a logarithmic scale with a quantile normalization method (22). Because the hematolymphoid lineage does not express all transcripts, we used an arbitrary intensity threshold (mean raw intensity >100) to filter out genes with low or absent expression. The filtering retained 11 355 transcripts for analysis, representing 10 749 different genes or unique transcripts. The analysis of variance was similar to that described previously by Peters et al. (19). The available information for the cell lines (sex, ethnic group) was used to partition the total variation into inter–ethnic group, interindividual, and residual components with a nested multilevel model, in the following order: (a) inter–ethnic group CV (CVe)⁴, (b) interindividual CV (CVg), and (c) residual (analytical) CV among the 4 replicates. In this model, sex was treated as a fixed effect. The log-transformed intensity values of each transcript were used, and the parameters of the nested model were estimated with the NLM module of the R statistical package (23).

⁴ Nonstandard abbreviations: CVe, inter–ethnic group CV; CVg, interindividual CV; CVb, total biological CV; MHC, major histocompatibility complex; CVr, component of CVb due to response/treatment group.

Results

RELATIVE GENE EXPRESSION AND VARIANCES ACROSS THE GENOME

Fig. 1A summarizes the relative expression of transcripts across the entire genome (11 355 transcripts). The distribution is skewed: raw expression intensities most commonly fall in the range 10²–10³ (i.e., values of 2–3 on a log10 scale), and few genes are highly expressed (expression intensities >10⁴). Overall, expression intensities span more than 3 orders of magnitude.

IDENTIFICATION OF GENES WITH THE LEAST BIOLOGICAL VARIANCE

Estimates of CVg and CVe showed that both components contribute to the total biological CV (CVb).
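The nested partitioning described in Materials and Methods, and the way CVe and CVg combine into CVb, can be illustrated with a minimal Python sketch. It is not the authors' R mixed-model fit: it assumes a balanced design (equal numbers of individuals per ethnic group and of replicates per individual, whereas the actual groups had 60, 60, 45, and 45 individuals), ignores the sex fixed effect, uses classical method-of-moments (nested ANOVA) estimators, and assumes that each CV is the square root of its variance component divided by the mean of the log-transformed intensities. All function names are hypothetical. The last function applies the additive-variance relationship CVb² = CVe² + CVg², which reproduces the CVb values listed in Table 3 to within rounding (e.g., CXXC4: √(10.12² + 7.15²) ≈ 12.39%).

```python
import numpy as np


def nested_variance_components(y: np.ndarray) -> tuple[float, float, float]:
    """Method-of-moments variance components for a balanced nested design.

    y holds the log-transformed intensities of ONE transcript, with shape
    (ethnic_groups, individuals_per_group, replicates_per_individual).
    Returns (between-group, between-individual, residual) variances,
    truncated at zero as is conventional for moment estimators.
    """
    a, b, n = y.shape
    grand_mean = y.mean()
    group_means = y.mean(axis=(1, 2))   # shape (a,)
    indiv_means = y.mean(axis=2)        # shape (a, b)

    # Mean squares of the nested ANOVA table.
    ms_group = b * n * np.sum((group_means - grand_mean) ** 2) / (a - 1)
    ms_indiv = n * np.sum((indiv_means - group_means[:, None]) ** 2) / (a * (b - 1))
    ms_resid = np.sum((y - indiv_means[:, :, None]) ** 2) / (a * b * (n - 1))

    # Solve the expected-mean-square equations for each component.
    var_resid = float(ms_resid)
    var_indiv = max(float(ms_indiv - ms_resid) / n, 0.0)
    var_group = max(float(ms_group - ms_indiv) / (b * n), 0.0)
    return var_group, var_indiv, var_resid


def cv_percent(variance: float, mean: float) -> float:
    """CV (%) of one variance component relative to the transcript mean (assumed definition)."""
    return 100.0 * float(np.sqrt(variance)) / mean


def combined_cv(cv_e: float, cv_g: float) -> float:
    """Total biological CV (CVb) from independent CVe and CVg: sqrt(CVe^2 + CVg^2)."""
    return float(np.hypot(cv_e, cv_g))


# Toy transcript: 2 groups x 2 individuals x 2 replicates, no replicate noise.
group_effect = np.array([0.0, 2.0])
indiv_effect = np.array([[-1.0, 1.0], [-1.0, 1.0]])
y = 10.0 + group_effect[:, None, None] + indiv_effect[:, :, None] + np.zeros((2, 2, 2))
var_e, var_g, var_r = nested_variance_components(y)   # expected: 1.0, 2.0, 0.0
print(var_e, var_g, var_r)
print(round(combined_cv(10.12, 7.15), 2))               # CXXC4 row of Table 3
```

With real data, the unbalanced group sizes and the sex covariate call for a proper mixed-model fit (as the authors did in R); the balanced estimator is shown only because its formulas are short and transparent.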

Clinical Chemistry 55:4 (2009) 775

[Fig. 1 plot not reproduced. Panel A: frequency (no. of transcripts) vs log mean expression. Panel B: CVb (%) vs log mean expression.]

Fig. 1. Expression of and interindividual variation in 11 355 transcripts in lymphoblast cell lines. (A), Distribution frequency of the 11 355 transcripts by mean expression (after log10 transformation). (B), Distribution of biological variation (CVb) in the expression of 11 355 genes.
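The preprocessing that produced the 11 355 transcripts summarized in Fig. 1 (log transformation, quantile normalization, and removal of transcripts with mean raw intensity ≤100, as described in Materials and Methods) can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the pipeline used by the authors or the data provider: it uses log10 (the scale shown in Fig. 1), breaks tied ranks by position, normalizes before filtering, and the function names and intensity values are hypothetical.

```python
import numpy as np


def quantile_normalize(log_values: np.ndarray) -> np.ndarray:
    """Quantile-normalize a (transcripts x arrays) matrix, column by column.

    Every array is mapped onto the same reference distribution: the mean,
    across arrays, of the k-th smallest value. Ties are broken by position,
    a simplification (some implementations average tied ranks instead).
    """
    order = np.argsort(log_values, axis=0)                   # row indices sorting each column
    reference = np.sort(log_values, axis=0).mean(axis=1)     # mean k-th smallest value
    normalized = np.empty(log_values.shape, dtype=float)
    for j in range(log_values.shape[1]):
        normalized[order[:, j], j] = reference               # k-th smallest -> reference[k]
    return normalized


def preprocess(raw: np.ndarray, min_mean_raw: float = 100.0):
    """Log10-transform and quantile-normalize raw intensities, then keep
    transcripts whose mean raw intensity exceeds min_mean_raw.

    Returns (normalized log10 values of retained transcripts, boolean mask).
    """
    keep = raw.mean(axis=1) > min_mean_raw
    normalized = quantile_normalize(np.log10(raw))
    return normalized[keep], keep


# Hypothetical raw intensities: 4 transcripts x 3 arrays.
raw = np.array([
    [50.0, 60.0, 40.0],          # weakly expressed: filtered out
    [1000.0, 1200.0, 900.0],
    [200.0, 150.0, 300.0],
    [10000.0, 8000.0, 12000.0],
])
values, kept = preprocess(raw)
print(kept.tolist())             # [False, True, True, True]
print(values.round(3))
```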

In general, highly expressed genes have lower CVb values (Fig. 1B). Table 1 lists genes in each expression category in order of increasing CVb. Most of the highly expressed genes (>10⁴) with very low variation, which are therefore promising reference genes, encode ribosomal proteins. The list also includes commonly used reference genes, such as those encoding β-actin and β2-microglobulin (see Table 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4). Reference genes of intermediate expression are commonly involved in one of 2 cellular processes: (a) respiration (mitochondrial proteins or enzymes of the respiratory chain) or (b) transcription and processing of mRNA (such as components of the splicing machinery). In contrast, reference genes expressed at low levels (10²–10³) showed no predominant function; they are involved in a variety of processes, such as apoptosis, cell signaling, and the immune response.

Interindividual Variation in Genomewide Gene Expression

Table 1. Genes expressed with the least biological variation (CVb <1.5%).a

Gene symbol | CVb, % | CVg, % | Log10 mean transcript expression | Chromosomal location | Gene name/description | Function by ontology

High expression (>10⁴; only top 30 genes are shown)
ACTB | 0.54 | 0.42 | 4.71 | 7p15-p12 | Actin beta | Cytoskeleton
RPS27 | 0.55 | 0.30 | 4.76 | 1q21 | Ribosomal protein S27 (metallopanstimulin 1) | Protein biosynthesis
TMSB4X | 0.62 | 0.23 | 4.72 | Xq21.3-q22 | Thymosin beta 4, X-linked | Cytoskeleton
RPS24 | 0.67 | 0.49 | 4.72 | 10q22-q23 | Ribosomal protein S24 | Protein biosynthesis
RPS3A | 0.69 | 0.31 | 4.69 | 4q31.2-q31.3 | Ribosomal protein S3A | Protein biosynthesis
RPS2 | 0.72 | 0.54 | 4.70 | 16p13.3 | Ribosomal protein S2 | Protein biosynthesis
RPL21 | 0.74 | 0.29 | 4.53 | 13q12.2 | Ribosomal protein L21 | Protein biosynthesis
TMSL3 | 0.74 | 0.60 | 4.70 | 4q22.1 | Thymosin-like 3 | Cytoskeleton
RPS14 | 0.74 | 0.47 | 4.69 | 5q31-q33 | Ribosomal protein S14 | Ribosomal assembly
UBC | 0.80 | 0.69 | 4.71 | 12q24.3 | Ubiquitin C | Protein modification
RPL31 | 0.80 | 0.43 | 4.68 | 2q11.2 | Ribosomal protein L31 | Protein biosynthesis
RPL35 | 0.81 | 0.52 | 4.70 | 9q34.1 | Ribosomal protein L35 | Protein biosynthesis
RPL19 | 0.86 | 0.55 | 4.55 | 17q11.2-q12 | Ribosomal protein L19 | Protein biosynthesis
FAU | 0.87 | 0.66 | 4.39 | 11q13 | Finkel-Biskis-Reilly murine sarcoma virus (FBR-MuSV) ubiquitously expressed; ribosomal protein S30 | Protein biosynthesis
RPLP1 | 0.87 | 0.70 | 4.59 | 15q22 | Ribosomal protein, large, P1 | Protein biosynthesis
RPL23A | 0.88 | 0.76 | 4.62 | 17q11 | Ribosomal protein L23a | Protein biosynthesis
RPS18 | 0.89 | 0.62 | 4.56 | 6p21.3 | Ribosomal protein S18 | Protein biosynthesis
EEF1A1 | 0.89 | 0.57 | 4.70 | 6q14.1 | Eukaryotic translation elongation factor 1 alpha 1 | Protein biosynthesis
RPL24 | 0.89 | 0.52 | 4.69 | 3q12 | Ribosomal protein L24 | Protein biosynthesis
RPS23 | 0.90 | 0.59 | 4.68 | 5q14.2 | Ribosomal protein S23 | Translation
TMSL4 | 0.91 | 0.66 | 4.65 | 9q34.11 | Thymosin-like 4 (pseudogene) | Cytoskeleton
RPS10 | 0.91 | 0.75 | 4.48 | 6p21.31 | Ribosomal protein S10 | Protein biosynthesis
B2M | 0.92 | 0.77 | 4.63 | 15q21-q22.2 | Beta-2-microglobulin | Antigen processing
RPL18A | 0.92 | 0.78 | 4.65 | 19p13 | Ribosomal protein L18a | Protein biosynthesis
RPL9 | 0.92 | 0.43 | 4.29 | 4p13 | Ribosomal protein L9 | Protein biosynthesis
RPL11 | 0.92 | 0.73 | 4.68 | 1p36.1-p35 | Ribosomal protein L11 | Protein biosynthesis
RPS19 | 0.93 | 0.66 | 4.54 | 19q13.2 | Ribosomal protein S19 | Protein biosynthesis
RPS4X | 0.93 | 0.63 | 4.50 | Xq13.1 | Ribosomal protein S4, X-linked | Cell cycle
RPL38 | 0.94 | 0.60 | 4.66 | 17q23-q25 | Ribosomal protein L38 | Protein biosynthesis
RPL32 | 0.94 | 0.62 | 4.57 | 3p25-p24 | Ribosomal protein L32 | Protein biosynthesis
LOC387845 | 0.95 | 0.51 | 4.57 | 12p12.3 | Similar to eukaryotic translation elongation factor 1 alpha 1 | Protein biosynthesis
RPL36 | 0.95 | 0.54 | 4.57 | 19p13.3 | Ribosomal protein L36 | Protein biosynthesis
RPS21 | 0.97 | 0.69 | 4.64 | 20q13.3 | Ribosomal protein S21 | Protein biosynthesis
UBB | 0.97 | 0.83 | 4.39 | 17p12-p11.2 | Ubiquitin B | Protein modification
GNB2L1 | 0.97 | 0.78 | 4.45 | 5q35.3 | Guanine nucleotide binding protein (G protein), beta polypeptide 2-like 1 | Signal transduction

Intermediate expression (10³–10⁴)
EDF1 | 0.94 | 0.76 | 3.55 | 9q34.3 | Endothelial differentiation-related factor 1 | Regulation of transcription, DNA-dependent
CASP10 | 1.13 | 0.85 | 3.21 | 2q33-q34 | Caspase 10, apoptosis-related cysteine peptidase | Proteolysis
LOC341315 | 1.28 | 0.87 | 3.95 | 12q14.2 | Hypothetical LOC341315 |
ARPC3 | 1.30 | 1.30 | 3.86 | 12q24.11 | Actin related protein 2/3 complex subunit 3, 21kDa | Cell motility
SNRPD2 | 1.32 | 1.09 | 3.93 | 19q13.2 | Small nuclear ribonucleoprotein D2 polypeptide 16.5kDa | Spliceosome assembly
UXT | 1.34 | 1.09 | 3.65 | Xp11.23-p11.22 | Ubiquitously-expressed transcript | Cytoskeleton
COX5B | 1.39 | 1.25 | 3.70 | 2cen-q13 | Cytochrome c oxidase subunit Vb | Electron transport
NEDD8 | 1.39 | 1.24 | 3.72 | 14q12 | Neural precursor cell expressed, developmentally down-regulated 8 | Regulation of transcription from RNA polymerase II promoter
COX4I1 | 1.39 | 0.86 | 3.99 | 16q22-qter | Cytochrome c oxidase subunit IV isoform 1 | Electron transport
SERF2 | 1.43 | 1.30 | 3.97 | 15q15.3 | Small EDRK-rich factor 2 | Transcription
LOC127545 | 1.44 | 0.81 | 3.79 | 1p35.1 | Similar to ribosomal protein L18a |
LOC136143 | 1.45 | 1.25 | 3.99 | 7q31.31 | Similar to ribosomal protein L18 |
Prickle4 (C6orf49) | 1.45 | 1.37 | 3.54 | 6p21.31 | Prickle homolog 4 (Drosophila) (formerly: chromosome 6 open reading frame 49) |
EIF3G (EIF3S4) | 1.45 | 1.14 | 3.51 | 19p13.2 | Eukaryotic translation initiation factor 3, subunit G | Protein biosynthesis
MRPS15 | 1.48 | 1.27 | 3.73 | 1p35-p34.1 | Mitochondrial ribosomal protein S15 | Protein biosynthesis
NDUFA1 | 1.49 | 1.25 | 3.90 | Xq24 | NADH dehydrogenase (ubiquinone) 1 alpha subcomplex, 1, 7.5kDa | Generation of precursor metabolites and energy
ARL6IP4 | 1.49 | 1.44 | 3.58 | 12q24.31 | ADP-ribosylation-like factor 6 interacting protein 4 | RNA splicing

Low expression (10²–10³)
BNIPL | 0.86 | 0.46 | 2.08 | 1q21.2 | BCL2/adenovirus E1B 19kD interacting protein like | Apoptosis
PPP1R14A | 0.89 | 0.67 | 2.60 | 19q13.1 | Protein phosphatase 1, regulatory (inhibitor) subunit 14A | Protein phosphatase inhibitor activity
CGB1 | 1.05 | 0.87 | 2.39 | 19q13.32 | Chorionic gonadotropin beta polypeptide 1 | Endocrine
F8 | 1.07 | 0.72 | 2.21 | Xq28 | Coagulation factor VIII, procoagulant component (hemophilia A) | Acute-phase response
RERG | 1.19 | 0.92 | 2.22 | 12p12.3 | RAS-like, estrogen-regulated, growth inhibitor | Small GTPase–mediated signal transduction
KIRREL | 1.29 | 0.78 | 2.19 | 1q21-q25 | Kin of IRRE like (Drosophila) | Cell adhesion
GABRD | 1.29 | 0.83 | 2.08 | 1p36.3 | Gamma-aminobutyric acid (GABA) A receptor, delta | Ion transport
CRYBA2 | 1.29 | 0.90 | 2.02 | 2q34-q36 | Crystallin, beta A2 | Biological process
ORAOV1 | 1.30 | 1.06 | 2.03 | 11q13.2 | Oral cancer overexpressed 1 |
HM13 | 1.35 | 1.00 | 2.54 | 20q11.21 | Histocompatibility (minor) 13 | Endoplasmic reticulum
SERPING1 | 1.39 | 0.93 | 2.32 | 11q12-q13.1 | Serpin peptidase inhibitor, clade G (C1 inhibitor), member 1 (angioedema, hereditary) | Complement activation, classical pathway
C21orf62 | 1.40 | 1.11 | 2.14 | 21q22.1 | Chromosome 21 open reading frame 62 |
SSX4 | 1.41 | 1.04 | 2.02 | Xp11.23 | Synovial sarcoma, X breakpoint 4 |
LOC387927 | 1.45 | 1.32 | 2.01 | 13q14.3 | Similar to NIMA (never in mitosis gene a)-related expressed kinase 5 |
ANXA11 | 1.46 | 1.33 | 2.05 | 10q23 | Annexin A11 | Immune response

a Only annotated genes are listed. Some transcripts of unknown function are excluded from this table. Genes are listed in order of increasing CVb within each expression category.

⁵ Human genes: HLA-DQA1, major histocompatibility complex, class II, DQ alpha 1; HLA-DQA2, major histocompatibility complex, class II, DQ alpha 2; HLA-DRB1, major histocompatibility complex, class II, DR beta 1; HLA-DRB3, major histocompatibility complex, class II, DR beta 3; HLA-DRB5, major histocompatibility complex, class II, DR beta 5; GSTM1, glutathione S-transferase mu 1; GSTT1, glutathione S-transferase theta 1; UGT2B17, UDP glucuronosyltransferase 2 family, polypeptide B17; CXCL10, chemokine (C-X-C motif) ligand 10; UGT2B7, UDP glucuronosyltransferase 2 family, polypeptide B7; UGT2B11, UDP glucuronosyltransferase 2 family, polypeptide B11; UTS2, urotensin 2; ACTB, actin, beta; GAPDH, glyceraldehyde-3-phosphate dehydrogenase; RPL19, ribosomal protein L19; RPL32, ribosomal protein L32; RPL11, ribosomal protein L11; RPS18, ribosomal protein S18; UBE2D2, ubiquitin-conjugating enzyme E2D 2 (UBC4/5 homolog, yeast); HPRT1, hypoxanthine phosphoribosyltransferase 1; HLA-DRA, major histocompatibility complex, class II, DR alpha; HLA-DMA, major histocompatibility complex, class II, DM alpha; HLA-DMB, major histocompatibility complex, class II, DM beta.

TRANSCRIPTS WITH HIGH CVg

Fig. 2 and Table 2 show the relationship between the 2 types of biological variation, CVg and CVe. In general, the variation in gene expression within an ethnic group (i.e., CVg) was higher than the variation across ethnic groups (CVe); the majority of points therefore fall below the line of identity (slope = 1, intercept = 0) in Fig. 2. The atypical genes with high ethnic differentiation lie above the line of identity. Genes with low values of both CVs (i.e., the lower-left corner of Fig. 2) are reference genes. At the other extreme are genes with high CVg values (open squares in Fig. 2; Table 2). The highest CVg (38%) was for an MHC class II gene, HLA-DQA1⁵ (major histocompatibility complex, class II, DQ alpha 1). Interestingly, 4 other MHC class II genes also showed high CVg values [HLA-DQA2 (major histocompatibility complex, class II, DQ alpha 2) and HLA-DRB1, HLA-DRB3, and HLA-DRB5 (major histocompatibility complex, class II, DR beta genes 1, 3, and 5)]. This level of CVg predicts that the typical 5% and 95% population reference values would span a 2-fold range. Therefore, the individuality index, which is inversely proportional to CVg, will be low, and the reference intervals for these genes will be wide.
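The link between a high CVg and a low individuality index can be made concrete with the commonly used definition II = √(CVi² + CVa²)/CVg, where CVi is the intraindividual CV and CVa the analytical CV. This study did not estimate CVi (see Limitations), so the CVi and CVa values below are hypothetical placeholders; only the CVg values come from Table 2, and the function name is illustrative. The sketch simply shows the index falling as CVg rises.

```python
import math


def individuality_index(cv_i: float, cv_g: float, cv_a: float = 0.0) -> float:
    """Individuality index sqrt(CVi^2 + CVa^2) / CVg (all CVs in %)."""
    return math.hypot(cv_i, cv_a) / cv_g


# Hypothetical intraindividual and analytical CVs (not estimated in this study).
CV_I, CV_A = 5.0, 2.0

# CVg values (%) from Table 2.
for gene, cv_g in [("CXCL10", 15.46), ("HLA-DRB1", 18.99), ("HLA-DQA1", 38.46)]:
    print(f"{gene}: II = {individuality_index(CV_I, cv_g, CV_A):.2f}")
```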

[Fig. 2 plot not reproduced. Scatter plot of CVe (y axis, 0%–20%) against CVg (x axis, 0%–40%). Legend: high CVg (>15%); CVe > CVg; all other transcripts; line of identity. Labeled points: UGT2B17, HLA-DRB5, HLA-DQA1, HLA-DQA2, HLA-DRB1, HLA-DRB3, GSTM1, GSTT1, CXCL10. The region labeled "CVg for optimal biomarkers (CVg <2.25%)" is marked along the x axis.]

Fig. 2. Comparison of CVe and CVg for each of the 11 355 transcripts. Interindividual variation (CVg) is larger for most transcripts. Transcripts with CVg values >15% are presented as open squares and are listed in Table 2. Atypical transcripts with CVe values greater than CVg are also indicated (×) and are listed in Table 3. Ideal biomarkers should have a low CVg (for example, CVg below the lower quartile value, i.e., <2.25%).
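The screen suggested in the caption (non-reference transcripts whose CVg falls below the lower quartile of all CVg values) is easy to express in code. The helper below is hypothetical; the gene names and CVg values in the example are invented for illustration, and NumPy's default linear interpolation is assumed for the quartile.

```python
import numpy as np


def low_cvg_candidates(cv_g: dict[str, float], reference_genes: set[str]) -> list[str]:
    """Non-reference transcripts with CVg below the lower quartile, lowest CVg first."""
    q1 = float(np.percentile(list(cv_g.values()), 25))
    hits = [gene for gene, cv in cv_g.items() if cv < q1 and gene not in reference_genes]
    return sorted(hits, key=cv_g.__getitem__)


# Invented CVg values (%) for eight hypothetical transcripts.
example = {"GENE_A": 1.0, "GENE_B": 1.5, "GENE_C": 2.0, "GENE_D": 3.0,
           "GENE_E": 5.0, "GENE_F": 10.0, "GENE_G": 20.0, "GENE_H": 40.0}
print(low_cvg_candidates(example, reference_genes={"GENE_A"}))   # ['GENE_B']
```

Applied to all 11 355 transcripts, the same rule would use the genome-wide lower quartile (2.25% in this data set) as the cutoff.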

Several enzymes relevant to pharmacogenetics also showed high variation in expression, including GSTM1 (glutathione S-transferase mu 1), GSTT1 (glutathione S-transferase theta 1), and UGT2B17 (UDP glucuronosyltransferase 2 family, polypeptide B17). In addition, CXCL10 [chemokine (C-X-C motif) ligand 10], an important chemokine induced by interferon γ, had a high CVg. Table 2 in the online Data Supplement lists other genes with intermediate CVgs (10%–15%). More important are genes with low CVg values that are not reference genes, because they have the potential to become ideal biomarkers. The genes listed in Table 3 in the online Data Supplement have transcript CVgs in the lowest quartile and are therefore candidates for preferred gene expression biomarkers.

GENE TRANSCRIPTS WITH HIGH CVe

For the vast majority of genes, CVe is less than CVg; however, there are some exceptions. Table 3 lists genes with CVe values both >8% and greater than CVg; these genes are plotted as × symbols in Fig. 2. Three of these genes [UGT2B7 (UDP glucuronosyltransferase 2 family, polypeptide B7), UGT2B11 (UDP glucuronosyltransferase 2 family, polypeptide B11), and UGT2B17] encode pharmacogenetically or metabolically important enzymes. The signals from 2 different probes for UTS2 (urotensin 2) sequences both confirmed the high interethnic variation for this gene.

Table 2. Genes expressed with high interindividual variation (CVg).a

Gene symbol | CVg, % | Log10 mean transcript expression | Chromosomal location | Gene name | Function by ontology
CXCL10 | 15.46 | 2.29 | 4q21 | Chemokine (C-X-C motif) ligand 10 | Immune response
FXYD2 | 15.80 | 2.87 | 11q23 | FXYD domain containing ion transport regulator 2 | Ion transport
GSTM1 | 18.41 | 2.26 | 1p13.3 | Glutathione S-transferase mu 1 | Metabolism
GSTT1 | 15.15 | 2.27 | 22q11.23 | Glutathione S-transferase theta 1 | Glutathione metabolism
HLA-DQA1 | 38.46 | 3.38 | 6p21.3 | Major histocompatibility complex, class II, DQ alpha 1 | Antigen processing
HLA-DQA2 | 23.16 | 2.61 | 6p21.3 | Major histocompatibility complex, class II, DQ alpha 2 | Antigen processing
HLA-DRB1 | 18.99 | 2.87 | 6p21.3 | Major histocompatibility complex, class II, DR beta 1 | Antigen processing
HLA-DRB3 | 22.21 | 3.33 | | Major histocompatibility complex, class II, DR beta 3 | Antigen processing
HLA-DRB5 | 20.57 | 3.50 | 6p21.3 | Major histocompatibility complex, class II, DR beta 5 | Antigen processing
IGKV6-21 | 17.57 | 2.21 | 2q | Immunoglobulin kappa variable 6-21 (non-functional) | Immune response
IGLV9-49 | 15.52 | 2.24 | 22q | Immunoglobulin lambda variable | Immune response
MMP7 | 15.14 | 2.82 | 11q21-q22 | Matrix metallopeptidase 7 (matrilysin, uterine) | Peptidoglycan metabolism
NLRP2 (NALP2) | 15.85 | 2.26 | 19q13.42 | NLR family, pyrin domain containing 2 (NACHT, leucine rich repeat and PYD containing 2) | Immune response
NR2F2 | 17.83 | 2.20 | 15q26 | Nuclear receptor subfamily 2, group F, member 2 | Neuron migration
PLS3 | 15.73 | 2.28 | Xq23 | Plastin 3 (T isoform) | Actin cytoskeleton
TNFRSF19 | 16.31 | 2.12 | 13q12.11-q12.3 | Tumor necrosis factor receptor superfamily, member 19 | Apoptosis
UGT2B17 | 16.22 | 2.45 | 4q13 | UDP glucuronosyltransferase 2 family, polypeptide B17 | Metabolism

a Only annotated genes with known function and CVg values >15% are listed. Table 2 in the online Data Supplement presents results for transcripts with CVg values >15% that are not annotated or are without known function, and for genes with CVg values of 10%–15%. Genes are listed in alphabetical order by gene symbol.

Discussion

This study is the first attempt to examine, on a whole-genome scale, interindividual and interethnic variation in gene expression with a view toward testing in the clinical laboratory. The past 5 years have seen an increasing number of laboratory applications of gene expression analysis in blood samples. The most prominent is the diagnosis of early septic shock, with prognostic applications in critically ill patients (24–26). Gene expression analysis has also been used to detect circulating tumor cells (27, 28). Both applications rely on robust gene expression analysis performed under rigorous QC. Biological variation has been well documented for most common analytes in routine clinical use, and its typical magnitude is often determined early in the development and validation of the assays; however, biological variation in gene expression has not been examined on a large scale for blood samples or for cells of the same lineage. In this study, we used a collection of hematolymphoid cell lines derived from 210 individuals of different ethnic backgrounds to examine such variation. This genomewide analysis provides new insights into genes with different degrees of biological variation, information that is essential for developing biomarkers for disease diagnosis or prognostication.

REVIEW OF COMMONLY USED REFERENCE GENES

In most clinical applications of quantifying gene expression, the expression of a marker gene is compared with that of a reference gene that is stably expressed among individuals (10, 29). Various genes have been proposed as references (30–33). Table 1 in the online Data Supplement lists some of the most frequently used reference genes. ACTB (actin, beta) and GAPDH (glyceraldehyde-3-phosphate dehydrogenase) are the most commonly used. Our results confirmed that ACTB is a good reference candidate. As others have reported (34), however, we found GAPDH expression to be more variable than expected; we therefore recommend that GAPDH not be used as the only reference gene in a study (33, 35). We also confirmed that genes encoding ribosomal proteins, some of which others have used (32), are good reference genes with stable expression. We note, however, that 2 such genes, RPL19 and RPL32 (ribosomal proteins L19 and L32), showed some degree of interethnic variation.
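Comparing a marker gene with reference genes is usually done with the comparative CT (2^−ΔΔCT) method (29). Because a single reference gene such as GAPDH may be unreliable, the minimal sketch below normalizes to the mean CT of several reference genes, which (assuming equal amplification efficiency) is equivalent to using the geometric mean of their relative quantities. The function name and CT values are hypothetical, and amplification efficiency is assumed to be 100% (a factor of 2 per cycle).

```python
from statistics import mean


def relative_quantity(target_ct: float, reference_cts: list[float],
                      calibrator_target_ct: float, calibrator_reference_cts: list[float],
                      efficiency: float = 2.0) -> float:
    """Relative expression of a target gene by the comparative CT method.

    The target CT is normalized to the arithmetic mean CT of several reference
    genes; because CT is logarithmic, this equals normalizing to the geometric
    mean of their quantities (assuming equal amplification efficiency).
    """
    delta_sample = target_ct - mean(reference_cts)
    delta_calibrator = calibrator_target_ct - mean(calibrator_reference_cts)
    return efficiency ** -(delta_sample - delta_calibrator)


# Hypothetical CT values for a target and two reference genes (e.g., ACTB and RPL11)
# in a patient sample and in a calibrator sample.
fold = relative_quantity(25.0, [18.0, 20.0], 27.0, [18.0, 20.0])
print(fold)   # 4.0: after normalization the target crosses threshold 2 cycles earlier
```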

Table 3. Transcripts with high interethnic variation (CVe) in which CVe > CVg.a

Gene symbol | CVe, % | CVg, % | CVe/CVg ratio | CVb, % | Chromosomal position | Gene name | Function by ontology
CXXC4 | 10.12 | 7.15 | 1.42 | 12.39 | 4q22 | CXXC finger 4 | Wnt signaling
KIF21A | 9.00 | 6.01 | 1.50 | 10.82 | 12q12 | Kinesin family member 21A | Microtubule-based movement
LOC376138 | 11.26 | 5.90 | 1.91 | 12.71 | | |
RGS20 | 11.51 | 9.58 | 1.20 | 14.97 | 8q12.1 | Regulator of G-protein signalling 20 | Signaling pathway
TBC1D4 | 9.82 | 9.42 | 1.04 | 13.61 | 13q22.2 | TBC1 domain family, member 4 | GTPase activator activity
TUBB | 8.00 | 7.87 | 1.02 | 11.22 | 6p21.33 | Tubulin beta | Cell motility
UGT2B11 | 15.96 | 13.85 | 1.15 | 21.13 | 4q13.2 | UDP glucuronosyltransferase 2 family, polypeptide B11 | Metabolism
UGT2B17 | 17.98 | 16.22 | 1.11 | 24.22 | 4q13 | UDP glucuronosyltransferase 2 family, polypeptide B17 | Metabolism
UGT2B7 | 14.42 | 12.40 | 1.16 | 19.02 | 4q13 | UDP glucuronosyltransferase 2 family, polypeptide B7 | Lipid metabolism
UTS2b | 11.99 | 11.74 | 1.02 | 16.78 | 1p36 | Urotensin 2 | Muscle contraction
UTS2b | 13.93 | 13.58 | 1.03 | 19.45 | 1p36 | Urotensin 2 | Muscle contraction

a Only genes with high overall biological variation >10% (combined CVe and CVg) are listed. Genes are listed in alphabetical order by gene symbol.
b Data for 2 different UTS2 probes are shown.
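The two criteria stated in the Table 3 title and footnote (CVe greater than CVg, and a combined biological CV above 10%) can be applied as a simple screen. In this hypothetical helper, CVb is recomputed as √(CVe² + CVg²), which reproduces the Table 3 values to within rounding; the first two rows of the example are copied from Table 3, and the third is an invented transcript included to show a rejection.

```python
import math


def high_cve_screen(rows: list[tuple[str, float, float]],
                    min_cv_b: float = 10.0) -> list[tuple[str, float, float]]:
    """Return (symbol, CVe/CVg ratio, CVb) for transcripts with CVe > CVg and CVb > min_cv_b."""
    hits = []
    for symbol, cv_e, cv_g in rows:
        cv_b = math.hypot(cv_e, cv_g)          # combined biological CV, %
        if cv_e > cv_g and cv_b > min_cv_b:
            hits.append((symbol, round(cv_e / cv_g, 2), round(cv_b, 2)))
    return hits


rows = [
    ("CXXC4", 10.12, 7.15),        # from Table 3
    ("TBC1D4", 9.82, 9.42),        # from Table 3
    ("HYPOTHETICAL_1", 3.0, 6.0),  # invented: CVe < CVg, so it is rejected
]
print(high_cve_screen(rows))
```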

Given the interethnic variation of RPL19 and RPL32, 2 other ribosomal protein genes, RPL11 (ribosomal protein L11) and RPS18 (ribosomal protein S18), may be better choices. The genes for all of these ribosomal proteins and for well-established reference genes (e.g., ACTB) are expressed in the highest range, >10⁴. Although they may be adequate for controlling most sources of technical error, such as variable efficiencies of RNA extraction and reverse transcription, they may not be the best normalization genes for marker transcripts present at lower concentrations (30, 36). The best approach would be to normalize to reference genes expressed within the same order of magnitude as the markers of interest. The currently used reference genes with low to moderate expression (i.e., mean expression intensity of 10²–10³) all show some degree of variation; UBE2D2 [ubiquitin-conjugating enzyme E2D 2 (UBC4/5 homolog, yeast)], for example, has a CVg as high as 4.5%. Furthermore, HPRT1 (hypoxanthine phosphoribosyltransferase 1), another well-known, moderately expressed reference gene, may show sex-biased expression (P = 0.03 for a sex effect), possibly reflecting its location on the X chromosome. Transcripts of such genes are not ideal for normalizing the expression of disease-associated marker genes. We recommend that the genes with low or moderate expression in Table 1 be evaluated further as reference genes.
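The recommendation to match reference genes to the expression level of the marker can be expressed as a simple selection rule. The hypothetical helper below keeps Table 1 candidates whose log10 mean expression lies in the same order of magnitude (the same integer part) as the target and returns those with the lowest CVb. The rows are a small subset copied from Table 1, and the target expression levels in the example are invented.

```python
import math
from typing import NamedTuple


class Candidate(NamedTuple):
    symbol: str
    cv_b: float         # total biological CV, %
    log10_expr: float   # log10 mean transcript expression


# A few rows copied from Table 1 (high, intermediate, and low expression).
TABLE1_SUBSET = [
    Candidate("ACTB", 0.54, 4.71),
    Candidate("RPS18", 0.89, 4.56),
    Candidate("RPL11", 0.92, 4.68),
    Candidate("EDF1", 0.94, 3.55),
    Candidate("CASP10", 1.13, 3.21),
    Candidate("COX4I1", 1.39, 3.99),
    Candidate("BNIPL", 0.86, 2.08),
    Candidate("PPP1R14A", 0.89, 2.60),
    Candidate("CGB1", 1.05, 2.39),
]


def matched_reference_genes(target_log10_expr: float, candidates: list[Candidate],
                            n: int = 2) -> list[str]:
    """Lowest-CVb candidates expressed in the same order of magnitude as the target."""
    band = math.floor(target_log10_expr)
    same_band = [c for c in candidates if math.floor(c.log10_expr) == band]
    return [c.symbol for c in sorted(same_band, key=lambda c: c.cv_b)[:n]]


print(matched_reference_genes(2.3, TABLE1_SUBSET))   # ['BNIPL', 'PPP1R14A']
print(matched_reference_genes(3.7, TABLE1_SUBSET))   # ['EDF1', 'CASP10']
```

Any genes chosen this way would still need the further evaluation the text calls for before use as references.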

INTERINDIVIDUAL VARIATION ACCOUNTS FOR THE MAJOR COMPONENT OF BIOLOGICAL VARIATION IN GENE EXPRESSION, BUT INTERETHNIC VARIATION MAY ALSO BE IMPORTANT

Interindividual variation is the largest component of biological variation in gene expression in the population. The same conclusion was reached recently in an evaluation of QC parameters in the setting of clinical laboratory tests (19), in which almost half (43%) of the total variation was due to interindividual variation. These results demonstrate the necessity of characterizing interindividual variation before laboratory tests of gene expression are translated into clinical practice. Spielman et al. (37) used a different approach in a study of 142 lymphoblast cell lines from ethnic groups on 2 continents (Chinese and Japanese vs Caucasians) and identified 35 genes with significantly different expression between the 2 main ethnic groups. Although both the statistical methods and the microarray used for expression profiling differed from those of our study, 2 (22%) of the 9 genes in our Table 3 also appear in their list (37). We analyzed 11 355 transcripts of hematolymphoid cell lines from 210 individuals of several ethnic groups to quantify the variation in gene expression and found that CVe was greater than CVg for fewer than 4% (427) of the transcripts. This result suggests that different reference intervals may be necessary for different ethnic groups for certain transcripts if they become useful as markers.

SELECT MARKER GENES WITH LOW CVg VALUES FOR CLINICAL APPLICATIONS

Table 4. List of candidate genes proposed to predict survival after septic shock from Pachot et al. (3).

Gene symbol | CVg, % | Log10 mean transcript expression | Gene name
HLA-DRA | 2.34 | 4.07 | Major histocompatibility complex, class II, DR alpha
HLA-DRB1 | 18.99 | 2.87 | Major histocompatibility complex, class II, DR beta 1
HLA-DMA | 2.88 | 3.70 | Major histocompatibility complex, class II, DM alpha
HLA-DMB | 3.59 | 3.67 | Major histocompatibility complex, class II, DM beta
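Using the CVg values in Table 4, the candidates can be ranked by the ratio of response-related variation (CVr, defined in the list of nonstandard abbreviations) to CVg, the selection criterion discussed in this section. CVr is not reported here, so the sketch assigns every gene the same hypothetical CVr of 10%; under that assumption the ranking is driven entirely by CVg, and HLA-DRB1 falls to the bottom. The function name is hypothetical.

```python
# CVg (%) of the MHC class II candidates, copied from Table 4.
TABLE4_CVG = {"HLA-DRA": 2.34, "HLA-DRB1": 18.99, "HLA-DMA": 2.88, "HLA-DMB": 3.59}


def rank_by_cvr_to_cvg(cv_g: dict[str, float],
                       cv_r: dict[str, float]) -> list[tuple[str, float]]:
    """Return (gene, CVr/CVg) pairs sorted from the highest ratio to the lowest."""
    ratios = {gene: cv_r[gene] / cv_g[gene] for gene in cv_g}
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical: identical response-related variation for every candidate.
hypothetical_cvr = dict.fromkeys(TABLE4_CVG, 10.0)
for gene, ratio in rank_by_cvr_to_cvg(TABLE4_CVG, hypothetical_cvr):
    print(f"{gene}: CVr/CVg = {ratio:.2f}")
```

With measured CVr values (e.g., from survivor vs nonsurvivor comparisons), the same ranking would weigh discrimination and stability together rather than stability alone.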

In searching for clinically useful biomarkers among a group of differentially expressed genes, it is important to consider both the biological variation due to CVg and that due to a clinically relevant phenotype or response (CVr). For a marker gene to differentiate clinically important groups effectively, it must show large between-group variation (i.e., a high CVr) but a low CVg within each group; that is, the preferred biomarker is the one with a high CVr-to-CVg ratio. Among genes that measure a similar clinical phenotype or response, a potential biomarker with a low CVg is therefore better than one with a high CVg. A classic example is the comparison of serum creatinine with cystatin C in the assessment of renal function (17). It is well known that the glomerular filtration rate must decrease substantially before the serum creatinine concentration rises above the upper reference limit for the population. Because of its large interindividual variation, serum creatinine has a much lower CVr-to-CVg ratio than cystatin C (17). At the stage of exploratory research into a clinically important response, multiple genes may be found to be differentially expressed between patients and control individuals; such investigations are usually carried out with microarrays. From the hundreds of differentially expressed candidates, a few gene markers must then be chosen for translational research to further characterize their clinical utility, usually by real-time quantitative PCR. An appropriate process of marker selection is therefore required at this stage, and genomewide data documenting interindividual variation can play an important role in prioritizing gene markers for further evaluation.

The example of prognosis prediction in patients with septic shock illustrates how interindividual variation can be used to prioritize candidate gene markers. Pachot et al. (3) studied the expression of MHC class II genes in whole blood from patients with septic shock and compared expression levels in survivors and nonsurvivors. The investigators reported a generalized down-regulation of MHC class II genes during the course of septic shock, with even greater suppression in nonsurvivors (3); subsequent publications confirmed this phenomenon (38, 39). Many MHC class II genes were differentially expressed between survivors and nonsurvivors (CVr), including HLA-DRA (major histocompatibility complex, class II, DR alpha), HLA-DRB1, HLA-DMA (major histocompatibility complex, class II, DM alpha), and HLA-DMB (major histocompatibility complex, class II, DM beta), and it was not certain which would be the preferred marker genes for clinical applications. We examined CVg for these genes (Table 4). HLA-DRB1 clearly had a very high CVg (19%) compared with the other genes (and even relative to the entire genome). As mentioned above, biomarkers with a narrow reference interval and a high CVr-to-CVg ratio are preferable. Accordingly, HLA-DRB1 would not be a good marker, because its high CVg gives it a low CVr-to-CVg ratio; the other 3 markers, which have similar, low CVgs, are preferred. Indeed, HLA-DMB was found to predict survival at both early and late time points (3).

LIMITATIONS

We quantified interindividual variation in gene expression under conditions in which environmental effects were kept to a minimum, i.e., the cell lines were established and grown in identical media and environments. Fully characterizing the biological variation of an analyte requires measuring both interindividual and intraindividual variation to determine individuality indices and reference intervals (17, 18). These parameters are all essential for characterizing the clinical utility of a laboratory test; however, such a highly controlled setting does not allow a complete examination of intraindividual variation. The values of the variation parameters presented here therefore represent lower-limit estimates of the true values. Nevertheless, we observed large differences in interindividual variation across the genome. As illustrated by the example of prognosis prediction after septic shock, such variation data can facilitate the translation of research findings into clinical testing, and our analysis provides a basis for prioritizing marker genes for further evaluation.

Conclusion

This genomewide analysis of CVg for hematolymphoid gene transcripts has demonstrated large biological variation in gene expression across the whole genome. Our results are useful for prioritizing marker genes during the translation of exploratory research into clinical applications of gene expression markers.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References

1. Shukla SJ, Dolan ME. Use of CEPH and non-CEPH lymphoblast cell lines in pharmacogenetic studies. Pharmacogenomics 2005;6:303–10.
2. Dermitzakis ET. From gene expression to disease risk. Nat Genet 2008;40:492–3.
3. Pachot A, Monneret G, Brion A, Venet F, Bohé J, Bienvenu J, et al. Messenger RNA expression of major histocompatibility complex class II genes in whole blood from septic shock patients. Crit Care Med 2005;33:31–8; discussion 236–7.
4. Fahy RJ, Exline MC, Gavrilin MA, Bhatt NY, Besecker BY, Sarkar A, et al. Inflammasome mRNA expression in human monocytes during early septic shock. Am J Respir Crit Care Med 2008;177:983–8.
5. Mueller J, Rox JM, Madlener K, Poetzsch B. Quantitative tissue factor gene expression analysis in whole blood: development and evaluation of a real-time PCR platform. Clin Chem 2004;50:245–7.
6. Russom A, Sethu P, Irimia D, Mindrinos MN, Calvano SE, Garcia I, et al. Microfluidic leukocyte isolation for gene expression analysis in critically ill hospitalized patients. Clin Chem 2008;54:891–900.
7. Mistry R, Cliff JM, Clayton CL, Beyers N, Mohamed YS, Wilson PA, et al. Gene-expression patterns in whole blood identify subjects at risk for recurrent tuberculosis. J Infect Dis 2007;195:357–65.
8. Ramilo O, Allman W, Chung W, Mejias A, Ardura M, Glaser C, et al. Gene expression patterns in blood leukocytes discriminate patients with acute infections. Blood 2007;109:2066–77.
9. Cobb JP, Mindrinos MN, Miller-Graziano C, Calvano SE, Baker HV, Xiao W, et al. Application of genome-wide expression analysis to human health and disease. Proc Natl Acad Sci U S A 2005;102:4801–6.
10. Nolan T, Hands RE, Bustin SA. Quantification of mRNA using real-time RT-PCR. Nat Protoc 2006;1:1559–82.
11. Mitsuhashi M, Tomozawa S, Endo K, Shinagawa A. Quantification of mRNA in whole blood by assessing recovery of RNA and efficiency of cDNA synthesis. Clin Chem 2006;52:634–42.
12. Zheng Z, Luo Y, McMaster GK. Sensitive and quantitative measurement of gene expression directly from a small amount of whole blood. Clin Chem 2006;52:1294–302.
13. Wright C, Bergstrom D, Dai H, Marton M, Morris M, Tokiwa G, et al. Characterization of globin RNA interference in gene expression profiling of whole-blood samples. Clin Chem 2008;54:396–405.
14. Kim SJ, Dix DJ, Thompson KE, Murrell RN, Schmid JE, Gallagher JE, Rockett JC. Effects of storage, RNA extraction, GeneChip type, and donor sex on gene expression profiling of human whole blood. Clin Chem 2007;53:1038–45.
15. Langebrake C, Gunther K, Lauber J, Reinhardt D. Preanalytical mRNA stabilization of whole bone marrow samples. Clin Chem 2007;53:587–93.
16. Marteau JB, Mohr S, Pfister M, Visvikis-Siest S. Collection and storage of human blood cells for mRNA expression profiling: a 15-month stability study. Clin Chem 2005;51:1250–2.
17. Keevil BG, Kilpatrick ES, Nichols SP, Maylor PW. Biological variation of cystatin C: implications for the assessment of glomerular filtration rate. Clin Chem 1998;44:1535–9.
18. Widjaja A, Morris RJ, Levy JC, Frayn KN, Manley SE, Turner RC. Within- and between-subject variation in commonly measured anthropometric and biochemical variables. Clin Chem 1999;45:561–6.
19. Peters EH, Rojas-Caro S, Brigell MG, Zahorchak RJ, des Etages SA, Ruppel PL, et al. Quality-controlled measurement methods for quantification of variations in transcript abundance in whole blood samples from healthy volunteers. Clin Chem 2007;53:1030–7.
20. Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG. Genetic analysis of genome-wide variation in human gene expression. Nature 2004;430:743–7.
21. Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, Burdick JT. Mapping determinants of human gene expression by regional and genomewide association. Nature 2005;437:1365–9.
22. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, et al. Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science 2007;315:848–53.
23. R Development Core Team. R: a language and environment for statistical computing [software]. Vienna: R Foundation for Statistical Computing; 2005. Available at: http://www.r-project.org/.
24. Menges T, Konig IR, Hossain H, Little S, Tchatalbachev S, Thierer F, et al. Sepsis syndrome and death in trauma patients are associated with variation in the gene encoding tumor necrosis factor. Crit Care Med 2008;36:1456–62, e1–6.
25. Payen D, Lukaszewicz AC, Belikova I, Faivre V, Gelin C, Russwurm S, et al. Gene profiling in human blood leucocytes during recovery from septic shock. Intensive Care Med 2008;34:1371–6.
26. Tang BM, McLean AS, Dawes IW, Huang SJ, Cowley MJ, Lin RC. Gene-expression profiling of gram-positive and gram-negative sepsis in critically ill patients. Crit Care Med 2008;36:1125–8.
27. Pathak AK, Bhutani M, Kumar S, Mohan A, Guleria R. Circulating cell-free DNA in plasma/serum of lung cancer patients as a potential screening and prognostic tool. Clin Chem 2006;52:1833–42.
28. Xi L, Nicastri DG, El-Hefnawy T, Hughes SJ, Luketich JD, Godfrey TE. Optimal markers for real-time quantitative reverse transcription PCR detection of circulating tumor cells from melanoma, breast, colon, esophageal, head and neck, and lung cancers. Clin Chem 2007;53:1206–15.
29. Schmittgen TD, Livak KJ. Analyzing real-time PCR data by the comparative C-T method. Nat Protoc 2008;3:1101–8.
30. Huggett J, Dheda K, Bustin S, Zumla A. Real-time RT-PCR normalisation; strategies and considerations. Genes Immun 2005;6:279–84.
31. Liu DW, Chen ST, Liu HP. Choice of endogenous control for gene expression in nonsmall cell lung cancer. Eur Respir J 2005;26:1002–8.
32. Eisenberg E, Levanon EY. Human housekeeping genes are compact. Trends Genet 2003;19:362–5.
33. de Kok JB, Roelofs RW, Giesendorf BA, Pennings JL, Waas ET, Feuth T, et al. Normalization of gene expression measurements in tumor tissues: comparison of 13 endogenous control genes. Lab Invest 2005;85:154–9.
34. Rubie C, Kempf K, Hans J, Su T, Tilton B, Georg T, et al. Housekeeping gene variability in normal and cancerous colorectal, pancreatic, esophageal, gastric and hepatic tissues. Mol Cell Probes 2005;19:101–9.
35. Bustin SA. Quantification of mRNA using real-time reverse transcription PCR (RT-PCR): trends and problems.
J Mol Endocrinol 2002;29:23–39. Bustin S. Absolute quantification of mRNA using

Interindividual Variation in Genomewide Gene Expression

real-time reverse transcription polymerase chain reaction assays. J Mol Endocrinol 2000;25:169 –93. 37. Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, Cheung VG. Common genetic variants account for differences in gene expression among

ethnic groups. Nat Genet 2007;39:226 –31. 38. Turina M, Dickinson A, Gardner S, Polk HC Jr. Monocyte HLA-DR and interferon-gamma treatment in severely injured patients—a critical reappraisal more than a decade later. J Am Coll

Surg 2006;203:73– 81. 39. Terblanche M, Almog Y, Rosenson RS, Smith TS, Hackam DG. Statins and sepsis: multiple modifications at multiple levels. Lancet Infect Dis 2007; 7:358 – 68.


Clinical Chemistry 55:4 786–794 (2009)

Molecular Diagnostics and Genetics

Overinterpretation of Clinical Applicability in Molecular Diagnostic Research

Blanca Lumbreras,1 Lucy A. Parker,1* Miquel Porta,2 Marina Pollán,3 John P.A. Ioannidis,4 and Ildefonso Hernández-Aguado1

BACKGROUND: We evaluated whether articles on molecular diagnostic tests appropriately interpret the clinical applicability of their results.

METHODS: We selected original-research articles published in 2006 that addressed the diagnostic value of a molecular test. We defined overinterpretation of clinical applicability by means of prespecified rules that evaluated study design, conclusions regarding applicability, presence of statements suggesting the need for further clinical evaluation of the test, and diagnostic accuracy. Two reviewers independently evaluated the articles; consensus was reached after discussion and arbitration by a third reviewer.

RESULTS: Of 108 articles included in the study, 82 (76%) used a design with healthy controls or alternative-diagnosis controls, only 15 (14%) addressed a clinically relevant population similar to that in which the test might be applied in practice, 104 (96%) made definitely favorable or promising statements regarding clinical applicability, and 61 (56%) apparently overinterpreted the clinical applicability of their findings. Articles published in journals with higher impact factors were more likely to overinterpret their results than those with lower impact factors (adjusted odds ratio, 1.71 per impact factor quartile; 95% CI, 1.09–2.69; P = 0.020). Overinterpretation was more common when authors were based in laboratories than in clinical settings (adjusted odds ratio, 18.7; 95% CI, 1.41–249; P = 0.026).

CONCLUSIONS: Although expectations are high for new diagnostic tests based on molecular techniques, the majority of published research has involved preclinical phases of research. Overinterpretation of the clinical applicability of findings for new molecular diagnostic tests is common.

© 2009 American Association for Clinical Chemistry

1 Public Health Department, Miguel Hernández University, Alicante, Spain [CIBER en Epidemiología y Salud Pública (CIBERESP)]; 2 Institut Municipal d’Investigació Mèdica, Facultat de Medicina, Universitat Autònoma de Barcelona, Spain [CIBER en Epidemiología y Salud Pública (CIBERESP)]; 3 Cancer and Environmental Epidemiology Area, National Centre for Epidemiology, Instituto de Salud Carlos III, Madrid, Spain [CIBER en Epidemiología y Salud Pública (CIBERESP)]; 4 Department of Hygiene and Epidemiology, University of Ioannina School of Medicine, Ioannina, Greece.
* Address correspondence to this author at: Public Health Department, Miguel Hernández University, E-03550 Alicante, Spain. Fax +34 965 919551; e-mail [email protected].
Received November 26, 2008; accepted January 30, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.121517

With the remarkable advances in genomic and proteomic technologies, a large number of studies on new molecular diagnostic tests are being published. Expectations are high for the development of noninvasive molecular diagnostic tests, yet the analysis and interpretation of the data have presented unique challenges (1). Few of the many proposed tests have been introduced into clinical practice with clearly documented benefits (2–4). Today, more than ever, the intense promotion of molecular-diagnostic techniques strengthens the need to ensure that the provision of diagnostic tests in clinical settings is evidence-based; however, offering guidance for the introduction of a new diagnostic test into clinical practice remains a challenge (5).

Beyond the increased attention to issues of reporting (6) and quality assessment (7), several authors (8–10) have proposed a formal structure to guide the process of diagnostic-test development. On the path toward successful clinical application, a diagnostic test should be evaluated in distinct populations that are similar to those in which the test is intended for eventual use (in clinical practice or in public health). Although preliminary studies may evaluate the ability of the test to distinguish between known disease cases and control individuals who are either healthy or have a specific, different diagnosis, excellent results in these preliminary, preclinical phases do not prove clinical utility. Application of a test in the real world usually involves a different spectrum of disease than preliminary studies, because real-life diagnostic investigations primarily address patients suspected of having the target condition, not patients with severe, clear-cut disease or obviously healthy people. Moreover, competing diagnoses are prevalent in real life, whereas healthy-control or alternative-diagnosis–control studies typically exclude patients with diagnoses that compete in the differential diagnosis. Analytical issues (e.g., reproducibility) (11, 12) and potential biases (13) may also complicate the transition from discovery to clinical translation (1).

Although these conceptual and methodologic requirements have long been established, it is unknown whether the new generation of studies on molecular diagnostic tests recognizes and integrates the extra requirements for clinical translation or, by contrast, tends to overinterpret or exaggerate preliminary results as conclusive evidence of clinical applicability. Our aim was to analyze a large sample of recent articles on molecular-diagnostic tests to determine whether the authors’ assessment of the clinical applicability of their results was coherent with their study design and findings or whether they overinterpreted the clinical significance of the available information.

Materials and Methods

DATA SOURCES AND SEARCHING

We identified diagnostic-accuracy studies on molecular research through a computerized search of MEDLINE that used the medical subject headings (MeSH): “Diagnosis” and “Genomics” or “Microarray analysis”; “Molecular diagnostic techniques” (MeSH) and “Sensitivity and Specificity” (MeSH); “diagnos*” and “genomics” or “proteomics”; and finally, “molecular” or “genetic” and “diagnostic test.” The searches were carried out on May 11, 2007. The full search strategy is documented in Fig. 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4.

STUDY SELECTION

We selected original research articles that used human participants in studies in which the main objective was to address the diagnostic value of a given test whose methodology was based on molecular techniques. The term “molecular techniques” included technologies that provide a comprehensive analysis of cellular-specific constituents, such as RNA, DNA, proteins, and intermediary metabolites, as well as techniques such as in situ hybridization of chromosomes for cytogenetic analysis, identification of pathogenic organisms via analysis of species-specific DNA sequences, and detection of mutations with PCR. To maintain a focus on recent research, we limited our sample to articles published in 2006.

A single investigator screened the titles and abstracts according to specific criteria. Reviews, editorials, letters, and case reports were excluded. We also excluded preevaluation studies that focused on the analytical aspects of a diagnostic test (technical aspects of how a method is applied or how measurements are made) and studies that aimed to monitor disease prognosis or treatment effects. To assess the reliability of the selection process, 2 investigators independently assessed a random sample of 200 abstracts; their agreement with the initial reviewer was 94% and 83%, respectively.

DATA EXTRACTION AND DEFINITIONS

Two investigators independently extracted data from each article. The data extractors assigned each study to one of the 3 following study designs according to previous definitions (14): (a) healthy-control or alternative-diagnosis–control study; (b) consecutive series or series of clinically relevant patients in which the spectrum of patients/samples reflects, as closely as possible, the populations in which the test may be used in practice; and (c) studies that could not be assigned with confidence to either of the 2 other groups. Table 1 details the operational definitions for each type of design. Furthermore, all statements in the articles referring to clinical applicability and to the potential need for further clinical evaluation were recorded, as follows:

• Statements regarding clinical applicability of the test. Statements on clinical applicability were graded as definitely favorable, promising, or unfavorable. Conditional language such as “may” was considered promising; however, if the authors affirmed that a study reflected the clinical evaluation of the test in question or that the test could be considered an option for diagnosis, the statement was marked as definitely favorable. In the decision regarding overinterpretation, the statements made in the abstract carried the final weight.
• Statements regarding further clinical evaluation of the test. The presence or absence of statements regarding the need for further clinical evaluation was recorded for each study. A distinction was made between studies that mentioned further clinical evaluation as a desirable possibility and those that stated that clinical evaluation was necessary. Only the latter were considered to “mention the need for further clinical evaluation.”

We defined overinterpretation of clinical applicability with the following rules, which were agreed upon in advance and evaluated in a pilot study of 10 articles to ensure that they were operational (Table 2).
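The rules referenced in Table 2, combined with the accuracy thresholds for “unacceptable diagnostic accuracy” described below, can be expressed as a small decision function. The Python sketch that follows is illustrative only and is not the authors’ instrument: the names (`Design`, `Stance`, `Accuracy`, `accuracy_unacceptable`, `overinterprets`) are hypothetical, proportions are assumed to lie on a 0 to 1 scale, an `Accuracy` with no fields filled in is assumed to mean that no accuracy index was reported or calculable, unfavorable statements are assumed never to count as overinterpretation, and the “Other” design is assumed to follow the healthy-control rules.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Design(Enum):
    CLINICAL_SERIES = "consecutive or clinically relevant patient series"
    CASE_CONTROL = "healthy-control or alternative-diagnosis control"
    OTHER = "other"


class Stance(Enum):
    DEFINITELY_FAVORABLE = "definitely favorable"
    PROMISING = "promising"
    UNFAVORABLE = "unfavorable"


@dataclass
class Accuracy:
    """Reported accuracy indexes as proportions in [0, 1]; None means not reported."""
    sensitivity: Optional[float] = None
    specificity: Optional[float] = None
    sensitivity_ci_low: Optional[float] = None
    specificity_ci_low: Optional[float] = None
    auc: Optional[float] = None
    auc_ci_low: Optional[float] = None
    # True if the authors justify the test as an inclusion (rule-in) or exclusion (rule-out) test
    justified_as_inclusion_or_exclusion: bool = False


def accuracy_unacceptable(a: Accuracy) -> bool:
    """Hypothetical encoding of the 'unacceptable diagnostic accuracy' criteria."""
    sens, spec = a.sensitivity, a.specificity
    # No accuracy index at all (assumed: none could be calculated from the article either)
    if sens is None and spec is None and a.auc is None:
        return True
    # Both sensitivity and specificity below 60%
    if sens is not None and spec is not None and sens < 0.60 and spec < 0.60:
        return True
    # Either sensitivity or specificity below 50% without an inclusion/exclusion justification
    reported = [v for v in (sens, spec) if v is not None]
    if any(v < 0.50 for v in reported) and not a.justified_as_inclusion_or_exclusion:
        return True
    # Lower CI limits of both sensitivity and specificity below 50%
    if (a.sensitivity_ci_low is not None and a.specificity_ci_low is not None
            and a.sensitivity_ci_low < 0.50 and a.specificity_ci_low < 0.50):
        return True
    # AUC below 0.55, or its CI reaching below 0.50
    if a.auc is not None and (a.auc < 0.55 or
                              (a.auc_ci_low is not None and a.auc_ci_low < 0.50)):
        return True
    return False


def overinterprets(design: Design, stance: Stance,
                   mentions_need_for_clinical_evaluation: bool,
                   accuracy: Optional[Accuracy] = None) -> bool:
    """Apply the Table 2 rules (assumption: 'Other' shares the healthy-control rules)."""
    if stance is Stance.UNFAVORABLE:
        return False  # assumed: unfavorable comments are never overinterpretation
    if design is Design.CLINICAL_SERIES:
        acc = accuracy if accuracy is not None else Accuracy()
        if not accuracy_unacceptable(acc):
            return False  # any stance is acceptable when accuracy is acceptable
    # Definitely favorable -> overinterpretation; promising -> only if no need is stated
    if stance is Stance.DEFINITELY_FAVORABLE:
        return True
    return not mentions_need_for_clinical_evaluation


# The four worked examples described in the Results
print(overinterprets(Design.CASE_CONTROL, Stance.DEFINITELY_FAVORABLE, False))  # -> True
print(overinterprets(Design.CASE_CONTROL, Stance.PROMISING, True))              # -> False
print(overinterprets(Design.CLINICAL_SERIES, Stance.DEFINITELY_FAVORABLE, True,
                     Accuracy(sensitivity=0.72, specificity=0.92)))             # -> False
print(overinterprets(Design.CLINICAL_SERIES, Stance.DEFINITELY_FAVORABLE, False,
                     Accuracy()))                                               # -> True
```

Encoding the rules this way makes explicit that, for clinically relevant series, the stance matters only once accuracy has been judged unacceptable, whereas for healthy-control designs the stance and the presence of a “need for further clinical evaluation” statement decide the outcome on their own.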
In brief, overinterpretation was defined in studies with healthy or alternative-diagnosis controls when authors gave a conclusion that was definitely favorable for the application of the test to the clinic (with or without mentioning the requirement of further clinical evaluation), or if authors stated that the assessed test was promising but did not mention the need for further clinical evaluation. In studies including patient series, any statement that concluded that the test had clinical applications was classified as overinterpretation if the study had unacceptable diagnostic accuracy, defined as follows: both sensitivity and specificity were <60% in the main analysis; either sensitivity or specificity was <50% in the main analysis without justification of the merits of the test as an exclusion/inclusion test; the lower limits of the CIs of both sensitivity and specificity were <50%; the area under the ROC curve was <0.55 or had CIs that reached <0.50; or an accuracy index was absent and insufficient information was provided to calculate sensitivity or specificity. Transcriptions of a selection of the articles examined and their classifications are provided in Annex 1 of the online Data Supplement for illustrative purposes, and some detailed examples are described in the Results. The degree of observer agreement regarding the presence or absence of overinterpretation was 79% at this stage. Discrepancies were resolved by consensus and by independent review by a third investigator. The reviewers were aware of the journal source and authorship.

Table 1. Rules for classification of study designs of molecular-diagnostic studies.

Consecutive series or patient series based on a clinically relevant population:
• Consecutively enrolled patients with clinical suspicion of disease
• Individuals presenting at a specific center or group of centers who have symptoms indicative of the disease in question
• Consecutive samples sent to a diagnostic lab for analysis and possible diagnosis of the disease in question
• In screening, when participants share the same characteristics as the target population (e.g., asymptomatic “at risk” individuals)

Healthy control or alternative-diagnosis control:
• Clear selection of disease-positive cases and healthy controls
• Diseased tissue and healthy adjacent tissue from the same patient
• The same patient is tested before and after treatment/surgery is performed
• Analysis of an amplified spectrum of cases and controls (e.g., severe disease, mild disease, benign disease, healthy controls)
• Selection of a large variety of controls that might pose a diagnostic challenge (but still compared with definitely disease-positive cases)
• Studies stating “consecutive series or patient series,” yet with results clearly indicating that investigators used a healthy-control or alternative-diagnosis–control study (e.g., including a healthy control group)

Other:
• Studies that do not follow a healthy-control or alternative-diagnosis–control design, but in which it is not clearly evident that investigators used consecutive series or patient series based on a clinically relevant population

Table 2. Rules for the assessment of overinterpretation.

Consecutive series or patient series based on a clinically relevant population
  Overinterpretation:
  • Definitely favorable comments regarding clinical application of a test with unacceptable diagnostic accuracy
  • Promising statements regarding clinical application of a test with unacceptable diagnostic accuracy, but without mentioning the need for further clinical evaluation
  Not overinterpretation:
  • Definitely favorable, promising, or unfavorable comments regarding the clinical applicability of a test evaluated with acceptable diagnostic accuracy
  • Promising statements regarding clinical application of a test with unacceptable diagnostic accuracy, but with a statement mentioning the need for further clinical evaluation

Healthy control or alternative-diagnosis control; Other
  Overinterpretation:
  • Definitely favorable comments regarding clinical application of the test under study
  • Promising statements regarding clinical application, but without mentioning the need for further clinical evaluation
  Not overinterpretation:
  • Unfavorable comments regarding clinical application
  • Promising statements regarding clinical application, but with a statement mentioning the need for further clinical evaluation

From each study we also recorded the following variables: Thomson Reuters’ bibliographic impact factor; journal categories selected by Thomson Reuters’ Web of Science (Journal Citation Reports 2006); whether the authors were based in a laboratory, in a clinical setting, or both; the disease studied; the molecular methodology used, categorized as gene-targeting techniques (PCR-based and microarray), protein-targeting techniques (mass spectrometry or 2-dimensional gel electrophoresis, antibody array or protein microarray), and other; mention of previous studies on the same test and how the results were reported; and description of other diagnostic tests for the same diagnostic problem. We also recorded the sample size; in proteomic or genomic studies in which a pattern-recognition model is developed in a training set and then applied in an independent “validation” set (13), we recorded only the number of patients/samples included in the validation set.

STATISTICAL ANALYSIS

To assess the association between the outcome variable (overinterpretation) and the variables listed in the previous paragraph, we computed odds ratios and their 95% CIs by means of unconditional logistic regression. Multivariable models considered all variables with P values <0.10 in univariate analyses and used stepwise forward selection. We always included study design and accuracy index as adjusting factors in the multivariable analysis, because they were part of the criteria for judging overinterpretation (as discussed above) and because they could be related to other study characteristics, thus acting as classic confounders. Study size and bibliographic impact-factor data were categorized in quartiles. Analyses were carried out with STATA/SE 8.0 (StataCorp).

Results

EVALUATED ARTICLES

After screening the titles and abstracts of 1614 articles retrieved in the electronic searches, we considered 147 articles potentially eligible for the study. After examination of the full texts, we ultimately included 108 articles (see Annex 2 and the flowchart in the online Data Supplement). Table 3 lists the characteristics of the sample of 108 reports. Most of the included reports (76%) used a healthy-control or alternative-diagnosis–control design to assess diagnostic accuracy. Regarding the measurement of diagnostic accuracy, more than half (n = 58) of the studies reported classic diagnostic indexes (sensitivity and specificity, or area under the ROC curve). We presented sensitivity and specificity in the same category as area under the ROC curve because 9 of the 12 studies that reported area under the ROC curve presented it along with sensitivity and specificity values; however, when we separately analyzed the 3 studies that reported only area under the ROC curve, we obtained similar results. The sample size ranged from 4 to 8156, with a median of 68. Thirty-one reports (29%) mentioned previous studies on the same tests; of these 31 reports, 15 quantitatively described the results of the previous studies. More than two thirds (n = 75) of the studies mentioned the existence of other diagnostic tests for the same diagnostic problem. Approximately half (n = 53, 49%) of the reports stated the need for studies other than diagnostic evaluations, such as identification of biomarkers or assessment of prognostic value.

OVERALL STANCE AND INTERPRETATION OF THE RESULTS

Half (n = 54, 50%) of the articles studied made definitely favorable statements with regard to clinical application, whereas 50 studies (46%) made statements that were classified as promising. Only 4 studies made unfavorable statements regarding the evaluated diagnostic test. About half (n = 57, 53%) of the articles mentioned the need to evaluate the test’s diagnostic performance in further studies. Fifty-seven (61%) of the 93 studies that did not use a clinically relevant population overinterpreted the clinical applicability. Of the 15 studies carried out with a clinically relevant population, 4 (27%) were also deemed to have overinterpreted their results because of insufficient diagnostic accuracy. In combination, overinterpretation of the clinical applicability of the test under study was apparent in more than half (n = 61, 56%) of the examined articles.

Authors based solely in clinical settings were much less likely to overinterpret results, and articles published in journals focusing on medical specialties were also less likely to do so. Furthermore, a higher journal impact factor was associated with a higher chance of overinterpretation (Table 4). Multivariable analyses indicated that laboratory-based authors were more likely than clinic-based authors to overinterpret the clinical implications of their results (odds ratio adjusted for study design, type of diagnostic accuracy index, and impact factor, 18.7; 95% CI, 1.41–249.26; P = 0.026). Articles from journals with impact factors in the upper quartile were more likely to overinterpret than those from the lowest quartile (odds ratio adjusted for study design, type of diagnostic accuracy index, and authorship, 4.33; 95% CI, 1.03–18.23; P = 0.045). The association between overinterpretation and impact factor followed a linear trend across quartiles (odds ratio, 1.71 per quartile; 95% CI, 1.09–2.69; P = 0.020). We calculated cross-tabulations to compare journals with high vs low impact factors. The only difference observed was in journal category: the higher-impact journals included a higher proportion of those categorized as “laboratory and methodology,” whereas the lower-impact journals included more “biomedical or general science” journals (P = 0.010).

Table 3. Main characteristics of the 108 articles on molecular-diagnostic tests: overall results and results according to whether they overinterpreted clinical applicability.

Variables | No. of studies | Overinterpretation: yes, n (%) | Overinterpretation: no, n (%) | P(a)
Study design | | | | 0.042
  Consecutive series or series of clinically relevant patients | 15 | 4 (27) | 11 (73) |
  Healthy control or alternative-diagnosis control | 82 | 50 (61) | 32 (39) |
  Other | 11 | 7 (64) | 4 (36) |
Accuracy index | | | | 0.014
  Sensitivity and specificity, or area under the ROC curve | 57 | 25 (44) | 32 (56) |
  Predictive values or accuracy | 14 | 10 (71) | 4 (29) |
  Diagnostic index not calculated | 36 | 26 (72) | 10 (28) |
Sample size by quartile, n | | | | 0.44(b)
  Q1 (4–37) | 27 | 15 (56) | 12 (44) |
  Q2 (38–68) | 26 | 17 (65) | 9 (34) |
  Q3 (69–107) | 27 | 16 (59) | 11 (41) |
  Q4 (108–8156) | 26 | 12 (46) | 14 (54) |
Journal category | | | | 0.025
  Medicine | 36 | 15 (42) | 21 (58) |
  Oncology | 32 | 16 (50) | 16 (50) |
  Biomedical or general science | 19 | 14 (74) | 5 (26) |
  Laboratory and methodology | 21 | 16 (76) | 5 (24) |
Impact factor by quartile | | | | 0.050(b)
  Q1 (<2.15) | 25 | 11 (44) | 14 (56) |
  Q2 (2.16–3.87) | 25 | 12 (48) | 13 (52) |
  Q3 (3.88–5.74) | 28 | 18 (64) | 10 (36) |
  Q4 (5.75–51.30) | 21 | 14 (67) | 7 (33) |
  Not classified(c) | 9 | 6 (67) | 3 (33) |
Authorship | | | | 0.005
  Clinic-based | 11 | 2 (18) | 9 (82) |
  Both clinic- and laboratory-based | 79 | 34 (43) | 45 (57) |
  Laboratory-based | 26 | 15 (58) | 11 (42) |
Technique used | | | | 0.30(d)
  Gene-targeting techniques: PCR-based | 34 | 20 (59) | 14 (41) |
  Gene-targeting techniques: microarray | 20 | 14 (70) | 6 (30) |
  Protein-targeting techniques: mass spectrometry or 2D gel electrophoresis | 44 | 20 (46) | 24 (55) |
  Protein-targeting techniques: antibody array or protein microarray | 9 | 6 (67) | 3 (33) |
  Other: lipidomics | 1 | 0 (0) | 1 (100) |
Disease type | | | | 0.57(d)
  Cancer | 61 | 31 (51) | 30 (49) |
  Infectious disease | 19 | 14 (74) | 5 (26) |
  Congenital disorders | 10 | 6 (60) | 4 (40) |
  Autoimmune disease and transplant rejection | 8 | 4 (50) | 4 (50) |
  Neurologic disease | 6 | 3 (50) | 3 (50) |
  Other(e) | 4 | 3 (75) | 1 (25) |
Total | 108 | 61 (57) | 47 (44) |

(a) P values from χ² univariate test of homogeneity unless otherwise stated.
(b) χ² test of tendency.
(c) Articles in journals not included in the Thomson Reuters’ ISI Web of Knowledge Journal Citation Report, edition 2006. Excluded from the statistical analysis were articles published in BMC Medical Genetics, World Journal of Gastroenterology, Taiwan Journal of Obstetrics & Gynecology, Molecular Diagnosis & Therapy, Molecular Cancer, Translational Research, Journal of Zhejiang University. Science. B, and Journal of Thoracic Oncology.
(d) Fisher exact test (2-tailed).
(e) Adenomyosis, endometriosis, osteonecrosis of the femoral head, and idiopathic pulmonary fibrosis.
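To make the univariate comparisons in Table 3 concrete, the sketch below recomputes one of them in plain Python. It is an illustration under stated assumptions, not the authors’ STATA analysis: the function names are hypothetical, the shortcut P = exp(−χ²/2) is exact only for 2 degrees of freedom, and the odds ratio is crude (unadjusted) with a Woolf log-scale CI, so it is not expected to reproduce the adjusted odds ratios in Table 4.

```python
import math


def chi2_homogeneity(rows):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    n_cols = len(rows[0])
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(row[j] for row in rows) for j in range(n_cols)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(rows) - 1) * (n_cols - 1)
    return stat, df


def chi2_p_value_df2(stat):
    """Upper-tail P for a chi-square statistic with exactly 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


def crude_odds_ratio(a, b, c, d, z=1.96):
    """Unadjusted OR of group 1 (a yes, b no) vs reference group (c yes, d no),
    with a Woolf (log-scale Wald) confidence interval; all cells must be > 0."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log)
    high = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, low, high


# Study-design rows of Table 3 as (overinterpreted, not overinterpreted)
design_rows = [(4, 11),    # consecutive / clinically relevant series
               (50, 32),   # healthy-control or alternative-diagnosis control
               (7, 4)]     # other
stat, df = chi2_homogeneity(design_rows)
print(f"chi2 = {stat:.2f}, df = {df}, P = {chi2_p_value_df2(stat):.3f}")
# prints: chi2 = 6.33, df = 2, P = 0.042

# Laboratory-based (15 yes, 11 no) vs clinic-based (2 yes, 9 no) authorship, unadjusted
or_, low, high = crude_odds_ratio(15, 11, 2, 9)
print(f"crude OR = {or_:.2f} (95% CI, {low:.2f} to {high:.2f})")
```

With the study-design counts, the statistic agrees with the P = 0.042 reported in Table 3. The crude laboratory-based vs clinic-based odds ratio comes out at about 6.1 with a very wide interval, well below the adjusted 18.7 in Table 4; the gap is a reminder that adjustment for study design, accuracy index, and impact factor, combined with the sparse clinic-based cells (2 vs 9), can shift such estimates substantially.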

Table 4. Multivariable analyses: variables significantly associated with overinterpretation of results.

Variables | n (%) | Adjusted odds ratio(a) | 95% CI | P
Study design
  Consecutive series or series of clinically relevant patients | 15 (13.9) | 1.00 (reference) | |
  Healthy control or alternative-diagnosis control | 82 (75.9) | 4.54 | 1.13–18.15 | 0.032
  Other | 11 (10.2) | 5.67 | 0.88–36.80 | 0.069
Accuracy index
  Sensitivity and specificity, or area under the ROC curve | 57 (52.8) | 1.00 (reference) | |
  Predictive values or overall accuracy | 14 (12.9) | 1.85 | 0.42–8.13 | 0.417
  Diagnostic index not reported | 36 (33.3) | 2.87 | 1.03–7.96 | 0.043
Authorship
  Clinic-based | 11 (9.5) | 1.00 (reference) | |
  Both clinic- and laboratory-based | 79 (68.1) | 4.50 | 0.44–46.14 | 0.206
  Laboratory-based | 26 (22.4) | 18.73 | 1.41–249.26 | 0.026
Impact factor (by quartiles), linear relationship(b) | | 1.71 | 1.09–2.69 | 0.020

(a) Logistic regression model controlling for the effects of study design, type of accuracy index, authorship, and bibliographic-impact factor.
(b) Reference category is the previous quartile.

EXAMPLES IN THE ASSESSMENT OF OVERINTERPRETATION

Example 1 (reference 25 in Annex 1 in the online Data Supplement). This study used an alternative-diagnosis–control design, and the statements regarding clinical applicability were considered definitely favorable: “This rapid MS-MA is a good primary screening method that can be implemented in a diagnostic laboratory to determine the methylation patterns of patients with suspected PWS or A.” The authors assert that the diagnostic test is a good primary-screening method despite the limited conclusiveness of the study design; therefore, the study was considered to have overinterpreted its results.

Example 2 (reference 40 in Annex 1 in the online Data Supplement). This study used a healthy-control design, and we did not consider it to have overinterpreted its results. The statements regarding clinical applicability were judged as simply promising (“This study shows that free-circulating DNA can be detected in cancer patients compared with disease-free individuals, and suggests a new, non invasive approach for early detection of cancer.”). The authors additionally specify the need for further studies to evaluate the test (“Further studies are needed to understand the correlation of these new molecular markers with cancer diagnosis, outcome of disease, and eventually treatment response.”).

Example 3 (reference 87 in Annex 1 in the online Data Supplement). This study used a clinically relevant population, and we considered the statements regarding clinical utility as definitely favorable (“Component-based testing and the whole-allergen CAP are equally relevant in the diagnosis of grass-, birch- and cat-allergic patients.”). The authors specify the need for further clinical evaluation (“The clinical relevance of each allergen needs to be validated separately before the implementation of multiallergen panels into routine diagnostic settings.”). This study had acceptable diagnostic accuracy (sensitivity, 72%; specificity, 92%) and therefore was not considered to have overinterpreted the clinical applicability of its results.

Example 4 (reference 54 in Annex 1 in the online Data Supplement).
This study also used a clinically relevant population; we considered the statements regarding clinical utility as definitely favorable (“This PCR assay detects a variety of strains exhibiting characteristics of the EAEC group, making it a useful tool for identifying both typical and atypical EAEC.”); however, the authors did not report any measure of diagnostic accuracy. The study was therefore considered to have overinterpreted its results.

Discussion

Although clinical evaluation is necessary before a test is introduced into clinical practice, few recent diagnostic studies of molecular tests have been carried out in a clinically relevant population. The authors almost always interpreted their findings as either definitely favorable or at least promising for the evaluated technology. More than half of the articles apparently overinterpreted the clinical applicability of their findings, and such overinterpretation was more likely in articles whose authors were all laboratory-based and in articles published in journals with higher impact factors.

Most of the reviewed studies used healthy-control or alternative-diagnosis–control designs. These studies are not all equal (14): some may be affected by biases, whereas others may be unbiased. Such nonequivalence is one more reason why evaluations with study designs that come closer to real-life clinical settings are warranted.

Some authors have stressed the need to measure the impact of a diagnostic test on health outcomes as a final phase in the evaluation of its clinical utility, once the test has been accepted clinically and made commercially available (6, 9). We have not covered this issue in this study; however, we agree that evaluating whether a test positively influences health outcomes is a key aspect. We chose not to cover it because few molecular-diagnostic tests have been incorporated into practice and because trials evaluating the clinical utility of such tests are still scarce. For example, no randomized trials have conclusively assessed the clinical utility of tests involving gene expression profiling, despite several thousand published articles on the subject (2 trials are ongoing) (15).

Other empirical investigations of the methodologic aspects of diagnostic research have reported serious methodologic limitations (16–19). In the present study, however, we examined the applicability of diagnostic-test results to practice on the basis of the study design and independently of other methodologic aspects.
We documented that considerable distance often exists between study design and the clinical applicability of molecular-diagnostic tests, even when the design and the data are methodologically sound. With the continuing development of new diagnostic tests, comprehensive clinical evaluations are needed if clinical harm and unnecessary spending are to be avoided. As our results show, studies that make claims about the clinical applicability of molecular-diagnostic tests often have not evaluated populations of clinically relevant patients and therefore lack evidence on which to base their claims. Enticing promises exist across the field of molecular medicine (20). The exaggeration of the clinical implications of preliminary investigations that we observed in our study may be due to different processes (4, 21, 22), including commercial influences (4) and insufficient awareness by researchers of their own "interpretive biases" (23, 24). Overinterpretation can certainly arise when a strong result is obtained from a very small study. Indeed, the lack of reproducibility in analyses of proteomic and genomic data is often ascribed to small samples: the main difficulty in conducting a satisfactory early assessment is obtaining sufficient numbers of individuals for both training and validation; thus, the results may be overinterpreted. However, large sample sizes and replication in multiple independent data sets are necessary but not sufficient for reliable results. Comprehensive clinical evaluation of a single diagnostic test is expensive in terms of both money and time (25). Reliable consecutive series of samples that are representative of the real clinical settings of interest may be difficult to obtain in molecular-based research. Unless a well-thought-out research study is designed in collaboration with a clinical center, few groups are likely to hand over their "precious" clinical samples and their clinical and demographic data to a laboratory (26). Clinicians may be more sensitive to the difficulties and implications of moving these tests to the bedside and thus may be more cautious in their interpretation. Such caution would be consistent with our observation that articles by exclusively laboratory-based authors were more likely to overinterpret the clinical applicability of their results. Finally, the observed relationship between journal impact factor and overinterpretation could be a form of bias: studies with more spectacular conclusions appear in journals with higher impact factors, many of which are also more biologically and industry oriented than clinically based.

Some caveats about our methods require discussion. First, we used an operational search strategy and definition to identify a sufficiently large number of molecular-diagnostic studies, but there is no established and widely agreed strategy for identifying such studies in the literature.
To evaluate the consistency of the selection process, 2 investigators assessed a random sample of the abstracts and achieved an adequate degree of agreement with the initial reviewer. Therefore, only one reviewer carried out the complete search of the potential reports through MEDLINE. We cannot totally exclude the potential for selective inclusion, but we hope that its extent is small. Furthermore, the internal validity of the type of study we conducted does not require the same completeness of the sample that systematic reviews and metaanalyses of research findings require. More importantly, passing judgment on whether overinterpretation exists is not always straightforward, and there is a risk that our own assessments overinterpret the language of an article. To establish an adequate definition of overinterpretation, we took into account several aspects in each scientific report; however, we acknowledge that this scheme is not a perfectly objective rule. The agreement between the independent data extractors was less than perfect. Although such deficiencies may affect the exact extent of estimated overinterpretation, they do not affect our main conclusion that inferences on clinical applicability are exaggerated in this literature. The requirements for the introduction of diagnostic tests into clinical practice are less strict than those for the introduction of new treatments. Hence, flawed or exaggerated claims for diagnostic-research results could lead to the premature adoption of defective tests, which could translate into erroneous decisions with adverse consequences for health. All in all, our results emphasize the need for caution when interpreting the results of diagnostic-accuracy studies in molecular research.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: Spanish Agency for Health Technology Assessment (Exp PI06/90311) and CIBER en Epidemiología y Salud Pública (CIBERESP), Instituto de Salud Carlos III, Government of Spain. Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

Acknowledgments: This manuscript was presented in poster format at the Fifth Annual Meeting of Health Technology Assessment International (HTAi), Montréal, Canada, July 6–9, 2008. We thank Jonathan Whitehead for editorial help in preparing an early version of the manuscript.

References

1. Zolg W. The proteomic search for diagnostic biomarkers: lost in translation? Mol Cell Proteomics 2006;5:1720–6.
2. Ioannidis JP. Molecular bias. Eur J Epidemiol 2005;20:739–45.
3. Check E. Proteomics and cancer: running before we can walk? Nature 2004;429:496–7.
4. Porta M, Hernández-Aguado I, Lumbreras B, Crous-Bou M. 'Omics' research, monetization of intellectual property and fragmentation of knowledge: can clinical epidemiology strengthen integrative research? J Clin Epidemiol 2007;60:1220–5.
5. Hernández-Aguado I. The winding road towards evidence based diagnoses. J Epidemiol Community Health 2002;56:323–5.
6. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. The STARD statement for reporting studies of diagnostic accuracy: explanation and elaboration. Clin Chem 2003;49:7–18.
7. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol 2003;3:25.
8. Feinstein AR. Clinical epidemiology: the architecture of clinical research. Philadelphia: WB Saunders; 1985. 812 p.
9. Sackett DL, Haynes RB. Evidence base of clinical diagnosis: the architecture of diagnostic research. BMJ 2002;324:539–41.
10. Pepe MS, Etzioni R, Feng Z, Potter JD, Thompson ML, Thornquist M, et al. Phases of biomarker development for early detection of cancer. J Natl Cancer Inst 2001;93:1054–61.
11. Dobbin KK, Beer DG, Meyerson M, Yeatman TJ, Gerald WL, Jacobson JW, et al. Interlaboratory comparability study of cancer gene expression analysis using oligonucleotide microarrays. Clin Cancer Res 2005;11:565–72.
12. Storms V, Baele M, Coopman R, Willems A, de Baere T, Haesebrouck F, et al. Study of the intra- and interlaboratory reproducibility of partial single base C-sequencing of the 16S rRNA gene and its applicability for the identification of members of the genus Streptococcus. Syst Appl Microbiol 2002;25:52–9.
13. Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer 2004;4:309–14.
14. Rutjes AW, Reitsma JB, Vandenbroucke JP, Glas AS, Bossuyt PM. Case-control and two-gate designs in diagnostic accuracy studies. Clin Chem 2005;51:1335–41.
15. Ioannidis JP. Is molecular profiling ready for use in clinical decision making? Oncologist 2007;12:301–11.
16. Reid MC, Lachs MS, Feinstein AR. Use of methodological standards in diagnostic test research. Getting better but still not good. JAMA 1995;274:645–51.
17. Lijmer JG, Mol BW, Heisterkamp S, Bonsel GJ, Prins MH, van der Meulen JH, et al. Empirical evidence of design-related bias in studies of diagnostic tests. JAMA 1999;282:1061–6.
18. Lumbreras-Lacarra B, Ramos-Rincon JM, Hernández-Aguado I. Methodology in diagnostic laboratory test research in clinical chemistry and clinical chemistry and laboratory medicine. Clin Chem 2004;50:530–6.
19. Yesupriya A, Evangelou E, Kavvoura FK, Patsopoulos NA, Clyne M, Walsh MC, et al. Reporting of human genome epidemiology (HuGE) association studies: an empirical assessment. BMC Med Res Methodol 2008;8:31.
20. Kyzas PA, Denaxa-Kyza D, Ioannidis JP. Almost all articles on cancer prognostic markers report statistically significant results. Eur J Cancer 2007;43:2559–79.
21. van't Veer LJ, Bernards R. Enabling personalized cancer medicine through analysis of gene-expression patterns. Nature 2008;452:564–70.
22. Van den Bruel A, Aertgeerts B, Buntinx F. Results of diagnostic accuracy studies are not always validated. J Clin Epidemiol 2006;59:559–66.
23. Kaptchuk TJ. Effect of interpretive bias on research evidence. BMJ 2003;326:1453–5.
24. Porta M, ed. A dictionary of epidemiology. 5th ed. New York: Oxford University Press; 2008. Interpretive bias; p. 133.
25. Veenstra TD. Global and targeted quantitative proteomics for biomarker discovery. J Chromatogr B Analyt Technol Biomed Life Sci 2007;847:3–11.
26. Lumbreras B, Porta M, Hernández-Aguado I. Assessing the social meaning, value and implications of research in genomics. J Epidemiol Community Health 2007;61:755–6.

Clinical Chemistry 55:4 795–803 (2009)

Evidence-Based Medicine and Test Utilization

Familial and Sporadic Porphyria Cutanea Tarda: Characterization and Diagnostic Strategies

Aasne K. Aarsand,1* Helge Boman,2,3 and Sverre Sandberg1,4

BACKGROUND: Porphyria cutanea tarda (PCT) occurs in sporadic (sPCT) and familial (fPCT) forms, which are generally clinically indistinguishable and have traditionally been differentiated by erythrocyte uroporphyrinogen decarboxylase (UROD, EC 4.1.1.37) activity. We used UROD gene sequencing as the reference standard to assess the diagnostic accuracy of UROD activity. We also evaluated the mutation spectrum of the UROD gene, determined the frequency and disease attributes of PCT and its subtypes in Norway, and developed diagnostic models that use clinical and laboratory characteristics to differentiate fPCT from sPCT.

METHODS: All consecutive patients with PCT diagnosed within a 6-year period were included in the incidence calculations. UROD activity analysis, UROD gene sequencing, analysis of hemochromatosis mutations, and registration of clinical and laboratory data were carried out for 253 patients.

RESULTS: Fifty-three percent of the patients had disease-relevant mutations, 74% of which were c.578G>C or c.636+1G>C. UROD activity at the optimal cutoff had a likelihood ratio (LR) of 9.2 for fPCT, whereas a positive family history had an LR of 19. A logistic regression model indicated that low UROD activity, a high uroporphyrin–heptaporphyrin ratio, a young age at diagnosis, male sex, and low alcohol consumption were predictors of fPCT. The incidence of PCT was 1 in 100 000.

CONCLUSIONS: Two commonly occurring mutations are responsible for the high frequency of fPCT in Norway. UROD activity has a high diagnostic accuracy for differentiating the 2 PCT types, and a model that takes into account both clinical information and laboratory test results can be used to predict fPCT.

© 2009 American Association for Clinical Chemistry

1 Norwegian Porphyria Centre (NAPOS), Laboratory of Clinical Biochemistry; and 2 Center for Medical Genetics and Molecular Medicine, Haukeland University Hospital, Bergen, Norway; 3 Section of Medical Genetics and Molecular Medicine, Department of Clinical Medicine, University of Bergen, Bergen, Norway; 4 Norwegian Quality Improvement of Primary Care Laboratories (NOKLUS), Section for General Practice, University of Bergen, Bergen, Norway.
* Address correspondence to this author at: Norwegian Porphyria Centre (NAPOS), Laboratory of Clinical Biochemistry, Haukeland University Hospital, NO-5021 Bergen, Norway. Fax +47 55973115; e-mail [email protected].
Received September 5, 2008; accepted January 13, 2009.
Previously published online at DOI: 10.1373/clinchem.2008.117432
5 Nonstandard abbreviations: PCT, porphyria cutanea tarda; UROD, uroporphyrinogen decarboxylase; fPCT, familial PCT; sPCT, sporadic PCT; uroheptaporphyrin ratio, uroporphyrin–heptaporphyrin ratio; OR, odds ratio; LR, likelihood ratio.
6 Human genes: UROD, uroporphyrinogen decarboxylase; HFE, hemochromatosis.

Porphyria cutanea tarda (PCT)5 is a disorder of porphyrin metabolism with associated skin photosensitivity that usually presents with vesiculobullous eruptions on the hands and face and with signs of liver damage. The disease is caused by a deficiency in uroporphyrinogen decarboxylase (UROD), the fifth enzyme in the heme synthesis pathway, and can be classified into several types, of which familial PCT (fPCT) and sporadic PCT (sPCT) are the most common. fPCT, an autosomal dominant disorder of low penetrance characterized by low UROD activity in all cells, constitutes about 25% of the cases in most populations (1–7). Because sPCT shows reduced UROD activity only in the liver, the level of UROD activity in erythrocytes has traditionally been used to distinguish between fPCT and sPCT. Both forms are considered clinically indistinguishable and are precipitated by the same factors: iron overload, hepatitis C virus infection, and estrogen and alcohol use (8, 9). The increasing public interest in hereditary diseases and genetic testing is likely to make differentiation between sPCT and fPCT increasingly important and consequently to produce a greater demand for DNA analysis. In our experience, many PCT patients want to know whether they have a hereditary susceptibility for the disease, and their healthy family members want to be able to use this knowledge to consider precautions, particularly regarding the use of birth control pills in young women.

The aims of this study were (a) to examine the diagnostic accuracy of UROD activity as an fPCT marker, with UROD6 gene sequencing as the reference standard; (b) to estimate fPCT frequency and the spectrum of mutations responsible for fPCT in a large, ethnically homogeneous group of PCT patients; (c) to identify what other factors, including HFE (hemochromatosis) genotypes, characterize the 2 subtypes; (d) to use these factors and UROD activity to develop diagnostic models for differentiating sPCT and fPCT; and (e) to estimate the incidence of PCT in Norway.

Materials and Methods

PARTICIPANTS

We diagnosed PCT according to established criteria (8) and calculated disease incidence from all patients with PCT diagnosed at the Norwegian Porphyria Centre during 2000–2005 and the first 5 months of 2006 (n = 251). Material for DNA analysis was obtained from 196 of these patients and from 57 patients with previously diagnosed PCT who had control samples analyzed during the study period. These 253 patients were enrolled in the other parts of the study. Ostensibly healthy family members at risk for the disease (n = 74) were tested during the study period and were included in parts of the study.

The UROD gene was sequenced in the 253 PCT patients for whom DNA samples were available. Seven of these patients possessed sequence variant c.1104*3G>A, which has previously been reported as a likely disease-causing mutation (10). This variant coincided with well-established UROD mutations in 2 of these patients. If c.1104*3G>A were disease-causing, these 2 patients would be compound heterozygotes and would be expected to present with hepatoerythropoietic porphyria; their biochemical and clinical presentations, however, were not compatible with that disorder. This variant was also seen in 3 of 200 healthy control individuals and was therefore categorized as a single-nucleotide polymorphism in our material (Table 1, II). UROD sequence variants of undetermined significance were identified in 5 patients, and c.745C>T coincided with c.876–7_878dup10 in one of these patients (Table 1, III). After these 5 patients were excluded, the remaining 248 symptomatic index patients, hereafter denoted "symptomatic PCT patients," constituted the main study population. All patients were genotyped for the HFE C282Y and H63D variants, and 243 patients for the S65C variant; UROD activity and reticulocytes were analyzed in 246 patients. For the 74 healthy family members at risk, we carried out genetic and biochemical testing largely as for the symptomatic patients.

We collected material for enzyme and DNA analyses simultaneously for each participant and carried out porphyrin analyses at the time of diagnosis for the symptomatic patients and at the time of genetic testing for the healthy family members at risk. A standardized questionnaire filled out by each patient's personal physician was used to gather information regarding time of disease onset, symptoms, provoking factors, liver- and iron-related biochemical variables at diagnosis, maltreatment, and family history. Questionnaires were returned for 73% of the patients. The study was approved by the Regional Ethics Committee of Western Norway.

BIOCHEMICAL METHODS

We removed the buffy coat of heparin- or EDTA-anticoagulated blood and washed the erythrocytes in 0.15 mol/L NaCl. Erythrocyte UROD activity was measured essentially as previously described (11). Reticulocytes were enumerated on the Cell-Dyn 4000 (Abbott Diagnostics) in a subsample of the washed blood sample. Analysis of urinary δ-aminolevulinic acid and porphobilinogen was performed by the ALA/PBG Column Test (Bio-Rad Laboratories), followed by spectrophotometric measurements. Plasma samples were fluorometrically scanned as previously described (12). Urine and fecal samples were both dissolved in 7 mol/L HCl and 0.1 mol/L phosphate buffer (pH 3.5) and extracted with 1 mL methanol on a solid-phase extraction column (Bond Elut C8; Varian). HPLC separation was performed on a BetaBasic 18 column (4.6 × 150 mm, 5 µm; Thermo Scientific) with a flow rate of 1.0 mL/min and with mobile phases of 1 g/L trifluoroacetic acid in water and acetonitrile in a volume ratio of 70:30 (A) and acetonitrile (B). The gradient program was 0–5 min (0%–50% B), 5–7 min (50%–100% B), 7.5–8.5 min (100%–0% B), and 8.5–13 min (hold at 0% B). Fluorometric detection was carried out with excitation at 403 nm and emission at 618 nm. Urinary creatinine was assayed on the Hitachi Modular P instrument (Roche Diagnostics). The total analytical CV (within-day and between-day) for a UROD activity of 1.8 U/L erythrocytes was 5.2% during the study period.

DNA SEQUENCING AND MUTATION DETECTION

DNA was purified from EDTA-treated blood with the BioRobot M48 workstation (Qiagen) according to the manufacturer's instructions. We used Oligo 6.3 software (Molecular Biology Insights) to design PCR primers for amplification of exons and flanking introns (at least 50 bp) for the UROD and HFE (exon 2 for S65C) genes. PCR and real-time PCR amplifications were performed with AmpliTaq Gold (Applied Biosystems) or Taq polymerase (Qiagen) according to the manufacturer's instructions. The PCR products were treated with shrimp alkaline phosphatase/exonuclease I (ExoSAP-IT, USB Corporation) and sequenced with the BigDye Terminator Cycle Sequencing Ready Reaction Kit, version 2.0 (Applied Biosystems) according to the manufacturer's instructions. HFE genotypes for the H63D and C282Y variants were determined by real-time allelic discrimination with the 7900HT Fast Real-Time PCR System (Applied Biosystems) according to the manufacturer's instructions. The presence of the S65C variant was determined by sequencing HFE exon 2 or via restriction enzyme analysis with HinfI (New England BioLabs). DNA samples from most patients with c.578G>C were tested for up to 43 microsatellite marker loci surrounding the UROD locus on chromosome 1. Diagnostic samples from a few relatives of index patients were available for establishing definite haplotypes.

Table 1. DNA sequence variants in the UROD locus observed in Norwegian PCT patients (n = 253).^a

Patients, n | Sequence variant | Protein prediction | Population frequency | Reference
I. Likely causative mutations
5 | c.20+1G>A^b | Splice defect | | This study
3 | c.59T>C^b | L20P | | This study
3 | c.100T>A^b | W34R | | This study
5 | c.238G>T | A80S | | Brady et al. (1)
2 | c.448C>T^b | P150S | | This study
4 | c.568C>T^b | Q190X | | This study
39 | c.578G>C | R193P | | Phillips et al. (15)
58 | c.636+1G>C | Splice defect | | Garey et al. (16)
3 | c.850T>C^b | W284R | | This study
7 | c.876–7_878dup10^b | p.? | | This study
1 | c.910_912delAAC^b | p.Asn304del | | This study
1 | c.1046_1047delAT^b | p.His349ArgfsX10 | | This study
II. Likely single-nucleotide polymorphisms (also reported in this group: 2/200^d)
 | c.20+79T>G^b | | |
 | c.21–12C>G^b,c | | |
 | c.21–148C>T | | | rs12749939
 | c.450G>A | "P150P" | | rs2234479
 | c.474+36G>C | | | rs2234480
 | c.603A>G | "P201P" | | rs2228084
 | c.758T>A | L253Q | | rs36033115
 | c.1104*3G>A | | 3/200^d | Cappellini et al. (10)
 | c.1104*70G>A | | | rs41269105
III. Undetermined variants
3 | c.745C>T^b | R249W | 0/200^d | This study
1 | c.758T>C^b | L253P | 0/200^d | This study
1 | c.578G>A^b | R193H | | This study

a Reference sequences NM_000374.3 and ENSG00000126088.
b Sequence variant not listed as a likely causative mutation in the Human Gene Mutation Database (26) or as a variant single-nucleotide polymorphism in Ensembl (27).
c Reverse-transcription PCR analysis showed transcripts of correct size only.
d Number of heterozygotes observed among 200 healthy control individuals.

STATISTICAL ANALYSIS

Analyse-it® (Analyse-it Software) and Microsoft Excel® in Microsoft Office 2003 were used in ROC curve analysis to evaluate the diagnostic accuracy of UROD activity. General statistical analysis, group-comparison testing, table analysis, logistic regression, and correlation studies were carried out with SPSS® 15.0 for Windows (SPSS). Logistic regression analysis (with forward stepwise variable selection) was performed with the PCT disease type (sporadic/familial) as the dependent variable and the following sets of independent variables: (a) UROD activity alone (n = 246); (b) UROD activity, HFE status with the genotypes in Table 2 and the wild-type/wild-type genotype as the reference category, sex (male, 0; female, 1), age at diagnosis, uroporphyrin–heptaporphyrin ratio (uroheptaporphyrin ratio), ferritin, γ-glutamyltransferase, alanine aminotransferase, liver disease (absent, 0; present, 1), and high alcohol consumption (absent, 0; present, 1), as scored by the patient's personal physician as a provoking factor for PCT (n = 151); and (c) all variables in (b) except UROD activity. Logistic regression analysis was also performed in the fPCT and sPCT subgroups with age at symptom onset (<50 years and ≥50 years) as the dependent variable and HFE status, sex, liver disease, high alcohol consumption, and estrogen use (absent, 0; present, 1) as the independent variables (n = 166). CIs for indices of diagnostic accuracy were calculated by the Wilson score method (13). All P values are the results of 2-tailed tests, and P values <0.05 are considered statistically significant.

Table 2. HFE genotypes for symptomatic PCT patients (n = 243) by sPCT and fPCT subgroups and compared with healthy Norwegian blood donors.^a

C282Y | H63D | S65C | All PCT patients (n = 243), %^b | sPCT patients (n = 113), % | fPCT patients (n = 130), % | General population (n = 204), %
+/+ | −/− | −/− | 6.6** | 9.7 | 3.8 | 0.5
+/− | −/− | −/− | 15.6 | 15.9 | 15.4 | 12.3
+/− | +/− | −/− | 8.6*** | 8.8 | 8.5 | 1.0
+/− | −/− | +/− | 0.8 | 0.9 | 0.8 | 0.5
−/− | +/+ | −/− | 5.8* | 5.3 | 6.2 | 1.5
−/− | +/− | −/− | 20.2 | 17.7 | 22.3 | 17.2
−/− | +/− | +/− | 1.2 | 1.8 | 0.8 | 1.0
−/− | −/− | +/− | 2.5 | 4.4 | 0.8 | 2.0
−/− | −/− | −/− | 38.7*** | 35.4 | 41.5 | 64.2

a Analysis of S65C was not performed in 5 PCT patients, and results for these patients are not included.
b Indicated are frequencies significantly different from those of healthy Norwegians (*P = 0.02; **P < 0.001; ***P < 0.0001).

Results

PCT was diagnosed in 251 consecutive patients with an even distribution over the study period. Because the Norwegian Porphyria Centre performs porphyria diagnostics for the entire Norwegian population, this result yields a minimum estimate for the incidence of symptomatic PCT in Norway of 1.0 per 100 000 inhabitants.

UROD AND HFE GENOTYPES

A disease-relevant sequence variant was identified in 131 (53%) of the 248 symptomatic PCT patients. UROD variants c.578G>C and c.636+1G>C accounted for 30% and 44% of the fPCT cases, respectively (Table 1). There were no significant differences between sporadic and familial cases with regard to HFE status (Table 2).
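The fPCT proportion found here (131 of 248) is also the pre-test probability behind the predictive values reported for UROD activity. The short Python sketch below is illustrative only. It implements the Wilson score interval, which the Statistical Analysis section names as the method for CIs of diagnostic-accuracy indices, and the standard relations linking sensitivity, specificity, pre-test probability, the positive likelihood ratio, and the predictive values. The function names are hypothetical. The inputs for the 1.3 U/L UROD activity cutoff are the rounded published percentages (96% sensitivity, 90% specificity), so the outputs only approximate the reported figures. For example, the rounded values give a positive LR of 9.6 rather than the reported 9.2, which was presumably computed from unrounded counts that the text does not give.

```python
import math


def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def positive_lr(sens: float, spec: float) -> float:
    """Positive likelihood ratio: sensitivity / (1 - specificity)."""
    return sens / (1 - spec)


def predictive_values(sens: float, spec: float, prevalence: float) -> tuple[float, float]:
    """PPV and NPV from sensitivity, specificity and pre-test probability (Bayes' rule)."""
    tp = sens * prevalence              # true positives per unit population
    fp = (1 - spec) * (1 - prevalence)  # false positives
    tn = spec * (1 - prevalence)        # true negatives
    fn = (1 - sens) * prevalence        # false negatives
    return tp / (tp + fp), tn / (tn + fn)


if __name__ == "__main__":
    prev = 131 / 248  # fPCT proportion among symptomatic PCT patients
    lo, hi = wilson_ci(131, 248)
    print(f"fPCT proportion {prev:.3f} (95% CI {lo:.3f}-{hi:.3f})")
    # Rounded published values for the 1.3 U/L UROD activity cutoff.
    sens, spec = 0.96, 0.90
    ppv, npv = predictive_values(sens, spec, prev)
    print(f"LR+ {positive_lr(sens, spec):.1f}, PPV {ppv:.3f}, NPV {npv:.3f}")
```

Run as a script, this prints an fPCT proportion of 0.528 (95% CI 0.466–0.589), followed by LR+ 9.6, PPV 0.915, and NPV 0.953. The predictive values therefore match the reported 91% and 95% to within rounding when the cohort proportion is used as the pre-test probability. In a setting with a different fPCT proportion, the same sensitivity and specificity would yield different predictive values, whereas the likelihood ratio would not change.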

PATIENT CHARACTERISTICS

Dividing the symptomatic PCT patients into sporadic and familial cases according to the results of UROD gene sequencing revealed the differences between the 2 groups described in Table 3. In addition, alcohol was more often a provoking factor among men than among women (43% vs 25%), and liver disease, primarily infection with hepatitis B and C viruses, was more frequent in males with sPCT than in males with fPCT (29% vs 4%). Twenty-five percent of fPCT patients reported maltreatment (mainly antibiotics, cortisone, or a combination of both), compared with 13% of the sPCT patients. fPCT patients homozygous for the C282Y variant had a younger age of symptom onset (33 years) than either wild-type homozygotes (50 years) or all other HFE genotypes (49 years). A logistic regression analysis indicated that being male and the presence of liver disease predicted a younger age of disease onset in fPCT [odds ratio (OR), 2.9; 95% CI, 1.1–7.2] and sPCT (OR, 0.033; 95% CI, 0.004–0.273), respectively.

DIAGNOSTICS

UROD activity was not correlated with the reticulocyte percentage (range, 0%–4%). Fig. 1 shows the distribution of UROD activities in fPCT and sPCT cases. Because the sample size was >200 (14), we used ROC curve analysis directly to identify the optimal cutoff of UROD activity. The cutoff of 1.3 U/L erythrocytes had a sensitivity of 96% (95% CI, 90%–99%), a specificity of 90% (95% CI, 82%–94%), an LR of 9.2 (95% CI, 5.3–15.6), and positive and negative predictive values of 91% (95% CI, 84%–95%) and 95% (95% CI, 90%–98%), respectively (Fig. 2). UROD activities and concentrations of urinary porphyrin metabolites were similar in the 3 major sequence-variant groups (i.e., c.578G>C, c.636+1G>C, and "others"). Use of a uroheptaporphyrin ratio >2.6 as a marker for fPCT had a sensitivity of 78% (95% CI, 70%–85%), a specificity of 51% (95% CI, 41%–60%), an LR of 1.6 (95% CI, 1.3–2.0), and positive and negative predictive values of 63% (95% CI, 55%–71%) and 68% (95% CI, 60%–75%), respectively (Fig. 2). A positive family history of PCT had an LR of 19 (95% CI, 5–76) for diagnosing fPCT, and a negative or unknown family history had an LR of 0.7 (95% CI, 0.6–0.8) (Table 3).

A logistic regression analysis with diagnosis of fPCT vs sPCT as the dependent variable identified the following as significant discriminators: UROD activity (OR, 0.0013; 95% CI, 0.0001–0.013), uroheptaporphyrin ratio (OR, 2.94; 95% CI, 1.18–7.33), age at diagnosis (OR, 0.91; 95% CI, 0.84–0.99), sex (OR, 0.22; 95% CI, 0.05–0.93), and high alcohol consumption (OR, 0.09; 95% CI, 0.02–0.49). This model correctly predicted 93% of the cases. In some diagnostic laboratories, measurement of UROD activity is not available. In this setting, logistic regression analysis without UROD activity correctly classified 83% of the cases with the following significant variables: sex (OR, 0.23; 95% CI, 0.09–0.60), age at diagnosis (OR, 0.96; 95% CI, 0.92–1.00), uroheptaporphyrin ratio (OR, 3.34; 95% CI, 1.75–6.39), ferritin (OR, 0.998; 95% CI, 0.996–0.999), γ-glutamyltransferase (OR, 0.985; 95% CI, 0.976–0.993), alanine aminotransferase (OR, 1.02; 95% CI, 1.01–1.03), and high alcohol consumption (OR, 0.22; 95% CI, 0.08–0.62). All models gave essentially the same ORs for the different discriminators when they were applied to the patient subgroups with and without a positive family history of PCT.

Table 3. Biochemical and clinical characteristics of symptomatic fPCT and sPCT patients (n = 248).^a

Biochemical or clinical variable | sPCT | fPCT
Male sex, % | 45 (36–54) | 57 (49–65)
Age at first symptoms, years | 53 (51–56)** | 48 (45–51)
Age at diagnosis, years | 57 (55–59)*** | 51 (48–53)
UROD activity, U/L erythrocytes | 1.7 (1.6–1.7)*** | 0.9 (0.8–0.9)
Total porphyrins, nmol/mmol creatinine | 883 (323, 2044) | 901 (319, 2066)
Uroporphyrins, nmol/mmol creatinine | 556 (204, 1250) | 673 (222, 1471)
Heptaporphyrins, nmol/mmol creatinine | 222 (77, 458) | 198 (63, 471)
Uroheptaporphyrin ratio | 2.7 (1.6, 3.9)*** | 3.2 (2.4, 5.8)
Ferritin, µg/L^b | 351 (148, 901)** | 256 (113, 821)
ALT,^c U/L^d | 71 (39, 137)** | 86 (49, 173)
γ-GT, U/L^e | 125 (47, 329)*** | 65 (31, 179)
Vesicles on hands, % | 92 (86–98) | 88 (81–95)
Vesicles on face, % | 16 (8–24) | 21 (12–30)
Hyperpigmentation, % | 30 (20–40) | 31 (21–41)
Hypertrichosis, % | 29 (19–39) | 27 (18–36)
High alcohol consumption, % | 48 (38–58)*** | 22 (14–30)
Use of estrogen, % | 23 (14–32) | 13 (6–20)
Liver disease, % | 16 (8–24)* | 5 (1–9)
Relatives with known PCT, % | 2 (0–5)*** | 32 (24–40)

a The minimum number of patients with available data for clinical descriptors and liver variables was 159. Data are presented as the percentage (95% CI), the mean (95% CI), or the median (10th percentile, 90th percentile), as indicated. For variables with nongaussian distributions, Student t-tests were performed after the data were transformed to natural logarithms. Indicated are statistically significant differences for sPCT vs fPCT (*P < 0.05; **P < 0.01; ***P < 0.0001).
b Upper reference limits are <240 µg/L for females and <300 µg/L for males.
c ALT, alanine aminotransferase; γ-GT, γ-glutamyltransferase.
d Upper reference limits are <45 U/L for females and <70 U/L for males.
e Upper reference limits for individuals older than 40 years are <75 U/L for females and <115 U/L for males.

PREDICTIVELY TESTED FAMILY MEMBERS

Forty-three of the 74 tested healthy family members at risk for the disease carried their family's mutation. Eight of these 43 individuals were in a subclinical disease state and fulfilled the biochemical criteria for PCT. All urinary metabolites with the exception of coproporphyrins were significantly lower in the subclinical patients than in the symptomatic PCT patients (median uroporphyrin concentration, 140 and 605 nmol/mmol creatinine, respectively).

Fig. 1. UROD activity for symptomatic sPCT and fPCT patients. Boxes represent the interquartile range, horizontal lines represent the median, and whiskers represent the range of the data, with the exclusion of outliers (○, 1.5–3 box lengths) and extreme values (*, >3 box lengths) as indicated.

[Fig. 2 plot: sensitivity vs 1 – specificity; curves for UROD activity and the uroheptaporphyrin ratio]

Fig. 2. ROC curves for diagnostic performance of UROD activity and the uroheptaporphyrin ratio in identifying symptomatic patients with fPCT. Areas under the ROC curve are 0.94 (95% CI, 0.91–0.98) for UROD activity and 0.73 (95% CI, 0.66–0.79) for the uroheptaporphyrin ratio.

Discussion

PCT is the most prevalent porphyria in most populations and differs from the other hereditary porphyrias in that it also has a sporadic form. Our study is the first to report that the familial form may constitute more than half of the cases in a large national sample (>240 index patients). This frequency is significantly higher than frequencies reported for comparable populations (1, 3, 5). The largest of these studies (1) identified 19 fPCT cases in 84 patients, but because DNA analysis was performed only in patients with low UROD activity, the frequency of fPCT might have been underestimated by a few percent. The 2 other studies reported the familial form to constitute about 25% of cases when UROD gene analysis was performed in all patients [53 and 61 patients, respectively (3, 5)], whereas 9 of 18 PCT patients in a Chilean study had familial disease (6). Most (74%) of the fPCT cases in our study were caused by 2 frequently occurring mutations, c.578G>C and c.636+1G>C. Gene-marker studies of c.578G>C indicate that this variant is a founder mutation that originated in the northwestern part of southern Norway (data not shown). c.578G>C was first reported in a patient in Utah (15), and gene-marker analysis demonstrated that this patient was a descendant of the same ancestor. The sequence variant c.636+1G>C, which has been found in several unrelated patients, represents either an ancient mutational event or a hot spot for recurrent mutations (3, 15, 16). All of our patients with the c.636+1G>C variant also carried the T allele of the single-nucleotide polymorphism c.636+30G>T (rs11211066), an observation compatible with a common origin of this mutation on a UROD haplotype carrying the rarer allele (T) at position c.636+30. Founder mutations have not previously been reported in PCT, in contrast to, for example, W198X in acute intermittent porphyria (northern Sweden) (17) and the R59W variant in variegate porphyria (South Africa) (18).
We documented the incidence of PCT in Norway to be 1.0 per 100 000 inhabitants. Excluding patients with c.578G>C and c.636+1G>C reduces the estimated incidence to 0.6 per 100 000. Few reports have documented the incidence of PCT, but an estimate of 0.2–0.5 per 100 000 population has been reported for the UK (9). A recent meta-analysis has shown that HFE genotypes encoding variants C282Y and/or H63D are associated with a 2- to 48-fold increased risk of PCT compared with the wild-type homozygote, with C282Y/C282Y conferring the highest risk (19). Increased frequencies of C282Y homozygotes, C282Y/H63D compound heterozygotes, and H63D homozygotes were seen in our PCT patients, compared with the general Norwegian population, but no association was seen for S65C. The percentage of C282Y homozygotes was only 6.6%, which is low compared with reports of 15%–25% in PCT patients from Britain and Scandinavia, where the general population frequencies of HFE genotypes are comparable to those of Norway (1, 20, 21). The distributions of HFE genotypes in the sPCT and fPCT patient groups were not significantly different in our study, contrary to what has been found in studies in which the 2 disease forms were differentiated on the basis of UROD activity (2, 21). In general, PCT, particularly the sporadic type, has been considered to occur more frequently in men (9, 22). In our study, however, 57% of the familial cases and 45% of the sporadic cases occurred in men (Table 3). With the exception of the uroheptaporphyrin ratio, the 2 groups showed no significant differences with respect to the urinary excretion of porphyrins, in contrast with the finding in a smaller study of a higher concentration of total urinary porphyrins in fPCT patients (20). C282Y homozygosity has been reported to be associated with an earlier onset of symptoms in both the familial and sporadic disease types (1). Our logistic regression analysis, however, showed that only liver disease and sex predicted a younger age of disease onset, in the sporadic and the familial groups, respectively. The consequence of demonstrating an HFE mutation in a patient with PCT is not in itself apparent, because PCT patients generally exhibit biochemical signs of iron overload. One study found that chloroquine is not as effective in C282Y homozygotes as in the other HFE genotypes, but the treatment of choice will be venesection in most cases, irrespective of the results of HFE testing (23). UROD activity differentiates sporadic and familial cases of PCT quite well (LR of 9.2 at <1.3 U/L erythrocytes).
In comparison, porphobilinogen deaminase activity has been shown to differentiate between patients with acute intermittent porphyria and their healthy relatives with an LR of 3.6 (24). In several of the porphyrias, particularly erythropoietic protoporphyria, a small percentage of the patients with clear clinical and biochemical disease have no identifiable molecular defect (25). Failure to find a molecular defect in a “true” fPCT case in our study would have lowered the apparent diagnostic accuracy of UROD activity. Having one or more family members with overt PCT is a strong predictor of the familial type. In our study group with a 53% prevalence of the familial type, such a family history would give a posttest probability of fPCT of 95% (Table 3), and a prevalence of 20% would give a posttest probability of about 80%. Correspondingly, not having or “not knowing” about relatives with PCT would, in populations with fPCT prevalences of 53% and 20%, give posttest probabilities of 44% and 15%, respectively.
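The posttest probabilities quoted above follow from Bayes' theorem in odds form: posttest odds = pretest odds × LR. The Python sketch below is illustrative only (the function names are hypothetical, not the authors' software); it also shows how the LR of a positive result follows from sensitivity and specificity, using the uroheptaporphyrin-ratio figures from the Results. Because the reported LRs are themselves rounded, recomputed probabilities should agree with the text only to within about 1 percentage point.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Likelihood ratio of a positive result: sensitivity / (1 - specificity)."""
    return sensitivity / (1.0 - specificity)


def posttest_probability(pretest: float, lr: float) -> float:
    """Bayes' theorem in odds form: posttest odds = pretest odds * LR."""
    pretest_odds = pretest / (1.0 - pretest)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1.0 + posttest_odds)


if __name__ == "__main__":
    # Uroheptaporphyrin ratio > 2.6: sensitivity 78%, specificity 51% (Results).
    print(f"LR+ (uroheptaporphyrin ratio): {positive_lr(0.78, 0.51):.1f}")

    # Family-history LRs from the Results: 19 if positive, 0.7 if negative/unknown.
    for prevalence in (0.53, 0.20):
        for history, lr in (("positive", 19.0), ("negative/unknown", 0.7)):
            p = posttest_probability(prevalence, lr)
            print(f"prevalence {prevalence:.0%}, {history} family history: {p:.0%}")
```

With a positive family history (LR 19) at 53% prevalence the sketch gives roughly 95%–96%, and with a negative or unknown history (LR 0.7) it gives 44%, in line with the text within rounding.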

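[Fig. 3 plots: posttest probability of fPCT (0–1) vs UROD activity (0–3 U/L erythrocytes). Panel A curves: 20%, 50%, and 90% pretest probability of fPCT. Panel B curves: female with high alcohol consumption, female with normal alcohol consumption, male with normal alcohol consumption.]

Fig. 3. Posttest probabilities of fPCT at different UROD activity levels. (A), UROD activity as the sole independent variable in populations with 20%, 50%, and 90% prevalence of fPCT. (B), Consideration of sex and alcohol as a provoking factor in a 54-year-old patient with a uroheptaporphyrin ratio of 2.95.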

Posttest probabilities of fPCT can also be calculated with various logistic regression models with different independent variables (Fig. 3). With UROD activity as the sole independent variable and a pretest probability of fPCT of 53%, a UROD result of ≤0.8 U/L erythrocytes will confirm fPCT and a result of ≥1.6 U/L will exclude fPCT, in each case with a posttest probability of >90% (Fig. 3A). Applying the same model in a patient with a positive family history, a UROD result of ≤1.3 U/L produces only minor increases in the already high posttest probability of fPCT. In a patient with a negative or unknown family history, however, a UROD result of ≤0.8 U/L increases the probability of fPCT from 44% to >90%. Extending the diagnostic model to include clinical and biochemical information, such as the patient being a young man at the time of diagnosis with low alcohol consumption and a high uroheptaporphyrin ratio, further increases the probability of fPCT (Fig. 3B). When this diagnostic model was used in the 5 patients excluded because of UROD sequence variants of undetermined significance (Table 1, III), the posttest probability of fPCT in the patient who carried both the disease-causing c.876-7_878dup10 variant and c.745C>T was 99%. The probability of fPCT in the 4 other patients ranged from 7% to 34%. If measurement of UROD activity is not available, a diagnostic model incorporating sex, age at diagnosis, uroheptaporphyrin ratio, ferritin, γ-glutamyltransferase, alanine aminotransferase, and alcohol consumption can be used to differentiate sPCT and fPCT, but the model has lower predictive power (83% vs 93% of cases correctly classified). The clinical value of discriminating fPCT from sPCT is not clear. Although distinguishing the disease form does not at present appear to affect either the prognosis or the choice of treatment, our experience is that it is important for both patients and some of their family members to know whether the disease is hereditary. It is possible that awareness of the hereditary aspect is higher in Norwegian patients, at least 20% of whom know of relatives with overt PCT. In addition, 8 of the 43 (about 20%) healthy at-risk family members in our study who carried their family’s mutation had biochemically diagnosable PCT, suggesting that the penetrance of fPCT may be higher than previously assumed. Future follow-up of these and similar patients is therefore important to further elucidate the importance of differentiating between fPCT and sPCT. A diagnostic strategy for differentiating fPCT and sPCT can be derived from the information in this study, depending on what probability is deemed necessary for diagnosis, what clinical and laboratory data are known for the patient, what laboratory methods are available, and how they have been standardized. When a diagnostic strategy is to be implemented, it is important to have an estimate of the local prevalence of fPCT and to consider a possible spectrum bias.
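The logistic regression models are reported as ORs without intercepts, so absolute posttest probabilities such as those in Fig. 3B cannot be recomputed from the text alone. What the ORs do determine is how a change in one predictor shifts the log-odds of fPCT relative to a reference patient. The Python sketch below assumes that each OR applies per 1-unit increase (per year of age, per unit of uroheptaporphyrin ratio) and that high alcohol consumption is coded 1 vs 0; the reference probability is a hypothetical input, the helper `shift_probability` is hypothetical, and sex is omitted because its coding is not reported.

```python
import math
from typing import Iterable, Tuple


def shift_probability(p_ref: float, changes: Iterable[Tuple[float, float]]) -> float:
    """Shift a reference probability by a set of predictor changes.

    Each change is (odds_ratio_per_unit, delta_units). In a logistic model the
    log-odds move by delta_units * ln(odds_ratio_per_unit) for each predictor.
    """
    if not 0.0 < p_ref < 1.0:
        raise ValueError("reference probability must be strictly between 0 and 1")
    log_odds = math.log(p_ref / (1.0 - p_ref))
    for odds_ratio, delta in changes:
        log_odds += delta * math.log(odds_ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    # ORs from the model that includes UROD activity (Results); per-unit coding assumed.
    OR_AGE_PER_YEAR = 0.91
    OR_RATIO_PER_UNIT = 2.94
    OR_HIGH_ALCOHOL = 0.09

    p_ref = 0.50  # hypothetical reference patient
    p_new = shift_probability(
        p_ref,
        [
            (OR_AGE_PER_YEAR, -10),    # 10 years younger at diagnosis
            (OR_RATIO_PER_UNIT, 0.5),  # uroheptaporphyrin ratio 0.5 units higher
            (OR_HIGH_ALCOHOL, -1),     # normal rather than high alcohol consumption
        ],
    )
    print(f"reference {p_ref:.0%} -> {p_new:.0%}")
```

Under these assumptions, a younger patient with a higher ratio and normal alcohol intake moves well above the reference probability, in the same direction as described for Fig. 3B; the magnitudes should not be read as output of the published model.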

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript. Acknowledgments: Data on UROD activity and UROD gene sequencing results for 71 of the patients described in this report were presented as a poster at the International Congress of Clinical Chemistry and Laboratory Medicine, Kyoto, Japan, 2002. We thank Solveig Blaaflat and Hanne Margrethe Jacobsen on behalf of the Norwegian Porphyria Centre and the Centre for Medical Genetics and Molecular Medicine, Haukeland University Hospital; Vo Tri Toan, University of Bergen, for his work on S65C in healthy Norwegians; and Dr. John D. Phillips, University of Utah School of Medicine, for supplying a DNA sample from the previously described c.578G>C patient (15).

References

1. Brady JJ, Jackson HA, Roberts AG, Whatley SD, Rowlands GL, Darby C, et al. Co-inheritance of mutations in the uroporphyrinogen decarboxylase and hemochromatosis genes accelerates the onset of porphyria cutanea tarda. J Invest Dermatol 2000;115:868–74.
2. Bulaj ZJ, Phillips JD, Ajioka RS, Franklin MR, Griffen LM, Guinee DJ, et al. Hemochromatosis genes and other factors contributing to the pathogenesis of porphyria cutanea tarda. Blood 2000;95:1565–71.
3. Christiansen L, Bygum A, Jensen A, Brandrup F, Thomsen K, Horder M, Petersen NE. Uroporphyrinogen decarboxylase gene mutations in Danish patients with porphyria cutanea tarda. Scand J Clin Lab Invest 2000;60:611–5.
4. Koszo F, Morvay M, Dobozy A, Simon N. Erythrocyte uroporphyrinogen decarboxylase activity in 80 unrelated patients with porphyria cutanea tarda. Br J Dermatol 1992;126:446–9.
5. Mendez M, Poblete-Gutierrez P, Garcia-Bravo M, Wiederholt T, Moran-Jimenez MJ, Merk HF, et al. Molecular heterogeneity of familial porphyria cutanea tarda in Spain: characterization of 10 novel mutations in the UROD gene. Br J Dermatol 2007;157:501–7.
6. Poblete-Gutierrez P, Mendez M, Wiederholt T, Merk HF, Fontanellas A, Wolff C, Frank J. The molecular basis of porphyria cutanea tarda in Chile: identification and functional characterization of mutations in the uroporphyrinogen decarboxylase gene. Exp Dermatol 2004;13:372–9.
7. Tavazzi D, Di Montemuros FM, Fargion S, Fracanzani AL, Fiorelli G, Cappellini MD. Levels of uroporphyrinogen decarboxylase (URO-D) in erythrocytes of Italian porphyria cutanea tarda patients. Cell Mol Biol (Noisy-le-grand) 2002;48:27–32.
8. Deacon AC, Whatley SD, Elder GH. Porphyrins and disorders of porphyrin metabolism. In: Burtis CA, Ashwood ER, Bruns DE, eds. Tietz textbook of clinical chemistry and molecular diagnostics. 4th ed. New York: Elsevier Saunders; 2006. p 1209–35.
9. Elder GH. Porphyria cutanea tarda. Semin Liver Dis 1998;18:67–75.
10. Cappellini MD, Martinez di Montemuros F, Tavazzi D, Fargion S, Pizzuti A, Comino A, et al. Seven novel point mutations in the uroporphyrinogen decarboxylase (UROD) gene in patients with familial porphyria cutanea tarda (f-PCT). Hum Mutat 2001;17:350.
11. McManus J, Blake D, Ratnaike S. An assay of uroporphyrinogen decarboxylase in erythrocytes. Clin Chem 1988;34:2355–7.

12. Da Silva V, Simonin S, Deybach JC, Puy H, Nordmann Y. Variegate porphyria: diagnostic value of fluorometric scanning of plasma porphyrins. Clin Chim Acta 1995;238:163–8.
13. Uitenbroek GD. Diagnostic effectiveness. SISA; 2002. http://www.quantitativeskills.com/sisa/statistics/diagnos.htm (Accessed July 2008).
14. Leeflang MM, Moons KG, Reitsma JB, Zwinderman AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clin Chem 2008;54:729–37.
15. Phillips JD, Parker TL, Schubert HL, Whitby FG, Hill CP, Kushner JP. Functional consequences of naturally occurring mutations in human uroporphyrinogen decarboxylase. Blood 2001;98:3179–85.
16. Garey JR, Harrison LM, Franklin KF, Metcalf KM, Radisky ES, Kushner JP. Uroporphyrinogen decarboxylase: a splice site mutation causes the deletion of exon 6 in multiple families with porphyria cutanea tarda. J Clin Invest 1990;86:1416–22.
17. Andersson C. Acute intermittent porphyria in northern Sweden: a population-based study [Thesis]. Umeå, Sweden: Umeå Universitet; 1997. p 20–1.


18. Meissner PN, Dailey TA, Hift RJ, Ziman M, Corrigall AV, Roberts AG, et al. A R59W mutation in human protoporphyrinogen oxidase results in decreased enzyme activity and is prevalent in South Africans with variegate porphyria. Nat Genet 1996;13:95–7.
19. Ellervik C, Birgens H, Tybjaerg-Hansen A, Nordestgaard BG. Hemochromatosis genotypes and risk of 31 disease endpoints: meta-analyses including 66,000 cases and 226,000 controls. Hepatology 2007;46:1071–80.
20. Bygum A, Christiansen L, Petersen NE, Hørder M, Thomsen K, Brandrup F. Familial and sporadic porphyria cutanea tarda: clinical, biochemical and genetic features with emphasis on iron status. Acta Derm Venereol 2003;83:115–20.
21. Harper P, Floderus Y, Holmström P, Eggertsen G, Gåfvels M. Enrichment of HFE mutations in Swedish patients with familial and sporadic form of porphyria cutanea tarda. J Intern Med 2004;255:684–8.
22. Anderson KE, Sassa S, Bishop DF, Desnick RJ. Disorders of heme biosynthesis: X-linked sideroblastic anemia and the porphyrias. In: Scriver CR, Beaudet AL, Valle D, Sly WS, Childs B, Kinzler KW, Vogelstein B, eds. The metabolic and molecular bases of inherited disease. 8th ed. New York: McGraw-Hill; 2001. p 2961–3062.
23. Stolzel U, Köstler E, Schuppan D, Richter M, Wollina U, Doss MO, et al. Hemochromatosis (HFE) gene mutations and response to chloroquine in porphyria cutanea tarda. Arch Dermatol 2003;139:309–13.
24. Kauppinen R, von und zu Fraunberg M. Molecular and biochemical studies of acute intermittent porphyria in 196 patients and their families. Clin Chem 2002;48:1891–900.
25. Whatley SD, Mason NG, Holme SA, Anstey AV, Elder GH, Badminton MN. Gene dosage analysis identifies large deletions of the FECH gene in 10% of families with erythropoietic protoporphyria. J Invest Dermatol 2007;127:2790–4.
26. Human Gene Mutation Database. HGMD professional release 2008.1. https://portal.biobaseinternational.com/hgmd/pro/gene.php?gene=UROD (Accessed July 2008). Accessed through BIOBASE Biological Databases: http://www.biobaseinternational.com/.
27. Ensembl. http://www.ensembl.org/ (Accessed July 2008). This Web site provides a genome browser.


Clinical Chemistry 55:4 804–812 (2009)

Point-of-Care Testing

Rapid Single-Nucleotide Polymorphism Detection of Cytochrome P450 (CYP2C9) and Vitamin K Epoxide Reductase (VKORC1) Genes for the Warfarin Dose Adjustment by the SMart-Amplification Process Version 2

Tohru Aomori,1,2 Koujirou Yamamoto,1,2* Atsuko Oguchi-Katayama,3 Yuki Kawai,3,4 Takefumi Ishidao,3,4 Yasumasa Mitani,3,4 Yasushi Kogo,3,4 Alexander Lezhava,3 Yukiyoshi Fujita,2 Kyoko Obayashi,2 Katsunori Nakamura,1 Hugo Kohnke,5 Mia Wadelius,5 Lena Ekström,6 Cristine Skogastierna,6 Anders Rane,6 Masahiko Kurabayashi,7 Masami Murakami,8 Paul E. Cizdziel,3 Yoshihide Hayashizaki,3,9 and Ryuya Horiuchi1,2

BACKGROUND:

Polymorphisms of the CYP2C9 (cytochrome P450, family 2, subfamily C, polypeptide 9) gene (CYP2C9*2, CYP2C9*3) and the VKORC1 (vitamin K epoxide reductase complex, subunit 1) gene (−1639G>A) greatly affect the maintenance dose of warfarin. Screening patients for these genotypes before prescribing the drug allows faster individualized determination of the proper maintenance dose, minimizing the risk of adverse reactions and recurrence of thromboembolic episodes. With current methodologies, therapy can be delayed by several hours to 1 day if genotyping is used to determine the loading dose. A simpler and more rapid genotyping method is required.

METHODS:

We developed a single-nucleotide polymorphism (SNP)-detection assay based on the SMart Amplification Process version 2 (SMAP 2) to analyze CYP2C9*2, CYP2C9*3, and VKORC1 −1639G>A polymorphisms. Blood from consenting participants was used directly in a closed-tube real-time assay without DNA purification to obtain results within 1 h after blood collection.

RESULTS:

We analyzed 125 blood samples by both SMAP 2 and PCR-RFLP methods. The results showed perfect concordance.

CONCLUSIONS: The results validate the accuracy of the SMAP 2 for determination of SNPs critical to personalized warfarin therapy. SMAP 2 offers speed, simplicity of sample preparation, the convenience of isothermal amplification, and assay-design flexibility, which are significant advantages over conventional genotyping technologies. In this example and other clinical scenarios in which genetic testing is required for immediate and better-informed therapeutic decisions, SMAP 2–based diagnostics have key advantages.

© 2008 American Association for Clinical Chemistry

Warfarin is the most widely prescribed anticoagulant for the treatment of thromboembolic disorders. Because of the narrow therapeutic index and the large individual variation observed between warfarin dose and its anticoagulant effect (1), a careful adjustment of the dose on the basis of the prothrombin time expressed as the international normalized ratio (PT-INR)¹⁰ is essential. Subgroup analyses in a number of studies have shown that the risk of bleeding increases sharply when the PT-INR is greater than the upper limit of the therapeutic interval (2–4), and the risk of thromboembolic events increases when the PT-INR falls below the therapeutic interval (4, 5). Preventing adverse events therefore requires immediate adjustment of the warfarin dosage. Establishing a proper maintenance dose is challenging, however, because of widespread interindividual variation in the response to warfarin. This variation is explained in part by the genotype for the target site of warfarin action (pharmacodynamic effects) and by the genotype for the individual’s metabolizing enzymes (pharmacokinetic effects). Subunit 1 of the vitamin K epoxide reductase complex (VKORC1), a component of vitamin K epoxide reductase (VKOR), is a chief molecular target of warfarin (6). VKOR reportedly is a multisubunit enzyme, but a single peptide, VKORC1, may be responsible for its reductase activity (7). This enzyme recycles vitamin K 2,3-epoxide to vitamin K hydroquinone, which is required by γ-glutamyl carboxylase for the posttranslational modification of blood coagulation factors II, VII, IX, X, and others. Recent findings have shown that polymorphisms in VKORC1¹¹ (vitamin K epoxide reductase complex, subunit 1) have a large impact on warfarin dose (8–13). In our previous study (14), we found that 3 VKORC1 polymorphisms (−1639G>A, 1173C>T, and 1542G>C) were invariantly linked. The haplotypes can be categorized into M1, M2, and M3 groups, as Rieder et al. previously observed (13). Individuals in the M3 group, which has the −1639AA genotype, require a lower maintenance dose of warfarin than individuals in the other groups. A multivariate analysis showed that this polymorphism was the most important determinant of daily warfarin dose, explaining 16.5% of the variation. From these findings, we concluded that genotyping can be simplified considerably by using the −1639G>A polymorphism as the chief pharmacodynamic marker for predicting warfarin sensitivity.

Clinically available warfarin is a racemic mixture of S- and R-warfarin, and the potency of S-warfarin is 3- to 5-fold higher than that of R-warfarin (1, 15). S-warfarin is metabolized to 7-hydroxywarfarin predominantly by the polymorphic enzyme encoded by the CYP2C9 gene (cytochrome P450, family 2, subfamily C, polypeptide 9), whereas R-warfarin is metabolized by multiple cytochrome P450s (16, 17). The enzymatic activity of CYP2C9 therefore has a substantial influence on the observed anticoagulant effect of S-warfarin, the primary active form of the drug. Previous findings revealed that the common functional variants encoded by single-nucleotide polymorphisms (SNPs) CYP2C9*2 (430C>T, exon 3) and CYP2C9*3 (1075A>C, exon 7) have approximately 70% and 10%, respectively, of the metabolic capacity of the enzyme encoded by the wild-type (Wt) gene (CYP2C9*1) (18, 19). In 2007, the US Food and Drug Administration updated the labeling recommendations for warfarin to stress that genetic information is helpful for improving the initial estimate of warfarin dose for individual patients (20), after several randomized prospective clinical trials showed the benefit of genotype-guided warfarin prescription (21, 22).

Warfarin therapy typically commences soon after diagnosis of a disease for which this drug has a clinical indication; therefore, testing with a short turnaround time, ideally within 1 h of diagnosis, should be made available. Numerous strategies have been developed for SNP discrimination (Table 1) (23–32), but no method has eliminated both background amplification and the need for DNA purification so that CYP2C9 and VKORC1 genotypes can be determined within 1 h of blood collection. Recently, we reported the SMart-Amplification Process version 2 (SMAP 2), which can detect SNPs after about 30 min of incubation under isothermal conditions (33). We adapted the SMAP 2 by designing primer sets to detect the VKORC1 −1639G>A, CYP2C9*2, and CYP2C9*3 genetic polymorphisms.

Table 1. Properties of various gene analyses.

Method    Sample preparation (DNA extraction)   Detection time   Isothermal   Target amplification
SMAP      Not required                          30–40 min        O            O
RFLP      Required                              About 10 h                    O
LAMPᵃ     Not required                          30 min           O            O
Invader   Required                              4 h              O
NASBA     Required                              90 min           O            O
LCR       Required                              >100 min                      O
SDA       Required                              70 min           O            O
RCA       Required                              90 min           O            O
bDNA      Required                              105 min          O

NASBA, nucleic acid sequence–based amplification; LCR, ligase chain reaction; SDA, strand-displacement amplification; RCA, rolling-circle amplification; bDNA, branched DNA; O, applicable.

1 Department of Clinical Pharmacology, Gunma University Graduate School of Medicine, Maebashi, Japan; 2 Department of Pharmacy, Gunma University Hospital, Maebashi, Japan; 3 Genome Exploration Research Group (Genome Network Project Core Group), RIKEN Genomic Sciences Center (GSC), RIKEN Yokohama Institute, Yokohama, Japan; 4 K.K. DNAFORM, Yokohama, Japan; 5 Department of Medical Sciences, Clinical Pharmacology, Uppsala University Hospital, Uppsala, Sweden; 6 Karolinska Institute, Department of Clinical Pharmacology, Karolinska University Hospital, Stockholm, Sweden; 7 Department of Medicine and Biological Science, Gunma University Graduate School of Medicine, Maebashi, Japan; 8 Department of Clinical Laboratory Medicine, Gunma University Graduate School of Medicine, Maebashi, Japan; 9 Genome Science Laboratory, Discovery Research Institute, RIKEN Wako Institute, Wako, Japan.
* Address correspondence to this author at: Department of Clinical Pharmacology, Gunma University School of Medicine, 3-39-22 Showa-machi, Maebashi 371-8511, Japan. Fax +81-27-220-8743; e-mail [email protected]. ac.jp.
Received September 3, 2008; accepted December 16, 2008.
Previously published online at DOI: 10.1373/clinchem.2008.115295
10 Nonstandard abbreviations: PT-INR, prothrombin time expressed as the international normalized ratio; VKORC1, subunit 1 of the vitamin K epoxide reductase complex; SNP, single-nucleotide polymorphism; Wt, wild type; SMAP 2, SMart-Amplification Process version 2; dNTP, deoxynucleoside triphosphate; Mt, mutant; LAMP, loop-mediated isothermal amplification.
11 Human genes: VKORC1, vitamin K epoxide reductase complex, subunit 1; CYP2C9, cytochrome P450, family 2, subfamily C, polypeptide 9; CYP2C8, cytochrome P450, family 2, subfamily C, polypeptide 8; CYP2C18, cytochrome P450, family 2, subfamily C, polypeptide 18; CYP2C19, cytochrome P450, family 2, subfamily C, polypeptide 19.

Materials and Methods

CLINICAL SAMPLES

The DNA-analysis study was approved by the Institutional Review Board for clinical trials at Gunma University Hospital and the Ethical Committee for Human Genome Analysis at Gunma University. Written consent was obtained from all participants after they had been informed of the experimental procedure and the purpose of the study. Two milliliters of blood was drawn from healthy Japanese volunteers and unrelated Japanese patients treated with warfarin at the Department of Cardiovascular Medicine, Gunma University Hospital, Maebashi, Japan, and was anticoagulated with 4 mg dipotassium EDTA. Blood samples were analyzed immediately or stored at −80 °C until assayed. Blood samples were also obtained from Swedish patients being treated at the anticoagulation clinic at Uppsala University Hospital and from the Swedish warfarin genetics (WARG) cohort at Karolinska Institute. Approval was obtained from the appropriate ethics committees. These blood samples were anticoagulated and stored as described above.

DETECTION OF VKORC1 −1639G>A, CYP2C9*2, AND CYP2C9*3 POLYMORPHISMS BY THE SMAP 2

The SMAP 2 is a genotyping technology that can detect a germline genetic mutation in a single step after about 30 min of incubation under isothermal conditions (34). A set of specifically designed primers, designated the turn-back primer, the folding primer, the boost primer, 2 outer primers, and the competitive probe (34), enables allele-specific amplification. The turn-back primer contains a sequence at the 3′ end complementary to the target sequence and another sequence at the 5′ end that is complementary to a sequence present on the same DNA strand but situated approximately 15–40 bp further downstream. The folding primer consists of a sequence at the 3′ end complementary to a target genomic sequence and a sequence at the 5′ end that self-anneals to create a hairpin structure. Outer primers strand-displace the DNA synthesized from the turn-back primer and the folding primer, and the displaced strands then undergo self-primed DNA synthesis. The boost primer accelerates amplification. To suppress background amplification, the SMAP 2 uses 2 strategies: an asymmetrical primer design, which minimizes misamplification pathways, and the inclusion of Taq MutS in the reaction mixture. Taq MutS is a mismatch-binding protein that recognizes mismatched duplexes between the target DNA and any primer, such as an SNP-discrimination primer (turn-back, folding, or boost primer). Binding of this protein to misprimed templates creates a barrier to the strand-displacing DNA polymerase, preventing exponential amplification of those molecules and thereby reducing background. The SMAP 2 assay was carried out in a 25-µL reaction mixture containing 3.2 µmol/L folding primer, 3.2 µmol/L turn-back primer, 1.6 µmol/L boost primer, 0.4 µmol/L of each outer primer, 16 µmol/L competitive probe (in the case of CYP2C9*2), 1.4 µmol/L of each deoxynucleoside triphosphate (dNTP), 50 mL/L DMSO, 20 mmol/L Tris-HCl (pH 8.0), 10 mmol/L KCl, 10 mmol/L (NH4)2SO4, 8 mmol/L MgSO4, 1 mL/L Tween® 20, SYBR® Green I (Molecular Probes) diluted to 1 part in 100 000, 6 U Aac DNA polymerase (K.K. DNAFORM), and 0.3 µg Taq MutS (K.K. DNAFORM). The templates used for the SMAP 2 assays were prepared by mixing 1 volume of whole blood with 2 volumes of 150 mmol/L NaOH, vortex-mixing, and incubating at 98 °C for 5 min. The sample preparation was chilled on ice, and 0.5 µL was added directly to a reaction mixture. SMAP 2 reactions were assembled on ice and incubated at 60 °C for 60 min. The Mx3000P system (Stratagene) or the ABI 7500 Fast Real-Time PCR instrument (Applied Biosystems) was used to maintain isothermal conditions and to monitor the change in fluorescence intensity due to intercalation of the SYBR Green I dye during the reaction.

DETECTION OF VKORC1 −1639G>A, CYP2C9*2, AND CYP2C9*3 POLYMORPHISMS BY PCR-RFLP

Genomic DNA was extracted from whole blood with a QIAamp Blood Kit (Qiagen). For the VKORC1 −1639G>A polymorphism, PCR was carried out in a 25-µL volume containing 0.1 µg genomic DNA, 12.5 pmol of each primer (5′-ATCCCTCTGGGAAGTCAAGC-3′ and 5′-CACCTTCAACCTCTCCATCC-3′; Kurabo), 0.2 nmol/L of each dNTP, 2.5 µL PCR Gold Buffer (Applied Biosystems), 37.5 nmol/L MgCl2, and 1 U Taq DNA polymerase (Applied Biosystems). The cycling profile used for all reactions consisted of an initial step at 95 °C for 5 min; 35 cycles of 95 °C for 60 s, 60 °C for 30 s, and 72 °C for 2 min; and a 10-min final extension at 72 °C. The resulting 636-bp product was digested for 1 h with 20 U NciI at 37 °C and analyzed by electrophoresis in a 20-g/L agarose gel. NciI cut PCR products containing the −1639A allele into 522- and 114-bp fragments and cut PCR products containing the −1639G allele into 472-, 50-, and 114-bp fragments.
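Genotype calls from these digests reduce to checking which allele-specific fragments appear on the gel. The Python sketch below encodes the fragment sizes given above for the NciI digest and, for completeness, those given for the AvaII (CYP2C9*2) and KpnI (CYP2C9*3) digests in the following paragraph; it checks that each allele's fragments sum to the amplicon length and calls a genotype from a list of scored band sizes. The `call_genotype` helper and its `min_bp` cutoff are hypothetical: the cutoff assumes that fragments shorter than about 60 bp (the 50-, 57-, and 20-bp bands) may be too faint to score reliably on agarose, so calls rest on the larger bands.

```python
from typing import Dict, Iterable, Tuple

# Amplicon length (bp) and expected restriction fragments (bp) per allele,
# taken from the PCR-RFLP descriptions in the Methods.
DIGESTS: Dict[str, Tuple[int, Dict[str, Tuple[int, ...]]]] = {
    "VKORC1 -1639G>A (NciI)": (636, {"G": (472, 114, 50), "A": (522, 114)}),
    "CYP2C9*2 430C>T (AvaII)": (454, {"C": (397, 57), "T": (454,)}),
    "CYP2C9*3 1075A>C (KpnI)": (105, {"A": (105,), "C": (85, 20)}),
}


def check_fragment_sums() -> None:
    """Raise if any allele's fragments do not add up to the uncut amplicon."""
    for name, (amplicon_bp, alleles) in DIGESTS.items():
        for allele, fragments in alleles.items():
            if sum(fragments) != amplicon_bp:
                raise ValueError(f"{name}, allele {allele}: {sum(fragments)} != {amplicon_bp}")


def call_genotype(name: str, observed_bands: Iterable[int], min_bp: int = 60) -> str:
    """Call a biallelic genotype from the band sizes scored on the gel.

    An allele is called present when all of its allele-specific fragments of at
    least min_bp are observed; fragments shared with the other allele are ignored.
    """
    _, alleles = DIGESTS[name]
    observed = set(observed_bands)
    present = []
    for allele, fragments in alleles.items():
        other_fragments = {f for other, frs in alleles.items() if other != allele for f in frs}
        diagnostic = {f for f in fragments if f not in other_fragments and f >= min_bp}
        if diagnostic and diagnostic <= observed:
            present.append(allele)
    if not present:
        raise ValueError(f"{name}: bands {sorted(observed)} match no allele")
    return present[0] * 2 if len(present) == 1 else "".join(present)


if __name__ == "__main__":
    check_fragment_sums()
    print(call_genotype("VKORC1 -1639G>A (NciI)", [522, 472, 114, 50]))  # prints GA
```

Shared fragments, such as the 114-bp NciI band present for both VKORC1 alleles, carry no genotype information and are ignored by design.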


Fig. 1. VKORC1 sequence and the primer set for VKORC1 typing. (A), Sequence of the VKORC1 gene. Arrow indicates position of the SNP nucleotides of the VKORC1 allele (−1639G>A). Heavy lines indicate DNA sequences used for primer design. (B), Primer set for SNP typing of the VKORC1 allele. Turn-back primer (TP) W and folding primer (FP) M were designed to be Wt allele–specific and Mt allele–specific, respectively. Arrows indicate the nucleotides corresponding to the SNP site. The FP has a specific sequence at the 5′ end (lowercase letters) to enable self-annealing hairpin formation. Only 1 outer primer (OP) is used in the Mt allele assay. BP, boost primer.
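Because the Wt and Mt alleles are detected in separate allele-specific SMAP 2 reactions, each sample's genotype is inferred from the pair of results. The Python sketch below shows one way to do this from real-time fluorescence readings; the fluorescence threshold, the time cutoff, the example curves, and all function names are hypothetical and are not taken from the instrument software or the authors' protocol. The default cutoff simply matches the 60-min incubation described in the Methods.

```python
from typing import Optional, Sequence


def time_to_threshold(times_min: Sequence[float], fluorescence: Sequence[float],
                      threshold: float) -> Optional[float]:
    """Return the first sampled time at which fluorescence reaches threshold, or None."""
    for t, f in zip(times_min, fluorescence):
        if f >= threshold:
            return t
    return None


def call_from_assays(wt_positive: bool, mt_positive: bool) -> str:
    """Combine the Wt-specific and Mt-specific reaction results into a genotype."""
    if wt_positive and mt_positive:
        return "Wt/Mt"
    if wt_positive:
        return "Wt/Wt"
    if mt_positive:
        return "Mt/Mt"
    return "no call"  # neither reaction amplified: repeat the test


def genotype_from_curves(times_min: Sequence[float], wt_curve: Sequence[float],
                         mt_curve: Sequence[float], threshold: float,
                         cutoff_min: float = 60.0) -> str:
    """Call a genotype from the Wt and Mt amplification curves of one sample."""
    wt_t = time_to_threshold(times_min, wt_curve, threshold)
    mt_t = time_to_threshold(times_min, mt_curve, threshold)
    wt_pos = wt_t is not None and wt_t <= cutoff_min
    mt_pos = mt_t is not None and mt_t <= cutoff_min
    return call_from_assays(wt_pos, mt_pos)


if __name__ == "__main__":
    times = [0, 10, 20, 30, 40, 50, 60]
    wt = [0.0, 0.0, 0.2, 1.0, 1.0, 1.0, 1.0]  # hypothetical curve: plateau by ~30 min
    mt = [0.0] * 7                             # hypothetical curve: no amplification
    print(genotype_from_curves(times, wt, mt, threshold=0.5))  # prints Wt/Wt
```

A time cutoff is one simple guard against late misamplification in the non-matching reaction, the failure mode that the asymmetrical primer design and Taq MutS are intended to suppress.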

For the CYP2C9*2 polymorphism, PCR was carried out in a 25-µL volume containing 0.1 µg genomic DNA, 10 pmol of each primer (5′-GTATTTTGGCCTGAAACCCATA-3′ and 5′-GGCCTTGGTTTTTCTCAACTC-3′; Kurabo), 5.0 nmol/L of each dNTP, 2.5 µL PCR Gold Buffer (Applied Biosystems), 37.5 mmol/L MgCl2, and 1 U Taq DNA polymerase (Applied Biosystems). The cycling profile consisted of an initial step at 94 °C for 9 min; 30 cycles of 94 °C for 30 s, 58 °C for 30 s, and 72 °C for 30 s; and a 5-min final extension at 72 °C. The resulting 454-bp product was digested for 1 h with 20 U AvaII at 37 °C and analyzed by electrophoresis in a 20-g/L agarose gel. AvaII cut PCR products containing the 430C allele into 397- and 57-bp fragments and did not cut those containing the 430T allele. For the CYP2C9*3 polymorphism, the PCR mixture was the same as that for CYP2C9*2, but the primer sequences were 5′-TGCACGAGGTCCAGAGGTAC-3′ and 5′-ACAAACTTACCTTGGGAATGAGA-3′ (Kurabo). The cycling profile consisted of an initial step at 94 °C for 5 min; 30 cycles of 94 °C for 30 s, 57 °C for 30 s, and 72 °C for 30 s; and a 7-min final extension at 72 °C. The resulting 105-bp product was digested for 1 h with 20 U KpnI at 37 °C and analyzed by electrophoresis in a 30-g/L agarose gel. KpnI did not cut PCR products containing the 1075A allele but cut those containing the 1075C allele into 85- and 20-bp fragments.

Results

IDENTIFICATION OF THE VKORC1 −1639G>A POLYMORPHISM BY SMAP 2

Fig. 1 shows the locations and sequences of the primers for SMAP 2–based detection of the VKORC1 −1639G>A SNP. The discrimination primers used to recognize the Wt and mutant (Mt) alleles of the VKORC1 polymorphism were of different designs. The Wt allele (−1639G) was recognized by the 5′ end of the turn-back primer, with the discriminating nucleotide being the second from the end (n − 1). For the Mt allele (−1639A), the 3′ end of the folding primer was designed to detect the SNP, with the discriminating nucleotide also being the second from the end (n − 1). These differing designs resulted from empirically screening many candidates and selecting those that provided high specificity and a low misamplification frequency. The VKORC1 −1639G (Wt) SMAP 2 assay amplified homozygous Wt (Wt/Wt) and heterozygous (Wt/Mt) blood samples, with amplification beginning at approximately 20 min of incubation and reaching a plateau at approximately 30 min. As expected, misamplification was suppressed for the −1639A homozygous Mt (Mt/Mt) samples; however, misamplification of the homozygous Mt samples became evident after 40 min (Fig. 2) and was reproducible. Similar amplification kinetics were observed for the VKORC1 −1639A (Mt) SMAP 2 assay. Conclusive results from homozygous Mt and heterozygous (Wt/Mt) blood samples were obtained after approximately 30 min, and amplification reached a plateau at approximately 40 min. In this assay, misamplification from homozygous Wt blood samples did not become apparent even after 40 min of incubation. Hence, SNP calls for the VKORC1 SMAP 2 assays were based on a standard detection time of 40 min. To demonstrate the accuracy and clinical utility of these assays for handling multiple samples, we assayed 125 blood samples by both the SMAP 2 and PCR-RFLP methods. Of the 125 samples, 1 was homozygous Wt, 25 were heterozygous (Wt/Mt), and 99 were homozygous for VKORC1 −1639A (Mt/Mt). The SMAP 2 and PCR-RFLP genotypes were fully concordant for all 125 samples.

Fig. 2. Amplification time course of the SMAP 2. SNP typing of VKORC1 −1639A, CYP2C9*2, and CYP2C9*3 with allele-specific primers in human blood samples. The 3 SNP genotypes are shown. Left, center, and right panels show assay results for homozygous Wt (Wt/Wt), heterozygous (Wt/Mt), and homozygous Mt (Mt/Mt), respectively. Each panel shows results for 3 typical patient samples. dR, difference in relative fluorescence units.
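The 40-min detection rule amounts to a simple decision procedure: each allele-specific reaction is scored positive only if its fluorescence rises before the cutoff, and the pair of results gives the genotype. The sketch below illustrates that logic under stated assumptions. The dR threshold value, the curve format, and the example curves are all hypothetical; the paper specifies the 40-min detection time but not a numeric fluorescence threshold.

```python
# Sketch: genotype call from the Wt and Mt SMAP 2 reactions using a fixed
# 40-min detection time. DR_THRESHOLD and the example curves are
# hypothetical; only the 40-min cutoff comes from the text.

DETECTION_TIME_MIN = 40.0
DR_THRESHOLD = 1000.0  # hypothetical dR level taken to mean "amplified"


def time_to_positive(times_min, dr_values, threshold=DR_THRESHOLD):
    """Return the first time (min) at which dR reaches threshold, else None."""
    for t, dr in zip(times_min, dr_values):
        if dr >= threshold:
            return t
    return None


def is_positive(times_min, dr_values):
    """A reaction counts only if it crosses the threshold by the cutoff, so
    late misamplification (after 40 min) is ignored."""
    t = time_to_positive(times_min, dr_values)
    return t is not None and t <= DETECTION_TIME_MIN


def call_genotype(wt_curve, mt_curve):
    """Each curve is a (times_min, dr_values) pair for one reaction."""
    wt = is_positive(*wt_curve)
    mt = is_positive(*mt_curve)
    if wt and mt:
        return "Wt/Mt"
    if wt:
        return "Wt/Wt"
    if mt:
        return "Mt/Mt"
    return "no call"


# Hypothetical curves sampled every 5 min (not data from this study).
times = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
early = [0, 0, 0, 10, 300, 1500, 3000, 3200, 3300, 3300, 3300]
late = [0, 0, 0, 0, 0, 0, 0, 50, 400, 1800, 3000]  # rises only after 40 min

print(call_genotype((times, late), (times, early)))  # Mt/Mt
```

In this example the Wt reaction misamplifies at 45 min, mirroring the late misamplification described above, and is correctly ignored because it falls after the cutoff.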

DISCRIMINATION OF SUBFAMILY GENES AND IDENTIFICATION OF CYP2C9 POLYMORPHISMS BY SMAP 2

Members of the large gene family encoding the cytochrome P450 proteins are difficult to distinguish from one another by hybridization or amplification techniques. In the case of CYP2C9, signals must be discriminated from those of the highly related subtypes CYP2C8 (cytochrome P450, family 2, subfamily C, polypeptide 8), CYP2C18 (cytochrome P450, family 2, subfamily C, polypeptide 18), and CYP2C19 (cytochrome P450, family 2, subfamily C, polypeptide 19), because all 4 genes have a high degree of sequence similarity. We selected SMAP 2 primer sequences that were unique to CYP2C9 so that only this subtype would be amplified; these primer sequences imperfectly matched the other 3 family members (CYP2C8, CYP2C18, and CYP2C19). The locations and sequences of the primers for CYP2C9*2 and CYP2C9*3 are shown in Figs. 3 and 4, respectively. For the CYP2C9*2 and CYP2C9*3 polymorphisms, we engineered the folding primer to be the SNP-discrimination primer, with Wt- and Mt-specific versions whose 3′-terminal nucleotides recognized the respective alleles. We analyzed 125 blood samples. For the CYP2C9*2 polymorphism, 123 were homozygous Wt (Wt/Wt), 1 was heterozygous (Wt/Mt), and 1 was homozygous Mt (Mt/Mt). For the CYP2C9*3 polymorphism, 116 were homozygous Wt, 8 were heterozygous (Wt/Mt), and 1 was homozygous Mt. All genotypes for the 125 blood samples were verified by PCR-RFLP analysis, which showed complete concordance with the SMAP 2 results and confirmed the high specificity of the SMAP 2 assays for the CYP2C9 alleles. The amplification time course was similar to that of the VKORC1 assays (Fig. 2), and misamplification was never observed, even after 40-min incubations.

Fig. 3. The primer set for CYP2C9*2 typing and sequence alignment of CYP2C9 subfamily genes. (A), Sequence alignment of CYP2C9 subfamily genes. Dots show positions at which the nucleotides are identical to those in the CYP2C9 sequence. Arrow indicates position of SNP nucleotides (430C>T). Heavy lines on the sequences indicate primer regions. (B), Primer set for SNP typing of the CYP2C9*2 allele. Folding primers (FPs) W and M were designed to be specific for the Wt allele and the Mt allele, respectively. The FP has a specific sequence at the 5′ end (lowercase letters) to enable self-annealing hairpin formation. Arrows indicate the nucleotides corresponding to SNP sites. The turn-back primer (TP) was placed 50 bp downstream from the SNP site to discriminate CYP2C9 from other members of the gene subfamily. BP, boost primer; OP, outer primer; CP, competitive probe.

Fig. 4. The primer set for CYP2C9*3 typing and sequence alignment of CYP2C9 subfamily genes. (A), Sequence alignment of CYP2C9 subfamily genes. Dots show positions at which the nucleotides are identical to those in the CYP2C9 sequence. Arrow indicates position of SNP nucleotides (1075A>C). Heavy lines on the sequences indicate primer regions. (B), Primer set for SNP typing of the CYP2C9*3 allele. Folding primers (FPs) W and M were designed to be specific for the Wt allele and the Mt allele, respectively. The FP has a specific sequence at the 5′ end (lowercase letters) to enable self-annealing hairpin formation. Arrows indicate the nucleotides corresponding to SNP sites.

Discussion

In this study, we established a SMAP 2–based diagnostic method to detect the VKORC1 −1639G>A, CYP2C9*2, and CYP2C9*3 genetic polymorphisms, which are important for guiding adjustments in warfarin dosage. Our method detected these 3 SNPs (or the corresponding Wt sequences) within 1 h with <1 µL of whole blood per assay. The SMAP 2 is a polymerase-based chain reaction that uses a unique primer design and a strand-displacing polymerase to amplify target DNA sequences under isothermal conditions. Combined with the background-suppression capability of MutS, the SMAP 2 can detect a particular SNP by DNA amplification alone, without any downstream analysis such as restriction enzyme treatment, electrophoresis, or probe hybridization. Moreover, commonly used PCR-based techniques usually require careful DNA extraction, because impurities interfere with the enzymatic activity of Taq DNA polymerase. The SMAP 2, however, uses Aac DNA polymerase, which is highly resistant to cellular contaminants; hence, the assay works directly on blood samples, which require only a simple initial heat-lysis and denaturation step. In addition, because the specificity of the SMAP 2 is dictated by the amplification process itself, detecting amplified DNA with an intercalating fluorescent dye is sufficient for making an SNP determination. Blood sample preparation and assay setup generally take 20 min, and the SMAP 2 reaction itself takes 40 min. Reaction times vary slightly among genes, depending on primer melting temperatures and primer conformations, which are common factors affecting amplification efficiency in primer-based amplification methods.

The SMAP 2 demonstrated several advantages over conventional technologies for genotyping warfarin dose–related genes. One strength is the flexibility of primer design. Loop-mediated isothermal amplification (LAMP) is a rapid SNP-typing method similar to ours; however, LAMP requires 2 SNP-discrimination primers that must both hybridize around the SNP. In CYP2C9*2 genotyping, for example, the DNA sequence around the SNP is the same as in the CYP2C19 gene (Fig. 3). Consequently, the LAMP inner primers cannot hybridize to CYP2C9 selectively, because the 2 genes have identical sequences in this critical region. Because the SMAP 2 requires only a single discrimination primer for SNP genotyping, it offers greater design flexibility and is better suited for subfamily discrimination. In our method, we engineered the folding primer to be the SNP-discrimination primer and placed the turn-back primer 50 bp downstream from the SNP site to discriminate CYP2C9 from other members of the subfamily. Our SNP-detection primer (i.e., the folding primer) is by itself not specific for CYP2C9, for the same reason as the LAMP primer, but because it is used in combination with the turn-back primer, the assay as a whole is specific for CYP2C9.

It is important to note that CYP2C9 metabolizes not only S-warfarin but also many other drugs, including phenytoin (35), tolbutamide (36), losartan (37), and nonsteroidal antiinflammatory drugs (38–40). Hence, the SMAP 2–based CYP2C9 assay used in this study may also be useful for pharmacokinetic studies or for the clinical use of these and other drugs. The methods we have described make possible rapid and accurate genotyping in the clinical setting, as well as warfarin dose adjustment based on genetic information. For more personalized medical care and improved efficacy and safety of anticoagulation therapy, SMAP 2–based diagnostics may become an important technology for future point-of-care testing.
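The primer-placement argument in the Discussion (a folding primer that is not CYP2C9-specific on its own, paired with a turn-back primer that is) can be illustrated with a simple in-silico check. The sketch below compares primer sequences with aligned binding sites in paralogous genes and keeps only the genes that every primer matches perfectly. All sequences are made-up placeholders, not the real CYP2C sequences shown in Figs. 3 and 4, and the "perfect match required" rule is a deliberate simplification: real priming can tolerate some mismatches, and 3′-terminal mismatches matter most.

```python
# Sketch of an in-silico primer-specificity check across paralogs.
# All sequences are hypothetical placeholders. Each "site" is written in the
# same orientation as the primer, so a perfect match means identical strings.


def mismatches(primer, site):
    """0-based positions (5'->3') where primer and site differ."""
    if len(primer) != len(site):
        raise ValueError("primer and site must be the same length")
    return [i for i, (p, s) in enumerate(zip(primer, site)) if p != s]


def specificity_report(primer, sites):
    """Map gene -> (number of mismatches, whether the 3'-terminal base differs)."""
    last = len(primer) - 1
    return {gene: (len(mm), last in mm)
            for gene, mm in ((g, mismatches(primer, s)) for g, s in sites.items())}


def perfectly_matched(primer, sites):
    """Genes whose binding site matches the primer with zero mismatches."""
    return {gene for gene, site in sites.items() if not mismatches(primer, site)}


# Hypothetical folding-primer region: identical in CYP2C9 and CYP2C19.
fp = "ACGTTGCAAGTC"
fp_sites = {"CYP2C9": "ACGTTGCAAGTC", "CYP2C19": "ACGTTGCAAGTC",
            "CYP2C8": "ACGATGCTAGTT"}
# Hypothetical turn-back-primer region: unique to CYP2C9.
tp = "GGATCCTTAGCA"
tp_sites = {"CYP2C9": "GGATCCTTAGCA", "CYP2C19": "GGATCGTTAGCT",
            "CYP2C8": "GCATCCTAAGCT"}

# Under the simplifying rule, a gene amplifies only if every primer matches it.
candidates = perfectly_matched(fp, fp_sites) & perfectly_matched(tp, tp_sites)
print(sorted(perfectly_matched(fp, fp_sites)))  # ['CYP2C19', 'CYP2C9']
print(sorted(candidates))                       # ['CYP2C9']
```

The folding primer alone matches 2 genes, whereas the primer combination narrows the expected product to CYP2C9. This mirrors the reasoning in the text, although it does not model hybridization thermodynamics.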

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: P.E. Cizdziel, K.K. DNAFORM. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: Y. Hayashizaki, research grant to the RIKEN Genome Exploration Research Project from the Ministry of Education, Culture, Sports, Science, and Technology of Japan (MEXT). This research was partially supported by the Ministry of Education, Culture, Sports, Science, and Technology of Japan, Grant-in-Aid for Young Scientists (B), 19790115, 2007. Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of the study, choice of enrolled patients, review and interpretation of data, or preparation or approval of the manuscript.

Acknowledgments: We thank Aiko Matsumoto (Gunma University) for her secretarial assistance and Tomoko Senbongi (Gunma University) for technical assistance.

References

1. Kaminsky LS, Zhang ZY. Human P450 metabolism of warfarin. Pharmacol Ther 1997;73:67–74.
2. Anticoagulants in the Secondary Prevention of Events in Coronary Thrombosis (ASPECT) Research Group. Effect of long-term oral anticoagulant treatment on mortality and cardiovascular morbidity after myocardial infarction. Lancet 1994;343:499–503.
3. Cannegieter SC, Rosendaal FR, Wintzen AR, van der Meer FJ, Vandenbroucke JP, Briët E. Optimal oral anticoagulant therapy in patients with mechanical heart valves. N Engl J Med 1995;333:11–7.
4. Fihn SD, McDonell M, Martin D, Henikoff J, Vermes D, Kent D, White RH, for the Warfarin Optimized Outpatient Follow-up Study Group. Risk factors for complications of chronic anticoagulation. A multicenter study. Ann Intern Med 1993;118:511–20.
5. Stroke Prevention in Atrial Fibrillation Investigators. Adjusted-dose warfarin versus low-intensity, fixed-dose warfarin plus aspirin for high-risk patients with atrial fibrillation: Stroke Prevention in Atrial Fibrillation III randomised clinical trial. Lancet 1996;348:633–8.
6. Li T, Chang CY, Jin PJ, Lin PJ, Khvorova A, Stafford DW. Identification of the gene for vitamin K epoxide reductase. Nature 2004;427:541–4.
7. Chu PH, Huang TY, Williams J, Stafford DW. Purified vitamin K epoxide reductase alone is sufficient for conversion of vitamin K epoxide to vitamin K and vitamin K to vitamin KH2. Proc Natl Acad Sci U S A 2006;103:19308–13.
8. Wadelius M, Chen LY, Downes K, Ghori J, Hunt S, Eriksson N, et al. Common VKORC1 and GGCX polymorphisms associated with warfarin dose. Pharmacogenomics J 2005;5:262–70.
9. Veenstra DL, You JH, Rieder MJ, Farin FM, Wilkerson HW, Blough DK, et al. Association of vitamin K epoxide reductase complex 1 (VKORC1) variants with warfarin dose in Hong Kong Chinese patient population. Pharmacogenet Genomics 2005;15:687–91.
10. Sconce EA, Kahn TI, Wynne HA, Avery P, Monkhouse L, King BP, et al. The impact of CYP2C9 and VKORC1 genetic polymorphism and patient characteristics upon warfarin dose requirements: proposal for a new dosing regimen. Blood 2005;106:2329–33.
11. Aquilante CL, Langaee TY, Lopez LM, Yarandi HN, Tromberg JS, Mohuczy D, et al. Influence of coagulation factor, vitamin K epoxide reductase complex subunit 1, and cytochrome P450 2C9 gene polymorphisms on warfarin dose requirements. Clin Pharmacol Ther 2006;79:291–302.
12. Lee SC, Ng SS, Oldenburg J, Chong PY, Rost S, Guo JY, et al. Interethnic variability of warfarin maintenance requirement is explained by VKORC1 genotype in an Asian population. Clin Pharmacol Ther 2006;79:197–205.
13. Rieder MJ, Reiner AP, Gare BF, Nickerson DA, Eby CS, McLeod HL, et al. Effect of VKORC1 haplotypes on transcriptional regulation and warfarin dose. N Engl J Med 2005;352:2285–93.
14. Obayashi K, Nakamura K, Kawana J, Ogata H, Hanada K, Kurabayashi M, et al. VKORC1 gene variations are the major contributors of variation in warfarin dose in Japanese patients. Clin Pharmacol Ther 2006;80:169–78.
15. Lee CR, Goldstein JA, Pieper JA. Cytochrome P450 2C9 polymorphism: a comprehensive review of the in-vitro and human data. Pharmacogenetics 2002;12:251–63.
16. Daly AK, King BP. Pharmacogenetics of oral anticoagulants. Pharmacogenetics 2003;13:247–52.
17. Takahashi H, Echizen H. Pharmacogenetics of warfarin elimination and its clinical implications. Clin Pharmacokinet 2001;40:587–603.
18. Furuya H, Fernandez-Salguero P, Gregory W, Taber H, Steward A, Gonzalez FJ, Idle JR. Genetic polymorphism of CYP2C9 and its effect on warfarin maintenance dose requirement in patients undergoing anticoagulation therapy. Pharmacogenetics 1995;5:389–92.
19. Rettie AE, Haining RL, Bajpai M, Levy RH. A common genetic basis for idiosyncratic toxicity of warfarin and phenytoin. Epilepsy Res 1999;35:253–5.
20. U.S. Food and Drug Administration. FDA news: FDA approves updated warfarin (Coumadin) prescribing information. http://www.fda.gov/bbs/topics/NEWS/2007/NEW01684.html (Accessed January 2009).
21. Caraco Y, Blotnick S, Muszkat M. CYP2C9 genotype-guided warfarin prescribing enhances the efficacy and safety of anticoagulation: a prospective randomized controlled study. Clin Pharmacol Ther 2008;83:460–70.
22. Anderson JL, Horne BD, Stevens SM, Grove AS, Barton S, Nicholas ZP, et al. Randomized trial of genotype-guided versus standard warfarin dosing in patients initiating oral anticoagulation. Circulation 2007;116:2563–70.
23. Iwasaki M, Yonekawa T, Otsuka K, Suzuki W, Nagamine K, Hase T, et al. Validation of the loop-mediated isothermal amplification method for single nucleotide polymorphism genotyping with whole blood. Genome Lett 2003;2:119–26.
24. Lyamichev V, Mast AL, Hall JG, Prudent JR, Kaiser MW, Takova T, et al. Polymorphism identification and quantitative detection of genomic DNA by invasive cleavage of oligonucleotide probes. Nat Biotechnol 1999;17:292–6.
25. Compton J. Nucleic acid sequence-based amplification. Nature 1991;350:91–2.
26. Wabuyele MB, Farquar H, Stryjewski W, Hammer RP, Soper SA, Cheng YW, Barany F. Approaching real-time molecular diagnostics: single-pair fluorescence resonance energy transfer (spFRET) detection for the analysis of low abundant point mutations in K-ras oncogenes. J Am Chem Soc 2003;125:6937–45.
27. Walker GT, Fraiser MS, Schram JL, Little MC, Nadeau JG, Malinowski DP. Strand displacement amplification—an isothermal, in vitro DNA amplification technique. Nucleic Acids Res 1992;20:1691–6.
28. Lizardi PM, Huang X, Zhu Z, Bray-Ward P, Thomas DC, Ward DC. Mutation detection and single-molecule counting using isothermal rolling-circle amplification. Nat Genet 1998;19:225–32.
29. Erice A, Brambilla D, Bremer J, Jackson JB, Kokka R, Yen-Lieberman B, Coombs RW. Performance characteristics of the QUANTIPLEX HIV-1 RNA 3.0 assay for detection and quantitation of human immunodeficiency virus type 1 RNA in plasma. J Clin Microbiol 2000;38:2837–45.
30. Heim M, Meyer UA. Genotyping of poor metabolisers of debrisoquine by allele-specific PCR amplification. Lancet 1990;336:529–32.
31. Bassler HA, Flood SJ, Livak KJ, Marmaro J, Knorr R, Batt CA. Use of a fluorogenic probe in a PCR-based assay for the detection of Listeria monocytogenes. Appl Environ Microbiol 1995;61:3724–8.
32. Bao YP, Huber M, Wei TF, Marla SS, Storhoff JJ, Müller UR. SNP identification in unamplified human genomic DNA with gold nanoparticle probes. Nucleic Acids Res 2005;33:e15.
33. Mitani Y, Lezhava A, Kawai Y, Kikuchi T, Oguchi-Katayama A, Kogo Y, et al. Rapid SNP diagnostics using asymmetric isothermal amplification and a novel mismatch-suppression technology. Nat Methods 2007;4:257–62.
34. Watanabe J, Mitani Y, Kawai Y, Kikuchi T, Kogo Y, Oguchi-Katayama A, et al. Use of a competitive probe in assay design for genotyping of the UGT1A1 *28 microsatellite polymorphism by the smart amplification process. Biotechniques 2007;43:479–84.
35. Veronese ME, Mackenzie PI, Doecke CJ, McManus ME, Miners JO, Birkett DJ. Tolbutamide and phenytoin hydroxylations by cDNA-expressed human liver cytochrome P4502C9. Biochem Biophys Res Commun 1991;175:1112–8.
36. Relling MV, Aoyama T, Gonzalez FJ, Meyer UA. Tolbutamide and mephenytoin hydroxylation by human cytochrome P450 in the CYP2C subfamily. J Pharmacol Exp Ther 1990;252:442–7.
37. Stearns RA, Chakravarty PK, Chen R, Chiu SH. Biotransformation of losartan to its active carboxylic acid metabolite in human liver microsomes. Role of cytochrome P4502C and 3A subfamily members. Drug Metab Dispos 1995;23:207–15.
38. Tracy TS, Rosenbluth BW, Wringhton SA, Gonzalez FJ, Korzekwa KR. Role of cytochrome P450 2C9 and an allelic variant in the 4′-hydroxylation of (R)- and (S)-flurbiprofen. Biochem Pharmacol 1995;49:1269–75.
39. Hamman MA, Thompson GA, Hall SD. Regioselective and stereoselective metabolism of ibuprofen by human cytochrome P450 2C. Biochem Pharmacol 1997;54:33–41.
40. Mancy A, Broto P, Dijols S, Dansette PM, Mansuy D. The substrate binding site of human liver cytochrome P450 2C9: an approach using designed tienilic acid derivatives and molecular modeling. Biochemistry 1995;34:10365–75.

Clinical Chemistry 55:4 813–822 (2009)

Infectious Disease

Generating Aptamers for Recognition of Virus-Infected Cells

Zhiwen Tang,1† Parag Parekh,1† Pete Turner,2 Richard W. Moyer,2* and Weihong Tan1,2,3,4,5,6,7*

BACKGROUND: The development of molecular probes capable of recognizing virus-infected cells is essential to meet the serious clinical, therapeutic, and national-security challenges confronting virology today. We report the development of DNA aptamers as probes for the selective targeting of virus-infected living cells.

METHODS: To create aptamer probes capable of recognizing virus-infected cells, we used cell-SELEX (systematic evolution of ligands via exponential enrichment), which uses intact, live infected cells as targets for aptamer selection. In this study, vaccinia virus–infected and uninfected A549 lung cancer cells were used to develop our model probes.

RESULTS: A panel of aptamers was evolved by means of the infected cell–SELEX procedure. The aptamers bound selectively to vaccinia virus–infected A549 cells, with apparent equilibrium dissociation constants in the nanomolar range. In addition, these aptamers specifically recognized a variety of infected target cell lines. The aptamers' target is most likely a viral protein located on the cell surface.

CONCLUSIONS: The successful development of a panel of DNA-aptamer probes capable of recognizing virus-infected cells via a whole living cell–SELEX selection strategy may increase our understanding of the molecular signatures of infected cells. Our findings suggest that aptamers can be developed as molecular probes for use as diagnostic and therapeutic reagents and for facilitating drug delivery to infected cells.

© 2009 American Association for Clinical Chemistry

1 Departments of Chemistry, 2 Molecular Genetics and Microbiology, and 3 Physiology and Functional Genomics, 4 Shands Cancer Center, 5 Center for Research at Bio/Nano Interface, 6 UF Genetics Institute, 7 McKnight Brain Institute, University of Florida, Gainesville, FL. † Z. Tang and P. Parekh made equal contributions as first authors. * Address correspondence to these authors at: (W. Tan) 114 Leigh Hall, Department of Chemistry, University of Florida, Gainesville, FL 32611. Fax 352-846-2410; e-mail [email protected]. (R.W. Moyer) P.O. Box 100266, Department of Molecular Genetics and Microbiology, University of Florida College of Medicine, Gainesville, FL 32610-0266. E-mail [email protected]. Received August 2, 2008; accepted February 3, 2009. Previously published online at DOI: 10.1373/clinchem.2008.113514

8 Nonstandard abbreviations: ssDNA, single-stranded DNA; SELEX, systematic evolution of ligands via exponential enrichment; FBS, fetal bovine serum; GFP, green fluorescent protein; Kd, equilibrium dissociation constant.

Successful viral infection (1–7) involves (a) binding of the virus to the host cell surface, (b) receptor-mediated entry, (c) uncoating of the virus particle, (d) production of viral proteins and replication of the viral DNA, and (e) virus assembly followed by release of new virus particles (1, 3, 7). During viral replication, assembly, and release, the cell surface is modified by the insertion of viral proteins (8–10). These cell-surface alterations provide virus-specific targets for designing molecular probes that can recognize virus-infected cells and provide molecular signatures for them; however, because few biomarkers are currently known and available for effective diagnosis of viral disease, investigators' ability to target and study such viral proteins and infected cells has thus far been limited.

Aptamers offer a promising technology for addressing this deficiency. Aptamers are single-stranded DNA (ssDNA)8 or RNA oligonucleotides that can be selected to bind a variety of target molecules, from whole cells and large purified molecules such as proteins to small molecules such as ATP (11–13). After folding into a well-defined spatial conformation, aptamers can specifically recognize their target molecules with high affinity (14). Aptamers are selected by an in vitro process termed SELEX (systematic evolution of ligands via exponential enrichment). SELEX consists of a series of repeated enrichment cycles and counterselections based on competitive binding that ultimately yields a group of aptamers that bind specifically to a given target (15, 16). Compared with antibodies used as traditional probes, aptamers have several advantages for targeting virally infected cells, including stable performance, reproducible properties, low immunogenicity, high selectivity, strong affinity, and ease of modification for further optimization (17, 18). Most of all, aptamers can be used as a discovery tool to analyze the molecular basis of a specific disease or infection process. Aptamers can be selected to recognize infected cells without prior knowledge of the new potential biomarkers that appear after infection; moreover, once the aptamers are selected, they can be used to detect these specific biomarkers, as we have demonstrated in our previous work with leukemia cell samples (19, 20). These benefits have attracted considerable scientific attention and suggest that aptamers also hold great promise for detecting virus-infected cells, for related bioassays required for early detection of infection, and, in the case of inhibitory aptamers, for treating disease (21–23).

Conventional molecular probes are usually designed to interact with simple entities, such as purified proteins or small molecules. In practical biomedical applications and studies, however, the targets are usually much more complex. Moreover, the target molecule in many cases is unknown or difficult to identify, which makes developing biomarker probes with traditional technologies nearly impossible. Aptamer selection, on the other hand, allows researchers to evolve a molecular probe against unidentified or unknown targets by taking advantage of the repetitive competition, amplification, and counterselection strategy noted above and described in detail below. This selection process produces molecular probes capable of recognizing unknown or complex targets, such as membrane proteins of living cells, including receptors. For example, SELEX has recently been used to develop probes that specifically recognize blood cancer cells, red blood cell membranes, and endothelial cells (24–30). Therefore, compared with current biomarker-probe technologies, the cell-SELEX strategy offers unique advantages for developing molecular probes with the specificity and sensitivity required to recognize the protein modifications in infected cell membranes that occur during viral infection. Some groups have developed aptamers that target vaccinia virus and Mycobacterium tuberculosis (31, 32); however, although these aptamers can specifically recognize the infectious agent, they cannot recognize infected cells (33–36).
We therefore developed a methodology for isolating aptamers that specifically recognize vaccinia virus–infected A549 cells, which we used as the model for intact infected cell–SELEX selection. Vaccinia virus, a member of the Orthopoxvirus genus (6, 36), is closely related to both monkeypox and variola (smallpox) viruses. It is true that cells infected by the same virus may show only slight molecular differences. Nonetheless, we demonstrate that our aptamer-based probes can identify not only infected cells but also viral modifications of the cellular plasma membrane.

Materials and Methods

CELL LINES, VIRUSES, AND REAGENTS

Cell lines A549 (human lung carcinoma), Hep 3B and Hep G2 (human hepatocellular carcinoma), HeLa (human cervical adenocarcinoma), HCT 116 (human colon cancer), and CaOV3 (human ovarian cancer) were obtained from the ATCC and cultured in MEM medium (Gibco/Invitrogen) supplemented with 100 mL/L fetal bovine serum (FBS) (heat-inactivated; Gibco/Invitrogen), 30 mg/L penicillin, and 50 mg/L streptomycin (Cellgro/Mediatech). To initiate infections, we added virus to cells in culture medium that did not contain FBS. Two vaccinia virus strains, WR and IHDJ, were used in this work. A recombinant vaccinia virus WR containing the gene for green fluorescent protein (GFP) under the control of an early/late viral promoter was also used, as indicated, to facilitate monitoring of viral infections. The wash buffer contained 4.5 g/L glucose and 5 mmol/L MgCl2 in Dulbecco PBS (Sigma–Aldrich). The binding buffer used for selection was prepared by adding yeast tRNA (0.1 g/L; Sigma–Aldrich), FBS (100 mL/L), and BSA (1 g/L; Fisher) to the wash buffer; these additions were intended to reduce background binding. Trypsin was purchased from Fisher Biotech. The Taq polymerase and deoxynucleoside triphosphates used in the PCR were obtained from Takara.

VIRUS INFECTION

The cells were split and cultured for 24 h before infection. The virus was stored at −80 °C in cell culture medium and freshly thawed before infection. After thawing and dilution in infection culture medium lacking FBS, the virus was sonicated for 1 min to disperse the viral particles. The cells were washed with PBS and mixed with the sonicated virus in infection culture medium. The multiplicity of infection was 4 in all infection experiments. Infected cells were then gently agitated for 1 h to facilitate uniform adsorption of the virus. After this incubation, the virus inoculum was removed, fresh culture medium was added, and the infection was continued. After 12 h, the infected cells were washed with PBS before the selection process was started. For monitoring of selection by flow cytometry, the cells were washed 3 times with PBS and harvested by trypsin digestion for 30 s to 1 min, after which the suspended cells were immediately placed in cell culture medium and gently shaken for 1 h at 37 °C to allow them to recover.

SELEX PRIMERS AND LIBRARY

We used a Cy5-labeled 5′ primer (5′-Cy5-ATCGTCTGCTCCGTCCAATA-3′) and a biotinylated 3′ primer (5′-biotin-GCACGACCTCACACCAAA-3′) in the PCR. The SELEX library consisted of a central randomized sequence of 45 nucleotides (N45) flanked by primer-hybridization sites of 20 and 18 nucleotides (5′-ATCGTCTGCTCCGTCCAATA-N45-TTTGGTGTGAGGTCGTGC-3′). To eliminate the possibility of nonspecific amplification of random library sequences in the PCR, the primers and library sequences were carefully optimized with online software for predicting oligonucleotide secondary structure (Oligo Analyzer; Integrated DNA Technologies).
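The primer and library sequences above can be cross-checked in a few lines. The sketch below confirms that the library's 5′ flank is the forward-primer sequence and that its 3′ flank is the reverse complement of the biotinylated 3′ primer, then computes the full template length. It checks sequence consistency only and does not replace the secondary-structure prediction performed with Oligo Analyzer; the variable names are illustrative.

```python
# Sketch: consistency check of the SELEX primers and library flanks given above.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


FWD_PRIMER = "ATCGTCTGCTCCGTCCAATA"   # Cy5-labeled 5' primer
REV_PRIMER = "GCACGACCTCACACCAAA"     # biotinylated 3' primer
LIB_5_FLANK = "ATCGTCTGCTCCGTCCAATA"  # library 5' primer-hybridization site
LIB_3_FLANK = "TTTGGTGTGAGGTCGTGC"    # library 3' primer-hybridization site
N_RANDOM = 45                         # length of the randomized region

# The 5' flank is identical to the forward primer ...
assert LIB_5_FLANK == FWD_PRIMER
# ... and the reverse primer anneals to the 3' flank (reverse complement).
assert revcomp(LIB_3_FLANK) == REV_PRIMER

template_len = len(LIB_5_FLANK) + N_RANDOM + len(LIB_3_FLANK)
print(len(LIB_5_FLANK), len(LIB_3_FLANK), template_len)  # 20 18 83
```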

SELEX PROTOCOL

Selection against whole, living infected cells was performed as follows. The initial pool, containing 20 nmol of the DNA library, was dissolved in 1.0 mL binding buffer and used for the first round of selection. In subsequent selection rounds, 200 pmol of the ssDNA pool was dissolved in 400 µL binding buffer. To minimize intermolecular hybridization, we denatured the pools by heating at 95 °C for 5 min and cooled them on ice for 10 min before selection. The ssDNA pool was then incubated with infected A549 cells in a 5.0-cm-diameter cell culture dish (Corning) and shaken at 120 rpm for 60 min in a cold room (0–4 °C). After the binding incubation, the binding buffer was removed and the cells were washed with 1.0 mL wash buffer. The cells were then harvested and transferred into 400 µL water. The bound DNA was eluted by heating at 95 °C for 5 min and then desalted for PCR amplification (10–20 cycles of denaturation at 94 °C for 30 s, annealing at 60 °C for 30 s, and extension at 72 °C for 30 s). The last PCR cycle was followed by a 5-min extension at 72 °C. After 3 rounds, counter-selection was introduced to improve the specificity of the selected aptamer candidates. For counter-selection, the bound DNA was eluted in binding buffer, incubated with uninfected A549 cells, and shaken at 120 rpm for 60 min in a cold room (0–4 °C). The supernatant, containing the DNA that did not bind to uninfected cells, was desalted and then amplified by the PCR. A column containing 150 µL streptavidin-coated Sepharose beads (Amersham Bioscience/GE Healthcare) was used to prepare the Cy5-labeled sense ssDNA. The PCR product was passed through the column 3 times to capture it on the beads via the biotin–streptavidin interaction and was then denatured by slowly washing the column with 0.5 mL of 50 mmol/L NaOH, which released the sense ssDNA from the bead-bound biotinylated antisense strand into the NaOH eluate.
After elution, the ssDNA solution was passed through a desalting column to remove the NaOH, which would otherwise interfere with the subsequent PCR or binding assay. To improve the affinity and specificity of the selected aptamers, we made the washes more stringent by extending the wash time from 3 min to 10 min and increasing the number of washes from 2 to 5. After 13 rounds of selection, the enriched ssDNA pool was PCR-amplified with unlabeled primers and cloned into Escherichia coli with the TA Cloning Kit (Invitrogen). The Genome Sequencing Services Laboratory at the University of Florida sequenced a number of the cloned aptamer candidate sequences. The sequences were aligned with ClustalX software (version 1.83), and the aptamer candidates were then grouped into families according to repeated sequences.
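The input amounts in the SELEX protocol translate into working concentrations and a rough bound on how much of the N45 sequence space the starting pool can sample. The sketch below assumes the nominal amounts quoted in the text and treats every molecule in the pool as a distinct sequence, which is an upper bound on diversity. The function names are illustrative.

```python
# Rough numbers for the SELEX input amounts quoted above. Assumes nominal
# amounts and that every molecule in the pool is a distinct sequence (an
# upper bound on diversity). Function names are illustrative.

AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
RANDOM_LENGTH = 45        # N45 randomized region


def molecules(nmol: float) -> float:
    """Number of molecules in an amount given in nanomoles."""
    return nmol * 1e-9 * AVOGADRO


def concentration_nmol_per_l(nmol: float, volume_ml: float) -> float:
    """Concentration (nmol/L) of `nmol` dissolved in `volume_ml` millilitres."""
    return nmol / (volume_ml / 1000.0)


possible_sequences = 4 ** RANDOM_LENGTH      # exact integer, ~1.24e27
round1 = molecules(20.0)                     # 20-nmol initial pool
coverage = round1 / possible_sequences       # best-case fraction of sequence space

print(f"round 1: {round1:.2e} molecules at "
      f"{concentration_nmol_per_l(20.0, 1.0):.0f} nmol/L")
print(f"later rounds: 200 pmol at {concentration_nmol_per_l(0.2, 0.4):.0f} nmol/L")
print(f"possible N45 sequences: {possible_sequences:.2e}")
print(f"best-case fraction sampled: {coverage:.1e}")
```

Under these assumptions, the 20-nmol starting pool (about 1.2 × 10¹⁶ molecules at 20 µmol/L) samples at most about 10⁻¹¹ of the roughly 1.2 × 10²⁷ possible 45-nucleotide random regions. The later-round pools are used at 500 nmol/L.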

FLOW CYTOMETRIC ANALYSIS

To evaluate the enrichment of aptamer candidates during selection, we incubated the Cy5-labeled ssDNA pool with 5 × 10⁵ target cells in 200 µL binding buffer on ice for 30 min to allow the aptamer candidates to bind to their targets on the membranes of infected cells. The cells were then washed twice with 2.0 mL binding buffer and resuspended in 0.4 mL binding buffer. Cy5 and GFP fluorescence was measured with a FACScan cytometer (BD Immunocytometry Systems) by counting 30 000 events. The Cy5-labeled initial ssDNA library served as the background control in all experiments. The binding affinity of each aptamer was determined from the flow cytometry signals of infected cells incubated with the aptamer at concentrations of 0–200 nmol/L. After subtracting the mean fluorescence of the control sample, we used the ligand-binding analysis function of SigmaPlot software (Jandel Scientific) to calculate the apparent equilibrium dissociation constants (Kds) of the aptamer–cell interaction from the mean fluorescence intensities of target cells bound with aptamers (37, 38).

TRYPSIN TREATMENT OF INFECTED CELLS

At 12 h after infection, cells were washed 3 times with 2 mL PBS and then incubated with 1 mL Hanks' balanced salt solution containing 0.5 g/L trypsin and 0.53 mmol/L EDTA at 37 °C for a selected time (2–20 min). To inhibit the trypsin, we quickly mixed each sample with 200 µL FBS and placed it on ice. The treated cells were then washed with 2 mL binding buffer and used in the aptamer-binding assay, as described for the flow cytometric analysis.

APTAMER COMPETITION EXPERIMENT

In the competition experiment, unlabeled aptamer candidates at a 20-fold higher concentration were introduced together with the Cy5-labeled aptamer to determine the effect of the unlabeled aptamer on binding of the labeled aptamer. In brief, infected cells (5 × 10⁵) were incubated in 200 µL binding buffer containing 0.2 nmol unlabeled aptamer and 10 pmol Cy5-labeled aptamer. All other experimental conditions and operations were the same as described for the flow cytometric analysis.

Clinical Chemistry 55:4 (2009) 815
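The apparent Kds were computed with SigmaPlot's ligand-binding function, and the competition design rests on the same binding picture. The sketch below is not SigmaPlot's algorithm. It assumes a one-site saturation model (signal = Bmax·[L]/(Kd + [L])), a single class of sites, and no ligand depletion. For the competition estimate, it also assumes that unlabeled and labeled aptamer bind the same site with equal affinity (Ki = Kd), using the TV01 Kd of 7.3 nmol/L from Table 1. Kd is fitted by scanning a grid of Kd values and computing the best Bmax in closed form for each. The fit is demonstrated on synthetic, noise-free data, not on data from this study.

```python
# One-site (1:1) binding model used two ways: a least-squares fit of an
# apparent Kd, and a competition estimate. Assumptions (not from the paper):
# single class of sites, no ligand depletion, and, for competition, equal
# affinity of labeled and unlabeled aptamer (Ki = Kd). Names are illustrative.


def one_site(conc_nm: float, bmax: float, kd_nm: float) -> float:
    """Specific signal for 1:1 binding: Bmax * [L] / (Kd + [L])."""
    return bmax * conc_nm / (kd_nm + conc_nm)


def fit_kd(concs, signals, kd_grid):
    """Return (Bmax, Kd) minimizing squared error, scanning Kd over a grid.

    For a fixed Kd the model is linear in Bmax, so the best Bmax is
    sum(x*y) / sum(x*x) with x = [L] / (Kd + [L]).
    """
    best = None
    for kd in kd_grid:
        x = [c / (kd + c) for c in concs]
        bmax = sum(a * s for a, s in zip(x, signals)) / sum(a * a for a in x)
        rss = sum((s - bmax * a) ** 2 for a, s in zip(x, signals))
        if best is None or rss < best[0]:
            best = (rss, bmax, kd)
    return best[1], best[2]


def labeled_occupancy(labeled_nm, competitor_nm, kd_nm, ki_nm):
    """Fraction of sites occupied by the labeled ligand (competitive binding)."""
    a = labeled_nm / kd_nm
    b = competitor_nm / ki_nm
    return a / (1.0 + a + b)


# Competition design from the text: 10 pmol labeled + 0.2 nmol unlabeled in 200 uL.
volume_l = 200e-6
labeled_nm = 10e-12 / volume_l * 1e9        # ~50 nmol/L
competitor_nm = 0.2e-9 / volume_l * 1e9     # ~1000 nmol/L
excess = competitor_nm / labeled_nm         # ~20-fold

kd_tv01 = 7.3  # apparent Kd of TV01 from Table 1, nmol/L; assumed equal to Ki
without = labeled_occupancy(labeled_nm, 0.0, kd_tv01, kd_tv01)
with_comp = labeled_occupancy(labeled_nm, competitor_nm, kd_tv01, kd_tv01)
print(f"labeled {labeled_nm:.0f} nmol/L, unlabeled {competitor_nm:.0f} nmol/L "
      f"({excess:.0f}-fold excess)")
print(f"predicted labeled occupancy: {without:.2f} alone, {with_comp:.3f} with competitor")

# Demonstration fit on synthetic, noise-free data generated with Kd = 7.3 nmol/L.
concs = [0, 5, 10, 25, 50, 100, 200]
signals = [one_site(c, 1000.0, 7.3) for c in concs]
bmax_fit, kd_fit = fit_kd(concs, signals, [k / 10 for k in range(1, 501)])
print(f"fitted Bmax = {bmax_fit:.1f}, Kd = {kd_fit:.1f} nmol/L")
```

Under these assumptions, the stated amounts correspond to about 50 nmol/L labeled and 1000 nmol/L unlabeled aptamer. The model predicts that occupancy by the labeled aptamer falls from roughly 0.87 to roughly 0.05. Real flow cytometry data are noisy, so an actual analysis would fit replicate measurements and report uncertainty.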

Fig. 1. Schematic of the infected living cell–based aptamer selection. The ssDNA pool was incubated with vaccinia-infected A549 cells (target cells). After washing, the bound DNA was eluted by heating to 95 °C for 5 min and then incubated with uninfected A549 cells. The unbound DNA in the supernatant was then amplified by the PCR. The sense ssDNA was separated from the PCR products for the next round of selection or tested by flow cytometry to monitor SELEX progression. When the selected pool was sufficiently enriched, it was cloned and sequenced for aptamer identification.

Results and Discussion

INFECTED CELL–SELEX AND THE ENRICHMENT OF APTAMER CANDIDATES

The principle of the virus-infected cell–SELEX strategy is illustrated in Fig. 1. As the number of selection rounds increased, an increase in fluorescence intensity (right shift) indicated enrichment of a DNA-sequence population that specifically recognizes the infected cells (Fig. 2A). In contrast, no such right shift occurred when the enriched aptamer pools were exposed to uninfected cells (Fig. 2B). The fluorescence signal increased considerably by the seventh pool but did not increase further from the ninth pool through the 13th pool despite additional rounds of selection, implying saturation of aptamer-candidate enrichment. The highly enriched DNA pools were then cloned and sequenced.

IDENTIFICATION OF APTAMER CANDIDATES

The seventh and 13th selected pools were cloned with a TA Cloning Kit, and the individual clones were sequenced by a high-throughput genome-sequencing method. Sequences derived from the seventh pool were more diverse than those in the 13th pool, clearly showing enrichment during competitive selection. Many sequences present in the seventh pool were absent from the 13th pool, whereas members of some families increased in frequency in later pools. For example, the proportion of the TV01 sequence family increased from 4.8% to 53.7%, the TV03 family from 1.1% to 6.7%, and the TV06 family from 0.5% to 4.0%, whereas the TV02 family decreased from 10.7% to 7.4% (Table 1). Interestingly, some aptamers, such as TV01, TV02, and TV03, had shorter sequences than those of the original DNA library, indicating an unexpected sequence optimization during the selection procedure. This shortening might have been caused by nonspecific amplification and sequence preference during the PCR. When a DNA library containing a large variety of sequences is amplified by the PCR, a proportion of the amplicons are products of nonspecific amplification. If such a nonspecific product also has affinity for infected cells, some copies will be retained and amplified in the next enriched pool. Because the PCR amplifies shorter templates more efficiently than longer ones, shorter sequences with affinities for target cells similar to those of longer competitors will ultimately be enriched faster and be present as more frequent repeats in the selected pool. In addition, the folding of different aptamers may affect PCR efficiency and thus also contribute to amplification preference. In effect, the lengths of some aptamers were shortened from their original sizes during the selection process itself, a step normally performed manually after selection (29, 39). To the best of our knowledge, this report is the first to describe such auto-optimization of aptamer length during selection.

Eight aptamer sequences (TV01–TV08), representing the selected aptamer families, were chosen from the 13th enriched pool for further analysis on the basis of sequence relatedness. Their binding to infected cells was then evaluated by flow cytometry from the amount of Cy5-labeled aptamer bound to the cells. All 8 aptamer candidates showed preferential binding to vaccinia-infected A549 cells (Fig. 3; see Fig. 1 in the Data Supplement that accompanies the online version of this article at http://www.clinchem.org/content/vol55/issue4). There was also a positive correlation between GFP expression (in infected cells) and aptamer signals (Fig. 3). This result is consistent with the recombinant virus, which encodes GFP, also driving the synthesis and presentation of the aptamer target in infected cells.

Fig. 2. Flow cytometry results for selected aptamer pools with infected and uninfected A549 cells. The red curve represents nonspecific binding of the initial DNA library (Lib) to vaccinia-infected and uninfected A549 cells. (A), The selected seventh and ninth aptamer pools showed increased fluorescence intensity with infected A549 cells, indicating enhanced binding of the aptamer pools to infected cells. (B), The selected seventh and ninth pools did not show enhanced binding to uninfected A549 cells.

Table 1. Sequences and Kds of the identified aptamers.(a)

Name  In 13th selected pool, %  In 7th selected pool, %  Kd, nmol/L  Sequence
TV01  53.7                      4.8                      7.3 (1.4)   ATCG TCT GCT CCG TCC AAT A GT GCA TTG AAA CTT CTG CAT CCT CG T TTG GTG TGA GGT CGT GC
TV02  7.4                       10.7                     3.6 (0.48)  ATCG TCT GCT CCG TCC AAT A CCT GCA TAT ACA CTT TGC ATG TGG T TTG GTG TGA GGT CGT GC
TV03  6.7                       1.1                      4.1 (0.59)  ATCG TCT GCT CCG TCC AAT A GCG TGC ATT GGT TTA CTG CAT CCG TGA AAC TGG GC T TTG GTG TGA GGT CGT GC
TV04  6.0                       —                        6.5 (0.94)  ATCG TCT GCT CCG TCC AAT A AAC CTG CAT AAT TTA TAA GTC TAG ACT GCT GCA T TTG GTG TGA GGT CGT GC
TV05  4.0                       —                        4.1 (0.44)  ATCG TCT GCT CCG TCC AAT A GCC TCA CCC TGC ATA ATT TAT AGA CTA CAC TTA GGA ATC GCT GCA T TTG GTG TGA GGT CGT GC
TV06  4.0                       0.5                      6.5 (1.3)   ATCG TCT GCT CCG TCC AAT A GGA CCG ATA GGA ACC ACG GAC TGC ATG TTT CTG CAT TTG ACG TGG T TTG GTG TGA GGT CGT GC
TV07  3.4                       —                        11.7 (2.9)  ATCG TCT GCT CCG TCC AAT A TCC GAG CAA GAA CTC ATA TTG CAT TAT TTA TAG CTA CGC GCT GCA T TTG GTG TGA GGT CGT GC
TV08  2.7                       —                        2.7 (6.2)   ATCG TCT GCT CCG TCC AAT A TGA TGA CAC CTG CAT AAT TTA TAG TGA GTC TTG ATT CAC GCT GCA T TTG GTG TGA GGT CGT GC

(a) Underscored sequences represent primers, and sequences in boldface are coding areas conserved in the 8 aptamers. Percentages indicate proportions of the aptamer candidates in the seventh and 13th selected pools. No entry (—) indicates that the aptamer candidate was not present in the seventh pool. Kds are presented as the mean (SD) and are for tests with vaccinia-infected A549 cells.

Fig. 3. Identification of aptamer candidates. Flow cytometry assay of the binding of the Cy5-labeled sequences TV01, TV05, and TV07 to infected (A) and uninfected (B) A549 cells. The aptamer-candidate concentration in the binding buffer was 250 nmol/L. For infected cells, there is a positive correlation between the GFP and TV01 signals (C); for infected cells stained with the DNA library, there is no correlation between the 2 channels (D).

THE SELECTED APTAMERS CAN BIND TO DIFFERENT INFECTED CELLS

To confirm that the infection process was valid, we also infected several cell lines other than A549 (Hep 3B, Hep G2, HeLa, HCT 116, and CaOV3) with vaccinia WR and tested the infected cells with the candidate aptamers. In each case, the aptamers reacted only with infected cells (see Figs. 2 and 3 in the online Data Supplement) and not with uninfected cells (Fig. 4 in the online Data Supplement). These results suggest that the viral protein(s) targeted by the aptamers are expressed and presented similarly on the various host cells. However, the intensities and patterns of the aptamers' fluorescence signals differed somewhat among the infected cell lines. We also tested a second strain of vaccinia virus (vaccinia IHDJ). HeLa cells infected with either the IHDJ or the WR virus were tested with 2 of the aptamer candidates, TV01 and TV04. Both aptamers showed considerable binding to infected HeLa cells (Fig. 4). Again, however, the binding patterns of the aptamers differed between cells infected with the 2 viral strains. This difference might reflect slight strain-specific sequence differences in the targeted viral protein, or slightly different associations with other viral or cellular proteins, even though these viral strains are very closely related.

Fig. 4. Binding of aptamers to HeLa cells infected with 2 vaccinia strains. Flow cytometry assay of the binding of the Cy5-labeled sequences TV01 and TV04 to vaccinia WR–infected (A) and vaccinia IHDJ–infected (B) HeLa cells. The final aptamer concentration in the binding buffer was 250 nmol/L.

THE SELECTED APTAMERS BIND TO INFECTED CELLS WITH HIGH AFFINITY AND SPECIFICITY

For a molecular probe, selectivity and affinity are key requirements. The binding affinities of the selected aptamers for infected cells were evaluated by measuring apparent Kds with flow cytometry. As shown in Fig. 5 and in Fig. 5 in the online Data Supplement, all 8 aptamers (TV01–TV08) bind to infected A549 cells with high affinity, with mean (SD) apparent Kds ranging from 2.7 (0.6) nmol/L to 11.7 (2.9) nmol/L (Table 1). The high affinities and specificities of the selected aptamers reflect our selection strategy, which applied increasingly stringent conditions in each successive round to select against sequences with relatively weak binding affinities. Moreover, counter-selection efficiently eliminated aptamer candidates that bind to surface proteins of uninfected cells. Consequently, only the aptamer candidates with the highest affinities and selectivities for proteins on the surfaces of infected cells are likely to emerge from the cell-SELEX process.

THE BINDING OF APTAMER TV01 COMPETES WITH OTHER APTAMERS

One of the most important questions is whether the selected aptamers recognize one target or multiple targets. According to the sequence alignment, all 8 aptamers share a common sequence, TGCAT (Table 1), and all have similar binding affinities and profiles with respect to infected A549 cells (Fig. 3; see Fig. 1 in the online Data Supplement). These findings suggest that the aptamers might share the same target. To determine whether the aptamer targets were closely related or identical, we carried out a binding-competition experiment to observe the effects of the 8 unlabeled aptamers (TV01–TV08) on the binding of Cy5-labeled TV01. As expected, a 20-fold excess of unlabeled TV01 (the control) effectively decreased the binding signal of Cy5-labeled TV01. This strong competition (see Fig. 6 in the online Data Supplement) implies that unlabeled and labeled TV01 have the same target. Similar results were obtained when unlabeled TV02 through TV08 were used as competitors (Fig. 6 in the online Data Supplement). These results suggest that aptamers TV01–TV08 recognize the same target. Studies are ongoing to identify the target molecules of the selected aptamers.

Fig. 5. Kd determination for the aptamers. The Kds of aptamers TV01 (A), TV02 (B), TV07 (C), and TV08 (D) were determined by flow cytometry. All aptamers showed high affinity for vaccinia-infected A549 cells, with apparent Kd values in the nanomolar range.

THE TARGET OF THE APTAMERS ON INFECTED CELLS IS MOST LIKELY A PROTEIN

According to the current literature, there are many potential targets for aptamers, including proteins, sugars, lipids, and other molecules commonly associated with cell surfaces. In our previous work developing aptamers against cancer cells, the selected aptamers usually bound to cell membrane proteins. This binding was demonstrated by proteinase-digestion experiments and confirmed by subsequent biomarker discovery (29, 30). To determine whether these 8 aptamers recognize cell surface proteins, we treated infected A549 cells with trypsin for varying lengths of time before incubating them with the aptamers. After 12 min of trypsin treatment, aptamers TV01–TV04 showed markedly decreased binding (Fig. 7 in the online Data Supplement). We also carried out this trypsin-digestion experiment with other vaccinia-infected cell lines, including Hep 3B, Hep G2, and HeLa. Trypsin digestion also reduced the binding of aptamers TV01–TV04 to these infected cell lines (Fig. 2 in the online Data Supplement). After 20 min of digestion, the aptamers showed little binding to infected Hep 3B cells, although some minimal binding to infected Hep G2 and HeLa cells remained. These results indicate that the binding targets of the aptamers are proteins susceptible to protease digestion.

Aptamers are gaining attention as molecular probes and have demonstrated their potential in many diagnostic and therapeutic applications, including the ability to target whole living cells or whole microbes.


The selection of aptamers that recognize infected cells could provide both virologists and clinicians with a much-needed tool for molecular diagnostics and for investigating potential treatments. This method of generating probes can also be applied to nonantigenic viral targets against which antibodies cannot easily be raised. Developing molecular probes that recognize virus-infected cells by the whole living cell–SELEX strategy may increase our understanding of the molecular signatures of infected cells and could lead to improved diagnostic, therapeutic, and drug-delivery applications.

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest:

Employment or Leadership: None declared.
Consultant or Advisory Role: None declared.
Stock Ownership: None declared.
Honoraria: None declared.
Research Funding: This work was supported by NIH R21CA122648 and NIH R01 GM079359 grants, an NIH SERCEB 1-U54-AI-057157 grant, and an NSF NIRT grant.
Expert Testimony: None declared.

Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

Acknowledgments: We are grateful for helpful discussions with Kwame Sefah and Ying Li on this work.

References

1. Bayry J, Kaveri SV. Modelling infectious diseases: viral complexity. Nat Rev Microbiol 2006;4:637–9.
2. Wiles S, Hanage WP, Frankel G, Robertson B. Modelling infectious disease – time to think outside the box? Nat Rev Microbiol 2006;4:307–12.
3. Inman RD. Mechanisms of disease: infection and spondyloarthritis. Nat Clin Pract Rheum 2006;2:163–9.
4. Hultgren S, Elam J. Infectious disease and women's health. Nat Rev Microbiol 2008;6:254.
5. Abu-Raddad LJ, Patnaik P, Kublin JG. Dual infection with HIV and malaria fuels the spread of both diseases in sub-Saharan Africa. Science 2006;314:1603–6.
6. Adams MM, Rice AD, Moyer RW. Rabbitpox virus and vaccinia virus infection of rabbits as a model for human smallpox. J Virol 2007;81:11084–95.
7. Lewis K. Persister cells, dormancy and infectious disease. Nat Rev Microbiol 2007;5:48–56.
8. Vasileva L, Dumanova L. Conformational changes in the membranes of cells infected with Newcastle disease virus. Vet Med Nauki 1979;16:8–12. [Bulgarian]
9. Hestdal K, Aukrust P, Muller F, Lien E, Bjerkeli V, Espevik T, Froland SS. Dysregulation of membrane-bound tumor necrosis factor-α and tumor necrosis factor receptors on mononuclear cells in human immunodeficiency virus type 1 infection: low percentage of p75-tumor necrosis factor receptor positive cells in patients with advanced disease and high viral load. Blood 1997;90:2670–9.
10. Ahmed M, Schidlovsky G. Detection of virus-associated antigen on membranes of cells productively infected with Marek's disease herpesvirus. Cancer Res 1972;32:187–92.
11. Wilson DS, Szostak JW. In vitro selection of functional nucleic acids. Annu Rev Biochem 1999;68:611–47.
12. Osborne SE, Ellington AD. Nucleic acid selection and the challenge of combinatorial chemistry. Chem Rev 1997;97:349–70.
13. Rupcich N, Chiuman W, Nutiu R, Mei S, Flora KK, Li YF, Brennan JD. Quenching of fluorophore-labeled DNA oligonucleotides by divalent metal ions: implications for selection, design, and applications of signaling aptamers and signaling deoxyribozymes. J Am Chem Soc 2006;128:780–90.
14. Breaker RR. Natural and engineered nucleic acids as tools to explore biology. Nature 2004;432:838–45.
15. Ellington AD, Szostak JW. In vitro selection of RNA molecules that bind specific ligands. Nature 1990;346:818–22.
16. Tuerk C, Gold L. Systematic evolution of ligands by exponential enrichment: RNA ligands to bacteriophage T4 DNA polymerase. Science 1990;249:505–10.
17. Fang XH, Sen A, Vicens M, Tan WH. Synthetic DNA aptamers to detect protein molecular variants in a high-throughput fluorescence quenching assay. Chembiochem 2003;4:829–34.
18. Guo K, Wendel HP, Scheideler L, Ziemer G, Scheule AM. Aptamer-based capture molecules as a novel coating strategy to promote cell adhesion. J Cell Mol Med 2005;9:731–6.
19. Navani NK, Li YF. Nucleic acid aptamers and enzymes as sensors. Curr Opin Chem Biol 2006;10:272–81.
20. Lee JF, Stovall GM, Ellington AD. Aptamer therapeutics advance. Curr Opin Chem Biol 2006;10:282–9.
21. Jayasena SD. Aptamers: an emerging class of molecules that rival antibodies in diagnostics. Clin Chem 1999;45:1628–50.
22. Mallikaratchy P, Tang ZW, Tan WH. Cell specific aptamer-photosensitizer conjugates as a molecular tool in photodynamic therapy. ChemMedChem 2008;3:425–8.
23. Smith JE, Medley CD, Tang ZW, Shangguan D, Lofton C, Tan WH. Aptamer-conjugated nanoparticles for the collection and detection of multiple cancer cells. Anal Chem 2007;79:3075–82.
24. Blank M, Weinschenk T, Priemer M, Schluesener H. Systematic evolution of a DNA aptamer binding to rat brain tumor microvessels. Selective targeting of endothelial regulatory protein pigpen. J Biol Chem 2001;276:16464–8.
25. Daniels DA, Chen H, Hicke BJ, Swiderek KM, Gold L. A tenascin-C aptamer identified by tumor cell SELEX: systematic evolution of ligands by exponential enrichment. Proc Natl Acad Sci U S A 2003;100:15416–21.
26. Morris KN, Jensen KB, Julin CM, Weil M, Gold L. High affinity ligands from in vitro selection: complex targets. Proc Natl Acad Sci U S A 1998;95:2902–7.
27. Wang CL, Zhang M, Yang GA, Zhang DJ, Ding HM, Wang HX, et al. Single-stranded DNA aptamers that bind differentiated but not parental cells: subtractive systematic evolution of ligands by exponential enrichment. J Biotechnol 2003;102:15–22.
28. Shangguan D, Li Y, Tang Z, Cao ZC, Chen HW, Mallikaratchy P, et al. Aptamers evolved from live cells as effective molecular probes for cancer study. Proc Natl Acad Sci U S A 2006;103:11838–43.
29. Tang ZW, Shangguan D, Wang KM, Shi H, Sefah K, Mallikaratchy P, et al. Selection of aptamers for molecular recognition and characterization of cancer cells. Anal Chem 2007;79:4900–7.
30. Mallikaratchy P, Tang ZW, Kwame S, Meng L, Shangguan DH, Tan WH. Aptamer directly evolved from live cells recognizes membrane bound immunoglobin heavy mu chain in Burkitt's lymphoma cells. Mol Cell Proteomics 2007;6:2230–8.
31. Nitsche A, Kurth A, Dunkhorst A, Pänke O, Sielaff H, Junge W, et al. One-step selection of Vaccinia virus-binding DNA aptamers by MonoLEX. BMC Biotechnol 2007;7:48.
32. Chen F, Zhou J, Luo FL, Mohammed AB, Zhang XL. Aptamer from whole-bacterium SELEX as new therapeutic reagent against virulent Mycobacterium tuberculosis. Biochem Biophys Res Commun 2007;357:743–8.
33. Rotz LD, Hughes JM. Advances in detecting and responding to threats from bioterrorism and emerging infectious disease. Nat Med 2004;10(Suppl):S130–6.
34. Morens DM, Folkers GK, Fauci AS. The challenge of emerging and re-emerging infectious diseases. Nature 2004;430:242–9.
35. Antia R, Regoes RR, Koella JC, Bergstrom CT. The role of evolution in the emergence of infectious diseases. Nature 2003;426:658–61.
36. Turner PC, Moyer RW. The cowpox virus fusion regulator proteins SPI-3 and hemagglutinin interact in infected and uninfected cells. Virology 2006;347:88–99.
37. Davis KA, Abrams B, Lin Y, Jayasena SD. Use of a high affinity DNA ligand in flow cytometry. Nucleic Acids Res 1996;24:702–6.
38. Davis KA, Lin Y, Abrams B, Jayasena SD. Staining of cell surface human CD4 with 2′-F-pyrimidine-containing RNA aptamers for flow cytometry. Nucleic Acids Res 1998;26:3915–24.
39. Shangguan D, Tang ZW, Mallikaratchy P, Xiao ZY, Tan WH. Optimization and modifications of aptamers selected from live cancer cell lines. Chembiochem 2007;8:603–6.

Brief Communication

Clinical Chemistry 55:4 823–826 (2009)

A Multiplex Assay for Detecting Genetic Variations in CYP2C9, VKORC1, and GGCX Involved in Warfarin Metabolism

Alex J. Rai,¹,²* Nitin Udar,³ Rana Saad,⁴ and Martin Fleisher¹

Departments of ¹Clinical Laboratories and ²Pathology, Memorial Sloan-Kettering Cancer Center, New York, NY; ³Beckman Coulter, Fullerton, CA; ⁴Department of Pathology, Baylor University Medical Center, Dallas, TX; *address correspondence to this author at: Department of Clinical Laboratories, Memorial Sloan-Kettering Cancer Center, 1275 York Avenue, Box 88, New York, NY 10065. Fax 212-717-3397.

BACKGROUND: Patients differ in their responses to warfarin, which is commonly prescribed to treat thromboembolic events. Genetic variations in the cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9), vitamin K epoxide reductase complex, subunit 1 (VKORC1), and gamma-glutamyl carboxylase (GGCX) genes have been shown to contribute to impaired metabolism of warfarin.

METHODS: We designed a custom multiplex single-nucleotide polymorphism (SNP) panel to interrogate the CYP2C9 *2 and *3, VKORC1 (−1639G>A), and GGCX (1181T>G) alleles simultaneously in a single sample by single-base extension and capillary electrophoresis after genomic DNA extraction and PCR amplification.

RESULTS: Our assay correctly detected the various genotypes in known controls and in 24 unknown samples, and the results were 100% concordant with sequencing results.

CONCLUSIONS: Our multiplexed SNP panel can be used successfully to genotype patient blood samples. The results can be combined with other clinical parameters in a warfarin-dosing algorithm. These data provide proof of principle for multiplexed SNP analysis by rapid single-base extension and capillary electrophoresis and warrant additional validation in a larger cohort of patient samples.

The anticoagulant warfarin was prescribed to more than one million persons in the US in 2006. Warfarin effectively prevents coagulation and is used to treat patients who are at increased risk for thromboembolic events (1). Individual responses to warfarin therapy can vary greatly, and these differing responses have a genetic basis that has been recognized for many years (2, 3).

Pharmacogenetics is the study of genetic differences and their effects on drug metabolism. Using pharmacogenetics to identify individuals who carry polymorphisms associated with altered warfarin responses, and adjusting their warfarin doses accordingly, is expected to confer substantial benefit on patients and on the healthcare system overall. Patient safety can be improved and healthcare costs reduced by shortening the time required to achieve a stable international normalized ratio, which decreases the risk of adverse events and, for some patients, the length of hospitalization. The importance of this initiative is underscored by its inclusion in the Critical Path Initiative of the US Food and Drug Administration (4) as an example of patient-tailored therapy. In addition, the US Food and Drug Administration recently mandated a warning label on warfarin packaging recommending pharmacogenetic testing for patients starting such therapy (5).

Studies of heterogeneity in patient sensitivity to warfarin have identified at least 2 genes that contribute to differential responses in patients. These genes, cytochrome P450, family 2, subfamily C, polypeptide 9 (CYP2C9)⁵ and vitamin K epoxide reductase subunit protein 1 (VKORC1), contribute approximately 60% of this variability (6, 7). There are 2 clinically important alleles of CYP2C9 [2C9*2 (430C>T) and 2C9*3 (1075A>C)] and 1 of VKORC1 (−1639G>A) with demonstrated effects on warfarin metabolism (4). In addition, a recently described polymorphism in the gamma-glutamyl carboxylase (GGCX) gene, 1181T>G, has demonstrated a modest yet significant effect on warfarin metabolism (8). Because the GGCX gene product acts immediately downstream of the VKORC1 gene product in warfarin metabolism and its genetic variation contributes mildly to the warfarin response, we included this variant in our panel.
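To make the 4 target variants concrete, the sketch below encodes them as a small lookup table and classifies a diploid call at each site by counting copies of the variant base. It is a hypothetical illustration, not the authors' software. The names PanelVariant and classify are invented here. Base calls are assumed to be already expressed on the strand used for the allele designations; an extension product read from the antisense strand would first need to be complemented. Dosing interpretation is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PanelVariant:
    gene: str
    allele: str      # designation as written in the text
    reference: str   # wild-type base at the site
    variant: str     # variant base at the site


# The four sites named in the text (allele designations as given there).
PANEL = (
    PanelVariant("CYP2C9", "*2 (430C>T)", "C", "T"),
    PanelVariant("CYP2C9", "*3 (1075A>C)", "A", "C"),
    PanelVariant("VKORC1", "-1639G>A", "G", "A"),
    PanelVariant("GGCX", "1181T>G", "T", "G"),
)

GENOTYPE_LABELS = ("wild type", "heterozygous", "homozygous variant")


def classify(site: PanelVariant, alleles: tuple[str, str]) -> str:
    """Classify a diploid call by counting copies of the variant base."""
    for base in alleles:
        if base not in (site.reference, site.variant):
            raise ValueError(f"unexpected base {base!r} at {site.gene} {site.allele}")
    n_variant = sum(base == site.variant for base in alleles)
    return GENOTYPE_LABELS[n_variant]


if __name__ == "__main__":
    # Hypothetical base calls for one sample; not data from the study.
    example_calls = [("C", "T"), ("A", "A"), ("A", "A"), ("T", "G")]
    for site, alleles in zip(PANEL, example_calls):
        print(f"{site.gene} {site.allele}: {classify(site, alleles)}")
```

Rejecting unexpected bases, instead of silently counting them as reference, makes a failed or off-target extension visible instead of turning it into a spurious wild-type call.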
We sought to design a multiplexed assay that interrogates these 4 single-nucleotide polymorphisms (SNPs) in a single sample. PCR primers for the relevant gene fragments were used for amplification in singleplex or multiplex format. SNP interrogation oligonucleotides were then used to query the allele of interest, with the antisense strand interrogated for all 4 SNPs. Unlabeled oligonucleotides of different lengths were designed to anneal immediately adjacent to each SNP and were extended by a single base with a fluorescently labeled chain-terminating nucleotide. The entire reaction was performed in solution. In the final step, the multiplexed SNP reaction products were separated by capillary electrophoresis, and a 10–90-bp window was visualized. Because each of the 4 terminating bases carries a different-color fluorophore, the allele at each SNP could be read directly from peak size and color.

⁵ Human genes: CYP2C9, cytochrome P450, family 2, subfamily C, polypeptide 9; VKORC1, vitamin K epoxide reductase complex, subunit 1; GGCX, gamma-glutamyl carboxylase.

Table 1. DNA sequences (5′→3′) of all PCR primers and single-nucleotide polymorphism (SNP) oligonucleotides used in the assay.

Name          Sequence (5′→3′)                                       Tm, °Cᵃ  Expected size, nt  Observed size, nt  Application
PCR primers
2C9*2-f       TTGCCTGGGATCTCCCTCCTAGTTTCG                            63       278ᵇ               –                  PCR
2C9*2-r       TGGCCACCCCTGAAATGTTTCCAAGAA                            60                          –                  PCR
2C9*3-f       TTGCTACAACAAATGTGCCATTTTTCTCCTT                        58       338ᵇ               –                  PCR
2C9*3-r       CACCCGGTGATGGTAGAGGTTTAAAAATGAT                        60                          –                  PCR
VKORC1-f      CGCCAGAGGAAGAGAGTTCCCAGAAGG                            64       462ᵇ               –                  PCR
VKORC1-r      CTGCCTCCAGCATTGCCCTGACACCTA                            64                          –                  PCR
GGCX-f        TGGGAGCTGCCTTCACCCTGCTCTACC                            66       668ᵇ               –                  PCR
GGCX-r        AATGGCACAACGAGATCCCTGCCTGCT                            63                          –                  PCR
SNP primers
2C9*2-SNP-f   GGATGGGGAAGAGGAGCATTGAGGAC                             63       27                 N/A                Sequencing
2C9*2-SNP-r   GGCAGCGGGCTTCCTCTTGAACAC                               63       25                 24                 SNP
2C9*3-SNP-f   AAAAAATGCTGTGGTGCACGAGGTCCAGAGATAC                     64       35                 N/A                Sequencing
2C9*3-SNP-r   AAAAAAAAAGGCAGGCTGGTGGGGAGAAGGTCAA                     64       35                 31                 SNP
VKORC1-SNP-r  AAAAAAAAAAAAAGAGAAGACCTGAAAAACAACCATTGGCC              62       42                 34                 SNP/sequencing
GGCX-SNP-r    AAAAAAAAAAAAAAAAAAAAAAGCACCATCATGTCCCAGGAATAGCCATAC    64       52                 48                 SNP/sequencing

ᵃ Melting temperature (Tm) was calculated with the Oligo Calc: Oligonucleotide Properties Calculator (http://www.basic.northwestern.edu/biotools/oligocalc.html).
ᵇ Expected amplicon size (bp) for the forward/reverse primer pair.

Genomic DNA was isolated from whole blood with the QIAamp DNA Mini Kit (Qiagen) according to the manufacturer's instructions. Primer3 software (http://frodo.wi.mit.edu) was used to design primers that amplify 200–700-bp fragments surrounding each SNP of interest. The PCR primers are listed in Table 1; all 4 fragments were amplified simultaneously in a single multiplex reaction. In a total reaction volume of 40 µL, the components were as follows: 50 nmol/L of each forward and reverse primer (final concentration), 200 nmol/L dNTPs, 100 ng genomic DNA, reaction buffer supplied as a 10× stock [100 mmol/L Tris-HCl (pH 8.6), 500 mmol/L KCl, 15 mmol/L MgCl2], nuclease-free water, and 0.5 µL Taq polymerase (5 U/µL; USB). Samples were subjected to the following cycling parameters: initial denaturation at 95 °C for 10 min; 35 cycles of 94 °C for 30 s, 55 °C for 30 s, and 68 °C for 1 min; and a final 4 °C hold. Each primer set amplified a single fragment, and the multiplex reaction produced 4 bands that comigrated with the corresponding singleplex products (Fig. 1).

The SNP oligonucleotide extension reaction was performed in a 10-µL reaction volume comprising 4 µL SNPStart MasterMix, 1 µL nuclease-free PCR-grade water, each SNP interrogation primer at a final concentration of 5 µmol/L (from a 100-µmol/L mix), and 2 µL PCR product. The SNP interrogation oligonucleotides differ in length (Table 1) and, after capillary electrophoresis, resolved as distinct peaks of the appropriate size (Fig. 1). Our multiplexed SNP interrogation assay detected these 4 alleles simultaneously in a single reaction and comprised the following steps: (a) PCR amplification of the target gene fragments and removal of unincorporated deoxynucleotides and primers with exonuclease–shrimp alkaline phosphatase (ExoSAP-IT, GE Healthcare); (b) inactivation of the exonuclease–shrimp alkaline phosphatase; (c) hybridization of SNP oligonucleotides whose 3′ end lies immediately adjacent to the SNP of interest, and single-base extension to incorporate a fluorophore-specific chain-terminating nucleotide with the SNPStart Kit (Beckman Coulter), followed by removal of excess dye with additional shrimp alkaline phosphatase (SAP, GE Healthcare); and (d) capillary electrophoresis separation of the fragments on a GeXP genetic analysis system (Beckman Coulter).
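Two numeric columns of Table 1 can be checked from the sequences alone. The sketch below assumes, since the paper names only the calculator, that the Tm values came from OligoCalc's "basic" formula for oligonucleotides longer than 13 nt, Tm = 64.9 + 41 × (G + C − 16.4)/N. Under that assumption it reproduces the tabulated Tm for the primers checked here. It also encodes the observation that each expected extension-product size in Table 1 equals the oligonucleotide length plus the single terminator added during extension. The function names are illustrative only.

```python
# Sketch under stated assumptions; not the authors' software.

def basic_tm(seq: str) -> float:
    """Basic Tm in deg C: 64.9 + 41 * (G + C - 16.4) / N (assumed OligoCalc formula, N > 13)."""
    seq = seq.upper()
    if len(seq) <= 13:
        raise ValueError("this formula is only used for oligos longer than 13 nt")
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def extension_product_length(oligo: str) -> int:
    """Single-base extension adds one labeled terminator to the 3' end."""
    return len(oligo) + 1


# SNP interrogation oligonucleotides from Table 1; 5' poly(A) runs written as "A" * n.
SNP_OLIGOS = {
    "2C9*2-SNP-r": "GGCAGCGGGCTTCCTCTTGAACAC",
    "2C9*3-SNP-r": "A" * 9 + "GGCAGGCTGGTGGGGAGAAGGTCAA",
    "VKORC1-SNP-r": "A" * 13 + "GAGAAGACCTGAAAAACAACCATTGGCC",
    "GGCX-SNP-r": "A" * 22 + "GCACCATCATGTCCCAGGAATAGCCATAC",
}

if __name__ == "__main__":
    print("2C9*2-f Tm:", round(basic_tm("TTGCCTGGGATCTCCCTCCTAGTTTCG")))
    for name, seq in SNP_OLIGOS.items():
        print(name, len(seq), "nt ->", extension_product_length(seq), "nt after extension")
```

For 2C9*2-f this gives 63 °C, matching Table 1. The four extension products come out at 25, 35, 42, and 52 nt, matching the expected-size column. The observed sizes (24, 31, 34, and 48 nt) differ from the calculated ones, so it is the observed mobility that identifies each locus on the GeXP trace.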


Fig. 1. Representative electropherogram with the following genotype: CYP2C9 heterozygous for 2C9*2 and wild-type for 2C9*3, VKORC1 homozygous mutant, and GGCX wild-type. Peaks at 13 and 88 bp are size standards. The other peaks correspond to the 4 SNPs being interrogated and are identified by peak size and color (A, red; C, black; G, green; T, blue). Inset shows singleplex and multiplexed PCR products run on a 2% agarose gel: lane 1, 100-bp DNA ladder; lane 2, 2C9*2 fragment amplification; lane 3, 2C9*3 fragment amplification; lane 4, VKORC1 fragment amplification; lane 5, GGCX fragment amplification; lane 6, multiplexed PCR reaction with all 4 products.
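The figure legend states the decoding rule: each locus is identified by peak size and each allele by dye color. The following is a hypothetical sketch of that rule. It takes the observed sizes from Table 1 and the dye-to-base map from the Fig. 1 legend. The 1.5-nt tolerance is an assumption. So is the `report_complement` switch, which assumes that each interrogation oligonucleotide anneals to the sense strand, so that the incorporated base is the complement of the sense-strand allele. The paper says only that the antisense strand was interrogated, so check the orientation per locus before relying on this.

```python
# Hypothetical peak-decoding sketch; tolerance and strand handling are assumptions.
DYE_TO_BASE = {"red": "A", "black": "C", "green": "G", "blue": "T"}  # Fig. 1 legend
COMPLEMENT = str.maketrans("ACGT", "TGCA")
LOCUS_BY_SIZE = {24: "2C9*2", 31: "2C9*3", 34: "VKORC1", 48: "GGCX"}  # observed sizes, Table 1


def call_genotypes(peaks, tolerance=1.5, report_complement=True):
    """peaks: iterable of (size_nt, dye_color). Returns {locus: 'X/Y'}."""
    bases_by_locus = {}
    for size, color in peaks:
        nearest = min(LOCUS_BY_SIZE, key=lambda s: abs(s - size))
        if abs(nearest - size) > tolerance:
            continue  # e.g., the 13- and 88-bp size standards
        base = DYE_TO_BASE[color]
        if report_complement:
            base = base.translate(COMPLEMENT)  # incorporated base -> sense-strand allele
        bases_by_locus.setdefault(LOCUS_BY_SIZE[nearest], set()).add(base)

    calls = {}
    for locus, bases in bases_by_locus.items():
        alleles = sorted(bases)
        if len(alleles) == 1:
            alleles = alleles * 2  # a single color at a locus -> homozygous
        calls[locus] = "/".join(alleles)
    return calls


if __name__ == "__main__":
    # Hypothetical peak list, not read from Fig. 1.
    peaks = [(13.1, "blue"), (24.2, "green"), (24.3, "red"),
             (31.0, "blue"), (34.1, "blue"), (88.0, "blue")]
    print(call_genotypes(peaks))
```

A locus with no peak (GGCX in this made-up list) simply produces no call. A production implementation would flag that case as a failed reaction rather than silently omitting it.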

We tested samples containing the polymorphisms of interest (2C9*2, 2C9*3, VKORC1, and GGCX) to confirm correct genotyping, and a single peak at the expected size was detected. We assayed all samples with this reaction and compared the results with those of automated sequencing performed on an ABI 3730 (University of California at Los Angeles Sequencing Facility) with the appropriate primers (Table 1). We tested a set of known positive samples and a set of 24 unknown samples. The GGCX SNP has a low allele frequency and therefore, as expected, was not detected in our small cohort. After extraction of genomic DNA, the assay can be completed within a 5-h turnaround time with minimal hands-on effort. We observed 100% concordance on 24 samples between our assay and traditional DNA sequencing (see Supplemental Table 1 in the Data Supplement that accompanies the online version of this Brief Communication at http://www.clinchem.org/content/vol55/issue4). This assay can be used for high-throughput screening of patient samples, allowing two 96-well plates to be analyzed in a single overnight run. It can be used to assess a patient's genotype for each of these alleles. Because these alleles have demonstrated clinical significance for warfarin dosing, it is important to identify patients with these polymorphisms who will be enrolled in warfarin-containing regimens. Genotyping patients before therapy is initiated, or within the first few days of treatment, would help determine the appropriate drug dose and improve the likelihood of achieving a stable international normalized ratio, thus reducing adverse events. Our multiplex SNP panel can be used as a stand-alone test for patients starting warfarin therapy, or its results can be combined in a dosing algorithm with additional parameters such as weight, height, age, and sex to provide dosing recommendations (9). Our data indicate that this assay warrants further evaluation in larger cohorts of patient samples to correlate these polymorphisms with warfarin maintenance-dose requirements.
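The comparison with sequencing reduces to counting matching genotype calls. The sketch below shows one way to do that, treating "C/T" and "T/C" as the same call. It also adds a standard statistical aside that is not part of the paper. When all n comparisons agree, the lower limit of the two-sided exact (Clopper–Pearson) confidence interval for the true concordance has the closed form (α/2)^(1/n). Treating each of the 24 samples as one comparison, 24/24 remains compatible with a true concordance of roughly 86% at the 95% level. This is one way to quantify the authors' call for larger cohorts. The data structures and function names are hypothetical.

```python
# Hypothetical concordance sketch; data layout is an assumption, not the authors' format.

def normalize(genotype: str) -> str:
    """Order alleles so that 'T/C' and 'C/T' compare equal."""
    return "/".join(sorted(genotype.split("/")))


def concordance(assay: dict, sequencing: dict):
    """assay, sequencing: {sample_id: {locus: 'X/Y'}}.
    Returns (agreeing, compared) over sample/locus pairs present in both."""
    pairs = [(s, locus) for s in assay if s in sequencing
             for locus in assay[s] if locus in sequencing[s]]
    agree = sum(normalize(assay[s][locus]) == normalize(sequencing[s][locus])
                for s, locus in pairs)
    return agree, len(pairs)


def exact_lower_bound_all_agree(n: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower limit when all n comparisons agree: (alpha/2) ** (1/n)."""
    return (alpha / 2) ** (1.0 / n)


if __name__ == "__main__":
    print(f"{exact_lower_bound_all_agree(24):.3f}")
```

When some comparisons disagree, the closed form no longer applies and a beta-distribution quantile is needed instead (for example from a statistics library).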

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: N. Udar, Beckman Coulter. Consultant or Advisory Role: None declared. Stock Ownership: N. Udar, Beckman Coulter. Honoraria: None declared. Research Funding: A. J. Rai, Beckman Coulter. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Wadelius M, Pirmohamed M. Pharmacogenetics of warfarin: current status and future challenges. Pharmacogenomics J 2007;7:99–111.
2. El Rouby S, Mestres CA, LaDuca FM, Zucker ML. Racial and ethnic differences in warfarin response. J Heart Valve Dis 2004;13:15–21.
3. Kamali F. Genetic influences on the response to warfarin. Curr Opin Hematol 2006;13:357–61.
4. US Food and Drug Administration. Critical Path Initiative: warfarin dosing. http://69.20.19.211/oc/initiatives/criticalpath/warfarin.html (Accessed February 2009).
5. US Food and Drug Administration. New labeling information for warfarin (marketed as Coumadin). http://www.fda.gov/cder/drug/infopage/warfarin/ (Accessed February 2009).
6. Herman D, Peternel P, Stegnar M, Breskvar K, Dolzan V. The influence of sequence variations in factor VII, gamma-glutamyl carboxylase and vitamin K epoxide reductase complex genes on warfarin dose requirement. Thromb Haemost 2006;95:782–7.
7. Wadelius M, Chen LY, Downes K, Ghori J, Hunt S, Eriksson N, et al. Common VKORC1 and GGCX polymorphisms associated with warfarin dose. Pharmacogenomics J 2005;5:262–70.
8. Rieder MJ, Reiner AP, Rettie AE. Gamma-glutamyl carboxylase (GGCX) tagSNPs have limited utility for predicting warfarin maintenance dose. J Thromb Haemost 2007;5:2227–34.
9. Zhu Y, Shennan M, Reynolds KK, Johnson NA, Herrnberger MR, Valdes R Jr, et al. Estimation of warfarin maintenance dose based on VKORC1 (−1639 G>A) and CYP2C9 genotypes. Clin Chem 2007;53:1199–205.

Previously published online at DOI: 10.1373/clinchem.2008.118497

Clinical Case Study

Clinical Chemistry 55:4 827–832 (2009)

Genetic Testing for Developmental Delay: Keep Searching for an Answer
David T. Miller,1,2,4* Yiping Shen,1,4 David J. Harris,2,4 Bai-Lin Wu,1,4 and Magdi M. Sobeih3,4

CASE
A 6-year-old girl of Irish, English, and French ancestry was referred to a pediatric neurologist for evaluation of developmental delay. She presented with expressive language delay and articulation errors. She did not speak in phrases until age 3, and formal testing at age 5 years 6 months showed a language equivalent of 3 years 4 months (Clinical Evaluation of Language Fundamentals–Preschool) and an IQ of 64 (Wechsler Preschool & Primary Scale of Intelligence). Gross motor development was also delayed; she first walked at age 17–18 months. She never had any developmental regression. There was a family history of learning disability in her mother and a maternal uncle, and a maternal first cousin once removed was born with myelomeningocele. She was delivered at term after an uncomplicated pregnancy conceived by in vitro fertilization. At 10 days of age, she was diagnosed with atrioventricular (A-V)⁵ canal malformation and coarctation of the aorta. She underwent surgical repair of the coarctation at age 10 days and of the A-V canal defect at age 4 months. She had a bifid uvula, a finding often associated with a submucous cleft palate. A modified barium swallow demonstrated a poorly coordinated swallow reflex leading to poor feeding and aspiration. As an infant, she had true vocal cord paralysis, believed to be a complication of intubation. She had gastroesophageal reflux disease, treated with ranitidine (Zantac), and was diagnosed with mild vesicoureteral reflux after a urinary tract infection at age 7 months. She has not had any episodes suggestive of seizures or any features of autism other than language delay.

1 Department of Laboratory Medicine, 2 Division of Genetics, and 3 Neurolinguistics Clinic/Behavioral Neurology in Department of Neurology, Children's Hospital Boston, Boston, MA; 4 Harvard Medical School, Boston, MA. * Address correspondence to this author at: Department of Laboratory Medicine, Children's Hospital Boston, 300 Longwood Ave., Boston, MA 02115. Fax: 617-730-0338; e-mail [email protected]. Received October 16, 2008; accepted January 5, 2009. DOI: 10.1373/clinchem.2008.119438

⁵ Nonstandard abbreviations: A-V, atrioventricular; FISH, fluorescence in situ hybridization; aCGH, array comparative genomic hybridization; BAC, bacterial artificial chromosome; ST-FISH, subtelomere FISH; MLPA, multiplex ligation-dependent probe amplification; NAHR, nonallelic homologous recombination.

Notable physical exam features include widely spaced eyes (hypertelorism), a bulbous nasal tip, a high arched palate, and fifth finger clinodactyly. Her neurologist ordered an MRI that showed mild thinning of the corpus callosum with prominence of the lateral and third ventricles, all nonspecific findings. Multiple genetic tests were ordered during infancy to determine the cause of her cardiac anomalies and developmental delays. The history of A-V canal malformation and aortic coarctation raised suspicion for microdeletion of chromosome 22q11.2, also called velocardiofacial syndrome. Fluorescence in situ hybridization (FISH) for chromosome 22q11.2 was normal. G-banded karyotype, PTPN11 gene sequencing for Noonan syndrome, and fragile X DNA testing results were also normal. At age 3 years, array comparative genomic hybridization (aCGH) was ordered from an outside laboratory. This whole-genome array of 2600 bacterial artificial chromosome (BAC) clones (Spectral Genomics, Inc.) spaced 1 Mb apart showed a gain in copy number of BAC clones extending from clone RP11-1K11 at 8p23.2 (chr8: 4 596 114–4 755 793; human genome build 18) to clone RP11-23H1 at 8p22 (chr8: 15 027 287–15 191 603; hg18), indicative of an approximately 10.6-Mb duplication at 8p22p23.2. FISH testing showed that neither parent carried the 8p22p23.2 duplication, indicating a de novo copy-number change. Two years later, the patient's neurologist ordered a high-resolution whole-genome oligonucleotide microarray (244K array G4411B; Agilent Technologies) to take advantage of this newer technology and gather more information. This test again identified the 8p22p23.2 duplication and defined the lesion more precisely as an 11.5-Mb duplication (chr8: 3 969 033–15 475 755; hg18). In addition, a 3.0-Mb duplication was identified at chromosome 22q11.2 (chr22: 17 086 001–20 131 661; hg18) (Fig. 1).
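The duplication sizes quoted above follow directly from the hg18 coordinates. The short sketch below recomputes them. For the BAC result it uses the outer coordinates of the first and last gained clones. It treats each interval as inclusive of both endpoints, which is an assumption, but dropping the +1 does not change any rounded value.

```python
# Recompute segment sizes from the hg18 coordinates quoted in the case report.
SEGMENTS = {
    "8p22p23.2, BAC array (RP11-1K11 to RP11-23H1)": ("chr8", 4_596_114, 15_191_603),
    "8p22p23.2, Agilent 244K": ("chr8", 3_969_033, 15_475_755),
    "22q11.2, Agilent 244K": ("chr22", 17_086_001, 20_131_661),
}


def span_mb(start: int, end: int) -> float:
    """Length in megabases of an inclusive [start, end] interval."""
    return (end - start + 1) / 1e6


if __name__ == "__main__":
    for name, (chrom, start, end) in SEGMENTS.items():
        print(f"{name}: {chrom}:{start:,}-{end:,} = {span_mb(start, end):.1f} Mb")
```

This yields 10.6, 11.5, and 3.0 Mb, in agreement with the text. The difference between the first two values reflects the coarser boundary resolution of the 1-Mb-spaced BAC array.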


DISCUSSION
The evaluation of children with developmental delay, dysmorphic features, and even autism spectrum disorders has improved tremendously as a result of clinical laboratory testing. Many children with developmental delay do not have physical exam features or medical histories specific enough for a clear clinical diagnosis.


Fig. 1. Copy number changes identified by aCGH (Agilent 244k). (A), 8p22p23.2 duplication (arrowhead) compared to duplications reported in other studies. (B), 22q11.2 duplication (arrowhead) compared to typical recurrent 3-Mb deletion/duplication region. Chromosomal position is indicated relative to the centromere (cen) and telomere (tel) of each chromosome. Chromosome band number increases moving away from the centromere. Scale in megabases (Mb).

Among such patients, laboratory testing can be extremely helpful and is an integral component of the diagnostic evaluation. Typical recommendations include a G-banded karyotype, fragile X molecular genetic testing, aCGH, and neuroimaging (1). More than any other single test, microarray-based CGH to detect changes in genomic copy number has dramatically increased the rate of diagnosis among individuals with unexplained developmental delay and mental retardation. G-banded karyotypes have typically identified abnormalities in 3%–4% of individuals with idiopathic mental retardation (2). Subtelomere FISH (ST-FISH) for submicroscopic deletions and duplications adds to the diagnostic yield; in the largest study of ST-FISH, presumed pathogenic changes were found in 2.6% of 11 688 unselected cases with a previously normal karyotype (3). Multiple studies now support the conclusion that CGH with broad genomic coverage, so-called chromosomal microarray, has a higher diagnostic yield than G-banded karyotype and ST-FISH in the evaluation of patients with developmental delay and mental retardation. Arrays targeted to clinically relevant regions of the genome detect abnormalities in up to 8% of patients with previously normal karyotypes (4, 5). This detection rate will increase as more laboratories implement arrays with whole-genome coverage. No other clinical laboratory test has comparable clinical sensitivity for patients with a diagnosis of developmental delay and mental retardation.

We would have expected a G-banded karyotype to detect the large 8p duplication, but we know of other clinical aCGH cases in which equally large events were not detected by karyotype. We also would have expected the earlier BAC array to detect the 22q11.2 duplication. The laboratory that performed that array reported a result as positive only if 2 or more adjacent BAC clones showed a consistent copy-number change, and there may have been a technical limitation with 1 of the probes in that region or with performing or interpreting that array.

In this patient, 2 relatively large chromosomal duplications were identified. Based on the first aCGH result, it was reasonable to assume that the large 8p22p23.2 duplication could account for the learning disabilities seen in this patient. Two recurrent large duplications involving 8p23.1–8p23.2 overlap the distal part of this patient's duplication. The more distal of these is associated with speech delay, autism, and learning difficulties (6). The more proximal, smaller overlapping duplication has been described in individuals with learning disabilities but also in healthy family members (7). A-V canal malformations have also been reported in some patients with 8p22p23.2 duplication.

Deletions of chromosome 22q11.2 are the most common genetic cause of A-V canal malformations like that seen in this patient, and such cases have typically been tested by metaphase FISH. A duplicated segment, however, usually lies immediately adjacent to the original copy on the same chromosome, so the 2 signals are difficult to resolve on metaphase chromosomes, and interphase FISH is generally required to detect it. Newer hybridization-based methods such as multiplex ligation-dependent probe amplification (MLPA) and aCGH can readily detect such duplications. In fact, the advent of these technologies has led to the discovery of numerous previously unrecognized duplication syndromes, including the 22q11.2 duplication syndrome (8). The 22q11.2 duplication syndrome shows variable penetrance and expressivity and shares features with DiGeorge/velocardiofacial syndrome. Typical reported features include hypertelorism, broad nasal bridge, epicanthal folds, fifth finger clinodactyly, urogenital anomalies, hypotonia, scoliosis, seizures, and/or abnormalities on electroencephalogram.
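The BAC-array reporting rule described above requires 2 or more adjacent clones to show a consistent gain before a change is reported. This explains how a single underperforming probe can hide a real duplication. The sketch below is a hypothetical illustration of such a rule applied to per-clone log2 ratios in genomic order. The 0.3 gain threshold and the function name are assumptions, not the reporting laboratory's actual criteria.

```python
# Hypothetical "N adjacent clones" gain-calling rule; threshold is an assumed value.

def call_gains(log2_ratios, threshold=0.3, min_clones=2):
    """Return (first, last) index ranges of runs of at least `min_clones`
    consecutive clones whose log2 ratio exceeds `threshold`."""
    calls = []
    run_start = None
    padded = list(log2_ratios) + [float("-inf")]  # sentinel closes a run at the end
    for i, ratio in enumerate(padded):
        if ratio > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_clones:
                calls.append((run_start, i - 1))
            run_start = None
    return calls


if __name__ == "__main__":
    ratios = [0.0, 0.45, 0.0, 0.5, 0.55, 0.42, 0.0, 0.38]
    print(call_gains(ratios))            # only the 3-clone run is reported
    print(call_gains([0.5, 0.1, 0.5]))   # one weak clone splits the gain; nothing reported
```

The second call shows the failure mode the authors raise: if 1 clone inside a duplicated region performs poorly, the gain breaks into single-clone runs, and neither run meets the 2-clone requirement. Densely spaced oligonucleotide probes make such a gap far less consequential.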
To date, at least 65 patients have been reported (9), and the phenotype is widely variable; several individuals have no detectable symptoms. As this patient entered formal schooling, her difficulties became more apparent, and her significant language delay evolved into a learning disability. Behavioral problems emerged, raising the question of attention deficit disorder because of her difficulty retaining learned material. In addition, she developed anxiety in anticipation of new activities. Further characterization suggested that these behaviors were not primary but secondary to intellectual disability, and further testing revealed an IQ in the mild to moderate mental retardation range. Although these behavioral symptoms could be secondary to the cognitive impairment, behavioral problems such as short attention span, hyperactivity, impulsivity, and aggression have been described in patients with 22q11.2 duplication syndrome. Consideration of the underlying genetic disorder is therefore important in clinical decision-making about treatment with psychotropic medications.

Both deletions and duplications of 22q11.2 are mediated by recombination between nearby segmentally duplicated sequences, a mechanism termed nonallelic homologous recombination (NAHR) (10). The broad phenotypic variability observed in this condition has not been explained but could be attributable to several factors, including the influence of other genes on the expression of genes within the 22q11.2 region, epigenetic factors that modify gene expression, or environmental factors.

With a correct diagnosis, genetic counseling regarding recurrence risk for future pregnancies was updated. The parents had been counseled that the recurrence risk for the de novo 8p22p23.2 duplication would be low, approximately 2%–3%, because of possible germline mosaicism, and they subsequently had twins on the assumption of a low recurrence risk. Further parental testing revealed the same 22q11.2 duplication in the patient's mother, which confers a 50% risk of transmission in each pregnancy. This case underscores the impact of improvements in diagnostic genetic testing on genetic counseling for patients and their families.

POINTS TO REMEMBER
• Lack of clinical correlation with test results should prompt further evaluation. Patients evaluated in the past for developmental delay may also benefit from updated testing, especially if no diagnosis has been made.
• Chromosomal microarrays detect genomic duplications more easily than metaphase FISH. FISH for 22q11.2 was unable to detect the duplication in this case (interphase FISH would have been required).
• The lower-density BAC array also missed the 22q11.2 duplication, underscoring the value of high-density oligonucleotide arrays with full genome coverage.
• Many unnecessary tests were performed, placing a burden on the patient and family and delaying the diagnosis.
• Owing to variable penetrance of 22q11.2 deletions and duplications, a parent who carries the same change may not have obvious symptoms.
• Identifying multiple imbalances in a single patient provides a unique opportunity to look for genetic modifying effects and could aid genotype–phenotype correlation studies.



Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Moeschler JB. Medical genetics diagnostic evaluation of the child with global developmental delay or intellectual disability. Curr Opin Neurol 2008;21:117–22.
2. Shevell M, Ashwal S, Donley D, Flint J, Gingold M, Hirtz D, et al. Practice parameter: evaluation of the child with global developmental delay: report of the Quality Standards Subcommittee of the American Academy of Neurology and the Practice Committee of the Child Neurology Society. Neurology 2003;60:367–80.
3. Ravnan JB, Tepperberg JH, Papenhausen P, Lamb AN, Hedrick J, Eash D, et al. Subtelomere FISH analysis of 11 688 cases: an evaluation of the frequency and pattern of subtelomere rearrangements in individuals with developmental disabilities. J Med Genet 2006;43:478–89.
4. Shaffer LG, Kashork CD, Saleki R, Rorem E, Sundin K, Ballif BC, Bejjani BA. Targeted genomic microarray analysis for identification of chromosome abnormalities in 1500 consecutive clinical cases. J Pediatr 2006;149:98–102.
5. Lu X, Shaw CA, Patel A, Li J, Cooper ML, Wells WR, et al. Clinical implementation of chromosomal microarray analysis: summary of 2513 postnatal cases. PLoS ONE 2007;2:e327.
6. Glancy M, Barnicoat A, Vijeratnam R, de Souza S, Gilmore J, Huang S, et al. Transmitted duplication of 8p23.1–8p23.2 associated with speech delay, autism and learning difficulties. Eur J Hum Genet 2009;17:37–43.
7. Barber JC, Maloney VK, Huang S, Bunyan DJ, Cresswell L, Kinning E, et al. 8p23.1 duplication syndrome: a novel genomic condition with unexpected complexity revealed by array CGH. Eur J Hum Genet 2008;16:18–27.
8. Slavotinek AM. Novel microdeletion syndromes detected by chromosome microarrays. Hum Genet 2008;124:1–17.
9. Courtens W, Schramme I, Laridon A. Microduplication 22q11.2: a benign polymorphism or a syndrome with a very large clinical variability and reduced penetrance? Report of two families. Am J Med Genet A 2008;146A:758–63.
10. Lupski JR. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet 1998;14:417–22.

Commentary
Sau Wai Cheung

Array comparative genomic hybridization technology is now being used as a genome-wide screen for unexpected genomic imbalances. As a result, this technique has become a valuable clinical diagnostic test and is enabling rapid identification of microdeletions and microduplications in much larger numbers of patients, many of whom present with no obvious clinical diagnosis. One of the major impacts of this revolutionary technology is the ability to identify microduplications, enabling medical geneticists to correlate the findings with the clinical assessment, as with the patient described by Miller et al. However, the discovery of microduplications also poses a challenge to clinicians, owing to the scarcity of clinical descriptions and the wide phenotypic variability that appears to be characteristic of microduplications. Oligonucleotide-based arrays have further improved the diagnostic capabilities of this test over earlier arrays built from bacterial artificial chromosome (BAC) and P1-derived artificial chromosome (PAC) clones. The notable advantages of oligonucleotide-based arrays are: (a) greater design flexibility, which allows repetitive sequences to be avoided and well-performing oligonucleotides to be selected; (b) increased robustness because of enhanced dynamic range (signal to background); and (c) higher reproducibility and greater precision in mapping aberrations.

The genome contains many low-copy repeats that can lead to such rearrangements, primarily through nonallelic homologous recombination. Chromosome 22q11.2 is one such region: it is gene rich and contains multiple region-specific low-copy repeats. These low-copy repeats are known to mediate genomic disorders in this region, including DiGeorge syndrome/velocardiofacial syndrome, cat-eye syndrome, der(22) syndrome, and the 22q11.2 microduplication syndrome. DiGeorge syndrome/velocardiofacial syndrome is the most frequently identified genomic disorder of the 22q11.2 region, with an estimated frequency of 1 in 4000 live births. By contrast, only a small number of patients with microduplication of this region have been described to date, likely because the highly variable and mild phenotype may escape syndromic identification. Although FISH analysis can potentially identify a microduplication in the DiGeorge syndrome/velocardiofacial syndrome critical region, this method requires accurate clinical assessment of the phenotype by the clinician, who must specifically request evaluation of interphase cells. Our experience indicates that microduplications are also more likely than microdeletions to be inherited, making the identification of these copy-number changes even more important for accurate recurrence-risk counseling for these families. We are practicing in an exciting time, because many more new diseases will be described on the basis of copy-number variants identified through this important clinical test.

Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX. Address correspondence to the author at: Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, NAB 2015, Houston, TX 77030. Fax 713 798 4998; e-mail [email protected]. Received January 8, 2009; accepted January 29, 2009. DOI: 10.1373/clinchem.2008.122713

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: S.W. Cheung, Department of Molecular and Human Genetics, Baylor College of Medicine. Consultant or Advisory Role: None declared. Stock Ownership: None declared. Honoraria: None declared. Research Funding: None declared. Expert Testimony: None declared. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

Commentary
Nelson L.S. Tang

Miller et al. report a case of developmental delay with microduplication of 22q11.2. This case illustrates several important issues related to structural aberrations in the genome and shows that the resolution of cytogenomic analysis is important in clinical genomic disorders. In this patient, the diagnosis was made with only a moderately dense 244K array, whereas the older 2.6K bacterial artificial chromosome array failed to detect this 3-Mb duplication. New chips with much higher resolution are entering the market, such as 1-million single-nucleotide polymorphism genotyping chips and chips with probes targeted to tens of thousands of copy-number variations (CNVs). These high-density platforms enable detection of submicroscopic structural variants that could not previously be revealed. On the other hand, the ability to scan the genome at this high resolution also creates new challenges. We now know that a vast number of such structural variants are present in the human genome, and it has been estimated that up to a thousand CNVs may differ between any 2 individuals (1). At present, the biological role and clinical significance of gains or losses in most of these CNVs are poorly understood. As in the patient described by Miller et al., it is not certain what phenotypic role is played by the de novo 8p22 duplication.

Although there is no good answer to the question of why genomic copy-number changes are common causes of developmental delay and mental retardation, CNVs have also been implicated in other neurodevelopmental disorders, including autism and schizophrenia (2, 3). Duplication (gain) at 22q11.2 was found to be one of the recurrent structural changes among autistic patients. This duplication was inherited from the father in one case and arose de novo in another. Variable penetrance was observed in the family with the inherited 22q11.2 gain: the father, who carried the same duplication, was unaffected. These latest studies were carried out with a high-density single-nucleotide polymorphism genotyping array, and the results suggest that CNVs can be defined at even higher resolution on such platforms. We are now facing a new dilemma. On one hand, it is tempting to apply the latest technology in clinical diagnostics. On the other hand, such an application will generate results so new that we do not yet understand their biological significance. Many more studies with large sample sizes and functional genetic experiments will be required to answer these questions.

Laboratory for Genetics of Disease Susceptibility, Li Ka Shing Institute of Health Sciences, The Chinese University of Hong Kong. Address correspondence to the author at: Department of Chemical Pathology, Faculty of Medicine, The Chinese University of Hong Kong, Shatin, Hong Kong SAR, China. Fax +852 30059108; e-mail [email protected]. Received January 11, 2009; accepted February 2, 2009. DOI: 10.1373/clinchem.2008.122739

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article.

Authors' Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.


References
1. Lee C, Iafrate AJ, Brothman AR. Copy number variations and clinical cytogenetic diagnosis of constitutional disorders. Nat Genet 2007;39:S48–54.
2. Walsh T, McClellan JM, McCarthy SE, Addington AM, Pierce SB, Cooper GM, et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science (Wash DC) 2008;320:539–43.
3. Marshall CR, Noor A, Vincent JB, Lionel AC, Feuk L, Skaug J, et al. Structural variation of chromosomes in autism spectrum disorder. Am J Hum Genet 2008;82:477–88.

Citation Classic

Clinical Chemistry 55:4 833–834 (2009)

The Beginnings of Real-Time PCR
P. Mickey Williams¹*

Featured Article: Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res 1996;6:986–94.²

During the mid-1990s PCR was becoming a mature technology, with an impact on areas such as cloning and automated sequencing. Many scientists were working to harness the quantitative power of PCR. Exponential amplification was tremendously powerful, but it had proven hard to control in quantitative applications. Many attempts were made to use so-called end-point quantification, which tried to determine the starting target concentration from the amount of final amplified product. Several aspects of PCR complicated these efforts. A major difficulty was that PCR often reaches a plateau stage, at which exponential amplification ceases and there is little or no further product accumulation. The product concentration at plateau is therefore similar for all starting target amounts; in essence, starting target concentrations cannot be accurately quantified once PCR reaches plateau. Another complication is that PCR, like any enzyme-driven reaction, is sensitive to inhibition. Many of the most interesting biological samples, and many methods for nucleic acid purification, contained known inhibitors of PCR (e.g., heme from blood). Southern and Northern blots with radioactive probes were the methods of choice for many scientists who wanted semiquantitative nucleic acid data, although these methods came with their own inherent pitfalls.

It was during this time period that new ideas began to surface. David Gelfand, Pam Holland, and colleagues at Cetus had described the 5′-to-3′ nuclease activity of DNA polymerase and a method that harnessed this activity to degrade a hybridization probe placed in a PCR (1). Their assay was eventually, and cleverly, named “TaqMan,” in honor of the early arcade video game Pac-Man.
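The plateau problem described above, and why reading the signal during the exponential phase avoids it, can be illustrated with a small simulation. The following is a minimal Python sketch built on loudly stated assumptions. Amplification follows a simple logistic model with a hypothetical 95% per-cycle efficiency and a plateau capacity of 10^11 copies, and the detection threshold of 10^8 copies is arbitrary. None of these numbers, nor the helper names (`simulate_pcr`, `threshold_cycle`, `fit_line`), come from the featured article. The sketch shows that end-point copy numbers converge on the same plateau regardless of input. By contrast, the fractional cycle at which each reaction crosses the threshold (the threshold cycle, Ct) falls linearly with the logarithm of the starting amount, so a standard curve built from known inputs can be used to estimate an unknown.

```python
import math


def simulate_pcr(n0, cycles=40, efficiency=0.95, capacity=1e11):
    """Copy number before cycling (index 0) and after each cycle.

    Assumed logistic model: the per-cycle gain n*efficiency shrinks as n
    approaches `capacity`, which reproduces the plateau phase.
    """
    copies = [float(n0)]
    for _ in range(cycles):
        n = copies[-1]
        copies.append(n + n * efficiency * (1.0 - n / capacity))
    return copies


def threshold_cycle(copies, threshold):
    """Fractional cycle at which `copies` first reaches `threshold`.

    Interpolates linearly in log10(copies) between the two bracketing
    cycles. Returns None if the threshold is never reached.
    """
    for c in range(1, len(copies)):
        if copies[c] >= threshold:
            lo, hi = math.log10(copies[c - 1]), math.log10(copies[c])
            return (c - 1) + (math.log10(threshold) - lo) / (hi - lo)
    return None


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of ys against xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


THRESHOLD = 1e8  # assumed detection threshold, well below the plateau

standards = [1e2, 1e3, 1e4, 1e5, 1e6]  # hypothetical known starting copies
end_points, cts = [], []
for n0 in standards:
    curve = simulate_pcr(n0)
    end_points.append(curve[-1])
    cts.append(threshold_cycle(curve, THRESHOLD))

# End-point view: every standard has converged to roughly the same plateau.
for n0, end, ct in zip(standards, end_points, cts):
    print(f"start {n0:9.0f}   end-point {end:.4e}   Ct {ct:6.2f}")

# Real-time view: Ct falls linearly with log10(starting copies).
slope, intercept = fit_line([math.log10(n) for n in standards], cts)
efficiency = 10 ** (-1.0 / slope) - 1.0
print(f"slope {slope:.3f} cycles/decade, efficiency {efficiency:.3f}")

# Quantify a hypothetical "unknown" sample from its Ct alone.
unknown_ct = threshold_cycle(simulate_pcr(3e4), THRESHOLD)
estimate = 10 ** ((unknown_ct - intercept) / slope)
print(f"unknown: Ct {unknown_ct:.2f} -> about {estimate:.3g} starting copies")
```

With these assumed parameters, the fitted slope should come out close to -3.45 cycles per 10-fold dilution, the value implied by log(10)/log(1.95). A reaction that exactly doubled every cycle would give about -3.32. Real reactions depart from this idealized model (inhibitors, for example, lower the efficiency), which is one reason quantitative assays rely on controls.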
Additionally, Russ Higuchi and colleagues, formerly of Cetus but now at Roche, described “simultaneous amplification and detection” of PCR products, achieved by monitoring the accumulation of ethidium bromide fluorescence generated during a PCR (2, 3). Ken Livak and colleagues at ABI described the use of fluorescent energy transfer to generate a signal from probe degradation in the 5′-to-3′ nucleolytic PCR assay (4). Major advances in PCR equipment were also taking place. A group at Roche led by Bob Watson developed a kinetic PCR thermal cycler that captured ethidium bromide fluorescence in real time. Another group, led by Carl Wittwer and Kirk Ririe, developed a capillary-based rapid thermal cycler that dramatically shortened PCR run times.

It was during this period that Roche’s efforts to bring PCR technology into the diagnostics arena began to bear fruit. A recently described disease, AIDS, was the major focus of the efforts of many researchers and pharmaceutical companies. A clear and powerful application of PCR was the detection and quantification of HIV, the causative agent of AIDS, for which PCR afforded a sensitive detection method. Shirley Kwok, John Sninsky, and colleagues at Roche developed a quantitative reverse-transcription PCR test for measuring HIV viral RNA copy numbers. The use of a coamplified control sequence of known copy number, together with detection before the plateau phase, helped overcome some of the hurdles to successful PCR quantification. This test eventually became the Cobas Amplicor HIV-1 Monitor test. Its development was a pivotal moment in bringing the power of PCR technology to diagnostic applications. The test has been used in many clinical studies and has clearly demonstrated that viral burden is an important end point for assessing response to therapy. During this period many additional quantitative PCR advances were made; an example is the work of Mike Piatak and his colleague Jeffrey Lifson. They too were focused on HIV viral burden, and they described another means of employing PCR for the quantitative detection of HIV: a quantitative competitive PCR developed as a tool for monitoring HIV viral burden (5).

My laboratory at Genentech was fortunate enough to have connections with both Applied Biosystems and Roche. We were selected as the β test site for a prototype real-time quantitative PCR instrument being developed by Applied Biosystems. Chris Heid and I both realized the potential of this application if it truly provided sensitive, reliable, and accurate quantification of target nucleic acids. With the successful development of this instrument, researchers would have a powerful tool to address any nucleic acid sequence or sample type of interest. In our initial experiments we learned how to design primers and probes for the best results on this prototype instrument. After several months of close interaction with Applied Biosystems scientists and engineers and with our colleagues at Roche, we knew this technology was real and ready for use. We designed a series of experiments to demonstrate the potential of this tool to ourselves and others. When we completed these initial experiments, we knew it was important to share the results in a way that would encourage others to use the technology. This effort resulted in the publication of 2 articles in Genome Research (6, 7), 1 of which is the article featured here.

Never in our dreams did we imagine the ultimate impact this methodology would have on the quantitative study of nucleic acids in both research and diagnostic applications. To date thousands of reports have been published describing research that relied on the sensitivity, dynamic range, and precise quantification of real-time PCR and reverse-transcription PCR. Many have considered this technique the gold standard of sequence-specific nucleic acid quantification. Since our early work, many have improved upon those efforts, and today a researcher has many instruments from which to choose. Looking back, it was a great pleasure to have been a part of this technology in its infancy and to watch it grow into the powerful tool it has become. Finally, to highlight the link between our 1996 article and the journal Clinical Chemistry, it should be noted that our work was presented a year earlier, at the 1995 AACC Oak Ridge Conference, and the abstracts were published in Clinical Chemistry (8, 9).

¹ Roche Molecular Systems, Pleasanton, CA.
* Address correspondence to the author at: Roche Molecular Systems, 4300 Hacienda Drive, Pleasanton, CA 94588. E-mail [email protected].
² This paper has been cited 2895 times since its publication.
Received January 29, 2009; accepted February 4, 2009. Previously published online at DOI: 10.1373/clinchem.2008.122226


Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: No authors declared any potential conflicts of interest. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript.

References
1. Holland PM, Abramson RD, Watson R, Gelfand DH. Detection of specific polymerase chain reaction product by utilizing the 5′-3′ exonuclease activity of Thermus aquaticus DNA polymerase. Proc Natl Acad Sci U S A 1991;88:7276–80.
2. Higuchi R, Dollinger G, Walsh PS, Griffith R. Simultaneous amplification and detection of specific DNA sequences. Nat Biotechnol (Lond) 1992;10:413–7.
3. Higuchi R, Fockler C, Dollinger G, Watson R. Kinetic PCR analysis: real-time monitoring of DNA amplification reactions. Nat Biotechnol (Lond) 1993;11:1026–30.
4. Livak KJ, Flood SJ, Marmaro J, Giusti W, Deetz K. Oligonucleotides with fluorescent dyes at opposite ends provide a quenched probe system useful for detecting PCR product and nucleic acid hybridization. PCR Methods Appl 1995;4:357–62.
5. Piatak M Jr, Luk KC, Williams B, Lifson JD. Quantitative competitive polymerase chain reaction for accurate quantitation of HIV DNA and RNA species. Biotechniques 1993;14:70–81.
6. Gibson UE, Heid CA, Williams PM. A novel method for real time quantitative RT-PCR. Genome Res 1996;6:995–1001.
7. Heid CA, Stevens J, Livak KJ, Williams PM. Real time quantitative PCR. Genome Res 1996;6:986–94.
8. Heid C, Stevens J, Williams PM. Quantitative PCR using closed tube multicolor fluorescence. Clin Chem 1995;41:1686.
9. Gibson U, Heid C, Ling A, Stevens J, Williams PM. Quantitative RT-PCR using closed tube multicolor fluorescence. Clin Chem 1995;41:1686.

Clinical Chemistry 55:4 835–841 (2009)

Interview
By Misia Landau

A Conversation with Elizabeth Blackburn

Elizabeth Blackburn has spent much of her professional life exploring the far ends of chromosomes. In 1976, she discovered that they were capped by strange repeating sequences of DNA. Her discovery of telomeres and, with colleague Carol Greider, of the telomere-making enzyme telomerase would draw her to the center of a now-burgeoning field, telomere biology, and earn her numerous awards. Eventually it would also make her a kind of scientific celebrity, as a famously appointed-and-dismissed member of President George W. Bush’s Advisory Council on Bioethics. Blackburn, the Morris Herzstein professor of biology and physiology in the department of biochemistry and biophysics at the University of California, San Francisco (UCSF), was born on November 26, 1948, in Hobart, Tasmania. She is the subject of a recent and critically acclaimed biography, Elizabeth Blackburn and the Story of Telomeres, by Catherine Brady (1), a story that begins at the far end of the world. She spoke with me from her home in California.

Catherine Brady describes you as a young woman of 17 living in Tasmania, shy with boys but very passionate about science. You were infatuated with amino acids and felt they had a “teasing beauty.” What captured your imagination?

I’d just always been very interested in biology, and the idea that there would be a chemistry behind it was captivating for reasons I can’t explain. I guess I got the feeling that it would be a real explanation for what underlay a lot of biology. But I also liked the shapes and I liked the names.

Is it true that you plastered your bedroom walls with pictures of amino acids that you drew?

Absolutely. Butchers used to wrap meat in big white rolls of paper, and so I got some of that.

You must have some artistic ability.

I used to love drawing when I was a kid, naturalistic things. I didn’t do abstract things.

Your parents were both general practitioners and presumably somewhat familiar with science. Yet it sounds like you felt the need to protect or hide your love of science.

Not from my parents; they were very encouraging. I remember I got a microscope for Christmas once. I was so excited. They were very encouraging of that. Among my peers, that wasn’t necessarily an acceptable thing. So I didn’t talk about it.

Did you actually have to suppress this passion? I ask this because in the early part of her book, Brady portrays a tension in you between a very strong inner will and an external pressure to appear feminine.

It’s a little sad to examine oneself, but I think she kind of nailed it. When I try to think about it, that seems real to me.

So you felt some pressure to be more demure?

Yes, that’s right. I remember I had this chemistry set out in a little garden shed, and I thought that was just the most wonderful thing. But it wasn’t the kind of thing I would talk about with my school friends. And I remember I had little guinea pigs, and one of them had babies. They were born very mature looking, not bare; they had fur on them. I put them in my hand and showed them to one of my friends, and it was like, aarrghh! I thought they were so cute, but she saw them as little vermin.

That reminds me of another very early episode, when you were living in a town called Snug. You were maybe 3 years old and playing outside with a bull ant, talking to it. Later you would develop a habit of singing to animals, even jellyfish.

There was no television when I was growing up, so there was a huge boredom factor. There was a lot you’d do for self-amusement!

Did you ever think of becoming a naturalist?

I think that intellectually biology was really interesting to me. But being a naturalist wasn’t the way you’d find out about it, how it worked.

You really had a deep curiosity.

I had no interest in a description of things or names of species or anything like that. My mother would know all these names of plants and different flowers, and I was almost rejecting of it. I didn’t want to just know names of things. I remember really wanting to know how it all worked.

Certainly living close to nature as you did, you might ask, how do flowers develop? How do animals grow?

That change through time is something you can visualize and have a sense of mechanism about. The chemistry behind it, the biochemistry: I thought, this is how you are really going to get at it. I really, really thought that, and I think it was because of books I read.

Any in particular?

There was a book by [George] Gamow. He was really quite influential in molecular biology thinking. I remember reading the book and being excited by the idea that there were molecules behind life, that life was made up of molecules. That was when I was a late teenager. There was also a book by Jacob Bronowski. It was about the sciences, with really great illustrations, partially abstract pictures of instrumentation and principles behind physics and things like that. I used to read quite voraciously, so those weren’t the only books. But those were the ones that grabbed me for science, along with the biography of Madame Curie.

I read an intriguing quote in which you said that you believed then, and still do, that you loved science because it was a world into which you could escape. Was it an escape from something or to some place?

It wasn’t like I felt I was actively escaping from something. But I was very happy in this world of the mind. When your mind is very occupied, and you’re just really interested in what you’re doing, that’s a great place to be. So I think it was a positive pull rather than anything I was escaping from, except the general vicissitudes of daily life.

You had 6 siblings!

Yet we all preserved our private spaces. It was a really interesting thing, when I look back on it. We’re all very close to each other, though we’re geographically very separated. Whenever I see them we instantly reconnect.

You love animals and being outdoors, and you were very comfortable in this messy, wet, chaotic world of nature.

Interesting, I never saw it as messy and chaotic.

But you’ve spent many, many hours in labs, which are very controlled places. Though in high school it sounds like you liked to mix things up and make explosions.

I think this happened, like, once! A little teenage rebellion here: like, I’ll be clever. But I assure you, I was not a serial exploder. This is one daring thing my friends and I did. It felt very, very daring. And we were reproved for it.

Did you immediately feel comfortable in a chemistry lab?

Yes, I did. Right from high school, where there was a chemistry laboratory and a biology lab, I loved it.

You really seemed to get it; that’s something very few people do with chemistry. For most people it’s quite an effort to understand what they’re doing.

I was good but I wasn’t brilliant at chemistry, not at all. But I really like the hand-mind connection part of it.

You had some influential teachers who helped you?

I had a very good teacher in high school who had a very good personality. When you’re a teenager, that makes an enormous difference.

The field of chemistry was then, and to some extent still is, a male bastion. Did that register with you consciously when you decided to major in biochemistry at Melbourne University?

It did, although at this very big university there were certainly women professors in chemistry who were very distinguished in their fields. On the other hand, their numbers were low.

You described yourself to Brady as having had a “genderless mind.” What do you mean by that?

I’m not quite sure.

She said that you were not, at that time, acknowledging to yourself, and certainly not to others, that you were different because you were a woman.

Right, this is true. I wanted to just do it. This was the subject matter I thought was interesting, and it didn’t matter whether I was a male or a female. I had the strong feeling it shouldn’t matter. Now I look back and think, “Well, maybe women and men do think about things differently.” And that’s great, because this is how to solve problems: you have different minds, different ways of thinking about problems. So now I feel a little more nuanced about it. It’s actually the interplay of the mind and the subject matter.

Molecular biology was on the rise when you were a biochemistry major. How would you describe the difference between molecular biology and biochemistry?

In biochemistry there’s much more quantitative thinking: you find out how things work through measurement and the quantification of rates and things like that. Whereas in molecular biology there’s much more of an attraction to leaping . . .

To the elegant solution?

Yes, that’s right, and I think much of this came from physics thinking. The culture of it was very dominated by people who had been physicists, like Francis Crick and Max Delbrück, people like that. So there was a huge value on the elegant solution and the logic of it, which was perfect given that the information content of the genetic material was the prime focus.

Would you describe yourself as comfortable in both worlds?

I do. I love them both for different reasons.

You moved to the Medical Research Council (MRC) Laboratory of Molecular Biology at Cambridge University in 1971 to work with Fred Sanger. How would you describe Sanger’s influence on you?

Very, very strong: to be at the bench all the time. I loved being there. He was somewhat laissez-faire in a good way, one that suited my temperament. He was always in the laboratory, so he was always very available when you wanted to talk with him, and yet he let you find your own way. I think he figured the best way to educate people was to let them explore for themselves. I think that’s been influential on how I mentor people. I try to give them space to do their own thinking. The idea that you are sort of slave labor to do a particular task that your advisor has thought about doesn’t strike me as being the best way to get a graduate or postgraduate education. Letting the person do their own thinking is the whole key in science.

How do you cultivate independent thinking?

There’s probably a certain selection for people who come to the laboratory who are ready to want to think about things. One of my students, I remember, came and was very clear. She said, “I want to be a student in your lab.” And she said, “I don’t want you telling me what to do.” And I said, “You’ve got it!”

Sanger’s was a very male-dominated lab.

Yes, it was. The whole culture very much was. There were some female scientists there but, frankly, they were not, so far as I could tell, high up in the pecking order. But we didn’t talk about it. The whole culture was about science. But I loved it. I had no problem with that.

You met your future husband John Sedat there. I’m curious: there were a lot of men around in that laboratory. What drew the two of you together?

The usual chemistry.

In more ways than one!

I guess so.

It’s been a real partnership, both of you passionate about science.

Right, but following different enough routes.

You did love to literally travel together. What have been some of your favorite places to visit?

Ethiopia was one of the really interesting places we went to in the early 1970s. It was John’s idea to go. And I thought this was a terrific idea, too. So off we went, very adventurously. John always says I never forgave him for the 3-day bus trip across the mountains. We really were very adventurous then. I remember at one point going through the mountains on this bus. Lots of people carried rifles or arms with them all the time, and on the bus suddenly everybody took their rifles off safety. We said, “What’s going on?” They said, “This area is known for bandits.” We were so adventurous at the time that we just took this in stride.

You were able to take off for several weeks at a time.

The work ethic was that you really worked, as we say now, 24/7. You worked day in and day out. The lights were on in the MRC laboratory till 11 o’clock or midnight every night. People were furious because they thought we were wasting electricity by leaving the lights on. People in Cambridge didn’t realize that these were people who work all the time. We were a very isolated sort of culture from the surrounding Cambridge culture, which was much more, “Have a life.” For us life was being in the lab.

You had good luck with mentors.

Not only luck. I’ve had good advice in picking good ones.

Yes, because after Cambridge you moved to Yale to work with Joseph Gall.

That was with very good input, including from John Sedat, my husband, who had known about Joe as being an excellent mentor as well as a good scientist.

It sounds like there were quite a few more women in the lab.

Yes, that was really a transformative kind of experience, because suddenly it was a very different culture.

What were the salient differences?

Well, in one way it was paradoxical, because Yale was much more hierarchical than the MRC laboratory in Cambridge, in the sense that students and postdocs were very much in a sort of hierarchy at Yale. That wasn’t the case, paradoxically enough, at this laboratory in Cambridge, where any student would chat with very famous scientists at the morning tea table and things like that. But at Yale people were busier and more structured. On the other hand, suddenly in the US in the 1970s, people were much more aware that women and men should have equal opportunities, so that just permeated the whole place, at least my little corner of Yale.

While you were working in the Sanger laboratory you were developing methods to sequence DNA using RNA.

Right, copying it into RNA. There were methods for sequencing RNA that had been worked out largely by Fred Sanger and some others. So that was one way of getting directly at the sequence of DNA. I had been working with this tiny bacteriophage, φX174, that my husband John Sedat had actually introduced to Fred Sanger. It was a tiny single-stranded DNA, about the littlest DNA anybody knew. So John said to Fred, “Look, this would be the thing to use.” And Fred thought this was a really good idea. So that was what the whole laboratory was working on, in different ways.

That’s what you brought to Joe Gall’s lab, the knowledge of this technique?

This technique and all the other techniques that I had been so exposed to and was very familiar with, because there were people around me in the laboratory working on all sorts of other techniques as well. It was sort of bootstrapping your way into DNA sequences through all sorts of different avenues.

So how did you become interested in the DNA located at the tips of chromosomes?

Well, it was still unclear if we could see any DNA at all. So there had to be special tricks. Even with the very small bacteriophage φX174 there had to be special tricks to break it up into smaller pieces. Some of those John Sedat had devised; this was before restriction enzymes were known. We had to radiolabel the ends of DNA molecules; people had done that for λ-phage DNA. So then I thought, wouldn’t it be great to look at something that wasn’t viral or bacteriophage DNA but rather a regular DNA from a regular cell. Bacterial chromosomes were circular, but eukaryotic chromosomes were linear. Joe Gall had found a whole lot of very small linear chromosomes, and so I thought it would be very exciting to look at the ends of these little chromosomes.

It was also at this point that you encountered a creature that would play a big role in your life.

Yes, yes: Tetrahymena thermophila.

You described it as love at first sight.

It’s a great model system, but it’s also a very beautiful little organism to look at through the microscope. It’s a little pear shape and it swims in little spirals.

What did you see in Tetrahymena that led to the discovery of telomeres?

Well, there were these very uniform, high-copy-number little DNA molecules that you could purify using methods that had been worked out in Joe Gall’s laboratory. So now you had this population of linear DNA molecules, and it was feasible to get enough to radiolabel the ends and just see what sequences could be pieced together. Even when cloning became possible it was very hard to clone the telomeres, and so you had to use all these different sequencing methodologies that I learned about in Fred’s laboratory. So basically I applied these methods, combinations of methods, to sequence the ends of the chromosomes. We had to make them up along the way and apply different ones depending on what we saw. Try this, try that.

Improvisation?

Very much so, because it was completely uncharted territory.

What you uncovered were these strange DNA repeats. What did you make of that?

It was just wonderfully puzzling. For a long time we had no clue what it could be. And then the hypothesis started coming: could they be added on to the end of chromosomes by an enzyme? More and more data suggested that was happening. In the 1980s, Carol Greider and I decided to look for telomerase, as we called the enzyme, in Tetrahymena. We knew it had lots of chromosome ends, and we knew a lot of new ends were generated at a particular developmental stage in the organism. So we knew this would be a place to look for any kind of enzyme that synthesized the telomeres.

Even the idea that an enzyme was synthesizing the DNA, that was a novel leap, isn’t that true?

Right. You had to propose something that nobody had ever thought of before. People knew that there were enzymes that could just stick nucleotides onto the ends of DNA, but not an enzyme that could put a particular DNA sequence onto the ends of DNA without copying another DNA; that was what we were proposing. And then we hunted for it deliberately.

This enzyme was essentially creating DNA from scratch?

Well, it was copying it from RNA, which was very unusual, because in molecular biology we grew up with the central dogma that in normal cells DNA should be copied into RNA and RNA should never be copied into DNA.

Now, reverse transcriptase . . .

We knew about reverse transcriptase, but that had been this aberration in viruses. But it did exist, and so the unusual thing then was to find out, “Okay, is this actually a reverse transcriptase that copies RNA into DNA in cells?”

But the difference between telomerase and reverse transcriptase, you proposed, was that telomerase was actually carrying its own RNA.

That’s right.

And it’s just copying a very small portion of its own RNA.

Right. But the actual active site of the protein turns out to be very close to the HIV reverse transcriptase and other known reverse transcriptases. But that’s just a part of the whole story. It has this little module in it, and then it has big conserved domains that bind the RNA and bind the telomeres and do other functions that we and others are still trying to figure out. It may even have roles unrelated to telomeres. All of this is still quite exciting. One thing we do know it does is synthesize DNA.

You and Carol Greider discovered telomerase late in 1984. Two years later, you discovered you were pregnant with your son Ben. You have mentioned that, by then, you had earned the ability to have a child, that you had proved yourself scientifically.

Yes, as much as I thought about it. To be honest I didn’t actively think about it a whole lot. I just went along day to day.

Did having a child change your life much as a scientist? You mention the difficulty in being at 7 am meetings when Ben was a bit older.

Well, that’s because I was chairing at UCSF at that stage. No scientist would have a 7 o’clock meeting, because we’re all night owls. In the world of science, because the hours are somewhat flexible, in theory at least, you should be able to mix having children and doing science. But because it’s such a very engaging, full-time thing, in practice it becomes hard, although I’ve become aware now of people who’ve been very successful as part-time scientists.

You had to propose something that nobody ever thought of before.

You describe your genderless identity as a kind of protective coloration in the male-dominated world of science. But I notice that you often remember what you were wearing on various occasions, like when you met Fred Sanger and Joe Gall. Why is this?

It has more to do with the mind. It wasn’t to do with life.

Do you have a fashion sense?

I do, I do. Though I don’t always use it. I’m interested in how people look. Again, it’s sort of aesthetic, and I like to look okay. But in the world of science for many years the idea was, and it’s kind of like the world of Silicon Valley now, that you dress casually.

If you could dress as you want to, what would that look like?

Well, I do try to dress up a bit more. I like to be comfortable. And I do like clothes that look nice, and I really like clothes that have nice lines. If I could afford it, I would wear suits from Paris. I’ve come to realize it’s possible to be very stylish and look terrific and still be a really great scientist as well. That was a long time coming for me.

You’ve been photographed many times. Do you have a favorite portrait?

Hmmm. Yes, there are some that I like. There’s a photographer, Micheline Pelletier, and when I got the L’Oréal-UNESCO prize last year she came and took a whole lot of photographs while I was in Paris. And she came and took photographs of John and me in San Francisco. And she took some really nice ones.

What is it that you like about these photos? Do they reveal something about you?

I think they look fairly friendly and soft, some of them. I guess I like that. It’s not a side of me that I’ve necessarily thought about. But she is a photographer and she seemed to like that aspect. It’s interesting that somebody from the outside would look and see that.

A softness?

Yes, I suppose; and a pleasantness.

It was once a quiet corner of science, but the whole area of telomeres and telomerase research has really exploded. How do you feel about that, and what do you think are the implications of this burgeoning interest?

It deals with something fairly profound, which is: why do cells stop multiplying? Why do organisms die, which is the extrapolation of that? And we’re seeing connections now in people. What I like is that it takes me back to my roots in biochemistry, because now quantities are mattering again: the amounts of telomerase, the lengths of telomeres, which you can measure in people. Those turn out to have a statistical relation with things like mortality. But you have to think in very different ways, because it’s not a nice simple mechanistic idea, where molecule A does this to molecule B. It’s a complex integration of things that end up giving you a quantity: amount of telomerase, telomere length, and so forth. So it’s a challenging way of thinking about things again.

The shortening of telomeres and increase in telomerase are linked with aging and cancer, respectively.

Yes, that’s what’s so fascinating.

There was an interesting political chapter in your life a few years ago when you were appointed to the President’s Council on Bioethics. You later learned that other scientists had turned down the invitation, but you accepted. Why was that?

I really thought I could offer something. I did think that national policy was important, and I was intrigued by the general questions of bioethics, which were somewhat narrowly addressed by this particular council. And I thought I could really contribute to the debate, especially about stem cells, because I know about cell biology and molecular biology, and I wasn’t a stem cell biologist myself. So it wasn’t like I had an ax to grind. I really thought I could contribute something and learn something in the process.

You learned first-hand how science is affected by the broader political context. Did you become more cynical?

Not at all. I went into it very cynical. One would have to be blind not to realize that there would be a lot of political overtone to such a group. But I thought, “Look, I’m just going in as a scientist. I’m not an expert in policy, bioethics, theology, all these sorts of interesting areas that people bring to bear on these questions. But I am an expert in my particular area and I can bring honest expertise into the discussion that way.” But it didn’t make me more cynical.

It made you more famous.

Exactly. I was not renewed on this Council for a second term; in a sense, I was kicked off by the White House. What was really interesting was that huge numbers of people got very, very upset about this, because they realized it was kind of symbolic of this administration not caring about scientific evidence. I can’t tell you how many emails and messages I was deluged with. That to me was the opposite of cynicism. It was, this is really good! People are not just taking all of this lying down.

Any hopes for the future of science under the Obama administration?

Yes, huge hopes! They’ve made very clear statements that they are interested. And I’m looking at the high-level appointments they’ve made, and they’re clearly scientists who take science policy very seriously.

If you had a sabbatical, how would you spend it?

Well, we did have a short one in Paris last year, and we had a wonderful time really just thinking and talking about science.

Which part of Paris?

It was the Curie Institute, a cancer-related institute that has some excellent basic research going on. I love that things unfold in unexpected directions, so I hope that if I took a sabbatical I wouldn’t know what the end product would be. With the one I took, I didn’t really know what the end product would be. To me that is the whole essence of a sabbatical, to have an adventure. Some people take a sabbatical to do a purpose-driven thing. I figure I can do that while I’m at the medical school. I’d like to finish the sabbatical thinking differently from the way I did when I started it. It certainly happened with the last one.

What did you come out with at the end?

It’s hard to explicate. It wasn’t just one individual thing. There was a huge reenergization that I felt. We’d been looking at some things in telomere biology and ignoring others; that really is what came out of it. It was, “Wait a minute, there are all these stones that have been left unturned by the field and by us, and we need to be looking at telomeres in different ways. We need to try to think about them as very, very dynamic entities.”

What do you like to do to relax, if you do relax? Do you read novels?

I’m reading Simon Winchester’s book about Joseph Needham. And I like spy stories.

Which in particular?

I’ve enjoyed John le Carré’s spy stories. They’re very gloomy.

You like gloomy?

No, but for some reason I like that sort of gloomy world of fiction. And his novels are very morally ambiguous. I guess I like that subtlety about them. I’m sure I wouldn’t agree with his philosophies half the time. But he raises moral ambiguities, so they are really intriguing.

When do you find time to read?

Coming back from the East Coast on airplane trips; that’s the time I really read, because I’m too exhausted to do anything else. Lots of my friends say they get lots of work done flying back from the East Coast, and I can’t believe them for a moment. I’m just way too exhausted.

What’s your favorite time of the day?

What an interesting question! Lunch time.

Why?

You take a break and you do something interesting at lunch time. You talk with people.

Finally, what would be your advice to a young woman starting out in biochemistry or the life sciences today?

Well, I guess the usual things, like “Go for it.” And don’t be afraid to ask people for help, and then feel free to ignore it! I think that was what I didn’t do well. I was not good at getting helpful advice. I think I was too proud and thought I had all the answers.

You had quite a few good ones.

I’m sure I could have done better.

Reference
1. Brady C. Elizabeth Blackburn and the story of telomeres: deciphering the ends of DNA. Cambridge (MA): The MIT Press; 2007. 424 p.

Sponsored by the Department of Laboratory Medicine, Children’s Hospital Boston

Misia Landau
Previously published online at DOI: 10.1373/clinchem.2008.119578


Letter to the Editor

Clinical Chemistry 55:4 842–843 (2009)

Validity of Maternal Genotypes in DNA from Archival Pregnancy Serum Samples

To the Editor:

Much functional genomic research has focused on genotypic data from large cohorts or entire populations (1). The large population-based serum biobanks collected for maternal (pregnancy) screening and stored in many countries could be useful for genetic epidemiologic studies if DNA extracted from archival serum and plasma is of sufficient quantity and quality for successful genotyping (2–4). Serum collected during pregnancy, however, also contains cell-free DNA from the fetus (5), which might affect the outcome of genotyping analyses. We have evaluated the concordance of genotypes between DNA extracted from archival maternal sera and DNA from fresh whole blood from the same women.

The median DNA yield was 15 mg/L (range 1–34 mg/L) with QIAamp DNA minipreps (Qiagen) on 200 µL fresh whole blood from 137 women, and 90 µg/L (range 0–4800 µg/L) with the MagNA Pure LC and Total Nucleic Acid Isolation Kit (MagNAPure; Roche Diagnostics) on 200 µL of 191 archival maternal sera from the same women. DNA yield decreased with increasing serum sample age (P = 0.005, Jonckheere–Terpstra test) (Fig. 1), with no difference between first- and second-trimester samples (P = 0.616, Mann–Whitney test).

Fig. 1. DNA yield from serum in relation to sample age.

Sample volumes were sufficient for repeat DNA extractions from 175 serum samples. Twenty microliters from each 50-µL extract was evaporated to dryness at 37 °C and used in a 6-µL Y chromosome–specific PCR (5) with 0.0625 µL 40× assay mix, 0.1% BSA (Sigma-Aldrich), and 3.5 mmol/L MgCl2 on a 7900HT Sequence Detection System (Applied Biosystems). Another evaporated 20-µL aliquot was used for multiple displacement amplification (MDA)1 (GenomiPhi V2; GE Healthcare) with 0.1% BSA. MDA products were dissolved in 80 µL TE (10 mmol/L Tris, 1 mmol/L EDTA, pH 8.0) by shaking at 4 °C overnight. DNA concentrations were determined using 2 µL template in an F2 g.20210G>A (rs1799963) TaqMan MGB assay, with serum samples supplemented with BSA and MgCl2 as above. All samples were analyzed for 10 high-frequency single nucleotide polymorphisms (SNPs) using 2 µL whole blood extract, 9 µL serum extract, or 2 µL MDA product, with 0.075 µL assay mix and with BSA and MgCl2 as above.

Seventy-two of 84 serum samples from women carrying male fetuses and 1 of 91 samples from women carrying female fetuses were Y chromosome positive (P = 8E-30). Y chromosome detection was not associated with sample age or gestational age (P = 0.2–0.9, all Pearson χ2) or with DNA yield (P = 0.9, Mann–Whitney). The genotyping rate was 88.9% when 0.1 ng up to 0.4 ng serum DNA was used as PCR template and >99% when ≥0.4 ng serum DNA or >2 ng whole blood DNA was used. The DNA yield was enough to provide 0.4 ng per SNP analysis from 81% (154 of 191) of the serum samples. All genotyping results that were based on ≥0.1 ng archival serum DNA as template were identical to those of the corresponding whole blood samples. Analysis of an admixture of heterozygous and homozygous DNA for one of the SNPs revealed that at least 50% heterozygous DNA was required to alter the genotyping results of homozygous samples, and at least 90% homozygous DNA was required to alter the genotyping results of heterozygous samples.

The median DNA yield after MDA was 506 ng (range 0–4165 ng). Genotypes of the MDA-treated samples could be automatically determined for 4 of 10 SNPs by the 7900HT software. The genotyping rate was 91%, of which 90% were concordant with the corresponding genotypes of neat serum extracts. The poor performance in MDA and subsequent genotyping analyses could be due to inhibitors present in the MagNAPure extracts, DNA degradation and fragmentation during storage, or sequence context: the failed SNPs contained poly-G or poly-C runs, GC content >50%, and/or multiple CpG sites. The lowest genotyping and concordance rates were obtained from the MDA products of the serum samples with the lowest DNA yield.

Archival serum from maternity serological screening is a useful source of DNA for genetic epidemiologic studies. For reliable genotyping results, at least 0.4 ng of serum-derived DNA should be used in the TaqMan genotyping assays. The presence of realistic amounts of fetal DNA of a discordant genotype may at worst cause an indeterminate genotypic assignment but will not cause false maternal genotyping results on the TaqMan 7900HT system. MDA is not recommended on DNA extracted from serum samples.

1 Nonstandard abbreviations: MDA, multiple displacement amplification; SNP, single nucleotide polymorphism.
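The P value reported for Y chromosome detection (72 of 84 pregnancies with a male fetus vs 1 of 91 with a female fetus) can be checked with a χ2 test on the corresponding 2 × 2 table. The sketch below is a minimal check, not the authors' analysis code: it assumes an uncorrected Pearson χ2 with 1 degree of freedom (the letter names Pearson χ2 explicitly only for the age comparisons), and the function name `pearson_chi2_2x2` is hypothetical. Only the Python standard library is used.

```python
import math


def pearson_chi2_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Uncorrected Pearson chi-square test for the 2 x 2 table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability of a
    chi-square distribution with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    # Shortcut formula for a 2 x 2 table (no Yates continuity correction).
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 df, P(X >= chi2) = erfc(sqrt(chi2 / 2)); erfc stays accurate
    # for the very small tail probabilities involved here.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Counts taken from the letter.
# Rows: women carrying male vs female fetuses.
# Columns: Y chromosome positive vs negative serum DNA.
male_pos, male_total = 72, 84
female_pos, female_total = 1, 91

chi2, p = pearson_chi2_2x2(
    male_pos, male_total - male_pos,
    female_pos, female_total - female_pos,
)
print(f"chi2 = {chi2:.1f}, P = {p:.1e}")
```

Run as written, this gives χ2 ≈ 128.6 and P ≈ 8 × 10^-30, in line with the reported P = 8E-30. Applying Yates' continuity correction instead lowers χ2 and gives a P value several-fold larger, so the uncorrected statistic appears to be the one that reproduces the published figure.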

Author Contributions: All authors confirmed they have contributed to the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising the article for intellectual content; and (c) final approval of the published article. Authors’ Disclosures of Potential Conflicts of Interest: Upon manuscript submission, all authors completed the Disclosures of Potential Conflict of Interest form. Potential conflicts of interest: Employment or Leadership: None declared. Consultant or Advisory Role: J.A. Carlson, Qiagen. Stock Ownership: None declared. Honoraria: None declared. Research Funding: This study was supported by the Swedish National Biobanking Program, financed by the Knut and Alice Wallenberg Foundation functional genomics initiatives Swegene and WCN and by the EU 6th framework grant Cancer Control using Population-based Registries and Biobanks (principal investigator, J. Dillner).

Expert Testimony: J.A. Carlson has presented her laboratory’s production data at two lectures for Qiagen without economic remuneration. Role of Sponsor: The funding organizations played no role in the design of study, choice of enrolled patients, review and interpretation of data, or preparation or approval of manuscript. Acknowledgments: We thank Maria Anderberg, Kia Sjölin, Aline Marshall, Kristin Andersson, Christina Gerouda, and Sophia Harlid for excellent laboratory assistance.

References
1. Manolio TA, Bailey-Wilson JE, Collins FS. Genes, environment and the value of prospective cohort studies. Nat Rev Genet 2006;7:812–20.
2. Sjoholm MI, Hoffmann G, Lindgren S, Dillner J, Carlson J. Comparison of archival plasma and formalin-fixed paraffin-embedded tissue for genotyping in hepatocellular carcinoma. Cancer Epidemiol Biomarkers Prev 2005;14:251–5.
3. Ulvik A, Ueland PM. Single nucleotide polymorphism (SNP) genotyping in unprocessed whole blood and serum by real-time PCR: application to SNPs affecting homocysteine and folate metabolism. Clin Chem 2001;47:2050–3.
4. Pukkala E, Andersen A, Berglund G, Gislefoss R, Gudnason V, Hallmans G, et al. Nordic biological specimen banks as basis for studies of cancer causes and control: more than 2 million sample donors, 25 million person years and 100,000 prospective cancers. Acta Oncol 2007;46:286–307.
5. Lo YM, Tein MS, Lau TK, Haines CJ, Leung TN, Poon PM, et al. Quantitative analysis of fetal DNA in maternal plasma and serum: implications for noninvasive prenatal diagnosis. Am J Hum Genet 1998;62:768–75.

Malin I. L. Ivarsson2,3,4*, Joakim Dillner2,4, Joyce Carlson2,3

2 The Swedish National Biobanking Program and Departments of 3 Clinical Chemistry and 4 Medical Microbiology, Lund University, Malmö University Hospital, Malmö, Sweden

*Address correspondence to this author at: Department of Clinical Chemistry, University Hospital MAS, Entrance 71, 205 02 Malmö, Sweden. Previously published online at DOI: 10.1373/clinchem.2008.116277


Clinical Chemist

Clinical Chemistry 55:4 844 (2009)

Compiled by Nader Rifai

Lily Robinson and the Mount Vernon Affair

Recollections

For this assignment I was inspired by the 1978 Cold War case of Georgi Markov, but with a twist. Georgi Markov was a Bulgarian journalist who, while at a bus stop in London, was stabbed in the leg with the tip of an umbrella. The umbrella had been modified to inject a small pellet that presumably contained ricin. I thought about using an umbrella and ricin but decided on the more potent toxin abrin, which was hidden in the heel of one of my shoes. By releasing a lever in the shoe, I exposed a small injectable device. The small gauge of the needle allowed me to easily stab the tip into the leg of the diplomat, producing a sensation no more annoying than an insect bite.

Abrin and Ricin

The rosary pea or jequirity pea (Abrus precatorius) is a plant common to many tropical locations and the source of the toxin abrin. While all parts of the plant are poisonous, it is the seeds (bright orange with a small black cap) that are primarily implicated in poisoning. The seeds, or beads, have traditionally been used for ornamental purposes (Abrus means “beautiful”), as in prayer beads (precari means “to pray”), jewelry, and maracas (Mexican shakers), or as a folk medicine. Interestingly, because of their uniform size, the seeds have historically been used for weighing gold and jewels in Southeast Asia (1 carat = 2 seeds). Because the seeds are readily available in certain geographic areas, they are used as a means of suicide in some countries. However, many poisonings with jequirity peas (and castor beans) are unintentional, the result of ingestion of beads from ornamental objects by children.

Received November 12, 2008; accepted November 18, 2008. DOI: 10.1373/clinchem.2008.124701

Patients who ingest the crushed seed (whole seeds often pass through the GI tract intact) develop vomiting, diarrhea, and epigastric pain. As the abdominal pain progresses, patients exhibit bloody diarrhea and melena. Central nervous system manifestations include altered sensorium, generalized tonic-clonic seizures, and diffuse cerebral edema with raised intracranial pressure. In later stages, patients present with toxic hepatitis, acute renal failure, and hemolysis; acute demyelinating encephalitis has also been reported. Abrin is a toxalbumin (as is ricin) with an estimated fatal dose of 0.1 to 1 µg/kg, or one to two crushed seeds. The toxin acts by inhibiting protein synthesis. It is composed of an A-chain with N-glycosidase activity, which inhibits protein synthesis, and a B-chain that binds cell-surface receptors and facilitates the A-chain’s entry into cells. The A-chain irreversibly inactivates the 60S ribosomal subunit, blocking the binding of elongation factor 2 and thereby halting protein synthesis.

Ricin is derived from the castor bean (Ricinus communis), which originated in Asia and Africa but has now spread to many regions around the globe. These perennial shrubs have large, deep-green, divided leaves and produce spiny fruits that contain mottled seeds resembling an engorged tick. Although castor oil is also made from the seeds, it is the crushed seed material, not the oil, that contains the toxin ricin. Ricin has historically been considered for use in chemical warfare because it can be prepared as a rudimentary plant extract or in powdered, crystalline, or liquid forms. Like abrin, ricin is a glycoprotein lectin composed of an A-chain and a B-chain linked by a disulfide bond. Ricin in body fluids has been detected with immunologically based methods, which can measure concentrations of approximately 0.1 ng/mL. Other methods, such as MALDI mass spectrometry, have also been used. Abrin has been detected in food and beverages by enzyme-linked immunosorbent assay and electrochemiluminescence technologies.

© 2008 Lily Robinson

Bibliography
• Audi J, Belson M, Patel M, Schier J, Osterloh J. Ricin poisoning: a comprehensive review. JAMA 2005;294:2342–51.
• Crompton R, Gall D. Georgi Markov: death in a pellet. Med Leg J 1980;48:51–62.
• Dickers K, Bradberry S, Rice P, Griffiths GD, Vale JA. Abrin poisoning. Toxicol Rev 2003;22:137–42.
• Garber E, Walker J, O’Brien T. Detection of abrin in food using enzyme-linked immunosorbent assay and electrochemiluminescence technologies. J Food Prot 2008;71:1868–74.
• Kinamore P, Jaeger R, de Castro F. Abrus and ricinus ingestion: management of three cases. Clin Toxicol 1980;17:401–5.
• Olsnes S. The history of ricin, abrin and related toxins. Toxicon 2004;44:361–70.
• Sahni V, Agarwal S, Singh N, Sidkar S. Acute demyelinating encephalitis after jequirity pea ingestion (Abrus precatorius). Clin Toxicol 2007;45:77–9.
• Subrahmanyan D, Mathew J, Raj M. An unusual manifestation of Abrus precatorius poisoning: a report of two cases. Clin Toxicol 2008;46:173–5.