IntOGen: integration and data mining of ...

© 2010 Nature America, Inc. All rights reserved.

Kim et al. in their Correspondence1 provide data that enable calculation of failure rates for modular assembly. Although it is true that modular assembly yielded ZFNs for ~25% of the DNA sites targeted, failure rates measured instead by the number of zinc-finger proteins tested are consistent with those reported in our original Correspondence2. For example, at the human CCR5 gene, Kim et al. screened 315 pairs of ZFNs for activity4; this large-scale effort yielded only a small number of functional ZFN pairs (93.3% failure rate for ZFN pairs tested). Similarly, for the tobacco SuRB gene3, we tested 32 zinc-finger arrays in vitro but identified only three with functional activity (91.6% failure rate for zinc-finger arrays tested). These data are consistent with our original predicted failure rates of ~94% and ~76% for modularly assembled ZFN pairs and zinc-finger arrays, respectively2. We believe that failure rates measured by numbers of zinc-finger arrays or ZFN pairs tested, rather than by numbers of DNA sites targeted, are the more relevant statistics for potential ZFN users because they determine how many proteins must be modularly assembled and tested for each potential site.

We also respectfully disagree with various statements Kim et al.1 make regarding oligomerized pool engineering (OPEN), a selection-based method for engineering zinc-finger arrays5. Kim et al.1 contend that our recent report3 shows that “modularly assembled ZFNs … outperformed ZFNs made using OPEN in terms of mutation frequencies” and that modular assembly and OPEN “resulted in genome modification at one out of four target sites”1. However, the ZFNs made by modular assembly and by OPEN in our study (ref. 3) were designed to recognize different DNA target sites, and therefore the conclusion of Kim et al.1 is based on an indirect comparison. For the single site where direct comparison was possible, only the OPEN approach yielded functional ZFNs.
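To make concrete why a per-protein failure rate matters for screening effort, the following is a minimal sketch that converts a failure rate into an expected screening workload. It assumes a simple geometric model in which every candidate fails independently with the same probability; that model is an assumption for illustration and is not taken from the Correspondence. The function names are hypothetical, and the only figures taken from the text are the predicted failure rates of ~94% (ZFN pairs) and ~76% (zinc-finger arrays).

```python
import math


def expected_tests_until_success(failure_rate: float) -> float:
    """Mean number of candidates screened to find the first functional one.

    Assumes independent candidates with an identical failure rate
    (geometric model); this is an illustrative assumption.
    """
    if not 0.0 <= failure_rate < 1.0:
        raise ValueError("failure_rate must be in [0, 1)")
    return 1.0 / (1.0 - failure_rate)


def prob_at_least_one_success(failure_rate: float, n_tested: int) -> float:
    """Probability that at least one of n_tested candidates is functional."""
    if not 0.0 <= failure_rate <= 1.0:
        raise ValueError("failure_rate must be in [0, 1]")
    if n_tested < 0:
        raise ValueError("n_tested must be non-negative")
    return 1.0 - failure_rate ** n_tested


if __name__ == "__main__":
    # Predicted failure rates quoted in the text (approximate values).
    for label, rate in [("ZFN pairs", 0.94), ("zinc-finger arrays", 0.76)]:
        mean_tests = expected_tests_until_success(rate)
        p10 = prob_at_least_one_success(rate, 10)
        print(f"{label}: ~{mean_tests:.1f} tested per functional one; "
              f"P(at least one in 10) = {p10:.2f}")
```

Under this model, a ~94% failure rate corresponds to roughly 17 ZFN pairs screened on average per functional pair, and a ~76% failure rate to roughly 4 zinc-finger arrays per functional array.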
In addition, we found in a different direct comparison of OPEN and modular assembly at five different target sites in the EGFP gene that ZFNs made using OPEN were active at four sites, whereas modularly assembled ZFNs were active at only one site5. Furthermore, the ZFNs made using OPEN outperformed the modularly assembled ZFNs at this one site. We believe that the higher activity and success rate of ZFNs made using OPEN is most likely due to the method’s explicit consideration of the well-established context-dependent behavior of zinc fingers2,5, a parameter that is largely ignored in modular assembly.

Kim et al.1 assert that “ZFNs made using OPEN thus far are largely limited to targeting GNN repeat sequences.” However, ~40% (11 of 28) of the half-sites successfully targeted in endogenous genes by ZFNs made using OPEN to date actually contain one or more non-GNN subsites (Supplementary Table 1). Furthermore, potential users of the ZFN technology should be aware that with both modular assembly and OPEN it is easiest to target sequences composed entirely of GNN subsites.

Kim et al.1 further suggest that “careful choice and use of reliable modules could improve success rates” of modular assembly. However, given that our original Correspondence showed that modular assembly has considerably higher failure rates for target sites harboring one or more non-GNN subsites than for those composed solely of GNN subsites2, taking this approach could also substantially restrict the targeting range of the method.

In summary, we advise potential users to carefully weigh the effort required both to engineer and to validate ZFNs. Although modular assembly is simpler to perform than the selection-based OPEN method, our direct comparisons suggest that OPEN is more efficient than modular assembly for engineering functional ZFNs. As researchers who have practiced both methods2,3,5–7, we have concluded that modular assembly requires at least as much time and effort as OPEN when one considers the requirement to screen hundreds of largely nonfunctional modularly assembled ZFNs for cellular activity. Nonetheless, the Zinc Finger Consortium continues to make reagents and software for both modular assembly and OPEN available to academic scientists (http://www.addgene.org/zfc/; http://www.zincfingers.org/software-tools.htm). We believe that rather than attempting to improve the success rate of modular assembly, future efforts should instead focus on further simplification of selection-based techniques or development of more effective design-based methods that account for the context-dependent behavior of zinc-finger domains.

Note: Supplementary information is available on the Nature Methods website.

ACKNOWLEDGMENTS
We thank the members of our laboratories for helpful discussions. J.K.J. is supported by the US National Institutes of Health (R01GM069906, R01GM088040, RC2HL101553, R24GM078369 and R21HL091808), the Cystic Fibrosis Foundation and the Massachusetts General Hospital Pathology Service. D.F.V. is supported by the US National Science Foundation (DBI 0501678 and MCB 0209818). T.C. is supported by the German Research Foundation (SPP1230–CA311/2), the German Ministry of Education and Research (01GU0618) and the European Commission’s 6th and 7th Framework Programmes (ZNIP–037783 and PERSIST–222878).

COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.

J Keith Joung1,2, Daniel F Voytas3 & Toni Cathomen4

1Molecular Pathology Unit, Center for Cancer Research, and Center for Computational and Integrative Biology, Massachusetts General Hospital, Charlestown, Massachusetts, USA.
2Department of Pathology, Harvard Medical School, Boston, Massachusetts, USA.
3Department of Genetics, Cell Biology and Development, and Center for Genome Engineering, University of Minnesota, Minneapolis, Minnesota, USA.
4Department of Experimental Hematology, Hannover Medical School, Hannover, Germany.
e-mail: [email protected], [email protected] or [email protected]

1. Kim, J.-S., Lee, H.J. & Carroll, D. Nat. Methods 7, 91 (2010).
2. Ramirez, C.L. et al. Nat. Methods 5, 374–375 (2008).
3. Townsend, J.A. et al. Nature 459, 442–445 (2009).
4. Kim, H.J., Lee, H.J., Kim, H., Cho, S.W. & Kim, J.S. Genome Res. 19, 1279–1288 (2009).
5. Maeder, M.L. et al. Mol. Cell 31, 294–301 (2008).
6. Foley, J.E. et al. PLoS ONE 4, e4348 (2009).
7. Zou, J. et al. Cell Stem Cell 5, 97–110 (2009).

IntOGen: integration and data mining of multidimensional oncogenomic data

To the Editor: The use of high-throughput techniques has come to the fore in modern cancer research. Several projects collate and analyze multiple datasets from cancer gene studies1–3. The vast amount of oncogenomic data produced to date, together with data from new, large-scale projects such as The Cancer Genome Atlas4 and the International Cancer Genome Consortium (http://www.icgc.org/), presents two new challenges5: (i) biologically relevant integration of the information coming from heterogeneous sources and (ii) an intuitive visualization system to capture changes important to tumorigenesis (driver alterations).

[Figure 1 graphic not reproduced. Recoverable labels: Genes; Cell-cycle genes; experiments GSE10072, GSE1037, GSE11117, GSE12428, GSE1897, GSE2514, GSE4115, GSE6044, GSE7339, GSE7670; cancer sites Lung, Ovary, Lymph nodes, Liver, Brain, Stomach, Testis, Thyroid gland, HR system, Pancreas, Breast, Colon, Urinary bladder, Prostate gland, Mouth, Cervix uteri, Mediastinum.]

IntOGen is a framework that addresses these issues by collating, organizing, analyzing and integrating data from genome-wide experiments that study several types of alterations in different human cancer types. The current version contains data from almost 800 independent experiments studying transcriptomic alterations, genomic gains and losses and somatic mutation information (Supplementary Note 1). We designed the system to incorporate data from new experiments and from other alteration types when available (for example, epigenomic and proteomic data).

There are several characteristics that make IntOGen unique. (i) We annotate all samples manually according to the same structured vocabulary: the International Classification of Diseases for Oncology (http://www.who.int/classifications/icd/en/), which specifies tumor topography and morphology. In this way, we can relate specific alterations with clinical annotations in a hierarchical way (types and subtypes) and combine and compare the results for different experiments with the same disease classification to detect shared patterns. (ii) To identify the most relevant alterations, we apply statistical methodologies based on the rationale that driver alterations are found in more samples than expected by chance, and we provide combined evidence from multiple studies that analyze the same tumor type (Fig. 1a and Supplementary Methods). (iii) Analysis at the level of individual genes does not capture the full biological complexity. For this, we need to analyze the contribution of biological modules (for example, pathways) to cancer (Fig. 1b, Supplementary Note 1 and Supplementary Methods). Analysis of TCGA glioblastoma expression data4 using IntOGen methodology identified genes and modules similar to those highlighted in the original study (Supplementary Note 2). (iv) As this exhaustive analysis generates huge amounts of data, we developed a powerful and intuitive web interface (http://www.intogen.org/), designed to be a discovery tool for cancer research (Fig. 1 and Supplementary Note 1).

Users can identify the genes and modules significantly (corrected P < 0.05) altered in a cancer type or explore the alteration pattern of a gene or module of interest across many cancer types. Users can easily browse results of individual experiments or of combinations of experiments with the same clinical annotation, or navigate from pathways to the genes in these pathways. In addition, users can perform custom combinations of the experiments or analyze their own modules in the context of cancer.

In summary, IntOGen is a unique and valuable resource for the cancer research community that facilitates interrogation of a substantial amount of data from genome-wide analyses. We hope it will help bridge the gap between biological and clinical information in cancer.

Note: Supplementary information is available on the Nature Methods website.

COMPETING INTERESTS STATEMENT
The authors declare no competing financial interests.

Gunes Gundem1, Christian Perez-Llamas1, Alba Jene-Sanz1, Anna Kedzierska2, Abul Islam1, Jordi Deu-Pons1,3, Simon J Furney1,4 & Nuria Lopez-Bigas1

1Research Unit on Biomedical Informatics, Department of Experimental and Health Science, Universitat Pompeu Fabra, 2Bioinformatics and Genomics Program, Centre for Genomic Regulation and 3National Institute of Bioinformatics, Biomedical Informatics Node, Barcelona Biomedical Research Park, Barcelona, Spain. 4Present address: National Institute for Health Research Biomedical Research Centre for Mental Health, Institute of Psychiatry, King’s College, London, UK.
e-mail: [email protected]

1. Rhodes, D.R. et al. Neoplasia 6, 1–6 (2004).
2. Mitelman, F., Johansson, B. & Mertens, F. Nat. Rev. Cancer 7, 233–245 (2007).
3. Baudis, M. & Cleary, M.L. Bioinformatics 17, 1228–1229 (2001).
4. The Cancer Genome Atlas Research Network. Nature 455, 1061–1068 (2008).
5. Chin, L. & Gray, J.W. Nature 452, 553–563 (2008).

Figure 1 | Identification of driver alterations at different levels in IntOGen. (a) Matrix representation of an experiment shows alteration patterns of genes over samples. First we detect genes that are altered in more samples than expected by chance. Then we combine experiments annotated with the same International Classification of Diseases (ICD) term to obtain a combined P value (Supplementary Methods). (b) Once we have a P value per experiment (step 1 in a), we perform an enrichment test for each module (module analysis). Then we combine the experiments according to cancer types to obtain a combined P value. The results are shown in color-coded matrices with red denoting significant P values (corrected P < 0.05), gray denoting insignificant P values and white indicating no analysis for that item. HR system, hematopoietic and reticuloendothelial system.

[Figure 1 panel labels. (a) Experiment: samples 1, 2, 3, … n (altered or not altered); Step 1, identification of driver alterations; Step 2, combination of experiments for cancer type A; corrected P value scale 0, 0.05, 1. (b) Genes: CDC2, CCNB2, PTTG1, MAD2L1, BUB1B, CCNB1, CDKN2A, SFN, CDC45L, PCNA, CCNE2, CDC6, YWHAZ, CCNA2. Modules: Cell cycle, p53 signaling pathway, ECM-receptor interaction, Antigen processing and presentation, Bladder cancer, Systemic lupus erythematosus, Type I diabetes mellitus. Module analysis and combination; scale: P value for upregulation 0, 0.05, 1.]
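The two steps in Figure 1a and the module analysis in Figure 1b can be sketched in a few lines of Python. The actual tests are described in the Supplementary Methods, which are not reproduced here, so every statistical choice below is an assumption made for illustration: a binomial upper tail stands in for "altered in more samples than expected by chance", Fisher's method stands in for combining P values across experiments, and a hypergeometric upper tail stands in for the module enrichment test. All function names and all numbers in the usage lines are hypothetical, and multiple-testing correction is omitted.

```python
import math


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): chance of seeing a gene altered
    in at least k of n samples given a background alteration rate p."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
               for i in range(k, n + 1))


def fisher_combined_p(p_values: list[float]) -> float:
    """Combine independent P values with Fisher's method.

    The statistic -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom under the null; for even degrees of freedom the
    survival function has the closed form used below.
    """
    if not p_values:
        raise ValueError("need at least one P value")
    # Clamp to avoid log(0); half_stat is the chi-square statistic / 2.
    half_stat = -sum(math.log(max(p, 1e-300)) for p in p_values)
    term, total = 1.0, 1.0
    for i in range(1, len(p_values)):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


def hypergeom_upper_tail(overlap: int, module_size: int,
                         n_altered: int, n_genes: int) -> float:
    """P(X >= overlap) when n_altered genes are drawn without replacement
    from n_genes, of which module_size belong to the module."""
    upper = min(module_size, n_altered)
    hits = sum(math.comb(module_size, i)
               * math.comb(n_genes - module_size, n_altered - i)
               for i in range(overlap, upper + 1))
    return hits / math.comb(n_genes, n_altered)


if __name__ == "__main__":
    # Step 1 (hypothetical numbers): one gene in three experiments,
    # given as (altered samples, total samples), background rate 0.1.
    per_experiment = [binomial_upper_tail(k, n, 0.1)
                      for k, n in [(9, 40), (6, 25), (12, 60)]]
    # Step 2: combine experiments annotated with the same cancer type.
    print("combined gene P value:", fisher_combined_p(per_experiment))
    # Module analysis (hypothetical): 8 of 120 altered genes fall in a
    # 100-gene module, out of 20000 genes analyzed.
    print("module enrichment P value:",
          hypergeom_upper_tail(8, 100, 120, 20000))
```

Fisher's method assumes the experiments are independent; other combination schemes (for example, weighted Z-scores) would fit the same two-step outline.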

nature methods | VOL.7 NO.2 | FEBRUARY 2010 | 93
