MIDAS - Multiple Integration of Data Annotation Study

Integration of genotype data from NGS and Sanger sequencing with HPO-coded phenotype data in a diagnostic setting

P-ClinG-083

Doelken SC1*, Eck SH1, Brumm R1, Kuecuek S1, Rath S1, Hasselbacher V1, Sofeso C1, Busse B1, Chahrokh-Zadeh S1, Marschall C1, Mayer K1, Rost I1, Klein HG1

* [email protected]

1 Center for Human Genetics and Laboratory Diagnostics Dr. Klein, Dr. Rost and Colleagues, Lochhamer Str. 29, 82152 Munich, Germany

Introduction

To integrate next-generation sequencing (NGS) data, Sanger sequencing data and phenotype data in a diagnostic laboratory, we developed a software tool called MIDAS (Multiple Integration of Data Annotation Study). It combines patient data from the Laboratory Information Management System (LIMS), data from the routine Sanger sequencing workflow, and phenotype data based on the Human Phenotype Ontology (HPO) with NGS results. In particular, genotype-phenotype correlations identified in one patient are made available for all other cases, to aid interpretation and build a comprehensive knowledge base.

Sequencing and Analysis Pipeline

For the NGS panel analysis, exonic regions of more than 1200 custom-selected genes are enriched in parallel by oligonucleotide hybridization and capture (Agilent QXT), followed by massively parallel sequencing on the Illumina NextSeq. Templates for smaller subpanels restrict data analysis to genes from the requested indication, which limits interpretation to relevant genes and minimizes the possibility of incidental findings. Data analysis is performed using the CLC Genomics Workbench and custom-developed Perl scripts (Figure 1). Target regions that fail to reach the designated coverage threshold of 20X are re-analyzed by Sanger sequencing. Identified candidate mutations are independently confirmed. Phenotype information is standardized using Human Phenotype Ontology (HPO) terms.

Data Integration - MIDAS

MIDAS integrates patient data from our LIMS, data from the routine Sanger sequencing workflow, and phenotype data based on the Human Phenotype Ontology (HPO) with the NGS results (Figure 3). In particular, genotype-phenotype correlations identified in one patient are made available for all other cases to aid interpretation and build a comprehensive knowledge base. The data can be queried via a web interface for dynamic data access and filtering.
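Two of the steps described here can be illustrated with a short Python sketch: flagging target regions below the 20X coverage threshold for Sanger re-analysis, and looking up HPO terms recorded in other cases that carry the same variant. This is not the MIDAS implementation, which the poster describes only as CLC Genomics Workbench plus custom Perl scripts. All names (`regions_for_sanger`, `Case`, `KnowledgeBase`), the region and variant identifiers, and the use of a single coverage value per region are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

COVERAGE_THRESHOLD = 20  # the 20X threshold stated in the text


def regions_for_sanger(region_coverage, threshold=COVERAGE_THRESHOLD):
    """Return the target regions whose coverage is below the threshold.

    Assumption: each region is summarized by one coverage value; the
    poster does not say whether this is a minimum or a mean.
    """
    return sorted(
        region for region, cov in region_coverage.items() if cov < threshold
    )


@dataclass
class Case:
    """One patient case (hypothetical record layout)."""
    patient_id: str
    hpo_terms: set  # HPO term IDs, e.g. "HP:0001250"
    variants: set   # variant descriptions, e.g. "GENE1:c.100A>G"


class KnowledgeBase:
    """In-memory index from variant to the cases that carry it."""

    def __init__(self):
        self._by_variant = defaultdict(list)

    def add_case(self, case):
        for variant in case.variants:
            self._by_variant[variant].append(case)

    def phenotypes_for_variant(self, variant, exclude_patient=None):
        """Count HPO terms in cases carrying `variant`.

        `exclude_patient` leaves out the case currently being
        interpreted, so only the other cases contribute.
        """
        counts = defaultdict(int)
        for case in self._by_variant.get(variant, []):
            if case.patient_id == exclude_patient:
                continue
            for term in case.hpo_terms:
                counts[term] += 1
        return dict(counts)


if __name__ == "__main__":
    # Placeholder data; gene, exon, and variant names are made up.
    coverage = {"GENE1_exon1": 35, "GENE1_exon2": 12, "GENE2_exon5": 20}
    print(regions_for_sanger(coverage))

    kb = KnowledgeBase()
    kb.add_case(Case("P1", {"HP:0001250", "HP:0001249"}, {"GENE1:c.100A>G"}))
    kb.add_case(Case("P2", {"HP:0001250"}, {"GENE1:c.100A>G"}))
    print(kb.phenotypes_for_variant("GENE1:c.100A>G", exclude_patient="P2"))
```

A region with exactly 20X coverage is treated as passing, following the wording "fail to reach" the threshold. A production system would presumably back the index with a database (the poster's figures mention an NGS-DB) rather than an in-memory dictionary.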

[Diagram labels from the poster figures (layout not recoverable): MIDAS, Reports, HPO Interface, Variant Interface, Runs Interface, LIMS, NGS-DB, Analysis Software, Analysis Pipeline, CLCbio Genomics Workbench, NGS Sequencing, Mapping, Target Region Statistics, Variant Calling, Select Target Regions, Variant Filter, coverage, Low Coverage Exons, Sanger Seq]
