Accurate and fast detection of complex and nested ... - Michael Schatz

Accurate and fast detection of complex and nested structural variations using long read technologies Fritz Sedlazeck

Friday, Oct 28 CSHL

Structural variation

Weischenfeldt et al. (2013)

Long Read Technologies
•  (+) SVs in repetitive regions
•  (+) Can identify nested SVs
•  (-) Higher error rate
•  (-) Hard to align

Hard to align

BWA-MEM

Human genome: 1kb Inversion

Improving long read alignment

1.  Split the reads:
    –  Translocations
    –  Inversions
    –  Duplications
2.  Improve alignment:
    –  Insertions
    –  Deletions

Philipp Rescheneder

NGM-LR workflow

NGM-LR reconcile: dot plot of 500 bp segments from a read spanning an inversion

NGM-LR inversion

BWA-MEM

NGM-LR

Convex Pairwise Alignment

AAAGAATTCA
A-A-A-T-CA

vs.

AAAGAATTCA
AAA----TCA

•  Linear: gap cost always the same
•  Affine: separate penalties for opening and extending a gap
•  Convex: initially similar to affine, but becomes proportionally less costly for larger gaps
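To make the three gap models concrete, here is a minimal Python sketch that scores the two example alignments above (four scattered 1 bp gaps vs. one 4 bp gap). The penalty values and the logarithmic form of the convex cost are assumptions chosen for illustration; they are not the scoring parameters used by NGM-LR.

```python
import math
import re

def gap_lengths(aligned: str) -> list[int]:
    """Return the length of every run of '-' in an aligned sequence."""
    return [len(run) for run in re.findall(r"-+", aligned)]

# All penalty values below are illustrative assumptions.
def linear_cost(length: int, per_base: float = 2.0) -> float:
    """Linear: every gapped base costs the same."""
    return per_base * length

def affine_cost(length: int, gap_open: float = 4.0, gap_extend: float = 1.0) -> float:
    """Affine: one opening penalty plus a constant penalty per gapped base."""
    return gap_open + gap_extend * length

def convex_cost(length: int, gap_open: float = 4.0, gap_extend: float = 1.0) -> float:
    """Convex (assumed log form): equals the affine cost for a 1 bp gap,
    then grows more slowly as the gap gets longer."""
    return gap_open + gap_extend * (1.0 + math.log(length))

def total_cost(aligned: str, model) -> float:
    """Sum the cost of every gap in the aligned sequence under one model."""
    return sum(model(n) for n in gap_lengths(aligned))

scattered = "A-A-A-T-CA"  # four gaps of length 1
single = "AAA----TCA"     # one gap of length 4

for name, model in [("linear", linear_cost), ("affine", affine_cost), ("convex", convex_cost)]:
    print(name, total_cost(scattered, model), total_cost(single, model))
```

Under these assumed penalties the linear model cannot tell the two alignments apart, while the affine and convex models both prefer the single contiguous gap, and the convex model charges less than affine for it.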

NGM-LR deletion

BWA-MEM

NGM-LR

NGM-LR complex SV

BWA-MEM

NGM-LR

Sniffles
•  Analyzing split reads, alignment events and noisy regions
•  Parameter estimation
•  Optional: Genotype estimation
•  Optional: Clustering of SVs

Analyzing noisy regions
•  Extract the differences in the alignment
•  Detect the noisy regions: plane sweep algorithm
•  Store potential regions in a self-balancing binary tree
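The idea of sweeping over alignment differences can be sketched as follows. This is a simplified illustration, not the Sniffles implementation: the function name `noisy_regions`, the window size and the difference threshold are assumptions, and a plain Python list of merged intervals stands in for the self-balancing binary tree.

```python
from collections import deque

def noisy_regions(diff_positions, window=50, min_diffs=5):
    """Sweep left to right over alignment-difference positions and return
    merged (start, end) intervals in which at least `min_diffs` differences
    fall within `window` bases. Both thresholds are illustrative assumptions."""
    regions = []      # merged noisy intervals found so far (stand-in for the tree)
    active = deque()  # differences currently inside the sweep window
    for pos in sorted(diff_positions):
        active.append(pos)
        # Drop differences that fell out of the window behind the sweep line.
        while pos - active[0] >= window:
            active.popleft()
        if len(active) >= min_diffs:
            start, end = active[0], pos
            if regions and start <= regions[-1][1]:
                # Overlaps the previous region: extend it instead of adding a new one.
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions

# Hypothetical difference positions (mismatches/indels) along one read alignment.
diffs = [10, 500, 505, 510, 512, 520, 900,
         2000, 2003, 2007, 2011, 2015, 2040]
print(noisy_regions(diffs))
```

Isolated differences (positions 10 and 900 here) never reach the threshold, so only the two dense clusters are reported.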

Simulation/Evaluation
1.  Simulate 20 SVs of each type using SURVIVOR
2.  Simulate PacBio and Illumina paired-end reads
3.  Evaluation using SURVIVOR

Categories: Found ±10bp + type · Found ±1kbp + ignore type · Not found · Not simulated
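The four evaluation categories can be illustrated with a small Python sketch. This is not SURVIVOR's code: the `SV` record, the `evaluate` function and the rule of comparing both start and end coordinates are assumptions made for the example; only the ±10 bp and ±1 kbp tolerances come from the slide.

```python
from typing import NamedTuple

class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str

def close(a: SV, b: SV, tol: int) -> bool:
    """True if both breakpoints of a and b agree within `tol` bases."""
    return (a.chrom == b.chrom
            and abs(a.start - b.start) <= tol
            and abs(a.end - b.end) <= tol)

def evaluate(simulated, called, strict_tol=10, loose_tol=1000):
    """Count simulated SVs per category, plus calls that match no simulated SV."""
    counts = {"found_strict": 0, "found_loose": 0, "not_found": 0, "not_simulated": 0}
    for truth in simulated:
        if any(close(truth, c, strict_tol) and c.svtype == truth.svtype for c in called):
            counts["found_strict"] += 1   # Found ±10bp + type
        elif any(close(truth, c, loose_tol) for c in called):
            counts["found_loose"] += 1    # Found ±1kbp + ignore type
        else:
            counts["not_found"] += 1      # Not found
    for call in called:
        if not any(close(truth, call, loose_tol) for truth in simulated):
            counts["not_simulated"] += 1  # Not simulated (extra call)
    return counts

# Hypothetical simulated truth set and caller output.
simulated = [SV("chr1", 1000, 1500, "DEL"),
             SV("chr1", 5000, 6000, "INV"),
             SV("chr2", 300, 800, "DUP")]
called = [SV("chr1", 1004, 1495, "DEL"),
          SV("chr1", 5200, 6300, "DEL"),
          SV("chr3", 10, 20, "INS")]
print(evaluate(simulated, called))
```

In this toy input the deletion is found exactly, the inversion is found only loosely (wrong type, breakpoints a few hundred bases off), the duplication is missed, and the chr3 call counts as not simulated.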

Evaluation of Sniffles

[Figure: plot panels by simulated SV type (Indel, Dup, Inv, Tra, InvDel, InvDup) and by caller (PBHoney, SURVIVOR, Sniffles + BWA, Sniffles + NGM-LR). Axis ticks: 100, 250, 500, 1000, 5000, 10000, 50000 and 0 to 100. Legend: Found ±10bp + type; Found ±1kbp + ignore type; Not found; Not simulated.]

Summary

NextGenMap:
•  Short read version: github.com/cibiv/NextGenMap
•  Long read mapper: github.com/philres/nextgenmap-lr
•  Self detection of SVs
•  Manuscript in preparation

Sniffles:
•  SV detection for long reads
•  Nested SVs
•  Manuscript in preparation
•  Available: github.com/fritzsedlazeck/Sniffles

SURVIVOR:
•  Toolkit for SV detection on short reads
•  Simulation/Evaluation of current methods
•  Consensus approach
•  Accepted: Nature Communications
•  Available: github.com/fritzsedlazeck/SURVIVOR

Future work:
•  How much coverage is needed?
•  Nanopore support
•  Analysis of nested SVs
•  Application to cancer genomes (see Maria Nattestad's poster #79)

Acknowledgments

Maria Nattestad Han Fang Srividya Ramakrishnan

Philipp Rescheneder Moritz Smolka Arndt von Haeseler

Daniel Jeffares Jürg Bähler Christophe Dessimoz

Michael Schatz

NGM-LR nanopore BWA-MEM

NGM-LR

BWA-MEM

NGM-LR

Norris et al. (2016)

Evaluation of Sniffles: SKBR3 8 Mb

Her2 GSDMB RARA

PKIA

Inversion was only found by Sniffles

TATDN1
