Using HPC techniques to accelerate NGS workflows
Pierre Carrier, Richard Walsh, Bill Long, Carlos Sosa, Jef Dawson (Cray Life Science Working Group, Cray Inc.)
TrinityRNASeq: collaboration with Brian Haas and Timothy Tickle (The Broad Institute) and Thomas William (TU Dresden)
ISMB 2014, 7/6/2014
Outline
● Overview of an important NGS workflow example: MegaSeq
● What is the HPC approach? What is distributed memory? Toward the 1-hour genome
● Example case: TrinityRNASeq
  ● Overview of TrinityRNASeq (Inchworm, Chrysalis, Butterfly)
  ● Inchworm: structure of the sequential code and the MPI solution
  ● MPI scaling results for the mouse and axolotl RNA sequences
● Your feedback: What are the key bottlenecks you encounter?
NGS Workflow Modular Tools
● Most bioinformatics workflows consist of several intermediate, separate modular programs, often written in Perl, Python, Java, or C/C++ (i.e., they are serial codes, sometimes with OpenMP multi-threading):
  ● BWA
  ● bowtie/bowtie2
  ● tophat
  ● GATK
  ● BioPerl
  ● blat
  ● samtools
  ● Picard
  ● R
  ● The list goes on…
● Currently, relatively few bioinformatics programs are parallelized for distributed memory using the well-established Message Passing Interface (MPI) standard:
  ● mpiBLAST/abokiaBLAST
  ● ABySS
  ● Ray
  ● HMMER
  ● PhyloBayes-MPI
● A typical bioinformatics workflow is a combination of all these separate programs (a minimal sketch contrasting the shared-memory and distributed-memory models follows below).
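The distinction matters because an OpenMP code shares one node's memory among its threads, whereas each MPI rank is a separate process that owns only its private portion of the data and exchanges information through explicit messages. The following is a minimal illustrative sketch of the distributed-memory model; it is not taken from any of the codes above, and the problem size and variable names are hypothetical.

// Hypothetical illustration of distributed memory with MPI (C++): each rank
// allocates and processes only its own slice of a large dataset, and partial
// results are combined by explicit communication.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Hypothetical total problem size; each rank allocates only its own slice,
    // so the per-process memory footprint shrinks as ranks are added.
    const long N = 100000000;
    const long local_n = N / nranks;
    std::vector<double> local(local_n, 1.0);   // private to this rank

    double local_sum = 0.0;
    for (long i = 0; i < local_n; ++i) local_sum += local[i];

    // Ranks cannot see each other's memory; results are combined by messages.
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::printf("global sum = %.0f\n", global_sum);
    MPI_Finalize();
    return 0;
}

An OpenMP version of the same loop would keep the entire array in one node's memory, so both its footprint and its scaling are bounded by that single node; this is the limitation that appears later in the mouse Inchworm timings.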
Overview of an NGS workflow: MegaSeq
Megan J. Puckelwartz et al., Supercomputing for the parallelization of whole genome analysis, http://bioinformatics.oxfordjournals.org/content/30/11/1508
Parallelism can be obtained by computing multiple genomes simultaneously (a hedged dispatch sketch follows after this slide).
[Figure: software pipeline with approximate task counts per stage:]
  ● Picard (Java): 2/ReadGroup
  ● BWA (C): 1/ReadGroup
  ● samtools (C): 25/ReadGroup
  ● Picard (Java): 100/genome
  ● GATK (6x) (Java): 25/genome, 1/genome
  ● snpEff (Perl, Python)
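The coarse-grained parallelism MegaSeq exploits (many independent per-genome pipeline instances running at once) can be sketched in a few lines of MPI. This is an illustration only, not the MegaSeq implementation; the genome names and the run_pipeline.sh wrapper are hypothetical placeholders.

// Hypothetical sketch: run one copy of a serial per-genome pipeline per MPI rank.
// This is coarse-grained ("embarrassingly parallel") workflow parallelism, not a
// parallelization of the tools themselves.
#include <mpi.h>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Hypothetical list of genomes; in practice this would be read from disk.
    std::vector<std::string> genomes = {"genome_001", "genome_002", "genome_003", "genome_004"};

    // Round-robin assignment: rank r processes genomes r, r+nranks, r+2*nranks, ...
    for (std::size_t i = rank; i < genomes.size(); i += nranks) {
        // run_pipeline.sh is a hypothetical wrapper that chains the serial tools
        // (BWA, Picard, samtools, GATK, snpEff) for one genome.
        std::string cmd = "./run_pipeline.sh " + genomes[i];
        std::printf("rank %d: %s\n", rank, cmd.c_str());
        int status = std::system(cmd.c_str());
        if (status != 0)
            std::fprintf(stderr, "rank %d: pipeline failed for %s\n", rank, genomes[i].c_str());
    }

    MPI_Finalize();
    return 0;
}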
Overview of an NGS workflow: MegaSeq
Megan J. Puckelwartz et al., Supercomputing for the parallelization of whole genome analysis, http://bioinformatics.oxfordjournals.org/content/30/11/1508
We believe this outstanding improvement can be pushed further with the HPC programming techniques discussed next, which help reduce the computing time for one full genome to 1 hour.
Each genome requires ~1 TB of space.
In the MegaSeq example, that would mean computing all 240 genomes in less than an hour.
Overview of an NGS workflow
Let's look more closely at this general pipeline (Genome 1):
[Flowchart: Extract fastq (Picard) → ALN (BWA) → SAMPE (BWA) → Compress/Split/Sort/Duplicates/reorder (samtools, Picard) → GATK steps: RealignerTargetCreator → IndelRealigner → BaseRecalibrator → PrintReads → Haplotype Caller → VariantFiltration → Annotate variants (snpEff)]
Overview of an NGS workflow (continued)
[Same pipeline flowchart, now repeated for Genome 1, Genome 2, …, Genome n; every pipeline instance accesses the same database disk. Annotated runtime: approximately < 2 days.]
MPI Inchworm Phase 1 of 2: distributing the k-mer catalog
[Figure: MPI ranks 1, 2, …, n each read a portion of the k-mer catalog (k-mers with abundance counts, e.g. GGCTCTATGAGATGGGTTTGAAGTG).]
Each MPI rank sends k-mers to targeted MPI ranks.
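A self-contained sketch of what this k-mer distribution step might look like, assuming a simple "hash the k-mer, take it modulo the number of ranks" placement rule. It is not the actual MPI Inchworm code; the tag, wire format, and catalog container are assumptions.

// Hypothetical sketch of MPI Inchworm Phase 1: every rank hashes each (k-mer, count)
// pair it has read and ships it to the rank that will own it in the distributed catalog.
#include <mpi.h>
#include <cstdio>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Illustrative placement rule: hash the k-mer string onto one of the ranks.
static int owner_rank(const std::string& kmer, int nranks) {
    return static_cast<int>(std::hash<std::string>{}(kmer) % nranks);
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, nranks = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);
    const int KMER_TAG = 1;   // hypothetical message tag

    // Pretend each rank has read these (k-mer, count) pairs from its slice of the input.
    std::vector<std::pair<std::string, unsigned>> my_kmers = {
        {"GGCTCTATGAGATGGGTTTGAAGTG", 4}, {"TGGTGTTACCTGAAGTCGAAGCAGG", 1}};

    // First tell every rank how many k-mers to expect from each sender.
    std::vector<int> send_counts(nranks, 0), recv_counts(nranks, 0);
    for (const auto& kc : my_kmers) send_counts[owner_rank(kc.first, nranks)]++;
    MPI_Alltoall(send_counts.data(), 1, MPI_INT,
                 recv_counts.data(), 1, MPI_INT, MPI_COMM_WORLD);

    // Non-blocking sends of "count kmer" text messages to the owning ranks.
    std::vector<std::string> msgs;
    std::vector<MPI_Request> reqs(my_kmers.size(), MPI_REQUEST_NULL);
    msgs.reserve(my_kmers.size());
    for (std::size_t i = 0; i < my_kmers.size(); ++i) {
        msgs.push_back(std::to_string(my_kmers[i].second) + " " + my_kmers[i].first);
        MPI_Isend(msgs.back().c_str(), static_cast<int>(msgs.back().size()) + 1, MPI_CHAR,
                  owner_rank(my_kmers[i].first, nranks), KMER_TAG, MPI_COMM_WORLD, &reqs[i]);
    }

    // Receive the expected number of k-mers and add them to the local catalog.
    int expected = 0;
    for (int c : recv_counts) expected += c;
    std::map<std::string, unsigned> local_catalog;
    for (int i = 0; i < expected; ++i) {
        char buf[256];
        MPI_Recv(buf, sizeof(buf), MPI_CHAR, MPI_ANY_SOURCE, KMER_TAG,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        unsigned count = 0;
        char kmer[200];
        std::sscanf(buf, "%u %199s", &count, kmer);
        local_catalog[kmer] += count;
    }
    MPI_Waitall(static_cast<int>(reqs.size()), reqs.data(), MPI_STATUSES_IGNORE);

    std::printf("rank %d owns %zu distinct k-mers\n", rank, local_catalog.size());
    MPI_Finalize();
    return 0;
}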
Architecture of MPI Inchworm (Brian Haas, BioIT 2014)
MPI Inchworm Phase 2 of 2: Building the Contigs
[Figure: Rank 1 holds a contig ending in the k-mer GGCTCTATGAGATGGGTTTGAAGTG and looks for the greedy extension to the right (A, C, G, or T). Rank 1 sends a query (MPI_Send) to the rank that owns the candidate k-mers, e.g. Rank 2: "Greedy extension to: GCTCTATGAGATGGGTTTGAAGTG ?". Rank 2 responds (received via MPI_Recv on Rank 1) with the best extension base, here T.]
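A hedged sketch of the query/response exchange in the figure: the contig-builder rank sends the (k-1)-base suffix of its contig to the rank owning the candidate k-mers, and that rank replies with the most abundant extension base. This is not the actual MPI Inchworm implementation; the tags, buffer sizes, and catalog type are assumptions.

// Hypothetical sketch of MPI Inchworm Phase 2: a contig builder asks a k-mer
// server for the greedy right-extension of the current contig end.
#include <mpi.h>
#include <map>
#include <string>

const int QUERY_TAG = 2;   // hypothetical request/reply tags
const int REPLY_TAG = 3;

// Builder side: send the (k-1)-base suffix of the contig end and wait for the best base.
char request_extension(const std::string& suffix, int server_rank) {
    MPI_Send(suffix.c_str(), static_cast<int>(suffix.size()) + 1, MPI_CHAR,
             server_rank, QUERY_TAG, MPI_COMM_WORLD);
    char base = 0;   // server replies 'A', 'C', 'G', 'T', or 0 if no extension exists
    MPI_Recv(&base, 1, MPI_CHAR, server_rank, REPLY_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    return base;
}

// Server side: answer one query using this rank's slice of the k-mer catalog.
void answer_one_query(const std::map<std::string, unsigned>& local_catalog) {
    char buf[256];
    MPI_Status status;
    MPI_Recv(buf, sizeof(buf), MPI_CHAR, MPI_ANY_SOURCE, QUERY_TAG, MPI_COMM_WORLD, &status);
    std::string suffix(buf);

    // Greedy rule: of the four candidate k-mers (suffix + base), report the base
    // whose k-mer is most abundant in the local catalog.
    char best_base = 0;
    unsigned best_count = 0;
    for (char base : {'A', 'C', 'G', 'T'}) {
        auto it = local_catalog.find(suffix + base);
        if (it != local_catalog.end() && it->second > best_count) {
            best_count = it->second;
            best_base = base;
        }
    }
    MPI_Send(&best_base, 1, MPI_CHAR, status.MPI_SOURCE, REPLY_TAG, MPI_COMM_WORLD);
}

In practice the builder would repeat this request for every extension step and the server would loop over incoming queries; only the single exchange shown in the figure is sketched here.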
Results for mouse
Advantage of distributed memory: time scalability
[Plot: total time (seconds, 0 to 5000) versus number of threads or MPI ranks (0 to 30) for OMP-Inchworm and MPI-Inchworm. The OpenMP version is limited by the core count of one single node.]
Results for mouse and axolotl
Advantage of distributed memory: memory scalability
[Plot: Mouse MPI Inchworm on XC30 (Intel IVB-10): absolute time (seconds, log scale) versus number of MPI ranks, shown separately for the Kmer Server and Contig Builder phases.]
[Plot: Axolotl MPI Inchworm on XC30 (Intel IVB-10): absolute time (seconds, log scale) versus number of MPI ranks.]
The Mexican axolotl (~30 GB) computation is impossible with the serial code on 64 or 128 GB nodes; it requires a special large-memory node.
More memory is required for the axolotl: use more MPI ranks to distribute the memory.
Distributed memory allows you to do research on problems that would otherwise be out of reach.
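To see why adding ranks relieves the memory pressure, assume (illustratively) that the ~30 GB axolotl k-mer catalog is spread roughly evenly across the ranks; the 300-rank figure is taken from the smallest rank count in the axolotl runs, and the even split is an assumption:

    per-rank share ≈ 30 GB / 300 ranks ≈ 0.1 GB per rank

which fits easily on a standard 64 GB node alongside the rest of the application, whereas the serial code must hold the entire ~30 GB in a single process.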
Your feedback concerning your workflows?
[email protected] [email protected]