MethylCoder: software pipeline for bisulfite-treated sequences

Recommend Documents

Jun 6, 2011 - then directly usable in tools such as samtools (Li et al., 2009). The C, T summations are reported per reference cytosine in simple text and ...

Software Data-Processing Pipeline for Transient Detection

Jan 29, 2010 - software data-processing pipeline is being developed by the Raman Research ... strategies, illustrated with real data at low radio frequencies.

Arithmetic Sequences - Kuta Software

Given the explicit formula for an arithmetic sequence find the first five terms and the ... sequence and the common difference find the recursive formula and the.

Geometric Sequences - Kuta Software

Y. Worksheet by Kuta Software LLC. Given the first term and the common ratio of a geometric sequence find the first five terms and the explicit formula. 15) a.

Clustering pipeline for determining consensus sequences in targeted ...

Oct 7, 2014 - arXiv:1410.1608v1 [q-bio.GN] 7 Oct 2014. 1. Clustering pipeline for determining consensus sequences in targeted next-generation sequencing.

HYD-PREDIC : THE PIPELINE HYDRATE PREDICTION SOFTWARE

PREDICTION SOFTWARE. FLOW ASSURANCE SOFTWARE WITH A DIFFERENCE ... HYD-PREDIC SOFTWARE IS AVAILABLE FOR USE AS PER SAAS ...

Self-Contained Cross-Cutting Pipeline Software Architecture

interface layer consists of fields, controls with which users interact. ... consists of complex shared libraries to manage calling into the data layer, execute file ...

a software pipeline for image-based models and application example

of such software achieves broad technical coverage across a very general ..... features using hexahedral meshes that also allow the addition of irregular ... created within a year by a small team with the goal of facilitating rapid ..... pathway, cir

RIEMS: a software pipeline for sensitive and comprehensive ...

These strategies include initial clustering [12], mapping .... to a eukaryote by the mapping is indicated in a dedicated column. ..... acid (fna) tool commands.

A Parallel Software Pipeline for DMET Microarray ...

Jun 14, 2018 - Giuseppe Agapito ID , Pietro Hiram Guzzi ID .... makes the multiple analysis of DMET datasets faster since, the only operation required from.

A software application for modelling the pipeline ... - Google Sites

The software application developed from the model allows designing a pipeline ... terms the behaviour of the process and

PepLine: A Software Pipeline for High-Throughput ... - ACS Publications

Mar 19, 2008 - Tandem Mass Spectrometry Data on Genomic Sequences ... PepLine is a fully automated software which maps MS/MS fragmentation spectra ... For M.T.: e-mail, ..... The default value for R0 is 1, assuming good quality spectra.

Sentieon DNA pipeline for variant detection - Software-only ... - PeerJ

Jan 22, 2016 - 45 comparison of the two pipelines. The eight samples were analyzed. 46 individually using SAMtools 1.2/GATK 3.3 and Sentieon DNA. 47.

A Unified Software Pipeline Construction Scheme for ... - Google Sites

Abstract. We present a software pipeline construction scheme for DO- loops, while-loops, and loops with multiple exits, which unifies, simpli- fies, and ...

Automatic Extraction of Pipeline Parallelism for Embedded Software

Index TermsâAutomatic Parallelization, Embedded Software, ... 2011 IEEE 17th International Conference on Parallel and Distributed ... 2. 3. (a) Sequential code for (i = 0; i < NUMAV; ++i) { float sample_real[SLICE]; .... Edge type: Read-after-Write

A Unified Software Pipeline Construction Scheme for ... - Google Sites

scheduling, and k the degree of kernel unrolling required by modulo expan- sion3. ... Build the ScdTask, called lasttask

A review of software for analyzing molecular sequences - BioMed ...

Nov 24, 2014 - tools and algorithms directly into the pipeline [5]. WATERS ..... a sliding window, variations in evoluti

A review of software for analyzing molecular sequences - Springer Link

Nov 24, 2014 - Many software packages for analyzing microbial sequences such as the 16S gene from 454 sequencers and Illumina platforms are available.

HIVSetSubtype: Software for Subtype Classification of HIV-1 Sequences

be useful in the database. Such challenges have led distinct research communities to derive computational research on evolutionary analysis of HIV forward.

Pipeline

För utveckling av verksamhet, produkter och livskvalitet. Datorteknik. Tomas Nordström. Föreläsning 10. Accelerationsmekanismer ...

pipeline fundamentals - Texas Pipeline Association

Is your home heated by Natural Gas? â Your local gas company distribution (pipeline). â¢ If energy sources are offsho

Optimal Pipeline Connection for the West African Gas Pipeline Project

Feb 20, 2011 - allow for direct flow from S4 toS3 ,S3 to S2 andS2 to S1 . The geometric property of direct ..... Retrieved from: http://www.google.com.gh/search?

de-pipeline a software-pipelined loop - Computer Science Home

dependencies among the instructions in the prelude and in the loop kernel. For example, if instruction at L1 is issued at cycle t, then the three instructions ...

For Immediate Release - Alyeska Pipeline

Dec 30, 2016 - âWe're supportive of an external environment that encourages responsible resource development and helps

MethylCoder: software pipeline for bisulfite-treated sequences

Download PDF

6 downloads 2677 Views 63KB Size Report

Comment

BIOINFORMATICS APPLICATIONS NOTE Sequence analysis

Vol. 27 no. 17 2011, pages 2435–2436 doi:10.1093/bioinformatics/btr394

Advance Access publication June 30, 2011

MethylCoder: software pipeline for bisulfite-treated sequences Brent Pedersen∗ , Tzung-Fu Hsieh, Christian Ibarra and Robert L. Fischer Department of Plant and Microbial Biology, University of California, Berkeley, CA 94720, USA Associate Editor: Martin Bishop

ABSTRACT Motivation: MethylCoder is a software program that generates per-base methylation data given a set of bisulfite-treated reads. It provides the option to use either of two existing short-read aligners, each with different strengths. It accounts for soft-masked alignments and overlapping paired-end reads. MethylCoder outputs data in text and binary formats in addition to the final alignment in SAM format, so that common high-throughput sequencing tools can be used on the resulting output. It is more flexible than existing software and competitive in terms of speed and memory use. Availability: MethylCoder requires only a python interpreter and a C compiler to run. Extensive documentation and the full source code are available under the MIT license at: https://github.com/brentp /methylcode. Contact: [email protected] Received on April 15, 2011; revised on June 6, 2011; accepted on June 24, 2011

1

INTRODUCTION

Whole-genome bisulfite sequencing (BS-Seq) is used to determine the methylation levels at each cytosine. In DNA that has been treated with bisulfite, unmethylated cytosines are converted to uracil and then to thymine following PCR amplification. Methylated cytosines are not converted; so an epigenetic mark is visible in the genetic code. In order to determine nucleotide-resolution methylation levels, bisulfite-treated reads must be re-aligned to the reference genome. Since some of the cytosines (C) in each read may have been converted to thymines (T), direct alignment is not possible because conversions will appear as mismatches. Our approach is to artificially (in silico) convert remaining Cs to Ts in the reads and the reference genome. A short-read aligner can then be used to map the converted reads to the converted genome. For each alignment, the original read and reference sequence can be recovered. A reference C that is aligned to a T in a read can be considered as converted, or unmethylated. Previous software for aligning BS-Seq data used a variety of methods (see Huss, 2010 for a more complete list). Bisulfite sequence mapping program (Xi and Li, 2009) uses a bit mask (that treats Cs and Ts identically) and hashing to map bisulfite-treated reads. Earlier tools such as CokusAlignment (Cokus et al., 2008) use strategies that are slower than current aligners. BS-Seeker (Chen et al., 2010) and Bismark (Kreuger and Andrews, 2011) use a similar strategy to the one described here. BS-Seeker is memory intensive, and does not support paired-end reads. Both Bismark and BS-Seeker ∗ To

whom correspondence should be addressed.

are limited to the bowtie aligner and do not support color space reads. Bisulfite-treated reads analysis tool (BRAT; Harris et al., 2009) also uses a hashing approach and is the only other aligner that avoids double-counting overlapping paired-end reads. We introduce MethylCoder, a fast, memory-efficient BS-Seq pipeline. It supports both paired- and single-end reads in color space or nucleotide formats. MethylCoder provides a single entry point and common output formats for the bowtie (Langmead et al., 2009) and genomic short-read nucleotide alignment program (GSNAP) (Wu and Nacu, 2010) aligners. Each of these aligners has different strengths; GSNAP has no limitation on the size of the reference, but does not consider quality information with the reads. Bowtie can only map to references