On the Use of Bit Arrays in the Detection of DNA Regions with Methylated Cytosines

César González Segura (1), Mariano Pérez Martínez (1), Juan M. Orduña Huertas (1), Javier Chaves-Martínez (2), Ana Bárbara Garcia-García (2)

(1) Departamento de Informática, Universidad de Valencia, SPAIN. {Cesar.Gonzalez-Segura, Mariano.Perez, Juan.Orduna}@uv.es
(2) INCLIVA Health Research Institute, SPAIN

Gobierno de España, Ministerio de Economía, Industria y Competitividad

Introduction





The study of epigenetics has become crucial to understanding diseases such as obesity, hypertension and diabetes mellitus type 2 [1,2]. DNA methylation is also becoming an important factor in epigenetic analysis.



Several tools are available to retrieve the methylation context of sequences: RRBSMAP, Bismark[3], HPG-Methyl[4,5] and others.

However, the output of these tools must be post-processed to obtain useful statistics.


Methylation detection pipeline 

Our pipeline has several stages:
1. It starts with FASTQ files obtained by sequencing bisulphite-treated DNA samples.
2. These sequences are aligned to the reference genome using HPG-Methyl[4,5].
3. The methylated regions are mapped using the new tool HPG-HMapper, and a report is generated from the maps.

[Figure: pipeline diagram. DNA Samples → Bisulphite Treatment → Alignment (HPG-Methyl) → mC mapping (HPG-HMapper) → Methylation map (Chromosome n: ··· 221210 223130 223132 ···) → Methylation report (Methylated C in Chromosome n: 1210013 nt; From total C: 8%; Methylated C in CpG islands: 77%; etc.)]


Bit array data structure



Methylation information is stored in a compressed bit array data structure. 





Each byte maps to 8 base-pairs, where a bit set to one indicates a methylated base, and a zero a non-methylated base.
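The byte-to-base-pair mapping can be illustrated with a minimal Python sketch. It assumes a least-significant-bit-first order within each byte, which the poster does not specify, and the helper names `locate` and `is_methylated` are hypothetical, not part of HPG-HMapper.

```python
def locate(pos):
    """Map a base-pair position to (byte index, bit offset within the byte)."""
    return divmod(pos, 8)


def is_methylated(data, pos):
    """Return True if the bit for position pos is set to one."""
    index, offset = locate(pos)
    # Assumption: bit 0 (least significant) holds the first of the 8 bases.
    return bool((data[index] >> offset) & 1)


# A region of 1000 base pairs needs ceil(1000 / 8) = 125 bytes.
region = bytearray((1000 + 7) // 8)
region[1] = 0b00100000  # marks position 8 * 1 + 5 = 13 as methylated

print(locate(13))  # (1, 5)
print(is_methylated(region, 13), is_methylated(region, 14))  # True False
```

Under this layout a whole chromosome occupies one eighth of a byte per base pair, and any position is reached with one division and one shift.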

This allows fast random-access reads and writes at any position in the genome. The data layout also lends itself to acceleration with SIMD instructions.
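As an illustration of why a packed layout suits bulk processing, the following Python sketch counts the methylated positions in a range one byte at a time. It is not SIMD code and not the authors' implementation; it only shows the access pattern that a vectorized version could apply to many bytes per instruction. The function name `count_methylated` and the least-significant-bit-first order are assumptions.

```python
# Number of set bits for every possible byte value.
POPCOUNT = [bin(b).count("1") for b in range(256)]


def count_methylated(data, start, end):
    """Count set bits for positions start <= p < end (LSB-first order assumed)."""
    if start >= end:
        return 0
    first_byte, first_off = divmod(start, 8)
    last_byte, last_off = divmod(end, 8)
    if first_byte == last_byte:
        # Range lies inside one byte: keep bits first_off .. last_off - 1.
        mask = ((1 << last_off) - 1) & ~((1 << first_off) - 1)
        return POPCOUNT[data[first_byte] & mask]
    # Partial leading byte: keep bits first_off .. 7.
    total = POPCOUNT[data[first_byte] & (0xFF << first_off) & 0xFF]
    # Whole bytes in the middle cost one table lookup each.
    total += sum(POPCOUNT[b] for b in data[first_byte + 1:last_byte])
    # Partial trailing byte: keep bits 0 .. last_off - 1.
    if last_off:
        total += POPCOUNT[data[last_byte] & ((1 << last_off) - 1)]
    return total


data = bytearray([0b00001111, 0xFF, 0b00000001])
print(count_methylated(data, 0, 24))  # 13
print(count_methylated(data, 2, 17))  # 11
```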


Bit array: writing 

Example: writing the methylation information of the first 8 nt of a read aligned at chromosome 3, position 2200.

1) Get the compressed index and offset for the alignment position:

Cpos = 2200 / 8 = 275 (integer division)
Coffset = 2200 mod 8 = 0

2) Fetch the compressed value and apply a bit-mask to the corresponding bits:

V = (map3(275)
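The arithmetic in step 1 and a fetch-mask-store write can be sketched in Python as follows. Because the expression in step 2 is cut off in the source, the masking shown here is one plausible read-modify-write scheme, not necessarily the authors' exact formula. The function `write_bits`, the size of `map3`, the least-significant-bit-first order, and the methylation flags are all made up for illustration.

```python
def write_bits(chrom_map, pos, bits):
    """Write a sequence of 0/1 methylation flags starting at position pos."""
    for i, bit in enumerate(bits):
        c_pos, c_offset = divmod(pos + i, 8)  # compressed index, bit offset
        mask = 1 << c_offset                  # LSB-first order (assumed)
        value = chrom_map[c_pos]              # fetch the compressed value
        if bit:
            value |= mask                     # mark the base as methylated
        else:
            value &= ~mask & 0xFF             # mark it as non-methylated
        chrom_map[c_pos] = value              # store it back


map3 = bytearray(400)  # hypothetical map for chromosome 3
print(divmod(2200, 8))  # (275, 0)

# Made-up flags for the 8 nt starting at position 2200.
write_bits(map3, 2200, [0, 1, 0, 0, 1, 0, 0, 0])
print(bin(map3[275]))  # 0b10010
```

Since the offset is 0 in this example, all 8 flags land in the single byte at index 275; an unaligned start position would spread them across two adjacent bytes.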
