Reproducible floating-point atomic addition in data-parallel environment

Proceedings of the Federated Conference on Computer Science and Information Systems pp. 721–728

DOI: 10.15439/2015F86 ACSIS, Vol. 5

David Defour

Sylvain Collange

Laboratoire DALI-LIRMM, 52 avenue Paul Alduy, 66860 Perpignan Cedex, France. Email: [email protected]

INRIA – Centre de recherche Rennes – Bretagne Atlantique, Campus de Beaulieu, F-35042 Rennes Cedex, France. Email: [email protected]

Abstract—Floating-point additions in a concurrent execution environment are known to be hazardous, as the result depends on the order in which operations are performed. This problem arises in data-parallel execution environments such as GPUs, where reproducibility of floating-point atomic additions is challenging to achieve. It is due to the rounding error or cancellation that may occur at each operation, combined with the lack of control over execution order. In this article we propose two solutions to address this problem: work reassignment and fixed-point accumulation. Work reassignment consists in enforcing an execution order that leads to weak reproducibility. Fixed-point accumulation consists in avoiding rounding errors altogether thanks to a long accumulator, and enables strong reproducibility.

I. INTRODUCTION

Efficient exploitation of modern multicore architectures relies on a hierarchical structuring of computation as well as on execution concurrency. This affects determinism and numerical reproducibility, making software development tedious. For example, GPUs manage several thousand concurrent threads thanks to hardware resources such as warp and block schedulers. The design complexity of these processors is such that thread scheduling is largely undocumented and can be considered unpredictable. A common workaround is to enforce the interaction between tasks using memory consistency and synchronization mechanisms such as locks, atomics or barriers. Even then, the resulting non-determinism of floating-point calculations in parallel programs is a major issue, as it causes validation and debugging problems and may even lead to deadlocks [1].

The numerical non-reproducibility of floating-point atomic additions is due to the combination of two phenomena: rounding error, and the order in which operations are executed. The problem can be illustrated with the following simplified CUDA kernel, which computes the sums of N floating-point numbers stored in an array i_val, each value accumulated at the address given by i_adr into an array res located in global memory:

    __global__ void GlobalSum(float *i_val, int *i_adr,
                              float *res, int N) {
        int gid = blockDim.x * blockIdx.x + threadIdx.x;
        // Grid-stride loop: each thread accumulates its elements
        // into the destination cell with a floating-point atomic.
        for (uint i = gid; i < N; i += blockDim.x * gridDim.x)
            atomicAdd(&res[i_adr[i]], i_val[i]);
    }