Towards an Efficient CPU-GPU Code Hybridization

Recommend Documents

Towards a Unified CPUâGPU code hybridization

Time: 3rd order TVD Runge-Kutta, ... Runge-Kutta propagation. 1. .... C++ template code, parametered with the nb of elements processed by each thread.

An Efficient Compression Code for Text Databases *

for words and phrases directly on the compressed text using any known ... Text compression techniques are based on exploiting redundancies in the text.

An Efficient Hybridization Scheme for Stacked Mesh 3D ... - PDP 2012

Technology, University of Turku, Finland. Intl. Conf. on Parallel, Distributed and Network-Based Computing, Feb. 15-17, 2012, Garching, Germany ...

An Approach Towards Efficient Energy Distribution

proved that GA-EMC worked more efficiently as compared to the binary parti- ..... In this chapter, an EMC is proposed with an objective to minimize the energy.

Towards an Efficient Numerical Simulation of

May 16, 2012 - tic materials, and ligaments by one-dimensional nonlinear Cosserat rods. In order to .... and the cartilage layers by a linear viscoelastic KelvinâVoigt law. ÏV(u, Ëu,x) ... Using conservation of linear momentum we obtain the stand

Towards an Efficient Urdu Keyboard Layout ...

Dec 11, 2014 - language computing standards (Hussain, et al.,2004). The QWERTY keyboard has been used as a primary input method for a long time though ...

Towards an Efficient and Scalable Discontinuous Galerkin

An e}cient and scalable Discontinuous Galerkin shallow water model on the ... such as exponential convergence, computational e}ciency, scalability, and the ...

Towards an efficient smart space architecture

International Journal of Advanced Studies in Computer Science and Engineering. Towards an ... information delivered to meet the user's requirements is dependent on ..... context is the principal entry on which all of the system processes are ...

Towards an Efficient Implementation of Sequential Montgomery ...

AbstractâA method to generate efficient implementations of sequential Montgomery Multiplication (MM) is proposed. It is applied to radix-2 MM, but could be ...

Towards an efficient meshless computational ... - Semantic Scholar

techniques like the finite element/finite volume methods require a mesh. ... method of finite spheres and compare it with the other popular interpolation ..... 182. Й└ 0Y 0К and Bj as in equation (23). The integration points ri are the positive .....

Plastic Polymers for Efficient DNA Microarray Hybridization ...

gies for the miniaturization and cost-effective mass production of medical diagnostic devices. ... Corresponding author. Mailing address: Centre de Recherche en.

Towards Efficient Biped Robots

of locomotion variables, namely: step length, hip height, hip ripple, hip offset, foot clearance and link lengths. Based on these variables and their influence on the.

Sparse Code Multiple Access: An Energy Efficient ... - IEEE Xplore

Email: {zhangshunqing, xuxiuqiang, lulei, wuyiqun, hegaoning, ... link level performance. ... Index Termsâ5G, sparse code multiple access (SCMA), en-.

An efficient 3D topology optimization code written in Matlab - IUPUI ...

Oct 17, 2013 - introduces 3D finite element analysis and its numerical implementation. Section ... As an illustration, assume that a design has one single hole. ...... the reader can find a step-by-step tutorial on our website http://top3dapp.com.

An Efficient Compiler Technique for Code Size ... - Cs.UCLA.Edu

code size reduction, for the MIPS 32/16 bit ISA, using our technique. Our approach more than doubles the code size reduction achieved by existing compilers.

VLEI Code: An Efficient Labeling Method for ... - Semantic Scholar

... Code: An Efficient Labeling Method for Handling XML Documents in an RDB. Kazuhito Kobayashi. ââ . Wenxin Liang. â . Dai Kobayashi. â . Akitsugu Watanabe.

an efficient protocol for private iris-code matching ... - Semantic Scholar

In our proposed system, the biometric server, Bob, has an iris gallery which stores the ..... [9] E. Barker, W. Burr, A. Jones, T. Polk, S. Rose, M. Smid, and Q. Dang,.

An Efficient Algorithm for Space-Time Block Code Classification

Classification. Yahia A. Eldemerdash, Octavia A. Dobre, Mohamed Mareyâ , George K. Karagiannidisâ¡, and Bruce Liaoâ â ...... Processes. McGraw-Hill, 2001.

Efficient code motion and an adaption to strength ... - Springer Link

Common subexpression elimination, partia] redundancy elimination and loop ... denotes the non empty data domain and H0 the function, which maps every ...

An Efficient Code-Based Threshold Ring Signature Scheme with a

Jul 2, 2017 - Siyuan Chen,2 and Kim-Kwang Raymond Choo5. 1 Department of ... In this paper, we propose a novel code-based threshold ring signature scheme with ..... dimension, minimum distance, and error-correcting ability of the code ..... stores an

Efficient transformation of an auditory population code ...

We ana- lyzed the transformation of the neural code within this network. Indeed ... The output of this size-constrained network is the only source of acoustic in- ...... (2006) Adaptive filtering enhances information transmission in visual cortex.

TTVFast: An efficient and accurate code for transit timing inversion ...

Mar 7, 2014 - Transit timing variations (TTVs) have proven to be a powerful technique for ...... the number of steps per orbit required to reach arbitrary ac-.

An efficient spectral interpolation routine for the TwoPunctures code

Apr 3, 2013 - Vasileios Paschalidis,1 Zachariah B. Etienne,1 Roman Gold,1 and Stuart L. Shapiro1, â. 1Department of Physics, University of Illinois at ...

An Efficient Algorithm for Space-Time Block Code Classification

transmitted by the column vector Xb = [xb,0, ..., xb,Nsâ1. ] ... represents the vector of the chan- nel coefficients, which characterize the ..... Surveys Tuts., vol. 11, pp.

Towards an Efficient CPU-GPU Code Hybridization

Download PDF

2 downloads 0 Views 3MB Size Report

Comment

One simulation = thousands of CPU hours. 1 ... Could the GPU be used to optimize the CPU code? 3 .... Add tiling algorithms to improve CPU cache efficiency.

Towards an Efficient CPU-GPU Code Hybridization: a Simple Guideline for Code Optimizations on Modern Architecture with OpenACC and CUDA L. Oteski, G. Colin de Verdière, S. Contassot-Vivier, S. Vialle, J. Ryan Acks.: CEA/DIF, IDRIS, GENCI, NVIDIA, Région Lorraine.

CFD at ONERA

One simulation = thousands of CPU hours

From http://www.onera.fr

1

Why optimize? Cost of a CPU hour ~$0.05→$0.10 [Walker (2009)]. Example: 1000 cores for 1 month ~$36,000→$72,000. Dividing application runtime allows to: →be more cost-efficient, →get faster results, →increase the problem size.

2

CPU-GPU differences CPU: few # of cores, vector machines, hard to manage registers, GPU: huge # of cores, vector machines, easy to manage registers (--ptxas-options=-v with nvcc). Question: Due to vector approaches, could GPU and CPU be considered as similar devices? → Could the GPU be used to optimize the CPU code? 3

A prototype flow Yee's vortex with the full Navier-Stokes equations, 2 dimensional structured mesh, Numerical method: → Space: 2nd order Discontinuous Galerkin, → Time: 3rd order TVD Runge-Kutta, → Boundaries: X and Y-periodic.

4

Test case configuration 2001x2001 mesh, 100 time-steps. CPU: Intel Xeon E5-1650v3 (6 cores), Compiler: icc 2016, -O3 -march=native GPU: K20Xm (2688 CUDA cores) Compilers: nvcc CUDA-8.0, -O3 pgc++ 16.10, -O3 -acc -ta=tesla:cuda8.0,nollvm,nordc 5

Experimental protocol Developments with the full CUDA version GPU-CUDA int tidx=threadIdx.x+...; int tidy=threadIdx.y+...; if(tidx