GPU Computing with CUDA Lecture 10 ... - Boston University

Recommend Documents

GPU Computing with CUDA Lecture 1 ... - Boston University

‣Main references. - Kirk, D. and Hwu, W. Programming Massively Parallel Processors. - Sanders, J. and Kandrot, E. CUDA by Example. - CUDA C Programming ...

GPU Computing with CUDA

on compiled parallel C applications. â« Available in laptops, desktops, and clusters. â¢ GPU parallelism is doubling e

Introduction to GPU computing with CUDA

Dec 8, 2014 - CUDA software abstraction / programming model: basics, development ...... 3D in Symbian , Android, iPhone SDK, ... â¢ OpenGL ES 2.0.X (2007- ...

Introduction to GPU computing with CUDA

Dec 10, 2014 - book CUDA Application Design and Development, 2011, by R. ...... https://developer.nvidia.com/content/native-android-development-tutorial.

GPU Parallel Computing Architecture and CUDA Programming ...

Data Parallel Problem Decomposition. Parallel Memory Sharing. Transparent Scalability. CUDA Programming Model. CUDA: C on the GPU. CUDA Example.

IRJET- Parallel Computing with CUDA

https://irjet.net/archives/V5/i6/IRJET-V5I6314.pdf

GPU Computing - CUDA - A short overview of ...

command-line profiler; export COMPUTE_PROFILE=1; export .... PyCuda wraps the CUDA driver API into the Python language reference : Andeas KlÃ¶ckner,.

NVIDIA CUDA Software and GPU Parallel Computing Architecture

Abstract. In the past, graphics processors were special purpose hardwired application accelerators ... elected to the National Academy of Engineering (NAE) for.

Fast heterogeneous computing with CUDA ... - Semantic Scholar

atom potentials per second, performing 290 GFLOPS of floating point arithmetic. ... contribution of each atom float dx = x ... desktop in a personal supercomputer.

GPU Computing

Mar 4, 2011 ... ABSTRACT |The graphics processing unit (GPU) has become an integral part of today's ... interest from the scientific computing community. Simi- larly, the ... Downloaded on June 20, 2009 at 18:30 from IEEE Xplore. Restrictions a

Ray Tracing on a GPU with CUDA - Semantic Scholar

Martin ZlatuÅ¡ka. Czech Technical University in Prague. Faculty of Electrical .... per by Horn et al. [Hor07] addresses the lack of lo- cal memory to implement the ...

CUDA-lite: Reducing GPU Programming Complexity - CiteSeerX

the memory interface can enhance the main memory access efficiency if the ac- ... This paper presents CUDA-lite, an experimental enhancement to CUDA that.

aes on gpu using cuda - wseas.us

result was on AES 128, for an 8 MB input file, the performance being of 8.28 Gbps. The GPU .... CUDA AES is up to 134 times faster than AES CPU (JAVA). ... AES GPU vs CPU .... http://www.hardwareheaven.com/reviews/8800GTs/whatis.php.

CUDA-lite: Reducing GPU Programming Complexity - IMPACT

The CUDA programming environment from. NVIDIA is an attempt to make programming many-core GPUs more accessible to programmers. However, there are ...

GPU: CUDA Programming Model - NanoCAD Group

Introduction. • GPU. – A highly parallel multithreaded many core processor. • CUDA. – A parallel programming model and software environment that exposes ...

GPU CUDA Accelerated Image Inpainting using ...

*Corresponding author, e-mail: [email protected], [email protected], [email protected]. Abstract. This paper describes the technique to ...

High-Productivity CUDA Programming | GTC 2013 - GPU ...

It's 93% of GPU compute time, but only 36% of total time. What's going on during that other ~60% of time? Maybe it's one-time startup overhead not worth looking ...

Speed-limit Sign Recognition with GPU Computing

[Courtesy of BMW and Opel]. SIFT-based Approach. (SIFT: Scale Invariant Feature .... (feature vector) for each keypoint to provide invariance to viewpoint and.

GPU Computing Taxonomy

Book Citation Index in Web of Scienceâ¢. Core Collection ... Keywords: host, device, GPU computing, single program multiple data (SPMD). 1. Objective ... perform computations or using graphics processing unit (GPU) to speed up computations. 2.1. ...

Directives Based GPU Computing

Nov 16, 2012 - To provide better abstractions for programming GPU ... compare GPU directive models; these include Hernandez et al. ...... neural network.

Quantum Computing - Lecture Notes.pdf - Washington - University of ...

The following lecture notes are based on the book Quantum Computation ... computing course that I teach here at the University of Washington to computer science grad- ... 8. 3 Entanglement. 9. 4 Teleportation. 11. 5 Super-dense Coding . 15.

Lecture 4: CUDA Programming Model - NYU

Architecture and Programming. Mohamed Zahran (aka Z) [email protected] http://www.mzahran.com. CSCI-GA.3033-012. Lecture 4: CUDA Programming.

Lecture 10

This lecture was based on Elaine Rich and Kevin Knight, Artificial Intelligence textbook. Exercise: compare goal-stack planning and regression planning from ...

Lecture 10

P(book|verb) > P(book|noun), because there are many more nouns than ..... MM based Tagger: Small Example that. VERB. NOUN. PRON. CONJ. Book. ADJ.

GPU Computing with CUDA Lecture 10 ... - Boston University

Download PDF

72 downloads 104 Views 1MB Size Report

Comment

‣Tomorrow's lab: implement O(N2) N body calculation on GPU. ‣Nyland L., Harris M., Prins J. “Fast N-Body Simulation with CUDA”. GPU Gems 3, Chapter 31.

GPU Computing with CUDA Lecture 10 - Applications - N body problem

Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile 1

Outline of lecture ‣ What is an N body problem? ‣ What kind of problems can be solved with N body approach? ‣ Fast calculation algorithms ‣ N-body problem on GPU

2

Introduction ‣ N body problem calculates the interaction between N bodies

xj xi

x = xi − xj

! |x|ij = (xi − xj )2 + (yi − yj )2 + (zi − zj )2

‣ Classic example: Gravitational Potential

N ! mj φi = |x| ij j=0

∇φ = g 3

Introduction ‣ Useful for anything with Green’s function! ‣ Poisson ∇ φ = f - Astrophysics - Fluid mechanics - Electrostatics 2

!

!

Ω

φ∇2 GdΩ +

Ω

‣ Helmholtz ∇ φ + k φ = f - Acoustics - Electromagnetics 2

2

G∇2 φdΩ = !

Γ

!

Gf dΩ

Ω

n · [G∇φ − (∇G)φ]dΓ =

φ=

!

!

Gf dΩ

Ω

Gf dΩ

Ω

‣ Poisson- Boltzmann ∇(!∇φ) + k 2 φ = f - Geophysics - Biophysics

4

Introduction ‣ Radial Basis Function interpolation

u(x) =

N ! i=0

αi φ(|x − xi |)

5

Order of N body problems ‣ Direct calculation is O(N2) N ! mj Φi = |x| ij j=0

6

Fast algorithms ‣ Multipole expansions

7

Fast algorithms Independent of i

Independent of j

8

Fast algorithms

9

Fast algorithms ‣ Tree Code - Barnes and Hut ‣ Fast Multipole Methods (FMM) - Greengard and Rokhlin

Tree code

FMM 10

Fast algorithms ‣ Flow of calculation

11

N body on GPU ‣ Tomorrow’s lab: implement O(N2) N body calculation on GPU ‣ Nyland L., Harris M., Prins J. “Fast N-Body Simulation with CUDA”. GPU Gems 3, Chapter 31. ‣ Lots of operations per load ‣ Much faster, but still O(N2)!

12

N body on GPU ‣ Look at the problem as a matrix vector product N ! mj Φi = |x| ij j=0

Φi =

N ! j=0

mj (|x|ij + !2 )

‣ Each thread will compute one row (not one element)

13

N body on GPU ‣ Data reuse: Tiling - Load tiles to shared memory for data reuse

14

N body on GPU ‣ Kernel Allocate shared memory Start loop on blocks Load to shared memory Start loop on elements Do sum 15

N body on GPU - Optimization ‣ Loop unrolling - Avoid unnecessary operation ‐ #pragma unroll 32

‣ Separate last loop - The number of elements might not be multiple of the block size - Separate last loop to avoid unnecessary warps performing a calculation

‣ Vary block size

16

N body on GPU - Performance metric ‣ Compute bounded problem - Performance in FLOPS/s - Count number of floating point operations and divide by kernel execution time

17