Debugging CUDA Kernel Code with NVIDIA NSight Visual Studio ...
S3478 - Debugging CUDA Kernel Code with NVIDIA Nsight Visual Studio Edition
Agenda
• What is Nsight and how can it help me?
• Creating projects and CUDA build system – live demo
• Nsight CUDA debugger features that help you find issues – live demo
• Nsight support for CUDA Dynamic Parallelism – live demo
• Summary of Nsight
  - Overview of Nsight System & application trace and CUDA profiling
  - Supported configurations
• Q&A
NVIDIA Nsight Visual Studio Edition
Visual Studio integrated development for GPU and CPU
• Build
• Debug
• Profile
NVIDIA Nsight
GPU Debugger
• Local and single GPU Compute and Graphics debugging
• GPU breakpoints including complex conditionals
• GPU memory views and exception reporting
• Dynamic Shader Editing
Graphics Inspector
• Real-time inspection of 3D API calls and states
• Investigate GPU pipeline states
• See contributing fragments with Pixel History
• Profile frames to find GPU bottlenecks
System Analysis
• View CPU & GPU events on a single timeline
• Examine workload dependencies, memory transfers
• CPU/OS, Compute, Direct3D and OpenGL Trace
• Trace WDDM and I/O ETW events
• Capture call stack and jump to source
NVIDIA Nsight for Compute Developers
CUDA debugger
• Debug CUDA kernels directly on GPU hardware
• Info page and Warp watch view
• Use on-target conditional breakpoints to locate errors
CUDA profiler
• Deep kernel analysis to detect factors limiting maximum performance
• Unlimited experiments on live kernels
Application and system trace
• Review CUDA activities across CPU and GPU
• Activity correlation panel
• Source code correlation
CUDA memory checker
• Out of bounds memory access detection
• Enables precise error detection
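As a minimal sketch of the kind of bug the memory checker's out-of-bounds detection targets, the kernel below guards every global-memory access with a bounds check. The names scaleArray and scaleOnDevice are hypothetical and do not come from the talk. The grid is rounded up to whole blocks, so some threads usually fall past the end of the array; removing the `if (i < n)` guard would let those threads write outside the allocation, which is the class of error an out-of-bounds checker is meant to report.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: multiplies each of the n elements by `factor`.
__global__ void scaleArray(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {               // guard: threads past the end do nothing
        data[i] *= factor;
    }
}

// Hypothetical host wrapper: scales `host` in place on the GPU.
// Returns false if any CUDA call reports an error.
bool scaleOnDevice(std::vector<float> &host, float factor)
{
    const int n = static_cast<int>(host.size());
    if (n == 0) return true;

    float *dev = nullptr;
    const size_t bytes = host.size() * sizeof(float);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        const int block = 128;
        const int grid = (n + block - 1) / block;  // rounds up: grid * block >= n
        scaleArray<<<grid, block>>>(dev, n, factor);
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(host.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

With 1000 elements and 128 threads per block, the launch uses 8 blocks (1024 threads), so 24 threads rely on the guard.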
CUDA® Build System
• CUDA Toolkit 4.2 and 5.0 support
• Visual Studio 2008 SP1 and 2010 SP1
• CUDA project wizard
• Visual C++ and .NET project integration
• Project setting extensions
NVIDIA Nsight CUDA® Debugging
• Native GPU debugging with mixed CUDA-C/PTX/SASS assembly
• Debug GPU PTX/SASS code without symbolic info with CUDA-C
• Debugger attach to running process
• On-device conditional breakpoint evaluation with program variables
• GPU memory views and data breakpoints
• CUDA expression engine and stack frame support
• Massively-threaded GPU kernels navigation and run-control
• CUDA memory checker
• CUDA Info Tool-Window shows {all CUDA resources | Memory Allocations, Contexts, …}
Application to debug: Mandelbrot
• The Mandelbrot set is the visual representation of an iterated function on the complex plane: Z = Z^2 + C
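The talk's own Mandelbrot sample is not reproduced in these notes, so the following is only a sketch of how the iteration Z = Z^2 + C is commonly mapped to a CUDA kernel, with one thread per pixel. The names escapeTime, mandelbrotKernel and renderMandelbrot, the viewing window [-2, 1] x [-1.5, 1.5], and the 16x16 block size are all assumptions made for illustration.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Escape-time iteration of Z = Z^2 + C, starting from Z = 0.
// Returns the number of iterations performed before |Z| exceeds 2,
// capped at maxIter (points that never escape return maxIter).
__host__ __device__ inline int escapeTime(float cRe, float cIm, int maxIter)
{
    float zRe = 0.0f, zIm = 0.0f;
    int iter = 0;
    while (iter < maxIter && zRe * zRe + zIm * zIm <= 4.0f) {
        float nextRe = zRe * zRe - zIm * zIm + cRe;  // Re(Z^2 + C)
        zIm = 2.0f * zRe * zIm + cIm;                // Im(Z^2 + C)
        zRe = nextRe;
        ++iter;
    }
    return iter;
}

// One thread per pixel; pixel (x, y) maps to C in [-2, 1] x [-1.5, 1.5].
__global__ void mandelbrotKernel(int *out, int width, int height, int maxIter)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;           // skip padding threads
    float cRe = -2.0f + 3.0f * x / width;
    float cIm = -1.5f + 3.0f * y / height;
    out[y * width + x] = escapeTime(cRe, cIm, maxIter);
}

// Hypothetical host wrapper: returns per-pixel iteration counts in
// row-major order, or an empty vector if a CUDA call fails.
std::vector<int> renderMandelbrot(int width, int height, int maxIter)
{
    std::vector<int> image(static_cast<size_t>(width) * height);
    const size_t bytes = image.size() * sizeof(int);
    int *dev = nullptr;
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return {};

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x,
              (height + block.y - 1) / block.y);
    mandelbrotKernel<<<grid, block>>>(dev, width, height, maxIter);

    bool ok = cudaGetLastError() == cudaSuccess
           && cudaMemcpy(image.data(), dev, bytes,
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dev);
    if (!ok) image.clear();
    return image;
}
```

A kernel of this shape is a convenient debugging target: a conditional breakpoint on a particular pixel (for example `x == 32 && y == 32`) stops a single thread out of the whole grid.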
Attaching to a running kernel with Nsight
• Set the Nsight monitor to “Use this monitor for CUDA attach”
• From the command line, enable Nsight to catch GPU exceptions & memory issues:
  - SET NSIGHT_CUDA_DEBUGGER=2
  - Setting it to 1 will allow Nsight to catch a subset of GPU exceptions
Nsight Debugging
• Make sure “Generate GPU Debug information” is set to “Yes (-G)”
• Debug | Start Debugging (or F5) launches the CPU debugger
• Choose Nsight | Start CUDA Debugging
  - This will launch the “Startup Project”
• Working directory = project directory if the setting is empty
  - This is not the same as the CPU debugger's directory setting
CUDA Info tool window
• Provides a view from the CUDA driver API layer, which sits below the CUDA runtime
• Driver API objects: CUcontext, CUmodule, CUfunction, CUstream
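To make the layering concrete, here is a small sketch, under stated assumptions, of how runtime-API state is visible through driver-API handles such as CUcontext and CUstream. The function name runtimeStateVisibleToDriver is hypothetical. The sketch relies on two properties of the CUDA toolkit: the runtime binds a primary context to the calling thread once it initializes (forced here with the common `cudaFree(0)` idiom), and `cudaStream_t` and `CUstream` are the same underlying handle type. It needs to be linked against the driver library (`-lcuda`) in addition to the runtime.

```cuda
#include <cuda.h>            // driver API: CUcontext, CUstream, cu* calls
#include <cuda_runtime.h>    // runtime API: cuda* calls

// Hypothetical helper: returns true if state created through the runtime
// API can be observed through driver-API handles.
bool runtimeStateVisibleToDriver()
{
    // Force the runtime to initialize and bind its primary context.
    if (cudaFree(0) != cudaSuccess) return false;

    // The driver API now reports a current context for this thread.
    CUcontext ctx = nullptr;
    if (cuCtxGetCurrent(&ctx) != CUDA_SUCCESS || ctx == nullptr) return false;

    // A stream created by the runtime can be passed to a driver call,
    // because cudaStream_t and CUstream are the same handle type.
    cudaStream_t stream = nullptr;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;
    CUstream driverStream = stream;
    bool idle = cuStreamQuery(driverStream) == CUDA_SUCCESS;  // no work queued

    cudaStreamDestroy(stream);
    return idle;
}
```

Contexts and streams created this way are the kind of resources a driver-level view would list, even though the application itself only calls the runtime API.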
What is Dynamic Parallelism?
The ability to launch new grids from the GPU
  - Dynamically
  - Simultaneously
  - Independently
[Figure: CPU and GPU work generation]
• Fermi: Only CPU can generate GPU work
• Kepler: GPU can generate work for itself
What Does It Mean?
[Figure: CPU and GPU in two configurations]
• GPU as Co-Processor
• Autonomous, Dynamic Parallelism
Dynamic Work Generation
• Fixed Grid: Statically assign conservative worst-case grid
• Dynamic Grid: Dynamically assign performance where accuracy is required
[Figure panels: Fixed Grid, Initial Grid, Dynamic Grid]
CDP example code

    __global__ void ChildKernel(void *data) {
        // Operate on data
    }

    __global__ void ParentKernel(void *data) {
        ...
        ChildKernel<<<...>>>(data);   // child grid launched from the GPU
    }

    // In Host Code
    ParentKernel
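The slide's parent/child pattern can be fleshed out into a complete program along the following lines. This is a sketch, not the code shown in the talk: the names childKernel, parentKernel and doubleWithChildGrids, the launch configurations, and the doubling workload are all assumptions for illustration. Device-side launches require a GPU of compute capability 3.5 or higher and relocatable device code, for example `nvcc -rdc=true -arch=sm_XX file.cu -lcudadevrt` with XX at least 35.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Child grid: each thread doubles one element of the segment it was given.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Parent grid: each parent thread launches one child grid for its segment.
__global__ void parentKernel(int *data, int segments, int segmentSize)
{
    int s = blockIdx.x * blockDim.x + threadIdx.x;
    if (s < segments) {
        int block = 64;                                // assumed child block size
        int grid = (segmentSize + block - 1) / block;  // round up
        childKernel<<<grid, block>>>(data + s * segmentSize, segmentSize);
    }
}

// Hypothetical host wrapper: doubles every element of `host` using one
// child grid per segment. Assumes segments <= 1024 (single parent block)
// and that host.size() is a non-zero multiple of `segments`.
bool doubleWithChildGrids(std::vector<int> &host, int segments)
{
    const int n = static_cast<int>(host.size());
    if (n == 0 || segments <= 0 || segments > 1024 || n % segments != 0)
        return false;
    const int segmentSize = n / segments;

    int *dev = nullptr;
    const size_t bytes = host.size() * sizeof(int);
    if (cudaMalloc(&dev, bytes) != cudaSuccess) return false;

    bool ok = cudaMemcpy(dev, host.data(), bytes,
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // In host code: only the parent is launched from the CPU.
        parentKernel<<<1, segments>>>(dev, segments, segmentSize);
        // The parent grid is not complete until its child grids finish,
        // so a host-side synchronize also waits for the children.
        ok = cudaGetLastError() == cudaSuccess
          && cudaDeviceSynchronize() == cudaSuccess
          && cudaMemcpy(host.data(), dev, bytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dev);
    return ok;
}
```

Here the child grid size is computed on the GPU from `segmentSize`, which is the point of dynamic work generation: the amount of child work can depend on data the host never inspects.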