High Performance Graphics On The CPU Using ispc
Matt Pharr, Intel
Beyond Programmable Shading Course, ACM SIGGRAPH 2011

ispc: Goals
• Deliver excellent performance to programmers who want to run SPMD programs on the CPU
• Provide a thin abstraction layer: programmer can cleanly reason about what the compiler will do
• Allow close-coupling and fine-grained interactions between C/C++ code and ispc code
ispc Execution Model
• Program instances are executed in n-wide SPMD when control transfers from C/C++ application code to ispc code (parallel_for)
– n is typically 4 or 8 for 4-wide vector units (SSE)
• You can use your own task/threading system to run over concurrent execution contexts
• Or, use launch and sync in ispc to express concurrent tasks
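The n-wide SPMD model above can be emulated in plain C++ to make the control flow concrete. This is a rough sketch, not how ispc is implemented: `kProgramCount` and `scaleSpmd` are hypothetical names, the inner loop stands in for what would be a single vector instruction, and the bounds check stands in for masking off inactive program instances.

```cpp
#include <vector>

// Hypothetical stand-in for the gang width (the role of programCount in ispc).
constexpr int kProgramCount = 4;

// Emulates one n-wide SPMD call: every program instance runs the same body
// and differs only in its programIndex.
void scaleSpmd(std::vector<float>& data, float factor) {
    const int n = static_cast<int>(data.size());
    for (int base = 0; base < n; base += kProgramCount) {
        // On real hardware these instances would execute together in one vector operation.
        for (int programIndex = 0; programIndex < kProgramCount; ++programIndex) {
            const int i = base + programIndex;
            if (i < n) {  // instances past the end of the array do nothing
                data[i] *= factor;
            }
        }
    }
}
```

Calling `scaleSpmd(v, 2.0f)` on a six-element vector processes two gangs of four instances, with the last two instances of the second gang inactive.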
ispc: Key Features
• C-based syntax
• Pointers, data structures shared with C/C++ code (no driver/data reformatting)
• Only a function call boundary between C/C++ and ispc code
• Recursion, externally-defined functions just work
• Rich standard library: vectorized transcendentals, atomics, ...
Building Applications with ispc
[Diagram: C/C++ source files go through the C/C++ compiler and ispc source files go through the ispc compiler; each produces object files, which the linker combines into the executable.]
“Hello ispc”
C++ Application Code
    int nVertices = ...;
    float *x = new float[nVertices];
    float *y = new float[nVertices];
    float *z = new float[nVertices];
    // fill in x[], y[], z[]
    float matrix[3][3] = { { ... }, ... };
    transform3x3(x, y, z, matrix, nVertices);
ispc Code
    // m is declared 3x3 to match the matrix the C++ caller passes in.
    export void transform3x3(uniform float xarray[], uniform float yarray[],
                             uniform float zarray[], uniform float m[3][3],
                             uniform int nVertices) {
        uniform int i;
        // Each iteration handles programCount vertices, one per program instance.
        // Assumes nVertices is a multiple of programCount.
        for (i = 0; i < nVertices; i += programCount) {
            float x = xarray[i + programIndex];
            float y = yarray[i + programIndex];
            float z = zarray[i + programIndex];
            float xt = m[0][0]*x + m[0][1]*y + m[0][2]*z;
            float yt = m[1][0]*x + m[1][1]*y + m[1][2]*z;
            float zt = m[2][0]*x + m[2][1]*y + m[2][2]*z;
            xarray[i + programIndex] = xt;
            yarray[i + programIndex] = yt;
            zarray[i + programIndex] = zt;
        }
    }
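A scalar C++ reference can be handy for checking the output of the ispc transform above. The following is a minimal sketch under stated assumptions: `transform3x3Reference` and `paddedCount` are hypothetical helpers that do not appear in the slides, and the padding helper reflects the fact that the ispc loop strides by `programCount` with no special handling for a partial final gang.

```cpp
// Scalar reference (hypothetical helper): applies a row-major 3x3 matrix
// to each vertex (x[i], y[i], z[i]) in place.
void transform3x3Reference(float* x, float* y, float* z,
                           const float m[3][3], int nVertices) {
    for (int i = 0; i < nVertices; ++i) {
        // Read all three inputs before writing any output.
        const float xi = x[i], yi = y[i], zi = z[i];
        x[i] = m[0][0]*xi + m[0][1]*yi + m[0][2]*zi;
        y[i] = m[1][0]*xi + m[1][1]*yi + m[1][2]*zi;
        z[i] = m[2][0]*xi + m[2][1]*yi + m[2][2]*zi;
    }
}

// Hypothetical helper: rounds a vertex count up to a multiple of the gang
// width, so arrays can be allocated with room for a full final gang.
int paddedCount(int nVertices, int gangWidth) {
    return ((nVertices + gangWidth - 1) / gangWidth) * gangWidth;
}
```

One possible use is to allocate `paddedCount(nVertices, 4)` floats per array on the C++ side, run both versions on copies of the same data, and compare the first `nVertices` results.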
A Ray Tracer in ispc
C++ Application Code
    int width = ..., height = ...;
    const float raster2camera[4][4] = { ... };
    const float camera2world[4][4] = { ... };
    float *image = new float[width*height];
    Triangle *triangles = new Triangle[nTris];
    LinearBVHNode *nodes = new LinearBVHNode[nNodes];
    // init triangles and nodes
    raytrace(width, height, raster2camera, camera2world, image, nodes, triangles);
ispc Code
    // nodes and triangles are uniform: one copy of each array, shared with the C++ caller.
    export void raytrace(uniform int width, uniform int height,
                         const uniform float raster2camera[4][4],
                         const uniform float camera2world[4][4],
                         uniform float image[],
                         const uniform LinearBVHNode nodes[],
                         const uniform Triangle triangles[]) {
        // ...
        // map program instances to rays
        // (the declarations of x, y, dx, dy, idx, idy, xStep, yStep and id are elided on the slide)
        // ...
        for (y = 0; y < height; y += yStep) {
            for (x = 0; x < width; x += xStep) {
                Ray ray;
                generateRay(raster2camera, camera2world, x+dx, y+dy, ray);
                BVHIntersect(nodes, triangles, ray);
                int offset = (y + idy) * width + (x + idx);
                image[offset] = ray.maxt;
                id[offset] = ray.hitId;
            }
        }
    }
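The slide elides how program instances are mapped to rays. One plausible mapping, shown here purely as an assumption and not as the author's code, assigns each program instance a pixel inside a small tile, so that the outer loops step by the tile size. `PixelOffset`, `tileOffsets` and `pixelIndex` are hypothetical names; only the index expression `(y + idy) * width + (x + idx)` comes from the slide.

```cpp
#include <vector>

// Hypothetical per-instance pixel offset within a tile
// (the role played by idx/idy on the slide).
struct PixelOffset {
    int dx;
    int dy;
};

// Assumed layout: place gangWidth program instances row by row in a tile
// that is tileWidth pixels wide. With gangWidth = 4 and tileWidth = 2 this
// gives a 2x2 tile, so the loops would use xStep = 2 and yStep = 2.
std::vector<PixelOffset> tileOffsets(int gangWidth, int tileWidth) {
    std::vector<PixelOffset> offsets;
    offsets.reserve(gangWidth);
    for (int programIndex = 0; programIndex < gangWidth; ++programIndex) {
        offsets.push_back({programIndex % tileWidth, programIndex / tileWidth});
    }
    return offsets;
}

// Row-major image index of the pixel at (x + dx, y + dy), matching the
// offset expression used in the ispc loop body.
int pixelIndex(int x, int y, PixelOffset o, int width) {
    return (y + o.dy) * width + (x + o.dx);
}
```

Tiling the gang over a small 2D block rather than a single scanline run tends to keep the rays of one gang close together, which is one reason such a layout might be chosen; the slides do not state which layout is used.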
Performance
Dual Xeon X5680 (12 cores), speedup relative to serial C

Workload                  Serial C    ispc SPMD + tasks
Ray Tracer                1x          102.25x
SH Radiance Probe Gen.    1x          65.71x
Deferred Shading          1x          39.40x
Integration With Regular Debuggers
Try It Yourself!
• ispc is available as open source from http://ispc.github.com
– Codegen uses the (excellent) LLVM compiler toolkit
– Supports Linux, Windows, Mac OS X
– x86 and x86-64 targets, SSE2 and SSE4 (AVX soon)