Intel Xeon Phi MIC Offload Programming Models - TACC User Portal

0 downloads 91 Views 6MB Size Report
Nov 16, 2014 - integer, parameter :: N = 100000 ! constant real. :: a(N), b(N), c(N), d(N) ! on stack ... !dir$ offload
Intel Xeon Phi MIC Offload Programming Models Kent Milfeld Nov. 16, 2014 These slides: tinyurl.com/sc14-phi © The University of Texas at Austin, 2014 Please see the final slide for copyright and licensing information.

PowerPoint original available on request ver KM2014-011

Key References •  Jeffers and Reinders, Intel Xeon Phi... –  High Performance Parallelism Pearls, Nov. 17

•  Intel Developer Zone –  http://software.intel.com/en-us/mic-developer –  http://software.intel.com/en-us/articles/effective-use-of-the-intelcompilers-offload-features

•  Stampede User Guide and related TACC resources –  Search User Guide for "Advanced Offload" and follow link Other specific recommendations throughout this presentation

2

Overview •  Basic Offload Process •  Automatic Offload •  Explicit Offload (directives) •  Virtual Shared Memory –  Offload Examples •  Building Libraries –  Controlling Offload •  Asynchronous Offload –  Monitoring –  Environment Variables

Source code available on Stampede: tar xvf ~train00/offload_demos.tar

3

Offloading: MIC as coprocessor A program running on the host offloads work by directing the MIC to execute a specified block of code. The host also directs the exchange of data between host and MIC.

app

“...do work and

running on host

host

Ideally, the host stays active while the MIC coprocessor does its assigned work.

deliver results as directed...”

x16 PCIe

node 4

Offload Models •  Compiler Assisted Offload (CAO) Insert Directives –  Explicit (Non-shared Memory Model) •  Programmer directs code execution AND data movement

–  Implicit (Virtual Shared Memory Model) •  Programmer marks some data as “shared” in the virtual sense •  Runtime automatically synchronizes values between host and MIC

•  Automatic Offload (AO) Some MKL lib routines offload when enabled –  Computationally intensive calls to Intel Math Kernel Library (MKL) –  MKL automatically manages details –  More than just offload: work division across host and MIC!

5

Explicit Model: Direct Control of Data Movement •  Available for C/C++ and Fortran •  Rich set of Data Motion directives and clauses (for Non-Shared Model through COI*) •  Supports simple (“bitwise copyable”) data structures (think 1d arrays of scalars)

*Coprocessor Offload Infrastructure 6

F90

program main

Explicit Offload

use omp_lib integer :: nprocs nprocs = omp_get_num_procs() print*, "procs: ", nprocs end program

ifort -openmp off00host.f90 icc -openmp off00host.c #include #include int main( void )

C/C++ {

int totalProcs;

Simple Fortran and C codes that each return "procs: 16" on Sandy Bridge host…

totalProcs = omp_get_num_procs(); printf( "procs: %d\n", totalProcs ); return 0; } 7

F90

program main

Explicit Offload

use omp_lib integer :: nprocs

offload directive

!dir$ offload target(mic) nprocs = omp_get_num_procs()

runs on MIC

print*, "procs: ", nprocs end program

runs on host

ifort -openmp off01simple.f90 icc -openmp off01simple.c

#include #include int main( void ) int totalProcs;

Add a one-line directive/pragma that offloads to the MIC the one line of executable code that occurs below it…

C/C++ {

offload pragma

runs on MIC #pragma offload target(mic) totalProcs = omp_get_num_procs(); printf( "procs: %d\n", totalProcs ); return 0;

…codes now return "procs: 240"… }

runs on host 8

F90

program main

Explicit Offload

use omp_lib integer :: nprocs !dir$ offload target(mic) nprocs = omp_get_num_procs()

don't use "-mmic"

print*, "procs: ", nprocs end program

ifort -openmp off01simple.f90 icc -openmp off01simple.c #include #include int main( void )

C/C++ {

int totalProcs;

Don't even need to change the compile line…

#pragma offload target(mic) totalProcs = omp_get_num_procs(); printf( "procs: %d\n", totalProcs ); return 0; } 9

F90

program main

Explicit Offload

use omp_lib integer :: nprocs

Execution blocks until offload finished

!dir$ offload target(mic) nprocs = omp_get_num_procs() print*, "procs: ", nprocs end program

off01simple #include #include int main( void ) int totalProcs;

Not asynchronous (yet): the host pauses until MIC is finished.

C/C++ {

Execution blocks until offload finished

#pragma offload target(mic) totalProcs = omp_get_num_procs();

printf( "procs: %d\n", totalProcs ); return 0; } 10

F90 !dir$ offload begin target(mic) nprocs = omp_get_num_procs() maxthreads = omp_get_max_threads() !dir$ end offload

Explicit Offload off02block

C/C++

Can offload section of code as a block

#pragma offload target(mic) { totalProcs = omp_get_num_procs(); maxThreads = omp_get_max_threads(); }

11

program main integer, parameter :: N = 500000 real :: a(N)

F90

Explicit Offload

! constant ! on stack

!dir$ offload target(mic) !$omp parallel do do i=1,N a(i) = real(i) end do !$omp end parallel do ...

off03omp

C/C++

int main( void ) {

…or an OpenMP region defined by an omp directive…

double a[500000]; // on the stack; //literal here is important int i; #pragma offload target(mic) #pragma omp parallel for for ( i=0; i

Suggest Documents