Modern C++, OpenCL SYCL & OpenCL CL2.hpp - Ronan KERYELL

Ronan Keryell
AMD & Khronos OpenCL SYCL committee
Ronan.Keryell at amd dot com

11/18/2014 — SuperComputing 2014 — OpenCL BoF


Outline

1. 2 great things in 2014
   • 2014 take 1: C++14
   • 2014 take 2: OpenCL SYCL
2. OpenCL SYCL
   • C++... putting everything all together
   • Possible future extensions & applications
3. C++ CL.hpp OpenCL wrapper
4. Conclusion


C++14

• 2 Open Source compilers available before ratification (GCC & Clang/LLVM)
• Confirms new momentum & pace: 1 major (C++11) and 1 minor (C++14) version on a 6-year cycle
• Next big version expected in 2017 (C++1z)
  - Already being implemented!
• Monolithic committee replaced by many smaller task forces
  - Parallelism TS (Technical Specification)
  - Concurrency TS
  - Definitely matters for HPC and heterogeneous computing!

C++ is a completely new language

• Forget about C++98, C++03...
• Now open and public
• Send your proposals and get involved!


Modern C++ & HPC (I)

• Huge library improvements
  - Thread library and multithread memory model
  - Hash-map
  - Algorithms
  - Random numbers
  - ...
  → HPC
• Uniform initialization and range-based for loop

    std::vector<int> my_vector { 1, 2, 3, 4, 5 };
    for (int &e : my_vector)
      e += 1;

• Easy functional programming style with lambda (anonymous) functions

    // The third argument is the output iterator: the result is written back into v
    std::transform(std::begin(v), std::end(v), std::begin(v),
                   [] (int x) { return 2*x; });
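The two snippets on this slide can be combined into one small compilable function. This is only a sketch: the helper name `increment_then_double` is hypothetical and not part of the talk, and the transform writes its result back into the same vector.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Hypothetical helper (not from the slides): applies both snippets in turn.
std::vector<int> increment_then_double(std::vector<int> v) {
  // Range-based for loop: add 1 to every element in place
  for (int &e : v)
    e += 1;
  // Lambda passed to std::transform: double every element, writing back into v
  std::transform(std::begin(v), std::end(v), std::begin(v),
                 [](int x) { return 2 * x; });
  return v;
}
```

Called with the uniformly initialized vector `{ 1, 2, 3, 4, 5 }`, the function returns `{ 4, 6, 8, 10, 12 }`.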


Modern C++ & HPC (II)

• R-value references & std::move semantics
  - matrix_A = matrix_B + matrix_C
    → Avoid copying (TB, PB...) when assigning or returning from a function
  - Without messing with references or worse... pointers!
  - (Remember Chandler's talk on abstractions & C++ yesterday at the LLVM-HPC workshop?)
• Lots of meta-programming improvements to make meta-programming easier: variadic templates, type traits <type_traits>...
• Make simple things simpler to be able to write generic numerical libraries, etc.
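The claim that `matrix_A = matrix_B + matrix_C` needs no deep copy can be checked with a toy type that counts its own copies. This is a minimal sketch under stated assumptions: `Matrix`, its flat `data` member and the `copies` counter are hypothetical names invented for the illustration, not a type from the talk or from any library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy matrix (flat storage), only here to count deep copies.
struct Matrix {
  std::vector<double> data;
  static int copies;  // number of deep copies performed so far

  explicit Matrix(std::size_t n, double value = 0.0) : data(n, value) {}

  // Copy operations do a deep copy and record it
  Matrix(const Matrix &other) : data(other.data) { ++copies; }
  Matrix &operator=(const Matrix &other) {
    data = other.data;
    ++copies;
    return *this;
  }

  // Move operations only steal the storage of the source
  Matrix(Matrix &&other) noexcept = default;
  Matrix &operator=(Matrix &&other) noexcept = default;
};

int Matrix::copies = 0;

// Element-wise sum, returned by value: the result is moved (or elided), not copied
Matrix operator+(const Matrix &x, const Matrix &y) {
  assert(x.data.size() == y.data.size());
  Matrix result(x.data.size());
  for (std::size_t i = 0; i < x.data.size(); ++i)
    result.data[i] = x.data[i] + y.data[i];
  return result;
}
```

With `Matrix a(4), b(4, 1.0), c(4, 2.0);`, the statement `a = b + c;` leaves `Matrix::copies` at 0, because the temporary sum is move-assigned into `a`. An explicit `Matrix d = a;` then raises the counter to 1.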


Modern C++ & HPC (III)

• Automatic type inference for terse programming
  - Python 3.x:

      def add(x, y):
          return x + y
      print(add(2, 3))      # 5
      print(add("2", "3"))  # 23

  - Same in C++14 but also with static compile-time type-checking:

      auto add = [] (auto x, auto y) { return x + y; };
      std::cout [...]

      [...] C { c, range<2> { N, M } };
      // Enqueue some computation kernel task
      command_group(myQueue, [&]() {
        // Define the data used/produced
        auto ka = A.get_access<access::read>();
        auto kb = B.get_access<access::read>();
        auto kc = C.get_access<access::write>();
        // Create & call OpenCL kernel named "mat_add"
        parallel_for<class mat_add>(range<2> { N, M }, [=](id<2> i) {
          kc[i] = ka[i] + kb[i];
        });
      }); // End of our commands for this queue
    } // End scope, so wait for the queue to complete.
    // Copy back the buffer data with RAII behaviour.
    return 0;
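The closing comments of the fragment above rely on RAII: results reach host memory when the buffer goes out of scope. The idea can be mimicked in plain C++. This is a sketch only: `ScopedBuffer` is a hypothetical stand-in written for this illustration, it is not the SYCL `buffer` class, and it does nothing asynchronous.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a device buffer (NOT cl::sycl::buffer):
// it works on a private copy and writes it back to the host on destruction.
class ScopedBuffer {
  double *host_;                // host memory updated when the scope ends
  std::vector<double> device_;  // private working copy ("device side")

public:
  ScopedBuffer(double *host, std::size_t n)
      : host_(host), device_(host, host + n) {}

  // One owner per host array, so no copying of the buffer object itself
  ScopedBuffer(const ScopedBuffer &) = delete;
  ScopedBuffer &operator=(const ScopedBuffer &) = delete;

  // RAII: leaving the scope copies the working data back to the host
  ~ScopedBuffer() { std::copy(device_.begin(), device_.end(), host_); }

  double &operator[](std::size_t i) {
    assert(i < device_.size());
    return device_[i];
  }
};
```

Inside a `{ }` block, writes through a `ScopedBuffer` are not visible in the host array; they appear there once the closing brace runs the destructor, which is the behaviour the comments describe for the end of the SYCL scope.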


Asynchronous task graph programming

• Theoretical graph of an application described with kernel tasks using buffers through accessors

  [Figure: task graph. init_a and init_b fill cl::sycl::buffer a and cl::sycl::buffer b, matrix_add reads a and b and produces cl::sycl::buffer c, Display reads c. Each kernel reaches its buffers through cl::sycl::accessor objects.]

• Possible schedule by SYCL runtime:

  [Figure: timeline with init_b, init_a, matrix_add, Display]

  → Automatic overlap of kernels & communications
  - Even better when looping around in an application


Task graph programming: the code

    #include <CL/sycl.hpp>
    #include <iostream>
    using namespace cl::sycl;
    // Size of the matrices
    const size_t N = 2000;
    const size_t M = 3000;
    int main() {
      { // By sticking all the SYCL work in a {} block, we ensure
        // all SYCL tasks must complete before exiting the block

        // Create a queue to work on
        queue myQueue;
        // Create some 2D buffers of float for our matrices
        buffer<float, 2> a({ N, M });
        buffer<float, 2> b({ N, M });
        buffer<float, 2> c({ N, M });
        // Launch a first asynchronous kernel to initialize a
        command_group(myQueue, [&]() {
          // The kernel writes a, so get a write accessor on it
          auto A = a.get_access<access::write>();
          [...]
                       [=](id<2> index) {
                         B[index] = index[0]*2014 + index[1]*42;
                       });
        });
        // Launch an asynchronous kernel to compute matrix addition c = a + b
        command_group(myQueue, [&]() {
          // In the kernel a and b are read, but c is written
          auto A = a.get_access<access::read>();
          auto B = b.get_access<access::read>();
          auto C = c.get_access<access::write>();
          // From these accessors, the SYCL runtime will ensure that when
          // this kernel is run, the kernels computing a and b completed

          // Enqueue a parallel kernel on a N*M 2D iteration space
          parallel_for<class matrix_add>({ N, M },
                                         [=](id<2> index) {
                                           C[index] = A[index] + B[index];
                                         });
        });
        /* Request an access to read c from the host-side. The SYCL runtime
           ensures that c is ready when the accessor is returned */
        auto C = c.get_access<access::read, access::host_buffer>();
        std::cout [...]