Portable Vectorization and Parallelization of C++ Multi-dimensional Array Computations

ARRAY 2017

EDF Lab (France), 18 June 2017

1. Considered Class of Problems
2. Legolas++ Array Basic API
3. Legolas++ Array for N Tridiagonal systems
4. Portable Vectorization
5. Under the Hood
6. Performance
7. Conclusion

Considered Class of Problems

One Algorithm applied to N problems: Problem 1, Problem 2, ..., Problem N

- Equally Sized
- Deterministic

→ Vectorized and Parallel Implementation

Example: Convolution Image Filter

convolution({(s_0, t_0), (s_1, t_1), ..., (s_n, t_n)})

[Figure: stencil of radius r centered on pixel (i, j), spanning i-r to i+r and j-r to j+r.]
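As an illustration of this problem class, the sketch below applies the same filter to every (source, target) pair. It is only a minimal example under stated assumptions: the `Image` struct, the `box_filter` helper, the choice of a mean filter, and the copied border are all hypothetical and are not taken from the slides.

```cpp
#include <utility>
#include <vector>

// Hypothetical image type: h x w pixels stored row-major.
struct Image {
    int h, w;
    std::vector<float> px;
};

// Mean filter of radius r (an assumed kernel). Pixels closer than r
// to the border are simply copied from the source.
void box_filter(const Image& s, Image& t, int r) {
    t = s;  // same size as the source; also takes care of the border
    const float count = static_cast<float>((2 * r + 1) * (2 * r + 1));
    for (int i = r; i < s.h - r; ++i) {
        for (int j = r; j < s.w - r; ++j) {
            float sum = 0.0f;
            for (int di = -r; di <= r; ++di)
                for (int dj = -r; dj <= r; ++dj)
                    sum += s.px[(i + di) * s.w + (j + dj)];
            t.px[i * s.w + j] = sum / count;
        }
    }
}

// The same deterministic algorithm runs on every (source, target) pair.
void convolution(std::vector<std::pair<Image, Image>>& pairs, int r) {
    for (auto& p : pairs) box_filter(p.first, p.second, r);
}
```

When all images have the same size, every pair executes exactly the same sequence of operations, which is the property the deck relies on for vectorization and parallelization.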

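Running Example: Tridiagonal Linear System

Thomas algorithm for TX=B:

\[
\begin{pmatrix}
D[0] & U[0] &      &      &      \\
L[1] & D[1] & U[1] &      &      \\
     & L[2] & D[2] & U[2] &      \\
     &      & L[3] & D[3] & U[3] \\
     &      &      & L[4] & D[4]
\end{pmatrix}
\begin{pmatrix} X[0] \\ X[1] \\ X[2] \\ X[3] \\ X[4] \end{pmatrix}
=
\begin{pmatrix} B[0] \\ B[1] \\ B[2] \\ B[3] \\ B[4] \end{pmatrix}
\]

→ algo: Thomas(L,D,U,X,B)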
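For reference, a minimal sketch of the textbook Thomas algorithm with the argument order used above. Assumptions: all vectors have the same length n, L[0] and U[n-1] are unused, and no pivoting is performed (so T is assumed to be, for instance, diagonally dominant). The function name `thomas` and the temporary `C` are illustrative, not the Legolas++ implementation.

```cpp
#include <cstddef>
#include <vector>

// Solves T X = B for one tridiagonal system of size n = D.size().
// L: sub-diagonal (L[0] unused), D: diagonal, U: super-diagonal (U[n-1] unused).
void thomas(const std::vector<double>& L, const std::vector<double>& D,
            const std::vector<double>& U, std::vector<double>& X,
            const std::vector<double>& B) {
    const std::size_t n = D.size();
    X.assign(n, 0.0);
    if (n == 0) return;
    std::vector<double> C(n);  // modified super-diagonal

    // Forward elimination.
    C[0] = U[0] / D[0];
    X[0] = B[0] / D[0];
    for (std::size_t i = 1; i < n; ++i) {
        const double m = 1.0 / (D[i] - L[i] * C[i - 1]);
        C[i] = U[i] * m;
        X[i] = (B[i] - L[i] * X[i - 1]) * m;
    }

    // Back substitution, from row n-2 down to row 0.
    for (std::size_t i = n - 1; i-- > 0;) {
        X[i] -= C[i] * X[i + 1];
    }
}
```

Each row depends on the previous one in the forward sweep and on the next one in the backward sweep, so a single system offers no loop-level parallelism along i.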

[Figure: four independent 5×5 tridiagonal systems stacked along j = 0, ..., 3. System j reads T[j] X[j] = B[j], with coefficients L[j][i], D[j][i], U[j][i], unknowns X[j][i] and right-hand sides B[j][i] for i = 0, ..., 4.]
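One common way to exploit N equally sized systems is to make the system index j the innermost, contiguous loop, so that each row update becomes a loop over independent systems that a compiler can vectorize. The sketch below assumes a flat layout in which element (j, i) is stored at index i*N + j. This layout and the name `multi_thomas` are assumptions made for illustration, not necessarily how Legolas++ stores or traverses its arrays.

```cpp
#include <cstddef>
#include <vector>

// Solves N tridiagonal systems of size n at once.
// Assumed layout: entry i of system j is stored at a[i * N + j].
void multi_thomas(std::size_t n, std::size_t N,
                  const std::vector<double>& L, const std::vector<double>& D,
                  const std::vector<double>& U, std::vector<double>& X,
                  const std::vector<double>& B) {
    X.assign(n * N, 0.0);
    if (n == 0 || N == 0) return;
    std::vector<double> C(n * N);  // modified super-diagonals

    // Row 0 of every system.
    for (std::size_t j = 0; j < N; ++j) {
        C[j] = U[j] / D[j];
        X[j] = B[j] / D[j];
    }
    // Forward elimination: sequential in i, independent in j.
    for (std::size_t i = 1; i < n; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            const std::size_t k = i * N + j;
            const double m = 1.0 / (D[k] - L[k] * C[k - N]);
            C[k] = U[k] * m;
            X[k] = (B[k] - L[k] * X[k - N]) * m;
        }
    }
    // Back substitution: again sequential in i, independent in j.
    for (std::size_t i = n - 1; i-- > 0;) {
        for (std::size_t j = 0; j < N; ++j) {
            const std::size_t k = i * N + j;
            X[k] -= C[k] * X[k + N];
        }
    }
}
```

The inner j loops carry no dependencies between iterations and access memory with unit stride, which is what makes them candidates for SIMD execution; the outer structure is unchanged from the single-system algorithm.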

2. Legolas++ Array Basic API

Legolas++ Array template class

Rectangular multi-dimensional arrays.

    //3D Array containing 100 double elts
    Legolas::Array X3D(10, 5, 2);
    Legolas::Array X2D=X3D[0];
    for (int k=0; k
