Portable Vectorization and Parallelization of C++ Multi-dimensional Array Computations
ARRAY 2017
EDF Lab (France), 18 June 2017
1. Considered Class of Problems

Outline:
1. Considered Class of Problems
2. Legolas++ Array Basic API
3. Legolas++ Array for N Tridiagonal systems
4. Portable Vectorization
5. Under the Hood
6. Performance
7. Conclusion
Considered Class of Problems

Algorithm applied to: Problem 1, Problem 2, ..., Problem N
Equally Sized
Deterministic
→ Vectorized and Parallel Implementation
Example: Convolution Image Filter

convolution({(s_0, t_0), (s_1, t_1), ..., (s_n, t_n)})

[Figure: pixel (i, j) and its filter neighborhood, spanning i-r to i+r and j-r to j+r]
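Running Example: Tridiagonal Linear System

Thomas algorithm for TX=B:

\[
\begin{pmatrix}
D[0] & U[0] &      &      &      \\
L[1] & D[1] & U[1] &      &      \\
     & L[2] & D[2] & U[2] &      \\
     &      & L[3] & D[3] & U[3] \\
     &      &      & L[4] & D[4]
\end{pmatrix}
\begin{pmatrix} X[0] \\ X[1] \\ X[2] \\ X[3] \\ X[4] \end{pmatrix}
=
\begin{pmatrix} B[0] \\ B[1] \\ B[2] \\ B[3] \\ B[4] \end{pmatrix}
\]

→ algo: Thomas(L,D,U,X,B)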
[Figure: four independent tridiagonal systems of size 5, stacked and indexed j=0 to j=3. System j pairs coefficients L[j][i], D[j][i], U[j][i] with unknowns X[j][i] and right-hand sides B[j][i], for i=0 to 4.]
2. Legolas++ Array Basic API
Legolas++ Array template class

Rectangular multi-dimensional arrays.

    // 3D array containing 100 double elements
    Legolas::Array X3D(10, 5, 2);
    Legolas::Array X2D = X3D[0];
    for (int k=0; k