
High Performance Dense Linear System Solver with Soft Error Resilience
Peng Du, Piotr Luszczek, Jack Dongarra
September 29, 2011

Agenda
•  Soft error threat to the dense linear solver
•  LU factorization
•  Error propagation
•  Error modeling
•  Fault tolerant algorithm
•  Performance Evaluation

Soft error
•  Silent error due to radiation
   •  Alpha particle
   •  High energy neutron
   •  Thermal neutron
[Figure: a single bit flip in a binary word, 0 1 0 0 1 1 0 1 → 0 1 1 0 1 1 0 1]

Outbreaks
•  Commercial computing system from Sun Microsystems in 2000
•  ASC Q supercomputer at Los Alamos National Lab in 2003

LU based linear solver

$Ax = b$
$A = LU$
$x = U \backslash (L \backslash b)$
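As a concrete illustration (not from the slides), the two triangular solves can be written with NumPy/SciPy; the matrix and right-hand side below are made up, and the row permutation P, which the slides fold into the factorization, is handled explicitly:

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    # Illustrative data: a random, well-conditioned system.
    rng = np.random.default_rng(0)
    n = 6
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)

    # scipy returns P, L, U with A = P @ L @ U.
    P, L, U = lu(A)

    # x = U \ (L \ (P^T b))
    y = solve_triangular(L, P.T @ b, lower=True)
    x = solve_triangular(U, y)

    assert np.allclose(A @ x, b)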

Block LU factorization
[Figure: blocked LU factorization steps — GETF2 (panel factorization), TRSM (triangular solves for the block row of U), GEMM (trailing matrix update), repeated on the trailing submatrix]

General work flow (see the sketch below):
(1) Generate checksums for the input matrix as additional columns
(2) Perform LU factorization WITH the additional checksum columns
(3) Solve Ax=b using the LU factors (even if a soft error occurred during the factorization)
(4) Check for soft error
(5) Correct the solution x
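A minimal sketch of steps (1)-(2) in NumPy/SciPy. This is illustrative only: the slides' actual implementation is a parallel ScaLAPACK-style code (PDGESV), and the helper name, weights, and matrix below are made up. Two checksum columns A·e and A·w are appended, where e is all ones and w holds the generator weights, and the factorization is carried out on the augmented matrix so that the checksums are transformed along with A:

    import numpy as np
    from scipy.linalg import lu

    def append_checksums(A, w):
        """Step (1): append the two checksum columns A@e and A@w to A."""
        e = np.ones(A.shape[1])
        return np.column_stack([A, A @ e, A @ w])

    rng = np.random.default_rng(1)
    n = 8
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    w = np.arange(1.0, n + 1.0)      # generator weights w_1..w_n (one simple choice)

    # Step (2): factorize WITH the checksum columns.  Pivots are chosen only in the
    # first n columns (there are only n elimination steps), so in the slides'
    # notation (P applied on the left): P [A, Ae, Aw] = L [U, c, v].
    P, L, M = lu(append_checksums(A, w))
    U, c, v = M[:, :n], M[:, n], M[:, n + 1]

    # The checksum relationship survives the factorization:
    assert np.allclose(L @ c, P.T @ (A @ np.ones(n)))
    assert np.allclose(L @ v, P.T @ (A @ w))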



Why is soft error hard to handle?
•  Soft error occurs silently
•  Propagation





Example: Error propagation

Error location (using Matlab notation and 1-based indexing): the error strikes right before the panel factorization of (41:200, 41:60).

•  Case 1: Error at (35, 10), in the L area — a non-propagating error; the error does not propagate in this case.
•  Case 2: Error at (50, 120), in the A' area — a propagating error.

Note: Pivoting to the left of the panel factorization is delayed until the end of error detection and recovery, so that an error in the L area does not move.

Soft error challenge


Error modeling (for propagating error)
•  When?
•  Answer: Doesn't really matter
[Figure: the input matrix A and the resulting L and U factors]

Error modeling (for "where")

Input matrix $A_0 = A$. One step of LU:
$A_t = L_{t-1} P_{t-1} A_{t-1}$

If no soft error occurs:
$U = (L_n P_n) \cdots (L_1 P_1)(L_0 P_0) A_0$

If a soft error of magnitude $\lambda$ occurs at step $t$:
$\tilde A_t = L_{t-1} P_{t-1} A_{t-1} - \lambda\, e_i e_j^T = L_{t-1} P_{t-1} (L_{t-2} P_{t-2} \cdots L_0 P_0) A_0 - \lambda\, e_i e_j^T$

Define an erroneous initial matrix $\tilde A$:
$\tilde A \triangleq (L_{t-1} P_{t-1} L_{t-2} P_{t-2} \cdots L_0 P_0)^{-1} \tilde A_t = A - (L_{t-1} P_{t-1} \cdots L_0 P_0)^{-1} \lambda\, e_i e_j^T = A - d\, e_j^T$

Locate Error T       v],  P [ A, A× e, A× w] = L[U , c, A = A + de j  Ae, Aw] = L[  U , c,  v]  ⇒ P [ A,

# % ⇒$ % &

P A = L U  = L c PAe  PAw = L v

! T G =# e #" wT

$! 1 1  1 $ & &# w w  w &%#" 1 2 n & %

Column j

T

Recover Ax=b
•  Luk's work
•  Sherman-Morrison formula

Recover Ax=b

Given: $\begin{cases} \tilde P \tilde A = \tilde L \tilde U \\ \tilde A \tilde x = b \end{cases}$   To solve: $Ax = b$

$Ax = b \;\Rightarrow\; x = A^{-1} b = A^{-1} (\tilde P^{-1} \tilde P) b = (\tilde P A)^{-1} \tilde P b$, so we need $(\tilde P A)^{-1}$.

Recover Ax=b

Recall: $A - \tilde A = d\, e_j^T$.  Therefore:
$\tilde P A - \tilde P \tilde A = (\tilde P A_{:,j} - \tilde L \tilde U_{:,j})\, e_j^T$

$\tilde P A = \tilde L \tilde U + \tilde L (\tilde L^{-1} \tilde P A_{:,j} - \tilde U_{:,j})\, e_j^T = \tilde L (\tilde U + t\, e_j^T) = \tilde L \tilde U (I + \tilde U^{-1} t\, e_j^T) = \tilde L \tilde U (I + v\, e_j^T)$

where $t = \tilde L^{-1} \tilde P A_{:,j} - \tilde U_{:,j}$ and $v = \tilde U^{-1} t$.

Recover Ax=b

$(\tilde P A)^{-1} = (\tilde L \tilde U (I + v\, e_j^T))^{-1} = (I + v\, e_j^T)^{-1} (\tilde L \tilde U)^{-1}$

By the Sherman-Morrison formula:
$(I + v\, e_j^T)^{-1} (\tilde L \tilde U)^{-1} = \left(I - \frac{1}{1 + v_j}\, v\, e_j^T\right) (\tilde L \tilde U)^{-1}$

Recover Ax=b

$Ax = b \;\Rightarrow\; x = (\tilde P A)^{-1} \tilde P b = \left(I - \frac{1}{1 + v_j}\, v\, e_j^T\right) \tilde x$

Recovery procedure:
(1) Solve $\tilde L\, \tilde U\, \tilde x = \tilde P b$
(2) Compute $t = \tilde L^{-1} \tilde P A_{:,j} - \tilde U_{:,j}$ and $v = \tilde U^{-1} t$, then
    $x = \left(I - \frac{1}{1 + v_j}\, v\, e_j^T\right) \tilde x = \tilde x - \frac{\tilde x_j}{1 + v_j}\, v$

Note: $\tilde L$, used in both steps, needs protection.
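A small end-to-end sketch of this recovery in NumPy/SciPy (illustrative only): the error column j is assumed to be already known from the locate step, and the erroneous factorization is simulated by factoring $\tilde A = A - d\,e_j^T$ directly with a made-up perturbation d; sizes and data are also made up:

    import numpy as np
    from scipy.linalg import lu, solve_triangular

    rng = np.random.default_rng(3)
    n, j = 8, 4
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    b = rng.standard_normal(n)

    # Simulate the soft error: the factorization effectively ran on A_tilde = A - d e_j^T.
    d = rng.standard_normal(n)
    A_tilde = A.copy()
    A_tilde[:, j] -= d

    Pm, Lt, Ut = lu(A_tilde)        # A_tilde = Pm @ Lt @ Ut, i.e. Pm.T @ A_tilde = Lt @ Ut
    Pt = Pm.T                       # the slides' P~ (applied on the left)

    # (1) Solve L~ U~ x~ = P~ b
    y = solve_triangular(Lt, Pt @ b, lower=True)
    x_tilde = solve_triangular(Ut, y)

    # (2) t = L~^{-1} P~ A[:, j] - U~[:, j],  v = U~^{-1} t,
    #     x = x~ - (x~_j / (1 + v_j)) v     (Sherman-Morrison correction)
    t = solve_triangular(Lt, Pt @ A[:, j], lower=True) - Ut[:, j]
    v = solve_triangular(Ut, t)
    x = x_tilde - (x_tilde[j] / (1.0 + v[j])) * v

    assert np.allclose(A @ x, b)    # the corrected x solves the original system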

How to detect & recover a soft error in L?
•  The recovery of Ax=b requires a correct L
•  L does not change once produced → static checkpointing for L
•  Delay pivoting on L to prevent the checksum of L from being invalidated

Checkpointing for L, idea 1
•  PDGEMM based checkpointing
•  Checkpointing time increases when scaled to more processes and larger matrices
•  NOT SCALABLE

Checkpointing for L, idea 2
•  Local checkpointing: each process checkpoints its local portion of the data
•  Constant checkpointing time
•  SCALABLE

Encoding for L

On each process, for a column of L, $l = [l_1, l_2, \ldots, l_n]$, keep two weighted checksums:
$\begin{cases} l_1 + l_2 + \cdots + l_n = c_1 \\ w_1 l_1 + w_2 l_2 + \cdots + w_n l_n = c_2 \end{cases}$

If element $l_i$ is corrupted to $\tilde l_i$, the recomputed checksums $\tilde c_1, \tilde c_2$ satisfy
$\begin{cases} l_1 + \cdots + \tilde l_i + \cdots + l_n = \tilde c_1 \\ w_1 l_1 + \cdots + w_i \tilde l_i + \cdots + w_n l_n = \tilde c_2 \end{cases}
\;\Rightarrow\;
\begin{cases} c_1 - \tilde c_1 = l_i - \tilde l_i \\ c_2 - \tilde c_2 = w_i (l_i - \tilde l_i) \end{cases}
\;\Rightarrow\; w_i = \frac{c_2 - \tilde c_2}{c_1 - \tilde c_1}$
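A minimal sketch of this per-process local encoding in NumPy — encode one local column of L, then use the two checksums to detect a corrupted entry, locate it through the weight ratio, and correct it. The weights and data below are illustrative choices, not the slides' actual ones:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 10
    w = np.arange(1.0, n + 1.0)          # distinct weights w_1..w_n (one simple choice)
    l = rng.standard_normal(n)           # one local column of L on this process

    # Checkpoint: store the two weighted checksums.
    c1, c2 = l.sum(), w @ l

    # A soft error corrupts one entry of l.
    i_true = 6
    l_bad = l.copy()
    l_bad[i_true] += 0.5

    # Detection / location / correction from the recomputed checksums.
    c1_bad, c2_bad = l_bad.sum(), w @ l_bad
    if not np.isclose(c1_bad, c1):
        w_i = (c2 - c2_bad) / (c1 - c1_bad)     # w_i = (c2 - c2~) / (c1 - c1~)
        i = int(np.argmin(np.abs(w - w_i)))     # search the recovered weight in w
        l_bad[i] += c1 - c1_bad                 # l_i = l~_i + (c1 - c1~)
        assert i == i_true and np.allclose(l_bad, l)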

Kraken Performance

•  Two 2.6 GHz six-core AMD Opteron processors per node
•  32x32 MPI processes, 6 threads per process (one per core)
•  6,144 cores used in total

[Figure: Tflop/s vs. matrix size (x 10000, from 8 to 16) for FT PDGESV (one error in L and U) and NETLIB PDGESV]

Kraken Performance

•  Two 2.6 GHz six-core AMD Opteron processors per node
•  64x64 MPI processes, 6 threads per process (one per core)
•  24,576 cores used in total

[Figure: Tflop/s vs. matrix size for FT PDGESV (one error in L and U) and Netlib PDGESV]

Backup slides

Locate Error

$\tilde P A e = \tilde L \tilde c
\;\Rightarrow\; \tilde c = \tilde L^{-1} \tilde P A e
= \tilde L^{-1} \tilde P (\tilde A + d\, e_j^T) e
= \tilde L^{-1} (\tilde P \tilde A + \tilde P d\, e_j^T) e
= \tilde L^{-1} (\tilde L \tilde U + \tilde P d\, e_j^T) e
= \tilde U e + \tilde L^{-1} \tilde P d$

$\Rightarrow\; \tilde c - \tilde U e = \tilde L^{-1} \tilde P d = r$

[Figure: comparing $\tilde U e$ with the factored checksum column $\tilde c$ to form the residual $r$]

Locate Error

$\tilde P A w = \tilde L \tilde v
\;\Rightarrow\; \tilde v = \tilde L^{-1} \tilde P A w
= \tilde L^{-1} \tilde P (\tilde A + d\, e_j^T) w
= \tilde L^{-1} (\tilde P \tilde A + \tilde P d\, e_j^T) w
= \tilde L^{-1} (\tilde L \tilde U + \tilde P d\, e_j^T) w
= \tilde U w + \tilde L^{-1} \tilde P d\, w_j$

$\Rightarrow\; \tilde v - \tilde U w = \tilde L^{-1} \tilde P d\, w_j = s$

[Figure: comparing $\tilde U w$ with the factored checksum column $\tilde v$ to form the residual $s$]

Locate Error

$\begin{cases} \tilde c - \tilde U e = \tilde L^{-1} \tilde P d = r \\ \tilde v - \tilde U w = w_j\, \tilde L^{-1} \tilde P d = s \end{cases}
\;\Rightarrow\; s = w_j \times r
\;\Rightarrow\; w_j \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix} = s \,./\, r$

•  $w_j$ is the j-th element of vector w in the generator matrix
•  Component-wise division of s and r reveals $w_j$
•  Searching for $w_j$ in w reveals the column of the initial soft error
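A small sketch of the whole locate step in NumPy/SciPy (illustrative only): the erroneous factorization is simulated by factoring the augmented matrix $[\tilde A,\ Ae,\ Aw]$ with $\tilde A = A - d\,e_j^T$ for a made-up perturbation d, then r and s are formed and their component-wise ratio is searched in w to recover the error column:

    import numpy as np
    from scipy.linalg import lu

    rng = np.random.default_rng(5)
    n, j_true = 8, 3
    A = rng.standard_normal((n, n)) + n * np.eye(n)
    e = np.ones(n)
    w = np.arange(1.0, n + 1.0)               # generator weights (one simple choice)

    # Checksum columns are generated from the correct A ...
    A_aug = np.column_stack([A, A @ e, A @ w])
    # ... but the factorization effectively runs on A_tilde = A - d e_j^T.
    d = rng.standard_normal(n)
    A_aug[:, j_true] -= d

    Pm, Lt, M = lu(A_aug)                     # Pm.T [A~, Ae, Aw] = Lt [U~, c~, v~]
    Ut, c_t, v_t = M[:, :n], M[:, n], M[:, n + 1]

    r = c_t - Ut @ e                          # r = c~ - U~ e  = L~^{-1} P~ d
    s = v_t - Ut @ w                          # s = v~ - U~ w  = w_j L~^{-1} P~ d

    w_j = np.median(s / r)                    # component-wise division reveals w_j
    j = int(np.argmin(np.abs(w - w_j)))       # search w_j in w -> error column
    assert j == j_true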



Extra Storage

For an input matrix of size M x N on a P x Q process grid:
•  A copy of the original matrix — not necessary when it is easy to re-generate the required column of the original matrix
•  2 additional checksum columns: 2 x M
•  Each process has 2 checksum rows: 2 x N/Q, in total P x 2 x N

$r = \frac{\text{extra storage}}{\text{matrix storage}} = \frac{2M + P \times 2 \times N}{M N} = \frac{2}{N} + \frac{2P}{M} \;\xrightarrow{\;N \to \infty\;}\; \frac{2P}{M}$
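For instance (illustrative numbers, chosen to match the largest 32x32-process run above rather than taken from the slides), with $M = N = 160{,}000$ and $P = 32$ the ratio is roughly $\frac{2}{160000} + \frac{2 \times 32}{160000} \approx 0.04\%$, so the extra storage is negligible for large matrices.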
