Pinpointing Scale-Dependent Integer Overflow Bugs in Large-Scale Parallel Applications Ignacio Laguna and Martin Schulz Lawrence Livermore National Laboratory
ACM/IEEE-CS International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, Utah, Nov 15, 2016
LLNL-PRES-707817. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC
The pathology of scale-dependent bugs
Hide themselves at small scale
Show up at large scale
Integer overflow bugs have a long history, but they still occur today!
1981: Donkey Kong integer overflow bug in level 22
Source: http://errors.wikia.com/wiki/DK_kill_screen
2015: On the new Boeing 787, an integer overflow bug may cause pilots to lose control of the aircraft
Source: http://www.nytimes.com
An integer overflow occurs when the result of an integer operation exceeds the range of the type used to store it
    int x = 2147483647;
    x = x + 1;

    $ cat sizes.c
    #include <stdio.h>
    #include <limits.h>
    int main() {
        printf("%d\n", INT_MAX);
        printf("%d\n", INT_MIN);
        printf("%u\n", UINT_MAX);
        printf("%ld\n", LONG_MAX);
        printf("%ld\n", LONG_MIN);
        printf("%lu\n", ULONG_MAX);
        return 0;
    }

    $ ./sizes
    2147483647
    -2147483648
    4294967295
    9223372036854775807
    -9223372036854775808
    18446744073709551615
int type: 2.14 x 10^9
long int type: 9.22 x 10^18
Signed overflows are undefined in C/C++; unsigned overflows are well defined

Expression      Result
UINT_MAX + 1    0
Input (problem size)
... => 33,750,000
100,000 MPI processes => 337,500,000
1,000,000 MPI processes => 3,375,000,000 (cannot be stored in a 32-bit signed integer)
Bug case in HPCCG
nx, ny, nz: input (problem size)
size: defines scale (number of MPI processes)

    ...
    int local_nrow = nx*ny*nz; // This is the size of our subblock
    assert(local_nrow>0); // Must have something to work with
    int local_nnz = 27*local_nrow; // Approximately 27 nonzeros per row

    int total_nrow = local_nrow*size; // Total number of grid points in mesh
    long long total_nnz = 27* (long long) total_nrow; // Approximately 27 nonzeros per row (except for boundary nodes)

    int start_row = local_nrow*rank; // Each processor gets a section ...
    ...

Overflow occurs with the default input (50x50x50) and 20,000 or more MPI ranks
Source: https://mantevo.org/
Error propagation makes it hard to find the root cause of failures
    ...
    int size = getWorldSize();
    ...
    rank = size OP ... ;
    ...
    MPI_Send(..., rank, ...);
    ...

Error message: MPI_ERR_RANK: invalid rank
AGENDA
1. Problem description
   Scale-dependent integer overflows
2. Existing solutions and their limitations
   Use 64-bit integers everywhere
   Use compiler checks
3. Our approach
   Predicting bugs at small-scale runs
   Static + dynamic analysis
   Some bugs we found
Is using 64-bit integers in all operations a possible solution?
64-bit integers are hard to overflow

Type                            Value
long long max value             9223372036854775807 (9.22 x 10^18)
long long min value             -9223372036854775808 (-9.22 x 10^18)
unsigned long long max value    18446744073709551615 (1.84 x 10^19)
A common programming pattern allows switching integer types easily

    #ifdef LARGE_SCALE
    typedef long long MyInteger;
    #else
    typedef int MyInteger; /* default */
    #endif
    ...
    MyInteger var;
64-bit integer operations are more costly than 32-bit integer operations
for (MyInt i=0; i < size; i++) a[i] = b[i] OP c[i];
On my Mac laptop (2.3 GHz Intel Core i7)
[Figure: bar chart of time (sec), axis from 0 to 1.2, comparing 32-bit and 64-bit integers for a=b+c, a=b*c, a=b/c, a=b%c, a=b&c]
Using 64-bit integers may increase memory footprint
Compiler checks can detect integer overflows (but cannot predict at large scale)
Sanitizer (Undefined Behavior)

    % cat test.cc
    int main(int argc, char **argv) {
      int k = 0x7fffffff;
      k += argc;
      return 0;
    }
    % clang++ -fsanitize=undefined test.cc
    % ./a.out
    test.cc:3:5: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Our approach is to catch these errors using small-scale runs (when testing)
Hide themselves at small scale
Show up at large scale
Our approach combines compiler and runtime analysis with machine learning
1. Static analysis in LLVM to identify scale-dependent (SD) operations
2. Runtime analysis to log the result of SD operations at small scale
3. Regression analysis to forecast the values of SD operations at large scale
We use static analysis to detect scale-dependent (SD) integer operations
1. Identify source of scale values
2. Dependence analysis (inter-procedural)
3. Instrument SD operations to log their values
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int getWorldSize() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }

    int numElements() {
        int x, ret;
        x = rand() % 27;
        ret = getWorldSize() + x;      /* SD */
        return ret;
    }

    int main(int argc, char **argv) {
        int input, elems, params;
        params = argc + 1;
        printf("Parameters %d\n", params);
        elems = numElements() * 3;     /* SD */
        input = atoi(argv[1]);
        input *= 3;                    /* SD */
        ...
    }
We identify scale-dependent (SD) loops; operations inside SD loops are also SD

Regular loop (number of iterations is independent of scale)

    #define SIZE 10000
    void bar() {
        for (int i=0; i < SIZE; ++i) {
            ...
        }
    }

Scale-dependent loop (number of iterations depends on scale)

    int getWorldSize() {
        int size;
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        return size;
    }
    void foo() {
        int x = 0;
        int max = getWorldSize() + ...
        for (int i=0; i < max; ++i) {
            x += 10;    /* SD (indirect) */
        }
    }
We use a regression model to forecast the value of SD operations at large scale
[Figure: SD operation runtime values (0 to 1,000,000) versus MPI processes (0 to 80); points from runs at small scales, with two fitted curves labeled Model 1 and Model 2]
What would be the value with 1,000,000 MPI processes?
Case study 1: MPI_Gather in MPICH
mpich-3.1.4, mpich-3.2b4
    int MPI_Gather( communicator, ... ) {
        size = getSize( communicator );    /* scale value */
        ...
        while ( ... < size ) {             /* scale-dependent loop */
            ...
        }
    }

Implementation

    mask = 0x1;
    while ( mask < comm_size ) {
        ...
        if (( mask & relative_rank ) == 0) {
            ...
            if ( src < comm_size ) {
                ...
                if ( rank == root ) {
                    ...
                    MPIC_Recv( buf ...
                    if (( rank + mask + recvblks == comm_size ) ...
                        ((( rank + mask ) % comm_size ) <
                         (( rank + mask + recvblks ) % comm_size ) ) ) {
        ...
        mask