Pinpointing Scale-Dependent Integer Overflow Bugs in Large-Scale Parallel Applications

Ignacio Laguna and Martin Schulz
Lawrence Livermore National Laboratory

ACM/IEEE-CS International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, Utah, Nov 15, 2016


LLNL-PRES-707817. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under contract DE-AC52-07NA27344. Lawrence Livermore National Security, LLC

The pathology of scale-dependent bugs

Hide themselves at small-scale


Show up at large-scale

Integer overflow bugs have a long history, but they still occur today!

1981: Donkey Kong integer overflow bug in level 22
Source: http://errors.wikia.com/wiki/DK_kill_screen

2015: On new Boeing 787, integer overflow bug may cause pilots to lose control of aircraft
Source: http://www.nytimes.com


An integer overflow occurs when the result of an integer operation exceeds the range of the type used to store it

int x = 2147483647; x = x + 1;

$ cat sizes.c
#include <stdio.h>
#include <limits.h>
int main() {
  printf("%d\n", INT_MAX);
  printf("%d\n", INT_MIN);
  printf("%u\n", UINT_MAX);
  printf("%ld\n", LONG_MAX);
  printf("%ld\n", LONG_MIN);
  printf("%lu\n", ULONG_MAX);
  return 0;
}

$ ./sizes
2147483647
-2147483648
4294967295
9223372036854775807
-9223372036854775808
18446744073709551615

int type: 2.14×10^9
long int type: 9.22×10^18

Signed overflows are undefined behavior in C/C++; unsigned overflows are well defined


Expression      Result
UINT_MAX + 1    0

Input (problem size)

… => 33,750,000
100,000 MPI processes => 337,500,000
1,000,000 MPI processes => 3,375,000,000 (cannot be stored in a 32-bit signed integer)

Bug case in HPCCG

Input (problem size): nx, ny, nz
Defines scale (number of MPI processes): size

...
int local_nrow = nx*ny*nz; // This is the size of our subblock
assert(local_nrow>0); // Must have something to work with
int local_nnz = 27*local_nrow; // Approximately 27 nonzeros per row

int total_nrow = local_nrow*size; // Total number of grid points in mesh
long long total_nnz = 27* (long long) total_nrow; // Approximately 27 nonzeros per row (except for boundary nodes)

int start_row = local_nrow*rank; // Each processor gets a section
...

Overflow occurs with
Default input: 50x50x50
MPI ranks: 20,000 (or more)


Source: https://mantevo.org/

Error propagation makes it hard to find the root cause of failures

...
int size = getWorldSize();
...
rank = size OP ... ;
...
MPI_Send(..., rank, ...);
...

Error message: MPI_ERR_RANK: invalid rank


AGENDA

1. Problem description
   Scale-dependent integer overflows

2. Existing solutions and their limitations
   Use 64-bit integers everywhere
   Use compiler checks

3. Our approach
   Predicting bugs at small-scale runs
   Static + dynamic analysis
   Some bugs we found

Is using 64-bit integers in all operations a possible solution?

64-bit integers are hard to overflow

Type                            Value
long long max value             9223372036854775807 (9.22 x 10^18)
long long min value             -9223372036854775808 (-9.22 x 10^18)
unsigned long long max value    18446744073709551615 (1.84 x 10^19)

Common programming pattern allows switching integer types easily

#ifdef LARGE_SCALE
typedef long long MyInteger;
#else
typedef int MyInteger; /* default */
#endif
...
MyInteger var;

64-bit integer operations are more costly than 32-bit integer operations

for (MyInt i=0; i < size; i++) a[i] = b[i] OP c[i];

[Bar chart: time (sec), from 0 to 1.2, for 32-bit versus 64-bit integers on the operations a=b+c, a=b*c, a=b/c, a=b%c, a=b&c; measured on my Mac laptop (2.3 GHz Intel Core i7)]

Using 64-bit integers may increase memory footprint

Compiler checks can detect integer overflows (but cannot predict at large scale)

Sanitizer (Undefined Behavior)

% cat test.cc
int main(int argc, char **argv) {
  int k = 0x7fffffff;
  k += argc;
  return 0;
}
% clang++ -fsanitize=undefined test.cc
% ./a.out
test.cc:3:5: runtime error: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'


Our approach is to catch these errors using small-scale runs (when testing)

Hide themselves at small-scale


Show up at large-scale

Our approach combines compiler and runtime analysis with machine learning

1. Static analysis in LLVM to identify scale-dependent (SD) operations

2. Runtime analysis to log the result of SD operations at small scale

3. Regression analysis to forecast the values of SD operations at large scale

We use static analysis to detect scale-dependent (SD) integer operations

1. Identify source of scale values

2. Dependence analysis (inter-procedural)

3. Instrument SD operations to log their values

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int getWorldSize() {
  int size;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  return size;
}

int numElements() {
  int x, ret;
  x = rand() % 27;
  ret = getWorldSize() + x;     /* SD */
  return ret;
}

int main(int argc, char **argv) {
  int input, elems, params;
  params = argc + 1;
  printf("Parameters %d\n", params);
  elems = numElements() * 3;    /* SD */
  input = atoi(argv[1]);
  input *= 3;                   /* SD */
  ...
}

We identify scale-dependent (SD) loops; operations inside SD loops are also SD

Regular loop (number of iterations is independent of scale)

#define SIZE 10000
void bar() {
  for (int i=0; i < SIZE; ++i) {
    ...
  }
}

Scale-dependent loop (number of iterations depends on scale)

int getWorldSize() {
  int size;
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  return size;
}
void foo() {
  int x = 0;
  int max = getWorldSize() + ...
  for (int i=0; i < max; ++i) {
    x += 10;    /* SD (indirect) */
  }
}

We use a regression model to forecast the value of SD operations at large scale

[Plot: SD operation runtime values (y-axis, 0 to 1,000,000) versus number of MPI processes (x-axis, 0 to 80) for runs at small scales, with curves labeled Model 1 and Model 2]

What would be the value with 1,000,000 MPI processes?

Case study 1: MPI_Gather in MPICH

mpich-3.1.4, mpich-3.2b4

int MPI_Gather( communicator, … ) {
  size = getSize( communicator );    /* Scale value */
  ...
  while ( … < size ) {               /* Scale-dependent loop */
    …
  }
  …
}

Implementation

mask = 0x1;
while ( mask < comm_size ) {
  if (( mask & relative_rank ) == 0) {
    …
    if ( src < comm_size ) {
      …
      if ( rank == root ) {
        …
        if (( rank + mask + recvblks == comm_size ) ||
            ((( rank + mask ) % comm_size ) < (( rank + mask + recvblks ) % comm_size )) ) {
          MPIC_Recv( buf … mask