Formalizing C in Coq - Robbert Krebbers

2 downloads 184 Views 561KB Size Report
Oct 24, 2014 - This program violates the sequence point restriction. ▷ due to two .... Visualization of the operationa
Formalizing C in Coq Robbert Krebbers ICIS, Radboud University Nijmegen, The Netherlands

October 24, 2014 @ Princeton University, USA

1

What is this program supposed to do? The C quiz, question 1

int main() { int x; int y = (x = 3) + (x = 4); printf("x=%d,y=%d\n", x, y); }

2

What is this program supposed to do? The C quiz, question 1

int main() { int x; int y = (x = 3) + (x = 4); printf("x=%d,y=%d\n", x, y); } Let us try some compilers I

Clang prints x=4,y=7, seems just left-right

2

What is this program supposed to do? The C quiz, question 1

int main() { int x; int y = (x = 3) + (x = 4); printf("x=%d,y=%d\n", x, y); } Let us try some compilers I

Clang prints x=4,y=7, seems just left-right

I

GCC prints x=4,y=8, does not correspond to any order

2

What is this program supposed to do? The C quiz, question 1

int main() { int x; int y = (x = 3) + (x = 4); printf("x=%d,y=%d\n", x, y); } Let us try some compilers I

Clang prints x=4,y=7, seems just left-right

I

GCC prints x=4,y=8, does not correspond to any order

This program violates the sequence point restriction I

due to two unsequenced writes to x

I

resulting in undefined behavior

I

thus both compilers are right 2

Underspecification in C11

I

Unspecified behavior: two or more behaviors are allowed For example: order of evaluation in expressions (+57 more)

I

Implementation defined behavior: like unspecified behavior, but the compiler has to document its choice For example: size and endianness of integers (+118 more)

I

Undefined behavior: the standard imposes no requirements at all, the program is even allowed to crash For example: dereferencing a NULL or dangling pointer, signed integer overflow, . . . (+201 more)

3

Underspecification in C11

I

Unspecified behavior: two or more behaviors are allowed For example: order of evaluation in expressions (+57 more) Non-determinism

I

Implementation defined behavior: like unspecified behavior, but the compiler has to document its choice For example: size and endianness of integers (+118 more) Parametrization

I

Undefined behavior: the standard imposes no requirements at all, the program is even allowed to crash For example: dereferencing a NULL or dangling pointer, signed integer overflow, . . . (+201 more) No semantics/crash state

3

Why does C use underspecification that heavily?

Pros for optimizing compilers: I

More optimizations are possible

I

High run-time efficiency

I

Easy to support multiple architectures

4

Why does C use underspecification that heavily?

Pros for optimizing compilers: I

More optimizations are possible

I

High run-time efficiency

I

Easy to support multiple architectures

Cons for programmers/formal methods people: I

Portability and maintenance problems

I

Hard to capture precisely in a semantics

I

Hard to formally reason about

4

Why does C use underspecification that heavily?

Pros for optimizing compilers: I

More optimizations are possible

I

High run-time efficiency

I

Easy to support multiple architectures

Cons Challenges for programmers/formal methods people: I

Portability and maintenance problems

I

Hard to capture precisely in a semantics

I

Hard to formally reason about

4

Approaches to underspecification CompCert (Leroy et al.) /

VST (Appel et al.)

I

Main goal: verification of/w.r.t. CompCert compiler in

I

Specific choices for unspecified/impl-defined behavior For example: 32-bits ints

I

Describes some undefined behavior Undefined: dereferencing NULL, . . . But still defined: integer overflow, aliasing violations, . . .

I

VST separation logic proofs specific to CompCert

Coq

Formalin (Krebbers & Wiedijk) I

Main goal: compiler independent C11 semantics in

Coq

I

Describes all unspecified and undefined behavior

I

Describes some implementation-defined behavior For example: no legacy architectures with 1s’ complement 5

The Formalin project (Krebbers & Wiedijk, OCaml part

)

Coq part

CH2 O core syntax

Type judgment

Subject red. and progress

CH2 O operational semantics

= translation = proof

Executable structured memory model

6

The Formalin project (Krebbers & Wiedijk, OCaml part

.c file

CIL abstract syntax

)

Coq part

CH2 O abstract syntax

CH2 O core syntax

Type soundness Type judgment

Subject red. and progress

CH2 O operational semantics

= translation = proof

Executable structured memory model

6

The Formalin project (Krebbers & Wiedijk, OCaml part

.c file

CIL abstract syntax

)

Coq part

CH2 O abstract syntax

Stream of finite sets of states

CH2 O core syntax

Soundness and completeness

Type soundness Type judgment

Subject red. and progress

CH2 O operational semantics

= translation = proof

Executable structured memory model

6

The Formalin project (Krebbers & Wiedijk, OCaml part

.c file

CIL abstract syntax

)

Coq part

CH2 O abstract syntax

Stream of finite sets of states

CH2 O core syntax

Soundness and completeness

Type soundness Type judgment

Subject red. and progress

CH2 O operational semantics

Soundness

Separation logic = translation = proof

Executable structured memory model

6

The Formalin project (Krebbers & Wiedijk, OCaml part

.c file

CIL abstract syntax

)

Coq part

CH2 O abstract syntax

Stream of finite sets of states

CH2 O core syntax

Soundness and completeness

Type soundness Type judgment

Subject red. and progress

Preservation and composition

Refinement judgment

CH2 O operational semantics

Soundness

Separation logic

= translation = proof

Executable structured memory model

6

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p NULL

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p NULL

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p

j

NULL

17

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p

j



17

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p

j



17

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p •

7

Non-local control flow and block scope variables The C quiz, question 2

int *p = NULL; l: if (p) { return (*p); } else { int j = 17; p = &j; goto l; }

memory: p •

C11, 6.2.4p2: the value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime. =⇒ Undefined behavior

7

Non-local control flow and block scope variables Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]

Problem: goto/return/etc. affect memory

8

Non-local control flow and block scope variables Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]

Problem: goto/return/etc. affect memory Solution: I Execute gotos and returns in small steps I I

Not so much to search for labels, . . . but to naturally perform required allocations and deallocations

8

Non-local control flow and block scope variables Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]

Problem: goto/return/etc. affect memory Solution: I Execute gotos and returns in small steps I I

I

Not so much to search for labels, . . . but to naturally perform required allocations and deallocations

Traversal through the AST in the following directions: downwards to the next statement upwards to the next statement y l to a label l: after a goto l ↑↑ v to the top of the statement after a return

I & I % I I

8

Non-local control flow and block scope variables Operational semantics [Krebbers/Wiedijk,FoSSaCS’13]

Problem: goto/return/etc. affect memory Solution: I Execute gotos and returns in small steps I I

I

Not so much to search for labels, . . . but to naturally perform required allocations and deallocations

Traversal through the AST in the following directions: downwards to the next statement upwards to the next statement y l to a label l: after a goto l ↑↑ v to the top of the statement after a return

I & I % I I

I

Use a zipper to keep track of the position and the stack

8

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL l: if (p) return (*p)

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l:

memory: p

if (p) return (*p)

int j = 17 NULL ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l:

memory: p

if (p) return (*p)

int j = 17 NULL ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l:

memory: p

if (p) return (*p)

int j = 17 NULL ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l: if (p) return (*p)

memory: p

j

NULL

17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l: if (p) return (*p)

memory: p

j

NULL

17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

%

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

%

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

yl

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

yl

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

yl

l: if (p) return (*p)

memory: p

j



17

int j = 17 ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

yl

l:

memory: p

if (p) return (*p)

int j = 17 • ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

yl

l:

memory: p

if (p) return (*p)

int j = 17 • ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l:

memory: p

if (p) return (*p)

int j = 17 • ; p = &j

goto l

9

Non-local control flow and block scope variables Visualization of the operational semantics on an example

int *p = NULL

direction:

&

l:

memory: p

if (p) return (*p)

int j = 17 • ; p = &j

goto l

9

Non-local control flow and block scope variables Goto considered harmful?

http://xkcd.com/292/

10

Non-local control flow and block scope variables Goto considered harmful?

http://xkcd.com/292/

Not necessarily: ` {P} . . . goto main_sub3; . . . {Q}

10

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q}

11

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q} where: I

{P} s {Q} is a Hoare triple, as usual

11

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q} where: I

{P} s {Q} is a Hoare triple, as usual

I

∆ maps function names to their pre- and post-conditions

11

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q} where: I

{P} s {Q} is a Hoare triple, as usual

I

∆ maps function names to their pre- and post-conditions

I

J maps labels to their jumping condition When executing a goto l, the assertion J l has to hold

11

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q} where: I

{P} s {Q} is a Hoare triple, as usual

I

∆ maps function names to their pre- and post-conditions

I

J maps labels to their jumping condition When executing a goto l, the assertion J l has to hold

I

R has to hold to execute a return

11

Non-local control flow and block scope variables Axiomatic semantics [Krebbers/Wiedijk,FoSSaCS’13]

CH2 O Hoare sextuples are of the shape ∆; J; R ` {P} s {Q} where: I

{P} s {Q} is a Hoare triple, as usual

I

∆ maps function names to their pre- and post-conditions

I

J maps labels to their jumping condition When executing a goto l, the assertion J l has to hold

I

R has to hold to execute a return

Remark: the assertions P, Q, J and R correspond to the directions &, %, y and ↑↑ of traversal

11

Non-local control flow and block scope variables The block scope variable rule

∆; J ↑ ∗ x0τ → 7− –; R ↑ ∗ x0τ → 7− – ` {P ↑ ∗ x0τ → 7− –} s {Q ↑ ∗ x0τ → 7− –} ∆; J; R ` {P} localτ s {Q}

When entering a block: I

The De Bruijn indexes are lifted: ( ) ↑

I

The memory is extended: ( ) ∗ x0τ → 7− –

12

Non-local control flow and block scope variables The block scope variable rule

∆; J ↑ ∗ x0τ → 7− –; R ↑ ∗ x0τ → 7− – ` {P ↑ ∗ x0τ → 7− –} s {Q ↑ ∗ x0τ → 7− –} ∆; J; R ` {P} localτ s {Q}

When entering a block: I

The De Bruijn indexes are lifted: ( ) ↑

I

The memory is extended: ( ) ∗ x0τ → 7− –

When leaving a block: the reverse

12

Non-local control flow and block scope variables The block scope variable rule

∆; J ↑ ∗ x0τ → 7− –; R ↑ ∗ x0τ → 7− – ` {P ↑ ∗ x0τ → 7− –} s {Q ↑ ∗ x0τ → 7− –} ∆; J; R ` {P} localτ s {Q}

When entering a block: I

The De Bruijn indexes are lifted: ( ) ↑

I

The memory is extended: ( ) ∗ x0τ → 7− –

When leaving a block: the reverse Important: symmetry matches gotos going both in and out

12

Non-local control flow and block scope variables The block scope variable rule

∆; J ↑ ∗ x0τ → 7− –; R ↑ ∗ x0τ → 7− – ` {P ↑ ∗ x0τ → 7− –} s {Q ↑ ∗ x0τ → 7− –} ∆; J; R ` {P} localτ s {Q}

When entering a block: I

The De Bruijn indexes are lifted: ( ) ↑

I

The memory is extended: ( ) ∗ x0τ → 7− –

When leaving a block: the reverse Important: symmetry matches gotos going both in and out Important: using De Bruijn indexes avoids shadowing

12

Non-determinism and sequence points The C quiz, question 3

int x = 0, y = 0, *p = &x; int main() { *p = f(); printf("x=%d,y=%d\n", x, y); }

13

Non-determinism and sequence points The C quiz, question 3

int x = 0, y = 0, *p = &x; int f() { p = &y; return 17; } int main() { *p = f(); printf("x=%d,y=%d\n", x, y); }

13

Non-determinism and sequence points The C quiz, question 3

int x = 0, y = 0, *p = &x; int f() { p = &y; return 17; } int main() { *p = f(); printf("x=%d,y=%d\n", x, y); } Let us try some compilers I

Clang prints x=0,y=17

I

GCC prints x=17,y=0

13

Non-determinism and sequence points The C quiz, question 3

int x = 0, y = 0, *p = &x; int f() { p = &y; return 17; } int main() { *p = f(); printf("x=%d,y=%d\n", x, y); } Let us try some compilers I

Clang prints x=0,y=17

I

GCC prints x=17,y=0

Non-determinism appears even in innocently looking code

13

Non-determinism and sequence points Separation logic [Krebbers, POPL’14]

Observation: non-determinism corresponds to concurrency Idea: use the separation logic rule for parallel composition {P1 } e1 {Q1 } {P2 } e2 {Q2 } {P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 }

14

Non-determinism and sequence points Separation logic [Krebbers, POPL’14]

Observation: non-determinism corresponds to concurrency Idea: use the separation logic rule for parallel composition {P1 } e1 {Q1 } {P2 } e2 {Q2 } {P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 } What does this mean: I

Split the memory into two disjoint parts

I

Prove that e1 and e2 can be executed safely in their part

I

Now e1 } e2 can be executed safely in the whole memory

14

Non-determinism and sequence points Separation logic [Krebbers, POPL’14]

Observation: non-determinism corresponds to concurrency Idea: use the separation logic rule for parallel composition {P1 } e1 {Q1 } {P2 } e2 {Q2 } {P1 ∗ P2 } e1 } e2 {Q1 ∗ Q2 } What does this mean: I

Split the memory into two disjoint parts

I

Prove that e1 and e2 can be executed safely in their part

I

Now e1 } e2 can be executed safely in the whole memory

Disjointness ⇒ no sequence point violation (accessing the same location twice in one expression)

14

Non-determinism and sequence points Operational semantics [Krebbers, POPL’14]

Head reduction for expressions (e, m) _h (e 0 , m0 )

15

Non-determinism and sequence points Operational semantics [Krebbers, POPL’14]

Head reduction for expressions (e, m) _h (e 0 , m0 ) I

On assignments: add locked address a to Ω1 ∪ Ω2 ([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))

I

Meanwhile: lock a makes accesses to a undefined

I

On sequence points: unlock Ω in memory ([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .

15

Non-determinism and sequence points Operational semantics [Krebbers, POPL’14]

Head reduction for expressions (e, m) _h (e 0 , m0 ) I

On assignments: add locked address a to Ω1 ∪ Ω2 ([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))

I

Meanwhile: lock a makes accesses to a undefined

I

On sequence points: unlock Ω in memory ([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .

Gives a local treatment of sequence points

15

Non-determinism and sequence points Operational semantics [Krebbers, POPL’14]

Head reduction for expressions (e, m) _h (e 0 , m0 ) I

On assignments: add locked address a to Ω1 ∪ Ω2 ([a]Ω1 := [v ]Ω2 , m) _h ([v ]{a}∪Ω1 ∪Ω2 , lock a (m[a := v ]))

I

Meanwhile: lock a makes accesses to a undefined

I

On sequence points: unlock Ω in memory ([v ]Ω ? e2 : e3 , m) _h (e2 , unlock Ω m) provided . . .

Gives a local treatment of sequence points

Small step reduction S(P, φ, m) _ S(P 0 , φ0 , m0 ) I

(P, φ) gives the position/direction in the whole program

I

Different φs for expressions, statements, function calls

I

Uses evaluation contexts, i.e. if (e1 , m1 ) _h (e2 , m2 ), then S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 ) 15

Strict aliasing restrictions What is aliasing?

Aliasing: multiple pointers referring to the same object

16

Strict aliasing restrictions What is aliasing?

Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *q; *p = 17; return x; } If p and q alias, the original value n of *p is returned n p

q

16

Strict aliasing restrictions What is aliasing?

Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *q; *p = 17; return x *q; } If p and q alias, the original value n of *p is returned n p

q

Optimizing x away is unsound: 17 would be returned

16

Strict aliasing restrictions What is aliasing?

Aliasing: multiple pointers referring to the same object int f(int *p, int *q) { int x = *q; *p = 17; return x *q; } If p and q alias, the original value n of *p is returned n p

q

Optimizing x away is unsound: 17 would be returned Alias analysis: to determine whether pointers can alias 16

Strict aliasing restrictions The C quiz, question 4

Consider a similar function: short g(int *p, short *q) { short x = *q; *p = 17; return x; } And call it with aliased pointers: union { int x; short y; } u; u.y = 3; g(&u.x, &u.y);

x y &u.x

&u.y

17

Strict aliasing restrictions The C quiz, question 4

Consider a similar function: short g(int *p, short *q) { short x = *q; *p = 17; return x; } And call it with aliased pointers: union { int x; short y; } u; u.y = 3; g(&u.x, &u.y);

x y &u.x

&u.y

C89 allows p and q to be aliased, and thus requires g to return 3

17

Strict aliasing restrictions The C quiz, question 4

Consider a similar function: short g(int *p, short *q) { short x = *q; *p = 17; return x; } And call it with aliased pointers: union { int x; short y; } u; u.y = 3; g(&u.x, &u.y);

x y &u.x

&u.y

C89 allows p and q to be aliased, and thus requires g to return 3 C99/C11 allow type-based alias analysis: reads/writes with “the wrong type” cause undefined behavior =⇒ A compiler can assume that p and q do not alias 17

Strict aliasing restrictions How to treat pointers [Krebbers, CPP’13]

Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes

Our approach

Pointers: pairs (o, i) where o identifies the cell, and i the offset into that cell Too little information to capture strict aliasing restrictions

18

Strict aliasing restrictions How to treat pointers [Krebbers, CPP’13]

Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes

Our approach A finite map of cells which consist of well-typed trees of bits

Pointers: pairs (o, i) where o identifies the cell, and i the offset into that cell Too little information to capture strict aliasing restrictions

18

Strict aliasing restrictions How to treat pointers [Krebbers, CPP’13]

Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes

Our approach A finite map of cells which consist of well-typed trees of bits

Pointers: pairs (o, i) where o identifies the cell, and i the offset into that cell

Pairs (o, r ) where o identifies the cell, and r the path through the tree in that cell

Too little information to capture strict aliasing restrictions

18

Strict aliasing restrictions How to treat pointers [Krebbers, CPP’13]

Others (e.g. CompCert) Memory: a finite map of cells which consist of arrays of bytes

Our approach A finite map of cells which consist of well-typed trees of bits

Pointers: pairs (o, i) where o identifies the cell, and i the offset into that cell

Pairs (o, r ) where o identifies the cell, and r the path through the tree in that cell

Too little information to capture strict aliasing restrictions

A semantics for strict aliasing restrictions

18

Strict aliasing restrictions Example of the memory as a structured forest

Consider: struct S { union U { signed char x[2]; int y; } u; void *p; } s = { { .x = {33,34} }, s.u.x + 2 } The object in memory looks like: os 7→ void∗:

(ptr p)0 (ptr p)1 . . . (ptr p)31

.0 signed char: 10000100 01000100 EEEEEEEE

EEEEEEEE struct S

union U

signed char[2]

p = (os : struct S, ,− −−− → 0 ,−−−→• 1 ,−−−−−−− → 0, 8)signed char>void

19

Strict aliasing restrictions The C quiz, question 5

union U { int x; short y; } u = { .x = 3 }; short q1() { return u.y; } short q2() { short *p = &u.y; return *p; }

20

Strict aliasing restrictions The C quiz, question 5

union U { int x; short y; } u = { .x = 3 }; short q1() { return u.y; // OK } short q2() { short *p = &u.y; return *p; // Undefined } Type-punning: reading a union using a pointer to another variant I

C11 vaguely mentions a notion of “visible”

I

GCC only if “the memory is accessed through the union type”

20

Strict aliasing restrictions The C quiz, question 5

union U { int x; short y; } u = { .x = 3 }; short q1() { return u.y; // OK } short q2() { short *p = &u.y; return *p; // Undefined } Type-punning: reading a union using a pointer to another variant I

C11 vaguely mentions a notion of “visible”

I

GCC only if “the memory is accessed through the union type”

Formalized by decorating pointers with annotations 20

Strict aliasing restrictions Strict-aliasing Theorem

Theorem (Strict-aliasing) Given: I

addresses Γ; m ` a1 : σ1 and Γ; m ` a2 : σ2

I

with annotations that do not allow type-punning

I

σ1 , σ2 6= unsigned char

I

σ1 not a subtype of σ2 and vice versa

Then there are two possibilities: 1. a1 and a2 do not alias 2. accessing a1 after a2 (and vice versa) has undefined behavior

21

Strict aliasing restrictions Strict-aliasing Theorem

Theorem (Strict-aliasing) Given: I

addresses Γ; m ` a1 : σ1 and Γ; m ` a2 : σ2

I

with annotations that do not allow type-punning

I

σ1 , σ2 6= unsigned char

I

σ1 not a subtype of σ2 and vice versa

Then there are two possibilities: 1. a1 and a2 do not alias 2. accessing a1 after a2 (and vice versa) has undefined behavior

Theorem Compilers can perform type-based alias analysis 21

CH2 O Abstract C # » x ∈ string := Set of strings I ∈ cinit ::= e | { #» r := I } k ∈ cintrank ::= char | short | int sto ∈ cstorage ::= static | extern | auto | long | long long | ptr s ∈ cstmt ::= e | skip si ∈ signedness ::= signed | unsigned | goto x | return e ? τi ∈ cinttype ::= si ? k | break | continue τ ∈ ctype ::= void | def x | τi | τ ∗ | {s} #» | τ [e] | struct x | union x | sto τ x := I ? ; s | enum x | typeof e | typedef x := τ ; s α ∈ assign ::= := | } := | := } | s1 ; s2 | x : s e ∈ cexpr ::= x | constτi z | sizeof τ | while(e) s | τi min | τi max | τi bits | for(e1 ; e2 ; e3 ) s | &e | ∗ e | do s while(e) | e1 α e2 | if (e) s1 else s2 | x(~e ) | abort d ∈ decl ::= struct τ# » x | union τ# » x | allocτ e | free e | typedef τ | }u e | e1 } e2 # » | enum x := e ? : τi | e1 && e2 | e1 || e2 # » | global I ? : sto τ | e1 ? e2 : e3 | (e1 , e2 ) # » #» | fun (τ x ? ) s ? : sto τ | (τ ) I | e . x Θ ∈ decls := list (string × decl) r ∈ crefseg ::= [e] | .x 22

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

iff

S(P, e, m) _∗ S(P, v , m)

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

I

iff

S(P, e, m) _∗ S(P, v , m)

Simplification of loops, e.g. while(e) s becomes catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

I

iff

S(P, e, m) _∗ S(P, v , m)

Simplification of loops, e.g. while(e) s becomes catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))

I

Expansion of typedef and enum declarations

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

I

iff

S(P, e, m) _∗ S(P, v , m)

Simplification of loops, e.g. while(e) s becomes catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))

I

Expansion of typedef and enum declarations

I

Translation of constants like INT_MIN

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

I

iff

S(P, e, m) _∗ S(P, v , m)

Simplification of loops, e.g. while(e) s becomes catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))

I

Expansion of typedef and enum declarations

I

Translation of constants like INT_MIN

I

Translation of compound literals

23

CH2 O Abstract C Translation to CH2 O core C in Coq [Krebbers/Wiedijk, Submitted] I

Named variables to De Bruijn indices

I

Disambiguate l-values and r-values

I

Sound/complete constant expression evaluation, e.g. in τ [e] [[ e ]]Γ,getstack P,m = v

I

iff

S(P, e, m) _∗ S(P, v , m)

Simplification of loops, e.g. while(e) s becomes catch (loop (if (e 0 ) skip else throw 0 ; catch s 0 ))

I

Expansion of typedef and enum declarations

I

Translation of constants like INT_MIN

I

Translation of compound literals

Theorem (Type soundness) If the translator succeeds, the CH2 O core program is well-typed. 23

Executable semantics Goal: define exec : state → Pfin (state)

24

Executable semantics Goal: define exec : state → Pfin (state) Problems: 1. Decomposition E[ e1 ] of expressions is non-deterministic: S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 )

if (e1 ,m1 )_h(e2 ,m2 )

2. Object identifiers o for newly allocated memory are arbitrary: S(P, (&, localτ s), m) _ S((localo:τ 2) :: P, (&, s), alloc o τ m)

if o ∈dom / m

24

Executable semantics Goal: define exec : state → Pfin (state) Problems: 1. Decomposition E[ e1 ] of expressions is non-deterministic: S(P, E[ e1 ], m1 ) _ S(P, E[ e2 ], m2 )

if (e1 ,m1 )_h(e2 ,m2 )

2. Object identifiers o for newly allocated memory are arbitrary: S(P, (&, localτ s), m) _ S((localo:τ 2) :: P, (&, s), alloc o τ m)

if o ∈dom / m

Solutions: 1. Enumerate all possible decompositions E[ e1 ] 2. Pick a canonical object identifier fresh m for o 24

Executable semantics Example of the Coq code Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) := let ’State k φ m := S in match φ with | Stmt & s => match s with | skip => {[ State k (Stmt % skip) m ]} | goto l => {[ State k (Stmt (y l) (goto l)) m ]} | ... | local{τ} s => let o := fresh (dom indexset m) in (* use canonical object identifier *) {[ State (CLocal o τ :: k) (Stmt & s) (mem_alloc Γ o false τ m) ]} end | Expr e => match maybe_EVal e with | Some (Ω,v) => ... | None => ’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *) match ehexec Γ (get_stack k) e’ m with | Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]} | None => match maybe_ECall_redex e’ with | Some (f, Ωs, vs) => S {[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]} | _ => {[ State k (Undef (UndefExpr E e’)) m ]} end end end | ...

25

Executable semantics Example of the Coq code Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) := let ’State k φ m := S in match φ with | Stmt & s => match s with | skip => {[ State k (Stmt % skip) m ]} | goto l => {[ State k (Stmt (y l) (goto l)) m ]} | ... | local{τ} s => let o := fresh (dom indexset m) in (* use canonical object identifier *) {[ State (CLocal o τ :: k) (Stmt & s) (mem_alloc Γ o false τ m) ]} end | Expr e => match maybe_EVal e with | Some (Ω,v) => ... | None => ’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *) match ehexec Γ (get_stack k) e’ m with | Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]} | None => match maybe_ECall_redex e’ with | Some (f, Ωs, vs) => S {[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]} | _ => {[ State k (Undef (UndefExpr E e’)) m ]} end end end | ...

25

Executable semantics Example of the Coq code Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) := let ’State k φ m := S in match φ with | Stmt & s => match s with | skip => {[ State k (Stmt % skip) m ]} | goto l => {[ State k (Stmt (y l) (goto l)) m ]} | ... | local{τ} s => let o := fresh (dom indexset m) in (* use canonical object identifier *) {[ State (CLocal o τ :: k) (Stmt & s) (mem_alloc Γ o false τ m) ]} end | Expr e => match maybe_EVal e with | Some (Ω,v) => ... | None => ’(E,e’) ← expr redexes e; (* monadic programming to try each decomposition *) match ehexec Γ (get_stack k) e’ m with | Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]} | None => match maybe_ECall_redex e’ with | Some (f, Ωs, vs) => S {[ State (CFun E :: k) (Call f vs) (mem_unlock ( Ωs) m) ]} | _ => {[ State k (Undef (UndefExpr E e’)) m ]} end end end | ...

25

Executable semantics Soundness and completeness [Krebbers/Wiedijk, Submitted]

Theorem (Soundness) If S2 ∈ exec S1 , then S1 _ S2 .

26

Executable semantics Soundness and completeness [Krebbers/Wiedijk, Submitted]

Theorem (Soundness) If S2 ∈ exec S1 , then S1 _ S2 .

Definition (Permutation) We let S1 ∼f S2 , if S2 is obtained by renaming S1 with respect to f : index → option index.

26

Executable semantics Soundness and completeness [Krebbers/Wiedijk, Submitted]

Theorem (Soundness) If S2 ∈ exec S1 , then S1 _ S2 .

Definition (Permutation) We let S1 ∼f S2 , if S2 is obtained by renaming S1 with respect to f : index → option index.

Theorem (Completeness) If S1 _∗ S2 , then there exists an f and S20 such that: S20

∗ exec

S1

f ∗

S2 26

Executable semantics Demo

$ cat foo.c int main(void) { int x = 10, *p = &x; return *p + *p + *p + *p + *p + *p; } $ ./ch2o -t foo.c .....,.......,,,.....+;!|?*%%$$$$$$$$%%*?|!;:+-,.. ........ "" 60

27

Conclusion

C11 is not a simple language

28

Questions

Sources: http://robbertkrebbers.nl/research/ch2o/

http://xkcd.com/371/

29