Formalizing C in Coq
Robbert Krebbers
ICIS, Radboud University Nijmegen, The Netherlands
October 24, 2014 @ Princeton University, USA

What is this program supposed to do?
The C quiz, question 1

    int main() {
      int x;
      int y = (x = 3) + (x = 4);
      printf("x=%d,y=%d\n", x, y);
    }

Let us try some compilers:
- Clang prints x=4,y=7, seems just left-right
- GCC prints x=4,y=8, does not correspond to any order

This program violates the sequence point restriction
- due to two unsequenced writes to x
- resulting in undefined behavior
- thus both compilers are right

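For contrast, here is a minimal sketch of a variant with defined behavior. It assumes the intended reading is "assign 3, use it, assign 4, use it"; the names sequenced_sum and print_quiz1 are made up for this illustration and do not come from the talk. Each write to x is a full statement, so a sequence point separates the two assignments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: performs the two writes to x in separate
   statements, so they are sequenced and the result is defined. */
int sequenced_sum(int *x_out) {
  assert(x_out != NULL);
  int x;
  x = 3;
  int first = x;   /* value of the first assignment */
  x = 4;
  int second = x;  /* value of the second assignment */
  *x_out = x;
  return first + second;
}

/* Prints in the same format as the quiz program. */
void print_quiz1(void) {
  int x;
  int y = sequenced_sum(&x);
  printf("x=%d,y=%d\n", x, y);
}
```

Calling print_quiz1() from main prints x=4,y=7 with any conforming compiler, because no unsequenced writes remain.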
Underspecification in C11
- Unspecified behavior: two or more behaviors are allowed
  For example: order of evaluation in expressions (+57 more)
  => Non-determinism
- Implementation-defined behavior: like unspecified behavior, but the compiler has to document its choice
  For example: size and endianness of integers (+118 more)
  => Parametrization
- Undefined behavior: the standard imposes no requirements at all, the program is even allowed to crash
  For example: dereferencing a NULL or dangling pointer, signed integer overflow, ... (+201 more)
  => No semantics/crash state

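Two of these categories can be illustrated with a small sketch. The function names checked_add and lowest_byte_first are hypothetical and only serve this example: the first avoids the undefined behavior of signed overflow by testing before adding, the second observes an implementation-defined property (byte order).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Undefined behavior avoided: a + b is only evaluated when it cannot
   overflow. Returns false instead of overflowing. */
bool checked_add(int a, int b, int *sum) {
  assert(sum != NULL);
  if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
    return false;
  }
  *sum = a + b;
  return true;
}

/* Implementation-defined behavior observed: whether the least
   significant byte of an unsigned int is stored first. Reading an
   object through unsigned char is always permitted. */
bool lowest_byte_first(void) {
  unsigned int one = 1u;
  return *(unsigned char *)&one == 1;
}
```

The result of lowest_byte_first() legitimately differs between platforms, whereas a program that overflows a signed int has no meaning at all under the standard.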
Why does C use underspecification that heavily?

Pros for optimizing compilers:
- More optimizations are possible
- High run-time efficiency
- Easy to support multiple architectures

Challenges for programmers/formal methods people:
- Portability and maintenance problems
- Hard to capture precisely in a semantics
- Hard to formally reason about

Approaches to underspecification

CompCert (Leroy et al.) / VST (Appel et al.)
- Main goal: verification of/w.r.t. the CompCert compiler in Coq
- Specific choices for unspecified/implementation-defined behavior
  For example: 32-bit ints
- Describes some undefined behavior
  Undefined: dereferencing NULL, ...
  But still defined: integer overflow, aliasing violations, ...
- VST separation logic proofs specific to CompCert

Formalin (Krebbers & Wiedijk)
- Main goal: compiler-independent C11 semantics in Coq
- Describes all unspecified and undefined behavior
- Describes some implementation-defined behavior
  For example: no legacy architectures with ones' complement

The Formalin project (Krebbers & Wiedijk)

[Diagram: architecture of the project; each arrow is either a translation or a proof.]
- OCaml part: .c file, CIL abstract syntax
- Coq part: CH2O abstract syntax, CH2O core syntax, type judgment, CH2O operational semantics, executable semantics (producing a stream of finite sets of states), separation logic, refinement judgment, structured memory model
- Proofs shown in the diagram: type soundness; subject reduction and progress; soundness and completeness; soundness; preservation and composition

Non-local control flow and block scope variables
The C quiz, question 2

    int *p = NULL;
    l: if (p) {
      return (*p);
    } else {
      int j = 17;
      p = &j;
      goto l;
    }

memory: p = • (dangling: j was deallocated when goto l left the block)

C11, 6.2.4p2: the value of a pointer becomes indeterminate when the object it points to (or just past) reaches the end of its lifetime.
=> Undefined behavior

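One way to give the quiz program a defined meaning is to let j outlive the jump. The sketch below is an illustration rather than part of the talk; the function name read_after_goto is made up. It declares j in the enclosing scope, so the goto no longer ends its lifetime.

```c
#include <assert.h>
#include <stddef.h>

/* Same control flow as the quiz, but j lives until the function
   returns, so p is still valid after the backwards jump. */
int read_after_goto(void) {
  int j;
  int *p = NULL;
l:
  if (p) {
    assert(p == &j);  /* p still points to a live object */
    return *p;
  } else {
    j = 17;
    p = &j;
    goto l;
  }
}
```

Here read_after_goto() returns 17, since the object that p points to is still alive when it is read.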
Non-local control flow and block scope variables
Operational semantics [Krebbers/Wiedijk, FoSSaCS’13]

Problem: goto/return/etc. affect memory

Solution:
- Execute gotos and returns in small steps
  - Not so much to search for labels, ...
  - but to naturally perform required allocations and deallocations
- Traversal through the AST in the following directions:
  - $\searrow$ downwards to the next statement
  - $\nearrow$ upwards to the next statement
  - $\curvearrowright l$ to a label l: after a goto l
  - $\upuparrows v$ to the top of the statement after a return
- Use a zipper to keep track of the position and the stack

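The effect of a jump on memory can be sketched with a much simpler model than the zipper of the talk. The simplifications are assumptions of this example: every block declares exactly one local variable, and a program position is given as the list of enclosing block identifiers, outermost first. The names JumpEffect and jump_effect are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  size_t freed;      /* locals deallocated while moving upwards */
  size_t allocated;  /* locals allocated while moving downwards */
} JumpEffect;

/* from: blocks enclosing the goto; to: blocks enclosing the label. */
JumpEffect jump_effect(const int *from, size_t nfrom,
                       const int *to, size_t nto) {
  assert(from != NULL || nfrom == 0);
  assert(to != NULL || nto == 0);

  /* Blocks shared by both positions are neither left nor entered. */
  size_t common = 0;
  while (common < nfrom && common < nto && from[common] == to[common]) {
    common++;
  }

  JumpEffect effect = {0, 0};
  /* One step upwards per block that is left: its local is freed. */
  for (size_t depth = nfrom; depth > common; depth--) {
    effect.freed++;
  }
  /* One step downwards per block that is entered: its local is allocated. */
  for (size_t depth = common; depth < nto; depth++) {
    effect.allocated++;
  }
  return effect;
}
```

For quiz question 2, the goto sits in one block and the label at the top level, so the jump frees one local (j) and allocates none, which is why p dangles afterwards.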
Non-local control flow and block scope variables
Visualization of the operational semantics on an example

[Animation: the statement tree of quiz question 2 (int *p = NULL; l: if (p) return (*p); else int j = 17; p = &j; goto l), with the current direction and the memory shown next to it. Recoverable frames, in order:]
1. direction $\searrow$; memory: p = NULL
2. direction $\searrow$; memory: p = NULL, j = 17
3. direction $\searrow$; memory: p = • (pointing to j), j = 17
4. direction $\nearrow$; memory: p = •, j = 17
5. direction $\searrow$; memory: p = •, j = 17
6. direction $\curvearrowright l$; memory: p = •, j = 17
7. direction $\curvearrowright l$; memory: p = • (j has been deallocated)
8. direction $\searrow$; memory: p = •

Non-local control flow and block scope variables
Goto considered harmful?

http://xkcd.com/292/

Not necessarily: $\vdash \{P\}\ \ldots\ \mathtt{goto\ main\_sub3;}\ \ldots\ \{Q\}$

Non-local control flow and block scope variables
Axiomatic semantics [Krebbers/Wiedijk, FoSSaCS’13]

CH2O Hoare sextuples are of the shape

$$\Delta;\ J;\ R \vdash \{P\}\ s\ \{Q\}$$

where:
- $\{P\}\ s\ \{Q\}$ is a Hoare triple, as usual
- $\Delta$ maps function names to their pre- and post-conditions
- $J$ maps labels to their jumping condition
  When executing a goto l, the assertion $J\ l$ has to hold
- $R$ has to hold to execute a return

Remark: the assertions $P$, $Q$, $J$ and $R$ correspond to the directions $\searrow$, $\nearrow$, $\curvearrowright$ and $\upuparrows$ of traversal

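The roles of J and R can be illustrated informally with run-time checks. This is only a sketch: assert tests a condition during one execution, whereas the logic of the talk proves it for all executions. The function name sum_below, the invariant, and the bound on n are assumptions of this example.

```c
#include <assert.h>

/* Sums 0 + 1 + ... + (n - 1) with a goto loop.
   P (precondition): 0 <= n <= 1000, so no signed overflow occurs. */
int sum_below(int n) {
  assert(0 <= n && n <= 1000);
  int i = 0, acc = 0;
loop:
  /* J(loop): has to hold whenever control arrives at the label,
     both when falling through and when jumping with goto loop. */
  assert(0 <= i && i <= n && acc == i * (i - 1) / 2);
  if (i < n) {
    acc += i;
    i++;
    goto loop;  /* J(loop) has been re-established at this point */
  }
  /* R: has to hold to execute the return. */
  assert(acc == n * (n - 1) / 2);
  return acc;
}
```

For example, sum_below(5) yields 10, and every arrival at the label satisfies the jumping condition.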
Non-local control flow and block scope variables
The block scope variable rule

$$\frac{\Delta;\ J{\uparrow} * x_0 \mapsto_{\tau} -;\ R{\uparrow} * x_0 \mapsto_{\tau} - \ \vdash\ \{P{\uparrow} * x_0 \mapsto_{\tau} -\}\ s\ \{Q{\uparrow} * x_0 \mapsto_{\tau} -\}}{\Delta;\ J;\ R \vdash \{P\}\ \mathtt{local}_{\tau}\ s\ \{Q\}}$$

When entering a block:
- The De Bruijn indexes are lifted: $(\cdot){\uparrow}$
- The memory is extended: $(\cdot) * x_0 \mapsto_{\tau} -$

When leaving a block: the reverse

Important: symmetry matches gotos going both in and out
Important: using De Bruijn indexes avoids shadowing

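How De Bruijn indexes sidestep shadowing can be sketched as follows. This is an illustration under the assumption that a scope is just a list of variable names, outermost first; the function name de_bruijn_index is hypothetical. A name resolves to the distance from the innermost binding, so two variables with the same name get different indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* scope[0] is the outermost binding, scope[depth - 1] the innermost.
   Returns 0 for the innermost binding of name, 1 for the next one
   outwards, and so on; returns -1 if name is unbound. */
int de_bruijn_index(const char *const *scope, size_t depth,
                    const char *name) {
  assert(name != NULL);
  for (size_t i = depth; i > 0; i--) {
    if (strcmp(scope[i - 1], name) == 0) {
      return (int)(depth - i);
    }
  }
  return -1;
}
```

Entering a block that declares a new variable increases the index of every outer variable by one, which corresponds to the lifting in the rule above; leaving the block decreases it again.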
Non-determinism and sequence points
The C quiz, question 3

    int x = 0, y = 0, *p = &x;
    int f() { p = &y; return 17; }
    int main() {
      *p = f();
      printf("x=%d,y=%d\n", x, y);
    }

Let us try some compilers:
- Clang prints x=0,y=17
- GCC prints x=17,y=0

Non-determinism appears even in innocent-looking code

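Both observed outcomes can be made deterministic by fixing the evaluation order explicitly. The sketch below is an illustration; the names reset, store_call_first and store_destination_first are made up, and x, y, p and f mirror the quiz.

```c
#include <assert.h>
#include <stddef.h>

int x, y, *p;

void reset(void) { x = 0; y = 0; p = &x; }

int f(void) { p = &y; return 17; }

/* The call is evaluated first, then the destination *p:
   always writes y (the Clang outcome above). */
void store_call_first(void) {
  int v = f();
  assert(p != NULL);
  *p = v;
}

/* The destination is fixed first, then the call is evaluated:
   always writes x (the GCC outcome above). */
void store_destination_first(void) {
  int *dst = p;
  assert(dst != NULL);
  *dst = f();
}
```

Introducing a local variable forces a sequence point between the two operand evaluations, so each function has exactly one behavior.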
Non-determinism and sequence points
Separation logic [Krebbers, POPL’14]

Observation: non-determinism corresponds to concurrency
Idea: use the separation logic rule for parallel composition

$$\frac{\{P_1\}\ e_1\ \{Q_1\} \qquad \{P_2\}\ e_2\ \{Q_2\}}{\{P_1 * P_2\}\ e_1 \circledcirc e_2\ \{Q_1 * Q_2\}}$$

What does this mean:
- Split the memory into two disjoint parts
- Prove that $e_1$ and $e_2$ can be executed safely in their part
- Now $e_1 \circledcirc e_2$ can be executed safely in the whole memory

Disjointness => no sequence point violation (accessing the same location twice in one expression)

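A concrete instance of the disjointness idea, as a small sketch (the function name disjoint_writes is made up): the two operands of + write to different objects, so the unsequenced writes cannot conflict and the expression has defined behavior, unlike (x = 3) + (x = 4).

```c
#include <assert.h>

/* Each operand owns its own part of memory: the left one writes a,
   the right one writes b. Either evaluation order gives the same
   result. */
int disjoint_writes(void) {
  int a, b;
  int sum = (a = 3) + (b = 4);
  assert(a == 3 && b == 4);
  return sum;
}
```

disjoint_writes() returns 7 regardless of the order in which the compiler evaluates the operands.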
Non-determinism and sequence points
Operational semantics [Krebbers, POPL’14]

Head reduction for expressions $(e, m) \rightarrow_h (e', m')$
- On assignments: add locked address $a$ to $\Omega_1 \cup \Omega_2$
  $$([a]_{\Omega_1} := [v]_{\Omega_2},\ m) \rightarrow_h ([v]_{\{a\} \cup \Omega_1 \cup \Omega_2},\ \mathsf{lock}\ a\ (m[a := v]))$$
- Meanwhile: $\mathsf{lock}\ a$ makes accesses to $a$ undefined
- On sequence points: unlock $\Omega$ in memory
  $$([v]_{\Omega}\ ?\ e_2 : e_3,\ m) \rightarrow_h (e_2,\ \mathsf{unlock}\ \Omega\ m) \quad \text{provided } \ldots$$

Gives a local treatment of sequence points

Small step reduction $S(P, \phi, m) \rightarrow S(P', \phi', m')$
- $(P, \phi)$ gives the position/direction in the whole program
- Different $\phi$s for expressions, statements, function calls
- Uses evaluation contexts, i.e. if $(e_1, m_1) \rightarrow_h (e_2, m_2)$, then $S(P, E[e_1], m_1) \rightarrow S(P, E[e_2], m_2)$

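The locking discipline can be sketched with a toy memory. This is a simplification, not the CH2O definition: addresses are array indexes, a sequence point unlocks every cell instead of a tracked set $\Omega$, and undefined behavior is recorded in a flag. All names (Memory, mem_assign, mem_load, sequence_point) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { CELLS = 4 };

typedef struct {
  int value[CELLS];
  bool locked[CELLS];
  bool undefined;  /* set once an access hits a locked cell */
} Memory;

void mem_init(Memory *m) {
  for (size_t a = 0; a < CELLS; a++) {
    m->value[a] = 0;
    m->locked[a] = false;
  }
  m->undefined = false;
}

/* Assignment: store v at address a and lock a. */
void mem_assign(Memory *m, size_t a, int v) {
  assert(a < CELLS);
  if (m->locked[a]) { m->undefined = true; return; }
  m->value[a] = v;
  m->locked[a] = true;
}

/* Load: reading a locked cell is undefined. */
int mem_load(Memory *m, size_t a) {
  assert(a < CELLS);
  if (m->locked[a]) { m->undefined = true; return 0; }
  return m->value[a];
}

/* Sequence point: release the locks taken since the previous one. */
void sequence_point(Memory *m) {
  for (size_t a = 0; a < CELLS; a++) {
    m->locked[a] = false;
  }
}
```

Modelling (x = 3) + (x = 4) as two mem_assign calls on the same address without a sequence point in between sets the undefined flag, while x = 3; x = 4; with a sequence point in between does not.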
Strict aliasing restrictions
What is aliasing?

Aliasing: multiple pointers referring to the same object

    int f(int *p, int *q) {
      int x = *q; *p = 17; return x;
    }

[Figure: p and q both point to the same object, which holds n.]

If p and q alias, the original value n of *p is returned
Optimizing x away (rewriting return x; to return *q;) is unsound: 17 would be returned
Alias analysis: to determine whether pointers can alias

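The difference between the two versions can be observed directly. In this sketch f is the function from the slide, and f_without_x is a hypothetical name for the version in which x has been optimized away. Both pointers have type int *, so calling the functions with aliased arguments is well-defined.

```c
#include <assert.h>
#include <stddef.h>

/* Original: remembers *q before the write through p. */
int f(int *p, int *q) {
  assert(p != NULL && q != NULL);
  int x = *q;
  *p = 17;
  return x;
}

/* x optimized away: reads *q after the write through p. */
int f_without_x(int *p, int *q) {
  assert(p != NULL && q != NULL);
  *p = 17;
  return *q;
}
```

With int n = 5, the call f(&n, &n) returns 5 whereas f_without_x(&n, &n) returns 17, so the rewrite is only valid when p and q are known not to alias.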
Strict aliasing restrictions
The C quiz, question 4

Consider a similar function:

    short g(int *p, short *q) {
      short x = *q; *p = 17; return x;
    }

And call it with aliased pointers:

    union { int x; short y; } u;
    u.y = 3;
    g(&u.x, &u.y);

[Figure: &u.x and &u.y point to the overlapping members x and y of u.]

C89 allows p and q to be aliased, and thus requires g to return 3
C99/C11 allow type-based alias analysis: reads/writes with “the wrong type” cause undefined behavior
=> A compiler can assume that p and q do not alias

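A sketch of how to stay within the C99/C11 rules: g is called with pointers to distinct objects, and a reinterpretation of an int as a short goes through memcpy instead of a pointer of "the wrong type". The helper name leading_short is made up for this example, and which bytes of the int it sees depends on the implementation's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The function from the quiz; well-defined when p and q do not alias. */
short g(int *p, short *q) {
  assert(p != NULL && q != NULL);
  short x = *q;
  *p = 17;
  return x;
}

/* Copies the first sizeof(short) bytes of an int into a short.
   memcpy accesses the bytes as characters, which is always allowed. */
short leading_short(const int *p) {
  short s;
  assert(p != NULL && sizeof s <= sizeof *p);
  memcpy(&s, p, sizeof s);
  return s;
}
```

With separate objects int a and short b = 3, the call g(&a, &b) returns 3 and leaves 17 in a under every version of the standard.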
Strict aliasing restrictions
How to treat pointers [Krebbers, CPP’13]

Others (e.g. CompCert)
- Memory: a finite map of cells which consist of arrays of bytes
- Pointers: pairs (o, i) where o identifies the cell, and i the offset into that cell
- Too little information to capture strict aliasing restrictions

Our approach
- Memory: a finite map of cells which consist of well-typed trees of bits
- Pointers: pairs (o, r) where o identifies the cell, and r the path through the tree in that cell
- A semantics for strict aliasing restrictions

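Strict aliasing restrictions
Example of the memory as a structured forest

Consider:

    struct S {
      union U { signed char x[2]; int y; } u;
      void *p;
    } s = { { .x = {33,34} }, s.u.x + 2 }

The object in memory looks like:

[Figure: the tree stored at object identifier $o_s$. Root: struct S. First field: union U in variant .0, holding a signed char[2] with the bits 10000100 01000100, followed by the indeterminate bits EEEEEEEE EEEEEEEE. Second field: void*, holding the pointer bits (ptr p)_0 (ptr p)_1 ... (ptr p)_31.]

$$p = (o_s : \mathtt{struct\ S},\ \overset{\mathtt{struct\ S}}{\hookrightarrow} 0\ \overset{\mathtt{union\ U}}{\hookrightarrow}_{\bullet} 1\ \overset{\mathtt{signed\ char[2]}}{\hookrightarrow} 0,\ 8)_{\mathtt{signed\ char} > \mathtt{void}}$$
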
Strict aliasing restrictions
The C quiz, question 5

    union U { int x; short y; } u = { .x = 3 };
    short q1() {
      return u.y;        // OK
    }
    short q2() {
      short *p = &u.y;
      return *p;         // Undefined
    }

Type-punning: reading a union using a pointer to another variant
- C11 vaguely mentions a notion of “visible”
- GCC allows it only if “the memory is accessed through the union type”

Formalized by decorating pointers with annotations

Strict aliasing restrictions
Strict-aliasing Theorem

Theorem (Strict-aliasing)
Given:
- addresses $\Gamma; m \vdash a_1 : \sigma_1$ and $\Gamma; m \vdash a_2 : \sigma_2$
- with annotations that do not allow type-punning
- $\sigma_1, \sigma_2 \neq \mathtt{unsigned\ char}$
- $\sigma_1$ not a subtype of $\sigma_2$ and vice versa
Then there are two possibilities:
1. $a_1$ and $a_2$ do not alias
2. accessing $a_1$ after $a_2$ (and vice versa) has undefined behavior

Consequence of the theorem: compilers can perform type-based alias analysis

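CH2O Abstract C

$$
\begin{array}{rcl}
x \in \mathsf{string} & := & \text{Set of strings} \\
k \in \mathsf{cintrank} & ::= & \mathtt{char} \mid \mathtt{short} \mid \mathtt{int} \mid \mathtt{long} \mid \mathtt{long\ long} \mid \mathtt{ptr} \\
si \in \mathsf{signedness} & ::= & \mathtt{signed} \mid \mathtt{unsigned} \\
\tau_i \in \mathsf{cinttype} & ::= & si^{?}\ k \\
\tau \in \mathsf{ctype} & ::= & \mathtt{void} \mid \mathtt{def}\ x \mid \tau_i \mid \tau{*} \mid \tau[e] \mid \mathtt{struct}\ x \mid \mathtt{union}\ x \mid \mathtt{enum}\ x \mid \mathtt{typeof}\ e \\
\alpha \in \mathsf{assign} & ::= & {:=} \mid {\circledcirc}{:=} \mid {:=}{\circledcirc} \\
e \in \mathsf{cexpr} & ::= & x \mid \mathtt{const}_{\tau_i}\ z \mid \mathtt{sizeof}\ \tau \mid \tau_i\ \mathtt{min} \mid \tau_i\ \mathtt{max} \mid \tau_i\ \mathtt{bits} \\
 & \mid & \&e \mid {*}e \mid e_1\ \alpha\ e_2 \mid x(\vec{e}) \mid \mathtt{abort} \mid \mathtt{alloc}_{\tau}\ e \mid \mathtt{free}\ e \\
 & \mid & \circledcirc_u\ e \mid e_1 \circledcirc e_2 \mid e_1 \mathbin{\&\&} e_2 \mid e_1 \mathbin{||} e_2 \mid e_1\ ?\ e_2 : e_3 \mid (e_1, e_2) \mid (\tau)\ I \mid e\,.\,x \\
r \in \mathsf{crefseg} & ::= & [e] \mid .x \\
I \in \mathsf{cinit} & ::= & e \mid \{\ \overrightarrow{\vec{r} := I}\ \} \\
sto \in \mathsf{cstorage} & ::= & \mathtt{static} \mid \mathtt{extern} \mid \mathtt{auto} \\
s \in \mathsf{cstmt} & ::= & e \mid \mathtt{skip} \mid \mathtt{goto}\ x \mid \mathtt{return}\ e^{?} \mid \mathtt{break} \mid \mathtt{continue} \mid \{s\} \\
 & \mid & \vec{sto}\ \tau\ x := I^{?}\ ;\ s \mid \mathtt{typedef}\ x := \tau\ ;\ s \mid s_1\ ;\ s_2 \mid x : s \\
 & \mid & \mathtt{while}(e)\ s \mid \mathtt{for}(e_1; e_2; e_3)\ s \mid \mathtt{do}\ s\ \mathtt{while}(e) \mid \mathtt{if}\ (e)\ s_1\ \mathtt{else}\ s_2 \\
d \in \mathsf{decl} & ::= & \mathtt{struct}\ \overrightarrow{\tau\ x} \mid \mathtt{union}\ \overrightarrow{\tau\ x} \mid \mathtt{typedef}\ \tau \mid \mathtt{enum}\ \overrightarrow{x := e^{?}} : \tau_i \\
 & \mid & \mathtt{global}\ I^{?} : \vec{sto}\ \tau \mid \mathtt{fun}\ (\overrightarrow{\tau\ x^{?}})\ s^{?} : \vec{sto}\ \tau \\
\Theta \in \mathsf{decls} & := & \mathsf{list}\ (\mathsf{string} \times \mathsf{decl})
\end{array}
$$
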
CH2O Abstract C
Translation to CH2O core C in Coq [Krebbers/Wiedijk, Submitted]
- Named variables to De Bruijn indices
- Disambiguate l-values and r-values
- Sound/complete constant expression evaluation, e.g. in $\tau[e]$
  $$[\![ e ]\!]_{\Gamma,\ \mathsf{getstack}\ P,\ m} = v \quad \text{iff} \quad S(P, e, m) \rightarrow^{*} S(P, v, m)$$
- Simplification of loops, e.g. while(e) s becomes
  catch (loop (if (e') skip else throw 0 ; catch s'))
- Expansion of typedef and enum declarations
- Translation of constants like INT_MIN
- Translation of compound literals

Theorem (Type soundness)
If the translator succeeds, the CH2O core program is well-typed.

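The loop simplification can be mimicked in C itself, with goto standing in for throw and labels for catch. This is a sketch of one reading of the translation, not the translator's output: a jump to the inner label plays the role of continue, and a jump to the outer label the role of break or of a failing loop condition. The function names are made up for this example.

```c
#include <assert.h>

/* Original loop: adds the odd numbers 1, 3, 5, 7 that are reached
   before i exceeds 7 or the limit. */
int sum_odd_while(int limit) {
  int i = 0, sum = 0;
  while (i < limit) {
    i++;
    if (i % 2 == 0) continue;
    if (i > 7) break;
    sum += i;
  }
  return sum;
}

/* Same loop in the shape catch (loop (if (e) skip else throw; catch s)). */
int sum_odd_desugared(int limit) {
  int i = 0, sum = 0;
  for (;;) {                               /* loop */
    if (i < limit) { /* skip */ } else goto outer_catch;
    i++;
    if (i % 2 == 0) goto inner_catch;      /* continue */
    if (i > 7) goto outer_catch;           /* break */
    sum += i;
  inner_catch: ;                           /* end of the inner catch */
  }
outer_catch:                               /* end of the outer catch */
  return sum;
}

/* Run-time check that both versions agree for a given limit. */
void check_equivalence(int limit) {
  assert(sum_odd_while(limit) == sum_odd_desugared(limit));
}
```

Both functions return 16 for limit 20 and 4 for limit 4, which matches the intuition that the two catch blocks give break and continue their meaning.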
Executable semantics

Goal: define $\mathsf{exec} : \mathsf{state} \to \mathcal{P}_{\mathsf{fin}}(\mathsf{state})$

Problems:
1. Decomposition $E[e_1]$ of expressions is non-deterministic:
   $$S(P, E[e_1], m_1) \rightarrow S(P, E[e_2], m_2) \quad \text{if } (e_1, m_1) \rightarrow_h (e_2, m_2)$$
2. Object identifiers $o$ for newly allocated memory are arbitrary:
   $$S(P, (\searrow, \mathtt{local}_{\tau}\ s), m) \rightarrow S((\mathtt{local}_{o:\tau}\ \square) :: P, (\searrow, s), \mathsf{alloc}\ o\ \tau\ m) \quad \text{if } o \notin \mathsf{dom}\ m$$

Solutions:
1. Enumerate all possible decompositions $E[e_1]$
2. Pick a canonical object identifier $\mathsf{fresh}\ m$ for $o$

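The second solution can be sketched as follows. The choice made here, the smallest identifier that is not yet in use, is an assumption of this example; the talk only requires that the choice is canonical, and the function name fresh_index is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns the smallest identifier that does not occur in used[0..n-1].
   The result depends only on the set of used identifiers, which makes
   the allocation step deterministic. */
size_t fresh_index(const size_t *used, size_t n) {
  assert(used != NULL || n == 0);
  for (size_t candidate = 0;; candidate++) {
    bool taken = false;
    for (size_t i = 0; i < n; i++) {
      if (used[i] == candidate) {
        taken = true;
        break;
      }
    }
    if (!taken) {
      return candidate;  /* reached after at most n + 1 candidates */
    }
  }
}
```

Since any other unused identifier would be an equally valid choice in the non-deterministic semantics, completeness of such an executable semantics can only hold up to renaming of identifiers.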
Executable semantics
Example of the Coq code

    Definition cexec (Γ : env Ti) (δ : funenv Ti) (S : state Ti) : listset (state Ti) :=
      let 'State k φ m := S in
      match φ with
      | Stmt ↘ s =>
        match s with
        | skip => {[ State k (Stmt ↗ skip) m ]}
        | goto l => {[ State k (Stmt (↷ l) (goto l)) m ]}
        | ...
        | local{τ} s =>
          let o := fresh (dom indexset m) in (* use canonical object identifier *)
          {[ State (CLocal o τ :: k) (Stmt ↘ s) (mem_alloc Γ o false τ m) ]}
        end
      | Expr e =>
        match maybe_EVal e with
        | Some (Ω,v) => ...
        | None =>
          '(E,e') ← expr_redexes e; (* monadic programming to try each decomposition *)
          match ehexec Γ (get_stack k) e' m with
          | Some (e2,m2) => {[ State k (Expr (subst E e2)) m2 ]}
          | None =>
            match maybe_ECall_redex e' with
            | Some (f, Ωs, vs) =>
              {[ State (CFun E :: k) (Call f vs) (mem_unlock (⋃ Ωs) m) ]}
            | _ => {[ State k (Undef (UndefExpr E e')) m ]}
            end
          end
        end
      | ...

Executable semantics
Soundness and completeness [Krebbers/Wiedijk, Submitted]

Theorem (Soundness)
If $S_2 \in \mathsf{exec}\ S_1$, then $S_1 \rightarrow S_2$.

Definition (Permutation)
We let $S_1 \sim_f S_2$, if $S_2$ is obtained by renaming $S_1$ with respect to $f : \mathsf{index} \to \mathsf{option}\ \mathsf{index}$.

Theorem (Completeness)
If $S_1 \rightarrow^{*} S_2$, then there exists an $f$ and $S_2'$ such that $S_2' \in \mathsf{exec}^{*}\ S_1$ and $S_2' \sim_f S_2$.

Executable semantics
Demo

    $ cat foo.c
    int main(void) {
      int x = 10, *p = &x;
      return *p + *p + *p + *p + *p + *p;
    }
    $ ./ch2o -t foo.c
    .....,.......,,,.....+;!|?*%%$$$$$$$$%%*?|!;:+-,.. ........ "" 60

Conclusion
C11 is not a simple language
Questions
Sources: http://robbertkrebbers.nl/research/ch2o/
http://xkcd.com/371/