Software Tools for Technology Transfer manuscript No. (will be inserted by the editor)
A Low-Level Memory Model and an Accompanying Reachability Predicate Shaunak Chatterjee1 , Shuvendu K. Lahiri2 , Shaz Qadeer2 , Zvonimir Rakamari´ c3 1 2 3
University of California, Berkeley, e-mail:
[email protected] Microsoft Research, e-mail: shuvendu,
[email protected] University of British Columbia, e-mail:
[email protected]
Received: date / Revised version: date
Abstract. Reasoning about program heap, especially if it involves handling unbounded, dynamically heapallocated data structures such as linked lists and arrays, is challenging. Furthermore, sound analysis that precisely models heap becomes significantly more challenging in the presence of low-level pointer manipulation that is prevalent in systems software. The reachability predicate has already proved to be useful for reasoning about the heap in type-safe languages where memory is manipulated by dereferencing object fields. In this paper, we present a memory model suitable for reasoning about low-level pointer operations that is accompanied by a formalization of the reachability predicate in the presence of internal pointers and pointer arithmetic. We have designed an annotation language for C programs that makes use of the new predicate. This language enables us to specify properties of many interesting data structures present in the Windows kernel. We present our experience with a prototype verifier on a set of illustrative C benchmarks.
1 Introduction Static software verification has the potential to improve programmer productivity and reduce the cost of producing reliable software. By finding errors at the time of compilation, these techniques help avoid costly software changes late in the development cycle and after deployment. Many successful tools for detecting errors in systems software have emerged in the last decade [3,22,13, 37,12,1]. These tools can scale to large software systems; however, this scalability is achieved at the price of precision. Heap and heap-allocated data structures are one of the most significant sources of imprecision for these
tools. Fundamental correctness properties, such as control and memory safety, depend on intermediate assertions about the contents of data structures. Therefore, imprecise reasoning about the heap usually results in a large number of annoying false warnings increasing the probability of missing the real errors. This is a significant drawback since studies have shown that many of the systems software code bugs and reported failures with high impact on availability are still related to memory management [11,28,36]. Because of the vast size of available memory in todays computer systems, faithfully representing each memory allocation and access in a static verifier is not going to scale. Therefore, verification tools rely on abstract memory models that trade precision for scalability, and in turn, they define the operational semantics of their programming language with respect to the chosen memory model. We present in this paper a memory model that is precise enough to capture most of the low-level pointer operations, and yet abstracts enough details to enable verification to scale. We also give the respective operational semantics of the C language. The reachability predicate [30] is important for specifying properties of linked data structures. Informally, a memory location v is reachable from a memory location u in a heap if either u = v or u contains the address of a location x and v is reachable from x. Automated reasoning about the reachability predicate is difficult for two reasons. First, reachability cannot be expressed in first-order logic, the input language of choice for most modern and scalable automated theorem provers. Second, it is difficult to precisely specify the update to the reachability predicate when a heap location is updated. Previous work has addressed these problems in the context of a reachability predicate suitable for verifying programs written in high-level languages such as Java and C# [34,26,2,23,8]. This predicate is inadequate for reasoning about low-level software, which com-
2
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate
monly uses programming idioms such as internal pointers (addresses of object fields) and pointer arithmetic to move between object fields. We illustrate this point with several examples in Section 3. The goal of our work is to build a scalable verifier for systems software that can reason precisely about heap-allocated data structures. To this end, in addition to the memory model, we introduce in this paper the accompanying reachability predicate suitable for verifying low-level programs written in C. We describe how to automatically compute the precise update for the described predicate and a method for reasoning about it using automated first-order theorem provers. With recent noticable performance improvements of theorem provers, we believe that the introduced reachablity predicate could be used for suitably extending theoremprover-based static checkers for low-level programs, making them therefore more precise. We have designed a specification language that uses our reachability predicate, allows succinct specification of interesting properties of low-level software, and is conducive to modular program verification. We have implemented a modular verifier for annotated C programs called Havoc (Heap-Aware Verifier fOr C programs). We report on our preliminary encouraging experience with Havoc on a set of illustrative C programs. 2 Related work Havoc is a static assertion checker for annotated C programs in the same style that ESC/Java [20] is a static checker for annotated Java programs, and Boogie [4] is a static checker for Spec# [6] programs.1 However, Havoc is different in that it deals with the low-level intricacies of C and provides reachability as a fundamental primitive in its specification language. The ability to specify reachability properties also distinguishes Havoc from other assertion checkers for C such as SATABS [12], SATURN [37], Calysto [1], and VerifiedC [35]. Traditionally, such tools either overapproximate unbounded data structures, which in turn leads to a lot of false warnings, or simply treat them unsoundly as bounded data structures (often with only one element), which causes real bugs to be missed. The work of McPeak and Necula [29] allows reasoning about reachability, but only indirectly using ghost fields in heap-allocated objects. These ghost fields must be updated manually by the programmer whereas Havoc provides the update to its reachability predicate automatically. There are several verifiers that do allow the verification of properties based on the reachability predicate. TVLA [27] is a verification tool based on abstract interpretation using 3-valued logic [34]. It provides a gen1 Spec# is an extension of C# that provides specification primitives for writing method pre- and postconditions and object invariants.
eral specification logic combining first-order logic with reachability. Recently, an axiomatization of reachability in first-order logic was also added to the system [26]. However, TVLA has mostly been applied to Java programs and, to our knowledge, cannot handle the interaction of reachability with pointer arithmetic. Caduceus [19] is a modular verifier for C programs. It allows the programmer to write specifications in terms of arbitrary recursive predicates, which are axiomatized in an external theorem prover. It then allows the programmer to interactively verify the generated verification conditions in that prover. Havoc only allows the use of a fixed set of reachability predicates but provides much more automation than Caduceus. All the verification conditions generated by Havoc are discharged automatically using SMT (Satisfiability Modulo Theories) provers. Unlike Caduceus, Havoc understands internal pointers and the use of pointer arithmetic to move between fields of an object. There are many approaches to checking of heapmanipulating programs that employ the idea of local reasoning based on the frame rule of separation logic [33, 32]. The frame rule in this context shows that it is sound to look at only a fragment of the input heap when analyzing each program instruction. This important property has been used to speed up and increase scalability of many analyses [38,7,21,18]. While many of these approaches infer loop invariants of list-manipulating programs automatically, they don’t handle low-level features of C programs that are prevalent in systems code, such as pointer arithmetic, at all. Havoc, on the other hand, supports reasoning about linked data structures in the presence of low-level pointer manipulations, while loop invariants currently have to be provided manually. Calcagno et al. have used separation logic to reason about memory safety and absence of memory leaks in low-level code [9]. They perform abstract interpretation using rewrite rules that are tailored for “multiword lists”, a fixed predicate expressed in separation logic. Our approach is more general since we provide a family of reachability predicates, which the programmer can compose arbitrarily for writing richer specifications (possibly involving quantifiers); the rewriting involved in the generation and validation of verification conditions is taken care of automatically by Havoc. Their tool can infer loop invariants but handles procedures by inlining. In contrast, Havoc performs modular reasoning, but does not infer loop invariants.
3 Motivation Consider the two doubly-linked lists shown in Figure 1. The list at the top is typical of high-level object-oriented programs. The linking fields Flink and Blink point to the beginning of the successor and predecessor objects in the list. In each iteration of a loop that iterates over the
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate
3
q
k Flink
Flink
Flink
Blink
Blink
Blink
Flink
Flink
Flink
Blink
Blink
Blink
p p+4
Fig. 1. Doubly-linked lists in Java and C.
linked list, the iterator variable points to the beginning of a list object whose contents are accessed by a simple field dereference. Existing work would allow properties of this linked list to be specified using the two reachability predicates RFlink and RBlink , each of which is a binary relation on object references. For example, RFlink (a, b) holds for object references a and b if a.Flinki = b for some i ≥ 0. The list at the bottom is typical of low-level systems software. Such a list is constructed by embedding a structure LIST ENTRY containing the two fields, Flink and Blink, into the objects that are supposed to be linked by the list. typedef struct _LIST_ENTRY { struct _LIST_ENTRY *Flink; struct _LIST_ENTRY *Blink; } LIST_ENTRY; The linking fields, instead of pointing to the beginning of the list objects, point to the beginning of the embedded linking structure. In each iteration of a loop that iterates over such a list, the iterator variable contains a pointer to the beginning of the structure embedded in a list object. A pointer to the beginning of the list object is obtained by performing pointer arithmetic captured with the following C macro. #define CONTAINING_RECORD(a, T, f) \ (T *) ((int)a - (int)&((T *)0)->f)
This macro expects an internal pointer a to a field f of an object of type T and returns a typed pointer to the beginning of the object. There are two good engineering reasons for this ostensibly dangerous programming idiom. First, it becomes possible to write all list manipulation code for operations such as insertion and deletion separately in terms of the type LIST ENTRY. Second, it becomes easy to have one object be a part of several different linked lists; there is a field of type LIST ENTRY in the object corresponding to each list. For these reasons, this idiom is common both in the Windows and the Linux operating system2 . Unfortunately, this programming idiom cannot be modeled using the predicates RFlink and RBlink described earlier. The fundamental reason is that these lists may link objects via pointers at a potentially non-zero offset into the objects. Different data structures might use different offsets; in fact, the offset used by a particular data structure is a crucial part of its specification. This is in stark contrast to the first kind of linked lists in which the linking offset is guaranteed to be zero. The crucial insight underlying our work is that for analyzing low-level software, the reachability predicate must be a relation on pointers rather than object references. Therefore, we introduce an integer-indexed set of binary reachability predicates: for each integer n, the 2
In Linux, the CONTAINING RECORD macro corresponds to the list entry macro.
4
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate
predicate Rn is a binary relation on the set of pointers. Suppose n is an integer and p and q are pointers. Then Rn (p, q) holds if and only if either p = q, or recursively Rn (∗(p+ n), q) holds, where ∗(p+ n) is the pointer stored in memory at the address obtained by incrementing p by n. Our reachability predicate captures the insight that in low-level programs a list of pointers is constructed by performing an alternating sequence of pointer arithmetic (with respect to a constant offset) and memory lookup operations. For example, let p be the address of the Flink field of an object in the linked list at the bottom of Figure 1. Then, the forward-going list is captured by the pointer sequence p, ∗(p + 0), ∗(∗(p + 0) + 0), . . . . Similarly, assuming that the size of a pointer is 4, the backward-going list is captured by the pointer sequence p, ∗(p + 4), ∗(∗(p + 4) + 4), . . . . Our reachability predicate is a generalization of the existing reachability predicate and can just as well describe the linked list at the top of Figure 1. Suppose the offset of the Flink field in the linked objects is k and q is the address of the start of some object in the list. Then, the forward-going list is captured by q, ∗(q + k), ∗(∗(q + k) + k), . . . and the backward-going list is captured by q, ∗(q + k + 4), ∗(∗(q + k + 4) + k + 4), . . . . 3.1 Example We illustrate the use of our reachability predicate in program verification with the example in Figure 2. The example has a type A and a global structure g with a field a. The field a in g and the field link in the type A have the type LIST ENTRY, which was defined earlier. These fields are used to link together in a circular doubly-linked list the object g and a set of objects of type A. The field a in g is the dummy head of this list. For an example of such a heap structure see Figure 3. The procedure list iterate iterates over this list, setting the data field of each list element to 42. Except for verifying the safety of each memory access in list iterate, we would also like to verify two additional properties. First, the only parts of the callervisible state modified by list iterate are the data fields of the list elements. Second, the data field of each list element is 42 when list iterate terminates. To prove these properties on list iterate, it is crucial to have a precondition stating that the list of objects linked by the Flink field of LIST ENTRY is circular. To specify this property, we extend the notion of wellfounded lists, first described in an earlier paper [23], to
our new reachability predicate. The predicate Rn is wellfounded with respect to a set BS of blocking pointers if for all pointers p, the sequence ∗(p + n), ∗(∗(p + n) + n), . . . contains a pointer in BS. This member of BS is called the blocker of p with respect to the offset n and is denoted by Bn [p]. Typical members of BS include pointer values that indicate the end of linked lists, e.g., the null pointer or the head &g.a of the circular lists in our example. Our checker Havoc enforces a programming discipline associated with well-founded lists. Havoc provides an auxiliary variable BS whose value is a set of pointers and allows program statements to add or remove pointers from BS. Further, each heap update in the program is required to preserve the well-foundedness of Rn with respect to each offset n of interest. The first precondition of list iterate uses the notion of well-foundedness to express that &g.a is the head of a circular list. In this precondition, B(&g.a,0) refers to B0 [&g.a]. We use B0 to specify that the circular list is formed by the Flink field, which is at offset 0 within LIST ENTRY. The second precondition illustrates how facts about an entire collection of pointers are expressed in our specification language. In this precondition, the expression list(g.a.Flink,0) refers to the finite and non-empty set of pointers in the sequence g.a.Flink, ∗(g.a.Flink + 0), . . . upto but excluding the pointer B0 (g.a.Flink). In Havoc, we represent a pointer as a pair comprised of an object reference and an integer offset into the object, and the program memory is a map from pointers to pointers. Therefore, the function Off retrieves the offset (or the second component) from a pointer. This precondition states that the offset of each pointer in list(g.a.Flink,0), excluding the dummy head, is equal to 4, the offset of the field sequence link.Flink in the type A. The third precondition uses the function Obj, which retrieves the object reference (or the first component) from a pointer. This precondition says that the object of each pointer, excluding the dummy head, in list(g.a.Flink,0) is different from the object of the dummy head. The modifies clause illustrates yet another constructor of a set of pointers provided by our language. If S is a set of pointers, then decr(S, n) is the set of pointers obtained by decrementing each pointer in S by n. The modifies clause captures the update of the data field at relative offset −4 from the members of list(g.a.Flink,0). The postcondition of the procedure introduces the operator deref, which returns the content of the memory at a pointer address. This postcondition says that the value of the data field of each object in the list, excluding the dummy head, is 42.
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate
5
typedef struct { int data; LIST_ENTRY link; } A; struct { LIST_ENTRY a; } g; requires BS(&g.a) && B(&g.a, 0) == &g.a requires forall(x, list(g.a.Flink, 0), x == &g.a || Off(x) == 4) requires forall(x, list(g.a.Flink, 0), x == &g.a || Obj(x) != Obj(&g.a)) modifies decr(list(g.a.Flink, 0), 4) ensures forall(x, list(g.a.Flink, 0), x == &g.a || deref(x-4) == 42) void list_iterate() { LIST_ENTRY *iter = g.a.Flink; while (iter != &(g.a)) { A *elem = CONTAINING_RECORD(iter, A, link); elem->data = 42; iter = iter->Flink; } } Fig. 2. Motivating example.
elem iter data
data
data
a.Flink
link.Flink
link.Flink
link.Flink
a.Blink
link.Blink
link.Blink
link.Blink
g
Fig. 3. Example of an input structure for our motivating example.
Using loop invariants provided by us (not shown in the figure), Havoc is able to verify that the implementation of this procedure satisfies its specification. Note that in the presence of potentially unsafe pointer arithmetic and casts, it is nontrivial to verify that the heap update operation elem->data := 42 does not change the linking structure of the list. Since Havoc cannot rely on the static type of the variable elem, it must prove that the offset of elem before the operation is 0 and therefore the operation cannot modify either linking field.
4 Operational semantics of C Our semantics for C programs depends on three fundamental types, the uninterpreted type ref of object references, the type int of integers, and the type ptr = ref × int of pointers. In Havoc, each variable from a C program, regardless of its static type, contains a pointer value. A pointer is a pair containing an object reference and an integer offset. An integer value is encoded as a pointer value whose first component is the special constant null of type ref. The constructor function Ptr : ref × int → ptr constructs a pointer value from its components. The selector functions Obj : ptr → ref
and Off : ptr → int retrieve the first and second component of a pointer value, respectively. The heap of a C program is modeled using two map variables, Mem and Alloc, and a map constant Size. The variable Mem maps pointers to pointers and intuitively represents the contents of the memory at a pointer location. The variable Alloc maps object references to the set {UNALLOCATED, ALLOCATED, FREED} and is used to model memory allocation. The constant Size maps object references to positive integers and represents the size of the object. The procedure call malloc(n) for allocating a memory buffer of size n returns a pointer Ptr(o, 0) where o is an object such that Alloc[o] = UNALLOCATED before the call and Size[o] ≥ n. The procedure modifies Alloc[o] to be ALLOCATED. The procedure call free(p) for freeing a memory buffer whose address is contained in p requires that Alloc[Obj(p)] == ALLOCATED and Off(p) == 0 and updates Alloc[Obj(p)] to FREED. The full specification of malloc and free is given in Figure 4. Currently, Havoc’s memory model understands only word-aligned memory accesses. For example, writing a 4-byte integer value to the memory location Ptr(o, 20) is not going to affect the value stored at the memory location Ptr(o, 21). Similarly, reading a byte from the memory location Ptr(o, 21) is not going to return the
6
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate function PLUS(ptr, ptr) returns ptr; axiom (forall x,y:ptr :: (Obj(x) == null ==> PLUS(x,y) == Ptr(Obj(y), Off(x)+Off(y))) && (Obj(y) == null ==> PLUS(x,y) == Ptr(Obj(x), Off(x)+Off(y))) && (Obj(x) != null && Obj(y) != null ==> Obj(PLUS(x,y)) == null))
procedure malloc(n: ptr) returns new:ptr requires Obj(n) == null && 0 < Off(n) modifies Alloc ensures old(Alloc)[Obj(new)] == UNALLOCATED ensures Alloc[Obj(new)] == ALLOCATED ensures Off(new) == 0 ensures Off(n) MINUS(x,y) == Ptr(Obj(x), Off(x)-Off(y))) && (Obj(y) != null && Obj(x) == Obj(y) ==> MINUS(x,y) == Ptr(null, Off(x)-Off(y))) && (Obj(y) != null && Obj(x) != Obj(y) ==> Obj(MINUS(x,y)) == null))
procedure free(p: ptr) requires Alloc(Obj(p)) == ALLOCATED && Off(p) == 0 modifies Alloc ensures alloc[Obj(p)] != UNALLOCATED ensures alloc[Obj(p)] != ALLOCATED ensures (forall o:ref :: o == Obj(p) || old(Alloc)[o] == Alloc[o])
function LT(ptr, ptr) returns bool; axiom (forall x,y:ptr :: LT(x,y) Off(MINUS(y,x)) > 0)
Fig. 4. Specification of procedures malloc and free that are used to model memory allocation.
C1
Fig. 5. Specification of functions PLUS, MINUS, and LT that are used to model arithmetic and comparison operations on the type ptr.
typedef struct { int x; int y[10]; } DATA;
}
B1 B2 B3 B4 B5 B6 B7 B8 B9 B10 B11
}
void init(int *in, int size, int *out) { int i; i = 0; while (i < size) { in[i] = i; *out = *out + i; i++; } }
B12 B13 B14 B15 B16 B17 B18 B19 B20 B21
procedure init(in:ptr, size:ptr, out:ptr) { var i:ptr; i := Ptr(null,0); while (LT(i, size)) { Mem[PLUS(in, Ptr(null,Off(i)*4))] := i; Mem[out] := PLUS(Mem[out], i); i := PLUS(i, Ptr(null,1)); } }
C2 C3 C4 C5 C6 C7 C8 C9 C10 C11 C12
DATA *create() { int a;
C13 C14 C15 C16 C17 C18 C19 C20 C21 C22
DATA *d = (DATA *) malloc(sizeof(DATA)); init(d->y, 10, &a); d->x = a; return d;
procedure create() returns d:ptr { var a:ptr; call a := malloc(Ptr(null,4)); call d := malloc(Ptr(null,44)); call init(PLUS(d, Ptr(null,4)), Ptr(null,10), a); Mem[PLUS(d, Ptr(null,0))] := Mem[a]; call free(a);
Fig. 6. Translation of a simple C example on the left into the slightly simplified (to make it more readable) BoogiePL code on the right.
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate
7
second byte of the integer that was written to Ptr(o, 20), but rather an unconstrained value. We are planning to address this deficiency in the future. Havoc takes an annotated C program and translates it into a BoogiePL [15] program. BoogiePL has been designed to be an intermediate language for program verification tools that use automated theorem provers. This language is simple and has well-defined semantics. The operational semantics of C, as interpreted by Havoc, is best understood by comparing a C program with its BoogiePL translation. Figure 6 shows two procedures, create and init, on the left and their translations on the right. The example uses the C struct type DATA defined on line C1. Note that variables of both static type int and int* in C are translated uniformly as variables of type ptr. The translation of the first argument d->y of the call to init on line C7 shows that we treat field accesses and pointer arithmetic uniformly. Since the array field y is at an offset 4 in DATA, we treat d->y as d+4 on line B6. Array accesses are also translated using pointer arithmetic. For instance, array access on line C18 is translated as in+i*4 on line B17 since the size of each array element is 4. The translation uses the function PLUS to model pointer arithmetic and the function LT on line B16 to model arithmetic comparison operations on the type ptr. The definitions of these functions are given in Figure 5. The example also shows how we handle the & operator. In the procedure create, the address of the local variable a is passed as an out-parameter to the procedure init. Our translation handles this case by allocating a on the heap on line B3. Then, the C expression &a on line C7 is translated as a on line B7, while the expression a on line C9 is translated as the heap access Mem[a] on line B8. Note that our translator allocates a static variable on the heap only if the program takes the address of that variable or if the type of the variable is a structure or union. We allocate all structures on the heap. For example, there is no heap allocation for the local variable i in the procedure init. To prevent access to the heap-allocated object corresponding to a local variable of a procedure, it is freed at the end of the procedure. Therefore, the translation freed the local variable a on line B9.
and f i+1 (n, u) = Mem[f i (n, u)+n] for all i > 0. Then Mem is well-founded with respect to the set of blocking pointers BS and offset n if for all u ∈ ptr, there is i > 0 such that f i (n, u) ∈ BS. If a heap is well-founded with respect to BS and n, then the function idx n maps a pointer u to the least i > 0 such that f i (n, u) ∈ BS. Using these concepts, we now define for each n ∈ int, a predicate Rn ⊆ ptr × ptr and a function Bn : ptr → ptr.
5 Reachability and pointer arithmetic
6 Annotation language
We now give the formal definition of our new reachability predicate in terms of the operational semantics of C as interpreted by Havoc. As in our previous work [23], we define the reachability predicate on well-founded heaps. Let the heap be represented by the function Mem : ptr → ptr and let BS ⊆ ptr be a set of pointers. We define a sequence of functions f i : int× ptr → ptr for i ≥ 0 as follows: for all n ∈ int and u ∈ ptr, we have f 0 (n, u) = u
Our annotation language has three components: basic expressions that evaluate to pointers, set expressions that evaluate to sets of pointers, and formulas that evaluate to boolean values. The syntax for these expressions is given in Figure 7. The set of basic expressions is captured by Expr . The expression addr(x) represents the address of the variable x. The expression x represents the value of x in the
Rn [u, v] ≡ ∃i. 0 ≤ i < idx n (u) ∧ v = f i (n, u) Bn [u] ≡ f idx n (u) (n, u) Suppose a program performs the operation Mem[x] := y to update the heap. Then Havoc performs the most precise update to the predicate Rn and the function Bn by automatically inserting the following code just before the operation: assert(Rn [y, x − n] ⇒ BS[y]) Bn := λ u : ptr. Rn [u, x − n] ? (BS[y] ? y : Bn [y]) : Bn [u] Rn := λ u, v : ptr. Rn [u, x − n] ? (Rn [u, v] ∧ ¬Rn [x − n, v]) ∨ v = x − n ∨ (¬BS[y] ∧ Rn [y, v]) : Rn [u, v] The assertion enforces that the heap stays well-founded with respect to the blocking set BS and the offset n. It is inserted for each offset n of interest in the program. The value of Bn [u] is updated only if x − n is reachable from u and otherwise remains unchanged. Similarly, the value of Rn [u, v] is updated only if x−n is reachable from u and otherwise remains unchanged. These updates are generalizations of the updates provided in our earlier paper [23] to account for pointer arithmetic. We note that the ability to provide such updates as described above guarantees that if a program’s assertions —preconditions, postconditions, and loop invariants— are quantifier-free, then its verification condition is quantifier-free as well. This property is valuable because the handling of quantifiers is typically the least complete and efficient aspect of all theorem provers that combine first-order reasoning with arithmetic.
8
Shaunak Chatterjee et al.: A Low-Level Memory Model and an Accompanying Reachability Predicate n∈ e∈
int Expr ::= n | x | addr(x) | e + e | e - e | deref(e) | block(e, n) | old(x) | old deref(e) | old block(e, n) S∈ Set ::= {e} | BS | list(e, n) | old list(e, n) | array(e, n, e) [b] φ ∈ Formula ::= alloc(e) | old alloc(e) | Obj(e) == Obj(e) | Off(e) < Off(e) | in(e, S) | ! φ | φ && φ | forall(x, S, φ) C ∈ CmpdSet ::= S | incr(C, n) | decr(C, n) | deref(C) | old deref(C) union(C, C) | intersection(C, C) | difference(C, C) Fig. 7. Core annotation language. Syntactic sugar, that allows writing many common idioms succinctly and more conviniently, is added on top of it.
[|n|] = Ptr(null, n) Mem[bpl x], if address of x is taken [|x|] = otherwise bpl x, [|addr(x)|] = bpl x [|e1 + e2 |] = Ptr(Obj([|e1 |]), Off([|e1 |]) + Off([|e2 |])) [|e1 - e2 |] = Ptr(null, Off([|e1 |]) - Off([|e2 |])) [|deref(e)|] = Mem[[|e|]] [|block(e, n)|] = Bn [[|e|]] [|old(x)|] = old([|x|]) [|old deref(e)|] = old(Mem)[[|e|]] [|old block(e, n)|] = old(Bn )[[|e|]] [|alloc(e)|] [|old alloc(e)|] [|Obj(e1 ) == Obj(e2 )|] [|Off(e1 ) < Off(e2 )|] [|! φ|] [|φ1 && φ2 |] [|forall(x, S, φ)|]
= = = = = = =
Alloc[Obj([|e|])] == ALLOCATED old(Alloc)[Obj([|e|])] == ALLOCATED Obj([|e1 |]) == Obj([|e2 |]) Off([|e1 |]) < Off([|e2 |]) ! [|φ|] [|φ1 |] && [|φ2 |] (forall x : ptr :: ! [|in(x, S)|] || [|φ|])
[|in(e, {e′ })|] [|in(e, BS)|] [|in(e, list(e′ , n))|] [|in(e, old list(e′ , n))|] [|in(e, array(e1 , n, e2 ))|] [|in(e, incr(C, n))|] [|in(e, decr(C, n))|] [|in(e, deref(C))|] [|in(e, old deref(C))|] [|in(e, union(C1 , C2 ))|] [|in(e, intersection(C1 , C2 ))|] [|in(e, difference(C1 , C2 ))|]
= = = = = = = = = = = =
[|e|] == [|e′ |] BS[[|e|]] Rn [[|e′ |], [|e|]] old(Rn )[[|e′ |], [|e|]] (exists i : int :: 0