Modular Enforcement of Information Flow Policies in Data Structures

Gordon Stewart Princeton University Princeton, NJ, USA

Anindya Banerjee Aleksandar Nanevski IMDEA Software Institute Madrid, Spain {anindya.banerjee, aleks.nanevski}@imdea.org

Abstract—Standard implementations of common data structures such as hash tables can leak information, e.g. the operation history, to attackers with later access to a machine’s memory. This leakage is particularly damaging whenever the history of operations performed on a data structure must remain secret, such as in voting machines. We show how unique representation—the requirement that a data structure have canonical machine representations—can be used to perform modular verification of information flow policies in programs that compose data structures with their clients. We present a compositional verification system based on Relational Hoare Type Theory (RHTT) that uses unique representation to enforce end-to-end security guarantees such as noninterference for such programs. We validate our system and technique with examples drawn from arrays, multisets, hash tables, and a medical database application. The system, theorems, and examples have all been verified in Coq.

I. INTRODUCTION

Data structures pose serious problems for information flow analyses. The most serious is the potential mismatch between a data structure’s interface, given by its method specifications, and the implementation of this interface in memory: a data structure may store sensitive information about usage patterns or other private data, e.g., in log files or in the linking structure, that is hidden from the client by the interface but which can be recovered by an attacker with later access to the memory. Such access is especially problematic in the case of vote storage units since a leak of the ballot-insertion order can imperil voter privacy [25]. In cryptoprocessors [3], evidence of secret keys in the structure of a hash table can expose ATMs and smartcards to attack. Previous work on language-based enforcement of confidentiality policies [13, 26, 34] can prevent such leaks, but at the cost of a serious loss of precision resulting in the rejection of secure and useful programs. No previous static analysis that we are aware of enables reasoning about the information flow behavior of data structures and their clients in a manner that is both modular and precise.

In this paper, we present initial results on the use of uniquely represented (UR) data structures [17, 18] for modular enforcement of information flow policies in Relational Hoare Type Theory (RHTT, [27]), a novel language and verification system capable of expressing rich information flow and access control policies via dependent types. Unique representation (the requirement that a data structure’s implementation store only its client-visible state, and no more) is key to decoupling information flow proofs of data structure clients from the details of data structure implementations, and thus to achieving modularity.

To see why, consider what happens when a client, Alice, attempts to prove that the program insert(h, T); insert(l, T); remove(h, T) on hash table T, confidential key h, and public key l purges the confidential key h from the system. One way to state this policy is in the terminology of the classic 2-safety formulation of noninterference: the idea that public outputs in two runs of the program must agree even when confidential inputs do not.¹ Here Alice’s confidential inputs are the values of h in the two runs; her public inputs are the key l and the initial state of the hash table T. To prove that h is effectively purged, she must show that the hash table T′ that results after inserting h and l, then removing h has the same memory image as the hash table that results after inserting l alone. But to do so, she must inspect the hash table implementation and thus break its interface abstraction. Indeed, what Alice finds is that even though both hash tables contain the same key set, namely {l}, T′ may have hashed l to a different location due to a hash collision with h, thus invalidating her policy. Unique representation prevents such leaks, and ensures that Alice can reason with respect to the abstraction provided by the data structure’s interface, rather than with respect to its implementation.

¹ Timing and other attacks over covert channels are outside the scope of this paper.

The main contributions of this work are the following.

1) Unique representation as abstraction. We identify unique representation as an important abstraction principle for reasoning about information flow. The use of UR data structures gives rise to noninterference proofs that are independent of the data structure implementations, and thus compositional. We justify these claims by first implementing and verifying in RHTT several common data structures, including multisets (Sec. IV) and hash tables (Sec. V), and then demonstrating how the libraries can be used in proofs of client programs.

2) Unique representation in RHTT. We present a verification system for information flow based on these ideas that integrates unique representation with RHTT’s support for specification and enforcement of conditional, state-based access control and noninterference policies.

3) Verified UR filter hash tables. We describe a new UR variant of filter hash tables [16], and prove its correctness and unique representation in RHTT (Sec. V). As in standard probing hash tables, naïve treatment of collisions in filter hash tables leaks information since a private or high security item in the table can change the position at which a public item is placed. In our UR variant, we enforce a canonical ordering on items during insertions and deletions that ensures the absence of such leaks.

4) Medical database application. Finally, we present an implementation in RHTT of a medical database application based on that of Borgström et al. [11]. We use this extended example to illustrate the usefulness of unique representation in the enforcement of erasure-dependent access control policies (Sec. III and Sec. VI). The medical database application is described in detail in Appendix A.

All of our proofs are machine-checked in Coq, and are available at http://www.cs.princeton.edu/~jsseven/papers/ur.

To the best of our knowledge this paper is the first to consider verification of data structures and their clients with respect to realistic security policies such as information erasure. Although Golovin’s PhD dissertation [17] suggests applications of UR data structures to privacy, the formal connections to security, and to erasure policies in particular, were not developed.

The rest of the paper is organized as follows: The next section provides an introduction to Relational Hoare Type Theory (RHTT) and simple erasure policies. Sec. III introduces erasure-dependent access control policies as motivation for UR data structures. Sec. IV defines unique representation formally and describes our implementations of UR arrays and bounded multisets. Sec. V presents a new, UR variant of filter hash tables.
The final sections of the paper revisit erasure-dependent access control (Sec. VI), discuss related work (Sec. VII), and conclude with pointers to future work (Sec. VIII).
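The leak in Alice's example from the introduction can be reproduced in a few lines of Python. This is only an illustrative sketch, not part of the paper's RHTT development: the class `NaiveProbingSet`, the toy hash `key % size`, the tombstone marker, and the concrete keys `h = 1` and `l = 9` are all assumptions, chosen so that `h` and `l` collide.

```python
EMPTY = None
TOMBSTONE = "<deleted>"  # marker left behind by removals (assumed design)


class NaiveProbingSet:
    """Fixed-size linear-probing set. It is NOT uniquely represented."""

    def __init__(self, size=8):
        self.cells = [EMPTY] * size

    def _probe(self, key):
        start = key % len(self.cells)  # toy hash function (assumption)
        for step in range(len(self.cells)):
            yield (start + step) % len(self.cells)

    def insert(self, key):
        free = None
        for i in self._probe(key):
            cell = self.cells[i]
            if cell == key:
                return  # already present
            if cell is TOMBSTONE and free is None:
                free = i  # remember the first reusable slot
            elif cell is EMPTY:
                if free is None:
                    free = i
                break
        if free is None:
            raise RuntimeError("table full")
        self.cells[free] = key

    def remove(self, key):
        for i in self._probe(key):
            cell = self.cells[i]
            if cell is EMPTY:
                return  # key is not in the table
            if cell == key:
                self.cells[i] = TOMBSTONE
                return

    def keys(self):
        return {c for c in self.cells if c is not EMPTY and c is not TOMBSTONE}


h, l = 1, 9  # confidential and public key; both hash to slot 1

with_secret = NaiveProbingSet()
with_secret.insert(h)
with_secret.insert(l)
with_secret.remove(h)

without_secret = NaiveProbingSet()
without_secret.insert(l)

same_interface_view = with_secret.keys() == without_secret.keys()
same_memory_image = with_secret.cells == without_secret.cells
```

Both tables expose the key set {l} through their interface, yet their memory images differ: `l` sits one slot further along in the table that once held `h`, and a tombstone marks where `h` was. An attacker who reads memory can therefore tell that a colliding key was once present.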

II. BACKGROUND: RHTT

We briefly recapitulate the main components of the RHTT framework introduced by Nanevski et al. [27]. Fundamentally, RHTT is based on the following aspects of dependent type theory: dependent function types, inductive types and module systems. Dependent function types describe how a function body depends on its arguments; inductive types are needed to specify data structures such as lists, trees, graphs, etc.; module systems including abstract types and predicates are needed for information hiding. Nanevski et al. show that these aspects jointly give RHTT the power to specify and verify flexible security policies: for example, one can specify the higher-order erasure-dependent access policy “Alice grants Bob the right to read her salary provided Bob proves that his code will erase all copies of her salary before termination.” In Sec. III, we adapt a medical records system from Borgström et al. [11] to incorporate erasure policies such as the one above. First we provide basic intuitions.

Noninterference, types, specifications: The specification of flexible security policies crucially depends on a relational specification of noninterference (NI) which says low outputs of a computation are independent of high inputs [14]. It is independence that is a relational property: Consider a function f : A² → A², where A² = A × A. Let e.1 and e.2 denote resp. the first and the second component of the ordered pair e. Then, mathematically, f’s first output is independent of f’s second argument iff

    ∀x₁ x₂ y₁ y₂. x₁ = x₂ → f(x₁, y₁).1 = f(x₂, y₂).1

In other words, in two runs of f, equal x inputs lead to equal f(x, y).1 outputs. This relational statement of independence can be viewed as a definition of NI in terms of f alone [2, 7]. The consequence is this: rather than employ security lattices and orderings on variables to determine what is low security and what is not [5, 15], we consider inputs and outputs related by equality in the two runs of f above as low (x and f(x, y).1 above). The unconstrained values (y and f(x, y).2) are implicitly considered high. These ideas can be lifted to security lattices with multiple levels (see discussion in [27]), but we forgo such development here.

This reading of low security immediately lends itself to a type-theoretic interpretation and thus motivates RHTT’s use of dependent type theory. The primary observation is that if information that x is low is absent at a module interface then x is possibly high. But this is precisely a notion of information hiding that is explained using standard constructs of type theory such as abstract types and abstract predicates [24].

The type STsec A (p, q) specifies heap-manipulating, potentially diverging RHTT computations. The type A gives the return type of an RHTT computation (int, string, etc.) whereas p and q define the program’s precondition and postcondition respectively, as in Floyd-Hoare logic. Following Separation Logic [30], the precondition p is a predicate on heaps that defines the subset of memories in which it is safe to run the command. In particular, if a program typechecks in RHTT with precondition p, then every heap location accessed by the program, besides those allocated by the program itself, will be accessible in every heap described by p. In order to specify relational properties such as NI, the postcondition q : A² → heap² → heap² → prop relates pairs of return values (y₁, y₂) of type A² with pairs of initial heaps (i₁, i₂) and pairs of final heaps (m₁, m₂). Here heap is the type of heaps, modeled semantically as finite partial maps from locations to values, and prop can roughly be read as bool. Intuitively, the relational postcondition of the (termination-insensitive) STsec judgment says that if two runs from initial states i₁ and i₂ terminate in final states m₁ and m₂, returning values y₁ and y₂ as results, then the triple ((y₁, y₂), (i₁, i₂), (m₁, m₂)) is in q.

NOTATION: Examples in the sequel will often use the notation, e.g., yy, ii, mm to stand for pairs (y₁, y₂), (i₁, i₂), (m₁, m₂) respectively.

An example erasure policy: Consider the following specification of an erasure policy, and a program conforming to the specification, adapted from Hunt and Sands [19], both presented here in a somewhat stylized RHTT notation.² We consider two integer heap pointers x and y that are arguments to the program. The policy is: the program must erase the contents of y to high (i.e., by potentially caching the value into a confidential heap pointer) before termination. In RHTT, we express this policy with the type

    Πx y:ptr. STsec [ ] unit (p, q)

where the precondition, p, is

    fun i. ∃u v:nat. i = (x ↦ u • y ↦ v)

and the postcondition, q, is

    fun rr ii mm. ∀uu vv. ii = ((x, x) ⤇ uu •• (y, y) ⤇ vv) → u₁ = u₂ →
      ∃uu′ vv′. mm = ((x, x) ⤇ uu′ •• (y, y) ⤇ vv′) ∧ v′₁ = v′₂

(Recall our notation: uu is (u₁, u₂), vv′ is (v′₁, v′₂).) The above STsec type is an instance of a dependent function type; it describes functions with arguments x and y of type ptr (pointers), as indicated by the variables following the Π symbol. Such functions produce computations of type STsec with the listed pre- and postconditions; the type is dependent, because the arguments appear in the assertions in order to describe the policy. We will explain [ ] momentarily.

The precondition p states that the program must start with an initial heap i containing two pointers x and y, with appropriately-typed contents. Here x ↦ u is the singleton heap containing only the location x storing value u; while • is disjoint heap union. As i is the disjoint union of smaller singleton heaps, there is no aliasing between the two pointers x, y.

The postcondition binds over three variables rr : unit² and ii, mm : heap², which are, respectively, the pair of return values, the pair of initial heaps, and the pair of final heaps for the two runs of the program.³ The notation zz ⤇ ww denotes a pair of singleton heaps (z₁ ↦ w₁, z₂ ↦ w₂). Similarly •• lifts • to pairs; that is, jj •• kk = (j₁ • k₁, j₂ • k₂). Thus, the postcondition q states that if the contents of x in the two initial heaps are equal (hence low), then the contents of y are low in the end. This is only possible if the value stored in y is erased, by overwriting it with a constant, or with some other value computed from the initial (low) contents of x, but not y, as the initial contents of y are not declared low. One program satisfying the above policy (i.e. type) is the following.

    fun x y. do (u ← read x;
                 v ← read y;
                 if v = 0 then write x (u + 1) else ret ();
                 write y (u + 2))

In this program, in addition to purely functional constructs such as anonymous functions fun, we use side-effecting primitives such as write x n, which stores the value of n into the location x; read x, which returns the contents of x; and x ← e1; e2, which sequentially composes e1 and e2, substituting the return value of e1 for x in e2. The above program satisfies the policy because the value stored into y at the end is low (the initial contents of x incremented by 2). The ending contents of x are not low, as they depend on the initial contents of y. The policy, which does not specify the ending security level of the contents of x, allows this flow. Indeed, the program illustrates that memory cells may change their security level during the program run, a defining characteristic of our semantic definition of NI.

While the STsec type of the above program classifies the security of the contents of x and y, it doesn’t classify the pointer addresses themselves, as the latter requires discerning the address names in two different runs. RHTT provides for this possibility by endowing the STsec type with a local context, which is a list of types of the variables that are considered local to the computation. This list is the empty list, [ ], in the first STsec type we presented. To reflect that the pointer y is high at the beginning, in addition to having high contents, we change the type as follows.

    Πx:ptr. STsec [ptr] unit
      (fun y i. ∃u v:nat. i = (x ↦ u • y ↦ v),
       fun yy rr ii mm. ∀uu vv. ii = ((x, x) ⤇ uu •• yy ⤇ vv) → u₁ = u₂ →
         ∃uu′ vv′. mm = ((x, x) ⤇ uu′ •• yy ⤇ vv′) ∧ v′₁ = v′₂)

Now y is pushed inside the precondition, and similarly, yy is pushed into the postcondition, so that y₁ and y₂ refer to the addresses of y in two different runs (contrast with (y, y) in the postcondition of the original type). The program syntax changes too, as the local variables now have to be bound within the scope of do. In other words, our program now looks like:

    fun x. do (fun y. u ← read x; ···)

In general, variables bound by Π are considered low, whereas variables declared in the local context may be low or high, depending on the policy described by the type.

² Our prototype in Coq uses combinators to avoid some explicit variable bindings, but we elide those here as they are just a technical device. Likewise, although our prototype uses large footprint specifications, we present specifications in this paper using small footprints, for readability.

³ Since our postconditions relate input to output heaps, they are written in VDM style [9]. As pointed out by Kleymann [20], this style is essential for languages with procedures, such as RHTT.

III. MOTIVATION: CONDITIONAL ACCESS AND ERASURE

Erasure-dependent access policies grant clients the right to access confidential data provided the clients first prove they will eventually erase this data (and any derived secrets). UR data structures facilitate such erasure proofs by preventing inadvertent leaks through the memory representations of data structures maintained by the clients. In this section, we explore this central idea by surveying a medical database application we built in RHTT, based on a workflow of Borgström et al. [11], which supports erasure-dependent policies in addition to standard role-based access control. We do not describe the full system here due to space constraints (cf. Appendix A for a more complete description)

but instead focus on how the verification effort required to prove erasure policies, and noninterference policies generally, grows as the state maintained by clients evolves from local variables or scalar pointers to data structures such as arrays or hash tables. As complexity increases, our experience has shown that the use of unique representation as an abstraction principle greatly simplifies information flow proofs, and in some cases even enables proofs that were previously infeasible. In the examples that follow, we go into some detail in order to describe the application extensions required to support expressive higher-order access control policies like the ones we have just described. The big picture, which we return to again at the end of this section, is that unique representation facilitates noninterference proofs by drawing a clear abstraction barrier between data structures and their clients.

A. Medical Database Application

We would like to enforce the policy: clients may read confidential patient records, but only if they erase any data derived from these records before program exit. To conservatively enforce that a client does not steal patient records, the medical records database could require that clients deallocate all local state on function return, and thereby ensure that clients steal nothing, but this is prohibitively expensive. If a client makes many thousands of calls to the database, allocating and then deallocating local state on each call becomes infeasible. In general, any client local state not derived from the confidential data should be allowed to escape a call to the database. In RHTT, higher-order STsec types allow us to formulate such permissive policies.

Fig. 1 gives the database-specific definitions required to support erasure-dependent access control: namely, the function endorse; while Fig. 2 presents a client that calls endorse in order to access confidential patient medical records. The endorse function is parametric in the types client, defining the client’s local state, and G, defining the values stored in the client’s local state, as well as ccmp and ccshape, and thus is quite general. The function ccmp defines how the values given by G evolve after each call to the database. The relational predicate ccshape defines the heap representation, over two runs, of the client’s local state client² and local values G². The body of endorse’s specification puts everything together. It takes as argument an STsec computation of type R which requires can_read_emr permission (the “input”), and produces a computation of type S that does not (the “output”), but which is identical in all other respects. Here can_read_emr is an abstract predicate, or token, that is required in order to call the database function read_emr, which reads a patient’s confidential electronic medical record, or emr. The medical database ensures that can_read_emr is introduced only through appropriately sequenced calls to the functions request_consent (by a doctor user) and give_consent (by the appropriate patient). Appendix A describes how we enforce temporal access control policies such as this one.

    endorse : ∀ (client : type) (G : type)
                (ccshape : client² → G² → heap² → prop) (ccmp : G → G).
      // input: STsec program requiring can_read_emr (this type is named R)
      (STsec [database, client, user] int
         (fun db c pat i.
            ∃(k:G) (u:user) (j t:heap).
              can_read_emr (get_uid u) (get_uid pat) ∧
              i = j • t ∧
              j ∈ shape db u ∧ t ∈ cshape c k,
          fun ddb cc ppat yy ii mm.
            ∀kk uu jj tt. ii = jj •• tt →
              jj ∈ sshape ddb uu →
              tt ∈ ccshape cc kk →
              ∃tt′:heap². mm = jj •• tt′ ∧
                tt′ ∈ ccshape cc (ccmp k₁, ccmp k₂) ∧
                // no leakage of high data
                (t₁ = t₂ → t′₁ = t′₂)))
      // output: equivalent program w/o can_read_emr (this type is named S)
      → STsec [database, client, user] int
         (fun db c pat i.
            ∃(k:G) (u:user) (j t:heap).
              i = j • t ∧
              j ∈ shape db u ∧ t ∈ cshape c k,
          fun ddb cc ppat yy ii mm. ···)

Fig. 1: Higher-order erasure in a medical database application. The function endorse takes as input an STsec computation requiring the can_read_emr token, and produces as result a new computation that does not require can_read_emr. Clients that call endorse must prove that they do not leak confidential information (t₁ = t₂ → t′₁ = t′₂).

Crucially, endorse will succeed only if the client can prove that its final local state does not depend on any secrets that it computed during the call (the “no leakage of high data” condition). We express this requirement in RHTT by enforcing the policy: if the client local heaps t₁ and t₂ were equal initially (i.e., low), then the client final heaps t′₁ and t′₂ must also be equal in the two runs (t₁ = t₂ → t′₁ = t′₂). Note that we make no restrictions in the type of endorse on the confidentiality of the client’s return value (yy : int²). This does not violate our security policy; it just requires that any further computations that may depend on the client’s result be private as well.

    client ≙ ptr
    G ≙ int
    ccshape (cc:client²) (kk:G²) (ii:heap²) ≙ ii = cc ⤇ kk
    ccmp : G → G ≙ fun k. k + 1

    prog : R ≙
      do (fun db c pat.
            emr ← read_emr db pat;
            k ← read c;
            write c (hash1 emr);
            write c (k + 1);
            ret (hash2 emr))

    counting_client : S ≙
      endorse ptr int (fun cc kk ii. ii = cc ⤇ kk) (fun k. k + 1) prog

Fig. 2: An endorse client implementation with an integer pointer as local state. On each call to endorse, counting_client stores a hash of the confidential patient record emr into local state, then overwrites this hash with the incremented count k + 1. In order to typecheck counting_client, we must prove that prog leaks into local state no data derived from emr.

Counting Client: Fig. 2 defines a client which maintains as local state a count of the number of times it has been called. The type client is implemented as a pointer to this count. The shape invariant ccshape specifies that the heaps i₁ and i₂ in the two runs (recall that ⤇ operates on pairs of pointers and values) are singleton heaps from c₁ and c₂ to the integer counts k₁ and k₂ respectively. The function ccmp just increments the count after each call.

The client program prog operates as follows: it first reads the medical record of a patient, pat, from the database and stores the resulting value in the variable emr. Next, it dereferences the pointer c into a second local variable k, for safekeeping. It now stores an integer hash of the medical record emr into c. If prog were to return at this point, it would not satisfy the required noninterference property (i.e., t′₁ = t′₂) since emr is confidential data and therefore not known to be equal in the two runs. To ensure that t′₁ does equal t′₂, and that the count is properly updated, prog overwrites the value in the client local heap a second time with k + 1, the incremented count. It then returns hash2 emr, a second integer hash of pat’s medical record.

The program counting_client applies endorse to prog. On its own, prog would have had to request permission from a patient in order to call read_emr, thereby establishing the function’s precondition. By endorsing prog, we avoid the need to establish this precondition, but at the expense of the additional proof obligation t₁ = t₂ → t′₁ = t′₂. In this case, demonstrating that prog meets this specification is straightforward: even though the program stores the confidential value hash1 emr into local state at an intermediate point during the computation, it overwrites this value with a low value (k + 1) before function return. Because counting_client’s local state consists of a single pointer to a scalar value, proving that t′₁ = t′₂ is straightforward. Indeed, this proof in RHTT is only a few lines.

Implications: The situation grows more complicated when local state evolves from an integer pointer to a hash table or other data structure. In general, operation sequences on data structures that are invertible at the level of the data structure interface (such as interposed insertions of key-value pairs into a hash table followed by their removal) can leave traces in the underlying heap representation, and thus make it difficult to prove noninterference. Unique representation alleviates this difficulty by making it possible to reason symbolically about (the absence of) such flows: by ensuring that logical equality of the data structure states in two runs implies actual equality of the underlying heaps, it reduces noninterference proofs to arguments about equalities of symbolic, mathematical representations. This reasoning can often be performed elegantly, through the application of algebraic laws.

In the next two sections, we put unique representation on a formal footing, then describe UR variants of bounded multisets and hash tables which guarantee that the heap leaves no trace of the history of operations performed. This history independence theorem, a key consequence of unique representation, makes our multiset and hash table variants amenable to proofs of erasure properties such as those required by endorse, and to noninterference proofs more broadly.
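The proof obligation that endorse places on the counting client (equal initial local state in two runs implies equal final local state) can be exercised on concrete inputs. The Python sketch below is a test on one pair of runs, not a proof and not the RHTT code: the dictionary-based heap, the toy functions `hash1` and `hash2`, the helper `no_leak`, and the variant `leaky_prog` are all assumptions introduced for illustration.

```python
def hash1(emr: str) -> int:
    """Toy stand-in for the first integer hash of a record."""
    return sum(emr.encode())


def hash2(emr: str) -> int:
    """Toy stand-in for the second integer hash of a record."""
    return sum((i + 1) * b for i, b in enumerate(emr.encode()))


def prog(db, local, pat):
    emr = db[pat]             # emr <- read_emr db pat
    k = local["c"]            # k <- read c
    local["c"] = hash1(emr)   # write c (hash1 emr): secret-derived value
    local["c"] = k + 1        # write c (k + 1): overwritten with a low value
    return hash2(emr)         # ret (hash2 emr)


def leaky_prog(db, local, pat):
    emr = db[pat]
    local["c"] = hash1(emr)   # the secret-derived value is never erased
    return hash2(emr)


def no_leak(program, db1, db2, pat, initial_count):
    """Run `program` twice with equal local state but different secrets,
    then report whether the final local states agree."""
    t1 = {"c": initial_count}
    t2 = {"c": initial_count}
    program(db1, t1, pat)
    program(db2, t2, pat)
    return t1 == t2


db_run1 = {"alice": "record A"}   # confidential inputs differ across runs
db_run2 = {"alice": "record B"}

counting_ok = no_leak(prog, db_run1, db_run2, "alice", 3)
leaky_ok = no_leak(leaky_prog, db_run1, db_run2, "alice", 3)
```

Here `counting_ok` holds because the local cell ends at 4 in both runs, whereas `leaky_ok` fails because the final cell depends on the record. The return values may still differ across runs, which matches the remark that the type of endorse does not restrict the confidentiality of the client's result.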

IV. UNIQUE REPRESENTATION

In this section, we define unique representation formally and explore its implications, the most important of which is operation order independence (also called strong history independence). We then describe our implementation of insert-only bounded multisets as an example of a naturally UR data structure. First, we develop general definitions.

Data Structures: A data structure ties a logical interface (an abstract type and operations on that type) to a concrete implementation of the interface, usually in the form of an executable program. One can model a data structure mathematically as a tuple consisting of the type A, of logical or abstract states, and the type M, of concrete or machine states. Typically, the machine states are just memory states whereas logical states may range from mathematical objects such as vectors to model arrays, to more complex objects such as finite key-value maps, to model data structures such as hash tables. In our RHTT development, we often need a third type, S, to store representation details for the data structure. In an array data structure, for example, S would be instantiated as the type of base pointers paired with the array size. In what follows, we will generically call values s : S root pointers, even though s may contain information beyond just the root pointer, as in the array example.

Each pair of a data structure’s logical states and root pointers defines a set of concrete states implementing it on the machine. When a root pointer s and logical state a are implemented by machine state m, we say that the shape of s and a is m. The predicate shape : S → A → M → prop on representation types S, logical states A, and machine states M gives the formal definition of the shape relation in our development.

[Figure 3 image: a row of heap cells holding natural numbers, labeled from s to s+9.]

Fig. 3: 10-element bounded multiset at base address s. The integers stored in each heap cell give the multiplicities of items s₀ through s₉.

Unique Representation: A UR data structure in this context is one for which shape is a function.

Definition 1 (Unique representation (UR) [17, 18, 28]). A data structure is uniquely represented iff for all root pointers s, logical states a and machine states m₁ and m₂, if m₁ ∈ shape s a, and m₂ ∈ shape s a, then m₁ = m₂.
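Definition 1 can be made concrete with the array-backed bounded multiset of Fig. 3, where cell i stores the multiplicity of element i. The Python sketch below is an illustration under assumptions (the class `BoundedMultiset`, the helper `image_after`, and the sample ballots are hypothetical, and a Python list stands in for the heap): it checks that every insertion order of the same ballots produces one and the same memory image, and contrasts this with an append-only log of the same ballots.

```python
from itertools import permutations


class BoundedMultiset:
    """Insert-only multiset over elements 0..n-1, stored as an array of counts."""

    def __init__(self, n):
        self.cells = [0] * n  # cell i holds the multiplicity of element i

    def insert(self, i):
        self.cells[i] += 1


def image_after(n, ops):
    """Memory image (as a tuple) after inserting `ops` in the given order."""
    ms = BoundedMultiset(n)
    for i in ops:
        ms.insert(i)
    return tuple(ms.cells)


ballots = [2, 0, 2, 7]

# One image per insertion order; a uniquely represented structure yields one image.
multiset_images = {image_after(10, order) for order in permutations(ballots)}

# An append-only log keeps the order itself in memory, so images multiply.
log_images = {tuple(order) for order in permutations(ballots)}
```

The set `multiset_images` contains a single tuple because the counts depend only on the logical multiset, which is the sense in which shape is a function here. The log, by contrast, has one image per distinct ordering and so reveals the order in which ballots were cast.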

Security Implication of UR: The security implication of SHI—and hence of UR by Theorem 1—is that an attacker with access to the machine representation of a data structure can learn no more than an attacker constrained by the data structure’s public interface. In other words, different sequences of operations that result in the same logical state leave no trace of the history of operations that got them there. In particular, any sequence of operations that is then reverted (e.g., a series of insertions in an array followed by removals of all the inserted elements) is equivalent, at the memory level, to having never performed the operations in the first place.

In our RHTT development, we state this property as ∀s a m1 m2 . (m1 , m2 ) ∈ sshape (s, s) (a, a) → m1 = m2 Here sshape is the relational version of shape: it applies to pairs of root pointers s, logical states a, and memory states m1 and m2 ; it states that m1 and m2 both implement the logical state a at root pointer s. Examples of data structures exhibiting the UR property include arrays and fixed-size stacks, insert-only bounded multisets implemented as arrays, treaps, etc. (See Golovin’s Ph.D. dissertation [17, Chapter 4] for even more examples.) Data structures that require dynamic memory allocation are generally not UR. In this work we focus on array-based UR data structures such as hash tables; reasoning about information flow for dynamically allocated data structures is an open research problem. History Independence: Unique representation is closely tied to the related notion of strong history independence (SHI) [18, 28]. Intuitively, a data structure is strongly history independent if it is oblivious to the order in which logically equivalent sequences of operations are performed. That is, the memory representations of history independent data structures cannot distinguish two sequences of operations that result in the same final logical states, assuming they began in the same initial logical (and machine) states. Strong history independence asserts history independence regardless of the initial logical state, whereas weak history independence holds only for operation sequences that are executed from a particular initial logical state.

A. UR Bounded Multisets Operation-order independence is an important security policy in contexts in which the history of operations performed must remain secret. One such context is that of voting machines: history independence of the vote storage unit ensures that an attacker with access to the voting machine’s memory— such access is more than plausible, cf., [4]—cannot identify the order in which ballots were cast, and thus identify which voters cast which ballots by correlating a voter’s appearance at the public polling place with the ballot. Molnar et al. [25] describe the architecture of a vote storage unit that achieves operation-order independence, and thus voter anonymity, by storing votes in insert-only bounded multisets. In this section, we give an RHTT implementation of insertonly bounded multisets inspired by the architecture of Molnar et al. as an example of a UR data structure. Insert-only Bounded Multisets: We implement n-element bounded multisets as arrays with n cells, each containing a natural number. RHTT arrays are contiguous memory blocks rooted at a pointer. Fig. 3 depicts one such array, of size 10, rooted at the pointer s. The value of the natural number at each index i in the array denotes the number of times element i appears in the multiset, or its multiplicity. To specify multisets, we associate with each array a finite function f from the finite type I (a parameter of the model) to nat. Indexes of type I correspond to indexes of the underlying array, and the range of f gives the multiplicities of the elements at each index in I. Fig. 4 gives the signature and RHTT implementation of the insert-only bounded multisets module. The type mset is an alias for arrays with indexes I and cells of type nat. The predicate shape takes a multiset s—a pointer to the base of the array—and a finite function f as arguments. Here f defines the mathematical representation of s, for specification purposes. 
The shape predicate is implemented by calling the corresponding shape predicate of the array module, which

Definition 2 (Strong history independence (SHI) [18, 28]). Consider any sequences of logical operations X and Y, root pointers s, logical states a, and machine states m such that shape s a m. Suppose logically interpreting X from a results in a′, and logically interpreting Y from a also results in a′. A data structure is strongly history independent iff executing X from m results in state m′, and executing Y from m results in state m″ such that m′ = m″; and moreover, shape s a′ m′ and shape s a′ m″.

It is straightforward to show

Theorem 1 ([17, 18, 28]). UR implies SHI.

We have mechanized the proof of this theorem in RHTT for a slight elaboration of the general data structure model just described.

hash tables. They consist of m levels, or filters, each of which has an associated hash function H and an array for storing key-value pairs, just as in standard open-addressing hashing. When a collision occurs in the top filter of the table (i.e., level 0), insertion is reattempted at a lower level (e.g., level 1) until either the insertion succeeds or the table runs out of filters. In the latter case, the item is placed in a secondary key-value map.

As we described in the introduction, filter hash tables suffer from the same information leakage problems as standard open-addressing hash tables: insertions and deletions of equal sets of key-value pairs can result in hash tables whose heap layouts, both within each filter and across the table, are radically different. An attacker with later access to the program memory can therefore recover information about high security items in the table even after they have been deleted, just by inspecting the hash table’s memory representation. It is precisely the lack of unique representation that enables such leaks.

We have addressed this problem by defining a new UR variant of filter hash tables.⁴ UR filter hash tables enforce unique representation by respecting a canonicity invariant. To understand canonicity, imagine that an m-level filter hash table is split into two parts: (1) the top filter d0 and (2) the remaining filters d1 … d_{m−1}, together with the auxiliary store b, which we call the “bucket”. Let H_i be the hash function associated with filter i and let < be an irreflexive, transitive, total order on keys.

    mset (I : finType) ≙ array I nat
    shape (s : mset) (f : I →fin nat) ≙ Array.shape s f
    bump (I : finType) (k : I) (f : I →fin nat) ≙ Array.updf k ((f k) + 1) f
    insert (s : mset) : STsec [I] unit
        (fun k i. ∃f. i ∈ shape s f,
         fun kk yy ii mm. ∀ff. ii ∈ sshape (s, s) ff →
                          mm ∈ sshape (s, s) (bump k1 f1, bump k2 f2)) ≙
      do (fun k. n ← Array.read s k;
                 Array.write s k (n + 1))

Fig. 4: Signature and implementation of insert-only multisets.

specifies that s is the base of a contiguous memory region of size |I | with contents given by f . The precondition of insert defines when it is safe to execute the method, i.e., shape s f must hold of the input heap i and the base pointer s for some finite function f . In insert’s postcondition, the proposition mm ∈ sshape (s, s) (bump k1 f1 , bump k2 f2 ) states that in the two runs, the output heaps mm contain the local state of the multiset s, the contents (i.e., multiplicities) of whose indices k1 and k2 have been incremented by 1. The code for insert first reads the contents of k into n by calling the array read operation Array.read. Next, it writes n + 1 at index k by calling the array write operation Array.write. In our RHTT implementation of arrays, Array.read and Array.write are implemented via pointer arithmetic on the base address. The proof (in the Coq scripts) that bounded multisets are UR follows from the fact that multisets are implemented as arrays. Arrays are naturally UR: for each base pointer s and finite function f , there is a unique contiguous block rooted at s whose cells are determined by the range of f .
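The argument that array-backed multisets are UR can be illustrated outside RHTT. The following Python sketch is only an analogy under stated assumptions: a Python list stands in for the contiguous memory block, and the names BoundedMultiset and layout_after are hypothetical, not part of the Coq development. It replays every ordering of the same ballots and collects the resulting array contents.

```python
from itertools import permutations


class BoundedMultiset:
    """Insert-only multiset over elements 0..n-1, stored as an array of counts."""

    def __init__(self, n):
        self.cells = [0] * n  # cells[i] is the multiplicity of element i

    def insert(self, k):
        n = self.cells[k]      # mirrors Array.read s k
        self.cells[k] = n + 1  # mirrors Array.write s k (n + 1)


def layout_after(size, ops):
    """Run a sequence of inserts on a fresh multiset and return its cells."""
    s = BoundedMultiset(size)
    for k in ops:
        s.insert(k)
    return tuple(s.cells)


ballots = [2, 0, 2, 1]
# Every ordering of the same ballots leaves the same array contents behind.
layouts = {layout_after(3, order) for order in permutations(ballots)}
```

Since the cells depend only on the multiplicities, the set layouts contains a single element, which is the informal content of the UR property for this structure.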

Definition 3 (Canonicity). An m-level filter hash table is canonical iff the following properties hold for all 0 ≤ i < m:
• For all keys k and k′, if (a) k is stored in filter d_i, (b) k′ is stored either in a strictly lower filter or in the bucket, b, and (c) H_i(k) = H_i(k′), then k < k′.
• For all indexes j, if d_i[j] is empty, then there is no key k such that (a) H_i(k) = j, and (b) k is stored either in a strictly lower filter or in the bucket, b.

In other words, canonicity requires that keys be placed as high up in the filter hash table as possible and that whenever there is a collision during insertion, a suitable strict order < be used to resolve the collision. Of course, deletion must preserve the canonicity invariant as well. One consequence of this property is that all occupied slots in the top filter d0 always contain the minimum key among all keys currently in the table that would also have hashed to that slot. Non-minimum keys that collided with the minimum key upon insertion must be placed in lower levels of the hash table.

Intuitively, canonicity implies unique representation since: (1) the keys in every hash collision path are uniquely ordered by the < relation; and (2) empty slots are always filled in order, topmost first. (See our Coq scripts for a formal proof of this theorem.) To maintain the canonicity invariant, we slightly alter the usual insertion and deletion routines for filter hash tables to ensure that the minimum key for each index is always stored

V. UNIQUELY REPRESENTED FILTER HASH TABLES

Open-addressing hash tables, those that use probing to resolve collisions, are a key imperative data structure with wide-ranging applications. Unfortunately, such hash tables, unlike arrays, are not naturally UR: key collisions can create different memory layouts depending on the order in which keys are inserted, and thus leak information to attackers with later access to the memory. In this section, we implement and prove correct a UR variant of filter hash tables. Filter hash tables provide performance and space guarantees similar to m-level cuckoo hash tables [16], and therefore usually outperform chaining hash tables with, e.g., UR buckets.

A. Canonical Representation

Standard filter hash tables were first proposed by Fotakis et al. [16] as a simple yet efficient variant of m-level cuckoo

4 As far as we are aware, this is the first UR variant of filter hash tables to be described.


[Figure 5 (image not reproduced): six panels showing filters 0–3 and the bucket of the table after each step, with the stash of filter 0 highlighted. Panel 1: insert GFDB. Panel 2: insert E. Panel 3: insert A. Panel 4: insert CIH. Panel 5: remove A. Panel 6: remove F.]

Fig. 5: 4-level UR filter hash table containing keys A–I. Any two insertion orders, e.g., ABCDEFGHI and GFDBEACIH, produce the same memory layout. Each additional filter added to the structure implements, along with those beneath it, the kv-map interface, yielding a modular construction. For example, in Panel 1, filters 1–3 and the auxiliary kv-map, bucket, together form the stash for the top-level filter 0.

in d0 (and so on for filters 1 through m − 1, for the remaining keys). This requires checking whether a newly inserted key k′ with H0(k′) = j is less than the current key in slot j of d0. If it is, then k′ is placed in d0 and the current key k in slot j is evicted to a lower level. If k′ > k then k′ is inserted into a lower level as usual. When k = k′, the new value v′ associated with k′ overwrites the current value v (we do not maintain duplicate keys).

Deletion is somewhat trickier: when a key k is removed from slot j in filter d0, we must find the smallest key k′ at level d1 or lower such that H0(k′) = j and recursively move this key up into filter d0. To do so, we could maintain back pointers into lower levels in order to efficiently find the key k′. However, our current prototype forgoes such back pointers in favor of a simple yet somewhat inefficient solution: we require each filter in the hash table to use the same hash function, H. This makes naïve deletion faster at the expense of the potential for more collisions. In future work, we plan to implement and verify a more efficient delete method that supports independent hash functions at each level. In the rest of the description of filter hash tables and our implementation, we assume a single hash function H.
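Under the single-hash-function assumption, Definition 3 can be read as an executable check. The Python sketch below is a hypothetical illustration, not the Coq formalization: a table is modeled as a list of filters (each a list of keys, with None marking an empty slot) plus a bucket of keys, and the function name is_canonical is an assumption introduced here.

```python
def is_canonical(filters, bucket, h):
    """Check Definition 3 for a table whose filters all use the hash function h.

    filters: list of levels, topmost first; each level is a list of keys or None.
    bucket:  iterable of keys held in the auxiliary store.
    """
    for i, level in enumerate(filters):
        # Keys stored strictly below level i, including the bucket.
        below = [k for lvl in filters[i + 1:] for k in lvl if k is not None]
        below += list(bucket)
        for j, k in enumerate(level):
            colliding = [k2 for k2 in below if h(k2) == j]
            if k is None:
                # An empty slot must not shadow a key that hashes to it.
                if colliding:
                    return False
            elif any(not (k < k2) for k2 in colliding):
                # The stored key must be smaller than every colliding key below.
                return False
    return True


def parity(k):
    return k % 2


ok = is_canonical([[2, 1], [4, None]], [], parity)
wrong_order = is_canonical([[4, 1], [2, None]], [], parity)
hole_above = is_canonical([[None, 1], [2, None]], [], parity)
```

Here ok is True, while wrong_order and hole_above are False: the first violates the ordering clause (4 sits above 2 in the same collision path) and the second violates the clause on empty slots.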

Note that the table that results after all keys have been inserted is equivalent to the table resulting from insertion of the keys in order, ABCDEFGHI, or in any other permutation. In Panel 4, key I is placed in the auxiliary key-value map, bucket, since it collided with other keys at every level of the table during insertion. Panels 5 and 6 illustrate deletions. In Panel 5, key A is deleted, causing D to be moved up into filter 0. In Panel 6, key F is deleted, causing a chain of three reorderings: key G is moved from filter 2 to filter 1, key H is moved from filter 3 to filter 2 and key I is moved from the bucket into filter 3.

B. Preliminaries

Fig. 6 gives the preliminary RHTT definitions used in our implementation and in the correctness and unique representation proofs of filter hash tables. We present these definitions first, before the imperative RHTT code for hash table lookup, insertion and deletion, because the definitions are important for understanding the specifications of these functions.

The function hash of type K → I_n maps keys k : K to indices of type I_n. Here K is a type with a built-in order operation, called ord, that is irreflexive, transitive and total. For example, K could be instantiated as the type of integers with the usual < relation as the order. Indexes of type I_n are integers in the range 0 to n − 1, inclusive. These indexes define the slots in the arrays that are used to implement each filter level of the hash table. Fixing the n parameter, as the above definitions do, enforces that each filter in the hash table has the same size.

Hashmaps (hashmap) are pairs of two objects, an array defining the uppermost filter in the hash table (filter 0 in Fig. 5) and a “hashable” key-value map called the stash, defined by the type kvmap (highlighted in gray in Fig. 5). The stash includes all of the filters besides the topmost one, as well as the bucket. The type option (K × V) defines optional pairs of keys and values. These pairs can either be None, meaning no pair at all, or ⌊(k, v)⌋, pronounced “some”, meaning an actual pair (k, v). Hashable key-value maps are just like standard key-value maps except that they expose—in addition to methods

Fig. 5 illustrates insertion of the keys GFDBEACIH into a 4-level filter hash table. Key G is inserted first. Then key F is inserted, causing a collision with G in the topmost filter. Since F < G in the usual lexicographic order, G is evicted from d0 and re-inserted into d1. Keys D and B are inserted next, without incident, into filter d0. The state of the table after G, F, D and B have all been inserted is shown in Panel 1 of the figure. Next, key E is inserted in slot 1 of d0, causing a second collision with F. The eviction of F and its re-insertion into filter d1 causes a chained collision with G, which is evicted now for the second time into filter d2. The state of the table after E has been inserted is shown in Panel 2. Key A’s insertion causes the eviction of D into a free slot in filter d1 (Panel 3). The rest of the keys are inserted in a similar manner, resulting in the filter hash table shown in Panel 4 of the figure.

for lookup, insertion and deletion of key-value associations—a hash function on keys. We use this hash function parameter to fix the hash function for each filter. Our construction of filter hash tables is modular: filter hash tables are parametrized by a stash and themselves implement the hashable key-value map interface, meaning we can construct multilevel filter hash tables of depth m by instantiating the stash with a second filter hash table of depth m − 1. For the bucket, we use a simple functional implementation of the hashable key-value map interface as sorted association lists. The only requirement is that the bucket be UR, which we achieve by sorting the key-value pairs and removing duplicates.

The next four functions in the figure together define the shape predicate for filter hash tables. The function minkey takes two arguments, a finite map f from keys to values and an index ix, and returns the minimum key in f, if one exists, that hashes to ix. To find the keys that hash to ix, it first filters the keys of f (keys_of f) by the predicate (fun k. hash k = ix). The hashtab and stash functions partition the key-value pairs in f into the toplevel array and the stash. The hashtab function constructs f’s toplevel array by building a new (finite) function returning ⌊(k, v)⌋ at index ix whenever k is the minkey for f at index ix. Otherwise, hashtab returns None at that index. The stash function operates by iteratively removing any keys already in the toplevel table. For each index ix in the range [0..n − 1], it removes k from f whenever the hashtab of f already contains a key-value pair at index ix. Thus the stash includes all key-value pairs that were not already selected for inclusion in the toplevel filter. Finally, shape (t : hashmap) (f : K ⇀fin V) asserts that the heap, i, can be split into two subheaps, j and k, such that j has the shape of the array given by hashtab f and k has the shape of the key-value map given by stash f.
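The partitioning performed by minkey, hashtab and stash can be mimicked with ordinary dictionaries. The Python sketch below is a loose, hypothetical analogue of the definitions in Fig. 6, assuming a dict in place of a finite map and passing the slot count n and the hash function explicitly; it is not the RHTT code.

```python
def minkey(f, ix, hash_fn):
    """Smallest key of f that hashes to ix, or None if there is none."""
    candidates = [k for k in f if hash_fn(k) == ix]
    return min(candidates) if candidates else None


def hashtab(f, n, hash_fn):
    """Toplevel array: slot ix holds (k, v) for the minimum key hashing to ix."""
    top = []
    for ix in range(n):
        k = minkey(f, ix, hash_fn)
        top.append(None if k is None else (k, f[k]))
    return top


def stash(f, n, hash_fn):
    """Everything in f that was not selected for the toplevel array."""
    rest = dict(f)
    for entry in hashtab(f, n, hash_fn):
        if entry is not None:
            del rest[entry[0]]
    return rest


def mod4(k):
    return k % 4


f = {3: "c", 7: "g", 4: "d", 11: "k"}
top = hashtab(f, 4, mod4)      # [(4, 'd'), None, None, (3, 'c')]
rest = stash(f, 4, mod4)       # {7: 'g', 11: 'k'}
```

Applying hashtab and stash again to rest yields the contents of the next filter down, which is how the modular, level-by-level shape predicate unfolds.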

    hash : K → I_n
    hashmap ≙ (array I_n (option (K × V)) × kvmap)
    minkey (f : K ⇀fin V) (ix : I_n) ≙
      min (filter (fun k. hash k = ix) (keys_of f))
    hashtab (f : K ⇀fin V) ≙
      fun (ix : I_n). if minkey f ix is ⌊k⌋
                      then if fnd k f is ⌊v⌋ then ⌊(k, v)⌋ else None
                      else None
    stash (f : K ⇀fin V) ≙
      foldr (fun ix f0. if hashtab f ix is ⌊(k, v)⌋ then rem k f0 else f0) f [0..n − 1]
    shape (t : hashmap) (f : K ⇀fin V) (i : heap) ≙
      ∃j k : heap. i = j • k ∧ Array.shape t.1 (hashtab f) j
                            ∧ Kvmap.shape t.2 (stash f) k

Fig. 6: Preliminary specifications and definitions for UR filter hash tables. The functions hashtab and stash define the reference functional implementation of the partitioning of key-value pairs into the toplevel filter and the stash.

    new ≙ do (t1 ← Array.new I_n None;
              t2 ← Stash.new;
              ret (t1, t2))

    lookup (t : hashmap) ≙
      do (fun k. o ← Array.read t.1 (hash k);
                 if isNone o then ret None
                 else if (o is ⌊(k′, v′)⌋ && k = k′) then ret ⌊v′⌋
                 else Stash.lookup t.2 k)

C. Implementation

    insert (t : hashmap) ≙
      do (fun k v. o ← Array.read t.1 (hash k);
                   if (isNone o || (o is ⌊(k′, v′)⌋ && k = k′))
                   then Array.write t.1 (hash k) ⌊(k, v)⌋
                   else let ⌊(k′, v′)⌋ = o in
                        if k < k′
                        then Stash.insert t.2 k′ v′;
                             Array.write t.1 (hash k) ⌊(k, v)⌋
                        else Stash.insert t.2 k v)

Fig. 7 presents our RHTT implementation of filter hash tables, parametrized by a key-value map module Stash. Function new is the shortest of the four functions in the interface. It calls the new functions from the array module and from the key-value map module to allocate a new array, t1 , and a new stash, t2 . The array is initialized to contain values of type option (K ×V ), with default value None. The stash is initially empty. The lookup function takes a key k as its sole local argument. It first looks up the value in the toplevel filter, t.1, at position hash k using the array read method. Array.read returns an option, o, so lookup must first check whether o is None— meaning the toplevel array is empty at position hash k —before proceeding. If o is None, then lookup immediately returns None. It can do so because the canonicity invariant implies that k is not present: if k were present, it would have been hashed into the empty slot. The insert function is somewhat more complicated. It takes two local arguments, a key, k , and a value, v . Like lookup, its first step is to look up the current key-value pair, o, at position hash k in the toplevel array. If o is None then insert writes

    remove (t : hashmap) ≙
      do (fun k. o ← Array.read t.1 (hash k);
                 if (o is ⌊(k′, v′)⌋ && k = k′)
                 then kmin ← Stash.minkey t.2 (hash k);
                      if isNone kmin then Array.write t.1 (hash k) None
                      else let ⌊k′min⌋ = kmin in
                           ⌊v′min⌋ ← Stash.lookup t.2 k′min;
                           Stash.remove t.2 k′min;
                           Array.write t.1 (hash k) ⌊(k′min, v′min)⌋
                 else Stash.remove t.2 k)

Fig. 7: RHTT implementation of UR filter hash tables.


lookup states that the heap i satisfies the key-value map shape predicate. In our filter hash table implementation of this interface, shape is instantiated with the filter hash table shape predicate defined in the previous section. The postcondition of lookup states that lookup leaves the heap unchanged (mm = ii ) and that the option values y1 and y2 that lookup returns are those discovered when keys k1 and k2 are looked up in the mathematical finite maps f1 and f2 that represent the key-value map t (ii ∈ sshape (t, t) ff ). Key-value map insert (also shown in Figure 8) takes three arguments, the map t, a key k , and its new value v . As in lookup, t is a function argument that we assume to be low, whereas k and v are function arguments that are implicitly high. insert’s postcondition states that in the output heap mm, k1 is mapped to v1 in one run and k2 is mapped to v2 in the other. We model these updates mathematically using the finite map insertion function, which updates a key with a new value in a key-value map. Complexity: We have not yet closely analyzed the complexity of our filter hash table variant but believe that much of the argument from [16] (in which filter hash tables are shown to have worst-case constant access time for certain load factors) can be applied to the UR case in order to prove at least expected constant time for most operations.

    lookup (t : kvmap) : STsec [K] (option V)
        (fun k. ∃f. i ∈ shape t f,
         fun kk yy ii mm. ∀ff. ii ∈ sshape (t, t) ff →
                          mm = ii ∧ yy = (fnd k1 f1, fnd k2 f2))

    insert (t : kvmap) : STsec [K, V] unit
        (fun k v. ∃f. i ∈ shape t f,
         fun kk vv yy ii mm. ∀ff. ii ∈ sshape (t, t) ff →
                             mm ∈ sshape (t, t) (ins k1 v1 f1, ins k2 v2 f2))

Fig. 8: Specifications of key-value map lookup and insert. The filter hash table functions of Fig. 7 meet these interfaces, along with similar ones for new and remove.

the pair ⌊(k, v)⌋ into the array at that position. The canonicity invariant again implies that there are no keys elsewhere in the table that hash to the same slot as k and yet are less than k in the order. Likewise, if o is an actual key-value pair (k′, v′) such that k = k′, then insert can just overwrite the current value of k′ with the new value, v. Otherwise, o is a key-value pair (k′, v′) such that k ≠ k′. Now there are two cases: either (1) k < k′, or (2) k′ < k. In case (1), insert inserts k′ recursively into the stash, then overwrites k′ at position hash k in the toplevel array with ⌊(k, v)⌋. Case (2) is even simpler: k′ stays where it is and k is inserted recursively into the stash.

The remove function is similar in structure to insert. Like both insert and lookup, it first reads the current key-value pair, o, located at position hash k in the toplevel filter. If o is ⌊(k′, v′)⌋ such that k = k′, then remove must delete k from the toplevel array and replace it with the next smallest key that would have hashed to that slot, if one exists. To do so, remove calls Stash.minkey on the stash, returning a value kmin of type option K. If kmin is an actual key, then it and its associated value, vmin, are removed from the stash and written into the toplevel array at position hash k. (Stash.minkey guarantees that hash k = hash k′.) Otherwise, None is written into the array at that position, effectively clearing the slot. When o is None or ⌊(k′, v′)⌋ such that k ≠ k′, k is removed recursively from the secondary map.
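To make the control flow of lookup, insert and remove concrete, here is a minimal Python sketch of the same scheme. It is an unverified analogue written for illustration only: the class names Bucket and URFilter, the helper layout_of, and the use of None for absent values are assumptions of this sketch (so stored values must not be None), and a single hash function is shared by all levels as in the prototype described above.

```python
from itertools import permutations


class Bucket:
    """Bottom-level map kept as a key-sorted association list."""

    def __init__(self, hash_fn):
        self.hash_fn, self.items = hash_fn, []

    def lookup(self, k):
        return next((v for k2, v in self.items if k2 == k), None)

    def insert(self, k, v):
        rest = [kv for kv in self.items if kv[0] != k]
        self.items = sorted(rest + [(k, v)], key=lambda kv: kv[0])

    def remove(self, k):
        self.items = [kv for kv in self.items if kv[0] != k]

    def minkey(self, ix):
        return min((k for k, _ in self.items if self.hash_fn(k) == ix), default=None)

    def layout(self):
        return tuple(self.items)


class URFilter:
    """One filter level on top of a stash (another URFilter or a Bucket)."""

    def __init__(self, n, hash_fn, stash):
        self.hash_fn, self.stash, self.top = hash_fn, stash, [None] * n

    def lookup(self, k):
        o = self.top[self.hash_fn(k)]
        if o is None:
            return None                      # canonicity: k cannot be below
        return o[1] if o[0] == k else self.stash.lookup(k)

    def insert(self, k, v):
        ix = self.hash_fn(k)
        o = self.top[ix]
        if o is None or o[0] == k:
            self.top[ix] = (k, v)
        elif k < o[0]:
            self.stash.insert(*o)            # evict the larger key downward
            self.top[ix] = (k, v)
        else:
            self.stash.insert(k, v)

    def remove(self, k):
        ix = self.hash_fn(k)
        o = self.top[ix]
        if o is not None and o[0] == k:
            kmin = self.stash.minkey(ix)     # next smallest key for this slot
            if kmin is None:
                self.top[ix] = None
            else:
                vmin = self.stash.lookup(kmin)
                self.stash.remove(kmin)
                self.top[ix] = (kmin, vmin)
        else:
            self.stash.remove(k)

    def minkey(self, ix):
        o = self.top[ix]                     # by canonicity the slot holds the minimum
        return None if o is None else o[0]

    def layout(self):
        return (tuple(self.top), self.stash.layout())


def build_map(hash_fn, n, depth):
    m = Bucket(hash_fn)
    for _ in range(depth):
        m = URFilter(n, hash_fn, m)
    return m


def layout_of(order, erase=()):
    t = build_map(lambda k: k % 2, 2, 2)
    for k in order:
        t.insert(k, str(k))
    for k in erase:
        t.remove(k)
    return t.layout()


keys = [17, 18, 4, 9, 12, 6]
layouts = {layout_of(p) for p in permutations(keys)}
```

In this small instance (two filters of two slots each over a sorted bucket), all 720 insertion orders produce one layout, and inserting and then removing an extra key, as in layout_of(keys + [3], erase=[3]), returns to that same layout. This is a test on one configuration, not a substitute for the mechanized proof.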

VI. ERASURE-DEPENDENT ACCESS CONTROL REVISITED

In Sec. III, we described an RHTT implementation of a medical database application. One of the distinguishing features of the system was that it supported expressive conditional access control policies. In particular, the higher-order function endorse granted a client unrestricted access to confidential patient records provided the client could first prove that any confidential data it stored in local state during the computation was erased before its function exited. In Sec. III, our example client’s local state consisted of a single pointer to an integer; the simplicity of this local state made it quite easy to prove that the client program met the erasure policy required by endorse. In this section, we revisit the client of Sec. III in order to prove erasure policies for more complicated data structures such as the filter hash tables of Sec. V. Unique representation is the key technology that enables these proofs.

Hashing Client: Fig. 9 presents the RHTT code for our enhanced medical database client. In the original client of Sec. III, the client’s local state, given by the type client, was a pointer to an integer. Here client is the type of 4-level filter hash tables, defined by the function URHashTab.build_map. This function constructs a filter hash table with an arbitrary number of filters (4 in this case). The hash function used, hash, is a parameter. The predicate ccshape is defined with respect to the UR filter hash table’s relational shape invariant, URHashTab.sshape. This shape invariant applies the filter hash table’s unary shape predicate shape (Sec. V) to heaps i1 and i2 respectively. The function ccmp defines how the client’s local state evolves after each call to hashing_client. Here we specify that after a single call, the client will have

D. Specifying Filter Hash Tables

Our filter hash table implementation satisfies the hashable key-value map interface, which we described informally in the previous section. We forgo presenting the entire interface here since most of the specifications are straightforward (please see our Coq scripts for more details). Below we focus on a few of the interface functions. The lookup function, specified by the STsec type given in Figure 8 and implemented in the previous section by filter hash table lookup, takes two arguments, a function argument t defining a key-value map and a local argument of type K providing the key to be looked up. The precondition of

Here (m1 , m2 ) ∈ sshape (c, c) (f , f ) states that the concrete heaps m1 and m2 each implement the same finite key-value map f , with root pointer c. The specifications of insert and remove establish that f is equal in the two runs. That m1 and m2 are equal then follows from the theorem. Note that if the client’s local state had been a standard hash table, or some other non-UR data structure, it would have been difficult if not impossible to prove the required noninterference policy. For standard hash tables, a simple counterexample for the code above is the case in which both 17 and 18 hash to the same index. The collision upon insertion changes where the low kv pair (18, v ) is stored in the heap. Consequently the presence of the confidential pair (17, (hash1 emr )) is detectable even after key 17 has been removed from the table.
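The counterexample for standard hash tables can be reproduced with a textbook linear-probing table. The Python sketch below is a hypothetical illustration: the class ProbingTable, the tombstone-based deletion, and the hash function k // 10 % 4 (chosen only so that 17 and 18 collide) are assumptions of this sketch. For simplicity it compares a run that inserted and removed the confidential key against a run that never held it.

```python
TOMBSTONE = object()


class ProbingTable:
    """Textbook linear-probing table with tombstone deletion (not UR)."""

    def __init__(self, n, hash_fn):
        self.slots = [None] * n
        self.hash_fn = hash_fn

    def _probe(self, k):
        start = self.hash_fn(k)
        return [(start + d) % len(self.slots) for d in range(len(self.slots))]

    def insert(self, k, v):
        free = None
        for ix in self._probe(k):
            s = self.slots[ix]
            if s is None:
                if free is None:
                    free = ix
                break
            if s is TOMBSTONE:
                if free is None:
                    free = ix
            elif s[0] == k:
                self.slots[ix] = (k, v)   # overwrite an existing key
                return
        if free is None:
            raise RuntimeError("table full")
        self.slots[free] = (k, v)

    def remove(self, k):
        for ix in self._probe(k):
            s = self.slots[ix]
            if s is None:
                return
            if s is not TOMBSTONE and s[0] == k:
                self.slots[ix] = TOMBSTONE
                return

    def layout(self):
        return tuple("<deleted>" if s is TOMBSTONE else s for s in self.slots)


def run(ops):
    t = ProbingTable(4, lambda k: k // 10 % 4)   # 17 and 18 both hash to slot 1
    for op, *args in ops:
        getattr(t, op)(*args)
    return t.layout()


with_secret = run([("insert", 17, "secret"), ("insert", 18, "v"), ("remove", 17)])
without_secret = run([("insert", 18, "v")])
```

The two layouts differ: after the confidential pair is removed, the low pair (18, v) still sits one slot away from its home position and a deletion marker remains, so an observer of the memory can tell that a colliding key was once present.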

    client ≙ URHashTab.build_map hash 4
    G ≙ finMap K V
    ccshape (cc : client²) (ff : G²) (ii : heap²) ≙ URHashTab.sshape cc ff ii
    ccmp (f : G) : G ≙ ins 18 v (rem 17 f)

    prog (c : client) : R ≙
      do (fun db pat. emr ← read_emr db pat;
                      URHashTab.insert c 17 (hash1 emr);
                      URHashTab.insert c 18 v;
                      URHashTab.remove c 17;
                      ret (hash2 emr))

    hashing_client (c : client) : S ≙
      endorse (URHashTab.build_map hash 4) (finMap K V) URHashTab.sshape
              (fun f. ins 18 v (rem 17 f)) prog

VII. RELATED WORK

There has been much work on semantics of noninterference and its relaxations [32, 33] as well as enforcement mechanisms based on type systems, logics and other program analyses. Chong and Myers [12, 13] study specification and enforcement of erasure policies in the context of a simple imperative language and show how their framework can be extended to Jif [26]. Their framework handles declassification policies as well: they describe a duality between declassification and erasure where the former can be viewed as a relaxation of noninterference and the latter as a strengthening of noninterference. Hunt and Sands [19] show the close connection between information erasure policies and noninterference in the context of simple imperative as well as interactive programs.

To the best of our knowledge, there is little published work on verification of information flow policies for implementations of common data structures, barring the works we mention below. We have already discussed (Sec. IV) the work of Molnar et al. [25] who propose insert-only bounded multisets as a data structure for vote storage units. Russo et al. [31] show dynamic enforcement of secure information flow in non-UR data structures such as dynamically allocated DOM trees in the presence of deletion. Each node of a tree is assigned two security labels, one for the existence of the node and the other for the node’s contents. The scalability of their approach to other data structures needs to be investigated. Furthermore it would be interesting to verify that erasure is secure in the presence of sequences of insertion and deletion operations (as we consider in this paper). Nanevski et al. [27] consider verification of flexible security policies in possibly heterogeneous linked lists. Such lists may contain mixed high and low data as well as mixed high and low links, although their main example considers low links only.
Here we consider more involved data structures such as hash tables and recognize unique representation as a critical property to ensure secure erasure in the presence of adversaries who have direct access to the concrete memory state. Micciancio [23] develops algorithms for oblivious tree data structures in the context of cryptography. Specifically, he shows that oblivious trees can be used to guarantee privacy of the incremental signature scheme of Bellare et al. [6]. Naor and

Fig. 9: A medical database client with a 4-level UR filter hash table as local state.

inserted the low key-value pair (18, v), for some v, into (and removed key 17 from) its local hash table. The hashing_client function is similar in structure to the counting_client function of Sec. III. In order to execute the code within the do block, it wraps prog in a call to the medical database application’s endorse function. When hashing_client typechecks, RHTT generates a proof obligation requiring that prog leak no information about any confidential patient records it accesses via calls to read_emr. Within the do block, prog first reads pat’s electronic medical record, then inserts a confidential integer hash of this record (hash1 emr) into the filter hash table, associated with key 17. prog next inserts the low key-value pair (18, v) into the hash table. Finally, it removes the confidential data it previously inserted at key 17 by calling URHashTab.remove c 17.

To prove that hashing_client meets the erasure policy, we must show that the client’s local heap is low after the function is executed, assuming its local state was low before execution. To do so, we reason in two steps: we first show that symbolically executing prog from equal initial states results in equal symbolic, i.e., logical, final states. That is, inserting the confidential key-value pair (17, (hash1 emr)), the public or low key-value pair (18, v) and then removing the confidential pair (17, (hash1 emr)) results in final key-value maps f1′ and f2′ in the two runs that are equal. That this is true can be seen by considering how insert commutes with remove. For example, one such commutation property is rem k (ins k v f) = rem k f. Once we have proved that f1′ = f2′, in the second step we apply the unique representation theorem

    ∀c f m1 m2. (m1, m2) ∈ sshape (c, c) (f, f) → m1 = m2

for UR filter hash tables to show that equality of the logical states f1′ and f2′ implies equality of the final memory states.

Teague [28] consider Micciancio to be the first to explore the “idea of maintaining a data structure such that no extraneous information is available”. In the algorithms community, Amble and Knuth [1] studied hash tables without deletions; however, their goal was to develop fast searching algorithms rather than exploit properties such as history independence. Theoretical work on history independent data structures by Naor and Teague [28], Hartline et al. [18], and Blelloch and Golovin [10, 17] has been a primary impetus for our work, which complements the algorithmic approaches by applying unique representation to achieve provable guarantees about the functional correctness and security of programs.

of reasoning similar to the logic of Plotkin and Abadi [29]. This would enable correctness proofs of security policies that incorporate parametricity reasoning. A first step towards this internalization has been recently considered by Krishnaswami and Benton [21]. Finally, future work must address erasure in dynamically allocated data structures, possibly following Golovin [17].

REFERENCES

[1] O. Amble and D. E. Knuth, “Ordered hash tables,” Comput. J., vol. 17, no. 2, pp. 135–142, 1974.
[2] T. Amtoft, S. Bandhakavi, and A. Banerjee, “A logic for information flow in object-oriented programs,” in POPL, 2006.
[3] R. Anderson and M. Kuhn, “Tamper resistance – a cautionary note,” in The Second USENIX Workshop on Electronic Commerce, 1996.
[4] A. W. Appel, “Security seals on voting machines: A case study,” ACM Trans. Inf. Syst. Secur., vol. 14, pp. 18:1–18:29, September 2011. [Online]. Available: http://doi.acm.org/10.1145/2019599.2019603
[5] D. Bell and L. LaPadula, “Secure computer systems: Mathematical foundations,” MITRE Corp., Tech. Rep. MTR-2547, 1973.
[6] M. Bellare, O. Goldreich, and S. Goldwasser, “Incremental cryptography: The case of hashing and signing,” in CRYPTO, 1994.
[7] N. Benton, “Simple relational correctness proofs for static analyses and program transformations,” in POPL, 2004.
[8] J.-P. Bernardy, P. Jansson, and R. Paterson, “Parametricity and dependent types,” in ICFP, 2010, full version available at http://publications.lib.chalmers.se/records/fulltext/local_135303.pdf.
[9] D. Bjørner and C. B. Jones, Eds., The Vienna Development Method: The Meta-Language, ser. Lecture Notes in Computer Science, vol. 61. Springer, 1978.
[10] G. E. Blelloch and D. Golovin, “Strongly history-independent hashing with applications,” in FOCS, 2007.
[11] J. Borgström, A. D. Gordon, and R. Pucella, “Roles, stacks, histories: A triple for Hoare,” JFP, vol. 21, no. 2, pp. 159–207, 2011.
[12] S. Chong and A. C. Myers, “Language-based information erasure,” in CSFW, 2005.
[13] S. Chong and A. C. Myers, “End-to-end enforcement of erasure and declassification,” in CSF, 2008.
[14] E. S. Cohen, “Information transmission in sequential programs,” in Foundations of Secure Computation, R. A. DeMillo, D. P. Dobkin, A. K. Jones, and R. J. Lipton, Eds., 1978.
[15] D. Denning, “A lattice model of secure information flow,” CACM, vol. 19, no. 5, pp. 236–242, 1976.
[16] D. Fotakis, R. Pagh, P. Sanders, and P. Spirakis, “Space efficient hash tables with worst case constant access time,” in STACS, 2003.

VIII. D ISCUSSION We have provided evidence that unique representation is an important abstraction principle for reasoning about the information flow behavior of data structures and their clients. To support this claim, we used Relational Hoare Type Theory to verify unique representation and noninterference properties of arrays, multisets, hash tables, and their clients, as well as a medical database application. As far as we are aware, ours is the first formal (static) verification system for information flow that makes compositional reasoning about nontrivial data structures and their clients possible. Much work remains. One direction to explore is the connection between enforcement of erasure policies in a highlevel language and their preservation through the compilation toolchain. For example, an information-flow aware compiler must not optimize away certain operations introduced by the programmer to prevent information leakage (e.g., the zeroing out of confidential local variables in the stack frame before function return) just because these operations do not affect a program’s dataflow. Another direction involves studying interactions between garbage collection and erasure guarantees, especially in the context of timing and other attacks over covert channels. Proper engineering of an incremental garbage collector could ensure that private data is erased, and prevent some timing attacks. However, any end-to-end erasure guarantee of programs composed with the garbage collector would depend on the correctness of the garbage collector itself (cf. McCreight et al. [22]). The resolution of both of the above issues may require generalizing our specifications so that they express not only properties of the beginning and the end of the computation, but of intermediate points as well. This may require reasoning about traces and temporal properties. 
Additionally, in this work, we have assumed that the attacker cannot break the opaque sealing of abstract predicates in Coq. That this assumption holds is the content of the parametricity theorem for Coq. Bernardy et al. [8] have recently established this property for a Calculus of Constructions (and other pure type systems), which is a significant fragment of Coq. An interesting direction for future work would be to reflect the parametricity property into Coq itself, thus internalizing a form

[17] D. Golovin, “Uniquely represented data structures with applications to privacy,” Ph.D. dissertation, Carnegie Mellon University, Pittsburgh, PA, August 2008.
[18] J. D. Hartline, E. S. Hong, A. E. Mohr, W. R. Pentney, and E. Rocke, “Characterizing history independent data structures,” Algorithmica, vol. 42, no. 1, pp. 57–74, 2005.
[19] S. Hunt and D. Sands, “Just forget it - the semantics and enforcement of information erasure,” in ESOP, 2008.
[20] T. Kleymann, “Hoare logic and auxiliary variables,” Formal Aspects of Computing, vol. 11, pp. 541–566, 1999.
[21] N. R. Krishnaswami and N. Benton, “Adding equations to System F types,” in ESOP, 2012, to appear.
[22] A. McCreight, T. Chevalier, and A. Tolmach, “A certified framework for compiling and executing garbage-collected languages,” in ICFP, 2010.
[23] D. Micciancio, “Oblivious data structures: Applications to cryptography,” in STOC, 1997.
[24] J. C. Mitchell and G. D. Plotkin, “Abstract types have existential type,” TOPLAS, vol. 10, no. 3, pp. 470–502, 1988.
[25] D. Molnar, T. Kohno, N. Sastry, and D. Wagner, “Tamper-evident, history-independent, subliminal-free data structures on PROM storage -or- how to store ballots on a voting machine (extended abstract),” in IEEE Symp. Security and Privacy, 2006.
[26] A. C. Myers, “JFlow: Practical mostly-static information flow control,” in POPL, 1999.
[27] A. Nanevski, A. Banerjee, and D. Garg, “Verification of information flow and access control policies with dependent types,” in IEEE Symp. Security and Privacy, 2011.
[28] M. Naor and V. Teague, “Anti-persistence: history independent data structures,” in STOC, 2001.
[29] G. D. Plotkin and M. Abadi, “A logic for parametric polymorphism,” in TLCA, 1993.
[30] J. C. Reynolds, “Separation logic: a logic for shared mutable data structures,” in LICS, 2002.
[31] A. Russo, A. Sabelfeld, and A. Chudnov, “Tracking information flow in dynamic tree structures,” in ESORICS, 2009.
[32] A. Sabelfeld and A. C. Myers, “Language-based information-flow security,” IEEE J. Selected Areas in Communications, vol. 21, no. 1, pp. 5–19, 2003.
[33] A. Sabelfeld and D. Sands, “Declassification: Dimensions and principles,” JCS, vol. 17, no. 5, pp. 517–548, 2009.
[34] D. M. Volpano, C. E. Irvine, and G. Smith, “A sound type system for secure flow analysis,” JCS, vol. 4, no. 2/3, pp. 167–188, 1996.

APPENDIX A: MEDICAL DATABASE APPLICATION

This appendix presents key features of the medical database application described in the body of the paper. For even more details, interested readers are directed to the RHTT code that accompanies this paper. In what follows, we revert to large footprint specifications in order to clarify the connection to the code.

A. Workflow

Consider the medical records workflow shown in Fig. 10, adapted from Borgström et al. [11]. The policy is: users registered as doctors may read confidential patient records, but only if the patient has previously given consent. Patients may give consent only to doctors who have previously requested it. Users are created dynamically by a root user through calls to the new_user function. Users are logged in and roles are activated dynamically as well, through calls to switch_user and activate_role respectively.

    do ((db, root) ← new_database;
        switch_user db root;
        activate_role admin db;
        doc ← new_user db [clinician];
        pat ← new_user db [patient];
        switch_user db doc;
        activate_role clinician db;
        request_consent db pat;
        switch_user db pat;
        activate_role patient db;
        give_consent db doc;
        switch_user db doc;
        activate_role clinician db;
        read_emr db pat)

Fig. 10: RHTT code for a medical records workflow adapted from Borgström et al. [11]. Our implementation of the workflow combines dynamic role-based access control with state-based policies.

B. Implementation

In our RHTT implementation of this program, the function new_database allocates local state for a new database data structure that records the current user, a list of activated roles and a roster of users. The type of new_database in RHTT is

    STsec [ ] (database × user)
      (fun i. True,
       fun yy ii mm. ∃jj kk:heap.
         let ddb ≙ (get_db yy.1, get_db yy.2) in
         let uu ≙ (get_user yy.1, get_user yy.2) in
         let uuid ≙ (get_uid yy.1, get_uid yy.2) in
         mm = jj •• kk •• ii
         ∧ jj ∈ sshape ddb ([ ], [ ]) uu ([uuid.1], [uuid.2])
         ∧ kk ∈ uushape (get_user yy.1, get_user yy.2) ([admin], [admin]))

specifying that new_database takes no local arguments ([ ]) and returns a pair of a database and a user as result. The precondition True ensures that new_database can be called in any heap, i.e., it never gets stuck. The relational postcondition of the STsec type specifies that the heaps m1 and m2 resulting from a call to new_database are an extension of the initial heaps i1 and i2 by two smaller pairs of heaps, the heaps (j1, j2) containing the newly allocated local database state, and the heaps (k1, k2) containing the newly allocated local state of the root user. (Recall from Sec. II that e.g., ii in the type refers to pair (i1, i2) as above.) The abstract predicate uushape is a relational invariant over two runs of new_database parametrized by pairs of users and pairs of sequences of roles (seq role). Thus the judgment

    kk ∈ uushape (get_user yy.1, get_user yy.2) ([admin], [admin])

states that in the two runs, heaps kk contain local state for the pair of users get_user yy.1, and get_user yy.2; furthermore, each such user has been granted administrative privileges via membership in the admin role, hence the pair ([admin], [admin]). The abstract predicate sshape ddb ([ ], [ ]) uu ([uuid.1], [uuid.2]) specifies a relational invariant similar to uushape over the local database state in the two runs. Here ddb is the pair of databases returned by the two runs, ([ ], [ ]) expresses that no roles are yet active, uu is the pair of root users returned in the two runs and ([uuid.1], [uuid.2]) gives the current roster, which initially just contains the ids for the root user. The implementation of new_database first allocates a pair of a user id, 0, and a list of roles, [admin], for the root user, then allocates local state for the new database, a triple of the active root user, the empty list of active roles and the singleton roster [0].

    do (u ← lalloc (0, [admin]);
        db ← lalloc ((u, 0), [ ], [0]);
        ret (db, (u, 0)))

We implement the abstract type database from the STsec type as a pointer to the triple and implement the abstract user type as a pointer to the user’s local state paired with the user’s id. Duplicating the user id in this way makes it possible to recover the id with one fewer pointer dereference. However, clients are shielded from such implementation details by an opaque interface. Finally, new_database returns a pair of (1) a pointer db:database to the local database state, and (2) a tuple (u, 0):user where u is a pointer to the root user’s local state and 0 is the root user id. The function lalloc performs low allocation [27, Sec. III] that guarantees that the allocated pointers are equal in the two runs.

The next three functions in the workflow (1) log in the root user (switch_user db root), (2) activate the root user’s admin role (activate_role admin db) and (3) create two new users, a clinician, doc, and a patient, pat, through two calls to the function new_user for dynamic creation of principals. To describe these we will need the unary counterpart, shape db act user rost, of the relational invariant sshape that we saw earlier. At a high level, switch_user updates the active user (the user component of shape) and clears act, the list of active roles to [ ], whereas activate_role role first checks that the active user is permitted to activate role and if so, adds role to act. The new_user function

    STsec [database, seq role] user
      (fun db roles i.
         ∃(act:seq role) (user:user) (rost:seq uid) (j h:heap).
           i = j • h ∧ j ∈ shape db act user rost ∧ admin ∈ act,
       fun ddb rroles yy ii mm.
         let uuid ≙ (get_uid yy.1, get_uid yy.2) in
         ∀aact uu rrost jj hh.
           ii = jj •• hh → jj ∈ sshape ddb aact uu rrost →
           ∃jj′ uu′:heap².
             mm = jj′ •• uu′ •• hh
             ∧ jj′ ∈ sshape ddb aact uu (uuid.1 :: rrost.1, uuid.2 :: rrost.2)
             ∧ uu′ ∈ uushape yy rroles)

takes two arguments, the root pointer to the database state db and a list of roles, roles, that will be granted to the new user upon creation. Function new_user’s precondition requires that heap j, a subheap of the initial heap i, contain the local state for db (j ∈ shape db act user rost) and furthermore, that the admin role be active (admin ∈ act). This second condition is an instance of an access-control condition, here specified as a predicate on the database state. In the workflow given above, both calls to new_user are safe since the root user previously activated the admin role, thereby putting admin in the list of active roles. Function new_user’s postcondition extends the local database heap jj′ (updated to add the new user to the roster) with a new heap uu′ containing the new user’s local state. The implementation of new_user

    do (fun db roles.
          db_state ← read db;
          let rost ≙ get_rost db_state in
          let fresh_id ≙ next_uid rost in
          u ← lalloc (fresh_id, roles);
          write db (upd_rost db_state (fresh_id :: rost));
          ret (u, fresh_id))

allocates the new user’s local state (u), assigns the user a unique user id (fresh_id) and adds the new user id to the database roster. The new user is not automatically logged in but can do so later through a call to switch_user. Function new_user is unusual in that it combines role-based access control with temporal access constraints. The former arises from the fact that a principal must be a member of the admin group in order to call new_user. The latter are a result of predetermined controls on the order in which interface functions must be called, enforced by the preconditions of STsec types: in order to satisfy new_user’s precondition, a user must first call activate_role to move admin into the list of active roles. Three of the other functions called in the workflow above, request_consent, give_consent and read_emr, impose similar role-based and temporal constraints on access. These constraints can be summarized as follows:

1) Only administrative users may create new users.
2) Only clinicians may request consent.
3) A user must activate a role before exercising capabilities associated with that role.
4) A patient may give consent only to a doctor who has previously requested it.
5) A doctor may read a patient’s EMR only after the patient has given consent.

Constraints 1 and 2 are role- or capability-based whereas 3, 4 and 5 are temporal: they require that actions be performed in a prespecified order. In RHTT, we specify such temporal or state-based access control properties using abstract predicates. By carefully controlling the interrelationships between abstract predicates and the functions exposed by the module interface, RHTT programs may impose arbitrary temporal constraints on clients. For example, the STsec type for a function f may state that an abstract predicate p holds in any state resulting from the execution of f. The precondition of a second function g may require that p hold in any state in which g is executed. If the module interface ensures that f is the only way to prove that p holds then we have enforced that f is always called before g. Furthermore, since p is indexed by the state resulting from the call to f, any updates to the state will invalidate p, requiring that f be called again before g is executed.

In the context of the workflow above, we use just this approach to enforce that 1) request_consent is called before give_consent; and 2) give_consent is called before read_emr. To do so, we define three new abstract predicates, requested_consent, consented and can_read_emr, each of which is a function from a doctor’s user id and a patient’s user id to prop. The use of these predicates is governed by the constraint that a client of the medical records module cannot call read_emr db pat unless the current user is doc and can_read_emr doc pat holds. We further require that (1) give_consent db doc can only be called when requested_consent doc pat holds for the current user pat, (2) calling give_consent produces the abstract predicate consented doc pat and (3) consented doc pat implies can_read_emr doc pat. To ensure that can_read_emr permission can be retracted, we might choose to index the predicate by the current heap, in addition to doc and pat. Database calls that update the heap would then invalidate can_read_emr. However, this violates our intended semantics; can_read_emr tokens should persist across unrelated updates to the database state.

To give a flavor for how such functions and abstract predicates are defined in RHTT, we describe the STsec type of request_consent in more detail here.

    STsec [database, user] unit
      (fun db pat i.
         ∃(act:seq role) (user:user) (rost:seq user) (j h:heap).
           i = j • h ∧ j ∈ shape db act user rost ∧ clinician ∈ act,
       fun ddb ppat yy ii mm.
         ∀aact uu rrost jj hh.
           ii = jj • hh ∧ jj ∈ sshape ddb aact uu rrost →
           mm = ii
           ∧ rrequested_consent (get_uid uu.1, get_uid uu.2) (get_uid ppat.1, get_uid ppat.2))

Function request_consent takes a database and a user—from whom the active principal is requesting consent—as local arguments. The precondition of the STsec type requires that the subheap j contain local state for the database and that the clinician role be active. The postcondition asserts that requested_consent u pat holds after both executions. To express this fact, we assert that the relational predicate rrequested_consent (get_uid uu.1, get_uid uu.2) (get_uid ppat.1, get_uid ppat.2) holds in the final states. Here rrequested_consent just lifts the unary requested_consent to operate on pairs of heaps. In order to use the relational version of this predicate in the preconditions of functions that require it, we expose the additional property that

    rrequested_consent ddoc ppat → requested_consent ddoc.1 ppat.1 ∧ requested_consent ddoc.2 ppat.2

i.e., that the relational invariant implies the unary property for each run.
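The ordering constraints described in this appendix can be mimicked at run time by a small executable model. The following Python sketch is only an illustration under stated assumptions and is not part of the RHTT development: the class MedicalDB, the exception AccessError, the placeholder record strings and the max-based id generator (standing in for next_uid) are all hypothetical names introduced here. Each precondition that RHTT would discharge statically through an STsec type is modeled as a dynamic check, and the abstract predicates requested_consent and consented are modeled as sets of (doctor id, patient id) pairs.

```python
class AccessError(Exception):
    """Raised when a modeled precondition fails (RHTT would reject the call statically)."""


class MedicalDB:
    """Toy run-time model of the medical database module (an assumption, not RHTT)."""

    def __init__(self):
        self.granted = {0: ["admin"]}  # roles granted to each user id; 0 is the root user
        self.roster = [0]              # registered user ids
        self.current = 0               # id of the active user
        self.active = []               # roles activated by the active user
        self.requested = set()         # models requested_consent doc pat
        self.consented = set()         # models consented doc pat
        self.emr = {}                  # hypothetical records, keyed by patient id

    def _require(self, condition, message):
        if not condition:
            raise AccessError(message)

    def switch_user(self, uid):
        self._require(uid in self.roster, "unknown user")
        self.current = uid
        self.active = []               # switching users clears the active roles

    def activate_role(self, role):
        self._require(role in self.granted[self.current], "role not granted")
        if role not in self.active:
            self.active.append(role)

    def new_user(self, roles):
        self._require("admin" in self.active, "admin role must be active")
        uid = max(self.roster) + 1     # assumed stand-in for next_uid
        self.granted[uid] = list(roles)
        self.roster.insert(0, uid)     # mirrors fresh_id :: rost
        self.emr[uid] = f"record-{uid}"
        return uid

    def request_consent(self, pat):
        self._require("clinician" in self.active, "clinician role must be active")
        self.requested.add((self.current, pat))

    def give_consent(self, doc):
        # Callable only if the doctor previously requested consent from the current user.
        self._require((doc, self.current) in self.requested, "consent was not requested")
        self.consented.add((doc, self.current))

    def read_emr(self, pat):
        # consented doc pat implies can_read_emr doc pat.
        self._require((self.current, pat) in self.consented, "patient has not consented")
        return self.emr[pat]


def run_workflow():
    """Replays the workflow of Fig. 10 against the toy model."""
    db = MedicalDB()
    db.switch_user(0)
    db.activate_role("admin")
    doc = db.new_user(["clinician"])
    pat = db.new_user(["patient"])
    db.switch_user(doc)
    db.activate_role("clinician")
    db.request_consent(pat)
    db.switch_user(pat)
    db.activate_role("patient")
    db.give_consent(doc)
    db.switch_user(doc)
    db.activate_role("clinician")
    return db.read_emr(pat)
```

In this model, skipping request_consent or give_consent makes the later call raise AccessError, whereas in RHTT the corresponding client program would fail to typecheck. Because the consent sets are not indexed by the rest of the state, they persist across unrelated updates, which matches the intended semantics of can_read_emr tokens described above.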
