Processor Rescue: Safe Coding for Hardware Aliasing

Peter T. Breuer⋆, Jonathan P. Bowen¹, and Simon Pickin²

¹ School of Computing, Telecommunications and Networks, Birmingham City University, UK
  [email protected], [email protected]
² Facultad de Informática, Complutense University of Madrid, Spain
  [email protected]
Abstract. What happens if a Mars lander takes a cosmic ray through the processor and thereafter 1 + 1 = 3? Coping with the fault is feasible, but requires the numbers 2 and 3 to be treated as indistinguishable for the purposes of arithmetic, while as memory addresses they continue to access different memory cells. If a program is to run correctly in this altered environment it must be prepared to see address 2 sporadically access the data in memory cell 3, a phenomenon known as ‘hardware aliasing’. This paper describes a programming discipline that allows software to run correctly in a hardware aliasing context, provided the aliasing is underpinned by hidden determinism.

Key words: hardware aliasing, machine code, compilation.
1 Introduction
Imagine that a new Mars lander has suffered a cosmic ray hit that damages a piece of internal circuitry in its arithmetic logic unit (ALU), so that 1 + 1 = 3 is the outcome from an addition. It continues to be the case that 2 + 0 = 2 and 3 − 1 = 2 as usual, but 1 + 1 and possibly more sums come out wrong. What should the mission team do about that? We hope they have a backup processor ready to take over in the lander in the short term³, but it turns out that we can rehabilitate the faulty processor via some reprogramming from back home on Earth: first adjusting fault handlers to tidy up the processor’s arithmetic so that it works with less precision than before, then rewriting the application machine code in the lander to suit. Successful rehabilitation restores processing redundancy.
⋆ The first named author wishes to acknowledge the support of HecuSys Inc. (http://www.hecusys.com) in connection with KPU technology, described herein.
³ The Curiosity rover now on Mars has a second processor that was signalled to take over on February 28, 2013, due to an issue with the active processor’s flash memory that resulted in it continuously rebooting, draining power. The rover was not heard from again until March 4, the spacecraft nearly having been lost. The issue recurred late in March 2013, and full normal operation was only finally resumed on March 25, 2013, after reloading programs to a different portion of flash memory.
A solution in the 1 + 1 = 3 case is to ignore bits 0 and 1 everywhere, so that the bit-patterns 0, 1, 2, 3 are all taken to mean the same integer value 0, the bit-patterns 4, 5, 6, 7 are all taken to mean the integer value 1, and so on. Viewed like that, arithmetic in the processor is not 32-bit but 30-bit, with a four-way redundancy in how each integer is coded as a bit-pattern. Only the top 30 bits of each bit-pattern in memory or on disk remain meaningful. When the processor calculates 1 + 1 = 3, that should be looked at as a concrete instance ‘on the metal’ of the sum 0 + 0 = 0, with different bit-patterns representing the integer 0.

There is potentially a crack in the illusion when the processor checks for zero, because on the metal it needs to check for any of the four possible bit-patterns 0, 1, 2 or 3 representing 0, but programs can be altered to cope, or it can be arranged that they fault on each branch-if-zero machine instruction and the four-fold check is run in the fault handler, avoiding any application code changes. The result is fully homomorphically encrypted computation [7, 18]. The processor continues to work correctly, but using a multi-valued numerical encoding in registers and memory that is different from the standard encoding. The reader need not take correctness on trust: a formal proof for the case of a Von Neumann processor architecture and 1-to-many encodings is given in [7], and for a rewrite-rule machine architecture in [18]. In the case of the Von Neumann machine, programs have to keep data addresses and program addresses separated according to the type discipline articulated in [5], because the former are encrypted and the latter are not, but that is the only restriction.

Memory access in a mended lander’s processor is problematic, however, and is not so straightforward to handle.
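To make the coding concrete, the following is a minimal Python sketch (ours, not part of the paper’s tool-chain) of the 30-bit-in-32-bit view: the decoding function D discards the two meaningless low bits, and an ‘on the metal’ zero test must accept all four patterns that encode 0.

```python
# Toy model of the 30-bit-in-32-bit view: only the top 30 bits of a
# bit-pattern are meaningful.

def D(pattern: int) -> int:
    """Decode a 32-bit pattern to the 30-bit integer it represents."""
    return pattern >> 2

def is_zero(pattern: int) -> bool:
    """'On the metal' zero test: any of the four patterns 0..3 means 0."""
    return pattern in (0, 1, 2, 3)

# The faulty sum 1 + 1 = 3 is a concrete instance of 0 + 0 = 0:
assert D(1) == 0 and D(3) == 0
# Four distinct bit-patterns encode each integer value:
assert all(D(p) == 0 for p in (0, 1, 2, 3))
assert all(D(p) == 1 for p in (4, 5, 6, 7))
assert is_zero(3) and not is_zero(4)
```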
A memory offset that is intended by the programmer to be 0 may be expressed by the processor as any of the bit-patterns 0, 1, 2 or 3, and the memory circuits will access memory at any one of four corresponding locations, with apparently haphazard results from the point of view of the program. That is classically known as hardware aliasing, and this paper shows how to ensure that machine code modified for a damaged Mars lander continues to work correctly in the face of the hardware aliasing issues that arise in connection with memory addressing.

Aliasing in the more familiar sense – let us dub it ‘software aliasing’ in this context in order to distinguish it – is relatively well studied. It occurs when two different addresses access the same location in memory. This kind of aliasing is broadly treated in most texts on computer architecture (see, e.g., Chapter 6 of [2]) and is common lore in operating systems kernel programming. In contrast, ‘hardware’ aliasing nowadays commonly arises only in embedded systems under certain circumstances⁴, and awareness of it is restricted to a few embedded systems programmers. But programmers do have prior experience with hardware aliasing, although the era has been largely forgotten now. It used to be common in the early days of DOS and the IBM PC, when memory managers such as Quarterdeck’s expanded memory manager (QEMM) [11] allowed the same address to access both BIOS and RAM opportunistically just below 1024K.

The situation envisaged here is somewhat special with respect to that venerable era in that aliasing in a mended Mars lander is fully deterministic ‘under the hood’. It may be observed that 1 + 1 = 3 consistently, every time, because the damage is a physical open or short circuit, and running exactly the same sequence of machine code always produces exactly the same arithmetic results.

Axioms: the discussion in this paper is restricted to hidden deterministic hardware aliasing contexts, which means that:

  I  a saved address copy always accesses the same memory location on reuse;
  II recalculating the address exactly the same way accesses the same location.

We are motivated particularly by one special case: the Krypto-Processor (KPU) is a processor design in which encrypted data circulates through memory and registers without ever being decrypted [9, 8]. It is in principle secure from backdoors, privileged observers and malfeasants. Nevertheless, a KPU is in every way but one an ordinary CPU; the difference is that its ALU has been transformed mathematically so that it works natively on encrypted data. Because practical encryptions encode each integer value as many alternative bit-patterns in the KPU, it looks from the outside rather like a Mars lander with a damaged ALU. To make sense of the numbers that come out of the KPU, or a damaged Mars lander’s processor, one has to look at them through the right kind of ‘refracting lenses’. The right lens for the numbers coming out of a KPU is the decrypting function for the cipher in use. In the case of the damaged Mars lander, it is the lens that sees no difference between the bit-patterns 0, 1, 2, 3, equating them all to the intended number 0.

⁴ Barr in [2] describes it thus: “Hardware Aliasing: . . . [is] used to describe the situation where, due to either a hardware design choice or a hardware failure, one or more of the available address bits is not used in the memory selection process.” That is, the available memory address bits do not serve to distinguish two different locations.
The canonical KPU design encrypts 32-bit integers in 64 bits, so up to 2^32 different 64-bit patterns exist for each intended 32-bit address. While all are meant by the programmer to address the same location, they in fact all address different locations via the KPU’s quite standard memory hardware, which is not privy to the encryption used, in order that ‘cold boot’ attacks [13, 17, 12] that examine memory contents may not succeed – such attacks will discover no more than encrypted memory contents located at encrypted memory addresses, and many alternative encrypted addresses at that. The memory address decoder hardware in the KPU does not come equipped with the magic lenses that decrypt what the circulating bit-patterns mean, and the result is hardware aliasing.

The other case of note that produces hardware aliasing by design is a bespoke embedded processor with, say, 40 bits of arithmetic but provision for 64-bit addressing. The extra memory address lines might be grounded, or held high, and this varies from design to design. The lines may be connected to 64-bit address registers in the processor, so their values change with the register contents, and it is up to software to set the extra bits to zero, or one, or some consistent value, in order that calculating an address may yield a consistent result. The good news is that a single executable may be compiled that works for all design variants in the same generic class, just because the aliasing is deterministic underneath.
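Hidden-deterministic aliasing can be sketched in a few lines (our illustration, not the KPU implementation): memory decodes nothing, so distinct bit-patterns for the same intended address reach distinct cells, yet reusing the identical pattern always reaches the same cell, in accordance with Axioms I and II.

```python
# Memory indexed by raw bit-pattern: the hardware applies no decoding.
memory = {}   # raw pattern -> stored word

def store(pattern, value):
    memory[pattern] = value

def load(pattern):
    return memory.get(pattern)

# The programmer intends address 0, but under the damaged-lander encoding
# the hardware may present it as pattern 1 or pattern 2.
alias_a, alias_b = 1, 2

store(alias_a, "data")
assert load(alias_a) == "data"   # Axiom I: a saved copy of the pattern works
assert load(alias_b) is None     # a different alias of 'address 0' misses
```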
4
Peter T. Breuer, Jonathan P. Bowen, and Simon Pickin
Indeed, given conditions (I) and (II), it turns out to be not hard to write code that avoids aliasing. There is a programming discipline to be followed, and conventional assembly and machine code may be altered to follow it. The rationale behind the discipline is that (I) and (II) say that the bit-pattern representing a memory address is just a number deterministically produced by the processor from its inputs. Provide exactly the same inputs again (identical bit-patterns) and the processor will repeat the same transformations to produce the same outputs. If, every time an address is needed, the same instructions are used to calculate it in exactly the same way from the same bit-patterns as starting point, the same pattern of bits must result. So the discipline consists of using exactly the same calculation for the same address from the same starting point every time.

‘Copying’ (I) is just the trivial case; nevertheless it is one point where existing assembly and machine code almost always needs modification to work in a hardware aliasing context. Reduced instruction set computer (RISC) [16] architectures in particular have code that is written to move data between registers by adding zero through the ALU, not by shuffling data out of one register into another. That allows the instruction set to comprise one instruction fewer. But passing data through the ALU, even to add zero, may transform the bit-representation non-trivially in the setting considered here. Programmers assume that adding zero to the bit-pattern 0x1 gives just the same bit-pattern 0x1 again, but in the context of a KPU or a broken Mars lander the number 0 may be represented by the bit-pattern 0x2, and adding zero to it (in the form of the alternative bit-pattern 0x1) may produce the different bit-pattern 0x3, although that is just another encoding of the number 0.

We are not aware of any existing techniques to fix or accommodate a broken processor in the field.
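The add-zero hazard just described can be sketched as follows (a hypothetical 3-bit model, ours): a plain register-to-register copy preserves the bit-pattern exactly, while a ‘move by adding zero’ through the faulty ALU may return a different, merely equivalent pattern.

```python
def alu_add(x, y):
    """A 3-bit adder in which 1 + 1 comes out as 3 (patterns, mod 8)."""
    return 3 if (x, y) == (1, 1) else (x + y) % 8

def D(pattern):
    return pattern // 4   # patterns 0..3 mean 0, patterns 4..7 mean 1

r1 = 1                        # some pattern encoding the integer 0
r2_by_copy = r1               # move: bit-for-bit copy
r2_by_add = alu_add(r1, 1)    # 'add zero', with zero encoded as pattern 1

assert r2_by_copy == r1             # the copy preserves the pattern
assert r2_by_add != r1              # add-zero changed the pattern (1 -> 3)
assert D(r2_by_add) == D(r1) == 0   # ... though it still encodes 0
```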
There is work on strategies to cope with ionising radiation (e.g., [19], which advocates redundant caching hardware), but we have found none that contemplates repairing a chip’s ability to calculate after damage to it.

The layout of this paper is as follows: Section 2 discusses mending processor arithmetic to the point where programs can run again. Section 3 introduces the consequent hardware aliasing problem, and Section 4 shows how to compile around hardware aliasing when the aliasing effect is underpinned by hidden determinism. Section 5 illustrates the procedure with a short example and references where to find larger examples and software tools related to this solution.
2 Processor Repair up to a Point
When a processor develops the idiosyncrasy that 1 + 1 = 3, that gives rise to logical contradictions through the standard laws of algebra. Surely 1 + 2 = 1 + 1 + 1 = 3 + 1 = 4 in consequence, yet 1 + 2 = 3 may continue to be the output from the processor. So does 3 = 4? In this section a full repair for 1 + 1 = 3 will be examined for a hypothetical 3-bit CPU with the following arithmetic tables,
incorporating the 1 + 1 = 3 fault:

    +  0 1 2 3 4 5 6 7        ×  0 1 2 3 4 5 6 7
    0  0 1 2 3 4 5 6 7        0  0 0 0 0 0 0 0 0
    1  1 3 3 4 5 6 7 0        1  0 1 2 3 4 5 6 7
    2  2 3 4 5 6 7 0 1        2  0 2 4 6 0 2 4 6
    3  3 4 5 6 7 0 1 2        3  0 3 6 1 4 7 2 5
    4  4 5 6 7 0 1 2 3        4  0 4 0 4 0 4 0 4
    5  5 6 7 0 1 2 3 4        5  0 5 2 7 4 1 6 3
    6  6 7 0 1 2 3 4 5        6  0 6 4 2 0 6 4 2
    7  7 0 1 2 3 4 5 6        7  0 7 6 5 4 3 2 1
In this particular case a solution is to regard the bit-patterns 0, 1, 2, 3 as equivalent, all meaning 0. In general the solution is always to develop a notion of equivalence among bit-patterns. An equivalence class proposed here is {0, 1, 2, 3}, meaning 0, and the equivalence it is part of is x ≡ y iff x − y ∈ {0, 1, 2, 3}. The aim is to develop an equivalence such that all the variant answers that one may get for a particular result by varying the calculation through the stricken ALU lie within the same equivalence class. That requires adjustments in the arithmetic tables beyond the original fault. In this case x +′ y = (x₂ ^ y₂).(x + y)₁,₀ and x ×′ y = (x₂ & y₂).(x × y)₁,₀ will do, giving the formulae bitwise here: x₂ denotes bit 2 of x, (z)₁,₀ denotes bits 1 and 0 of z, ‘.’ is bit concatenation, and the + and × on the right-hand sides are the faulty machine operations:

    +′ 0 1 2 3 4 5 6 7        ×′ 0 1 2 3 4 5 6 7
    0  0 1 2 3 4 5 6 7        0  0 0 0 0 0 0 0 0
    1  1 3 3 0 5 6 7 4        1  0 1 2 3 0 1 2 3
    2  2 3 0 1 6 7 4 5        2  0 2 0 2 0 2 0 2
    3  3 0 1 2 7 4 5 6        3  0 3 2 1 0 3 2 1
    4  4 5 6 7 0 1 2 3        4  0 0 0 0 4 4 4 4
    5  5 6 7 4 1 2 3 0        5  0 1 2 3 4 5 6 7
    6  6 7 4 5 2 3 0 1        6  0 2 0 2 4 6 4 6
    7  7 4 5 6 3 0 1 2        7  0 3 2 1 4 7 6 5
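The repaired tables above can be regenerated and checked mechanically. The following Python sketch (ours) implements the two formulae, taking the top bit from XOR/AND of the operands’ top bits and the low two bits from the faulty machine sum/product, and verifies the homomorphism property discussed below.

```python
def broken_add(x, y):
    """The damaged 3-bit adder: 1 + 1 = 3, otherwise addition mod 8."""
    return 3 if (x, y) == (1, 1) else (x + y) % 8

def add_r(x, y):
    """Repaired +' : top bit x2 ^ y2, low two bits from the faulty sum."""
    top = ((x >> 2) ^ (y >> 2)) & 1
    return (top << 2) | (broken_add(x, y) & 3)

def mul_r(x, y):
    """Repaired x' : top bit x2 & y2, low two bits from the product mod 8."""
    top = (x >> 2) & (y >> 2) & 1
    return (top << 2) | ((x * y) % 8 & 3)

def D(x):
    """Decode: {0,1,2,3} -> 0, {4,5,6,7} -> 1."""
    return x >> 2

# D is a homomorphism onto arithmetic mod 2, for every pair of patterns:
for x in range(8):
    for y in range(8):
        assert D(add_r(x, y)) == (D(x) + D(y)) % 2
        assert D(mul_r(x, y)) == (D(x) * D(y)) % 2

assert add_r(1, 1) == 3   # the original fault is absorbed, not corrected
```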
It can be seen that the modifications leave the four quarters of the tables each occupied by elements from only one of the two equivalence classes {0, 1, 2, 3} and {4, 5, 6, 7}. That is, if x′ ≡ x and y′ ≡ y then x′ +′ y′ ≡ x +′ y, which means that the modified arithmetic ‘makes sense’ with respect to this equivalence.

While it is not hard to make these modifications in practice – the processor must be reconfigured to fault on arithmetic instructions and the fault handler must be programmed to produce the modified result using the functionality available – in this particular case the fault handler might as well just correct the original result 1 + 1 = 3 back to the intended 1 + 1 = 2. However, in general that is not always convenient. It may be, for example, that one pin of the ALU output is stuck to the internal carry, so it is hard to force the correct value to appear in that bit position without also losing the carry. It is always the case, however, that whatever repair is effected, it either merely permutes all the bit-patterns available or else it makes some bit-patterns equivalent, so that they now represent the same integer just as 0, 1, 2, 3 represent the integer 0 above. The latter situation is of most interest.

The equivalence classes can be viewed as the inverse images of decodings to integers under some function D. In the example above the equivalence class {0, 1, 2, 3} = D⁻¹{0} and {4, 5, 6, 7} = D⁻¹{1}, so D is D(x) = ⌊x/4⌋, the integer part of x/4. The design of the repaired addition and multiplication operations is such that D(x +′ y) = D(x) + D(y) mod 2 and D(x ×′ y) = D(x)D(y) mod 2. That means that D is a mathematical
homomorphism on the set of bit-patterns that turns the computer operations into the arithmetic operations mod 2 (more generally, mod 2^n, for some n). In repairing a faulty processor one is looking to create a homomorphism, and the arithmetic will then be a homomorphic image of arithmetic mod 2^32 (for a 32-bit processor), and thus an arithmetic mod 2^n for n ≤ 32. That means that the effective precision, or number of bits, of a repaired processor is reduced from 32 to n, and if any two bit-patterns represent some one integer, then n < 32 and there are exactly 2^(32−n) bit-patterns representing each integer value.

The reasoning in the paragraphs above explains why the KPU is relevant here. The arithmetic in a KPU is an altered form of computer arithmetic, such that the result of an addition E(a) + E(b) in the processor of two encrypted values E(a) and E(b) is an encryption E(a + b mod 2^32) of the expected arithmetic result a + b mod 2^32. In the canonical KPU design there are 2^32 encryptions of every 32-bit integer, each fitting into a 64-bit word, and E is a 1-to-2^32 ‘many-valued function’, or relation. Its inverse, the decrypting function D, has D(E(a) + E(b)) = D(E(a + b mod 2^32)) = a + b mod 2^32. Writing a′ = E(a) and b′ = E(b), this says that D(a′ + b′) = a + b mod 2^32 = D(a′) + D(b′) mod 2^32, and D is a homomorphic function. The decryption D establishes equivalence classes for an equivalence x ≡_D y iff D(x) = D(y) that settles which bit-patterns x and y are alternate codings for the same integer value D(x) = D(y). Any strategy that allows programs to continue working in the context of the deliberately changed arithmetic in a KPU also enables continued working in the context of a repair to an impaired arithmetic in a conventional processor.
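A toy additive analogue (ours, and far weaker than any real cipher) illustrates the arrangement: E maps a 32-bit integer to one of 2^32 possible 64-bit patterns by choosing a free high word, plain 64-bit addition acts on patterns, and D recovers a + b mod 2^32.

```python
import random

M = 1 << 32   # 32-bit modulus

def E(a):
    """One of 2^32 encodings of a: low word carries a, high word is free."""
    return (random.randrange(M) << 32) | (a % M)

def D(pattern):
    """Decode by discarding the high word."""
    return pattern % M

a, b = 123456789, 987654321
pa, pb = E(a), E(b)

# Plain 64-bit addition on patterns is homomorphic under D:
assert D((pa + pb) % (1 << 64)) == (a + b) % M
# Different encodings of the same value decode identically:
assert D(E(a)) == D(E(a)) == a
```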
3 The Hardware Aliasing Problem
How code may go wrong when arithmetic in the processor is a reduced-precision image of the original, either through repair in a Mars lander or deliberately in a KPU, is illustrated by the way that a compiler renders machine code for the stack pointer movement around a function call. Say the code of subroutine foo first decrements the stack pointer by 32 to make space for a frame of 8 local variables of one word (4 bytes) each on the stack. Before return from the routine, the code increments the pointer back to its original value sp_0. The following is the assembler/machine code emitted by a RISC compiler (gcc 4.9 for MIPS):

    foo:  addiu sp sp -32   # decrement stack pointer register by 32 (8 word frame)
          . . . more code . . .
          addiu sp sp 32    # restore initial stack pointer value by adding 32 again
          jr ra             # jump back to return address stored in ra register

The ‘restore . . . by adding 32’ calculates sp_0 − 32 + 32. That is a bit-pattern equivalent, via the equivalence of the previous section, to the intended result sp_0, but not necessarily identical to it. If sp_0 were, say, 0xb0000000, then sp_0 − 32 + 32 might
be not 0xb0000000 but 0x12345678. Though both 0xb0000000 and 0x12345678 represent the same integer value, they are different encodings of it. The outcome is that a different ‘alias’ bit-pattern of the initial stack pointer sp_0 ends up in the sp register. The caller gets back a pointer 0x12345678 that does not point to its own data, which was left at 0xb0000000. It restores from 0x12345678, so it will not recover the data it wrote at 0xb0000000 earlier. The following code sequence works instead:

    foo:  move fp sp        # copy stack pointer register to fp register
          addiu sp sp -32   # decrement stack pointer register by 32 (an 8 word frame)
          . . . more code . . .
          move sp fp        # copy stack pointer value back from fp register
          jr ra             # jump back to return address stored in ra register
This code is not victim to the aliasing effect. It takes an extra register (fp) and needs an extra instruction (the initial move), but the old fp register content may first have been saved on the stack, to be restored before return, so it is not lost. The fp register may also be saved on the stack during execution of the interior code in the routine, and restored before return, so there is no loss of a slot.

How may one formally show the second code is aliasing-safe? One technique is described in [6]. There, semi-automatic decompilation [3, 4] of RISC machine code to assembler code for a stack machine, together with its automatic validation via a Hoare logic [15], is used, and that can show that the code above (in full) is safe. The technique annotates the decompiled code in the style of verification frameworks such as VCC [10]. Human assistance is required in choosing between alternative decompilations of the machine code, which amounts to choosing between alternative logical rules of inference that may be applied at each point in the machine code (the different logical rules correspond to different decompilations). However, [6] shows that there are at most 32 different decompilations possible for each RISC machine code, corresponding to the different registers in which the stack pointer may reside at program start. If one assumes that the stack pointer is in the sp register, as is standard, then there is no ambiguity. An interesting point is that different decompilations of machine code correspond to different proofs that the machine code is safe, so while there may be several different decompilations available from point to point in the code, very few combinations of those will fit together coherently.
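The stack-pointer example can be simulated (our sketch, with an invented deterministic adder that always emits its own choice of representative pattern): the add/add sequence loses the caller’s data while the move/move sequence recovers it.

```python
def D(p):
    """Decode: the top 30 bits of a pattern carry the intended value."""
    return p >> 2

def alu_add(x, y):
    """Repaired adder: correct value, but it emits its own representative
    pattern (low bits fixed to 0b10 here), not necessarily the input's."""
    return (((D(x) + D(y)) % (1 << 30)) << 2) | 0b10

sp0 = 0xB0000001                  # the caller's pattern for its stack pointer
memory = {sp0: "caller data"}     # caller's word, keyed by raw pattern

minus32 = (-32 % (1 << 30)) << 2  # a pattern encoding the value -32
plus32 = 32 << 2                  # a pattern encoding the value +32

# add/add: restore by recomputing sp0 - 32 + 32 through the ALU
sp = alu_add(alu_add(sp0, minus32), plus32)
assert D(sp) == D(sp0)            # same intended address ...
assert sp != sp0                  # ... but a different alias pattern
assert memory.get(sp) is None     # the caller's data is lost

# move/move: save the pattern bit-for-bit and copy it back
fp = sp0                          # move fp sp
sp = alu_add(sp0, minus32)        # addiu sp sp -32
sp = fp                           # move sp fp
assert memory[sp] == "caller data"
```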
4 Constructing Hardware Alias-Safe Code
A technique other than validation of existing machine code after the fact is to construct the machine code to be safe in the first place, which means compiling appropriately. The validation technique of [6] establishes, inter alia, that:

(a) reads and writes of local variables within the current routine’s stack frame are only by means of machine code memory load and store instructions that each address a fixed constant offset from the bottom of the frame;
(b) no read or write beyond the current routine’s stack frame boundary, say to a parent frame’s local variable, is attempted;
(c) no stack location is read before it is written.

Reading these criteria as a recipe for compiling machine code results in machine code that is safe against hardware aliasing by construction. Criteria (a) and (b) taken together mean there is only one way of accessing a local variable at offset 12 from the bottom of the frame. It is via a

    lw r, 0xabcd(fp)    or    sw 0xabcd(fp), r

machine code instruction, respectively a load to and a store from register r, where 0xabcd is a fixed constant bit-pattern encoding the integer 12. Only one of the possible bit-patterns (aliases) representing the integer offset 12 is allowed, and 0xabcd has been chosen here. Following that recipe guarantees that every time the address of the variable is calculated during the subroutine call, it is by means of exactly the same calculation on exactly the same atomic components, namely fp + 0xabcd.⁵ The integer offset encoded must be less than the frame size, so there is no possibility of accessing local variables in the parent frame from a subroutine – such accesses are often generated by compilers as optimisations [14, 1]. That would amount to using two different calculations for the same intended address, which cannot be relied on to deliver the same bit-pattern. Because the same calculation is used for the address of the local variable each time, down to the bit-patterns representing constant elements of the calculation, Axiom II of Section 1 applies, and the same memory location really is accessed each time, thereby avoiding hardware aliasing.

Exactly the same technique is used to access arrays:

(d) an array element may only be accessed via an explicit offset from the bottom of the array embedded in a load or store instruction, and the same bit-pattern must be used for the displacement each time, even if other bit-patterns exist that also represent the same integer offset;
(e) no access below zero or beyond the array extent may be attempted;
(f) no array element may be read before it is written.

In consequence, exactly the same calculation for the address of an array element is used each time it is needed, and Axiom II guarantees that the same memory location is accessed, avoiding aliasing.

Strings, however, are accessed via a different pattern. The idea is to use calculations for the address that have the form base + 0, base + 1 + 0, base + 1 + 1 + 0, . . . for the consecutive elements of the string. That is done by incrementing the string pointer from the base address in constant amounts via immediate addition operations and then doing a final access via a load or store at a displacement of zero. Given that the base address of the string is in register r1, the instruction sequence to read the second element of the string into register r2 is

    addi r1 r1 0xf000baaa; addi r1 r1 0xf000baaa; lw r2 0xdeedd04e(r1)

where 0xf000baaa is a bit-pattern representing the integer 1 and 0xdeedd04e is a bit-pattern representing the integer 0. That is:

(g) a string element may only be accessed by a sequence of constant increments from the base of the string, using the same bit-pattern for the increment each time, followed by a load or store instruction with displacement zero, expressed as the same constant bit-pattern each time;
(h) no access below the start of the string or beyond a null element in the string may be attempted;
(i) no string element may be read before it is written.

Because strings are set up in read-only memory during program load, before the program runs, some calculation by the compiler at the time the executable file is constructed ensures that the string elements are placed at exactly those address bit-patterns where the program will look for them at runtime.

⁵ A more sophisticated version of this recipe relaxes the rule to allow aliases other than 0xabcd to be used, provided that the same alias is always used for one write and the succeeding reads; the next write may use a different alias again.
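The discipline for locals and strings can be sketched as follows (our model, reusing the invented representative-emitting adder of Section 3): every access to a given slot replays the identical calculation on identical bit-patterns, so by Axiom II it lands on the identical memory cell.

```python
def D(p):
    return p >> 2   # the top 30 bits of a pattern carry the intended value

def alu_add(x, y):
    """Deterministic adder: same input patterns always give same output."""
    return (((D(x) + D(y)) % (1 << 30)) << 2) | 0b10

memory = {}
fp = 0x1000 << 2                       # frame pointer pattern at entry

# Local variable at offset 12: ONE chosen pattern for the offset, always.
OFF12 = 12 << 2
addr_w = alu_add(fp, OFF12)            # store: fp + OFF12
memory[addr_w] = 42
addr_r = alu_add(fp, OFF12)            # load: exactly the same calculation
assert memory[addr_r] == 42            # Axiom II: same cell, no alias miss

# String: base + 0, base + 1 + 0, ... with ONE pattern per constant.
ZERO = 0                               # the chosen pattern for displacement 0
ONE = 1 << 2                           # the chosen pattern for increment 1
base = 0x2000 << 2
p0 = alu_add(base, ZERO)               # element at base + 0
p1 = alu_add(alu_add(base, ONE), ZERO) # element at base + 1 + 0
memory[p0], memory[p1] = ord('h'), ord('i')

# Replaying the identical increment chain reads the same cells back:
assert memory[alu_add(base, ZERO)] == ord('h')
assert memory[alu_add(alu_add(base, ONE), ZERO)] == ord('i')
```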
5 Example Code
A simple machine code program, safe from aliasing, that just calls ‘printstr’ with a string address as argument, then calls ‘halt’, is shown in Table 1. The reader may recognise it as a “hello world” program. It contains subroutine calls, conditionals, jumps, etc., as well as string accesses. The code was emitted by a modified standard compiler (gcc 4.9 for MIPS), so some compiler quirks are still visible.

Table 1: Example code. For clarity the intended offsets and increments are shown, not the bit-patterns that code for them.

     1.  main:  move fp sp        ; copy stack pointer to frame pointer
     2.         addiu sp sp -32   ; push stack for local frame
     3.         sw ra 28(sp)      ; save ra in local frame
     4.         sw fp 24(sp)      ; save old stack pointer in local frame
     7.         li a0 helloworld  ; load string address
     8.         jal printstr      ; call printstr subroutine
    10.         jal halt          ; call halt subroutine
    11.         nop
    14.         lw ra 28(sp)      ; restore ra
    15.         lw fp 24(sp)      ; prepare to restore old stack pointer
    16.         move sp fp        ; pop stack, deleting local frame
    17.         jr ra             ; return
         helloworld: <string data>

An address for the “hello world” string on the heap is introduced on line 7 of ‘main’ by the li a0 (load immediate) instruction, which sets the a0 (‘0th argument’) register for the call to ‘printstr’ on line 8. Execution stops in the ‘halt’ subroutine. The ‘main’ code contains the safe-from-aliasing stack push and pop sequence described in Section 3. Line 1 saves the stack pointer in the frame pointer; line 2 changes the stack pointer, making a local frame in which those registers that will be clobbered by the subroutine calls can be saved (lines 3 and 4). The frame pointer itself is one of those saved registers (line 4). It is to be supposed that the called subroutines each execute a similar sequence to the one ‘main’ executes in order to recover the value of the stack pointer that they had on entry. Line 15 therefore restores the frame pointer after the subroutine calls have returned, and line 16 moves it back into the stack pointer, re-establishing the value the stack pointer had at entry to the subroutine.

Note that program addresses are here embedded ‘as is’ in program machine code. The processor and instruction set architecture is, or should be, designed so that there is no calculation involved in going from the bit-pattern for an address that is embedded in a machine code jump or branch instruction to the one that the fetch cycle in the processor uses to retrieve the target instruction from memory.

A KPU simulator based on the OpenRISC v1.1 processor (see homepage at http://opencores.org/or1k) is available at http://sf.net/p/or1ksim64kpu, and it exhibits 2^32-way hardware aliasing via its 32-bits-encrypted-in-64-bits architecture. A tool-chain is available at http://sf.net/p/or1k64kpu-binutils/. The OpenRISC ‘or1ksim’ test suite has been compiled for this KPU, and a typical arithmetic test in the suite (the ‘is-add-test’, for addition) comprises 205,582 executed instructions, 176,117 loaded. The test suite executes without error, which lends some empirical weight to the claim that the compilation strategy described in this paper is sound and results in hardware aliasing-safe code.
Conclusion

In this paper we have described a style of compilation to machine code that avoids hardware aliasing in an environment where the aliasing has hidden determinism. In that kind of environment, a program may choose to replay the same calculation for the same address, resulting in a unique bit-pattern being used to access that address at runtime, avoiding hardware aliasing.

A repeatable fault in a processor can be masked by regarding ‘wrong’ calculations as producing a bit-pattern that is an alternative to the conventional encoding of the result. Modulo an induced equivalence, conventional computer arithmetic is restored, albeit with fewer bits of precision. But the repair causes programs to exhibit hardware aliasing, because different bit-patterns intended to encode the same target address really access different memory locations. Recompiling programs in the style described here then completes the repair.
References

1. Allen, R., Kennedy, K.: Optimizing Compilers for Modern Architectures: A Dependence-Based Approach, vol. 289. Morgan Kaufmann, San Francisco (2002)
2. Barr, M.: Programming Embedded Systems in C and C++. O’Reilly & Associates, Inc., Sebastopol, CA, USA, 1st edn. (1998)
3. Bowen, J.P., Breuer, P.T.: Decompilation. In: van Zuylen, H. (ed.) The REDO Compendium: Reverse Engineering for Software Maintenance, chap. 10, pp. 131–138. John Wiley & Sons (1993)
4. Breuer, P.T., Bowen, J.P.: Decompilation: The enumeration of types and grammars. ACM Transactions on Programming Languages and Systems (TOPLAS) 16(5), 1613–1647 (Sep 1994)
5. Breuer, P.T., Bowen, J.P.: Typed assembler for a RISC crypto-processor. In: Proc. ESSoS’12: International Symposium on Engineering Secure Software and Systems. LNCS, vol. 7159, pp. 22–29. Springer (Feb 2012)
6. Breuer, P.T., Bowen, J.P.: Certifying machine code safe from hardware aliasing: RISC is not necessarily risky. In: Counsell, S., Núñez, M. (eds.) OpenCert 2013, Workshops collocated with SEFM 2013, Madrid, Spain, 23–26 Sep. 2013. LNCS, vol. 8368. Springer (Sep 2013)
7. Breuer, P.T., Bowen, J.P.: A fully homomorphic crypto-processor design: Correctness of a secret computer. In: Proc. ESSoS’13: International Symposium on Engineering Secure Software and Systems. LNCS, vol. 7781, pp. 123–138. Springer (Mar 2013)
8. Breuer, P.T., Bowen, J.P.: Avoiding hardware aliasing: Verifying RISC machine and assembly code for encrypted computing. In: Proc. 25th IEEE International Symposium on Software Reliability Engineering Workshops (ISSRE 2014), 2nd IEEE International Workshop on Reliability and Security Data Analysis (RSDA 2014), pp. 365–370. IEEE (Nov 2014)
9. Breuer, P.T., Bowen, J.P.: Towards a working fully homomorphic crypto-processor – practice and the secret computer. In: Jürjens, J., Piessens, F., Bielova, N. (eds.) Proc. ESSoS’14: International Symposium on Engineering Secure Software and Systems. LNCS, vol. 8364, pp. 131–140. Springer (Feb 2014)
10. Cohen, E., Dahlweid, M., Hillebrand, M., Leinenbach, D., Moskal, M., Santen, T., Schulte, W., Tobies, S.: VCC: A practical system for verifying concurrent C. In: Berghofer, S., Nipkow, T., Urban, C., Wenzel, M. (eds.) Proc. 22nd TPHOLs. LNCS, vol. 5674, pp. 23–42. Springer (2009)
11. Glosserman, P.: Quarterdeck Expanded Memory Manager: QEMM, Instant Power for 386, 486 or Pentium PCs. Quarterdeck Office Systems (1985)
12. Gruhn, M., Müller, T.: On the practicability of cold boot attacks. In: 8th International Conference on Availability, Reliability and Security (ARES 2013), pp. 390–397 (Sep 2013)
13. Halderman, J.A., Schoen, S.D., Heninger, N., Clarkson, W., Paul, W., Calandrino, J.A., Feldman, A.J., Appelbaum, J., Felten, E.W.: Lest we remember: Cold-boot attacks on encryption keys. Communications of the ACM 52(5), 91–98 (2009)
14. He, J., Bowen, J.P.: Specification, verification and prototyping of an optimized compiler. Formal Aspects of Computing 6(6), 643–658 (1994)
15. Hoare, C.A.R.: An axiomatic basis for computer programming. Communications of the ACM 12(10), 576–580 (Oct 1969)
16. Patterson, D.A.: Reduced instruction set computers. Communications of the ACM 28(1), 8–21 (Jan 1985)
17. Simmons, P.: Security through amnesia: A software-based solution to the cold boot attack on disk encryption. In: Proc. 27th Annual Computer Security Applications Conference (ACSAC’11), pp. 73–82. ACM, New York, NY, USA (2011)
18. Tsoutsos, N.G., Maniatakos, M.: The HEROIC framework: Encrypted computation without shared keys. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 34(6), 875–888 (Apr 2015)
19. Wang, S., Hu, J., Ziavras, S.G.: On the characterization of data cache vulnerability in high-performance embedded microprocessors. In: Proc. IC-SAMOS 2006: International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, pp. 14–20 (Jul 2006)