Remote Cache Timing Attack on Advanced Encryption Standard and Countermeasures

Darshana Jayasinghe, Jayani Fernando, Ranil Herath, and Roshan Ragel
Department of Computer Engineering, Faculty of Engineering, University of Peradeniya, Peradeniya 20400, Sri Lanka
Abstract— AES, the Advanced Encryption Standard, is a symmetric-key encryption standard widely used to secure data where confidentiality is critical. AES was adopted from the Rijndael algorithm, developed by Joan Daemen and Vincent Rijmen. In 2001 NIST, the National Institute of Standards and Technology, declared Rijndael the next-generation cryptographic algorithm, and it was thus titled AES, the Advanced Encryption Standard. NIST spent several years analyzing Rijndael for vulnerabilities against all known classes of attack and finally declared it secure. In 2005 Daniel J. Bernstein claimed that software implementations of AES are vulnerable to side-channel attacks. Side-channel attacks are a form of cryptanalysis that focuses not on breaking the underlying cipher directly but on exploiting weaknesses in particular implementations of it. Such attacks can be derived from side-channel information such as timing, radiation of various sorts, power consumption, and cache contents. Software implementations of AES use a series of table lookups to increase performance. Because these tables do not always fit entirely in the cache, cache hits and misses occur frequently during encryption, causing lookup times, and thus encryption times, that vary with the input text and the encryption key. The Cache Timing Attack proposed by Bernstein correlates timing data collected for encryption under a known key with timing data collected under an unknown key in order to deduce the unknown key. Bernstein demonstrated the attack against the OpenSSL 0.9.7a AES implementation on an 850 MHz Pentium III desktop computer running FreeBSD 4.8. Over the years researchers have proposed a number of countermeasures against Bernstein's Cache Timing Attack, but there is no evidence to date of any investigation into their effectiveness and efficiency.
Our study focused on verifying Bernstein's Cache Timing Attack and on investigating, by implementing them, some of the countermeasures that have been proposed.

Keywords— Side Channel Attack, Advanced Encryption Standard, Cache Timing Attack, Security, Countermeasures

978-1-4244-8551-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

In 1997 the National Institute of Standards and Technology (NIST) started a process to identify a replacement for the Data Encryption Standard (DES), a block cipher used for shared-secret encryption [1]. After four years of competition, Rijndael, proposed by Vincent Rijmen and Joan Daemen, was announced by NIST as the Advanced Encryption Standard (AES) under FIPS 197 [2]. Today AES is widely deployed in both software and hardware and is expected to be the world's predominant block cipher over the coming years.

AES is an iterated block cipher with a fixed block size of 128 bits and a key of 128, 192 or 256 bits. Its transformations operate on an intermediate result called the state. After an initial round-key addition, the state is transformed by a round function applied 10, 12 or 14 times, depending on the key length. Each round except the last consists of four stages: SubBytes, ShiftRows, MixColumns and AddRoundKey; the last round omits MixColumns. Two of these stages, SubBytes and MixColumns, involve arithmetic over the Galois field GF(2^8). In software implementations the SubBytes mapping (a multiplicative inverse over GF(2^8) followed by an affine transformation) is generally pre-computed and stored in memory in a table called the S-box (SBOX). To speed up execution further, software implementations may combine SubBytes and ShiftRows with MixColumns, turning each round into a sequence of lookups in larger tables called T tables. These tables store pre-computed values and so avoid time-consuming computations.

During the AES selection process it was believed that timing attacks applied only to software with a data-dependent execution path (i.e., branch statements, data-dependent shifts, etc.). In the final evaluation of the AES candidates, NIST stated that table lookup operations are 'not vulnerable to timing attacks' and declared Rijndael capable of averting side-channel attacks. Despite these optimistic claims, recent research has shown some implementations of AES to be vulnerable to several forms of side-channel attack [3].
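The T-table formulation of a round described above can be made concrete with a short Python sketch. It derives the S-box arithmetically and builds four T tables of 32-bit words, with T1 to T3 as byte rotations of T0, a layout that as far as we know matches the Te0 to Te3 tables in OpenSSL; the function and table names are our own and the code is an illustration, not any library's implementation. ShiftRows does not appear explicitly: it is absorbed into the choice of which state bytes are passed as a0 to a3.

```python
# Sketch: derive the AES S-box and build T tables that fold SubBytes and a
# MixColumns column multiply into one 32-bit lookup per state byte.

def xtime(b: int) -> int:
    """Multiply b by x (i.e. by 2) in GF(2^8), reducing by x^8+x^4+x^3+x+1."""
    b <<= 1
    return b ^ 0x11B if b & 0x100 else b

def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements with shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """a^254, which equals a^-1 for a != 0 and maps 0 to 0."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def rotl8(b: int, n: int) -> int:
    return ((b << n) | (b >> (8 - n))) & 0xFF

def sub_byte(x: int) -> int:
    """SubBytes: inverse in GF(2^8) followed by the AES affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def pack(col: list[int]) -> int:
    """Pack four bytes (rows 0..3 of a column) into a big-endian 32-bit word."""
    return (col[0] << 24) | (col[1] << 16) | (col[2] << 8) | col[3]

def rotr32(w: int, n: int) -> int:
    return ((w >> n) | (w << (32 - n))) & 0xFFFFFFFF

def mix_column(col: list[int]) -> list[int]:
    """Reference MixColumns on one column (circulant matrix with first row 2 3 1 1)."""
    s0, s1, s2, s3 = col
    m = gf_mul
    return [m(2, s0) ^ m(3, s1) ^ s2 ^ s3,
            s0 ^ m(2, s1) ^ m(3, s2) ^ s3,
            s0 ^ s1 ^ m(2, s2) ^ m(3, s3),
            m(3, s0) ^ s1 ^ s2 ^ m(2, s3)]

SBOX = [sub_byte(x) for x in range(256)]
# T0[x] holds the column (2*S[x], S[x], S[x], 3*S[x]); T1..T3 are byte rotations.
T0 = [pack([gf_mul(2, s), s, s, gf_mul(3, s)]) for s in SBOX]
T1 = [rotr32(w, 8) for w in T0]
T2 = [rotr32(w, 16) for w in T0]
T3 = [rotr32(w, 24) for w in T0]

def round_column(a0: int, a1: int, a2: int, a3: int, rk: int) -> int:
    """One output column of a middle round: four lookups XORed with a round-key word."""
    return T0[a0] ^ T1[a1] ^ T2[a2] ^ T3[a3] ^ rk
```

Each middle round therefore performs sixteen table lookups per block whose indices depend on the state, which is where data-dependent cache behaviour can arise.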
ICIAfS10

Side Channel Attacks (SCA) [4] are a form of cryptanalysis that focuses not on breaking the underlying cipher directly but on exploiting weaknesses found in certain implementations of a cipher. Attacks can be derived from side-channel information gained through timing information [5], radiation of various sorts [6], power consumption statistics [7], cache contents [8], etc. In 2005 Daniel Bernstein demonstrated a remote cache-timing attack against AES [9]. His attack was successful against OpenSSL 0.9.7a on a Pentium III. Following Bernstein's work, Mairéad O'Hanlon and Anthony Tonge reported that the attack failed on a Pentium IV running OpenSSL 0.9.7f. Our research investigates the applicability of Bernstein's attack; we have also implemented a number of countermeasures and evaluated their performance and soundness.

The rest of the paper is organized as follows: Section II describes related work. Section III discusses how we implemented and investigated Bernstein's attack. Section IV covers countermeasures against Bernstein's attack and Section V the performance impact of the countermeasures we tested. Section VI concludes the paper.

II. RELATED WORK

Side-channel attacks exploiting cache contents, timing and power consumption have been demonstrated against implementations of AES. In 1998, Kelsey et al. [10] mentioned the prospect of attacks based on the cache hit ratio in large S-box ciphers. In 1999 Koeune and Quisquater [11] demonstrated a timing attack against a reference Rijndael implementation that used branch statements to perform multiplication in GF(2^8). However, software implementations typically use pre-computed values and are therefore not susceptible to this attack. More recently, table lookups into cached memory have been exploited as a side channel. One type of attack, demonstrated by Osvik, Shamir, and Tromer, directly uses cache accesses as a side channel [12]. The attack exploited inter-process leakage through the state of the CPU's memory cache, allowing an unprivileged process to attack an AES encryption process running in parallel on the same processor.
This demonstrated that knowing which cached memory locations were accessed by AES encryption can leak enough information to reconstruct the key. Bertoni et al. [13] paved the way for another class of cache attacks that relies on power consumption: their technique was to determine, by power analysis, whether cache lookups performed during AES encryption resulted in hits or misses. Cédric Lauradoux [14] also described an attack devised by observing the power consumption of the first round. Acıiçmez and Koç [15] extended this approach by considering the first two rounds of AES.
Another branch of cache attacks is based on observing timing variations caused by cache access patterns; these attacks avoid the need to observe the victim's cache directly. Tsunoo et al. briefly mentioned the possibility of such an attack on AES while demonstrating their attacks on DES and MISTY [16]. Joseph Bonneau and Ilya Mironov demonstrated an attack based on a similar approach, using the correlation between cache hits and encryption time [17]. They focused on individual cache collisions during encryption in the last round and obtained information about key bytes. Daniel Bernstein demonstrated a different type of timing attack against AES in 2005 [9]. The attack takes advantage of the fact that AES, at the beginning of encryption, XORs each input byte x_i with a key byte k_i and uses the result x_i ⊕ k_i as an index into its tables. The table lookup time, and thus the encryption time, therefore varies with the value of x_i. Bernstein carried out the attack by collecting a large volume of timing data for each value of each input byte on a reference machine, and correlating these data with data from the target machine to recover the key. This approach is widely applicable because, as Bernstein claims, fast constant-time AES software is difficult to achieve, since it is “extremely difficult to load an array entry in time that does not depend on the entry's index”. Bernstein demonstrated the attack against the AES implementation of OpenSSL 0.9.7a on an 850 MHz Pentium III desktop computer running FreeBSD 4.8. He further claimed that the attack had been successful on every chip he tested: AMD Athlon, Intel Pentium III, Intel Pentium M, IBM PowerPC RS64 IV, and Sun UltraSPARC III. Following Bernstein's work, Mairéad O'Hanlon and Anthony Tonge reported that the attack failed on a Pentium IV running GCC 4.0.0 and OpenSSL 0.9.7f, although it succeeded against the Miracle implementation of AES compiled with GCC 2.95.3.
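The observation behind Bernstein's attack can be checked in a toy model. The sketch below assumes, hypothetically, that the time of a first-round lookup depends only on the table index x_i ⊕ k_i, with made-up per-index costs. Under that assumption the timing profile collected under one key byte is an XOR-relabelled copy of the profile under another key byte, which is what allows data from the reference machine to be matched against data from the target.

```python
import random

def first_round_index(p: int, k: int) -> int:
    """Index of the first-round table lookup for one byte: plaintext XOR key."""
    return p ^ k

def timing_profile(k: int, index_cost: list[float]) -> list[float]:
    """Expected time for each plaintext byte value p under key byte k, assuming
    (hypothetically) that time depends only on the lookup index p ^ k."""
    return [index_cost[first_round_index(p, k)] for p in range(256)]

rng = random.Random(2005)
index_cost = [rng.gauss(0.0, 1.0) for _ in range(256)]  # made-up per-index costs

k_known, k_secret = 0x2B, 0x7E  # reference key byte, victim key byte
known = timing_profile(k_known, index_cost)
target = timing_profile(k_secret, index_cost)

# The victim's profile is the reference profile relabelled by k_known ^ k_secret.
delta = k_known ^ k_secret
assert all(target[p] == known[p ^ delta] for p in range(256))

# So the slowest plaintext byte value moves by exactly the same XOR difference.
slow_known = max(range(256), key=known.__getitem__)
slow_target = max(range(256), key=target.__getitem__)
assert slow_target == slow_known ^ delta
```

Real measurements are noisy averages rather than exact costs, which is why the attack needs many packets and a statistical comparison rather than a single lookup of the slowest value.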
Given these conflicting reports, our research investigates the applicability and validity of Bernstein's attack; we have also implemented a number of countermeasures and evaluated their performance and soundness.

III. REVISITING BERNSTEIN'S ATTACK

To perform the attack proposed by Bernstein, two servers are used: the actual victim's server and a replica identical to it. Clients send random data packets to the server for encryption, and the time taken to encrypt each packet is recorded. The server begins by encrypting a known plaintext, an all-zero block, under the server's key; the resulting ciphertext is called scrambled-zero. When the server receives a data packet from the client, a time stamp is taken. The packet is then encrypted with the server's key and a second time stamp is taken. A reply containing the first 16 bytes of the original packet and the two time stamps, padded with scrambled-zero, is sent back to the client. In this way the server never returns the encryption of the attacker's data; only the timing data are sent.

Upon receiving the reply, the client calculates the number of cycles taken by the encryption from the two time stamps it contains. To reduce the effect of noise, only packets that consumed fewer than 10,000 cycles are considered for the attack; longer timings are discarded as outliers. For each byte position and each possible value of that plaintext byte, the average number of cycles, its deviation from the overall average, and the estimated standard deviation are calculated. The server's scrambled-zero value is then obtained by sending a random data packet to the server, since every reply contains scrambled-zero. After sufficient timing data have been collected for both the known key and the unknown key, a set of possible values for each key byte is identified by comparing the two sets of timing data. Finally, an all-zero data packet is encrypted under the different key combinations drawn from the identified possibilities. By comparing each resulting ciphertext with the server's scrambled-zero, the key combination that encrypts the zero block in the same way as the server does is identified as the secret key.

The attack has three stages: (1) collecting data under a known key, (2) collecting data for the unknown key, and (3) key deduction.

A. Collect data under a known key

The server program listens on UDP port 10000 for data packets sent by the study program (the client/attacker).
Received packets are encrypted with the known key, and in response the server sends a packet containing a nonce, which is a direct copy of the first 16 bytes of the random data packet sent by the client, and two time stamps from which the time taken to encrypt the data packet is calculated. When the client receives this packet, it calculates all the statistical data, a selected portion of which is then written to a file.
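The client-side bookkeeping described above can be sketched in Python. The class below accumulates, for each of the 16 byte positions and each of the 256 byte values, the sum, sum of squares and count of the measured cycle counts, discarding timings at or above a 10,000-cycle cutoff as outliers. It follows the spirit of Bernstein's study program, but the class and method names are hypothetical and the exact statistics written by the original tools may differ.

```python
import math

CUTOFF = 10_000  # cycles; longer timings are treated as noise and discarded

class TimingStats:
    """Per-(byte position, byte value) timing statistics for one key."""

    def __init__(self) -> None:
        self.total = [[0.0] * 256 for _ in range(16)]
        self.total_sq = [[0.0] * 256 for _ in range(16)]
        self.count = [[0] * 256 for _ in range(16)]
        self.all_total = 0.0
        self.all_count = 0

    def add(self, packet: bytes, cycles: int) -> bool:
        """Record one encryption timing; returns False if it was discarded."""
        if cycles >= CUTOFF:
            return False
        for j in range(16):
            b = packet[j]
            self.total[j][b] += cycles
            self.total_sq[j][b] += cycles * cycles
            self.count[j][b] += 1
        self.all_total += cycles
        self.all_count += 1
        return True

    def mean(self, j: int, b: int) -> float:
        """Average cycles when byte j equals b (needs at least one sample)."""
        return self.total[j][b] / self.count[j][b]

    def deviation(self, j: int, b: int) -> float:
        """Mean for byte value b at position j minus the overall mean."""
        return self.mean(j, b) - self.all_total / self.all_count

    def std_error(self, j: int, b: int) -> float:
        """Estimated standard deviation of mean(j, b)."""
        n = self.count[j][b]
        variance = max(self.total_sq[j][b] / n - self.mean(j, b) ** 2, 0.0)
        return math.sqrt(variance / n)
```

One instance is filled for the known key and another for the victim, and the two are then compared byte by byte during key deduction.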
B. Collect data for the unknown key

The above process is repeated for the victim's server and the resulting data are saved to a separate file.

C. Key deduction

The correlation program compares the timing data for the two cases and generates the space of possible keys from them. Fig. 1 shows a sample output of the correlation program, called the key space. With the help of scrambled-zero, the search program then deduces the key. The search program uses brute force, since the key space is considerably small, as shown in Fig. 1: the illustrated key space contains approximately 2.6 × 10^11 keys, as opposed to 3.4 × 10^38 (2^128) for the full key space of 128-bit AES. Even using an older 5 Mbps Ethernet card to connect the server and the client, we could collect the data within 8 hours (~13 million data packets). The key deduction stage can also take a considerable amount of time, because the key is searched for by brute force; running the search program, which tries every key combination within the reduced key space, extracted the exact key within 24 hours. In Bernstein's proposal of this attack the number of data packets to be encrypted was large, 2^27, and there is no evidence of any attempt to determine the optimal number of packets required.

Fig. 1. Sample key space after correlation. The first column is the number of possible values for each key byte, the second column is the byte number, and the remaining columns are the possible values.

We investigated the optimal number of packets needed to extract the key successfully by performing the attack with different numbers of packets, counting the number of key combinations each yields, and checking whether the actual key is among them.
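The correlation and counting steps of key deduction can be sketched as follows. For one key byte, each candidate value c is scored by correlating the victim's profile with the reference profile relabelled by k_known ⊕ c, and candidates scoring within a tolerance of the best are kept; the size of the reduced key space is then the product of the per-byte candidate counts. The scoring here is a plain dot product of mean-removed profiles and the tolerance rule is a hypothetical simplification; Bernstein's correlation program may use a different, statistics-based selection rule. The demo data are synthetic.

```python
import math
import random

def centred(profile: list[float]) -> list[float]:
    """Subtract the mean so that only deviations are correlated."""
    m = sum(profile) / len(profile)
    return [v - m for v in profile]

def correlation_scores(known: list[float], target: list[float], k_known: int) -> list[float]:
    """Score each candidate c for the victim's key byte by correlating the victim
    profile with the reference profile relabelled by k_known ^ c."""
    u, t = centred(known), centred(target)
    scores = []
    for c in range(256):
        d = k_known ^ c
        scores.append(sum(u[i ^ d] * t[i] for i in range(256)))
    return scores

def candidates(scores: list[float], tolerance: float) -> list[int]:
    """Keep every candidate scoring within `tolerance` of the best (hypothetical rule)."""
    best = max(scores)
    return [c for c, s in enumerate(scores) if s >= best - tolerance]

def key_space_size(per_byte: list[list[int]]) -> int:
    """Number of full keys left: the product of the per-byte candidate counts."""
    return math.prod(len(c) for c in per_byte)

# Synthetic demo for one key byte: made-up per-index costs, plus noise on the victim.
rng = random.Random(7)
g = [rng.gauss(0.0, 1.0) for _ in range(256)]
k_known, k_secret = 0x00, 0xA7
known = [g[p ^ k_known] for p in range(256)]
target = [g[p ^ k_secret] + rng.gauss(0.0, 0.3) for p in range(256)]
scores = correlation_scores(known, target, k_known)
```

Repeating this for all 16 bytes and passing the candidate lists to key_space_size gives the number of combinations the brute-force search must try; a looser tolerance keeps the true byte more reliably at the price of a larger key space.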
Fig. 2. Variation of the number of data packets used for the attack and the resulting key combinations

Fig. 2 shows how the number of resulting key combinations varies with the number of data packets used for the attack. According to the graph, 2^24 is the minimum number of data packets needed to carry out a successful attack. The key space gradually grows as the number of packets increases, because the correlation program obtains enough data to deduce key possibilities. After the peak, the correlation program has sufficient data to deduce the key correctly, and values that are less likely to be correct are excluded from the set of key possibilities. Therefore the data sets obtained after the peak are used to determine the correct key combination.

IV. COUNTERMEASURES

Up to now, various countermeasures have been proposed against Bernstein's attack. As tabulated in Table I, they fall into three categories: software improvements, hardware improvements, and operating system support.

TABLE I
COUNTERMEASURES AGAINST CACHE TIMING ATTACK

Category                   Countermeasures
Software Improvement       Eliminating T tables; Masking timing data evicted from the cache; Using smaller tables
Operating System Support   Disabling the cache; Cache partitioning and locking
Hardware Improvement       Placing lookup tables in register file; Performing encryption using hardware

The following subsections describe the countermeasures in detail.

A. Eliminating T tables

In this countermeasure the pre-calculated T tables are removed and their values are calculated when needed. However, T tables were originally introduced to increase the efficiency of encryption, so this causes a large performance reduction. When this countermeasure was applied, the correct values of only two key bytes were found in the final key space. Under such circumstances the attack cannot proceed, because missing even one key byte from the final key space makes key recovery impossible.
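As an illustration of replacing table lookups with arithmetic, the sketch below computes a T0 value on demand from GF(2^8) operations alone, with no S-box or T-table lookup. The multiply uses a fixed number of iterations and masks instead of data-dependent branches, the usual style for constant-time code; Python itself gives no timing guarantees, so this only shows the arithmetic, and the function names are our own. The inversion alone takes fifteen field multiplications per byte, each an eight-step loop, which hints at why this countermeasure is so slow in Table II.

```python
def xtime(b: int) -> int:
    """Multiply by 2 in GF(2^8) without branching: the mask is 0x1B iff bit 7 is set."""
    return ((b << 1) ^ (0x1B & -(b >> 7))) & 0xFF

def gf_mul(a: int, b: int) -> int:
    """GF(2^8) multiply: always 8 iterations, with masks instead of branches."""
    result = 0
    for _ in range(8):
        result ^= a & -(b & 1)  # adds a only when the low bit of b is 1
        a = xtime(a)
        b >>= 1
    return result

def gf_inv(a: int) -> int:
    """a^254 = a^-1 (0 maps to 0). The exponent is public and fixed, so the
    sequence of multiplications is the same for every input."""
    result, base, e = 1, a, 254
    while e:
        if e & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        e >>= 1
    return result

def rotl8(v: int, n: int) -> int:
    return ((v << n) | (v >> (8 - n))) & 0xFF

def sub_byte(x: int) -> int:
    """SubBytes computed arithmetically: inverse, then the affine transformation."""
    b = gf_inv(x)
    return b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63

def t0_value(x: int) -> int:
    """The T0 entry for index x, computed on demand: column (2s, s, s, 3s), s = S(x)."""
    s = sub_byte(x)
    s2 = xtime(s)
    return (s2 << 24) | (s << 16) | (s << 8) | (s2 ^ s)
```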
B. Masking timing data evicted from the cache

The encryption time is masked so that the attacker obtains wrong timing information, which leads to a wrong key space. Running a dummy for-loop or putting the thread to sleep for a random time is enough to mask the timing data. The random numbers can be generated with the GCC pseudo-random number generator. With the dummy for-loop, the key space fluctuated with the number of data packets, and the final reduced key space contained the correct values of only two key bytes. With the thread sleep, the correct values of 12 key bytes remained in the key space. Although an attacker cannot derive the secret key from these 12 bytes alone, this can be considered a vulnerability in terms of statistical analysis.

C. Using smaller tables

Rather than using the pre-calculated T tables, smaller tables can be used, with the remaining arithmetic performed when necessary. This approach should be faster than the first countermeasure.

D. Disabling the cache

It is argued that software implementations of AES are vulnerable to timing attacks because the T tables may not fit perfectly into the cache together with other data. Disabling the cache forces every T-table access to be served from RAM (Random Access Memory), which holds all the T tables at once, so lookup times no longer depend on which entries happen to be cached. However, disabling the cache requires kernel support, which is not available in user space.

E. Cache partitioning and locking

In this countermeasure the cache is partitioned so that the T tables are aligned to separate colours of the cache and therefore do not evict each other. After the T tables are loaded, the cache is locked so that the partition cannot be used to store other data during encryption. Once the encryption finishes, the locked partition is freed. Again, kernel support is essential for this method. With this countermeasure, the correct values of 10 key bytes remained in the reduced key space. The expected result was a large key space, as cache partitioning reduces T tables overwriting each other. However, the observed result was a small key space, and we were unable to explain this behaviour.

F. Placing lookup tables in the register file

If the microprocessor has enough registers to hold all the T tables, they can be accessed in constant time. However, the x86 and x64 architectures, like most other microprocessor architectures, do not have enough general-purpose registers to hold all the T tables.
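Whether two tables compete for the same cache sets depends on the cache geometry, which is why cache partitioning is hardware dependent. The sketch below, for an assumed physically indexed cache with 64-byte lines and 64 sets (for example a 32 KB, 8-way L1) and 4 KB pages, computes which sets a table occupies and which page colour an address has; the function names and default geometry are assumptions for illustration, not parameters from our experiments.

```python
def sets_touched(base: int, size: int, line_size: int = 64, num_sets: int = 64) -> set[int]:
    """Cache sets occupied by a table of `size` bytes starting at address `base`."""
    first_line = base // line_size
    last_line = (base + size - 1) // line_size
    return {line % num_sets for line in range(first_line, last_line + 1)}

def num_colours(cache_size: int, ways: int, page_size: int = 4096) -> int:
    """Number of page colours: bytes per cache way divided by the page size (at least 1)."""
    return max(cache_size // (ways * page_size), 1)

def page_colour(addr: int, cache_size: int, ways: int, page_size: int = 4096) -> int:
    """Colour of the page containing addr; pages of different colours never share sets."""
    return (addr // page_size) % num_colours(cache_size, ways, page_size)
```

With these defaults a 1 KB T table covers 16 sets, so four tables placed back to back occupy disjoint sets, while a table exactly 4 KB further on lands on the same sets as the first. Such an L1 cache has only one colour, so page colouring of this kind has to target a larger cache level, for example a 2 MB, 8-way cache with 64 colours.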
G. Performing encryption using hardware

Designing hardware components that calculate T-box values when required can guarantee constant-time operation. Introducing new instructions to general-purpose CPUs to perform AES encryption helps eliminate timing attacks; Intel Corporation is currently doing this in its new Core i processors. However, this requires additional hardware and cannot be applied to existing systems.

Except for embedding AES instructions in the CPU core, the countermeasures always reduce the performance of the cipher. Most of the countermeasures discussed here cannot be implemented on the x86 architecture because of a lack of functionality support. Since operating system support is needed for cache locking, disabling the cache, etc., these are not general countermeasures that can be implemented on any computer.

V. PERFORMANCE COMPARISON OF EXPERIMENTED COUNTERMEASURES

Eliminating T tables, masking timing data evicted from the cache, and cache partitioning were implemented and their performance impacts were analyzed. All performance comparisons were made against the unprotected AES implementation and are tabulated in Table II.

TABLE II
PERFORMANCE COMPARISON OF COUNTERMEASURES

Method                                               Number of Cycles   Performance Comparison
Unprotected AES                                      2,232              -
Replacing table lookups with arithmetic operations   69,938             ~31x slower
Random sleep                                         2,275              ~1.02x slower
Random loop                                          2,343              ~1.05x slower
Cache partitioning                                   6,080              ~2.7x slower

Putting the thread to sleep for a random time, or running a random dummy loop, prevented key recovery in our experiments with very little overhead. Furthermore, this countermeasure can be implemented on all platforms. The thread sleep time, during which the CPU is idle, can be used to start encrypting another packet, and thereby a large improvement in efficiency can be gained. However, since GCC's random number generator is used to generate the random numbers required for these two methods, statistical analysis might make it possible to attack AES and find the keys. Therefore using random numbers to hide timing data is not an ideal countermeasure against cache timing attacks.
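The two masking variants can be sketched as a wrapper around any encryption function. The sketch draws its delays from Python's secrets module, which uses the operating system's cryptographically secure generator, as one way of addressing the predictability concern about GCC's generator; the wrapper name and the 50-microsecond bound are assumptions for illustration. Consistent with that concern, a random delay adds noise to the timings rather than removing their dependence on the key, so averaging over more samples can still reduce its effect. In C, a dummy loop also has to be written so the compiler cannot optimize it away, for example by updating a volatile variable.

```python
import secrets
import time
from typing import Callable

def with_random_delay(encrypt: Callable[[bytes], bytes],
                      max_delay_us: int = 50,
                      use_sleep: bool = True) -> Callable[[bytes], bytes]:
    """Wrap encrypt so each call is followed by a random delay (hypothetical helper)."""
    def masked(block: bytes) -> bytes:
        out = encrypt(block)
        delay = secrets.randbelow(max_delay_us + 1)  # OS CSPRNG, not a seeded PRNG
        if use_sleep:
            time.sleep(delay / 1_000_000)  # random sleep
        else:
            acc = 0
            for i in range(delay * 100):  # random dummy loop
                acc ^= i
        return out
    return masked
```

The wrapper leaves the ciphertext unchanged; only the observable time of each call varies.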
Fig. 3. Number of clock cycles consumed for the tested countermeasures
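Comparisons of the kind in Table II can be reproduced portably with the sketch below. It times a function many times with time.perf_counter_ns, measuring nanoseconds rather than cycles (an assumption made for portability), and reports the median, which is less sensitive than the mean to interrupts and scheduling noise. The helper names are hypothetical; slowdown() recomputes the ratios in Table II from its cycle counts.

```python
import statistics
import time
from typing import Callable

def median_time_ns(fn: Callable[[bytes], object], block: bytes, runs: int = 1001) -> float:
    """Median wall-clock time of fn(block) over `runs` calls, in nanoseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        fn(block)
        samples.append(time.perf_counter_ns() - start)
    return statistics.median(samples)

def slowdown(cost: float, baseline: float) -> float:
    """How many times slower a countermeasure is than the unprotected baseline."""
    return cost / baseline

# The ratios in Table II, recomputed from its cycle counts.
for name, cycles in [("arithmetic", 69_938), ("random sleep", 2_275),
                     ("random loop", 2_343), ("cache partitioning", 6_080)]:
    print(f"{name}: ~{slowdown(cycles, 2_232):.2f}x")
```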
Replacing the pre-computed tables with arithmetic operations significantly degrades the performance of the cipher. Cache partitioning is hardware dependent: it depends on the cache line size and associativity, so this countermeasure cannot be implemented on all hardware platforms. Nevertheless, cache partitioning seems to be the better countermeasure, as it makes the cipher invulnerable to cache timing attacks without a large decrease in efficiency. As shown in Fig. 3, random sleep and random loop degrade performance by approximately the same amount.

VI. CONCLUSION

In this paper, we have investigated the applicability of Bernstein's attack, implemented a number of countermeasures, and evaluated their performance and soundness. We have reported and compared the
performance impact of a number of countermeasures and conclude that a random sleep or random loop is a good countermeasure. Future work includes investigating more countermeasures.

ACKNOWLEDGMENT

We would like to thank Dhammika Elkaduwe of Computer Engineering, University of Peradeniya, for his valuable and continuous feedback on this project.

REFERENCES

[1] Advanced Encryption Standard, Wikipedia. Available: http://en.wikipedia.org/wiki/Advanced_Encryption_Standard, accessed on 02 March 2010.
[2] Advanced Encryption Standard, Federal Information Processing Standards Publication 197, 26 November 2001.
[3] F. Koeune, J.-J. Quisquater, A timing attack against Rijndael, Technical Report CG-1999/1, June 1999.
[4] P. Kocher, Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems, in Proc. 16th Annual International Cryptology Conference on Advances in Cryptology (CRYPTO 1996), pp. 104–113, Springer-Verlag, London, UK, 1996.
[5] D. Brumley, D. Boneh, Remote timing attacks are practical, in Proc. USENIX Security Symposium, August 2003.
[6] J.-J. Quisquater, D. Samyde, Electromagnetic analysis (EMA): measures and counter-measures for smart cards, in E-smart 2001, pp. 200–210, 2001.
[7] S. Mangard, A simple power-analysis (SPA) attack on implementations of the AES key expansion, in ICISC 2002, Lecture Notes in Computer Science, vol. 2587, pp. 343–358.
[8] D. A. Osvik, A. Shamir, E. Tromer, Cache attacks and countermeasures: the case of AES, Lecture Notes in Computer Science, vol. 3860/2006, pp. 1–20, December 2005.
[9] D. J. Bernstein, Cache-timing attacks on AES, April 2005.
[10] J. Kelsey, B. Schneier, D. Wagner, C. Hall, Side channel cryptanalysis of product ciphers, in Proc. 5th European Symposium on Research in Computer Security, Lecture Notes in Computer Science, vol. 1485 (Springer, Berlin, 1998), pp. 97–110.
[11] F. Koeune, J.-J. Quisquater, A timing attack against Rijndael, Technical Report CG-1999/1, Université catholique de Louvain. Available: http://www.dice.ucl.ac.be/crypto/tech_reports/CG1999_1.ps.gz
[12] D. A. Osvik, A. Shamir, E. Tromer, Other people's cache: hyper attacks on HyperThreaded processors, Fast Software Encryption (FSE) 2005 rump session, February 2005.
[13] G. Bertoni, V. Zaccaria, L. Breveglieri, M. Monchiero, G. Palermo, AES power attack based on induced cache miss and countermeasure, in Proc. International Conference on Information Technology: Coding and Computing (ITCC'05) (IEEE, New York, 2005), pp. 586–591.
[14] C. Lauradoux, Collision attacks on processors with cache and countermeasures, in Western European Workshop on Research in Cryptology (WEWoRC) 2005, Lecture Notes in Informatics, vol. P74 (2005), pp. 76–85. Available: http://www.cosic.esat.kuleuven.ac.be/WeWorc/allAbstracts.pdf
[15] O. Acıiçmez, Ç. K. Koç, Trace driven cache attack on AES (short paper), in Proc. International Conference on Information and Communications Security (ICICS) 2006, Lecture Notes in Computer Science, vol. 4296 (Springer, Berlin, 2006), pp. 112–121.
[16] Y. Tsunoo, T. Saito, T. Suzaki, M. Shigeri, H. Miyauchi, Cryptanalysis of DES implemented on computers with cache, in Proc. Cryptographic Hardware and Embedded Systems (CHES) 2003, Lecture Notes in Computer Science, vol. 2779 (Springer, Berlin, 2003), pp. 62–76.
[17] J. Bonneau, I. Mironov, Cache-collision timing attacks against AES, in Proc. Cryptographic Hardware and Embedded Systems (CHES) 2006, Lecture Notes in Computer Science, vol. 4249 (Springer, Berlin, 2006), pp. 201–215.