FPGA based Optimization for Masked AES Implementation
Zheng Yuan, Yi Wang, Jing Li, Renfa Li and Wei Zhao
Embedded Systems & Networking Laboratory, Hunan University Graduate Innovation Base, Changsha, China. e-mail: [email protected]

Abstract-Masking methods are widely used to defend against power analysis attacks in embedded systems. Apart from power analysis attacks, glitch attacks also become possible when a design is ported to the gate level. In this paper, we first classify the existing masking methods according to their functions, masking values and applications. Second, we compare different hardware implementations of the masked S-box. Finally, we propose hardware implementations of masked AES encryption with 32-bit and 128-bit data paths. The experimental results show that our proposed design takes up fewer hardware resources and is able to defend against differential power analysis (DPA) and glitch attacks.
I. INTRODUCTION
More and more attention has been paid to information security with the technological innovation of computers, communication and networks. To secure the transmission of information, modern cryptography provides an efficient and secure approach for embedded systems. The Rijndael algorithm was selected by the National Institute of Standards and Technology (NIST) as the Advanced Encryption Standard (AES) in 2001 [1]. Since then, AES has been widely used in embedded systems such as smart cards, mobile phones, Personal Digital Assistants (PDAs), Radio Frequency Identification (RFID) and automotive electronics. Unfortunately, side channel attacks pose a serious threat to AES. These attacks are based on "side channel information", that is, information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process [2]. Power analysis attacks are among the most powerful of them.

Power analysis attacks include simple power analysis (SPA), differential power analysis (DPA), higher order differential power analysis (HODPA) and glitch attacks. SPA is a technique that involves directly interpreting power consumption measurements during cryptographic operations. Messerges et al. recovered the secret key of DES with an SPA attack [3]. Mayer-Sommer successfully broke a smart card security system using SPA [4]. Mangard proposed an SPA attack on the AES key schedule [5].

A DPA attack is based on statistical analysis, in which the attacker judges the correctness of a key guess by comparing the differences between a sample power trace and the correct key power trace. Kocher et al. discovered that a DPA attack could easily retrieve the sensitive information of smart cards [2]. Örs et al. successfully broke a dedicated AES ASIC implementation using a DPA attack [6]. Mangard et al. showed in detail that DPA attacks efficiently retrieve the secret key of AES hardware implementations [7].

An HODPA attack is a powerful technique that exploits the joint leakage of several intermediate values to "crack" the secret key. Waddle and Wagner proposed several second-order power analysis attacks to overcome masked cryptographic algorithms [8]. Joye et al. provided a theoretical measurement of information leakage as a function of hardware-dependent parameters and used it to introduce an efficient second-order DPA (2O-DPA) attack [9]. Prouff et al. proposed a statistical analysis of the 2O-DPA attack in [10].

At the gate level, input signals propagating through the circuit arrive at different times, which leads to the possibility of glitch attacks. Mangard analyzed glitch attacks in theory [11]. Successful approaches using glitch attacks can be found in Golic [12], Trichina [13] and Canright et al. [14].

Numerous countermeasures have been proposed to defend against the above power analysis attacks. Masking is one of the most widely used methods and has the advantages of low cost and easy implementation [12-40]. In this paper, we review previous masking methods that have been applied to the S-box and to AES encryption. We also classify these masking methods into different types according to their functions, mask values and applications. Moreover, we port a masking scheme for the S-box over GF(2^4) to a Xilinx Virtex-5 FPGA. Apart from the 8-bit data path software implementation, we present 32-bit and 128-bit data path hardware implementations. A detailed comparison shows that our proposed method uses less area and is able to resist DPA and glitch attacks.

This work is supported by the "Chinese National Science Foundation" (No. 60873074) and "the Fundamental Research Funds for Chinese Central Universities".
978-1-61284-857-0/11/$26.00 ©2011 IEEE

II. MASKING CLASSIFICATION
Masking breaks the dependence between the power consumption and the intermediate values of the cryptographic algorithm. The existing masking schemes can be classified by the operation they use and by the level at which they are applied.
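The first sentence of this section can be illustrated with a minimal sketch. It assumes an 8-bit intermediate value, a uniformly random Boolean (Xor) mask and a Hamming-weight leakage model; the leakage model and all function names are illustrative assumptions, not part of the proposed design.

```python
def hamming_weight(x: int) -> int:
    """Number of set bits, used here as an assumed stand-in for power leakage."""
    return bin(x).count("1")


def mean_masked_leakage(v: int, bits: int = 8) -> float:
    """Average Hamming weight of v XOR m over all 2**bits mask values m."""
    masks = range(1 << bits)
    return sum(hamming_weight(v ^ m) for m in masks) / (1 << bits)


values = (0x00, 0x0F, 0xFF)

# Unmasked: the leakage depends directly on the intermediate value.
unmasked = {v: hamming_weight(v) for v in values}

# Masked: averaged over a uniform mask, the leakage is the same for every value.
masked = {v: mean_masked_leakage(v) for v in values}

if __name__ == "__main__":
    print("unmasked:", unmasked)
    print("masked  :", masked)
```

Under these assumptions the first-order average no longer distinguishes the intermediate values, which is the property that first-order DPA relies on. The sketch says nothing about higher-order or glitch-based leakage.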
A. According to the function of operations: masking schemes can be divided into Boolean masking, additive masking, multiplicative masking, and mixed masking.

Boolean masking: the intermediate value is concealed by exclusive-ORing (Xor) it with the mask. Thus, the masked intermediate value is $v_m = v \oplus m$. Piret and Standaert proved the security of Boolean masking schemes for block ciphers [15]. Schramm and Paar provided a higher order masking scheme using the Xor operation to defend against HODPA [16], but it cannot be applied to smart cards. A Boolean masking technique for the logic AND operation at the gate level was introduced by Golic [12], who proposed Xor-based, MUX-based and NAND-based masking schemes; however, these methods cannot resist glitch attacks.

Additive masking: the intermediate value is concealed by an additive operation. Thus, the masked intermediate value can be represented by $v_m = v + m \pmod{n}$. Additive masking requires recomputing a new S-box and storing it in RAM. Blömer et al. proposed additive masking operations, but their design took up a larger area [17]. Herbst et al. proposed an additive masking scheme for smart cards, which cannot resist HODPA [18].

Multiplicative masking: the intermediate value is concealed by a multiplicative operation. Thus, the masked intermediate value is $v_m = v \times m \pmod{n}$. Golic and Tymen proposed a modified multiplicative masking method which generates a new S-box by using random additive masks [19], but this solution cannot resist the 2O-DPA attack. Trichina et al. optimized multiplicative masking by converting a Boolean mask to a multiplicative mask [20]; it can resist DPA attacks, but it takes up a larger area. Alam et al. proposed a synchronous balanced mask multiplier architecture in which glitches are minimized, but it requires more hardware [21].

Mixed masking: the masking scheme combines Boolean, additive and multiplicative operations. Messerges proposed a method which converts between Boolean masking and arithmetic masking, but it cannot resist DPA attacks [22]. A sound method for switching is presented by Goubin [23]. Trichina proposed an efficient and secure AES coprocessor for smart cards using Xor and AND operations, but this method cannot defend against glitch attacks [13]. Trichina et al. claimed that they realized a secure hardware implementation of AES against first order DPA attacks at the gate level [24].

B. According to the application of masking: masking schemes can be divided into gate level and algorithmic level masking.

Gate level masking: Golic and Menicocci proposed a masking technique based on the AND gate, which cannot resist HODPA and glitch attacks [25]. A gate level masking scheme proposed by Fischer and Gammel [26] and an AND gate masking technique proposed by Kumar et al. [27] can resist DPA and glitch attacks, but both methods increase the size of the design.

Algorithmic level masking: Baek and Noh proposed several new efficient masking algorithms for multipliers over finite fields; however, these methods cannot resist HODPA [28]. Akkar and Giraud proposed algorithmic level masking for an implementation of AES, but their design achieved a lower clock frequency [29].

III. IMPLEMENTATION OF MASKED S-BOX
In this section, we propose our architecture for the masked S-box over GF(2^4). Our implementation of the masked S-box is based on the work of [30]. Although [31] and [32] have already shown that a masked S-box can be implemented, they only discuss the implementation on a smart card software platform and lack a detailed design exploration for porting to a hardware platform. Some work implements the unprotected S-box directly as a look-up table of 256 bytes in total. When the same look-up table approach is applied to the masked method with two masks, it needs 2^8 × 2^8 × 256 bytes, which is too large for a hardware implementation. The design therefore uses less area when the operations are converted from GF(2^8) to GF(2^4). In [32], Oswald et al. already discussed the six pre-computed tables needed to implement the masked S-box over GF(2^4). The six pre-computed tables are:
$T_{d1}: ((x+m), m') \mapsto x^2 \times p_0 + m'$
$T_{d2}: ((x+m), (y+m')) \mapsto ((x+m) + (y+m')) \times (y+m')$
$T_m: ((x+m), (y+m')) \mapsto (x+m) \times (y+m')$
$T'_{inv}: ((x+m), m') \mapsto T_{inv}(x) + m'$
$T'_{map}: ((x+m), m) \mapsto T_{map}(x+m)$
$T'_{map^{-1}}: ((x+m), m) \mapsto T_{map^{-1}}(x+m)$

Figure 1. Structure of proposed masked S-box.
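The idea behind a pre-computed masked table such as the masked inversion table over GF(2^4) in Section III can be sketched as follows. This is only an illustration under stated assumptions: the field polynomial x^4 + x + 1, the index order (masked input, input mask, output mask) and all function names are hypothetical choices, since the paper does not specify them.

```python
# Assumed irreducible polynomial for GF(2^4): x^4 + x + 1 (not given in the paper).
POLY = 0b10011


def gf16_mul(a: int, b: int) -> int:
    """Multiply two GF(2^4) elements modulo POLY (shift-and-add)."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:          # reduce when the degree reaches 4
            a ^= POLY
    return result


def gf16_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^4); 0 is mapped to 0, as in the AES S-box."""
    if a == 0:
        return 0
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)


# Hypothetical masked inversion table, indexed by (x ^ m_in, m_in, m_out).
# Each entry is inv(x) ^ m_out, so the unmasked x never has to appear at run time.
T_INV_MASKED = [
    [[gf16_inv(x_masked ^ m_in) ^ m_out for m_out in range(16)]
     for m_in in range(16)]
    for x_masked in range(16)
]


def masked_inv(x_masked: int, m_in: int, m_out: int) -> int:
    """Single table look-up that returns the inverse already masked with m_out."""
    return T_INV_MASKED[x_masked][m_in][m_out]


# 16 * 16 * 16 entries of 4 bits each.
TABLE_BITS = 16 * 16 * 16 * 4
```

With two independent masks the table has 2^4 × 2^4 × 2^4 entries of 4 bits; if the same mask is reused for input and output, one index dimension disappears. At run time only look-ups and Xor operations are needed, which is the property the design relies on.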
TABLE I. COMPARISON OF IMPLEMENTATION RESULTS OF 8-BIT MASKED S-BOX
Platform Alam [21]
Technology
Cycles
multiplicative masking
5
5478 gates
8.33ns delay
-
3628 gates
59.13ns delay
I I
452 gates
-
1023 gates
27ns delay
-
100 slices
16.67 ns delay
masked-AND
-
332 LUT
35.39 ns delay
WDDL
-
456 LUT
46.8 ns delay
RSL
-
174 LUT
30.35 ns delay
Boolean masking
1
2051 gates/127 slices
14.299 ns delay
Canright [14]
0.13 µm CMOS
multiplicative masking
Baek [28]
0.18 µm CMOS
algorithmic level masking
Kamoun [33]
Xilinx Virtex 4
Suzuki [34] this work
algorithmic level masking gate level
XCV1000-6
masking
Xilinx Virtex-5
WDDL: Wave Dynamic Differential Logic; RSL: Random Switching Logic.
TABLE II. COMPARISON OF IMPLEMENTATION RESULTS OF MASKED AES
Popp [35]
Technology
Data path (bit)
Area
Speed
Thr (Mbps)
0.35 µm CMOS
gate level masking (MDPL)
-
16.5 K gates
9.82 MHz
-
-
111
5.52*
145*
7.80*
0.18 µm CMOS
gate level masking
masked-AND
20.1 K gates
32
WDDL
45.85 K gates* 64 K ROMs
256 fixed masks 8-bit RISC
algorithmic
architecture**
level masking
256 ROMs
single mask 8 16 masks
0.18 µm CMOS
Kamoun [33]
Xilinx Virtex 4
Mentens [37]
XCV800-4
Trichina [38]
0.18 µm CMOS
13MHz*
40.6*
1.34*
78*
1.70*
-
-
-
-
-
-
-
-
-
-
0.46*
1536 ROMs
57.1 ns
11.9
algorithmic level masking
-
2281 slices
137MHz
-
masking combine additive and multiplicative
multiplicative masking multiplicative masking multiplicative masking additive masking
This work Virtex-5 XCVLX30
(Kbps/gates)
25MHz*
25.7 K gates
Virtex-5 XCVLX50
-
4452 slices·
23MHz
29
-
32 -
21.7 K gates
74
3.41*
49 K gates
43.18 ns -
900
18.4*
-
42.4 K gates
-
1150
-
32 128
Boolean masking 32
4175 slices
22.84 ns
140.13
27.1* -
3580 slices
20.77 ns
157.07
-
116MHz
1350
24.4
300.4
15.03
55.4 K gates / 4992 slices
20.0 K gates / 885 slices
103.3 MHz
PMRML: Pre-charge Masked Reed-Muller Logic; MDPL: Masked Dual-Rail Pre-charge Logic;
*: Estimated by the authors based on the original work;
**: Software implementation with a microcontroller unit.
Here, $T_{d1}$, $T_{d2}$, $T_m$ and $T'_{inv}$ perform the operations needed to transform the masked S-box from GF(2^8) to GF(2^4), while $T'_{map}$ performs the masked isomorphic mapping from GF(2^8) to GF(2^4)×GF(2^4) and $T'_{map^{-1}}$ performs the masked isomorphic mapping back from GF(2^4)×GF(2^4) to GF(2^8) with an additional masked affine transformation (Fig. 1). $T'_{inv}$ is the modified inversion operation over GF(2^4), recomputed for the input mask m' and the output mask m. Therefore, it needs 2^4 × 2^4 × 2^4 × 4 bits to store all the possible values of $T'_{inv}$ when $m \neq m'$. When $m = m'$, the required memory is reduced to only 2^4 × 2^4 × 4 bits. All of the above tables can be pre-computed and stored in read only memory (ROM).
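The storage figures quoted in Section III can be worked out explicitly. The conversion to bytes below is plain arithmetic on the numbers stated in the text; it makes no assumption about how the ROM is actually organized on the FPGA.

```latex
\begin{align*}
\text{masked LUT over } GF(2^8) \text{ with two masks:}\quad
  & 2^{8} \times 2^{8} \times 256 \text{ bytes} = 2^{24} \text{ bytes} = 16\ \text{MB} \\
T'_{inv} \text{ over } GF(2^4),\ m \neq m':\quad
  & 2^{4} \times 2^{4} \times 2^{4} \times 4 \text{ bits} = 2^{14} \text{ bits} = 2048 \text{ bytes} \\
T'_{inv} \text{ over } GF(2^4),\ m = m':\quad
  & 2^{4} \times 2^{4} \times 4 \text{ bits} = 2^{10} \text{ bits} = 128 \text{ bytes}
\end{align*}
```

Moving the inversion from GF(2^8) to GF(2^4) therefore shrinks the masked table by several orders of magnitude, and reusing one mask for input and output saves a further factor of 16.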
*
32
Virtex-E
Thr: Throughput;
256 ROMs
* 46.4MHz
algorithmic level masking
0.25 µm CMOS
Ordu [40]
RAMs
35.6MHz
-
0.18 µm CMOS
Zhou [39]
256
4096 RAMs
MOS-box Baek [28]
18.6K gates 30.3 K gates
MDPL
Oswald [32]
Thr/Area
Platform
PMRML
Lin [36]
Speed
multiplicative masking
(balanced pipelined S-box)
90 nm CMOS
Area
IV. RESULT AND COMPARISON
In this section, we implemented the proposed design in a Hardware Description Language (HDL), synthesized it using Xilinx ISE 12.1 and ported it to a Virtex-5 FPGA. Table I compares our masked S-box with the existing masked S-box designs. According to Table I, Alam's design is 1.7 times faster than ours. However, our design needs only 1 clock cycle to transform the S-box from GF(2^8) to GF(2^4), whereas Alam's design needs 5 clock cycles. Table II shows a comparison of masked AES encryption implementations. It is hard to make a fair comparison because different techniques and target platforms are used. We achieved the best throughput for masked AES with the 128-bit data path, which is 17% higher than that of Zhou's design (the best among the existing methods). Our masked AES with the 32-bit data path takes up 21% more area than Popp's design (the smallest among the existing methods). Kamoun's design is the fastest among all the implementations, achieving 137 MHz. Although Zhou's design is slightly better than ours in terms of throughput/area, it cannot resist glitch attacks.
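The percentages quoted in this section can be re-derived with a short script. The values below are read from Tables I and II; because the tables are flattened in this copy, the assignment of each value to a design is an assumption, and all names in the script are illustrative.

```python
# Values read from Tables I and II; the row assignment is an assumption.
sbox_delay_ns = {"Alam [21]": 8.33, "this work": 14.299}
aes_throughput_mbps = {"Zhou [39]": 1150.0, "this work, 128-bit": 1350.0}
aes_area_kgates = {
    "Popp [35]": 16.5,
    "Zhou [39]": 42.4,
    "this work, 32-bit": 20.0,
    "this work, 128-bit": 55.4,
}


def percent_more(a: float, b: float) -> float:
    """How much larger a is than b, in percent."""
    return 100.0 * (a - b) / b


# S-box delay ratio: how many times faster Alam's design is.
speedup_alam = sbox_delay_ns["this work"] / sbox_delay_ns["Alam [21]"]

# Throughput of the 128-bit design relative to Zhou's design.
throughput_gain = percent_more(aes_throughput_mbps["this work, 128-bit"],
                               aes_throughput_mbps["Zhou [39]"])

# Area of the 32-bit design relative to Popp's design.
area_overhead = percent_more(aes_area_kgates["this work, 32-bit"],
                             aes_area_kgates["Popp [35]"])

# Throughput per area in Kbps/gate (Mbps divided by K gates).
thr_area_this = (aes_throughput_mbps["this work, 128-bit"]
                 / aes_area_kgates["this work, 128-bit"])
thr_area_zhou = aes_throughput_mbps["Zhou [39]"] / aes_area_kgates["Zhou [39]"]

if __name__ == "__main__":
    print(f"Alam speed-up      : {speedup_alam:.2f}x")
    print(f"throughput gain    : {throughput_gain:.1f}%")
    print(f"area overhead      : {area_overhead:.1f}%")
    print(f"Thr/Area this/Zhou : {thr_area_this:.1f} / {thr_area_zhou:.1f}")
```

Under this reading of the tables, the script gives the same 1.7 times, 17% and 21% figures as the text, and Zhou's design comes out slightly ahead in throughput per area.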
V. CONCLUSION
In this paper, we review the existing masking methods and classify them by their functions, masking values and applications. We also explore several methods for masked S-box implementation. To resist DPA and glitch attacks, we propose optimized AES implementations with 32-bit and 128-bit data paths (masked S-box over GF(2^4)). We also show that the design efficiently resists glitch attacks, because its operations consist only of table look-ups and Xor operations, which removes the foundation (the AND operation) of glitch attacks. Our proposed design achieves comparatively good performance among the existing designs.

REFERENCES
[1] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)," FIPS-197, 2001.
[2] P. Kocher, J. Jaffe and B. Jun, "Differential power analysis," in CRYPTO'99, vol. 1666, Springer-Verlag, 1999, pp. 388-397.
[3] T. S. Messerges, E. A. Dabbish and R. H. Sloan, "Investigations of power analysis attack on smartcards," Proc. USENIX Association Workshop on Smartcard Technology (WOST 99), USENIX Press, 1999, pp. 151-162.
[4] R. Mayer-Sommer, "Smartly analyzing the simplicity and the power of simple power analysis on smartcards," in CHES 2000, vol. 1965, Springer-Verlag, 2000, pp. 78-92.
[5] S. Mangard, "A simple power analysis attack on implementations of the AES key expansion," in ICISC 2002, vol. 2587, Springer-Verlag, 2003, pp. 343-358.
[6] S. B. Örs, F. Gürkaynak, E. Oswald and B. Preneel, "Power-analysis attack on an ASIC AES implementation," in ITCC 2004, IEEE Press, Apr. 2004, pp. 546-566.
[7] S. Mangard, N. Pramstaller and E. Oswald, "Successfully attacking AES hardware implementations," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 157-171.
[8] J. Waddle and D. Wagner, "Towards efficient second-order power analysis," in CHES 2004, vol. 3156, Springer-Verlag, 2004, pp. 1-15.
[9] M. Joye, P. Paillier and B. Schoenmakers, "On second-order differential power analysis," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 293-308.
[10] E. Prouff, M. Rivain and R. Bevan, "Statistical analysis of second order differential power analysis," IEEE Transactions on Computers, vol. 58, 2009, pp. 799-811.
[11] S. Mangard, E. Oswald and T. Popp, "Power analysis attacks: revealing the secrets of smart cards," Springer-Verlag, 2007.
[12] J. D. Golic, "Techniques for random masking in hardware," IEEE Transactions on Circuits and Systems, vol. 54, Feb. 2007, pp. 291-300.
[13] E. Trichina, "Combinational logic design for AES subbyte transformation on masked data," Cryptology ePrint Archive (http://eprint.iacr.org/), Report 2003/236, 2003.
[14] D. Canright and L. Batina, "A very compact "perfectly masked" S-box for AES," in ACNS 2008, vol. 5037, Springer-Verlag, 2008, pp. 446-459.
[15] G. Piret and F. X. Standaert, "Security analysis of higher-order Boolean masking schemes for block ciphers (with conditions of perfect masking)," IET Information Security, vol. 2, Mar. 2007, pp. 1-11.
[16] K. Schramm and C. Paar, "Higher order masking of the AES," in CT-RSA 2006, vol. 3860, Springer-Verlag, 2006, pp. 208-225.
[17] J. Blömer, J. Guajardo and V. Krummel, "Provably secure masking of AES," in Cryptography, vol. 3357, Springer-Verlag, 2005, pp. 69-83.
[18] C. Herbst, E. Oswald and S. Mangard, "An AES smart card implementation resistant to power analysis attacks," in ACNS 2006, vol. 3989, Springer-Verlag, 2006, pp. 239-252.
[19] J. Golic and C. Tymen, "Multiplicative masking and power analysis of AES," in CHES 2003, vol. 2529, Springer-Verlag, 2003, pp. 31-47.
[20] E. Trichina, D. De Seta and L. Germani, "Simplified adaptive multiplicative masking for AES," in CHES 2002, vol. 2523, Springer-Verlag, 2003, pp. 71-85.
[21] M. Alam, S. Ghosh, M. J. Mohan, D. Mukhopadhyay, D. R. Chowdhury and I. S. Gupta, "Effect of glitches against masked AES S-box implementation and countermeasure," IET Information Security, vol. 3, Feb. 2009, pp. 34-44.
[22] T. Messerges, "Securing the AES finalists against power analysis attacks," in FSE 2001, vol. 1978, Springer-Verlag, 2001, pp. 293-301.
[23] L. Goubin, "A sound method for switching between Boolean and arithmetic masking," in CHES 2001, vol. 2162, Springer-Verlag, 2001, pp. 3-15.
[24] E. Trichina, T. Korkishko and K. H. Lee, "Small size, low power, side channel-immune AES coprocessor: design and synthesis results," in AES 2005, vol. 3373, Springer-Verlag, 2005, pp. 113-127.
[25] J. D. Golic and R. Menicocci, "Universal masking on logic gate level," IET Electronics Letters, vol. 40, May 2004, pp. 526-528.
[26] W. Fischer and B. M. Gammel, "Masking at gate level in the presence of glitches," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 187-200.
[27] K. Kumar, D. Mukhopadhyay and D. RoyChowdhury, "Design of a differential power analysis resistant masked AES S-Box," in INDOCRYPT 2007, vol. 4859, Springer-Verlag, 2007, pp. 373-383.
[28] Y. Baek and M. Noh, "DPA-resistant finite field multipliers and secure AES design," in Information Security Practice and Experience, vol. 3903, Springer-Verlag, 2006, pp. 1-12.
[29] M. Akkar and C. Giraud, "An implementation of DES and AES, secure against some attacks," in CHES 2001, vol. 2162, Springer-Verlag, 2001, pp. 309-318.
[30] J. Wolkerstorfer, E. Oswald and M. Lamberger, "An ASIC implementation of the AES Sboxes," in CT-RSA 2002, vol. 2271, Springer-Verlag, 2002, pp. 67-78.
[31] E. Oswald, S. Mangard, N. Pramstaller and V. Rijmen, "A side-channel analysis resistant description of the AES S-box," in FSE, vol. 3557, Springer-Verlag, 2005, pp. 413-423.
[32] E. Oswald and K. Schramm, "An efficient masking scheme for AES software implementations," in WISA, vol. 3786, Springer-Verlag, 2005, pp. 292-305.
[33] N. Kamoun, L. Bossuet and A. Ghazel, "SRAM-FPGA implementation of masked S-Box based DPA countermeasure for AES," in IDT 2008, pp. 74-77.
[34] D. Suzuki, M. Saeki and T. Ichikawa, "Random Switching Logic: A countermeasure against DPA based on transition probability," (http://eprint.iacr.org/), Report 2004/346, 2004.
[35] T. Popp and S. Mangard, "Masked dual-rail pre-charge logic: DPA resistance without routing constraints," Cryptographic Hardware and Embedded Systems, vol. 3659, Springer-Verlag, 2005, pp. 172-186.
[36] K. J. Lin, S. C. Fang, S. H. Yang and C. C. Lo, "Overcoming glitches and dissipation timing skews in design of DPA-resistant cryptographic hardware," Proc. IEEE Symp. Design, Automation & Test in Europe Conference & Exhibition (DATE 07), IEEE Press, Apr. 2007, pp. 1-6.
[37] N. Mentens, L. Batina, B. Preneel and I. Verbauwhede, "An FPGA implementation of Rijndael: trade-offs for side-channel security," Programmable Devices and Systems (IFAC 04), 2004, pp. 493-498.
[38] E. Trichina and T. Korkishko, "Secure AES hardware module for resource constrained devices," in Security in Ad-hoc and Sensor Networks, vol. 3313, Springer-Verlag, 2005, pp. 215-229.
[39] Y. Zhou, G. Qian, Y. Xing, H. Liu, S. Goto and Y. Tsunoo, "An approach of using different positions of double registers to protect AES hardware structure from DPA," Proc. IEEE Symp. Electronic Commerce and Security, IEEE Press, 2010, pp. 223-227.
[40] L. Ordu and B. Örs, "Power analysis resistant hardware implementations of AES," ICECS 07, 2007, pp. 1408-1411.