FPGA based Optimization for Masked AES Implementation - IEEE Xplore

FPGA based Optimization for Masked AES Implementation Zheng Yuan, Yi Wang, Jing Li, Renfa Li and Wei Zhao

Embedded Systems & Networking Laboratory, Hunan University Graduate Innovation Base, Changsha, China. e-mail:[email protected]

Abstract-Masking methods are popularly used to defend against power analysis attacks in embedded systems. Apart from power analysis attacks, glitch attacks also arise when a design is ported to gate level. In this paper, we first divide the existing masking methods into different types according to their functions, masking values and applications. Second, we compare different masked S-box hardware implementations. Finally, we propose masked AES encryption with 32-bit and 128-bit data path hardware implementations. The experimental results show that our proposed design takes up fewer hardware resources and is able to defend against differential power analysis (DPA) and glitch attacks.

I. INTRODUCTION

More and more attention has been paid to information security with the technological innovation of computers, communication and networks. To secure information transmission, modern cryptography provides an efficient and secure approach in embedded systems. The Rijndael algorithm was selected by the National Institute of Standards and Technology (NIST) as the Advanced Encryption Standard (AES) in 2001 [1]. Since then, AES has been widely used in embedded systems, such as smart cards, mobile phones, Personal Digital Assistants (PDAs), Radio Frequency Identification (RFID) and automotive electronics. Unfortunately, side channel attacks pose a serious threat to AES. These attacks are based on "side channel information", that is, information that can be retrieved from the encryption device that is neither the plaintext to be encrypted nor the ciphertext resulting from the encryption process [2]. Power analysis attacks are among the most powerful of them. They include simple power analysis (SPA), differential power analysis (DPA), higher order differential power analysis (HODPA) and glitch attacks.

SPA is a technique that involves directly interpreting power consumption measurements during cryptographic operations. Messerges et al. recovered the DES secret key with an SPA attack in [3]. Mayer-Sommer successfully broke a smart card security system using SPA [4]. Mangard proposed an SPA attack on the AES key schedule [5].

DPA is based on statistical analysis in which the attacker can judge whether a key guess is correct by comparing the differences between a sample power trace and the correct key power trace. Kocher et al. discovered that DPA could easily retrieve the sensitive information of smart cards [2]. Örs et al. successfully broke a dedicated AES ASIC implementation using DPA [6]. Mangard et al. showed in detail that DPA efficiently retrieves the secret key of AES hardware implementations [7].

HODPA is a powerful technique that exploits the joint leakage of several intermediate values to "crack" the secret key. Waddle and Wagner proposed several second-order power analysis attacks to overcome masked cryptographic algorithms [8]. Joye et al. provided a theoretical measurement of information leakage, as a function of hardware-dependent parameters, to introduce an efficient second-order DPA (2O-DPA) attack [9]. Prouff et al. proposed a statistical analysis of the 2O-DPA attack in [10].

At gate level, input signals propagating through a circuit arrive at different times, which leads to the possibility of glitch attacks. Mangard analyzed glitch attacks in theory [11]. Successful approaches using glitch attacks can be found in Golic [12], Trichina [13] and Canright et al. [14].

Numerous countermeasures have been proposed to defend against the above power analysis attacks. Masking is one of the most popular methods, with the advantages of low cost and easy implementation [12-40]. In this paper, we detail previous masking methods applied to the S-box and AES encryption. We also classify these masking methods into different types according to their functions, mask values and applications. Moreover, we port a masking scheme for the S-box over GF($2^4$) to a Xilinx Virtex-5 FPGA. Apart from the 8-bit data path software implementation, we present 32-bit and 128-bit data path hardware implementations. A detailed comparison is given to show that our proposed method uses less area and is able to resist DPA and glitch attacks.

This work is supported by the "Chinese National Science Foundation" (No. 60873074) and "the Fundamental Research Funds for Chinese Central Universities".

978-1-61284-857-0/11/$26.00 ©2011 IEEE

II. MASKING CLASSIFICATION

Masking can break the dependence between the power consumption and the intermediate values in the cryptographic algorithm. The types of masking schemes are as follows.

A. According to the function of operations: masking schemes can be divided into Boolean masking, additive masking, multiplicative masking, and mixed masking.

Boolean masking: the intermediate value is concealed by exclusive-ORing (XOR) it with the mask. Thus, the masked intermediate value is $v_m = v \oplus m$. Piret and Standaert proved the security of Boolean masking schemes for block ciphers [15]. Schramm and Paar provided a higher order masking scheme based on XOR operations to defend against HODPA [16], but it cannot be applied to smart cards. A Boolean masking technique for the logic AND operation at gate level was introduced by Golic [12]; the author proposed XOR-based, MUX-based and NAND-based masking schemes, but these methods cannot resist glitch attacks.

Additive masking: the intermediate value is concealed by an additive operation. Thus, the masked intermediate value can be represented by $v_m = v + m \pmod{n}$. Additive masking requires recomputing a new S-box and storing it in RAM. Blömer et al. proposed additive masking operations, but their design took up a larger area [17]. Herbst et al. proposed an additive masking scheme for smart cards, which cannot resist HODPA [18].

Multiplicative masking: the intermediate value is concealed by a multiplicative operation. Thus, the masked intermediate value is $v_m = v \times m \pmod{n}$. Golic and Tymen proposed a modified multiplicative masking method which generates a new S-box using random additive masks [19], but this solution cannot resist 2O-DPA attacks. Trichina et al. optimized multiplicative masking by converting a Boolean mask to a multiplicative mask [20]; it can resist DPA attacks, but it takes up a larger area. Alam et al. proposed a synchronous balanced mask multiplier architecture in which glitches are minimized, but it requires more hardware [21].
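The three basic masking operations above can be illustrated with a minimal Python sketch. The function names, the example value and the moduli (256 for additive masking, and the prime 257 for multiplicative masking so that every nonzero mask is invertible) are assumptions chosen for illustration; they are not taken from the cited schemes.

```python
import secrets


def boolean_mask(v: int, m: int) -> int:
    """v_m = v XOR m; applying the same mask again removes it."""
    return v ^ m


def additive_mask(v: int, m: int, n: int) -> int:
    """v_m = v + m (mod n)."""
    return (v + m) % n


def additive_unmask(vm: int, m: int, n: int) -> int:
    """Recover v from v_m by subtracting the mask modulo n."""
    return (vm - m) % n


def multiplicative_mask(v: int, m: int, n: int) -> int:
    """v_m = v * m (mod n); m must be invertible modulo n."""
    return (v * m) % n


def multiplicative_unmask(vm: int, m: int, n: int) -> int:
    """Recover v by multiplying with the modular inverse of the mask."""
    return (vm * pow(m, -1, n)) % n


v = 0x3A                          # example intermediate value (assumed)
m = secrets.randbelow(255) + 1    # fresh random nonzero 8-bit mask

assert boolean_mask(boolean_mask(v, m), m) == v
assert additive_unmask(additive_mask(v, m, 256), m, 256) == v
# 257 is prime, so every mask in 1..255 has an inverse modulo 257.
assert multiplicative_unmask(multiplicative_mask(v, m, 257), m, 257) == v
```

In each case the device only ever handles the masked value and the mask separately, which is what decouples the power consumption from the unmasked intermediate value.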

Mixed masking: a masking scheme that combines Boolean, additive and multiplicative operations. Messerges proposed a method which converts between Boolean masking and arithmetic masking, but it cannot resist DPA attacks [22]. A sound method for switching is presented by Goubin [23]. Trichina proposed an efficient and secure AES co-processor for smart cards using XOR and AND operations, but this method cannot defend against glitch attacks [13]. Trichina et al. claimed that they realized a secure hardware implementation of AES against first order DPA attacks at gate level [24].

B. According to the application of masking: masking schemes can be divided into gate level and algorithmic level masking.

Gate level masking: Golic and Menicocci proposed a masking technique based on the AND gate, which cannot resist HODPA and glitch attacks [25]. A gate level masking scheme proposed by Fischer and Gammel [26] and an AND gate masking technique proposed by Kumar et al. [27] can resist DPA and glitch attacks, but both methods increase the size of the design.

Algorithmic level masking: Baek and Noh proposed several new efficient masking algorithms for multipliers over finite fields; however, these methods cannot resist HODPA [28]. Akkar and Giraud proposed algorithmic level masking for an implementation of AES, but their design achieved a lower clock frequency [29].

III. IMPLEMENTATION OF MASKED S-BOX

In this section, we propose our architecture for a masked S-box over GF($2^4$). Our implementation of the masked S-box is based on the work of [30]. Although [31] and [32] already presented the possibility of a masked S-box implementation, they only discussed implementation on a smart card software platform and lacked detailed design exploration for porting to a hardware platform. Some work directly uses a look-up table, which totals 256 bytes for the unprotected S-box implementation. When the masked method with two masks is implemented with a look-up table, it needs $2^8 \times 2^8 \times 256$ bytes, which is too large for hardware implementation. Therefore, the design uses less area when the operations are converted from GF($2^8$) to GF($2^4$). In [32], Oswald et al. already discussed the six pre-computed tables needed for implementing the masked S-box over GF($2^4$). The six pre-computed tables are:

$T_{d1}$: $((x + m), m') \rightarrow x^2 \times p_0 + m'$
$T_{d2}$: $((x + m), (y + m')) \rightarrow ((x + m) + (y + m')) \times (y + m')$
$T_m$: $((x + m), (y + m')) \rightarrow (x + m) \times (y + m')$
$T'_{inv}$: $((x + m), m') \rightarrow T_{inv}(x) + m'$
$T'_{map}$: $((x + m), m) \rightarrow T_{map}(x + m)$
$T'_{map^{-1}}$: $((x + m), m) \rightarrow T_{map^{-1}}(x + m)$

Figure 1. Structure of proposed masked S-box.
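The look-up table sizes quoted in this section follow from simple counting. The sketch below assumes (this reading is not spelled out in the text) that a fully tabulated masked S-box keeps one complete table for every combination of mask values; `masked_lut_bytes` and its parameters are hypothetical names used only for illustration.

```python
def masked_lut_bytes(width_bits: int, n_masks: int) -> int:
    """Bytes needed for a fully tabulated S-box over width_bits-bit values,
    assuming one complete table per combination of n_masks mask values."""
    entries_per_table = 2 ** width_bits
    mask_combinations = 2 ** (width_bits * n_masks)
    bits_total = mask_combinations * entries_per_table * width_bits
    return bits_total // 8


unprotected = masked_lut_bytes(8, 0)   # plain 8-bit S-box: 256 bytes
two_masks = masked_lut_bytes(8, 2)     # 2^8 * 2^8 * 256 bytes

print(unprotected)                 # 256
print(two_masks)                   # 16777216 bytes, i.e. 16 MiB
print(two_masks // unprotected)    # 65536 times larger
```

Under this counting, the two-mask table over GF($2^8$) is 65536 times larger than the unprotected one, which is the motivation for moving the masked operations to the smaller field.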

TABLE I. COMPARISON OF IMPLEMENTATION RESULTS OF 8-BIT MASKED S-BOX

Design | Platform | Technology | Cycles | Area | Speed
Alam [21] | | multiplicative masking | 5 | 5478 gates | 8.33 ns delay
Kamoun [33] | Xilinx Virtex 4 | algorithmic level masking | - | 100 slices | 16.67 ns delay
Suzuki [34] | XCV1000-6 | gate level masking (masked-AND) | - | 332 LUT | 35.39 ns delay
Suzuki [34] | XCV1000-6 | gate level masking (WDDL) | - | 456 LUT | 46.8 ns delay
Suzuki [34] | XCV1000-6 | gate level masking (RSL) | - | 174 LUT | 30.35 ns delay
This work | Xilinx Virtex-5 | Boolean masking | 1 | 2051 gates / 127 slices | 14.299 ns delay

Rows for Canright [14] (0.13 µm CMOS, multiplicative masking) and Baek [28] (0.18 µm CMOS, algorithmic level masking) are also listed; their cell alignment is not recoverable. The remaining entries read: 3628 gates / 59.13 ns delay; 11 cycles / 452 gates; 1023 gates / 27 ns delay; and "multiplicative masking (balanced pipelined S-box), 90 nm CMOS".

WDDL: Wave Dynamic Differential Logic; RSL: Random Switching Logic.

TABLE II. COMPARISON OF IMPLEMENTATION RESULTS OF MASKED AES

Design | Platform | Technology | Data path (bit) | Area | Speed | Thr (Mbps) | Thr/Area (Kbps/gates)
Popp [35] | 0.35 µm CMOS | gate level masking (MDPL) | - | 16.5 K gates | 9.82 MHz | - | -
Zhou [39] | | | | 42.4 K gates | - | 1150 | 27.1*
Kamoun [33] | Xilinx Virtex 4 | | | 2281 slices | 137 MHz | - | -
This work | Virtex-5 XCVLX30 | | 32 | 20.0 K gates / 885 slices | 103.3 MHz | 300.4 | 15.03
This work | Virtex-5 XCVLX50 | | 128 | 55.4 K gates / 4992 slices | 116 MHz | 1350 | 24.4

The table also lists Lin [36], Baek [28], Oswald [32], Ordu [40], Mentens [37] and Trichina [38]; the assignment of the following entries to these designs is not recoverable:

Area | Speed | Thr (Mbps) | Thr/Area (Kbps/gates)
20.1 K gates | | 111 | 5.52*
18.6 K gates | | 145* | 7.80*
30.3 K gates | | 40.6* | 1.34*
45.85 K gates* | | 78* | 1.70*
25.7 K gates | | 11.9 | 0.46*
21.7 K gates | | 74 | 3.41*
49 K gates | 43.18 ns | 900 | 18.4*
4175 slices | 22.84 ns | 140.13 |
3580 slices | 20.77 ns | 157.07 |
4452 slices | 23 MHz | 29 |

Further unaligned cells: platforms 0.18 µm CMOS, 0.25 µm CMOS, Virtex-E, XCV800-4 and 8-bit RISC architecture**; technologies gate level masking (masked-AND, WDDL, PMRML), algorithmic level masking (256 fixed masks, single mask, 16 masks), masking combining additive and multiplicative, multiplicative masking, additive masking and Boolean masking; memories 64 K ROMs, 256 ROMs, 1536 ROMs, 256 RAMs and 4096 RAMs; speeds 13 MHz*, 25 MHz*, 35.6 MHz, 46.4 MHz* and 57.1 ns.

Thr: Throughput; PMRML: Pre-charge Masked Reed-Muller Logic; MDPL: Masking Dual-Rail Pre-charge Logic; WDDL: Wave Dynamic Differential Logic.
*: The author estimated based on original work.
**: Software implementation with micro controller unit.

Among the six pre-computed tables, $T_{d1}$, $T_{d2}$, $T_m$ and $T'_{inv}$ perform the necessary operations of transforming the masked S-box over GF($2^8$) to GF($2^4$), while $T'_{map}$ performs the masked isomorphic mapping from GF($2^8$) to GF($2^4$) × GF($2^4$) and $T'_{map^{-1}}$ performs the masked isomorphic mapping back from GF($2^4$) × GF($2^4$) to GF($2^8$) with an additional masked affine transformation (Fig. 1). $T'_{inv}$ is the modified inversion operation over GF($2^4$), recomputed for the input mask $m$ and the output mask $m'$. Therefore, it needs $2^4 \times 2^4 \times 2^4 \times 4$ bits to store all the possible values of $T'_{inv}$ when $m \neq m'$. When $m = m'$, the needed memory reduces to only $2^4 \times 2^4 \times 4$ bits. All the above tables can be pre-computed and stored in read-only memory (ROM).
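A small Python sketch can make the $T'_{inv}$ storage counts concrete. It assumes that GF($2^4$) is represented modulo $x^4 + x + 1$ (the paper does not name the field polynomial), that "+" denotes XOR, and that inversion maps 0 to 0. The helper names are hypothetical, and the table layout is one plausible reading of the definition, not the authors' hardware design.

```python
POLY = 0b10011  # x^4 + x + 1, an assumed irreducible polynomial for GF(2^4)


def gf16_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^4) modulo POLY."""
    r = 0
    for _ in range(4):
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:      # reduce as soon as the degree reaches 4
            a ^= POLY
    return r


def gf16_inv(a: int) -> int:
    """Multiplicative inverse in GF(2^4); 0 is mapped to 0 by convention."""
    if a == 0:
        return 0
    return next(b for b in range(1, 16) if gf16_mul(a, b) == 1)


# Independent masks: index by masked input (x + m), input mask m, output mask m'.
t_inv_full = {
    (xm, m, mp): gf16_inv(xm ^ m) ^ mp
    for xm in range(16) for m in range(16) for mp in range(16)
}

# Shared mask (m = m'): the output mask index disappears.
t_inv_same = {
    (xm, m): gf16_inv(xm ^ m) ^ m
    for xm in range(16) for m in range(16)
}

full_bits = len(t_inv_full) * 4   # 2^4 * 2^4 * 2^4 * 4 = 16384 bits
same_bits = len(t_inv_same) * 4   # 2^4 * 2^4 * 4 = 1024 bits
print(full_bits, same_bits)
```

Every entry stores a 4-bit masked inverse, so the two dictionary sizes reproduce the bit counts given in the text; removing the output mask from a looked-up entry yields the unmasked inverse of $x$.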

IV. RESULT AND COMPARISON

In this section, we implemented the proposed design in a Hardware Description Language (HDL), synthesized it using Xilinx ISE 12.1 and ported it to a Virtex-5 FPGA. Table I compares our masked S-box with existing masked S-box designs. From Table I, Alam's design is 1.7 times faster than ours (8.33 ns versus 14.299 ns delay). However, our design needs only 1 clock cycle for transforming the S-box from GF($2^8$) to GF($2^4$), whereas Alam's design needs 5 clock cycles. Table II shows a comparison of masked AES encryption implementations. It is hard to make a fair comparison because different techniques and target platforms are used. We achieve the best throughput for masked AES with the 128-bit data path, which is 17% higher than Zhou's design (the best among the existing methods). Our masked AES with the 32-bit data path takes up 21% more area than Popp's (the smallest among the existing methods). Kamoun's design is the fastest among all the implementations, achieving 137 MHz. Although Zhou's design is slightly better than ours in throughput/area, it cannot resist glitch attacks.
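The ratios quoted in this section can be re-derived from the entries of Tables I and II. The snippet below is only an arithmetic check. The attribution of the 1150 Mbps / 42.4 K gates entry to Zhou's design and of the 16.5 K gates entry to Popp's design is inferred from the statements above, and the variable names are illustrative.

```python
# Table I: S-box delay (ns)
alam_delay_ns, our_delay_ns = 8.33, 14.299
speed_ratio = our_delay_ns / alam_delay_ns          # about 1.7

# Table II: throughput (Mbps) and area (K gates)
our_128_thr, our_128_area = 1350, 55.4
our_32_thr, our_32_area = 300.4, 20.0
zhou_thr, zhou_area = 1150, 42.4                    # row attribution inferred
popp_area = 16.5                                    # row attribution inferred

thr_gain_pct = (our_128_thr / zhou_thr - 1) * 100       # about 17 %
area_overhead_pct = (our_32_area / popp_area - 1) * 100  # about 21 %

# Mbps divided by K gates gives Kbps per gate.
our_128_thr_per_area = our_128_thr / our_128_area   # about 24.4
our_32_thr_per_area = our_32_thr / our_32_area      # about 15.0
zhou_thr_per_area = zhou_thr / zhou_area            # about 27.1

print(round(speed_ratio, 1), round(thr_gain_pct), round(area_overhead_pct))
print(round(our_128_thr_per_area, 1), round(zhou_thr_per_area, 1))
```

The check also shows why Zhou's design stays ahead in throughput/area: its throughput is lower, but its gate count is lower by a larger factor.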

V. CONCLUSION

In this paper, we detail the existing masking methods and classify them by their functions, masking values and applications. We also explore some methods for masked S-box implementation. To resist DPA and glitch attacks, we proposed optimized AES implementations with 32-bit and 128-bit data paths (masked S-box over GF($2^4$)). We also show that the design efficiently resists glitch attacks because its operations consist only of table look-ups and XOR operations, which removes the foundation (the AND operation) of glitch attacks. Our proposed design compares favorably with the existing designs, with the 128-bit data path achieving the highest throughput among those compared.

REFERENCES

[1] National Institute of Standards and Technology, "Advanced Encryption Standard (AES)," FIPS-197, 2001.
[2] P. Kocher, J. Jaffe and B. Jun, "Differential power analysis," in CRYPTO'99, vol. 1666, Springer-Verlag, 1999, pp. 388-397.
[3] T. S. Messerges, E. A. Dabbish and R. H. Sloan, "Investigations of power analysis attack on smartcards," Proc. USENIX Association Workshop on Smartcard Technology (WOST 99), USENIX Press, 1999, pp. 151-162.
[4] R. Mayer-Sommer, "Smartly analyzing the simplicity and the power of simple power analysis on smartcards," in CHES 2000, vol. 1965, Springer-Verlag, 2000, pp. 78-92.
[5] S. Mangard, "A simple power analysis attack on implementations of the AES key expansion," in ICISC 2002, vol. 2587, Springer-Verlag, 2003, pp. 343-358.
[6] S. B. Örs, F. Gürkaynak, E. Oswald and B. Preneel, "Power-analysis attack on an ASIC AES implementation," in ITCC 2004, IEEE Press, Apr. 2004, pp. 546-566.
[7] S. Mangard, N. Pramstaller and E. Oswald, "Successfully attacking AES hardware implementations," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 157-171.
[8] J. Waddle and D. Wagner, "Towards efficient second-order power analysis," in CHES 2004, vol. 3156, Springer-Verlag, 2004, pp. 1-15.
[9] M. Joye, P. Paillier and B. Schoenmakers, "On second-order differential power analysis," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 293-308.
[10] E. Prouff, M. Rivain and R. Bevan, "Statistical analysis of second order differential power analysis," IEEE Transactions on Computers, vol. 58, 2009, pp. 799-811.
[11] S. Mangard, E. Oswald and T. Popp, "Power analysis attacks: revealing the secrets of smart cards," Springer-Verlag, 2007.
[12] J. D. Golic, "Techniques for random masking in hardware," IEEE Transactions on Circuits and Systems, vol. 54, Feb. 2007, pp. 291-300.
[13] E. Trichina, "Combinational logic design for AES subbyte transformation on masked data," Cryptology ePrint Archive (http://eprint.iacr.org/), Report 2003/236, 2003.
[14] D. Canright and L. Batina, "A very compact "perfectly masked" S-box for AES," ACNS 2008, vol. 5037, Springer-Verlag, 2008, pp. 446-459.
[15] G. Piret and F. X. Standaert, "Security analysis of higher-order Boolean masking schemes for block ciphers (with conditions of perfect masking)," IET Information Security, vol. 2, Mar. 2007, pp. 1-11.
[16] K. Schramm and C. Paar, "Higher order masking of the AES," in CT-RSA 2006, vol. 3860, Springer-Verlag, 2006, pp. 208-225.
[17] J. Blömer, J. Guajardo and V. Krummel, "Provably secure masking of AES," in Cryptography, vol. 3357, Springer-Verlag, 2005, pp. 69-83.
[18] C. Herbst, E. Oswald and S. Mangard, "An AES smart card implementation resistant to power analysis attacks," in ACNS 2006, vol. 3989, Springer-Verlag, 2006, pp. 239-252.
[19] J. Golic and C. Tymen, "Multiplicative masking and power analysis of AES," in CHES 2003, vol. 2529, Springer-Verlag, 2003, pp. 31-47.
[20] E. Trichina, D. De Seta and L. Germani, "Simplified adaptive multiplicative masking for AES," in CHES 2002, vol. 2523, Springer-Verlag, 2003, pp. 71-85.
[21] M. Alam, S. Ghosh, M. J. Mohan, D. Mukhopadhyay, D. R. Chowdhury and I. S. Gupta, "Effect of glitches against masked AES S-box implementation and countermeasure," IET Information Security, vol. 3, Feb. 2009, pp. 34-44.
[22] T. Messerges, "Securing the AES finalists against power analysis attacks," in FSE 2001, vol. 1978, Springer-Verlag, 2001, pp. 293-301.
[23] L. Goubin, "A sound method for switching between Boolean and arithmetic masking," in CHES 2001, vol. 2162, Springer-Verlag, 2001, pp. 3-15.
[24] E. Trichina, T. Korkishko and K. H. Lee, "Small size, low power, side channel-immune AES coprocessor: design and synthesis results," in AES 2005, vol. 3373, Springer-Verlag, 2005, pp. 113-127.
[25] J. D. Golic and R. Menicocci, "Universal masking on logic gate level," IET Electronics Letters, vol. 40, May 2004, pp. 526-528.
[26] W. Fischer and B. M. Gammel, "Masking at gate level in the presence of glitches," in CHES 2005, vol. 3659, Springer-Verlag, 2005, pp. 187-200.
[27] K. Kumar, D. Mukhopadhyay and D. RoyChowdhury, "Design of a differential power analysis resistant masked AES S-Box," in INDOCRYPT 2007, vol. 4859, Springer-Verlag, 2007, pp. 373-383.
[28] Y. Baek and M. Noh, "DPA-resistant finite field multipliers and secure AES design," in Information Security Practice and Experience, vol. 3903, Springer-Verlag, 2006, pp. 1-12.
[29] M. Akkar and C. Giraud, "An implementation of DES and AES, secure against some attacks," in CHES 2001, vol. 2162, Springer-Verlag, 2001, pp. 309-318.
[30] J. Wolkerstorfer, E. Oswald and M. Lamberger, "An ASIC implementation of the AES Sboxes," in CT-RSA 2002, vol. 2271, Springer-Verlag, 2002, pp. 67-78.
[31] E. Oswald, S. Mangard, N. Pramstaller and V. Rijmen, "A side-channel analysis resistant description of the AES S-box," in FSE, vol. 3557, Springer-Verlag, 2005, pp. 413-423.
[32] E. Oswald and K. Schramm, "An efficient masking scheme for AES software implementations," in WISA, vol. 3786, Springer-Verlag, 2005, pp. 292-305.
[33] N. Kamoun, L. Bossuet and A. Ghazel, "SRAM-FPGA implementation of masked S-Box based DPA countermeasure for AES," in IDT 2008, pp. 74-77.
[34] D. Suzuki, M. Saeki and T. Ichikawa, "Random Switching Logic: A countermeasure against DPA based on transition probability," Cryptology ePrint Archive (http://eprint.iacr.org/), Report 2004/346, 2004.
[35] T. Popp and S. Mangard, "Masked dual-rail pre-charge logic: DPA-resistance without routing constraints," Cryptographic Hardware and Embedded Systems, vol. 3659, Springer-Verlag, 2005, pp. 172-186.
[36] K. J. Lin, S. C. Fang, S. H. Yang and C. C. Lo, "Overcoming glitches and dissipation timing skews in design of DPA-resistant cryptographic hardware," Proc. IEEE Symp. Design, Automation & Test in Europe Conference & Exhibition (DATE 07), IEEE Press, Apr. 2007, pp. 1-6.
[37] N. Mentens, L. Batina, B. Preneel and I. Verbauwhede, "An FPGA implementation of Rijndael: trade-offs for side-channel security," Programmable Devices and Systems (IFAC 04), 2004, pp. 493-498.
[38] E. Trichina and T. Korkishko, "Secure AES hardware module for resource constrained devices," in Security in Ad-hoc and Sensor Networks, vol. 3313, Springer-Verlag, 2005, pp. 215-229.
[39] Y. Zhou, G. Qian, Y. Xing, H. Liu, S. Goto and Y. Tsunoo, "An approach of using different positions of double registers to protect AES hardware structure from DPA," Proc. IEEE Symp. Electronic Commerce and Security, IEEE Press, 2010, pp. 223-227.
[40] L. Ordu and B. Ors, "Power analysis resistant implementations of AES," ICECS 07, 2007, pp. 1408-1411.
