SDMLp: On the Use of Complementary Pass Transistor Logic for Design of DPA Resistant Circuits
Lakshmi Narasimhan Ramakrishnan∗, Manoj Chakkaravarthy†, Antarpreet Singh Manchanda‡, Mike Borowczak and Ranga Vemuri
Digital Design Environments Lab, School of Electronics and Computing Systems, University of Cincinnati, Cincinnati, Ohio 45221–0030
{ramakrln, chakkamj, manchaas}@mail.uc.edu, {borowcm, vemurir}@ucmail.uc.edu
This work was performed while at the University of Cincinnati. Authors now at Synopsys∗, Intel†, and NVIDIA‡.
Abstract—The emergence and proliferation of Smart Cards and other security-centric technologies require ongoing advancement in secure-IC design. We propose advanced IC protection from Differential Power Analysis attacks through a hybrid logic style based on Complementary Pass-transistor Logic and Dynamic and Differential Logic (DDL), in conjunction with a synthesis methodology based on Reduced Ordered Binary Decision Diagrams. We demonstrate the capabilities of our logic cell and compare it with Wave Dynamic Differential Logic and the traditional Standard Complementary Logic (SCMOS). Experimental results on a DES layout show significant reduction in area (43%) and total power (50%) and near constant power consumption in every cycle when compared to existing DDL styles at a speed penalty (20%) against SCMOS.
Index Terms—SDMLp, DPA, Dynamic Differential Logic, Pass Transistor Logic, ROBDD Based Synthesis
I. INTRODUCTION
Increases in the use of smart cards and other cryptographic devices for authentication and secure communication, coupled with the increasing sophistication of the attacks on them, necessitate new design methodologies for secure devices. Two broad topics exist in the development of secure devices: the design of mathematically secure or hardened algorithms and the construction of secure physical implementations. Mathematical hardening, while potentially effective against cryptanalysis, does not guarantee security of implemented ASIC designs: data dependent information can leak through secondary sources such as power consumption, output delay, and EM emanations. Most physical attacks on secure devices, beyond those of brute force, target implementation weaknesses that are based on the correlation between device outputs and the leaked secondary information [1],[2],[3]. These methods belong to a class of attacks known as Side Channel Attacks; they gain information through the use of secondary data dependent sources not typically considered important to protect. In the past decade, significant work has gone into improving side channel attack mechanisms and counter-measures [4], [5]. Unfortunately, many existing countermeasures are quite specific in nature, limiting their applicability. Generic physical design methods exist, but suffer significant power and/or area penalties, thus making them unsuitable for low power applications. Our methodology narrows the existing gap by reducing data dependent power variations while also reducing overall area and power consumption.
The remainder of the paper is organized as follows: Section II covers the motivation for our work, Section III explains the proposed logic style, SDMLp, Section IV explains the synthesis technique, Section V includes comparative results and analysis of SDMLp performance, and finally Section VI draws conclusions and discusses our future direction.
II. MOTIVATION
One of the most powerful side channel attacks, based on Differential Power Analysis or DPA, proposed by Kocher et al. [6], enhances the detection of data dependent information present in the power consumption of a device. DPA is a major threat to security as it is effective and cheap to execute [7]. The attack exploits the relationship between power consumption and data dependent computations of the device to reveal sensitive information. The statistical nature of the attack eliminates the effects of noise and weak security measures. Output data dependencies originate from the characteristics of the widely used complementary CMOS logic style, which consumes disproportionate power for each of the four possible transition scenarios: 0 → 0, 1 → 1, 0 → 1 and 1 → 0.
To address this problem several logic styles have been proposed. First, Tiri et al. proposed Sense Amplifier Based Logic (SABL) [8], which introduced the concept of Dynamic and Differential Logic (DDL) and capacitance matching. DDL enables one switching per cycle by having both complementary and non-complementary signals which generate differential outputs and by pre-charging output nodes before evaluation. Second, Wave Dynamic Differential Logic (WDDL), developed by Tiri et al. [9], introduced the idea of pre-charge wave propagation to enable DDL in standard static complementary CMOS (SCMOS) cells. It is critical to note that a WDDL implementation consumes approximately twice the power and requires more than twice the area of an SCMOS implementation. Later, Reduced Complementary Dynamic and Differential Logic (RCDDL) [10] was developed to consume less area and power than WDDL. Both RCDDL and WDDL use complementary logic structures to create complementary sections of a circuit, but have different switching capacitance due to their inherent physical structures. Our methodology for DPA resistant circuits addresses the issues of increased area and power consumption while consuming near constant power for all operations.
III. SDMLP: SECURE DIFFERENTIAL MULTIPLEXER LOGIC USING PASS TRANSISTORS
While Pass-Transistor logic styles have been analyzed extensively from an area and power savings perspective, their use in the design of secure logic styles has not been given any attention. Complementary Pass-transistor Logic (CPL) [11] is a differential logic and has a symmetrical structure, making it an ideal candidate for secure circuit design. The fact that many logic functions can be realized using the same basic design makes it all the more attractive. Unfortunately, CPL only satisfies the differential requirement of a secure logic, not the dynamic requirement of one switching per cycle.
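The difference between the differential and the dynamic requirement can be illustrated with a small behavioral sketch. It counts only rising (0 → 1) output transitions per cycle, which is a deliberate simplification of real power consumption, and the function names and the initial rail states are assumptions made for this illustration rather than anything defined in the paper.

```python
def rising_edges(prev, curr):
    """Count wires that go from 0 to 1 between two snapshots."""
    return sum(1 for p, c in zip(prev, curr) if p == 0 and c == 1)


def static_single_rail(bits, initial=0):
    """One output wire that simply follows the data (SCMOS-like)."""
    counts, prev = [], (initial,)
    for b in bits:
        curr = (b,)
        counts.append(rising_edges(prev, curr))
        prev = curr
    return counts


def static_differential(bits, initial=0):
    """Two complementary rails without a pre-discharge phase (CPL-like)."""
    counts, prev = [], (initial, 1 - initial)
    for b in bits:
        curr = (b, 1 - b)
        counts.append(rising_edges(prev, curr))
        prev = curr
    return counts


def dynamic_differential(bits):
    """Two rails forced to (0, 0) before every evaluation (DDL-like)."""
    counts = []
    for b in bits:
        discharged = (0, 0)          # setup phase: both rails low
        evaluated = (b, 1 - b)       # evaluation phase: exactly one rail rises
        counts.append(rising_edges(discharged, evaluated))
    return counts


if __name__ == "__main__":
    data = [0, 1, 1, 0, 0, 1]
    print("single rail        :", static_single_rail(data))
    print("static differential:", static_differential(data))
    print("dynamic differential:", dynamic_differential(data))
```

In this toy model the single-rail and static differential counts change with the data sequence, while the dynamic differential count is one in every cycle, which is the "one switching per cycle" property the text refers to.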
TABLE I. GENERIC CPL GATE INPUT PARAMETERS WITH ASSOC. OUTPUT
IP1 | IP2 | S | Out | Output
A | B | B | A·B | A·B
B | A | B | A+B | A+B
A | A | B | A·B+A·B | A·B+A·B
B | A | S | S·A+S·B | S·A+S·B
A | B | B | A·B | A+B
B | A | B | A+B | A·B
A | A | A | A | A
SDMLp is based on CPL and consists of two major transistor networks: Evaluation and Pre-discharge. The pre-discharge network is made up of PMOS transistors and is essential for introducing dynamic behavior to CPL, making SDMLp a DDL. The Evaluation network is made up of NMOS transistors, similar to that of CPL. Figs. 1 and 2 show a two-input generic cell implemented in CPL and SDMLp respectively. These cells can be configured based on Table I to realize any two-input function. The inputs Ip1bar, Ip2bar and Sbar are complements of Ip1, Ip2 and S respectively. Further, the Out and Outbar functions of the SDMLp cell are formally expressed using (1) and (2). In the cell, transistors m1, m2, m7 and m8 form the NMOS network used for evaluation, while transistors m3, m4, m9 and m10 are used to propagate the pre-discharge signal during setup.
Out = Ip1 · S + Ip2 · Sbar + S · Sbar    (1)
Outbar = Ip1bar · S + Ip2bar · Sbar + S · Sbar    (2)
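As a sanity check on (1) and (2), the cell can be modelled at the Boolean level. The sketch below is only a logic model, not a transistor-level description; the function names are hypothetical, the S · Sbar term is copied as printed, and the AND and OR configurations are taken from the first two rows of Table I.

```python
def sdmlp_cell(ip1, ip2, s, sbar):
    """Boolean model of (1) and (2); all signals are 0 or 1."""
    ip1bar, ip2bar = 1 - ip1, 1 - ip2
    out = (ip1 & s) | (ip2 & sbar) | (s & sbar)            # equation (1)
    outbar = (ip1bar & s) | (ip2bar & sbar) | (s & sbar)   # equation (2)
    return out, outbar


def and_gate(a, b):
    """Table I, first row: IP1 = A, IP2 = B, S = B."""
    return sdmlp_cell(a, b, b, 1 - b)


def or_gate(a, b):
    """Table I, second row: IP1 = B, IP2 = A, S = B."""
    return sdmlp_cell(b, a, b, 1 - b)


def pre_discharge(ip1, ip2):
    """Setup phase: S and Sbar are both forced low."""
    return sdmlp_cell(ip1, ip2, 0, 0)


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b),
                  "pre-discharge:", pre_discharge(a, b))
```

During evaluation the two outputs are always complementary, and with S = Sbar = 0 both outputs are 0 regardless of the data inputs, which matches the pre-discharge behavior described in the text.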
A. Setup - Pre-discharge phase
Fig. 1. CPL Cell
The pre-discharge phase begins when both S and Sbar are forced to logic low. This is accomplished using methods discussed in [9]. When S and Sbar are pulled down to logic level 0, the evaluation network is turned off and the pre-discharge network is activated. In other words, the NMOS transistors m1, m2, m7 and m8 stop conducting while the PMOS transistors m3, m4, m9 and m10 start conducting, forcing Out and Outbar to logic level 0. Pre-discharge signal propagation is required to initialize interconnect and internal capacitances of the circuit. Pre-discharge signal propagation, along with the near constant capacitance of the cell due to its symmetric structure, enables constant power consumption during evaluation.
B. Evaluation phase
The evaluation phase begins when S and Sbar return to being complements of each other. During the evaluation phase no direct path exists through the pre-discharge network, and the transistors m1, m2, m7 and m8 evaluate to complementary Out and Outbar signals.
C. Early Propagation Effects
Fig. 2. SDMLp Cell
Like other secure DDL logic styles (e.g., WDDL, RCDDL), SDMLp can potentially suffer from early propagation effects [12]. The early propagation effect could be a potential source of data-dependent power consumption. While the leakage can be reduced by adjusting the time delay between the input signals, doing so usually requires many additional constraints in the circuit design [13].
IV. SYNTHESIS AND IMPLEMENTATION OF SDMLP CIRCUITS
The SDMLp cell is a dual rail multiplexer. A binary decision diagram (BDD) provides an efficient data structure to represent a Boolean equation in the form of a directed acyclic graph (DAG). Every node in a BDD provides a one to one mapping to a multiplexer, in this case, to an SDMLp cell. G. Paul et al. discuss a similar BDD based synthesis for a dual rail multiplexer-based logic [14]. An optimal variable ordering is obtained using the CUDD package [15] and a ROBDD is generated using the BDD reduction procedure [16]. As mentioned in [14], the availability of differential signals enables further optimization of the ROBDD by adding a step to the BDD reduction procedure that eliminates nodes whose children are complementary, as shown in Fig. 3.
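The node-to-multiplexer mapping can be illustrated with a minimal ROBDD builder. This is a sketch under stated assumptions: it uses a fixed variable order and plain Shannon expansion instead of CUDD's ordering heuristics, it applies only the two standard reduction rules of [16], and the helper `is_complement` merely flags nodes whose children are complementary without performing the additional rewiring step of [14]. All names are hypothetical.

```python
def build_robdd(func, num_vars):
    """Build a reduced ordered BDD for func under the order x0, x1, ...

    Returns (root, nodes); nodes maps a node id to (var, low, high).
    Ids 0 and 1 are the terminals.
    """
    unique = {}   # (var, low, high) -> node id, shares isomorphic subgraphs
    nodes = {}

    def make_node(var, low, high):
        if low == high:                  # redundant test: no multiplexer needed
            return low
        key = (var, low, high)
        if key not in unique:
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    def expand(var, assignment):
        if var == num_vars:
            return int(bool(func(*assignment)))
        low = expand(var + 1, assignment + (0,))
        high = expand(var + 1, assignment + (1,))
        return make_node(var, low, high)

    return expand(0, ()), nodes


def evaluate(root, nodes, assignment):
    """Walk the diagram the way a chain of 2:1 multiplexers would select."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if assignment[var] else low
    return node


def is_complement(a, b, nodes):
    """True if the functions rooted at a and b are complements."""
    if a <= 1 or b <= 1:
        return a <= 1 and b <= 1 and a != b
    var_a, low_a, high_a = nodes[a]
    var_b, low_b, high_b = nodes[b]
    return (var_a == var_b
            and is_complement(low_a, low_b, nodes)
            and is_complement(high_a, high_b, nodes))


if __name__ == "__main__":
    root, nodes = build_robdd(lambda a, b: a ^ b, 2)
    print("XOR needs", len(nodes), "multiplexer nodes")
    _, low, high = nodes[root]
    print("root children complementary:", is_complement(low, high, nodes))
```

Each entry in `nodes` corresponds to one multiplexer, so the node count gives a rough cell count for the function before any dual-rail specific optimization. For XOR the two children of the root are complements of each other, which is the kind of situation the extra reduction step described above is aimed at.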
Fig. 3. Complementary Nodes
Fig. 4. SDMLp design flow
Fig. 5. SDMLp cell layout
Fig. 4 shows the design flow when using the SDMLp logic cell. The ROBDD synthesis program generates an SDMLp Verilog gate-level netlist for the combinational portion of the cryptography circuit (S-boxes, XOR trees and multiplexer modules in the case of DES), which is then integrated into a physical design flow. BLIF generation was done using a Perl script and the BDD reduction program was implemented in C++.
V. EXPERIMENTAL RESULTS AND ANALYSIS
All the experiments were performed in 90nm technology using the implementation flow described in Section IV. The SDMLp custom library was designed using Custom Designer. Logical synthesis was performed using DC Compiler, place and route using IC Compiler, and the post-layout timing verification was performed using PrimeTime. NanoSim was used for functional verification and for HSPICE simulation of the extracted netlist and to obtain current traces. The current traces were analyzed using Matlab, Perl and GNU C++.¹
¹ All tools are Registered Trademarks of Synopsys, Inc., except Matlab, Perl and C++, which are Registered Trademarks of MathWorks, Inc., The Perl Foundation and The Free Software Foundation, Inc. respectively.
A. Cell level analysis
Experimental results presented in Table II show a significant reduction in maximum instantaneous current variance, between all possible input transitions, for several basic gates. This near constant switched current consumption improves the attack resistance of SDMLp but, as Table III shows, incurs a delay penalty when compared to SCMOS. From Tables IV and V, we see that SDMLp also consumes less power and occupies less area than WDDL. As mentioned earlier, constant load capacitance was used for both Out and Outbar outputs during simulation. Similar to WDDL, this is a requirement for optimum performance of SDMLp and can be met by using fat wire routing [17] or other routing techniques for balancing capacitances of complementary wires.
TABLE II. MAXIMUM INSTANTANEOUS CURRENT VARIANCE (10^-9 Amps^2)
Gates (2X1) | SCMOS | SDMLp | WDDL
AND | 35.81 | 1.29 | 7.91
OR | 32.63 | 1.36 | 6.96
XOR | 69.74 | 1.39 | 7.73
MUX | 74.43 | 1.26 | 8.64
Avg | 53.15 | 1.325 | 7.81
Std Dev | 21.98 | 0.06 | 0.69
TABLE III. PROPAGATION DELAY (10^-12 SECONDS)
Gates (2X1) | SCMOS | SDMLp | WDDL
AND | 24.86 | 35.61 | 37.6
OR | 25.6 | 36.1 | 35.56
XOR | 49.81 | 36.52 | 78.61
MUX | 47.62 | 37.23 | 77.22
Avg | 36.97 | 36.37 | 57.25
Std Dev | 13.59 | 0.69 | 23.89
TABLE IV. LAYOUT AREA (λ^2)
Gates (2X1) | SCMOS | SDMLp | WDDL
AND | 7.6 | 18.4 | 15.2
OR | 7.6 | 18.4 | 15.2
XOR | 12.2 | 18.4 | 24.4
MUX | 11.4 | 18.4 | 22.8
Avg | 9.7 | 18.4 | 19.4
Std Dev | 2.45 | 0.00 | 4.89
TABLE V. AVERAGE POWER CONSUMPTION (10^-6 WATTS)
Gates (2X1) | SCMOS | SDMLp | WDDL
AND | 5.29 | 7.41 | 13.95
OR | 5.62 | 7.43 | 14.47
XOR | 6.69 | 7.02 | 22.12
MUX | 7.02 | 6.93 | 21.98
Avg | 6.15 | 7.20 | 18.13
Std Dev | 0.83 | 0.26 | 4.53
B. DES Cryptographic circuit
In our second experiment, we repeated our cell based experiment on a DES circuit and its major cryptographic components. The DES hardware consists of eight Substitution boxes (S-boxes) [18]. In this experiment, all 8 DES S-boxes were implemented in the SCMOS, SDMLp and WDDL logic styles; their area, power and instantaneous current variance are characterized in Tables VI-VIII respectively. After the implementation and characterization of the S-boxes, a full DES hardware implementation was done using the three logic styles, showing significant reductions in area, power and instantaneous power at an expected speed penalty over WDDL. Table IX summarizes all the results, in particular highlighting the dramatic move towards the typical SCMOS design metrics, with the exception of speed. Fig. 6 shows the three DES layouts in the same bounding box of 260λ by 260λ.
TABLE VI. DES LAYOUT AREA (λ^2)
DES S-box | SCMOS | SDMLp | WDDL
1 | 1137.45 | 1766.09 | 2619.83
2 | 1125.54 | 1586.3 | 2503.26
3 | 1091.42 | 1537.81 | 2434.72
4 | 1101.71 | 1235.04 | 2393.85
5 | 1079.64 | 1615.82 | 2360.16
6 | 1120.24 | 1651.18 | 2408.69
7 | 1122.96 | 1512.04 | 2499.77
8 | 1055.29 | 1560.09 | 2461.09
Avg | 1104 | 1558 | 2460
Std Dev | 28 | 152 | 82
TABLE VII. DES TOTAL POWER CONSUMPTION (10^-5 Watts)
DES S-box | SCMOS | SDMLp | WDDL
1 | 9.03 | 13.9 | 20.1
2 | 9.00 | 13.2 | 17.9
3 | 8.99 | 12.7 | 18.5
4 | 9.04 | 11.0 | 14.9
5 | 8.99 | 13.9 | 18.7
6 | 9.00 | 13.1 | 19.2
7 | 9.02 | 13.0 | 17.1
8 | 8.98 | 13.2 | 18.3
Avg | 9.00 | 13.0 | 18.09
Std Dev | 0.02 | 0.91 | 1.56
TABLE VIII. DES MAXIMUM INSTANTANEOUS CURRENT VARIATION (10^-7 Amps^2)
DES S-box | SCMOS | SDMLp | WDDL
1 | 7490 | 3.03 | 43.6
2 | 186.0 | 4.01 | 83.6
3 | 323.0 | 4.11 | 38.9
4 | 8690 | 5.30 | 14.8
5 | 245.0 | 7.18 | 32.3
6 | 7670 | 4.29 | 53.5
7 | 9420 | 4.62 | 32.3
8 | 323.0 | 4.77 | 49.7
Avg | 4293 | 4.66 | 43.59
Std Dev | 4342 | 1.21 | 20.15
TABLE IX. FULL CHIP DES IMPLEMENTED IN SCMOS, SDMLP AND WDDL
DES Full Chip | SCMOS | SDMLp | WDDL
Area (λ^2) | 13247.21 | 18715.26 | 32714.64
Total Power (mW) | 1.37 | 1.47 | 2.96
Max. Op Freq. (MHz) | 100 | 66.67 | 83.33
Inst. Current Var. (10^-6 Amps^2) | 1892.3 | 5.95 | 229.1
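The headline percentages quoted for the full chip can be recomputed directly from the Table IX values. The short sketch below does only that arithmetic; the dictionary layout and function names are illustrative choices, not part of the authors' tool flow.

```python
# Full-chip DES figures copied from Table IX.
TABLE_IX = {
    "area_lambda2": {"SCMOS": 13247.21, "SDMLp": 18715.26, "WDDL": 32714.64},
    "power_mw":     {"SCMOS": 1.37,     "SDMLp": 1.47,     "WDDL": 2.96},
    "fmax_mhz":     {"SCMOS": 100.0,    "SDMLp": 66.67,    "WDDL": 83.33},
    "current_var":  {"SCMOS": 1892.3,   "SDMLp": 5.95,     "WDDL": 229.1},
}


def reduction_pct(metric, style="SDMLp", reference="WDDL"):
    """Percentage by which `style` is lower than `reference` for a metric."""
    new = TABLE_IX[metric][style]
    ref = TABLE_IX[metric][reference]
    return 100.0 * (ref - new) / ref


def fold_reduction(metric, style="SDMLp", reference="WDDL"):
    """How many times smaller `style` is than `reference` for a metric."""
    return TABLE_IX[metric][reference] / TABLE_IX[metric][style]


if __name__ == "__main__":
    print(f"area reduction vs WDDL : {reduction_pct('area_lambda2'):.1f}%")
    print(f"power reduction vs WDDL: {reduction_pct('power_mw'):.1f}%")
    print(f"speed penalty vs WDDL  : {reduction_pct('fmax_mhz'):.1f}%")
    print(f"speed penalty vs SCMOS : "
          f"{reduction_pct('fmax_mhz', reference='SCMOS'):.1f}%")
    print(f"current variance ratio : {fold_reduction('current_var'):.1f}x")
```

With these inputs the area and power reductions relative to WDDL come out near 43% and 50%, the current variance ratio near 40x, and the 20% speed penalty corresponds to the comparison against WDDL; against SCMOS the same frequency figures give a penalty of roughly one third.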
Fig. 6. DES layouts for (a) SCMOS, (b) WDDL and (c) SDMLp.
C. Attack on DES Single Round Implementation
In our final experiment, a Differential Power Analysis attack was performed on one round of a DES circuit designed using SCMOS, SDMLp and WDDL cells. The attack was performed based on the framework adopted by Junee for DES [19] and Tanimura & Dutt for AES [20]. We arbitrarily chose a secret key (20) while performing the attack. The secret key of the SCMOS circuit was easily revealed on application of 70 random vectors. On the other hand, the secret key for the SDMLp and WDDL implementations could not be revealed even after the application of 5,000 random vectors.
In Fig. 7 the waveform shows the Correlation Coefficient versus Number of vectors during the DPA attack for SCMOS, SDMLp and WDDL. The darkened line corresponds to the original key. It can be seen that the correlation coefficient corresponding to the key stands out distinctly from other values over time in the case of the SCMOS design, while the correlation coefficient for the original key in SDMLp and WDDL is indistinguishable from the other correlation values, thereby improving attack resistance.
Fig. 8 plots the Correlation Coefficient versus all 64 key guesses for SCMOS, SDMLp and WDDL respectively; a distinguishable peak in correlation identifies the key. Thus the secret key is revealed in the case of SCMOS, but in the case of WDDL and SDMLp there is no single key guess with a distinct peak. Finally, the correlation for the correct key guess (20) is indistinguishable from the correlation coefficients of other keys: further traces would not enable its distinction either.
VI. CONCLUSION AND FUTURE WORK
In this paper, we have proposed and validated a new DPA resistant logic style and methodology based on CPL and DDL.
Fig. 7. DPA attack on (a) SCMOS, (b) WDDL and (c) SDMLp: Correlation Coefficient vs. Number of Vectors.
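The key-guess ranking behind the correlation plots of Section V-C can be sketched with a toy model. This is not the framework of [19] or [20] and uses no measured traces: it assumes a noise-free Hamming-weight leakage, feeds all 64 six-bit inputs instead of random vectors, and indexes the rows of DES S-box S1 linearly rather than with the DES row and column bit selection. The constant trace stands in for an idealized constant-power logic style.

```python
from math import sqrt

# Rows of DES S-box S1, flattened. Linear indexing by a 6-bit value is a
# simplification made for this sketch.
SBOX = [
    14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7,
    0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8,
    4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0,
    15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13,
]
SECRET_KEY = 20              # same arbitrary key value as in the text
INPUTS = list(range(64))     # every 6-bit input once


def hamming_weight(value):
    return bin(value).count("1")


def pearson(xs, ys):
    """Pearson correlation; defined as 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def key_correlations(inputs, traces):
    """Correlate the traces with the predicted leakage of each key guess."""
    return [
        pearson([hamming_weight(SBOX[p ^ guess]) for p in inputs], traces)
        for guess in range(64)
    ]


# Data-dependent leakage (toy stand-in for a single-rail style).
leaky_traces = [float(hamming_weight(SBOX[p ^ SECRET_KEY])) for p in INPUTS]
# Idealized constant power per cycle (toy stand-in for a DDL style).
flat_traces = [4.0 for _ in INPUTS]

if __name__ == "__main__":
    leaky = key_correlations(INPUTS, leaky_traces)
    flat = key_correlations(INPUTS, flat_traces)
    print("best guess, leaky traces:", max(range(64), key=lambda g: leaky[g]))
    print("max correlation, flat traces:", max(flat))
```

In this idealized setting the data-dependent traces produce a single dominant peak at the correct key guess, while the constant traces give no guess any advantage. Real measurements add noise and residual imbalance, so actual correlation curves are far less clean than this model suggests.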
Fig. 8. DPA attack on (a) SCMOS, (b) WDDL and (c) SDMLp: Correlation Coefficient vs. Key Guess.
We have shown its simplicity and flexibility by implementing a universal cell. We have characterized its performance and compared it with SCMOS and WDDL. The SDMLp cell, when compared to existing secure libraries, shows an approximately 2-fold reduction in area, a 40-fold reduction in constant current consumption and a 2-fold reduction in total power. We have shown that the logic style is scalable by implementing a DES chip and comparing its performance against WDDL and SCMOS. The proposed logic style is slower than WDDL (20%) due to the use of pass transistors; however, there is a significant reduction in area (43%), power (50%) and current variance (40x). Finally, our design style, in addition to its benefits over existing styles, maintains side-channel resistance. A comparative analysis of side channel strength between our style and existing styles remains an open task. DES chips in all three logic styles are currently being fabricated for further validation and characterization of the proposed logic style.
ACKNOWLEDGMENT
The authors would like to thank Arun Kettimuthu and Greg Mefford, of the Digital Design Environments Laboratory, whose parallel and related work provided us with valuable insight and feedback into this work.
REFERENCES
[1] D. Agrawal, B. Archambeault, J. R. Rao, and P. Rohatgi, “The EM side-channel(s),” in Revised Papers from the 4th International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES ’02. London, UK: Springer-Verlag, 2003, pp. 29–45.
[2] P. C. Kocher, “Timing attacks on implementations of Diffie-Hellman, RSA, DSS, and other systems,” in Proceedings of the 16th Annual International Cryptology Conference on Advances in Cryptology, ser. CRYPTO ’96. London, UK: Springer-Verlag, 1996, pp. 104–113.
[3] C. Clavier, J.-S. Coron, and N. Dabbous, “Differential power analysis in the presence of hardware countermeasures,” in Proceedings of the Second International Workshop on Cryptographic Hardware and Embedded Systems, ser. CHES ’00. London, UK: Springer-Verlag, 2000, pp. 252–263.
[4] S. Yang, W. Wolf, N. Vijaykrishnan, D. N. Serpanos, and Y. Xie, “Power attack resistant cryptosystem design: A dynamic voltage and frequency switching approach,” in Proceedings of the conference on Design, Automation and Test in Europe - Volume 3, ser. DATE ’05. Washington, DC, USA: IEEE Computer Society, 2005, pp. 64–69.
[5] D. Suzuki, M. Saeki, and T. Ichikawa, “Random switching logic: A new countermeasure against DPA and second-order DPA at the logic level,” IEICE Trans. Fundam. Electron. Commun. Comput. Sci., vol. E90-A, no. 1, pp. 160–168, Jan. 2007. [Online]. Available: http://dx.doi.org/10.1093/ietfec/e90-a.1.160
[6] P. Kocher, J. Jaffe, and B. Jun, “Differential power analysis.” Springer-Verlag, 1999, pp. 388–397.
[7] S. Mangard, E. Oswald, and T. Popp, Power Analysis Attacks: Revealing the Secrets of Smart Cards (Advances in Information Security). Secaucus, NJ, USA: Springer-Verlag New York, Inc., 2007.
[8] K. Tiri, M. Akmal, and I. Verbauwhede, “A dynamic and differential CMOS logic with signal independent power consumption to withstand differential power analysis on smart cards,” 2002, pp. 403–406.
[9] K. Tiri and I. Verbauwhede, “A logic level design methodology for a secure DPA resistant ASIC or FPGA implementation,” 2004, pp. 246–251.
[10] V. Sundaresan, S. Rammohan, and R. Vemuri, “Power invariant secure IC design methodology using reduced complementary dynamic and differential logic,” in Very Large Scale Integration, 2007. VLSI - SoC 2007. IFIP International Conference on, Oct. 2007, pp. 1–6.
[11] J. Rabaey, Digital Integrated Circuits: A Design Perspective, ser. Prentice Hall electronics and VLSI series. Prentice Hall, 1996.
[12] K. Kulikowski, M. Karpovsky, and A. Taubin, “Power attacks on secure hardware based on early propagation of data,” in On-Line Testing Symposium, 2006. IOLTS 2006. 12th IEEE International, 2006, p. 6 pp.
[13] J. Domingo-Ferrer, J. Posegga, D. Schreckling, and I. W. . S. Cards), Smart card research and advanced applications: 7th IFIP WG 8.8/11.2 International Conference, CARDIS 2006, Tarragona, Spain, April 19-21, 2006: proceedings, ser. Lecture notes in computer science. Springer, 2006. [Online]. Available: http://books.google.com/books?id=RvWgaZ4HzmgC
[14] G. Paul, S. Pradhan, A. Pal, and B. Bhattacharya, “Low power BDD-based synthesis using dual rail static DCVSPG logic,” in Circuits and Systems, 2006. APCCAS 2006. IEEE Asia Pacific Conference on, Dec. 2006, pp. 1504–1507.
[15] F. Somenzi, “CUDD: CU decision diagram package.” [Online]. Available: http://bessie.coloado.edu/ fabio/CUDD
[16] R. E. Bryant, “Graph-based algorithms for Boolean function manipulation,” IEEE Trans. Comput., vol. 35, no. 8, pp. 677–691, Aug. 1986. [Online]. Available: http://dx.doi.org/10.1109/TC.1986.1676819
[17] K. Tiri and I. Verbauwhede, “Place and route for secure standard cell design,” in Smart Card Research and Advanced Applications VI, ser. IFIP International Federation for Information Processing, J.-J. Quisquater, P. Paradinas, Y. Deswarte, and A. El Kalam, Eds. Springer Boston, 2004, vol. 153, pp. 143–158.
[18] U. D. of Commerce and NIST, “Data encryption standard,” FIPS, October 1999. [Online]. Available: http://csrc.nist.gov/publications/fips/fips46-3/fips46-3.pdf
[19] R. Junee, “Power analysis attacks: A weakness in cryptographic smart cards and microprocessors,” Master’s thesis, Bachelor of Computer Engineering and Bachelor of Commerce, November 2002.
[20] K. Tanimura and N. Dutt, “ExCCel: Exploration of complementary cells for efficient DPA attack resistivity,” in Hardware-Oriented Security and Trust (HOST), 2010 IEEE International Symposium on, June 2010, pp. 52–55.