Leakage Power Reduction in Flip Flops by Using MTCMOS and ULP Switch

Huazhong Yang Department of Electronic Engineering Tsinghua University Beijing, China [email protected]

Saihua Lin Department of Electronic Engineering Tsinghua University Beijing, China [email protected]

Abstract-As the feature size of CMOS technology continues to scale down, leakage power has become an increasingly important part of the total power consumption of a chip. By analyzing the leakage paths of flip flops, we propose a method to reduce their leakage power. Experimental results show that the leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode, while the delay time stays the same and the area overhead is small.

I. INTRODUCTION

With the scaling of CMOS technology, leakage power is expected to become a significant portion of the total power consumption in future CMOS systems. Previously, most techniques, such as MTCMOS [1] and reverse body bias [2], have focused on leakage reduction in combinational logic, whereas in this paper we try to reduce the leakage power in sequential logic, such as flip flops.

Due to the tighter timing constraints and critical performance requirements of digital systems, new flip-flop families have been developed and integrated in high-performance microprocessors. Among them, IP_DCO, shown in Fig. 1, is considered the fastest [3] and has a large negative setup time. However, its high power consumption limits its application in low-power integrated circuit design. Hence, in this paper, we first analyze the leakage paths in this flip flop and then use MTCMOS and ULP switch techniques to reduce the leakage power. Experimental results show that the leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode, while the delay time stays the same and the area penalty is very small.

The rest of the paper is organized as follows. In Section II, we analyze the leakage paths of IP_DCO and propose the new flip flop M_IP_DCOI. In Section III, we provide the experimental results of these two flip flops and further propose an implicit conditional discharge flip flop, M_IP_DCOII. Finally, Section IV concludes the paper.

1-4244-0173-9/06/$20.00 ©2006 IEEE.

Figure 1. IP_DCO

II. IP_DCO ANALYSIS AND IMPROVEMENT

A. Analysis

From the BSIM MOS transistor model [4], the subthreshold leakage current can be given by:

$$I_{DS} = I_0 \exp\left(\frac{V_{GS} - V_{th0} + \gamma V_{BS} + \eta V_{DS}}{n V_T}\right)\left(1 - e^{-V_{DS}/V_T}\right) \qquad (1)$$

where $V_T = kT/q$ is the thermal voltage; $V_{GS}$, $V_{DS}$, and $V_{BS}$ are the gate-to-source, the drain-to-source, and the bulk-to-source voltages, respectively; $\gamma$ and $\eta$ are the body effect and DIBL coefficients, respectively; $n$ is the subthreshold slope coefficient; and $I_0 = \mu_0 C_{ox} (W/L_{eff}) V_T^2 e^{1.8}$. From (1), we can see that although a transistor is "off", current still flows through it, which results in leakage power.
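To make the dependence on the threshold voltage in (1) concrete, the following minimal sketch evaluates the equation numerically. The values of `i0`, `n`, `gamma`, `eta`, the 300 K temperature, and the 1.8 V drain bias are illustrative assumptions, not parameters taken from this paper; the two threshold voltages are the NMOS values quoted in Section III.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant (J/K)
Q_E = 1.602176634e-19   # elementary charge (C)


def thermal_voltage(temp_k=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(vgs, vds, vbs, vth0, i0=1.0, n=1.5,
                         gamma=0.2, eta=0.05, temp_k=300.0):
    """Evaluate Eq. (1). i0, n, gamma and eta are placeholder values."""
    vt = thermal_voltage(temp_k)
    exponent = (vgs - vth0 + gamma * vbs + eta * vds) / (n * vt)
    return i0 * math.exp(exponent) * (1.0 - math.exp(-vds / vt))


# "Off" NMOS (V_GS = 0) with an assumed 1.8 V across drain and source.
low_vth = subthreshold_current(vgs=0.0, vds=1.8, vbs=0.0, vth0=0.1646)
high_vth = subthreshold_current(vgs=0.0, vds=1.8, vbs=0.0, vth0=0.3075)
ratio = low_vth / high_vth
print(f"low-Vth / high-Vth leakage ratio: {ratio:.1f}")
```

With all other terms equal, the ratio reduces to exp(ΔVth / (n·V_T)), so under these assumptions the subthreshold leakage depends exponentially on the threshold voltage difference.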


Aside from the subthreshold leakage current in standby mode, leakage paths also exist in active mode. Fig. 2 shows the keeper in IP_DCO. When node a makes a low-to-high transition, node b makes a high-to-low transition after the delay of one inverter. Hence, until node b settles at the low level, transistor N0 is still turned on. As a result, while node a is being charged, a leakage path exists and causes extra power consumption. A similar analysis applies when node a makes a high-to-low transition.

Figure 2. Keeper Analysis in IP_DCO

Figure 3. M_IP_DCOI

B. Improvement

By analyzing the leakage paths of IP_DCO in both active mode and standby mode, we propose a new circuit to reduce the leakage power, as shown in Fig. 3. The inner keeper and the output keeper are simplified and improved for leakage reduction. For example, consider the case in which Q in Fig. 3 makes a low-to-high transition. After clk rises from low to high, node xin is discharged quickly. As a result, M1 is turned on while N7 is turned off, which reduces the leakage power in active mode.

To reduce the leakage power in standby mode, we use high-Vth transistors. For example, the low-Vth three-inverter chain is replaced with a high-Vth three-inverter chain.

Another technique used to reduce the leakage power is the ULP switch, as shown in Fig. 4. In [5], the authors proposed a ULP latch to reduce the standby power of CMOS flip flops in SOI technology. However, this technique is difficult to implement in a CMOS technology because of the effect of the bulk. Thus, in this paper, we modify this technique and propose a ULP switch technique. Similar results have been obtained in [6]. Inserting the ULP switch in the inverter reduces the leakage power of the inverter because, in standby mode, both the NMOS transistor and the PMOS transistor operate nearly in the cutoff region. Fig. 5 shows the effectiveness of the ULP switch for a minimum-sized inverter whose input is set to zero. As the power supply voltage increases, the leakage power saving grows. Furthermore, by adjusting the sizes of the ULP switch, we can change the total delay of the inverter chain very easily.

Figure 4. Symbol definition (labels: High Vth; Low Vth; With ULP Switch; Without ULP Switch)

Figure 5. Comparison of leakage powers with and without ULP switch of a minimum sized inverter. (Horizontal axis: Power Supply (V); curves: w/o ULP Switch, with ULP Switch.)

III. RESULTS

In this section, we first compare the leakage power of IP_DCO and M_IP_DCOI in standby mode and then compare the power and delay time in active mode. Finally, we propose a new implicit conditional discharge flip flop at the end of this section.

A. Standby Mode Leakage Power Comparison

The IP_DCO and M_IP_DCOI are implemented in a 0.18 μm CMOS technology and simulated using HSPICE. Two threshold voltages (Vth) are available for both NMOS and PMOS transistors. High-speed NMOS (PMOS) transistors feature Vth = 0.1646 V (-0.2253 V). Low-leakage NMOS (PMOS) transistors feature Vth = 0.3075 V (-0.4555 V). The minimum length and width for high-speed NMOS/PMOS transistors are 0.24 μm and 0.24 μm, respectively. The minimum length and width for low-leakage NMOS/PMOS transistors are 0.18 μm and 0.24 μm, respectively. Thus, although M_IP_DCOI has more transistors than IP_DCO, the total L·W products of the two circuits are still comparable (2.7264 μm² and 2.8912 μm²) when the two circuits are optimized to have the same data-to-Q (D-Q) time.

Table I shows the leakage power comparison of the two circuits in standby mode. Since the voltage of the internal node xin can be high or low when clk is high, in Table I (1) represents the case when xin is high and (0) represents the case when xin is low. The proposed method is very efficient: the leakage is reduced by an average of 72.35% compared to the original circuit.

TABLE I. LEAKAGE POWER COMPARISON IN STANDBY MODE

(Each row corresponds to one combination of D and Clk at 0 or Vdd and of the xin state, (1) or (0); the label of each row could not be recovered.)

IP_DCO (nW)    M_IP_DCOI (nW)
182.0763       83.5982
320.1770       59.2497
182.0764       83.5983
320.1770       59.2497
493.3686       71.6789
317.7729       71.6528

Minimum leakage power reduction: 54.09%
Average leakage power reduction: 72.35%

B. Active Mode Power Comparison

We first optimize the two circuits so that they have the same D-Q time and then compare their power consumption. The delay time is defined as max(D-Q(hl), D-Q(lh)), and the power-delay product (PDP) is defined as the product of delay time and power. As shown in Table II, M_IP_DCOI exhibits about 26% better performance in terms of PDP.

TABLE II. DELAY AND POWER COMPARISON AFTER OPTIMIZATION

DFF        CL (fF)   D-Q(lh) (ps)   D-Q(hl) (ps)   Power (μW)   PDP (fJ)
Original   20        106            92             15.92        1.688
Proposed   20        105            92             11.90        1.250

We further examine the power consumption when different input data patterns are applied. Pattern0 represents clk = 100 MHz and D = 20 MHz, which is the typical case, in which the internal node xin has redundant switching activity. Pattern1 represents clk = 100 MHz and D = 500 MHz. Pattern2 represents clk = 100 MHz and D = 100 MHz, which means the switching activity of the data is comparable to that of the clock. Pattern3 represents clk = 100 MHz and D = 0, which means the power is mainly due to the inverter chain. Pattern4 represents clk = 100 MHz and D = 1, which means the power is mainly due to the switching activity of the internal node xin. Table III and Fig. 6 show the power comparison of these five cases.

From these results, the new flip flop achieves an average power reduction of 21.88% compared to the original one. The minimum power reduction occurs when the input data is constant zero. In this case, the power is mainly due to the inverter chain of IP_DCO or M_IP_DCOI, which accounts for more than one third of the total power consumption. Therefore, we can shorten the inverter chain to minimize the total power consumption as long as the function is not affected.

TABLE III. POWER COMPARISON IN DIFFERENT PATTERNS

(Table columns: DFF, Pattern0 to Pattern4, all in μW. Apart from the constant-zero case, the assignment of the value pairs to patterns could not be recovered.)

Original (μW)   Proposed (μW)
16.57           12.10
5.41            5.28     (Pattern3, D = 0)
18.16           12.96
19.54           14.43
15.92           11.90

Minimum power reduction: 2.40%
Average power reduction: 21.88%

Figure 6. Power consumption comparison in different patterns (horizontal axis: Pattern0 to Pattern4)
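The summary figures quoted in this section can be cross-checked from the tabulated values. The sketch below is only an arithmetic check; it assumes that the original and proposed values in Tables I and III pair up in the order in which they appear, and that the delay used for the PDP in Table II is the larger of the two D-Q times, as defined in the text.

```python
def reductions(pairs):
    """Percentage reduction of each (original, proposed) pair."""
    return [100.0 * (1.0 - proposed / original) for original, proposed in pairs]


def pdp_fj(delay_ps, power_uw):
    """Power-delay product in fJ (1 ps * 1 uW = 1e-3 fJ)."""
    return delay_ps * power_uw * 1e-3


# Table I leakage power in nW: (IP_DCO, M_IP_DCOI), assumed pairing.
table1 = [(182.0763, 83.5982), (320.1770, 59.2497), (182.0764, 83.5983),
          (320.1770, 59.2497), (493.3686, 71.6789), (317.7729, 71.6528)]

# Table III power in uW: (original, proposed), assumed pairing.
table3 = [(16.57, 12.10), (5.41, 5.28), (18.16, 12.96),
          (19.54, 14.43), (15.92, 11.90)]

r1 = reductions(table1)
r3 = reductions(table3)
print(f"Table I:   min {min(r1):.2f}%, mean {sum(r1) / len(r1):.2f}%")
print(f"Table III: min {min(r3):.2f}%, mean {sum(r3) / len(r3):.2f}%")

# Table II: delay = max(D-Q(lh), D-Q(hl)) in ps, power in uW.
pdp_original = pdp_fj(max(106, 92), 15.92)
pdp_proposed = pdp_fj(max(105, 92), 11.90)
pdp_gain = 100.0 * (1.0 - pdp_proposed / pdp_original)
print(f"PDP: {pdp_original:.3f} fJ vs {pdp_proposed:.3f} fJ, gain {pdp_gain:.1f}%")
```

Under these pairing assumptions the computed minimum and mean reductions agree with the percentages stated in the text, and the PDP gain comes out close to the quoted 26%.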

Fig. 7 and Fig. 8 show the behavior of both IP_DCO and M_IP_DCOI when the supply voltage is changed. The delay times of the two circuits are comparable when the supply voltage is low. However, when the supply voltage is high, the proposed flip flop shows improved characteristics over the original one, as shown in Fig. 7. From Fig. 8, we can see that the proposed flip flop is more power efficient: its PDP is much smaller than that of the original one over the whole supply voltage span. The minimum and average PDP reductions of the proposed flip flop relative to the original one are 24.89% and 25.96%, respectively.

Figure 7. Delay vs. Supply Voltage (horizontal axis: Vdd (V); curves: original, proposed)

Figure 8. PDP vs. Supply voltage (horizontal axis: Vdd (V); curves: original, proposed)

We also examine the behavior of both IP_DCO and M_IP_DCOI when the load capacitance is changed. The proposed flip flop remains superior to the original one, as shown in Fig. 9.

Figure 9. PDP vs. capacitance load (horizontal axis: CL (fF); curves: original, proposed)

C. Further Discussion

Both IP_DCO and M_IP_DCOI have the problem that the internal node xin is precharged redundantly even when D is constant high. To eliminate this redundant switching, we can apply a conditional method, such as the conditional pre-charge method or the conditional discharge method [7]. However, with the pre-charge method, the voltage of node xin is kept at 0 if D is constant high; thus, if Q is to make a high-to-low transition, xin has to be charged high first before Q can drop. As a result, the high-to-low D-Q time is much larger than the low-to-high D-Q time, which makes it worse than the original IP_DCO. Hence, in this paper, we propose an implicit conditional discharge flip flop, as shown in Fig. 10.

Previously, the authors of [7] focused on the explicit conditional discharge flip flop and excluded the power of the pulse generator from the power comparison. However, it has been found that the pulse generator can consume considerable power, which makes the explicit style of flip flop no better than the implicit style [3]. Moreover, the explicit style requires designing a pulse generator first, which is not very convenient. Therefore, in this paper, we propose the implicit conditional discharge flip flop.

Figure 10. M_IP_DCOII
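The redundant activity of node xin discussed in this subsection can be illustrated with a toy behavioral model. This is a simplified abstraction and not the transistor-level circuit of Fig. 10: it assumes that xin is precharged high while clk is low, that it discharges on the rising clock edge when D is 1, and that a conditional discharge scheme additionally requires Q to still be 0. The function name and the one-bit-per-clock-cycle input format are hypothetical.

```python
def count_xin_transitions(d_bits, conditional_discharge):
    """Count transitions of xin, given one D value per clock cycle.

    Simplified model (an assumption, not the circuit of Fig. 10):
    - clk high: xin discharges if D == 1; with conditional discharge,
      it discharges only if Q is still 0.
    - clk low: xin is precharged back to 1 if it was discharged.
    """
    q = 0
    xin = 1
    transitions = 0
    for d in d_bits:
        # Evaluation phase (clk high).
        if d == 1 and (q == 0 or not conditional_discharge):
            xin = 0
            transitions += 1
        q = d  # Q captures D in this cycle
        # Precharge phase (clk low).
        if xin == 0:
            xin = 1
            transitions += 1
    return transitions


constant_high = [1] * 10
print(count_xin_transitions(constant_high, conditional_discharge=False))
print(count_xin_transitions(constant_high, conditional_discharge=True))
```

In this model, a constant-high input makes xin toggle twice in every cycle without the condition, but only during the first cycle with it, which is the switching activity the conditional discharge approach is meant to remove.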


Fig. 11 and Fig. 12 show the differences between these two styles of flip flops. When D is high, the internal switching activity of M_IP_DCOII is less than that of IP_DCO. The glitches of M_IP_DCOII are also fewer than those of IP_DCO. Due to these reductions, the power of M_IP_DCOII is only 4.296 μW in this case, whereas the power of IP_DCO is 9.515 μW, more than twice that of M_IP_DCOII.

Figure 11. Waveform of IP_DCO

Figure 12. Waveform of M_IP_DCOII

The drawback of M_IP_DCOII compared to IP_DCO is that more transistors are used and hence the chip area is increased. Another drawback is that the delay time of M_IP_DCOII is larger than that of IP_DCO because there are three transistors in the discharge path of the second stage. Besides, some glitches still exist in M_IP_DCOII. However, due to the reduction of total power, the PDP is still much smaller than that of IP_DCO. For example, the PDP of IP_DCO is 1.009 fJ while the PDP of M_IP_DCOII is 0.653 fJ, which is 35.3% less than that of IP_DCO.

IV. CONCLUSION

In this paper, we first analyze the leakage power consumption in IP_DCO, which is one of the fastest flip flops. Then, we propose a new flip flop, M_IP_DCOI, with MTCMOS and ULP switch to reduce the leakage power of the flip flop. Experimental results show that the proposed flip flop behaves better than the original one even when the supply voltage and the load capacitance are changed, while the area overhead is very small. The leakage power of the proposed flip flop is reduced by an average of 72.35% in standby mode and 21.88% in active mode. Finally, we propose a new implicit conditional discharge flip flop to reduce the redundant switching activity of the internal node xin. Considerable power can be saved when the data input is constant high.

REFERENCES

[1] S. Shigematsu et al., "A 1-V high-speed MTCMOS circuit scheme for power-down applications," in Proc. IEEE Symp. VLSI Circuits Dig. Tech. Papers, 1995, pp. 125-126.
[2] T. Kobayashi and T. Sakurai, "Self-adjusting threshold-voltage scheme (SATS) for low-voltage high-speed operation," in Proc. IEEE Custom Integrated Circuits Conf., 1994, pp. 271-274.
[3] J. Tschanz et al., "Comparative delay and energy of single edge-triggered & dual edge-triggered pulsed flip-flops for high-performance microprocessors," in Proc. ISLPED'01, Huntington Beach, CA, Aug. 2001, pp. 207-212.
[4] B. J. Sheu et al., "BSIM: Berkeley short-channel IGFET model for MOS transistors," IEEE J. Solid-State Circuits, vol. 22, pp. 558-566, Aug. 1987.
[5] David Levacq, Vincent Dessard, and Denis Flandre, "Ultra-low power flip-flops for MTCMOS circuits," in Proc. ISCAS'05, pp. 4681-4684.
[6] Narender Hanchate and Nagarajan Ranganathan, "LECTOR: A technique for leakage reduction in CMOS circuits," IEEE Trans. VLSI Syst., vol. 12, no. 2, Feb. 2004, pp. 196-205.
[7] Peiyi Zhao, Tarek K. Darwish, and Magdy A. Bayoumi, "High-performance and low-power conditional discharge flip-flop," IEEE Trans. VLSI Syst., vol. 12, no. 5, May 2004, pp. 477-484.
[8] Vladimir Stojanovic and Vojin G. Oklobdzija, "Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems," IEEE J. Solid-State Circuits, vol. 34, no. 4, Apr. 1999, pp. 536-548.