2016 IEEE International Symposium on Nanoelectronic and Information Systems
LECTOR Based Gated Clock Approach to Design Low Power FSM for Serial Adder

Pritam Bhattacharjee, Alak Majumder
VLSI Design Laboratory, Department of Electronics & Computer Engineering
National Institute of Technology, Arunachal Pradesh
Yupia, India – 791112

Abstract— As chip size decreases with advancing technology, power dissipation has become a major concern for circuit designers when designing an integrated circuit. The substantial sources of power dissipation are static power and dynamic power. A serial adder, one of the vital parts of any processor micro-architecture, is prone to high power dissipation. In this paper, we address both static and dynamic power by implementing the LECTOR based clock gating technique on the sequential elements of the serial adder. LECTOR helps to reduce the static power by blocking the current between the power lines, and the gated clock minimizes the dynamic power by eliminating needless switching of the system clock. The simulation is carried out using the 32nm, 45nm, 65nm and 90nm Predictive Technology Models, and the results are compared with a Double Gated flip-flop based serial adder.

Keywords— LECTOR based clock gating techniques; D Flip Flop; Serial adder; Static power.

I. INTRODUCTION

The trend towards low-power operation at increasing operating frequencies in integrated circuits has made it necessary to search for intelligent power reduction techniques. According to the 2013 report of the International Technology Roadmap for Semiconductors (ITRS), the increase in clock frequencies and chip density, along with the integration of several technologies in a limited space (tablets, phablets, GPS, etc.), has shifted the focus of the VLSI industry from speed to low power [1]. The aggressive downscaling of process technologies in recent years has enabled products such as tablet PCs and smartphones. Most of these products rely on micro-architectures whose power consumption needs to be extremely low in order to preserve portability. One of the most widely used power-hungry micro-architectures is the serial adder, whose 'SUM' output depends on the previous 'CARRY' output, which is stored serially in a bit storage cell as shown in fig. 1. The inputs A and B and the output 'SUM' use data register packs comprising flip-flops for initial storage. The partial sum produced during the addition uses the data register as well.

Fig. 1. Block level view of Serial Adder.

Though both parallel and serial adders have the same peripheral overheads, serial adders require a single full adder for multi-bit addition, whereas parallel adders require multiple full adders for the same purpose. Therefore, serial adders are generally preferred over parallel adders, as they need far fewer components to produce the required result [2]. In the context of the power consumption of serial adders, the major constituents are the static power due to leakage current through the active and inactive devices and the dynamic power due to the switching transitions of the clock triggering the sequential block. The current leak (contention current) from the power supply (VDD) to ground during logic transitions in CMOS is unavoidable in circuit design and must be controlled. This can be done by scaling down the supply voltage, which in turn reduces the operating speed as the supply voltage approaches the threshold voltage (VT) of the transistors [3]. The speed loss can be addressed by reducing VT, which however results in an exponential increase in subthreshold leakage current.

The contention current can also be restricted using a concept known as power gating [20]. Here, the power supply is connected to the circuit network via controlled transistors referred to as sleep transistors. These sleep transistors control the flow of current to the circuit network, allowing it only during the active stages. The problem is that introducing the sleep transistors degrades the logic value of the power supply voltage, so power gating is not a preferable option. The switching power, on the other hand, is governed by the clock net in the circuit network and is a major concern for dynamic power dissipation. To address it, the clock signal is shut off for the period in which no operation takes place. In this way, the switching activity factor of the design is reduced, leading to a large saving in dynamic power. This technique is known as clock gating [4]. The effective solution is therefore to introduce into the serial adder design an efficient clock gating technique that blocks the current leakage through the power lines and reduces the clock transitions when the input data to the sequential element does not change. In this paper, we have implemented the efficient LECTOR based clock gating technique proposed in [5] on the clock of the sequential block of the serial adder. After investigating the serial adder architecture using the PTM 32nm, 45nm, 65nm and 90nm technologies [16 – 19], we have observed savings in average, dynamic and static power consumption with a small penalty in average delay.

The organization of the paper is as follows. In Section II, we briefly explore the various clock gating techniques and the architecture of the serial adder. Section III deals with the working principle of the LECTOR based clock gating technique. The finite state machine (FSM) for the serial adder is presented in Section IV along with its implementation using the LECTOR based clock gating technique. The relative parametric analysis of this implementation against the same design implemented with a different clock gating style is discussed in Section V. Finally, we conclude the work in Section VI.

978-1-5090-6170-9/16 $31.00 © 2016 IEEE
DOI 10.1109/iNIS.2016.24
II. LITERATURE SURVEY
machine. The output values of a Mealy machine depend on the present state as well as the present input, whereas the output value of a Moore machine depends only on the present state. Consequently, a Mealy machine with N states and k input bits requires up to 2^k·N states when implemented as a Moore machine. This introduces architectural overhead in terms of the number of components. We have therefore chosen the Mealy machine to implement the serial adder FSM in this paper.
Clock gating techniques are broadly classified as latch-free gating, latch-based gating and flip-flop-based gating. All three categories have their pros and cons, which are extensively reported in [6]. A few more gating techniques are derived by combining the latch-free and latch-based gating styles: double-gated gating, NC2MOS gating, dynamic gating and bootstrap XOR gating, as reported in [7], [8] and [9]. Recently, in 2016, Bhattacharjee et al. [5] presented a LECTOR-based gating technique, which was reported to outperform the other existing gating techniques.

Serial adder design is the focus of this paper because serial adders are important constituents of modern processor micro-architectures and consume considerable power. Also, for designing multi-bit binary adders where operational speed is not a concern, serial adders are the best option, as they have the least overhead compared to other adders [2]. Prior work in this domain is limited. In [15], different design styles of a 1-bit serial adder were reported; the maximum and minimum average power consumed by those designs, when simulated in a 0.8μm technology with a power supply of 2.4 Volt, were 13 Watt and 4.39 Watt respectively. As low power operation is the modern demand in processor architectures, we were motivated to pursue this implementation. An effort is therefore made to employ the LECTOR-based clock gating (LT-CG) technique in designing the serial adder, as it has the potential to minimize static power dissipation.
The serial adder is a digital system that computes the sum of two arbitrary numbers using a full adder and a flip-flop. For multi-bit addition, the full adder performs bitwise addition from the least significant bit (LSB) to the most significant bit (MSB). Each bitwise addition generates a sum and a carry, and the carry is passed on and added bit by bit from LSB to MSB. The carry is therefore treated as a state variable. In general, a circuit with one or more state variables is analysed using an FSM [13, 21]. The carry can take the value binary '0' or binary '1' [12]; state 'a' is defined for binary '0' and state 'b' for binary '1'. In the Mealy model of the serial adder shown in figure 1, the output 'SUM' depends on both the state variable 'CARRY_IN' and the present input values A and B [14]. Starting from state 'a', A=0 and B=0 give a SUM of 0, while A=0 and B=1, or vice versa, give a SUM of 1; in both cases the carry is 0. For A=1 and B=1 the carry is 1, so the state changes from 'a' to 'b'. This is depicted in the state diagram shown in fig. 3.
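The Mealy behaviour described above can be illustrated with a short behavioural sketch. It is only a software model of the state transitions (states 'a' and 'b' for carry 0 and 1), not the circuit simulated in this paper; the names STATE_TABLE and serial_add are hypothetical and chosen for illustration.

```python
# Mealy state table of the serial adder:
# (present_state, A, B) -> (next_state, SUM)
# State 'a' means CARRY_IN = 0, state 'b' means CARRY_IN = 1.
STATE_TABLE = {
    ("a", 0, 0): ("a", 0),
    ("a", 0, 1): ("a", 1),
    ("a", 1, 0): ("a", 1),
    ("a", 1, 1): ("b", 0),
    ("b", 0, 0): ("a", 1),
    ("b", 0, 1): ("b", 0),
    ("b", 1, 0): ("b", 0),
    ("b", 1, 1): ("b", 1),
}


def serial_add(x, y, width):
    """Add two unsigned integers bit by bit, LSB first, using the FSM."""
    state = "a"  # no carry before the first bit
    result = 0
    for i in range(width):
        a = (x >> i) & 1
        b = (y >> i) & 1
        state, sum_bit = STATE_TABLE[(state, a, b)]
        result |= sum_bit << i
    carry_out = 1 if state == "b" else 0
    return result, carry_out


if __name__ == "__main__":
    total, carry = serial_add(5, 3, width=4)
    print(total, carry)
```

One clock cycle of the hardware corresponds to one loop iteration here; the final state gives the carry out of the most significant bit.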
III. LECTOR-BASED CLOCK GATING TECHNIQUE

As discussed in Section I, the major contributors to power consumption are static and dynamic power dissipation. Although dynamic power dissipation can be curtailed using clock gating, the current flow between the power lines during logic transitions, which leads to static power dissipation, remains a concern [3]. In 2004, a new logic design style was reported that is able to block the contention current flowing in the power lines. This design style is called Leakage Control TransisTOR (LECTOR) [10]. It has been incorporated into a combination of the latch-free and latch-based clock gating styles, recently reported as the LECTOR-based clock gating technique [5].
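The gating principle used by this technique, in which the clock reaches the storage element only when the data input differs from the present output, can be sketched behaviourally as follows. This is a simplified cycle-level model under stated assumptions: it uses an XOR comparator, ignores timing, and does not model the LECTOR transistors or leakage at all. The function name simulate_gated_dff is hypothetical.

```python
def simulate_gated_dff(data_stream, q_init=0):
    """Cycle-level model of a data-driven clock-gated D flip-flop.

    Returns the output sequence and the number of clock pulses that
    actually reach the flip-flop.
    """
    q = q_init
    outputs = []
    clock_pulses = 0
    for d in data_stream:
        enable = d ^ q  # XOR comparator: 1 only when input differs from output
        if enable:
            q = d  # the gated clock edge reaches the flip-flop
            clock_pulses += 1
        outputs.append(q)
    return outputs, clock_pulses


if __name__ == "__main__":
    data = [0, 0, 1, 1, 1, 0, 1, 1]
    out, pulses = simulate_gated_dff(data)
    print(out, f"{pulses} of {len(data)} clock cycles passed")
```

The output still follows the data input, but the flip-flop is clocked only in the cycles where the data changes, which is the source of the reduction in switching activity.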
Fig. 3. State Diagram of Mealy FSM for Serial Adder.

TABLE I: State table for Serial Adder

Present State (P) | Next State (N) for AB = | Output SUM for AB =
                  | 00   01   10   11       | 00   01   10   11
a                 | a    a    a    b        | 0    1    1    0
b                 | a    b    b    b        | 1    0    0    1

TABLE II: State Assignment table for Serial Adder

Present State (P) | Next State (N) for AB = | Output SUM for AB =
                  | 00   01   10   11       | 00   01   10   11
0                 | 0    0    0    1        | 0    1    1    0
1                 | 0    1    1    1        | 1    0    0    1

Fig. 2. Block level view of LECTOR-based clock gated latch [5].
The block level architecture of this gating technique is shown in fig. 2. The clock triggers only when the data input differs from the present output data. Whether the input differs from the output is determined using an XOR or XNOR gate, depending on the application.

IV. FINITE STATE MACHINE (FSM) FOR SERIAL ADDER

The FSM, which is a form of abstraction, is used to model a digital system's behaviour by indicating every available state of the system and the corresponding transitions between states. FSMs are generally categorized as Mealy machine and Moore
Fig. 4. K-Map representation of Next State and SUM.
Conversely, the machine remains in state 'b', where CARRY_IN is 1, for A=1, B=1 and for A=0, B=1 or vice versa. For A=0 and B=0, the machine changes state from 'b' back to 'a', as the carry generated is '0'. The state table and the state assignment table for the serial adder are given in Table I and Table II respectively. The Karnaugh map simplification for the next state and the output SUM is shown in fig. 4.
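The simplified expressions themselves appear only in the figure and are not reproduced in the text. As a hedged check, the sketch below assumes the standard full-adder forms, N = AB + AP + BP for the next state and SUM = A XOR B XOR P for the output, and compares them against the state assignment table. The expressions and the names TABLE_II, next_state and sum_out are assumptions made for illustration.

```python
from itertools import product

# State assignment table: (P, A, B) -> (N, SUM)
TABLE_II = {
    (0, 0, 0): (0, 0),
    (0, 0, 1): (0, 1),
    (0, 1, 0): (0, 1),
    (0, 1, 1): (1, 0),
    (1, 0, 0): (0, 1),
    (1, 0, 1): (1, 0),
    (1, 1, 0): (1, 0),
    (1, 1, 1): (1, 1),
}


def next_state(p, a, b):
    # Assumed simplified form: majority of A, B and the present state P
    return (a & b) | (a & p) | (b & p)


def sum_out(p, a, b):
    # Assumed simplified form: three-input XOR
    return a ^ b ^ p


# Collect every input combination where the expressions disagree with the table
mismatches = [
    key
    for key in product((0, 1), repeat=3)
    if (next_state(*key), sum_out(*key)) != TABLE_II[key]
]

if __name__ == "__main__":
    print("mismatching rows:", mismatches)
```

An empty mismatch list means the assumed expressions reproduce every row of the state assignment table.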
The novelty of this work lies in implementing LECTOR-based clock gating to design the flip-flop present in the architecture of the serial adder. This flip-flop is built using the master-slave latch configuration shown in fig. 5a. The architecture in fig. 5a is simulated in 90nm PTM technology [19] with a power supply of 1.1V and a global clock frequency of 1.25GHz. The transient response of the architecture is shown in fig. 5b. It is evident that the serial output CARRY_IN is generated on the triggering of the gated clock, after which the output SUM is correctly obtained. The average delay from input to SUM output is 38.50399ps, with an average power dissipation of 95.51μW.
Fig. 5. Serial Adder: (a) block level architecture using LT-CG based master-slave latch configuration; (b) transient response in 90nm PTM.
V. RESULT & ANALYSIS
In this section, we analyze the simulation results of the architecture in 32nm, 45nm, 65nm and 90nm PTM technology. The serial adder is also tested after designing the sequential element using the Double-Gated clock gating technique. The discussion is divided into timing analysis and power analysis.

A. Timing Analysis

Here, we have used the test model reported in [5] to analyze the LECTOR-based clock gated serial adder architecture. The propagation delay of the architecture is observed to be nominal, but slightly greater than that of the serial adder implemented using Double-Gated clock gating (DG-CG). However, the latency, setup time and hold time of the LECTOR-based architecture are found to be better than those of the DG-CG based serial adder design, as shown in Table III. Latency is computed as the sum of the setup time and the average delay.

B. Power Analysis

The average power dissipated by the LT-CG serial adder is 88.84μW, 84.39μW, 81.91μW and 95.51μW for 32nm, 45nm, 65nm and 90nm PTM technology respectively, at an operating frequency of 1.25GHz. This is considerably less than the average power consumed by the serial adder implemented using DG-CG, as shown in Table IV. It is observed that the static power consumption is reduced with the introduction of LECTOR, as shown in fig. 6. Table IV also lists the percentage improvement of each power figure.
TABLE III: Timing Analysis of Gated Serial Adders

Parameters                | PTM (nm) | Rise Delay (ps) | Fall Delay (ps) | Average Delay (ps) | Latency (ps) | Setup Time (ps) | Hold Time (ps)
DG-CG Serial Adder        | 90       | 27.81           | 48.8            | 38.31              | 103.70       | 65.39           | 55.45
LT-CG Serial Adder        | 90       | 28.001          | 49.07           | 38.50              | 97.90        | 59.39           | 39.31
Percentage of Improvement |          | -0.68           | -0.42           | -0.51              | 5.59         | 9.17            | 29.11
DG-CG Serial Adder        | 65       | 21.75           | 38.93           | 30.34              | 87.27        | 56.92           | 49.83
LT-CG Serial Adder        | 65       | 21.80           | 39.62           | 30.71              | 76.67        | 45.96           | 31.53
Percentage of Improvement |          | -0.19           | -1.77           | -1.21              | 12.15        | 19.26           | 36.71
DG-CG Serial Adder        | 45       | 16.84           | 34.86           | 25.85              | 71.24        | 45.39           | 33.93
LT-CG Serial Adder        | 45       | 16.58           | 34.31           | 25.45              | 59.38        | 39.59           | 22.58
Percentage of Improvement |          | 1.54            | 1.57            | 1.56               | 16.66        | 25.25           | 42.96
DG-CG Serial Adder        | 32       | 12.09           | 27.85           | 19.97              | 50.74        | 30.77           | 26.21
LT-CG Serial Adder        | 32       | 11.91           | 28.07           | 19.99              | 49.29        | 29.30           | 18.89
Percentage of Improvement |          | 1.50            | -0.80           | -0.10              | 2.85         | 4.78            | 27.93
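The latency relation used in the timing analysis (latency as the sum of setup time and average delay) and the percentage-of-improvement figures can be checked with a few lines of arithmetic. The sketch below uses a subset of the rows of Table III; the improvement formula, (baseline - proposed) / baseline, is an assumption that agrees with the tabulated values to within rounding. The names TIMING_PS, latency and percent_improvement are hypothetical.

```python
# (average delay, setup time, reported latency) in picoseconds,
# taken from a subset of the rows of Table III
TIMING_PS = {
    ("DG-CG", 90): (38.31, 65.39, 103.70),
    ("LT-CG", 90): (38.50, 59.39, 97.90),
    ("DG-CG", 65): (30.34, 56.92, 87.27),
    ("LT-CG", 65): (30.71, 45.96, 76.67),
    ("DG-CG", 32): (19.97, 30.77, 50.74),
    ("LT-CG", 32): (19.99, 29.30, 49.29),
}


def latency(average_delay, setup_time):
    """Latency as defined in the timing analysis."""
    return average_delay + setup_time


def percent_improvement(baseline, proposed):
    """Positive when the proposed design has the smaller value."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    for (design, node), (avg, setup, reported) in TIMING_PS.items():
        print(f"{design} {node}nm: computed {latency(avg, setup):.2f} ps, "
              f"reported {reported:.2f} ps")
    gain = percent_improvement(TIMING_PS[("DG-CG", 90)][2],
                               TIMING_PS[("LT-CG", 90)][2])
    print(f"90nm latency improvement: {gain:.2f}%")
```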
TABLE IV: Power Analysis of Gated Serial Adders

Parameters                | PTM (nm) | Average Power (μW) | Dynamic Power (μW) | Static Power (μW) | Power Delay Product (fJ)
DG-CG Serial Adder        | 90       | 134.21             | 64.35              | 32.62             | 51.41
LT-CG Serial Adder        | 90       | 95.51              | 48.04              | 22.09             | 36.77
Percentage of Improvement |          | 28.83              | 25.35              | 34.28             | 28.47
DG-CG Serial Adder        | 65       | 112.98             | 50.75              | 45.54             | 34.27
LT-CG Serial Adder        | 65       | 81.91              | 41.45              | 28.92             | 25.15
Percentage of Improvement |          | 27.50              | 18.31              | 36.51             | 26.61
DG-CG Serial Adder        | 45       | 102.50             | 44.13              | 49.13             | 26.49
LT-CG Serial Adder        | 45       | 84.39              | 38.46              | 35.38             | 21.47
Percentage of Improvement |          | 17.67              | 12.85              | 39.49             | 18.95
DG-CG Serial Adder        | 32       | 135.79             | 72.87              | 58.48             | 27.11
LT-CG Serial Adder        | 32       | 88.34              | 37.09              | 44.02             | 17.65
Percentage of Improvement |          | 34.58              | 49.09              | 10.39             | 34.89
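The average-power savings can be reproduced from the DG-CG values in Table IV and the LT-CG values quoted in the power analysis text. The sketch below assumes the improvement is computed as (baseline - proposed) / baseline; the names AVERAGE_POWER_UW and percent_improvement are hypothetical. For 32nm it uses the 88.84μW figure quoted in the text, which is the value that yields the reported 34.58%.

```python
# Average power in microwatts at 1.25 GHz: node -> (DG-CG, LT-CG).
# DG-CG values come from Table IV, LT-CG values from the power analysis text.
AVERAGE_POWER_UW = {
    32: (135.79, 88.84),
    45: (102.50, 84.39),
    65: (112.98, 81.91),
    90: (134.21, 95.51),
}


def percent_improvement(baseline, proposed):
    """Positive when the proposed design consumes less than the baseline."""
    return 100.0 * (baseline - proposed) / baseline


# Saving of LT-CG over DG-CG at each technology node, in percent
average_power_saving = {
    node: percent_improvement(dg_cg, lt_cg)
    for node, (dg_cg, lt_cg) in AVERAGE_POWER_UW.items()
}

if __name__ == "__main__":
    for node in sorted(average_power_saving):
        print(f"{node}nm: {average_power_saving[node]:.2f}% lower average power")
```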
Fig. 6. Plot shows reduction of Static Power after incorporating LECTOR.

VI. CONCLUSIONS

This paper investigates the serial adder architecture, an important module of most modern processor units, with the LT-CG and DG-CG techniques employed in its sequential block. The simulation and analysis are carried out at four process nodes, namely 32nm, 45nm, 65nm and 90nm PTM. The LT-CG based serial adder shows timing savings over the DG-CG based serial adder, with a minor penalty in average delay. Savings of about 34.58%, 17.67%, 27.50% and 28.83% in average power are observed for the LT-CG based serial adder relative to the DG-CG based design in the 32nm, 45nm, 65nm and 90nm process technologies respectively. It is also observed that the static power is reduced by 10.39%, 39.49%, 36.51% and 34.28% in 32nm, 45nm, 65nm and 90nm respectively due to the inclusion of LECTOR.

Acknowledgment

We are thankful to the Department of Electronics & Information Technology, MCIT, Govt. of India, for providing the financial grant under the SMDP-C2SD project and the Visvesaraya PhD Scheme.

References

[1] L. Wilson, "International technology roadmap for semiconductors (ITRS)," Semiconductor Industry Association, 2013.
[2] N.H.E. Weste, K. Eshraghian, "Principles of CMOS VLSI design," Vol. 188, New York: Addison-Wesley, 1985.
[3] K. Roy, S.C. Prasad, "Low power CMOS VLSI circuit design," Wiley, 2013.
[4] T. Xanthopoulos, "Clocking in Modern VLSI Systems," Springer, 2009.
[5] P. Bhattacharjee, A. Majumder, T.D. Das, "A 90 nm Leakage Control Transistor Based Clock Gating for Low Power Flip Flop Applications," IEEE 59th International Midwest Symposium on Circuits and Systems (MWSCAS), pp. 381-384, IEEE, 2016.
[6] J. Shinde, S.S. Salankar, "Clock gating - A power optimizing technique for VLSI circuits," India Conference (INDICON), 2011 Annual IEEE, IEEE, 2011.
[7] A.G.M. Strollo, E. Napoli, D.D. Caro, "New clock-gating techniques for low-power flip-flops," Proceedings of the 2000 International Symposium on Low Power Electronics and Design, ACM, 2000.
[8] A.G.M. Strollo, E. Napoli, D.D. Caro, "Low-power flip-flops with reliable clock gating," Microelectronics Journal 32.1 (2001): 21-28.
[9] M.O. Shaker, M.A. Bayoumi, "A clock gated flip-flop for low power applications in 90 nm CMOS," 2011 IEEE International Symposium of Circuits and Systems (ISCAS), IEEE, 2011.
[10] N. Hanchate, N. Ranganathan, "LECTOR: a technique for leakage reduction in CMOS circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 12(2), 196-205, 2004.
[11] H. Lee, G.E. Sobelman, "FPGA-based digit-serial CSD FIR filter for image signal format conversion," Microelectronics Journal 33.5 (2002): 501-508.
[12] L. Benini, G. De Micheli, "Transformation and synthesis of FSMs for low-power gated-clock implementation," Proceedings of the 1995 International Symposium on Low Power Design, ACM, 1995.
[13] P.R. Cappello, K. Steiglitz, "Digital signal processing applications of systolic algorithms," VLSI Systems and Computations, Springer Berlin Heidelberg, 1981, 245-254.
[14] N. Weste, D. Harris, A. Banerjee, "CMOS VLSI design," A circuits and systems perspective 11 (2005): 739.
[15] T. Njolstad, E.J. Aas, "Power consumption and performance of low-voltage bit-serial adders," 1996 IEEE International Symposium on Circuits and Systems (ISCAS'96), Connecting the World, Vol. 4, IEEE, 1996.
[16] http://ptm.asu.edu/modelcard/2006/32nm_bulk.pm
[17] http://ptm.asu.edu/modelcard/2006/45nm_bulk.pm
[18] http://ptm.asu.edu/modelcard/2006/60nm_bulk.pm
[19] http://ptm.asu.edu/modelcard/2006/90nm_bulk.pm
[20] Z. Hu, et al., "Microarchitectural techniques for power gating of execution units," Proceedings of the 2004 International Symposium on Low Power Electronics and Design, ACM, 2004.
[21] P. Bhattacharjee, A. Sadhu, K. Das, "A register-transfer-level description of synthesizable binary multiplier and binary divider," 2016 International Conference on Microelectronics, Computing and Communications (MicroCom), pp. 1-6, IEEE, Jan. 2016.