Proceedings of the 2013 International Conference on Electronics and Communication Systems

Design of a Dynamic CMOS Incrementer/Decrementer and a Parallel Cascading Architecture

B. Archanadevi, V. Anbumani, T. Malathy, P. Balasubramanian

N. E. Mastorakis
Division of Electrical Engineering and Computer Science, Military Institutions of University Education, Hellenic Naval Academy, Piraeus 18539, Greece [email protected]

Department of Electronics and Communication Engineering, S. A. Engineering College, Chennai 600 077, India [email protected]

Abstract—Dynamic CMOS based transistor level designs of an incrementer/decrementer circuit are presented in this work. The design of a new 8-bit decision module is described first. This is followed by an original cascading architecture for realizing larger incrementer/decrementer circuits. From SPICE simulations corresponding to a 0.25 µm CMOS process technology, it is inferred that an 8-bit incrementer/decrementer embedding the new decision module macro dissipates 48% less power for incrementing and 30% less power for decrementing than the one incorporating a conventional macro. Further, 16-bit and 32-bit incrementers/decrementers constructed using the proposed cascade consume, on average, 21% and 23% less power for increment and decrement operations respectively than their conventional counterparts.

Keywords—Incrementer/decrementer; Dynamic CMOS logic; Low power; Full-custom design; Digital integrated circuit

I. INTRODUCTION

Microprocessors, microcontrollers and application-specific processors [1] usually require a program counter that is incremented and/or decremented during address calculations for data access, branching and storage functions. In this context, an incrementer/decrementer circuit plays a crucial role [2] [3]. Generally, incrementer/decrementer circuits can be optimally designed in a full-custom manner at the transistor level [2] – [4] rather than through semi-custom synthesis using standard cells [5]. In this regard, references [3] and [4] deal with the design of an 8-bit incrementer/decrementer circuit based on the principle of priority encoding [4] [6] [7]. Based on a survey of the existing literature [2] – [5] and to the best of our knowledge, no circuit level design better than that of reference [3] exists. The basic circuit module can be cascaded to implement larger increment/decrement functionality at the physical level, which can subsequently be made available either on-chip or off-chip. Moreover, circuit level designs can be better optimized for area, delay and power than gate level designs. Reference [3] has also put forward a scheme to realize higher order incrementer/decrementer circuits through multi-level lookahead and folding techniques.

The rest of this paper is structured as follows. Section 2 describes the least significant one bit principle underlying the design of the decision module block. Section 3 details the new 8-bit dynamic CMOS based decision module macro design, which forms the ‘heart’ of the incrementer/decrementer circuit. In Section 4, a novel cascading architecture for building larger incrementer/decrementer circuits is presented, followed by the simulation results and the concluding remarks in Section 5.

II. INCREMENT/DECREMENT – OPERATING PRINCIPLE

The basic template for performing the increment/decrement operation is illustrated by the circuit shown in Figure 1.


Fig. 1. Priority encoding based 8-bit incrementer/decrementer module [3] [4]


The 8-bit incrementer/decrementer circuit depicted in Figure 1 contains three sub-modules, viz. the data-in selector, the decision module, and the data-out selector. I0 to I7 represent the primary inputs, D0 to D7 denote the data inputs to the 8-bit decision module, and Y0 to Y7 signify the primary (incremented/decremented) outputs. Here, I0 (and hence D0) and Y0 are the least significant bits, and assume the highest priority among the input and output bits. The data-in selector is controlled by an Inc/Dec’ signal whose value indicates the type of operation to be performed: increment if Inc/Dec’ = 1, and decrement otherwise. Clk refers to the clock signal which is used to synchronize the circuit operation.

During the falling edge of the clock, the 8-bit decision module is pre-charged as all the nMOS transistors (n1 to n8) turn on, and the decision module outputs are eventually driven to 1 (set). During the rising edge of the clock, nMOS transistors n1 to n8 turn off and pMOS transistors p1 to p8 turn on. Based on the priority, the output bit at the position of the least significant 1 in the data input, together with all less significant output bits, is retained at 1, while the more significant output bits are driven from 1 to 0 during the clock evaluation phase. Due to the high switching activity incurred by the initial ‘setting-up’ of all the outputs and the subsequent high-to-low transitions of select output bits, and due to the widespread usage of pMOS transistors, the average power dissipation of the incrementer/decrementer circuit shown in Figure 1 is likely to be very high. LA_in, LA_int, and LA_out are the respective input, internal and output lookahead signals, which are used to build larger incrementers/decrementers via cascading.

The decision module macro shown in Figure 2 implements the following output equations.

\begin{align}
Y_0 &= D_0 + D_1 + D_2 + D_3 + D_4 + D_5 + D_6 + D_7 \tag{1} \\
Y_1 &= D_0' \, (D_1 + D_2 + D_3 + D_4 + D_5 + D_6 + D_7) \tag{2} \\
Y_2 &= D_0' D_1' \, (D_2 + D_3 + D_4 + D_5 + D_6 + D_7) \tag{3} \\
Y_3 &= D_0' D_1' D_2' \, (D_3 + D_4 + D_5 + D_6 + D_7) \tag{4} \\
Y_4 &= D_0' D_1' D_2' D_3' \, (D_4 + D_5 + D_6 + D_7) \tag{5} \\
Y_5 &= D_0' D_1' D_2' D_3' D_4' \, (D_5 + D_6 + D_7) \tag{6} \\
Y_6 &= D_0' D_1' D_2' D_3' D_4' D_5' \, (D_6 + D_7) \tag{7} \\
Y_7 &= D_0' D_1' D_2' D_3' D_4' D_5' D_6' \, D_7 \tag{8}
\end{align}
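Equations (1) to (8) can be checked in software. The following is a minimal behavioural sketch, not a model of the transistor circuit: it evaluates each equation literally on a list of bits. The function names `decision_module`, `to_bits`, `from_bits` and `decision_word` are illustrative and do not come from the paper, and the lookahead input is not modelled.

```python
def decision_module(d_bits):
    """Evaluate equations (1) to (8) literally.

    d_bits: list of 8 ints (0 or 1); index 0 is D0, the least significant bit.
    Returns a list of 8 ints; index 0 is Y0.
    """
    y = []
    for i in range(8):
        # Product term D0' D1' ... D(i-1)': all lower-order inputs are 0.
        lower_all_zero = all(b == 0 for b in d_bits[:i])
        # Sum term Di + ... + D7: at least one input at or above position i is 1.
        any_upper_one = any(b == 1 for b in d_bits[i:])
        y.append(1 if (lower_all_zero and any_upper_one) else 0)
    return y


def to_bits(word):
    """Split an 8-bit integer into bits, least significant first."""
    return [(word >> i) & 1 for i in range(8)]


def from_bits(bits):
    """Pack bits (least significant first) back into an integer."""
    return sum(b << i for i, b in enumerate(bits))


def decision_word(d):
    """Decision module output for an 8-bit data input given as an integer."""
    return from_bits(decision_module(to_bits(d)))
```

For the data inputs used in the text, `decision_word(0b00110100)` evaluates to `0b00000111` and `decision_word(0b11001011)` to `0b00000001`. With an all-zero data input every sum term is 0, so all outputs are 0.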

With reference to Figure 2, during the falling edge of the clock (Clk), pMOS transistors pc1 to pc8 turn on, and outputs Y0 to Y7 are reset. During the rising clock edge, the pMOS transistors turn off, and the circuit processes the data inputs and produces the desired outputs. For example, given that LA_in is 1, on a rising clock edge, if D1 is 1 and all the other inputs are 0, nMOS transistors ev1, ev2 and ev3 turn on, setting outputs Y0 and Y1, while the remaining outputs stay reset. Since the outputs are initially reset, and only the requisite outputs are set during the evaluation phase, the average switching activity is likely to be lower for this decision module than for the one shown in Figure 1, which points to a low power design.
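The switching-activity argument can be illustrated with a rough count of output-node transitions. The sketch below is only a back-of-the-envelope model under stated assumptions, not a power estimate: it assumes uniformly distributed 8-bit data inputs, counts only the eight decision module outputs, and ignores lookahead gating, internal nodes, clock loading and transistor sizing. The names `lsob_mask` and `average_output_toggles` are illustrative.

```python
def lsob_mask(d):
    """Decision module output for an 8-bit data input d (LSOB principle).

    Returns 1s at the position of the least significant 1 of d and at all
    lower positions; returns 0 when d is 0, as equations (1) to (8) do.
    """
    if d == 0:
        return 0
    return (d ^ (d - 1)) & 0xFF


def average_output_toggles():
    """Average number of outputs that must switch per evaluation.

    Assumption: all 256 data input patterns are equally likely.
    - Outputs pre-charged to 1 (Figure 1 style): every output that ends
      at 0 has to be discharged and later re-charged.
    - Outputs reset to 0 (Figure 2 style): every output that ends at 1
      has to be set and later reset.
    """
    ones_total = sum(bin(lsob_mask(d)).count("1") for d in range(256))
    avg_ones = ones_total / 256
    return {"precharged_to_1": 8 - avg_ones, "reset_to_0": avg_ones}
```

Under these assumptions, fewer outputs switch on average when the outputs start from 0, which agrees with the qualitative claim above. The simulated power figures reported later in the paper remain the authoritative comparison.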

To explain the increment/decrement circuit operation, let us consider the example primary input 11001011. To enable the decrement function, the Inc/Dec’ signal is set to 0 in Figure 1. Hence, the output of the data-in selector is the same as its input, i.e. D7 to D0 = I7 to I0 = 11001011. According to the ‘least significant one bit’ (LSOB) principle [3], the output of the 8-bit decision module is equal to 00000001, since a 1 is present in the least significant position of the data input. As a result, only the least significant (highest priority) output bit is driven to 1, while the more significant (lower priority) output bits are driven to 0. By EXORing the data input and the decision module output in the data-out selector module, we obtain the final result 11001010, which is indeed the desired decrement value.

Assuming the same primary input, to perform the increment operation, the Inc/Dec’ signal is set to 1 in the data-in selector module. This complements the primary input, and hence D7 to D0 becomes 00110100. With this as the input supplied to the decision module, as per the LSOB principle, the decision module outputs 00000111. EXORing this with the primary input (11001011), the data-out selector produces 11001100, which is the required increment value.
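The worked example can be reproduced with a short behavioural sketch of the three sub-modules. This models logic values only, under the assumption that the block is enabled and the lookahead signals are ignored. The function name `inc_dec_8bit` and the bit trick `d ^ (d - 1)` used for the decision module are illustrative choices, not the authors' implementation.

```python
def inc_dec_8bit(i_word, increment):
    """Behavioural sketch of the 8-bit block: selector, decision module, XOR.

    i_word: primary input I7..I0 as an integer (0 to 255).
    increment: True models Inc/Dec' = 1, False models Inc/Dec' = 0.
    """
    i_word &= 0xFF
    # Data-in selector: complement the input for increment, pass it for decrement.
    d = (~i_word & 0xFF) if increment else i_word
    # Decision module (LSOB principle): 1s up to and including the lowest 1 of d.
    # With d == 0 the output equations give all 0s.
    y = 0 if d == 0 else (d ^ (d - 1)) & 0xFF
    # Data-out selector: EXOR the primary input with the decision module output.
    return i_word ^ y


if __name__ == "__main__":
    print(format(inc_dec_8bit(0b11001011, increment=False), "08b"))
    print(format(inc_dec_8bit(0b11001011, increment=True), "08b"))
```

In this sketch an all-zero data input (decrementing 00000000 or incrementing 11111111) leaves the byte unchanged, because the decision module output is then all 0s.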

III. PROPOSED 8-BIT DECISION MODULE DESIGN

A new 8-bit decision module has been designed on the basis of the LSOB principle, as shown in Figure 2, using the dynamic CMOS logic style. This decision module can replace the one shown in Figure 1 to perform binary increment or decrement operations on demand. It can be seen in Figure 2 that there is no separate lookahead output, but only a lookahead input. Nevertheless, for cascading we resort to a parallel scheme, which is discussed in Section 4.

Fig. 2. Proposed nMOS based 8-bit decision module macro

Comparing the 8-bit decision modules of Figures 1 and 2, it is evident that pMOS transistors are predominant in the conventional design, while nMOS transistors are prevalent in the proposed design. This could translate into substantial savings in power dissipation, with an aspect ratio of two governing the width of pMOS versus nMOS transistors, except for those nMOS transistors in the critical path, whose widths could be made comparable to or larger than those of the pMOS transistors to speed up signal propagation.

IV. CASCADING ARCHITECTURES

A. Existing Architecture

In reference [3], a linear cascade of increment/decrement blocks (with a basic block granularity of 8 bits) is used to realize higher order incrementer/decrementer circuits. The output lookahead of one increment/decrement module serves as the input lookahead of the next module in the cascade. The maximum delay encountered in lookahead signal propagation is governed by a logarithmic order, $O[\log_2 2\{(N/8)-1\}]$, where N signifies the overall size of the incrementer/decrementer circuit. Figure 3 portrays a 16-bit incrementer/decrementer as a sample, featuring a series connection of two 8-bit increment/decrement building blocks. The 8-bit Inc/Dec Block 1 in the cascade of Figure 3 assumes higher priority than the succeeding module (Inc/Dec Block 2).

Fig. 3. 16-bit incrementer/decrementer realized using the cascaded structure of [3], utilizing 8-bit incrementer/decrementer block(s) as shown in Fig. 1

To reiterate the significance of the primary inputs and outputs: those with the highest subscripts are referred to as ‘most significant’, while those with lower subscripts are ‘less significant’. In terms of priority assignment, however, the 8-bit increment/decrement block comprising the least significant primary inputs and outputs is accorded the ‘highest’ priority, while the successive blocks have descending priority. To explain the operation of the priority based 16-bit incrementer/decrementer of Figure 3, we refer back to Figure 1. The equations for the internal and output lookahead signals of Figure 1 are given by

\begin{align}
\mathrm{LA\_int} &= (D_3 + D_2 + D_1 + D_0 + \mathrm{LA\_in})' \tag{9} \\
\mathrm{LA\_out} &= D_7 + D_6 + D_5 + D_4 + D_3 + D_2 + D_1 + D_0 + \mathrm{LA\_in} \tag{10}
\end{align}

Considering the 8-bit increment/decrement module shown in Figure 1, and with respect to the above equations, if LA_in is 1 during the rising clock edge, pMOS transistors p1 to p8 and op1 to op8 turn on, which drives all the outputs of the decision module to 0. Consequently, outputs Y0 to Y7 reflect the values of inputs I0 to I7. If any or all of the data inputs of the decision module are equal to 1 simultaneously, then more pMOS transistors turn on to produce a similar output, at the expense of more switching activity. If LA_in is not 1 but the lower order nibble (D3 to D0) contains a 1, the internal lookahead signal LA_int becomes 0 and turns on pMOS transistors op5 to op8. This eventually drives the higher order output nibble of the decision module to 0.

Referring back to Figure 3, it can be seen that LA_in1 is connected to ‘ground’. Under this condition, if any or all of the primary inputs I7 to I0 are 1, then as per the internal circuit diagram of the 8-bit Inc/Dec Block 1 depicted in Figure 1, LA_out1 and consequently LA_in2 become 1. This implies that, irrespective of the value of the Inc/Dec’ signal, outputs Y15 to Y8 equal I15 to I8: the most significant byte is not incremented or decremented, and only the least significant byte is subject to an increment or decrement operation, depending on the value of the Inc/Dec’ control signal. Again referring back to Figure 1, during the falling edge of the global clock, the pMOS transistor marked ‘pla’ turns on and hence LA_out becomes 0. Relating this to Figure 3, if none of the inputs I7 to I0 is 1, then LA_out1 and subsequently LA_in2 are 0. In this scenario, if any or all of the more significant inputs I15 to I8 equal 1, then during the rising edge of the clock an incremented or decremented value is visible only in the output byte Y15 to Y8.

B. Proposed Architecture

If any primary input byte contains a 1, then the increment or decrement operation is to be invoked for the respective primary input group and the less significant primary input groups, while no incrementing or decrementing needs to be performed for the more significant groups of primary inputs. In other words, the 8-bit incrementer/decrementer modules corresponding to the more significant primary input groups can be disabled. Disabling can be done purely on the basis of the primary data inputs, and hence a sequential transfer of lookahead signals from one stage to another is not necessary. This observation has led us to the idea of a parallel reset of unwanted increment/decrement building blocks. The novel cascading architecture that satisfies this requirement is shown in Figure 4 for the case of a 16-bit incrementer/decrementer.


Fig. 4. 16-bit incrementer/decrementer circuit implemented using the novel cascading architecture
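Because each block is enabled from the primary inputs alone, every lookahead input can be computed independently of the others. The sketch below is a hedged behavioural reading of the paper's equation (11): the lookahead input of block K is the conjunction of the complemented ORs (that is, 8-input NORs) of every less significant input byte, and the first block's lookahead input is tied to the supply. The function name `lookahead_inputs` and the integer encoding of the primary input are assumptions made for illustration. Only the lookahead signals are modelled, not the block outputs.

```python
def lookahead_inputs(value, n_bits):
    """Return [LA_in1, ..., LA_in(N/8)] for the parallel cascade.

    value: N-bit primary input as an integer (bit 0 is I0).
    n_bits: overall size N, assumed to be a positive multiple of 8.
    """
    if n_bits <= 0 or n_bits % 8 != 0:
        raise ValueError("n_bits must be a positive multiple of 8")
    blocks = n_bits // 8
    # Split the primary input into bytes, least significant byte first.
    input_bytes = [(value >> (8 * k)) & 0xFF for k in range(blocks)]
    la = [1]  # LA_in1 is connected to the supply.
    for k in range(1, blocks):
        # AND of the NOR outputs of all less significant bytes:
        # 1 only if every lower byte is all 0s.
        la.append(1 if all(b == 0 for b in input_bytes[:k]) else 0)
    return la
```

For a 16-bit input, `lookahead_inputs(0x0100, 16)` gives `[1, 1]` (the lower byte is all 0s, so Block 2 is enabled), while `lookahead_inputs(0x0101, 16)` gives `[1, 0]` (Block 2 is disabled).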


Let us now briefly consider the operation of the 16-bit circuit shown in Figure 4. The 8-bit Inc/Dec Blocks shown there are based on the structure of Figure 1, but use the 8-bit decision module macro of Figure 2. In Figure 4, LA_in1 is connected to the ‘supply’; hence, depending on inputs I7 to I0, the corresponding outputs reflect either the inputs or the incremented or decremented value. If any of I7 to I0 is 1, the output of the NOR gate ‘nor8’ becomes 0 and disables the 8-bit Inc/Dec Block 2. In that case, outputs Y15 to Y8 simply reflect the values of primary inputs I15 to I8, and no incrementing/decrementing action is performed. On the other hand, if I7 to I0 are all 0, ‘nor8’ equals 1 and drives the lookahead input LA_in2 to 1; therefore, if any of I15 to I8 is 1, the incremented or decremented value is output on Y15 to Y8.

Comparing the 8-bit decision module of Figure 2 with the one shown in Figure 1, it is clear that the separate logic associated with lookahead signaling has been eliminated in the former. This reduces the device count and tends to facilitate a less complex and more elegant physical design. However, comparing the cascade topologies of Figures 3 and 4, it is seen that an extra 8-input NOR gate is required. In fact, for constructing larger incrementers/decrementers, both NOR gates and AND gates are required, as shown in Figure 5 for a generic N-bit incrementer/decrementer. These extra gates tend to slightly offset the reduction achieved in the number of transistors and in power dissipation. The advantage is that a parallel reset of select increment/decrement blocks is feasible, and static implementations can be used for the NOR and AND gates.

In general, (N/8) basic increment-cum-decrement blocks are required to realize an N-bit incrementer/decrementer, and the equation for the Kth stage input lookahead is given by (11), with K ranging from 2 to N/8. The symbol ‘•’ signifies logical conjunction in the equation given below.

\begin{equation}
\begin{aligned}
\mathrm{LA\_in}_K ={}& \{I_{8(K-1)-1} + \dots + I_{8(K-1)-8}\}' \bullet \{I_{8(K-2)-1} + \dots + I_{8(K-2)-8}\}' \bullet \dots \\
& \bullet \{I_{8(K-6)-1} + \dots + I_{8(K-6)-8}\}' \bullet \{I_7 + \dots + I_0\}'
\end{aligned}
\tag{11}
\end{equation}

V. RESULTS AND CONCLUSION

Firstly, 8-bit decision module circuit macros, which form the heart of the incrementer/decrementer, were constructed at the transistor level based on the existing and proposed methods. Secondly, 8-bit incrementer/decrementer basic building blocks, corresponding to both the existing and proposed designs, were realized physically on the basis of the structure shown in Figure 1. Finally, 16-bit and 32-bit incrementer/decrementer circuits were designed based on both the existing and proposed cascading architectures by using 8-bit increment/decrement modules and extra elements. All the circuits were designed using Tanner tools [8] and pertain to a 0.25 µm bulk CMOS process technology. A range of input patterns was used to verify the functionality of the circuits, and also to estimate the average power dissipation at a nominal clock frequency of 100 MHz using Tanner SPICE. The power values are given in Table 1.

TABLE I. AVERAGE POWER DISSIPATION VALUES OF DIFFERENT INCREMENTER/DECREMENTER CIRCUITS, ESTIMATED USING TSPICE

| Design style | Incrementer/decrementer size | Operating mode | Power (mW) | %age decrease |
|---|---|---|---|---|
| Existing [3] | 8-bit | Increment | 263.8 | - |
| Existing [3] | 8-bit | Decrement | 288.7 | - |
| Existing [3] | 16-bit | Increment | 1334 | - |
| Existing [3] | 16-bit | Decrement | 1888 | - |
| Existing [3] | 32-bit | Increment | 2951 | - |
| Existing [3] | 32-bit | Decrement | 4142 | - |
| Proposed | 8-bit | Increment | 136.9 | 48.1% |
| Proposed | 8-bit | Decrement | 203.5 | 29.5% |
| Proposed | 16-bit | Increment | 948 | 28.9% |
| Proposed | 16-bit | Decrement | 1258 | 33.4% |
| Proposed | 32-bit | Increment | 2567 | 13% |
| Proposed | 32-bit | Decrement | 3612 | 12.8% |

The savings in power dissipation obtained by the proposed approach over the existing method for the various incrementers/decrementers are listed in the last column of the above table. From the simulation results, it is seen that the overall power saving achieved by the proposed approach is 30% for the increment operation and around 25% for the decrement operation, compared to the existing approach. However, the margin of average power savings of the proposed approach over the existing one tends to narrow as the size of the incrementer/decrementer increases. This is due to the extra logic gates added in the proposed approach to feed the lookahead inputs of the constituent 8-bit Inc/Dec Blocks. Nonetheless, the overall difference in device count between the two approaches for the above circuits is just about 3%.

REFERENCES

[1] S. Furber, ARM System-on-Chip Architecture, 2nd edition, Pearson Education Limited, 2000.
[2] R. Hashemian, “Highly parallel increment/decrement using CMOS technology,” Proc. 33rd IEEE International Midwest Symposium on Circuits and Systems, vol. 2, 1991, pp. 866-869.
[3] C.-H. Huang, J.-S.F. Wang, Y.-C. Huang, “Design of high-performance CMOS priority encoders and incrementers/decrementers using multilevel lookahead and multilevel folding techniques,” IEEE Journal of Solid-State Circuits, vol. 37, no. 1, January 2002, pp. 63-76.
[4] C.-H. Huang, J.-S.F. Wang, Y.-C. Huang, “A high-speed CMOS incrementer/decrementer,” Proc. IEEE International Symposium on Circuits and Systems, vol. 4, 2001, pp. 88-91.
[5] S. Veeramachaneni, L. Avinash, M.K. Krishna, P.S. Reddy, M.B. Srinivas, “A novel high-speed binary and gray incrementer/decrementer for an address generation unit,” Proc. International Conference on Industrial and Information Systems, 2007, pp. 427-430.
[6] J. Mohanraj, P. Balasubramanian, K. Prasad, “Power, delay and area optimized 8-bit CMOS priority encoder for embedded applications,” Proc. 10th International Conference on Embedded Systems and Applications, 2012, pp. 111-113, Nevada, USA.
[7] S.-W. Huang, Y.-J. Chang, “A full parallel priority encoder design used in comparator,” Proc. 53rd IEEE International Midwest Symposium on Circuits and Systems, 2010, pp. 877-880.
[8] Tanner EDA. Available: http://www.tannereda.com


Fig. 5. Proposed cascade architecture for realizing N-bit incrementer/decrementer
