An Alternative Logic Approach to Implement High-Speed Low-Power Full Adder Cells

Mariano Aguirre, Monico Linares

Department of Electronics, INAOE-Mexico
P.O. Box 51 and 216, 72000, Puebla, Mexico
Phone/Fax: +52 (222) 247 05 17

[email protected]

[email protected]

ABSTRACT

This paper presents a high-speed low-power 1-bit full adder cell designed upon an alternative logic structure to derive the SUM and CARRY outputs. Hspice and Nanosim simulations show that this full adder cell, designed using a 0.35 µm CMOS technology and supplied with 3.3 V, exhibits a delay of around 720 ps and a power dissipation of around 840 µW. These features reflect an overall improvement of 30% in the power-delay metric when compared with other realizations recently published as well-suited cells for low-power applications.

Categories and Subject Descriptors

B.2.4 [Hardware]: Arithmetic and Logic Structures – High-speed arithmetic.

General Terms

Performance, Design.

Keywords

Full Adder, High-Speed, Low-Power.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SBCCI'05, September 4–7, 2005, Florianópolis, Brazil. Copyright 2005 ACM 1-59593-174-0/05/0009...$5.00.

1. INTRODUCTION

Addition is a fundamental arithmetic operation that is widely used in many VLSI systems, such as application-specific DSP architectures and microprocessors. Besides addition itself, the adder module is the core of many other operations, such as subtraction, multiplication, division and address generation. In the majority of these systems, the adder is part of the critical path that determines the overall performance of the system. That is why enhancing the performance of the full adder cell is of great interest [1]. On the other hand, the ever-increasing demand for mobile products, which must deliver high throughput from a limited power source, makes the design of low-power adder cells another significant goal. There are three major components of power dissipation in complementary metal oxide semiconductor (CMOS) circuits [2]: switching power, short-circuit power and static power. Reducing any of these components lowers the power consumption of the whole system.

This paper describes the design of a full adder cell using an alternative logic structure that is based on the multiplexing of the Boolean functions XOR/XNOR and AND/OR to obtain the SUM and CARRY outputs, respectively. The resultant full adder proves to be more efficient in terms of power consumption and delay than other cells recently reported as good candidates to build low-power arithmetic modules.

The rest of this paper is organized as follows: Section 2 reviews the published work on the design of 1-bit full adders, and the logic structure adopted as the standard to implement those cells. Section 3 presents the alternative logic scheme used to build the full adder proposed in this paper. Section 4 explains the features of the simulation environment used to obtain the power-delay performance of the full adders being compared. Section 5 analyzes the simulation results from the comparison carried out, and explains the weaknesses and advantages of the previous and new full adder cells. Finally, Section 6 concludes this work, pointing out the features of the proposed circuit.

2. PREVIOUS WORK

Several papers have been published regarding the design of low-power full adders, exploring both the logic style and the logic structure used to build the adder module. Since the standard CMOS realization [3], several full adders built upon different static logic styles have been presented, namely: Differential Cascode Voltage Switch (DCVS) [4], Complementary Pass-Transistor Logic (CPL) [5], Double Pass-Transistor Logic (DPL) [6], and Swing Restored CPL (SR-CPL) [7]. On the basis of these logic styles, some work has been done to build new full adders by changing the internal logic structure of the module.

In earlier work [8], transmission function theory was used to build a full adder formed by three main logic blocks: an XOR-XNOR gate to obtain the A ⊕ B signal and its complement (Block 1), and XOR blocks or multiplexers to obtain the SUM (So) and CARRY (Co) outputs (Blocks 2 and 3), as shown in Figure 1.

Figure 1. A full-adder module formed by three logic blocks.


This logic structure is based on the full adder’s truth table, shown in Table 1, and it has been adopted as the standard internal configuration in most of the enhancements developed for the 1-bit full adder cell.

Table 1. Truth table for a 1-bit full adder: A, B and C are inputs, and So and Co are outputs.

C  B  A  So  Co
0  0  0  0   0
0  0  1  1   0
0  1  0  1   0
0  1  1  0   1
1  0  0  1   0
1  0  1  0   1
1  1  0  0   1
1  1  1  1   1

Since the proposal presented in [8], several papers have introduced new full adder cells by trying different realizations for the three logic blocks shown in Figure 1. Chronologically, some of them are: 14TA [9], 14TB [10], wu_ng [11], 16T [12], 10TA [13], 10TB [14], full_rest [15], mux_based [16], and wey_chow [17].

After a deep comparative study presented in [18], the most efficient realization for Block 1 was identified: the one implemented with the SR-CPL logic style. Another important conclusion was also pointed out there: the major problem regarding the propagation delay of a full adder built upon the logic structure shown in Figure 1 is that it is necessary to obtain the A ⊕ B intermediate signal and its complement, which are then used to drive other blocks in order to generate the final outputs. Thus, the overall propagation delay and, in most cases, the power consumption of the full adder depend on the delay and voltage swing of the A ⊕ B signal and its complement, generated within the cell.

Therefore, to increase the operational speed of the full adder, it is necessary to look for a new logic structure that avoids the generation of intermediate signals used to control the selection or transmission of other signals located on the critical path.

3. ALTERNATIVE LOGIC STRUCTURE

Examining the full adder’s truth table in Table 1, it can be seen that the So output is equal to A ⊕ B when C=0, and to its complement (A XNOR B) when C=1. Thus, a multiplexer can be used to obtain the respective value based upon the C input value, as stated before. Following the same criterion, the Co output is equal to (A AND B) when C=0, and to (A OR B) when C=1. Again, C can be used to select the respective value for the required condition, driving a multiplexer. Hence, an alternative logic scheme to design a full adder cell can be formed by a logic block to get the (A ⊕ B) signal and its complement, another block to obtain the (A • B) and (A + B) signals, and two multiplexers driven by the C input to generate the So and Co outputs, as shown in Figure 2. The features and advantages that can be expected from this alternative logic structure are:

- There are no internally generated signals controlling the selection of the output multiplexers. Instead, the C input signal, which has full voltage swing and no extra delay, is used to drive the multiplexers, thus reducing the overall propagation delay.

- The capacitive load for the C input is reduced, as it is connected only to some transistor gates and no longer to drain or source terminals, where the diffusion capacitance is becoming very large for sub-micrometer technologies. Therefore, the overall delay of larger modules where the C signal falls on the critical path can be reduced.

- The propagation delay for the So and Co outputs can be tuned individually by adjusting the XOR/XNOR and the AND/OR gates, which is advantageous for applications where the skew between arriving signals is critical for proper operation (e.g. wave-pipelining).

- Buffers can be placed at the full adder outputs by interchanging the A ⊕ B signal and its complement, and by replacing the AND/OR gates with NAND/NOR gates, at the inputs of the multiplexers, thus improving the performance for load-sensitive applications.

Figure 2. Alternative logic scheme for designing full adder modules.

Thus, based on the alternative logic structure introduced here, and the results presented in [18], two new full adders were derived, as presented in Figures 3 and 4.

Figure 3. Full adder designed with the proposed logic structure and DPL logic style.
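The alternative logic structure of Section 3 can be checked at the Boolean level with a short behavioural sketch. It models only the logic (XOR/XNOR and AND/OR values selected by C through two multiplexers), not the DPL or SR-CPL transistor circuits; the function names `mux` and `full_adder_mux` are illustrative and do not come from the paper.

```python
from itertools import product


def mux(select, when0, when1):
    """2-to-1 multiplexer: pass when0 if select is 0, otherwise when1."""
    return when1 if select else when0


def full_adder_mux(a, b, c):
    """Full adder in which the C input drives both output multiplexers."""
    xor_ab = a ^ b
    xnor_ab = 1 - xor_ab
    and_ab = a & b
    or_ab = a | b
    so = mux(c, xor_ab, xnor_ab)  # So = A xor B when C=0, A xnor B when C=1
    co = mux(c, and_ab, or_ab)    # Co = A and B when C=0, A or B when C=1
    return so, co


def matches_truth_table():
    """Compare the multiplexed outputs with the arithmetic sum A + B + C."""
    for a, b, c in product((0, 1), repeat=3):
        total = a + b + c
        if full_adder_mux(a, b, c) != (total & 1, total >> 1):
            return False
    return True


print(matches_truth_table())  # True
```

Because C only selects between values that are already computed from A and B, no internally generated signal sits between the C input and the outputs, which is the property the structure relies on.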


Figure 4. Full adder designed with the proposed logic structure and SR-CPL logic style.

Figure 3 presents a full adder designed using the DPL logic style to build the XOR/XNOR gates and a pass-transistor based multiplexer to obtain the So output. In Figure 4, the SR-CPL logic style was used instead to build these gates. In both cases, the AND/OR gates have been built using a powerless and groundless pass-transistor configuration, respectively, and a pass-transistor based multiplexer is also used to get the Co output.

4. SIMULATION SETUP

The test bed used to simulate the full adders being compared is shown in Figure 5. This simulation environment has been commonly used to compare the performance of the full adders analyzed in [19, 20, 21].

Figure 5. Test bed used to simulate the full adder cells under analysis.

In our study, the advantage of using this test bed is that the following power components are taken into account, besides the dynamic one:

- The short-circuit dissipation of the inverters connected at the full adder inputs. This power consumption varies according to the capacitive load that the adder module presents at the inputs. Moreover, when the module has no direct power supply connections (as is the case for pass-transistor logic styles), the energy required to charge and discharge the full adder internal nodes comes through these inverters connected at the full adder inputs.

- The short-circuit consumption of the full adder itself, as it receives signals with finite slopes coming from the buffers connected at the inputs, instead of ideal ones coming from voltage sources.

- The short-circuit and static dissipation of the inverters connected to the outputs of the full adder, due to the finite slopes and degraded voltage swing of the full adder output signals.

The importance of including the effects and power consumption of the buffers connected at the inputs and outputs of the full adder cell comes from the fact that the module is always going to be used in combination with other modules to build a larger system, and these static inverters are a good generalization of any other module to be considered.

5. SIMULATION RESULTS

Several full adders were compared in terms of power consumption and delay. They were named: cmos_26 [3], cmos_28 [3], cpl [5], sr_cpl [7], dcvs [4], bay_10a [13], bay_10b [14], bay_14a [9], bay_14b [10], bay_16 [12], full_rest [15], mux_based [16], tran_funct [8], wey_chow [17], wu_ng [11], our first proposal (Figure 3) using XOR/XNOR gates designed with the DPL logic style (Ours_1), and a second proposal (Figure 4) using XOR/XNOR gates designed with the SR-CPL logic style (Ours_2).

These full adders were designed using an AMS 0.35 µm CMOS technology, and were simulated using the BSIM3v3 model (level 49) with a 3.3 V supply. The sizing methodology for the circuits incorporated the following steps, as suggested in [21]:

a) Set all the N transistors to the minimum size. If there were n transistors connected in series, then the size of each transistor within the chain was modified to n times the original size.

b) Set all the P transistors to double the minimum size (to compensate for the mobility difference between N and P transistors). If there were p transistors connected in series, then the size of each transistor within the chain was modified to p times the original size.

c) Simulate the circuit with an input pattern that covers all input combinations, as used in [20, 21]. The highest input signal frequency was 250 MHz.

d) Identify the transition with the highest propagation delay, and resize the transistors involved in this critical path.

e) Repeat steps c) and d) until no further improvement in the propagation delay is attained.

Simulations were carried out using Nanosim to determine the power consumption of the designed full adders, and Hspice to measure the propagation delay of the output signals (the longest time required for one output signal to reach 50% of its voltage swing from the moment when one of the input signals reached 50% of its voltage swing). Table 2 shows the simulation results of the full adder performance comparison regarding power dissipation and delay; each column is explained after the table.
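The sizing steps a) to e) of Section 5 can be sketched as follows. This is a minimal illustration under stated assumptions: `w_min` is a free parameter (the paper does not state the minimum width), and `measure_delay` and `resize_critical_path` are hypothetical callables standing in for the circuit simulation of step c) and the manual resizing of step d). The toy delay model in the usage lines is invented purely to exercise the loop.

```python
def initial_width(kind, series_count, w_min):
    """Starting width of one transistor, following steps a) and b).

    N devices start at w_min, P devices at 2 * w_min, and a device in a
    chain of `series_count` series transistors is scaled by that count.
    """
    if kind not in ("N", "P"):
        raise ValueError("kind must be 'N' or 'P'")
    if series_count < 1:
        raise ValueError("series_count must be at least 1")
    base = w_min if kind == "N" else 2.0 * w_min
    return base * series_count


def resize_until_no_gain(widths, measure_delay, resize_critical_path, max_rounds=50):
    """Steps c) to e): keep resizing while the worst-case delay improves."""
    best_widths = dict(widths)
    best_delay = measure_delay(best_widths)
    for _ in range(max_rounds):
        candidate = resize_critical_path(best_widths)
        candidate_delay = measure_delay(candidate)
        if candidate_delay >= best_delay:
            break  # no further improvement, stop as in step e)
        best_widths, best_delay = candidate, candidate_delay
    return best_widths, best_delay


# Usage with normalized widths (w_min = 1.0) and a toy delay model.
print(initial_width("N", 3, 1.0))  # 3.0: three N devices in series
print(initial_width("P", 2, 1.0))  # 4.0: two P devices in series


def toy_delay(widths):
    # Invented model: wider device drives faster but loads the input more.
    return 4.0 / widths["m1"] + widths["m1"]


def toy_resize(widths):
    return {**widths, "m1": widths["m1"] + 1.0}


print(resize_until_no_gain({"m1": 1.0}, toy_delay, toy_resize))
```

In a real flow the two callables would wrap simulator runs, so the stopping rule is the only part of this sketch that maps directly onto the published procedure.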


Table 2. Simulation results of the full adders compared (power in µW, delay in ns, width in µm, and Vdd in V).

scheme      row  avg power  pwr supply  dynamic  short-circuit  % wasted/total
cmos_26     top     1286.4      1286.3    875.2          406.9            31.9
            add      882.2       811.5    683.8          363.0            34.9
cmos_28     top     1737.0      1736.8   1420.7          315.8            18.2
            add     1025.0      1025.0    746.8          277.9            27.1
cpl         top     2976.1      2975.9    984.4           37.6            66.9
            add     2577.0      2504.4    702.6         1910.0            73.4
cpl_sr      top     2264.2      2264.1   1097.9         1165.9            51.5
            add     1848.8      1804.8    810.2         1127.6            58.1
cpl_uye     top     2179.9      2179.7   1190.3          989.3            45.3
            add     1750.0      1688.6    982.7          963.3            49.5
dcvs        top     2965.5      2965.4   1579.7         1385.3            46.7
            add     2515.6      2515.6   1179.1         1336.2            53.1
bay_10a     top     1576.4      1576.4    632.0          889.4            60.0
            add     1261.1      1237.5    436.9          829.6            67.0
bay_10b     top     1565.5      1565.5    960.6          602.6            38.6
            add     1119.0       953.7    773.2          561.0            42.1
bay_14a     top     1221.0      1221.0    848.8          371.6            30.5
            add      799.6       684.8    658.0          331.0            33.5
bay_14b     top     1290.6      1290.6    975.5          314.2            24.4
            add      829.3       663.3    785.7          275.2            26.0
bay_16      top     1343.8      1343.8    919.4          423.7            31.5
            add      817.0       699.6    724.4          393.4            35.2
full_rest   top     1795.5      1795.5    894.3          634.6            50.1
            add     1260.3      1027.0    687.7          615.1            56.2
mux_based   top     1560.2      1560.2    750.8          804.5            51.8
            add     1155.3      1100.2    566.6          758.0            57.3
tran_funct  top     1225.0      1225.0    796.6          428.0            34.9
            add      871.6       808.8    612.2          379.8            38.3
wey_chow    top     1346.4      1346.4    942.2          404.3            30.0
            add      871.1       739.9    747.1          372.2            33.2
wu_ng       top     1161.6      1161.6    959.6          202.0            17.4
            add      719.1       728.6    741.8          168.6            18.5
Ours_1      top      843.8       843.8    751.1           92.7            11.0
            add      395.2       280.2    510.8           56.4            10.0
Ours_2      top      835.6       835.6    710.8          124.7            15.0
            add      422.6       364.7    466.6           89.8            16.1

static (row alignment not recoverable from the source): 4.3 4.3 1991.6 37.6 55.1 55.1 2.0 2.0 0.7 0.7 0.7 0.7 0.3 0.3 266.3 266.3 4.6 4.6 -

scheme      % add/top  delay  pwr * delay  Σ width  Vdd min
cmos_26          68.6  0.703        904.3     67.8      1.3
cmos_28          59.0  0.984       1709.2    184.8      1.3
cpl              86.6  0.781       2324.3    113.2      1.8
cpl_sr           81.7  0.812       1838.5    116.4      1.4
cpl_uye          80.3  0.853       1859.5    116.4      1.4
dcvs             84.8  1.107       3282.8    182.4      1.3
bay_10a          80.0  1.955       3081.9     51.4      2.8
bay_10b          71.5  1.157       1811.3     84.8      2.4
bay_14a          65.5  1.220       1489.6     68.7      2.4
bay_14b          64.3  1.366       1763.0     84.0      2.4
bay_16           60.8  1.688       2268.3     80.4      2.8
full_rest        70.2  1.022       1835.0     72.7      1.8
mux_based        74.0  1.362       2125.0     65.6      2.4
tran_funct       71.2  0.932       1141.7     63.0      1.3
wey_chow         64.7  1.024       1378.7     75.8      1.7
wu_ng            61.9  1.067       1239.4     79.4      1.7
Ours_1           46.8  0.716        604.2     52.8      1.5
Ours_2           50.6  0.734        613.3     50.4      1.5
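The propagation delay reported in Table 2 is defined in Section 5 as the time between an input reaching 50% of its voltage swing and an output reaching 50% of its own swing. A minimal sketch of that measurement on sampled waveforms is given below; it assumes piecewise-linear interpolation between samples, and the function names and the idealized example waveforms are illustrative rather than taken from the paper or from Hspice.

```python
def first_crossing(times, values, level, t_start=0.0):
    """First time at or after t_start where the waveform crosses `level`.

    Linear interpolation is used between consecutive samples.
    Returns None if no crossing is found.
    """
    samples = list(zip(times, values))
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        if t1 < t_start or v0 == v1:
            continue
        if (v0 - level) * (v1 - level) <= 0:
            t = t0 + (level - v0) * (t1 - t0) / (v1 - v0)
            if t >= t_start:
                return t
    return None


def propagation_delay(times, v_in, v_out, in_swing, out_swing):
    """Delay from the input 50% point to the next output 50% point.

    in_swing and out_swing are (low, high) voltage pairs, so a degraded
    output swing gets its own midpoint.
    """
    mid_in = (in_swing[0] + in_swing[1]) / 2.0
    mid_out = (out_swing[0] + out_swing[1]) / 2.0
    t_in = first_crossing(times, v_in, mid_in)
    if t_in is None:
        return None
    t_out = first_crossing(times, v_out, mid_out, t_start=t_in)
    if t_out is None:
        return None
    return t_out - t_in


# Idealized example: input rises between t=1 and t=2, output falls between t=2 and t=3.
t = [0.0, 1.0, 2.0, 3.0, 4.0]
vin = [0.0, 0.0, 3.3, 3.3, 3.3]
vout = [3.3, 3.3, 3.3, 0.0, 0.0]
print(propagation_delay(t, vin, vout, (0.0, 3.3), (0.0, 3.3)))
```

The value listed in the delay column would then be the largest such delay over all input transitions and both outputs.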

- scheme: indicates the tested full adder and separates the measured values for the whole test bed (top) and for the adder cell alone (add).

- avg. power: shows the average power taken from the power supply and from the module inputs (as mentioned before, some adders take energy from the inverters connected at the inputs).

- pwr. supply: shows the portion of the average power taken only from the power supply.

- dynamic: indicates the power dissipation component caused by charging and discharging the parasitic capacitances within the cells during normal operation.

- static: refers to the power consumed when the input and output signals are settled and a direct path from Vdd to Gnd exists, due to some partially turned-on transistors.

- short-circuit: reflects the power dissipation due to direct paths from Vdd to Gnd that are created momentarily while input signals are transitioning with a finite slope.

- %wasted/total: represents the percentage of power that is not used to charge or discharge capacitances (static + short-circuit) with respect to the total dissipation.

- %add/top: represents the power dissipation of the full adder cell under test as a percentage of the power consumption of the whole test bed.

- delay: indicates the longest propagation delay of the So and Co output signals.

- pwr * delay: this metric combines the power consumption and the propagation delay of a cell, giving the energy required to perform the logic function, so that the overall performance of the cells can be compared more fairly.

- Σ width: the sum of all transistor widths for each circuit, giving an idea of the required implementation area, as all transistor lengths are kept at the minimum (0.35 µm).

- Vdd min: the minimum supply voltage that maintains the correct functionality of the full adder cell, i.e. at which it is still able to drive the buffers connected at the outputs with proper logic values.
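As a worked example of the column definitions above, the derived columns can be recomputed from the "top" and "add" values listed in Table 2. The sketch below does this for Ours_1, using a static dissipation of zero as stated for the proposed cells in the discussion of the results; the function and key names are illustrative. Since power is in µW and delay in ns, the power-delay product comes out in fJ.

```python
def percent(part, whole):
    """Share of `part` in `whole`, as a percentage."""
    return 100.0 * part / whole


def derived_metrics(top_power_uw, add_power_uw, static_uw, short_circuit_uw, delay_ns):
    """Recompute the derived columns of Table 2 for one scheme."""
    return {
        "wasted_over_total_pct": percent(static_uw + short_circuit_uw, top_power_uw),
        "add_over_top_pct": percent(add_power_uw, top_power_uw),
        "power_delay_fj": top_power_uw * delay_ns,  # uW * ns = fJ
    }


# Table 2 values for Ours_1: top 843.8 uW, add 395.2 uW, short-circuit 92.7 uW, delay 0.716 ns.
ours_1 = derived_metrics(843.8, 395.2, 0.0, 92.7, 0.716)
for name, value in ours_1.items():
    print(name, round(value, 1))
# wasted_over_total_pct 11.0
# add_over_top_pct 46.8
# power_delay_fj 604.2

# Improvement in power-delay product over the best of the other cells (cmos_26, 904.3).
improvement_pct = percent(904.3 - ours_1["power_delay_fj"], 904.3)
print(round(improvement_pct, 1))  # about 33, in line with the roughly 30% quoted in the paper
```

The recomputed values agree with the Ours_1 entries in the table, which also confirms that the pwr * delay column is formed from the "top" power rather than the "add" power.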


Thus, the following statements about power consumption and delay can be extracted from the results in Table 2:

- The full adders designed with pass-transistor logic styles (cmos_26, cpl, cpl_sr, Ours_1 and Ours_2) exhibit less delay than the other ones, which can be expected given the reduced internal parasitic capacitances of these logic styles, as stated in [7].

- On the other hand, the full adders designed upon the logic structure shown in Figure 1 (bay_10a, bay_10b, bay_14a, bay_14b, bay_16, full_rest, tran_funct, wey_chow) have larger propagation delays (around or exceeding 1 ns), as expected, because the internal XOR/XNOR gates generate the intermediate signals that control the output blocks, adding an extra delay.

- The full adders having an incomplete voltage swing (bay_14a, bay_14b, bay_16) consume less power than other ones (cmos_26, tran_funct), but only when the dissipation of the surrounding circuitry is neglected (row “add”). If the dissipation of the whole test bed (row “top”) is considered, those proposals no longer perform better than the other ones. Moreover, the propagation delay of those adders is longer, due to the degraded current-driving capability of the multiplexers controlled by nodes with an incomplete voltage swing, which makes the power-delay product even worse than that of the other adders.

- Furthermore, the minimum supply voltage that maintains correct operation (column “Vdd min”) for those circuits is higher than the supply required by the ones having internal nodes with a full voltage swing.

- Regarding the proposals in this paper, the advantage of the alternative logic structure derived above can be clearly seen, as both realizations designed upon this scheme (Ours_1 and Ours_2) exhibit the least power consumption, delay (except for the cmos_26 realization) and power-delay product.

- In addition, as these realizations have no static dissipation and no internal direct paths from Vdd to Gnd (except for the inverters at the inputs, which could even be avoided if the inputs come from flip-flops with complementary outputs), they look like good candidates for battery-operated applications where modules with low power dissipation in stand-by mode are required. Moreover, the power consumption of these circuits can be further reduced, as they have been shown to work properly with power supplies as low as 1.5 V.

- On the subject of the required implementation area (column “Σ width”), it can be noticed that the pass-transistor based circuits occupy less area than the static ones. In particular, the proposed full adders require the least area, which can also be considered one of the factors behind the enhanced performance.

- The short-circuit column reveals an important issue that cannot be seen when only a condensed power dissipation figure is reported. Some adders exhibit short-circuit dissipation even though they were reported as not having this component [15, 20], as they do not have an internal direct path from Vdd to Gnd (powerless or groundless). When the adders are simulated in the test bed shown in Figure 5, a short-circuit path arises under certain conditions. Consider an input pattern of A = B = 0 (actually, these signals are fed from two inverters in this test bed) applied to the cell. If a transition occurs at input A (0 → 1), a momentary direct path from Vdd to Gnd arises, as shown in Figure 6.

- The importance of the simulation setup and of including the power dissipation components of the surrounding circuitry is now evident, as some realizations previously reported as low-power cells have been shown to perform no better than other ones when the dissipation of the whole test bed is considered.

Figure 6. Short-circuit path (dashed line) created when input A changes from 0 to 1.

Figure 7 shows the So and Co output waveforms of the proposed full adder Ours_1, for an input pattern corresponding to the adder’s truth table shown in Table 1. With the power and delay features shown in the simulations performed, the proposed adders appear to be good candidates to build low-power high-speed arithmetic modules.

Additionally, to validate these results with a more complex test vehicle, five 8-bit carry-ripple adders were designed using some of the 1-bit full adders being compared. Table 3 shows the power dissipation of these adders when operated at 125 MHz, and the highest operational frequency attained by each circuit.

As can be seen, the results for the power dissipation and maximum operational frequency of each 8-bit adder are consistent with the results obtained for the individual 1-bit full adder cells. It is worth pointing out that the first proposal in this paper (Ours_1) exhibits the least power dissipation and the highest operational frequency among the adders compared, as expected.

Figure 7. Sum (So) and Carry (Co) outputs waveforms for the full adder Ours_1.
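The 8-bit carry-ripple adders used as the larger test vehicle chain eight 1-bit cells through their carry signals. A behavioural sketch of that arrangement, built on the multiplexed full adder logic of Section 3, is shown below; it models the logic only and says nothing about power or timing, and the function names are illustrative.

```python
def full_adder(a, b, c):
    """1-bit full adder where C selects XOR/XNOR for So and AND/OR for Co."""
    xor_ab = a ^ b
    so = (1 - xor_ab) if c else xor_ab
    co = (a | b) if c else (a & b)
    return so, co


def ripple_carry_add(x, y, carry_in=0, width=8):
    """Add two unsigned `width`-bit integers by chaining full adders.

    Returns (sum modulo 2**width, carry out of the last stage).
    """
    carry = carry_in
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry


print(ripple_carry_add(200, 100))     # (44, 1) since 300 = 256 + 44
print(ripple_carry_add(255, 255, 1))  # (255, 1) since 511 = 256 + 255
```

In this chain each stage waits only for the carry of the previous one, so the carry path sets the worst-case delay; this is why a lightly loaded C input matters for the maximum operating frequency of the larger adder.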


Table 3. Simulation results for 8-bit carry-ripple adders.

adder       power @ 125 MHz (mW)  Fmax (MHz)
cmos_26                     3.73         416
bay_16                      4.38         166
full_rest                   4.56         384
tran_funct                  3.68         384
Ours_1                      3.42         454

6. CONCLUSIONS

The design of high-speed low-power full adder cells based upon an alternative logic approach has been presented. Nanosim and Hspice simulations have shown a great improvement in the power-delay metric for the proposed adders, when compared with several previously published realizations.

The full adders designed upon this logic structure with the DPL and SR-CPL logic styles exhibit a delay of around 720 ps and a power consumption of around 840 µW, for an overall reduction of 30% in the power-delay metric with respect to the best of the other adders compared, and in general of about 50% with respect to the remaining ones.

Some work can be done in the future on the design of high-speed low-power full adders, taking this alternative logic structure and trying new realizations for the constituent logic blocks (XOR/XNOR, AND, OR and MUX cells).

7. ACKNOWLEDGMENTS

The authors thank CONACYT–Mexico for the support offered through PhD grant #112784.

8. REFERENCES

[1] A. M. Shams and M. Bayoumi, “Performance evaluation of 1-bit CMOS adder cells”, IEEE ISCAS, Orlando, Florida, May 1999, pp. I27 – I30.

[2] A. P. Chandrakasan, S. Sheng and R. W. Brodersen, “Low-power CMOS digital design”, IEEE JSSC, Vol. 27, April 1992, pp. 473-483.

[3] N. Weste and K. Eshraghian, Principles of CMOS design, A system perspective, Addison-Wesley, 1988.

[4] K. M. Chu and D. Pulfrey, “A comparison of CMOS circuit techniques: differential cascode voltage switch logic versus conventional logic”, IEEE JSSC, Vol. sc-22, No. 4, August 1987, pp. 528-532.

[5] K. Yano, et al, “A 3.8ns CMOS 16 × 16-b multiplier using complementary pass-transistor logic”, IEEE JSSC, Vol. 25, April 1990, pp. 388-395.

[6] M. Suzuki, et al, “A 1.5ns 32-b CMOS ALU in double pass-transistor logic”, IEEE JSSC, Vol. 28, No. 11, November 1993, pp. 1145-1150.

[7] R. Zimmermann and W. Fichtner, “Low-power logic styles: CMOS versus pass-transistor logic”, IEEE JSSC, Vol. 32, No. 7, July 1997, pp. 1079-1090.

[8] N. Zhuang and H. Wu, “A new design of the CMOS full adder”, IEEE JSSC, Vol. 27, No. 5, May 1992, pp. 840-844.

[9] A. M. Shams and M. Bayoumi, “A new cell for low power adders”, Proceedings of the International MWSCAS, 1995.

[10] E. Abu-Shams, A. Elchouemi, S. Sayed and M. Bayoumi, “An efficient low power basic cell for adders”, IEEE ISCAS 1995, pp. 306-308.

[11] A. Wu and C. K. Ng, “High performance low power low voltage adder”, IEE Electronic Letters, Vol. 33, No. 8, April 1997.

[12] A. M. Shams and M. Bayoumi, “A novel low-power building block CMOS cell for adders”, Proceedings of the ISCAS, June 1998.

[13] A. M. Shams and M. Bayoumi, “A 10-transistor low-power high-speed full adder cell”, IEEE International Symposium on Circuits and Systems, 1999, pp. 43-46.

[14] A. M. Shams and M. Bayoumi, “A Low Power 10-Transistor Full Adder Cell for Embedded Architectures”, IEEE ISCAS 2001, pp. 226-229.

[15] D. Radhakrishnan, “Low-Voltage Low-Power CMOS Full Adder”, IEE Proceedings on Circuits Devices and Systems, Vol. 148, No. 1, February 2001, pp. 19-24.

[16] B. Alhalabi and A. Al-Sheraidah, “A novel low-power multiplexer-based full adder cell”, IEEE ICECS 2001, Vol. 3, September 2001, pp. 1433-1436.

[17] I. Wey, C. Huang and H. Chow, “A new low-voltage CMOS 1-bit full adder for high performance applications”, Proceedings of the IEEE APC-ASIC, August 2002, pp. 21-24.

[18] M. Aguirre and M. Linares, “A low-power low-voltage 1-bit CMOS full adder for energy-efficient multimedia applications”, Proceedings of the IEEE ICED’04, Veracruz, México, November 2004.

[19] A. M. Shams and M. Bayoumi, “A Structured Approach for Designing Low Power Adders”, ASILOMAR Conference on Signals, Systems and Computers, Vol. 1, 1998, pp. 757-761.

[20] H. T. Bui, Y. Wang and Y. Jiang, “Design and Analysis of Low-Power 10-Transistor Full Adders Using Novel XOR-XNOR Gates”, IEEE Trans. on Circuits and Systems, Vol. 49, No. 1, January 2002, pp. 25-30.

[21] A. M. Shams, T. K. Darwish and M. Bayoumi, “Performance Analysis of Low-Power 1-Bit CMOS Full Adder Cells”, IEEE Trans. on VLSI Systems, Vol. 10, No. 1, February 2002, pp. 20-29.

[22] “Nanosim User Guide”, Release 2002.03, Synopsys Inc.
