Low-Power Field-Programmable VLSI Using ... - Semantic Scholar

1 downloads 0 Views 925KB Size Report
Dec 12, 2005 - A low-power field-programmable VLSI (FPVLSI) is pre- sented to overcome the problem of large power consumption in field- programmable ...
IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.12 DECEMBER 2005

3298

PAPER

Special Section on VLSI Design and CAD Algorithms

Low-Power Field-Programmable VLSI Using Multiple Supply Voltages Weisheng CHONG†a) , Nonmember, Masanori HARIYAMA† , and Michitaka KAMEYAMA† , Members

SUMMARY A low-power field-programmable VLSI (FPVLSI) is presented to overcome the problem of large power consumption in fieldprogrammable gate arrays (FPGAs). To reduce power consumption in routing networks, the FPVLSI consists of cells that are based on a bit-serial pipeline architecture which reduces routing block complexity. Moreover, a level-converter-less multiple-supply-voltage scheme using dynamic circuits is proposed, where the cells in non-critical paths use a low supply voltage for low power under a speed constraint. The FPVLSI is evaluated based on a 0.18-µm CMOS design rule. The power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages. key words: reconfigurable processor, FPGA, multiple-supply-voltage scheme

1.

Introduction

Field-programmable gate arrays (FPGAs) are widely used to implement special-purpose processors. FPGAs are costeffective and flexible because functions and interconnections of logic resources can be directly programmed by end users [1]. Despite their design cost advantage, FPGAs impose large power consumption overhead compared to custom silicon alternatives [2]. The overhead increases packaging costs and limits integrations of FPGAs into portable devices. To overcome the problem, this paper presents a field-programmable VLSI (FPVLSI) that is based on the following concepts: • Compact bit-serial fine-grain pipeline architecture to achieve a higher degree of spatial parallelism. The parallelism reduces power consumption under the speed and area constraints. • Level-converter-less multiple-supply-voltage scheme using dynamic circuits. A low supply voltage is used at non-critical paths to achieve low power without power consumption overhead of level converters. For evaluations using an elliptic wave filter, an FFT and an FIR, power consumption of the FPVLSI using multiple supply voltages is reduced to 17% or less compared to that of the static-circuit-based FPVLSI using multiple supply voltages. Power consumption of the FPVLSI using multiple supply voltages is reduced to 53% or less compared to that of the FPVLSI using a single supply voltage. Manuscript received March 24, 2005. Manuscript revised June 14, 2005. Final manuscript received August 1, 2005. † The authors are with the Graduate School of Information Sciences, Tohoku University, Sendai-shi, 980-8579 Japan. a) E-mail: [email protected] DOI: 10.1093/ietfec/e88–a.12.3298

2.

Related Works

Most previous works on low-power FPGAs focused on dynamic power [3]. A low-power FPGA, called Flex Power FPGA, that focused on static power was presented in [4]. The Flex Power FPGA has a programmable threshold voltage using a four-terminal double-gate MOS transistor. Our work focuses mainly on dynamic power that still dominates overall FPGA power consumption even in a 0.13-µm technology [5]. Multiple-supply-voltage scheme is an effective lowpower technique. Low-power FPGAs using the scheme were presented in [3], [6]. However, there is overhead of level converters in these FPGAs because they implemented the scheme using static circuits. There is no publication for multiple-supply-voltage FPGAs without using level-converters. Glitch power is a significant part of total FPGA power [7]. One of the popular architecture-level techniques to reduce glitch power is pipelining that reduces logic depths [5]. Pipelining is efficient to reduce glitches for a routing wire because a single pipelined input of the wire causes at most a single output transition only. However, pipelining is inefficient to reduce glitches for a logic block because logic block inputs with different arrival times cause spurious output transitions. To our knowledge, there is no effort to eliminate glitch power in FPGA using a dynamic-circuit-based technique. FPGAs based on a bit-serial architecture were presented in [8], [9] to achieve high speed performance.There is no publication on a low-power FPGA based on a bit-serial architecture using multiple supply voltages. 3.

Architecture of the FPVLSI

3.1 Overall Architecture As shown in Fig. 1, the FPVLSI has a 2-dimensional cellular array structure. Each cell consists of a logic block and a routing block. Note that a supply voltage for the logic block is denoted as Vsel and it is selected between a high supply voltage (VH ) and a low supply voltage (VL ). A bit-serial fine-grain pipeline architecture is used to reduce power consumption by reducing the complexity of routing networks [9] and by reducing a supply voltage level under a speed

c 2005 The Institute of Electronics, Information and Communication Engineers Copyright 

CHONG et al.: LOW-POWER FIELD-PROGRAMMABLE VLSI USING MULTIPLE SUPPLY VOLTAGES

3299

Fig. 1

FPVLSI architecture using multiple supply voltages.

(A) Allocation.

(B) Scheduled DFG.

Fig. 3 Allocation of the DFG onto the FPVLSI using a single supply voltage. Fig. 2

DFG.

constraint. The supply voltage level can be reduced because of a high degree of spatial parallelism in the bit-serial pipeline architecture. In the bit-serial architecture, simple routing networks with a small number of tracks are sufficient because a data word is transmitted bit-by-bit using a single routing track, irrespective of word lengths [10]. The small number of tracks reduces capacitive load and an area of routing blocks. A smaller routing block leads to a smaller cell that increases the total number of cells in a chip, under the same chip area constraint. The degree of spatial parallelism increases with number of cells. Hence, the smaller routing block results in a higher degree of spatial parallelism. A multiple-supply-voltage scheme using dynamic circuits is proposed in the FPVLSI to reduce power consumption. In the scheme, each cell is connected to a voltage selector that switches off a logic block or connects the logic block to high or low supply voltages. The supply voltage VH is selected for cells in critical paths to meet a speed constraint. The supply voltage VL is selected for cells in noncritical paths to reduce dynamic power consumption under the speed constraint. Moreover, unused logic blocks in an application are switched off to reduce static power consumption. A multiple-supply-voltage scheme exploits timing slacks of imbalanced delays between critical and noncritical paths to reduce power consumption. Figure 2 shows a data flow graph (DFG) that is allocated onto a singlesupply-voltage FPVLSI shown in Fig. 3(a). Voltage selectors are not shown for simplicity. Figure 3(b) shows a scheduled DFG where delays of routing blocks RB1 , RB2 , RB3 and RB4 cause a timing slack between inputs of a logic block O3. In the FPVLSI that is fully pipelined, a clock period is the summation of a routing block delay and a logic block delay. For the same DFG in Fig. 2, Fig. 4 shows the multiplesupply-voltage FPVLSI uses VL in O1 and RB1 to reduce

(A) Allocation.

(B) Scheduled DFG.

Fig. 4 Allocation of the DFG onto the FPVLSI using multiple supply voltages.

power consumption under a speed constraint. Level converters in a typical multiple-supply-voltage scheme cause short circuit current overhead. Without using level converters, there are direct-current paths in VH blocks whose pMOS transistors are not fully switched off by low-voltage-swing inputs from VL -blocks. Figure 5(a) shows a level converter that converts a low-swing input to a high-swing output that fully switches off a pMOS transistor [11]. The main drawback of level converters is short circuit current, as shown in Fig. 5(a). Figure 5(b) shows the short circuit current flows when both transistors M1 and M2 are switched on. Thick lines in Fig. 5(a) show slow feed back paths that cause M1 to be still on when M2 is on.Level converters limit the efficiency of a multiple-supply-voltage scheme because they impose a small voltage gap between VH and VL . There is a trade-off between large and small voltage gaps. A large voltage gap increases short circuit current because of a slower VL -inverter in the feed back paths.

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.12 DECEMBER 2005

3300

(a) Circuit. Fig. 5

(a) Circuit.

(b) Simulation.

Fig. 7

Circuit and short circuit current of a level converter.

(a) Circuit. Fig. 6

(b) Waveform. Glitches in static circuits.

(b) Operations.

No direct current path in dynamic circuits.

However, a small voltage gap cannot fully exploit a timing slack between critical and non-critical paths to reduce power consumption. The power consumption of level converters is significant in field-programmable devices using multiple supply voltages because level converters are required in every cell. A level-converter-less multiple-supply-voltage scheme is important to overcome the problems of level converters. To realize the scheme, CMOS dynamic circuits can be used because there is no direct current in dynamic circuits in ideal cases, regardless of voltage swings of their inputs [12]. Let us consider an inverter using dynamic circuits as shown in Fig. 6(a). It operates in precharge-evaluation cycles that are controlled by a clock signal, CLK. As shown in Fig. 6(b), CLK switches off a pMOS pull-up transistor (denoted by MP) and an nMOS pull-down transistor (denoted by MN) in evaluation and precharge cycles, respectively. 3.2 Low Glitch-Power Architecture Using Dynamic Circuits Glitch power is one of the significant parts in FPGA total power. It can be up to 19% of the total power [7]. Glitches are spurious transitions due to imbalanced path delays of logic block inputs. Let us consider imbalanced path delays for inputs of an OR gate. Figures 7(a) and 7(b) show the circuit and its glitch, respectively. To eliminate glitches in the FPVLSI, we use dynamic circuits that are glitch-less because each of their outputs has at most one transition per clock cycle. Figure 8 shows an OR gate using dynamic circuits that is glitch-less irrespective of its imbalanced input

(a) Circuit.

(b) Waveform. Fig. 8

Glitch-less dynamic circuits.

delays. 4.

Cell Design

Figure 9 shows a cell structure where supply voltages of a logic block and a routing block are connected to a voltage selector and VH , respectively. In the routing block, the only components that require a supply voltage are SRAM configuration memory bits that use VH to fully switch off or switch on programmable switches. Note that the memory bits are static after a configuration, thus they do not consume dynamic power. Outputs of the logic block in Fig. 9 are registered to ensure functionality of dynamic-circuit-based logic blocks connected in series. Figure 10 shows incorrect functionality in logic blocks without output registers. The logic blocks use simplified 1-input LUTs although actual LUTs have four inputs as shown in Fig. 9. An output Sb from a source logic block is connected to a transistor Mb in a destination logic block. The logic blocks are programmed as inverters in this example. Therefore, by programming a memory bit M1 as

CHONG et al.: LOW-POWER FIELD-PROGRAMMABLE VLSI USING MULTIPLE SUPPLY VOLTAGES

3301

(a) Circuit. (each logic block is programeed as an inverter)

(b) Waveform. Fig. 11 The logic blocks in a series connection with output registers (the actual logic block has five inputs). Fig. 9

Cell structure with a dynamic-circuit-based logic block.

(a) Circuit. (each logic block is programeed as an inverter) Fig. 12 Structure of a 2-input LUT using dynamic circuits (a 4-input LUT is used in the FPVLSI).

(b) Waveform. Fig. 10 The logic blocks in a series connection without output registers (the actual logic block has five inputs).

one, the transistor Mb is connected to ground in an evaluation cycle. Figure 10(b) shows outputs Sb and OUT are ones during the first precharge cycle. The output Sb starts to discharge at the start of the first evaluation cycle. As long as the output Sb exceeds the switching threshold of the transistor Mb, which is Vt, a conducting path exists between the output OUT and ground. Therefore, the output OUT is discharged until the output Sb falls to Vt, where the transistor Mb will be shut off. A level of the discharged output OUT cannot be recovered. The level might cause incorrect functionality because a correct level should be high. To ensure

correct functionality, logic block inputs should be static during transitions from a precharge cycle to an evaluation cycle. Figure 11(a) shows a solution using an output register in a source logic block. The register holds the input Sb to be static during the transitions. Figure 11(b) shows conduction states of the transistor Mb in a precharge cycle and an adjacent evaluation cycle are the same. The solution also meets a requirement of pipeline registers in a bit-serial pipeline architecture. An input RST resets one of the pipeline registers to indicate word boundaries of arithmetic operations in the bit-serial architecture. The input RST is also registered to reset other pipeline registers in the next clock cycle. An important component in a logic block is a 4-input lookup table (LUT) that can realize arbitrary 4-input logic functions. Figure 12 shows the structure of the LUT using dynamic circuits. For simplicity, instead of showing a 4input LUT used in an actual cell, a 2-input LUT is shown. A pull-down network in the dynamic circuits consists of a

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.12 DECEMBER 2005

3302

multiplexer and programmable switches. Multiplexer inputs (S 0 and S 1) and memory bits for the switches determine values of a LUT output. 5.

Evaluation

5.1 Features of the FPVLSI Figure 13 shows a layout of the FPVLSI designed in a 0.18µm CMOS design rule. As a comparison, the same FPVLSI architecture is also designed using static circuits. The FPVLSIs using dynamic and static circuits are D-FPVLSI and S-FPVLSI, respectively. As shown in Fig. 14, there is a level converter at the LUT output in S-FPVLSI. For simplicity, instead of a 4-input LUT used in S-FPVLSI, a 2input LUT is shown. D-FPVLSI and S-FPVLSI are simulated using HSPICE. Features of D-FPVLSI and S-FPVLSI are summarized in Table 1. Large power consumption and a large delay of a level converter degrade the performance of an S-FPVLSI logic block. 5.2 Comparison of FPVLSIs Using Multiple Supply Voltages

a DFG of the elliptic wave filter and its allocation onto DFPVLSI, respectively. Shaded cells are in critical paths and thus use VH . Allocations onto S-FPVLSI and D-FPVLSI are the same. Table 2 shows the evaluation results. Power consumption of the elliptic wave filter implementation using DFPVLSI is 11% of that using S-FPVLSI. Delay overhead of level converters leads to higher VH and VL in S-FPVLSI. The higher supply voltages and short circuit current of the level Table 1

Features of D-FPVLSI and S-FPVLSI (VH = 1.8 V, VL = 1.2 V).

Architecture CMOS technology Cell area (µm2 ) Number of level converters in a cell Minimum delay of a routing block (ns) (Voltage swing=1.8 V) Minimum delay of a logic block (ns)

Minimum clock period (ns) (normalized value) Normalized power consumption

D-FPVLSI 0.18-µm 118 × 115

S-FPVLSI 0.18-µm 118 × 115

0 0.46

2 0.46

0.40 (evaluation cycle) 0.17 (precharge cycle) 0.86 (1)

2.00

2.46 (2.86)

0.09

1

To show advantages of a level-converter-less multiplesupply-voltage scheme, D-FPVLSI and S-FPVLSI are evaluated using an elliptic wave filter, an FFT and an FIR. First, let us evaluate the performance of D-FPVLSI and S-FPVLSI using a 16-bit 5th-order elliptic wave filter, under the same area and speed constraints. Figures 15(a) and 15(b) show

Fig. 13

Layout of the FPVLSI. (a) DFG.

Fig. 14 Level converter in a 2-input LUT using static circuits (a 4-input LUT is used in the S-FPVLSI).

(b) Allocation. Fig. 15

DFG of an elliptic wave filter and its allocation.

CHONG et al.: LOW-POWER FIELD-PROGRAMMABLE VLSI USING MULTIPLE SUPPLY VOLTAGES

3303 Table 2 Performance comparison for multiple-supply-voltage FPVLSIs using the elliptic wave filter. Architecture Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

D-FPVLSI dynamic VH :1.75 V VL :0.9 V 18.3 1 0.11

S-FPVLSI static VH :1.8 V VL :1.2 V 18.3 1 1

(a) DFG.

(a) DFG. (b) Allocation. Fig. 17

DFG of an FIR and its allocation.

Table 4 Performance comparison for multiple-supply-voltage FPVLSIs using the FIR. Architecture Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

(b) Allocation. Fig. 16

DFG of FFT butterfly operations and their allocation.

Table 3 Performance comparison for multiple-supply-voltage FPVLSIs using the FFT. Architecture Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

D-FPVLSI dynamic VH :1.76 V VL :1.0 V 14.6 1 0.13

S-FPVLSI static VH :1.8 V VL :1.2 V 14.6 1 1

S-FPVLSI static VH :1.8 V VL :1.25 V 12 1 1

Table 5 Performance comparison for dynamic-circuit-based FPVLSIs using the elliptic wave filter. Architecture Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

Multiple VDDs dynamic VH :1.75 V VL :0.9 V 18.3 1 0.42

Single VDD dynamic VDD:1.71 V 18.3 1 1

Table 6 Performance comparison for dynamic-circuit-based FPVLSIs using the FFT. Architecture

converters cause larger power consumption in S-FPVLSI. Next, let us evaluate the performance of D-FPVLSI and S-FPVLSI using a 16-point FFT, under the same area and speed constraints. An FFT consists of butterfly operations shown in Fig. 16(a). Figure 16(b) shows an allocation of the operations onto D-FPVLSI. Table 3 shows a comparison between FFT implementations using D-FPVLSI and S-FPVLSI. Power consumption of the FFT implementation using D-FPVLSI is 13% of that using S-FPVLSI. Finally, performance of D-FPVLSI and S-FPVLSI is compared using 16-bit 3-tap FIR shown in Fig. 17(a). Figure 17(b) shows an allocation of the FIR onto D-FPVLSI. Table 4 shows comparison results. Power consumption of the FIR imple-

D-FPVLSI dynamic VH :1.7 V VL :1.1 V 12 1 0.17

Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

Multiple VDDs dynamic VH :1.76 V VL :1.0 V 14.6 1 0.45

Single VDD dynamic VDD:1.74 V 14.6 1 1

mentation using D-FPVLSI is 17% of that using S-FPVLSI. 5.3 Comparison of FPVLSIs Using Dynamic Circuits To show advantages of a multiple-supply-voltage scheme in dynamic circuits, let us compare single-supply-voltage and

IEICE TRANS. FUNDAMENTALS, VOL.E88–A, NO.12 DECEMBER 2005

3304 Table 7 Performance comparison for dynamic-circuit-based FPVLSIs using the FIR. Architecture Circuit style Supply voltage Clock period (ns) Normalized throughput Normalized power consumption

Multiple VDDs dynamic VH :1.7 V VL :1.1 V 12 1 0.53

Single VDD dynamic VDD:1.68 V 12 1 1

FPVLSI will be reduced using a multiple-threshold-voltage scheme. Acknowledgements This work was supported in part by Industrial Techno-logy Research Grant Program in 2002 from New Energy and Industrial Technology Development Organization (NEDO) of Japan. References

(a) Using the elliptic wave filter.

(b) Using the FFT.

(c) Using the FIR. Fig. 18

Power consumption evaluations of the FPVLSIs.

multiple-supply-voltage D-FPVLSIs using an elliptic wave filter, an FFT and an FIR. Tables 5, 6 and 7 show the comparison results using the elliptic wave filter, the FFT and the FIR, respectively. Power consumption of the D-FPVLSI using multiple supply voltages is 53% or less compared to that of the D-FPVLSI using a single supply voltage. Under the same speed constraint, VH of the multiple-supplyvoltage D-FPVLSI is slightly higher than the VDD of the single-supply-voltage D-FPVLSI. This is to compensate the delay of a slower logic block in the multiple-supply-voltage D-FPVLSI. A pull-down network in the logic block cannot be fully switched on using VL -inverters shown in Fig. 12. Figure 18 summarizes the power consumption evaluations of S-FPVLSI, the multiple-supply-voltage D-FPVLSI and the single-supply-voltage D-FPVLSI. 6.

Conclusion

We presented a low-power FPVLSI that is based on a proposed level-converter-less multiple-supply-voltage scheme using dynamic circuits. In future work, static power of the

[1] S.D. Brown, R.J. Francis, J. Rose, and Z.G. Vranesic, FieldProgrammable Gate Arrays, pp.1–2, Kluwer Academic Publishers, 1992. [2] V. George, H. Zhang, and J. Rabaey, “The design of a low energy FPGA,” Proc. 1999 International Symposium on Low Power Electronics and Design, pp.188–193, California, USA, Aug. 1999. [3] A. Gayasen, K. Lee, N. Vijaykrishnan, M. Kandemir, M.J. Irwin, and T. Tuan, “A dual-Vdd low power FPGA architecture,” Proc. International Conference on Field-Programmable Logic and Its Applications, pp.145–157, Aug. 2004. [4] T. Kawanami, M. Hioki, H. Nagase, T. Tsutsumi, T. Nakagawa, T. Sekigawa, and H. Koike, “Preliminary evaluation of flex power FPGA: A power reconfigurable architecture with fine granularity,” IEICE Trans. Inf. & Syst., vol.E87-D, no.8, pp.2004–2010, Aug. 2004. [5] S. Wilton, S.S. Ang, and W. Luk, “The impact of pipelining on energy per operation in field-programmable gate arrays,” Proc. International Conference on Field-Programmable Logic and Its Applications, pp.719–728, Aug. 2004. [6] F. Li, Y. Lin, and L. He, “FPGA power reduction using configurable dual-Vdd,” Proc. 41st Annual Conference on Design Automation, pp.735–740, CA, USA, June 2004. [7] F. Li, D. Chen, L. He, and J. Cong, “Architecture evaluation for power-efficient FPGAs,” Proc. 11th International Symposium on Field Programmable Gate Arrays, pp.175–184, Feb. 2003. [8] T. Isshiki, A. Ohta, T. Watanabe, T. Nakada, K. Akahane, I. Sisla, D. Li, and H. Kunieda, “High density bit-serial FPGA with LUT embedding shift register function,” Proc. IEEE Asia Pacific Conference on Circuit and System, pp.475–480, 2002. [9] N. Ohsawa, M. Hariyama, and M. Kameyama, “High-performance field programmable VLSI processor based on a direct allocation of a control/data flow graph,” IEEE Computer Society Annual Symposium on VLSI Emerging Trends in VLSI Systems Design (ISVLSI’02), pp.95–100, PA, USA, April 2002. [10] A. Tisserand, P. Marchal, and C. Piguet, “An on-line arithmetic based FPGA for low-power custom computing,” Proc. 9th International Workshop on Field Programmable Logic and Applications (FPL), pp.264–273, Glasgow, UK, 1999. [11] F. Ishihara, F. Sheikh, and B. Nikolic, “Level conversion for dualsupply systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol.12, no.2, pp.569–571, Nov. 2004. [12] J. Rabaey, Digital Integrated Circuits: A Design Perspective, pp.223–224, Prentice Hall, New Jersey, 1996.

CHONG et al.: LOW-POWER FIELD-PROGRAMMABLE VLSI USING MULTIPLE SUPPLY VOLTAGES

3305

Weisheng Chong received the B.E. and M.E. degrees in electrical engineering from Universiti Teknologi Malaysia, Johor Bahru, Malaysia, in 1999 and 2002, respectively. He is currently a Ph.D. student in information sciences at Tohoku University, Sendai, Japan. His research interests include reconfigurable architectures, low-power designs and cryptographic systems.

Masanori Hariyama received the B.E. degree in electronic engineering, M.S. degree in information sciences, and Ph.D. in Information Sciences from Tohoku University, Sendai, Japan, in 1992, 1994, and 1997, respectively. He is currently an associate professor in Graduate School of Information Sciences, Tohoku University. His research interests include VLSI computing for real-world application such as robots, high-level design methodology for VLSIs and reconfigurable computing.

Michitaka Kameyama received the B.E., M.E. and D.E. degrees in electronic engineering from Tohoku University, Sendai, Japan, in 1973, 1975, and 1978, respectively. He is currently a professor in the Graduate School of Information Sciences, Tohoku University. His general research interests include intelligent integrated systems for real-world applications and robotics, high-level synthesis of VLSI processors, and multiple-valued VLSI computing. Dr. Kameyama received the Outstanding Paper Awards at the 1984, 1985, 1987 and 1989 IEEE International Symposiums on Multiple-Valued Logic, the Technically Excellent Award from the Society of Instrument and Control Engineers of Japan in 1986, the Outstanding Transactions Paper Award from the IEICE in 1989, the Technically Excellent Award from the Robotics Society of Japan in 1990, and the Special Award at the 9th LSI Design of the Year in 2002. Dr. Kameyama is an IEEE Fellow.

Suggest Documents