A Fine-Grained Reconfigurable Logic Array Based on Double Gate Transistors
Paul Beckett
School of Electrical & Computer Engineering, RMIT University
[email protected]

Abstract
A fine-grained reconfigurable architecture based on double gate technology is presented. The logic function operating on the first gate of a double gate (DG) transistor is reconfigured by altering the bias on its second gate. A compact reconfigurable cell is proposed that merges two stacked 3-state resonant tunneling devices and non-silicon transistors and “hides” the cost of reconfiguration by exploiting vertical integration. Each cell in the array can act as logic or interconnect, or both, in contrast with current FPGA structures in which logic and interconnect are built and configured largely as separate items. Simulation results for an SOI DG-MOSFET implementation are presented and two alternative non-silicon device technologies, metal-insulator-metal and carbon nanotube transistors, are briefly explored. Of these, carbon nanotube devices appear to offer the highest current drive at the limit of scaling and will operate into the gigahertz range, but only within architectures that are locally connected.
1. Introduction
Reconfigurable architectures are of great interest to system designers because they offer a way of achieving power and performance efficiency by matching specific algorithmic constructs with an appropriate architecture [1]. Even more importantly, as scaling continues into the sub-100nm region, it will be increasingly difficult to manufacture defect-free devices at the envisaged densities without a significant simplification in manufacturing processes [2]. Reconfigurability [3], along with redundancy [4], will therefore come to play a critical role in enhancing yields by allowing circuits to be less sensitive to manufacturing defects.

The traditional approach to developing reconfigurable systems, in FPGAs for example, has been to build separate regions of programmable logic gates and interconnection blocks and to manage these two resources more or less separately during the synthesis process. Therefore, much of the work on reconfigurable platforms has been directed towards answering questions such as “how much of each and in what form?” (see, for example, [5], [6], [7]).

The reduced fanout, power handling capacity, gain and reliability of deep-sub-micron (DSM) and nano-scale devices will have a number of consequences for reconfigurable systems. As device dimensions shrink, it will become increasingly difficult to manufacture the complex heterogeneous components that have underpinned field-programmable technology to date. Physical issues such as the increasing difficulty in achieving alignment between process layers [8] as well as the prospect of poor performance of FETs at reduced gate lengths [9] have already forced designers to look towards alternative manufacturing techniques on which to base programmable architectures.
Ideas such as chemically-assembled molecular electronics [10], nanotube and nanowire devices [11], [12], [13], quantum dot techniques [14], [15] and magnetic spin-tunneling devices [16] have all been proposed as the basis of future reconfigurable systems.

The double gate transistor is a promising device for DSM technology due in particular to its inherent resistance to short-channel effects and ideal sub-threshold performance. Typically, the same gate potential would be applied to the two gates and in structures such as the π-gate transistor [17] this is the only option. However, accessing the two gates separately offers some opportunities for innovative circuit design [18].

This paper outlines a field-programmable architecture based on double gate transistor circuits combined with a multi-valued configuration RAM based on resonant tunneling devices (RTDs) that supports very fine-grained reconfigurability. The underlying idea uses the multi-value logic capability of RTD circuits to bias the operating point of a double gate transistor circuit via one gate while the other gate is used to form the logic array. In this way, the overheads imposed by reconfigurability can be reduced or hidden to an extent where very fine-grained organization becomes a viable option.

The rest of the paper proceeds as follows: in Section 2, a very brief overview of the issues affecting the performance of FPGA devices is given. The primary objective is to reveal the opportunities for enhancement available in the FPGA domain. In Section 3, some device technologies appropriate to the reconfigurable cell are described, including double gate and resonant tunneling devices. Two alternative, non-silicon technologies are also discussed. Section 4 then identifies one way that these could be assembled to form a homogeneous reconfigurable processing mesh and offers some preliminary performance estimates. Finally, the paper is summarized and some directions for future work are briefly outlined.
2. FPGA Performance Issues
This section briefly examines the impact of wiring delays and structural organization on FPGA size and performance. The objective is to look for key areas where improvements might be made to current technology.

For FPGAs using DSM technology, interconnect and wiring delays are already the dominant factor in the total delay figure [19], typically accounting for as much as 80% of the path delay [20]. As devices scale, the effect of distributed resistance and capacitance of both programmable interconnect switches and wiring will become worse. De Dinechin [7] has estimated that, if the organization of FPGAs stays the same, their operating frequency will only increase O(λ½) with reducing feature size (λ), leading to a widening gap between their performance and that of custom hardware. Indeed, ASIC designers face essentially the same problem and, as a result, future interconnect architectures are likely to include “fat” (i.e. un-scaled) global wires plus careful repeater insertion [21]. This observation has led some researchers to propose the idea of pipelining the interconnect as well as the logic [22], [23].

At a basic level, the wiring delay problem is simple to articulate: as interconnection width and thickness decrease, resistance per unit length increases, while as interconnections become denser (and oxide layers thinner), capacitance also tends to increase [24]. For example, if the RC delay of a 1mm metal line in 0.5µm technology is 15ps then at 100nm (in the same materials) the delay would be 340ps [21]. However, more detailed analyses of scaled wires [25], [26] have identified two distinct performance regions. For short connections (those that tend to dominate current ASIC wiring), the ratio of local interconnection delay to gate delay stays very close to unity, i.e. interconnection delay closely tracks gate delay with scaling.
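The basic wire scaling argument above can be illustrated with a first-order calculation. The sketch below is not the model used in [21]; it assumes ideal scaling in which wire width and thickness both shrink by the same factor, so that resistance per unit length grows with the square of that factor while capacitance per unit length stays roughly constant. The function name and its parameters are hypothetical. Under these assumptions a fixed-length wire with 15ps delay at 0.5µm reaches 375ps at 100nm, the same order as the 340ps figure quoted from [21].

```python
def scaled_rc_delay(delay_ps, feature_from_nm, feature_to_nm):
    """First-order RC delay of a fixed-length wire after ideal scaling.

    Assumptions (not taken from the paper): width and thickness both
    shrink by s = feature_from / feature_to, so resistance per unit
    length grows as s**2, and capacitance per unit length is unchanged.
    """
    s = feature_from_nm / feature_to_nm
    return delay_ps * s ** 2


# 1mm line: 15ps at 0.5µm (500nm), scaled to 100nm in the same materials.
delay_100nm = scaled_rc_delay(15.0, 500.0, 100.0)
print(f"{delay_100nm:.0f} ps")  # prints "375 ps"
```

The quadratic growth applies to wires whose length does not shrink, which is why the distinction between short local connections and long global wires matters for what follows.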
On the other hand, global wiring tends to increase in length with increasing levels of integration, implying that the interconnection delay of these wires will increase relative to intrinsic gate delay. Sylvester and Keutzer [21] concluded that the scaling of global wires will be increasingly unsustainable below about 180nm due to the rising RC delays of scaled-dimension conductors. As interconnect delay appears to be tolerably small in blocks of 50-100K gates, they argue for hierarchical implementation methodologies based on macro-blocks of this size. However, their results could equally be used to support a case for flat, locally connected organizations.

Previous studies on the effect of logic block size on performance in FPGAs have resulted in numerous recommendations as to the optimum LUT size in both clustered and non-clustered cases. The recommendations have tended to be small, ranging from 2-3 inputs [19], through 5-7 [27], to as high as 8-10 inputs [28]. However, the perception that fine-grained architectures (those with path widths of one or two bits) exhibit high routing overheads and poor routability [29] has resulted in a move by some towards a coarse-grained, “array-of-processors” approach [30].

Studies of commercial FPGAs [5] have demonstrated that logic clusters are typically configured with more routing inputs than are strictly necessary and that these, in turn, have more configuration bits than necessary [31]. This contributes to the fact that 80-90% of the area of a typical FPGA is occupied by the interconnect switches and wires while most of the remaining area goes into configuration memory. The actual logic function occupies only a few percent of the area in a typical device [20].

A final observation about FPGA geometries is that their components tend to be intrinsically large.
For example, DeHon has estimated the area of a “typical” 4-input LUT (4-LUT) to be roughly 600Kλ² if the programmable interconnect and configuration memory are included. Even with small values of λ, this figure severely limits the logic density of current and future FPGAs.

In summary, a “wish-list” of features for future FPGA architectures might include the following items:
• a simplified processing technology;
• flexible organizations that allow a tradeoff to be made between the routing and logic and that do not rely on global interconnections;
• an organization that reduces or hides the overhead imposed by reconfigurability;
• a very small footprint for logic and interconnect supporting a high density of components.
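To give a feel for the 4-LUT area estimate of roughly 600Kλ² quoted earlier in this section, the following sketch converts it into physical area and an upper bound on LUT density. The λ values in the loop are hypothetical illustrations, not figures from the paper, and the function names are introduced here only for the example. The calculation ignores pads, clock distribution and any other overhead.

```python
LUT_AREA_LAMBDA2 = 600e3  # 4-LUT area in units of λ², as quoted in the text


def lut_area_um2(lambda_nm):
    """Area of one 4-LUT (with interconnect and config memory) in µm²."""
    lambda_um = lambda_nm / 1000.0
    return LUT_AREA_LAMBDA2 * lambda_um ** 2


def luts_per_mm2(lambda_nm):
    """Upper bound on 4-LUTs per mm² (1 mm² = 1e6 µm²)."""
    return 1e6 / lut_area_um2(lambda_nm)


# Hypothetical λ values chosen only to show the trend.
for lambda_nm in (250, 50, 10):
    area = lut_area_um2(lambda_nm)
    density = luts_per_mm2(lambda_nm)
    print(f"λ = {lambda_nm:>3} nm: {area:10.1f} µm² per LUT, "
          f"{density:10.1f} LUTs/mm²")
```

With λ = 50nm, for instance, one 4-LUT occupies about 1500µm², or fewer than 700 LUTs per square millimetre, which illustrates why a much smaller cell footprint appears on the wish-list.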
3. Proposed Reconfigurable Cell
The cell that forms the basis of the proposed reconfigurable architecture uses a combination of double gate transistors and RTD-based configuration RAM. This section outlines the basic operation of these devices, and in particular describes how they might be exploited within a reconfigurable logic array.
3.1 Double Gate Transistors
The many problems associated with scaling MOS transistors are likely to result in the double-gate transistor becoming a preferred circuit element. Theoretically, these devices do not need channel doping and therefore can be scaled to dimensions below 10nm without running into problems of uncontrollable parameter variations due to the random distribution of dopant atoms [32]. It appears likely that most applications for double gate transistors will use them with the front and back gates at the same potential as this leads to the best performance as a switching device [33]. However, if the two gates can be accessed independently, one can be used to set the operating point of the transistor, thus affecting the behaviour of the other. This is the basis for the operation of the proposed cell.

To verify the operation of the double gate circuits, a number of SPICE-3 simulations were performed using the fully depleted SOI MOS (level 10) models developed at the University of Florida and provided by the nanoHub at Purdue [34]. Figure 1 demonstrates that the effect of changing the bias on the back gate (VG2) is to alter the threshold voltage at the front gate. When two complementary transistors are combined into an inverter circuit, altering VG2 therefore has the effect of modulating its switching threshold and gain. Figure 2 illustrates the results of the simulation performed on an SOI CMOS inverter under five operating conditions (VG2 = 1.5, 0.5, 0, -0.5 and -1.5V, VDD = 1V, no load). This simulation verified that altering the back gate bias moves the switching threshold of the inverter such that, at the two extremes, the output stays high (Vo>0.8V) or low (Vo