A LOW-ENERGY FPGA: ARCHITECTURE DESIGN AND SOFTWARE-SUPPORTED DESIGN FLOW 1
Konstantinos Siozios, Dimitrios Soudris and Antonios Thanailakis
Dep. of Electrical and Computer Engineering, Democritus University of Thrace, 67100, Xanthi, Greece
{ksiop, dsoudris, thanail}@ee.duth.gr
1 This work was partially supported by the project IST-34793-AMDREL, which is funded by the European Commission (http://vlsi.ee.duth.gr/amdrel)

ABSTRACT
The aim of the PhD thesis is the development of systematic methodologies, at both the hardware and the software level, for designing low-energy and performance-efficient reconfigurable systems. This problem is tackled through two design tasks, namely the design of an efficient CLB architecture and the development of the supporting CAD tools for mapping a VHDL-designed application onto the designed FPGA device.

1. FPGA ARCHITECTURE AND CAD TOOLS DEVELOPMENT
The first part of the thesis describes the design of the FPGA’s Configurable Logic Block (CLB) in 0.18 μm STM technology. During this stage, many parameters [1] that affect the CLB’s efficiency were taken into consideration. Among them are the appropriate cluster size for minimizing power consumption, the best performance to power consumption ratio for all the elements of the CLB, and the use of low-power techniques in the component design wherever possible.

LUT Inputs (K). The LUT is used for the implementation of logic functions. It has been demonstrated in [4] that 4-input LUTs lead to the lowest power consumption for the FPGA, while providing an efficient area-delay product.

CLB Inputs (I). An exploration of the number of CLB inputs needed to achieve 98% utilization of all the BLEs [4] shows an almost linear dependency on the number of LUT inputs and on the cluster size.

Cluster Size (N). The cluster size corresponds to the number of BLEs within a CLB. Taking into account mostly the minimization of power consumption, our design exploration showed that a cluster size of 5 BLEs minimizes power consumption [4].

The effectiveness of the choices made during the design was confirmed by the experimental results for the CLB’s efficiency.

Equally important to an FPGA platform is a tool set that supports the implementation of digital logic on the proposed FPGA. The second part of the thesis involves the development of easy-to-use tools capable of programming an FPGA consisting of the proposed CLB [2]. The tool flow comprises a sequenced set of steps employed in programming an FPGA chip, as shown in Fig. 1. The input is the RTL VHDL circuit description, while the output of the CAD flow is the bitstream file that can be used to configure the FPGA. The flow comprises three different types of tools: i) non-modified tools (E2FMT, SIS), ii) modified existing tools (PowerModel, T-VPack), and iii) new tools (DIVINER, DRUID, DUTYS, EX-VPR, DAGGER [3]). It is the first complete academic design flow that begins from an RTL description of the application and produces the actual configuration bitstream. Additionally, the proposed tool framework can be used in architecture-level exploration, i.e. in finding the appropriate FPGA array size (number of CLBs) and routing track parameters (SB, CB, etc.) for the optimal implementation of a target application. All the tools are open-source and can be run on-line through the AMDREL website [5].
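The architecture-level exploration mentioned above can be illustrated with a small sizing calculation. The following Python sketch is illustrative only and is not part of the AMDREL tools: K = 4 and N = 5 come from the text, while the function names, the square-array assumption, and the rule of thumb I = K/2 * (N + 1) for the number of CLB inputs are assumptions that are not stated in this paper (the rule is commonly quoted in the FPGA clustering literature as giving about 98% BLE utilization).

```python
import math
from typing import Tuple

K = 4  # LUT inputs, value chosen in the text
N = 5  # BLEs per CLB (cluster size), value chosen in the text


def clb_inputs_rule_of_thumb(k: int, n: int) -> int:
    """Estimate the number of CLB inputs I.

    Assumption: uses I = K/2 * (N + 1), a rule of thumb from the
    clustering literature, not a formula given in this paper.
    """
    return math.ceil(k * (n + 1) / 2)


def min_square_array(num_bles: int, cluster_size: int = N) -> Tuple[int, int]:
    """Return (CLBs needed, side of the smallest square CLB array).

    Assumption: the array is square and every CLB can be packed full,
    which real packing and routing constraints may not allow.
    """
    if num_bles <= 0:
        return 0, 0
    clbs = math.ceil(num_bles / cluster_size)
    side = math.isqrt(clbs - 1) + 1  # integer ceiling of sqrt(clbs)
    return clbs, side
```

For example, `min_square_array(123)` gives the CLB count and array side for a hypothetical design that maps to 123 BLEs; a real exploration would also vary the routing track parameters.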
Fig. 1. Proposed design framework
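As a compact summary of the framework, the tools named in the text can be grouped by their origin. The Python sketch below is only a bookkeeping aid under stated assumptions: the tool names and the three categories are taken from the text, whereas the `Origin` enum, the `TOOLS` table, and the helper function are hypothetical names introduced here. The order in which the tools run is shown in Fig. 1 and is deliberately not encoded.

```python
from enum import Enum
from typing import List


class Origin(Enum):
    UNMODIFIED = "non-modified existing tool"
    MODIFIED = "modified existing tool"
    NEW = "new tool"


# Tool names and categories as listed in the text; no execution order implied.
TOOLS = {
    "E2FMT": Origin.UNMODIFIED,
    "SIS": Origin.UNMODIFIED,
    "PowerModel": Origin.MODIFIED,
    "T-VPack": Origin.MODIFIED,
    "DIVINER": Origin.NEW,
    "DRUID": Origin.NEW,
    "DUTYS": Origin.NEW,
    "EX-VPR": Origin.NEW,
    "DAGGER": Origin.NEW,
}

FLOW_INPUT = "RTL VHDL"                       # entry point of the flow
FLOW_OUTPUT = "FPGA configuration bitstream"  # final product of the flow


def tools_by_origin(origin: Origin) -> List[str]:
    """Return the alphabetically sorted tool names of one category."""
    return sorted(name for name, o in TOOLS.items() if o is origin)
```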
All tools can be executed both from the command line and from the GUI presented in a following subsection. It should be noted that the proposed design framework possesses the following attractive features:
− Source description in C/C++ language
− Linux Operating System
− Input: RTL VHDL, Structural VHDL, EDIF, BLIF
− Output: FPGA Configuration Bitstream
− Implementation Process Technology Independence
− Portability (e.g. i386, SPARC)
− Min. requirements: x486, 64MB RAM, 30MB HD
− Modularity: each tool can run as a standalone tool
− Graphical User Interface
− Capability of running locally or remotely
Bitstream size comparison (chart; vertical axis from 0 to 40000). Benchmark labels recovered from the charts: b01, b02, b03, b04, b06, b07, b09, b10, b11, b13, umin_8bit, mux48, Subtract4, mux7, mux32, mux4, mux2_if, fft16pt, fft256pt, mul5and2, decrem9.
Fig. 2 shows the maximum frequencies obtained with the two frameworks and devices (the proposed framework with the AMDREL FPGA, and the Xilinx tools with Xilinx devices). It can be seen that both frameworks perform similarly, with the proposed one outperforming Xilinx in certain benchmarks and Xilinx outperforming the proposed one in others.
Power consumption comparison
Fig. 4 shows the results of applying the DAGGER strategy for partial bitstream reconfiguration to the proposed FPGA array for a number of benchmarks. The left bar for each benchmark in Fig. 4 shows the size, in bits, of the bitstream file required for reconfiguring the FPGA array, as derived from the EX-VPR tool. The DAGGER bitstream file size is the size of the configuration file produced by the DAGGER tool, which employs features such as compression and partial reconfiguration. Because the DAGGER bitstream file is smaller than the initial one, it needs fewer memory cells for storing the FPGA configuration. Additionally, it allows better utilization of the hardware resources, as it programs only the functional CLBs.
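The effect of programming only the functional CLBs can be illustrated with a simple bit count. The Python sketch below is a toy model under stated assumptions and is not DAGGER's actual algorithm or file format: `bits_per_clb` and `address_bits` are hypothetical parameters, the per-CLB address overhead is an assumed cost of partial reconfiguration, and compression is not modeled at all.

```python
from typing import List


def full_bitstream_bits(grid: List[List[bool]], bits_per_clb: int) -> int:
    """Bits needed when every CLB of the array is configured."""
    total_clbs = sum(len(row) for row in grid)
    return total_clbs * bits_per_clb


def partial_bitstream_bits(grid: List[List[bool]], bits_per_clb: int,
                           address_bits: int) -> int:
    """Bits needed when only functional (used) CLBs are configured.

    Assumption: each configured CLB carries an address of `address_bits`
    bits so that the configuration logic can locate it.
    """
    used_clbs = sum(1 for row in grid for used in row if used)
    return used_clbs * (bits_per_clb + address_bits)


# Hypothetical 4x4 array in which True marks a functional CLB.
example_grid = [
    [True, False, False, True],
    [False, False, False, False],
    [True, True, False, False],
    [False, False, True, True],
]
```

In this model the partial bitstream is smaller whenever the fraction of functional CLBs is below `bits_per_clb / (bits_per_clb + address_bits)`, which is consistent with the observation that sparsely used arrays benefit the most.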
Table 1. Qualitative comparison study

FEATURE               Proposed       Xilinx           Toronto  Alliance
Input Format          VHDL/Verilog   VHDL/Verilog     BLIF     VHDL
Synthesis             Yes            Yes              No       Yes
Power estimation      Yes            Yes              No       No
Area estimation       Yes            Yes              No       No
Archit. descript.     Yes            Yes              No       No
Placement             Yes            Yes              Yes      Yes
Routing               Yes            Yes              Yes      Yes
Bitstream             Yes            Yes              No       No
Back-annotation       No             Yes              No       No
GUI                   Yes            Yes              No       No
Access through HTTP   Yes            No               No       No
User Manual           Yes            Yes              Yes      Yes
OS                    Linux          Solaris/Windows  Solaris  Linux
Fig. 3. Power consumption comparison
Table 1 shows a qualitative comparison among the proposed design framework, a commercial flow and two other academic approaches. Based on the available features, the proposed design flow is the most complete academic framework and is, at least in terms of provided features, comparable with commercial tools.
Fig. 4. DAGGER bitstream file size (horizontal axis: benchmark; vertical axis: bitstream size; bars: initial bitstream size and DAGGER bitstream file)
Fig. 2. Maximum frequency comparison

Fig. 3 provides power consumption values for some of the benchmarks. It can be seen that the power consumption of the proposed architecture is somewhat greater than that of the Xilinx architecture for benchmarks after b14. This is due to the different numbers of LUTs in the Xilinx and AMDREL devices.

References

[1] V. Kalenteridis, et al., “An Integrated FPGA Design Framework: Custom Designed FPGA Platform and Application Mapping Toolset Development,” in 11th Reconfigurable Architectures Workshop (RAW 2004), pp. 138a, Apr. 26-27, 2004.
[2] K. Siozios, et al., “DAGGER: A Novel Generic Methodology for FPGA Bitstream Generation and its Software Tool Implementation,” in 12th Reconfigurable Architectures Workshop (RAW 2005), pp. 165b, Apr. 4-5, 2005.
[3] K. Siozios, et al., “A Novel FPGA Configuration Bitstream Generation Algorithm and Tool Development,” in Proc. of 13th FPL 2004, pp. 1116-1118.
[4] H. Kalenteridis, et al., “A complete platform and toolset for system implementation on fine-grain reconfigurable hardware,” Microprocessors and Microsystems, Vol. 29, 2005, pp. 247–259.
[5] http://vlsi.ee.duth.gr:8081