A Multiplier Generator for Xilinx FPGA's

Jasvinder Pal Singh
Cadence Design Systems (I) Pvt. Ltd., #A-1/B-8, Noida Export Processing Zone, Noida305, INDIA
[email protected]

Anshul Kumar, Shashi Kumar
Deptt. of Computer Sc. & Engineering, Indian Institute of Technology, Delhi, New Delhi-110 016, INDIA
anshul, [email protected]

1 Introduction

High level synthesizers produce technology independent designs as networks of RTL components. Such a design can be realised in a specific technology using a technology mapper. Most technology mappers take a Boolean network as input and therefore require the RTL components to be expanded into Boolean networks. A recently reported technology mapper [1] maps RTL networks onto LUT based FPGA's without expanding them and produces better results. However, there are components like multipliers, decoders, RAMs etc. which cannot be handled efficiently by general purpose mappers and require specialized tools called module generators. Module generators are specialised for a class of modules as well as a technology.

In this paper, we present a module generator which can produce a variety of multiplier designs for LUT based FPGA's. It incorporates algorithms for generating sequential, combinational and pipelined designs. The multiplier generator forms a part of the IDEAS [2] synthesis system. The different types of multipliers which can be generated have been included in the IDEAS component library, along with functions which estimate the CLB count and delays for the given size parameters and selected FPGA device. The multiplier generator generates designs for the XC3000 and XC4000 families of Xilinx FPGA devices. For the XC4000 family it takes advantage of the built-in dedicated carry logic to generate fast multipliers. The output of the generator is a netlist in terms of Xilinx XACT and XBLOX components, which is finally mapped onto the FPGA using the Xilinx XACT and XBLOX tools.

A few other multiplier generators/algorithms for FPGA's have been reported in the literature [3], [4]. These cater to a particular multiplier architecture rather than giving options for a tradeoff between area and delay: [3] generates a Radix-4 Booth Encoded Wallace tree multiplier [5] architecture, while [4] generates a pipelined architecture.

2 Overview of the Multiplier Generator

To cater to the different area/timing requirements of a design, the generator incorporates a number of multiplier generator algorithms which can be used to generate a high speed, a low cost or a high throughput multiplier. Figure 1 shows the block diagram of the multiplier generator.

Figure 1: Block Diagram of the Multiplier Generator

The IDEAS synthesiser produces designs which may contain up to 4 different types of multipliers drawn from the IDEAS component library. These are high speed, medium speed, low cost (slow) and high throughput (pipelined) multipliers. Functions to estimate the cost and delay for the various types of multipliers are specified in the IDEAS component library, which is accessed by the Library Manager to make suitable choices. The Datapart synthesiser selects the multiplier component to be included in the final netlist by looking at the constraints set by the user and the information obtained from the Library Manager.

The multiplier generator algorithms take advantage of the FPGA architecture features for efficiently mapping the multiplier designs. A number of techniques have been proposed for efficiently mapping the multiplier architectures onto CLBs [6], and a few of these have been implemented. These include a mapping for the Radix-4 Booth encoder, a mapping for the Carry Look Ahead (CLA) adder module, use of the built-in fast-carry logic of the Xilinx XC4000 family of devices, and clubbing the partial product generation logic with the adder logic for combinational designs. In the XC4000 family of devices each CLB includes high
Proceedings of the 9th International Conference on VLSI Design: VLSI in Mobile Communication 1063-9667/96 $10.00 © 1996 IEEE
9th International Conference on VLSI Design - January 1996
CLB Count 204 227 275 393 273 393 64 23
Delay (ns)/ Throughput (MHz) 500.5(ns)
Type I Delay CLB Levels CSA(3) ] N + M -1 (A) N + M -2 (A) CSA (4) CSA (X N -;‘+(MM [#A) WAL (3 WAL (41 N + MI2 (Al WAL (X) Q(log N ) WAL (P) ( N + M-2 ) / 2 (A)
N = M 12 12 16 16
(Aj
:i 6 8
23.9 MHz 10.3 MHz
Table 3: CLB Level Delay Functions for Various Architectures

Table 1: Mapping Multiplier Architectures on XC3000 FPGA's

In Tables 1, 2 & 3 the terms 3, 4, X and P stand for XC3000, XC4000, XBLOX and Pipelined designs respectively, and A and E stand for Analytical and Empirical respectively. In Table 1, the device used for mapping on XC3000 was 3090pc84-50, and that for XC4000 was 409Opg191-6 (for 16 bit designs) and 4003pc84-6 (for the remaining).
Type       CLB Estimation Function
CSA (3)    (N * M)/2 + (N-1) * M                       (A)
CSA (4)    0.9738 * N^2 + 2.0502 * N - 6.93            (E)
CSA (X)    1.0398 * N^2 + 0.3295 * N + 1.5             (E)
WAL (3)    1.7045 * N^2 + 1.7445 * N + 2.5455          (E)
WAL (4)    1.6017 * N^2 + 0.7478 * N - 4.8537          (E)
WAL (X)    1.8368 * N^2 + 5.9606 * N + 18.0976         (E)
WAL (P)    1.5 * N^2 + 2 * N - 2                       (E)

Table 2: CLB Estimation Functions for Various Architectures
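The estimation functions of Table 2 can be evaluated directly. The following is a minimal Python sketch, not the IDEAS library code: the names `CLB_ESTIMATORS` and `estimate_clbs` are hypothetical, and the assignment of each formula to a multiplier type is an assumption based on the order in which the entries appear in Table 2. The empirical functions are written in terms of N only, so the sketch assumes N = M for them. Under this assignment the functions give 204 CLBs for a 12 bit CSA(3), about 275 and 273 CLBs for 16 bit CSA(4) and CSA(X), and 64 CLBs for a 6 bit WAL(P), values which also appear among the CLB counts of Table 1.

```python
from typing import Callable, Dict, Optional

# Rows of Table 2. Which formula belongs to which type is an ASSUMPTION
# (taken from the order of the entries); A = analytical, E = empirical.
CLB_ESTIMATORS: Dict[str, Callable[[int, int], float]] = {
    "CSA(3)": lambda n, m: (n * m) / 2 + (n - 1) * m,             # A
    "CSA(4)": lambda n, m: 0.9738 * n**2 + 2.0502 * n - 6.93,     # E
    "CSA(X)": lambda n, m: 1.0398 * n**2 + 0.3295 * n + 1.5,      # E
    "WAL(3)": lambda n, m: 1.7045 * n**2 + 1.7445 * n + 2.5455,   # E
    "WAL(4)": lambda n, m: 1.6017 * n**2 + 0.7478 * n - 4.8537,   # E
    "WAL(X)": lambda n, m: 1.8368 * n**2 + 5.9606 * n + 18.0976,  # E
    "WAL(P)": lambda n, m: 1.5 * n**2 + 2 * n - 2,                # E
}


def estimate_clbs(arch: str, n: int, m: Optional[int] = None) -> float:
    """Estimated CLB count for an n x m multiplier of the given type.

    m defaults to n, since the empirical fits only depend on n.
    """
    if m is None:
        m = n
    return CLB_ESTIMATORS[arch](n, m)


if __name__ == "__main__":
    # Print the estimate of every type for a 16 x 16 multiplier.
    for arch in CLB_ESTIMATORS:
        print(f"{arch:7s} N=M=16: {estimate_clbs(arch, 16):7.1f} CLBs")
```

A synthesis tool could compare such estimates against the CLB capacity of the selected device before committing to an architecture; how the Library Manager actually does this is not described here.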
4 Conclusions

We have described a multiplier generation scheme which can cater for a range of area and cost requirements by using four different architectures. The Wallace tree multiplier architecture is advantageous only if fast adders are used for the final carry propagate adder stage. Pipelining the Wallace tree architecture results in increased throughput at the cost of more CLB's. It is also possible to incorporate pipelining in the Carry Save Adder multiplier architecture, which will result in a higher clock rate. Finally, we would like to mention that the equations for the delay values will be valid only if the design is mapped onto a single FPGA.
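The remark about the final adder can be illustrated by counting carry-save stages. The sketch below is an idealised model, not the generator's algorithm: it assumes n plain (non Booth encoded) partial product rows, 3:2 carry-save adders, and one unit of delay per stage. The function names are hypothetical.

```python
def wallace_stages(rows: int) -> int:
    """Number of 3:2 carry-save stages a Wallace tree needs to reduce
    `rows` partial product rows to 2 rows."""
    stages = 0
    while rows > 2:
        # Every full group of three rows becomes two rows;
        # the leftover rows (0, 1 or 2) pass through unchanged.
        rows = 2 * (rows // 3) + rows % 3
        stages += 1
    return stages


def array_stages(rows: int) -> int:
    """Number of carry-save stages in a linear carry-save array,
    which absorbs one new row per stage."""
    return max(rows - 2, 0)


if __name__ == "__main__":
    for n in (8, 12, 16):
        print(f"{n} rows: tree {wallace_stages(n)} stages, "
              f"array {array_stages(n)} stages")
```

In this model 16 rows need 6 tree stages against 14 array stages, but both structures still end in a carry propagate adder. If that adder is slow, its delay is added to both totals and the saving in carry-save stages becomes a small part of the overall delay.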
speed carry logic. There are two 4 input LUTs in each CLB which can be configured as a 2-bit adder with built-in carry. This dedicated carry circuitry is so fast and efficient that conventional speed-up methods like CLA are not of much use even up to 16 bits. For the XC4000 family of devices we use the dedicated carry logic by using an XBLOX module to generate the Carry Propagate Adder.
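The way a carry propagate adder is assembled from 2-bit CLB adders can be modelled behaviourally. The following Python sketch only mimics the arithmetic; it is not a netlist and does not use any XBLOX interface. The names `clb_add2` and `ripple_add` are hypothetical, and the model assumes one CLB per 2-bit slice with no extra CLBs for starting or ending the carry chain.

```python
from typing import Tuple


def clb_add2(a: int, b: int, carry_in: int) -> Tuple[int, int]:
    """Model of one CLB used as a 2-bit adder with carry in and carry out.
    Only the two low bits of a and b are used."""
    total = (a & 0b11) + (b & 0b11) + (carry_in & 1)
    return total & 0b11, total >> 2  # (2-bit sum, carry out)


def ripple_add(a: int, b: int, width: int) -> Tuple[int, int, int]:
    """Add two unsigned `width`-bit numbers by chaining 2-bit slices.
    Returns (sum modulo 2**width, carry out, number of slices used)."""
    slices = (width + 1) // 2
    result, carry = 0, 0
    for i in range(slices):
        s, carry = clb_add2(a >> (2 * i), b >> (2 * i), carry)
        result |= s << (2 * i)
    # Append the last carry, then split at bit position `width`
    # (for odd widths the top slice holds one spare bit).
    full = result | (carry << (2 * slices))
    return full & ((1 << width) - 1), (full >> width) & 1, slices


if __name__ == "__main__":
    print(ripple_add(0xFFFF, 1, 16))
```

Under these assumptions a 16-bit adder occupies 8 slices, with the carry passing from one slice to the next over the dedicated carry path.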
3 Results

All the multiplier generator algorithms have been implemented and the results are shown in Tables 1, 2 & 3, where N and M stand for the number of multiplier and multiplicand bits respectively. Table 1 gives a few representative results in terms of the number of CLB's and delay values for various architectures. As can be seen from the table, the Wallace tree architecture does not offer an appreciable speed advantage over the Carry Save Adder (CSA) architecture. This is because the delay of the final Carry Propagate Adder stage in the Wallace tree architecture dominates the gain from the reduced tree height. Table 1 also brings out the advantage of using the XBLOX tool in the case of the XC4000 family of devices. The difference will be more apparent for larger values of N. Tables 2 & 3 give the formulae for estimating the CLB count and delays respectively for the various types of multipliers. Some of these have been derived analytically, whereas the others have been obtained empirically using the actual data.

References

[1] A.R.Naseer, M.Balakrishnan and A.Kumar, "FAST: FPGA Targetted RTL Structure Synthesis Technique", Proceedings of 7th International Conference on VLSI Design, Jan. 1994, pp 21-24.
[2] Integrated Design Automation System: System Design Automation Lab, Deptt. of Computer Sc. & Engineering, IIT Delhi, Dec 1990.
[3] Suthikshn Kumar et al., "A Fast Multiplier Generator for FPGA's", Proc. of the 8th International Conference on VLSI Design, pp. 50-53, Jan 1995.
[4] M.E.Louie and M.D.Ercegovac, "A Variable Precision Multiplier Generator for Field Programmable Gate Arrays", Proc. ACM Second International Workshop on FPGA's, Feb 1994, Berkeley, CA.
[5] J.F.Ardekani, "M * N Booth Encoded Multiplier Generator Using Optimised Wallace Trees", IEEE Trans. on VLSI System, Vol 1, No 2, June 1993, pp 120-125.
[6] J.P.Singh, "A Multiplier Generator for IDEAS Synthesis System", M.Tech Thesis, Deptt. of Computer Sc. & Engineering, IIT Delhi, May 1995.