A Multiplier Generator for Xilinx FPGA's

Jasvinder Pal Singh, Anshul Kumar, Shashi Kumar

1 Introduction

High level synthesizers produce technology independent designs as networks of RTL components. Such a design can be realised in a specific technology using a technology mapper. Most technology mappers take a Boolean network as input and therefore require the RTL components to be expanded into Boolean networks. A recently reported technology mapper [1] maps RTL networks onto LUT based FPGA's without expanding them and produces better results. However, there are components like multipliers, decoders, RAMs etc. which cannot be handled efficiently by general purpose mappers and require specialized tools called module generators. Module generators are specialised for a class of modules as well as for a technology.

In this paper, we present a module generator which can produce a variety of multiplier designs for LUT based FPGA's. It incorporates algorithms for generating sequential, combinational and pipelined designs. The multiplier generator forms a part of the IDEAS [2] synthesis system. The different types of multipliers which can be generated have been included in the IDEAS component library, along with functions which estimate the CLB count and delays for the given size parameters and selected FPGA device. The multiplier generator generates designs for the XC3000 and XC4000 families of Xilinx FPGA devices. For the Xilinx XC4000 family of devices it takes advantage of the built-in dedicated carry logic to generate fast multipliers. The output of the generator is a netlist in terms of the Xilinx XACT and XBLOX components, which is finally mapped onto the FPGA using the Xilinx XACT and XBLOX tools.

A few other multiplier generators/algorithms for FPGA's have been reported in the literature [3], [4]. These cater to a particular multiplier architecture rather than giving options for a tradeoff between area and delay: [3] generates a Radix-4 Booth Encoded Wallace tree multiplier [5] architecture, while [4] generates a pipelined architecture.
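As background for the Radix-4 Booth encoding mentioned above, the following is a minimal behavioural sketch in Python. It is not the CLB mapping used by any of the cited generators; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical and chosen only for illustration. It assumes the multiplier is a signed two's-complement value that fits in the stated width, and shows how the multiplier is recoded into roughly half as many digits, each in {-2, -1, 0, 1, 2}, so that only about N/2 partial products have to be summed.

```python
from typing import List


def booth_radix4_digits(y: int, bits: int) -> List[int]:
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits.

    Digits are returned least significant first; digit k has weight 4**k.
    Assumes -2**(bits-1) <= y < 2**(bits-1).
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1        # Python's >> sign-extends negative ints
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)  # value in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_multiply(x: int, y: int, bits: int) -> int:
    """Sum the partial products selected by the Booth digits of y."""
    return sum(d * x * 4 ** k
               for k, d in enumerate(booth_radix4_digits(y, bits)))
```

For example, `booth_radix4_digits(6, 4)` yields the two digits `[-2, 2]`, i.e. 6 = -2 + 2 * 4, so a 4-bit multiplier needs two partial products instead of four.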

Jasvinder Pal Singh: Cadence Design Systems (I) Pvt. Ltd., #A-l/B-8, Noida Export Processing Zone, Noida-305, INDIA. [email protected]

Anshul Kumar, Shashi Kumar: Deptt. of Computer Sc. & Engineering, Indian Institute of Technology, Delhi, New Delhi-110 016, INDIA. anshul,[email protected]

2 Overview of the Multiplier Generator

To cater for the different area/timing requirements of a design, the generator incorporates a number of multiplier generator algorithms which can be used to generate a high speed, a low cost or a high throughput multiplier. Figure 1 shows the block diagram of the multiplier generator. The IDEAS synthesiser produces designs which may contain up to 4 different types of multipliers drawn from the IDEAS component library. These are high speed, medium speed, low cost (slow) and high throughput (pipelined) multipliers. Functions to estimate the cost and delay for the various types of multipliers are specified in the IDEAS component library, which is accessed by the Library Manager to make suitable choices. The Datapath synthesiser chooses the multiplier component to be included in the final netlist by looking at the constraints set by the user and at the information obtained from the Library Manager.

Figure 1: Block Diagram of the Multiplier Generator

The multiplier generator algorithms take advantage of the FPGA architecture features for efficiently mapping the multiplier designs. A number of techniques have been proposed for efficiently mapping the multiplier architectures onto CLBs [6], and a few of these have been implemented. These include a mapping for the Radix-4 Booth encoder, a mapping for the Carry Look Ahead (CLA) adder module, use of the built-in fast-carry logic of the Xilinx XC4000 family of devices and, for combinational designs, clubbing the partial product generation logic with the adder logic.

In the XC4000 family of devices each CLB includes high speed carry logic. There are two 4-input LUTs in each CLB which can be configured as a 2-bit adder with built-in carry. This dedicated carry circuitry is so fast and efficient that conventional speed-up methods like CLA are of not much use even up to 16 bits. For the XC4000 family of devices we use the dedicated carry logic through the XBLOX module for generating the Carry Propagate Adder.

3 Results

All the multiplier generator algorithms have been implemented and the results are shown in the tables below, where N and M stand for the number of multiplier and multiplicand bits respectively. Table 1 gives a few representative results obtained in terms of the number of CLB's and delay values for various architectures. As can be seen from the table, the Wallace tree architecture does not offer an appreciable speed advantage over the Carry Save Adder (CSA) architecture. This is because the final Carry Propagate Adder stage in the Wallace tree architecture dominates the reduced tree height. The table also brings out the advantage of using the XBLOX tool in the case of the XC4000 family of devices. The difference will be more apparent for larger values of N. Tables 2 and 3 give the formulae for estimating the CLB count and delays respectively for the various types of multipliers. Some of these have been derived analytically, whereas the others have been obtained empirically using the actual data.

In Tables 1, 2 & 3 the terms 3, 4, X, P stand for XC3000, XC4000, XBLOX and Pipelined designs respectively, and A, E stand for Analytical and Empirical respectively. In Table 1, for XC3000 the device used for mapping was 3090~~84-50 and that for XC4000 was 409Opg191-6 (for 16 bit designs) and 4003~~84-6 (for the remaining).

Table 1: Mapping Multiplier Architectures on XC3000 FPGA's

N = M                            12, 12, 16, 16, [illegible], 6, 8
CLB Count                        204, 227, 275, 393, 273, 393, 64, 23
Delay (ns) / Throughput (MHz)    500.5 ns, 23.9 MHz, 10.3 MHz

Table 2: CLB Estimation Functions for Various Architectures

Type      CLB Estimation Function
CSA(3)    (N * M)/2 + (N - 1) * M (A)
CSA(4)    0.9738 * N^2 + 2.0502 * N - 6.93 (E)
CSA(X)    1.0398 * N^2 + 0.3295 * N + 1.5 (E)
WAL(3)    1.7045 * N^2 + 1.7445 * N + 2.5455 (E)
WAL(4)    1.6017 * N^2 + 0.7478 * N - 4.8537 (E)
WAL(X)    1.8368 * N^2 + 5.9606 * N + 18.0976 (E)
WAL(P)    1.5 * N^2 + 2 * N - 2 (E)

Table 3: CLB Level Delay Functions for Various Architectures

Type              Delay (CLB Levels)
CSA(3)            N + M - 1 (A)
CSA(4)            N + M - 2 (A)
CSA(X)            [illegible] (A)
WAL(3), WAL(4)    N + M/2 (A)
WAL(X)            O(log N)
WAL(P)            (N + M - 2)/2 (A)

4 Conclusions

We have described a multiplier generation scheme which can cater for a range of area and cost requirements by using four different architectures. The Wallace tree multiplier architecture is of advantage only if fast adders are used for the final carry propagate adder stage. Pipelining in the Wallace tree architecture results in increased throughput at the cost of more CLB's. It is also possible to incorporate pipelining in the Carry Save Adder multiplier architecture, which will result in a higher clock rate. Finally, we would like to mention that the equations for the delay values will be valid only if the design is mapped onto a single FPGA.

References

[1] A.R.Naseer, M.Balakrishnan and A.Kumar, "FAST: FPGA Targetted RTL Structure Synthesis Technique", Proceedings of 7th International Conference on VLSI Design, Jan. 1994, pp 21-24.

[2] Integrated Design Automation System, System Design Automation Lab, Deptt. of Computer Sc. & Engineering, IIT Delhi, Dec 1990.

[3] Suthikshn Kumar et al., "A Fast Multiplier Generator for FPGA's", Proc. of the 8th International Conference on VLSI Design, pp. 50-53, Jan 1995.

[4] M.E.Louie and M.D.Ercegovac, "A Variable Precision Multiplier Generator for Field Programmable Gate Arrays", Proc. ACM Second International Workshop on FPGA's, Feb 1994, Berkeley, CA.

[5] J.F.Ardekani, "M * N Booth Encoded Multiplier Generator Using Optimised Wallace Trees", IEEE Trans. on VLSI Systems, Vol 1, No 2, June 1993, pp 120-125.

[6] J.P.Singh, "A Multiplier Generator for IDEAS Synthesis System", M.Tech Thesis, Deptt. of Computer Sc. & Engineering, IIT Delhi, May 1995.
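As an illustration of how the CLB estimation functions of Table 2 might be used when choosing a multiplier component, the following Python sketch transcribes the seven fits into a lookup table. The names `CLB_FITS`, `estimate_clbs` and `cheapest` are hypothetical and are not part of IDEAS; the sketch also assumes that the empirical fits, which are given in N only, apply to square multipliers (N = M), as in Table 1.

```python
from typing import Callable, Dict, Iterable, Optional

# Fits transcribed from Table 2. (A) = analytical, (E) = empirical.
# The empirical fits depend on N only; N = M is assumed for them.
CLB_FITS: Dict[str, Callable[[int, int], float]] = {
    "CSA(3)": lambda n, m: (n * m) / 2 + (n - 1) * m,              # (A)
    "CSA(4)": lambda n, m: 0.9738 * n**2 + 2.0502 * n - 6.93,      # (E)
    "CSA(X)": lambda n, m: 1.0398 * n**2 + 0.3295 * n + 1.5,       # (E)
    "WAL(3)": lambda n, m: 1.7045 * n**2 + 1.7445 * n + 2.5455,    # (E)
    "WAL(4)": lambda n, m: 1.6017 * n**2 + 0.7478 * n - 4.8537,    # (E)
    "WAL(X)": lambda n, m: 1.8368 * n**2 + 5.9606 * n + 18.0976,   # (E)
    "WAL(P)": lambda n, m: 1.5 * n**2 + 2 * n - 2,                 # (E)
}


def estimate_clbs(arch: str, n: int, m: Optional[int] = None) -> float:
    """Estimated CLB count for an n x m multiplier of the given type."""
    if m is None:
        m = n  # square multiplier by default
    return CLB_FITS[arch](n, m)


def cheapest(n: int, archs: Optional[Iterable[str]] = None) -> str:
    """Return the type with the smallest estimated CLB count for N = M = n."""
    candidates = list(archs) if archs is not None else list(CLB_FITS)
    return min(candidates, key=lambda a: estimate_clbs(a, n))
```

Under this transcription, `estimate_clbs("CSA(3)", 12)` evaluates to 204, and the CSA(4) and CSA(X) fits at N = 16 round to 275 and 273. All three values appear among the CLB counts listed in Table 1.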

Proceedings of the 9th International Conference on VLSI Design: VLSI in Mobile Communication, January 1996. 1063-9667/96 $10.00 © 1996 IEEE