
Mapping multi-mode circuits to LUT-based FPGA using embedded MUXes

Tim Courtney, Richard Turner, Roger Woods
Queen’s University Belfast
{t.courtney, r.h.turner, r.woods}@ee.qub.ac.uk
Programmable Systems Laboratory, Electrical & Electronic Engineering, Ashby Building, Stranmillis Road, Belfast, BT9 5AH, Northern Ireland

Abstract

For some systems, a general-purpose FPGA solution tends to be large and slow. A reconfigurable solution is smaller and faster but incurs a delay every time it is reconfigured. In this paper, embedded MUXes are used to achieve the performance of reconfiguration without the time penalty. For a CRC circuit, an area reduction of 93% compared with a general-purpose solution, and of 17-34% compared with similar software-compiled systems, is achieved.

1. Introduction

A model consisting of function units and a MUX has been proposed for reconfigurable circuits [1]: if a system can be mapped to this form, a reconfigurable solution can be applied. However, the delay penalty for reconfiguration is rising, because device size and configuration block size keep increasing while configuration bus width does not. For a simple circuit on a Xilinx XC2V8000, the reconfiguration penalty is likely to be at least 22500 times the circuit delay [2,3]. In many cases full generality is not required, and reducing flexibility has previously yielded an area gain for DSP circuits [4]. This paper proposes a design strategy that avoids the reconfiguration penalty without incurring the overhead of general-purpose circuitry, and applies it first to a simple example and then to a 10-polynomial, 32-bit parallel CRC system.

2. Embedded MUX technique

A simple circuit containing a MUX that selects between a 5-input AND and a 5-input OR was described in VHDL and synthesised using Synplify Pro 7.0, giving the circuit shown in Figure 1. This circuit uses four LUTs.

Figure 1. Virtex circuit synthesised using Synplify Pro

The MUX at the output of Figure 1 indicates that a reconfigurable implementation can be used; in that case, two LUTs would implement the 5-input AND or the 5-input OR respectively. The embedded MUX circuit is shown in Figure 2. The method identifies the common features of the alternative functions (here, associativity and shared inputs) and partitions the design so that resource is left in each LUT to implement the MUX. This halves the area, from four LUTs to two.

Figure 2. Embedded MUX circuit

3. Background on the Parallel CRC

The detailed workings of the CRC decoder can be found in [5]; it is based on the modulo-2 division operation. A full treatment of the parallel circuit can be found in [6], but a brief introduction is included here.

In the parallel CRC decode circuit, the incoming data word is broken into blocks of j bits, and the operations relating to these bits are performed in parallel. Thus an m-bit data word is processed in m/j clock cycles. The main processing element in the parallel circuit is an array of XOR gates. The number of XOR gates and their connectivity are defined by the CRC generator polynomial and therefore differ from polynomial to polynomial, although every XOR array for a fixed generator length has the same number of inputs and outputs. This connectivity is obtained from a series of modulo-2 matrix multiplications based on the generator polynomial. The resulting matrix is of size j by m and, for j = 8 and m = 32 as considered here, requires 4388 four-input LUTs to implement on a Virtex FPGA. The general structure of the parallel CRC circuit is given in Figure 3; Figure 4 then shows a particular example for the generator x⁴ + x³ + 1 with 4-bit input blocks.

Figure 3. General picture of the parallel CRC
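The two-LUT decomposition of the AND/OR example in Section 2 can be checked exhaustively. The sketch below is a minimal Python model, not the authors' VHDL. It assumes 4-input LUTs modelled as 16-entry truth tables, that select input s = 0 chooses the AND and s = 1 the OR (the paper does not state the polarity), and that the first LUT takes s, a, b, c while the second takes s, d, e and the first LUT's output. The helper names are hypothetical. Because AND and OR are both associative and the two alternatives share the same inputs, the select line can be folded into each LUT instead of occupying a separate MUX.

```python
from itertools import product

def make_lut4(fn):
    """Model a 4-input LUT by precomputing the 16-entry truth table of fn."""
    table = {ins: fn(*ins) & 1 for ins in product((0, 1), repeat=4)}
    return lambda w, x, y, z: table[(w, x, y, z)]

# ASSUMPTION: s = 0 selects the AND, s = 1 selects the OR.
# LUT 1 (inputs s, a, b, c): AND or OR of a, b, c, chosen by s.
lut1 = make_lut4(lambda s, a, b, c: (a | b | c) if s else (a & b & c))
# LUT 2 (inputs s, x, d, e): applies the same operation to LUT 1's output x.
lut2 = make_lut4(lambda s, x, d, e: (x | d | e) if s else (x & d & e))

def embedded(s, a, b, c, d, e):
    """Two-LUT version: the MUX is absorbed into both LUTs."""
    return lut2(s, lut1(s, a, b, c), d, e)

def reference(s, a, b, c, d, e):
    """Direct behaviour: a 2-way MUX between a 5-input AND and a 5-input OR."""
    return (a | b | c | d | e) if s else (a & b & c & d & e)

def exhaustive_check():
    """Compare both versions on all 2**6 input combinations."""
    return all(embedded(*v) == reference(*v) for v in product((0, 1), repeat=6))

if __name__ == "__main__":
    print("two-LUT embedded MUX matches reference:", exhaustive_check())
```

Each LUT in this sketch uses all four of its inputs, which is what makes the 5-input pair fit in two LUTs once the select line is shared.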

Proceedings of the 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM’02) 1082-3409/02 $17.00 © 2002 IEEE

4. Implementation details

A 10-polynomial system is considered, as this gives some flexibility without being too large. The XOR-array matrices for each of the 10 polynomials were derived using Matlab™, and the circuits for the XOR-array outputs were derived from them. The following paragraphs show only the circuits for the third output of the XOR array; the structure and design method for the other outputs were similar. The hexadecimal representations of the polynomials are: FA8FC37F, F6D15C19, DACEC37F, A39431B7, F76D83AD, 8A6C8B65, 93A5A6DD, A58B5D9B, C7DCACA5 and C7A9CF8D.
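The XOR-array matrices above were derived in Matlab™; the Python sketch below shows one standard way to obtain such matrices, offered as an illustration rather than the authors' script (their own matrix arrangement may differ). It assumes an MSB-first CRC register of width k with no initial value or final XOR, and writes one clock of the bit-serial circuit as the GF(2) update next = A·state ⊕ p·u. Repeating this j times gives next = A^j·state ⊕ B·block, where column t of B is A^(j-1-t)·p, so each row of [A^j | B] is the XOR equation for one output of the parallel array. All function names are hypothetical. The example uses G = x⁴ + x³ + 1 with j = 4, as in Figure 4, and also exercises a 32-bit case with j = 8 by treating 0xFA8FC37F as the coefficients of x⁰ to x³¹ with an implicit x³² term; the paper does not say how its hexadecimal values encode the polynomials, so that reading is an assumption.

```python
import random

def gf2_matmul(X, Y):
    """Multiply two 0/1 matrices over GF(2): AND for products, XOR for sums."""
    return [[sum(X[r][t] & Y[t][c] for t in range(len(Y))) & 1
             for c in range(len(Y[0]))] for r in range(len(X))]

def gf2_matvec(X, v):
    """Multiply a 0/1 matrix by a 0/1 vector over GF(2)."""
    return [sum(x & y for x, y in zip(row, v)) & 1 for row in X]

def serial_step(state, bit, poly, k):
    """One clock of a bit-serial, MSB-first CRC register (no init or final XOR).
    poly holds the generator coefficients x^0..x^(k-1); x^k is implicit."""
    feedback = ((state >> (k - 1)) & 1) ^ bit
    state = (state << 1) & ((1 << k) - 1)
    return state ^ poly if feedback else state

def parallel_matrices(poly, k, j):
    """Return (Aj, B) with next_state = Aj*state XOR B*block over GF(2)."""
    p = [(poly >> i) & 1 for i in range(k)]
    A = [[0] * k for _ in range(k)]
    for i in range(1, k):
        A[i][i - 1] = 1              # shift: bit i takes the old bit i-1
    for i in range(k):
        A[i][k - 1] ^= p[i]          # feedback from the old MSB
    powers = [[[int(r == c) for c in range(k)] for r in range(k)]]  # A^0 = I
    for _ in range(j):
        powers.append(gf2_matmul(powers[-1], A))    # A^1 .. A^j
    # Input bit t (t = 0 arrives first) enters through A^(j-1-t) p.
    cols = [gf2_matvec(powers[j - 1 - t], p) for t in range(j)]
    B = [[cols[t][i] for t in range(j)] for i in range(k)]
    return powers[j], B

def parallel_step(state, block, Aj, B):
    """Process a whole j-bit block in one step using the XOR-array matrices."""
    k = len(Aj)
    s = [(state >> i) & 1 for i in range(k)]
    nxt = [x ^ y for x, y in zip(gf2_matvec(Aj, s), gf2_matvec(B, block))]
    return sum(bit << i for i, bit in enumerate(nxt))

def xor_equations(Aj, B):
    """Spell out each next-state bit as an XOR of state bits s* and input bits u*."""
    eqs = []
    for i, (arow, brow) in enumerate(zip(Aj, B)):
        terms = [f"s{c}" for c, v in enumerate(arow) if v]
        terms += [f"u{t}" for t, v in enumerate(brow) if v]
        eqs.append(f"next{i} = " + " ^ ".join(terms))
    return eqs

def check(poly, k, j, trials=200, seed=1):
    """Compare one parallel step with j serial steps on random states and blocks."""
    Aj, B = parallel_matrices(poly, k, j)
    rng = random.Random(seed)
    for _ in range(trials):
        state = rng.getrandbits(k)
        block = [rng.getrandbits(1) for _ in range(j)]
        serial = state
        for bit in block:
            serial = serial_step(serial, bit, poly, k)
        if parallel_step(state, block, Aj, B) != serial:
            return False
    return True

if __name__ == "__main__":
    # G = x^4 + x^3 + 1 with j = 4 (Figure 4): low coefficients x^3 + 1 = 0b1001.
    Aj, B = parallel_matrices(0b1001, 4, 4)
    for eq in xor_equations(Aj, B):
        print(eq)
    # ASSUMPTION: 0xFA8FC37F read as x^0..x^31 coefficients with implicit x^32.
    print(check(0b1001, 4, 4), check(0xFA8FC37F, 32, 8))
```

Because every polynomial of the same degree yields matrices of the same shape, the ten XOR arrays differ only in which terms appear in each equation, which is the property the embedded MUX design exploits.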


Figure 4. Parallel CRC circuit for G = x⁴ + x³ + 1, with j = 4

Because of the logic resources of the Xilinx Virtex device used in this work, it makes sense to split the 10-way MUX (Figure 5) into a tree of 2-way MUXes. This splitting results in five groups of two circuits to implement. Each group is then implemented as a circuit with two possible outputs, selected between by a MUX, as shown in Figure 6; this output then feeds into the rest of the MUX tree. Simple hardware sharing has been used in the implementation of these five circuits: if a previously designed block provides the required output, it is reused. In Figure 6, the three LUT inputs labelled X, Y and Z are the outputs of three LUTs that are hardware shared. In the five circuits, the aim is to embed as many MUXes as possible into the LUTs alongside the XOR gates, eliminating redundancy in the first level of logic.

Four versions of the 10-polynomial CRC have been implemented. Two used commercial tools to compile behavioural VHDL to EDIF netlists, one used ten constant-polynomial circuits and MUXed between them, and the last was the proposed embedded MUX structure, written in structural VHDL. The first three were compiled with a low target speed (1 MHz) to allow the tools to optimise for area; the target was then changed to 35 MHz for place and route. Table 1 shows that the embedded MUX technique creates the smallest circuit.

Figure 5. Circuits for Output(3): a 10-way MUX selecting one of ten XOR functions of the 8-bit input a, b, c, d, e, f, g, h, namely a⊕d⊕e⊕f, b⊕d, a⊕c⊕h, a⊕b⊕c⊕g, a⊕c⊕d⊕f, c⊕d, b⊕c⊕d⊕h, d⊕f, a⊕b⊕c⊕d⊕g and d⊕g⊕h
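The split of the 10-way MUX into 2-way MUXes can be checked in software. In the minimal Python sketch below, each Output(3) function from Figure 5 is written as the set of inputs it XORs, and a tree of 2-way MUXes (five first-level groups chosen between by sel0, then sel1, sel2 and sel3) is compared exhaustively against a flat 10-way selection. The mapping of functions to select codes, the binary encoding of sel0 to sel3 and the exact tree shape are assumptions for illustration; the tree actually built in Figure 6 may be arranged differently, and the sketch does not model the hardware sharing of X, Y and Z. Function names are hypothetical.

```python
from itertools import product

# Output(3) functions from Figure 5, each written as the inputs it XORs.
# ASSUMPTION: this order (and so the select code that picks each function)
# is illustrative; it is not taken from the paper.
FUNCS = ["adef", "bd", "ach", "abcg", "acdf",
         "cd", "bcdh", "df", "abcdg", "dgh"]
INPUTS = "abcdefgh"

def xor_of(names, values):
    """XOR together the named input bits."""
    acc = 0
    for n in names:
        acc ^= values[n]
    return acc

def mux2(sel, in0, in1):
    """2-way MUX: in0 when sel is 0, in1 when sel is 1."""
    return in1 if sel else in0

def output3_flat(code, values):
    """Reference: one 10-way MUX indexed directly by the select code 0..9."""
    return xor_of(FUNCS[code], values)

def output3_tree(code, values):
    """Same selection built only from 2-way MUXes."""
    sel0, sel1, sel2, sel3 = ((code >> b) & 1 for b in range(4))
    # Level 1: five groups of two circuits, each group chosen between by sel0.
    groups = [mux2(sel0, xor_of(FUNCS[2 * g], values),
                   xor_of(FUNCS[2 * g + 1], values)) for g in range(5)]
    # Level 2: sel1 pairs up groups 0/1 and 2/3; group 4 passes through.
    pair01 = mux2(sel1, groups[0], groups[1])
    pair23 = mux2(sel1, groups[2], groups[3])
    # Levels 3 and 4: sel2, then sel3, finish the tree.
    return mux2(sel3, mux2(sel2, pair01, pair23), groups[4])

def check_tree():
    """Exhaustively compare tree and flat MUX for every code and input vector."""
    for code in range(10):
        for bits in product((0, 1), repeat=len(INPUTS)):
            values = dict(zip(INPUTS, bits))
            if output3_tree(code, values) != output3_flat(code, values):
                return False
    return True

if __name__ == "__main__":
    print("2-way MUX tree matches 10-way MUX:", check_tree())
```

Only the first level of this tree chooses between XOR functions directly, which is where the paper embeds MUXes into the XOR LUTs; the remaining levels only select among already-computed group outputs.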

Figure 6. The circuit as implemented

Table 1. Comparison of circuits

Circuit             Area (LUTs)   Target speed (MHz)   Relative area   Max. speed (MHz)
Synplify Pro        375           35                   1.20            48.0
Xilinx Foundation   368           35                   1.50            61.2
10 + MUX            399           35                   1.28            –
Proposed            312           35                   1.00            35.5

Acknowledgements

The authors acknowledge the support of the Engineering and Physical Sciences Research Council (grant GR/98909).

6. References

[1] N. Shirazi, W. Luk, P. Cheung, “Automating Production of Run-Time Reconfigurable Designs”, Proc. IEEE Symp. on FCCM 1998, pp. 147-156, April 1998.
[2] T. Courtney, R. Turner, R. Woods, “Multiplexer Based Reconfiguration for Virtex Multipliers”, Field Programmable Logic and Applications, pp. 749-758, August 2000.
[3] “Virtex II Platform FPGA handbook”, p. 387, published online at www.xilinx.com, revised Dec. 2001.
[4] T. Courtney, R. Turner, R. Woods, “Implementation of Fixed Coefficient DSP functions using the reduced coefficient multiplier” (invited paper), IEEE Conf. on Acoustics, Speech and Signal Processing, Special Session “Configurable Computing for DSP”, Volume II, spec-L3.1, Salt Lake City, 7-11 May 2001.
[5] A. S. Tanenbaum, “Computer Networks”, Prentice Hall, 1981, pp. 128-131.
[6] T. Pei, C. Zukowski, “High-Speed Parallel CRC Circuits in VLSI”, IEEE Transactions on Communications, Vol. 40, No. 4, April 1992.
