Reconfigurable Hardware: A New Paradigm for Digital Signal Processing Implementation

Lars Lundheim
Høgskolen i Sør-Trøndelag, N-7005 Trondheim, Norway
E-mail: [email protected], URL: http://www.hist.no/IN/IET/tilsette/lars.htm

ABSTRACT

Reconfigurable hardware offers new degrees of freedom to the digital signal processing engineer. Necessary terms and concepts are reviewed, and an implementation of a neural net is given as an example.

1. INTRODUCTION

For fast implementations of digital signal processing algorithms, one has traditionally had to choose between two alternatives: either to realize the algorithm as a program on a general-purpose DSP chip, or to go for a hard-wired solution based on one or several custom-designed ASICs. The software alternative has the advantages of flexibility, short development time and relatively cheap components. An ASIC-based solution is faster and may be more compact, but takes longer to make and costs more. It is therefore used only for high-volume production and in applications with high demands on speed, weight etc.

This situation is now changing. During the last decade several families of so-called Field Programmable Gate Arrays (FPGAs) have been introduced. These are digital integrated circuits whose function can be specified by the user, and changed afterwards. In a few milliseconds an FPGA can be reconfigured from an FFT processor, say, to a median filter, with processing speed comparable to an ASIC. An FPGA is a component combining the best of two worlds: software versatility and ASIC performance. Several processor boards using this new technology have been designed at various laboratories.

In the following we give a short introduction to FPGAs and to reconfigurable processors based on them. Then we present an example of an implementation of a signal processing algorithm (a neural network for a high energy physics application). Finally, a comparison with traditional methods is given.

2. RECONFIGURABLE PROCESSORS

2.1. Field Programmable Gate Arrays

A traditional (mask-programmed) gate array is an Application Specific Integrated Circuit (ASIC) consisting of a large array of logic gates whose logic behaviour and interconnection are determined in the last processing stage at the chip manufacturer. By using tools provided by the manufacturer, a user can specify this last processing stage, giving the circuit a desired behaviour. Signal processing algorithms can be implemented in this way when DSPs are not fast enough, or when a high production volume makes them less economical. Gate arrays are expensive, and the time from design to delivery of processed chips may be several weeks.

In the second half of the eighties, a new kind of gate array was introduced. In contrast to the traditional variety, the Field Programmable Gate Arrays were programmable in the field, i.e. by the user, not at the silicon foundry. The obvious advantage is that the user neither has to wait for the manufacturer to perform the extra processing step nor has to pay extra for it.

An FPGA runs in two modes. In the configuration mode a bit stream is fed to the chip, specifying the logic behaviour it will have when switched to its run mode. Today most FPGAs are reprogrammable. This means that the configuration pattern of the chip may be deleted, and a new configuration bit stream entered.
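The two operating modes can be illustrated with a toy software model. This is only a sketch under stated assumptions: the class ToyFPGA, the helper bitstream_for and the idea of treating the whole chip as a single truth table are hypothetical simplifications introduced here, not a description of any real device or vendor tool.

```python
class ToyFPGA:
    """Toy model of a reprogrammable device with a configuration mode and a run mode."""

    def __init__(self, n_inputs):
        self.n_inputs = n_inputs
        self.truth_table = None  # nothing loaded yet

    def configure(self, bitstream):
        """Configuration mode: load one output bit per input combination."""
        if len(bitstream) != 2 ** self.n_inputs:
            raise ValueError("bitstream length must be 2**n_inputs")
        self.truth_table = list(bitstream)  # overwrites any earlier configuration

    def run(self, inputs):
        """Run mode: look up the output bit for a tuple of input bits."""
        if self.truth_table is None:
            raise RuntimeError("device has not been configured")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return self.truth_table[index]


def bitstream_for(func, n_inputs):
    """Hypothetical helper: tabulate func over all input combinations."""
    stream = []
    for index in range(2 ** n_inputs):
        bits = tuple((index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs))
        stream.append(func(bits) & 1)
    return stream


chip = ToyFPGA(n_inputs=3)

# First configuration: 3-input parity.
chip.configure(bitstream_for(lambda b: sum(b) % 2, 3))
parity_out = chip.run((1, 1, 0))

# Reconfiguration: the same device now computes 3-input majority.
chip.configure(bitstream_for(lambda b: int(sum(b) >= 2), 3))
majority_out = chip.run((1, 1, 0))
```

The same object answers the same input differently before and after the second call to configure, which is the property that distinguishes a reprogrammable FPGA from a mask-programmed gate array.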

2.2. The Xilinx XC3000 Series Logic Gate Arrays

Two of the most used families of FPGAs today are the XC3000 and XC4000 series from Xilinx. We will use the XC3000 series as an example of a typical FPGA in this paper. An FPGA from this family is called a Logic Cell Array (LCA). It consists of a regular array of Configurable Logic Blocks (CLBs). Each CLB can implement one or two combinational logic functions of up to five variables. The outputs of these functions can be stored in two flip-flops. Between the rows and columns of CLBs are routing channels with various resources connecting the CLBs together. These also provide access to the I/O blocks (IOBs) situated along the perimeter of the chip. The IOBs are connected to the pins of the package and can be programmed as input, output, or disabled. The XC3000 LCAs come in various sizes from 64 to 320 CLBs, corresponding to 2000 to 9000 gate equivalents. For a more comprehensive introduction to FPGAs the tutorial by Rose et al. [1] is recommended.
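A CLB can be thought of as lookup tables followed by storage elements. The sketch below is a deliberately simplified model, assuming two independent 32-entry tables over the same five inputs; the names ToyCLB and tabulate are hypothetical, and the model ignores routing and any restrictions the real device places on how the two functions share inputs.

```python
class ToyCLB:
    """Simplified logic block: two 5-input lookup tables feeding two flip-flops."""

    N_INPUTS = 5

    def __init__(self, table_f, table_g):
        size = 2 ** self.N_INPUTS
        if len(table_f) != size or len(table_g) != size:
            raise ValueError("each lookup table needs 32 entries")
        self.tables = (list(table_f), list(table_g))
        self.flip_flops = (0, 0)  # stored outputs, cleared at start

    def combinational(self, inputs):
        """Evaluate both functions without touching the flip-flops."""
        if len(inputs) != self.N_INPUTS:
            raise ValueError("expected five input bits")
        index = 0
        for bit in inputs:
            index = (index << 1) | (bit & 1)
        return tuple(table[index] for table in self.tables)

    def clock(self, inputs):
        """On a clock edge, capture both function outputs in the flip-flops."""
        self.flip_flops = self.combinational(inputs)
        return self.flip_flops


def tabulate(func, n_inputs=5):
    """Hypothetical helper: build a lookup table from a Python function."""
    table = []
    for index in range(2 ** n_inputs):
        bits = tuple((index >> (n_inputs - 1 - k)) & 1 for k in range(n_inputs))
        table.append(func(bits) & 1)
    return table


# F = AND of all five inputs, G = parity (XOR) of all five inputs.
clb = ToyCLB(tabulate(lambda b: int(all(b))), tabulate(lambda b: sum(b) % 2))

stored = clb.clock((1, 1, 1, 1, 1))            # both outputs captured
peek = clb.combinational((1, 0, 1, 1, 1))      # evaluated, but not stored
still_stored = clb.flip_flops
```

Changing the two tables changes the logic function without changing the structure of the block, which is what the configuration bit stream does for every CLB on the chip.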

2.3. Architectures

In itself an FPGA may be used as a component in a signal processing system. An example of such use is given by Klock et al. [2], where a filter bank is implemented in a Xilinx XC4000 chip. Such ad hoc use of FPGAs will probably become frequent in the future. In this paper we are more interested in the use of general processor architectures composed of several FPGAs and RAM modules connected by buses or so-called Field Programmable Interconnection Chips (FPICs). More than 50 different processors of this type have been developed at research laboratories all over the world, and some of these are now emerging on the commercial market. Although the processors are similar in concept, different names have been chosen, such as Programmable Active Memory (see Section 3 below), Functional Memory [3], Virtual Computer [4], ZAREPTA [5], Configurable Matrix [6] and Enable++ [7], to mention a few (see footnote 1).

The user friendliness and efficiency of the development tools are as important as the computational power measured in number of gates, size of RAM, buses etc. Manufacturers of FPGAs usually provide tools for placement and routing of designs, together with libraries of modules such as counters, adders etc. These tools are not sufficient for a smooth design and debugging environment. For example, if high density and high speed are needed, the placement of the necessary logic functions on the chip must be specified by the designer. Additional development tools are usually designed in parallel with the reconfigurable processors. This work is at least as important as developing the processor boards.

Footnote 1: A quickly growing list can be found at the WWW site http://uts.cc.utexas.edu/ guccione/HW list.html.

3. DECPERLE-1

We will now take a closer look at one particular reconfigurable processor. This board, DecPeRLe-1, was developed by DEC, Paris Research Laboratory (PRL) in 1992 as a co-processor for Digital's workstations. It is an example of what PRL calls a Programmable Active Memory (PAM): "Memory" because from the host workstation it can be read and written just like a memory cell, "Active" because, in contrast to ordinary memories, the data are modified between write and read, and "Programmable" because the way the data are modified can be decided by the user. For an extensive description of the hardware see [8].

The board consists of 23 XC3090-100 Xilinx FPGAs (also called Logic Cell Arrays, LCAs) and four 1 MB SRAM banks. The units are interconnected by buses as shown in the very simplified Figure 1. The FPGAs in Figure 1 are represented by squares. The capital letter inside the square denotes the function of the chip:

M: 16 of the chips are arranged in a 4x4 array, called the computational matrix of the board. Each of the matrix chips can communicate with its nearest neighbours through 16 wires. This part of the board is responsible for most of the computation performed by the board.

S: In addition to the direct connections between the matrix chips, each chip has a 16-bit connection to each of four 64-bit-wide buses (North, South, East, West). These buses end up in the so-called switches. The switch chips can then be used to connect the bus lines to the memory banks (N, S, E, W) or to the two 32-bit-wide North East and South West data buses. A switch chip also connects the two FIFOs with the data buses.

C: Two chips are responsible for generating address and control signals for the RAMs, as well as other necessary control signals. These are called controllers.

One should note that even if the three classes of FPGA chips are intended for different uses, they are all identical and completely programmable by the user.

The board communicates with a host via two FIFOs. These may be accessed at run time either by I/O calls on a word-by-word basis or by DMA. Along with DecPeRLe-1 a suite of software tools has been developed. These allow the user to specify a design in detail using a C++ library, and

[Figure 1: simplified block diagram of DecPeRLe-1; labels surviving from the figure: adrN, adrE, C, S, FIFOs, Host adapter]