On Retargeting with FPGA Technology

Zeljko Zilic and Zvonko Vranesic
Department of Electrical and Computer Engineering
University of Toronto

Abstract

The paper describes the experience and presents the results of a retargeting exercise aimed at replacing the PAL-implemented control circuitry of a high-performance multiprocessor with FPGA devices. While straightforward retargeting is easy to perform, it does not give good results with respect to speed requirements. In order to attain the desired speed performance, it is necessary to redesign the sequential circuits found in the original design.

1.0 Introduction

Field Programmable Gate Arrays (FPGAs) have matured as a technology that is used in numerous applications where low cost of implementation and short development time are of the essence. However, these applications have typically excluded cases where speed of operation is critical, because FPGAs are significantly slower than other user-programmable devices such as PALs. From a practical point of view, it is interesting to consider using FPGAs in as speed-demanding an environment as possible. We have done so in the context of a high-performance multiprocessor system, called Hector [8], which is being developed at the University of Toronto. A prototype Hector machine, which is now operational, contains a large number of PALs and hence requires a large printed circuit board for each processor module. PALs were used in order to meet the timing requirements imposed by the 16.66 MHz system clock. Having a stable design for the Hector prototype, we are now assessing the suitability of recently introduced FPGAs for implementing the circuitry that is now realized with PALs, in the hope of making extensive use of FPGAs in the next version of the Hector multiprocessor. This paper presents the results of this exercise and describes some design decisions that have to be made when attempting to use FPGAs in a high-speed environment.

An interesting aspect of our exercise is that we have not done a completely new design with FPGAs in mind. Instead, we used the retargeting approach, whereby an existing design comprising PAL, LSI and MSI logic is redesigned in as automatic a way as possible using the FPGA technology. Such retargeting is of considerable interest, particularly if it can be done quickly. Section 2 describes the retargeting environment used. Although the environment helped us achieve rapid development, special measures had to be taken in redesigning the sequential logic in order to meet the critical timing constraints. The techniques used in retargeting the state machines are given in Section 3. Section 4 highlights the results obtained.

2.0 The Retargeting Process

Retargeting is a technologically complex process that cannot be done without specialized tools. We will use the notation introduced by Gajski and Kuhn [5] and Figure 1 to describe the retargeting procedure in terms of design synthesis [7]. The Y-chart depicts how each element in a system can be described in the behavioural, structural and physical domains. The level of abstraction increases with distance from the center point, and mappings between the domains are referred to as synthesis, generation, extraction and analysis processes. The original design process consists of a series of mappings from the behavioural description to a set of partitioned final design elements. The retargeting process starts from the structural (which elements are used) and physical (how they are connected) information about the existing design and produces a new design in a new technology. Often, the original design is given in more than one technology and more than one data format, and the retargeting environment must recognize all of them.

[Figure 1: Y-charts with behavioural, structural and physical domains, comparing the original design flow (behavioural synthesis, structural synthesis, partitioning, technology mapping) with retargeting.]

FIGURE 1. Y-Chart Description of Retargeting

2.1 The Retargeting Environment

The task of logic synthesis (mapping from the behavioural to the structural domain) is delicate and depends heavily on the technology used. For example, PAL-based designs rely on two-level synthesis methods, since PALs contain a level of AND gates followed by a level of OR gates, while FPGA-based designs tend to have multiple levels, because FPGAs contain a set of logic blocks that can be interconnected in arbitrary ways. The logic synthesis methods are thus very different, and the retargeting procedure must deal with the details of all the technologies involved. In our case, we employed the retargeting program CORE [3] from Exemplar Logic Inc., often referred to simply as Exemplar. This is a complete retargeting environment that can produce designs in gate array, FPGA and several proprietary netlist notations. Inputs can be given in various behavioural, structural and physical domain description forms, such as VHDL, Palasm, PLA format (also known as ESPRESSO), Xilinx XNF, etc. The environment uses its own integration language, the Exemplar Integration Language (EIL), to merge the design descriptions given in independent design modules.
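The two-level AND-OR structure that PAL synthesis targets can be illustrated with a minimal Python sketch. It assumes a PLA-style cover in which each product term is a string of `0`, `1` and `-` (don't care) characters, one per input; the function names and the example cover are hypothetical and are not taken from the Hector design.

```python
def cube_matches(cube: str, inputs: str) -> bool:
    """AND plane: a product term is true when every non-dash literal matches."""
    return all(c == "-" or c == x for c, x in zip(cube, inputs))


def eval_sop(cover: list[str], inputs: str) -> bool:
    """OR plane: the output is true when at least one product term is true."""
    return any(cube_matches(cube, inputs) for cube in cover)


# Hypothetical 3-input function f(a, b, c) = a AND b  OR  (NOT a) AND c
cover = ["11-", "0-1"]

if __name__ == "__main__":
    for bits in ("110", "001", "100", "010"):
        print(bits, eval_sop(cover, bits))
```

In a PAL, both planes are traversed in one pass regardless of how many inputs a product term uses. An FPGA mapping of the same function would instead split wide terms across several small logic blocks, which is why the two synthesis styles differ.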

2.2 The Procedure

Armed with the retargeting environment, the task can be done in a straightforward way. We will describe it in more detail, in order to show some of the options offered, but also to point out the potential obstacles to obtaining high performance in the target technology. In our example, the original design was given by a set of ABEL PAL descriptions and schematics describing the interconnections of the PALs and catalogue MSI and SSI devices. The target technology was the Xilinx XC4000 FPGA family. We also had the ABEL design environment, which was able to produce the set of combinational and sequential logic descriptions in the sequential extension of the PLA design format. Programs ahdl2pla and plaopt perform the mapping to PLA format and the subsequent optimization. The retargeting procedure consists of taking the designs given in ABEL, converting them to the technology-independent PLA format (tt2 in Figure 2) and grouping the related units into design modules. The grouping usually follows the grouping in the schematics, and an EIL file describes how the PAL units are connected. The design modules are then grouped until the whole design is contained in one module. The circuits comprising LSI or MSI elements can be included in the form of a VHDL description, as shown in Figure 2. The result is given in XNF format, from which the placed and routed design is obtained by technology mapping (using program ppr).
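The order of the steps can be summarized in a short Python sketch. The tool names (ahdl2pla, plaopt, core, ppr) and format labels (tt1, tt2, eil, xnf, lca) are those used in the text and in Figure 2; the file names, the function names and the chaining logic are illustrative assumptions. No command-line options are shown, because the text does not give them.

```python
# Tool names and format labels follow the text and Figure 2.
# File naming and the chaining logic are illustrative assumptions.

def pal_steps(pal):
    """Steps that bring one ABEL PAL description to optimized PLA form."""
    return [
        ("ahdl2pla", [f"{pal}.abl"], f"{pal}.tt1"),  # ABEL -> PLA format
        ("plaopt", [f"{pal}.tt1"], f"{pal}.tt2"),    # PLA optimization
    ]


def module_steps(module, pals):
    """Steps that merge several PALs into one module and map it to the FPGA."""
    steps = []
    for pal in pals:
        steps.extend(pal_steps(pal))
    # core merges the optimized PLA files as directed by the module's EIL file
    core_inputs = [f"{pal}.tt2" for pal in pals] + [f"{module}.eil"]
    steps.append(("core", core_inputs, f"{module}.xnf"))
    # ppr places and routes the XNF netlist
    steps.append(("ppr", [f"{module}.xnf"], f"{module}.lca"))
    return steps


if __name__ == "__main__":
    for tool, inputs, output in module_steps("module1", ["pal1", "pal2"]):
        print(f"{tool}: {', '.join(inputs)} -> {output}")
```

Larger modules would be built the same way, with the XNF or EIL descriptions of smaller modules (and VHDL descriptions of MSI logic) appearing among the inputs to core.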

[Figure 2: each PAL description (ABL) is converted by ahdl2pla to tt1 and optimized by plaopt to tt2; core combines the tt2 files, EIL files and the VHDL description of MSI logic into modules (MODULE1 to MODULEn) in XNF format; ppr produces the LCA file, which is analyzed by xdelay.]

FIGURE 2. Retargeting Procedure

The design verification must be done independently and, if performance is of no interest, the retargeting procedure is finished in a single iteration. However, our goal was to produce a fast design, and the method to achieve this will now be described.

2.3 Performance Directed Retargeting

The performance of the entire design, or a part of it, can be assessed only after placement and routing are done. In our example, the Xilinx program xdelay was used to estimate the delays contributed by the logic blocks and the routing network. The routing component is not known before placement and routing, and this is the main reason why no sensible performance estimate can be made a priori. Furthermore, even the routing delay values are not unique to the design, since the Xilinx placement and routing program is nondeterministic. Repeating this step a few times and keeping the best result seems to be the most effective recipe at this stage.
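The "repeat and keep the best" recipe can be sketched as follows. The function `place_and_route_delay_ns` is a hypothetical stand-in for one ppr run followed by xdelay, and its delay range is an arbitrary placeholder, not measured data. The only figure taken from the text is the 16.66 MHz system clock. Comparing the worst path delay directly with the clock period is a simplifying assumption that ignores setup times and clock skew.

```python
import random

CLOCK_MHZ = 16.66                      # system clock given in the text
CLOCK_PERIOD_NS = 1000.0 / CLOCK_MHZ   # roughly 60 ns


def place_and_route_delay_ns(rng):
    """Hypothetical stand-in for one nondeterministic ppr run plus xdelay.

    Returns a made-up worst-case path delay in nanoseconds.
    """
    return rng.uniform(50.0, 80.0)


def best_of(n_runs, seed=0):
    """Repeat placement and routing n_runs times and keep the fastest result."""
    rng = random.Random(seed)
    delays = [place_and_route_delay_ns(rng) for _ in range(n_runs)]
    return min(delays), delays


if __name__ == "__main__":
    best, delays = best_of(5)
    print("delays (ns):", [round(d, 1) for d in delays])
    print("best (ns):", round(best, 1))
    print("meets clock period:", best <= CLOCK_PERIOD_NS)
```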

We found that the optimization procedures incorporated in Exemplar perform very well for combinational logic and that not much improvement can be achieved in this area. However, this is not true for sequential logic. The reason is that sequential logic synthesis methods have not been developed for all logic device architectures and that, even if optimal synthesis were available, the input sequential logic is predominantly given in the structural domain. For retargeting, the sequential devices must first pass through analysis (from the structural to the behavioural domain), and then a synthesis step (to a new structural domain) that matches the target architecture must be employed. This is certainly not done in the process described above.

3.0 State Machine Performance

We will now examine how the speed of the sequential logic can be optimized when retargeting from PAL-based designs. The number of flip-flops in a PAL chip is limited, but the number of inputs to a part of the state machine (together with the feedback from flip-flops) can exceed 20. With PALs, the designer’s goal is therefore to minimize the number of bits needed for the state assignment, while the fanin of each state can be relatively large. With FPGAs, the designer wants to contain all the logic associated with one flip-flop in one logic block. This means that the fanin of the flip-flop is limited to some small number. If the fanin exceeds this number, a speed penalty must be paid, because a larger number of logic blocks and routing segments will be needed. On the other hand, the number of available flip-flops is much larger in FPGAs. Table 1 summarizes the differences between PAL and FPGA technologies with respect to sequential synthesis.

TABLE 1. Differences between PALs and FPGAs

Characteristics        PAL            FPGA
# Inputs per block     large (>20)    small (